aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-04-20writeback: safer lock nestingGreg Thelen4-26/+34
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if the page's memcg is undergoing move accounting, which occurs when a process leaves its memcg for a new one that has memory.move_charge_at_immigrate set. unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if the given inode is switching writeback domains. Switches occur when enough writes are issued from a new domain. This existing pattern is thus suspicious: lock_page_memcg(page); unlocked_inode_to_wb_begin(inode, &locked); ... unlocked_inode_to_wb_end(inode, locked); unlock_page_memcg(page); If both inode switch and process memcg migration are both in-flight then unlocked_inode_to_wb_end() will unconditionally enable interrupts while still holding the lock_page_memcg() irq spinlock. This suggests the possibility of deadlock if an interrupt occurs before unlock_page_memcg(). truncate __cancel_dirty_page lock_page_memcg unlocked_inode_to_wb_begin unlocked_inode_to_wb_end <interrupts mistakenly enabled> <interrupt> end_page_writeback test_clear_page_writeback lock_page_memcg <deadlock> unlock_page_memcg Due to configuration limitations this deadlock is not currently possible because we don't mix cgroup writeback (a cgroupv2 feature) and memory.move_charge_at_immigrate (a cgroupv1 feature). If the kernel is hacked to always claim inode switching and memcg moving_account, then this script triggers lockup in less than a minute: cd /mnt/cgroup/memory mkdir a b echo 1 > a/memory.move_charge_at_immigrate echo 1 > b/memory.move_charge_at_immigrate ( echo $BASHPID > a/cgroup.procs while true; do dd if=/dev/zero of=/mnt/big bs=1M count=256 done ) & while true; do sync done & sleep 1h & SLEEP=$! while true; do echo $SLEEP > a/cgroup.procs echo $SLEEP > b/cgroup.procs done The deadlock does not seem possible, so it's debatable if there's any reason to modify the kernel. I suggest we should to prevent future surprises. And Wang Long said "this deadlock occurs three times in our environment", so there's more reason to apply this, even to stable. Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch see "[PATCH for-4.4] writeback: safer lock nesting" https://lkml.org/lkml/2018/4/11/146 Wang Long said "this deadlock occurs three times in our environment" [[email protected]: v4] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: comment tweaks, struct initialization simplification] Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613 Link: http://lkml.kernel.org/r/[email protected] Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates") Signed-off-by: Greg Thelen <[email protected]> Reported-by: Wang Long <[email protected]> Acked-by: Wang Long <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: <[email protected]> [v4.2+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-20mm, pagemap: fix swap offset value for PMD migration entryHuang Ying1-1/+5
The swap offset reported by /proc/<pid>/pagemap may be not correct for PMD migration entries. If addr passed into pagemap_pmd_range() isn't aligned with PMD start address, the swap offset reported doesn't reflect this. And in the loop to report information of each sub-page, the swap offset isn't increased accordingly as that for PFN. This may happen after opening /proc/<pid>/pagemap and seeking to a page whose address doesn't align with a PMD start address. I have verified this with a simple test program. BTW: migration swap entries have PFN information, do we need to restrict whether to show them? [[email protected]: fix typo, per Huang, Ying] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: "Huang, Ying" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Dan Williams <[email protected]> Cc: "Jerome Glisse" <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Zi Yan <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-20mm: fix do_pages_move status handlingMichal Hocko1-0/+3
Li Wang has reported that LTP move_pages04 test fails with the current tree: LTP move_pages04: TFAIL : move_pages04.c:143: status[1] is EPERM, expected EFAULT The test allocates an array of two pages, one is present while the other is not (resp. backed by zero page) and it expects EFAULT for the second page as the man page suggests. We are reporting EPERM which doesn't make any sense and this is a result of a bug from cf5f16b23ec9 ("mm: unclutter THP migration"). do_pages_move tries to handle as many pages in one batch as possible so we queue all pages with the same node target together and that corresponds to [start, i] range which is then used to update status array. add_page_for_migration will correctly notice the zero (resp. !present) page and returns with EFAULT which gets written to the status. But if this is the last page in the array we do not update start and so the last store_status after the loop will overwrite the range of the last batch with NUMA_NO_NODE (which corresponds to EPERM). Fix this by simply bailing out from the last flush if the pagelist is empty as there is clearly nothing more to do. Link: http://lkml.kernel.org/r/[email protected] Fixes: cf5f16b23ec9 ("mm: unclutter THP migration") Signed-off-by: Michal Hocko <[email protected]> Reported-by: Li Wang <[email protected]> Tested-by: Li Wang <[email protected]> Cc: Zi Yan <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-20fork: unconditionally clear stack on forkKees Cook2-7/+2
One of the classes of kernel stack content leaks[1] is exposing the contents of prior heap or stack contents when a new process stack is allocated. Normally, those stacks are not zeroed, and the old contents remain in place. In the face of stack content exposure flaws, those contents can leak to userspace. Fixing this will make the kernel no longer vulnerable to these flaws, as the stack will be wiped each time a stack is assigned to a new process. There's not a meaningful change in runtime performance; it almost looks like it provides a benefit. Performing back-to-back kernel builds before: Run times: 157.86 157.09 158.90 160.94 160.80 Mean: 159.12 Std Dev: 1.54 and after: Run times: 159.31 157.34 156.71 158.15 160.81 Mean: 158.46 Std Dev: 1.46 Instead of making this a build or runtime config, Andy Lutomirski recommended this just be enabled by default. [1] A noisy search for many kinds of stack content leaks can be seen here: https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak I did some more with perf and cycle counts on running 100,000 execs of /bin/true. before: Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841 Mean: 221015379122.60 Std Dev: 4662486552.47 after: Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348 Mean: 217745009865.40 Std Dev: 5935559279.99 It continues to look like it's faster, though the deviation is rather wide, but I'm not sure what I could do that would be less noisy. I'm open to ideas! Link: http://lkml.kernel.org/r/20180221021659.GA37073@beast Signed-off-by: Kees Cook <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-21mtd: rawnand: tango: Fix struct clk memory leakMarc Gonzalez1-1/+1
Use devm_clk_get() to let Linux manage struct clk memory. Fixes: 6956e2385a16 ("add tango NAND flash controller support") Cc: [email protected] Reported-by: Xidong Wang <[email protected]> Signed-off-by: Marc Gonzalez <[email protected]> Reviewed-by: Miquel Raynal <[email protected]> Signed-off-by: Boris Brezillon <[email protected]>
2018-04-20bpf: sockmap remove dead checkJann Horn1-3/+0
Remove dead code that bails on `attr->value_size > KMALLOC_MAX_SIZE` - the previous check already bails on `attr->value_size != 4`. Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-04-20CIFS: fix typo in cifs_dbgAurelien Aptel1-1/+1
Signed-off-by: Aurelien Aptel <[email protected]> Signed-off-by: Steve French <[email protected]> Reported-by: Long Li <[email protected]>
2018-04-20cifs: do not allow creating sockets except with SMB1 posix exensionsSteve French1-4/+5
RHBZ: 1453123 Since at least the 3.10 kernel and likely a lot earlier we have not been able to create unix domain sockets in a cifs share when mounted using the SFU mount option (except when mounted with the cifs unix extensions to Samba e.g.) Trying to create a socket, for example using the af_unix command from xfstests will cause : BUG: unable to handle kernel NULL pointer dereference at 00000000 00000040 Since no one uses or depends on being able to create unix domains sockets on a cifs share the easiest fix to stop this vulnerability is to simply not allow creation of any other special files than char or block devices when sfu is used. Added update to Ronnie's patch to handle a tcon link leak, and to address a buf leak noticed by Gustavo and Colin. Acked-by: Gustavo A. R. Silva <[email protected]> CC: Colin Ian King <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Reported-by: Eryu Guan <[email protected]> Signed-off-by: Ronnie Sahlberg <[email protected]> Signed-off-by: Steve French <[email protected]> Cc: [email protected]
2018-04-20PCI: Add "PCIe" to pcie_print_link_status() messagesJakub Kicinski1-2/+2
Currently the pcie_print_link_status() will print PCIe bandwidth and link width information but does not mention it is pertaining to the PCIe. Since this and related functions are used exclusively by networking drivers today users may get confused into thinking that it's the NIC bandwidth that is being talked about. Insert a "PCIe" into the messages. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2018-04-20Merge branch 'fixes' of ↵Linus Torvalds2-33/+6
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal Pull thermal fixes from Eduardo Valentin: "A couple of fixes for the thermal subsystem" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal: dt-bindings: thermal: Remove "cooling-{min|max}-level" properties dt-bindings: thermal: remove no longer needed samsung thermal properties
2018-04-20Merge remote-tracking branch 'lorenzo/pci/host/fixes' into for-linusBjorn Helgaas2-24/+31
- correct GPIO name to fix HiSilicon Kirin probing issue (Loic Poulain) - fix MRRS setting in Marvell Armada Aardvark (Evan Wang) - fix interrupts by using ISR1 on Marvell Armada Aardvark (Victor Gu) - fix config accesses on Marvell Armada Aardvark (Victor Gu) * lorenzo/pci/host/fixes: PCI: kirin: Fix reset gpio name PCI: aardvark: Fix PCIe Max Read Request Size setting PCI: aardvark: Use ISR1 instead of ISR0 interrupt in legacy irq mode PCI: aardvark: Set PIO_ADDR_LS correctly in advk_pcie_rd_conf() PCI: aardvark: Fix logic in advk_pcie_{rd,wr}_conf()
2018-04-20Merge tag 'mmc-v4.17-3' of ↵Linus Torvalds2-8/+56
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "A couple of MMC host fixes: - sdhci-pci: Fixup tuning for AMD for eMMC HS200 mode - renesas_sdhi_internal_dmac: Avoid data corruption by limiting DMA RX" * tag 'mmc-v4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: renesas_sdhi_internal_dmac: limit DMA RX for old SoCs mmc: sdhci-pci: Only do AMD tuning for HS200
2018-04-20Merge tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/mdLinus Torvalds2-7/+24
Pull MD fixes from Shaohua Li: "Three small fixes for MD: - md-cluster fix for faulty device from Guoqing - writehint fix for writebehind IO for raid1 from Mariusz - a live lock fix for interrupted recovery from Yufen" * tag 'md/4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid1: copy write hint from master bio to behind bio md/raid1: exit sync request if MD_RECOVERY_INTR is set md-cluster: don't update recovery_offset for faulty device
2018-04-20cifs: smbd: Dump SMB packet when configuredLong Li1-1/+5
When sending through SMB Direct, also dump the packet in SMB send path. Also fixed a typo in debug message. Signed-off-by: Long Li <[email protected]> Cc: [email protected] Signed-off-by: Steve French <[email protected]> Reviewed-by: Ronnie Sahlberg <[email protected]>
2018-04-20btrfs: print-tree: debugging output enhancementQu Wenruo2-11/+16
This patch enhances the following things: - tree block header * add generation and owner output for node and leaf - node pointer generation output - allow btrfs_print_tree() to not follow nodes * just like btrfs-progs Please note that, although function btrfs_print_tree() is not called by anyone right now, it's still a pretty useful function to debug kernel. So that function is still kept for later use. Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: Lu Fengqi <[email protected]> Signed-off-by: David Sterba <[email protected]>
2018-04-20btrfs: Fix race condition between delayed refs and blockgroup removalNikolay Borisov3-10/+26
When the delayed refs for a head are all run, eventually cleanup_ref_head is called which (in case of deletion) obtains a reference for the relevant btrfs_space_info struct by querying the bg for the range. This is problematic because when the last extent of a bg is deleted a race window emerges between removal of that bg and the subsequent invocation of cleanup_ref_head. This can result in cache being null and either a null pointer dereference or assertion failure. task: ffff8d04d31ed080 task.stack: ffff9e5dc10cc000 RIP: 0010:assfail.constprop.78+0x18/0x1a [btrfs] RSP: 0018:ffff9e5dc10cfbe8 EFLAGS: 00010292 RAX: 0000000000000044 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff8d04ffc1f868 RSI: ffff8d04ffc178c8 RDI: ffff8d04ffc178c8 RBP: ffff8d04d29e5ea0 R08: 00000000000001f0 R09: 0000000000000001 R10: ffff9e5dc0507d58 R11: 0000000000000001 R12: ffff8d04d29e5ea0 R13: ffff8d04d29e5f08 R14: ffff8d04efe29b40 R15: ffff8d04efe203e0 FS: 00007fbf58ead500(0000) GS:ffff8d04ffc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe6c6975648 CR3: 0000000013b2a000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __btrfs_run_delayed_refs+0x10e7/0x12c0 [btrfs] btrfs_run_delayed_refs+0x68/0x250 [btrfs] btrfs_should_end_transaction+0x42/0x60 [btrfs] btrfs_truncate_inode_items+0xaac/0xfc0 [btrfs] btrfs_evict_inode+0x4c6/0x5c0 [btrfs] evict+0xc6/0x190 do_unlinkat+0x19c/0x300 do_syscall_64+0x74/0x140 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7fbf589c57a7 To fix this, introduce a new flag "is_system" to head_ref structs, which is populated at insertion time. This allows to decouple the querying for the spaceinfo from querying the possibly deleted bg. Fixes: d7eae3403f46 ("Btrfs: rework delayed ref total_bytes_pinned accounting") CC: [email protected] # 4.14+ Suggested-by: Omar Sandoval <[email protected]> Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: Omar Sandoval <[email protected]> Signed-off-by: David Sterba <[email protected]>
2018-04-20vfs: Undo an overly zealous MS_RDONLY -> SB_RDONLY conversionDavid Howells1-1/+1
In do_mount() when the MS_* flags are being converted to MNT_* flags, MS_RDONLY got accidentally convered to SB_RDONLY. Undo this change. Fixes: e462ec50cb5f ("VFS: Differentiate mount flags (MS_*) from internal superblock flags") Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-20afs: Fix server record deletionDavid Howells1-1/+8
AFS server records get removed from the net->fs_servers tree when they're deleted, but not from the net->fs_addresses{4,6} lists, which can lead to an oops in afs_find_server() when a server record has been removed, for instance during rmmod. Fix this by deleting the record from the by-address lists before posting it for RCU destruction. The reason this hasn't been noticed before is that the fileserver keeps probing the local cache manager, thereby keeping the service record alive, so the oops would only happen when a fileserver eventually gets bored and stops pinging or if the module gets rmmod'd and a call comes in from the fileserver during the window between the server records being destroyed and the socket being closed. The oops looks something like: BUG: unable to handle kernel NULL pointer dereference at 000000000000001c ... Workqueue: kafsd afs_process_async_call [kafs] RIP: 0010:afs_find_server+0x271/0x36f [kafs] ... Call Trace: afs_deliver_cb_init_call_back_state3+0x1f2/0x21f [kafs] afs_deliver_to_call+0x1ee/0x5e8 [kafs] afs_process_async_call+0x5b/0xd0 [kafs] process_one_work+0x2c2/0x504 worker_thread+0x1d4/0x2ac kthread+0x11f/0x127 ret_from_fork+0x24/0x30 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds69-349/+786
Pull networking fixes from David Miller: 1) Unbalanced refcounting in TIPC, from Jon Maloy. 2) Only allow TCP_MD5SIG to be set on sockets in close or listen state. Once the connection is established it makes no sense to change this. From Eric Dumazet. 3) Missing attribute validation in neigh_dump_table(), also from Eric Dumazet. 4) Fix address comparisons in SCTP, from Xin Long. 5) Neigh proxy table clearing can deadlock, from Wolfgang Bumiller. 6) Fix tunnel refcounting in l2tp, from Guillaume Nault. 7) Fix double list insert in team driver, from Paolo Abeni. 8) af_vsock.ko module was accidently made unremovable, from Stefan Hajnoczi. 9) Fix reference to freed llc_sap object in llc stack, from Cong Wang. 10) Don't assume netdevice struct is DMA'able memory in virtio_net driver, from Michael S. Tsirkin. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) net/smc: fix shutdown in state SMC_LISTEN bnxt_en: Fix memory fault in bnxt_ethtool_init() virtio_net: sparse annotation fix virtio_net: fix adding vids on big-endian virtio_net: split out ctrl buffer net: hns: Avoid action name truncation docs: ip-sysctl.txt: fix name of some ipv6 variables vmxnet3: fix incorrect dereference when rxvlan is disabled llc: hold llc_sap before release_sock() MAINTAINERS: Direct networking documentation changes to netdev atm: iphase: fix spelling mistake: "Tansmit" -> "Transmit" net: qmi_wwan: add Wistron Neweb D19Q1 net: caif: fix spelling mistake "UKNOWN" -> "UNKNOWN" net: stmmac: Disable ACS Feature for GMAC >= 4 net: mvpp2: Fix DMA address mask size net: change the comment of dev_mc_init net: qualcomm: rmnet: Fix warning seen with fill_info tun: fix vlan packet truncation tipc: fix infinite loop when dumping link monitor summary tipc: fix use-after-free in tipc_nametbl_stop ...
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds8-11/+39
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Assorted fixes. Some of that is only a matter with fault injection (broken handling of small allocation failure in various mount-related places), but the last one is a root-triggerable stack overflow, and combined with userns it gets really nasty ;-/" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Don't leak MNT_INTERNAL away from internal mounts mm,vmscan: Allow preallocating memory for register_shrinker(). rpc_pipefs: fix double-dput() orangefs_kill_sb(): deal with allocation failures jffs2_kill_sb(): deal with failed allocations hypfs_kill_super(): deal with failed allocations
2018-04-20Merge tag 'ecryptfs-4.17-rc2-fixes' of ↵Linus Torvalds4-21/+46
git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull eCryptfs fixes from Tyler Hicks: "Minor cleanups and a bug fix to completely ignore unencrypted filenames in the lower filesystem when filename encryption is enabled at the eCryptfs layer" * tag 'ecryptfs-4.17-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: don't pass up plaintext names when using filename encryption ecryptfs: fix spelling mistake: "cadidate" -> "candidate" ecryptfs: lookup: Don't check if mount_crypt_stat is NULL
2018-04-20Merge tag 'for_v4.17-rc2' of ↵Linus Torvalds9-40/+63
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs - isofs memory leak fix - two fsnotify fixes of event mask handling - udf fix of UTF-16 handling - couple other smaller cleanups * tag 'for_v4.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix leak of UTF-16 surrogates into encoded strings fs: ext2: Adding new return type vm_fault_t isofs: fix potential memory leak in mount option parsing MAINTAINERS: add an entry for FSNOTIFY infrastructure fsnotify: fix typo in a comment about mark->g_list fsnotify: fix ignore mask logic in send_to_group() isofs compress: Remove VLA usage fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init fanotify: fix logic of events on child
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds6-38/+92
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID updates from Jiri Kosina: - suspend/resume handling fix for Raydium I2C-connected touchscreen from Aaron Ma - protocol fixup for certain BT-connected Wacoms from Aaron Armstrong Skomra - battery level reporting fix on BT-connected mice from Dmitry Torokhov - hidraw race condition fix from Rodrigo Rivas Costa * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: i2c-hid: fix inverted return value from i2c_hid_command() HID: i2c-hid: Fix resume issue on Raydium touchscreen device HID: wacom: bluetooth: send exit report for recent Bluetooth devices HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device HID: input: fix battery level reporting on BT mice
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds5-81/+163
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fix from Jiri Kosina: "Shadow variable API list_head initialization fix from Petr Mladek" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: Allow to call a custom callback when freeing shadow variables livepatch: Initialize shadow variables safely by a custom callback
2018-04-20Merge tag 'for-linus-4.17-rc2-tag' of ↵Linus Torvalds4-22/+313
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - some fixes of kmalloc() flags - one fix of the xenbus driver - an update of the pv sound driver interface needed for a driver which will go through the sound tree * tag 'for-linus-4.17-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: xenbus_dev_frontend: Really return response string xen/sndif: Sync up with the canonical definition in Xen xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe
2018-04-20arm/arm64: KVM: Add PSCI version selection APIMarc Zyngier10-4/+156
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM. But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API. This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it. Cc: [email protected] #4.16 Reviewed-by: Christoffer Dall <[email protected]> Signed-off-by: Marc Zyngier <[email protected]>
2018-04-20Merge tag 'mips_fixes_4.17_1' of ↵Linus Torvalds4-6/+26
git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips Pull MIPS fixes from James Hogan: - io: Add barriers to read*() & write*() - dts: Fix boston PCI bus DTC warnings (4.17) - memset: Several corner case fixes (one 3.10, others longer) * tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: MIPS: uaccess: Add micromips clobbers to bzero invocation MIPS: memset.S: Fix clobber of v1 in last_fixup MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup MIPS: memset.S: EVA & fault support for small_memset MIPS: dts: Boston: Fix PCI bus dtc warnings: MIPS: io: Add barrier after register read in readX() MIPS: io: Prevent compiler reordering writeX()
2018-04-20Merge tag 'powerpc-4.17-3' of ↵Linus Torvalds5-4/+20
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix an off-by-one bug in our alternative asm patching which leads to incorrectly patched code. This bug lay dormant for nearly 10 years but we finally hit it due to a recent change. - Fix lockups when running KVM guests on Power8 due to a missing check when a thread that's running KVM comes out of idle. - Fix an out-of-spec behaviour in the XIVE code (P9 interrupt controller). - Fix EEH handling of bridge MMIO windows. - Prevent crashes in our RFI fallback flush handler if firmware didn't tell us the size of the L1 cache (only seen on simulators). Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling. * tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/kvm: Fix lockups when running KVM guests on Power8 powerpc/eeh: Fix enabling bridge MMIO windows powerpc/xive: Fix trying to "push" an already active pool VP powerpc/64s: Default l1d_size to 64K in RFI fallback flush powerpc/lib: Fix off-by-one in alternate feature patching
2018-04-20Merge branch 'for-linus' of ↵Linus Torvalds30-716/+998
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes and kexec-file-load from Martin Schwidefsky: "After the common code kexec patches went in via Andrew we can now push the architecture parts to implement the kexec-file-load system call. Plus a few more bug fixes and cleanups, this includes an update to the default configurations" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/signal: cleanup uapi struct sigaction s390: rename default_defconfig to debug_defconfig s390: remove gcov defconfig s390: update defconfig s390: add support for IBM z14 Model ZR1 s390: remove couple of duplicate includes s390/boot: remove unused COMPILE_VERSION and ccflags-y s390/nospec: include cpu.h s390/decompressor: Ignore file vmlinux.bin.full s390/kexec_file: add generated files to .gitignore s390/Kconfig: Move kexec config options to "Processor type and features" s390/kexec_file: Add ELF loader s390/kexec_file: Add crash support to image loader s390/kexec_file: Add image loader s390/kexec_file: Add kexec_file_load system call s390/kexec_file: Add purgatory s390/kexec_file: Prepare setup.h for kexec_file_load s390/smsgiucv: disable SMSG on module unload s390/sclp: avoid potential usage of uninitialized value
2018-04-20usb: host: xhci-plat: Fix clock resource by adding a register clockGregory CLEMENT3-6/+27
On Armada 7K/8K we need to explicitly enable the register clock. This clock is optional because not all the SoCs using this IP need it but at least for Armada 7K/8K it is actually mandatory. The change was done at xhci-plat level and not at a xhci-mvebu.c because, it is expected that other SoC would have this kind of constraint. The binding documentation is updating accordingly. Signed-off-by: Gregory CLEMENT <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-04-20usb: host: xhci-plat: Remove useless test before clk_disable_unprepareGregory CLEMENT1-4/+2
clk_disable_unprepare() already checks that the clock pointer is valid. No need to test it before calling it. Signed-off-by: Gregory CLEMENT <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-04-20xhci: Fix USB ports for Dell Inspiron 5775Kai-Heng Feng1-1/+4
The Dell Inspiron 5775 is a Raven Ridge. The Enable Slot command timed out when a USB device gets plugged: [ 212.156326] xhci_hcd 0000:03:00.3: Error while assigning device slot ID [ 212.156340] xhci_hcd 0000:03:00.3: Max number of devices this xHCI host supports is 64. [ 212.156348] usb usb2-port3: couldn't allocate usb_device AMD suggests that a delay before xHC suspends can fix the issue. I can confirm it fixes the issue, so use the suspend delay quirk for Raven Ridge's xHC. Cc: [email protected] Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-04-20perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUsOskar Senft1-1/+17
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally despite the fact that there are no SBOXes available. Check the Power Control Unit CAPID4 register to determine the number of available SBOXes on the particular CPU before trying to enable them. If there are none, nullify the SBOX descriptor so it isn't tried to be initialized. Signed-off-by: Oskar Senft <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Mark van Dijk <[email protected]> Reviewed-by: Kan Liang <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-04-20perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"Stephane Eranian1-0/+21
This reverts commit 3b94a891667c ("perf/x86/intel/uncore: Remove SBOX support for Broadwell server") Revert because there exists a proper workaround for Broadwell-EP servers without SBOX now. Note that BDX-DE does not have a SBOX. Signed-off-by: Stephane Eranian <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kan Liang <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-04-20x86/power/64: Fix page-table setup for temporary text mappingJoerg Roedel1-1/+1
On a system with 4-level page-tables there is no p4d, so the pud in the pgd should be mapped. The old code before commit fb43d6cb91ef already did that. The change from above commit causes an invalid page-table which causes undefined behavior. In one report it caused triple faults. Fix it by changing the p4d back to pud. Fixes: fb43d6cb91ef ('x86/mm: Do not auto-massage page protections') Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Michal Kubecek <[email protected]> Tested-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Dave Hansen <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-04-19Don't leak MNT_INTERNAL away from internal mountsAl Viro1-1/+2
We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for their copies. As it is, creating a deep stack of bindings of /proc/*/ns/* somewhere in a new namespace and exiting yields a stack overflow. Cc: [email protected] Reported-by: Alexander Aring <[email protected]> Bisected-by: Kirill Tkhai <[email protected]> Tested-by: Kirill Tkhai <[email protected]> Tested-by: Alexander Aring <[email protected]> Signed-off-by: Al Viro <[email protected]>
2018-04-19Merge branch 'omap-for-v4.17/fixes-ti-sysc' into omap-for-v4.17/fixesTony Lindgren1-4/+4
2018-04-19MAINTAINERS: Add backup maintainers for libnvdimm and DAXDave Jiang1-0/+15
Adding additional maintainers to libnvdimm related code and DAX. Signed-off-by: Dave Jiang <[email protected]> Acked-by: Ross Zwisler <[email protected]> Acked-by: Vishal Verma <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2018-04-19device-dax: allow MAP_SYNC to succeedDave Jiang1-0/+2
MAP_SYNC is a nop for device-dax. Allow MAP_SYNC to succeed on device-dax to eliminate special casing between device-dax and fs-dax as to when the flag can be specified. Device-dax users already implicitly assume that they do not need to call fsync(), and this enables them to explicitly check for this capability. Cc: <[email protected]> Fixes: b6fb293f2497 ("mm: Define MAP_SYNC and VM_SYNC flags") Signed-off-by: Dave Jiang <[email protected]> Reviewed-by: Dan Williams <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2018-04-19Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error"Dan Williams1-2/+1
With commit df3f126482db ("libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()") it is now possible to allow of_pmem to be built as a module as originally implemented. Signed-off-by: Dan Williams <[email protected]>
2018-04-19libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()Rob Herring1-1/+1
Remove the direct dependency on of_node_to_nid() by using dev_to_node() instead. Any DT platform device will have its NUMA node id set when the device is created. With this, commit 291717b6fbdb ("libnvdimm, of_pmem: workaround OF_NUMA=n build error") can be reverted. Fixes: 717197608952 ("libnvdimm: Add device-tree based driver") Cc: Dan Williams <[email protected]> Cc: Oliver O'Halloran <[email protected]> Cc: [email protected] Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2018-04-19net/smc: fix shutdown in state SMC_LISTENUrsula Braun1-6/+4
Calling shutdown with SHUT_RD and SHUT_RDWR for a listening SMC socket crashes, because commit 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") releases the internal clcsock in smc_close_active() and sets smc->clcsock to NULL. For SHUT_RD the smc_close_active() call is removed. For SHUT_RDWR the kernel_sock_shutdown() call is omitted, since the clcsock is already released. Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker") Signed-off-by: Ursula Braun <[email protected]> Reported-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19bnxt_en: Fix memory fault in bnxt_ethtool_init()Vasundhara Volam2-24/+27
In some firmware images, the length of BNX_DIR_TYPE_PKG_LOG nvram type could be greater than the fixed buffer length of 4096 bytes allocated by the driver. This was causing HWRM_NVM_READ to copy more data to the buffer than the allocated size, causing general protection fault. Fix the issue by allocating the exact buffer length returned by HWRM_NVM_FIND_DIR_ENTRY, instead of 4096. Move the kzalloc() call into the bnxt_get_pkgver() function. Fixes: 3ebf6f0a09a2 ("bnxt_en: Add installed-package firmware version reporting via Ethtool GDRVINFO") Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19Merge branch 'virtio-ctrl-buffer-fixes'David S. Miller1-29/+39
Michael S. Tsirkin says: ==================== virtio: ctrl buffer fixes Here are a couple of fixes related to the virtio control buffer. Lightly tested on x86 only. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-04-19virtio_net: sparse annotation fixMichael S. Tsirkin1-1/+1
offloads is a buffer in virtio format, should use the __virtio64 tag. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19virtio_net: fix adding vids on big-endianMichael S. Tsirkin1-3/+3
Programming vids (adding or removing them) still passes guest-endian values in the DMA buffer. That's wrong if guest is big-endian and when virtio 1 is enabled. Note: this is on top of a previous patch: virtio_net: split out ctrl buffer Fixes: 9465a7a6f ("virtio_net: enable v1.0 support") Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19virtio_net: split out ctrl bufferMichael S. Tsirkin1-29/+39
When sending control commands, virtio net sets up several buffers for DMA. The buffers are all part of the net device which means it's actually allocated by kvmalloc so it's in theory (on extreme memory pressure) possible to get a vmalloc'ed buffer which on some platforms means we can't DMA there. Fix up by moving the DMA buffers into a separate structure. Reported-by: Mikulas Patocka <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19net: hns: Avoid action name truncationdann frazier1-1/+1
When longer interface names are used, the action names exposed in /proc/interrupts and /proc/irq/* maybe truncated. For example, when using the predictable name algorithm in systemd on a HiSilicon D05, I see: ubuntu@d05-3:~$ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //' enahisic2i0-tx0 enahisic2i0-tx1 [...] enahisic2i0-tx8 enahisic2i0-tx9 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 enahisic2i0-tx1 Increase the max ring name length to allow for an interface name of IFNAMSIZE. After this change, I now see: $ grep enahisic2i0-tx /proc/interrupts | sed 's/.* //' enahisic2i0-tx0 enahisic2i0-tx1 enahisic2i0-tx2 [...] enahisic2i0-tx8 enahisic2i0-tx9 enahisic2i0-tx10 enahisic2i0-tx11 enahisic2i0-tx12 enahisic2i0-tx13 enahisic2i0-tx14 enahisic2i0-tx15 Signed-off-by: dann frazier <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-19fsnotify: Fix fsnotify_mark_connector raceRobert Kolchmeyer1-3/+1
fsnotify() acquires a reference to a fsnotify_mark_connector through the SRCU-protected pointer to_tell->i_fsnotify_marks. However, it appears that no precautions are taken in fsnotify_put_mark() to ensure that fsnotify() drops its reference to this fsnotify_mark_connector before assigning a value to its 'destroy_next' field. This can result in fsnotify_put_mark() assigning a value to a connector's 'destroy_next' field right before fsnotify() tries to traverse the linked list referenced by the connector's 'list' field. Since these two fields are members of the same union, this behavior results in a kernel panic. This issue is resolved by moving the connector's 'destroy_next' field into the object pointer union. This should work since the object pointer access is protected by both a spinlock and the value of the 'flags' field, and the 'flags' field is cleared while holding the spinlock in fsnotify_put_mark() before 'destroy_next' is updated. It shouldn't be possible for another thread to accidentally read from the object pointer after the 'destroy_next' field is updated. The offending behavior here is extremely unlikely; since fsnotify_put_mark() removes references to a connector (specifically, it ensures that the connector is unreachable from the inode it was formerly attached to) before updating its 'destroy_next' field, a sizeable chunk of code in fsnotify_put_mark() has to execute in the short window between when fsnotify() acquires the connector reference and saves the value of its 'list' field. On the HEAD kernel, I've only been able to reproduce this by inserting a udelay(1) in fsnotify(). However, I've been able to reproduce this issue without inserting a udelay(1) anywhere on older unmodified release kernels, so I believe it's worth fixing at HEAD. References: https://bugzilla.kernel.org/show_bug.cgi?id=199437 Fixes: 08991e83b7286635167bab40927665a90fb00d81 CC: [email protected] Signed-off-by: Robert Kolchmeyer <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2018-04-19docs: ip-sysctl.txt: fix name of some ipv6 variablesOlivier Gayot1-4/+4
The name of the following proc/sysctl entries were incorrectly documented: /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_number /proc/sys/net/ipv6/conf/<interface>/max_hbt_opts_number /proc/sys/net/ipv6/conf/<interface>/max_dst_opts_length /proc/sys/net/ipv6/conf/<interface>/max_hbt_length Their name was set to the name of the symbol in the .data field of the control table instead of their .proc name. Signed-off-by: Olivier Gayot <[email protected]> Signed-off-by: David S. Miller <[email protected]>