aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-10-15Merge tag 'for-v3.18' of git://git.infradead.org/battery-2.6Linus Torvalds33-249/+1919
Pull power supply and reset updates from Sebastian Reichel: - Initial support for the following chips * max77836 (charger) * max14577 (charger) * bq27742 (battery gauge) * ltc2952 (poweroff) * stih416 (restart) * syscon-reboot (restart) * gpio-restart (restart) - cleanup of power supply core - misc fixes in power supply and reset drivers * tag 'for-v3.18' of git://git.infradead.org/battery-2.6: (48 commits) power: ab8500_fg: Fix build warning Documentation: charger: max14577: Update the date of introducing ABI power: reset: corrections for simple syscon reboot driver Documentation: power: reset: Add documentation for generic SYSCON reboot driver power: reset: Add generic SYSCON register mapped reset bq27x00_battery: Fix flag reading for bq27742 power: reset: use restart_notifier mechanism for msm-poweroff power: Add simple gpio-restart driver power: reset: st: Provide DT bindings for ST's Power Reset driver power: reset: Add restart functionality for STiH41x platforms power: charger-manager: Fix NULL pointer exception with missing cm-fuel-gauge power: max14577: Fix circular config SYSFS dependency power: gpio-charger: do not use gpio value directly power: max8925: Use of_get_child_by_name power: max8925: Fix NULL ptr dereference on memory allocation failure bq27x00_battery: Add support to bq27742 Documentation: charger: max14577: Document exported sysfs entry devicetree: mfd: max14577: Add device tree bindings document power: max17040: Add ID for MAX77836 Fuel Gauge block charger: max14577: Configure battery-dependent settings from DTS and sysfs ... Conflicts: drivers/power/reset/Kconfig drivers/power/reset/Makefile
2014-10-15Merge branch 'for-linus' of ↵Linus Torvalds25-623/+948
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "There is the long-awaited discard support for RBD (Guangliang Zhao, Josh Durgin), a pile of RBD bug fixes that didn't belong in late -rc's (Ilya Dryomov, Li RongQing), a pile of fs/ceph bug fixes and performance and debugging improvements (Yan, Zheng, John Spray), and a smattering of cleanups (Chao Yu, Fabian Frederick, Joe Perches)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) ceph: fix divide-by-zero in __validate_layout() rbd: rbd workqueues need a resque worker libceph: ceph-msgr workqueue needs a resque worker ceph: fix bool assignments libceph: separate multiple ops with commas in debugfs output libceph: sync osd op definitions in rados.h libceph: remove redundant declaration ceph: additional debugfs output ceph: export ceph_session_state_name function ceph: include the initial ACL in create/mkdir/mknod MDS requests ceph: use pagelist to present MDS request data libceph: reference counting pagelist ceph: fix llistxattr on symlink ceph: send client metadata to MDS ceph: remove redundant code for max file size verification ceph: remove redundant io_iter_advance() ceph: move ceph_find_inode() outside the s_mutex ceph: request xattrs if xattr_version is zero rbd: set the remaining discard properties to enable support rbd: use helpers to handle discard for layered images correctly ...
2014-10-15Merge branch 'CVE-2014-7970' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux Pull pivot_root() fix from Andy Lutomirski. Prevent a leak of unreachable mounts. * 'CVE-2014-7970' of git://git.kernel.org/pub/scm/linux/kernel/git/luto/linux: mnt: Prevent pivot_root from creating a loop in the mount tree
2014-10-15Merge branch 'cxgb4'David S. Miller4-139/+26
Anish Bhatt says: ==================== ipv6 and related cleanup for cxgb4/cxgb4i This patch set removes some duplicated/extraneous code from cxgb4i, guards cxgb4 against compilation failure based on ipv6 tristate, make ipv6 related code no longer be enabled by default irrespective of ipv6 tristate and fixes a refcnt issue. -Anish v2 : Provide more detailed commit messages, make subject more concise as recommended by Dave Miller. ==================== Signed-off-by: David S. Miller <[email protected]>
2014-10-15cxgb4i: Remove duplicate call to dst_neigh_lookup()Anish Bhatt1-5/+0
There is an extra call to dst_neigh_lookup() leftover in cxgb4i that can cause an unreleased refcnt issue. Remove extraneous call. Signed-off-by: Anish Bhatt <[email protected]> Fixes : 759a0cc5a3e1b ('cxgb4i: Add ipv6 code to driver, call into libcxgbi ipv6 api') Signed-off-by: David S. Miller <[email protected]>
2014-10-15cxgb4i : Fix -Wunused-function warningAnish Bhatt2-0/+10
A bunch of ipv6 related code is left on by default. While this causes no compilation issues, there is no need to have this enabled by default. Guard with an ipv6 check, which also takes care of a -Wunused-function warning. Signed-off-by: Anish Bhatt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-15cxgb4 : Fix build failure in cxgb4 when ipv6 is disabled/not in-builtAnish Bhatt2-1/+9
cxgb4 ipv6 does not guard against ipv6 being disabled, or the standard ipv6 module vs inbuilt tri-state issue. This was fixed for cxgb4i & iw_cxgb4 but missed for cxgb4. Signed-off-by: Anish Bhatt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-15cxgb4i : Remove duplicated CLIP handling codeAnish Bhatt2-133/+7
cxgb4 already handles CLIP updates from a previous changeset for iw_cxgb4, there is no need to have this functionality in cxgb4i. Remove duplicated code Signed-off-by: Anish Bhatt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14sparc64: Fix FPU register corruption with AES crypto offload.David S. Miller2-1/+21
The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the key material is preloaded into the FPU registers, and then we loop over and over doing the crypt operation, reusing those pre-cooked key registers. There are intervening blkcipher*() calls between the crypt operation calls. And those might perform memcpy() and thus also try to use the FPU. The sparc64 kernel FPU usage mechanism is designed to allow such recursive uses, but with a catch. There has to be a trap between the two FPU using threads of control. The mechanism works by, when the FPU is already in use by the kernel, allocating a slot for FPU saving at trap time. Then if, within the trap handler, we try to use the FPU registers, the pre-trap FPU register state is saved into the slot. Then at trap return time we notice this and restore the pre-trap FPU state. Over the long term there are various more involved ways we can make this work, but for a quick fix let's take advantage of the fact that the situation where this happens is very limited. All sparc64 chips that support the crypto instructiosn also are using the Niagara4 memcpy routine, and that routine only uses the FPU for large copies where we can't get the source aligned properly to a multiple of 8 bytes. We look to see if the FPU is already in use in this context, and if so we use the non-large copy path which only uses integer registers. Furthermore, we also limit this special logic to when we are doing kernel copy, rather than a user copy. Signed-off-by: David S. Miller <[email protected]>
2014-10-15powerpc/msi: Use WARN_ON() in msi bitmap selftestsMichael Ellerman1-30/+24
As demonstrated in the previous commit, the failure message from the msi bitmap selftests is a bit subtle, it's easy to miss a failure in a busy boot log. So drop our check() macro and use WARN_ON() instead. This necessitates inverting all the conditions as well. Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/msi: Fix the msi bitmap alignment testsMichael Ellerman1-8/+18
When we added the alignment tests recently we failed to check they were actually passing - oops. They weren't passing, because the bitmap was full. We should also be a bit more careful when checking the return code, a negative error return could by divisible by our alignment value. Fixes: b0345bbc6d09 ("powerpc/msi: Improve IRQ bitmap allocator") Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/eeh: Block CFG upon frozen Shiner adapterGavin Shan1-2/+3
The Broadcom Shiner 2-ports 10G ethernet adapter has same problem commit 6f20bda0 ("powerpc/eeh: Block PCI config access upon frozen PE") fixes. Put it to the black list as well. # lspci -s 0004:01:00.0 0004:01:00.0 Ethernet controller: Broadcom Corporation \ NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10) # lspci -n -s 0004:01:00.0 0004:01:00.0 0200: 14e4:168e (rev 10) Reported-by: John Walthour <[email protected]> Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/eeh: Don't collect logs on PE with blocked config spaceGavin Shan1-0/+7
When the PE's config space is marked as blocked, PCI config read requests always return 0xFF's. It's pointless to collect logs in this case. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/eeh: Block PCI config access upon frozen PEGavin Shan3-0/+28
The problem was found when I tried to inject PCI config error by PHB3 PAPR error injection registers into Broadcom Austin 4-ports NIC adapter. The frozen PE was reported successfully and EEH core started to recover it. However, I run into fenced PHB when dumping PCI config space as EEH logs. I was told that PCI config requests should not be progagated to the adapter until PE reset is done successfully. Otherise, we would run out of PHB internal credits and trigger PCT (PCIE Completion Timeout), which leads to the fenced PHB. The patch introduces another PE flag EEH_PE_CFG_RESTRICTED, which is set during PE initialization time if the PE includes the specific PCI devices that need block PCI config access until PE reset is done. When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config access to the PE will be dropped by platform PCI accessors until PE reset is done successfully. The mechanism is shared by PowerNV platform owned PE or userland owned ones. It's not used on pSeries platform yet. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/pseries: Drop config requests in EEH accessorsGavin Shan1-19/+11
The pSeires EEH config accessors rely on rtas_{read, write}_config() and the condition to check if the PE's config space is blocked should be moved to those 2 functions so that config requests from kernel, userland, EEH core can be dropped to avoid recursive EEH error if necessary. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/powernv: Drop config requests in EEH accessorsGavin Shan1-2/+35
It's bad idea to access the PCI config registers of the adapters, which is experiencing reset. It leads to recursive EEH error without exception. The patch drops PCI config requests in EEH accessors if the PE has been marked to accept PCI config requests, for example during PE reseet time. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKEDGavin Shan6-17/+17
The flag EEH_PE_RESET indicates blocking config space of the PE during reset time. We potentially need block PE's config space other than reset time. So it's reasonable to replace it with EEH_PE_CFG_BLOCKED to indicate its usage. There are no substantial code or logic changes in this patch. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/eeh: Fix condition for isolated stateGavin Shan1-1/+1
Function eeh_pe_state_mark() could possibly have combination of multiple EEH PE state as its argument. The patch fixes the condition used to check if EEH_PE_ISOLATED is included. Signed-off-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/pseries: Make CPU hotplug path endian safeBharata B Rao3-14/+15
- ibm,rtas-configure-connector should treat the RTAS data as big endian. - Treat ibm,ppc-interrupt-server#s as big-endian when setting smp_processor_id during hotplug. Signed-off-by: Bharata B Rao <[email protected]> Signed-off-by: Thomas Falcon <[email protected]> Acked-by: Nathan Fontenot <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc/pseries: Use dump_stack instead of show_stackAnton Blanchard1-6/+5
We can use the simpler dump_stack() instead of show_stack(current, __get_SP()) Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc: Rename __get_SP() to current_stack_pointer()Anton Blanchard7-7/+7
Michael points out that __get_SP() is a pretty horrible function name. Let's give it a better name. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15powerpc: Reimplement __get_SP() as a function not a defineAnton Blanchard6-5/+10
Li Zhong points out an issue with our current __get_SP() implementation. If ftrace function tracing is enabled (ie -pg profiling using _mcount) we spill a stack frame on 64bit all the time. If a function calls __get_SP() and later calls a function that is tail call optimised, we will pop the stack frame and the value returned by __get_SP() is no longer valid. An example from Li can be found in save_stack_trace -> save_context_stack: c0000000000432c0 <.save_stack_trace>: c0000000000432c0: mflr r0 c0000000000432c4: std r0,16(r1) c0000000000432c8: stdu r1,-128(r1) <-- stack frame for _mcount c0000000000432cc: std r3,112(r1) c0000000000432d0: bl <._mcount> c0000000000432d4: nop c0000000000432d8: mr r4,r1 <-- __get_SP() c0000000000432dc: ld r5,632(r13) c0000000000432e0: ld r3,112(r1) c0000000000432e4: li r6,1 c0000000000432e8: addi r1,r1,128 <-- pop stack frame c0000000000432ec: ld r0,16(r1) c0000000000432f0: mtlr r0 c0000000000432f4: b <.save_context_stack> <-- tail call optimized save_context_stack ends up with a stack pointer below the current one, and it is likely to be scribbled over. Fix this by making __get_SP() a function which returns the callers stack frame. Also replace inline assembly which grabs the stack pointer in save_stack_trace and show_stack with __get_SP(). This also fixes an issue with perf_arch_fetch_caller_regs(). It currently unwinds the stack once, which will skip a valid stack frame on a leaf function. With the __get_SP() fixes in this patch, we never need to unwind the stack frame to get to the first interesting frame. We have to export __get_SP() because perf_arch_fetch_caller_regs() (which is used in modules) calls it from a header file. Reported-by: Li Zhong <[email protected]> Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2014-10-15virtio-rng: refactor probe error handlingMichael S. Tsirkin1-6/+9
Code like vi->vq = NULL; kfree(vi) does not make sense. Clean it up, use goto error labels for cleanup. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: drop scan callbackMichael S. Tsirkin1-16/+7
Enable VQs early like we do for restore. This makes it possible to drop the scan callback, moving scanning into the probe function, and making code simpler. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_balloon: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio balloon violated this rule by adding bufs, which causes the VQ to be used directly within restore. To fix, call virtio_device_ready before using VQ. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: fix race on device removalMichael S. Tsirkin1-1/+10
We cancel event work on device removal, but an interrupt could trigger immediately after this, and queue it again. To fix, set a flag. Loosely based on patch by Paolo Bonzini Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virito_scsi: use freezable WQ for eventsPaolo Bonzini1-1/+1
Michael S. Tsirkin noticed a race condition: we reset device on freeze, but system WQ is still running so it might try adding bufs to a VQ meanwhile. To fix, switch to handling events from the freezable WQ. Reported-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio net violated this rule by using receive VQs within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_console: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio scsi violated this rule on restore by kicking event vq within restore. To fix, call virtio_device_ready before using event queue. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: enable VQs early on restoreMichael S. Tsirkin1-3/+6
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio block violated this rule on restore by restarting queues, which might in theory cause the VQ to be used directly within restore. To fix, call virtio_device_ready before using starting queues. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: move kick event out from virtscsi_initMichael S. Tsirkin1-5/+11
We currently kick event within virtscsi_init, before host is fully initialized. This can in theory confuse guest if device consumes the buffers immediately. To fix, move virtscsi_kick_event_all out to scan/restore. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: fix use after free on allocation failureMichael S. Tsirkin1-0/+2
In the extremely unlikely event that driver initialization fails after RX buffers are added, virtio net frees RX buffers while VQs are still active, potentially causing device to use a freed buffer. To fix, reset device first - same as we do on device removal. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-159p/trans_virtio: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, but virtio 9p device adds self to channel list within probe, at which point VQ can be used in violation of the spec. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_console: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio block violated this rule by calling add_disk, which causes the VQ to be used directly within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio net violated this rule by using receive VQs within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: add API to enable VQs earlyMichael S. Tsirkin1-0/+17
virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: minor cleanupMichael S. Tsirkin1-4/+2
goto done; done: return; is ugly, it was put there to make diff review easier. replace by open-coded return. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio-net: drop config_mutexMichael S. Tsirkin1-6/+1
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: drop config_enableMichael S. Tsirkin1-23/+4
Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio net. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio-blk: drop config_mutexMichael S. Tsirkin1-8/+0
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: drop config_enableMichael S. Tsirkin1-19/+4
Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio blk. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: defer config changed notificationsMichael S. Tsirkin2-9/+55
Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]> (cosmetic changes)
2014-10-15virtio-pci: move freeze/restore to virtio coreMichael S. Tsirkin3-52/+62
This is in preparation to extending config changed event handling in core. Wrapping these in an API also seems to make for a cleaner code. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: unify config_changed handlingMichael S. Tsirkin7-28/+17
Replace duplicated code in all transports with a single wrapper in virtio.c. The only functional change is in virtio_mmio.c: if a buggy device sends us an interrupt before driver is set, we previously returned IRQ_NONE, now we return IRQ_HANDLED. As this must not happen in practice, this does not look like a big deal. See also commit 3fff0179e33cd7d0a688dab65700c46ad089e934 virtio-pci: do not oops on config change if driver not loaded. for the original motivation behind the driver check. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_pci: fix virtio spec compliance on restoreMichael S. Tsirkin1-3/+30
On restore, virtio pci does the following: + set features + init vqs etc - device can be used at this point! + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits This is in violation of the virtio spec, which requires the following order: - ACKNOWLEDGE - DRIVER - init vqs - DRIVER_OK This behaviour will break with hypervisors that assume spec compliant behaviour. It seems like a good idea to have this patch applied to stable branches to reduce the support butden for the hypervisors. Cc: [email protected] Cc: Amit Shah <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15modules, lock around setting of MODULE_STATE_UNFORMEDPrarit Bhargava1-0/+2
A panic was seen in the following sitation. There are two threads running on the system. The first thread is a system monitoring thread that is reading /proc/modules. The second thread is loading and unloading a module (in this example I'm using my simple dummy-module.ko). Note, in the "real world" this occurred with the qlogic driver module. When doing this, the following panic occurred: ------------[ cut here ]------------ kernel BUG at kernel/module.c:3739! invalid opcode: 0000 [#1] SMP Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module] CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000 RIP: 0010:[<ffffffff810d64c5>] [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000 RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000 R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000 FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00 ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0 Call Trace: [<ffffffff810d666c>] m_show+0x19c/0x1e0 [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0 [<ffffffff812281ed>] proc_reg_read+0x3d/0x80 [<ffffffff811c0f2c>] vfs_read+0x9c/0x170 [<ffffffff811c1a58>] SyS_read+0x58/0xb0 [<ffffffff81605829>] system_call_fastpath+0x16/0x1b Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RIP [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP <ffff88080fa7fe18> Consider the two processes running on the system. CPU 0 (/proc/modules reader) CPU 1 (loading/unloading module) CPU 0 opens /proc/modules, and starts displaying data for each module by traversing the modules list via fs/seq_file.c:seq_open() and fs/seq_file.c:seq_read(). For each module in the modules list, seq_read does op->start() <-- this is a pointer to m_start() op->show() <- this is a pointer to m_show() op->stop() <-- this is a pointer to m_stop() The m_start(), m_show(), and m_stop() module functions are defined in kernel/module.c. The m_start() and m_stop() functions acquire and release the module_mutex respectively. ie) When reading /proc/modules, the module_mutex is acquired and released for each module. m_show() is called with the module_mutex held. It accesses the module struct data and attempts to write out module data. It is in this code path that the above BUG_ON() warning is encountered, specifically m_show() calls static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); ... The other thread, CPU 1, in unloading the module calls the syscall delete_module() defined in kernel/module.c. The module_mutex is acquired for a short time, and then released. free_module() is called without the module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED, also without the module_mutex. Some additional code is called and then the module_mutex is reacquired to remove the module from the modules list: /* Now we can delete it from the lists */ mutex_lock(&module_mutex); stop_machine(__unlink_module, mod, NULL); mutex_unlock(&module_mutex); This is the sequence of events that leads to the panic. CPU 1 is removing dummy_module via delete_module(). It acquires the module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to MODULE_STATE_UNFORMED yet. CPU 0, which is reading the /proc/modules, acquires the module_mutex and acquires a pointer to the dummy_module which is still in the modules list. CPU 0 calls m_show for dummy_module. The check in m_show() for MODULE_STATE_UNFORMED passed for dummy_module even though it is being torn down. Meanwhile CPU 1, which has been continuing to remove dummy_module without holding the module_mutex, now calls free_module() and sets dummy_module->state to MODULE_STATE_UNFORMED. CPU 0 now calls module_flags() with dummy_module and ... static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); and BOOM. Acquire and release the module_mutex lock around the setting of MODULE_STATE_UNFORMED in the teardown path, which should resolve the problem. Testing: In the unpatched kernel I can panic the system within 1 minute by doing while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done and while (true) do cat /proc/modules; done in separate terminals. In the patched kernel I was able to run just over one hour without seeing any issues. I also verified the output of panic via sysrq-c and the output of /proc/modules looks correct for all three states for the dummy_module. dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-) dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE) dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+) Signed-off-by: Prarit Bhargava <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Cc: [email protected]
2014-10-14mnt: Prevent pivot_root from creating a loop in the mount treeEric W. Biederman1-0/+3
Andy Lutomirski recently demonstrated that when chroot is used to set the root path below the path for the new ``root'' passed to pivot_root the pivot_root system call succeeds and leaks mounts. In examining the code I see that starting with a new root that is below the current root in the mount tree will result in a loop in the mount tree after the mounts are detached and then reattached to one another. Resulting in all kinds of ugliness including a leak of that mounts involved in the leak of the mount loop. Prevent this problem by ensuring that the new mount is reachable from the current root of the mount tree. [Added stable cc. Fixes CVE-2014-7970. --Andy] Cc: [email protected] Reported-by: Andy Lutomirski <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]>
2014-10-14tcp: TCP Small Queues and strange attractorsEric Dumazet1-7/+19
TCP Small queues tries to keep number of packets in qdisc as small as possible, and depends on a tasklet to feed following packets at TX completion time. Choice of tasklet was driven by latencies requirements. Then, TCP stack tries to avoid reorders, by locking flows with outstanding packets in qdisc in a given TX queue. What can happen is that many flows get attracted by a low performing TX queue, and cpu servicing TX completion has to feed packets for all of them, making this cpu 100% busy in softirq mode. This became particularly visible with latest skb->xmit_more support Strategy adopted in this patch is to detect when tcp_wfree() is called from ksoftirqd and let the outstanding queue for this flow being drained before feeding additional packets, so that skb->ooo_okay can be set to allow select_queue() to select the optimal queue : Incoming ACKS are normally handled by different cpus, so this patch gives more chance for these cpus to take over the burden of feeding qdisc with future packets. Tested: lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 & lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00 06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00 06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00 06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00 06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00 06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00 06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00 06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00 06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00 06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00 Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50 lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog backlog 7606336b 2513p requeues 167982 backlog 224072b 74p requeues 566 backlog 581376b 192p requeues 5598 backlog 181680b 60p requeues 1070 backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows backlog 157456b 52p requeues 1758 backlog 672216b 222p requeues 3025 backlog 60560b 20p requeues 24541 backlog 448144b 148p requeues 21258 lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect Immediate jump to full bandwidth, and traffic is properly shard on all tx queues. lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00 06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00 06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00 06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00 06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00 06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00 06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00 06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00 06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00 06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00 Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50 lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog backlog 7900052b 2609p requeues 173287 backlog 878120b 290p requeues 589 backlog 1068884b 354p requeues 5621 backlog 996212b 329p requeues 1088 backlog 984100b 325p requeues 115316 backlog 956848b 316p requeues 1781 backlog 1080996b 357p requeues 3047 backlog 975016b 322p requeues 24571 backlog 990156b 327p requeues 21274 (All 8 TX queues get a fair share of the traffic) Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>