aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-10-15virtio-rng: refactor probe error handlingMichael S. Tsirkin1-6/+9
Code like vi->vq = NULL; kfree(vi) does not make sense. Clean it up, use goto error labels for cleanup. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: drop scan callbackMichael S. Tsirkin1-16/+7
Enable VQs early like we do for restore. This makes it possible to drop the scan callback, moving scanning into the probe function, and making code simpler. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_balloon: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio balloon violated this rule by adding bufs, which causes the VQ to be used directly within restore. To fix, call virtio_device_ready before using VQ. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: fix race on device removalMichael S. Tsirkin1-1/+10
We cancel event work on device removal, but an interrupt could trigger immediately after this, and queue it again. To fix, set a flag. Loosely based on patch by Paolo Bonzini Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virito_scsi: use freezable WQ for eventsPaolo Bonzini1-1/+1
Michael S. Tsirkin noticed a race condition: we reset device on freeze, but system WQ is still running so it might try adding bufs to a VQ meanwhile. To fix, switch to handling events from the freezable WQ. Reported-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio net violated this rule by using receive VQs within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_console: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after resume returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within restore. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: enable VQs early on restoreMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio scsi violated this rule on restore by kicking event vq within restore. To fix, call virtio_device_ready before using event queue. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: enable VQs early on restoreMichael S. Tsirkin1-3/+6
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after restore returns, virtio block violated this rule on restore by restarting queues, which might in theory cause the VQ to be used directly within restore. To fix, call virtio_device_ready before using starting queues. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_scsi: move kick event out from virtscsi_initMichael S. Tsirkin1-5/+11
We currently kick event within virtscsi_init, before host is fully initialized. This can in theory confuse guest if device consumes the buffers immediately. To fix, move virtscsi_kick_event_all out to scan/restore. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: fix use after free on allocation failureMichael S. Tsirkin1-0/+2
In the extremely unlikely event that driver initialization fails after RX buffers are added, virtio net frees RX buffers while VQs are still active, potentially causing device to use a freed buffer. To fix, reset device first - same as we do on device removal. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-159p/trans_virtio: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, but virtio 9p device adds self to channel list within probe, at which point VQ can be used in violation of the spec. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_console: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio console violated this rule by adding inbufs, which causes the VQ to be used directly within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio block violated this rule by calling add_disk, which causes the VQ to be used directly within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: enable VQs earlyMichael S. Tsirkin1-0/+2
virtio spec requires drivers to set DRIVER_OK before using VQs. This is set automatically after probe returns, virtio net violated this rule by using receive VQs within probe. To fix, call virtio_device_ready before using VQs. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: add API to enable VQs earlyMichael S. Tsirkin1-0/+17
virtio spec 0.9.X requires DRIVER_OK to be set before VQs are used, but some drivers use VQs before probe function returns. Since DRIVER_OK is set after probe, this violates the spec. Even though under virtio 1.0 transitional devices support this behaviour, we want to make it possible for those early callers to become spec compliant and eventually support non-transitional devices. Add API for drivers to call before using VQs. Sets DRIVER_OK internally. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: minor cleanupMichael S. Tsirkin1-4/+2
goto done; done: return; is ugly, it was put there to make diff review easier. replace by open-coded return. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio-net: drop config_mutexMichael S. Tsirkin1-6/+1
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_net: drop config_enableMichael S. Tsirkin1-23/+4
Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio net. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio-blk: drop config_mutexMichael S. Tsirkin1-8/+0
config_mutex served two purposes: prevent multiple concurrent config change handlers, and synchronize access to config_enable flag. Since commit dbf2576e37da0fcc7aacbfbb9fd5d3de7888a3c1 workqueue: make all workqueues non-reentrant all workqueues are non-reentrant, and config_enable is now gone. Get rid of the unnecessary lock. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_blk: drop config_enableMichael S. Tsirkin1-19/+4
Now that virtio core ensures config changes don't arrive during probing, drop config_enable flag in virtio blk. On removal, flush is now sufficient to guarantee that no change work is queued. This help simplify the driver, and will allow setting DRIVER_OK earlier without losing config change notifications. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: defer config changed notificationsMichael S. Tsirkin2-9/+55
Defer config changed notifications that arrive during probe/scan/freeze/restore. This will allow drivers to set DRIVER_OK earlier, without worrying about racing with config change interrupts. This change will also benefit old hypervisors (before 2009) that send interrupts without checking DRIVER_OK: previously, the callback could race with driver-specific initialization. This will also help simplify drivers. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]> (cosmetic changes)
2014-10-15virtio-pci: move freeze/restore to virtio coreMichael S. Tsirkin3-52/+62
This is in preparation to extending config changed event handling in core. Wrapping these in an API also seems to make for a cleaner code. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio: unify config_changed handlingMichael S. Tsirkin7-28/+17
Replace duplicated code in all transports with a single wrapper in virtio.c. The only functional change is in virtio_mmio.c: if a buggy device sends us an interrupt before driver is set, we previously returned IRQ_NONE, now we return IRQ_HANDLED. As this must not happen in practice, this does not look like a big deal. See also commit 3fff0179e33cd7d0a688dab65700c46ad089e934 virtio-pci: do not oops on config change if driver not loaded. for the original motivation behind the driver check. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15virtio_pci: fix virtio spec compliance on restoreMichael S. Tsirkin1-3/+30
On restore, virtio pci does the following: + set features + init vqs etc - device can be used at this point! + set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits This is in violation of the virtio spec, which requires the following order: - ACKNOWLEDGE - DRIVER - init vqs - DRIVER_OK This behaviour will break with hypervisors that assume spec compliant behaviour. It seems like a good idea to have this patch applied to stable branches to reduce the support butden for the hypervisors. Cc: [email protected] Cc: Amit Shah <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Rusty Russell <[email protected]>
2014-10-15modules, lock around setting of MODULE_STATE_UNFORMEDPrarit Bhargava1-0/+2
A panic was seen in the following sitation. There are two threads running on the system. The first thread is a system monitoring thread that is reading /proc/modules. The second thread is loading and unloading a module (in this example I'm using my simple dummy-module.ko). Note, in the "real world" this occurred with the qlogic driver module. When doing this, the following panic occurred: ------------[ cut here ]------------ kernel BUG at kernel/module.c:3739! invalid opcode: 0000 [#1] SMP Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module] CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7 Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013 task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000 RIP: 0010:[<ffffffff810d64c5>] [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246 RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000 RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000 RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000 R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000 FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00 ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0 Call Trace: [<ffffffff810d666c>] m_show+0x19c/0x1e0 [<ffffffff811e4d7e>] seq_read+0x16e/0x3b0 [<ffffffff812281ed>] proc_reg_read+0x3d/0x80 [<ffffffff811c0f2c>] vfs_read+0x9c/0x170 [<ffffffff811c1a58>] SyS_read+0x58/0xb0 [<ffffffff81605829>] system_call_fastpath+0x16/0x1b Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 RIP [<ffffffff810d64c5>] module_flags+0xb5/0xc0 RSP <ffff88080fa7fe18> Consider the two processes running on the system. CPU 0 (/proc/modules reader) CPU 1 (loading/unloading module) CPU 0 opens /proc/modules, and starts displaying data for each module by traversing the modules list via fs/seq_file.c:seq_open() and fs/seq_file.c:seq_read(). For each module in the modules list, seq_read does op->start() <-- this is a pointer to m_start() op->show() <- this is a pointer to m_show() op->stop() <-- this is a pointer to m_stop() The m_start(), m_show(), and m_stop() module functions are defined in kernel/module.c. The m_start() and m_stop() functions acquire and release the module_mutex respectively. ie) When reading /proc/modules, the module_mutex is acquired and released for each module. m_show() is called with the module_mutex held. It accesses the module struct data and attempts to write out module data. It is in this code path that the above BUG_ON() warning is encountered, specifically m_show() calls static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); ... The other thread, CPU 1, in unloading the module calls the syscall delete_module() defined in kernel/module.c. The module_mutex is acquired for a short time, and then released. free_module() is called without the module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED, also without the module_mutex. Some additional code is called and then the module_mutex is reacquired to remove the module from the modules list: /* Now we can delete it from the lists */ mutex_lock(&module_mutex); stop_machine(__unlink_module, mod, NULL); mutex_unlock(&module_mutex); This is the sequence of events that leads to the panic. CPU 1 is removing dummy_module via delete_module(). It acquires the module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to MODULE_STATE_UNFORMED yet. CPU 0, which is reading the /proc/modules, acquires the module_mutex and acquires a pointer to the dummy_module which is still in the modules list. CPU 0 calls m_show for dummy_module. The check in m_show() for MODULE_STATE_UNFORMED passed for dummy_module even though it is being torn down. Meanwhile CPU 1, which has been continuing to remove dummy_module without holding the module_mutex, now calls free_module() and sets dummy_module->state to MODULE_STATE_UNFORMED. CPU 0 now calls module_flags() with dummy_module and ... static char *module_flags(struct module *mod, char *buf) { int bx = 0; BUG_ON(mod->state == MODULE_STATE_UNFORMED); and BOOM. Acquire and release the module_mutex lock around the setting of MODULE_STATE_UNFORMED in the teardown path, which should resolve the problem. Testing: In the unpatched kernel I can panic the system within 1 minute by doing while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done and while (true) do cat /proc/modules; done in separate terminals. In the patched kernel I was able to run just over one hour without seeing any issues. I also verified the output of panic via sysrq-c and the output of /proc/modules looks correct for all three states for the dummy_module. dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-) dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE) dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+) Signed-off-by: Prarit Bhargava <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Signed-off-by: Rusty Russell <[email protected]> Cc: [email protected]
2014-10-14mnt: Prevent pivot_root from creating a loop in the mount treeEric W. Biederman1-0/+3
Andy Lutomirski recently demonstrated that when chroot is used to set the root path below the path for the new ``root'' passed to pivot_root the pivot_root system call succeeds and leaks mounts. In examining the code I see that starting with a new root that is below the current root in the mount tree will result in a loop in the mount tree after the mounts are detached and then reattached to one another. Resulting in all kinds of ugliness including a leak of that mounts involved in the leak of the mount loop. Prevent this problem by ensuring that the new mount is reachable from the current root of the mount tree. [Added stable cc. Fixes CVE-2014-7970. --Andy] Cc: [email protected] Reported-by: Andy Lutomirski <[email protected]> Reviewed-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]>
2014-10-14tcp: TCP Small Queues and strange attractorsEric Dumazet1-7/+19
TCP Small queues tries to keep number of packets in qdisc as small as possible, and depends on a tasklet to feed following packets at TX completion time. Choice of tasklet was driven by latencies requirements. Then, TCP stack tries to avoid reorders, by locking flows with outstanding packets in qdisc in a given TX queue. What can happen is that many flows get attracted by a low performing TX queue, and cpu servicing TX completion has to feed packets for all of them, making this cpu 100% busy in softirq mode. This became particularly visible with latest skb->xmit_more support Strategy adopted in this patch is to detect when tcp_wfree() is called from ksoftirqd and let the outstanding queue for this flow being drained before feeding additional packets, so that skb->ooo_okay can be set to allow select_queue() to select the optimal queue : Incoming ACKS are normally handled by different cpus, so this patch gives more chance for these cpus to take over the burden of feeding qdisc with future packets. Tested: lpaa23:~# ./super_netperf 1400 --google-pacing-rate 3028000 -H lpaa24 -l 3600 & lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:18 AM eth1 595448.00 1190564.00 38381.09 1760253.12 0.00 0.00 1.00 06:16:19 AM eth1 594858.00 1189686.00 38340.76 1758952.72 0.00 0.00 0.00 06:16:20 AM eth1 597017.00 1194019.00 38480.79 1765370.29 0.00 0.00 1.00 06:16:21 AM eth1 595450.00 1190936.00 38380.19 1760805.05 0.00 0.00 0.00 06:16:22 AM eth1 596385.00 1193096.00 38442.56 1763976.29 0.00 0.00 1.00 06:16:23 AM eth1 598155.00 1195978.00 38552.97 1768264.60 0.00 0.00 0.00 06:16:24 AM eth1 594405.00 1188643.00 38312.57 1757414.89 0.00 0.00 1.00 06:16:25 AM eth1 593366.00 1187154.00 38252.16 1755195.83 0.00 0.00 0.00 06:16:26 AM eth1 593188.00 1186118.00 38232.88 1753682.57 0.00 0.00 1.00 06:16:27 AM eth1 596301.00 1192241.00 38440.94 1762733.09 0.00 0.00 0.00 Average: eth1 595457.30 1190843.50 38381.69 1760664.84 0.00 0.00 0.50 lpaa23:~# ./tc -s -d qd sh dev eth1 | grep backlog backlog 7606336b 2513p requeues 167982 backlog 224072b 74p requeues 566 backlog 581376b 192p requeues 5598 backlog 181680b 60p requeues 1070 backlog 5305056b 1753p requeues 110166 // Here, this TX queue is attracting flows backlog 157456b 52p requeues 1758 backlog 672216b 222p requeues 3025 backlog 60560b 20p requeues 24541 backlog 448144b 148p requeues 21258 lpaa23:~# echo 1 >/proc/sys/net/ipv4/tcp_tsq_enable_tcp_wfree_ksoftirqd_detect Immediate jump to full bandwidth, and traffic is properly shard on all tx queues. lpaa23:~# sar -n DEV 1 10 | grep eth1 06:16:46 AM eth1 1397632.00 2795397.00 90081.87 4133031.26 0.00 0.00 1.00 06:16:47 AM eth1 1396874.00 2793614.00 90032.99 4130385.46 0.00 0.00 0.00 06:16:48 AM eth1 1395842.00 2791600.00 89966.46 4127409.67 0.00 0.00 1.00 06:16:49 AM eth1 1395528.00 2791017.00 89946.17 4126551.24 0.00 0.00 0.00 06:16:50 AM eth1 1397891.00 2795716.00 90098.74 4133497.39 0.00 0.00 1.00 06:16:51 AM eth1 1394951.00 2789984.00 89908.96 4125022.51 0.00 0.00 0.00 06:16:52 AM eth1 1394608.00 2789190.00 89886.90 4123851.36 0.00 0.00 1.00 06:16:53 AM eth1 1395314.00 2790653.00 89934.33 4125983.09 0.00 0.00 0.00 06:16:54 AM eth1 1396115.00 2792276.00 89984.25 4128411.21 0.00 0.00 1.00 06:16:55 AM eth1 1396829.00 2793523.00 90030.19 4130250.28 0.00 0.00 0.00 Average: eth1 1396158.40 2792297.00 89987.09 4128439.35 0.00 0.00 0.50 lpaa23:~# tc -s -d qd sh dev eth1 | grep backlog backlog 7900052b 2609p requeues 173287 backlog 878120b 290p requeues 589 backlog 1068884b 354p requeues 5621 backlog 996212b 329p requeues 1088 backlog 984100b 325p requeues 115316 backlog 956848b 316p requeues 1781 backlog 1080996b 357p requeues 3047 backlog 975016b 322p requeues 24571 backlog 990156b 327p requeues 21274 (All 8 TX queues get a fair share of the traffic) Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14Merge branches 'core', 'cxgb4', 'iser', 'mlx5' and 'ocrdma' into for-nextRoland Dreier20-953/+1366
2014-10-14Merge branch 'qlcnic'David S. Miller1-3/+3
Rajesh Borundia says: ==================== qlcnic: Bug fixes This series fixes following issues. * We were programming maximum number of arguments supported by adapter instead of required in a command. * Destroy tx command requires three arguments instead of two. Please apply these patches to net. ==================== Signed-off-by: David S. Miller <[email protected]>
2014-10-14qlcnic: Fix number of arguments in destroy tx context commandRajesh Borundia1-1/+1
o Number of arguments taken by destroy tx command is three instead of two. Signed-off-by: Rajesh Borundia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14qlcnic: Fix programming number of arguments in a command.Rajesh Borundia1-2/+2
o Initially we were programming maximum number of arguments. Instead we should program number of arguments required in a command. o Maximum number of arguments for 82xx adapter is four. Fix it for GET_ESWITCH_STATS command. Signed-off-by: Rajesh Borundia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14genl_magic: Resolve logical-op warningsMark Rustad1-2/+2
Resolve "logical 'and' applied to non-boolean constant" warnings" that appear in W=2 builds by adding !! to a bit test. Signed-off-by: Mark Rustad <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14net: Trap attempts to call sock_kfree_s() with a NULL pointer.David S. Miller1-0/+2
Unlike normal kfree() it is never right to call sock_kfree_s() with a NULL pointer, because sock_kfree_s() also has the side effect of discharging the memory from the sockets quota. Signed-off-by: David S. Miller <[email protected]>
2014-10-14rds: avoid calling sock_kfree_s() on allocation failureCong Wang1-3/+4
It is okay to free a NULL pointer but not okay to mischarge the socket optmem accounting. Compile test only. Reported-by: [email protected] Cc: Chien Yen <[email protected]> Cc: Stephen Hemminger <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14cxgb4: Fix FW flash logic using ethtoolHariprasad Shenai3-6/+16
Use t4_fw_upgrade instead of t4_load_fw to write firmware into FLASH, since t4_load_fw doesn't co-ordinate with the firmware and the adapter can get hosed enough to require a power cycle of the system. Based on original work by Casey Leedom <[email protected]> Signed-off-by: Hariprasad Shenai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14perf symbols: Make sym->end be the first address after the symbol rangeArnaldo Carvalho de Melo3-8/+8
To follow vm_area_struct->vm_end convention. By adhering to the convention that ->end is the first address outside the symbol's range we can do things like: sym->end = start + len; len = sym->end - sym->start; This is also now the convention used for struct map->end, fixing some off-by-one bugs. Cc: Adrian Hunter <[email protected]> Cc: Chuck Ebbert <[email protected]> Cc: David Ahern <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf symbols: Fix map->end fixupArnaldo Carvalho de Melo1-1/+1
When synthesizing maps from files that have incomplete symbol information, like kallsyms, we need to fixup the end of maps by seting its end from the ->start of the next map, fix it to set prev_map->end to curr_map->start, since ->end is the first byte outside prev_map address range. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf tools: Fixup off-by-one comparision in maps__findNamhyung Kim1-1/+1
map->end is the first addr _outside_ the a map, following the convention of vm_area_struct->vm_end. Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Stephane Eranian <[email protected]> Cc: David Ahern <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf tools: fix off-by-one error in mapsStephane Eranian1-3/+3
This patch fixes off-by-one errors in the management of maps. A map is defined by start address and length as implemented by map__new(): map__init(map, type, start, start + len, pgoff, dso); map->start = addr; map->end = end; Consequently, the actual address range is [start; end[ map->end is the first byte outside the range. This patch fixes two bugs where upper bound checking was off-by-one. In V2, we fix map_groups__fixup_overlappings() some more where map->start was off-by-one as reported by Jiri. Signed-off-by: Stephane Eranian <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: David Ahern <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/20141006083532.GA4850@quad Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf machine: Add missing dsos->root rbtree root initializationArnaldo Carvalho de Melo1-2/+8
A segfault happens on 'perf test hists_link' because we end up using a struct machines on the stack, and then machines__init() was not initializing the newly introduced rb_root, just the existing list_head. When we introduced struct dsos, to group the two ways to store dsos, i.e. the linked list and the rbtree, we didn't turned the initialization done in: machines__init(machines->host) -> machine__init() -> INIT_LIST_HEAD into a dsos__init() to keep on initializing the list_head but _as well_ initializing the rb_root, oops. All worked because outside perf-test we probably zalloc the whole thing which ends up initializing it in to NULL. So the problem looks contained to 'perf test' that uses it on stack, etc. Reported-by: Jiri Olsa <[email protected]> Acked-by: Waiman Long <[email protected]>, Cc: Adrian Hunter <[email protected]>, Cc: Don Zickus <[email protected]> Cc: Douglas Hatch <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Scott J Norton <[email protected]> Cc: Waiman Long <[email protected]>, Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14Merge branch 'stmmac'David S. Miller3-211/+255
Giuseppe Cavallaro says: ==================== stmmac: review and fix the dwmac-sti glue-logic This patch is to review the whole glue logic adopted on STi SoCs that was bugged. In the old glue-logic there was a lot of confusion when setup the retiming especially for STiD127 where, for example, the bits 6 and 7 (in the GMAC control register) have a different meaning of what is used for STiH4xx SoCs. So we cannot adopt the same glue for all these SoCs. Moreover, GiGa on STiD127 didn't work and, for all the SoCs, the RGMII couldn't run when the speed was 10Mbps (because the clock was not properly managed). Note that the phy clock needs to be provided by the platform as well as documented in the related binding file (updated as consequence). The old code supported too many configurations never adopted and validated. This made the code very complex to maintain and debug in case of issues. The patch simplifies all the configurations as commented in the tables inside the file and obviously it has been tested on all the boards based on the SoCs mentioned. With this patch, the dwmac-sti is also ready to support new configurations that will be available on next SoC generations. ==================== Signed-off-by: David S. Miller <[email protected]>
2014-10-14stmmac: dwmac-sti: review the glue-logic for STi4xx and STiD127 SoCsGiuseppe CAVALLARO2-210/+253
This patch is to review the whole glue logic adopted on STi SoCs that was bugged. In the old glue-logic there was a lot of confusion when setup the retiming especially for STiD127 where, for example, the bits 6 and 7 (in the GMAC control register) have a different meaning of what is used for STiH4xx SoCs. So we cannot adopt the same glue for all these SoCs. Moreover, GiGa on STiD127 didn't work and, for all the SoCs, the RGMII couldn't run when the speed was 10Mbps (because the clock was not properly managed). Note that the phy clock needs to be provided by the platform as well as documented in the related binding file (updated as consequence). The old code supported too many configurations never adopted and validated. This made the code very complex to maintain and debug in case of issues. The patch simplifies all the configurations as commented in the tables inside the file and obviously it has been tested on all the boards based on the SoCs mentioned. With this patch, the dwmac-sti is also ready to support new configurations that will be available on next SoC generations. Signed-off-by: Giuseppe Cavallaro <[email protected]> Cc: Srinivas Kandagatla <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14stmmac: make the STi Layer compatible to STiH407Giuseppe CAVALLARO2-2/+3
This adds the missing compatibility to the STiH407 SoC. Signed-off-by: Giuseppe Cavallaro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14stmmac: platform: fix FIXED_PHY support.Giuseppe CAVALLARO1-5/+10
On several STi platforms: e.g. stihxxx-b2120 an Ethernet switch is embedded and connected to the stmmac via RGMII mode. So this is managed by using the FIXED_PHY. In that case, the support in the platform needs to be fixed to allow the stmmac to dialog with the switch via fixed-link by using phy_bus_name property. Signed-off-by: Giuseppe Cavallaro <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-10-14perf evsel: Make some exit routines staticArnaldo Carvalho de Melo2-6/+3
Since they are automatically called by other methods used by tools. Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf evsel: Add missing 'target' struct forward declarationArnaldo Carvalho de Melo1-0/+1
We use it in evsel.h but were getting it indirectly, fix it. Noticed while working on having evsel.h usable by rasd.c. Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf evlist: Default to syswide target when no thread/cpu maps setArnaldo Carvalho de Melo1-0/+40
If all a tool wants is to do system wide event monitoring, there is no more the need to setup thread_map and cpu_map objects, just call perf_evlist__open() and it will do create one fd per CPU monitoring all threads. Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf evlist: Check that there is a thread_map when preparing a workloadArnaldo Carvalho de Melo1-1/+7
The perf_evlist__prepare_workload expects a thread map to be in place so that it can store the pid of the workload being started, so check it and tell the developer about it instead of segfaulting. Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2014-10-14perf thread_map: Create dummy constructor out of open coded equivalentArnaldo Carvalho de Melo2-8/+14
Create a dummy thread_map, one that has just one entry and it is -1, meaning 'all threads', as this ends up going down to perf_event_open(). Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: David Ahern <[email protected]> Cc: Don Zickus <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jean Pihet <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>