aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-12-29net/mlx4_core: Use-after-free causes a resource leak in flow-steering detachJack Morgenstein1-2/+4
mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering rule (which results in freeing the rule structure), and then references a field in this struct (the qp number) when releasing the busy-status on the rule's qp. Since this memory was freed, it could reallocated and changed. Therefore, the qp number in the struct may be incorrect, so that we are releasing the incorrect qp. This leaves the rule's qp in the busy state (and could possibly release an incorrect qp as well). Fix this by saving the qp number in a local variable, for use after removing the steering rule. Fixes: 2c473ae7e582 ("net/mlx4_core: Disallow releasing VF QPs which have steering rules") Signed-off-by: Jack Morgenstein <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-29net: fix incorrect original ingress device index in PKTINFOWei Zhang1-1/+7
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO") Acked-by: David Ahern <[email protected]> Signed-off-by: Wei Zhang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-29rtnl: stats - add missing netlink message size checksMathias Krause1-0/+6
We miss to check if the netlink message is actually big enough to contain a struct if_stats_msg. Add a check to prevent userland from sending us short messages that would make us access memory beyond the end of the message. Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...") Signed-off-by: Mathias Krause <[email protected]> Cc: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-29mm: optimize PageWaiters bit use for unlock_page()Linus Torvalds3-6/+45
In commit 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: Nick Piggin <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Bob Peterson <[email protected]> Cc: Steven Whitehouse <[email protected]> Cc: Andrew Lutomirski <[email protected]> Cc: Andreas Gruenbacher <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-29net: stmmac: Fix error path after register_netdev moveFlorian Fainelli1-1/+8
Commit 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open") re-ordered how the MDIO bus registration and the network device are registered, but missed to unwind the MDIO bus registration in case we fail to register the network device. Fixes: 5701659004d6 ("net: stmmac: Fix race between stmmac_drv_probe and stmmac_open") Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Kweh, Hock Leong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-29ipv6: Should use consistent conditional judgement for ip6 fragment between ↵Zheng Li1-1/+1
__ip6_append_data and ip6_finish_output There is an inconsistent conditional judgement between __ip6_append_data and ip6_finish_output functions, the variable length in __ip6_append_data just include the length of application's payload and udp6 header, don't include the length of ipv6 header, but in ip6_finish_output use (skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the length of ipv6 header. That causes some particular application's udp6 payloads whose length are between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even though the rst->dev support UFO feature. Add the length of ipv6 header to length in __ip6_append_data to keep consistent conditional judgement as ip6_finish_output for ip6 fragment. Signed-off-by: Zheng Li <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net: wan: slic_ds26522: fix spelling mistake: "configurated" -> "configured"Colin Ian King1-1/+1
trivial fix to spelling mistake in pr_info message Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FSAugusto Mecking Caringi1-0/+2
This patch fixes the following warnings when CONFIG_PROC_FS is not set: linux/net/atm/lec.c: In function ‘lane_module_cleanup’: linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first use in this function) remove_proc_entry("lec", atm_proc_root); ^ linux/net/atm/lec.c:1062:27: note: each undeclared identifier is reported only once for each function it appears in Signed-off-by: Augusto Mecking Caringi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28Merge branch 'mlx5-fixes'David S. Miller12-187/+45
Saeed Mahameed says: ==================== Mellanox 100G mlx5 fixes 28-12-2016 Some fixes for mlx5 core and ethernet driver. for -stable: net/mlx5: Check FW limitations on log_max_qp before setting it net/mlx5: Cancel recovery work in remove flow net/mlx5: Avoid shadowing numa_node net/mlx5: Mask destination mac value in ethtool steering rules net/mlx5: Prevent setting multicast macs for VFs net/mlx5e: Don't sync netdev state when not registered net/mlx5e: Disable netdev after close ==================== Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5e: Disable netdev after closeSaeed Mahameed1-4/+4
Disable netdev should come after it was closed, although no harm of doing it before -hence the MLX5E_STATE_DESTROYING bit- but it is more natural this way. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Mohamad Haj Yahia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5e: Don't sync netdev state when not registeredSaeed Mahameed1-7/+12
Skip setting netdev vxlan ports and netdev rx_mode on driver load when netdev is not yet registered. Synchronizing with netdev state is needed only on reset flow where the netdev remains registered for the whole reset period. This also fixes an access before initialization of net_device.addr_list_lock - which for some reason initialized on register_netdev - where we queued set_rx_mode work on driver load before netdev registration. Fixes: 26e59d8077a3 ("net/mlx5e: Implement mlx5e interface attach/detach callbacks") Signed-off-by: Saeed Mahameed <[email protected]> Reported-by: Sebastian Ott <[email protected]> Reviewed-by: Mohamad Haj Yahia <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5e: Check ets capability before initializing ets settingsHuy Nguyen1-0/+3
During the initial setup, the ets command is sent to firmware without checking if the HCA supports ets. This causes the invalid command error. Add the ets capiblity check before sending firmware command to initialize ets settings. Fixes: e207b7e99176 ("net/mlx5e: ConnectX-4 firmware support for DCBX") Signed-off-by: Huy Nguyen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28Revert "net/mlx5: Add MPCNT register infrastructure"Gal Pressman3-99/+0
This reverts commit 7f503169cabd70c1f13b9279c50eca7dfb9a7d51. Fixes: 7f503169cabd ("net/mlx5: Add MPCNT register infrastructure") Signed-off-by: Gal Pressman <[email protected]> Reported-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28Revert "net/mlx5e: Expose PCIe statistics to ethtool"Gal Pressman3-72/+1
This reverts commit 9c7262399ba12825f3ca4b00a76d8d5e77c720f5. PCIe counters were introduced in a new firmware version, as a result users with old firmware encountered a syndrome every 200ms due to update stats work. This feature will be re-introduced later with appropriate capabilities infrastructure. Fixes: 9c7262399ba1 ("net/mlx5e: Expose PCIe statistics to ethtool") Signed-off-by: Gal Pressman <[email protected]> Reported-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Prevent setting multicast macs for VFsMohamad Haj Yahia1-1/+1
Need to check that VF mac address entered by the admin user is either zero or unicast mac. Multicast mac addresses are prohibited. Fixes: 77256579c6b4 ('net/mlx5: E-Switch, Introduce Vport administration functions') Signed-off-by: Mohamad Haj Yahia <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Release FTE lock in error flowMaor Gottlieb1-0/+1
Release the FTE lock when adding rule to the FTE has failed. Fixes: 0fd758d6112f ('net/mlx5: Don't unlock fte while still using it') Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Mask destination mac value in ethtool steering rulesMaor Gottlieb1-0/+1
We need to mask the destination mac value with the destination mac mask when adding steering rule via ethtool. Fixes: 1174fce8d1410 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering') Signed-off-by: Maor Gottlieb <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Avoid shadowing numa_nodeEli Cohen1-2/+1
Avoid using a local variable named numa_node to avoid shadowing a public one. Fixes: db058a186f98 ('net/mlx5_core: Set irq affinity hints') Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Cancel recovery work in remove flowDaniel Jurgens1-2/+3
If there is pending delayed work for health recovery it must be canceled if the device is being unloaded. Fixes: 05ac2c0b7438 ("net/mlx5: Fix race between PCI error handlers and health work") Signed-off-by: Daniel Jurgens <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Check FW limitations on log_max_qp before setting itNoa Osherovich1-0/+7
When setting HCA capabilities, set log_max_qp to be the minimum between the selected profile's value and the HCA limitation. Fixes: 938fe83c8dcb ('net/mlx5_core: New device capabilities...') Signed-off-by: Noa Osherovich <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/mlx5: Disable RoCE on the e-switch management port under switchdev modeOr Gerlitz1-0/+11
Under the switchdev/offloads mode, packets that don't match any e-switch steering rule are sent towards the e-switch management port. We use a NIC HW steering rule set per vport (uplink and VFs) to make them be received into the host OS through the respective vport representor netdevice. Currnetly such missed RoCE packets will not get to this NIC steering rule, and hence VF RoCE will not work over the slow path of the offloads mode. This is b/c these packets will be matched by a steering rule added by the firmware that serves RoCE traffic set on the PF NIC vport which is also the e-switch management port under SRIOV. Disabling RoCE on the e-switch management vport when we are in the offloads mode, will signal to the firmware to remove their RoCE rule, and then the missed RoCE packets will be matched by the representor NIC steering rule as any other missed packets. To achieve that, we disable RoCE on the PF vport. We do that by removing (hot-unplugging) the IB device instance associated with the PF. This is also required by our current model where the PF serves as the uplink representor and hence only SW switching (TC, bridge, OVS) applications and slow path vport mlx5e net-device should be running over that vport. Fixes: c930a3ad7453 ('net/mlx5e: Add devlink based SRIOV mode changes') Signed-off-by: Or Gerlitz <[email protected]> Reviewed-by: Hadar Hen Zion <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-28net/sched: cls_flower: Fix missing addr_type in classifyPaul Blakey1-0/+4
Since we now use a non zero mask on addr_type, we are matching on its value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip failed in SW/classify path since its value was zero. This patch sets the proper value of addr_type for encapsulated packets. Fixes: 970bfcd09791 ('net/sched: cls_flower: Use mask for addr_type') Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Hadar Hen Zion <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27net: stmmac: Fix race between stmmac_drv_probe and stmmac_openFlorian Fainelli1-10/+6
There is currently a small window during which the network device registered by stmmac can be made visible, yet all resources, including and clock and MDIO bus have not had a chance to be set up, this can lead to the following error to occur: [ 473.919358] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): stmmac_dvr_probe: warning: cannot get CSR clock [ 473.919382] stmmaceth 0000:01:00.0: no reset control found [ 473.919412] stmmac - user ID: 0x10, Synopsys ID: 0x42 [ 473.919429] stmmaceth 0000:01:00.0: DMA HW capability register supported [ 473.919436] stmmaceth 0000:01:00.0: RX Checksum Offload Engine supported [ 473.919443] stmmaceth 0000:01:00.0: TX Checksum insertion supported [ 473.919451] stmmaceth 0000:01:00.0 (unnamed net_device) (uninitialized): Enable RX Mitigation via HW Watchdog Timer [ 473.921395] libphy: PHY stmmac-1:00 not found [ 473.921417] stmmaceth 0000:01:00.0 eth0: Could not attach to PHY [ 473.921427] stmmaceth 0000:01:00.0 eth0: stmmac_open: Cannot attach to PHY (error: -19) [ 473.959710] libphy: stmmac: probed [ 473.959724] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 0 IRQ POLL (stmmac-1:00) active [ 473.959728] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 1 IRQ POLL (stmmac-1:01) [ 473.959731] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 2 IRQ POLL (stmmac-1:02) [ 473.959734] stmmaceth 0000:01:00.0 eth0: PHY ID 01410cc2 at 3 IRQ POLL (stmmac-1:03) Fix this by making sure that register_netdev() is the last thing being done, which guarantees that the clock and the MDIO bus are available. Fixes: 4bfcbd7abce2 ("stmmac: Move the mdio_register/_unregister in probe/remove") Reported-by: Kweh, Hock Leong <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27Merge branch 'linus' of ↵Linus Torvalds3-3/+43
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a hash corruption bug in the marvell driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: marvell - Copy IVDIG before launching partial DMA ahash requests
2016-12-27fscrypt: fix the test_dummy_encryption mount optionTheodore Ts'o1-1/+2
Commit f1c131b45410a: "crypto: xts - Convert to skcipher" now fails the setkey operation if the AES key is the same as the tweak key. Previously this check was only done if FIPS mode is enabled. Now this check is also done if weak key checking was requested. This is reasonable, but since we were using the dummy key which was a constant series of 0x42 bytes, it now caused dummy encrpyption test mode to fail. Fix this by using 0x42... and 0x24... for the two keys, so they are different. Fixes: f1c131b45410a202eb45cc55980a7a9e4e4b4f40 Cc: [email protected] Signed-off-by: Theodore Ts'o <[email protected]>
2016-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds17-86/+112
Pull networking fixes from David Miller: 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar. The most important is to not assume the packet is RX just because the destination address matches that of the device. Such an assumption causes problems when an interface is put into loopback mode. 2) If we retry when creating a new tc entry (because we dropped the RTNL mutex in order to load a module, for example) we end up with -EAGAIN and then loop trying to replay the request. But we didn't reset some state when looping back to the top like this, and if another thread meanwhile inserted the same tc entry we were trying to, we re-link it creating an enless loop in the tc chain. Fix from Daniel Borkmann. 3) There are two different WRITE bits in the MDIO address register for the stmmac chip, depending upon the chip variant. Due to a bug we could set them both, fix from Hock Leong Kweh. 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: stmmac: fix incorrect bit set in gmac4 mdio addr register r8169: add support for RTL8168 series add-on card. net: xdp: remove unused bfp_warn_invalid_xdp_buffer() openvswitch: upcall: Fix vlan handling. ipv4: Namespaceify tcp_tw_reuse knob net: korina: Fix NAPI versus resources freeing net, sched: fix soft lockup in tc_classify net/mlx4_en: Fix user prio field in XDP forward tipc: don't send FIN message from connectionless socket ipvlan: fix multicast processing ipvlan: fix various issues in ipvlan_process_multicast()
2016-12-27Documentation/unaligned-memory-access.txt: fix incorrect comparison operatorCihangir Akturk1-1/+1
In the actual implementation ether_addr_equal function tests for equality to 0 when returning. It seems in commit 0d74c4 it is somehow overlooked to change this operator to reflect the actual function. Signed-off-by: Cihangir Akturk <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2016-12-27docs: Fix build failureJohn Brooks1-1/+1
The 80211.tmpl DocBook file was removed in commit 819bf593767c ("docs-rst: sphinxify 802.11 documentation"), but the 80211.xml target was re-added to the Makefile by commit 7ddedebb03b7 ("ALSA: doc: ReSTize writing-an-alsa-driver document"), leading to a failure when building the documentation: *** No rule to make target 'Documentation/DocBook/80211.xml', needed by 'Documentation/DocBook/80211.aux.xml'. cc: [email protected] Signed-off-by: John Brooks <[email protected]> Mea-culpa-by: Jonathan Corbet <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2016-12-27Merge tag 'v4.10-rc1' into docs-nextJonathan Corbet11424-224736/+700098
Linux 4.10-rc1
2016-12-27net: stmmac: fix incorrect bit set in gmac4 mdio addr registerKweh, Hock Leong1-1/+3
Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of OR together with MII_WRITE. Signed-off-by: Kweh, Hock Leong <[email protected]> Acked-By: Joao Pinto <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27r8169: add support for RTL8168 series add-on card.Chun-Hao Lin1-0/+1
This chip is the same as RTL8168, but its device id is 0x8161. Signed-off-by: Chun-Hao Lin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27net: xdp: remove unused bfp_warn_invalid_xdp_buffer()Jason Wang2-7/+0
After commit 73b62bd085f4737679ea9afc7867fa5f99ba7d1b ("virtio-net: remove the warning before XDP linearizing"), there's no users for bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for commit f23bc46c30ca5ef58b8549434899fcbac41b2cfc. Cc: Daniel Borkmann <[email protected]> Cc: John Fastabend <[email protected]> Signed-off-by: Jason Wang <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27openvswitch: upcall: Fix vlan handling.pravin shelar2-28/+27
Networking stack accelerate vlan tag handling by keeping topmost vlan header in skb. This works as long as packet remains in OVS datapath. But during OVS upcall vlan header is pushed on to the packet. When such packet is sent back to OVS datapath, core networking stack might not handle it correctly. Following patch avoids this issue by accelerating the vlan tag during flow key extract. This simplifies datapath by bringing uniform packet processing for packets from all code paths. Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <[email protected]> CC: Jiri Benc <[email protected]> Signed-off-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27ipv4: Namespaceify tcp_tw_reuse knobHaishuang Yan4-10/+10
Different namespaces might have different requirements to reuse TIME-WAIT sockets for new connections. This might be required in cases where different namespace applications are in place which require TIME_WAIT socket connections to be reduced independently of the host. Signed-off-by: Haishuang Yan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-27crypto: testmgr - Use heap buffer for acomp test inputLaura Abbott1-2/+28
Christopher Covington reported a crash on aarch64 on recent Fedora kernels: kernel BUG at ./include/linux/scatterlist.h:140! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP Modules linked in: CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162 Hardware name: linux,dummy-virt (DT) task: ffff80007c650080 task.stack: ffff800008910000 PC is at sg_init_one+0xa0/0xb8 LR is at sg_init_one+0x24/0xb8 ... [<ffff000008398db8>] sg_init_one+0xa0/0xb8 [<ffff000008350a44>] test_acomp+0x10c/0x438 [<ffff000008350e20>] alg_test_comp+0xb0/0x118 [<ffff00000834f28c>] alg_test+0x17c/0x2f0 [<ffff00000834c6a4>] cryptomgr_test+0x44/0x50 [<ffff0000080dac70>] kthread+0xf8/0x128 [<ffff000008082ec0>] ret_from_fork+0x10/0x50 The test vectors used for input are part of the kernel image. These inputs are passed as a buffer to sg_init_one which eventually blows up with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns false for the kernel image since virt_to_page will not return the correct page. Fix this by copying the input vectors to heap buffer before setting up the scatterlist. Reported-by: Christopher Covington <[email protected]> Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp") Signed-off-by: Laura Abbott <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2016-12-26ext4: Simplify DAX fault pathJan Kara1-38/+10
Now that dax_iomap_fault() calls ->iomap_begin() without entry lock, we can use transaction starting in ext4_iomap_begin() and thus simplify ext4_dax_fault(). It also provides us proper retries in case of ENOSPC. Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26dax: Call ->iomap_begin without entry lock during dax faultJan Kara1-55/+66
Currently ->iomap_begin() handler is called with entry lock held. If the filesystem held any locks between ->iomap_begin() and ->iomap_end() (such as ext4 which will want to hold transaction open), this would cause lock inversion with the iomap_apply() from standard IO path which first calls ->iomap_begin() and only then calls ->actor() callback which grabs entry locks for DAX (if it faults when copying from/to user provided buffers). Fix the problem by nesting grabbing of entry lock inside ->iomap_begin() - ->iomap_end() pair. Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26dax: Finish fault completely when loading holesJan Kara1-9/+18
The only case when we do not finish the page fault completely is when we are loading hole pages into a radix tree. Avoid this special case and finish the fault in that case as well inside the DAX fault handler. It will allow us for easier iomap handling. Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26dax: Avoid page invalidation races and unnecessary radix tree traversalsJan Kara1-17/+11
Currently dax_iomap_rw() takes care of invalidating page tables and evicting hole pages from the radix tree when write(2) to the file happens. This invalidation is only necessary when there is some block allocation resulting from write(2). Furthermore in current place the invalidation is racy wrt page fault instantiating a hole page just after we have invalidated it. So perform the page invalidation inside dax_iomap_actor() where we can do it only when really necessary and after blocks have been allocated so nobody will be instantiating new hole pages anymore. Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26mm: Invalidate DAX radix tree entries only if appropriateJan Kara3-24/+125
Currently invalidate_inode_pages2_range() and invalidate_mapping_pages() just delete all exceptional radix tree entries they find. For DAX this is not desirable as we track cache dirtiness in these entries and when they are evicted, we may not flush caches although it is necessary. This can for example manifest when we write to the same block both via mmap and via write(2) (to different offsets) and fsync(2) then does not properly flush CPU caches when modification via write(2) was the last one. Create appropriate DAX functions to handle invalidation of DAX entries for invalidate_inode_pages2_range() and invalidate_mapping_pages() and wire them up into the corresponding mm functions. Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26ext2: Return BH_New buffers for zeroed blocksJan Kara1-2/+1
So far we did not return BH_New buffers from ext2_get_blocks() when we allocated and zeroed-out a block for DAX inode to avoid racy zeroing in DAX code. This zeroing is gone these days so we can remove the workaround. Reviewed-by: Ross Zwisler <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2016-12-26x86/mce/AMD: Make the init code more robustThomas Gleixner1-0/+3
If mce_device_init() fails then the mce device pointer is NULL and the AMD mce code happily dereferences it. Add a sanity check. Reported-by: Markus Trippelsdorf <[email protected]> Reported-by: Boris Ostrovsky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-26smp/hotplug: Undo tglxs brainfartThomas Gleixner1-1/+8
The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: dc280d936239 ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: Markus Trippelsdorf <[email protected]> Reported-by: Boris Ostrovsky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-12-26arm64: don't pull uaccess.h into *.SAl Viro9-71/+72
Split asm-only parts of arm64 uaccess.h into a new header and use that from *.S. Signed-off-by: Al Viro <[email protected]>
2016-12-26net: korina: Fix NAPI versus resources freeingFlorian Fainelli1-4/+4
Commit beb0babfb77e ("korina: disable napi on close and restart") introduced calls to napi_disable() that were missing before, unfortunately this leaves a small window during which NAPI has a chance to run, yet we just freed resources since korina_free_ring() has been called: Fix this by disabling NAPI first then freeing resource, and make sure that we also cancel the restart task before doing the resource freeing. Fixes: beb0babfb77e ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-26net, sched: fix soft lockup in tc_classifyDaniel Borkmann1-1/+3
Shahar reported a soft lockup in tc_classify(), where we run into an endless loop when walking the classifier chain due to tp->next == tp which is a state we should never run into. The issue only seems to trigger under load in the tc control path. What happens is that in tc_ctl_tfilter(), thread A allocates a new tp, initializes it, sets tp_created to 1, and calls into tp->ops->change() with it. In that classifier callback we had to unlock/lock the rtnl mutex and returned with -EAGAIN. One reason why we need to drop there is, for example, that we need to request an action module to be loaded. This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning after we loaded and found the requested action, we need to redo the whole request so we don't race against others. While we had to unlock rtnl in that time, thread B's request was processed next on that CPU. Thread B added a new tp instance successfully to the classifier chain. When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN and destroying its tp instance which never got linked, we goto replay and redo A's request. This time when walking the classifier chain in tc_ctl_tfilter() for checking for existing tp instances we had a priority match and found the tp instance that was created and linked by thread B. Now calling again into tp->ops->change() with that tp was successful and returned without error. tp_created was never cleared in the second round, thus kernel thinks that we need to link it into the classifier chain (once again). tp and *back point to the same object due to the match we had earlier on. Thus for thread B's already public tp, we reset tp->next to tp itself and link it into the chain, which eventually causes the mentioned endless loop in tc_classify() once a packet hits the data path. Fix is to clear tp_created at the beginning of each request, also when we replay it. On the paths that can cause -EAGAIN we already destroy the original tp instance we had and on replay we really need to start from scratch. It seems that this issue was first introduced in commit 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup"). Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Cong Wang <[email protected]> Acked-by: Eric Dumazet <[email protected]> Tested-by: Shahar Klein <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-12-25Linux 4.10-rc1Linus Torvalds1-2/+2
2016-12-25powerpc: Fix build warning on 32-bit PPCLarry Finger1-1/+1
I am getting the following warning when I build kernel 4.9-git on my PowerBook G4 with a 32-bit PPC processor: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef] This problem is evident after commit 989cea5c14be ("kbuild: prevent lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an error that has been in the code since 2005 when this source file was created. That was with commit 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S"). The offending line does not make a lot of sense. This error does not seem to cause any errors in the executable, thus I am not recommending that it be applied to any stable versions. Thanks to Nicholas Piggin for suggesting this solution. Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2016-12-25avoid spurious "may be used uninitialized" warningLinus Torvalds1-0/+1
The timer type simplifications caused a new gcc warning: drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’: drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized] elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start)); despite the actual use of "time_start" not having changed in any way. It appears that simply changing the type of ktime_t from a union to a plain scalar type made gcc check the use. The variable wasn't actually used uninitialized, but gcc apparently failed to notice that the conditional around the use was exactly the same as the conditional around the initialization of that variable. Add an unnecessary initialization just to shut up the compiler. Signed-off-by: Linus Torvalds <[email protected]>
2016-12-25Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds227-665/+604
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer type cleanups from Thomas Gleixner: "This series does a tree wide cleanup of types related to timers/timekeeping. - Get rid of cycles_t and use a plain u64. The type is not really helpful and caused more confusion than clarity - Get rid of the ktime union. The union has become useless as we use the scalar nanoseconds storage unconditionally now. The 32bit timespec alike storage got removed due to the Y2038 limitations some time ago. That leaves the odd union access around for no reason. Clean it up. Both changes have been done with coccinelle and a small amount of manual mopping up" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Get rid of ktime_equal() ktime: Cleanup ktime_set() usage ktime: Get rid of the union clocksource: Use a plain u64 instead of cycle_t