aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2015-12-15net: diag: Support SOCK_DESTROY for inet sockets.Lorenzo Colitti1-8/+15
This passes the SOCK_DESTROY operation to the underlying protocol diag handler, or returns -EOPNOTSUPP if that handler does not define a destroy operation. Most of this patch is just renaming functions. This is not strictly necessary, but it would be fairly counterintuitive to have the code to destroy inet sockets be in a function whose name starts with inet_diag_get. Signed-off-by: Lorenzo Colitti <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: diag: Add the ability to destroy a socket.Lorenzo Colitti1-3/+20
This patch adds a SOCK_DESTROY operation, a destroy function pointer to sock_diag_handler, and a diag_destroy function pointer. It does not include any implementation code. Signed-off-by: Lorenzo Colitti <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: diag: split inet_diag_dump_one_icsk into twoLorenzo Colitti1-15/+27
Currently, inet_diag_dump_one_icsk finds a socket and then dumps its information to userspace. Split it into a part that finds the socket and a part that dumps the information. Signed-off-by: Lorenzo Colitti <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15ila: Add generic ILA translation facilityTom Herbert4-1/+691
This patch implements an ILA tanslation table. This table can be configured with identifier to locator mappings, and can be be queried to resolve a mapping. Queries can be parameterized based on interface, direction (incoming or outoing), and matching locator. The table is implemented using rhashtable and is configured via netlink (through "ip ila .." in iproute). The table may be used as alternative means to do do ILA tanslations other than the lw tunnels Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15netlink: add a start callback for starting a netlink dumpTom Herbert2-0/+20
The start callback allows the caller to set up a context for the dump callbacks. Presumably, the context can then be destroyed in the done callback. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15ila: Create net/ipv6/ila directoryTom Herbert5-81/+152
Create ila directory in preparation for supporting other hooks in the kernel than LWT for doing ILA. This includes: - Moving ila.c to ila/ila_lwt.c - Splitting out some common functions into ila_common.c Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: Add driver helper functions to determine checksum offloadabilityTom Herbert1-0/+136
Add skb_csum_offload_chk driver helper function to determine if a device with limited checksum offload capabilities is able to offload the checksum for a given packet. This patch includes: - The skb_csum_offload_chk function. Returns true if checksum is offloadable, else false. Optionally, in the case that the checksum is not offloable, the function can call skb_checksum_help to resolve the checksum. skb_csum_offload_chk also returns whether the checksum refers to an encapsulated checksum. - Definition of skb_csum_offl_spec structure that caller uses to indicate rules about what it can offload (e.g. IPv4/v6, TCP/UDP only, whether encapsulated checksums can be offloaded, whether checksum with IPv6 extension headers can be offloaded). - Ancilary functions called skb_csum_offload_chk_help, skb_csum_off_chk_help_cmn, skb_csum_off_chk_help_cmn_v4_only. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15tcp: Fix conditions to determine checksum offloadTom Herbert1-2/+2
In tcp_send_sendpage and tcp_sendmsg we check the route capabilities to determine if checksum offload can be performed. This check currently does not take the IP protocol into account for devices that advertise only one of NETIF_F_IPV6_CSUM or NETIF_F_IP_CSUM. This patch adds a function to check capabilities for checksum offload with a socket called sk_check_csum_caps. This function checks for specific IPv4 or IPv6 offload support based on the family of the socket. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: Eliminate NETIF_F_GEN_CSUM and NETIF_F_V[46]_CSUMTom Herbert8-15/+19
These netif flags are unnecessary convolutions. It is more straightforward to just use NETIF_F_HW_CSUM, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM directly. This patch also: - Cleans up can_checksum_protocol - Simplifies netdev_intersect_features Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASKTom Herbert4-9/+9
The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the set of features for offloading all checksums. This is a mask of the checksum offload related features bits. It is incorrect to set both NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6 at the same time for features of a device. This patch: - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where NETIF_F_ALL_CSUM is being used as a mask). - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to use NEITF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRCTom Herbert4-5/+5
The SCTP checksum is really a CRC and is very different from the standards 1's complement checksum that serves as the checksum for IP protocols. This offload interface is also very different. Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC to highlight these differences. The term CSUM should be reserved in the stack to refer to the standard 1's complement IP checksum. Signed-off-by: Tom Herbert <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15net: fix uninitialized variable issue[email protected]1-0/+1
msg_iocb needs to be initialized on the recv/recvfrom path. Otherwise afalg will wrongly interpret it as an async call. Cc: [email protected] Reported-by: Harald Freudenberger <[email protected]> Signed-off-by: Tadeusz Struk <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15bluetooth: Validate socket address length in sco_sock_bind().David S. Miller1-0/+3
Signed-off-by: David S. Miller <[email protected]>
2015-12-15net_sched: make qdisc_tree_decrease_qlen() work for non mqEric Dumazet1-1/+1
Stas Nichiporovich reported a regression in his HFSC qdisc setup on a non multi queue device. It turns out I mistakenly added a TCQ_F_NOPARENT flag on all qdisc allocated in qdisc_create() for non multi queue devices, which was rather buggy. I was clearly mislead by the TCQ_F_ONETXQUEUE that is also set here for no good reason, since it only matters for the root qdisc. Fixes: 4eaf3b84f288 ("net_sched: fix qdisc_tree_decrease_qlen() races") Reported-by: Stas Nichiporovich <[email protected]> Tested-by: Stas Nichiporovich <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15switchdev: Pass original device to port netdev driverIdo Schimmel7-0/+20
switchdev drivers need to know the netdev on which the switchdev op was invoked. For example, the STP state of a VLAN interface configured on top of a port can change while being member in a bridge. In this case, the underlying driver should only change the STP state of that particular VLAN and not of all the VLANs configured on the port. However, current switchdev infrastructure only passes the port netdev down to the driver. Solve that by passing the original device down to the driver as part of the required switchdev object / attribute. This doesn't entail any change in current switchdev drivers. It simply enables those supporting stacked devices to know the originating device and act accordingly. Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-15switchdev: vlan: Use switchdev_port* in vlan_netdev_opsIdo Schimmel1-0/+7
We need to be able to propagate static FDB entries and certain bridge port attributes (e.g. learning, flooding) down to the port netdev driver when bridge port is a VLAN interface. Achieve that by setting ndo_bridge* and ndo_fdb* in vlan_netdev_ops to the corresponding switchdev_port* functions. This is consistent with team and bond devices. Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-16batman-adv: rename equiv/equal or better to similar or betterSimon Wunderlich4-11/+11
Since the function applies a threshold and also slightly worse values are accepted, ''equal or better'' does not represent the intention of the function. ''Similar or better'' represents that better. Signed-off-by: Simon Wunderlich <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-12-16batman-adv: update last seen field of single hop originatorsMarek Lindner1-0/+10
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-12-16batman-adv: export single hop neighbor list via debugfsMarek Lindner5-0/+100
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-12-16batman-adv: add bat_hardif_neigh_init algo ops callMarek Lindner2-0/+6
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-12-16batman-adv: add list of unique single hop neighbors per hard-interfaceMarek Lindner4-0/+188
Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
2015-12-15nfnetlink: add nfnl_dereference_protected helperFlorian Westphal1-6/+7
to avoid overly long line in followup patch. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-15mac80211: handle width changes from opmode notification IE in beaconEyal Shapira5-14/+8
An AP can send an operating channel width change in a beacon opmode notification IE as long as there's a change in the nss as well (See 802.11ac-2013 section 10.41). So don't limit updating to nss only from an opmode notification IE. Signed-off-by: Eyal Shapira <[email protected]> Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-12-15mac80211: suppress unchanged "limiting TX power" messagesJohannes Berg1-5/+10
When the AP is advertising limited TX power, the message can be printed over and over again. Suppress it when the power level isn't changing. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=106011 Signed-off-by: Johannes Berg <[email protected]>
2015-12-15mac80211: reprogram in interface orderJohannes Berg1-39/+37
During reprogramming, mac80211 currently first adds all the channel contexts, then binds them to the vifs and then goes to reconfigure all the interfaces. Drivers might, perhaps implicitly, rely on the operation order for certain things that typically happen within a single function elsewhere in mac80211. To avoid problems with that, reorder the code in mac80211's restart/reprogramming to work fully within the interface loop so that the order of operations is like in normal operation. For iwlwifi, this fixes a firmware crash when reprogramming with an AP/GO interface active. Reported-by: David Spinadel <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-12-15mac80211: run scan completed work on reconfig failureJohannes Berg1-11/+26
When reconfiguration during resume fails while a scan is pending for completion work, that work will never run, and the scan will be stuck forever. Factor out the code to recover this and call it also in ieee80211_handle_reconfig_failure(). Signed-off-by: Johannes Berg <[email protected]>
2015-12-15nl80211: Fix potential memory leak in nl80211_connectOla Olsson1-1/+3
Free cached keys if the last early return path is taken. Signed-off-by: Ola Olsson <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-12-15nl80211: Fix potential memory leak in nl80211_set_wowlanOla Olsson1-0/+1
Compared to cfg80211_rdev_free_wowlan in core.h, the error goto label lacks the freeing of nd_config. Fix that. Signed-off-by: Ola Olsson <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-12-15nl80211: fix a few memory leaks in reg.cOla Olsson1-1/+4
The first leak occurs when entering the default case in the switch for the initiator in set_regdom. The second leaks a platform_device struct if the platform registration in regulatory_init succeeds but the sub sequent regulatory hint fails due to no memory. Signed-off-by: Ola Olsson <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2015-12-15skbuff: Fix offset error in skb_reorder_vlan_headerVlad Yasevich1-1/+1
skb_reorder_vlan_header is called after the vlan header has been pulled. As a result the offset of the begining of the mac header has been incrased by 4 bytes (VLAN_HLEN). When moving the mac addresses, include this incrase in the offset calcualation so that the mac addresses are copied correctly. Fixes: a6e18ff1117 (vlan: Fix untag operations of stacked vlans with REORDER_HEADER off) CC: Nicolas Dichtel <[email protected]> CC: Patrick McHardy <[email protected]> Signed-off-by: Vladislav Yasevich <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14net: fix IP early demux racesEric Dumazet2-5/+3
David Wilder reported crashes caused by dst reuse. <quote David> I am seeing a crash on a distro V4.2.3 kernel caused by a double release of a dst_entry. In ipv4_dst_destroy() the call to list_empty() finds a poisoned next pointer, indicating the dst_entry has already been removed from the list and freed. The crash occurs 18 to 24 hours into a run of a network stress exerciser. </quote> Thanks to his detailed report and analysis, we were able to understand the core issue. IP early demux can associate a dst to skb, after a lookup in TCP/UDP sockets. When socket cache is not properly set, we want to store into sk->sk_dst_cache the dst for future IP early demux lookups, by acquiring a stable refcount on the dst. Problem is this acquisition is simply using an atomic_inc(), which works well, unless the dst was queued for destruction from dst_release() noticing dst refcount went to zero, if DST_NOCACHE was set on dst. We need to make sure current refcount is not zero before incrementing it, or risk double free as David reported. This patch, being a stable candidate, adds two new helpers, and use them only from IP early demux problematic paths. It might be possible to merge in net-next skb_dst_force() and skb_dst_force_safe(), but I prefer having the smallest patch for stable kernels : Maybe some skb_dst_force() callers do not expect skb->dst can suddenly be cleared. Can probably be backported back to linux-3.6 kernels Reported-by: David J. Wilder <[email protected]> Tested-by: David J. Wilder <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14Merge branch 'for-upstream' of ↵David S. Miller23-1305/+1580
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-12-11 Here's another set of Bluetooth & 802.15.4 patches for the 4.5 kernel: - 6LoWPAN debugfs support - New 802.15.4 driver for ADF7242 MAC IEEE802154 - Initial code for 6LoWPAN Generic Header Compression (GHC) support - Refactor Bluetooth LE scan & advertising behind dedicated workqueue - Cleanups to Bluetooth H:5 HCI driver - Support for Toshiba Broadcom based Bluetooth controllers - Use continuous scanning when establishing Bluetooth LE connections Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-12-14iucv: call skb_linearize() when neededEugene Crosser1-5/+15
When the linear buffer of the received sk_buff is shorter than the header, use skb_linearize(). sk_buffs with short linear buffer happen on the sending side under high traffic, and some kernel configurations, when allocated buffer starts just before page boundary, and IUCV transport has to send it as two separate QDIO buffer elements, with fist element shorter than the header. Signed-off-by: Eugene Crosser <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14iucv: prevent information leak in iucv_messageEugene Crosser1-1/+1
Initialize storage for the future IUCV header that will be included in the transmitted packet. Some of the header fields are unused with HiperSockets transport, and will contain data left from some other functions. Signed-off-by: Eugene Crosser <[email protected]> Signed-off-by: Ursula Braun <[email protected]> Reviewed-by: Thomas Richter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14ipv6: addrconf: drop ieee802154 specific thingsAlexander Aring1-5/+3
This patch removes ARPHRD_IEEE802154 from addrconf handling. In the earlier days of 802.15.4 6LoWPAN, the interface type was ARPHRD_IEEE802154 which introduced several issues, because 802.15.4 interfaces used the same type. Since commit 965e613d299c ("ieee802154: 6lowpan: fix ARPHRD to ARPHRD_6LOWPAN") we use ARPHRD_6LOWPAN for 6LoWPAN interfaces. This patch will remove ARPHRD_IEEE802154 which is currently deadcode, because ARPHRD_IEEE802154 doesn't reach the minimum 1280 MTU of IPv6. Also we use 6LoWPAN EUI64 specific defines instead using link-layer constanst from 802.15.4 link-layer header. Cc: David S. Miller <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: James Morris <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Alexander Aring <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14net: add validation for the socket syscall protocol argumentHannes Frederic Sowa5-0/+15
郭永刚 reported that one could simply crash the kernel as root by using a simple program: int socket_fd; struct sockaddr_in addr; addr.sin_port = 0; addr.sin_addr.s_addr = INADDR_ANY; addr.sin_family = 10; socket_fd = socket(10,3,0x40000000); connect(socket_fd , &addr,16); AF_INET, AF_INET6 sockets actually only support 8-bit protocol identifiers. inet_sock's skc_protocol field thus is sized accordingly, thus larger protocol identifiers simply cut off the higher bits and store a zero in the protocol fields. This could lead to e.g. NULL function pointer because as a result of the cut off inet_num is zero and we call down to inet_autobind, which is NULL for raw sockets. kernel: Call Trace: kernel: [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70 kernel: [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80 kernel: [<ffffffff81645069>] SYSC_connect+0xd9/0x110 kernel: [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80 kernel: [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200 kernel: [<ffffffff81645e0e>] SyS_connect+0xe/0x10 kernel: [<ffffffff81779515>] tracesys_phase2+0x84/0x89 I found no particular commit which introduced this problem. CVE: CVE-2015-8543 Cc: Cong Wang <[email protected]> Reported-by: 郭永刚 <[email protected]> Signed-off-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-14netfilter: implement xt_cgroup cgroup2 path matchTejun Heo1-0/+69
This patch implements xt_cgroup path match which matches cgroup2 membership of the associated socket. The match is recursive and invertible. For rationales on introducing another cgroup based match, please refer to a preceding commit "sock, cgroup: add sock->sk_cgroup". v3: Folded into xt_cgroup as a new revision interface as suggested by Pablo. v2: Included linux/limits.h from xt_cgroup2.h for PATH_MAX. Added explicit alignment to the priv field. Both suggested by Jan. Signed-off-by: Tejun Heo <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Daniel Wagner <[email protected]> CC: Neil Horman <[email protected]> Cc: Jan Engelhardt <[email protected]> Cc: Pablo Neira Ayuso <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-14netfilter: prepare xt_cgroup for multi revisionsTejun Heo1-17/+19
xt_cgroup will grow cgroup2 path based match. Postfix existing symbols with _v0 and prepare for multi revision registration. Signed-off-by: Tejun Heo <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: Daniel Wagner <[email protected]> CC: Neil Horman <[email protected]> Cc: Jan Engelhardt <[email protected]> Cc: Pablo Neira Ayuso <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-14Merge branch 'master' of ↵Pablo Neira Ayuso135-2834/+3803
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflict between commit 264640fc2c5f4f ("ipv6: distinguish frag queues by device for multicast and link-local packets") from the net tree and commit 029f7f3b8701c ("netfilter: ipv6: nf_defrag: avoid/free clone operations") from the nf-next tree. Signed-off-by: Pablo Neira Ayuso <[email protected]> Conflicts: net/ipv6/netfilter/nf_conntrack_reasm.c
2015-12-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-58/+56
Pablo Neira Ayuso says: ==================== netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, specifically for nf_tables and nfnetlink_queue, they are: 1) Avoid a compilation warning in nfnetlink_queue that was introduced in the previous merge window with the simplification of the conntrack integration, from Arnd Bergmann. 2) nfnetlink_queue is leaking the pernet subsystem registration from a failure path, patch from Nikolay Borisov. 3) Pass down netns pointer to batch callback in nfnetlink, this is the largest patch and it is not a bugfix but it is a dependency to resolve a splat in the correct way. 4) Fix a splat due to incorrect socket memory accounting with nfnetlink skbuff clones. 5) Add missing conntrack dependencies to NFT_DUP_IPV4 and NFT_DUP_IPV6. 6) Traverse the nftables commit list in reverse order from the commit path, otherwise we crash when the user applies an incremental update via 'nft -f' that deletes an object that was just introduced in this batch, from Xin Long. Regarding the compilation warning fix, many people have sent us (and keep sending us) patches to address this, that's why I'm including this batch even if this is not critical. ==================== Signed-off-by: David S. Miller <[email protected]>
2015-12-14netfilter: cttimeout: add netns supportPablo Neira3-33/+53
Add a per-netns list of timeout objects and adjust code to use it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-13net: Flush local routes when device changes vrf associationDavid Ahern1-0/+9
The VRF driver cycles netdevs when an interface is enslaved or released: the down event is used to flush neighbor and route tables and the up event (if the interface was already up) effectively moves local and connected routes to the proper table. As of 4f823defdd5b the local route is left hanging around after a link down, so when a netdev is moved from one VRF to another (or released from a VRF altogether) local routes are left in the wrong table. Fix by handling the NETDEV_CHANGEUPPER event. When the upper dev is an L3mdev then call fib_disable_ip to flush all routes, local ones to. Fixes: 4f823defdd5b ("ipv4: fix to not remove local route on link down") Cc: Julian Anastasov <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-13sched/wait: Fix the signal handling fixPeter Zijlstra1-3/+3
Jan Stancek reported that I wrecked things for him by fixing things for Vladimir :/ His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which should not be possible, however my previous patch made this possible by unconditionally checking signal_pending(). We cannot use current->state as was done previously, because the instruction after the store to that variable it can be changed. We must instead pass the initial state along and use that. Fixes: 68985633bccb ("sched/wait: Fix signal handling in bit wait helpers") Reported-by: Jan Stancek <[email protected]> Reported-by: Chris Mason <[email protected]> Tested-by: Jan Stancek <[email protected]> Tested-by: Vladimir Murzin <[email protected]> Tested-by: Chris Mason <[email protected]> Reviewed-by: Paul Turner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: [email protected] Cc: Oleg Nesterov <[email protected]> Cc: [email protected] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-12-13netfilter: nf_tables: use reverse traversal commit_list in nf_tables_abortXin Long1-1/+2
When we use 'nft -f' to submit rules, it will build multiple rules into one netlink skb to send to kernel, kernel will process them one by one. meanwhile, it add the trans into commit_list to record every commit. if one of them's return value is -EAGAIN, status |= NFNL_BATCH_REPLAY will be marked. after all the process is done. it will roll back all the commits. now kernel use list_add_tail to add trans to commit, and use list_for_each_entry_safe to roll back. which means the order of adding and rollback is the same. that will cause some cases cannot work well, even trigger call trace, like: 1. add a set into table foo [return -EAGAIN]: commit_list = 'add set trans' 2. del foo: commit_list = 'add set trans' -> 'del set trans' -> 'del tab trans' then nf_tables_abort will be called to roll back: firstly process 'add set trans': case NFT_MSG_NEWSET: trans->ctx.table->use--; list_del_rcu(&nft_trans_set(trans)->list); it will del the set from the table foo, but it has removed when del table foo [step 2], then the kernel will panic. the right order of rollback should be: 'del tab trans' -> 'del set trans' -> 'add set trans'. which is opposite with commit_list order. so fix it by rolling back commits with reverse order in nf_tables_abort. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2015-12-13Merge tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2-8/+12
Pull NFS client bugfix from Trond Myklebust: "SUNRPC: Fix a NFSv4.1 callback channel regression" * tag 'nfs-for-4.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Fix callback channel
2015-12-12mpls: make via address optional for multipath routesRobert Shearman1-9/+13
The via address is optional for a single path route, yet is mandatory when the multipath attribute is used: # ip -f mpls route add 100 dev lo # ip -f mpls route add 101 nexthop dev lo RTNETLINK answers: Invalid argument Make them consistent by making the via address optional when the RTA_MULTIPATH attribute is being parsed so that both forms of specifying the route work. Signed-off-by: Robert Shearman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-12mpls: fix out-of-bounds access when via address not specifiedRobert Shearman1-5/+12
When a via address isn't specified, the via table is left initialised to 0 (NEIGH_ARP_TABLE), and the via address length also left initialised to 0. This results in a via address array of length 0 being allocated (contiguous with route and nexthop array), meaning that when a packet is sent using neigh_xmit the neighbour lookup and creation will cause an out-of-bounds access when accessing the 4 bytes of the IPv4 address it assumes it has been given a pointer to. This could be fixed by allocating the 4 bytes of via address necessary and leaving it as all zeroes. However, it seems wrong to me to use an ipv4 nexthop (including possibly ARPing for 0.0.0.0) when the user didn't specify to do so. Instead, set the via address table to NEIGH_NR_TABLES to signify it hasn't been specified and use this at forwarding time to signify a neigh_xmit using an L2 address consisting of the device address. This mechanism is the same as that used for both ARP and ND for loopback interfaces and those flagged as no-arp, which are all we can really support in this case. Fixes: cf4b24f0024f ("mpls: reduce memory usage of routes") Signed-off-by: Robert Shearman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-12mpls: don't dump RTA_VIA attribute if not specifiedRobert Shearman1-2/+6
The problem seen is that when adding a route with a nexthop with no via address specified, iproute2 generates bogus output: # ip -f mpls route add 100 dev lo # ip -f mpls route list 100 via inet 0.0.8.0 dev lo The reason for this is that the kernel generates an RTA_VIA attribute with the family set to AF_INET, but the via address data having zero length. The cause of family being AF_INET is that on route insert cfg->rc_via_table is left set to 0, which just happens to be NEIGH_ARP_TABLE which is then translated into AF_INET. iproute2 doesn't validate the length prior to printing and so prints garbage. Although it could be fixed to do the validation, I would argue that AF_INET addresses should always be exactly 4 bytes so the kernel is really giving userspace bogus data. Therefore, avoid generating the RTA_VIA attribute when dumping the route if the via address wasn't specified on add/modify. This is indicated by NEIGH_ARP_TABLE and a zero via address length - if the user specified a via address the address length would have been validated such that it was 4 bytes. Although this is a change in behaviour that is visible to userspace, I believe that what was generated before was invalid and as such userspace wouldn't be expecting it. Signed-off-by: Robert Shearman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-12mpls: validate L2 via address lengthRobert Shearman1-0/+4
If an L2 via address for an mpls nexthop is specified, the length of the L2 address must match that expected by the output device, otherwise it could access memory beyond the end of the via address buffer in the route. This check was present prior to commit f8efb73c97e2 ("mpls: multipath route support"), but got lost in the refactoring, so add it back, applying it to all nexthops in multipath routes. Fixes: f8efb73c97e2 ("mpls: multipath route support") Signed-off-by: Robert Shearman <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-12-11openvswitch: Respect conntrack zone even if invalidJoe Stringer1-2/+5
If userspace executes ct(zone=1), and the connection tracker determines that the packet is invalid, then the ct_zone flow key field is populated with the default zone rather than the zone that was specified. Even though connection tracking failed, this field should be updated with the value that the action specified. Fix the issue. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Joe Stringer <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>