path: root/net/core
Age  Commit message  (Author, files changed, lines -removed/+added)
2012-05-08  net: export sysctl_[r|w]mem_max symbols needed by ip_vs_sync  (Hans Schillstrom, 1 file, -0/+2)
To build ip_vs as a module, sysctl_rmem_max and sysctl_wmem_max need to be exported. The dependency was added by the "ipvs: wakeup master thread" patch.
Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>

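The change itself is two added lines. A minimal sketch of what the export presumably looks like in net/core/sock.c (exact placement and the __read_mostly annotation are assumptions):

    /* net/core/sock.c (sketch): export the two sysctl limits so that
     * ip_vs_sync can reference them when IPVS is built as a module.
     */
    __u32 sysctl_wmem_max __read_mostly = SK_WMEM_MAX;
    EXPORT_SYMBOL(sysctl_wmem_max);

    __u32 sysctl_rmem_max __read_mostly = SK_RMEM_MAX;
    EXPORT_SYMBOL(sysctl_rmem_max);
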
2012-05-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -24/+64)
Conflicts:
    drivers/net/ethernet/intel/e1000e/param.c
    drivers/net/wireless/iwlwifi/iwl-agn-rx.c
    drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
    drivers/net/wireless/iwlwifi/iwl-trans.h
Resolved the iwlwifi conflict with mainline using 3-way diff posted by John Linville and Stephen Rothwell. In 'net' we added a bug fix to make iwlwifi report a more accurate skb->truesize, but this conflicted with RX path changes that happened meanwhile in net-next.
In e1000e a conflict arose in the validation code for settings of adapter->itr. 'net-next' had more sophisticated logic, so that logic was used.
Signed-off-by: David S. Miller <[email protected]>

2012-05-06  skb: Add inline helper for getting the skb end offset from head  (Alexander Duyck, 1 file, -6/+6)
With the recent changes for how we compute the skb truesize, it occurs to me we are probably going to have a lot of calls to skb_end_pointer(skb) - skb->head. Instead of running all over the place doing that, it makes more sense to provide a separate inline, skb_end_offset(skb), so we can return the correct value without gcc having to do all the optimization work to cancel out skb->head - skb->head.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

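A sketch of the helper as described above; the two variants mirror the existing skb_end_pointer() split between offset-based and pointer-based skb->end (exact placement in include/linux/skbuff.h is an assumption):

    #ifdef NET_SKBUFF_DATA_USES_OFFSET
    static inline unsigned int skb_end_offset(const struct sk_buff *skb)
    {
            return skb->end;                /* skb->end already stored as an offset */
    }
    #else
    static inline unsigned int skb_end_offset(const struct sk_buff *skb)
    {
            return skb->end - skb->head;    /* skb->end is a pointer here */
    }
    #endif
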
2012-05-06skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_headAlexander Duyck1-14/+8
Since there is now only one spot that actually uses "fastpath" there isn't much point in carrying it. Instead we can just use a check for skb_cloned to verify if we can perform the fast-path free for the head or not. Signed-off-by: Alexander Duyck <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2012-05-06  skb: Drop bad code from pskb_expand_head  (Alexander Duyck, 1 file, -12/+0)
The fast-path for pskb_expand_head contains a check where the size plus the unaligned size of skb_shared_info is compared against the size of the data buffer. This code path has two issues. First is the fact that after the recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info is always placed in the optimal spot for a buffer size, making this check unnecessary. The second issue is the fact that the check doesn't take into account the aligned size of shared info. As a result the code burns cycles doing a memcpy with nothing actually being shifted.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-05-04  Merge tag 'v3.4-rc5' into next  (James Morris, 4 files, -16/+42)
Linux 3.4-rc5
Merge to pull in prerequisite change for Smack: 86812bb0de1a3758dc6c7aa01a763158a7c0638a
Requested by Casey.

2012-05-03  skb: Add skb_head_is_locked helper function  (Alexander Duyck, 1 file, -2/+1)
This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb, so it must be copied or we must take the skb as a whole.
Signed-off-by: Alexander Duyck <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

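A minimal sketch of the helper, following the description above ("locked" meaning the head cannot be donated as a page fragment):

    static inline bool skb_head_is_locked(const struct sk_buff *skb)
    {
            /* kmalloc()'d head, or other clones may still reference it */
            return !skb->head_frag || skb_cloned(skb);
    }
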
2012-05-03  net: Fix truesize accounting in skb_gro_receive()  (Eric Dumazet, 1 file, -3/+8)
GRO is very optimistic in skb truesize estimates, only taking into account the used part of fragments. Be conservative, and use more precise computation, so that bloated GRO skbs can be collapsed eventually.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Alexander Duyck <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-05-03  userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgid  (Eric W. Biederman, 1 file, -2/+2)
These functions are no longer needed; replace them with their more useful equivalents.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>

2012-05-03  net: Add missing linux/prefetch.h include to net/core/sock.c  (David S. Miller, 1 file, -0/+1)
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-05-03  net: Stop decapitating clones that have a head_frag  (Alexander Duyck, 1 file, -4/+5)
This change is meant to prevent stealing the skb->head to use as a page in the event that the skb->head was cloned. This allows the other clones to track each other via shinfo->dataref. Without this we break down to two methods for tracking the reference count, one being dataref, the other being the page count. As a result it becomes difficult to track how many references there are to skb->head.
Signed-off-by: Alexander Duyck <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-05-02  drop_monitor: prevent init path from scheduling on the wrong cpu  (Neil Horman, 1 file, -5/+7)
I just noticed after some recent updates that the init path for the drop monitor protocol has a minor error. drop monitor maintains a per cpu structure that gets initialized from a single cpu. Normally this is fine, as the protocol isn't in use yet, but I recently made a change that causes a failed skb allocation to reschedule itself. Given the current code, the implication is that this workqueue reschedule will take place on the wrong cpu. If drop monitor is used early during the boot process, it's possible that two cpus will access a single per-cpu structure in parallel, possibly leading to data corruption.
This patch fixes the situation by storing the cpu number that a given instance of this per-cpu data should be accessed from. In the case of a need for a reschedule, the cpu stored in the struct is assigned the reschedule, rather than the currently executing cpu.
Tested successfully by myself.
Signed-off-by: Neil Horman <[email protected]>
CC: David Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

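A sketch of the idea rather than the exact drop_monitor code (the field and helper names below are illustrative): record the owning CPU in the per-cpu structure at init time, then queue rescheduled work on that CPU explicitly.

    struct per_cpu_dm_data {
            struct work_struct      dm_alert_work;
            int                     cpu;    /* CPU this instance was initialised for */
            /* ... */
    };

    static void dm_requeue_alert(struct per_cpu_dm_data *data)
    {
            /* run on the owning CPU, not whichever CPU hit the failure */
            schedule_work_on(data->cpu, &data->dm_alert_work);
    }
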
2012-05-01  net: add a prefetch in socket backlog processing  (Eric Dumazet, 1 file, -0/+1)
TCP or UDP stacks have big enough latencies that prefetching the next pointer is worth it.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-30  net: makes skb_splice_bits() aware of skb->head_frag  (Eric Dumazet, 1 file, -3/+7)
__skb_splice_bits() can check if the skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on the underlying page.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Matt Carlson <[email protected]>
Cc: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-30  net: make GRO aware of skb->head_frag  (Eric Dumazet, 2 files, -2/+30)
GRO can check if the skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head into a fragment in itself. This avoids the frag_list fallback and permits building a true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces the number of cache misses when the user makes its copy, since a single sk_buff is fetched.
This is a followup of patch "net: allow skb->head to be a page fragment".
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Matt Carlson <[email protected]>
Cc: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-30  net: allow skb->head to be a page fragment  (Eric Dumazet, 1 file, -6/+18)
skb->head is currently allocated from kmalloc(). This is convenient but has the drawback that the data cannot be converted to a page fragment if needed.
We have three spots where it hurts:
1) GRO aggregation
   When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to the network stack aren't enabling full GRO power.
2) splice(socket -> pipe)
   We must copy the linear part to a page fragment. This kind of defeats the splice() purpose (zero copy claim).
3) TCP coalescing
   Recently introduced, this permits grouping several contiguous segments into a single skb. This shortens queue lengths and saves kernel memory, and greatly reduces the probability of TCP collapses. This coalescing doesn't work on linear skbs (or we would need to copy data, which would be too slow).
Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag, to carry this information.
build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non-zero value, set to the fragment size. Then, in situations where we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have.
This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize); that's 512 or 1024 bytes saved per skb. This also makes bpf/netfilter faster since the 'first frag' will be part of the skb linear part, no need to copy data.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Cc: Ben Hutchings <[email protected]>
Cc: Matt Carlson <[email protected]>
Cc: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

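A sketch of the two calling patterns after this change (the driver-side wrapper is hypothetical): frag_size == 0 keeps the old kmalloc()-backed head, while a non-zero fragment size marks skb->head_frag so later code may reuse the head as a page fragment.

    #include <linux/skbuff.h>

    /* hypothetical receive-path helper in a driver */
    static struct sk_buff *rx_to_skb(void *data, unsigned int buf_len,
                                     bool data_is_page_frag)
    {
            /* frag_size == 0 -> head treated as kmalloc() memory (old behaviour) */
            return build_skb(data, data_is_page_frag ? buf_len : 0);
    }
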
2012-04-28  net: Fixed a coding style issue related to spaces.  (Jeffrin Jose, 1 file, -1/+1)
Fixed a coding style issue relating to spaces in net/core/sock.c.
Signed-off-by: Jeffrin Jose <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-28  drop_monitor: Make updating data->skb smp safe  (Neil Horman, 1 file, -16/+54)
Eric Dumazet pointed out to me that the drop_monitor protocol has some holes in its smp protections. Specifically, it's possible to replace data->skb while it's being written. This patch corrects that by making data->skb an rcu protected variable. That will prevent it from being overwritten while a tracepoint is modifying it.
Signed-off-by: Neil Horman <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
CC: David Miller <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-28  drop_monitor: fix sleeping in invalid context warning  (Neil Horman, 1 file, -7/+7)
Eric Dumazet pointed out this warning in the drop_monitor protocol to me:
[ 38.352571] BUG: sleeping function called from invalid context at kernel/mutex.c:85
[ 38.352576] in_atomic(): 1, irqs_disabled(): 0, pid: 4415, name: dropwatch
[ 38.352580] Pid: 4415, comm: dropwatch Not tainted 3.4.0-rc2+ #71
[ 38.352582] Call Trace:
[ 38.352592]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352599]  [<ffffffff81063f2a>] __might_sleep+0xca/0xf0
[ 38.352606]  [<ffffffff81655b16>] mutex_lock+0x26/0x50
[ 38.352610]  [<ffffffff8153aaf0>] ? trace_napi_poll_hit+0xd0/0xd0
[ 38.352616]  [<ffffffff810b72d9>] tracepoint_probe_register+0x29/0x90
[ 38.352621]  [<ffffffff8153a585>] set_all_monitor_traces+0x105/0x170
[ 38.352625]  [<ffffffff8153a8ca>] net_dm_cmd_trace+0x2a/0x40
[ 38.352630]  [<ffffffff8154a81a>] genl_rcv_msg+0x21a/0x2b0
[ 38.352636]  [<ffffffff810f8029>] ? zone_statistics+0x99/0xc0
[ 38.352640]  [<ffffffff8154a600>] ? genl_rcv+0x30/0x30
[ 38.352645]  [<ffffffff8154a059>] netlink_rcv_skb+0xa9/0xd0
[ 38.352649]  [<ffffffff8154a5f0>] genl_rcv+0x20/0x30
[ 38.352653]  [<ffffffff81549a7e>] netlink_unicast+0x1ae/0x1f0
[ 38.352658]  [<ffffffff81549d76>] netlink_sendmsg+0x2b6/0x310
[ 38.352663]  [<ffffffff8150824f>] sock_sendmsg+0x10f/0x130
[ 38.352668]  [<ffffffff8150abe0>] ? move_addr_to_kernel+0x60/0xb0
[ 38.352673]  [<ffffffff81515f04>] ? verify_iovec+0x64/0xe0
[ 38.352677]  [<ffffffff81509c46>] __sys_sendmsg+0x386/0x390
[ 38.352682]  [<ffffffff810ffaf9>] ? handle_mm_fault+0x139/0x210
[ 38.352687]  [<ffffffff8165b5bc>] ? do_page_fault+0x1ec/0x4f0
[ 38.352693]  [<ffffffff8106ba4d>] ? set_next_entity+0x9d/0xb0
[ 38.352699]  [<ffffffff81310b49>] ? tty_ldisc_deref+0x9/0x10
[ 38.352703]  [<ffffffff8106d363>] ? pick_next_task_fair+0x63/0x140
[ 38.352708]  [<ffffffff8150b8d4>] sys_sendmsg+0x44/0x80
[ 38.352713]  [<ffffffff8165f8e2>] system_call_fastpath+0x16/0x1b
It stems from holding a spinlock (trace_state_lock) while attempting to register or unregister tracepoint hooks, making in_atomic() true in this context, leading to the warning when the tracepoint calls might_sleep() while it's taking a mutex. Since we only use the trace_state_lock to prevent trace protocol state races, as well as hardware stat list updates on an rcu write side, we can just convert the spinlock to a mutex to avoid this problem.
Signed-off-by: Neil Horman <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
CC: David Miller <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-27  net: cleanups in sock_setsockopt()  (Eric Dumazet, 1 file, -27/+15)
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to match !!sock_flag().
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

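For illustration, the clamping style this kind of cleanup produces, e.g. for SO_SNDBUF handling; the exact expressions in sock_setsockopt() are an assumption here.

    #include <net/sock.h>

    static void example_set_sndbuf(struct sock *sk, int val)
    {
            /* clamp to the global limit, then enforce the minimum */
            val = min_t(u32, val, sysctl_wmem_max);
            sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);
    }
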
2012-04-25  net: sock_diag_handler structs can be const  (Shan Wei, 1 file, -6/+6)
They are read-only, so change them to const.
Signed-off-by: Shan Wei <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-23  net: make spd_fill_page() linear argument a bool  (Eric Dumazet, 1 file, -4/+4)
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-23  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 2 files, -15/+19)
Fix merge between commit 3adadc08cc1e ("net ax25: Reorder ax25_exit to remove races") and commit 0ca7a4c87d27 ("net ax25: Simplify and cleanup the ax25 sysctl handling"). The former moved around the sysctl register/unregister calls, the latter simply removed them.
With help from Stephen Rothwell.
Signed-off-by: David S. Miller <[email protected]>

2012-04-23  net: Use bool and remove inline in skb_splice_bits() code.  (David S. Miller, 1 file, -29/+29)
Signed-off-by: David S. Miller <[email protected]>

2012-04-23  net: speedup skb_splice_bits()  (Eric Dumazet, 1 file, -11/+19)
Commit 35f3d14db (pipe: add support for shrinking and growing pipes) added a slowdown for splice(socket -> pipe), as we might grow the spd used in skb_splice_bits() for each skb we process in the splice() syscall. It's not needed since skb lengths are capped. The default on-stack arrays are more than enough.
Use MAX_SKB_FRAGS instead of PIPE_DEF_BUFFERS to describe the reasonable limit per skb.
Add coalescing support to help splicing of GRO skbs built from linear skbs (linked into frag_list).
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Tom Herbert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-23  net: add a limit parameter to sk_add_backlog()  (Eric Dumazet, 1 file, -2/+2)
sk_add_backlog() & sk_rcvqueues_full() hard coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use. No functional change expected in this patch, all callers still using the old sk_rcvbuf limit.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Maciej Żenczykowski <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Ilpo Järvinen <[email protected]>
Cc: Rick Jones <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

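A sketch of the adjusted helper, with the limit supplied by the caller instead of hard-coding sk->sk_rcvbuf (body abbreviated from memory, so treat the details as assumptions):

    /* include/net/sock.h (sketch) */
    static inline bool sk_rcvqueues_full(const struct sock *sk,
                                         const struct sk_buff *skb,
                                         unsigned int limit)
    {
            unsigned int qsize = sk->sk_backlog.len +
                                 atomic_read(&sk->sk_rmem_alloc);

            return qsize > limit;
    }

    /* existing callers keep the old behaviour for now: */
    /*      sk_add_backlog(sk, skb, sk->sk_rcvbuf);      */
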
2012-04-21  net: allow better page reuse in splice(sock -> pipe)  (Eric Dumazet, 1 file, -0/+3)
splice() from socket to pipe needs the linear_to_page() helper to transfer the skb header to part of a page. We can reset the offset in the current sk->sk_sndmsg_page if we are the last user of the page.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-21  drop_monitor: allow more events per second  (Eric Dumazet, 1 file, -0/+1)
It seems there is a logic error in trace_drop_common(), since we store only 64 drops, even if they are from the same location. This fix is a one-liner, but we probably need more work to avoid useless atomic dec/inc. Now I can watch 1 Mpps drops through dropwatch...
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neil Horman <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-21  sock: Introduce named constants for sk_reuse  (Pavel Emelyanov, 1 file, -1/+1)
Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port.
Signed-off-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

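A sketch of the named constants as described above; the exact spelling and form (enum vs. defines) is an assumption here:

    enum {
            SK_NO_REUSE     = 0,    /* old value 0 */
            SK_CAN_REUSE    = 1,    /* old value 1 */
            SK_FORCE_REUSE  = 2,    /* new: forcibly reuse everyone else's port */
    };
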
2012-04-20  net: Delete all remaining instances of ctl_path  (Eric W. Biederman, 1 file, -6/+0)
We don't use struct ctl_path anymore so delete the exported constants.
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-20  net: Convert all sysctl registrations to register_net_sysctl  (Eric W. Biederman, 1 file, -2/+1)
This results in code with less boilerplate that is a bit easier to read. Additionally it stops us from using compatibility code in the sysctl core, hastening the day when the compatibility code can be removed.
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

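For illustration, the shape of the conversion in net/core/sysctl_net_core.c, with variable names simplified: an ASCII path replaces the ctl_path boilerplate.

    static struct ctl_table_header *core_hdr;

    static __init int example_sysctl_core_init(void)
    {
            /* before: build a struct ctl_path array and call the *_table variant */
            core_hdr = register_net_sysctl(&init_net, "net/core", net_core_table);
            return core_hdr ? 0 : -ENOMEM;
    }
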
2012-04-20  net neighbour: Convert to use register_net_sysctl  (Eric W. Biederman, 1 file, -27/+6)
Using an ascii path to register_net_sysctl as opposed to the slightly awkward ctl_path allows for much simpler code. We no longer need to malloc dev_name to keep it alive the length of our sysctl register; instead we can use a small temporary buffer on the stack.
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-20  net core: Remove unneded creation of an empty net/core sysctl directory  (Eric W. Biederman, 1 file, -3/+0)
On the next line we register the net_core_table in net/core which creates the directory and ensures it exists.
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-20  net: Move all of the network sysctls without a namespace into init_net.  (Eric W. Biederman, 2 files, -2/+2)
This makes it clearer which sysctls are relative to your current network namespace. This makes it a little less error prone by not exposing sysctls for the initial network namespace in other namespaces. This is the same way we handle all of our other network interfaces to userspace and I can't honestly remember why we didn't do this for sysctls right from the start.
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-20  net: Kill register_sysctl_rotable  (Eric W. Biederman, 1 file, -1/+1)
register_sysctl_rotable never caught on as an interesting way to register sysctls. My take on the situation is that what we want are sysctls that we can only see in the initial network namespace. What we have implemented with register_sysctl_rotable are sysctls that we can see in all of the network namespaces and can only change in the initial network namespace. That is a very silly way to go. Just register the network sysctls in the initial network namespace and we don't have any weird special cases to deal with.
The sysctls affected are:
    /proc/sys/net/ipv4/ipfrag_secret_interval
    /proc/sys/net/ipv4/ipfrag_max_dist
    /proc/sys/net/ipv6/ip6frag_secret_interval
    /proc/sys/net/ipv6/mld_max_msf
I really don't expect anyone will miss them if they can't read them in a child user namespace.
CC: Pavel Emelyanov <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-19  net: gro: GRO_MERGED_FREE consumes packets  (Eric Dumazet, 1 file, -1/+4)
As part of GRO processing, merged skbs should be consumed, not freed, to not confuse dropwatch/drop_monitor.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-19  net: dont drop packet but consume it  (Eric Dumazet, 1 file, -1/+1)
When we need to clone skb, we don't drop a packet. Call consume_skb() to not confuse dropwatch.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

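An illustrative pattern (deliver_to_next_hop() is hypothetical): when the data lives on in a clone, the original is released with consume_skb() so dropwatch does not count it as a drop; kfree_skb() stays reserved for genuine drops.

    #include <linux/skbuff.h>

    void deliver_to_next_hop(struct sk_buff *skb);      /* hypothetical consumer */

    static void example_clone_and_pass(struct sk_buff *skb)
    {
            struct sk_buff *clone = skb_clone(skb, GFP_ATOMIC);

            if (clone) {
                    deliver_to_next_hop(clone);
                    consume_skb(skb);       /* not a drop: data continues in the clone */
            } else {
                    kfree_skb(skb);         /* genuine drop (allocation failure) */
            }
    }
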
2012-04-19  net: fix compile error of leaking kmemleak.h header  (Shan Wei, 1 file, -0/+1)
net/core/sysctl_net_core.c: In function ‘sysctl_core_init’:
net/core/sysctl_net_core.c:259: error: implicit declaration of function ‘kmemleak_not_leak’
with same error in net/ipv4/route.c
Signed-off-by: Shan Wei <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-18  net/core:Remove memleak reports by kmemleak_not_leak.  (majianpeng, 1 file, -1/+1)
Signed-off-by: majianpeng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-18  netns: do not leak net_generic data on failed init  (Julian Anastasov, 1 file, -15/+18)
ops_init should free the net_generic data on init failure and __register_pernet_operations should not call ops_free when NET_NS is not enabled.
Signed-off-by: Julian Anastasov <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-15  net: rtnetlink notify events for FDB NTF_SELF adds and deletes  (John Fastabend, 1 file, -2/+33)
It is useful to be able to monitor for FDB events in user space. This patch adds support to generate netlink events when a change is made to a device supporting the FDB ops. This brings embedded switches in line with the SW net/bridge, which triggers events on FDB updates as well.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-15  net: add fdb generic dump routine  (John Fastabend, 1 file, -0/+84)
This adds a generic dump routine drivers can call. It should be sufficient to handle any bridging model that uses the unicast address list. This should be most SR-IOV enabled NICs.
v2: return error on nlmsg_put and use -EMSGSIZE instead of -ENOMEM; this is in line with other usages.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-15  net: addr_list: add exclusive dev_uc_add and dev_mc_add  (John Fastabend, 1 file, -16/+81)
This adds dev_uc_add_excl() and dev_mc_add_excl() calls similar to the original dev_{uc|mc}_add(), except they set the global bit and return -EEXIST for duplicate entries. This is useful for drivers that support SR-IOV, macvlan devices and any other devices that need to manage the unicast and multicast lists.
v2: fix typo, UNICAST should be MULTICAST in dev_mc_add_excl()
CC: Ben Hutchings <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

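A sketch of how an SR-IOV driver's FDB add handler might use the exclusive helpers; the handler name and exact ndo signature here are illustrative.

    #include <linux/etherdevice.h>
    #include <linux/neighbour.h>

    static int example_fdb_add(struct ndmsg *ndm, struct net_device *dev,
                               unsigned char *addr, u16 flags)
    {
            if (is_unicast_ether_addr(addr))
                    return dev_uc_add_excl(dev, addr);  /* -EEXIST on duplicates */
            if (is_multicast_ether_addr(addr))
                    return dev_mc_add_excl(dev, addr);
            return -EINVAL;
    }
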
2012-04-15  net: add generic PF_BRIDGE:RTM_ FDB hooks  (John Fastabend, 1 file, -0/+152)
This adds two new flags NTF_MASTER and NTF_SELF that can now be used to specify where PF_BRIDGE netlink commands should be sent. NTF_MASTER sends the commands to the 'dev->master' device for parsing. Typically this will be the linux net/bridge, or open-vswitch devices. Also without any flags set the command will be handled by the master device as well so that current user space tools continue to work as expected.
The NTF_SELF flag will push the PF_BRIDGE commands to the device. In the basic example below the commands are then parsed and programmed in the embedded bridge. Note if both NTF_SELF and NTF_MASTER bits are set then the command will be sent to both 'dev->master' and 'dev'; this allows user space to easily keep the embedded bridge and software bridge in sync.
There is a slight complication in the case with both flags set when an error occurs. To resolve this the rtnl handler clears the NTF_ flag in the netlink ack to indicate which sets completed successfully. The add/del handlers will abort as soon as any error occurs.
To support this new net device ops were added to call into the device and the existing bridging code was refactored to use these. There should be no required changes in user space to support the current bridge behavior.
A basic setup with a SR-IOV enabled NIC looks like this:

      veth0   veth2
        |       |
      ------------
      | bridge0  |         <---- software bridging
      ------------
         /      /
    ethx.y    ethx
      VF       PF
        \       \          <---- propagate FDB entries to HW
         \       \
      --------------------
      | Embedded Bridge  |  <---- hardware offloaded switching
      --------------------

In this case the embedded bridge must be managed to allow 'veth0' to communicate with 'ethx.y' correctly. At present drivers managing the embedded bridge either send frames onto the network, which then get dropped by the switch, OR the embedded bridge will flood these frames. With this patch we have a mechanism to manage the embedded bridge correctly from user space. This example is specific to SR-IOV but replacing the VF with another PF or dropping this into the DSA framework generates similar management issues.
Example session using the 'br'[1] tool to add, dump and then delete a mac address with a new "embedded" option and enabled ixgbe driver:

    # br fdb add 22:35:19:ac:60:59 dev eth3
    # br fdb
    port    mac addr                flags
    veth0   22:35:19:ac:60:58       static
    veth0   9a:5f:81:f7:f6:ec       local
    eth3    00:1b:21:55:23:59       local
    eth3    22:35:19:ac:60:59       static
    veth0   22:35:19:ac:60:57       static

    # br fdb add 22:35:19:ac:60:59 embedded dev eth3
    # br fdb
    port    mac addr                flags
    veth0   22:35:19:ac:60:58       static
    veth0   9a:5f:81:f7:f6:ec       local
    eth3    00:1b:21:55:23:59       local
    eth3    22:35:19:ac:60:59       static
    veth0   22:35:19:ac:60:57       static
    eth3    22:35:19:ac:60:59       local embedded

    # br fdb del 22:35:19:ac:60:59 embedded dev eth3

I added a couple lines to 'br' to set the flags correctly is all. It is my opinion that the merit of this patch is now embedded and SW bridges can both be modeled correctly in user space using very nearly the same message passing.
[1] 'br' tool was published as an RFC here and will be renamed 'bridge': http://patchwork.ozlabs.org/patch/117664/
Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for valuable feedback, suggestions, and review.
v2: fixed api descriptions and error case with both NTF_SELF and NTF_MASTER set, plus updated patch description.
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-15  net: cleanup unsigned to unsigned int  (Eric Dumazet, 9 files, -27/+28)
Use of "unsigned int" is preferred to bare "unsigned" in net tree.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-14  sk_run_filter: add BPF_S_ANC_SECCOMP_LD_W  (Will Drewry, 1 file, -0/+6)
Introduces a new BPF ancillary instruction that all LD calls will be mapped through when sk_run_filter() is being used for seccomp BPF. The rewriting will be done using a secondary chk_filter function that is run after sk_chk_filter. The code change is guarded by CONFIG_SECCOMP_FILTER which is added, along with the seccomp_bpf_load() function later in this series.
This is based on http://lkml.org/lkml/2012/3/2/141
Suggested-by: Indan Zupancic <[email protected]>
Signed-off-by: Will Drewry <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Acked-by: Eric Paris <[email protected]>
v18: rebase
...
v15: include seccomp.h explicitly for when seccomp_bpf_load exists.
v14: First cut using a single additional instruction
...
v13: made bpf functions generic.
Signed-off-by: James Morris <[email protected]>

2012-04-13  neighbour: Make neigh_table_init_no_netlink() static.  (Hiroaki SHIMODA, 1 file, -2/+1)
neigh_table_init_no_netlink() is only used in the net/core/neighbour.c file.
Signed-off-by: Hiroaki SHIMODA <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-13  rtnetlink & bonding: change args got get_tx_queues  (stephen hemminger, 1 file, -4/+4)
Change get_tx_queues: drop the unused arg/return value real_tx_queues, and use return by value (with error) rather than call by reference. Probably bonding should just change to LLTX and the whole get_tx_queues API could disappear!
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

2012-04-13  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net  (David S. Miller, 1 file, -0/+20)
Pull in the 'net' tree to get CAIF bug fixes upon which the following set of CAIF feature patches depend.
Signed-off-by: David S. Miller <[email protected]>

2012-04-13  net: In unregister_netdevice_notifier unregister the netdevices.  (Eric W. Biederman, 1 file, -0/+20)
We already synthesize events in register_netdevice_notifier, and synthesizing events in unregister_netdevice_notifier allows us to remove the need for special case cleanup code. This change should be safe as it adds no new cases for existing callers of unregister_netdevice_notifier to handle.
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

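A simplified sketch of the resulting unregister path (error handling and details trimmed): the same DOWN/UNREGISTER events that register_netdevice_notifier synthesizes on the way in are replayed for every live device on the way out.

    int unregister_netdevice_notifier(struct notifier_block *nb)
    {
            struct net_device *dev;
            struct net *net;
            int err;

            rtnl_lock();
            err = raw_notifier_chain_unregister(&netdev_chain, nb);
            if (err)
                    goto unlock;

            for_each_net(net) {
                    for_each_netdev(net, dev) {
                            if (dev->flags & IFF_UP) {
                                    nb->notifier_call(nb, NETDEV_GOING_DOWN, dev);
                                    nb->notifier_call(nb, NETDEV_DOWN, dev);
                            }
                            nb->notifier_call(nb, NETDEV_UNREGISTER, dev);
                    }
            }
    unlock:
            rtnl_unlock();
            return err;
    }
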