Age | Commit message (Collapse) | Author | Files | Lines |
|
You can't call atomic_notifier_chain_unregister() while in atomic context.
Fix, call un/register_atmdevice_notifier in module __init and __exit.
Bug report:
http://comments.gmane.org/gmane.linux.network/172603
Reported-by: Mikko Vinni <[email protected]>
Tested-by: Mikko Vinni <[email protected]>
Signed-off-by: Karl Hiramoto <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.
On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.
Tested on tg3 , bnx2, bonding, this worked very well.
This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We have for each socket :
One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)
Possible scenarios are :
(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...
(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)
This (C) case conflicts with (A) :
CPU1 [A] CPU2 [C]
read_lock(callback_lock)
<BH> spin_lock_bh(slock)
<wait to spin_lock(slock)>
<wait to write_lock_bh(callback_lock)>
We have one problematic (C) use case in inet_csk_listen_stop() :
local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)
lockdep is not happy with this, as reported by Tetsuo Handa
It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.
Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.
Reported-by: Tetsuo Handa <[email protected]>
Tested-by: Tetsuo Handa <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Jarek Poplawski <[email protected]>
Tested-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :
- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:
if (datalen == length + fraggap)
alloclen += rt->dst.trailer_len;
Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)
In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.
In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch fixes stale mac80211_tx_control_flags for
filtered / retried frames.
Because ieee80211_handle_filtered_frame feeds skbs back
into the tx path, they have to be stripped of some tx
flags so they won't confuse the stack, driver or device.
Cc: <[email protected]>
Acked-by: Johannes Berg <[email protected]>
Signed-off-by: Christian Lamparter <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
IEEE Std 802.11k-2008 added DS Parameter Set information element into
Probe Request frames as an optional information on 2.4 GHz band (and
mandatory, if radio measurements are enabled). This allows APs to
filter out Probe Request frames that may be received from neighboring
overlapping channels and by doing so, reduce the number of unnecessary
frames in the air. Make mac80211 add this IE into Probe Request frames
whenever the channel is known (i.e., whenever hwscan is not used).
Signed-off-by: Jouni Malinen <[email protected]>
Acked-by: Johannes Berg <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
If the TX rate set has been masked, the removed rates can also be
removed from the Supported Rates and Extended Supported Rates IEs in
Probe Request frames.
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
Conflicts:
drivers/net/wireless/ath/ath5k/base.c
net/mac80211/main.c
|
|
commit 8c0c709eea5cbab97fb464cd68b06f24acc58ee1
Author: Johannes Berg <[email protected]>
Date: Wed Nov 25 17:46:15 2009 +0100
mac80211: move cmntr flag out of rx flags
moved the CMTR flag into the skb's status, and
in doing so introduced a use-after-free -- when
the skb has been handed to cooked monitors the
status setting will touch now invalid memory.
Additionally, moving it there has effectively
discarded the optimisation -- since the bit is
only ever set on freed SKBs, and those were a
copy, it could never be checked.
For the current release, fixing this properly
is a bit too involved, so let's just remove the
problematic code and leave userspace with one
copy of each frame for each virtual interface.
Cc: [email protected] [2.6.33+]
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
Change "return (EXPR);" to "return EXPR;"
return is not a function, parentheses are not required.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Joe Perches <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
Signed-off-by: Thomas Weber <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.
Signed-off-by: Ulrich Weber <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The x25_datagram_poll didn't add anything, removed it.
Signed-off-by: Andrew Hendry <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Andrew Hendry <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.
For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix checksum calculation in nf_nat_snmp_basic.
Based on patches by Clark Wang <[email protected]> and
Stephen Hemminger <[email protected]>.
https://bugzilla.kernel.org/show_bug.cgi?id=17622
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As soon as rcu_read_unlock() is called, there is no guarantee current
thread can safely derefence t pointer, rcu protected.
Fix is to copy t->alloc_size in a temporary variable.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ip_route_me_harder can't create the route cache when the outdev is the same
with the indev for the skbs whichout a valid protocol set.
__mkroute_input functions has this check:
1998 if (skb->protocol != htons(ETH_P_IP)) {
1999 /* Not IP (i.e. ARP). Do not create route, if it is
2000 * invalid for proxy arp. DNAT routes are always valid.
2001 *
2002 * Proxy arp feature have been extended to allow, ARP
2003 * replies back to the same interface, to support
2004 * Private VLAN switch technologies. See arp.c.
2005 */
2006 if (out_dev == in_dev &&
2007 IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
2008 err = -EINVAL;
2009 goto cleanup;
2010 }
2011 }
This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.
This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.
net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function
Signed-off-by: Simon Horman <[email protected]>
[Patrick: changed NF_DROP to NF_ACCEPT]
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
With this patch, you can specify the expectation flags for user-space
created expectations.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
This patch adds the missing validation of the CTA_EXPECT_ZONE
attribute in the ctnetlink code.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
This patch improves the situation in which the expectation table is
full for conntrack NAT helpers. Basically, we give up if we don't
find a place in the table instead of looping over nf_ct_expect_related()
with a different port (we should only do this if it returns -EBUSY, for
-EMFILE or -ESHUTDOWN I think that it's better to skip this).
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
|
|
CAIF sockets should use socket's default send and receive buffers sizes.
Signed-off-by: Sjur Braendeland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Check that receive function pointer is not null before calling it.
Signed-off-by: Sjur Braendeland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use pr_debug for flow control printouts, and refine an error printout.
Signed-off-by: Sjur Braendeland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove debugging quirk redefining pr_debug to pr_warning.
Signed-off-by: Sjur Brændeland <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
net/core/ethtool.c: In function 'ethtool_get_regs':
net/core/ethtool.c:818:2: error: implicit declaration of function 'vmalloc'
net/core/ethtool.c:818:9: warning: assignment makes pointer from integer without a cast
net/core/ethtool.c:833:2: error: implicit declaration of function 'vfree'
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
Special care should be taken when slow path is hit in ip_fragment() :
When walking through frags, we transfert truesize ownership from skb to
frags. Then if we hit a slow_path condition, we must undo this or risk
uncharging frags->truesize twice, and in the end, having negative socket
sk_wmem_alloc counter, or even freeing socket sooner than expected.
Many thanks to Nick Bowler, who provided a very clean bug report and
test program.
Thanks to Jarek for reviewing my first patch and providing a V2
While Nick bisection pointed to commit 2b85a34e911 (net: No more
expensive sock_hold()/sock_put() on each tx), underlying bug is older
(2.6.12-rc5)
A side effect is to extend work done in commit b2722b1c3a893e
(ip_fragment: also adjust skb->truesize for packets not owned by a
socket) to ipv6 as well.
Reported-and-bisected-by: Nick Bowler <[email protected]>
Tested-by: Nick Bowler <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
CC: Jarek Poplawski <[email protected]>
CC: Patrick McHardy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some NICs have huge register files which exceed the maximum heap
allocation size.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
Conflicts:
arch/arm/mach-omap2/board-omap3pandora.c
drivers/net/wireless/ath/ath5k/base.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
|
|
Change the usage of svc usecnt during command execution:
- we check if svc is registered but we do not need to hold usecnt
reference while under __ip_vs_mutex, only the packet handling needs
it during scheduling
- change __ip_vs_service_get to __ip_vs_service_find and
__ip_vs_svc_fwm_get to __ip_vs_svc_fwm_find because now caller
will increase svc->usecnt
- put common code that calls update_service in __ip_vs_update_dest
- put common code in ip_vs_unlink_service() and use it to unregister
the service
- add comment that svc should not be accessed after ip_vs_del_service
anymore
- all IP_VS_WAIT_WHILE calls are now unified: usecnt > 0
- Properly log the app ports
As result, some problems are fixed:
- possible use-after-free of svc in ip_vs_genl_set_cmd after
ip_vs_del_service because our usecnt reference does not guarantee that
svc is not freed on refcnt==0, eg. when no dests are moved to trash
- possible usecnt leak in do_ip_vs_set_ctl after ip_vs_del_service
when the service is not freed now, for example, when some
destionations are moved into trash and svc->refcnt remains above 0.
It is harmless because svc is not in hash anymore.
Signed-off-by: Julian Anastasov <[email protected]>
Acked-by: Simon Horman <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Since we don't change the tuple in the original direction, we can save it
in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for __nf_conntrack_confirm()
use.
__hash_conntrack() is split into two steps: hash_conntrack_raw() is used
to get the raw hash, and __hash_bucket() is used to get the bucket id.
In SYN-flood case, early_drop() doesn't need to recompute the hash again.
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Add new sysctl flag "snat_reroute". Recent kernels use
ip_route_me_harder() to route LVS-NAT responses properly by
VIP when there are multiple paths to client. But setups
that do not have alternative default routes can skip this
routing lookup by using snat_reroute=0.
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Add more code to IPVS to work with Netfilter connection
tracking and fix some problems.
- Allow IPVS to be compiled without connection tracking as in
2.6.35 and before. This can avoid keeping conntracks for all
IPVS connections because this costs memory. ip_vs_ftp still
depends on connection tracking and NAT as implemented for 2.6.36.
- Add sysctl var "conntrack" to enable connection tracking for
all IPVS connections. For loaded IPVS directors it needs
tuning of nf_conntrack_max limit.
- Add IP_VS_CONN_F_NFCT connection flag to request the connection
to use connection tracking. This allows user space to provide this
flag, for example, in dest->conn_flags. This can be useful to
request connection tracking per real server instead of forcing it
for all connections with the "conntrack" sysctl. This flag is
set currently only by ip_vs_ftp and of course by "conntrack" sysctl.
- Add ip_vs_nfct.c file to hold all connection tracking code,
by this way main code should not depend of netfilter conntrack
support.
- Return back the ip_vs_post_routing handler as in 2.6.35 and use
skb->ipvs_property=1 to allow IPVS to work without connection
tracking
Connection tracking:
- most of the code is already in 2.6.36-rc
- alter conntrack reply tuple for LVS-NAT connections when first packet
from client is forwarded and conntrack state is NEW or RELATED.
Additionally, alter reply for RELATED connections from real server,
again for packet in original direction.
- add IP_VS_XMIT_TUNNEL to confirm conntrack (without altering
reply) for LVS-TUN early because we want to call nf_reset. It is
needed because we add IPIP header and the original conntrack
should be preserved, not destroyed. The transmitted IPIP packets
can reuse same conntrack, so we do not set skb->ipvs_property.
- try to destroy conntrack when the IPVS connection is destroyed.
It is not fatal if conntrack disappears before that, it depends
on the used timers.
Fix problems from long time:
- add skb->ip_summed = CHECKSUM_NONE for the LVS-TUN transmitters
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Merge reason: Pick up the latest fixes in -rc5.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The `options_received' struct is redundant, since it re-duplicates the existing
`p' and `x_recv' fields. This patch removes the sub-struct and migrates the
format conversion operations to ccid3_hc_tx_parse_options().
Signed-off-by: Gerrit Renker <[email protected]>
|
|
This adds a function to take care of the following, separate cases occurring in
the computation of the Loss Rate p:
* 1/(2^32-1) is mapped into 0% as per RFC 4342, 8.5;
* 1/0 is mapped into 100%, the maximum;
* to avoid that p = 1/x is rounded down to 0 when x is very large, since this
means accidentally re-entering slow-start indicated by p == 0, the minimum
resolution value of p is now returned instead;
* a bug in ccid3_hc_rx_getsockopt is fixed: 1/0 was mapped into ~0U.
Signed-off-by: Gerrit Renker <[email protected]>
|
|
This patch is thanks to an investigation by Leandro Sales de Melo and his
colleagues. They worked out two state diagrams which highlight the fact that
the xxx_TERM states in CCID-3/4 are in fact not necessary.
And this can be confirmed by in turn looking at the code: the xxx_TERM states
are only ever set in ccid3_hc_{rx,tx}_exit(): when CCID-3 sets the state
to xxx_TERM, it is at a time where no more processing should be going on,
hence it is not necessary to introduce a dedicated exit state - this is already
implied by unloading the CCID.
Signed-off-by: Gerrit Renker <[email protected]>
|
|
The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are nowhere used in the code, but
instead for the CCID-specific options numbers are used.
This patch unifies the use of CCID-specific option numbers, by adding symbolic
names reflecting the definitions in RFC 4340, 10.3.
Signed-off-by: Gerrit Renker <[email protected]>
|
|
This
1. adds packet type information to ccid_hc_{rx,tx}_parse_options(). This is
necessary, since table 3 in RFC 4340, 5.8 leaves it to the CCIDs to state
which options may (not) appear on what packet type.
2. adds such a check for CCID-3's {Loss Event, Receive} Rate as specified in
RFC 4340 8.3 ("Receive Rate options MUST NOT be sent on DCCP-Data packets")
and 8.5 ("Loss Event Rate options MUST NOT be sent on DCCP-Data packets").
3. removes an unused argument `idx' from ccid_hc_{rx,tx}_parse_options(). This
is also no longer necessary, since the CCID-specific option-parsing routines
are passed every single parameter of the type-length-value option encoding.
Signed-off-by: Gerrit Renker <[email protected]>
|
|
If a RST comes in immediately after checking sk->sk_err, tcp_poll will
return POLLIN but not POLLOUT. Fix this by checking sk->sk_err at the end
of tcp_poll. Additionally, ensure the correct order of operations on SMP
machines with memory barriers.
Signed-off-by: Tom Marshall <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Just use explicit casts, since we really can't change the
types of structures exported to userspace which have been
around for 15 years or so.
Reported-by: Dan Rosenberg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The family parameter xfrm_state_find is used to find a state matching a
certain policy. This value is set to the template's family
(encap_family) right before xfrm_state_find is called.
The family parameter is however also used to construct a temporary state
in xfrm_state_find itself which is wrong for inter-family scenarios
because it produces a selector for the wrong family. Since this selector
is included in the xfrm_user_acquire structure, user space programs
misinterpret IPv6 addresses as IPv4 and vice versa.
This patch splits up the original init_tempsel function into a part that
initializes the selector respectively the props and id of the temporary
state, to allow for differing ip address families whithin the state.
Signed-off-by: Thomas Egerer <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|