Age | Commit message (Collapse) | Author | Files | Lines |
|
Use correct allocation flags during copy of user space fragments
to the kernel. Also "improve" couple of for loops.
Signed-off-by: Krishna Kumar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Jiri Pirko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
there are some out of bound accesses in netprio cgroup.
now before accessing the dev->priomap.priomap array,we only check
if the dev->priomap exist.and because we don't want to see
additional bound checkings in fast path, so we should make sure
that dev->priomap is null or array size of dev->priomap.priomap
is equal to max_prioidx + 1;
so in write_priomap logic,we should call extend_netdev_table when
dev->priomap is null and dev->priomap.priomap_len < max_len.
and in cgrp_create->update_netdev_tables logic,we should call
extend_netdev_table only when dev->priomap exist and
dev->priomap.priomap_len < max_len.
and it's not needed to call update_netdev_tables in write_priomap,
we can only allocate the net device's priomap which we change through
net_prio.ifpriomap.
this patch also add a return value for update_netdev_tables &
extend_netdev_table, so when new_priomap is allocated failed,
write_priomap will stop to access the priomap,and return -ENOMEM
back to the userspace to tell the user what happend.
Change From v3:
1. add rtnl protect when reading max_prioidx in write_priomap.
2. only call extend_netdev_table when map->priomap_len < max_len,
this will make sure array size of dev->map->priomap always
bigger than any prioidx.
3. add a function write_update_netdev_table to make codes clear.
Change From v2:
1. protect extend_netdev_table by RTNL.
2. when extend_netdev_table failed,call dev_put to reduce device's refcount.
Signed-off-by: Gao feng <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: Eric Dumazet <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Before this patch sock_diag works for init_net only and dumps
information about sockets from all namespaces.
This patch expands sock_diag for all name-spaces.
It creates a netlink kernel socket for each netns and filters
data during dumping.
v2: filter accoding with netns in all places
remove an unused variable.
Cc: "David S. Miller" <[email protected]>
Cc: Alexey Kuznetsov <[email protected]>
Cc: James Morris <[email protected]>
Cc: Hideaki YOSHIFUJI <[email protected]>
Cc: Patrick McHardy <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
CC: Eric Dumazet <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Andrew Vagin <[email protected]>
Acked-by: Pavel Emelyanov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in DMA zone.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Cc: David Miller <[email protected]>
Cc: Linus Torvalds <[email protected]>
Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]
|
|
This patch is meant to help improve performance by reducing the number of
locked operations required to allocate a frag on x86 and other platforms.
This is accomplished by using atomic_set operations on the page count
instead of calling get_page and put_page. It is based on work originally
provided by Eric Dumazet.
In addition it also helps to reduce memory overhead when using TCP. This
is done by recycling the page if the only holder of the frame is the
netdev_alloc_frag call itself. This can occur when skb heads are stolen by
either GRO or TCP and the driver providing the packets is using paged frags
to store all of the data for the packets.
Cc: Eric Dumazet <[email protected]>
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This introduce TSQ (TCP Small Queues)
TSQ goal is to reduce number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.
sk->sk_wmem_alloc not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.
TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.
As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2 : It can help to reduce
latencies of high prio packets, having smaller TSO packets.
This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.
Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.
Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)
I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and both side socket autotuning no longer use 4 Mbytes.
As skb destructor cannot restart xmit itself ( as qdisc lock might be
taken at this point ), we delegate the work to a tasklet. We use one
tasklest per cpu for performance reasons.
If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.
[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
but some drivers call it in their start_xmit() handler.
These drivers should at least use BQL, or else a single TCP
session can still fill the whole NIC TX ring, since TSQ will
have no effect.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Dave Taht <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Matt Mathis <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Nandita Dukkipati <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
net/batman-adv/bridge_loop_avoidance.c
net/batman-adv/bridge_loop_avoidance.h
net/batman-adv/soft-interface.c
net/mac80211/mlme.c
With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).
The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix incorrect start markers, wrapped summary lines, missing section
breaks, incorrect separators, and some name mismatches.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Defining a function with no parameters as 'T foo()' is the deprecated
K&R style, and is not strictly equivalent to defining it as 'T foo(void)'.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Nobody provides non-zero values any longer.
Signed-off-by: David S. Miller <[email protected]>
|
|
poitner -> pointer.
Signed-off-by: Liu Bo <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
dev->priomap is allocated by extend_netdev_table() called from
update_netdev_tables().
And this is only called if write_priomap() is called.
But if write_priomap() is not called, it seems we can have out of bounds
accesses in cgrp_destroy(), read_priomap() & skb_update_prio()
With help from Gao Feng
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neil Horman <[email protected]>
Cc: Gao feng <[email protected]>
Acked-by: Gao feng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
we set max_prioidx to the first zero bit index of prioidx_map in
function get_prioidx.
So when we delete the low index netprio cgroup and adding a new
netprio cgroup again,the max_prioidx will be set to the low index.
when we set the high index cgroup's net_prio.ifpriomap,the function
write_priomap will call update_netdev_tables to alloc memory which
size is sizeof(struct netprio_map) + sizeof(u32) * (max_prioidx + 1),
so the size of array that map->priomap point to is max_prioidx +1,
which is low than what we actually need.
fix this by adding check in get_prioidx,only set max_prioidx when
max_prioidx low than the new prioidx.
Signed-off-by: Gao feng <[email protected]>
Acked-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
Most multi-queue networking driver consider the number of online cpus when
configuring RSS queues.
This patch adds a wrapper to the number of cpus, setting an upper limit on the
number of cpus a driver should consider (by default) when allocating resources
for his queues.
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
No longer used.
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: David S. Miller <[email protected]>
|
|
When a dst_confirm() happens, mark the confirmation as pending in the
dst. Then on the next packet out, when we have the neigh in-hand, do
the update.
This removes the dependency in dst_confirm() of dst's having an
attached neigh.
While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL. So just fix those cases up.
Signed-off-by: David S. Miller <[email protected]>
|
|
Do not use the dst cached neigh, we'll be getting rid of that.
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking update from David Miller:
1) Fix RX sequence number handling in mwifiex, from Stone Piao.
2) Netfilter ipset mis-compares device names, fix from Florian
Westphal.
3) Fix route leak in ipv6 IPVS, from Eric Dumazet.
4) NFS fixes. Several buffer overflows in NCI layer from Dan
Rosenberg, and release sock OOPS'er fix from Eric Dumazet.
5) Fix WEP handling ath9k, we started using a bit the chip provides to
indicate undecrypted packets but that bit turns out to be unreliable
in certain configurations. Fix from Felix Fietkau.
6) Fix Kconfig dependency bug in wlcore, from Randy Dunlap.
7) New USB IDs for rtlwifi driver from Larry Finger.
8) Fix crashes in qmi_wwan usbnet driver when disconnecting, from Bjørn
Mork.
9) Gianfar driver programs coalescing settings properly in single queue
mode, but does not do so in multi-queue mode. Fix from Claudiu
Manoil.
10) Missing module.h include in davinci_cpdma.c, from Daniel Mack.
11) Need dummy handler for IPSET_CMD_NONE otherwise we crash in ipset if
we get this via nfnetlink, fix from Tomasz Bursztyka.
12) Missing RCU unlock in nfnetlink error path, also from Tomasz.
13) Fix divide by zero in igbvf when the user tries to set an RX
coalescing value of 0 usecs, from Mitch A Williams.
14) We can process SCTP sacks for the wrong transport, oops. Fix from
Neil Horman.
15) Remove hw IP payload checksumming from e1000e driver. This has zery
value in our stack, and turning it on creates a very unintuitive
restriction for users when using jumbo MTUs.
Specifically, when IP payload checksums are on you cannot use both
receive hashing offload and jumbo MTU. Fix from Bruce Allan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
e1000e: remove use of IP payload checksum
sctp: be more restrictive in transport selection on bundled sacks
igbvf: fix divide by zero
netfilter: nfnetlink: fix missing rcu_read_unlock in nfnetlink_rcv_msg
netfilter: ipset: fix crash if IPSET_CMD_NONE command is sent
davinci_cpdma: include linux/module.h
gianfar: Fix RXICr/TXICr programming for multi-queue mode
net: Downgrade CAP_SYS_MODULE deprecated message from error to warning.
net: qmi_wwan: fix Oops while disconnecting
mwifiex: fix memory leak associated with IE manamgement
ath9k: fix panic caused by returning a descriptor we have queued for reuse
mac80211: correct behaviour on unrecognised action frames
ath9k: enable serialize_regmode for non-PCIE AR9287
rtlwifi: rtl8192cu: New USB IDs
NFC: Return from rawsock_release when sk is NULL
iwlwifi: fix activating inactive stations
wlcore: drop INET dependency
ath9k: fix dynamic WEP related regression
NFC: Prevent multiple buffer overflows in NCI
netfilter: update location of my trees
...
|
|
Pull block bits from Jens Axboe:
"As vacation is coming up, thought I'd better get rid of my pending
changes in my for-linus branch for this iteration. It contains:
- Two patches for mtip32xx. Killing a non-compliant sysfs interface
and moving it to debugfs, where it belongs.
- A few patches from Asias. Two legit bug fixes, and one killing an
interface that is no longer in use.
- A patch from Jan, making the annoying partition ioctl warning a bit
less annoying, by restricting it to !CAP_SYS_RAWIO only.
- Three bug fixes for drbd from Lars Ellenberg.
- A fix for an old regression for umem, it hasn't really worked since
the plugging scheme was changed in 3.0.
- A few fixes from Tejun.
- A splice fix from Eric Dumazet, fixing an issue with pipe
resizing."
* 'for-linus' of git://git.kernel.dk/linux-block:
scsi: Silence unnecessary warnings about ioctl to partition
block: Drop dead function blk_abort_queue()
block: Mitigate lock unbalance caused by lock switching
block: Avoid missed wakeup in request waitqueue
umem: fix up unplugging
splice: fix racy pipe->buffers uses
drbd: fix null pointer dereference with on-congestion policy when diskless
drbd: fix list corruption by failing but already aborted reads
drbd: fix access of unallocated pages and kernel panic
xen/blkfront: Add WARN to deal with misbehaving backends.
blkcg: drop local variable @q from blkg_destroy()
mtip32xx: Create debugfs entries for troubleshooting
mtip32xx: Remove 'registers' and 'flags' from sysfs
blkcg: fix blkg_alloc() failure path
block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
block: fix return value on cfq_init() failure
mtip32xx: Remove version.h header file inclusion
xen/blkback: Copy id field when doing BLKIF_DISCARD.
|
|
This patch adds the following structure:
struct netlink_kernel_cfg {
unsigned int groups;
void (*input)(struct sk_buff *skb);
struct mutex *cb_mutex;
};
That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.
I've populated this structure by looking for NULL and zero parameters at the
existing code. The remaining parameters that always need to be set are still
left in the original interface.
That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.
This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If rpfilter is off (or the SKB has an IPSEC path) and there are not
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.
We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.
Having a way to know whether we need no tclassid processing or not
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
drivers/net/caif/caif_hsi.c
drivers/net/usb/qmi_wwan.c
The qmi_wwan merge was trivial.
The caif_hsi.c, on the other hand, was not. It's a conflict between
1c385f1fdf6f9c66d982802cd74349c040980b50 ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19cd0a30be93794aa4773c779c3eb1f3 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.
I did my best with that one and will ask Sjur to check it out.
Signed-off-by: David S. Miller <[email protected]>
|
|
Make logging level consistent with other deprecation messages in net
subsystem.
Signed-off-by: Vinson Lee <[email protected]>
Cc: David Mackey <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
dropwatch wrongly diagnose all received UDP packets as drops.
This patch removes trace_kfree_skb() done in skb_free_datagram_locked().
Locations calling skb_free_datagram_locked() should do it on their own.
As a result, drops are accounted on the right function.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Removes all RTA_GET*() and RTA_PUT*() variations, as well as the
the unused rtattr_strcmp(). Get rid of rtm_get_table() by moving
it to its only user decnet.
Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Input packet processing for local sockets involves two major demuxes.
One for the route and one for the socket.
But we can optimize this down to one demux for certain kinds of local
sockets.
Currently we only do this for established TCP sockets, but it could
at least in theory be expanded to other kinds of connections.
If a TCP socket is established then it's identity is fully specified.
This means that whatever input route was used during the three-way
handshake must work equally well for the rest of the connection since
the keys will not change.
Once we move to established state, we cache the receive packet's input
route to use later.
Like the existing cached route in sk->sk_dst_cache used for output
packets, we have to check for route invalidations using dst->obsolete
and dst->ops->check().
Early demux occurs outside of a socket locked section, so when a route
invalidation occurs we defer the fixup of sk->sk_rx_dst until we are
actually inside of established state packet processing and thus have
the socket locked.
Signed-off-by: David S. Miller <[email protected]>
|
|
Conflicts:
net/ipv6/route.c
This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade8e1783c5d453948289e4a978d417c9 which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.
Signed-off-by: David S. Miller <[email protected]>
|
|
Orphaning skb in dev_hard_start_xmit() makes bonding behavior
unfriendly for applications sending big UDP bursts : Once packets
pass the bonding device and come to real device, they might hit a full
qdisc and be dropped. Without orphaning, the sender is automatically
throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming
sk_sndbuf is not too big)
We could try to defer the orphaning adding another test in
dev_hard_start_xmit(), but all this seems of little gain,
now that BQL tends to make packets more likely to be parked
in Qdisc queues instead of NIC TX ring, in cases where performance
matters.
Reverts commits :
fc6055a5ba31 net: Introduce skb_orphan_try()
87fd308cfc6b net: skb_tx_hash() fix relative to skb_orphan_try()
and removes SKBTX_DRV_NEEDS_SK_REF flag
Reported-and-bisected-by: Jean-Michel Hautbois <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Oliver Hartkopp <[email protected]>
Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Bogdan Hamciuc diagnosed and fixed following bug in netpoll_send_udp() :
"skb->len += len;" instead of "skb_put(skb, len);"
Meaning that _if_ a network driver needs to call skb_realloc_headroom(),
only packet headers would be copied, leaving garbage in the payload.
However the skb_realloc_headroom() must be avoided as much as possible
since it requires memory and netpoll tries hard to work even if memory
is exhausted (using a pool of preallocated skbs)
It appears netpoll_send_udp() reserved 16 bytes for the ethernet header,
which happens to work for typicall drivers but not all.
Right thing is to use LL_RESERVED_SPACE(dev)
(And also add dev->needed_tailroom of tailroom)
This patch combines both fixes.
Many thanks to Bogdan for raising this issue.
Reported-by: Bogdan Hamciuc <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Tested-by: Bogdan Hamciuc <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Neil Horman <[email protected]>
Reviewed-by: Neil Horman <[email protected]>
Reviewed-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()
commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.
Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.
Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.
splice_shrink_spd() loses its struct pipe_inode_info argument.
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: stable <[email protected]> # 2.6.35
Tested-by: Dave Jones <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Conflicts:
MAINTAINERS
drivers/net/wireless/iwlwifi/pcie/trans.c
The iwlwifi conflict was resolved by keeping the code added
in 'net' that turns off the buggy chip feature.
The MAINTAINERS conflict was merely overlapping changes, one
change updated all the wireless web site URLs and the other
changed some GIT trees to be Johannes's instead of John's.
Signed-off-by: David S. Miller <[email protected]>
|
|
'Get' commands should generally not require CAP_NET_ADMIN, with
the exception of those that expose internal state.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add dev_loopback_xmit() in order to deduplicate functions
ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and
ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c).
I was about to reinvent the wheel when I noticed that
ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I
need and are not IP-only functions, but they were not available to reuse
elsewhere.
ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I
understand that this is harmless, and should be in dev_loopback_xmit().
Signed-off-by: Michel Machado <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: Alexey Kuznetsov <[email protected]>
CC: James Morris <[email protected]>
CC: Hideaki YOSHIFUJI <[email protected]>
CC: Patrick McHardy <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Jiri Pirko <[email protected]>
CC: "Michał Mirosław" <[email protected]>
CC: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
|
|
Fix kernel-doc warnings in net/core:
Warning(net/core/skbuff.c:3368): No description found for parameter 'delta_truesize'
Warning(net/core/filter.c:628): No description found for parameter 'pfp'
Warning(net/core/filter.c:628): Excess function parameter 'sk' description in 'sk_unattached_filter_create'
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch extends the kernel's ethtool interface by adding support
for 2 new EEE commands - get_eee and set_eee.
Thanks goes to Giuseppe Cavallaro for his original patch adding this support.
Signed-off-by: Yuval Mintz <[email protected]>
Signed-off-by: Eilon Greenstein <[email protected]>
Reviewed-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
__alloc_skb() now extends tailroom to allow the use of padding added
by the heap allocator.
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Denys found out "ip neigh" output was truncated to
about 54 neighbours.
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Denys Fedoryshchenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
|
|
The only user of this was hal prior to its 0.5.12
release which happened over two years ago, so I'm
sure this can be removed without issues.
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
drop_monitor calls several sleeping functions while in atomic context.
BUG: sleeping function called from invalid context at mm/slub.c:943
in_atomic(): 1, irqs_disabled(): 0, pid: 2103, name: kworker/0:2
Pid: 2103, comm: kworker/0:2 Not tainted 3.5.0-rc1+ #55
Call Trace:
[<ffffffff810697ca>] __might_sleep+0xca/0xf0
[<ffffffff811345a3>] kmem_cache_alloc_node+0x1b3/0x1c0
[<ffffffff8105578c>] ? queue_delayed_work_on+0x11c/0x130
[<ffffffff815343fb>] __alloc_skb+0x4b/0x230
[<ffffffffa00b0360>] ? reset_per_cpu_data+0x160/0x160 [drop_monitor]
[<ffffffffa00b022f>] reset_per_cpu_data+0x2f/0x160 [drop_monitor]
[<ffffffffa00b03ab>] send_dm_alert+0x4b/0xb0 [drop_monitor]
[<ffffffff810568e0>] process_one_work+0x130/0x4c0
[<ffffffff81058249>] worker_thread+0x159/0x360
[<ffffffff810580f0>] ? manage_workers.isra.27+0x240/0x240
[<ffffffff8105d403>] kthread+0x93/0xa0
[<ffffffff816be6d4>] kernel_thread_helper+0x4/0x10
[<ffffffff8105d370>] ? kthread_freezable_should_stop+0x80/0x80
[<ffffffff816be6d0>] ? gs_change+0xb/0xb
Rework the logic to call the sleeping functions in right context.
Use standard timer/workqueue api to let system chose any cpu to perform
the allocation and netlink send.
Also avoid a loop if reset_per_cpu_data() cannot allocate memory :
use mod_timer() to wait 1/10 second before next try.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Neil Horman <[email protected]>
Reviewed-by: Neil Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Adding socket backlog len in INET_DIAG_SKMEMINFO is really useful to
diagnose various TCP problems.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We need to validate the number of pages consumed by data_len, otherwise frags
array could be overflowed by userspace. So this patch validate data_len and
return -EMSGSIZE when data_len may occupies more frags than MAX_SKB_FRAGS.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that we have module alias macros for generic netlink families, lets use
those to mark modules with the appropriate family names for loading
Signed-off-by: Neil Horman <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: David Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
|