Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch prepares p9_fcall structure for zero copy. Added
fields send the payload buffer information to the transport layer.
In addition it adds a 'private' field for the transport layer to
store mapped/pinned page information so that it can be freed/unpinned
during req_done.
This patch also creates trans_common.[ch] to house helper functions.
It adds the following helper functions.
p9_release_req_pages - Release pages after the transaction.
p9_nr_pages - Return number of pages needed to accomodate the payload.
payload_gup - Translates user buffer into kernel pages.
Signed-off-by: Venkateswararao Jujjuri <[email protected]>
Signed-off-by: Eric Van Hensbergen <[email protected]>
|
|
Structures ip6t_replace, compat_ip6t_replace, and xt_get_revision are
copied from userspace. Fields of these structs that are
zero-terminated strings are not checked. When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.
The first bug was introduced before the git epoch; the second was
introduced in 3bc3fe5e (v2.6.25-rc1); the third is introduced by
6b7d31fc (v2.6.15-rc1). To trigger the bug one should have
CAP_NET_ADMIN.
Signed-off-by: Vasiliy Kulikov <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace. Fields of these structs that are
zero-terminated strings are not checked. When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.
The first and the third bugs were introduced before the git epoch; the
second was introduced in 2722971c (v2.6.17-rc1). To trigger the bug
one should have CAP_NET_ADMIN.
Signed-off-by: Vasiliy Kulikov <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Structures ipt_replace, compat_ipt_replace, and xt_get_revision are
copied from userspace. Fields of these structs that are
zero-terminated strings are not checked. When they are used as argument
to a format string containing "%s" in request_module(), some sensitive
information is leaked to userspace via argument of spawned modprobe
process.
The first bug was introduced before the git epoch; the second is
introduced by 6b7d31fc (v2.6.15-rc1); the third is introduced by
6b7d31fc (v2.6.15-rc1). To trigger the bug one should have
CAP_NET_ADMIN.
Signed-off-by: Vasiliy Kulikov <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
A potential race condition when generating connlimit_rnd is also fixed.
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
The header of hlist is smaller than list.
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
All the members are initialized after kzalloc().
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
We use the reply tuples when limiting the connections by the destination
addresses, however, in SNAT scenario, the final reply tuples won't be
ready until SNAT is done in POSTROUING or INPUT chain, and the following
nf_conntrack_find_get() in count_tem() will get nothing, so connlimit
can't work as expected.
In this patch, the original tuples are always used, and an additional
member addr is appended to save the address in either end.
Signed-off-by: Changli Gao <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Just need to make sure that AF_UNIX garbage collector won't
confuse O_PATHed socket on filesystem for real AF_UNIX opened
socket.
Signed-off-by: Al Viro <[email protected]>
|
|
Break out the portions of __ip_vs_control_init() and
__ip_vs_control_cleanup() where aren't necessary when
CONFIG_SYSCTL is undefined.
Signed-off-by: Simon Horman <[email protected]>
|
|
ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them
are unnecessary when CONFIG_SYSCTL is undefined.
Signed-off-by: Simon Horman <[email protected]>
|
|
Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined.
I tried an approach of breaking the now #ifdef'ed portions out
into a separate function. However this appeared to grow the
compiled code on x86_64 by about 200 bytes in the case where
CONFIG_SYSCTL is defined. So I have gone with the simpler though
less elegant #ifdef'ed solution for now.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_lblc{r}_expiration in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_expire_quiescent_template in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_expire_nodest_conn in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_sync_ver in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_sync_threshold in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_nat_icmp_send in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
In preparation for not including sysctl_snat_reroute in
struct netns_ipvs when CONFIG_SYCTL is not defined.
Signed-off-by: Simon Horman <[email protected]>
|
|
Add ip_vs_route_me_harder() to avoid repeating the same code twice.
Signed-off-by: Simon Horman <[email protected]>
|
|
Rename ip_vs_new_estimator to ip_vs_start_estimator
and ip_vs_kill_estimator to ip_vs_stop_estimator to better
match their logic.
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
Move the estimator reading from estimation_timer to user
context. ip_vs_read_estimator() will be used to decode the rate
values. As the decoded rates are not set by estimation timer
there is no need to reset them in ip_vs_zero_stats.
There is no need ip_vs_new_estimator() to encode stats
to rates, if the destination is in trash both the stats and the
rates are inactive.
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
Currently, the new percpu counters are not zeroed and
the zero commands do not work as expected, we still show the old
sum of percpu values. OTOH, we can not reset the percpu counters
from user context without causing the incrementing to use old
and bogus values.
So, as Eric Dumazet suggested fix that by moving all overhead
to stats reading in user context. Do not introduce overhead in
timer context (estimator) and incrementing (packet handling in
softirqs).
The new ustats0 field holds the zero point for all
counter values, the rates always use 0 as base value as before.
When showing the values to user space just give the difference
between counters and the base values. The only drawback is that
percpu stats are not zeroed, they are accessible only from /proc
and are new interface, so it should not be a compatibility problem
as long as the sum stats are correct after zeroing.
Signed-off-by: Julian Anastasov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
The global tot_stats contains cpustats field just like the
stats for dest and svc, so better use it to simplify the usage
in estimation_timer. As tot_stats is registered as estimator
we can remove the special ip_vs_read_cpu_stats call for
tot_stats. Fix ip_vs_read_cpu_stats to be called under
stats lock because it is still used as synchronization between
estimation timer and user context (the stats readers).
Also, make sure ip_vs_stats_percpu_show reads properly
the u64 stats from user context.
Signed-off-by: Julian Anastasov <[email protected]>
Eric Dumazet <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
The semantic patch that makes this output is available
in scripts/coccinelle/api/memdup.cocci.
More information about semantic patching is available at
http://coccinelle.lip6.fr/
Signed-off-by: Shan Wei <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
ip_vs_read_cpu_stats is called only from timer, so
no need for _bh locks.
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
Restore the previous behaviour to lookup for fwmark
service only when fwmark is non-null. This saves only CPU.
Signed-off-by: Julian Anastasov <[email protected]>
Signed-off-by: Hans Schillstrom <[email protected]>
Signed-off-by: Simon Horman <[email protected]>
|
|
Signed-off-by: Mark Rustad <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
HyStart sets the initial exit point of slow start.
Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists.
If the BDP of a network is large, CUBIC's initial cwnd growth may be
too conservative to utilize the link.
CUBIC increases the cwnd 20% per RTT in this case.
Signed-off-by: Sangtae Ha <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make HyStart less sensitive to abrupt delay variations due to buffer bloat.
Signed-off-by: Sangtae Ha <[email protected]>
Acked-by: Stephen Hemminger <[email protected]>
Reported-by: Lucas Nussbaum <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is a refined version of an earlier patch by Lucas Nussbaum.
Cubic needs RTT values in milliseconds. If HZ < 1000 then
the values will be too coarse.
Signed-off-by: Stephen Hemminger <[email protected]>
Reported-by: Lucas Nussbaum <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The hystart code was written with assumption that HZ=1000.
Replace the use of jiffies with bictcp_clock as a millisecond
real time clock.
Signed-off-by: Stephen Hemminger <[email protected]>
Reported-by: Lucas Nussbaum <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make the spacing between ACK's that indicates a train a tuneable
value like other hystart values.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Jiffies wraps around therefore the correct way to compare is
to use cast to signed value.
Note: cubic is not using full jiffies value on 64 bit arch
because using full unsigned long makes struct bictcp grow too
large for the available ca_priv area.
Includes correction from Sangtae Ha to improve ack train detection.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the congestion control interface, the callback for each ACK
includes an estimated round trip time in microseconds.
Some algorithms need high resolution (Vegas style) but most only
need jiffie resolution. If RTT is not accurate (like a retransmission)
-1 is used as a flag value.
When doing coarse resolution if RTT is less than a a jiffie
then 0 should be returned rather than no estimate. Otherwise algorithms
that expect good ack's to trigger slow start (like CUBIC Hystart)
will be confused.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We latch our state using a spinlock not a r/w kind of lock.
Signed-off-by: Daniel Baluta <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If Spanning Tree Protocol is not enabled, there is no good reason for
the bridge code to wait for the forwarding delay period before enabling
the link. The purpose of the forwarding delay is to allow STP to
learn about other bridges before nominating itself.
The only possible impact is that when starting up a new port
the bridge may flood a packet now, where previously it might have
seen traffic from the other host and preseeded the forwarding table.
Includes change for local variable br already available in that func.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This makes the bridge device behave like a physical device.
In earlier releases the bridge always asserted carrier. This
changes the behavior so that bridge device carrier is on only
if one or more ports are in the forwarding state. This
should help IPv6 autoconfiguration, DHCP, and routing daemons.
I did brief testing with Network and Virt manager and they
seem fine, but since this changes behavior of bridge, it should
wait until net-next (2.6.39).
Signed-off-by: Stephen Hemminger <[email protected]>
Reviewed-by: Nicolas de Pesloüan <[email protected]>
Tested-By: Adam Majer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6
|
|
(bug introduced by commit 26ad787962ef84677a48c560
(pktgen: speedup fragmented skbs)
The headers of pktgen were incorrectly added in a pktgen packet
without frags (frags=0). There was an offset in the pktgen headers.
The cause was in reusing the pgh variable as a return variable in skb_put
when adding the payload to the skb.
Signed-off-by: Daniel Turull <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
|
|
When running an AP interface along with the cooked monitor interface created
by hostapd, adding an interface and deleting it again triggers a channel type
recalculation during which the (non-HT) monitor interface takes precedence
over the HT AP interface, thus causing the channel type to be set to non-HT.
Fix this by ensuring that a more wide channel type will not be overwritten
by a less wide channel type.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
Devices without multi rate retry support won't be able to use all rates
as specified by mintrel_ht. Hence, we can simply skip setting up further
rates as the devices will only use the first one.
Also add a special case for devices with only two possible tx rates. We
use sample_rate -> max_prob_rate for sampling and max_tp_rate ->
max_prob_rate by default.
Signed-off-by: Helmut Schaa <[email protected]>
Signed-off-by: John W. Linville <[email protected]>
|
|
Message in log because sysctl table was not empty at netns exit
WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c()
Instrumenting showed that the nf_conntrack_timestamp was the entry
that was being created but not cleared.
Signed-off-by: Stephen Hemminger <[email protected]>
Signed-off-by: Patrick McHardy <[email protected]>
|
|
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: NFSROOT should default to "proto=udp"
nfs4: remove duplicated #include
NFSv4: nfs4_state_mark_reclaim_nograce() should be static
NFSv4: Fix the setlk error handler
NFSv4.1: Fix the handling of the SEQUENCE status bits
NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
NFSv4.1 reclaim complete must wait for completion
NFSv4: remove duplicate clientid in struct nfs_client
NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
sunrpc: Propagate errors from xs_bind() through xs_create_sock()
(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
nfs: fix compilation warning
nfs: add kmalloc return value check in decode_and_add_ds
SUNRPC: Remove resource leak in svc_rdma_send_error()
nfs: close NFSv4 COMMIT vs. CLOSE race
SUNRPC: Close a race in __rpc_wait_for_completion_task()
|
|
As Stephen correctly points out, we need to return -ENOENT in
xt_find_match()/xt_find_target() after the patch "netfilter: x_tables:
misuse of try_then_request_module" in order to properly indicate
a non-existant module to the caller.
Signed-off-by: Patrick McHardy <[email protected]>
|
|
Remove bogus semicolon only recently introduced in 34e46258cb9f5
that blocks cleanup of nodes for N>1 on shutdown.
Signed-off-by: Paul Gortmaker <[email protected]>
|
|
all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()
Signed-off-by: Al Viro <[email protected]>
|
|
After commit 7b46ac4e77f3224a (inetpeer: Don't disable BH for initial
fast RCU lookup.), we should use call_rcu() to wait proper RCU grace
period.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds a netlink based user interface to configure
esn and big anti-replay windows. The new netlink attribute
XFRMA_REPLAY_ESN_VAL is used to configure the new implementation.
If the XFRM_STATE_ESN flag is set, we use esn and support for big
anti-replay windows for the configured state. If this flag is not
set we use the new implementation with 32 bit sequence numbers.
A big anti-replay window can be configured in this case anyway.
Signed-off-by: Steffen Klassert <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds support for IPsec extended sequence numbers (esn)
as defined in RFC 4303. The bits to manage the anti-replay window
are based on a patch from Alex Badea.
Signed-off-by: Steffen Klassert <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|