Age | Commit message (Collapse) | Author | Files | Lines |
|
This commit removes kfree_rcu() special-casing and the lazy-callback
handling from Tree RCU. It moves some of this special casing to Tiny RCU,
the removal of which will be the subject of later commits.
This results in a nice negative delta.
Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Add slab.h #include, thanks to kbuild test robot <[email protected]>. ]
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
Recently a discussion about stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1]
which led to another discussion how to prepare for this situation.
This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read lesser data since it does not
need to deal with large rcu_head(s) any longer.
Torture tests follow in the next patch and show improvements of around
5x reduction in number of grace periods on a 16 CPU system. More
details and test data are in that patch.
There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and may not be handed yet to the RCU
machinery in fact, the monitor may not have even run yet to do the
queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
to wait for those kfree_rcu()s that are already made. So this means a
kfree_rcu() followed by an rcu_barrier() does not imply that memory will
be freed once rcu_barrier() returns.
Another implication is higher active memory usage (although not
run-away..) until the kfree_rcu() flooding ends, in comparison to
without batching. More details about this are in the second patch which
adds an rcuperf test.
Finally, in the near future we will get rid of kfree_rcu() special casing
within RCU such as in rcu_do_batch and switch everything to just
batching. Currently we don't do that since timer subsystem is not yet up
and we cannot schedule the kfree_rcu() monitor as the timer subsystem's
lock are not initialized. That would also mean getting rid of
kfree_call_rcu_nobatch() entirely.
[1] http://lore.kernel.org/lkml/[email protected]
[2] https://lkml.org/lkml/2017/12/19/824
Cc: [email protected]
Cc: [email protected]
Co-developed-by: Byungchul Park <[email protected]>
Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/[email protected]/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
This implements MP_CAPABLE options parsing and writing according
to RFC 6824 bis / RFC 8684: MPTCP v1.
Local key is sent on syn/ack, and both keys are sent on 3rd ack.
MP_CAPABLE messages len are updated accordingly. We need the skbuff to
correctly emit the above, so we push the skbuff struct as an argument
all the way from tcp code to the relevant mptcp callbacks.
When processing incoming MP_CAPABLE + data, build a full blown DSS-like
map info, to simplify later processing. On child socket creation, we
need to record the remote key, if available.
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Parses incoming DSS options and populates outgoing MPTCP ACK
fields. MPTCP fields are parsed from the TCP option header and placed in
an skb extension, allowing the upper MPTCP layer to access MPTCP
options after the skb has gone through the TCP stack.
The subflow implements its own data_ready() ops, which ensures that the
pending data is in sequence - according to MPTCP seq number - dropping
out-of-seq skbs. The DATA_READY bit flag is set if this is the case.
This allows the MPTCP socket layer to determine if more data is
available without having to consult the individual subflows.
It additionally validates the current mapping and propagates EoF events
to the connection socket.
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Co-developed-by: Peter Krystad <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Co-developed-by: Davide Caratti <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Per-packet metadata required to write the MPTCP DSS option is written to
the skb_ext area. One write to the socket may contain more than one
packet of data, which is copied to page fragments and mapped in to MPTCP
DSS segments with size determined by the available page fragments and
the maximum mapping length allowed by the MPTCP specification. If
do_tcp_sendpages() splits a DSS segment in to multiple skbs, that's ok -
the later skbs can either have duplicated DSS mapping information or
none at all, and the receiver can handle that.
The current implementation uses the subflow frag cache and tcp
sendpages to avoid excessive code duplication. More work is required to
ensure that it works correctly under memory pressure and to support
MPTCP-level retransmissions.
The MPTCP DSS checksum is not yet implemented.
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Co-developed-by: Peter Krystad <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request,
to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to
the final ACK of the three-way handshake.
Use the .sk_rx_dst_set() handler in the subflow proto to capture when the
responding SYN-ACK is received and notify the MPTCP connection layer.
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use ULP to associate a subflow_context structure with each TCP subflow
socket. Creating these sockets requires new bind and connect functions
to make sure ULP is set up immediately when the subflow sockets are
created.
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Co-developed-by: Davide Caratti <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add hooks to parse and format the MP_CAPABLE option.
This option is handled according to MPTCP version 0 (RFC6824).
MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in
coordination with related code changes.
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Co-developed-by: Davide Caratti <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Implements the infrastructure for MPTCP sockets.
MPTCP sockets open one in-kernel TCP socket per subflow. These subflow
sockets are only managed by the MPTCP socket that owns them and are not
visible from userspace. This commit allows a userspace program to open
an MPTCP socket with:
sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
The resulting socket is simply a wrapper around a single regular TCP
socket, without any of the MPTCP protocol implemented over the wire.
Co-developed-by: Florian Westphal <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Co-developed-by: Peter Krystad <[email protected]>
Signed-off-by: Peter Krystad <[email protected]>
Co-developed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Co-developed-by: Paolo Abeni <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The first per-vlan option added is state, it is needed for EVPN and for
per-vlan STP. The state allows to control the forwarding on per-vlan
basis. The vlan state is considered only if the port state is forwarding
in order to avoid conflicts and be consistent. br_allowed_egress is
called only when the state is forwarding, but the ingress case is a bit
more complicated due to the fact that we may have the transition between
port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still
allow the bridge to learn from the packet after vlan filtering and it will
be dropped after that. Also to optimize the pvid state check we keep a
copy in the vlan group to avoid one lookup. The state members are
modified with *_ONCE() to annotate the lockless access.
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds support for option modification of single vlans and
ranges. It allows to only modify options, i.e. skip create/delete by
using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range
option changes we try to pack the notifications as much as possible.
v2: do full port (all vlans) notification only when creating/deleting
vlans for compatibility, rework the range detection when changing
options, add more verbose extack errors and check if a vlan should
be used (br_vlan_should_use checks)
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add kvm_vcpu_destroy() and wire up all architectures to call the common
function instead of their arch specific implementation. The common
destruction function will be used by future patches to move allocation
and initialization of vCPUs to common KVM code, i.e. to free resources
that are allocated by arch agnostic code.
No functional change intended.
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Add a pre-allocation arch hook to handle checks that are currently done
by arch specific code prior to allocating the vCPU object. This paves
the way for moving the allocation to common KVM code.
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Cornelia Huck <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove KVM's declaration of kvm_arch_vcpu_free() now that the function
is gone from all architectures (several architectures were relying on
the forward declaration).
Acked-by: Christoffer Dall <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
There are a lot of similar global registers being used across multiple SoCs
from Unisoc. But most of these registers are assigned with different offset
for different SoCs. It is hard to handle all of them in an all-in-one
kernel image.
Add a helper function to get regmap with arguments where we could put some
extra information such as the offset value.
Signed-off-by: Orson Zhai <[email protected]>
Tested-by: Baolin Wang <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
The DSI PLLs are handled by the generic clock framework
since ages, this code is completely unused and misleading.
Delete it.
Cc: Stephan Gerhold <[email protected]>
Cc: Ulf Hansson <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
The display clocks are handled by the generic clock framework
since ages, this code is completely unused and misleading.
Delete it.
Cc: Stephan Gerhold <[email protected]>
Cc: Ulf Hansson <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
ROHM BD71828 PMIC RTC block is from many parts similar to one
on BD70528. Support BD71828 RTC using BD70528 RTC driver and
avoid re-inventing the wheel.
Signed-off-by: Matti Vaittinen <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
When RTC is used in 24H mode (and it is by this driver) the maximum
hour value is 24 in BCD. This occupies bits [5:0] - which means
correct mask for HOUR register is 0x3f not 0x1f. Fix the mask
Fixes: 32a4a4ebf768 ("rtc: bd70528: Initial support for ROHM bd70528 RTC")
Signed-off-by: Matti Vaittinen <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
Few ROHM PMICs allow setting the voltage states for different system states
like RUN, IDLE, SUSPEND and LPSR. States are then changed via SoC specific
mechanisms. bd718x7 driver implemented device-tree parsing functions for
these state specific voltages. The parsing functions can be re-used by
other ROHM chip drivers like bd71828. Split the generic functions from
bd718x7-regulator.c to rohm-regulator.c and export them for other modules
to use.
Signed-off-by: Matti Vaittinen <[email protected]>
Acked-by: Mark Brown <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
BD71828GW is a single-chip power management IC for battery-powered portable
devices. Add support for controlling BD71828 clk using bd718x7 driver.
Signed-off-by: Matti Vaittinen <[email protected]>
Acked-by: Stephen Boyd <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
BD71828GW is a single-chip power management IC for battery-powered portable
devices. The IC integrates 7 buck converters, 7 LDOs, and a 1500 mA
single-cell linear charger. Also included is a Coulomb counter, a real-time
clock (RTC), 3 GPO/regulator control pins, HALL input and a 32.768 kHz
clock gate.
Add MFD core driver providing interrupt controller facilities and i2c
access to sub device drivers.
Signed-off-by: Matti Vaittinen <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
Thanks to Stephen Boyd I today learned we can use platform_device_id
to do device and module matching for MFD sub-devices!
Do device matching using the platform_device_id instead of using
explicit module_aliases to load modules and custom parent-data field
to do module loading and sub-device matching.
Cc: Stephen Boyd <[email protected]>
Signed-off-by: Matti Vaittinen <[email protected]>
Acked-by: Mark Brown <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
|
|
Currently it is not easy to find out which DMA channels are in use, and
which slave devices are using which channels.
Fix this by creating two symlinks between the DMA channel and the actual
slave device when a channel is requested:
1. A "slave" symlink from DMA channel to slave device,
2. A "dma:<name>" symlink slave device to DMA channel.
When the channel is released, the symlinks are removed again.
The latter requires keeping track of the slave device and the channel
name in the dma_chan structure.
Note that this is limited to channel request functions for requesting an
exclusive slave channel that take a device pointer (dma_request_chan()
and dma_request_slave_channel*()).
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Niklas Söderlund <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
|
|
The idxd driver introduces the Intel Data Stream Accelerator [1] that will
be available on future Intel Xeon CPUs. One of the kernel access
point for the driver is through the dmaengine subsystem. It will initially
provide the DMA copy service to the kernel.
Some of the main functionality introduced with this accelerator
are: shared virtual memory (SVM) support, and descriptor submission using
Intel CPU instructions movdir64b and enqcmds. There will be additional
accelerator devices that share the same driver with variations to
capabilities.
This commit introduces the probe and initialization component of the
driver.
[1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <[email protected]>
|
|
With the channel registration routines broken out, now add support code to
allow independent registering and unregistering of channels in a hotplug fashion.
Signed-off-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/157965023364.73301.7821862091077299040.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Vinod Koul <[email protected]>
|
|
If a chip is write protected, we can not change any limits, and we can
not clear status flags. This may be the reason why clearing status flags
is reported to not work for some chips. Detect the condition in the pmbus
core. If the chip is write protected, set limit attributes as read-only,
and set the flag indicating that the status flag should be ignored.
Signed-off-by: Guenter Roeck <[email protected]>
|
|
The hwmon ABI supports enable attributes since commit fb41a710f84e
("hwmon: Document the sensor enable attribute"), but did not
add support for those attributes to the hwmon core. Do that now.
Since the enable attributes are logically the most important attributes,
they are added as first attribute to the attribute list. Move
hwmon_in_enable from last to first place for consistency.
Signed-off-by: Guenter Roeck <[email protected]>
|
|
Add templates for intrusion%d_alarm and intrusion%d_beep.
Note, these start at 0.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>
|
|
Pull XArray fixes from Matthew Wilcox:
"Primarily bugfixes, mostly around handling index wrap-around
correctly.
A couple of doc fixes and adding missing APIs.
I had an oops live on stage at linux.conf.au this year, and it turned
out to be a bug in xas_find() which I can't prove isn't triggerable in
the current codebase. Then in looking for the bug, I spotted two more
bugs.
The bots have had a few days to chew on this with no problems
reported, and it passes the test-suite (which now has more tests to
make sure these problems don't come back)"
* tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax:
XArray: Add xa_for_each_range
XArray: Fix xas_find returning too many entries
XArray: Fix xa_find_after with multi-index entries
XArray: Fix infinite loop with entry at ULONG_MAX
XArray: Add wrappers for nested spinlocks
XArray: Improve documentation of search marks
XArray: Fix xas_pause at ULONG_MAX
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various tracing fixes:
- Fix a function comparison warning for a xen trace event macro
- Fix a double perf_event linking to a trace_uprobe_filter for
multiple events
- Fix suspicious RCU warnings in trace event code for using
list_for_each_entry_rcu() when the "_rcu" portion wasn't needed.
- Fix a bug in the histogram code when using the same variable
- Fix a NULL pointer dereference when tracefs lockdown enabled and
calling trace_set_default_clock()
- A fix to a bug found with the double perf_event linking patch"
* tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobe: Fix to make trace_uprobe_filter alignment safe
tracing: Do not set trace clock if tracefs lockdown is in effect
tracing: Fix histogram code when expression has same var as value
tracing: trigger: Replace unneeded RCU-list traversals
tracing/uprobe: Fix double perf_event linking on multiprobe uprobe
tracing: xen: Ordered comparison of function pointers
|
|
From: Dave Hansen <[email protected]>
MPX is being removed from the kernel due to a lack of support
in the toolchain going forward (gcc).
arch_bprm_mm_init() is used at execve() time. The only non-stub
implementation is on x86 for MPX. Remove the hook entirely from
all architectures and generic code.
Cc: Peter Zijlstra (Intel) <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: [email protected]
Cc: Linus Torvalds <[email protected]>
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Anton Ivanov <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Dave Hansen <[email protected]>
|
|
It's useful to dump written block and keys on btree write, this patch
add them into trace_bcache_btree_write.
Signed-off-by: Guoju Fang <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Avoid a pointless dependency on buffer heads in bcache by simply open
coding reading a single page. Also add a SB_OFFSET define for the
byte offset of the superblock instead of using magic numbers.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Split out an on-disk version struct cache_sb with the proper endianness
annotations. This fixes a fair chunk of sparse warnings, but there are
some left due to the way the checksum is defined.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of using the legacy GPIO API and keeping track on
polarity inversion semantics in the driver, switch to use
GPIO descriptors for this driver and change all consumers
in the process.
This makes it possible to retire platform data completely:
the only remaining platform data member was "wakeup" which
was intended to make the vbus interrupt wakeup capable,
but was not set by any users and thus remained unused. VBUS
was not waking any devices up. Leave a comment about it so
later developers using the platform can consider setting it
to always enabled so plugging in USB wakes up the platform.
Cc: Daniel Mack <[email protected]>
Cc: Haojian Zhuang <[email protected]>
Acked-by: Robert Jarzmik <[email protected]>
Acked-by: Felipe Balbi <[email protected]>
Acked-by: Sylwester Nawrocki <[email protected]>
Acked-by: Philipp Zabel <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
This patch fixes the following sparse errors:
kernel/module.c:3623:9: error: incompatible types in comparison expression
kernel/module.c:4060:41: error: incompatible types in comparison expression
kernel/module.c:4203:28: error: incompatible types in comparison expression
kernel/module.c:4225:41: error: incompatible types in comparison expression
Signed-off-by: Madhuparna Bhowmik <[email protected]>
Signed-off-by: Jessica Yu <[email protected]>
|
|
|
|
gpiochip_set_chained_irqchip() would assign a chained handler
to a GPIO chip. We now populate struct gpio_irq_chip for all
chained GPIO irqchips so drop this function.
Cc: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
|
|
|
|
|
|
|
|
Principles:
- Packets are classified on flows.
- This is a Stochastic model (as we use a hash, several flows might
be hashed to the same slot)
- Each flow has a PIE managed queue.
- Flows are linked onto two (Round Robin) lists,
so that new flows have priority on old ones.
- For a given flow, packets are not reordered.
- Drops during enqueue only.
- ECN capability is off by default.
- ECN threshold (if ECN is enabled) is at 10% by default.
- Uses timestamps to calculate queue delay by default.
Usage:
tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ]
[ target TIME ] [ tupdate TIME ]
[ alpha NUMBER ] [ beta NUMBER ]
[ quantum BYTES ] [ memory_limit BYTES ]
[ ecnprob PERCENTAGE ] [ [no]ecn ]
[ [no]bytemode ] [ [no_]dq_rate_estimator ]
defaults:
limit: 10240 packets, flows: 1024
target: 15 ms, tupdate: 15 ms (in jiffies)
alpha: 1/8, beta : 5/4
quantum: device MTU, memory_limit: 32 Mb
ecnprob: 10%, ecn: off
bytemode: off, dq_rate_estimator: off
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Sachin D. Patil <[email protected]>
Signed-off-by: V. Saicharan <[email protected]>
Signed-off-by: Mohit Bhasi <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch makes the drop_early(), calculate_probability() and
pie_process_dequeue() functions generic enough to be used by
both PIE and FQ-PIE (to be added in a future commit). The major
change here is in the way the functions take in arguments. This
patch exports these functions and makes FQ-PIE dependent on
sch_pie.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Improve the comments along with the commenting style used to
describe the members of the structures and their initial values
in the init functions.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Rearrange the members of the structure such that closely
referenced members appear together and/or fit in the same
cacheline. Also, change the order of their initializations to
match the order in which they appear in the structure.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Linux best practice recommends using u8 for true/false values in
structures.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Rearrange macros in order of length and align the values to
improve readability.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use the U64_MAX macro to denote the constant (2^64 - 1).
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch moves macros, structures and small functions common
to PIE and FQ-PIE (to be added in a future commit) from the file
net/sched/sch_pie.c to the header file include/net/pie.h.
All the moved functions are made inline.
Signed-off-by: Mohit P. Tahiliani <[email protected]>
Signed-off-by: Leslie Monis <[email protected]>
Signed-off-by: Gautam Ramakrishnan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|