2020-01-24  net: Fix skb->csum update in inet_proto_csum_replace16().  (Praveen Chaudhary; 1 file changed, -3/+17)

skb->csum is updated incorrectly when manipulation for
NF_NAT_MANIP_SRC/DST is done on an IPv6 packet.

Fix: There is no need to update skb->csum in inet_proto_csum_replace16(),
because the updates to the two fields, a.) the IPv6 src/dst address and
b.) the L4 header checksum, cancel each other out in the skb->csum
calculation. Whereas inet_proto_csum_replace4() does need to update
skb->csum, because the updates to the three fields, a.) the IPv4 src/dst
address, b.) the IPv4 header checksum and c.) the L4 header checksum,
result in the same diff as the L4 header checksum for the skb->csum
calculation.

[ [email protected]: a few cosmetic documentation edits ]
Signed-off-by: Praveen Chaudhary <[email protected]>
Signed-off-by: Zhenggen Xu <[email protected]>
Signed-off-by: Andy Stracner <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
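The cancellation is plain one's-complement arithmetic and can be checked
in userspace. A minimal sketch, assuming a toy packet layout (csum16()
and everything else here is illustrative, not kernel code): the folded
sum of the whole buffer stays at 0xffff before and after the address
rewrite plus checksum fix-up, which is why the replace16 case has
nothing to fold into skb->csum.

    /* Toy model: words[0..7] are the "IPv6 address", words[8] is the
     * "L4 checksum" field covering them. Rewriting the address and
     * fixing up the checksum leaves the total one's-complement sum
     * unchanged (always 0xffff), so no skb->csum adjustment is needed. */
    #include <stdint.h>
    #include <stdio.h>

    static uint16_t csum16(const uint16_t *p, int n)
    {
            uint32_t s = 0;

            while (n--)
                    s += *p++;
            while (s >> 16)                 /* fold carries back in */
                    s = (s & 0xffff) + (s >> 16);
            return (uint16_t)s;
    }

    int main(void)
    {
            uint16_t pkt[9] = { 1, 2, 3, 4, 5, 6, 7, 8, 0 };
            uint16_t before, after;
            int i;

            pkt[8] = ~csum16(pkt, 8);       /* valid "L4 checksum" */
            before = csum16(pkt, 9);

            for (i = 0; i < 8; i++)         /* NAT rewrites the address */
                    pkt[i] += 0x1111;
            pkt[8] = ~csum16(pkt, 8);       /* replace16-style fix-up */
            after = csum16(pkt, 9);

            printf("before=%04x after=%04x\n", before, after);
            return 0;
    }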
2020-01-24  netfilter: nf_tables: autoload modules from the abort path  (Pablo Neira Ayuso; 4 files changed, -44/+91)

This patch introduces a list of pending module requests. This new module
list is composed of nft_module_request objects that contain the module
name and one status field that tells if the module has already been
loaded (the 'done' field).

In the first pass, from the preparation phase, the netlink command finds
that a module is missing on this list. Then, a module request is
allocated and added to this list and nft_request_module() returns
-EAGAIN. This triggers the abort path with the autoload parameter set on
from nfnetlink, request_module() is called and the module request enters
the 'done' state. Since the mutex is released when loading modules from
the abort phase, the module list is zapped so this iteration occurs over
a local list. Therefore, the request_module() calls happen when object
lists are in a consistent state (after fully aborting the transaction)
and the commit list is empty.

On the second pass, the netlink command will find that it already tried
to load the module, so it does not request it again and
nft_request_module() returns 0. Then, there is a look up to find the
object that the command was missing. If the module was successfully
loaded, the command proceeds normally since it finds the missing object
in place, otherwise -ENOENT is reported to userspace.

This patch also updates nfnetlink to include the reason to enter the
abort phase, which is required for this new autoload-module rationale.

Fixes: ec7470b834fe ("netfilter: nf_tables: store transaction list locally while requesting module")
Reported-by: [email protected]
Signed-off-by: Pablo Neira Ayuso <[email protected]>
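A minimal sketch of the two-pass request pattern described above. The
struct layout follows the commit text; the locking, the local-list
zapping and the abort-path request_module() call are elided, and the
other details are assumptions:

    /* Sketch only: two-pass module autoload via a pending-request list. */
    struct nft_module_request {
            struct list_head list;
            char             module[64];  /* MODULE_NAME_LEN in the kernel */
            bool             done;        /* set once request_module() ran */
    };

    static int nft_request_module_sketch(struct list_head *pending,
                                         const char *name)
    {
            struct nft_module_request *req;

            list_for_each_entry(req, pending, list)
                    if (!strcmp(req->module, name))
                            return req->done ? 0 : -EAGAIN;

            req = kmalloc(sizeof(*req), GFP_KERNEL);
            if (!req)
                    return -ENOMEM;
            strscpy(req->module, name, sizeof(req->module));
            req->done = false;            /* loaded later, from the abort path */
            list_add_tail(&req->list, pending);
            return -EAGAIN;               /* caller aborts, then retries */
    }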
2020-01-24  netfilter: nf_tables: add __nft_chain_type_get()  (Pablo Neira Ayuso; 1 file changed, -8/+21)

This new helper function validates that an unknown family and chain type
coming from userspace do not trigger an out-of-bounds array access. Bail
out in case __nft_chain_type_get() returns NULL from
nft_chain_parse_hook().

Fixes: 9370761c56b6 ("netfilter: nf_tables: convert built-in tables/chains to chain types")
Reported-by: [email protected]
Signed-off-by: Pablo Neira Ayuso <[email protected]>
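A sketch of the bounds check this helper performs, patterned on the
description above (the chain_type[][] table and the limit constants are
as named in nf_tables, but the snippet is illustrative):

    /* Sketch: reject out-of-range userspace input before indexing. */
    static const struct nft_chain_type *
    __nft_chain_type_get_sketch(u8 family, enum nft_chain_types type)
    {
            if (family >= NFPROTO_NUMPROTO || type >= NFT_CHAIN_T_MAX)
                    return NULL;    /* would otherwise index out of bounds */

            return chain_type[family][type];
    }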
2020-01-24  netfilter: nf_tables_offload: fix check the chain offload flag  (wenxu; 1 file changed, -1/+1)

In nft_indr_block_cb(), the chain should be checked against the
NFT_CHAIN_HW_OFFLOAD flag.

Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call")
Signed-off-by: wenxu <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-24  smp: Remove allocation mask from on_each_cpu_cond.*()  (Sebastian Andrzej Siewior; 6 files changed, -20/+11)

The allocation mask is no longer used by on_each_cpu_cond() and
on_each_cpu_cond_mask() and can be removed.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-24  smp: Add a smp_cond_func_t argument to smp_call_function_many()  (Sebastian Andrzej Siewior; 1 file changed, -43/+38)

on_each_cpu_cond_mask() allocates a new CPU mask. The newly allocated
mask is a subset of the provided mask based on the conditional function.
This memory allocation can be avoided by extending
smp_call_function_many() with the conditional function and performing
the remote function call based on the mask and the conditional function.

Rename smp_call_function_many() to smp_call_function_many_cond() and add
the smp_cond_func_t argument. If smp_cond_func_t is provided then it is
used before invoking the function. Provide smp_call_function_many() with
cond_func set to NULL. Let on_each_cpu_cond_mask() use
smp_call_function_many_cond().

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
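A sketch of the resulting shape, under the description above:
filter CPUs with an optional predicate inside the one cross-call,
instead of allocating an intermediate cpumask. queue_csd_and_ipi() is a
hypothetical stand-in for the IPI machinery the real
smp_call_function_many_cond() drives:

    /* Illustrative only; the real function handles preemption, the
     * local CPU and csd locking. */
    typedef bool (*smp_cond_func_t)(int cpu, void *info);

    static void smp_call_function_many_cond_sketch(const struct cpumask *mask,
                                                   smp_call_func_t func,
                                                   void *info, bool wait,
                                                   smp_cond_func_t cond_func)
    {
            unsigned int cpu;

            for_each_cpu(cpu, mask) {
                    if (cond_func && !cond_func(cpu, info))
                            continue;       /* predicate says skip this CPU */
                    queue_csd_and_ipi(cpu, func, info, wait); /* hypothetical */
            }
    }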
2020-01-24  smp: Use smp_cond_func_t as type for the conditional function  (Sebastian Andrzej Siewior; 3 files changed, -18/+16)

Use a typedef for the conditional function instead of defining it each
time in the function prototype.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-24  Merge tag 'irqchip-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core  (Thomas Gleixner; 26 files changed, -104/+1930)

Pull irqchip updates from Marc Zyngier:

 - Conversion of the SiFive PLIC to hierarchical domains
 - New SiFive GPIO irqchip driver
 - New Aspeed SCI irqchip driver
 - New NXP INTMUX irqchip driver
 - Additional support for the Meson A1 GPIO irqchip
 - First part of the GICv4.1 support
 - Assorted fixes
2020-01-24  Merge branches 'doc.2019.12.10a', 'exp.2019.12.09a', 'fixes.2020.01.24a', 'kfree_rcu.2020.01.24a', 'list.2020.01.10a', 'preempt.2020.01.24a' and 'torture.2019.12.09a' into HEAD  (Paul E. McKenney; 39 files changed, -575/+1070)

doc.2019.12.10a: Documentation updates
exp.2019.12.09a: Expedited grace-period updates
fixes.2020.01.24a: Miscellaneous fixes
kfree_rcu.2020.01.24a: Batch kfree_rcu() work
list.2020.01.10a: RCU-protected-list updates
preempt.2020.01.24a: Preemptible RCU updates
torture.2019.12.09a: Torture-test updates
2020-01-24  rcu: Remove unused stop-machine #include  (Paul E. McKenney; 2 files changed, -2/+0)

Long ago, RCU used the stop-machine mechanism to implement expedited
grace periods, but no longer does so. This commit therefore removes the
no-longer-needed #includes of linux/stop_machine.h.

Link: https://lwn.net/Articles/805317/
Reported-by: Viresh Kumar <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  powerpc: Remove comment about read_barrier_depends()  (Will Deacon; 1 file changed, -2/+0)

'read_barrier_depends()' doesn't exist anymore so stop talking about it.

Signed-off-by: Will Deacon <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  .mailmap: Add entries for old [email protected] addresses  (Paul E. McKenney; 1 file changed, -0/+4)

[ paulmck: Apply Florian Fainelli feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  srcu: Apply *_ONCE() to ->srcu_last_gp_end  (Paul E. McKenney; 1 file changed, -3/+4)

The ->srcu_last_gp_end field is accessed from any CPU at any time by
synchronize_srcu(), so non-initialization references need to use
READ_ONCE() and WRITE_ONCE(). This commit therefore makes that change.

Reported-by: [email protected]
Acked-by: Marco Elver <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
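The pattern, sketched on two illustrative helpers (the real change marks
the accesses in the SRCU grace-period code; function names here are made
up):

    /* Sketch: a field written under a lock but read locklessly from
     * any CPU pairs WRITE_ONCE() with READ_ONCE(). */
    static void record_gp_end(struct srcu_struct *ssp, unsigned long now)
    {
            /* caller holds ->lock */
            WRITE_ONCE(ssp->srcu_last_gp_end, now);
    }

    static unsigned long peek_gp_end(struct srcu_struct *ssp)
    {
            /* may run on any CPU, no lock held */
            return READ_ONCE(ssp->srcu_last_gp_end);
    }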
2020-01-24  rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask()  (Paul E. McKenney; 1 file changed, -8/+5)

Currently, force_qs_rnp() uses a for_each_leaf_node_possible_cpu() loop
containing a check of the current CPU's bit in ->qsmask. This works, but
this commit saves three lines by instead using
for_each_leaf_node_cpu_mask(), which combines the functionality of
for_each_leaf_node_possible_cpu() and leaf_node_cpu_bit(). This commit
also replaces the use of the local variable "bit" with rdp->grpmask.

Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Move rcu_{expedited,normal} definitions into rcupdate.h  (Ben Dooks; 2 files changed, -2/+4)

This commit moves the rcu_{expedited,normal} definitions from
kernel/rcu/update.c to include/linux/rcupdate.h to make sure they are in
sync, and also to avoid the following warning from sparse:

kernel/ksysfs.c:150:5: warning: symbol 'rcu_expedited' was not declared. Should it be static?
kernel/ksysfs.c:167:5: warning: symbol 'rcu_normal' was not declared. Should it be static?

Signed-off-by: Ben Dooks <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h  (Lai Jiangshan; 3 files changed, -22/+22)

Only tree_stall.h needs to get the name from the GP state, so this
commit moves the gp_state_names[] array and gp_state_getname() from
kernel/rcu/tree.h and kernel/rcu/tree.c, respectively, to
kernel/rcu/tree_stall.h. While moving gp_state_names[], this commit uses
the GCC syntax to ensure that the right string is associated with the
right CPP macro.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
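A sketch of the "GCC syntax" the commit refers to, i.e. designated array
initializers keyed by the CPP macros, so a reordering of the numeric
values can never mispair a name (state macros patterned on RCU's;
details illustrative):

    /* Sketch: each slot is tied to its macro by name, not position. */
    static const char * const gp_state_names[] = {
            [RCU_GP_IDLE]     = "RCU_GP_IDLE",
            [RCU_GP_WAIT_GPS] = "RCU_GP_WAIT_GPS",
            [RCU_GP_DONE_GPS] = "RCU_GP_DONE_GPS",
            [RCU_GP_CLEANUP]  = "RCU_GP_CLEANUP",
    };

    static const char *gp_state_getname_sketch(int gs)
    {
            if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names) ||
                !gp_state_names[gs])
                    return "???";   /* unknown or unnamed state */
            return gp_state_names[gs];
    }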
2020-01-24  rcu: Remove the declaration of call_rcu() in tree.h  (Lai Jiangshan; 1 file changed, -1/+0)

The call_rcu() function is an external RCU API that is declared in
include/linux/rcupdate.h. There is thus no point in redeclaring it in
kernel/rcu/tree.h, so this commit removes that redundant declaration.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Fix tracepoint tracking RCU CPU kthread utilization  (Lai Jiangshan; 1 file changed, -1/+1)

In the call to trace_rcu_utilization() at the start of the loop in
rcu_cpu_kthread(), "rcu_wait" is incorrect, plus this trace event needs
to be hoisted above the loop to balance with either the "rcu_wait" or
"rcu_yield", depending on how the loop exits. This commit therefore
makes these changes.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Fix harmless omission of "CONFIG_" from #if condition  (Lai Jiangshan; 1 file changed, -2/+2)

The C preprocessor macros SRCU and TINY_RCU should instead be
CONFIG_SRCU and CONFIG_TINY_RCU, respectively, in the #if condition in
kernel/rcu/rcu.h. But no harm is done by the erroneous "TINY_RCU", which
is never defined: it makes "!defined(TINY_RCU)" always true, so the code
block is always included, and the included code block doesn't cause any
compilation error so far in CONFIG_TINY_RCU builds. That is also the
reason this change should not be taken in -stable.

This commit adds the needed "CONFIG_" prefix to both macros. Not for
-stable.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
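A compressed illustration of the bug class (an assumed example, not the
patched lines): Kconfig symbols reach the preprocessor only with the
CONFIG_ prefix, so a bare symbol name is always undefined.

    #if !defined(TINY_RCU)          /* bug: always true, block always kept */
    #endif
    #if !defined(CONFIG_TINY_RCU)   /* intended test against Kconfig */
    #endif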
2020-01-24  rcu: Avoid tick_dep_set_cpu() misordering  (Paul E. McKenney; 1 file changed, -3/+9)

In the current code, rcu_nmi_enter_common() might decide to turn on the
tick using tick_dep_set_cpu(), but be delayed just before doing so. Then
the grace-period kthread might notice that the CPU in question had in
fact gone through a quiescent state, thus turning off the tick using
tick_dep_clear_cpu(). The later invocation of tick_dep_set_cpu() would
then incorrectly leave the tick on.

This commit therefore enlists the aid of the leaf rcu_node structure's
->lock to ensure that decisions to enable or disable the tick are
carried out before they can be reversed.

Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Provide wrappers for uses of ->rcu_read_lock_nesting  (Lai Jiangshan; 2 files changed, -21/+36)

This commit provides wrapper functions for uses of
->rcu_read_lock_nesting to improve readability and to ease future
changes to support inlining of __rcu_read_lock() and
__rcu_read_unlock().

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
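A sketch of what such wrappers can look like (the function names here
are guesses at the idea, not necessarily the kernel's final API):

    /* Sketch: hide direct ->rcu_read_lock_nesting accesses behind
     * accessors so later inlining work touches one place. */
    static inline void rcu_preempt_read_enter(void)
    {
            current->rcu_read_lock_nesting++;
    }

    static inline void rcu_preempt_read_exit(void)
    {
            current->rcu_read_lock_nesting--;
    }

    static inline int rcu_preempt_depth(void)
    {
            return current->rcu_read_lock_nesting;
    }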
2020-01-24  rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special()  (Paul E. McKenney; 1 file changed, -1/+1)

The rcu_node structure's ->expmask field is updated only when holding
the ->lock, but is also accessed locklessly. This means that all
->expmask updates must use WRITE_ONCE() and all reads carried out
without holding ->lock must use READ_ONCE(). This commit therefore
changes the lockless ->expmask read in rcu_read_unlock_special() to use
READ_ONCE().

Reported-by: [email protected]
Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Marco Elver <[email protected]>
2020-01-24  rcu: Clear ->rcu_read_unlock_special only once  (Lai Jiangshan; 1 file changed, -16/+3)

In rcu_preempt_deferred_qs_irqrestore(), ->rcu_read_unlock_special is
cleared one piece at a time. Given that the "if" statements in this
function use the copy in "special", this commit removes the clearing of
the individual pieces in favor of clearing ->rcu_read_unlock_special in
one go just after it has been determined to be non-zero.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Clear .exp_hint only when deferred quiescent state has been reported  (Lai Jiangshan; 1 file changed, -2/+1)

Currently, the .exp_hint flag is cleared in rcu_read_unlock_special(),
which works, but which can also prevent subsequent rcu_read_unlock()
calls from helping expedite the quiescent state needed by an ongoing
expedited RCU grace period. This commit therefore defers clearing of
.exp_hint from rcu_read_unlock_special() to
rcu_preempt_deferred_qs_irqrestore(), thus ensuring that intervening
calls to rcu_read_unlock() have a chance to help end the expedited grace
period.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU  (Lai Jiangshan; 2 files changed, -5/+5)

CONFIG_PREEMPTION and CONFIG_PREEMPT_RCU are always identical, but some
code depends on CONFIG_PREEMPTION to access rcu_preempt functionality.
This patch changes CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU in these
cases.

Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Lai Jiangshan <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Remove kfree_call_rcu_nobatch()  (Joel Fernandes (Google); 5 files changed, -33/+5)

Now that the kfree_rcu() special-casing has been removed from tree RCU,
this commit removes kfree_call_rcu_nobatch() since it is no longer
needed.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Remove kfree_rcu() special casing and lazy-callback handling  (Joel Fernandes (Google); 12 files changed, -159/+90)

This commit removes kfree_rcu() special-casing and the lazy-callback
handling from Tree RCU. It moves some of this special casing to Tiny
RCU, the removal of which will be the subject of later commits. This
results in a nice negative delta.

Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Add slab.h #include, thanks to kbuild test robot <[email protected]>. ]
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Add support for debug_objects debugging for kfree_rcu()  (Joel Fernandes (Google); 1 file changed, -0/+8)

This commit applies RCU's debug_objects debugging to the new batched
kfree_rcu() implementations. The object is queued at the kfree_rcu()
call and dequeued during reclaim.

Tested that enabling CONFIG_DEBUG_OBJECTS_RCU_HEAD successfully detects
double kfree_rcu() calls.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Fix IRQ per kbuild test robot <[email protected]> feedback. ]
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Add multiple in-flight batches of kfree_rcu() work  (Joel Fernandes (Google); 1 file changed, -12/+39)

During testing, it was observed that the amount of memory consumed due
to kfree_rcu() batching is 300-400MB. Previously we had only a single
head_free pointer pointing to the list of rcu_head(s) that are to be
freed after a grace period. Until this list is drained, we cannot queue
any more objects on it since such objects may not be ready to be
reclaimed when the worker thread eventually gets to draining the
head_free list.

We can do better by maintaining multiple lists as done by this patch.
Testing shows that memory consumption came down by around 100-150MB with
just adding another list. Adding more than 1 additional list did not
show any improvement.

Suggested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Code style and initialization handling. ]
[ paulmck: Fix field name, reported by kbuild test robot <[email protected]>. ]
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Make kfree_rcu() use a non-atomic ->monitor_todo  (Joel Fernandes; 1 file changed, -6/+10)

Because the ->monitor_todo field is always protected by krcp->lock, this
commit downgrades from xchg() to non-atomic unmarked assignment
statements.

Signed-off-by: Joel Fernandes <[email protected]>
[ paulmck: Update to include early-boot kick code. ]
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcuperf: Add kfree_rcu() performance Tests  (Joel Fernandes (Google); 2 files changed, -8/+190)

This test runs kfree_rcu() in a loop to measure performance of the new
kfree_rcu() batching functionality.

The following table shows results when booting with arguments:
rcuperf.kfree_loops=20000 rcuperf.kfree_alloc_num=8000
rcuperf.kfree_rcu_test=1 rcuperf.kfree_no_batch=X

rcuperf.kfree_no_batch=X   # Grace Periods   Test Duration (s)
  X=1 (old behavior)             9133              11.5
  X=0 (new behavior)             1732              12.5

On a 16 CPU system with the above boot parameters, we see that the total
number of grace periods that elapse during the test drops from 9133 when
not batching to 1732 when batching (a 5X improvement). The kfree_rcu()
flood itself slows down a bit when batching, though, as shown.

Note that the active memory consumption during the kfree_rcu() flood
does increase to around 200-250MB due to the batching (from around 50MB
without batching). However, this memory consumption is relatively
constant. In other words, the system is able to keep up with the
kfree_rcu() load. The memory consumption comes down considerably if
KFREE_DRAIN_JIFFIES is increased from HZ/50 to HZ/80. A later patch will
reduce memory consumption further by using multiple lists.

Also, when running the test, please disable CONFIG_DEBUG_PREEMPT and
CONFIG_PROVE_RCU for realistic comparisons with/without batching.

Signed-off-by: Joel Fernandes (Google) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-24  rcu: Add basic support for kfree_rcu() batching  (Byungchul Park; 4 files changed, -6/+206)

Recently a discussion about stability and performance of a system
involving a high rate of kfree_rcu() calls surfaced on the list [1],
which led to another discussion on how to prepare for this situation.

This patch adds basic batching support for kfree_rcu(). It is "basic"
because we do none of the slab management, dynamic allocation, code
moving or any of the other things, some of which previous attempts did
[2]. These fancier improvements can be follow-up patches and there are
different ideas being discussed in those regards. This is an effort to
start simple, and build up from there. In the future, an extension to
use kfree_bulk and possibly per-slab batching could be done to further
improve performance due to cache-locality and slab-specific bulk free
optimizations. By using an array of pointers, the worker thread
processing the work would need to read less data since it does not need
to deal with large rcu_head(s) any longer.

Torture tests follow in the next patch and show improvements of around
a 5x reduction in the number of grace periods on a 16 CPU system. More
details and test data are in that patch.

There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and in fact may not yet have been
handed to the RCU machinery (the monitor may not have even run yet to
do the queue_rcu_work()), there seems to be no easy way of implementing
rcu_barrier() to wait for those kfree_rcu()s that have already been
issued. So this means a kfree_rcu() followed by an rcu_barrier() does
not imply that memory will be freed once rcu_barrier() returns.

Another implication is higher active memory usage (although not
run-away..) until the kfree_rcu() flooding ends, in comparison to
without batching. More details about this are in the second patch, which
adds an rcuperf test.

Finally, in the near future we will get rid of kfree_rcu() special
casing within RCU such as in rcu_do_batch and switch everything to just
batching. Currently we don't do that since the timer subsystem is not
yet up and we cannot schedule the kfree_rcu() monitor as the timer
subsystem's locks are not initialized. That would also mean getting rid
of kfree_call_rcu_nobatch() entirely.

[1] http://lore.kernel.org/lkml/[email protected]
[2] https://lkml.org/lkml/2017/12/19/824

Cc: [email protected]
Cc: [email protected]
Co-developed-by: Byungchul Park <[email protected]>
Signed-off-by: Byungchul Park <[email protected]>
Signed-off-by: Joel Fernandes (Google) <[email protected]>
[ paulmck: Applied 0day and Paul Walmsley feedback on ->monitor_todo. ]
[ paulmck: Make it work during early boot. ]
[ paulmck: Add a crude early boot self-test. ]
[ paulmck: Style adjustments and experimental docbook structure header. ]
Link: https://lore.kernel.org/lkml/[email protected]/T/#me9956f66cb611b95d26ae92700e1d901f46e8c59
Signed-off-by: Paul E. McKenney <[email protected]>
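A sketch of the batching scheme described above, assuming the field
names from the commit text; the locking (irq-safety in particular) and
the monitor/drain path are heavily simplified:

    /* Sketch only: per-CPU batching state for kfree_rcu(). */
    struct kfree_rcu_cpu {
            struct rcu_head *head;        /* objects queued since last batch */
            struct rcu_head *head_free;   /* batch waiting for a grace period */
            struct rcu_work  rcu_work;    /* runs the free after the GP */
            spinlock_t       lock;
            bool             monitor_todo;
    };

    static void kfree_call_rcu_sketch(struct kfree_rcu_cpu *krcp,
                                      struct rcu_head *head)
    {
            spin_lock(&krcp->lock);
            head->next = krcp->head;      /* O(1) enqueue, no per-object GP */
            krcp->head = head;
            krcp->monitor_todo = true;    /* monitor drains the list after
                                           * KFREE_DRAIN_JIFFIES */
            spin_unlock(&krcp->lock);
    }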
2020-01-24  Merge tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  (Linus Torvalds; 2 files changed, -7/+20)

Pull iommu fixes from Joerg Roedel:
 "Two fixes:

  - Fix NULL-ptr dereference bug in Intel IOMMU driver

  - Properly save and restore AMD IOMMU performance counter registers
    when testing if they are writable"

* tag 'iommu-fixes-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix IOMMU perf counter clobbering during init
  iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer
2020-01-24  Merge tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds; 4 files changed, -9/+18)

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.5:

  - Fix our hash MMU code to avoid having overlapping ids between user
    and kernel, which isn't as bad as it sounds but led to crashes on
    some machines.

  - A fix for the Power9 XIVE interrupt code, which could return the
    wrong interrupt state in obscure error conditions.

  - A minor Kconfig fix for the recently added CONFIG_PPC_UV code.

  Thanks to Aneesh Kumar K.V, Bharata B Rao, Cédric Le Goater, Frederic
  Barrat"

* tag 'powerpc-5.5-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/hash: Fix sharing context ids between kernel & userspace
  powerpc/xive: Discard ESB load value when interrupt is invalid
  powerpc: Ultravisor: Fix the dependencies for CONFIG_PPC_UV
2020-01-24  Merge tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm  (Linus Torvalds; 15 files changed, -147/+396)

Pull drm fixes from Dave Airlie:
 "This one has a core mst fix and two i915 fixes. amdgpu just enables
  some hw outside experimental. The panfrost fix is a little bigger than
  I'd like at this stage but it fixes a fairly fundamental problem with
  global shared buffers in that driver, and since it's confined to that
  driver and I've taken a look at it, I think it's fine to get into the
  tree now, so it can get stable propagated as well.

  core/mst:
   - Fix SST branch device handling

  amdgpu:
   - enable renoir outside experimental

  i915:
   - Avoid overflow with huge userptr objects
   - uAPI fix to correctly handle negative values in
     engine->uabi_class/instance (cc: stable)

  panfrost:
   - Fix mapping of globally visible BO's (Boris)"

* tag 'drm-fixes-2020-01-24' of git://anongit.freedesktop.org/drm/drm:
  drm/amdgpu: remove the experimental flag for renoir
  drm/panfrost: Add the panfrost_gem_mapping concept
  drm/i915: Align engine->uabi_class/instance with i915_drm.h
  drm/i915/userptr: fix size calculation
  drm/dp_mst: Handle SST-only branch device case
2020-01-24  lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user()  (Christophe Leroy; 2 files changed, -14/+14)

The range passed to user_access_begin() by strncpy_from_user() and
strnlen_user() starts at 'src' and goes up to the limit of userspace,
although reads will be limited by the 'count' param.

On 32-bit powerpc (book3s/32), access has to be granted for each
256Mbyte segment, and the cost increases with the number of segments to
unlock.

Limit the range with the 'count' param.

Fixes: 594cc251fdd0 ("make 'user_access_begin()' do 'access_ok()'")
Signed-off-by: Christophe Leroy <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
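A sketch of the idea on strnlen_user() (illustrative: the 'max'
derivation is simplified and do_strnlen_user() is assumed to be the
existing inner helper):

    /* Sketch: open the user-access window only for the bytes that may
     * actually be read, instead of from 'src' to the top of userspace. */
    long strnlen_user_sketch(const char __user *src, long count)
    {
            unsigned long max = user_addr_max() - (unsigned long)src;
            long ret;

            if (max > count)        /* the fix: never open more than 'count' */
                    max = count;
            if (!user_access_begin(src, max))
                    return 0;
            ret = do_strnlen_user(src, count, max);
            user_access_end();
            return ret;
    }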
2020-01-24  netfilter: conntrack: sctp: use distinct states for new SCTP connections  (Jiri Wiesner; 1 file changed, -3/+3)

The netlink notifications triggered by the INIT and INIT_ACK chunks for
a tracked SCTP association do not include protocol information for the
corresponding connection - SCTP state and verification tags for the
original and reply direction are missing. Since the connection tracking
implementation allows user space programs to receive notifications about
a connection and then create a new connection based on the values
received in a notification, it makes sense that INIT and INIT_ACK
notifications should contain the SCTP state and verification tags
available at the time when a notification is sent. The missing
verification tags cause a newly created netfilter connection to fail to
verify the tags of SCTP packets when this connection has been created
from the values previously received in an INIT or INIT_ACK notification.

A PROTOINFO event is cached in sctp_packet() when the state of a
connection changes. The CLOSED and COOKIE_WAIT state will be used for
connections that have seen an INIT and INIT_ACK chunk, respectively. The
distinct states will cause a connection state change in sctp_packet().

Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-24  iommu/amd: Fix IOMMU perf counter clobbering during init  (Shuah Khan; 1 file changed, -6/+18)

init_iommu_perf_ctr() clobbers the register when it checks write access
to IOMMU perf counters and fails to restore it when they are writable.

Add save and restore to fix it.

Signed-off-by: Shuah Khan <[email protected]>
Fixes: 30861ddc9cca4 ("perf/x86/amd: Add IOMMU Performance Counter resource management")
Reviewed-by: Suravee Suthikulpanit <[email protected]>
Tested-by: Suravee Suthikulpanit <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
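The save/probe/restore pattern, sketched (iommu_pc_get_reg() and
iommu_pc_set_reg() follow the driver's accessor naming, but the snippet
is illustrative):

    /* Sketch: probe counter writability without clobbering its value. */
    static bool perf_ctr_is_writable(struct amd_iommu *iommu,
                                     u8 bank, u8 cntr, u8 fxn)
    {
            u64 saved, probe = 0xABCD, readback = 0;

            if (iommu_pc_get_reg(iommu, bank, cntr, fxn, &saved))
                    return false;           /* save the original value */
            if (iommu_pc_set_reg(iommu, bank, cntr, fxn, &probe) ||
                iommu_pc_get_reg(iommu, bank, cntr, fxn, &readback))
                    return false;
            iommu_pc_set_reg(iommu, bank, cntr, fxn, &saved); /* restore */
            return readback == probe;
    }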
2020-01-24  iommu/vt-d: Call __dmar_remove_one_dev_info with valid pointer  (Jerry Snitselaar; 1 file changed, -1/+2)

It is possible for archdata.iommu to be set to DEFER_DEVICE_DOMAIN_INFO
or DUMMY_DEVICE_DOMAIN_INFO, so check for those values before calling
__dmar_remove_one_dev_info(). Without a check, it can result in a null
pointer dereference. This has been seen while booting a kdump kernel on
an HP dl380 gen9.

Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Cc: David Woodhouse <[email protected]>
Cc: [email protected] # 5.3+
Cc: [email protected]
Fixes: ae23bfb68f28 ("iommu/vt-d: Detach domain before using a private one")
Signed-off-by: Jerry Snitselaar <[email protected]>
Acked-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
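A sketch of the guard described above (the sentinel macros are the
driver's own; the surrounding teardown logic is elided):

    /* Sketch: refuse to dereference sentinel "info" values. */
    static void dmar_remove_one_dev_info_sketch(struct device *dev)
    {
            struct device_domain_info *info = dev->archdata.iommu;

            if (!info || info == DEFER_DEVICE_DOMAIN_INFO ||
                info == DUMMY_DEVICE_DOMAIN_INFO)
                    return;         /* not a real pointer: don't dereference */
            __dmar_remove_one_dev_info(info);
    }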
2020-01-24  btrfs: scrub: Require mandatory block group RO for dev-replace  (Qu Wenruo; 1 file changed, -5/+28)

[BUG]
For dev-replace test cases with fsstress, like btrfs/06[45] and
btrfs/071, looped runs can lead to random failure, where scrub finds a
csum error.

The possibility is not high, around 1/20 to 1/100, but it's causing data
corruption.

The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't
check free space before marking a block group RO")

[CAUSE]
Dev-replace has two sources of writes:

- Write duplication
  All writes to the source device will also be duplicated to the target
  device.
  Content: Not yet persisted data/meta

- Scrub copy
  Dev-replace reuses scrub code to iterate through existing extents, and
  copies the verified data to the target device.
  Content: Previously persisted data and metadata

The difference in contents makes the following race possible:

  Regular Writer                |       Dev-replace
-----------------------------------------------------------------
  ^                             |
  | Preallocate one data extent |
  | at bytenr X, len 1M         |
  v                             |
  ^ Commit transaction          |
  | Now extent [X, X+1M) is in  |
  v commit root                 |
================== Dev replace starts =========================
                                | ^
                                | | Scrub extent [X, X+1M)
                                | | Read [X, X+1M)
                                | | (The content is mostly garbage
                                | |  since it's preallocated)
  ^                             | v
  | Write back happens for      |
  | extent [X, X+512K)          |
  | New data writes to both     |
  | source and target dev.      |
  v                             |
                                | ^
                                | | Scrub writes back extent [X, X+1M)
                                | | to target device.
                                | | This will over write the new data
                                | | in [X, X+512K)
                                | v

This race can only happen for nocow writes. Thus metadata and data cow
writes are safe, as COW will never overwrite extents of a previous
transaction (in the commit root).

This behavior can be confirmed by disabling all fallocate related calls
in fsstress (*), after which all related tests can pass a 2000-run loop.

*: FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \
                   -f collapse=0 -f punch=0 -f resvsp=0"
I didn't expect the resvsp ioctl would fall back to fallocate in VFS...

[FIX]
Make dev-replace require mandatory block group RO, and wait for current
nocow writes before calling scrub_chunk().

This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue
replace when set_block_ro failed") for the dev-replace path.

The side effect is that dev-replace can be stricter about available
space, but it is definitely worth it to avoid data corruption.

Reported-by: Filipe Manana <[email protected]>
Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed")
Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO")
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2020-01-24  mmc: core: Default to generic_cmd6_time as timeout in __mmc_switch()  (Ulf Hansson; 1 file changed, -14/+11)

All callers of __mmc_switch() should now be specifying a valid timeout
for the CMD6 command. However, just to be sure, let's print a warning
and default to use the generic_cmd6_time, in case the provided
timeout_ms argument is zero.

In this context, let's also simplify some of the corresponding code and
clarify some related comments.

Signed-off-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
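A sketch of the defensive default (patterned on the commit description;
the exact message text and placement are assumptions):

    /* Sketch: fall back to the card's generic CMD6 timeout when the
     * caller passed none; the real check sits inside __mmc_switch(). */
    static unsigned int cmd6_timeout_or_default(struct mmc_card *card,
                                                unsigned int timeout_ms)
    {
            if (!timeout_ms) {
                    pr_warn("%s: unspecified timeout for CMD6 - use generic\n",
                            mmc_hostname(card->host));
                    timeout_ms = card->ext_csd.generic_cmd6_time;
            }
            return timeout_ms;
    }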
2020-01-24  mmc: block: Use generic_cmd6_time when modifying INAND_CMD38_ARG_EXT_CSD  (Ulf Hansson; 1 file changed, -3/+3)

The INAND_CMD38_ARG_EXT_CSD is a vendor specific EXT_CSD register, which
is used to prepare an erase/trim operation. However, it doesn't make
sense to use a timeout of 10 minutes while updating the register, which
becomes the case when the timeout_ms argument for mmc_switch() is set to
zero.

Instead, let's use the generic_cmd6_time, as that seems like a
reasonable timeout to use for these cases.

Signed-off-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-24  mmc: core: Specify timeouts for BKOPS and CACHE_FLUSH for eMMC  (Ulf Hansson; 1 file changed, -3/+6)

The timeout values used while waiting for a CMD6 for BKOPS or a
CACHE_FLUSH to complete are not defined by the eMMC spec. However, a
timeout of 10 minutes, as is currently being used, is just silly for
both of these cases. Instead, let's specify more reasonable timeouts:
120s for BKOPS and 30s for CACHE_FLUSH.

Signed-off-by: Ulf Hansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-24  mmc: sdhci-cadence: remove unneeded 'inline' marker  (Masahiro Yamada; 1 file changed, -1/+1)

'static inline' in .c files does not make much sense because functions
may or may not be inlined irrespective of the 'inline' marker. It is
just a hint.

This function is quite small, so it is very likely to be inlined by the
compiler's optimization (-O2 or -Os), but it is up to the compiler after
all.

Signed-off-by: Masahiro Yamada <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  dt-bindings: mmc: rockchip-dw-mshc: add description for rk3308  (Johan Jonker; 1 file changed, -0/+2)

The description below is already in use for rk3308.dtsi, but was somehow
never added to a document, so add "rockchip,rk3308-dw-mshc",
"rockchip,rk3288-dw-mshc" for mmc nodes on a rk3308 platform to
rockchip-dw-mshc.yaml.

Signed-off-by: Johan Jonker <[email protected]>
Acked-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  dt-bindings: mmc: convert rockchip dw-mshc bindings to yaml  (Johan Jonker; 3 files changed, -49/+124)

Current dts files with 'dwmmc' nodes are manually verified. In order to
automate this process, rockchip-dw-mshc.txt has to be converted to yaml.
In the new setup rockchip-dw-mshc.yaml will inherit properties from
mmc-controller.yaml and synopsys-dw-mshc-common.yaml. 'dwmmc' will no
longer be a valid name for a node and should be changed to 'mmc'.

Signed-off-by: Johan Jonker <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  dt-bindings: mmc: convert synopsys dw-mshc bindings to yaml  (Johan Jonker; 3 files changed, -141/+138)

Current dts files with 'dwmmc' nodes are manually verified. In order to
automate this process, synopsys-dw-mshc.txt has to be converted to yaml.
In the new setup synopsys-dw-mshc.yaml will inherit properties from
mmc-controller.yaml and synopsys-dw-mshc-common.yaml. 'dwmmc' will no
longer be a valid name for a node and should be changed to 'mmc'.

Signed-off-by: Johan Jonker <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  mmc: sdhci-msm: Add CQHCI support for sdhci-msm  (Ritesh Harjani; 2 files changed, -1/+133)

This adds CQHCI support for sdhci-msm platforms.

Signed-off-by: Ritesh Harjani <[email protected]>
Signed-off-by: Veerabhadrarao Badiganti <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  mmc: sdhci: Let a vendor driver supply and update ADMA descriptor size  (Veerabhadrarao Badiganti; 2 files changed, -10/+9)

Let a vendor driver supply the maximum descriptor size that it can
operate on. The ADMA descriptor table would be allocated using this
supplied size.

If any SD Host controller is of a version prior to the v4.10 spec but
supports a 16-byte descriptor, this change allows it to supply the
correct descriptor size for ADMA table allocation.

Also let a vendor driver update the descriptor size by overriding
sdhci_host->desc_size if it has to operate on different descriptor sizes
in different conditions.

Suggested-by: Adrian Hunter <[email protected]>
Signed-off-by: Veerabhadrarao Badiganti <[email protected]>
Acked-by: Adrian Hunter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
2020-01-24  Merge branch 'netdev-seq_file-next-functions-should-increase-position-index'  (David S. Miller; 6 files changed, -11/+7)

Vasily Averin says:

====================
netdev: seq_file .next functions should increase position index

In Aug 2018 NeilBrown noticed commit 1f4aace60b0e ("fs/seq_file.c:
simplify seq_file iteration code and interface"):

 "Some ->next functions do not increment *pos when they return NULL...
  Note that such ->next functions are buggy and should be fixed.
  A simple demonstration is

    dd if=/proc/swaps bs=1000 skip=1

  Choose any block size larger than the size of /proc/swaps. This will
  always show the whole last line of /proc/swaps."

The described problem is still present. If you lseek into the middle of
the last output line, the following read will output the end of the
last line and then the whole last line once again.

$ dd if=/proc/swaps bs=1        # usual output
Filename    Type        Size    Used    Priority
/dev/dm-0   partition   4194812 97536   -2
104+0 records in
104+0 records out
104 bytes copied

$ dd if=/proc/swaps bs=40 skip=1    # last line was generated twice
dd: /proc/swaps: cannot skip to specified offset
v/dm-0 partition    4194812 97536   -2
/dev/dm-0 partition 4194812 97536   -2
3+1 records in
3+1 records out
131 bytes copied

There are a lot of other affected files; I've found 30+, including
/proc/net/ip_tables_matches and /proc/sysvipc/*. This patch set fixes
the files related to netdev@.

https://bugzilla.kernel.org/show_bug.cgi?id=206283
====================

Signed-off-by: David S. Miller <[email protected]>
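A sketch of the bug class and the fix pattern for a seq_file ->next
method (get_record() is a hypothetical lookup returning NULL at
end-of-file):

    /* Sketch: ->next must advance *pos even when it returns NULL,
     * otherwise a read that lands mid-record replays the last line. */
    static void *example_seq_next(struct seq_file *seq, void *v, loff_t *pos)
    {
            ++*pos;                 /* the fix: increment unconditionally */
            return get_record(seq->private, *pos);  /* hypothetical lookup */
    }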