aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-03-10Merge tag 'for-6.3-rc1-tag' of ↵Linus Torvalds1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "First batch of fixes. Among them there are two updates to sysfs and ioctl which are not strictly fixes but are used for testing so there's no reason to delay them. - fix block group item corruption after inserting new block group - fix extent map logging bit not cleared for split maps after dropping range - fix calculation of unusable block group space reporting bogus values due to 32/64b division - fix unnecessary increment of read error stat on write error - improve error handling in inode update - export per-device fsid in DEV_INFO ioctl to distinguish seeding devices, needed for testing - allocator size classes: - fix potential dead lock in size class loading logic - print sysfs stats for the allocation classes" * tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix block group item corruption after inserting new block group btrfs: fix extent map logging bit not cleared for split maps after dropping range btrfs: fix percent calculation for bg reclaim message btrfs: fix unnecessary increment of read error stat on write error btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode btrfs: ioctl: return device fsid from DEV_INFO ioctl btrfs: fix potential dead lock in size class loading logic btrfs: sysfs: add size class stats
2023-03-10fbdev: Fix incorrect page mapping clearance at fb_deferred_io_release()Takashi Iwai1-0/+1
The recent fix for the deferred I/O by the commit 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices") caused a regression when the same fb device is opened/closed while it's being used. It resulted in a frozen screen even if something is redrawn there after the close. The breakage is because the patch was made under a wrong assumption of a single open; in the current code, fb_deferred_io_release() cleans up the page mapping of the pageref list and it calls cancel_delayed_work_sync() unconditionally, where both are no correct behavior for multiple opens. This patch adds a refcount for the opens of the device, and applies the cleanup only when all files get closed. As both fb_deferred_io_open() and _close() are called always in the fb_info lock (mutex), it's safe to use the normal int for the refcounting. Also, a useless BUG_ON() is dropped. Fixes: 3efc61d95259 ("fbdev: Fix invalid page access after closing deferred I/O devices") Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Patrik Jakobsson <patrik.r.jakobsson@gmail.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20230308105012.1845-1-tiwai@suse.de
2023-03-09Merge branch 'main' of ↵Jakub Kicinski2-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== Netfilter updates for net-next 1. nf_tables 'brouting' support, from Sriram Yagnaraman. 2. Update bridge netfilter and ovs conntrack helpers to handle IPv6 Jumbo packets properly, i.e. fetch the packet length from hop-by-hop extension header, from Xin Long. This comes with a test BIG TCP test case, added to tools/testing/selftests/net/. 3. Fix spelling and indentation in conntrack, from Jeremy Sowden. * 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nat: fix indentation of function arguments netfilter: conntrack: fix typo selftests: add a selftest for big tcp netfilter: use nf_ip6_check_hbh_len in nf_ct_skb_network_trim netfilter: move br_nf_check_hbh_len to utils netfilter: bridge: move pskb_trim_rcsum out of br_nf_check_hbh_len netfilter: bridge: check len before accessing more nh data netfilter: bridge: call pskb_may_pull in br_nf_check_hbh_len netfilter: bridge: introduce broute meta statement ==================== Link: https://lore.kernel.org/r/20230308193033.13965-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09neighbour: delete neigh_lookup_nodev as not usedLeon Romanovsky1-2/+0
neigh_lookup_nodev isn't used in the kernel after removal of DECnet. So let's remove it. Fixes: 1202cdd66531 ("Remove DECnet support from kernel") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/eb5656200d7964b2d177a36b77efa3c597d6d72d.1678267343.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09net: sched: remove qdisc_watchdog->last_expiresEric Dumazet1-1/+0
This field mirrors hrtimer softexpires, we can instead use the existing helpers. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230308182648.1150762-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09netlink: remove unused 'compare' functionFlorian Westphal1-1/+0
No users in the tree. Tested with allmodconfig build. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230308142006.20879-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski32-325/+237
Documentation/bpf/bpf_devel_QA.rst b7abcd9c656b ("bpf, doc: Link to submitting-patches.rst for general patch submission info") d56b0c461d19 ("bpf, docs: Fix link to netdev-FAQ target") https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09scsi: core: Add BLIST_NO_VPD_SIZE for some VDASDLee Duncan2-3/+5
Some storage, such as AIX VDASD (virtual storage) and IBM 2076 (front end), fail as a result of commit c92a6b5d6335 ("scsi: core: Query VPD size before getting full page"). That commit changed getting SCSI VPD pages so that we now read just enough of the page to get the actual page size, then read the whole page in a second read. The problem is that the above mentioned hardware returns zero for the page size, because of a firmware error. In such cases, until the firmware is fixed, this new blacklist flag says to revert to the original method of reading the VPD pages, i.e. try to read a whole buffer's worth on the first try. [mkp: reworked somewhat] Fixes: c92a6b5d6335 ("scsi: core: Query VPD size before getting full page") Reported-by: Martin Wilck <mwilck@suse.com> Suggested-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Lee Duncan <lduncan@suse.com> Link: https://lore.kernel.org/r/20220928181350.9948-1-leeman.duncan@gmail.com Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2023-03-09clk: Avoid invalid function names in CLK_OF_DECLARE()Nathan Chancellor1-2/+2
After commit c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro"), drivers/clk/mvebu/kirkwood.c fails to build: drivers/clk/mvebu/kirkwood.c:358:1: error: expected identifier or '(' CLK_OF_DECLARE(98dx1135_clk, "marvell,mv98dx1135-core-clock", ^ include/linux/clk-provider.h:1367:21: note: expanded from macro 'CLK_OF_DECLARE' static void __init name##_of_clk_init_declare(struct device_node *np) \ ^ <scratch space>:124:1: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ <scratch space>:125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ <scratch space>:125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ drivers/clk/mvebu/kirkwood.c:358:1: error: invalid digit 'd' in decimal constant include/linux/clk-provider.h:1372:34: note: expanded from macro 'CLK_OF_DECLARE' OF_DECLARE_1(clk, name, compat, name##_of_clk_init_declare) ^ <scratch space>:125:3: note: expanded from here 98dx1135_clk_of_clk_init_declare ^ C function names must start with either an alphabetic letter or an underscore. To avoid generating invalid function names from clock names, add two underscores to the beginning of the identifier. Fixes: c28cd1f3433c ("clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macro") Suggested-by: Saravana Kannan <saravanak@google.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://lore.kernel.org/r/20230308-clk_of_declare-fix-v1-1-317b741e2532@kernel.org Reviewed-by: Saravana Kannan <saravanak@google.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-03-09i2c: Switch .probe() to not take an id parameterUwe Kleine-König1-7/+11
Commit b8a1a4cd5a98 ("i2c: Provide a temporary .probe_new() call-back type") introduced a new probe callback to convert i2c init routines to not take an i2c_device_id parameter. Now that all in-tree drivers are converted to the temporary .probe_new() callback, .probe() can be modified to match the desired prototype. Now that .probe() and .probe_new() have the same semantic, they can be defined as members of an anonymous union to save some memory and simplify the core code a bit. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Wolfram Sang <wsa@kernel.org>
2023-03-09Merge tag 'net-6.3-rc2' of ↵Linus Torvalds3-2/+9
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - core: avoid skb end_offset change in __skb_unclone_keeptruesize() - sched: - act_connmark: handle errno on tcf_idr_check_alloc - flower: fix fl_change() error recovery path - ieee802154: prevent user from crashing the host Current release - new code bugs: - eth: bnxt_en: fix the double free during device removal - tools: ynl: - fix enum-as-flags in the generic CLI - fully inherit attrs in subsets - re-license uniformly under GPL-2.0 or BSD-3-clause Previous releases - regressions: - core: use indirect calls helpers for sk_exit_memory_pressure() - tls: - fix return value for async crypto - avoid hanging tasks on the tx_lock - eth: ice: copy last block omitted in ice_get_module_eeprom() Previous releases - always broken: - core: avoid double iput when sock_alloc_file fails - af_unix: fix struct pid leaks in OOB support - tls: - fix possible race condition - fix device-offloaded sendpage straddling records - bpf: - sockmap: fix an infinite loop error - test_run: fix &xdp_frame misplacement for LIVE_FRAMES - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR - netfilter: tproxy: fix deadlock due to missing BH disable - phylib: get rid of unnecessary locking - eth: bgmac: fix *initial* chip reset to support BCM5358 - eth: nfp: fix csum for ipsec offload - eth: mtk_eth_soc: fix RX data corruption issue Misc: - usb: qmi_wwan: add telit 0x1080 composition" * tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) tools: ynl: fix enum-as-flags in the generic CLI tools: ynl: move the enum classes to shared code net: avoid double iput when sock_alloc_file fails af_unix: fix struct pid leaks in OOB support eth: fealnx: bring back this old driver net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC net: microchip: sparx5: fix deletion of existing DSCP mappings octeontx2-af: Unlock contexts in the queue context cache in case of fault detection net/smc: fix fallback failed while sendmsg with fastopen ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause mailmap: update entries for Stephen Hemminger mailmap: add entry for Maxim Mikityanskiy nfc: change order inside nfc_se_io error path ethernet: ice: avoid gcc-9 integer overflow warning ice: don't ignore return codes in VSI related code ice: Fix DSCP PFC TLV creation net: usb: qmi_wwan: add Telit 0x1080 composition net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990 netfilter: conntrack: adopt safer max chain length net: tls: fix device-offloaded sendpage straddling records ...
2023-03-09Merge tag 'for-linus-2023030901' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid Pull HID fixes from Benjamin Tissoires: - fix potential out of bound write of zeroes in HID core with a specially crafted uhid device (Lee Jones) - fix potential use-after-free in work function in intel-ish-hid (Reka Norman) - selftests config fixes (Benjamin Tissoires) - few device small fixes and support * tag 'for-linus-2023030901' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: HID: intel-ish-hid: ipc: Fix potential use-after-free in work function HID: logitech-hidpp: Add support for Logitech MX Master 3S mouse HID: cp2112: Fix driver not registering GPIO IRQ chip as threaded selftest: hid: fix hid_bpf not set in config HID: uhid: Over-ride the default maximum data buffer value with our own HID: core: Provide new max_buffer_size attribute to over-ride the default
2023-03-09sctp: add weighted fair queueing stream schedulerXin Long3-1/+4
As it says in rfc8260#section-3.6 about the weighted fair queueing scheduler: A Weighted Fair Queueing scheduler between the streams is used. The weight is configurable per outgoing SCTP stream. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to use the capacity according to the given weights. If the weight of stream S1 is n times the weight of stream S2, the scheduler should assign to stream S1 n times the capacity it assigns to stream S2. The details are implementation dependent. Interleaving user messages allows for a better realization of the capacity usage according to the given weights. This patch adds Weighted Fair Queueing Scheduler actually based on the code of Fair Capacity Scheduler by adding fc_weight into struct sctp_stream_out_ext and taking it into account when sorting stream-> fc_list in sctp_sched_fc_sched() and sctp_sched_fc_dequeue_done(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-09sctp: add fair capacity stream schedulerXin Long3-1/+10
As it says in rfc8260#section-3.5 about the fair capacity scheduler: A fair capacity distribution between the streams is used. This scheduler considers the lengths of the messages of each stream and schedules them in a specific way to maintain an equal capacity for all streams. The details are implementation dependent. interleaving user messages allows for a better realization of the fair capacity usage. This patch adds Fair Capacity Scheduler based on the foundations added by commit 5bbbbe32a431 ("sctp: introduce stream scheduler foundations"): A fc_list and a fc_length are added into struct sctp_stream_out_ext and a fc_list is added into struct sctp_stream. In .enqueue, when there are chunks enqueued into a stream, this stream will be linked into stream-> fc_list by its fc_list ordered by its fc_length. In .dequeue, it always picks up the 1st skb from stream->fc_list. In .dequeue_done, fc_length is increased by chunk's len and update its location in stream->fc_list according to the its new fc_length. Note that when the new fc_length overflows in .dequeue_done, instead of resetting all fc_lengths to 0, we only reduced them by U32_MAX / 4 to avoid a moment of imbalance in the scheduling, as Marcelo suggested. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-03-08Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski4-1/+11
Andrii Nakryiko says: ==================== pull-request: bpf-next 2023-03-08 We've added 23 non-merge commits during the last 2 day(s) which contain a total of 28 files changed, 414 insertions(+), 104 deletions(-). The main changes are: 1) Add more precise memory usage reporting for all BPF map types, from Yafang Shao. 2) Add ARM32 USDT support to libbpf, from Puranjay Mohan. 3) Fix BTF_ID_LIST size causing problems in !CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 4) IMA selftests fix, from Roberto Sassu. 5) libbpf fix in APK support code, from Daniel Müller. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (23 commits) selftests/bpf: Fix IMA test libbpf: USDT arm arg parsing support libbpf: Refactor parse_usdt_arg() to re-use code libbpf: Fix theoretical u32 underflow in find_cd() function bpf: enforce all maps having memory usage callback bpf: offload map memory usage bpf, net: xskmap memory usage bpf, net: sock_map memory usage bpf, net: bpf_local_storage memory usage bpf: local_storage memory usage bpf: bpf_struct_ops memory usage bpf: queue_stack_maps memory usage bpf: devmap memory usage bpf: cpumap memory usage bpf: bloom_filter memory usage bpf: ringbuf memory usage bpf: reuseport_array memory usage bpf: stackmap memory usage bpf: arraymap memory usage bpf: hashtab memory usage ... ==================== Link: https://lore.kernel.org/r/20230308193533.1671597-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-08netfilter: move br_nf_check_hbh_len to utilsXin Long1-0/+2
Rename br_nf_check_hbh_len() to nf_ip6_check_hbh_len() and move it to netfilter utils, so that it can be used by other modules, like ovs and tc. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-08net: reclaim skb->scm_io_uring bitEric Dumazet2-2/+1
Commit 0091bfc81741 ("io_uring/af_unix: defer registered files gc to io_uring release") added one bit to struct sk_buff. This structure is critical for networking, and we try very hard to not add bloat on it, unless absolutely required. For instance, we can use a specific destructor as a wrapper around unix_destruct_scm(), to identify skbs that unix_gc() has to special case. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-08netfilter: bridge: introduce broute meta statementSriram Yagnaraman1-0/+2
nftables equivalent for ebtables -t broute. Implement broute meta statement to set br_netfilter_broute flag in skb to force a packet to be routed instead of being bridged. Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech> Signed-off-by: Florian Westphal <fw@strlen.de>
2023-03-07net: remove enum skb_free_reasonEric Dumazet1-11/+7
enum skb_drop_reason is more generic, we can adopt it instead. Provide dev_kfree_skb_irq_reason() and dev_kfree_skb_any_reason(). This means drivers can use more precise drop reasons if they want to. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com> Link: https://lore.kernel.org/r/20230306204313.10492-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07net: phy: improve phy_read_poll_timeoutHeiner Kallweit1-3/+2
cond sometimes is (val & MASK) what may result in a false positive if val is a negative errno. We shouldn't evaluate cond if val < 0. This has no functional impact here, but it's not nice. Therefore switch order of the checks. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/6d8274ac-4344-23b4-d9a3-cad4c39517d4@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07ynl: re-license uniformly under GPL-2.0 OR BSD-3-ClauseJakub Kicinski2-2/+2
I was intending to make all the Netlink Spec code BSD-3-Clause to ease the adoption but it appears that: - I fumbled the uAPI and used "GPL WITH uAPI note" there - it gives people pause as they expect GPL in the kernel As suggested by Chuck re-license under dual. This gives us benefit of full BSD freedom while fulfilling the broad "kernel is under GPL" expectations. Link: https://lore.kernel.org/all/20230304120108.05dd44c5@kernel.org/ Link: https://lore.kernel.org/r/20230306200457.3903854-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-07interconnect: fix provider registration APIJohan Hovold1-0/+12
The current interconnect provider interface is inherently racy as providers are expected to be added before being fully initialised. Specifically, nodes are currently not added and the provider data is not initialised until after registering the provider which can cause racing DT lookups to fail. Add a new provider API which will be used to fix up the interconnect drivers. The old API is reimplemented using the new interface and will be removed once all drivers have been fixed. Fixes: 11f1ceca7031 ("interconnect: Add generic on-chip interconnect API") Fixes: 87e3031b6fbd ("interconnect: Allow endpoints translation via DT") Cc: stable@vger.kernel.org # 5.1 Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com> # i.MX8MP MSC SM2-MB-EP1 Board Link: https://lore.kernel.org/r/20230306075651.2449-4-johan+linaro@kernel.org Signed-off-by: Georgi Djakov <djakov@kernel.org>
2023-03-07cpumask: be more careful with 'cpumask_setall()'Linus Torvalds1-5/+5
Commit 596ff4a09b89 ("cpumask: re-introduce constant-sized cpumask optimizations") changed cpumask_setall() to use "bitmap_set()" instead of "bitmap_fill()", because bitmap_fill() would explicitly set all the bits of a constant sized small bitmap, and that's exactly what we don't want: we want to only set bits up to 'nr_cpu_ids', which is what "bitmap_set()" does. However, Yury correctly points out that while "bitmap_set()" does indeed only set bits up to the required bitmap size, it doesn't _clear_ bits above that size, so the upper bits would still not have well-defined values. Now, none of this should really matter, since any bits set past 'nr_cpu_ids' should always be ignored in the first place. Yes, the bit scanning functions might return them as a result, but since users should always consider the ">= nr_cpu_ids" condition to mean "no more bits", that shouldn't have any actual effect (see previous commit 8ca09d5fa354 "cpumask: fix incorrect cpumask scanning result checks"). But let's just do it right, the way the code was _intended_ to work. We have had enough lazy code that works but bites us in the *rse later (again, see previous commit) that there's no reason to not just do this properly. It turns out that "bitmap_fill()" gets this all right for the complex case, and really only fails for the inlined optimized case that just fills the whole word. And while we could just fix bitmap_fill() to use the proper last word mask, there's two issues with that: - the cpumask case wants to do the _optimization_ based on "NR_CPUS is a small constant", but then wants to do the actual bit _fill_ based on "nr_cpu_ids" that isn't necessarily that same constant - we have lots of non-cpumask users of bitmap_fill(), and while they hopefully don't care, and probably would want the proper semantics anyway ("only set bits up to the limit"), I do not want the cpumask changes to impact other parts So this ends up just doing the single-word optimization by hand in the cpumask code. If our cpumask is fundamentally limited to a single word, just do the proper "fill in that word" exactly. And if it's the more complex multi-word case, then the generic bitmap_fill() will DTRT. This is all an example of how our bitmap function optimizations really are somewhat broken. They conflate the "this is size of the bitmap" optimizations with the actual bit(s) we want to set. In many cases we really want to have the two be separate things: sometimes we base our optimizations on the size of the whole bitmap ("I know this whole bitmap fits in a single word, so I'll just use single-word accesses"), and sometimes we base them on the bit we are looking at ("this is just acting on bits that are in the first word, so I'll use single-word accesses"). Notice how the end result of the two optimizations are the same, but the way we get to them are quite different. And all our cpumask optimization games are really about that fundamental distinction, and we'd often really want to pass in both the "this is the bit I'm working on" (which _can_ be a small constant but might be variable), and "I know it's in this range even if it's variable" (based on CONFIG_NR_CPUS). So this cpumask_setall() implementation just makes that explicit. It checks the "I statically know the size is small" using the known static size of the cpumask (which is what that 'small_cpumask_bits' is all about), but then sets the actual bits using the exact number of cpus we have (ie 'nr_cpumask_bits') Of course, in a perfect world, the compiler would have done all the range analysis (possibly with help from us just telling it that "this value is always in this range"), and would do all of this for us. But that is not the world we live in. While we dream of that perfect world, this does that manual logic to make it all work out. And this was a very long explanation for a small code change that shouldn't even matter. Reported-by: Yury Norov <yury.norov@gmail.com> Link: https://lore.kernel.org/lkml/ZAV9nGG9e1%2FrV+L%2F@yury-laptop/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-07wifi: radiotap: separate vendor TLV into header/contentMordechay Goodstein1-6/+14
To be able to use a general function later for any kind of TLV, separate the vendor TLV header/content in the structs. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230305124407.8ac5195bb3e6.I19ad99c1ad3108453aede64bddf6ef1a7c4a0b74@changeid [separate from the original combined patch] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07bpf: offload map memory usageYafang Shao1-0/+6
A new helper is introduced to calculate offload map memory usage. But currently the memory dynamically allocated in netdev dev_ops, like nsim_map_update_elem, is not counted. Let's just put it aside now. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-18-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf, net: xskmap memory usageYafang Shao1-0/+1
A new helper is introduced to calculate xskmap memory usage. The xfsmap memory usage can be dynamically changed when we add or remove a xsk_map_node. Hence we need to track the count of xsk_map_node to get its memory usage. The result as follows, - before 10: xskmap name count_map flags 0x0 key 4B value 4B max_entries 65536 memlock 524288B - after 10: xskmap name count_map flags 0x0 <<< no elements case key 4B value 4B max_entries 65536 memlock 524608B Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-17-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf, net: bpf_local_storage memory usageYafang Shao1-0/+1
A new helper is introduced into bpf_local_storage map to calculate the memory usage. This helper is also used by other maps like bpf_cgrp_storage, bpf_inode_storage, bpf_task_storage and etc. Note that currently the dynamically allocated storage elements are not counted in the usage, since it will take extra runtime overhead in the elements update or delete path. So let's put it aside now, and implement it in the future when someone really need it. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-15-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf: add new map ops ->map_mem_usageYafang Shao1-0/+2
Add a new map ops ->map_mem_usage to print the memory usage of a bpf map. This is a preparation for the followup change. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20230305124615.12358-2-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07bpf: Increase size of BTF_ID_LIST without CONFIG_DEBUG_INFO_BTF againNathan Chancellor1-1/+1
After commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"), clang builds without CONFIG_DEBUG_INFO_BTF warn: kernel/bpf/verifier.c:10298:24: warning: array index 16 is past the end of the array (that has type 'u32[16]' (aka 'unsigned int[16]')) [-Warray-bounds] meta.func_id == special_kfunc_list[KF_bpf_dynptr_slice_rdwr]) { ^ ~~~~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/verifier.c:9150:1: note: array 'special_kfunc_list' declared here BTF_ID_LIST(special_kfunc_list) ^ include/linux/btf_ids.h:207:27: note: expanded from macro 'BTF_ID_LIST' #define BTF_ID_LIST(name) static u32 __maybe_unused name[16]; ^ 1 warning generated. A warning of this nature was previously addressed by commit beb3d47d1d3d ("bpf: Fix a BTF_ID_LIST bug with CONFIG_DEBUG_INFO_BTF not set") but there have been new kfuncs added since then. Quadruple the size of the CONFIG_DEBUG_INFO_BTF=n definition so that this problem is unlikely to show up for some time. Link: https://github.com/ClangBuiltLinux/linux/issues/1810 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230307-bpf-kfuncs-warray-bounds-v1-1-00ad3191f3a6@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-07ACPI: x86: Introduce an acpi_quirk_skip_gpio_event_handlers() helperHans de Goede1-0/+5
x86 ACPI boards which ship with only Android as their factory image usually have pretty broken ACPI tables, relying on everything being hardcoded in the factory kernel image and often disabling parts of the ACPI enumeration kernel code to avoid the broken tables causing issues. Part of this broken ACPI code is that sometimes these boards have _AEI ACPI GPIO event handlers which are broken. So far this has been dealt with in the platform/x86/x86-android-tablets.c module, which contains various workarounds for these devices, by it calling acpi_gpiochip_free_interrupts() on gpiochip-s with troublesome handlers to disable the handlers. But in some cases this is too late, if the handlers are of the edge type then gpiolib-acpi.c's code will already have run them at boot. This can cause issues such as GPIOs ending up as owned by "ACPI:OpRegion", making them unavailable for drivers which actually need them. Boards with these broken ACPI tables are already listed in drivers/acpi/x86/utils.c for e.g. acpi_quirk_skip_i2c_client_enumeration(). Extend the quirks mechanism for a new acpi_quirk_skip_gpio_event_handlers() helper, this re-uses the DMI-ids rather then having to duplicate the same DMI table in gpiolib-acpi.c . Also add the new ACPI_QUIRK_SKIP_GPIO_EVENT_HANDLERS quirk to existing boards with troublesome ACPI gpio event handlers, so that the current acpi_gpiochip_free_interrupts() hack can be removed from x86-android-tablets.c . Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
2023-03-07wifi: nl80211: Add support for randomizing TA of auth and deauth framesVeerendranath Jakkam1-0/+5
Add support to use a random local address in authentication and deauthentication frames sent to unassociated peer when the driver supports. The driver needs to configure receive behavior to accept frames with random transmit address specified in TX path authentication frames during the time of the frame exchange is pending and such frames need to be acknowledged similarly to frames sent to the local permanent address when this random address functionality is used. This capability allows use of randomized transmit address for PASN authentication frames to improve privacy of WLAN clients. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20230112012415.167556-2-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add LDPC related flags in ieee80211_bss_confRyder Lee1-0/+6
This is utilized to pass LDPC configurations from user space (i.e. hostapd) to driver. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/1de696aaa34efd77a926eb657b8c0fda05aaa177.1676628065.git.ryder.lee@mediatek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add EHT MU-MIMO related flags in ieee80211_bss_confRyder Lee1-0/+9
Similar to VHT/HE. This is utilized to pass MU-MIMO configurations from user space (i.e. hostapd) to driver. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/8d9966c4c1e77cb1ade77d42bdc49905609192e9.1676628065.git.ryder.lee@mediatek.com [move into combined if statement, reset on !eht] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: introduce ieee80211_refresh_tx_agg_session_timer()Ryder Lee1-0/+14
This allows low level drivers to refresh the tx agg session timer, based on querying stats from the firmware usually. Especially for some mt76 devices support .net_fill_forward_path would bypass mac80211, which leads to tx BA session timeout clients that set a timeout in their AddBA response to our request, even if our request is without a timeout. Signed-off-by: Ryder Lee <ryder.lee@mediatek.com> Link: https://lore.kernel.org/r/7c3f72eac1c34921cd84a462e60d71e125862152.1676616450.git.ryder.lee@mediatek.com [slightly clarify commit message, add note about RCU] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add support for driver adding radiotap TLVsMordechay Goodstein2-37/+27
The new TLV format enables adding TLVs after the fixed fields in radiotap, as part of the radiotap header. Support this and move vendor data to the TLV format, allowing a reuse of the RX_FLAG_RADIOTAP_VENDOR_DATA as the new RX_FLAG_RADIOTAP_TLV_AT_END flag. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.b18fd5da8477.I576400ec40a7b35ef97a3b09a99b3a49e9174786@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: radiotap: Add EHT radiotap definitionsMordechay Goodstein1-2/+185
This is based on https://www.radiotap.org/fields/EHT.html and https://www.radiotap.org/fields/U-SIG.html new EHT TLV definition for 11be standard. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.254b19fffe41.I4ce78e2c558da6e5a708a8d68d61b5d7b3eb3746@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add netdev per-link debugfs data and driver hookBenjamin Berg1-0/+10
This adds the infrastructure to have netdev specific per-link data both for mac80211 and the driver in debugfs. For the driver, a new callback is added which is only used if MLO is supported. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.fb4c947e4df8.I69b3516ddf4c8a7501b395f652d6063444ecad63@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add pointer from bss_conf to vifBenjamin Berg1-0/+3
While often not needed, this considerably simplifies going from a link specific bss_config to the vif. This helps with e.g. creating link specific debugfs entries inside drivers. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.46f701a10ed5.I20390b2a8165ff222d66585915689206ea93222b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: cfg80211/mac80211: report link ID on control port RXJohannes Berg1-2/+3
For control port RX, report the link ID for MLO. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.fe06dfc3791b.Iddcab94789cafe336417be406072ce8a6312fc2d@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: mac80211: add support for set_hw_timestamp commandAvraham Stern1-0/+6
Support the set_hw_timestamp callback for enabling and disabling HW timestamping if the low level driver supports it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.700ded7badde.Ib2f7c228256ce313a04d3d9f9ecc6c7b9aa602bb@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: nl80211: add a command to enable/disable HW timestampingAvraham Stern2-0/+49
Add a command to enable and disable HW timestamping of TM and FTM frames. HW timestamping can be enabled for a specific mac address or for all addresses. The low level driver will indicate how many peers HW timestamping can be enabled concurrently, and this information will be passed to userspace. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.05678d7b1c17.Iccc08869ea8156f1c71a3111a47f86dd56234bd0@changeid [switch to needing netdev UP, minor edits] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: wireless: cleanup unused function parametersMordechay Goodstein1-3/+1
In the past ftype was used for deciding about 6G DUP beacon, but the logic was removed and ftype is not needed anymore. Signed-off-by: Mordechay Goodstein <mordechay.goodstein@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.98d4761b809b.I255f5ecd77cb24fcf2f1641bb5833ea2d121296e@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07wifi: nl80211: Update the documentation of NL80211_SCAN_FLAG_COLOCATED_6GHZIlan Peer1-2/+8
Add a detailed description of NL80211_SCAN_FLAG_COLOCATED_6GHZ flag. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Gregory Greenman <gregory.greenman@intel.com> Link: https://lore.kernel.org/r/20230301115906.487ab04feb39.I5129fd61841332474693046241586f057b134c3c@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-03-07new helper: put_and_unmap_page()Al Viro1-0/+6
kunmap_local() + put_page(), as done by e.g. ext2 directory handling. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-06Merge tag 'for-netdev' of ↵Jakub Kicinski6-39/+150
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-03-06 We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-). The main changes are: 1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong. 2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko. 3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov. 4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi. 5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman. 6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller. 7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo. 8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo. 9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong. 10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang. 11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen. 12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi. 13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui. 14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song. 15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest. 16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao. 17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich. 18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ==================== Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-06clk: Mark a fwnode as initialized when using CLK_OF_DECLARE() macroSaravana Kannan1-1/+7
We already mark fwnodes as initialized when they are registered as clock providers. We do this so that fw_devlink can tell when a clock driver doesn't use the driver core framework to probe/initialize its device. This ensures fw_devlink doesn't block the consumers of such a clock provider indefinitely. However, some users of CLK_OF_DECLARE() macros don't use the same node that matches the macro as the node for the clock provider, but they initialize the entire node. To cover these cases, also mark the nodes that match the macros as initialized when the init callback function is called. An example of this is "stericsson,u8500-clks" that's handled using CLK_OF_DECLARE() and looks something like this: clocks { compatible = "stericsson,u8500-clks"; prcmu_clk: prcmu-clock { #clock-cells = <1>; }; prcc_pclk: prcc-periph-clock { #clock-cells = <2>; }; prcc_kclk: prcc-kernel-clock { #clock-cells = <2>; }; prcc_reset: prcc-reset-controller { #reset-cells = <2>; }; ... }; This patch makes sure that "clocks" is marked as initialized so that fw_devlink knows that all nodes under it have been initialized. If the driver creates struct devices for some of the subnodes, fw_devlink is smart enough to know to wait for those devices to probe, so no special handling is required for those cases. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/lkml/CACRpkdamxDX6EBVjKX5=D3rkHp17f5pwGdBVhzFU90-0MHY6dQ@mail.gmail.com/ Fixes: 4a032827daa8 ("of: property: Simplify of_link_to_phandle()") Signed-off-by: Saravana Kannan <saravanak@google.com> Link: https://lore.kernel.org/r/20230302014639.297514-1-saravanak@google.com Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Tested-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2023-03-06cpumask: Fix typo nr_cpumask_size --> nr_cpumask_bitsAndy Shevchenko1-1/+1
The never used nr_cpumask_size is just a typo, hence use existing redefinition that's called nr_cpumask_bits. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-06btrfs: ioctl: return device fsid from DEV_INFO ioctlQu Wenruo1-1/+11
Currently user space utilizes dev info ioctl to grab the info of a certain devid, this includes its device uuid. But the returned info is not enough to determine if a device is a seed. Commit a26d60dedf9a ("btrfs: sysfs: add devinfo/fsid to retrieve actual fsid from the device") exports the same value in sysfs so this is for parity with ioctl. Add a new member, fsid, into btrfs_ioctl_dev_info_args, and populate the member with fsid value. This should not cause any compatibility problem, following the combinations: - Old user space, old kernel - Old user space, new kernel User space tool won't even check the new member. - New user space, old kernel The kernel won't touch the new member, and user space tool should zero out its argument, thus the new member is all zero. User space tool can then know the kernel doesn't support this fsid reporting, and falls back to whatever they can. - New user space, new kernel Go as planned. Would find the fsid member is no longer zero, and trust its value. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06netfilter: tproxy: fix deadlock due to missing BH disableFlorian Westphal1-0/+7
The xtables packet traverser performs an unconditional local_bh_disable(), but the nf_tables evaluation loop does not. Functions that are called from either xtables or nftables must assume that they can be called in process context. inet_twsk_deschedule_put() assumes that no softirq interrupt can occur. If tproxy is used from nf_tables its possible that we'll deadlock trying to aquire a lock already held in process context. Add a small helper that takes care of this and use it. Link: https://lore.kernel.org/netfilter-devel/401bd6ed-314a-a196-1cdc-e13c720cc8f2@balasys.hu/ Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Reported-and-tested-by: Major Dávid <major.david@balasys.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-03-05cpumask: re-introduce constant-sized cpumask optimizationsLinus Torvalds1-55/+70
Commit aa47a7c215e7 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be9d02 ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>