aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2023-10-20Merge tag 'mtd/fixes-for-6.6-rc7' of ↵Linus Torvalds3-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD fixes from Miquel Raynal: "In the raw NAND subsystem, the major fix prevents using cached reads with devices not supporting it. There was two bug reports about this. Apart from that, three drivers (pl353, arasan and marvell) could sometimes hide page program failures due to their their own program page helper not being fully compliant with the specification (many drivers use the default helpers shared by the core). Adding a missing check prevents these situation. Finally, the Qualcomm driver had a broken error path. In the SPI-NAND subsystem one Micron device used a wrong bitmak reporting possibly corrupted ECC status. Finally, the physmap-core got stripped from its map_rom fallback by mistake, this feature is added back" * tag 'mtd/fixes-for-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: rawnand: Ensure the nand chip supports cached reads mtd: rawnand: qcom: Unmap the right resource upon probe failure mtd: rawnand: pl353: Ensure program page operations are successful mtd: rawnand: arasan: Ensure program page operations are successful mtd: spinand: micron: correct bitmask for ecc status mtd: physmap-core: Restore map_rom fallback mtd: rawnand: marvell: Ensure program page operations are successful
2023-10-20thermal: core: Pass trip pointer to governor throttle callbackRafael J. Wysocki1-1/+2
Modify the governor .throttle() callback definition so that it takes a trip pointer instead of a trip index as its second argument, adjust the governors accordingly and update the core code invoking .throttle(). This causes the governors to become independent of the representation of the list of trips in the thermal zone structure. This change is not expected to alter the general functionality. Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Daniel Lezcano <[email protected]>
2023-10-20tracing: Move readpos from seq_buf to trace_seqMatthew Wilcox (Oracle)2-4/+3
To make seq_buf more lightweight as a string buf, move the readpos member from seq_buf to its container, trace_seq. That puts the responsibility of maintaining the readpos entirely in the tracing code. If some future users want to package up the readpos with a seq_buf, we can define a new struct then. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Kees Cook <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Kent Overstreet <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-10-20ethtool: untangle the linkmode and ethtool headersJakub Kicinski2-35/+16
Commit 26c5334d344d ("ethtool: Add forced speed to supported link modes maps") added a dependency between ethtool.h and linkmode.h. The dependency in the opposite direction already exists so the new code was inserted in an awkward place. The reason for ethtool.h to include linkmode.h, is that ethtool_forced_speed_maps_init() is a static inline helper. That's not really necessary. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Paul Greenwalt <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20net: dsa: microchip: ksz8: Enable MIIM PHY Control reg accessOleksij Rempel1-0/+4
Provide access to MIIM PHY Control register (Reg. 31) through ksz8_r_phy_ctrl() and ksz8_w_phy_ctrl() functions. Necessary for upcoming micrel.c patch to address forced link mode configuration. Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Reported-by: kernel test robot <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-10-20x86/srso: Fix unret validation dependenciesJosh Poimboeuf1-1/+2
CONFIG_CPU_SRSO isn't dependent on CONFIG_CPU_UNRET_ENTRY (AMD Retbleed), so the two features are independently configurable. Fix several issues for the (presumably rare) case where CONFIG_CPU_SRSO is enabled but CONFIG_CPU_UNRET_ENTRY isn't. Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation") Signed-off-by: Josh Poimboeuf <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Acked-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/299fb7740174d0f2335e91c58af0e9c242b4bac1.1693889988.git.jpoimboe@kernel.org
2023-10-20Merge tag 'iio-for-6.7a' of ↵Greg Kroah-Hartman4-7/+15
https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-next Jonathan writes: IIO: 1st set of new device support, features and cleanup for 6.7 Particularly great to see a resolver driver move out of staging via a massive set of changes. Only took 13 years :) One small patch added then reverted due to a report of test breakage (ashai-kasei,ak8975: Drop deprecated enums.) An immutable branch was used for some hid-senors changes in case there was a need to take them into the HID tree as well. New device support ----------------- adi,hmc425a - Add support for HMC540SLP3E broadband 4-bit digital attenuator. kionix,kx022a - Add support for the kx132-1211 accelerometer. Require significant driver rework to enable this including add a chip type specific structure to deal with the chip differences. - Add support for the kx132acr-lbz accelerometer (subset of the kx022a feature set). lltc,ltc2309 - New driver for this 8 channel ADC. microchip,mcp3911 - Add support for rest of mcp391x family of ADCs (there are various differences beyond simple channel count variation. Series includes some general driver cleanup. microchip,mcp3564 - New driver for MCP3461, MCP3462, MCP3464, MCP3541, MCP3562, MCP3564 and their R variants of 16/24bit ADCs. A few minor fixed followed. rohm,bu1390 - New driver for this pressure sensor. Staging graduation ------------------ adi,ad1210 (after 13 or so years :) - More or less a complete (step-wise) rewrite of this resolver driver to bring it up to date with modern IIO standards. The fault signal handling mapping to event channels was particularly complex and significant part of the changes. Features -------- iio-core - Add chromacity and color temperature channel types. adi,ad7192 - Oversampling ratio control (called fast settling in datasheet). adi,adis16475 - Add core support and then driver support for delta angle and delta velocity channels. These are intended for summation to establish angle and velocity changes over larger timescales. Fix was needed for alignment after the temperature channel. Further fix reduced set of devices for which the buffer support was applicable as seems burst reads don't cover these on all devices. hid-sensors-als - Chromacity and color temperatures support including in amd sfh. stx104 - Add support for counter subsystem to this multipurpose device. ti,twl6030 - Add missing device tree binding description. Clean up and minor fixes. ------------------------ treewide - Drop some unused declarations across IIO. - Make more use of device_get_match_data() instead of OF specific approaches. Similar cleanup to sets of drivers. - Stop platform remove callbacks returning anything by using the temporary remove_new() callback. - Use i2c_get_match_data() to cope nicely with all types of ID table entry. - Use device_get_match_data() for various platform device to cope with more types of firmware. - Convert from enum to pointer in ID tables allowing use of i2c_get_match_data(). - Fix sorting on some ID tables. - Include specific string helper headers rather than simply string_helpers.h docs - Better description of the ordering requirements etc for available_scan_masks. tools - Handle alignment of mixed sizes where the last element isn't the biggest correctly. Seems that doesn't happen often! adi,ad2s1210 - Lots of work from David Lechner on this driver including a few fixes that are going with the rework to avoid slowing that down. adi,ad4310 - Replace deprecated devm_clk_register() adi,ad74413r - Bring the channel function setting inline with the datasheet. adi,ad7192 - Change to FIELD_PREP(), FIELD_GET(). - Calculate f_order from the sinc filter and chop filter states. - Move more per chip config into data in struct ad7192_chip_info - Cleanup unused parameter in channel macros. adi,adf4350 - Make use of devm_* to simplify error handling for many of the setup calls in probe() / tear down in remove() and error paths. Some more work to be done on this one. - Use dev_err_probe() for errors in probe() callback. adi,adf4413 - Typo in function name prefix. adi,adxl345 - Add channel scale to the chip type specific structure and drop using a type field previously used for indirection. asahi,ak8985 - Fix a mismatch introduced when switching from enum->pointers in the match tables. amlogic,meson - Expand error logging during probe. invensense,mpu6050 - Support level-shifter control. Whilst no one is sure exactly what this is doing it is needed for some old boards. - Document mount-matrix dt-binding. mediatek,mt6577 - Use devm_clk_get_enabled() to replace open coded version and move everything over to being device managed. Drop now empty remove() callback. Fix follows to put the drvdata back. - Use dev_err_probe() for error reporting in probe() callback. memsic,mxc4005 - Add of_match_table. microchip,mcp4725 - Move various chip specific data from being looked up by chip ID to data in the chip type specific structure. silicon-labs,si7005 - Add of_match_table and entry in trivial-devices.yaml st,lsm6dsx - Add missing mount-matrix dt binding documentation. st,spear - Use devm_clk_get_enabled() and some other devm calls to move everything over to being device managed. Drop now empty remove() callback. - Use dev_err_probe() to better handled deferred probing and tidy up error reporting in probe() callback. st,stm32-adc - Add a bit of additional checking in probe() to protect against a NULL pointer (no known path to trigger it today). - Replace deprecated strncpy() ti,ads1015 - Allow for edge triggers. - Document interrupt in dt-bindings. * tag 'iio-for-6.7a' of https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (201 commits) iio: Use device_get_match_data() iio: adc: MCP3564: fix warn: unsigned '__x' is never less than zero. dt-bindings: trivial-devices: add silabs,si7005 iio: si7005: Add device tree support drivers: imu: adis16475.c: Remove scan index from delta channels dt-bindings: iio: imu: st,lsm6dsx: add mount-matrix property iio: resolver: ad2s1210: remove of_match_ptr() iio: resolver: ad2s1210: remove DRV_NAME macro iio: resolver: ad2s1210: move out of staging staging: iio: resolver: ad2s1210: simplify code with guard(mutex) staging: iio: resolver: ad2s1210: clear faults after soft reset staging: iio: resolver: ad2s1210: refactor sample toggle staging: iio: resolver: ad2s1210: remove fault attribute staging: iio: resolver: ad2s1210: add label attribute support staging: iio: resolver: ad2s1210: add register/fault support summary staging: iio: resolver: ad2s1210: implement fault events iio: event: add optional event label support staging: iio: resolver: ad2s1210: rename DOS reset min/max attrs staging: iio: resolver: ad2s1210: convert DOS mismatch threshold to event attr staging: iio: resolver: ad2s1210: convert DOS overrange threshold to event attr ...
2023-10-20crypto: hisilicon/qm - fix EQ/AEQ interrupt issueLongfang Liu1-0/+1
During hisilicon accelerator live migration operation. In order to prevent the problem of EQ/AEQ interrupt loss. Migration driver will trigger an EQ/AEQ doorbell at the end of the migration. This operation may cause double interruption of EQ/AEQ events. To ensure that the EQ/AEQ interrupt processing function is normal. The interrupt handling functionality of EQ/AEQ needs to be updated. Used to handle repeated interrupts event. Fixes: b0eed085903e ("hisi_acc_vfio_pci: Add support for VFIO live migration") Signed-off-by: Longfang Liu <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-10-20crypto: pkcs7 - remove sha1 supportDimitri John Ledkov1-4/+0
Removes support for sha1 signed kernel modules, importing sha1 signed x.509 certificates. rsa-pkcs1pad keeps sha1 padding support, which seems to be used by virtio driver. sha1 remains available as there are many drivers and subsystems using it. Note only hmac(sha1) with secret keys remains cryptographically secure. In the kernel there are filesystems, IMA, tpm/pcr that appear to be using sha1. Maybe they can all start to be slowly upgraded to something else i.e. blake3, ParallelHash, SHAKE256 as needed. Signed-off-by: Dimitri John Ledkov <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2023-10-19mm/slab: Add __free() support for kvfreeDan Williams1-0/+2
Allow for the declaration of variables that trigger kvfree() when they go out of scope. The check for NULL and call to kvfree() can be elided by the compiler in most cases, otherwise without the NULL check an unnecessary call to kvfree() may be emitted. Peter proposed a comment for this detail [1]. Link: http://lore.kernel.org/r/[email protected] [1] Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Acked-by: Pankaj Gupta <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Tested-by: Kuppuswamy Sathyanarayanan <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2023-10-19configfs-tsm: Introduce a shared ABI for attestation reportsDan Williams1-0/+69
One of the common operations of a TSM (Trusted Security Module) is to provide a way for a TVM (confidential computing guest execution environment) to take a measurement of its launch state, sign it and submit it to a verifying party. Upon successful attestation that verifies the integrity of the TVM additional secrets may be deployed. The concept is common across TSMs, but the implementations are unfortunately vendor specific. While the industry grapples with a common definition of this attestation format [1], Linux need not make this problem worse by defining a new ABI per TSM that wants to perform a similar operation. The current momentum has been to invent new ioctl-ABI per TSM per function which at best is an abdication of the kernel's responsibility to make common infrastructure concepts share common ABI. The proposal, targeted to conceptually work with TDX, SEV-SNP, COVE if not more, is to define a configfs interface to retrieve the TSM-specific blob. report=/sys/kernel/config/tsm/report/report0 mkdir $report dd if=binary_userdata_plus_nonce > $report/inblob hexdump $report/outblob This approach later allows for the standardization of the attestation blob format without needing to invent a new ABI. Once standardization happens the standard format can be emitted by $report/outblob and indicated by $report/provider, or a new attribute like "$report/tcg_coco_report" can emit the standard format alongside the vendor format. Review of previous iterations of this interface identified that there is a need to scale report generation for multiple container environments [2]. Configfs enables a model where each container can bind mount one or more report generation item instances. Still, within a container only a single thread can be manipulating a given configuration instance at a time. A 'generation' count is provided to detect conflicts between multiple threads racing to configure a report instance. The SEV-SNP concepts of "extended reports" and "privilege levels" are optionally enabled by selecting 'tsm_report_ext_type' at register_tsm() time. The expectation is that those concepts are generic enough that they may be adopted by other TSM implementations. In other words, configfs-tsm aims to address a superset of TSM specific functionality with a common ABI where attributes may appear, or not appear, based on the set of concepts the implementation supports. Link: http://lore.kernel.org/r/[email protected] [1] Link: http://lore.kernel.org/r/[email protected] [2] Cc: Kuppuswamy Sathyanarayanan <[email protected]> Cc: Dionna Amalie Glaze <[email protected]> Cc: James Bottomley <[email protected]> Cc: Peter Gonda <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Samuel Ortiz <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]> Tested-by: Kuppuswamy Sathyanarayanan <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Signed-off-by: Dan Williams <[email protected]>
2023-10-19bpf: teach the verifier to enforce css_iter and task_iter in RCU CSChuyi Zhou2-8/+12
css_iter and task_iter should be used in rcu section. Specifically, in sleepable progs explicit bpf_rcu_read_lock() is needed before use these iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to use them directly. This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check whether we are in rcu cs before we want to invoke this kfunc. If the rcu protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU. Once user do rcu_unlock during the iteration, state MEM_RCU of regs would be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED. It is worth noting that currently, bpf_rcu_read_unlock does not clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers STACK_SPILL. This patch also let bpf_for_each_spilled_reg search STACK_ITER. Signed-off-by: Chuyi Zhou <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-19cgroup: Prepare for using css_task_iter_*() in BPFChuyi Zhou1-7/+5
This patch makes some preparations for using css_task_iter_*() in BPF Program. 1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to use them. Convert them to enum so bpf prog can take them from vmlinux.h. 2. In the next patch we will add css_task_iter_*() in common kfuncs which is not safe. Since css_task_iter_*() does spin_unlock_irq() which might screw up irq flags depending on the context where bpf prog is running. So we should use irqsave/irqrestore here and the switching is harmless. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Chuyi Zhou <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-10-19io_uring/cmd: Pass compat mode in issue_flagsBreno Leitao1-0/+1
Create a new flag to track if the operation is running compat mode. This basically check the context->compat and pass it to the issue_flags, so, it could be queried later in the callbacks. Signed-off-by: Breno Leitao <[email protected]> Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-10-19net/socket: Break down __sys_getsockoptBreno Leitao1-1/+1
Split __sys_getsockopt() into two functions by removing the core logic into a sub-function (do_sock_getsockopt()). This will avoid code duplication when doing the same operation in other callers, for instance. do_sock_getsockopt() will be called by io_uring getsockopt() command operation in the following patch. The same was done for the setsockopt pair. Suggested-by: Martin KaFai Lau <[email protected]> Signed-off-by: Breno Leitao <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-6/+41
Cross-merge networking fixes after downstream PR. net/mac80211/key.c 02e0e426a2fb ("wifi: mac80211: fix error path key leak") 2a8b665e6bcc ("wifi: mac80211: remove key_mtx") 7d6904bf26b9 ("Merge wireless into wireless-next") https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/net/ethernet/ti/Kconfig a602ee3176a8 ("net: ethernet: ti: Fix mixed module-builtin object") 98bdeae9502b ("net: cpmac: remove driver to prepare for platform removal") Signed-off-by: Jakub Kicinski <[email protected]>
2023-10-19bpf: Add sockptr support for setsockoptBreno Leitao1-1/+1
The whole network stack uses sockptr, and while it doesn't move to something more modern, let's use sockptr in setsockptr BPF hooks, so, it could be used by other callers. The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a kernel value for optlen. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Breno Leitao <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-10-19bpf: Add sockptr support for getsockoptBreno Leitao1-2/+3
The whole network stack uses sockptr, and while it doesn't move to something more modern, let's use sockptr in getsockptr BPF hooks, so, it could be used by other callers. The main motivation for this change is to use it in the io_uring {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a kernel value for optlen. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Breno Leitao <[email protected]> Acked-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-10-19Merge tag 'net-6.6-rc7' of ↵Linus Torvalds1-3/+16
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, netfilter, WiFi. Feels like an up-tick in regression fixes, mostly for older releases. The hfsc fix, tcp_disconnect() and Intel WWAN fixes stand out as fairly clear-cut user reported regressions. The mlx5 DMA bug was causing strife for 390x folks. The fixes themselves are not particularly scary, tho. No open investigations / outstanding reports at the time of writing. Current release - regressions: - eth: mlx5: perform DMA operations in the right locations, make devices usable on s390x, again - sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve, previous fix of rejecting invalid config broke some scripts - rfkill: reduce data->mtx scope in rfkill_fop_open, avoid deadlock - revert "ethtool: Fix mod state of verbose no_mask bitset", needs more work Current release - new code bugs: - tcp: fix listen() warning with v4-mapped-v6 address Previous releases - regressions: - tcp: allow tcp_disconnect() again when threads are waiting, it was denied to plug a constant source of bugs but turns out .NET depends on it - eth: mlx5: fix double-free if buffer refill fails under OOM - revert "net: wwan: iosm: enable runtime pm support for 7560", it's causing regressions and the WWAN team at Intel disappeared - tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb, fix single-stream perf regression on some devices Previous releases - always broken: - Bluetooth: - fix issues in legacy BR/EDR PIN code pairing - correctly bounds check and pad HCI_MON_NEW_INDEX name - netfilter: - more fixes / follow ups for the large "commit protocol" rework, which went in as a fix to 6.5 - fix null-derefs on netlink attrs which user may not pass in - tcp: fix excessive TLP and RACK timeouts from HZ rounding (bless Debian for keeping HZ=250 alive) - net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation, prevent letting frankenstein UDP super-frames from getting into the stack - net: fix interface altnames when ifc moves to a new namespace - eth: qed: fix the size of the RX buffers - mptcp: avoid sending RST when closing the initial subflow" * tag 'net-6.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) Revert "ethtool: Fix mod state of verbose no_mask bitset" selftests: mptcp: join: no RST when rm subflow/addr mptcp: avoid sending RST when closing the initial subflow mptcp: more conservative check for zero probes tcp: check mptcp-level constraints for backlog coalescing selftests: mptcp: join: correctly check for no RST net: ti: icssg-prueth: Fix r30 CMDs bitmasks selftests: net: add very basic test for netdev names and namespaces net: move altnames together with the netdevice net: avoid UAF on deleted altname net: check for altname conflicts when changing netdev's netns net: fix ifname in netlink ntf during netns move net: ethernet: ti: Fix mixed module-builtin object net: phy: bcm7xxx: Add missing 16nm EPHY statistics ipv4: fib: annotate races around nh->nh_saddr_genid and nh->nh_saddr tcp_bpf: properly release resources on error paths net/sched: sch_hfsc: upgrade 'rt' to 'sc' when it becomes a inner curve net: mdio-mux: fix C45 access returning -EIO after API change tcp: tsq: relax tcp_small_queue_check() when rtx queue contains a single skb octeon_ep: update BQL sent bytes before ringing doorbell ...
2023-10-19lib/generic-radix-tree.c: Add peek_prev()Kent Overstreet1-1/+60
This patch adds genradix_peek_prev(), genradix_iter_rewind(), and genradix_for_each_reverse(), for iterating backwards over a generic radix tree. Signed-off-by: Kent Overstreet <[email protected]>
2023-10-19lib/generic-radix-tree.c: Don't overflow in peek()Kent Overstreet1-0/+7
When we started spreading new inode numbers throughout most of the 64 bit inode space, that triggered some corner case bugs, in particular some integer overflows related to the radix tree code. Oops. Signed-off-by: Kent Overstreet <[email protected]>
2023-10-19closures: closure_nr_remaining()Kent Overstreet1-1/+6
Factor out a new helper, which returns the number of events outstanding. Signed-off-by: Kent Overstreet <[email protected]>
2023-10-19closures: closure_wait_event()Kent Overstreet1-0/+22
Like wait_event() - except, because it uses closures and closure waitlists it doesn't have the restriction on modifying task state inside the condition check, like wait_event() does. Signed-off-by: Kent Overstreet <[email protected]> Acked-by: Coly Li <[email protected]>
2023-10-19bcache: move closures to lib/Kent Overstreet1-0/+377
Prep work for bcachefs - being a fork of bcache it also uses closures Signed-off-by: Kent Overstreet <[email protected]> Acked-by: Coly Li <[email protected]> Reviewed-by: Randy Dunlap <[email protected]>
2023-10-19Merge tag 'v6.6-rc7.vfs.fixes' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fix from Christian Brauner: "An openat() call from io_uring triggering an audit call can apparently cause the refcount of struct filename to be incremented from multiple threads concurrently during async execution, triggering a refcount underflow and hitting a BUG_ON(). That bug has been lurking around since at least v5.16 apparently. Switch to an atomic counter to fix that. The underflow check is downgraded from a BUG_ON() to a WARN_ON_ONCE() but we could easily remove that check altogether tbh" * tag 'v6.6-rc7.vfs.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: audit,io_uring: io_uring openat triggers audit reference count underflow
2023-10-19net: introduce napi_is_scheduled helperChristian Marangi1-0/+23
We currently have napi_if_scheduled_mark_missed that can be used to check if napi is scheduled but that does more thing than simply checking it and return a bool. Some driver already implement custom function to check if napi is scheduled. Drop these custom function and introduce napi_is_scheduled that simply check if napi is scheduled atomically. Update any driver and code that implement a similar check and instead use this new helper. Signed-off-by: Christian Marangi <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19net: stmmac: intel: remove unnecessary field struct ↵Johannes Zink1-1/+0
plat_stmmacenet_data::ext_snapshot_num Do not store bitmask for enabling AUX_SNAPSHOT0. The previous commit ("net: stmmac: fix PPS capture input index") takes care of calculating the proper bit mask from the request data's extts.index field, which is 0 if not explicitly specified otherwise. Signed-off-by: Johannes Zink <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-10-19fs: fix umask on NFS with CONFIG_FS_POSIX_ACL=nMax Kellermann1-0/+5
Make IS_POSIXACL() return false if POSIX ACL support is disabled. Never skip applying the umask in namei.c and never bother to do any ACL specific checks if the filesystem falsely indicates it has ACLs enabled when the feature is completely disabled in the kernel. This fixes a problem where the umask is always ignored in the NFS client when compiled without CONFIG_FS_POSIX_ACL. This is a 4 year old regression caused by commit 013cdf1088d723 which itself was not completely wrong, but failed to consider all the side effects by misdesigned VFS code. Prior to that commit, there were two places where the umask could be applied, for example when creating a directory: 1. in the VFS layer in SYSCALL_DEFINE3(mkdirat), but only if !IS_POSIXACL() 2. again (unconditionally) in nfs3_proc_mkdir() The first one does not apply, because even without CONFIG_FS_POSIX_ACL, the NFS client sets SB_POSIXACL in nfs_fill_super(). After that commit, (2.) was replaced by: 2b. in posix_acl_create(), called by nfs3_proc_mkdir() There's one branch in posix_acl_create() which applies the umask; however, without CONFIG_FS_POSIX_ACL, posix_acl_create() is an empty dummy function which does not apply the umask. The approach chosen by this patch is to make IS_POSIXACL() always return false when POSIX ACL support is disabled, so the umask always gets applied by the VFS layer. This is consistent with the (regular) behavior of posix_acl_create(): that function returns early if IS_POSIXACL() is false, before applying the umask. Therefore, posix_acl_create() is responsible for applying the umask if there is ACL support enabled in the file system (SB_POSIXACL), and the VFS layer is responsible for all other cases (no SB_POSIXACL or no CONFIG_FS_POSIX_ACL). Signed-off-by: Max Kellermann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-10-19fs: store real path instead of fake path in backing file f_pathAmir Goldstein2-20/+5
A backing file struct stores two path's, one "real" path that is referring to f_inode and one "fake" path, which should be displayed to users in /proc/<pid>/maps. There is a lot more potential code that needs to know the "real" path, then code that needs to know the "fake" path. Instead of code having to request the "real" path with file_real_path(), store the "real" path in f_path and require code that needs to know the "fake" path request it with file_user_path(). Replace the file_real_path() helper with a simple const accessor f_path(). After this change, file_dentry() is not expected to observe any files with overlayfs f_path and real f_inode, so the call to ->d_real() should not be needed. Leave the ->d_real() call for now and add an assertion in ovl_d_real() to catch if we made wrong assumptions. Suggested-by: Miklos Szeredi <[email protected]> Link: https://lore.kernel.org/r/CAJfpegtt48eXhhjDFA1ojcHPNKj3Go6joryCPtEFAKpocyBsnw@mail.gmail.com/ Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-10-19fs: create helper file_user_path() for user displayed mapped file pathAmir Goldstein1-0/+14
Overlayfs uses backing files with "fake" overlayfs f_path and "real" underlying f_inode, in order to use underlying inode aops for mapped files and to display the overlayfs path in /proc/<pid>/maps. In preparation for storing the overlayfs "fake" path instead of the underlying "real" path in struct backing_file, define a noop helper file_user_path() that returns f_path for now. Use the new helper in procfs and kernel logs whenever a path of a mapped file is displayed to users. Signed-off-by: Amir Goldstein <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-10-19vfs: predict the error in retry_estale as unlikelyMateusz Guzik1-1/+1
Signed-off-by: Mateusz Guzik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2023-10-19file: convert to SLAB_TYPESAFE_BY_RCUChristian Brauner2-15/+6
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing. The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists. With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference. In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu(). In addition, it isn't possible to access or check fields in struct file without first aqcuiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either one of this is a bug. Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch. Cc: Jann Horn <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-10-19watch_queue: Annotate struct watch_filter with __counted_byKees Cook1-1/+1
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct watch_filter. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: David Howells <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Siddh Raman Pant <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Qian Cai <[email protected]> Signed-off-by: Kees Cook <[email protected]> Tested-by: Siddh Raman Pant <[email protected]> Reviewed-by: "Gustavo A. R. Silva" <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-10-19fs/pipe: move check to pipe_has_watch_queue()Max Kellermann1-0/+16
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is identical. Signed-off-by: Max Kellermann <[email protected]> Message-Id: <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-10-19pipe: reduce padding in struct pipe_inode_infoMax Kellermann1-3/+3
This has no effect on 64 bit because there are 10 32-bit integers surrounding the two bools, but on 32 bit architectures, this reduces the struct size by 4 bytes by merging the two bools into one word. Signed-off-by: Max Kellermann <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-10-19fs: add a new SB_I_NOUMASK flagJeff Layton2-1/+26
SB_POSIXACL must be set when a filesystem supports POSIX ACLs, but NFSv4 also sets this flag to prevent the VFS from applying the umask on newly-created files. NFSv4 doesn't support POSIX ACLs however, which causes confusion when other subsystems try to test for them. Add a new SB_I_NOUMASK flag that allows filesystems to opt-in to umask stripping without advertising support for POSIX ACLs. Set the new flag on NFSv4 instead of SB_POSIXACL. Also, move mode_strip_umask to namei.h and convert init_mknod and init_mkdir to use it. Signed-off-by: Jeff Layton <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-10-19vlynq: remove bus driverWolfram Sang1-149/+0
There are no users with a vlynq_driver in the Kernel tree. Also, only the AR7 platform ever initialized a VLYNQ bus, but AR7 is going to be removed from the Kernel. OpenWRT had some out-of-tree drivers which they probably intended to upport, but AR7 devices are even there not supported anymore because they are "stuck with Kernel 3.18" [1]. This code can go. [1] https://openwrt.org/docs/techref/targets/ar7 Signed-off-by: Wolfram Sang <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2023-10-19perf: Disallow mis-matched inherited group readsPeter Zijlstra1-0/+1
Because group consistency is non-atomic between parent (filedesc) and children (inherited) events, it is possible for PERF_FORMAT_GROUP read() to try and sum non-matching counter groups -- with non-sensical results. Add group_generation to distinguish the case where a parent group removes and adds an event and thus has the same number, but a different configuration of events as inherited groups. This became a problem when commit fa8c269353d5 ("perf/core: Invert perf_read_group() loops") flipped the order of child_list and sibling_list. Previously it would iterate the group (sibling_list) first, and for each sibling traverse the child_list. In this order, only the group composition of the parent is relevant. By flipping the order the group composition of the child (inherited) events becomes an issue and the mis-match in group composition becomes evident. That said; even prior to this commit, while reading of a group that is not equally inherited was not broken, it still made no sense. (Ab)use ECHILD as error return to indicate issues with child process group composition. Fixes: fa8c269353d5 ("perf/core: Invert perf_read_group() loops") Reported-by: Budimir Markovic <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2023-10-18buildid: reduce header file dependencies for moduleArnd Bergmann1-1/+2
The vmlinux decompressor code intentionally has only a limited set of included header files, but this started running into a build failure because of the bitmap logic needing linux/errno.h: In file included from include/linux/cpumask.h:12, from include/linux/mm_types_task.h:14, from include/linux/mm_types.h:5, from include/linux/buildid.h:5, from include/linux/module.h:14, from arch/arm/boot/compressed/../../../../lib/lz4/lz4_decompress.c:39, from arch/arm/boot/compressed/../../../../lib/decompress_unlz4.c:10, from arch/arm/boot/compressed/decompress.c:60: include/linux/bitmap.h: In function 'bitmap_allocate_region': include/linux/bitmap.h:527:25: error: 'EBUSY' undeclared (first use in this function) 527 | return -EBUSY; | ^~~~~ include/linux/bitmap.h:527:25: note: each undeclared identifier is reported only once for each function it appears in include/linux/bitmap.h: In function 'bitmap_find_free_region': include/linux/bitmap.h:554:17: error: 'ENOMEM' undeclared (first use in this function) 554 | return -ENOMEM; | ^~~~~~ This is easily avoided by changing linux/buildid.h to no longer depend on linux/mm_types.h, a header that pulls in a huge number of indirect dependencies. Fixes: b9c957f554442 ("bitmap: move bitmap_*_region() functions to bitmap.h") Fixes: bd7525dacd7e2 ("bpf: Move stack_map_get_build_id into lib") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Yury Norov <[email protected]>
2023-10-18string: Adjust strtomem() logic to allow for smaller sourcesKees Cook1-2/+5
Arnd noticed we have a case where a shorter source string is being copied into a destination byte array, but this results in a strnlen() call that exceeds the size of the source. This is seen with -Wstringop-overread: In file included from ../include/linux/uuid.h:11, from ../include/linux/mod_devicetable.h:14, from ../include/linux/cpufeature.h:12, from ../arch/x86/coco/tdx/tdx.c:7: ../arch/x86/coco/tdx/tdx.c: In function 'tdx_panic.constprop': ../include/linux/string.h:284:9: error: 'strnlen' specified bound 64 exceeds source size 60 [-Werror=stringop-overread] 284 | memcpy_and_pad(dest, _dest_len, src, strnlen(src, _dest_len), pad); \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../arch/x86/coco/tdx/tdx.c:124:9: note: in expansion of macro 'strtomem_pad' 124 | strtomem_pad(message.str, msg, '\0'); | ^~~~~~~~~~~~ Use the smaller of the two buffer sizes when calling strnlen(). When src length is unknown (SIZE_MAX), it is adjusted to use dest length, which is what the original code did. Reported-by: Arnd Bergmann <[email protected]> Fixes: dfbafa70bde2 ("string: Introduce strtomem() and strtomem_pad()") Tested-by: Arnd Bergmann <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]>
2023-10-18compiler.h: move __is_constexpr() to compiler.hDavid Laight2-8/+8
Prior to f747e6667ebb2 __is_constexpr() was in its only user minmax.h. That commit moved it to const.h - but that file just defines ULL(x) and UL(x) so that constants can be defined for .S and .c files. So apart from the word 'const' it wasn't really a good location. Instead move the definition to compiler.h just before the similar is_signed_type() and is_unsigned_type(). This may not be a good long-term home, but the three definitions belong together. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Steven Rostedt (Google) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18minmax: relax check to allow comparison between unsigned arguments and ↵David Laight1-7/+17
signed constants Allow (for example) min(unsigned_var, 20). The opposite min(signed_var, 20u) is still errored. Since a comparison between signed and unsigned never makes the unsigned value negative it is only necessary to adjust the __types_ok() test. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18minmax: allow comparisons of 'int' against 'unsigned char/short'David Laight1-2/+3
Since 'unsigned char/short' get promoted to 'signed int' it is safe to compare them against an 'int' value. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18minmax: fix indentation of __cmp_once() and __clamp_once()David Laight1-15/+15
Remove the extra indentation and align continuation markers. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18minmax: allow min()/max()/clamp() if the arguments have the same signedness.David Laight1-28/+32
The type-check in min()/max() is there to stop unexpected results if a negative value gets converted to a large unsigned value. However it also rejects 'unsigned int' v 'unsigned long' compares which are common and never problematc. Replace the 'same type' check with a 'same signedness' check. The new test isn't itself a compile time error, so use static_assert() to report the error and give a meaningful error message. Due to the way builtin_choose_expr() works detecting the error in the 'non-constant' side (where static_assert() can be used) also detects errors when the arguments are constant. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18minmax: add umin(a, b) and umax(a, b)David Laight1-0/+17
Patch series "minmax: Relax type checks in min() and max()", v4. The min() (etc) functions in minmax.h require that the arguments have exactly the same types. However when the type check fails, rather than look at the types and fix the type of a variable/constant, everyone seems to jump on min_t(). In reality min_t() ought to be rare - when something unusual is being done, not normality. The orginal min() (added in 2.4.9) replaced several inline functions and included the type - so matched the implicit casting of the function call. This was renamed min_t() in 2.4.10 and the current min() added. There is no actual indication that the conversion of negatve values to large unsigned values has ever been an actual problem. A quick grep shows 5734 min() and 4597 min_t(). Having the casts on almost half of the calls shows that something is clearly wrong. If the wrong type is picked (and it is far too easy to pick the type of the result instead of the larger input) then significant bits can get discarded. Pretty much the worst example is in the derived clamp_val(), consider: unsigned char x = 200u; y = clamp_val(x, 10u, 300u); I also suspect that many of the min_t(u16, ...) are actually wrong. For example copy_data() in printk_ringbuffer.c contains: data_size = min_t(u16, buf_size, len); Here buf_size is 'unsigned int' and len 'u16', pass a 64k buffer (can you prove that doesn't happen?) and no data is returned. Apparantly it did - and has since been fixed. The only reason that most of the min_t() are 'fine' is that pretty much all the values in the kernel are between 0 and INT_MAX. Patch 1 adds umin(), this uses integer promotions to convert both arguments to 'unsigned long long'. It can be used to compare a signed type that is known to contain a non-negative value with an unsigned type. The compiler typically optimises it all away. Added first so that it can be referred to in patch 2. Patch 2 replaces the 'same type' check with a 'same signedness' one. This makes min(unsigned_int_var, sizeof()) be ok. The error message is also improved and will contain the expanded form of both arguments (useful for seeing how constants are defined). Patch 3 just fixes some whitespace. Patch 4 allows comparisons of 'unsigned char' and 'unsigned short' to signed types. The integer promotion rules convert them both to 'signed int' prior to the comparison so they can never cause a negative value be converted to a large positive one. Patch 5 (rewritted for v4) allows comparisons of unsigned values against non-negative constant integer expressions. This makes min(unsigned_int_var, 4) be ok. The only common case that is still errored is the comparison of signed values against unsigned constant integer expressions below __INT_MAX__. Typcally min(int_val, sizeof (foo)), the real fix for this is casting the constant: min(int_var, (int)sizeof (foo)). With all the patches applied pretty much all the min_t() could be replaced by min(), and most of the rest by umin(). However they all need careful inspection due to code like: sz = min_t(unsigned char, sz - 1, LIM - 1) + 1; which converts 0 to LIM. This patch (of 6): umin() and umax() can be used when min()/max() errors a signed v unsigned compare when the signed value is known to be non-negative. Unlike min_t(some_unsigned_type, a, b) umin() will never mask off high bits if an inappropriate type is selected. The '+ 0u + 0ul + 0ull' may look strange. The '+ 0u' is needed for 'signed int' on 64bit systems. The '+ 0ul' is needed for 'signed long' on 32bit systems. The '+ 0ull' is needed for 'signed long long'. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Laight <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18kstrtox: remove strtobool()Christophe JAILLET1-5/+0
The conversion from strtobool() to kstrtobool() is completed. So strtobool() can now be removed. Link: https://lkml.kernel.org/r/87e3cc2547df174cd5af1fadbf866be4ef9e8e45.1694878151.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18extract and use FILE_LINE macroAlexey Dobriyan3-3/+4
Extract nifty FILE_LINE useful for printk style debugging: printk("%s\n", FILE_LINE); It should not be used en mass probably because __FILE__ string literals can be merged while FILE_LINE's won't. But for debugging it is what the doctor ordered. Don't add leading and trailing underscores, they're painful to type. Trust me, I've tried both versions. Link: https://lkml.kernel.org/r/ebf12ac4-5a61-4b12-b8b0-1253eb371332@p183 Signed-off-by: Alexey Dobriyan <[email protected]> Cc: Kees Cook <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18mm: update memfd seal write check to include F_SEAL_WRITELorenzo Stoakes1-7/+8
The seal_check_future_write() function is called by shmem_mmap() or hugetlbfs_file_mmap() to disallow any future writable mappings of an memfd sealed this way. The F_SEAL_WRITE flag is not checked here, as that is handled via the mapping->i_mmap_writable mechanism and so any attempt at a mapping would fail before this could be run. However we intend to change this, meaning this check can be performed for F_SEAL_WRITE mappings also. The logic here is equally applicable to both flags, so update this function to accommodate both and rename it accordingly. Link: https://lkml.kernel.org/r/913628168ce6cce77df7d13a63970bae06a526e0.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-10-18mm: drop the assumption that VM_SHARED always implies writableLorenzo Stoakes2-2/+13
Patch series "permit write-sealed memfd read-only shared mappings", v4. The man page for fcntl() describing memfd file seals states the following about F_SEAL_WRITE:- Furthermore, trying to create new shared, writable memory-mappings via mmap(2) will also fail with EPERM. With emphasis on 'writable'. In turns out in fact that currently the kernel simply disallows all new shared memory mappings for a memfd with F_SEAL_WRITE applied, rendering this documentation inaccurate. This matters because users are therefore unable to obtain a shared mapping to a memfd after write sealing altogether, which limits their usefulness. This was reported in the discussion thread [1] originating from a bug report [2]. This is a product of both using the struct address_space->i_mmap_writable atomic counter to determine whether writing may be permitted, and the kernel adjusting this counter when any VM_SHARED mapping is performed and more generally implicitly assuming VM_SHARED implies writable. It seems sensible that we should only update this mapping if VM_MAYWRITE is specified, i.e. whether it is possible that this mapping could at any point be written to. If we do so then all we need to do to permit write seals to function as documented is to clear VM_MAYWRITE when mapping read-only. It turns out this functionality already exists for F_SEAL_FUTURE_WRITE - we can therefore simply adapt this logic to do the same for F_SEAL_WRITE. We then hit a chicken and egg situation in mmap_region() where the check for VM_MAYWRITE occurs before we are able to clear this flag. To work around this, perform this check after we invoke call_mmap(), with careful consideration of error paths. Thanks to Andy Lutomirski for the suggestion! [1]:https://lore.kernel.org/all/[email protected]/ [2]:https://bugzilla.kernel.org/show_bug.cgi?id=217238 This patch (of 3): There is a general assumption that VMAs with the VM_SHARED flag set are writable. If the VM_MAYWRITE flag is not set, then this is simply not the case. Update those checks which affect the struct address_space->i_mmap_writable field to explicitly test for this by introducing [vma_]is_shared_maywrite() helper functions. This remains entirely conservative, as the lack of VM_MAYWRITE guarantees that the VMA cannot be written to. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/d978aefefa83ec42d18dfa964ad180dbcde34795.1697116581.git.lstoakes@gmail.com Signed-off-by: Lorenzo Stoakes <[email protected]> Suggested-by: Andy Lutomirski <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>