Age  Commit message  Author  Files  Lines
2024-01-17s390/vfio-ap: always filter entire AP matrixTony Krowiak1-40/+17
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or domain is assigned to the mdev. The purpose of the function is to update the guest's AP configuration by filtering the matrix of adapters and domains assigned to the mdev. When an adapter or domain is assigned, only the APQNs associated with the APID of the new adapter or APQI of the new domain are inspected. If an APQN does not reference a queue device bound to the vfio_ap device driver, then its APID will be filtered from the mdev's matrix when updating the guest's AP configuration.

Inspecting only the APID of the new adapter or APQI of the new domain will result in passing AP queues through to a guest that are not bound to the vfio_ap device driver under certain circumstances. Consider the following:

guest's AP configuration (all also assigned to the mdev's matrix):
  14.0004
  14.0005
  14.0006
  16.0004
  16.0005
  16.0006

  unassign domain 4
  unbind queue 16.0005
  assign domain 4

When domain 4 is re-assigned, since only domain 4 will be inspected, the APQNs that will be examined will be:
  14.0004
  16.0004

Since both of those APQNs reference queue devices that are bound to the vfio_ap device driver, nothing will get filtered from the mdev's matrix when updating the guest's AP configuration. Consequently, queue 16.0005 will get passed through despite not being bound to the driver. This violates the Linux device model requirement that a guest shall only be given access to devices bound to the device driver facilitating their pass-through.

To resolve this problem, every adapter and domain assigned to the mdev will be inspected when filtering the mdev's matrix.

Signed-off-by: Tony Krowiak <[email protected]>
Acked-by: Halil Pasic <[email protected]>
Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexander Gordeev <[email protected]>
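The whole-matrix filtering described above can be illustrated with a small userspace sketch. It is not the driver's code: the types `ap_matrix` and `queue_state`, the bounds `NR_ADAPTERS` and `NR_DOMAINS`, and the function `filter_entire_matrix` are all hypothetical names chosen for illustration, and the sketch only models the rule "drop an APID if any of its assigned APQNs references an unbound queue".

```c
#include <stdbool.h>
#include <string.h>

#define NR_ADAPTERS 32 /* hypothetical bounds, chosen for the sketch */
#define NR_DOMAINS  32

struct ap_matrix {
    bool apm[NR_ADAPTERS]; /* assigned adapters (APIDs) */
    bool aqm[NR_DOMAINS];  /* assigned domains (APQIs) */
};

/* bound[apid][apqi] is true when that queue is bound to the driver */
struct queue_state {
    bool bound[NR_ADAPTERS][NR_DOMAINS];
};

/*
 * Build the guest matrix from the mdev matrix by inspecting every
 * assigned APID/APQI pair, not just the most recently assigned one.
 * An adapter is dropped as soon as one of its APQNs references a
 * queue that is not bound.
 */
static void filter_entire_matrix(const struct ap_matrix *mdev,
                                 const struct queue_state *qs,
                                 struct ap_matrix *guest)
{
    memcpy(guest, mdev, sizeof(*guest));

    for (int apid = 0; apid < NR_ADAPTERS; apid++) {
        if (!mdev->apm[apid])
            continue;
        for (int apqi = 0; apqi < NR_DOMAINS; apqi++) {
            if (mdev->aqm[apqi] && !qs->bound[apid][apqi]) {
                guest->apm[apid] = false; /* filter the whole APID */
                break;
            }
        }
    }
}
```

With the scenario from the commit message (adapters 14 and 16, domains 4 to 6, queue 16.0005 unbound), this sketch keeps adapter 14 and filters adapter 16, regardless of which domain was assigned last.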
2024-01-17s390/net: add Thorsten Winkler as maintainerAlexandra Winter1-2/+2
Thank you Wenjia for your support, welcome Thorsten! Acked-by: Wenjia Zhang <[email protected]> Acked-by: Thorsten Winkler <[email protected]> Signed-off-by: Alexandra Winter <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]>
2024-01-17ipvs: avoid stat macros calls from preemptible contextFedor Pchelkin1-2/+2
Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace:

check_preemption_disabled: 48 callbacks suppressed
BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177
caller is decrement_ttl+0x217/0x830
CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0xbd/0xe0
 check_preemption_disabled+0xd1/0xe0
 decrement_ttl+0x217/0x830
 __ip_vs_get_out_rt+0x4e0/0x1ef0
 ip_vs_nat_xmit+0x205/0xcd0
 ip_vs_in_hook+0x9b1/0x26a0
 nf_hook_slow+0xc2/0x210
 nf_hook+0x1fb/0x770
 __ip_local_out+0x33b/0x640
 ip_local_out+0x2a/0x490
 __ip_queue_xmit+0x990/0x1d10
 __tcp_transmit_skb+0x288b/0x3d10
 tcp_connect+0x3466/0x5180
 tcp_v4_connect+0x1535/0x1bb0
 __inet_stream_connect+0x40d/0x1040
 inet_stream_connect+0x57/0xa0
 __sys_connect_file+0x162/0x1a0
 __sys_connect+0x137/0x160
 __x64_sys_connect+0x72/0xb0
 do_syscall_64+0x6f/0x140
 entry_SYSCALL_64_after_hwframe+0x6e/0x76
RIP: 0033:0x7fe6dbbc34e0

Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS.

Found by Linux Verification Center (linuxtesting.org).

Fixes: 8d8e20e2d7bb ("ipvs: Decrement ttl")
Signed-off-by: Fedor Pchelkin <[email protected]>
Acked-by: Julian Anastasov <[email protected]>
Acked-by: Simon Horman <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_tables: reject NFT_SET_CONCAT with not field length descriptionPablo Neira Ayuso1-1/+5
It is still possible to set the NFT_SET_CONCAT flag by specifying a set size and no field description. Report EINVAL in such a case. Fixes: 1b6345d4160e ("netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_tables: skip dead set elements in netlink dumpPablo Neira Ayuso1-1/+1
Deletion from the packet path relies on the garbage collector to purge elements with NFT_SET_ELEM_DEAD_BIT set. Skip these dead elements in the nf_tables_dump_setelem() path. I very rarely see tests/shell/testcases/maps/typeof_maps_add_delete report [DUMP FAILED], showing a mismatch in the expected output with an element that should not be there. If the netlink dump happens before the GC worker runs, it might show dead elements in the ruleset listing. nft_rhash_get() already skips dead elements in nft_rhash_cmp(), so it already does not show the element when getting a single element via the netlink control plane. Fixes: 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_tables: do not allow mismatch field size and set key lengthPablo Neira Ayuso1-1/+5
The set description provides the size of each field in the set whose sum should not mismatch the set key length, bail out otherwise. I did not manage to crash nft_set_pipapo with mismatch fields and set key length so far, but this is UB which must be disallowed. Fixes: f3a2181e16f1 ("netfilter: nf_tables: Support for sets with multiple ranged fields") Signed-off-by: Pablo Neira Ayuso <[email protected]>
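The consistency rule stated above (the field sizes must add up to the set key length) can be sketched as follows. This is a simplified illustration that compares plain byte sums; the function name `check_field_lengths` and its parameters are hypothetical, the kernel's actual check may apply additional rounding or alignment, and returning -EINVAL is an assumption, since the message only says "bail out".

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: accept a set description only when the sizes of
 * its fields add up exactly to the declared key length.
 * Returning -EINVAL on mismatch is an assumption of this sketch.
 */
static int check_field_lengths(const unsigned char *field_len,
                               size_t field_count,
                               unsigned int key_len)
{
    unsigned int sum = 0;

    for (size_t i = 0; i < field_count; i++)
        sum += field_len[i];

    return sum == key_len ? 0 : -EINVAL;
}
```

For example, two fields of 4 and 2 bytes are accepted for a 6-byte key and rejected for an 8-byte key.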
2024-01-17netfilter: nf_tables: check if catch-all set element is active in next generationPablo Neira Ayuso1-1/+1
When deactivating the catch-all set element, check the state in the next generation that represents this transaction. This bug was uncovered after the recent removal of the element busy mark in a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API"). Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Cc: [email protected] Reported-by: lonial con <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: bridge: replace physindev with physinif in nf_bridge_infoPavel Tikhomirov6-21/+61
An skb can be added to a neigh->arp_queue while waiting for an ARP reply, and the original skb's skb->dev can differ from the neigh's neigh->dev. For instance, when bridging a DNATed skb from one veth to another, the skb is added to a neigh->arp_queue of the bridge.

As skb->dev can be reset back to nf_bridge->physindev and used, and as there is no explicit mechanism that prevents this physindev from being freed under us (for instance, neigh_flush_dev doesn't clean up skbs from a different device's neigh queue), we can crash on e.g. this stack:

 arp_process
 neigh_update
 skb = __skb_dequeue(&neigh->arp_queue)
 neigh_resolve_output(..., skb)
 ...
 br_nf_dev_xmit
 br_nf_pre_routing_finish_bridge_slow
 skb->dev = nf_bridge->physindev
 br_handle_frame_finish

Let's use a plain ifindex instead of a net_device link. To peek into the original net_device we will use dev_get_by_index_rcu(). Thus either we get the device and are safe to use it, or we don't get it and drop the skb.

Fixes: c4e70a87d975 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
Suggested-by: Florian Westphal <[email protected]>
Signed-off-by: Pavel Tikhomirov <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: propagate net to nf_bridge_get_physindevPavel Tikhomirov7-15/+16
This is a preparation patch for replacing physindev with physinif on nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve device, when needed, and it requires net to be available. Signed-off-by: Pavel Tikhomirov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_queue: remove excess nf_bridge variablePavel Tikhomirov1-3/+1
We don't really need nf_bridge variable here. And nf_bridge_info_exists is better replacement for nf_bridge_info_get in case we are only checking for existence. Signed-off-by: Pavel Tikhomirov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nfnetlink_log: use proper helper for fetching physinifPavel Tikhomirov1-4/+4
We don't use physindev in __build_packet_message except for getting physinif from it. So let's switch to nf_bridge_get_physinif to get what we want directly. Signed-off-by: Pavel Tikhomirov <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nft_limit: do not ignore unsupported flagsPablo Neira Ayuso1-7/+12
Bail out if userspace provides unsupported flags, otherwise future extensions to the limit expression will be silently ignored by the kernel. Fixes: c7862a5f0de5 ("netfilter: nft_limit: allow to invert matching criteria") Signed-off-by: Pablo Neira Ayuso <[email protected]>
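The "reject what you do not understand" pattern described above can be sketched in a few lines. The names `LIMIT_F_INV`, `LIMIT_F_SUPPORTED` and `limit_check_flags` are hypothetical stand-ins rather than kernel identifiers, and the choice of -EOPNOTSUPP as the error code is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

#define LIMIT_F_INV       (1u << 0)   /* hypothetical: the one known flag */
#define LIMIT_F_SUPPORTED LIMIT_F_INV /* mask of every flag we understand */

/*
 * Reject any flag bit outside the supported mask, so that a future
 * extension is refused by an old kernel instead of silently ignored.
 */
static int limit_check_flags(uint32_t flags)
{
    if (flags & ~LIMIT_F_SUPPORTED)
        return -EOPNOTSUPP; /* error code is an assumption */

    return 0;
}
```

Checking against a single "supported" mask keeps the test correct when new flags are added later: only the mask needs to grow.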
2024-01-17netfilter: nf_tables: bail out if stateful expression provides no .clonePablo Neira Ayuso1-8/+7
All existing NFT_EXPR_STATEFUL provide a .clone interface, remove fallback to copy content of stateful expression since this is never exercised and bail out if .clone interface is not defined. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_tables: validate .maxattr at expression registrationPablo Neira Ayuso1-0/+3
struct nft_expr_info can store up to NFT_EXPR_MAXATTR (16) attributes when parsing netlink attributes. Raise a warning in case there is ever an nft expression whose .maxattr goes beyond this limit; in such a case, struct nft_expr_info needs to be updated. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17netfilter: nf_tables: reject invalid set policyPablo Neira Ayuso1-1/+9
Report -EINVAL in case userspace provides an unsupported set backend policy. Fixes: c50b960ccc59 ("netfilter: nf_tables: implement proper set selection") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-01-17net: netdevsim: don't try to destroy PHC on VFsJakub Kicinski1-2/+7
PHC gets initialized in nsim_init_netdevsim(), which is only called if (nsim_dev_port_is_pf()).

Create a counterpart of nsim_init_netdevsim() and move the mock_phc_destroy() there.

This fixes a crash trying to destroy netdevsim with VFs instantiated, as caught by running the devlink.sh test:

BUG: kernel NULL pointer dereference, address: 00000000000000b8
RIP: 0010:mock_phc_destroy+0xd/0x30
Call Trace:
 <TASK>
 nsim_destroy+0x4a/0x70 [netdevsim]
 __nsim_dev_port_del+0x47/0x70 [netdevsim]
 nsim_dev_reload_destroy+0x105/0x120 [netdevsim]
 nsim_drv_remove+0x2f/0xb0 [netdevsim]
 device_release_driver_internal+0x1a1/0x210
 bus_remove_device+0xd5/0x120
 device_del+0x159/0x490
 device_unregister+0x12/0x30
 del_device_store+0x11a/0x1a0 [netdevsim]
 kernfs_fop_write_iter+0x130/0x1d0
 vfs_write+0x30b/0x4b0
 ksys_write+0x69/0xf0
 do_syscall_64+0xcc/0x1e0
 entry_SYSCALL_64_after_hwframe+0x6f/0x77

Fixes: b63e78fca889 ("net: netdevsim: use mock PHC driver")
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-01-17mptcp: relax check on MPC passive fallbackPaolo Abeni1-1/+2
While testing the blamed commit below, I was able to miss (!) packetdrill failures in the fastopen test-cases. On passive fastopen the child socket is created by incoming TCP MPC syn, allow for both MPC_SYN and MPC_ACK header. Fixes: 724b00c12957 ("mptcp: refine opt_mp_capable determination") Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-17net: stmmac: Prevent DSA tags from breaking COERomain Gantois1-3/+29
Some DSA tagging protocols change the EtherType field in the MAC header e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged frames are ignored by the checksum offload engine and IP header checker of some stmmac cores. On RX, the stmmac driver wrongly assumes that checksums have been computed for these tagged packets, and sets CHECKSUM_UNNECESSARY. Add an additional check in the stmmac TX and RX hotpaths so that COE is deactivated for packets with ethertypes that will not trigger the COE and IP header checks. Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan") Cc: <[email protected]> Reported-by: Richard Tresidder <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Reported-by: Romain Gantois <[email protected]> Link: https://lore.kernel.org/netdev/[email protected]/ Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Romain Gantois <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-01-17gpiolib: revert the attempt to protect the GPIO device list with an rwsemBartosz Golaszewski4-89/+97
This reverts commits 1979a2807547 ("gpiolib: replace the GPIO device mutex with a read-write semaphore") and 65a828bab158 ("gpiolib: use a mutex to protect the list of GPIO devices"). Unfortunately the legacy GPIO API that's still used in older code has to translate numbers from the global GPIO numberspace to descriptors. This results in a GPIO device lookup in every call to legacy functions. Some of those functions - like gpio_set/get_value() - can be called from atomic context so taking a sleeping lock that is an RW semaphore results in an error. We'll probably have to protect this list with SRCU. Reported-by: Dan Carpenter <[email protected]> Closes: https://lore.kernel.org/linux-wireless/[email protected]/ Fixes: 1979a2807547 ("gpiolib: replace the GPIO device mutex with a read-write semaphore") Fixes: 65a828bab158 ("gpiolib: use a mutex to protect the list of GPIO devices") Signed-off-by: Bartosz Golaszewski <[email protected]>
2024-01-16selftests: rtnetlink: use setup_ns in bonding testNicolas Dichtel1-7/+5
This is a follow-up of commit a159cbe81d3b ("selftests: rtnetlink: check enslaving iface in a bond") after the merge of net-next into net. The goal is to follow the new convention, see commit d3b6b1116127 ("selftests/net: convert rtnetlink.sh to run it in unique namespace") for more details. Let's use also the generic dummy name instead of defining a new one. Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-01-16net: sfp-bus: fix SFP mode detect from bitrateRussell King (Oracle)1-4/+4
The referenced commit moved the setting of the Autoneg and pause bits early in sfp_parse_support(). However, we check whether the modes are empty before using the bitrate to set some modes. Setting these bits so early causes that test to always be false, preventing this working, and thus some modules that used to work no longer do. Move them just before the call to the quirk. Fixes: 8110633db49d ("net: sfp-bus: allow SFP quirks to override Autoneg and pause bits") Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Maxime Chevallier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-01-16net: dsa: vsc73xx: Add null pointer check to vsc73xx_gpio_probeKunwu Chan1-0/+2
devm_kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Signed-off-by: Kunwu Chan <[email protected]> Suggested-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-01-16eventfs: Use kcalloc() instead of kzalloc()Erick Archer1-3/+3
As noted in the "Deprecated Interfaces, Language Features, Attributes, and Conventions" documentation [1], size calculations (especially multiplication) should not be performed in memory allocator (or similar) function arguments due to the risk of them overflowing. This could lead to values wrapping around and a smaller allocation being made than the caller was expecting. Using those allocations could lead to linear overflows of heap memory and other misbehaviors. So, use the purpose specific kcalloc() function instead of the argument size * count in the kzalloc() function. [1] https://www.kernel.org/doc/html/next/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Mark Rutland <[email protected]> Link: https://github.com/KSPP/linux/issues/162 Signed-off-by: Erick Archer <[email protected]> Reviewed-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
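The overflow risk described above can be shown with a userspace analogue. This is a sketch only: `zalloc_array` is a hypothetical helper that mimics what an array allocator such as kcalloc() is meant to guarantee, namely that the count-times-size multiplication is checked before any memory is requested.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of an array allocator: allocate and
 * zero n elements of the given size, refusing the request when n * size
 * would wrap around instead of silently allocating a smaller buffer.
 */
static void *zalloc_array(size_t n, size_t size)
{
    void *p;

    if (size != 0 && n > SIZE_MAX / size)
        return NULL; /* n * size would overflow */

    p = malloc(n * size);
    if (p)
        memset(p, 0, n * size);

    return p;
}
```

An open-coded `malloc(n * size)` with the same huge `n` would wrap and return a buffer far smaller than the caller expects, which is the linear heap overflow scenario the commit message warns about.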
2024-01-16eventfs: Do not create dentries nor inodes in iterate_sharedSteven Rostedt (Google)1-15/+5
The original eventfs code added a wrapper around the dcache_readdir open callback and created all the dentries and inodes at open, and increment their ref count. A wrapper was added around the dcache_readdir release function to decrement all the ref counts of those created inodes and dentries. But this proved to be buggy[1] for when a kprobe was created during a dir read, it would create a dentry between the open and the release, and because the release would decrement all ref counts of all files and directories, that would include the kprobe directory that was not there to have its ref count incremented in open. This would cause the ref count to go to negative and later crash the kernel. To solve this, the dentries and inodes that were created and had their ref count upped in open needed to be saved. That list needed to be passed from the open to the release, so that the release would only decrement the ref counts of the entries that were incremented in the open. Unfortunately, the dcache_readdir logic was already using the file->private_data, which is the only field that can be used to pass information from the open to the release. What was done was the eventfs created another descriptor that had a void pointer to save the dcache_readdir pointer, and it wrapped all the callbacks, so that it could save the list of entries that had their ref counts incremented in the open, and pass it to the release. The wrapped callbacks would just put back the dcache_readdir pointer and call the functions it used so it could still use its data[2]. But Linus had an issue with the "hijacking" of the file->private_data (unfortunately this discussion was on a security list, so no public link). Which we finally agreed on doing everything within the iterate_shared callback and leave the dcache_readdir out of it[3]. All the information needed for the getents() could be created then. But this ended up being buggy too[4]. 
The iterate_shared callback was not the right place to create the dentries and inodes. Even Christian Brauner had issues with that[5].

An attempt was to go back to creating the inodes and dentries at the open, create an array to store the information in the file->private_data, and pass that information to the other callbacks.[6]

The difference between that and the original method is that it does not use dcache_readdir. It also does not up the ref counts of the dentries and pass them. Instead, it creates an array of a structure that saves the dentry's name and inode number. That information is used in the iterate_shared callback, and the array is freed in the dir release. The dentries and inodes created in the open are not used for the iterate_shared or release callbacks. Just their names and inode numbers.

Linus did not like that either[7] and just wanted to remove the dentries being created in iterate_shared and use the hard coded inode numbers.

[ All this while Linus enjoyed an unexpected vacation during the merge window due to lack of power. ]

[1] https://lore.kernel.org/linux-trace-kernel/[email protected]/
[2] https://lore.kernel.org/linux-trace-kernel/[email protected]/
[3] https://lore.kernel.org/linux-trace-kernel/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://lore.kernel.org/all/20240111-unzahl-gefegt-433acb8a841d@brauner/
[6] https://lore.kernel.org/all/[email protected]/
[7] https://lore.kernel.org/all/[email protected]/

Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Ajay Kaher <[email protected]>
Fixes: 493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-01-16block: Fix iterating over an empty bio with bio_for_each_folio_allMatthew Wilcox (Oracle)1-3/+6
If the bio contains no data, bio_first_folio() calls page_folio() on a NULL pointer and oopses. Move the test that we've reached the end of the bio from bio_next_folio() to bio_first_folio(). Reported-by: [email protected] Reported-by: [email protected] Fixes: 640d1930bef4 ("block: Add bio_for_each_folio_all()") Cc: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: add unlikely() to error case] Signed-off-by: Jens Axboe <[email protected]>
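The idea of putting the end-of-sequence test in the "first" step, so that an empty container is never dereferenced, can be sketched with a toy iterator. The types `seg`, `seg_list` and `seg_iter` and the functions below are hypothetical; they are loosely modeled on the description above and are not the block layer's definitions.

```c
#include <stddef.h>

struct seg { int len; };

struct seg_list {
    const struct seg *segs; /* may be NULL when count is 0 */
    size_t count;
};

struct seg_iter {
    const struct seg *cur; /* NULL once iteration is finished */
    size_t idx;
};

/*
 * Position the iterator at index i. The bounds test lives here, so an
 * empty list ends the iteration before segs[0] is ever touched.
 */
static void seg_iter_first(struct seg_iter *it, const struct seg_list *l,
                           size_t i)
{
    if (i >= l->count) {
        it->cur = NULL;
        return;
    }
    it->idx = i;
    it->cur = &l->segs[i];
}

/* Advancing simply re-enters the checked positioning step. */
static void seg_iter_next(struct seg_iter *it, const struct seg_list *l)
{
    seg_iter_first(it, l, it->idx + 1);
}

/* Count the elements by walking the list with the iterator. */
static size_t seg_count(const struct seg_list *l)
{
    struct seg_iter it;
    size_t n = 0;

    for (seg_iter_first(&it, l, 0); it.cur; seg_iter_next(&it, l))
        n++;

    return n;
}
```

Had the bounds test lived only in `seg_iter_next()`, the first step would read `segs[0]` unconditionally and fault on an empty list.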
2024-01-16eventfs: Have the inodes all for files and directories all be the sameSteven Rostedt (Google)1-0/+10
The dentries and inodes are created in the readdir for the sole purpose of getting a consistent inode number. Linus stated that is unnecessary, and that all inodes can have the same inode number. For a virtual file system they are pretty meaningless. Instead use a single unique inode number for all files and one for all directories. Link: https://lore.kernel.org/all/[email protected]/ Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Al Viro <[email protected]> Cc: Ajay Kaher <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2024-01-16Input: atkbd - use ab83 as id when skipping the getid commandHans de Goede1-5/+7
Barnabás reported that the change to skip the getid command when the controller is in translated mode on laptops caused the Version field of his "AT Translated Set 2 keyboard" input device to change from ab83 to abba, breaking a custom hwdb entry for this keyboard. Use the standard ab83 id for keyboards when getid is skipped (rather than when getid fails) to avoid reporting a different Version to userspace than before skipping the getid. Fixes: 936e4d49ecbc ("Input: atkbd - skip ATKBD_CMD_GETID in translated mode") Reported-by: Barnabás Pőcze <[email protected]> Closes: https://lore.kernel.org/linux-input/W1ydwoG2fYv85Z3C3yfDOJcVpilEvGge6UGa9kZh8zI2-qkHXp7WLnl2hSkFz63j-c7WupUWI5TLL6n7Lt8DjRuU-yJBwLYWrreb1hbnd6A=@protonmail.com/ Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2024-01-16block: bio-integrity: fix kcalloc() arguments orderDmitry Antipov1-1/+1
When compiling with gcc version 14.0.1 20240116 (experimental) and W=1, I've noticed the following warning:

block/bio-integrity.c: In function 'bio_integrity_map_user':
block/bio-integrity.c:339:38: warning: 'kcalloc' sizes specified with 'sizeof' in the earlier argument and not in the later argument [-Wcalloc-transposed-args]
  339 | bvec = kcalloc(sizeof(*bvec), nr_vecs, GFP_KERNEL);
      | ^
block/bio-integrity.c:339:38: note: earlier argument should specify number of elements, later size of each element

Since the 'n' and 'size' arguments of 'kcalloc()' are multiplied to calculate the final size, their actual order doesn't affect the result and so this is not a bug. But it's still worth fixing.

Fixes: 492c5d455969 ("block: bio-integrity: directly map user buffers")
Signed-off-by: Dmitry Antipov <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
2024-01-16selftests/bpf: Add test for alu on PTR_TO_FLOW_KEYSHao Sun1-0/+19
Add a test case for PTR_TO_FLOW_KEYS alu. Testing if alu with variable offset on flow_keys is rejected. For the fixed offset success case, we already have C code coverage to verify (e.g. via bpf_flow.c). Signed-off-by: Hao Sun <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-01-16bpf: Reject variable offset alu on PTR_TO_FLOW_KEYSHao Sun1-0/+4
For PTR_TO_FLOW_KEYS, check_flow_keys_access() only uses fixed off for validation. However, variable offset ptr alu is not prohibited for this ptr kind. So the variable offset is not checked.

The following prog is accepted:

func#0 @0
0: R1=ctx() R10=fp0
0: (bf) r6 = r1 ; R1=ctx() R6_w=ctx()
1: (79) r7 = *(u64 *)(r6 +144) ; R6_w=ctx() R7_w=flow_keys()
2: (b7) r8 = 1024 ; R8_w=1024
3: (37) r8 /= 1 ; R8_w=scalar()
4: (57) r8 &= 1024 ; R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400))
5: (0f) r7 += r8
mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
mark_precise: frame0: regs=r8 stack= before 4: (57) r8 &= 1024
mark_precise: frame0: regs=r8 stack= before 3: (37) r8 /= 1
mark_precise: frame0: regs=r8 stack= before 2: (b7) r8 = 1024
6: R7_w=flow_keys(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400))
6: (79) r0 = *(u64 *)(r7 +0) ; R0_w=scalar()
7: (95) exit

This prog loads flow_keys to r7, and adds the variable offset r8 to r7, and finally causes out-of-bounds access:

BUG: unable to handle page fault for address: ffffc90014c80038
[...]
Call Trace:
 <TASK>
 bpf_dispatcher_nop_func include/linux/bpf.h:1231 [inline]
 __bpf_prog_run include/linux/filter.h:651 [inline]
 bpf_prog_run include/linux/filter.h:658 [inline]
 bpf_prog_run_pin_on_cpu include/linux/filter.h:675 [inline]
 bpf_flow_dissect+0x15f/0x350 net/core/flow_dissector.c:991
 bpf_prog_test_run_flow_dissector+0x39d/0x620 net/bpf/test_run.c:1359
 bpf_prog_test_run kernel/bpf/syscall.c:4107 [inline]
 __sys_bpf+0xf8f/0x4560 kernel/bpf/syscall.c:5475
 __do_sys_bpf kernel/bpf/syscall.c:5561 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5559 [inline]
 __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5559
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x3f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x63/0x6b

Fix this by rejecting ptr alu with variable offset on flow_keys. Applying the patch rejects the program with "R7 pointer arithmetic on flow_keys prohibited".

Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Hao Sun <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
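The rule introduced by this fix can be reduced to a small decision function. The sketch below is an illustration only: `ptr_kind`, `alu_offset` and `check_ptr_add` are hypothetical names, the real verifier tracks far more state, and the -EACCES return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pointer kinds, reduced to what the sketch needs. */
enum ptr_kind {
    PTR_KIND_MAP_VALUE, /* stands in for a kind that permits variable offsets */
    PTR_KIND_FLOW_KEYS, /* accesses are validated with a fixed offset only */
};

struct alu_offset {
    bool is_const; /* true when the added value is a known constant */
    long value;    /* meaningful only when is_const is true */
};

/*
 * Pointer arithmetic on flow_keys is allowed only with a constant
 * offset, because the later access check never sees the variable part.
 */
static int check_ptr_add(enum ptr_kind kind, const struct alu_offset *off)
{
    if (kind == PTR_KIND_FLOW_KEYS && !off->is_const)
        return -EACCES; /* error code is an assumption */

    return 0;
}
```

In the accepted program shown above, `r8` is a scalar with a variable range, so under this rule `r7 += r8` on a flow_keys pointer would be refused at instruction 5.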
2024-01-16selftests: bonding: add missing build configsJakub Kicinski1-0/+3
bonding tests also try to create bridge, veth and dummy interfaces. These are not currently listed in config. Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") Fixes: c078290a2b76 ("selftests: include bonding tests into the kselftest infra") Acked-by: Muhammad Usama Anjum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-01-16Merge branches 'pnp', 'acpi-resource' and 'acpica'Rafael J. Wysocki4-4/+33
Merge a PNP change, new ACPI IRQ management quirks and a small ACPICA code update for 6.8-rc1:

 - Make pnp_bus_type const (Greg Kroah-Hartman).
 - Add ACPI IRQ management quirks for ASUS ExpertBook B1502CGA and ASUS Vivobook E1504GA and E1504GAB (Ben Mayo, Michael Maltsev).
 - Add new MADT GICC/GICR/ITS non-coherent flags and GICC online capable bit handling to ACPICA (Lorenzo Pieralisi).

* pnp:
  PNP: make pnp_bus_type const
* acpi-resource:
  ACPI: resource: Skip IRQ override on ASUS ExpertBook B1502CGA
  ACPI: resource: Add DMI quirks for ASUS Vivobook E1504GA and E1504GAB
* acpica:
  ACPICA: MADT: Add new MADT GICC/GICR/ITS non-coherent flags handling
  ACPICA: MADT: Add GICC online capable bit handling
2024-01-16selftests: netdevsim: correct expected FEC stringsJakub Kicinski1-7/+11
ethtool CLI has changed its output. Make the test compatible. Signed-off-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-01-16Merge branches 'thermal-core' and 'thermal-intel'Rafael J. Wysocki13-127/+1026
Merge additional updates for 6.8-rc1 in the thermal core and in the Intel HFI thermal driver:

 - Add debugfs-based diagnostics support to the thermal core (Daniel Lezcano, Dan Carpenter).
 - Fix a power allocator thermal governor issue preventing it from resetting cooling devices sometimes (Di Shen).
 - Simplify the thermal netlink API and clean up related code (Rafael J. Wysocki).
 - Make the Intel HFI driver support hibernation and deep suspend properly (Ricardo Neri).

* thermal-core:
  thermal/debugfs: Unlock on error path in thermal_debug_tz_trip_up()
  thermal: gov_power_allocator: avoid inability to reset a cdev
  thermal: helpers: Rearrange thermal_cdev_set_cur_state()
  thermal: netlink: Rework notify API for cooling devices
  thermal: core: Use kstrdup_const() during cooling device registration
  thermal/debugfs: Add thermal debugfs information for mitigation episodes
  thermal/debugfs: Add thermal cooling device debugfs information
  thermal: netlink: Pass thermal zone pointer to notify routines
  thermal: netlink: Drop thermal_notify_tz_trip_add/delete()
  thermal: netlink: Pass pointers to thermal_notify_tz_trip_up/down()
  thermal: netlink: Pass pointers to thermal_notify_tz_trip_change()
* thermal-intel:
  thermal: intel: hfi: Add syscore callbacks for system-wide PM
2024-01-16Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-qos' into pmRafael J. Wysocki6-59/+86
* pm-sleep:
  PM: sleep: Restore asynchronous device resume optimization
* pm-cpufreq:
  Documentation: admin-guide: PM: Fix two typos
  cpufreq: intel_pstate: Update hybrid scaling factor for Meteor Lake
* pm-qos:
  PM: QoS: Use kcalloc() instead of kzalloc()
2024-01-16selftests: netdevsim: sprinkle more udevadm settleJakub Kicinski2-0/+2
A number of tests are failing when netdev renaming is active on the system. Add udevadm settle to the logic determining the names. Fixes: 242aaf03dc9b ("selftests: add a test for ethtool pause stats") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-01-16sched/fair: Fix frequency selection for non-invariant caseVincent Guittot1-1/+5
Linus reported a ~50% performance regression on single-threaded workloads on his AMD Ryzen system, and bisected it to: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation") When frequency invariance is not enabled, get_capacity_ref_freq(policy) is supposed to return the current frequency and the performance margin applied by map_util_perf(), enabling the utilization to go above the maximum compute capacity and to select a higher frequency than the current one. After the changes in 9c0b4bb7f630, the performance margin was applied earlier in the path to take into account utilization clampings and we couldn't get a utilization higher than the maximum compute capacity, and the CPU remained 'stuck' at lower frequencies. To fix this, we must use a frequency above the current frequency to get a chance to select a higher OPP when the current one becomes fully used. Apply the same margin and return a frequency 25% higher than the current one in order to switch to the next OPP before we fully use the CPU at the current one. [ mingo: Clarified the changelog. ] Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation") Reported-by: Linus Torvalds <[email protected]> Bisected-by: Linus Torvalds <[email protected]> Reported-by: Wyes Karny <[email protected]> Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Tested-by: Wyes Karny <[email protected]> Link: https://lore.kernel.org/r/[email protected]
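The arithmetic behind the 25% margin described above can be worked through in a short sketch. The function names `ref_freq_non_invariant` and `next_freq` are hypothetical, and the linear mapping from utilization to frequency is a simplification used here only to show why a reference equal to the current frequency can never request a higher one.

```c
/*
 * Hypothetical helper: reference frequency for the non-invariant case,
 * i.e. the current frequency plus a 25% margin (cur + cur / 4).
 */
static unsigned int ref_freq_non_invariant(unsigned int cur_freq)
{
    return cur_freq + (cur_freq >> 2);
}

/*
 * Simplified linear mapping from utilization to a target frequency:
 * target = ref_freq * util / max, with util capped at max by the caller.
 */
static unsigned int next_freq(unsigned int ref_freq, unsigned long util,
                              unsigned long max)
{
    return (unsigned int)((unsigned long long)ref_freq * util / max);
}
```

With a current frequency of 2,000,000 kHz and utilization equal to the maximum capacity, a reference of 2,000,000 yields a target of 2,000,000, so the CPU stays where it is. With the margin applied, the reference becomes 2,500,000 and a fully used CPU requests a higher operating point.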
2024-01-16net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake callsQiang Ma3-2/+10
We found the following dmesg calltrace when testing a notebook with a GMAC NIC:
[9.448656] ------------[ cut here ]------------
[9.448658] Unbalanced IRQ 43 wake disable
[9.448673] WARNING: CPU: 3 PID: 1083 at kernel/irq/manage.c:688 irq_set_irq_wake+0xe0/0x128
[9.448717] CPU: 3 PID: 1083 Comm: ethtool Tainted: G O 4.19 #1
[9.448773] ...
[9.448774] Call Trace:
[9.448781] [<9000000000209b5c>] show_stack+0x34/0x140
[9.448788] [<9000000000d52700>] dump_stack+0x98/0xd0
[9.448794] [<9000000000228610>] __warn+0xa8/0x120
[9.448797] [<9000000000d2fb60>] report_bug+0x98/0x130
[9.448800] [<900000000020a418>] do_bp+0x248/0x2f0
[9.448805] [<90000000002035f4>] handle_bp_int+0x4c/0x78
[9.448808] [<900000000029ea40>] irq_set_irq_wake+0xe0/0x128
[9.448813] [<9000000000a96a7c>] stmmac_set_wol+0x134/0x150
[9.448819] [<9000000000be6ed0>] dev_ethtool+0x1368/0x2440
[9.448824] [<9000000000c08350>] dev_ioctl+0x1f8/0x3e0
[9.448827] [<9000000000bb2a34>] sock_ioctl+0x2a4/0x450
[9.448832] [<900000000046f044>] do_vfs_ioctl+0xa4/0x738
[9.448834] [<900000000046f778>] ksys_ioctl+0xa0/0xe8
[9.448837] [<900000000046f7d8>] sys_ioctl+0x18/0x28
[9.448840] [<9000000000211ab4>] syscall_common+0x20/0x34
[9.448842] ---[ end trace 40c18d9aec863c3e ]---
Each disable_irq_wake() call decrements the IRQ wake_depth. When wake_depth is already 0, calling disable_irq_wake() again triggers the warning above. Because the two calls must be paired, disable_irq_wake() must not be called without a preceding enable_irq_wake(). Fix this by making sure there are no unbalanced disable_irq_wake() calls.
Fixes: 3172d3afa998 ("stmmac: support wake up irq from external sources (v3)")
Signed-off-by: Qiang Ma <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
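The pairing rule behind this warning can be sketched with a toy model. This is a Python illustration, not the driver's code: the `Irq` and `Driver` classes and the `wake_armed` flag are hypothetical names, and tracking the state with a flag is only one possible way to keep the calls balanced. The changelog does not say how the actual patch does it.

```python
class Irq:
    """Toy model of an IRQ line with a wake depth counter."""

    def __init__(self) -> None:
        self.wake_depth = 0
        self.warnings = 0

    def enable_wake(self) -> None:
        self.wake_depth += 1

    def disable_wake(self) -> None:
        if self.wake_depth == 0:
            # Models the "Unbalanced IRQ wake disable" warning.
            self.warnings += 1
            return
        self.wake_depth -= 1


class Driver:
    """Hypothetical driver that remembers whether it armed the wake IRQ."""

    def __init__(self, irq: Irq) -> None:
        self.irq = irq
        self.wake_armed = False  # hypothetical state flag

    def set_wol(self, enable: bool) -> None:
        # Touch the IRQ only when the requested state actually changes,
        # so every disable is matched by an earlier enable.
        if enable and not self.wake_armed:
            self.irq.enable_wake()
        elif not enable and self.wake_armed:
            self.irq.disable_wake()
        self.wake_armed = enable


irq = Irq()
drv = Driver(irq)
# Repeated "disable" requests, as a user might issue through ethtool.
for want in (False, False, True, False, False):
    drv.set_wol(want)
print(irq.wake_depth, irq.warnings)
```

Calling `disable_wake()` directly on a fresh `Irq` increments `warnings`, which is the unbalanced case from the trace. Going through `Driver.set_wol()` keeps the counter balanced even when the same request is repeated.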
2024-01-16Merge branch 'selftests-net-small-fixes'Paolo Abeni3-2/+2
Benjamin Poirier says: ==================== selftests: net: Small fixes From: Benjamin Poirier <[email protected]> Two small fixes for net selftests. These patches were carved out of the following RFC series: https://lore.kernel.org/netdev/[email protected]/ I'm planning to send the rest of the series to net-next after it opens up. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-01-16selftests: forwarding: Remove executable bits from lib.shBenjamin Poirier1-0/+0
The lib.sh script is meant to be sourced from other scripts, not executed directly. Therefore, remove the executable bits from lib.sh's permissions. Fixes: fe32dffdcd33 ("selftests: forwarding: add TCPDUMP_EXTRA_FLAGS to lib.sh") Tested-by: Hangbin Liu <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-01-16selftests: bonding: Change script interpreterBenjamin Poirier2-2/+2
The tests changed by this patch, as well as the scripts they source, use features which are not part of POSIX sh (e.g. 'source' and 'local'). As a result, these tests fail when /bin/sh is dash, as on Debian. Change the interpreter to bash so that these tests can run successfully. Fixes: d43eff0b85ae ("selftests: bonding: up/down delay w/ slave link flapping") Tested-by: Hangbin Liu <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Reviewed-by: Petr Machata <[email protected]> Signed-off-by: Benjamin Poirier <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-01-15rtc: max31335: add driver supportAntoniu Miclaus4-0/+729
RTC driver for MAX31335 ±2ppm Automotive Real-Time Clock with Integrated MEMS Resonator. Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Antoniu Miclaus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2024-01-15dt-bindings: rtc: max31335: add max31335 bindingsAntoniu Miclaus1-0/+70
Document the Analog Devices MAX31335 device tree bindings. Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Antoniu Miclaus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2024-01-15gpio: EN7523: fix kernel-doc warningsRandy Dunlap1-3/+3
Add "struct" keyword and explain the @dir array differently to prevent kernel-doc warnings: gpio-en7523.c:22: warning: cannot understand function prototype: 'struct airoha_gpio_ctrl ' gpio-en7523.c:27: warning: Function parameter or struct member 'dir' not described in 'airoha_gpio_ctrl' gpio-en7523.c:27: warning: Excess struct member 'dir0' description in 'airoha_gpio_ctrl' gpio-en7523.c:27: warning: Excess struct member 'dir1' description in 'airoha_gpio_ctrl' Fixes: 0868ad385aff ("gpio: Add support for Airoha EN7523 GPIO controller") Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2024-01-15rtc: rv8803: add wakeup-source supportAlexandre Belloni2-2/+7
The RV8803 can be wired directly to a PMIC that can wake up an SoC without the CPU getting interrupts. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2024-01-15rtc: ac100: remove misuses of kernel-docRandy Dunlap1-2/+2
Prevent kernel-doc warnings by changing "/**" to common comment format "/*" in non-kernel-doc comments: drivers/rtc/rtc-ac100.c:103: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Clock controls for 3 clock output pins drivers/rtc/rtc-ac100.c:382: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * RTC related bits Signed-off-by: Randy Dunlap <[email protected]> Cc: Chen-Yu Tsai <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexandre Belloni <[email protected]>
2024-01-15Merge branch 'pci/misc'Bjorn Helgaas7-9/+17
- Drop unused struct pci_driver.node member (Mathias Krause)
- Fix documentation typos (Attreyee Mukherjee)
- Use a unique test pattern for each BAR in the pci_endpoint_test to make it easier to debug address translation issues (Niklas Cassel)
- Fix kernel-doc issues (Bjorn Helgaas)
* pci/misc:
  PCI: Fix kernel-doc issues
  misc: pci_endpoint_test: Use a unique test pattern for each BAR
  docs: PCI: Fix typos
  PCI: Remove unused 'node' member from struct pci_driver
2024-01-15Merge branch 'pci/dt-bindings'Bjorn Helgaas2-3/+62
- Increase qcom iommu-map maxItems to accommodate SDX55 (five entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)
- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof Kozlowski)
- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof Kozlowski)
- Correct the name of the qcom "reset-names" property, which was previously misspelled (Krzysztof Kozlowski)
- Document rockchip optional PCIe reference clock input (Heiko Stuebner)
- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil Armstrong)
* pci/dt-bindings:
  dt-bindings: PCI: qcom: Document the SM8650 PCIe Controller
  dt-bindings: PCI: dwc: rockchip: Document optional PCIe reference clock input
  dt-bindings: PCI: qcom: Correct reset-names property
  dt-bindings: PCI: qcom: Correct clocks for SM8150
  dt-bindings: PCI: qcom: Correct clocks for SC8180x
  dt-bindings: PCI: qcom: Adjust iommu-map for different SoC
2024-01-15Merge branch 'pci/remove-old-api'Bjorn Helgaas2-6/+6
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of deprecated ida_simple_*() API with ida_alloc() and ida_free() (Christophe JAILLET)
* pci/remove-old-api:
  dw-xdata: Remove usage of the deprecated ida_simple_*() API
  misc: pci_endpoint_test: Remove usage of the deprecated ida_simple_*() API
  PCI: vmd: Remove usage of the deprecated ida_simple_*() API
2024-01-15Merge branch 'pci/endpoint'Bjorn Helgaas5-6/+6
- Make struct pci_epc_event_ops and struct pci_epf_ops instances const (Lars-Peter Clausen)
* pci/endpoint:
  PCI: endpoint: pci-epf-test: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-vntb: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-ntb: Make struct pci_epf_ops const
  PCI: endpoint: pci-epf-mhi: Make structs pci_epf_ops and pci_epf_event_ops const
  PCI: endpoint: Make struct pci_epf_ops in pci_epf_driver const