aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-06-12nvmet: always initialize cqe.resultDaniel Wagner3-9/+1
The spec doesn't mandate that the first two double words (aka results) for the command queue entry need to be set to 0 when they are not used (not specified). Though, the target implemention returns 0 for TCP and FC but not for RDMA. Let's make RDMA behave the same and thus explicitly initializing the result field. This prevents leaking any data from the stack. Signed-off-by: Daniel Wagner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-06-12nvmet-passthru: propagate status from id override functionsDaniel Wagner1-3/+3
The id override functions return a status which is not propagated to the caller. Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Daniel Wagner <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-06-12nvme: avoid double free special payloadChunguang Xu1-0/+1
If a discard request needs to be retried, and that retry may fail before a new special payload is added, a double free will result. Clear the RQF_SPECIAL_LOAD when the request is cleaned. Signed-off-by: Chunguang Xu <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-06-12block: unmap and free user mapped integrity via submitterAnuj Gupta3-6/+39
The user mapped intergity is copied back and unpinned by bio_integrity_free which is a low-level routine. Do it via the submitter rather than doing it in the low-level block layer code, to split the submitter side from the consumer side of the bio. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-12block: fix request.queuelist usage in flushChengming Zhou1-1/+2
Friedrich Weber reported a kernel crash problem and bisected to commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine"). The root cause is that we use "list_move_tail(&rq->queuelist, pending)" in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch(). We don't initialize its queuelist just for this first request, although the queuelist of all later popped requests will be initialized. Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so rq->queuelist doesn't need to be initialized. It should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH, has no move actually. Please note the commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") also has another requirement that no drivers would touch rq->queuelist after blk_mq_end_request() since we will reuse it to add rq to the post-flush pending list in POSTFLUSH. If this is not true, we will have to revert that commit IMHO. This updated version adds "list_del_init(&rq->queuelist)" in flush rq callback since the dm layer may submit request of a weird invalid format (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH), which causes double list_add if without this "list_del_init(&rq->queuelist)". The weird invalid format problem should be fixed in dm layer. Reported-by: Friedrich Weber <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Closes: https://lore.kernel.org/lkml/[email protected]/ Fixes: 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") Cc: Christoph Hellwig <[email protected]> Cc: [email protected] Cc: [email protected] Tested-by: Friedrich Weber <[email protected]> Signed-off-by: Chengming Zhou <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-12block: Optimize disk zone resource cleanupDamien Le Moal1-0/+3
For zoned block devices using zone write plugging, an rcu_barrier() call is needed in disk_free_zone_resources() to synchronize freeing of zone write plugs and the destrution of the mempool used to allocate the plugs. The barrier call does slow down a little teardown of zoned block devices but should not affect teardown of regular block devices or zoned block devices that do not use zone write plugging (e.g. zoned DM devices that do not require zone append emulation). Modify disk_free_zone_resources() to return early if we do not have a mempool to start with, that is, if the device does not use zone write plugging. This avoids the costly rcu_barrier() and speeds up disk teardown. Reported-by: Mikulas Patocka <[email protected]> Fixes: dd291d77cc90 ("block: Introduce zone write plugging") Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Tested-by: Mikulas Patocka <[email protected]> Reviewed-by: Niklas Cassel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-12block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()Su Hui1-1/+1
Clang static checker (scan-build) warning: block/sed-opal.c:line 317, column 3 Value stored to 'ret' is never read. Fix this problem by returning the error code when keyring_search() failed. Otherwise, 'key' will have a wrong value when 'kerf' stores the error code. Fixes: 3bfeb6125664 ("block: sed-opal: keyring support for SED keys") Signed-off-by: Su Hui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-06-12mailmap: Add my outdated addresses to the map fileAndy Shevchenko2-1/+3
There is a couple of outdated addresses that are still visible in the Git history, add them to .mailmap. While at it, replace one in the comment. Signed-off-by: Andy Shevchenko <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-06-12i2c: designware: Fix the functionality flags of the slave-only interfaceJean Delvare1-1/+1
When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 5b6d721b266a ("i2c: designware: enable SLAVE in platform module") Signed-off-by: Jean Delvare <[email protected]> Cc: Luis Oliveira <[email protected]> Cc: Jarkko Nikula <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Mika Westerberg <[email protected]> Cc: Jan Dabros <[email protected]> Cc: Andi Shyti <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Jarkko Nikula <[email protected]> Tested-by: Jarkko Nikula <[email protected]> Signed-off-by: Andi Shyti <[email protected]>
2024-06-12i2c: at91: Fix the functionality flags of the slave-only interfaceJean Delvare1-2/+1
When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities. Fixes: 9d3ca54b550c ("i2c: at91: added slave mode support") Signed-off-by: Jean Delvare <[email protected]> Cc: Juergen Fitschen <[email protected]> Cc: Ludovic Desroches <[email protected]> Cc: Codrin Ciubotariu <[email protected]> Cc: Andi Shyti <[email protected]> Cc: Nicolas Ferre <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Claudiu Beznea <[email protected]> Signed-off-by: Andi Shyti <[email protected]>
2024-06-12cpufreq: intel_pstate: Check turbo_is_disabled() in store_no_turbo()Rafael J. Wysocki1-7/+12
After recent changes in intel_pstate, global.turbo_disabled is only set at the initialization time and never changed. However, it turns out that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE, the initial state of which is reflected by global.turbo_disabled, can be flipped later and there should be a way to take that into account (other than checking that MSR every time the driver runs which is costly and useless overhead on the vast majority of systems). For this purpose, notice that before the changes in question, store_no_turbo() contained a turbo_is_disabled() check that was used for updating global.turbo_disabled if the "turbo disabled" bit in MSR_IA32_MISC_ENABLE had been flipped and that functionality can be restored. Then, users will be able to reset global.turbo_disabled by writing 0 to no_turbo which used to work before on systems with flipping "turbo disabled" bit. This guarantees the driver state to remain in sync, but READ_ONCE() annotations need to be added in two places where global.turbo_disabled is accessed locklessly, so modify the driver to make that happen. Fixes: 0940f1a8011f ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization") Closes: https://lore.kernel.org/linux-pm/[email protected] Suggested-by: Srinivas Pandruvada <[email protected]> Reported-by: Xi Ruoyao <[email protected]> Tested-by: Xi Ruoyao <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-06-12wifi: mac80211: Recalc offload when monitor stopRemi Pommarel1-0/+1
When a monitor interface is started, ieee80211_recalc_offload() is called and 802.11 encapsulation offloading support get disabled so monitor interface could get native wifi frames directly. But when this interface is stopped there is no need to keep the 802.11 encpasulation offloading off. This call ieee80211_recalc_offload() when monitor interface is stopped so 802.11 encapsulation offloading gets re-activated if possible. Fixes: 6aea26ce5a4c ("mac80211: rework tx encapsulation offload API") Signed-off-by: Remi Pommarel <[email protected]> Link: https://msgid.link/840baab454f83718e6e16fd836ac597d924e85b9.1716048326.git.repk@triplefau.lt Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: iwlwifi: scan: correctly check if PSC listen period is neededAyala Beker1-1/+1
The flags variable is incorrectly checked while it is still cleared and has not been assigned any value yet. Fix it. Fixes: a615323f7f90 ("wifi: iwlwifi: mvm: always apply 6 GHz probe limitations") Signed-off-by: Ayala Beker <[email protected]> Reviewed-by: Benjamin Berg <[email protected]> Signed-off-by: Miri Korenblit <[email protected]> Link: https://msgid.link/20240605140556.291c33f9a283.Id651fe69828aebce177b49b2316c5780906f1b37@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: iwlwifi: mvm: fix ROC version checkShaul Triebitz1-1/+1
For using the ROC command, check that the ROC version is *greater or equal* to 3, rather than *equal* to 3. The ROC version was added to the TLV starting from version 3. Fixes: 67ac248e4db0 ("wifi: iwlwifi: mvm: implement ROC version 3") Signed-off-by: Shaul Triebitz <[email protected]> Signed-off-by: Miri Korenblit <[email protected]> Link: https://msgid.link/20240605140327.93d86cd188ad.Iceadef5a2f3cfa4a127e94a0405eba8342ec89c1@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: iwlwifi: mvm: unlock mvm mutexShaul Triebitz1-0/+2
Unlock the mvm mutex before returning from a function with the mutex locked. Fixes: a1efeb823084 ("wifi: iwlwifi: mvm: Block EMLSR when a p2p/softAP vif is active") Signed-off-by: Shaul Triebitz <[email protected]> Signed-off-by: Miri Korenblit <[email protected]> Link: https://msgid.link/20240605140327.96cb956db4af.Ib468cbad38959910977b5581f6111ab0afae9880@changeid Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: cfg80211: wext: add extra SIOCSIWSCAN data checkDmitry Antipov1-2/+6
In 'cfg80211_wext_siwscan()', add extra check whether number of channels passed via 'ioctl(sock, SIOCSIWSCAN, ...)' doesn't exceed IW_MAX_FREQUENCIES and reject invalid request with -EINVAL otherwise. Reported-by: [email protected] Closes: https://syzkaller.appspot.com/bug?extid=253cd2d2491df77c93ac Signed-off-by: Dmitry Antipov <[email protected]> Link: https://msgid.link/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2024-06-12wifi: cfg80211: wext: set ssids=NULL for passive scansJohannes Berg1-1/+3
In nl80211, we always set the ssids of a scan request to NULL when n_ssids==0 (passive scan). Drivers have relied on this behaviour in the past, so we fixed it in 6 GHz scan requests as well, and added a warning so we'd have assurance the API would always be called that way. syzbot found that wext doesn't ensure that, so we reach the check and trigger the warning. Fix the wext code to set the ssids pointer to NULL when there are none. Reported-by: [email protected] Fixes: f7a8b10bfd61 ("wifi: cfg80211: fix 6 GHz scan request building") Signed-off-by: Johannes Berg <[email protected]>
2024-06-12drm/mediatek: Call drm_atomic_helper_shutdown() at shutdown timeDouglas Anderson1-0/+8
Based on grepping through the source code this driver appears to be missing a call to drm_atomic_helper_shutdown() at system shutdown time. Among other things, this means that if a panel is in use that it won't be cleanly powered off at system shutdown time. The fact that we should call drm_atomic_helper_shutdown() in the case of OS shutdown/restart comes straight out of the kernel doc "driver instance overview" in drm_drv.c. This driver users the component model and shutdown happens in the base driver. The "drvdata" for this driver will always be valid if shutdown() is called and as of commit 2a073968289d ("drm/atomic-helper: drm_atomic_helper_shutdown(NULL) should be a noop") we don't need to confirm that "drm" is non-NULL. Suggested-by: Maxime Ripard <[email protected]> Reviewed-by: Maxime Ripard <[email protected]> Reviewed-by: Fei Shao <[email protected]> Tested-by: Fei Shao <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/20240611102744.v2.1.I2b014f90afc4729b6ecc7b5ddd1f6dedcea4625b@changeid
2024-06-12drm: renesas: shmobile: Call drm_atomic_helper_shutdown() at shutdown timeDouglas Anderson1-0/+8
Based on grepping through the source code, this driver appears to be missing a call to drm_atomic_helper_shutdown() at system shutdown time. This is important because drm_atomic_helper_shutdown() will cause panels to get disabled cleanly which may be important for their power sequencing. Future changes will remove any custom powering off in individual panel drivers so the DRM drivers need to start getting this right. The fact that we should call drm_atomic_helper_shutdown() in the case of OS shutdown comes straight out of the kernel doc "driver instance overview" in drm_drv.c. [geert: shmob_drm_remove() already calls drm_atomic_helper_shutdown] Suggested-by: Maxime Ripard <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Link: https://lore.kernel.org/r/20230901164111.RFT.15.Iaf638a1d4c8b3c307a6192efabb4cbb06b195f15@changeid [geert: s/drm_helper_force_disable_all/drm_atomic_helper_shutdown/] Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Reviewed-by: Sui Jingfeng <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/17c6a5a668e5975f871b77fb1fca6711a0799d9e.1718176895.git.geert+renesas@glider.be
2024-06-12xhci: Handle TD clearing for multiple streams caseHector Martin2-11/+44
When multiple streams are in use, multiple TDs might be in flight when an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for each, to ensure everything is reset properly and the caches cleared. Change the logic so that any N>1 TDs found active for different streams are deferred until after the first one is processed, calling xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to queue another command until we are done with all of them. Also change the error/"should never happen" paths to ensure we at least clear any affected TDs, even if we can't issue a command to clear the hardware cache, and complain loudly with an xhci_warn() if this ever happens. This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") early on in the XHCI driver's life, when stream support was first added. It was then identified but not fixed nor made into a warning in commit 674f8438c121 ("xhci: split handling halted endpoints into two steps"), which added a FIXME comment for the problem case (without materially changing the behavior as far as I can tell, though the new logic made the problem more obvious). Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs."), it was acknowledged again. [Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") was a targeted regression fix to the previously mentioned patch. Users reported issues with usb stuck after unmounting/disconnecting UAS devices. This rolled back the TD clearing of multiple streams to its original state.] Apparently the commit author was aware of the problem (yet still chose to submit it): It was still mentioned as a FIXME, an xhci_dbg() was added to log the problem condition, and the remaining issue was mentioned in the commit description. The choice of making the log type xhci_dbg() for what is, at this point, a completely unhandled and known broken condition is puzzling and unfortunate, as it guarantees that no actual users would see the log in production, thereby making it nigh undebuggable (indeed, even if you turn on DEBUG, the message doesn't really hint at there being a problem at all). It took me *months* of random xHC crashes to finally find a reliable repro and be able to do a deep dive debug session, which could all have been avoided had this unhandled, broken condition been actually reported with a warning, as it should have been as a bug intentionally left in unfixed (never mind that it shouldn't have been left in at all). > Another fix to solve clearing the caches of all stream rings with > cancelled TDs is needed, but not as urgent. 3 years after that statement and 14 years after the original bug was introduced, I think it's finally time to fix it. And maybe next time let's not leave bugs unfixed (that are actually worse than the original bug), and let's actually get people to review kernel commits please. Fixes xHC crashes and IOMMU faults with UAS devices when handling errors/faults. Easiest repro is to use `hdparm` to mark an early sector (e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop. At least in the case of JMicron controllers, the read errors end up having to cancel two TDs (for two queued requests to different streams) and the one that didn't get cleared properly ends up faulting the xHC entirely when it tries to access DMA pages that have since been unmapped, referred to by the stale TDs. This normally happens quickly (after two or three loops). After this fix, I left the `cat` in a loop running overnight and experienced no xHC failures, with all read errors recovered properly. Repro'd and tested on an Apple M1 Mac Mini (dwc3 host). On systems without an IOMMU, this bug would instead silently corrupt freed memory, making this a security bug (even on systems with IOMMUs this could silently corrupt memory belonging to other USB devices on the same controller, so it's still a security bug). Given that the kernel autoprobes partition tables, I'm pretty sure a malicious USB device pretending to be a UAS device and reporting an error with the right timing could deliberately trigger a UAF and write to freed memory, with no user action. [Mathias: Commit message and code comment edit, original at:] https://lore.kernel.org/linux-usb/[email protected]/ Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps") Cc: [email protected] Cc: [email protected] Reviewed-by: Neal Gompa <[email protected]> Signed-off-by: Hector Martin <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-12xhci: Apply broken streams quirk to Etron EJ188 xHCI hostKuangyi Chiang1-1/+3
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188 as well. Cc: [email protected] Signed-off-by: Kuangyi Chiang <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-12xhci: Apply reset resume quirk to Etron EJ188 xHCI hostKuangyi Chiang1-0/+5
As described in commit c877b3b2ad5c ("xhci: Add reset on resume quirk for asrock p67 host"), EJ188 have the same issue as EJ168, where completely dies on resume. So apply XHCI_RESET_ON_RESUME quirk to EJ188 as well. Cc: [email protected] Signed-off-by: Kuangyi Chiang <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-12xhci: Set correct transferred length for cancelled bulk transfersMathias Nyman1-3/+2
The transferred length is set incorrectly for cancelled bulk transfer TDs in case the bulk transfer ring stops on the last transfer block with a 'Stop - Length Invalid' completion code. length essentially ends up being set to the requested length: urb->actual_length = urb->transfer_buffer_length Length for 'Stop - Length Invalid' cases should be the sum of all TRB transfer block lengths up to the one the ring stopped on, _excluding_ the one stopped on. Fix this by always summing up TRB lengths for 'Stop - Length Invalid' bulk cases. This issue was discovered by Alan Stern while debugging https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not solve that bug. Issue is older than 4.10 kernel but fix won't apply to those due to major reworks in that area. Tested-by: Pierre Tomon <[email protected]> Cc: [email protected] # v4.10+ Cc: Alan Stern <[email protected]> Signed-off-by: Mathias Nyman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-06-11ksmbd: fix missing use of get_write in in smb2_set_ea()Namjae Jeon4-11/+19
Fix an issue where get_write is not used in smb2_set_ea(). Fixes: 6fc0a265e1b9 ("ksmbd: fix potential circular locking issue in smb2_set_ea()") Cc: [email protected] Reported-by: Wang Zhaolong <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-06-11ksmbd: move leading slash check to smb2_get_name()Namjae Jeon1-9/+6
If the directory name in the root of the share starts with character like 镜(0x955c) or Ṝ(0x1e5c), it (and anything inside) cannot be accessed. The leading slash check must be checked after converting unicode to nls string. Cc: [email protected] Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-06-11net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs ↵Xiaolei Wang1-14/+11
parameters The current cbs parameter depends on speed after uplinking, which is not needed and will report a configuration error if the port is not initially connected. The UAPI exposed by tc-cbs requires userspace to recalculate the send slope anyway, because the formula depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. Therefore, we use offload->sendslope and offload->idleslope to derive the original port_transmit_rate from the CBS formula. Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Signed-off-by: Xiaolei Wang <[email protected]> Reviewed-by: Wojciech Drewek <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-11gve: ignore nonrelevant GSO type bits when processing TSO headersJoshua Washington1-15/+5
TSO currently fails when the skb's gso_type field has more than one bit set. TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes virtualization, such as QEMU, a real use-case. The gso_type and gso_size fields as passed from userspace in virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type |= SKB_GSO_DODGY to force the packet to enter the software GSO stack for verification. This issue might similarly come up when the CWR bit is set in the TCP header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit to be set. Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Joshua Washington <[email protected]> Reviewed-by: Praveen Kaligineedi <[email protected]> Reviewed-by: Harshitha Ramamurthy <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Acked-by: Andrei Vagin <[email protected]> v2 - Remove unnecessary comments, remove line break between fixes tag and signoffs. v3 - Add back unrelated empty line removal. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-11Merge tag 'for-net-2024-06-10' of ↵Jakub Kicinski3-14/+36
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - hci_sync: fix not using correct handle - L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ - L2CAP: fix connection setup in l2cap_connect * tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth: Bluetooth: fix connection setup in l2cap_connect Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ Bluetooth: hci_sync: Fix not using correct handle ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-11net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPPKory Maincent1-2/+2
ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by checkpatch script. Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Andrew Lunn <[email protected]> Acked-by: Oleksij Rempel <[email protected]> Signed-off-by: Kory Maincent <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-06-11scsi: mpi3mr: Fix ATA NCQ priority supportDamien Le Moal6-28/+89
The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to the HBA if a read or write command directed at an ATA device should be translated to an NCQ read/write command with the high prioiryt bit set when the request uses the RT priority class and the user has enabled NCQ priority through sysfs. However, unlike the mpt3sas driver, the mpi3mr driver does not define the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never actually set and NCQ Priority cannot ever be used. Fix this by defining these missing atributes to allow a user to check if an ATA device supports NCQ priority and to enable/disable the use of NCQ priority. To do this, lift the function scsih_ncq_prio_supp() out of the mpt3sas driver and make it the generic SCSI SAS transport function sas_ata_ncq_prio_supported(). Nothing in that function is hardware specific, so this function can be used in both the mpt3sas driver and the mpi3mr driver. Reported-by: Scott McCoy <[email protected]> Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing") Cc: [email protected] Signed-off-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Niklas Cassel <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2024-06-11scsi: ufs: core: Quiesce request queues before checking pending cmdsZiqi Chen1-3/+3
In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked, ufshcd_pending_cmds() is called to check whether there are pending transactions or not. And only if there are no pending transactions can we proceed to kickstart the clock scaling sequence. ufshcd_pending_cmds() traverses over all SCSI devices and calls sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down to three steps: 1. Calculate the nr outstanding bits set in the 'word' bitmap. 2. Calculate the nr outstanding bits set in the 'cleared' bitmap. 3. Subtract the result from step 1 by the result from step 2. This can lead to a race condition as outlined below: Assume there is one pending transaction in the request queue of one SCSI device, say sda, and the budget token of this request is 0, the 'word' is 0x1 and the 'cleared' is 0x0. 1. When step 1 executes, it gets the result as 1. 2. Before step 2 executes, block layer tries to dispatch a new request to sda. Since the SCSI layer is blocked, the request cannot pass through SCSI but the block layer would do budget_get() and budget_put() to sda's budget map regardless, so the 'word' has become 0x3 and 'cleared' has become 0x2 (assume the new request got budget token 1). 3. When step 2 executes, it gets the result as 1. 4. When step 3 executes, it gets the result as 0, meaning there is no pending transactions, which is wrong. Thread A Thread B ufshcd_pending_cmds() __blk_mq_sched_dispatch_requests() | | sbitmap_weight(word) | | scsi_mq_get_budget() | | | scsi_mq_put_budget() | | sbitmap_weight(cleared) ... When this race condition happens, the clock scaling sequence is started with transactions still in flight, leading to subsequent hibernate enter failure, broken link, task abort and back to back error recovery. Fix this race condition by quiescing the request queues before calling ufshcd_pending_cmds() so that block layer won't touch the budget map when ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer blocking/unblocking to reduce redundancies and latencies. Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Co-developed-by: Can Guo <[email protected]> Signed-off-by: Can Guo <[email protected]> Signed-off-by: Ziqi Chen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2024-06-11scsi: core: Disable CDL by defaultDamien Le Moal1-0/+7
For SCSI devices supporting the Command Duration Limits feature set, the user can enable/disable this feature use through the sysfs device attribute "cdl_enable". This attribute modification triggers a call to scsi_cdl_enable() to enable and disable the feature for ATA devices and set the scsi device cdl_enable field to the user provided bool value. For SCSI devices supporting CDL, the feature set is always enabled and scsi_cdl_enable() is reduced to setting the cdl_enable field. However, for ATA devices, a drive may spin-up with the CDL feature enabled by default. But the SCSI device cdl_enable field is always initialized to false (CDL disabled), regardless of the actual device CDL feature state. For ATA devices managed by libata (or libsas), libata-core always disables the CDL feature set when the device is attached, thus syncing the state of the CDL feature on the device and of the SCSI device cdl_enable field. However, for ATA devices connected to a SAS HBA, the CDL feature is not disabled on scan for ATA devices that have this feature enabled by default, leading to an inconsistent state of the feature on the device with the SCSI device cdl_enable field. Avoid this inconsistency by adding a call to scsi_cdl_enable() in scsi_cdl_check() to make sure that the device-side state of the CDL feature set always matches the scsi device cdl_enable field state. This implies that CDL will always be disabled for ATA devices connected to SAS HBAs, which is consistent with libata/libsas initialization of the device. Reported-by: Scott McCoy <[email protected]> Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: [email protected] Signed-off-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Niklas Cassel <[email protected]> Reviewed-by: Igor Pylypiv <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2024-06-12parisc: Try to fix random segmentation faults in package buildsJohn David Anglin3-180/+275
PA-RISC systems with PA8800 and PA8900 processors have had problems with random segmentation faults for many years. Systems with earlier processors are much more stable. Systems with PA8800 and PA8900 processors have a large L2 cache which needs per page flushing for decent performance when a large range is flushed. The combined cache in these systems is also more sensitive to non-equivalent aliases than the caches in earlier systems. The majority of random segmentation faults that I have looked at appear to be memory corruption in memory allocated using mmap and malloc. My first attempt at fixing the random faults didn't work. On reviewing the cache code, I realized that there were two issues which the existing code didn't handle correctly. Both relate to cache move-in. Another issue is that the present bit in PTEs is racy. 1) PA-RISC caches have a mind of their own and they can speculatively load data and instructions for a page as long as there is a entry in the TLB for the page which allows move-in. TLBs are local to each CPU. Thus, the TLB entry for a page must be purged before flushing the page. This is particularly important on SMP systems. In some of the flush routines, the flush routine would be called and then the TLB entry would be purged. This was because the flush routine needed the TLB entry to do the flush. 2) My initial approach to trying the fix the random faults was to try and use flush_cache_page_if_present for all flush operations. This actually made things worse and led to a couple of hardware lockups. It finally dawned on me that some lines weren't being flushed because the pte check code was racy. This resulted in random inequivalent mappings to physical pages. The __flush_cache_page tmpalias flush sets up its own TLB entry and it doesn't need the existing TLB entry. As long as we can find the pte pointer for the vm page, we can get the pfn and physical address of the page. We can also purge the TLB entry for the page before doing the flush. Further, __flush_cache_page uses a special TLB entry that inhibits cache move-in. When switching page mappings, we need to ensure that lines are removed from the cache. It is not sufficient to just flush the lines to memory as they may come back. This made it clear that we needed to implement all the required flush operations using tmpalias routines. This includes flushes for user and kernel pages. After modifying the code to use tmpalias flushes, it became clear that the random segmentation faults were not fully resolved. The frequency of faults was worse on systems with a 64 MB L2 (PA8900) and systems with more CPUs (rp4440). The warning that I added to flush_cache_page_if_present to detect pages that couldn't be flushed triggered frequently on some systems. Helge and I looked at the pages that couldn't be flushed and found that the PTE was either cleared or for a swap page. Ignoring pages that were swapped out seemed okay but pages with cleared PTEs seemed problematic. I looked at routines related to pte_clear and noticed ptep_clear_flush. The default implementation just flushes the TLB entry. However, it was obvious that on parisc we need to flush the cache page as well. If we don't flush the cache page, stale lines will be left in the cache and cause random corruption. Once a PTE is cleared, there is no way to find the physical address associated with the PTE and flush the associated page at a later time. I implemented an updated change with a parisc specific version of ptep_clear_flush. It fixed the random data corruption on Helge's rp4440 and rp3440, as well as on my c8000. At this point, I realized that I could restore the code where we only flush in flush_cache_page_if_present if the page has been accessed. However, for this, we also need to flush the cache when the accessed bit is cleared in ptep_clear_flush_young to keep things synchronized. The default implementation only flushes the TLB entry. Other changes in this version are: 1) Implement parisc specific version of ptep_get. It's identical to default but needed in arch/parisc/include/asm/pgtable.h. 2) Revise parisc implementation of ptep_test_and_clear_young to use ptep_get (READ_ONCE). 3) Drop parisc implementation of ptep_get_and_clear. We can use default. 4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to use full data cache flush. 5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle VM_IOREMAP case in flush_cache_vmap. At this time, I don't know whether it is better to always flush when the PTE present bit is set or when both the accessed and present bits are set. The later saves flushing pages that haven't been accessed, but we need to flush in ptep_clear_flush_young. It also needs a page table lookup to find the PTE pointer. The lpa instruction only needs a page table lookup when the PTE entry isn't in the TLB. We don't atomically handle setting and clearing the _PAGE_ACCESSED bit. If we miss an update, we may miss a flush and the cache may get corrupted. Whether the current code is effectively atomic depends on process control. When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually be flushed when the PTE is cleared or in flush_cache_page_if_present. The _PAGE_ACCESSED bit is not used, so the problem is avoided. The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED define in cache.c. The default is 0. I didn't see a large difference in performance. Signed-off-by: John David Anglin <[email protected]> Cc: <[email protected]> # v6.6+ Signed-off-by: Helge Deller <[email protected]>
2024-06-12tracing: Build event generation tests only as modulesMasami Hiramatsu (Google)1-2/+2
The kprobes and synth event generation test modules add events and lock (get a reference) those event file reference in module init function, and unlock and delete it in module exit function. This is because those are designed for playing as modules. If we make those modules as built-in, those events are left locked in the kernel, and never be removed. This causes kprobe event self-test failure as below. [ 97.349708] ------------[ cut here ]------------ [ 97.353453] WARNING: CPU: 3 PID: 1 at kernel/trace/trace_kprobe.c:2133 kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.357106] Modules linked in: [ 97.358488] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.9.0-g699646734ab5-dirty #14 [ 97.361556] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 97.363880] RIP: 0010:kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.365538] Code: a8 24 08 82 e9 ae fd ff ff 90 0f 0b 90 48 c7 c7 e5 aa 0b 82 e9 ee fc ff ff 90 0f 0b 90 48 c7 c7 2d 61 06 82 e9 8e fd ff ff 90 <0f> 0b 90 48 c7 c7 33 0b 0c 82 89 c6 e8 6e 03 1f ff 41 ff c7 e9 90 [ 97.370429] RSP: 0000:ffffc90000013b50 EFLAGS: 00010286 [ 97.371852] RAX: 00000000fffffff0 RBX: ffff888005919c00 RCX: 0000000000000000 [ 97.373829] RDX: ffff888003f40000 RSI: ffffffff8236a598 RDI: ffff888003f40a68 [ 97.375715] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 [ 97.377675] R10: ffffffff811c9ae5 R11: ffffffff8120c4e0 R12: 0000000000000000 [ 97.379591] R13: 0000000000000001 R14: 0000000000000015 R15: 0000000000000000 [ 97.381536] FS: 0000000000000000(0000) GS:ffff88807dcc0000(0000) knlGS:0000000000000000 [ 97.383813] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 97.385449] CR2: 0000000000000000 CR3: 0000000002244000 CR4: 00000000000006b0 [ 97.387347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 97.389277] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 97.391196] Call Trace: [ 97.391967] <TASK> [ 97.392647] ? __warn+0xcc/0x180 [ 97.393640] ? kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.395181] ? report_bug+0xbd/0x150 [ 97.396234] ? handle_bug+0x3e/0x60 [ 97.397311] ? exc_invalid_op+0x1a/0x50 [ 97.398434] ? asm_exc_invalid_op+0x1a/0x20 [ 97.399652] ? trace_kprobe_is_busy+0x20/0x20 [ 97.400904] ? tracing_reset_all_online_cpus+0x15/0x90 [ 97.402304] ? kprobe_trace_self_tests_init+0x3f1/0x480 [ 97.403773] ? init_kprobe_trace+0x50/0x50 [ 97.404972] do_one_initcall+0x112/0x240 [ 97.406113] do_initcall_level+0x95/0xb0 [ 97.407286] ? kernel_init+0x1a/0x1a0 [ 97.408401] do_initcalls+0x3f/0x70 [ 97.409452] kernel_init_freeable+0x16f/0x1e0 [ 97.410662] ? rest_init+0x1f0/0x1f0 [ 97.411738] kernel_init+0x1a/0x1a0 [ 97.412788] ret_from_fork+0x39/0x50 [ 97.413817] ? rest_init+0x1f0/0x1f0 [ 97.414844] ret_from_fork_asm+0x11/0x20 [ 97.416285] </TASK> [ 97.417134] irq event stamp: 13437323 [ 97.418376] hardirqs last enabled at (13437337): [<ffffffff8110bc0c>] console_unlock+0x11c/0x150 [ 97.421285] hardirqs last disabled at (13437370): [<ffffffff8110bbf1>] console_unlock+0x101/0x150 [ 97.423838] softirqs last enabled at (13437366): [<ffffffff8108e17f>] handle_softirqs+0x23f/0x2a0 [ 97.426450] softirqs last disabled at (13437393): [<ffffffff8108e346>] __irq_exit_rcu+0x66/0xd0 [ 97.428850] ---[ end trace 0000000000000000 ]--- And also, since we can not cleanup dynamic_event file, ftracetest are failed too. To avoid these issues, build these tests only as modules. Link: https://lore.kernel.org/all/171811263754.85078.5877446624311852525.stgit@devnote2/ Fixes: 9fe41efaca08 ("tracing: Add synth event generation test module") Fixes: 64836248dda2 ("tracing: Add kprobe event command generation test module") Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Steven Rostedt (Google) <[email protected]>
2024-06-11x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checkingKees Cook2-3/+7
When reworking the range checking for get_user(), the get_user_8() case on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8 was accidentally dropped.) Restore the correct error handling destination (and rename the jump to using the expected ".L" prefix). While here, switch to using a named argument ("size") for the call template ("%c4" to "%c[size]") as already used in the other call templates in this file. Found after moving the usercopy selftests to KUnit: # usercopy_test_invalid: EXPECTATION FAILED at lib/usercopy_kunit.c:278 Expected val_u64 == 0, but val_u64 == -60129542144 (0xfffffff200000000) Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKVmGQ@mail.gmail.com Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Reported-by: David Gow <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Qiuxu Zhuo <[email protected]> Tested-by: David Gow <[email protected]> Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org
2024-06-11bcachefs: Fix rcu_read_lock() leak in drop_extra_replicasKent Overstreet1-2/+1
Signed-off-by: Kent Overstreet <[email protected]>
2024-06-11drm/nouveau: remove unused struct 'init_exec'Dr. David Alan Gilbert1-5/+0
'init_exec' is unused since commit cb75d97e9c77 ("drm/nouveau: implement devinit subdev, and new init table parser") Remove it. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Acked-by: Danilo Krummrich <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-06-11selftests/fchmodat2: fix clang build failure due to -static-libasanJohn Hubbard1-1/+10
gcc requires -static-libasan in order to ensure that Address Sanitizer's library is the first one loaded. However, this leads to build failures on clang, when building via: make LLVM=1 -C tools/testing/selftests However, clang already does the right thing by default: it statically links the Address Sanitizer if -fsanitize is specified. Therefore, simply omit -static-libasan for clang builds. And leave behind a comment, because the whole reason for static linking might not be obvious. Cc: Ryan Roberts <[email protected]> Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2024-06-11selftests/openat2: fix clang build failures: -static-libasan, LOCAL_HDRSJohn Hubbard1-2/+12
When building with clang via: make LLVM=1 -C tools/testing/selftests two distinct failures occur: 1) gcc requires -static-libasan in order to ensure that Address Sanitizer's library is the first one loaded. However, this leads to build failures on clang, when building via: make LLVM=1 -C tools/testing/selftests However, clang already does the right thing by default: it statically links the Address Sanitizer if -fsanitize is specified. Therefore, fix this by simply omitting -static-libasan for clang builds. And leave behind a comment, because the whole reason for static linking might not be obvious. 2) clang won't accept invocations of this form, but gcc will: $(CC) file1.c header2.h Fix this by using selftests/lib.mk facilities for tracking local header file dependencies: add them to LOCAL_HDRS, leaving only the .c files to be passed to the compiler. Reviewed-by: Ryan Roberts <[email protected]> Signed-off-by: John Hubbard <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2024-06-11Merge tag 'vfs-6.10-rc4.fixes' of ↵Linus Torvalds9-93/+215
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "Misc: - Restore debugfs behavior of ignoring unknown mount options - Fix kernel doc for netfs_wait_for_oustanding_io() - Fix struct statx comment after new addition for this cycle - Fix a check in find_next_fd() iomap: - Fix data zeroing behavior when an extent spans the block that contains i_size - Restore i_size increasing in iomap_write_end() for now to avoid stale data exposure on xfs with a realtime device Cachefiles: - Remove unneeded fdtable.h include - Improve trace output for cachefiles_obj_{get,put}_ondemand_fd() - Remove requests from the request list to prevent accessing already freed requests - Fix UAF when issuing restore command while the daemon is still alive by adding an additional reference count to requests - Fix UAF by grabbing a reference during xarray lookup with xa_lock() held - Simplify error handling in cachefiles_ondemand_daemon_read() - Add consistency checks read and open requests to avoid crashes - Add a spinlock to protect ondemand_id variable which is used to determine whether an anonymous cachefiles fd has already been closed - Make on-demand reads killable allowing to handle broken cachefiles daemon better - Flush all requests after the kernel has been marked dead via CACHEFILES_DEAD to avoid hung-tasks - Ensure that closed requests are marked as such to avoid reusing them with a reopen request - Defer fd_install() until after copy_to_user() succeeded and thereby get rid of having to use close_fd() - Ensure that anonymous cachefiles on-demand fds are reused while they are valid to avoid pinning already freed cookies" * tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: Fix iomap_adjust_read_range for plen calculation iomap: keep on increasing i_size in iomap_write_end() cachefiles: remove unneeded include of <linux/fdtable.h> fs/file: fix the check in find_next_fd() cachefiles: make on-demand read killable cachefiles: flush all requests after setting CACHEFILES_DEAD cachefiles: Set object to close if ondemand_id < 0 in copen cachefiles: defer exposing anon_fd until after copy_to_user() succeeds cachefiles: never get a new anonymous fd if ondemand_id is valid cachefiles: add spin_lock for cachefiles_ondemand_info cachefiles: add consistency check for copen/cread cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read() cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() cachefiles: remove requests from xarray during flushing requests cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd statx: Update offset commentary for struct statx netfs: fix kernel doc for nets_wait_for_outstanding_io() debugfs: continue to ignore unknown mount options
2024-06-11thermal: gov_step_wise: Restore passive polling managementRafael J. Wysocki1-0/+17
Consider a thermal zone with one passive trip point, a cooling device with 3 states (0, 1, 2) bound to it, passive polling enabled (nonzero passive_delay_jiffies) and no regular polling (polling_delay_jiffies equal to 0) that is managed by the Step-Wise governor. Suppose that the initial state of the cooling device is 0 and the zone temperature is below the trip point to start with. When the trip point is crossed, tz->passive is incremented by the thermal core and the governor's .manage() callback is invoked. It sets 'throttle' to 'true' for the trip in question and get_target_state() returns 1 for the instance corresponding to the cooling device (say that 'upper' and 'lower' are set to 2 and 0 for it, respectively), so its state changes to 1. Passive polling is still active for the zone, so next time the temperature is updated, the governor's .manage() callback will be invoked again. If the temperature is still rising, it will change the state of the cooling device to 2. Now suppose that next time the zone temperature is updated, it falls below the trip point, so tz->passive is decremented for the zone (say it becomes 0 then) and the governor's .manage() callbacks runs. It finds that the temperature trend for the zone is 'falling' and 'throttle' will be set to 'false' for the trip in question, so the cooling device's state will be changed to 1. However, because tz->polling is 0 for the zone, the governor's .manage() callback may not be invoked again for a long time and the cooling device's state will not be reset back to 0. This can happen because commit 042a3d80f118 ("thermal: core: Move passive polling management to the core") removed passive polling management from the Step-Wise governor. Before that change, thermal_zone_trip_update() would bump up tz->passive when changing the target state for a thermal instance from "no target" to a specific value and it would drop tz->passive when changing it back to "no target" which would cause passive polling to be active for the zone until the governor has reset the states of all cooling devices. In particular, in the example above tz->passive would be incremented when changing the state of the cooling device from 0 to 1 and then it would be still nonzero when the state of the cooling device was changed from 2 to 1. To prevent this problem from occurring, restore the passive polling management in the Step-Wise governor by partially reverting the commit in question and update the comment in the restored code to explain its role more clearly. Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core") Closes: https://lore.kernel.org/linux-pm/[email protected] Reported-by: Johan Hovold <[email protected]> Tested-by: Johan Hovold <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-06-11netfilter: Use flowlabel flow key when re-routing mangled packetsFlorian Westphal1-0/+1
'ip6 dscp set $v' in an nftables outpute route chain has no effect. While nftables does detect the dscp change and calls the reroute hook. But ip6_route_me_harder never sets the dscp/flowlabel: flowlabel/dsfield routing rules are ignored and no reroute takes place. Thanks to Yi Chen for an excellent reproducer script that I used to validate this change. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Yi Chen <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-11netfilter: ipset: Fix race between namespace cleanup and gc in the list:set typeJozsef Kadlecsik2-51/+60
Lion Ackermann reported that there is a race condition between namespace cleanup in ipset and the garbage collection of the list:set type. The namespace cleanup can destroy the list:set type of sets while the gc of the set type is waiting to run in rcu cleanup. The latter uses data from the destroyed set which thus leads use after free. The patch contains the following parts: - When destroying all sets, first remove the garbage collectors, then wait if needed and then destroy the sets. - Fix the badly ordered "wait then remove gc" for the destroy a single set case. - Fix the missing rcu locking in the list:set type in the userspace test case. - Use proper RCU list handlings in the list:set type. The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc). Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation) Reported-by: Lion Ackermann <[email protected]> Tested-by: Lion Ackermann <[email protected]> Signed-off-by: Jozsef Kadlecsik <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-11netfilter: nft_inner: validate mandatory meta and payloadDavide Ornaghi2-0/+7
Check for mandatory netlink attributes in payload and meta expression when used embedded from the inner expression, otherwise NULL pointer dereference is possible from userspace. Fixes: a150d122b6bd ("netfilter: nft_meta: add inner match support") Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Davide Ornaghi <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-06-11selftests: seccomp: fix format-zero-length warningsAmer Al Shanawany1-3/+3
fix the following errors by using string format specifier and an empty parameter: seccomp_benchmark.c:197:24: warning: zero-length gnu_printf format string [-Wformat-zero-length] 197 | ksft_print_msg(""); | ^~ seccomp_benchmark.c:202:24: warning: zero-length gnu_printf format string [-Wformat-zero-length] 202 | ksft_print_msg(""); | ^~ seccomp_benchmark.c:204:24: warning: zero-length gnu_printf format string [-Wformat-zero-length] 204 | ksft_print_msg(""); | ^~ Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Suggested-by: Kees Cook <[email protected]> Signed-off-by: Amer Al Shanawany <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2024-06-11selftests: filesystems: fix warn_unused_result build warningsAmer Al Shanawany1-2/+10
Fix the following warnings by adding return check and error messages. statmount_test.c: In function ‘cleanup_namespace’: statmount_test.c:128:9: warning: ignoring return value of ‘fchdir’ declared with attribute ‘warn_unused_result’ [-Wunused-result] 128 | fchdir(orig_root); | ^~~~~~~~~~~~~~~~~ statmount_test.c:129:9: warning: ignoring return value of ‘chroot’ declared with attribute ‘warn_unused_result’ [-Wunused-result] 129 | chroot("."); | ^~~~~~~~~~~ Signed-off-by: Amer Al Shanawany <[email protected]> Reviewed-by: Muhammad Usama Anjum <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2024-06-11s390/mm: Restore mapping of kernel image using large pagesAlexander Gordeev3-4/+26
Since physical and virtual kernel address spaces are uncoupled the kernel image is not mapped using large segment pages anymore, which is a regression. Put the kernel image at the same large segment page offset in physical memory as in virtual memory. Such approach preserves the existing number of bits of entropy used for randomization of the kernel location in virtual memory when KASLR is on. As result, the kernel is mapped using large segment pages. Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reported-by: Heiko Carstens <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2024-06-11s390/mm: Allow large pages only for aligned physical addressesAlexander Gordeev1-2/+8
Do not allow creation of large pages against physical addresses, which itself are not aligned on the correct boundary. Failure to do so might lead to referencing wrong memory as result of the way DAT works. Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces") Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Alexander Gordeev <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2024-06-11s390: Update defconfigsHeiko Carstens3-19/+69
Signed-off-by: Heiko Carstens <[email protected]> Acked-by: Vasily Gorbik <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2024-06-11mips: bmips: BCM6358: make sure CBR is correctly setChristian Marangi1-1/+2
It was discovered that some device have CBR address set to 0 causing kernel panic when arch_sync_dma_for_cpu_all is called. This was notice in situation where the system is booted from TP1 and BMIPS_GET_CBR() returns 0 instead of a valid address and !!(read_c0_brcm_cmt_local() & (1 << 31)); not failing. The current check whether RAC flush should be disabled or not are not enough hence lets check if CBR is a valid address or not. Fixes: ab327f8acdf8 ("mips: bmips: BCM6358: disable RAC flush for TP1") Signed-off-by: Christian Marangi <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>