Age | Commit message (Collapse) | Author | Files | Lines |
|
Friedrich Weber reported a kernel crash problem and bisected to commit
81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine").
The root cause is that we use "list_move_tail(&rq->queuelist, pending)"
in the PREFLUSH/POSTFLUSH sequences, but rq->queuelist is left
uninitialized when rq is popped out of plug->cached_rq in
__blk_mq_alloc_requests_batch(): only this first popped request skips
the queuelist initialization, while the queuelist of all later popped
requests is initialized.
Fix it by using "list_add_tail(&rq->queuelist, pending)" instead, so
rq->queuelist doesn't need to be initialized beforehand. This is safe
since rq can't be on any list at PREFLUSH or POSTFLUSH time, so there
is no actual move to perform.
Please note the commit 81ada09cc25e ("blk-flush: reuse rq queuelist in
flush state machine") also has another requirement that no drivers would
touch rq->queuelist after blk_mq_end_request() since we will reuse it to
add rq to the post-flush pending list in POSTFLUSH. If this is not true,
we will have to revert that commit IMHO.
This updated version adds "list_del_init(&rq->queuelist)" in the flush
rq callback, since the dm layer may submit a request with an odd,
invalid flush sequence (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH), which
would otherwise cause a double list_add. That invalid flush sequence
should itself be fixed in the dm layer.
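As a rough illustration of why the switch is safe, here is a minimal userspace sketch of the list_head semantics involved (mirroring include/linux/list.h, not the block layer code): list_move_tail() must first unlink the entry and therefore dereferences the entry's own next/prev pointers, while list_add_tail() never looks at them.
```c
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	/* Only touches head and head->prev; new's stale pointers are ignored. */
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

static void list_move_tail(struct list_head *entry, struct list_head *head)
{
	/* Unlinking dereferences entry->next/prev, so entry must already be
	 * on a properly initialized list. */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	list_add_tail(entry, head);
}

int main(void)
{
	struct list_head pending;
	struct list_head rq_queuelist = { NULL, NULL };	/* "uninitialized", as in the bug */

	INIT_LIST_HEAD(&pending);
	/* list_move_tail(&rq_queuelist, &pending) would dereference the
	 * NULL/stale pointers right here; list_add_tail() is safe regardless. */
	list_add_tail(&rq_queuelist, &pending);
	/* Once the entry is on a list, moving it around is fine again. */
	list_move_tail(&rq_queuelist, &pending);
	printf("pending holds the request: %d\n", pending.next == &rq_queuelist);
	return 0;
}
```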
Reported-by: Friedrich Weber <[email protected]>
Closes: https://lore.kernel.org/lkml/[email protected]/
Closes: https://lore.kernel.org/lkml/[email protected]/
Fixes: 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine")
Cc: Christoph Hellwig <[email protected]>
Cc: [email protected]
Cc: [email protected]
Tested-by: Friedrich Weber <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
For zoned block devices using zone write plugging, an rcu_barrier() call
is needed in disk_free_zone_resources() to synchronize freeing of zone
write plugs and the destruction of the mempool used to allocate the
plugs. The barrier call slows down the teardown of zoned block
devices a little but should not affect teardown of regular block
devices or zoned
block devices that do not use zone write plugging (e.g. zoned DM devices
that do not require zone append emulation).
Modify disk_free_zone_resources() to return early if we do not have a
mempool to start with, that is, if the device does not use zone write
plugging. This avoids the costly rcu_barrier() and speeds up disk
teardown.
Reported-by: Mikulas Patocka <[email protected]>
Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
Signed-off-by: Damien Le Moal <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Mikulas Patocka <[email protected]>
Reviewed-by: Niklas Cassel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Clang static checker (scan-build) warning:
block/sed-opal.c:line 317, column 3
Value stored to 'ret' is never read.
Fix this problem by returning the error code when keyring_search() fails.
Otherwise, 'key' will have a wrong value when 'kerf' stores the error code.
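For illustration only, here is a userspace analogue of the ERR_PTR convention that keyring_search() follows; the names are made up and this is not the sed-opal code. The point is simply to return the error immediately instead of parking it in 'ret' and carrying on with an error-encoded pointer.
```c
#include <stdio.h>
#include <errno.h>

#define MAX_ERRNO 4095
static void *ERR_PTR(long err)      { return (void *)err; }
static long  PTR_ERR(const void *p) { return (long)p; }
static int   IS_ERR(const void *p)  { return (unsigned long)p >= (unsigned long)-MAX_ERRNO; }

/* Stand-in for a lookup that may fail, encoding -ENOKEY into the pointer. */
static void *lookup_key(int fail) { return fail ? ERR_PTR(-ENOKEY) : (void *)"opal-key"; }

static int read_key(int fail)
{
	void *kref = lookup_key(fail);

	if (IS_ERR(kref))
		return (int)PTR_ERR(kref);	/* bail out with the error code */

	/* Only reached with a valid pointer. */
	printf("got %s\n", (const char *)kref);
	return 0;
}

int main(void)
{
	printf("ok path -> %d\n", read_key(0));
	printf("error path -> %d\n", read_key(1));
	return 0;
}
```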
Fixes: 3bfeb6125664 ("block: sed-opal: keyring support for SED keys")
Signed-off-by: Su Hui <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
There are a couple of outdated addresses that are still visible
in the Git history; add them to .mailmap.
While at it, replace one in the comment.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After recent changes in intel_pstate, global.turbo_disabled is only set
at the initialization time and never changed. However, it turns out
that on some systems the "turbo disabled" bit in MSR_IA32_MISC_ENABLE,
the initial state of which is reflected by global.turbo_disabled, can be
flipped later, and there should be a way to take that into account (other
than checking that MSR every time the driver runs, which would be costly
and useless overhead on the vast majority of systems).
For this purpose, notice that before the changes in question,
store_no_turbo() contained a turbo_is_disabled() check that was used
for updating global.turbo_disabled if the "turbo disabled" bit in
MSR_IA32_MISC_ENABLE had been flipped and that functionality can be
restored. Then, users will be able to reset global.turbo_disabled
by writing 0 to no_turbo, which used to work on systems where the
"turbo disabled" bit gets flipped.
This keeps the driver state in sync, but READ_ONCE()
annotations need to be added in two places where global.turbo_disabled
is accessed locklessly, so modify the driver to make that happen.
Fixes: 0940f1a8011f ("cpufreq: intel_pstate: Do not update global.turbo_disabled after initialization")
Closes: https://lore.kernel.org/linux-pm/[email protected]
Suggested-by: Srinivas Pandruvada <[email protected]>
Reported-by: Xi Ruoyao <[email protected]>
Tested-by: Xi Ruoyao <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Based on grepping through the source code this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown
time. Among other things, this means that if a panel is in use, it
won't be cleanly powered off at system shutdown time.
The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart comes straight out of the kernel doc "driver
instance overview" in drm_drv.c.
This driver uses the component model and shutdown happens in the base
driver. The "drvdata" for this driver will always be valid if
shutdown() is called and as of commit 2a073968289d
("drm/atomic-helper: drm_atomic_helper_shutdown(NULL) should be a
noop") we don't need to confirm that "drm" is non-NULL.
Suggested-by: Maxime Ripard <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Reviewed-by: Fei Shao <[email protected]>
Tested-by: Fei Shao <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20240611102744.v2.1.I2b014f90afc4729b6ecc7b5ddd1f6dedcea4625b@changeid
|
|
Based on grepping through the source code, this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown time.
This is important because drm_atomic_helper_shutdown() will cause
panels to get disabled cleanly which may be important for their power
sequencing. Future changes will remove any custom powering off in
individual panel drivers so the DRM drivers need to start getting this
right.
The fact that we should call drm_atomic_helper_shutdown() in the case of
OS shutdown comes straight out of the kernel doc "driver instance
overview" in drm_drv.c.
[geert: shmob_drm_remove() already calls drm_atomic_helper_shutdown]
Suggested-by: Maxime Ripard <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
Link: https://lore.kernel.org/r/20230901164111.RFT.15.Iaf638a1d4c8b3c307a6192efabb4cbb06b195f15@changeid
[geert: s/drm_helper_force_disable_all/drm_atomic_helper_shutdown/]
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Laurent Pinchart <[email protected]>
Reviewed-by: Sui Jingfeng <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/17c6a5a668e5975f871b77fb1fca6711a0799d9e.1718176895.git.geert+renesas@glider.be
|
|
The current cbs parameter depends on speed after uplinking,
which is not needed and will report a configuration error
if the port is not initially connected. The UAPI exposed by
tc-cbs requires userspace to recalculate the send slope anyway,
because the formula depends on port_transmit_rate (see man tc-cbs),
which is not an invariant from tc's perspective. Therefore, we
use offload->sendslope and offload->idleslope to derive the
original port_transmit_rate from the CBS formula.
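As a sketch of the arithmetic (values made up; units as in man tc-cbs, where sendslope = idleslope - port_transmit_rate, all in kbit/s), the original rate falls out of the two offloaded slopes:
```c
#include <stdio.h>

/* Recover port_transmit_rate from the tc-cbs slopes, per the formula in
 * man tc-cbs: sendslope = idleslope - port_transmit_rate (kbit/s). */
static long long port_transmit_rate_kbps(long long idleslope, long long sendslope)
{
	return idleslope - sendslope;
}

int main(void)
{
	/* e.g. idleslope 20000 kbit/s and sendslope -980000 kbit/s on a 1G port */
	printf("%lld kbit/s\n", port_transmit_rate_kbps(20000, -980000));
	return 0;
}
```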
Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC")
Signed-off-by: Xiaolei Wang <[email protected]>
Reviewed-by: Wojciech Drewek <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
TSO currently fails when the skb's gso_type field has more than one bit
set.
TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a
few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes
virtualization, such as QEMU, a real use-case.
The gso_type and gso_size fields as passed from userspace in
virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type
|= SKB_GSO_DODGY to force the packet to enter the software GSO stack
for verification.
This issue might similarly come up when the CWR bit is set in the TCP
header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit
to be set.
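A toy illustration of the failure mode (flag values made up, not the gve code): matching gso_type by exact equality stops working as soon as extra bits such as SKB_GSO_DODGY or SKB_GSO_TCP_ECN are set, while testing the TCP bits with a bitwise AND still does.
```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative values only; the real SKB_GSO_* constants live in the kernel. */
#define GSO_TCPV4   (1u << 0)
#define GSO_DODGY   (1u << 2)
#define GSO_TCP_ECN (1u << 3)

static bool is_tso_equality(unsigned int gso_type) { return gso_type == GSO_TCPV4; }
static bool is_tso_bitmask(unsigned int gso_type)  { return (gso_type & GSO_TCPV4) != 0; }

int main(void)
{
	/* e.g. a TSO packet injected via virtio_net_hdr with the CWR bit set */
	unsigned int gso_type = GSO_TCPV4 | GSO_DODGY | GSO_TCP_ECN;

	printf("equality check: %d, bitmask check: %d\n",
	       is_tso_equality(gso_type), is_tso_bitmask(gso_type));
	return 0;
}
```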
Fixes: a57e5de476be ("gve: DQO: Add TX path")
Signed-off-by: Joshua Washington <[email protected]>
Reviewed-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Harshitha Ramamurthy <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Acked-by: Andrei Vagin <[email protected]>
v2 - Remove unnecessary comments, remove line break between fixes tag
and signoffs.
v3 - Add back unrelated empty line removal.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_sync: fix not using correct handle
- L2CAP: fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
- L2CAP: fix connection setup in l2cap_connect
* tag 'for-net-2024-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: fix connection setup in l2cap_connect
Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
Bluetooth: hci_sync: Fix not using correct handle
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
ENOTSUPP is not a SUSV4 error code; prefer EOPNOTSUPP, as reported by
the checkpatch script.
Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment")
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Kory Maincent <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to
the HBA if a read or write command directed at an ATA device should be
translated to an NCQ read/write command with the high priority bit set
when the request uses the RT priority class and the user has enabled NCQ
priority through sysfs.
However, unlike the mpt3sas driver, the mpi3mr driver does not define
the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so
the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never
actually set and NCQ Priority cannot ever be used.
Fix this by defining these missing attributes to allow a user to check if
an ATA device supports NCQ priority and to enable/disable the use of NCQ
priority. To do this, lift the function scsih_ncq_prio_supp() out of the
mpt3sas driver and make it the generic SCSI SAS transport function
sas_ata_ncq_prio_supported(). Nothing in that function is hardware
specific, so this function can be used in both the mpt3sas driver and
the mpi3mr driver.
Reported-by: Scott McCoy <[email protected]>
Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing")
Cc: [email protected]
Signed-off-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Niklas Cassel <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
In ufshcd_clock_scaling_prepare(), after the SCSI layer is blocked,
ufshcd_pending_cmds() is called to check whether there are pending
transactions or not. And only if there are no pending transactions can we
proceed to kickstart the clock scaling sequence.
ufshcd_pending_cmds() traverses over all SCSI devices and calls
sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down
to three steps:
1. Calculate the number of bits set in the 'word' bitmap.
2. Calculate the number of bits set in the 'cleared' bitmap.
3. Subtract the result of step 2 from the result of step 1.
This can lead to a race condition as outlined below:
Assume there is one pending transaction in the request queue of one SCSI
device, say sda, and the budget token of this request is 0, the 'word' is
0x1 and the 'cleared' is 0x0.
1. When step 1 executes, it gets the result as 1.
2. Before step 2 executes, the block layer tries to dispatch a new request to
sda. Since the SCSI layer is blocked, the request cannot pass through
SCSI but the block layer would do budget_get() and budget_put() to
sda's budget map regardless, so the 'word' has become 0x3 and 'cleared'
has become 0x2 (assume the new request got budget token 1).
3. When step 2 executes, it gets the result as 1.
4. When step 3 executes, it gets the result as 0, meaning there are no
pending transactions, which is wrong.
Thread A                       Thread B
ufshcd_pending_cmds()          __blk_mq_sched_dispatch_requests()
|                              |
sbitmap_weight(word)           |
|                              scsi_mq_get_budget()
|                              |
|                              scsi_mq_put_budget()
|                              |
sbitmap_weight(cleared)
...
When this race condition happens, the clock scaling sequence is started
with transactions still in flight, leading to subsequent hibernate enter
failure, broken link, task abort and back to back error recovery.
Fix this race condition by quiescing the request queues before calling
ufshcd_pending_cmds() so that the block layer won't touch the budget map when
ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer
blocking/unblocking to reduce redundancies and latencies.
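The arithmetic of the race can be reproduced in a few lines; this is a standalone sketch of the interleaving above, not sbitmap code. Sampling 'word' and 'cleared' at different points in time makes the genuinely pending request disappear from the count.
```c
#include <stdio.h>

int main(void)
{
	unsigned long word = 0x1, cleared = 0x0;	/* one request in flight, token 0 */

	int step1 = __builtin_popcountl(word);		/* thread A reads 1 */

	/* Concurrent budget_get()/budget_put() for a new request (token 1)
	 * happening between step 1 and step 2: */
	word    |= 0x2;
	cleared |= 0x2;

	int step2 = __builtin_popcountl(cleared);	/* thread A now reads 1 */

	printf("pending = %d (should be 1)\n", step1 - step2);
	return 0;
}
```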
Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code")
Co-developed-by: Can Guo <[email protected]>
Signed-off-by: Can Guo <[email protected]>
Signed-off-by: Ziqi Chen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Bart Van Assche <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
For SCSI devices supporting the Command Duration Limits feature set, the
user can enable/disable the use of this feature through the sysfs device
attribute "cdl_enable". This attribute modification triggers a call to
scsi_cdl_enable() to enable and disable the feature for ATA devices and set
the scsi device cdl_enable field to the user provided bool value. For SCSI
devices supporting CDL, the feature set is always enabled and
scsi_cdl_enable() is reduced to setting the cdl_enable field.
However, for ATA devices, a drive may spin-up with the CDL feature enabled
by default. But the SCSI device cdl_enable field is always initialized to
false (CDL disabled), regardless of the actual device CDL feature
state. For ATA devices managed by libata (or libsas), libata-core always
disables the CDL feature set when the device is attached, thus syncing the
state of the CDL feature on the device and of the SCSI device cdl_enable
field. However, for ATA devices connected to a SAS HBA, the CDL feature is
not disabled on scan for ATA devices that have this feature enabled by
default, leading to an inconsistent state of the feature on the device with
the SCSI device cdl_enable field.
Avoid this inconsistency by adding a call to scsi_cdl_enable() in
scsi_cdl_check() to make sure that the device-side state of the CDL feature
set always matches the scsi device cdl_enable field state. This implies
that CDL will always be disabled for ATA devices connected to SAS HBAs,
which is consistent with libata/libsas initialization of the device.
Reported-by: Scott McCoy <[email protected]>
Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits")
Cc: [email protected]
Signed-off-by: Damien Le Moal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Niklas Cassel <[email protected]>
Reviewed-by: Igor Pylypiv <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
'init_exec' is unused since
commit cb75d97e9c77 ("drm/nouveau: implement devinit subdev, and new
init table parser")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <[email protected]>
Acked-by: Danilo Krummrich <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"Misc:
- Restore debugfs behavior of ignoring unknown mount options
- Fix kernel doc for netfs_wait_for_outstanding_io()
- Fix struct statx comment after new addition for this cycle
- Fix a check in find_next_fd()
iomap:
- Fix data zeroing behavior when an extent spans the block that
contains i_size
- Restore i_size increasing in iomap_write_end() for now to avoid
stale data exposure on xfs with a realtime device
Cachefiles:
- Remove unneeded fdtable.h include
- Improve trace output for cachefiles_obj_{get,put}_ondemand_fd()
- Remove requests from the request list to prevent accessing already
freed requests
- Fix UAF when issuing restore command while the daemon is still
alive by adding an additional reference count to requests
- Fix UAF by grabbing a reference during xarray lookup with xa_lock()
held
- Simplify error handling in cachefiles_ondemand_daemon_read()
- Add consistency checks for read and open requests to avoid crashes
- Add a spinlock to protect ondemand_id variable which is used to
determine whether an anonymous cachefiles fd has already been
closed
- Make on-demand reads killable allowing to handle broken cachefiles
daemon better
- Flush all requests after the cache has been marked dead via
CACHEFILES_DEAD to avoid hung tasks
- Ensure that closed requests are marked as such to avoid reusing
them with a reopen request
- Defer fd_install() until after copy_to_user() succeeded and thereby
get rid of having to use close_fd()
- Ensure that anonymous cachefiles on-demand fds are reused while
they are valid to avoid pinning already freed cookies"
* tag 'vfs-6.10-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
iomap: Fix iomap_adjust_read_range for plen calculation
iomap: keep on increasing i_size in iomap_write_end()
cachefiles: remove unneeded include of <linux/fdtable.h>
fs/file: fix the check in find_next_fd()
cachefiles: make on-demand read killable
cachefiles: flush all requests after setting CACHEFILES_DEAD
cachefiles: Set object to close if ondemand_id < 0 in copen
cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
cachefiles: never get a new anonymous fd if ondemand_id is valid
cachefiles: add spin_lock for cachefiles_ondemand_info
cachefiles: add consistency check for copen/cread
cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
cachefiles: remove requests from xarray during flushing requests
cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
statx: Update offset commentary for struct statx
netfs: fix kernel doc for nets_wait_for_outstanding_io()
debugfs: continue to ignore unknown mount options
|
|
Consider a thermal zone with one passive trip point, a cooling device
with 3 states (0, 1, 2) bound to it, passive polling enabled (nonzero
passive_delay_jiffies) and no regular polling (polling_delay_jiffies
equal to 0) that is managed by the Step-Wise governor. Suppose that
the initial state of the cooling device is 0 and the zone temperature
is below the trip point to start with.
When the trip point is crossed, tz->passive is incremented by the
thermal core and the governor's .manage() callback is invoked. It
sets 'throttle' to 'true' for the trip in question and
get_target_state() returns 1 for the instance corresponding to the
cooling device (say that 'upper' and 'lower' are set to 2 and 0 for
it, respectively), so its state changes to 1.
Passive polling is still active for the zone, so next time the
temperature is updated, the governor's .manage() callback will be
invoked again. If the temperature is still rising, it will change
the state of the cooling device to 2.
Now suppose that next time the zone temperature is updated, it falls
below the trip point, so tz->passive is decremented for the zone (say
it becomes 0 then) and the governor's .manage() callback runs.
It finds that the temperature trend for the zone is 'falling' and
'throttle' will be set to 'false' for the trip in question, so the
cooling device's state will be changed to 1. However, because
tz->polling is 0 for the zone, the governor's .manage() callback
may not be invoked again for a long time and the cooling device's
state will not be reset back to 0.
This can happen because commit 042a3d80f118 ("thermal: core: Move
passive polling management to the core") removed passive polling
management from the Step-Wise governor.
Before that change, thermal_zone_trip_update() would bump up
tz->passive when changing the target state for a thermal instance
from "no target" to a specific value and it would drop tz->passive
when changing it back to "no target" which would cause passive
polling to be active for the zone until the governor has reset the
states of all cooling devices. In particular, in the example above
tz->passive would be incremented when changing the state of the
cooling device from 0 to 1 and then it would be still nonzero when
the state of the cooling device was changed from 2 to 1.
To prevent this problem from occurring, restore the passive polling
management in the Step-Wise governor by partially reverting the
commit in question and update the comment in the restored code
to explain its role more clearly.
Fixes: 042a3d80f118 ("thermal: core: Move passive polling management to the core")
Closes: https://lore.kernel.org/linux-pm/[email protected]
Reported-by: Johan Hovold <[email protected]>
Tested-by: Johan Hovold <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
'ip6 dscp set $v' in an nftables output route chain has no effect.
nftables does detect the dscp change and calls the reroute hook,
but ip6_route_me_harder never sets the dscp/flowlabel:
flowlabel/dsfield routing rules are ignored and no reroute takes place.
Thanks to Yi Chen for an excellent reproducer script that I used
to validate this change.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Yi Chen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Lion Ackermann reported that there is a race condition between namespace cleanup
in ipset and the garbage collection of the list:set type. The namespace
cleanup can destroy the list:set type of sets while the gc of the set type is
waiting to run in rcu cleanup. The latter uses data from the destroyed set,
which leads to a use-after-free. The patch contains the following parts:
- When destroying all sets, first remove the garbage collectors, then wait
if needed and then destroy the sets.
- Fix the badly ordered "wait then remove gc" sequence when destroying a
single set.
- Fix the missing rcu locking in the list:set type in the userspace test
case.
- Use proper RCU list handling in the list:set type.
The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc).
Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation)
Reported-by: Lion Ackermann <[email protected]>
Tested-by: Lion Ackermann <[email protected]>
Signed-off-by: Jozsef Kadlecsik <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Check for mandatory netlink attributes in payload and meta expression
when used embedded from the inner expression, otherwise NULL pointer
dereference is possible from userspace.
Fixes: a150d122b6bd ("netfilter: nft_meta: add inner match support")
Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching")
Signed-off-by: Davide Ornaghi <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Since physical and virtual kernel address spaces are uncoupled,
the kernel image is not mapped using large segment pages anymore,
which is a regression.
Put the kernel image at the same large segment page offset in
physical memory as in virtual memory. Such approach preserves
the existing number of bits of entropy used for randomization
of the kernel location in virtual memory when KASLR is on.
As a result, the kernel is mapped using large segment pages.
Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Reported-by: Heiko Carstens <[email protected]>
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Do not allow creation of large pages against physical addresses which
themselves are not aligned on the correct boundary. Failure to do so
might lead to referencing the wrong memory as a result of the way
DAT works.
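A minimal sketch of the constraint (assuming the 1 MB segment size used by s390 DAT; not the actual arch code): a large segment mapping is only valid when the physical address it maps sits on a segment boundary.
```c
#include <stdbool.h>
#include <stdio.h>

#define SEGMENT_SIZE (1UL << 20)	/* 1 MB segments, as on s390 */

static bool large_page_ok(unsigned long pa)
{
	/* Refuse the large mapping if the physical address is misaligned;
	 * DAT would otherwise resolve accesses to the wrong memory. */
	return (pa & (SEGMENT_SIZE - 1)) == 0;
}

int main(void)
{
	printf("aligned: %d, misaligned: %d\n",
	       large_page_ok(0x200000), large_page_ok(0x180000));
	return 0;
}
```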
Fixes: c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Signed-off-by: Heiko Carstens <[email protected]>
Acked-by: Vasily Gorbik <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
If the card doesn't have display hardware, hpd_work and hpd_lock are
left uninitialized which causes BUG when attempting to schedule hpd_work
on runtime PM resume.
Fix it by adding a headless flag to DRM and skipping any hpd if it's set.
Fixes: ae1aadb1eb8d ("nouveau: don't fail driver load if no display hw present.")
Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/337
Signed-off-by: Vasily Khoruzhick <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Signed-off-by: Danilo Krummrich <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Due to timer wheel implementation, a timer will usually fire
after its scheduled time.
For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if a packet arrived during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
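Toy arithmetic for the scenario above (the 64 ms granularity is taken from the text, the other values are made up): the timer can fire tens of milliseconds after icsk_timeout, and an ACK received in that window leaves rcv_tstamp after icsk_timeout even though the peer is alive.
```c
#include <stdio.h>

int main(void)
{
	unsigned int granularity  = 64;		/* ms, for timeouts in 512ms..4s at HZ=1000 */
	unsigned int icsk_timeout = 1000;	/* when the probe0 timer was due */
	unsigned int fires_at     = (icsk_timeout / granularity + 1) * granularity;
	unsigned int rcv_tstamp   = 1010;	/* ACK that arrived during the extra delay */

	printf("timer fires at %u; rcv_tstamp=%u is after icsk_timeout=%u: %d\n",
	       fires_at, rcv_tstamp, icsk_timeout, rcv_tstamp > icsk_timeout);
	return 0;
}
```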
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Menglong Dong <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Matthieu Baerts says:
====================
mptcp: various fixes
The different patches here are some unrelated fixes for MPTCP:
- Patch 1 ensures 'snd_una' is initialised on connect in case of MPTCP
fallback to TCP followed by retransmissions before the processing of
any other incoming packets. A fix for v5.9+.
- Patch 2 makes sure the RmAddr MIB counter is incremented, and only
once per ID, upon the reception of a RM_ADDR. A fix for v5.10+.
- Patch 3 doesn't update 'add addr' related counters if the connect()
was not possible. A fix for v5.7+.
- Patch 4 updates the mailmap file to add Geliang's new email address.
====================
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-0-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Just like my other email addresses, map my new one to my kernel.org
account too.
My new email address uses the "last name, first name" format, which is
different from my other email addresses. This mailmap entry is also
used to indicate that it is actually the same person.
Suggested-by: Mat Martineau <[email protected]>
Suggested-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-4-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The creation of new subflows can fail for different reasons. If no
subflow has been created using the received ADD_ADDR, the related
counters should not be updated, otherwise they will never be decremented
for events related to this ID later on.
For the moment, the number of accepted ADD_ADDR is only decremented upon
the reception of a related RM_ADDR, and only if the remote address ID is
currently being used by at least one subflow. In other words, if no
subflow can be created with the received address, the counter will not
be decremented. In this case, it is then important not to increment
the pm.add_addr_accepted counter, and not to modify the pm.accept_addr bit.
Note that this patch does not modify the behaviour in case of failures
later on, e.g. if the MP Join is dropped or rejected.
The "remove invalid addresses" MP Join subtest has been modified to
validate this case. The broadcast IP address is added before the "valid"
address that will be used to successfully create a subflow, and the
limit is decreased by one: without this patch, it was not possible to
create the last subflow, because:
- the broadcast address would have been accepted even if it was not
usable: the creation of a subflow to this address results in an error,
- the limit of 2 accepted ADD_ADDR would have then been reached.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM")
Cc: [email protected]
Co-developed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: YonglongLi <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The RmAddr MIB counter is supposed to be incremented once when a valid
RM_ADDR has been received. Before this patch, it could have been
incremented as many times as the number of subflows connected to the
linked address ID, so it could have been 0, 1 or more than 1.
The "RmSubflow" is incremented after a local operation. In this case,
it is normal to tied it with the number of subflows that have been
actually removed.
The "remove invalid addresses" MP Join subtest has been modified to
validate this case. A broadcast IP address is now used instead: the
client will not be able to create a subflow to this address. The
consequence is that when receiving the RM_ADDR with the ID attached to
this broadcast IP address, no subflow linked to this ID will be found.
Fixes: 7a7e52e38a40 ("mptcp: add RM_ADDR related mibs")
Cc: [email protected]
Co-developed-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Signed-off-by: YonglongLi <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt
is properly initialized on connect"). It turns out that syzkaller can
trigger the retransmit after fallback and before processing any other
incoming packet - so that snd_una is still left uninitialized.
Address the issue by explicitly initializing snd_una together with
snd_nxt and write_seq.
Suggested-by: Mat Martineau <[email protected]>
Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect")
Cc: [email protected]
Reported-by: Christoph Paasch <[email protected]>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab9ddfa3d00@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace: normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.
Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().
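A standalone sketch of the initialization issue (struct and check simplified; not the actual net/sched code): a static designated initializer that omits 'owner' leaves it at 0, which collides with CPU 0's id in the recursion check, whereas -1 can never match a CPU id.
```c
#include <stdio.h>

struct qdisc { int limit; int owner; };

/* Static definition that forgets 'owner': C zero-initializes it. */
static struct qdisc noop_qdisc_buggy = { .limit = 1 };
/* The fix: match what a dynamically allocated qdisc would get. */
static struct qdisc noop_qdisc_fixed = { .limit = 1, .owner = -1 };

static int would_flag_recursion(const struct qdisc *q, int this_cpu)
{
	/* Simplified recursion guard: "is this qdisc already running here?" */
	return q->owner == this_cpu;
}

int main(void)
{
	int cpu0 = 0;

	printf("buggy: %d, fixed: %d\n",
	       would_flag_recursion(&noop_qdisc_buggy, cpu0),
	       would_flag_recursion(&noop_qdisc_fixed, cpu0));
	return 0;
}
```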
Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion")
Signed-off-by: Johannes Berg <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08880a8d3703cea8e421a@changeid
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash
when the boot CPU is nohz_full") the kernel no longer crashes, but there is
another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to
update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled)
in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(); the new
comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full")
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]
|
|
This happens when amdgpu_bo_release_notify() runs before
amdgpu_ttm_set_buffer_funcs_status() has set the buffer funcs to
enabled.
Check the buffer funcs enablement before trying to clear the buffer
memory.
v2:(Christian)
- Apply it only to GEM buffers; since GEM buffers are only
allocated/freed while the driver is loaded, we never run into
the issue of clearing with buffer funcs disabled.
v3:(Mario)
- drop the stable tag as this will presumably go into a
-fixes PR for 6.10
Log snip:
*ERROR* Trying to clear memory with ring turned off.
RIP: 0010:amdgpu_bo_release_notify+0x201/0x220 [amdgpu]
Fixes: a68c7eaa7a8f ("drm/amdgpu: Enable clear page functionality")
Signed-off-by: Arunpravin Paneer Selvam <[email protected]>
Reviewed-by: Christian König <[email protected]>
Tested-by: Mikhail Gavrilov <[email protected]>
Tested-by: Richard Gong <[email protected]>
Suggested-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We use the polling interface to srcu for tracking pending frees; when
shutting down we don't need to wait for an srcu barrier to free them,
but SRCU still gets confused if we shut down with an outstanding grace
period.
Reported-by: [email protected]
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Turn more asserts into proper recoverable error paths.
Reported-by: [email protected]
Signed-off-by: Kent Overstreet <[email protected]>
|
|
The bucket_gens array and gc_buckets array know their own size; we
should be using those members, and returning an error.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc2-ktest-00018-gebd1d148b278 #144 Not tainted
------------------------------------------------------
fio/1345 is trying to acquire lock:
ffff88813e200ab8 (&c->snapshot_create_lock){++++}-{3:3}, at: bch2_truncate+0x76/0xf0
but task is already holding lock:
ffff888105a1fa38 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: do_truncate+0x7b/0xc0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}:
down_write+0x3d/0xd0
bch2_write_iter+0x1c0/0x10f0
vfs_write+0x24a/0x560
__x64_sys_pwrite64+0x77/0xb0
x64_sys_call+0x17e5/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #1 (sb_writers#10){.+.+}-{0:0}:
mnt_want_write+0x4a/0x1d0
filename_create+0x69/0x1a0
user_path_create+0x38/0x50
bch2_fs_file_ioctl+0x315/0xbf0
__x64_sys_ioctl+0x297/0xaf0
x64_sys_call+0x10cb/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #0 (&c->snapshot_create_lock){++++}-{3:3}:
__lock_acquire+0x1445/0x25b0
lock_acquire+0xbd/0x2b0
down_read+0x40/0x180
bch2_truncate+0x76/0xf0
bchfs_truncate+0x240/0x3f0
bch2_setattr+0x7b/0xb0
notify_change+0x322/0x4b0
do_truncate+0x8b/0xc0
do_ftruncate+0x110/0x270
__x64_sys_ftruncate+0x43/0x80
x64_sys_call+0x1373/0x1ab0
do_syscall_64+0x68/0x130
entry_SYSCALL_64_after_hwframe+0x4b/0x53
other info that might help us debug this:
Chain exists of:
&c->snapshot_create_lock --> sb_writers#10 --> &sb->s_type->i_mutex_key#13
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#13);
                               lock(sb_writers#10);
                               lock(&sb->s_type->i_mutex_key#13);
  rlock(&c->snapshot_create_lock);
*** DEADLOCK ***
Signed-off-by: Kent Overstreet <[email protected]>
|
|
fsck_err() does a goto fsck_err on error; factor out check_fix_ptr() so
that our error label can drop our device ref.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We count objects as freed when we move them to the srcu-pending lists
because we're doing the equivalent of a kfree_srcu(); the only
difference is that managing the pending list ourselves means we can
allocate from the pending list.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
inodes and dentries are still present in the btree node cache, in much
more compact form
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Since the key cache shrinker walks the rhashtable, a mostly empty
rhashtable leads to really nasty reclaim performance issues.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
There are three keys displayed in non-uniform format.
Let's fix them.
[Before]
```
Label: testbcachefs
Version: 1.9: (unknown version)
Version upgrade complete: 0.0: (unknown version)
```
[After]
```
Label: testbcachefs
Version: 1.9: (unknown version)
Version upgrade complete: 0.0: (unknown version)
```
Fixes: 7423330e30ab ("bcachefs: prt_printf() now respects \r\n\t")
Signed-off-by: Hongbo Li <[email protected]>
Signed-off-by: Kent Overstreet <[email protected]>
|
|
fsck.c always runs at the top of the stack, so we're not too concerned
here; noinline_for_stack is sufficient
Signed-off-by: Kent Overstreet <[email protected]>
|
|
for forwards compat we now explicitly allow mounting and using
filesystems with unknown btrees, and we have to walk them for fsck.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
error handling here is slightly odd, which is why we were accidentally
calling evict() on an error pointer
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Split the workqueues for btree read completions and btree write
submissions; we don't want concurrency control on btree read
completions, but we do want concurrency control on write submissions,
else blocking in submit_bio() will cause a ton of kworkers to be
allocated.
Signed-off-by: Kent Overstreet <[email protected]>
|