Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Functional fixes:
- Fix big endian conversion for arm64 in recordmcount processing
- Fix timestamp corruption in ring buffer on discarding events
- Fix memory leak in __create_synth_event()
- Skip selftests if tracing is disabled as it will cause them to
fail.
Non-functional fixes:
- Fix help text in Kconfig
- Remove duplicate prototype for trace_empty()
- Fix stale comment about the trace_event_call flags.
Self test update:
- Add more information to the validation output of when a corrupt
timestamp is found in the ring buffer, and also trigger a warning
to make sure that tests catch it"
* tag 'trace-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix comment about the trace_event_call flags
tracing: Skip selftests if tracing is disabled
tracing: Fix memory leak in __create_synth_event()
ring-buffer: Add a little more information and a WARN when time stamp going backwards is detected
ring-buffer: Force before_stamp and write_stamp to be different on discard
tracing: Fix help text of TRACEPOINT_BENCHMARK in Kconfig
tracing: Remove duplicate declaration from trace.h
ftrace: Have recordmcount use w8 to read relp->r_info in arm64_is_fake_mcount
|
|
The current blkio.throttle.io_service_bytes_recursive doesn't
work correctly.
As an example, for the following blkcg hierarchy:
(Made 1GB READ in test1, 512MB READ in test2)
test
/ \
test1 test2
$ head -n 1 test/test1/blkio.throttle.io_service_bytes_recursive
8:0 Read 1073684480
$ head -n 1 test/test2/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448
$ head -n 1 test/blkio.throttle.io_service_bytes_recursive
8:0 Read 537448448
Clearly, above data of "test" reflects "test2" not "test1"+"test2".
Do the correct summary in blkg_rwstat_recursive_sum().
Signed-off-by: Xunlei Pang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Prevent that an IO request is build during device shutdown initiated by
a driver unbind. This request will never be able to be processed or
canceled and will hang forever. This will lead also to a hanging unbind.
Fix by checking not only if the device is in READY state but also check
that there is no device offline initiated before building a new IO request.
Fixes: e443343e509a ("s390/dasd: blk-mq conversion")
Cc: <[email protected]> # v4.14+
Signed-off-by: Stefan Haberland <[email protected]>
Tested-by: Bjoern Walk <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In case of an unbind of the DASD device driver the function
dasd_generic_remove() is called which shuts down the device.
Among others this functions removes the int_handler from the cdev.
During shutdown the device cancels all outstanding IO requests and waits
for completion of the clear request.
Unfortunately the clear interrupt will never be received when there is no
interrupt handler connected.
Fix by moving the int_handler removal after the call to the state machine
where no request or interrupt is outstanding.
Cc: [email protected]
Signed-off-by: Stefan Haberland <[email protected]>
Tested-by: Bjoern Walk <[email protected]>
Reviewed-by: Jan Hoeppner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Commit 384d87ef2c95 ("block: Do not discard buffers under a mounted
filesystem") made paths issuing discard or zeroout requests to the
underlying device try to grab block device in exclusive mode. If that
failed we returned EBUSY to userspace. This however caused unexpected
fallout in userspace where e.g. FUSE filesystems issue discard requests
from userspace daemons although the device is open exclusively by the
kernel. Also shrinking of logical volume by LVM issues discard requests
to a device which may be claimed exclusively because there's another LV
on the same PV. So to avoid these userspace regressions, fall back to
invalidate_inode_pages2_range() instead of returning EBUSY to userspace
and return EBUSY only of that call fails as well (meaning that there's
indeed someone using the particular device range we are trying to
discard).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=211167
Fixes: 384d87ef2c95 ("block: Do not discard buffers under a mounted filesystem")
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In rxe_comp.c in rxe_completer() the function free_pkt() did not clear skb
which triggered a warning at 'done:' and could possibly at 'exit:'. The
WARN_ONCE() calls are not actually needed. The call to free_pkt() is
moved to the end to clearly show that all skbs are freed.
Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
rxe_rcv_mcast_pkt() dropped a reference to ib_device when no error
occurred causing an underflow on the reference counter. This code is
cleaned up to be clearer and easier to read.
Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
When the noted patch below extending the reference taken by
rxe_get_dev_from_net() in rxe_udp_encap_recv() until each skb is freed it
was not matched by a reference in the loopback path resulting in
underflows.
Fixes: 899aba891cab ("RDMA/rxe: Fix FIXME in rxe_udp_encap_recv()")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bob Pearson <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
45d189c606292 ("io_uring: replace force_nonblock with flags") did
something strange for io_openat() slicing all issue_flags but
IO_URING_F_NONBLOCK. Not a bug for now, but better to just forward the
flags.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NVMe fixes from Christoph:
"nvme fixes for 5.12:
- more device quirks (Julian Einwag, Zoltán Böszörményi, Pascal Terjan)
- fix a hwmon error return (Daniel Wagner)
- fix the keep alive timeout initialization (Martin George)
- ensure the model_number can't be changed on a used subsystem
(Max Gurtovoy)"
* tag 'nvme-5.12-2021-03-05' of git://git.infradead.org/nvme:
nvmet: model_number must be immutable once set
nvme-fabrics: fix kato initialization
nvme-hwmon: Return error code when registration fails
nvme-pci: add quirks for Lexar 256GB SSD
nvme-pci: mark Kingston SKC2000 as not supporting the deepest power state
nvme-pci: mark Seagate Nytro XM1440 as QUIRK_NO_NS_DESC_LIST.
|
|
We have this weird true/false return from parking, and then some of the
callers decide to look at that. It can lead to unbalanced parks and
sqd locking. Have the callers check the thread status once it's parked.
We know we have the lock at that point, so it's either valid or it's NULL.
Fix race with parking on thread exit. We need to be careful here with
ordering of the sdq->lock and the IO_SQ_THREAD_SHOULD_PARK bit.
Rename sqd->completion to sqd->parked to reflect that this is the only
thing this completion event doesn.
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we race with shutting down the io-wq context and someone queueing
a hashed entry, then we can exit the manager with it armed. If it then
triggers after the manager has exited, we can have a use-after-free where
io_wqe_hash_wake() attempts to wake a now gone manager process.
Move the killing of the hashed write queue into the manager itself, so
that we know we've killed it before the task exits.
Fixes: e941894eae31 ("io-wq: make buffered file write hashed work map per-ctx")
Signed-off-by: Jens Axboe <[email protected]>
|
|
The callback can only be armed, if we get -EIOCBQUEUED returned. It's
important that we clear the WAITQ bit for other cases, otherwise we can
queue for async retry and filemap will assume that we're armed and
return -EAGAIN instead of just blocking for the IO.
Cc: [email protected] # 5.9+
Signed-off-by: Jens Axboe <[email protected]>
|
|
It doesn't make sense to wait for more events to come in, if we can't
even flush the overflow we already have to the ring. Return -EBUSY for
that condition, just like we do for attempts to submit with overflow
pending.
Cc: [email protected] # 5.11
Signed-off-by: Jens Axboe <[email protected]>
|
|
This allows us to do task creation and setup without needing to use
completions to try and synchronize with the starting thread. Get rid of
the old io_wq_fork_thread() wrapper, and the 'wq' and 'worker' startup
completion events - we can now do setup before the task is running.
Signed-off-by: Jens Axboe <[email protected]>
|
|
* powercap:
powercap/drivers/dtpm: Add the experimental label to the option description
powercap/drivers/dtpm: Fix root node initialization
|
|
Directly connect the 'npt' param to the 'npt_enabled' variable so that
runtime adjustments to npt_enabled are reflected in sysfs. Move the
!PAE restriction to a runtime check to ensure NPT is forced off if the
host is using 2-level paging, and add a comment explicitly stating why
NPT requires a 64-bit kernel or a kernel with PAE enabled.
Opportunistically switch the param to octal permissions.
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Reviewed-by: Vitaly Kuznetsov <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When posting a deadline timer interrupt, open code the checks guarding
__kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected()
check in kvm_wait_lapic_expire(). The injection check will always fail
since the interrupt has not yet be injected. Moving the call after
injection would also be wrong as that wouldn't actually delay delivery
of the IRQ if it is indeed sent via posted interrupt.
Fixes: 010fd37fddf6 ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."
Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.
So allow setting this field only once: using configfs or in the first
identify ctrl operation.
Signed-off-by: Max Gurtovoy <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Currently kato is initialized to NVME_DEFAULT_KATO for both
discovery & i/o controllers. This is a problem specifically
for non-persistent discovery controllers since it always ends
up with a non-zero kato value. Fix this by initializing kato
to zero instead, and ensuring various controllers are assigned
appropriate kato values as follows:
non-persistent controllers - kato set to zero
persistent controllers - kato set to NVMF_DEV_DISC_TMO
(or any positive int via nvme-cli)
i/o controllers - kato set to NVME_DEFAULT_KATO
(or any positive int via nvme-cli)
Signed-off-by: Martin George <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The hwmon pointer wont be NULL if the registration fails. Though the
exit code path will assign it to ctrl->hwmon_device. Later
nvme_hwmon_exit() will try to free the invalid pointer. Avoid this by
returning the error code from hwmon_device_register_with_info().
Fixes: ed7770f66286 ("nvme/hwmon: rework to avoid devm allocation")
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN
quirks for this buggy device.
Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417
Signed-off-by: Pascal Terjan <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
My 2TB SKC2000 showed the exact same symptoms that were provided
in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on
Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed
cold boot to get it back.
According to some sources, the A2000 is simply a rebadged
SKC2000 with a slightly optimized firmware.
Adding the SKC2000 PCI ID to the quirk list with the same workaround
as the A2000 made my laptop survive a 5 hours long Yocto bootstrap
buildfest which reliably triggered the SSD lockup previously.
Signed-off-by: Zoltán Böszörményi <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The kernel fails to fully detect these SSDs, only the character devices
are present:
[ 10.785605] nvme nvme0: pci function 0000:04:00.0
[ 10.876787] nvme nvme1: pci function 0000:81:00.0
[ 13.198614] nvme nvme0: missing or invalid SUBNQN field.
[ 13.198658] nvme nvme1: missing or invalid SUBNQN field.
[ 13.206896] nvme nvme0: Shutdown timeout set to 20 seconds
[ 13.215035] nvme nvme1: Shutdown timeout set to 20 seconds
[ 13.225407] nvme nvme0: 16/0/0 default/read/poll queues
[ 13.233602] nvme nvme1: 16/0/0 default/read/poll queues
[ 13.239627] nvme nvme0: Identify Descriptors failed (8194)
[ 13.246315] nvme nvme1: Identify Descriptors failed (8194)
Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679
Signed-off-by: Julian Einwag <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
|
|
Pull drm fixes from Dave Airlie:
"More may show up but this is what I have at this stage: just a single
nouveau regression fix, and a bunch of amdgpu fixes.
amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix
nouveau:
- regression fix for gk104"
* tag 'drm-fixes-2021-03-05' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix parameter error of RREG32_PCIE() in amdgpu_regs_pcie
drm/amd/display: fix the return of the uninitialized value in ret
drm/amdgpu: enable BACO runpm by default on sienna cichlid and navy flounder
drm/amd/pm: correct Arcturus mmTHM_BACO_CNTL register address
drm/amdgpu/swsmu/vangogh: Only use RLCPowerNotify msg for disable
drm/amdgpu/pm: make unsupported power profile messages debug
drm/amdgpu:disable VCN for Navi12 SKU
drm/amdgpu: Only check for S0ix if AMD_PMC is configured
drm/nouveau/fifo/gk104-gp1xx: fix creation of sw class
|
|
As pointed out by Ilya and explained in the new comment, there's a
discrepancy between x86 and BPF CMPXCHG semantics: BPF always loads
the value from memory into r0, while x86 only does so when r0 and the
value in memory are different. The same issue affects s390.
At first this might sound like pure semantics, but it makes a real
difference when the comparison is 32-bit, since the load will
zero-extend r0/rax.
The fix is to explicitly zero-extend rax after doing such a
CMPXCHG. Since this problem affects multiple archs, this is done in
the verifier by patching in a BPF_ZEXT_REG instruction after every
32-bit cmpxchg. Any archs that don't need such manual zero-extension
can do a look-ahead with insn_is_zext to skip the unnecessary mov.
Note this still goes on top of Ilya's patch:
https://lore.kernel.org/bpf/[email protected]/T/#u
Differences v5->v6[1]:
- Moved is_cmpxchg_insn and ensured it can be safely re-used. Also renamed it
and removed 'inline' to match the style of the is_*_function helpers.
- Fixed up comments in verifier test (thanks for the careful review, Martin!)
Differences v4->v5[1]:
- Moved the logic entirely into opt_subreg_zext_lo32_rnd_hi32, thanks to Martin
for suggesting this.
Differences v3->v4[1]:
- Moved the optimization against pointless zext into the correct place:
opt_subreg_zext_lo32_rnd_hi32 is called _after_ fixup_bpf_calls.
Differences v2->v3[1]:
- Moved patching into fixup_bpf_calls (patch incoming to rename this function)
- Added extra commentary on bpf_jit_needs_zext
- Added check to avoid adding a pointless zext(r0) if there's already one there.
Difference v1->v2[1]: Now solved centrally in the verifier instead of
specifically for the x86 JIT. Thanks to Ilya and Daniel for the suggestions!
[1] v5: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
v4: https://lore.kernel.org/bpf/CA+i-1C3ytZz6FjcPmUg5s4L51pMQDxWcZNvM86w4RHZ_o2khwg@mail.gmail.com/T/#t
v3: https://lore.kernel.org/bpf/[email protected]/T/#t
v2: https://lore.kernel.org/bpf/[email protected]/T/#t
v1: https://lore.kernel.org/bpf/[email protected]/T/#t
Reported-by: Ilya Leoshkevich <[email protected]>
Fixes: 5ffa25502b5a ("bpf: Add instructions for atomic_[cmp]xchg")
Signed-off-by: Brendan Jackman <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Tested-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi
Pull iSCSI fixes from Martin Petersen:
"Three fixes for missed iSCSI verification checks (and make the sysfs
files use "sysfs_emit()" - that's what it is there for)"
* tag 'mkp-scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi:
scsi: iscsi: Verify lengths on passthrough PDUs
scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE
scsi: iscsi: Restrict sessions and handles to admin capabilities
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-5.12-2021-03-03:
amdgpu:
- S0ix fix
- Handle new NV12 SKU
- Misc power fixes
- Display uninitialized value fix
- PCIE debugfs register access fix
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
A single regression fix here that I noticed while testing a bunch of
boards for something else, not sure where this got lost! Prevents 3D
driver from initialising on some GPUs.
Signed-off-by: Dave Airlie <[email protected]>
From: Ben Skeggs <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/CACAvsv5gmq14BrDmkMncfd=tHVSSaU89BdBEWfs6Jy-aRz03GQ@mail.gmail.com
|
|
Open-iSCSI sends passthrough PDUs over netlink, but the kernel should be
verifying that the provided PDU header and data lengths fall within the
netlink message to prevent accessing beyond that in memory.
Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
As the iSCSI parameters are exported back through sysfs, it should be
enforcing that they never are more than PAGE_SIZE (which should be more
than enough) before accepting updates through netlink.
Change all iSCSI sysfs attributes to use sysfs_emit().
Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Lee Duncan <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Chris Leech <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Protect the iSCSI transport handle, available in sysfs, by requiring
CAP_SYS_ADMIN to read it. Also protect the netlink socket by restricting
reception of messages to ones sent with CAP_SYS_ADMIN. This disables
normal users from being able to end arbitrary iSCSI sessions.
Cc: [email protected]
Reported-by: Adam Nichols <[email protected]>
Reviewed-by: Chris Leech <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Lee Duncan <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
The current CIPSO and CALIPSO refcounting scheme for the DOI
definitions is a bit flawed in that we:
1. Don't correctly match gets/puts in netlbl_cipsov4_list().
2. Decrement the refcount on each attempt to remove the DOI from the
DOI list, only removing it from the list once the refcount drops
to zero.
This patch fixes these problems by adding the missing "puts" to
netlbl_cipsov4_list() and introduces a more conventional, i.e.
not-buggy, refcounting mechanism to the DOI definitions. Upon the
addition of a DOI to the DOI list, it is initialized with a refcount
of one, removing a DOI from the list removes it from the list and
drops the refcount by one; "gets" and "puts" behave as expected with
respect to refcounts, increasing and decreasing the DOI's refcount by
one.
Fixes: b1edeb102397 ("netlabel: Replace protocol/NetLabel linking with refrerence counts")
Fixes: d7cce01504a0 ("netlabel: Add support for removing a CALIPSO DOI.")
Reported-by: [email protected]
Signed-off-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Provide a generic helper for setting up an io_uring worker. Returns a
task_struct so that the caller can do whatever setup is needed, then call
wake_up_new_task() to kick it into gear.
Add a kernel_clone_args member, io_thread, which tells copy_process() to
mark the task with PF_IO_WORKER.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Instead of going into request internals, like checking req->file->f_op,
do match them based on REQ_F_INFLIGHT, it's set only when we want it to
be reliably cancelled.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The last change to ibmvnic_set_mac(), 8fc3672a8ad3, meant to prevent
users from setting an invalid MAC address on an ibmvnic interface
that has not been brought up yet. The change also prevented the
requested MAC address from being stored by the adapter object for an
ibmvnic interface when the state of the ibmvnic interface is
VNIC_PROBED - that is after probing has finished but before the
ibmvnic interface is brought up. The MAC address stored by the
adapter object is used and sent to the hypervisor for checking when
an ibmvnic interface is brought up.
The ibmvnic driver ignoring the requested MAC address when in
VNIC_PROBED state caused LACP bonds (bonds in 802.3ad mode) with more
than one slave to malfunction. The bonding code must be able to
change the MAC address of its slaves before they are brought up
during enslaving. The inability of kernels with 8fc3672a8ad3 to set
the MAC addresses of bonding slaves is observable in the output of
"ip address show". The MAC addresses of the slaves are the same as
the MAC address of the bond on a working system whereas the slaves
retain their original MAC addresses on a system with a malfunctioning
LACP bond.
Fixes: 8fc3672a8ad3 ("ibmvnic: fix ibmvnic_set_mac")
Signed-off-by: Jiri Wiesner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Init the u64 stats in order to avoid the lockdep prints on the 32bit
hardware like
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
Hardware name: ARM-Versatile Express
Backtrace:
[<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
[<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
[<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
[<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
[<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
[<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
[<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
[<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
[<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
[<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
[<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
[<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
[<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
[<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
[<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
[<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
[<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
[<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
[<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
[<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
[<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
[<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
[<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
[<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
[<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
[<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
[<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
[<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
[<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
[<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
[<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
[<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
[<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
[<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
[<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
[<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
[<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
[<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
[<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
[<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
Fixes: 83c9e13aa39a ("netdevsim: add software driver for testing offloads")
Reported-by: [email protected]
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Hillf Danton <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Mat Martineau says:
====================
mptcp: Fixes for v5.12
These patches from the MPTCP tree fix a few multipath TCP issues:
Patches 1 and 5 clear some stale pointers when subflows close.
Patches 2, 4, and 9 plug some memory leaks.
Patch 3 fixes a memory accounting error identified by syzkaller.
Patches 6 and 7 fix a race condition that slowed data transmission.
Patch 8 adds missing wakeups when write buffer space is freed.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
When the port number is mismatched with the announced ones, use
'goto dispose_child' to free the resources instead of using 'goto out'.
This patch also moves the port number checking code in
subflow_syn_recv_sock before mptcp_finish_join, otherwise subflow_drop_ctx
will fail in dispose_child.
Fixes: 5bc56388c74f ("mptcp: add port number check for MP_JOIN")
Reported-by: Paolo Abeni <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
__mptcp_clean_una() can free write memory and should wake-up
user-space processes when needed.
When such function is invoked by the MPTCP receive path, the wakeup
is not needed, as the TCP stack will later trigger subflow_write_space
which will do the wakeup as needed.
Other __mptcp_clean_una() call sites need an additional wakeup check
Let's bundle the relevant code in a new helper and use it.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/165
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions")
Tested-by: Matthieu Baerts <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If we receive a MPTCP_PUSH_PENDING even from a subflow when
mptcp_release_cb() is serving the previous one, the latter
will be delayed up to the next release_sock(msk).
Address the issue implementing a test/serve loop for such
event.
Additionally rename the push helper to __mptcp_push_pending()
to be more consistent with the existing code.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Will simplify the following patch, no functional change
intended.
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Just like with last_snd, we have to NULL 'first' on subflow close.
ack_hint isn't strictly required (its never dereferenced), but better to
clear this explicitly as well instead of making it an exception.
msk->first is dereferenced unconditionally at accept time, but
at that point the ssk is not on the conn_list yet -- this means
worker can't see it when iterating the conn_list.
Reported-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Christoph Paasch reported following crash:
dst_release underflow
WARNING: CPU: 0 PID: 1319 at net/core/dst.c:175 dst_release+0xc1/0xd0 net/core/dst.c:175
CPU: 0 PID: 1319 Comm: syz-executor217 Not tainted 5.11.0-rc6af8e85128b4d0d24083c5cac646e891227052e0c #70
Call Trace:
rt_cache_route+0x12e/0x140 net/ipv4/route.c:1503
rt_set_nexthop.constprop.0+0x1fc/0x590 net/ipv4/route.c:1612
__mkroute_output net/ipv4/route.c:2484 [inline]
...
The worker leaves msk->subflow alone even when it
happened to close the subflow ssk associated with it.
Fixes: 866f26f2a9c33b ("mptcp: always graft subflow socket to parent")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/157
Reported-by: Christoph Paasch <[email protected]>
Suggested-by: Paolo Abeni <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case of memory pressure the MPTCP xmit path keeps
at most a single skb in the tx cache, eventually freeing
additional ones.
The associated counter for forward memory is not update
accordingly, and that causes the following splat:
WARNING: CPU: 0 PID: 12 at net/core/stream.c:208 sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.11.0-rc2 #59
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: events mptcp_worker
RIP: 0010:sk_stream_kill_queues+0x3ca/0x530 net/core/stream.c:208
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 00 01 00 00 e9 60 ff ff ff e8 2f 24 d3 fe 0f 0b eb 97 e8 26 24 d3 fe <0f> 0b eb a0 e8 1d 24 d3 fe 0f 0b e9 a5 fe ff ff 4c 89 e7 e8 0e d0
RSP: 0018:ffffc900000c7bc8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88810030ac40 RSI: ffffffff8262ca4a RDI: 0000000000000003
RBP: 0000000000000d00 R08: 0000000000000000 R09: ffffffff85095aa7
R10: ffffffff8262c9ea R11: 0000000000000001 R12: ffff888108908100
R13: ffffffff85095aa0 R14: ffffc900000c7c48 R15: 1ffff92000018f85
FS: 0000000000000000(0000) GS:ffff88811b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa7444baef8 CR3: 0000000035ee9005 CR4: 0000000000170ef0
Call Trace:
__mptcp_destroy_sock+0x4a7/0x6c0 net/mptcp/protocol.c:2547
mptcp_worker+0x7dd/0x1610 net/mptcp/protocol.c:2272
process_one_work+0x896/0x1170 kernel/workqueue.c:2275
worker_thread+0x605/0x1350 kernel/workqueue.c:2421
kthread+0x344/0x410 kernel/kthread.c:292
ret_from_fork+0x22/0x30 arch/x86/entry/entry_64.S:296
At close time, as reported by syzkaller/Christoph.
This change address the issue properly updating the fwd
allocated memory counter in the error path.
Reported-by: Christoph Paasch <[email protected]>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/136
Fixes: 724cfd2ee8aa ("mptcp: allocate TX skbs in msk context")
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
mptcp_add_pending_subflow() performs a sock_hold() on the subflow,
then adds the subflow to the join list.
Without a sock_put the subflow sk won't be freed in case connect() fails.
unreferenced object 0xffff88810c03b100 (size 3000):
[..]
sk_prot_alloc.isra.0+0x2f/0x110
sk_alloc+0x5d/0xc20
inet6_create+0x2b7/0xd30
__sock_create+0x17f/0x410
mptcp_subflow_create_socket+0xff/0x9c0
__mptcp_subflow_connect+0x1da/0xaf0
mptcp_pm_nl_work+0x6e0/0x1120
mptcp_worker+0x508/0x9a0
Fixes: 5b950ff4331ddda ("mptcp: link MPC subflow into msk only after accept")
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Send logic caches last active subflow in the msk, so it needs to be
cleared when the cached subflow is closed.
Fixes: d5f49190def61c ("mptcp: allow picking different xmit subflows")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/155
Reported-by: Christoph Paasch <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is a follow up of commit ea3274695353 ("net: sched: avoid
duplicates in qdisc dump") which has fixed the issue only for the qdisc
dump.
The duplicate printing also occurs when dumping the classes via
tc class show dev eth0
Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
Signed-off-by: Maximilian Heyne <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The entries in the source files are removed as well.
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vishal Bhakta <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|