Toshiaki Makita says:
====================
virtio_net: Fix problems around XDP tx and napi_tx
While I was looking into how to account standard tx counters on XDP tx
processing, I found several bugs around XDP tx and napi_tx.
Patch1: Fix oops on error path. Patch2 depends on this.
Patch2: Fix memory corruption on freeing xdp_frames with napi_tx enabled.
Patch3: Minor fix patch5 depends on.
Patch4: Fix memory corruption on processing xdp_frames when XDP is disabled.
Also patch5 depends on this.
Patch5: Fix memory corruption on processing xdp_frames while XDP is being
disabled.
Patch6: Minor fix patch7 depends on.
Patch7: Fix memory corruption on freeing sk_buff or xdp_frames when a normal
queue is reused for XDP and vice versa.
v2:
- patch5: Make rcu_assign_pointer/synchronize_net conditional instead of
_virtnet_set_queues.
- patch7: Use napi_consume_skb() instead of dev_consume_skb_any()
====================
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We do not reset or free up unused buffers when enabling/disabling XDP,
so it can happen that xdp_frames are freed after disabling XDP or
sk_buffs are freed after enabling XDP on xdp tx queues.
Thus we need to handle both forms (xdp_frames and sk_buffs) regardless
of XDP setting.
One way to trigger this problem is to disable XDP when napi_tx is
enabled. In that case, virtnet_xdp_set() calls virtnet_napi_enable()
which kicks NAPI. The NAPI handler will call virtnet_poll_cleantx()
which invokes free_old_xmit_skbs() for queues which have been used by
XDP.
Note that even with this change we need to keep skipping
free_old_xmit_skbs() from NAPI handlers when XDP is enabled, because XDP
tx queues do not acquire queue locks.
- v2: Use napi_consume_skb() instead of dev_consume_skb_any()
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
put_page() can work as a fallback for freeing xdp_frames, but the
appropriate way is to use xdp_return_frame().
Fixes: cac320c850ef ("virtio_net: convert to use generic xdp_frame and xdp_return_frame API")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 8dcc5b0ab0ec ("virtio_net: fix ndo_xdp_xmit crash towards dev not
ready for XDP") tried to avoid access to unexpected sq while XDP is
disabled, but was not complete.
There was a small window which causes out of bounds sq access in
virtnet_xdp_xmit() while disabling XDP.
An example case of
- curr_queue_pairs = 6 (2 for SKB and 4 for XDP)
- online_cpu_num = xdp_queue_pairs = 4
when XDP is enabled:
        CPU 0                         CPU 1
     (Disabling XDP)        (Processing redirected XDP frames)

                               virtnet_xdp_xmit()
  virtnet_xdp_set()
   _virtnet_set_queues()
    set curr_queue_pairs (2)
                                check if rq->xdp_prog is not NULL
                                virtnet_xdp_sq(vi)
                                 qp = curr_queue_pairs -
                                      xdp_queue_pairs +
                                      smp_processor_id()
                                    = 2 - 4 + 1 = -1
                                 sq = &vi->sq[qp] // out of bounds access
    set xdp_queue_pairs (0)
    rq->xdp_prog = NULL
Basically we should not change curr_queue_pairs and xdp_queue_pairs
while someone can read the values. Thus, when disabling XDP, assign NULL
to rq->xdp_prog first, and wait for RCU grace period, then change
xxx_queue_pairs.
Note that we need to keep the current order when enabling XDP though.
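The index arithmetic from the example above can be reproduced in user space. This is only a sketch: struct queue_counts and both helper names are hypothetical stand-ins, and cpu plays the role of smp_processor_id(). It shows that a reader which sees the new curr_queue_pairs together with the old xdp_queue_pairs computes a negative index.

```c
#include <assert.h>

/* Hypothetical stand-in for the two counters read by virtnet_xdp_sq(). */
struct queue_counts {
	int curr_queue_pairs;
	int xdp_queue_pairs;
};

/* Same arithmetic as in the diagram; cpu stands in for smp_processor_id(). */
static int xdp_sq_index(const struct queue_counts *qc, int cpu)
{
	return qc->curr_queue_pairs - qc->xdp_queue_pairs + cpu;
}

/* Returns 1 if the index falls outside an array of max_queue_pairs queues. */
static int xdp_sq_index_out_of_bounds(const struct queue_counts *qc, int cpu,
				      int max_queue_pairs)
{
	int qp = xdp_sq_index(qc, cpu);

	assert(max_queue_pairs > 0);
	return qp < 0 || qp >= max_queue_pairs;
}
```

With the consistent values { 6, 4 } and cpu 1 the index is 3. With the torn values { 2, 4 } it is -1, which is the out of bounds access described above.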
- v2: Make rcu_assign_pointer/synchronize_net conditional instead of
_virtnet_set_queues.
Fixes: 186b3c998c50 ("virtio-net: support XDP_REDIRECT")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When XDP is disabled, curr_queue_pairs + smp_processor_id() can be
larger than max_queue_pairs.
There is no guarantee that we have enough XDP send queues dedicated for
each cpu when XDP is disabled, so do not count drops on sq in that case.
Fixes: 5b8f3c8d30a6 ("virtio_net: Add XDP related stats")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When _virtnet_set_queues() failed we did not restore real_num_rx_queues.
Fix this by placing the change of real_num_rx_queues after
_virtnet_set_queues().
This order is also in line with virtnet_set_channels().
Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When napi_tx is enabled, virtnet_poll_cleantx() called
free_old_xmit_skbs() even for xdp send queue.
This is bogus since the queue has xdp_frames, not sk_buffs, thus mangled
device tx bytes counters because skb->len is meaningless value, and even
triggered oops due to general protection fault on freeing them.
Since xdp send queues do not acquire locks, old xdp_frames should be
freed only in virtnet_xdp_xmit(), so just skip free_old_xmit_skbs() for
xdp send queues.
Similarly virtnet_poll_tx() called free_old_xmit_skbs(). This NAPI
handler is called even without calling start_xmit() because cb for tx is
by default enabled. Once the handler is called, it enabled the cb again,
and then the handler would be called again. We don't need this handler
for XDP, so don't enable cb as well as not calling free_old_xmit_skbs().
Also, we need to disable tx NAPI when disabling XDP, so
virtnet_poll_tx() can safely access curr_queue_pairs and
xdp_queue_pairs, which are not atomically updated while disabling XDP.
Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 4e09ff536284 ("virtio-net: disable NAPI only when enabled during
XDP set") tried to fix inappropriate NAPI enabling/disabling when
!netif_running(), but was not complete.
On error path virtio_net could enable NAPI even when !netif_running().
This can cause enabling NAPI twice on virtnet_open(), which would
trigger BUG_ON() in napi_enable().
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set")
Signed-off-by: Toshiaki Makita <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Lorenzo Bianconi says:
====================
erspan: always reports output key to userspace
Erspan protocol relies on output key to set session id header field.
However TUNNEL_KEY bit is cleared in order to not add key field to
the external GRE header and so the configured o_key is not reported
to userspace.
Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter when
dumping device info
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
As Erspan_v4, Erspan_v6 protocol relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
ip6erspan_tunnel_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add ip6erspan1 type ip6erspan local 2000::1 remote 2000::2 \
key 1 seq erspan_ver 1
$ip link set ip6erspan1 up
$ip -d link sh ip6erspan1
ip6erspan1@NONE: <BROADCAST,MULTICAST> mtu 1422 qdisc noop state DOWN mode DEFAULT
link/ether ba:ff:09:24:c3:0e brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
ip6erspan remote 2000::2 local 2000::1 encaplimit 4 flowlabel 0x00000 ikey 0.0.0.1 iseq oseq
Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
ip6gre_fill_info
Fixes: 5a963eb61b7c ("ip6_gre: Add ERSPAN native tunnel support")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Erspan protocol (version 1 and 2) relies on o_key to configure
session id header field. However TUNNEL_KEY bit is cleared in
erspan_xmit since ERSPAN protocol does not set the key field
of the external GRE header and so the configured o_key is not reported
to userspace. The issue can be triggered with the following reproducer:
$ip link add erspan1 type erspan local 192.168.0.1 remote 192.168.0.2 \
key 1 seq erspan_ver 1
$ip link set erspan1 up
$ip -d link sh erspan1
erspan1@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UNKNOWN mode DEFAULT
link/ether 52:aa:99:95:9a:b5 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500
erspan remote 192.168.0.2 local 192.168.0.1 ttl inherit ikey 0.0.0.1 iseq oseq erspan_index 0
Fix the issue by adding the TUNNEL_KEY bit to the o_flags parameter in
ipgre_fill_info
Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The default time is declared in units of microseconds,
but is used as nanoseconds, resulting in significant
accounting errors for idle state 0 time when all idle
states deeper than 0 are disabled.
Under these unusual conditions, we don't really care
about the poll time limit anyhow.
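As a rough illustration of the unit mix-up, the sketch below compares a limit that is used as-is with one that is converted first. DEFAULT_LIMIT_US and all function names are hypothetical and are not the identifiers used in the cpuidle code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL

/* Hypothetical default limit, declared in microseconds. */
#define DEFAULT_LIMIT_US 4000ULL

/* Convert microseconds to nanoseconds, guarding against overflow. */
static uint64_t us_to_ns(uint64_t us)
{
	assert(us <= UINT64_MAX / NSEC_PER_USEC);
	return us * NSEC_PER_USEC;
}

/* Buggy pattern: the microsecond value is handed out as if it were ns. */
static uint64_t limit_ns_buggy(void)
{
	return DEFAULT_LIMIT_US;
}

/* Intended pattern: convert before comparing against ns timestamps. */
static uint64_t limit_ns_fixed(void)
{
	return us_to_ns(DEFAULT_LIMIT_US);
}

/* Returns 1 once the elapsed polling time exceeds the limit. */
static int poll_limit_reached(uint64_t elapsed_ns, uint64_t limit_ns)
{
	return elapsed_ns > limit_ns;
}
```

Under these assumed numbers the unconverted limit expires 1000 times too early, so polling is cut short long before the intended limit.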
Fixes: 800fb34a99ce ("cpuidle: poll_state: Disregard disable idle states")
Signed-off-by: Doug Smythies <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
A deadlock has been seen when switching clocksources which use
PM-runtime. The call path is:
change_clocksource
    ...
    write_seqcount_begin
    ...
    timekeeping_update
        ...
        sh_cmt_clocksource_enable
            ...
            rpm_resume
                pm_runtime_mark_last_busy
                    ktime_get
                        do
                            read_seqcount_begin
                        while read_seqcount_retry
    ....
    write_seqcount_end
Although we should be safe because we haven't yet changed the
clocksource at that time, we can't do that because of seqcount
protection.
Use ktime_get_mono_fast_ns() instead which is lock safe for such
cases.
With ktime_get_mono_fast_ns, the timestamp is not guaranteed to be
monotonic across an update and as a result can go backward.
According to update_fast_timekeeper() description: "In the worst
case, this can result is a slightly wrong timestamp (a few
nanoseconds)". For PM-runtime autosuspend, this means only that
the suspend decision may be slightly suboptimal.
Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Biju Das <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Reviewed-by: Ulf Hansson <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The current dentry number tracking code doesn't distinguish between
positive & negative dentries. It just reports the total number of
dentries in the LRU lists.
As an excessive number of negative dentries can have an impact on system
performance, it will be wise to track the number of positive and
negative dentries separately.
This patch adds tracking for the total number of negative dentries in
the system LRU lists and reports it in the 5th field in the
/proc/sys/fs/dentry-state file. The number, however, does not include
negative dentries that are in flight but not in the LRU yet as well as
those in the shrinker lists which are on the way out anyway.
The number of positive dentries in the LRU lists can be roughly found by
subtracting the number of negative dentries from the unused count.
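A user-space reader could apply that subtraction roughly as follows. This is a sketch only: the struct and helper names are made up for illustration, and the names given to the fields other than the unused count and the 5th (negative) field are assumptions, not taken from this message.

```c
#include <assert.h>
#include <stdio.h>

/* Six numeric fields of one dentry-state line (field names assumed). */
struct dentry_state {
	long nr_dentry;
	long nr_unused;
	long age_limit;
	long want_pages;
	long nr_negative;	/* 5th field: negative dentries in the LRU lists */
	long dummy;
};

/* Parse one line; returns 0 on success, -1 if fewer than 6 fields parse. */
static int parse_dentry_state(const char *line, struct dentry_state *st)
{
	int n;

	assert(line && st);
	n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
		   &st->nr_dentry, &st->nr_unused, &st->age_limit,
		   &st->want_pages, &st->nr_negative, &st->dummy);
	return n == 6 ? 0 : -1;
}

/* Rough count of positive dentries in the LRU lists. */
static long approx_positive_unused(const struct dentry_state *st)
{
	return st->nr_unused - st->nr_negative;
}
```

The result is approximate for the reason given above: in-flight negative dentries and those on shrinker lists are not included in the 5th field.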
Matthew Wilcox had confirmed that since the introduction of the
dentry_stat structure in 2.1.60, the dummy array was there, probably for
future extension. They were not replacements of pre-existing fields.
So no sane application that reads the value of /proc/sys/fs/dentry-state
will misbehave if the last 2 fields of the sysctl parameter are not
zero. IOW, it will be safe to use one of the dummy array entries for the
negative dentry count.
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The list_lru structure is essentially just a pointer to a table of
per-node LRU lists. Even if CONFIG_MEMCG_KMEM is defined, the list
field is just used for LRU list registration and shrinker_id is set at
initialization. Those fields won't need to be touched that often.
So there is no point in making the list_lru structures sit in their own
cachelines.
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
The shrink_dcache_sb() function moves dentries from the LRU list to a
shrink list and subtracts the dentry count from nr_dentry_unused. This
is incorrect as the nr_dentry_unused count will also be decremented in
shrink_dentry_list() via d_shrink_del().
To fix this double decrement, the decrement in the shrink_dcache_sb()
function is taken out.
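The accounting error can be modelled with a plain counter. The sketch below is a simplified user-space model with hypothetical names, not the dcache code: one helper stands in for the move to the shrink list and another for the per-dentry decrement done on disposal.

```c
#include <assert.h>

/* Simplified model of the nr_dentry_unused counter. */
struct dcache_counters {
	long nr_dentry_unused;
};

/* Models disposal of n dentries from a shrink list: one decrement each. */
static void shrink_list_dispose(struct dcache_counters *c, long n)
{
	assert(n >= 0);
	c->nr_dentry_unused -= n;
}

/*
 * Models moving n dentries from the LRU list to a shrink list.
 * also_decrement != 0 reproduces the old, incorrect extra subtraction.
 */
static void move_to_shrink_list(struct dcache_counters *c, long n,
				int also_decrement)
{
	if (also_decrement)
		c->nr_dentry_unused -= n;
}

/* Runs both steps and returns the resulting counter value. */
static long shrink_sb(long start, long n, int also_decrement)
{
	struct dcache_counters c = { start };

	move_to_shrink_list(&c, n, also_decrement);
	shrink_list_dispose(&c, n);
	return c.nr_dentry_unused;
}
```

Starting from 100 unused dentries and shrinking 30, the old behaviour leaves the counter at 40, while removing the extra decrement leaves the expected 70.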
Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
Cc: [email protected]
Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
After a timeout event caused by, for example, a broadcast storm, when
the MAC and PHY are reset, the BQL TX queue needs to be reset as
well. Otherwise, the device will exhibit severe performance issues
even after the storm has ended.
Co-authored-by: David Gounaris <[email protected]>
Signed-off-by: Mathias Thore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
With the following commit:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
... the hotplug code attempted to detect when SMT was disabled by BIOS,
in which case it reported SMT as permanently disabled. However, that
code broke a virt hotplug scenario, where the guest is booted with only
primary CPU threads, and a sibling is brought online later.
The problem is that there doesn't seem to be a way to reliably
distinguish between the HW "SMT disabled by BIOS" case and the virt
"sibling not yet brought online" case. So the above-mentioned commit
was a bit misguided, as it permanently disabled SMT for both cases,
preventing future virt sibling hotplugs.
Going back and reviewing the original problems which were attempted to
be solved by that commit, when SMT was disabled in BIOS:
1) /sys/devices/system/cpu/smt/control showed "on" instead of
"notsupported"; and
2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
I'd propose that we instead consider #1 above to not actually be a
problem. Because, at least in the virt case, it's possible that SMT
wasn't disabled by BIOS and a sibling thread could be brought online
later. So it makes sense to just always default the smt control to "on"
to allow for that possibility (assuming cpuid indicates that the CPU
supports SMT).
The real problem is #2, which has a simple fix: change vmx_vm_init() to
query the actual current SMT state -- i.e., whether any siblings are
currently online -- instead of looking at the SMT "control" sysfs value.
So fix it by:
a) reverting the original "fix" and its followup fix:
73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
bc2d8d262cba ("cpu/hotplug: Fix SMT supported evaluation")
and
b) changing vmx_vm_init() to query the actual current SMT state --
instead of the sysfs control value -- to determine whether the L1TF
warning is needed. This also requires the 'sched_smt_present'
variable to be exported, instead of 'cpu_smt_control'.
Fixes: 73d5e2b47264 ("cpu/hotplug: detect SMT disabled by BIOS")
Reported-by: Igor Mammedov <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Joe Mario <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
|
|
Johannes Berg says:
====================
various compat ioctl fixes
Back a long time ago, I already fixed a few of these by passing
the size of the struct ifreq to do_sock_ioctl(). However, Robert
found more cases, and now it won't be as simple because we'd have
to pass that down all the way to e.g. bond_do_ioctl() which isn't
really feasible.
Therefore, restore the old code.
While looking at why SIOCGIFNAME was broken, I realized that Al
had removed that case - which had been handled in an explicit
separate function - as well, and looking through his work at the
time I saw that bond ioctls were also affected by the erroneous
removal.
I've restored SIOCGIFNAME and bond ioctls by going through the
(now renamed) dev_ifsioc() instead of reintroducing their own
helper functions, which I hope is correct but have only tested
with SIOCGIFNAME.
====================
Acked-by: Al Viro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Same story as before, these use struct ifreq and thus need
to be read with the shorter version to not cause faults.
Cc: [email protected]
Fixes: f92d4fc95341 ("kill bond_ioctl()")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As reported by Robert O'Callahan in
https://bugzilla.kernel.org/show_bug.cgi?id=202273
reverting the previous changes in this area broke
the SIOCGIFNAME ioctl in compat again (I'd previously
fixed it after his previous report of breakage in
https://bugzilla.kernel.org/show_bug.cgi?id=199469).
This is obviously because I fixed SIOCGIFNAME more or
less by accident.
Fix it explicitly now by making it pass through the
restored compat translation code.
Cc: [email protected]
Fixes: 4cf808e7ac32 ("kill dev_ifname32()")
Reported-by: Robert O'Callahan <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit bf4405737f9f ("kill dev_ifsioc()").
This wasn't really unused as implied by the original commit,
it still handles the copy to/from user differently, and the
commit thus caused issues such as
https://bugzilla.kernel.org/show_bug.cgi?id=199469
and
https://bugzilla.kernel.org/show_bug.cgi?id=202273
However, deviating from a strict revert, rename dev_ifsioc()
to compat_ifreq_ioctl() to be clearer as to its purpose and
add a comment.
Cc: [email protected]
Fixes: bf4405737f9f ("kill dev_ifsioc()")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 1cebf8f143c2 ("socket: fix struct ifreq
size in compat ioctl"), it's a bugfix for another commit that
I'll revert next.
This is not a 'perfect' revert, I'm keeping some coding style
intact rather than revert to the state with indentation errors.
Cc: [email protected]
Fixes: 1cebf8f143c2 ("socket: fix struct ifreq size in compat ioctl")
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit d4104c5e783f5d053b97268fb92001d785de7dd5.
Turns out it still needs some more work, I merged it too soon :(
Reported-by: Gao Xiang <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
For example, the importer uses the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).
v2: reservation_object is already locked by amdgpu_bo_reserve()
v3: Replace looping with get_fences_rcu and special case the promotion
of a single shared fence directly to an exclusive fence, bypassing the
fence array.
v4: Drop the fence array ref after assigning to reservation_object
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107341
Testcase: igt/amd_prime/amd-to-i915
References: 8e94a46c1770 ("drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "Christian König" <[email protected]>
Reviewed-by: "Christian König" <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"A few more fixes this time:
- Two patches to fix the error path of the map_sg implementation of
the AMD IOMMU driver.
- Also a missing IOTLB flush is fixed in the AMD IOMMU driver.
- Memory leak fix for the Intel IOMMU driver.
- Fix a regression in the Mediatek IOMMU driver which caused device
initialization to fail (seen as broken HDMI output)"
* tag 'iommu-fixes-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix IOMMU page flush when detach device from a domain
iommu/mediatek: Use correct fwspec in mtk_iommu_add_device()
iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
iommu/amd: Unmap all mapped pages in error path of map_sg
iommu/amd: Call free_iova_fast with pfn in map_sg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here is a bunch of GPIO fixes for the v5.0 series. I was helped out by
Bartosz in collecting these fixes, for which I am very grateful, the
biggest achievement in GPIO right now is work distribution.
There is one serious core fix (timestamping) and a bunch of driver
fixes:
- Fix timestamps on nested IRQs
- Handle IRQs properly in multiple instances of PCF857x
- Use the right data register and IRQ type setting in the Spreadtrum
GPIO driver
- Let the value argument work properly when setting direction in the
Altera GPIO driver
- Mask interrupts properly in the vf610 driver"
* tag 'gpio-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: vf610: Mask all GPIO interrupts
gpio: altera-a10sr: Set proper output level for direction_output
gpio: sprd: Fix incorrect irq type setting for the async EIC
gpio: sprd: Fix the incorrect data register
gpiolib: fix line event timestamps for nested irqs
gpio: pcf857x: Fix interrupts on multiple instances
|
|
The subvol_name is allocated in btrfs_parse_subvol_options and is
consumed and freed in mount_subvol. Add a free to the error paths that
don't call mount_subvol so that it is guaranteed that subvol_name is
freed when an error happens.
Fixes: 312c89fbca06 ("btrfs: cleanup btrfs_mount() using btrfs_mount_root()")
Cc: [email protected] # v4.19+
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The fstests generic/475 stresses transaction aborts and can reveal
space accounting or use-after-free bugs regarding block groups.
In this case the pending block groups remain linked to the
structures after the transaction commit aborts in the middle.
The corrupted slabs lead to failures in following tests, e.g. generic/476:
[ 8172.752887] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
[ 8172.755799] #PF error: [normal kernel read fault]
[ 8172.757571] PGD 661ae067 P4D 661ae067 PUD 3db8e067 PMD 0
[ 8172.759000] Oops: 0000 [#1] PREEMPT SMP
[ 8172.760209] CPU: 0 PID: 39 Comm: kswapd0 Tainted: G W 5.0.0-rc2-default #408
[ 8172.762495] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8172.765772] RIP: 0010:shrink_page_list+0x2f9/0xe90
[ 8172.770453] RSP: 0018:ffff967f00663b18 EFLAGS: 00010287
[ 8172.771184] RAX: 0000000000000000 RBX: ffff967f00663c20 RCX: 0000000000000000
[ 8172.772850] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8c0620ab20e0
[ 8172.774629] RBP: ffff967f00663dd8 R08: 0000000000000000 R09: 0000000000000000
[ 8172.776094] R10: ffff8c0620ab22f8 R11: ffff8c063f772688 R12: ffff967f00663b78
[ 8172.777533] R13: ffff8c063f625600 R14: ffff8c063f625608 R15: dead000000000200
[ 8172.778886] FS: 0000000000000000(0000) GS:ffff8c063d400000(0000) knlGS:0000000000000000
[ 8172.780545] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8172.781787] CR2: 0000000000000058 CR3: 000000004e962000 CR4: 00000000000006f0
[ 8172.783547] Call Trace:
[ 8172.784112] shrink_inactive_list+0x194/0x410
[ 8172.784747] shrink_node_memcg.constprop.85+0x3a5/0x6a0
[ 8172.785472] shrink_node+0x62/0x1e0
[ 8172.786011] balance_pgdat+0x216/0x460
[ 8172.786577] kswapd+0xe3/0x4a0
[ 8172.787085] ? finish_wait+0x80/0x80
[ 8172.787795] ? balance_pgdat+0x460/0x460
[ 8172.788799] kthread+0x116/0x130
[ 8172.789640] ? kthread_create_on_node+0x60/0x60
[ 8172.790323] ret_from_fork+0x24/0x30
[ 8172.794253] CR2: 0000000000000058
or accounting errors at umount time:
[ 8159.537251] WARNING: CPU: 2 PID: 19031 at fs/btrfs/extent-tree.c:5987 btrfs_free_block_groups+0x3d5/0x410 [btrfs]
[ 8159.543325] CPU: 2 PID: 19031 Comm: umount Tainted: G W 5.0.0-rc2-default #408
[ 8159.545472] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[ 8159.548155] RIP: 0010:btrfs_free_block_groups+0x3d5/0x410 [btrfs]
[ 8159.554030] RSP: 0018:ffff967f079cbde8 EFLAGS: 00010206
[ 8159.555144] RAX: 0000000001000000 RBX: ffff8c06366cf800 RCX: 0000000000000000
[ 8159.556730] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8c06255ad800
[ 8159.558279] RBP: ffff8c0637ac0000 R08: 0000000000000001 R09: 0000000000000000
[ 8159.559797] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8c0637ac0108
[ 8159.561296] R13: ffff8c0637ac0158 R14: 0000000000000000 R15: dead000000000100
[ 8159.562852] FS: 00007f7f693b9fc0(0000) GS:ffff8c063d800000(0000) knlGS:0000000000000000
[ 8159.564839] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8159.566160] CR2: 00007f7f68fab7b0 CR3: 000000000aec7000 CR4: 00000000000006e0
[ 8159.567898] Call Trace:
[ 8159.568597] close_ctree+0x17f/0x350 [btrfs]
[ 8159.569628] generic_shutdown_super+0x64/0x100
[ 8159.570808] kill_anon_super+0x14/0x30
[ 8159.571857] btrfs_kill_super+0x12/0xa0 [btrfs]
[ 8159.573063] deactivate_locked_super+0x29/0x60
[ 8159.574234] cleanup_mnt+0x3b/0x70
[ 8159.575176] task_work_run+0x98/0xc0
[ 8159.576177] exit_to_usermode_loop+0x83/0x90
[ 8159.577315] do_syscall_64+0x15b/0x180
[ 8159.578339] entry_SYSCALL_64_after_hwframe+0x49/0xbe
This fix is based on two of Josef's patches that used side effects of
btrfs_create_pending_block_groups; this fix introduces a helper that
does what we need.
CC: [email protected] # 4.4+
CC: Josef Bacik <[email protected]>
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
alloc_fs_devices() can return ERR_PTR(-ENOMEM), so dereferencing its
result before the check for IS_ERR() is a bad idea.
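The check-before-use ordering looks roughly like this in a user-space model. The ERR_PTR(), PTR_ERR() and IS_ERR() helpers below are simplified re-implementations written for this sketch, and struct fs_devices, alloc_devices() and setup_devices() are hypothetical stand-ins for the btrfs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified user-space versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the allocated structure. */
struct fs_devices {
	int fsid_change;
};

/* Hypothetical allocator that reports failure as ERR_PTR(-ENOMEM). */
static struct fs_devices *alloc_devices(int fail)
{
	struct fs_devices *d;

	if (fail)
		return ERR_PTR(-ENOMEM);
	d = calloc(1, sizeof(*d));
	return d ? d : ERR_PTR(-ENOMEM);
}

/* Check IS_ERR() first; only then touch the structure's fields. */
static long setup_devices(int fail, struct fs_devices **out)
{
	struct fs_devices *d = alloc_devices(fail);

	assert(out);
	if (IS_ERR(d))
		return PTR_ERR(d);
	d->fsid_change = 1;
	*out = d;
	return 0;
}
```

Writing to d->fsid_change before the IS_ERR() check would dereference an address near the top of the address space whenever the allocation fails.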
Fixes: d1a63002829a4 ("btrfs: add members to fs_devices to track fsid changes")
Reviewed-by: Nikolay Borisov <[email protected]>
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Al Viro <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Commit 765b6a98c1de3 ("iommu/vt-d: Enumerate the scalable
mode capability") enables VT-d scalable mode if hardware
advertises the capability. As we will bring up different
features and use cases to upstream in different patch
series, it will leave some intermediate kernel versions
which support partial features. Hence, end users might run
into problems when they use such kernels on bare metal
or in virtualization environments.
This leaves scalable mode default off and end users could
turn it on with "intel-iommu=sm_on" only when they have
clear ideas about which scalable features are supported
in the kernel.
Cc: Liu Yi L <[email protected]>
Cc: Jacob Pan <[email protected]>
Suggested-by: Ashok Raj <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Joerg Roedel <[email protected]>
|
|
Florian reported an I/O hang during fsync(). It appears to be
triggered by the following race condition.
data + post flush a flush
blk_flush_complete_seq
case REQ_FSEQ_DATA
blk_flush_queue_rq
issued to driver blk_mq_dispatch_rq_list
try to issue a flush req
failed due to NON-NCQ command
.queue_rq return BLK_STS_DEV_RESOURCE
request completion
req->end_io // doesn't check RESTART
mq_flush_data_end_io
case REQ_FSEQ_POSTFLUSH
blk_kick_flush
do nothing because previous flush
has not been completed
blk_mq_run_hw_queue
insert rq to hctx->dispatch
due to RESTART is still set, do nothing
To fix this, replace the blk_mq_run_hw_queue in mq_flush_data_end_io
with blk_mq_sched_restart to check and clear the RESTART flag.
Fixes: bd166ef1 (blk-mq-sched: add framework for MQ capable IO schedulers)
Reported-by: Florian Stecker <[email protected]>
Tested-by: Florian Stecker <[email protected]>
Signed-off-by: Jianchao Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
syzbot is hitting flush_work() warning caused by commit 4d43d395fed12463
("workqueue: Try to catch flush_work() without INIT_WORK().") [1].
Although that commit did not expect INIT_WORK(NULL) case, calling
flush_work() without setting a valid callback should be avoided anyway.
Fix this problem by setting a no-op callback instead of NULL.
[1] https://syzkaller.appspot.com/bug?id=e390366bc48bc82a7c668326e0663be3b91cbd29
Signed-off-by: Tetsuo Handa <[email protected]>
Reported-and-tested-by: syzbot <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Handling short packets (length < max packet size) in the Inventra DMA
engine in the MUSB driver causes the MUSB DMA controller to hang. One
symptom is that when streaming video out of a UVC gadget, only the
first video frame is transferred.
For short packets (mode-0 or mode-1 DMA), MUSB_TXCSR_TXPKTRDY must be
set manually by the driver. This was previously done in musb_g_tx
(musb_gadget.c), but incorrectly (all csr flags were cleared, and only
MUSB_TXCSR_MODE and MUSB_TXCSR_TXPKTRDY were set). Fixing that problem
allows some requests to be transferred correctly, but multiple requests
were often put together in one USB packet, and caused problems if the
packet size was not a multiple of 4. Instead, set MUSB_TXCSR_TXPKTRDY
in dma_controller_irq (musbhsdma.c), just like host mode transfers.
This topic was originally tackled by Nicolas Boichat [0] [1] and is
discussed further at [2] as part of his GSoC project [3].
[0] https://groups.google.com/forum/?hl=en#!topic/beagleboard-gsoc/k8Azwfp75CU
[1] https://gitorious.org/beagleboard-usbsniffer/beagleboard-usbsniffer-kernel/commit/b0be3b6cc195ba732189b04f1d43ec843c3e54c9?p=beagleboard-usbsniffer:beagleboard-usbsniffer-kernel.git;a=patch;h=b0be3b6cc195ba732189b04f1d43ec843c3e54c9
[2] http://beagleboard-usbsniffer.blogspot.com/2010/07/musb-isochronous-transfers-fixed.html
[3] http://elinux.org/BeagleBoard/GSoC/USBSniffer
Fixes: 550a7375fe72 ("USB: Add MUSB and TUSB support")
Signed-off-by: Paul Elder <[email protected]>
Signed-off-by: Bin Liu <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
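
Illustration (not part of the patch): a small sketch of the difference between overwriting a control/status register value and a read-modify-write that preserves the flags already set. The TXCSR_* bit values and both helper names are assumptions made for the example; consult musb_regs.h for the real definitions. This only illustrates the "all csr flags were cleared" problem described above, not where the driver sets the bit.

```c
#include <stdint.h>

/* Illustrative bit values only; assumptions for this sketch. */
#define TXCSR_TXPKTRDY 0x0001u
#define TXCSR_DMAENAB  0x1000u
#define TXCSR_MODE     0x2000u

/* Problematic pattern: every flag that was already set is discarded. */
static uint16_t txcsr_overwrite(uint16_t csr)
{
	(void)csr;
	return TXCSR_MODE | TXCSR_TXPKTRDY;
}

/* Read-modify-write: existing flags are kept and TXPKTRDY is added. */
static uint16_t txcsr_set_pktrdy(uint16_t csr)
{
	return csr | TXCSR_TXPKTRDY;
}
```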
|
|
VOP is broken in mainline since commit 1ce9e6055fa0a9043 ("virtio_ring:
introduce packed ring support"); attempting to use the virtqueues leads
to various kernel crashes. I'm testing it with my not-yet-merged
loopback patches, but even the in-tree MIC hardware cannot work.
The problem is not in the referenced commit per se, but is due to the
following hack in vop_find_vq() which depends on the layout of private
structures in other source files, which that commit happened to change:
/*
* To reassign the used ring here we are directly accessing
* struct vring_virtqueue which is a private data structure
* in virtio_ring.c. At the minimum, a BUILD_BUG_ON() in
* vring_new_virtqueue() would ensure that
* (&vq->vring == (struct vring *) (&vq->vq + 1));
*/
vr = (struct vring *)(vq + 1);
vr->used = used;
Fix vop by using __vring_new_virtqueue() to create the needed vring
layout from the start, instead of attempting to patch in the used ring
later. __vring_new_virtqueue() was added way back in commit
2a2d1382fe9dcc ("virtio: Add improved queue allocation API") in order to
address mic's use case, according to the commit message.
Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support")
Signed-off-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
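
Illustration (not part of the patch): a sketch of why the "(vq + 1)" cast is fragile. The structs below are hypothetical simplifications, not the real virtio_ring.c definitions. The cast is only valid while the ring member sits immediately after the public part, and inserting any field in between silently breaks it. A compile-time assertion, in the spirit of the BUILD_BUG_ON() mentioned in the quoted comment, would catch that.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified types; not the kernel's definitions. */
struct vq_public { void *priv; };
struct ring { void *used; };

/* Layout the hack relies on: ring directly follows the public part. */
struct priv_old {
	struct vq_public vq;
	struct ring ring;
};

/* Layout after a field is inserted ahead of the ring. */
struct priv_new {
	struct vq_public vq;
	bool packed;
	struct ring ring;
};

/* True when (struct ring *)(vq + 1) would really point at the ring. */
#define RING_FOLLOWS_VQ(type) \
	(offsetof(type, ring) == sizeof(struct vq_public))

/* Compile-time guard for the layout assumption. */
_Static_assert(RING_FOLLOWS_VQ(struct priv_old),
	       "ring must directly follow the public virtqueue part");
```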
|
|
As Al pointed out, "
... and while we are at it, what happens to
unsigned int nameoff = le16_to_cpu(de[mid].nameoff);
unsigned int matched = min(startprfx, endprfx);
struct qstr dname = QSTR_INIT(data + nameoff,
unlikely(mid >= ndirents - 1) ?
maxsize - nameoff :
le16_to_cpu(de[mid + 1].nameoff) - nameoff);
/* string comparison without already matched prefix */
int ret = dirnamecmp(name, &dname, &matched);
if le16_to_cpu(de[...].nameoff) is not monotonically increasing? I.e.
what's to prevent e.g. (unsigned)-1 ending up in dname.len?
Corrupted fs image shouldn't oops the kernel.. "
Revisit the related lookup flow to address the issue.
Fixes: d72d1ce60174 ("staging: erofs: add namei functions")
Cc: <[email protected]> # 4.19+
Suggested-by: Al Viro <[email protected]>
Signed-off-by: Gao Xiang <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
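
Illustration (not part of the patch): a sketch of the kind of validation that keeps a corrupted image from producing a huge name length through unsigned underflow. The helper dirent_name_len() is hypothetical, takes offsets already converted to host byte order, and is not the lookup code this commit reworks.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the name length of entry 'mid' from the nameoff table.
 * Returns 0 and stores the length on success, or -1 if the offsets
 * are not strictly increasing or run past 'maxsize'.
 */
static int dirent_name_len(const uint16_t *nameoff, unsigned int ndirents,
			   unsigned int mid, unsigned int maxsize,
			   unsigned int *len)
{
	unsigned int start, end;

	if (mid >= ndirents)
		return -1;

	start = nameoff[mid];
	/* The last entry's name extends to the end of the block. */
	end = (mid == ndirents - 1) ? maxsize : nameoff[mid + 1];

	/* Reject before subtracting so 'end - start' cannot wrap. */
	if (start >= end || end > maxsize)
		return -1;

	*len = end - start;
	return 0;
}
```

Checking the ordering before the subtraction is the key point: once "end - start" has wrapped, the result looks like a valid (very large) length.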
|
|
Commit 2b3e88ea6528 ("net: phy: improve phy state checking")
added checks for phylib usage, and this triggers with OCTEON ethernet
and results in broken networking.
Fix by replacing phy_start_aneg() with phy_start().
Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking")
Signed-off-by: Aaro Koskinen <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
There is a small window during the disconnection flow
when a read cb is moved between lists and may not be freed.
Explicitly removing read cbs that are being moved during flush fixes
this memory leak.
Signed-off-by: Alexander Usyskin <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The device was moved from misc device to character devices
to support multiple mei devices.
Cc: <[email protected]> #v4.9+
Signed-off-by: Tomas Winkler <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Add icelake mei device id.
Cc: <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
We currently adhere to the reserved devices limit when creating new
binderfs devices in binderfs instances not located in the initial ipc
namespace. But it is still possible to rob the host instances of their 4
reserved devices by creating the maximum allowed number of devices in a
single binderfs instance located in a non-initial ipc namespace and then
mounting 4 separate binderfs instances in non-initial ipc namespaces. That
happens because the limit is currently not respected for the creation of
the initial binder-control device node. Block this nonsense by performing
the same check in binderfs_binder_ctl_create() that we perform in
binderfs_binder_device_create().
Fixes: 36bdf3cae09d ("binderfs: reserve devices for initial mount")
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
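
Illustration (not part of the patch): a sketch of the idea that both creation paths must go through the same limit check. MAX_MINOR is a small made-up value for the demo, and struct pool, alloc_minor(), create_device() and create_control() are hypothetical names. Only the figure of 4 reserved devices comes from the message above.

```c
/* Hypothetical demo values; only the reserve of 4 is from the text. */
#define MAX_MINOR 16u
#define RESERVED  4u

struct pool {
	unsigned int used;
};

/* Shared check: non-initial namespaces may not eat into the reserve. */
static int alloc_minor(struct pool *p, int initial_ns)
{
	unsigned int limit = initial_ns ? MAX_MINOR : MAX_MINOR - RESERVED;

	if (p->used >= limit)
		return -1;
	p->used++;
	return 0;
}

/* Both creation paths use the same helper, so neither bypasses it. */
static int create_device(struct pool *p, int initial_ns)
{
	return alloc_minor(p, initial_ns);
}

static int create_control(struct pool *p, int initial_ns)
{
	return alloc_minor(p, initial_ns);
}
```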
|
|
Several users have tried to only rely on binderfs to provide binder devices
and set CONFIG_ANDROID_BINDER_DEVICES="" empty. This is a great use-case of
binderfs and one that was always intended to work. However, this is
currently not possible since setting CONFIG_ANDROID_BINDER_DEVICES="" empty
will simply panic the kernel:
kobject: (00000000028c2f79): attempted to be registered with empty name!
WARNING: CPU: 7 PID: 1703 at lib/kobject.c:228 kobject_add_internal+0x288/0x2b0
Modules linked in: binder_linux(+) bridge stp llc ipmi_ssif gpio_ich dcdbas coretemp kvm_intel kvm irqbypass serio_raw input_leds lpc_ich i5100_edac mac_hid ipmi_si ipmi_devintf ipmi_msghandler sch_fq_codel ib_i
CPU: 7 PID: 1703 Comm: modprobe Not tainted 5.0.0-rc2-brauner-binderfs #263
Hardware name: Dell DCS XS24-SC2 /XS24-SC2 , BIOS S59_3C20 04/07/2011
RIP: 0010:kobject_add_internal+0x288/0x2b0
Code: 12 95 48 c7 c7 78 63 3b 95 e8 77 35 71 ff e9 91 fe ff ff 0f 0b eb a7 0f 0b eb 9a 48 89 de 48 c7 c7 00 63 3b 95 e8 f8 95 6a ff <0f> 0b 41 bc ea ff ff ff e9 6d fe ff ff 41 bc fe ff ff ff e9 62 fe
RSP: 0018:ffff973f84237a30 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff8b53e2472010 RCX: 0000000000000006
RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff8b53edbd63a0
RBP: ffff973f84237a60 R08: 0000000000000342 R09: 0000000000000004
R10: ffff973f84237af0 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8b53e9f1a1e0 R14: 00000000e9f1a1e0 R15: 0000000000a00037
FS: 00007fbac36f7540(0000) GS:ffff8b53edbc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbac364cfa7 CR3: 00000004a6d48000 CR4: 00000000000406e0
Call Trace:
kobject_add+0x71/0xd0
? _cond_resched+0x19/0x40
? mutex_lock+0x12/0x40
device_add+0x12e/0x6b0
device_create_groups_vargs+0xe4/0xf0
device_create_with_groups+0x3f/0x60
? _cond_resched+0x19/0x40
misc_register+0x140/0x180
binder_init+0x1ed/0x2d4 [binder_linux]
? trace_event_define_fields_binder_transaction_fd_send+0x8e/0x8e [binder_linux]
do_one_initcall+0x4a/0x1c9
? _cond_resched+0x19/0x40
? kmem_cache_alloc_trace+0x151/0x1c0
do_init_module+0x5f/0x216
load_module+0x223d/0x2b20
__do_sys_finit_module+0xfc/0x120
? __do_sys_finit_module+0xfc/0x120
__x64_sys_finit_module+0x1a/0x20
do_syscall_64+0x5a/0x120
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fbac3202839
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd1494a908 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055b629ebec60 RCX: 00007fbac3202839
RDX: 0000000000000000 RSI: 000055b629c20d2e RDI: 0000000000000003
RBP: 000055b629c20d2e R08: 0000000000000000 R09: 000055b629ec2310
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
R13: 000055b629ebed70 R14: 0000000000040000 R15: 000055b629ebec60
So check for the empty string since strsep() will otherwise return the
empty string which will cause kobject_add_internal() to panic when trying
to add a kobject with an empty name.
Fixes: ac4812c5ffbb ("binder: Support multiple /dev instances")
Cc: Martijn Coenen <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Acked-by: Todd Kjos <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
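
Illustration (not part of the patch): a user-space sketch of why an empty parameter string still yields one (empty) token. next_token() is a hypothetical single-delimiter stand-in that mimics strsep() semantics, and count_devices() is a made-up helper that skips empty names. The actual patch may perform the check differently.

```c
#include <string.h>

/*
 * Stand-in for strsep() with one delimiter: returns the next token
 * (possibly empty) and advances *s, or returns NULL once *s is NULL.
 */
static char *next_token(char **s, char delim)
{
	char *tok = *s;
	char *end;

	if (!tok)
		return NULL;

	end = strchr(tok, delim);
	if (end) {
		*end = '\0';
		*s = end + 1;
	} else {
		*s = NULL;
	}
	return tok;
}

/* Counts the names that would be registered, skipping empty ones. */
static int count_devices(char *names)
{
	char *tok;
	int n = 0;

	while ((tok = next_token(&names, ',')) != NULL) {
		if (*tok == '\0')
			continue;	/* empty name: nothing to register */
		n++;
	}
	return n;
}
```

Note that an empty input does not produce zero tokens: the first call returns "", which is exactly the name that must not be registered.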
|
|
This adds the promised selftest for binderfs. It will verify the following
things:
- binderfs mounting works
- binder device allocation works
- performing a binder ioctl() request through a binderfs device works
- binder device removal works
- binder-control removal fails
- binderfs unmounting works
The tests are performed both privileged and unprivileged. The latter
verifies that binderfs behaves correctly in user namespaces.
Cc: Todd Kjos <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Acked-by: Shuah Khan <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Lots of callers of debugfs_lookup() were just checking NULL to see if
the file/directory was found or not. By changing this in ff9fb72bc077
("debugfs: return error values, not NULL") we caused some subsystems to
easily crash.
Fixes: ff9fb72bc077 ("debugfs: return error values, not NULL")
Reported-by: [email protected]
Reported-by: Tetsuo Handa <[email protected]>
Cc: Omar Sandoval <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
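
Illustration (not part of the patch): a user-space sketch of why callers that only test for NULL misbehave once a lookup starts returning error-encoded pointers. err_ptr(), is_err() and is_err_or_null() are hypothetical lower-case imitations of the kernel's ERR_PTR helpers, and found_by_null_check() models a caller written against the old convention.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the top MAX_ERRNO addresses. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static int is_err_or_null(const void *p)
{
	return !p || is_err(p);
}

/* Old-style caller: treats anything non-NULL as "found". */
static int found_by_null_check(const void *p)
{
	return p != NULL;
}
```

A caller using found_by_null_check() would go on to dereference an error pointer, which is the kind of crash described above.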
|
|
The send_xchar() and tiocmset() tty operations are optional. Add the
missing sanity checks to prevent user-space triggerable NULL-pointer
dereferences.
Fixes: 6b9ad1c742bf ("staging: speakup: add send_xchar, tiocmset and input functionality for tty")
Cc: stable <[email protected]> # 4.13
Cc: Okash Khawaja <[email protected]>
Cc: Samuel Thibault <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Reviewed-by: Samuel Thibault <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
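
Illustration (not part of the patch): a sketch of guarding optional operations before calling them. struct ops_sketch and the helpers below are hypothetical and much simpler than the real tty operations; -1 stands in for whatever error the real code returns.

```c
#include <stddef.h>

/* Hypothetical ops table in which both callbacks are optional. */
struct ops_sketch {
	int (*send_xchar)(char ch);
	int (*tiocmset)(unsigned int set, unsigned int clear);
};

/* Example callback so the "present" case can be exercised. */
static int echo_xchar(char ch)
{
	return ch;
}

/* Call send_xchar only if the driver provides it. */
static int safe_send_xchar(const struct ops_sketch *ops, char ch)
{
	if (!ops->send_xchar)
		return -1;
	return ops->send_xchar(ch);
}

/* Call tiocmset only if the driver provides it. */
static int safe_tiocmset(const struct ops_sketch *ops,
			 unsigned int set, unsigned int clear)
{
	if (!ops->tiocmset)
		return -1;
	return ops->tiocmset(set, clear);
}
```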
|
|
... so that they can get CCed on platform patches.
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Andy Shevchenko <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
Calling platform-specific code unconditionally blows up when running
an ARCH_MULTIPLATFORM kernel on a different platform. Don't do it.
Reported-by: Paolo Pisati <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: [email protected] # v4.8+
Fixes: a30eceb7a59d ("ARM: tango: add Suspend-to-RAM support")
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into arm/fixes
Third Round of Renesas ARM Based SoC Fixes for v5.0
* Convert to new LVDS DT bindings fixing a regression introduced in v4.17
* tag 'renesas-fixes3-for-v5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: dts: r8a7743: Convert to new LVDS DT bindings
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into arm/fixes
Allwinner Fixes for 5.0
A couple of device tree fixes for the 5.0 cycle:
- Add missing clock-output-names for the osc24M clock on sun6i/A31
The Linux clock driver uses the device node as the clock name if
the property is missing. The node name was changed in 5.0-rc1,
breaking a subtle dependency in the sunxi-ng clock driver and
rendering Linux unable to boot completely.
- Add alias for Ethernet controller on Beelink X2
This allows the bootloader to assign a deterministically generated
MAC address to it.
- Add property to enable USB VBUS regulator on OrangePi Win
The board had defined the constraints for the regulator, but was
missing the property to actually enable it.
* tag 'sunxi-fixes-for-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: a64: Fix USB OTG regulator
ARM: dts: sun8i: h3: Add ethernet0 alias to Beelink X2
ARM: dts: sun6i: Add clock-output-names to osc24M clock
arm64: dts: allwinner: a64: Fix the video engine compatible
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic into arm/fixes
Amlogic fixes for v5.0-rc, round 2
- several fixups for the GPIO cd-inverted change
- IRQ trigger fixes for MAC IRQ
* tag 'amlogic-fixes-2.1' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-amlogic:
arm64: dts: meson: Fix mmc cd-gpios polarity
ARM: dts: meson8m2: mxiii-plus: mark the SD card detection GPIO active-low
ARM: dts: meson8b: ec100: mark the SD card detection GPIO active-low
ARM: dts: meson8b: odroidc1: mark the SD card detection GPIO active-low
arm: dts: meson: Fix IRQ trigger type for macirq
Signed-off-by: Arnd Bergmann <[email protected]>
|