Age | Commit message (Collapse) | Author | Files | Lines |
|
The function is only used in kvm.ko module.
Reviewed-by: Mark Kanda <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
When L1 guest uses 5-level paging, it fails vm-entry to L2 due to
invalid host-state. It needs to add CR4_LA57 bit to nested CR4_FIXED1
MSR.
Signed-off-by: Chenyi Qiang <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
Reviewed-by: Liran Alon <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Reviewed-by: Mark Kanda <[email protected]>
Signed-off-by: Liran Alon <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Not zeroing the bitmap used for identifying the destination vCPUs for an
IOAPIC scan request in fixed delivery mode could lead to waking up unwanted
vCPUs. This patch zeroes the vCPU bitmap before passing it to
kvm_bitmap_or_dest_vcpus(), which is responsible for setting the bitmap
with the bits corresponding to the destination vCPUs.
Fixes: 7ee30bc132c6("KVM: x86: deliver KVM IOAPIC scan request to target vCPUs")
Signed-off-by: Nitesh Narayan Lal <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
If an SMT capable system is not IPL'ed from the first CPU the setup of
the physical to logical CPU mapping is broken: the IPL core gets CPU
number 0, but then the next core gets CPU number 1. Correct would be
that all SMT threads of CPU 0 get the subsequent logical CPU numbers.
This is important since a lot of code (like e.g. the CPU topology
code) assumes that CPU maps are setup like this. If the mapping is
broken the system will not IPL due to broken topology masks:
[ 1.716341] BUG: arch topology broken
[ 1.716342] the SMT domain not a subset of the MC domain
[ 1.716343] BUG: arch topology broken
[ 1.716344] the MC domain not a subset of the BOOK domain
This scenario can usually not happen since LPARs are always IPL'ed
from CPU 0 and also re-IPL is intiated from CPU 0. However older
kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL
from an old kernel into a new kernel is initiated this may lead to
crash.
Fix this by setting up the physical to logical CPU mapping correctly.
Signed-off-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
vdso_per_cpu_data lowcore value is only needed for fully functional
exception handlers, which are activated in setup_lowcore_dat_off. The
same function does init vdso_per_cpu_data via vdso_alloc_boot_cpu.
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
Currently if the kernel is built with CONFIG_TRACE_IRQFLAGS and KASAN
and used as crash kernel it crashes itself due to
trace_hardirqs_off/trace_hardirqs_on being called with DAT off. This
happens because trace_hardirqs_off/trace_hardirqs_on are instrumented and
kasan code tries to perform access to shadow memory to validate memory
accesses. Kasan shadow memory is populated with vmemmap, so all accesses
require DAT on.
memcpy_real could be called with DAT on or off (with kasan enabled DAT
is set even before early code is executed).
Make sure that trace_hardirqs_off/trace_hardirqs_on are called with DAT
on and only actual __memcpy_real is called with DAT off.
Also annotate __memcpy_real and _memcpy_real with __no_sanitize_address
to avoid further problems due to switching DAT off.
Reviewed-by: Philipp Rudo <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
s390_crypto_shash_parmsize() return type is int, it
should not be stored in a unsigned variable, which
compared with zero.
Reported-by: Hulk Robot <[email protected]>
Fixes: 3c2eb6b76cab ("s390/crypto: Support for SHA3 via CPACF (MSA6)")
Signed-off-by: YueHaibing <[email protected]>
Signed-off-by: Joerg Schmidbauer <[email protected]>
Signed-off-by: Vasily Gorbik <[email protected]>
|
|
pidfd_poll() is defined as returning 'unsigned int' but the
.poll method is declared as returning '__poll_t', a bitwise type.
Fix this by using the proper return type and using the EPOLL
constants instead of the POLL ones, as required for __poll_t.
Fixes: b53b0b9d9a61 ("pidfd: add polling support")
Cc: Joel Fernandes (Google) <[email protected]>
Cc: [email protected] # 5.3
Signed-off-by: Luc Van Oostenryck <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Switching cpufreq drivers (or switching operation modes of the
intel_pstate driver from "active" to "passive" and vice versa)
does not work on some x86 systems with ACPI after commit
3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS"), because
the ACPI _PPC and thermal code uses the same frequency QoS request
object for a given CPU every time a cpufreq driver is registered
and freq_qos_remove_request() does not invalidate the request after
removing it from its QoS list, so freq_qos_add_request() complains
and fails when that request is passed to it again.
Fix the issue by modifying freq_qos_remove_request() to clear the qos
and type fields of the frequency request pointed to by its argument
after removing it from its QoS list so as to invalidate it.
Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS")
Reported-and-tested-by: Doug Smythies <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Acked-by: Viresh Kumar <[email protected]>
|
|
Instead of multiplying by page order, virtio balloon divided by page
order. The result is that it can return 0 if there are a bit less
than MAX_ORDER - 1 pages in use, and then shrinker scan won't be called.
Cc: [email protected]
Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
|
|
virtio_balloon_shrinker_scan should return number of system pages freed,
but because it's calling functions that deal with balloon pages, it gets
confused and sometimes returns the number of balloon pages.
It does not matter practically as the exact number isn't
used, but it seems better to be consistent in case someone
starts using this API.
Further, if we ever tried to iteratively leak pages as
virtio_balloon_shrinker_scan tries to do, we'd run into issues - this is
because freed_pages was accumulating total freed pages, but was also
subtracted on each iteration from pages_to_free, which can result in
either leaking less memory than we were supposed to free, or more if
pages_to_free underruns.
On a system with 4K pages we are lucky that we are never asked to leak
more than 128 pages while we can leak up to 256 at a time,
but it looks like a real issue for systems with page size != 4K.
Fixes: 71994620bb25 ("virtio_balloon: replace oom notifier with shrinker")
Reported-by: Khazhismel Kumykov <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Commit 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization
to constant") accidentally changed a check from -ENOTSUPP to -ENOSYS,
causing failures if reset controller support is not enabled. E.g. on
r7s72100/rskrza1:
sh-eth e8203000.ethernet: MDIO init failed: -524
sh-eth: probe of e8203000.ethernet failed with error -524
Seen on r8a7740/armadillo, r7s72100/rskrza1, and r7s9210/rza2mevb.
Fixes: 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization to constant")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 075e238d12c21c8bde700d21fb48be7a3aa80194.
Going to go with Geert's fix instead, which also has a
correct Fixes tag.
Signed-off-by: David S. Miller <[email protected]>
|
|
According to hardware user manual, bits5~7 in register
HCLGE_MISC_VECTOR_INT_STS means reset interrupts status,
but HCLGE_RESET_INT_M is defined as bits0~2 now. So it
will make hclge_reset_err_handle() read the wrong reset
interrupt status.
This patch fixes this wrong bit mask.
Fixes: 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails")
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
pm_runtime_put_autosuspend in probe will call runtime suspend to
disable clks automatically if CONFIG_PM is defined. (If CONFIG_PM
is not defined, its implementation will be empty, then runtime
suspend will not be called.)
Therefore, we can call pm_runtime_get_sync to runtime resume it
first to enable clks, which matches the runtime suspend. (Only when
CONFIG_PM is defined, otherwise pm_runtime_get_sync will also be
empty, then runtime resume will not be called.)
Then it is fine to disable clks without causing clock count mis-match.
Fixes: c43eab3eddb4 ("net: fec: add missed clk_disable_unprepare in remove")
Signed-off-by: Chuhong Yuan <[email protected]>
Acked-by: Fugang Duan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
when configuring act_pedit rules, the number of keys is validated only on
addition of a new entry. This is not sufficient to avoid hitting a WARN()
in the traffic path: for example, it is possible to replace a valid entry
with a new one having 0 extended keys, thus causing splats in dmesg like:
pedit BUG: index 42
WARNING: CPU: 2 PID: 4054 at net/sched/act_pedit.c:410 tcf_pedit_act+0xc84/0x1200 [act_pedit]
[...]
RIP: 0010:tcf_pedit_act+0xc84/0x1200 [act_pedit]
Code: 89 fa 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ac 00 00 00 48 8b 44 24 10 48 c7 c7 a0 c4 e4 c0 8b 70 18 e8 1c 30 95 ea <0f> 0b e9 a0 fa ff ff e8 00 03 f5 ea e9 14 f4 ff ff 48 89 58 40 e9
RSP: 0018:ffff888077c9f320 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffffac2983a2
RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888053927bec
RBP: dffffc0000000000 R08: ffffed100a726209 R09: ffffed100a726209
R10: 0000000000000001 R11: ffffed100a726208 R12: ffff88804beea780
R13: ffff888079a77400 R14: ffff88804beea780 R15: ffff888027ab2000
FS: 00007fdeec9bd740(0000) GS:ffff888053900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffdb3dfd000 CR3: 000000004adb4006 CR4: 00000000001606e0
Call Trace:
tcf_action_exec+0x105/0x3f0
tcf_classify+0xf2/0x410
__dev_queue_xmit+0xcbf/0x2ae0
ip_finish_output2+0x711/0x1fb0
ip_output+0x1bf/0x4b0
ip_send_skb+0x37/0xa0
raw_sendmsg+0x180c/0x2430
sock_sendmsg+0xdb/0x110
__sys_sendto+0x257/0x2b0
__x64_sys_sendto+0xdd/0x1b0
do_syscall_64+0xa5/0x4e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fdeeb72e993
Code: 48 8b 0d e0 74 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 0d d6 2c 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 4b cc 00 00 48 89 04 24
RSP: 002b:00007ffdb3de8a18 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 000055c81972b700 RCX: 00007fdeeb72e993
RDX: 0000000000000040 RSI: 000055c81972b700 RDI: 0000000000000003
RBP: 00007ffdb3dea130 R08: 000055c819728510 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
R13: 000055c81972b6c0 R14: 000055c81972969c R15: 0000000000000080
Fix this moving the check on 'nkeys' earlier in tcf_pedit_init(), so that
attempts to install rules having 0 keys are always rejected with -EINVAL.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Modifying the link settings via phylink_ethtool_ksettings_set() and
phylink_ethtool_set_pauseparam() didn't always work as intended for
PHY based setups, as calling phylink_mac_config() would result in the
unresolved configuration being committed to the MAC, rather than the
configuration with the speed and duplex setting.
This would work fine if the update caused the link to renegotiate,
but if no settings have changed, phylib won't trigger a renegotiation
cycle, and the MAC will be left incorrectly configured.
Avoid calling phylink_mac_config() unless we are using an inband mode
in phylink_ethtool_ksettings_set(), and use phy_set_asym_pause() as
introduced in 4.20 to set the PHY settings in
phylink_ethtool_set_pauseparam().
Signed-off-by: Russell King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Update the documentation on phylink's create and destroy functions to
explicitly state that the rtnl lock must not be held while calling
these.
Signed-off-by: Russell King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
During performance testing, I found that one of my r8169 NICs suffered
a major performance loss, a 8168c model.
Running netperf's TCP_STREAM test didn't return the expected
throughput of > 900 Mb/s, but rather only about 22 Mb/s. Strange
enough, running the TCP_MAERTS and UDP_STREAM tests all returned with
throughput > 900 Mb/s, as did TCP_STREAM with the other r8169 NICs I can
test (either one of 8169s, 8168e, 8168f).
Bisecting turned up commit 93681cd7d94f83903cb3f0f95433d10c28a7e9a5,
"r8169: enable HW csum and TSO" as the culprit.
I added my 8168c version, RTL_GIGA_MAC_VER_22, to the code
special-casing the 8168evl as per the patch below. This fixed the
performance problem for me.
Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO")
Signed-off-by: Corinna Vinschen <[email protected]>
Reviewed-by: Heiner Kallweit <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
I prefer to use my personal email address for kernel related work.
Signed-off-by: Zhu Yanjun <[email protected]>
Acked-by: Rain River <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The taprio qdisc allows to set mqprio setting but only once. In case
if mqprio settings are provided next time the error is returned as
it's not allowed to change traffic class mapping in-flignt and that
is normal. But if configuration is absolutely the same - no need to
return error. It allows to provide same command couple times,
changing only base time for instance, or changing only scheds maps,
but leaving mqprio setting w/o modification. It more corresponds the
message: "Changing the traffic mapping of a running schedule is not
supported", so reject mqprio if it's really changed.
Also corrected TC_BITMASK + 1 for consistency, as proposed.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule")
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Acked-by: Vinicius Costa Gomes <[email protected]>
Signed-off-by: Ivan Khoronzhuk <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Bring back tls_sw_sendpage_locked. sk_msg redirection into a socket
with TLS_TX takes the following path:
tcp_bpf_sendmsg_redir
tcp_bpf_push_locked
tcp_bpf_push
kernel_sendpage_locked
sock->ops->sendpage_locked
Also update the flags test in tls_sw_sendpage_locked to allow flag
MSG_NO_SHARED_FRAGS. bpf_tcp_sendmsg sets this.
Link: https://lore.kernel.org/netdev/CA+FuTSdaAawmZ2N8nfDDKu3XLpXBbMtcCT0q4FntDD2gn8ASUw@mail.gmail.com/T/#t
Link: https://github.com/wdebruij/kerneltools/commits/icept.2
Fixes: 0608c69c9a80 ("bpf: sk_msg, sock{map|hash} redirect through ULP")
Fixes: f3de19af0f5b ("Revert \"net/tls: remove unused function tls_sw_sendpage_locked\"")
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In afs_wait_for_call_to_complete(), rather than immediately aborting an
operation if a signal occurs, the code attempts to wait for it to
complete, using a schedule timeout of 2*RTT (or min 2 jiffies) and a
check that we're still receiving relevant packets from the server before
we consider aborting the call. We may even ping the server to check on
the status of the call.
However, there's a missing timeout reset in the event that we do
actually get a packet to process, such that if we then get a couple of
short stalls, we then time out when progress is actually being made.
Fix this by resetting the timeout any time we get something to process.
If it's the failure of the call then the call state will get changed and
we'll exit the loop shortly thereafter.
A symptom of this is data fetches and stores failing with EINTR when
they really shouldn't.
Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <[email protected]>
Reviewed-by: Marc Dionne <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The previous commit had a bug where the last page in the memory range
could not be synced. This change fixes the behavior so that all the
required pages are synced.
Fixes: 9cfeeb576d49 ("gve: Fixes DMA synchronization")
Signed-off-by: Adi Suresh <[email protected]>
Reviewed-by: Catherine Sullivan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For our current users we don't expect pool objects to be writable from
the gpu.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Chris Wilson <[email protected]>
Fixes: 4f7af1948abc ("drm/i915: Support ro ppgtt mapped cmdparser shadow buffers")
Reviewed-by: Chris Wilson <[email protected]>
Signed-off-by: Chris Wilson <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit d18580b08b92ec4105eb0ede2d676e8b1f5a66c3)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
Commit 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization
to constant") accidentally changed a check from -ENOTSUPP to -ENOSYS,
causing failures if reset controller support is not enabled. E.g. on
r7s72100/rskrza1:
sh-eth e8203000.ethernet: MDIO init failed: -524
sh-eth: probe of e8203000.ethernet failed with error -524
Seen on r8a7740/armadillo, r7s72100/rskrza1, and r7s9210/rza2mevb.
Fixes: 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization to constant")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Cc: YueHaibing <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Before returning NULL, put the sock first.
Cc: [email protected]
Fixes: cf1b2326b734 ("nbd: verify socket is supported during setup")
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Mike Christie <[email protected]>
Signed-off-by: Sun Ke <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When we hot unplug a virtserialport and then try to hot plug again,
it fails:
(qemu) chardev-add socket,id=serial0,path=/tmp/serial0,server,nowait
(qemu) device_add virtserialport,bus=virtio-serial0.0,nr=2,\
chardev=serial0,id=serial0,name=serial0
(qemu) device_del serial0
(qemu) device_add virtserialport,bus=virtio-serial0.0,nr=2,\
chardev=serial0,id=serial0,name=serial0
kernel error:
virtio-ports vport2p2: Error allocating inbufs
qemu error:
virtio-serial-bus: Guest failure in adding port 2 for device \
virtio-serial0.0
This happens because buffers for the in_vq are allocated when the port is
added but are not released when the port is unplugged.
They are only released when virtconsole is removed (see a7a69ec0d8e4)
To avoid the problem and to be symmetric, we could allocate all the buffers
in init_vqs() as they are released in remove_vqs(), but it sounds like
a waste of memory.
Rather than that, this patch changes add_port() logic to ignore ENOSPC
error in fill_queue(), which means queue has already been filled.
Fixes: a7a69ec0d8e4 ("virtio_console: free buffers after reset")
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
Commit 780bc7903a32 ("virtio_ring: Support DMA APIs") makes
virtqueue_add() return -EIO when we fail to map our I/O buffers. This is
a very realistic scenario for guests with encrypted memory, as swiotlb
may run out of space, depending on it's size and the I/O load.
The virtio-blk driver interprets -EIO form virtqueue_add() as an IO
error, despite the fact that swiotlb full is in absence of bugs a
recoverable condition.
Let us change the return code to -ENOMEM, and make the block layer
recover form these failures when virtio-blk encounters the condition
described above.
Cc: [email protected]
Fixes: 780bc7903a32 ("virtio_ring: Support DMA APIs")
Signed-off-by: Halil Pasic <[email protected]>
Tested-by: Michael Mueller <[email protected]>
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
When CONFIG_RESET_CONTROLLER is disabled, the
devm_reset_control_get_exclusive function returns -ENOTSUPP. This is not
handled in subsequent check and then the mdio device fails to probe.
When CONFIG_RESET_CONTROLLER is enabled, its code checks in OF for reset
device, and since it is not present, returns -ENOENT. -ENOENT is handled.
Add -ENOTSUPP also.
This happened to me when upgrading kernel on Turris Omnia. You either
have to enable CONFIG_RESET_CONTROLLER or use this patch.
Signed-off-by: Marek Behún <[email protected]>
Fixes: 71dd6c0dff51b ("net: phy: add support for reset-controller")
Cc: Dmitry Torokhov <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit eec4844fae7c ("proc/sysctl: add shared variables for range
check") did:
- .extra2 = &two,
+ .extra2 = SYSCTL_ONE,
here, which doesn't seem to be intentional, given the changelog.
This patch restores it to the previous, as the value of 2 still makes
sense (used in fib_multipath_hash()).
Fixes: eec4844fae7c ("proc/sysctl: add shared variables for range check")
Cc: Matteo Croce <[email protected]>
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Acked-by: Matteo Croce <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver forgets to disable the regulator in remove like what is done
in probe failure.
Add the missed call to fix it.
Signed-off-by: Chuhong Yuan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
XDP_TX rings should not be limited by max_num_tx_rings_p_up.
To make sure total number of TX rings never exceed MAX_TX_RINGS,
add similar check in mlx4_en_alloc_tx_queue_per_tc(), where
a new value is assigned for num_up.
Fixes: 7e1dc5e926d5 ("net/mlx4_en: Limit the number of TX rings")
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
info->options_len is 'u8' type, and when opts_len with a value >
IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int
to u8 and set a wrong value to info->options_len.
Kernel crashed in my test when doing:
# opts="0102:80:00800022"
# for i in {1..99}; do opts="$opts,0102:80:00800022"; done
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto udp action tunnel_key \
set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
dst_port 6081 id 11 geneve_opts $opts \
action mirred egress redirect dev geneve0
So we should do the similar check as cls_flower does, return error
when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().
Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The helper mlxsw_sp_ipip_dev_ul_tb_id() determines the underlay VRF of a
GRE tunnel. For a tunnel without a bound device, it uses the same VRF that
the tunnel is in. However in Linux, a GRE tunnel without a bound device
uses the main VRF as the underlay. Fix the function accordingly.
mlxsw further assumed that moving a tunnel to a different VRF could cause
conflict in local tunnel endpoint address, which cannot be offloaded.
However, the only way that an underlay could be changed by moving the
tunnel device itself is if the tunnel device does not have a bound device.
But in that case the underlay is always the main VRF, so there is no
opportunity to introduce a conflict by moving such device. Thus this check
constitutes a dead code, and can be removed, which do.
Fixes: 6ddb7426a7d4 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case of errors in unlink_clip_vcc, the logging level is set to
pr_crit but failures in clip_setentry are handled by pr_err().
The patch changes the severity consistent across invocations.
Signed-off-by: Aditya Pakki <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After previous patches removing bdev being passed around to set it to
bio, it has become unused in submit_extent_page. So it now has "only" 13
parameters.
Signed-off-by: David Sterba <[email protected]>
|
|
We can now remove the bdev from extent_map. Previous patches made sure
that bio_set_dev is correctly in all places and that we don't need to
grab it from latest_bdev or pass it around inside the extent map.
Signed-off-by: David Sterba <[email protected]>
|
|
bio_set_dev sets a bdev to a bio and is not only setting a pointer bug
also changing some state bits if there was a different bdev set before.
This is one thing that's not needed.
Another thing is that setting a bdev at bio allocation time is too early
and actually does not work with plain redundancy profiles, where each
time we submit a bio to a device, the bdev is set correctly.
In many places the bio bdev is set to latest_bdev that seems to serve as
a stub pointer "just to put something to bio". But we don't have to do
that.
Where do we know which bdev to set:
* for regular IO: submit_stripe_bio that's called by btrfs_map_bio
* repair IO: repair_io_failure, read or write from specific device
* super block write (using buffer_heads but uses raw bdev) and barriers
* scrub: this does not use all regular IO paths as it needs to reach all
copies, verify and fixup eventually, and for that all bdev management
is independent
* raid56: rbio_add_io_page, for the RMW write
* integrity-checker: does it's own low-level block tracking
Signed-off-by: David Sterba <[email protected]>
|
|
This is preparatory patch to remove @bdev parameter from
submit_extent_page. It can't be removed completely, because the cgroups
need it for wbc when initializing the bio
wbc_init_bio
bio_associate_blkg_from_css
dereference bdev->bi_disk->queue
The bdev pointer is the same as latest_bdev, thus no functional change.
We can retrieve it from fs_devices that's reachable through several
dereferences. The local variable shadows the parameter, but that's only
temporary.
Signed-off-by: David Sterba <[email protected]>
|
|
Since the execlists_active() is no longer protected by the
engine->active.lock, we need to protect the request pointer with RCU to
prevent it being freed as we evaluate whether or not we need to preempt.
Fixes: df403069029d ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Mika Kuoppala <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Reviewed-by: Mika Kuoppala <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 7d148635253328dda7cfe55d57e3c828e9564427)
Signed-off-by: Joonas Lahtinen <[email protected]>
(cherry picked from commit 8eb4704b124cbd44f189709959137d77063ecfa1)
(cherry picked from commit 7e27238e149ce4f00d9cd801fe3aa0ea55e986a2)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
Testing with the new fsstress support for subvolumes uncovered a pretty
bad problem with rename exchange on subvolumes. We're modifying two
different subvolumes, but we only start the transaction on one of them,
so the other one is not added to the dirty root list. This is caught by
btrfs_cow_block() with a warning because the root has not been updated,
however if we do not modify this root again we'll end up pointing at an
invalid root because the root item is never updated.
Fix this by making sure we add the destination root to the trans list,
the same as we do with normal renames. This fixes the corruption.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
CC: [email protected] # 4.9+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
set_page_dirty says:
For pages with a mapping this should be done under the page lock
for the benefit of asynchronous memory errors who prefer a
consistent dirty state. This rule can be broken in some special
cases, but should be better not to.
Under those rules, it is only safe for us to use the plain set_page_dirty
calls for shmemfs/anonymous memory. Userptr may be used with real
mappings and so needs to use the locked version (set_page_dirty_lock).
However, following a try_to_unmap() we may want to remove the userptr and
so call put_pages(). However, try_to_unmap() acquires the page lock and
so we must avoid recursively locking the pages ourselves -- which means
that we cannot safely acquire the lock around set_page_dirty(). Since we
can't be sure of the lock, we have to risk skip dirtying the page, or
else risk calling set_page_dirty() without a lock and so risk fs
corruption.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203317
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112012
Fixes: 5cc9ed4b9a7a ("drm/i915: Introduce mapping of user pages into video memory (userptr) ioctl")
References: cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
References: 505a8ec7e11a ("Revert "drm/i915/userptr: Acquire the page lock around set_page_dirty()"")
References: 6dcc693bc57f ("ext4: warn when page is dirtied without buffers")
Signed-off-by: Chris Wilson <[email protected]>
Cc: Lionel Landwerlin <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit 0d4bbe3d407f79438dc4f87943db21f7134cfc65)
Signed-off-by: Joonas Lahtinen <[email protected]>
(cherry picked from commit cee7fb437edcdb2f9f8affa959e274997f5dca4d)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
We report "frequencies" (actual-frequency, requested-frequency) as the
number of accumulated cycles so that the average frequency over that
period may be determined by the user. This means the units we report to
the user are Mcycles (or just M), not MHz.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: [email protected]
Reviewed-by: Tvrtko Ursulin <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
(cherry picked from commit e88866ef02851c88fe95a4bb97820b94b4d46f36)
Signed-off-by: Joonas Lahtinen <[email protected]>
(cherry picked from commit a7d87b70d6da96c6772e50728c8b4e78e4cbfd55)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
The LUTs are single buffered so in order to program them without
tearing we'd have to do it during vblank (actually to be 100%
effective it has to happen between start of vblank and frame start).
We have no proper mechanism for that at the moment so we just
defer loading them after the vblank waits have happened. That
is not quite sufficient (especially when committing multiple pipes
whose vblanks don't line up) so the LUT load will often leak into
the following frame causing tearing.
However in case the hardware wasn't previously using the LUT we
can preload it before setting the enable bit (which is double
buffered so won't tear). Let's determine if we can do such
preloading and make it happen. Slight variation between the
hardware requires some platforms specifics in the checks.
Hans is seeing ugly colored flash on VLV/CHV macchines (GPD win
and Asus T100HA) when the gamma LUT gets loaded for the first
time as the BIOS has left some junk in the LUT memory.
v2: Deal with uapi vs. hw crtc state split
s/GCM/CGM/ typo fix
Cc: Hans de Goede <[email protected]>
Fixes: 051a6d8d3ca0 ("drm/i915: Move LUT programming to happen after vblank waits")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Tested-by: Hans de Goede <[email protected]>
Reviewed-by: Hans de Goede <[email protected]>
(cherry picked from commit 0ccc42a2fd5107a7f58e62c8b35b61de9a70ce82)
Signed-off-by: Joonas Lahtinen <[email protected]>
(cherry picked from commit f77021372e2880237278e0ee57faadc077a8256a)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
Make sure we have a crtc before probing its primary plane's
max stride. Initially I thought we can't get this far without
crtcs, but looks like we can via the dumb_create ioctl.
Not sure if we shouldn't disable dumb buffer support entirely
when we have no crtcs, but that would require some amount of work
as the only thing currently being checked is dev->driver->dumb_create
which we'd have to convert to some device specific dynamic thing.
Cc: [email protected]
Reported-by: Mika Kuoppala <[email protected]>
Fixes: aa5ca8b7421c ("drm/i915: Align dumb buffer stride to 4k to allow for gtt remapping")
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Reviewed-by: Chris Wilson <[email protected]>
(cherry picked from commit baea9ffe64200033499a4955f431e315bb807899)
Signed-off-by: Joonas Lahtinen <[email protected]>
(cherry picked from commit aeec766133f99d45aad60d650de50fb382104d95)
Signed-off-by: Rodrigo Vivi <[email protected]>
|
|
When doing a device replace, while at scrub.c:scrub_enumerate_chunks(), we
set the block group to RO mode and then wait for any ongoing writes into
extents of the block group to complete. While doing that wait we overwrite
the value of the variable 'ret' and can break out of the loop if an error
happens without turning the block group back into RW mode. So what happens
is the following:
1) btrfs_inc_block_group_ro() returns 0, meaning it set the block group
to RO mode (its ->ro field set to 1 or incremented to some value > 1);
2) Then btrfs_wait_ordered_roots() returns a value > 0;
3) Then if either joining or committing the transaction fails, we break
out of the loop wihtout calling btrfs_dec_block_group_ro(), leaving
the block group in RO mode forever.
To fix this, just remove the code that waits for ongoing writes to extents
of the block group, since it's not needed because in the initial setup
phase of a device replace operation, before starting to find all chunks
and their extents, we set the target device for replace while holding
fs_info->dev_replace->rwsem, which ensures that after releasing that
semaphore, any writes into the source device are made to the target device
as well (__btrfs_map_block() guarantees that). So while at
scrub_enumerate_chunks() we only need to worry about finding and copying
extents (from the source device to the target device) that were written
before we started the device replace operation.
Fixes: f0e9b7d6401959 ("Btrfs: fix race setting block group readonly during device replace")
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|