Age | Commit message (Collapse) | Author | Files | Lines |
|
io_unprep_linked_timeout() is broken, first it needs to return back
REQ_F_ARM_LTIMEOUT, so the linked timeout is enqueued and disarmed. But
now we refcounted it, and linked timeouts may get not executed at all,
leaking a request.
Just kill the unprep optimisation.
Fixes: 906c6caaf586 ("io_uring: optimise io_prep_linked_timeout()")
Reported-by: Beld Zhang <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/51b8e2bfc4bea8ee625cf2ba62b2a350cc9be031.1634719585.git.asml.silence@gmail.com
Link: https://github.com/axboe/liburing/issues/460
Reported-by: Beld Zhang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently, IORING_REGISTER_IOWQ_MAX_WORKERS applies only to the task
that issued it, it's unexpected for users. If one task creates a ring,
limits workers and then passes it to another task the limit won't be
applied to the other task.
Another pitfall is that a task should either create a ring or submit at
least one request for IORING_REGISTER_IOWQ_MAX_WORKERS to work at all,
furher complicating the picture.
Change the API, save the limits and apply to all future users. Note, it
should be done first before giving away the ring or submitting new
requests otherwise the result is not guaranteed.
Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
Link: https://github.com/axboe/liburing/issues/460
Reported-by: Beld Zhang <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/51d0bae97180e08ab722c0d5c93e7439cfb6f697.1634683237.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull kvm fixes from Paolo Bonzini:
"Tools:
- kvm_stat: do not show halt_wait_ns since it is not a cumulative statistic
x86:
- clean ups and fixes for bus lock vmexit and lazy allocation of rmaps
- two fixes for SEV-ES (one more coming as soon as I get reviews)
- fix for static_key underflow
ARM:
- Properly refcount pages used as a concatenated stage-2 PGD
- Fix missing unlock when detecting the use of MTE+VM_SHARED"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SEV-ES: reduce ghcb_sa_len to 32 bits
KVM: VMX: Remove redundant handling of bus lock vmexit
KVM: kvm_stat: do not show halt_wait_ns
KVM: x86: WARN if APIC HW/SW disable static keys are non-zero on unload
Revert "KVM: x86: Open code necessary bits of kvm_lapic_set_base() at vCPU RESET"
KVM: SEV-ES: Set guest_state_protected after VMSA update
KVM: X86: fix lazy allocation of rmaps
KVM: SEV-ES: fix length of string I/O
KVM: arm64: Release mmap_lock when using VM_SHARED with MTE
KVM: arm64: Report corrupted refcount at EL2
KVM: arm64: Fix host stage-2 PGD refcount
KVM: s390: Function documentation fixes
|
|
We have the same LAN controller on different PCHs. Separate TGP board
type from SPT which will allow for specific fixes to be applied for
TGP platforms.
Suggested-by: Kai-Heng Feng <[email protected]>
Signed-off-by: Sasha Neftin <[email protected]>
Reviewed-by: Paul Menzel <[email protected]>
Tested-by: Mark Pearson <[email protected]>
Tested-by: Nechama Kraus <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Just like we do for internal objects. Also just use
i915_gem_object_set_cache_coherency() here. No need for over-flushing on
LLC platforms.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
While the pages can't be swapped out, they can be discarded by the shrinker.
Normally such objects are marked with __I915_MADV_PURGED, which can't be
unset, and therefore requires a new object. For kernel internal objects
this is not true, since the madv hint is reset for our special volatile
objects, such that we can re-acquire new pages, if so desired, without
needing a new object. As a result we should probably be paranoid here
and put the object back into the CPU domain when discarding the pages,
and also correctly set cache_dirty, if required.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Add some details around non-LLC platforms and cflushing, when dealing
with the flush-on-acquire, which is potentially security sensitive.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Cc: Daniel Vetter <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
On non-LLC platforms, force the flush-on-acquire if this is ever
swapped-in. Our async flush path is not trust worthy enough yet(and
happens in the wrong order), and with some tricks it's conceivable for
userspace to change the cache-level to I915_CACHE_NONE after the pages
are swapped-in, and since execbuf binds the object before doing the
async flush, there is a potential race window.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Even though userptr objects are always coherent with the GPU, with no
way for userspace to change this with the set_caching ioctl, even on
non-LLC platforms, there is still the 'Bypass LCC' mocs setting, which
might permit reading the contents of main memory directly.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As pointed out by Thomas, we likely need to flush the pages here if the
GPU can read the page contents directly from main memory. Underneath we
don't know what the sg_table is pointing to, so just add a
wbinvd_on_all_cpus() here, for now.
Reported-by: Thomas Hellström <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
It looks like we will need this in some more places, so extract as a
helper.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
These are userspace objects, so mark them as such. In a later patch it's
useful to determine how paranoid we need to be when managing cache
flushes. In theory no functional changes.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
These are userspace objects, so mark them as such. In a later patch it's
useful to determine how paranoid we need to be when managing cache
flushes. In theory no functional changes.
Signed-off-by: Matthew Auld <[email protected]>
Cc: Thomas Hellström <[email protected]>
Reviewed-by: Thomas Hellström <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Change the error message when the bad_page_threshold is reached,
explicitly stating that the GPU will not be initialized.
Cc: Luben Tuikov <[email protected]>
Cc: Mukul Joshi <[email protected]>
Signed-off-by: Kent Russell <[email protected]>
Reviewed-by: Luben Tuikov <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is driven by the IP
discovery table now.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
No longer used since IP enumeration is now driven by
amdgpu IP discovery code.
Acked-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
raven_device_info is not used when KFD_SUPPORT_IOMMU_V2 is not
set.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
hawaii_device_info is not used when CONFIG_DRM_AMDGPU_CIK is not
set.
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Check was incorrectly converted to IP version checking.
Fixes: 4b0ad8425498ba ("drm/amdgpu/gfx10: convert to IP version checking")
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
When selecting between levels in the force performance levels interface
sclk (gfxclk) was not set correctly for all levels. Select the proper
sclk settings for all levels.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1726
Reviewed-by: Evan Quan <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
show() must not use snprintf() when formatting the value to be
returned to user space.
Fix the following coccicheck warning:
drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c:427:
WARNING: use scnprintf or sprintf.
Signed-off-by: Qing Wang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
B0 internal rev_id is 0x01, B1 internal rev_id is 0x02.
The external rev_id for B0 and B1 is 0x20.
The original expression is not suitable for B1.
v2: squash in fix for display code (Alex)
Signed-off-by: Aaron Liu <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Setting cred->ucounts in cred_alloc_blank does not make sense. The
uid and user_ns are deliberately not set in cred_alloc_blank but
instead the setting is delayed until key_change_session_keyring.
So move dealing with ucounts into key_change_session_keyring as well.
Unfortunately that movement of get_ucounts adds a new failure mode to
key_change_session_keyring. I do not see anything stopping the parent
process from calling setuid and changing the relevant part of it's
cred while keyctl_session_to_parent is running making it fundamentally
necessary to call get_ucounts in key_change_session_keyring. Which
means that the new failure mode cannot be avoided.
A failure of key_change_session_keyring results in a single threaded
parent keeping it's existing credentials. Which results in the parent
process not being able to access the session keyring and whichever
keys are in the new keyring.
Further get_ucounts is only expected to fail if the number of bits in
the refernece count for the structure is too few.
Since the code has no other way to report the failure of get_ucounts
and because such failures are not expected to be common add a WARN_ONCE
to report this problem to userspace.
Between the WARN_ONCE and the parent process not having access to
the keys in the new session keyring I expect any failure of get_ucounts
will be noticed and reported and we can find another way to handle this
condition. (Possibly by just making ucounts->count an atomic_long_t).
Cc: [email protected]
Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred")
Link: https://lkml.kernel.org/r/7k0ias0uf.fsf_-_@disp2133
Tested-by: Yu Zhao <[email protected]>
Reviewed-by: Alexey Gladkov <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
I got memory leak as follows when doing fault injection test:
unreferenced object 0xffff88800906c618 (size 8):
comm "i2c-idt82p33931", pid 4421, jiffies 4294948083 (age 13.188s)
hex dump (first 8 bytes):
70 74 70 30 00 00 00 00 ptp0....
backtrace:
[<00000000312ed458>] __kmalloc_track_caller+0x19f/0x3a0
[<0000000079f6e2ff>] kvasprintf+0xb5/0x150
[<0000000026aae54f>] kvasprintf_const+0x60/0x190
[<00000000f323a5f7>] kobject_set_name_vargs+0x56/0x150
[<000000004e35abdd>] dev_set_name+0xc0/0x100
[<00000000f20cfe25>] ptp_clock_register+0x9f4/0xd30 [ptp]
[<000000008bb9f0de>] idt82p33_probe.cold+0x8b6/0x1561 [ptp_idt82p33]
When posix_clock_register() returns an error, the name allocated
in dev_set_name() will be leaked, the put_device() should be used
to give up the device reference, then the name will be freed in
kobject_cleanup() and other memory will be freed in ptp_clock_release().
Reported-by: Hulk Robot <[email protected]>
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When utilizing End to End delay mechanism, the following error messages show up:
|root@ehl1:~# ptp4l --tx_timestamp_timeout=50 -H -i eno2 -E -m
|ptp4l[950.573]: selected /dev/ptp3 as PTP clock
|ptp4l[950.586]: port 1: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[950.586]: port 0: INITIALIZING to LISTENING on INIT_COMPLETE
|ptp4l[952.879]: port 1: new foreign master 001395.fffe.4897b4-1
|ptp4l[956.879]: selected best master clock 001395.fffe.4897b4
|ptp4l[956.879]: port 1: assuming the grand master role
|ptp4l[956.879]: port 1: LISTENING to GRAND_MASTER on RS_GRAND_MASTER
|ptp4l[962.017]: port 1: received DELAY_REQ without timestamp
|ptp4l[962.273]: port 1: received DELAY_REQ without timestamp
|ptp4l[963.090]: port 1: received DELAY_REQ without timestamp
Commit f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP
packets in dwmac v5.10a") already addresses this problem for the dwmac
v5.10. However, same holds true for all dwmacs above version v4.10. Correct the
check accordingly. Afterwards everything works as expected.
Tested on Intel Atom(R) x6414RE Processor.
Fixes: 14f347334bf2 ("net: stmmac: Correctly take timestamp for PTPv2")
Fixes: f2fb6b6275eb ("net: stmmac: enable timestamp snapshot for required PTP packets in dwmac v5.10a")
Suggested-by: Ong Boon Leong <[email protected]>
Signed-off-by: Kurt Kanzenbach <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If something goes wrong in the remove callback, returning an error code
just results in an error message. The device still disappears.
So don't skip disabling the regulator in st95hf_remove() if resetting
the controller via spi fails. Also don't return an error code which just
results in two error messages.
Signed-off-by: Uwe Kleine-König <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Guangbin Huang says:
====================
net: hns3: add some fixes for -net
This series adds some fixes for the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
HNS3 driver includes hns3.ko, hnae3.ko and hclge.ko.
hns3.ko includes network stack and pci_driver, hclge.ko includes
HW device action, algo_ops and timer task, hnae3.ko includes some
register function.
When SRIOV is enable and hclge.ko is removed, HW device is unloaded
but VF still exists, PF will not reply VF mbx messages, and cause
errors.
This patch fix it by disable SRIOV before remove hclge.ko.
Fixes: e2cb1dec9779 ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support")
Signed-off-by: Peng Li <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The task of VF reset is performed through the workqueue. It checks the
value of hdev->reset_pending to determine whether to exit the loop.
However, the value of hdev->reset_pending may also be assigned by
the interrupt function hclgevf_misc_irq_handle(), which may cause the
loop fail to exit and keep occupying the workqueue. This loop is not
necessary, so remove it and the workqueue will be rescheduled if the
reset needs to be retried or a new reset occurs.
Fixes: 1cc9bc6e5867 ("net: hns3: split hclgevf_reset() into preparing and rebuilding part")
Signed-off-by: Yufeng Mo <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently when there is a rx page allocation failure, it is
possible that polling may be stopped if there is no more packet
to be reveiced, which may cause queue stall problem under memory
pressure.
This patch makes sure polling is scheduled again when there is
any rx page allocation failure, and polling will try to allocate
receive buffers until it succeeds.
Now the allocation retry is added, it is unnecessary to do the rx
page allocation at the end of rx cleaning, so remove it. And reset
the unused_count to zero after calling hns3_nic_alloc_rx_buffers()
to avoid calling hns3_nic_alloc_rx_buffers() repeatedly under
memory pressure.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
rx unused desc is the desc that need attatching new buffer
before refilling to hw to receive new packet, the number of
desc need attatching new buffer is calculated using next_to_use
and next_to_clean. when next_to_use == next_to_clean, currently
hns3 driver assumes that all the desc has the buffer attatched,
but 'next_to_use == next_to_clean' also means all the desc need
attatching new buffer if hw has comsumed all the desc and the
driver has not attatched any buffer to the desc yet.
This patch adds 'refill' in desc_cb to indicate whether a new
buffer has been refilled to a desc.
Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently the max tx size supported by the hw is calculated by
using the max BD num supported by the hw. According to the hw
user manual, the max tx size is fixed value for both non-TSO and
TSO skb.
This patch updates the max tx size according to the manual.
Fixes: 8ae10cfb5089("net: hns3: support tx-scatter-gather-fraglist feature")
Signed-off-by: Yunsheng Lin <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If ets dwrr bandwidth of tc is set to 0, the hardware will switch to SP
mode. In this case, this tc may occupy all the tx bandwidth if it has
huge traffic, so it violates the purpose of the user setting.
To fix this problem, limit the ets dwrr bandwidth must greater than 0.
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature")
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, DWRR of tc will be initialized to a fixed value when this tc
is enabled, but it is not been reset to 0 when this tc is disabled. It
cause a problem that the DWRR of unused tc is not 0 after using tc tool
to add and delete multi-tc parameters.
For examples, after enabling 4 TCs and restoring to 1 TC by follow
tc commands:
$ tc qdisc add dev eth0 root mqprio num_tc 4 map 0 1 2 3 0 1 2 3 queues \
8@0 8@8 8@16 8@24 hw 1 mode channel
$ tc qdisc del dev eth0 root
Now there is just one TC is enabled for eth0, but the tc info querying by
debugfs is shown as follow:
$ cat /mnt/hns3/0000:7d:00.0/tm/tc_sch_info
enabled tc number: 1
weight_offset: 14
TC MODE WEIGHT
0 dwrr 100
1 dwrr 100
2 dwrr 100
3 dwrr 100
4 dwrr 0
5 dwrr 0
6 dwrr 0
7 dwrr 0
This patch fixes it by resetting DWRR of tc to 0 when tc is disabled.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver")
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add configuration of interrupt type and fifo interrupt enable of TM QCN
error event if enabled, otherwise this event will not be reported when
there is error.
Fixes: d914971df022 ("net: hns3: remove redundant query in hclge_config_tm_hw_err_int()")
Signed-off-by: Jiaran Zhang <[email protected]>
Signed-off-by: Guangbin Huang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we
get:
BUG: scheduling while atomic: swapper/1/0/0x00000000
no locks held by swapper/1/0.
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100
Call Trace:
dump_stack_lvl+0xac/0x108
__schedule_bug+0xac/0xe0
__schedule+0xcf8/0x10d0
schedule_idle+0x3c/0x70
do_idle+0x2d8/0x4a0
cpu_startup_entry+0x38/0x40
start_secondary+0x2ec/0x3a0
start_secondary_prolog+0x10/0x14
This is because powerpc's arch_cpu_idle_dead() decrements the idle task's
preempt count, for reasons explained in commit a7c2bb8279d2 ("powerpc:
Re-enable preemption before cpu_die()"), specifically "start_secondary()
expects a preempt_count() of 0."
However, since commit 2c669ef6979c ("powerpc/preempt: Don't touch the idle
task's preempt_count during hotplug") and commit f1a0a376ca0c ("sched/core:
Initialize the idle task with preemption disabled"), that justification no
longer holds.
The idle task isn't supposed to re-enable preemption, so remove the
vestigial preempt_enable() from the CPU offline path.
Tested with pseries and powernv in qemu, and pseries on PowerVM.
Fixes: 2c669ef6979c ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug")
Signed-off-by: Nathan Lynch <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Reviewed-by: Srikar Dronamraju <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
In isa206_idle_insn_mayloss() we store various registers into the stack
red zone, which is allowed.
However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again,
to 0(r1), which corrupts the stack back chain.
We used to do the same in isa206_idle_insn_mayloss() itself, but we
fixed that in 73287caa9210 ("powerpc64/idle: Fix SP offsets when saving
GPRs"), however we missed that the macro also corrupts the back chain.
Corrupting the back chain is bad for debuggability but doesn't
necessarily cause a bug.
However we recently changed the stack handling in some KVM code, and it
now relies on the stack back chain being valid when it returns. The
corruption causes that code to return with r1 pointing somewhere in
kernel data, at some point LR is restored from the stack and we branch
to NULL or somewhere else invalid.
Only affects Power8 hosts running KVM guests, with dynamic_mt_modes
enabled (which it is by default).
The fixes tag below points to the commit that changed the KVM stack
handling, exposing this bug. The actual corruption of the back chain has
always existed since 948cf67c4726 ("powerpc: Add NAP mode support on
Power7 in HV mode").
Fixes: 9b4416c5095c ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()")
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1.
When an interface is enslaved in a VRF, prerouting conntrack hook is
called twice: once in the context of the original input interface, and
once in the context of the VRF interface. If no special precausions are
taken, this leads to creation of two conntrack entries instead of one,
and breaks SNAT.
Commit above was intended to avoid creation of extra conntrack entries
when input interface is enslaved in a VRF. It did so by resetting
conntrack related data associated with the skb when it enters VRF context.
However it breaks netfilter operation. Imagine a use case when conntrack
zone must be assigned based on the original input interface, rather than
VRF interface (that would make original interfaces indistinguishable). One
could create netfilter rules similar to these:
chain rawprerouting {
type filter hook prerouting priority raw;
iif realiface1 ct zone set 1 return
iif realiface2 ct zone set 2 return
}
This works before the mentioned commit, but not after: zone assignment
is "forgotten", and any subsequent NAT or filtering that is dependent
on the conntrack zone does not work.
Here is a reproducer script that demonstrates the difference in behaviour.
==========
#!/bin/sh
# This script demonstrates unexpected change of nftables behaviour
# caused by commit 09e856d54bda5f28 ""vrf: Reset skb conntrack
# connection on VRF rcv"
# https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1
#
# Before the commit, it was possible to assign conntrack zone to a
# packet (or mark it for `notracking`) in the prerouting chanin, raw
# priority, based on the `iif` (interface from which the packet
# arrived).
# After the change, # if the interface is enslaved in a VRF, such
# assignment is lost. Instead, assignment based on the `iif` matching
# the VRF master interface is honored. Thus it is impossible to
# distinguish packets based on the original interface.
#
# This script demonstrates this change of behaviour: conntrack zone 1
# or 2 is assigned depending on the match with the original interface
# or the vrf master interface. It can be observed that conntrack entry
# appears in different zone in the kernel versions before and after
# the commit.
IPIN=172.30.30.1
IPOUT=172.30.30.2
PFXL=30
ip li sh vein >/dev/null 2>&1 && ip li del vein
ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf
nft list table testct >/dev/null 2>&1 && nft delete table testct
ip li add vein type veth peer veout
ip li add tvrf type vrf table 9876
ip li set veout master tvrf
ip li set vein up
ip li set veout up
ip li set tvrf up
/sbin/sysctl -w net.ipv4.conf.veout.accept_local=1
/sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0
ip addr add $IPIN/$PFXL dev vein
ip addr add $IPOUT/$PFXL dev veout
nft -f - <<__END__
table testct {
chain rawpre {
type filter hook prerouting priority raw;
iif { veout, tvrf } meta nftrace set 1
iif veout ct zone set 1 return
iif tvrf ct zone set 2 return
notrack
}
chain rawout {
type filter hook output priority raw;
notrack
}
}
__END__
uname -rv
conntrack -F
ping -W 1 -c 1 -I vein $IPOUT
conntrack -L
Signed-off-by: Eugene Crosser <[email protected]>
Acked-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
'drm/ttm/ttm_placement.h' included in
'drivers/gpu/drm/i915/selftests/mock_region.c' is duplicated.
It is also included on the 9 line.
Reported-by: Zeal Robot <[email protected]>
Signed-off-by: Ran Jianping <[email protected]>
Reviewed-by: Matthew Auld <[email protected]>
Signed-off-by: Matthew Auld <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Make sure the security buffer's length/offset are valid with regards to
the packet length.
Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Marios Makassikis <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
To avoid dictionary attacks (repeated session setups rapidly sent) to
connect to server, ksmbd make a delay of a 5 seconds on session setup
failure to make it harder to send enough random connection requests
to break into a server if a user insert the wrong password 10 times
in a row.
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests and
check the free size of response buffer for these requests.
Acked-by: Namjae Jeon <[email protected]>
Signed-off-by: Hyunchul Lee <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
First, fix nr_workers checks against max_workers, with max_worker
registration, it may pretty easily happen that nr_workers > max_workers.
Also, synchronise writing to acct->max_worker with wqe->lock. It's not
an actual problem, but as we don't care about io_wqe_create_worker(),
it's better than WRITE_ONCE()/READ_ONCE().
Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
Reported-by: Beld Zhang <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/11f90e6b49410b7d1a88f5d04fb8d95bb86b8cf3.1634671835.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
If we return before the end of the 'for_each_child_of_node()' iterator, the
reference taken on 'port' must be released.
Add the missing 'of_node_put()' calls.
Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Otherwise, hw_id_name string is NULL for SDMA 2 and 3 when dumping
ip version from VBIOS.
Signed-off-by: Guchun Chen <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Output a warning message if RAS TA returns UNSUPPORTED_ERROR_INJ status.
v2: implement it in psp_ras_ta_check_status function.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Create new function to check status returned by RAS TA.
Signed-off-by: Tao Zhou <[email protected]>
Reviewed-by: Hawking Zhang <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 1445d967fb915156aed0e79ca80c239cb2d414ce.
This patch introduced a couple of dmesg warnings, this is not a valid
approach anymore. For this reason, we are reverting this patch, and we
need to revert the workaround patch.
v2: squash in unused variable removal (Alex)
Cc: Hanghong Ma <[email protected]>
Cc: Mark Broadworth <[email protected]>
Signed-off-by: Agustin Gutierrez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
This reverts commit 4874ecf5fd1de008756a76796bf255d3b19715c1.
Signed-off-by: Agustin Gutierrez <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|