|
The purpose here is to avoid ptp4l failing due to this condition:
timed out while polling for tx timestamp
increasing tx_timestamp_timeout may correct this issue, but it is likely caused by a driver bug
port 1: send peer delay request failed
So either reset the switch before the management frame was sent, or
after it was timestamped as well, but not in the middle.
The condition may arise either due to a true timeout (i.e. because
re-uploading the static config takes time), or due to the TX timestamp
actually getting lost due to reset. For the former we can increase
tx_timestamp_timeout in userspace, for the latter we need this patch.
Locking all traffic during switch reset does not make sense at all,
though. Forcing all CPU-originated traffic to potentially block waiting
for a sleepable context to send > 800 bytes over SPI is not a good idea.
Flows that are autonomously forwarded by the switch will get dropped
anyway during switch reset no matter what. So just let all other
CPU-originated traffic be dropped as well.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The PTP time of the switch is not preserved when uploading a new static
configuration. Work around this hardware oddity by reading its PTP time
before a static config upload, and restoring it afterwards.
Static config changes are expected to occur at runtime even in scenarios
directly related to PTP, e.g. the Time-Aware Scheduler of the switch is
programmed in this way.
Perhaps the larger implication of this patch is that the PTP .gettimex64
and .settime functions need to be exposed to sja1105_main.c, where the
PTP lock needs to be held during this entire process. So their core
implementation needs to move to some common functions which get exposed
in sja1105_ptp.h.
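The save/restore idea can be sketched in plain C as below. This is only a model under stated assumptions: struct fake_switch and both functions are hypothetical stand-ins, not the driver's API, and the upload is simulated by zeroing the clock. A real driver would hold the PTP lock across the sequence, and would presumably also want to account for the time that passes during the upload.

```c
#include <stdint.h>

/* Hypothetical stand-in for the switch: only the state relevant here. */
struct fake_switch {
	uint64_t ptp_time;	/* current PTP clock value */
	uint32_t config_id;	/* identifies the loaded static config */
};

/* Models the hardware oddity: uploading a config loses the PTP time. */
void fake_upload_static_config(struct fake_switch *sw, uint32_t config_id)
{
	sw->config_id = config_id;
	sw->ptp_time = 0;
}

/* Read the PTP time before the upload and write it back afterwards. */
void upload_preserving_ptp_time(struct fake_switch *sw, uint32_t config_id)
{
	uint64_t saved = sw->ptp_time;	/* driver: done under the PTP lock */

	fake_upload_static_config(sw, config_id);
	sw->ptp_time = saved;		/* restore what the upload destroyed */
}
```

Calling upload_preserving_ptp_time() on a switch whose ptp_time is 1000 leaves ptp_time at 1000 while config_id changes.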
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
applications (e.g. phc2sys) to compensate for the delays incurred while
reading the PHC's time.
The task itself of taking the software timestamp is delegated to the SPI
subsystem, through the newly introduced API in struct spi_transfer. The
goal is to cross-timestamp I/O operations on the switch's PTP clock with
values in the local system clock (CLOCK_REALTIME). For that we need to
understand a bit of the hardware internals.
The 'read PTP time' message is a 12-byte structure, the first 4 bytes of
which represent the SPI header, and the last 8 bytes represent the
64-bit PTP time. The switch itself starts processing the command
immediately after receiving the last bit of the address, i.e. at the
middle of byte 3 (last byte of header). The PTP time is shadowed to a
buffer register in the switch, and retrieved atomically during the
subsequent SPI frames.
A similar thing goes on for the 'write PTP time' message, although in
that case the switch waits until the 64-bit PTP time becomes fully
available before taking any action. So the byte that needs to be
software-timestamped is byte 11 (last) of the transfer.
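As a rough illustration of the two cases above, the byte to software-timestamp can be derived from the message layout. The macro, enum and function names below are hypothetical and exist only for this sketch; only the sizes (4-byte header, 8-byte time) and the byte positions (3 for a read, 11 for a write) come from the description.

```c
#include <stddef.h>

/* Layout from the description: 4-byte SPI header + 64-bit PTP time. */
#define PTP_MSG_HEADER_LEN	4
#define PTP_MSG_TIME_LEN	8
#define PTP_MSG_LEN		(PTP_MSG_HEADER_LEN + PTP_MSG_TIME_LEN)

/* Hypothetical names for the two message directions. */
enum ptp_msg_dir { PTP_MSG_READ, PTP_MSG_WRITE };

/*
 * Return the 0-based index of the byte whose transfer should be
 * software-timestamped: the last header byte for a read (the switch
 * latches the time once the address is complete), the last byte of
 * the transfer for a write (the switch waits for the full 64 bits).
 */
size_t ptp_msg_timestamp_byte(enum ptp_msg_dir dir)
{
	if (dir == PTP_MSG_READ)
		return PTP_MSG_HEADER_LEN - 1;	/* byte 3 */
	return PTP_MSG_LEN - 1;			/* byte 11 */
}
```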
The patch creates a common (and local) sja1105_xfer implementation for
the SPI I/O, and offers 3 front-ends:
- sja1105_xfer_u32 and sja1105_xfer_u64: these are capable of optionally
requesting a PTP timestamp
- sja1105_xfer_buf: this is for large transfers (e.g. the static config
buffer) and other misc data, and there is no point in giving
timestamping capabilities to this.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"There's an inadvertent preemption point in ptrace_stop() which was
reliably triggering for a test scenario, significantly slowing it down.
This contains Oleg's fix to remove the unwanted preemption point"
* 'for-5.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: freezer: call cgroup_enter_frozen() with preemption disabled in ptrace_stop()
|
|
When RoCE is disabled, load mlx5_ib in the raw_eth profile.
Clean up the pf_profile RoCE capability checks, as that profile will not be
used without the RoCE capability.
Signed-off-by: Michael Guralnik <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Rename uplink_rep_profile and its unique init and cleanup stages to
suit its upcoming use as the profile when RoCE is disabled.
Signed-off-by: Michael Guralnik <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Register the "enable_roce" param; the default value is RoCE enabled.
The current configuration is stored on mlx5_core_dev and exposed to the user
through the cmode runtime devlink param.
Changing the configuration requires changing the cmode driverinit devlink
param and calling devlink reload.
Signed-off-by: Michael Guralnik <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add documentation for the devlink param currently supported by mlx5.
Signed-off-by: Michael Guralnik <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
New device parameter to enable/disable handling of RoCE traffic in the
device.
Signed-off-by: Michael Guralnik <[email protected]>
Acked-by: Jiri Pirko <[email protected]>
Reviewed-by: Maor Gottlieb <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
During rename exchange we might have successfully logged the new name in the
source root's log tree, in which case we leave our log context (allocated
on the stack) in the root's list of log contexts. However we might fail to
log the new name in the destination root, in which case we fall back to
a transaction commit later and never sync the log of the source root,
which causes the source root log context to remain in the list of log
contexts. This later causes invalid memory accesses because the context
was allocated on the stack and after rename exchange finishes the stack gets
reused and overwritten for other purposes.
The kernel's linked list corruption detector (CONFIG_DEBUG_LIST=y) can
detect this and report something like the following:
[ 691.489929] ------------[ cut here ]------------
[ 691.489947] list_add corruption. prev->next should be next (ffff88819c944530), but was ffff8881c23f7be4. (prev=ffff8881c23f7a38).
[ 691.489967] WARNING: CPU: 2 PID: 28933 at lib/list_debug.c:28 __list_add_valid+0x95/0xe0
(...)
[ 691.489998] CPU: 2 PID: 28933 Comm: fsstress Not tainted 5.4.0-rc6-btrfs-next-62 #1
[ 691.490001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
[ 691.490003] RIP: 0010:__list_add_valid+0x95/0xe0
(...)
[ 691.490007] RSP: 0018:ffff8881f0b3faf8 EFLAGS: 00010282
[ 691.490010] RAX: 0000000000000000 RBX: ffff88819c944530 RCX: 0000000000000000
[ 691.490011] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa2c497e0
[ 691.490013] RBP: ffff8881f0b3fe68 R08: ffffed103eaa4115 R09: ffffed103eaa4114
[ 691.490015] R10: ffff88819c944000 R11: ffffed103eaa4115 R12: 7fffffffffffffff
[ 691.490016] R13: ffff8881b4035610 R14: ffff8881e7b84728 R15: 1ffff1103e167f7b
[ 691.490019] FS: 00007f4b25ea2e80(0000) GS:ffff8881f5500000(0000) knlGS:0000000000000000
[ 691.490021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 691.490022] CR2: 00007fffbb2d4eec CR3: 00000001f2a4a004 CR4: 00000000003606e0
[ 691.490025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 691.490027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 691.490029] Call Trace:
[ 691.490058] btrfs_log_inode_parent+0x667/0x2730 [btrfs]
[ 691.490083] ? join_transaction+0x24a/0xce0 [btrfs]
[ 691.490107] ? btrfs_end_log_trans+0x80/0x80 [btrfs]
[ 691.490111] ? dget_parent+0xb8/0x460
[ 691.490116] ? lock_downgrade+0x6b0/0x6b0
[ 691.490121] ? rwlock_bug.part.0+0x90/0x90
[ 691.490127] ? do_raw_spin_unlock+0x142/0x220
[ 691.490151] btrfs_log_dentry_safe+0x65/0x90 [btrfs]
[ 691.490172] btrfs_sync_file+0x9f1/0xc00 [btrfs]
[ 691.490195] ? btrfs_file_write_iter+0x1800/0x1800 [btrfs]
[ 691.490198] ? rcu_read_lock_any_held.part.11+0x20/0x20
[ 691.490204] ? __do_sys_newstat+0x88/0xd0
[ 691.490207] ? cp_new_stat+0x5d0/0x5d0
[ 691.490218] ? do_fsync+0x38/0x60
[ 691.490220] do_fsync+0x38/0x60
[ 691.490224] __x64_sys_fdatasync+0x32/0x40
[ 691.490228] do_syscall_64+0x9f/0x540
[ 691.490233] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 691.490235] RIP: 0033:0x7f4b253ad5f0
(...)
[ 691.490239] RSP: 002b:00007fffbb2d6078 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[ 691.490242] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4b253ad5f0
[ 691.490244] RDX: 00007fffbb2d5fe0 RSI: 00007fffbb2d5fe0 RDI: 0000000000000003
[ 691.490245] RBP: 000000000000000d R08: 0000000000000001 R09: 00007fffbb2d608c
[ 691.490247] R10: 00000000000002e8 R11: 0000000000000246 R12: 00000000000001f4
[ 691.490248] R13: 0000000051eb851f R14: 00007fffbb2d6120 R15: 00005635a498bda0
This started happening recently when running some test cases from fstests
like btrfs/004 for example, because support for rename exchange was added
last week to fsstress from fstests.
So fix this by deleting the log context for the source root from the list
if we have logged the new name in the source root.
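The failure mode and the fix can be modelled in userspace with a minimal doubly linked list. Everything below is a hypothetical sketch (the list helpers, struct log_ctx and rename_exchange_sketch() are not btrfs code); it only shows that a stack-allocated node must be unlinked on every path before the function returns.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct list_node { struct list_node *prev, *next; };

void list_init(struct list_node *h) { h->prev = h->next = h; }

void list_add_node(struct list_node *n, struct list_node *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

bool list_is_empty(const struct list_node *h) { return h->next == h; }

/* Hypothetical log context; only the list linkage matters here. */
struct log_ctx { struct list_node list; };

/*
 * Models a rename exchange. 'dest_fails' stands for failing to log the
 * new name in the destination root, after which no log sync happens.
 */
void rename_exchange_sketch(struct list_node *src_root_ctxs,
			    bool logged_in_src, bool dest_fails)
{
	struct log_ctx ctx;	/* lives only in this stack frame */

	list_init(&ctx.list);
	if (logged_in_src)
		list_add_node(&ctx.list, src_root_ctxs);

	if (!dest_fails) {
		/* Modelled success path: syncing the log unlinks ctx. */
		list_del_node(&ctx.list);
		return;
	}

	/* The fix: unlink ctx here too, or the list keeps a dangling node. */
	if (logged_in_src)
		list_del_node(&ctx.list);
}
```

Without the final list_del_node(), the root's list would still point into a stack frame that no longer exists once the function returns.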
Reported-by: Su Yue <[email protected]>
Fixes: d4682ba03ef618 ("Btrfs: sync log after logging new name")
CC: [email protected] # 4.19+
Tested-by: Su Yue <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small changes: two in the core and one in the qla2xxx driver.
The sg_tablesize fix affects a thinko in the migration to blk-mq of
certain legacy drivers which could cause an oops and the sd core
change should only affect zoned block devices which were wrongly
suppressing error messages for reset all zones"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: Handle drivers which set sg_tablesize to zero
scsi: qla2xxx: fix NPIV tear down process
scsi: sd_zbc: Fix sd_zbc_complete()
|
|
When a jump_whitelist bitmap is reused, it needs to be cleared.
Currently this is done with memset() and the size calculation assumes
bitmaps are made of 32-bit words, not longs. So on 64-bit
architectures, only the first half of the bitmap is cleared.
If some whitelist bits are carried over between successive batches
submitted on the same context, this will presumably allow embedding
the rogue instructions that we're trying to reject.
Use bitmap_zero() instead, which gets the calculation right.
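The size mismatch can be shown with a small userspace calculation. The helper names are made up for this sketch, and the exact shape of the buggy expression (a word count in longs multiplied by the size of a 32-bit word) is an assumption based on the description above, not a quote of the driver code.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the rounding done by BITS_TO_LONGS(). */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

size_t bits_to_longs(size_t nbits)
{
	return (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;
}

/* Assumed shape of the bug: counts longs, but sizes them as u32. */
size_t buggy_clear_bytes(size_t nbits)
{
	return bits_to_longs(nbits) * sizeof(uint32_t);
}

/* What a bitmap_zero()-style clear covers: whole longs. */
size_t correct_clear_bytes(size_t nbits)
{
	return bits_to_longs(nbits) * sizeof(unsigned long);
}
```

On a platform where unsigned long is 8 bytes, buggy_clear_bytes() returns exactly half of correct_clear_bytes(); where it is 4 bytes the two agree, which is why the problem only shows on 64-bit architectures.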
Fixes: f8c08d8faee5 ("drm/i915/cmdparser: Add support for backward jumps")
Signed-off-by: Ben Hutchings <[email protected]>
Signed-off-by: Jon Bloomfield <[email protected]>
|
|
Reported by syzkaller:
=============================
WARNING: suspicious RCU usage
-----------------------------
./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
no locks held by repro_11/12688.
stack backtrace:
Call Trace:
dump_stack+0x7d/0xc5
lockdep_rcu_suspicious+0x123/0x170
kvm_dev_ioctl+0x9a9/0x1260 [kvm]
do_vfs_ioctl+0x1a1/0xfb0
ksys_ioctl+0x6d/0x80
__x64_sys_ioctl+0x73/0xb0
do_syscall_64+0x108/0xaa0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Commit a97b0e773e4 (kvm: call kvm_arch_destroy_vm if vm creation fails)
sets users_count to 1 before kvm_arch_init_vm(), however, if kvm_arch_init_vm()
fails, we need to decrease this count. By moving it earlier, we can push
the decrease to out_err_no_arch_destroy_vm without introducing yet another
error label.
syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=15209b84e00000
Reported-by: [email protected]
Fixes: a97b0e773e49 ("kvm: call kvm_arch_destroy_vm if vm creation fails")
Cc: Jim Mattson <[email protected]>
Analyzed-by: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Reported by syzkaller:
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 14727 Comm: syz-executor.3 Not tainted 5.4.0-rc4+ #0
RIP: 0010:kvm_coalesced_mmio_init+0x5d/0x110 arch/x86/kvm/../../../virt/kvm/coalesced_mmio.c:121
Call Trace:
kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3446 [inline]
kvm_dev_ioctl+0x781/0x1490 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3494
vfs_ioctl fs/ioctl.c:46 [inline]
file_ioctl fs/ioctl.c:509 [inline]
do_vfs_ioctl+0x196/0x1150 fs/ioctl.c:696
ksys_ioctl+0x62/0x90 fs/ioctl.c:713
__do_sys_ioctl fs/ioctl.c:720 [inline]
__se_sys_ioctl fs/ioctl.c:718 [inline]
__x64_sys_ioctl+0x6e/0xb0 fs/ioctl.c:718
do_syscall_64+0xca/0x5d0 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Commit 9121923c457d ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
moves the memslots and buses allocations around. However, if kvm->srcu/irq_srcu fails
initialization, NULL is returned instead of an error code. That NULL is not intercepted
in kvm_dev_ioctl_create_vm() and is then dereferenced by kvm_coalesced_mmio_init(). This
patch fixes it.
Moving the initialization is required anyway to avoid an incorrect synchronize_srcu that
was also reported by syzkaller:
wait_for_completion+0x29c/0x440 kernel/sched/completion.c:136
__synchronize_srcu+0x197/0x250 kernel/rcu/srcutree.c:921
synchronize_srcu_expedited kernel/rcu/srcutree.c:946 [inline]
synchronize_srcu+0x239/0x3e8 kernel/rcu/srcutree.c:997
kvm_page_track_unregister_notifier+0xe7/0x130 arch/x86/kvm/page_track.c:212
kvm_mmu_uninit_vm+0x1e/0x30 arch/x86/kvm/mmu.c:5828
kvm_arch_destroy_vm+0x4a2/0x5f0 arch/x86/kvm/x86.c:9579
kvm_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:702 [inline]
so do it.
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: 9121923c457d ("kvm: Allocate memslots and buses before calling kvm_arch_init_vm")
Cc: Jim Mattson <[email protected]>
Cc: Wanpeng Li <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Apply the same logic to pin setup as on previous platforms. Fixes
errors in HDMI/DP playback.
Tested with both snd-hda-intel and SOF drivers.
Fixes: 9a11ba7388f1 ("ALSA: hda: hdmi - add Tigerlake support")
Signed-off-by: Kai Vehmanen <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
With the latest LLVM compiler, running test_progs will have the following
verifier failure for test_sysctl_loop1.o:
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
invalid indirect read from stack var_off (0x0; 0xff)+196 size 7
...
libbpf: -- END LOG --
libbpf: failed to load program 'cgroup/sysctl'
libbpf: failed to load object 'test_sysctl_loop1.o'
The related bytecode looks as below:
0000000000000308 LBB0_8:
97: r4 = r10
98: r4 += -288
99: r4 += r7
100: w8 &= 255
101: r1 = r10
102: r1 += -488
103: r1 += r8
104: r2 = 7
105: r3 = 0
106: call 106
107: w1 = w0
108: w1 += -1
109: if w1 > 6 goto -24 <LBB0_5>
110: w0 += w8
111: r7 += 8
112: w8 = w0
113: if r7 != 224 goto -17 <LBB0_8>
And source code:
for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
tcp_mem + i);
if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
return 0;
off += ret & MAX_ULONG_STR_LEN;
}
The current verifier is not able to conclude that register w0 before '+'
at insn 110 has a range of 1 to 7 and thinks it ranges from 0 to 255. This
leads to a more conservative range for w8 at insn 112, and to the later
verifier complaint.
Let us work around this issue until we find a compiler and/or verifier
solution. The workaround in this patch is to make the variable 'ret' volatile,
which forces a reload followed by the '&' operation, ensuring a better value
range. With this patch, I got the below byte code for the loop:
0000000000000328 LBB0_9:
101: r4 = r10
102: r4 += -288
103: r4 += r7
104: w8 &= 255
105: r1 = r10
106: r1 += -488
107: r1 += r8
108: r2 = 7
109: r3 = 0
110: call 106
111: *(u32 *)(r10 - 64) = r0
112: r1 = *(u32 *)(r10 - 64)
113: if w1 s< 1 goto -28 <LBB0_5>
114: r1 = *(u32 *)(r10 - 64)
115: if w1 s> 7 goto -30 <LBB0_5>
116: r1 = *(u32 *)(r10 - 64)
117: w1 &= 7
118: w1 += w8
119: r7 += 8
120: w8 = w1
121: if r7 != 224 goto -21 <LBB0_9>
Insn 117 did the '&' operation and we got more precise value range
for 'w8' at insn 120. The test is happy then:
#3/17 test_sysctl_loop1.o:OK
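For reference, the source-level shape of the workaround can be sketched as a userspace analogue of the loop. parse_ulong() is a hypothetical stand-in for bpf_strtoul() built on strtoul(), and N_VALUES and parse_tcp_mem() are names invented here. In userspace the volatile qualifier changes nothing observable; the sketch only shows where the qualifier goes and why masking with 7 keeps the offset increment within 0..7.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ULONG_STR_LEN 7	/* matches "r2 = 7" and "w1 &= 7" in the dumps */
#define N_VALUES 3		/* hypothetical: number of values to parse */

/*
 * Hypothetical stand-in for bpf_strtoul(): parse at most 'len' characters
 * and return the number of characters consumed, or a negative errno.
 */
int parse_ulong(const char *buf, size_t len, unsigned long *res)
{
	char tmp[MAX_ULONG_STR_LEN + 1];
	char *end;

	if (len > MAX_ULONG_STR_LEN)
		return -EINVAL;
	strncpy(tmp, buf, len);		/* stops at NUL, never reads past it */
	tmp[len] = '\0';
	*res = strtoul(tmp, &end, 10);
	if (end == tmp)
		return -EINVAL;		/* no digits found */
	return (int)(end - tmp);
}

/* Returns 1 if all values were parsed, 0 otherwise. */
int parse_tcp_mem(const char *value, unsigned long tcp_mem[N_VALUES])
{
	volatile int ret;	/* the workaround: forces a reload before '&' */
	unsigned int off = 0;
	int i;

	for (i = 0; i < N_VALUES; ++i) {
		ret = parse_ulong(value + off, MAX_ULONG_STR_LEN, tcp_mem + i);
		if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
			return 0;
		off += ret & MAX_ULONG_STR_LEN;	/* masked value is in 0..7 */
	}
	return 1;
}
```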
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add HD Audio Device PCI ID for the Intel Cometlake-S platform.
Signed-off-by: Chiou, Cooper <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Magnus Karlsson says:
====================
This patch set extends libbpf and the xdpsock sample program to
demonstrate the shared umem mode (XDP_SHARED_UMEM) as well as Rx-only
and Tx-only sockets. This is so that users have an example to use
as a blueprint and also so that these modes will be exercised more
frequently.
Note that the user needs to supply an XDP program with the
XDP_SHARED_UMEM mode that distributes the packets over the sockets
according to some policy. There is an example supplied with the
xdpsock program, but there is no default one in libbpf, unlike
when XDP_SHARED_UMEM is not used. The reason for this is that I felt
that supplying one that would work for all users in this mode is
futile. There are just tons of ways to distribute packets, so whatever
I come up with and build into libbpf would be wrong in most cases.
This patch has been applied against commit 30ee348c1267 ("Merge branch 'bpf-libbpf-fixes'")
Structure of the patch set:
Patch 1: Adds shared umem support to libbpf
Patch 2: Shared umem support and example XDP program added to xdpsock sample
Patch 3: Adds Rx-only and Tx-only support to libbpf
Patch 4: Uses Rx-only sockets for rxdrop and Tx-only sockets for txpush in
the xdpsock sample
Patch 5: Add documentation entries for these two features
====================
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Add more documentation about the new Rx-only and Tx-only sockets in
libbpf and also how libbpf can now support shared umems. Also fix
two pieces of the text that could be improved.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: William Tu <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Use Rx-only sockets for the rxdrop sample and Tx-only sockets for the
txpush sample in the xdpsock application. This is so that we exercise and
showcase these socket types too.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: William Tu <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The libbpf AF_XDP code is extended to allow for the creation of Rx-only
or Tx-only sockets. Previously it returned an error if the socket
was not initialized for both Rx and Tx.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: William Tu <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add support for the XDP_SHARED_UMEM mode to the xdpsock sample
application. As libbpf does not have a built-in XDP program for this
mode, we use an explicitly loaded XDP program. This also serves as an
example of how to write your own XDP program that can route to an
AF_XDP socket.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: William Tu <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Add support in libbpf to create multiple sockets that share a single
umem. Note that an external XDP program needs to be supplied that
routes the incoming traffic to the desired sockets. So you need to
supply the libbpf_flag XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD and load
your own XDP program.
Signed-off-by: Magnus Karlsson <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Tested-by: William Tu <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Toke Høiland-Jørgensen says:
====================
This series fixes a few bugs in libbpf that I discovered while playing around
with the new auto-pinning code, and writing the first utility in xdp-tools[0]:
- If object loading fails, libbpf does not clean up the pinnings created by the
auto-pinning mechanism.
- EPERM is not propagated to the caller on program load
- Netlink functions write error messages directly to stderr
In addition, libbpf currently only has a somewhat limited getter function for
XDP link info, which makes it impossible to discover whether an attached program
is in SKB mode or not. So the last patch in the series adds a new getter for XDP
link info which returns all the information returned via netlink (and which can
be extended later).
Finally, add a getter for BPF program size, which can be used by the caller to
estimate the amount of locked memory needed to load a program.
A selftest is added for the pinning change, while the other features were tested
in the xdp-filter tool from the xdp-tools repo. The 'new-libbpf-features' branch
contains the commits that make use of the new XDP getter and the corrected EPERM
error code.
[0] https://github.com/xdp-project/xdp-tools
Changelog:
v4:
- Don't do any size checks on struct xdp_info, just copy (and/or zero)
whatever size the caller supplied.
v3:
- Pass through all kernel error codes on program load (instead of just EPERM).
- No new bpf_object__unload() variant, just do the loop at the caller
- Don't reject struct xdp_info sizes that are bigger than what we expect.
- Add a comment noting that bpf_program__size() returns the size in bytes
v2:
- Keep function names in libbpf.map sorted properly
====================
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This adds a new getter for the BPF program size (in bytes). This is useful
for a caller that is trying to predict how much memory will be locked by
loading a BPF object into the kernel.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Currently, libbpf only provides a function to get a single ID for the XDP
program attached to the interface. However, it can be useful to get the
full set of program IDs attached, along with the attachment mode, in one
go. Add a new getter function to support this, using an extensible
structure to carry the information. Express the old bpf_get_link_id()
function in terms of the new function.
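One common way to keep such a structure extensible is to let the caller pass the size it was compiled against, copy as much as fits and zero the rest. The sketch below illustrates that pattern only; struct link_info, its fields and copy_link_info() are hypothetical and are not libbpf's definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical info structure; new fields may be appended over time. */
struct link_info {
	uint32_t prog_id;
	uint32_t drv_prog_id;
	uint32_t skb_prog_id;
	uint8_t attach_mode;
};

/*
 * Copy 'src' into a caller buffer of 'dst_size' bytes without rejecting
 * any size: an older (smaller) caller gets a truncated copy, a newer
 * (larger) caller gets the known fields plus zeroed trailing bytes.
 */
void copy_link_info(void *dst, size_t dst_size, const struct link_info *src)
{
	size_t n = dst_size < sizeof(*src) ? dst_size : sizeof(*src);

	memcpy(dst, src, n);
	memset((char *)dst + n, 0, dst_size - n);
}
```

A caller that only knows the first field can pass a 4-byte buffer and still receive prog_id.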
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
The netlink functions were using fprintf(stderr, ) directly to print out
error messages, instead of going through the usual logging macros. This
makes it impossible for the calling application to silence or redirect
those error messages. Fix this by switching to pr_warn() in nlattr.c and
netlink.c.
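The benefit of routing messages through a logging helper can be sketched with a settable print callback. The names below (print_fn_t, set_print, lib_warn, counting_print) are hypothetical and simplified; they illustrate the indirection and are not libbpf's actual logging interface.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical callback type: receives the format and its arguments. */
typedef int (*print_fn_t)(const char *fmt, va_list ap);

static int default_print(const char *fmt, va_list ap)
{
	return vfprintf(stderr, fmt, ap);
}

static print_fn_t current_print = default_print;

/* Let the application install its own sink; NULL silences output. */
void set_print(print_fn_t fn)
{
	current_print = fn;
}

/* Library-internal warning helper: never writes to stderr directly. */
void lib_warn(const char *fmt, ...)
{
	va_list ap;

	if (!current_print)
		return;
	va_start(ap, fmt);
	current_print(fmt, ap);
	va_end(ap);
}

/* Example application sink: counts messages instead of printing them. */
int warn_count;

int counting_print(const char *fmt, va_list ap)
{
	(void)fmt;
	(void)ap;
	return ++warn_count;
}
```

After set_print(counting_print), a call to lib_warn() increments warn_count and prints nothing. A direct fprintf(stderr, ...) inside the library would bypass the callback entirely, which is the problem described above.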
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
When loading an eBPF program, libbpf overrides the return code for EPERM
errors instead of returning it to the caller. This makes it hard to figure
out what went wrong on load.
In particular, EPERM is returned when the system rlimit is too low to lock
the memory required for the BPF program. Previously, this was somewhat
obscured because the rlimit error would be hit on map creation (which does
return it correctly). However, since maps can now be reused, object load
can proceed all the way to loading programs without hitting the error;
propagating it even in this case makes it possible for the caller to react
appropriately (and, e.g., attempt to raise the rlimit before retrying).
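A caller-side reaction to the propagated error might look like the sketch below. load_fn_t, load_with_memlock_retry() and fail_once() are hypothetical names; only getrlimit()/setrlimit() with RLIMIT_MEMLOCK are real interfaces. The sketch raises the soft limit to the hard limit, which an unprivileged process is allowed to do, and it assumes that -EPERM was caused by the memlock limit, which need not be the case.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical loader callback: returns 0 or a negative errno. */
typedef int (*load_fn_t)(void *arg);

/* Raise the soft RLIMIT_MEMLOCK limit to the current hard limit. */
int raise_memlock_to_hard_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl))
		return -errno;
	rl.rlim_cur = rl.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &rl))
		return -errno;
	return 0;
}

/* Try to load; on -EPERM raise the limit once and retry. */
int load_with_memlock_retry(load_fn_t load, void *arg)
{
	int err = load(arg);

	if (err != -EPERM)
		return err;
	if (raise_memlock_to_hard_limit())
		return err;	/* could not raise: report the original error */
	return load(arg);
}

/* Example callback for illustration: fails with -EPERM on the first call. */
int fail_once(void *arg)
{
	int *calls = arg;

	return (*calls)++ == 0 ? -EPERM : 0;
}
```

This only works if the loader reports -EPERM unchanged; with the error code overridden, the caller has nothing to key the retry on.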
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
This adds tests for the different variations of automatic map unpinning on
load failure.
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Since the automatic map-pinning happens during load, it will leave pinned
maps around if the load fails at a later stage. Fix this by unpinning any
pinned maps on cleanup. To avoid unpinning pinned maps that were reused
rather than newly pinned, add a new boolean property on struct bpf_map to
keep track of whether that map was reused or not; and only unpin those maps
that were not reused.
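The cleanup rule can be sketched as a loop over a simplified map record. struct map_rec and unpin_new_maps() are hypothetical and much smaller than libbpf's struct bpf_map; the point is only the condition that guards the unpin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified map record; not libbpf's struct bpf_map. */
struct map_rec {
	bool pinned;	/* currently pinned */
	bool reused;	/* already existed (and was pinned) before this load */
};

/*
 * On load failure, unpin only the maps this load pinned itself.
 * Returns the number of maps that were unpinned.
 */
size_t unpin_new_maps(struct map_rec *maps, size_t n)
{
	size_t i, unpinned = 0;

	for (i = 0; i < n; i++) {
		if (maps[i].pinned && !maps[i].reused) {
			maps[i].pinned = false;	/* real code would remove the pin here */
			unpinned++;
		}
	}
	return unpinned;
}
```

A reused map keeps its pin, so a failed load does not disturb state that some earlier, successful load set up.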
Fixes: 57a00f41644f ("libbpf: Add auto-pinning of maps when loading BPF objects")
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David S. Miller <[email protected]>
Acked-by: Song Liu <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A set of fixes that have trickled in over the last couple of weeks:
- MAINTAINER update for Cavium/Marvell ThunderX2
- stm32 tweaks to pinmux for Joystick/Camera, and RAM allocation for
CAN interfaces
- i.MX fixes for voltage regulator GPIO mappings, fixes voltage
scaling issues
- More i.MX fixes for various issues on i.MX eval boards: interrupt
storm due to u-boot leaving pins in new states, fixing power button
config, a couple of compatible-string corrections.
- Powerdown and Suspend/Resume fixes for Allwinner A83-based tablets
- A few documentation tweaks and a fix of a memory leak in the reset
subsystem"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
MAINTAINERS: update Cavium ThunderX2 maintainers
ARM: dts: stm32: change joystick pinctrl definition on stm32mp157c-ev1
ARM: dts: stm32: remove OV5640 pinctrl definition on stm32mp157c-ev1
ARM: dts: stm32: Fix CAN RAM mapping on stm32mp157c
ARM: dts: stm32: relax qspi pins slew-rate for stm32mp157
arm64: dts: zii-ultra: fix ARM regulator GPIO handle
ARM: sunxi: Fix CPU powerdown on A83T
ARM: dts: sun8i-a83t-tbs-a711: Fix WiFi resume from suspend
arm64: dts: imx8mn: fix compatible string for sdma
arm64: dts: imx8mm: fix compatible string for sdma
reset: fix reset_control_ops kerneldoc comment
ARM: dts: imx6-logicpd: Re-enable SNVS power key
soc: imx: gpc: fix initialiser format
ARM: dts: imx6qdl-sabreauto: Fix storm of accelerometer interrupts
arm64: dts: ls1028a: fix a compatible issue
reset: fix reset_control_get_exclusive kerneldoc comment
reset: fix reset_control_lookup kerneldoc comment
reset: fix of_reset_control_get_count kerneldoc comment
reset: fix of_reset_simple_xlate kerneldoc comment
reset: Fix memory leak in reset_control_array_put()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull IIO fixes and staging driver from Greg KH:
"Here is a mix of a number of IIO driver fixes for 5.4-rc7, and a whole
new staging driver.
The IIO fixes resolve some reported issues, all are tiny.
The staging driver addition is the vboxsf filesystem, which is the
VirtualBox guest shared folder code. Hans has been trying to get
filesystem reviewers to review the code for many months now, and
Christoph finally said to just merge it in staging now as it is
stand-alone and the filesystem people can review it easier over time
that way.
I know it's late for this big of an addition, but it is stand-alone.
The code has been in linux-next for a while, long enough to pick up a
few tiny fixes for it already so people are looking at it.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: Fix error return code in vboxsf_fill_super()
staging: vboxsf: fix dereference of pointer dentry before it is null checked
staging: vboxsf: Remove unused including <linux/version.h>
staging: Add VirtualBox guest shared folder (vboxsf) support
iio: adc: stm32-adc: fix stopping dma
iio: imu: inv_mpu6050: fix no data on MPU6050
iio: srf04: fix wrong limitation in distance measuring
iio: imu: adis16480: make sure provided frequency is positive
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of late-arrival driver fixes for issues reported for
some char/misc drivers for 5.4-rc7.
These all come from the different subsystem/driver maintainers as
things that they had reports for and wanted to see fixed.
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
intel_th: pci: Add Jasper Lake PCH support
intel_th: pci: Add Comet Lake PCH support
intel_th: msu: Fix possible memory leak in mode_store()
intel_th: msu: Fix overflow in shift of an unsigned int
intel_th: msu: Fix missing allocation failure check on a kstrndup
intel_th: msu: Fix an uninitialized mutex
intel_th: gth: Fix the window switching sequence
soundwire: slave: fix scanf format
soundwire: intel: fix intel_register_dai PDI offsets and numbers
interconnect: Add locking in icc_set_tag()
interconnect: qcom: Fix icc_onecell_data allocation
soundwire: depend on ACPI || OF
soundwire: depend on ACPI
thunderbolt: Drop unnecessary read when writing LC command in Ice Lake
thunderbolt: Fix lockdep circular locking depedency warning
thunderbolt: Read DP IN adapter first two dwords in one go
|
|
Pull configfs regression fix from Christoph Hellwig:
"Fix a regression from this merge window in the configfs symlink
handling (Honggang Li)"
* tag 'configfs-for-5.4-2' of git://git.infradead.org/users/hch/configfs:
configfs: calculate the depth of parent item
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Make the tsc=reliable/nowatchdog command line parameter work again.
It was broken with the introduction of the early TSC clocksource.
- Prevent the evaluation of exception stacks before they are set up.
This causes a crash in dumpstack because the stack walk termination
gets screwed up.
- Prevent a NULL pointer dereference in the resource control file
system.
- Avoid bogus warnings about APIC id mismatch related to the LDR
which can happen when the LDR is not in use and therefore not
initialized. Only evaluate that when the APIC is in logical
destination mode"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early
x86/dumpstack/64: Don't evaluate exception stacks before setup
x86/apic/32: Avoid bogus LDR warnings
x86/resctrl: Prevent NULL pointer dereference when reading mondata
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A small set of fixes for timekeeping and clocksource drivers:
- VDSO data was updated conditional on the availability of a VDSO
capable clocksource. This causes the VDSO functions which do not
depend on a VDSO capable clocksource to operate on stale data.
Always update unconditionally.
- Prevent a double free in the mediatek driver
- Use the proper helper in the sh_mtu2 driver so it won't attempt to
initialize non-existing interrupts"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping/vsyscall: Update VDSO data unconditionally
clocksource/drivers/sh_mtu2: Do not loop using platform_get_irq_by_name()
clocksource/drivers/mediatek: Fix error handling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
"Two fixes for scheduler regressions:
- Plug a subtle race condition which was introduced with the rework
of the next task selection functionality. The change of task
properties became unprotected which can be observed inconsistently
causing state corruption.
- A trivial compile fix for CONFIG_CGROUPS=n"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Fix pick_next_task() vs 'change' pattern race
sched/core: Fix compilation error when cgroup not selected
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
- Fix the time sorting algorithm which was broken due to truncation of
big numbers
- Fix the Python script generator failure caused by a broken tracepoint
array iterator
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Fix time sorting
perf tools: Remove unused trace_find_next_event()
perf scripting engines: Iterate on tep event arrays directly
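
The time sorting item above mentions truncation of big numbers. A minimal sketch of that class of bug, assuming a comparator that subtracts two 64-bit timestamps and returns the result as an int; struct sample and both function names are hypothetical and are not taken from the perf sources:

```c
#include <stdint.h>

/* Hypothetical record type; not taken from the perf sources. */
struct sample {
	uint64_t time_ns;
};

/*
 * Truncating comparator: the 64-bit difference is narrowed to int, so
 * large gaps can come back as 0 or with the wrong sign.
 */
static int cmp_time_truncating(const struct sample *a, const struct sample *b)
{
	return (int)(a->time_ns - b->time_ns);
}

/* Three-way comparison: the 64-bit operands are never narrowed. */
static int cmp_time_safe(const struct sample *a, const struct sample *b)
{
	return (a->time_ns > b->time_ns) - (a->time_ns < b->time_ns);
}
```

With timestamps that differ by exactly 2^32 ns, the truncating variant typically reports the two samples as equal, while the three-way variant still orders them. Returning a wider type from the comparator is another way to avoid the narrowing.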
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixlet from Thomas Gleixner:
"A trivial fix for a kernel doc regression where an argument change was
not reflected in the documentation"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irq/irqdomain: Update __irq_domain_alloc_fwnode() function documentation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stacktrace fix from Thomas Gleixner:
"A small fix for a stacktrace regression.
Saving a stacktrace for a foreign task skipped an extra entry which
makes e.g. the output of /proc/$PID/stack incomplete"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
stacktrace: Don't skip first entry on noncurrent tasks
|
|
Pull cifs fix from Steve French:
"Small fix for an smb3 reconnect bug (also marked for stable)"
* tag '5.4-rc7-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
SMB3: Fix persistent handles reconnect
|
|
The config option GENERIC_IO was removed but is still selected by lib/Kconfig.
This patch finishes the cleanup.
Fixes: 9de8da47742b ("kconfig: kill off GENERIC_IO option")
Acked-by: Rob Herring <[email protected]>
Signed-off-by: Corentin Labbe <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We need to get the underlying dentry of parent; sure, absent the races
it is the parent of underlying dentry, but there's nothing to prevent
losing a timeslice to preemption in the middle of evaluation of
lower_dentry->d_parent->d_inode, having another process move lower_dentry
around and have its (ex)parent not pinned anymore and freed on memory
pressure. Then we regain CPU and try to fetch ->d_inode from memory
that is freed by that point.
dentry->d_parent *is* stable here - it's an argument of ->lookup() and
we are guaranteed that it won't be moved anywhere until we feed it
to d_add/d_splice_alias. So we safely go that way to get to its
underlying dentry.
Cc: [email protected] # since 2009 or so
Signed-off-by: Al Viro <[email protected]>
|
|
lower_dentry can't go from positive to negative (we have it pinned),
but it *can* go from negative to positive. So fetching ->d_inode
into a local variable, doing a blocking allocation, checking that
now ->d_inode is non-NULL and feeding the value we'd fetched
earlier to a function that won't accept NULL is not a good idea.
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
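
The pattern described in the message above can be modeled in userspace. This is only a sketch under stated assumptions: struct dentry and struct inode here are simplified stand-ins rather than the kernel types, the blocking allocation is replaced by a callback, and every function name is hypothetical.

```c
#include <stddef.h>

/* Userspace stand-ins; these are NOT the kernel's struct dentry/inode. */
struct inode { int ino; };
struct dentry { struct inode *d_inode; };

/* Hypothetical blocking step during which the dentry may turn positive. */
typedef void (*blocking_step_fn)(struct dentry *);

static struct inode demo_inode = { 42 };

/* Simulates another task making the dentry positive while we block. */
static void make_positive(struct dentry *d)
{
	d->d_inode = &demo_inode;
}

/* Racy: decides on the current value but hands back the earlier snapshot. */
static struct inode *pick_inode_racy(struct dentry *lower, blocking_step_fn step)
{
	struct inode *snapshot = lower->d_inode;	/* may be NULL here */

	step(lower);					/* negative -> positive can happen */
	if (lower->d_inode)
		return snapshot;			/* stale, possibly NULL */
	return NULL;
}

/* Safer: fetch once after the blocking step and use only that value. */
static struct inode *pick_inode_checked(struct dentry *lower, blocking_step_fn step)
{
	struct inode *inode;

	step(lower);
	inode = lower->d_inode;
	return inode;
}
```

Starting from a negative dentry, pick_inode_racy() returns NULL even though ->d_inode is non-NULL by the time it is checked, which is the value that would then reach a function that does not accept NULL. pick_inode_checked() tests and returns the same single read.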
|
|
A problem similar to the one caught in commit 74dd7c97ea2a ("ecryptfs_rename():
verify that lower dentries are still OK after lock_rename()") exists for
unlink/rmdir as well.
Instead of playing with dget_parent() of underlying dentry of victim
and hoping it's the same as underlying dentry of our directory,
do the following:
* find the underlying dentry of victim
* find the underlying directory of victim's parent (stable
since the victim is ecryptfs dentry and inode of its parent is
held exclusive by the caller).
* lock the inode of dentry underlying the victim's parent
* check that underlying dentry of victim is still hashed and
has the right parent - it can be moved, but it can't be moved to/from
the directory we are holding exclusive. So while ->d_parent itself
might not be stable, the result of comparison is.
If the check passes, everything is fine - underlying directory is locked,
underlying victim is still a child of that directory and we can go ahead
and feed them to vfs_unlink(). As in the current mainline we need to
pin the underlying dentry of victim, so that it wouldn't go negative under
us, but that's the only temporary reference that needs to be grabbed there.
Underlying dentry of parent won't go away (it's pinned by the parent,
which is held by caller), so there's no need to grab it.
The same problem (with the same solution) exists for rmdir. Moreover,
rename gets simpler and more robust with the same "don't bother with
dget_parent()" approach.
Fixes: 74dd7c97ea2 "ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()"
Signed-off-by: Al Viro <[email protected]>
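
The lock-then-validate steps listed above can be sketched as a small userspace model. Everything here is an assumption made for illustration: struct node, its fields and lock_and_validate() are hypothetical, and the boolean flag merely stands in for the exclusive inode lock that real code would take.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace model; field names only loosely mirror the kernel's dentry. */
struct node {
	struct node *parent;	/* current parent directory */
	bool hashed;		/* still reachable by name lookup */
	bool locked;		/* stand-in for the directory's exclusive lock */
};

/*
 * Lock the lower parent, then verify the lower victim still lives there.
 * Returns true with lower_dir left locked on success; returns false with
 * lower_dir unlocked otherwise.
 */
static bool lock_and_validate(struct node *lower_dir, struct node *lower_victim)
{
	lower_dir->locked = true;
	if (lower_victim->hashed && lower_victim->parent == lower_dir)
		return true;
	lower_dir->locked = false;
	return false;
}
```

The comparison is only meaningful because it runs after the lock is taken: once the directory is held exclusive, the victim cannot be moved into or out of it, so the result of the check stays valid until the lock is dropped.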
|
|
If the child was negative and just went positive
under us, we want coherent d_is_positive() and ->d_inode.
Don't unlock the parent until that work is done...
Signed-off-by: Al Viro <[email protected]>
|
|
locked
Signed-off-by: Al Viro <[email protected]>
|
|
the caller of ->get_tree() expects NULL left there on error...
Reported-by: Thibaut Sautereau <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Heiner Kallweit says:
====================
r8169: improve PHY configuration
This series adds helpers to improve and simplify the PHY
configuration on various network chip versions.
====================
Signed-off-by: David S. Miller <[email protected]>
|