Age | Commit message (Collapse) | Author | Files | Lines |
|
This is meant to be a gentle introduction into the world of watermarks
on ocelot. The code is placed in ocelot_devlink.c because it will be
integrated with devlink, even if it isn't right now.
My first step was intended to be to replicate the default configuration
of the congestion watermarks programatically, since they are now going
to be tuned by the user.
But after studying and understanding through trial and error how they
work, I now believe that the configuration used out of reset does not do
justice to the word "reservation", since the sum of all reservations
exceeds the total amount of resources (otherwise said, all reservations
cannot be fulfilled at the same time, which means that, contrary to the
reference manual, they don't guarantee anything).
As an example, here's a dump of the reservation watermarks for frame
buffers, for port 0 (for brevity, the ports 1-6 were omitted, but they
have the same configuration):
BUF_Q_RSRV_I(port 0, prio 0) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 1) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 2) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 3) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 4) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 5) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 6) = max 3000 bytes
BUF_Q_RSRV_I(port 0, prio 7) = max 3000 bytes
Otherwise said, every port-tc has an ingress reservation of 3000 bytes,
and there are 7 ports in VSC9959 Felix (6 user ports and 1 CPU port).
Concentrating only on the ingress reservations, there are, in total,
8 [traffic classes] x 7 [ports] x 3000 [bytes] = 168,000 bytes of memory
reserved on ingress.
But, surprise, Felix only has 128 KB of packet buffer in total...
A similar thing happens with Seville, which has a larger packet buffer,
but also more ports, and the default configuration is also overcommitted.
This patch disables the (apparently) bogus reservations and moves all
resources to the shared area. This way, real reservations can be set up
by the user, using devlink-sb.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add devlink integration into the mscc_ocelot switchdev driver. All
physical ports (i.e. the unused ones as well) except the CPU port module
at ocelot->num_phys_ports are registered with devlink, and that requires
keeping the devlink_port structure outside struct ocelot_port_private,
since the latter has a 1:1 mapping with a struct net_device (which does
not exist for unused ports).
Since we use devlink_port_type_eth_set to link the devlink port to the
net_device, we can as well remove the .ndo_get_phys_port_name and
.ndo_get_port_parent_id implementations, since devlink takes care of
retrieving the port name and number automatically, once
.ndo_get_devlink_port is implemented.
Note that the felix DSA driver is already integrated with devlink by
default, since that is a thing that the DSA core takes care of. This is
the reason why these devlink stubs were put in ocelot_net.c and not in
the common library. It is also the reason why ocelot::devlink is a
pointer and not a full structure embedded inside struct ocelot: because
the mscc_ocelot driver allocates that by itself (as the container of
struct ocelot, in fact), but in the case of felix, it is DSA who
allocates the devlink, and felix just propagates the pointer towards
struct ocelot.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This is a leftover of commit 69df578c5f4b ("net: mscc: ocelot: eliminate
confusion between CPU and NPI port") which renamed that function.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We should be moving anything that isn't DSA-specific or SoC-specific out
of the felix DSA driver, and into the common mscc_ocelot switch library.
The number of traffic classes is one of the aspects that is common
between all ocelot switches, so it belongs in the library.
This patch also makes seville use 8 TX queues, and therefore enables
prioritization via the QOS_CLASS field in the NPI injection header.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In general it is desirable that cleanup is the reverse process of setup.
In this case I am not seeing any particular issue, but with the
introduction of devlink-sb for felix, a non-obvious decision had to be
made as to where to put its cleanup method. When there's a convention in
place, that decision becomes obvious.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The devlink function pointer names are super long, and they would break
the alignment. So reindent the existing ops now by adding one tab.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Switches that care about QoS might have hardware support for reserving
buffer pools for individual ports or traffic classes, and configuring
their sizes and thresholds. Through devlink-sb (shared buffers), this is
all configurable, as well as their occupancy being viewable.
Add the plumbing in DSA for these operations.
Individual drivers still need to call devlink_sb_register() with the
shared buffers they want to expose. A helper was not created in DSA for
this purpose (unlike, say, dsa_devlink_params_register), since in my
opinion it does not bring any benefit over plainly calling
devlink_sb_register() directly.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We'll need to read back the watermark thresholds and occupancy from
hardware (for devlink-sb integration), not only to write them as we did
so far in ocelot_port_set_maxlen. So introduce 2 new functions in struct
ocelot_ops, similar to wm_enc, and implement them for the 3 supported
mscc_ocelot switches.
Remove the INUSE and MAXUSE unpacking helpers for the QSYS_RES_STAT
register, because that doesn't scale with the number of switches that
mscc_ocelot supports now. They have different bit widths for the
watermarks, and we need function pointers to abstract that difference
away.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Instead of reading these values from the reference manual and writing
them down into the driver, it appears that the hardware gives us the
option of detecting them dynamically.
The number of frame references corresponds to what the reference manual
notes, however it seems that the frame buffers are reported as slightly
less than the books would indicate. On VSC9959 (Felix), the books say it
should have 128KB of packet buffer, but the registers indicate only
129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from
the documentation of all these devices is incorrect (taken from an older
generation). This was confirmed by Younes Leroul from Microchip support.
Not having anything better to do with these values at the moment* (this
will change soon), let's just print them.
*The frame buffer size is, in fact, used to calculate the tail dropping
watermarks.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
sizeof needs to be called on the compat pointer, not the native one.
Fixes: 89cd35c58bc2 ("iov_iter: transparently handle compat iovecs in import_iovec")
Reported-by: David Laight <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
tc_index being 16bit wide, we need to check that TCA_TCINDEX_SHIFT
attribute is not silly.
UBSAN: shift-out-of-bounds in net/sched/cls_tcindex.c:260:29
shift exponent 255 is too large for 32-bit type 'int'
CPU: 0 PID: 8516 Comm: syz-executor228 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
valid_perfect_hash net/sched/cls_tcindex.c:260 [inline]
tcindex_set_parms.cold+0x1b/0x215 net/sched/cls_tcindex.c:425
tcindex_change+0x232/0x340 net/sched/cls_tcindex.c:546
tc_new_tfilter+0x13fb/0x21b0 net/sched/cls_api.c:2127
rtnetlink_rcv_msg+0x8b6/0xb80 net/core/rtnetlink.c:5555
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:672
____sys_sendmsg+0x6e8/0x810 net/socket.c:2336
___sys_sendmsg+0xf3/0x170 net/socket.c:2390
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2423
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
syzbot report reminded us that very big ewma_log were supported in the past,
even if they made litle sense.
tc qdisc replace dev xxx root est 1sec 131072sec ...
While fixing the bug, also add boundary checks for ewma_log, in line
with range supported by iproute2.
UBSAN: shift-out-of-bounds in net/core/gen_estimator.c:83:38
shift exponent -1 is negative
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
__ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
est_timer.cold+0xbb/0x12d net/core/gen_estimator.c:83
call_timer_fn+0x1a5/0x710 kernel/time/timer.c:1417
expire_timers kernel/time/timer.c:1462 [inline]
__run_timers.part.0+0x692/0xa80 kernel/time/timer.c:1731
__run_timers kernel/time/timer.c:1712 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1744
__do_softirq+0x2bc/0xa77 kernel/softirq.c:343
asm_call_irq_on_stack+0xf/0x20
</IRQ>
__run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline]
run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline]
do_softirq_own_stack+0xaa/0xd0 arch/x86/kernel/irq_64.c:77
invoke_softirq kernel/softirq.c:226 [inline]
__irq_exit_rcu+0x17f/0x200 kernel/softirq.c:420
irq_exit_rcu+0x5/0x20 kernel/softirq.c:432
sysvec_apic_timer_interrupt+0x4d/0x100 arch/x86/kernel/apic/apic.c:1096
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:79 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:169 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516
Fixes: 1c0d32fde5bd ("net_sched: gen_estimator: complete rewrite of rate estimators")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
iproute2 probably never goes beyond 8 for the cell exponent,
but stick to the max shift exponent for signed 32bit.
UBSAN reported:
UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x183/0x22e lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:148 [inline]
__ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
__detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:652 [inline]
sock_sendmsg net/socket.c:672 [inline]
____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
___sys_sendmsg net/socket.c:2399 [inline]
__sys_sendmsg+0x319/0x400 net/socket.c:2432
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Acked-by: Cong Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM-raid's raid1 discard limits so discards work.
- Select missing Kconfig dependencies for DM integrity and zoned
targets.
- Four fixes for DM crypt target's support to optionally bypass kcryptd
workqueues.
- Fix DM snapshot merge supports missing data flushes before committing
metadata.
- Fix DM integrity data device flushing when external metadata is used.
- Fix DM integrity's maximum number of supported constructor arguments
that user can request when creating an integrity device.
- Eliminate DM core ioctl logging noise when an ioctl is issued without
required CAP_SYS_RAWIO permission.
* tag 'for-5.11/dm-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm crypt: defer decryption to a tasklet if interrupts disabled
dm integrity: fix the maximum number of arguments
dm crypt: do not call bio_endio() from the dm-crypt tasklet
dm integrity: fix flush with external metadata device
dm: eliminate potential source of excessive kernel log noise
dm snapshot: flush merged data before committing metadata
dm crypt: use GFP_ATOMIC when allocating crypto requests from softirq
dm crypt: do not wait for backlogged crypto request completion in softirq
dm zoned: select CONFIG_CRC32
dm integrity: select CRYPTO_SKCIPHER
dm raid: fix discard limits for raid1
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-01-16
1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support,
that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman.
2) Add support for using kernel module global variables (__ksym externs in BPF
programs) retrieved via module's BTF, from Andrii Nakryiko.
3) Generalize BPF stackmap's buildid retrieval and add support to have buildid
stored in mmap2 event for perf, from Jiri Olsa.
4) Various fixes for cross-building BPF sefltests out-of-tree which then will
unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker.
5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann.
6) Clean up driver's XDP buffer init and split into two helpers to init per-
descriptor and non-changing fields during processing, from Lorenzo Bianconi.
7) Minor misc improvements to libbpf & bpftool, from Ian Rogers.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits)
perf: Add build id data in mmap2 event
bpf: Add size arg to build_id_parse function
bpf: Move stack_map_get_build_id into lib
bpf: Document new atomic instructions
bpf: Add tests for new BPF atomic operations
bpf: Add bitwise atomic instructions
bpf: Pull out a macro for interpreting atomic ALU operations
bpf: Add instructions for atomic_[cmp]xchg
bpf: Add BPF_FETCH field / create atomic_fetch_add instruction
bpf: Move BPF_STX reserved field check into BPF_STX verifier code
bpf: Rename BPF_XADD and prepare to encode other atomics in .imm
bpf: x86: Factor out a lookup table for some ALU opcodes
bpf: x86: Factor out emission of REX byte
bpf: x86: Factor out emission of ModR/M for *(reg + off)
tools/bpftool: Add -Wall when building BPF programs
bpf, libbpf: Avoid unused function warning on bpf_tail_call_static
selftests/bpf: Install btf_dump test cases
selftests/bpf: Fix installation of urandom_read
selftests/bpf: Move generated test files to $(TEST_GEN_FILES)
selftests/bpf: Fix out-of-tree build
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <[email protected]>
Reviewed-by: Marek Vasut <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As explained in commit 54a0ed0df496 ("net: dsa: provide an option for
drivers to always receive bridge VLANs"), DSA has historically been
skipping VLAN switchdev operations when the bridge wasn't in
vlan_filtering mode, but the reason why it was doing that has never been
clear. So the configure_vlan_while_not_filtering option is there merely
to preserve functionality for existing drivers. It isn't some behavior
that drivers should opt into. Ideally, when all drivers leave this flag
set, we can delete the dsa_port_skip_vlan_configuration() function.
New drivers always seem to omit setting this flag, for some reason. So
let's reverse the logic: the DSA core sets it by default to true before
the .setup() callback, and legacy drivers can turn it off. This way, new
drivers get the new behavior by default, unless they explicitly set the
flag to false, which is more obvious during review.
Remove the assignment from drivers which were setting it to true, and
add the assignment to false for the drivers that didn't previously have
it. This way, it should be easier to see how many we have left.
The following drivers: lan9303, mv88e6060 were skipped from setting this
flag to false, because they didn't have any VLAN offload ops in the
first place.
The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and
therefore also inherits the configure_vlan_while_not_filtering=true
behavior.
Also, print a message through netlink extack every time a VLAN has been
skipped. This is mildly annoying on purpose, so that (a) it is at least
clear that VLANs are being skipped - the legacy behavior in itself is
confusing, and the extack should be much more difficult to miss, unlike
kernel logs - and (b) people have one more incentive to convert to the
new behavior.
No behavior change except for the added prints is intended at this time.
$ ip link add br0 type bridge vlan_filtering 0
$ ip link set sw0p2 master br0
[ 60.315148] br0: port 1(sw0p2) entered blocking state
[ 60.320350] br0: port 1(sw0p2) entered disabled state
[ 60.327839] device sw0p2 entered promiscuous mode
[ 60.334905] br0: port 1(sw0p2) entered blocking state
[ 60.340142] br0: port 1(sw0p2) entered forwarding state
Warning: dsa_core: skipping configuration of VLAN. # This was the pvid
$ bridge vlan add dev sw0p2 vid 100
Warning: dsa_core: skipping configuration of VLAN.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Kurt Kanzenbach <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'netxen_get_minidump_template()' GFP_KERNEL can
be used because its only caller, ' netxen_setup_minidump(()' already uses
it and no lock is acquired in the between.
When memory is allocated in other function in 'netxen_nic_ctx.c' GFP_KERNEL
can be used because the call chain already uses GFP_KERNEL and no lock is
taken in the between.
The call chain is:
netxen_nic_attach()
--> netxen_alloc_sw_resources() : already uses GFP_KERNEL
--> netxen_alloc_hw_resources()
--> nx_fw_cmd_create_rx_ctx()
--> nx_fw_cmd_create_tx_ctx()
When memory is allocated in 'netxen_init_dummy_dma()' GFP_KERNEL can be
used because its only call chain already uses it and no lock is acquired in
the between.
The call chain is:
--> netxen_start_firmware
--> netxen_request_firmware()
--> request_firmware()
--> _request_firmware(()
--> fw_get_filesystem_firmware()
--> __getname() : already uses GFP_KERNEL
--> netxen_init_dummy_dma()
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-01-16
1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in
link creation's error path causing a refcount underflow, from Jiri Olsa.
2) Fix BTF validation errors for the case where kernel modules don't declare
any new types and end up with an empty BTF, from Andrii Nakryiko.
3) Fix BPF local storage helpers to first check their {task,inode} owners for
being NULL before access, from KP Singh.
4) Fix a memory leak in BPF setsockopt handling for the case where optlen is
zero and thus temporary optval buffer should be freed, from Stanislav Fomichev.
5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for
raw_tracepoint caused by too big ctx_size_in, from Song Liu.
6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL}
registers were spilled to stack but not recognized, from Gilad Reti.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
MAINTAINERS: Update my email address
selftests/bpf: Add verifier test for PTR_TO_MEM spill
bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling
bpf: Reject too big ctx_size_in for raw_tp test run
libbpf: Allow loading empty BTFs
bpf: Allow empty module BTFs
bpf: Don't leak memory in bpf getsockopt when optlen == 0
bpf: Update local storage test to check handling of null ptrs
bpf: Fix typo in bpf_inode_storage.c
bpf: Local storage helpers should check nullness of owner ptr passed
bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Tobias Waldekranz says:
====================
net: dsa: mv88e6xxx: LAG fixes
The kernel test robot kindly pointed out that Global 2 support in
mv88e6xxx is optional.
This also made me realize that we should verify that the hardware
actually supports LAG offloading before trying to configure it.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There are chips that do have Global 2 registers, and therefore trunk
mapping/mask tables are not available. Refuse the offload as early as
possible on those devices.
Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support")
Signed-off-by: Tobias Waldekranz <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Support for Global 2 registers is build-time optional. In the case
where it was not enabled the build would fail as no "dummy"
implementation of these functions was available.
Fixes: 57e661aae6a8 ("net: dsa: mv88e6xxx: Link aggregation support")
Reported-by: kernel test robot <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Tested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Tobias Waldekranz <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
fl_set_enc_opt() simply checks if there are still bytes left to parse,
but this is not sufficent as syzbot seems to be able to generate
malformatted netlink messages. nla_ok() is more strict so should be
used to validate the next nlattr here.
And nla_validate_nested_deprecated() has less strict check too, it is
probably too late to switch to the strict version, but we can just
call nla_ok() too after it.
Reported-and-tested-by: [email protected]
Fixes: 0a6e77784f49 ("net/sched: allow flower to match tunnel options")
Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options")
Cc: Jamal Hadi Salim <[email protected]>
Cc: Xin Long <[email protected]>
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
George McCollister says:
====================
Arrow SpeedChips XRS700x DSA Driver
This series adds a DSA driver for the Arrow SpeedChips XRS 7000 series
of HSR/PRP gigabit switch chips.
The chips use Flexibilis IP.
More information can be found here:
https://www.flexibilis.com/products/speedchips-xrs7000/
The switches have up to three RGMII ports and one MII port and are
managed via mdio or i2c. They use a one byte trailing tag to identify
the switch port when in managed mode so I've added a tag driver which
implements this.
This series contains minimal DSA functionality which may be built upon
in future patches. The ultimate goal is to add HSR and PRP
(IEC 62439-3 Clause 5 & 4) offloading with integration into net/hsr.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add documentation and an example for Arrow SpeedChips XRS7000 Series
single chip Ethernet switches.
Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add a driver with initial support for the Arrow SpeedChips XRS7000
series of gigabit Ethernet switch chips which are typically used in
critical networking applications.
The switches have up to three RGMII ports and one RMII port.
Management to the switches can be performed over i2c or mdio.
Support for advanced features such as PTP and
HSR/PRP (IEC 62439-3 Clause 5 & 4) is not included in this patch and
may be added at a later date.
Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add support for Arrow SpeedChips XRS700x single byte tag trailer. This
is modeled on tag_trailer.c which works in a similar way.
Signed-off-by: George McCollister <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Merge misc fixes from Andrew Morton:
"10 patches.
Subsystems affected by this patch series: MAINTAINERS and mm (slub,
pagealloc, memcg, kasan, vmalloc, migration, hugetlb, memory-failure,
and process_vm_access)"
* emailed patches from Andrew Morton <[email protected]>:
mm/process_vm_access.c: include compat.h
mm,hwpoison: fix printing of page flags
MAINTAINERS: add Vlastimil as slab allocators maintainer
mm/hugetlb: fix potential missing huge page size info
mm: migrate: initialize err in do_migrate_pages
mm/vmalloc.c: fix potential memory leak
arm/kasan: fix the array size of kasan_early_shadow_pte[]
mm/memcontrol: fix warning in mem_cgroup_page_lruvec()
mm/page_alloc: add a missing mm_page_alloc_zone_locked() tracepoint
mm, slub: consider rest of partial list if acquire_slab() fails
|
|
Pull rdma fixes from Jason Gunthorpe:
"A fairly modest set of bug fixes, nothing abnormal from the merge
window
The ucma patch is a bit on the larger side, but given the regression
was recently added I've opted to forward it to the rc stream.
- Fix a ucma memory leak introduced in v5.9 while fixing the
Syzkaller bugs
- Don't fail when the xarray wraps for user verbs objects
- User triggerable oops regression from the umem page size rework
- Error unwind bugs in usnic, ocrdma, mlx5 and cma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/cma: Fix error flow in default_roce_mode_store
RDMA/mlx5: Fix wrong free of blue flame register on error
IB/mlx5: Fix error unwinding when set_has_smi_cap fails
RDMA/umem: Avoid undefined behavior of rounddown_pow_of_two()
RDMA/ocrdma: Fix use after free in ocrdma_dealloc_ucontext_pd()
RDMA/usnic: Fix memleak in find_free_vf_and_create_qp_grp
RDMA/restrack: Don't treat as an error allocation ID wrapping
RDMA/ucma: Do not miss ctx destruction steps in some cases
|
|
Russell King says:
====================
Add further DT configuration for AT803x PHYs
This patch series adds the ability to configure the SmartEEE feature
in AT803x PHYs. SmartEEE defaults to enabled on these PHYs, and has
a history of causing random sporadic link drops at Gigabit speeds.
There appears to be two solutions to this. There is the approach that
Freescale adopted early on, which is to disable the SmartEEE feature.
However, this loses the power saving provided by EEE. Another solution
was found by Jon Nettleton is to increase the Tw parameter for Gigabit
links.
This patch series adds support for both approaches, by adding a boolean:
qca,disable-smarteee
if one wishes to disable SmartEEE, and two properties to configure the
SmartEEE Tw parameters:
qca,smarteee-tw-us-100m
qca,smarteee-tw-us-1g
Sadly, the PHY quirk I merged a while back for AT8035 on iMX6 is broken
- rather than disabling SmartEEE mode, it enables it.
The addition of these properties will be sent to the appropriate
platform maintainers - although for SolidRun platforms, we only make use
of "qca,smarteee-tw-us-1g".
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
SmartEEE for the atheros phy was deemed buggy by Freescale and commits
were added to disable it for their boards.
In initial testing, SolidRun found that the default settings were
causing disconnects but by increasing the Tw buffer time we could allow
enough time for all parts of the link to come out of a low power state
and function properly without causing a disconnect. This allows us to
have functional power savings of between 300 and 400mW, rather than
disabling the feature altogether.
This commit adds support for disabling SmartEEE and configuring the Tw
parameters for 1G and 100M speeds.
Signed-off-by: Russell King <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The SmartEEE feature of Atheros AR803x PHYs can cause the link to
bounce. Add DT properties to allow SmartEEE to be disabled, and to
allow the Tw parameters for 100M and 1G links to be configured.
Signed-off-by: Russell King <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
If we enter with requests pending and performm cancelations, we'll have
a different inflight count before and after calling prepare_to_wait().
This causes the loop to restart. If we actually ended up canceling
everything, or everything completed in-between, then we'll break out
of the loop without calling finish_wait() on the waitqueue. This can
trigger a warning on exit_signals(), as we leave the task state in
TASK_UNINTERRUPTIBLE.
Put a finish_wait() after the loop to catch that case.
Cc: [email protected] # 5.9+
Signed-off-by: Jens Axboe <[email protected]>
|
|
My Intel email will stop working in a not too distant future. Move my
MAINTAINERS entries to my kernel.org address.
Signed-off-by: Björn Töpel <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"A number of bug fixes for ext4:
- Fix for the new fast_commit feature
- Fix some error handling codepaths in whiteout handling and
mountpoint sampling
- Fix how we write ext4_error information so it goes through the
journal when journalling is active, to avoid races that can lead to
lost error information, superblock checksum failures, or DIF/DIX
features"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: remove expensive flush on fast commit
ext4: fix bug for rename with RENAME_WHITEOUT
ext4: fix wrong list_splice in ext4_fc_cleanup
ext4: use IS_ERR instead of IS_ERR_OR_NULL and set inode null when IS_ERR
ext4: don't leak old mountpoint samples
ext4: drop ext4_handle_dirty_super()
ext4: fix superblock checksum failure when setting password salt
ext4: use sbi instead of EXT4_SB(sb) in ext4_update_super()
ext4: save error info to sb through journal if available
ext4: protect superblock modifications with a buffer lock
ext4: drop sync argument of ext4_commit_super()
ext4: combine ext4_handle_error() and save_error_info()
|
|
Pull cifs fixes from Steve French:
"Two small cifs fixes for stable (including an important handle leak
fix) and three small cleanup patches"
* tag '5.11-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: style: replace one-element array with flexible-array
cifs: connect: style: Simplify bool comparison
fs: cifs: remove unneeded variable in smb3_fs_context_dup
cifs: fix interrupted close commands
cifs: check pointer before freeing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Set the minimum GCC version to 5.1 for arm64 due to earlier compiler
bugs.
- Make atomic helpers __always_inline to avoid a section mismatch when
compiling with clang.
- Fix the CMA and crashkernel reservations to use ZONE_DMA (remove the
arm64_dma32_phys_limit variable, no longer needed with a dynamic
ZONE_DMA sizing in 5.11).
- Remove redundant IRQ flag tracing that was leaving lockdep
inconsistent with the hardware state.
- Revert perf events based hard lockup detector that was causing
smp_processor_id() to be called in preemptible context.
- Some trivial cleanups - spelling fix, renaming S_FRAME_SIZE to
PT_REGS_SIZE, function prototypes added.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: selftests: Fix spelling of 'Mismatch'
arm64: syscall: include prototype for EL0 SVC functions
compiler.h: Raise minimum version of GCC to 5.1 for arm64
arm64: make atomic helpers __always_inline
arm64: rename S_FRAME_SIZE to PT_REGS_SIZE
Revert "arm64: Enable perf events based hard lockup detector"
arm64: entry: remove redundant IRQ flag tracing
arm64: Remove arm64_dma32_phys_limit and its uses
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fix coredumps on 64bit kernels
- fix for alignment bugs preventing booting
- fix checking for failed irq_alloc_desc calls
* tag 'mips_fixes_5.11.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: OCTEON: fix unreachable code in octeon_irq_init_ciu
MIPS: relocatable: fix possible boot hangup with KASLR enabled
MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
|
|
When 'perf inject' reads a perf.data file from an older version of perf,
it writes event attributes into the output with the original size field,
but lays them out as if they had the size currently used. Readers see a
corrupt file. Update the size field to match the layout.
Signed-off-by: Al Grant <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Denis Nikitin <[email protected]>
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In some cases, the number of cpus (nr_cpus_online) is confused with the
maximum cpu number (nr_cpus_avail), which results in the error in the
example below:
Example on system with 8 cpus:
Before:
# echo 0 > /sys/devices/system/cpu/cpu2/online
# ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname
Linux
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.147 MB perf.data ]
# ./perf script --itrace=e
Requested CPU 7 too large. Consider raising MAX_NR_CPUS
0x25908 [0x8]: failed to process type: 68 [Invalid argument]
After:
# ./perf script --itrace=e
#
Fixes: 8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Fixes: 7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online")
Signed-off-by: Adrian Hunter <[email protected]>
Tested-by: Kan Liang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: [email protected]
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
As of now it doesn't consider cgroups when collecting shadow stats and
metrics so counter values from different cgroups will be saved in a same
slot. This resulted in incorrect numbers when those cgroups have
different workloads.
For example, let's look at the scenario below: cgroups A and C runs same
workload which burns a cpu while cgroup B runs a light workload.
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
3,958,116,522 cycles A
6,722,650,929 instructions A # 2.53 insn per cycle
1,132,741 cycles B
571,743 instructions B # 0.00 insn per cycle
4,007,799,935 cycles C
6,793,181,523 instructions C # 2.56 insn per cycle
1.001050869 seconds time elapsed
When I run 'perf stat' with single workload, it usually shows IPC around
1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A.
But in this case, since cgroups are ignored, cycles are averaged so it
used the lower value for IPC calculation and resulted in around 2.5.
avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066
IPC (A) : 6722650929 / 2655683066 = 2.531
IPC (B) : 571743 / 2655683066 = 0.0002
IPC (C) : 6793181523 / 2655683066 = 2.557
We can simply compare cgroup pointers in the evsel and it'll be NULL
when cgroups are not specified. With this patch, I can see correct
numbers like below:
$ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1
Performance counter stats for 'system wide':
4,171,051,687 cycles A
7,219,793,922 instructions A # 1.73 insn per cycle
1,051,189 cycles B
583,102 instructions B # 0.55 insn per cycle
4,171,124,710 cycles C
7,192,944,580 instructions C # 1.72 insn per cycle
1.007909814 seconds time elapsed
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
To pass more info to the saved_value in the runtime_stat, add a new
struct runtime_stat_data. Currently it only has 'ctx' field but later
patch will add more.
Note that we intentionally pass 0 as ctx to clock-related events for
compatibility. It was already there in a few places. So move the code
into the saved_value_lookup() explicitly and add a comment.
Suggested-by: Andi Kleen <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Jin Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Permissions are necessary to get a tracepoint id. Fail the test when the
read fails.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
If a test fails return -1 rather than 0. This is consistent with the
return value in test-cpumap.c
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The variable 'bf' is read (for a write call) without being initialized
triggering a memory sanitizer warning. Use 'bf' in the read and switch
the write to reading from a string.
Signed-off-by: Ian Rogers <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Link: http://lore.kernel.org/lkml/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
In the fast commit, it adds REQ_FUA and REQ_PREFLUSH on each fast
commit block when barrier is enabled. However, in recovery phase,
ext4 compares CRC value in the tail. So it is sufficient to add
REQ_FUA and REQ_PREFLUSH on the block that has tail.
Signed-off-by: Daejun Park <[email protected]>
Reviewed-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/20210106013242epcms2p5b6b4ed8ca86f29456fdf56aa580e74b4@epcms2p5
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
We got a "deleted inode referenced" warning cross our fsstress test. The
bug can be reproduced easily with following steps:
cd /dev/shm
mkdir test/
fallocate -l 128M img
mkfs.ext4 -b 1024 img
mount img test/
dd if=/dev/zero of=test/foo bs=1M count=128
mkdir test/dir/ && cd test/dir/
for ((i=0;i<1000;i++)); do touch file$i; done # consume all block
cd ~ && renameat2(AT_FDCWD, /dev/shm/test/dir/file1, AT_FDCWD,
/dev/shm/test/dir/dst_file, RENAME_WHITEOUT) # ext4_add_entry in
ext4_rename will return ENOSPC!!
cd /dev/shm/ && umount test/ && mount img test/ && ls -li test/dir/file1
We will get the output:
"ls: cannot access 'test/dir/file1': Structure needs cleaning"
and the dmesg show:
"EXT4-fs error (device loop0): ext4_lookup:1626: inode #2049: comm ls:
deleted inode referenced: 139"
ext4_rename will create a special inode for whiteout and use this 'ino'
to replace the source file's dir entry 'ino'. Once error happens
latter(the error above was the ENOSPC return from ext4_add_entry in
ext4_rename since all space has been consumed), the cleanup do drop the
nlink for whiteout, but forget to restore 'ino' with source file. This
will trigger the bug describle as above.
Signed-off-by: yangerkun <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: [email protected]
Fixes: cd808deced43 ("ext4: support RENAME_WHITEOUT")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
After full/fast commit, entries in staging queue are promoted to main
queue. In ext4_fs_cleanup function, it splice to staging queue to
staging queue.
Fixes: aa75f4d3daaeb ("ext4: main fast-commit commit path")
Signed-off-by: Daejun Park <[email protected]>
Reviewed-by: Harshad Shirwadkar <[email protected]>
Link: https://lore.kernel.org/r/20201230094851epcms2p6eeead8cc984379b37b2efd21af90fd1a@epcms2p6
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|