Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a 'ping' command to verify that the 'perf record' session is up and
operational.
It's used in the following patches via test code to make sure 'perf
record' is ready to receive signals.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Start the daemon:
# perf daemon start
Ping all sessions:
# perf daemon ping
OK cycles
OK sched
Ping specific session:
# perf daemon ping --session sched
OK sched
Committer notes:
Fixed up bug pointed by clang:
Buggy:
if (!pollfd.revents & POLLIN)
Correct code:
if (!(pollfd.revents & POLLIN))
clang warning:
builtin-daemon.c:560:6: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses]
if (!pollfd.revents & POLLIN) {
^ ~
builtin-daemon.c:560:6: note: add parentheses after the '!' to evaluate the bitwise operator first
Also use designated initialized with pollfd, i.e.:
struct pollfd pollfd = { .events = POLLIN, };
Instead of:
struct pollfd pollfd = { 0, };
To get past:
builtin-daemon.c:510:30: error: missing field 'events' initializer [-Werror,-Wmissing-field-initializers]
struct pollfd pollfd = { 0, };
^
1 error generated.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Setup control fifos for session and add --control option to session
arguments.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Starting the daemon:
# perf daemon start
Use can list control fifos with (control and ack files):
# perf daemon -v
[776459:daemon] base: /opt/perfdata
output: /opt/perfdata/output
lock: /opt/perfdata/lock
[776460:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a
base: /opt/perfdata/session-cycles
output: /opt/perfdata/session-cycles/output
control: /opt/perfdata/session-cycles/control
ack: /opt/perfdata/session-cycles/ack
[776461:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
base: /opt/perfdata/session-sched
output: /opt/perfdata/session-sched/output
control: /opt/perfdata/session-sched/control
ack: /opt/perfdata/session-sched/ack
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add 'lock' file under daemon base and flock it, so only one perf daemon
can run on top of it.
Each daemon tries to create and lock BASE/lock file, if it's successful
we are sure we're the only daemon running over the BASE.
Once daemon is finished, file descriptor to lock file is closed and lock
is released.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Starting the daemon:
# perf daemon start
And try once more:
# perf daemon start
failed: another perf daemon (pid 775594) owns /opt/perfdata
will end up with an error, because there's already one running
on top of /opt/perfdata.
Committer notes:
Provide lockf(F_TLOCK) when not available, i.e. transform:
lockf(fd, F_TLOCK, 0);
into:
flock(fd, LOCK_EX | LOCK_NB);
Which should be equivalent.
Noticed when cross building to some odd Android NDK.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add 'perf daemon stop' command to stop daemon process and all running
sessions.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Start the daemon:
# perf daemon start
Stop the daemon
# perf daemon stop
Daemon is not running, nothing to connect to:
# perf daemon
connect error: Connection refused
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Allow the 'perf daemon' to send SIGUSR2 to all running sessions or just
to a specific session.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Start the daemon:
# perf daemon start
Send signal to all running sessions:
# perf daemon signal
signal 12 sent to session 'cycles [773738]'
signal 12 sent to session 'sched [773739]'
Or to specific one:
# perf daemon signal --session sched
signal 12 sent to session 'sched [773739]'
And verify signals were delivered and perf.data dumped:
# cat /opt/perfdata/session-cycles/output
rounding mmap pages size to 32M (8192 pages)
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2021010220382490 ]
# car /opt/perfdata/session-sched/output
rounding mmap pages size to 32M (8192 pages)
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2021010220382489 ]
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2021010220393745 ]
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add a 'list' command to display all running sessions. It's the default
command if no other command is specified.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
Start the daemon:
# perf daemon start
List sessions:
# perf daemon
[771394:daemon] base: /opt/perfdata
[771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
[771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
List sessions with more info:
# perf daemon -v
[771394:daemon] base: /opt/perfdata
output: /opt/perfdata/output
[771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
base: /opt/perfdata/session-cycles
output: /opt/perfdata/session-cycles/output
[771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a
base: /opt/perfdata/session-sched
output: /opt/perfdata/session-sched/output
The 'output' file is perf record output for specific session.
Note you have to stop all running perf processes manually at this point,
stop command is coming in following patches.
Committer notes:
Fixup union initialization to overcome this in multiple older systems:
22 15.74 debian:8 : FAIL gcc version 4.9.2 (Debian 4.9.2-10+deb8u2)
builtin-daemon.c: In function 'send_cmd_list':
builtin-daemon.c:1386:2: error: missing initializer for field 'csv_sep' of 'struct <anonymous>' [-Werror=missing-field-initializers]
};
^
builtin-daemon.c:641:8: note: 'csv_sep' declared here
char csv_sep;
^
cc1: all warnings being treated as errors
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Use a signalfd fd to track SIGCHLD signals as notifications for perf
session termination.
This way we don't need to actively check for child status, being
notified if there's change.
Suggested-by: Alexei Budankov <[email protected]>
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support to put the daemon process in the background.
It's now enabled by default and -f option is added to keep the daemon
process on the console for debugging.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support to detect changes to the daemon's config file triggering a
re-read of the configuration when that happens.
Use a inotify file descriptor plugged into the main fdarray object for
polling.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
Starting the daemon:
# perf daemon start
Check sessions:
# perf daemon
[772262:daemon] base: /opt/perfdata
[772263:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a
Change '-m 10M' to '-m 20M', and check daemon log:
# tail -f /opt/perfdata/output
[2021-01-02 20:31:41.234045] daemon started (pid 772262)
[2021-01-02 20:31:41.235072] reconfig: ruining session [cycles:772263]: -m 10M -e cycles --overwrite --switch-output -a
[2021-01-02 20:32:08.310137] reconfig: session 'cycles' killed
[2021-01-02 20:32:08.310847] reconfig: ruining session [cycles:772338]: -m 20M -e cycles --overwrite --switch-output -a
And the session list:
# perf daemon
[772262:daemon] base: /opt/perfdata
[772338:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a
Note the changed '-m 20M' option is in place.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Adding support to configure daemon with config file.
Each client or server invocation of perf daemon needs to know the
base directory, where all sessions data is stored.
The base is defined with:
daemon.base
Base path for daemon data. All sessions data are stored under
this path.
The daemon allows to create record sessions. Each session is a
record command spawned and monitored by perf daemon.
The session is defined with:
session-<NAME>.run
Defines new record session for daemon. The value is record's
command line without the 'record' keyword.
Example:
# cat ~/.perfconfig
[daemon]
base=/opt/perfdata
[session-cycles]
run = -m 10M -e cycles --overwrite --switch-output -a
[session-sched]
run = -m 20M -e sched:* --overwrite --switch-output -a
The example above defines '/opt/perfdata' as the base directory and 2
record sessions.
# perf daemon start
[2021-01-28 19:47:33.454413] daemon started (pid 16015)
[2021-01-28 19:47:33.455910] reconfig: ruining session [cycles:16016]: -m 10M -e cycles --overwrite --switch-output -a
[2021-01-28 19:47:33.456599] reconfig: ruining session [sched:16017]: -m 20M -e sched:* --overwrite --switch-output -a
# ps -ef | grep perf
... perf daemon start
... /home/jolsa/.../perf record -m 20M -e cycles --overwrite --switch-output -a
... /home/jolsa/.../perf record -m 20M -e sched:* --overwrite --switch-output -a
The base directory is populated with:
# find /opt/perfdata/
/opt/perfdata/
/opt/perfdata/control <- control socket
/opt/perfdata/session-cycles <- data for session 'cycles':
/opt/perfdata/session-cycles/output <- perf record output
/opt/perfdata/session-cycles/perf.data <- perf data
/opt/perfdata/session-sched <- ditto for session 'sched'
/opt/perfdata/session-sched/output
/opt/perfdata/session-sched/perf.data
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
Add support for client socket side that will be used to send commands to
the daemon server socket.
This patch adds only the core support, all commands using this
functionality are coming in the following patches.
Committer notes:
Hat to patch patch it to deal with this in some systems:
cc1: warnings being treated as errors
builtin-daemon.c: In function 'send_cmd': MKDIR /tmp/build/perf/bench/
builtin-daemon.c:1368: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result
MKDIR /tmp/build/perf/tests/
make[3]: *** [/tmp/build/perf/builtin-daemon.o] Error 1
And also to not leak the 'line' buffer allocated by getline(), since you
initialized line to NULL and len to zero, man page says:
If *lineptr is set to NULL and *n is set 0 before the call,
then getline() will allocate a buffer for storing the line.
This buffer should be freed by the user program even if
getline() failed.
Signed-off-by: Jiri Olsa <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Alexei Budankov <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Michael Petlan <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
The old implementation wasn't consistend on this.
But it looks like we depend on this so better bring it back.
Signed-off-by: Christian König <[email protected]>
Reported-and-tested-by: Mike Galbraith <[email protected]>
Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
After Commit 3499ba8198cad ("xen: Fix event channel callback via
INTX/GSI"), xenbus_probe() will be called too early on Arm. This will
recent to a guest hang during boot.
If the hang wasn't there, we would have ended up to call
xenbus_probe() twice (the second time is in xenbus_probe_initcall()).
We don't need to initialize xenbus_probe() early for Arm guest.
Therefore, the call in xen_guest_init() is now removed.
After this change, there is no more external caller for xenbus_probe().
So the function is turned to a static one. Interestingly there were two
prototypes for it.
Cc: [email protected]
Fixes: 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI")
Reported-by: Ian Jackson <[email protected]>
Signed-off-by: Julien Grall <[email protected]>
Reviewed-by: David Woodhouse <[email protected]>
Reviewed-by: Stefano Stabellini <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Juergen Gross <[email protected]>
|
|
VSC8541 phys need a special reset sequence, which the driver doesn't
currentlny support. As a result enabling the reset via GPIO essentially
guarnteees that the device won't work correctly. We've been relying on
bootloaders to reset the device for years, with this revert we'll go
back to doing so until we can sort out how to get the reset sequence
into the kernel.
This reverts commit a0fa9d727043da2238432471e85de0bdb8a8df65.
Fixes: a0fa9d727043 ("dts: phy: add GPIO number and active state used for phy reset")
Cc: [email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Invoking x86_init.irqs.create_pci_msi_domain() before
x86_init.pci.arch_init() breaks XEN PV.
The XEN_PV specific pci.arch_init() function overrides the default
create_pci_msi_domain() which is obviously too late.
As a consequence the XEN PV PCI/MSI allocation goes through the native
path which runs out of vectors and causes malfunction.
Invoke it after x86_init.pci.arch_init().
Fixes: 6b15ffa07dc3 ("x86/irq: Initialize PCI/MSI domain at PCI init time")
Reported-by: Juergen Gross <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Juergen Gross <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Address a performance regression related to scale-invariance on x86
that may prevent turbo CPU frequencies from being used in certain
workloads on systems using acpi-cpufreq as the CPU performance scaling
driver and schedutil as the scaling governor"
* tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there
cpufreq: ACPI: Extend frequency tables to cover boost frequencies
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Revert a problematic ACPICA commit that changed the code to attempt to
update memory regions which may be read-only on some systems (Ard
Biesheuvel)"
* tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine fixes from Vinod Koul:
"Some late fixes for dmaengine:
Core:
- fix channel device_node deletion
Driver fixes:
- dw: revert of runtime pm enabling
- idxd: device state fix, interrupt completion and list corruption
- ti: resource leak
* tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine:
dmaengine dw: Revert "dmaengine: dw: Enable runtime PM"
dmaengine: idxd: check device state before issue command
dmaengine: ti: k3-udma: Fix a resource leak in an error handling path
dmaengine: move channel device_node deletion to driver
dmaengine: idxd: fix misc interrupt completion
dmaengine: idxd: Fix list corruption in description completion
|
|
This reverts commit 10cad2c40dcb04bb46b2bf399e00ca5ea93d36b0.
Petr reports that with this commit in place, io_uring fails the chroot
test (CVE-202-29373). We do need to retain ->fs for send/recvmsg, so
revert this commit.
Reported-by: Petr Vorel <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull networking fixes from David Miller:
"Another pile of networing fixes:
1) ath9k build error fix from Arnd Bergmann
2) dma memory leak fix in mediatec driver from Lorenzo Bianconi.
3) bpf int3 kprobe fix from Alexei Starovoitov.
4) bpf stackmap integer overflow fix from Bui Quang Minh.
5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from
Christoph Schemmel.
6) Don't update deleted entry in xt_recent netfilter module, from
Jazsef Kadlecsik.
7) Use after free in nftables, fix from Pablo Neira Ayuso.
8) Header checksum fix in flowtable from Sven Auhagen.
9) Validate user controlled length in qrtr code, from Sabyrzhan
Tasbolatov.
10) Fix race in xen/netback, from Juergen Gross,
11) New device ID in cxgb4, from Raju Rangoju.
12) Fix ring locking in rxrpc release call, from David Howells.
13) Don't return LAPB error codes from x25_open(), from Xie He.
14) Missing error returns in gsi_channel_setup() from Alex Elder.
15) Get skb_copy_and_csum_datagram working properly with odd segment
sizes, from Willem de Bruijn.
16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean.
17) Do teardown on probe failure in DSA, from Vladimir Oltean.
18) Fix compilation failures of txtimestamp selftest, from Vadim
Fedorenko.
19) Limit rx per-napi gro queue size to fix latency regression, from
Eric Dumazet.
20) dpaa_eth xdp fixes from Camelia Groza.
21) Missing txq mode update when switching CBS off, in stmmac driver,
from Mohammad Athari Bin Ismail.
22) Failover pending logic fix in ibmvnic driver, from Sukadev
Bhattiprolu.
23) Null deref fix in vmw_vsock, from Norbert Slusarek.
24) Missing verdict update in xdp paths of ena driver, from Shay
Agroskin.
25) seq_file iteration fix in sctp from Neil Brown.
26) bpf 32-bit src register truncation fix on div/mod, from Daniel
Borkmann.
27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann.
28) Fix locking in vsock_shutdown(), from Stefano Garzarella.
29) Various missing index bound checks in hns3 driver, from Yufeng Mo.
30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from
Vladimir Oltean.
31) Don't mix up stp and mrp port states in bridge layer, from Horatiu
Vultur.
32) Fix locking during netif_tx_disable(), from Edwin Peer"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
bpf: Fix 32 bit src register truncation on div/mod
bpf: Fix verifier jmp32 pruning decision logic
bpf: Fix verifier jsgt branch analysis on max bound
vsock: fix locking in vsock_shutdown()
net: hns3: add a check for index in hclge_get_rss_key()
net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
net: hns3: add a check for queue_id in hclge_reset_vf_queue()
net: dsa: felix: implement port flushing on .phylink_mac_link_down
switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
net: watchdog: hold device global xmit lock during tx disable
netfilter: nftables: relax check for stateful expressions in set definition
netfilter: conntrack: skip identical origin tuple in same zone only
vsock/virtio: update credit only if socket is not closed
net: fix iteration for sctp transport seq_files
net: ena: Update XDP verdict upon failure
net/vmw_vsock: improve locking in vsock_connect_timeout()
net/vmw_vsock: fix NULL pointer dereference
ibmvnic: Clear failover_pending if unable to schedule
net: stmmac: set TxQ mode back to DCB after disabling CBS
...
|
|
Merge misc fixes from Andrew Morton:
"14 patches.
Subsystems affected by this patch series: mm (kasan, mremap, tmpfs,
selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and
firmware"
* emailed patches from Andrew Morton <[email protected]>:
nilfs2: make splice write available again
mm, slub: better heuristic for number of cpus when calculating slab order
Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
MAINTAINERS: update Andrey Ryabinin's email address
selftests/vm: rename file run_vmtests to run_vmtests.sh
tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
mm/mremap: fix BUILD_BUG_ON() error in get_extent
firmware_loader: align .builtin_fw to 8
kasan: fix stack traces dependency for HW_TAGS
squashfs: add more sanity checks in xattr id lookup
squashfs: add more sanity checks in inode lookup
squashfs: add more sanity checks in id lookup
squashfs: avoid out of bounds writes in decompressors
|
|
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was
caused by commit 36e2c7421f02 ("fs: don't allow splice read/write
without explicit ops").
This patch initializes the splice_write field in file_operations, like
most file systems do, to restore the functionality.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joachim Henke <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Tested-by: Ryusuke Konishi <[email protected]>
Cc: <[email protected]> [5.10+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When creating a new kmem cache, SLUB determines how large the slab pages
will based on number of inputs, including the number of CPUs in the
system. Larger slab pages mean that more objects can be allocated/free
from per-cpu slabs before accessing shared structures, but also
potentially more memory can be wasted due to low slab usage and
fragmentation. The rough idea of using number of CPUs is that larger
systems will be more likely to benefit from reduced contention, and also
should have enough memory to spare.
Number of CPUs used to be determined as nr_cpu_ids, which is number of
possible cpus, but on some systems many will never be onlined, thus
commit 045ab8c9487b ("mm/slub: let number of online CPUs determine the
slub page order") changed it to nr_online_cpus(). However, for kmem
caches created early before CPUs are onlined, this may lead to
permamently low slab page sizes.
Vincent reports a regression [1] of hackbench on arm64 systems:
"I'm facing significant performances regression on a large arm64
server system (224 CPUs). Regressions is also present on small arm64
system (8 CPUs) but in a far smaller order of magnitude
On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
v5.11-rc4 : 9.135sec (+/- 0.45%)
v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
v5.10: 3.136sec (+/- 0.40%)"
Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:
"i.e. the patch incurs a 7% to 32% performance penalty. This bisected
cleanly yesterday when I was looking for the regression and then
found the thread.
Numerous caches change size. For example, kmalloc-512 goes from
order-0 (vanilla) to order-2 with the revert.
So mostly this is down to the number of times SLUB calls into the
page allocator which only caches order-0 pages on a per-cpu basis"
Clearly num_online_cpus() doesn't work too early in bootup. We could
change the order dynamically in a memory hotplug callback, but runtime
order changing for existing kmem caches has been already shown as
dangerous, and removed in 32a6f409b693 ("mm, slub: remove runtime
allocation order changes").
It could be resurrected in a safe manner with some effort, but to fix
the regression we need something simpler.
We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined. That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].
So this patch tries to determine the best available value without
specific arch knowledge.
- num_present_cpus() if the number is larger than 1, as that means the
arch is likely setting it properly
- nr_cpu_ids otherwise
This should fix the reported regressions while also keeping the effect
of 045ab8c9487b for PowerPC systems. It's possible there are
configurations where num_present_cpus() is 1 during boot while
nr_cpu_ids is at the same time bloated, so these (if they exist) would
keep the large orders based on nr_cpu_ids as was before 045ab8c9487b.
[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/[email protected]/
[3] https://lore.kernel.org/linux-mm/[email protected]/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <[email protected]>
Reported-by: Vincent Guittot <[email protected]>
Reported-by: Mel Gorman <[email protected]>
Tested-by: Mel Gorman <[email protected]>
Tested-by: Vincent Guittot <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Bharata B Rao <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Fixes the following warnings which results in interrupts disabled on
port B/F:
gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver.
gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver.
- added separate irqchip for each interrupt capable gpiochip
- provided unique names for each irqchip
Fixes: d2b091961510 ("gpio: ep93xx: Pass irqchip when adding gpiochip")
Cc: <[email protected]>
Signed-off-by: Nikita Shubin <[email protected]>
Tested-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
Two index spaces and ep93xx_gpio_port are confusing.
Instead add a separate struct to store necessary data and remove
ep93xx_gpio_port.
- add struct to store IRQ related data for each IRQ capable chip
- replace offset array with defined offsets
- add IRQ registers offset for each IRQ capable chip into
ep93xx_gpio_banks
------------[ cut here ]------------
kernel BUG at drivers/gpio/gpio-ep93xx.c:64!
---[ end trace 3f6544e133e9f5ae ]---
Fixes: fd935fc421e74 ("gpio: ep93xx: Do not pingpong irq numbers")
Cc: <[email protected]>
Reviewed-by: Alexander Sverdlin <[email protected]>
Tested-by: Alexander Sverdlin <[email protected]>
Signed-off-by: Nikita Shubin <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
Merely enabling CONFIG_COMPILE_TEST should not enable additional code.
To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS,
and ask the user in case of compile-testing.
Fixes: 6876ca311bfca5d7 ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS")
Cc: <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
It turns out that reasoning for lowering max. supported frequency is
wrong. Scrambling works just fine. Several now fixed bugs prevented
proper functioning, even with rates lower than 340 MHz. Issues were just
more pronounced with higher frequencies.
Fix that by allowing max. supported frequency in HW and fix the comment.
Fixes: cd9063757a22 ("drm/sun4i: DW HDMI: Lower max. supported rate for H6")
Reviewed-by: Chen-Yu Tsai <[email protected]>
Tested-by: Andre Heider <[email protected]>
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As it turns out, vendor HDMI PHY driver for H6 has a pretty big table
of predefined values for various pixel clocks. However, most of them are
not useful/tested because they come from reference driver code. Vendor
PHY driver is concerned with only few of those, namely 27 MHz, 74.25
MHz, 148.5 MHz, 297 MHz and 594 MHz. These are all frequencies for
standard CEA modes.
Fix sun50i_h6_cur_ctr and sun50i_h6_phy_config with the values only for
aforementioned frequencies.
Table sun50i_h6_mpll_cfg doesn't need to be changed because values are
actually frequency dependent and not so much SoC dependent. See i.MX6
documentation for explanation of those values for similar PHY.
Fixes: c71c9b2fee17 ("drm/sun4i: Add support for Synopsys HDMI PHY")
Tested-by: Andre Heider <[email protected]>
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
As expected, HDMI controller clock should always match pixel clock. In
the past, changing HDMI controller rate would seemingly worsen
situation. However, that was the result of other bugs which are now
fixed.
Fix that by removing set_rate quirk and always set clock rate.
Fixes: 40bb9d3147b2 ("drm/sun4i: Add support for H6 DW HDMI controller")
Reviewed-by: Chen-Yu Tsai <[email protected]>
Tested-by: Andre Heider <[email protected]>
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Channel 1 has polarity bits for vsync and hsync signals but driver never
sets them. It turns out that with pre-HDMI2 controllers seemingly there
is no issue if polarity is not set. However, with HDMI2 controllers
(H6) there often comes to de-synchronization due to phase shift. This
causes flickering screen. It's safe to assume that similar issues might
happen also with pre-HDMI2 controllers.
Solve issue with setting vsync and hsync polarity. Note that display
stacks with tcon top have polarity bits actually in tcon0 polarity
register.
Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support")
Reviewed-by: Chen-Yu Tsai <[email protected]>
Tested-by: Andre Heider <[email protected]>
Signed-off-by: Jernej Skrabec <[email protected]>
Signed-off-by: Maxime Ripard <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
We don't have a persistent fb holding a reference to the frontbuffer
object, so every time we do the get+put we throw the frontbuffer object
immediately away. And so the next time around we get a pristine
frontbuffer object with bits==0 even for the old vma. This confuses
the frontbuffer tracking code which understandably expects the old
frontbuffer to have the overlay's bit set.
Fix this by hanging on to the frontbuffer reference until the next
flip. And just to make this a bit more clear let's track the frontbuffer
explicitly instead of just grabbing it via the old vma.
Cc: [email protected]
Cc: Chris Wilson <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136
Signed-off-by: Ville Syrjälä <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking")
Reviewed-by: Chris Wilson <[email protected]>
(cherry picked from commit 553c23bdb4775130f333f07a51b047276bc53f79)
Signed-off-by: Jani Nikula <[email protected]>
|
|
This reverts commit 4a3dea8932d3b1199680d2056dd91d31d94d70b7.
This causes blank screens for some users.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482
Cc: Alvin Lee <[email protected]>
Cc: Jun Lei <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Reviewed-by: Rodrigo Siqueira <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-02-10
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 8 day(s) which contain
a total of 3 files changed, 22 insertions(+), 21 deletions(-).
The main changes are:
1) Fix missed execution of kprobes BPF progs when kprobe is firing via
int3, from Alexei Starovoitov.
2) Fix potential integer overflow in map max_entries for stackmap on
32 bit archs, from Bui Quang Minh.
3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops,
from Daniel Borkmann.
====================
Signed-off-by: David S. Miller <[email protected]>
c# Please enter a commit message to explain why this merge is necessary,
|
|
Fixes small regression in implementation of new mount API.
Signed-off-by: Ronnie Sahlberg <[email protected]>
Reported-by: Hyunchul Lee <[email protected]>
Tested-by: Hyunchul Lee <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.
Before the patch, a write to memory.high would first put the new limit
in place for the workload, and then reclaim the requested delta. After
the patch, the kernel tries to reclaim the delta before putting the new
limit into place, in order to not overwhelm the workload with a sudden,
large excess over the limit. However, if reclaim is actively racing
with new allocations from the uncurbed workload, it can keep the write()
working inside the kernel indefinitely.
This is causing problems in Facebook production. A privileged
system-level daemon that adjusts memory.high for various workloads
running on a host can get unexpectedly stuck in the kernel and
essentially turn into a sort of involuntary kswapd for one of the
workloads. We've observed that daemon busy-spin in a write() for
minutes at a time, neglecting its other duties on the system, and
expending privileged system resources on behalf of a workload.
To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged
to the new limit or not - and bound the write() call this way. However,
the root cause that inspired the sequence change in the first place has
been fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.
The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is
meant to throttle excessive growth from below. Allocating threads could
end up sleeping long after the write() had already reclaimed the delta
for which they were being punished.
However, erroneous throttling also caused problems in other scenarios at
around the same time. This resulted in commit b3ff92916af3 ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included
in the same release as the offending commit. When allocating threads
now encounter large excess caused by a racing write() to memory.high,
instead of entering punitive sleeps, they will simply be tasked with
helping reclaim down the excess, and will be held no longer than it
takes to accomplish that. This is in line with regular limit
enforcement - i.e. if the workload allocates up against or over an
otherwise unchanged limit from below.
With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <[email protected]>
Reported-by: Tejun Heo <[email protected]>
Acked-by: Chris Down <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Michal Koutný <[email protected]>
Cc: <[email protected]> [5.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Update my email, @virtuozzo.com will stop working shortly.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andrey Ryabinin <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit c2aa8afc36fa has renamed run_vmtests in Makefile, but the file
still uses the old name.
The kernel test robot reported the following issue:
# selftests: vm: run_vmtests.sh
# Warning: file run_vmtests.sh is missing!
not ok 1 selftests: vm: run_vmtests.sh
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c2aa8afc36fa (selftests/vm: rename run_vmtests --> run_vmtests.sh)
Signed-off-by: Rong Chen <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail. This leads to erroneous behaviours such as
this:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t.
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers
and display "inode64" in the mount options, but passing the "inode64"
mount option will fail. This leads to the following behavior:
# mkdir mnt
# mount -t tmpfs nodev mnt
# mount -o remount,rw mnt
mount: /home/ubuntu/mnt: mount point not mounted or bad option.
As mount sees "inode64" in the mount options and thus passes it in the
options for the remount.
So prevent CONFIG_TMPFS_INODE64 from being selected on s390.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Chris Down <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Amir Goldstein <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: <[email protected]> [5.9+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
clang can't evaluate this function argument at compile time when the
function is not inlined, which leads to a link time failure:
ld.lld: error: undefined symbol: __compiletime_assert_414
>>> referenced by mremap.c
>>> mremap.o:(get_extent) in archive mm/built-in.a
Mark the function as __always_inline to avoid it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place")
Signed-off-by: Arnd Bergmann <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Sedat Dilek <[email protected]>
Cc: Kirill A. Shutemov" <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Dmitry Safonov <[email protected]>
Cc: Brian Geffon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations. The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.
The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty.
Unconditionally align .builtin_fw to fix the linker error. 32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).
Link: https://lkml.kernel.org/r/[email protected]
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Signed-off-by: Fangrui Song <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Tested-by: Nick Desaulniers <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
Acked-by: Nathan Chancellor <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently, whether the alloc/free stack traces collection is enabled by
default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.
The intention for this dependency was to only enable collection on slow
debug kernels due to a significant perf and memory impact.
As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option
and is enabled on many productions kernels including Android and Ubuntu.
As the result, this dependency is pointless and only complicates the
code and documentation.
Having stack traces collection disabled by default would make the
hardware mode work differently to to the software ones, which is
confusing.
This change removes the dependency and enables stack traces collection
by default.
Looking into the future, this default might makes sense for production
kernels, assuming we implement a fast stack trace collection approach.
Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Peter Collingbourne <[email protected]>
Cc: Evgenii Stepanov <[email protected]>
Cc: Branislav Rankov <[email protected]>
Cc: Kevin Brodsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit. This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.
This patch adds a number of additional sanity checks to detect this
corruption and others.
1. It checks for a corrupted xattr index read from the inode. This could
be because the metadata block is uncompressed, or because the
"compression" bit has been corrupted (turning a compressed block
into an uncompressed block). This would cause an out of bounds read.
2. It checks against corruption of the xattr_ids count. This can either
lead to the above kmalloc failure, or a smaller than expected
table to be read.
3. It checks the contents of the index table for corruption.
[[email protected]: fix checkpatch issue]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported an "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode. This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the inodes count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large inodes count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
[[email protected]: fix checkpatch issue]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sysbot has reported a number of "slab-out-of-bounds reads" and
"use-after-free read" errors which has been identified as being caused
by a corrupted index value read from the inode. This could be because
the metadata block is uncompressed, or because the "compression" bit has
been corrupted (turning a compressed block into an uncompressed block).
This patch adds additional sanity checks to detect this, and the
following corruption.
1. It checks against corruption of the ids count. This can either
lead to a larger table to be read, or a smaller than expected
table to be read.
In the case of a too large ids count, this would often have been
trapped by the existing sanity checks, but this patch introduces
a more exact check, which can identify too small values.
2. It checks the contents of the index table for corruption.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Squashfs: fix BIO migration regression and add sanity checks".
Patch [1/4] fixes a regression introduced by the "migrate from
ll_rw_block usage to BIO" patch, which has produced a number of
Sysbot/Syzkaller reports.
Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced Sysbot reports in the id, inode and xattr
lookup code.
Each patch has been tested against the Sysbot reproducers using the
given kernel configuration. They have the appropriate "Reported-by:"
lines added.
Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.
Additional testing with other configurations and architectures (32bit,
big endian), and normal filesystems has also been done to trap any
inadvertent regressions caused by the additional sanity checks.
This patch (of 4):
This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".
Sysbot/Syskaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above
patch.
Specifically, the patch removed the following sanity check
if (length < 0 || length > output->length ||
(index + length) > msblk->bytes_used)
This check did two things:
1. It ensured any reads were not beyond the end of the filesystem
2. It ensured that the "length" field read from the filesystem
was within the expected maximum length. Without this any
corrupted values can over-run allocated buffers.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: [email protected]
Signed-off-by: Phillip Lougher <[email protected]>
Cc: Philippe Liard <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux
Pull i3c fix from Alexandre Belloni:
"A single build warning fix"
* tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match
|
|
While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:
# bpftool p d x i 13
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
[...]
In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = -1
1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=invP-1 R1_w=invP-1 R10=fp0
2: (3c) w0 /= w1
3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
3: (77) r1 >>= 32
4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
4: (bf) r0 = r1
5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
5: (95) exit
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:
div, 64 bit: div, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2
3: (ac) w1 ^= w1 3: (ac) w1 ^= w1
4: (05) goto pc+1 4: (05) goto pc+1
5: (3f) r1 /= r6 5: (3c) w1 /= w6
6: (b7) r0 = 0 6: (b7) r0 = 0
7: (95) exit 7: (95) exit
mod, 64 bit: mod, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1
3: (9f) r1 %= r6 3: (9c) w1 %= w6
4: (b7) r0 = 0 4: (b7) r0 = 0
5: (95) exit 5: (95) exit
x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.
Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <[email protected]>
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:
func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 808464450
1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b4) w4 = 808464432
2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
2: (9c) w4 %= w0
3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
3: (66) if w4 s> 0x30303030 goto pc+0
R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
propagating r0
from 6 to 7: safe
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
propagating r0
7: safe
propagating r0
from 6 to 7: safe
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
The underlying program was xlated as follows:
# bpftool p d x i 10
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
5: (66) if w4 s> 0x30303030 goto pc+0
6: (7f) r0 >>= r0
7: (bc) w0 = w0
8: (15) if r0 == 0x0 goto pc+1
9: (9c) w4 %= w0
10: (66) if w0 s> 0x3030 goto pc+0
11: (d6) if w0 s<= 0x303030 goto pc+1
12: (05) goto pc-1
13: (95) exit
The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we are
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both old/cur registers are marked as precise, they get misjudged
for the jmp32 case as range_within() yields true, meaning that the prior
verification path with a wider register bound could be verified successfully
and therefore the current path with a narrower register bound is deemed safe
as well whereas in reality it's not. R0 old/cur path's bounds compare as
follows:
old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)
old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff
The 64 bit bounds generally look okay and while the information that got
propagated from 32 to 64 bit looks correct as well, it's not precise enough
for judging a conditional jmp32. Given the latter only operates on subregisters
we also need to take these into account as well for a range_within() probe
in order to be able to prune paths. Extending the range_within() constraint
to both bounds will be able to tell us that the old signed 32 bit bounds are
not wider than the cur signed 32 bit bounds.
With the fix in place, the program will now verify the 'goto' branch case as
it should have been:
[...]
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1
The bug is quite subtle in the sense that when verifier would determine that
a given branch is dead code, it would (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs instead
of the 'goto pc-1' rewrites which will cause hard to debug problems.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means branch will be taken [for the involved registers] and the
goto target will be executed, 0 means branch will not be taken and instead we
fall-through to the next insn, and last but not least a -1 denotes that it is
not known at verification time whether a branch will be taken or not. Now while
the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
since the branch will also be taken for reg->s32_max_value == sval. The jgt
branch analysis, for example, gets this right.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|