aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-02-11perf daemon: Add 'ping' commandJiri Olsa2-0/+157
Add a 'ping' command to verify that the 'perf record' session is up and operational. It's used in the following patches via test code to make sure 'perf record' is ready to receive signals. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Ping all sessions: # perf daemon ping OK cycles OK sched Ping specific session: # perf daemon ping --session sched OK sched Committer notes: Fixed up bug pointed by clang: Buggy: if (!pollfd.revents & POLLIN) Correct code: if (!(pollfd.revents & POLLIN)) clang warning: builtin-daemon.c:560:6: error: logical not is only applied to the left hand side of this bitwise operator [-Werror,-Wlogical-not-parentheses] if (!pollfd.revents & POLLIN) { ^ ~ builtin-daemon.c:560:6: note: add parentheses after the '!' to evaluate the bitwise operator first Also use designated initialized with pollfd, i.e.: struct pollfd pollfd = { .events = POLLIN, }; Instead of: struct pollfd pollfd = { 0, }; To get past: builtin-daemon.c:510:30: error: missing field 'events' initializer [-Werror,-Wmissing-field-initializers] struct pollfd pollfd = { 0, }; ^ 1 error generated. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Set control fifo for sessionJiri Olsa2-1/+28
Setup control fifos for session and add --control option to session arguments. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start Use can list control fifos with (control and ack files): # perf daemon -v [776459:daemon] base: /opt/perfdata output: /opt/perfdata/output lock: /opt/perfdata/lock [776460:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output control: /opt/perfdata/session-cycles/control ack: /opt/perfdata/session-cycles/ack [776461:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output control: /opt/perfdata/session-sched/control ack: /opt/perfdata/session-sched/ack Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Allow only one daemon over base directoryJiri Olsa2-1/+77
Add 'lock' file under daemon base and flock it, so only one perf daemon can run on top of it. Each daemon tries to create and lock BASE/lock file, if it's successful we are sure we're the only daemon running over the BASE. Once daemon is finished, file descriptor to lock file is closed and lock is released. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Starting the daemon: # perf daemon start And try once more: # perf daemon start failed: another perf daemon (pid 775594) owns /opt/perfdata will end up with an error, because there's already one running on top of /opt/perfdata. Committer notes: Provide lockf(F_TLOCK) when not available, i.e. transform: lockf(fd, F_TLOCK, 0); into: flock(fd, LOCK_EX | LOCK_NB); Which should be equivalent. Noticed when cross building to some odd Android NDK. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add 'stop' commandJiri Olsa2-0/+35
Add 'perf daemon stop' command to stop daemon process and all running sessions. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Stop the daemon # perf daemon stop Daemon is not running, nothing to connect to: # perf daemon connect error: Connection refused Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add 'signal' commandJiri Olsa2-7/+77
Allow the 'perf daemon' to send SIGUSR2 to all running sessions or just to a specific session. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start Send signal to all running sessions: # perf daemon signal signal 12 sent to session 'cycles [773738]' signal 12 sent to session 'sched [773739]' Or to specific one: # perf daemon signal --session sched signal 12 sent to session 'sched [773739]' And verify signals were delivered and perf.data dumped: # cat /opt/perfdata/session-cycles/output rounding mmap pages size to 32M (8192 pages) [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220382490 ] # car /opt/perfdata/session-sched/output rounding mmap pages size to 32M (8192 pages) [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220382489 ] [ perf record: dump data: Woken up 1 times ] [ perf record: Dump perf.data.2021010220393745 ] Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add 'list' commandJiri Olsa1-3/+85
Add a 'list' command to display all running sessions. It's the default command if no other command is specified. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a Start the daemon: # perf daemon start List sessions: # perf daemon [771394:daemon] base: /opt/perfdata [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a List sessions with more info: # perf daemon -v [771394:daemon] base: /opt/perfdata output: /opt/perfdata/output [771395:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a base: /opt/perfdata/session-cycles output: /opt/perfdata/session-cycles/output [771396:sched] perf record -m 20M -e sched:* --overwrite --switch-output -a base: /opt/perfdata/session-sched output: /opt/perfdata/session-sched/output The 'output' file is perf record output for specific session. Note you have to stop all running perf processes manually at this point, stop command is coming in following patches. Committer notes: Fixup union initialization to overcome this in multiple older systems: 22 15.74 debian:8 : FAIL gcc version 4.9.2 (Debian 4.9.2-10+deb8u2) builtin-daemon.c: In function 'send_cmd_list': builtin-daemon.c:1386:2: error: missing initializer for field 'csv_sep' of 'struct <anonymous>' [-Werror=missing-field-initializers] }; ^ builtin-daemon.c:641:8: note: 'csv_sep' declared here char csv_sep; ^ cc1: all warnings being treated as errors Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add signalfd supportJiri Olsa1-6/+155
Use a signalfd fd to track SIGCHLD signals as notifications for perf session termination. This way we don't need to actively check for child status, being notified if there's change. Suggested-by: Alexei Budankov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add background supportJiri Olsa2-0/+66
Add support to put the daemon process in the background. It's now enabled by default and -f option is added to keep the daemon process on the console for debugging. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add config file change checkJiri Olsa1-3/+97
Add support to detect changes to the daemon's config file triggering a re-read of the configuration when that happens. Use a inotify file descriptor plugged into the main fdarray object for polling. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a Starting the daemon: # perf daemon start Check sessions: # perf daemon [772262:daemon] base: /opt/perfdata [772263:cycles] perf record -m 10M -e cycles --overwrite --switch-output -a Change '-m 10M' to '-m 20M', and check daemon log: # tail -f /opt/perfdata/output [2021-01-02 20:31:41.234045] daemon started (pid 772262) [2021-01-02 20:31:41.235072] reconfig: ruining session [cycles:772263]: -m 10M -e cycles --overwrite --switch-output -a [2021-01-02 20:32:08.310137] reconfig: session 'cycles' killed [2021-01-02 20:32:08.310847] reconfig: ruining session [cycles:772338]: -m 20M -e cycles --overwrite --switch-output -a And the session list: # perf daemon [772262:daemon] base: /opt/perfdata [772338:cycles] perf record -m 20M -e cycles --overwrite --switch-output -a Note the changed '-m 20M' option is in place. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add config file supportJiri Olsa3-2/+393
Adding support to configure daemon with config file. Each client or server invocation of perf daemon needs to know the base directory, where all sessions data is stored. The base is defined with: daemon.base Base path for daemon data. All sessions data are stored under this path. The daemon allows to create record sessions. Each session is a record command spawned and monitored by perf daemon. The session is defined with: session-<NAME>.run Defines new record session for daemon. The value is record's command line without the 'record' keyword. Example: # cat ~/.perfconfig [daemon] base=/opt/perfdata [session-cycles] run = -m 10M -e cycles --overwrite --switch-output -a [session-sched] run = -m 20M -e sched:* --overwrite --switch-output -a The example above defines '/opt/perfdata' as the base directory and 2 record sessions. # perf daemon start [2021-01-28 19:47:33.454413] daemon started (pid 16015) [2021-01-28 19:47:33.455910] reconfig: ruining session [cycles:16016]: -m 10M -e cycles --overwrite --switch-output -a [2021-01-28 19:47:33.456599] reconfig: ruining session [sched:16017]: -m 20M -e sched:* --overwrite --switch-output -a # ps -ef | grep perf ... perf daemon start ... /home/jolsa/.../perf record -m 20M -e cycles --overwrite --switch-output -a ... /home/jolsa/.../perf record -m 20M -e sched:* --overwrite --switch-output -a The base directory is populated with: # find /opt/perfdata/ /opt/perfdata/ /opt/perfdata/control <- control socket /opt/perfdata/session-cycles <- data for session 'cycles': /opt/perfdata/session-cycles/output <- perf record output /opt/perfdata/session-cycles/perf.data <- perf data /opt/perfdata/session-sched <- ditto for session 'sched' /opt/perfdata/session-sched/output /opt/perfdata/session-sched/perf.data Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11perf daemon: Add client socket supportJiri Olsa1-0/+138
Add support for client socket side that will be used to send commands to the daemon server socket. This patch adds only the core support, all commands using this functionality are coming in the following patches. Committer notes: Hat to patch patch it to deal with this in some systems: cc1: warnings being treated as errors builtin-daemon.c: In function 'send_cmd': MKDIR /tmp/build/perf/bench/ builtin-daemon.c:1368: error: ignoring return value of 'fwrite', declared with attribute warn_unused_result MKDIR /tmp/build/perf/tests/ make[3]: *** [/tmp/build/perf/builtin-daemon.o] Error 1 And also to not leak the 'line' buffer allocated by getline(), since you initialized line to NULL and len to zero, man page says: If *lineptr is set to NULL and *n is set 0 before the call, then getline() will allocate a buffer for storing the line. This buffer should be freed by the user program even if getline() failed. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Budankov <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2021-02-11drm/ttm: make sure pool pages are clearedChristian König1-0/+10
The old implementation wasn't consistend on this. But it looks like we depend on this so better bring it back. Signed-off-by: Christian König <[email protected]> Reported-and-tested-by: Mike Galbraith <[email protected]> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3") Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-02-11arm/xen: Don't probe xenbus as part of an early initcallJulien Grall4-6/+1
After Commit 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI"), xenbus_probe() will be called too early on Arm. This will recent to a guest hang during boot. If the hang wasn't there, we would have ended up to call xenbus_probe() twice (the second time is in xenbus_probe_initcall()). We don't need to initialize xenbus_probe() early for Arm guest. Therefore, the call in xen_guest_init() is now removed. After this change, there is no more external caller for xenbus_probe(). So the function is turned to a static one. Interestingly there were two prototypes for it. Cc: [email protected] Fixes: 3499ba8198cad ("xen: Fix event channel callback via INTX/GSI") Reported-by: Ian Jackson <[email protected]> Signed-off-by: Julien Grall <[email protected]> Reviewed-by: David Woodhouse <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2021-02-10Revert "dts: phy: add GPIO number and active state used for phy reset"Palmer Dabbelt1-1/+0
VSC8541 phys need a special reset sequence, which the driver doesn't currentlny support. As a result enabling the reset via GPIO essentially guarnteees that the device won't work correctly. We've been relying on bootloaders to reset the device for years, with this revert we'll go back to doing so until we can sort out how to get the reset sequence into the kernel. This reverts commit a0fa9d727043da2238432471e85de0bdb8a8df65. Fixes: a0fa9d727043 ("dts: phy: add GPIO number and active state used for phy reset") Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2021-02-10x86/pci: Create PCI/MSI irqdomain after x86_init.pci.arch_init()Thomas Gleixner1-4/+11
Invoking x86_init.irqs.create_pci_msi_domain() before x86_init.pci.arch_init() breaks XEN PV. The XEN_PV specific pci.arch_init() function overrides the default create_pci_msi_domain() which is obviously too late. As a consequence the XEN PV PCI/MSI allocation goes through the native path which runs out of vectors and causes malfunction. Invoke it after x86_init.pci.arch_init(). Fixes: 6b15ffa07dc3 ("x86/irq: Initialize PCI/MSI domain at PCI init time") Reported-by: Juergen Gross <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Juergen Gross <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2021-02-10Merge tag 'pm-5.11-rc8' of ↵Linus Torvalds2-12/+104
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Address a performance regression related to scale-invariance on x86 that may prevent turbo CPU frequencies from being used in certain workloads on systems using acpi-cpufreq as the CPU performance scaling driver and schedutil as the scaling governor" * tag 'pm-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there cpufreq: ACPI: Extend frequency tables to cover boost frequencies
2021-02-10Merge tag 'acpi-5.11-rc8' of ↵Linus Torvalds1-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a problematic ACPICA commit that changed the code to attempt to update memory regions which may be read-only on some systems (Ard Biesheuvel)" * tag 'acpi-5.11-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPICA: Interpreter: fix memory leak by using existing buffer"
2021-02-10Merge tag 'dmaengine-fix2-5.11' of ↵Linus Torvalds8-63/+104
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: "Some late fixes for dmaengine: Core: - fix channel device_node deletion Driver fixes: - dw: revert of runtime pm enabling - idxd: device state fix, interrupt completion and list corruption - ti: resource leak * tag 'dmaengine-fix2-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine dw: Revert "dmaengine: dw: Enable runtime PM" dmaengine: idxd: check device state before issue command dmaengine: ti: k3-udma: Fix a resource leak in an error handling path dmaengine: move channel device_node deletion to driver dmaengine: idxd: fix misc interrupt completion dmaengine: idxd: Fix list corruption in description completion
2021-02-10Revert "io_uring: don't take fs for recvmsg/sendmsg"Jens Axboe1-2/+4
This reverts commit 10cad2c40dcb04bb46b2bf399e00ca5ea93d36b0. Petr reports that with this commit in place, io_uring fails the chroot test (CVE-202-29373). We do need to retain ->fs for send/recvmsg, so revert this commit. Reported-by: Petr Vorel <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds48-134/+429
Pull networking fixes from David Miller: "Another pile of networing fixes: 1) ath9k build error fix from Arnd Bergmann 2) dma memory leak fix in mediatec driver from Lorenzo Bianconi. 3) bpf int3 kprobe fix from Alexei Starovoitov. 4) bpf stackmap integer overflow fix from Bui Quang Minh. 5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from Christoph Schemmel. 6) Don't update deleted entry in xt_recent netfilter module, from Jazsef Kadlecsik. 7) Use after free in nftables, fix from Pablo Neira Ayuso. 8) Header checksum fix in flowtable from Sven Auhagen. 9) Validate user controlled length in qrtr code, from Sabyrzhan Tasbolatov. 10) Fix race in xen/netback, from Juergen Gross, 11) New device ID in cxgb4, from Raju Rangoju. 12) Fix ring locking in rxrpc release call, from David Howells. 13) Don't return LAPB error codes from x25_open(), from Xie He. 14) Missing error returns in gsi_channel_setup() from Alex Elder. 15) Get skb_copy_and_csum_datagram working properly with odd segment sizes, from Willem de Bruijn. 16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean. 17) Do teardown on probe failure in DSA, from Vladimir Oltean. 18) Fix compilation failures of txtimestamp selftest, from Vadim Fedorenko. 19) Limit rx per-napi gro queue size to fix latency regression, from Eric Dumazet. 20) dpaa_eth xdp fixes from Camelia Groza. 21) Missing txq mode update when switching CBS off, in stmmac driver, from Mohammad Athari Bin Ismail. 22) Failover pending logic fix in ibmvnic driver, from Sukadev Bhattiprolu. 23) Null deref fix in vmw_vsock, from Norbert Slusarek. 24) Missing verdict update in xdp paths of ena driver, from Shay Agroskin. 25) seq_file iteration fix in sctp from Neil Brown. 26) bpf 32-bit src register truncation fix on div/mod, from Daniel Borkmann. 27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann. 28) Fix locking in vsock_shutdown(), from Stefano Garzarella. 29) Various missing index bound checks in hns3 driver, from Yufeng Mo. 30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from Vladimir Oltean. 31) Don't mix up stp and mrp port states in bridge layer, from Horatiu Vultur. 32) Fix locking during netif_tx_disable(), from Edwin Peer" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) bpf: Fix 32 bit src register truncation on div/mod bpf: Fix verifier jmp32 pruning decision logic bpf: Fix verifier jsgt branch analysis on max bound vsock: fix locking in vsock_shutdown() net: hns3: add a check for index in hclge_get_rss_key() net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx() net: hns3: add a check for queue_id in hclge_reset_vf_queue() net: dsa: felix: implement port flushing on .phylink_mac_link_down switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state net: watchdog: hold device global xmit lock during tx disable netfilter: nftables: relax check for stateful expressions in set definition netfilter: conntrack: skip identical origin tuple in same zone only vsock/virtio: update credit only if socket is not closed net: fix iteration for sctp transport seq_files net: ena: Update XDP verdict upon failure net/vmw_vsock: improve locking in vsock_connect_timeout() net/vmw_vsock: fix NULL pointer dereference ibmvnic: Clear failover_pending if unable to schedule net: stmmac: set TxQ mode back to DCB after disabling CBS ...
2021-02-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds18-48/+171
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, mremap, tmpfs, selftests, memcg, and slub), MAINTAINERS, squashfs, nilfs2, and firmware" * emailed patches from Andrew Morton <[email protected]>: nilfs2: make splice write available again mm, slub: better heuristic for number of cpus when calculating slab order Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" MAINTAINERS: update Andrey Ryabinin's email address selftests/vm: rename file run_vmtests to run_vmtests.sh tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 mm/mremap: fix BUILD_BUG_ON() error in get_extent firmware_loader: align .builtin_fw to 8 kasan: fix stack traces dependency for HW_TAGS squashfs: add more sanity checks in xattr id lookup squashfs: add more sanity checks in inode lookup squashfs: add more sanity checks in id lookup squashfs: avoid out of bounds writes in decompressors
2021-02-10nilfs2: make splice write available againJoachim Henke1-0/+1
Since 5.10, splice() or sendfile() to NILFS2 return EINVAL. This was caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops"). This patch initializes the splice_write field in file_operations, like most file systems do, to restore the functionality. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Joachim Henke <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> [5.10+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-10mm, slub: better heuristic for number of cpus when calculating slab orderVlastimil Babka1-2/+16
When creating a new kmem cache, SLUB determines how large the slab pages will based on number of inputs, including the number of CPUs in the system. Larger slab pages mean that more objects can be allocated/free from per-cpu slabs before accessing shared structures, but also potentially more memory can be wasted due to low slab usage and fragmentation. The rough idea of using number of CPUs is that larger systems will be more likely to benefit from reduced contention, and also should have enough memory to spare. Number of CPUs used to be determined as nr_cpu_ids, which is number of possible cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") changed it to nr_online_cpus(). However, for kmem caches created early before CPUs are onlined, this may lead to permamently low slab page sizes. Vincent reports a regression [1] of hackbench on arm64 systems: "I'm facing significant performances regression on a large arm64 server system (224 CPUs). Regressions is also present on small arm64 system (8 CPUs) but in a far smaller order of magnitude On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16 v5.11-rc4 : 9.135sec (+/- 0.45%) v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%) v5.10: 3.136sec (+/- 0.40%)" Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting page allocator contention: "i.e. the patch incurs a 7% to 32% performance penalty. This bisected cleanly yesterday when I was looking for the regression and then found the thread. Numerous caches change size. For example, kmalloc-512 goes from order-0 (vanilla) to order-2 with the revert. So mostly this is down to the number of times SLUB calls into the page allocator which only caches order-0 pages on a per-cpu basis" Clearly num_online_cpus() doesn't work too early in bootup. We could change the order dynamically in a memory hotplug callback, but runtime order changing for existing kmem caches has been already shown as dangerous, and removed in 32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be resurrected in a safe manner with some effort, but to fix the regression we need something simpler. We could use num_present_cpus() that should be the number of physically present CPUs even before they are onlined. That would work for PowerPC [3], which triggered the original commit, but that still doesn't work on arm64 [4] as explained in [5]. So this patch tries to determine the best available value without specific arch knowledge. - num_present_cpus() if the number is larger than 1, as that means the arch is likely setting it properly - nr_cpu_ids otherwise This should fix the reported regressions while also keeping the effect of 045ab8c9487b for PowerPC systems. It's possible there are configurations where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same time bloated, so these (if they exist) would keep the large orders based on nr_cpu_ids as was before 045ab8c9487b. [1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/ [2] https://lore.kernel.org/linux-mm/[email protected]/ [3] https://lore.kernel.org/linux-mm/[email protected]/ [4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/ [5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Vincent Guittot <[email protected]> Reported-by: Mel Gorman <[email protected]> Tested-by: Mel Gorman <[email protected]> Tested-by: Vincent Guittot <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Jann Horn <[email protected]> Cc: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Will Deacon <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-10gpio: ep93xx: Fix single irqchip with multi gpiochipsNikita Shubin1-11/+19
Fixes the following warnings which results in interrupts disabled on port B/F: gpio gpiochip1: (B): detected irqchip that is shared with multiple gpiochips: please fix the driver. gpio gpiochip5: (F): detected irqchip that is shared with multiple gpiochips: please fix the driver. - added separate irqchip for each interrupt capable gpiochip - provided unique names for each irqchip Fixes: d2b091961510 ("gpio: ep93xx: Pass irqchip when adding gpiochip") Cc: <[email protected]> Signed-off-by: Nikita Shubin <[email protected]> Tested-by: Alexander Sverdlin <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2021-02-10gpio: ep93xx: fix BUG_ON port F usageNikita Shubin1-87/+99
Two index spaces and ep93xx_gpio_port are confusing. Instead add a separate struct to store necessary data and remove ep93xx_gpio_port. - add struct to store IRQ related data for each IRQ capable chip - replace offset array with defined offsets - add IRQ registers offset for each IRQ capable chip into ep93xx_gpio_banks ------------[ cut here ]------------ kernel BUG at drivers/gpio/gpio-ep93xx.c:64! ---[ end trace 3f6544e133e9f5ae ]--- Fixes: fd935fc421e74 ("gpio: ep93xx: Do not pingpong irq numbers") Cc: <[email protected]> Reviewed-by: Alexander Sverdlin <[email protected]> Tested-by: Alexander Sverdlin <[email protected]> Signed-off-by: Nikita Shubin <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2021-02-10gpio: mxs: GPIO_MXS should not default to y unconditionallyGeert Uytterhoeven1-1/+2
Merely enabling CONFIG_COMPILE_TEST should not enable additional code. To fix this, restrict the automatic enabling of GPIO_MXS to ARCH_MXS, and ask the user in case of compile-testing. Fixes: 6876ca311bfca5d7 ("gpio: mxs: add COMPILE_TEST support for GPIO_MXS") Cc: <[email protected]> Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2021-02-10drm/sun4i: dw-hdmi: Fix max. frequency for H6Jernej Skrabec1-4/+2
It turns out that reasoning for lowering max. supported frequency is wrong. Scrambling works just fine. Several now fixed bugs prevented proper functioning, even with rates lower than 340 MHz. Issues were just more pronounced with higher frequencies. Fix that by allowing max. supported frequency in HW and fix the comment. Fixes: cd9063757a22 ("drm/sun4i: DW HDMI: Lower max. supported rate for H6") Reviewed-by: Chen-Yu Tsai <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-02-10drm/sun4i: Fix H6 HDMI PHY configurationJernej Skrabec1-17/+9
As it turns out, vendor HDMI PHY driver for H6 has a pretty big table of predefined values for various pixel clocks. However, most of them are not useful/tested because they come from reference driver code. Vendor PHY driver is concerned with only few of those, namely 27 MHz, 74.25 MHz, 148.5 MHz, 297 MHz and 594 MHz. These are all frequencies for standard CEA modes. Fix sun50i_h6_cur_ctr and sun50i_h6_phy_config with the values only for aforementioned frequencies. Table sun50i_h6_mpll_cfg doesn't need to be changed because values are actually frequency dependent and not so much SoC dependent. See i.MX6 documentation for explanation of those values for similar PHY. Fixes: c71c9b2fee17 ("drm/sun4i: Add support for Synopsys HDMI PHY") Tested-by: Andre Heider <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-02-10drm/sun4i: dw-hdmi: always set clock rateJernej Skrabec2-4/+1
As expected, HDMI controller clock should always match pixel clock. In the past, changing HDMI controller rate would seemingly worsen situation. However, that was the result of other bugs which are now fixed. Fix that by removing set_rate quirk and always set clock rate. Fixes: 40bb9d3147b2 ("drm/sun4i: Add support for H6 DW HDMI controller") Reviewed-by: Chen-Yu Tsai <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-02-10drm/sun4i: tcon: set sync polarity for tcon1 channelJernej Skrabec2-0/+31
Channel 1 has polarity bits for vsync and hsync signals but driver never sets them. It turns out that with pre-HDMI2 controllers seemingly there is no issue if polarity is not set. However, with HDMI2 controllers (H6) there often comes to de-synchronization due to phase shift. This causes flickering screen. It's safe to assume that similar issues might happen also with pre-HDMI2 controllers. Solve issue with setting vsync and hsync polarity. Note that display stacks with tcon top have polarity bits actually in tcon0 polarity register. Fixes: 9026e0d122ac ("drm: Add Allwinner A10 Display Engine support") Reviewed-by: Chen-Yu Tsai <[email protected]> Tested-by: Andre Heider <[email protected]> Signed-off-by: Jernej Skrabec <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-02-10drm/i915: Fix overlay frontbuffer trackingVille Syrjälä1-9/+8
We don't have a persistent fb holding a reference to the frontbuffer object, so every time we do the get+put we throw the frontbuffer object immediately away. And so the next time around we get a pristine frontbuffer object with bits==0 even for the old vma. This confuses the frontbuffer tracking code which understandably expects the old frontbuffer to have the overlay's bit set. Fix this by hanging on to the frontbuffer reference until the next flip. And just to make this a bit more clear let's track the frontbuffer explicitly instead of just grabbing it via the old vma. Cc: [email protected] Cc: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1136 Signed-off-by: Ville Syrjälä <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Fixes: 8e7cb1799b4f ("drm/i915: Extract intel_frontbuffer active tracking") Reviewed-by: Chris Wilson <[email protected]> (cherry picked from commit 553c23bdb4775130f333f07a51b047276bc53f79) Signed-off-by: Jani Nikula <[email protected]>
2021-02-09Revert "drm/amd/display: Update NV1x SR latency values"Alex Deucher1-2/+2
This reverts commit 4a3dea8932d3b1199680d2056dd91d31d94d70b7. This causes blank screens for some users. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1482 Cc: Alvin Lee <[email protected]> Cc: Jun Lei <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Reviewed-by: Rodrigo Siqueira <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-21/+22
Daniel Borkmann says: ==================== pull-request: bpf 2021-02-10 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 8 day(s) which contain a total of 3 files changed, 22 insertions(+), 21 deletions(-). The main changes are: 1) Fix missed execution of kprobes BPF progs when kprobe is firing via int3, from Alexei Starovoitov. 2) Fix potential integer overflow in map max_entries for stackmap on 32 bit archs, from Bui Quang Minh. 3) Fix a verifier pruning and a insn rewrite issue related to 32 bit ops, from Daniel Borkmann. ==================== Signed-off-by: David S. Miller <[email protected]> c# Please enter a commit message to explain why this merge is necessary,
2021-02-09cifs: do not disable noperm if multiuser mount option is not providedRonnie Sahlberg1-2/+2
Fixes small regression in implementation of new mount API. Signed-off-by: Ronnie Sahlberg <[email protected]> Reported-by: Hyunchul Lee <[email protected]> Tested-by: Hyunchul Lee <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-02-09Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"Johannes Weiner1-3/+2
This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can cause writers to memory.high to get stuck in the kernel forever, performing page reclaim and consuming excessive amounts of CPU cycles. Before the patch, a write to memory.high would first put the new limit in place for the workload, and then reclaim the requested delta. After the patch, the kernel tries to reclaim the delta before putting the new limit into place, in order to not overwhelm the workload with a sudden, large excess over the limit. However, if reclaim is actively racing with new allocations from the uncurbed workload, it can keep the write() working inside the kernel indefinitely. This is causing problems in Facebook production. A privileged system-level daemon that adjusts memory.high for various workloads running on a host can get unexpectedly stuck in the kernel and essentially turn into a sort of involuntary kswapd for one of the workloads. We've observed that daemon busy-spin in a write() for minutes at a time, neglecting its other duties on the system, and expending privileged system resources on behalf of a workload. To remedy this, we have first considered changing the reclaim logic to break out after a couple of loops - whether the workload has converged to the new limit or not - and bound the write() call this way. However, the root cause that inspired the sequence change in the first place has been fixed through other means, and so a revert back to the proven limit-setting sequence, also used by memory.max, is preferable. The sequence was changed to avoid extreme latencies in the workload when the limit was lowered: the sudden, large excess created by the limit lowering would erroneously trigger the penalty sleeping code that is meant to throttle excessive growth from below. Allocating threads could end up sleeping long after the write() had already reclaimed the delta for which they were being punished. However, erroneous throttling also caused problems in other scenarios at around the same time. This resulted in commit b3ff92916af3 ("mm, memcg: reclaim more aggressively before high allocator throttling"), included in the same release as the offending commit. When allocating threads now encounter large excess caused by a racing write() to memory.high, instead of entering punitive sleeps, they will simply be tasked with helping reclaim down the excess, and will be held no longer than it takes to accomplish that. This is in line with regular limit enforcement - i.e. if the workload allocates up against or over an otherwise unchanged limit from below. With the patch breaking userspace, and the root cause addressed by other means already, revert it again. Link: https://lkml.kernel.org/r/[email protected] Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Tejun Heo <[email protected]> Acked-by: Chris Down <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Michal Koutný <[email protected]> Cc: <[email protected]> [5.8+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09MAINTAINERS: update Andrey Ryabinin's email addressAndrey Ryabinin2-1/+2
Update my email, @virtuozzo.com will stop working shortly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09selftests/vm: rename file run_vmtests to run_vmtests.shRong Chen1-0/+0
Commit c2aa8afc36fa has renamed run_vmtests in Makefile, but the file still uses the old name. The kernel test robot reported the following issue: # selftests: vm: run_vmtests.sh # Warning: file run_vmtests.sh is missing! not ok 1 selftests: vm: run_vmtests.sh Link: https://lkml.kernel.org/r/[email protected] Fixes: c2aa8afc36fa (selftests/vm: rename run_vmtests --> run_vmtests.sh) Signed-off-by: Rong Chen <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09tmpfs: disallow CONFIG_TMPFS_INODE64 on alphaSeth Forshee1-1/+1
As with s390, alpha is a 64-bit architecture with a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, whereas passing "inode64" in the mount options will fail. This leads to erroneous behaviours such as this: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. Prevent CONFIG_TMPFS_INODE64 from being selected on alpha. Link: https://lkml.kernel.org/r/[email protected] Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Chris Down <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: <[email protected]> [5.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09tmpfs: disallow CONFIG_TMPFS_INODE64 on s390Seth Forshee1-1/+1
Currently there is an assumption in tmpfs that 64-bit architectures also have a 64-bit ino_t. This is not true on s390 which has a 32-bit ino_t. With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and display "inode64" in the mount options, but passing the "inode64" mount option will fail. This leads to the following behavior: # mkdir mnt # mount -t tmpfs nodev mnt # mount -o remount,rw mnt mount: /home/ubuntu/mnt: mount point not mounted or bad option. As mount sees "inode64" in the mount options and thus passes it in the options for the remount. So prevent CONFIG_TMPFS_INODE64 from being selected on s390. Link: https://lkml.kernel.org/r/[email protected] Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb") Signed-off-by: Seth Forshee <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Chris Down <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Amir Goldstein <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: <[email protected]> [5.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09mm/mremap: fix BUILD_BUG_ON() error in get_extentArnd Bergmann1-2/+3
clang can't evaluate this function argument at compile time when the function is not inlined, which leads to a link time failure: ld.lld: error: undefined symbol: __compiletime_assert_414 >>> referenced by mremap.c >>> mremap.o:(get_extent) in archive mm/built-in.a Mark the function as __always_inline to avoid it. Link: https://lkml.kernel.org/r/[email protected] Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place") Signed-off-by: Arnd Bergmann <[email protected]> Tested-by: Nick Desaulniers <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Tested-by: Sedat Dilek <[email protected]> Cc: Kirill A. Shutemov" <[email protected]> Cc: Wei Yang <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Brian Geffon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09firmware_loader: align .builtin_fw to 8Fangrui Song1-1/+1
arm64 references the start address of .builtin_fw (__start_builtin_fw) with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC relocations. The compiler is allowed to emit the R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in include/linux/firmware.h is 8-byte aligned. The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a multiple of 8, which may not be the case if .builtin_fw is empty. Unconditionally align .builtin_fw to fix the linker error. 32-bit architectures could use ALIGN(4) but that would add unnecessary complexity, so just use ALIGN(8). Link: https://lkml.kernel.org/r/[email protected] Link: https://github.com/ClangBuiltLinux/linux/issues/1204 Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image") Signed-off-by: Fangrui Song <[email protected]> Reported-by: kernel test robot <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Tested-by: Nick Desaulniers <[email protected]> Tested-by: Douglas Anderson <[email protected]> Acked-by: Nathan Chancellor <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09kasan: fix stack traces dependency for HW_TAGSAndrey Konovalov2-8/+3
Currently, whether the alloc/free stack traces collection is enabled by default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL. The intention for this dependency was to only enable collection on slow debug kernels due to a significant perf and memory impact. As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and is enabled on many productions kernels including Android and Ubuntu. As the result, this dependency is pointless and only complicates the code and documentation. Having stack traces collection disabled by default would make the hardware mode work differently to to the software ones, which is confusing. This change removes the dependency and enables stack traces collection by default. Looking into the future, this default might makes sense for production kernels, assuming we implement a fast stack trace collection approach. Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Marco Elver <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Will Deacon <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Peter Collingbourne <[email protected]> Cc: Evgenii Stepanov <[email protected]> Cc: Branislav Rankov <[email protected]> Cc: Kevin Brodsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09squashfs: add more sanity checks in xattr id lookupPhillip Lougher1-9/+57
Sysbot has reported a warning where a kmalloc() attempt exceeds the maximum limit. This has been identified as corruption of the xattr_ids count when reading the xattr id lookup table. This patch adds a number of additional sanity checks to detect this corruption and others. 1. It checks for a corrupted xattr index read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This would cause an out of bounds read. 2. It checks against corruption of the xattr_ids count. This can either lead to the above kmalloc failure, or a smaller than expected table to be read. 3. It checks the contents of the index table for corruption. [[email protected]: fix checkpatch issue] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Phillip Lougher <[email protected]> Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09squashfs: add more sanity checks in inode lookupPhillip Lougher1-8/+33
Sysbot has reported an "slab-out-of-bounds read" error which has been identified as being caused by a corrupted "ino_num" value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the inodes count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large inodes count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. [[email protected]: fix checkpatch issue] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Phillip Lougher <[email protected]> Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09squashfs: add more sanity checks in id lookupPhillip Lougher4-12/+45
Sysbot has reported a number of "slab-out-of-bounds reads" and "use-after-free read" errors which has been identified as being caused by a corrupted index value read from the inode. This could be because the metadata block is uncompressed, or because the "compression" bit has been corrupted (turning a compressed block into an uncompressed block). This patch adds additional sanity checks to detect this, and the following corruption. 1. It checks against corruption of the ids count. This can either lead to a larger table to be read, or a smaller than expected table to be read. In the case of a too large ids count, this would often have been trapped by the existing sanity checks, but this patch introduces a more exact check, which can identify too small values. 2. It checks the contents of the index table for corruption. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Phillip Lougher <[email protected]> Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09squashfs: avoid out of bounds writes in decompressorsPhillip Lougher1-1/+7
Patch series "Squashfs: fix BIO migration regression and add sanity checks". Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block usage to BIO" patch, which has produced a number of Sysbot/Syzkaller reports. Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption issues which have produced Sysbot reports in the id, inode and xattr lookup code. Each patch has been tested against the Sysbot reproducers using the given kernel configuration. They have the appropriate "Reported-by:" lines added. Additionally, all of the reproducer filesystems are indirectly fixed by patch [4/4] due to the fact they all have xattr corruption which is now detected there. Additional testing with other configurations and architectures (32bit, big endian), and normal filesystems has also been done to trap any inadvertent regressions caused by the additional sanity checks. This patch (of 4): This is a regression introduced by the patch "migrate from ll_rw_block usage to BIO". Sysbot/Syskaller has reported a number of "out of bounds writes" and "unable to handle kernel paging request in squashfs_decompress" errors which have been identified as a regression introduced by the above patch. Specifically, the patch removed the following sanity check if (length < 0 || length > output->length || (index + length) > msblk->bytes_used) This check did two things: 1. It ensured any reads were not beyond the end of the filesystem 2. It ensured that the "length" field read from the filesystem was within the expected maximum length. Without this any corrupted values can over-run allocated buffers. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO") Reported-by: [email protected] Signed-off-by: Phillip Lougher <[email protected]> Cc: Philippe Liard <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-02-09Merge tag 'i3c/fixes-for-5.11' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c fix from Alexandre Belloni: "A single build warning fix" * tag 'i3c/fixes-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c/master/mipi-i3c-hci: Fix position of __maybe_unused in i3c_hci_of_match
2021-02-10bpf: Fix 32 bit src register truncation on div/modDaniel Borkmann1-15/+13
While reviewing a different fix, John and I noticed an oddity in one of the BPF program dumps that stood out, for example: # bpftool p d x i 13 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 [...] In line 2 we noticed that the mov32 would 32 bit truncate the original src register for the div/mod operation. While for the two operations the dst register is typically marked unknown e.g. from adjust_scalar_min_max_vals() the src register is not, and thus verifier keeps tracking original bounds, simplified: 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = -1 1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b7) r1 = -1 2: R0_w=invP-1 R1_w=invP-1 R10=fp0 2: (3c) w0 /= w1 3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0 3: (77) r1 >>= 32 4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0 4: (bf) r0 = r1 5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0 5: (95) exit processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 Runtime result of r0 at exit is 0 instead of expected -1. Remove the verifier mov32 src rewrite in div/mod and replace it with a jmp32 test instead. After the fix, we result in the following code generation when having dividend r1 and divisor r6: div, 64 bit: div, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2 3: (ac) w1 ^= w1 3: (ac) w1 ^= w1 4: (05) goto pc+1 4: (05) goto pc+1 5: (3f) r1 /= r6 5: (3c) w1 /= w6 6: (b7) r0 = 0 6: (b7) r0 = 0 7: (95) exit 7: (95) exit mod, 64 bit: mod, 32 bit: 0: (b7) r6 = 8 0: (b7) r6 = 8 1: (b7) r1 = 8 1: (b7) r1 = 8 2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1 3: (9f) r1 %= r6 3: (9c) w1 %= w6 4: (b7) r0 = 0 4: (b7) r0 = 0 5: (95) exit 5: (95) exit x86 in particular can throw a 'divide error' exception for div instruction not only for divisor being zero, but also for the case when the quotient is too large for the designated register. For the edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT since we always zero edx (rdx). Hence really the only protection needed is against divisor being zero. Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero") Co-developed-by: John Fastabend <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-02-10bpf: Fix verifier jmp32 pruning decision logicDaniel Borkmann1-1/+5
Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes: func#0 @0 0: R1=ctx(id=0,off=0,imm=0) R10=fp0 0: (b7) r0 = 808464450 1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0 1: (b4) w4 = 808464432 2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0 2: (9c) w4 %= w0 3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 3: (66) if w4 s> 0x30303030 goto pc+0 R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit propagating r0 from 6 to 7: safe 4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 4: (7f) r0 >>= r0 5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0 5: (9c) w4 %= w0 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 propagating r0 7: safe propagating r0 from 6 to 7: safe processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 The underlying program was xlated as follows: # bpftool p d x i 10 0: (b7) r0 = 808464450 1: (b4) w4 = 808464432 2: (bc) w0 = w0 3: (15) if r0 == 0x0 goto pc+1 4: (9c) w4 %= w0 5: (66) if w4 s> 0x30303030 goto pc+0 6: (7f) r0 >>= r0 7: (bc) w0 = w0 8: (15) if r0 == 0x0 goto pc+1 9: (9c) w4 %= w0 10: (66) if w0 s> 0x3030 goto pc+0 11: (d6) if w0 s<= 0x303030 goto pc+1 12: (05) goto pc-1 13: (95) exit The verifier rewrote original instructions it recognized as dead code with 'goto pc-1', but reality differs from verifier simulation in that we are actually able to trigger a hang due to hitting the 'goto pc-1' instructions. Taking a closer look at the verifier analysis, the reason is that it misjudges its pruning decision at the first 'from 6 to 7: safe' occasion. What happens is that while both old/cur registers are marked as precise, they get misjudged for the jmp32 case as range_within() yields true, meaning that the prior verification path with a wider register bound could be verified successfully and therefore the current path with a narrower register bound is deemed safe as well whereas in reality it's not. R0 old/cur path's bounds compare as follows: old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff) cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff) old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff The 64 bit bounds generally look okay and while the information that got propagated from 32 to 64 bit looks correct as well, it's not precise enough for judging a conditional jmp32. Given the latter only operates on subregisters we also need to take these into account as well for a range_within() probe in order to be able to prune paths. Extending the range_within() constraint to both bounds will be able to tell us that the old signed 32 bit bounds are not wider than the cur signed 32 bit bounds. With the fix in place, the program will now verify the 'goto' branch case as it should have been: [...] 6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 6: (66) if w0 s> 0x3030 goto pc+0 R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0 9: (95) exit 7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 7: (d6) if w0 s<= 0x303030 goto pc+1 R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0 8: (30) r0 = *(u8 *)skb[808464432] BPF_LD_[ABS|IND] uses reserved fields processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1 The bug is quite subtle in the sense that when verifier would determine that a given branch is dead code, it would (here: wrongly) remove these instructions from the program and hard-wire the taken branch for privileged programs instead of the 'goto pc-1' rewrites which will cause hard to debug problems. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Reported-by: Anatoly Trosinenko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-02-10bpf: Fix verifier jsgt branch analysis on max boundDaniel Borkmann1-2/+2
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return code for both will tell the caller whether a given conditional jump is taken or not, e.g. 1 means branch will be taken [for the involved registers] and the goto target will be executed, 0 means branch will not be taken and instead we fall-through to the next insn, and last but not least a -1 denotes that it is not known at verification time whether a branch will be taken or not. Now while the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval since the branch will also be taken for reg->s32_max_value == sval. The jgt branch analysis, for example, gets this right. Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking") Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis") Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>