| Age | Commit message (Collapse) | Author | Files | Lines |
|
The stack_depot member was added without kernel-doc, leading to below
warning. Fix it.
./include/drm/drm_modeset_lock.h:74: warning: Function parameter or
member 'stack_depot' not described in 'drm_modeset_acquire_ctx'
Reported-by: Stephen Rothwell <[email protected]>
Fixes: cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks without backoff")
Signed-off-by: Jani Nikula <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Tested-by: Stephen Rothwell <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Pull ceph updates from Ilya Dryomov:
"The highlight is the new mount "device" string syntax implemented by
Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there is a couple of fixes for edge cases in quota and
a new mount option for turning on unbuffered I/O mode globally instead
of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"
* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
ceph: move CEPH_SUPER_MAGIC definition to magic.h
ceph: remove redundant Lsx caps check
ceph: add new "nopagecache" option
ceph: don't check for quotas on MDS stray dirs
ceph: drop send metrics debug message
rbd: make const pointer spaces a static const array
ceph: Fix incorrect statfs report for small quota
ceph: mount syntax module parameter
doc: document new CephFS mount device syntax
ceph: record updated mon_addr on remount
ceph: new device mount syntax
libceph: rename parse_fsid() to ceph_parse_fsid() and export
libceph: generalize addr/ip parsing based on delimiter
|
|
The link extended sub-states are assigned as enum that is an integer
size but read from a union as u8, this is working for small values on
little endian systems but for big endian this always give 0. Fix the
variable in the union to match the enum size.
Fixes: ecc31c60240b ("ethtool: Add link extended state")
Signed-off-by: Moshe Tal <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Tested-by: Ido Schimmel <[email protected]>
Reviewed-by: Gal Pressman <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In one net namespace, after creating a packet socket without binding
it to a device, users in other net namespaces can observe the new
`packet_type` added by this packet socket by reading `/proc/net/ptype`
file. This is minor information leakage as packet socket is
namespace aware.
Add a net pointer in `packet_type` to keep the net namespace of
of corresponding packet socket. In `ptype_seq_show`, this net pointer
must be checked when it is not NULL.
Fixes: 2feb27dbe00c ("[NETNS]: Minor information leak via /proc/net/ptype file.")
Signed-off-by: Congyu Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After using io_stop_wc(), drivers reports following compile error when
compiled on X86.
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: In function ‘hns3_tx_push_bd’:
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:2058:12: error: expected ‘;’ before ‘(’ token
io_stop_wc();
^
It is because I missed to add the brackets after io_stop_wc macro. So
let's add the missing brackets.
Fixes: d5624bb29f49 ("asm-generic: introduce io_stop_wc() and add implementation for ARM64")
Reported-by: Guangbin Huang <[email protected]>
Signed-off-by: Xiongfeng Wang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Catalin Marinas <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf.
Quite a handful of old regression fixes but most of those are
pre-5.16.
Current release - regressions:
- fix memory leaks in the skb free deferral scheme if upper layer
protocols are used, i.e. in-kernel TCP readers like TLS
Current release - new code bugs:
- nf_tables: fix NULL check typo in _clone() functions
- change the default to y for Vertexcom vendor Kconfig
- a couple of fixes to incorrect uses of ref tracking
- two fixes for constifying netdev->dev_addr
Previous releases - regressions:
- bpf:
- various verifier fixes mainly around register offset handling
when passed to helper functions
- fix mount source displayed for bpffs (none -> bpffs)
- bonding:
- fix extraction of ports for connection hash calculation
- fix bond_xmit_broadcast return value when some devices are down
- phy: marvell: add Marvell specific PHY loopback
- sch_api: don't skip qdisc attach on ingress, prevent ref leak
- htb: restore minimal packet size handling in rate control
- sfp: fix high power modules without diagnostic monitoring
- mscc: ocelot:
- don't let phylink re-enable TX PAUSE on the NPI port
- don't dereference NULL pointers with shared tc filters
- smsc95xx: correct reset handling for LAN9514
- cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
- phy: micrel: use kszphy_suspend/_resume for irq aware devices,
avoid races with the interrupt
Previous releases - always broken:
- xdp: check prog type before updating BPF link
- smc: resolve various races around abnormal connection termination
- sit: allow encapsulated IPv6 traffic to be delivered locally
- axienet: fix init/reset handling, add missing barriers, read the
right status words, stop queues correctly
- add missing dev_put() in sock_timestamping_bind_phc()
Misc:
- ipv4: prevent accidentally passing RTO_ONLINK to
ip_route_output_key_hash() by sanitizing flags
- ipv4: avoid quadratic behavior in netns dismantle
- stmmac: dwmac-oxnas: add support for OX810SE
- fsl: xgmac_mdio: add workaround for erratum A-009885"
* tag 'net-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (92 commits)
ipv4: add net_hash_mix() dispersion to fib_info_laddrhash keys
ipv4: avoid quadratic behavior in netns dismantle
net/fsl: xgmac_mdio: Fix incorrect iounmap when removing module
powerpc/fsl/dts: Enable WA for erratum A-009885 on fman3l MDIO buses
dt-bindings: net: Document fsl,erratum-a009885
net/fsl: xgmac_mdio: Add workaround for erratum A-009885
net: mscc: ocelot: fix using match before it is set
net: phy: micrel: use kszphy_suspend()/kszphy_resume for irq aware devices
net: cpsw: avoid alignment faults by taking NET_IP_ALIGN into account
nfc: llcp: fix NULL error pointer dereference on sendmsg() after failed bind()
net: axienet: increase default TX ring size to 128
net: axienet: fix for TX busy handling
net: axienet: fix number of TX ring slots for available check
net: axienet: Fix TX ring slot available check
net: axienet: limit minimum TX ring size
net: axienet: add missing memory barriers
net: axienet: reset core on initialization prior to MDIO access
net: axienet: Wait for PhyRstCmplt after core reset
net: axienet: increase reset timeout
bpf, selftests: Add ringbuf memory type confusion test
...
|
|
Merge more updates from Andrew Morton:
"55 patches.
Subsystems affected by this patch series: percpu, procfs, sysctl,
misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"
* emailed patches from Andrew Morton <[email protected]>: (55 commits)
lib: remove redundant assignment to variable ret
ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
btrfs: use generic Kconfig option for 256kB page size limit
arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
configs: introduce debug.config for CI-like setup
delayacct: track delays from memory compact
Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
delayacct: cleanup flags in struct task_delay_info and functions use it
delayacct: fix incomplete disable operation when switch enable to disable
delayacct: support swapin delay accounting for swapping without blkio
panic: remove oops_id
panic: use error_report_end tracepoint on warnings
fs/adfs: remove unneeded variable make code cleaner
FAT: use io_schedule_timeout() instead of congestion_wait()
hfsplus: use struct_group_attr() for memcpy() region
nilfs2: remove redundant pointer sbufs
fs/binfmt_elf: use PT_LOAD p_align values for static PIE
const_structs.checkpatch: add frequently used ops structs
...
|
|
Delay accounting does not track the delay of memory compact. When there
is not enough free memory, tasks can spend a amount of their time
waiting for compact.
To get the impact of tasks in direct memory compact, measure the delay
when allocating memory through memory compact.
Also update tools/accounting/getdelays.c:
/ # ./getdelays_next -di -p 304
print delayacct stats ON
printing IO accounting
PID 304
CPU count real total virtual total delay total delay average
277 780000000 849039485 18877296 0.068ms
IO count delay total delay average
0 0 0ms
SWAP count delay total delay average
0 0 0ms
RECLAIM count delay total delay average
5 11088812685 2217ms
THRASHING count delay total delay average
0 0 0ms
COMPACT count delay total delay average
3 72758 0ms
watch: read=0, write=0, cancelled_write=0
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: wangyong <[email protected]>
Reviewed-by: Jiang Xuexin <[email protected]>
Reviewed-by: Zhang Wenya <[email protected]>
Reviewed-by: Yang Yang <[email protected]>
Reviewed-by: Balbir Singh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Flags in struct task_delay_info is used to distinguish the difference
between swapin and blkio delay acountings. But after patch "delayacct:
support swapin delay accounting for swapping without blkio", there is no
need to do that since swapin and blkio delay accounting use their own
functions.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zeal Robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a task is created after delayacct is enabled, kernel will do all
the delay accountings for that task. The problems is if user disables
delayacct by set /proc/sys/kernel/task_delayacct to zero, only blkio
delay accounting is disabled.
Now disable all the kinds of delay accountings when
/proc/sys/kernel/task_delayacct sets to zero.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently delayacct accounts swapin delay only for swapping that cause
blkio. If we use zram for swapping, tools/accounting/getdelays can't
get any SWAP delay.
It's useful to get zram swapin delay information, for example to adjust
compress algorithm or /proc/sys/vm/swappiness.
Reference to PSI, it accounts any kind of swapping by doing its work in
swap_readpage(), no matter whether swapping causes blkio. Let delayacct
do the similar work.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yang <[email protected]>
Reported-by: Zeal Robot <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce the error detector "warning" to the error_report event and use
the error_report_end tracepoint at the end of a warning report.
This allows in-kernel tests but also userspace to more easily determine
if a warning occurred without polling kernel logs.
[[email protected]: add comma to enum list, per Andy]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Marco Elver <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Wei Liu <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: John Ogness <[email protected]>
Cc: Andy Shevchenko <[email protected]>
Cc: Alexander Popov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Remove licence boilerplate text from the UAPI header.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Discourage people from using UAPI header in new code by adding a note.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Reviewed-by: Brendan Higgins <[email protected]>
Tested-by: Brendan Higgins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "test_hash.c: refactor into KUnit", v3.
We refactored the lib/test_hash.c file into KUnit as part of the student
group LKCAMP [1] introductory hackathon for kernel development.
This test was pointed to our group by Daniel Latypov [2], so its full
conversion into a pure KUnit test was our goal in this patch series, but
we ran into many problems relating to it not being split as unit tests,
which complicated matters a bit, as the reasoning behind the original
tests is quite cryptic for those unfamiliar with hash implementations.
Some interesting developments we'd like to highlight are:
- In patch 1/5 we noticed that there was an unused define directive
that could be removed.
- In patch 4/5 we noticed how stringhash and hash tests are all under
the lib/test_hash.c file, which might cause some confusion, and we
also broke those kernel config entries up.
Overall KUnit developments have been made in the other patches in this
series:
In patches 2/5, 3/5 and 5/5 we refactored the lib/test_hash.c file so as
to make it more compatible with the KUnit style, whilst preserving the
original idea of the maintainer who designed it (i.e. George Spelvin),
which might be undesirable for unit tests, but we assume it is enough
for a first patch.
This patch (of 5):
Currently, there exist hash_32() and __hash_32() functions, which were
introduced in a patch [1] targeting architecture specific optimizations.
These functions can be overridden on a per-architecture basis to achieve
such optimizations. They must set their corresponding define directive
(HAVE_ARCH_HASH_32 and HAVE_ARCH__HASH_32, respectively) so that header
files can deal with these overrides properly.
As the supported 32-bit architectures that have their own hash function
implementation (i.e. m68k, Microblaze, H8/300, pa-risc) have only been
making use of the (more general) __hash_32() function (which only lacks
a right shift operation when compared to the hash_32() function), remove
the define directive corresponding to the arch-specific hash_32()
implementation.
[1] https://lore.kernel.org/lkml/[email protected]/
[[email protected]: hash_32_generic() becomes hash_32()]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reviewed-by: David Gow <[email protected]>
Tested-by: David Gow <[email protected]>
Co-developed-by: Augusto Durães Camargo <[email protected]>
Signed-off-by: Augusto Durães Camargo <[email protected]>
Co-developed-by: Enzo Ferreira <[email protected]>
Signed-off-by: Enzo Ferreira <[email protected]>
Signed-off-by: Isabella Basso <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: Daniel Latypov <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Rodrigo Siqueira <[email protected]>
Cc: kernel test robot <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Introduce list_is_head() in the similar (*) way as it's done for
list_entry_is_head(). Make use of it in the list.h.
*) it's done as inliner and not a macro to be aligned with other
list_is_*() APIs; while at it, make all three to have the same
style.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Cc: Heikki Krogerus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When I was implementing a new per-cpu kthread cfs_migration, I found the
comm of it "cfs_migration/%u" is truncated due to the limitation of
TASK_COMM_LEN. For example, the comm of the percpu thread on CPU10~19
all have the same name "cfs_migration/1", which will confuse the user.
This issue is not critical, because we can get the corresponding CPU
from the task's Cpus_allowed. But for kthreads corresponding to other
hardware devices, it is not easy to get the detailed device info from
task comm, for example,
jbd2/nvme0n1p2-
xfs-reclaim/sdf
Currently there are so many truncated kthreads:
rcu_tasks_kthre
rcu_tasks_rude_
rcu_tasks_trace
poll_mpt3sas0_s
ext4-rsv-conver
xfs-reclaim/sd{a, b, c, ...}
xfs-blockgc/sd{a, b, c, ...}
xfs-inodegc/sd{a, b, c, ...}
audit_send_repl
ecryptfs-kthrea
vfio-irqfd-clea
jbd2/nvme0n1p2-
...
We can shorten these names to work around this problem, but it may be
not applied to all of the truncated kthreads. Take 'jbd2/nvme0n1p2-'
for example, it is a nice name, and it is not a good idea to shorten it.
One possible way to fix this issue is extending the task comm size, but
as task->comm is used in lots of places, that may cause some potential
buffer overflows. Another more conservative approach is introducing a
new pointer to store kthread's full name if it is truncated, which won't
introduce too much overhead as it is in the non-critical path. Finally
we make a dicision to use the second approach. See also the discussions
in this thread:
https://lore.kernel.org/lkml/[email protected]/
After this change, the full name of these truncated kthreads will be
displayed via /proc/[pid]/comm:
rcu_tasks_kthread
rcu_tasks_rude_kthread
rcu_tasks_trace_kthread
poll_mpt3sas0_statu
ext4-rsv-conversion
xfs-reclaim/sdf1
xfs-blockgc/sdf1
xfs-inodegc/sdf1
audit_send_reply
ecryptfs-kthread
vfio-irqfd-cleanup
jbd2/nvme0n1p2-8
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Suggested-by: Petr Mladek <[email protected]>
Suggested-by: Steven Rostedt <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
As the sched:sched_switch tracepoint args are derived from the kernel,
we'd better make it same with the kernel. So the macro TASK_COMM_LEN is
converted to type enum, then all the BPF programs can get it through
BTF.
The BPF program which wants to use TASK_COMM_LEN should include the
header vmlinux.h. Regarding the test_stacktrace_map and
test_tracepoint, as the type defined in linux/bpf.h are also defined in
vmlinux.h, so we don't need to include linux/bpf.h again.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yafang Shao <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is better to use get_task_comm() instead of the open coded string
copy as we do in other places.
struct elf_prpsinfo is used to dump the task information in userspace
coredump or kernel vmcore. Below is the verification of vmcore,
crash> ps
PID PPID CPU TASK ST %MEM VSZ RSS COMM
0 0 0 ffffffff9d21a940 RU 0.0 0 0 [swapper/0]
> 0 0 1 ffffa09e40f85e80 RU 0.0 0 0 [swapper/1]
> 0 0 2 ffffa09e40f81f80 RU 0.0 0 0 [swapper/2]
> 0 0 3 ffffa09e40f83f00 RU 0.0 0 0 [swapper/3]
> 0 0 4 ffffa09e40f80000 RU 0.0 0 0 [swapper/4]
> 0 0 5 ffffa09e40f89f80 RU 0.0 0 0 [swapper/5]
0 0 6 ffffa09e40f8bf00 RU 0.0 0 0 [swapper/6]
> 0 0 7 ffffa09e40f88000 RU 0.0 0 0 [swapper/7]
> 0 0 8 ffffa09e40f8de80 RU 0.0 0 0 [swapper/8]
> 0 0 9 ffffa09e40f95e80 RU 0.0 0 0 [swapper/9]
> 0 0 10 ffffa09e40f91f80 RU 0.0 0 0 [swapper/10]
> 0 0 11 ffffa09e40f93f00 RU 0.0 0 0 [swapper/11]
> 0 0 12 ffffa09e40f90000 RU 0.0 0 0 [swapper/12]
> 0 0 13 ffffa09e40f9bf00 RU 0.0 0 0 [swapper/13]
> 0 0 14 ffffa09e40f98000 RU 0.0 0 0 [swapper/14]
> 0 0 15 ffffa09e40f9de80 RU 0.0 0 0 [swapper/15]
It works well as expected.
Some comments are added to explain why we use the hard-coded 16.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Yafang Shao <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Michal Miroslaw <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Dennis Dalessandro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Include a note at the top to discourage people from including it in
headers.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When kernel.h is used in the headers it adds a lot into dependency hell,
especially when there are circular dependencies are involved.
Replace kernel.h inclusion with the list of what is really being used.
The rest of the changes are induced by the above and may not be split.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Arend van Spriel <[email protected]> [brcmfmac]
Acked-by: Kalle Valo <[email protected]>
Cc: Arend van Spriel <[email protected]>
Cc: Franky Lin <[email protected]>
Cc: Hante Meuleman <[email protected]>
Cc: Chi-hsien Lin <[email protected]>
Cc: Wright Feng <[email protected]>
Cc: Chung-hsien Hsu <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Heikki Krogerus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change the proc_create[_data]() stubs which are used when CONFIG_PROC_FS
is not set from #defines to a static inline stubs.
This should fix clang -Werror builds failing due to errors like this:
drivers/platform/x86/thinkpad_acpi.c:918:30: error: unused variable
'dispatch_proc_ops' [-Werror,-Wunused-const-variable]
Fixing this in include/linux/proc_fs.h should ensure that the same issue
is also fixed in any other drivers hitting the same -Werror issue.
[[email protected]: fix CONFIG_PROC_FS=n]
[[email protected]: fix arch/sparc/kernel/led.c]
[[email protected]: fix build]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
Reported-by: kernel test robot <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: David Howells <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte, this patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but it is overridden on x86, which has its own
implementation.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the previous patch, we could add a generic pcpu first chunk
allocate and free function to cleanup the duplicated definations on each
architecture.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu
first chunk allocation will call it to alloc memblock on the
corresponding node by it, this is prepare for the next patch.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kefeng Wang <[email protected]>
Cc: Thomas Bogendoerfer <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Dennis Zhou <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patchset allows to have a single kernel for sv39 and sv48 without
being relocatable.
The idea comes from Arnd Bergmann who suggested to do the same as x86,
that is mapping the kernel to the end of the address space, which allows
the kernel to be linked at the same address for both sv39 and sv48 and
then does not require to be relocated at runtime.
This implements sv48 support at runtime. The kernel will try to boot
with 4-level page table and will fallback to 3-level if the HW does not
support it. Folding the 4th level into a 3-level page table has almost
no cost at runtime.
Note that kasan region had to be moved to the end of the address space
since its location must be known at compile-time and then be valid for
both sv39 and sv48 (and sv57 that is coming).
* riscv-sv48-v3:
riscv: Explicit comment about user virtual address space size
riscv: Use pgtable_l4_enabled to output mmu_type in cpuinfo
riscv: Implement sv48 support
asm-generic: Prepare for riscv use of pud_alloc_one and pud_free
riscv: Allow to dynamically define VA_BITS
riscv: Introduce functions to switch pt_ops
riscv: Split early kasan mapping to prepare sv48 introduction
riscv: Move KASAN mapping next to the kernel mapping
riscv: Get rid of MAXPHYSMEM configs
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
In the following commits, riscv will almost use the generic versions of
pud_alloc_one and pud_free but an additional check is required since those
functions are only relevant when using at least a 4-level page table, which
will be determined at runtime on riscv.
So move the content of those functions into other functions that riscv
can use without duplicating code.
Signed-off-by: Alexandre Ghiti <[email protected]>
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
The helpers continue to use int for retval because all the hooks
are int-returning rather than long-returning. The return value of
bpf_set_retval is int for future-proofing, in case in the future
there may be errors trying to set the retval.
After the previous patch, if a program rejects a syscall by
returning 0, an -EPERM will be generated no matter if the retval
is already set to -err. This patch change it being forced only if
retval is not -err. This is because we want to support, for
example, invoking bpf_set_retval(-EINVAL) and return 0, and have
the syscall return value be -EINVAL not -EPERM.
For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is
that, if the return value is NET_XMIT_DROP, the packet is silently
dropped. We preserve this behavior for backward compatibility
reasons, so even if an errno is set, the errno does not return to
caller. However, setting a non-err to retval cannot propagate so
this is not allowed and we return a -EFAULT in that case.
Signed-off-by: YiFei Zhu <[email protected]>
Reviewed-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The retval value is moved to struct bpf_cg_run_ctx for ease of access
in different prog types with different context structs layouts. The
helper implementation (to be added in a later patch in the series) can
simply perform a container_of from current->bpf_ctx to retrieve
bpf_cg_run_ctx.
Unfortunately, there is no easy way to access the current task_struct
via the verifier BPF bytecode rewrite, aside from possibly calling a
helper, so a pointer to current task is added to struct bpf_sockopt_kern
so that the rewritten BPF bytecode can access struct bpf_cg_run_ctx with
an indirection.
For backward compatibility, if a getsockopt program rejects a syscall
by returning 0, an -EPERM will be generated, by having the
BPF_PROG_RUN_ARRAY_CG family macros automatically set the retval to
-EPERM. Unlike prior to this patch, this -EPERM will be visible to
ctx->retval for any other hooks down the line in the prog array.
Additionally, the restriction that getsockopt filters can only set
the retval to 0 is removed, considering that certain getsockopt
implementations may return optlen. Filters are now able to set the
value arbitrarily.
Signed-off-by: YiFei Zhu <[email protected]>
Reviewed-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/73b0325f5c29912ccea7ea57ec1ed4d388fc1d37.1639619851.git.zhuyifei@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Right now BPF_PROG_RUN_ARRAY and related macros return 1 or 0
for whether the prog array allows or rejects whatever is being
hooked. The caller of these macros then return -EPERM or continue
processing based on thw macro's return value. Unforunately this is
inflexible, since -EPERM is the only err that can be returned.
This patch should be a no-op; it prepares for the next patch. The
returning of the -EPERM is moved to inside the macros, so the outer
functions are directly returning what the macros returned if they
are non-zero.
Signed-off-by: YiFei Zhu <[email protected]>
Reviewed-by: Stanislav Fomichev <[email protected]>
Link: https://lore.kernel.org/r/788abcdca55886d1f43274c918eaa9f792a9f33b.1639619851.git.zhuyifei@google.com
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Both description and returns section will become mandatory
for helpers and syscalls in a later commit to generate man pages.
This commit also adds in the documentation that BPF_PROG_RUN is
an alias for BPF_PROG_TEST_RUN for anyone searching for the
syscall in the generated man pages.
Signed-off-by: Usama Arif <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Provide a helper macro to register platform DRM drivers. The new
macro behaves like module_platform_driver() with an additional
test if DRM modesetting has been enabled.
Signed-off-by: Javier Martinez Canillas <[email protected]>
Acked-by: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Provide helper macros to register PCI-based DRM drivers. The new
macros behave like module_pci_driver() with an additional test if
DRM modesetting has been enabled.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Move the seemingly generic block_vcpu_list from kvm_vcpu to vcpu_vmx, and
rename the list and all associated variables to clarify that it tracks
the set of vCPU that need to be poked on a posted interrupt to the wakeup
vector. The list is not used to track _all_ vCPUs that are blocking, and
the term "blocked" can be misleading as it may refer to a blocking
condition in the host or the guest, where as the PI wakeup case is
specifically for the vCPUs that are actively blocking from within the
guest.
No functional change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Remove kvm_vcpu.pre_pcpu as it no longer has any users. No functional
change intended.
Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maxim Levitsky <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
Bring in fix for VT-d posted interrupts before further changing the code in 5.17.
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've tried to address some performance issues in
f2fs_checkpoint and direct IO flows. Also, there was a work to enhance
the page cache management used for compression. Other than them, we've
done typical work including sysfs, code clean-ups, tracepoint, sanity
check, in addition to bug fixes on corner cases.
Enhancements:
- use iomap for direct IO
- try to avoid lock contention to improve f2fs_ckpt speed
- avoid unnecessary memory allocation in compression flow
- POSIX_FADV_DONTNEED drops the page cache containing compression
pages
- add some sysfs entries (gc_urgent_high_remaining, pending_discard)
Bug fixes:
- try not to expose unwritten blocks to user by DIO (this was added
to avoid merge conflict; another patch is coming to address other
missing case)
- relax minor error condition for file pinning feature used in
Android OTA
- fix potential deadlock case in compression flow
- should not truncate any block on pinned file
In addition, we've done some code clean-ups and tracepoint/sanity
check improvement"
* tag 'f2fs-for-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (29 commits)
f2fs: do not allow partial truncation on pinned file
f2fs: remove redunant invalidate compress pages
f2fs: Simplify bool conversion
f2fs: don't drop compressed page cache in .{invalidate,release}page
f2fs: fix to reserve space for IO align feature
f2fs: fix to check available space of CP area correctly in update_ckpt_flags()
f2fs: support fault injection to f2fs_trylock_op()
f2fs: clean up __find_inline_xattr() with __find_xattr()
f2fs: fix to do sanity check on last xattr entry in __f2fs_setxattr()
f2fs: do not bother checkpoint by f2fs_get_node_info
f2fs: avoid down_write on nat_tree_lock during checkpoint
f2fs: compress: fix potential deadlock of compress file
f2fs: avoid EINVAL by SBI_NEED_FSCK when pinning a file
f2fs: add gc_urgent_high_remaining sysfs node
f2fs: fix to do sanity check in is_alive()
f2fs: fix to avoid panic in is_alive() if metadata is inconsistent
f2fs: fix to do sanity check on inode type during garbage collection
f2fs: avoid duplicate call of mark_inode_dirty
f2fs: show number of pending discard commands
f2fs: support POSIX_FADV_DONTNEED drop compressed page cache
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add new kconfig target 'make mod2noconfig', which will be useful to
speed up the build and test iteration.
- Raise the minimum supported version of LLVM to 11.0.0
- Refactor certs/Makefile
- Change the format of include/config/auto.conf to stop double-quoting
string type CONFIG options.
- Fix ARCH=sh builds in dash
- Separate compression macros for general purposes (cmd_bzip2 etc.) and
the ones for decompressors (cmd_bzip2_with_size etc.)
- Misc Makefile cleanups
* tag 'kbuild-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: add cmd_file_size
arch: decompressor: remove useless vmlinux.bin.all-y
kbuild: rename cmd_{bzip2,lzma,lzo,lz4,xzkern,zstd22}
kbuild: drop $(size_append) from cmd_zstd
sh: rename suffix-y to suffix_y
doc: kbuild: fix default in `imply` table
microblaze: use built-in function to get CPU_{MAJOR,MINOR,REV}
certs: move scripts/extract-cert to certs/
kbuild: do not quote string values in include/config/auto.conf
kbuild: do not include include/config/auto.conf from shell scripts
certs: simplify $(srctree)/ handling and remove config_filename macro
kbuild: stop using config_filename in scripts/Makefile.modsign
certs: remove misleading comments about GCC PR
certs: refactor file cleaning
certs: remove unneeded -I$(srctree) option for system_certificates.o
certs: unify duplicated cmd_extract_certs and improve the log
certs: use $< and $@ to simplify the key generation rule
kbuild: remove headers_check stub
kbuild: move headers_check.pl to usr/include/
certs: use if_changed to re-generate the key when the key type is changed
...
|
|
Returning the exclusive fence separately is no longer used.
Instead add a write parameter to indicate the use case.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator fixes from Jason Donenfeld:
- Some Kconfig changes resulted in BIG_KEYS being unselectable, which
Justin sent a patch to fix.
- Geert pointed out that moving to BLAKE2s bloated vmlinux on little
machines, like m68k, so we now compensate for this.
- Numerous style and house cleaning fixes, meant to have a cleaner base
for future changes.
* 'random-5.17-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
random: simplify arithmetic function flow in account()
random: selectively clang-format where it makes sense
random: access input_pool_data directly rather than through pointer
random: cleanup fractional entropy shift constants
random: prepend remaining pool constants with POOL_
random: de-duplicate INPUT_POOL constants
random: remove unused OUTPUT_POOL constants
random: rather than entropy_store abstraction, use global
random: remove unused extract_entropy() reserved argument
random: remove incomplete last_data logic
random: cleanup integer types
random: cleanup poolinfo abstraction
random: fix typo in comments
lib/crypto: sha1: re-roll loops to reduce code size
lib/crypto: blake2s: move hmac construction into wireguard
lib/crypto: add prompts back to crypto libraries
|
|
Move the base i915 buddy allocator code into drm
- Move i915_buddy.h to include/drm
- Move i915_buddy.c to drm root folder
- Rename "i915" string with "drm" string wherever applicable
- Rename "I915" string with "DRM" string wherever applicable
- Fix header file dependencies
- Fix alignment issues
- add Makefile support for drm buddy
- export functions and write kerneldoc description
- Remove i915 selftest config check condition as buddy selftest
will be moved to drm selftest folder
cleanup i915 buddy references in i915 driver module
and replace with drm buddy
v2:
- include header file in alphabetical order(Thomas)
- merged changes listed in the body section into a single patch
to keep the build intact(Christian, Jani)
v3:
- make drm buddy a separate module(Thomas, Christian)
v4:
- Fix build error reported by kernel test robot <[email protected]>
- removed i915 buddy selftest from i915_mock_selftests.h to
avoid build error
- removed selftests/i915_buddy.c file as we create a new set of
buddy test cases in drm/selftests folder
v5:
- Fix merge conflict issue
v6:
- replace drm_buddy_mm structure name as drm_buddy(Thomas, Christian)
- replace drm_buddy_alloc() function name as drm_buddy_alloc_blocks()
(Thomas)
- replace drm_buddy_free() function name as drm_buddy_free_block()
(Thomas)
- export drm_buddy_free_block() function
- fix multiple instances of KMEM_CACHE() entry
v7:
- fix warnings reported by kernel test robot <[email protected]>
- modify the license(Christian)
v8:
- fix warnings reported by kernel test robot <[email protected]>
Signed-off-by: Arunpravin <[email protected]>
Acked-by: Christian König <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Signed-off-by: Christian König <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2022-01-19
We've added 12 non-merge commits during the last 8 day(s) which contain
a total of 12 files changed, 262 insertions(+), 64 deletions(-).
The main changes are:
1) Various verifier fixes mainly around register offset handling when
passed to helper functions, from Daniel Borkmann.
2) Fix XDP BPF link handling to assert program type,
from Toke Høiland-Jørgensen.
3) Fix regression in mount parameter handling for BPF fs,
from Yafang Shao.
4) Fix incorrect integer literal when marking scratched stack slots
in verifier, from Christy Lee.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, selftests: Add ringbuf memory type confusion test
bpf, selftests: Add various ringbuf tests with invalid offset
bpf: Fix ringbuf memory type confusion when passing to helpers
bpf: Fix out of bounds access for ringbuf helpers
bpf: Generally fix helper register offset check
bpf: Mark PTR_TO_FUNC register initially with zero offset
bpf: Generalize check_ctx_reg for reuse with other types
bpf: Fix incorrect integer literal used for marking scratched stack.
bpf/selftests: Add check for updating XDP bpf_link with wrong program type
bpf/selftests: convert xdp_link test to ASSERT_* macros
xdp: check prog type before updating BPF link
bpf: Fix mount source show for bpffs
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The bpf_ringbuf_submit() and bpf_ringbuf_discard() have ARG_PTR_TO_ALLOC_MEM
in their bpf_func_proto definition as their first argument, and thus both expect
the result from a prior bpf_ringbuf_reserve() call which has a return type of
RET_PTR_TO_ALLOC_MEM_OR_NULL.
While the non-NULL memory from bpf_ringbuf_reserve() can be passed to other
helpers, the two sinks (bpf_ringbuf_submit(), bpf_ringbuf_discard()) right now
only enforce a register type of PTR_TO_MEM.
This can lead to potential type confusion since it would allow other PTR_TO_MEM
memory to be passed into the two sinks which did not come from bpf_ringbuf_reserve().
Add a new MEM_ALLOC composable type attribute for PTR_TO_MEM, and enforce that:
- bpf_ringbuf_reserve() returns NULL or PTR_TO_MEM | MEM_ALLOC
- bpf_ringbuf_submit() and bpf_ringbuf_discard() only take PTR_TO_MEM | MEM_ALLOC
but not plain PTR_TO_MEM arguments via ARG_PTR_TO_ALLOC_MEM
- however, other helpers might treat PTR_TO_MEM | MEM_ALLOC as plain PTR_TO_MEM
to populate the memory area when they use ARG_PTR_TO_{UNINIT_,}MEM in their
func proto description
Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
Generalize the check_ctx_reg() helper function into a more generic named one
so that it can be reused for other register types as well to check whether
their offset is non-zero. No functional change.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: John Fastabend <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
This change adds conntrack lookup helpers using the unstable kfunc call
interface for the XDP and TC-BPF hooks. The primary usecase is
implementing a synproxy in XDP, see Maxim's patchset [0].
Export get_net_ns_by_id as nf_conntrack_bpf.c needs to call it.
This object is only built when CONFIG_DEBUG_INFO_BTF_MODULES is enabled.
[0]: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This patch adds verifier support for PTR_TO_BTF_ID return type of kfunc
to be a reference, by reusing acquire_reference_state/release_reference
support for existing in-kernel bpf helpers.
We make use of the three kfunc types:
- BTF_KFUNC_TYPE_ACQUIRE
Return true if kfunc_btf_id is an acquire kfunc. This will
acquire_reference_state for the returned PTR_TO_BTF_ID (this is the
only allow return value). Note that acquire kfunc must always return a
PTR_TO_BTF_ID{_OR_NULL}, otherwise the program is rejected.
- BTF_KFUNC_TYPE_RELEASE
Return true if kfunc_btf_id is a release kfunc. This will release the
reference to the passed in PTR_TO_BTF_ID which has a reference state
(from earlier acquire kfunc).
The btf_check_func_arg_match returns the regno (of argument register,
hence > 0) if the kfunc is a release kfunc, and a proper referenced
PTR_TO_BTF_ID is being passed to it.
This is similar to how helper call check uses bpf_call_arg_meta to
store the ref_obj_id that is later used to release the reference.
Similar to in-kernel helper, we only allow passing one referenced
PTR_TO_BTF_ID as an argument. It can also be passed in to normal
kfunc, but in case of release kfunc there must always be one
PTR_TO_BTF_ID argument that is referenced.
- BTF_KFUNC_TYPE_RET_NULL
For kfunc returning PTR_TO_BTF_ID, tells if it can be NULL, hence
force caller to mark the pointer not null (using check) before
accessing it. Note that taking into account the case fixed by commit
93c230e3f5bd ("bpf: Enforce id generation for all may-be-null register type")
we assign a non-zero id for mark_ptr_or_null_reg logic. Later, if more
return types are supported by kfunc, which have a _OR_NULL variant, it
might be better to move this id generation under a common
reg_type_may_be_null check, similar to the case in the commit.
Referenced PTR_TO_BTF_ID is currently only limited to kfunc, but can be
extended in the future to other BPF helpers as well. For now, we can
rely on the btf_struct_ids_match check to ensure we get the pointer to
the expected struct type. In the future, care needs to be taken to avoid
ambiguity for reference PTR_TO_BTF_ID passed to release function, in
case multiple candidates can release same BTF ID.
e.g. there might be two release kfuncs (or kfunc and helper):
foo(struct abc *p);
bar(struct abc *p);
... such that both release a PTR_TO_BTF_ID with btf_id of struct abc. In
this case we would need to track the acquire function corresponding to
the release function to avoid type confusion, and store this information
in the register state so that an incorrect program can be rejected. This
is not a problem right now, hence it is left as an exercise for the
future patch introducing such a case in the kernel.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
BPF helpers can associate two adjacent arguments together to pass memory
of certain size, using ARG_PTR_TO_MEM and ARG_CONST_SIZE arguments.
Since we don't use bpf_func_proto for kfunc, we need to leverage BTF to
implement similar support.
The ARG_CONST_SIZE processing for helpers is refactored into a common
check_mem_size_reg helper that is shared with kfunc as well. kfunc
ptr_to_mem support follows logic similar to global functions, where
verification is done as if pointer is not null, even when it may be
null.
This leads to a simple to follow rule for writing kfunc: always check
the argument pointer for NULL, except when it is PTR_TO_CTX. Also, the
PTR_TO_CTX case is also only safe when the helper expecting pointer to
program ctx is not exposed to other programs where same struct is not
ctx type. In that case, the type check will fall through to other cases
and would permit passing other types of pointers, possibly NULL at
runtime.
Currently, we require the size argument to be suffixed with "__sz" in
the parameter name. This information is then recorded in kernel BTF and
verified during function argument checking. In the future we can use BTF
tagging instead, and modify the kernel function definitions. This will
be a purely kernel-side change.
This allows us to have some form of backwards compatibility for
structures that are passed in to the kernel function with their size,
and allow variable length structures to be passed in if they are
accompanied by a size parameter.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Completely remove the old code for check_kfunc_call to help it work
with modules, and also the callback itself.
The previous commit adds infrastructure to register all sets and put
them in vmlinux or module BTF, and concatenates all related sets
organized by the hook and the type. Once populated, these sets remain
immutable for the lifetime of the struct btf.
Also, since we don't need the 'owner' module anywhere when doing
check_kfunc_call, drop the 'btf_modp' module parameter from
find_kfunc_desc_btf.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This patch prepares the kernel to support putting all kinds of kfunc BTF
ID sets in the struct btf itself. The various kernel subsystems will
make register_btf_kfunc_id_set call in the initcalls (for built-in code
and modules).
The 'hook' is one of the many program types, e.g. XDP and TC/SCHED_CLS,
STRUCT_OPS, and 'types' are check (allowed or not), acquire, release,
and ret_null (with PTR_TO_BTF_ID_OR_NULL return type).
A maximum of BTF_KFUNC_SET_MAX_CNT (32) kfunc BTF IDs are permitted in a
set of certain hook and type for vmlinux sets, since they are allocated
on demand, and otherwise set as NULL. Module sets can only be registered
once per hook and type, hence they are directly assigned.
A new btf_kfunc_id_set_contains function is exposed for use in verifier,
this new method is faster than the existing list searching method, and
is also automatic. It also lets other code not care whether the set is
unallocated or not.
Note that module code can only do single register_btf_kfunc_id_set call
per hook. This is why sorting is only done for in-kernel vmlinux sets,
because there might be multiple sets for the same hook and type that
must be concatenated, hence sorting them is required to ensure bsearch
in btf_id_set_contains continues to work correctly.
Next commit will update the kernel users to make use of this
infrastructure.
Finally, add __maybe_unused annotation for BTF ID macros for the
!CONFIG_DEBUG_INFO_BTF case, so that they don't produce warnings during
build time.
The previous patch is also needed to provide synchronization against
initialization for module BTF's kfunc_set_tab introduced here, as
described below:
The kfunc_set_tab pointer in struct btf is write-once (if we consider
the registration phase (comprised of multiple register_btf_kfunc_id_set
calls) as a single operation). In this sense, once it has been fully
prepared, it isn't modified, only used for lookup (from the verifier
context).
For btf_vmlinux, it is initialized fully during the do_initcalls phase,
which happens fairly early in the boot process, before any processes are
present. This also eliminates the possibility of bpf_check being called
at that point, thus relieving us of ensuring any synchronization between
the registration and lookup function (btf_kfunc_id_set_contains).
However, the case for module BTF is a bit tricky. The BTF is parsed,
prepared, and published from the MODULE_STATE_COMING notifier callback.
After this, the module initcalls are invoked, where our registration
function will be called to populate the kfunc_set_tab for module BTF.
At this point, BTF may be available to userspace while its corresponding
module is still intializing. A BTF fd can then be passed to verifier
using bpf syscall (e.g. for kfunc call insn).
Hence, there is a race window where verifier may concurrently try to
lookup the kfunc_set_tab. To prevent this race, we must ensure the
operations are serialized, or waiting for the __init functions to
complete.
In the earlier registration API, this race was alleviated as verifier
bpf_check_mod_kfunc_call didn't find the kfunc BTF ID until it was added
by the registration function (called usually at the end of module __init
function after all module resources have been initialized). If the
verifier made the check_kfunc_call before kfunc BTF ID was added to the
list, it would fail verification (saying call isn't allowed). The
access to list was protected using a mutex.
Now, it would still fail verification, but for a different reason
(returning ENXIO due to the failed btf_try_get_module call in
add_kfunc_call), because if the __init call is in progress the module
will be in the middle of MODULE_STATE_COMING -> MODULE_STATE_LIVE
transition, and the BTF_MODULE_LIVE flag for btf_module instance will
not be set, so the btf_try_get_module call will fail.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|