Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- SLOB deprecation and SLUB_TINY
The SLOB allocator adds maintenance burden and stands in the way of
API improvements [1]. Deprecate it by renaming the config option (to
make users notice) to CONFIG_SLOB_DEPRECATED with updated help text.
SLUB should be used instead as SLAB will be the next on the removal
list.
Based on reports from a riscv k210 board with 8MB RAM, add a
CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the
expense of scalability. This has resolved the k210 regression [2] so
in case there are no others (that wouldn't be resolvable by further
tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles.
Existing defconfigs with CONFIG_SLOB are converted to
CONFIG_SLUB_TINY.
- kmalloc() slub_debug redzone improvements
A series from Feng Tang that builds on the tracking or requested size
for kmalloc() allocations (for caches with debugging enabled) added
in 6.1, to make redzone checks consider the requested size and not
the rounded up one, in order to catch more subtle buffer overruns.
Includes new slub_kunit test.
- struct slab fields reordering to accomodate larger rcu_head
RCU folks would like to grow rcu_head with debugging options, which
breaks current struct slab layout's assumptions, so reorganize it to
make this possible.
- Miscellaneous improvements/fixes:
- __alloc_size checking compiler workaround (Kees Cook)
- Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes)
- Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina)
- Correct SLUB's percpu allocation estimates (Baoquan He)
- Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov)
- Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao)
- Dead code removal in SLUB (Hyeonggon Yoo)
* tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits)
mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED
mm, slub: don't aggressively inline with CONFIG_SLUB_TINY
mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
mm, slub: split out allocations from pre/post hooks
mm/slub, kunit: Add a test case for kmalloc redzone check
mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation
mm, slub: refactor free debug processing
mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY
mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY
mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY
mm, slub: disable SYSFS support with CONFIG_SLUB_TINY
mm, slub: add CONFIG_SLUB_TINY
mm, slab: ignore hardened usercopy parameters when disabled
slab: Remove special-casing of const 0 size allocations
slab: Clean up SLOB vs kmalloc() definition
mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head
mm/migrate: make isolate_movable_page() skip slab pages
mm/slab: move and adjust kernel-doc for kmem_cache_alloc
mm/slub, percpu: correct the calculation of early percpu allocation size
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
"Most are small refactorings and bug fixes, but three things stand out:
switching timens (which got reverted before) looks solid now,
FOLL_FORCE has been removed (no failures seen yet across several weeks
in -next), and some whitespace cleanups (which are long overdue).
- Add timens support (when switching mm). This version has survived
in -next for the entire cycle (Andrei Vagin)
- Various small bug fixes, refactoring, and readability improvements
(Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin)
- Remove FOLL_FORCE for stack setup (Kees Cook)
- Whitespace cleanups (Rolf Eike Beer, Kees Cook)"
* tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_misc: fix shift-out-of-bounds in check_special_flags
binfmt: Fix error return code in load_elf_fdpic_binary()
exec: Remove FOLL_FORCE for stack setup
binfmt_elf: replace IS_ERR() with IS_ERR_VALUE()
binfmt_elf: simplify error handling in load_elf_phdrs()
binfmt_elf: fix documented return value for load_elf_phdrs()
exec: simplify initial stack size expansion
binfmt: Fix whitespace issues
exec: Add comments on check_unsafe_exec() fs counting
ELF uapi: add spaces before '{'
selftests/timens: add a test for vfork+exit
fs/exec: switch timens when a task gets a new mm
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook:
- Add missing kerndoc parameter (Randy Dunlap)
- Improve seccomp selftest to check CAP_SYS_ADMIN (Gautam Menghani)
- Fix allocation leak when cloned thread immediately dies (Kuniyuki
Iwashima)
* tag 'seccomp-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: document the "filter_count" field
seccomp: Move copy_seccomp() to no failure path.
selftests/seccomp: Check CAP_SYS_ADMIN capability in the test mode_filter_without_nnp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Further improvements to nolibc testing
* tag 'nolibc.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
selftests/nolibc: Always rebuild the sysroot when running a test
selftests/nolibc: Add 7 tests for memcmp()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
- Add instrumentation for memcpy(), memset(), and memmove() for Clang
v16+'s new function names that are used when the -fsanitize=thread
argument is given
- Fix objtool warnings from KCSAN's volatile instrumentation, and typos
in a pair of Kconfig options' help clauses
* tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: Fix trivial typo in Kconfig help comments
objtool, kcsan: Add volatile read/write instrumentation to whitelist
kcsan: Instrument memcpy/memset/memmove with newer Clang
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull kernel memory model documentation updates from Paul McKenney:
- Update the LKMM documentation, both in English and in Korean
* tag 'lkmm.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
docs/memory-barriers.txt/kokr: Fix confusing name of 'data dependency barrier'
docs/memory-barriers.txt/kokr: Add memory barrier dma_mb()
docs/memory-barriers.txt/kokr: introduce io_stop_wc() and add implementation for ARM64
docs/memory-barriers.txt: Add a missed closing parenthesis
tools/memory-model: Weaken ctrl dependency definition in explanation.txt
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
|
|
Merge devfreq updates and cpupower utility updates for 6.2-rc1:
- Add a private governor_data for devfreq governors (Kant Fan).
- Reorganize devfreq code to use device_match_of_node() and
devm_platform_get_and_ioremap_resource() instead of open coding
them (ye xingchen, Minghao Chi).
- Make cpupower choose base_cpu to display default cpupower details
instead of picking CPU 0 (Saket Kumar Bhaskar).
- Add Georgian translation to cpupower documentation (Zurab
Kargareteli).
- Introduce powercap intel-rapl library, powercap-info command, and
RAPL monitor into cpupower (Thomas Renninger).
* pm-devfreq:
PM / devfreq: event: use devm_platform_get_and_ioremap_resource()
PM / devfreq: event: Use device_match_of_node()
PM / devfreq: Use device_match_of_node()
PM/devfreq: governor: Add a private governor_data for governor
* pm-tools:
cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain
cpupower: Introduce powercap intel-rapl library and powercap-info command
cpupower: Add Georgian translation
tools/cpupower: Choose base_cpu to display default cpupower details
|
|
Merge cpuidle changes, updates related to system sleep amd generic power
domains code fixes for 6.2-rc1:
- Improve kernel messages printed by the cpuidle PCI driver (Ulf
Hansson).
- Make the DT cpuidle driver return the correct number of parsed idle
states, clean it up and clarify a comment in it (Ulf Hansson).
- Modify the tasks freezing code to avoid using pr_cont() and refine an
error message printed by it (Rafael Wysocki).
- Make the hibernation core code complain about memory map mismatches
during resume to help diagnostics (Xueqin Luo).
- Fix mistake in a kerneldoc comment in the hibernation code (xiongxin).
- Reverse the order of performance and enabling operations in the
generic power domains code (Abel Vesa).
- Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the
generic power domains code (Abel Vesa).
- Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn
Guo).
- Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo).
- Drop generic power domain status manipulation during hibernate
restore (Shawn Guo).
* pm-cpuidle:
cpuidle: dt: Clarify a comment and simplify code in dt_init_idle_driver()
cpuidle: dt: Return the correct numbers of parsed idle states
cpuidle: psci: Extend information in log about OSI/PC mode
* pm-sleep:
PM: sleep: Refine error message in try_to_freeze_tasks()
PM: sleep: Avoid using pr_cont() in the tasks freezing code
PM: hibernate: Complain about memory map mismatches during resume
PM: hibernate: Fix mistake in kerneldoc comment
* pm-domains:
PM: domains: Reverse the order of performance and enabling ops
PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq hook
PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()
PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend()
PM: domains: Drop genpd status manipulation for hibernate restore
|
|
Merge ACPICA changes, including bug fixes and cleanups as well as support
for some recently defined data structures, for 6.2-rc1:
- Make acpi_ex_load_op() match upstream implementation (Rafael Wysocki).
- Add support for loong_arch-specific APICs in MADT (Huacai Chen).
- Add support for fixed PCIe wake event (Huacai Chen).
- Add EBDA pointer sanity checks (Vit Kabele).
- Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
- Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
- Add a couple of new UUIDs to the known UUID list (Bob Moore).
- Add support for FFH Opregion special context data (Sudeep Holla).
- Improve warning message for "invalid ACPI name" (Bob Moore).
- Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT table
(Alison Schofield).
- Prepare IORT support for revision E.e (Robin Murphy).
- Finish support for the CDAT table (Bob Moore).
- Fix error code path in acpi_ds_call_control_method() (Rafael Wysocki).
- Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li Zetao).
- Update the version of the ACPICA code in the kernel (Bob Moore).
* acpica:
ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
ACPICA: Fix error code path in acpi_ds_call_control_method()
ACPICA: Update version to 20221020
ACPICA: Add utcksum.o to the acpidump Makefile
Revert "LoongArch: Provisionally add ACPICA data structures"
ACPICA: Finish support for the CDAT table
ACPICA: IORT: Update for revision E.e
ACPICA: Add CXL 3.0 structures (CXIMS & RDPAS) to the CEDT table
ACPICA: Improve warning message for "invalid ACPI name"
ACPICA: Add support for FFH Opregion special context data
ACPICA: Add a couple of new UUIDs to the known UUID list
ACPICA: iASL: Add CCEL table to both compiler/disassembler
ACPICA: Do not touch VGA memory when EBDA < 1ki_b
ACPICA: Check that EBDA pointer is in valid memory
ACPICA: Events: Support fixed PCIe wake event
ACPICA: MADT: Add loong_arch-specific APICs support
ACPICA: Make acpi_ex_load_op() match upstream
|
|
The test currently fails on 32bit. Fixing the "-1ull" vs. "-1ul" seems
to make the test pass and the compiler happy.
Note: This test is not in mm-stable yet. This fix should be squashed into
"selftests/vm: add KSM unmerge tests".
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Yang Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The compiler complains about the conversion of a pointer to an int of
different width.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 6f1405efc61b ("selftests/vm: anon_cow: add R/O longterm tests via gup_test")
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Yang Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The tests fail to compile in some environments (e.g., Debian 11.5 on x86).
Let's simply conditionally define MADV_POPULATE_(READ|WRITE) if not
already defined, similar to how the khugepaged.c test handles it.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 39b2e5cae43d ("selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers")
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Yang Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Make sure that we ignore protection of a memcg that is the target of memcg
reclaim.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Chris Down <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vasily Averin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its usage
reaches a certain value.
While we are at it, refactor and simplify the reclaim loop.
This will be used in a following patch in another test.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yosry Ahmed <[email protected]>
Suggested-by: Roman Gushchin <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Cc: Chris Down <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Vasily Averin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
A DAMON sysfs user could start DAMON with a scheme, remove the sysfs
directory for the scheme, and then ask stats or schemes tried regions
update. The related logic were not aware of the already removed directory
situation, so it was able to results in invalid memory accesses. The fix
has made with commit 8468b486612c ("mm/damon/sysfs-schemes: skip stats
update if the scheme directory is removed"), though. Add a selftest to
prevent such kinds of bugs from being introduced again.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: SeongJae Park <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Let's add a test to measure performance of KSM breaking not triggered via
COW, but triggered by disabling KSM on an area filled with KSM pages via
MADV_UNMERGEABLE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Peter Xu <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/ksm: break_ksm() cleanups and fixes", v2.
This series cleans up and fixes break_ksm(). In summary, we no longer use
fake write faults to break COW but instead FAULT_FLAG_UNSHARE. Further,
we move away from using follow_page() --- that we can hopefully remove
completely at one point --- and use new walk_page_range_vma() instead.
Fortunately, we can get rid of VM_FAULT_WRITE and FOLL_MIGRATION in common
code now.
Extend the existing ksm tests by an unmerge benchmark, and a some new
unmerge tests.
Also, add a selftest to measure MADV_UNMERGEABLE performance. In my setup
(AMD Ryzen 9 3900X), running the KSM selftest to test unmerge performance
on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in a
performance degradation of ~6% -- 7% (old: ~5250 MiB/s, new: ~4900 MiB/s).
I don't think we particularly care for now, but it's good to be aware of
the implication.
This patch (of 9):
Let's add three unmerge tests (MADV_UNMERGEABLE unmerging all pages in the
range).
test_unmerge(): basic unmerge tests
test_unmerge_discarded(): have some pte_none() entries in the range
test_unmerge_uffd_wp(): protect the merged pages using uffd-wp
ksm_tests.c currently contains a mixture of benchmarks and tests, whereby
each test is carried out by executing the ksm_tests binary with specific
parameters. Let's add new ksm_functional_tests.c that performs multiple,
smaller functional tests all at once.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Our memory management kernel CI testing at Red Hat uses the VM
selftests and we have run into two problems:
First, our LTP tests overlap with the VM selftests.
We want to avoid unhelpful redundancy in our testing practices.
Second, we have observed the current run_vmtests.sh to report overall
failure/ambiguous results in the case that a machine lacks the necessary
hardware to perform one or more of the tests. E.g. ksm tests that
require more than one numa node.
We want to be able to run the vm selftests suitable to particular hardware.
Add the ability to run one or more groups of vm tests via run_vmtests.sh
instead of simply all-or-none in order to solve these problems.
Preserve existing default behavior of running all tests when the script
is invoked with no arguments.
Documentation of test groups is included in the patch as follows:
# ./run_vmtests.sh [ -h || --help ]
usage: ./tools/testing/selftests/vm/run_vmtests.sh [ -h | -t "<categories>"]
-t: specify specific categories to tests to run
-h: display this message
The default behavior is to run all tests.
Alternatively, specific groups tests can be run by passing a string
to the -t argument containing one or more of the following categories
separated by spaces:
- mmap
tests for mmap(2)
- gup_test
tests for gup using gup_test interface
- userfaultfd
tests for userfaultfd(2)
- compaction
a test for the patch "Allow compaction of unevictable pages"
- mlock
tests for mlock(2)
- mremap
tests for mremap(2)
- hugevm
tests for very large virtual address space
- vmalloc
vmalloc smoke tests
- hmm
hmm smoke tests
- madv_populate
test memadvise(2) MADV_POPULATE_{READ,WRITE} options
- memfd_secret
test memfd_secret(2)
- process_mrelease
test process_mrelease(2)
- ksm
ksm tests that do not require >=2 NUMA nodes
- ksm_numa
ksm tests that require >=2 NUMA nodes
- pkey
memory protection key tests
- soft_dirty
test soft dirty page bit semantics
- anon_cow
test anonymous copy-on-write semantics
example: ./run_vmtests.sh -t "hmm mmap ksm"
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Joel Savitz <[email protected]>
Cc: Joel Savitz <[email protected]>
Cc: Nico Pache <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
|
|
Check that verifier.c:states_equal() uses check_ids() to match
consistent active_lock/map_value configurations. This allows to prune
states with active spin locks even if numerical values of
active_lock ids do not match across compared states.
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Test that when reg->id is not same for the same register of type
PTR_TO_MAP_VALUE between current and old explored state, we currently
return false from regsafe and continue exploring.
Without the fix in prior commit, the test case fails.
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
A test case that would erroneously pass verification if
verifier.c:states_equal() maintains separate register ID mappings for
call frames.
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Under certain conditions it was possible for verifier.c:regsafe() to
skip check_id() call. This commit adds negative test cases previously
errorneously accepted as safe.
Signed-off-by: Eduard Zingerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
|
|
1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[[email protected]: update comments, per Roman]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
Acked-by: Shakeel Butt <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Tested-by: Yujie Liu <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Feng Tang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: "Michal Koutný" <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Add man pages for the rv command line, using the same scheme we used
in rtla.
Link: https://lkml.kernel.org/r/e841d7cfbdfc3ebdaf7cbd40278571940145d829.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Add the ability to control and trace in-kernel monitors. This is
a generic interface, it will check for existing monitors and enable
standard setup, like enabling reactors.
For example:
# rv list
wip wakeup in preemptive per-cpu testing monitor. [OFF]
wwnr wakeup while not running per-task testing model. [OFF]
# rv mon wwnr --help
rv version 6.1.0-rc4: help
usage: rv mon wwnr [-h] [-q] [-r reactor] [-s] [-v]
-h/--help: print this menu and the reactor list
-r/--reactor 'reactor': enables the 'reactor'
-s/--self: when tracing (-t), also trace rv command
-t/--trace: trace monitor's event
-v/--verbose: print debug messages
available reactors: nop printk panic
# rv mon wwnr --trace
<TASK>-PID [CPU] TYPE ID STATE x EVENT -> NEXT_STATE FINAL
| | | | | | | | |
rv-3613 [001] event 3613 running x switch_out -> not_running Y
sshd-1248 [005] event 1248 running x switch_out -> not_running Y
<idle>-0 [005] event 71 not_running x wakeup -> not_running Y
<idle>-0 [005] event 71 not_running x switch_in -> running N
kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [005] event 71 not_running x wakeup -> not_running Y
<idle>-0 [005] event 71 not_running x switch_in -> running N
kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [001] event 3613 not_running x wakeup -> not_running Y
Link: https://lkml.kernel.org/r/1e57547e3acadda6e23949b2672c89e76ec2ec42.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
This is the (user-space) runtime verification tool, named rv.
This tool aims to be the interface for in-kernel rv monitors, as
well as the home for monitors in user-space (online asynchronous),
and in *eBPF.
The tool receives a command as the first argument, the current
commands are:
list - list all available monitors
mon - run a given monitor
Each monitor is an independent piece of software inside the
tool and can have their own arguments.
There is no monitor implemented in this patch, it only
adds the basic structure of the tool, based on rtla.
# rv --help
rv version 6.1.0-rc4: help
usage: rv command [-h] [command_options]
-h/--help: print this menu
command: run one of the following command:
list: list all available monitors
mon: run a monitor
[command options]: each command has its own set of options
run rv command -h for further information
*dot2bpf is the next patch set, depends on this, doing cleanups.
Link: https://lkml.kernel.org/r/fb51184f3b95aea0d7bfdc33ec09f4153aee84fa.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
rtla_usage(), osnoise_usage() and timerlat_usage() all exit with an
error status.
However when these are called from help, they should exit with a
non-error status.
Fix this by passing the exit status to the functions.
Note, although we remove the subsequent call to exit after calling
usage, we leave it in at the end of a function to suppress the compiler
warning "control reaches end of a non-void function".
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: John Kacur <[email protected]>
Acked-by: Daniel Bristot de Oliveira <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
MEM_REGION_TEST_DATA is meant to hold data explicitly used by a
selftest, not implicit allocations due to the selftests infrastructure.
Allocate the ucall pool from MEM_REGION_DATA much like the rest of the
selftests library allocations.
Fixes: 426729b2cf2e ("KVM: selftests: Add ucall pool based implementation")
Signed-off-by: Oliver Upton <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
An interesting feature of the Arm architecture is that the stage-1 MMU
supports two distinct VA regions, controlled by TTBR{0,1}_EL1. As KVM
selftests on arm64 only uses TTBR0_EL1, the VA space is constrained to
[0, 2^(va_bits-1)). This is different from other architectures that
allow for addressing low and high regions of the VA space from a single
page table.
KVM selftests' VA space allocator presumes the valid address range is
split between low and high memory based the MSB, which of course is a
poor match for arm64's TTBR0 region.
Allow architectures to override the default VA space layout. Make use of
the override to align vpages_valid with the behavior of TTBR0 on arm64.
Signed-off-by: Oliver Upton <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
|
|
Allow variables to execute shell commands. Note, these are processed when
they are first seen while parsing the config file. This is useful if you
have the same config file used for multiple hosts (as they may be in a git
repository).
HOSTNAME := ${shell hostname}
DEFAULTS IF "${HOSTNAME}" == "frodo"
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The BPF Makefile in net/bpf did incorrect path substitution for O=dir
builds, e.g.
make O=/tmp/kselftest headers
make O=/tmp/kselftest -C tools/testing/selftests
would fail in selftest builds [1] net/ with
clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
clang-16: error: no input files
Add a pattern prerequisite and an order-only-prerequisite (for
creating the directory), to resolve the issue.
[1] https://lore.kernel.org/all/[email protected]/
Reported-by: kernel test robot <[email protected]>
Fixes: 837a3d66d698 ("selftests: net: Add cross-compilation support for BPF programs")
Signed-off-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: Amit Cohen <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The original support for bpf_user_ringbuf_drain callbacks simply
short-circuited checks for the dynptr state, allowing users to pass
PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a
dynptr. This bug would have also surfaced with other dynptr helpers in
the future that changed dynptr view or modified it in some way.
Include test cases for all cases, i.e. both bpf_dynptr_from_mem and
bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them.
Without the fix, both of these programs load and pass verification.
While at it, remove sys_nanosleep target from failure cases' SEC
definition, as there is no such tracepoint.
Acked-by: David Vernet <[email protected]>
Acked-by: Joanne Koong <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
While check_func_arg_reg_off is the place which performs generic checks
needed by various candidates of reg->type, there is some handling for
special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and
ARG_PTR_TO_RINGBUF_MEM.
This commit aims to streamline these special cases and instead leave
other things up to argument type specific code to handle. The function
will be restrictive by default, and cover all possible cases when
OBJ_RELEASE is set, without having to update the function again (and
missing to do that being a bug).
This is done primarily for two reasons: associating back reg->type to
its argument leaves room for the list getting out of sync when a new
reg->type is supported by an arg_type.
The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is something
we already handle, whenever a release argument is expected, it should
be passed as the pointer that was received from the acquire function.
Hence zero fixed and variable offset.
There is nothing special about ARG_PTR_TO_RINGBUF_MEM, where technically
its target register type PTR_TO_MEM | MEM_RINGBUF can already be passed
with non-zero offset to other helper functions, which makes sense.
Hence, lift the arg_type_is_release check for reg->off and cover all
possible register types, instead of duplicating the same kind of check
twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id).
For the release argument, arg_type_is_dynptr is the special case, where
we go to actual object being freed through the dynptr, so the offset of
the pointer still needs to allow fixed and variable offset and
process_dynptr_func will verify them later for the release argument case
as well.
This is not specific to ARG_PTR_TO_DYNPTR though, we will need to make
this exception for any future object on the stack that needs to be
released. In this sense, PTR_TO_STACK as a candidate for object on stack
argument is a special case for release offset checks, and they need to
be done by the helper releasing the object on stack.
Since the check has been lifted above all register type checks, remove
the duplicated check that is being done for PTR_TO_BTF_ID.
Acked-by: Joanne Koong <[email protected]>
Acked-by: David Vernet <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.
However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.
When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this
doesn't break.
Fixes: 205715673844 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where
the underlying register type is subjected to more special checks to
determine the type of object represented by the pointer and its state
consistency.
Move dynptr checks to their own 'process_dynptr_func' function so that
is consistent and in-line with existing code. This also makes it easier
to reuse this code for kfunc handling.
Then, reuse this consolidated function in kfunc dynptr handling too.
Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has
been lifted.
Acked-by: David Vernet <[email protected]>
Acked-by: Joanne Koong <[email protected]>
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields after the
first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used, or when
source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
s390/qeth: fix use-after-free in hsci
macsec: add missing attribute validation for offload
net: mvneta: Fix an out of bounds check
net: thunderbolt: fix memory leak in tbnet_open()
ipv6: avoid use-after-free in ip6_fragment()
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
net: phy: mxl-gpy: add MDINT workaround
net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
xen/netback: don't call kfree_skb() under spin_lock_irqsave()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
tipc: call tipc_lxc_xmit without holding node_read_lock
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
ipv4: Fix incorrect route flushing when table ID 0 is used
ipv4: Fix incorrect route flushing when source address is deleted
...
|
|
Bpftool has new extra libbpf_det_bind probing map we need to exclude.
Also skip trying to load netdevsim modules if it's already loaded (builtin).
v2:
- drop iproute2->bpftool changes (Toke)
Signed-off-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Merge the powerpc objtool support, which we were keeping in a topic
branch in case of any merge conflicts.
|
|
grub2 has submenus where to use grub-reboot, it requires:
grub-reboot X>Y
where X is the main index and Y is the submenu. Thus if you have:
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux ...
[...]
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option ...
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64 (recovery mode)' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux test' --class debian --class gnu-linux ...
[...]
}
And wanted to boot to the "Linux test" kernel, you need to run:
# grub-reboot 1>2
As 1 is the second top menu (the submenu) and 2 is the third of the sub
menu entries.
Have the grub.cfg parsing for grub2 handle such cases.
Cc: [email protected]
Fixes: a15ba91361d46 ("ktest: Add support for grub2")
Reviewed-by: John 'Warthog9' Hawley (VMware) <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
After a full run of a make_min_config test, I noticed there were a lot of
CONFIGs still enabled that really should not be. Looking at them, I
noticed they were all defined as "default y". The issue is that the test
simple removes the config and re-runs make oldconfig, which enables it
again because it is set to default 'y'. Instead, explicitly disable the
config with writing "# CONFIG_FOO is not set" to the file to keep it from
being set again.
With this change, one of my box's minconfigs went from 768 configs set,
down to 521 configs set.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 0a05c769a9de5 ("ktest: Added config_bisect test type")
Reviewed-by: John 'Warthog9' Hawley (VMware) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Convert big chunks of dynptr and map_kptr subtests to use generic
verification_tester. They are switched from using manually maintained
tables of test cases, specifying program name and expected error
verifier message, to btf_decl_tag-based annotations directly on
corresponding BPF programs: __failure to specify that BPF program is
expected to fail verification, and __msg() to specify expected log
message.
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
It's become a common pattern to have a collection of small BPF programs
in one BPF object file, each representing one test case. On user-space
side of such tests we maintain a table of program names and expected
failure or success, along with optional expected verifier log message.
This works, but each set of tests reimplement this mundane code over and
over again, which is a waste of time for anyone trying to add a new set
of tests. Furthermore, it's quite error prone as it's way too easy to miss
some entries in these manually maintained test tables (as evidences by
dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was
accidentally missed; this is fixed in next patch).
So this patch implements generic test_loader, which accepts skeleton
name and handles the rest of details: opens and loads BPF object file,
making sure each program is tested in isolation. Optionally each test
case can specify expected BPF verifier log message. In case of failure,
tester makes sure to report verifier log, but it also reports verifier
log in verbose mode unconditionally.
Now, the interesting deviation from existing custom implementations is
the use of btf_decl_tag attribute to specify expected-to-fail vs
expected-to-succeed markers and, optionally, expected log message
directly next to BPF program source code, eliminating the need to
manually create and update table of tests.
We define few macros wrapping btf_decl_tag with a convention that all
values of btf_decl_tag start with "comment:" prefix, and then utilizing
a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to
define things like expected success/failure, expected verifier message,
extra verifier log level (if necessary). This approach is demonstrated
by next patch in which two existing sets of failure tests are converted.
Tester supports both expected-to-fail and expected-to-succeed programs,
though this patch set didn't convert any existing expected-to-succeed
programs yet, as existing tests couple BPF program loading with their
further execution through attach or test_prog_run. One way to allow
testing scenarios like this would be ability to specify custom callback,
executed for each successfully loaded BPF program. This is left for
follow up patches, after some more analysis of existing test cases.
This test_loader is, hopefully, a start of a test_verifier-like runner,
but integrated into test_progs infrastructure. It will allow much better
"user experience" of defining low-level verification tests that can take
advantage of all the libbpf-provided nicety features on BPF side: global
variables, declarative maps, etc. All while having a choice of defining
it in C or as BPF assembly (through __attribute__((naked)) functions and
using embedded asm), depending on what makes most sense in each
particular case. This will be explored in follow up patches as well.
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Cited commit added the table ID to the FIB info structure, but did not
properly initialize it when table ID 0 is used. This can lead to a route
in the default VRF with a preferred source address not being flushed
when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28
# ip address add dev dummy1 192.0.2.17/28
# ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100
# ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two
different FIB info structures. One with a metric of 100 and table ID of
254 (main) and one with a metric of 200 and table ID of 0. Therefore,
when the preferred source address is deleted from the default VRF,
the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration
structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8
Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9
Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Reported-by: Donald Sharp <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|