blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-02-15	devlink: Move devlink health dump to health file	Moshe Shemesh	3	-123/+126
	Move devlink health report dump callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	devlink: Move devlink fmsg and health diagnose to health file	Moshe Shemesh	3	-630/+636
	Devlink fmsg (formatted message) is used by devlink health diagnose, dump and drivers which support these devlink health callbacks. Therefore, move devlink fmsg helpers and related code to file health.c. Move devlink health diagnose to file health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	devlink: Move devlink health report and recover to health file	Moshe Shemesh	3	-138/+144
	Move devlink health report helper and recover callback and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	devlink: Move devlink health get and set code to health file	Moshe Shemesh	3	-217/+234
	Move devlink health get and set callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	devlink: health: Fix nla_nest_end in error flow	Moshe Shemesh	1	-1/+1
	devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix it to call nla_nest_cancel() instead. Note the bug is harmless as genlmsg_cancel() cancel the entire message, so no fixes tag added. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	devlink: Split out health reporter create code	Moshe Shemesh	4	-209/+226
	Move devlink health reporter create/destroy and related dev code to new file health.c. This file shall include all callbacks and functionality that are related to devlink health. In addition, fix kdoc indentation and make reporter create/destroy kdoc more clear. No functional change in this patch. Signed-off-by: Moshe Shemesh <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	ice: update xdp_features with xdp multi-buff	Lorenzo Bianconi	1	-6/+12
	Now ice driver supports xdp multi-buffer so add it to xdp_features. Check vsi type before setting xdp_features flag. Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/r/8a4781511ab6e3cd280e944eef69158954f1a15f.1676385351.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	i40e: check vsi type before setting xdp_features flag	Lorenzo Bianconi	1	-2/+4
	Set xdp_features flag just for I40E_VSI_MAIN vsi type since XDP is supported just in this configuration. Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/f2b537f86b34fc176fbc6b3d249b46a20a87a2f3.1676405131.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2023-02-15	bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES	Alexander Lobakin	2	-9/+27
	&xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed before the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-02-15	Merge branch 'New benchmark for hashmap lookups'	Andrii Nakryiko	14	-32/+420
	Anton Protopopov says: ==================== Add a new benchmark for hashmap lookups and fix several typos. In commit 3 I've patched the bench utility so that now command line options can be reused by different benchmarks. The benchmark itself is added in the last commit 7. I was using this benchmark to test map lookup productivity when using a different hash function [1]. When run with --quiet, the results can be easily plotted [2]. The results provided by the benchmark look reasonable and match the results of my different benchmarks (requiring to patch kernel to get actual statistics on map lookups). Links: [1] https://fosdem.org/2023/schedule/event/bpf_hashing/ [2] https://github.com/aspsk/bpf-bench/tree/master/hashmap-bench Changes, v1->v2: - percpu_times_index[] is of wrong size (Martin) - use base 0 for strtol (Andrii) - just use -q without argument (Andrii) - use less hacks when parsing arguments (Andrii) ==================== Signed-off-by: Andrii Nakryiko <[email protected]>
2023-02-15	selftest/bpf/benchs: Add benchmark for hashmap lookups	Anton Protopopov	4	-1/+354
	Add a new benchmark which measures hashmap lookup operations speed. A user can control the following parameters of the benchmark: * key_size (max 1024): the key size to use * max_entries: the hashmap max entries * nr_entries: the number of entries to insert/lookup * nr_loops: the number of loops for the benchmark * map_flags The hashmap flags passed to BPF_MAP_CREATE The BPF program performing the benchmarks calls two nested bpf_loop: bpf_loop(nr_loops/nr_entries) bpf_loop(nr_entries) bpf_map_lookup() So the nr_loops determines the number of actual map lookups. All lookups are successful. Example (the output is generated on a AMD Ryzen 9 3950X machine): for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 \| grep cpu; done 6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms) 12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms) 18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms) 25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms) 31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms) 37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms) 43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms) 50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms) 56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms) 62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms) 68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms) 75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms) 81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms) 87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms) 93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms) 100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms) Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Print less if the quiet option is set	Anton Protopopov	1	-2/+4
	The bench utility will print Setting up benchmark '<bench-name>'... Benchmark '<bench-name>' started. on startup to stdout. Suppress this output if --quiet option if given. This makes it simpler to parse benchmark output by a script. Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Make quiet option common	Anton Protopopov	4	-15/+8
	The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to the list of common options, so that the main code and other benchmarks can use (new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh helper script accordingly. Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Remove an unused header	Anton Protopopov	1	-1/+0
	The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp, so it shouldn't include the <argp.h> header. Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Enhance argp parsing	Anton Protopopov	8	-10/+51
	To parse command line the bench utility uses the argp_parse() function. This function takes as an argument a parent 'struct argp' structure which defines common command line options and an array of children 'struct argp' structures which defines additional command line options for particular benchmarks. This implementation doesn't allow benchmarks to share option names, e.g., if two benchmarks want to use, say, the --option option, then only one of them will succeed (the first one encountered in the array). This will be convenient if same option names could be used in different benchmarks (with the same semantics, e.g., --nr_loops=N). Fix this by calling the argp_parse() function twice. The first call is the same as it was before, with all children argps, and helps to find the benchmark name and to print a combined help message if anything is wrong. Given the name, we can call the argp_parse the second time, but now the children array points only to a correct benchmark thus always calling the correct parsers. (If there's no a specific list of arguments, then only one call to argp_parse will be done.) Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Make a function static in bpf_hashmap_full_update	Anton Protopopov	1	-1/+1
	The hashmap_report_final callback function defined in the benchs/bench_bpf_hashmap_full_update.c file should be static. Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	selftest/bpf/benchs: Fix a typo in bpf_hashmap_full_update	Anton Protopopov	2	-2/+2
	To call the bpf_hashmap_full_update benchmark, one should say: bench bpf-hashmap-ful-update The patch adds a missing 'l' to the benchmark name. Signed-off-by: Anton Protopopov <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-02-15	Merge branch 'Use __GFP_ZERO in bpf memory allocator'	Alexei Starovoitov	5	-3/+130
	Hou Tao says: ==================== From: Hou Tao <[email protected]> Hi, The patchset tries to fix the hard-up problem found when checking how htab handles element reuse in bpf memory allocator. The immediate reuse of freed elements will reinitialize special fields (e.g., bpf_spin_lock) in htab map value and it may corrupt lookup procedure with BFP_F_LOCK flag which acquires bpf-spin-lock during value copying, and lead to hard-lock as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when allocating the object from slab and the behavior is similar with the preallocated hash-table case. Please see individual patches for more details. And comments are always welcome. Regards, Change Log: v1: * Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei) * Add comments for check_and_init_map_value() (from Alexei) * split __GFP_ZERO patches out of the original patchset to unblock the development work of others. RFC: https://lore.kernel.org/bpf/[email protected] ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	selftests/bpf: Add test case for element reuse in htab map	Hou Tao	2	-0/+120
	The reinitialization of spin-lock in map value after immediate reuse may corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add one test case to demonstrate the problem. Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	bpf: Zeroing allocated object from slab in bpf memory allocator	Hou Tao	3	-3/+10
	Currently the freed element in bpf memory allocator may be immediately reused, for htab map the reuse will reinitialize special fields in map value (e.g., bpf_spin_lock), but lookup procedure may still access these special fields, and it may lead to hard-lockup as shown below: NMI backtrace for cpu 16 CPU: 16 PID: 2574 Comm: htab.bin Tainted: G L 6.1.0+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0 ...... Call Trace: <TASK> copy_map_value_locked+0xb7/0x170 bpf_map_copy_value+0x113/0x3c0 __sys_bpf+0x1c67/0x2780 __x64_sys_bpf+0x1c/0x20 do_syscall_64+0x30/0x60 entry_SYSCALL_64_after_hwframe+0x46/0xb0 ...... </TASK> For htab map, just like the preallocated case, these is no need to initialize these special fields in map value again once these fields have been initialized. For preallocated htab map, these fields are initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the similar thing for non-preallocated htab in bpf memory allocator. And there is no need to use __GFP_ZERO for per-cpu bpf memory allocator, because __alloc_percpu_gfp() does it implicitly. Fixes: 0fd7c5d43339 ("bpf: Optimize call_rcu in non-preallocated hash map.") Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	Merge tag 'apparmor-v6.2-rc9' of ↵	Linus Torvalds	1	-2/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor fix from John Johansen: "Regression fix for getattr mediation of old policy" * tag 'apparmor-v6.2-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix regression in compat permissions for getattr
2023-02-15	Merge tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme into ↵	Jens Axboe	1	-4/+2
	for-6.3/block Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.3 - fix and cleanup freeing single sgl (Keith Busch)" * tag 'nvme-6.3-2023-02-15' of git://git.infradead.org/nvme: nvme-pci: remove iod use_sgls nvme-pci: fix freeing single sgl
2023-02-15	Merge tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme into block-6.2	Jens Axboe	1	-10/+10
	Pull NVMe fixes from Christoph: "nvme fixes for Linux 6.2 - always return an ERR_PTR from nvme_pci_alloc_dev (Irvin Cote) - add bogus ID quirk for ADATA SX6000PNP (Daniel Wagner) - set the DMA mask earlier (Christoph Hellwig)" * tag 'nvme-6.2-2023-02-15' of git://git.infradead.org/nvme: nvme-pci: always return an ERR_PTR from nvme_pci_alloc_dev nvme-pci: set the DMA mask earlier nvme-pci: add bogus ID quirk for ADATA SX6000PNP
2023-02-15	Merge tag 'nfsd-6.2-6' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Fix a teardown bug in the new nfs4_file hashtable * tag 'nfsd-6.2-6' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't destroy global nfs4_file table in per-net shutdown
2023-02-15	Merge branch 'Improvements for BPF_ST tracking by verifier '	Alexei Starovoitov	3	-49/+150
	Eduard Zingerman says: ==================== This patch-set is a part of preparation work for -mcpu=v4 option for BPF C compiler (discussed in [1]). Among other things -mcpu=v4 should enable generation of BPF_ST instruction by the compiler. - Patches #1,2 adjust verifier to track values of constants written to stack using BPF_ST. Currently these are tracked imprecisely, unlike the writes using BPF_STX, e.g.: fp[-8] = 42; currently verifier assumes that fp[-8]=mmmmmmmm after such instruction, where m stands for "misc", just a note that something is written at fp[-8]. r1 = 42; verifier tracks r1=42 after this instruction. fp[-8] = r1; verifier tracks fp[-8]=42 after this instruction. This patch makes both cases equivalent. - Patches #3,4 adjust verifier.c:check_stack_write_fixed_off() to preserve STACK_ZERO marks when BPF_ST writes zero. Currently these are replaced by STACK_MISC, unlike zero writes using BPF_STX, e.g.: ... stack range [X,Y] is marked as STACK_ZERO ... r0 = ... variable offset pointer to stack with range [X,Y] ... fp[r0] = 0; currently verifier marks range [X,Y] as STACK_MISC for such instructions. r1 = 0; fp[r0] = r1; verifier keeps STACK_ZERO marks for range [X,Y]. This patch makes both cases equivalent. Motivating example for patch #1 could be found at [3]. Previous version of the patch-set is here [2], the changes are: - Explicit initialization of fake register parent link is removed from verifier.c:check_stack_write_fixed_off() as parent links are now correctly handled by verifier.c:save_register_state(). - Original patch #1 is split in patches #1 & #3. - Missing test case added for patch #3 verifier.c:check_stack_write_fixed_off() adjustment. - Test cases are updated to use .prog_type = BPF_PROG_TYPE_SK_LOOKUP, which requires return value to be in the range [0,1] (original test cases assumed that such range is always required, which is not true). - Original patch #3 with changes allowing BPF_ST writes to context is withheld for now, w/o compiler support for BPF_ST it requires some creative testing. - Original patch #5 is removed from the patch-set. This patch contained adjustments to expected verifier error messages in some tests, necessary when C compiler generates BPF_ST instruction instead of BPF_STX (changes to expected instruction indices). These changes are not necessary yet. [1] https://lore.kernel.org/bpf/[email protected]/ [2] https://lore.kernel.org/bpf/[email protected]/ [3] https://lore.kernel.org/bpf/[email protected]/ ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	selftests/bpf: check if BPF_ST with variable offset preserves STACK_ZERO	Eduard Zingerman	1	-0/+30
	A test case to verify that variable offset BPF_ST instruction preserves STACK_ZERO marks when writes zeros, e.g. in the following situation: (u64)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now variable offset pointer to stack (u8)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for ; fp[-8] should be preserved. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	bpf: BPF_ST with variable offset should preserve STACK_ZERO marks	Eduard Zingerman	1	-1/+3
	BPF_STX instruction preserves STACK_ZERO marks for variable offset writes in situations like below: (u64)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now a variable offset pointer to stack r1 = 0 (u8)(r0) = r1 ; BPF_STX writing zero, STACK_ZERO mark for ; fp[-8] is preserved This commit updates verifier.c:check_stack_write_var_off() to process BPF_ST in a similar manner, e.g. the following example: (u64)(r10 - 8) = 0 ; STACK_ZERO marks for fp[-8] r0 = random(-7, -1) ; some random number in range of [-7, -1] r0 += r10 ; r0 is now variable offset pointer to stack (u8)(r0) = 0 ; BPF_ST writing zero, STACK_ZERO mark for ; fp[-8] is preserved Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM	Eduard Zingerman	1	-0/+37
	Check that verifier tracks the value of 'imm' spilled to stack by BPF_ST_MEM instruction. Cover the following cases: - write of non-zero constant to stack; - write of a zero constant to stack. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	bpf: track immediate values written to stack by BPF_ST instruction	Eduard Zingerman	2	-48/+80
	For aligned stack writes using BPF_ST instruction track stored values in a same way BPF_STX is handled, e.g. make sure that the following commands produce similar verifier knowledge: fp[-8] = 42; r1 = 42; fp[-8] = r1; This covers two cases: - non-null values written to stack are stored as spill of fake registers; - null values written to stack are stored as STACK_ZERO marks. Previously both cases above used STACK_MISC marks instead. Some verifier test cases relied on the old logic to obtain STACK_MISC marks for some stack values. These test cases are updated in the same commit to avoid failures during bisect. Signed-off-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-02-15	Merge tag 'trace-v6.2-rc7-2' of ↵	Linus Torvalds	1	-1/+1
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixlet from Steven Rostedt: "Make trace_define_field_ext() static. Just after the fix to TASK_COMM_LEN not converted to its value in trace_events was pulled, the kernel test robot reported that the helper function trace_define_field_ext() added to that change was only used in the file it was defined in but was not declared static. Make it a local function" * tag 'trace-v6.2-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Make trace_define_field_ext() static
2023-02-15	apparmor: Fix regression in compat permissions for getattr	John Johansen	1	-2/+1
	This fixes a regression in mediation of getattr when old policy built under an older ABI is loaded and mapped to internal permissions. The regression does not occur for all getattr permission requests, only appearing if state zero is the final state in the permission lookup. This is because despite the first state (index 0) being guaranteed to not have permissions in both newer and older permission formats, it may have to carry permissions that were not mediated as part of an older policy. These backward compat permissions are mapped here to avoid special casing the mediation code paths. Since the mapping code already takes into account backwards compat permission from older formats it can be applied to state 0 to fix the regression. Fixes: 408d53e923bd ("apparmor: compute file permissions on profile load") Reported-by: Philip Meulengracht <[email protected]> Signed-off-by: John Johansen <[email protected]>
2023-02-15	Merge branches 'pm-tools' and 'pm-docs'	Rafael J. Wysocki	4	-10/+10
	Merge power management utilities and documentation updates for 6.3-rc1: - Modify some power management utilities to use the canonical ftrace path (Ross Zwisler). - Correct spelling problems for Documentation/power/ as reported by codespell (Randy Dunlap). * pm-tools: PM: tools: use canonical ftrace path * pm-docs: Documentation: power: correct spelling
2023-02-15	Merge branches 'powercap', 'pm-domains', 'pm-em' and 'pm-opp'	Rafael J. Wysocki	9	-19/+33
	Merge updates of the powercap framework, generic PM domains, Energy Model and operating performance points for 6.3-rc1: - Fix possible name leak in powercap_register_zone() (Yang Yingliang). - Add Meteor Lake and Emerald Rapids support to the intel_rapl power capping driver (Zhang Rui). - Modify the idle_inject power capping facility to support 100% idle injection (Srinivas Pandruvada). - Fix large time windows handling in the intel_rapl power capping driver (Zhang Rui). - Fix memory leaks with using debugfs_lookup() in the generic PM domains and Energy Model code (Greg Kroah-Hartman). - Add missing 'cache-unified' property in example for kryo OPP bindings (Rob Herring). - Fix error checking in opp_migrate_dentry() (Qi Zheng). - Remove "select SRCU" (Paul E. McKenney). - Let qcom,opp-fuse-level be a 2-long array for qcom SoCs (Konrad Dybcio). * powercap: powercap: intel_rapl: Fix handling for large time window powercap: idle_inject: Support 100% idle injection powercap: intel_rapl: add support for Emerald Rapids powercap: intel_rapl: add support for Meteor Lake powercap: fix possible name leak in powercap_register_zone() * pm-domains: PM: domains: fix memory leak with using debugfs_lookup() * pm-em: PM: EM: fix memory leak with using debugfs_lookup() * pm-opp: OPP: fix error checking in opp_migrate_dentry() dt-bindings: opp: v2-qcom-level: Let qcom,opp-fuse-level be a 2-long array drivers/opp: Remove "select SRCU" dt-bindings: opp: opp-v2-kryo-cpu: Add missing 'cache-unified' property in example
2023-02-15	btrfs: make kobj_type structures constant	Thomas Weißschuh	1	-6/+6
	Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: remove the bdev argument to btrfs_rmap_block	Christoph Hellwig	3	-10/+4
	The only user in the zoned remap code is gone now, so remove the argument. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: don't rely on unchanging ->bi_bdev for zone append remaps	Christoph Hellwig	3	-26/+26
	btrfs_record_physical_zoned relies on a bio->bi_bdev samples in the bio_end_io handler to find the reverse map for remapping the zone append write, but stacked block device drivers can and usually do change bi_bdev when sending on the bio to a lower device. This can happen e.g. with the nvme-multipath driver when a NVMe SSD sets the shared namespace bit. But there is no real need for the bdev in btrfs_record_physical_zoned, as it is only passed to btrfs_rmap_block, which uses it to pick the mapping to report if there are multiple reverse mappings. As zone writes can only do simple non-mirror writes right now, and anything more complex will use the stripe tree there is no chance of the multiple mappings case actually happening. Instead open code the subset of btrfs_rmap_block in btrfs_record_physical_zoned, which also removes a memory allocation and remove the bdev field in the ordered extent. Fixes: d8e3fb106f39 ("btrfs: zoned: use ZONE_APPEND write for zoned mode") Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: never return true for reads in btrfs_use_zone_append	Christoph Hellwig	1	-0/+3
	Using Zone Append only makes sense for writes to the device, so check that in btrfs_use_zone_append. This avoids the possibility of artificially limited read size on zoned file systems. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: pass a btrfs_bio to btrfs_use_append	Christoph Hellwig	4	-6/+7
	struct btrfs_bio has all the information needed for btrfs_use_append, so pass that instead of a btrfs_inode and file_offset. Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: set bbio->file_offset in alloc_new_bio	Christoph Hellwig	1	-2/+1
	Instead of digging into the bio_vec in submit_one_bio, set file_offset at bio allocation time from the provided parameter. This also ensures that the file_offset is available all the time when building up the bio payload. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: use file_offset to limit bios size in calc_bio_boundaries	Christoph Hellwig	1	-2/+2
	btrfs_ordered_extent->disk_bytenr can be rewritten by the zoned I/O completion handler, and thus in general is not a good idea to limit I/O size. But the maximum bio size calculation can easily be done using the file_offset fields in the btrfs_ordered_extent and btrfs_bio structures, so switch to that instead. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: do unsigned integer division in the extent buffer binary search loop	Filipe Manana	2	-7/+12
	In the search loop of the binary search function, we are doing a division by 2 of the sum of the high and low slots. Because the slots are integers, the generated assembly code for it is the following on x86_64: 0x00000000000141f1 <+145>: mov %eax,%ebx 0x00000000000141f3 <+147>: shr $0x1f,%ebx 0x00000000000141f6 <+150>: add %eax,%ebx 0x00000000000141f8 <+152>: sar %ebx It's a few more instructions than a simple right shift, because signed integer division needs to round towards zero. However we know that slots can never be negative (btrfs_header_nritems() returns an u32), so we can instead use unsigned types for the low and high slots and therefore use unsigned integer division, which results in a single instruction on x86_64: 0x00000000000141f0 <+144>: shr %ebx So use unsigned types for the slots and therefore unsigned division. This is part of a small patchset comprised of the following two patches: btrfs: eliminate extra call when doing binary search on extent buffer btrfs: do unsigned integer division in the extent buffer binary search loop The following fs_mark test was run on a non-debug kernel (Debian's default kernel config) before and after applying the patchset: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi MOUNT_OPTIONS="-o ssd" MKFS_OPTIONS="-O no-holes -R free-space-tree" FILES=100000 THREADS=$(nproc --all) FILE_SIZE=0 umount $DEV &> /dev/null mkfs.btrfs -f $MKFS_OPTIONS $DEV mount $MOUNT_OPTIONS $DEV $MNT OPTS="-S 0 -L 6 -n $FILES -s $FILE_SIZE -t $THREADS -k" for ((i = 1; i <= $THREADS; i++)); do OPTS="$OPTS -d $MNT/d$i" done fs_mark $OPTS umount $MNT Results before applying patchset: FSUse% Count Size Files/sec App Overhead 2 1200000 0 174472.0 11549868 4 2400000 0 253503.0 11694618 4 3600000 0 257833.1 11611508 6 4800000 0 247089.5 11665983 6 6000000 0 211296.1 12121244 10 7200000 0 187330.6 12548565 Results after applying patchset: FSUse% Count Size Files/sec App Overhead 2 1200000 0 207556.0 11393252 4 2400000 0 266751.1 11347909 4 3600000 0 274397.5 11270058 6 4800000 0 259608.4 11442250 6 6000000 0 238895.8 11635921 8 7200000 0 211942.2 11873825 Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: eliminate extra call when doing binary search on extent buffer	Filipe Manana	2	-13/+18
	The function btrfs_bin_search() is just a wrapper around the function generic_bin_search(), which passes the same arguments plus a default low slot with a value of 0. This adds an unnecessary extra function call, since btrfs_bin_search() is not static. So improve on this by making btrfs_bin_search() an inline function that calls generic_bin_search(), renaming the later to btrfs_generic_bin_search() and exporting it. Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: handle endio in scrub_rbio	Christoph Hellwig	1	-11/+7
	The only caller of scrub_rbio calls rbio_orig_end_io right after it, move it into scrub_rbio to match the other work item helpers. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: handle endio in recover_rbio	Christoph Hellwig	1	-18/+9
	Both callers of recover_rbio call rbio_orig_end_io right after it, so move the call into the shared function. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: handle endio in rmw_rbio	Christoph Hellwig	1	-20/+10
	Both callers of rmv_rbio call rbio_orig_end_io right after it, so move the call into the shared function. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: submit the read bios from scrub_assemble_read_bios	Christoph Hellwig	1	-23/+13
	Instead of filling in a bio_list and submitting the bios in the only caller, do that in scrub_assemble_read_bios. This removes the need to pass the bio_list, and also makes it clear that the extra bio_list cleanup in the caller is entirely pointless. Rename the function to scrub_read_bios to make it clear that the bios are not only assembled. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: fold rmw_read_wait_recover into rmw_read_bios	Christoph Hellwig	1	-46/+23
	There is very little extra code in rmw_read_bios, and a large part of it is the superfluous extra cleanup of the bio list. Merge the two functions, and only clean up the bio list after it has been added to but before it has been emptied again by submit_read_wait_bio_list. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: fold recover_assemble_read_bios into recover_rbio	Christoph Hellwig	1	-40/+21
	There is very little extra code in recover_rbio, and a large part of it is the superfluous extra cleanup of the bio list. Merge the two functions, and only clean up the bio list after it has been added to but before it has been emptied again by submit_read_wait_bio_list. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: add a bio_list_put helper	Christoph Hellwig	1	-28/+16
	Add a helper to put all bios in a list. This does not need to be added to block layer as there are no other users of such code. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2023-02-15	btrfs: raid56: wait for I/O completion in submit_read_bios	Christoph Hellwig	1	-7/+6
	In addition to setting up the end_io handler and submitting the bios in submit_read_bios, also wait for them to be completed instead of waiting for the completion manually in all three callers. Rename submit_read_bios to submit_read_wait_bio_list to make it clear it waits for the bios as well. Reviewed-by: Qu Wenruo <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>