aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-07-08lib: fix spelling mistakes in header filesZhen Lei7-13/+13
Fix some spelling mistakes in comments found by "codespell": Hoever ==> However poiter ==> pointer representaion ==> representation uppon ==> upon independend ==> independent aquired ==> acquired mis-match ==> mismatch scrach ==> scratch struture ==> structure Analagous ==> Analogous interation ==> iteration And some were discovered manually by Joe Perches and Christoph Lameter: stroed ==> stored arch independent ==> an architecture independent A example structure for ==> Example structure for Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08lib: fix spelling mistakesZhen Lei19-23/+23
Fix some spelling mistakes in comments: permanentely ==> permanently wont ==> won't remaning ==> remaining succed ==> succeed shouldnt ==> shouldn't alpha-numeric ==> alphanumeric storeing ==> storing funtion ==> function documenation ==> documentation Determin ==> Determine intepreted ==> interpreted ammount ==> amount obious ==> obvious interupts ==> interrupts occured ==> occurred asssociated ==> associated taking into acount ==> taking into account squence ==> sequence stil ==> still contiguos ==> contiguous matchs ==> matches Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08lib/test: fix spelling mistakesZhen Lei5-7/+7
Fix some spelling mistakes in comments found by "codespell": thats ==> that's unitialized ==> uninitialized panicing ==> panicking sucess ==> success possitive ==> positive intepreted ==> interpreted Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Acked-by: Yonghong Song <[email protected]> [test_bfp.c] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08Fix UCOUNT_RLIMIT_SIGPENDING counter leakAlexey Gladkov1-4/+16
We must properly handle an errors when we increase the rlimit counter and the ucounts reference counter. We have to this with RCU protection to prevent possible use-after-free that could occur due to concurrent put_cred_rcu(). The following reproducer triggers the problem: $ cat testcase.sh case "${STEP:-0}" in 0) ulimit -Si 1 ulimit -Hi 1 STEP=1 unshare -rU "$0" killall sleep ;; 1) for i in 1 2 3 4 5; do unshare -rU sleep 5 & done ;; esac with the KASAN report being along the lines of BUG: KASAN: use-after-free in put_ucounts+0x17/0xa0 Write of size 4 at addr ffff8880045f031c by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.13.0+ #19 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-alt4 04/01/2014 Call Trace: <IRQ> put_ucounts+0x17/0xa0 put_cred_rcu+0xd5/0x190 rcu_core+0x3bf/0xcb0 __do_softirq+0xe3/0x341 irq_exit_rcu+0xbe/0xe0 sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 default_idle_call+0x53/0x130 do_idle+0x311/0x3c0 cpu_startup_entry+0x14/0x20 secondary_startup_64_no_verify+0xc2/0xcb Allocated by task 127: kasan_save_stack+0x1b/0x40 __kasan_kmalloc+0x7c/0x90 alloc_ucounts+0x169/0x2b0 set_cred_ucounts+0xbb/0x170 ksys_unshare+0x24c/0x4e0 __x64_sys_unshare+0x16/0x20 do_syscall_64+0x37/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 0: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xeb/0x120 kfree+0xaa/0x460 put_cred_rcu+0xd5/0x190 rcu_core+0x3bf/0xcb0 __do_softirq+0xe3/0x341 The buggy address belongs to the object at ffff8880045f0300 which belongs to the cache kmalloc-192 of size 192 The buggy address is located 28 bytes inside of 192-byte region [ffff8880045f0300, ffff8880045f03c0) The buggy address belongs to the page: page:000000008de0a388 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff8880045f0000 pfn:0x45f0 flags: 0x100000000000200(slab|node=0|zone=1) raw: 0100000000000200 ffffea00000f4640 0000000a0000000a ffff888001042a00 raw: ffff8880045f0000 000000008010000d 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880045f0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880045f0280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff8880045f0300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880045f0380: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8880045f0400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Disabling lock debugging due to kernel taint Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Cc: Eric W. Biederman <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Alexey Gladkov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-08Merge part 2 of branch 'sysfs-devel'Trond Myklebust11-18/+177
Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3Trond Myklebust1-1/+1
Currently we fail to return an error if the NFSv3 module failed to load when we're trying to connect to a pNFS data server. Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple timesTrond Myklebust1-26/+26
After we grab the lock in nfs4_pnfs_ds_connect(), there is no check for whether or not ds->ds_clp has already been initialised, so we can end up adding the same transports multiple times. Fixes: fc821d59209d ("pnfs/NFSv4.1: Add multipath capabilities to pNFS flexfiles servers over NFSv3") Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4/pnfs: Clean up layout get on openTrond Myklebust3-17/+18
Cache the layout in the arguments so we don't have to keep looking it up from the inode. Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4/pnfs: Fix layoutget behaviour after invalidationTrond Myklebust1-5/+5
If the layout gets invalidated, we should wait for any outstanding layoutget requests for that layout to complete, and we should resend them only after re-establishing the layout stateid. Fixes: d29b468da4f9 ("pNFS/NFSv4: Improve rejection of out-of-order layouts") Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4/pnfs: Fix the layout barrier updateTrond Myklebust1-15/+15
If we have multiple outstanding layoutget requests, the current code to update the layout barrier assumes that the outstanding layout stateids are updated in order. That's not necessarily the case. Instead of using the value of lo->plh_outstanding as a guesstimate for the window of values we need to accept, just wait to update the window until we're processing the last one. The intention here is just to ensure that we don't process 2^31 seqid updates without also updating the barrier. Fixes: 1bcf34fdac5f ("pNFS/NFSv4: Update the layout barrier when we schedule a layoutreturn") Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFS: Fix fscache read from NFS after cache errorDave Wysochanski2-7/+16
Earlier commits refactored some NFS read code and removed nfs_readpage_async(), but neglected to properly fixup nfs_readpage_from_fscache_complete(). The code path is only hit when something unusual occurs with the cachefiles backing filesystem, such as an IO error or while a cookie is being invalidated. Mark page with PG_checked if fscache IO completes in error, unlock the page, and let the VM decide to re-issue based on PG_uptodate. When the VM reissues the readpage, PG_checked allows us to skip over fscache and read from the server. Link: https://marc.info/?l=linux-nfs&m=162498209518739 Fixes: 1e83b173b266 ("NFS: Add nfs_pageio_complete_read() and remove nfs_readpage_async()") Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFS: Ensure nfs_readpage returns promptly when internal error occursDave Wysochanski1-3/+3
A previous refactoring of nfs_readpage() might end up calling wait_on_page_locked_killable() even if readpage_async_filler() failed with an internal error and pg_error was non-zero (for example, if nfs_create_request() failed). In the case of an internal error, skip over wait_on_page_locked_killable() as this is only needed when the read is sent and an error occurs during completion handling. Signed-off-by: Dave Wysochanski <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08Merge branch 'sysfs-devel'Trond Myklebust12-2/+650
Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: remove an offlined xprt using sysfsOlga Kornievskaia3-4/+47
Once a transport has been put offline, this transport can be also removed from the list of transports. Any tasks that have been stuck on this transport would find the next available active transport and be re-tried. This transport would be removed from the xprt_switch list and freed. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: provide showing transport's state info in the sysfs directoryOlga Kornievskaia1-0/+47
In preparation of being able to change the xprt's state, add a way to show currect state of the transport. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: display xprt's queuelen of assigned tasks via sysfsOlga Kornievskaia1-2/+4
Once a task grabs a trasnport it's reflected in the queuelen of the rpc_xprt structure. Add display of that value in the xprt's info file in sysfs. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: provide multipath info in the sysfs directoryOlga Kornievskaia1-0/+35
Allow to query xrpt_switch attributes. Currently showing the following fields of the rpc_xprt_switch structure: xps_nxprts, xps_nactive, xps_queuelen. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08NFSv4.1 identify and mark RPC tasks that can move between transportsOlga Kornievskaia4-8/+46
In preparation for when we can re-try a task on a different transport, identify and mark such RPC tasks as moveable. Only 4.1+ operarations can be re-tried on a different transport. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: provide transport info in the sysfs directoryOlga Kornievskaia1-0/+25
Allow to query transport's attributes. Currently showing following fields of the rpc_xprt structure: state, last_used, cong, cwnd, max_reqs, min_reqs, num_reqs, sizes of queues binding, sending, pending, backlog. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08SUNRPC: take a xprt offline using sysfsOlga Kornievskaia4-6/+68
Using sysfs's xprt_state attribute, mark a particular transport offline. It will not be picked during the round-robin selection. It's not allowed to take the main (1st created transport associated with the rpc_client) offline. Also bring a transport back online via sysfs by writing "online" and that would allow for this transport to be picked during the round- robin selection. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add dst_attr attributes to the sysfs xprt directoryOlga Kornievskaia4-3/+109
Allow to query and set the destination's address of a transport. Setting of the destination address is allowed only for TCP or RDMA based connections. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08SUNRPC for TCP display xprt's source port in sysfs xprt_infoOlga Kornievskaia1-4/+8
Using TCP connection's source port it is useful to match connections seen on the network traces to the xprts used by the linux nfs client. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08SUNRPC query transport's source portOlga Kornievskaia2-0/+8
Provide ability to query transport's source port. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08SUNRPC display xprt's main value in sysfs's xprt_infoOlga Kornievskaia1-3/+3
Display in sysfs in the information about the xprt if this is a main transport or not. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08SUNRPC mark the first transportOlga Kornievskaia2-0/+2
When an RPC client gets created it's first transport is special and should be marked a main transport. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add add sysfs directory per xprt under each xprt_switchOlga Kornievskaia4-0/+81
Add individual transport directories under each transport switch group. For instance, for each nconnect=X connections there will be a transport directory. Naming conventions also identifies transport type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add a symlink from rpc-client directory to the xprt_switchOlga Kornievskaia3-4/+26
An rpc client uses a transport switch and one ore more transports associated with that switch. Since transports are shared among rpc clients, create a symlink into the xprt_switch directory instead of duplicating entries under each rpc client. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add xprt_switch direcotry to sunrpc's sysfsOlga Kornievskaia4-7/+108
Add xprt_switch directory to the sysfs and create individual xprt_swith subdirectories for multipath transport group. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: keep track of the xprt_class in rpc_xprt structureOlga Kornievskaia3-0/+13
We need to keep track of the type for a given transport. Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add IDs to multipathOlga Kornievskaia3-0/+31
This is used to uniquely identify sunrpc multipath objects in /sys. Signed-off-by: Dan Aloni <[email protected]> Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: add xprt idOlga Kornievskaia3-0/+29
This adds a unique identifier for a sunrpc transport in sysfs, which is similarly managed to the unique IDs of clients. Signed-off-by: Dan Aloni <[email protected]> Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: Create per-rpc_clnt sysfs kobjectsOlga Kornievskaia4-0/+76
These will eventually have files placed under them for sysfs operations. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: Create a client/ subdirectory in the sunrpc sysfsOlga Kornievskaia1-0/+43
For network namespace separation. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08sunrpc: Create a sunrpc directory under /sys/kernel/Olga Kornievskaia4-1/+40
This is where we'll put per-rpc_client related files Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Olga Kornievskaia <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2021-07-08ftrace: Use list_move instead of list_del/list_addBaokun Li1-2/+1
Using list_move() instead of list_del() + list_add(). Link: https://lkml.kernel.org/r/[email protected] Reported-by: Hulk Robot <[email protected]> Signed-off-by: Baokun Li <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-07-08tracing/selftests: Add tests to test histogram sym and sym-offset modifiersSteven Rostedt (VMware)1-0/+18
Add a test to the tracing selftests that will catch if the .sym or .sym-offset modifiers break in the future. Link: https://lkml.kernel.org/r/[email protected] Acked-by: Tom Zanussi <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-07-08powerpc/preempt: Don't touch the idle task's preempt_count during hotplugValentin Schneider2-6/+0
Powerpc currently resets a CPU's idle task preempt_count to 0 before said task starts executing the secondary startup routine (and becomes an idle task proper). This conflicts with commit f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled"). which initializes all of the idle tasks' preempt_count to PREEMPT_DISABLED during smp_init(). Note that this was superfluous before said commit, as back then the hotplug machinery would invoke init_idle() via idle_thread_get(), which would have already reset the CPU's idle task's preempt_count to PREEMPT_ENABLED. Get rid of this preempt_count write. Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") Reported-by: Bharata B Rao <[email protected]> Signed-off-by: Valentin Schneider <[email protected]> Tested-by: Guenter Roeck <[email protected]> Tested-by: Bharata B Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-07-08s390/vdso: add minimal compat vdsoSven Schnelle16-25/+371
Add a small vdso for 31 bit compat application that provides trampolines for calls to sigreturn,rt_sigreturn,syscall_restart. This is requird for moving these syscalls away from the signal frame to the vdso. Note that this patch effectively disables CONFIG_COMPAT when using clang to compile the kernel. clang doesn't support 31 bit mode. We want to redirect sigreturn and restart_syscall to the vdso. However, the kernel cannot parse the ELF vdso file, so we need to generate header files which contain the offsets of the syscall instructions in the vdso page. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/vdso: rename VDSO64_LBASE to VDSO_LBASESven Schnelle2-2/+2
Will be used by both vdso32 and vdso64, so change the name. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/vdso64: add sigreturn,rt_sigreturn and restart_syscallSven Schnelle2-0/+20
Add minimalistic trampolines to vdso64 so we can return from signal without using the stack which requires pgm check handler hacks when NX is enabled. restart_syscall will be called from vdso to work around the architectural limitation that the syscall number might be encoded in the svc instruction, and therefore can not be changed. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/vdso: always enable vdsoSven Schnelle2-24/+8
With the upcoming move of the svc sigreturn instruction from the signal frame to vdso we need to have vdso always enabled. Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/ap: get rid of register asmHeiko Carstens1-63/+87
Using register asm statements has been proven to be very error prone, especially when using code instrumentation where gcc may add function calls, which clobbers register contents in an unexpected way. Therefore get rid of register asm statements in ap code. There are also potential bugs, depending on inline decisions of the compiler. E.g. for: static inline struct ap_queue_status ap_tapq(ap_qid_t qid, unsigned long *info) { register unsigned long reg0 asm ("0") = qid; register struct ap_queue_status reg1 asm ("1"); register unsigned long reg2 asm ("2"); asm volatile(".long 0xb2af0000" /* PQAP(TAPQ) */ : "=d" (reg1), "=d" (reg2) : "d" (reg0) : "cc"); if (info) *info = reg2; return reg1; } In case of KCOV the "if (info)" line could cause a generated function call, which could clobber the contents of both reg2, and reg1. Similar can happen in case of KASAN for the "*info = reg2" line. Even though compilers will likely inline the function and optimize things away, this is not guaranteed. To get rid of this bug class, simply get rid of register asm constructs. Note: The inline function ap_dqap() will be handled in a separate patch because this one requires an addressing of the odd register of a register pair (which is done with %N[xxx] in the assembler code) and that's currently not supported by clang. Signed-off-by: Heiko Carstens <[email protected]> Signed-off-by: Harald Freudenberger <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/irq: remove HAVE_IRQ_EXIT_ON_IRQ_STACKSven Schnelle1-1/+0
This is no longer true since we switched to generic entry. The code switches to the IRQ stack before calling do_IRQ, but switches back to the kernel stack before calling irq_exit(). Fixes: 56e62a737028 ("s390: convert to generic entry") Signed-off-by: Sven Schnelle <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/traps: do not test MONITOR CALL without CONFIG_BUGIlya Leoshkevich1-0/+2
tinyconfig fails to boot, because without CONFIG_BUG report_bug() always returns BUG_TRAP_TYPE_BUG, which causes mc 0,0 in test_monitor_call() to panic. Fix by skipping the test without CONFIG_BUG. Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08s390/ap: Rework ap_dqap to deal with messages greater than recv bufferHarald Freudenberger2-11/+65
Rework of the ap_dqap() inline function with the dqap inline assembler invocation and the caller code in ap_queue.c to be able to handle replies which exceed the receive buffer size. ap_dqap() now provides two additional parameters to handle together with the caller the case where a reply in the firmware queue entry exceeds the given message buffer size. It depends on the caller how to exactly handle this. The behavior implemented now by ap_sm_recv() in ap_queue.c is to simple purge this entry from the firmware queue and let the caller 'receive' a -EMSGSIZE for the request without delivering any reply data - not even a truncated reply message. However, the reworked ap_dqap() could now get invoked in a way that the message is received in multiple parts and the caller assembles the parts into one reply message. Signed-off-by: Harald Freudenberger <[email protected]> Suggested-by: Juergen Christ <[email protected]> Reviewed-by: Juergen Christ <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]>
2021-07-08ext4: inline jbd2_journal_[un]register_shrinker()Theodore Ts'o4-103/+64
The function jbd2_journal_unregister_shrinker() was getting called twice when the file system was getting unmounted. On Power and ARM platforms this was causing kernel crash when unmounting the file system, when a percpu_counter was destroyed twice. Fix this by removing jbd2_journal_[un]register_shrinker() functions, and inlining the shrinker setup and teardown into journal_init_common() and jbd2_journal_destroy(). This means that ext4 and ocfs2 now no longer need to know about registering and unregistering jbd2's shrinker. Also, while we're at it, rename the percpu counter from j_jh_shrink_count to j_checkpoint_jh_count, since this makes it clearer what this counter is intended to track. Link: https://lore.kernel.org/r/[email protected] Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Reported-by: Jon Hunter <[email protected]> Reported-by: Sachin Sant <[email protected]> Tested-by: Sachin Sant <[email protected]> Tested-by: Jon Hunter <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]>
2021-07-08ext4: fix flags validity checking for EXT4_IOC_CHECKPOINTTheodore Ts'o1-1/+1
Use the correct bitmask when checking for any not-yet-supported flags. Link: https://lore.kernel.org/r/[email protected] Fixes: 351a0a3fbc35 ("ext4: add ioctl EXT4_IOC_CHECKPOINT") Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Leah Rumancik <[email protected]>
2021-07-08ext4: fix possible UAF when remounting r/o a mmp-protected file systemTheodore Ts'o2-17/+20
After commit 618f003199c6 ("ext4: fix memory leak in ext4_fill_super"), after the file system is remounted read-only, there is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to point at freed memory, which the call to ext4_stop_mmpd() can trip over. Fix this by only allowing kmmpd() to exit when it is stopped via ext4_stop_mmpd(). Link: https://lore.kernel.org/r/[email protected] Reported-by: Ye Bin <[email protected]> Bug-Report-Link: <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jan Kara <[email protected]>
2021-07-08virtio-mem: prioritize unplug from ZONE_MOVABLE in Big Block ModeDavid Hildenbrand1-1/+26
Let's handle unplug in Big Block Mode similar to Sub Block Mode -- prioritize memory blocks onlined to ZONE_MOVABLE. We won't care further about big blocks with mixed zones, as it's rather a corner case that won't matter in practice. Signed-off-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2021-07-08virtio-mem: simplify high-level unplug handling in Big Block ModeDavid Hildenbrand1-72/+24
Let's simplify high-level big block selection when unplugging in Big Block Mode. Combine handling of offline and online blocks. We can get rid of virtio_mem_bbm_bb_is_offline() and simply use virtio_mem_bbm_offline_remove_and_unplug_bb(), as that already tolerates offline parts. We can race with concurrent onlining/offlining either way, so we don;t have to be super correct by failing if an offline big block we'd like to unplug just got (partially) onlined. Signed-off-by: David Hildenbrand <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>