path: root/fs/notify
Age | Commit message | Author | Files | Lines
2020-07-27 | fanotify: use FAN_EVENT_ON_CHILD as implicit flag on sb/mount/non-dir marks | Amir Goldstein | 1 | -3/+10
Up to now, fanotify allowed setting the FAN_EVENT_ON_CHILD flag on sb/mount marks and on non-directory inode masks, but the flag was ignored. Mask out the flag if the user provides it on sb/mount/non-dir marks, and define it as an implicit flag that the user cannot remove. This flag is going to be used internally to request events with parent and name info.
Link: https://lore.kernel.org/r/20200716084230.30611-8-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 | fanotify: prepare for implicit event flags in mark mask | Amir Goldstein | 1 | -16/+24
So far, all flags that can be set in an fanotify mark mask can be set explicitly by a call to fanotify_mark(2). Prepare for defining implicit event flags that the user cannot set with fanotify_mark(2), similar to how inotify/dnotify implicitly set the FS_EVENT_ON_CHILD flag. Implicit event flags cannot be removed by the user, and the mark is destroyed when only implicit event flags remain in its mask.
Link: https://lore.kernel.org/r/20200716084230.30611-7-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
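The implicit-flag rule described above can be modeled with plain bit masks. This is a minimal, hypothetical sketch rather than the kernel code: `struct sketch_mark`, `sketch_mark_add()` and `sketch_mark_remove()` are invented names, `umask` stands for the set of implicit flags, and the flag values are assumed to mirror the uapi `<linux/fanotify.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_OPEN           0x00000020u
#define FAN_EVENT_ON_CHILD 0x08000000u

/* Hypothetical, simplified mark: only the two masks matter here. */
struct sketch_mark {
	uint32_t mask;
	uint32_t ignored_mask;
};

/* Add the user-requested bits plus the implicit bits in umask. */
void sketch_mark_add(struct sketch_mark *m, uint32_t mask, uint32_t umask)
{
	m->mask |= mask | umask;
}

/*
 * Remove user-requested bits. Implicit bits are never removed, so asking
 * to clear them is a no-op. Returns true when only implicit bits remain,
 * i.e. when the caller should destroy the mark.
 */
bool sketch_mark_remove(struct sketch_mark *m, uint32_t mask, uint32_t umask)
{
	m->mask &= ~(mask & ~umask);
	return ((m->mask | m->ignored_mask) & ~umask) == 0;
}
```

Keeping the implicit bits inside `mask` lets event matching treat them like any other flag, while the destroy test only looks at what the user asked for.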
2020-07-27 | fanotify: mask out special event flags from ignored mask | Amir Goldstein | 1 | -0/+5
The special event flags (FAN_ONDIR, FAN_EVENT_ON_CHILD) never had any meaning in the ignored mask. Mask them out explicitly.
Link: https://lore.kernel.org/r/20200716084230.30611-6-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
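A minimal sketch of that masking, assuming the special flags are grouped as `FANOTIFY_EVENT_FLAGS` (the group name used by the "generalize the handling of extra event flags" commit in this log) and that the flag values mirror the uapi header. `sketch_sanitize_ignored_mask()` is a hypothetical helper, not the kernel function.

```c
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_OPEN           0x00000020u
#define FAN_EVENT_ON_CHILD 0x08000000u
#define FAN_ONDIR          0x40000000u

/* Flags that qualify an event rather than name an event type. */
#define FANOTIFY_EVENT_FLAGS (FAN_EVENT_ON_CHILD | FAN_ONDIR)

/* Hypothetical helper: drop the special flags from an ignored mask. */
uint32_t sketch_sanitize_ignored_mask(uint32_t ignored_mask)
{
	return ignored_mask & ~FANOTIFY_EVENT_FLAGS;
}
```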
2020-07-27 | fanotify: generalize test for FAN_REPORT_FID | Amir Goldstein | 2 | -10/+12
As preparation for new flags that report fids, define a bit set of flags for a group reporting fids, currently containing a single bit, FAN_REPORT_FID.
Link: https://lore.kernel.org/r/20200716084230.30611-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
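The idea of testing a bit set instead of one flag can be sketched as follows. `SKETCH_FID_BITS` and `sketch_group_reports_fid()` are hypothetical names, and the group flag values are assumed to mirror the uapi header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Group flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_NONBLOCK   0x00000002u
#define FAN_REPORT_FID 0x00000200u

/* Hypothetical bit set: every group flag that makes the group report fids. */
#define SKETCH_FID_BITS (FAN_REPORT_FID)

/* One test covers all fid-reporting flags, present and future. */
bool sketch_group_reports_fid(uint32_t group_flags)
{
	return (group_flags & SKETCH_FID_BITS) != 0;
}
```

A new fid-reporting flag would then only need to be OR-ed into the bit set, with no change to the callers.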
2020-07-27 | fanotify: distinguish between fid encode error and null fid | Amir Goldstein | 1 | -10/+4
In fanotify_encode_fh(), both a NULL inode and a failure to encode ended up with fh type FILEID_INVALID. Distinguish the NULL inode case by setting the fh type to FILEID_ROOT. This is only a semantic difference at this point. Remove a stale comment and an unneeded check from the fid event compare helpers.
Link: https://lore.kernel.org/r/20200716084230.30611-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 | fanotify: generalize merge logic of events on dir | Amir Goldstein | 1 | -11/+11
An event on a directory should never be merged with an event on a non-directory, regardless of the event struct type. This change has no visible effect, because with the current struct fanotify_path_event such events are never merged anyway: the event path of a directory differs from the event path of a non-directory.
Link: https://lore.kernel.org/r/20200716084230.30611-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
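A sketch of a type-independent merge test, under the assumption that a directory event carries FAN_ONDIR in its mask and that events are matched by an opaque object id (as with dirent events reported against the same parent directory). `struct sketch_event` and `sketch_should_merge()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_CREATE 0x00000100u
#define FAN_DELETE 0x00000200u
#define FAN_ONDIR  0x40000000u

enum sketch_event_type { SKETCH_EVENT_PATH, SKETCH_EVENT_FID };

struct sketch_event {
	enum sketch_event_type type;
	uint32_t mask;       /* event types, plus FAN_ONDIR for a directory */
	uintptr_t objectid;  /* opaque identity of the object */
};

bool sketch_should_merge(const struct sketch_event *old,
			 const struct sketch_event *incoming)
{
	if (old->type != incoming->type || old->objectid != incoming->objectid)
		return false;
	/* Checked for every struct type: never mix dir and non-dir events. */
	if ((old->mask & FAN_ONDIR) != (incoming->mask & FAN_ONDIR))
		return false;
	return true;
}
```

Without the FAN_ONDIR check, a merged mask such as FAN_CREATE | FAN_DELETE | FAN_ONDIR could not tell the user which of the two operations involved a directory.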
2020-07-27 | fanotify: generalize the handling of extra event flags | Amir Goldstein | 1 | -5/+10
In fanotify_group_event_mask() there is logic in place to make sure we do not handle an event with no type and just the FAN_ONDIR flag. Generalize this logic to any FANOTIFY_EVENT_FLAGS. There is only one more flag in this group at the moment: FAN_EVENT_ON_CHILD. We never report it to the user, but we do pass it to fanotify_alloc_event() when the group is reporting fid, as an indication that the event happened on a child. We will have use for this indication later on.
Link: https://lore.kernel.org/r/20200716084230.30611-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
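The generalized check can be sketched as a reduced, hypothetical version of the mask computation: after applying the marks' masks, an event is dropped when only flag bits survive. The flag values are assumed to mirror the uapi header, and `sketch_event_mask()` is not the kernel function.

```c
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_OPEN           0x00000020u
#define FAN_EVENT_ON_CHILD 0x08000000u
#define FAN_ONDIR          0x40000000u
#define FANOTIFY_EVENT_FLAGS (FAN_EVENT_ON_CHILD | FAN_ONDIR)

/*
 * Hypothetical, reduced version of the check: returns the bits to handle,
 * or 0 when only event flags (no event type) survive the masks.
 */
uint32_t sketch_event_mask(uint32_t event_mask, uint32_t marks_mask,
			   uint32_t marks_ignored_mask)
{
	uint32_t test_mask = event_mask & marks_mask & ~marks_ignored_mask;

	/* Flags qualify an event type; on their own they are not an event. */
	if (!(test_mask & ~FANOTIFY_EVENT_FLAGS))
		return 0;
	return test_mask;
}
```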
2020-07-27 | fanotify: remove event FAN_DIR_MODIFY | Amir Goldstein | 2 | -8/+3
It was never enabled in uapi, and its functionality is about to be superseded by the events FAN_CREATE, FAN_DELETE and FAN_MOVE with the group flag FAN_REPORT_NAME. Keep a placeholder variable name_event instead of removing the name recording code, since it will be used by the new events.
Link: https://lore.kernel.org/r/20200708111156.24659-17-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 | fsnotify: pass dir argument to handle_event() callback | Amir Goldstein | 6 | -39/+35
The 'inode' argument to handle_event(), sometimes referred to as 'to_tell', is somewhat obsolete. It is a remnant from the times when a group could only have an inode mark associated with an event. We now pass an iter_info array to the callback, with all marks associated with an event.
Most backends ignore this argument, with two exceptions:
1. dnotify uses it for a sanity check that the event is on a directory
2. fanotify uses it to report the fid of the directory on directory entry modification events
Remove the 'inode' argument and add a 'dir' argument. The callback function signature is deliberately changed, because the meaning of the argument has changed and the arguments have been documented.
The 'dir' argument is set when 'file_name' is specified, and it refers to the directory that the 'file_name' entry belongs to.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | fanotify: break up fanotify_alloc_event() | Amir Goldstein | 1 | -65/+89
Break up fanotify_alloc_event() into helpers by event struct type.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | fanotify: create overflow event type | Amir Goldstein | 3 | -27/+36
The special overflow event is allocated as struct fanotify_path_event, but with a null path. Use a special event type to identify the overflow event, so that the helper fanotify_has_event_path() always indicates a non-null path. Allocating the overflow event doesn't need any of the fancy stuff in fanotify_alloc_event(), so create a simplified helper for allocating it. There is also no need to store and report the pid with an overflow event.
Link: https://lore.kernel.org/r/20200708111156.24659-7-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | inotify: do not use objectid when comparing events | Amir Goldstein | 1 | -2/+2
inotify's event->wd is the object identifier. Compare that instead of the common fsnotify event objectid, so we can get rid of the objectid field later.
Link: https://lore.kernel.org/r/20200708111156.24659-6-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | fsnotify: return non const from fsnotify_data_inode() | Amir Goldstein | 1 | -1/+1
Return a non-const inode pointer from fsnotify_data_inode(). None of the fsnotify hooks pass a const inode pointer as data, and callers often need to cast to a non-const pointer.
Link: https://lore.kernel.org/r/20200708111156.24659-3-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | fsnotify: fold fsnotify() call into fsnotify_parent() | Amir Goldstein | 1 | -9/+18
All (two) callers of fsnotify_parent() also call fsnotify() to notify the child inode. Move the second fsnotify() call into fsnotify_parent(). This will allow more flexibility in deciding which of the two event flavors should be sent.
Using 'goto notify_child' in the inline helper seems a bit strange, but it mimics the code in __fsnotify_parent() for clarity, and the goto pattern will look less strange once the following patches are applied.
Link: https://lore.kernel.org/r/20200708111156.24659-2-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 | fsnotify: Rearrange fast path to minimise overhead when there is no watcher | Mel Gorman | 1 | -12/+15
The fsnotify paths are trivial to hit even when there are no watchers, and they are surprisingly expensive. For example, every successful vfs_write() hits fsnotify_modify(), which calls both fsnotify_parent() and fsnotify() unless FMODE_NONOTIFY is set, which is an internal flag invisible to userspace. As it stands, fsnotify_parent() is a guaranteed function call even if there are no watchers, and fsnotify() does a substantial amount of unnecessary work before it checks whether there are any watchers. A perf profile showed that applying mnt->mnt_fsnotify_mask in fsnotify() accounted for almost half of the total samples taken in that function during a test. This patch rearranges the fast paths to reduce the amount of work done when there are no watchers.
The test motivating this was "perf bench sched messaging --pipe". Even though the pipes are anonymous, fsnotify is still called a lot, and the overhead is noticeable although it is completely pointless. The overhead is likely negligible for real IO, so this is an extreme example.
This is a comparison of hackbench using processes and pipes on a 1-socket machine with 8 CPU threads without fanotify watchers.

                                5.7.0                  5.7.0
                              vanilla      fastfsnotify-v1r1
    Amean     1       0.4837 (   0.00%)      0.4630 *   4.27%*
    Amean     3       1.5447 (   0.00%)      1.4557 (   5.76%)
    Amean     5       2.6037 (   0.00%)      2.4363 (   6.43%)
    Amean     7       3.5987 (   0.00%)      3.4757 (   3.42%)
    Amean     12      5.8267 (   0.00%)      5.6983 (   2.20%)
    Amean     18      8.4400 (   0.00%)      8.1327 (   3.64%)
    Amean     24     11.0187 (   0.00%)     10.0290 *   8.98%*
    Amean     30     13.1013 (   0.00%)     12.8510 (   1.91%)
    Amean     32     13.9190 (   0.00%)     13.2410 (   4.87%)

                          5.7.0       5.7.0
                        vanilla fastfsnotify-v1r1
    Duration User        157.05      152.79
    Duration System     1279.98     1219.32
    Duration Elapsed     182.81      174.52

This shows that the latencies are improved by roughly 2-9%. The variability is not shown, but some of these results are within the noise, as this workload heavily overloads the machine. That said, system CPU usage is reduced by quite a bit, so it makes sense to avoid the overhead even if it is a bit tricky to detect at times. A perf profile of just 1 group of tasks showed that 5.14% of samples taken were in either fsnotify() or fsnotify_parent(). With the patch, 2.8% of samples were in fsnotify(), mostly function entry and the initial check for watchers. The check for watchers is complicated enough that inlining it may be controversial.
[Amir] Slightly simplify with mnt_or_sb_mask => marks_mask
Link: https://lore.kernel.org/r/20200708111156.24659-1-amir73il@gmail.com
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
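As a rough illustration of the idea, not the actual fsnotify() code, the cheapest tests run first and the function returns before doing anything else when nobody is watching. `struct sketch_obj` and `sketch_needs_slow_path()` are hypothetical, and the real code checks further conditions that this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical view of a watchable object (inode, mount or superblock):
 * its list of marks and the union of their event masks.
 */
struct sketch_obj {
	const void *marks;   /* NULL when nobody watches this object */
	unsigned int mask;   /* OR of all mark masks on this object */
};

/* mnt may be NULL when there is no mount context. */
bool sketch_needs_slow_path(const struct sketch_obj *inode,
			    const struct sketch_obj *mnt,
			    const struct sketch_obj *sb, unsigned int event)
{
	unsigned int marks_mask;

	/* Cheapest check first: are there any marks at all? */
	if (!inode->marks && !sb->marks && (!mnt || !mnt->marks))
		return false;

	/* Then: does any of the marks care about this event type? */
	marks_mask = inode->mask | sb->mask;
	if (mnt)
		marks_mask |= mnt->mask;
	return (event & marks_mask) != 0;
}
```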
2020-07-15 | fanotify: Avoid softlockups when reading many events | Jan Kara | 1 | -0/+5
When the user provides a large buffer for events and many events are available, we can try to copy them all to userspace without scheduling, which can softlockup the kernel (further exacerbated by contention on notification_lock). Add a scheduling point after copying each event.
Note that the real underlying problem is usually the cost of fanotify event merging and the resulting contention on notification_lock, but this is a cheap way to somewhat reduce the problem until we can properly address that.
Reported-by: Francesco Ruggeri <fruggeri@arista.com>
Link: https://lore.kernel.org/lkml/20200714025417.A25EB95C0339@us180.sjc.aristanetworks.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-06-14 | treewide: replace '---help---' in Kconfig files with 'help' | Masahiro Yamada | 2 | -3/+3
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation.
There are a variety of indentation styles found:
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the following command:
    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-06-04 | Merge tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs | Linus Torvalds | 7 | -12/+19
Pull fsnotify updates from Jan Kara:
 "Several smaller fixes and cleanups for fsnotify subsystem"
* tag 'fsnotify_for_v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: fix ignore mask logic for events on child and on dir
  fanotify: don't write with size under sizeof(response)
  fsnotify: Remove proc_fs.h include
  fanotify: remove reference to fill_event_metadata()
  fsnotify: add mutex destroy
  fanotify: prefix should_merge()
  fanotify: Replace zero-length array with flexible-array
  inotify: Fix error return code assignment flow.
  fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait()
2020-06-01 | Merge tag 'docs-5.8' of git://git.lwn.net/linux | Linus Torvalds | 1 | -1/+1
Pull documentation updates from Jonathan Corbet:
 "A fair amount of stuff this time around, dominated by yet another massive set from Mauro toward the completion of the RST conversion. I *really* hope we are getting close to the end of this. Meanwhile, those patches reach pretty far afield to update document references around the tree; there should be no actual code changes there. There will be, alas, more of the usual trivial merge conflicts. Beyond that we have more translations, improvements to the sphinx scripting, a number of additions to the sysctl documentation, and lots of fixes"
* tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
  Documentation: fixes to the maintainer-entry-profile template
  zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
  tracing: Fix events.rst section numbering
  docs: acpi: fix old http link and improve document format
  docs: filesystems: add info about efivars content
  Documentation: LSM: Correct the basic LSM description
  mailmap: change email for Ricardo Ribalda
  docs: sysctl/kernel: document unaligned controls
  Documentation: admin-guide: update bug-hunting.rst
  docs: sysctl/kernel: document ngroups_max
  nvdimm: fixes to maintainter-entry-profile
  Documentation/features: Correct RISC-V kprobes support entry
  Documentation/features: Refresh the arch support status files
  Revert "docs: sysctl/kernel: document ngroups_max"
  docs: move locking-specific documents to locking/
  docs: move digsig docs to the security book
  docs: move the kref doc into the core-api book
  docs: add IRQ documentation at the core-api book
  docs: debugging-via-ohci1394.txt: add it to the core-api book
  docs: fix references for ipmi.rst file
  ...
2020-05-27 | fanotify: turn off support for FAN_DIR_MODIFY | Amir Goldstein | 1 | -1/+1
FAN_DIR_MODIFY has been enabled by commit 44d705b0370b ("fanotify: report name info for FAN_DIR_MODIFY event") in 5.7-rc1. Now we are planning further extensions to the fanotify API, and while working on them we realized that FAN_DIR_MODIFY may need to behave slightly differently to be more consistent with the extensions we plan. So until we finalize these extensions, let's not tie our hands by exposing FAN_DIR_MODIFY to userland.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-25 | fanotify: fix ignore mask logic for events on child and on dir | Amir Goldstein | 1 | -1/+4
The comments in fanotify_group_event_mask() say:
  "If the event is on dir/child and this mark doesn't care about events on dir/child, don't send it!"
Specifically, mount and filesystem marks do not care about events on child, but they can still specify an ignore mask for those events. For example, a group that has:
- A mount mark with mask 0 and ignore_mask FAN_OPEN
- An inode mark on a directory with mask FAN_OPEN | FAN_OPEN_EXEC with flag FAN_EVENT_ON_CHILD
A child file open for exec would be reported to the group with the FAN_OPEN event, despite the fact that FAN_OPEN is in the ignore mask of the mount mark, because the mark iteration loop skips over non-inode marks for events on child when calculating the ignore mask.
Move the ignore mask calculation to the top of the iteration loop block, before excluding marks for events on dir/child.
Link: https://lore.kernel.org/r/20200524072441.18258-1-amir73il@gmail.com
Reported-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/linux-fsdevel/20200521162443.GA26052@quack2.suse.cz/
Fixes: 55bf882c7f13 "fanotify: fix merging marks masks with FAN_ONDIR"
Fixes: b469e7e47c8a "fanotify: fix handling of events on child..."
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
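The fix can be modeled with a simplified loop over the group's marks. In this hypothetical sketch (`struct sketch_mark`, `sketch_group_event_mask()` and the flag values, which are assumed to mirror the uapi header), every mark's ignored mask is accumulated before the "does this mark care about events on a child" filter, which reproduces the example from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_OPEN           0x00000020u
#define FAN_OPEN_EXEC      0x00001000u
#define FAN_EVENT_ON_CHILD 0x08000000u

enum sketch_mark_type { SKETCH_MARK_INODE, SKETCH_MARK_MOUNT, SKETCH_MARK_SB };

/* Hypothetical mark: its type plus the two masks. */
struct sketch_mark {
	enum sketch_mark_type type;
	uint32_t mask;
	uint32_t ignored_mask;
};

/* Returns the event bits that would be reported to the group. */
uint32_t sketch_group_event_mask(const struct sketch_mark *marks, int nr_marks,
				 uint32_t event_mask, bool on_child)
{
	uint32_t marks_mask = 0, marks_ignored_mask = 0;

	for (int i = 0; i < nr_marks; i++) {
		const struct sketch_mark *m = &marks[i];

		/* Fixed order: every mark's ignored mask applies, even on a child. */
		marks_ignored_mask |= m->ignored_mask;

		/* Only inode marks with FAN_EVENT_ON_CHILD want events on a child. */
		if (on_child && (m->type != SKETCH_MARK_INODE ||
				 !(m->mask & FAN_EVENT_ON_CHILD)))
			continue;
		marks_mask |= m->mask;
	}
	return event_mask & marks_mask & ~marks_ignored_mask;
}
```

If the ignored-mask line were moved below the `continue`, the same call on the example marks would return FAN_OPEN | FAN_OPEN_EXEC, which is the bug described above.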
2020-05-13 | fanotify: don't write with size under sizeof(response) | Fabian Frederick | 1 | -2/+4
fanotify_write() only clamped the copy_from_user() size to sizeof(response) for larger values. As suggested by Amir Goldstein, this patch rejects all smaller values and sets the size to sizeof(response) unconditionally.
Link: https://lore.kernel.org/r/20200512181921.405973-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
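A minimal sketch of the size handling, assuming that writes shorter than one response record are rejected with -EINVAL. `struct sketch_response` is assumed to match the two-field fanotify response record (fd, response), and `sketch_write()` is hypothetical: it copies from a plain buffer instead of using copy_from_user().

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Assumed layout of the fixed-size permission response record. */
struct sketch_response {
	int32_t fd;
	uint32_t response;
};

ssize_t sketch_write(struct sketch_response *out, const void *buf, size_t count)
{
	/* Too short to hold a full response: reject instead of a partial copy. */
	if (count < sizeof(*out))
		return -EINVAL;
	/* Anything longer is truncated to exactly one response. */
	count = sizeof(*out);
	memcpy(out, buf, count);
	return (ssize_t)count;
}
```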
2020-05-13 | fsnotify: Remove proc_fs.h include | Fabian Frederick | 1 | -1/+0
proc_fs.h is already included by fdinfo.h.
Link: https://lore.kernel.org/r/20200512181906.405927-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-13 | fanotify: remove reference to fill_event_metadata() | Fabian Frederick | 1 | -1/+1
fill_event_metadata() was removed in commit bb2f7b4542c7 ("fanotify: open code fill_event_metadata()").
Link: https://lore.kernel.org/r/20200512181836.405879-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-13 | fsnotify: add mutex destroy | Fabian Frederick | 1 | -0/+1
Call mutex_destroy() before freeing the notification group. This only adds some additional debug checks when mutex debugging is enabled, but it may still be useful.
Link: https://lore.kernel.org/r/20200512181803.405832-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-13 | fanotify: prefix should_merge() | Fabian Frederick | 1 | -2/+2
Prefix the function with fanotify_, like the others.
Link: https://lore.kernel.org/r/20200512181715.405728-1-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-05-08 | fanotify: Replace zero-length array with flexible-array | Gustavo A. R. Silva | 1 | -1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99:
    struct foo {
            int stuff;
            struct boo array[];
    };
By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kinds of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on.
Also, notice that dynamic memory allocations won't be affected by this change:
  "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1]
sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues.
This issue was found with the help of Coccinelle.
[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
Link: https://lore.kernel.org/r/20200507185230.GA14229@embeddedor
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
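The allocation pattern that goes with a flexible array member can be illustrated with a small, hypothetical struct (not the fanotify one). `sizeof` of the struct covers only the fixed part, so the allocation adds room for the array elements explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record: a fixed header followed by a variable-length name. */
struct sketch_name {
	size_t len;
	char name[];    /* flexible array member: must be the last member */
};

/* Allocate the header plus len + 1 bytes for the name and its terminator. */
struct sketch_name *sketch_name_alloc(const char *s)
{
	size_t len = strlen(s);
	struct sketch_name *n = malloc(sizeof(*n) + len + 1);

	if (!n)
		return NULL;
	n->len = len;
	memcpy(n->name, s, len + 1);
	return n;
}
```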
2020-04-27 | inotify: Fix error return code assignment flow. | youngjun | 1 | -3/+1
If the error code is already initialized to -EINVAL, there is no need to assign -EINVAL again.
Link: https://lore.kernel.org/r/20200426143316.29877-1-her0gyugyu@gmail.com
Signed-off-by: youngjun <her0gyugyu@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-04-20 | docs: filesystems: fix renamed references | Mauro Carvalho Chehab | 1 | -1/+1
Some filesystem references got broken by a previous patch series I submitted. Address those.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> # fs/affs/Kconfig
Link: https://lore.kernel.org/r/57318c53008dbda7f6f4a5a9e5787f4d37e8565a.1586881715.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2020-04-15 | fsnotify: Add missing annotation for fsnotify_finish_user_wait() and for fsnotify_prepare_user_wait() | Jules Irenge | 1 | -1/+5
Sparse reports warnings at fsnotify_prepare_user_wait() and at fsnotify_finish_user_wait():
    warning: context imbalance in fsnotify_finish_user_wait() - wrong count at exit
    warning: context imbalance in fsnotify_prepare_user_wait() - unexpected unlock
The root cause is the missing annotations at fsnotify_finish_user_wait() and at fsnotify_prepare_user_wait(). fsnotify_prepare_user_wait() also gets an extra __release() annotation, which only tells Sparse, not GCC, to silence the warning.
Add the missing __acquires(&fsnotify_mark_srcu) annotation.
Add the missing __releases(&fsnotify_mark_srcu) annotation.
Add the __release(&fsnotify_mark_srcu) annotation.
Link: https://lore.kernel.org/r/20200413214240.15245-1-jbi.octave@gmail.com
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
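A compact sketch of how such annotations are typically wired up: they expand to Sparse context attributes only when `__CHECKER__` is defined (Sparse defines it) and to nothing for GCC. The macro bodies follow the kernel's convention as an assumption; the counter-based "lock" and both functions are hypothetical stand-ins for fsnotify_mark_srcu and the real wait helpers.

```c
/* Assumed to mirror the kernel's Sparse annotation macros. */
#ifdef __CHECKER__
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
# define __acquire(x)  __context__(x, 1)
# define __release(x)  __context__(x, -1)
#else
# define __acquires(x)
# define __releases(x)
# define __acquire(x)  (void)0
# define __release(x)  (void)0
#endif

/* Hypothetical lock: a depth counter that Sparse cannot track by itself. */
int sketch_lock_depth;

/* Entered with the lock held, returns with it dropped. */
void sketch_prepare_wait(void) __releases(&sketch_lock_depth)
{
	sketch_lock_depth--;
	__release(&sketch_lock_depth);  /* tell Sparse the context was dropped */
}

/* Entered without the lock, returns with it held. */
void sketch_finish_wait(void) __acquires(&sketch_lock_depth)
{
	sketch_lock_depth++;
	__acquire(&sketch_lock_depth);  /* tell Sparse the context was taken */
}
```

For a normal compiler build all of the annotations disappear, so they cost nothing at runtime.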
2020-03-30 | fanotify: Fix the checks in fanotify_fsid_equal | Nathan Chancellor | 1 | -1/+1
Clang warns:
    fs/notify/fanotify/fanotify.c:28:23: warning: self-comparison always evaluates to true [-Wtautological-compare]
            return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
                                 ^
    fs/notify/fanotify/fanotify.c:28:57: warning: self-comparison always evaluates to true [-Wtautological-compare]
            return fsid1->val[0] == fsid1->val[0] && fsid2->val[1] == fsid2->val[1];
                                                                   ^
    2 warnings generated.
The intention was clearly to compare val[0] and val[1] in the two different fsid structs. Fix it; otherwise, this function always returns true.
Fixes: afc894c784c8 ("fanotify: Store fanotify handles differently")
Link: https://github.com/ClangBuiltLinux/linux/issues/952
Link: https://lore.kernel.org/r/20200327171030.30625-1-natechancellor@gmail.com
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Jan Kara <jack@suse.cz>
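The corrected comparison matches each word of the first fsid against the same word of the second. It is shown here on a hypothetical `sketch_fsid_t` that stands in for the kernel's two-word fsid type.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's fsid type: two words. */
typedef struct {
	int val[2];
} sketch_fsid_t;

/* Each word of fsid1 is compared with the same word of fsid2. */
bool sketch_fsid_equal(const sketch_fsid_t *fsid1, const sketch_fsid_t *fsid2)
{
	return fsid1->val[0] == fsid2->val[0] && fsid1->val[1] == fsid2->val[1];
}
```

With the buggy version, every pair of fsids compared equal, so events from different filesystems could be treated as identical.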
2020-03-25 | fanotify: report name info for FAN_DIR_MODIFY event | Amir Goldstein | 2 | -28/+91
Report event FAN_DIR_MODIFY with name in a variable-length record, similar to how fids are reported. With name info reporting implemented, setting FAN_DIR_MODIFY in the mark mask is now allowed.
When events are reported with name, the reported fid identifies the directory and the name follows the fid. The info record type for this event info is FAN_EVENT_INFO_TYPE_DFID_NAME.
For now, all reported events have at most one info record, which is either FAN_EVENT_INFO_TYPE_FID or FAN_EVENT_INFO_TYPE_DFID_NAME (for FAN_DIR_MODIFY). Later on, events "on child" will report both records.
There are several ways that an application can use this information:
1. When watching a single directory, the name is always relative to the watched directory, so the application needs to fstatat(2) the name relative to the watched directory.
2. When watching a set of directories, the application could keep a map of dirfd for all watched directories and hash the map by fid obtained with name_to_handle_at(2). When getting a name event, the fid in the event info could be used to look up the base dirfd in the map and then call fstatat(2) with that dirfd.
3. When watching a filesystem (FAN_MARK_FILESYSTEM) or a large set of directories, the application could use open_by_handle_at(2) with the fid in the event info to obtain a dirfd for the directory where the event happened and call fstatat(2) with this dirfd.
The last option scales better for a large number of watched directories. The first two options may also become available in the future for unprivileged fanotify watchers, because open_by_handle_at(2) requires the CAP_DAC_READ_SEARCH capability.
Link: https://lore.kernel.org/r/20200319151022.31456-15-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
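Option 1 above might look like this in userspace. This hedged sketch covers only the lookup step: `sketch_query_entry()` is hypothetical, `watched_dirfd` is assumed to be an open descriptor for the watched directory, and reading the event records themselves is not shown. Because the entry may already be gone by the time the event is read, ENOENT is treated as a normal outcome.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Look up the current state of an entry named in a directory event,
 * relative to the watched directory. Returns 1 and fills *st if the entry
 * exists, 0 if it no longer exists, and -1 on any other error.
 */
int sketch_query_entry(int watched_dirfd, const char *name, struct stat *st)
{
	/* Do not follow symlinks: report on the entry itself. */
	if (fstatat(watched_dirfd, name, st, AT_SYMLINK_NOFOLLOW) == 0)
		return 1;
	return errno == ENOENT ? 0 : -1;
}
```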
2020-03-25 | fanotify: record name info for FAN_DIR_MODIFY event | Amir Goldstein | 3 | -12/+108
For the FAN_DIR_MODIFY event, allocate a variable-size event struct to store the dir entry name alongside the directory file handle.
At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in the mark mask will return -EINVAL.
Link: https://lore.kernel.org/r/20200319151022.31456-14-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 | fanotify: Drop fanotify_event_has_fid() | Jan Kara | 3 | -8/+3
When some events have a directory id and some have an object id, fanotify_event_has_fid() becomes mostly useless and confusing, because we usually need to know which type of file handle the event has. So just drop the function and use fanotify_event_object_fh() instead.
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 | fanotify: prepare to report both parent and child fid's | Amir Goldstein | 1 | -8/+15
For some events, we are going to report both child and parent fids, so pass fsid and file handle as arguments to copy_fid_to_user(), which is going to be called with both parent and child file handles.
Link: https://lore.kernel.org/r/20200319151022.31456-13-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 | fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name | Amir Goldstein | 2 | -4/+5
Dirent events are going to be supported in two flavors:
1. Directory fid info + mask that includes the specific event types (e.g. FAN_CREATE) and an optional FAN_ONDIR flag.
2. Directory fid info + name + mask that includes only FAN_DIR_MODIFY.
To request the second event flavor, the user needs to set the event type FAN_DIR_MODIFY in the mark mask.
The first flavor has been supported since kernel v5.1 for groups initialized with the flag FAN_REPORT_FID. It is intended to be used for watching directories in "batch mode": the watcher is notified when a directory is changed and re-scans the directory content in response. This event flavor is stored more compactly in the event queue, so it is optimal for workloads with frequent directory changes.
The second event flavor is intended to be used for watching large directories, where the cost of re-scanning the directory on every change is considered too high. The watcher getting the event with the directory fid and entry name is expected to call fstatat(2) to query the content of the entry after the change.
Legacy inotify events are reported with name and event mask (e.g. "foo", FAN_CREATE | FAN_ONDIR). That can lead users to the conclusion that there is *currently* an entry "foo" that is a sub-directory, when in fact "foo" may be negative or non-dir by the time the user gets the event.
To make it clear that the current state of the named entry is unknown, when reporting an event with name info, fanotify obfuscates the specific event types (e.g. create, delete, rename) and uses a common event type, FAN_DIR_MODIFY, to describe the change. This should make it harder for users to make wrong assumptions and write buggy filesystem monitors.
At this point, name info reporting is not yet implemented, so trying to set FAN_DIR_MODIFY in the mark mask will return -EINVAL.
Link: https://lore.kernel.org/r/20200319151022.31456-12-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 | fanotify: divorce fanotify_path_event and fanotify_fid_event | Jan Kara | 3 | -94/+180
Break up the union and make both structs inherit from the abstract fanotify_event.
fanotify_path_event, fanotify_fid_event and fanotify_perm_event inherit from fanotify_event. The type field in the abstract fanotify_event determines the concrete event type.
fanotify_path_event, fanotify_fid_event and fanotify_perm_event are allocated from separate memcache pools.
Rename the fanotify_perm_event casting macro to FANOTIFY_PERM(), so that FANOTIFY_PE() and FANOTIFY_FE() can be used as casting macros to fanotify_path_event and fanotify_fid_event.
[JK: Cleanup FANOTIFY_PE() and FANOTIFY_FE() to be proper inline functions and remove requirement that fanotify_event is the first in event structures]
Link: https://lore.kernel.org/r/20200319151022.31456-11-amir73il@gmail.com
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
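One common way to write such casting helpers without requiring the abstract part to be the first member is `container_of()`. The sketch below defines its own `container_of()` and uses hypothetical `sketch_*` types and helpers in place of the real fanotify structs and FANOTIFY_PE()/FANOTIFY_FE().

```c
#include <stddef.h>

/* Generic container_of(), defined here rather than imported. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* The type field of the abstract event selects the concrete struct. */
enum sketch_event_type { SKETCH_EVENT_FID, SKETCH_EVENT_PATH };

struct sketch_event {
	enum sketch_event_type type;
	unsigned int mask;
};

/* Hypothetical concrete events; the abstract part need not come first. */
struct sketch_fid_event {
	unsigned char fh[16];
	struct sketch_event fae;
};

struct sketch_path_event {
	struct sketch_event fae;
	const char *path;   /* stands in for struct path */
};

/* Casting helpers as functions rather than macros. */
struct sketch_fid_event *sketch_fe(struct sketch_event *event)
{
	return container_of(event, struct sketch_fid_event, fae);
}

struct sketch_path_event *sketch_pe(struct sketch_event *event)
{
	return container_of(event, struct sketch_path_event, fae);
}

/* Only path events carry a path; check the type before casting. */
const char *sketch_event_path(struct sketch_event *event)
{
	return event->type == SKETCH_EVENT_PATH ? sketch_pe(event)->path : NULL;
}
```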
2020-03-25 | fanotify: Store fanotify handles differently | Jan Kara | 3 | -106/+145
Currently, struct fanotify_fid groups fsid and file handle and is unioned together with struct path to save space. Also, fh_type and fh_len live directly in struct fanotify_event to avoid padding overhead. In the following patches, we will be adding more event types, and this packing makes the code difficult to follow. So unpack everything and create struct fanotify_fh, which groups the members logically related to the file handle, to make the code easier to follow. In the following patch we will pack things again differently to make events smaller.
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-25 | fanotify: Simplify create_fd() | Jan Kara | 1 | -15/+11
create_fd() is never used with an invalid path. Also, the only thing it needs to know from fanotify_event is the path. Simplify the function to take the path directly and assume it is correct.
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-24 | fanotify: fix merging marks masks with FAN_ONDIR | Amir Goldstein | 1 | -4/+7
Change the logic of FAN_ONDIR in two ways that are similar to the logic of FAN_EVENT_ON_CHILD, which was fixed in commit 54a307ba8d3c ("fanotify: fix logic of events on child"):
1. The flag is meaningless in the ignore mask
2. The flag refers only to events in the mask of the mark where it is set
This is what the fanotify_mark.2 man page says about FAN_ONDIR: "Without this flag, only events for files are created." It doesn't say anything about setting this flag in the ignore mask to stop getting events on directories, nor can I think of any setup where this capability would be useful.
Currently, when marks masks are merged, the FAN_ONDIR flag set in one mark affects the events that are set in another mark's mask, and this behavior causes unexpected results. For example, a user adds a mark on a directory with mask FAN_ATTRIB | FAN_ONDIR and a mount mark with mask FAN_OPEN (without FAN_ONDIR). An opendir() of that directory (which is inside that mount) generates a FAN_OPEN event even though neither of the marks requested open events on directories.
Link: https://lore.kernel.org/r/20200319151022.31456-10-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
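The per-mark rule can be sketched as a merge loop in which a mark without FAN_ONDIR contributes nothing to an event on a directory. `sketch_event_types()` is a hypothetical helper that returns only event type bits, and the flag values are assumed to mirror the uapi header. The test reproduces the opendir() example above.

```c
#include <stdint.h>

/* Flag values assumed to match the uapi <linux/fanotify.h>. */
#define FAN_ATTRIB 0x00000004u
#define FAN_OPEN   0x00000020u
#define FAN_ONDIR  0x40000000u

/*
 * Hypothetical per-mark merge: FAN_ONDIR only applies to the mark it is
 * set on. Returns the event types to report.
 */
uint32_t sketch_event_types(const uint32_t *mark_masks, int nr_marks,
			    uint32_t event_mask)
{
	uint32_t marks_mask = 0;

	for (int i = 0; i < nr_marks; i++) {
		/* Event on a directory, but this mark did not ask for dirs. */
		if ((event_mask & FAN_ONDIR) && !(mark_masks[i] & FAN_ONDIR))
			continue;
		marks_mask |= mark_masks[i];
	}
	return event_mask & marks_mask & ~FAN_ONDIR;
}
```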
2020-03-24 | fanotify: merge duplicate events on parent and child | Amir Goldstein | 1 | -1/+6
With inotify, when a watch is set on a directory and on its child, an event on the child is reported twice, once with the wd of the parent watch and once with the wd of the child watch without the filename.
With fanotify, when a watch is set on a directory and on its child, an event on the child is reported twice, but both reports carry exactly the same information: either an open file descriptor of the child or an encoded fid of the child.
The two identical events are not merged because the object id used for merging events in the queue is the child inode in one event and the parent inode in the other.
For events with path or dentry data, use the victim inode instead of the watched inode as the object id for event merging, so that the event reported on the parent will be merged with the event reported on the child.
Link: https://lore.kernel.org/r/20200319151022.31456-9-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
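A hypothetical sketch of the merge key change: both the copy of a child event delivered through the parent's watch and the copy delivered through the child's own watch are keyed by the child (victim) inode, so the second folds into the first. The id is only compared, never dereferenced. `struct sketch_event`, `sketch_make_event()` and `sketch_try_merge()` are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inode; only its address matters here. */
struct sketch_inode {
	unsigned long i_ino;
};

/* Hypothetical queued event: objectid is compared, never dereferenced. */
struct sketch_event {
	uint32_t mask;
	uintptr_t objectid;
};

/* Key the event by the inode the event happened on, not the watched one. */
struct sketch_event sketch_make_event(uint32_t mask,
				      const struct sketch_inode *victim)
{
	struct sketch_event ev = { mask, (uintptr_t)victim };
	return ev;
}

/* Fold incoming into old when both describe the same object. */
bool sketch_try_merge(struct sketch_event *old,
		      const struct sketch_event *incoming)
{
	if (old->objectid != incoming->objectid)
		return false;
	old->mask |= incoming->mask;
	return true;
}
```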
2020-03-24  fsnotify: replace inode pointer with an object id  Amir Goldstein  3  -5/+5
The event inode field is used only for comparison in queue merges and cannot be dereferenced after handle_event(), because it does not hold a refcount on the inode. Replace it with an abstract id to do the same thing.

Link: https://lore.kernel.org/r/20200319151022.31456-8-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-23  fsnotify: simplify arguments passing to fsnotify_parent()  Amir Goldstein  1  -11/+4
Instead of passing both dentry and path and having to figure out which one to use, pass data/data_type to simplify the code.

Link: https://lore.kernel.org/r/20200319151022.31456-6-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-03-23  fsnotify: use helpers to access data by data_type  Amir Goldstein  3  -17/+14
Create helpers to access path and inode from different data types.

Link: https://lore.kernel.org/r/20200319151022.31456-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
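The data/data_type idea from these two commits (passing one opaque pointer plus a type tag instead of separate dentry and path arguments, then reading it through small accessors) can be sketched in plain C. The type tags and helpers below are hypothetical stand-ins; the kernel's own enum values and helper names are not reproduced here, and a real path refers to a dentry and mount rather than holding an inode pointer directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an inode and a path. */
struct model_inode {
	unsigned long ino;
};

struct model_path {
	const struct model_inode *inode; /* simplification: no dentry/mount */
};

/* Tag describing what the opaque data pointer refers to. */
enum model_data_type {
	MODEL_DATA_NONE,
	MODEL_DATA_PATH,
	MODEL_DATA_INODE,
};

/* Inode behind the payload, or NULL if the payload has none. */
const struct model_inode *model_data_inode(const void *data,
					   enum model_data_type type)
{
	assert(data != NULL || type == MODEL_DATA_NONE);
	switch (type) {
	case MODEL_DATA_INODE:
		return data;
	case MODEL_DATA_PATH:
		return ((const struct model_path *)data)->inode;
	default:
		return NULL;
	}
}

/* Path behind the payload, or NULL for non-path payloads. */
const struct model_path *model_data_path(const void *data,
					 enum model_data_type type)
{
	return type == MODEL_DATA_PATH ? data : NULL;
}
```

Callers no longer decide between two arguments: `model_data_inode()` works for both payload kinds, and `model_data_path()` returns NULL when the payload carries no path.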
2019-12-18  fs: call fsnotify_sb_delete after evict_inodes  Eric Sandeen  1  -0/+3
When a filesystem is unmounted, we currently call fsnotify_sb_delete() before evict_inodes(), which means that fsnotify_unmount_inodes() must iterate over all inodes on the superblock looking for any inodes with watches. This is inefficient and can lead to livelocks as it iterates over many unwatched inodes.

At this point, SB_ACTIVE is gone and dropping refcount to zero kicks the inode out immediately, so anything processed by fsnotify_sb_delete / fsnotify_unmount_inodes gets evicted in that loop.

After that, the call to evict_inodes will evict everything else with a zero refcount.

This should speed things up overall, and avoid livelocks in fsnotify_unmount_inodes().

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
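A toy model shows why the ordering matters for how many inodes the mark-removal walk has to visit. It rests on an assumption not stated in the commit message: that a mark holds a reference on its inode, so only inodes with a nonzero refcount can carry marks. `struct model_inode` and both `model_*` functions are hypothetical; eviction is modelled as a flag, and an evicted inode counts as removed from the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inode. Assumption: a mark holds one reference, so a
 * marked inode never has a zero refcount. */
struct model_inode {
	int refcount;
	bool marked;
	bool evicted; /* evicted inodes are treated as off the list */
};

/* Evict every listed inode with a zero refcount. Returns the number of
 * list entries visited. */
size_t model_evict_inodes(struct model_inode *inodes, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].evicted)
			continue;
		visited++;
		if (inodes[i].refcount == 0)
			inodes[i].evicted = true;
	}
	return visited;
}

/* Remove marks: drop each mark's reference and evict the inode if that
 * was the last one. Returns the number of list entries visited. */
size_t model_sb_delete(struct model_inode *inodes, size_t n)
{
	size_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].evicted)
			continue;
		visited++;
		if (!inodes[i].marked)
			continue; /* unwatched inode: wasted iteration */
		assert(inodes[i].refcount > 0);
		inodes[i].marked = false;
		if (--inodes[i].refcount == 0)
			inodes[i].evicted = true;
	}
	return visited;
}
```

With 1000 inodes of which 3 are marked, running `model_sb_delete()` first visits all 1000; running `model_evict_inodes()` first leaves only the 3 marked inodes for `model_sb_delete()` to visit, and both orders end with every inode evicted.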
2019-12-18  fs: avoid softlockups in s_inodes iterators  Eric Sandeen  1  -0/+1
Anything that walks all inodes on sb->s_inodes list without rescheduling risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now. This also consistently moves the cond_resched() calls to the bottom of each loop in cases where it already exists.

One loop remains: remove_dquot_ref(), because I'm not quite sure how to deal with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-01  Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground  Linus Torvalds  1  -1/+1
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:

 "As part of the cleanup of some remaining y2038 issues, I came to fs/compat_ioctl.c, which still has a couple of commands that need support for time64_t.

  In completely unrelated work, I spent time on cleaning up parts of this file in the past, moving things out into drivers instead.

  After Al Viro reviewed an earlier version of this series and did a lot more of that cleanup, I decided to try to completely eliminate the rest of it and move it all into drivers.

  This series incorporates some of Al's work and many patches of my own, but in the end stops short of actually removing the last part, which is the scsi ioctl handlers. I have patches for those as well, but they need more testing or possibly a rewrite"

* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
  scsi: sd: enable compat ioctls for sed-opal
  pktcdvd: add compat_ioctl handler
  compat_ioctl: move SG_GET_REQUEST_TABLE handling
  compat_ioctl: ppp: move simple commands into ppp_generic.c
  compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
  compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
  compat_ioctl: unify copy-in of ppp filters
  tty: handle compat PPP ioctls
  compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
  compat_ioctl: handle SIOCOUTQNSD
  af_unix: add compat_ioctl support
  compat_ioctl: reimplement SG_IO handling
  compat_ioctl: move WDIOC handling into wdt drivers
  fs: compat_ioctl: move FITRIM emulation into file systems
  gfs2: add compat_ioctl support
  compat_ioctl: remove unused convert_in_user macro
  compat_ioctl: remove last RAID handling code
  compat_ioctl: remove /dev/raw ioctl translation
  compat_ioctl: remove PCI ioctl translation
  compat_ioctl: remove joystick ioctl translation
  ...
2019-10-23  compat_ioctl: move more drivers to compat_ptr_ioctl  Arnd Bergmann  1  -1/+1
The .ioctl and .compat_ioctl file operations have the same prototype so they can both point to the same function, which works great almost all the time when all the commands are compatible.

One exception is the s390 architecture, where a compat pointer is only 31 bit wide, and converting it into a 64-bit pointer requires calling compat_ptr(). Most drivers here will never run on s390, but since we now have a generic helper for it, it's easy enough to use it consistently.

I double-checked all these drivers to ensure that all ioctl arguments are used as pointers or are ignored, but are not interpreted as integer values.

Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: David Sterba <dsterba@suse.com>
Acked-by: Darren Hart (VMware) <dvhart@infradead.org>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
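The s390 pitfall can be shown with a hypothetical model of the conversion. The real `compat_ptr()` and `compat_ptr_ioctl()` are kernel helpers; the `model_*` functions below only mimic the idea, under the assumption (taken from the "31 bit wide" remark) that an s390-style conversion keeps the low 31 bits while other architectures simply zero-extend. The toy integer handler illustrates why the commit checks that no argument is interpreted as an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generic conversion: zero-extend the 32-bit value. */
uintptr_t model_compat_ptr_generic(uint32_t uptr)
{
	return (uintptr_t)uptr;
}

/* Hypothetical s390-style conversion: keep only the low 31 bits. */
uintptr_t model_compat_ptr_s390(uint32_t uptr)
{
	return (uintptr_t)(uptr & 0x7fffffffu);
}

/* Signature of the native handler in this model. */
typedef long (*model_ioctl_fn)(unsigned int cmd, unsigned long arg);

/* Compat entry point: convert the 32-bit argument as a pointer, then call
 * the native handler. Only correct if every command uses arg as a
 * pointer (or ignores it). */
long model_compat_ptr_ioctl(model_ioctl_fn native,
			    uintptr_t (*conv)(uint32_t),
			    unsigned int cmd, uint32_t arg)
{
	assert(native != NULL && conv != NULL);
	return native(cmd, (unsigned long)conv(arg));
}

/* Toy native handler that treats arg as an integer (the case the commit
 * rules out) and echoes it back. */
long model_echo_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	return (long)arg;
}
```

An integer argument of 0x80000001 comes back as 1 through the s390-style path, which is the corruption a driver using `arg` as an integer would see; pointer arguments are unaffected because valid 31-bit user addresses never set the top bit.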
2019-10-17  fsnotify/fdinfo: exportfs_encode_inode_fh() takes pointer as 4th argument  Ben Dooks (Codethink)  1  -1/+1
The call to exportfs_encode_inode_fh() takes a pointer as the 4th argument, so replace the integer 0 with the NULL pointer. This fixes the following sparse warning:

fs/notify/fdinfo.c:53:87: warning: Using plain integer as NULL pointer

Link: https://lore.kernel.org/r/20191016095955.3347-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-10-17  fsnotify: move declaration of fsnotify_mark_connector_cachep to fsnotify.h  Ben Dooks  2  -2/+2
Move fsnotify_mark_connector_cachep to fsnotify.h to properly share it with the user in mark.c and avoid the following warning from sparse:

fs/notify/mark.c:82:19: warning: symbol 'fsnotify_mark_connector_cachep' was not declared. Should it be static?

Link: https://lore.kernel.org/r/20191015132518.21819-1-ben.dooks@codethink.co.uk
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Jan Kara <jack@suse.cz>