|
Use a macro to define the default value for the 'msize' option
at one place instead of using two separate integer literals.
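A minimal sketch of the idea; the macro name and the 8192-byte value
below are assumptions for illustration, not copied from the patch:

	/* define the default once instead of scattering integer literals */
	#define DEFAULT_MSIZE 8192
	...
	clnt->msize = DEFAULT_MSIZE;	/* was a bare integer literal */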
Link: http://lkml.kernel.org/r/28bb651ae0349a7d57e8ddc92c1bd5e62924a912.1630770829.git.linux_oss@crudebyte.com
Signed-off-by: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
|
|
Historically TCP has been limited to 64K buffers, but increasing
msize provides huge performance benefits, especially as latency
increases, so allow for bigger buffers.
Ideally further improvements could change the allocation from the
current contiguous chunk in slab (kmem_cache) to some scatter-gather
compatible API...
Note this only increases the max possible setting, not the default
value.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dominique Martinet <[email protected]>
|
|
When the function IS_ALIGNED() returns false, the value of ret is 0.
So, we set ret to -EINVAL to indicate this error.
Clean up smatch warning:
drivers/ntb/test/ntb_perf.c:602 perf_setup_inbuf() warn: missing error
code 'ret'.
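The shape of the fix, as a sketch (identifiers and the error label are
paraphrased, not taken verbatim from perf_setup_inbuf()):

	if (!IS_ALIGNED(peer->inbuf_xlat, xlat_align)) {
		ret = -EINVAL;	/* previously left at 0, reporting success */
		goto err_free_inbuf;
	}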
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Serge Semin <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
|
|
When the value of nm->isr_ctx is false, the value of ret is 0.
So, we set ret to -ENOMEM to indicate this error.
Clean up smatch warning:
drivers/ntb/test/ntb_msi_test.c:373 ntb_msit_probe() warn: missing
error code 'ret'.
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Reviewed-by: Logan Gunthorpe <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
|
|
Remove Jon's old email address.
Signed-off-by: Dave Jiang <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
|
|
Pull MAP_DENYWRITE removal from David Hildenbrand:
"Remove all in-tree usage of MAP_DENYWRITE from the kernel and remove
VM_DENYWRITE.
There are some (minor) user-visible changes:
- We no longer deny write access to shared libraries loaded via legacy
uselib(); this behavior matches modern user space, e.g. dlopen().
- We no longer deny write access to the elf interpreter after exec
completed, treating it just like shared libraries (which it often
is).
- We always deny write access to the file linked via /proc/pid/exe:
sys_prctl(PR_SET_MM_MAP/EXE_FILE) will fail if write access to the
file cannot be denied, and write access to the file will remain
denied until the link is effectively gone (exec, termination,
sys_prctl(PR_SET_MM_MAP/EXE_FILE)) -- just as if exec'ing the file.
Cross-compiled for a bunch of architectures (alpha, microblaze, i386,
s390x, ...) and verified via ltp that especially the relevant tests
(i.e., creat07 and execve04) continue working as expected"
* tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux:
fs: update documentation of get_write_access() and friends
mm: ignore MAP_DENYWRITE in ksys_mmap_pgoff()
mm: remove VM_DENYWRITE
binfmt: remove in-tree usage of MAP_DENYWRITE
kernel/fork: always deny write access to current MM exe_file
kernel/fork: factor out replacing the current MM exe_file
binfmt: don't use MAP_DENYWRITE when loading shared libraries via uselib()
|
|
Merge NTFSv3 filesystem from Konstantin Komarov:
"This patch adds NTFS Read-Write driver to fs/ntfs3.
Having decades of expertise in commercial file systems development and
huge test coverage, we at Paragon Software GmbH want to make our
contribution to the Open Source Community by providing implementation
of NTFS Read-Write driver for the Linux Kernel.
This is a fully functional NTFS Read-Write driver. The current version
works with NTFS (including v3.1) and normal/compressed/sparse files and
supports journal replaying.
We plan to support this version once the codebase is merged, and to add
new features and fix bugs. For example, full journaling support over
JBD will be added in later updates"
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://lore.kernel.org/lkml/[email protected]/
* git://github.com/Paragon-Software-Group/linux-ntfs3: (35 commits)
fs/ntfs3: Change how module init/info messages are displayed
fs/ntfs3: Remove GPL boilerplates from decompress lib files
fs/ntfs3: Remove unnecessary condition checking from ntfs_file_read_iter
fs/ntfs3: Fix integer overflow in ni_fiemap with fiemap_prep()
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Rework file operations
fs/ntfs3: Remove fat ioctl's from ntfs3 driver for now
fs/ntfs3: Restyle comments to better align with kernel-doc
fs/ntfs3: Fix error handling in indx_insert_into_root()
fs/ntfs3: Potential NULL dereference in hdr_find_split()
fs/ntfs3: Fix error code in indx_add_allocate()
fs/ntfs3: fix an error code in ntfs_get_acl_ex()
fs/ntfs3: add checks for allocation failure
fs/ntfs3: Use kcalloc/kmalloc_array over kzalloc/kmalloc
fs/ntfs3: Do not use driver own alloc wrappers
fs/ntfs3: Use kernel ALIGN macros over driver specific
fs/ntfs3: Restyle comment block in ni_parse_reparse()
fs/ntfs3: Remove unused including <linux/version.h>
fs/ntfs3: Fix fall-through warnings for Clang
fs/ntfs3: Fix one none utf8 char in source file
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this cycle, we've addressed some performance issues such as lock
contention, misbehaving compress_cache, allowing extent_cache for
compressed files, and new sysfs to adjust ra_size for fadvise.
In order to diagnose the performance issues quickly, we also added an
iostat which shows the IO latencies periodically.
On the stability side, we've found two memory leakage cases in the
error path in compression flow. And, we've also fixed various corner
cases in fiemap, quota, checkpoint=disable, zstd, and so on.
Enhancements:
- avoid long checkpoint latency by releasing nat_tree_lock
- collect and show iostats periodically
- support extent_cache for compressed files
- add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
- report f2fs GC status via sysfs
- add discard_unit=%s in mount option to handle zoned device
Bug fixes:
- fix two memory leakages when an error happens in the compressed IO flow
- fix compress_cache to get the right LBA
- fix fiemap to deal with compressed case correctly
- fix wrong EIO returns due to SBI_NEED_FSCK
- fix missing writes when enabling checkpoint back
- fix quota deadlock
- fix zstd level mount option
In addition to the above major updates, we've cleaned up several code
paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
check, and typos"
* tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
f2fs: should put a page beyond EOF when preparing a write
f2fs: deallocate compressed pages when error happens
f2fs: enable realtime discard iff device supports discard
f2fs: guarantee to write dirty data when enabling checkpoint back
f2fs: fix to unmap pages from userspace process in punch_hole()
f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
f2fs: fix to account missing .skipped_gc_rwsem
f2fs: adjust unlock order for cleanup
f2fs: Don't create discard thread when device doesn't support realtime discard
f2fs: rebuild nat_bits during umount
f2fs: introduce periodic iostat io latency traces
f2fs: separate out iostat feature
f2fs: compress: do sanity check on cluster
f2fs: fix description about main_blkaddr node
f2fs: convert S_IRUGO to 0444
f2fs: fix to keep compatibility of fault injection interface
f2fs: support fault injection for f2fs_kmem_cache_alloc()
f2fs: compress: allow write compress released file after truncate to zero
f2fs: correct comment in segment.h
f2fs: improve sbi status info in debugfs/f2fs/status
...
|
|
Update maintainership for the uniform CD-ROM driver from Jens Axboe to
Phillip Potter in MAINTAINERS file, to reflect the attempt to pass on
maintainership of this driver to a different individual. Also remove
URL to site which is no longer active.
Suggested-by: Jens Axboe <[email protected]>
Signed-off-by: Phillip Potter <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NFS client updates from Anna Schumaker:
"New Features:
- Better client responsiveness when server isn't replying
- Use refcount_t in sunrpc rpc_client refcount tracking
- Add srcaddr and dst_port to the sunrpc sysfs info files
- Add basic support for connection sharing between servers with multiple NICs
Bugfixes and Cleanups:
- Sunrpc tracepoint cleanups
- Disconnect after ib_post_send() errors to avoid deadlocks
- Fix for tearing down rpcrdma_reps
- Fix a potential pNFS layoutget livelock loop
- pNFS layout barrier fixes
- Fix a potential memory corruption in rpc_wake_up_queued_task_set_status()
- Fix reconnection locking
- Fix return value of get_srcport()
- Remove rpcrdma_post_sends()
- Remove pNFS dead code
- Remove copy size restriction for inter-server copies
- Overhaul the NFS callback service
- Clean up sunrpc TCP socket shutdowns
- Always provide aligned buffers to RPC read layers"
* tag 'nfs-for-5.15-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (39 commits)
NFS: Always provide aligned buffers to the RPC read layers
NFSv4.1 add network transport when session trunking is detected
SUNRPC enforce creation of no more than max_connect xprts
NFSv4 introduce max_connect mount options
SUNRPC add xps_nunique_destaddr_xprts to xprt_switch_info in sysfs
SUNRPC keep track of number of transports to unique addresses
NFSv3: Delete duplicate judgement in nfs3_async_handle_jukebox
SUNRPC: Tweak TCP socket shutdown in the RPC client
SUNRPC: Simplify socket shutdown when not reusing TCP ports
NFSv4.2: remove restriction of copy size for inter-server copy.
NFS: Clean up the synopsis of callback process_op()
NFS: Extract the xdr_init_encode/decode() calls from decode_compound
NFS: Remove unused callback void decoder
NFS: Add a private local dispatcher for NFSv4 callback operations
SUNRPC: Eliminate the RQ_AUTHERR flag
SUNRPC: Set rq_auth_stat in the pg_authenticate() callout
SUNRPC: Add svc_rqst::rq_auth_stat
SUNRPC: Add dst_port to the sysfs xprt info file
SUNRPC: Add srcaddr as a file in sysfs
sunrpc: Fix return value of get_srcport()
...
|
|
'cgx_lmac_init()'
Memory allocated before 'lmac' is stored in 'cgx->lmac_idmap[]' must be
freed explicitly; otherwise, in case of error, it will leak.
Rename the 'err_irq' label to better describe what is done at this point
in the error handling path.
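An illustrative sketch of the resulting error path (the labels and the
exact fields freed are assumptions, since the code is not quoted here):

err_name_free:
	kfree(lmac->name);
err_lmac_free:
	kfree(lmac);
	return err;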
Fixes: 6f14078e3ee5 ("octeontx2-af: DMAC filter support in MAC block")
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In order to match 'rvu_alloc_bitmap()', add a 'rvu_free_bitmap()' function.
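Presumably the helper is simply the mirror of the allocation, along
these lines (a sketch; the struct type is inferred from
rvu_alloc_bitmap()'s signature):

	void rvu_free_bitmap(struct rsrc_bmap *rsrc)
	{
		kfree(rsrc->bmap);
	}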
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When adapter->registered_device_map is NULL, the value of err is
uncertain, so we set err to -EINVAL to avoid ambiguity.
Clean up smatch warning:
drivers/net/ethernet/chelsio/cxgb/cxgb2.c:1114 init_one() warn: missing
error code 'err'
Reported-by: Abaci Robot <[email protected]>
Signed-off-by: Yang Li <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Previous commit 68233c583ab4 removed the qlcnic_rom_lock() call in
qlcnic_pinit_from_rom(), but left its corresponding unlock function in
place, which is odd. I'm not very sure whether the lock is missing, or
the unlock is redundant. This issue was flagged by a static analysis
tool; please advise.
Fixes: 68233c583ab4 ("qlcnic: updated reset sequence")
Signed-off-by: Dinghao Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
syzbot found that forcing a big quantum attribute would crash hosts fast,
essentially using this:
tc qd replace dev eth0 root fq_codel quantum 4294967295
This is because fq_codel_dequeue() would have to loop
~2^31 times in:
if (flow->deficit <= 0) {
flow->deficit += q->quantum;
list_move_tail(&flow->flowchain, &q->old_flows);
goto begin;
}
SFQ's max quantum is 2^19 (half a megabyte).
Let's adopt a max quantum of one megabyte for FQ_CODEL.
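A sketch of where the clamp would go in fq_codel_change(); the exact
bound check in the patch may be written differently:

	if (tb[TCA_FQ_CODEL_QUANTUM]) {
		u32 quantum = nla_get_u32(tb[TCA_FQ_CODEL_QUANTUM]);

		if (quantum > 1 << 20)	/* cap the quantum at one megabyte */
			return -EINVAL;
	}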
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Embed local_lock into struct kmem_cache_cpu and use the irq-safe versions of
local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's
equivalent, with better lockdep visibility. On PREEMPT_RT that means better
preemption.
However, the cost on PREEMPT_RT is the loss of lockless fast paths which only
work with cpu freelist. Those are designed to detect and recover from being
preempted by other conflicting operations (both fast and slow path), but the
slow path operations assume they cannot be preempted by a fast path operation,
which is guaranteed naturally with disabled irqs. With local locks on
PREEMPT_RT, the fast paths now also need to take the local lock to avoid races.
In the allocation fastpath slab_alloc_node() we can just defer to the slowpath
__slab_alloc() which also works with cpu freelist, but under the local lock.
In the free fastpath do_slab_free() we have to add a new local lock protected
version of freeing to the cpu freelist, as the existing slowpath only works
with the page freelist.
Also update the comment about locking scheme in SLUB to reflect changes done
by this series.
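A sketch of the resulting scheme, assuming the lock lands in struct
kmem_cache_cpu as described (field layout abbreviated):

	struct kmem_cache_cpu {
		local_lock_t lock;	/* protects the fields below */
		void **freelist;
		struct page *page;
		/* ... tid, partial, etc. */
	};

	local_lock_irqsave(&s->cpu_slab->lock, flags);
	/* manipulate c->freelist / c->page */
	local_unlock_irqrestore(&s->cpu_slab->lock, flags);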
[ Mike Galbraith <[email protected]>: use local_lock() without irq in PREEMPT_RT
scope; debugging of RT crashes resulting in put_cpu_partial() locking changes ]
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
We currently use preempt_disable() (directly or via get_cpu_ptr()) to stabilize
the pointer to kmem_cache_cpu. On PREEMPT_RT this would be incompatible with
the list_lock spinlock. We can use migrate_disable() instead, but that
increases overhead on !PREEMPT_RT as it's an unconditional function call.
In order to get the best available mechanism on both PREEMPT_RT and
!PREEMPT_RT, introduce private slub_get_cpu_ptr() and slub_put_cpu_ptr()
wrappers and use them.
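The wrappers presumably reduce to something like this (a sketch of the
mechanism, not the verbatim patch):

	#ifdef CONFIG_PREEMPT_RT
	#define slub_get_cpu_ptr(var)		\
	({					\
		migrate_disable();		\
		this_cpu_ptr(var);		\
	})
	#define slub_put_cpu_ptr(var)		\
	do {					\
		(void)(var);			\
		migrate_enable();		\
	} while (0)
	#else
	#define slub_get_cpu_ptr(var)	get_cpu_ptr(var)
	#define slub_put_cpu_ptr(var)	put_cpu_ptr(var)
	#endif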
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Jann Horn reported [1] the following theoretically possible race:
task A: put_cpu_partial() calls preempt_disable()
task A: oldpage = this_cpu_read(s->cpu_slab->partial)
interrupt: kfree() reaches unfreeze_partials() and discards the page
task B (on another CPU): reallocates page as page cache
task A: reads page->pages and page->pobjects, which are actually
halves of the pointer page->lru.prev
task B (on another CPU): frees page
interrupt: allocates page as SLUB page and places it on the percpu partial list
task A: this_cpu_cmpxchg() succeeds
which would cause page->pages and page->pobjects to end up containing
halves of pointers that would then influence when put_cpu_partial()
happens and show up in root-only sysfs files. Maybe that's acceptable,
I don't know. But there should probably at least be a comment for now
to point out that we're reading union fields of a page that might be
in a completely different state.
Additionally, the this_cpu_cmpxchg() approach in put_cpu_partial() is only safe
against s->cpu_slab->partial manipulation in ___slab_alloc() if the latter
disables irqs, otherwise a __slab_free() in an irq handler could call
put_cpu_partial() in the middle of ___slab_alloc() manipulating ->partial
and corrupt it. This becomes an issue on RT after a local_lock is introduced
in later patch. The fix means taking the local_lock also in put_cpu_partial()
on RT.
After debugging this issue, Mike Galbraith suggested [2] that to avoid
different locking schemes on RT and !RT, we can just protect put_cpu_partial()
with disabled irqs (to be converted to local_lock_irqsave() later) everywhere.
This should be acceptable as it's not a fast path, and moving the actual
partial unfreezing outside of the irq disabled section makes it short, and with
the retry loop gone the code can be also simplified. In addition, the race
reported by Jann should no longer be possible.
[1] https://lore.kernel.org/lkml/CAG48ez1mvUuXwg0YPH5ANzhQLpbphqk-ZS+jbRz+H66fvm4FcA@mail.gmail.com/
[2] https://lore.kernel.org/linux-rt-users/[email protected]/
Reported-by: Jann Horn <[email protected]>
Suggested-by: Mike Galbraith <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
We need to disable irqs around slab_lock() (a bit spinlock) to make it
irq-safe. Most calls to slab_lock() are nested under spin_lock_irqsave() which
doesn't disable irqs on PREEMPT_RT, so add explicit disabling with PREEMPT_RT.
The exception is cmpxchg_double_slab() which already disables irqs, so use a
__slab_[un]lock() variant without irq disable there.
slab_[un]lock() thus needs a flags pointer parameter, which is unused on !RT.
free_debug_processing() now has two flags variables, which looks odd, but only
one is actually used - the one used in spin_lock_irqsave() on !RT and the one
used in slab_lock() on RT.
As a result, __cmpxchg_double_slab() and cmpxchg_double_slab() become
effectively identical on RT, as both will disable irqs, which is necessary on
RT as most callers of this function also rely on irqsaving lock operations.
Thus, assert that irqs are already disabled in __cmpxchg_double_slab() only on
!RT and also change the VM_BUG_ON assertion to the more standard lockdep_assert
one.
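A sketch of the split described above (SLUB's bit spinlock on PG_locked;
the RT-only irq disabling hangs off the new flags pointer):

	static inline void __slab_lock(struct page *page)
	{
		bit_spin_lock(PG_locked, &page->flags);
	}

	static inline void slab_lock(struct page *page, unsigned long *flags)
	{
		if (IS_ENABLED(CONFIG_PREEMPT_RT))
			local_irq_save(*flags);
		__slab_lock(page);
	}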
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
The variable object_map is protected by object_map_lock. The lock is always
acquired in debug code and within an already atomic context.
Make object_map_lock a raw_spinlock_t.
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
As the ADC ladder input driver does not have any code or data located in
initmem, there is no need to annotate the adc_keys_driver structure with
__refdata. Drop the annotation, to avoid suppressing future section
warnings.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Acked-by: Alexandre Belloni <[email protected]>
Link: https://lore.kernel.org/r/7091e8213602be64826fd689a7337246d218f3b1.1626255421.git.geert+renesas@glider.be
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
There is a spelling mistake in the Kconfig text. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
There is a spelling mistake in the Kconfig text. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Torokhov <[email protected]>
|
|
If ndr->length is smaller than the expected size, ksmbd can make an
invalid access in ndr->data. This patch adds validation to check that
ndr->offset does not run past ndr->length, and adds exception handling
to check the return values of the ndr read/write functions.
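A sketch of the bounds check, using the fields named above (the helper
name is an assumption):

	static int ndr_read_int32(struct ndr *n, __u32 *value)
	{
		if (n->offset + sizeof(__u32) > n->length)
			return -EINVAL;

		if (value)
			*value = le32_to_cpu(*(__le32 *)(n->data + n->offset));
		n->offset += sizeof(__u32);
		return 0;
	}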
Cc: Dan Carpenter <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
ksmbd_file_table_flush is a leftover from SMB1. This function is no longer
needed as SMB1 has been removed from ksmbd.
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Because the smb direct header is mapped and msg->num_sge is already
incremented, the decrement should be removed from the condition.
Signed-off-by: Hyunchul Lee <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
This log happens on servers with a network bridge, since the bridge
does not have a specified link speed.
This is not a real error, so change the error log to a debug log instead.
Signed-off-by: Per Forlin <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
When ownership is changed we might in certain scenarios lose the
ability to alter the inode after we changed ownership. This can e.g.
happen when we are on an idmapped mount where uid 0 is mapped to uid
1000 and uid 1000 is mapped to uid 0.
A caller with fs*id 1000 will be able to create files as *id 1000 on
disk. They will also be able to change ownership of files owned by *id 0
to *id 1000 but they won't be able to change ownership in the other
direction. This means acl operations following notify_change() would
fail. Move the notify_change() call after the acls have been updated.
This guarantees that we don't end up with spurious "hash value diff"
warnings later on because we managed to change ownership but didn't
manage to alter acls.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Permission checking and copying over ownership information is the task
of the underlying filesystem, not ksmbd. The order is also wrong here.
This modifies the inode before notify_change(). If notify_change() fails
this will have changed ownership nonetheless. All of this is unnecessary
though since the underlying filesystem's ->setattr handler will do all
this (if required) by itself.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
It seems the error was accidentally ignored until now. Make sure it is
surfaced.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The sid_to_id() helper encodes raw ownership information suitable for
s*id handling. This is conceptually equivalent to reporting ownership
information via stat to userspace. In this case the consumer is ksmbd
instead of a regular user. So when encoding raw ownership information
suitable for s*id handling later, we need to map the id up according to
the user namespace of ksmbd itself, taking any idmapped mounts into
account.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The sid_to_id() function is relevant when changing ownership of
filesystem objects based on acl information. In this case we need to
first translate the relevant s*ids into k*ids in ksmbd's user namespace
and account for any idmapped mounts. Requesting a change in ownership
requires the inverse translation to be applied when we would report
ownership to userspace. So k*id_from_mnt() must be used here.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
It's not obvious why subauth 0 would be excluded from translation. This
would lead to wrong results whenever a non-identity idmapping is used.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
The ksmbd server performs translation of posix acls to smb acls.
Currently the translation is wrong since the idmapping of the mount is
used to map the ids into raw userspace ids, but what is relevant is the
user namespace of ksmbd itself, which is the initial user namespace. The
operation is similar to asking "What
*ids would a userspace process see given that k*id in the relevant user
namespace?". Before the final translation we need to apply the idmapping
of the mount in case any is used. Add two simple helpers for ksmbd.
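The helpers might look roughly like this (the helper names, and
kuid_into_mnt()/kgid_into_mnt() as the mount-idmapping step, are
assumptions of this sketch): apply the mount idmapping first, then
translate into the initial user namespace:

	static inline uid_t posix_acl_uid_translate(struct user_namespace *mnt_userns,
						    struct posix_acl_entry *pace)
	{
		kuid_t kuid = kuid_into_mnt(mnt_userns, pace->e_uid);

		return from_kuid(&init_user_ns, kuid);
	}

	static inline gid_t posix_acl_gid_translate(struct user_namespace *mnt_userns,
						    struct posix_acl_entry *pace)
	{
		kgid_t kgid = kgid_into_mnt(mnt_userns, pace->e_gid);

		return from_kgid(&init_user_ns, kgid);
	}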
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
When creating new filesystem objects ksmbd translates between k*ids and
s*ids. For this it often uses struct smb_fattr and stashes the k*ids in
cf_uid and cf_gid. Let cf_uid and cf_gid always contain the final
information, taking any potential idmapped mounts into account. When
finally translating cf_*id into s*ids, translate them into the user
namespace of ksmbd, since that is the relevant user namespace here.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
When transferring ownership information to the client the k*ids are
translated into raw *ids before they are sent over the wire. The
function currently erroneously translates the k*ids according to the
mount's idmapping. Instead, when reporting the owning *ids to userspace,
the underlying k*ids need to be mapped up in the caller's user namespace.
This is how stat() works.
The caller in this instance is ksmbd itself and ksmbd always runs in the
initial user namespace. Translate according to that taking any potential
idmapped mounts into account.
Switch to from_k*id_munged() which ensures that the overflow*id is
returned instead of the (*id_t)-1 when the k*id can't be translated.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
When transferring ownership information to the
client the k*ids are translated into raw *ids before they are sent over
the wire. The function currently erroneously translates the k*ids
according to the mount's idmapping. Instead, when reporting the owning
*ids to userspace, the underlying k*ids need to be mapped up in the caller's
user namespace. This is how stat() works.
The caller in this instance is ksmbd itself and ksmbd always runs in the
initial user namespace. Translate according to that.
The idmapping of the mount is already taken into account by the lower
filesystem and so kstat->*id will contain the mapped k*ids.
Switch to from_k*id_munged() which ensures that the overflow*id is
returned instead of the (*id_t)-1 when the k*id can't be translated.
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
It's great that the new in-kernel ksmbd server will support idmapped
mounts out of the box! However, lookup is currently broken. Lookup
helpers such as lookup_one_len() call inode_permission() internally to
ensure that the caller is privileged over the inode of the base dentry
they are trying to lookup under. So the permission checking here is
currently wrong.
Linux v5.15 will gain a new lookup helper lookup_one() that does take
idmappings into account. I've added it as part of my patch series to
make btrfs support idmapped mounts. The new helper is in linux-next as
part of David's (Sterba) btrfs for-next branch as commit
c972214c133b ("namei: add mapping aware lookup helper").
I've said it before during one of my first reviews: I would very much
recommend adding fstests to [1]. It already seems to have very
rudimentary cifs support. There is also a completely generic testsuite
for idmapped mounts.
[1]: https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/
Cc: Colin Ian King <[email protected]>
Cc: Steve French <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Hyunchul Lee <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: David Sterba <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
syzbot is reporting a circular locking problem at __loop_clr_fd() [1],
because commit a160c6159d4a0cf8 ("block: add an optional probe callback
to major_names") calls the module's probe function with major_names_lock
held.
Fortunately, since commit 990e78116d38059c ("block: loop: fix deadlock
between open and remove") stopped holding loop_ctl_mutex in lo_open(),
the current role of loop_ctl_mutex is to serialize access to loop_index_idr
and loop_add()/loop_remove(); in other words, management of id for IDR.
To avoid holding loop_ctl_mutex during whole add/remove operation, use
a bool flag to indicate whether the loop device is ready for use.
loop_unregister_transfer() which is called from cleanup_cryptoloop()
currently has the possibility of a use-after-free problem due to lack of
serialization between kfree() from loop_remove() from loop_control_remove()
and mutex_lock() from unregister_transfer_cb(). But since lo->lo_encryption
should be already NULL when this function is called due to module unload,
and commit 222013f9ac30b9ce ("cryptoloop: add a deprecation warning")
indicates that we will remove this function shortly, this patch updates
this function to emit warning instead of checking lo->lo_encryption.
Holding loop_ctl_mutex in loop_exit() is pointless, for all users must
close /dev/loop-control and /dev/loop$num (in order to drop module's
refcount to 0) before loop_exit() starts, and nobody can open
/dev/loop-control or /dev/loop$num afterwards.
Link: https://syzkaller.appspot.com/bug?id=7bb10e8b62f83e4d445cdf4c13d69e407e629558 [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
migrate_disable() forbids task migration to another CPU. It is available
since v5.11 and already has users such as highmem or BPF. It is useful
to observe this task state in tracing, which already has other states
like the preemption counter.
Instead of adding the migrate disable counter as a new entry to struct
trace_entry, which would extend the whole struct by four bytes, it is
squashed into the preempt-disable counter. The lower four bits represent
the preemption counter, the upper four bits represent the migrate
disable counter. Neither counter should exceed 15, but if one does,
there is a safety net which caps the value at 15.
Add the migrate-disable counter to the trace entry so it shows up in the
trace. Due to the users mentioned above, it is already possible to
observe it:
| bash-1108 [000] ...21 73.950578: rss_stat: mm_id=2213312838 curr=0 type=MM_ANONPAGES size=8192B
| bash-1108 [000] d..31 73.951222: irq_disable: caller=flush_tlb_mm_range+0x115/0x130 parent=ptep_clear_flush+0x42/0x50
| bash-1108 [000] d..31 73.951222: tlb_flush: pages:1 reason:local mm shootdown (3)
The last value is the migrate-disable counter.
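As a sketch of the packing; migrate_disable_value() below is a
hypothetical placeholder for reading the current task's migrate-disable
depth, not a real kernel API:

	unsigned int pc = min_t(unsigned int, preempt_count() & 0xff, 0xf);
	/* migrate_disable_value() is a placeholder, not a real API */
	unsigned int md = min_t(unsigned int, migrate_disable_value(), 0xf);

	entry->preempt_count = pc | (md << 4);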
Things that popped up:
- trace_print_lat_context() does not print the migrate counter. Not sure
if it should. It is used in "verbose" mode, uses 8 digits, and I'm not
sure there is something processing the value.
- trace_define_common_fields() now defines a different variable. This
probably breaks things. No idea what to do in order to preserve the old
behaviour. Since this is used as a filter, it should be split somehow
to be able to match both nibbles here.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
[bigeasy: patch description.]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
[ SDR: Removed change to common_preempt_count field name ]
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
|
|
[ 74.211232] BUG: KASAN: stack-out-of-bounds in iov_iter_revert+0x809/0x900
[ 74.212778] Read of size 8 at addr ffff888025dc78b8 by task
syz-executor.0/828
[ 74.214756] CPU: 0 PID: 828 Comm: syz-executor.0 Not tainted
5.14.0-rc3-next-20210730 #1
[ 74.216525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 74.219033] Call Trace:
[ 74.219683] dump_stack_lvl+0x8b/0xb3
[ 74.220706] print_address_description.constprop.0+0x1f/0x140
[ 74.224226] kasan_report.cold+0x7f/0x11b
[ 74.226085] iov_iter_revert+0x809/0x900
[ 74.227960] io_write+0x57d/0xe40
[ 74.232647] io_issue_sqe+0x4da/0x6a80
[ 74.242578] __io_queue_sqe+0x1ac/0xe60
[ 74.245358] io_submit_sqes+0x3f6e/0x76a0
[ 74.248207] __do_sys_io_uring_enter+0x90c/0x1a20
[ 74.257167] do_syscall_64+0x3b/0x90
[ 74.257984] entry_SYSCALL_64_after_hwframe+0x44/0xae
old_size = iov_iter_count(iter);
...
iov_iter_revert(iter, old_size - iov_iter_count(iter));
If iov_iter_revert() is done based on the initial size as above, and the
iter was truncated and not reexpanded in the middle, it miscalculates
borders, causing problems. This trace is due to no one reexpanding after
generic_write_checks().
Now iters store how many bytes have been truncated, so reexpand them to
the initial state right before reverting.
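A sketch of the fix, assuming the iterator records the truncated byte
count (see the following entry) in a field called 'truncated':

	iov_iter_reexpand(iter, iov_iter_count(iter) + iter->truncated);
	iov_iter_revert(iter, old_size - iov_iter_count(iter));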
Cc: [email protected]
Reported-by: Palash Oswal <[email protected]>
Reported-by: Sudip Mukherjee <[email protected]>
Reported-and-tested-by: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Remember how many bytes were truncated and reverted back. Because
iterators that were not reexpanded don't always work well with reverting,
we may need that information to reexpand them ourselves when needed.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Protect nft_ct template with global mutex, from Pavel Skripkin.
2) Two recent commits switched inet rt and nexthop exception hashes
from jhash to siphash. If those two spots are problematic then
conntrack is affected as well, so switch over to siphash too.
While at it, add a hard upper limit on chain lengths and reject
insertion if this is hit. Patches from Florian Westphal.
3) Fix use-after-scope in nf_socket_ipv6 reported by KASAN,
from Benjamin Hesmans.
* git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf:
netfilter: socket: icmp6: fix use-after-scope
netfilter: refuse insertion if chain has grown too large
netfilter: conntrack: switch to siphash
netfilter: conntrack: sanitize table size default settings
netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This code is holding spin_lock_bh(&lif->rx_filters.lock); so the
allocation needs to be atomic.
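The shape of the problem and fix, sketched (the variable name is an
assumption):

	spin_lock_bh(&lif->rx_filters.lock);
	/* GFP_KERNEL may sleep and is not allowed under a spinlock */
	f = kzalloc(sizeof(*f), GFP_ATOMIC);	/* was GFP_KERNEL */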
Fixes: 969f84394604 ("ionic: sync the filters in the work task")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Shannon Nelson <[email protected]>
Link: https://lore.kernel.org/r/20210903131856.GA25934@kili
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
IRQ context
flush_all() flushes a specific SLAB cache on each CPU (where the cache
is present). The deactivate_slab()/__free_slab() invocation happens
within IPI handler and is problematic for PREEMPT_RT.
The flush operation is not a frequent operation or a hot path. The
per-CPU flush operation can be moved to within a workqueue.
Because a workqueue handler, unlike IPI handler, does not disable irqs,
flush_slab() now has to disable them for working with the kmem_cache_cpu
fields. deactivate_slab() is safe to call with irqs enabled.
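A sketch of the workqueue handler (the wrapper struct and its fields are
assumptions of this sketch):

	struct slub_flush_work {
		struct work_struct work;
		struct kmem_cache *s;
	};

	static void flush_cpu_slab(struct work_struct *w)
	{
		struct slub_flush_work *sfw;

		sfw = container_of(w, struct slub_flush_work, work);
		/* unlike an IPI handler, this runs with irqs enabled;
		 * flush_slab() now disables irqs itself around the
		 * kmem_cache_cpu accesses */
		flush_slab(sfw->s, this_cpu_ptr(sfw->s->cpu_slab));
	}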
[[email protected]: adapt to new SLUB changes]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
flush_slab() is called either as part of the IPI handler on a given live cpu, or as a
cleanup on behalf of another cpu that went offline. The first case needs to
protect updating the kmem_cache_cpu fields with disabled irqs. Currently the
whole call happens with irqs disabled by the IPI handler, but the following
patch will change from IPI to workqueue, and flush_slab() will have to disable
irqs (to be replaced with a local lock later) in the critical part.
To prepare for this change, replace the call to flush_slab() for the dead cpu
handling with an opencoded variant that will not disable irqs nor take a local
lock.
Suggested-by: Mike Galbraith <[email protected]>
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
slub_cpu_dead() cleans up for an offlined cpu from another cpu and calls only
functions that are now irq safe, so we don't need to disable irqs anymore.
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
__unfreeze_partials() no longer needs to have irqs disabled, except for making
the spin_lock operations irq-safe, so convert the spin_locks operations and
remove the separate irq handling.
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
unfreezing
Unfreezing the partial list can be split into two phases - detaching the list from
struct kmem_cache_cpu, and processing the list. The whole operation does not
need to be protected by disabled irqs. Restructure the code to separate the
detaching (with disabled irqs) and unfreezing (with irq disabling to be reduced
in the next patch).
Also, unfreeze_partials() can be called from another cpu on behalf of a cpu
that is being offlined, where disabling irqs on the local cpu makes no sense, so
restructure the code as follows:
- __unfreeze_partials() is the bulk of unfreeze_partials() that processes the
detached percpu partial list
- unfreeze_partials() detaches list from current cpu with irqs disabled and
calls __unfreeze_partials()
- unfreeze_partials_cpu() is to be called for the offlined cpu so it needs no
irq disabling, and is called from __flush_cpu_slab()
- flush_cpu_slab() is for the local cpu thus it needs to call
unfreeze_partials(). So it can't simply call
__flush_cpu_slab(smp_processor_id()) anymore and we have to open-code the
proper calls.
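A sketch of the resulting structure, close to but not verbatim the
patch (using SLUB's percpu partial accessors; abbreviated):

	static void unfreeze_partials(struct kmem_cache *s)
	{
		struct page *partial_page;
		unsigned long flags;

		/* detach with irqs disabled... */
		local_irq_save(flags);
		partial_page = this_cpu_read(s->cpu_slab->partial);
		this_cpu_write(s->cpu_slab->partial, NULL);
		local_irq_restore(flags);

		/* ...then process the detached list */
		if (partial_page)
			__unfreeze_partials(s, partial_page);
	}

	static void unfreeze_partials_cpu(struct kmem_cache *s,
					  struct kmem_cache_cpu *c)
	{
		/* cpu is offline, no irq disabling needed */
		struct page *partial_page = slub_percpu_partial(c);

		slub_set_percpu_partial(c, NULL);
		if (partial_page)
			__unfreeze_partials(s, partial_page);
	}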
Signed-off-by: Vlastimil Babka <[email protected]>
|
|
Instead of iterating through the live percpu partial list, detach it from the
kmem_cache_cpu at once. This is simpler and will allow further optimization.
Signed-off-by: Vlastimil Babka <[email protected]>
|