It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks. (It rarely happens, but it shouldn't cause any
problems.)
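A minimal sketch of the fencepost, assuming a hypothetical helper name (`sketch_first_meta_bg_ok`) and plain integer arguments rather than the actual ext4 superblock code: the value is rejected only when it exceeds the number of descriptor blocks, and equality is accepted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not the kernel's code.
 * first_meta_bg stands in for s_first_meta_bg and db_count for the
 * number of block group descriptor blocks.
 */
static bool sketch_first_meta_bg_ok(unsigned int first_meta_bg,
				    unsigned int db_count)
{
	/* Equality is allowed; only a larger value is invalid. */
	return first_meta_bg <= db_count;
}
```

Under these assumptions, `sketch_first_meta_bg_ok(4, 4)` is accepted while `sketch_first_meta_bg_ok(5, 4)` is not.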
https://bugzilla.kernel.org/show_bug.cgi?id=194567
Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
Fix a BUG when the kernel tries to mount a file system constructed as
follows:
echo foo > foo.txt
mke2fs -Fq -t ext4 -O encrypt foo.img 100
debugfs -w foo.img << EOF
write foo.txt a
set_inode_field a i_flags 0x80800
set_super_value s_last_orphan 12
quit
EOF
root@kvm-xfstests:~# mount -o loop foo.img /mnt
[ 160.238770] ------------[ cut here ]------------
[ 160.240106] kernel BUG at /usr/projects/linux/ext4/fs/ext4/inode.c:3874!
[ 160.240106] invalid opcode: 0000 [#1] SMP
[ 160.240106] Modules linked in:
[ 160.240106] CPU: 0 PID: 2547 Comm: mount Tainted: G W 4.10.0-rc3-00034-gcdd33b941b67 #227
[ 160.240106] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[ 160.240106] task: f4518000 task.stack: f47b6000
[ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4
[ 160.240106] EFLAGS: 00010246 CPU: 0
[ 160.240106] EAX: 00000001 EBX: f7be4b50 ECX: f47b7dc0 EDX: 00000007
[ 160.240106] ESI: f43b05a8 EDI: f43babec EBP: f47b7dd0 ESP: f47b7dac
[ 160.240106] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 160.240106] CR0: 80050033 CR2: bfd85b08 CR3: 34a00680 CR4: 000006f0
[ 160.240106] Call Trace:
[ 160.240106] ext4_truncate+0x1e9/0x3e5
[ 160.240106] ext4_fill_super+0x286f/0x2b1e
[ 160.240106] ? set_blocksize+0x2e/0x7e
[ 160.240106] mount_bdev+0x114/0x15f
[ 160.240106] ext4_mount+0x15/0x17
[ 160.240106] ? ext4_calculate_overhead+0x39d/0x39d
[ 160.240106] mount_fs+0x58/0x115
[ 160.240106] vfs_kern_mount+0x4b/0xae
[ 160.240106] do_mount+0x671/0x8c3
[ 160.240106] ? _copy_from_user+0x70/0x83
[ 160.240106] ? strndup_user+0x31/0x46
[ 160.240106] SyS_mount+0x57/0x7b
[ 160.240106] do_int80_syscall_32+0x4f/0x61
[ 160.240106] entry_INT80_32+0x2f/0x2f
[ 160.240106] EIP: 0xb76b919e
[ 160.240106] EFLAGS: 00000246 CPU: 0
[ 160.240106] EAX: ffffffda EBX: 08053838 ECX: 08052188 EDX: 080537e8
[ 160.240106] ESI: c0ed0000 EDI: 00000000 EBP: 080537e8 ESP: bfa13660
[ 160.240106] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[ 160.240106] Code: 59 8b 00 a8 01 0f 84 09 01 00 00 8b 07 66 25 00 f0 66 3d 00 80 75 61 89 f8 e8 3e e2 ff ff 84 c0 74 56 83 bf 48 02 00 00 00 75 02 <0f> 0b 81 7d e8 00 10 00 00 74 02 0f 0b 8b 43 04 8b 53 08 31 c9
[ 160.240106] EIP: ext4_block_zero_page_range+0x1a7/0x2b4 SS:ESP: 0068:f47b7dac
[ 160.317241] ---[ end trace d6a773a375c810a5 ]---
The problem is that when the kernel tries to truncate an inode in
ext4_truncate(), it tries to clear any on-disk data beyond i_size.
Without the encryption key, it can't do that, and so it triggers a
BUG.
E2fsck does *not* provide this service, and in practice most file
systems have their orphan list processed by e2fsck, so to avoid
crashing, this patch skips this step if we don't have access to the
encryption key (which is the case when processing the orphan list; in
all other cases, we will have the encryption key, or the kernel
wouldn't have allowed the file to be opened).
An open question is whether the fact that e2fsck isn't clearing the
bytes beyond i_size is causing problems --- and, if we've lived with it
not doing so for so long, whether we can drop this from the kernel
replay of the orphan list in all cases (not just when we don't have the
key for encrypted inodes).
Addresses-Google-Bug: #35209576
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Avoid using stripe_width for the sbi->s_stripe value if it is not
actually set; doing so prevents the stride from being used for
sbi->s_stripe.
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
When a filesystem is created using:
mkfs.ext4 -b 4096 -E stride=512 <dev>
and we try to allocate a 64MB extent, we will end up directly in
ext4_mb_complex_scan_group(). This is because the request is detected
as a power-of-two allocation (so we start in ext4_mb_regular_allocator()
with ac_criteria == 0); however, the check before
ext4_mb_simple_scan_group() refuses the direct buddy scan because the
allocation request is too large. Since cr == 0, the check whether we
should use ext4_mb_scan_aligned() fails as well and we fall back to
ext4_mb_complex_scan_group().
Fix the problem by checking for the upper limit on power-of-two
requests directly when detecting them.
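The idea of bounding power-of-two detection can be sketched as below. This is an illustration only: the helper names, the `min_bits` parameter, and the `blocksize_bits + 2` upper bound are assumptions of the sketch, not taken from the patch.

```c
#include <assert.h>

/* 1-based position of the highest set bit, 0 when x == 0 (like the kernel's fls()). */
static int sketch_fls(unsigned int x)
{
	int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Return the buddy order for a request of len blocks, or -1 when the
 * request should not take the power-of-two path. min_bits and
 * blocksize_bits are illustrative parameters; the "+ 2" upper bound is
 * an assumption of this sketch.
 */
static int sketch_order2(unsigned int len, int min_bits, int blocksize_bits)
{
	int i = sketch_fls(len);

	if (i < min_bits || i > blocksize_bits + 2)
		return -1;	/* too small or too large for the buddy scan */
	if (len & (len - 1))
		return -1;	/* not a power of two */
	return i - 1;
}
```

With 4096-byte blocks (`blocksize_bits` of 12), a 64MB request is 16384 blocks, so `sketch_order2(16384, 2, 12)` returns -1 under these assumptions, while an 8192-block request returns 13.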
Reported-by: Ross Zwisler <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Make sure all callers follow the same locking protocol, given that DAX
transparently replaced the normal buffered I/O path.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
|
|
Unlike O_DIRECT, DAX is not an optional opt-in feature selected by the
application, so we'll have to provide the traditional synchronization
of overlapping writes as we do for buffered writes.
This was broken historically for DAX, but got fixed for ext2 and XFS
as part of the iomap conversion. Fix up ext4 as well.
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
|
|
This ioctl is modeled after XFS's XFS_IOC_GOINGDOWN ioctl. (In
fact, it uses the same code points.)
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Add a shutdown bit that will cause ext4 processing to fail immediately
with EIO.
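A minimal sketch of the fail-fast pattern, assuming a hypothetical flag bit and function names (none of them are the kernel's identifiers): an operation checks the shutdown bit first and returns -EIO without doing any work.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit position; the real flag name and value may differ. */
#define SKETCH_FLAGS_SHUTDOWN (1UL << 0)

/* Returns nonzero when the (hypothetical) shutdown bit is set. */
static int sketch_forced_shutdown(unsigned long flags)
{
	return (flags & SKETCH_FLAGS_SHUTDOWN) != 0;
}

/* Stand-in for any filesystem operation: bail out early once shut down. */
static int sketch_operation(unsigned long flags)
{
	if (sketch_forced_shutdown(flags))
		return -EIO;
	/* normal processing would happen here */
	return 0;
}
```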
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
We are currently using one bit in s_resize_flags; rename it in order
to allow more of the bits in that unsigned long for other purposes.
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
If the file system requires journal recovery, and the device is
read-only, return EROFS to the mount system call. This allows xfstests
generic/050 to pass.
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
If the journal is aborted, the needs_recovery feature flag should not
be removed. Otherwise, the journal might not get replayed and
this could lead to more data getting lost.
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
If the journal has been aborted, we shouldn't mark the underlying
buffer head as dirty, since that will cause the metadata block to get
modified. And if the journal has been aborted, we shouldn't allow
this since it will almost certainly lead to a corrupted file system.
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
The write_end() function must always unlock the page and drop its ref
count, even on an error.
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
The "half md4" transform should not be used by any new code. And
fortunately, it's only used now by ext4. Since ext4 supports several
hashing methods, at some point it might be desirable to move to
something like SipHash. As an intermediate step, remove half md4 from
cryptohash.h and lib, and make it just a local function in ext4's
hash.c. There's precedent for doing this; the other function ext4 can use
for its hashes -- TEA -- is also implemented in the same place. Also, by
being a local function, this might allow gcc to perform some additional
optimizations.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Reviewed-by: Andreas Dilger <[email protected]>
Cc: Theodore Ts'o <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
In the case where the child's encryption context was inconsistent with
its parent directory, we were using inode->i_sb and inode->i_ino after
the inode had already been iput(). Fix this by doing the iput() in the
correct places.
Note: only ext4 had this bug, not f2fs and ubifs.
Fixes: d9cdc9033181 ("ext4 crypto: enforce context consistency")
Cc: [email protected]
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
There is a synchronization issue between the unmount and kjournald2
contexts, shown below, which results in a use-after-free in kjournald2().
Fix this issue by using journal->j_state_lock to synchronize the
wait_event() done in journal_kill_thread() and the wake_up() done
in kjournald2().
TASK 1:
umount cmd:
   |--jbd2_journal_destroy() {
       |--journal_kill_thread() {
            write_lock(&journal->j_state_lock);
            journal->j_flags |= JBD2_UNMOUNT;
            ...
            write_unlock(&journal->j_state_lock);
            wake_up(&journal->j_wait_commit);
                                     TASK 2 wakes up here:
                                     kjournald2() {
                                       ...
                                       checks JBD2_UNMOUNT flag and calls goto end-loop;
                                       ...
                                       end_loop:
                                         write_unlock(&journal->j_state_lock);
                                         journal->j_task = NULL; --> If this thread gets
                                         pre-empted here, then TASK 1 wait_event will
                                         exit even before this thread is completely
                                         done.
            wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
            ...
            write_lock(&journal->j_state_lock);
            write_unlock(&journal->j_state_lock);
          }
       |--kfree(journal);
     }
}
                                         wake_up(&journal->j_wait_done_commit); --> this step
                                         now results into use after free issue.
                                     }
Signed-off-by: Sahitya Tummala <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
ext4_journalled_write_end() did not properly handle all the cases when
generic_perform_write() did not copy all the data into the target page
and could mark buffers with uninitialized contents as uptodate and dirty,
leading to possible data corruption (which would be quickly fixed by
generic_perform_write() retrying the write, but still). Fix the problem
by carefully handling the case when the page that is written to is not
uptodate.
CC: [email protected]
Reported-by: Al Viro <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
If filesystem groups are artificially small (using parameter -g to
mkfs.ext4), ext4_mb_normalize_request() can result in a request that is
larger than a block group. Trim the request size so as not to confuse
the allocation code.
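The trimming step can be sketched as a simple clamp. The function name and parameters here are hypothetical, and the real patch may apply the limit differently.

```c
#include <assert.h>

/*
 * Hypothetical helper: cap a normalized request (in blocks) at the
 * number of blocks in one block group.
 */
static unsigned int sketch_trim_request(unsigned int size,
					unsigned int blocks_per_group)
{
	return size > blocks_per_group ? blocks_per_group : size;
}
```

For example, with artificially small groups of 256 blocks, a normalized request of 4096 blocks would be trimmed to 256 in this sketch.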
Reported-by: "Kirill A. Shutemov" <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]
|
|
The last BUG_ON in mb_find_extent() is apparently triggering in some
rare cases. Most of the time it indicates a bug in the buddy bitmap
algorithms, but there are some weird cases where it can trigger when
buddy bitmap is still in memory, but the block bitmap has to be read
from disk, and there is disk or memory corruption such that the block
bitmap and the buddy bitmap are out of sync.
Google-Bug-Id: #33702157
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
There is no need to call ext4_mark_inode_dirty while holding xattr_sem
or i_data_sem, so where it's easy to avoid it, move it out from the
critical region.
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
The xattr_sem deadlock problems fixed in commit 2e81a4eeedca: "ext4:
avoid deadlock when expanding inode size" didn't include the use of
xattr_sem in fs/ext4/inline.c. With the addition of project quota
which added a new extra inode field, this exposed deadlocks in the
inline_data code similar to the ones fixed by 2e81a4eeedca.
The deadlock can be reproduced via:
dmesg -n 7
mke2fs -t ext4 -O inline_data -Fq -I 256 /dev/vdc 32768
mount -t ext4 -o debug_want_extra_isize=24 /dev/vdc /vdc
mkdir /vdc/a
umount /vdc
mount -t ext4 /dev/vdc /vdc
echo foo > /vdc/a/foo
and looks like this:
[ 11.158815]
[ 11.160276] =============================================
[ 11.161960] [ INFO: possible recursive locking detected ]
[ 11.161960] 4.10.0-rc3-00015-g011b30a8a3cf #160 Tainted: G W
[ 11.161960] ---------------------------------------------
[ 11.161960] bash/2519 is trying to acquire lock:
[ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1225a4b>] ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960]
[ 11.161960] but task is already holding lock:
[ 11.161960] (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] other info that might help us debug this:
[ 11.161960] Possible unsafe locking scenario:
[ 11.161960]
[ 11.161960] CPU0
[ 11.161960] ----
[ 11.161960] lock(&ei->xattr_sem);
[ 11.161960] lock(&ei->xattr_sem);
[ 11.161960]
[ 11.161960] *** DEADLOCK ***
[ 11.161960]
[ 11.161960] May be due to missing lock nesting notation
[ 11.161960]
[ 11.161960] 4 locks held by bash/2519:
[ 11.161960] #0: (sb_writers#3){.+.+.+}, at: [<c11a2414>] mnt_want_write+0x1e/0x3e
[ 11.161960] #1: (&type->i_mutex_dir_key){++++++}, at: [<c119508b>] path_openat+0x338/0x67a
[ 11.161960] #2: (jbd2_handle){++++..}, at: [<c123314a>] start_this_handle+0x582/0x622
[ 11.161960] #3: (&ei->xattr_sem){++++..}, at: [<c1227941>] ext4_try_add_inline_entry+0x3a/0x152
[ 11.161960]
[ 11.161960] stack backtrace:
[ 11.161960] CPU: 0 PID: 2519 Comm: bash Tainted: G W 4.10.0-rc3-00015-g011b30a8a3cf #160
[ 11.161960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1 04/01/2014
[ 11.161960] Call Trace:
[ 11.161960] dump_stack+0x72/0xa3
[ 11.161960] __lock_acquire+0xb7c/0xcb9
[ 11.161960] ? kvm_clock_read+0x1f/0x29
[ 11.161960] ? __lock_is_held+0x36/0x66
[ 11.161960] ? __lock_is_held+0x36/0x66
[ 11.161960] lock_acquire+0x106/0x18a
[ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] down_write+0x39/0x72
[ 11.161960] ? ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] ext4_expand_extra_isize_ea+0x3d/0x4cd
[ 11.161960] ? _raw_read_unlock+0x22/0x2c
[ 11.161960] ? jbd2_journal_extend+0x1e2/0x262
[ 11.161960] ? __ext4_journal_get_write_access+0x3d/0x60
[ 11.161960] ext4_mark_inode_dirty+0x17d/0x26d
[ 11.161960] ? ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960] ext4_add_dirent_to_inline.isra.12+0xa5/0xb2
[ 11.161960] ext4_try_add_inline_entry+0x69/0x152
[ 11.161960] ext4_add_entry+0xa3/0x848
[ 11.161960] ? __brelse+0x14/0x2f
[ 11.161960] ? _raw_spin_unlock_irqrestore+0x44/0x4f
[ 11.161960] ext4_add_nondir+0x17/0x5b
[ 11.161960] ext4_create+0xcf/0x133
[ 11.161960] ? ext4_mknod+0x12f/0x12f
[ 11.161960] lookup_open+0x39e/0x3fb
[ 11.161960] ? __wake_up+0x1a/0x40
[ 11.161960] ? lock_acquire+0x11e/0x18a
[ 11.161960] path_openat+0x35c/0x67a
[ 11.161960] ? sched_clock_cpu+0xd7/0xf2
[ 11.161960] do_filp_open+0x36/0x7c
[ 11.161960] ? _raw_spin_unlock+0x22/0x2c
[ 11.161960] ? __alloc_fd+0x169/0x173
[ 11.161960] do_sys_open+0x59/0xcc
[ 11.161960] SyS_open+0x1d/0x1f
[ 11.161960] do_int80_syscall_32+0x4f/0x61
[ 11.161960] entry_INT80_32+0x2f/0x2f
[ 11.161960] EIP: 0xb76ad469
[ 11.161960] EFLAGS: 00000286 CPU: 0
[ 11.161960] EAX: ffffffda EBX: 08168ac8 ECX: 00008241 EDX: 000001b6
[ 11.161960] ESI: b75e46bc EDI: b7755000 EBP: bfbdb108 ESP: bfbdafc0
[ 11.161960] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Cc: [email protected] # 3.10 (requires 2e81a4eeedca as a prereq)
Reported-by: George Spelvin <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
In order to test the inode extra isize expansion code, it is useful to
be able to easily create file systems that have inodes with extra
isize values smaller than the current desired value.
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Inside the ext4_ext_shift_extents() function, ext4_find_extent() is
called without the EXT4_EX_NOCACHE flag, which is what prevents cache
population. This leads to outdated offsets in the extents tree and wrong
blocks afterwards.
This patch fixes the problem by providing the EXT4_EX_NOCACHE flag for
each ext4_find_extent() call inside the ext4_ext_shift_extents() function.
Fixes: 331573febb6a2
Signed-off-by: Roman Pen <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
|
|
While doing 'insert range' start block should be also shifted right.
The bug can be easily reproduced by the following test:
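#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/falloc.h>

int main(void)
{
	void *ptr;
	int fd, rc, i, block;

	ptr = malloc(4096);
	assert(ptr);
	fd = open("./ext4.file", O_CREAT | O_TRUNC | O_RDWR, 0600);
	assert(fd >= 0);
	rc = fallocate(fd, 0, 0, 8192);
	assert(rc == 0);
	/* Fill the first two blocks with a recognizable pattern. */
	for (i = 0; i < 2048; i++)
		*((unsigned short *)ptr + i) = 0xbeef;
	rc = pwrite(fd, ptr, 4096, 0);
	assert(rc == 4096);
	rc = pwrite(fd, ptr, 4096, 4096);
	assert(rc == 4096);
	/* Repeatedly insert a block at offset 4096 and write the block number into it. */
	for (block = 2; block < 1000; block++) {
		rc = fallocate(fd, FALLOC_FL_INSERT_RANGE, 4096, 4096);
		assert(rc == 0);
		for (i = 0; i < 2048; i++)
			*((unsigned short *)ptr + i) = block;
		rc = pwrite(fd, ptr, 4096, 4096);
		assert(rc == 4096);
	}
	return 0;
}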
Because the start block is not included in the range, the hole appears
at the wrong offset (just after the desired offset) and the following
pwrite() overwrites an already existing block, leaving the hole untouched.
Simple way to verify wrong behaviour is to check zeroed blocks after
the test:
$ hexdump ./ext4.file | grep '0000 0000'
The root cause of the bug is a wrong range (start, stop], where start
should be inclusive, i.e. [start, stop].
This patch fixes the problem by including start in the range. But so
as not to break left shift (range collapse), stop points to the
beginning of a block, not to the end.
The other non-obvious change is an iterator validity check in the
main loop. Because the iterator is unsigned, the following corner case
should be considered with care: when inserting a block at offset 0, the
stop variable overflows and never becomes less than start, which is 0.
To handle this special case, the iterator is set to NULL to indicate
that the end of the loop has been reached.
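The unsigned corner case can be illustrated with a small sketch. It uses a hypothetical function and an explicit `done` flag in place of the NULL iterator described above, so it shows the general pattern rather than the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Visit blocks stop, stop - 1, ..., start (requires start <= stop) and
 * return how many were visited. A loop condition such as `i >= start`
 * would never become false for start == 0 with an unsigned i, because
 * decrementing 0 wraps around to the maximum value.
 */
static unsigned int sketch_walk_down(unsigned int start, unsigned int stop)
{
	unsigned int visited = 0;
	unsigned int i = stop;
	bool done = false;

	while (!done) {
		visited++;		/* stand-in for shifting block i */
		if (i == start)
			done = true;	/* explicit end marker, no wrap-around */
		else
			i--;
	}
	return visited;
}
```

The design choice is to test for the last element before decrementing, so the loop terminates even when the range begins at block 0.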
Fixes: 331573febb6a2
Signed-off-by: Roman Pen <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: Namjae Jeon <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a bunch of USB fixes for 4.10-rc3. Yeah, it's a lot, an
artifact of the holiday break I think.
Lots of gadget and the usual XHCI fixups for reported issues (one day
that driver will calm down...) Also included are a bunch of usb-serial
driver fixes, and for good measure, a number of much-reported MUSB
driver issues have finally been resolved.
All of these have been in linux-next with no reported issues"
* tag 'usb-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (72 commits)
USB: fix problems with duplicate endpoint addresses
usb: ohci-at91: use descriptor-based gpio APIs correctly
usb: storage: unusual_uas: Add JMicron JMS56x to unusual device
usb: hub: Move hub_port_disable() to fix warning if PM is disabled
usb: musb: blackfin: add bfin_fifo_offset in bfin_ops
usb: musb: fix compilation warning on unused function
usb: musb: Fix trying to free already-free IRQ 4
usb: musb: dsps: implement clear_ep_rxintr() callback
usb: musb: core: add clear_ep_rxintr() to musb_platform_ops
USB: serial: ti_usb_3410_5052: fix NULL-deref at open
USB: serial: spcp8x5: fix NULL-deref at open
USB: serial: quatech2: fix sleep-while-atomic in close
USB: serial: pl2303: fix NULL-deref at open
USB: serial: oti6858: fix NULL-deref at open
USB: serial: omninet: fix NULL-derefs at open and disconnect
USB: serial: mos7840: fix misleading interrupt-URB comment
USB: serial: mos7840: remove unused write URB
USB: serial: mos7840: fix NULL-deref at open
USB: serial: mos7720: remove obsolete port initialisation
USB: serial: mos7720: fix parallel probe
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fixes from Greg KH:
"Here are a few small char/misc driver fixes for 4.10-rc3.
Two MEI driver fixes, and three NVMEM patches for reported issues, and
a new Hyper-V driver MAINTAINER update. Nothing major at all, all have
been in linux-next with no reported issues"
* tag 'char-misc-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
hyper-v: Add myself as additional MAINTAINER
nvmem: fix nvmem_cell_read() return type doc
nvmem: imx-ocotp: Fix wrong register size
nvmem: qfprom: Allow single byte accesses for read/write
mei: move write cb to completion on credentials failures
mei: bus: fix mei_cldev_enable KDoc
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are some staging and IIO driver fixes for 4.10-rc3.
Most of these are minor IIO fixes of reported issues, along with one
network driver fix to resolve an issue. And a MAINTAINERS update with
a new mailing list. All of these, except the MAINTAINERS file update,
have been in linux-next with no reported issues (the MAINTAINERS patch
happened on Friday...)"
* tag 'staging-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
MAINTAINERS: add greybus subsystem mailing list
staging: octeon: Call SET_NETDEV_DEV()
iio: accel: st_accel: fix LIS3LV02 reading and scaling
iio: common: st_sensors: fix channel data parsing
iio: max44000: correct value in illuminance_integration_time_available
iio: adc: TI_AM335X_ADC should depend on HAS_DMA
iio: bmi160: Fix time needed to sleep after command execution
iio: 104-quad-8: Fix active level mismatch for the preset enable option
iio: 104-quad-8: Fix off-by-one errors when addressing IOR
iio: 104-quad-8: Fix index control configuration
|
|
There was an unnecessary amount of complexity around requesting the
filesystem-specific key prefix. It was unclear why; perhaps it was
envisioned that different instances of the same filesystem type could
use different key prefixes, or that key prefixes could be binary.
However, neither of those things were implemented or really make sense
at all. So simplify the code by making key_prefix a const char *.
Signed-off-by: Eric Biggers <[email protected]>
Reviewed-by: Richard Weinberger <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Nothing reads or writes fscrypt_ctx.mode, and it doesn't belong there
because a fscrypt_ctx is not tied to a specific encryption mode.
Signed-off-by: Eric Biggers <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
While we allow deletes without the key, the following should not be
permitted:
# cd /vdc/encrypted-dir-without-key
# ls -l
total 4
-rw-r--r-- 1 root root 0 Dec 27 22:35 6,LKNRJsp209FbXoSvJWzB
-rw-r--r-- 1 root root 286 Dec 27 22:35 uRJ5vJh9gE7vcomYMqTAyD
# mv uRJ5vJh9gE7vcomYMqTAyD 6,LKNRJsp209FbXoSvJWzB
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.
Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:
WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
Call Trace:
delete_node+0x1e4/0x200
__radix_tree_delete_node+0xd/0x10
shadow_lru_isolate+0xe6/0x220
__list_lru_walk_one.isra.4+0x9b/0x190
list_lru_walk_one+0x23/0x30
scan_shadow_nodes+0x2e/0x40
shrink_slab.part.44+0x23d/0x5d0
shrink_node+0x22c/0x330
kswapd+0x392/0x8f0
This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().
The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.
While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk. If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.
Consider this tree, where the s are shadow entries:
root->rnode
|
[0 n]
| |
[s ] [sssss]
Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:
root->rnode
|
[0 ]
|
[s ]
Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:
root->rnode
|
[s ]
The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.
root->rnode
|
s
Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.
Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.
Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.
Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <[email protected]>
Reported-by: Hugh Dickins <[email protected]>
Reported-by: Andrea Arcangeli <[email protected]>
Reported-and-tested-by: Dave Jones <[email protected]>
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Chris Leech <[email protected]>
Cc: Lee Duncan <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
4.10-rc loadtest (even on x86, and even without THPCache) fails with
"fork: Cannot allocate memory" or some such; and /proc/meminfo shows
PageTables growing.
Commit 953c66c2b22a ("mm: THP page cache support for ppc64") that got
merged in rc1 removed the freeing of an unused preallocated pagetable
after do_fault_around() has called map_pages().
This is usually a good optimization, so that the followup doesn't have
to reallocate one; but it's not sufficient to shift the freeing into
alloc_set_pte(), since there are failure cases (most commonly
VM_FAULT_RETRY) which never reach finish_fault().
Check and free it at the outer level in do_fault(), then we don't need
to worry in alloc_set_pte(), and can restore that to how it was (I
cannot find any reason to pte_free() under lock as it was doing).
And fix a separate pagetable leak, or crash, introduced by the same
change, that could only show up on some ppc64: why does do_set_pmd()'s
failure case attempt to withdraw a pagetable when it never deposited
one, at the same time overwriting (so leaking) the vmf->prealloc_pte?
Residue of an earlier implementation, perhaps? Delete it.
Fixes: 953c66c2b22a ("mm: THP page cache support for ppc64")
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fix from Michal Marek:
"The asm-prototypes.h file added in the last merge window results in
invalid code with CONFIG_KMEMCHECK=y. The net result is that genksyms
segfaults.
This pull request fixes the header, the genksyms fix is in my kbuild
branch for 4.11"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
asm-prototypes: Clear any CPP defines before declaring the functions
|
|
The Greybus driver subsystem has a mailing list, so list it in the
MAINTAINERS file so that people know to send patches there as well.
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Acked-by: Johan Hovold <[email protected]>
Reviewed-by: Viresh Kumar <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing particular stands out, only a few small fixes for USB-audio,
HD-audio and Firewire. The USB-audio fix is the respin of the previous
race fix after a revert due to the regression"
* tag 'sound-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
Revert "ALSA: firewire-lib: change structure member with proper type"
ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
ALSA: usb-audio: Fix irq/process data synchronization
ALSA: hda - Apply asus-mode8 fixup to ASUS X71SL
ALSA: hda - Fix up GPIO for ASUS ROG Ranger
ALSA: firewire-lib: change structure member with proper type
ALSA: firewire-tascam: Fix to handle error from initialization of stream data
ALSA: fireworks: fix asymmetric API call at unit removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"One fix for a broken driver on Renesas RZ/A1 SoCs with bootloaders
that don't turn all the clks on and another fix for stm32f4 SoCs where
we have multiple drivers attaching to the same DT node"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: stm32f4: Use CLK_OF_DECLARE_DRIVER initialization method
clk: renesas: mstp: Support 8-bit registers for r7s72100
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Fix temp1_max_alarm attribute in lm90 driver"
* tag 'hwmon-for-linus-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (lm90) fix temp1_max_alarm attribute
|
|
Pull KVM fixes from Radim Krčmář:
"MIPS:
- fix host kernel crashes when receiving a signal with 64-bit
userspace
- flush instruction cache on all vcpus after generating entry code
(both for stable)
x86:
- fix NULL dereference in MMU caused by SMM transitions (for stable)
- correct guest instruction pointer after emulating some VMX errors
- minor cleanup"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: remove duplicated declaration
KVM: MIPS: Flush KVM entry code from icache globally
KVM: MIPS: Don't clobber CP0_Status.UX
KVM: x86: reset MMU on KVM_SET_VCPU_EVENTS
KVM: nVMX: fix instruction skipping during emulated vm-entry
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- re-introduce the arm64 get_current() optimisation
- KERN_CONT fallout fix in show_pte()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: restore get_current() optimisation
arm64: mm: fix show_pte KERN_CONT fallout
|
|
Pull VFIO fixes from Alex Williamson:
- Add mtty sample driver properly into build system (Alex Williamson)
- Restore type1 mapping performance after mdev (Alex Williamson)
- Fix mdev device race (Alex Williamson)
- Cleanups to the mdev ABI used by vendor drivers (Alex Williamson)
- Build fix for old compilers (Arnd Bergmann)
- Fix sample driver error path (Dan Carpenter)
- Handle pci_iomap() error (Arvind Yadav)
- Fix mdev ioctl return type (Paul Gortmaker)
* tag 'vfio-v4.10-rc3' of git://github.com/awilliam/linux-vfio:
vfio-mdev: fix non-standard ioctl return val causing i386 build fail
vfio-pci: Handle error from pci_iomap
vfio-mdev: fix some error codes in the sample code
vfio-pci: use 32-bit comparisons for register address for gcc-4.5
vfio-mdev: Make mdev_device private and abstract interfaces
vfio-mdev: Make mdev_parent private
vfio-mdev: de-polute the namespace, rename parent_device & parent_ops
vfio-mdev: Fix remove race
vfio/type1: Restore mapping performance with mdev support
vfio-mdev: Fix mtty sample driver building
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
"This has one fix to make i915 work when using Xen SWIOTLB, and a
feature from Geert to aid in debugging of devices that can't do DMA
outside the 32-bit address space.
The feature from Geert is on top of v4.10 merge window commit
(specifically you pulling my previous branch), as his changes were
dependent on the Documentation/ movement patches.
I figured it would just be easier than me trying to cherry-pick the
Documentation patches to satisfy git.
The patches have been soaking since 12/20, albeit I updated the last
patch due to linux-next catching a compiler error and adding a
Tested-and-Reported-by tag"
* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
swiotlb: Export swiotlb_max_segment to users
swiotlb: Add swiotlb=noforce debug option
swiotlb: Convert swiotlb_force from int to enum
x86, swiotlb: Simplify pci_swiotlb_detect_override()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Three fixes queued up:
- fix an issue with command buffer overflow handling in the AMD IOMMU
driver
- add an additional context entry flush to the Intel VT-d driver to
make sure any old context entry from kdump copying is flushed out
of the cache
- correct the encoding of the PASID table size in the Intel VT-d
driver"
* tag 'iommu-fixes-v4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix the left value check of cmd buffer
iommu/vt-d: Fix pasid table size encoding
iommu/vt-d: Flush old iommu caches for kdump when the device gets context mapped
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix a device enumeration problem related to _ADR matching and an
IOMMU initialization issue related to the DMAR table missing, remove
an excessive function call from the core ACPI code, update an error
message in the ACPI WDAT watchdog driver and add a way to work around
problems with unhandled GPE notifications.
Specifics:
- Fix a device enumeration issue leading to incorrect associations
between ACPI device objects and platform device objects
representing physical devices if the given device object has both
_ADR and _HID (Rafael Wysocki).
- Avoid passing NULL to acpi_put_table() during IOMMU initialization
which triggers a (rightful) warning from ACPICA (Rafael Wysocki).
- Drop an excessive call to acpi_dma_deconfigure() from the core code
that binds ACPI device objects to device objects representing
physical devices (Lorenzo Pieralisi).
- Update an error message in the ACPI WDAT watchdog driver to make it
provide more useful information (Mika Westerberg).
- Add a mechanism to work around issues with unhandled GPE
notifications that occur during system initialization and cannot be
prevented by means of sysfs (Lv Zheng)"
* tag 'acpi-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / DMAR: Avoid passing NULL to acpi_put_table()
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPI / watchdog: Print out error number when device creation fails
ACPI / sysfs: Provide quirk mechanism to prevent GPE flooding
ACPI: Drop misplaced acpi_dma_deconfigure() call from acpi_bind_one()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a few issues in the intel_pstate driver, a documentation
issue, a false-positive compiler warning in the generic power domains
framework and two problems in the devfreq subsystem. They also update
the MAINTAINERS entry for devfreq and add a new "compatible" string to
the generic cpufreq-dt driver.
Specifics:
- Fix a few intel_pstate driver issues: add missing locking in two
places, avoid exposing a useless debugfs interface and keep the
attribute values in sysfs in sync (Rafael Wysocki).
- Drop confusing kernel-doc references related to power management
and ACPI from the driver API manual (Rafael Wysocki).
- Make a false-positive compiler warning in the generic power domains
framework go away (Augusto Mecking Caringi).
- Fix two initialization issues in the devfreq subsystem and update
the MAINTAINERS entry for it (Chanwoo Choi).
- Add a new "compatible" string for APM X-Gene 2 to the generic DT
cpufreq driver (Hoan Tran)"
* tag 'pm-4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: dt: Add support for APM X-Gene 2
PM / devfreq: exynos-bus: Fix the wrong return value
PM / devfreq: Fix the bug of devfreq_add_device when governor is NULL
MAINTAINERS: Add myself as reviewer for DEVFREQ subsystem support
PM / docs: Drop confusing kernel-doc references from infrastructure.rst
PM / domains: Fix 'may be used uninitialized' build warning
cpufreq: intel_pstate: Always keep all limits settings in sync
cpufreq: intel_pstate: Use locking in intel_cpufreq_verify_policy()
cpufreq: intel_pstate: Use locking in intel_pstate_resume()
cpufreq: intel_pstate: Do not expose PID parameters in passive mode
|
|
So they can figure out the optimal number of pages
that can be contiguously stitched together without fear of
bounce buffering.
We also expose a mechanism for sub-users of the SWIOTLB API, such
as Xen-SWIOTLB, to set the max segment value. And lastly,
if swiotlb=force is set (which mandates we bounce buffer everything),
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Reported-and-Tested-by: Juergen Gross <[email protected]>
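The max segment policy described above can be sketched in userspace C roughly
as follows. This is an illustrative sketch only, not the kernel code: every
name here (the sketch_* functions, the enum, SKETCH_PAGE_SIZE) is hypothetical,
and the 4K page size and the round-down-to-a-page behaviour are assumptions
taken from the wording of the message.

```c
#include <assert.h>

/* Assumed 4K page size, matching the "one 4K page" wording in the message. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical stand-in for the swiotlb force setting. */
enum sketch_swiotlb_force {
	SKETCH_NORMAL,   /* bounce only when a device needs it */
	SKETCH_FORCE,    /* swiotlb=force: bounce buffer everything */
	SKETCH_NO_FORCE  /* swiotlb=noforce: never bounce */
};

/* Round a byte count down to a whole number of pages. */
static unsigned int sketch_round_down_to_page(unsigned int val)
{
	return val - (val % SKETCH_PAGE_SIZE);
}

/*
 * Pick the largest segment a user should stitch together.
 * With forced bouncing, cap the segment at a single page so that a bounce
 * slot can always be found; otherwise honour the requested size, rounded
 * down to a page boundary.
 */
static unsigned int sketch_pick_max_segment(enum sketch_swiotlb_force mode,
					    unsigned int requested)
{
	assert(requested > 0); /* a zero limit is meaningless in this sketch */

	if (mode == SKETCH_FORCE)
		return SKETCH_PAGE_SIZE;
	return sketch_round_down_to_page(requested);
}
```

A sub-user such as Xen-SWIOTLB would, in this sketch, pass its own limit as
`requested`; a caller that asked for 512KB under forced bouncing gets a single
page back instead.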
|
|
* acpi-scan:
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPI: Drop misplaced acpi_dma_deconfigure() call from acpi_bind_one()
* acpi-sysfs:
ACPI / sysfs: Provide quirk mechanism to prevent GPE flooding
* acpi-wdat:
ACPI / watchdog: Print out error number when device creation fails
* acpi-tables:
ACPI / DMAR: Avoid passing NULL to acpi_put_table()
|
|
* pm-domains:
PM / domains: Fix 'may be used uninitialized' build warning
* pm-docs:
PM / docs: Drop confusing kernel-doc references from infrastructure.rst
* pm-devfreq:
PM / devfreq: exynos-bus: Fix the wrong return value
PM / devfreq: Fix the bug of devfreq_add_device when governor is NULL
MAINTAINERS: Add myself as reviewer for DEVFREQ subsystem support
|