Currently in DAX if we have three read faults on the same hole address we
can end up with the following:
Thread 0                Thread 1                        Thread 2
--------                --------                        --------
dax_iomap_fault
 grab_mapping_entry
  lock_slot
   <locks empty DAX entry>

                        dax_iomap_fault
                         grab_mapping_entry
                          get_unlocked_mapping_entry
                           <sleeps on empty DAX entry>

                                                        dax_iomap_fault
                                                         grab_mapping_entry
                                                          get_unlocked_mapping_entry
                                                           <sleeps on empty DAX entry>
  dax_load_hole
   find_or_create_page
   ...
    page_cache_tree_insert
     dax_wake_mapping_entry_waiter
      <wakes one sleeper>
     __radix_tree_replace
      <swaps empty DAX entry with 4k zero page>

                        <wakes>
                        get_page
                        lock_page
                        ...
                        put_locked_mapping_entry
                        unlock_page
                        put_page

                                                        <sleeps forever on the DAX
                                                         wait queue>
The crux of the problem is that once we insert a 4k zero page, all
locking from then on is done in terms of that 4k zero page and any
additional threads sleeping on the empty DAX entry will never be woken.
Fix this by waking all sleepers when we replace the DAX radix tree entry
with a 4k zero page. This will allow all sleeping threads to
successfully transition from locking based on the DAX empty entry to
locking on the 4k zero page.
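The difference between waking one sleeper and waking all of them can be
modelled in userspace. The sketch below is not the kernel code:
struct wait_model and replace_entry() are hypothetical names, used only to
show why a single wakeup strands the remaining waiters once locking moves
to the zero page.

```c
#include <stdbool.h>

/* Hypothetical userspace model of waiters parked on the empty DAX entry. */
struct wait_model {
	int sleepers;	/* threads asleep on the DAX entry wait queue */
	int stranded;	/* threads that can never be woken again */
};

/*
 * Model replacing the empty DAX entry with a 4k zero page.
 * After the swap nobody signals the DAX wait queue any more, so every
 * thread that is still asleep at that point is stranded.
 */
void replace_entry(struct wait_model *m, bool wake_all)
{
	int woken;

	if (wake_all)
		woken = m->sleepers;
	else
		woken = m->sleepers > 0 ? 1 : 0;

	m->sleepers -= woken;
	m->stranded = m->sleepers;
}
```

With two sleepers, waking one leaves one stranded thread; waking all leaves
none, which matches the fix described above.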
With the test case reported by Xiong this happens very regularly in my
test setup, with some runs resulting in 9+ threads in this deadlocked
state. With this fix I've been able to run that same test dozens of
times in a loop without issue.
Fixes: ac401cc78242 ("dax: New fault locking")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ross Zwisler <[email protected]>
Reported-by: Xiong Zhou <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: <[email protected]> [4.7+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
I have noticed that two different descriptions for B: entries in
MAINTAINERS were merged: commit 686564434e88 ("MAINTAINERS: Add bug
tracking system location entry type") and 2de2bd95f456 ("MAINTAINERS:
add "B:" for URI where to file bugs").
This patch keeps the description from 2de2bd95f456. There has been a
discussion [1] about whether this more detailed description is useful
and what it exactly implies. I find it more useful and general, and the
author of 686564434e88 agreed in the end that either is fine.
[1] https://lkml.org/lkml/2016/12/8/71
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The GRO fast path caches the frag0 address. This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.
This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.
This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.
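As a rough illustration of what "disabling the fast path" means, the
following userspace sketch models the cached frag0 state. struct gro_state
and both helpers are hypothetical stand-ins, not the kernel's own
definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-packet GRO state described above. */
struct gro_state {
	const unsigned char *frag0;	/* cached address of the first fragment */
	unsigned int frag0_len;		/* bytes readable through frag0 */
};

/* Disable the fast path: later header reads must take the slow path. */
void gro_frag0_invalidate(struct gro_state *st)
{
	st->frag0 = NULL;
	st->frag0_len = 0;
}

/* The fast path may only be used while hlen bytes fit in the cached frag0. */
int gro_can_use_fast_path(const struct gro_state *st, unsigned int hlen)
{
	return st->frag0 != NULL && hlen <= st->frag0_len;
}
```

Once the cache is invalidated, every later length check fails and the code
has to pull the header the slow way, so a stale frag0 address is never
dereferenced.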
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0. However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.
This patch adds the check by capping frag0_len with the skb tailroom.
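The cap itself is a simple minimum. A minimal sketch, assuming the fragment
size and the tailroom are already known as plain integers (the function
name and parameters are hypothetical):

```c
/* Hypothetical model: how many bytes of frag0 the fast path may expose. */
unsigned int gro_frag0_len(unsigned int frag0_size, unsigned int tailroom)
{
	return frag0_size < tailroom ? frag0_size : tailroom;
}
```

For example, a 1448-byte fragment with 256 bytes of tailroom yields a
frag0_len of 256, so headers longer than that fall back to the slow path.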
Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs")
changed EOPNOTSUPP to ENOTSUPP by mistake. This patch changes it back.
Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs")
Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
With commit e53743994e21
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.
cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).
Signed-off-by: Julian Wiedmann <[email protected]>
Signed-off-by: Ursula Braun <[email protected]>
Cc: <[email protected]> # v4.8+
Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.
Cc: Courtney Cavin <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Signed-off-by: Suman Anna <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Failure to mark this pointer as __le32 causes checkers like
sparse to complain:
net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident>
Silence it.
Cc: Bjorn Andersson <[email protected]>
Signed-off-by: Stephen Boyd <[email protected]>
Acked-by: Bjorn Andersson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It is perfectly possible for a DSA switch tree to contain only
non-zero-indexed switches. In such a case, we will be dereferencing a NULL
pointer in dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.
Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <[email protected]>
Reviewed-by: Vivien Didelot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"This update consists of fixes to use shell instead of bash to run
tests in embedded devices where the only shell available is the
busybox ash.
Also included is a typo fix to a test result message"
* tag 'linux-kselftest-4.10-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: x86/pkeys: fix spelling mistake: "itertation" -> "iteration"
selftests: do not require bash to run netsocktests testcase
selftests: do not require bash to run bpf tests
selftests: do not require bash for the generated test
|
|
If blk_mq_init_queue() returns an error, it gets assigned to
vblk->disk->queue. Then, when we call put_disk(), we end up calling
blk_put_queue() with the ERR_PTR, causing a bad dereference. Fix it by
only assigning to vblk->disk->queue on success.
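The pattern can be sketched in userspace as follows. err_ptr(), is_err()
and ptr_err() are local stand-ins that imitate the kernel's ERR_PTR(),
IS_ERR() and PTR_ERR(); struct queue, struct disk, init_queue() and probe()
are hypothetical and exist only for this illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel's error-pointer helpers. */
void *err_ptr(long error) { return (void *)error; }
int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO; }
long ptr_err(const void *p) { return (long)(intptr_t)p; }

struct queue { int refs; };
struct disk { struct queue *queue; };

/* Hypothetical allocator: returns an error pointer when told to fail. */
struct queue *init_queue(struct queue *backing, int fail)
{
	if (fail)
		return err_ptr(-ENOMEM);
	backing->refs = 1;
	return backing;
}

/* Keep the result in a local and publish it only on success. */
int probe(struct disk *d, struct queue *backing, int fail)
{
	struct queue *q = init_queue(backing, fail);

	if (is_err(q))
		return (int)ptr_err(q);	/* d->queue is left untouched */
	d->queue = q;
	return 0;
}
```

Because the error pointer never reaches the disk structure, a later cleanup
that walks d->queue only ever sees NULL or a valid queue.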
Signed-off-by: Omar Sandoval <[email protected]>
Reviewed-by: Jeff Moyer <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Additionally, don't assign directly to disk->queue, otherwise
blk_put_queue (called via put_disk) will choke (panic) on the errno
stored there.
Bug found by code inspection after Omar found a similar issue in
virtio_blk. Compile-tested only.
Signed-off-by: Jeff Moyer <[email protected]>
Reviewed-by: Omar Sandoval <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Most users of BLOCK_PC requests allocate the sense buffer on the stack,
so to avoid DMA to the stack copy them to a field in the heap allocated
virtblk_req structure. Without that any attempt at SCSI passthrough I/O,
including the SG_IO ioctl from userspace will crash the kernel. Note that
this includes running tools like hdparm even when the host does not have
SCSI passthrough enabled.
Signed-off-by: Christoph Hellwig <[email protected]>
Cc: [email protected] # v4.9+
Signed-off-by: Jens Axboe <[email protected]>
|
|
The code currently uses sdio->blkbits to compute the number of blocks to
be cleaned. However sdio->blkbits is derived from the logical block size
of the underlying block device (Refer to the definition of
do_blockdev_direct_IO()). Due to this, generic/299 test would rarely
fail when executed on an ext4 filesystem with 64k as the block size and
when using a virtio based disk (having 512 byte as the logical block
size) inside a kvm guest.
This commit fixes the bug by using inode->i_blkbits to compute the
number of blocks to be cleaned.
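The size of the miscount is easy to see with plain arithmetic. The helper
below is a hypothetical illustration, not the direct-io code: it converts a
mapped byte range into a block count for a given block-size shift.

```c
/* Number of blocks covered by map_bytes for a given block-size shift. */
unsigned long blocks_to_clean(unsigned long map_bytes, unsigned int blkbits)
{
	return map_bytes >> blkbits;
}
```

A single 64k filesystem block (shift 16) is one block, but the same range
counted with the 512-byte logical block size (shift 9) comes out as 128.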
Signed-off-by: Chandan Rajendra <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Fixed up by Jeff Moyer to only use/evaluate inode->i_blkbits once,
to avoid issues with block size changes with IO in flight.
Signed-off-by: Jens Axboe <[email protected]>
|
|
Removes the following sparse complaint:
net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?
Fixes: 972d3876faa8 ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If srp_transfer_data fails within ibmvscsis_write_pending, then
the most likely scenario is that the client timed out the op and
removed the TCE mapping. Thus it will loop forever retrying the
op that is pretty much guaranteed to fail forever. A better return
code would be EIO instead of EAGAIN.
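Why the return code matters can be shown with a small model of a caller
that retries only on -EAGAIN. Both functions are hypothetical, and the
max_tries bound exists only so that the sketch terminates.

```c
#include <errno.h>

/* Hypothetical transfer that keeps failing because the mapping is gone. */
int failing_transfer(int error_code)
{
	return -error_code;
}

/*
 * Model of a caller that retries only on -EAGAIN. max_tries bounds the
 * loop so the sketch terminates; the situation described above has no
 * such bound. Returns the number of attempts made.
 */
int attempts_until_give_up(int error_code, int max_tries)
{
	int tries = 0;

	while (tries < max_tries) {
		tries++;
		if (failing_transfer(error_code) != -EAGAIN)
			break;
	}
	return tries;
}
```

With EAGAIN the model uses up every allowed attempt; with EIO it stops
after the first one.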
Cc: [email protected]
Reported-by: Steven Royer <[email protected]>
Tested-by: Steven Royer <[email protected]>
Signed-off-by: Bryant G. Ly <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
MUSB driver now has runtime PM support, but the debugfs driver misses
the PM _get/_put() calls, which could cause MUSB register access
failure.
Cc: [email protected] # 4.9+
Acked-by: Tony Lindgren <[email protected]>
Signed-off-by: Bin Liu <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
Hayes Wang says:
====================
r8152: fix autosuspend issue
Avoid rx being split into two parts when runtime suspend occurs.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Pause the rx and make sure the rx fifo is empty when the autosuspend
occurs.
If rx data arrives while the driver is canceling the rx urb, the host
controller stops fetching the data from the device and resumes
after the next rx urb is submitted. That is, one contiguous piece of data
is split across two different urb buffers. The driver then treats the
data as an rx descriptor, and unexpected behavior happens.
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Split rtl8152_suspend() into rtl8152_system_suspend() and
rtl8152_runtime_suspend().
Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
SPC4r37 6.4.1 EXTENDED COPY(LID4) states (also applying to LID1 reqs):
A parameter list length of zero specifies that the copy manager shall
not transfer any data or alter any internal state, and this shall not
be considered an error.
This behaviour can be tested using the libiscsi ExtendedCopy.ParamHdr
test.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
Check for XCOPY header, CSCD descriptor and segment descriptor list
truncation, and respond accordingly.
SPC4r37 6.4.1 EXTENDED COPY(LID4) states (also applying to LID1 reqs):
If the parameter list length causes truncation of the parameter list,
then the copy manager shall transfer no data and shall terminate the
EXTENDED COPY command with CHECK CONDITION status, with the sense key
set to ILLEGAL REQUEST, and the additional sense code set to PARAMETER
LIST LENGTH ERROR.
This behaviour can be tested using the libiscsi ExtendedCopy.ParamHdr
test.
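The two SPC rules quoted in this log (a zero parameter list length is not
an error; a truncating length is) can be sketched as a single length check.
This is a minimal model, assuming a 16-byte XCOPY header and that the
descriptor list lengths have already been read; the enum and function names
are hypothetical.

```c
#include <stdint.h>

#define XCOPY_HDR_LEN 16u	/* assumed size of the EXTENDED COPY header */

enum xcopy_len_check {
	XCOPY_LEN_OK,
	XCOPY_LEN_NOOP,		/* zero length: no transfer, not an error */
	XCOPY_LEN_TRUNCATED,	/* maps to PARAMETER LIST LENGTH ERROR */
};

enum xcopy_len_check xcopy_check_len(uint32_t param_list_len,
				     uint32_t cscd_list_len,
				     uint32_t seg_list_len)
{
	/* 64-bit sum so that large list lengths cannot wrap around. */
	uint64_t needed = (uint64_t)XCOPY_HDR_LEN + cscd_list_len + seg_list_len;

	if (param_list_len == 0)
		return XCOPY_LEN_NOOP;
	if (param_list_len < XCOPY_HDR_LEN)
		return XCOPY_LEN_TRUNCATED;	/* header itself is cut short */
	if (needed > param_list_len)
		return XCOPY_LEN_TRUNCATED;	/* a descriptor list is cut short */
	return XCOPY_LEN_OK;
}
```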
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
The XCOPY specification in SPC4r37 states that the XCOPY source and
destination device(s) should be derived from the copy source and copy
destination (CSCD) descriptor IDs in the XCOPY segment descriptor.
The CSCD IDs are generally (for block -> block copies), indexes into
the corresponding CSCD descriptor list, e.g.
=================================
EXTENDED COPY Header
=================================
CSCD Descriptor List
- entry 0
+ LU ID <--------------<------------------\
- entry 1 |
+ LU ID <______________<_____________ |
================================= | |
Segment Descriptor List | |
- segment 0 | |
+ src CSCD ID = 0 --------->---------+----/
+ dest CSCD ID = 1 ___________>______|
+ len
+ src lba
+ dest lba
=================================
Currently LIO completely ignores the src and dest CSCD IDs in the
Segment Descriptor List, and instead assumes that the first entry in the
CSCD list corresponds to the source, and the second to the destination.
This commit removes this assumption, by ensuring that the Segment
Descriptor List is parsed prior to processing the CSCD Descriptor List.
CSCD Descriptor List processing is modified to compare the current list
index with the previously obtained src and dest CSCD IDs.
Additionally, XCOPY requests where the src and dest CSCD IDs refer to
the CSCD Descriptor List entry can now be successfully processed.
Fixes: cbf031f ("target: Add support for EXTENDED_COPY copy offload")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191381
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
Ensure that the segment descriptor CSCD descriptor ID values correspond
to CSCD descriptor entries located in the XCOPY command parameter list.
SPC4r37 6.4.6.1 Table 150 specifies this range as 0000h to 07FFh, where
the CSCD descriptor location in the parameter list can be located via:
16 + (id * 32)
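A minimal sketch of that lookup, combining the ID range from Table 150 with
a bounds check against the CSCD descriptor list length. The function name
and the -1 error convention are assumptions made for this example.

```c
#define XCOPY_HDR_LEN	16u	/* offset of the first CSCD descriptor */
#define CSCD_DESC_LEN	32u	/* size of one CSCD descriptor */
#define CSCD_ID_MAX	0x07FFu	/* highest ID that refers to a list entry */

/*
 * Return the byte offset of CSCD descriptor `id` within the parameter
 * list, or -1 if the ID is out of range or lies beyond the list.
 */
long cscd_desc_offset(unsigned int id, unsigned int cscd_list_len)
{
	if (id > CSCD_ID_MAX)
		return -1;
	if ((id + 1) * CSCD_DESC_LEN > cscd_list_len)
		return -1;
	return (long)(XCOPY_HDR_LEN + id * CSCD_DESC_LEN);
}
```

For a list holding two descriptors (64 bytes), ID 0 maps to offset 16,
ID 1 to offset 48, and ID 2 is rejected.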
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
[ bvanassche: inserted "; " in the format string of an error message
and also moved a "||" operator from the start of a line to the end
of the previous line ]
Signed-off-by: Bart Van Assche <[email protected]>
|
|
target_xcopy_locate_se_dev_e4() is used to locate an se_dev, based on
the WWN provided with the XCOPY request. Remove a couple of unneeded
arguments, and rely on the caller for the src/dst test.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
Use UNSUPPORTED TARGET DESCRIPTOR TYPE CODE and UNSUPPORTED SEGMENT
DESCRIPTOR TYPE CODE additional sense codes if a descriptor type in an
XCOPY request is not supported, as specified in spc4r37 6.4.5 and 6.4.6.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
spc4r37 6.4.3.5 states:
If the combined length of the CSCD descriptors and segment descriptors
exceeds the allowed value, then the copy manager shall terminate the
command with CHECK CONDITION status, with the sense key set to ILLEGAL
REQUEST, and the additional sense code set to PARAMETER LIST LENGTH
ERROR.
This functionality can be tested using the libiscsi
ExtendedCopy.DescrLimits test.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
Check the length of the XCOPY request segment descriptor list against
the value advertised via the MAXIMUM SEGMENT DESCRIPTOR COUNT field in
the RECEIVE COPY OPERATING PARAMETERS response.
spc4r37 6.4.3.5 states:
If the number of segment descriptors exceeds the allowed number, the
copy manager shall terminate the command with CHECK CONDITION status,
with the sense key set to ILLEGAL REQUEST, and the additional sense
code set to TOO MANY SEGMENT DESCRIPTORS.
This functionality is testable using the libiscsi
ExtendedCopy.DescrLimits test.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
spc4r37 6.4.3.4 states:
If the number of CSCD descriptors exceeds the allowed number, the copy
manager shall terminate the command with CHECK CONDITION status, with
the sense key set to ILLEGAL REQUEST, and the additional sense code
set to TOO MANY TARGET DESCRIPTORS.
LIO currently responds with INVALID FIELD IN PARAMETER LIST, which sees
it fail the libiscsi ExtendedCopy.DescrLimits test.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
As defined in http://www.t10.org/lists/asc-num.htm. To be used during
validation of XCOPY target and segment descriptor lists.
Signed-off-by: David Disseldorp <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Bart Van Assche <[email protected]>
|
|
Make sockfs_setattr() static as it is not used outside of net/socket.c
This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]
Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <[email protected]>
Signed-off-by: Tobias Klauser <[email protected]>
Acked-by: Lorenzo Colitti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
wireless-drivers fixes for 4.10
Only two fixes at this time. The rtlwifi fix is an important one as it
fixes a reported oops and Linus was already asking about it. The
orinoco fix is not tested on a real device, because it's old legacy
hardware and hardly anyone uses it, but it should fix a (theoretical)
issue with VMAP_STACK.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it. This doesn't work with virtual
stacks. Use ZERO_PAGE instead.
Cc: [email protected] # 4.9 only
Reported-by: Eric Biggers <[email protected]>
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The current implementation failed to detect short transfers when
attempting to read the line state, and also, to make things worse,
logged the content of the uninitialised heap transfer buffer.
Fixes: abf492e7b3ae ("USB: kl5kusb105: fix DMA buffers on stack")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
|
|
'asoc/fix/tlv320aic3x' and 'asoc/fix/topology' into asoc-linus
|
|
'asoc/fix/dwc', 'asoc/fix/fsl-ssi' and 'asoc/fix/hdmi-codec' into asoc-linus
|
|
Plantronics BT600 does not support reading the sample rate which leads
to many lines of "cannot get freq at ep 0x1" and "cannot get freq at
ep 0x82". This patch adds the USB ID of the BT600 to quirks.c and
avoids those error messages.
Signed-off-by: Dennis Kadioglu <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
|
|
This change adds the tmpfs modification that was missed in the
CVE-2016-7097 fix, commit 073931017b49 ("posix_acl: Clear SGID bit when
setting file permissions").
It can be tested with xfstests generic/375, which fails to clear the
setgid bit in the following test case on tmpfs:
touch $testfile
chown 100:100 $testfile
chmod 2755 $testfile
_runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile
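The expected mode change can be modelled in a few lines. This is a sketch
under the assumption that the setgid bit is cleared when the caller is
neither in the file's owning group nor privileged; the function, its
parameters and the MODEL_S_ISGID constant are hypothetical.

```c
#include <stdbool.h>

#define MODEL_S_ISGID 02000u	/* setgid bit, as in chmod 2755 */

/* Model: file mode that results from an ACL update by the given caller. */
unsigned int acl_update_mode(unsigned int mode, bool caller_in_owning_group,
			     bool caller_privileged)
{
	if (!caller_in_owning_group && !caller_privileged)
		mode &= ~MODEL_S_ISGID;
	return mode;
}
```

In the test case above the file belongs to group 100 and setfacl runs with
group 101, so mode 2755 is expected to become 0755.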
Signed-off-by: Gu Zheng <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
We do not yet have users of port_window. The following errors were found
when converting the tusb6010_omap.c musb driver:
- The peripheral side must have SRC_/DST_PACKED disabled
- When configuring the burst for the peripheral side, the memory side
configuration was overwritten: d->csdp = ... -> d->csdp |= ...
- The EI and FI were configured for the wrong sides of the transfers.
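The second item is the classic assign-versus-OR mistake. In the sketch
below the bit values are hypothetical placeholders, not the real CSDP
register layout.

```c
#include <stdint.h>

/* Hypothetical bit positions; the real CSDP layout is not reproduced here. */
#define MEM_SIDE_BURST		0x0080u
#define PERIPH_SIDE_BURST	0x4000u

/* Buggy form: assigning drops whatever the memory side already configured. */
uint32_t csdp_assign(uint32_t csdp)
{
	csdp = PERIPH_SIDE_BURST;
	return csdp;
}

/* Fixed form: OR-ing keeps the memory-side bits and adds the peripheral ones. */
uint32_t csdp_or(uint32_t csdp)
{
	csdp |= PERIPH_SIDE_BURST;
	return csdp;
}
```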
With these changes and the converted tusb6010_omap.c I was able to verify
that things are working as expected.
Fixes: 201ac4861c19 ("dmaengine: omap-dma: Support for slave devices with data port window")
Signed-off-by: Peter Ujfalusi <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
|
|
allocations.
On a kernel with DEBUG_LOCKS, ioat_free_chan_resources triggers an
in_interrupt() warning. With PROVE_LOCKING, it reports detecting a
SOFTIRQ-safe to SOFTIRQ-unsafe lock ordering in the same code path.
This is because dma_generic_alloc_coherent() checks if the GFP flags
permit blocking. It allocates from different subsystems if blocking is
permitted. The free path knows how to return the memory to the correct
allocator. If GFP_KERNEL is specified then the alloc and free end up
going through cma_alloc(), which uses mutexes.
Given that ioat_free_chan_resources() can be called in interrupt
context, ioat_alloc_chan_resources() must specify GFP_NOWAIT so that the
allocations do not block and instead use an allocator that uses
spinlocks.
Signed-off-by: Krister Johansen <[email protected]>
Acked-by: Dave Jiang <[email protected]>
Signed-off-by: Vinod Koul <[email protected]>
|
|
Fixes CVE-2016-9191: proc_sys_readdir doesn't drop the reference
added by grab_header when returning from the !dir_emit_dots path.
This can cause any path that calls unregister_sysctl_table to
wait forever.
The calltrace of CVE-2016-9191:
[ 5535.960522] Call Trace:
[ 5535.963265] [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817] [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346] [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256] [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972] [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804] [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227] [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648] [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654] [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657] [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344] [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447] [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844] [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336] [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186] [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899] [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420] [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234] [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049] [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571] [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207] [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736] [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461] [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697] [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426] [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991] [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041] [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230
One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex. The offlining
ends up trying to drain active usages of a sysctl table which apparently
is not happening."
The real reason is that proc_sys_readdir doesn't drop the reference added
by grab_header when returning from the !dir_emit_dots path. So this cpuset
offline path will wait here forever.
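The leak can be modelled with a plain counter. The sketch below is not the
procfs code: struct sysctl_header_model and readdir_model() are
hypothetical, and the drop_on_early_exit flag switches between the buggy
and the fixed behaviour.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a sysctl table header with a use count. */
struct sysctl_header_model {
	int used;
};

int readdir_model(struct sysctl_header_model *head, bool dots_ok,
		  bool drop_on_early_exit)
{
	head->used++;			/* stands in for grab_header() */

	if (!dots_ok) {
		if (drop_on_early_exit)
			head->used--;	/* fixed: release before the early return */
		return 0;		/* buggy variant returns with used still raised */
	}

	/* ... emit the remaining directory entries ... */

	head->used--;			/* normal exit drops the reference */
	return 0;
}
```

In the buggy variant the count never returns to zero, so anything waiting
for the header to become unused waits forever.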
See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13
Fixes: f0c3b5093add ("[readdir] convert procfs")
Cc: [email protected]
Reported-by: CAI Qian <[email protected]>
Tested-by: Yang Shukui <[email protected]>
Signed-off-by: Zhou Chengming <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G W
---------------------------------------------------------
swapper/1/0 just changed the state of lock:
(&(&sighand->siglock)->rlock){-.....}, at: [<ffffffffbd0a1bc6>] __lock_task_sighand+0xb6/0x2c0
but this lock took another, HARDIRQ-unsafe lock in the past:
(ucounts_lock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of: &(&sighand->siglock)->rlock --> &(&tty->ctrl_lock)->rlock --> ucounts_lock
Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(ucounts_lock);
                               local_irq_disable();
                               lock(&(&sighand->siglock)->rlock);
                               lock(&(&tty->ctrl_lock)->rlock);
  <Interrupt>
    lock(&(&sighand->siglock)->rlock);

 *** DEADLOCK ***
This patch removes a dependency between rlock and ucounts_lock.
Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Cc: [email protected]
Signed-off-by: Andrei Vagin <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
Add MS_KERNMOUNT to the flags that are passed.
Use sget_userns and force &init_user_ns instead of calling sget so that
even if called from a weird context the internal filesystem will be
considered to be in the initial user namespace.
Luis Ressel reported that the failure to pass MS_KERNMOUNT into
mount_pseudo broke his in-development graphics driver that uses the
generic drm infrastructure. I am not certain the driver was bug
free in its usage of that infrastructure, but since
mount_pseudo_xattr can never be triggered by userspace it is clearer,
less error prone, and less problematic for the code to be explicit.
Reported-by: Luis Ressel <[email protected]>
Tested-by: Luis Ressel <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire, at which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen at the same time.
Krister Johansen <[email protected]> reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint. This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock. Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisoned by another thread.
Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.
I have taken Al's suggested patch, moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and replaced
new_mountpoint with get_mountpoint, a function that does the hash table
lookup and addition under the mount_lock. The introduction of get_mountpoint
ensures that only the mount_lock is needed to manipulate the mountpoint
hashtable.
d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set. This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.
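The "set only if not already set" behaviour can be sketched as a
test-and-set on a flags word. This is a single-threaded model: the flag
value and function name are hypothetical, the -EBUSY return is an assumed
convention for "already set", and the locking that makes the real operation
safe is deliberately left out.

```c
#include <errno.h>

#define MODEL_DCACHE_MOUNTED 0x1u	/* hypothetical flag value */

/* Set the flag once; report -EBUSY if somebody else already set it. */
int set_mounted_once(unsigned int *d_flags)
{
	if (*d_flags & MODEL_DCACHE_MOUNTED)
		return -EBUSY;
	*d_flags |= MODEL_DCACHE_MOUNTED;
	return 0;
}
```

Only the caller that sees a return value of 0 goes on to add the mountpoint
structure; every later caller learns that one already exists.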
Cc: [email protected]
Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen <[email protected]>
Suggested-by: Al Viro <[email protected]>
Acked-by: Al Viro <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
|
|
In generic_load_microcode(), curr_mc_size is the size of the last
allocated buffer and since we have this performance "optimization"
there to vmalloc a new buffer only when the current one is bigger,
curr_mc_size ends up becoming the size of the biggest buffer we've seen
so far.
However, we end up saving the microcode patch which matches our CPU
and its size is not curr_mc_size but the respective mc_size during the
iteration while we're staring at it.
So save that mc_size into a separate variable and use it to store the
previously found microcode buffer.
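The mismatch between the two sizes can be reproduced with a small model of
the scan loop. struct scan_result and scan_patches() are hypothetical; the
sketch only tracks sizes and does not touch real microcode data.

```c
#include <stdbool.h>
#include <stddef.h>

struct scan_result {
	size_t buf_size;	/* size of the scratch buffer, which only grows */
	size_t saved_size;	/* size of the patch that matched the CPU */
};

/* Walk the patches in order, growing the buffer only when needed. */
struct scan_result scan_patches(const size_t *mc_size, const bool *matches,
				size_t n)
{
	struct scan_result r = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		if (mc_size[i] > r.buf_size)
			r.buf_size = mc_size[i];	/* reallocate for a bigger patch */
		if (matches[i])
			r.saved_size = mc_size[i];	/* remember this patch's own size */
	}
	return r;
}
```

If a 4096-byte patch is followed by a matching 2048-byte one, the buffer
size stays at 4096 while the patch to save is only 2048 bytes long, so
copying buf_size bytes would read past the end of the matching patch.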
Without this fix, we could get oops like this:
BUG: unable to handle kernel paging request at ffffc9000e30f000
IP: __memcpy+0x12/0x20
...
Call Trace:
? kmemdup+0x43/0x60
__alloc_microcode_buf+0x44/0x70
save_microcode_patch+0xd4/0x150
generic_load_microcode+0x1b8/0x260
request_microcode_user+0x15/0x20
microcode_write+0x91/0x100
__vfs_write+0x34/0x120
vfs_write+0xc1/0x130
SyS_write+0x56/0xc0
do_syscall_64+0x6c/0x160
entry_SYSCALL64_slow_path+0x25/0x25
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We allocate struct ucode_patch here. @size is the size of microcode data
and used for kmemdup() later in this function.
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading")
Signed-off-by: Jun'ichi Nomura <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Since on Intel we're required to do CPUID(1) first, before reading
the microcode revision MSR, let's add a special helper which does the
required steps so that we don't forget to do them next time, when we
want to read the microcode revision.
Signed-off-by: Borislav Petkov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|