Age | Commit message (Collapse) | Author | Files | Lines |
|
There are still quite a few cases where a device might want
to get to a different node of the device-tree, obtain the
resources and map them.
We have of_iomap() and of_io_request_and_map() but they both
have shortcomings, such as not returning the size of the
resource found (which can be useful) and not being "managed".
This adds a devm_of_iomap() that provides all of these and
should probably replace uses of the above in most drivers.
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
Reviewed-by: Joel Stanley <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core kernel fixes from Ingo Molnar:
"This is mostly the copy_to_user_mcsafe() related fixes from Dan
Williams, and an ORC fix for Clang"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
lib/iov_iter: Document _copy_to_iter_flushcache()
lib/iov_iter: Document _copy_to_iter_mcsafe()
objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang
|
|
This change implements get_ownership() for ksets created with
kset_create_and_add() call by fetching ownership data from parent kobject.
This is done mostly for benefit of "queues" attribute of net devices so
that corresponding directory belongs to container's root instead of global
root for network devices in a container.
Signed-off-by: Dmitry Torokhov <[email protected]>
Reviewed-by: Tyler Hicks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Normally kobjects and their sysfs representation belong to global root,
however it is not necessarily the case for objects in separate namespaces.
For example, objects in separate network namespace logically belong to the
container's root and not global root.
This change lays groundwork for allowing network namespace objects
ownership to be transferred to container's root user by defining
get_ownership() callback in ktype structure and using it in sysfs code to
retrieve desired uid/gid when creating sysfs objects for given kobject.
Co-Developed-by: Tyler Hicks <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Tyler Hicks <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
All conflicts were trivial overlapping changes, so reasonably
easy to resolve.
Signed-off-by: David S. Miller <[email protected]>
|
|
rhashtable_init() currently does not take into account the user-passed
min_size parameter unless param->nelem_hint is set as well. As such,
the default size (number of buckets) will always be HASH_DEFAULT_SIZE
even if the smallest allowed size is larger than that. Remediate this
by unconditionally calling into rounded_hashtable_size() and handling
things accordingly.
Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently printing [hashed] pointers requires enough entropy to be
available. Early in the boot sequence this may not be the case
resulting in a dummy string '(____ptrval____)' being printed. This
makes debugging the early boot sequence difficult. We can relax the
requirement to use cryptographically secure hashing during debugging.
This enables debugging while keeping development/production kernel
behaviour the same.
If new command line option debug_boot_weak_hash is enabled use
cryptographically insecure hashing and hash pointer value immediately.
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Tobin C. Harding <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Currently we must wait for enough entropy to become available before
hashed pointers can be printed. We can remove this wait by using the
hw RNG if available.
Use hw RNG to get keying material.
Reviewed-by: Steven Rostedt (VMware) <[email protected]>
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Tobin C. Harding <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
By mistake the ITER_PIPE early-exit / warning from copy_from_iter() was
cargo-culted in _copy_to_iter_mcsafe() rather than a machine-check-safe
version of copy_to_iter_pipe().
Implement copy_pipe_to_iter_mcsafe() being careful to return the
indication of short copies due to a CPU exception.
Without this regression-fix all splice reads to dax-mode files fail.
Reported-by: Ross Zwisler <[email protected]>
Tested-by: Ross Zwisler <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Acked-by: Al Viro <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Fixes: 8780356ef630 ("x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe()")
Link: http://lkml.kernel.org/r/153108277278.37979.3327916996902264102.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add some theory of operation documentation to _copy_to_iter_flushcache().
Reported-by: Al Viro <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/153108276767.37979.9462477994086841699.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add some theory of operation documentation to _copy_to_iter_mcsafe().
Reported-by: Al Viro <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Link: http://lkml.kernel.org/r/153108276256.37979.1689794213845539316.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <[email protected]>
|
|
In many cases, it would be useful to be able to use the full
sanity-checked refcount helpers regardless of CONFIG_REFCOUNT_FULL,
as this would help to avoid duplicate warnings where callers try to
sanity-check refcount manipulation.
This patch refactors things such that the full refcount helpers were
always built, as refcount_${op}_checked(), such that they can be used
regardless of CONFIG_REFCOUNT_FULL. This will allow code which *always*
wants a checked refcount to opt-in, avoiding the need to duplicate the
logic for warnings.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Will Deacon <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The current doc build warns:
./lib/reed_solomon/reed_solomon.c:287: WARNING: Unknown target name: "gfp".
This is because it misinterprets the "GFP_" that is part of the
description. Change the description to avoid the problem.
Signed-off-by: Matthew Wilcox <[email protected]>
Acked-by: Kees Cook <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
|
|
rhashtable_free_and_destroy() cancels re-hash deferred work
then walks and destroys elements. at this moment, some elements can be
still in future_tbl. that elements are not destroyed.
test case:
nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
all elements of sets before destroying sets and chains.
But rhashtable_free_and_destroy() doesn't destroy elements of future_tbl.
so that splat occurred.
test script:
%cat test.nft
table ip aa {
map map1 {
type ipv4_addr : verdict;
elements = {
0 : jump a0,
1 : jump a0,
2 : jump a0,
3 : jump a0,
4 : jump a0,
5 : jump a0,
6 : jump a0,
7 : jump a0,
8 : jump a0,
9 : jump a0,
}
}
chain a0 {
}
}
flush ruleset
table ip aa {
map map1 {
type ipv4_addr : verdict;
elements = {
0 : jump a0,
1 : jump a0,
2 : jump a0,
3 : jump a0,
4 : jump a0,
5 : jump a0,
6 : jump a0,
7 : jump a0,
8 : jump a0,
9 : jump a0,
}
}
chain a0 {
}
}
flush ruleset
%while :; do nft -f test.nft; done
Splat looks like:
[ 200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
[ 200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
[ 200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[ 200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
[ 200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
[ 200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
[ 200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
[ 200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
[ 200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
[ 200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
[ 200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
[ 200.906354] FS: 00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
[ 200.915533] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
[ 200.930353] Call Trace:
[ 200.932351] ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
[ 200.939525] ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
[ 200.947525] ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
[ 200.952383] ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
[ 200.959532] ? nla_parse+0xab/0x230
[ 200.963529] ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
[ 200.968384] ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
[ 200.975525] ? debug_show_all_locks+0x290/0x290
[ 200.980363] ? debug_show_all_locks+0x290/0x290
[ 200.986356] ? sched_clock_cpu+0x132/0x170
[ 200.990352] ? find_held_lock+0x39/0x1b0
[ 200.994355] ? sched_clock_local+0x10d/0x130
[ 200.999531] ? memset+0x1f/0x40
V2:
- free all tables requested by Herbert Xu
Signed-off-by: Taehee Yoo <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI
when logbuf_lock is available") brought back the possible deadlocks
in printk() and NMI.
The check of logbuf_lock is done only in printk_nmi_enter() to prevent
mixed output. But another CPU might take the lock later, enter NMI, and:
+ Both NMIs might be serialized by yet another lock, for example,
the one in nmi_cpu_backtrace().
+ The other CPU might get stopped in NMI, see smp_send_stop()
in panic().
The only safe solution is to use trylock when storing the message
into the main log-buffer. It might cause reordering when some lines
go to the main lock buffer directly and others are delayed via
the per-CPU buffer. It means that it is not useful in general.
This patch replaces the problematic NMI deferred context with NMI
direct context. It can be used to mark a code that might produce
many messages in NMI and the risk of losing them is more critical
than problems with eventual reordering.
The context is then used when dumping trace buffers on oops. It was
the primary motivation for the original fix. Also the reordering is
even smaller issue there because some traces have their own time stamps.
Finally, nmi_cpu_backtrace() need not longer be serialized because
it will always us the per-CPU buffers again.
Fixes: 719f6a7040f1bdaf96 ("printk: Use the main logbuf in NMI when logbuf_lock is available")
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
To: Steven Rostedt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: [email protected]
Cc: [email protected]
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
|
|
gcc 8.1.0 complains:
lib/kobject.c:128:3: warning:
'strncpy' output truncated before terminating nul copying as many
bytes from a string as its length [-Wstringop-truncation]
lib/kobject.c: In function 'kobject_get_path':
lib/kobject.c:125:13: note: length computed here
Using strncpy() is indeed less than perfect since the length of data to
be copied has already been determined with strlen(). Replace strncpy()
with memcpy() to address the warning and optimize the code a little.
Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The new added "reciprocal_value_adv" implements the advanced version of the
algorithm described in Figure 4.2 of the paper except when
"divisor > (1U << 31)" whose ceil(log2(d)) result will be 32 which then
requires u128 divide on host. The exception case could be easily handled
before calling "reciprocal_value_adv".
The advanced version requires more complex calculation to get the
reciprocal multiplier and other control variables, but then could reduce
the required emulation operations.
It makes no sense to use this advanced version for host divide emulation,
those extra complexities for calculating multiplier etc could completely
waive our saving on emulation operations.
However, it makes sense to use it for JIT divide code generation (for
example eBPF JIT backends) for which we are willing to trade performance of
JITed code with that of host. As shown by the following pseudo code, the
required emulation operations could go down from 6 (the basic version) to 3
or 4.
To use the result of "reciprocal_value_adv", suppose we want to calculate
n/d, the C-style pseudo code will be the following, it could be easily
changed to real code generation for other JIT targets.
struct reciprocal_value_adv rvalue;
u8 pre_shift, exp;
// handle exception case.
if (d >= (1U << 31)) {
result = n >= d;
return;
}
rvalue = reciprocal_value_adv(d, 32)
exp = rvalue.exp;
if (rvalue.is_wide_m && !(d & 1)) {
// floor(log2(d & (2^32 -d)))
pre_shift = fls(d & -d) - 1;
rvalue = reciprocal_value_adv(d >> pre_shift, 32 - pre_shift);
} else {
pre_shift = 0;
}
// code generation starts.
if (imm == 1U << exp) {
result = n >> exp;
} else if (rvalue.is_wide_m) {
// pre_shift must be zero when reached here.
t = (n * rvalue.m) >> 32;
result = n - t;
result >>= 1;
result += t;
result >>= rvalue.sh - 1;
} else {
if (pre_shift)
result = n >> pre_shift;
result = ((u64)result * rvalue.m) >> 32;
result >>= rvalue.sh;
}
Signed-off-by: Jiong Wang <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
into drm-next
A patchset worked out together with Peter Zijlstra. Ingo is OK with taking
it through the DRM tree:
This is a small fallout from a work to allow batching WW mutex locks and
unlocks.
Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but
the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to
"Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach suggested
here is to implement also the "Wound-Wait" algorithm as a per-WW-class
choice, as it has advantages in some cases. See for example
http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/deadlock-compare.html
Now Wound-Wait is a preemptive algorithm, and the preemption is implemented
using a lazy scheme: If a wounded transaction is about to go to sleep on
a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock
prevention. Since with WW mutexes we also require the aborted transaction to
sleep waiting to lock the WW mutex it was aborted on, this choice also provides
a suitable WW mutex to sleep on. If we were to return -EDEADLK on the first
WW mutex lock after the transaction was wounded whether the WW mutex was
contended or not, the transaction might frequently be restarted without a wait,
which is far from optimal. Note also that with the lazy preemption scheme,
contrary to Wait-Die there will be no rollbacks on lock contention of locks
held by a transaction that has completed its locking sequence.
The modeset locks are then changed from Wait-Die to Wound-Wait since the
typical locking pattern of those locks very well matches the criterion for
a substantial reduction in the number of rollbacks. For reservation objects,
the benefit is more unclear at this point and they remain using Wait-Die.
Signed-off-by: Dave Airlie <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.
1. ioremap with 4K size, a valid pte page table is set.
2. iounmap it, its pte entry is set to 0.
3. ioremap the same address with 2M size, update its pmd entry with
a new value.
4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling to pte mappings in the above
case on ARM64.
To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.
Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that TLB purge can be added later in seprate patches.
[[email protected]: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Will Deacon <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: [email protected]
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
In the quest to remove all stack VLA usage from the kernel[1], this moves
the "$#" replacement from being an argument to being inside the function,
which avoids generating VLAs.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Martin Schwidefsky <[email protected]>
|
|
In file lib/rhashtable.c line 777, skip variable is assigned to
itself. The following error was observed:
lib/rhashtable.c:777:41: warning: explicitly assigning value of
variable of type 'int' to itself [-Wself-assign] error, forbidden
warning: rhashtable.c:777
This error was found when compiling with Clang 6.0. Change it to iter->skip.
Signed-off-by: Rishabh Bhatnagar <[email protected]>
Acked-by: Herbert Xu <[email protected]>
Reviewed-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The current Wound-Wait mutex algorithm is actually not Wound-Wait but
Wait-Die. Implement also Wound-Wait as a per-ww-class choice. Wound-Wait
is, contrary to Wait-Die a preemptive algorithm and is known to generate
fewer backoffs. Testing reveals that this is true if the
number of simultaneous contending transactions is small.
As the number of simultaneous contending threads increases, Wait-Wound
becomes inferior to Wait-Die in terms of elapsed time.
Possibly due to the larger number of held locks of sleeping transactions.
Update documentation and callers.
Timings using git://people.freedesktop.org/~thomash/ww_mutex_test
tag patch-18-06-15
Each thread runs 100000 batches of lock / unlock 800 ww mutexes randomly
chosen out of 100000. Four core Intel x86_64:
Algorithm #threads Rollbacks time
Wound-Wait 4 ~100 ~17s.
Wait-Die 4 ~150000 ~19s.
Wound-Wait 16 ~360000 ~109s.
Wait-Die 16 ~450000 ~82s.
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Gustavo Padovan <[email protected]>
Cc: Maarten Lankhorst <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Kate Stewart <[email protected]>
Cc: Philippe Ombredanne <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Co-authored-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
|
|
Simple overlapping changes in stmmac driver.
Adjust skb_gro_flush_final_remcsum function signature to make GRO list
changes in net-next, as per Stephen Rothwell's example merge
resolution.
Signed-off-by: David S. Miller <[email protected]>
|
|
In the scsi_transport_srp implementation it cannot be avoided to
iterate over a klist from atomic context when using the legacy block
layer instead of blk-mq. Hence this patch that makes it safe to use
klists in atomic context. This patch avoids that lockdep reports the
following:
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&(&k->k_lock)->rlock);
local_irq_disable();
lock(&(&q->__queue_lock)->rlock);
lock(&(&k->k_lock)->rlock);
<Interrupt>
lock(&(&q->__queue_lock)->rlock);
stack backtrace:
Workqueue: kblockd blk_timeout_work
Call Trace:
dump_stack+0xa4/0xf5
check_usage+0x6e6/0x700
__lock_acquire+0x185d/0x1b50
lock_acquire+0xd2/0x260
_raw_spin_lock+0x32/0x50
klist_next+0x47/0x190
device_for_each_child+0x8e/0x100
srp_timed_out+0xaf/0x1d0 [scsi_transport_srp]
scsi_times_out+0xd4/0x410 [scsi_mod]
blk_rq_timed_out+0x36/0x70
blk_timeout_work+0x1b5/0x220
process_one_work+0x4fe/0xad0
worker_thread+0x63/0x5a0
kthread+0x1c1/0x1e0
ret_from_fork+0x24/0x30
See also commit c9ddf73476ff ("scsi: scsi_transport_srp: Fix shost to
rport translation").
Signed-off-by: Bart Van Assche <[email protected]>
Cc: Martin K. Petersen <[email protected]>
Cc: James Bottomley <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
|
|
Pull networking fixes from David Miller:
1) Verify netlink attributes properly in nf_queue, from Eric Dumazet.
2) Need to bump memory lock rlimit for test_sockmap bpf test, from
Yonghong Song.
3) Fix VLAN handling in lan78xx driver, from Dave Stevenson.
4) Fix uninitialized read in nf_log, from Jann Horn.
5) Fix raw command length parsing in mlx5, from Alex Vesker.
6) Cleanup loopback RDS connections upon netns deletion, from Sowmini
Varadhan.
7) Fix regressions in FIB rule matching during create, from Jason A.
Donenfeld and Roopa Prabhu.
8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren.
9) More bpfilter build fixes/adjustments from Masahiro Yamada.
10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper
Dangaard Brouer.
11) fib_tests.sh file permissions were broken, from Shuah Khan.
12) Make sure BH/preemption is disabled in data path of mac80211, from
Denis Kenzior.
13) Don't ignore nla_parse_nested() return values in nl80211, from
Johannes berg.
14) Properly account sock objects ot kmemcg, from Shakeel Butt.
15) Adjustments to setting bpf program permissions to read-only, from
Daniel Borkmann.
16) TCP Fast Open key endianness was broken, it always took on the host
endiannness. Whoops. Explicitly make it little endian. From Yuching
Cheng.
17) Fix prefix route setting for link local addresses in ipv6, from
David Ahern.
18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva.
19) Various bpf sockmap fixes, from John Fastabend.
20) Use after free for GRO with ESP, from Sabrina Dubroca.
21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from
Eric Biggers.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
qede: Adverstise software timestamp caps when PHC is not available.
qed: Fix use of incorrect size in memcpy call.
qed: Fix setting of incorrect eswitch mode.
qed: Limit msix vectors in kdump kernel to the minimum required count.
ipvlan: call dev_change_flags when ipvlan mode is reset
ipv6: sr: fix passing wrong flags to crypto_alloc_shash()
net: fix use-after-free in GRO with ESP
tcp: prevent bogus FRTO undos with non-SACK flows
bpf: sockhash, add release routine
bpf: sockhash fix omitted bucket lock in sock_close
bpf: sockmap, fix smap_list_map_remove when psock is in many maps
bpf: sockmap, fix crash when ipv6 sock is added
net: fib_rules: bring back rule_exists to match rule during add
hv_netvsc: split sub-channel setup into async and sync
net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN
atm: zatm: Fix potential Spectre v1
s390/qeth: consistently re-enable device features
s390/qeth: don't clobber buffer on async TX completion
s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6]
s390/qeth: fix race when setting MAC address
...
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-07-01
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) A bpf_fib_lookup() helper fix to change the API before freeze to
return an encoding of the FIB lookup result and return the nexthop
device index in the params struct (instead of device index as return
code that we had before), from David.
2) Various BPF JIT fixes to address syzkaller fallout, that is, do not
reject progs when set_memory_*() fails since it could still be RO.
Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was
an issue, and a memory leak in s390 JIT found during review, from
Daniel.
3) Multiple fixes for sockmap/hash to address most of the syzkaller
triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(),
a missing sock_map_release() routine to hook up to callbacks, and a
fix for an omitted bucket lock in sock_close(), from John.
4) Two bpftool fixes to remove duplicated error message on program load,
and another one to close the libbpf object after program load. One
additional fix for nfp driver's BPF offload to avoid stopping offload
completely if replace of program failed, from Jakub.
5) Couple of BPF selftest fixes that bail out in some of the test
scripts if the user does not have the right privileges, from Jeffrin.
6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set
where we need to set the flag that some of the test cases are expected
to fail, from Kleber.
7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF
since it has no relation to it and lirc2 users often have configs
without cgroups enabled and thus would not be able to use it, from Sean.
8) Fix a selftest failure in sockmap by removing a useless setrlimit()
call that would set a too low limit where at the same time we are
already including bpf_rlimit.h that does the job, from Yonghong.
9) Fix BPF selftest config with missing missing NET_SCHED, from Anders.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull block fixes from Jens Axboe:
"Small set of fixes for this series. Mostly just minor fixes, the only
oddball in here is the sg change.
The sg change came out of the stall fix for NVMe, where we added a
mempool and limited us to a single page allocation. CONFIG_SG_DEBUG
sort-of ruins that, since we'd need to account for that. That's
actually a generic problem, since lots of drivers need to allocate SG
lists. So this just removes support for CONFIG_SG_DEBUG, which I added
back in 2007 and to my knowledge it was never useful.
Anyway, outside of that, this pull contains:
- clone of request with special payload fix (Bart)
- drbd discard handling fix (Bart)
- SATA blk-mq stall fix (me)
- chunk size fix (Keith)
- double free nvme rdma fix (Sagi)"
* tag 'for-linus-20180629' of git://git.kernel.dk/linux-block:
sg: remove ->sg_magic member
drbd: Fix drbd_request_prepare() discard handling
blk-mq: don't queue more if we get a busy return
block: Fix cloning of requests with a special payload
nvme-rdma: fix possible double free of controller async event buffer
block: Fix transfer when chunk sectors exceeds max
|
|
This was introduced more than a decade ago when sg chaining was
added, but we never really caught anything with it. The scatterlist
entry size can be critical, since drivers allocate it, so remove
the magic member. Recently it's been triggering allocation stalls
and failures in NVMe.
Tested-by: Jordan Glover <[email protected]>
Acked-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Flag with FLAG_EXPECTED_FAIL the BPF_MAXINSNS tests that cannot be jited
on s390 because they exceed BPF_SIZE_MAX and fail when
CONFIG_BPF_JIT_ALWAYS_ON is set. Also set .expected_errcode to -ENOTSUPP
so the tests pass in that case.
Signed-off-by: Kleber Sacilotto de Souza <[email protected]>
Acked-by: Song Liu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk fix from Petr Mladek:
"Revert a commit that went in by mistake. I already have a better fix
in the queue for 4.19"
* tag 'printk-for-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
Revert "lib/test_printf.c: call wait_for_random_bytes() before plain %p tests"
|
|
KASAN depends on having access to some of the accounting that SLUB_DEBUG
does; without it, there are immediate crashes [1]. So, the natural
thing to do is to make KASAN select SLUB_DEBUG.
[1] http://lkml.kernel.org/r/CAHmME9rtoPwxUSnktxzKso14iuVCWT7BE_-_8PAC=pGw1iJnQg@mail.gmail.com
Link: http://lkml.kernel.org/r/[email protected]
Fixes: f9e13c0a5a33 ("slab, slub: skip unnecessary kasan_cache_shutdown()")
Signed-off-by: Jason A. Donenfeld <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In commit 804209d8a009 ("lib/percpu_ida.c: use _irqsave() instead of
local_irq_save() + spin_lock") I inlined alloc_local_tag() and mixed up
the >= check from percpu_ida_alloc() with the one in alloc_local_tag().
Don't alloc from per-CPU freelist if ->nr_free is zero.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 804209d8a009 ("lib/percpu_ida.c: use _irqsave() instead of local_irq_save() + spin_lock")
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Reported-by: David Disseldorp <[email protected]>
Tested-by: David Disseldorp <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Nicholas Bellinger <[email protected]>
Cc: Shaohua Li <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Jens Axboe <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Have one extack message for parsing and validating.
Signed-off-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add tests for the bitfield helpers. The constant ones will all
be folded to nothing by the compiler (if everything is correct
in the header file), and the variable ones do some tests against
open-coding the necessary shifts.
A few test cases that should fail/warn compilation are provided
under ifdef.
Suggested-by: Andy Shevchenko <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
|
|
The goal of passing the "quiet" option to the kernel is for the kernel
to be quiet unless something really is wrong.
Sofar passing quiet has been (mostly) equivalent to passing
loglevel=4 on the kernel commandline. Which means to show any messages
with a level of KERN_ERR or higher severity on the console.
In practice this often does not result in a quiet boot though, since
there are many false-positive or otherwise harmless error messages printed,
defeating the purpose of the quiet option. Esp. the ACPICA code is really
bad wrt this, but there are plenty of others too.
This commit makes CONSOLE_LOGLEVEL_QUIET configurable.
This for example will allow distros which want quiet to really mean quiet
to set CONSOLE_LOGLEVEL_QUIET so that only messages with a higher severity
then KERN_ERR (CRIT, ALERT, EMERG) get printed, avoiding an endless game
of whack-a-mole silencing harmless error messages.
Link: http://lkml.kernel.org/r/[email protected]
To: Petr Mladek <[email protected]>
To: Sergey Senozhatsky <[email protected]>
Cc: Hans de Goede <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Signed-off-by: Hans de Goede <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
|
|
This reverts commit ee410f15b1418f2f4428e79980674c979081bcb7.
It might prevent the machine from boot. It would wait for enough
randomness at the very beginning of kernel_init(). But there is
basically nothing running in parallel that would help to produce
any randomness.
Reported-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Petr Mladek <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"A set of fixes and updates for the locking code:
- Prevent lockdep from updating irq state within its own code and
thereby confusing itself.
- Buid fix for older GCCs which mistreat anonymous unions
- Add a missing lockdep annotation in down_read_non_onwer() which
causes up_read_non_owner() to emit a lockdep splat
- Remove the custom alpha dec_and_lock() implementation which is
incorrect in terms of ordering and use the generic one.
The remaining two commits are not strictly fixes. They provide irqsave
variants of atomic_dec_and_lock() and refcount_dec_and_lock(). These
are required to merge the relevant updates and cleanups into different
maintainer trees for 4.19, so routing them into mainline without
actual users is the sanest approach.
They should have been in -rc1, but last weekend I took the liberty to
just avoid computers in order to regain some mental sanity"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/qspinlock: Fix build for anonymous union in older GCC compilers
locking/lockdep: Do not record IRQ state within lockdep code
locking/rwsem: Fix up_read_non_owner() warning with DEBUG_RWSEMS
locking/refcounts: Implement refcount_dec_and_lock_irqsave()
atomic: Add irqsave variant of atomic_dec_and_lock()
alpha: Remove custom dec_and_lock() implementation
|
|
Using rht_dereference_bucket() to dereference
->future_tbl looks like a type error, and could be confusing.
Using rht_dereference_rcu() to test a pointer for NULL
adds an unnecessary barrier - rcu_access_pointer() is preferred
for NULL tests when no lock is held.
This uses 3 different ways to access ->future_tbl.
- if we know the mutex is held, use rht_dereference()
- if we don't hold the mutex, and are only testing for NULL,
use rcu_access_pointer()
- otherwise (using RCU protection for true dereference),
use rht_dereference_rcu().
Note that this includes a simplification of the call to
rhashtable_last_table() - we don't do an extra dereference
before the call any more.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Rather than borrowing one of the bucket locks to
protect ->future_tbl updates, use cmpxchg().
This gives more freedom to change how bucket locking
is implemented.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that we don't use the hash value or shift in nested_table_alloc()
there is room for simplification.
We only need to pass a "is this a leaf" flag to nested_table_alloc(),
and don't need to track as much information in
rht_bucket_nested_insert().
Note there is another minor cleanup in nested_table_alloc() here.
The number of elements in a page of "union nested_tables" is most naturally
PAGE_SIZE / sizeof(ntbl[0])
The previous code had
PAGE_SIZE / sizeof(ntbl[0].bucket)
which happens to be the correct value only because the bucket uses all
the space in the union.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 'ht' and 'hash' arguments to INIT_RHT_NULLS_HEAD() are
no longer used - so drop them. This allows us to also
remove the nhash argument from nested_table_alloc().
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This "feature" is unused, undocumented, and untested and so doesn't
really belong. A patch is under development to properly implement
support for detecting when a search gets diverted down a different
chain, which the common purpose of nulls markers.
This patch actually fixes a bug too. The table resizing allows a
table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
any growth beyond 2^27 is wasteful an ineffective.
This patch results in NULLS_MARKER(0) being used for all chains,
and leaves the use of rht_is_a_null() to test for it.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small changes can required a large recompilation.
This makes development painful.
This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code. rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
print_ht in rhashtable_test calls rht_dereference() with neither
RCU protection or the mutex. This triggers an RCU warning.
So take the mutex to silence the warning.
Acked-by: Herbert Xu <[email protected]>
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the quest to remove all stack VLA usage from the kernel[1], this
allocates a fixed size stack array to cover the range needed for
bch. This was done instead of a preallocation on the SLAB due to
performance reasons, shown by Ivan Djelic:
little-endian, type sizes: int=4 long=8 longlong=8
cpu: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
calibration: iter=4.9143µs niter=2034 nsamples=200 m=13 t=4
Buffer allocation | Encoding throughput (Mbit/s)
---------------------------------------------------
on-stack, VLA | 3988
on-stack, fixed | 4494
kmalloc | 1967
So this change actually improves performance too, it seems.
The resulting stack allocation can get rather large; without
CONFIG_BCH_CONST_PARAMS, it will allocate 4096 bytes, which
trips the stack size checking:
lib/bch.c: In function ‘encode_bch’:
lib/bch.c:261:1: warning: the frame size of 4432 bytes is larger than 2048 bytes [-Wframe-larger-than=]
Even the default case for "allmodconfig" (with CONFIG_BCH_CONST_M=14 and
CONFIG_BCH_CONST_T=4) would have started throwing a warning:
lib/bch.c: In function ‘encode_bch’:
lib/bch.c:261:1: warning: the frame size of 2288 bytes is larger than 2048 bytes [-Wframe-larger-than=]
But this is how large it's always been; it was just hidden from
the checker because it was a VLA. So the Makefile has been adjusted to
silence this warning for anything smaller than 4500 bytes, which should
provide room for normal cases, but still low enough to catch any future
pathological situations.
[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Ivan Djelic <[email protected]>
Tested-by: Ivan Djelic <[email protected]>
Acked-by: Boris Brezillon <[email protected]>
Signed-off-by: Boris Brezillon <[email protected]>
|
|
Debloat <linux/refcount.h>'s dependencies:
- <linux/kernel.h> is not needed, but <linux/compiler.h> is.
- <linux/mutex.h> is not needed, only a forward declaration of "struct mutex".
- <linux/spinlock.h> is not needed, <linux/spinlock_types.h> is enough.
Signed-off-by: Alexey Dobriyan <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Will Deacon <[email protected]>
Link: https://lkml.kernel.org/lkml/20180331220036.GA7676@avx2
Signed-off-by: Ingo Molnar <[email protected]>
|