Age | Commit message (Collapse) | Author | Files | Lines |
|
Open vSwitch provides code to pop an MPLS header to a packet. In
preparation for supporting this in TC, move the pop code to an skb helper
that can be reused.
Remove the, now unused, update_ethertype static function from OvS.
Signed-off-by: John Hurley <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Open vSwitch provides code to push an MPLS header to a packet. In
preparation for supporting this in TC, move the push code to an skb helper
that can be reused.
Signed-off-by: John Hurley <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Acked-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Two cases of overlapping changes, nothing fancy.
Signed-off-by: David S. Miller <[email protected]>
|
|
skb_warn_bad_offload and netdev_rx_csum_fault trigger on hard to debug
issues. Dump more state and the header.
Optionally dump the entire packet and linear segment. This is required
to debug checksum bugs that may include bytes past skb_tail_pointer().
Both call sites call this function inside a net_ratelimit() block.
Limit full packet log further to a hard limit of can_dump_full (5).
Based on an earlier patch by Cong Wang, see link below.
Changes v1 -> v2
- dump frag_list only on full_pkt
Link: https://patchwork.ozlabs.org/patch/1000841/
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.
Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).
This is particularly expensive on unconnected sockets multiplexing
many flows, such as QUIC.
In the common case, where no lease is exclusive, the check can be
safely elided, as both lease request and check trivially succeed.
Indeed, autoflowlabel does the same even with exclusive leases.
Elide the check if no process has requested an exclusive lease.
fl6_sock_lookup previously returns either a reference to a lease or
NULL to denote failure. Modify to return a real error and update
all callers. On return NULL, they can use the label and will elide
the atomic_dec in fl6_sock_release.
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Changes RFC->v1:
- use static_key_false_deferred to rate limit jump label operations
- call static_key_deferred_flush to stop timers on exit
- move decrement out of RCU context
- defer optimization also if opt data is associated with a lease
- updated all fp6_sock_lookup callers, not just udp
Signed-off-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull keyring namespacing from David Howells:
"These patches help make keys and keyrings more namespace aware.
Firstly some miscellaneous patches to make the process easier:
- Simplify key index_key handling so that the word-sized chunks
assoc_array requires don't have to be shifted about, making it
easier to add more bits into the key.
- Cache the hash value in the key so that we don't have to calculate
on every key we examine during a search (it involves a bunch of
multiplications).
- Allow keying_search() to search non-recursively.
Then the main patches:
- Make it so that keyring names are per-user_namespace from the point
of view of KEYCTL_JOIN_SESSION_KEYRING so that they're not
accessible cross-user_namespace.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.
- Move the user and user-session keyrings to the user_namespace
rather than the user_struct. This prevents them propagating
directly across user_namespaces boundaries (ie. the KEY_SPEC_*
flags will only pick from the current user_namespace).
- Make it possible to include the target namespace in which the key
shall operate in the index_key. This will allow the possibility of
multiple keys with the same description, but different target
domains to be held in the same keyring.
keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.
- Make it so that keys are implicitly invalidated by removal of a
domain tag, causing them to be garbage collected.
- Institute a network namespace domain tag that allows keys to be
differentiated by the network namespace in which they operate. New
keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned
the network domain in force when they are created.
- Make it so that the desired network namespace can be handed down
into the request_key() mechanism. This allows AFS, NFS, etc. to
request keys specific to the network namespace of the superblock.
This also means that the keys in the DNS record cache are
thenceforth namespaced, provided network filesystems pass the
appropriate network namespace down into dns_query().
For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other
cache keyrings, such as idmapper keyrings, also need to set the
domain tag - for which they need access to the network namespace of
the superblock"
* tag 'keys-namespace-20190627' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Pass the network namespace into request_key mechanism
keys: Network namespace domain tag
keys: Garbage collect keys for which the domain has been removed
keys: Include target namespace in match criteria
keys: Move the user and user-session keyrings to the user_namespace
keys: Namespace keyring names
keys: Add a 'recurse' flag for keyring searches
keys: Cache the hash value to avoid lots of recalculation
keys: Simplify key description management
|
|
If an app is playing tricks to reuse a socket via tcp_disconnect(),
bytes_acked/received needs to be reset to 0. Otherwise tcp_info will
report the sum of the current and the old connection..
Cc: Eric Dumazet <[email protected]>
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info")
Signed-off-by: Christoph Paasch <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
IFLA_BOND_PEER_NOTIF_DELAY was set to the value of downdelay instead
of peer_notif_delay. After this change, the correct value is exported.
Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications")
Signed-off-by: Vincent Bernat <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to. As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq. RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).
Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
we do have an RCU-delayed part there already (freeing the wq),
so it's not like the pipe situation; moreover, it might be
worth considering coallocating wq with the rest of struct sock_alloc.
->sk_wq in struct sock would remain a pointer as it is, but
the object it normally points to would be coallocated with
struct socket...
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We have a dedicated pointer for that, so use it. Much easier to read and
less computation involved.
Reported-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull request_key improvements from David Howells:
"These are all request_key()-related, including a fix and some improvements:
- Fix the lack of a Link permission check on a key found by
request_key(), thereby enabling request_key() to link keys that
don't grant this permission to the target keyring (which must still
grant Write permission).
Note that the key must be in the caller's keyrings already to be
found.
- Invalidate used request_key authentication keys rather than
revoking them, so that they get cleaned up immediately rather than
hanging around till the expiry time is passed.
- Move the RCU locks outwards from the keyring search functions so
that a request_key_rcu() can be provided. This can be called in RCU
mode, so it can't sleep and can't upcall - but it can be called
from LOOKUP_RCU pathwalk mode.
- Cache the latest positive result of request_key*() temporarily in
task_struct so that filesystems that make a lot of request_key()
calls during pathwalk can take advantage of it to avoid having to
redo the searching. This requires CONFIG_KEYS_REQUEST_CACHE=y.
It is assumed that the key just found is likely to be used multiple
times in each step in an RCU pathwalk, and is likely to be reused
for the next step too.
Note that the cleanup of the cache is done on TIF_NOTIFY_RESUME,
just before userspace resumes, and on exit"
* tag 'keys-request-20190626' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Kill off request_key_async{,_with_auxdata}
keys: Cache result of request_key*() temporarily in task_struct
keys: Provide request_key_rcu()
keys: Move the RCU locks outwards from the keyring search functions
keys: Invalidate used request_key authentication keys
keys: Fix request_key() lack of Link perm check on found key
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-09
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Lots of libbpf improvements: i) addition of new APIs to attach BPF
programs to tracing entities such as {k,u}probes or tracepoints,
ii) improve specification of BTF-defined maps by eliminating the
need for data initialization for some of the members, iii) addition
of a high-level API for setting up and polling perf buffers for
BPF event output helpers, all from Andrii.
2) Add "prog run" subcommand to bpftool in order to test-run programs
through the kernel testing infrastructure of BPF, from Quentin.
3) Improve verifier for BPF sockaddr programs to support 8-byte stores
for user_ip6 and msg_src_ip6 members given clang tends to generate
such stores, from Stanislav.
4) Enable the new BPF JIT zero-extension optimization for further
riscv64 ALU ops, from Luke.
5) Fix a bpftool json JIT dump crash on powerpc, from Jiri.
6) Fix an AF_XDP race in generic XDP's receive path, from Ilya.
7) Various smaller fixes from Ilya, Yue and Arnd.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull misc keyring updates from David Howells:
"These are some miscellaneous keyrings fixes and improvements:
- Fix a bunch of warnings from sparse, including missing RCU bits and
kdoc-function argument mismatches
- Implement a keyctl to allow a key to be moved from one keyring to
another, with the option of prohibiting key replacement in the
destination keyring.
- Grant Link permission to possessors of request_key_auth tokens so
that upcall servicing daemons can more easily arrange things such
that only the necessary auth key is passed to the actual service
program, and not all the auth keys a daemon might possesss.
- Improvement in lookup_user_key().
- Implement a keyctl to allow keyrings subsystem capabilities to be
queried.
The keyutils next branch has commits to make available, document and
test the move-key and capabilities code:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log
They're currently on the 'next' branch"
* tag 'keys-misc-20190619' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
keys: Add capability-checking keyctl function
keys: Reuse keyring_index_key::desc_len in lookup_user_key()
keys: Grant Link permission to possessers of request_key auth keys
keys: Add a keyctl to move a key between keyrings
keys: Hoist locking out of __key_link_begin()
keys: Break bits out of key_unlink()
keys: Change keyring_serialise_link_sem to a mutex
keys: sparse: Fix kdoc mismatches
keys: sparse: Fix incorrect RCU accesses
keys: sparse: Fix key_fs[ug]id_changed()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"Like the audit pull request this is a little early due to some
upcoming vacation plans and uncertain network access while I'm away.
Also like the audit PR, the list of patches here is pretty minor, the
highlights include:
- Explicitly use __le variables to make sure "sparse" can verify
proper byte endian handling.
- Remove some BUG_ON()s that are no longer needed.
- Allow zero-byte writes to the "keycreate" procfs attribute without
requiring key:create to make it easier for userspace to reset the
keycreate label.
- Consistently log the "invalid_context" field as an untrusted string
in the AUDIT_SELINUX_ERR audit records"
* tag 'selinux-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: format all invalid context as untrusted
selinux: fix empty write to keycreate file
selinux: remove some no-op BUG_ONs
selinux: provide __le variables explicitly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"This pull request is a bit early, but with some vacation time coming
up I wanted to send this out now just in case the remote Internet Gods
decide not to smile on me once the merge window opens. The patchset
for v5.3 is pretty minor this time, the highlights include:
- When the audit daemon is sent a signal, ensure we deliver
information about the sender even when syscall auditing is not
enabled/supported.
- Add the ability to filter audit records based on network address
family.
- Tighten the audit field filtering restrictions on string based
fields.
- Cleanup the audit field filtering verification code.
- Remove a few BUG() calls from the audit code"
* tag 'audit-pr-20190702' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: remove the BUG() calls in the audit rule comparison functions
audit: enforce op for string fields
audit: add saddr_fam filter field
audit: re-structure audit field valid checks
audit: deliver signal_info regarless of syscall
|
|
Pull tpm updates from Jarkko Sakkinen:
"This contains two critical bug fixes and support for obtaining TPM
events triggered by ExitBootServices().
For the latter I have to give a quite verbose explanation not least
because I had to revisit all the details myself to remember what was
going on in Matthew's patches.
The preboot software stack maintains an event log that gets entries
every time something gets hashed to any of the PCR registers. What
gets hashed could be a component to be run or perhaps log of some
actions taken just to give couple of coarse examples. In general,
anything relevant for the boot process that the preboot software does
gets hashed and a log entry with a specific event type [1].
The main application for this is remote attestation and the reason why
it is useful is nicely put in the very first section of [1]:
"Attestation is used to provide information about the platform’s
state to a challenger. However, PCR contents are difficult to
interpret; therefore, attestation is typically more useful when
the PCR contents are accompanied by a measurement log. While not
trusted on their own, the measurement log contains a richer set of
information than do the PCR contents. The PCR contents are used to
provide the validation of the measurement log."
Because EFI_TCG2_PROTOCOL.GetEventLog() is not available after calling
ExitBootServices(), Linux EFI stub copies the event log to a custom
configuration table. Unfortunately, ExitBootServices() also generates
events and obviously these events do not get copied to that table.
Luckily firmware does this for us by providing a configuration table
identified by EFI_TCG2_FINAL_EVENTS_TABLE_GUID.
This essentially contains necessary changes to provide the full event
log for the use the user space that is concatenated from these two
partial event logs [2]"
[1] https://trustedcomputinggroup.org/resource/pc-client-specific-platform-firmware-profile-specification/
[2] The final concatenation is done in drivers/char/tpm/eventlog/efi.c
* tag 'tpmdd-next-20190625' of git://git.infradead.org/users/jjs/linux-tpmdd:
tpm: Don't duplicate events from the final event log in the TCG2 log
Abstract out support for locating an EFI config table
tpm: Fix TPM 1.2 Shutdown sequence to prevent future TPM operations
efi: Attempt to get the TCG2 event log in the boot stub
tpm: Append the final event log to the TPM event log
tpm: Reserve the TPM final events table
tpm: Abstract crypto agile event size calculations
tpm: Actually fail on TPM errors during "get random"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 topology updates from Ingo Molnar:
"Implement multi-die topology support on Intel CPUs and expose the die
topology to user-space tooling, by Len Brown, Kan Liang and Zhang Rui.
These changes should have no effect on the kernel's existing
understanding of topologies, i.e. there should be no behavioral impact
on cache, NUMA, scheduler, perf and other topologies and overall
system performance"
* 'x86-topology-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/rapl: Cosmetic rename internal variables in response to multi-die/pkg support
perf/x86/intel/uncore: Cosmetic renames in response to multi-die/pkg support
hwmon/coretemp: Cosmetic: Rename internal variables to zones from packages
thermal/x86_pkg_temp_thermal: Cosmetic: Rename internal variables to zones from packages
perf/x86/intel/cstate: Support multi-die/package
perf/x86/intel/rapl: Support multi-die/package
perf/x86/intel/uncore: Support multi-die/package
topology: Create core_cpus and die_cpus sysfs attributes
topology: Create package_cpus sysfs attribute
hwmon/coretemp: Support multi-die/package
powercap/intel_rapl: Update RAPL domain name and debug messages
thermal/x86_pkg_temp_thermal: Support multi-die/package
powercap/intel_rapl: Support multi-die/package
powercap/intel_rapl: Simplify rapl_find_package()
x86/topology: Define topology_logical_die_id()
x86/topology: Define topology_die_id()
cpu/topology: Export die_id
x86/topology: Create topology_max_die_per_package()
x86/topology: Add CPUID.1F multi-die/package support
|
|
Each iteration of for_each_child_of_node puts the previous
node, but in the case of a return from the middle of the loop, there is
no put, thus causing a memory leak. Hence add an of_node_put before the
return.
Issue found with Coccinelle.
Signed-off-by: Nishka Dasgupta <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Guenter Roeck <[email protected]>
|
|
sysfs_notify() and kobject_uevent() are passed the wrong device.
fan_data->hwmon_dev needs to be passed, so that sysfs notification
goes to right place and generated uevent has the right information
Signed-off-by: Christian Schneider <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
devm_hwmon_device_register_with_groups
This makes it possible to use the hwmon_dev in fan_alarm_notify(). Otherwise
it would be possible, that a interupt arrives and fan_alarm_notify() is
executed, before hwmon_dev is initialized.
Signed-off-by: Christian Schneider <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
The code to update the configuration register is repeated several times.
Move it into a separate function. At the same time, un-inline
lm90_select_remote_channel() and leave it up to the compiler to decide
what to do with it. Also remove the 'client' argument from
lm90_select_remote_channel() and from lm90_write_convrate() and take
it from struct lm90_data instead where needed.
This patch reduces code size by more than 800 bytes on x86_64.
Cc: Boyang Yu <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
The configuration register does not change on its own. Yet, it is read
in various locations, modified, and written back. Simplify and optimize
the code by caching its value and by only writing it back when needed.
Cc: Boyang Yu <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
max6658 may report unrealistically high temperature during
the driver initialization, for which, its overtemp alarm pin
also gets asserted. For certain devices implementing overtemp
protection based on that pin, it may further trigger a reset to
the device. By reproducing the problem, the wrong reading is
found to be coincident with changing the conversion rate.
To mitigate this issue, set the stop bit before changing the
conversion rate and unset it thereafter. After such change, the
wrong reading is not reproduced. Apply this change only to the
max6657 kind for now, controlled by flag LM90_PAUSE_ON_CONFIG.
Signed-off-by: Boyang Yu <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
Linux style for comments is the C89 "/* ... */" style,
changes the comments to Linux style.
Signed-off-by: amy.shih <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
When register read and write operations return errors, needs to add
error handling.
Signed-off-by: amy.shih <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 platform updayes from Ingo Molnar:
"Most of the commits add ACRN hypervisor guest support, plus two
cleanups"
* 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/jailhouse: Mark jailhouse_x2apic_available() as __init
x86/platform/geode: Drop <linux/gpio.h> includes
x86/acrn: Use HYPERVISOR_CALLBACK_VECTOR for ACRN guest upcall vector
x86: Add support for Linux guests on an ACRN hypervisor
x86/Kconfig: Add new X86_HV_CALLBACK_VECTOR config symbol
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Ingo Molnar:
"A handful of paravirt patching code enhancements to make it more
robust against patching failures, and related cleanups and not so
related cleanups - by Thomas Gleixner and myself"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Rename paravirt_patch_site::instrtype to paravirt_patch_site::type
x86/paravirt: Standardize 'insn_buff' variable names
x86/paravirt: Match paravirt patchlet field definition ordering to initialization ordering
x86/paravirt: Replace the paravirt patch asm magic
x86/paravirt: Unify the 32/64 bit paravirt patching code
x86/paravirt: Detect over-sized patching bugs in paravirt_patch_call()
x86/paravirt: Detect over-sized patching bugs in paravirt_patch_insns()
x86/paravirt: Remove bogus extern declarations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 AVX512 status update from Ingo Molnar:
"This adds a new ABI that the main scheduler probably doesn't want to
deal with but HPC job schedulers might want to use: the
AVX512_elapsed_ms field in the new /proc/<pid>/arch_status task status
file, which allows the user-space job scheduler to cluster such tasks,
to avoid turbo frequency drops"
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation/filesystems/proc.txt: Add arch_status file
x86/process: Add AVX-512 usage elapsed time to /proc/pid/arch_status
proc: Add /proc/<pid>/arch_status
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc small cleanups: removal of superfluous code and coding style
cleanups mostly"
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kexec: Make variable static and config dependent
x86/defconfigs: Remove useless UEVENT_HELPER_PATH
x86/amd_nb: Make hygon_nb_misc_ids static
x86/tsc: Move inline keyword to the beginning of function declarations
x86/io_delay: Define IO_DELAY macros in C instead of Kconfig
x86/io_delay: Break instead of fallthrough in switch statement
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache resource control update from Ingo Molnar:
"Two cleanup patches"
* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Cleanup cbm_ensure_valid()
x86/resctrl: Use _ASM_BX to avoid ifdeffery
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Ingo Molnar:
"Two kbuild enhancements by Masahiro Yamada"
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/build: Remove redundant 'clean-files += capflags.c'
x86/build: Add 'set -e' to mkcapflags.sh to delete broken capflags.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
"Most of the changes relate to Peter Zijlstra's cleanup of ptregs
handling, in particular the i386 part is now much simplified and
standardized - no more partial ptregs stack frames via the esp/ss
oddity. This simplifies ftrace, kprobes, the unwinder, ptrace, kdump
and kgdb.
There's also a CR4 hardening enhancements by Kees Cook, to make the
generic platform functions such as native_write_cr4() less useful as
ROP gadgets that disable SMEP/SMAP. Also protect the WP bit of CR0
against similar attacks.
The rest is smaller cleanups/fixes"
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternatives: Add int3_emulate_call() selftest
x86/stackframe/32: Allow int3_emulate_push()
x86/stackframe/32: Provide consistent pt_regs
x86/stackframe, x86/ftrace: Add pt_regs frame annotations
x86/stackframe, x86/kprobes: Fix frame pointer annotations
x86/stackframe: Move ENCODE_FRAME_POINTER to asm/frame.h
x86/entry/32: Clean up return from interrupt preemption path
x86/asm: Pin sensitive CR0 bits
x86/asm: Pin sensitive CR4 bits
Documentation/x86: Fix path to entry_32.S
x86/asm: Remove unused TASK_TI_flags from asm-offsets.c
|
|
Unlike driver mode, generic xdp receive could be triggered
by different threads on different CPU cores at the same time
leading to the fill and rx queue breakage. For example, this
could happen while sending packets from two processes to the
first interface of veth pair while the second part of it is
open with AF_XDP socket.
Need to take a lock for each generic receive to avoid race.
Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support")
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Magnus Karlsson <[email protected]>
Tested-by: William Tu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Remove the unused per rq load array and all its infrastructure, by
Dietmar Eggemann.
- Add utilization clamping support by Patrick Bellasi. This is a
refinement of the energy aware scheduling framework with support for
boosting of interactive and capping of background workloads: to make
sure critical GUI threads get maximum frequency ASAP, and to make
sure background processing doesn't unnecessarily move to cpufreq
governor to higher frequencies and less energy efficient CPU modes.
- Add the bare minimum of tracepoints required for LISA EAS regression
testing, by Qais Yousef - which allows automated testing of various
power management features, including energy aware scheduling.
- Restructure the former tsk_nr_cpus_allowed() facility that the -rt
kernel used to modify the scheduler's CPU affinity logic such as
migrate_disable() - introduce the task->cpus_ptr value instead of
taking the address of &task->cpus_allowed directly - by Sebastian
Andrzej Siewior.
- Misc optimizations, fixes, cleanups and small enhancements - see the
Git log for details.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
sched/uclamp: Add uclamp support to energy_compute()
sched/uclamp: Add uclamp_util_with()
sched/cpufreq, sched/uclamp: Add clamps for FAIR and RT tasks
sched/uclamp: Set default clamps for RT tasks
sched/uclamp: Reset uclamp values on RESET_ON_FORK
sched/uclamp: Extend sched_setattr() to support utilization clamping
sched/core: Allow sched_setattr() to use the current policy
sched/uclamp: Add system default clamps
sched/uclamp: Enforce last task's UCLAMP_MAX
sched/uclamp: Add bucket local max tracking
sched/uclamp: Add CPU's clamp buckets refcounting
sched/fair: Rename weighted_cpuload() to cpu_runnable_load()
sched/debug: Export the newly added tracepoints
sched/debug: Add sched_overutilized tracepoint
sched/debug: Add new tracepoint to track PELT at se level
sched/debug: Add new tracepoints to track PELT at rq level
sched/debug: Add a new sched_trace_*() helper functions
sched/autogroup: Make autogroup_path() always available
sched/wait: Deduplicate code with do-while
sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()
...
|
|
Stephen Suryaputra says:
====================
net: Multipath hashing on inner L3
This series extends commit 363887a2cdfe ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") to include support when the
outer L3 is IPv6 and to consider the case where the inner L3 is
different version from the outer L3, such as IPv6 tunneled by IPv4 GRE
or vice versa. It also includes kselftest scripts to test the use cases.
v2: Clarify the commit messages in the commits in this series to use the
term tunneled by IPv4 GRE or by IPv6 GRE so that it's clear which
one is the inner and which one is the outer (per David Miller).
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add selftest scripts for multipath hashing on inner IP pkts when there
is a single GRE tunnel but there are multiple underlay routes to reach
the other end of the tunnel.
Four cases are covered in these scripts:
- IPv4 inner, IPv4 outer
- IPv6 inner, IPv4 outer
- IPv4 inner, IPv6 outer
- IPv6 inner, IPv6 outer
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: Stephen Suryaputra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing
considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE.
Signed-off-by: Stephen Suryaputra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts
for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner
Layer 3 if present, but it only considers inner IPv4. There is a use
case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on
inner IPv6 addresses.
Fixes: 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel")
Signed-off-by: Stephen Suryaputra <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.
Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure")
Signed-off-by: Wen Yang <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RAS updates from Ingo Molnar:
"Boris is on vacation so I'm sending the RAS bits this time. The main
changes were:
- Various RAS/CEC improvements and fixes by Borislav Petkov:
- error insertion fixes
- offlining latency fix
- memory leak fix
- additional sanity checks
- cleanups
- debug output improvements
- More SMCA enhancements by Yazen Ghannam:
- make banks truly per-CPU which they are in the hardware
- don't over-cache certain registers
- make the number of MCA banks per-CPU variable
The long term goal with these changes is to support future
heterogenous SMCA extensions.
- Misc fixes and improvements"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Do not check return value of debugfs_create functions
x86/MCE: Determine MCA banks' init state properly
x86/MCE: Make the number of MCA banks a per-CPU variable
x86/MCE/AMD: Don't cache block addresses on SMCA systems
x86/MCE: Make mce_banks a per-CPU array
x86/MCE: Make struct mce_banks[] static
RAS/CEC: Add copyright
RAS/CEC: Add CONFIG_RAS_CEC_DEBUG and move CEC debug features there
RAS/CEC: Dump the different array element sections
RAS/CEC: Rename count_threshold to action_threshold
RAS/CEC: Sanity-check array on every insertion
RAS/CEC: Fix potential memory leak
RAS/CEC: Do not set decay value on error
RAS/CEC: Check count_threshold unconditionally
RAS/CEC: Fix pfn insertion
|
|
There is a possible use-after-free issue in the axienet_probe():
1701: np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702: if (np) {
...
1787: of_node_put(np); ---> released here
1788: lp->eth_irq = platform_get_irq(pdev, 0);
1789: } else {
...
1801: }
1802: if (IS_ERR(lp->dma_regs)) {
...
1805: of_node_put(np); ---> double released here
1806: goto free_netdev;
1807: }
We solve this problem by removing the unnecessary of_node_put().
Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wen Yang <[email protected]>
Cc: Anirudha Sarangi <[email protected]>
Cc: John Linn <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Robert Hancock <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Robert Hancock <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The function cache_seq_next is declared static and marked
EXPORT_SYMBOL_GPL, which is at best an odd combination. Because the
function is not used outside of the net/sunrpc/cache.c file it is
defined in, this commit removes the EXPORT_SYMBOL_GPL() marking.
Fixes: d48cf356a130 ("SUNRPC: Remove non-RCU protected lookup")
Signed-off-by: Denis Efremov <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"The main changes in this cycle are:
- rwsem scalability improvements, phase #2, by Waiman Long, which are
rather impressive:
"On a 2-socket 40-core 80-thread Skylake system with 40 reader
and writer locking threads, the min/mean/max locking operations
done in a 5-second testing window before the patchset were:
40 readers, Iterations Min/Mean/Max = 1,807/1,808/1,810
40 writers, Iterations Min/Mean/Max = 1,807/50,344/151,255
After the patchset, they became:
40 readers, Iterations Min/Mean/Max = 30,057/31,359/32,741
40 writers, Iterations Min/Mean/Max = 94,466/95,845/97,098"
There's a lot of changes to the locking implementation that makes
it similar to qrwlock, including owner handoff for more fair
locking.
Another microbenchmark shows how across the spectrum the
improvements are:
"With a locking microbenchmark running on 5.1 based kernel, the
total locking rates (in kops/s) on a 2-socket Skylake system
with equal numbers of readers and writers (mixed) before and
after this patchset were:
# of Threads Before Patch After Patch
------------ ------------ -----------
2 2,618 4,193
4 1,202 3,726
8 802 3,622
16 729 3,359
32 319 2,826
64 102 2,744"
The changes are extensive and the patch-set has been through
several iterations addressing various locking workloads. There
might be more regressions, but unless they are pathological I
believe we want to use this new implementation as the baseline
going forward.
- jump-label optimizations by Daniel Bristot de Oliveira: the primary
motivation was to remove IPI disturbance of isolated RT-workload
CPUs, which resulted in the implementation of batched jump-label
updates. Beyond the improvement of the real-time characteristics
kernel, in one test this patchset improved static key update
overhead from 57 msecs to just 1.4 msecs - which is a nice speedup
as well.
- atomic64_t cross-arch type cleanups by Mark Rutland: over the last
~10 years of atomic64_t existence the various types used by the
APIs only had to be self-consistent within each architecture -
which means they became wildly inconsistent across architectures.
Mark puts and end to this by reworking all the atomic64
implementations to use 's64' as the base type for atomic64_t, and
to ensure that this type is consistently used for parameters and
return values in the API, avoiding further problems in this area.
- A large set of small improvements to lockdep by Yuyang Du: type
cleanups, output cleanups, function return type and othr cleanups
all around the place.
- A set of percpu ops cleanups and fixes by Peter Zijlstra.
- Misc other changes - please see the Git log for more details"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (82 commits)
locking/lockdep: increase size of counters for lockdep statistics
locking/atomics: Use sed(1) instead of non-standard head(1) option
locking/lockdep: Move mark_lock() inside CONFIG_TRACE_IRQFLAGS && CONFIG_PROVE_LOCKING
x86/jump_label: Make tp_vec_nr static
x86/percpu: Optimize raw_cpu_xchg()
x86/percpu, sched/fair: Avoid local_clock()
x86/percpu, x86/irq: Relax {set,get}_irq_regs()
x86/percpu: Relax smp_processor_id()
x86/percpu: Differentiate this_cpu_{}() and __this_cpu_{}()
locking/rwsem: Guard against making count negative
locking/rwsem: Adaptive disabling of reader optimistic spinning
locking/rwsem: Enable time-based spinning on reader-owned rwsem
locking/rwsem: Make rwsem->owner an atomic_long_t
locking/rwsem: Enable readers spinning on writer
locking/rwsem: Clarify usage of owner's nonspinaable bit
locking/rwsem: Wake up almost all readers in wait queue
locking/rwsem: More optimal RT task handling of null owner
locking/rwsem: Always release wait_lock before waking up tasks
locking/rwsem: Implement lock handoff to prevent lock starvation
locking/rwsem: Make rwsem_spin_on_owner() return owner state
...
|
|
Fix endianness issue: passing a pointer to 64-bit fd as a 32-bit key
does not work on big-endian architectures. So cast fd to 32-bits when
necessary.
Signed-off-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
DWMAC4 is capable to support clause 45 mdio communication.
This patch enable the feature on stmmac_mdio_write() and
stmmac_mdio_read() by following phy_write_mmd() and
phy_read_mmd() mdiobus read write implementation format.
Reviewed-by: Li, Yifan <[email protected]>
Signed-off-by: Kweh Hock Leong <[email protected]>
Signed-off-by: Ong Boon Leong <[email protected]>
Signed-off-by: Voon Weifeng <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use netif_ovs_is_port() function instead of open code.
This patch doesn't change logic.
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In this release cycle the number of NIC drivers using page_pool
will likely reach 4 drivers. It is about time to add a maintainer
entry. Add myself and Ilias.
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Acked-by: Ilias Apalodimas <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Maxime Chevallier says:
====================
net: mvpp2: Add classification based on the ETHER flow
This series adds support for classification of the ETHER flow in the
mvpp2 driver.
The first patch allows detecting when a user specifies a flow_type that
isn't supported by the driver, while the second adds support for this
flow_type by adding the mapping between the ETHER_FLOW enum value and
the relevant classifier flow entries.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Users can specify classification actions based on the 'ether' flow type.
In that case, this will apply to all ethernet traffic, superseeding
flows such as 'udp4' or 'tcp6'.
Add support for this flow type in the PPv2 classifier, by mapping the
ETHER_FLOW value to the corresponding entries in the classifier.
Signed-off-by: Maxime Chevallier <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|