path: root/include/linux
2020-01-16  Merge branch 'topic/sdw_intel' into next  (Vinod Koul; 1 file, -0/+11)
2020-01-16  soundwire: intel: report slave_ids for each link to SOF driver  (Bard Liao; 1 file, -0/+11)
The existing link_mask flag is no longer sufficient to detect the hardware and identify which topology file and machine driver to load. By reporting the slave_ids exposed in ACPI tables, the parent SOF driver will be able to compare against a set of static configurations. This patch only adds the interface change; the functionality is added in future patches.

Signed-off-by: Bard Liao <[email protected]>
Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
2020-01-16  gpio: Fix the no return statement warning  (Kevin Hao; 1 file, -0/+2)
In commit 242587616710 ("gpiolib: Add support for the irqdomain which doesn't use irq_fwspec as arg") we changed the return type of gpiochip_populate_parent_fwspec_twocell/fourcell() from void to void *, but forgot to add a return statement for these two dummy functions. Add "return NULL" to fix the build warnings.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Kevin Hao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
2020-01-16  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf  (David S. Miller; 2 files, -5/+10)
Daniel Borkmann says:

====================
pull-request: bpf 2020-01-15

The following pull-request contains BPF updates for your *net* tree.

We've added 12 non-merge commits during the last 9 day(s) which contain a total of 13 files changed, 95 insertions(+), 43 deletions(-).

The main changes are:

1) Fix refcount leak for TCP time wait and request sockets for socket lookup related BPF helpers, from Lorenz Bauer.

2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.

3) Batch of several sockmap and related TLS fixes found while operating more complex BPF programs with Cilium and OpenSSL, from John Fastabend.

4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue() to avoid purging data upon teardown, from Lingpeng Chen.

5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly dump a BPF map's value with BTF, from Martin KaFai Lau.
====================

Signed-off-by: David S. Miller <[email protected]>
2020-01-15  block: fix an integer overflow in logical block size  (Mikulas Patocka; 1 file, -4/+4)
Logical block size has type unsigned short. That means that it can be at most 32768. However, there are architectures that can run with 64k pages (for example arm64) and on these architectures, it may be possible to create block devices with 64k block size. For example (on an architecture with 64k pages), mount will fail with this error because it tries to read the superblock using 2-sector access:

  device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
  EXT4-fs (dm-0): unable to read superblock

This patch changes the logical block size from unsigned short to unsigned int to avoid the overflow.

Cc: [email protected]
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
2020-01-15  scsi: drivers: base: Propagate errors through the transport component  (Gabriel Krisman Bertazi; 1 file, -3/+3)
The transport registration may fail. Make sure the errors are propagated to the callers.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
2020-01-15  scsi: drivers: base: Support atomic version of attribute_container_device_trigger  (Gabriel Krisman Bertazi; 1 file, -0/+7)
attribute_container_device_trigger invokes callbacks that may fail for one or more classdevs; for instance, the transport_add_class_device callback, called during transport creation, does memory allocation. This information, though, is not propagated to upper layers, and any driver using the attribute_container_device_trigger API will not know whether any, some, or all callbacks succeeded. This patch implements a safe version of this dispatcher, to either succeed all the callbacks or revert to the original state.

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Martin K. Petersen <[email protected]>
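The succeed-all-or-revert dispatch described here follows a common kernel pattern. A minimal standalone sketch with hypothetical doit/undo callbacks, not the real attribute_container internals:

    struct device;

    /* Trigger every classdev; on the first failure, undo the ones
     * already triggered (in reverse order), so the caller observes
     * either full success or the original state. */
    static int trigger_all_or_revert(struct device **devs, int n,
                                     int (*doit)(struct device *),
                                     void (*undo)(struct device *))
    {
            int i, ret;

            for (i = 0; i < n; i++) {
                    ret = doit(devs[i]);
                    if (ret)
                            goto revert;
            }
            return 0;

    revert:
            while (--i >= 0)
                    undo(devs[i]);
            return ret;
    }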
2020-01-15  bpf: Sockmap/tls, push write_space updates through ulp updates  (John Fastabend; 1 file, -4/+8)
When a sockmap sock with TLS enabled is removed, we clean up bpf/psock state and call tcp_update_ulp() to push updates to the TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the previous op stored by the psock. This may or may not be correct, so to ensure we don't overwrite the TLS write space hook, pass this field to the ULP and have it fix up the ctx. This completes a previous fix that pushed the ops through to the ULP, but at the time missed doing this for write_space, presumably because the write_space TLS hook was added around the same time.

Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free")
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Acked-by: Jonathan Lemon <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15  bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop  (John Fastabend; 1 file, -0/+1)
When a sockmap is free'd and a socket in the map is enabled with tls, we tear down the bpf context on the socket, the psock struct and state, and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform the tls stack it needs to update its saved sock ops so that when the tls socket is later destroyed it doesn't try to call the now destroyed psock hooks. This is about keeping stacked ULPs in good shape so they always have the right set of stacked ops.

However, recently the unhash() hook was removed from the TLS side. But the sockmap/bpf side is not doing any extra work to update the unhash op when it is torn down, instead expecting the TLS side to manage it. So both TLS and sockmap believe the other side is managing the op, no one updates the hook, and it continues to point at tcp_bpf_unhash(). When the unhash hook is called, we call tcp_bpf_unhash(), which detects the psock has already been destroyed and calls sk->sk_prot->unhash(), which calls tcp_bpf_unhash() yet again, and so on, looping and hanging the core.

To fix this, have the sockmap teardown logic fix up the stale pointer.

Fixes: 5d92e631b8be ("net/tls: partially revert fix transition through disconnect with close")
Reported-by: [email protected]
Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Jakub Sitnicki <[email protected]>
Acked-by: Song Liu <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/bpf/[email protected]
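A sketch of the loop and the fix; the names below are illustrative (the real change routes the restore through the ULP update path), not a verified excerpt:

    /* Before the fix, with the psock destroyed but the op left stale:
     *
     *   unhash() -> tcp_bpf_unhash() -> sk->sk_prot->unhash()
     *            -> tcp_bpf_unhash() -> ...        (loops forever)
     *
     * Fix (illustrative): while tearing down the psock, restore the
     * proto ops saved at psock setup so unhash() resolves to the
     * original tcp handler again. */
    sk->sk_prot = psock->sk_proto;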
2020-01-15  bpf: Add batch ops to all htab bpf map  (Yonghong Song; 1 file, -0/+3)
htab can't use generic batch support due to some problematic behaviours inherent to the data structure, i.e. while iterating the bpf map a concurrent program might delete the next entry that batch was about to use; in that case there's no easy solution to retrieve the next entry. The issue has been discussed multiple times (see [1] and [2]).

The only way hmap can be traversed without the problem previously exposed is by making sure that the map is traversing entire buckets. This commit implements those strict requirements for hmap. The implementation follows the same interaction as the generic support, with some exceptions:

 - If the keys/values buffer is not big enough to traverse a bucket, ENOSPC will be returned.

 - out_batch contains the value of the next bucket in the iteration, not the next key, but this is transparent for the user since the user should never use out_batch for anything other than bpf batch syscalls.

This commit implements BPF_MAP_LOOKUP_BATCH and adds support for the new command BPF_MAP_LOOKUP_AND_DELETE_BATCH. Note that for update/delete batch ops it is possible to use the generic implementations.

[1] https://lore.kernel.org/bpf/[email protected]/
[2] https://lore.kernel.org/bpf/[email protected]/

Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15  bpf: Add generic support for update and delete batch ops  (Brian Vazquez; 1 file, -0/+10)
This commit adds generic support for update and delete batch ops that can be used for almost all the bpf maps. These commands share the same UAPI attr that lookup and lookup_and_delete batch ops use, and the syscall commands are:

  BPF_MAP_UPDATE_BATCH
  BPF_MAP_DELETE_BATCH

The main difference between update/delete and lookup batch ops is that for update/delete the keys/values must be specified by userspace, and because of that, neither in_batch nor out_batch is used.

Suggested-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15  bpf: Add generic support for lookup batch op  (Brian Vazquez; 1 file, -0/+5)
This commit introduces generic support for bpf_map_lookup_batch. This implementation can be used by almost all the bpf maps since its core implementation relies on the existing map_get_next_key and map_lookup_elem. The bpf syscall subcommand introduced is:

  BPF_MAP_LOOKUP_BATCH

The UAPI attribute is:

  struct { /* struct used by BPF_MAP_*_BATCH commands */
          __aligned_u64   in_batch;       /* start batch,
                                           * NULL to start from beginning
                                           */
          __aligned_u64   out_batch;      /* output: next start batch */
          __aligned_u64   keys;
          __aligned_u64   values;
          __u32           count;          /* input/output:
                                           * input: # of key/value elements
                                           * output: # of filled elements
                                           */
          __u32           map_fd;
          __u64           elem_flags;
          __u64           flags;
  } batch;

in_batch/out_batch are opaque values used to communicate between user/kernel space; in_batch/out_batch must be of key_size length. To start iterating from the beginning, in_batch must be null; count is the # of key/value elements to retrieve. Note that the 'keys' buffer must be a buffer of key_size * count size and the 'values' buffer must be value_size * count, where value_size must be aligned to 8 bytes by userspace if it's dealing with percpu maps. 'count' will contain the number of keys/values successfully retrieved. Note that 'count' is an input/output variable and it can contain a lower value after a call.

If there are no more entries to retrieve, ENOENT will be returned. If the error is ENOENT, count might be > 0 in case it copied some values but there were no more entries to retrieve. Note that if the return code is an error and not -EFAULT, count indicates the number of elements successfully processed.

Suggested-by: Stanislav Fomichev <[email protected]>
Signed-off-by: Brian Vazquez <[email protected]>
Signed-off-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
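A hedged userspace sketch of driving the new subcommand through the raw bpf(2) syscall; it assumes UAPI headers new enough to define BPF_MAP_LOOKUP_BATCH and the batch attr shown above:

    #include <errno.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/bpf.h>

    /* Fetch up to *count key/value pairs; *count is updated to the
     * number actually filled. Returns -1 with errno == ENOENT once
     * the whole map has been traversed. */
    static int map_lookup_batch(int map_fd, void *in_batch, void *out_batch,
                                void *keys, void *values, __u32 *count)
    {
            union bpf_attr attr;
            int ret;

            memset(&attr, 0, sizeof(attr));
            attr.batch.map_fd = map_fd;
            attr.batch.in_batch = (__u64)(unsigned long)in_batch;
            attr.batch.out_batch = (__u64)(unsigned long)out_batch;
            attr.batch.keys = (__u64)(unsigned long)keys;
            attr.batch.values = (__u64)(unsigned long)values;
            attr.batch.count = *count;

            ret = syscall(__NR_bpf, BPF_MAP_LOOKUP_BATCH, &attr, sizeof(attr));
            *count = attr.batch.count;      /* kernel writes back # filled */
            return ret;
    }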
2020-01-15  bpf: Fix incorrect verifier simulation of ARSH under ALU32  (Daniel Borkmann; 1 file, -1/+1)
Anatoly has been fuzzing with kBdysch harness and reported a hang in one of the outcomes:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (85) call bpf_get_socket_cookie#46
  1: R0_w=invP(id=0) R10=fp0
  1: (57) r0 &= 808464432
  2: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  2: (14) w0 -= 810299440
  3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  3: (c4) w0 s>>= 1
  4: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  4: (76) if w0 s>= 0x30303030 goto pc+216
  221: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
  221: (95) exit
  processed 6 insns (limit 1000000) [...]

Taking a closer look, the program was xlated as follows:

  # ./bpftool p d x i 12
    0: (85) call bpf_get_socket_cookie#7800896
    1: (bf) r6 = r0
    2: (57) r6 &= 808464432
    3: (14) w6 -= 810299440
    4: (c4) w6 s>>= 1
    5: (76) if w6 s>= 0x30303030 goto pc+216
    6: (05) goto pc-1
    7: (05) goto pc-1
    8: (05) goto pc-1
  [...]
  220: (05) goto pc-1
  221: (05) goto pc-1
  222: (95) exit

Meaning, the visible effect is very similar to f54c7898ed1c ("bpf: Fix precision tracking for unbounded scalars"), that is, the fall-through branch in instruction 5 is considered to be never taken given the conclusion from the min/max bounds tracking in w6, and therefore the dead-code sanitation rewrites it as goto pc-1. However, real-life input disagrees with verification analysis since a soft-lockup was observed.

The bug sits in the analysis of the ARSH. The definition is that we shift the target register value right by K bits through shifting in copies of its sign bit. In adjust_scalar_min_max_vals(), we first coerce the register into 32 bit mode; the same happens after simulating the operation. However, for the case of simulating the actual ARSH, we don't take the mode into account and act as if it's always 64 bit, but the location of the sign bit is different:

  dst_reg->smin_value >>= umin_val;
  dst_reg->smax_value >>= umin_val;
  dst_reg->var_off = tnum_arshift(dst_reg->var_off, umin_val);

Consider an unknown R0 where bpf_get_socket_cookie() (or others) would for example return 0xffff. With the above ARSH simulation, we'd see the following results:

  [...]
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP65535 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (57) r0 &= 808464432
    -> R0_runtime = 0x3030
  3: R0_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  3: (14) w0 -= 810299440
    -> R0_runtime = 0xcfb40000
  4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
       (0xffffffff)
  4: (c4) w0 s>>= 1
    -> R0_runtime = 0xe7da0000
  5: R0_w=invP(id=0,umin_value=1740636160,umax_value=2147221496,var_off=(0x67c00000; 0x183bfff8)) R10=fp0
       (0x67c00000)           (0x7ffbfff8)
  [...]

In insn 3, we have a runtime value of 0xcfb40000, which is '1100 1111 1011 0100 0000 0000 0000 0000'; the result after the shift, 0xe7da0000, is '1110 0111 1101 1010 0000 0000 0000 0000', where the sign bit is correctly retained in 32 bit mode. In insn 4, the umax was 0xffffffff and changed into 0x7ffbfff8 after the shift, that is, '0111 1111 1111 1011 1111 1111 1111 1000', which means the simulation didn't retain the sign bit. With the above logic, the updates happen on the 64 bit min/max bounds, and given we coerced the register, the sign bits of the bounds are cleared as well, meaning, we need to force the simulation into s32 space for 32 bit alu mode.

Verification after the fix below. We're first analyzing the fall-through branch on the 32 bit signed >= test, eventually leading to rejection of the program in this specific case:

  0: R1=ctx(id=0,off=0,imm=0) R10=fp0
  0: (b7) r2 = 808464432
  1: R1=ctx(id=0,off=0,imm=0) R2_w=invP808464432 R10=fp0
  1: (85) call bpf_get_socket_cookie#46
  2: R0_w=invP(id=0) R10=fp0
  2: (bf) r6 = r0
  3: R0_w=invP(id=0) R6_w=invP(id=0) R10=fp0
  3: (57) r6 &= 808464432
  4: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=808464432,var_off=(0x0; 0x30303030)) R10=fp0
  4: (14) w6 -= 810299440
  5: R0_w=invP(id=0) R6_w=invP(id=0,umax_value=4294967295,var_off=(0xcf800000; 0x3077fff0)) R10=fp0
  5: (c4) w6 s>>= 1
  6: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
       (0x67c00000)          (0xfffbfff8)
  6: (76) if w6 s>= 0x30303030 goto pc+216
  7: R0_w=invP(id=0) R6_w=invP(id=0,umin_value=3888119808,umax_value=4294705144,var_off=(0xe7c00000; 0x183bfff8)) R10=fp0
  7: (30) r0 = *(u8 *)skb[808464432]
  BPF_LD_[ABS|IND] uses reserved fields
  processed 8 insns (limit 1000000) [...]

Fixes: 9cbe1f5a32dc ("bpf/verifier: improve register value range tracking with ARSH")
Reported-by: Anatoly Trosinenko <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Yonghong Song <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
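The sign-bit mismatch is easy to reproduce outside the verifier. A small standalone C demonstration using the runtime value from the trace above:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
            uint32_t v = 0xcfb40000;        /* runtime value at insn 3 */

            /* correct w-register semantics: arithmetic shift in 32 bit,
             * the sign bit is bit 31 and gets replicated */
            uint32_t arsh32 = (uint32_t)((int32_t)v >> 1);  /* 0xe7da0000 */

            /* buggy simulation: value treated as 64 bit, so bit 31 is
             * just an ordinary bit and the sign is lost */
            uint64_t arsh64 = (uint64_t)v >> 1;             /* 0x67da0000 */

            printf("arsh32=%08x arsh64=%08llx\n",
                   arsh32, (unsigned long long)arsh64);
            return 0;
    }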
2020-01-15  soc: ti: k3: add navss ringacc driver  (Grygorii Strashko; 1 file, -0/+244)
The Ring Accelerator (RINGACC or RA) provides hardware acceleration to enable straightforward passing of work between a producer and a consumer. There is one RINGACC module per NAVSS on TI AM65x SoCs.

The RINGACC converts constant-address read and write accesses to equivalent read or write accesses to a circular data structure in memory. The RINGACC eliminates the need for each DMA controller which needs to access ring elements from having to know the current state of the ring (base address, current offset). The DMA controller performs a read or write access to a specific address range (which maps to the source interface on the RINGACC) and the RINGACC replaces the address for the transaction with a new address which corresponds to the head or tail element of the ring (head for reads, tail for writes). Since the RINGACC maintains the state, multiple DMA controllers or channels are allowed to coherently share the same rings as applicable. The RINGACC is able to place data which is destined towards software into cached memory directly.

Supported ring modes:
- Ring Mode
- Messaging Mode
- Credentials Mode
- Queue Manager Mode

TI-SCI integration:

Texas Instruments' System Control Interface (TI-SCI) Message Protocol now has control over Ringacc module resources management (RM) and Rings configuration. The corresponding support of the TI-SCI Ringacc module RM protocol is introduced as an option through DT parameters:
- ti,sci: phandle on TI-SCI firmware controller DT node
- ti,sci-dev-id: TI-SCI device identifier as per TI-SCI firmware spec

If both parameters are present, the Ringacc driver will configure/free/reset Rings using the TI-SCI Message Ringacc RM Protocol.

The Ringacc driver manages Rings allocation by itself now and requests TI-SCI firmware to allocate and configure specific Rings only. It's done this way because the Linux driver implements two-stage Rings allocation and configuration (allocate ring and configure ring) while the TI-SCI Message Protocol supports only one combined operation (allocate+configure).

Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Peter Ujfalusi <[email protected]>
Reviewed-by: Tero Kristo <[email protected]>
Tested-by: Keerthy <[email protected]>
Signed-off-by: Santosh Shilimkar <[email protected]>
2020-01-15  Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs  (Linus Torvalds; 1 file, -1/+0)
Pull vfs fixes from Al Viro:
 "Fixes for mountpoint_last() bugs (by converting to use of lookup_last()) and an autofs regression fix from this cycle (caused by follow_managed() breakage introduced in barrier fixes series)"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix autofs regression caused by follow_managed() changes
  reimplement path_mountpoint() with less magic
2020-01-15  Merge branch 'topic/equal' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6  (Mark Brown; 1 file, -1/+1)
2020-01-15  regulator fix for "regulator: core: Add regulator_is_equal() helper"  (Stephen Rothwell; 1 file, -1/+1)
Signed-off-by: Stephen Rothwell <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: Marek Vasut <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
2020-01-15  serial_core: Remove unused member in uart_port  (Dmitry Safonov; 1 file, -1/+0)
It should remove the align-padding before @name.

[yes, there's a "hole" in the structure now, but that's fine, no one cares. If they do care, the whole thing should be restructured using pahole to find a better ordering. Removing this field is good as some drivers have been known to abuse it for other things when they shouldn't have been doing that. -- gregkh]

Signed-off-by: Dmitry Safonov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-01-15  gpiolib: Add support for the irqdomain which doesn't use irq_fwspec as arg  (Kevin Hao; 1 file, -13/+8)
Some gpio's parent irqdomain may not use the struct irq_fwspec as argument, such as the msi irqdomain. So rename the callback populate_parent_fwspec() to populate_parent_alloc_arg() and make it allocate and populate the specific struct which is needed by the parent irqdomain.

Signed-off-by: Kevin Hao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Linus Walleij <[email protected]>
2020-01-15  usb: phy-generic: Delete unused platform data  (Linus Walleij; 1 file, -12/+0)
The last user of the phy generic platform data was deleted in commit 1e041b6f313aaa966612a7e415cfc09c90d6b829 ("usb: dwc3: exynos: Remove dead code"). So get rid of the platform data, which rids us of another consumer of the legacy GPIO API at the same time. Make sure we only include <linux/gpio/consumer.h>, which is all we use.

Alter the usb_phy_gen_create_phy() function prototype to not pass any platform data, as this is just hardcoded to NULL at all locations calling it in the kernel.

Move the devm_gpiod_get* calls out of the if (of_node) parenthesis, as these calls are generic and do not depend on device tree; they are used by any hardware description.

Cc: Marek Szyprowski <[email protected]>
Cc: Felipe Balbi <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
Signed-off-by: Felipe Balbi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-01-15  Merge tag 'drm-intel-next-2020-01-14' of git://anongit.freedesktop.org/drm/drm-intel into drm-next  (Dave Airlie; 1 file, -0/+5)
Final drm/i915 features for v5.6:

- DP MST fixes (José)
- Fix intel_bw_state memory leak (Pankaj Bharadiya)
- Switch context id allocation to xarray (Tvrtko)
- ICL/EHL/TGL workarounds (Matt Roper, Tvrtko)
- Debugfs for LMEM details (Lukasz Fiedorowicz)
- Prefer platform acronyms over codenames in symbols (Lucas)
- Tiled and port sync mode fixes for fbdev and DP (Manasi)
- DSI panel and backlight enable GPIO fixes (Hans de Goede)
- Relax audio min CDCLK requirements on non-GLK (Kai Vehmanen)
- Plane alignment and dimension check fixes (Imre)
- Fix state checks for PSR (José)
- Remove ICL+ clock gating programming (José)
- Static checker fixes around bool usage (Ma Feng)
- Bring back tests for self-contained headers in i915 (Masahiro Yamada)
- Fix DP MST disable sequence (Ville)
- Start converting i915 to the new drm device based logging macros (Wambui Karuga)
- Add DSI VBT I2C sequence execution (Vivek Kasireddy)
- Start using function pointers and ops structs in uc code (Michal)
- Fix PMU names to not use colons or dashes (Tvrtko)
- TGL media decompression support (DK, Imre)
- Split i915_gem_gtt.[ch] to more manageable chunks (Matthew Auld)
- Create dumb buffers in LMEM where available (Ram)
- Extend mmap support for LMEM (Abdiel)
- Selftest updates (Chris)
- Hack bump up CDCLK on TGL to avoid underruns (Stan)
- Use intel_encoder and intel_connector more instead of drm counterparts (Ville)
- Build error fixes (Zhang Xiaoxu)
- Fixes related to GPU and engine initialization/resume (Chris)
- Support for prefaulting discontiguous objects (Abdiel)
- Support discontiguous LMEM object maps (Chris)
- Various GEM and GT improvements and fixes (Chris)
- Merge pinctrl dependencies branch for the DSI GPIO updates (Jani)
- Backmerge drm-next for new logging macros (Jani)

Signed-off-by: Dave Airlie <[email protected]>
From: Jani Nikula <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-01-15  Merge tag 'mediatek-drm-next-5.6' of https://github.com/ckhu-mediatek/linux.git-tags into drm-next  (Dave Airlie; 2 files, -0/+64)
Mediatek DRM Next for Linux 5.6

This fixes a non-smooth cursor problem, adds cmdq support, adds ctm property support, and includes some refinement.

Signed-off-by: Dave Airlie <[email protected]>
From: CK Hu <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/1578972526.14594.8.camel@mtksdaap41
2020-01-15  reimplement path_mountpoint() with less magic  (Al Viro; 1 file, -1/+0)
... and get rid of a bunch of bugs in it.

Background: the reason for path_mountpoint() is that umount() really doesn't want attempts to revalidate the root of what it's trying to umount. The thing we want to avoid actually happens from complete_walk(); the solution was to do something parallel to normal path_lookupat(), and it both went overboard and got the boilerplate subtly (and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat() does, but instead of complete_walk() do unlazy_walk(). All it takes to avoid that ->d_weak_revalidate() call... mountpoint_last() goes away, along with everything it got wrong, and so does the magic around LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final location (and we need to do that - umount . expects to get whatever's overmounting ., if any, out of the lookup) we really ought to take care of ->d_manage() - as it is, manual umount of an autofs automount in progress can lead to unpleasant surprises for the daemon. Easily solved by using handle_lookup_down() instead of follow_mount().

Tested-by: Ian Kent <[email protected]>
Signed-off-by: Al Viro <[email protected]>
2020-01-15  Merge tag 'drm/tegra/for-5.6-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next  (Dave Airlie; 1 file, -2/+13)
drm/tegra: Changes for v5.6-rc1

This contains a small set of mostly fixes and some minor improvements.

Signed-off-by: Dave Airlie <[email protected]>
From: Thierry Reding <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-01-14  fs-verity: implement readahead of Merkle tree pages  (Eric Biggers; 1 file, -1/+6)
When fs-verity verifies data pages, currently it reads each Merkle tree page synchronously using read_mapping_page(). Therefore, when the Merkle tree pages aren't already cached, fs-verity causes an extra 4 KiB I/O request for every 512 KiB of data (assuming that the Merkle tree uses SHA-256 and 4 KiB blocks). This results in more I/O requests and performance loss than is strictly necessary.

Therefore, implement readahead of the Merkle tree pages.

For simplicity, we take advantage of the fact that the kernel already does readahead of the file's *data*, just like it does for any other file. Due to this, we don't really need a separate readahead state (struct file_ra_state) just for the Merkle tree, but rather we just need to piggy-back on the existing data readahead requests.

We also only really need to bother with the first level of the Merkle tree, since the usual fan-out factor is 128, so normally over 99% of Merkle tree I/O requests are for the first level.

Therefore, make fsverity_verify_bio() enable readahead of the first Merkle tree level, for up to 1/4 the number of pages in the bio, when it sees that the REQ_RAHEAD flag is set on the bio. The readahead size is then passed down to ->read_merkle_tree_page() for the filesystem to (optionally) implement if it sees that the requested page is uncached.

While we're at it, also make build_merkle_tree_level() set the Merkle tree readahead size, since it's easy to do there.

However, for now don't set the readahead size in fsverity_verify_page(), since currently it's only used to verify holes on ext4 and f2fs, and it would need parameters added to know how much to read ahead.

This patch significantly improves fs-verity sequential read performance. Some quick benchmarks with 'cat'-ing a 250MB file after dropping caches:

On an ARM64 phone (using sha256-ce):
    Before: 217 MB/s
    After: 263 MB/s
    (compare to sha256sum of non-verity file: 357 MB/s)

In an x86_64 VM (using sha256-avx2):
    Before: 173 MB/s
    After: 215 MB/s
    (compare to sha256sum of non-verity file: 223 MB/s)

Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Theodore Ts'o <[email protected]>
Signed-off-by: Eric Biggers <[email protected]>
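For reference, the shape of the affected filesystem hook after this change (a sketch consistent with the description above; the parameter name may differ from the actual header):

    /* ->read_merkle_tree_page() now receives a readahead hint: how
     * many pages the caller suggests reading ahead (0 = no hint). */
    struct page *(*read_merkle_tree_page)(struct inode *inode,
                                          pgoff_t index,
                                          unsigned long num_ra_pages);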
2020-01-14  net: skbuff: disambiguate argument and member for skb_list_walk_safe helper  (Jason A. Donenfeld; 1 file, -3/+3)
This worked before, because we made all callers name their next pointer "next". But in trying to be more "drop-in" ready, the silliness here is revealed. This commit fixes the problem by making the macro argument and the member use different names.

Signed-off-by: Jason A. Donenfeld <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
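The disambiguated macro looks roughly like this (whitespace may differ from the merged version); the point is that the iteration variable and the lookahead cursor no longer reuse the member name "next":

    #define skb_list_walk_safe(first, skb, next_skb)                       \
            for ((skb) = (first),                                          \
                 (next_skb) = (skb) ? (skb)->next : NULL;                  \
                 (skb);                                                    \
                 (skb) = (next_skb),                                       \
                 (next_skb) = (skb) ? (skb)->next : NULL)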
2020-01-14  net: phy: add MACsec ops in phy_device  (Antoine Tenart; 1 file, -0/+7)
This patch adds a reference to MACsec ops in the phy_device, to allow PHYs to support offloading MACsec operations. The phydev lock will be held while calling those helpers.

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-14  net: macsec: introduce the macsec_context structure  (Antoine Tenart; 1 file, -0/+2)
This patch introduces the macsec_context structure. It will be used in the kernel to exchange information between the common MACsec implementation (macsec.c) and the MACsec hardware offloading implementations. This structure contains pointers to MACsec specific structures which contain the actual MACsec configuration, and to the underlying device (phydev for now).

Signed-off-by: Antoine Tenart <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-14  net: phy: Added IRQ print to phylink_bringup_phy()  (Florian Fainelli; 1 file, -0/+2)
The information about the PHY attached to the PHYLINK instance is useful but is missing the IRQ prints that phy_attached_info() adds. phy_attached_info() is a bit long and it would not be possible to use phylink_info() anyway.

Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-14  mtd: spi-nor: remove unused enum spi_nor_ops  (Michael Walle; 1 file, -10/+2)
The ops aren't used in any SPI NOR controller. Therefore, remove them altogether.

Signed-off-by: Michael Walle <[email protected]>
Signed-off-by: Tudor Ambarus <[email protected]>
2020-01-14  mm/mmu_notifiers: Use 'interval_sub' as the variable for mmu_interval_notifier  (Jason Gunthorpe; 1 file, -17/+21)
The 'interval_sub' is placed on the 'notifier_subscriptions' interval tree. This eliminates the poor name 'mni' for this variable.

Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-14  mm/mmu_notifiers: Use 'subscription' as the variable name for mmu_notifier  (Jason Gunthorpe; 1 file, -14/+16)
The 'subscription' is placed on the 'notifier_subscriptions' list. This eliminates the poor name 'mn' for this variable.

Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-14  mm/mmu_notifier: Rename struct mmu_notifier_mm to mmu_notifier_subscriptions  (Jason Gunthorpe; 2 files, -10/+10)
The name mmu_notifier_mm implies that the thing is a mm_struct pointer, and is difficult to abbreviate. The struct is actually holding the interval tree and hlist containing the notifiers subscribed to a mm. Use 'subscriptions' as the variable name for this struct instead of the really terrible and misleading 'mmn_mm'.

Signed-off-by: Jason Gunthorpe <[email protected]>
2020-01-14  Merge tag 'regulator-eq' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator into asoc-5.6  (Mark Brown; 1 file, -0/+7)
regulator: add regulator_equal()
2020-01-14  regulator: core: Add regulator_is_equal() helper  (Marek Vasut; 1 file, -0/+7)
Add regulator_is_equal() helper to compare whether two regulators are the same. This is useful for checking whether two separate regulators in a driver are actually the same supply.

Signed-off-by: Marek Vasut <[email protected]>
Cc: Fabio Estevam <[email protected]>
Cc: Igor Opaniuk <[email protected]>
Cc: Liam Girdwood <[email protected]>
Cc: Marcel Ziswiler <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Oleksandr Suvorov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
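A short usage sketch under assumptions: two handles obtained via regulator_get() from different supply names, checked for whether they resolve to the same physical regulator:

    #include <linux/regulator/consumer.h>

    /* Returns true when both handles point at the same underlying
     * regulator device, e.g. two supplies wired to one PMIC rail. */
    static bool supplies_shared(struct regulator *vdd_a,
                                struct regulator *vdd_b)
    {
            return regulator_is_equal(vdd_a, vdd_b);
    }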
2020-01-14  misc: alcor_pci: Add AU6625 to list of supported PCI_IDs  (Rhys Perry; 1 file, -0/+1)
I have added the AU6625 PCI_ID to the list of supported IDs:

  alcor_pci.c // Added au6625's ID to the array of supported devices
  alcor_pci.h // Added entry to define the PCI ID

Made it fit in with the already submitted code:

  alcor_pci.c // Added config entry that matches the one for au6601

From general usage there seems to be no problems.

Signed-off-by: Rhys Perry <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-01-14  fs/proc: Introduce /proc/pid/timens_offsets  (Andrei Vagin; 1 file, -0/+10)
API to set time namespace offsets for child processes, i.e.:

  echo "$clockid $offset_sec $offset_nsec" > /proc/self/timens_offsets

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
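A hedged end-to-end sketch in C: create the namespace, set an offset through this file while the namespace is still empty, then observe the shifted clock in a child. Assumes CLONE_NEWTIME is available (the fallback value below follows the series' description of reusing the highest CSIGNAL bit) and sufficient privileges (CAP_SYS_ADMIN):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/wait.h>

    #ifndef CLONE_NEWTIME
    #define CLONE_NEWTIME 0x00000080        /* highest bit of CSIGNAL */
    #endif

    int main(void)
    {
            if (unshare(CLONE_NEWTIME))     /* new timens, not entered yet */
                    return 1;

            /* shift CLOCK_MONOTONIC by one day for future children */
            int fd = open("/proc/self/timens_offsets", O_WRONLY);
            dprintf(fd, "monotonic 86400 0\n");
            close(fd);

            if (fork() == 0) {              /* child is born in the timens */
                    struct timespec ts;
                    clock_gettime(CLOCK_MONOTONIC, &ts);
                    printf("monotonic in ns: %ld s\n", (long)ts.tv_sec);
                    _exit(0);
            }
            wait(NULL);
            return 0;
    }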
2020-01-14  x86/vdso: Zap vvar pages when switching to a time namespace  (Dmitry Safonov; 1 file, -0/+9)
The VVAR page layout depends on whether a task belongs to the root or non-root time namespace. Whenever a task changes its namespace, the VVAR page tables are cleared and then they will be re-faulted with a corresponding layout.

Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-14  time: Allocate per-timens vvar page  (Dmitry Safonov; 1 file, -0/+3)
VDSO support for Time namespace needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page contains time namespace clock offsets and it has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path.

Allocate the timens page during namespace creation. Setup the offsets when the first task enters the ns and freeze them to guarantee the pace of monotonic/boottime clocks and to avoid breakage of applications.

The design decision is to have a global offset_lock which is used during namespace offsets setup and to freeze offsets when the first task joins the new time namespace. That is better in terms of memory usage compared to having a per namespace mutex that's used only during the setup period.

Suggested-by: Andy Lutomirski <[email protected]>
Based-on-work-by: Thomas Gleixner <[email protected]>
Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-14  x86/vdso: Provide vdso_data offset on vvar_page  (Dmitry Safonov; 1 file, -0/+1)
VDSO support for time namespaces needs to set up a page with the same layout as VVAR. That timens page will be placed on position of VVAR page inside namespace. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path.

To prepare the time namespace page the kernel needs to know the vdso_data offset. Provide arch_get_vdso_data() helper for locating vdso_data on VVAR page.

Co-developed-by: Andrei Vagin <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-14  lib/vdso: Prepare for time namespace support  (Thomas Gleixner; 1 file, -0/+6)
To support time namespaces in the vdso with a minimal impact on regular non time namespace affected tasks, the namespace handling needs to be hidden in a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time namespace then the VVAR page which contains the system wide vdso data is replaced with a namespace specific page which has the same layout as the VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent update of the vdso data is in progress, is not really affecting regular tasks which are not part of a time namespace as the task is spin waiting for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding time getter function which retrieves the real VVAR page, reads host time and then adds the offset for the requested clock which is stored in the special VVAR page.

If VDSO time namespace support is disabled the whole magic is compiled out.

Initial testing shows that the disabled case is almost identical to the host case which does not take the slow timens path. With the special timens page installed the performance hit is constant time and in the range of 5-7%.

For the vdso functions which are not using the sequence count an unconditional check for vdso_data->clock_mode is added which switches to the real vdso when the clock_mode is VCLOCK_TIMENS.

[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data pointer by CS_RAW offset.]

Suggested-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
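The redirect can be sketched in a few lines (helper names assumed from the changelog, not a verified excerpt of the header):

    /* In the vdso fast path: a timens task sees the special page at
     * the VVAR slot, whose clock_mode diverts it into a handler that
     * reads host time from the real vdso data page and applies the
     * per-namespace offset. */
    if (IS_ENABLED(CONFIG_TIME_NS) && vd->clock_mode == VCLOCK_TIMENS)
            return do_hres_timens(vd, clk, ts);     /* namespace-aware path */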
2020-01-14  hrtimers: Prepare hrtimer_nanosleep() for time namespaces  (Andrei Vagin; 1 file, -2/+1)
clock_nanosleep() accepts absolute values of expiration time when the TIMER_ABSTIME flag is set. This absolute value is inside the task's time namespace and has to be converted to the host's time.

There is a timens_ktime_to_host() helper for converting time, but it accepts a ktime argument.

As a preparation, make hrtimer_nanosleep() accept a clock value in ktime instead of timespec64.

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
2020-01-14  time: Add do_timens_ktime_to_host() helper  (Andrei Vagin; 1 file, -0/+17)
The helper subtracts namespace's clock offset from the given time and ensures that the result is within [0, KTIME_MAX].

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
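A sketch of the helper's contract (the real version also selects the offset per clockid; the function name here follows the commit text, not the header):

    /* Subtract the namespace offset and clamp the result into
     * [0, KTIME_MAX]; the offset may be negative, hence the upper
     * bound check after the subtraction. */
    static inline ktime_t timens_sub_and_clamp(ktime_t tim, ktime_t offset)
    {
            if (tim < offset)
                    return 0;                       /* clamp below at 0 */
            tim = ktime_sub(tim, offset);
            if (tim > KTIME_MAX)
                    tim = KTIME_MAX;                /* clamp above */
            return tim;
    }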
2020-01-14  time: Add timens_offsets to be used for tasks in time namespace  (Andrei Vagin; 1 file, -0/+22)
Introduce offsets for time namespace. They will contain an adjustment needed to convert clocks to/from host's. A new namespace is created with the same offsets as the time namespace of the current process.

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
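The offsets structure this introduces is small; a sketch consistent with the series' description (one adjustment per settable clock):

    struct timens_offsets {
            struct timespec64 monotonic;    /* added to host CLOCK_MONOTONIC */
            struct timespec64 boottime;     /* added to host CLOCK_BOOTTIME */
    };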
2020-01-14  ns: Introduce Time Namespace  (Andrei Vagin; 4 files, -0/+77)
Time Namespace isolates clock values. The kernel provides access to several clocks: CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.

CLOCK_REALTIME
    System-wide clock that measures real (i.e., wall-clock) time.

CLOCK_MONOTONIC
    Clock that cannot be set and represents monotonic time since some unspecified starting point.

CLOCK_BOOTTIME
    Identical to CLOCK_MONOTONIC, except it also includes any time that the system is suspended.

For many users, the time namespace means the ability to change date and time in a container (CLOCK_REALTIME). Providing per-namespace notions of CLOCK_REALTIME would be complex with a massive overhead, but has a dubious value.

But in the context of checkpoint/restore functionality, monotonic and boottime clocks become interesting. Both clocks are monotonic with unspecified starting points. These clocks are widely used to measure time slices and set timers. After restoring or migrating processes, it has to be guaranteed that they never go backward. In an ideal case, the behavior of these clocks should be the same as for a case when a whole system is suspended. All this means that it is required to set CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace offsets for clocks.

A time namespace is similar to a pid namespace in the way it is created: the unshare(CLONE_NEWTIME) system call creates a new time namespace, but doesn't set it for the current process. Then all children of the process will be born in the new time namespace, or a process can use the setns() system call to join a namespace. This scheme allows setting clock offsets for a namespace before any processes appear in it.

All available clone flags have been used, so CLONE_NEWTIME uses the highest bit of CSIGNAL. It means that it can be used only with the unshare() and the clone3() system calls.

[ tglx: Adjusted paragraph about clone3() to reality and massaged the changelog a bit. ]

Co-developed-by: Dmitry Safonov <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/[email protected]
2020-01-14  soundwire: bus: fix device number leak on errors  (Pierre-Louis Bossart; 1 file, -1/+3)
If the programming of the dev_number fails due to an IO error, a new device_number will be assigned, resulting in a leak. Make sure we only assign a device_number once per Slave device.

Signed-off-by: Pierre-Louis Bossart <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
2020-01-14  phy: Add DisplayPort configuration options  (Yuti Amonkar; 2 files, -0/+100)
Allow DisplayPort PHYs to be configured through the generic functions through a custom structure added to the generic union. The configuration structure is used for reconfiguration of DisplayPort PHYs during the link training operation. The parameters added here are the ones defined in the DisplayPort spec v1.4, which include link rate, number of lanes, voltage swing and pre-emphasis.

Add the DisplayPort phy mode to the generic phy_mode enum.

Signed-off-by: Yuti Amonkar <[email protected]>
Reviewed-by: Maxime Ripard <[email protected]>
Reviewed-by: Jyri Sarha <[email protected]>
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
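A sketch of the configuration structure added to the generic union, with fields for the DP v1.4 link-training parameters named in the text (field names are an approximation of the header, not a verified copy):

    struct phy_configure_opts_dp {
            unsigned int link_rate;         /* link rate in 100 Mb/s units,
                                             * e.g. 1620, 2700, 5400, 8100 */
            unsigned int lanes;             /* 1, 2 or 4 lanes */
            unsigned int voltage[4];        /* per-lane voltage swing level */
            unsigned int pre[4];            /* per-lane pre-emphasis level */
            u8 set_rate : 1;                /* which of the above to apply */
            u8 set_lanes : 1;
            u8 set_voltages : 1;
    };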
2020-01-13  net: stmmac: Initial support for TBS  (Jose Abreu; 1 file, -0/+1)
Adds the initial hooks for TBS support. This needs a 32 byte descriptor in order for it to work with current HW. Adds all the logic for Enhanced Descriptors in main core but no HW related logic for now.

Changes from v2:
- Use bitfield for TBS status / support (Jakub)
- Remove unneeded cache alignment (Jakub)
- Fix checkpatch issues

Signed-off-by: Jose Abreu <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
2020-01-13  mm, debug_pagealloc: don't rely on static keys too early  (Vlastimil Babka; 1 file, -3/+15)
Commit 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging") has introduced a static key to reduce overhead when debug_pagealloc is compiled in but not enabled. It relied on the assumption that jump_label_init() is called before parse_early_param() as in start_kernel(), so when the "debug_pagealloc=on" option is parsed, it is safe to enable the static key.

However, it turns out multiple architectures call parse_early_param() earlier from their setup_arch(). x86 also calls jump_label_init() even earlier, so no issue was found while testing the commit, but the same is not true for e.g. ppc64 and s390 where the kernel would not boot with debug_pagealloc=on as found by our QA.

To fix this without tricky changes to init code of multiple architectures, this patch partially reverts the static key conversion from 96a2b03f281d. Init-time and non-fastpath calls (such as in arch code) of debug_pagealloc_enabled() will again test a simple bool variable. Fastpath mm code is converted to a new debug_pagealloc_enabled_static() variant that relies on the static key, which is enabled in a well-defined point in mm_init() where it's guaranteed that jump_label_init() has been called, regardless of architecture.

[[email protected]: export _debug_pagealloc_enabled_early]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 96a2b03f281d ("mm, debug_pagelloc: use static keys to enable debugging")
Signed-off-by: Vlastimil Babka <[email protected]>
Signed-off-by: Stephen Rothwell <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
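The resulting split, roughly (a sketch of the two accessors the changelog describes, not a verified copy of the header):

    extern bool _debug_pagealloc_enabled_early;
    DECLARE_STATIC_KEY_FALSE(_debug_pagealloc_enabled);

    /* init-time / slow-path check: a plain bool, safe to call before
     * jump_label_init() has run */
    static inline bool debug_pagealloc_enabled(void)
    {
            return IS_ENABLED(CONFIG_DEBUG_PAGEALLOC) &&
                    _debug_pagealloc_enabled_early;
    }

    /* fast-path check: static key, valid only once mm_init() has had
     * a chance to enable it after jump_label_init() */
    static inline bool debug_pagealloc_enabled_static(void)
    {
            if (!IS_ENABLED(CONFIG_DEBUG_PAGEALLOC))
                    return false;
            return static_branch_unlikely(&_debug_pagealloc_enabled);
    }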
2020-01-13  mm: memcg/slab: fix percpu slab vmstats flushing  (Roman Gushchin; 1 file, -3/+2)
Currently slab percpu vmstats are flushed twice: during the memcg offlining and just before freeing the memcg structure. Each time percpu counters are summed, added to the atomic counterparts and propagated up by the cgroup tree.

The second flushing is required due to how recursive vmstats are implemented: counters are batched in percpu variables on a local level, and once a percpu value is crossing some predefined threshold, it spills over to atomic values on the local and each ascendant levels. It means that without flushing some numbers cached in percpu variables will be dropped on the floor each time a cgroup is destroyed. And with uptime the error on upper levels might become noticeable.

The first flushing aims to make counters on ancestor levels more precise. Dying cgroups may remain in the dying state for a long time. After kmem_cache reparenting, which is performed during the offlining, slab counters of the dying cgroup don't have any chance to be updated, because any slab operations will be performed on the parent level. It means that the inaccuracy caused by percpu batching will not decrease up to the final destruction of the cgroup. By the original idea, flushing slab counters during the offlining should minimize the visible inaccuracy of slab counters on the parent level.

The problem is that percpu counters are not zeroed after the first flushing. So every cached percpu value is summed twice. It creates a small error (up to 32 pages per cpu, but usually less) which accumulates on the parent cgroup level. After creating and destroying of thousands of child cgroups, the slab counter on the parent level can be way off the real value.

For now, let's just stop flushing slab counters on memcg offlining. It can't be done correctly without scheduling a work on each cpu: reading and zeroing it during css offlining can race with an asynchronous update, which doesn't expect values to be changed underneath.

With this change, slab counters on the parent level will become eventually consistent. Once all dying children are gone, values are correct. And if not, the error is capped by 32 * NR_CPUS pages per dying cgroup.

It's not perfect, as slabs are reparented, so any updates after the reparenting will happen on the parent level. It means that if a slab page was allocated, a counter on the child level was bumped, then the page was reparented and freed, the annihilation of positive and negative counter values will not happen until the child cgroup is released. It makes slab counters different from others, and it might make us want to implement flushing in a correct form again. But it's also a question of performance: scheduling a work on each cpu isn't free, and it's an open question if the benefit of having more accurate counters is worth it.

We might also consider flushing all counters on offlining, not only slab counters.

So let's fix the main problem now: make the slab counters eventually consistent, so at least the error won't grow with uptime (or more precisely the number of created and destroyed cgroups). And think about the accuracy of counters separately.

Link: http://lkml.kernel.org/r/[email protected]
Fixes: bee07b33db78 ("mm: memcontrol: flush percpu slab vmstats on kmem offlining")
Signed-off-by: Roman Gushchin <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>