aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2020-12-04driver core: auxiliary bus: move slab.h from include fileGreg Kroah-Hartman1-1/+0
No need to include slab.h in include/linux/auxiliary_bus.h, as it is not needed there. Move it to drivers/base/auxiliary.c instead. Cc: Dan Williams <[email protected]> Cc: Dave Ertman <[email protected]> Cc: Fred Oh <[email protected]> Cc: Kiran Patil <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Martin Habets <[email protected]> Cc: Parav Pandit <[email protected]> Cc: Pierre-Louis Bossart <[email protected]> Cc: Ranjani Sridharan <[email protected]> Cc: Shiraz Saleem <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04mmc: tmio: set max_busy_timeoutWolfram Sang1-1/+6
Set max_busy_timeouts for variants known to support the TOPxx bits in the SD_OPTION register. The timeout mechanism was running in the background but not yet properly handled in the driver. So, let the MMC core know when to not use R1B to avoid unhandled timeouts. My datasheets for older variants (tmio_mmc.c) suggest that they support it, too. However, actual bit descriptions are lacking, so I chose an opt-in approach. Signed-off-by: Wolfram Sang <[email protected]> Reviewed-by: Yoshihiro Shimoda <[email protected]> Tested-by: Yoshihiro Shimoda <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2020-12-04Add auxiliary bus supportDave Ertman2-0/+86
Add support for the Auxiliary Bus, auxiliary_device and auxiliary_driver. It enables drivers to create an auxiliary_device and bind an auxiliary_driver to it. The bus supports probe/remove shutdown and suspend/resume callbacks. Each auxiliary_device has a unique string based id; driver binds to an auxiliary_device based on this id through the bus. Co-developed-by: Kiran Patil <[email protected]> Co-developed-by: Ranjani Sridharan <[email protected]> Co-developed-by: Fred Oh <[email protected]> Co-developed-by: Leon Romanovsky <[email protected]> Signed-off-by: Kiran Patil <[email protected]> Signed-off-by: Ranjani Sridharan <[email protected]> Signed-off-by: Fred Oh <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Dave Ertman <[email protected]> Reviewed-by: Pierre-Louis Bossart <[email protected]> Reviewed-by: Shiraz Saleem <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Dan Williams <[email protected]> Reviewed-by: Martin Habets <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]> Link: https://lore.kernel.org/r/160695681289.505290.8978295443574440604.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <[email protected]>
2020-12-04Merge remote-tracking branch 'origin/kvm-arm64/misc-5.11' into ↵Marc Zyngier1-0/+4
kvmarm-master/queue Signed-off-by: Marc Zyngier <[email protected]>
2020-12-04psci: Add accessor for psci_0_1_function_idsDavid Brazdil1-0/+9
Make it possible to retrieve a copy of the psci_0_1_function_ids struct. This is useful for KVM if it is configured to intercept host's PSCI SMCs. Signed-off-by: David Brazdil <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-03bpf: Allow to specify kernel module BTFs when attaching BPF programsAndrii Nakryiko1-0/+1
Add ability for user-space programs to specify non-vmlinux BTF when attaching BTF-powered BPF programs: raw_tp, fentry/fexit/fmod_ret, LSM, etc. For this, attach_prog_fd (now with the alias name attach_btf_obj_fd) should specify FD of a module or vmlinux BTF object. For backwards compatibility reasons, 0 denotes vmlinux BTF. Only kernel BTF (vmlinux or module) can be specified. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03bpf: Remove hard-coded btf_vmlinux assumption from BPF verifierAndrii Nakryiko3-12/+34
Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+24
Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03vfio-mdev: Wire in a request handler for mdev parentEric Farman1-0/+4
While performing some destructive tests with vfio-ccw, where the paths to a device are forcible removed and thus the device itself is unreachable, it is rather easy to end up in an endless loop in vfio_del_group_dev() due to the lack of a request callback for the associated device. In this example, one MDEV (77c) is used by a guest, while another (77b) is not. The symptom is that the iommu is detached from the mdev for 77b, but not 77c, until that guest is shutdown: [ 238.794867] vfio_ccw 0.0.077b: MDEV: Unregistering [ 238.794996] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: Removing from iommu group 2 [ 238.795001] vfio_mdev 11f2d2bc-4083-431d-a023-eff72715c4f0: MDEV: detaching iommu [ 238.795036] vfio_ccw 0.0.077c: MDEV: Unregistering ...silence... Let's wire in the request call back to the mdev device, so that a device being physically removed from the host can be (gracefully?) handled by the parent device at the time the device is removed. Add a message when registering the device if a driver doesn't provide this callback, so a clue is given that this same loop may be encountered in a similar situation, and a message when this occurs instead of the awkward silence noted above. Signed-off-by: Eric Farman <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Signed-off-by: Alex Williamson <[email protected]>
2020-12-03Merge tag 'net-5.10-rc7' of ↵Linus Torvalds2-2/+21
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.10-rc7, including fixes from bpf, netfilter, wireless drivers, wireless mesh and can. Current release - regressions: - mt76: usb: fix crash on device removal Current release - always broken: - xsk: Fix umem cleanup from wrong context in socket destruct Previous release - regressions: - net: ip6_gre: set dev->hard_header_len when using header_ops - ipv4: Fix TOS mask in inet_rtm_getroute() - net, xsk: Avoid taking multiple skbuff references Previous release - always broken: - net/x25: prevent a couple of overflows - netfilter: ipset: prevent uninit-value in hash_ip6_add - geneve: pull IP header before ECN decapsulation - mpls: ensure LSE is pullable in TC and openvswitch paths - vxlan: respect needed_headroom of lower device - batman-adv: Consider fragmentation for needed packet headroom - can: drivers: don't count arbitration loss as an error - netfilter: bridge: reset skb->pkt_type after POST_ROUTING traversal - inet_ecn: Fix endianness of checksum update when setting ECT(1) - ibmvnic: fix various corner cases around reset handling - net/mlx5: fix rejecting unsupported Connect-X6DX SW steering - net/mlx5: Enforce HW TX csum offload with kTLS" * tag 'net-5.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits) net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering net/mlx5e: kTLS, Enforce HW TX csum offload with kTLS net: mlx5e: fix fs_tcp.c build when IPV6 is not enabled net/mlx5: Fix wrong address reclaim when command interface is down net/sched: act_mpls: ensure LSE is pullable before reading it net: openvswitch: ensure LSE is pullable before reading it net: skbuff: ensure LSE is pullable before decrementing the MPLS ttl net: mvpp2: Fix error return code in mvpp2_open() chelsio/chtls: fix a double free in chtls_setkey() rtw88: debug: Fix uninitialized memory in debugfs code vxlan: fix error return code in __vxlan_dev_create() net: pasemi: fix error return code in pasemi_mac_open() cxgb3: fix error return code in t3_sge_alloc_qset() net/x25: prevent a couple of overflows dpaa_eth: copy timestamp fields to new skb in A-050385 workaround net: ip6_gre: set dev->hard_header_len when using header_ops mt76: usb: fix crash on device removal iwlwifi: pcie: add some missing entries for AX210 iwlwifi: pcie: invert values of NO_160 device config entries iwlwifi: pcie: add one missing entry for AX210 ...
2020-12-03security: add const qualifier to struct sock in various placesFlorian Westphal3-4/+4
A followup change to tcp_request_sock_op would have to drop the 'const' qualifier from the 'route_req' function as the 'security_inet_conn_request' call is moved there - and that function expects a 'struct sock *'. However, it turns out its also possible to add a const qualifier to security_inet_conn_request instead. Signed-off-by: Florian Westphal <[email protected]> Acked-by: James Morris <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03iio:trigger: rename try_reenable() to reenable() plus return voidJonathan Cameron1-2/+2
As we no longer support a try again if we cannot reenable the trigger rename the function to reflect this. Also we don't do anything with the value returned so stop it returning anything. For the few drivers that didn't already print an error message in this patch, add such a print. Signed-off-by: Jonathan Cameron <[email protected]> Reviewed-by: Lars-Peter Clausen <[email protected]> Acked-by: Linus Walleij <[email protected]> Acked-by: Srinivas Pandruvada <[email protected]> Cc: Linus Walleij <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Cc: Christian Oder <[email protected]> Cc: Eugen Hristev <[email protected]> Cc: Nishant Malpani <[email protected]> Cc: Daniel Baluta <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-03iio: ad_sigma_delta: Don't put SPI transfer buffer on the stackLars-Peter Clausen1-1/+5
Use a heap allocated memory for the SPI transfer buffer. Using stack memory can corrupt stack memory when using DMA on some systems. This change moves the buffer from the stack of the trigger handler call to the heap of the buffer of the state struct. The size increases takes into account the alignment for the timestamp, which is 8 bytes. The 'data' buffer is split into 'tx_buf' and 'rx_buf', to make a clearer separation of which part of the buffer should be used for TX & RX. Fixes: af3008485ea03 ("iio:adc: Add common code for ADI Sigma Delta devices") Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Alexandru Ardelean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>
2020-12-03net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steeringYevgeny Kliteynik1-1/+8
STEs format for Connect-X5 and Connect-X6DX different. Currently, on Connext-X6DX the SW steering would break at some point when building STEs w/o giving a proper error message. Fix this by checking the STE format of the current device when initializing domain: add mlx5_ifc definitions for Connect-X6DX SW steering, read FW capability to get the current format version, and check this version when domain is being created. Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-03macvlan: Support for high multicast packet rateThomas Karlsson1-0/+1
Background: Broadcast and multicast packages are enqueued for later processing. This queue was previously hardcoded to 1000. This proved insufficient for handling very high packet rates. This resulted in packet drops for multicast. While at the same time unicast worked fine. The change: This patch make the queue length adjustable to accommodate for environments with very high multicast packet rate. But still keeps the default value of 1000 unless specified. The queue length is specified as a request per macvlan using the IFLA_MACVLAN_BC_QUEUE_LEN parameter. The actual used queue length will then be the maximum of any macvlan connected to the same port. The actual used queue length for the port can be retrieved (read only) by the IFLA_MACVLAN_BC_QUEUE_LEN_USED parameter for verification. This will be followed up by a patch to iproute2 in order to adjust the parameter from userspace. Signed-off-by: Thomas Karlsson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-04signal: Add unsafe_put_compat_sigset()Christophe Leroy1-0/+32
Implement 'unsafe' version of put_compat_sigset() For the bigendian, use unsafe_put_user() directly to avoid intermediate copy through the stack. For the littleendian, use a straight unsafe_copy_to_user(). Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/537c7082ee309a0bb9c67a50c5d9dd929aedb82d.1597770847.git.christophe.leroy@csgroup.eu
2020-12-03Updated locking documentation for transaction_tAlexander Lochmann1-2/+3
We used LockDoc to derive locking rules for each member of struct transaction_t. Based on those results, we extended the existing documentation by more members of struct transaction_t, and updated the existing documentation. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexander Lochmann <[email protected]> Signed-off-by: Horst Schirmeier <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]>
2020-12-03fsnotify: generalize handle_inode_event()Amir Goldstein1-1/+2
The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/[email protected] CC: [email protected] Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <[email protected]> Signed-off-by: Jan Kara <[email protected]>
2020-12-03refcount: Fix a kernel-doc markupMauro Carvalho Chehab1-1/+1
The kernel-doc markup is wrong: it is asking the tool to document struct refcount_struct, instead of documenting typedef refcount_t. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/afb9bb1e675bf5f72a34a55d780779d7d5916b4c.1606823973.git.mchehab+huawei@kernel.org
2020-12-03completion: Drop init_completion defineMauro Carvalho Chehab1-3/+2
Changeset cd8084f91c02 ("locking/lockdep: Apply crossrelease to completions") added a CONFIG_LOCKDEP_COMPLETE (that was later renamed to CONFIG_LOCKDEP_COMPLETIONS). Such changeset renamed the init_completion, and add a macro that would either run a modified version or the original code. However, such code reported too many false positives. So, it ended being dropped later on by changeset e966eaeeb623 ("locking/lockdep: Remove the cross-release locking checks"). Yet, the define remained there as just: #define init_completion(x) __init_completion(x) Get rid of the define, and return __init_completion() function to its original name. Fixes: e966eaeeb623 ("locking/lockdep: Remove the cross-release locking checks") Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/e657bfc533545c185b1c3c55926a449ead56a88b.1606823973.git.mchehab+huawei@kernel.org
2020-12-03seqlock: Rename __seqprop() usersPeter Zijlstra1-23/+23
More consistent naming should make it easier to untangle the _Generic token pasting maze called __seqprop(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03seqlock: avoid -Wshadow warningsArnd Bergmann1-7/+7
When building with W=2, there is a flood of warnings about the seqlock macros shadowing local variables: 19806 linux/seqlock.h:331:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 48 linux/seqlock.h:348:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] 8 linux/seqlock.h:379:11: warning: declaration of 'seq' shadows a previous local [-Wshadow] Prefix the local variables to make the warning useful elsewhere again. Fixes: 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03mm: Introduce pXX_leaf_size()Peter Zijlstra1-0/+16
A number of architectures have non-pagetable aligned huge/large pages. For such architectures a leaf can actually be part of a larger entry. Provide generic helpers to determine the size of a page-table leaf. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03mm/gup: Provide gup_get_pte() more genericPeter Zijlstra1-0/+55
In order to write another lockless page-table walker, we need gup_get_pte() exposed. While doing that, rename it to ptep_get_lockless() to match the existing ptep_get() naming. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-12-03media: coda: Convert the driver to DT-onlyFabio Estevam1-14/+0
Since 5.10-rc1 i.MX is a devicetree-only platform, so simplify the code by removing the unused non-DT support. Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
2020-12-02Input: Add "inhibited" propertyPatrik Fimml1-2/+10
Userspace might want to implement a policy to temporarily disregard input from certain devices, including not treating them as wakeup sources. An example use case is a laptop, whose keyboard can be folded under the screen to create tablet-like experience. The user then must hold the laptop in such a way that it is difficult to avoid pressing the keyboard keys. It is therefore desirable to temporarily disregard input from the keyboard, until it is folded back. This obviously is a policy which should be kept out of the kernel, but the kernel must provide suitable means to implement such a policy. This patch adds a sysfs interface for exactly this purpose. To implement the said interface it adds an "inhibited" property to struct input_dev, and effectively creates four states a device can be in: closed uninhibited, closed inhibited, open uninhibited, open inhibited. It also defers calling driver's ->open() and ->close() to until they are actually needed, e.g. it makes no sense to prepare the underlying device for generating events (->open()) if the device is inhibited. uninhibit closed <------------ closed uninhibited ------------> inhibited | ^ inhibit | ^ 1st | | 1st | | open | | open | | | | | | | | last | | last | | close | | close v | uninhibit v | open <------------ open uninhibited ------------> inhibited The top inhibit/uninhibit transition happens when users == 0. The bottom inhibit/uninhibit transition happens when users > 0. The left open/close transition happens when !inhibited. The right open/close transition happens when inhibited. Due to all transitions being serialized with dev->mutex, it is impossible to have "diagonal" transitions between closed uninhibited and open inhibited or between open uninhibited and closed inhibited. No new callbacks are added to drivers, because their open() and close() serve exactly the purpose to tell the driver to start/stop providing events to the input core. Consequently, open() and close() - if provided - are called in both inhibit and uninhibit paths. Signed-off-by: Patrik Fimml <[email protected]> Co-developed-by: Andrzej Pietrasiewicz <[email protected]> Signed-off-by: Andrzej Pietrasiewicz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2020-12-02Input: add input_device_enabled()Andrzej Pietrasiewicz1-0/+2
A helper function for drivers to decide if the device is used or not. A lockdep check is introduced as inspecting ->users should be done under input device's mutex. Signed-off-by: Andrzej Pietrasiewicz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2020-12-02f2fs: compress: support chksumChao Yu1-1/+1
This patch supports to store chksum value with compressed data, and verify the integrality of compressed data while reading the data. The feature can be enabled through specifying mount option 'compress_chksum'. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2020-12-02fscrypt: Have filesystems handle their d_opsDaniel Rosenberg1-2/+5
This shifts the responsibility of setting up dentry operations from fscrypt to the individual filesystems, allowing them to have their own operations while still setting fscrypt's d_revalidate as appropriate. Most filesystems can just use generic_set_encrypted_ci_d_ops, unless they have their own specific dentry operations as well. That operation will set the minimal d_ops required under the circumstances. Since the fscrypt d_ops are set later on, we must set all d_ops there, since we cannot adjust those later on. This should not result in any change in behavior. Signed-off-by: Daniel Rosenberg <[email protected]> Acked-by: Theodore Ts'o <[email protected]> Acked-by: Eric Biggers <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2020-12-02libfs: Add generic function for setting dentry_opsDaniel Rosenberg1-0/+1
This adds a function to set dentry operations at lookup time that will work for both encrypted filenames and casefolded filenames. A filesystem that supports both features simultaneously can use this function during lookup preparations to set up its dentry operations once fscrypt no longer does that itself. Currently the casefolding dentry operation are always set if the filesystem defines an encoding because the features is toggleable on empty directories. Unlike in the encryption case, the dentry operations used come from the parent. Since we don't know what set of functions we'll eventually need, and cannot change them later, we enable the casefolding operations if the filesystem supports them at all. By splitting out the various cases, we support as few dentry operations as we can get away with, maximizing compatibility with overlayfs, which will not function if a filesystem supports certain dentry_operations. Signed-off-by: Daniel Rosenberg <[email protected]> Reviewed-by: Theodore Ts'o <[email protected]> Reviewed-by: Eric Biggers <[email protected]> Reviewed-by: Gabriel Krisman Bertazi <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2020-12-02bpf: Eliminate rlimit-based memory accounting for bpf progsRoman Gushchin1-11/+0
Do not use rlimit-based memory accounting for bpf progs. It has been replaced with memcg-based memory accounting. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02bpf: Eliminate rlimit-based memory accounting infra for bpf mapsRoman Gushchin1-12/+0
Remove rlimit-based accounting infrastructure code, which is not used anymore. To provide a backward compatibility, use an approximation of the bpf map memory footprint as a "memlock" value, available to a user via map info. The approximation is based on the maximal number of elements and key and value sizes. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02bpf: Prepare for memcg-based memory accounting for bpf mapsRoman Gushchin1-0/+34
Bpf maps can be updated from an interrupt context and in such case there is no process which can be charged. It makes the memory accounting of bpf maps non-trivial. Fortunately, after commit 4127c6504f25 ("mm: kmem: enable kernel memcg accounting from interrupt contexts") and commit b87d8cefe43c ("mm, memcg: rework remote charging API to support nesting") it's finally possible. To make the ownership model simple and consistent, when the map is created, the memory cgroup of the current process is recorded. All subsequent allocations related to the bpf map are charged to the same memory cgroup. It includes allocations made by any processes (even if they do belong to a different cgroup) and from interrupts. This commit introduces 3 new helpers, which will be used by following commits to enable the accounting of bpf maps memory: - bpf_map_kmalloc_node() - bpf_map_kzalloc() - bpf_map_alloc_percpu() They are wrapping popular memory allocation functions. They set the active memory cgroup to the map's memory cgroup and add __GFP_ACCOUNT to the passed gfp flags. Then they call into the corresponding memory allocation function and restore the original active memory cgroup. These helpers are supposed to use everywhere except the map creation path. During the map creation when the map structure is allocated by itself, it cannot be passed to those helpers. In those cases default memory allocation function will be used with the __GFP_ACCOUNT flag. Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02mm: Convert page kmemcg type to a page memcg flagRoman Gushchin2-13/+35
PageKmemcg flag is currently defined as a page type (like buddy, offline, table and guard). Semantically it means that the page was accounted as a kernel memory by the page allocator and has to be uncharged on the release. As a side effect of defining the flag as a page type, the accounted page can't be mapped to userspace (look at page_has_type() and comments above). In particular, this blocks the accounting of vmalloc-backed memory used by some bpf maps, because these maps do map the memory to userspace. One option is to fix it by complicating the access to page->mapcount, which provides some free bits for page->page_type. But it's way better to move this flag into page->memcg_data flags. Indeed, the flag makes no sense without enabled memory cgroups and memory cgroup pointer set in particular. This commit replaces PageKmemcg() and __SetPageKmemcg() with PageMemcgKmem() and an open-coded OR operation setting the memcg pointer with the MEMCG_DATA_KMEM bit. __ClearPageKmemcg() can be simple deleted, as the whole memcg_data is zeroed at once. As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will be compiled out. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02mm: Introduce page memcg flagsRoman Gushchin1-12/+20
The lowest bit in page->memcg_data is used to distinguish between struct memory_cgroup pointer and a pointer to a objcgs array. All checks and modifications of this bit are open-coded. Let's formalize it using page memcg flags, defined in enum page_memcg_data_flags. Additional flags might be added later. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02mm: memcontrol/slab: Use helpers to access slab page's memcg_dataRoman Gushchin1-0/+64
To gather all direct accesses to struct page's memcg_data field in one place, let's introduce 3 new helpers to use in the slab accounting code: struct obj_cgroup **page_objcgs(struct page *page); struct obj_cgroup **page_objcgs_check(struct page *page); bool set_page_objcgs(struct page *page, struct obj_cgroup **objcgs); They are similar to the corresponding API for generic pages, except that the setter can return false, indicating that the value has been already set from a different thread. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02mm: memcontrol: Use helpers to read page's memcg dataRoman Gushchin3-35/+106
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from a page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. Also it formalizes accesses to the page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks and removes a couple of obsolete functions. As the result the code became more robust with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers, which are barely used. It creates an obstacle on a way to reuse some bits of the pointer for storing additional bits of information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commits uses 2 existing helpers and introduces a new helper to converts all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02fscrypt: allow deleting files with unsupported encryption policyEric Biggers1-3/+6
Currently it's impossible to delete files that use an unsupported encryption policy, as the kernel will just return an error when performing any operation on the top-level encrypted directory, even just a path lookup into the directory or opening the directory for readdir. More specifically, this occurs in any of the following cases: - The encryption context has an unrecognized version number. Current kernels know about v1 and v2, but there could be more versions in the future. - The encryption context has unrecognized encryption modes (FSCRYPT_MODE_*) or flags (FSCRYPT_POLICY_FLAG_*), an unrecognized combination of modes, or reserved bits set. - The encryption key has been added and the encryption modes are recognized but aren't available in the crypto API -- for example, a directory is encrypted with FSCRYPT_MODE_ADIANTUM but the kernel doesn't have CONFIG_CRYPTO_ADIANTUM enabled. It's desirable to return errors for most operations on files that use an unsupported encryption policy, but the current behavior is too strict. We need to allow enough to delete files, so that people can't be stuck with undeletable files when downgrading kernel versions. That includes allowing directories to be listed and allowing dentries to be looked up. Fix this by modifying the key setup logic to treat an unsupported encryption policy in the same way as "key unavailable" in the cases that are required for a recursive delete to work: preparing for a readdir or a dentry lookup, revalidating a dentry, or checking whether an inode has the same encryption policy as its parent directory. Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-12-02fscrypt: unexport fscrypt_get_encryption_info()Eric Biggers1-6/+1
Now that fscrypt_get_encryption_info() is only called from files in fs/crypto/ (due to all key setup now being handled by higher-level helper functions instead of directly by filesystems), unexport it and move its declaration to fscrypt_private.h. Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-12-02fscrypt: move fscrypt_require_key() to fscrypt_private.hEric Biggers1-26/+0
fscrypt_require_key() is now only used by files in fs/crypto/. So reduce its visibility to fscrypt_private.h. This is also a prerequsite for unexporting fscrypt_get_encryption_info(). Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-12-02fscrypt: move body of fscrypt_prepare_setattr() out-of-lineEric Biggers1-2/+9
In preparation for reducing the visibility of fscrypt_require_key() by moving it to fscrypt_private.h, move the call to it from fscrypt_prepare_setattr() to an out-of-line function. Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-12-02fscrypt: introduce fscrypt_prepare_readdir()Eric Biggers1-0/+24
The last remaining use of fscrypt_get_encryption_info() from filesystems is for readdir (->iterate_shared()). Every other call is now in fs/crypto/ as part of some other higher-level operation. We need to add a new argument to fscrypt_get_encryption_info() to indicate whether the encryption policy is allowed to be unrecognized or not. Doing this is easier if we can work with high-level operations rather than direct filesystem use of fscrypt_get_encryption_info(). So add a function fscrypt_prepare_readdir() which wraps the call to fscrypt_get_encryption_info() for the readdir use case. Reviewed-by: Andreas Dilger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-12-02Merge tag 'mlx5-next-2020-12-02' of ↵Jakub Kicinski4-17/+99
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-next-2020-12-02 Low level mlx5 updates required by both netdev and rdma trees. * tag 'mlx5-next-2020-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Treat host PF vport as other (non eswitch manager) vport net/mlx5: Enable host PF HCA after eswitch is initialized net/mlx5: Rename peer_pf to host_pf net/mlx5: Make API mlx5_core_is_ecpf accept const pointer net/mlx5: Export steering related functions net/mlx5: Expose other function ifc bits net/mlx5: Expose IP-in-IP TX and RX capability bits net/mlx5: Update the hardware interface definition for vhca state net/mlx5: Update the list of the PCI supported devices net/mlx5: Avoid exposing driver internal command helpers net/mlx5: Add ts_cqe_to_dest_cqn related bits net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits net/mlx5: Check dr mask size against mlx5_match_param size net/mlx5: Add sampler destination type net/mlx5: Add sample offload hardware bits and structures ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2020-12-02bpf: Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooksStanislav Fomichev1-6/+6
I have to now lock/unlock socket for the bind hook execution. That shouldn't cause any overhead because the socket is unbound and shouldn't receive any traffic. Signed-off-by: Stanislav Fomichev <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrey Ignatov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-12-02Input: remove input_polled_dev implementationDmitry Torokhov1-58/+0
Now that normal input devices support polling mode, and all users of input_polled_dev API have been converted, we can remove it. Signed-off-by: Dmitry Torokhov <[email protected]>
2020-12-02irqtime: Move irqtime entry accounting after irq offset incrementationFrederic Weisbecker2-12/+26
IRQ time entry is currently accounted before HARDIRQ_OFFSET or SOFTIRQ_OFFSET are incremented. This is convenient to decide to which index the cputime to account is dispatched. Unfortunately it prevents tick_irq_enter() from being called under HARDIRQ_OFFSET because tick_irq_enter() has to be called before the IRQ entry accounting due to the necessary clock catch up. As a result we don't benefit from appropriate lockdep coverage on tick_irq_enter(). To prepare for fixing this, move the IRQ entry cputime accounting after the preempt offset is incremented. This requires the cputime dispatch code to handle the extra offset. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02sched/vtime: Consolidate IRQ time accountingFrederic Weisbecker1-10/+6
The 3 architectures implementing CONFIG_VIRT_CPU_ACCOUNTING_NATIVE all have their own version of irq time accounting that dispatch the cputime to the appropriate index: hardirq, softirq, system, idle, guest... from an all-in-one function. Instead of having these ad-hoc versions, move the cputime destination dispatch decision to the core code and leave only the actual per-index cputime accounting to the architecture. Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-12-02SUNRPC: Remove unused function xprt_load_transport()Trond Myklebust1-1/+0
Signed-off-by: Trond Myklebust <[email protected]>
2020-12-02SUNRPC: Add a helper to return the transport identifier given a netidTrond Myklebust1-0/+1
Signed-off-by: Trond Myklebust <[email protected]>
2020-12-02SUNRPC: xprt_load_transport() needs to support the netid "rdma6"Trond Myklebust1-0/+1
According to RFC5666, the correct netid for an IPv6 addressed RDMA transport is "rdma6", which we've supported as a mount option since Linux-4.7. The problem is when we try to load the module "xprtrdma6", that will fail, since there is no modulealias of that name. Fixes: 181342c5ebe8 ("xprtrdma: Add rdma6 option to support NFS/RDMA IPv6") Signed-off-by: Trond Myklebust <[email protected]>