aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2020-01-24mptcp: Handle MP_CAPABLE options for outgoing connectionsPeter Krystad2-0/+60
Add hooks to tcp_output.c to add MP_CAPABLE to an outgoing SYN request, to capture the MP_CAPABLE in the received SYN-ACK, to add MP_CAPABLE to the final ACK of the three-way handshake. Use the .sk_rx_dst_set() handler in the subflow proto to capture when the responding SYN-ACK is received and notify the MPTCP connection layer. Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Christoph Paasch <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24mptcp: Associate MPTCP context with TCP socketPeter Krystad1-0/+3
Use ULP to associate a subflow_context structure with each TCP subflow socket. Creating these sockets requires new bind and connect functions to make sure ULP is set up immediately when the subflow sockets are created. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Davide Caratti <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Christoph Paasch <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24mptcp: Handle MPTCP TCP optionsPeter Krystad2-0/+36
Add hooks to parse and format the MP_CAPABLE option. This option is handled according to MPTCP version 0 (RFC6824). MPTCP version 1 MP_CAPABLE (RFC6824bis/RFC8684) will be added later in coordination with related code changes. Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Davide Caratti <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Signed-off-by: Christoph Paasch <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24mptcp: Add MPTCP socket stubsMat Martineau1-0/+16
Implements the infrastructure for MPTCP sockets. MPTCP sockets open one in-kernel TCP socket per subflow. These subflow sockets are only managed by the MPTCP socket that owns them and are not visible from userspace. This commit allows a userspace program to open an MPTCP socket with: sock = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); The resulting socket is simply a wrapper around a single regular TCP socket, without any of the MPTCP protocol implemented over the wire. Co-developed-by: Florian Westphal <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Co-developed-by: Peter Krystad <[email protected]> Signed-off-by: Peter Krystad <[email protected]> Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Co-developed-by: Paolo Abeni <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Christoph Paasch <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24net: bridge: vlan: add per-vlan stateNikolay Aleksandrov1-0/+1
The first per-vlan option added is state, it is needed for EVPN and for per-vlan STP. The state allows to control the forwarding on per-vlan basis. The vlan state is considered only if the port state is forwarding in order to avoid conflicts and be consistent. br_allowed_egress is called only when the state is forwarding, but the ingress case is a bit more complicated due to the fact that we may have the transition between port:BR_STATE_FORWARDING -> vlan:BR_STATE_LEARNING which should still allow the bridge to learn from the packet after vlan filtering and it will be dropped after that. Also to optimize the pvid state check we keep a copy in the vlan group to avoid one lookup. The state members are modified with *_ONCE() to annotate the lockless access. Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24net: bridge: vlan: add basic option setting supportNikolay Aleksandrov1-0/+1
This patch adds support for option modification of single vlans and ranges. It allows to only modify options, i.e. skip create/delete by using the BRIDGE_VLAN_INFO_ONLY_OPTS flag. When working with a range option changes we try to pack the notifications as much as possible. v2: do full port (all vlans) notification only when creating/deleting vlans for compatibility, rework the range detection when changing options, add more verbose extack errors and check if a vlan should be used (br_vlan_should_use checks) Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-24dmaengine: Create symlinks between DMA channels and slavesGeert Uytterhoeven1-0/+4
Currently it is not easy to find out which DMA channels are in use, and which slave devices are using which channels. Fix this by creating two symlinks between the DMA channel and the actual slave device when a channel is requested: 1. A "slave" symlink from DMA channel to slave device, 2. A "dma:<name>" symlink slave device to DMA channel. When the channel is released, the symlinks are removed again. The latter requires keeping track of the slave device and the channel name in the dma_chan structure. Note that this is limited to channel request functions for requesting an exclusive slave channel that take a device pointer (dma_request_chan() and dma_request_slave_channel*()). Signed-off-by: Geert Uytterhoeven <[email protected]> Tested-by: Niklas Söderlund <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2020-01-24dmaengine: idxd: Init and probe for Intel data acceleratorsDave Jiang1-0/+228
The idxd driver introduces the Intel Data Stream Accelerator [1] that will be available on future Intel Xeon CPUs. One of the kernel access point for the driver is through the dmaengine subsystem. It will initially provide the DMA copy service to the kernel. Some of the main functionality introduced with this accelerator are: shared virtual memory (SVM) support, and descriptor submission using Intel CPU instructions movdir64b and enqcmds. There will be additional accelerator devices that share the same driver with variations to capabilities. This commit introduces the probe and initialization component of the driver. [1]: https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/157965023991.73301.6186843973135311580.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <[email protected]>
2020-01-24dmaengine: add support to dynamic register/unregister of channelsDave Jiang1-0/+4
With the channel registration routines broken out, now add support code to allow independent registering and unregistering of channels in a hotplug fashion. Signed-off-by: Dave Jiang <[email protected]> Link: https://lore.kernel.org/r/157965023364.73301.7821862091077299040.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <[email protected]>
2020-01-23hwmon: (pmbus) Detect if chip is write protectedGuenter Roeck1-1/+10
If a chip is write protected, we can not change any limits, and we can not clear status flags. This may be the reason why clearing status flags is reported to not work for some chips. Detect the condition in the pmbus core. If the chip is write protected, set limit attributes as read-only, and set the flag indicating that the status flag should be ignored. Signed-off-by: Guenter Roeck <[email protected]>
2020-01-23hwmon: Add support for enable attributes to hwmon coreGuenter Roeck1-3/+15
The hwmon ABI supports enable attributes since commit fb41a710f84e ("hwmon: Document the sensor enable attribute"), but did not add support for those attributes to the hwmon core. Do that now. Since the enable attributes are logically the most important attributes, they are added as first attribute to the attribute list. Move hwmon_in_enable from last to first place for consistency. Signed-off-by: Guenter Roeck <[email protected]>
2020-01-23hwmon: Add intrusion templatesDr. David Alan Gilbert1-0/+8
Add templates for intrusion%d_alarm and intrusion%d_beep. Note, these start at 0. Signed-off-by: Dr. David Alan Gilbert <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2020-01-23Merge tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds1-5/+40
Pull XArray fixes from Matthew Wilcox: "Primarily bugfixes, mostly around handling index wrap-around correctly. A couple of doc fixes and adding missing APIs. I had an oops live on stage at linux.conf.au this year, and it turned out to be a bug in xas_find() which I can't prove isn't triggerable in the current codebase. Then in looking for the bug, I spotted two more bugs. The bots have had a few days to chew on this with no problems reported, and it passes the test-suite (which now has more tests to make sure these problems don't come back)" * tag 'xarray-5.5' of git://git.infradead.org/users/willy/linux-dax: XArray: Add xa_for_each_range XArray: Fix xas_find returning too many entries XArray: Fix xa_find_after with multi-index entries XArray: Fix infinite loop with entry at ULONG_MAX XArray: Add wrappers for nested spinlocks XArray: Improve documentation of search marks XArray: Fix xas_pause at ULONG_MAX
2020-01-23Merge tag 'trace-v5.5-rc6-2' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Various tracing fixes: - Fix a function comparison warning for a xen trace event macro - Fix a double perf_event linking to a trace_uprobe_filter for multiple events - Fix suspicious RCU warnings in trace event code for using list_for_each_entry_rcu() when the "_rcu" portion wasn't needed. - Fix a bug in the histogram code when using the same variable - Fix a NULL pointer dereference when tracefs lockdown enabled and calling trace_set_default_clock() - A fix to a bug found with the double perf_event linking patch" * tag 'trace-v5.5-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/uprobe: Fix to make trace_uprobe_filter alignment safe tracing: Do not set trace clock if tracefs lockdown is in effect tracing: Fix histogram code when expression has same var as value tracing: trigger: Replace unneeded RCU-list traversals tracing/uprobe: Fix double perf_event linking on multiprobe uprobe tracing: xen: Ordered comparison of function pointers
2020-01-23bcache: print written and keys in trace_bcache_btree_writeGuoju Fang1-1/+2
It's useful to dump written block and keys on btree write, this patch add them into trace_bcache_btree_write. Signed-off-by: Guoju Fang <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-23bcache: use read_cache_page_gfp to read the superblockChristoph Hellwig1-0/+1
Avoid a pointless dependency on buffer heads in bcache by simply open coding reading a single page. Also add a SB_OFFSET define for the byte offset of the superblock instead of using magic numbers. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-23bcache: use a separate data structure for the on-disk super blockChristoph Hellwig1-0/+51
Split out an on-disk version struct cache_sb with the proper endianness annotations. This fixes a fair chunk of sparse warnings, but there are some left due to the way the checksum is defined. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Coly Li <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-01-23Merge back new material related to system-wide PM for v5.6.Rafael J. Wysocki1-0/+2
2020-01-23Merge branch 'spi-5.6' into spi-nextMark Brown2-4/+8
2020-01-23Merge remote-tracking branch 'regulator/topic/equal' into regulator-nextMark Brown1-0/+7
2020-01-23Merge branch 'asoc-5.6' into asoc-nextMark Brown16-56/+355
2020-01-23net: sched: add Flow Queue PIE packet schedulerMohit P. Tahiliani2-0/+33
Principles: - Packets are classified on flows. - This is a Stochastic model (as we use a hash, several flows might be hashed to the same slot) - Each flow has a PIE managed queue. - Flows are linked onto two (Round Robin) lists, so that new flows have priority on old ones. - For a given flow, packets are not reordered. - Drops during enqueue only. - ECN capability is off by default. - ECN threshold (if ECN is enabled) is at 10% by default. - Uses timestamps to calculate queue delay by default. Usage: tc qdisc ... fq_pie [ limit PACKETS ] [ flows NUMBER ] [ target TIME ] [ tupdate TIME ] [ alpha NUMBER ] [ beta NUMBER ] [ quantum BYTES ] [ memory_limit BYTES ] [ ecnprob PERCENTAGE ] [ [no]ecn ] [ [no]bytemode ] [ [no_]dq_rate_estimator ] defaults: limit: 10240 packets, flows: 1024 target: 15 ms, tupdate: 15 ms (in jiffies) alpha: 1/8, beta : 5/4 quantum: device MTU, memory_limit: 32 Mb ecnprob: 10%, ecn: off bytemode: off, dq_rate_estimator: off Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Sachin D. Patil <[email protected]> Signed-off-by: V. Saicharan <[email protected]> Signed-off-by: Mohit Bhasi <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: sched: pie: export symbols to be reused by FQ-PIEMohit P. Tahiliani1-0/+9
This patch makes the drop_early(), calculate_probability() and pie_process_dequeue() functions generic enough to be used by both PIE and FQ-PIE (to be added in a future commit). The major change here is in the way the functions take in arguments. This patch exports these functions and makes FQ-PIE dependent on sch_pie. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23pie: improve comments and commenting styleMohit P. Tahiliani1-27/+58
Improve the comments along with the commenting style used to describe the members of the structures and their initial values in the init functions. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23pie: rearrange structure members and their initializationsMohit P. Tahiliani1-10/+10
Rearrange the members of the structure such that closely referenced members appear together and/or fit in the same cacheline. Also, change the order of their initializations to match the order in which they appear in the structure. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23pie: use u8 instead of bool in pie_varsMohit P. Tahiliani1-2/+2
Linux best practice recommends using u8 for true/false values in structures. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23pie: rearrange macros in order of lengthMohit P. Tahiliani1-5/+5
Rearrange macros in order of length and align the values to improve readability. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23pie: use U64_MAX to denote (2^64 - 1)Mohit P. Tahiliani1-2/+2
Use the U64_MAX macro to denote the constant (2^64 - 1). Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: sched: pie: move common code to pie.hMohit P. Tahiliani1-0/+96
This patch moves macros, structures and small functions common to PIE and FQ-PIE (to be added in a future commit) from the file net/sched/sch_pie.c to the header file include/net/pie.h. All the moved functions are made inline. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link()Eric Dumazet1-0/+2
rtnl_create_link() needs to apply dev->min_mtu and dev->max_mtu checks that we apply in do_setlink() Otherwise malicious users can crash the kernel, for example after an integer overflow : BUG: KASAN: use-after-free in memset include/linux/string.h:365 [inline] BUG: KASAN: use-after-free in __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 Write of size 32 at addr ffff88819f20b9c0 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd4/0x30b mm/kasan/report.c:374 __kasan_report.cold+0x1b/0x41 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:639 check_memory_region_inline mm/kasan/generic.c:185 [inline] check_memory_region+0x134/0x1a0 mm/kasan/generic.c:192 memset+0x24/0x40 mm/kasan/common.c:108 memset include/linux/string.h:365 [inline] __alloc_skb+0x37b/0x5e0 net/core/skbuff.c:238 alloc_skb include/linux/skbuff.h:1049 [inline] alloc_skb_with_frags+0x93/0x590 net/core/skbuff.c:5664 sock_alloc_send_pskb+0x7ad/0x920 net/core/sock.c:2242 sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2259 mld_newpack+0x1d7/0x7f0 net/ipv6/mcast.c:1609 add_grhead.isra.0+0x299/0x370 net/ipv6/mcast.c:1713 add_grec+0x7db/0x10b0 net/ipv6/mcast.c:1844 mld_send_cr net/ipv6/mcast.c:1970 [inline] mld_ifc_timer_expire+0x3d3/0x950 net/ipv6/mcast.c:2477 call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1404 expire_timers kernel/time/timer.c:1449 [inline] __run_timers kernel/time/timer.c:1773 [inline] __run_timers kernel/time/timer.c:1740 [inline] run_timer_softirq+0x6c3/0x1790 kernel/time/timer.c:1786 __do_softirq+0x262/0x98c kernel/softirq.c:292 invoke_softirq kernel/softirq.c:373 [inline] irq_exit+0x19b/0x1e0 kernel/softirq.c:413 exiting_irq arch/x86/include/asm/apic.h:536 [inline] smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1137 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61 Code: 98 6b ea f9 eb 8a cc cc cc cc cc cc e9 07 00 00 00 0f 00 2d 44 1c 60 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d 34 1c 60 00 fb f4 <c3> cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 4e 5d 9a f9 e8 79 RSP: 0018:ffffffff89807ce8 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RAX: 1ffffffff13266ae RBX: ffffffff8987a1c0 RCX: 0000000000000000 RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffffffff8987aa54 RBP: ffffffff89807d18 R08: ffffffff8987a1c0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: dffffc0000000000 R13: ffffffff8a799980 R14: 0000000000000000 R15: 0000000000000000 arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:690 default_idle_call+0x84/0xb0 kernel/sched/idle.c:94 cpuidle_idle_call kernel/sched/idle.c:154 [inline] do_idle+0x3c8/0x6e0 kernel/sched/idle.c:269 cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:361 rest_init+0x23b/0x371 init/main.c:451 arch_call_rest_init+0xe/0x1b start_kernel+0x904/0x943 init/main.c:784 x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:490 x86_64_start_kernel+0x77/0x7b arch/x86/kernel/head64.c:471 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242 The buggy address belongs to the page: page:ffffea00067c82c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 raw: 057ffe0000000000 ffffea00067c82c8 ffffea00067c82c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88819f20b880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20b900: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88819f20b980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88819f20ba00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88819f20ba80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller11-93/+343
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-01-22 The following pull-request contains BPF updates for your *net-next* tree. We've added 92 non-merge commits during the last 16 day(s) which contain a total of 320 files changed, 7532 insertions(+), 1448 deletions(-). The main changes are: 1) function by function verification and program extensions from Alexei. 2) massive cleanup of selftests/bpf from Toke and Andrii. 3) batched bpf map operations from Brian and Yonghong. 4) tcp congestion control in bpf from Martin. 5) bulking for non-map xdp_redirect form Toke. 6) bpf_send_signal_thread helper from Yonghong. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-22bpf: Add BPF_FUNC_jiffies64Martin KaFai Lau2-1/+9
This patch adds a helper to read the 64bit jiffies. It will be used in a later patch to implement the bpf_cubic.c. The helper is inlined for jit_requested and 64 BITS_PER_LONG as the map_gen_lookup(). Other cases could be considered together with map_gen_lookup() if needed. Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-23Merge branch 'intel_idle+acpi'Rafael J. Wysocki2-0/+16
Merge changes updating the ACPI processor driver in order to export acpi_processor_evaluate_cst() to the code outside of it and adding ACPI support to the intel_idle driver based on that. * intel_idle+acpi: Documentation: admin-guide: PM: Add intel_idle document intel_idle: Use ACPI _CST on server systems intel_idle: Add module parameter to prevent ACPI _CST from being used intel_idle: Allow ACPI _CST to be used for selected known processors cpuidle: Allow idle states to be disabled by default intel_idle: Use ACPI _CST for processor models without C-state tables intel_idle: Refactor intel_idle_cpuidle_driver_init() ACPI: processor: Export acpi_processor_evaluate_cst() ACPI: processor: Make ACPI_PROCESSOR_CSTATE depend on ACPI_PROCESSOR ACPI: processor: Clean up acpi_processor_evaluate_cst() ACPI: processor: Introduce acpi_processor_evaluate_cst() ACPI: processor: Export function to claim _CST control
2020-01-22fscrypt: improve format of no-key namesDaniel Rosenberg1-75/+2
When an encrypted directory is listed without the key, the filesystem must show "no-key names" that uniquely identify directory entries, are at most 255 (NAME_MAX) bytes long, and don't contain '/' or '\0'. Currently, for short names the no-key name is the base64 encoding of the ciphertext filename, while for long names it's the base64 encoding of the ciphertext filename's dirhash and second-to-last 16-byte block. This format has the following problems: - Since it doesn't always include the dirhash, it's incompatible with directories that will use a secret-keyed dirhash over the plaintext filenames. In this case, the dirhash won't be computable from the ciphertext name without the key, so it instead must be retrieved from the directory entry and always included in the no-key name. Casefolded encrypted directories will use this type of dirhash. - It's ambiguous: it's possible to craft two filenames that map to the same no-key name, since the method used to abbreviate long filenames doesn't use a proper cryptographic hash function. Solve both these problems by switching to a new no-key name format that is the base64 encoding of a variable-length structure that contains the dirhash, up to 149 bytes of the ciphertext filename, and (if any bytes remain) the SHA-256 of the remaining bytes of the ciphertext filename. This ensures that each no-key name contains everything needed to find the directory entry again, contains only legal characters, doesn't exceed NAME_MAX, is unambiguous unless there's a SHA-256 collision, and that we only take the performance hit of SHA-256 on very long filenames. Note: this change does *not* address the existing issue where users can modify the 'dirhash' part of a no-key name and the filesystem may still accept the name. Signed-off-by: Daniel Rosenberg <[email protected]> [EB: improved comments and commit message, fixed checking return value of base64_decode(), check for SHA-256 error, continue to set disk_name for short names to keep matching simpler, and many other cleanups] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-01-22fscrypt: derive dirhash key for casefolded directoriesDaniel Rosenberg1-0/+10
When we allow indexed directories to use both encryption and casefolding, for the dirhash we can't just hash the ciphertext filenames that are stored on-disk (as is done currently) because the dirhash must be case insensitive, but the stored names are case-preserving. Nor can we hash the plaintext names with an unkeyed hash (or a hash keyed with a value stored on-disk like ext4's s_hash_seed), since that would leak information about the names that encryption is meant to protect. Instead, if we can accept a dirhash that's only computable when the fscrypt key is available, we can hash the plaintext names with a keyed hash using a secret key derived from the directory's fscrypt master key. We'll use SipHash-2-4 for this purpose. Prepare for this by deriving a SipHash key for each casefolded encrypted directory. Make sure to handle deriving the key not only when setting up the directory's fscrypt_info, but also in the case where the casefold flag is enabled after the fscrypt_info was already set up. (We could just always derive the key regardless of casefolding, but that would introduce unnecessary overhead for people not using casefolding.) Signed-off-by: Daniel Rosenberg <[email protected]> [EB: improved commit message, updated fscrypt.rst, squashed with change that avoids unnecessarily deriving the key, and many other cleanups] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-01-22fscrypt: don't allow v1 policies with casefoldingDaniel Rosenberg1-0/+9
Casefolded encrypted directories will use a new dirhash method that requires a secret key. If the directory uses a v2 encryption policy, it's easy to derive this key from the master key using HKDF. However, v1 encryption policies don't provide a way to derive additional keys. Therefore, don't allow casefolding on directories that use a v1 policy. Specifically, make it so that trying to enable casefolding on a directory that has a v1 policy fails, trying to set a v1 policy on a casefolded directory fails, and trying to open a casefolded directory that has a v1 policy (if one somehow exists on-disk) fails. Signed-off-by: Daniel Rosenberg <[email protected]> [EB: improved commit message, updated fscrypt.rst, and other cleanups] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]>
2020-01-22bpf: Introduce dynamic program extensionsAlexei Starovoitov4-1/+17
Introduce dynamic program extensions. The users can load additional BPF functions and replace global functions in previously loaded BPF programs while these programs are executing. Global functions are verified individually by the verifier based on their types only. Hence the global function in the new program which types match older function can safely replace that corresponding function. This new function/program is called 'an extension' of old program. At load time the verifier uses (attach_prog_fd, attach_btf_id) pair to identify the function to be replaced. The BPF program type is derived from the target program into extension program. Technically bpf_verifier_ops is copied from target program. The BPF_PROG_TYPE_EXT program type is a placeholder. It has empty verifier_ops. The extension program can call the same bpf helper functions as target program. Single BPF_PROG_TYPE_EXT type is used to extend XDP, SKB and all other program types. The verifier allows only one level of replacement. Meaning that the extension program cannot recursively extend an extension. That also means that the maximum stack size is increasing from 512 to 1024 bytes and maximum function nesting level from 8 to 16. The programs don't always consume that much. The stack usage is determined by the number of on-stack variables used by the program. The verifier could have enforced 512 limit for combined original plus extension program, but it makes for difficult user experience. The main use case for extensions is to provide generic mechanism to plug external programs into policy program or function call chaining. BPF trampoline is used to track both fentry/fexit and program extensions because both are using the same nop slot at the beginning of every BPF function. Attaching fentry/fexit to a function that was replaced is not allowed. The opposite is true as well. Replacing a function that currently being analyzed with fentry/fexit is not allowed. The executable page allocated by BPF trampoline is not used by program extensions. This inefficiency will be optimized in future patches. Function by function verification of global function supports scalars and pointer to context only. Hence program extensions are supported for such class of global functions only. In the future the verifier will be extended with support to pointers to structures, arrays with sizes, etc. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-22ima: add the ability to query the cached hash of a given fileFlorent Revest1-0/+6
This allows other parts of the kernel (perhaps a stacked LSM allowing system monitoring, eg. the proposed KRSI LSM [1]) to retrieve the hash of a given file from IMA if it's present in the iint cache. It's true that the existence of the hash means that it's also in the audit logs or in /sys/kernel/security/ima/ascii_runtime_measurements, but it can be difficult to pull that information out for every subsequent exec. This is especially true if a given host has been up for a long time and the file was first measured a long time ago. It should be kept in mind that this function gives access to cached entries which can be removed, for instance on security_inode_free(). This is based on Peter Moody's patch: https://sourceforge.net/p/linux-ima/mailman/message/33036180/ [1] https://lkml.org/lkml/2019/9/10/393 Signed-off-by: Florent Revest <[email protected]> Reviewed-by: KP Singh <[email protected]> Signed-off-by: Mimi Zohar <[email protected]>
2020-01-22Merge tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-1/+2
Pull io_uring fix from Jens Axboe: "This was supposed to have gone in last week, but due to a brain fart on my part, I forgot that we made this struct addition in the 5.5 cycle. So here it is for 5.5, to prevent having a 32 vs 64-bit compatability issue with the files_update command" * tag 'io_uring-5.5-2020-01-22' of git://git.kernel.dk/linux-block: io_uring: fix compat for IORING_REGISTER_FILES_UPDATE
2020-01-22genirq, sched/isolation: Isolate from handling managed interruptsMing Lei1-0/+1
The affinity of managed interrupts is completely handled in the kernel and cannot be changed via the /proc/irq/* interfaces from user space. As the kernel tries to spread out interrupts evenly accross CPUs on x86 to prevent vector exhaustion, it can happen that a managed interrupt whose affinity mask contains both isolated and housekeeping CPUs is routed to an isolated CPU. As a consequence IO submitted on a housekeeping CPU causes interrupts on the isolated CPU. Add a new sub-parameter 'managed_irq' for 'isolcpus' and the corresponding logic in the interrupt affinity selection code. The subparameter indicates to the interrupt affinity selection logic that it should try to avoid the above scenario. This isolation is best effort and only effective if the automatically assigned interrupt mask of a device queue contains isolated and housekeeping CPUs. If housekeeping CPUs are online then such interrupts are directed to the housekeeping CPU so that IO submitted on the housekeeping CPU cannot disturb the isolated CPU. If a queue's affinity mask contains only isolated CPUs then this parameter has no effect on the interrupt routing decision, though interrupts are only happening when tasks running on those isolated CPUs submit IO. IO submitted on housekeeping CPUs has no influence on those queues. If the affinity mask contains both housekeeping and isolated CPUs, but none of the contained housekeeping CPUs is online, then the interrupt is also routed to an isolated CPU. Interrupts are only delivered when one of the isolated CPUs in the affinity mask submits IO. If one of the contained housekeeping CPUs comes online, the CPU hotplug logic migrates the interrupt automatically back to the upcoming housekeeping CPU. Depending on the type of interrupt controller, this can require that at least one interrupt is delivered to the isolated CPU in order to complete the migration. [ tglx: Removed unused parameter, added and edited comments/documentation and rephrased the changelog so it contains more details. ] Signed-off-by: Ming Lei <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: Allow direct invalidation of VLPIsMarc Zyngier1-0/+1
Just like for INVALL, GICv4.1 has grown a VPE-aware INVLPI register. Let's plumb it in and make use of the DirectLPI code in that case. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: Add VPE INVALL callbackMarc Zyngier1-0/+6
GICv4.1 redistributors have a VPE-aware INVALL register. Progress! We can now emulate a guest-requested INVALL without emiting a VINVALL command. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: Add VPE residency callbackMarc Zyngier2-0/+14
Making a VPE resident on GICv4.1 is pretty simple, as it is just a single write to the local redistributor. We just need extra information about which groups to enable, which the KVM code will have to provide. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: Add mask/unmask doorbell callbacksMarc Zyngier1-1/+2
masking/unmasking doorbells on GICv4.1 relies on a new INVDB command, which broadcasts the invalidation to all RDs. Implement the new command as well as the masking callbacks, and plug the whole thing into the v4.1 VPE irqchip. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPPMarc Zyngier1-4/+14
The ITS VMAPP command gains some new fields with GICv4.1: - a default doorbell, which allows a single doorbell to be used for all the VLPIs routed to a given VPE - a pointer to the configuration table (instead of having it in a register that gets context switched) - a flag indicating whether this is the first map or the last unmap for this particular VPE - a flag indicating whether the pending table is known to be zeroed, or not Plumb in the new fields in the VMAPP builder, and add the map/unmap refcounting so that the ITS can do the right thing. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocationMarc Zyngier1-4/+29
GICv4.1 defines a new VPE table that is potentially shared between both the ITSs and the redistributors, following complicated affinity rules. To make things more confusing, the programming of this table at the redistributor level is reusing the GICv4.0 GICR_VPROPBASER register for something completely different. The code flow is somewhat complexified by the need to respect the affinities required by the HW, meaning that tables can either be inherited from a previously discovered ITS or redistributor. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v3: Add GICv4.1 VPEID size discoveryMarc Zyngier1-0/+5
While GICv4.0 mandates 16 bit worth of VPEIDs, GICv4.1 allows smaller implementations to be built. Add the required glue to dynamically compute the limit. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22irqchip/gic-v3: Detect GICv4.1 supporting RVPEIDMarc Zyngier1-0/+2
GICv4.1 supports the RVPEID ("Residency per vPE ID"), which allows for a much efficient way of making virtual CPUs resident (to allow direct injection of interrupts). The functionnality needs to be discovered on each and every redistributor in the system, and disabled if the settings are inconsistent. Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Zenghui Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-22crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_dataTudor Ambarus1-23/+0
These drivers no longer need it as they are only probed via DT. crypto_platform_data was allocated but unused, so remove it. This is a follow up for: commit 45a536e3a7e0 ("crypto: atmel-tdes - Retire dma_request_slave_channel_compat()") commit db28512f48e2 ("crypto: atmel-sha - Retire dma_request_slave_channel_compat()") commit 62f72cbdcf02 ("crypto: atmel-aes - Retire dma_request_slave_channel_compat()") Signed-off-by: Tudor Ambarus <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2020-01-22xsk, net: Make sock_def_readable() have external linkageBjörn Töpel1-0/+2
XDP sockets use the default implementation of struct sock's sk_data_ready callback, which is sock_def_readable(). This function is called in the XDP socket fast-path, and involves a retpoline. By letting sock_def_readable() have external linkage, and being called directly, the retpoline can be avoided. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]