blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2020-03-29	Linux 5.6	Linus Torvalds	1	-1/+1

2020-03-29	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	7	-45/+82
	Merge vm fixes from Andrew Morton: "5 fixes" * emailed patches from Andrew Morton <[email protected]>: mm/sparse: fix kernel crash with pfn_section_valid check mm: fork: fix kernel_stack memcg stats for various stack implementations hugetlb_cgroup: fix illegal access to memory drivers/base/memory.c: indicate all memory blocks as removable mm/swapfile.c: move inode_lock out of claim_swapfile
2020-03-29	Merge tag 'timers-urgent-2020-03-29' of ↵	Linus Torvalds	1	-2/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "A single fix for the Hyper-V clocksource driver to make sched clock actually return nanoseconds and not the virtual clock value which increments at 10e7 HZ (100ns)" * tag 'timers-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/hyper-v: Make sched clock return nanoseconds correctly
2020-03-29	Merge tag 'irq-urgent-2020-03-29' of ↵	Linus Torvalds	1	-2/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Thomas Gleixner: "A single bugfix to prevent reference leaks in irq affinity notifiers" * tag 'irq-urgent-2020-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Fix reference leaks on irq affinity notifiers
2020-03-29	mm/sparse: fix kernel crash with pfn_section_valid check	Aneesh Kumar K.V	1	-0/+6
	Fix the crash like this: BUG: Kernel NULL pointer dereference on read at 0x00000000 Faulting instruction address: 0xc000000000c3447c Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1 ... NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0 LR [c000000000088354] vmemmap_free+0x144/0x320 Call Trace: section_deactivate+0x220/0x240 __remove_pages+0x118/0x170 arch_remove_memory+0x3c/0x150 memunmap_pages+0x1cc/0x2f0 devm_action_release+0x30/0x50 release_nodes+0x2f8/0x3e0 device_release_driver_internal+0x168/0x270 unbind_store+0x130/0x170 drv_attr_store+0x44/0x60 sysfs_kf_write+0x68/0x80 kernfs_fop_write+0x100/0x290 __vfs_write+0x3c/0x70 vfs_write+0xcc/0x240 ksys_write+0x7c/0x140 system_call+0x5c/0x68 The crash is due to NULL dereference at test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL in pfn_section_valid() With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case") section_mem_map is set to NULL after depopulate_section_mem(). This was done so that pfn_page() can work correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that config pfn_to_page does __section_mem_map_addr(__sec) + __pfn; where static inline struct page __section_mem_map_addr(struct mem_section section) { unsigned long map = section->section_mem_map; map &= SECTION_MAP_MASK; return (struct page )map; } Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to check the pfn validity (pfn_valid()). Since section_deactivate release mem_section->usage if a section is fully deactivated, pfn_valid() check after a subsection_deactivate cause a kernel crash. static inline int pfn_valid(unsigned long pfn) { ... return early_section(ms) \|\| pfn_section_valid(ms, pfn); } where static inline int pfn_section_valid(struct mem_section ms, unsigned long pfn) { int idx = subsection_map_index(pfn); return test_bit(idx, ms->usage->subsection_map); } Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed. For architectures like ppc64 where large pages are used for vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple sections. Hence before a vmemmap mapping page can be freed, the kernel needs to make sure there are no valid sections within that mapping. Clearing the section valid bit before depopulate_section_memap enables this. [[email protected]: add comment] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM\|!VMEMMAP case") Reported-by: Sachin Sant <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Sachin Sant <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Wei Yang <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Pankaj Gupta <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-03-29	mm: fork: fix kernel_stack memcg stats for various stack implementations	Roman Gushchin	3	-2/+52
	Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the space for task stacks can be allocated using __vmalloc_node_range(), alloc_pages_node() and kmem_cache_alloc_node(). In the first and the second cases page->mem_cgroup pointer is set, but in the third it's not: memcg membership of a slab page should be determined using the memcg_from_slab_page() function, which looks at page->slab_cache->memcg_params.memcg . In this case, using mod_memcg_page_state() (as in account_kernel_stack()) is incorrect: page->mem_cgroup pointer is NULL even for pages charged to a non-root memory cgroup. It can lead to kernel_stack per-memcg counters permanently showing 0 on some architectures (depending on the configuration). In order to fix it, let's introduce a mod_memcg_obj_state() helper, which takes a pointer to a kernel object as a first argument, uses mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls mod_memcg_state(). It allows to handle all possible configurations (CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without spilling any memcg/kmem specifics into fork.c . Note: This is a special version of the patch created for stable backports. It contains code from the following two patches: - mm: memcg/slab: introduce mem_cgroup_from_obj() - mm: fork: fix kernel_stack memcg stats for various stack implementations [[email protected]: introduce mem_cgroup_from_obj()] Link: http://lkml.kernel.org/r/[email protected] Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages") Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-29	hugetlb_cgroup: fix illegal access to memory	Mina Almasry	1	-2/+1
	This appears to be a mistake in commit faced7e0806cf ("mm: hugetlb controller for cgroups v2"). Essentially that commit does a hugetlb_cgroup_from_counter assuming that page_counter_try_charge has initialized counter. But if that has failed then it seems will not initialize counter, so hugetlb_cgroup_from_counter(counter) ends up pointing to random memory, causing kasan to complain. The solution is to simply use 'h_cg', instead of hugetlb_cgroup_from_counter(counter), since that is a reference to the hugetlb_cgroup anyway. After this change kasan ceases to complain. Fixes: faced7e0806cf ("mm: hugetlb controller for cgroups v2") Reported-by: [email protected] Signed-off-by: Mina Almasry <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Giuseppe Scrivano <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: David Rientjes <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-29	drivers/base/memory.c: indicate all memory blocks as removable	David Hildenbrand	1	-20/+3
	We see multiple issues with the implementation/interface to compute whether a memory block can be offlined (exposed via /sys/devices/system/memory/memoryX/removable) and would like to simplify it (remove the implementation). 1. It runs basically lockless. While this might be good for performance, we see possible races with memory offlining that will require at least some sort of locking to fix. 2. Nowadays, more false positives are possible. No arch-specific checks are performed that validate if memory offlining will not be denied right away (and such check will require locking). For example, arm64 won't allow to offline any memory block that was added during boot - which will imply a very high error rate. Other archs have other constraints. 3. The interface is inherently racy. E.g., if a memory block is detected to be removable (and was not a false positive at that time), there is still no guarantee that offlining will actually succeed. So any caller already has to deal with false positives. 4. It is unclear which performance benefit this interface actually provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add sysfs removable attribute for hotplug memory remove") mentioned "A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation." However, no actual performance comparison was included. Known users: - lsmem: Will group memory blocks based on the "removable" property. [1] - chmem: Indirect user. It has a RANGE mode where one can specify removable ranges identified via lsmem to be offlined. However, it also has a "SIZE" mode, which allows a sysadmin to skip the manual "identify removable blocks" step. [2] - powerpc-utils: Uses the "removable" attribute to skip some memory blocks right away when trying to find some to offline+remove. However, with ballooning enabled, it already skips this information completely (because it once resulted in many false negatives). Therefore, the implementation can deal with false positives properly already. [3] According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer driven from userspace via the drmgr command (powerpc-utils). Nowadays it's managed in the kernel - including onlining/offlining of memory blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the affected legacy userspace handling is only active on old kernels. Only very old versions of drmgr on a new kernel (unlikely) might execute slower - totally acceptable. With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not break any user space tool. We implement a very bad heuristic now. Without CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report "not removable" as before. Original discussion can be found in [4] ("[PATCH RFC v1] mm: is_mem_section_removable() overhaul"). Other users of is_mem_section_removable() will be removed next, so that we can remove is_mem_section_removable() completely. [1] http://man7.org/linux/man-pages/man1/lsmem.1.html [2] http://man7.org/linux/man-pages/man8/chmem.8.html [3] https://github.com/ibm-power-utilities/powerpc-utils [4] https://lkml.kernel.org/r/[email protected] Also, this patch probably fixes a crash reported by Steve. http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLXw@mail.gmail.com Reported-by: "Scargall, Steve" <[email protected]> Suggested-by: Michal Hocko <[email protected]> Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Nathan Fontenot <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Dan Williams <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Robert Jennings <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Karel Zak <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-29	mm/swapfile.c: move inode_lock out of claim_swapfile	Naohiro Aota	1	-21/+20
	claim_swapfile() currently keeps the inode locked when it is successful, or the file is already swapfile (with -EBUSY). And, on the other error cases, it does not lock the inode. This inconsistency of the lock state and return value is quite confusing and actually causing a bad unlock balance as below in the "bad_swap" section of __do_sys_swapon(). This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE check out of claim_swapfile(). The inode is unlocked in "bad_swap_unlock_inode" section, so that the inode is ensured to be unlocked at "bad_swap". Thus, error handling codes after the locking now jumps to "bad_swap_unlock_inode" instead of "bad_swap". ===================================== WARNING: bad unlock balance detected! 5.5.0-rc7+ #176 Not tainted ------------------------------------- swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at: __do_sys_swapon+0x94b/0x3550 but there are no more locks to release! other info that might help us debug this: no locks held by swapon/4294. stack backtrace: CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176 Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014 Call Trace: dump_stack+0xa1/0xea print_unlock_imbalance_bug.cold+0x114/0x123 lock_release+0x562/0xed0 up_write+0x2d/0x490 __do_sys_swapon+0x94b/0x3550 __x64_sys_swapon+0x54/0x80 do_syscall_64+0xa4/0x4b0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f15da0a0dc7 Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices") Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Qais Youef <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-29	netfilter: nf_queue: prefer nf_queue_entry_free	Florian Westphal	1	-18/+9
	Instead of dropping refs+kfree, use the helper added in previous patch. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-29	netfilter: nf_queue: do not release refcouts until nf_reinject is done	Florian Westphal	1	-4/+2
	nf_queue is problematic when another NF_QUEUE invocation happens from nf_reinject(). 1. nf_queue is invoked, increments state->sk refcount. 2. skb is queued, waiting for verdict. 3. sk is closed/released. 3. verdict comes back, nf_reinject is called. 4. nf_reinject drops the reference -- refcount can now drop to 0 Instead of get_ref/release_ref pattern, we need to nest the get_ref calls: get_ref get_ref release_ref release_ref So that when we invoke the next processing stage (another netfilter or the okfn()), we hold at least one reference count on the devices/socket. After previous patch, it is now safe to put the entry even after okfn() has potentially free'd the skb. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-29	netfilter: nf_queue: place bridge physports into queue_entry struct	Florian Westphal	2	-31/+27
	The refcount is done via entry->skb, which does work fine. Major problem: When putting the refcount of the bridge ports, we must always put the references while the skb is still around. However, we will need to put the references after okfn() to avoid a possible 1 -> 0 -> 1 refcount transition, so we cannot use the skb pointer anymore. Place the physports in the queue entry structure instead to allow for refcounting changes in the next patch. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-29	netfilter: nf_queue: make nf_queue_entry_release_refs static	Florian Westphal	3	-11/+11
	This is a preparation patch, no logical changes. Move free_entry into core and rename it to something more sensible. Will ease followup patches which will complicate the refcount handling. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-28	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Linus Torvalds	23	-80/+221
	Pull networking fixes from David Miller: 1) Fix memory leak in vti6, from Torsten Hilbrich. 2) Fix double free in xfrm_policy_timer, from YueHaibing. 3) NL80211_ATTR_CHANNEL_WIDTH attribute is put with wrong type, from Johannes Berg. 4) Wrong allocation failure check in qlcnic driver, from Xu Wang. 5) Get ks8851-ml IO operations right, for real this time, from Marek Vasut. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits) r8169: fix PHY driver check on platforms w/o module softdeps net: ks8851-ml: Fix IO operations, again mlxsw: spectrum_mr: Fix list iteration in error path qlcnic: Fix bad kzalloc null test mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX mac80211: mark station unauthorized before key removal mac80211: Check port authorization in the ieee80211_tx_dequeue() case cfg80211: Do not warn on same channel at the end of CSA mac80211: drop data frames without key on encrypted links ieee80211: fix HE SPR size calculation nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type xfrm: policy: Fix doulbe free in xfrm_policy_timer bpf: Explicitly memset some bpf info structures declared on the stack bpf: Explicitly memset the bpf_attr structure bpf: Sanitize the bpf_struct_ops tcp-cc name vti6: Fix memory leak of skb if input policy check fails esp: remove the skb from the chain when it's enqueued in cryptd_wq ipv6: xfrm6_tunnel.c: Use built-in RCU list checking xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire xfrm: fix uctx len check in verify_sec_ctx_len ...
2020-03-28	Merge branch 'ifla_xdp_expected_fd'	Alexei Starovoitov	9	-9/+146
	Toke Høiland-Jørgensen says: ==================== This series adds support for atomically replacing the XDP program loaded on an interface. This is achieved by means of a new netlink attribute that can specify the expected previous program to replace on the interface. If set, the kernel will compare this "expected fd" attribute with the program currently loaded on the interface, and reject the operation if it does not match. With this primitive, userspace applications can avoid stepping on each other's toes when simultaneously updating the loaded XDP program. Changelog: v4: - Switch back to passing FD instead of ID (Andrii) - Rename flag to XDP_FLAGS_REPLACE (for consistency with other similar uses) v3: - Pass existing ID instead of FD (Jakub) - Use opts struct for new libbpf function (Andrii) v2: - Fix checkpatch nits and add .strict_start_type to netlink policy (Jakub) ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-03-28	selftests/bpf: Add tests for attaching XDP programs	Toke Høiland-Jørgensen	1	-0/+62
	This adds tests for the various replacement operations using IFLA_XDP_EXPECTED_FD. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	libbpf: Add function to set link XDP fd while specifying old program	Toke Høiland-Jørgensen	3	-1/+42
	This adds a new function to set the XDP fd while specifying the FD of the program to replace, using the newly added IFLA_XDP_EXPECTED_FD netlink parameter. The new function uses the opts struct mechanism to be extendable in the future. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	tools: Add EXPECTED_FD-related definitions in if_link.h	Toke Høiland-Jørgensen	1	-1/+3
	This adds the IFLA_XDP_EXPECTED_FD netlink attribute definition and the XDP_FLAGS_REPLACE flag to if_link.h in tools/include. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	xdp: Support specifying expected existing program when attaching XDP	Toke Høiland-Jørgensen	4	-7/+39
	While it is currently possible for userspace to specify that an existing XDP program should not be replaced when attaching to an interface, there is no mechanism to safely replace a specific XDP program with another. This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be set along with IFLA_XDP_FD. If set, the kernel will check that the program currently loaded on the interface matches the expected one, and fail the operation if it does not. This corresponds to a 'cmpxchg' memory operation. Setting the new attribute with a negative value means that no program is expected to be attached, which corresponds to setting the UPDATE_IF_NOEXIST flag. A new companion flag, XDP_FLAGS_REPLACE, is also added to explicitly request checking of the EXPECTED_FD attribute. This is needed for userspace to discover whether the kernel supports the new attribute. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	Merge branch 'i2c/for-current' of ↵	Linus Torvalds	5	-15/+13
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Three more driver bugfixes, and two doc improvements fixing build warnings while we are here" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: pca-platform: Use platform_irq_get_optional i2c: st: fix missing struct parameter description i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status() i2c: fix a doc warning i2c: hix5hd2: add missed clk_disable_unprepare in remove
2020-03-28	bpf: Fix build warning regarding missing prototypes	Jean-Philippe Menil	1	-0/+4
	Fix build warnings when building net/bpf/test_run.o with W=1 due to missing prototype for bpf_fentry_test{1..6}. Instead of declaring prototypes, turn off warnings with __diag_{push,ignore,pop} as pointed out by Alexei. Signed-off-by: Jean-Philippe Menil <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	Merge tag 'scsi-fixes' of ↵	Linus Torvalds	2	-3/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two small fixes: one in drivers (qla2xxx), and one in the core (sd) to try to cope with USB enclosures that silently change reported parameters" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Fix optimal I/O size for devices that change reported values scsi: qla2xxx: Fix I/Os being passed down when FC device is being deleted
2020-03-28	libbpf, xsk: Init all ring members in xsk_umem__create and xsk_socket__create	Fletcher Dunn	1	-2/+14
	Fix a sharp edge in xsk_umem__create and xsk_socket__create. Almost all of the members of the ring buffer structs are initialized, but the "cached_xxx" variables are not all initialized. The caller is required to zero them. This is needlessly dangerous. The results if you don't do it can be very bad. For example, they can cause xsk_prod_nb_free and xsk_cons_nb_avail to return values greater than the size of the queue. xsk_ring_cons__peek can return an index that does not refer to an item that has been queued. I have confirmed that without this change, my program misbehaves unless I memset the ring buffers to zero before calling the function. Afterwards, my program works without (or with) the memset. Signed-off-by: Fletcher Dunn <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Magnus Karlsson <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-28	i2c: pca-platform: Use platform_irq_get_optional	Chris Packham	1	-1/+1
	The interrupt is not required so use platform_irq_get_optional() to avoid error messages like i2c-pca-platform 22080000.i2c: IRQ index 0 not found Signed-off-by: Chris Packham <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2020-03-28	i2c: st: fix missing struct parameter description	Alain Volmat	1	-0/+1
	Fix a missing struct parameter description to allow warning free W=1 compilation. Signed-off-by: Alain Volmat <[email protected]> Reviewed-by: Patrice Chotard <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2020-03-27	Merge branch 'cgroup-helpers'	Alexei Starovoitov	11	-14/+336
	Daniel Borkmann says: ==================== This adds various straight-forward helper improvements and additions to BPF cgroup based connect(), sendmsg(), recvmsg() and bind-related hooks which would allow to implement more fine-grained policies and improve current load balancer limitations we're seeing. For details please see individual patches. I've tested them on Kubernetes & Cilium and also added selftests for the small verifier extension. Thanks! ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-03-27	bpf: Add selftest cases for ctx_or_null argument type	Daniel Borkmann	1	-0/+105
	Add various tests to make sure the verifier keeps catching them: # ./test_verifier [...] #230/p pass ctx or null check, 1: ctx OK #231/p pass ctx or null check, 2: null OK #232/p pass ctx or null check, 3: 1 OK #233/p pass ctx or null check, 4: ctx - const OK #234/p pass ctx or null check, 5: null (connect) OK #235/p pass ctx or null check, 6: null (bind) OK #236/p pass ctx or null check, 7: ctx (bind) OK #237/p pass ctx or null check, 8: null (bind) OK [...] Summary: 1595 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/c74758d07b1b678036465ef7f068a49e9efd3548.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Enable retrival of pid/tgid/comm from bpf cgroup hooks	Daniel Borkmann	1	-0/+8
	We already have the bpf_get_current_uid_gid() helper enabled, and given we now have perf event RB output available for connect(), sendmsg(), recvmsg() and bind-related hooks, add a trivial change to enable bpf_get_current_pid_tgid() and bpf_get_current_comm() as well. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/18744744ed93c06343be8b41edcfd858706f39d7.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Enable bpf cgroup hooks to retrieve cgroup v2 and ancestor id	Daniel Borkmann	6	-2/+72
	Enable the bpf_get_current_cgroup_id() helper for connect(), sendmsg(), recvmsg() and bind-related hooks in order to retrieve the cgroup v2 context which can then be used as part of the key for BPF map lookups, for example. Given these hooks operate in process context 'current' is always valid and pointing to the app that is performing mentioned syscalls if it's subject to a v2 cgroup. Also with same motivation of commit 7723628101aa ("bpf: Introduce bpf_skb_ancestor_cgroup_id helper") enable retrieval of ancestor from current so the cgroup id can be used for policy lookups which can then forbid connect() / bind(), for example. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/d2a7ef42530ad299e3cbb245e6c12374b72145ef.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Allow to retrieve cgroup v1 classid from v2 hooks	Daniel Borkmann	2	-1/+27
	Today, Kubernetes is still operating on cgroups v1, however, it is possible to retrieve the task's classid based on 'current' out of connect(), sendmsg(), recvmsg() and bind-related hooks for orchestrators which attach to the root cgroup v2 hook in a mixed env like in case of Cilium, for example, in order to then correlate certain pod traffic and use it as part of the key for BPF map lookups. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/555e1c69db7376c0947007b4951c260e1074efc3.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Add netns cookie and enable it for bpf cgroup hooks	Daniel Borkmann	7	-8/+103
	In Cilium we're mainly using BPF cgroup hooks today in order to implement kube-proxy free Kubernetes service translation for ClusterIP, NodePort (), ExternalIP, and LoadBalancer as well as HostPort mapping [0] for all traffic between Cilium managed nodes. While this works in its current shape and avoids packet-level NAT for inter Cilium managed node traffic, there is one major limitation we're facing today, that is, lack of netns awareness. In Kubernetes, the concept of Pods (which hold one or multiple containers) has been built around network namespaces, so while we can use the global scope of attaching to root BPF cgroup hooks also to our advantage (e.g. for exposing NodePort ports on loopback addresses), we also have the need to differentiate between initial network namespaces and non-initial one. For example, ExternalIP services mandate that non-local service IPs are not to be translated from the host (initial) network namespace as one example. Right now, we have an ugly work-around in place where non-local service IPs for ExternalIP services are not xlated from connect() and friends BPF hooks but instead via less efficient packet-level NAT on the veth tc ingress hook for Pod traffic. On top of determining whether we're in initial or non-initial network namespace we also have a need for a socket-cookie like mechanism for network namespaces scope. Socket cookies have the nice property that they can be combined as part of the key structure e.g. for BPF LRU maps without having to worry that the cookie could be recycled. We are planning to use this for our sessionAffinity implementation for services. Therefore, add a new bpf_get_netns_cookie() helper which would resolve both use cases at once: bpf_get_netns_cookie(NULL) would provide the cookie for the initial network namespace while passing the context instead of NULL would provide the cookie from the application's network namespace. We're using a hole, so no size increase; the assignment happens only once. Therefore this allows for a comparison on initial namespace as well as regular cookie usage as we have today with socket cookies. We could later on enable this helper for other program types as well as we would see need. () Both externalTrafficPolicy={Local\|Cluster} types [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/c47d2346982693a9cf9da0e12690453aded4c788.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Enable perf event rb output for bpf cgroup progs	Daniel Borkmann	1	-5/+9
	Currently, connect(), sendmsg(), recvmsg() and bind-related hooks are all lacking perf event rb output in order to push notifications or monitoring events up to user space. Back in commit a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops"), I've worked with Sowmini to enable them for sock_ops where the context part is not used (as opposed to skbs for example where the packet data can be appended). Make the bpf_sockopt_event_output() helper generic and enable it for mentioned hooks. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/69c39daf87e076b31e52473c902e9bfd37559124.1585323121.git.daniel@iogearbox.net
2020-03-27	bpf: Enable retrieval of socket cookie for bind/post-bind hook	Daniel Borkmann	1	-0/+14
	We currently make heavy use of the socket cookie in BPF's connect(), sendmsg() and recvmsg() hooks for load-balancing decisions. However, it is currently not enabled/implemented in BPF {post-}bind hooks where it can later be used in combination for correlation in the tc egress path, for example. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/e9d71f310715332f12d238cc650c1edc5be55119.1585323121.git.daniel@iogearbox.net
2020-03-27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	David S. Miller	4	-20/+25
	Daniel Borkmann says: ==================== pull-request: bpf 2020-03-27 The following pull-request contains BPF updates for your net tree. We've added 3 non-merge commits during the last 4 day(s) which contain a total of 4 files changed, 25 insertions(+), 20 deletions(-). The main changes are: 1) Explicitly memset the bpf_attr structure on bpf() syscall to avoid having to rely on compiler to do so. Issues have been noticed on some compilers with padding and other oddities where the request was then unexpectedly rejected, from Greg Kroah-Hartman. 2) Sanitize the bpf_struct_ops TCP congestion control name in order to avoid problematic characters such as whitespaces, from Martin KaFai Lau. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-27	Merge branch 'DSA-mtu'	David S. Miller	20	-48/+500
	Vladimir Oltean says: ==================== Configure the MTU on DSA switches This series adds support for configuring the MTU on front-panel switch ports, while seamlessly adapting the CPU port and the DSA master to the largest value plus the tagger overhead. It also implements bridge MTU auto-normalization within the DSA core, as resulted after the feedback of the implementation of this feature inside the bridge driver in v2. Support was added for quite a number of switches, in the hope that this series would gain some traction: - sja1105 - felix - vsc73xx - b53 and rest of the platform V3 of this series was submitted here: https://patchwork.ozlabs.org/cover/1262394/ V2 of this series was submitted here: https://patchwork.ozlabs.org/cover/1261471/ V1 of this series was submitted here: https://patchwork.ozlabs.org/cover/1199868/ ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: felix: support changing the MTU	Vladimir Oltean	3	-10/+61
	Changing the MTU for this switch means altering the DEV_GMII:MAC_CFG_STATUS:MAC_MAXLEN_CFG field MAX_LEN, which in turn limits the size of frames that can be received. Special accounting needs to be done for the DSA CPU port (NPI port in hardware terms). The NPI port configuration needs to be held inside the private ocelot structure, since it is now accessed from multiple places. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: vsc73xx: make the MTU configurable	Vladimir Oltean	1	-10/+20
	Instead of hardcoding the MTU to the maximum value allowed by the hardware, obey the value known by the operating system. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: sja1105: implement the port MTU callbacks	Vladimir Oltean	2	-4/+47
	On this switch, the frame length enforcements are performed by the ingress policers. There are 2 types of those: regular L2 (also called best-effort) and Virtual Link policers (an ARINC664/AFDX concept for defining L2 streams with certain QoS abilities). To avoid future confusion, I prefer to call the reset reason "Best-effort policers", even though the VL policers are not yet supported. We also need to change the setup of the initial static config, such that DSA calls to .change_mtu (which are expensive) become no-ops and don't reset the switch 5 times. A driver-level decision is to unconditionally allow single VLAN-tagged traffic on all ports. The CPU port must accept an additional VLAN header for the DSA tag, which is again a driver-level decision. The policers actually count bytes not only from the SDU, but also from the Ethernet header and FCS, so those need to be accounted for as well. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: b53: add MTU configuration support	Murali Krishna Policharla	1	-5/+22
	It looks like the Broadcom switches supported by the b53 driver don't support precise configuration of the MTU, but just a mumbo-jumbo boolean flag. Set that. Also configure BCM583XX devices to send and receive jumbo frames when ports are configured with 10/100 Mbps speed. Signed-off-by: Murali Krishna Policharla <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: implement auto-normalization of MTU for bridge hardware datapath	Vladimir Oltean	4	-1/+125
	Many switches don't have an explicit knob for configuring the MTU (maximum transmission unit per interface). Instead, they do the length-based packet admission checks on the ingress interface, for reasons that are easy to understand (why would you accept a packet in the queuing subsystem if you know you're going to drop it anyway). So it is actually the MRU that these switches permit configuring. In Linux there only exists the IFLA_MTU netlink attribute and the associated dev_set_mtu function. The comments like to play blind and say that it's changing the "maximum transfer unit", which is to say that there isn't any directionality in the meaning of the MTU word. So that is the interpretation that this patch is giving to things: MTU == MRU. When 2 interfaces having different MTUs are bridged, the bridge driver MTU auto-adjustment logic kicks in: what br_mtu_auto_adjust() does is it adjusts the MTU of the bridge net device itself (and not that of the slave net devices) to the minimum value of all slave interfaces, in order for forwarded packets to not exceed the MTU regardless of the interface they are received and send on. The idea behind this behavior, and why the slave MTUs are not adjusted, is that normal termination from Linux over the L2 forwarding domain should happen over the bridge net device, which _is_ properly limited by the minimum MTU. And termination over individual slave devices is possible even if those are bridged. But that is not "forwarding", so there's no reason to do normalization there, since only a single interface sees that packet. The problem with those switches that can only control the MRU is with the offloaded data path, where a packet received on an interface with MRU 9000 would still be forwarded to an interface with MRU 1500. And the br_mtu_auto_adjust() function does not really help, since the MTU configured on the bridge net device is ignored. In order to enforce the de-facto MTU == MRU rule for these switches, we need to do MTU normalization, which means: in order for no packet larger than the MTU configured on this port to be sent, then we need to limit the MRU on all ports that this packet could possibly come from. AKA since we are configuring the MRU via MTU, it means that all ports within a bridge forwarding domain should have the same MTU. And that is exactly what this patch is trying to do. >From an implementation perspective, we try to follow the intent of the user, otherwise there is a risk that we might livelock them (they try to change the MTU on an already-bridged interface, but we just keep changing it back in an attempt to keep the MTU normalized). So the MTU that the bridge is normalized to is either: - The most recently changed one: ip link set dev swp0 master br0 ip link set dev swp1 master br0 ip link set dev swp0 mtu 1400 This sequence will make swp1 inherit MTU 1400 from swp0. - The one of the most recently added interface to the bridge: ip link set dev swp0 master br0 ip link set dev swp1 mtu 1400 ip link set dev swp1 master br0 The above sequence will make swp0 inherit MTU 1400 as well. Suggested-by: Florian Fainelli <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: dsa: configure the MTU for switch ports	Vladimir Oltean	6	-16/+181
	It is useful be able to configure port policers on a switch to accept frames of various sizes: - Increase the MTU for better throughput from the default of 1500 if it is known that there is no 10/100 Mbps device in the network. - Decrease the MTU to limit the latency of high-priority frames under congestion, or work around various network segments that add extra headers to packets which can't be fragmented. For DSA slave ports, this is mostly a pass-through callback, called through the regular ndo ops and at probe time (to ensure consistency across all supported switches). The CPU port is called with an MTU equal to the largest configured MTU of the slave ports. The assumption is that the user might want to sustain a bidirectional conversation with a partner over any switch port. The DSA master is configured the same as the CPU port, plus the tagger overhead. Since the MTU is by definition L2 payload (sans Ethernet header), it is up to each individual driver to figure out if it needs to do anything special for its frame tags on the CPU port (it shouldn't except in special cases). So the MTU does not contain the tagger overhead on the CPU port. However the MTU of the DSA master, minus the tagger overhead, is used as a proxy for the MTU of the CPU port, which does not have a net device. This is to avoid uselessly calling the .change_mtu function on the CPU port when nothing should change. So it is safe to assume that the DSA master and the CPU port MTUs are apart by exactly the tagger's overhead in bytes. Some changes were made around dsa_master_set_mtu(), function which was now removed, for 2 reasons: - dev_set_mtu() already calls dev_validate_mtu(), so it's redundant to do the same thing in DSA - __dev_set_mtu() returns 0 if ops->ndo_change_mtu is an absent method That is to say, there's no need for this function in DSA, we can safely call dev_set_mtu() directly, take the rtnl lock when necessary, and just propagate whatever errors get reported (since the user probably wants to be informed). Some inspiration (mainly in the MTU DSA notifier) was taken from a vaguely similar patch from Murali and Florian, who are credited as co-developers down below. Co-developed-by: Murali Krishna Policharla <[email protected]> Signed-off-by: Murali Krishna Policharla <[email protected]> Co-developed-by: Florian Fainelli <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	bgmac: configure MTU and add support for frames beyond 8192 byte size	Murali Krishna Policharla	2	-2/+15
	Change DMA descriptor length to handle jumbo frames beyond 8192 bytes. Also update jumbo frame max size to include FCS, the DMA packet length received includes FCS. Signed-off-by: Murali Krishna Policharla <[email protected]> Reviewed-by: Arun Parameswaran <[email protected]> Reviewed-by: Ray Jui <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	net: phy: bcm7xx: add jumbo frame configuration to PHY	Murali Krishna Policharla	4	-0/+29
	The BCM7XX PHY family requires special configuration to pass jumbo frames. Do that during initial PHY setup. Signed-off-by: Murali Krishna Policharla <[email protected]> Reviewed-by: Scott Branden <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	r8169: fix PHY driver check on platforms w/o module softdeps	Heiner Kallweit	1	-9/+7
	On Android/x86 the module loading infrastructure can't deal with softdeps. Therefore the check for presence of the Realtek PHY driver module fails. mdiobus_register() will try to load the PHY driver module, therefore move the check to after this call and explicitly check that a dedicated PHY driver is bound to the PHY device. Fixes: f32593773549 ("r8169: check that Realtek PHY driver module is loaded") Reported-by: Chih-Wei Huang <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	Merge tag 'wireless-drivers-next-2020-03-27' of ↵	David S. Miller	73	-1211/+3350
	git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.7 Third set of patches for v5.7. Nothing really special this time, business as usual. When pulling this to net-next there's again a conflict in: drivers/net/wireless/intel/iwlwifi/pcie/drv.c To solve this drop these three lines from the conflict (the first hunk from "HEAD") as the whole AX200 block was moved above in the same file: IWL_DEV_INFO(0x2723, 0x1653, iwl_ax200_cfg_cc, iwl_ax200_killer_1650w_name), IWL_DEV_INFO(0x2723, 0x1654, iwl_ax200_cfg_cc, iwl_ax200_killer_1650x_name), IWL_DEV_INFO(0x2723, IWL_CFG_ANY, iwl_ax200_cfg_cc, iwl_ax200_name), And keep all the __IWL_DEV_INFO() entries (the second hunk). In other words, take everything from wireless-drivers-next. When running 'git diff' after the resolution the output should be empty. Major changes: brcmfmac * add USB autosuspend support ath11k * handle RX fragments * enable PN offload * add support for HE BSS color iwlwifi * support new FW API version * support for EDCA measurements * new scan API features * enable new firmware debugging code ==================== Kalle gave me directions on how to resolve the iwlwifi conflict as follows: ==================== When pulling this to net-next there's again a conflict in: drivers/net/wireless/intel/iwlwifi/pcie/drv.c To solve this drop these three lines from the conflict (the first hunk from "HEAD") as the whole AX200 block was moved above in the same file: IWL_DEV_INFO(0x2723, 0x1653, iwl_ax200_cfg_cc, iwl_ax200_killer_1650w_name), IWL_DEV_INFO(0x2723, 0x1654, iwl_ax200_cfg_cc, iwl_ax200_killer_1650x_name), IWL_DEV_INFO(0x2723, IWL_CFG_ANY, iwl_ax200_cfg_cc, iwl_ax200_name), And keep all the __IWL_DEV_INFO() entries (the second hunk). In other words, take everything from wireless-drivers-next. When running 'git diff' after the resolution the output should be empty. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-27	Merge branch 's390-qeth-next'	David S. Miller	6	-36/+37
	Julian Wiedmann says: ==================== s390/qeth: updates 2020-03-27 please apply the following patch series for qeth to netdev's net-next tree. Spring clean edition: - remove one sysfs attribute that was never put in use, - make support for OSN and OSX devices optional, and - probe for removal of the obsolete OSN support. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-27	s390/qeth: phase out OSN support	Julian Wiedmann	2	-0/+4
	OSN devices currently spend an awful long time in qeth_l2_set_online() until various unsupported HW cmds time out. This has been broken for over two years, ever since commit d22ffb5a712f ("s390/qeth: fix IPA command submission race") triggered a FW bug in cmd processing. Prior to commit 782e4a792147 ("s390/qeth: don't poll for cmd IO completion"), this wait for timeout would have even been spent busy-polling. The offending patch was picked up by stable and all relevant distros, and yet noone noticed. OSN setups only ever worked in combination with an out-of-tree blob, and the last machine that even offered HW with OSN support was released back in 2015. Rather than attempting to work-around this FW issue for no actual gain, add a deprecation warning so anyone who still wants to maintain this part of the code can speak up. Else rip it all out in 2021. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	s390/qeth: make OSN / OSX support configurable	Julian Wiedmann	4	-0/+33
	The last machine generation that supports OSN is z13, and OSX is only supported up to z14. Allow users and distros to decide whether they still need support for these device types. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	s390/qeth: remove fake_broadcast attribute	Julian Wiedmann	2	-36/+0
	Ever since commit 4a71df50047f ("qeth: new qeth device driver") introduced this attribute, it can be read & written but has no actual effect. Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-27	Merge branch 'bnxt_en-Updates-to-devlink-info_get-cb'	David S. Miller	7	-7/+113
	Vasundhara Volam says: ==================== bnxt_en: Updates to devlink info_get cb This series adds support for a generic macro to devlink info_get cb. Adds support for fw.mgmt.api and board.id info to bnxt_en driver info_get cb. Also, updates the devlink-info.rst and bnxt.rst documentation accordingly. This series adds a patch to fix few macro names that maps to bnxt_en firmware versions. v1->v2: Remove ECN dev param, base_mh_addr and serial number info support in this series. Rename drv.spec macro to fw.api. --- v2->v3: Remove hw.addr info as it is per netdev but not per device info. --- v3->v4: Rename "fw.api" to "fw.mgmt.api". Also, add a patch that modifies few macro names in info_get command, to match the devlink documentation. ==================== Signed-off-by: David S. Miller <[email protected]>