aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-02-22bpf: add schedule points in percpu arrays managementEric Dumazet1-1/+4
syszbot managed to trigger RCU detected stalls in bpf_array_free_percpu() It takes time to allocate a huge percpu map, but even more time to free it. Since we run in process context, use cond_resched() to yield cpu if needed. Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-22Merge tag 'mac80211-for-davem-2018-02-22' of ↵David S. Miller11-27/+41
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes across the tree, the shortlog basically says it all: cfg80211: fix cfg80211_beacon_dup -> old bug in this code cfg80211: clear wep keys after disconnection -> certain ways of disconnecting left the keys mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4 -> alignment issues with using 14 bytes mac80211: Do not disconnect on invalid operating class -> if the AP has a bogus operating class, let it be mac80211: Fix sending ADDBA response for an ongoing session -> don't send the same frame twice cfg80211: use only 1Mbps for basic rates in mesh -> interop issue with old versions of our code mac80211_hwsim: don't use WQ_MEM_RECLAIM -> it causes splats because it flushes work on a non-reclaim WQ regulatory: add NUL to request alpha2 -> nla_put_string() issue from Kees mac80211: mesh: fix wrong mesh TTL offset calculation -> protocol issue mac80211: fix a possible leak of station stats -> error path might leak memory mac80211: fix calling sleeping function in atomic context -> percpu allocations need to be made with gfp flags ==================== Signed-off-by: David S. Miller <[email protected]>
2018-02-22Merge tag 'usb-4.16-rc3' of ↵Linus Torvalds38-133/+350
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a number of USB fixes for 4.16-rc3 Nothing major, but a number of different fixes all over the place in the USB stack for reported issues. Mostly gadget driver fixes, although the typical set of xhci bugfixes are there, along with some new quirks additions as well. All of these have been in linux-next for a while with no reported issues" * tag 'usb-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (39 commits) Revert "usb: musb: host: don't start next rx urb if current one failed" usb: musb: fix enumeration after resume usb: cdc_acm: prevent race at write to acm while system resumes Add delay-init quirk for Corsair K70 RGB keyboards usb: ohci: Proper handling of ed_rm_list to handle race condition between usb_kill_urb() and finish_unlinks() usb: host: ehci: always enable interrupt for qtd completion at test mode usb: ldusb: add PIDs for new CASSY devices supported by this driver usb: renesas_usbhs: missed the "running" flag in usb_dmac with rx path usb: host: ehci: use correct device pointer for dma ops usbip: keep usbip_device sockfd state in sync with tcp_socket ohci-hcd: Fix race condition caused by ohci_urb_enqueue() and io_watchdog_func() USB: serial: option: Add support for Quectel EP06 xhci: fix xhci debugfs errors in xhci_stop xhci: xhci debugfs device nodes weren't removed after device plugged out xhci: Fix xhci debugfs devices node disappearance after hibernation xhci: Fix NULL pointer in xhci debugfs xhci: Don't print a warning when setting link state for disabled ports xhci: workaround for AMD Promontory disabled ports wakeup usb: dwc3: core: Fix ULPI PHYs and prevent phy_get/ulpi_init during suspend/resume USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe() ...
2018-02-22Merge tag 'staging-4.16-rc2' of ↵Linus Torvalds10-30/+64
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO fixes from Greg KH: "Here are a small number of staging and iio driver fixes for 4.16-rc2. The IIO fixes are all for reported things, and the android driver fixes also resolve some reported problems. The remaining fsl-mc Kconfig change resolves a build testing error that Arnd reported. All of these have been in linux-next with no reported issues" * tag 'staging-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: iio: buffer: check if a buffer has been set up when poll is called iio: adis_lib: Initialize trigger before requesting interrupt staging: android: ion: Zero CMA allocated memory staging: android: ashmem: Fix a race condition in pin ioctls staging: fsl-mc: fix build testing on x86 iio: srf08: fix link error "devm_iio_triggered_buffer_setup" undefined staging: iio: ad5933: switch buffer mode to software iio: adc: stm32: fix stm32h7_adc_enable error handling staging: iio: adc: ad7192: fix external frequency setting iio: adc: aspeed: Fix error handling path
2018-02-22Merge tag 'char-misc-4.16-rc3' of ↵Linus Torvalds7-45/+45
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a handful of char/misc driver fixes for 4.16-rc3. There are some binder driver fixes to resolve reported issues in stress testing the recent binder changes, some extcon driver fixes, and a few mei driver fixes and new device ids. All of these, with the exception of the mei driver id additions, have been in linux-next for a while. I forgot to push out the mei driver id additions to kernel.org until today, but all build tests pass with them enabled" * tag 'char-misc-4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: mei: me: add cannon point device ids for 4th device mei: me: add cannon point device ids mei: set device client to the disconnected state upon suspend. ANDROID: binder: synchronize_rcu() when using POLLFREE. binder: replace "%p" with "%pK" ANDROID: binder: remove WARN() for redundant txn error binder: check for binder_thread allocation failure in binder_poll() extcon: int3496: process id-pin first so that we start with the right status Revert "extcon: axp288: Redo charger type detection a couple of seconds after probe()" extcon: axp288: Constify the axp288_pwr_up_down_info array
2018-02-22regulatory: add NUL to request alpha2Johannes Berg1-1/+1
Similar to the ancient commit a5fe8e7695dc ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e74c4 ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-02-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds23-115/+242
Pull rdma fixes from Doug Ledford: "Nothing in this is overly interesting, it's mostly your garden variety fixes. There was some work in this merge cycle around the new ioctl kABI, so there are fixes in here related to that (probably with more to come). We've also recently added new netlink support with a goal of moving the primary means of configuring the entire subsystem to netlink (eventually, this is a long term project), so there are fixes for that. Then a few bnxt_re driver fixes, and a few minor WARN_ON removals, and that covers this pull request. There are already a few more fixes on the list as of this morning, so there will certainly be more to come in this rc cycle ;-) Summary: - Lots of fixes for the new IOCTL interface and general uverbs flow. Found through testing and syzkaller - Bugfixes for the new resource track netlink reporting - Remove some unneeded WARN_ONs that were triggering for some users in IPoIB - Various fixes for the bnxt_re driver" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (27 commits) RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type RDMA/bnxt_re: Avoid system hang during device un-reg RDMA/bnxt_re: Fix system crash during load/unload RDMA/bnxt_re: Synchronize destroy_qp with poll_cq RDMA/bnxt_re: Unpin SQ and RQ memory if QP create fails RDMA/bnxt_re: Disable atomic capability on bnxt_re adapters RDMA/restrack: don't use uaccess_kernel() RDMA/verbs: Check existence of function prior to accessing it RDMA/vmw_pvrdma: Fix usage of user response structures in ABI file RDMA/uverbs: Sanitize user entered port numbers prior to access it RDMA/uverbs: Fix circular locking dependency RDMA/uverbs: Fix bad unlock balance in ib_uverbs_close_xrcd RDMA/restrack: Increment CQ restrack object before committing RDMA/uverbs: Protect from command mask overflow IB/uverbs: Fix unbalanced unlock on error path for rdma_explicit_destroy IB/uverbs: Improve lockdep_check RDMA/uverbs: Protect from races between lookup and destroy of uobjects IB/uverbs: Hold the uobj write lock after allocate IB/uverbs: Fix possible oops with duplicate ioctl attributes IB/uverbs: Add ioctl support for 32bit processes ...
2018-02-22Merge tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of ↵Linus Torvalds4-7/+5
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V cleanups from Palmer Dabbelt: "This contains a handful of small cleanups. The only functional change is that IRQs are now enabled during exception handling, which was found when some warnings triggered with `CONFIG_DEBUG_ATOMIC_SLEEP=y`. The remaining fixes should have no functional change: `sbi_save()` has been renamed to `parse_dtb()` reflect what it actually does, and a handful of unused Kconfig entries have been removed" * tag 'riscv-for-linus-4.16-rc3-riscv_cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: Rename sbi_save to parse_dtb to improve code readability RISC-V: Enable IRQ during exception handling riscv: Remove ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE select riscv: kconfig: Remove RISCV_IRQ_INTC select riscv: Remove ARCH_WANT_OPTIONAL_GPIOLIB select
2018-02-22ibmvnic: Fix early release of login bufferThomas Falcon1-1/+1
The login buffer is released before the driver can perform sanity checks between resources the driver requested and what firmware will provide. Don't release the login buffer until the sanity check is performed. Fixes: 34f0f4e3f488 ("ibmvnic: Fix login buffer memory leaks") Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22net/smc9194: Remove bogus CONFIG_MAC referenceFinn Thain1-1/+1
AFAIK the only version of smc9194.c with Mac support is the one in the linux-mac68k CVS repo, which never made it to the mainline. Despite that, from v2.3.45, arch/m68k/config.in listed CONFIG_SMC9194 under CONFIG_MAC. This mistake got carried over into Kconfig in v2.5.55. (See pre-git era "[PATCH] add m68k dependencies to net driver config".) Signed-off-by: Finn Thain <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22net: ipv4: Set addr_type in hash_keys for forwarded caseDavid Ahern1-0/+2
The result of the skb flow dissect is copied from keys to hash_keys to ensure only the intended data is hashed. The original L4 hash patch overlooked setting the addr_type for this case; add it. Fixes: bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice") Reported-by: Ido Schimmel <[email protected]> Signed-off-by: David Ahern <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22tcp_bbr: better deal with suboptimal GSOEric Dumazet1-4/+5
BBR uses tcp_tso_autosize() in an attempt to probe what would be the burst sizes and to adjust cwnd in bbr_target_cwnd() with following gold formula : /* Allow enough full-sized skbs in flight to utilize end systems. */ cwnd += 3 * bbr->tso_segs_goal; But GSO can be lacking or be constrained to very small units (ip link set dev ... gso_max_segs 2) What we really want is to have enough packets in flight so that both GSO and GRO are efficient. So in the case GSO is off or downgraded, we still want to have the same number of packets in flight as if GSO/TSO was fully operational, so that GRO can hopefully be working efficiently. To fix this issue, we make tcp_tso_autosize() unaware of sk->sk_gso_max_segs Only tcp_tso_segs() has to enforce the gso_max_segs limit. Tested: ethtool -K eth0 tso off gso off tc qd replace dev eth0 root pfifo_fast Before patch: for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done     691  (ss -temoi shows cwnd is stuck around 6 )     667     651     631     517 After patch : # for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done    1733 (ss -temoi shows cwnd is around 386 )    1778    1746    1781    1718 Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Oleksandr Natalenko <[email protected]> Acked-by: Neal Cardwell <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22smsc75xx: fix smsc75xx_set_features()Eric Dumazet1-3/+4
If an attempt is made to disable RX checksums, USB adapter is changed but netdev->features is not, because smsc75xx_set_features() returns a non zero value. This throws errors from netdev_rx_csum_fault() : <devname>: hw csum failure Signed-off-by: Eric Dumazet <[email protected]> Cc: Steve Glendinning <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22netlink: put module reference if dump start failsJason A. Donenfeld1-1/+3
Before, if cb->start() failed, the module reference would never be put, because cb->cb_running is intentionally false at this point. Users are generally annoyed by this because they can no longer unload modules that leak references. Also, it may be possible to tediously wrap a reference counter back to zero, especially since module.c still uses atomic_inc instead of refcount_inc. This patch expands the error path to simply call module_put if cb->start() fails. Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs") Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-02-22Merge tag 'seccomp-v4.16-rc3' of ↵James Morris3-4/+67
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux into fixes-v4.16-rc3 - Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen) - Add selftest to make sure GET_METADATA doesn't regress (Tycho Andersen)
2018-02-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds40-155/+180
Merge misc fixes from Andrew Morton: "16 fixes" * emailed patches from Andrew Morton <[email protected]>: mm: don't defer struct page initialization for Xen pv guests lib/Kconfig.debug: enable RUNTIME_TESTING_MENU vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systems selftests/memfd: add run_fuse_test.sh to TEST_FILES bug.h: work around GCC PR82365 in BUG() mm/swap.c: make functions and their kernel-doc agree (again) mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-doc ida: do zeroing in ida_pre_get() mm, swap, frontswap: fix THP swap if frontswap enabled certs/blacklist_nohashes.c: fix const confusion in certs blacklist kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE mm, mlock, vmscan: no more skipping pagevecs mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats Kbuild: always define endianess in kconfig.h include/linux/sched/mm.h: re-inline mmdrop() tools: fix cross-compile var clobbering
2018-02-22efivarfs: Limit the rate for non-root to read filesLuck, Tony3-0/+13
Each read from a file in efivarfs results in two calls to EFI (one to get the file size, another to get the actual data). On X86 these EFI calls result in broadcast system management interrupts (SMI) which affect performance of the whole system. A malicious user can loop performing reads from efivarfs bringing the system to its knees. Linus suggested per-user rate limit to solve this. So we add a ratelimit structure to "user_struct" and initialize it for the root user for no limit. When allocating user_struct for other users we set the limit to 100 per second. This could be used for other places that want to limit the rate of some detrimental user action. In efivarfs if the limit is exceeded when reading, we take an interruptible nap for 50ms and check the rate limit again. Signed-off-by: Tony Luck <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-22kconfig.h: Include compiler types to avoid missed struct attributesKees Cook1-0/+3
The header files for some structures could get included in such a way that struct attributes (specifically __randomize_layout from path.h) would be parsed as variable names instead of attributes. This could lead to some instances of a structure being unrandomized, causing nasty GPFs, etc. This patch makes sure the compiler_types.h header is included in kconfig.h so that we've always got types and struct attributes defined, since kconfig.h is included from the compiler command line. Reported-by: Patrick McLean <[email protected]> Root-caused-by: Maciej S. Szmigiero <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Tested-by: Maciej S. Szmigiero <[email protected]> Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization") Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-22x86: Treat R_X86_64_PLT32 as R_X86_64_PC32H.J. Lu3-0/+5
On i386, there are 2 types of PLTs, PIC and non-PIC. PIE and shared objects must use PIC PLT. To use PIC PLT, you need to load _GLOBAL_OFFSET_TABLE_ into EBX first. There is no need for that on x86-64 since x86-64 uses PC-relative PLT. On x86-64, for 32-bit PC-relative branches, we can generate PLT32 relocation, instead of PC32 relocation, which can also be used as a marker for 32-bit PC-relative branches. Linker can always reduce PLT32 relocation to PC32 if function is defined locally. Local functions should use PC32 relocation. As far as Linux kernel is concerned, R_X86_64_PLT32 can be treated the same as R_X86_64_PC32 since Linux kernel doesn't use PLT. R_X86_64_PLT32 for 32-bit PC-relative branches has been enabled in binutils master branch which will become binutils 2.31. [ hjl is working on having better documentation on this all, but a few more notes from him: "PLT32 relocation is used as marker for PC-relative branches. Because of EBX, it looks odd to generate PLT32 relocation on i386 when EBX doesn't have GOT. As for symbol resolution, PLT32 and PC32 relocations are almost interchangeable. But when linker sees PLT32 relocation against a protected symbol, it can resolved locally at link-time since it is used on a branch instruction. Linker can't do that for PC32 relocation" but for the kernel use, the two are basically the same, and this commit gets things building and working with the current binutils master - Linus ] Signed-off-by: H.J. Lu <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-22KEYS: Use individual pages in big_key for crypto buffersDavid Howells1-23/+87
kmalloc() can't always allocate large enough buffers for big_key to use for crypto (1MB + some metadata) so we cannot use that to allocate the buffer. Further, vmalloc'd pages can't be passed to sg_init_one() and the aead crypto accessors cannot be called progressively and must be passed all the data in one go (which means we can't pass the data in one block at a time). Fix this by allocating the buffer pages individually and passing them through a multientry scatterlist to the crypto layer. This has the bonus advantage that we don't have to allocate a contiguous series of pages. We then vmap() the page list and pass that through to the VFS read/write routines. This can trigger a warning: WARNING: CPU: 0 PID: 60912 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0xb7c/0x15f8 ([<00000000002acbb6>] __alloc_pages_nodemask+0x1ee/0x15f8) [<00000000002dd356>] kmalloc_order+0x46/0x90 [<00000000002dd3e0>] kmalloc_order_trace+0x40/0x1f8 [<0000000000326a10>] __kmalloc+0x430/0x4c0 [<00000000004343e4>] big_key_preparse+0x7c/0x210 [<000000000042c040>] key_create_or_update+0x128/0x420 [<000000000042e52c>] SyS_add_key+0x124/0x220 [<00000000007bba2c>] system_call+0xc4/0x2b0 from the keyctl/padd/useradd test of the keyutils testsuite on s390x. Note that it might be better to shovel data through in page-sized lumps instead as there's no particular need to use a monolithic buffer unless the kernel itself wants to access the data. Fixes: 13100a72f40f ("Security: Keys: Big keys stored encrypted") Reported-by: Paul Bunyan <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Kirill Marinushkin <[email protected]>
2018-02-22X.509: fix NULL dereference when restricting key with unsupported_sigEric Biggers1-8/+13
The asymmetric key type allows an X.509 certificate to be added even if its signature's hash algorithm is not available in the crypto API. In that case 'payload.data[asym_auth]' will be NULL. But the key restriction code failed to check for this case before trying to use the signature, resulting in a NULL pointer dereference in key_or_keyring_common() or in restrict_link_by_signature(). Fix this by returning -ENOPKG when the signature is unsupported. Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled and keyctl has support for the 'restrict_keyring' command: keyctl new_session keyctl restrict_keyring @s asymmetric builtin_trusted openssl req -new -sha512 -x509 -batch -nodes -outform der \ | keyctl padd asymmetric desc @s Fixes: a511e1af8b12 ("KEYS: Move the point of trust determination to __key_link()") Cc: <[email protected]> # v4.7+ Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
2018-02-22X.509: fix BUG_ON() when hash algorithm is unsupportedEric Biggers1-1/+3
The X.509 parser mishandles the case where the certificate's signature's hash algorithm is not available in the crypto API. In this case, x509_get_sig_params() doesn't allocate the cert->sig->digest buffer; this part seems to be intentional. However, public_key_verify_signature() is still called via x509_check_for_self_signed(), which triggers the 'BUG_ON(!sig->digest)'. Fix this by making public_key_verify_signature() return -ENOPKG if the hash buffer has not been allocated. Reproducer when all the CONFIG_CRYPTO_SHA512* options are disabled: openssl req -new -sha512 -x509 -batch -nodes -outform der \ | keyctl padd asymmetric desc @s Fixes: 6c2dc5ae4ab7 ("X.509: Extract signature digest and make self-signed cert checks earlier") Reported-by: Paolo Valente <[email protected]> Cc: Paolo Valente <[email protected]> Cc: <[email protected]> # v4.7+ Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
2018-02-22PKCS#7: fix direct verification of SignerInfo signatureEric Biggers1-0/+1
If none of the certificates in a SignerInfo's certificate chain match a trusted key, nor is the last certificate signed by a trusted key, then pkcs7_validate_trust_one() tries to check whether the SignerInfo's signature was made directly by a trusted key. But, it actually fails to set the 'sig' variable correctly, so it actually verifies the last signature seen. That will only be the SignerInfo's signature if the certificate chain is empty; otherwise it will actually be the last certificate's signature. This is not by itself a security problem, since verifying any of the certificates in the chain should be sufficient to verify the SignerInfo. Still, it's not working as intended so it should be fixed. Fix it by setting 'sig' correctly for the direct verification case. Fixes: 757932e6da6d ("PKCS#7: Handle PKCS#7 messages that contain no X.509 certs") Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
2018-02-22PKCS#7: fix certificate blacklistingEric Biggers1-4/+6
If there is a blacklisted certificate in a SignerInfo's certificate chain, then pkcs7_verify_sig_chain() sets sinfo->blacklisted and returns 0. But, pkcs7_verify() fails to handle this case appropriately, as it actually continues on to the line 'actual_ret = 0;', indicating that the SignerInfo has passed verification. Consequently, PKCS#7 signature verification ignores the certificate blacklist. Fix this by not considering blacklisted SignerInfos to have passed verification. Also fix the function comment with regards to when 0 is returned. Fixes: 03bb79315ddc ("PKCS#7: Handle blacklisted certificates") Cc: <[email protected]> # v4.12+ Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
2018-02-22PKCS#7: fix certificate chain verificationEric Biggers1-1/+1
When pkcs7_verify_sig_chain() is building the certificate chain for a SignerInfo using the certificates in the PKCS#7 message, it is passing the wrong arguments to public_key_verify_signature(). Consequently, when the next certificate is supposed to be used to verify the previous certificate, the next certificate is actually used to verify itself. An attacker can use this bug to create a bogus certificate chain that has no cryptographic relationship between the beginning and end. Fortunately I couldn't quite find a way to use this to bypass the overall signature verification, though it comes very close. Here's the reasoning: due to the bug, every certificate in the chain beyond the first actually has to be self-signed (where "self-signed" here refers to the actual key and signature; an attacker might still manipulate the certificate fields such that the self_signed flag doesn't actually get set, and thus the chain doesn't end immediately). But to pass trust validation (pkcs7_validate_trust()), either the SignerInfo or one of the certificates has to actually be signed by a trusted key. Since only self-signed certificates can be added to the chain, the only way for an attacker to introduce a trusted signature is to include a self-signed trusted certificate. But, when pkcs7_validate_trust_one() reaches that certificate, instead of trying to verify the signature on that certificate, it will actually look up the corresponding trusted key, which will succeed, and then try to verify the *previous* certificate, which will fail. Thus, disaster is narrowly averted (as far as I could tell). Fixes: 6c2dc5ae4ab7 ("X.509: Extract signature digest and make self-signed cert checks earlier") Cc: <[email protected]> # v4.7+ Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: David Howells <[email protected]>
2018-02-22selftests/bpf/test_maps: exit child process without error in ENOMEM caseLi Zhijian1-0/+2
test_maps contains a series of stress tests, and previously it will break the rest tests when it failed to alloc memory. ----------------------- Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=16 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' Failed to create hashmap key=8 value=262144 'Cannot allocate memory' test_maps: test_maps.c:955: run_parallel: Assertion `status == 0' failed. Aborted not ok 1..3 selftests: test_maps [FAIL] ----------------------- after this patch, the rest tests will be continue when it occurs an ENOMEM failure CC: Alexei Starovoitov <[email protected]> CC: Philip Li <[email protected]> Suggested-by: Daniel Borkmann <[email protected]> Signed-off-by: Li Zhijian <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-22arm64: Enforce BBM for huge IO/VMAP mappingsWill Deacon1-0/+10
ioremap_page_range doesn't honour break-before-make and attempts to put down huge mappings (using p*d_set_huge) over the top of pre-existing table entries. This leads to us leaking page table memory and also gives rise to TLB conflicts and spurious aborts, which have been seen in practice on Cortex-A75. Until this has been resolved, refuse to put block mappings when the existing entry is found to be present. Fixes: 324420bf91f60 ("arm64: add support for ioremap() block mappings") Reported-by: Hanjun Guo <[email protected]> Reported-by: Lei Li <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Catalin Marinas <[email protected]>
2018-02-22i2c: designware: Consider SCL GPIO optionalAndy Shevchenko1-1/+1
GPIO library can return -ENOSYS for the failed request. Instead of failing ->probe() in this case override error code to 0. Fixes: ca382f5b38f3 ("i2c: designware: add i2c gpio recovery option") Reported-by: Dominik Brodowski <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]> Tested-by: Dominik Brodowski <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-02-22i2c: busses: i2c-sirf: Fix spelling: "formular" -> "formula".Patryk Kocielnik1-2/+2
Fix spelling. Signed-off-by: Patryk Kocielnik <[email protected]> [wsa: fixed "Initialization", too] Signed-off-by: Wolfram Sang <[email protected]>
2018-02-22i2c: bcm2835: Set up the rising/falling edge delaysEric Anholt1-1/+20
We were leaving them in the power on state (or the state the firmware had set up for some client, if we were taking over from them). The boot state was 30 core clocks, when we actually want to sample some time after (to make sure that the new input bit has actually arrived). Signed-off-by: Eric Anholt <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Cc: [email protected]
2018-02-21seccomp: add a selftest for get_metadataTycho Andersen1-0/+61
Let's test that we get the flags correctly, and that we preserve the filter index across the ptrace(PTRACE_SECCOMP_GET_METADATA) correctly. Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2018-02-21ptrace, seccomp: tweak get_metadata behavior slightlyTycho Andersen1-2/+4
Previously if users passed a small size for the input structure size, they would get get odd behavior. It doesn't make sense to pass a structure smaller than at least filter_off size, so let's just give -EINVAL in this case. This changes userspace visible behavior, but was only introduced in commit 26500475ac1b ("ptrace, seccomp: add support for retrieving seccomp metadata") in 4.16-rc2, so should be safe to change if merged before then. Reported-by: Eugene Syromiatnikov <[email protected]> Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> CC: Oleg Nesterov <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2018-02-21seccomp, ptrace: switch get_metadata types to arch independentTycho Andersen1-2/+2
Commit 26500475ac1b ("ptrace, seccomp: add support for retrieving seccomp metadata") introduced `struct seccomp_metadata`, which contained unsigned longs that should be arch independent. The type of the flags member was chosen to match the corresponding argument to seccomp(), and so we need something at least as big as unsigned long. My understanding is that __u64 should fit the bill, so let's switch both types to that. While this is userspace facing, it was only introduced in 4.16-rc2, and so should be safe assuming it goes in before then. Reported-by: "Dmitry V. Levin" <[email protected]> Signed-off-by: Tycho Andersen <[email protected]> CC: Kees Cook <[email protected]> CC: Oleg Nesterov <[email protected]> Reviewed-by: "Dmitry V. Levin" <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2018-02-22selftests/bpf: update gitignore with test_libbpf_openAnders Roxell1-0/+1
bpf builds a test program for loading BPF ELF files. Add the executable to the .gitignore list. Signed-off-by: Anders Roxell <[email protected]> Tested-by: Daniel Díaz <[email protected]> Acked-by: David S. Miller <[email protected]> Acked-by: Shuah Khan <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-22selftests/bpf: tcpbpf_kern: use in6_* macros from glibcAnders Roxell1-1/+0
Both glibc and the kernel have in6_* macros definitions. Build fails because it picks up wrong in6_* macro from the kernel header and not the header from glibc. Fixes build error below: clang -I. -I./include/uapi -I../../../include/uapi -Wno-compare-distinct-pointer-types \ -O2 -target bpf -emit-llvm -c test_tcpbpf_kern.c -o - | \ llc -march=bpf -mcpu=generic -filetype=obj -o .../tools/testing/selftests/bpf/test_tcpbpf_kern.o In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:101:5: error: expected identifier IPPROTO_HOPOPTS = 0, /* IPv6 Hop-by-Hop options. */ ^ .../linux/in6.h:131:26: note: expanded from macro 'IPPROTO_HOPOPTS' ^ In file included from test_tcpbpf_kern.c:12: /usr/include/netinet/in.h:103:5: error: expected identifier IPPROTO_ROUTING = 43, /* IPv6 routing header. */ ^ .../linux/in6.h:132:26: note: expanded from macro 'IPPROTO_ROUTING' ^ In file included from test_tcpbpf_kern.c:12: .../netinet/in.h:105:5: error: expected identifier IPPROTO_FRAGMENT = 44, /* IPv6 fragmentation header. */ ^ Since both glibc and the kernel have in6_* macros definitions, use the one from glibc. Kernel headers will check for previous libc definitions by including include/linux/libc-compat.h. Reported-by: Daniel Díaz <[email protected]> Signed-off-by: Anders Roxell <[email protected]> Tested-by: Daniel Díaz <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-22bpf: clean up unused-variable warningArnd Bergmann1-5/+1
The only user of this variable is inside of an #ifdef, causing a warning without CONFIG_INET: net/core/filter.c: In function '____bpf_sock_ops_cb_flags_set': net/core/filter.c:3382:6: error: unused variable 'val' [-Werror=unused-variable] int val = argval & BPF_SOCK_OPS_ALL_CB_FLAGS; This replaces the #ifdef with a nicer IS_ENABLED() check that makes the code more readable and avoids the warning. Fixes: b13d88072172 ("bpf: Adds field bpf_sock_ops_cb_flags to tcp_sock") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-02-21mm: don't defer struct page initialization for Xen pv guestsJuergen Gross1-0/+4
Commit f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") broke Xen pv domains in some configurations, as the "Pinned" information in struct page of early page tables could get lost. This will lead to the kernel trying to write directly into the page tables instead of asking the hypervisor to do so. The result is a crash like the following: BUG: unable to handle kernel paging request at ffff8801ead19008 IP: xen_set_pud+0x4e/0xd0 PGD 1c0a067 P4D 1c0a067 PUD 23a0067 PMD 1e9de0067 PTE 80100001ead19065 Oops: 0003 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-default+ #271 Hardware name: Dell Inc. Latitude E6440/0159N7, BIOS A07 06/26/2014 task: ffffffff81c10480 task.stack: ffffffff81c00000 RIP: e030:xen_set_pud+0x4e/0xd0 Call Trace: __pmd_alloc+0x128/0x140 ioremap_page_range+0x3f4/0x410 __ioremap_caller+0x1c3/0x2e0 acpi_os_map_iomem+0x175/0x1b0 acpi_tb_acquire_table+0x39/0x66 acpi_tb_validate_table+0x44/0x7c acpi_tb_verify_temp_table+0x45/0x304 acpi_reallocate_root_table+0x12d/0x141 acpi_early_init+0x4d/0x10a start_kernel+0x3eb/0x4a1 xen_start_kernel+0x528/0x532 Code: 48 01 e8 48 0f 42 15 a2 fd be 00 48 01 d0 48 ba 00 00 00 00 00 ea ff ff 48 c1 e8 0c 48 c1 e0 06 48 01 d0 48 8b 00 f6 c4 02 75 5d <4c> 89 65 00 5b 5d 41 5c c3 65 8b 05 52 9f fe 7e 89 c0 48 0f a3 RIP: xen_set_pud+0x4e/0xd0 RSP: ffffffff81c03cd8 CR2: ffff8801ead19008 ---[ end trace 38eca2e56f1b642e ]--- Avoid this problem by not deferring struct page initialization when running as Xen pv guest. Pavel said: : This is unique for Xen, so this particular issue won't effect other : configurations. I am going to investigate if there is a way to : re-enable deferred page initialization on xen guests. [[email protected]: explicitly include xen.h] Link: http://lkml.kernel.org/r/[email protected] Fixes: f7f99100d8d95d ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Cc: Steven Sistare <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Bob Picco <[email protected]> Cc: <[email protected]> [4.15.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21lib/Kconfig.debug: enable RUNTIME_TESTING_MENUAnders Roxell1-0/+1
Commit d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to ease disabling it all") causes a regression when using runtime tests due to it defaults RUNTIME_TESTING_MENU to not set. Link: http://lkml.kernel.org/r/[email protected] Fixes: d3deafaa8b5c ("lib/: make RUNTIME_TESTS a menuconfig to easedisabling it all") Signed-off-by: Anders Roxell <[email protected]> Cc: Vincent Legoll <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Byungchul Park <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21vmalloc: fix __GFP_HIGHMEM usage for vmalloc_32 on 32b systemsMichal Hocko1-3/+7
Kai Heng Feng has noticed that BUG_ON(PageHighMem(pg)) triggers in drivers/media/common/saa7146/saa7146_core.c since 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly"). saa7146_vmalloc_build_pgtable uses vmalloc_32 and it is reasonable to expect that the resulting page is not in highmem. The above commit aimed to add __GFP_HIGHMEM only for those requests which do not specify any zone modifier gfp flag. vmalloc_32 relies on GFP_VMALLOC32 which should do the right thing. Except it has been missed that GFP_VMALLOC32 is an alias for GFP_KERNEL on 32b architectures. Thanks to Matthew to notice this. Fix the problem by unconditionally setting GFP_DMA32 in GFP_VMALLOC32 for !64b arches (as a bailout). This should do the right thing and use ZONE_NORMAL which should be always below 4G on 32b systems. Debugged by Matthew Wilcox. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Fixes: 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly”) Signed-off-by: Michal Hocko <[email protected]> Reported-by: Kai Heng Feng <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Laura Abbott <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21selftests/memfd: add run_fuse_test.sh to TEST_FILESAnders Roxell1-0/+1
While testing memfd tests, there is a missing script, as reported by kselftest: ./run_tests.sh: line 7: ./run_fuse_test.sh: No such file or directory Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anders Roxell <[email protected]> Signed-off-by: Daniel Díaz <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21bug.h: work around GCC PR82365 in BUG()Arnd Bergmann8-6/+44
Looking at functions with large stack frames across all architectures led me discovering that BUG() suffers from the same problem as fortify_panic(), which I've added a workaround for already. In short, variables that go out of scope by calling a noreturn function or __builtin_unreachable() keep using stack space in functions afterwards. A workaround that was identified is to insert an empty assembler statement just before calling the function that doesn't return. I'm adding a macro "barrier_before_unreachable()" to document this, and insert calls to that in all instances of BUG() that currently suffer from this problem. The files that saw the largest change from this had these frame sizes before, and much less with my patch: fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=] fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=] net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=] drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=] In case of ARC and CRIS, it turns out that the BUG() implementation actually does return (or at least the compiler thinks it does), resulting in lots of warnings about uninitialized variable use and leaving noreturn functions, such as: block/cfq-iosched.c: In function 'cfq_async_queue_prio': block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type] include/linux/dmaengine.h: In function 'dma_maxpq': include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type] This makes them call __builtin_trap() instead, which should normally dump the stack and kill the current process, like some of the other architectures already do. I tried adding barrier_before_unreachable() to panic() and fortify_panic() as well, but that had very little effect, so I'm not submitting that patch. Vineet said: : For ARC, it is double win. : : 1. Fixes 3 -Wreturn-type warnings : : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function : [-Wreturn-type] : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of : non-void function [-Wreturn-type] : : 2. bloat-o-meter reports code size improvements as gcc elides the : generated code for stack return. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Vineet Gupta <[email protected]> [arch/arc] Tested-by: Vineet Gupta <[email protected]> [arch/arc] Cc: Mikael Starvik <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Christopher Li <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Kees Cook <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Will Deacon <[email protected]> Cc: "Steven Rostedt (VMware)" <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21mm/swap.c: make functions and their kernel-doc agree (again)Mike Rapoport1-1/+1
There was a conflict between the commit e02a9f048ef7 ("mm/swap.c: make functions and their kernel-doc agree") and the commit f144c390f905 ("mm: docs: fix parameter names mismatch") that both tried to fix mismatch betweeen pagevec_lookup_entries() parameter names and their description. Since nr_entries is a better name for the parameter, fix the description again. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Randy Dunlap <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21mm/zpool.c: zpool_evictable: fix mismatch in parameter name and kernel-docMike Rapoport1-1/+1
[[email protected]: add colon, per Randy] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21ida: do zeroing in ida_pre_get()Rasmus Villemoes2-3/+1
As far as I can tell, the only place the per-cpu ida_bitmap is populated is in ida_pre_get. The pre-allocated element is stolen in two places in ida_get_new_above, in both cases immediately followed by a memset(0). Since ida_get_new_above is called with locks held, do the zeroing in ida_pre_get, or rather let kmalloc() do it. Also, apparently gcc generates ~44 bytes of code to do a memset(, 0, 128): $ scripts/bloat-o-meter vmlinux.{0,1} add/remove: 0/0 grow/shrink: 2/1 up/down: 5/-88 (-83) Function old new delta ida_pre_get 115 119 +4 vermagic 27 28 +1 ida_get_new_above 715 627 -88 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Matthew Wilcox <[email protected]> Cc: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21mm, swap, frontswap: fix THP swap if frontswap enabledHuang Ying2-0/+10
It was reported by Sergey Senozhatsky that if THP (Transparent Huge Page) and frontswap (via zswap) are both enabled, when memory goes low so that swap is triggered, segfault and memory corruption will occur in random user space applications as follow, kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000] #0 0x00007fc08889ae0d _int_malloc (libc.so.6) #1 0x00007fc08889c2f3 malloc (libc.so.6) #2 0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt) #3 0x0000560e6005e75c n/a (urxvt) #4 0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt) #5 0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt) #6 0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt) #7 0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt) #8 0x0000560e6005cb55 ev_run (urxvt) #9 0x0000560e6003b9b9 main (urxvt) #10 0x00007fc08883af4a __libc_start_main (libc.so.6) #11 0x0000560e6003f9da _start (urxvt) After bisection, it was found the first bad commit is bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out"). The root cause is as follows: When the pages are written to swap device during swapping out in swap_writepage(), zswap (fontswap) is tried to compress the pages to improve performance. But zswap (frontswap) will treat THP as a normal page, so only the head page is saved. After swapping in, tail pages will not be restored to their original contents, causing memory corruption in the applications. This is fixed by refusing to save page in the frontswap store functions if the page is a THP. So that the THP will be swapped out to swap device. Another choice is to split THP if frontswap is enabled. But it is found that the frontswap enabling isn't flexible. For example, if CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if zswap itself isn't enabled. Frontswap has multiple backends, to make it easy for one backend to enable THP support, the THP checking is put in backend frontswap store functions instead of the general interfaces. Link: http://lkml.kernel.org/r/[email protected] Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out") Signed-off-by: "Huang, Ying" <[email protected]> Reported-by: Sergey Senozhatsky <[email protected]> Tested-by: Sergey Senozhatsky <[email protected]> Suggested-by: Minchan Kim <[email protected]> [put THP checking in backend] Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Juergen Gross <[email protected]> Cc: <[email protected]> [4.14] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21certs/blacklist_nohashes.c: fix const confusion in certs blacklistAndi Kleen1-1/+1
const must be marked __initconst, not __initdata. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andi Kleen <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZEDavid Rientjes1-1/+1
chan->n_subbufs is set by the user and relay_create_buf() does a kmalloc() of chan->n_subbufs * sizeof(size_t *). kmalloc_slab() will generate a warning when this fails if chan->subbufs * sizeof(size_t *) > KMALLOC_MAX_SIZE. Limit chan->n_subbufs to the maximum allowed kmalloc() size. Link: http://lkml.kernel.org/r/[email protected] Fixes: f6302f1bcd75 ("relay: prevent integer overflow in relay_open()") Signed-off-by: David Rientjes <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Al Viro <[email protected]> Cc: Dan Carpenter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21mm, mlock, vmscan: no more skipping pagevecsShakeel Butt4-95/+54
When a thread mlocks an address space backed either by file pages which are currently not present in memory or swapped out anon pages (not in swapcache), a new page is allocated and added to the local pagevec (lru_add_pvec), I/O is triggered and the thread then sleeps on the page. On I/O completion, the thread can wake on a different CPU, the mlock syscall will then sets the PageMlocked() bit of the page but will not be able to put that page in unevictable LRU as the page is on the pagevec of a different CPU. Even on drain, that page will go to evictable LRU because the PageMlocked() bit is not checked on pagevec drain. The page will eventually go to right LRU on reclaim but the LRU stats will remain skewed for a long time. This patch puts all the pages, even unevictable, to the pagevecs and on the drain, the pages will be added on their LRUs correctly by checking their evictability. This resolves the mlocked pages on pagevec of other CPUs issue because when those pagevecs will be drained, the mlocked file pages will go to unevictable LRU. Also this makes the race with munlock easier to resolve because the pagevec drains happen in LRU lock. However there is still one place which makes a page evictable and does PageLRU check on that page without LRU lock and needs special attention. TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock(). #0: __pagevec_lru_add_fn #1: clear_page_mlock SetPageLRU() if (!TestClearPageMlocked()) return smp_mb() // <--required // inside does PageLRU if (!PageMlocked()) if (isolate_lru_page()) move to evictable LRU putback_lru_page() else move to unevictable LRU In '#1', TestClearPageMlocked() provides full memory barrier semantics and thus the PageLRU check (inside isolate_lru_page) can not be reordered before it. In '#0', without explicit memory barrier, the PageMlocked() check can be reordered before SetPageLRU(). If that happens, '#0' can put a page in unevictable LRU and '#1' might have just cleared the Mlocked bit of that page but fails to isolate as PageLRU fails as '#0' still hasn't set PageLRU bit of that page. That page will be stranded on the unevictable LRU. There is one (good) side effect though. Without this patch, the pages allocated for System V shared memory segment are added to evictable LRUs even after shmctl(SHM_LOCK) on that segment. This patch will correctly put such pages to unevictable LRU. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Jérôme Glisse <[email protected]> Cc: Huang Ying <[email protected]> Cc: Tim Chen <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Jan Kara <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Dan Williams <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21mm: memcontrol: fix NR_WRITEBACK leak in memcg and system statsJohannes Weiner1-8/+16
After commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK counts over the course of several days, both the per-memcg stats as well as the system counter in e.g. /proc/meminfo. The conversion from full per-cpu stat counts to per-cpu cached atomic stat counts introduced an irq-unsafe RMW operation into the updates. Most stat updates come from process context, but one notable exception is the NR_WRITEBACK counter. While writebacks are issued from process context, they are retired from (soft)irq context. When writeback completions interrupt the RMW counter updates of new writebacks being issued, the decs from the completions are lost. Since the global updates are routed through the joint lruvec API, both the memcg counters as well as the system counters are affected. This patch makes the joint stat and event API irq safe. Link: http://lkml.kernel.org/r/[email protected] Fixes: a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") Signed-off-by: Johannes Weiner <[email protected]> Debugged-by: Tejun Heo <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-02-21Kbuild: always define endianess in kconfig.hArnd Bergmann1-0/+6
Build testing with LTO found a couple of files that get compiled differently depending on whether asm/byteorder.h gets included early enough or not. In particular, include/asm-generic/qrwlock_types.h is affected by this, but there are probably others as well. The symptom is a series of LTO link time warnings, including these: net/netlabel/netlabel_unlabeled.h:223: error: type of 'netlbl_unlhsh_add' does not match original declaration [-Werror=lto-type-mismatch] int netlbl_unlhsh_add(struct net *net, net/netlabel/netlabel_unlabeled.c:377: note: 'netlbl_unlhsh_add' was previously declared here include/net/ipv6.h:360: error: type of 'ipv6_renew_options_kern' does not match original declaration [-Werror=lto-type-mismatch] ipv6_renew_options_kern(struct sock *sk, net/ipv6/exthdrs.c:1162: note: 'ipv6_renew_options_kern' was previously declared here net/core/dev.c:761: note: 'dev_get_by_name_rcu' was previously declared here struct net_device *dev_get_by_name_rcu(struct net *net, const char *name) net/core/dev.c:761: note: code may be misoptimized unless -fno-strict-aliasing is used drivers/gpu/drm/i915/i915_drv.h:3377: error: type of 'i915_gem_object_set_to_wc_domain' does not match original declaration [-Werror=lto-type-mismatch] i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write); drivers/gpu/drm/i915/i915_gem.c:3639: note: 'i915_gem_object_set_to_wc_domain' was previously declared here include/linux/debugfs.h:92:9: error: type of 'debugfs_attr_read' does not match original declaration [-Werror=lto-type-mismatch] ssize_t debugfs_attr_read(struct file *file, char __user *buf, fs/debugfs/file.c:318: note: 'debugfs_attr_read' was previously declared here include/linux/rwlock_api_smp.h:30: error: type of '_raw_read_unlock' does not match original declaration [-Werror=lto-type-mismatch] void __lockfunc _raw_read_unlock(rwlock_t *lock) __releases(lock); kernel/locking/spinlock.c:246:26: note: '_raw_read_unlock' was previously declared here include/linux/fs.h:3308:5: error: type of 'simple_attr_open' does not match original declaration [-Werror=lto-type-mismatch] int simple_attr_open(struct inode *inode, struct file *file, fs/libfs.c:795: note: 'simple_attr_open' was previously declared here All of the above are caused by include/asm-generic/qrwlock_types.h failing to include asm/byteorder.h after commit e0d02285f16e ("locking/qrwlock: Use 'struct qrwlock' instead of 'struct __qrwlock'") in linux-4.15. Similar bugs may or may not exist in older kernels as well, but there is no easy way to test those with link-time optimizations, and kernels before 4.14 are harder to fix because they don't have Babu's patch series We had similar issues with CONFIG_ symbols in the past and ended up always including the configuration headers though linux/kconfig.h. This works around the issue through that same file, defining either __BIG_ENDIAN or __LITTLE_ENDIAN depending on CONFIG_CPU_BIG_ENDIAN, which is now always set on all architectures since commit 4c97a0c8fee3 ("arch: define CPU_BIG_ENDIAN for all fixed big endian archs"). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Cc: Babu Moger <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nicolas Pitre <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>