path: root/include
Age | Commit message | Author | Files | Lines
2019-10-17 | Merge branch 'errata/tx2-219' into for-next/fixes | Will Deacon | 13 | -33/+153
Workaround for Cavium/Marvell ThunderX2 erratum #219.

* errata/tx2-219:
  arm64: Allow CAVIUM_TX2_ERRATUM_219 to be selected
  arm64: Avoid Cavium TX2 erratum 219 when switching TTBR
  arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
  arm64: KVM: Trap VM ops when ARM64_WORKAROUND_CAVIUM_TX2_219_TVM is set
2019-10-17 | Merge tag 'gpio-v5.4-3' of ↵ | Linus Torvalds | 1 | -0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fixes from Linus Walleij: "The fixes pertain to a problem with initializing the Intel GPIO irqchips when adding gpiochips. Andy fixed it up elegantly by adding a hardware initialization callback to the struct gpio_irq_chip so let's use this. Tested and verified on the target hardware" * tag 'gpio-v5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: lynxpoint: set default handler to be handle_bad_irq() gpio: merrifield: Move hardware initialization to callback gpio: lynxpoint: Move hardware initialization to callback gpio: intel-mid: Move hardware initialization to callback gpiolib: Initialize the hardware with a callback gpio: merrifield: Restore use of irq_base
2019-10-17 | btrfs: tracepoints: Fix bad entry members of qgroup events | Qu Wenruo | 1 | -1/+2
[BUG]
For btrfs:qgroup_meta_reserve event, the trace event can output garbage:

  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=DATA diff=2
  qgroup_meta_reserve: 9c7f6acc-b342-4037-bc47-7f6e4d2232d7: refroot=5(FS_TREE) type=0x258792 diff=2

The @type can be completely garbage, as DATA type is not possible for the trace_qgroup_meta_reserve() trace event.

[CAUSE]
There are several problems related to qgroup trace events:

- Unassigned entry member
  Member entry::type of trace_qgroup_update_reserve() and trace_qgroup_meta_reserve() is not assigned

- Redundant entry member
  Member entry::type is completely useless in trace_qgroup_meta_convert()

Fixes: 4ee0d8832c2e ("btrfs: qgroup: Update trace events for metadata reservation")
CC: [email protected] # 4.10+
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Qu Wenruo <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
2019-10-16 | arm64: entry.S: Do not preempt from IRQ before all cpufeatures are enabled | Julien Thierry | 1 | -0/+1
Preempting from IRQ-return means that the task has its PSTATE saved on the stack, which will get restored when the task is resumed and does the actual IRQ return.

However, enabling some CPU features requires modifying the PSTATE. This means that, if a task was scheduled out during an IRQ-return before all CPU features are enabled, the task might restore a PSTATE that does not include the feature enablement changes once scheduled back in.

* Task 1:

PAN == 0 ---|                          |---------------
            |                          |<- return from IRQ, PSTATE.PAN = 0
            | <- IRQ                   |
            +--------+ <- preempt()  +--
                                     ^
                                     |
                                     reschedule Task 1, PSTATE.PAN == 1
* Init:
        --------------------+------------------------
                            ^
                            |
                            enable_cpu_features
                            set PSTATE.PAN on all CPUs

Worse than this, since PSTATE is untouched when task switching is done, a task missing the new bits in PSTATE might affect another task, if both do direct calls to schedule() (outside of IRQ/exception contexts).

Fix this by preventing preemption on IRQ-return until features are enabled on all CPUs.

This way the only PSTATE values that are saved on the stack are from synchronous exceptions. These are expected to be fatal this early, the exception is BRK for WARN_ON(), but as this uses do_debug_exception() which keeps IRQs masked, it shouldn't call schedule().

Signed-off-by: Julien Thierry <[email protected]>
[james: Replaced a really cool hack, with an even simpler static key in C. expanded commit message with Julien's cover-letter ascii art]
Signed-off-by: James Morse <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
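The gating idea in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: a plain boolean stands in for the kernel's static key, and `struct cpu_sketch` and `irq_return_sketch()` are hypothetical names, not the code in entry.S.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-system state; the kernel uses a static key instead. */
struct cpu_sketch {
	bool features_finalized;  /* true once all CPU features are enabled */
	unsigned int reschedules; /* how many times we preempted on IRQ return */
};

/*
 * IRQ-return path: preempt only if a reschedule is pending AND every
 * CPU feature has been enabled, so no stale PSTATE is saved on the stack.
 */
static void irq_return_sketch(struct cpu_sketch *cpu, bool need_resched)
{
	assert(cpu != NULL);
	if (need_resched && cpu->features_finalized)
		cpu->reschedules++;
}
```

Before the flag is set, a pending reschedule is simply left for a later scheduling point; nothing is preempted from the IRQ-return path.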
2019-10-15 | net/sched: fix corrupted L2 header with MPLS 'push' and 'pop' actions | Davide Caratti | 1 | -2/+3
the following script:

 # tc qdisc add dev eth0 clsact
 # tc filter add dev eth0 egress protocol ip matchall \
 > action mpls push protocol mpls_uc label 0x355aa bos 1

causes corruption of all IP packets transmitted by eth0. On TC egress, we can't rely on the value of skb->mac_len, because it's 0 and a MPLS 'push' operation will result in an overwrite of the first 4 octets in the packet L2 header (e.g. the Destination Address if eth0 is an Ethernet); the same error pattern is present also in the MPLS 'pop' operation. Fix this error in act_mpls data plane, computing 'mac_len' as the difference between the network header and the mac header (when not at TC ingress), and use it in MPLS 'push'/'pop' core functions.

v2: unbreak 'make htmldocs' because of missing documentation of 'mac_len' in skb_mpls_pop(), reported by kbuild test robot

CC: Lorenzo Bianconi <[email protected]>
Fixes: 2a2ea50870ba ("net: sched: add mpls manipulation actions to TC")
Reviewed-by: Simon Horman <[email protected]>
Acked-by: John Hurley <[email protected]>
Signed-off-by: Davide Caratti <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
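A minimal sketch of the 'mac_len' computation described above. `struct pkt_sketch` is a hypothetical stand-in for the few sk_buff fields involved, and using the cached length at ingress is an assumption of this sketch; the commit text only specifies the egress computation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant sk_buff fields. */
struct pkt_sketch {
	size_t mac_header;     /* offset of the L2 header in the buffer */
	size_t network_header; /* offset of the L3 header in the buffer */
	size_t mac_len;        /* cached L2 length; 0 on TC egress */
	int at_ingress;        /* non-zero when running at TC ingress */
};

/* Length of the L2 header that MPLS push/pop must preserve. */
static size_t mpls_mac_len_sketch(const struct pkt_sketch *p)
{
	assert(p != NULL && p->network_header >= p->mac_header);
	if (p->at_ingress)
		return p->mac_len; /* assumption: cached value is valid here */
	/* Egress: derive it from the header offsets instead. */
	return p->network_header - p->mac_header;
}
```

With a 14-byte Ethernet header, the egress path yields 14 even though the cached `mac_len` is 0, so the label is inserted after the L2 header rather than over it.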
2019-10-15 | Merge tag 'scsi-fixes' of ↵ | Linus Torvalds | 1 | -0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Five changes, two in drivers (qla2xxx, zfcp), one to MAINTAINERS (qla2xxx) and two in the core. The last two are mostly about removing incorrect messages from the kernel log: the resid message is definitely wrong and the sync cache on protected drive problem is arguably wrong" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: MAINTAINERS: Update qla2xxx driver scsi: zfcp: fix reaction on bit error threshold notification scsi: core: save/restore command resid for error handling scsi: qla2xxx: Remove WARN_ON_ONCE in qla2x00_status_cont_entry() scsi: sd: Ignore a failure to sync cache due to lack of authorization
2019-10-15 | ASoC: sof: include types.h at header.h | Kuninori Morimoto | 1 | -0/+1
Without <types.h> we will get these errors:

linux/include/sound/sof/header.h:125:2: error: unknown type name ‘uint32_t’
 uint32_t size;
linux/include/sound/sof/header.h:136:2: error: unknown type name ‘uint32_t’
 uint32_t size;
linux/include/sound/sof/header.h:137:2: error: unknown type name ‘uint32_t’
 uint32_t cmd;
...
linux/include/sound/sof/dai-imx.h:18:2: error: unknown type name ‘uint16_t’
 uint16_t reserved1;
linux/include/sound/sof/dai-imx.h:30:2: error: unknown type name ‘uint16_t’
 uint16_t tdm_slot_width;
linux/include/sound/sof/dai-imx.h:31:2: error: unknown type name ‘uint16_t’
 uint16_t reserved2;

Signed-off-by: Kuninori Morimoto <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
2019-10-15 | gpiolib: Initialize the hardware with a callback | Andy Shevchenko | 1 | -0/+8
After changing the drivers to use GPIO core to add an IRQ chip it appears that some of them require a hardware initialization before adding the IRQ chip.

Add an optional callback ->init_hw() to allow those drivers to initialize hardware if needed.

This change is part of the fix for a NULL pointer dereference recently introduced in several drivers.

Cc: Hans de Goede <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
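The optional-callback pattern can be sketched as follows. Only the `init_hw` name comes from the commit; the `_sketch` structs and `add_irqchip_sketch()` are hypothetical and do not reproduce gpiolib's actual types or call sites.

```c
#include <assert.h>
#include <stddef.h>

struct chip_sketch;

/* Hypothetical IRQ-chip description with an optional hardware hook. */
struct irq_chip_sketch {
	int (*init_hw)(struct chip_sketch *chip); /* may be NULL */
};

struct chip_sketch {
	struct irq_chip_sketch irq;
	int hw_ready;      /* set by a driver's init_hw() */
	int irqchip_added; /* set once the IRQ chip is registered */
};

/* Core side: run the driver hook, if any, before adding the IRQ chip. */
static int add_irqchip_sketch(struct chip_sketch *chip)
{
	assert(chip != NULL);
	if (chip->irq.init_hw) {
		int ret = chip->irq.init_hw(chip);

		if (ret)
			return ret; /* hardware setup failed: do not register */
	}
	chip->irqchip_added = 1;
	return 0;
}

/* Driver side: example hook that prepares the hardware. */
static int demo_init_hw(struct chip_sketch *chip)
{
	chip->hw_ready = 1;
	return 0;
}
```

Drivers that need no setup leave the pointer NULL and are unaffected, which is what makes the callback optional.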
2019-10-14 | xarray.h: fix kernel-doc warning | Randy Dunlap | 1 | -2/+2
Fix (Sphinx) kernel-doc warning in <linux/xarray.h>: include/linux/xarray.h:232: WARNING: Unexpected indentation. Link: http://lkml.kernel.org/r/[email protected] Fixes: a3e4d3f97ec8 ("XArray: Redesign xa_alloc API") Signed-off-by: Randy Dunlap <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-14 | bitmap.h: fix kernel-doc warning and typo | Randy Dunlap | 1 | -1/+2
Fix kernel-doc warning in <linux/bitmap.h>: include/linux/bitmap.h:341: warning: Function parameter or member 'nbits' not described in 'bitmap_or_equal' Also fix small typo (bitnaps). Link: http://lkml.kernel.org/r/[email protected] Fixes: b9fa6442f704 ("cpumask: Implement cpumask_or_equal()") Signed-off-by: Randy Dunlap <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-10-14 | mm, page_owner: rename flag indicating that page is allocated | Vlastimil Babka | 1 | -1/+1
Commit 37389167a281 ("mm, page_owner: keep owner info when freeing the page") has introduced a flag PAGE_EXT_OWNER_ACTIVE to indicate that a page is tracked as being allocated. Kirill suggested naming it PAGE_EXT_OWNER_ALLOCATED to make it more clear, as "active is somewhat loaded term for a page".

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Suggested-by: Kirill A. Shutemov <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Walter Wu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2019-10-14 | mm, page_owner: fix off-by-one error in __set_page_owner_handle() | Vlastimil Babka | 1 | -0/+8
Patch series "followups to debug_pagealloc improvements through page_owner", v3. These are followups to [1] which made it to Linus meanwhile. Patches 1 and 3 are based on Kirill's review, patch 2 on KASAN request [2]. It would be nice if all of this made it to 5.4 with [1] already there (or at least Patch 1). This patch (of 3): As noted by Kirill, commit 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") has introduced an off-by-one error in __set_page_owner_handle() when looking up page_ext for subpages. As a result, the head page page_owner info is set twice, while for the last tail page, it's not set at all. Fix this and also make the code more efficient by advancing the page_ext pointer we already have, instead of calling lookup_page_ext() for each subpage. Since the full size of struct page_ext is not known at compile time, we can't use a simple page_ext++ statement, so introduce a page_ext_next() inline function for that. Link: http://lkml.kernel.org/r/[email protected] Fixes: 7e2f2a0cd17c ("mm, page_owner: record page owner for each subpage") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Kirill A. Shutemov <[email protected]> Reported-by: Miles Chen <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Walter Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
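A sketch of why a helper is needed instead of `page_ext++`: when the full entry size is only known at run time, the pointer has to be advanced in bytes. The `page_ext_next()` name comes from the commit; the one-field struct, the `page_ext_size` variable as written here, and `mark_entries()` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Only the fixed part is visible at compile time. */
struct page_ext {
	unsigned long flags;
};

/* Full per-entry size (fixed part plus extra data), set at run time. */
static size_t page_ext_size = sizeof(struct page_ext);

/* Step to the next entry using the run-time size, not sizeof(). */
static inline struct page_ext *page_ext_next(struct page_ext *curr)
{
	assert(curr != NULL);
	return (struct page_ext *)((char *)curr + page_ext_size);
}

/* Visit nr consecutive entries exactly once each, head entry included. */
static void mark_entries(struct page_ext *first, unsigned int nr,
			 unsigned long flag)
{
	struct page_ext *ext = first;
	unsigned int i;

	for (i = 0; i < nr; i++) {
		ext->flags |= flag;
		ext = page_ext_next(ext);
	}
}
```

Advancing the pointer already in hand touches the head entry once and reaches the last tail entry, which is the behaviour the off-by-one fix restores.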
2019-10-14 | dmaengine: imx-sdma: fix size check for sdma script_number | Robin Gong | 1 | -0/+3
Illegal memory will be touched if SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (41) exceeds the size of structure sdma_script_start_addrs (40), causing memory corruption (for example of a slob block header) so that the kernel loops forever in the while() loop in slob_free(). Please refer to the code below in imx-sdma.c:

	for (i = 0; i < sdma->script_number; i++)
		if (addr_arr[i] > 0)
			saddr_arr[i] = addr_arr[i]; /* memory corrupt here */

That issue was introduced by commit a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware") because SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3 (38->41, 3 scripts added) does not match the number of scripts added to sdma_script_start_addrs (2 scripts).

Fixes: a572460be9cf ("dmaengine: imx-sdma: Add support for version 3 firmware")
Cc: [email protected]
Link: https://www.spinics.net/lists/arm-kernel/msg754895.html
Signed-off-by: Robin Gong <[email protected]>
Reported-by: Jurgen Lambrecht <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[vkoul: update the patch title]
Signed-off-by: Vinod Koul <[email protected]>
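The overrun in the loop quoted above can be avoided by bounding the copy by the destination's capacity. The following is a hedged sketch of that general technique, not the actual driver change: the four-member struct, `SCRIPT_ADDRS_CAPACITY`, and `copy_script_addrs()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, shortened stand-in: one 32-bit address per script. */
struct script_addrs_sketch {
	int32_t ap_2_ap;
	int32_t ap_2_bp;
	int32_t bp_2_ap;
	int32_t mcu_2_app;
};

#define SCRIPT_ADDRS_CAPACITY \
	(sizeof(struct script_addrs_sketch) / sizeof(int32_t))

/* The struct is indexed as an array below, so it must have no padding. */
_Static_assert(sizeof(struct script_addrs_sketch) == 4 * sizeof(int32_t),
	       "script address struct must be densely packed");

/* Copy at most as many addresses as the struct holds; return the count. */
static size_t copy_script_addrs(struct script_addrs_sketch *dst,
				const int32_t *addr_arr, size_t script_number)
{
	int32_t *saddr_arr = (int32_t *)dst; /* same pattern as the driver loop */
	size_t n = script_number < SCRIPT_ADDRS_CAPACITY ?
		   script_number : SCRIPT_ADDRS_CAPACITY;
	size_t i;

	assert(dst != NULL && addr_arr != NULL);
	for (i = 0; i < n; i++)
		if (addr_arr[i] > 0)
			saddr_arr[i] = addr_arr[i];
	return n;
}
```

Deriving the capacity from `sizeof` keeps the limit tied to the struct definition, so adding a member cannot silently drift from a separately maintained constant.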
2019-10-13 | tcp: annotate sk->sk_wmem_queued lockless reads | Eric Dumazet | 2 | -6/+11
For the sake of tcp_poll(), there are a few places where we fetch sk->sk_wmem_queued while this field can change from IRQ or another CPU.

We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to avoid store-tearing.

sk_wmem_queued_add() helper is added so that we can in the future convert to ADD_ONCE() or equivalent if/when available.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
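A userspace sketch of the annotation pattern. The two macros are simplified stand-ins (a volatile access via GNU `__typeof__`), not the kernel's definitions, and `struct sock_sketch` and `wspace_sketch()` are hypothetical; only the `sk_wmem_queued_add()` name comes from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: a volatile access forces one full load/store. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical socket with the two fields discussed in this series. */
struct sock_sketch {
	int sk_wmem_queued;
	int sk_sndbuf;
};

/* Write side: a single annotated store, so readers never see a torn value. */
static inline void sk_wmem_queued_add(struct sock_sketch *sk, int val)
{
	assert(sk != NULL);
	WRITE_ONCE(sk->sk_wmem_queued, sk->sk_wmem_queued + val);
}

/* Lockless read side, as a poll path might compute free write space. */
static inline int wspace_sketch(struct sock_sketch *sk)
{
	assert(sk != NULL);
	return READ_ONCE(sk->sk_sndbuf) - READ_ONCE(sk->sk_wmem_queued);
}
```

Funnelling all updates through one helper is what would let a later change swap in an ADD_ONCE()-style primitive without touching every caller.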
2019-10-13 | tcp: annotate sk->sk_sndbuf lockless reads | Eric Dumazet | 1 | -7/+11
For the sake of tcp_poll(), there are a few places where we fetch sk->sk_sndbuf while this field can change from IRQ or another CPU.

We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to avoid store-tearing.

Note that other transports probably need similar fixes.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-10-13 | tcp: annotate sk->sk_rcvbuf lockless reads | Eric Dumazet | 2 | -3/+3
For the sake of tcp_poll(), there are a few places where we fetch sk->sk_rcvbuf while this field can change from IRQ or another CPU.

We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to avoid store-tearing.

Note that other transports probably need similar fixes.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-10-13 | tcp: annotate tp->snd_nxt lockless reads | Eric Dumazet | 1 | -1/+2
There are a few places where we fetch tp->snd_nxt while this field can change from IRQ or another CPU.

We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to avoid store-tearing.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-10-13 | tcp: annotate tp->write_seq lockless reads | Eric Dumazet | 1 | -1/+1
There are a few places where we fetch tp->write_seq while this field can change from IRQ or another CPU.

We need to add READ_ONCE() annotations, and also make sure write sides use corresponding WRITE_ONCE() to avoid store-tearing.

Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-10-13 | tcp: add rcu protection around tp->fastopen_rsk | Eric Dumazet | 1 | -3/+3
Both tcp_v4_err() and tcp_v6_err() do the following operations while they do not own the socket lock:

	fastopen = tp->fastopen_rsk;
	snd_una = fastopen ? tcp_rsk(fastopen)->snt_isn : tp->snd_una;

The problem is that without an appropriate barrier, the compiler might reload tp->fastopen_rsk and trigger a NULL deref.

Request sockets are protected by RCU, so we can simply add the missing annotations and barriers to solve the issue.

Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
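The reload hazard can be illustrated with a single annotated load whose result is both tested and dereferenced. This sketch uses a simplified `READ_ONCE()` stand-in rather than the RCU accessors the commit refers to, and the `_sketch` types and `err_snd_una_sketch()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: a volatile access forces exactly one load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct req_sketch {
	uint32_t snt_isn;
};

struct tp_sketch {
	struct req_sketch *fastopen_rsk; /* may be cleared concurrently */
	uint32_t snd_una;
};

static uint32_t err_snd_una_sketch(struct tp_sketch *tp)
{
	const struct req_sketch *fastopen;

	assert(tp != NULL);
	/*
	 * Load the pointer once; the NULL test and the dereference both use
	 * this snapshot, so the compiler cannot re-read the field in between.
	 */
	fastopen = READ_ONCE(tp->fastopen_rsk);
	return fastopen ? fastopen->snt_isn : tp->snd_una;
}
```

In the kernel the snapshot must additionally stay valid while it is used, which is the part RCU protection provides and this sketch does not model.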
2019-10-13 | Merge tag 'hwmon-for-v5.4-rc3' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Update/fix inspur-ipsps1 and k10temp Documentation - Fix nct7904 driver - Fix HWMON_P_MIN_ALARM mask in hwmon core * tag 'hwmon-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: docs: Extend inspur-ipsps1 title underline hwmon: (nct7904) Add array fan_alarm and vsen_alarm to store the alarms in nct7904_data struct. docs: hwmon: Include 'inspur-ipsps1.rst' into docs hwmon: Fix HWMON_P_MIN_ALARM mask hwmon: (k10temp) Update documentation and add temp2_input info hwmon: (nct7904) Fix the incorrect value of vsen_mask in nct7904_data struct
2019-10-12 | Merge tag 'tty-5.4-rc3' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty and serial driver fixes for 5.4-rc3 that resolve a number of reported issues and regressions. None of these are huge, full details are in the shortlog. There's also a MAINTAINERS update that I think you might have already taken in your tree already, but git should handle that merge easily. All have been in linux-next with no reported issues" * tag 'tty-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: MAINTAINERS: kgdb: Add myself as a reviewer for kgdb/kdb tty: serial: imx: Use platform_get_irq_optional() for optional IRQs serial: fix kernel-doc warning in comments serial: 8250_omap: Fix gpio check for auto RTS/CTS serial: mctrl_gpio: Check for NULL pointer tty: serial: fsl_lpuart: Fix lpuart_flush_buffer() tty: serial: Fix PORT_LINFLEXUART definition tty: n_hdlc: fix build on SPARC serial: uartps: Fix uartps_major handling serial: uartlite: fix exit path null pointer tty: serial: linflexuart: Fix magic SysRq handling serial: sh-sci: Use platform_get_irq_optional() for optional interrupts dt-bindings: serial: sh-sci: Document r8a774b1 bindings serial/sifive: select SERIAL_EARLYCON tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()' tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
2019-10-12 | Merge tag 'usb-5.4-rc3' of ↵ | Linus Torvalds | 1 | -0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are a lot of small USB driver fixes for 5.4-rc3. syzbot has stepped up its testing of the USB driver stack, now able to trigger fun race conditions between disconnect and probe functions. Because of that we have a lot of fixes in here from Johan and others fixing these reported issues that have been around since almost all time. We also are just deleting the rio500 driver, making all of the syzbot bugs found in it moot as it turns out no one has been using it for years as there is a userspace version that is being used instead. There are also a number of other small fixes in here, all resolving reported issues or regressions. All have been in linux-next without any reported issues" * tag 'usb-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (65 commits) USB: yurex: fix NULL-derefs on disconnect USB: iowarrior: use pr_err() USB: iowarrior: drop redundant iowarrior mutex USB: iowarrior: drop redundant disconnect mutex USB: iowarrior: fix use-after-free after driver unbind USB: iowarrior: fix use-after-free on release USB: iowarrior: fix use-after-free on disconnect USB: chaoskey: fix use-after-free on release USB: adutux: fix use-after-free on release USB: ldusb: fix NULL-derefs on driver unbind USB: legousbtower: fix use-after-free on release usb: cdns3: Fix for incorrect DMA mask. usb: cdns3: fix cdns3_core_init_role() usb: cdns3: gadget: Fix full-speed mode USB: usb-skeleton: drop redundant in-urb check USB: usb-skeleton: fix use-after-free after driver unbind USB: usb-skeleton: fix NULL-deref on disconnect usb:cdns3: Fix for CV CH9 running with g_zero driver. usb: dwc3: Remove dev_err() on platform_get_irq() failure usb: dwc3: Switch to platform_get_irq_byname_optional() ...
2019-10-12 | Merge branch 'efi-urgent-for-linus' of ↵ | Linus Torvalds | 1 | -4/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI fixes from Ingo Molnar: "Misc EFI fixes all across the map: CPER error report fixes, fixes to TPM event log parsing, fix for a kexec hang, a Sparse fix and other fixes" * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/tpm: Fix sanity check of unsigned tbl_size being less than zero efi/x86: Do not clean dummy variable in kexec path efi: Make unexported efi_rci2_sysfs_init() static efi/tpm: Only set 'efi_tpm_final_log_size' after successful event log parsing efi/tpm: Don't traverse an event log with no events efi/tpm: Don't access event->count when it isn't mapped efivar/ssdt: Don't iterate over EFI vars if no SSDT override was specified efi/cper: Fix endianness of PCIe class code
2019-10-12 | Merge branch 'x86-urgent-for-linus' of ↵ | Linus Torvalds | 1 | -1/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "A handful of fixes: a kexec linking fix, an AMD MWAITX fix, a vmware guest support fix when built under Clang, and new CPU model number definitions" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add Comet Lake to the Intel CPU models header lib/string: Make memzero_explicit() inline instead of external x86/cpu/vmware: Use the full form of INL in VMWARE_PORT x86/asm: Fix MWAITX C-state hint value
2019-10-11 | Merge tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs | Linus Torvalds | 1 | -0/+1
Pull NFS client bugfixes from Anna Schumaker: "Stable bugfixes: - Fix O_DIRECT accounting of number of bytes read/written # v4.1+ Other fixes: - Fix nfsi->nrequests count error on nfs_inode_remove_request() - Remove redundant mirror tracking in O_DIRECT - Fix leak of clp->cl_acceptor string - Fix race to sk_err after xs_error_report" * tag 'nfs-for-5.4-2' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: fix race to sk_err after xs_error_report NFSv4: Fix leak of clp->cl_acceptor string NFS: Remove redundant mirror tracking in O_DIRECT NFS: Fix O_DIRECT accounting of number of bytes read/written nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
2019-10-11 | Merge tag 'modules-for-v5.4-rc3' of ↵ | Linus Torvalds | 1 | -5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module fixes from Jessica Yu: "Code cleanups and kbuild/namespace related fixups from Masahiro. Most importantly, it fixes a namespace-related modpost issue for external module builds - Fix broken external module builds due to a modpost bug in read_dump(), where the namespace was not being strdup'd and sym->namespace would be set to bogus data. - Various namespace-related kbuild fixes and cleanups thanks to Masahiro Yamada" * tag 'modules-for-v5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: doc: move namespaces.rst from kbuild/ to core-api/ nsdeps: make generated patches independent of locale nsdeps: fix hashbang of scripts/nsdeps kbuild: fix build error of 'make nsdeps' in clean tree module: rename __kstrtab_ns_* to __kstrtabns_* to avoid symbol conflict modpost: fix broken sym->namespace for external module builds module: swap the order of symbol.namespace scripts: add_namespace: Fix coccicheck failed
2019-10-11 | compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use | Joe Perches | 1 | -0/+17
Reserve the pseudo keyword 'fallthrough' for the ability to convert the various case block /* fallthrough */ style comments to appear to be an actual reserved word with the same gcc case block missing fallthrough warning capability.

All switch/case blocks now should end in one of:

	break;
	fallthrough;
	goto <label>;
	return [expression];
	continue;

In C mode, GCC supports the __fallthrough__ attribute since 7.1, the same time the warning and the comment parsing were introduced.

fallthrough devolves to an empty "do {} while (0)" if the compiler version (any version less than gcc 7) does not support the attribute.

Signed-off-by: Joe Perches <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Suggested-by: Dan Carpenter <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
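A standalone sketch of the pseudo keyword as the message describes it: the attribute where the compiler supports it, an empty `do {} while (0)` otherwise. The `__has_attribute` fallback guard and the `steps_from()` function are assumptions added so the example builds outside the kernel.

```c
#include <assert.h>

/* Assumption: provide a fallback for compilers without __has_attribute. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute(__fallthrough__)
# define fallthrough __attribute__((__fallthrough__))
#else
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: count the stages that run when starting at `stage`. */
static int steps_from(int stage)
{
	int steps = 0;

	assert(stage >= 0); /* stages are non-negative by contract */
	switch (stage) {
	case 0:
		steps++;
		fallthrough;
	case 1:
		steps++;
		fallthrough;
	case 2:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Every case block ends in `fallthrough;` or `break;`, so a missing-fallthrough warning stays quiet for the intentional cases and still catches accidental ones.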
2019-10-10 | SUNRPC: fix race to sk_err after xs_error_report | Benjamin Coddington | 1 | -0/+1
Since commit 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context") there has been a race to the value of the sk_err if both XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set. In that case, we may end up losing the sk_err value that existed when xs_error_report was called. Fix this by reverting to the previous behavior: instead of using SO_ERROR to retrieve the value at a later time (which might also return sk_err_soft), copy the sk_err value onto struct sock_xprt, and use that value to wake pending tasks. Signed-off-by: Benjamin Coddington <[email protected]> Fixes: 4f8943f80883 ("SUNRPC: Replace direct task wakeups from softirq context") Signed-off-by: Anna Schumaker <[email protected]>
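The fix amounts to taking a snapshot of the error when it is reported and waking tasks with that snapshot later. A minimal sketch under stated assumptions: `struct xprt_sketch` and both functions are hypothetical, and storing the negated error is a choice of this sketch, not something the commit text specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport state. */
struct xprt_sketch {
	int sk_err;     /* live socket error; may change or be cleared later */
	int xprt_err;   /* snapshot taken when the error is reported */
	int wake_error; /* non-zero when pending tasks must be woken */
};

/* Error-report callback: record the error that exists right now. */
static void error_report_sketch(struct xprt_sketch *t)
{
	assert(t != NULL);
	if (t->sk_err == 0)
		return;
	t->xprt_err = -t->sk_err; /* assumption: keep it as a negative errno */
	t->wake_error = 1;
}

/* Deferred worker: consume the snapshot instead of re-reading sk_err. */
static int wake_error_sketch(struct xprt_sketch *t)
{
	int err;

	assert(t != NULL);
	if (!t->wake_error)
		return 0;
	t->wake_error = 0;
	err = t->xprt_err;
	t->xprt_err = 0;
	return err;
}
```

Even if the live `sk_err` is cleared between the report and the worker, the worker still sees the value that triggered the report.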
2019-10-10 | ASoC: SOF: acpi led support for switch controls | Jaska Uimonen | 1 | -0/+4
Currently sof doesn't support acpi leds with mute switches. So implement acpi leds following quite shamelessly existing HDA implementation by Takashi Iwai. Mute leds can be enabled in topology by adding led and direction token in switch control private data. Signed-off-by: Jaska Uimonen <[email protected]> Signed-off-by: Pierre-Louis Bossart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-10 | ASoC: SOF: imx: Describe ESAI parameters to be sent to DSP | Daniel Baluta | 4 | -3/+38
Introduce sof_ipc_dai_esai_params to keep information that we get from topology and we send to DSP FW. Also bump the ABI minor to reflect the changes on DSP FW. Signed-off-by: Daniel Baluta <[email protected]> Signed-off-by: Pierre-Louis Bossart <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-09 | net: silence KCSAN warnings about sk->sk_backlog.len reads | Eric Dumazet | 1 | -1/+2
sk->sk_backlog.len can be written by BH handlers, and read from process contexts in a lockless way. Note the write side should also use WRITE_ONCE() or a variant. We need some agreement about the best way to do this. syzbot reported : BUG: KCSAN: data-race in tcp_add_backlog / tcp_grow_window.isra.0 write to 0xffff88812665f32c of 4 bytes by interrupt on cpu 1: sk_add_backlog include/net/sock.h:934 [inline] tcp_add_backlog+0x4a0/0xcc0 net/ipv4/tcp_ipv4.c:1737 tcp_v4_rcv+0x1aba/0x1bf0 net/ipv4/tcp_ipv4.c:1925 ip_protocol_deliver_rcu+0x51/0x470 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5004 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5118 netif_receive_skb_internal+0x59/0x190 net/core/dev.c:5208 napi_skb_finish net/core/dev.c:5671 [inline] napi_gro_receive+0x28f/0x330 net/core/dev.c:5704 receive_buf+0x284/0x30b0 drivers/net/virtio_net.c:1061 virtnet_receive drivers/net/virtio_net.c:1323 [inline] virtnet_poll+0x436/0x7d0 drivers/net/virtio_net.c:1428 napi_poll net/core/dev.c:6352 [inline] net_rx_action+0x3ae/0xa50 net/core/dev.c:6418 read to 0xffff88812665f32c of 4 bytes by task 7292 on cpu 0: tcp_space include/net/tcp.h:1373 [inline] tcp_grow_window.isra.0+0x6b/0x480 net/ipv4/tcp_input.c:413 tcp_event_data_recv+0x68f/0x990 net/ipv4/tcp_input.c:717 tcp_rcv_established+0xbfe/0xf50 net/ipv4/tcp_input.c:5618 tcp_v4_do_rcv+0x381/0x4e0 net/ipv4/tcp_ipv4.c:1542 sk_backlog_rcv include/net/sock.h:945 [inline] __release_sock+0x135/0x1e0 net/core/sock.c:2427 release_sock+0x61/0x160 net/core/sock.c:2943 
tcp_recvmsg+0x63b/0x1a30 net/ipv4/tcp.c:2181 inet_recvmsg+0xbb/0x250 net/ipv4/af_inet.c:838 sock_recvmsg_nosec net/socket.c:871 [inline] sock_recvmsg net/socket.c:889 [inline] sock_recvmsg+0x92/0xb0 net/socket.c:885 sock_read_iter+0x15f/0x1e0 net/socket.c:967 call_read_iter include/linux/fs.h:1864 [inline] new_sync_read+0x389/0x4f0 fs/read_write.c:414 __vfs_read+0xb1/0xc0 fs/read_write.c:427 vfs_read fs/read_write.c:461 [inline] vfs_read+0x143/0x2c0 fs/read_write.c:446 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 7292 Comm: syz-fuzzer Not tainted 5.3.0+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-10-09 | net: annotate sk->sk_rcvlowat lockless reads | Eric Dumazet | 1 | -1/+3
sock_rcvlowat() or int_sk_rcvlowat() might be called without the socket lock for example from tcp_poll(). Use READ_ONCE() to document the fact that other cpus might change sk->sk_rcvlowat under us and avoid KCSAN splats. Use WRITE_ONCE() on write sides too. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-10-09 | tcp: annotate lockless access to tcp_memory_pressure | Eric Dumazet | 1 | -1/+1
tcp_memory_pressure is read without holding any lock, and its value could be changed on other cpus. Use READ_ONCE() to annotate these lockless reads. The write side is already using atomic ops. Fixes: b8da51ebb1aa ("tcp: introduce tcp_under_memory_pressure()") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-10-09 | net: add {READ|WRITE}_ONCE() annotations on ->rskq_accept_head | Eric Dumazet | 1 | -2/+2
reqsk_queue_empty() is called from inet_csk_listen_poll() while other cpus might write ->rskq_accept_head value. Use {READ|WRITE}_ONCE() to avoid compiler tricks and potential KCSAN splats. Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-10-09 | sctp: add chunks to sk_backlog when the newsk sk_socket is not set | Xin Long | 1 | -0/+5
This patch is to fix a NULL-ptr deref in selinux_socket_connect_helper: [...] kasan: GPF could be caused by NULL-ptr deref or user memory access [...] RIP: 0010:selinux_socket_connect_helper+0x94/0x460 [...] Call Trace: [...] selinux_sctp_bind_connect+0x16a/0x1d0 [...] security_sctp_bind_connect+0x58/0x90 [...] sctp_process_asconf+0xa52/0xfd0 [sctp] [...] sctp_sf_do_asconf+0x785/0x980 [sctp] [...] sctp_do_sm+0x175/0x5a0 [sctp] [...] sctp_assoc_bh_rcv+0x285/0x5b0 [sctp] [...] sctp_backlog_rcv+0x482/0x910 [sctp] [...] __release_sock+0x11e/0x310 [...] release_sock+0x4f/0x180 [...] sctp_accept+0x3f9/0x5a0 [sctp] [...] inet_accept+0xe7/0x720 It was caused by that the 'newsk' sk_socket was not set before going to security sctp hook when processing asconf chunk with SCTP_PARAM_ADD_IP or SCTP_PARAM_SET_PRIMARY: inet_accept()-> sctp_accept(): lock_sock(): lock listening 'sk' do_softirq(): sctp_rcv(): <-- [1] asconf chunk arrives and enqueued in 'sk' backlog sctp_sock_migrate(): set asoc's sk to 'newsk' release_sock(): sctp_backlog_rcv(): lock 'newsk' sctp_process_asconf() <-- [2] unlock 'newsk' sock_graft(): set sk_socket <-- [3] As it shows, at [1] the asconf chunk would be put into the listening 'sk' backlog, as accept() was holding its sock lock. Then at [2] asconf would get processed with 'newsk' as asoc's sk had been set to 'newsk'. However, 'newsk' sk_socket is not set until [3], while selinux_sctp_bind_connect() would deref it, then kernel crashed. Here to fix it by adding the chunk to sk_backlog until newsk sk_socket is set when .accept() is done. Note that sk->sk_socket can be NULL when the sock is closed, so SOCK_DEAD flag is also needed to check in sctp_newsk_ready(). Thanks to Ondrej for reviewing the code. 
Fixes: d452930fd3b9 ("selinux: Add SCTP support") Reported-by: Ying Xu <[email protected]> Suggested-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
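The readiness test described in this entry can be sketched as a small predicate: a new socket counts as ready once sk_socket is set, and a dead socket is treated as ready so its chunks are not queued forever. Only the `sctp_newsk_ready()` name comes from the message; the struct and the `_sketch` function below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two pieces of sock state involved. */
struct newsk_sketch {
	void *sk_socket; /* NULL until accept() grafts the socket */
	bool dead;       /* models the SOCK_DEAD flag */
};

/*
 * Chunks may be processed only when this returns true; otherwise they
 * are added to the backlog and handled after accept() completes.
 */
static inline bool newsk_ready_sketch(const struct newsk_sketch *sk)
{
	assert(sk != NULL);
	return sk->dead || sk->sk_socket != NULL;
}
```

Checking the dead flag first matters because sk_socket is also NULL after close, and that case must not be mistaken for "accept() has not finished yet".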
2019-10-09 | ASoC: simple_card_utils.h: Add missing include | Daniel Baluta | 1 | -0/+1
When debug is enabled compiler cannot find the definition of clk_get_rate resulting in the following error: ./include/sound/simple_card_utils.h:168:40: note: previous implicit declaration of ‘clk_get_rate’ was here dev_dbg(dev, "%s clk %luHz\n", name, clk_get_rate(dai->clk)); ./include/sound/simple_card_utils.h:168:3: note: in expansion of macro ‘dev_dbg’ dev_dbg(dev, "%s clk %luHz\n", name, clk_get_rate(dai->clk)); Fix this by including the appropriate header. Fixes: 0580dde59438686d ("ASoC: simple-card-utils: add asoc_simple_debug_info()") Signed-off-by: Daniel Baluta <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-09ASoC: simple_card_utils.h: Fix potential multiple redefinition errorDaniel Baluta1-4/+4
asoc_simple_debug_info and asoc_simple_debug_dai must be static; otherwise we might get a compilation error if the compiler decides not to inline the given function. Fixes: 0580dde59438686d ("ASoC: simple-card-utils: add asoc_simple_debug_info()") Signed-off-by: Daniel Baluta <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-08Merge tag 'mac80211-for-davem-2019-10-08' of ↵Jakub Kicinski1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A number of fixes: * allow scanning when operating on radar channels in ETSI regdomains * accept deauth frames in IBSS - we have code to parse and handle them, but were dropping them early * fix an allocation failure path in hwsim * fix a failure path memory leak in nl80211 FTM code * fix RCU handling & locking in multi-BSSID parsing * reject malformed SSID in mac80211 (this shouldn't really be able to happen, but defense in depth) * avoid userspace buffer overrun in ancient wext code if SSID was too long ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2019-10-08Merge tag 'led-fixes-for-5.4-rc3' of ↵Linus Torvalds1-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED fixes from Jacek Anaszewski: - fix a leftover from earlier stage of development in the documentation of recently added led_compose_name() and fix old mistake in the documentation of led_set_brightness_sync() parameter name. - MAINTAINERS: add pointer to Pavel Machek's linux-leds.git tree. Pavel is going to take over LED tree maintainership from myself. * tag 'led-fixes-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: Add my linux-leds branch to MAINTAINERS leds: core: Fix leds.h structure documentation
2019-10-08llc: fix sk_buff leak in llc_conn_service()Eric Biggers1-1/+1
syzbot reported: BUG: memory leak unreferenced object 0xffff88811eb3de00 (size 224): comm "syz-executor559", pid 7315, jiffies 4294943019 (age 10.300s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 a0 38 24 81 88 ff ff 00 c0 f2 15 81 88 ff ff ..8$............ backtrace: [<000000008d1c66a1>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline] [<000000008d1c66a1>] slab_post_alloc_hook mm/slab.h:439 [inline] [<000000008d1c66a1>] slab_alloc_node mm/slab.c:3269 [inline] [<000000008d1c66a1>] kmem_cache_alloc_node+0x153/0x2a0 mm/slab.c:3579 [<00000000447d9496>] __alloc_skb+0x6e/0x210 net/core/skbuff.c:198 [<000000000cdbf82f>] alloc_skb include/linux/skbuff.h:1058 [inline] [<000000000cdbf82f>] llc_alloc_frame+0x66/0x110 net/llc/llc_sap.c:54 [<000000002418b52e>] llc_conn_ac_send_sabme_cmd_p_set_x+0x2f/0x140 net/llc/llc_c_ac.c:777 [<000000001372ae17>] llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline] [<000000001372ae17>] llc_conn_service net/llc/llc_conn.c:400 [inline] [<000000001372ae17>] llc_conn_state_process+0x1ac/0x640 net/llc/llc_conn.c:75 [<00000000f27e53c1>] llc_establish_connection+0x110/0x170 net/llc/llc_if.c:109 [<00000000291b2ca0>] llc_ui_connect+0x10e/0x370 net/llc/af_llc.c:477 [<000000000f9c740b>] __sys_connect+0x11d/0x170 net/socket.c:1840 [...] The bug is that most callers of llc_conn_send_pdu() assume it consumes a reference to the skb, when actually due to commit b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value") it doesn't. Revert most of that commit, and instead make the few places that need llc_conn_send_pdu() to *not* consume a reference call skb_get() before. Fixes: b85ab56c3f81 ("llc: properly handle dev_queue_xmit() return value") Reported-by: [email protected] Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
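The ownership convention the message above settles on (the send function consumes a reference, and the few callers that still need the buffer take an extra one first) can be sketched with a toy reference count. This is an illustration only: `toy_buf` and every `toy_*` function are hypothetical names, not the LLC code, and no memory is actually freed here so that the count stays observable.

```c
/* Hypothetical reference-counted buffer standing in for struct sk_buff. */
struct toy_buf {
	int refs;
};

/* Take an extra reference, in the spirit of skb_get(). */
static struct toy_buf *toy_get(struct toy_buf *b)
{
	b->refs++;
	return b;
}

/* Drop one reference; a real buffer would be freed when this hits zero. */
static void toy_put(struct toy_buf *b)
{
	b->refs--;
}

/* Models a send function that always consumes the caller's reference. */
static void toy_send(struct toy_buf *b)
{
	/* ... hand the buffer to the lower layer ... */
	toy_put(b);
}

/* Common case: the caller gives its only reference away. */
static void toy_send_and_forget(struct toy_buf *b)
{
	toy_send(b);
}

/* Rare case: the caller still needs the buffer after sending. */
static void toy_send_and_keep(struct toy_buf *b)
{
	toy_get(b);   /* extra reference for toy_send() to consume */
	toy_send(b);  /* the caller's original reference survives */
}
```

If the send function did not consume the reference while callers assumed it did, the count in the common case would never reach zero, which is the kind of leak the report shows.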
2019-10-08leds: core: Fix leds.h structure documentationDan Murphy1-3/+2
Update the leds.h structure documentation to define the correct arguments. Signed-off-by: Dan Murphy <[email protected]> Signed-off-by: Jacek Anaszewski <[email protected]>
2019-10-08ASoC: soc-component: remove snd_pcm_ops from component driverKuninori Morimoto1-5/+0
No driver is using snd_pcm_ops on component driver. This patch removes it. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-08ASoC: pxa: remove snd_pcm_opsKuninori Morimoto1-2/+24
snd_pcm_ops is no longer needed. Let's use component driver callback. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-08ASoC: soc-core: add snd_soc_pcm_lib_ioctl()Kuninori Morimoto1-0/+5
Add snd_soc_pcm_lib_ioctl() as a pass-through to snd_pcm_lib_ioctl(). Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-08ASoC: soc-core: add new pcm_construct/pcm_destructKuninori Morimoto1-0/+6
The current snd_soc_component_driver has pcm_new/pcm_free, but they don't take "component" as a parameter. Thus, a callback can't know which component it is called for. Each callback currently gets "component" by calling snd_soc_rtdcom_lookup() with the driver name. This works today, but will not work in the future if we support multi CPU/Codec/Platform, because one rtd might have multiple components with the same driver name. To solve this issue, each callback needs to be called with the component. This patch adds new pcm_construct/pcm_destruct callbacks with a "component" parameter. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-10-08ASoC: soc-core: merge snd_pcm_ops member to component driverKuninori Morimoto1-0/+34
The current snd_soc_component_driver has snd_pcm_ops, and each driver can provide callbacks via it (1). But it was mainly created for ALSA, so its callbacks don't take "component" as a parameter for ALSA SoC (1)(2). Thus, a callback can't know which component it is called for, and each callback currently gets "component" by calling snd_soc_rtdcom_lookup() with the driver name (3). --- ALSA SoC --- ... if (component->driver->ops && component->driver->ops->open) (1) return component->driver->ops->open(substream); ... --- driver --- (2) static int xxx_open(struct snd_pcm_substream *substream) { struct snd_soc_pcm_runtime *rtd = substream->private_data; (3) struct snd_soc_component *component = snd_soc_rtdcom_lookup(..); ... } This works today, but will not work in the future if we support multi CPU/Codec/Platform, because one rtd might have multiple components with the same driver name. To solve this issue, each callback needs to be called with the component. We already have many component driver callbacks. This patch copies each snd_pcm_ops member into the component driver, with "component" as a parameter. --- ALSA SoC --- ... if (component->driver->open) => return component->driver->open(component, substream); ... --- driver --- => static int xxx_open(struct snd_soc_component *component, struct snd_pcm_substream *substream) { ... } *Note* Only Intel skl-pcm has a .get_time_info implementation, but the ALSA SoC framework doesn't call it so far. To keep that implementation, this patch keeps .get_time_info, but it is still not called. Intel developers need to support it in the future. Signed-off-by: Kuninori Morimoto <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
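The difference between looking a component up by driver name and receiving it as a parameter can be sketched outside the kernel. In this minimal sketch all `toy_*` types and functions are hypothetical stand-ins for the ALSA SoC structures; only the callback shape (component first, substream second) follows the message above.

```c
#include <stddef.h>

struct toy_substream { int id; };
struct toy_component;

/* Driver ops with the component passed explicitly to each callback. */
struct toy_component_driver {
	const char *name;
	int (*open)(struct toy_component *component,
		    struct toy_substream *substream);
};

/* Per-instance state; several instances may share one driver (and name). */
struct toy_component {
	const struct toy_component_driver *driver;
	int open_count;
};

/* The callback updates exactly the instance it was called for. */
static int xxx_open(struct toy_component *component,
		    struct toy_substream *substream)
{
	(void)substream;
	component->open_count++;
	return 0;
}

static const struct toy_component_driver xxx_driver = {
	.name = "xxx",
	.open = xxx_open,
};

/* Framework side: forward the call together with the component. */
static int toy_component_open(struct toy_component *component,
			      struct toy_substream *substream)
{
	if (component->driver->open)
		return component->driver->open(component, substream);
	return 0;
}
```

With two components bound to `xxx_driver`, a lookup by the name "xxx" could not tell them apart, whereas the explicit parameter always identifies the right instance.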
2019-10-08lib/string: Make memzero_explicit() inline instead of externalArvind Sankar1-1/+20
With the use of the barrier implied by barrier_data(), there is no need for memzero_explicit() to be extern. Making it inline saves the overhead of a function call, and allows the code to be reused in arch/*/purgatory without having to duplicate the implementation. Tested-by: Hans de Goede <[email protected]> Signed-off-by: Arvind Sankar <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Cc: Ard Biesheuvel <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: H . Peter Anvin <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephan Mueller <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: 906a4bb97f5d ("crypto: sha256 - Use get/put_unaligned_be32 to get input, memzero_explicit") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
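A userspace sketch of the idea in the message above: clear the buffer, then issue a compiler barrier that names the pointer so the compiler may not drop the memset() as a dead store. The `toy_` names are hypothetical, and the barrier macro assumes GCC or Clang inline-assembly syntax; it is modelled on the kernel's barrier_data() but is not copied from this patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compiler-only barrier that takes the pointer as an input and clobbers
 * memory, so stores to the pointed-to buffer must be kept.
 * Assumption: GCC/Clang extended asm is available.
 */
#define toy_barrier_data(ptr) \
	__asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Inline helper: zero a buffer in a way the optimizer may not elide. */
static inline void toy_memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	toy_barrier_data(s);
}
```

Because the helper is `static inline` and depends only on memset() and the barrier, it can live in a header and be reused by separately built code without duplicating an out-of-line implementation.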
2019-10-07Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-0/+33
Merge misc fixes from Andrew Morton: "The usual shower of hotfixes. Chris's memcg patches aren't actually fixes - they're mature but a few niggling review issues were late to arrive. The ocfs2 fixes are quite old - those took some time to get reviewer attention. Subsystems affected by this patch series: ocfs2, hotfixes, mm/memcg, mm/slab-generic" * emailed patches from Andrew Morton <[email protected]>: mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two) mm, sl[ou]b: improve memory accounting mm, memcg: make scan aggression always exclude protection mm, memcg: make memory.emin the baseline for utilisation determination mm, memcg: proportional memory.{low,min} reclaim mm/vmpressure.c: fix a signedness bug in vmpressure_register_event() mm/page_alloc.c: fix a crash in free_pages_prepare() mm/z3fold.c: claim page in the beginning of free kernel/sysctl.c: do not override max_threads provided by userspace memcg: only record foreign writebacks with dirty pages when memcg is not disabled mm: fix -Wmissing-prototypes warnings writeback: fix use-after-free in finish_writeback_work() mm/memremap: drop unused SECTION_SIZE and SECTION_MASK panic: ensure preemption is disabled during panic() fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() ocfs2: clear zero in unaligned direct IO
2019-10-07mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)Vlastimil Babka1-0/+4
In most configurations, kmalloc() happens to return naturally aligned (i.e. aligned to the block size itself) blocks for power of two sizes. That means some kmalloc() users might unknowingly rely on that alignment, until stuff breaks when the kernel is built with e.g. CONFIG_SLUB_DEBUG or CONFIG_SLOB, and blocks stop being aligned. Then developers have to devise workarounds such as their own kmem caches with specified alignment [1], which is not always practical, as recently evidenced in [2]. The topic has been discussed at LSF/MM 2019 [3]. Adding a 'kmalloc_aligned()' variant would not help with code unknowingly relying on the implicit alignment. For slab implementations it would either require creating more kmalloc caches, or allocating a larger size and only giving back part of it. That would be wasteful, especially with a generic alignment parameter (in contrast with a fixed alignment to size). Ideally we should provide to mm users what they need without difficult workarounds or own reimplementations, so let's make the kmalloc() alignment to size explicitly guaranteed for power-of-two sizes under all configurations. What does this mean for the three available allocators? * SLAB object layout happens to be mostly unchanged by the patch. The implicitly provided alignment could be compromised with CONFIG_DEBUG_SLAB due to redzoning, however SLAB disables redzoning for caches with alignment larger than unsigned long long. Practically on at least x86 this includes kmalloc caches as they use cache line alignment, which is larger than that. Still, this patch ensures alignment on all arches and cache sizes. * SLUB layout is also unchanged unless redzoning is enabled through CONFIG_SLUB_DEBUG and boot parameter for the particular kmalloc cache. With this patch, explicit alignment is guaranteed with redzoning as well. This will result in more memory being wasted, but that should be acceptable in a debugging scenario. 
* SLOB has no implicit alignment so this patch adds it explicitly for kmalloc(). The potential downside is increased fragmentation. While pathological allocation scenarios are certainly possible, in my testing, after booting an x86_64 kernel+userspace with virtme, around 16MB memory was consumed by slab pages both before and after the patch, with the difference in the noise. [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@c-s.fr/ [2] https://lore.kernel.org/linux-fsdevel/[email protected]/ [3] https://lwn.net/Articles/787740/ [[email protected]: documentation fixlet, per Matthew] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Christoph Hellwig <[email protected]> Cc: David Sterba <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ming Lei <[email protected]> Cc: Dave Chinner <[email protected]> Cc: "Darrick J . Wong" <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: James Bottomley <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
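"Naturally aligned" in the message above means that the address of a block is a multiple of its own power-of-two size. A small predicate makes that concrete; the helper names are hypothetical and this is a userspace sketch, not allocator code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True for 1, 2, 4, 8, ...; false for 0 and for any other value. */
static bool toy_is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * True when size is a power of two and the address is a multiple of size.
 * For a power-of-two size, "multiple of size" is the same as having all
 * low bits below the size bit clear.
 */
static bool toy_is_naturally_aligned(const void *p, size_t size)
{
	return toy_is_power_of_two(size) &&
	       ((uintptr_t)p & (size - 1)) == 0;
}
```

A caller that relies on the guarantee, for example for a 512-byte buffer handed to code that requires 512-byte alignment, could assert this predicate on the returned pointer in debug builds.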
2019-10-07mm, memcg: make scan aggression always exclude protectionChris Down1-13/+12
This patch is an incremental improvement on the existing memory.{low,min} relative reclaim work to base its scan pressure calculations on how much protection is available compared to the current usage, rather than how much the current usage is over some protection threshold. This change doesn't change the experience for the user in the normal case too much. One benefit is that it replaces the (somewhat arbitrary) 100% cutoff with an indefinite slope, which makes it easier to ballpark a memory.low value. As well as this, the old methodology doesn't quite apply generically to machines with varying amounts of physical memory. Let's say we have a top level cgroup, workload.slice, and another top level cgroup, system-management.slice. We want to roughly give 12G to system-management.slice, so on a 32GB machine we set memory.low to 20GB in workload.slice, and on a 64GB machine we set memory.low to 52GB. However, because these are relative amounts to the total machine size, while the amount of memory we want to generally be willing to yield to system-management.slice is absolute (12G), we end up putting more pressure on system-management.slice just because we have a larger machine and a larger workload to fill it, which seems fairly unintuitive. With this new behaviour, we don't end up with this unintended side effect. Previously, memory.low protection worked such that if you were 50% over a certain baseline, you got 50% of your normal scan pressure. This is certainly better than the previous cliff-edge behaviour, but it can be improved even further by always considering memory under the currently enforced protection threshold to be out of bounds. This means that we can set relatively low memory.low thresholds for variable or bursty workloads while still getting a reasonable level of protection, whereas with the previous version we may still trivially hit the 100% clamp. 
The previous 100% clamp is also somewhat arbitrary, whereas this one is more concretely based on the currently enforced protection threshold, which is likely easier to reason about. There is also a subtle issue with the way that proportional reclaim worked previously -- it promotes having no memory.low, since it makes pressure higher during low reclaim. This happens because we base our scan pressure modulation on how far memory.current is between memory.min and memory.low, but if memory.low is unset, we only use the overage method. In most cromulent configurations, this then means that we end up with *more* pressure than with no memory.low at all when we're in low reclaim, which is not really very usable or expected. With this patch, memory.low and memory.min affect reclaim pressure in a more understandable and composable way. For example, from a user standpoint, "protected" memory now remains untouchable from a reclaim aggression standpoint, and users can also have more confidence that bursty workloads will still receive some amount of guaranteed protection. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Chris Down <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Dennis Zhou <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
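One way to read "protected memory is out of bounds" is that scan pressure scales with the unprotected fraction of current usage. The function below is a sketch under that assumption only: the name `toy_scan_target`, its parameters and the exact formula are illustrative, and details such as a minimum scan floor are not stated in the message and are left out.

```c
#include <stdint.h>

/*
 * Assumed model: scale the LRU size by the share of usage that is not
 * covered by the enforced protection.
 *
 *   lruvec_size - pages that would be scanned with no protection
 *   usage       - current cgroup usage, in pages
 *   protection  - currently enforced memory.low/min protection, in pages
 */
static uint64_t toy_scan_target(uint64_t lruvec_size, uint64_t usage,
				uint64_t protection)
{
	if (protection == 0)
		return lruvec_size;      /* nothing protected: full pressure */

	if (usage < protection)
		usage = protection;      /* fully protected: zero pressure */

	return lruvec_size - lruvec_size * protection / usage;
}
```

Under this model a cgroup using twice its protection sees half the normal pressure and one using four times its protection sees three quarters, so there is a continuous slope rather than a clamp at some multiple of the threshold.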