aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-02-15Merge tag 'drm-misc-fixes-2019-02-13' of ↵Dave Airlie7-35/+9
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes drm-misc-fixes for v5.0: - Fix license inconsistency in vkms. Signed-off-by: Dave Airlie <[email protected]> From: Maarten Lankhorst <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-02-14dm thin: fix bug where bio that overwrites thin block ignores FUANikos Tsironis1-5/+50
When provisioning a new data block for a virtual block, either because the block was previously unallocated or because we are breaking sharing, if the whole block of data is being overwritten the bio that triggered the provisioning is issued immediately, skipping copying or zeroing of the data block. When this bio completes the new mapping is inserted in to the pool's metadata by process_prepared_mapping(), where the bio completion is signaled to the upper layers. This completion is signaled without first committing the metadata. If the bio in question has the REQ_FUA flag set and the system crashes right after its completion and before the next metadata commit, then the write is lost despite the REQ_FUA flag requiring that I/O completion for this request must only be signaled after the data has been committed to non-volatile storage. Fix this by deferring the completion of overwrite bios, with the REQ_FUA flag set, until after the metadata has been committed. Cc: [email protected] Signed-off-by: Nikos Tsironis <[email protected]> Acked-by: Joe Thornber <[email protected]> Acked-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2019-02-14Revert "exec: load_script: don't blindly truncate shebang string"Linus Torvalds1-7/+3
This reverts commit 8099b047ecc431518b9bb6bdbba3549bbecdc343. It turns out that people do actually depend on the shebang string being truncated, and on the fact that an interpreter (like perl) will often just re-interpret it entirely to get the full argument list. Reported-by: Samuel Dionne-Riel <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-14Revert "gfs2: read journal in large chunks to locate the head"Bob Peterson8-192/+134
This reverts commit 2a5f14f279f59143139bcd1606903f2f80a34241. This patch causes xfstests generic/311 to fail. Reverting this for now until we have a proper fix. Signed-off-by: Abhi Das <[email protected]> Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-14net: ethernet: freescale: set FEC ethtool regs versionVivien Didelot1-0/+4
Currently the ethtool_regs version is set to 0 for FEC devices. Use this field to store the register dump version exposed by the kernel. The choosen version 2 corresponds to the kernel compile test: #if defined(CONFIG_M523x) || defined(CONFIG_M527x) || defined(CONFIG_M528x) || defined(CONFIG_M520x) || defined(CONFIG_M532x) || defined(CONFIG_ARM) || defined(CONFIG_ARM64) || defined(CONFIG_COMPILE_TEST) and version 1 corresponds to the opposite. Binaries of ethtool unaware of this version will dump the whole set as usual. Signed-off-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-14net: hns: Fix object reference leaks in hns_dsaf_roce_reset()Huang Zijiang1-0/+2
The of_find_device_by_node() takes a reference to the underlying device structure, we should release that reference. Signed-off-by: Huang Zijiang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-14mm: page_alloc: fix ref bias in page_frag_alloc() for 1-byte allocsJann Horn1-4/+4
The basic idea behind ->pagecnt_bias is: If we pre-allocate the maximum number of references that we might need to create in the fastpath later, the bump-allocation fastpath only has to modify the non-atomic bias value that tracks the number of extra references we hold instead of the atomic refcount. The maximum number of allocations we can serve (under the assumption that no allocation is made with size 0) is nc->size, so that's the bias used. However, even when all memory in the allocation has been given away, a reference to the page is still held; and in the `offset < 0` slowpath, the page may be reused if everyone else has dropped their references. This means that the necessary number of references is actually `nc->size+1`. Luckily, from a quick grep, it looks like the only path that can call page_frag_alloc(fragsz=1) is TAP with the IFF_NAPI_FRAGS flag, which requires CAP_NET_ADMIN in the init namespace and is only intended to be used for kernel testing and fuzzing. To test for this issue, put a `WARN_ON(page_ref_count(page) == 0)` in the `offset < 0` path, below the virt_to_page() call, and then repeatedly call writev() on a TAP device with IFF_TAP|IFF_NO_PI|IFF_NAPI_FRAGS|IFF_NAPI, with a vector consisting of 15 elements containing 1 byte each. Signed-off-by: Jann Horn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-14Merge branch 'net-phy-fix-locking-issue'David S. Miller2-20/+8
Heiner Kallweit says: ==================== net: phy: fix locking issue Russell pointed out that the locking used in phy_is_started() isn't needed and misleading. This locking also contributes to a race fixed with patch 2. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-02-14net: phy: fix potential race in the phylib state machineHeiner Kallweit1-0/+2
Russell reported the following race in the phylib state machine (quoting from his mail): if (phy_polling_mode(phydev) && phy_is_started(phydev)) phy_queue_state_machine(phydev, PHY_STATE_TIME); state = PHY_UP thread 0 thread 1 phy_disconnect() +-phy_is_started() phy_is_started() | `-phy_stop() +-phydev->state = PHY_HALTED `-phy_stop_machine() `-cancel_delayed_work_sync() phy_queue_state_machine() `-mod_delayed_work() At this point, the phydev->state_queue() has been added back onto the system workqueue despite phy_stop_machine() having been called and cancel_delayed_work_sync() called on it. Fix this by protecting the complete operation in thread 0. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Reported-by: Russell King - ARM Linux admin <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-14net: phy: don't use locking in phy_is_startedHeiner Kallweit2-20/+6
Russell suggested to remove the locking from phy_is_started() because the read is atomic anyway and actually the locking may be more misleading. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Suggested-by: Russell King - ARM Linux admin <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-14selftests: fix timestamping MakefileDeepa Dinamani1-3/+0
The clean target in the makefile conflicts with the generic kselftests lib.mk, and fails to properly remove the compiled test programs. Remove the redundant rule, the TEST_GEN_FILES will be already removed by the CLEAN macro in lib.mk. Signed-off-by: Deepa Dinamani <[email protected]> Acked-by: Shuah Khan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13net: dsa: bcm_sf2: potential array overflow in bcm_sf2_sw_suspend()Dan Carpenter1-1/+1
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[] array is allocated in dsa_switch_alloc() and it has ds->num_ports elements so this leads to a static checker warning about a potential out of bounds read. Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13net: fix possible overflow in __sk_mem_raise_allocated()Eric Dumazet2-2/+2
With many active TCP sockets, fat TCP sockets could fool __sk_mem_raise_allocated() thanks to an overflow. They would increase their share of the memory, instead of decreasing it. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exitJohn David Anglin1-6/+22
The GPIO interrupt controller on the espressobin board only supports edge interrupts. If one enables the use of hardware interrupts in the device tree for the 88E6341, it is possible to miss an edge. When this happens, the INTn pin on the Marvell switch is stuck low and no further interrupts occur. I found after adding debug statements to mv88e6xxx_g1_irq_thread_work() that there is a race in handling device interrupts (e.g. PHY link interrupts). Some interrupts are directly cleared by reading the Global 1 status register. However, the device interrupt flag, for example, is not cleared until all the unmasked SERDES and PHY ports are serviced. This is done by reading the relevant SERDES and PHY status register. The code only services interrupts whose status bit is set at the time of reading its status register. If an interrupt event occurs after its status is read and before all interrupts are serviced, then this event will not be serviced and the INTn output pin will remain low. This is not a problem with polling or level interrupts since the handler will be called again to process the event. However, it's a big problem when using level interrupts. The fix presented here is to add a loop around the code servicing switch interrupts. If any pending interrupts remain after the current set has been handled, we loop and process the new set. If there are no pending interrupts after servicing, we are sure that INTn has gone high and we will get an edge when a new event occurs. Tested on espressobin board. Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.") Signed-off-by: John David Anglin <[email protected]> Tested-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13net: phy: fix interrupt handling in non-started statesHeiner Kallweit1-3/+0
phylib enables interrupts before phy_start() has been called, and if we receive an interrupt in a non-started state, the interrupt handler returns IRQ_NONE. This causes problems with at least one Marvell chip as reported by Andrew. Fix this by handling interrupts the same as in phy_mac_interrupt(), basically always running the phylib state machine. It knows when it has to do something and when not. This change allows to handle interrupts gracefully even if they occur in a non-started state. Fixes: 2b3e88ea6528 ("net: phy: improve phy state checking") Reported-by: Andrew Lunn <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13sctp: set stream ext to NULL after freeing it in sctp_stream_outq_migrateXin Long1-1/+3
In sctp_stream_init(), after sctp_stream_outq_migrate() freed the surplus streams' ext, but sctp_stream_alloc_out() returns -ENOMEM, stream->outcnt will not be set to 'outcnt'. With the bigger value on stream->outcnt, when closing the assoc and freeing its streams, the ext of those surplus streams will be freed again since those stream exts were not set to NULL after freeing in sctp_stream_outq_migrate(). Then the invalid-free issue reported by syzbot would be triggered. We fix it by simply setting them to NULL after freeing. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: [email protected] Signed-off-by: Xin Long <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13sctp: call gso_reset_checksum when computing checksum in sctp_gso_segmentXin Long1-0/+1
Jianlin reported a panic when running sctp gso over gre over vlan device: [ 84.772930] RIP: 0010:do_csum+0x6d/0x170 [ 84.790605] Call Trace: [ 84.791054] csum_partial+0xd/0x20 [ 84.791657] gre_gso_segment+0x2c3/0x390 [ 84.792364] inet_gso_segment+0x161/0x3e0 [ 84.793071] skb_mac_gso_segment+0xb8/0x120 [ 84.793846] __skb_gso_segment+0x7e/0x180 [ 84.794581] validate_xmit_skb+0x141/0x2e0 [ 84.795297] __dev_queue_xmit+0x258/0x8f0 [ 84.795949] ? eth_header+0x26/0xc0 [ 84.796581] ip_finish_output2+0x196/0x430 [ 84.797295] ? skb_gso_validate_network_len+0x11/0x80 [ 84.798183] ? ip_finish_output+0x169/0x270 [ 84.798875] ip_output+0x6c/0xe0 [ 84.799413] ? ip_append_data.part.50+0xc0/0xc0 [ 84.800145] iptunnel_xmit+0x144/0x1c0 [ 84.800814] ip_tunnel_xmit+0x62d/0x930 [ip_tunnel] [ 84.801699] gre_tap_xmit+0xac/0xf0 [ip_gre] [ 84.802395] dev_hard_start_xmit+0xa5/0x210 [ 84.803086] sch_direct_xmit+0x14f/0x340 [ 84.803733] __dev_queue_xmit+0x799/0x8f0 [ 84.804472] ip_finish_output2+0x2e0/0x430 [ 84.805255] ? skb_gso_validate_network_len+0x11/0x80 [ 84.806154] ip_output+0x6c/0xe0 [ 84.806721] ? ip_append_data.part.50+0xc0/0xc0 [ 84.807516] sctp_packet_transmit+0x716/0xa10 [sctp] [ 84.808337] sctp_outq_flush+0xd7/0x880 [sctp] It was caused by SKB_GSO_CB(skb)->csum_start not set in sctp_gso_segment. sctp_gso_segment() calls skb_segment() with 'feature | NETIF_F_HW_CSUM', which causes SKB_GSO_CB(skb)->csum_start not to be set in skb_segment(). For TCP/UDP, when feature supports HW_CSUM, CHECKSUM_PARTIAL will be set and gso_reset_checksum will be called to set SKB_GSO_CB(skb)->csum_start. So SCTP should do the same as TCP/UDP, to call gso_reset_checksum() when computing checksum in sctp_gso_segment. Reported-by: Jianlin Shi <[email protected]> Signed-off-by: Xin Long <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-8/+18
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Missing structure initialization in ebtables causes splat with 32-bit user level on a 64-bit kernel, from Francesco Ruggeri. 2) Missing dependency on nf_defrag in IPVS IPv6 codebase, from Andrea Claudi. 3) Fix possible use-after-free from release path of target extensions. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-02-13Merge tag 'mlx5-fixes-2019-02-13' of ↵David S. Miller9-16/+55
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-02-13 This series introduces some fixes to mlx5 driver. For more information please see tag log below. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-02-13net/mlx5e: XDP, fix redirect resources availability checkSaeed Mahameed4-4/+22
Currently mlx5 driver creates xdp redirect hw queues unconditionally on netdevice open, This is great until someone starts redirecting XDP traffic via ndo_xdp_xmit on mlx5 device and changes the device configuration at the same time, this might cause crashes, since the other device's napi is not aware of the mlx5 state change (resources un-availability). To fix this we must synchronize with other devices napi's on the system. Added a new flag under mlx5e_priv to determine XDP TX resources are available, set/clear it up when necessary and use synchronize_rcu() when the flag is turned off, so other napi's are in-sync with it, before we actually cleanup the hw resources. The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and it is sufficient to determine if it safe to transmit or not. The other two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become unnecessary. Thus, they are removed from data path. Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Reported-by: Toke Høiland-Jørgensen <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-02-13net/mlx5: Fix a compilation warning in events.cTariq Toukan1-8/+9
Eliminate the following compilation warning: drivers/net/ethernet/mellanox/mlx5/core/events.c: warning: 'error_str' may be used uninitialized in this function [-Wuninitialized]: => 238:3 Fixes: c2fb3db22d35 ("net/mlx5: Rework handling of port module events") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Mikhael Goikhman <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-02-13net/mlx5: No command allowed when command interface is not readyHuy Nguyen3-1/+20
When EEH is injected and PCI bus stalls, mlx5's pci error detect function is called to deactivate the command interface and tear down the device. The issue is that there can be a thread that already passed MLX5_DEVICE_STATE_INTERNAL_ERROR check, it will send the command and stuck in the wait_func. Solution: Add function mlx5_cmd_flush to disable command interface and clear all the pending commands. When device state is set to MLX5_DEVICE_STATE_INTERNAL_ERROR, call mlx5_cmd_flush to ensure all pending threads waiting for firmware commands completion are terminated. Fixes: c1d4d2e92ad6 ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Huy Nguyen <[email protected]> Reviewed-by: Daniel Jurgens <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-02-13net/mlx5e: Fix NULL pointer derefernce in set channels error flowMaria Pasechnik1-3/+4
New channels are applied to the priv channels only after they are successfully opened. Then, the indirection table should be built according to the new number of channels. Currently, such build is preformed independently of whether the channels opening is successful, and is not reverted on failure. The bug is caused due to removal of rss params from channels struct and moving it to priv struct. That change cause to independency between channels and rss params. This causes a crash on a later point, when accessing rqn of a non existing channel. This patch fixes it by moving the indirection table build right before switching the priv channels to new channels struct, after the new set of channels was successfully opened. Fixes: bbeb53b8b2c9 ("net/mlx5e: Move RSS params to a dedicated struct") Signed-off-by: Maria Pasechnik <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2019-02-13Merge tag 'trace-v5.0-rc4' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This fixes kprobes/uprobes dynamic processing of strings, where it processes the args but does not update the remaining length of the buffer that the string arguments will be placed in. It constantly passes in the total size of buffer used instead of passing in the remaining size of the buffer used. This could cause issues if the strings are larger than the max size of an event which could cause the strings to be written beyond what was reserved on the buffer" * tag 'trace-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: probeevent: Correctly update remaining space in dynamic area
2019-02-13netfilter: nft_compat: use-after-free when deleting targetsPablo Neira Ayuso1-1/+2
Fetch pointer to module before target object is released. Fixes: 29e3880109e3 ("netfilter: nf_tables: fix use-after-free when deleting compat expressions") Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-02-13drm/amdgpu/psp11: TA firmware is optional (v3)Alex Deucher2-14/+23
Don't warn or fail if it's missing. v2: handle xgmi case more gracefully. v3: handle older kernels properly Reviewed-by: Hawking Zhang <[email protected]> Tested-by: James Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-02-13Merge branch 'nvme-5.0' of git://git.infradead.org/nvme into for-linusJens Axboe1-3/+5
Pull single NVMe fix from Christoph * 'nvme-5.0' of git://git.infradead.org/nvme: nvme-pci: add missing unlock for reset error
2019-02-13signal: Restore the stop PTRACE_EVENT_EXITEric W. Biederman1-2/+5
In the middle of do_exit() there is there is a call "ptrace_event(PTRACE_EVENT_EXIT, code);" That call places the process in TACKED_TRACED aka "(TASK_WAKEKILL | __TASK_TRACED)" and waits for for the debugger to release the task or SIGKILL to be delivered. Skipping past dequeue_signal when we know a fatal signal has already been delivered resulted in SIGKILL remaining pending and TIF_SIGPENDING remaining set. This in turn caused the scheduler to not sleep in PTACE_EVENT_EXIT as it figured a fatal signal was pending. This also caused ptrace_freeze_traced in ptrace_check_attach to fail because it left a per thread SIGKILL pending which is what fatal_signal_pending tests for. This difference in signal state caused strace to report strace: Exit of unknown pid NNNNN ignored Therefore update the signal handling state like dequeue_signal would when removing a per thread SIGKILL, by removing SIGKILL from the per thread signal mask and clearing TIF_SIGPENDING. Acked-by: Oleg Nesterov <[email protected]> Reported-by: Oleg Nesterov <[email protected]> Reported-by: Ivan Delalande <[email protected]> Cc: [email protected] Fixes: 35634ffa1751 ("signal: Always notice exiting tasks") Signed-off-by: "Eric W. Biederman" <[email protected]>
2019-02-13mmc: meson-gx: fix interrupt nameMartin Blumenstingl1-1/+2
Commit bb364890323cca ("mmc: meson-gx: Free irq in release() callback") changed the _probe code to use request_threaded_irq() instead of devm_request_threaded_irq(). Unfortunately this removes a fallback for the interrupt name: devm_request_threaded_irq() uses the device name as fallback if the given IRQ name is NULL. request_threaded_irq() has no such fallback, thus /proc/interrupts shows "(null)" instead. Explicitly pass the dev_name() so we get the IRQ name shown in /proc/interrupts again. While here, also fix the indentation of the request_threaded_irq() parameter list. Fixes: bb364890323cca ("mmc: meson-gx: Free irq in release() callback") Signed-off-by: Martin Blumenstingl <[email protected]> Signed-off-by: Ulf Hansson <[email protected]>
2019-02-13Merge tag 'imx-drm-fixes-2019-02-12' of git://git.pengutronix.de/pza/linux ↵Dave Airlie4-14/+29
into drm-fixes drm/imx: plane, ldb, and ipu-v3 fixes - Fix CSI register offsets for i.MX51 and i.MX53. - Fix delayed page flip completion events on i.MX6QP due to unexpected behaviour of the PRE when issuing NOP buffer updates to the same buffer address. - Stop throwing errors for plane updates on disabled CRTCs when a userspace process is killed while a plane update is pending. - Add missing of_node_put cleanup in imx_ldb_bind. Signed-off-by: Dave Airlie <[email protected]> From: Philipp Zabel <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2019-02-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds7-28/+21
Merge fixes from Andrew Morton: "6 fixes" * emailed patches from Andrew Morton <[email protected]>: mm: proc: smaps_rollup: fix pss_locked calculation Rename include/{uapi => }/asm-generic/shmparam.h really Revert "mm: use early_pfn_to_nid in page_ext_init" mm/gup: fix gup_pmd_range() for dax Revert "mm: slowly shrink slabs with a relatively small number of objects" Revert "mm: don't reclaim inodes with many attached pages"
2019-02-12mm: proc: smaps_rollup: fix pss_locked calculationSandeep Patil1-8/+14
The 'pss_locked' field of smaps_rollup was being calculated incorrectly. It accumulated the current pss everytime a locked VMA was found. Fix that by adding to 'pss_locked' the same time as that of 'pss' if the vma being walked is locked. Link: http://lkml.kernel.org/r/[email protected] Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup") Signed-off-by: Sandeep Patil <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Joel Fernandes (Google) <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: <[email protected]> [4.14.x, 4.19.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12Rename include/{uapi => }/asm-generic/shmparam.h reallyMasahiro Yamada1-0/+0
Commit 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures") is different from the patch I submitted. My patch is this: https://lore.kernel.org/lkml/[email protected]/T/#u The file renaming part: rename include/{uapi => }/asm-generic/shmparam.h (100%) was lost when it was picked up. I think it was an accident because Andrew did not say anything. Link: http://lkml.kernel.org/r/[email protected] Fixes: 36c0f7f0f899 ("arch: unexport asm/shmparam.h for all architectures") Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12Revert "mm: use early_pfn_to_nid in page_ext_init"Qian Cai2-4/+3
This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init"). When booting a system with "page_owner=on", start_kernel page_ext_init invoke_init_callbacks init_section_page_ext init_page_owner init_early_allocated_pages init_zones_in_node init_pages_in_zone lookup_page_ext page_to_nid The issue here is that page_to_nid() will not work since some page flags have no node information until later in page_alloc_init_late() due to DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds access with an invalid nid. UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50 index 7 is out of range for type 'zone [5]' Also, kernel will panic since flags were poisoned earlier with, CONFIG_DEBUG_VM_PGFLAGS=y CONFIG_NODE_NOT_IN_PAGE_FLAGS=n start_kernel setup_arch pagetable_init paging_init sparse_init sparse_init_nid memblock_alloc_try_nid_raw It did not handle it well in init_pages_in_zone() which ends up calling page_to_nid(). page:ffffea0004200000 is uninitialized and poisoned raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) page_owner info is not active (free page?) kernel BUG at include/linux/mm.h:990! RIP: 0010:init_page_owner+0x486/0x520 This means that assumptions behind commit fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert the commit for now. A proper way to move the page_owner initialization to sooner is to hook into memmap initialization. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Pasha Tatashin <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Yang Shi <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12mm/gup: fix gup_pmd_range() for daxYu Zhao1-1/+2
For dax pmd, pmd_trans_huge() returns false but pmd_huge() returns true on x86. So the function works as long as hugetlb is configured. However, dax doesn't depend on hugetlb. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yu Zhao <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Cc: Huang Ying <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Keith Busch <[email protected]> Cc: "Michael S . Tsirkin" <[email protected]> Cc: John Hubbard <[email protected]> Cc: Wei Yang <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12Revert "mm: slowly shrink slabs with a relatively small number of objects"Dave Chinner1-10/+0
This reverts commit 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects"). This change changes the agressiveness of shrinker reclaim, causing small cache and low priority reclaim to greatly increase scanning pressure on small caches. As a result, light memory pressure has a disproportionate affect on small caches, and causes large caches to be reclaimed much faster than previously. As a result, it greatly perturbs the delicate balance of the VFS caches (dentry/inode vs file page cache) such that the inode/dentry caches are reclaimed much, much faster than the page cache and this drives us into several other caching imbalance related problems. As such, this is a bad change and needs to be reverted. [ Needs some massaging to retain the later seekless shrinker modifications.] Link: http://lkml.kernel.org/r/[email protected] Fixes: 172b06c32b9497 ("mm: slowly shrink slabs with a relatively small number of objects") Signed-off-by: Dave Chinner <[email protected]> Cc: Wolfgang Walter <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Spock <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12Revert "mm: don't reclaim inodes with many attached pages"Dave Chinner1-5/+2
This reverts commit a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages"). This change causes serious changes to page cache and inode cache behaviour and balance, resulting in major performance regressions when combining worklaods such as large file copies and kernel compiles. https://bugzilla.kernel.org/show_bug.cgi?id=202441 This change is a hack to work around the problems introduced by changing how agressive shrinkers are on small caches in commit 172b06c32b94 ("mm: slowly shrink slabs with a relatively small number of objects"). It creates more problems than it solves, wasn't adequately reviewed or tested, so it needs to be reverted. Link: http://lkml.kernel.org/r/[email protected] Fixes: a76cf1a474d7d ("mm: don't reclaim inodes with many attached pages") Signed-off-by: Dave Chinner <[email protected]> Cc: Wolfgang Walter <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Spock <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-02-12Merge tag 'hwmon-for-v5.0-rc7' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fix from Guenter Roeck: "Fix fan detection for NCT6793D" * tag 'hwmon-for-v5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (nct6775) Fix fan6 detection for NCT6793D
2019-02-12Merge branch 'md-fixes' of https://github.com/liu-song-6/linux into for-linusJens Axboe1-10/+18
Pull MD fix from Song * 'md-fixes' of https://github.com/liu-song-6/linux: md/raid1: don't clear bitmap bits on interrupted recovery.
2019-02-12md/raid1: don't clear bitmap bits on interrupted recovery.Nate Dailey1-10/+18
sync_request_write no longer submits writes to a Faulty device. This has the unfortunate side effect that bitmap bits can be incorrectly cleared if a recovery is interrupted (previously, end_sync_write would have prevented this). This means the next recovery may not copy everything it should, potentially corrupting data. Add a function for doing the proper md_bitmap_end_sync, called from end_sync_write and the Faulty case in sync_request_write. backport note to 4.14: s/md_bitmap_end_sync/bitmap_end_sync Cc: [email protected] 4.14+ Fixes: 0c9d5b127f69 ("md/raid1: avoid reusing a resync bio after error handling.") Reviewed-by: Jack Wang <[email protected]> Tested-by: Jack Wang <[email protected]> Signed-off-by: Nate Dailey <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-02-12Merge tag 'riscv-for-linus-5.0-rc7' of ↵Linus Torvalds3-10/+12
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V fixes from Palmer Dabbelt: "This contains a pair of bug fixes that I'd like to include in 5.0: - A fix to disambiguate swap from invalid PTEs, which fixes an error when trying to unmap PROT_NONE pages. - A revert to an optimization of the size of flat binaries. This is really a workaround to prevent breaking existing boot flows, but since the change was introduced as part of the 5.0 merge window I'd like to have the fix in before 5.0 so we can avoid a regression for any proper releases. With these I hope we're out of patches for 5.0 in RISC-V land" * tag 'riscv-for-linus-5.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: Revert "RISC-V: Make BSS section as the last section in vmlinux.lds.S" riscv: Add pte bit to distinguish swap from invalid
2019-02-12team: avoid complex list operations in team_nl_cmd_options_set()Cong Wang1-22/+5
The current opt_inst_list operations inside team_nl_cmd_options_set() is too complex to track: LIST_HEAD(opt_inst_list); nla_for_each_nested(...) { list_for_each_entry(opt_inst, &team->option_inst_list, list) { if (__team_option_inst_tmp_find(&opt_inst_list, opt_inst)) continue; list_add(&opt_inst->tmp_list, &opt_inst_list); } } team_nl_send_event_options_get(team, &opt_inst_list); as while we retrieve 'opt_inst' from team->option_inst_list, it could be added to the local 'opt_inst_list' for multiple times. The __team_option_inst_tmp_find() doesn't work, as the setter team_mode_option_set() still calls team->ops.exit() which uses ->tmp_list too in __team_options_change_check(). Simplify the list operations by moving the 'opt_inst_list' and team_nl_send_event_options_get() into the nla_for_each_nested() loop so that it can be guranteed that we won't insert a same list entry for multiple times. Therefore, __team_option_inst_tmp_find() can be removed too. Fixes: 4fb0534fb7bb ("team: avoid adding twice the same option to the event list") Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message") Reported-by: [email protected] Reported-by: [email protected] Cc: Jiri Pirko <[email protected]> Cc: Paolo Abeni <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Jiri Pirko <[email protected]> Reviewed-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12Merge branch 'net_sched-some-fixes-for-cls_tcindex'David S. Miller1-32/+48
Cong Wang says: ==================== net_sched: some fixes for cls_tcindex This patchset contains 3 bug fixes for tcindex filter. Please check each patch for details. v2: fix a compile error in patch 2 drop netns refcnt in patch 1 ==================== Signed-off-by: Cong Wang <[email protected]>
2019-02-12net_sched: fix two more memory leaks in cls_tcindexCong Wang1-9/+7
struct tcindex_filter_result contains two parts: struct tcf_exts and struct tcf_result. For the local variable 'cr', its exts part is never used but initialized without being released properly on success path. So just completely remove the exts part to fix this leak. For the local variable 'new_filter_result', it is never properly released if not used by 'r' on success path. Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12net_sched: fix a memory leak in cls_tcindexCong Wang1-16/+30
When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12net_sched: fix a race condition in tcindex_destroy()Cong Wang1-7/+11
tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12Merge branch 'ena-races'David S. Miller2-6/+6
Arthur Kiyanovski says: ==================== net: ena: race condition bug fix and version update This patchset includes a fix to a race condition that can cause kernel panic, as well as a driver version update because of this fix. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-02-12net: ena: update driver version from 2.0.2 to 2.0.3Arthur Kiyanovski1-1/+1
Update driver version due to bug fix. Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12net: ena: fix race between link up and device initalizationArthur Kiyanovski1-5/+5
Fix race condition between ena_update_on_link_change() and ena_restore_device(). This race can occur if link notification arrives while the driver is performing a reset sequence. In this case link can be set up, enabling the device, before it is fully restored. If packets are sent at this time, the driver might access uninitialized data structures, causing kernel crash. Move the clearing of ENA_FLAG_ONGOING_RESET and netif_carrier_on() after ena_up() to ensure the device is ready when link is set up. Fixes: d18e4f683445 ("net: ena: fix race condition between device reset and link up setup") Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-02-12net/packet: fix 4gb buffer limit due to overflow checkKal Conley1-1/+1
When calculating rb->frames_per_block * req->tp_block_nr the result can overflow. Check it for overflow without limiting the total buffer size to UINT_MAX. This change fixes support for packet ring buffers >= UINT_MAX. Fixes: 8f8d28e4d6d8 ("net/packet: fix overflow in check for tp_frame_nr") Signed-off-by: Kal Conley <[email protected]> Signed-off-by: David S. Miller <[email protected]>