|
Patch series "try to reduce fragmenting fallbacks", v3.
Last year, Johannes Weiner reported a regression in page mobility
grouping [1] and while the exact cause was not found, I've come up with
some ways to improve it by reducing the number of allocations falling
back to different migratetype and causing permanent fragmentation.
The series was tested with mmtests stress-highalloc modified to do
GFP_KERNEL order-4 allocations, on 4.9 with "mm, vmscan: fix zone
balance check in prepare_kswapd_sleep" (without that, kcompactd indeed
wasn't woken up) on UMA machine with 4GB memory. There were 5 repeats
of each run, as the extfrag stats are quite volatile (note the stats
below are sums, not averages, as it was less perl hacking for me).
Success rates are the same, already high due to the low allocation order
used, so I'm not including them.
Compaction stats:
(the patches are stacked, and I haven't measured the non-functional-changes
patches separately)
                               patch 1     patch 2     patch 3     patch 4     patch 7     patch 8
Compaction stalls                22449       24680       24846       19765       22059       17480
Compaction success               12971       14836       14608       10475       11632        8757
Compaction failures               9477        9843       10238        9290       10426        8722
Page migrate success           3109022     3370438     3312164     1695105     1608435     2111379
Page migrate failure            911588     1149065     1028264     1112675     1077251     1026367
Compaction pages isolated      7242983     8015530     7782467     4629063     4402787     5377665
Compaction migrate scanned   980838938   987367943   957690188   917647238   947155598  1018922197
Compaction free scanned      557926893   598946443   602236894   594024490   541169699   763651731
Compaction cost                  10243       10578       10304        8286        8398        9440
Compaction stats are mostly within noise until patch 4, which decreases
the number of compactions, and migrations. Part of that could be due to
more pageblocks marked as unmovable, and async compaction skipping
those. This changes a bit with patch 7, but not so much. Patch 8
increases free scanner stats and migrations, which comes from the
changed termination criteria. Interestingly number of compactions
decreases - probably the fully compacted pageblock satisfies multiple
subsequent allocations, so it amortizes.
Next comes the extfrag tracepoint, where "fragmenting" means that an
allocation had to fall back to a pageblock of another migratetype which
wasn't fully free (which is almost all of the fallbacks). I have
locally added another tracepoint for "Page steal" into
steal_suitable_fallback() which triggers in situations where we are
allowed to do move_freepages_block(). If we decide to also do
set_pageblock_migratetype(), it's "Pages steal with pageblock" with a
breakdown by the allocation migratetype we are stealing for and the
fallback migratetype we are stealing from. The last part "due to counting" comes from
patch 4 and counts the events where the counting of movable pages
allowed us to change pageblock's migratetype, while the number of free
pages alone wouldn't be enough to cross the threshold.
                                                          patch 1    patch 2    patch 3    patch 4    patch 7    patch 8
Page alloc extfrag event                                 10155066    8522968   10164959   15622080   13727068   13140319
Extfrag fragmenting                                      10149231    8517025   10159040   15616925   13721391   13134792
Extfrag fragmenting for unmovable                          159504     168500     184177      97835      70625      56948
Extfrag fragmenting unmovable placed with movable          153613     163549     172693      91740      64099      50917
Extfrag fragmenting unmovable placed with reclaim.           5891       4951      11484       6095       6526       6031
Extfrag fragmenting for reclaimable                          4738       4829       6345       4822       5640       5378
Extfrag fragmenting reclaimable placed with movable          1836       1902       1851       1579       1739       1760
Extfrag fragmenting reclaimable placed with unmov.           2902       2927       4494       3243       3901       3618
Extfrag fragmenting for movable                           9984989    8343696    9968518   15514268   13645126   13072466
Pages steal                                                179954     192291     210880     123254      94545      81486
Pages steal with pageblock                                  22153      18943      20154      33562      29969      33444
Pages steal with pageblock for unmovable                    14350      12858      13256      20660      19003      20852
Pages steal with pageblock for unmovable from mov.          12812      11402      11683      19072      17467      19298
Pages steal with pageblock for unmovable from recl.          1538       1456       1573       1588       1536       1554
Pages steal with pageblock for movable                       7114       5489       5965      11787      10012      11493
Pages steal with pageblock for movable from unmov.           6885       5291       5541      11179       9525      10885
Pages steal with pageblock for movable from recl.             229        198        424        608        487        608
Pages steal with pageblock for reclaimable                    689        596        933       1115        954       1099
Pages steal with pageblock for reclaimable from unmov.        273        219        537        658        547        667
Pages steal with pageblock for reclaimable from mov.          416        377        396        457        407        432
Pages steal with pageblock due to counting                                                 11834      10075       7530
... for unmovable                                                                           8993       7381       4616
... for movable                                                                             2792       2653       2851
... for reclaimable                                                                           49         41         63
What we can see is that "Extfrag fragmenting for unmovable" and "...
placed with movable" drop with almost every patch, which is good as we
are polluting fewer movable pageblocks with unmovable pages.
The most significant change is patch 4 with movable page counting. On
the other hand it increases "Extfrag fragmenting for movable" by 50%.
"Pages steal" drops though, so these movable allocation fallbacks find
only small free pages and are not allowed to steal whole pageblocks
back. "Pages steal with pageblock" rises, because the patch increases
the chances of pageblock migratetype changes happening. This affects
all migratetypes.
The summary is that patch 4 is not a clear win wrt these stats, but I
believe that the tradeoff it makes is a good one. There's less
pollution of movable pageblocks by unmovable allocations. There's less
stealing between pageblocks, and the steals that remain have a higher
chance of also changing the migratetype of the pageblock itself, so it
should more faithfully reflect the migratetype of the pages within the pageblock.
The increase of movable allocations falling back to unmovable pageblock
might look dramatic, but those allocations can be migrated by compaction
when needed, and other patches in the series (7-9) improve that aspect.
Patches 7 and 8 continue the trend of reduced unmovable fallbacks and
also reduce the impact on movable fallbacks from patch 4.
[1] https://www.spinics.net/lists/linux-mm/msg114237.html
This patch (of 8):
While currently there are (mostly by accident) no holes in struct
compact_control (on x86_64), we are going to add more bool flags, so
place them all together at the end of the structure. While at it, just
order all fields from largest to smallest.
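To illustrate why field order matters, here is a minimal user-space sketch. The two structs and their field names are hypothetical and are not the real struct compact_control; they only show how interleaving bool flags with wider fields creates padding holes that disappear when fields are ordered from largest to smallest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: bool flags interleaved with wider fields. */
struct cc_interleaved {
	bool flag_a;		/* followed by padding up to the next long */
	unsigned long count_a;
	bool flag_b;		/* another hole */
	unsigned long count_b;
	bool flag_c;
	int order;
};

/* Same fields, ordered largest to smallest, bools grouped at the end. */
struct cc_sorted {
	unsigned long count_a;
	unsigned long count_b;
	int order;
	bool flag_a;
	bool flag_b;
	bool flag_c;
};

/* Bytes saved by the sorted layout on the ABI this is compiled for. */
static size_t cc_bytes_saved(void)
{
	assert(sizeof(struct cc_sorted) <= sizeof(struct cc_interleaved));
	return sizeof(struct cc_interleaved) - sizeof(struct cc_sorted);
}
```

On a typical LP64 ABI such as x86_64 the interleaved variant would be expected to take 40 bytes and the sorted one 24, but the exact numbers depend on the compiler and target. In a kernel tree, a tool such as pahole can report holes in the real structure.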
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Instead of messing with the commit path which has been causing issues,
add a COMMIT op after the COPY and ask for stable copies in the first
place.
It saves a round trip, since after the COPY, the client sends a COMMIT
anyway.
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
As reported by David Jeffery: "a signal was sent to lockd while lockd
was shutting down from a request to stop nfs. The signal causes lockd
to call restart_grace() which puts the lockd_net structure on the grace
list. If this signal is received at the wrong time, it will occur after
lockd_down_net() has called locks_end_grace() but before
lockd_down_net() stops the lockd thread. This leads to lockd putting
the lockd_net structure back on the grace list, then exiting without
anything removing it from the list."
So, perform the final locks_end_grace() from the lockd thread; this
ensures it's serialized with respect to restart_grace().
Reported-by: David Jeffery <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
|
|
If an error is encountered in mdio_mux_init(), the error path will call
mdiobus_free(). Since mdiobus_register() has been called prior to
mdio_mux_init(), the bus->state will not be MDIOBUS_UNREGISTERED. This
causes a BUG_ON() in mdiobus_free(). To correct this issue, add an
error path for mdio_mux_init() which calls mdiobus_unregister() prior to
mdiobus_free().
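The ordering requirement can be modeled in user space. The sketch below is not the driver code: the mock_* names and the three-state enum are assumptions that only mirror the states named in the message, and the "free" function returns -1 where the kernel would hit BUG_ON().

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-ins for the bus states mentioned in the commit text. */
enum mock_bus_state { MOCK_ALLOCATED, MOCK_REGISTERED, MOCK_UNREGISTERED };

struct mock_bus {
	enum mock_bus_state state;
};

static struct mock_bus *mock_bus_alloc(void)
{
	struct mock_bus *bus = calloc(1, sizeof(*bus));

	if (bus)
		bus->state = MOCK_ALLOCATED;
	return bus;
}

static void mock_bus_register(struct mock_bus *bus)
{
	bus->state = MOCK_REGISTERED;
}

static void mock_bus_unregister(struct mock_bus *bus)
{
	assert(bus->state == MOCK_REGISTERED);
	bus->state = MOCK_UNREGISTERED;
}

/* Returns 0 on success, -1 if the bus is still registered
 * (the situation the commit describes as tripping BUG_ON()). */
static int mock_bus_free(struct mock_bus *bus)
{
	if (bus->state == MOCK_REGISTERED)
		return -1;
	free(bus);
	return 0;
}

/* Error path in the corrected order: unregister first, then free. */
static int mock_teardown(struct mock_bus *bus)
{
	mock_bus_unregister(bus);
	return mock_bus_free(bus);
}
```

Calling mock_bus_free() directly on a registered bus is refused, while mock_teardown() succeeds, which is the behavior the added error path relies on.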
Signed-off-by: Jon Mason <[email protected]>
Fixes: 98bc865a1ec8 ("net: mdio-mux: Add MDIO mux driver for iProc SoCs")
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The parisc architecture recently reimplemented the memcpy function and
their reimplementation crashed when source and destination overlapped.
The crash happened in the function ide_complete_cmd where memcpy is called
with the same source and destination pointer. According to the C
specification, memcpy behavior is undefined if the source and destination
ranges overlap. This patch fixes the undefined behavior.
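Two common ways to avoid this class of undefined behavior are sketched below. Both helper names are hypothetical, and the message does not say which approach the patch takes, so this is only an illustration: either skip the copy when the pointers are identical, or use memmove(), which is defined for overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes, tolerating the dst == src case.
 * Distinct pointers must still not partially overlap. */
static void *copy_unless_same(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	if (dst != src)
		memcpy(dst, src, n);
	return dst;
}

/* Alternative: memmove() handles any overlap, at a possible small cost. */
static void *copy_any_overlap(void *dst, const void *src, size_t n)
{
	assert(dst != NULL && src != NULL);
	return memmove(dst, src, n);
}
```

The first form keeps memcpy() on the fast path and only guards the identical-pointer case; the second is the safer choice when partial overlap is also possible.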
Signed-off-by: Mikulas Patocka <[email protected]>
Reviewed-by: Bartlomiej Zolnierkiewicz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use setup_timer() instead of init_timer() to simplify the code.
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When users set flow control using ethtool the bits are set properly in the
CPGMAC_SL MACCONTROL register, but the FIFO depth in the respective Port n
Maximum FIFO Blocks (Pn_MAX_BLKS) registers remains set to the minimum size
reset value. When receive flow control is enabled on a port, the port's
associated FIFO block allocation must be adjusted. The port RX allocation
must increase to accommodate the flow control runout. The TRM recommends
numbers of 5 or 6.
Hence, apply required Port FIFO configuration to
Pn_MAX_BLKS.Pn_TX_MAX_BLKS=0xF and Pn_MAX_BLKS.Pn_RX_MAX_BLKS=0x5 during
interface initialization.
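As a sketch of how the two field values could be combined into one register word: the field positions used here (RX in bits 3:0, TX in bits 7:4) and the macro names are assumptions for illustration and should be checked against the TRM. Only the values 0xF and 0x5 come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout, to be verified against the TRM. */
#define RX_MAX_BLKS_SHIFT	0u
#define TX_MAX_BLKS_SHIFT	4u
#define MAX_BLKS_FIELD_MASK	0xFu

/* Compose a Pn_MAX_BLKS value from the TX and RX block counts. */
static uint32_t pn_max_blks_value(uint32_t tx_blks, uint32_t rx_blks)
{
	assert(tx_blks <= MAX_BLKS_FIELD_MASK);
	assert(rx_blks <= MAX_BLKS_FIELD_MASK);
	return (tx_blks << TX_MAX_BLKS_SHIFT) | (rx_blks << RX_MAX_BLKS_SHIFT);
}
```

Under the assumed layout, pn_max_blks_value(0xF, 0x5) yields 0xF5.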
Cc: Schuyler Patton <[email protected]>
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
For each netns (except init_net), we initialize its null entry
in 3 places:
1) The template itself, as we use kmemdup()
2) Code around dst_init_metrics() in ip6_route_net_init()
3) ip6_route_dev_notify(), which is supposed to initialize it after
loopback registers
Unfortunately the last one still happens in the wrong order because
we expect to initialize net->ipv6.ip6_null_entry->rt6i_idev to
net->loopback_dev's idev, thus we have to do that after we add
idev to loopback. However, this notifier has priority == 0 same as
ipv6_dev_notf, and ipv6_dev_notf is registered after
ip6_route_dev_notifier so it is actually called after
ip6_route_dev_notifier. This is similar to commit 2f460933f58e
("ipv6: initialize route null entry in addrconf_init()") which
fixes init_net.
Fix it by picking a smaller priority for ip6_route_dev_notifier.
Also, we have to release the refcnt accordingly when unregistering
loopback_dev because device exit functions are called before subsys
exit functions.
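The ordering problem can be reproduced with a small user-space model of a priority-ordered notifier chain. It follows the rule that higher priority runs first and equal priorities keep registration order. The struct, the helper names, and the value -10 are illustrative assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

struct notifier {
	const char *name;
	int priority;
	struct notifier *next;
};

/* Insert so that higher priority is called first; entries with equal
 * priority stay in registration order. */
static void chain_register(struct notifier **head, struct notifier *n)
{
	assert(head != NULL && n != NULL);
	while (*head != NULL && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* 0-based call position of n in the chain, or -1 if not registered. */
static int chain_position(const struct notifier *head, const struct notifier *n)
{
	int pos = 0;

	for (; head != NULL; head = head->next, pos++)
		if (head == n)
			return pos;
	return -1;
}
```

With both notifiers at priority 0, the one registered first (the route notifier) is called first. Giving it a smaller priority moves it behind ipv6_dev_notf regardless of registration order, which is the effect the fix is after.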
Acked-by: David Ahern <[email protected]>
Tested-by: David Ahern <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A couple more fixes:
* don't try to authenticate during reconfiguration, which causes
drivers to get confused
* fix a kernel-doc warning for a recently merged change
* fix MU-MIMO group configuration (relevant only for monitor mode)
* more rate flags fix: remove stray RX_ENC_FLAG_40MHZ
* fix IBSS probe response allocation size
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The zero padding that is added to NTBs does
not zero the memory correctly.
This is because the skb_put modifies the value
of skb_out->len which results in the memset
command not setting any memory to zero as
(ctx->tx_max - skb_out->len) == 0.
I have resolved this by storing the size of
the memory to be zeroed before the skb_put
and using this in the memset call.
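The bug pattern and the fix can be shown with a mock buffer. struct mock_skb and mock_put() are hypothetical stand-ins for the sk_buff and skb_put(); the point is only that the padding size has to be captured before the length is advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct mock_skb {
	unsigned char data[64];
	size_t len;		/* bytes currently in use */
};

/* Like skb_put(): extend the used area by n bytes and return a
 * pointer to the start of the newly added area. */
static unsigned char *mock_put(struct mock_skb *skb, size_t n)
{
	unsigned char *tail;

	assert(skb->len + n <= sizeof(skb->data));
	tail = skb->data + skb->len;
	skb->len += n;
	return tail;
}

/* Zero-pad the buffer up to tx_max bytes. */
static void pad_to(struct mock_skb *skb, size_t tx_max)
{
	size_t padding;

	assert(tx_max >= skb->len);
	/* Capture the size first: after mock_put(), tx_max - skb->len is 0. */
	padding = tx_max - skb->len;
	memset(mock_put(skb, padding), 0, padding);
}
```

Computing tx_max - skb->len after the put would pass 0 to memset(), leaving the padding bytes uninitialized, which matches the failure described above.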
Signed-off-by: Jim Baxter <[email protected]>
Reviewed-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull KVM updates from Paolo Bonzini:
"ARM:
- HYP mode stub supports kexec/kdump on 32-bit
- improved PMU support
- virtual interrupt controller performance improvements
- support for userspace virtual interrupt controller (slower, but
necessary for KVM on the weird Broadcom SoCs used by the Raspberry
Pi 3)
MIPS:
- basic support for hardware virtualization (ImgTec P5600/P6600/I6400
and Cavium Octeon III)
PPC:
- in-kernel acceleration for VFIO
s390:
- support for guests without storage keys
- adapter interruption suppression
x86:
- usual range of nVMX improvements, notably nested EPT support for
accessed and dirty bits
- emulation of CPL3 CPUID faulting
generic:
- first part of VCPU thread request API
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
kvm: nVMX: Don't validate disabled secondary controls
KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
Revert "KVM: Support vCPU-based gfn->hva cache"
tools/kvm: fix top level makefile
KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
KVM: Documentation: remove VM mmap documentation
kvm: nVMX: Remove superfluous VMX instruction fault checks
KVM: x86: fix emulation of RSM and IRET instructions
KVM: mark requests that need synchronization
KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
KVM: add explicit barrier to kvm_vcpu_kick
KVM: perform a wake_up in kvm_make_all_cpus_request
KVM: mark requests that do not need a wakeup
KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
KVM: x86: always use kvm_make_request instead of set_bit
KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
s390: kvm: Cpu model support for msa6, msa7 and msa8
KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
kvm: better MWAIT emulation for guests
KVM: x86: virtualize cpuid faulting
...
|
|
Pull ARM updates from Russell King:
"Lots of little things this time:
- allow modules to be autoloaded according to the HWCAP feature bits
(used primarily for crypto modules)
- split module core and init PLT sections, since the core code and
init code could be placed far apart, and the PLT sections need to
be local to the code block.
- three patches from Chris Brandt to allow Cortex-A9 L2 cache
optimisations to be disabled where a SoC didn't wire up the out of
band signals.
- NoMMU compliance fixes, avoiding corruption of vector table which
is not being used at this point, and avoiding possible register
state corruption when switching mode.
- fixmap memory attribute compliance update.
- remove unnecessary locking from update_sections_early()
- ftrace fix for DEBUG_RODATA with !FRAME_POINTER"
* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 8672/1: mm: remove tasklist locking from update_sections_early()
ARM: 8671/1: V7M: Preserve registers across switch from Thread to Handler mode
ARM: 8670/1: V7M: Do not corrupt vector table around v7m_invalidate_l1 call
ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap
ARM: 8663/1: wire up HWCAP/HWCAP2 feature bits to the CPU modalias
ARM: 8666/1: mm: dump: Add domain to output
ARM: 8662/1: module: split core and init PLT sections
ARM: 8661/1: dts: r7s72100: add l2 cache
ARM: 8660/1: shmobile: r7s72100: Enable L2 cache
ARM: 8659/1: l2c: allow CA9 optimizations to be disabled
|
|
Pull Xtensa updates from Max Filippov:
- clearly mark references to spilled register locations with SPILL_SLOT
macros
- clean up xtensa ptrace: use generic tracehooks, move internal kernel
definitions from uapi/asm to asm, make locally-used functions static,
fix code style and alignment
- use command line parameters passed to ISS as kernel command line.
* tag 'xtensa-20170507' of git://github.com/jcmvbkbc/linux-xtensa:
xtensa: clean up access to spilled registers locations
xtensa: use generic tracehooks
xtensa: move internal ptrace definitions from uapi/asm to asm
xtensa: clean up xtensa/kernel/ptrace.c
xtensa: drop unused fast_io_protect function
xtensa: use ITLB_HIT_BIT instead of hardcoded number
xtensa: ISS: update kernel command line in platform_setup
xtensa: ISS: add argc/argv simcall definitions
xtensa: ISS: cleanup setup.c
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've focused on enhancing performance with regards to
block allocation, GC, and discard/in-place-update IO controls. There
are a bunch of clean-ups as well as minor bug fixes.
Enhancements:
- disable heap-based allocation by default
- issue small-sized discard commands by default
- change the policy of data hotness for logging
- distinguish IOs in terms of size and wbc type
- start SSR earlier to avoid foreground GC
- enhance data structures managing discard commands
- enhance in-place update flow
- add some more fault injection routines
- secure one more xattr entry
Bug fixes:
- calculate victim cost for GC correctly
- remain correct victim segment number for GC
- race condition in nid allocator and initializer
- stale pointer produced by atomic_writes
- fix missing REQ_SYNC for flush commands
- handle missing errors in more corner cases"
* tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits)
f2fs: fix a mount fail for wrong next_scan_nid
f2fs: enhance scalability of trace macro
f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS
f2fs: Make flush bios explicitely sync
f2fs: show available_nids in f2fs/status
f2fs: flush dirty nats periodically
f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard
f2fs: allow cpc->reason to indicate more than one reason
f2fs: release cp and dnode lock before IPU
f2fs: shrink size of struct discard_cmd
f2fs: don't hold cmd_lock during waiting discard command
f2fs: nullify fio->encrypted_page for each writes
f2fs: sanity check segment count
f2fs: introduce valid_ipu_blkaddr to clean up
f2fs: lookup extent cache first under IPU scenario
f2fs: reconstruct code to write a data page
f2fs: introduce __wait_discard_cmd
f2fs: introduce __issue_discard_cmd
f2fs: enable small discard by default
f2fs: delay awaking discard thread
...
|
|
Andy Shevchenko says:
====================
stmmac: pci: Fix crash on Intel Galileo Gen2
Due to misconfiguration of PCI driver for Intel Quark the user will get
a kernel crash:
udhcpc: started, v1.26.2
stmmaceth 0000:00:14.6 eth0: device MAC address 98:4f:ee:05:ac:47
Generic PHY stmmac-a6:01: attached PHY driver [Generic PHY] (mii_bus:phy_addr=stmmac-a6:01, irq=-1)
stmmaceth 0000:00:14.6 eth0: IEEE 1588-2008 Advanced Timestamp supported
stmmaceth 0000:00:14.6 eth0: registered PTP clock
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
udhcpc: sending discover
stmmaceth 0000:00:14.6 eth0: Link is Up - 100Mbps/Full - flow control off
IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: stmmac_xmit+0xf1/0x1080
Fix this by adding necessary settings.
P.S. I split fix to three patches according to what each of them adds.
====================
Tested-by: Jan Kiszka <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a new helper to prevent the kind of misconfiguration that happened
on one of the platforms when the configuration data was expanded.
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Joao Pinto <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit abe80fdc6ee6
("net: stmmac: RX queue routing configuration")
missed Intel Quark configuration. Append it here.
Fixes: abe80fdc6ee6 ("net: stmmac: RX queue routing configuration")
Cc: Joao Pinto <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Joao Pinto <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit a8f5102af2a7
("net: stmmac: TX and RX queue priority configuration")
missed Intel Quark configuration. Append it here.
Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration")
Cc: Joao Pinto <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Joao Pinto <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit 26d6851fd24e
("net: stmmac: set default number of rx and tx queues in stmmac_pci")
missed Intel Quark configuration. Append it here.
Fixes: 26d6851fd24e ("net: stmmac: set default number of rx and tx queues in stmmac_pci")
Cc: Joao Pinto <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Acked-by: Joao Pinto <[email protected]>
Acked-by: Giuseppe Cavallaro <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The patch fixes two things at once:
1) It checks the env->allow_ptr_leaks and only prints the map address to
the log if we have the privileges to do so, otherwise it just dumps 0
as we would when kptr_restrict is enabled on %pK. Given the latter is
off by default and not every distro sets it, I don't want to rely on
this, hence the 0 by default for unprivileged.
2) Printing of ldimm64 in the verifier log is currently broken in that
we don't print the full immediate, but only the 32 bit part of the
first insn part for ldimm64. Thus, fix this up as well; it's okay to
access, since we verified all ldimm64 earlier already (including just
constants) through replace_map_fd_with_map_ptr().
Fixes: 1be7f75d1668 ("bpf: enable non-root eBPF programs")
Fixes: cbd357008604 ("bpf: verifier (add ability to receive verification log)")
Reported-by: Jann Horn <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use memdup_user() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use memdup_user() helper instead of open-coding to simplify the code.
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Recent Chelsio firmware started using a few port capability bits to
manage FEC, and as the driver was not aware of the FEC changes, those bits
were zeroed, consequently disabling FEC.
Avoid zeroing those bits and default to whatever the firmware
tells us the Link is currently advertising.
Signed-off-by: Ganesh Goudar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If 'devm_kzalloc' fails, a NULL pointer will be dereferenced.
Return -ENOMEM instead, as done for some other memory allocation just a
few lines above.
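The pattern is the usual "check the allocation, return -ENOMEM" idiom. Here is a user-space sketch with calloc() standing in for devm_kzalloc(); the struct, the function names, and the injectable allocator are assumptions made so the failure path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct mock_priv {
	int port_count;
};

/* Allocate and initialize private data; alloc is passed in so that a
 * failing allocator can be substituted. */
static int mock_setup(void *(*alloc)(size_t, size_t), struct mock_priv **out)
{
	struct mock_priv *priv;

	assert(alloc != NULL && out != NULL);
	priv = alloc(1, sizeof(*priv));
	if (!priv)
		return -ENOMEM;	/* bail out instead of dereferencing NULL */
	priv->port_count = 4;	/* arbitrary illustrative value */
	*out = priv;
	return 0;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```

mock_setup(failing_alloc, &p) returns -ENOMEM and leaves p untouched, while mock_setup(calloc, &p) succeeds.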
Fixes: 98cd1552ea27 ("net: dsa: Mock-up driver")
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Acked-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Hangbin Liu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add a file under debugfs to allow easy access to the erase count for
each physical erase block on an UBI device. This is useful when
debugging data integrity issues with UBIFS on NAND flash devices.
Signed-off-by: Ben Shelton <[email protected]>
Signed-off-by: Zach Brown <[email protected]>
v2:
* If ubi_io_is_bad returns an error, eraseblk_count_seq_show just returns it.
* If ubi->lookuptbl returns null, it's no longer treated as an error;
instead, info for that block is not printed.
* Removed check for UBI_MAX_ERASECOUNTER since it is impossible to hit
* Removed block state from print, if a block is printed then it is good and
if it is not printed, then it is bad.
v3:
* Remove errant ! symbol from if statement checking if erase count is valid.
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Change 'convert' to 'converts'
Change 'UBIFS' to 'UBIFS inode flags'
Signed-off-by: Rock Lee <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Assigning a value of a variable to itself is not useful.
Signed-off-by: Stefan Agner <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
The check for the bad node type of sb->type is checking sa->type
and not sb->type. This looks like a cut and paste error. Fix this.
Detected by PVS-Studio, warning: V581
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Booting with UBI fastmap and SLUB debugging enabled results in the
following splats. The problem is that ubi_scan_fastmap() moves the
fastmap blocks from the scan_ai (allocated in scan_fast()) to the ai
allocated in ubi_attach(). This results in two problems:
- When the scan_ai is freed, aebs which were allocated from its slab
cache are still in use.
- When the other ai is being destroyed in destroy_ai(), the
arguments to kmem_cache_free() call are incorrect since aebs on its
->fastmap list were allocated with a slab cache from a different ai.
Fix this by making a copy of the aebs in ubi_scan_fastmap() instead of
moving them.
=============================================================================
BUG ubi_aeb_slab_cache (Not tainted): Objects remaining in ubi_aeb_slab_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------
INFO: Slab 0xbfd2da3c objects=17 used=1 fp=0xb33d7748 flags=0x40000080
CPU: 1 PID: 118 Comm: ubiattach Tainted: G B 4.9.15 #3
[<80111910>] (unwind_backtrace) from [<8010d498>] (show_stack+0x18/0x1c)
[<8010d498>] (show_stack) from [<804a3274>] (dump_stack+0xb4/0xe0)
[<804a3274>] (dump_stack) from [<8026c47c>] (slab_err+0x78/0x88)
[<8026c47c>] (slab_err) from [<802735bc>] (__kmem_cache_shutdown+0x180/0x3e0)
[<802735bc>] (__kmem_cache_shutdown) from [<8024e13c>] (shutdown_cache+0x1c/0x60)
[<8024e13c>] (shutdown_cache) from [<8024ed64>] (kmem_cache_destroy+0x19c/0x20c)
[<8024ed64>] (kmem_cache_destroy) from [<8057cc14>] (destroy_ai+0x1dc/0x1e8)
[<8057cc14>] (destroy_ai) from [<8057f04c>] (ubi_attach+0x3f4/0x450)
[<8057f04c>] (ubi_attach) from [<8056fe70>] (ubi_attach_mtd_dev+0x60c/0xff8)
[<8056fe70>] (ubi_attach_mtd_dev) from [<80571d78>] (ctrl_cdev_ioctl+0x110/0x2b8)
[<80571d78>] (ctrl_cdev_ioctl) from [<8029c77c>] (do_vfs_ioctl+0xac/0xa00)
[<8029c77c>] (do_vfs_ioctl) from [<8029d10c>] (SyS_ioctl+0x3c/0x64)
[<8029d10c>] (SyS_ioctl) from [<80108860>] (ret_fast_syscall+0x0/0x1c)
INFO: Object 0xb33d7e88 @offset=3720
INFO: Allocated in scan_peb+0x608/0x81c age=72 cpu=1 pid=118
kmem_cache_alloc+0x3b0/0x43c
scan_peb+0x608/0x81c
ubi_attach+0x124/0x450
ubi_attach_mtd_dev+0x60c/0xff8
ctrl_cdev_ioctl+0x110/0x2b8
do_vfs_ioctl+0xac/0xa00
SyS_ioctl+0x3c/0x64
ret_fast_syscall+0x0/0x1c
kmem_cache_destroy ubi_aeb_slab_cache: Slab cache still has objects
CPU: 1 PID: 118 Comm: ubiattach Tainted: G B 4.9.15 #3
[<80111910>] (unwind_backtrace) from [<8010d498>] (show_stack+0x18/0x1c)
[<8010d498>] (show_stack) from [<804a3274>] (dump_stack+0xb4/0xe0)
[<804a3274>] (dump_stack) from [<8024ed80>] (kmem_cache_destroy+0x1b8/0x20c)
[<8024ed80>] (kmem_cache_destroy) from [<8057cc14>] (destroy_ai+0x1dc/0x1e8)
[<8057cc14>] (destroy_ai) from [<8057f04c>] (ubi_attach+0x3f4/0x450)
[<8057f04c>] (ubi_attach) from [<8056fe70>] (ubi_attach_mtd_dev+0x60c/0xff8)
[<8056fe70>] (ubi_attach_mtd_dev) from [<80571d78>] (ctrl_cdev_ioctl+0x110/0x2b8)
[<80571d78>] (ctrl_cdev_ioctl) from [<8029c77c>] (do_vfs_ioctl+0xac/0xa00)
[<8029c77c>] (do_vfs_ioctl) from [<8029d10c>] (SyS_ioctl+0x3c/0x64)
[<8029d10c>] (SyS_ioctl) from [<80108860>] (ret_fast_syscall+0x0/0x1c)
cache_from_obj: Wrong slab cache. ubi_aeb_slab_cache but object is from ubi_aeb_slab_cache
------------[ cut here ]------------
WARNING: CPU: 1 PID: 118 at mm/slab.h:354 kmem_cache_free+0x39c/0x450
Modules linked in:
CPU: 1 PID: 118 Comm: ubiattach Tainted: G B 4.9.15 #3
[<80111910>] (unwind_backtrace) from [<8010d498>] (show_stack+0x18/0x1c)
[<8010d498>] (show_stack) from [<804a3274>] (dump_stack+0xb4/0xe0)
[<804a3274>] (dump_stack) from [<80120e40>] (__warn+0xf4/0x10c)
[<80120e40>] (__warn) from [<80120f20>] (warn_slowpath_null+0x28/0x30)
[<80120f20>] (warn_slowpath_null) from [<80271fe0>] (kmem_cache_free+0x39c/0x450)
[<80271fe0>] (kmem_cache_free) from [<8057cb88>] (destroy_ai+0x150/0x1e8)
[<8057cb88>] (destroy_ai) from [<8057ef1c>] (ubi_attach+0x2c4/0x450)
[<8057ef1c>] (ubi_attach) from [<8056fe70>] (ubi_attach_mtd_dev+0x60c/0xff8)
[<8056fe70>] (ubi_attach_mtd_dev) from [<80571d78>] (ctrl_cdev_ioctl+0x110/0x2b8)
[<80571d78>] (ctrl_cdev_ioctl) from [<8029c77c>] (do_vfs_ioctl+0xac/0xa00)
[<8029c77c>] (do_vfs_ioctl) from [<8029d10c>] (SyS_ioctl+0x3c/0x64)
[<8029d10c>] (SyS_ioctl) from [<80108860>] (ret_fast_syscall+0x0/0x1c)
---[ end trace 2bd8396277fd0a0b ]---
=============================================================================
BUG ubi_aeb_slab_cache (Tainted: G B W ): page slab pointer corrupt.
-----------------------------------------------------------------------------
INFO: Allocated in scan_peb+0x608/0x81c age=104 cpu=1 pid=118
kmem_cache_alloc+0x3b0/0x43c
scan_peb+0x608/0x81c
ubi_attach+0x124/0x450
ubi_attach_mtd_dev+0x60c/0xff8
ctrl_cdev_ioctl+0x110/0x2b8
do_vfs_ioctl+0xac/0xa00
SyS_ioctl+0x3c/0x64
ret_fast_syscall+0x0/0x1c
INFO: Slab 0xbfd2da3c objects=17 used=1 fp=0xb33d7748 flags=0x40000081
INFO: Object 0xb33d7e88 @offset=3720 fp=0xb33d7da0
Redzone b33d7e80: cc cc cc cc cc cc cc cc ........
Object b33d7e88: 02 00 00 00 01 00 00 00 00 f0 ff 7f ff ff ff ff ................
Object b33d7e98: 00 00 00 00 00 00 00 00 bd 16 00 00 00 00 00 00 ................
Object b33d7ea8: 00 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 ................
Redzone b33d7eb8: cc cc cc cc ....
Padding b33d7f60: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
CPU: 1 PID: 118 Comm: ubiattach Tainted: G B W 4.9.15 #3
[<80111910>] (unwind_backtrace) from [<8010d498>] (show_stack+0x18/0x1c)
[<8010d498>] (show_stack) from [<804a3274>] (dump_stack+0xb4/0xe0)
[<804a3274>] (dump_stack) from [<80271770>] (free_debug_processing+0x320/0x3c4)
[<80271770>] (free_debug_processing) from [<80271ad0>] (__slab_free+0x2bc/0x430)
[<80271ad0>] (__slab_free) from [<80272024>] (kmem_cache_free+0x3e0/0x450)
[<80272024>] (kmem_cache_free) from [<8057cb88>] (destroy_ai+0x150/0x1e8)
[<8057cb88>] (destroy_ai) from [<8057ef1c>] (ubi_attach+0x2c4/0x450)
[<8057ef1c>] (ubi_attach) from [<8056fe70>] (ubi_attach_mtd_dev+0x60c/0xff8)
[<8056fe70>] (ubi_attach_mtd_dev) from [<80571d78>] (ctrl_cdev_ioctl+0x110/0x2b8)
[<80571d78>] (ctrl_cdev_ioctl) from [<8029c77c>] (do_vfs_ioctl+0xac/0xa00)
[<8029c77c>] (do_vfs_ioctl) from [<8029d10c>] (SyS_ioctl+0x3c/0x64)
[<8029d10c>] (SyS_ioctl) from [<80108860>] (ret_fast_syscall+0x0/0x1c)
FIX ubi_aeb_slab_cache: Object at 0xb33d7e88 not freed
Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
When the write syscall is called, the security label is searched every time
to determine whether the file's privileges should be changed.
If no LSM (Linux Security Module) is used, this is useless.
So introduce CONFIG_UBIFS_SECURITY to disable security labels. Its default
value is "y".
Signed-off-by: Hyunchul Lee <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
Fix permissions to allow reading the mtd parameter back (only for owner).
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
WARNING: vmlinux.o(.text+0x1f2a80): Section mismatch in reference from the variable __param_ops_mtd to the function .init.text:ubi_mtd_param_parse()
The function __param_ops_mtd() references
the function __init ubi_mtd_param_parse().
This is often because __param_ops_mtd lacks a __init
annotation or the annotation of ubi_mtd_param_parse is wrong.
Cc: Richard Weinberger <[email protected]>
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
|
|
We have the number of longs, but we need to calculate the number of
bytes required.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Michael Chan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
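The longs-versus-bytes distinction above can be illustrated with a minimal userspace sketch. The macro and function names below (SKETCH_BITS_TO_LONGS, bitmap_bytes, bitmap_clear_all) are hypothetical stand-ins, not the driver's code; the point is only that a count of longs must be multiplied by sizeof(unsigned long) before it is used as a byte count.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins for the kernel's bitmap sizing helpers. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(nbits) \
	(((nbits) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Number of bytes needed to store a bitmap of nbits bits. */
size_t bitmap_bytes(size_t nbits)
{
	size_t nlongs = SKETCH_BITS_TO_LONGS(nbits);

	/* The bug class: using nlongs directly where bytes are expected. */
	return nlongs * sizeof(unsigned long);
}

/* Zero a whole bitmap; the length passed to memset() is in bytes. */
void bitmap_clear_all(unsigned long *map, size_t nbits)
{
	assert(map != NULL);
	memset(map, 0, bitmap_bytes(nbits));
}
```

Passing the number of longs to an allocator or to memset() would cover only a fraction of the bitmap, which is the kind of undersizing the message describes.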
|
|
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.
This was found with the future CONFIG_FORTIFY_SOURCE feature.
Cc: Daniel Micay <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
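A minimal userspace sketch of the difference described above, assuming a hypothetical fixed-size field of NAME_LEN bytes and a hypothetical helper fill_name(); neither name comes from the patched drivers.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16 /* hypothetical fixed-size destination field */

/* Copy a short string into a NAME_LEN-byte field, zero-padding the tail. */
void fill_name(char *dst, const char *src)
{
	assert(dst != NULL && src != NULL);
	/*
	 * memcpy(dst, src, NAME_LEN) would read NAME_LEN bytes from src even
	 * when src is shorter, copying whatever happens to follow it in
	 * memory. strncpy() stops reading at the terminating NUL and fills
	 * the remaining bytes of dst with zeros.
	 */
	strncpy(dst, src, NAME_LEN);
}
```

Note that strncpy() does not NUL-terminate the destination when the source is NAME_LEN bytes or longer, so this sketch only suits fields that are defined as fixed-width and zero-padded.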
|
|
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.
This was found with the future CONFIG_FORTIFY_SOURCE feature.
Cc: Daniel Micay <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Using memcpy() from a string that is shorter than the length copied means
the destination buffer is being filled with arbitrary data from the kernel
rodata segment. Instead, use strncpy() which will fill the trailing bytes
with zeros.
This was found with the future CONFIG_FORTIFY_SOURCE feature.
Cc: Daniel Micay <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt
Pull fscrypt updates from Ted Ts'o:
"Only bug fixes and cleanups for this merge window"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt:
fscrypt: correct collision claim for digested names
MAINTAINERS: fscrypt: update mailing list, patchwork, and git
ext4: clean up ext4_match() and callers
f2fs: switch to using fscrypt_match_name()
ext4: switch to using fscrypt_match_name()
fscrypt: introduce helper function for filename matching
fscrypt: avoid collisions when presenting long encrypted filenames
f2fs: check entire encrypted bigname when finding a dentry
ubifs: check for consistent encryption contexts in ubifs_lookup()
f2fs: sync f2fs_lookup() with ext4_lookup()
ext4: remove "nokey" check from ext4_lookup()
fscrypt: fix context consistency check when key(s) unavailable
fscrypt: Remove __packed from fscrypt_policy
fscrypt: Move key structure and constants to uapi
fscrypt: remove fscrypt_symlink_data_len()
fscrypt: remove unnecessary checks for NULL operations
|
|
Vlan devices, like all other software devices, enable the
NETIF_F_HW_CSUM feature. However, unlike all the other
software devices, vlans will switch to using IP|IPV6_CSUM
features if the underlying device uses them. In these situations,
checksum offload features on the vlan device can't be controlled
via ethtool.
This patch makes vlans keep HW_CSUM feature if the underlying
device supports checksum offloading. This makes vlan devices
behave like other software devices, and restores control to the
user.
A side-effect is that some offload settings (typically UFO)
may be enabled on the vlan device while being disabled on the HW.
However, the GSO code will correctly process the packets. This
actually results in slightly better raw throughput.
Signed-off-by: Vladislav Yasevich <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Congestion control modules that want full control over congestion
control behavior do not want the cwnd modifications controlled by
the sysctl_tcp_slow_start_after_idle code path.
So skip those code paths for CC modules that use the cong_control()
API.
As an example, those cwnd effects are not desired for the BBR congestion
control algorithm.
Fixes: c0402760f565 ("tcp: new CC hook to set sending rate with rate_sample in any CA state")
Signed-off-by: Wei Wang <[email protected]>
Signed-off-by: Yuchung Cheng <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
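The decision described above can be sketched in userspace as follows. This is an illustration under stated assumptions, not the TCP stack's code: struct cc_ops, example_cong_control() and idle_restart_applies() are hypothetical names, and the boolean parameter stands in for the sysctl.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a congestion control module's hooks. */
struct cc_ops {
	void (*cong_control)(void); /* non-NULL: module wants full control */
};

/* Example no-op hook, standing in for a module such as BBR. */
void example_cong_control(void)
{
}

/* Should the slow-start-after-idle cwnd adjustment run for this module? */
bool idle_restart_applies(const struct cc_ops *ops, bool slow_start_after_idle)
{
	assert(ops != NULL);
	if (!slow_start_after_idle)
		return false; /* feature disabled by the administrator */
	if (ops->cong_control)
		return false; /* module manages cwnd itself, so skip */
	return true;
}
```

A module that leaves cong_control unset keeps the existing idle behaviour; one that sets it opts out regardless of the sysctl value.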
|
|
IPv4 dst could use fi->fib_metrics to store metrics but fib_info
itself is refcnt'ed, so without taking a refcnt fi and
fi->fib_metrics could be freed while dst metrics still points to
it. This triggers use-after-free as reported by Andrey twice.
This patch reverts commit 2860583fe840 ("ipv4: Kill rt->fi") to
restore this reference counting. It is a quick fix for -net and
-stable; for -net-next, as Eric suggested, we can consider doing
reference counting for the metrics themselves instead of relying on fib_info.
IPv6 is very different, it copies or steals the metrics from mx6_config
in fib6_commit_metrics() so probably doesn't need a refcnt.
Decnet has already done the refcnt'ing, see dn_fib_semantic_match().
Fixes: 2860583fe840 ("ipv4: Kill rt->fi")
Reported-by: Andrey Konovalov <[email protected]>
Tested-by: Andrey Konovalov <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
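The lifetime problem described above can be sketched in userspace. The types and helpers here (struct info, struct route, info_put(), route_attach(), route_release()) are hypothetical stand-ins for fib_info and the dst entry, and the counter is a plain int rather than an atomic, so this shows the ownership rule only, not the kernel implementation.

```c
#include <assert.h>
#include <stdlib.h>

struct info {               /* hypothetical refcounted owner of the metrics */
	int refcnt;
	int metrics[4];
};

struct route {              /* hypothetical user that points into the owner */
	struct info *fi;    /* pinned owner, keeps metrics alive */
	int *metrics;       /* points at fi->metrics */
};

/* Allocate an owner holding one reference for the caller. */
struct info *info_alloc(void)
{
	struct info *fi = calloc(1, sizeof(*fi));

	if (fi)
		fi->refcnt = 1;
	return fi;
}

/* Drop one reference; returns 1 if the object was freed, 0 otherwise. */
int info_put(struct info *fi)
{
	assert(fi != NULL && fi->refcnt > 0);
	if (--fi->refcnt > 0)
		return 0;
	free(fi);
	return 1;
}

/* Point the route at the owner's metrics and take a reference for it. */
void route_attach(struct route *rt, struct info *fi)
{
	assert(rt != NULL && fi != NULL);
	fi->refcnt++;
	rt->fi = fi;
	rt->metrics = fi->metrics;
}

/* Drop the route's reference; returns 1 if this freed the owner. */
int route_release(struct route *rt)
{
	struct info *fi = rt->fi;

	rt->fi = NULL;
	rt->metrics = NULL;
	return info_put(fi);
}
```

Without the increment in route_attach(), the caller's info_put() would free the owner while rt->metrics still pointed into it, which is the use-after-free pattern the message describes.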
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
- add GETFSMAP support
- some performance improvements for very large file systems and for
random write workloads into a preallocated file
- bug fixes and cleanups.
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
jbd2: cleanup write flags handling from jbd2_write_superblock()
ext4: mark superblock writes synchronous for nobarrier mounts
ext4: inherit encryption xattr before other xattrs
ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio()
ext4: avoid unnecessary transaction stalls during writeback
ext4: preload block group descriptors
ext4: make ext4_shutdown() static
ext4: support GETFSMAP ioctls
vfs: add common GETFSMAP ioctl definitions
ext4: evict inline data when writing to memory map
ext4: remove ext4_xattr_check_entry()
ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
ext4: merge ext4_xattr_list() into ext4_listxattr()
ext4: constify static data that is never modified
ext4: trim return value and 'dir' argument from ext4_insert_dentry()
jbd2: fix dbench4 performance regression for 'nobarrier' mounts
jbd2: Fix lockdep splat with generic/270 test
mm: retry writepages() on ENOMEM when doing an data integrity writeback
|
|
For configurations that do not enable DAX filesystems or drivers, do not
require the DAX core to be built.
Given that the 'direct_access' method has been removed from
'block_device_operations', we can also go ahead and move the
block-related dax helper functions from fs/block_dev.c to
drivers/dax/super.c. This keeps dax details out of the block layer and
lets the DAX core be built as a module in the FS_DAX=n case.
Filesystems need to include dax.h to call bdev_dax_supported().
Cc: [email protected]
Cc: Jens Axboe <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Ross Zwisler <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
ERROR: "devm_create_dev_dax" [drivers/dax/dax_pmem.ko] undefined!
ERROR: "alloc_dax_region" [drivers/dax/dax_pmem.ko] undefined!
ERROR: "dax_region_put" [drivers/dax/dax_pmem.ko] undefined!
Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
|
|
Wrong sign of iov_iter_revert() argument. Unfortunately, it slipped through
testing, since most of the time we don't do anything to the iterator
afterwards and the potential oops on walking the iter->iov too far backwards
is too infrequent to be easily triggered.
Add a sanity check in iov_iter_revert() to catch bugs like this one;
fortunately, the same braino hadn't happened in other callers, but we'd
better have a warning if such a thing crops up.
Signed-off-by: Al Viro <[email protected]>
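To illustrate why a wrong sign is caught by a bounds check, here is a minimal userspace sketch. struct cursor and its helpers are hypothetical and much simpler than struct iov_iter, and the specific check used (refusing to go back past the start) is an assumption chosen for the sketch, not necessarily the condition the kernel tests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over a buffer; not the kernel's struct iov_iter. */
struct cursor {
	size_t pos;   /* bytes consumed so far */
	size_t count; /* bytes remaining */
};

/* Consume up to n bytes. */
void cursor_advance(struct cursor *c, size_t n)
{
	assert(c != NULL);
	if (n > c->count)
		n = c->count;
	c->pos += n;
	c->count -= n;
}

/* Undo n bytes of progress; returns 0 on success, -1 if n is implausible. */
int cursor_revert(struct cursor *c, size_t n)
{
	assert(c != NULL);
	/*
	 * A negated length converted to size_t becomes a huge value, so a
	 * sign error at the call site fails this check instead of walking
	 * backwards past the start of the buffer.
	 */
	if (n > c->pos)
		return -1;
	c->pos -= n;
	c->count += n;
	return 0;
}
```

A caller that passes the negated byte count is rejected and the cursor is left untouched, whereas without the check the position would silently wrap.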
|
|
This patch fixes the crash that happens when the driver tries to collect
statistics from an already released "aq_vec" object.
If the adapter is in the "down" state we still allow the user to see
statistics from HW.
V2: fixed braces around "aq_vec_free".
Fixes: 97bde5c4f909 ("net: ethernet: aquantia: Support for NIC-specific code")
Signed-off-by: Pavel Belous <[email protected]>
Tested-by: David Arcari <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Making __blk_mq_stop_hw_queues static fixes sparse warning:
block/blk-mq.c:6: warning: symbol '__blk_mq_stop_hw_queues' was not
declared. Should it be static?
Fixes: 2719aa217e0d0 ("blk-mq: don't use sync workqueue flushing from drivers")
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When using NFS4_CREATE_EXCLUSIVE4_1 mode, the client will overestimate the
amount of space that it needs for the attributes because it does so
before checking whether or not the server supports a given attribute.
Fix by checking the attribute mask earlier.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The intention in the original patch was to release the lock when
we put the inode, however something got screwed up.
Reported-by: Jason Yan <[email protected]>
Fixes: 7b410d9ce460f ("pNFS: Delay getting the layout header in..")
Cc: [email protected] # v4.10+
Signed-off-by: Trond Myklebust <[email protected]>
|