aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-09-13md: add feature flag MD_FEATURE_RAID0_LAYOUTNeilBrown3-0/+18
Due to a bug introduced in Linux 3.14 we cannot determine the correctly layout for a multi-zone RAID0 array - there are two possibilities. It is possible to tell the kernel which to chose using a module parameter, but this can be clumsy to use. It would be best if the choice were recorded in the metadata. So add a feature flag for this purpose. If it is set, then the 'layout' field of the superblock is used to determine which layout to use. If this flag is not set, then mddev->layout gets set to -1, which causes the module parameter to be required. Acked-by: Guoqing Jiang <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13md/raid0: avoid RAID0 data corruption due to layout confusion.NeilBrown2-1/+45
If the drives in a RAID0 are not all the same size, the array is divided into zones. The first zone covers all drives, to the size of the smallest. The second zone covers all drives larger than the smallest, up to the size of the second smallest - etc. A change in Linux 3.14 unintentionally changed the layout for the second and subsequent zones. All the correct data is still stored, but each chunk may be assigned to a different device than in pre-3.14 kernels. This can lead to data corruption. It is not possible to determine what layout to use - it depends which kernel the data was written by. So we add a module parameter to allow the old (0) or new (1) layout to be specified, and refused to assemble an affected array if that parameter is not set. Fixes: 20d0189b1012 ("block: Introduce new bio_split()") cc: [email protected] (3.14+) Acked-by: Guoqing Jiang <[email protected]> Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13raid5: don't set STRIPE_HANDLE to stripe which is in batch listGuoqing Jiang1-1/+2
If stripe in batch list is set with STRIPE_HANDLE flag, then the stripe could be set with STRIPE_ACTIVE by the handle_stripe function. And if error happens to the batch_head at the same time, break_stripe_batch_list is called, then below warning could happen (the same report in [1]), it means a member of batch list was set with STRIPE_ACTIVE. [7028915.431770] stripe state: 2001 [7028915.431815] ------------[ cut here ]------------ [7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456] [...] [7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G O 4.14.86-1-storage #4.14.86-1.2~deb9 [7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018 [7028915.431888] Workqueue: raid5wq raid5_do_work [raid456] [7028915.431890] task: ffff9ab0ef36d7c0 task.stack: ffffb72926f84000 [7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456] [7028915.431898] RSP: 0018:ffffb72926f87ba8 EFLAGS: 00010286 [7028915.431900] RAX: 0000000000000012 RBX: ffff9aaa84a98000 RCX: 0000000000000000 [7028915.431901] RDX: 0000000000000000 RSI: ffff9ab2bfa15458 RDI: ffff9ab2bfa15458 [7028915.431902] RBP: ffff9aaa8fb4e900 R08: 0000000000000001 R09: 0000000000002eb4 [7028915.431903] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff9ab1736f1b00 [7028915.431904] R13: 0000000000000000 R14: ffff9aaa8fb4e900 R15: 0000000000000001 [7028915.431906] FS: 0000000000000000(0000) GS:ffff9ab2bfa00000(0000) knlGS:0000000000000000 [7028915.431907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [7028915.431908] CR2: 00007ff953b9f5d8 CR3: 0000000bf4009002 CR4: 00000000003606e0 [7028915.431909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [7028915.431910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [7028915.431910] Call Trace: [7028915.431923] handle_stripe+0x8e7/0x2020 [raid456] [7028915.431930] ? __wake_up_common_lock+0x89/0xc0 [7028915.431935] handle_active_stripes.isra.58+0x35f/0x560 [raid456] [7028915.431939] raid5_do_work+0xc6/0x1f0 [raid456] Also commit 59fc630b8b5f9f ("RAID5: batch adjacent full stripe write") said "If a stripe is added to batch list, then only the first stripe of the list should be put to handle_list and run handle_stripe." So don't set STRIPE_HANDLE to stripe which is already in batch list, otherwise the stripe could be put to handle_list and run handle_stripe, then the above warning could be triggered. [1]. https://www.spinics.net/lists/raid/msg62552.html Signed-off-by: Guoqing Jiang <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13raid5: don't increment read_errors on EILSEQ returnNigel Croxon1-1/+2
While MD continues to count read errors returned by the lower layer. If those errors are -EILSEQ, instead of -EIO, it should NOT increase the read_errors count. When RAID6 is set up on dm-integrity target that detects massive corruption, the leg will be ejected from the array. Even if the issue is correctable with a sector re-write and the array has necessary redundancy to correct it. The leg is ejected because it runs up the rdev->read_errors beyond conf->max_nr_stripes. The return status in dm-drypt when there is a data integrity error is -EILSEQ (BLK_STS_PROTECTION). Signed-off-by: Nigel Croxon <[email protected]> Signed-off-by: Song Liu <[email protected]>
2019-09-13cdc_ether: fix rndis support for Mediatek based smartphonesBjørn Mork1-1/+9
A Mediatek based smartphone owner reports problems with USB tethering in Linux. The verbose USB listing shows a rndis_host interface pair (e0/01/03 + 10/00/00), but the driver fails to bind with [ 355.960428] usb 1-4: bad CDC descriptors The problem is a failsafe test intended to filter out ACM serial functions using the same 02/02/ff class/subclass/protocol as RNDIS. The serial functions are recognized by their non-zero bmCapabilities. No RNDIS function with non-zero bmCapabilities were known at the time this failsafe was added. But it turns out that some Wireless class RNDIS functions are using the bmCapabilities field. These functions are uniquely identified as RNDIS by their class/subclass/protocol, so the failing test can safely be disabled. The same applies to the two types of Misc class RNDIS functions. Applying the failsafe to Communication class functions only retains the original functionality, and fixes the problem for the Mediatek based smartphone. Tow examples of CDC functional descriptors with non-zero bmCapabilities from Wireless class RNDIS functions are: 0e8d:000a Mediatek Crosscall Spider X5 3G Phone CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x0f connection notifications sends break line coding and serial state get/set/clear comm features CDC Union: bMasterInterface 0 bSlaveInterface 1 CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 and 19d2:1023 ZTE K4201-z CDC Header: bcdCDC 1.10 CDC ACM: bmCapabilities 0x02 line coding and serial state CDC Call Management: bmCapabilities 0x03 call management use DataInterface bDataInterface 1 CDC Union: bMasterInterface 0 bSlaveInterface 1 The Mediatek example is believed to apply to most smartphones with Mediatek firmware. The ZTE example is most likely also part of a larger family of devices/firmwares. Suggested-by: Lars Melin <[email protected]> Signed-off-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13Merge branch 'sctp_do_bind-leak'David S. Miller1-10/+11
Mao Wenan says: ==================== fix memory leak for sctp_do_bind First two patches are to do cleanup, remove redundant assignment, and change return type of sctp_get_port_local. Third patch is to fix memory leak for sctp_do_bind if failed to bind address. v2: add one patch to change return type of sctp_get_port_local. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-09-13sctp: destroy bucket if failed to bind addrMao Wenan1-4/+6
There is one memory leak bug report: BUG: memory leak unreferenced object 0xffff8881dc4c5ec0 (size 40): comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s) hex dump (first 32 bytes): 02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00 ................ f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00 .c=............. backtrace: [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp] [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp] [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp] [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6] [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647 [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline] [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline] [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656 [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296 [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe This is because in sctp_do_bind, if sctp_get_port_local is to create hash bucket successfully, and sctp_add_bind_addr failed to bind address, e.g return -ENOMEM, so memory leak found, it needs to destroy allocated bucket. Reported-by: Hulk Robot <[email protected]> Signed-off-by: Mao Wenan <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13sctp: remove redundant assignment when call sctp_get_port_localMao Wenan1-2/+1
There are more parentheses in if clause when call sctp_get_port_local in sctp_do_bind, and redundant assignment to 'ret'. This patch is to do cleanup. Signed-off-by: Mao Wenan <[email protected]> Acked-by: Neil Horman <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13sctp: change return type of sctp_get_port_localMao Wenan1-4/+4
Currently sctp_get_port_local() returns a long which is either 0,1 or a pointer casted to long. It's neither of the callers use the return value since commit 62208f12451f ("net: sctp: simplify sctp_get_port"). Now two callers are sctp_get_port and sctp_do_bind, they actually assumend a casted to an int was the same as a pointer casted to a long, and they don't save the return value just check whether it is zero or non-zero, so it would better change return type from long to int for sctp_get_port_local. Signed-off-by: Mao Wenan <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13ixgbevf: Fix secpath usage for IPsec Tx offloadJeff Kirsher1-1/+2
Port the same fix for ixgbe to ixgbevf. The ixgbevf driver currently does IPsec Tx offloading based on an existing secpath. However, the secpath can also come from the Rx side, in this case it is misinterpreted for Tx offload and the packets are dropped with a "bad sa_idx" error. Fix this by using the xfrm_offload() function to test for Tx offload. CC: Shannon Nelson <[email protected]> Fixes: 7f68d4306701 ("ixgbevf: enable VF IPsec offload operations") Reported-by: Jonathan Tooker <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Acked-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-13hwmon: submitting-patches: Add note on comment styleGuenter Roeck1-0/+4
Ask for standard multi-line comments, and ask for consistent comment style. Signed-off-by: Guenter Roeck <[email protected]>
2019-09-13hwmon: submitting-patches: Point to with_info APIGuenter Roeck1-2/+2
New driver should use devm_hwmon_device_register_with_info() or hwmon_device_register_with_info() to register with the hwmon subsystem. Signed-off-by: Guenter Roeck <[email protected]>
2019-09-13mmc: tmio: Fixup runtime PM management during removeUlf Hansson1-3/+4
Accessing the device when it may be runtime suspended is a bug, which is the case in tmio_mmc_host_remove(). Let's fix the behaviour. Cc: [email protected] Signed-off-by: Ulf Hansson <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]>
2019-09-13mmc: tmio: Fixup runtime PM management during probeUlf Hansson2-1/+9
The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the runtime PM status of the device, as to make it reflect the current status of the HW. This works fine for most cases, but unfortunate not for all. Especially, there is a generic problem when the device has a genpd attached and that genpd have the ->start|stop() callbacks assigned. More precisely, if the driver calls pm_runtime_set_active() during ->probe(), genpd does not get to invoke the ->start() callback for it, which means the HW isn't really fully powered on. Furthermore, in the next phase, when the device becomes runtime suspended, genpd will invoke the ->stop() callback for it, potentially leading to usage count imbalance problems, depending on what's implemented behind the callbacks of course. To fix this problem, convert to call pm_runtime_get_sync() from tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to avoid bumping usage counters and unnecessary re-initializing the HW the first time the tmio driver's ->runtime_resume() callback is called, introduce a state flag to keeping track of this. Cc: [email protected] Signed-off-by: Ulf Hansson <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]>
2019-09-13Revert "mmc: tmio: move runtime PM enablement to the driver implementations"Ulf Hansson4-23/+2
This reverts commit 7ff213193310ef8d0ee5f04f79d791210787ac2c. It turns out that the above commit introduces other problems. For example, calling pm_runtime_set_active() must not be done prior calling pm_runtime_enable() as that makes it fail. This leads to additional problems, such as clock enables being wrongly balanced. Rather than fixing the problem on top, let's start over by doing a revert. Fixes: 7ff213193310 ("mmc: tmio: move runtime PM enablement to the driver implementations") Signed-off-by: Ulf Hansson <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]>
2019-09-13s390: add support for IBM z15 machinesMartin Schwidefsky4-0/+27
Add detection for machine types 0x8562 and 8x8561 and set the ELF platform name to z15. Add the miscellaneous-instruction-extension 3 facility to the list of facilities for z15. And allow to generate code that only runs on a z15 machine. Reviewed-by: Christian Borntraeger <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2019-09-13s390/crypto: Support for SHA3 via CPACF (MSA6)Joerg Schmidbauer9-28/+395
This patch introduces sha3 support for s390. - Rework the s390-specific SHA1 and SHA2 related code to provide the basis for SHA3. - Provide two new kernel modules sha3_256_s390 and sha3_512_s390 together with new kernel options. Signed-off-by: Joerg Schmidbauer <[email protected]> Reviewed-by: Ingo Franzki <[email protected]> Reviewed-by: Harald Freudenberger <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2019-09-13s390/startup: add pgm check info printingVasily Gorbik4-2/+101
Try to print out startup pgm check info including exact linux kernel version, pgm interruption code and ilc, psw and general registers. Like the following: Linux version 5.3.0-rc7-07282-ge7b4d41d61bd-dirty (gor@tuxmaker) #3 SMP PREEMPT Thu Sep 5 16:07:34 CEST 2019 Kernel fault: interruption code 0005 ilc:2 PSW : 0000000180000000 0000000000012e52 R:0 T:0 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:0 CC:0 PM:0 RI:0 EA:3 GPRS: 0000000000000000 00ffffffffffffff 0000000000000000 0000000000019a58 000000000000bf68 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 000000000001a041 0000000000000000 0000000004c9c000 0000000000010070 0000000000012e42 000000000000beb0 This info makes it apparent that kernel startup failed and might help to understand what went wrong without actual standalone dump. Printing code runs on its own stack of 1 page (at unused 0x5000), which should be sufficient for sclp_early_printk usage (typical stack usage observed has been around 512 bytes). The code has pgm check recursion prevention, despite pgm check info printing failure (follow on pgm check) or success it restores original faulty psw and gprs and does disabled wait. Reviewed-by: Heiko Carstens <[email protected]> Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Heiko Carstens <[email protected]>
2019-09-13spi: mediatek: support large PAluhua.xu1-5/+39
Add spi large PA(max=64G) support for DMA transfer. Signed-off-by: luhua.xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-09-13spi: mediatek: add spi support for mt6765 ICluhua.xu1-0/+9
This patch add spi support for mt6765 IC. Signed-off-by: luhua.xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-09-13dt-bindings: spi: update bindings for MT6765 SoCluhua.xu1-0/+1
Add a DT binding documentation for the MT6765 soc. Signed-off-by: luhua.xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2019-09-13Merge branch 'for-5.3-fixes' of ↵Linus Torvalds2-1/+63
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Roman found and fixed a bug in the cgroup2 freezer which allows new child cgroup to escape frozen state" * 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: freezer: fix frozen state inheritance kselftests: cgroup: add freezer mkdir test
2019-09-13Merge tag 'for-5.3-rc8-tag' of ↵Linus Torvalds2-17/+34
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Here are two fixes, one of them urgent fixing a bug introduced in 5.2 and reported by many users. It took time to identify the root cause, catching the 5.3 release is higly desired also to push the fix to 5.2 stable tree. The bug is a mess up of return values after adding proper error handling and honestly the kind of bug that can cause sleeping disorders until it's caught. My appologies to everybody who was affected. Summary of what could happen: 1) either a hang when committing a transaction, if this happens there's no risk of corruption, still the hang is very inconvenient and can't be resolved without a reboot 2) writeback for some btree nodes may never be started and we end up committing a transaction without noticing that, this is really serious and that will lead to the "parent transid verify failed" messages" * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix unwritten extent buffers and hangs on future writeback attempts Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-13sched/psi: Correct overly pessimistic size calculationMiles Chen1-1/+1
When passing a equal or more then 32 bytes long string to psi_write(), psi_write() copies 31 bytes to its buf and overwrites buf[30] with '\0'. Which makes the input string 1 byte shorter than it should be. Fix it by copying sizeof(buf) bytes when nbytes >= sizeof(buf). This does not cause problems in normal use case like: "some 500000 10000000" or "full 500000 10000000" because they are less than 32 bytes in length. /* assuming nbytes == 35 */ char buf[32]; buf_size = min(nbytes, (sizeof(buf) - 1)); /* buf_size = 31 */ if (copy_from_user(buf, user_buf, buf_size)) return -EFAULT; buf[buf_size - 1] = '\0'; /* buf[30] = '\0' */ Before: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 22.473497] nbytes=35,buf_size=31 [ 22.473775] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 64.916162] nbytes=32,buf_size=31 [ 64.916331] 123456789|123456789|123456789| (print 30 chars) %sh: write error: Invalid argument After: %cd /proc/pressure/ %echo "123456789|123456789|123456789|1234" > memory [ 254.837863] nbytes=35,buf_size=32 [ 254.838541] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument %echo "123456789|123456789|123456789|1" > memory [ 9965.714935] nbytes=32,buf_size=32 [ 9965.715096] 123456789|123456789|123456789|1 (print 31 chars) %sh: write error: Invalid argument Also remove the superfluous parentheses. Signed-off-by: Miles Chen <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-13sched/fair: Speed-up energy-aware wake-upsQuentin Perret1-60/+50
EAS computes the energy impact of migrating a waking task when deciding on which CPU it should run. However, the current approach is known to have a high algorithmic complexity, which can result in prohibitively high wake-up latencies on systems with complex energy models, such as systems with per-CPU DVFS. On such systems, the algorithm complexity is in O(n^2) (ignoring the cost of searching for performance states in the EM) with 'n' the number of CPUs. To address this, re-factor the EAS wake-up path to compute the energy 'delta' (with and without the task) on a per-performance domain basis, rather than system-wide, which brings the complexity down to O(n). No functional changes intended. Test results ~~~~~~~~~~~~ * Setup: Tested on a Google Pixel 3, with a Snapdragon 845 (4+4 CPUs, A55/A75). Base kernel is 5.3-rc5 + Pixel3 specific patches. Android userspace, no graphics. * Test case: Run a periodic rt-app task, with 16ms period, ramping down from 70% to 10%, in 5% steps of 500 ms each (json avail. at [1]). Frequencies of all CPUs are pinned to max (using scaling_min_freq CPUFreq sysfs entries) to reduce variability. The time to run select_task_rq_fair() is measured using the function profiler (/sys/kernel/debug/tracing/trace_stat/function*). See the test script for more details [2]. Test 1: I hacked the DT to 'fake' per-CPU DVFS. That is, we end up with one CPUFreq policy per CPU (8 policies in total). Since all frequencies are pinned to max for the test, this should have no impact on the actual frequency selection, but it does in the EAS calculation. +---------------------------+----------------------------------+ | Without patch | With patch | +-----+-----+----------+----------+-----+-----------------+----------+ | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us) | s^2 (us) | |-----+-----+----------+----------+-----+-----------------+----------+ | 0 | 274 | 38.303 | 1750.239 | 401 | 14.126 (-63.1%) | 146.625 | | 1 | 197 | 49.529 | 1695.852 | 314 | 16.135 (-67.4%) | 167.525 | | 2 | 142 | 34.296 | 1758.665 | 302 | 14.133 (-58.8%) | 130.071 | | 3 | 172 | 31.734 | 1490.975 | 641 | 14.637 (-53.9%) | 139.189 | | 4 | 316 | 7.834 | 178.217 | 425 | 5.413 (-30.9%) | 20.803 | | 5 | 447 | 8.424 | 144.638 | 556 | 5.929 (-29.6%) | 27.301 | | 6 | 581 | 14.886 | 346.793 | 456 | 5.711 (-61.6%) | 23.124 | | 7 | 456 | 10.005 | 211.187 | 997 | 4.708 (-52.9%) | 21.144 | +-----+-----+----------+----------+-----+-----------------+----------+ * Hit, Avg and s^2 are as reported by the function profiler Test 2: I also ran the same test with a normal DT, with 2 CPUFreq policies, to see if this causes regressions in the most common case. +---------------------------+----------------------------------+ | Without patch | With patch | +-----+-----+----------+----------+-----+-----------------+----------+ | CPU | Hit | Avg (us) | s^2 (us) | Hit | Avg (us) | s^2 (us) | |-----+-----+----------+----------+-----+-----------------+----------+ | 0 | 345 | 22.184 | 215.321 | 580 | 18.635 (-16.0%) | 146.892 | | 1 | 358 | 18.597 | 200.596 | 438 | 12.934 (-30.5%) | 104.604 | | 2 | 359 | 25.566 | 200.217 | 397 | 10.826 (-57.7%) | 74.021 | | 3 | 362 | 16.881 | 200.291 | 718 | 11.455 (-32.1%) | 102.280 | | 4 | 457 | 3.822 | 9.895 | 757 | 4.616 (+20.8%) | 13.369 | | 5 | 344 | 4.301 | 7.121 | 594 | 5.320 (+23.7%) | 18.798 | | 6 | 472 | 4.326 | 7.849 | 464 | 5.648 (+30.6%) | 22.022 | | 7 | 331 | 4.630 | 13.937 | 408 | 5.299 (+14.4%) | 18.273 | +-----+-----+----------+----------+-----+-----------------+----------+ * Hit, Avg and s^2 are as reported by the function profiler In addition to these two tests, I also ran 50 iterations of the Lisa EAS functional test suite [3] with this patch applied on Arm Juno r0, Arm Juno r2, Arm TC2 and Hikey960, and could not see any regressions (all EAS functional tests are passing). [1] https://paste.debian.net/1100055/ [2] https://paste.debian.net/1100057/ [3] https://github.com/ARM-software/lisa/blob/master/lisa/tests/scheduler/eas_behaviour.py Signed-off-by: Quentin Perret <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-12cgroup: freezer: fix frozen state inheritanceRoman Gushchin1-1/+9
If a new child cgroup is created in the frozen cgroup hierarchy (one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup flag should be set. Otherwise if a process will be attached to the child cgroup, it won't become frozen. The problem can be reproduced with the test_cgfreezer_mkdir test. This is the output before this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen not ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork And with this patch: ~/test_freezer ok 1 test_cgfreezer_simple ok 2 test_cgfreezer_tree ok 3 test_cgfreezer_forkbomb ok 4 test_cgfreezer_mkdir ok 5 test_cgfreezer_rmdir ok 6 test_cgfreezer_migrate ok 7 test_cgfreezer_ptrace ok 8 test_cgfreezer_stopped ok 9 test_cgfreezer_ptraced ok 10 test_cgfreezer_vfork Reported-by: Mark Crossen <[email protected]> Signed-off-by: Roman Gushchin <[email protected]> Fixes: 76f969e8948d ("cgroup: cgroup v2 freezer") Cc: Tejun Heo <[email protected]> Cc: [email protected] # v5.2+ Signed-off-by: Tejun Heo <[email protected]>
2019-09-12kselftests: cgroup: add freezer mkdir testRoman Gushchin1-0/+54
Add a new cgroup freezer selftest, which checks that if a cgroup is frozen, their new child cgroups will properly inherit the frozen state. It creates a parent cgroup, freezes it, creates a child cgroup and populates it with a dummy process. Then it checks that both parent and child cgroup are frozen. Signed-off-by: Roman Gushchin <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2019-09-12io_uring: make sqpoll wakeup possible with geteventsJens Axboe1-6/+2
The way the logic is setup in io_uring_enter() means that you can't wake up the SQ poller thread while at the same time waiting (or polling) for completions afterwards. There's no reason for that to be the case. Reported-by: Lewis Baker <[email protected]> Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-12io_uring: extend async work mergingJens Axboe1-8/+28
We currently merge async work items if we see a strict sequential hit. This helps avoid unnecessary workqueue switches when we don't need them. We can extend this merging to cover cases where it's not a strict sequential hit, but the IO still fits within the same page. If an application is doing multiple requests within the same page, we don't want separate workers waiting on the same page to complete IO. It's much faster to let the first worker bring in the page, then operate on that page from the same worker to complete the next request(s). Reviewed-by: Jeff Moyer <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2019-09-12hwmon: (nct7904) Fix incorrect SMI status register setting of LTD ↵amy.shih1-2/+9
temperature and fan. According to datasheet, the SMI status register setting of LTD temperature is SMI_STS3, and the SMI status register setting of fan is SMI_STS5 and SMI_STS6. Signed-off-by: amy.shih <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]>
2019-09-12Merge branch 'nvme-5.4' of git://git.infradead.org/nvme into for-5.4/blockJens Axboe7-71/+153
Pull NVMe updates from Sagi: "Highlights includes: - controller reset and namespace scan races fixes - nvme discovery log change uevent support - naming improvements from Keith - multiple discovery controllers reject fix from James - some regular cleanups from various people" * 'nvme-5.4' of git://git.infradead.org/nvme: nvmet: fix a wrong error status returned in error log page nvme: send discovery log page change events to userspace nvme: add uevent variables for controller devices nvme: enable aen regardless of the presence of I/O queues nvme-fabrics: allow discovery subsystems accept a kato nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery() nvme: Remove redundant assignment of cq vector nvme: Assign subsys instance from first ctrl nvme: tcp: remove redundant assignment to variable ret nvme: include admin_q sync with nvme_sync_queues nvme: Treat discovery subsystems as unique subsystems nvme: fix ns removal hang when failing to revalidate due to a transient error nvme: make nvme_report_ns_ids propagate error back nvme: make nvme_identify_ns propagate errors back nvme: pass status to nvme_error_status nvme-fc: Fail transport errors with NVME_SC_HOST_PATH nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failed nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERROR
2019-09-12nvmet: fix a wrong error status returned in error log pageAmit1-5/+3
When the command data_len cannot hold all the controller errors, we should simply return as much errors as we can fit instead of failing the command. Signed-off-by: Amit Engel <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: send discovery log page change events to userspaceSagi Grimberg1-1/+5
If the controller supports discovery log page change events, we want to enable it. When we see a discovery log change event we will send it up to userspace and expect it to handle it. Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: add uevent variables for controller devicesSagi Grimberg1-0/+28
When we send uevents to userspace, add controller specific environment variables to uniquly identify the controller beyond its device name. This will be useful to address discovery log change events by actually verifying that the discovery controller is indeed the same as the device that generated the event. Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: enable aen regardless of the presence of I/O queuesSagi Grimberg1-2/+4
AENs in general are not related to the presence of I/O queues, so enable them regardless. Note that the only exception is that discovery controller will not support any of the requested AENs and nvme_enable_aen will respect that and return, so it is still safe to enable regardless. Note it is safe to enable AENs even before the initial namespace scanning as we have the scan operation in a workqueue context. Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme-fabrics: allow discovery subsystems accept a katoSagi Grimberg1-10/+2
This modifies the behavior of discovery subsystems to accept a kato as a preparation to support discovery log change events. This also means that now every discovery controller will have a default kato value, and for non-persistent connections the host needs to pass in a zero kato value (keep_alive_tmo=0). Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()Markus Elfring1-3/+1
Simplify this function implementation by using a known function. Generated by: scripts/coccinelle/api/ptr_ret.cocci Signed-off-by: Markus Elfring <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: Remove redundant assignment of cq vectorIsrael Rukshin1-1/+0
The cq vector is already assigned with the correct value. Signed-off-by: Israel Rukshin <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: Assign subsys instance from first ctrlKeith Busch1-11/+10
The namespace disk names must be unique for the lifetime of the subsystem. This was accomplished by using their parent subsystems' instances which were allocated independently from the controllers connected to that subsystem. This allowed name prefixes assigned to namespaces to match a controller from an unrelated subsystem, and has created confusion among users examining device nodes. Ensure a namespace's subsystem instance never clashes with a controller instance of another subsystem by transferring the instance ownership to the parent subsystem from the first controller discovered in that subsystem. Reviewed-by: Logan Gunthorpe <[email protected]> Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: tcp: remove redundant assignment to variable retColin Ian King1-1/+1
The variable ret is being initialized with a value that is never read and is being re-assigned immediately afterwards. The assignment is redundant and hence can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: include admin_q sync with nvme_sync_queuesEdmund Nadolski1-0/+3
nvme_sync_queues currently syncs all namespace queues, but should also sync the admin queue, if present. Signed-off-by: Edmund Nadolski <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: Treat discovery subsystems as unique subsystemsJames Smart1-0/+11
Current code matches subnqn and collapses all controllers to the same subnqn to a single subsystem structure. This is good for recognizing multiple controllers for the same subsystem. But with the well-known discovery subnqn, the subsystems aren't truly the same subsystem. As such, subsystem specific rules, such as no overlap of controller id, do not apply. With today's behavior, the check for overlap of controller id can fail, preventing the new discovery controller from being created. When searching for like subsystem nqn, exclude the discovery nqn from matching. This will result in each discovery controller being attached to a unique subsystem structure. Signed-off-by: James Smart <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Max Gurtovoy <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: fix ns removal hang when failing to revalidate due to a transient errorSagi Grimberg1-1/+7
If a controller reset is racing with a namespace revalidation, the revalidation (admin) I/O will surely fail, but we should not remove the namespace as we will execute the I/O when the controller is back up. Same for spurious allocation errors (return -ENOMEM). Fix this by checking the specific error code in nvme_revalidate_disk and if it is a transient error (for example non DNR nvme statuses or a negative ENOMEM as allocation failure), do not remove the namespace as it will either recover when the controller is back up and schedule a subsequent scan, or the controller is going away and the namespaces will be removed anyways. This fixes a hang namespace scanning racing with a controller reset and also sporious I/O errors in path failover coditions where the controller reset is racing with the namespace scan work with multipath enabled. Reported-by: Hannes Reinecke <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: make nvme_report_ns_ids propagate error backSagi Grimberg1-6/+22
Make the callers check the return status and propagate back accordingly (casting to errno from a positive nvme status). Also print the return status in nvme_report_ns_ids. Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: make nvme_identify_ns propagate errors backSagi Grimberg1-19/+20
right now callers of nvme_identify_ns only know that it failed, but don't know why. Make nvme_identify_ns propagate the error back. Because nvme_submit_sync_cmd may return a positive status code, we make nvme_identify_ns receive the id by reference and return that status up the call chain, but make sure not to leak positive nvme status codes to the upper layers. Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: pass status to nvme_error_statusSagi Grimberg1-3/+3
No need for the full blown request structure. Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme-fc: Fail transport errors with NVME_SC_HOST_PATHJames Smart1-7/+30
NVME_SC_INTERNAL should indicate an internal controller errors and not host transport errors. These errors will propagate to upper layers (essentially nvme core) and be interpereted as transport errors which should not be taken into account for namespace state or condition. Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme-tcp: fail command with NVME_SC_HOST_PATH_ERROR send failedSagi Grimberg1-1/+1
This is a more appropriate error status for a transport error detected by us (the host). Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12nvme: fail cancelled commands with NVME_SC_HOST_PATH_ERRORSagi Grimberg1-1/+3
NVME_SC_ABORT_REQ means that the request was aborted due to an abort command received. In our case, this is a transport cancellation, so host pathing error is much more appropriate. Also, convert NVME_SC_HOST_PATH_ERROR to BLK_STS_TRANSPORT for such that callers can understand that the status is a transport related error. This will be used by the ns scanning code to understand if it got an error from the controller or that the controller happens to be unreachable by the transport. Reviewed-by: Minwoo Im <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: James Smart <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]>
2019-09-12MAINTAINERS: Switch PDx86 subsystem status to Odd FixesAndy Shevchenko1-1/+1
Due to shift of priorities the actual status of the subsystem is Odd Fixes. Signed-off-by: Andy Shevchenko <[email protected]>