Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull ceph updates from Ilya Dryomov:
"The main items are:
- support for asynchronous create and unlink (Jeff Layton).
Creates and unlinks are satisfied locally, without waiting for a
reply from the MDS, provided the client has been granted
appropriate caps (new in v15.y.z ("Octopus") release). This can be
a big help for metadata heavy workloads such as tar and rsync.
Opt-in with the new nowsync mount option.
- multiple blk-mq queues for rbd (Hannes Reinecke and myself).
When the driver was converted to blk-mq, we settled on a single
blk-mq queue because of a global lock in libceph and some other
technical debt. These have since been addressed, so allocate a
queue per CPU to enhance parallelism.
- don't hold onto caps that aren't actually needed (Zheng Yan).
This has been our long-standing behavior, but it causes issues with
some active/standby applications (synchronous I/O, stalls if the
standby goes down, etc).
- .snap directory timestamps consistent with ceph-fuse (Luis
Henriques)"
* tag 'ceph-for-5.7-rc1' of git://github.com/ceph/ceph-client: (49 commits)
ceph: fix snapshot directory timestamps
ceph: wait for async creating inode before requesting new max size
ceph: don't skip updating wanted caps when cap is stale
ceph: request new max size only when there is auth cap
ceph: cleanup return error of try_get_cap_refs()
ceph: return ceph_mdsc_do_request() errors from __get_parent()
ceph: check all mds' caps after page writeback
ceph: update i_requested_max_size only when sending cap msg to auth mds
ceph: simplify calling of ceph_get_fmode()
ceph: remove delay check logic from ceph_check_caps()
ceph: consider inode's last read/write when calculating wanted caps
ceph: always renew caps if mds_wanted is insufficient
ceph: update dentry lease for async create
ceph: attempt to do async create when possible
ceph: cache layout in parent dir on first sync create
ceph: add new MDS req field to hold delegated inode number
ceph: decode interval_sets for delegated inos
ceph: make ceph_fill_inode non-static
ceph: perform asynchronous unlink if we have sufficient caps
ceph: don't take refs to want mask unless we have all bits
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix failure to copy-up files from certain NFSv4 mounts
- Sort out inconsistencies between st_ino and i_ino (used in /proc/locks)
- Allow consistent (POSIX-y) inode numbering in more cases
- Allow virtiofs to be used as upper layer
- Miscellaneous cleanups and fixes
* tag 'ovl-update-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: document xino expected behavior
ovl: enable xino automatically in more cases
ovl: avoid possible inode number collisions with xino=on
ovl: use a private non-persistent ino pool
ovl: fix WARN_ON nlink drop to zero
ovl: fix a typo in comment
ovl: replace zero-length array with flexible-array member
ovl: ovl_obtain_alias(): don't call d_instantiate_anon() for old
ovl: strict upper fs requirements for remote upper fs
ovl: check if upper fs supports RENAME_WHITEOUT
ovl: allow remote upper
ovl: decide if revalidate needed on a per-dentry basis
ovl: separate detection of remote upper layer from stacked overlay
ovl: restructure dentry revalidation
ovl: ignore failure to copy up unknown xattrs
ovl: document permission model
ovl: simplify i_ino initialization
ovl: factor out helper ovl_get_root()
ovl: fix out of date comment and unreachable code
ovl: fix value of i_ino for lower hardlink corner case
|
|
Pull iomap fix from Darrick Wong:
"Fix a problem in readahead where we can crash if we can't allocate a
full bio due to GFP_NORETRY"
* tag 'iomap-5.7-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Handle memory allocation failure in readahead
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
"This fixes a Kconfig dependency for hisilicon as well as a double free
in marvell/octeontx"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: marvell/octeontx - fix double free of ptr
crypto: hisilicon - Fix build error
|
|
git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
- add TI K3 RTI watchdog
- add stop_on_reboot parameter to control reboot policy
- wm831x_wdt: Remove GPIO handling
- several small fixes, improvements and clean-ups
* tag 'linux-watchdog-5.7-rc1' of git://www.linux-watchdog.org/linux-watchdog:
watchdog: Add K3 RTI watchdog support
dt-bindings: watchdog: Add support for TI K3 RTI watchdog
watchdog: ziirave_wdt: change name to be more specific
watchdog: orion: use 0 for unset heartbeat
watchdog: npcm: remove whitespaces
watchdog: reset last_hw_keepalive time at start
watchdog: imx2_wdt: Drop .remove callback
watchdog: Add stop_on_reboot parameter to control reboot policy
watchdog: wm831x_wdt: Remove GPIO handling
watchdog: imx7ulp: Remove unused include of init.h
watchdog: imx_sc_wdt: Remove unused includes
watchdog: qcom: Use irq flags from firmware
watchdog: pm8916_wdt: Add system sleep callbacks
watchdog: qcom-wdt: disable pretimeout on timer platform
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
Pull chrome platform updates from Benson Leung:
cros-usbpd-notify and cros_ec_typec:
- Add a new notification driver that handles and dispatches USB PD
related events to other drivers.
- Add a Type C connector class driver for cros_ec
CrOS EC:
- Introduce a new cros_ec_cmd_xfer_status helper
Sensors/iio:
- A series from Gwendal that adds Cros EC sensor hub FIFO support
Wilco EC:
- Fix a build warning.
- Platform data shouldn't include kernel.h
Misc:
- i2c api conversion complete, with i2c_new_client_device instead of
i2c_new_device in chromeos_laptop.
- Replace zero-length array with flexible-array member in
cros_ec_chardev and wilco_ec
- Update new structure for SPI transfer delays in cros_ec_spi
* tag 'tag-chrome-platform-for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: (34 commits)
platform/chrome: cros_ec_spi: Wait for USECS, not NSECS
iio: cros_ec: Use Hertz as unit for sampling frequency
iio: cros_ec: Report hwfifo_watermark_max
iio: cros_ec: Expose hwfifo_timeout
iio: cros_ec: Remove pm function
iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO
iio: expose iio_device_set_clock
iio: cros_ec: Move function description to .c file
platform/chrome: cros_ec_sensorhub: Add median filter
platform/chrome: cros_ec_sensorhub: Add code to spread timestmap
platform/chrome: cros_ec_sensorhub: Add FIFO support
platform/chrome: cros_ec_sensorhub: Add the number of sensors in sensorhub
platform/chrome: chromeos_laptop: make I2C API conversion complete
platform/chrome: wilco_ec: event: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_chardev: Replace zero-length array with flexible-array member
platform/chrome: cros_ec_typec: Update port info from EC
platform/chrome: Add Type C connector class driver
platform/chrome: cros_usbpd_notify: Pull PD_HOST_EVENT status
platform/chrome: cros_usbpd_notify: Amend ACPI driver to plat
platform/chrome: cros_usbpd_notify: Add driver data struct
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm and dax updates from Dan Williams:
"There were multiple touches outside of drivers/nvdimm/ this round to
add cross arch compatibility to the devm_memremap_pages() interface,
enhance numa information for persistent memory ranges, and add a
zero_page_range() dax operation.
This cycle I switched from the patchwork api to Konstantin's b4 script
for collecting tags (from x86, PowerPC, filesystem, and device-mapper
folks), and everything looks to have gone ok there. This has all
appeared in -next with no reported issues.
Summary:
- Add support for region alignment configuration and enforcement to
fix compatibility across architectures and PowerPC page size
configurations.
- Introduce 'zero_page_range' as a dax operation. This facilitates
filesystem-dax operation without a block-device.
- Introduce phys_to_target_node() to facilitate drivers that want to
know resulting numa node if a given reserved address range was
onlined.
- Advertise a persistence-domain for of_pmem and papr_scm. The
persistence domain indicates where cpu-store cycles need to reach
in the platform-memory subsystem before the platform will consider
them power-fail protected.
- Promote numa_map_to_online_node() to a cross-kernel generic
facility.
- Save x86 numa information to allow for node-id lookups for reserved
memory ranges, deploy that capability for the e820-pmem driver.
- Pick up some miscellaneous minor fixes, that missed v5.6-final,
including a some smatch reports in the ioctl path and some unit
test compilation fixups.
- Fixup some flexible-array declarations"
* tag 'libnvdimm-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (29 commits)
dax: Move mandatory ->zero_page_range() check in alloc_dax()
dax,iomap: Add helper dax_iomap_zero() to zero a range
dax: Use new dax zero page method for zeroing a page
dm,dax: Add dax zero_page_range operation
s390,dcssblk,dax: Add dax zero_page_range operation to dcssblk driver
dax, pmem: Add a dax operation zero_page_range
pmem: Add functions for reading/writing page to/from pmem
libnvdimm: Update persistence domain value for of_pmem and papr_scm device
tools/test/nvdimm: Fix out of tree build
libnvdimm/region: Fix build error
libnvdimm/region: Replace zero-length array with flexible-array member
libnvdimm/label: Replace zero-length array with flexible-array member
ACPI: NFIT: Replace zero-length array with flexible-array member
libnvdimm/region: Introduce an 'align' attribute
libnvdimm/region: Introduce NDD_LABELING
libnvdimm/namespace: Enforce memremap_compat_align()
libnvdimm/pfn: Prevent raw mode fallback if pfn-infoblock valid
libnvdimm: Out of bounds read in __nd_ioctl()
acpi/nfit: improve bounds checking for 'func'
mm/memremap_pages: Introduce memremap_compat_align()
...
|
|
The current arm BPF JIT does not correctly compile RSH or ARSH when the
immediate shift amount is 0. This causes the "rsh64 by 0 imm" and "arsh64
by 0 imm" BPF selftests to hang the kernel by reaching an instruction
the verifier determines to be unreachable.
The root cause is in how immediate right shifts are encoded on arm.
For LSR and ASR (logical and arithmetic right shift), a bit-pattern
of 00000 in the immediate encodes a shift amount of 32. When the BPF
immediate is 0, the generated code shifts by 32 instead of the expected
behavior (a no-op).
This patch fixes the bugs by adding an additional check if the BPF
immediate is 0. After the change, the above mentioned BPF selftests pass.
Fixes: 39c13c204bb11 ("arm: eBPF JIT compiler")
Co-developed-by: Xi Wang <[email protected]>
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Luke Nelson <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
In testing, we found that for request sockets the sk->sk_reuseport field
may yet be uninitialized, which caused bpf_sk_assign() to randomly
succeed or return -ESOCKTNOSUPPORT when handling the forward ACK in a
three-way handshake.
Fix it by only applying the reuseport check for full sockets.
Fixes: cf7fbe660f2d ("bpf: Add socket assign support")
Signed-off-by: Joe Stringer <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
Fixes CT entries list corruption.
After allowing parallel insertion/removals in upper nf flow table
layer, unprotected ct entries list can be corrupted by parallel add/del
on the same flow table.
CT entries list is only used while freeing a ct zone flow table to
go over all the ct entries offloaded on that zone/table, and flush
the table.
As rhashtable already provides an api to go over all the inserted entries,
fix the race by using the rhashtable iteration instead, and remove the list.
Fixes: 7da182a998d6 ("netfilter: flowtable: Use work entry per offload command")
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In cited commit netdevice is registered after devlink port.
Unregistration flow should be mirror sequence of registration flow.
Hence, unregister netdevice before devlink port.
Fixes: 31e87b39ba9d ("net/mlx5e: Fix devlink port register sequence")
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Cited patch missed to extract PCI pf number accurately for PF and VF
port flavour. It considered PCI device + function number.
Due to this, device having non zero device number shown large pfnum.
Hence, use only PCI function number; to avoid similar errors, derive
pfnum one time for all port flavours.
Fixes: f60f315d339e ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs")
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
With ct clear action we should not allocate the action in hw
and not release the mod_acts parsed in advance.
It will be done when handling the ct clear action.
Fixes: 1ef3018f5af3 ("net/mlx5e: CT: Support clear action")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Paul Blakey <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Current value of nest_level, assigned from net_device lower_level value,
does not reflect the actual number of vlan headers, needed to pop.
For ex., if we have untagged ingress traffic sended over vlan devices,
instead of one pop action, driver will perform two pop actions.
To fix that, calculate nest_level as difference between vlan device and
parent device lower_levels.
Fixes: f3b0a18bb6cb ("net: remove unnecessary variables and callback")
Signed-off-by: Dmytro Linkin <[email protected]>
Signed-off-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Once driver finishes flashing the firmware image, it should release it.
Fixes: 9c8bca2637b8 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <[email protected]>
Reviewed-by: Aya Levin <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When we destroy rules from slow path we need to avoid destroying
termination tables since termination tables are never created in slow
path. By doing so we avoid destroying the termination table created for the
slow path.
Fixes: d8a2034f152a ("net/mlx5: Don't use termination tables in slow path")
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Oz Shlomo <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
High frequency of PCI ioread calls during recovery flow may cause the
following trace on powerpc:
[ 248.670288] EEH: 2100000 reads ignored for recovering device at
location=Slot1 driver=mlx5_core pci addr=0000:01:00.1
[ 248.670331] EEH: Might be infinite loop in mlx5_core driver
[ 248.670361] CPU: 2 PID: 35247 Comm: kworker/u192:11 Kdump: loaded
Tainted: G OE ------------ 4.14.0-115.14.1.el7a.ppc64le #1
[ 248.670425] Workqueue: mlx5_health0000:01:00.1 health_recover_work
[mlx5_core]
[ 248.670471] Call Trace:
[ 248.670492] [c00020391c11b960] [c000000000c217ac] dump_stack+0xb0/0xf4
(unreliable)
[ 248.670548] [c00020391c11b9a0] [c000000000045818]
eeh_check_failure+0x5c8/0x630
[ 248.670631] [c00020391c11ba50] [c00000000068fce4]
ioread32be+0x114/0x1c0
[ 248.670692] [c00020391c11bac0] [c00800000dd8b400]
mlx5_error_sw_reset+0x160/0x510 [mlx5_core]
[ 248.670752] [c00020391c11bb60] [c00800000dd75824]
mlx5_disable_device+0x34/0x1d0 [mlx5_core]
[ 248.670822] [c00020391c11bbe0] [c00800000dd8affc]
health_recover_work+0x11c/0x3c0 [mlx5_core]
[ 248.670891] [c00020391c11bc80] [c000000000164fcc]
process_one_work+0x1bc/0x5f0
[ 248.670955] [c00020391c11bd20] [c000000000167f8c]
worker_thread+0xac/0x6b0
[ 248.671015] [c00020391c11bdc0] [c000000000171618] kthread+0x168/0x1b0
[ 248.671067] [c00020391c11be30] [c00000000000b65c]
ret_from_kernel_thread+0x5c/0x80
Reduce the PCI ioread frequency during recovery by using msleep()
instead of cond_resched()
Fixes: 3e5b72ac2f29 ("net/mlx5: Issue SW reset on FW assert")
Signed-off-by: Moshe Shemesh <[email protected]>
Reviewed-by: Feras Daoud <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
fixes unused variable warning.
Reported-by: Eric Biggers <[email protected]>
Acked-by: Christian König <[email protected]>
Reviewed-by: Mikita Lipski <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
Make the fw_write_wait default case true since presumably all new
gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM
packet with opration=1.
Signed-off-by: Aaron Liu <[email protected]>
Tested-by: Aaron Liu <[email protected]>
Tested-by: Yuxian Dai <[email protected]>
Acked-by: Alex Deucher <[email protected]>
Acked-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
For Arcturus, forcing clock to some specific level is not supported
with 54.18 and onwards SMU firmware. As according to firmware team,
they adopt new gfx dpm tuned parameters which can cover all the use
case in a much smooth way. Thus setting through driver interface
is not needed and maybe do a disservice.
Signed-off-by: Evan Quan <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
|
|
The system will be hang up during S3 suspend because of SMU is pending
for GC not respose the register CP_HQD_ACTIVE access request.This issue
root cause of accessing the GC register under enter GFX CGGPG and can
be fixed by disable GFX CGPG before perform suspend.
v2: Use disable the GFX CGPG instead of RLC safe mode guard.
Signed-off-by: Prike Liang <[email protected]>
Tested-by: Mengbing Wang <[email protected]>
Reviewed-by: Huang Rui <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Cc: [email protected]
|
|
Building with some experimental patches, I came across a warning
in the tls code:
include/linux/compiler.h:215:30: warning: assignment discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
215 | *(volatile typeof(x) *)&(x) = (val); \
| ^
net/tls/tls_main.c:650:4: note: in expansion of macro 'smp_store_release'
650 | smp_store_release(&saved_tcpv4_prot, prot);
This appears to be a legitimate warning about assigning a const pointer
into the non-const 'saved_tcpv4_prot' global. Annotate both the ipv4 and
ipv6 pointers 'const' to make the code internally consistent.
Fixes: 5bb4c45d466c ("net/tls: Read sk_prot once when building tls proto ops")
Signed-off-by: Arnd Bergmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Creation and management of L2TPv3 tunnels and session through netlink
requires CAP_NET_ADMIN. However, a process with CAP_NET_ADMIN in a
non-initial user namespace gets an EPERM due to the use of the
genetlink GENL_ADMIN_PERM flag. Thus, management of L2TP VPNs inside
an unprivileged container won't work.
We replaced the GENL_ADMIN_PERM by the GENL_UNS_ADMIN_PERM flag
similar to other network modules which also had this problem, e.g.,
openvswitch (commit 4a92602aa1cd "openvswitch: allow management from
inside user namespaces") and nl80211 (commit 5617c6cd6f844 "nl80211:
Allow privileged operations from user namespaces").
I tested this in the container runtime trustm3 (trustm3.github.io)
and was able to create l2tp tunnels and sessions in unpriviliged
(user namespaced) containers using a private network namespace.
For other runtimes such as docker or lxc this should work, too.
Signed-off-by: Michael Weiß <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Shannon Nelson says:
====================
ionic: fw upgrade filter fixes
With further testing of the fw-upgrade operations we found a
couple of issues that needed to be cleaned up:
- the filters other than the base MAC address need to be
reinstated into the device
- we don't need to remove the station MAC filter if it
isn't changing from a previous MAC filter
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
The code was working too hard and in some cases was trying to
delete filters that weren't there, generating a potentially
misleading error message.
IONIC_CMD_RX_FILTER_DEL (32) failed: IONIC_RC_ENOENT (-2)
Fixes: 2a654540be10 ("ionic: Add Rx filter and rx_mode ndo support")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The NIC's filters are lost in the midst of the fw-upgrade
so we need to replay them into the FW. We also remove the
unused ionic_rx_filter_del() function.
Fixes: c672412f6172 ("ionic: remove lifs on fw reset")
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The commit 2e05ea5cdc1a ("dma-mapping: implement dma_map_single_attrs using
dma_map_page_attrs") removed "dma_debug_page" enum, but missed to update
type2name string table. This causes incorrect displaying of dma allocation
type.
Fix it by removing "page" string from type2name string table and switch to
use named initializers.
Before (dma_alloc_coherent()):
k3-ringacc 4b800000.ringacc: scather-gather idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
k3-ringacc 4b800000.ringacc: scather-gather idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
After:
k3-ringacc 4b800000.ringacc: coherent idx 2208 P=d1140000 N=d114 D=d1140000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
k3-ringacc 4b800000.ringacc: coherent idx 2216 P=d1150000 N=d115 D=d1150000 L=40 DMA_BIDIRECTIONAL dma map error check not applicable
Fixes: 2e05ea5cdc1a ("dma-mapping: implement dma_map_single_attrs using dma_map_page_attrs")
Signed-off-by: Grygorii Strashko <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
The upper 32-bit physical address gets truncated inadvertently
when dma_direct_get_required_mask() invokes phys_to_dma_direct().
This results in dma_addressing_limited() return incorrect value
when used in platforms with LPAE enabled.
Fix it here by explicitly type casting 'max_pfn' to phys_addr_t
in order to prevent overflow of intermediate value while evaluating
'(max_pfn - 1) << PAGE_SHIFT'.
Signed-off-by: Kishon Vijay Abraham I <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
As Documentation/kbuild/llvm.rst implies, building the kernel with a
full set of LLVM tools gets very verbose and unwieldy.
Provide a single switch LLVM=1 to use Clang and LLVM tools instead
of GCC and Binutils. You can pass it from the command line or as an
environment variable.
Please note LLVM=1 does not turn on the integrated assembler. You need
to pass LLVM_IAS=1 to use it. When the upstream kernel is ready for the
integrated assembler, I think we can make it default.
We discussed what we need, and we agreed to go with a simple boolean
flag that switches both target and host tools:
https://lkml.org/lkml/2020/3/28/494
https://lkml.org/lkml/2020/4/3/43
Some items discussed, but not adopted:
- LLVM_DIR
When multiple versions of LLVM are installed, I just thought supporting
LLVM_DIR=/path/to/my/llvm/bin/ might be useful.
CC = $(LLVM_DIR)clang
LD = $(LLVM_DIR)ld.lld
...
However, we can handle this by modifying PATH. So, we decided to not do
this.
- LLVM_SUFFIX
Some distributions (e.g. Debian) package specific versions of LLVM with
naming conventions that use the version as a suffix.
CC = clang$(LLVM_SUFFIX)
LD = ld.lld(LLVM_SUFFIX)
...
will allow a user to pass LLVM_SUFFIX=-11 to use clang-11 etc.,
but the suffixed versions in /usr/bin/ are symlinks to binaries in
/usr/lib/llvm-#/bin/, so this can also be handled by PATH.
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]> # build
Tested-by: Nick Desaulniers <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
|
|
The 'AS' variable is unused for building the kernel. Only the remaining
usage is to turn on the integrated assembler. A boolean flag is a better
fit for this purpose.
AS=clang was added for experts. So, I replaced it with LLVM_IAS=1,
breaking the backward compatibility.
Suggested-by: Nick Desaulniers <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu updates from Joerg Roedel:
- ARM-SMMU support for the TLB range invalidation command in SMMUv3.2
- ARM-SMMU introduction of command batching helpers to batch up CD and
ATC invalidation
- ARM-SMMU support for PCI PASID, along with necessary PCI symbol
exports
- Introduce a generic (actually rename an existing) IOMMU related
pointer in struct device and reduce the IOMMU related pointers
- Some fixes for the OMAP IOMMU driver to make it build on 64bit
architectures
- Various smaller fixes and improvements
* tag 'iommu-updates-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (39 commits)
iommu: Move fwspec->iommu_priv to struct dev_iommu
iommu/virtio: Use accessor functions for iommu private data
iommu/qcom: Use accessor functions for iommu private data
iommu/mediatek: Use accessor functions for iommu private data
iommu/renesas: Use accessor functions for iommu private data
iommu/arm-smmu: Use accessor functions for iommu private data
iommu/arm-smmu: Refactor master_cfg/fwspec usage
iommu/arm-smmu-v3: Use accessor functions for iommu private data
iommu: Introduce accessors for iommu private data
iommu/arm-smmu: Fix uninitilized variable warning
iommu: Move iommu_fwspec to struct dev_iommu
iommu: Rename struct iommu_param to dev_iommu
iommu/tegra-gart: Remove direct access of dev->iommu_fwspec
drm/msm/mdp5: Remove direct access of dev->iommu_fwspec
ACPI/IORT: Remove direct access of dev->iommu_fwspec
iommu: Define dev_iommu_fwspec_get() for !CONFIG_IOMMU_API
iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE
iommu/virtio: Fix freeing of incomplete domains
iommu/virtio: Fix sparse warning
iommu/vt-d: Add build dependency on IOASID
...
|
|
Pull more kvm updates from Paolo Bonzini:
"s390:
- nested virtualization fixes
x86:
- split svm.c
- miscellaneous fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: VMX: fix crash cleanup when KVM wasn't used
KVM: X86: Filter out the broadcast dest for IPI fastpath
KVM: s390: vsie: Fix possible race when shadowing region 3 tables
KVM: s390: vsie: Fix delivery of addressing exceptions
KVM: s390: vsie: Fix region 1 ASCE sanity shadow address checks
KVM: nVMX: don't clear mtf_pending when nested events are blocked
KVM: VMX: Remove unnecessary exception trampoline in vmx_vmenter
KVM: SVM: Split svm_vcpu_run inline assembly to separate file
KVM: SVM: Move SEV code to separate file
KVM: SVM: Move AVIC code to separate file
KVM: SVM: Move Nested SVM Implementation to nested.c
kVM SVM: Move SVM related files to own sub-directory
|
|
Pull virtio updates from Michael Tsirkin:
- Some bug fixes
- The new vdpa subsystem with two first drivers
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-balloon: Revert "virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM"
vdpa: move to drivers/vdpa
virtio: Intel IFC VF driver for VDPA
vdpasim: vDPA device simulator
vhost: introduce vDPA-based backend
virtio: introduce a vDPA based transport
vDPA: introduce vDPA bus
vringh: IOTLB support
vhost: factor out IOTLB
vhost: allow per device message handler
vhost: refine vhost and vringh kconfig
virtio-balloon: Switch back to OOM handler for VIRTIO_BALLOON_F_DEFLATE_ON_OOM
virtio-net: Introduce hash report feature
virtio-net: Introduce RSS receive steering feature
virtio-net: Introduce extended RSC feature
tools/virtio: option to build an out of tree module
|
|
For thumb instructions, call_undef_hook() in traps.c first reads a u16,
and if the u16 indicates a T32 instruction (u16 >= 0xe800), a second
u16 is read, which then makes up the the lower half-word of a T32
instruction. For T16 instructions, the second u16 is not read,
which makes the resulting u32 opcode always have the upper half set to
0.
However, having the upper half of instr_mask in the undef_hook set to 0
masks out the upper half of all thumb instructions - both T16 and T32.
This results in trapped T32 instructions with the lower half-word equal
to the T16 encoding of setend (b650) being matched, even though the upper
half-word is not 0000 and thus indicates a T32 opcode.
An example of such a T32 instruction is eaa0b650, which should raise a
SIGILL since T32 instructions with an eaa prefix are unallocated as per
Arm ARM, but instead works as a SETEND because the second half-word is set
to b650.
This patch fixes the issue by extending instr_mask to include the
upper u32 half, which will still match T16 instructions where the upper
half is 0, but not T32 instructions.
Fixes: 2d888f48e056 ("arm64: Emulate SETEND for AArch32 tasks")
Cc: <[email protected]> # 4.0.x-
Reviewed-by: Suzuki K Poulose <[email protected]>
Signed-off-by: Fredrik Strupe <[email protected]>
Signed-off-by: Catalin Marinas <[email protected]>
|
|
Whenever we add a ticket to a space_info object we increment the object's
reclaim_size counter witht the ticket's bytes, and we decrement it with
the corresponding amount only when we are able to grant the requested
space to the ticket. When we are not able to grant the space to a ticket,
or when the ticket is removed due to a signal (e.g. an application has
received sigterm from the terminal) we never decrement the counter with
the corresponding bytes from the ticket. This leak can result in the
space reclaim code to later do much more work than necessary. So fix it
by decrementing the counter when those two cases happen as well.
Fixes: db161806dc5615 ("btrfs: account ticket size at add/delete time")
Reviewed-by: Nikolay Borisov <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
This is a revert of commit 0a8068a3dd4294 ("btrfs: make ranged full
fsyncs more efficient"), with updated comment in btrfs_sync_file.
Commit 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient")
made full fsyncs operate on the given range only as it assumed it was safe
when using the NO_HOLES feature, since the hole detection was simplified
some time ago and no longer was a source for races with ordered extent
completion of adjacent file ranges.
However it's still not safe to have a full fsync only operate on the given
range, because extent maps for new extents might not be present in memory
due to inode eviction or extent cloning. Consider the following example:
1) We are currently at transaction N;
2) We write to the file range [0, 1MiB);
3) Writeback finishes for the whole range and ordered extents complete,
while we are still at transaction N;
4) The inode is evicted;
5) We open the file for writing, causing the inode to be loaded to
memory again, which sets the 'full sync' bit on its flags. At this
point the inode's list of modified extent maps is empty (figuring
out which extents were created in the current transaction and were
not yet logged by an fsync is expensive, that's why we set the
'full sync' bit when loading an inode);
6) We write to the file range [512KiB, 768KiB);
7) We do a ranged fsync (such as msync()) for file range [512KiB, 768KiB).
This correctly flushes this range and logs its extent into the log
tree. When the writeback started an extent map for range [512KiB, 768KiB)
was added to the inode's list of modified extents, and when the fsync()
finishes logging it removes that extent map from the list of modified
extent maps. This fsync also clears the 'full sync' bit;
8) We do a regular fsync() (full ranged). This fsync() ends up doing
nothing because the inode's list of modified extents is empty and
no other changes happened since the previous ranged fsync(), so
it just returns success (0) and we end up never logging extents for
the file ranges [0, 512KiB) and [768KiB, 1MiB).
Another scenario where this can happen is if we replace steps 2 to 4 with
cloning from another file into our test file, as that sets the 'full sync'
bit in our inode's flags and does not populate its list of modified extent
maps.
This was causing test case generic/457 to fail sporadically when using the
NO_HOLES feature, as it exercised this later case where the inode has the
'full sync' bit set and has no extent maps in memory to represent the new
extents due to extent cloning.
Fix this by reverting commit 0a8068a3dd4294 ("btrfs: make ranged full fsyncs
more efficient") since there is no easy way to work around it.
Fixes: 0a8068a3dd4294 ("btrfs: make ranged full fsyncs more efficient")
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
When not using the NO_HOLES feature we were not marking the destination's
file range as written after cloning an inline extent into it. This can
lead to a data loss if the current destination file size is smaller than
the source file's size.
Example:
$ mkfs.btrfs -f -O ^no-holes /dev/sdc
$ mount /mnt/sdc /mnt
$ echo "hello world" > /mnt/foo
$ cp --reflink=always /mnt/foo /mnt/bar
$ rm -f /mnt/foo
$ umount /mnt
$ mount /mnt/sdc /mnt
$ cat /mnt/bar
$
$ stat -c %s /mnt/bar
0
# -> the file is empty, since we deleted foo, the data lost is forever
Fix that by calling btrfs_inode_set_file_extent_range() after cloning an
inline extent.
A test case for fstests will follow soon.
Link: https://lore.kernel.org/linux-btrfs/20200404193846.GA432065@latitude/
Reported-by: Johannes Hirte <[email protected]>
Fixes: 9ddc959e802bf ("btrfs: use the file extent tree infrastructure")
Tested-by: Johannes Hirte <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Previously we would set the reloc root's last snapshot to transid - 1.
However there was a problem with doing this, and we changed it to
setting the last snapshot to the generation of the commit node of the fs
root.
This however broke should_ignore_root(). The assumption is that if we
are in a generation newer than when the reloc root was created, then we
would find the reloc root through normal backref lookups, and thus can
ignore any fs roots we find with an old enough reloc root.
Now that the last snapshot could be considerably further in the past
than before, we'd end up incorrectly ignoring an fs root. Thus we'd
find no nodes for the bytenr we were searching for, and we'd fail to
relocate anything. We'd loop through the relocate code again and see
that there were still used space in that block group, attempt to
relocate those bytenr's again, fail in the same way, and just loop like
this forever. This is tricky in that we have to not modify the fs root
at all during this time, so we need to have a block group that has data
in this fs root that is not shared by any other root, which is why this
has been difficult to reproduce.
Fixes: 054570a1dc94 ("Btrfs: fix relocation incorrectly dropping data references")
CC: [email protected] # 4.9+
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
Signed-off-by: Mike Marshall <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.7
A collection of fixes that have been accumilated since the merge window,
mainly relating to x86 platform support.
|
|
__get_user_pages_locked() will return 0 instead of -EINTR after commit
4426e945df588 ("mm/gup: allow VM_FAULT_RETRY for multiple times") which
added extra code to allow gup detect fatal signal faster.
Restore the original -EINTR behavior.
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Fixes: 4426e945df58 ("mm/gup: allow VM_FAULT_RETRY for multiple times")
Reported-by: [email protected]
Signed-off-by: Hillf Danton <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Peter Xu <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Don't re-read userspace-shared sqe->flags, it can be exploited.
sqe->flags are copied into req->flags in io_submit_sqe(), check them
there instead.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
io_get_req() do two different things: io_kiocb allocation and
initialisation. Move init part out of it and rename into
io_alloc_req(). It's simpler this way and also have better data
locality.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
As io_get_sqe() split into 2 stage get/consume, get an sqe before
allocating io_kiocb, so no free_req*() for a failure case is needed,
and inline back __io_req_do_free(), which has only 1 user.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Make io_get_sqring() care only about sqes themselves, not initialising
the io_kiocb. Also, split it into get + consume, that will be helpful in
the future.
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In io_read_prep() or io_write_prep(), io_req_map_rw() takes
struct io_async_rw's fast_iov as argument to call io_import_iovec(),
and if io_import_iovec() uses struct io_async_rw's fast_iov as
valid iovec array, later indeed io_req_map_rw() does not need
to do the memcpy operation, because they are same pointers.
Signed-off-by: Xiaoguang Wang <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
OPENAT2 correctly sets O_LARGEFILE if it has to, but that escaped the
OPENAT opcode. Dmitry reports that his test case that compares openat()
and IORING_OP_OPENAT sees failures on large files:
*** sync openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write succeeded
*** sync openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write succeeded
*** io_uring openat
openat succeeded
sync write at offset 0
write succeeded
sync write at offset 4294967296
write failed: File too large
*** io_uring openat
openat succeeded
io_uring write at offset 0
write succeeded
io_uring write at offset 4294967296
write failed: File too large
Ensure we set O_LARGEFILE, if force_o_largefile() is true.
Cc: [email protected] # v5.6
Fixes: 15b71abe7b52 ("io_uring: add support for IORING_OP_OPENAT")
Reported-by: Dmitry Kadashev <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Staring v4.18, Kconfig evaluates compiler capabilities, and hides CONFIG
options your compiler does not support. This works well if you configure
and build the kernel on the same host machine.
It is inconvenient if you prepare the .config that is carried to a
different build environment (typically this happens when you package
the kernel for distros) because using a different compiler potentially
produces different CONFIG options than the real build environment.
So, you probably want to make as many options visible as possible.
In other words, you need to create a super-set of CONFIG options that
cover any build environment. If some of the CONFIG options turned out
to be unsupported on the build machine, they are automatically disabled
by the nature of Kconfig.
However, it is not feasible to get a full-featured compiler for every
arch.
This issue was discussed here:
https://lkml.org/lkml/2019/12/9/620
Other than distros, savedefconfig is also a problem. Some arch sub-systems
periodically resync defconfig files. If you use a less-capable compiler
for savedefconfig, options that do not meet 'depends on $(cc-option,...)'
will be forcibly disabled. So, 'make defconfig && make savedefconfig'
may silently change the behavior.
This commit adds a set of dummy toolchains that pretend to support any
feature.
Most of compiler features are tested by cc-option, which simply checks
the exit code of $(CC). The dummy tools are shell scripts that always
exit with 0. So, $(cc-option, ...) is evaluated as 'y'.
There are more complicated checks such as:
scripts/gcc-x86_{32,64}-has-stack-protector.sh
scripts/gcc-plugin.sh
scripts/tools-support-relr.sh
scripts/dummy-tools/gcc passes all checks.
From the top directory of the source tree, you can do:
$ make CROSS_COMPILE=scripts/dummy-tools/ oldconfig
Signed-off-by: Masahiro Yamada <[email protected]>
Reviewed-by: Philipp Rudo <[email protected]>
Tested-by: Jeremy Cline <[email protected]>
|
|
Kbuild supports not only obj-y but also lib-y to list objects linked to
vmlinux.
The difference between them is that all the objects from obj-y are
forcibly linked to vmlinux, whereas the objects from lib-y are linked
as needed; if there is no user of a lib-y object, it is not linked.
lib-y is intended to list utility functions that may be called from all
over the place (and may be unused at all), but it is a problem for
EXPORT_SYMBOL(). Even if there is no call-site in the vmlinux, we need
to keep exported symbols for the use from loadable modules.
Commit 7f2084fa55e6 ("[kbuild] handle exports in lib-y objects reliably")
worked around it by linking a dummy object, lib-ksyms.o, which contains
references to all the symbols exported from lib.a in that directory.
It uses the linker script command, EXTERN. Unfortunately, the meaning of
EXTERN of ld.lld is different from that of ld.bfd. Therefore, this does
not work with LD=ld.lld (CBL issue #515).
Anyway, the build rule of lib-ksyms.o is somewhat tricky. So, I want to
get rid of it.
At first, I was thinking of accumulating lib-y objects into obj-y
(or even replacing lib-y with obj-y entirely), but the lib-y syntax
is used beyond the ordinary use in lib/ and arch/*/lib/.
Examples:
- drivers/firmware/efi/libstub/Makefile builds lib.a, which is linked
into vmlinux in the own way (arm64), or linked to the decompressor
(arm, x86).
- arch/alpha/lib/Makefile builds lib.a which is linked not only to
vmlinux, but also to bootloaders in arch/alpha/boot/Makefile.
- arch/xtensa/boot/lib/Makefile builds lib.a for use from
arch/xtensa/boot/boot-redboot/Makefile.
One more thing, adding everything to obj-y would increase the vmlinux
size of allnoconfig (or tinyconfig).
For less impact, I tweaked the destination of lib.a at the top Makefile;
when CONFIG_MODULES=y, lib.a goes to KBUILD_VMLINUX_OBJS, which is
forcibly linked to vmlinux, otherwise lib.a goes to KBUILD_VMLINUX_LIBS
as before.
The size impact for normal usecases is quite small since at lease one
symbol in every lib-y object is eventually called by someone. In case
you are intrested, here are the figures.
x86_64_defconfig:
text data bss dec hex filename
19566602 5422072 1589328 26578002 1958c52 vmlinux.before
19566932 5422104 1589328 26578364 1958dbc vmlinux.after
The case with the biggest impact is allnoconfig + CONFIG_MODULES=y.
ARCH=x86 allnoconfig + CONFIG_MODULES=y:
text data bss dec hex filename
1175162 254740 1220608 2650510 28718e vmlinux.before
1177974 254836 1220608 2653418 287cea vmlinux.after
Hopefully this is still not a big deal. The per-file trimming with the
static library is not so effective after all.
If fine-grained optimization is desired, some architectures support
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION, which trims dead code per-symbol
basis. When LTO is supported in mainline, even better optimization will
be possible.
Link: https://github.com/ClangBuiltLinux/linux/issues/515
Signed-off-by: Masahiro Yamada <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
|
|
As far as I understood, prom_meminit() in arch/mips/fw/arc/memory.c
is overridden by the one in arch/mips/sgi-ip32/ip32-memory.c if
CONFIG_SGI_IP32 is enabled.
The use of EXPORT_SYMBOL in static libraries potentially causes a
problem for the llvm linker [1]. So, I want to forcibly link lib-y
objects to vmlinux when CONFIG_MODULES=y.
As a groundwork, we must fix multiple definitions that have previously
been hidden by lib-y.
The prom_cleanup() in this file is already marked as __weak (because
it is overridden by the one in arch/mips/sgi-ip22/ip22-mc.c).
I think it should be OK to do the same for these two.
[1]: https://github.com/ClangBuiltLinux/linux/issues/515
Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Masahiro Yamada <[email protected]>
Acked-By: Thomas Bogendoerfer <[email protected]>
|