| Age | Commit message (Collapse) | Author | Files | Lines |
|
John David Anglin reported parisc has been broken since commit
ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost").
Like ia64, parisc64 uses a function descriptor. The function
references must be prefixed with P%.
Also, symbols prefixed $$ from the library have the symbol type
STT_LOPROC instead of STT_FUNC. They should be handled as functions
too.
Fixes: ddb5cdbafaaa ("kbuild: generate KSYMTAB entries by modpost")
Reported-by: John David Anglin <[email protected]>
Tested-by: John David Anglin <[email protected]>
Tested-by: Helge Deller <[email protected]>
Closes: https://lore.kernel.org/linux-parisc/[email protected]/T/#u
Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
When SVE is enabled, the host may set bit 16 in SMCCC function IDs, a
hint that indicates an unused SVE state. At the moment NVHE doesn't
account for this bit when inspecting the function ID, and rejects most
calls. Clear the hint bit before comparing function IDs.
About version compatibility: the host's PSCI driver initially probes the
firmware for a SMCCC version number. If the firmware implements a
protocol recent enough (1.3), subsequent SMCCC calls have the hint bit
set. Since the hint bit was reserved in earlier versions of the
protocol, clearing it is fine regardless of the version in use.
When a new hint is added to the protocol in the future, it will be added
to ARM_SMCCC_CALL_HINTS and NVHE will handle it straight away. This
patch only clears known hints and leaves reserved bits as is, because
future SMCCC versions could use reserved bits as modifiers for the
function ID, rather than hints.
Fixes: cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
Reported-by: Ben Horgan <[email protected]>
Signed-off-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Marc Zyngier <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Commit 0f3a8c3f34f7 ("iio: Add support for creating IIO devices via configfs")
declared but never implemented iio_sw_device_type_configfs_{un}register().
Commit b662f809d410 ("iio: core: Introduce IIO software triggers") declared but
never implemented iio_sw_trigger_type_configfs_{un}register().
Commit a3e0b51884ee ("iio: accel: add support for FXLS8962AF/FXLS8964AF accelerometers")
declared but never implemented fxls8962af_core_remove().
Commit 8dedcc3eee3a ("iio: core: centralize ioctl() calls to the main chardev")
declared but never implemented iio_device_ioctl().
Commit d430f3c36ca6 ("iio: imu: inv_mpu6050: Use regmap instead of i2c specific functions")
removed inv_mpu6050_write_reg() but not its declaration.
Signed-off-by: Yue Haibing <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jonathan Cameron <[email protected]>
|
|
Add support for the WMI methods used to turn off and adjust the
brightness of the secondary "screenpad" device found on some high-end
ASUS laptops like the GX650P series and others.
There are some small quirks with this device when considering only the
raw WMI methods:
1. The Off method can only switch the device off
2. Changing the brightness turns the device back on
3. To turn the device back on the brightness must be > 1
4. When the device is off the brightness can't be changed (so it is
stored by the driver if device is off).
5. Booting with a value of 0 brightness (retained by bios) means the bios
will set a value of >0 <15
6. When the device is off it is "unplugged"
asus_wmi sets the minimum brightness as 20 in general use, and 60 for
booting with values <= min.
The ACPI methods are used in a new backlight device named asus_screenpad.
Signed-off-by: Luke D. Jones <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Hans de Goede <[email protected]>
Signed-off-by: Hans de Goede <[email protected]>
|
|
It makes sense for a GPIO driver to want to get its own descriptor
without requesting it. After all, the driver knows that it'll still be
valid. Let's move this helper to linux/gpio/driver.h.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Acked-by: Linus Walleij <[email protected]>
|
|
printbuf now needs to know the number of characters that would have been
written if the buffer was too small, like snprintf(); this changes
string_get_size() to return the the return value of snprintf().
Signed-off-by: Kent Overstreet <[email protected]>
|
|
New helper for bcachefs - bcachefs doesn't want the
inode_dec_link_count() call that d_tmpfile does, it handles i_nlink on
its own atomically with other btree updates
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: [email protected]
Reviewed-by: Darrick J. Wong <[email protected]>
Reviewed-by: Christian Brauner <[email protected]>
|
|
There has been a long standing page cache coherence bug with direct IO.
This provides part of a mechanism to fix it, currently just used by
bcachefs but potentially worth promoting to the VFS.
Direct IO evicts the range of the pagecache being read or written to.
For reads, we need dirty pages to be written to disk, so that the read
doesn't return stale data. For writes, we need to evict that range of
the pagecache so that it's not stale after the write completes.
However, without a locking mechanism to prevent those pages from being
re-added to the pagecache - by a buffered read or page fault - page
cache inconsistency is still possible.
This isn't necessarily just an issue for userspace when they're playing
games; filesystems may hang arbitrary state off the pagecache, and so
page cache inconsistency may cause real filesystem bugs, depending on
the filesystem. This is less of an issue for iomap based filesystems,
but e.g. buffer heads caches disk block mappings (!) and attaches them
to the pagecache, and bcachefs attaches disk reservations to pagecache
pages.
This issue has been hard to fix, because
- we need to add a lock (henceforth called pagecache_add_lock), which
would be held for the duration of the direct IO
- page faults add pages to the page cache, thus need to take the same
lock
- dio -> gup -> page fault thus can deadlock
And we cannot enforce a lock ordering with this lock, since userspace
will be controlling the lock ordering (via the fd and buffer arguments
to direct IOs), so we need a different method of deadlock avoidance.
We need to tell the page fault handler that we're already holding a
pagecache_add_lock, and since plumbing it through the entire gup() path
would be highly impractical this adds a field to task_struct.
Then the full method is:
- in the dio path, when we first take the pagecache_add_lock, note the
mapping in the current task_struct
- in the page fault handler, if faults_disabled_mapping is set, we
check if it's the same mapping as the one we're taking a page fault
for, and if so return an error.
Then we check lock ordering: if there's a lock ordering violation and
trylock fails, we'll have to cycle the locks and return an error that
tells the DIO path to retry: faults_disabled_mapping is also used for
signalling "locks were dropped, please retry".
Also relevant to this patch: mapping->invalidate_lock.
mapping->invalidate_lock provides most of the required semantics - it's
used by truncate/fallocate to block pages being added to the pagecache.
However, since it's a rwsem, direct IOs would need to take the write
side in order to block page cache adds, and would then be exclusive with
each other - we'll need a new type of lock to pair with this approach.
Signed-off-by: Kent Overstreet <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Darrick J. Wong <[email protected]>
Cc: [email protected]
Cc: Andreas Grünbacher <[email protected]>
|
|
Export and move the declaration of pcie_aer_is_native() to a common header
file to be reused by cxl/pci module.
Signed-off-by: Smita Koralahalli <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Reviewed-by: Kuppuswamy Sathyanarayanan <[email protected]>
Reviewed-by: Robert Richter <[email protected]>
Reviewed-by: Jonathan Cameron <[email protected]>
Reviewed-by: Dave Jiang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dan Williams <[email protected]>
|
|
To make handling BIG and LITTLE endian better the offset/len of dynamic
fields of the synthetic events was changed into a structure of:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 offset;
u16 len;
#else
u16 len;
u16 offset;
#endif
};
to replace the manual changes of:
data_offset = offset & 0xffff;
data_offest = len << 16;
But if you look closely, the above is:
<len> << 16 | offset
Which in little endian would be in memory:
offset_lo offset_hi len_lo len_hi
and in big endian:
len_hi len_lo offset_hi offset_lo
Which if broken into a structure would be:
struct trace_dynamic_info {
#ifdef CONFIG_CPU_BIG_ENDIAN
u16 len;
u16 offset;
#else
u16 offset;
u16 len;
#endif
};
Which is the opposite of what was defined.
Fix this and just to be safe also add "__packed".
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: [email protected]
Cc: Mark Rutland <[email protected]>
Tested-by: Sven Schnelle <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Fixes: ddeea494a16f3 ("tracing/synthetic: Use union instead of casts")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
It is sometimes helpful to have a way for the subsystem causing
the stall to dump its state when an RCU CPU stall occurs. This
commit therefore bases rcu_stall_chain_notifier_register() and
rcu_stall_chain_notifier_unregister() on atomic notifiers in order to
provide this functionality.
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Steven Rostedt <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
There are no users of the cpu hotplug hooks in xfs now, so remove it.
This reverts f1653c2e2831e ("xfs: introduce CPU hotplug
infrastructure").
Signed-off-by: Darrick J. Wong <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
|
|
Add 'const' to the definition of the 'trip' argument of the
.get_trend() thermal zone callback to indicate that the trip point
passed to it should not be modified by it and adjust the
callback functions implementing it, thermal_get_trend() in the
ACPI thermal driver and __ti_thermal_get_trend(), accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Michal Wilczynski <[email protected]>
|
|
The interfaces for the fbdev logo are not used outside of the fbdev
module. Hence declare the fbdev logo functions in the internal header
file and remove their symbol exports. Only build the functions if
CONFIG_LOGO has been selected.
Signed-off-by: Thomas Zimmermann <[email protected]>
Acked-by: Javier Martinez Canillas <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Before exporting these helpers to modules, make their names more
meaningful.
The names mnt_{get,put)_write_access*() were chosen, because they rhyme
with the inode {get,put)_write_access() helpers, which have a very close
meaning for the inode object.
Suggested-by: Christian Brauner <[email protected]>
Link: https://lore.kernel.org/r/20230817-anfechtbar-ruhelosigkeit-8c6cca8443fc@brauner/
Signed-off-by: Amir Goldstein <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
1. Following the ARM SMC32 calling convention, the return value
from secure monitor is a 32-bit signed integer. This patch changes
the type of the return value of the function meson_sm_call().
2. Now, when meson_sm_call() returns a 32-bit signed integer, we need
to ensure that this value is not negative. It is important to check
that the return value is not negative in both the meson_sm_call_read()
and meson_sm_call_write() functions.
3. Add a comment explaining why it is necessary to check if the SMC
return value is equal to 0 in the function meson_sm_call_read().
It is not obvious when reading this code.
Signed-off-by: Alexey Romanov <[email protected]>
Reviewed-by: Neil Armstrong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Neil Armstrong <[email protected]>
|
|
Currently struct ieee80211_tim_ie defines:
u8 virtual_map[1];
Per the guidance in [1] change this to be a flexible array.
Per the discussion in [2] wrap the virtual_map in a union with a u8
item in order to preserve the existing expectation that the
virtual_map must contain at least one octet (at least when used in a
non-S1G PPDU). This means that no driver changes are required.
[1] https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays
[2] https://lore.kernel.org/linux-wireless/202308301529.AC90A9EF98@keescook/
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Jeff Johnson <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[add wifi prefix]
Signed-off-by: Johannes Berg <[email protected]>
|
|
'extern' doesn't do anything for function declarations. Remove it.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
|
|
Fix a double newline in the GPIO provider header.
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
There are no and never have been any users of gpiod_set_transitory()
outside the core GPIOLIB code. Make it private.
Signed-off-by: Bartosz Golaszewski <[email protected]>
Reviewed-by: Linus Walleij <[email protected]>
|
|
Drop Itanium support from the RAID6 code, and along with it, the 16x and
32x unrolled versions, which were only used by IA64.
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
The Itanium architecture is obsolete, and an informal survey [0] reveals
that any residual use of Itanium hardware in production is mostly HP-UX
or OpenVMS based. The use of Linux on Itanium appears to be limited to
enthusiasts that occasionally boot a fresh Linux kernel to see whether
things are still working as intended, and perhaps to churn out some
distro packages that are rarely used in practice.
None of the original companies behind Itanium still produce or support
any hardware or software for the architecture, and it is listed as
'Orphaned' in the MAINTAINERS file, as apparently, none of the engineers
that contributed on behalf of those companies (nor anyone else, for that
matter) have been willing to support or maintain the architecture
upstream or even be responsible for applying the odd fix. The Intel
firmware team removed all IA-64 support from the Tianocore/EDK2
reference implementation of EFI in 2018. (Itanium is the original
architecture for which EFI was developed, and the way Linux supports it
deviates significantly from other architectures.) Some distros, such as
Debian and Gentoo, still maintain [unofficial] ia64 ports, but many have
dropped support years ago.
While the argument is being made [1] that there is a 'for the common
good' angle to being able to build and run existing projects such as the
Grid Community Toolkit [2] on Itanium for interoperability testing, the
fact remains that none of those projects are known to be deployed on
Linux/ia64, and very few people actually have access to such a system in
the first place. Even if there were ways imaginable in which Linux/ia64
could be put to good use today, what matters is whether anyone is
actually doing that, and this does not appear to be the case.
There are no emulators widely available, and so boot testing Itanium is
generally infeasible for ordinary contributors. GCC still supports IA-64
but its compile farm [3] no longer has any IA-64 machines. GLIBC would
like to get rid of IA-64 [4] too because it would permit some overdue
code cleanups. In summary, the benefits to the ecosystem of having IA-64
be part of it are mostly theoretical, whereas the maintenance overhead
of keeping it supported is real.
So let's rip off the band aid, and remove the IA-64 arch code entirely.
This follows the timeline proposed by the Debian/ia64 maintainer [5],
which removes support in a controlled manner, leaving IA-64 in a known
good state in the most recent LTS release. Other projects will follow
once the kernel support is removed.
[0] https://lore.kernel.org/all/CAMj1kXFCMh_578jniKpUtx_j8ByHnt=s7S+yQ+vGbKt9ud7+kQ@mail.gmail.com/
[1] https://lore.kernel.org/all/[email protected]/
[2] https://gridcf.org/gct-docs/latest/index.html
[3] https://cfarm.tetaneutral.net/machines/list/
[4] https://lore.kernel.org/all/[email protected]/
[5] https://lore.kernel.org/all/ff58a3e76e5102c94bb5946d99187b358def688a.camel@physik.fu-berlin.de/
Acked-by: Tony Luck <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
|
|
New code should solely use firmware nodes for the specifics and
not any callbacks.
Signed-off-by: Andy Shevchenko <[email protected]>
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
In AHCI 1.3.1, the register description for CAP.SSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Slumber state via agressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Slumber requests."
In AHCI 1.3.1, the register description for CAP.PSC:
"When cleared to ‘0’, software must not allow the HBA to initiate
transitions to the Partial state via agressive link power management nor
the PxCMD.ICC field in each port, and the PxSCTL.IPM field in each port
must be programmed to disallow device initiated Partial requests."
Ensure that we always set the corresponding bits in PxSCTL.IPM, such that
a device is not allowed to initiate transitions to power states which are
unsupported by the HBA.
DevSleep is always initiated by the HBA, however, for completeness, set the
corresponding bit in PxSCTL.IPM such that agressive link power management
cannot transition to DevSleep if DevSleep is not supported.
sata_link_scr_lpm() is used by libahci, ata_piix and libata-pmp.
However, only libahci has the ability to read the CAP/CAP2 register to see
if these features are supported. Therefore, in order to not introduce any
regressions on ata_piix or libata-pmp, create flags that indicate that the
respective feature is NOT supported. This way, the behavior for ata_piix
and libata-pmp should remain unchanged.
This change is based on a patch originally submitted by Runa Guo-oc.
Signed-off-by: Niklas Cassel <[email protected]>
Fixes: 1152b2617a6e ("libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global")
Cc: [email protected]
Signed-off-by: Damien Le Moal <[email protected]>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- six smb3 client fixes including ones to allow controlling smb3
directory caching timeout and limits, and one debugging improvement
- one fix for nls Kconfig (don't need to expose NLS_UCS2_UTILS option)
- one minor spnego registry update
* tag '6.6-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
spnego: add missing OID to oid registry
smb3: fix minor typo in SMB2_GLOBAL_CAP_LARGE_MTU
cifs: update internal module version number for cifs.ko
smb3: allow controlling maximum number of cached directories
smb3: add trace point for queryfs (statfs)
nls: Hide new NLS_UCS2_UTILS
smb3: allow controlling length of time directory entries are cached with dir leases
smb: propagate error code of extract_sharename()
|
|
Pull more SCSI updates from James Bottomley:
"Mostly small stragglers that missed the initial merge.
Driver updates are qla2xxx and smartpqi (mp3sas has a high diffstat
due to the volatile qualifier removal, fnic due to unused function
removal and sd.c has a lot of code shuffling to remove forward
declarations)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits)
scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler
scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
scsi: mpt3sas: Remove volatile qualifier
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
scsi: libsas: Simplify sas_queue_reset() and remove unused code
scsi: ufs: Fix the build for the old ARM OABI
scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt()
scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag()
scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport"
scsi: fnic: Replace sgreset tag with max_tag_id
scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs()
scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error
scsi: smartpqi: Change driver version to 2.1.24-046
scsi: smartpqi: Enhance error messages
scsi: smartpqi: Enhance controller offline notification
scsi: smartpqi: Enhance shutdown notification
scsi: smartpqi: Simplify lun_number assignment
scsi: smartpqi: Rename pciinfo to pci_info
scsi: smartpqi: Rename MACRO to clarify purpose
scsi: smartpqi: Add abort handler
...
|
|
Add missing OID to the registry. Some servers and clients (including
Windows) now request "NEGOEX - SPNEGEO Extended Negotiation Security")
See https://datatracker.ietf.org/doc/html/draft-zhu-negoex-02
Reviewed-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Pull xarray fixes from Matthew Wilcox:
- Fix a bug encountered by people using bittorrent where they'd get
NULL pointer dereferences on page cache lookups when using XFS
- Two documentation fixes
* tag 'xarray-6.6' of git://git.infradead.org/users/willy/xarray:
idr: fix param name in idr_alloc_cyclic() doc
xarray: Document necessary flag in alloc functions
XArray: Do not return sibling entries from xa_load()
|
|
Now that eventfs structure is used to create the events directory via the
eventfs dynamically allocate code, the "dir" field of the trace_event_file
structure is no longer used. Remove it.
Link: https://lkml.kernel.org/r/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ajay Kaher <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more thermal control updates from Rafael Wysocki:
"Eliminate an obsolete thermal zone registration function"
* tag 'thermal-6.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: core: Drop thermal_zone_device_register()
thermal: Use thermal_tripless_zone_device_register()
thermal: core: Add function for registering tripless thermal zones
thermal: core: Clean up headers of thermal zone registration functions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Allow usage of LSX/LASX in the kernel, and use them for
SIMD-optimized RAID5/RAID6 routines
- Add Loongson Binary Translation (LBT) extension support
- Add basic KGDB & KDB support
- Add building with kcov coverage
- Add KFENCE (Kernel Electric-Fence) support
- Add KASAN (Kernel Address Sanitizer) support
- Some bug fixes and other small changes
- Update the default config file
* tag 'loongarch-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (25 commits)
LoongArch: Update Loongson-3 default config file
LoongArch: Add KASAN (Kernel Address Sanitizer) support
LoongArch: Simplify the processing of jumping new kernel for KASLR
kasan: Add (pmd|pud)_init for LoongArch zero_(pud|p4d)_populate process
kasan: Add __HAVE_ARCH_SHADOW_MAP to support arch specific mapping
LoongArch: Add KFENCE (Kernel Electric-Fence) support
LoongArch: Get partial stack information when providing regs parameter
LoongArch: mm: Add page table mapped mode support for virt_to_page()
kfence: Defer the assignment of the local variable addr
LoongArch: Allow building with kcov coverage
LoongArch: Provide kaslr_offset() to get kernel offset
LoongArch: Add basic KGDB & KDB support
LoongArch: Add Loongson Binary Translation (LBT) extension support
raid6: Add LoongArch SIMD recovery implementation
raid6: Add LoongArch SIMD syndrome calculation
LoongArch: Add SIMD-optimized XOR routines
LoongArch: Allow usage of LSX/LASX in the kernel
LoongArch: Define symbol 'fault' as a local label in fpu.S
LoongArch: Adjust {copy, clear}_user exception handler behavior
LoongArch: Use static defined zero page rather than allocated
...
|
|
The bpf helpers bpf_this_cpu_ptr() and bpf_per_cpu_ptr() are re-purposed
for allocated percpu objects. For an allocated percpu obj,
the reg type is 'PTR_TO_BTF_ID | MEM_PERCPU | MEM_RCU'.
The return type for these two re-purposed helpera is
'PTR_TO_MEM | MEM_RCU | MEM_ALLOC'.
The MEM_ALLOC allows that the per-cpu data can be read and written.
Since the memory allocator bpf_mem_alloc() returns
a ptr to a percpu ptr for percpu data, the first argument
of bpf_this_cpu_ptr() and bpf_per_cpu_ptr() is patched
with a dereference before passing to the helper func.
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
BPF_KPTR_PERCPU represents a percpu field type like below
struct val_t {
... fields ...
};
struct t {
...
struct val_t __percpu_kptr *percpu_data_ptr;
...
};
where
#define __percpu_kptr __attribute__((btf_type_tag("percpu_kptr")))
While BPF_KPTR_REF points to a trusted kernel object or a trusted
local object, BPF_KPTR_PERCPU points to a trusted local
percpu object.
This patch added basic support for BPF_KPTR_PERCPU
related to percpu_kptr field parsing, recording and free operations.
BPF_KPTR_PERCPU also supports the same map types
as BPF_KPTR_REF does.
Note that unlike a local kptr, it is possible that
a BPF_KTPR_PERCPU struct may not contain any
special fields like other kptr, bpf_spin_lock, bpf_list_head, etc.
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
This is needed for later percpu mem allocation when the
allocation is done by bpf program. For such cases, a global
bpf_global_percpu_ma is added where a flexible allocation
size is needed.
Signed-off-by: Yonghong Song <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking updates from Jakub Kicinski:
"Including fixes from netfilter and bpf.
Current release - regressions:
- eth: stmmac: fix failure to probe without MAC interface specified
Current release - new code bugs:
- docs: netlink: fix missing classic_netlink doc reference
Previous releases - regressions:
- deal with integer overflows in kmalloc_reserve()
- use sk_forward_alloc_get() in sk_get_meminfo()
- bpf_sk_storage: fix the missing uncharge in sk_omem_alloc
- fib: avoid warn splat in flow dissector after packet mangling
- skb_segment: call zero copy functions before using skbuff frags
- eth: sfc: check for zero length in EF10 RX prefix
Previous releases - always broken:
- af_unix: fix msg_controllen test in scm_pidfd_recv() for
MSG_CMSG_COMPAT
- xsk: fix xsk_build_skb() dereferencing possible ERR_PTR()
- netfilter:
- nft_exthdr: fix non-linear header modification
- xt_u32, xt_sctp: validate user space input
- nftables: exthdr: fix 4-byte stack OOB write
- nfnetlink_osf: avoid OOB read
- one more fix for the garbage collection work from last release
- igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU
- bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t
- handshake: fix null-deref in handshake_nl_done_doit()
- ip: ignore dst hint for multipath routes to ensure packets are
hashed across the nexthops
- phy: micrel:
- correct bit assignments for cable test errata
- disable EEE according to the KSZ9477 errata
Misc:
- docs/bpf: document compile-once-run-everywhere (CO-RE) relocations
- Revert "net: macsec: preserve ingress frame ordering", it appears
to have been developed against an older kernel, problem doesn't
exist upstream"
* tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs()
Revert "net: team: do not use dynamic lockdep key"
net: hns3: remove GSO partial feature bit
net: hns3: fix the port information display when sfp is absent
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
net: hns3: fix debugfs concurrency issue between kfree buffer and read
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
net: hns3: Support query tx timeout threshold by debugfs
net: hns3: fix tx timeout issue
net: phy: Provide Module 4 KSZ9477 errata (DS80000754C)
netfilter: nf_tables: Unbreak audit log reset
netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID
netfilter: nfnetlink_osf: avoid OOB read
netfilter: nftables: exthdr: fix 4-byte stack OOB write
selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
s390/bpf: Pass through tail call counter in trampolines
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull more devicetree updates from Rob Herring:
"A couple of conversions which didn't get picked up by the subsystems
and one fix:
- Convert st,stih407-irq-syscfg and Omnivision OV7251 bindings to DT
schema
- Merge Omnivision OV5695 into OV5693 binding
- Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY"
* tag 'devicetree-fixes-for-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: irqchip: convert st,stih407-irq-syscfg to DT schema
media: dt-bindings: Convert Omnivision OV7251 to DT schema
media: dt-bindings: Merge OV5695 into OV5693 binding
of: overlay: Fix of_overlay_fdt_apply prototype when !CONFIG_OF_OVERLAY
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"Various cleanups and fixes across the board"
* tag 'pwm/for-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm: (31 commits)
pwm: lpc32xx: Remove handling of PWM channels
pwm: atmel: Simplify using devm functions
dt-bindings: pwm: brcm,kona-pwm: convert to YAML
pwm: stmpe: Handle errors when disabling the signal
pwm: stm32: Simplify using devm_pwmchip_add()
pwm: stm32: Don't modify HW state in .remove() callback
pwm: Fix order of freeing resources in pwmchip_remove()
pwm: ntxec: Use device_set_of_node_from_dev()
pwm: ntxec: Drop a write-only variable from driver data
pwm: pxa: Don't reimplement of_device_get_match_data()
pwm: lpc18xx-sct: Simplify using devm_clk_get_enabled()
pwm: atmel-tcb: Don't track polarity in driver data
pwm: atmel-tcb: Unroll atmel_tcb_pwm_set_polarity() into only caller
pwm: atmel-tcb: Put per-channel data into driver data
pwm: atmel-tcb: Fix resource freeing in error path and remove
pwm: atmel-tcb: Harmonize resource allocation order
pwm: Drop unused #include <linux/radix-tree.h>
pwm: rz-mtu3: Fix build warning 'num_channel_ios' not described
pwm: Remove outdated documentation for pwmchip_remove()
pwm: atmel: Enable clk when pwm already enabled in bootloader
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"Subsystem:
- Add a way for drivers to tell the core the supported alarm range is
smaller than the date range. This is not used yet but will be
useful for the alarmtimers in the next release.
- fix Wvoid-pointer-to-enum-cast warnings
- remove redundant of_match_ptr()
- stop warning for invalid alarms when the alarm is disabled
Drivers:
- isl12022: allow setting the trip level for battery level detection
- pcf2127: add support for PCF2131 and multiple timestamps
- stm32: time precision improvement, many fixes
- twl: NVRAM support"
* tag 'rtc-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (73 commits)
dt-bindings: rtc: ds3231: Remove text binding
rtc: wm8350: remove unnecessary messages
rtc: twl: remove unnecessary messages
rtc: sun6i: remove unnecessary message
rtc: stop warning for invalid alarms when the alarm is disabled
rtc: twl: add NVRAM support
rtc: pcf85363: Allow to wake up system without IRQ
rtc: m48t86: add DT support for m48t86
dt-bindings: rtc: Add ST M48T86
rtc: pcf2127: remove useless check
rtc: rzn1: Report maximum alarm limit to rtc core
rtc: ds1305: Report maximum alarm limit to rtc core
rtc: tps6586x: Report maximum alarm limit to rtc core
rtc: cmos: Report supported alarm limit to rtc infrastructure
rtc: cros-ec: Detect and report supported alarm window size
rtc: Add support for limited alarm timer offsets
rtc: isl1208: Fix incorrect logic in isl1208_set_xtoscb()
MAINTAINERS: remove obsolete pattern in RTC SUBSYSTEM section
rtc: tps65910: Remove redundant dev_warn() and do not check for 0 return after calling platform_get_irq()
rtc: omap: Do not check for 0 return after calling platform_get_irq()
...
|
|
Pull kvm updates from Paolo Bonzini:
"ARM:
- Clean up vCPU targets, always returning generic v8 as the preferred
target
- Trap forwarding infrastructure for nested virtualization (used for
traps that are taken from an L2 guest and are needed by the L1
hypervisor)
- FEAT_TLBIRANGE support to only invalidate specific ranges of
addresses when collapsing a table PTE to a block PTE. This avoids
that the guest refills the TLBs again for addresses that aren't
covered by the table PTE.
- Fix vPMU issues related to handling of PMUver.
- Don't unnecessary align non-stack allocations in the EL2 VA space
- Drop HCR_VIRT_EXCP_MASK, which was never used...
- Don't use smp_processor_id() in kvm_arch_vcpu_load(), but the cpu
parameter instead
- Drop redundant call to kvm_set_pfn_accessed() in user_mem_abort()
- Remove prototypes without implementations
RISC-V:
- Zba, Zbs, Zicntr, Zicsr, Zifencei, and Zihpm support for guest
- Added ONE_REG interface for SATP mode
- Added ONE_REG interface to enable/disable multiple ISA extensions
- Improved error codes returned by ONE_REG interfaces
- Added KVM_GET_REG_LIST ioctl() implementation for KVM RISC-V
- Added get-reg-list selftest for KVM RISC-V
s390:
- PV crypto passthrough enablement (Tony, Steffen, Viktor, Janosch)
Allows a PV guest to use crypto cards. Card access is governed by
the firmware and once a crypto queue is "bound" to a PV VM every
other entity (PV or not) looses access until it is not bound
anymore. Enablement is done via flags when creating the PV VM.
- Guest debug fixes (Ilya)
x86:
- Clean up KVM's handling of Intel architectural events
- Intel bugfixes
- Add support for SEV-ES DebugSwap, allowing SEV-ES guests to use
debug registers and generate/handle #DBs
- Clean up LBR virtualization code
- Fix a bug where KVM fails to set the target pCPU during an IRTE
update
- Fix fatal bugs in SEV-ES intrahost migration
- Fix a bug where the recent (architecturally correct) change to
reinject #BP and skip INT3 broke SEV guests (can't decode INT3 to
skip it)
- Retry APIC map recalculation if a vCPU is added/enabled
- Overhaul emergency reboot code to bring SVM up to par with VMX, tie
the "emergency disabling" behavior to KVM actually being loaded,
and move all of the logic within KVM
- Fix user triggerable WARNs in SVM where KVM incorrectly assumes the
TSC ratio MSR cannot diverge from the default when TSC scaling is
disabled up related code
- Add a framework to allow "caching" feature flags so that KVM can
check if the guest can use a feature without needing to search
guest CPUID
- Rip out the ancient MMU_DEBUG crud and replace the useful bits with
CONFIG_KVM_PROVE_MMU
- Fix KVM's handling of !visible guest roots to avoid premature
triple fault injection
- Overhaul KVM's page-track APIs, and KVMGT's usage, to reduce the
API surface that is needed by external users (currently only
KVMGT), and fix a variety of issues in the process
Generic:
- Wrap kvm_{gfn,hva}_range.pte in a union to allow mmu_notifier
events to pass action specific data without needing to constantly
update the main handlers.
- Drop unused function declarations
Selftests:
- Add testcases to x86's sync_regs_test for detecting KVM TOCTOU bugs
- Add support for printf() in guest code and covert all guest asserts
to use printf-based reporting
- Clean up the PMU event filter test and add new testcases
- Include x86 selftests in the KVM x86 MAINTAINERS entry"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (279 commits)
KVM: x86/mmu: Include mmu.h in spte.h
KVM: x86/mmu: Use dummy root, backed by zero page, for !visible guest roots
KVM: x86/mmu: Disallow guest from using !visible slots for page tables
KVM: x86/mmu: Harden TDP MMU iteration against root w/o shadow page
KVM: x86/mmu: Harden new PGD against roots without shadow pages
KVM: x86/mmu: Add helper to convert root hpa to shadow page
drm/i915/gvt: Drop final dependencies on KVM internal details
KVM: x86/mmu: Handle KVM bookkeeping in page-track APIs, not callers
KVM: x86/mmu: Drop @slot param from exported/external page-track APIs
KVM: x86/mmu: Bug the VM if write-tracking is used but not enabled
KVM: x86/mmu: Assert that correct locks are held for page write-tracking
KVM: x86/mmu: Rename page-track APIs to reflect the new reality
KVM: x86/mmu: Drop infrastructure for multiple page-track modes
KVM: x86/mmu: Use page-track notifiers iff there are external users
KVM: x86/mmu: Move KVM-only page-track declarations to internal header
KVM: x86: Remove the unused page-track hook track_flush_slot()
drm/i915/gvt: switch from ->track_flush_slot() to ->track_remove_region()
KVM: x86: Add a new page-track hook to handle memslot deletion
drm/i915/gvt: Don't bother removing write-protection on to-be-deleted slot
KVM: x86: Reject memslot MOVE operations if KVMGT is attached
...
|
|
If the buffer pointed to by the buffer_head is part of a compound page,
bh_offset() assumes that b_page is the precise page that contains
the data. A recent change to jbd2 inadvertently violated that assumption.
By using page_size(), we support both b_page being set to the head page
(as page_size() will return the size of the entire folio) and the precise
page (as it will return PAGE_SIZE for a tail page).
Fixes: 8147c4c4546f ("jbd2: use a folio in jbd2_journal_write_metadata_buffer()")
Reported-by: Zorro Lang <[email protected]>
Tested-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
|
|
This reverts commit 39285e124edbc752331e98ace37cc141a6a3747a.
Looks like the change has unintended consequences in exposing
objects before they are initialized. Let's drop this patch
and try again in net-next.
Reported-by: [email protected]
Fixes: 39285e124edb ("net: team: do not use dynamic lockdep key")
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The KSZ9477 errata points out (in 'Module 4') the link up/down problems
when EEE (Energy Efficient Ethernet) is enabled in the device to which
the KSZ9477 tries to auto negotiate.
The suggested workaround is to clear advertisement of EEE for PHYs in
this chip driver.
To avoid regressions with other switch ICs the new MICREL_NO_EEE flag
has been introduced.
Moreover, the in-register disablement of MMD_DEVICE_ID_EEE_ADV.MMD_EEE_ADV
MMD register is removed, as this code is both; now executed too late
(after previous rework of the PHY and DSA for KSZ switches) and not
required as setting all members of eee_broken_modes bit field prevents
the KSZ9477 from advertising EEE.
Fixes: 69d3b36ca045 ("net: dsa: microchip: enable EEE support") # for KSZ9477
Signed-off-by: Lukasz Majewski <[email protected]>
Tested-by: Oleksij Rempel <[email protected]> # Confirmed disabled EEE with oscilloscope.
Reviewed-by: Oleksij Rempel <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull ceph updates from Ilya Dryomov:
"Mixed with some fixes and cleanups, this brings in reasonably complete
fscrypt support to CephFS! The list of things which don't work with
encryption should be fairly short, mostly around the edges: fallocate
(not supported well in CephFS to begin with), copy_file_range
(requires re-encryption), non-default striping patterns.
This was a multi-year effort principally by Jeff Layton with
assistance from Xiubo Li, Luís Henriques and others, including several
dependant changes in the MDS, netfs helper library and fscrypt
framework itself"
* tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client: (53 commits)
ceph: make num_fwd and num_retry to __u32
ceph: make members in struct ceph_mds_request_args_ext a union
rbd: use list_for_each_entry() helper
libceph: do not include crypto/algapi.h
ceph: switch ceph_lookup/atomic_open() to use new fscrypt helper
ceph: fix updating i_truncate_pagecache_size for fscrypt
ceph: wait for OSD requests' callbacks to finish when unmounting
ceph: drop messages from MDS when unmounting
ceph: update documentation regarding snapshot naming limitations
ceph: prevent snapshot creation in encrypted locked directories
ceph: add support for encrypted snapshot names
ceph: invalidate pages when doing direct/sync writes
ceph: plumb in decryption during reads
ceph: add encryption support to writepage and writepages
ceph: add read/modify/write to ceph_sync_write
ceph: align data in pages in ceph_sync_write
ceph: don't use special DIO path for encrypted inodes
ceph: add truncate size handling support for fscrypt
ceph: add object version support for sync read
libceph: allow ceph_osdc_new_request to accept a multi-op read
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new driver for Azoteq IQS7210A/7211A/E touch controllers
- support for Azoteq IQS7222D variant added to iqs7222 driver
- support for touch keys functionality added to Melfas MMS114 driver
- new hardware IDs added to exc3000 and Goodix drivers
- xpad driver gained support for GameSir T4 Kaleid Controller
- a fix for xpad driver to properly support some third-party
controllers that need a magic packet to start properly
- a fix for psmouse driver to more reliably switch to RMI4 mode on
devices that use native RMI4/SMbus protocol
- a quirk for i8042 for TUXEDO Gemini 17 Gen1/Clevo PD70PN laptops
- multiple drivers have been updated to make use of devm and other
newer APIs such as dev_err_probe(), devm_regulator_get_enable(), and
others.
* tag 'input-for-v6.6-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (83 commits)
Input: goodix - add support for ACPI ID GDX9110
Input: rpckbd - fix the return value handle for platform_get_irq()
Input: tca6416-keypad - switch to using input core's polling features
Input: tca6416-keypad - convert to use devm_* api
Input: tca6416-keypad - fix interrupt enable disbalance
Input: tca6416-keypad - rely on I2C core to set up suspend/resume
Input: tca6416-keypad - always expect proper IRQ number in i2c client
Input: lm8323 - convert to use devm_* api
Input: lm8323 - rely on device core to create kp_disable attribute
Input: qt2160 - convert to use devm_* api
Input: qt2160 - do not hard code interrupt trigger
Input: qt2160 - switch to using threaded interrupt handler
Input: qt2160 - tweak check for i2c adapter functionality
Input: psmouse - add delay when deactivating for SMBus mode
Input: mcs-touchkey - fix uninitialized use of error in mcs_touchkey_probe()
Input: qt1070 - convert to use devm_* api
Input: mcs-touchkey - convert to use devm_* api
Input: amikbd - convert to use devm_* api
Input: lm8333 - convert to use devm_* api
Input: mms114 - add support for touch keys
...
|
|
MIPS, LoongArch and some other architectures have many holes between
different segments and the valid address space (256T available) is
insufficient to map all these segments to kasan shadow memory with the
common formula provided by kasan core. So we need architecture specific
mapping formulas to ensure different segments are mapped individually,
and only limited space lengths of those specific segments are mapped to
shadow.
Therefore, when the incoming address is converted to a shadow, we need
to add a condition to determine whether it is valid.
Reviewed-by: Andrey Konovalov <[email protected]>
Signed-off-by: Qing Zhang <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
Similar to the syndrome calculation, the recovery algorithms also work
on 64 bytes at a time to align with the L1 cache line size of current
and future LoongArch cores (that we care about). Which means
unrolled-by-4 LSX and unrolled-by-2 LASX code.
The assembly is originally based on the x86 SSSE3/AVX2 ports, but
register allocation has been redone to take advantage of LSX/LASX's 32
vector registers, and instruction sequence has been optimized to suit
(e.g. LoongArch can perform per-byte srl and andi on vectors, but x86
cannot).
Performance numbers measured by instrumenting the raid6test code, on a
3A5000 system clocked at 2.5GHz:
> lasx 2data: 354.987 MiB/s
> lasx datap: 350.430 MiB/s
> lsx 2data: 340.026 MiB/s
> lsx datap: 337.318 MiB/s
> intx1 2data: 164.280 MiB/s
> intx1 datap: 187.966 MiB/s
Because recovery algorithms are chosen solely based on priority and
availability, lasx is marked as priority 2 and lsx priority 1. At least
for the current generation of LoongArch micro-architectures, LASX should
always be faster than LSX whenever supported, and have similar power
consumption characteristics (because the only known LASX-capable uarch,
the LA464, always compute the full 256-bit result for vector ops).
Acked-by: Song Liu <[email protected]>
Signed-off-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
The algorithms work on 64 bytes at a time, which is the L1 cache line
size of all current and future LoongArch cores (that we care about), as
confirmed by Huacai. The code is based on the generic int.uc algorithm,
unrolled 4 times for LSX and 2 times for LASX. Further unrolling does
not meaningfully improve the performance according to experiments.
Performance numbers measured during system boot on a 3A5000 @ 2.5GHz:
> raid6: lasx gen() 12726 MB/s
> raid6: lsx gen() 10001 MB/s
> raid6: int64x8 gen() 2876 MB/s
> raid6: int64x4 gen() 3867 MB/s
> raid6: int64x2 gen() 2531 MB/s
> raid6: int64x1 gen() 1945 MB/s
Comparison of xor() speeds (from different boots but meaningful anyway):
> lasx: 11226 MB/s
> lsx: 6395 MB/s
> int64x4: 2147 MB/s
Performance as measured by raid6test:
> raid6: lasx gen() 25109 MB/s
> raid6: lsx gen() 13233 MB/s
> raid6: int64x8 gen() 4164 MB/s
> raid6: int64x4 gen() 6005 MB/s
> raid6: int64x2 gen() 5781 MB/s
> raid6: int64x1 gen() 4119 MB/s
> raid6: using algorithm lasx gen() 25109 MB/s
> raid6: .... xor() 14439 MB/s, rmw enabled
Acked-by: Song Liu <[email protected]>
Signed-off-by: WANG Xuerui <[email protected]>
Signed-off-by: Huacai Chen <[email protected]>
|
|
sphinx complains about the use of "%PHYLINK_PCS_NEG_*":
Documentation/networking/kapi:144: ./include/linux/phylink.h:601: WARNING: Inline literal start-string without end-string.
Documentation/networking/kapi:144: ./include/linux/phylink.h:633: WARNING: Inline literal start-string without end-string.
These are not valid symbols so drop the '%' prefix.
Alternatively we could use %PHYLINK_PCS_NEG_\* (escape the *)
or use normal literal ``PHYLINK_PCS_NEG_*`` but there is already
a handful of un-adorned DEFINE_* in this file.
Fixes: f99d471afa03 ("net: phylink: add PCS negotiation mode")
Reported-by: Stephen Rothwell <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Bagas Sanjaya <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
team interface has used a dynamic lockdep key to avoid false-positive
lockdep deadlock detection. Virtual interfaces such as team usually
have their own lock for protecting private data.
These interfaces can be nested.
team0
|
team1
Each interface's lock is actually different(team0->lock and team1->lock).
So,
mutex_lock(&team0->lock);
mutex_lock(&team1->lock);
mutex_unlock(&team1->lock);
mutex_unlock(&team0->lock);
The above case is absolutely safe. But lockdep warns about deadlock.
Because the lockdep understands these two locks are same. This is a
false-positive lockdep warning.
So, in order to avoid this problem, the team interfaces started to use
dynamic lockdep key. The false-positive problem was fixed, but it
introduced a new problem.
When the new team virtual interface is created, it registers a dynamic
lockdep key(creates dynamic lockdep key) and uses it. But there is the
limitation of the number of lockdep keys.
So, If so many team interfaces are created, it consumes all lockdep keys.
Then, the lockdep stops to work and warns about it.
In order to fix this problem, team interfaces use the subclass instead
of the dynamic key. So, when a new team interface is created, it doesn't
register(create) a new lockdep, but uses existed subclass key instead.
It is already used by the bonding interface for a similar case.
As the bonding interface does, the subclass variable is the same as
the 'dev->nested_level'. This variable indicates the depth in the stacked
interface graph.
The 'dev->nested_level' is protected by RTNL and RCU.
So, 'mutex_lock_nested()' for 'team->lock' requires RTNL or RCU.
In the current code, 'team->lock' is usually acquired under RTNL, there is
no problem with using 'dev->nested_level'.
The 'team_nl_team_get()' and The 'lb_stats_refresh()' functions acquire
'team->lock' without RTNL.
But these don't iterate their own ports nested so they don't need nested
lock.
Reproducer:
for i in {0..1000}
do
ip link add team$i type team
ip link add dummy$i master team$i type dummy
ip link set dummy$i up
ip link set team$i up
done
Splat looks like:
BUG: MAX_LOCKDEP_ENTRIES too low!
turning off the locking correctness validator.
Please attach the output of /proc/lock_stat to the bug report
CPU: 0 PID: 4104 Comm: ip Not tainted 6.5.0-rc7+ #45
Call Trace:
<TASK>
dump_stack_lvl+0x64/0xb0
add_lock_to_list+0x30d/0x5e0
check_prev_add+0x73a/0x23a0
...
sock_def_readable+0xfe/0x4f0
netlink_broadcast+0x76b/0xac0
nlmsg_notify+0x69/0x1d0
dev_open+0xed/0x130
...
Reported-by: [email protected]
Fixes: 369f61bee0f5 ("team: fix nested locking lockdep warning")
Signed-off-by: Taehee Yoo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|