| Age | Commit message (Collapse) | Author | Files | Lines |
|
By improving the BPF_LINK_UPDATE command of bpf(), it should allow you
to conveniently switch between different struct_ops on a single
bpf_link. This would enable smoother transitions from one struct_ops
to another.
The struct_ops maps passing along with BPF_LINK_UPDATE should have the
BPF_F_LINK flag.
Signed-off-by: Kui-Feng Lee <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
Make bpf_link support struct_ops. Previously, struct_ops were always
used alone without any associated links. Upon updating its value, a
struct_ops would be activated automatically. Yet other BPF program
types required to make a bpf_link with their instances before they
could become active. Now, however, you can create an inactive
struct_ops, and create a link to activate it later.
With bpf_links, struct_ops has a behavior similar to other BPF program
types. You can pin/unpin them from their links and the struct_ops will
be deactivated when its link is removed while previously need someone
to delete the value for it to be deactivated.
bpf_links are responsible for registering their associated
struct_ops. You can only use a struct_ops that has the BPF_F_LINK flag
set to create a bpf_link, while a structs without this flag behaves in
the same manner as before and is registered upon updating its value.
The BPF_LINK_TYPE_STRUCT_OPS serves a dual purpose. Not only is it
used to craft the links for BPF struct_ops programs, but also to
create links for BPF struct_ops them-self. Since the links of BPF
struct_ops programs are only used to create trampolines internally,
they are never seen in other contexts. Thus, they can be reused for
struct_ops themself.
To maintain a reference to the map supporting this link, we add
bpf_struct_ops_link as an additional type. The pointer of the map is
RCU and won't be necessary until later in the patchset.
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
This feature lets you immediately transition to another congestion
control algorithm or implementation with the same name. Once a name
is updated, new connections will apply this new algorithm.
The purpose is to update a customized algorithm implemented in BPF
struct_ops with a new version on the flight. The following is an
example of using the userspace API implemented in later BPF patches.
link = bpf_map__attach_struct_ops(skel->maps.ca_update_1);
.......
err = bpf_link__update_map(link, skel->maps.ca_update_2);
We first load and register an algorithm implemented in BPF struct_ops,
then swap it out with a new one using the same name. After that, newly
created connections will apply the updated algorithm, while older ones
retain the previous version already applied.
This patch also takes this chance to refactor the ca validation into
the new tcp_validate_congestion_control() function.
Cc: [email protected], Eric Dumazet <[email protected]>
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
We have replaced kvalue-refcnt with synchronize_rcu() to wait for an
RCU grace period.
Maintenance of kvalue->refcnt was a complicated task, as we had to
simultaneously keep track of two reference counts: one for the
reference count of bpf_map. When the kvalue->refcnt reaches zero, we
also have to reduce the reference count on bpf_map - yet these steps
are not performed in an atomic manner and require us to be vigilant
when managing them. By eliminating kvalue->refcnt, we can make our
maintenance more straightforward as the refcount of bpf_map is now
solely managed!
To prevent the trampoline image of a struct_ops from being released
while it is still in use, we wait for an RCU grace period. The
setsockopt(TCP_CONGESTION, "...") command allows you to change your
socket's congestion control algorithm and can result in releasing the
old struct_ops implementation. It is fine. However, this function is
exposed through bpf_setsockopt(), it may be accessed by BPF programs
as well. To ensure that the trampoline image belonging to struct_op
can be safely called while its method is in use, the trampoline
safeguarde the BPF program with rcu_read_lock(). Doing so prevents any
destruction of the associated images before returning from a
trampoline and requires us to wait for an RCU grace period.
Signed-off-by: Kui-Feng Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Martin KaFai Lau <[email protected]>
|
|
The Autoneg bit in the advertising bitmap and state->an_enabled are
always identical. state->an_enabled is now no longer used by any
drivers, so lets kill this duplication.
Signed-off-by: Russell King (Oracle) <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When requesting a TX queue at a given index, warn on out-of-bounds
referencing if the index is greater than the allocated number of
queues.
Specifically, since this function is used heavily in the networking
stack use DEBUG_NET_WARN_ON_ONCE to avoid executing a new branch on
every packet.
Signed-off-by: Nick Child <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Leon Romanovsky says:
====================
Extend packet offload to fully support libreswan
The following patches are an outcome of Raed's work to add packet
offload support to libreswan [1].
The series includes:
* Priority support to IPsec policies
* Statistics per-SA (visible through "ip -s xfrm state ..." command)
* Support to IKE policy holes
* Fine tuning to acquire logic.
[1] https://github.com/libreswan/libreswan/pull/986
Link: https://lore.kernel.org/all/[email protected]
* tag 'ipsec-libreswan-mlx5' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
net/mlx5e: Update IPsec per SA packets/bytes count
net/mlx5e: Use one rule to count all IPsec Tx offloaded traffic
net/mlx5e: Support IPsec acquire default SA
net/mlx5e: Allow policies with reqid 0, to support IKE policy holes
xfrm: copy_to_user_state fetch offloaded SA packets/bytes statistics
xfrm: add new device offload acquire flag
net/mlx5e: Use chains for IPsec policy priority offload
net/mlx5: fs_core: Allow ignore_flow_level on TX dest
net/mlx5: fs_chains: Refactor to detach chains from tc usage
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
PARPORT_EPP_FAST flag currently uses 32-bit I/O port access for data
read/write (insl/outsl).
Add PARPORT_EPP_FAST_16 and PARPORT_EPP_FAST_8 that use insw/outsw
and insb/outsb (and PARPORT_EPP_FAST_32 as alias for PARPORT_EPP_FAST).
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
This function was renamed from lba_capacity_is_ok()() and moved from
drivers/ide/ to <linux/ata.h> but it never got used by libata, thus it
became useless after drivers/ide/ removal...
Signed-off-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
This function was renamed from ide_id_to_hd_driveid() and moved from
drivers/ide/ to <linux/ata.h> but it never got used by libata, thus it
became useless after drivers/ide/ removal...
Signed-off-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Now that paride is gone, pata_parport.h does not need to be in
include/linux. Move it to drivers/ata/pata_parport.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Don't pass around a pointer to scratch buffer. Use local buffers in
protocols that need it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
verbose parameter of test_proto() is now unused, remove it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
scratch parameter of log_adapter() is only used by bpck driver.
Remove it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
verbose parameter of log_adapter() is unused, remove it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Remove typedef struct PIA and use struct pi_adapter directly.
Fix formatting (excessive spaces) while at it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
device is never set in pata_parport, remove it.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Only bpck driver uses devtype but it never gets set in pata_parport.
Remove it.
As most bpck devices are CD-ROMs, always run the code that depends
on devtype == PI_PCD.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Introduce module_pata_parport_driver macro and use it in protocol
drivers to reduce boilerplate code. Remove paride_(un)register
compatibility defines.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Convert comm and kbic drivers to use standard swab16.
Remove pi_swab16 and pi_swab32.
Reviewed-by: Sergey Shtylyov <[email protected]>
Signed-off-by: Ondrej Zary <[email protected]>
Signed-off-by: Damien Le Moal <[email protected]>
|
|
Current flow interates over entire ACPI table entries looking for
Bluetooth Per Platform Antenna Gain(PPAG) entry. This patch iterates
over ACPI entries relvant to Bluetooth device only.
Fixes: c585a92b2f9c ("Bluetooth: btintel: Set Per Platform Antenna Gain(PPAG)")
Signed-off-by: Kiran K <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
|
|
This patch changes the return types of bpf_map_ops functions to long, where
previously int was returned. Using long allows for bpf programs to maintain
the sign bit in the absence of sign extension during situations where
inlined bpf helper funcs make calls to the bpf_map_ops funcs and a negative
error is returned.
The definitions of the helper funcs are generated from comments in the bpf
uapi header at `include/uapi/linux/bpf.h`. The return type of these
helpers was previously changed from int to long in commit bdb7b79b4ce8. For
any case where one of the map helpers call the bpf_map_ops funcs that are
still returning 32-bit int, a compiler might not include sign extension
instructions to properly convert the 32-bit negative value a 64-bit
negative value.
For example:
bpf assembly excerpt of an inlined helper calling a kernel function and
checking for a specific error:
; err = bpf_map_update_elem(&mymap, &key, &val, BPF_NOEXIST);
...
46: call 0xffffffffe103291c ; htab_map_update_elem
; if (err && err != -EEXIST) {
4b: cmp $0xffffffffffffffef,%rax ; cmp -EEXIST,%rax
kernel function assembly excerpt of return value from
`htab_map_update_elem` returning 32-bit int:
movl $0xffffffef, %r9d
...
movl %r9d, %eax
...results in the comparison:
cmp $0xffffffffffffffef, $0x00000000ffffffef
Fixes: bdb7b79b4ce8 ("bpf: Switch most helper return values from 32-bit int to 64-bit long")
Tested-by: Eduard Zingerman <[email protected]>
Signed-off-by: JP Kobryn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Fix the rcutorturename field so that its size is correctly reported in
the text format embedded in trace.dat files. As it stands, it is
reported as being of size 1:
field:char rcutorturename[8]; offset:8; size:1; signed:0;
Signed-off-by: Douglas Raillard <[email protected]>
Reviewed-by: Mukesh Ojha <[email protected]>
Cc: [email protected]
Fixes: 04ae87a52074e ("ftrace: Rework event_create_dir()")
Reviewed-by: Steven Rostedt (Google) <[email protected]>
[ boqun: Add "Cc" and "Fixes" tags per Steven ]
Signed-off-by: Boqun Feng <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
|
|
eval call-backs
`nf_nat_redirect_ipv4` takes a `struct nf_nat_ipv4_multi_range_compat`,
but converts it internally to a `struct nf_nat_range2`. Change the
function to take the latter, factor out the code now shared with
`nf_nat_redirect_ipv6`, move the conversion to the xt_REDIRECT module,
and update the ipv4 range initialization in the nft_redir module.
Replace a bare hex constant for 127.0.0.1 with a macro.
Remove `WARN_ON`. `nf_nat_setup_info` calls `nf_ct_is_confirmed`:
/* Can't setup nat info for confirmed ct. */
if (nf_ct_is_confirmed(ct))
return NF_ACCEPT;
This means that `ct` cannot be null or the kernel will crash, and
implies that `ctinfo` is `IP_CT_NEW` or `IP_CT_RELATED`.
nft_redir has separate ipv4 and ipv6 call-backs which share much of
their code, and an inet one switch containing a switch that calls one of
the others based on the family of the packet. Merge the ipv4 and ipv6
ones into the inet one in order to get rid of the duplicate code.
Const-qualify the `priv` pointer since we don't need to write through
it.
Assign `priv->flags` to the range instead of OR-ing it in.
Set the `NF_NAT_RANGE_PROTO_SPECIFIED` flag once during init, rather
than on every eval.
Signed-off-by: Jeremy Sowden <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
In ACPI systems, the OS can direct power management, as opposed to the
firmware. This OS-directed Power Management is called OSPM. Part of
telling the firmware that the OS going to direct power management is
making ACPI "_PDC" (Processor Driver Capabilities) calls. These _PDC
methods must be evaluated for every processor object. If these _PDC
calls are not completed for every processor it can lead to
inconsistency and later failures in things like the CPU frequency
driver.
In a Xen system, the dom0 kernel is responsible for system-wide power
management. The dom0 kernel is in charge of OSPM. However, the
number of CPUs available to dom0 can be different than the number of
CPUs physically present on the system.
This leads to a problem: the dom0 kernel needs to evaluate _PDC for
all the processors, but it can't always see them.
In dom0 kernels, ignore the existing ACPI method for determining if a
processor is physically present because it might not be accurate.
Instead, ask the hypervisor for this information.
Fix this by introducing a custom function to use when running as Xen
dom0 in order to check whether a processor object matches a CPU that's
online. Such checking is done using the existing information fetched
by the Xen pCPU subsystem, extending it to also store the ACPI ID.
This ensures that _PDC method gets evaluated for all physically online
CPUs, regardless of the number of CPUs made available to dom0.
Fixes: 5d554a7bb064 ("ACPI: processor: add internal processor_physically_present()")
Signed-off-by: Roger Pau Monné <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Add feature bit to existing device caps field. EFA supports data polling
of 128 bytes blocks.
The flag indicates that the NIC guarentees that a 128 byte aligned block
is written in order, ie that observing the last 8 bits of the block mean
the prior 127 bytes are also written.
It is useful for "last data polling" acceleration techniques.
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Yehuda Yitschak <[email protected]>
Reviewed-by: Yossi Leybovich <[email protected]>
Signed-off-by: Yonatan Nachum <[email protected]>
Signed-off-by: Michael Margolin <[email protected]>
Acked-by: Gal Pressman <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap into gpio/for-next
regmap: Add no_status support
This patch adds support for devices which don't support readback of
individual interrupt statuses, we report all interrupts as firing and
hope the consumers do the right thing.
|
|
There have been reports [1][2] of live patches failing to complete
within a reasonable amount of time due to CPU-bound kthreads.
Fix it by patching tasks in cond_resched().
There are four different flavors of cond_resched(), depending on the
kernel configuration. Hook into all of them.
A more elegant solution might be to use a preempt notifier. However,
non-ORC unwinders can't unwind a preempted task reliably.
[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lkml.kernel.org/lkml/[email protected]
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Tested-by: Seth Forshee (DigitalOcean) <[email protected]>
Link: https://lore.kernel.org/r/4ae981466b7814ec221014fc2554b2f86f3fb70b.1677257135.git.jpoimboe@kernel.org
|
|
Introduce a core thermal API function, thermal_cooling_device_update(),
for updating the max_state value for a cooling device and rearranging
its statistics in sysfs after a possible change of its ->get_max_state()
callback return value.
That callback is now invoked only once, during cooling device
registration, to populate the max_state field in the cooling device
object, so if its return value changes, it needs to be invoked again
and the new return value needs to be stored as max_state. Moreover,
the statistics presented in sysfs need to be rearranged in general,
because there may not be enough room in them to store data for all
of the possible states (in the case when max_state grows).
The new function takes care of that (and some other minor things
related to it), but some extra locking and lockdep annotations are
added in several places too to protect against crashes in the cases
when the statistics are not present or when a stale max_state value
might be used by sysfs attributes.
Note that the actual user of the new function will be added separately.
Link: https://lore.kernel.org/linux-pm/[email protected]/
Signed-off-by: Rafael J. Wysocki <[email protected]>
Tested-by: Zhang Rui <[email protected]>
Reviewed-by: Zhang Rui <[email protected]>
|
|
Add APIs to generate an array of beacons for an EMA AP (enhanced
multiple BSSID advertisements), each including a single MBSSID element.
EMA profile periodicity equals the count of elements.
- ieee80211_beacon_get_template_ema_list() - Generate and return all
EMA beacon templates. Drivers must call ieee80211_beacon_free_ema_list()
to free the memory. No change in the prototype for the existing API,
ieee80211_beacon_get_template(), which should be used for non-EMA AP.
- ieee80211_beacon_get_template_ema_index() - Generate a beacon which
includes the multiple BSSID element at the given index. Drivers can use
this function in a loop until NULL is returned which indicates end of
available MBSSID elements.
- ieee80211_beacon_free_ema_list() - free the memory allocated for the
list of EMA beacon templates.
Modify existing functions ieee80211_beacon_get_ap(),
ieee80211_get_mbssid_beacon_len() and ieee80211_beacon_add_mbssid()
to accept a new parameter for EMA index.
Signed-off-by: Aloka Dixit <[email protected]>
Co-developed-by: John Crispin <[email protected]>
Signed-off-by: John Crispin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
Consolidate all handling of CONFIG_DRM_FBDEV_LEAK_PHYS_SMEM by
making the module parameter optional in drm_fb_helper.c.
Without the config option, modules can set smem_start in struct
fb_info for internal usage, but not export if to userspace. The
address can only be exported by enabling the option and setting
the module parameter. Also update the comment.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Acked-by: Zack Rusin <[email protected]>
Tested-by: Sui Jingfeng<[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Export the fb_info release code as drm_fb_helper_release_info(). Will
help with cleaning up failed fbdev probing.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Acked-by: Zack Rusin <[email protected]>
Tested-by: Sui Jingfeng<[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Remove the flag prefer_shadow_fbdev from struct drm_mode_config.
Drivers set this flag to enable shadow buffering in the generic
fbdev emulation. Such shadow buffering is now mandatory, so the
flag is unused.
Signed-off-by: Thomas Zimmermann <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Reviewed-by: Zack Rusin <[email protected]>
Tested-by: Sui Jingfeng <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Currently when NL80211_SCAN_FLAG_COLOCATED_6GHZ is set in the scan flags,
in addition to the co-located APs, PSC channels in the 6 GHz band would
also be scanned if the user space has asked for it. In other words, the
scan would happen on PSC channels & co-located 6 GHz channels that were
reported in the RNR IE.
Update the documentation of NL80211_SCAN_FLAG_COLOCATED_6GHZ flag to
reflect the above said behavior.
Signed-off-by: Manikanta Pubbisetty <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
MT7996 hardware supports mesh A-MSDU subframes in hardware, but uses a
big-endian length field
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ryder Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
On newer MediaTek SoCs (e.g. MT7986), WLAN->WLAN or WLAN->Ethernet flows can
be offloaded by the SoC. In order to support that, the .ndo_setup_tc op is
needed.
Signed-off-by: Felix Fietkau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
After a couple of years and multiple LTS releases we received a report
that the behavior of O_DIRECTORY | O_CREAT changed starting with v5.7.
On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
had the following semantics:
(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: create regular file
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR
(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: create regular file
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST
(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory
On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
have the following semantics:
(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR
(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST
(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory
This is a fairly substantial semantic change that userspace didn't
notice until Pedro took the time to deliberately figure out corner
cases. Since no one noticed this breakage we can somewhat safely assume
that O_DIRECTORY | O_CREAT combinations are likely unused.
The v5.7 breakage is especially weird because while ENOTDIR is returned
indicating failure a regular file is actually created. This doesn't make
a lot of sense.
Time was spent finding potential users of this combination. Searching on
codesearch.debian.net showed that codebases often express semantical
expectations about O_DIRECTORY | O_CREAT which are completely contrary
to what our code has done and currently does.
The expectation often is that this particular combination would create
and open a directory. This suggests users who tried to use that
combination would stumble upon the counterintuitive behavior no matter
if pre-v5.7 or post v5.7 and quickly realize neither semantics give them
what they want. For some examples see the code examples in [1] to [3]
and the discussion in [4].
There are various ways to address this issue. The lazy/simple option
would be to restore the pre-v5.7 behavior and to just live with that bug
forever. But since there's a real chance that the O_DIRECTORY | O_CREAT
quirk isn't relied upon we should try to get away with murder(ing bad
semantics) first. If we need to Frankenstein pre-v5.7 behavior later so
be it.
So let's simply return EINVAL categorically for O_DIRECTORY | O_CREAT
combinations. In addition to cleaning up the old bug this also opens up
the possiblity to make that flag combination do something more intuitive
in the future.
Starting with this commit the following semantics apply:
(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL
(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL
(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory
One additional note, O_TMPFILE is implemented as:
#define __O_TMPFILE 020000000
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
For older kernels it was important to return an explicit error when
O_TMPFILE wasn't supported. So O_TMPFILE requires that O_DIRECTORY is
raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't
specified. Since O_DIRECTORY | O_CREAT could be used to create a regular
allowing that combination together with __O_TMPFILE would've meant that
false positives were possible, i.e., that a regular file was created
instead of a O_TMPFILE. This could've been used to trick userspace into
thinking it operated on a O_TMPFILE when it wasn't.
Now that we block O_DIRECTORY | O_CREAT completely the check for O_CREAT
in the __O_TMPFILE branch via if ((flags & O_TMPFILE_MASK) != O_TMPFILE)
can be dropped. Instead we can simply check verify that O_DIRECTORY is
raised via if (!(flags & O_DIRECTORY)) and explain this in two comments.
As Aleksa pointed out O_PATH is unaffected by this change since it
always returned EINVAL if O_CREAT was specified - with or without
O_DIRECTORY.
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://sources.debian.org/src/flatpak/1.14.4-1/subprojects/libglnx/glnx-dirfd.c/?hl=324#L324 [1]
Link: https://sources.debian.org/src/flatpak-builder/1.2.3-1/subprojects/libglnx/glnx-shutil.c/?hl=251#L251 [2]
Link: https://sources.debian.org/src/ostree/2022.7-2/libglnx/glnx-dirfd.c/?hl=324#L324 [3]
Link: https://www.openwall.com/lists/oss-security/2014/11/26/14 [4]
Reported-by: Pedro Falcato <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
Not used by any drivers any more, the only use case in drm_dev_init()
can be inlined now.
Signed-off-by: Christian König <[email protected]>
Reviewed-by: Jeffrey Hugo <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The FEI field of C2HTermReq/H2CTermReq is 4 bytes but not 4-byte-aligned
in the NVMe/TCP specification (it is located at offset 10 in the PDU).
Split it into two 16-bit integers in struct nvme_tcp_term_pdu
so no padding is inserted. There should also be 10 reserved bytes after.
There are currently no users of this type.
Fixes: fc221d05447aa6db ("nvme-tcp: Add protocol header")
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Caleb Sander <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
|
|
This helper is no longer used in the tree.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
rcu_bh is no longer a win, especially for objects freed
with standard call_rcu().
Switch neighbour code to no longer disable BH when not necessary.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into drm-next
This tag contains habanalabs driver and accel changes for v6.4:
- uAPI changes:
- Add opcodes to the CS ioctl to allow user to stall/resume specific engines
inside Gaudi2. This is to allow the user to perform power
testing/measurements when training different topologies.
- Expose in the INFO ioctl the amount of device memory that the driver
and f/w reserve for themselves.
- Expose in the INFO ioctl a bit-mask of the available rotator engines
in Gaudi2. This is to align with other engines that are already exposed.
- Expose in the INFO ioctl the register's address of the f/w that should
be used to trigger interrupts from within the user's code running in the
compute engines.
- Add a critical-event bit in the eventfd bitmask so the user will know the
event that was received was critical, and a reset will now occur
- Expose in the INFO ioctl two new opcodes to fetch information on h/w and
f/w events. The events recorded are the events that were reported in the
eventfd.
- New features and improvements:
- Add a dedicated interrupt ID in MSI-X in the device to the notification of
an unexpected user-related event in Gaudi2. Handle it in the driver by
reporting this event.
- Allow the user to fetch the device memory current usage even when the
device is undergoing compute-reset (a reset type that only clears the
compute engines).
- Enable graceful reset mechanism for compute-reset. This will give the
user a few seconds before the device is reset. For example, the user can,
during that time, perform certain device operations (dump data for debug)
or close the device in an orderly fashion.
- Align the decoder with the rest of the engines in regard to notification
to the user about interrupts and in regard to performing graceful reset
when needed (instead of immediate reset).
- Add support for assert interrupt from the TPC engine.
- Get the reset type that is necessary to perform per event from the
auto-generated irq_map array.
- Print the specific reason why a device is still in use when notifying to
the user about it (after the user closed the device's FD).
- Move to threaded IRQ when handling interrupts of workload completions.
- Firmware related fixes:
- Fix RAZWI event handler to match newest f/w version.
- Read error cause register in dma core events because the f/w doesn't
do that.
- Increase maximum time to wait for completion of Gaudi2 reset due to f/w
bug.
- Align to the latest firmware specs.
- Enforce the release order of the compute device and dma-buf.
i.e increment the device file refcount for any dma-buf that was exported
for that device. This will make sure the compute device release function
won't be called until the user closes all the FDs of the relevant
dma-bufs. Without this change, closing the device's FD before/without
closing the dma-buf's FD would always lead to hard-reset of the device.
- Fix a link in the drm documentation to correctly point to the accel section.
- Compilation warnings cleanups
- Misc bug fixes and code cleanups
Signed-off-by: Dave Airlie <[email protected]>
# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmQYfcAACgkQZR1NuKta
# 54DB4Af/SuiHZkVXwr+yHPv9El726rz9ZQD7mQtzNmehWGonwAvz15yqocNMUSbF
# JbqE/vrZjvbXrP1Uv5UrlRVdnFHSPV18VnHU4BMS/WOm19SsR6vZ0QOXOoa6/AUb
# w+kF3D//DbFI4/mTGfpH5/pzwu51ti8aVktosPFlHIa8iI8CB4/4IV+ivQ8UW4oK
# HyDRkIvHdRmER7vGOfhwhsr4zdqSlJBYrv3C3Z1dkSYBPW/5ICbiM1UlKycwdYKI
# cajQBSdUQwUCWnI+i8RmSy3kjNO6OE4XRUvTv89F2bQeyK/1rJLG2m2xZR/Ml/o5
# 7Cgvbn0hWZyeqe7OObYiBlSOBSehCA==
# =wclm
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 21 Mar 2023 01:37:36 AEST
# gpg: using RSA key ED311BA00042EF52DCB412C5651D4DB8AB5AE780
# gpg: Can't check signature: No public key
From: Oded Gabbay <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Add devicetree binding document and related header file
for the Loongson-1 clock.
Signed-off-by: Keguang Zhang <[email protected]>
Reviewed-by: Krzysztof Kozlowski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Stephen Boyd <[email protected]>
|
|
The CDAT exposed in sysfs differs between little endian and big endian
arches: On big endian, every 4 bytes are byte-swapped.
PCI Configuration Space is little endian (PCI r3.0 sec 6.1). Accessors
such as pci_read_config_dword() implicitly swap bytes on big endian.
That way, the macros in include/uapi/linux/pci_regs.h work regardless of
the arch's endianness. For an example of implicit byte-swapping, see
ppc4xx_pciex_read_config(), which calls in_le32(), which uses lwbrx
(Load Word Byte-Reverse Indexed).
DOE Read/Write Data Mailbox Registers are unlike other registers in
Configuration Space in that they contain or receive a 4 byte portion of
an opaque byte stream (a "Data Object" per PCIe r6.0 sec 7.9.24.5f).
They need to be copied to or from the request/response buffer verbatim.
So amend pci_doe_send_req() and pci_doe_recv_resp() to undo the implicit
byte-swapping.
The CXL_DOE_TABLE_ACCESS_* and PCI_DOE_DATA_OBJECT_DISC_* macros assume
implicit byte-swapping. Byte-swap requests after constructing them with
those macros and byte-swap responses before parsing them.
Change the request and response type to __le32 to avoid sparse warnings.
Per a request from Jonathan, replace sizeof(u32) with sizeof(__le32) for
consistency.
Fixes: c97006046c79 ("cxl/port: Read CDAT table")
Tested-by: Ira Weiny <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: [email protected] # v6.0+
Reviewed-by: Jonathan Cameron <[email protected]>
Link: https://lore.kernel.org/r/3051114102f41d19df3debbee123129118fc5e6d.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <[email protected]>
|
|
Merge series from Peter Ujfalusi <[email protected]>:
On a platform when the DSP is in use, we cannot select individual links
to use or not use the DSP, it is either all or none. On some audio
endpoint, like HDMI/DP, it is preferred to not use any processing in DSP
to reduce the latency and to allow bytestream pass-through (DTS, DD,
etc)
IPC4 introduces a new type of end-to-end connection within the DSP which
is using the host DMA and link DMA in a single buffer, working
back-to-back, passing the received data without looking at it or trying
to understand the format, content.
This mode reduces the latency and allows non PCM streams to be sent from
userspace.
The feature is enabled per PCM bases, signalled in topology.
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for v6.4-rc1:
Cross-subsystem Changes:
- Add drm_bridge.h to drm_bridge maintainers.
Core Changes:
- Assorted fixes to TTM, tests, format-helper, accel.
- Assorted Makefile fixes to drivers and accel.
- Implement fbdev emulation for GEM DMA drivers, and convert a lot of
drivers to use it.
- Use tgid instead of pid for tracking clients.
Driver Changes:
- Assorted fixes in rockchip, vmwgfx, nouveau, cirrus.
- Add imx25 driver.
- Add Elida KD50T048A, Sony TD4353, Novatek NT36523, STARRY 2081101QFH032011-53G panels.
- Add 4K mode support to rockchip.
- Convert cirrus to use regular atomic helpers, and more cirrus
improvements.
- Add damage clipping to cirrus, virtio.
[airlied: add drm_bridge.h include to imx]
Signed-off-by: Dave Airlie <[email protected]>
From: Maarten Lankhorst <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
The function returned zero unconditionally. Switch the return type to void
and simplify the callers accordingly.
Signed-off-by: Uwe Kleine-König <[email protected]>
Reviewed-by: Miquel Raynal <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexandre Belloni <[email protected]>
|
|
When debugging a crash that appears to be related to ftrace, but not for
sure, it is useful to know if a function was ever enabled by ftrace or
not. It could be that a BPF program was attached to it, or possibly a live
patch.
We are having crashes in the field where this information is not always
known. But having ftrace set a flag if a function has ever been attached
since boot up helps tremendously in trying to know if a crash had to do
with something using ftrace.
For analyzing crashes, the use of a kdump image can have access to the
flags. When looking at issues where the kernel did not panic, the
touched_functions file can simply be used.
Link: https://lore.kernel.org/linux-trace-kernel/[email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Catalin Marinas <[email protected]>
Tested-by: Mark Rutland <[email protected]>
Tested-by: Chris Li <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The ftrace selftest code has a trace_direct_tramp() function which it
uses as a direct call trampoline. This happens to work on x86, since the
direct call's return address is in the usual place, and can be returned
to via a RET, but in general the calling convention for direct calls is
different from regular function calls, and requires a trampoline written
in assembly.
On s390, regular function calls place the return address in %r14, and an
ftrace patch-site in an instrumented function places the trampoline's
return address (which is within the instrumented function) in %r0,
preserving the original %r14 value in-place. As a regular C function
will return to the address in %r14, using a C function as the trampoline
results in the trampoline returning to the caller of the instrumented
function, skipping the body of the instrumented function.
Note that the s390 issue is not detcted by the ftrace selftest code, as
the instrumented function is trivial, and returning back into the caller
happens to be equivalent.
On arm64, regular function calls place the return address in x30, and
an ftrace patch-site in an instrumented function saves this into r9
and places the trampoline's return address (within the instrumented
function) in x30. A regular C function will return to the address in
x30, but will not restore x9 into x30. Consequently, using a C function
as the trampoline results in returning to the trampoline's return
address having corrupted x30, such that when the instrumented function
returns, it will return back into itself.
To avoid future issues in this area, remove the trace_direct_tramp()
function, and require that each architecture with direct calls provides
a stub trampoline, named ftrace_stub_direct_tramp. This can be written
to handle the architecture's trampoline calling convention, and in
future could be used elsewhere (e.g. in the ftrace ops sample, to
measure the overhead of direct calls), so we may as well always build it
in.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Cc: Li Huafei <[email protected]>
Cc: Xu Kuohai <[email protected]>
Signed-off-by: Florent Revest <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Direct called trampolines can be called in two ways:
- either from the ftrace callsite. In this case, they do not access any
struct ftrace_regs nor pt_regs
- Or, if a ftrace ops is also attached, from the end of a ftrace
trampoline. In this case, the call_direct_funcs ops is in charge of
setting the direct call trampoline's address in a struct ftrace_regs
Since:
commit 9705bc709604 ("ftrace: pass fregs to arch_ftrace_set_direct_caller()")
The later case no longer requires a full pt_regs. It only needs a struct
ftrace_regs so DIRECT_CALLS can work with both WITH_ARGS or WITH_REGS.
With architectures like arm64 already abandoning WITH_REGS in favor of
WITH_ARGS, it's important to have DIRECT_CALLS work WITH_ARGS only.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Florent Revest <[email protected]>
Co-developed-by: Mark Rutland <[email protected]>
Signed-off-by: Mark Rutland <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|