Age | Commit message (Collapse) | Author | Files | Lines |
|
With recent changes that separated action module load from action
initialization tcf_action_init() function error handling code was modified
to manually release the loaded modules if loading/initialization of any
further action in same batch failed. For the case when all modules
successfully loaded and some of the actions were initialized before one of
them failed in init handler. In this case for all previous actions the
module will be released twice by the error handler: First time by the loop
that manually calls module_put() for all ops, and second time by the action
destroy code that puts the module after destroying the action.
Reproduction:
$ sudo tc actions add action simple sdata \"2\" index 2
$ sudo tc actions add action simple sdata \"1\" index 1 \
action simple sdata \"2\" index 2
RTNETLINK answers: File exists
We have an error talking to the kernel
$ sudo tc actions ls action simple
total acts 1
action order 0: Simple <"2">
index 2 ref 1 bind 0
$ sudo tc actions flush action simple
$ sudo tc actions ls action simple
$ sudo tc actions add action simple sdata \"2\" index 2
Error: Failed to load TC action module.
We have an error talking to the kernel
$ lsmod | grep simple
act_simple 20480 -1
Fix the issue by modifying module reference counting handling in action
initialization code:
- Get module reference in tcf_idr_create() and put it in tcf_idr_release()
instead of taking over the reference held by the caller.
- Modify users of tcf_action_init_1() to always release the module
reference which they obtain before calling init function instead of
assuming that created action takes over the reference.
- Finally, modify tcf_action_init_1() to not release the module reference
when overwriting existing action as this is no longer necessary since both
upper and lower layers obtain and manage their own module references
independently.
Fixes: d349f9976868 ("net_sched: fix RTNL deadlock again caused by request_module()")
Suggested-by: Cong Wang <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Action init code increments reference counter when it changes an action.
This is the desired behavior for cls API which needs to obtain action
reference for every classifier that points to action. However, act API just
needs to change the action and releases the reference before returning.
This sequence breaks when the requested action doesn't exist, which causes
act API init code to create new action with specified index, but action is
still released before returning and is deleted (unless it was referenced
concurrently by cls API).
Reproduction:
$ sudo tc actions ls action gact
$ sudo tc actions change action gact drop index 1
$ sudo tc actions ls action gact
Extend tcf_action_init() to accept 'init_res' array and initialize it with
action->ops->init() result. In tcf_action_add() remove pointers to created
actions from actions array before passing it to tcf_action_put_many().
Fixes: cae422f379f3 ("net: sched: use reference counting action init")
Reported-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This reverts commit 6855e8213e06efcaf7c02a15e12b1ae64b9a7149.
Following commit in series fixes the issue without introducing regression
in error rollback of tcf_action_destroy().
Signed-off-by: Vlad Buslov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove including <linux/version.h> that don't need it.
Signed-off-by: Tian Tao <[email protected]>
Signed-off-by: Zhiqi Song <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Huazhong Tan says:
====================
net: hns3: add support for pm_ops
This series adds support for pm_ops in the HNS3 ethernet driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
To implement the system suspend/resume functions, the NIC driver needs
to support:
1. When the system enters the suspend mode, the driver needs to
implement the suspend callback function of the NIC device. The driver
needs to mute the device, stop all RX/TX activities of the device, and
unmap the interrupt.
2. When the system enters the resume mode, the driver needs to
implement the resume callback function of the NIC device and restore
the device to the state before suspension.
When the system enters the suspend and resume mode, the NIC driver
actually executes the PF function reset process.
When the PFs are suspending/resuming, VFs also enter the suspend/resume
state because the PFs trigger the VFs to reset, therefore no operation
is required when the VF pci_driver is suspending or resuming.
Signed-off-by: Jiaran Zhang <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The flr_prepare/flr_done functions are not only used in the FLR scenario,
but also used in the suspend/resume.
Change the function names to prepare_for_reset/rebuild_for_reset, change
the flr_prepare/flr_done to reset_prepare/reset_done in hnae3_ae_ops.
Signed-off-by: Jiaran Zhang <[email protected]>
Signed-off-by: Huazhong Tan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Shannon Nelson says:
====================
ionic: hwstamp tweaks
A few little changes after review comments and
additional internal testing.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Make sure the configuration is locked before
operating on it for the replay.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Split the call into ionic_lif_hwstamp_set() to have two
separate interfaces, one from the ioctl() for changing the
configuration and one for replaying the current configuration
after a FW RESET.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When starting the queues in the link-check, don't go into
the BROKEN state if the return was EBUSY.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When returning after a firmware reset, re-start the
PTP after we've restarted the general queues.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Set the SKBTX_IN_PROGRESS when offloading the Tx timestamp.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make sure the device is in a Tx offload mode before calling the
hwstamp offload xmit.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We don't need to look for HAVE_HWSTAMP_TX_ONESTEP_P2P in the
upstream kernel.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Clean up variable declarations.
Signed-off-by: Shannon Nelson <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Marek Behún says:
====================
net: phy: marvell10g updates
Here are some updates for marvell10g PHY driver.
I am still working on some more changes for this driver, but I would
like to have at least something reviewed / applied.
Changes since v3:
- added Andrew's Reviewed-by tags
- removed patches adding variadic-macro library and bitmap
initialization macro - it causes warning that we are not currently
able to fix easily. Instead the supported_interfaces bitmap is now
initialized via a chip specific method
- added explanation of mactype initialization to commit message of patch
07/16
- fixed repeated word in commit message of second to last patch
Changes since v2:
- code refactored to use an additional structure mv3310_chip describing
mv3310 specific properties / operations for PHYs supported by this
driver
- added separate phy_driver structures for 88X3340 and 88E2111
- removed 88E2180 specific code (dual-port and quad-port SXGMII modes
are ignored for now)
Changes since v1:
- added various MACTYPEs support also for 88E21XX
- differentiate between specific models with same PHY_ID
- better check for compatible interface
- print exact model
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add myself as maintainer of the marvell10g ethernet PHY driver, in
addition to Russell King.
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This module supports not only Alaska X, but also Alaska M.
Change module description appropriately.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
88E2111 is a variant of 88E2110 which does not support 5 gigabit speeds.
Differentiate these variants via the match_phy_device() method, since
they have the same PHY ID.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add constants for 2.5G and 5G speed in PCS speed register into mdio.h.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The driver name "mv88x2110" should be instead "mv88e2110".
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 88X3340 contains 4 cores similar to 88X3310, but there is a
difference: it does not support xaui host mode. Instead the
corresponding MACTYPE means
rxaui / 5gbase-r / 2500base-x / sgmii without AN
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently the only "changing" MACTYPE we support is when the PHY changes
between
10gbase-r / 5gbase-r / 2500base-x / sgmii
Add support for
usxgmii
xaui / 5gbase-r / 2500base-x / sgmii
rxaui / 5gbase-r / 2500base-x / sgmii
and also
5gbase-r / 2500base-x / sgmii
for 88E2110.
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Now that we have a chip structure, we can store the temperature reading
method in this structure (OOP style).
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The 88E2110 does not support xaui nor rxaui modes. Check for correct
interface mode for different chips.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add support for all rate matching modes for 88X3310 (currently only
10gbase-r is supported, but xaui and rxaui can also be used).
Add support for rate matching for 88E2110 (on 88E2110 the MACTYPE
register is at a different place).
Currently rate matching mode is selected by strapping pins (by setting
the MACTYPE register). There is work in progress to enable this driver
to deduce the best MACTYPE from the knowledge of which interface modes
are supported by the host, but this work is not finished yet.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add all MACTYPE definitions for 88E2110, 88E2180, 88E2111 and 88E2181.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add all MACTYPE definitions for 88X3310, 88X3310P, 88X3340 and 88X3340P.
In order to have consistent naming, rename
MV_V2_33X0_PORT_CTRL_MACTYPE_RATE_MATCH to
MV_V2_33X0_PORT_CTRL_MACTYPE_10GBASER_RATE_MATCH.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Rename port control registers to indicate that they are valid only for
88X33x0, not for 88E21x0.
Signed-off-by: Marek Behún <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
These modes are also supported by these PHYs.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This space should be a tab instead.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The MV_V2_PORT_MAC_TYPE_* is part of the CTRL register. Rename to
MV_V2_PORT_CTRL_MACTYPE_*.
Signed-off-by: Marek Behún <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470
RIP: 0010:io_ring_exit_work+0xe6/0x470
Call Trace:
process_one_work+0x206/0x400
worker_thread+0x4a/0x3d0
kthread+0x129/0x170
ret_from_fork+0x22/0x30
INFO: task lfs-openat:2359 blocked for more than 245 seconds.
task:lfs-openat state:D stack: 0 pid: 2359 ppid: 1 flags:0x00000004
Call Trace:
...
wait_for_completion+0x8b/0xf0
io_wq_destroy_manager+0x24/0x60
io_wq_put_and_exit+0x18/0x30
io_uring_clean_tctx+0x76/0xa0
__io_uring_files_cancel+0x1b9/0x2e0
do_exit+0xc0/0xb40
...
Even after io-wq destroy has been issued io-wq worker threads will
continue executing all left work items as usual, and may hang waiting
for I/O that won't ever complete (aka unbounded).
[<0>] pipe_read+0x306/0x450
[<0>] io_iter_do_read+0x1e/0x40
[<0>] io_read+0xd5/0x330
[<0>] io_issue_sqe+0xd21/0x18a0
[<0>] io_wq_submit_work+0x6c/0x140
[<0>] io_worker_handle_work+0x17d/0x400
[<0>] io_wqe_worker+0x2c0/0x330
[<0>] ret_from_fork+0x22/0x30
Cancel all unbounded I/O instead of executing them. This changes the
user visible behaviour, but that's inevitable as io-wq is not per task.
Suggested-by: Jens Axboe <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18
As reissuing is now passed back by REQ_F_REISSUE and kiocb_done()
internally uses __io_complete_rw(), it may stop after setting the flag
so leaving a dangling request.
There are tricky edge cases, e.g. reading beyound file, boundary, so
the easiest way is to hand code reissue in kiocb_done() as
__io_complete_rw() was doing for us before.
Fixes: 230d50d448ac ("io_uring: move reissue into regular IO path")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
The nla_len() is less than or equal to 16. If it's less than 16 then end
of the "gid" buffer is uninitialized.
Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Dan Carpenter <[email protected]>
Signed-off-by: Mark Bloch <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- fix incorrect dereference of the ext_params2 external interrupt
parameter, which leads to an instant kernel crash if a pfault
interrupt occurs.
- add forgotten stack unwinder support, and fix memory leak for the
new machine check handler stack.
- fix inline assembly register clobbering due to KASAN code
instrumentation.
* tag 's390-5.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/setup: use memblock_free_late() to free old stack
s390/irq: fix reading of ext_params2 field from lowcore
s390/unwind: add machine check handler stack
s390/cpcmd: fix inline assembly register clobbering
|
|
In ice_suspend(), ice_clear_interrupt_scheme() is called, and then
irq_free_descs() will be eventually called to free irq and its descriptor.
In ice_resume(), ice_init_interrupt_scheme() is called to allocate new
irqs. However, in ice_rebuild_arfs(), struct irq_glue and struct cpu_rmap
maybe cannot be freed, if the irqs that released in ice_suspend() were
reassigned to other devices, which makes irq descriptor's affinity_notify
lost.
So call ice_free_cpu_rx_rmap() before ice_clear_interrupt_scheme(), which
can make sure all irq_glue and cpu_rmap can be correctly released before
corresponding irq and descriptor are released.
Fix the following memory leak.
unreferenced object 0xffff95bd951afc00 (size 512):
comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
hex dump (first 32 bytes):
18 00 00 00 18 00 18 00 70 fc 1a 95 bd 95 ff ff ........p.......
00 00 ff ff 01 00 ff ff 02 00 ff ff 03 00 ff ff ................
backtrace:
[<0000000072e4b914>] __kmalloc+0x336/0x540
[<0000000054642a87>] alloc_cpu_rmap+0x3b/0xb0
[<00000000f220deec>] ice_set_cpu_rx_rmap+0x6a/0x110 [ice]
[<000000002370a632>] ice_probe+0x941/0x1180 [ice]
[<00000000d692edba>] local_pci_probe+0x47/0xa0
[<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
[<00000000555a9e4a>] process_one_work+0x1dd/0x410
[<000000002c4b414a>] worker_thread+0x221/0x3f0
[<00000000bb2b556b>] kthread+0x14c/0x170
[<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
unreferenced object 0xffff95bd81b0a2a0 (size 96):
comm "kworker/0:1", pid 134, jiffies 4294684283 (age 13051.958s)
hex dump (first 32 bytes):
38 00 00 00 01 00 00 00 e0 ff ff ff 0f 00 00 00 8...............
b0 a2 b0 81 bd 95 ff ff b0 a2 b0 81 bd 95 ff ff ................
backtrace:
[<00000000582dd5c5>] kmem_cache_alloc_trace+0x31f/0x4c0
[<000000002659850d>] irq_cpu_rmap_add+0x25/0xe0
[<00000000495a3055>] ice_set_cpu_rx_rmap+0xb4/0x110 [ice]
[<000000002370a632>] ice_probe+0x941/0x1180 [ice]
[<00000000d692edba>] local_pci_probe+0x47/0xa0
[<00000000503934f0>] work_for_cpu_fn+0x1a/0x30
[<00000000555a9e4a>] process_one_work+0x1dd/0x410
[<000000002c4b414a>] worker_thread+0x221/0x3f0
[<00000000bb2b556b>] kthread+0x14c/0x170
[<00000000ad2cf1cd>] ret_from_fork+0x1f/0x30
Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
Signed-off-by: Yongxin Liu <[email protected]>
Tested-by: Tony Brelinski <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Set proper return values inside error checking if-statements.
Previously following warning was produced when compiling against sparse.
i40e_main.c:15162 i40e_init_recovery_mode() warn: missing error code 'err'
Fixes: 4ff0ee1af0169 ("i40e: Introduce recovery mode support")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Remove vsi->netdev->name from the trace.
This is redundant information. With the devinfo trace, the adapter
is already identifiable.
Previously following error was produced when compiling against sparse.
i40e_main.c:2571 i40e_sync_vsi_filters() error:
we previously assumed 'vsi->netdev' could be null (see line 2323)
Fixes: b603f9dc20af ("i40e: Log info when PF is entering and leaving Allmulti mode.")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Init pointer with NULL in default switch case statement.
Previously the error was produced when compiling against sparse.
i40e_debugfs.c:582 i40e_dbg_dump_desc() error: uninitialized symbol 'ring'.
Fixes: 44ea803e2fa7 ("i40e: introduce new dump desc XDP command")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Remove error handling through pointers. Instead use plain int
to return value from i40e_run_xdp(...).
Previously:
- sparse errors were produced during compilation:
i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()
- sk_buff* was used to return value, but it has never had valid
pointer to sk_buff. Returned value was always int handled as
a pointer.
Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
Fixes: 2e6893123830 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
Signed-off-by: Aleksandr Loktionov <[email protected]>
Signed-off-by: Arkadiusz Kubalewski <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
Change parameters order in aq_get_phy_register() due to wrong
statistics in PHY reported by ethtool. Previously all PHY statistics were
exactly the same for all interfaces
Now statistics are reported correctly - different for different interfaces
Fixes: 0514db37dd78 ("i40e: Extend PHY access with page change flag")
Signed-off-by: Grzegorz Siwik <[email protected]>
Tested-by: Dave Switzer <[email protected]>
Signed-off-by: Tony Nguyen <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This batch became unexpectedly bigger due to the pending ASoC patches,
but all look small and fine device-specific fixes.
Many of the commits are for ASoC Intel drivers, while the rest are for
ASoC small codec/platform fixes and HD-audio quirks"
* tag 'sound-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (21 commits)
ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
ALSA: aloop: Fix initialization of controls
ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
ASoC: fsl_esai: Fix TDM slot setup for I2S mode
ASoC: codecs: lpass-rx-macro: set npl clock rate correctly
ASoC: codecs: lpass-tx-macro: set npl clock rate correctly
ASoC: sunxi: sun4i-codec: fill ASoC card owner
ASoC: cygnus: fix for_each_child.cocci warnings
ASoC: max98373: Added 30ms turn on/off time delay
ASoC: max98373: Changed amp shutdown register as volatile
ASoC: intel: atom: Remove 44100 sample-rate from the media and deep-buffer DAI descriptions
ASoC: intel: atom: Stop advertising non working S24LE support
ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
ASoC: SOF: Intel: move ELH chip info
ASoC: SOF: Intel: APL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: CNL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: ICL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: TGL: set shutdown callback to hda_dsp_shutdown
ASoC: SOF: Intel: TGL: fix EHL ops
ASoC: SOF: core: harden shutdown helper
...
|
|
Pull kvm fix from Paolo Bonzini:
"A lone x86 patch, for a bug found while developing a backport to
stable versions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull close_range() fix from Christian Brauner:
"Syzbot reported a bug in close_range.
Debugging this showed we didn't recalculate the current maximum fd
number for CLOSE_RANGE_UNSHARE | CLOSE_RANGE_CLOEXEC after we unshared
the file descriptors table. As a result, max_fd could exceed the
current fdtable maximum causing us to set excessive bits.
As a concrete example, let's say the user requested everything from fd
4 to ~0UL to be closed and their current fdtable size is 256 with
their highest open fd being 4. With CLOSE_RANGE_UNSHARE the caller
will end up with a new fdtable which has room for 64 file descriptors
since that is the lowest fdtable size we accept. But now max_fd will
still point to 255 and needs to be adjusted. Fix this by retrieving
the correct maximum fd value in __range_cloexec().
I've carried this fix for a little while but since there was no
linux-next release over easter I waited until now.
With this change close_range() can be further simplified but imho we
are in no hurry to do that and so I'll defer this for the 5.13 merge
window"
* tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
file: fix close_range() for unshare+cloexec
|
|
Pull umount fix from Al Viro:
"Brown paperbag time: dumb braino in the series that went into 5.7
broke the 'don't step into ->d_weak_revalidate() when umount(2) looks
the victim up' behaviour.
Spotted only now - saw
if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) {
err = handle_lookup_down(nd);
nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please...
}
and went "why do we clear that flag here - nothing below that point is
going to check it anyway" / "wait a minute, what is it doing *after*
complete_walk() (which is where we check that flag and call
->d_weak_revalidate())" / "how could that possibly _not_ break?",
followed by reproducing the breakage and verifying that the obvious
fix of that braino does, indeed, fix it.
The reproducer is (assuming that $DIR exists and is exported r/w to
localhost)
mkdir $DIR/a
mkdir /tmp/foo
mount --bind /tmp/foo /tmp/foo
mkdir /tmp/foo/a
mkdir /tmp/foo/b
mount -t nfs4 localhost:$DIR/a /tmp/foo/a
mount -t nfs4 localhost:$DIR /tmp/foo/b
rmdir /tmp/foo/b/a
umount /tmp/foo/b
umount /tmp/foo/a
umount -l /tmp/foo # will get everything under /tmp/foo, no matter what
Correct behaviour is successful umount; broken kernels (5.7-rc1 and
later) get
umount.nfs4: /tmp/foo/a: Stale file handle
Note that bind mount is there to be able to recover - on broken
kernels we'd get stuck with impossible-to-umount filesystem if not for
that.
FWIW, that braino had been posted for review back then, at least
twice. Unfortunately, the call of complete_walk() was outside of diff
context, so the bogosity hadn't been immediately obvious from the
patch alone ;-/"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
|
|
If the beacon head attribute (NL80211_ATTR_BEACON_HEAD)
is too short to even contain the frame control field,
we access uninitialized data beyond the buffer. Fix this
by checking the minimal required size first. We used to
do this until S1G support was added, where the fixed
data portion has a different size.
Reported-and-tested-by: [email protected]
Suggested-by: Eric Dumazet <[email protected]>
Fixes: 1d47f1198d58 ("nl80211: correctly validate S1G beacon head")
Signed-off-by: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/r/20210408154518.d9b06d39b4ee.Iff908997b2a4067e8d456b3cb96cab9771d252b8@changeid
Signed-off-by: Johannes Berg <[email protected]>
|
|
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <[email protected]>
Reviewed-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|
|
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <[email protected]>
Reviewed-by: Daniel Borkmann <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
|