Age | Commit message (Collapse) | Author | Files | Lines |
|
into master
Pull drm fixes from Dave Airlie:
"The nouveau fixes missed the last pull by a few hours, and we had a
few arm driver/panel/bridge fixes come in.
This is possibly a bit more than I'm comfortable sending at this
stage, but I've looked at each patch, the core + nouveau patches fix
regressions, and the arm related ones are all around screens turning
on and working, and are mostly trivial patches, the line count is
mostly in comments.
core:
- fix possible use-after-free
drm_fb_helper:
- regression fix to use memcpy_io on bochs' sparc64
nouveau:
- format modifiers fixes
- HDA regression fix
- turing modesetting race fix
of:
- fix a double free
dbi:
- fix SPI Type 1 transfer
mcde:
- fix screen stability crash
panel:
- panel: fix display noise on auo,kd101n80-45na
- panel: delay HPD checks for boe_nv133fhm_n61
bridge:
- bridge: drop connector check in nwl-dsi bridge
- bridge: set proper bridge type for adv7511"
* tag 'drm-fixes-2020-07-29' of git://anongit.freedesktop.org/drm/drm:
drm: hold gem reference until object is no longer accessed
drm/dbi: Fix SPI Type 1 (9-bit) transfer
drm/drm_fb_helper: fix fbdev with sparc64
drm/mcde: Fix stability issue
drm/bridge: nwl-dsi: Drop DRM_BRIDGE_ATTACH_NO_CONNECTOR check.
drm/panel: Fix auo, kd101n80-45na horizontal noise on edges of panel
drm: panel: simple: Delay HPD checking on boe_nv133fhm_n61 for 15 ms
drm/bridge/adv7511: set the bridge type properly
drm: of: Fix double-free bug
drm/nouveau/fbcon: zero-initialise the mode_cmd2 structure
drm/nouveau/fbcon: fix module unload when fbcon init has failed for some reason
drm/nouveau/kms/tu102: wait for core update to complete when assigning windows
drm/nouveau/kms/gf100: use correct format modifiers
drm/nouveau/disp/gm200-: fix regression from HDA SOR selection changes
|
|
This modifies the first 32 bits out of the 128 bits of a random CPU's
net_rand_state on interrupt or CPU activity to complicate remote
observations that could lead to guessing the network RNG's internal
state.
Note that depending on some network devices' interrupt rate moderation
or binding, this re-seeding might happen on every packet or even almost
never.
In addition, with NOHZ some CPUs might not even get timer interrupts,
leaving their local state rarely updated, while they are running
networked processes making use of the random state. For this reason, we
also perform this update in update_process_times() in order to at least
update the state when there is user or system activity, since it's the
only case we care about.
Reported-by: Amit Klein <[email protected]>
Suggested-by: Linus Torvalds <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: "Jason A. Donenfeld" <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Signed-off-by: Willy Tarreau <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
free cmd id is read using virtio endian, spec says all fields
in balloon are LE. Fix it up.
Fixes: 86a559787e6f ("virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT")
Cc: [email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
|
|
The poison_val field in the virtio_balloon_config is treated as a
little-endian field by the host. Since we are currently only having to deal
with a single byte poison value this isn't a problem, however if the value
should ever expand it would cause byte ordering issues. Document that in
the code so that we know that if the value should ever expand we need to
byte swap the value on big-endian architectures.
Signed-off-by: Alexander Duyck <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
|
|
vhost/scsi doesn't handle type conversion correctly
for request type when using virtio 1.0 and up for BE,
or cross-endian platforms.
Fix it up using vhost_32_to_cpu.
Cc: [email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
Acked-by: Jason Wang <[email protected]>
Reviewed-by: Stefan Hajnoczi <[email protected]>
|
|
Pull NVMe fixes from Christoph.
* 'nvme-5.8' of git://git.infradead.org/nvme:
nvme: add a Identify Namespace Identification Descriptor list quirk
nvme-pci: prevent SK hynix PC400 from using Write Zeroes command
nvme-tcp: fix possible hang waiting for icresp response
|
|
Scatter CQE feature relies on two flags MLX5_QP_FLAG_SCATTER_CQE and
MLX5_QP_FLAG_ALLOW_SCATTER_CQE, both of them can be provided without
relation to device capability.
Relax global validity check to allow MLX5_QP_FLAG_ALLOW_SCATTER_CQE QP
flag.
Existing user applications are failing on this new validity check.
Fixes: 90ecb37a751b ("RDMA/mlx5: Change scatter CQE flag to be set like other vendor flags")
Fixes: 37518fa49f76 ("RDMA/mlx5: Process all vendor flags in one place")
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
|
|
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Callback function fw_cfg_sysfs_release_entry() in kobject_put()
can handle the pointer "entry" properly.
Signed-off-by: Qiushi Wu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Michael S. Tsirkin <[email protected]>
|
|
0day reported a possible circular locking dependency:
Chain exists of:
&irq_desc_lock_class --> console_owner --> &port_lock_key
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&port_lock_key);
lock(console_owner);
lock(&port_lock_key);
lock(&irq_desc_lock_class);
The reason for this is a printk() in the i8259 interrupt chip driver
which is invoked with the irq descriptor lock held, which reverses the
lock operations vs. printk() from arbitrary contexts.
Switch the printk() to printk_deferred() to avoid that.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
|
|
Preemption must be disabled before entering a sequence count write side
critical section. Failing to do so, the seqcount read side can preempt
the write side section and spin for the entire scheduler tick. If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.
Assert through lockdep that preemption is disabled for seqcount writers.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Asserting that preemption is enabled or disabled is a critical sanity
check. Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.
Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, this assert has no runtime overhead.
References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
raw_seqcount_begin() has the same code as raw_read_seqcount(), with the
exception of masking the sequence counter's LSB before returning it to
the caller.
Note, raw_seqcount_begin() masks the counter's LSB before returning it
to the caller so that read_seqcount_retry() can fail if the counter is
odd -- without the overhead of an extra branching instruction.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
seqlock.h is now included by kernel's RST documentation, but a small
number of the the exported seqlock.h functions are kernel-doc annotated.
Add kernel-doc for all seqlock.h exported APIs.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The seqlock.h seqcount_t and seqlock_t API definitions are presented in
the chronological order of their development rather than the order that
makes most sense to readers. This makes it hard to follow and understand
the header file code.
Group and reorder all of the exported seqlock.h functions according to
their function.
First, group together the seqcount_t standard read path functions:
- __read_seqcount_begin()
- raw_read_seqcount_begin()
- read_seqcount_begin()
since each function is implemented exactly in terms of the one above
it. Then, group the special-case seqcount_t readers on their own as:
- raw_read_seqcount()
- raw_seqcount_begin()
since the only difference between the two functions is that the second
one masks the sequence counter LSB while the first one does not. Note
that raw_seqcount_begin() can actually be implemented in terms of
raw_read_seqcount(), which will be done in a follow-up commit.
Then, group the seqcount_t write path functions, instead of injecting
unrelated seqcount_t latch functions between them, and order them as:
- raw_write_seqcount_begin()
- raw_write_seqcount_end()
- write_seqcount_begin_nested()
- write_seqcount_begin()
- write_seqcount_end()
- raw_write_seqcount_barrier()
- write_seqcount_invalidate()
which is the expected natural order. This also isolates the seqcount_t
latch functions into their own area, at the end of the sequence counters
section, and before jumping to the next one: sequential locks
(seqlock_t).
Do a similar grouping and reordering for seqlock_t "locking" readers vs.
the "conditionally locking or lockless" ones.
No implementation code was changed in any of the reordering above.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The seqcount_t latch reader example at the raw_write_seqcount_latch()
kernel-doc comment ends the latch read section with a manual smp memory
barrier and sequence counter comparison.
This is technically correct, but it is suboptimal: read_seqcount_retry()
already contains the same logic of an smp memory barrier and sequence
counter comparison.
End the latch read critical section example with read_seqcount_retry().
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:
- Divide all documentation on a seqcount_t vs. seqlock_t basis. The
description for both mechanisms was intermingled, which is incorrect
since the usage constrains for each type are vastly different.
- Add an introductory paragraph describing the internal design of, and
rationale for, sequence counters.
- Document seqcount_t writer non-preemptibility requirement, which was
not previously documented anywhere, and provide a clear rationale.
- Provide template code for seqcount_t and seqlock_t initialization
and reader/writer critical sections.
- Recommend using seqlock_t by default. It implicitly handles the
serialization and non-preemptibility requirements of writers.
At seqlock.h:
- Remove references to brlocks as they've long been removed from the
kernel.
- Remove references to gcc-3.x since the kernel's minimum supported
gcc version is 4.9.
References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: 6ec4476ac825 ("Raise gcc version requirement to 4.9")
Signed-off-by: Ahmed S. Darwish <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
|
|
This patch breaks a header loop involving qspinlock_types.h.
The issue is that qspinlock_types.h includes atomic.h, which then
eventually includes kernel.h which could lead back to the original
file via spinlock_types.h.
As ATOMIC_INIT is now defined by linux/types.h, there is no longer
any need to include atomic.h from qspinlock_types.h. This also
allows the CONFIG_PARAVIRT hack to be removed since it was trying
to prevent exactly this loop.
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h.
This allows users of atomic_t to use ATOMIC_INIT without having to
include atomic.h as that way may lead to header loops.
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Waiman Long <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Qian reported that the current setup forgoes the Kconfig dependencies and
results in warnings such as:
WARNING: unmet direct dependencies detected for SCHED_THERMAL_PRESSURE
Depends on [n]: SMP [=y] && CPU_FREQ_THERMAL [=n]
Selected by [y]:
- ARM64 [=y]
Revert commit
e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")
and re-implement it by making the option default to 'y' for arm64 and arm,
which respects Kconfig dependencies (i.e. will remain 'n' if
CPU_FREQ_THERMAL=n).
Fixes: e17ae7fea871 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")
Reported-by: Qian Cai <[email protected]>
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
present")
Unfortunately the commit listed in the subject line above failed
to ensure that the task's audit_context was properly initialized/set
before enabling the "accompanying records". Depending on the
situation, the resulting audit_context could have invalid values in
some of it's fields which could cause a kernel panic/oops when the
task/syscall exists and the audit records are generated.
We will revisit the original patch, with the necessary fixes, in a
future kernel but right now we just want to fix the kernel panic
with the least amount of added risk.
Cc: [email protected]
Fixes: 1320a4052ea1 ("audit: trigger accompanying records when no rules present")
Reported-by: [email protected]
Signed-off-by: Paul Moore <[email protected]>
|
|
Uclamp exposes 3 sysctl knobs:
* sched_util_clamp_min
* sched_util_clamp_max
* sched_util_clamp_min_rt_default
Document them in sysctl/kernel.rst.
Signed-off-by: Qais Yousef <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
RT tasks by default run at the highest capacity/performance level. When
uclamp is selected this default behavior is retained by enforcing the
requested uclamp.min (p->uclamp_req[UCLAMP_MIN]) of the RT tasks to be
uclamp_none(UCLAMP_MAX), which is SCHED_CAPACITY_SCALE; the maximum
value.
This is also referred to as 'the default boost value of RT tasks'.
See commit 1a00d999971c ("sched/uclamp: Set default clamps for RT tasks").
On battery powered devices, it is desired to control this default
(currently hardcoded) behavior at runtime to reduce energy consumed by
RT tasks.
For example, a mobile device manufacturer where big.LITTLE architecture
is dominant, the performance of the little cores varies across SoCs, and
on high end ones the big cores could be too power hungry.
Given the diversity of SoCs, the new knob allows manufactures to tune
the best performance/power for RT tasks for the particular hardware they
run on.
They could opt to further tune the value when the user selects
a different power saving mode or when the device is actively charging.
The runtime aspect of it further helps in creating a single kernel image
that can be run on multiple devices that require different tuning.
Keep in mind that a lot of RT tasks in the system are created by the
kernel. On Android for instance I can see over 50 RT tasks, only
a handful of which created by the Android framework.
To control the default behavior globally by system admins and device
integrator, introduce the new sysctl_sched_uclamp_util_min_rt_default
to change the default boost value of the RT tasks.
I anticipate this to be mostly in the form of modifying the init script
of a particular device.
To avoid polluting the fast path with unnecessary code, the approach
taken is to synchronously do the update by traversing all the existing
tasks in the system. This could race with a concurrent fork(), which is
dealt with by introducing sched_post_fork() function which will ensure
the racy fork will get the right update applied.
Tested on Juno-r2 in combination with the RT capacity awareness [1].
By default an RT task will go to the highest capacity CPU and run at the
maximum frequency, which is particularly energy inefficient on high end
mobile devices because the biggest core[s] are 'huge' and power hungry.
With this patch the RT task can be controlled to run anywhere by
default, and doesn't cause the frequency to be maximum all the time.
Yet any task that really needs to be boosted can easily escape this
default behavior by modifying its requested uclamp.min value
(p->uclamp_req[UCLAMP_MIN]) via sched_setattr() syscall.
[1] 804d402fb6f6: ("sched/rt: Make RT capacity-aware")
Signed-off-by: Qais Yousef <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The following splat was caught when setting uclamp value of a task:
BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:49
cpus_read_lock+0x68/0x130
static_key_enable+0x1c/0x38
__sched_setscheduler+0x900/0xad8
Fix by ensuring we enable the key outside of the critical section in
__sched_setscheduler()
Fixes: 46609ce22703 ("sched/uclamp: Protect uclamp fast path code with static key")
Signed-off-by: Qais Yousef <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
When the ASoC card registration fails and the codec component driver
never probes, the codec device is not initialized and therefore
memory for codec->wcaps is not allocated. This results in a NULL pointer
dereference when the codec driver suspend callback is invoked during
system suspend. Fix this by returning without performing any actions
during codec suspend/resume if the card was not registered successfully.
Reviewed-by: Pierre-Louis Bossart <[email protected]>
Signed-off-by: Ranjani Sridharan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Add a quirk for a device that does not support the Identify Namespace
Identification Descriptor list despite claiming 1.3 compliance.
Fixes: ea43d9709f72 ("nvme: fix identify error status silent ignore")
Reported-by: Ingo Brunberg <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Tested-by: Ingo Brunberg <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
* drm: fix possible use-after-free
* dbi: fix SPI Type 1 transfer
* drm_fb_helper: use memcpy_io on bochs' sparc64
* mcde: fix stability
* panel: fix display noise on auo,kd101n80-45na
* panel: delay HPD checks for boe_nv133fhm_n61
* bridge: drop connector check in nwl-dsi bridge
* bridge: set proper bridge type for adv7511
* of: fix a double free
Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/20200728110446.GA8076@linux-uq9g
|
|
Removed redundant words.
Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.")
Signed-off-by: Martin Varghese <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In multiproto mode, bareudp_xmit() accepts sending multicast MPLS and
IPv6 packets regardless of the bareudp ethertype. In practice, this
let an IP tunnel send multicast MPLS packets, or an MPLS tunnel send
IPv6 packets.
We need to restrict the test further, so that the multiproto mode only
enables
* IPv6 for IPv4 tunnels,
* or multicast MPLS for unicast MPLS tunnels.
To improve clarity, the protocol validation is moved to its own
function, where each logical test has its own condition.
v2: s/ntohs/htons/
Fixes: 4b5f67232d95 ("net: Special handling for IP & MPLS.")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ip6_route_info_create() invokes nexthop_get(), which increases the
refcount of the "nh".
When ip6_route_info_create() returns, local variable "nh" becomes
invalid, so the refcount should be decreased to keep refcount balanced.
The reference counting issue happens in one exception handling path of
ip6_route_info_create(). When nexthops can not be used with source
routing, the function forgets to decrease the refcnt increased by
nexthop_get(), causing a refcnt leak.
Fix this issue by pulling up the error source routing handling when
nexthops can not be used with source routing.
Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Xiyu Yang <[email protected]>
Signed-off-by: Xin Tan <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Subbaraya Sundeep says:
====================
Fix bugs in Octeontx2 netdev driver
There are problems in the existing Octeontx2
netdev drivers like missing cancel_work for the
reset task, missing lock in reset task and
missing unergister_netdev in driver remove.
This patch set fixes the above problems.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Added unregister_netdev in the driver remove
function. Generally unregister_netdev is called
after disabling all the device interrupts but here
it is called before disabling device mailbox
interrupts. The reason behind this is VF needs
mailbox interrupt to communicate with its PF to
clean up its resources during otx2_stop.
otx2_stop disables packet I/O and queue interrupts
first and by using mailbox interrupt communicates
to PF to free VF resources. Hence this patch
calls unregister_device just before
disabling mailbox interrupts.
Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
During driver exit cancel the queued
reset_task work in VF driver.
Fixes: 3184fb5ba96e ("octeontx2-vf: Virtual function driver support")
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Two bugs exist in the code related to reset_task
in PF driver one is the missing protection
against network stack ndo_open and ndo_close.
Other one is the missing cancel_work.
This patch fixes those problems.
Fixes: 4ff7d1488a84 ("octeontx2-pf: Error handling support")
Signed-off-by: Subbaraya Sundeep <[email protected]>
Signed-off-by: Sunil Goutham <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
It appears that not disabling a PCI device on .shutdown may lead to
a Hardware Error with particular (perhaps buggy) BIOS versions:
mlx4_en: eth0: Close port called
mlx4_en 0000:04:00.0: removed PHC
reboot: Restarting system
{1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
{1}[Hardware Error]: event severity: fatal
{1}[Hardware Error]: Error 0, type: fatal
{1}[Hardware Error]: section_type: PCIe error
{1}[Hardware Error]: port_type: 4, root port
{1}[Hardware Error]: version: 1.16
{1}[Hardware Error]: command: 0x4010, status: 0x0143
{1}[Hardware Error]: device_id: 0000:00:02.2
{1}[Hardware Error]: slot: 0
{1}[Hardware Error]: secondary_bus: 0x04
{1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x2f06
{1}[Hardware Error]: class_code: 000604
{1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
{1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00000000
{1}[Hardware Error]: aer_uncor_severity: 0x00062030
{1}[Hardware Error]: TLP Header: 40000018 040000ff 791f4080 00000000
[hw error repeats]
Kernel panic - not syncing: Fatal hardware error!
CPU: 0 PID: 2189 Comm: reboot Kdump: loaded Not tainted 5.6.x-blabla #1
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 05/05/2017
Fix the mlx4 driver.
This is a very similar problem to what had been fixed in:
commit 0d98ba8d70b0 ("scsi: hpsa: disable device during shutdown")
to address https://bugzilla.kernel.org/show_bug.cgi?id=199779.
Fixes: 2ba5fbd62b25 ("net/mlx4_core: Handle AER flow properly")
Reported-by: Jake Lawrence <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Herbert Xu says:
====================
rhashtable: Fix unprotected RCU dereference in __rht_ptr
This patch series fixes an unprotected dereference in __rht_ptr.
The first patch is a minimal fix that does not use the correct
RCU markings but is suitable for backport, and the second patch
cleans up the RCU markings.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch restores the RCU marking on bucket_table->buckets as
it really does need RCU protection. Its removal had led to a fatal
bug.
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The rcu_dereference call in rht_ptr_rcu is completely bogus because
we've already dereferenced the value in __rht_ptr and operated on it.
This causes potential double readings which could be fatal. The RCU
dereference must occur prior to the comparison in __rht_ptr.
This patch changes the order of RCU dereference so that it is done
first and the result is then fed to __rht_ptr. The RCU marking
changes have been minimised using casts which will be removed in
a follow-up patch.
Fixes: ba6306e3f648 ("rhashtable: Remove RCU marking from...")
Reported-by: "Gong, Sishuai" <[email protected]>
Signed-off-by: Herbert Xu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Modify mtk_gmac0_rgmii_adjust() so it can always be called.
mtk_gmac0_rgmii_adjust() sets-up the TRGMII clocks.
Signed-off-by: René van Dorst <[email protected]>
Signed-off-By: David Woodhouse <[email protected]>
Tested-by: Frank Wunderlich <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes-2020-07-28
This series introduces some fixes to mlx5 driver.
v1->v2:
- Drop the "Hold reference on mirred devices" patch, until Or's
comments are addressed.
- Imporve "Modify uplink state" patch commit message per Or's request.
Please pull and let me know if there is any problem.
For -Stable:
For -stable v4.9
('net/mlx5e: Fix error path of device attach')
For -stable v4.15
('net/mlx5: Verify Hardware supports requested ptp function on a given
pin')
For -stable v5.3
('net/mlx5e: Modify uplink state on interface up/down')
For -stable v5.4
('net/mlx5e: Fix kernel crash when setting vf VLANID on a VF dev')
('net/mlx5: E-switch, Destroy TSAR when fail to enable the mode')
For -stable v5.5
('net/mlx5: E-switch, Destroy TSAR after reload interface')
For -stable v5.7
('net/mlx5: Fix a bug of using ptp channel index as pin index')
====================
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Johan Hovold says:
====================
net: lan78xx: fix NULL deref and memory leak
The first two patches fix a NULL-pointer dereference at probe that can
be triggered by a malicious device and a small transfer-buffer memory
leak, respectively.
For another subsystem I would have marked them:
Cc: [email protected] # 4.3
The third one replaces the driver's current broken endpoint lookup
helper, which could end up accepting incomplete interfaces and whose
results weren't even useeren
Johan
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Drop the bogus endpoint-lookup helper which could end up accepting
interfaces based on endpoints belonging to unrelated altsettings.
Note that the returned bulk pipes and interrupt endpoint descriptor
were never actually used. Instead the bulk-endpoint numbers are
hardcoded to 1 and 2 (matching the specification), while the interrupt-
endpoint descriptor was assumed to be the third descriptor created by
USB core.
Try to bring some order to this by dropping the bogus lookup helper and
adding the missing endpoint sanity checks while keeping the interrupt-
descriptor assumption for now.
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The interrupt URB transfer-buffer was never freed on disconnect or after
probe errors.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: [email protected] <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add the missing endpoint sanity check to prevent a NULL-pointer
dereference should a malicious device lack the expected endpoints.
Note that the driver has a broken endpoint-lookup helper,
lan78xx_get_endpoints(), which can end up accepting interfaces in an
altsetting without endpoints as long as *some* altsetting has a bulk-in
and a bulk-out endpoint.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Cc: [email protected] <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
in case of an error tty_register_device_attr() returns ERR_PTR(),
add IS_ERR() check
Reported-and-tested-by: [email protected]
Link: https://syzkaller.appspot.com/bug?extid=67b2bd0e34f952d0321e
Signed-off-by: Rustam Kovhaev <[email protected]>
Reviewed-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After the cited commit, function 'mlx5_eswitch_set_vport_vlan' started
to acquire esw->state_lock.
However, esw is not defined for VF devices, hence attempting to set vf
VLANID on a VF dev will cause a kernel panic.
Fix it by moving up the (redundant) esw validation from function
'__mlx5_eswitch_set_vport_vlan' since the rest of the callers now have
and use a valid esw.
For example with vf device eth4:
# ip link set dev eth4 vf 0 vlan 0
Trace of the panic:
[ 411.409842] BUG: unable to handle page fault for address: 00000000000011b8
[ 411.449745] #PF: supervisor read access in kernel mode
[ 411.452348] #PF: error_code(0x0000) - not-present page
[ 411.454938] PGD 80000004189c9067 P4D 80000004189c9067 PUD 41899a067 PMD 0
[ 411.458382] Oops: 0000 [#1] SMP PTI
[ 411.460268] CPU: 4 PID: 5711 Comm: ip Not tainted 5.8.0-rc4_for_upstream_min_debug_2020_07_08_22_04 #1
[ 411.462447] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 411.464158] RIP: 0010:__mutex_lock+0x4e/0x940
[ 411.464928] Code: fd 41 54 49 89 f4 41 52 53 89 d3 48 83 ec 70 44 8b 1d ee 03 b0 01 65 48 8b 04 25 28 00 00 00 48 89 45 c8 31 c0 45 85 db 75 0a <48> 3b 7f 60 0f 85 7e 05 00 00 49 8d 45 68 41 56 41 b8 01 00 00 00
[ 411.467678] RSP: 0018:ffff88841fcd74b0 EFLAGS: 00010246
[ 411.468562] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 411.469715] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000001158
[ 411.470812] RBP: ffff88841fcd7550 R08: ffffffffa00fa1ce R09: 0000000000000000
[ 411.471835] R10: ffff88841fcd7570 R11: 0000000000000000 R12: 0000000000000002
[ 411.472862] R13: 0000000000001158 R14: ffffffffa00fa1ce R15: 0000000000000000
[ 411.474004] FS: 00007faee7ca6b80(0000) GS:ffff88846fc00000(0000) knlGS:0000000000000000
[ 411.475237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 411.476129] CR2: 00000000000011b8 CR3: 000000041909c006 CR4: 0000000000360ea0
[ 411.477260] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 411.478340] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 411.479332] Call Trace:
[ 411.479760] ? __nla_validate_parse.part.6+0x57/0x8f0
[ 411.482825] ? mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
[ 411.483804] mlx5_eswitch_set_vport_vlan+0x3e/0xa0 [mlx5_core]
[ 411.484733] mlx5e_set_vf_vlan+0x41/0x50 [mlx5_core]
[ 411.485545] do_setlink+0x613/0x1000
[ 411.486165] __rtnl_newlink+0x53d/0x8c0
[ 411.486791] ? mark_held_locks+0x49/0x70
[ 411.487429] ? __lock_acquire+0x8fe/0x1eb0
[ 411.488085] ? rcu_read_lock_sched_held+0x52/0x60
[ 411.488998] ? kmem_cache_alloc_trace+0x16d/0x2d0
[ 411.489759] rtnl_newlink+0x47/0x70
[ 411.490357] rtnetlink_rcv_msg+0x24e/0x450
[ 411.490978] ? netlink_deliver_tap+0x92/0x3d0
[ 411.491631] ? validate_linkmsg+0x330/0x330
[ 411.492262] netlink_rcv_skb+0x47/0x110
[ 411.492852] netlink_unicast+0x1ac/0x270
[ 411.493551] netlink_sendmsg+0x336/0x450
[ 411.494209] sock_sendmsg+0x30/0x40
[ 411.494779] ____sys_sendmsg+0x1dd/0x1f0
[ 411.495378] ? copy_msghdr_from_user+0x5c/0x90
[ 411.496082] ___sys_sendmsg+0x87/0xd0
[ 411.496683] ? lock_acquire+0xb9/0x3a0
[ 411.497322] ? lru_cache_add+0x5/0x170
[ 411.497944] ? find_held_lock+0x2d/0x90
[ 411.498568] ? handle_mm_fault+0xe46/0x18c0
[ 411.499205] ? __sys_sendmsg+0x51/0x90
[ 411.499784] __sys_sendmsg+0x51/0x90
[ 411.500341] do_syscall_64+0x59/0x2e0
[ 411.500938] ? asm_exc_page_fault+0x8/0x30
[ 411.501609] ? rcu_read_lock_sched_held+0x52/0x60
[ 411.502350] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 411.503093] RIP: 0033:0x7faee73b85a7
[ 411.503654] Code: Bad RIP value.
Fixes: 0e18134f4f9f ("net/mlx5e: Eswitch, use state_lock to synchronize vlan change")
Signed-off-by: Alaa Hleihel <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When setting the PF interface up/down, notify the firmware to update
uplink state via MODIFY_VPORT_STATE, when E-Switch is enabled.
This behavior will prevent sending traffic out on uplink port when PF is
down, such as sending traffic from a VF interface which is still up.
Currently when calling mlx5e_open/close(), the driver only sends PAOS
command to notify the firmware to set the physical port state to
up/down, however, it is not sufficient. When VF is in "auto" state, it
follows the uplink state, which was not updated on mlx5e_open/close()
before this patch.
When switchdev mode is enabled and uplink representor is first enabled,
set the uplink port state value back to its FW default "AUTO".
Fixes: 63bfd399de55 ("net/mlx5e: Send PAOS command on interface up/down")
Signed-off-by: Ron Diskin <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
In a special configuration, a ConnectX6-Dx pin pps-out might be activated
when driver is loaded. Fix the driver to always read the operational pin
mode when registering it, and advertise it accordingly.
Fixes: ee7f12205abc ("net/mlx5e: Implement 1PPS support")
Signed-off-by: Eran Ben Elisha <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
mlx5e_rep_is_lag_netdev is used as first check as part of netdev events
handler for bond device of non-uplink representors, this handler can get
any netdevice under the same network namespace of mlx5e netdevice. Current
code treats the netdev as mlx5e netdev and only later on verifies this,
hence causes the following Kasan trace:
[15402.744990] ==================================================================
[15402.746942] BUG: KASAN: slab-out-of-bounds in mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.749009] Read of size 8 at addr ffff880391f3f6b0 by task ovs-vswitchd/5347
[15402.752065] CPU: 7 PID: 5347 Comm: ovs-vswitchd Kdump: loaded Tainted: G B O --------- -t - 4.18.0-g3dcc204d291d-dirty #1
[15402.755349] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[15402.757600] Call Trace:
[15402.758968] dump_stack+0x71/0xab
[15402.760427] print_address_description+0x6a/0x270
[15402.761969] kasan_report+0x179/0x2d0
[15402.763445] ? mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.765121] mlx5e_rep_is_lag_netdev+0xcb/0xf0 [mlx5_core]
[15402.766782] mlx5e_rep_esw_bond_netevent+0x129/0x620 [mlx5_core]
Fix by deferring the violating access to be post the netdev verify check.
Fixes: 7e51891a237f ("net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule")
Signed-off-by: Raed Salem <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Reviewed-by: Vu Pham <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|