Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull networking fixes from David Miller:
1) Fix ppp_mppe crypto soft dependencies, from Takashi Iawi.
2) Fix TX completion to be finite, from Sergej Benilov.
3) Use register_pernet_device to avoid a dst leak in tipc, from Xin
Long.
4) Double free of TX cleanup in Dirk van der Merwe.
5) Memory leak in packet_set_ring(), from Eric Dumazet.
6) Out of bounds read in qmi_wwan, from Bjørn Mork.
7) Fix iif used in mcast/bcast looped back packets, from Stephen
Suryaputra.
8) Fix neighbour resolution on raw ipv6 sockets, from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
sctp: change to hold sk after auth shkey is created successfully
ipv6: fix neighbour resolution with raw socket
ipv6: constify rt6_nexthop()
net: dsa: microchip: Use gpiod_set_value_cansleep()
net: aquantia: fix vlans not working over bridged network
ipv4: reset rt_iif for recirculated mcast/bcast out pkts
team: Always enable vlan tx offload
net/smc: Fix error path in smc_init
net/smc: hold conns_lock before calling smc_lgr_register_conn()
bonding: Always enable vlan tx offload
net/ipv6: Fix misuse of proc_dointvec "skip_notify_on_dev_down"
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
qmi_wwan: Fix out-of-bounds read
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
net: macb: do not copy the mac address if NULL
net/packet: fix memory leak in packet_set_ring()
net/tls: fix page double free on TX cleanup
net/sched: cbs: Fix error path of cbs_module_init
tipc: change to use register_pernet_device
...
|
|
Fixes: e70980312a94 ("MAINTAINERS: Add entry for the generic VDSO library")
Reported-by: Joe Perches/ <[email protected]>
Reported-by: Andy Lutomirks^H^Hski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
All preparations are done. Use the channel storage for the legacy
clockevent and remove the static variable.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Replace the static initialization of the legacy clockevent with runtime
initialization utilizing the common init function as the last preparatory
step to switch the legacy clockevent over to the channel 0 storage in
hpet_base.
This comes with a twist. The static clockevent initializer has selected
support for periodic and oneshot mode unconditionally whether the HPET
config advertised periodic mode or not. Even the pre clockevents code did
this. But....
Using the conditional in hpet_init_clockevent() makes at least Qemu and one
hardware machine fail to boot. There are two issues which cause the boot
failure:
#1 After the timer delivery test in IOAPIC and the IOAPIC setup the next
interrupt is not delivered despite the HPET channel being programmed
correctly. Reprogramming the HPET after switching to IOAPIC makes it
work again. After fixing this, the next issue surfaces:
#2 Due to the unconditional periodic mode 'availability' the Local APIC
timer calibration can hijack the global clockevents event handler
without causing damage. Using oneshot at this stage makes if hang
because the HPET does not get reprogrammed due to the handler
hijacking. Duh, stupid me!
Both issues require major surgery and especially the kick HPET again after
enabling IOAPIC results in really nasty hackery. This 'assume periodic
works' magic has survived since HPET support got added, so it's
questionable whether this should be fixed. Both Qemu and the failing
hardware machine support periodic mode despite the fact that both don't
advertise it in the configuration register and both need that extra kick
after switching to IOAPIC. Seems to be a feature...
Keep the 'assume periodic works' magic around and add a big fat comment.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
To finally remove the static channel0/clockevent storage and to utilize the
channel 0 storage in hpet_base, it's required to run time initialize the
clockevent. The MSI clockevents already have a run time init function.
Carve out the parts which can be shared between the legacy and the MSI
implementation.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that the legacy clockevent is wrapped in a hpet_channel struct most
clockevent functions can be shared between the legacy and the MSI based
clockevents.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
For HPET channel 0 there exist two clockevent structures right now:
- the static hpet_clockevent
- the clockevent in channel 0 storage
The goal is to use the clockevent in the channel storage, remove the static
variable and share code with the MSI implementation.
As a first step wrap the legacy clockevent into a hpet_channel struct and
convert the users.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Now that HPET clockevent support is integrated into the channel data, reuse
the cached boot configuration instead of copying the same information into
a flags field.
This also allows to consolidate the reservation code into one place, which
can now solely depend on the mode information.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Instead of allocating yet another data structure, move the clock event data
into the channel structure. This allows further consolidation of the
reservation code and the reuse of the cached boot config to replace the
extra flags in the clockevent data.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
struct hpet_dev is gone with the next change as the clockevent storage
moves into struct hpet_channel. So the variable name hdev will not make
sense anymore. Ditto for timer vs. channel and similar details.
Doing the rename in the change makes the patch harder to review. Doing it
afterward is problematic vs. tracking down issues. Doing it upfront is the
easiest solution as it does not change functionality.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
If CONFIG_HPET=y is enabled the x86 specific HPET code should reserve at
least one channel for the /dev/hpet character device, so that not all
channels are absorbed for per CPU clockevent devices.
Create a function to assign HPET_MODE_DEVICE so the rework of the
clockevents allocation code can utilize the mode information instead of
reducing the number of evaluated channels by #ifdef hackery.
The function is not yet used, but provided as a separate patch for ease of
review. It will be used when the rework of the clockevent selection takes
place.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The usage of the individual HPET channels is not tracked in a central
place. The information is scattered in different data structures. Also the
HPET reservation in the HPET character device is split out into several
places which makes the code hard to follow.
Assigning a mode to the channel allows to consolidate the reservation code
and paves the way for further simplifications.
As a first step set the mode of the legacy channels when the HPET is in
legacy mode.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Instead of rereading the HPET registers over and over use the information
which was cached in hpet_enable().
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Introduce new data structures to replace the ad hoc collection of separate
variables and pointers.
Replace the boot configuration store and restore as a first step.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Use 'evt' for clockevents pointers and capitalize HPET in comments.
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Signed-off-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
It's a function not a macro and the upcoming changes use channel for the
individual hpet timer units to allow a step by step refactoring approach.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
There is no point to loop for 200k TSC cycles to check afterwards whether
the HPET counter is working. Read the counter inside of the loop and break
out when the counter value changed.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The init code checks whether the HPET counter works late in the init
function when the clocksource is registered. That should happen right with
the other sanity checks.
Split it into a separate validation function and move it to the other
sanity checks.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
It doesn't make sense to have init functions in the middle of other
code. Aside of that, further changes in that area create horrible diffs if
the code stays where it is.
No functional change
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Having static and global variables sprinkled all over the code is just
annoying to read. Move them all to the top of the file.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Mark them inline and remove the pointless 'return;' statement.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
They are only called from init code.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
No users.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The clockevent device pointer is not used in this function.
While at it, rename the misnamed 'timer' parameter to 'channel', which makes it
clear what this parameter means.
No functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
Nothing requires asm/pgtable.h here anymore.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
As a preparatory change for further consolidation, restructure the HPET
init code so it becomes more readable. Fix up misleading and stale comments
and rename variables so they actually make sense.
No intended functional change.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
And sanitize the format strings while at it.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The indirection via work scheduled on the upcoming CPU was necessary with the
old hotplug code because the online callback was invoked on the control CPU
not on the upcoming CPU. The rework of the CPU hotplug core guarantees that
the online callbacks are invoked on the upcoming CPU.
Remove the now pointless work redirection.
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ricardo Neri <[email protected]>
Cc: Ashok Raj <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Suravee Suthikulpanit <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Ravi Shankar <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
|
|
The ORC unwinder can't unwind through BPF JIT generated code because
there are no ORC entries associated with the code.
If an ORC entry isn't available, try to fall back to frame pointers. If
BPF and other generated code always do frame pointer setup (even with
CONFIG_FRAME_POINTERS=n) then this will allow ORC to unwind through most
generated code despite there being no corresponding ORC entries.
Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
Reported-by: Song Liu <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/b6f69208ddff4343d56b7bfac1fc7cfcd62689e8.1561595111.git.jpoimboe@redhat.com
|
|
The stacktrace_map_raw_tp BPF selftest is failing because the RIP saved by
perf_arch_fetch_caller_regs() isn't getting saved by perf_callchain_kernel().
This was broken by the following commit:
d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
With that change, when starting with non-HW regs, the unwinder starts
with the current stack frame and unwinds until it passes up the frame
which called perf_arch_fetch_caller_regs(). So regs->ip needs to be
saved deliberately.
Fixes: d15d356887e7 ("perf/x86: Make perf callchains work without CONFIG_FRAME_POINTER")
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Kairui Song <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Link: https://lkml.kernel.org/r/3975a298fa52b506fea32666d8ff6a13467eee6d.1561595111.git.jpoimboe@redhat.com
|
|
get_gate_page() is a piece of somewhat alarming code to make
get_user_pages() work on the vsyscall page. Test it via
process_vm_readv().
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/0fe34229a9330e8f9de9765967939cc4f1cf26b1.1561610354.git.luto@kernel.org
|
|
The vDSO is only configurable by command-line options, so make its
global variables __ro_after_init. This seems highly unlikely to
ever stop an exploit, but it's nicer anyway.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/a386925835e49d319e70c4d7404b1f6c3c2e3702.1561610354.git.luto@kernel.org
|
|
The use case for full emulation over xonly is very esoteric, e.g. magic
instrumentation tools.
Change the default to the safer xonly mode.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/30539f8072d2376b9c9efcc07e6ed0d6bf20e882.1561610354.git.luto@kernel.org
|
|
If vsyscall=none accidentally still allowed vsyscalls, the test wouldn't
fail. Fix it.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/b413397c804265f8865f3e70b14b09485ea7c314.1561610354.git.luto@kernel.org
|
|
Even if vsyscall=none, user page faults on the vsyscall page are reported
as though the PROT bit in the error code was set. Add a comment explaining
why this is probably okay and display the value in the test case.
While at it, explain why the behavior is correct with respect to PKRU.
Modify also the selftest to print the odd error code so that there is a
way to demonstrate the odd behaviour.
If anyone really cares about more accurate emulation, the behaviour could
be changed. But that needs a real good justification.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/75c91855fd850649ace162eec5495a1354221aaa.1561610354.git.luto@kernel.org
|
|
Just segfaulting the application when it tries to read the vsyscall page in
xonly mode is not helpful for those who need to debug it.
Emit a hint.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Link: https://lkml.kernel.org/r/8016afffe0eab497be32017ad7f6f7030dc3ba66.1561610354.git.luto@kernel.org
|
|
With vsyscall emulation on, a readable vsyscall page is still exposed that
contains syscall instructions that validly implement the vsyscalls.
This is required because certain dynamic binary instrumentation tools
attempt to read the call targets of call instructions in the instrumented
code. If the instrumented code uses vsyscalls, then the vsyscall page needs
to contain readable code.
Unfortunately, leaving readable memory at a deterministic address can be
used to help various ASLR bypasses, so some hardening value can be gained
by disallowing vsyscall reads.
Given how rarely the vsyscall page needs to be readable, add a mechanism to
make the vsyscall page be execute only.
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/d17655777c21bc09a7af1bbcf74e6f2b69a51152.1561610354.git.luto@kernel.org
|
|
The vsyscall=native feature is gone -- remove the docs.
Fixes: 076ca272a14c ("x86/vsyscall/64: Drop "native" vsyscalls")
Signed-off-by: Andy Lutomirski <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Kees Cook <[email protected]>
Cc: Florian Weimer <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: [email protected]
Cc: Borislav Petkov <[email protected]>
Cc: Kernel Hardening <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lkml.kernel.org/r/d77c7105eb4c57c1a95a95b6a5b8ba194a18e764.1561610354.git.luto@kernel.org
|
|
Replace the uid/gid/perm permissions checking on a key with an ACL to allow
the SETATTR and SEARCH permissions to be split. This will also allow a
greater range of subjects to represented.
============
WHY DO THIS?
============
The problem is that SETATTR and SEARCH cover a slew of actions, not all of
which should be grouped together.
For SETATTR, this includes actions that are about controlling access to a
key:
(1) Changing a key's ownership.
(2) Changing a key's security information.
(3) Setting a keyring's restriction.
And actions that are about managing a key's lifetime:
(4) Setting an expiry time.
(5) Revoking a key.
and (proposed) managing a key as part of a cache:
(6) Invalidating a key.
Managing a key's lifetime doesn't really have anything to do with
controlling access to that key.
Expiry time is awkward since it's more about the lifetime of the content
and so, in some ways goes better with WRITE permission. It can, however,
be set unconditionally by a process with an appropriate authorisation token
for instantiating a key, and can also be set by the key type driver when a
key is instantiated, so lumping it with the access-controlling actions is
probably okay.
As for SEARCH permission, that currently covers:
(1) Finding keys in a keyring tree during a search.
(2) Permitting keyrings to be joined.
(3) Invalidation.
But these don't really belong together either, since these actions really
need to be controlled separately.
Finally, there are number of special cases to do with granting the
administrator special rights to invalidate or clear keys that I would like
to handle with the ACL rather than key flags and special checks.
===============
WHAT IS CHANGED
===============
The SETATTR permission is split to create two new permissions:
(1) SET_SECURITY - which allows the key's owner, group and ACL to be
changed and a restriction to be placed on a keyring.
(2) REVOKE - which allows a key to be revoked.
The SEARCH permission is split to create:
(1) SEARCH - which allows a keyring to be search and a key to be found.
(2) JOIN - which allows a keyring to be joined as a session keyring.
(3) INVAL - which allows a key to be invalidated.
The WRITE permission is also split to create:
(1) WRITE - which allows a key's content to be altered and links to be
added, removed and replaced in a keyring.
(2) CLEAR - which allows a keyring to be cleared completely. This is
split out to make it possible to give just this to an administrator.
(3) REVOKE - see above.
Keys acquire ACLs which consist of a series of ACEs, and all that apply are
unioned together. An ACE specifies a subject, such as:
(*) Possessor - permitted to anyone who 'possesses' a key
(*) Owner - permitted to the key owner
(*) Group - permitted to the key group
(*) Everyone - permitted to everyone
Note that 'Other' has been replaced with 'Everyone' on the assumption that
you wouldn't grant a permit to 'Other' that you wouldn't also grant to
everyone else.
Further subjects may be made available by later patches.
The ACE also specifies a permissions mask. The set of permissions is now:
VIEW Can view the key metadata
READ Can read the key content
WRITE Can update/modify the key content
SEARCH Can find the key by searching/requesting
LINK Can make a link to the key
SET_SECURITY Can change owner, ACL, expiry
INVAL Can invalidate
REVOKE Can revoke
JOIN Can join this keyring
CLEAR Can clear this keyring
The KEYCTL_SETPERM function is then deprecated.
The KEYCTL_SET_TIMEOUT function then is permitted if SET_SECURITY is set,
or if the caller has a valid instantiation auth token.
The KEYCTL_INVALIDATE function then requires INVAL.
The KEYCTL_REVOKE function then requires REVOKE.
The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an
existing keyring.
The JOIN permission is enabled by default for session keyrings and manually
created keyrings only.
======================
BACKWARD COMPATIBILITY
======================
To maintain backward compatibility, KEYCTL_SETPERM will translate the
permissions mask it is given into a new ACL for a key - unless
KEYCTL_SET_ACL has been called on that key, in which case an error will be
returned.
It will convert possessor, owner, group and other permissions into separate
ACEs, if each portion of the mask is non-zero.
SETATTR permission turns on all of INVAL, REVOKE and SET_SECURITY. WRITE
permission turns on WRITE, REVOKE and, if a keyring, CLEAR. JOIN is turned
on if a keyring is being altered.
The KEYCTL_DESCRIBE function translates the ACL back into a permissions
mask to return depending on possessor, owner, group and everyone ACEs.
It will make the following mappings:
(1) INVAL, JOIN -> SEARCH
(2) SET_SECURITY -> SETATTR
(3) REVOKE -> WRITE if SETATTR isn't already set
(4) CLEAR -> WRITE
Note that the value subsequently returned by KEYCTL_DESCRIBE may not match
the value set with KEYCTL_SETATTR.
=======
TESTING
=======
This passes the keyutils testsuite for all but a couple of tests:
(1) tests/keyctl/dh_compute/badargs: The first wrong-key-type test now
returns EOPNOTSUPP rather than ENOKEY as READ permission isn't removed
if the type doesn't have ->read(). You still can't actually read the
key.
(2) tests/keyctl/permitting/valid: The view-other-permissions test doesn't
work as Other has been replaced with Everyone in the ACL.
Signed-off-by: David Howells <[email protected]>
|
|
Create a request_key_net() function and use it to pass the network
namespace domain tag into DNS revolver keys and rxrpc/AFS keys so that keys
for different domains can coexist in the same keyring.
Signed-off-by: David Howells <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
|
|
The index to access the threads tls array is controlled by userspace
via syscall: sys_ptrace(), hence leading to a potential exploitation
of the Spectre variant 1 vulnerability.
The index can be controlled from:
ptrace -> arch_ptrace -> do_get_thread_area.
Fix this by sanitizing the user supplied index before using it to access
the p->thread.tls_array.
Signed-off-by: Dianzhang Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
The index to access the threads ptrace_bps is controlled by userspace via
syscall: sys_ptrace(), hence leading to a potential exploitation of the
Spectre variant 1 vulnerability.
The index can be controlled from:
ptrace -> arch_ptrace -> ptrace_get_debugreg.
Fix this by sanitizing the user supplied index before using it access
thread->ptrace_bps.
Signed-off-by: Dianzhang Chen <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]
|
|
That gets rid of this warning:
./kernel/time/hrtimer.c:1119: WARNING: Block quote ends without a blank line; unexpected unindent.
and displays nicely both at the source code and at the produced
documentation.
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Linux Doc Mailing List <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Link: https://lkml.kernel.org/r/74ddad7dac331b4e5ce4a90e15c8a49e3a16d2ac.1561372382.git.mchehab+samsung@kernel.org
|
|
All callers use GFP_KERNEL. No point in having that argument.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
None of those functions have any users outside of workqueue.c. Confine
them.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Commit 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2") assumes
edac_mc_poll_msec to be unsigned long, but the type of the variable still
remained as int. Setting edac_mc_poll_msec can trigger out-of-bounds
write.
Reproducer:
# echo 1001 > /sys/module/edac_core/parameters/edac_mc_poll_msec
KASAN report:
BUG: KASAN: global-out-of-bounds in edac_set_poll_msec+0x140/0x150
Write of size 8 at addr ffffffffb91b2d00 by task bash/1996
CPU: 1 PID: 1996 Comm: bash Not tainted 5.2.0-rc6+ #23
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
Call Trace:
dump_stack+0xca/0x13e
print_address_description.cold+0x5/0x246
__kasan_report.cold+0x75/0x9a
? edac_set_poll_msec+0x140/0x150
kasan_report+0xe/0x20
edac_set_poll_msec+0x140/0x150
? dimmdev_location_show+0x30/0x30
? vfs_lock_file+0xe0/0xe0
? _raw_spin_lock+0x87/0xe0
param_attr_store+0x1b5/0x310
? param_array_set+0x4f0/0x4f0
module_attr_store+0x58/0x80
? module_attr_show+0x80/0x80
sysfs_kf_write+0x13d/0x1a0
kernfs_fop_write+0x2bc/0x460
? sysfs_kf_bin_read+0x270/0x270
? kernfs_notify+0x1f0/0x1f0
__vfs_write+0x81/0x100
vfs_write+0x1e1/0x560
ksys_write+0x126/0x250
? __ia32_sys_read+0xb0/0xb0
? do_syscall_64+0x1f/0x390
do_syscall_64+0xc1/0x390
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fa7caa5e970
Code: 73 01 c3 48 8b 0d 28 d5 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 99 2d 2c 00 00 75 10 b8 01 00 00 00 04
RSP: 002b:00007fff6acfdfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fa7caa5e970
RDX: 0000000000000005 RSI: 0000000000e95c08 RDI: 0000000000000001
RBP: 0000000000e95c08 R08: 00007fa7cad1e760 R09: 00007fa7cb36a700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000005
R13: 0000000000000001 R14: 00007fa7cad1d600 R15: 0000000000000005
The buggy address belongs to the variable:
edac_mc_poll_msec+0x0/0x40
Memory state around the buggy address:
ffffffffb91b2c00: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
ffffffffb91b2c80: 00 00 00 00 fa fa fa fa 00 00 00 00 fa fa fa fa
>ffffffffb91b2d00: 04 fa fa fa fa fa fa fa 04 fa fa fa fa fa fa fa
^
ffffffffb91b2d80: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffffffffb91b2e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Fix it by changing the type of edac_mc_poll_msec to unsigned int.
The reason why this patch adopts unsigned int rather than unsigned long
is msecs_to_jiffies() assumes arg to be unsigned int. We can avoid
integer conversion bugs and unsigned int will be large enough for
edac_mc_poll_msec.
Reviewed-by: James Morse <[email protected]>
Fixes: 9da21b1509d8 ("EDAC: Poll timeout cannot be zero, p2")
Signed-off-by: Eiichi Tsukata <[email protected]>
Signed-off-by: Tony Luck <[email protected]>
|
|
We need to convert all old gpio irqchips to pass the irqchip
setup along when adding the gpio_chip.
For chained irqchips this is a pretty straight-forward
conversion.
Take this opportunity to add a local dev pointer and
use devm_gpiochip_add() so we can get rid of the remove()
callback altogether.
Cc: [email protected]
Acked-by: Alban Bedel <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
|