path: root/include/linux
Age | Commit message | Author | Files, lines changed (-/+)
2018-01-16 | hrtimer: Make hrtimer_cpu_base.next_timer handling unconditional | Anna-Maria Gleixner | 1 file, -2/+2
hrtimer_cpu_base.next_timer stores the pointer to the next expiring timer in a CPU base. This pointer cannot be dereferenced and is solely used to check whether a removed hrtimer is the first to expire in the CPU base. If this is the case, then the timer hardware needs to be reprogrammed to avoid an extra interrupt for nothing. Again, this is conditional functionality, but there is no compelling reason to keep it conditional. As a preparation, hrtimer_cpu_base.next_timer needs to be available unconditionally. Aside from that, the upcoming support for softirq based hrtimers requires access to this pointer unconditionally as well, so the motivation is not entirely based on simplicity. Make the update of hrtimer_cpu_base.next_timer unconditional and remove the #ifdef cruft. The impact on CONFIG_HIGH_RES_TIMERS=n && CONFIG_NOHZ=n is marginal as it's just a store on an already dirtied cacheline. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Make the remote enqueue check unconditional | Anna-Maria Gleixner | 1 file, -3/+3
hrtimer_cpu_base.expires_next is used to cache the next event armed in the timer hardware. The value is used to check whether an hrtimer can be enqueued remotely. If the new hrtimer is expiring before expires_next, then remote enqueue is not possible as the remote hrtimer hardware cannot be accessed for reprogramming to an earlier expiry time. The remote enqueue check is currently conditional on CONFIG_HIGH_RES_TIMERS=y and hrtimer_cpu_base.hres_active. There is no compelling reason to make this conditional. Move hrtimer_cpu_base.expires_next out of the CONFIG_HIGH_RES_TIMERS=y guarded area and remove the conditionals in hrtimer_check_target(). The check is currently a NOOP for the CONFIG_HIGH_RES_TIMERS=n and the !hrtimer_cpu_base.hres_active case because in these cases nothing updates hrtimer_cpu_base.expires_next yet. This will be changed with later patches which further reduce the #ifdef zoo in this code. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
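As a rough illustration of the check described above, the helper mirrors the shape of the in-tree hrtimer_check_target(); treat it as a sketch, details may differ from the exact kernel source:

    #include <linux/hrtimer.h>

    /*
     * Sketch: a timer may only be enqueued on a remote CPU if it expires
     * after the event already armed there (cached in expires_next), because
     * the remote timer hardware cannot be reprogrammed to an earlier expiry
     * from this CPU.
     */
    static int hrtimer_check_target(struct hrtimer *timer,
                                    struct hrtimer_clock_base *new_base)
    {
            ktime_t expires;

            expires = ktime_add_safe(timer->node.expires, new_base->offset);
            return expires < new_base->cpu_base->expires_next;
    }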
2018-01-16 | hrtimer: Make the hrtimer_cpu_base::hres_active field unconditional, to simplify the code | Anna-Maria Gleixner | 1 file, -12/+8
The hrtimer_cpu_base::hres_active member currently depends on CONFIG_HIGH_RES_TIMERS=y, and all functions related to this member are conditional as well. To simplify the code, make it unconditional and set it to zero during initialization. (This will also help with the upcoming softirq based hrtimers code.) The conditional code sections can be avoided by adding IS_ENABLED(HIGHRES) conditionals into common functions, which ensures dead code elimination. There is no functional change. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
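The IS_ENABLED() pattern mentioned above looks roughly like the following; the helper name matches the in-tree one, but treat the exact form as an approximation:

    /*
     * With hres_active unconditionally present, the config check can be
     * folded into a common helper; for CONFIG_HIGH_RES_TIMERS=n the compiler
     * sees a constant 0 and eliminates the dependent code.
     */
    static inline int __hrtimer_hres_active(struct hrtimer_cpu_base *cpu_base)
    {
            return IS_ENABLED(CONFIG_HIGH_RES_TIMERS) ?
                    cpu_base->hres_active : 0;
    }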
2018-01-16 | hrtimer: Make room in 'struct hrtimer_cpu_base' | Anna-Maria Gleixner | 1 file, -2/+2
The upcoming softirq based hrtimers support requires an additional field in the hrtimer_cpu_base struct, which would grow the struct size beyond a cache line. The hrtimer_cpu_base::nr_retries and ::nr_hangs members are solely used for diagnostic output and have no requirement to be 'unsigned int'. Make them 'unsigned short' to create room for the new struct member. No functional change. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Store running timer in hrtimer_clock_base | Anna-Maria Gleixner | 1 file, -11/+9
The pointer to the currently running timer is stored in hrtimer_cpu_base before the base lock is dropped and the callback is invoked. This results in two levels of indirection and the upcoming support for softirq based hrtimer requires splitting the "running" storage into soft and hard IRQ context expiry. Storing both in the cpu base would require conditionals in all code paths accessing that information. It's possible to have a per clock base sequence count and running pointer without changing the semantics of the related mechanisms because the timer base pointer cannot be changed while a timer is running the callback. Unfortunately this makes the clock base larger than 32 bytes on 32-bit kernels. Instead of having huge gaps due to alignment, remove the alignment and let the compiler pack the cpu base for 32-bit kernels. The resulting cache access patterns are fortunately not really different from the current behaviour. On 64-bit kernels the 64-byte alignment stays and the behaviour is unchanged. This was determined by analyzing the resulting layout and looking at the number of cache lines involved for the frequently used clocks. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Clean up 'enum hrtimer_mode' | Anna-Maria Gleixner | 1 file, -5/+11
It's not obvious that the HRTIMER_MODE variants are bit combinations, because all modes are hard coded constants currently. Change it so the bit meanings are clear; and use the symbols for creating modes which combine bits. While at it get rid of the ugly tail comments as well. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
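The resulting layout can be pictured roughly as follows; bit values follow the description above, but treat this as a sketch of the post-cleanup enum rather than a verbatim quote:

    enum hrtimer_mode {
            HRTIMER_MODE_ABS        = 0x00,  /* time value is absolute */
            HRTIMER_MODE_REL        = 0x01,  /* time value is relative to now */
            HRTIMER_MODE_PINNED     = 0x02,  /* timer is bound to a CPU */

            /* combined modes are built from the bits above */
            HRTIMER_MODE_ABS_PINNED = HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED,
            HRTIMER_MODE_REL_PINNED = HRTIMER_MODE_REL | HRTIMER_MODE_PINNED,
    };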
2018-01-16 | hrtimer: Fix hrtimer_start[_range_ns]() function descriptions | Anna-Maria Gleixner | 1 file, -3/+3
The hrtimer_start[_range_ns]() functions start a timer reliably on this CPU only when HRTIMER_MODE_PINNED is set. Furthermore the HRTIMER_MODE_PINNED mode is not considered when a hrtimer is initialized. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Clean up the 'int clock' parameter of schedule_hrtimeout_range_clock() | Anna-Maria Gleixner | 1 file, -1/+1
schedule_hrtimeout_range_clock() uses an 'int clock' parameter for the clock ID, instead of the customary predefined "clockid_t" type. In hrtimer coding style the canonical variable name for the clock ID is 'clock_id', therefore change the name of the parameter here as well to make it all consistent. While at it, clean up the description for the 'clock_id' and 'mode' function parameters. The clock modes and the clock IDs are not restricted as the comment suggests. Fix the mode description as well for the callers of schedule_hrtimeout_range_clock(). No functional changes intended. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Fix kerneldoc syntax for 'struct hrtimer_cpu_base' | Anna-Maria Gleixner | 1 file, -4/+4
The '/**' sequence marks the start of a structure description. Add the missing second asterisk. While at it adapt the ordering of the struct members to the struct definition and document the purpose of expires_next more precisely. Signed-off-by: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-16 | hrtimer: Optimize the hrtimer code by using static keys for migration_enable/nohz_active | Thomas Gleixner | 1 file, -4/+0
The hrtimer_cpu_base::migration_enable and ::nohz_active fields were originally introduced to avoid accessing global variables for these decisions. Still that results in a (cache hot) load and conditional branch, which can be avoided by using static keys. Implement it with static keys and optimize for the most critical case of high performance networking which tends to disable the timer migration functionality. No change in functionality. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: John Stultz <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1801142327490.2371@nanos Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
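A hedged sketch of the static-key pattern being applied; the key name is illustrative and may not match the exact in-tree identifier:

    #include <linux/jump_label.h>

    /* A patched branch in the text replaces a load plus conditional jump. */
    DEFINE_STATIC_KEY_FALSE(timers_migration_enabled);

    static inline bool timers_migration_active(void)
    {
            return static_branch_likely(&timers_migration_enabled);
    }

    /* Flipped from the sysctl handler when the knob changes, e.g.:
     *   static_branch_enable(&timers_migration_enabled);
     *   static_branch_disable(&timers_migration_enabled);
     */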
2018-01-16 | Merge branch 'timers/urgent' into timers/core, to pick up dependent fix | Ingo Molnar | 1 file, -1/+3
Signed-off-by: Ingo Molnar <[email protected]>
2018-01-15 | signal: Unify and correct copy_siginfo_from_user32 | Eric W. Biederman | 1 file, -1/+1
The function copy_siginfo_from_user32 is used for two things, in ptrace since the dawn of siginfo for arbitrarily modifying a signal that user space sees, and in sigqueueinfo to send a signal with arbitrary siginfo data. Create a single copy of copy_siginfo_from_user32 that all architectures share, and teach it to handle all of the cases in the siginfo union. In the generic version of copy_siginfo_from_user32 ensure that all of the fields in siginfo are initialized so that the siginfo structure can be safely copied to userspace if necessary. When copying the embedded sigval union copy the si_int member. That ensures the 32bit values pass through the kernel unchanged. Signed-off-by: "Eric W. Biederman" <[email protected]>
2018-01-15 | signal: Move addr_lsb into the _sigfault union for clarity | Eric W. Biederman | 1 file, -2/+10
The addr_lsb field is only valid and available when the signal is SIGBUS and the si_code is BUS_MCEERR_AR or BUS_MCEERR_AO. Document this with a comment and place the field in the _sigfault union to make this clear. All of the fields stay in the same physical location so both the old and new definitions of struct siginfo will continue to work. Signed-off-by: "Eric W. Biederman" <[email protected]>
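The approximate shape of the union member after the move, heavily simplified (arch-specific padding and neighbouring fields omitted; this is an illustration, not the verbatim header):

    /* inside the siginfo _sifields union */
    struct {
            void __user *_addr;     /* faulting insn/memory ref. */
    #ifdef __ARCH_SI_TRAPNO
            int _trapno;            /* TRAP # which caused the signal */
    #endif
            /*
             * Only valid for SIGBUS with si_code BUS_MCEERR_AR or
             * BUS_MCEERR_AO (hardware memory error reports).
             */
            short _addr_lsb;        /* LSB of the reported address */
    } _sigfault;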
2018-01-15 | signal: unify compat_siginfo_t | Al Viro | 1 file, -0/+90
--EWB: Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c; changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in linux/compat.h. CONFIG_X86_X32 is set when the user requests X32 support. CONFIG_X86_X32_ABI is set when the user requests X32 support and the tool-chain has X32, allowing X32 support to be built. Signed-off-by: Al Viro <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2018-01-16 | i2c: add 'set_sda' to bus_recovery_info | Wolfram Sang | 1 file, -0/+4
This will be needed later, when we also want to create STOP conditions. Create the needed fields and populate them for the GPIO case if the GPIO is set to output. Tested-by: Phil Reid <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-01-16 | i2c: add identifier in declarations for i2c_bus_recovery | Wolfram Sang | 1 file, -6/+6
No reason to have them undefined, so let's add them. Tested-by: Phil Reid <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-01-16 | i2c: make kerneldoc about bus recovery more precise | Wolfram Sang | 1 file, -5/+5
"Used internally" is vague. What it actually means is that those fields are populated by the core if valid GPIOs are provided. Change the comments to reflect that. Tested-by: Phil Reid <[email protected]> Signed-off-by: Wolfram Sang <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>
2018-01-15 | netlink: extack: avoid parenthesized string constant warning | Johannes Berg | 1 file, -2/+2
NL_SET_ERR_MSG() and NL_SET_ERR_MSG_ATTR() lead to the following warning in newer versions of gcc:

    warning: array initialized from parenthesized string constant

Just remove the parentheses; they're not needed in this context anyway, since there can be no operator precedence issues or similar. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
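For context, the macro looks roughly like the following after the change; with the array initializer written as "= msg" rather than "= (msg)", gcc no longer sees a parenthesized string constant. This is a sketch of the pattern, not the verbatim header:

    #define NL_SET_ERR_MSG(extack, msg) do {                \
            static const char __msg[] = msg;                \
            struct netlink_ext_ack *__extack = (extack);    \
                                                            \
            if (__extack)                                   \
                    __extack->_msg = __msg;                 \
    } while (0)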
2018-01-15 | fork: Provide usercopy whitelisting for task_struct | Kees Cook | 1 file, -0/+14
While the blocked and saved_sigmask fields of task_struct are copied to userspace (via sigmask_to_save() and setup_rt_frame()), they are always copied with a static length (i.e. sizeof(sigset_t)). The only portion of task_struct that is potentially dynamically sized and may be copied to userspace is in the architecture-specific thread_struct at the end of task_struct.

cache object allocation:
    kernel/fork.c:
        alloc_task_struct_node(...):
            return kmem_cache_alloc_node(task_struct_cachep, ...);
        dup_task_struct(...):
            ...
            tsk = alloc_task_struct_node(node);
        copy_process(...):
            ...
            dup_task_struct(...)
        _do_fork(...):
            ...
            copy_process(...)

example usage trace:
    arch/x86/kernel/fpu/signal.c:
        __fpu__restore_sig(...):
            ...
            struct task_struct *tsk = current;
            struct fpu *fpu = &tsk->thread.fpu;
            ...
            __copy_from_user(&fpu->state.xsave, ..., state_size);
        fpu__restore_sig(...):
            ...
            return __fpu__restore_sig(...);
    arch/x86/kernel/signal.c:
        restore_sigcontext(...):
            ...
            fpu__restore_sig(...)

This introduces arch_thread_struct_whitelist() to let an architecture declare specifically where the whitelist should be within thread_struct. If undefined, the entire thread_struct field is left whitelisted. Cc: Andrew Morton <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Laura Abbott <[email protected]> Cc: "Mickaël Salaün" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Andy Lutomirski <[email protected]> Signed-off-by: Kees Cook <[email protected]> Acked-by: Rik van Riel <[email protected]>
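The fallback described above ("if undefined, the entire thread_struct is whitelisted") can be pictured roughly like this; it is a sketch of the pattern under that assumption, not the verbatim kernel definition:

    /*
     * Architectures may define arch_thread_struct_whitelist() to narrow the
     * usercopy whitelist; otherwise the whole thread_struct (at the end of
     * task_struct) stays whitelisted.
     */
    #ifndef arch_thread_struct_whitelist
    # define arch_thread_struct_whitelist(offset, size)            \
            do {                                                   \
                    (offset) = 0;                                  \
                    (size) = sizeof(struct thread_struct);         \
            } while (0)
    #endif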
2018-01-15 | usercopy: Allow strict enforcement of whitelists | Kees Cook | 1 file, -0/+2
This introduces CONFIG_HARDENED_USERCOPY_FALLBACK to control the behavior of hardened usercopy whitelist violations. By default, whitelist violations will continue to WARN() so that any bad or missing usercopy whitelists can be discovered without being too disruptive. If this config is disabled at build time or a system is booted with "slab_common.usercopy_fallback=0", whitelist violations will BUG() instead of WARN(). This is useful for admins that want to use usercopy whitelists immediately. Suggested-by: Matthew Garrett <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2018-01-15 | usercopy: WARN() on slab cache usercopy region violations | Kees Cook | 1 file, -0/+2
This patch adds checking of usercopy cache whitelisting, and is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. The SLAB and SLUB allocators are modified to WARN() on all copy operations in which the kernel heap memory being modified falls outside of the cache's defined usercopy region. Based on an earlier patch from David Windsor. Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Kees Cook <[email protected]>
2018-01-15 | usercopy: Prepare for usercopy whitelisting | David Windsor | 3 files, -6/+27
This patch prepares the slab allocator to handle caches having annotations (useroffset and usersize) defining usercopy regions. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass hardened usercopy checks since these sizes cannot change at runtime.) To support this whitelist annotation, usercopy region offset and size members are added to struct kmem_cache. The slab allocator receives a new function, kmem_cache_create_usercopy(), that creates a new cache with a usercopy region defined, suitable for declaring spans of fields within the objects that get copied to/from userspace. In this patch, the default kmem_cache_create() marks the entire allocation as whitelisted, leaving it semantically unchanged. Once all fine-grained whitelists have been added (in subsequent patches), this will be changed to a usersize of 0, making caches created with kmem_cache_create() not copyable to/from userspace. 
After the entire usercopy whitelist series is applied, less than 15% of the slab cache memory remains exposed to potential usercopy bugs after a fresh boot:

    Total Slab Memory:      48074720
    Usercopyable Memory:     6367532  13.2%
        task_struct            0.2%    4480/1630720
        RAW                    0.3%    300/96000
        RAWv6                  2.1%    1408/64768
        ext4_inode_cache       3.0%    269760/8740224
        dentry                11.1%    585984/5273856
        mm_struct             29.1%    54912/188448
        kmalloc-8            100.0%    24576/24576
        kmalloc-16           100.0%    28672/28672
        kmalloc-32           100.0%    81920/81920
        kmalloc-192          100.0%    96768/96768
        kmalloc-128          100.0%    143360/143360
        names_cache          100.0%    163840/163840
        kmalloc-64           100.0%    167936/167936
        kmalloc-256          100.0%    339968/339968
        kmalloc-512          100.0%    350720/350720
        kmalloc-96           100.0%    455616/455616
        kmalloc-8192         100.0%    655360/655360
        kmalloc-1024         100.0%    812032/812032
        kmalloc-4096         100.0%    819200/819200
        kmalloc-2048         100.0%    1310720/1310720

After some kernel build workloads, the percentage (mainly driven by dentry and inode caches expanding) drops under 10%:

    Total Slab Memory:      95516184
    Usercopyable Memory:     8497452   8.8%
        task_struct            0.2%    4000/1456000
        RAW                    0.3%    300/96000
        RAWv6                  2.1%    1408/64768
        ext4_inode_cache       3.0%    1217280/39439872
        dentry                11.1%    1623200/14608800
        mm_struct             29.1%    73216/251264
        kmalloc-8            100.0%    24576/24576
        kmalloc-16           100.0%    28672/28672
        kmalloc-32           100.0%    94208/94208
        kmalloc-192          100.0%    96768/96768
        kmalloc-128          100.0%    143360/143360
        names_cache          100.0%    163840/163840
        kmalloc-64           100.0%    245760/245760
        kmalloc-256          100.0%    339968/339968
        kmalloc-512          100.0%    350720/350720
        kmalloc-96           100.0%    563520/563520
        kmalloc-8192         100.0%    655360/655360
        kmalloc-1024         100.0%    794624/794624
        kmalloc-4096         100.0%    819200/819200
        kmalloc-2048         100.0%    1257472/1257472

Signed-off-by: David Windsor <[email protected]> [kees: adjust commit log, split out a few extra kmalloc hunks] [kees: add field names to function declarations] [kees: convert BUGs to WARNs and fail closed] [kees: add attack surface reduction analysis to commit log] Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Acked-by: Christoph Lameter <[email protected]>
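A hedged usage sketch of the new kmem_cache_create_usercopy() API described above; the cache name, structure and field are made up for illustration. The idea is that only the span of the object that is legitimately copied to/from userspace is whitelisted:

    #include <linux/slab.h>
    #include <linux/spinlock.h>
    #include <linux/stddef.h>

    struct foo {
            spinlock_t lock;        /* never copied to userspace */
            char data[128];         /* copied via copy_to_user()/copy_from_user() */
    };

    static struct kmem_cache *foo_cachep;

    static void foo_cache_init(void)
    {
            /* useroffset/usersize describe the whitelisted span ('data'). */
            foo_cachep = kmem_cache_create_usercopy("foo_cache",
                            sizeof(struct foo), 0, SLAB_PANIC,
                            offsetof(struct foo, data),
                            sizeof_field(struct foo, data),
                            NULL);
    }

sizeof_field() here is the helper introduced by the stddef.h commit below.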
2018-01-15 | stddef.h: Introduce sizeof_field() | Kees Cook | 1 file, -1/+9
The size of fields within a structure is needed in a few places in the kernel already, and will be needed for the usercopy whitelisting when declaring whitelist regions within structures. This creates a dedicated macro and redefines offsetofend() to use it. Existing usage, ignoring the 1200+ lustre assert uses:

    $ git grep -E 'sizeof\(\(\((struct )?[a-zA-Z_]+ \*\)0\)->' | \
          grep -v staging/lustre | wc -l
    65

Signed-off-by: Kees Cook <[email protected]>
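The macro pattern being introduced is small enough to quote in approximate form (see include/linux/stddef.h for the authoritative definitions):

    /* Report the size of a struct member without needing an instance. */
    #define sizeof_field(TYPE, MEMBER) (sizeof(((TYPE *)0)->MEMBER))

    /* offsetofend() re-expressed in terms of sizeof_field(). */
    #define offsetofend(TYPE, MEMBER) \
            (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))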
2018-01-15 | usercopy: Include offset in hardened usercopy report | Kees Cook | 1 file, -8/+4
This refactors the hardened usercopy code so that failure reporting can happen within the checking functions instead of at the top level. This simplifies the return value handling and allows more details and offsets to be included in the report. Having the offset can be much more helpful in understanding hardened usercopy bugs. Signed-off-by: Kees Cook <[email protected]>
2018-01-15 | usercopy: Enhance and rename report_usercopy() | Kees Cook | 1 file, -0/+6
In preparation for refactoring the usercopy checks to pass offset to the hardened usercopy report, this renames report_usercopy() to the more accurate usercopy_abort(), marks it as noreturn because it is, adds a hopefully helpful comment for anyone investigating such reports, makes the function available to the slab allocators, and adds new "detail" and "offset" arguments. Signed-off-by: Kees Cook <[email protected]>
2018-01-15 | net: phy: remove parameter new_link from phy_mac_interrupt() | Heiner Kallweit | 1 file, -1/+1
I see two issues with parameter new_link: 1. It's not needed. See also phy_interrupt(), works w/o this parameter. phy_mac_interrupt sets the state to PHY_CHANGELINK and triggers the state machine which then calls phy_read_status. And phy_read_status updates the link state. 2. phy_mac_interrupt is used in interrupt context and getting the link state may sleep (at least when having to access the PHY registers via MDIO bus). So let's remove it. Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Tested-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
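In caller terms the change amounts to dropping the second argument; a before/after sketch of a driver-side call site (illustrative):

    /* before: the caller had to pass the link state it believed current */
    phy_mac_interrupt(phydev, new_link);

    /* after: the PHY state machine re-reads the link state itself */
    phy_mac_interrupt(phydev);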
2018-01-15 | ptr_ring: document usage around __ptr_ring_peek | Michael S. Tsirkin | 1 file, -4/+10
This explains why the net usage of __ptr_ring_peek is actually OK without locks. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
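For context, the ordinary locked consumer-side pattern looks roughly like this; the added documentation covers the narrow conditions under which the lock may be skipped. A sketch, assuming the usual ptr_ring fields:

    #include <linux/ptr_ring.h>

    /* Locked peek: consumer_lock keeps the head slot stable while we look. */
    static void *peek_locked(struct ptr_ring *r)
    {
            void *ptr;

            spin_lock(&r->consumer_lock);
            ptr = __ptr_ring_peek(r);       /* NULL if the ring is empty */
            spin_unlock(&r->consumer_lock);
            return ptr;
    }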
2018-01-15 | remoteproc: Drop dangling find_rsc_table dummies | Bjorn Andersson | 1 file, -4/+0
As the core now deals with the lack of a resource table, remove the dangling custom dummy implementations of find_rsc_table from drivers. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | remoteproc: Move resource table load logic to find | Bjorn Andersson | 1 file, -0/+2
Extend the previous operation of finding the resource table in the ELF with the extra step of populating the rproc struct with a copy and the size. This allows drivers to override the mechanism used for acquiring the resource table, or omit it for firmware that is known not to have a resource table. This leaves the custom, dummy, find_rsc_table implementations found in some drivers dangling. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | remoteproc: Merge rproc_ops and rproc_fw_ops | Bjorn Andersson | 1 file, -2/+15
There are currently a few different schemes used for overriding fw_ops or parts of fw_ops. Merge fw_ops into rproc_ops and expose the default ELF-loader symbols so that they can be assigned by the drivers. To keep backwards compatibility with the "default" case, a driver not specifying the "load" operation is assumed to want the full ELF-loader suite of functions. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | remoteproc: Clone rproc_ops in rproc_alloc() | Bjorn Andersson | 1 file, -1/+1
In order to allow rproc_alloc() to, in a future patch, update entries in the "ops" struct we need to make a local copy of it. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | remoteproc: Cache resource table size | Bjorn Andersson | 1 file, -0/+2
We don't re-read the resource table during a recovery, so it is possible in the recovery path that the resource table has a different size than cached_table. Store the original size of cached_table to avoid these getting out of sync. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | remoteproc: Remove deprecated crash completion | Bjorn Andersson | 1 file, -2/+0
The crash handling now happens in a single execution context, so there's no longer a need for a completion to synchronize this. Reviewed-By: Loic Pallardy <[email protected]> Tested-By: Loic Pallardy <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]>
2018-01-15 | block: allow gendisk's request_queue registration to be deferred | Mike Snitzer | 1 file, -0/+5
For as long as I can remember, DM has forced the block layer to allow the allocation and initialization of the request_queue to be distinct operations. The reason for this is that block/genhd.c:add_disk() requires the request_queue (and associated bdi) to be tied to the gendisk before add_disk() is called -- because add_disk() also deals with exposing the request_queue via blk_register_queue(). DM's dynamic creation of arbitrary device types (and associated request_queue types) requires the DM device's gendisk be available so that DM table loads can establish a master/slave relationship with subordinate devices that are referenced by loaded DM tables -- using bd_link_disk_holder(). But until these DM tables, and their associated subordinate devices, are known DM cannot know what type of request_queue it needs -- nor what its queue_limits should be. This chicken and egg scenario has created all manner of problems for DM and, at times, the block layer. Summary of changes: - Add device_add_disk_no_queue_reg() and add_disk_no_queue_reg() variant that drivers may use to add a disk without also calling blk_register_queue(). Driver must call blk_register_queue() once its request_queue is fully initialized. - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED is not set. It won't be set if driver used add_disk_no_queue_reg() but driver encounters an error and must del_gendisk() before calling blk_register_queue(). - Export blk_register_queue(). These changes allow DM to use add_disk_no_queue_reg() to anchor its gendisk as the "master" for master/slave relationships DM must establish with subordinate devices referenced in DM tables that get loaded. Once all "slave" devices for a DM device are known its request_queue can be properly initialized and then advertised via sysfs -- important improvement being that no request_queue resource initialization performed by blk_register_queue() is missed for DM devices anymore. Signed-off-by: Mike Snitzer <[email protected]> Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
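A hedged sketch of the deferred-registration flow from a driver's perspective (DM-like pseudocode; the 'md' variable is illustrative):

    /* Expose the gendisk early, but skip blk_register_queue(). */
    add_disk_no_queue_reg(md->disk);

    /*
     * ... load tables, learn the required queue type and queue_limits,
     * finish initializing the request_queue ...
     */

    /* Only now advertise the fully initialized queue via sysfs. */
    blk_register_queue(md->disk);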
2018-01-15 | Merge 4.15-rc8 into usb-next | Greg Kroah-Hartman | 9 files, -18/+41
We want the USB fixes in here as well for merge issues. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-01-15 | swiotlb: add common swiotlb_map_ops | Christoph Hellwig | 1 file, -0/+8
Currently all architectures that want to use swiotlb have to implement their own dma_map_ops instances. Provide a generic one based on the x86 implementation which first calls into dma_direct to try a full blown direct mapping implementation (including e.g. CMA) before falling back to allocating from the swiotlb buffer. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Christian König <[email protected]>
2018-01-15 | swiotlb: rename swiotlb_free to swiotlb_exit | Christoph Hellwig | 1 file, -2/+2
Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Christian König <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-01-15 | dma-direct: reject too small dma masks | Christoph Hellwig | 1 file, -0/+1
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Robin Murphy <[email protected]>
2018-01-15 | dma-direct: make dma_direct_{alloc,free} available to other implementations | Christoph Hellwig | 1 file, -0/+5
So that they don't need to indirect through the operation vector. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Vladimir Murzin <[email protected]>
2018-01-15 | dma-direct: rename dma_noop to dma_direct | Christoph Hellwig | 1 file, -1/+1
The trivial direct mapping implementation already does a virtual to physical translation which isn't strictly a noop, and will soon learn to do non-direct but linear physical to dma translations through the device offset and a few small tricks. Rename it to a better fitting name. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Vladimir Murzin <[email protected]>
2018-01-15 | dma-mapping: add an arch_dma_supported hook | Christoph Hellwig | 1 file, -0/+11
To implement the x86 forbid_dac and iommu_sac_force we want an arch hook so that it can apply the global options across all dma_map_ops implementations. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-01-15 | dma-mapping: clear harmful GFP_* flags in common code | Christoph Hellwig | 1 file, -0/+7
Lift the code from x86 so that we behave consistently. In the future we should probably warn if any of these is set. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Jesper Nilsson <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
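The lifted logic is essentially a mask applied in the common allocation path before calling into the dma_map_ops implementation; a sketch under the assumption that the cleared flags mirror what x86 stripped (the helper name is illustrative):

    #include <linux/gfp.h>

    /*
     * Strip zone modifiers the caller must not dictate: the DMA
     * implementation and the device's mask decide where to allocate from.
     */
    static inline gfp_t dma_strip_harmful_gfp(gfp_t flag)
    {
            return flag & ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
    }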
2018-01-15 | dma-mapping: warn when there is no coherent_dma_mask | Christoph Hellwig | 1 file, -0/+1
These days all devices should have a DMA coherent mask, and most dma_ops implementations rely on that fact. But just to be sure add an assert to ring the warning bell if that is not the case. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Vladimir Murzin <[email protected]> Reviewed-by: Konrad Rzeszutek Wilk <[email protected]>
2018-01-14 | SUNRPC: Remove rpc_protocol() | Chuck Lever | 1 file, -1/+0
Since nfs4_create_referral_server was the only call site of rpc_protocol, rpc_protocol can now be removed. Signed-off-by: Chuck Lever <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2018-01-14 | lockd: convert nlm_rqst.a_count from atomic_t to refcount_t | Elena Reshetova | 1 file, -1/+1
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_rqst.a_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_rqst.a_count it might make a difference in following places: - nlmclnt_release_call() and nlmsvc_release_call(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
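The mechanical shape of such a conversion, shown on an illustrative structure rather than the actual lockd code:

    #include <linux/refcount.h>
    #include <linux/slab.h>

    struct nlm_rqst_like {
            refcount_t a_count;             /* was: atomic_t a_count */
            /* ... */
    };

    static void rqst_init(struct nlm_rqst_like *req)
    {
            refcount_set(&req->a_count, 1); /* was: atomic_set(..., 1) */
    }

    static void rqst_get(struct nlm_rqst_like *req)
    {
            refcount_inc(&req->a_count);    /* was: atomic_inc(); saturates
                                             * instead of overflowing */
    }

    static void rqst_put(struct nlm_rqst_like *req)
    {
            /*
             * was: atomic_dec_and_test(); refcount_dec_and_test() provides
             * RELEASE ordering plus a control dependency on success.
             */
            if (refcount_dec_and_test(&req->a_count))
                    kfree(req);
    }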
2018-01-14 | lockd: convert nlm_lockowner.count from atomic_t to refcount_t | Elena Reshetova | 1 file, -1/+1
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_lockowner.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_lockowner.count it might make a difference in following places: - nlm_put_lockowner(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No changes in spin lock guarantees. Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2018-01-14 | lockd: convert nsm_handle.sm_count from atomic_t to refcount_t | Elena Reshetova | 1 file, -1/+1
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsm_handle.sm_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsm_handle.sm_count it might make a difference in following places: - nsm_release(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No change for the spin lock guarantees. Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2018-01-14 | lockd: convert nlm_host.h_count from atomic_t to refcount_t | Elena Reshetova | 1 file, -1/+2
atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_host.h_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_host.h_count it might make a difference in following places: - nlmsvc_release_host(): decrement in refcount_dec() provides RELEASE ordering, while original atomic_dec() was fully unordered. Since the change is for better, it should not matter. - nlmclnt_release_host(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart. It doesn't seem to matter in this case since object freeing happens under mutex lock anyway. Suggested-by: Kees Cook <[email protected]> Reviewed-by: David Windsor <[email protected]> Reviewed-by: Hans Liljestrand <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2018-01-14 | NFS: Fix nfsstat breakage due to LOOKUPP | Trond Myklebust | 1 file, -4/+8
The LOOKUPP operation was inserted into the nfs4_procedures array rather than being appended, which put /proc/net/rpc/nfs out of whack, and broke the nfsstat utility. Fix by moving the LOOKUPP operation to the end of the array, and by ensuring that it keeps the same length whether or not NFSV4.1 and NFSv4.2 are compiled in. Fixes: 5b5faaf6df734 ("nfs4: add NFSv4 LOOKUPP handlers") Cc: [email protected] # v4.13+ Signed-off-by: Trond Myklebust <[email protected]>
2018-01-14 | bpf: offload: add map offload infrastructure | Jakub Kicinski | 2 files, -0/+65
BPF map offload follows a similar path to program offload. At creation time users may specify ifindex of the device on which they want to create the map. Map will be validated by the kernel's .map_alloc_check callback and device driver will be called for the actual allocation. Map will have an empty set of operations associated with it (save for alloc and free callbacks). The real device callbacks are kept in map->offload->dev_ops because they have slightly different signatures. Map operations are called in process context so the driver may communicate with HW freely, msleep(), wait() etc. Map alloc and free callbacks are muxed via existing .ndo_bpf, and are always called with rtnl lock held. Maps and programs are guaranteed to be destroyed before .ndo_uninit (i.e. before unregister_netdev() returns). Map callbacks are invoked with bpf_devs_lock *read* locked, drivers must take care of exclusive locking if necessary. All offload-specific branches are marked with unlikely() (through bpf_map_is_dev_bound()), given that branch penalty will be negligible compared to IO anyway, and we don't want to penalize SW path unnecessarily. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>