aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-09-11openvswitch: Fix dependency on IPv6 defrag.Joe Stringer1-1/+2
When NF_CONNTRACK is built-in, NF_DEFRAG_IPV6 is a module, and OPENVSWITCH is built-in, the following build error would occur: net/built-in.o: In function `ovs_ct_execute': (.text+0x10f587): undefined reference to `nf_ct_frag6_gather' Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Jim Davis <[email protected]> Signed-off-by: Joe Stringer <[email protected]> Acked-by: Pravin B Shelar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11revert "ocfs2/dlm: use list_for_each_entry instead of list_for_each"Andrew Morton1-2/+4
Revert commit f83c7b5e9fd6 ("ocfs2/dlm: use list_for_each_entry instead of list_for_each"). list_for_each_entry() will dereference its `pos' argument, which can be NULL in dlm_process_recovery_data(). Reported-by: Julia Lawall <[email protected]> Reported-by: Fengguang Wu <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11mm/early_ioremap: add explicit #include of asm/early_ioremap.hArd Biesheuvel1-0/+1
Commit 6b0f68e32ea8 ("mm: add utility for early copy from unmapped ram") introduces a function copy_from_early_mem() into mm/early_ioremap.c which itself calls early_memremap()/early_memunmap(). However, since early_memunmap() has not been declared yet at this point in the .c file, nor by any explicitly included header files, we are depending on a transitive include of asm/early_ioremap.h to declare it, which is fragile. So instead, include this header explicitly. Signed-off-by: Ard Biesheuvel <[email protected]> Acked-by: Mark Salter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to voidJoe Perches4-50/+45
The seq_<foo> function return values were frequently misused. See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") All uses of these return values have been removed, so convert the return types to void. Miscellanea: o Move seq_put_decimal_<type> and seq_escape prototypes closer the other seq_vprintf prototypes o Reorder seq_putc and seq_puts to return early on overflow o Add argument names to seq_vprintf and seq_printf o Update the seq_escape kernel-doc o Convert a couple of leading spaces to tabs in seq_escape Signed-off-by: Joe Perches <[email protected]> Cc: Al Viro <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Mark Brown <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Joerg Roedel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11selftests: enhance membarrier syscall testMathieu Desnoyers1-25/+75
Update the membarrier syscall self-test to match the membarrier interface. Extend coverage of the interface. Consider ENOSYS as a "SKIP" test, since it is a valid configuration, but does not allow testing the system call. Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Pranith Kumar <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11selftests: add membarrier syscall testPranith Kumar4-0/+84
Add a self test for the membarrier system call. Signed-off-by: Pranith Kumar <[email protected]> Signed-off-by: Mathieu Desnoyers <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11sys_membarrier(): system-wide memory barrier (generic, x86)Mathieu Desnoyers11-1/+151
Here is an implementation of a new system call, sys_membarrier(), which executes a memory barrier on all threads running on the system. It is implemented by calling synchronize_sched(). It can be used to distribute the cost of user-space memory barriers asymmetrically by transforming pairs of memory barriers into pairs consisting of sys_membarrier() and a compiler barrier. For synchronization primitives that distinguish between read-side and write-side (e.g. userspace RCU [1], rwlocks), the read-side can be accelerated significantly by moving the bulk of the memory barrier overhead to the write-side. The existing applications of which I am aware that would be improved by this system call are as follows: * Through Userspace RCU library (http://urcu.so) - DNS server (Knot DNS) https://www.knot-dns.cz/ - Network sniffer (http://netsniff-ng.org/) - Distributed object storage (https://sheepdog.github.io/sheepdog/) - User-space tracing (http://lttng.org) - Network storage system (https://www.gluster.org/) - Virtual routers (https://events.linuxfoundation.org/sites/events/files/slides/DPDK_RCU_0MQ.pdf) - Financial software (https://lkml.org/lkml/2015/3/23/189) Those projects use RCU in userspace to increase read-side speed and scalability compared to locking. Especially in the case of RCU used by libraries, sys_membarrier can speed up the read-side by moving the bulk of the memory barrier cost to synchronize_rcu(). * Direct users of sys_membarrier - core dotnet garbage collector (https://github.com/dotnet/coreclr/issues/198) Microsoft core dotnet GC developers are planning to use the mprotect() side-effect of issuing memory barriers through IPIs as a way to implement Windows FlushProcessWriteBuffers() on Linux. They are referring to sys_membarrier in their github thread, specifically stating that sys_membarrier() is what they are looking for. To explain the benefit of this scheme, let's introduce two example threads: Thread A (non-frequent, e.g. executing liburcu synchronize_rcu()) Thread B (frequent, e.g. executing liburcu rcu_read_lock()/rcu_read_unlock()) In a scheme where all smp_mb() in thread A are ordering memory accesses with respect to smp_mb() present in Thread B, we can change each smp_mb() within Thread A into calls to sys_membarrier() and each smp_mb() within Thread B into compiler barriers "barrier()". Before the change, we had, for each smp_mb() pairs: Thread A Thread B previous mem accesses previous mem accesses smp_mb() smp_mb() following mem accesses following mem accesses After the change, these pairs become: Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses As we can see, there are two possible scenarios: either Thread B memory accesses do not happen concurrently with Thread A accesses (1), or they do (2). 1) Non-concurrent Thread A vs Thread B accesses: Thread A Thread B prev mem accesses sys_membarrier() follow mem accesses prev mem accesses barrier() follow mem accesses In this case, thread B accesses will be weakly ordered. This is OK, because at that point, thread A is not particularly interested in ordering them with respect to its own accesses. 2) Concurrent Thread A vs Thread B accesses Thread A Thread B prev mem accesses prev mem accesses sys_membarrier() barrier() follow mem accesses follow mem accesses In this case, thread B accesses, which are ensured to be in program order thanks to the compiler barrier, will be "upgraded" to full smp_mb() by synchronize_sched(). * Benchmarks On Intel Xeon E5405 (8 cores) (one thread is calling sys_membarrier, the other 7 threads are busy looping) 1000 non-expedited sys_membarrier calls in 33s =3D 33 milliseconds/call. * User-space user of this system call: Userspace RCU library Both the signal-based and the sys_membarrier userspace RCU schemes permit us to remove the memory barrier from the userspace RCU rcu_read_lock() and rcu_read_unlock() primitives, thus significantly accelerating them. These memory barriers are replaced by compiler barriers on the read-side, and all matching memory barriers on the write-side are turned into an invocation of a memory barrier on all active threads in the process. By letting the kernel perform this synchronization rather than dumbly sending a signal to every process threads (as we currently do), we diminish the number of unnecessary wake ups and only issue the memory barriers on active threads. Non-running threads do not need to execute such barrier anyway, because these are implied by the scheduler context switches. Results in liburcu: Operations in 10s, 6 readers, 2 writers: memory barriers in reader: 1701557485 reads, 2202847 writes signal-based scheme: 9830061167 reads, 6700 writes sys_membarrier: 9952759104 reads, 425 writes sys_membarrier (dyn. check): 7970328887 reads, 425 writes The dynamic sys_membarrier availability check adds some overhead to the read-side compared to the signal-based scheme, but besides that, sys_membarrier slightly outperforms the signal-based scheme. However, this non-expedited sys_membarrier implementation has a much slower grace period than signal and memory barrier schemes. Besides diminishing the number of wake-ups, one major advantage of the membarrier system call over the signal-based scheme is that it does not need to reserve a signal. This plays much more nicely with libraries, and with processes injected into for tracing purposes, for which we cannot expect that signals will be unused by the application. An expedited version of this system call can be added later on to speed up the grace period. Its implementation will likely depend on reading the cpu_curr()->mm without holding each CPU's rq lock. This patch adds the system call to x86 and to asm-generic. [1] http://urcu.so membarrier(2) man page: MEMBARRIER(2) Linux Programmer's Manual MEMBARRIER(2) NAME membarrier - issue memory barriers on a set of threads SYNOPSIS #include <linux/membarrier.h> int membarrier(int cmd, int flags); DESCRIPTION The cmd argument is one of the following: MEMBARRIER_CMD_QUERY Query the set of supported commands. It returns a bitmask of supported commands. MEMBARRIER_CMD_SHARED Execute a memory barrier on all threads running on the system. Upon return from system call, the caller thread is ensured that all running threads have passed through a state where all memory accesses to user-space addresses match program order between entry to and return from the system call (non-running threads are de facto in such a state). This covers threads from all pro=E2=80=90 cesses running on the system. This command returns 0. The flags argument needs to be 0. For future extensions. All memory accesses performed in program order from each targeted thread is guaranteed to be ordered with respect to sys_membarrier(). If we use the semantic "barrier()" to represent a compiler barrier forcing memory accesses to be performed in program order across the barrier, and smp_mb() to represent explicit memory barriers forcing full memory ordering across the barrier, we have the following ordering table for each pair of barrier(), sys_membarrier() and smp_mb(): The pair ordering is detailed as (O: ordered, X: not ordered): barrier() smp_mb() sys_membarrier() barrier() X X O smp_mb() X O O sys_membarrier() O O O RETURN VALUE On success, these system calls return zero. On error, -1 is returned, and errno is set appropriately. For a given command, with flags argument set to 0, this system call is guaranteed to always return the same value until reboot. ERRORS ENOSYS System call is not implemented. EINVAL Invalid arguments. Linux 2015-04-15 MEMBARRIER(2) Signed-off-by: Mathieu Desnoyers <[email protected]> Reviewed-by: Paul E. McKenney <[email protected]> Reviewed-by: Josh Triplett <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Nicholas Miell <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alan Cox <[email protected]> Cc: Lai Jiangshan <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: David Howells <[email protected]> Cc: Pranith Kumar <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11MODSIGN: fix a compilation warning in extract-certDavid Howells1-1/+1
Fix the following warning when compiling extract-cert: scripts/extract-cert.c: In function `write_cert': scripts/extract-cert.c:89:2: warning: format not a string literal and no format arguments [-Wformat-security] ERR(!i2d_X509_bio(wb, x509), cert_dst); ^ whereby the ERR() macro is taking cert_dst as the format string. "%s" should be used as the format string as the path could contain special characters. Signed-off-by: David Howells <[email protected]> Reported-by: Jim Davis <[email protected]> Acked-by : David Woodhouse <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11IB/ehca: Deprecate driver, move to staging, schedule deletionDoug Ledford36-3/+9
The ehca driver is only supported on IBM machines with a custom EBus. As they have opted to build their newer machines using more industry standard technology and haven't really been pushing EBus capable machines for a while, this driver can now safely be moved to the staging area and scheduled for eventual removal. This plan was brought to IBM's attention and received their sign-off. Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Doug Ledford <[email protected]>
2015-09-11Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds53-135/+916
Pull watchdog updates from Wim Van Sebroeck: - new driver for NXP LPC18xx Watchdog Timer - new driver for SAMA5D4 watchdog timer - add support for MCP79 to nv_tco driver - clean-up and improvement of the mpc8xxx watchdog driver - improvements to gpio-wdt - at91sam9_wdt clock improvements ... and other small fixes and improvements * git://www.linux-watchdog.org/linux-watchdog: (25 commits) Watchdog: Fix parent of watchdog_devices watchdog: at91rm9200: Correct check for syscon_node_to_regmap() errors watchdog: at91sam9: get and use slow clock Documentation: dt: binding: atmel-sama5d4-wdt: for SAMA5D4 watchdog driver watchdog: add a driver to support SAMA5D4 watchdog timer watchdog: mpc8xxx: allow to compile for MPC512x watchdog: mpc8xxx: use better error code when watchdog cannot be enabled watchdog: mpc8xxx: use dynamic memory for device specific data watchdog: mpc8xxx: use devm_ioremap_resource to map memory watchdog: mpc8xxx: make use of of_device_get_match_data watchdog: mpc8xxx: simplify registration watchdog: mpc8xxx: remove dead code watchdog: lpc18xx_wdt_get_timeleft() can be static DT: watchdog: Add NXP LPC18xx Watchdog Timer binding documentation watchdog: NXP LPC18xx Watchdog Timer Driver watchdog: gpio-wdt: ping already at startup for always running devices watchdog: gpio-wdt: be more strict about hw_algo matching Documentation: watchdog: at91sam9_wdt: add clocks property watchdog: booke_wdt: Use infrastructure to check timeout limits watchdog: (nv_tco) add support for MCP79 ...
2015-09-11bridge: fix igmpv3 / mldv2 report parsingLinus Lüssing1-2/+2
With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMPv3 and MLDv2 report parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header, breaking the message parsing and creating packet loss. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: 9afd85c9e455 ("net: Export IGMP/MLD message validation code") Reported-by: Tobias Powalowski <[email protected]> Tested-by: Tobias Powalowski <[email protected]> Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11bnx2x: use ktime_get_seconds() for timestampArnd Bergmann1-4/+2
commit c48f350ff5e7 "bnx2x: Add MFW dump support" added the bnx2x_update_mfw_dump() function that reads the current time and stores it in a 32-bit field that gets passed into a buffer in a fixed format. This is potentially broken when the epoch overflows in 2038, and otherwise overflows in 2106. As we're trying to avoid uses of struct timeval for this reason, I noticed the addition of this function, and tried to rewrite it in a way that is more explicit about the overflow and that will keep working once we deprecate struct timeval. I assume that it is not possible to change the ABI any more, otherwise we should try to use a 64-bit field for the seconds right away. Signed-off-by: Arnd Bergmann <[email protected]> Cc: Yuval Mintz <[email protected]> Cc: Ariel Elior <[email protected]> Acked-by: Yuval Mintz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11sctp: fix race on protocol/netns initializationMarcelo Ricardo Leitner1-23/+41
Consider sctp module is unloaded and is being requested because an user is creating a sctp socket. During initialization, sctp will add the new protocol type and then initialize pernet subsys: status = sctp_v4_protosw_init(); if (status) goto err_protosw_init; status = sctp_v6_protosw_init(); if (status) goto err_v6_protosw_init; status = register_pernet_subsys(&sctp_net_ops); The problem is that after those calls to sctp_v{4,6}_protosw_init(), it is possible for userspace to create SCTP sockets like if the module is already fully loaded. If that happens, one of the possible effects is that we will have readers for net->sctp.local_addr_list list earlier than expected and sctp_net_init() does not take precautions while dealing with that list, leading to a potential panic but not limited to that, as sctp_sock_init() will copy a bunch of blank/partially initialized values from net->sctp. The race happens like this: CPU 0 | CPU 1 socket() | __sock_create | socket() inet_create | __sock_create list_for_each_entry_rcu( | answer, &inetsw[sock->type], | list) { | inet_create /* no hits */ | if (unlikely(err)) { | ... | request_module() | /* socket creation is blocked | * the module is fully loaded | */ | sctp_init | sctp_v4_protosw_init | inet_register_protosw | list_add_rcu(&p->list, | last_perm); | | list_for_each_entry_rcu( | answer, &inetsw[sock->type], sctp_v6_protosw_init | list) { | /* hit, so assumes protocol | * is already loaded | */ | /* socket creation continues | * before netns is initialized | */ register_pernet_subsys | Simply inverting the initialization order between register_pernet_subsys() and sctp_v4_protosw_init() is not possible because register_pernet_subsys() will create a control sctp socket, so the protocol must be already visible by then. Deferring the socket creation to a work-queue is not good specially because we loose the ability to handle its errors. So, as suggested by Vlad, the fix is to split netns initialization in two moments: defaults and control socket, so that the defaults are already loaded by when we register the protocol, while control socket initialization is kept at the same moment it is today. Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace") Signed-off-by: Vlad Yasevich <[email protected]> Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11ebpf: emit correct src_reg for conditional jumpsTycho Andersen1-1/+1
Instead of always emitting BPF_REG_X, let's emit BPF_REG_X only when the source actually is BPF_X. This causes programs generated by the classic converter to not be importable via bpf(), as the eBPF verifier checks that the src_reg is correct or 0. While not a problem yet, this will be a problem when BPF_PROG_DUMP lands, and we can potentially dump and re-import programs generated by the converter. Signed-off-by: Tycho Andersen <[email protected]> CC: Alexei Starovoitov <[email protected]> CC: Daniel Borkmann <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11netlink, mmap: transform mmap skb into full skb on tapsDaniel Borkmann2-7/+32
Ken-ichirou reported that running netlink in mmap mode for receive in combination with nlmon will throw a NULL pointer dereference in __kfree_skb() on nlmon_xmit(), in my case I can also trigger an "unable to handle kernel paging request". The problem is the skb_clone() in __netlink_deliver_tap_skb() for skbs that are mmaped. I.e. the cloned skb doesn't have a destructor, whereas the mmap netlink skb has it pointed to netlink_skb_destructor(), set in the handler netlink_ring_setup_skb(). There, skb->head is being set to NULL, so that in such cases, __kfree_skb() doesn't perform a skb_release_data() via skb_release_all(), where skb->head is possibly being freed through kfree(head) into slab allocator, although netlink mmap skb->head points to the mmap buffer. Similarly, the same has to be done also for large netlink skbs where the data area is vmalloced. Therefore, as discussed, make a copy for these rather rare cases for now. This fixes the issue on my and Ken-ichirou's test-cases. Reference: http://thread.gmane.org/gmane.linux.network/371129 Fixes: bcbde0d449ed ("net: netlink: virtual tap device management") Reported-by: Ken-ichirou MATSUZAWA <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Ken-ichirou MATSUZAWA <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-09-11Revert "writeback: plug writeback at a high level"Linus Torvalds1-3/+4
This reverts commit d353d7587d02116b9732d5c06615aed75a4d3a47. Doing the block layer plug/unplug inside writeback_sb_inodes() is broken, because that function is actually called with a spinlock held: wb->list_lock, as pointed out by Chris Mason. Chris suggested just dropping and re-taking the spinlock around the blk_finish_plug() call (the plgging itself can happen under the spinlock), and that would technically work, but is just disgusting. We do something fairly similar - but not quite as disgusting because we at least have a better reason for it - in writeback_single_inode(), so it's not like the caller can depend on the lock being held over the call, but in this case there just isn't any good reason for that "release and re-take the lock" pattern. [ In general, we should really strive to avoid the "release and retake" pattern for locks, because in the general case it can easily cause subtle bugs when the caller caches any state around the call that might be invalidated by dropping the lock even just temporarily. ] But in this case, the plugging should be easy to just move up to the callers before the spinlock is taken, which should even improve the effectiveness of the plug. So there is really no good reason to play games with locking here. I'll send off a test-patch so that Dave Chinner can verify that that plug movement works. In the meantime this just reverts the problematic commit and adds a comment to the function so that we hopefully don't make this mistake again. Reported-by: Chris Mason <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Neil Brown <[email protected]> Cc: Jan Kara <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-09-11Merge branch 'for-linus-4.3' of ↵Linus Torvalds9-96/+82
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs cleanups and fixes from Chris Mason: "These are small cleanups, and also some fixes for our async worker thread initialization. I was having some trouble testing these, but it ended up being a combination of changing around my test servers and a shiny new schedule while atomic from the new start/finish_plug in writeback_sb_inodes(). That one only hits on btrfs raid5/6 or MD raid10, and if I wasn't changing a bunch of things in my test setup at once it would have been really clear. Fix for writeback_sb_inodes() on the way as well" * 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: cleanup: remove unnecessary check before btrfs_free_path is called btrfs: async_thread: Fix workqueue 'max_active' value when initializing btrfs: Add raid56 support for updating num_tolerated_disk_barrier_failures in btrfs_balance btrfs: Cleanup for btrfs_calc_num_tolerated_disk_barrier_failures btrfs: Remove noused chunk_tree and chunk_objectid from scrub_enumerate_chunks and scrub_chunk btrfs: Update out-of-date "skip parity stripe" comment
2015-09-11Merge branch 'for-linus' of ↵Linus Torvalds17-98/+191
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph update from Sage Weil: "There are a few fixes for snapshot behavior with CephFS and support for the new keepalive protocol from Zheng, a libceph fix that affects both RBD and CephFS, a few bug fixes and cleanups for RBD from Ilya, and several small fixes and cleanups from Jianpeng and others" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: improve readahead for file holes ceph: get inode size for each append write libceph: check data_len in ->alloc_msg() libceph: use keepalive2 to verify the mon session is alive rbd: plug rbd_dev->header.object_prefix memory leak rbd: fix double free on rbd_dev->header_name libceph: set 'exists' flag for newly up osd ceph: cleanup use of ceph_msg_get ceph: no need to get parent inode in ceph_open ceph: remove the useless judgement ceph: remove redundant test of head->safe and silence static analysis warnings ceph: fix queuing inode to mdsdir's snaprealm libceph: rename con_work() to ceph_con_workfn() libceph: Avoid holding the zero page on ceph_msgr_slab_init errors libceph: remove the unused macro AES_KEY_SIZE ceph: invalidate dirty pages after forced umount ceph: EIO all operations after forced umount
2015-09-11Merge tag 'gfs2-merge-window' of ↵Linus Torvalds11-285/+212
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "Here is a list of patches we've accumulated for GFS2 for the current upstream merge window. This time we've only got six patches, many of which are very small: - three cleanups from Andreas Gruenbacher, including a nice cleanup of the sequence file code for the sbstats debugfs file. - a patch from Ben Hutchings that changes statistics variables from signed to unsigned. - two patches from me that increase GFS2's glock scalability by switching from a conventional hash table to rhashtable" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: A minor "sbstats" cleanup gfs2: Fix a typo in a comment gfs2: Make statistics unsigned, suitable for use with do_div() GFS2: Use resizable hash table for glocks GFS2: Move glock superblock pointer to field gl_name gfs2: Simplify the seq file code for "sbstats"
2015-09-11scsi_dh: fix randconfig build errorChristoph Hellwig1-1/+1
It looks like the Kconfig check that was meant to fix this (commit fe9233fb6914a0eb20166c967e3020f7f0fba2c9 [SCSI] scsi_dh: fix kconfig related build errors) was actually reversed, but no-one noticed until the new set of patches which separated DM and SCSI_DH). Fixes: fe9233fb6914a0eb20166c967e3020f7f0fba2c9 Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Mike Snitzer <[email protected]> Signed-off-by: James Bottomley <[email protected]>
2015-09-11Merge branch 'uaccess' into fixesRussell King31-139/+398
2015-09-11ARM: 8431/1: fix alignement of __bug_table section entriesRobert Jarzmik1-0/+1
On old ARM chips, unaligned accesses to memory are not trapped and fixed. On module load, symbols are relocated, and the relocation of __bug_table symbols is done on a u32 basis. Yet the section is not aligned to a multiple of 4 address, but to a multiple of 2. This triggers an Oops on pxa architecture, where address 0xbf0021ea is the first relocation in the __bug_table section : apply_relocate(): pxa3xx_nand: section 13 reloc 0 sym '' Unable to handle kernel paging request at virtual address bf0021ea pgd = e1cd0000 [bf0021ea] *pgd=c1cce851, *pte=c1cde04f, *ppte=c1cde01f Internal error: Oops: 23 [#1] ARM Modules linked in: CPU: 0 PID: 606 Comm: insmod Not tainted 4.2.0-rc8-next-20150828-cm-x300+ #887 Hardware name: CM-X300 module task: e1c68700 ti: e1c3e000 task.ti: e1c3e000 PC is at apply_relocate+0x2f4/0x3d4 LR is at 0xbf0021ea pc : [<c000e7c8>] lr : [<bf0021ea>] psr: 80000013 sp : e1c3fe30 ip : 60000013 fp : e49e8c60 r10: e49e8fa8 r9 : 00000000 r8 : e49e7c58 r7 : e49e8c38 r6 : e49e8a58 r5 : e49e8920 r4 : e49e8918 r3 : bf0021ea r2 : bf007034 r1 : 00000000 r0 : bf000000 Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 0000397f Table: c1cd0018 DAC: 00000051 Process insmod (pid: 606, stack limit = 0xe1c3e198) [<c000e7c8>] (apply_relocate) from [<c005ce5c>] (load_module+0x1248/0x1f5c) [<c005ce5c>] (load_module) from [<c005dc54>] (SyS_init_module+0xe4/0x170) [<c005dc54>] (SyS_init_module) from [<c000a420>] (ret_fast_syscall+0x0/0x38) Fix this by ensuring entries in __bug_table are all aligned to at least of multiple of 4. This transforms a module section __bug_table as : - [12] __bug_table PROGBITS 00000000 002232 000018 00 A 0 0 1 + [12] __bug_table PROGBITS 00000000 002232 000018 00 A 0 0 4 Signed-off-by: Robert Jarzmik <[email protected]> Reviewed-by: Dave Martin <[email protected]> Signed-off-by: Russell King <[email protected]>
2015-09-11arm/xen: Enable user access to the kernel before issuing a privcmd callJulien Grall1-0/+15
When Xen is copying data to/from the guest it will check if the kernel has the right to do the access. If not, the hypercall will return an error. After the commit a5e090acbf545c0a3b04080f8a488b17ec41fe02 "ARM: software-based privileged-no-access support", the kernel can't access any longer the user space by default. This will result to fail on every hypercall made by the userspace (i.e via privcmd). We have to enable the userspace access and then restore the correct permission every time the privcmd is used to made an hypercall. I didn't find generic helpers to do a these operations, so the change is only arm32 specific. Reported-by: Riku Voipio <[email protected]> Signed-off-by: Julien Grall <[email protected]> Signed-off-by: Russell King <[email protected]>
2015-09-11Merge tag 'sound-fix-4.3-rc1' of ↵Linus Torvalds3-5/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes since the last update: the HD-audio quirks as usual with a USB-audio fix and a trivial fix for the old sparc driver" * tag 'sound-fix-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Change internal PCM order ALSA: hda - Fix white noise on Dell M3800 ALSA: hda - Use ALC880_FIXUP_FUJITSU for FSC Amilo M1437 ALSA: hda - Enable headphone jack detect on old Fujitsu laptops ALSA: sparc: amd7930: Fix module autoload for OF platform driver ALSA: hda - Add some FIXUP quirks for white noise on Dell laptop.
2015-09-11Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds16-83/+218
Pull drm fixes from Dave Airlie: "Just a bunch of fixes to squeeze in before -rc1: - three nouveau regression fixes - one qxl regression fix - a bunch of i915 fixes ... and some core displayport/atomic fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/nouveau/device: enable c800 quirk for tecra w50 drm/nouveau/clk/gt215: Unbreak engine pausing for GT21x/MCP7x drm/nouveau/gr/nv04: fix big endian setting on gr context drm/qxl: validate monitors config modes drm/i915: Allow DSI dual link to be configured on any pipe drm/i915: Don't try to use DDR DVFS on CHV when disabled in the BIOS drm/i915: Fix CSR MMIO address check drm/i915: Limit the number of loops for reading a split 64bit register drm/i915: Fix broken mst get_hw_state. drm/i915: Pass hpd_status_i915[] to intel_get_hpd_pins() in pre-g4x uapi/drm/i915_drm.h: fix userspace compilation. drm/i915: Always mark the object as dirty when used by the GPU drm/dp: Add dp_aux_i2c_speed_khz module param to set the assume i2c bus speed drm/dp: Adjust i2c-over-aux retry count based on message size and i2c bus speed drm/dp: Define AUX_RETRY_INTERVAL as 500 us drm/atomic: Fix bookkeeping with TEST_ONLY, v3.
2015-09-11Merge branch 'next' into for-linusDmitry Torokhov17-15/+1019
Prepare second round of input updates for 4.3 merge window.
2015-09-11ARM: domains: add memory dependencies to get_domain/set_domainRussell King1-2/+4
We need to have memory dependencies on get_domain/set_domain to avoid the compiler over-optimising these inline assembly instructions. Loads/stores must not be reordered across a set_domain(), so introduce a compiler barrier for that assembly. The value of get_domain() must not be cached across a set_domain(), but we still want to allow the compiler to optimise it away. Introduce a dependency on current_thread_info()->cpu_domain to avoid this; the new memory clobber in set_domain() should therefore cause the compiler to re-load this. The other advantage of using this is we should have its address in the register set already, or very soon after at most call sites. Tested-by: Robert Jarzmik <[email protected]> Signed-off-by: Russell King <[email protected]>
2015-09-11ARM: domains: thread_info.h no longer needs asm/domains.hRussell King1-1/+0
As of 1eef5d2f1b46 ("ARM: domains: switch to keeping domain value in register") we no longer need to include asm/domains.h into asm/thread_info.h. Remove it. Tested-by: Robert Jarzmik <[email protected]> Signed-off-by: Russell King <[email protected]>
2015-09-11block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkgTejun Heo1-0/+3
While making the root blkg unconditional, ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root blkcg") removed the part which clears q->root_blkg and ->root_rl.blkg during q exit. This leaves the two pointers dangling after blkg_destroy_all(). blk-throttle exit path performs blkg traversals and dereferences ->root_blkg and can lead to the following oops. BUG: unable to handle kernel NULL pointer dereference at 0000000000000558 IP: [<ffffffff81389746>] __blkg_lookup+0x26/0x70 ... task: ffff88001b4e2580 ti: ffff88001ac0c000 task.ti: ffff88001ac0c000 RIP: 0010:[<ffffffff81389746>] [<ffffffff81389746>] __blkg_lookup+0x26/0x70 ... Call Trace: [<ffffffff8138d14a>] blk_throtl_drain+0x5a/0x110 [<ffffffff8138a108>] blkcg_drain_queue+0x18/0x20 [<ffffffff81369a70>] __blk_drain_queue+0xc0/0x170 [<ffffffff8136a101>] blk_queue_bypass_start+0x61/0x80 [<ffffffff81388c59>] blkcg_deactivate_policy+0x39/0x100 [<ffffffff8138d328>] blk_throtl_exit+0x38/0x50 [<ffffffff8138a14e>] blkcg_exit_queue+0x3e/0x50 [<ffffffff8137016e>] blk_release_queue+0x1e/0xc0 ... While the bug is a straigh-forward use-after-free bug, it is tricky to reproduce because blkg release is RCU protected and the rest of exit path usually finishes before RCU grace period. This patch fixes the bug by updating blkg_destro_all() to clear q->root_blkg and ->root_rl.blkg. Signed-off-by: Tejun Heo <[email protected]> Reported-by: "Richard W.M. Jones" <[email protected]> Reported-by: Josh Boyer <[email protected]> Link: http://lkml.kernel.org/g/CA+5PVA5rzQ0s4723n5rHBcxQa9t0cW8BPPBekr_9aMRoWt2aYg@mail.gmail.com Fixes: ec13b1d6f0a0 ("blkcg: always create the blkcg_gq for the root blkcg") Cc: [email protected] # v4.2+ Tested-by: Richard W.M. Jones <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-09-11block: Copy a user iovec if it includes gapsSagi Grimberg1-2/+24
For drivers that don't support gaps in the SG lists handed to them we must bounce (copy the user buffers) and pass a bio that does not include gaps. This doesn't matter for any current user, but will help to allow iser which can't handle gaps to use the block virtual boundary instead of using driver-local bounce buffering when handling SG_IO commands. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-09-11block: Refuse adding appending a gapped integrity page to a bioSagi Grimberg1-0/+5
This is only theoretical at the moment given that the only subsystems that generate integrity payloads are the block layer itself and the scsi target (which generate well aligned integrity payloads). But when we will expose integrity meta-data to user-space, we'll need to refuse appending a page with a gap (if the queue virtual boundary is set). Signed-off-by: Sagi Grimberg <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2015-09-11block: Refuse request/bio merges with gaps in the integrity payloadSagi Grimberg3-0/+39
If a driver sets the block queue virtual boundary mask, it means that it cannot handle gaps so we must not allow those in the integrity payload as well. Signed-off-by: Sagi Grimberg <[email protected]> Fixed up by me to have duplicate integrity merge functions, depending on whether block integrity is enabled or not. Fixes a compilations issue with CONFIG_BLK_DEV_INTEGRITY unset. Signed-off-by: Jens Axboe <[email protected]>
2015-09-11CIFS: fix type confusion in copy offload ioctlJann Horn1-0/+6
This might lead to local privilege escalation (code execution as kernel) for systems where the following conditions are met: - CONFIG_CIFS_SMB2 and CONFIG_CIFS_POSIX are enabled - a cifs filesystem is mounted where: - the mount option "vers" was used and set to a value >=2.0 - the attacker has write access to at least one file on the filesystem To attack this, an attacker would have to guess the target_tcon pointer (but guessing wrong doesn't cause a crash, it just returns an error code) and win a narrow race. CC: Stable <[email protected]> Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Steve French <[email protected]>
2015-09-11crypto: testmgr - don't copy from source IV too muchAndrey Ryabinin1-2/+3
While the destination buffer 'iv' is MAX_IVLEN size, the source 'template[i].iv' could be smaller, thus memcpy may read read invalid memory. Use crypto_skcipher_ivsize() to get real ivsize and pass it to memcpy. Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2015-09-11Merge branches 'pm-cpu', 'pm-cpuidle' and 'pm-domains'Rafael J. Wysocki7-16/+94
* pm-cpu: kernel/cpu_pm: fix cpu_cluster_pm_exit comment * pm-cpuidle: cpuidle/coupled: Add sanity check for safe_state_index * pm-domains: staging: board: Migrate away from __pm_genpd_name_add_device() PM / Domains: Ensure subdomain is not in use before removing PM / Domains: Try power off masters in error path of __pm_genpd_poweron()
2015-09-11Merge branch 'pm-cpufreq'Rafael J. Wysocki4-28/+59
* pm-cpufreq: intel_pstate: fix PCT_TO_HWP macro intel_pstate: Fix user input of min/max to legal policy region cpufreq-dt: add suspend frequency support cpufreq: allow cpufreq_generic_suspend() to work without suspend frequency cpufreq: Use __func__ to print function's name cpufreq: staticize cpufreq_cpu_get_raw() cpufreq: Add ARM_MT8173_CPUFREQ dependency on THERMAL cpufreq: dt: Tolerance applies on both sides of target voltage cpufreq: dt: Print error on failing to mark OPPs as shared cpufreq: dt: Check OPP count before marking them shared
2015-09-11Merge branch 'pm-opp'Rafael J. Wysocki2-0/+34
* pm-opp: PM / OPP: Return suspend_opp only if it is enabled PM / OPP: add dev_pm_opp_get_suspend_opp() helper
2015-09-11ASoC: wm8960: correct gain value for input PGA and add microphone PGAZidan Wang1-6/+16
The input PGAs have a gain range from -17.25dB to +30dB in 0.75dB steps. The boost stage can provide additional gain. For line inputs, -12dB to +6dB gain is available on the boost mixer. For micphone inputs, it can provide up to +29dB additional gain from the microphone PGA. Signed-off-by: Zidan Wang <[email protected]> Acked-by: Charles Keepax <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2015-09-11ASoC: wm8960: correct the min gain value of some PGAZidan Wang1-3/+3
The min gain is the corresponding gain value when the register value is 0 instead of 1, just correct it. Signed-off-by: Zidan Wang <[email protected]> Acked-by: Charles Keepax <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2015-09-11spi: spidev: fix possible NULL dereferenceSudip Mukherjee1-1/+2
During the last close we are freeing spidev if spidev->spi is NULL, but just before checking if spidev->spi is NULL we are dereferencing it. Lets add a check there to avoid the NULL dereference. Fixes: 9169051617df ("spi: spidev: Don't mangle max_speed_hz in underlying spi device") Signed-off-by: Sudip Mukherjee <[email protected]> Reviewed-by: Jarkko Nikula <[email protected]> Tested-by: Jarkko Nikula <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2015-09-11ASoC: rt5645: Remove incorrect settingsOder Chiou1-6/+0
The patch removes the incorrect settings to avoid the pop sound in the first playback with headphone after boot. Signed-off-by: Oder Chiou <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2015-09-11ASoC: dapm: fix memory leakSudip Mukherjee1-1/+1
Incase of an unknown event we were directly returning but we missed freeing params. Signed-off-by: Sudip Mukherjee <[email protected]> Signed-off-by: Mark Brown <[email protected]>
2015-09-11perf/x86/intel/bts: Set event->hw.itrace_started in pmu::start to match the ↵Alexander Shishkin1-0/+1
new logic Since event->hw.itrace_started is now set in pmu::start() to signal the beginning of the trace, do so also in the intel_bts driver. Signed-off-by: Alexander Shishkin <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/1437140050-23363-4-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <[email protected]>
2015-09-11target: use stringify.h instead of own definitionDavid Disseldorp2-5/+2
Signed-off-by: David Disseldorp <[email protected]> Acked-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target/user: Fix UFLAG_UNKNOWN_OP handlingAndy Grover1-8/+2
Calling transport_generic_request_failure() from here causes list corruption. We should be using target_complete_cmd() instead. Which we do in all other cases, so the UNKNOWN_OP case can become just another member of the big else/if chain in tcmu_handle_completion(). Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target: Remove no-op conditionalAndy Grover1-2/+1
This does nothing, and there are many other places where transport_cmd_check_stop_to_fabric()'s retval is not checked>, If we wanted to check it here, we should probably do it those other places too. Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target/user: Remove unused variableAndy Grover1-1/+0
We don't use it any more. Signed-off-by: Andy Grover <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target: Fix max_cmd_sn increment w/o cmdsn mutex regressionsRoland Dreier2-4/+5
Current for-next iscsi target is broken: commit 109e2381749c1cfd94a0d22b2b54142539024973 Author: Roland Dreier <[email protected]> Date: Thu Jul 23 14:53:32 2015 -0700 target: Drop iSCSI use of mutex around max_cmd_sn increment This patch fixes incorrect pr_debug() + atomic_inc_return() usage within iscsit_increment_maxcmdsn() code. Also fix funny iscsit_determine_maxcmdsn() usage and update iscsi_target_do_tx_login_io() code. Reported-by: Sagi Grimberg <[email protected]> Cc: Sagi Grimberg <[email protected]> Signed-off-by: Roland Dreier <[email protected]> Cc: Roland Dreier <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target: Attach EXTENDED_COPY local I/O descriptors to xcopy_pt_sessNicholas Bellinger1-2/+4
This patch is a >= v4.1 regression bug-fix where control CDB emulation logic in commit 38b57f82 now expects a se_cmd->se_sess pointer to exist when determining T10-PI support is to be exposed for initiator host ports. To address this bug, go ahead and add locally generated se_cmd descriptors for copy-offload block-copy to it's own stand-alone se_session nexus, while the parent EXTENDED_COPY se_cmd descriptor remains associated with it's originating se_cmd->se_sess nexus. Note a valid se_cmd->se_sess is also required for future support of WRITE_INSERT and READ_STRIP software emulation when submitting backend I/O to se_device that exposes T10-PI suport. Reported-by: Alex Gorbachev <[email protected]> Tested-by: Alex Gorbachev <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Doug Gilbert <[email protected]> Cc: <[email protected]> # v4.1+ Signed-off-by: Nicholas Bellinger <[email protected]>
2015-09-11target/qla2xxx: Honor max_data_sg_nents I/O transfer limitNicholas Bellinger4-4/+78
This patch adds an optional fabric driver provided SGL limit that target-core will honor as it's own internal I/O maximum transfer length limit, as exposed by EVPD=0xb0 block limits parameters. This is required for handling cases when host I/O transfer length exceeds the requested EVPD block limits maximum transfer length. The initial user of this logic is qla2xxx, so that we can avoid having to reject I/Os from some legacy FC hosts where EVPD=0xb0 parameters are not honored. When se_cmd payload length exceeds the provided limit in target_check_max_data_sg_nents() code, se_cmd->data_length + se_cmd->prot_length are reset with se_cmd->residual_count plus underflow bit for outgoing TFO response callbacks. It also checks for existing CDB level underflow + overflow and recalculates final residual_count as necessary. Note this patch currently assumes 1:1 mapping of PAGE_SIZE per struct scatterlist entry. Reported-by: Craig Watson <[email protected]> Cc: Craig Watson <[email protected]> Tested-by: Himanshu Madhani <[email protected]> Cc: Roland Dreier <[email protected]> Cc: Arun Easi <[email protected]> Cc: Giridhar Malavali <[email protected]> Cc: Andrew Vasquez <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Martin K. Petersen <[email protected]> Signed-off-by: Nicholas Bellinger <[email protected]>