aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2015-02-18IB/mlx5: Enable the ODP capability query verbHaggai Eran1-0/+2
Re-enable the on-demand paging capability query through the extended query device verb. Signed-off-by: Haggai Eran <[email protected]> Reviewed-by: Yann Droneaud <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18IB/core: Add on demand paging caps to ib_uverbs_ex_query_deviceHaggai Eran2-1/+30
Add on-demand paging capabilities reporting to the extended query device verb. Yann Droneaud writes: Note: as offsetof() is used to retrieve the size of the lower chunk of the response, beware that it only works if the upper chunk is right after, without any implicit padding. And, as the size of the latter chunk is added to the base size, implicit padding at the end of the structure is not taken in account. Both point must be taken in account when extending the uverbs functionalities. Signed-off-by: Haggai Eran <[email protected]> Reviewed-by: Yann Droneaud <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18IB/core: Add support for extended query device capsEli Cohen4-41/+104
Add extensible query device capabilities verb to allow adding new features. ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy capability fields to be used by both ib_uverbs_query_device and ib_uverbs_ex_query_device. Following the discussion about this patch [1], the code now validates the command's comp_mask is zero, returning -EINVAL for unknown values, in order to allow extending the verb in the future. The verb also checks the user-space provided response buffer size and only fills in capabilities that will fit in the buffer. In attempt to follow the spirit of presentation [2] by Tzahi Oved that was presented during OpenFabrics Alliance International Developer Workshop 2013, the comp_mask bits will only describe which fields are valid. Furthermore, fields that can simply be cleared when they are not supported, do not require a comp_mask bit at all. The verb returns a response_length field containing the actual number of bytes written by the kernel, so that a newer version running on an older kernel can tell which fields were actually returned. [1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19 http://thread.gmane.org/gmane.linux.kernel.api/7889/ [2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Haggai Eran <[email protected]> Reviewed-by: Yann Droneaud <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/cxgb4: Don't hang threads forever waiting on WR repliesHariprasad S1-15/+14
In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally dead. Further, if the device is marked fatally dead, then fail the WR wait immediately. Also change the timeout to 60 seconds. Signed-off-by: Steve Wise <[email protected]> Signed-off-by: Hariprasad Shenai <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Fix off by one in ocrdma_query_gid()Dan Carpenter1-1/+1
The ->sgid_tbl[] array has OCRDMA_MAX_SGID number of elements so this test is off by one. ->sgid_tbl is allocated in ocrdma_alloc_resources(). Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Use unsigned for bit indexRasmus Villemoes1-3/+3
In the expressions idx/32 and idx%32, both idx and 32 have signed type, and unfortunately the C standard prescribes rounding to 0, so unless gcc can prove that idx is non-negative, these cannot be implemented as simple shift respectively mask operations. Help gcc by changing the type of idx to unsigned - this cuts another few instructions from the generated code. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Help gcc generate better code for ocrdma_srq_toggle_bitRasmus Villemoes1-4/+1
gcc emits a surprising amount of code in order to flip a bit. One would think that a single instruction is enough. $ scripts/bloat-o-meter /tmp/ocrdma_verbs.o drivers/infiniband/hw/ocrdma/ocrdma_verbs.o add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-142 (-142) function old new delta ocrdma_post_srq_recv 498 460 -38 ocrdma_poll_cq 2010 1962 -48 ocrdma_discard_cqes 495 439 -56 All three calls of ocrdma_srq_toggle_bit happen within spinlocks, so saving a few useless instructions might be worthwhile. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Update the ocrdma module version stringMitesh Ahuja1-1/+1
Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: set vlan present bit for user AHDevesh Sharma4-6/+22
For the AH that describs a VLAN interface details, vlan present bit needs to be set during posting a WQE. This patch adds the code to allow it happening. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: remove reference of ocrdma_dev out of ocrdma_qp structureMitesh Ahuja3-29/+34
Use get_ocrdma_dev(ocrdma_qp->ibqp.device) function to access ocrdma device pointer. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Add support for interrupt moderationMitesh Ahuja4-0/+119
Add support for interrupt moderation for ocrdma device. Thresholds for high interrupt rates are static values derived based on experimental results. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Honor return value of ocrdma_resolve_dmacDevesh Sharma1-1/+3
Check for return value for ocrdma_resolve_dmac while setting AV params. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Allow expansion of the SQ CQEs via buddy CQ expansion of the QPSelvin Xavier1-22/+44
If the SQ and RQ of the QP in error state uses separate CQs, traverse the list of QPs using each CQs and invoke the buddy CQ handler for both SQ and RQ. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Discontinue support of RDMA-READ-WITH-INVALIDATEDevesh Sharma1-2/+0
Remove support for RDMA-READ-WITH-INVALIDATE from ocrdma driver. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Host crash on destroying device resourcesMitesh Ahuja2-15/+10
1. Cleanup sequence in ocrdma_remove(). The device should be unregistered from IB stack before any device specific cleanup. 2. Always return success in the resource destroy path. In case destroy command returns error, IB stack will trigger cleanup again while closing the uverbs device causing kernel panic BUG_ON(). Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Report correct state in ibv_query_qpPadmanabh Ratnakar1-2/+4
Fix ocrdma_query_qp to refelect correct qp state based on FW. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Padmanabh Ratnakar <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Debugfs enhancments for ocrdma driverSelvin Xavier6-3/+183
1. Add statistics counters for error cqes. 2. Add file ("reset_stats") to reset rdma stats in Debugfs. Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Report correct count of interrupt vectors while registering ↵Devesh Sharma1-1/+1
ocrdma device Fix ocrdma_register_device to initialize correct number of interrupt vectors in device pointer. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Move PD resource management to driver.Mitesh Ahuja7-2/+298
Move PD allocation and deallocation from firmware to driver. At driver load time all the PDs will be requested from firmware and their management will be handled by driver to reduce mailbox commands overhead at runtime. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Increase the GID table size.Mitesh Ahuja1-1/+1
Increase the GID table size from 8 to 16 enteries. Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18RDMA/ocrdma: Add support for IB stack compliant stats in sysfs.Mitesh Ahuja3-1/+89
Add the following per-port sysfs traffic counters for RoCE: port_xmit_packets port_rcv_packets port_rcv_data port_xmit_data Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Selvin Xavier <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2015-02-18Merge tag 'firewire-updates' of ↵Linus Torvalds3-18/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull fireware updates from Stefan Richter: "IEEE 1394 subsystem updates: - Replace made-up, unallocated Vendor and Model values of firewire-core's Configuration ROM register root directory by properly registered IDs. (These IDs are visible to peer nodes on the bus and locally via sysfs, but they are not involved in protocol matching or driver matching, nor are they used in stock udev rules) - Remove some unneccessary code" * tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: use correct vendor/model IDs firewire: sbp2: remove redundant check for bidi command firewire: ohci: Remove unused function
2015-02-18Merge tag 'for-linus-20150216' of git://git.infradead.org/linux-mtdLinus Torvalds35-214/+1385
Pull MTD updates from Brian Norris: "NAND: - Add new Hisilicon NAND driver for Hip04 - Add default reboot handler, to ensure all outstanding erase transactions complete in time - jz4740: convert to use GPIO descriptor API - Atmel: add support for sama5d4 - Change default bitflip threshold to 75% of correction strength - Miscellaneous cleanups and bugfixes SPI NOR: - Freescale QuadSPI: - Fix a few probe() and remove() issues - Add a MAINTAINERS entry for this driver - Tweak transfer size to increase read performance - Add suspend/resume support - Add Micron quad I/O support - ST FSM SPI: miscellaneous fixes JFFS2: - gracefully handle corrupted 'offset' field found on flash Other: - bcm47xxpart: add tweaks for a few new devices - mtdconcat: set return lengths properly for mtd_write_oob() - map_ram: enable use with mtdoops - maps: support fallback to ROM/UBI for write-protected NOR flash" * tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd: (46 commits) mtd: hisilicon: && vs & typo jffs2: fix handling of corrupted summary length mtd: hisilicon: add device tree binding documentation mtd: hisilicon: add a new NAND controller driver for hisilicon hip04 Soc mtd: avoid registering reboot notifier twice mtd: concat: set the return lengths properly mtd: kconfig: replace PPC_OF with PPC mtd: denali: remove unnecessary stubs mtd: nand: remove redundant local variable MAINTAINERS: add maintainer entry for FREESCALE QUAD SPI driver mtd: fsl-quadspi: improve read performance by increase AHB transfer size mtd: fsl-quadspi: Remove unnecessary 'map_failed' label mtd: fsl-quadspi: Remove unneeded success/error messages mtd: fsl-quadspi: Fix the error paths mtd: nand: omap: drop condition with no effect mtd: nand: jz4740: Convert to GPIO descriptor API mtd: nand: Request strength instead of bytes for soft BCH mtd: nand: default bitflip-reporting threshold to 75% of correction strength mtd: atmel_nand: introduce a new compatible string for sama5d4 chip mtd: atmel_nand: return max bitflips in all sectors in pmecc_correction() ...
2015-02-18Merge branch 'cleanups'Trond Myklebust4779-99889/+176167
Merge cleanups requested by Linus. * cleanups: (3 commits) pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit nfs: Can call nfs_clear_page_commit() instead nfs: Provide and use helper functions for marking a page as unstable
2015-02-18pnfs: Refactor the *_layout_mark_request_commit to use ↵Tom Haynes4-75/+45
pnfs_layout_mark_request_commit The File Layout's filelayout_mark_request_commit() is almost the Flex File Layout's ff_layout_mark_request_commit(). And that can be reduced by calling into nfs_request_add_commit_list(). Signed-off-by: Tom Haynes <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2015-02-18sched/rt: Avoid obvious configuration failPeter Zijlstra1-3/+11
Setting the root group's cpu.rt_runtime_us to 0 is a bad thing; it would disallow the kernel creating RT tasks. One can of course still set it to 1, which will (likely) still wreck your kernel, but at least make it clear that setting it to 0 is not good. Collect both sanity checks into the one place while we're there. Suggested-by: Zefan Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/autogroup: Fix failure to set cpu.rt_runtime_usPeter Zijlstra2-5/+7
Because task_group() uses a cache of autogroup_task_group(), whose output depends on sched_class, switching classes can generate problems. In particular, when started as fair, the cache points to the autogroup, so when switching to RT the tg_rt_schedulable() test fails for every cpu.rt_{runtime,period}_us change because now the autogroup has tasks and no runtime. Furthermore, going back to the previous semantics of varying task_group() with sched_class has the down-side that the sched_debug output varies as well, even though the task really is in the autogroup. Therefore add an autogroup exception to tg_has_rt_tasks() -- such that both (all) task_group() usages in sched/core now have one. And remove all the remnants of the variable task_group() output. Reported-by: Zefan Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Stefan Bader <[email protected]> Fixes: 8323f26ce342 ("sched: Fix race in task_group()") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/dl: Do update_rq_clock() in yield_task_dl()Kirill Tkhai1-0/+1
update_curr_dl() needs actual rq clock. Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/1423040972.18770.10.camel@tkhai Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18dm snapshot: fix a possible invalid memory access on unloadMikulas Patocka1-2/+2
When the snapshot target is unloaded, snapshot_dtr() waits until pending_exceptions_count drops to zero. Then, it destroys the snapshot. Therefore, the function that decrements pending_exceptions_count should not touch the snapshot structure after the decrement. pending_complete() calls free_pending_exception(), which decrements pending_exceptions_count, and then it performs up_write(&s->lock) and it calls retry_origin_bios() which dereferences s->origin. These two memory accesses to the fields of the snapshot may touch the dm_snapshot struture after it is freed. This patch moves the call to free_pending_exception() to the end of pending_complete(), so that the snapshot will not be destroyed while pending_complete() is in progress. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Cc: [email protected]
2015-02-18dm: fix a race condition in dm_get_mdMikulas Patocka1-17/+10
The function dm_get_md finds a device mapper device with a given dev_t, increases the reference count and returns the pointer. dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the device, tests that the device doesn't have DMF_DELETING or DMF_FREEING flag, drops _minor_lock and returns pointer to the device. dm_get_md then calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag, otherwise it increments the reference count. There is a possible race condition - after dm_find_md exits and before dm_get is called, there are no locks held, so the device may disappear or DMF_FREEING flag may be set, which results in BUG. To fix this bug, we need to call dm_get while we hold _minor_lock. This patch renames dm_find_md to dm_get_md and changes it so that it calls dm_get while holding the lock. Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]> Cc: [email protected]
2015-02-18x86/irq: Check for valid irq descriptor in check_irq_vectors_for_cpu_disable()Joerg Roedel1-0/+3
When an interrupt is migrated away from a cpu it will stay in its vector_irq array until smp_irq_move_cleanup_interrupt succeeded. The cfg->move_in_progress flag is cleared already when the IPI was sent. When the interrupt is destroyed after migration its 'struct irq_desc' is freed and the vector_irq arrays are cleaned up. But since cfg->move_in_progress is already 0 the references at cpus before the last migration will not be cleared. So this would leave a reference to an already destroyed irq alive. When the cpu is taken down at this point, the check_irq_vectors_for_cpu_disable() function finds a valid irq number in the vector_irq array, but gets NULL for its descriptor and dereferences it, causing a kernel panic. This has been observed on real systems at shutdown. Add a check to check_irq_vectors_for_cpu_disable() for a valid 'struct irq_desc' to prevent this issue. Signed-off-by: Joerg Roedel <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Jiang Liu <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Jan Beulich <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Prarit Bhargava <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18x86/irq: Fix regression caused by commit b568b8601f05Jiang Liu1-0/+5
Commit b568b8601f05 ("Treat SCI interrupt as normal GSI interrupt") accidently removes support of legacy PIC interrupt when fixing a regression for Xen, which causes a nasty regression on HP/Compaq nc6000 where we fail to register the ACPI interrupt, and thus lose eg. thermal notifications leading a potentially overheated machine. So reintroduce support of legacy PIC based ACPI SCI interrupt. Reported-by: Ville Syrjälä <[email protected]> Tested-by: Ville Syrjälä <[email protected]> Signed-off-by: Jiang Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Pavel Machek <[email protected]> Cc: <[email protected]> # 3.19+ Cc: H. Peter Anvin <[email protected]> Cc: Len Brown <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Sander Eikelenboom <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18x86/spinlocks/paravirt: Fix memory corruption on unlockRaghavendra K T3-56/+64
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x 0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: Hinted missing TICKET_LOCK_INC for kick] [Oleg: Moved slowpath flag to head, ticket_equals idea] [PeterZ: Added detailed changelog] Suggested-by: Linus Torvalds <[email protected]> Reported-by: Sasha Levin <[email protected]> Tested-by: Sasha Levin <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Oleg Nesterov <[email protected]> Cc: Andrew Jones <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Dave Jones <[email protected]> Cc: David Vrabel <[email protected]> Cc: Fernando Luis Vázquez Cao <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Ulrich Obergfell <[email protected]> Cc: Waiman Long <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18ntp: Fixup adjtimex freq validation on 32-bit systemsJohn Stultz1-3/+7
Additional validation of adjtimex freq values to avoid potential multiplication overflows were added in commit 5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values) Unfortunately the patch used LONG_MAX/MIN instead of LLONG_MAX/MIN, which was fine on 64-bit systems, but being much smaller on 32-bit systems caused false positives resulting in most direct frequency adjustments to fail w/ EINVAL. ntpd only does direct frequency adjustments at startup, so the issue was not as easily observed there, but other time sync applications like ptpd and chrony were more effected by the bug. See bugs: https://bugzilla.kernel.org/show_bug.cgi?id=92481 https://bugzilla.redhat.com/show_bug.cgi?id=1188074 This patch changes the checks to use LLONG_MAX for clarity, and additionally the checks are disabled on 32-bit systems since LLONG_MAX/PPM_SCALE is always larger then the 32-bit long freq value, so multiplication overflows aren't possible there. Reported-by: Josh Boyer <[email protected]> Reported-by: George Joseph <[email protected]> Tested-by: George Joseph <[email protected]> Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> # v3.19+ Cc: Linus Torvalds <[email protected]> Cc: Sasha Levin <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Prettified the changelog and the comments a bit. ] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18Merge tag 'efi-urgent' of ↵Ingo Molnar6-213/+307
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: " - Leave a valid 64-bit IDT installed during runtime EFI mixed mode calls to avoid triple faults if an NMI/MCE is received. - Revert Ard's change to the libstub get_memory_map() that went into the v3.20 merge window because it causes boot regressions on Qemu and Xen. " Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Prevent recursion in io_schedule()NeilBrown2-22/+19
io_schedule() calls blk_flush_plug() which, depending on the contents of current->plug, can initiate arbitrary blk-io requests. Note that this contrasts with blk_schedule_flush_plug() which requires all non-trivial work to be handed off to a separate thread. This makes it possible for io_schedule() to recurse, and initiating block requests could possibly call mempool_alloc() which, in times of memory pressure, uses io_schedule(). Apart from any stack usage issues, io_schedule() will not behave correctly when called recursively as delayacct_blkio_start() does not allow for repeated calls. So: - use ->in_iowait to detect recursion. Set it earlier, and restore it to the old value. - move the call to "raw_rq" after the call to blk_flush_plug(). As this is some sort of per-cpu thing, we want some chance that we are on the right CPU - When io_schedule() is called recurively, use blk_schedule_flush_plug() which cannot further recurse. - as this makes io_schedule() a lot more complex and as io_schedule() must match io_schedule_timeout(), but all the changes in io_schedule_timeout() and make io_schedule a simple wrapper for that. Signed-off-by: NeilBrown <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Moved the now rudimentary io_schedule() into sched.h. ] Cc: Jens Axboe <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Tony Battersby <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/completion: Serialize completion_done() with complete()Oleg Nesterov1-2/+17
Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when reading completion state" was not correct, without lock/unlock the code like stop_machine_from_inactive_cpu() while (!completion_done()) cpu_relax(); can return before complete() finishes its spin_unlock() which writes to this memory. And spin_unlock_wait(). While at it, change try_wait_for_completion() to use READ_ONCE(). Reported-by: Paul E. McKenney <[email protected]> Reported-by: Davidlohr Bueso <[email protected]> Tested-by: Paul E. McKenney <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Added a comment with the barrier. ] Cc: Linus Torvalds <[email protected]> Cc: Nicholas Mc Guire <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Fix preempt_schedule_common() triggering tracing recursionFrederic Weisbecker1-1/+1
Since the function graph tracer needs to disable preemption, it might call preempt_schedule() after reenabling it if something triggered the need for rescheduling in between. Therefore we can't trace preempt_schedule() itself because we would face a function tracing recursion otherwise as the tracer is always called before PREEMPT_ACTIVE gets set to prevent that recursion. This is why preempt_schedule() is tagged as "notrace". But the same issue applies to every function called by preempt_schedule() before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is one such example. Unfortunately we forgot to tag it as notrace as well and as a result we are encountering tracing recursion since it got introduced by: a18b5d0181923 ("sched: Fix missing preemption opportunity") Let's fix that by applying the appropriate function tag to preempt_schedule_common(). Reported-by: Huang Ying <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Steven Rostedt <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched/dl: Prevent enqueue of a sleeping task in dl_task_timer()Kirill Tkhai1-0/+20
A deadline task may be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; Later the timer fires, but the task is still dequeued: dl_task_timer() enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */ Someone wakes it up: try_to_wake_up() enqueue_dl_entity() BUG_ON(on_dl_rq()) Patch fixes this problem, it prevents queueing !on_rq tasks on dl_rq. Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> [ Wrote comment. ] Cc: Juri Lelli <[email protected]> Fixes: 1019a359d3dc ("sched/deadline: Fix stale yield state") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Make dl_task_time() use task_rq_lock()Peter Zijlstra3-85/+79
Kirill reported that a dl task can be throttled and dequeued at the same time. This happens, when it becomes throttled in schedule(), which is called to go to sleep: current->state = TASK_INTERRUPTIBLE; schedule() deactivate_task() dequeue_task_dl() update_curr_dl() start_dl_timer() __dequeue_task_dl() prev->on_rq = 0; This invalidates the assumption from commit 0f397f2c90ce ("sched/dl: Fix race in dl_task_timer()"): "The only reason we don't strictly need ->pi_lock now is because we're guaranteed to have p->state == TASK_RUNNING here and are thus free of ttwu races". And therefore we have to use the full task_rq_lock() here. This further amends the fact that we forgot to update the rq lock loop for TASK_ON_RQ_MIGRATE, from commit cca26e8009d1 ("sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING state"). Reported-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Juri Lelli <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18sched: Clarify ordering between task_rq_lock() and move_queued_task()Peter Zijlstra1-0/+16
There was a wee bit of confusion around the exact ordering here; clarify things. Reported-by: Kirill Tkhai <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Paul E. McKenney <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2015-02-18Revert "efi/libstub: Call get_memory_map() to obtain map and desc sizes"Matt Fleming1-10/+6
This reverts commit d1a8d66b9177105e898e73716f97eb61842c457a. Ard reported a boot failure when running UEFI under Qemu and Xen and experimenting with various Tianocore build options, "As it turns out, when allocating room for the UEFI memory map using UEFI's AllocatePool (), it may result in two new memory map entries being created, for instance, when using Tianocore's preallocated region feature. For example, the following region 0x00005ead5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC] may be split like this 0x00005ead5000-0x00005eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC] 0x00005eae3000-0x00005eae4fff [Loader Data | | | | | |WB|WT|WC|UC] 0x00005eae5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC] if the preallocated Loader Data region was chosen to be right in the middle of the original free space. After patch d1a8d66b9177 ("efi/libstub: Call get_memory_map() to obtain map and desc sizes"), this is not being dealt with correctly anymore, as the existing logic to allocate room for a single additional entry has become insufficient." Mark requested to reinstate the old loop we had before commit d1a8d66b9177, which grows the memory map buffer until it's big enough to hold the EFI memory map. Acked-by: Ard Biesheuvel <[email protected]> Acked-by: Mark Rutland <[email protected]> Signed-off-by: Matt Fleming <[email protected]>
2015-02-18ARM: mm: Remove Kconfig symbol CACHE_PL310Paul Bolle2-8/+0
Commit 20e783e39e55 ("ARM: 8296/1: cache-l2x0: clean up aurora cache handling") removed the only user of the Kconfig symbol CACHE_PL310. Setting CACHE_PL310 is now pointless. Remove its Kconfig entry, and one select of this symbol. Signed-off-by: Paul Bolle <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]>
2015-02-18Merge tag 'arm-soc/for-3.20/dts' of http://github.com/broadcom/stblinux into ↵Arnd Bergmann2-2/+23
fixes This pull request contains the following Broadcom SoCs Device Tree changes: - Ray adds support for the Cygnus i2c Device Tree controller on Cygnus SoCs - Fixes to the BCM63138 dtsi file for the L2 cache controller properties * tag 'arm-soc/for-3.20/dts' of http://github.com/broadcom/stblinux: ARM: dts: add I2C device nodes for Broadcom Cygnus ARM: dts: BCM63xx: fix L2 cache properties
2015-02-18ARM: rockchip: force built-in regulator support for PMArnd Bergmann1-0/+1
The rockchips suspend/resume code requires regulators to work, and gives a compile-time error if they are not available: arch/arm/mach-rockchip/built-in.o: In function `rk3288_suspend_finish': :(.text+0x146): undefined reference to `regulator_suspend_finish' arch/arm/mach-rockchip/built-in.o: In function `rk3288_suspend_prepare': :(.text+0x18e): undefined reference to `regulator_suspend_prepare' To solve this, we now enable regulators whenever they are needed, which is what we do on a lot of other platforms as well. Signed-off-by: Arnd Bergmann <[email protected]>
2015-02-18ARM: mvebu: build armada375-smp code conditionallyArnd Bergmann1-1/+1
mvebu_armada375_smp_wa_init is only used on armada 375 but is defined for all mvebu machines. As it calls a function that is only provided sometimes, this can result in a link error: arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init': :(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa' To solve this, we can just change the existing #ifdef around the function to also check for Armada375 SMP platforms. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 305969fb6292 ("ARM: mvebu: use the common function for Armada 375 SMP workaround") Cc: Andrew Lunn <[email protected]> Cc: Jason Cooper <[email protected]> Cc: Gregory Clement <[email protected]>
2015-02-18ARM: sti: always enable RESET_CONTROLLERArnd Bergmann1-0/+1
A lot of the sti device drivers require reset controller support, but do not all have individual 'depends on RESET_CONTROLLER' statements. Using 'select' here once avoids a lot of build errors resulting from this. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Maxime Coquelin <[email protected]> Acked-by: Patrice Chotard <[email protected]> Cc: Srinivas Kandagatla <[email protected]>
2015-02-18ARM: rockchip: make rockchip_suspend_init conditionalArnd Bergmann1-0/+6
If CONFIG_PM_SLEEP is disabled, we get a build error for rockchips: arch/arm/mach-rockchip/built-in.o: In function `rockchip_dt_init': :(.init.text+0x1c): undefined reference to `rockchip_suspend_init' This adds an inline alternative for that case. Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Heiko Stuebner <[email protected]> Cc: [email protected]
2015-02-18ARM: ixp4xx: fix {in,out}s{bwl} data typesArnd Bergmann1-6/+13
Most platforms use void pointer arguments in these functions, but ixp4xx does not, which triggers lots of warnings in device drivers like: net/ethernet/8390/ne2k-pci.c: In function 'ne2k_pci_get_8390_hdr': net/ethernet/8390/ne2k-pci.c:503:3: warning: passing argument 2 of 'insw' from incompatible pointer type insw(NE_BASE + NE_DATAPORT, hdr, sizeof(struct e8390_pkt_hdr)>>1); ^ In file included from include/asm/io.h:214:0, from /git/arm-soc/include/linux/io.h:22, from /git/arm-soc/include/linux/pci.h:31, from net/ethernet/8390/ne2k-pci.c:48: mach-ixp4xx/include/mach/io.h:316:91: note: expected 'u16 *' but argument is of type 'struct e8390_pkt_hdr *' static inline void insw(u32 io_addr, u16 *vaddr, u32 count) Fixing the drivers seems hopeless, so this changes the ixp4xx code to do the same as the others to avoid the warnings. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Krzysztof Halasa <[email protected]> Cc: Imre Kaloz <[email protected]>
2015-02-18ARM: prima2: do not select SMP_ON_UPArnd Bergmann1-1/+0
The new Atlas7 platform implicitly selects 'CONFIG_SMP_ON_UP', which leads to problems if we enable building the platform without MMU, as that combination is not allowed and causes a link error: arch/arm/kernel/built-in.o: In function `c_show': :(.text+0x1872): undefined reference to `smp_on_up' :(.text+0x1876): undefined reference to `smp_on_up' arch/arm/kernel/built-in.o: In function `arch_irq_work_raise': :(.text+0x3d48): undefined reference to `smp_on_up' :(.text+0x3d4c): undefined reference to `smp_on_up' arch/arm/kernel/built-in.o: In function `smp_setup_processor_id': :(.init.text+0x180): undefined reference to `smp_on_up' This removes the 'select' statement. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 4cba058526a7 ("ARM: sirf: add Atlas7 machine support") Acked-by: Barry Song <[email protected]> Cc: Zhiwu Song <[email protected]>