Age | Commit message (Collapse) | Author | Files | Lines |
|
Re-enable the on-demand paging capability query through the
extended query device verb.
Signed-off-by: Haggai Eran <[email protected]>
Reviewed-by: Yann Droneaud <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add on-demand paging capabilities reporting to the extended query device verb.
Yann Droneaud writes:
Note: as offsetof() is used to retrieve the size of the lower chunk
of the response, beware that it only works if the upper chunk
is right after, without any implicit padding. And, as the size of
the latter chunk is added to the base size, implicit padding at the
end of the structure is not taken in account. Both point must be
taken in account when extending the uverbs functionalities.
Signed-off-by: Haggai Eran <[email protected]>
Reviewed-by: Yann Droneaud <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to copy
capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.
Following the discussion about this patch [1], the code now validates
the command's comp_mask is zero, returning -EINVAL for unknown values,
in order to allow extending the verb in the future.
The verb also checks the user-space provided response buffer size and
only fills in capabilities that will fit in the buffer. In attempt to
follow the spirit of presentation [2] by Tzahi Oved that was presented
during OpenFabrics Alliance International Developer Workshop 2013, the
comp_mask bits will only describe which fields are valid. Furthermore,
fields that can simply be cleared when they are not supported, do not
require a comp_mask bit at all. The verb returns a response_length
field containing the actual number of bytes written by the kernel, so
that a newer version running on an older kernel can tell which fields
were actually returned.
[1] [PATCH v1 0/5] IB/core: extended query device caps cleanup for v3.19
http://thread.gmane.org/gmane.linux.kernel.api/7889/
[2] https://www.openfabrics.org/images/docs/2013_Dev_Workshop/Tues_0423/2013_Workshop_Tues_0830_Tzahi_Oved-verbs_extensions_ofa_2013-tzahio.pdf
Signed-off-by: Eli Cohen <[email protected]>
Signed-off-by: Haggai Eran <[email protected]>
Reviewed-by: Yann Droneaud <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
In c4iw_wait_for_reply(), if a FW6_MSG WR reply is not received after
C4IW_WR_TO seconds, fail the WR operation and mark the device as fatally
dead. Further, if the device is marked fatally dead, then fail the WR
wait immediately.
Also change the timeout to 60 seconds.
Signed-off-by: Steve Wise <[email protected]>
Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
The ->sgid_tbl[] array has OCRDMA_MAX_SGID number of elements so this
test is off by one. ->sgid_tbl is allocated in ocrdma_alloc_resources().
Signed-off-by: Dan Carpenter <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
In the expressions idx/32 and idx%32, both idx and 32 have signed
type, and unfortunately the C standard prescribes rounding to 0, so
unless gcc can prove that idx is non-negative, these cannot be
implemented as simple shift respectively mask operations. Help gcc by
changing the type of idx to unsigned - this cuts another few
instructions from the generated code.
Signed-off-by: Rasmus Villemoes <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
gcc emits a surprising amount of code in order to flip a bit. One
would think that a single instruction is enough.
$ scripts/bloat-o-meter /tmp/ocrdma_verbs.o drivers/infiniband/hw/ocrdma/ocrdma_verbs.o
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-142 (-142)
function old new delta
ocrdma_post_srq_recv 498 460 -38
ocrdma_poll_cq 2010 1962 -48
ocrdma_discard_cqes 495 439 -56
All three calls of ocrdma_srq_toggle_bit happen within spinlocks, so
saving a few useless instructions might be worthwhile.
Signed-off-by: Rasmus Villemoes <[email protected]>
Acked-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
For the AH that describs a VLAN interface details, vlan present bit
needs to be set during posting a WQE. This patch adds the code to
allow it happening.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Use get_ocrdma_dev(ocrdma_qp->ibqp.device) function to access ocrdma
device pointer.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add support for interrupt moderation for ocrdma device. Thresholds
for high interrupt rates are static values derived based on experimental
results.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Check for return value for ocrdma_resolve_dmac while setting AV params.
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
If the SQ and RQ of the QP in error state uses separate CQs, traverse
the list of QPs using each CQs and invoke the buddy CQ handler for
both SQ and RQ.
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Remove support for RDMA-READ-WITH-INVALIDATE from ocrdma driver.
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
1. Cleanup sequence in ocrdma_remove(). The device should be
unregistered from IB stack before any device specific cleanup.
2. Always return success in the resource destroy path. In case destroy
command returns error, IB stack will trigger cleanup again while
closing the uverbs device causing kernel panic BUG_ON().
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Fix ocrdma_query_qp to refelect correct qp state based on FW.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Padmanabh Ratnakar <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
1. Add statistics counters for error cqes.
2. Add file ("reset_stats") to reset rdma stats in Debugfs.
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
ocrdma device
Fix ocrdma_register_device to initialize correct number of interrupt
vectors in device pointer.
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Move PD allocation and deallocation from firmware to driver. At
driver load time all the PDs will be requested from firmware and their
management will be handled by driver to reduce mailbox commands
overhead at runtime.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Increase the GID table size from 8 to 16 enteries.
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
Add the following per-port sysfs traffic counters for RoCE:
port_xmit_packets
port_rcv_packets
port_rcv_data
port_xmit_data
Signed-off-by: Mitesh Ahuja <[email protected]>
Signed-off-by: Devesh Sharma <[email protected]>
Signed-off-by: Selvin Xavier <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull fireware updates from Stefan Richter:
"IEEE 1394 subsystem updates:
- Replace made-up, unallocated Vendor and Model values of
firewire-core's Configuration ROM register root directory by
properly registered IDs. (These IDs are visible to peer nodes on
the bus and locally via sysfs, but they are not involved in
protocol matching or driver matching, nor are they used in stock
udev rules)
- Remove some unneccessary code"
* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: use correct vendor/model IDs
firewire: sbp2: remove redundant check for bidi command
firewire: ohci: Remove unused function
|
|
Pull MTD updates from Brian Norris:
"NAND:
- Add new Hisilicon NAND driver for Hip04
- Add default reboot handler, to ensure all outstanding erase
transactions complete in time
- jz4740: convert to use GPIO descriptor API
- Atmel: add support for sama5d4
- Change default bitflip threshold to 75% of correction strength
- Miscellaneous cleanups and bugfixes
SPI NOR:
- Freescale QuadSPI:
- Fix a few probe() and remove() issues
- Add a MAINTAINERS entry for this driver
- Tweak transfer size to increase read performance
- Add suspend/resume support
- Add Micron quad I/O support
- ST FSM SPI: miscellaneous fixes
JFFS2:
- gracefully handle corrupted 'offset' field found on flash
Other:
- bcm47xxpart: add tweaks for a few new devices
- mtdconcat: set return lengths properly for mtd_write_oob()
- map_ram: enable use with mtdoops
- maps: support fallback to ROM/UBI for write-protected NOR flash"
* tag 'for-linus-20150216' of git://git.infradead.org/linux-mtd: (46 commits)
mtd: hisilicon: && vs & typo
jffs2: fix handling of corrupted summary length
mtd: hisilicon: add device tree binding documentation
mtd: hisilicon: add a new NAND controller driver for hisilicon hip04 Soc
mtd: avoid registering reboot notifier twice
mtd: concat: set the return lengths properly
mtd: kconfig: replace PPC_OF with PPC
mtd: denali: remove unnecessary stubs
mtd: nand: remove redundant local variable
MAINTAINERS: add maintainer entry for FREESCALE QUAD SPI driver
mtd: fsl-quadspi: improve read performance by increase AHB transfer size
mtd: fsl-quadspi: Remove unnecessary 'map_failed' label
mtd: fsl-quadspi: Remove unneeded success/error messages
mtd: fsl-quadspi: Fix the error paths
mtd: nand: omap: drop condition with no effect
mtd: nand: jz4740: Convert to GPIO descriptor API
mtd: nand: Request strength instead of bytes for soft BCH
mtd: nand: default bitflip-reporting threshold to 75% of correction strength
mtd: atmel_nand: introduce a new compatible string for sama5d4 chip
mtd: atmel_nand: return max bitflips in all sectors in pmecc_correction()
...
|
|
Merge cleanups requested by Linus.
* cleanups: (3 commits)
pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
nfs: Can call nfs_clear_page_commit() instead
nfs: Provide and use helper functions for marking a page as unstable
|
|
pnfs_layout_mark_request_commit
The File Layout's filelayout_mark_request_commit() is almost the
Flex File Layout's ff_layout_mark_request_commit(). And that can
be reduced by calling into nfs_request_add_commit_list().
Signed-off-by: Tom Haynes <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Setting the root group's cpu.rt_runtime_us to 0 is a bad thing; it
would disallow the kernel creating RT tasks.
One can of course still set it to 1, which will (likely) still wreck
your kernel, but at least make it clear that setting it to 0 is not
good.
Collect both sanity checks into the one place while we're there.
Suggested-by: Zefan Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Because task_group() uses a cache of autogroup_task_group(), whose
output depends on sched_class, switching classes can generate
problems.
In particular, when started as fair, the cache points to the
autogroup, so when switching to RT the tg_rt_schedulable() test fails
for every cpu.rt_{runtime,period}_us change because now the autogroup
has tasks and no runtime.
Furthermore, going back to the previous semantics of varying
task_group() with sched_class has the down-side that the sched_debug
output varies as well, even though the task really is in the
autogroup.
Therefore add an autogroup exception to tg_has_rt_tasks() -- such that
both (all) task_group() usages in sched/core now have one. And remove
all the remnants of the variable task_group() output.
Reported-by: Zefan Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mike Galbraith <[email protected]>
Cc: Stefan Bader <[email protected]>
Fixes: 8323f26ce342 ("sched: Fix race in task_group()")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
update_curr_dl() needs actual rq clock.
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/1423040972.18770.10.camel@tkhai
Signed-off-by: Ingo Molnar <[email protected]>
|
|
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero. Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.
pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences s->origin. These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.
This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.
dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.
There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.
To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
Signed-off-by: Mikulas Patocka <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Cc: [email protected]
|
|
When an interrupt is migrated away from a cpu it will stay
in its vector_irq array until smp_irq_move_cleanup_interrupt
succeeded. The cfg->move_in_progress flag is cleared already
when the IPI was sent.
When the interrupt is destroyed after migration its 'struct
irq_desc' is freed and the vector_irq arrays are cleaned up.
But since cfg->move_in_progress is already 0 the references
at cpus before the last migration will not be cleared. So
this would leave a reference to an already destroyed irq
alive.
When the cpu is taken down at this point, the
check_irq_vectors_for_cpu_disable() function finds a valid irq
number in the vector_irq array, but gets NULL for its
descriptor and dereferences it, causing a kernel panic.
This has been observed on real systems at shutdown. Add a
check to check_irq_vectors_for_cpu_disable() for a valid
'struct irq_desc' to prevent this issue.
Signed-off-by: Joerg Roedel <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Jiang Liu <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: K. Y. Srinivasan <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit b568b8601f05 ("Treat SCI interrupt as normal GSI interrupt")
accidently removes support of legacy PIC interrupt when fixing a
regression for Xen, which causes a nasty regression on HP/Compaq
nc6000 where we fail to register the ACPI interrupt, and thus
lose eg. thermal notifications leading a potentially overheated
machine.
So reintroduce support of legacy PIC based ACPI SCI interrupt.
Reported-by: Ville Syrjälä <[email protected]>
Tested-by: Ville Syrjälä <[email protected]>
Signed-off-by: Jiang Liu <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Pavel Machek <[email protected]>
Cc: <[email protected]> # 3.19+
Cc: H. Peter Anvin <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Sander Eikelenboom <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);
/* add_smp() is a full mb() */
if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);
which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more. Exactly
because a fast-path lock might come in, and release the whole data
structure.
Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.
So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.
Result:
setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
benchmark overcommit %improve
kernbench 1x -0.13
kernbench 2x 0.02
dbench 1x -1.77
dbench 2x -0.63
[Jeremy: Hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moved slowpath flag to head, ticket_equals idea]
[PeterZ: Added detailed changelog]
Suggested-by: Linus Torvalds <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Andrew Jones <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Boris Ostrovsky <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Vrabel <[email protected]>
Cc: Fernando Luis Vázquez Cao <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Ulrich Obergfell <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Additional validation of adjtimex freq values to avoid
potential multiplication overflows were added in commit
5e5aeb4367b (time: adjtimex: Validate the ADJ_FREQUENCY values)
Unfortunately the patch used LONG_MAX/MIN instead of
LLONG_MAX/MIN, which was fine on 64-bit systems, but being
much smaller on 32-bit systems caused false positives
resulting in most direct frequency adjustments to fail w/
EINVAL.
ntpd only does direct frequency adjustments at startup, so
the issue was not as easily observed there, but other time
sync applications like ptpd and chrony were more effected by
the bug.
See bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=92481
https://bugzilla.redhat.com/show_bug.cgi?id=1188074
This patch changes the checks to use LLONG_MAX for
clarity, and additionally the checks are disabled
on 32-bit systems since LLONG_MAX/PPM_SCALE is always
larger then the 32-bit long freq value, so multiplication
overflows aren't possible there.
Reported-by: Josh Boyer <[email protected]>
Reported-by: George Joseph <[email protected]>
Tested-by: George Joseph <[email protected]>
Signed-off-by: John Stultz <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: <[email protected]> # v3.19+
Cc: Linus Torvalds <[email protected]>
Cc: Sasha Levin <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Prettified the changelog and the comments a bit. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
" - Leave a valid 64-bit IDT installed during runtime EFI mixed mode
calls to avoid triple faults if an NMI/MCE is received.
- Revert Ard's change to the libstub get_memory_map() that went into
the v3.20 merge window because it causes boot regressions on Qemu and
Xen. "
Signed-off-by: Ingo Molnar <[email protected]>
|
|
io_schedule() calls blk_flush_plug() which, depending on the
contents of current->plug, can initiate arbitrary blk-io requests.
Note that this contrasts with blk_schedule_flush_plug() which requires
all non-trivial work to be handed off to a separate thread.
This makes it possible for io_schedule() to recurse, and initiating
block requests could possibly call mempool_alloc() which, in times of
memory pressure, uses io_schedule().
Apart from any stack usage issues, io_schedule() will not behave
correctly when called recursively as delayacct_blkio_start() does
not allow for repeated calls.
So:
- use ->in_iowait to detect recursion. Set it earlier, and restore
it to the old value.
- move the call to "raw_rq" after the call to blk_flush_plug().
As this is some sort of per-cpu thing, we want some chance that
we are on the right CPU
- When io_schedule() is called recurively, use blk_schedule_flush_plug()
which cannot further recurse.
- as this makes io_schedule() a lot more complex and as io_schedule()
must match io_schedule_timeout(), but all the changes in io_schedule_timeout()
and make io_schedule a simple wrapper for that.
Signed-off-by: NeilBrown <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[ Moved the now rudimentary io_schedule() into sched.h. ]
Cc: Jens Axboe <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Tony Battersby <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Commit de30ec47302c "Remove unnecessary ->wait.lock serialization when
reading completion state" was not correct, without lock/unlock the code
like stop_machine_from_inactive_cpu()
while (!completion_done())
cpu_relax();
can return before complete() finishes its spin_unlock() which writes to
this memory. And spin_unlock_wait().
While at it, change try_wait_for_completion() to use READ_ONCE().
Reported-by: Paul E. McKenney <[email protected]>
Reported-by: Davidlohr Bueso <[email protected]>
Tested-by: Paul E. McKenney <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[ Added a comment with the barrier. ]
Cc: Linus Torvalds <[email protected]>
Cc: Nicholas Mc Guire <[email protected]>
Cc: [email protected]
Cc: [email protected]
Fixes: de30ec47302c ("sched/completion: Remove unnecessary ->wait.lock serialization when reading completion state")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since the function graph tracer needs to disable preemption, it might
call preempt_schedule() after reenabling it if something triggered the
need for rescheduling in between.
Therefore we can't trace preempt_schedule() itself because we would
face a function tracing recursion otherwise as the tracer is always
called before PREEMPT_ACTIVE gets set to prevent that recursion. This is
why preempt_schedule() is tagged as "notrace".
But the same issue applies to every function called by preempt_schedule()
before PREEMPT_ACTIVE is actually set. And preempt_schedule_common() is
one such example. Unfortunately we forgot to tag it as notrace as well
and as a result we are encountering tracing recursion since it got
introduced by:
a18b5d0181923 ("sched: Fix missing preemption opportunity")
Let's fix that by applying the appropriate function tag to
preempt_schedule_common().
Reported-by: Huang Ying <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Linus Torvalds <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
A deadline task may be throttled and dequeued at the same time.
This happens, when it becomes throttled in schedule(), which
is called to go to sleep:
current->state = TASK_INTERRUPTIBLE;
schedule()
deactivate_task()
dequeue_task_dl()
update_curr_dl()
start_dl_timer()
__dequeue_task_dl()
prev->on_rq = 0;
Later the timer fires, but the task is still dequeued:
dl_task_timer()
enqueue_task_dl() /* queues on dl_rq; on_rq remains 0 */
Someone wakes it up:
try_to_wake_up()
enqueue_dl_entity()
BUG_ON(on_dl_rq())
Patch fixes this problem, it prevents queueing !on_rq tasks
on dl_rq.
Reported-by: Fengguang Wu <[email protected]>
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[ Wrote comment. ]
Cc: Juri Lelli <[email protected]>
Fixes: 1019a359d3dc ("sched/deadline: Fix stale yield state")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Kirill reported that a dl task can be throttled and dequeued at the
same time. This happens, when it becomes throttled in schedule(),
which is called to go to sleep:
current->state = TASK_INTERRUPTIBLE;
schedule()
deactivate_task()
dequeue_task_dl()
update_curr_dl()
start_dl_timer()
__dequeue_task_dl()
prev->on_rq = 0;
This invalidates the assumption from commit 0f397f2c90ce ("sched/dl:
Fix race in dl_task_timer()"):
"The only reason we don't strictly need ->pi_lock now is because
we're guaranteed to have p->state == TASK_RUNNING here and are
thus free of ttwu races".
And therefore we have to use the full task_rq_lock() here.
This further amends the fact that we forgot to update the rq lock loop
for TASK_ON_RQ_MIGRATE, from commit cca26e8009d1 ("sched: Teach
scheduler to understand TASK_ON_RQ_MIGRATING state").
Reported-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Juri Lelli <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
There was a wee bit of confusion around the exact ordering here;
clarify things.
Reported-by: Kirill Tkhai <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This reverts commit d1a8d66b9177105e898e73716f97eb61842c457a.
Ard reported a boot failure when running UEFI under Qemu and Xen and
experimenting with various Tianocore build options,
"As it turns out, when allocating room for the UEFI memory map using
UEFI's AllocatePool (), it may result in two new memory map entries
being created, for instance, when using Tianocore's preallocated region
feature. For example, the following region
0x00005ead5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC]
may be split like this
0x00005ead5000-0x00005eae2fff [Conventional Memory| | | | | |WB|WT|WC|UC]
0x00005eae3000-0x00005eae4fff [Loader Data | | | | | |WB|WT|WC|UC]
0x00005eae5000-0x00005ebfffff [Conventional Memory| | | | | |WB|WT|WC|UC]
if the preallocated Loader Data region was chosen to be right in the
middle of the original free space.
After patch d1a8d66b9177 ("efi/libstub: Call get_memory_map() to
obtain map and desc sizes"), this is not being dealt with correctly
anymore, as the existing logic to allocate room for a single additional
entry has become insufficient."
Mark requested to reinstate the old loop we had before commit
d1a8d66b9177, which grows the memory map buffer until it's big enough to
hold the EFI memory map.
Acked-by: Ard Biesheuvel <[email protected]>
Acked-by: Mark Rutland <[email protected]>
Signed-off-by: Matt Fleming <[email protected]>
|
|
Commit 20e783e39e55 ("ARM: 8296/1: cache-l2x0: clean up aurora cache
handling") removed the only user of the Kconfig symbol CACHE_PL310.
Setting CACHE_PL310 is now pointless. Remove its Kconfig entry, and one
select of this symbol.
Signed-off-by: Paul Bolle <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
fixes
This pull request contains the following Broadcom SoCs Device Tree changes:
- Ray adds support for the Cygnus i2c Device Tree controller on Cygnus SoCs
- Fixes to the BCM63138 dtsi file for the L2 cache controller properties
* tag 'arm-soc/for-3.20/dts' of http://github.com/broadcom/stblinux:
ARM: dts: add I2C device nodes for Broadcom Cygnus
ARM: dts: BCM63xx: fix L2 cache properties
|
|
The rockchips suspend/resume code requires regulators to work,
and gives a compile-time error if they are not available:
arch/arm/mach-rockchip/built-in.o: In function `rk3288_suspend_finish':
:(.text+0x146): undefined reference to `regulator_suspend_finish'
arch/arm/mach-rockchip/built-in.o: In function `rk3288_suspend_prepare':
:(.text+0x18e): undefined reference to `regulator_suspend_prepare'
To solve this, we now enable regulators whenever they are needed,
which is what we do on a lot of other platforms as well.
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
mvebu_armada375_smp_wa_init is only used on armada 375 but is defined
for all mvebu machines. As it calls a function that is only provided
sometimes, this can result in a link error:
arch/arm/mach-mvebu/built-in.o: In function `mvebu_armada375_smp_wa_init':
:(.text+0x228): undefined reference to `mvebu_setup_boot_addr_wa'
To solve this, we can just change the existing #ifdef around the
function to also check for Armada375 SMP platforms.
Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: 305969fb6292 ("ARM: mvebu: use the common function for Armada 375 SMP workaround")
Cc: Andrew Lunn <[email protected]>
Cc: Jason Cooper <[email protected]>
Cc: Gregory Clement <[email protected]>
|
|
A lot of the sti device drivers require reset controller support,
but do not all have individual 'depends on RESET_CONTROLLER'
statements. Using 'select' here once avoids a lot of build errors
resulting from this.
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Maxime Coquelin <[email protected]>
Acked-by: Patrice Chotard <[email protected]>
Cc: Srinivas Kandagatla <[email protected]>
|
|
If CONFIG_PM_SLEEP is disabled, we get a build error for rockchips:
arch/arm/mach-rockchip/built-in.o: In function `rockchip_dt_init':
:(.init.text+0x1c): undefined reference to `rockchip_suspend_init'
This adds an inline alternative for that case.
Signed-off-by: Arnd Bergmann <[email protected]>
Reviewed-by: Heiko Stuebner <[email protected]>
Cc: [email protected]
|
|
Most platforms use void pointer arguments in these functions, but
ixp4xx does not, which triggers lots of warnings in device drivers like:
net/ethernet/8390/ne2k-pci.c: In function 'ne2k_pci_get_8390_hdr':
net/ethernet/8390/ne2k-pci.c:503:3: warning: passing argument 2 of 'insw' from incompatible pointer type
insw(NE_BASE + NE_DATAPORT, hdr, sizeof(struct e8390_pkt_hdr)>>1);
^
In file included from include/asm/io.h:214:0,
from /git/arm-soc/include/linux/io.h:22,
from /git/arm-soc/include/linux/pci.h:31,
from net/ethernet/8390/ne2k-pci.c:48:
mach-ixp4xx/include/mach/io.h:316:91: note: expected 'u16 *' but argument is of type 'struct e8390_pkt_hdr *'
static inline void insw(u32 io_addr, u16 *vaddr, u32 count)
Fixing the drivers seems hopeless, so this changes the ixp4xx code
to do the same as the others to avoid the warnings.
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Krzysztof Halasa <[email protected]>
Cc: Imre Kaloz <[email protected]>
|
|
The new Atlas7 platform implicitly selects 'CONFIG_SMP_ON_UP',
which leads to problems if we enable building the platform without
MMU, as that combination is not allowed and causes a link error:
arch/arm/kernel/built-in.o: In function `c_show':
:(.text+0x1872): undefined reference to `smp_on_up'
:(.text+0x1876): undefined reference to `smp_on_up'
arch/arm/kernel/built-in.o: In function `arch_irq_work_raise':
:(.text+0x3d48): undefined reference to `smp_on_up'
:(.text+0x3d4c): undefined reference to `smp_on_up'
arch/arm/kernel/built-in.o: In function `smp_setup_processor_id':
:(.init.text+0x180): undefined reference to `smp_on_up'
This removes the 'select' statement.
Signed-off-by: Arnd Bergmann <[email protected]>
Fixes: 4cba058526a7 ("ARM: sirf: add Atlas7 machine support")
Acked-by: Barry Song <[email protected]>
Cc: Zhiwu Song <[email protected]>
|