Age | Commit message (Collapse) | Author | Files | Lines |
|
Bisection by Amadeusz Sławiński implicates this commit leading to bad
page state issues after VM shutdown, likely due to unbalanced page
references. The original commit was intended only as a performance
improvement, therefore revert for offline rework.
Link: https://lkml.org/lkml/2018/6/2/97
Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
Cc: Jason Cai (Xiang Feng) <[email protected]>
Reported-by: Amadeusz Sławiński <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-06-02
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
order to fix offsets on 32 bit archs.
This will have a minor merge conflict with net-next which has the
__u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
location. Resolution is to use the gpl_compatible member.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
case of struct bpf_map_info but also struct bpf_prog_info. In net-next
commit b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
added a bitfield into it to expose some flags related to programs. Thus,
add an unnamed __u32 bitfield for both so that alignment keeps the same
in both 32 and 64 bit cases, and can be naturally extended from there
as in b85fab0e67b.
Before:
# file test.o
test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
# pahole test.o
struct bpf_map_info {
__u32 type; /* 0 4 */
__u32 id; /* 4 4 */
__u32 key_size; /* 8 4 */
__u32 value_size; /* 12 4 */
__u32 max_entries; /* 16 4 */
__u32 map_flags; /* 20 4 */
char name[16]; /* 24 16 */
__u32 ifindex; /* 40 4 */
__u64 netns_dev; /* 44 8 */
__u64 netns_ino; /* 52 8 */
/* size: 64, cachelines: 1, members: 10 */
/* padding: 4 */
};
After (same as on 64 bit):
# file test.o
test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
# pahole test.o
struct bpf_map_info {
__u32 type; /* 0 4 */
__u32 id; /* 4 4 */
__u32 key_size; /* 8 4 */
__u32 value_size; /* 12 4 */
__u32 max_entries; /* 16 4 */
__u32 map_flags; /* 20 4 */
char name[16]; /* 24 16 */
__u32 ifindex; /* 40 4 */
/* XXX 4 bytes hole, try to pack */
__u64 netns_dev; /* 48 8 */
__u64 netns_ino; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
/* size: 64, cachelines: 1, members: 10 */
/* sum members: 60, holes: 1, sum holes: 4 */
};
Reported-by: Dmitry V. Levin <[email protected]>
Reported-by: Eugene Syromiatnikov <[email protected]>
Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
the modem needs FLAG_SEND_ZLP to properly work, otherwise the cdc
mbim data interface won't be anymore responsive.
Signed-off-by: Daniele Palmas <[email protected]>
Acked-by: Bjørn Mork <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Nicolas Dichtel says:
====================
ip[6] tunnels: fix mtu calculations
The first patch restores the possibility to bind an ip4 tunnel to an
interface whith a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.
v2: remove the 0xfff8 in ip_tunnel_newlink()
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument
dev_set_mtu() doesn't allow to set a mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to be there for ages, I don't know why this value was used.
With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.
CC: Petr Machata <[email protected]>
CC: Ido Schimmel <[email protected]>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2018-05-31
1) Avoid possible overflow of the offset variable
in _decode_session6(), this fixes an infinite
lookp there. From Eric Dumazet.
2) We may use an error pointer in the error path of
xfrm_bundle_create(). Fix this by returning this
pointer directly to the caller.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds support for the BCM5389 switch connected through MDIO.
Signed-off-by: Damien Thébault <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
pblk allocates line bitmaps within the line lock unnecessarily. In order
to take pressure out of the fast patch, allocate line bitmaps outside
of this lock and refactor accordingly.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Unless we kick the writer directly when setting a new flush point, the
user risks having to wait for up to one second (the default timeout for
the write thread to be kicked) for the IO to complete.
Signed-off-by: Hans Holmberg <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When switching between different lun configurations, there is no
guarantee that all lines that contain closed/open chunks have some
valid data to recover.
Check that the smeta chunk has been written to instead. Also
skip bad lines (that does not have enough good chunks).
Signed-off-by: Hans Holmberg <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the read path, pblk gets a reference to the incoming bio and puts it
after ending the bio. Though this behavior is correct, it is unnecessary
since pblk is the one putting the bio, therefore, it cannot disappear
underneath it.
Removing this reference, allows to clean up rqd->bio and avoids pointer
bouncing for the different read paths. Now, the incoming bio always
resides in the read context and pblk's internal bios (if any) reside in
rqd->bio.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In some cases, users can want set write buffer size manually, e.g. to
adjust it to specific workload. This patch provides the possibility
to set write buffer size via module parameter feature.
Signed-off-by: Marcin Dziegielewski <[email protected]>
Signed-off-by: Igor Konopko <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When error occurs during bio_add_page on partial read path, pblk
tries to free pages twice.
Signed-off-by: Igor Konopko <[email protected]>
Signed-off-by: Marcin Dziegielewski <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Currently in case of error caused by bio_pc_add_page in
pblk_bio_add_pages two issues occur when calling from
pblk_rb_read_to_bio(). First one is in pblk_bio_free_pages, since we
are trying to free pages not allocated from our mempool. Second one
is the warn from dma_pool_free, that we are trying to free NULL
pointer dma.
This commit fix both issues.
Signed-off-by: Igor Konopko <[email protected]>
Signed-off-by: Marcin Dziegielewski <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Smeta write errors were previously ignored. Skip these
lines instead and throw them back on the free
list, so the chunks will go through a reset cycle
before we attempt to use the line again.
Signed-off-by: Hans Holmberg <[email protected]>
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Write failures should not happen under normal circumstances,
so in order to bring the chunk back into a known state as soon
as possible, evacuate all the valid data out of the line and let the
fw judge if the block can be written to in the next reset cycle.
Do this by introducing a new gc list for lines with failed writes,
and ensure that the rate limiter allocates a small portion of
the write bandwidth to get the job done.
The lba list is saved in memory for use during gc as we
cannot gurantee that the emeta data is readable if a write
error occurred.
Signed-off-by: Hans Holmberg <[email protected]>
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
The write error recovery path is incomplete, so rework
the write error recovery handling to do resubmits directly
from the write buffer.
When a write error occurs, the remaining sectors in the chunk are
mapped out and invalidated and the request inserted in a resubmit list.
The writer thread checks if there are any requests to resubmit,
scans and invalidates any lbas that have been overwritten by later
writes and resubmits the failed entries.
Signed-off-by: Hans Holmberg <[email protected]>
Reviewed-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
(resend for properly queueing in patchwork)
kcm_clone() creates kernel socket, which does not take net counter.
Thus, the net may die before the socket is completely destructed,
i.e. kcm_exit_net() is executed before kcm_done().
Reported-by: [email protected]
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Remove dead function for manual sync. I/O
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
If the namespace is unregistered before the LightNVM target is removed
(e.g., on hot unplug) it is too late for the target to store any metadata
on the device - any attempt to write to the device will fail. In this
case, pass on a "gracefull teardown" flag to the target to let it know
when this happens.
In the case of pblk, we pad the open line (close all open chunks) to
improve data retention. In the event of an ungraceful shutdown, avoid
this part and just clean up.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Do the check for the chunk state after making sure that the chunk type
is supported.
Fixes: 32ef9412c114 ("lightnvm: pblk: implement get log report chunk")
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove unnecessary argument on pblk_line_free()
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Call nvm_submit_io directly and remove an unnecessary indirection on the
read path.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Return a meaningful error when the sanity vector I/O check fails.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
When cleaning up buffer entries as we wrap up, their state should be
"completed". If any of the entries is in "submitted" state, it means
that something bad has happened. Trigger a warning immediately instead of
waiting for the state flag to eventually be updated, thus hiding the
issue.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the event of a mismatch between the read LBA and the metadata pointer
reported by the device, improve the error message to be able to detect
the offending physical address (PPA) mapped to the corrupted LBA.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Check that the lba stored in the LBA metadata is correct in the GC path
too. This requires a new helper function to check random reads in the
vector read.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Bad blocks can grow at runtime. Check that the number of valid blocks in
a line are within the sanity threshold before allocating the line for
new writes.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
In the event of a line failing to allocate, fail gracefully and stop the
pipeline to avoid more write failing in the same place.
Signed-off-by: Javier González <[email protected]>
Signed-off-by: Matias Bjørling <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull NVMe changes from Christoph:
"Below is another set of NVMe updates for 4.18. Besides the usual bug
fixes this includes more feature completness in terms of AEN and log
page handling on the target."
* 'nvme-4.18' of git://git.infradead.org/nvme:
nvme: use the changed namespaces list log to clear ns data changed AENs
nvme: mark nvme_queue_scan static
nvme: submit AEN event configuration on startup
nvmet: mask pending AENs
nvmet: add AEN configuration support
nvmet: implement the changed namespaces log
nvmet: split log page implementation
nvmet: add a new nvmet_zero_sgl helper
nvme.h: add AEN configuration symbols
nvme.h: add the changed namespace list log
nvme.h: untangle AEN notice definitions
nvmet: fix error return code in nvmet_file_ns_enable()
nvmet: fix a typo in nvmet_file_ns_enable()
nvme-fabrics: allow internal passthrough command on deleting controllers
nvme-loop: add support for multiple ports
nvme-pci: simplify __nvme_submit_cmd
nvme-pci: Rate limit the nvme timeout warnings
nvme: allow duplicate controller if prior controller being deleted
|
|
There is almost no shared logic, which leads to a very confusing code
flow.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Both callers take just around so function call, so move it in.
Also remove the now pointless blk_mq_sched_init wrapper.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Reported-by: Damien Le Moal <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
These are only used by the block core. Also move the declarations to
block/blk.h.
Reported-by: Damien Le Moal <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
No point in doing this in elevator_init.
Signed-off-by: Christoph Hellwig <[email protected]>
Reported-by: Damien Le Moal <[email protected]>
Reviewed-by: Damien Le Moal <[email protected]>
Tested-by: Damien Le Moal <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Per section 5.2 we need to issue the corresponding log page to clear an
AEN, so for a namespace data changed AEN we need to read the changed
namespace list log. And once we read that log anyway we might as well
use it to optimize the rescan.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
And move it toward the top of the file to avoid a forward declaration.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
We should register for AEN events; some law-abiding targets might
not be sending us AENs otherwise.
Signed-off-by: Hannes Reinecke <[email protected]>
[hch: slight cleanups]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
Per section 5.2 of the NVMe 1.3 spec:
"When the controller posts a completion queue entry for an outstanding
Asynchronous Event Request command and thus reports an asynchronous
event, subsequent events of that event type are automatically masked by
the controller until the host clears that event. An event is cleared by
reading the log page associated with that event using the Get Log Page
command (see section 5.14)."
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
|
|
AEN configuration via the 'Get Features' and 'Set Features' admin
command is mandatory, so we should be implemeting handling for it.
Signed-off-by: Hannes Reinecke <[email protected]>
[hch: use WRITE_ONCE, check for invalid values]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Daniel Verkamp <[email protected]>
|
|
Just keep a per-controller buffer of changed namespaces and copy it out
in the get log page implementation.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Daniel Verkamp <[email protected]>
|
|
Remove the common code to allocate a buffer and copy it into the SGL.
Instead the two no-op implementations just zero the SGL directly, and
the smart log allocates a buffer on its own. This prepares for the
more elaborate ANA log page.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Zeroes the SGL in the payload.
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
Signed-off-by: Hannes Reinecke <[email protected]>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Johannes Thumshirn <[email protected]>
|
|
This patch reorders the error cases in showing the XPS configuration so
that we hold off on memory allocation until after we have verified that we
can support XPS on a given ring.
Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes")
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The previous code was optimistic, accepting the offload of whole action
chain when there was a single known action (drop/redirect). This results
in offloading a rule which should not be offloaded, because its behavior
cannot be reproduced in the hardware.
For example:
$ tc filter add dev eno1 parent ffff: protocol ip \
u32 ht 800: order 1 match tcp src 42 FFFF \
action mirred egress mirror dev enp1s16 pipe \
drop
The controller is unable to mirror the packet to a VF, but still
offloads the rule by dropping the packet.
Change the approach of the function to a pessimistic one, rejecting the
chain when an unknown action is found. This is better suited for future
extensions.
Note that both recognized actions always return TC_ACT_SHOT, therefore
it is safe to ignore actions behind them.
Signed-off-by: Ondřej Hlavatý <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull xfs fix from Darrick Wong:
"Clear out i_mapping error state when we're reinitializing inodes.
This last minute fix prevents writeback error state from persisting
past the end of the in-core inode lifecycle and causing EIO errors to
be reported to userspace when no error has occurred.
This fix for the behavioral regression has been soaking in for-next
for a while, but various fs developers persuaded me to try to get it
upstream for 4.17 because the patch that broke things was introduced
in 4.17-rc4"
* tag 'xfs-4.17-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: clear writeback errors in inode_init_always
|
|
The current error handling code has an issue where it does:
if (priv->txchan)
cpdma_chan_destroy(priv->txchan);
The problem is that ->txchan is either valid or an error pointer (which
would lead to an Oops). I've changed it to use multiple error labels so
that the test can be removed.
Also there were some missing calls to netif_napi_del().
Fixes: 3ef0fdb2342c ("net: davinci_emac: switch to new cpdma layer")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|