Age | Commit message (Collapse) | Author | Files | Lines |
|
Falling back unconditionally to HostNotify as primary client's interrupt
breaks some drivers which alter their functionality depending on whether
interrupt is present or not, so let's introduce a board flag telling I2C
core explicitly if we want wired interrupt or HostNotify-based one:
I2C_CLIENT_HOST_NOTIFY.
For DT-based systems we introduce "host-notify" property that we convert
to I2C_CLIENT_HOST_NOTIFY board flag.
Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Pali Rohár <pali.rohar@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
|
|
Pull remoteproc fixes from Bjorn Andersson:
"This fixes two regressions that have been reported to be introduced in
v4.10-rc1.
- correct an incorrect usage of the kref api
- revert the change to make the resource table read-only. As the
space each vdev resource is used as virtio device config space it
must be shared with the remote"
* tag 'rproc-v4.10-fixes' of git://github.com/andersson/remoteproc:
Revert "remoteproc: Merge table_ptr and cached_table pointers"
remoteproc: fix vdev reference management
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux
Pull scsi target fixes from Bart Van Assche:
- a series of bug fixes for the XCOPY implementation from David
Disseldorp
- one bug fix for the ibmvscsis driver, a driver that is used for
communication between partitions on IBM POWER systems.
* 'scsi-target-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bvanassche/linux:
ibmvscsis: Fix srp_transfer_data fail return code
target: support XCOPY requests without parameters
target: check for XCOPY parameter truncation
target: use XCOPY segment descriptor CSCD IDs
target: check XCOPY segment descriptor CSCD IDs
target: simplify XCOPY wwn->se_dev lookup helper
target: return UNSUPPORTED TARGET/SEGMENT DESC TYPE CODE sense
target: bounds check XCOPY total descriptor list length
target: bounds check XCOPY segment descriptor list
target: use XCOPY TOO MANY TARGET DESCRIPTORS sense
target: add XCOPY target/segment desc sense codes
|
|
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All block device data fields and functions returning a number of 512B
sectors are by convention named xxx_sectors while names in the form
xxx_size are generally used for a number of bytes. The blk_queue_zone_size
and bdev_zone_size functions were not following this convention so rename
them.
No functional change is introduced by this patch.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Collapsed the two patches, they were nonsensically split and broke
bisection.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.
Signed-off-by: David Matlack <dmatlack@google.com>
Cc: stable@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Add a flag that indicates that the WEP ICV was stripped from an
RX packet, allowing the device to not transfer that if it's
already checked.
Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
gcc-7 complains that wl3501_cs passes NULL into a function that
then uses the argument as the input for memcpy:
drivers/net/wireless/wl3501_cs.c: In function 'wl3501_get_scan':
include/net/iw_handler.h:559:3: error: argument 2 null where non-null expected [-Werror=nonnull]
memcpy(stream + point_len, extra, iwe->u.data.length);
This works fine here because iwe->u.data.length is guaranteed to be 0
and the memcpy doesn't actually have an effect.
Making the length check explicit avoids the warning and should have
no other effect here.
Also check the pointer itself, since otherwise we get warnings
elsewhere in the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Two AF_* families adding entries to the lockdep tables
at the same time.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Merge fixes from Andrew Morton:
"27 fixes.
There are three patches that aren't actually fixes. They're simple
function renamings which are nice-to-have in mainline as ongoing net
development depends on them."
* akpm: (27 commits)
timerfd: export defines to userspace
mm/hugetlb.c: fix reservation race when freeing surplus pages
mm/slab.c: fix SLAB freelist randomization duplicate entries
zram: support BDI_CAP_STABLE_WRITES
zram: revalidate disk under init_lock
mm: support anonymous stable page
mm: add documentation for page fragment APIs
mm: rename __page_frag functions to __page_frag_cache, drop order from drain
mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
mm: don't dereference struct page fields of invalid pages
mailmap: add codeaurora.org names for nameless email commits
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: pmd dirty emulation in page fault handler
ipc/sem.c: fix incorrect sem_lock pairing
lib/Kconfig.debug: fix frv build failure
mm: get rid of __GFP_OTHER_NODE
mm: fix remote numa hits statistics
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
ocfs2: fix crash caused by stale lvb with fsdlm plugin
...
|
|
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.
Example usage:
tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \
arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \
arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.
There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, this attribute is only fetched on station addition, but
not on station change. Since this info is only present in the assoc
request, with full station state support in the driver it cannot be
present when the station is added.
Thus, add support for changing the VHT opmode on station update if
done before (or while) the station is marked as associated. After
this, ignore it, since it used to be ignored.
Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since userspace is expected to call timerfd syscalls directly with these
flags/ioctls, make sure we export them so they don't have to duplicate
the values themselves.
Link: http://lkml.kernel.org/r/20161219064052.7196-1-vapier@gentoo.org
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During developemnt for zram-swap asynchronous writeback, I found strange
corruption of compressed page, resulting in:
Modules linked in: zram(E)
CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G E 4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff88007620b840 task.stack: ffff880078090000
RIP: set_freeobj.part.43+0x1c/0x1f
RSP: 0018:ffff880078093ca8 EFLAGS: 00010246
RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
Call Trace:
obj_malloc+0x22b/0x260
zs_malloc+0x1e4/0x580
zram_bvec_rw+0x4cd/0x830 [zram]
page_requests_rw+0x9c/0x130 [zram]
zram_thread+0xe6/0x173 [zram]
kthread+0xca/0xe0
ret_from_fork+0x25/0x30
With investigation, it reveals currently stable page doesn't support
anonymous page. IOW, reuse_swap_page can reuse the page without waiting
writeback completion so it can overwrite page zram is compressing.
Unfortunately, zram has used per-cpu stream feature from v4.7.
It aims for increasing cache hit ratio of scratch buffer for
compressing. Downside of that approach is that zram should ask
memory space for compressed page in per-cpu context which requires
stricted gfp flag which could be failed. If so, it retries to
allocate memory space out of per-cpu context so it could get memory
this time and compress the data again, copies it to the memory space.
In this scenario, zram assumes the data should never be changed
but it is not true unless stable page supports. So, If the data is
changed under us, zram can make buffer overrun because second
compression size could be bigger than one we got in previous trial
and blindly, copy bigger size object to smaller buffer which is
buffer overrun. The overrun breaks zsmalloc free object chaining
so system goes crash like above.
I think below is same problem.
https://bugzilla.suse.com/show_bug.cgi?id=997574
Unfortunately, reuse_swap_page should be atomic so that we cannot wait on
writeback in there so the approach in this patch is simply return false if
we found it needs stable page. Although it increases memory footprint
temporarily, it happens rarely and it should be reclaimed easily althoug
it happened. Also, It would be better than waiting of IO completion,
which is critial path for application latency.
Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: <stable@vger.kernel.org> [4.7+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch does two things.
First it goes through and renames the __page_frag prefixed functions to
__page_frag_cache so that we can be clear that we are draining or
refilling the cache, not the frags themselves.
Second we drop the order parameter from __page_frag_cache_drain since we
don't actually need to pass it since all fragments are either order 0 or
must be a compound page.
Link: http://lkml.kernel.org/r/20170104023954.13451.5678.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
page_frag_free
Patch series "Page fragment updates", v4.
This patch series takes care of a few cleanups for the page fragments
API.
First we do some renames so that things are much more consistent. First
we move the page_frag_ portion of the name to the front of the functions
names. Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.
Finally I added a bit of documentation that will hopefully help to
explain some of this. I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.
This patch (of 3):
This patch renames the page frag functions to be more consistent with
other APIs. Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix. This makes it a bit clearer in terms of naming.
In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.
Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Nils Holland and Klaus Ethgen have reported unexpected OOM killer
invocations with 32b kernel starting with 4.8 kernels
kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
kworker/u4:5 cpuset=/ mems_allowed=0
CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
[...]
Mem-Info:
active_anon:58685 inactive_anon:90 isolated_anon:0
active_file:274324 inactive_file:281962 isolated_file:0
unevictable:0 dirty:649 writeback:0 unstable:0
slab_reclaimable:40662 slab_unreclaimable:17754
mapped:7382 shmem:202 pagetables:351 bounce:0
free:206736 free_pcp:332 free_cma:0
Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 813 3474 3474
Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
lowmem_reserve[]: 0 0 21292 21292
HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB
the oom killer is clearly pre-mature because there there is still a lot
of page cache in the zone Normal which should satisfy this lowmem
request. Further debugging has shown that the reclaim cannot make any
forward progress because the page cache is hidden in the active list
which doesn't get rotated because inactive_list_is_low is not memcg
aware.
The code simply subtracts per-zone highmem counters from the respective
memcg's lru sizes which doesn't make any sense. We can simply end up
always seeing the resulting active and inactive counts 0 and return
false. This issue is not limited to 32b kernels but in practice the
effect on systems without CONFIG_HIGHMEM would be much harder to notice
because we do not invoke the OOM killer for allocations requests
targeting < ZONE_NORMAL.
Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
and subtract per-memcg highmem counts when memcg is enabled. Introduce
helper lruvec_zone_lru_size which redirects to either zone counters or
mem_cgroup_get_zone_lru_size when appropriate.
We are losing empty LRU but non-zero lru size detection introduced by
ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
of the inherent zone vs. node discrepancy.
Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
Link: http://lkml.kernel.org/r/20170104100825.3729-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Nils Holland <nholland@tisys.org>
Tested-by: Nils Holland <nholland@tisys.org>
Reported-by: Klaus Ethgen <Klaus@Ethgen.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since commit 00cd5c37afd5 ("ptrace: permit ptracing of /sbin/init") we
can now trace init processes. init is initially protected with
SIGNAL_UNKILLABLE which will prevent fatal signals such as SIGSTOP, but
there are a number of paths during tracing where SIGNAL_UNKILLABLE can
be implicitly cleared.
This can result in init becoming stoppable/killable after tracing. For
example, running:
while true; do kill -STOP 1; done &
strace -p 1
and then stopping strace and the kill loop will result in init being
left in state TASK_STOPPED. Sending SIGCONT to init will resume it, but
init will now respond to future SIGSTOP signals rather than ignoring
them.
Make sure that when setting SIGNAL_STOP_CONTINUED/SIGNAL_STOP_STOPPED
that we don't clear SIGNAL_UNKILLABLE.
Link: http://lkml.kernel.org/r/20170104122017.25047-1-jamie.iles@oracle.com
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The flag was introduced by commit 78afd5612deb ("mm: add
__GFP_OTHER_NODE flag") to allow proper accounting of remote node
allocations done by kernel daemons on behalf of a process - e.g.
khugepaged.
After "mm: fix remote numa hits statistics" we do not need and actually
use the flag so we can safely remove it because all allocations which
are satisfied from their "home" node are accounted properly.
[mhocko@suse.com: fix build]
Link: http://lkml.kernel.org/r/20170106122225.GK5556@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170102153057.9451-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Andrey Konovalov has reported the following warning triggered by the
syzkaller fuzzer.
WARNING: CPU: 1 PID: 9935 at mm/page_alloc.c:3511 __alloc_pages_nodemask+0x159c/0x1e20
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 9935 Comm: syz-executor0 Not tainted 4.9.0-rc7+ #34
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__alloc_pages_slowpath mm/page_alloc.c:3511
__alloc_pages_nodemask+0x159c/0x1e20 mm/page_alloc.c:3781
alloc_pages_current+0x1c7/0x6b0 mm/mempolicy.c:2072
alloc_pages include/linux/gfp.h:469
kmalloc_order+0x1f/0x70 mm/slab_common.c:1015
kmalloc_order_trace+0x1f/0x160 mm/slab_common.c:1026
kmalloc_large include/linux/slab.h:422
__kmalloc+0x210/0x2d0 mm/slub.c:3723
kmalloc include/linux/slab.h:495
ep_write_iter+0x167/0xb50 drivers/usb/gadget/legacy/inode.c:664
new_sync_write fs/read_write.c:499
__vfs_write+0x483/0x760 fs/read_write.c:512
vfs_write+0x170/0x4e0 fs/read_write.c:560
SYSC_write fs/read_write.c:607
SyS_write+0xfb/0x230 fs/read_write.c:599
entry_SYSCALL_64_fastpath+0x1f/0xc2
The issue is caused by a lack of size check for the request size in
ep_write_iter which should be fixed. It, however, points to another
problem, that SLUB defines KMALLOC_MAX_SIZE too large because the its
KMALLOC_SHIFT_MAX is (MAX_ORDER + PAGE_SHIFT) which means that the
resulting page allocator request might be MAX_ORDER which is too large
(see __alloc_pages_slowpath).
The same applies to the SLOB allocator which allows even larger sizes.
Make sure that they are capped properly and never request more than
MAX_ORDER order.
Link: http://lkml.kernel.org/r/20161220130659.16461-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pmd_t of a DAX PMD entry during an *sync operation. This can result
in data loss in the following sequence:
1) mmap write to DAX PMD, dirtying PMD radix tree entry and making the
pmd_t dirty and writeable
2) fsync, flushing out PMD data and cleaning the radix tree entry. We
currently fail to mark the pmd_t as clean and write protected.
3) more mmap writes to the PMD. These don't cause any page faults since
the pmd_t is dirty and writeable. The radix tree entry remains clean.
4) fsync, which fails to flush the dirty PMD data because the radix tree
entry was clean.
5) crash - dirty data that should have been fsync'd as part of 4) could
still have been in the processor cache, and is lost.
Fix this by marking the pmd_t clean and write protected in
dax_mapping_entry_mkclean(), which is called as part of the fsync
operation 2). This will cause the writes in step 3) above to generate
page faults where we'll re-dirty the PMD radix tree entry, resulting in
flushes in the fsync that happens in step 4).
Fixes: 4b4bb46d00b3 ("dax: clear dirty entry tags on cache flush")
Link: http://lkml.kernel.org/r/1482272586-21177-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "Write protect DAX PMDs in *sync path".
Currently dax_mapping_entry_mkclean() fails to clean and write protect
the pmd_t of a DAX PMD entry during an *sync operation. This can result
in data loss, as detailed in patch 2.
This series is based on Dan's "libnvdimm-pending" branch, which is the
current home for Jan's "dax: Page invalidation fixes" series. You can
find a working tree here:
https://git.kernel.org/cgit/linux/kernel/git/zwisler/linux.git/log/?h=dax_pmd_clean
This patch (of 2):
Similar to follow_pte(), follow_pte_pmd() allows either a PTE leaf or a
huge page PMD leaf to be found and returned.
Link: http://lkml.kernel.org/r/1482272586-21177-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The GRO fast path caches the frag0 address. This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.
This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.
This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.
Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As defined in http://www.t10.org/lists/asc-num.htm. To be used during
validation of XCOPY target and segment descriptor lists.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
|
|
'asoc/fix/dwc', 'asoc/fix/fsl-ssi' and 'asoc/fix/hdmi-codec' into asoc-linus
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:
====================
mlx5 4K UAR
The following series of patches optimizes the usage of the UAR area which is
contained within the BAR 0-1. Previous versions of the firmware and the driver
assumed each system page contains a single UAR. This patch set will query the
firmware for a new capability that if published, means that the firmware can
support UARs of fixed 4K regardless of system page size. In the case of
powerpc, where page size equals 64KB, this means we can utilize 16 UARs per
system page. Since user space processes by default consume eight UARs per
context this means that with this change a process will need a single system
page to fulfill that requirement and in fact make use of more UARs which is
better in terms of performance.
In addition to optimizing user-space processes, we introduce an allocator
that can be used by kernel consumers to allocate blue flame registers
(which are areas within a UAR that are used to write doorbells). This provides
further optimization on using the UAR area since the Ethernet driver makes
use of a single blue flame register per system page and now it will use two
blue flame registers per 4K.
The series also makes changes to naming conventions and now the terms used in
the driver code match the terms used in the PRM (programmers reference manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).
In order to support compatibility between different versions of
library/driver/firmware, the library has now means to notify the kernel driver
that it supports the new scheme and the kernel can notify the library if it
supports this extension. So mixed versions of libraries can run concurrently
without any issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support for SMC socket monitoring via netlink sockets of protocol
NETLINK_SOCK_DIAG.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID" or short "pnet ID".
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.
A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.
There are cross-checks to verify whether the ethernet interfaces
or infiniband devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.
Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
* enable smc module loading and unloading
* register new socket family
* basic smc socket creation and deletion
* use backing TCP socket to run CLC (Connection Layer Control)
handshake of SMC protocol
* Setup for infiniband traffic is implemented in follow-on patches.
For now fallback to TCP socket is always used.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Direct call of tcp_set_keepalive() function from protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. And newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
afs: Refcount afs_call struct
These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.
The patches are:
(1) Add some tracepoints for logging incoming calls and monitoring
notifications from AF_RXRPC and data reception.
(2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
initially expected. It can be brought back later if needed. This
clears some stuff out that I don't then need to fix up in (4).
(3) Allow listen(..., 0) to be used to disable listening. This makes
shutting down the AFS cache manager server in the kernel much easier
and the accounting simpler as we can then be sure that (a) all
preallocated afs_call structs are relesed and (b) no new incoming
calls are going to be started.
For the moment, listening cannot be reenabled.
(4) Add refcounting to the afs_call struct to fix a potential multiple
release detected by static checking and add a tracepoint to follow the
lifecycle of afs_call objects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be a annotated with const.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
|
|
This patch moves stmmac_clk, pclk, clk_ptp_ref and stmmac_rst to the
plat_stmmacenet_data structure. It also moves these platform variables
initialization to stmmac_platform. This was done for two reasons:
a) If PCI is used, platform related code is being executed in stmmac_main
resulting in warnings that have no sense and conceptually was not right
b) stmmac as a synopsys reference ethernet driver stack will be hosting
more and more drivers to its structure like synopsys/dwc_eth_qos.c.
These drivers have their own DT bindings that are not compatible with
stmmac's. One of the most important are the clock names, and so they need
to be parsed in the glue logic and initialized there, and that is the main
reason why the clocks were passed to the platform structure.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Reviewed-by: Lars Persson <larper@axis.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds a new parameter to the stmmac DT: snps,en-tx-lpi-clockgating.
It was ported from synopsys/dwc_eth_qos.c and it is useful if lpi tx clock
gating is needed by stmmac users also.
Signed-off-by: Joao Pinto <jpinto@synopsys.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Reviewed-by: Lars Persson <larper@axis.com>
Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in
their payload, and extend UAPI definitions accordingly.
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
HalfSipHash, or hsiphash, is a shortened version of SipHash, which
generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
security margins, and shouldn't be used for anything too sensitive, but
it could be used as a hashtable key function replacement, if the output
is never exposed, and if the security requirement is not too high.
The goal is to make this something that performance-critical jhash users
would be willing to use.
On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias
SipHash1-3 to HalfSipHash1-3 on those systems.
64-bit x86_64:
[ 0.509409] test_siphash: SipHash2-4 cycles: 4049181
[ 0.510650] test_siphash: SipHash1-3 cycles: 2512884
[ 0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
[ 0.512904] test_siphash: JenkinsHash cycles: 978267
So, we map hsiphash() -> SipHash1-3
32-bit x86:
[ 0.509868] test_siphash: SipHash2-4 cycles: 14812892
[ 0.513601] test_siphash: SipHash1-3 cycles: 9510710
[ 0.515263] test_siphash: HalfSipHash1-3 cycles: 3856157
[ 0.515952] test_siphash: JenkinsHash cycles: 1148567
So, we map hsiphash() -> HalfSipHash1-3
hsiphash() is roughly 3 times slower than jhash(), but comes with a
considerable security improvement.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function, or as a
general PRF for short input use cases, such as sequence numbers or RNG
chaining.
For the first usage:
There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector. Currently
hashtables use jhash, which is fast but not secure, and some kind of
rotating key scheme (or none at all, which isn't good). SipHash is meant
as a replacement for jhash in these cases.
There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.
While SipHash is extremely fast for a cryptographically secure function,
it is likely a bit slower than the insecure jhash, and so replacements
will be evaluated on a case-by-case basis based on whether or not the
difference in speed is negligible and whether or not the current jhash usage
poses a real security risk.
For the second usage:
A few places in the kernel are using MD5 or SHA1 for creating secure
sequence numbers, syn cookies, port numbers, or fast random numbers.
SipHash is a faster and more fitting, and more secure replacement for MD5
in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
obvious and straight-forward, and so is submitted along with this patch
series. There shouldn't be much of a debate over its efficacy.
Dozens of languages are already using this internally for their hash
tables and PRFs. Some of the BSDs already use this in their kernels.
SipHash is a widely known high-speed solution to a widely known set of
problems, and it's time we catch-up.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add fields to structs to convey to kernel an indication whether the
library supports multi UARs per page and return to the library the size
of a UAR based on the queried value.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Current check requests that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 will be notified
by old kernels that do not support their request by failing the
operation. This schecme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.
To avoid this, we require that new features that will obey the following
rules:
If the feature requires one or more fields in the response and the at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.
Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Make use of the blue flame registers allocator at mlx5_ib. Since blue
flame was not really supported we remove all the code that is related to
blue flame and we let all consumers to use the same blue flame register.
Once blue flame is supported we will add the code. As part of this patch
we also move the definition of struct mlx5_bf to mlx5_ib.h as it is only
used by mlx5_ib.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
A reference to a UAR is required to generate CQ or EQ doorbells. Since
CQ or EQ doorbells can all be generated using the same UAR area without
any effect on performance, we are just getting a reference to any
available UAR, If one is not available we allocate it but we don't waste
the blue flame registers it can provide and we will use them for
subsequent allocations.
We get a reference to such UAR and put in mlx5_priv so any kernel
consumer can make use of it.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Disconnect or deauthenticate when the owning socket is closed if this
flag is supplied to CMD_CONNECT or CMD_ASSOCIATE. This may be used
to ensure userspace daemon doesn't leave an unmanaged connection behind.
In some situations it would be possible to account for that, to some
degree, in the deamon restart code or in the up/down scripts without
the use of this attribute. But there will be systems where the daemon
can go away for varying periods without a warning due to local resource
management.
Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A static checker warning occurs in the AFS filesystem:
fs/afs/cmservice.c:155 SRXAFSCB_CallBack()
error: dereferencing freed memory 'call'
due to the reply being sent before we access the server it points to. The
act of sending the reply causes the call to be freed if an error occurs
(but not if it doesn't).
On top of this, the lifetime handling of afs_call structs is fragile
because they get passed around through workqueues without any sort of
refcounting.
Deal with the issues by:
(1) Fix the maybe/maybe not nature of the reply sending functions with
regards to whether they release the call struct.
(2) Refcount the afs_call struct and sort out places that need to get/put
references.
(3) Pass a ref through the work queue and release (or pass on) that ref in
the work function. Care has to be taken because a work queue may
already own a ref to the call.
(4) Do the cleaning up in the put function only.
(5) Simplify module cleanup by always incrementing afs_outstanding_calls
whenever a call is allocated.
(6) Set the backlog to 0 with kernel_listen() at the beginning of the
process of closing the socket to prevent new incoming calls from
occurring and to remove the contribution of preallocated calls from
afs_outstanding_calls before we wait on it.
A tracepoint is also added to monitor the afs_call refcount and lifetime.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Fixes: 08e0e7c82eea: "[AF_RXRPC]: Make the in-kernel AFS filesystem use AF_RXRPC."
|
|
We've recently added the fsid to trace events, this makes the line quite
long. To reduce the it again, remove extra spaces around = and remove
",".
Signed-off-by: David Sterba <dsterba@suse.com>
|