Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph updates from Sage Weil:
"The big item here is support for inline data for CephFS and for
message signatures from Zheng. There are also several bug fixes,
including interrupted flock request handling, 0-length xattrs, mksnap,
cached readdir results, and a message version compat field. Finally
there are several cleanups from Ilya, Dan, and Markus.
Note that there is another series coming soon that fixes some bugs in
the RBD 'lingering' requests, but it isn't quite ready yet"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
ceph: fix setting empty extended attribute
ceph: fix mksnap crash
ceph: do_sync is never initialized
libceph: fixup includes in pagelist.h
ceph: support inline data feature
ceph: flush inline version
ceph: convert inline data to normal data before data write
ceph: sync read inline data
ceph: fetch inline data when getting Fcr cap refs
ceph: use getattr request to fetch inline data
ceph: add inline data to pagecache
ceph: parse inline data in MClientReply and MClientCaps
libceph: specify position of extent operation
libceph: add CREATE osd operation support
libceph: add SETXATTR/CMPXATTR osd operations support
rbd: don't treat CEPH_OSD_OP_DELETE as extent op
ceph: remove unused stringification macros
libceph: require cephx message signature by default
ceph: introduce global empty snap context
ceph: message versioning fixes
...
|
|
The optimization for filtering out extended inquiry results, advertising
reports or scan response data based on provided UUID list has a logic
bug. In case no match is found in the advertising data, the scan
response is ignored and not checked against the filter. This will lead
to events being filtered wrongly.
Change the code to actually only drop the events when the scan response
data is not present. If it is present, it needs to be checked against
the provided filter.
The patch is a bit more complex than it needs to be. That is because
it also fixes this compiler warning that some gcc versions produce.
CC net/bluetooth/mgmt.o
net/bluetooth/mgmt.c: In function ‘mgmt_device_found’:
net/bluetooth/mgmt.c:7028:7: warning: ‘match’ may be used uninitialized in this function [-Wmaybe-uninitialized]
bool match;
^
It seems that gcc can not clearly figure out the context of the match
variable. So just change the branches for the extended inquiry response
and advertising data around so that it is clear.
Reported-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Johan Hedberg <[email protected]>
|
|
allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).
Signed-off-by: Yan, Zheng <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
|
|
Add CEPH_OSD_OP_CREATE support. Also change libceph to not treat
CEPH_OSD_OP_DELETE as an extent op and add an assert to that end.
Signed-off-by: Yan, Zheng <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Yan, Zheng <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Yan, Zheng <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Yan, Zheng <[email protected]>
|
|
Session key is required when calculating message signature. Save the session
key in authorizer, this avoid lookup ticket handler for each message
Signed-off-by: Yan, Zheng <[email protected]>
|
|
Use kvfree() from linux/mm.h instead, which is identical. Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
When writing the code to allow per-station GTKs, I neglected to
take into account the management frame keys (index 4 and 5) when
freeing the station and only added code to free the first four
data frame keys.
Fix this by iterating the array of keys over the right length.
Cc: [email protected]
Fixes: e31b82136d1a ("cfg80211/mac80211: allow per-station GTKs")
Signed-off-by: Johannes Berg <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).
The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns
|
|
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
|
|
The current implementations all use dev_uc_add_excl() and such whose API
doesn't support vlans, so we can't make it with NICs HW for now.
Fixes: f6f6424ba773 ('net: make vid as a parameter for ndo_fdb_add/ndo_fdb_del')
Signed-off-by: Or Gerlitz <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Jeff Kirsher <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The encap->type comes straight from Netlink. Validate it against
max supported encap types just like ip_encap_hlen() already does.
Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The symbols are exported and could be used by external modules.
Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
Signed-off-by: Thomas Graf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2014-12-16
Please pull this batch of fixes intended for the 3.19 stream!
For the Bluetooth bits, Johan says:
"The patches consist of:
- Coccinelle warning fix
- hci_dev_lock/unlock fixes
- Fixes for pending mgmt command handling
- Fixes for properly following the force_lesc_support switch
- Fix for a Microsoft branded Broadcom adapter
- New device id for Atheros AR3012
- Fix for BR/EDR Secure Connections enabling"
Along with that...
Brian Norris avoids leaking some kernel memory contents via printk in brcmsmac.
Julia Lawall corrects some misspellings in a few drivers.
Larry Finger gives us one more rtlwifi fix to correct a porting oversight.
Wei Yongjun fixes a sparse warning in rtlwifi.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
|
|
net/rds/message.c: In function ‘rds_message_inc_copy_to_user’:
net/rds/message.c:328: warning: comparison of distinct pointer types lacks a cast
Use min_t(unsigned long, ...) like is done in
rds_message_copy_from_user().
Signed-off-by: Geert Uytterhoeven <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The NBMA GRE tunnels temporarily push GRE header that contain the
per-packet NBMA destination on the skb via header ops early in xmit
path. It is the later pulled before the real GRE header is constructed.
The inner mac was thus set differently in nbma case: the GRE header
has been pushed by neighbor layer, and mac header points to beginning
of the temporary gre header (set by dev_queue_xmit).
Now that the offloads expect mac header to point to the gre payload,
fix the xmit patch to:
- pull first the temporary gre header away
- and reset mac header to point to gre payload
This fixes tso to work again with nbma tunnels.
Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length")
Signed-off-by: Timo Teräs <[email protected]>
Cc: Tom Herbert <[email protected]>
Cc: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
|
|
Pull crypto update from Herbert Xu:
- The crypto API is now documented :)
- Disallow arbitrary module loading through crypto API.
- Allow get request with empty driver name through crypto_user.
- Allow speed testing of arbitrary hash functions.
- Add caam support for ctr(aes), gcm(aes) and their derivatives.
- nx now supports concurrent hashing properly.
- Add sahara support for SHA1/256.
- Add ARM64 version of CRC32.
- Misc fixes.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
crypto: tcrypt - Allow speed testing of arbitrary hash functions
crypto: af_alg - add user space interface for AEAD
crypto: qat - fix problem with coalescing enable logic
crypto: sahara - add support for SHA1/256
crypto: sahara - replace tasklets with kthread
crypto: sahara - add support for i.MX53
crypto: sahara - fix spinlock initialization
crypto: arm - replace memset by memzero_explicit
crypto: powerpc - replace memset by memzero_explicit
crypto: sha - replace memset by memzero_explicit
crypto: sparc - replace memset by memzero_explicit
crypto: algif_skcipher - initialize upon init request
crypto: algif_skcipher - removed unneeded code
crypto: algif_skcipher - Fixed blocking recvmsg
crypto: drbg - use memzero_explicit() for clearing sensitive data
crypto: drbg - use MODULE_ALIAS_CRYPTO
crypto: include crypto- module prefix in template
crypto: user - add MODULE_ALIAS
crypto: sha-mb - remove a bogus NULL check
crytpo: qat - Fix 64 bytes requests
...
|
|
This patch addresses an issue with the level compression of the fib_trie.
Specifically in the case of adding a new leaf that triggers a new node to
be added that takes the place of the old node. The result is a trie where
the 1 child tnode is on one side and one leaf is on the other which gives
you a very deep trie. Below is the script I used to generate a trie on
dummy0 with a 10.X.X.X family of addresses.
ip link add type dummy
ipval=184549374
bit=2
for i in `seq 1 23`
do
ifconfig dummy0:$bit $ipval/8
ipval=`expr $ipval - $bit`
bit=`expr $bit \* 2`
done
cat /proc/net/fib_triestat
Running the script before the patch:
Local:
Aver depth: 10.82
Max depth: 23
Leaves: 29
Prefixes: 30
Internal nodes: 27
1: 26 2: 1
Pointers: 56
Null ptrs: 1
Total size: 5 kB
After applying the patch and repeating:
Local:
Aver depth: 4.72
Max depth: 9
Leaves: 29
Prefixes: 30
Internal nodes: 12
1: 3 2: 2 3: 7
Pointers: 70
Null ptrs: 30
Total size: 4 kB
What this fix does is start the rebalance at the newly created tnode
instead of at the parent tnode. This way if there is a gap between the
parent and the new node it doesn't prevent the new tnode from being
coalesced with any pre-existing nodes that may have been pushed into one
of the new nodes child branches.
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Since the real device can segment packets by software, a vlan device
can set TSO/UFO even when the real device doesn't have those features.
Unlike GSO, this allows packets to be segmented after Qdisc.
Signed-off-by: Toshiaki Makita <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Ad-hoc requires beaconing for regulatory purposes. Validate that the
channel is valid for beaconing, and not only enabled.
Signed-off-by: Arik Nemtsov <[email protected]>
Reviewed-by: Luis R. Rodriguez <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
This can happen and there is no point in added more
detection code lower in the stack. Catching these in one
single point (cfg80211) is enough. Stop WARNING about this
case.
This fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=89001
Cc: [email protected]
Fixes: 2f1c6c572d7b ("cfg80211: process non country IE conflicting first")
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
When an adapter is removed (hci_unregister_dev) any pending mgmt
commands for that adapter should get the appropriate INVALID_INDEX
response. Since hci_unregister_dev() calls hci_dev_do_close() first
that'd so far have caused "not powered" responses to be sent.
Skipping the HCI_UNREGISTER case in mgmt_powered() is also not a
solution since before reaching the mgmt_index_removed() stage any
hci_conn callbacks (e.g. used by pairing) will get called, thereby
causing "disconnected" status responses to be sent.
The fix that covers all scenarios is to handle both INVALID_INDEX and
NOT_POWERED responses through the mgmt_powered() function. The
INVALID_INDEX response sending from mgmt_index_removed() is left
untouched since there are a couple of places not related to powering off
or removing an adapter that call it (e.g. configuring a new bdaddr).
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
If we're in the AUTO_OFF stage the powered_update_hci() function is
responsible for doing the updates to the HCI state that were not done
during the actual mgmt command handlers. One of the updates needing done
is for BR/EDR SC support. This patch adds the missing HCI command for SC
support to the powered_update_hci() function.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
When the channel switch has been made, a vif is now using
the channel context which was reserved. When that happens,
we need to update the channel context since its parameters
may change.
I hit a case in which I switched to a 40Mhz channel but the
reserved channel context was still on 20Mhz. The rate control
would try to send 40Mhz packets on a 20Mhz channel context and
that made iwlwifi's firmware unhappy.
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
If the userspace passes a malformed sched scan request (or a net
detect wowlan configuration) by adding a NL80211_ATTR_SCHED_SCAN_MATCH
attribute without any nested matchsets, a NULL pointer dereference
will occur. Fix this by checking that we do have matchsets in our
array before trying to access it.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
IP: [<ffffffffa002fd69>] nl80211_parse_sched_scan.part.67+0x6e9/0x900 [cfg80211]
PGD 865c067 PUD 865b067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: iwlmvm(O) iwlwifi(O) mac80211(O) cfg80211(O) compat(O) [last unloaded: compat]
CPU: 2 PID: 2442 Comm: iw Tainted: G O 3.17.2 #31
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880013800790 ti: ffff880008d80000 task.ti: ffff880008d80000
RIP: 0010:[<ffffffffa002fd69>] [<ffffffffa002fd69>] nl80211_parse_sched_scan.part.67+0x6e9/0x900 [cfg80211]
RSP: 0018:ffff880008d838d0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000143c RSI: 0000000000000000 RDI: ffff880008ee8dd0
RBP: ffff880008d83948 R08: 0000000000000002 R09: 0000000000000019
R10: ffff88001d1b3c40 R11: 0000000000000002 R12: ffff880019e85e00
R13: 00000000fffffed4 R14: ffff880009757800 R15: 0000000000001388
FS: 00007fa3b6d13700(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000024 CR3: 0000000008670000 CR4: 00000000000006e0
Stack:
ffff880009757800 ffff880000000001 0000000000000000 ffff880008ee84e0
0000000000000000 ffff880009757800 00000000fffffed4 ffff880008d83948
ffffffff814689c9 ffff880009757800 ffff880008ee8000 0000000000000000
Call Trace:
[<ffffffff814689c9>] ? nla_parse+0xb9/0x120
[<ffffffffa00306de>] nl80211_set_wowlan+0x75e/0x960 [cfg80211]
[<ffffffff810bf3d5>] ? mark_held_locks+0x75/0xa0
[<ffffffff8161a77b>] genl_family_rcv_msg+0x18b/0x360
[<ffffffff810bf66d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8161a9d4>] genl_rcv_msg+0x84/0xc0
[<ffffffff8161a950>] ? genl_family_rcv_msg+0x360/0x360
[<ffffffff81618e79>] netlink_rcv_skb+0xa9/0xd0
[<ffffffff81619458>] genl_rcv+0x28/0x40
[<ffffffff816184a5>] netlink_unicast+0x105/0x180
[<ffffffff8161886f>] netlink_sendmsg+0x34f/0x7a0
[<ffffffff8105a097>] ? kvm_clock_read+0x27/0x40
[<ffffffff815c644d>] sock_sendmsg+0x8d/0xc0
[<ffffffff811a75c9>] ? might_fault+0xb9/0xc0
[<ffffffff811a756e>] ? might_fault+0x5e/0xc0
[<ffffffff815d5d26>] ? verify_iovec+0x56/0xe0
[<ffffffff815c73e0>] ___sys_sendmsg+0x3d0/0x3e0
[<ffffffff810a7be8>] ? sched_clock_cpu+0x98/0xd0
[<ffffffff810611b4>] ? __do_page_fault+0x254/0x580
[<ffffffff810bb39f>] ? up_read+0x1f/0x40
[<ffffffff810611b4>] ? __do_page_fault+0x254/0x580
[<ffffffff812146ed>] ? __fget_light+0x13d/0x160
[<ffffffff815c7b02>] __sys_sendmsg+0x42/0x80
[<ffffffff815c7b52>] SyS_sendmsg+0x12/0x20
[<ffffffff81751f69>] system_call_fastpath+0x16/0x1b
Fixes: ea73cbce4e1f ("nl80211: fix scheduled scan RSSI matchset attribute confusion")
Cc: [email protected] [3.15+]
Signed-off-by: Luciano Coelho <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
In the already-set and intersect case of a driver-hint, the previous
wiphy regdomain was not freed before being reset with a copy of the
cfg80211 regdomain.
Cc: [email protected]
Signed-off-by: Arik Nemtsov <[email protected]>
Acked-by: Luis R. Rodriguez <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
The VHT supported channel width field is a two bit integer, not a
bitfield. cfg80211_chandef_usable() was interpreting it incorrectly and
ended up rejecting 160 MHz channel width if the driver indicated support
for both 160 and 80+80 MHz channels.
Cc: [email protected] (3.16+)
Fixes: 3d9d1d6656a73 ("nl80211/cfg80211: support VHT channel configuration")
(however, no real drivers had 160 MHz support it until 3.16)
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount"
stopped being incremented after the use-after-free fix. Furthermore, the
RX-LED will be triggered by every multicast frame (which wouldn't happen
before) which wouldn't allow the LED to rest at all.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the
patch.
Cc: [email protected]
Fixes: b8fff407a180 ("mac80211: fix use-after-free in defragmentation")
Signed-off-by: Andreas Müller <[email protected]>
[rewrite commit message]
Signed-off-by: Johannes Berg <[email protected]>
|
|
Avoid a case where we would access uninitialized stack data if the AP
advertises HT support without 40MHz channel support.
Cc: [email protected]
Fixes: f3000e1b43f1 ("mac80211: fix broken use of VHT/20Mhz with some APs")
Signed-off-by: Jes Sorensen <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
In case we cannot attach to our slave netdevice PHY, error out and
propagate that error up to the caller: dsa_slave_create().
Fixes: 0d8bcdd383b8 ("net: dsa: allow for more complex PHY setups")
Signed-off-by: Andrey Volkov <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case there is no PHY at the designated address on the internal
switch, we would basically de-reference a null pointer here:
dsa_slave_phy_setup(...)
{
p->phy = ds->slave_mii_bus->phy_map[p->port];
phy_connect_direct(slave_dev, p->phy, dsa_slave_adjust_link,
^------
This can be triggered when the platform configuration (platform_data or
Device Tree) indicates there should be a PHY device at this address, but
the HW is non-responsive, such that we cannot attach a PHY device at
this specific location.
Fix this by checking the return value prior to calling
phy_connect_direct().
CC: Andrew Lunn <[email protected]>
Fixes: b31f65fb4383 ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus")
Reported-by: Brian Norris <[email protected]>
Signed-off-by: Andrey Volkov <[email protected]>
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
|
|
Pull virtio updates from Michael Tsirkin:
"virtio: virtio 1.0 support, misc patches
This adds a lot of infrastructure for virtio 1.0 support. Notable
missing pieces: virtio pci, virtio balloon (needs spec extension),
vhost scsi.
Plus, there are some minor fixes in a couple of places.
Note: some net drivers are affected by these patches. David said he's
fine with merging these patches through my tree.
Rusty's on vacation, he acked using my tree for these, too"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (70 commits)
virtio_ccw: finalize_features error handling
virtio_ccw: future-proof finalize_features
virtio_pci: rename virtio_pci -> virtio_pci_common
virtio_pci: update file descriptions and copyright
virtio_pci: split out legacy device support
virtio_pci: setup config vector indirectly
virtio_pci: setup vqs indirectly
virtio_pci: delete vqs indirectly
virtio_pci: use priv for vq notification
virtio_pci: free up vq->priv
virtio_pci: fix coding style for structs
virtio_pci: add isr field
virtio: drop legacy_only driver flag
virtio_balloon: drop legacy_only driver flag
virtio_ccw: rev 1 devices set VIRTIO_F_VERSION_1
virtio: allow finalize_features to fail
virtio_ccw: legacy: don't negotiate rev 1/features
virtio: add API to detect legacy devices
virtio_console: fix sparse warnings
vhost: remove unnecessary forward declarations in vhost.h
...
|
|
This patch moves the mgmt_powered() notification earlier in the
hci_dev_do_close() function. This way the correct "not powered" error
gets passed to any pending mgmt commands. Without the patch the pending
commands would instead get a misleading "disconnected" response when
powering down the adapter.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The pairing_complete() function is used as a pending mgmt command
cmd_complete callback. The expectation of such functions is that they
are not responsible themselves for calling mgmt_pending_remove(). This
patch fixes the incorrect mgmt_pending_remove() call in
pairing_complete() and adds it to the appropriate changes.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The pairing_complete() function relies on a hci_conn reference to be
able to access the hci_conn object. It should therefore only release
this reference once it's done accessing the object, i.e. at the end of
the function.
Signed-off-by: Johan Hedberg <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
The Read Page Scan Activity and Read Page Scan Type commands are not
supported by all controllers. Move the execution of both commands
into the 3rd phase of the init procedure. And then check the bit
mask of supported commands before adding them to the init sequence.
With this re-ordering of the init sequence, the extra check for
AVM BlueFritz! controllers is no longer needed. They will report
that these two commands are not supported.
This fixes an issue with the Microsoft Corp. Wireless Transceiver
for Bluetooth 2.0 (ID 045e:009c).
Signed-off-by: Marcel Holtmann <[email protected]>
Signed-off-by: Johan Hedberg <[email protected]>
|
|
mgmt_pending_remove() should be called with hci_dev_lock protection and
all hci_event.c functions which calls mgmt_complete() (which eventually
calls mgmt_pending_remove()) should hold the lock.
So this patch fixes the same
Signed-off-by: Jaganath Kanakkassery <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
mgmt_pending_remove() should be called with hci_dev_lock protection
and currently the rule to take dev lock is that all mgmt req_complete
functions should take dev lock. So this patch fixes the same in the
missing functions
Without this patch there is a chance of invalid memory access while
accessing the mgmt_pending list like below
bluetoothd: 392] [0] Backtrace:
bluetoothd: 392] [0] [<c04ec770>] (pending_eir_or_class+0x0/0x68) from [<c04f1830>] (add_uuid+0x34/0x1c4)
bluetoothd: 392] [0] [<c04f17fc>] (add_uuid+0x0/0x1c4) from [<c04f3cc4>] (mgmt_control+0x204/0x274)
bluetoothd: 392] [0] [<c04f3ac0>] (mgmt_control+0x0/0x274) from [<c04f609c>] (hci_sock_sendmsg+0x80/0x308)
bluetoothd: 392] [0] [<c04f601c>] (hci_sock_sendmsg+0x0/0x308) from [<c03d4d68>] (sock_aio_write+0x144/0x174)
bluetoothd: 392] [0] r8:00000000 r7 7c1be90 r6 7c1be18 r5:00000017 r4 a90ea80
bluetoothd: 392] [0] [<c03d4c24>] (sock_aio_write+0x0/0x174) from [<c00e2d4c>] (do_sync_write+0xb0/0xe0)
bluetoothd: 392] [0] [<c00e2c9c>] (do_sync_write+0x0/0xe0) from [<c00e371c>] (vfs_write+0x134/0x13c)
bluetoothd: 392] [0] r8:00000000 r7 7c1bf70 r6:beeca5c8 r5:00000017 r4 7c05900
bluetoothd: 392] [0] [<c00e35e8>] (vfs_write+0x0/0x13c) from [<c00e3910>] (sys_write+0x44/0x70)
bluetoothd: 392] [0] r8:00000000 r7:00000004 r6:00000017 r5:beeca5c8 r4 7c05900
bluetoothd: 392] [0] [<c00e38cc>] (sys_write+0x0/0x70) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
bluetoothd: 392] [0] r9 7c1a000 r8:c000e568 r6:400b5f10 r5:403896d8 r4:beeca604
bluetoothd: 392] [0] Code: e28cc00c e152000c 0a00000f e3a00001 (e1d210b8)
bluetoothd: 392] [0] ---[ end trace 67b6ac67435864c4 ]---
bluetoothd: 392] [0] Kernel panic - not syncing: Fatal exception
Signed-off-by: Jaganath Kanakkassery <[email protected]>
Signed-off-by: Marcel Holtmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"This time we have some more new material than we used to have during
the last couple of development cycles.
The most important part of it to me is the introduction of a unified
interface for accessing device properties provided by platform
firmware. It works with Device Trees and ACPI in a uniform way and
drivers using it need not worry about where the properties come from
as long as the platform firmware (either DT or ACPI) makes them
available. It covers both devices and "bare" device node objects
without struct device representation as that turns out to be necessary
in some cases. This has been in the works for quite a few months (and
development cycles) and has been approved by all of the relevant
maintainers.
On top of that, some drivers are switched over to the new interface
(at25, leds-gpio, gpio_keys_polled) and some additional changes are
made to the core GPIO subsystem to allow device drivers to manipulate
GPIOs in the "canonical" way on platforms that provide GPIO
information in their ACPI tables, but don't assign names to GPIO lines
(in which case the driver needs to do that on the basis of what it
knows about the device in question). That also has been approved by
the GPIO core maintainers and the rfkill driver is now going to use
it.
Second is support for hardware P-states in the intel_pstate driver.
It uses CPUID to detect whether or not the feature is supported by the
processor in which case it will be enabled by default. However, it
can be disabled entirely from the kernel command line if necessary.
Next is support for a platform firmware interface based on ACPI
operation regions used by the PMIC (Power Management Integrated
Circuit) chips on the Intel Baytrail-T and Baytrail-T-CR platforms.
That interface is used for manipulating power resources and for
thermal management: sensor temperature reporting, trip point setting
and so on.
Also the ACPI core is now going to support the _DEP configuration
information in a limited way. Basically, _DEP it supposed to reflect
off-the-hierarchy dependencies between devices which may be very
indirect, like when AML for one device accesses locations in an
operation region handled by another device's driver (usually, the
device depended on this way is a serial bus or GPIO controller). The
support added this time is sufficient to make the ACPI battery driver
work on Asus T100A, but it is general enough to be able to cover some
other use cases in the future.
Finally, we have a new cpufreq driver for the Loongson1B processor.
In addition to the above, there are fixes and cleanups all over the
place as usual and a traditional ACPICA update to a recent upstream
release.
As far as the fixes go, the ACPI LPSS (Low-power Subsystem) driver for
Intel platforms should be able to handle power management of the DMA
engine correctly, the cpufreq-dt driver should interact with the
thermal subsystem in a better way and the ACPI backlight driver should
handle some more corner cases, among other things.
On top of the ACPICA update there are fixes for race conditions in the
ACPICA's interrupt handling code which might lead to some random and
strange looking failures on some systems.
In the cleanups department the most visible part is the series of
commits targeted at getting rid of the CONFIG_PM_RUNTIME configuration
option. That was triggered by a discussion regarding the generic
power domains code during which we realized that trying to support
certain combinations of PM config options was painful and not really
worth it, because nobody would use them in production anyway. For
this reason, we decided to make CONFIG_PM_SLEEP select
CONFIG_PM_RUNTIME and that lead to the conclusion that the latter
became redundant and CONFIG_PM could be used instead of it. The
material here makes that replacement in a major part of the tree, but
there will be at least one more batch of that in the second part of
the merge window.
Specifics:
- Support for retrieving device properties information from ACPI _DSD
device configuration objects and a unified device properties
interface for device drivers (and subsystems) on top of that. As
stated above, this works with Device Trees and ACPI and allows
device drivers to be written in a platform firmware (DT or ACPI)
agnostic way. The at25, leds-gpio and gpio_keys_polled drivers are
now going to use this new interface and the GPIO subsystem is
additionally modified to allow device drivers to assign names to
GPIO resources returned by ACPI _CRS objects (in case _DSD is not
present or does not provide the expected data). The changes in
this set are mostly from Mika Westerberg, Rafael J Wysocki, Aaron
Lu, and Darren Hart with some fixes from others (Fabio Estevam,
Geert Uytterhoeven).
- Support for Hardware Managed Performance States (HWP) as described
in Volume 3, section 14.4, of the Intel SDM in the intel_pstate
driver. CPUID is used to detect whether or not the feature is
supported by the processor. If supported, it will be enabled
automatically unless the intel_pstate=no_hwp switch is present in
the kernel command line. From Dirk Brandewie.
- New Intel Broadwell-H ID for intel_pstate (Dirk Brandewie).
- Support for firmware interface based on ACPI operation regions used
by the PMIC chips on the Intel Baytrail-T and Baytrail-T-CR
platforms for power resource control and thermal management (Aaron
Lu).
- Limited support for retrieving off-the-hierarchy dependencies
between devices from ACPI _DEP device configuration objects and
deferred probing support for the ACPI battery driver based on the
_DEP information to make that driver work on Asus T100A (Lan
Tianyu).
- New cpufreq driver for the Loongson1B processor (Kelvin Cheung).
- ACPICA update to upstream revision 20141107 which only affects
tools (Bob Moore).
- Fixes for race conditions in the ACPICA's interrupt handling code
and in the ACPI code related to system suspend and resume (Lv Zheng
and Rafael J Wysocki).
- ACPI core fix for an RCU-related issue in the ioremap() regions
management code that slowed down significantly after CPUs had been
allowed to enter idle states even if they'd had RCU callbakcs
queued and triggered some problems in certain proprietary graphics
driver (and elsewhere). The fix replaces synchronize_rcu() in that
code with synchronize_rcu_expedited() which makes the issue go
away. From Konstantin Khlebnikov.
- ACPI LPSS (Low-Power Subsystem) driver fix to handle power
management of the DMA engine included into the LPSS correctly. The
problem is that the DMA engine doesn't have ACPI PM support of its
own and it simply is turned off when the last LPSS device having
ACPI PM support goes into D3cold. To work around that, the PM
domain used by the ACPI LPSS driver is redesigned so at least one
device with ACPI PM support will be on as long as the DMA engine is
in use. From Andy Shevchenko.
- ACPI backlight driver fix to avoid using it on "Win8-compatible"
systems where it doesn't work and where it was used by default by
mistake (Aaron Lu).
- Assorted minor ACPI core fixes and cleanups from Tomasz Nowicki,
Sudeep Holla, Huang Rui, Hanjun Guo, Fabian Frederick, and Ashwin
Chaugule (mostly related to the upcoming ARM64 support).
- Intel RAPL (Running Average Power Limit) power capping driver fixes
and improvements including new processor IDs (Jacob Pan).
- Generic power domains modification to power up domains after
attaching devices to them to meet the expectations of device
drivers and bus types assuming devices to be accessible at probe
time (Ulf Hansson).
- Preliminary support for controlling device clocks from the generic
power domains core code and modifications of the ARM/shmobile
platform to use that feature (Ulf Hansson).
- Assorted minor fixes and cleanups of the generic power domains core
code (Ulf Hansson, Geert Uytterhoeven).
- Assorted minor fixes and cleanups of the device clocks control code
in the PM core (Geert Uytterhoeven, Grygorii Strashko).
- Consolidation of device power management Kconfig options by making
CONFIG_PM_SLEEP select CONFIG_PM_RUNTIME and removing the latter
which is now redundant (Rafael J Wysocki and Kevin Hilman). That
is the first batch of the changes needed for this purpose.
- Core device runtime power management support code cleanup related
to the execution of callbacks (Andrzej Hajda).
- cpuidle ARM support improvements (Lorenzo Pieralisi).
- cpuidle cleanup related to the CPUIDLE_FLAG_TIME_VALID flag and a
new MAINTAINERS entry for ARM Exynos cpuidle (Daniel Lezcano and
Bartlomiej Zolnierkiewicz).
- New cpufreq driver callback (->ready) to be executed when the
cpufreq core is ready to use a given policy object and cpufreq-dt
driver modification to use that callback for cooling device
registration (Viresh Kumar).
- cpufreq core fixes and cleanups (Viresh Kumar, Vince Hsu, James
Geboski, Tomeu Vizoso).
- Assorted fixes and cleanups in the cpufreq-pcc, intel_pstate,
cpufreq-dt, pxa2xx cpufreq drivers (Lenny Szubowicz, Ethan Zhao,
Stefan Wahren, Petr Cvek).
- OPP (Operating Performance Points) framework modification to allow
OPPs to be removed too and update of a few cpufreq drivers
(cpufreq-dt, exynos5440, imx6q, cpufreq) to remove OPPs (added
during initialization) on driver removal (Viresh Kumar).
- Hibernation core fixes and cleanups (Tina Ruchandani and Markus
Elfring).
- PM Kconfig fix related to CPU power management (Pankaj Dubey).
- cpupower tool fix (Prarit Bhargava)"
* tag 'pm+acpi-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (120 commits)
i2c-omap / PM: Drop CONFIG_PM_RUNTIME from i2c-omap.c
dmaengine / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
tools: cpupower: fix return checks for sysfs_get_idlestate_count()
drivers: sh / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
e1000e / igb / PM: Eliminate CONFIG_PM_RUNTIME
MMC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
MFD / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
misc / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
media / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
input / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
leds: leds-gpio: Fix multiple instances registration without 'label' property
iio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hsi / OMAP / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
i2c-hid / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
drm / exynos / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
gpio / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
hwrandom / exynos / PM: Use CONFIG_PM in #ifdef
block / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
USB / PM: Drop CONFIG_PM_RUNTIME from the USB core
PM: Merge the SET*_RUNTIME_PM_OPS() macros
...
|
|
0day robot reported the following crash:
[ 21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[ 21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2
It's due to bpf_prog_get() returning ERR_PTR.
Check it properly.
Reported-by: Fengguang Wu <[email protected]>
Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.
Signed-off-by: Gu Zheng <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Merge first patchbomb from Andrew Morton:
- a few minor cifs fixes
- dma-debug upadtes
- ocfs2
- slab
- about half of MM
- procfs
- kernel/exit.c
- panic.c tweaks
- printk upates
- lib/ updates
- checkpatch updates
- fs/binfmt updates
- the drivers/rtc tree
- nilfs
- kmod fixes
- more kernel/exit.c
- various other misc tweaks and fixes
* emailed patches from Andrew Morton <[email protected]>: (190 commits)
exit: pidns: fix/update the comments in zap_pid_ns_processes()
exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
exit: exit_notify: re-use "dead" list to autoreap current
exit: reparent: call forget_original_parent() under tasklist_lock
exit: reparent: avoid find_new_reaper() if no children
exit: reparent: introduce find_alive_thread()
exit: reparent: introduce find_child_reaper()
exit: reparent: document the ->has_child_subreaper checks
exit: reparent: s/while_each_thread/for_each_thread/ in find_new_reaper()
exit: reparent: fix the cross-namespace PR_SET_CHILD_SUBREAPER reparenting
exit: reparent: fix the dead-parent PR_SET_CHILD_SUBREAPER reparenting
exit: proc: don't try to flush /proc/tgid/task/tgid
exit: release_task: fix the comment about group leader accounting
exit: wait: drop tasklist_lock before psig->c* accounting
exit: wait: don't use zombie->real_parent
exit: wait: cleanup the ptrace_reparented() checks
usermodehelper: kill the kmod_thread_locker logic
usermodehelper: don't use CLONE_VFORK for ____call_usermodehelper()
fs/hfs/catalog.c: fix comparison bug in hfs_cat_keycmp
nilfs2: fix the nilfs_iget() vs. nilfs_new_inode() races
...
|
|
As it is, default ->i_fop has NULL ->open() (along with all other methods).
The only case where it matters is reopening (via procfs symlink) a file that
didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned
to something sane (default would fail on read/write/ioctl/etc.).
Unfortunately, such case exists - alloc_file() users, especially
anon_get_file() ones. There we have tons of opened files of very different
kinds sharing the same inode. As the result, attempt to reopen those via
procfs succeeds and you get a descriptor you can't do anything with.
Moreover, in case of sockets we set ->i_fop that will only be used
on such reopen attempts - and put a failing ->open() into it to make sure
those do not succeed.
It would be simpler to put such ->open() into default ->i_fop and leave
it unchanged both for anon inode (as we do anyway) and for socket ones. Result:
* everything going through do_dentry_open() works as it used to
* sock_no_open() kludge is gone
* attempts to reopen anon-inode files fail as they really ought to
* ditto for aio_private_file()
* ditto for perfmon - this one actually tried to imitate sock_no_open()
trick, but failed to set ->i_fop, so in the current tree reopens succeed and
yield completely useless descriptor. Intent clearly had been to fail with
-ENXIO on such reopens; now it actually does.
* everything else that used alloc_file() keeps working - it has ->i_fop
set for its inodes anyway
Signed-off-by: Al Viro <[email protected]>
|
|
|
|
Memory is internally accounted in bytes, using spinlock-protected 64-bit
counters, even though the smallest accounting delta is a page. The
counter interface is also convoluted and does too many things.
Introduce a new lockless word-sized page counter API, then change all
memory accounting over to it. The translation from and to bytes then only
happens when interfacing with userspace.
The removed locking overhead is noticable when scaling beyond the per-cpu
charge caches - on a 4-socket machine with 144-threads, the following test
shows the performance differences of 288 memcgs concurrently running a
page fault benchmark:
vanilla:
18631648.500498 task-clock (msec) # 140.643 CPUs utilized ( +- 0.33% )
1,380,638 context-switches # 0.074 K/sec ( +- 0.75% )
24,390 cpu-migrations # 0.001 K/sec ( +- 8.44% )
1,843,305,768 page-faults # 0.099 M/sec ( +- 0.00% )
50,134,994,088,218 cycles # 2.691 GHz ( +- 0.33% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
8,049,712,224,651 instructions # 0.16 insns per cycle ( +- 0.04% )
1,586,970,584,979 branches # 85.176 M/sec ( +- 0.05% )
1,724,989,949 branch-misses # 0.11% of all branches ( +- 0.48% )
132.474343877 seconds time elapsed ( +- 0.21% )
lockless:
12195979.037525 task-clock (msec) # 133.480 CPUs utilized ( +- 0.18% )
832,850 context-switches # 0.068 K/sec ( +- 0.54% )
15,624 cpu-migrations # 0.001 K/sec ( +- 10.17% )
1,843,304,774 page-faults # 0.151 M/sec ( +- 0.00% )
32,811,216,801,141 cycles # 2.690 GHz ( +- 0.18% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
9,999,265,091,727 instructions # 0.30 insns per cycle ( +- 0.10% )
2,076,759,325,203 branches # 170.282 M/sec ( +- 0.12% )
1,656,917,214 branch-misses # 0.08% of all branches ( +- 0.55% )
91.369330729 seconds time elapsed ( +- 0.45% )
On top of improved scalability, this also gets rid of the icky long long
types in the very heart of memcg, which is great for 32 bit and also makes
the code a lot more readable.
Notable differences between the old and new API:
- res_counter_charge() and res_counter_charge_nofail() become
page_counter_try_charge() and page_counter_charge() resp. to match
the more common kernel naming scheme of try_do()/do()
- res_counter_uncharge_until() is only ever used to cancel a local
counter and never to uncharge bigger segments of a hierarchy, so
it's replaced by the simpler page_counter_cancel()
- res_counter_set_limit() is replaced by page_counter_limit(), which
expects its callers to serialize against themselves
- res_counter_memparse_write_strategy() is replaced by
page_counter_limit(), which rounds down to the nearest page size -
rather than up. This is more reasonable for explicitely requested
hard upper limits.
- to keep charging light-weight, page_counter_try_charge() charges
speculatively, only to roll back if the result exceeds the limit.
Because of this, a failing bigger charge can temporarily lock out
smaller charges that would otherwise succeed. The error is bounded
to the difference between the smallest and the biggest possible
charge size, so for memcg, this means that a failing THP charge can
send base page charges into reclaim upto 2MB (4MB) before the limit
would have been reached. This should be acceptable.
[[email protected]: add includes for WARN_ON_ONCE and memparse]
[[email protected]: add includes for WARN_ON_ONCE, memparse, strncmp, and PAGE_SIZE]
Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|