aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-04-04ipv6: udp: set dst cache for a connected sk if current not validAlexey Kodanev1-19/+2
A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow() and ip6_dst_store() calls in udpv6_sendmsg(), when datagram sending results to ICMPV6_PKT_TOOBIG error: udp_v6_send_skb(), for example with vti6 tunnel: vti6_xmit(), get ICMPV6_PKT_TOOBIG error skb_dst_update_pmtu(), can create a RTF_CACHE clone icmpv6_send() ... udpv6_err() ip6_sk_update_pmtu() ip6_update_pmtu(), can create a RTF_CACHE clone ... ip6_datagram_dst_update() ip6_dst_store() And after commit 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update"), the UDPv6 error handler can update socket's dst cache, but it can happen before the update in the end of udpv6_sendmsg(), preventing getting the new dst cache on the next udpv6_sendmsg() calls. In order to fix it, save dst in a connected socket only if the current socket's dst cache is invalid. The previous patch prepared ip6_sk_dst_lookup_flow() to do that with the new argument, and this patch enables it in udpv6_sendmsg(). Fixes: 33c162a980fe ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update") Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg()Alexey Kodanev1-5/+5
This should make it consistent with ip6_sk_dst_lookup_flow() that is accepting the new 'connected' parameter of type bool. Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow()Alexey Kodanev4-6/+16
Add 'connected' parameter to ip6_sk_dst_lookup_flow() and update the cache only if ip6_sk_dst_check() returns NULL and a socket is connected. The function is used as before, the new behavior for UDP sockets in udpv6_sendmsg() will be enabled in the next patch. Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04ipv6: add a wrapper for ip6_dst_store() with flowi6 checksAlexey Kodanev3-8/+21
Move commonly used pattern of ip6_dst_store() usage to a separate function - ip6_sk_dst_store_flow(), which will check the addresses for equality using the flow information, before saving them. There is no functional changes in this patch. In addition, it will be used in the next patch, in ip6_sk_dst_lookup_flow(). Signed-off-by: Alexey Kodanev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04net: phy: marvell10g: add thermal hwmon deviceRussell King1-2/+182
Add a thermal monitoring device for the Marvell 88x3310, which updates once a second. We also need to hook into the suspend/resume mechanism to ensure that the thermal monitoring is reconfigured when we resume. Suggested-by: Andrew Lunn <[email protected]> Signed-off-by: Russell King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04pptp: remove a buggy dst release in pptp_connect()Eric Dumazet1-1/+0
Once dst has been cached in socket via sk_setup_caps(), it is illegal to call ip_rt_put() (or dst_release()), since sk_setup_caps() did not change dst refcount. We can still dereference it since we hold socket lock. Caugth by syzbot : BUG: KASAN: use-after-free in atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline] BUG: KASAN: use-after-free in dst_release+0x27/0xa0 net/core/dst.c:185 Write of size 4 at addr ffff8801c54dc040 by task syz-executor4/20088 CPU: 1 PID: 20088 Comm: syz-executor4 Not tainted 4.16.0+ #376 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x1a7/0x27d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23c/0x360 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278 atomic_dec_return include/asm-generic/atomic-instrumented.h:198 [inline] dst_release+0x27/0xa0 net/core/dst.c:185 sk_dst_set include/net/sock.h:1812 [inline] sk_dst_reset include/net/sock.h:1824 [inline] sock_setbindtodevice net/core/sock.c:610 [inline] sock_setsockopt+0x431/0x1b20 net/core/sock.c:707 SYSC_setsockopt net/socket.c:1845 [inline] SyS_setsockopt+0x2ff/0x360 net/socket.c:1828 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 RIP: 0033:0x4552d9 RSP: 002b:00007f4878126c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00007f48781276d4 RCX: 00000000004552d9 RDX: 0000000000000019 RSI: 0000000000000001 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000010 R09: 0000000000000000 R10: 00000000200010c0 R11: 0000000000000246 R12: 00000000ffffffff R13: 0000000000000526 R14: 00000000006fac30 R15: 0000000000000000 Allocated by task 20088: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:552 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3542 dst_alloc+0x11f/0x1a0 net/core/dst.c:104 rt_dst_alloc+0xe9/0x540 net/ipv4/route.c:1520 __mkroute_output net/ipv4/route.c:2265 [inline] ip_route_output_key_hash_rcu+0xa49/0x2c60 net/ipv4/route.c:2493 ip_route_output_key_hash+0x20b/0x370 net/ipv4/route.c:2322 __ip_route_output_key include/net/route.h:126 [inline] ip_route_output_flow+0x26/0xa0 net/ipv4/route.c:2577 ip_route_output_ports include/net/route.h:163 [inline] pptp_connect+0xa84/0x1170 drivers/net/ppp/pptp.c:453 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Freed by task 20082: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:520 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:527 __cache_free mm/slab.c:3486 [inline] kmem_cache_free+0x83/0x2a0 mm/slab.c:3744 dst_destroy+0x266/0x380 net/core/dst.c:140 dst_destroy_rcu+0x16/0x20 net/core/dst.c:153 __rcu_reclaim kernel/rcu/rcu.h:178 [inline] rcu_do_batch kernel/rcu/tree.c:2675 [inline] invoke_rcu_callbacks kernel/rcu/tree.c:2930 [inline] __rcu_process_callbacks kernel/rcu/tree.c:2897 [inline] rcu_process_callbacks+0xd6c/0x17b0 kernel/rcu/tree.c:2914 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 The buggy address belongs to the object at ffff8801c54dc000 which belongs to the cache ip_dst_cache of size 168 The buggy address is located 64 bytes inside of 168-byte region [ffff8801c54dc000, ffff8801c54dc0a8) The buggy address belongs to the page: page:ffffea0007153700 count:1 mapcount:0 mapping:ffff8801c54dc000 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffff8801c54dc000 0000000000000000 0000000100000010 raw: ffffea0006b34b20 ffffea0006b6c1e0 ffff8801d674a1c0 0000000000000000 page dumped because: kasan: bad access detected Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04net: dsa: mt7530: Use NULL instead of plain integerFlorian Fainelli1-3/+3
We would be passing 0 instead of NULL as the rsp argument to mt7530_fdb_cmd(), fix that. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Sean Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04net: dsa: b53: Fix sparse warnings in b53_mmap.cFlorian Fainelli1-9/+24
sparse complains about the following warnings: drivers/net/dsa/b53/b53_mmap.c:33:31: warning: incorrect type in initializer (different address spaces) drivers/net/dsa/b53/b53_mmap.c:33:31: expected unsigned char [noderef] [usertype] <asn:2>*regs drivers/net/dsa/b53/b53_mmap.c:33:31: got void *priv and indeed, while what we are doing is functional, we are dereferencing a void * pointer into a void __iomem * which is not great. Just use the defined b53_mmap_priv structure which holds our register base and use that. Fixes: 967dd82ffc52 ("net: dsa: b53: Add support for Broadcom RoboSwitch") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04af_unix: remove redundant lockdep classCong Wang1-10/+0
After commit 581319c58600 ("net/socket: use per af lockdep classes for sk queues") sock queue locks now have per-af lockdep classes, including unix socket. It is no longer necessary to workaround it. I noticed this while looking at a syzbot deadlock report, this patch itself doesn't fix it (this is why I don't add Reported-by). Fixes: 581319c58600 ("net/socket: use per af lockdep classes for sk queues") Cc: Paolo Abeni <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04Merge branch 'net-Broadcom-drivers-sparse-fixes'David S. Miller2-10/+12
Florian Fainelli says: ==================== net: Broadcom drivers sparse fixes This patch series fixes the same warning reported by sparse in bcmsysport and bcmgenet in the code that deals with inserting the TX checksum pointers: drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: cast from restricted __be16 drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: warning: incorrect type in argument 1 (different base types) drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: expected unsigned short [unsigned] [usertype] val drivers/net/ethernet/broadcom/bcmsysport.c:1155:26: got restricted __be16 [usertype] protocol This patch fixes both issues by using the same construct and not swapping skb->protocol but instead the values we are checking against. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-04-04net: systemport: Fix sparse warnings in bcm_sysport_insert_tsb()Florian Fainelli1-5/+6
skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 80105befdb4b ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04net: bcmgenet: Fix sparse warnings in bcmgenet_put_tx_csum()Florian Fainelli1-5/+6
skb->protocol is a __be16 which we would be calling htons() against, while this is not wrong per-se as it correctly results in swapping the value on LE hosts, this still upsets sparse. Adopt a similar pattern to what other drivers do and just assign ip_ver to skb->protocol, and then use htons() against the different constants such that the compiler can resolve the values at build time. Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04rxrpc: Fix undefined packet handlingDavid Howells2-0/+12
By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11 should just be discarded rather than being aborted like other undefined packet types. Reported-by: Jeffrey Altman <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-04MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driverJohn Garry1-0/+7
Add John Garry as maintainer for drivers/bus/hisi_lpc.c, the HiSilicon LPC driver. Signed-off-by: John Garry <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>
2018-04-04HISI LPC: Add ACPI supportJohn Garry1-4/+204
Based on the previous patches, this patch supports the LPC host on Hip06/Hip07 for ACPI FW. It is the responsibility of the LPC host driver to enumerate the child devices, as the ACPI scan code will not enumerate children of "indirect IO" hosts. The ACPI table for the LPC host controller and the child devices is in the following format: Device (LPC0) { Name (_HID, "HISI0191") // HiSi LPC Name (_CRS, ResourceTemplate () { Memory32Fixed (ReadWrite, 0xa01b0000, 0x1000) }) } Device (LPC0.IPMI) { Name (_HID, "IPI0001") Name (LORS, ResourceTemplate() { QWordIO ( ResourceConsumer, MinNotFixed, // _MIF MaxNotFixed, // _MAF PosDecode, EntireRange, 0x0, // _GRA 0xe4, // _MIN 0x3fff, // _MAX 0x0, // _TRA 0x04, // _LEN , , BTIO ) }) Since the IO resources of the child devices need to be translated from LPC bus addresses to logical PIO addresses, and we shouldn't modify the resources of the devices generated in the FW scan, a per-child MFD is created as a substitute. The MFD IO resources will be the translated bus addresses of the ACPI child. Tested-by: dann frazier <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Zhichang Yuan <[email protected]> Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]>
2018-04-04ACPI / scan: Do not enumerate Indirect IO host childrenJohn Garry1-0/+14
Through the logical PIO framework, systems which otherwise have no IO space access to legacy ISA/LPC devices may access these devices through so-called "indirect IO" method. In this, IO space accesses for non-PCI hosts are redirected to a host LLDD to manually generate the IO space (bus) accesses. Hosts are able to register a region in logical PIO space to map to its bus address range. Indirect IO child devices have an associated host-specific bus address. Special translation is required to map between a logical PIO address for a device and its host bus address. Since in the ACPI tables the child device IO resources would be the host-specific values, it is required the ACPI scan code should not enumerate these devices, and that this should be the responsibility of the host driver so that it can "fixup" the resources so that they map to the appropriate logical PIO addresses. To avoid enumerating these child devices, add a check from acpi_device_enumeration_by_parent() as to whether the parent for a device is a member of a known list of "indirect IO" hosts. For now, the HiSilicon LPC host controller ID is added. Tested-by: dann frazier <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]>
2018-04-04ACPI / scan: Rename acpi_is_serial_bus_slave() for more general useJohn Garry2-10/+11
Currently the ACPI scan has special handling for serial bus slaves, in that it makes it the responsibility of the slave device's parent to enumerate the device. To support other types of slave devices which require the same special handling but where the bus is not strictly a serial bus, such as devices on the HiSilicon LPC controller bus, rename acpi_is_serial_bus_slave() to acpi_device_enumeration_by_parent(), so that the name can fit the wider purpose. Also rename the associated device flag acpi_device_flags.serial_bus_slave to .enumeration_by_parent. Signed-off-by: John Garry <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]>
2018-04-04HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindingsZhichang Yuan4-0/+457
The low-pin-count (LPC) interface of Hip06/Hip07 accesses I/O port space of peripherals. Implement the LPC host controller driver which performs the I/O operations on the underlying hardware. We don't want to touch existing drivers such as ipmi-bt, so this driver applies the indirect-IO introduced in the previous patch after registering an indirect-IO node to the indirect-IO devices list which will be searched by the I/O accessors to retrieve the host-local I/O port. The driver config is set as a bool instead of a tristate. The reason here is that, by the very nature of the driver providing a logical PIO range, it does not make sense to have this driver as a loadable module. Another more specific reason is that the Huawei D03 board which includes Hip06 SoC requires the LPC bus for UART console, so should be built in. Tested-by: dann frazier <[email protected]> Signed-off-by: Zou Rongrong <[email protected]> Signed-off-by: Zhichang Yuan <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Rob Herring <[email protected]> # dts part
2018-04-04of: Add missing I/O range exception for indirect-IO devicesZhichang Yuan1-16/+76
There are some special ISA/LPC devices that work on a specific I/O range where it is not correct to specify a 'ranges' property in the DTS parent node as CPU addresses translated from DTS node are only for memory space on some architectures, such as ARM64. Without the parent 'ranges' property, of_translate_address() returns an error. Here we add special handling for this case. During the OF address translation, some checking will be performed to identify whether the device node is registered as indirect-IO. If it is, the I/O translation will be done in a different way from that one of PCI MMIO. In this way, the I/O 'reg' property of the special ISA/LPC devices will be parsed correctly. Tested-by: dann frazier <[email protected]> Signed-off-by: Zhichang Yuan <[email protected]> Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> # earlier draft Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Rob Herring <[email protected]>
2018-04-04PCI: Apply the new generic I/O management on PCI IO hostsZhichang Yuan2-76/+18
After introducing the new generic I/O space management (Logical PIO), the original PCI MMIO relevant helpers need to be updated based on the new interfaces defined in logical PIO. Adapt the corresponding code to match the changes introduced by logical PIO. Tested-by: dann frazier <[email protected]> Signed-off-by: Zhichang Yuan <[email protected]> Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> # earlier draft Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]>
2018-04-04PCI: Add fwnode handler as input param of pci_register_io_range()Gabriele Paoloni4-6/+12
In preparation for having the PCI MMIO helpers use the new generic I/O space management (logical PIO) we need to add the fwnode handler as an extra input parameter. Changes the signature of pci_register_io_range() and its callers as needed. Tested-by: dann frazier <[email protected]> Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Acked-by: Rob Herring <[email protected]>
2018-04-04PCI: Remove __weak tag from pci_register_io_range()Gabriele Paoloni1-1/+1
pci_register_io_range() has only one definition, so there is no need for the __weak attribute. Remove it. Tested-by: dann frazier <[email protected]> Signed-off-by: Gabriele Paoloni <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]>
2018-04-04fscache: Attach the index key and aux data to the cookieDavid Howells26-786/+606
Attach copies of the index key and auxiliary data to the fscache cookie so that: (1) The callbacks to the netfs for this stuff can be eliminated. This can simplify things in the cache as the information is still available, even after the cache has relinquished the cookie. (2) Simplifies the locking requirements of accessing the information as we don't have to worry about the netfs object going away on us. (3) The cache can do lazy updating of the coherency information on disk. As long as the cache is flushed before reboot/poweroff, there's no need to update the coherency info on disk every time it changes. (4) Cookies can be hashed or put in a tree as the index key is easily available. This allows: (a) Checks for duplicate cookies can be made at the top fscache layer rather than down in the bowels of the cache backend. (b) Caching can be added to a netfs object that has a cookie if the cache is brought online after the netfs object is allocated. A certain amount of space is made in the cookie for inline copies of the data, but if it won't fit there, extra memory will be allocated for it. The downside of this is that live cache operation requires more memory. Signed-off-by: David Howells <[email protected]> Acked-by: Anna Schumaker <[email protected]> Tested-by: Steve Dickson <[email protected]>
2018-04-04fscache: Add more tracepointsDavid Howells6-8/+330
Add more tracepoints to fscache, including: (*) fscache_page - Tracks netfs pages known to fscache. (*) fscache_check_page - Tracks the netfs querying whether a page is pending storage. (*) fscache_wake_cookie - Tracks cookies being woken up after a page completes/aborts storage in the cache. (*) fscache_op - Tracks operations being initialised. (*) fscache_wrote_page - Tracks return of the backend write_page op. (*) fscache_gang_lookup - Tracks lookup of pages to be stored in the write operation. Signed-off-by: David Howells <[email protected]>
2018-04-04fscache: Add tracepointsDavid Howells12-54/+731
Add some tracepoints to fscache: (*) fscache_cookie - Tracks a cookie's usage count. (*) fscache_netfs - Logs registration of a network filesystem, including the pointer to the cookie allocated. (*) fscache_acquire - Logs cookie acquisition. (*) fscache_relinquish - Logs cookie relinquishment. (*) fscache_enable - Logs enablement of a cookie. (*) fscache_disable - Logs disablement of a cookie. (*) fscache_osm - Tracks execution of states in the object state machine. and cachefiles: (*) cachefiles_ref - Tracks a cachefiles object's usage count. (*) cachefiles_lookup - Logs result of lookup_one_len(). (*) cachefiles_mkdir - Logs result of vfs_mkdir(). (*) cachefiles_create - Logs result of vfs_create(). (*) cachefiles_unlink - Logs calls to vfs_unlink(). (*) cachefiles_rename - Logs calls to vfs_rename(). (*) cachefiles_mark_active - Logs an object becoming active. (*) cachefiles_wait_active - Logs a wait for an old object to be destroyed. (*) cachefiles_mark_inactive - Logs an object becoming inactive. (*) cachefiles_mark_buried - Logs the burial of an object. Signed-off-by: David Howells <[email protected]>
2018-04-04fscache: Fix hanging wait on page discarded by writebackDavid Howells1-4/+9
If the fscache asynchronous write operation elects to discard a page that's pending storage to the cache because the page would be over the store limit then it needs to wake the page as someone may be waiting on completion of the write. The problem is that the store limit may be updated by a different asynchronous operation - and so may miss the write - and that the store limit may not even get updated until later by the netfs. Fix the kernel hang by making fscache_write_op() mark as written any pages that are over the limit. Signed-off-by: David Howells <[email protected]>
2018-04-04fscache: Detect multiple relinquishment of a cookieDavid Howells1-1/+2
Report if an fscache cookie is relinquished multiple times by the netfs. Signed-off-by: David <[email protected]>
2018-04-04fscache: Pass the correct cancelled indications to fscache_op_complete()David Howells2-7/+10
The last parameter to fscache_op_complete() is a bool indicating whether or not the operation was cancelled. A lot of the time the inverse value is given or no differentiation is made. Fix this. Signed-off-by: David Howells <[email protected]>
2018-04-04fscache, cachefiles: Fix checker warningsDavid Howells2-1/+1
Fix a couple of checker warnings in fscache and cachefiles: (1) fscache_n_op_requeue is never used, so get rid of it. (2) cachefiles_uncache_page() is passed in a lock that it releases, so this needs annotating. Signed-off-by: David Howells <[email protected]>
2018-04-04afs: Be more aggressive in retiring cached vnodesDavid Howells1-2/+3
When relinquishing cookies, either due to iget failure or to inode eviction, retire a cookie if we think the corresponding vnode got deleted on the server rather than just letting it lie in the cache. Signed-off-by: David Howells <[email protected]>
2018-04-04afs: Use the vnode ID uniquifier in the cache key not the aux dataDavid Howells1-14/+8
AFS vnodes (files) are referenced by a triplet of { volume ID, vnode ID, uniquifier }. Currently, kafs is only using the vnode ID as the file key in the volume fscache index and checking the uniquifier on cookie acquisition against the contents of the auxiliary data stored in the cache. Unfortunately, this is subject to a race in which an FS.RemoveFile or FS.RemoveDir op is issued against the server but the local afs inode isn't torn down and disposed off before another thread issues something like FS.CreateFile. The latter then gets given the vnode ID that just got removed, but with a new uniquifier and a cookie collision occurs in the cache because the cookie is only keyed on the vnode ID whereas the inode is keyed on the vnode ID plus the uniquifier. Fix this by keying the cookie on the uniquifier in addition to the vnode ID and dropping the uniquifier from the auxiliary data supplied. Signed-off-by: David Howells <[email protected]>
2018-04-04afs: Invalidate cache on server data changeDavid Howells1-0/+4
Invalidate any data stored in fscache for a vnode that changes on the server so that we don't end up with the cache in a bad state locally. Signed-off-by: David Howells <[email protected]>
2018-04-04cxl: Fix possible deadlock when processing page faults from cxllibFrederic Barrat1-30/+55
cxllib_handle_fault() is called by an external driver when it needs to have the host resolve page faults for a buffer. The buffer can cover several pages and VMAs. The function iterates over all the pages used by the buffer, based on the page size of the VMA. To ensure some stability while processing the faults, the thread T1 grabs the mm->mmap_sem semaphore with read access (R1). However, when processing a page fault for a single page, one of the underlying functions, copro_handle_mm_fault(), also grabs the same semaphore with read access (R2). So the thread T1 takes the semaphore twice. If another thread T2 tries to access the semaphore in write mode W1 (say, because it wants to allocate memory and calls 'brk'), then that thread T2 will have to wait because there's a reader (R1). If the thread T1 is processing a new page at that time, it won't get an automatic grant at R2, because there's now a writer thread waiting (T2). And we have a deadlock. The timeline is: 1. thread T1 owns the semaphore with read access R1 2. thread T2 requests write access W1 and waits 3. thread T1 requests read access R2 and waits The fix is for the thread T1 to release the semaphore R1 once it got the information it needs from the current VMA. The address space/VMAs could evolve while T1 iterates over the full buffer, but in the unlikely case where T1 misses a page, the external driver will raise a new page fault when retrying the memory access. Fixes: 3ced8d730063 ("cxl: Export library to support IBM XSL") Cc: [email protected] # 4.13+ Signed-off-by: Frederic Barrat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports itNaveen N. Rao1-1/+2
We get the below warning if we try to use kexec on P9: kexec_core: Starting new kernel WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140 [snip] NIP __set_breakpoint+0xb4/0x140 LR kexec_prepare_cpus_wait+0x58/0x150 Call Trace: 0xc0000000ee70fb20 (unreliable) 0xc0000000ee70fb20 default_machine_kexec+0x234/0x2c0 machine_kexec+0x84/0x90 kernel_kexec+0xd8/0xe0 SyS_reboot+0x214/0x2c0 system_call+0x58/0x6c This happens since we are trying to clear hw breakpoint on POWER9, though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint() within hw_breakpoint_disable() with ppc_breakpoint_available() to address this. Fixes: 9654153158d3 ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Naveen N. Rao <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04openrisc: Set CONFIG_MULTI_IRQ_HANDLERPalmer Dabbelt1-0/+4
arm has an optional MULTI_IRQ_HANDLER, which openrisc copied but didn't make optional. The multi irq handler infrastructure has been copied to generic code selectable with a new config symbol. That symbol can be selected by randconfig builds and can cause build breakage. Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents the core config symbol from being selected. The openrisc local config symbol will be removed once openrisc gets converted to the generic code. Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-04-04arm64: Set CONFIG_MULTI_IRQ_HANDLERPalmer Dabbelt1-0/+4
arm has an optional MULTI_IRQ_HANDLER, which arm64 copied but didn't make optional. The multi irq handler infrastructure has been copied to generic code selectable with a new config symbol. That symbol can be selected by randconfig builds and can cause build breakage. Introduce CONFIG_MULTI_IRQ_HANDLER as an intermediate step which prevents the core config symbol from being selected. The arm64 local config symbol will be removed once arm64 gets converted to the generic code. Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-04-04genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLERPalmer Dabbelt1-0/+1
These config switches enable the same code in the core and the not yet converted architecture code. They can be selected both by randconfig builds and cause linker error because the same symbols are defined twice. Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to prevent that. The dependency will be removed once all architectures are converted over. Signed-off-by: Palmer Dabbelt <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-04-04powerpc/mm/radix: Update command line parsing for disable_radixAneesh Kumar K.V2-4/+14
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. prom_init parsing is not supporting these options. Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04powerpc/mm/radix: Parse disable_radix commandline correctly.Aneesh Kumar K.V1-1/+1
kernel parameter disable_radix takes different options disable_radix=yes|no|1|0 or just disable_radix. When using the later format we get below error. `Malformed early option 'disable_radix'` Fixes: 1fd6c0220710 ("powerpc/mm: Add a CONFIG option to choose if radix is used by default") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlbAneesh Kumar K.V1-5/+13
With 64k page size, we have hugetlb pte entries at the pmd and pud level for book3s64. We don't need to create a separate page table cache for that. With 4k we need to make sure hugepd page table cache for 16M is placed at PUD level and 16G at the PGD level. Simplify all these by not using HUGEPD_PD_SHIFT which is confusing for book3s64. Without this patch, with 64k page size we create pagetable caches with shift value 10 and 7 which are not used at all. Fixes: 419df06eea5b ("powerpc: Reduce the PTE_INDEX_SIZE") Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04powerpc/mm/radix: Update pte fragment count from 16 to 256 on radixAneesh Kumar K.V4-12/+17
With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use page->plt backing the all the level 4 pte fragment for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch update the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests shows really low impact even with THP disabled. With THP disabled we will be contenting further less on level 4 ptl and hence the impact should be further low. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04powerpc/mm/keys: Update documentation and remove unnecessary checkAneesh Kumar K.V2-23/+16
Adds more code comments. We also remove an unnecessary pkey check after we check for pkey error in this patch. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2018-04-04Merge branch 'old.dcache' into work.dcacheAl Viro1-2/+2
2018-04-03RDMA: Use ib_gid_attr during GID modificationParav Pandit9-95/+63
Now that ib_gid_attr contains device, port and index, simplify the provider APIs add_gid() and del_gid() to use device, port and index fields from the ib_gid_attr attributes structure. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03IB/providers: Avoid null netdev check for RoCEParav Pandit10-73/+54
Now that IB core GID cache ensures that all RoCE entries have an associated netdev remove null checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03IB/providers: Avoid zero GID check for RoCEParav Pandit5-19/+1
Now that the IB core GID cache ensures that a zero GID doesn't exist in the GID table remove zero GID checks from the provider drivers for clarity. Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03IB/core: Refactor GID modify code for RoCEParav Pandit3-235/+272
Code is refactored to prepare separate functions for RoCE which can do more complex operations related to reference counting, while still maintainining code readability. This includes (a) Simplification to not perform netdevice checks and modifications for IB link layer. (b) Do not add RoCE GID entry which has NULL netdevice; instead return an error. (c) If GID addition fails at provider level add_gid(), do not add the entry in the cache and keep the entry marked as INVALID. (d) Simplify and reuse the ib_cache_gid_add()/del() routines so that they can be used even for modifying default GIDs. This avoid some code duplication in modifying default GIDs. (e) find_gid() routine refers to the data entry flags to qualify a GID as valid or invalid GID rather than depending on attributes and zeroness of the GID content. (f) gid_table_reserve_default() sets the GID default attribute at beginning while setting up the GID table. There is no need to use default_gid flag in low level functions such as write_gid(), add_gid(), del_gid(), as they never need to update the DEFAULT property of the GID entry while during GID table update. As as result of this refactor, reserved GID 0:0:0:0:0:0:0:0 is no longer searchable as described below. A unicast GID entry of 0:0:0:0:0:0:0:0 is Reserved GID as per the IB spec version 1.3 section 4.1.1, point (6) whose snippet is below. "The unicast GID address 0:0:0:0:0:0:0:0 is reserved - referred to as the Reserved GID. It shall never be assigned to any endport. It shall not be used as a destination address or in a global routing header (GRH)." GID table cache now only stores valid GID entries. Before this patch, Reserved GID 0:0:0:0:0:0:0:0 was searchable in the GID table using ib_find_cached_gid_by_port() and other similar find routines. Zero GID is no longer searchable as it shall not to be present in GRH or path recored entry as described in IB spec version 1.3 section 4.1.1, point (6), section 12.7.10 and section 12.7.20. ib_cache_update() is simplified to check link layer once, use unified locking scheme for all link layers, removed temporary gid table allocation/free logic. Additionally, (a) Expand ib_gid_attr to store port and index so that GID query routines can get port and index information from the attribute structure. (b) Expand ib_gid_attr to store device as well so that in future code when GID reference counting is done, device is used to reach back to the GID table entry. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03IB/core: Simplify ib_query_gid to always refer to cacheParav Pandit2-14/+5
Currently following inconsistencies exist. 1. ib_query_gid() returns GID from the software cache for a RoCE port and returns GID from the HCA for an IB port. This is incorrect because software GID cache is maintained regardless of HCA port type. 2. GID is queries from the HCA via ib_query_gid and updated in the software cache for IB link layer. Both of them might not be in sync. ULPs such as SRP initiator, SRP target, IPoIB driver have historically used ib_query_gid() API to query the GID. However CM used cached version during CM processing, When software cache was introduced, this inconsitency remained. In order to simplify, improve readability and avoid link layer specific above inconsistencies, this patch brings following changes. 1. ib_query_gid() always refers to the cache layer regardless of link layer. 2. cache module who reads the GID entry from HCA and builds the cache, directly invokes the HCA provider verb's query_gid() callback function. 3. ib_query_port() is being called in early stage where GID cache is not yet build while reading port immutable property. Therefore it needs to read the default GID from the HCA for IB link layer to publish the subnet prefix. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03RDMA/providers: Simplify query_gid callback of RoCE providersParav Pandit11-87/+4
ib_query_gid() fetches the GID from the software cache maintained in ib_core for RoCE ports. Therefore, simplify the provider drivers for RoCE to treat query_gid() callback as never called for RoCE, and only require non-RoCE devices to implement it. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2018-04-03RDMA/core: Update query_gid documentation for HCA driversParav Pandit1-0/+4
query_gid() should return right GID value for iWarp and IB link layers. It is a no-op for RoCE link layer. Update the documentation to reflect this. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>