aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-09-01net: systemport: Correctly set TSB endian for hostFlorian Fainelli2-1/+15
Similarly to how we configure the RSB (Receive Status Block) we also need to set the TSB (Transmit Status Block) based on the host endian. This was missing from the commit indicated below. Fixes: 389a06bc534e ("net: systemport: Set correct RSB endian bits based on host") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01Merge branch 'inet_diag-TCP-MD5'David S. Miller5-10/+138
Ivan Delalande says: ==================== inet_diag: report TCP MD5 signing keys and addresses Allow userspace to retrieve MD5 signature keys and addresses configured on TCP sockets through inet_diag. Thanks to Eric Dumazet and Stephen Hemminger for their useful explanations and feedback. v5: - memset the whole netlink payload after it has been nla_reserve-d in tcp_diag_put_md5sig (a third memset had to be added for tcpm_key so we might as well have just one for entire region). - move the nla_total_size call from inet_sk_attr_size to the idiag_get_aux_size defined by protocols as they could add multiple netlink attributes, - add check for net_admin in tcp_diag_get_aux_size. v4: - add new struct tcp_diag_md5sig to report the data instead of tcp_md5sig to avoid wasting 112 bytes on every tcpm_addr, - memset tcpm_addr on IPv4 addresses to avoid leaks, - style fix in inet_diag_dump_one_icsk. v3: - rename inet_diag_*md5sig in tcp_diag.c to tcp_diag_* for consistency, - don't lock the socket in tcp_diag_put_md5sig, - add checks on md5sig_count in tcp_diag_put_md5sig to not create the netlink attribute if the list is empty, and to avoid overflows or memory leaks if the list has changed in the meantime. v2: - move changes to tcp_diag.c and extend inet_diag_handler to allow protocols to provide additional data on INET_DIAG_INFO, - lock socket before calling tcp_diag_put_md5sig. I also have a patch for iproute2/ss to test this change, making it print this new attribute. I'm planning to polish and send it if this series gets applied. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-09-01tcp_diag: report TCP MD5 signing keys and addressesIvan Delalande3-6/+113
Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is not possible to retrieve these from the kernel once they have been configured on sockets. Signed-off-by: Ivan Delalande <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01inet_diag: allow protocols to provide additional dataIvan Delalande2-4/+25
Extend inet_diag_handler to allow individual protocols to report additional data on INET_DIAG_INFO through idiag_get_aux. The size can be dynamic and is computed by idiag_get_aux_size. Signed-off-by: Ivan Delalande <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01ipv6: sr: Use ARRAY_SIZE macroThomas Meyer1-3/+4
Grepping for "sizeof\(.+\) / sizeof\(" found this as one of the first candidates. Maybe a coccinelle can catch all of those. Signed-off-by: Thomas Meyer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01net: qualcomm: rmnet: remove unused variable privColin Ian King1-3/+0
priv is being assigned but is never used, so remove it. Cleans up clang build warning: "warning: Value stored to 'priv' is never read" Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation") Signed-off-by: Colin Ian King <[email protected]> Acked-by: Subash Abhinov Kasiviswanathan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01net: phy: bcm7xxx: make array bcm7xxx_suspend_cfg static, reduces object ↵Colin Ian King1-1/+1
code size Don't populate the array bcm7xxx_suspend_cfg A on the stack, instead make it static. Makes the object code smaller by over 300 bytes: Before: text data bss dec hex filename 6351 8146 0 14497 38a1 drivers/net/phy/bcm7xxx.o After: text data bss dec hex filename 5986 8210 0 14196 3774 drivers/net/phy/bcm7xxx.o Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01fsl/fman: make arrays port_ids static, reduces object code sizeColin Ian King1-4/+10
Don't populate the arrays port_ids on the stack, instead make them static. Makes the object code smaller by over 700 bytes: Before: text data bss dec hex filename 28785 5832 192 34809 87f9 fman.o After: text data bss dec hex filename 27921 5992 192 34105 8539 fman.o Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller359-1658/+2437
Three cases of simple overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2017-09-01inetpeer: fix RCU lookup()Eric Dumazet1-3/+6
Excess of seafood or something happened while I cooked the commit adding RB tree to inetpeer. Of course, RCU rules need to be respected or bad things can happen. In this particular loop, we need to read *pp once per iteration, not twice. Fixes: b145425f269a ("inetpeer: remove AVL implementation in favor of RB tree") Reported-by: John Sperbeck <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01Merge tag 'armsoc-fixes' of ↵Linus Torvalds20-349/+2
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "A couple of late-arriving fixes before final 4.13: - A few reverts of DT bindings on Allwinner for their ethernet driver. Discussion didn't converge, and since bindings are considered ABI it makes sense to revert instead of having to support two bindings long-term. - A fix to enumerate GPIOs properly on Marvell Armada AP806" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: arm64: dts: marvell: fix number of GPIOs in Armada AP806 description arm: dts: sunxi: Revert EMAC changes arm64: dts: allwinner: Revert EMAC changes dt-bindings: net: Revert sun8i dwmac binding
2017-09-01Merge tag 'mvebu-fixes-4.13-3' of git://git.infradead.org/linux-mvebu into fixesOlof Johansson1-2/+2
mvebu fixes for 4.13 (part 3) Fix number of GPIOs in AP806 description for Armada 7K/8K * tag 'mvebu-fixes-4.13-3' of git://git.infradead.org/linux-mvebu: arm64: dts: marvell: fix number of GPIOs in Armada AP806 description Signed-off-by: Olof Johansson <[email protected]>
2017-09-01Merge branch 'i2c/for-current' of ↵Linus Torvalds2-6/+16
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "The ismt driver had a problem with a rarely used transaction type and the designware driver was made even more robust against non standard ACPI tables" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: designware: Round down ACPI provided clk to nearest supported clk i2c: ismt: Return EMSGSIZE for block reads with bogus length i2c: ismt: Don't duplicate the receive length for block reads
2017-09-01Bluetooth: make baswap src constLoic Poulain2-4/+4
Signed-off-by: Loic Poulain <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2017-09-01fsmap: fix documentation of FMR_OF_LASTDarrick J. Wong1-1/+1
The FMR_OF_LAST flag is set on the last fsmap record being returned for the dataset requested, contrary to what the header file says. Fix the docs to reflect the behavior of all fsmap implementations. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]>
2017-09-01xfs: simplify the rmap code in xfs_bmse_mergeDarrick J. Wong1-4/+3
In Christoph's patch to refactor xfs_bmse_merge, the updated rmap code does more work than it needs to (because map-extent auto-merges records). Remove the unnecessary unmap and save ourselves a deferred op. Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]>
2017-09-01xfs: remove unused flags arg from xfs_file_iomap_begin_delayEric Sandeen1-3/+1
Signed-off-by: Eric Sandeen <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: fix incorrect log_flushed on fsyncAmir Goldstein1-7/+0
When calling into _xfs_log_force{,_lsn}() with a pointer to log_flushed variable, log_flushed will be set to 1 if: 1. xlog_sync() is called to flush the active log buffer AND/OR 2. xlog_wait() is called to wait on a syncing log buffers xfs_file_fsync() checks the value of log_flushed after _xfs_log_force_lsn() call to optimize away an explicit PREFLUSH request to the data block device after writing out all the file's pages to disk. This optimization is incorrect in the following sequence of events: Task A Task B ------------------------------------------------------- xfs_file_fsync() _xfs_log_force_lsn() xlog_sync() [submit PREFLUSH] xfs_file_fsync() file_write_and_wait_range() [submit WRITE X] [endio WRITE X] _xfs_log_force_lsn() xlog_wait() [endio PREFLUSH] The write X is not guarantied to be on persistent storage when PREFLUSH request in completed, because write A was submitted after the PREFLUSH request, but xfs_file_fsync() of task A will be notified of log_flushed=1 and will skip explicit flush. If the system crashes after fsync of task A, write X may not be present on disk after reboot. This bug was discovered and demonstrated using Josef Bacik's dm-log-writes target, which can be used to record block io operations and then replay a subset of these operations onto the target device. The test goes something like this: - Use fsx to execute ops of a file and record ops on log device - Every now and then fsync the file, store md5 of file and mark the location in the log - Then replay log onto device for each mark, mount fs and compare md5 of file to stored value Cc: Christoph Hellwig <[email protected]> Cc: Josef Bacik <[email protected]> Cc: <[email protected]> Signed-off-by: Amir Goldstein <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: disable per-inode DAX flagChristoph Hellwig1-1/+2
Currently flag switching can be used to easily crash the kernel. Disable the per-inode DAX flag until that is sorted out. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: replace xfs_qm_get_rtblks with a direct call to xfs_bmap_count_leavesChristoph Hellwig1-32/+12
Use the existing functionality instead of directly poking into the extent list. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: rewrite xfs_bmap_count_leaves using xfs_iext_get_extentChristoph Hellwig2-11/+11
This avoids poking into the internals of the extent list. Also return the number of extents as the return value instead of an additional by reference argument, and make it available to callers outside of xfs_bmap_util.c Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: use xfs_iext_*_extent helpers in xfs_bmap_split_extent_atChristoph Hellwig1-16/+4
This abstracts the function away from details of the low-level extent list implementation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: use xfs_iext_*_extent helpers in xfs_bmap_shift_extentsChristoph Hellwig1-92/+88
This abstracts the function away from details of the low-level extent list implementation. Note that it seems like the previous implementation of rmap for the merge case was completely broken, but it no seems appear to trigger that. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: move some code around inside xfs_bmap_shift_extentsChristoph Hellwig1-25/+29
For the first right move we need to look up next_fsb. That means our last fsb that contains next_fsb must also be the current extent, so take advantage of that by moving the code around a bit. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()Oleg Nesterov1-16/+26
The race was introduced by me in commit 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead"). I did not realize that nothing can protect eventpoll after ep_poll_callback() sets ->whead = NULL, only whead->lock can save us from the race with ep_free() or ep_remove(). Move ->whead = NULL to the end of ep_poll_callback() and add the necessary barriers. TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even before this patch. Hopefully this explains use-after-free reported by syzcaller: BUG: KASAN: use-after-free in debug_spin_lock_before ... _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148 this is spin_lock(eventpoll->lock), ... Freed by task 17774: ... kfree+0xe8/0x2c0 mm/slub.c:3883 ep_free+0x22c/0x2a0 fs/eventpoll.c:865 Fixes: 971316f0503a ("epoll: ep_unregister_pollwait() can use the freed pwq->whead") Reported-by: 范龙飞 <[email protected]> Cc: [email protected] Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds119-559/+816
Pull networking fixes from David Miller: 1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel Borkmann. 2) IPSEC ESP error paths leak memory, from Steffen Klassert. 3) We need an RCU grace period before freeing fib6_node objects, from Wei Wang. 4) Must check skb_put_padto() return value in HSR driver, from FLorian Fainelli. 5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew Jeffery. 6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric Dumazet. 7) Use after free when tcf_chain_destroy() called multiple times, from Jiri Pirko. 8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli. 9) Fix leak of uninitialized memory in sctp_get_sctp_info(), inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill(). From Stefano Brivio. 10) L2TP tunnel refcount fixes from Guillaume Nault. 11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi Kauperman. 12) Revert a PHY layer change wrt. handling of PHY_HALTED state in phy_stop_machine(), it causes regressions for multiple people. From Florian Fainelli. 13) When packets are sent out of br0 we have to clear the offload_fwdq_mark value. 14) Several NULL pointer deref fixes in packet schedulers when their ->init() routine fails. From Nikolay Aleksandrov. 15) Aquantium devices cannot checksum offload correctly when the packet is <= 60 bytes. From Pavel Belous. 16) Fix vnet header access past end of buffer in AF_PACKET, from Benjamin Poirier. 17) Double free in probe error paths of nfp driver, from Dan Carpenter. 18) QOS capability not checked properly in DCB init paths of mlx5 driver, from Huy Nguyen. 19) Fix conflicts between firmware load failure and health_care timer in mlx5, also from Huy Nguyen. 20) Fix dangling page pointer when DMA mapping errors occur in mlx5, from Eran Ben ELisha. 21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly, from Michael Chan. 22) Missing MSIX vector free in bnxt_en, also from Michael Chan. 23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo Colitti. 24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann. 25) bpf_setsockopts() erroneously always returns -EINVAL even on success. Fix from Yuchung Cheng. 26) tipc_rcv() needs to linearize the SKB before parsing the inner headers, from Parthasarathy Bhuvaragan. 27) Fix deadlock between link status updates and link removal in netvsc driver, from Stephen Hemminger. 28) Missed locking of page fragment handling in ESP output, from Steffen Klassert. 29) Fix refcnt leak in ebpf congestion control code, from Sabrina Dubroca. 30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value, from Christophe Jaillet. 31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during early demux, from Paolo Abeni. 32) Several info leaks in xfrm_user layer, from Mathias Krause. 33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio. 34) Properly propagate obsolete state of route upwards in ipv6 so that upper holders like xfrm can see it. From Xin Long. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits) udp: fix secpath leak bridge: switchdev: Clear forward mark when transmitting packet mlxsw: spectrum: Forbid linking to devices that have uppers wl1251: add a missing spin_lock_init() Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()" net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278 kcm: do not attach PF_KCM sockets to avoid deadlock sch_tbf: fix two null pointer dereferences on init failure sch_sfq: fix null pointer dereference on init failure sch_netem: avoid null pointer deref on init failure sch_fq_codel: avoid double free on init failure sch_cbq: fix null pointer dereferences on init failure sch_hfsc: fix null pointer deref and double free on init failure sch_hhf: fix null pointer dereference on init failure sch_multiq: fix double free on init failure sch_htb: fix crash on init failure net/mlx5e: Fix CQ moderation mode not set properly net/mlx5e: Fix inline header size for small packets net/mlx5: E-Switch, Unload the representors in the correct order net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address ...
2017-09-01Merge tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds2-18/+18
Pull ceph fix from Ilya Dryomov: "ceph fscache page locking fix from Zheng, marked for stable" * tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client: ceph: fix readpage from fscache
2017-09-01xfs: use xfs_iext_get_extent in xfs_bmap_first_unusedChristoph Hellwig1-5/+7
Use the bmap abstraction instead of open-coding bmbt details here. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: switch xfs_bmap_local_to_extents to use xfs_iext_insertChristoph Hellwig1-4/+7
Use the helper instead of open coding it, to provide a better abstraction for the scalable extent list work. This also gets an additional assert and trace point for free. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: add a xfs_iext_update_extent helperChristoph Hellwig3-6/+20
This helper is used to update an extent record based on the extent index, and can be used to provide a level of abstractions between callers that want to modify in-core extent records and the details of the extent list implementation. Also switch all users of the xfs_bmbt_set_all(xfs_iext_get_ext(...)) pattern to this new helper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: consolidate the various page fault handlersChristoph Hellwig2-65/+60
Add a new __xfs_filemap_fault helper that implements all four page fault callouts, and make these methods themselves small stubs that set the correct write_fault flag, and exit early for the non-DAX case for the hugepage related ones. Also remove the extra size checking in the pfn_fault path, which is now handled in the core DAX code. Life would be so much simpler if we only had one method for all this. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01iomap: return VM_FAULT_* codes from iomap_page_mkwriteChristoph Hellwig2-3/+2
All callers will need the VM_FAULT_* flags, so convert in the helper. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: relog dirty buffers during swapext bmbt owner changeBrian Foster2-18/+65
The owner change bmbt scan that occurs during extent swap operations does not handle ordered buffer failures. Buffers that cannot be marked ordered must be physically logged so previously dirty ranges of the buffer can be relogged in the transaction. Since the bmbt scan may need to process and potentially log a large number of blocks, we can't expect to complete this operation in a single transaction. Update extent swap to use a permanent transaction with enough log reservation to physically log a buffer. Update the bmbt scan to physically log any buffers that cannot be ordered and to terminate the scan with -EAGAIN. On -EAGAIN, the caller rolls the transaction and restarts the scan. Finally, update the bmbt scan helper function to skip bmbt blocks that already match the expected owner so they are not reprocessed after scan restarts. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> [darrick: fix the xfs_trans_roll call] Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: disallow marking previously dirty buffers as orderedBrian Foster2-3/+6
Ordered buffers are used in situations where the buffer is not physically logged but must pass through the transaction/logging pipeline for a particular transaction. As a result, ordered buffers are not unpinned and written back until the transaction commits to the log. Ordered buffers have a strict requirement that the target buffer must not be currently dirty and resident in the log pipeline at the time it is marked ordered. If a dirty+ordered buffer is committed, the buffer is reinserted to the AIL but not physically relogged at the LSN of the associated checkpoint. The buffer log item is assigned the LSN of the latest checkpoint and the AIL effectively releases the previously logged buffer content from the active log before the buffer has been written back. If the tail pushes forward and a filesystem crash occurs while in this state, an inconsistent filesystem could result. It is currently the caller responsibility to ensure an ordered buffer is not already dirty from a previous modification. This is unclear and error prone when not used in situations where it is guaranteed a buffer has not been previously modified (such as new metadata allocations). To facilitate general purpose use of ordered buffers, update xfs_trans_ordered_buf() to conditionally order the buffer based on state of the log item and return the status of the result. If the bli is dirty, do not order the buffer and return false. The caller must either physically log the buffer (having acquired the appropriate log reservation) or push it from the AIL to clean it before it can be marked ordered in the current transaction. Note that ordered buffers are currently only used in two situations: 1.) inode chunk allocation where previously logged buffers are not possible and 2.) extent swap which will be updated to handle ordered buffer failures in a separate patch. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: move bmbt owner change to last step of extent swapBrian Foster1-18/+26
The extent swap operation currently resets bmbt block owners before the inode forks are swapped. The bmbt buffers are marked as ordered so they do not have to be physically logged in the transaction. This use of ordered buffers is not safe as bmbt buffers may have been previously physically logged. The bmbt owner change algorithm needs to be updated to physically log buffers that are already dirty when/if they are encountered. This means that an extent swap will eventually require multiple rolling transactions to handle large btrees. In addition, all inode related changes must be logged before the bmbt owner change scan begins and can roll the transaction for the first time to preserve fs consistency via log recovery. In preparation for such fixes to the bmbt owner change algorithm, refactor the bmbt scan out of the extent fork swap code to the last operation before the transaction is committed. Update xfs_swap_extent_forks() to only set the inode log flags when an owner change scan is necessary. Update xfs_swap_extents() to trigger the owner change based on the inode log flags. Note that since the owner change now occurs after the extent fork swap, the inode btrees must be fixed up with the inode number of the current inode (similar to log recovery). Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: skip bmbt block ino validation during owner changeBrian Foster3-1/+4
Extent swap uses xfs_btree_visit_blocks() to fix up bmbt block owners on v5 (!rmapbt) filesystems. The bmbt scan uses xfs_btree_lookup_get_block() to read bmbt blocks which verifies the current owner of the block against the parent inode of the bmbt. This works during extent swap because the bmbt owners are updated to the opposite inode number before the inode extent forks are swapped. The modified bmbt blocks are marked as ordered buffers which allows everything to commit in a single transaction. If the transaction commits to the log and the system crashes such that recovery of the extent swap is required, log recovery restarts the bmbt scan to fix up any bmbt blocks that may have not been written back before the crash. The log recovery bmbt scan occurs after the inode forks have been swapped, however. This causes the bmbt block owner verification to fail, leads to log recovery failure and requires xfs_repair to zap the log to recover. Define a new invalid inode owner flag to inform the btree block lookup mechanism that the current inode may be invalid with respect to the current owner of the bmbt block. Set this flag on the cursor used for change owner scans to allow this operation to work at runtime and during log recovery. Signed-off-by: Brian Foster <[email protected]> Fixes: bb3be7e7c ("xfs: check for bogus values in btree block headers") Cc: [email protected] Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: don't log dirty ranges for ordered buffersBrian Foster3-18/+16
Ordered buffers are attached to transactions and pushed through the logging infrastructure just like normal buffers with the exception that they are not actually written to the log. Therefore, we don't need to log dirty ranges of ordered buffers. xfs_trans_log_buf() is called on ordered buffers to set up all of the dirty state on the transaction, buffer and log item and prepare the buffer for I/O. Now that xfs_trans_dirty_buf() is available, call it from xfs_trans_ordered_buf() so the latter is now mutually exclusive with xfs_trans_log_buf(). This reflects the implementation of ordered buffers and helps eliminate confusion over the need to log ranges of ordered buffers just to set up internal log state. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: refactor buffer logging into buffer dirtying helperBrian Foster2-17/+33
xfs_trans_log_buf() is responsible for logging the dirty segments of a buffer along with setting all of the necessary state on the transaction, buffer, bli, etc., to ensure that the associated items are marked as dirty and prepared for I/O. We have a couple use cases that need to to dirty a buffer in a transaction without actually logging dirty ranges of the buffer. One existing use case is ordered buffers, which are currently logged with arbitrary ranges to accomplish this even though the content of ordered buffers is never written to the log. Another pending use case is to relog an already dirty buffer across rolled transactions within the deferred operations infrastructure. This is required to prevent a held (XFS_BLI_HOLD) buffer from pinning the tail of the log. Refactor xfs_trans_log_buf() into a new function that contains all of the logic responsible to dirty the transaction, lidp, buffer and bli. This new function can be used in the future for the use cases outlined above. This patch does not introduce functional changes. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Allison Henderson <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: ordered buffer log items are never formattedBrian Foster2-11/+2
Ordered buffers pass through the logging infrastructure without ever being written to the log. The way this works is that the ordered buffer status is transferred to the log vector at commit time via the ->iop_size() callback. In xlog_cil_insert_format_items(), ordered log vectors bypass ->iop_format() processing altogether. Therefore it is unnecessary for xfs_buf_item_format() to handle ordered buffers. Remove the unnecessary logic and assert that an ordered buffer never reaches this point. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: remove unnecessary dirty bli format check for ordered bufsBrian Foster2-30/+33
xfs_buf_item_unlock() historically checked the dirty state of the buffer by manually checking the buffer log formats for dirty segments. The introduction of ordered buffers invalidated this check because ordered buffers have dirty bli's but no dirty (logged) segments. The check was updated to accommodate ordered buffers by looking at the bli state first and considering the blf only if the bli is clean. This logic is safe but unnecessary. There is no valid case where the bli is clean yet the blf has dirty segments. The bli is set dirty whenever the blf is logged (via xfs_trans_log_buf()) and the blf is cleared in the only place BLI_DIRTY is cleared (xfs_trans_binval()). Remove the conditional blf dirty checks and replace with an assert that should catch any discrepencies between bli and blf dirty states. Refactor the old blf dirty check into a helper function to be used by the assert. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: open-code xfs_buf_item_dirty()Brian Foster3-13/+1
It checks a single flag and has one caller. It probably isn't worth its own function. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: remove the ip argument to xfs_defer_finishChristoph Hellwig15-116/+129
And instead require callers to explicitly join the inode using xfs_defer_ijoin. Also consolidate the defer error handling in a few places using a goto label. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: rename xfs_defer_join to xfs_defer_ijoinChristoph Hellwig3-4/+4
Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: refactor xfs_trans_rollChristoph Hellwig9-56/+48
Split xfs_trans_roll into a low-level helper that just rolls the actual transaction and a new higher level xfs_trans_roll_inode that takes care of logging and rejoining the inode. This gets rid of the NULL inode case, and allows to simplify the special cases in the deferred operation code. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: check for race with xfs_reclaim_inode() in xfs_ifree_cluster()Omar Sandoval2-10/+23
After xfs_ifree_cluster() finds an inode in the radix tree and verifies that the inode number is what it expected, xfs_reclaim_inode() can swoop in and free it. xfs_ifree_cluster() will then happily continue working on the freed inode. Most importantly, it will mark the inode stale, which will probably be overwritten when the inode slab object is reallocated, but if it has already been reallocated then we can end up with an inode spuriously marked stale. In 8a17d7ddedb4 ("xfs: mark reclaimed inodes invalid earlier") we added a second check to xfs_iflush_cluster() to detect this race, but the similar RCU lookup in xfs_ifree_cluster() needs the same treatment. Signed-off-by: Omar Sandoval <[email protected]> Reviewed-by: Brian Foster <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-09-01xfs: evict all inodes involved with log redo itemDarrick J. Wong4-1/+14
When we introduced the bmap redo log items, we set MS_ACTIVE on the mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes from being truncated prematurely during log recovery. This also had the effect of putting linked inodes on the lru instead of evicting them. Unfortunately, we neglected to find all those unreferenced lru inodes and evict them after finishing log recovery, which means that we leak them if anything goes wrong in the rest of xfs_mountfs, because the lru is only cleaned out on unmount. Therefore, evict unreferenced inodes in the lru list immediately after clearing MS_ACTIVE. Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: Darrick J. Wong <[email protected]> Cc: [email protected] Reviewed-by: Brian Foster <[email protected]>
2017-09-01Merge branch 'for-linus' of ↵Linus Torvalds2-20/+39
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fixes from Dmitry Torokhov: "Just a couple drivers fixes (Synaptics PS/2, Xpad)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - fix PowerA init quirk for some gamepad models Input: synaptics - fix device info appearing different on reconnect
2017-09-01selftests: correct define in msg_zerocopy.cWillem de Bruijn1-3/+3
The msg_zerocopy test defines SO_ZEROCOPY if necessary, but its value is inconsistent with the one in asm-generic.h. Correct that. Also convert one error to a warning. When the test is complete, report throughput and close cleanly even if the process did not wait for all completions. Reported-by: Dan Melnic <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-09-01Merge tag 'mmc-v4.13-rc7' of ↵Linus Torvalds2-3/+22
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull two more MMC fixes from Ulf Hansson: "MMC core: - Fix block status codes MMC host: - sdhci-xenon: Fix SD bus voltage select" * tag 'mmc-v4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sdhci-xenon: add set_power callback mmc: block: Fix block status codes
2017-09-01doc: document MSG_ZEROCOPYWillem de Bruijn1-0/+257
Documentation for this feature was missing from the patchset. Copied a lot from the netdev 2.1 paper, addressing some small interface changes since then. Changes v1 -> v2 - change email discussion URL format - clarify that u32 counter is per-syscall, unsigned and wraps after UINT_MAX calls - describe errno on send failure specific to MSG_ZEROCOPY - a few very minor rewordings Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>