aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-08-03RDMA/efa: Remove double QP type assignmentLeon Romanovsky1-1/+0
The QP type is set by the IB/core and shouldn't be set in the driver. Fixes: 40909f664d27 ("RDMA/efa: Add EFA verbs implementation") Link: https://lore.kernel.org/r/838c40134c1590167b888ca06ad51071139ff2ae.1627040189.git.leonro@nvidia.com Acked-by: Gal Pressman <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-08-03RDMA/hns: Don't overwrite supplied QP attributesLeon Romanovsky1-7/+1
QP attributes that were supplied by IB/core already have all parameters set when they are passed to the driver. The drivers are not supposed to change anything in struct ib_qp_init_attr. Fixes: 66d86e529dd5 ("RDMA/hns: Add UD support for HIP09") Link: https://lore.kernel.org/r/5987138875e8ade9aa339d4db6e1bd9694ed4591.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-08-03RDMA/hns: Don't skip IB creation flow for regular RC QPLeon Romanovsky1-3/+3
The call to internal QP creation function skips QP creation checks and misses the addition of such device QPs to the restrack DB. As a preparation to general allocation scheme, convert hns to use proper API. Link: https://lore.kernel.org/r/7b236c15f7d5abb368958297ac6962d8459cb824.1627040189.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/qedr: Improve error logs for rdma_alloc_tid error returnPrabhakar Kushwaha1-3/+15
Use -EINVAL return type to identify whether error is returned because of "Out of MR resources" or any other error types. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shai Malin <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Prabhakar Kushwaha <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/qed: Use accurate error num in qed_cxt_dynamic_ilt_allocPrabhakar Kushwaha1-2/+2
To have more accurate error return type use -EOPNOTSUPP instead of -EINVAL. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Shai Malin <[email protected]> Signed-off-by: Ariel Elior <[email protected]> Signed-off-by: Prabhakar Kushwaha <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/hfi1: Fix typo in commentsCai Huoqing5-7/+7
Remove the repeated word 'the' from comments Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Cai Huoqing <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30docs: Fix infiniband uverbs minor numberLeon Romanovsky1-3/+3
Starting from the beginning of infiniband subsystem, the uverbs char devices start from 192 as a minor number, see commit bc38a6abdd5a ("[PATCH] IB uverbs: core implementation"). This patch updates the admin guide documentation to reflect it. Fixes: 9d85025b0418 ("docs-rst: create an user's manual book") Link: https://lore.kernel.org/r/bad03e6bcde45550c01e12908a6fe7dfa4770703.1627477347.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are validLeon Romanovsky3-68/+1
The core netlink code alread guarentees that no netlink callback can be running outside the rdma_nl_register/unregister() region and this registration happens during module init/exit. Thus it is already prevented that iwpm_valid_client() can ever fail. Remove it. Link: https://lore.kernel.org/r/a9f05a78f9996bf6ea47099b5e02671bf742f5ab.1627048781.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/iwpm: Remove not-needed reference countingLeon Romanovsky2-47/+16
iwpm_init() and iwpm_exit() are called only once during iw_cm module load. This makes whole reference count implementation not needed at all. Link: https://lore.kernel.org/r/1778ded873ba58c9fadc5bb25038de1cec843bec.1627048781.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/iwcm: Release resources if iw_cm module initialization failsLeon Romanovsky1-7/+12
The failure during iw_cm module initialization partially left the system with unreleased memory and other resources. Rewrite the module init/exit routines in such way that netlink commands will be opened only after successful initialization. Fixes: b493d91d333e ("iwcm: common code for port mapper") Link: https://lore.kernel.org/r/b01239f99cb1a3e6d2b0694c242d89e6410bcd93.1627048781.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30RDMA/hfi1: Convert from atomic_t to refcount_t on hfi1_devdata->user_refcountXiyu Yang4-6/+7
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xiyu Yang <[email protected]> Signed-off-by: Xin Tan <[email protected]> Tested-by: Josh Fisher <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30IB/hfi1: Adjust pkey entry in index 0Mike Marciniszyn1-6/+1
It is possible for the primary IPoIB network device associated with any RDMA device to fail to join certain multicast groups preventing IPv6 neighbor discovery and possibly other network ULPs from working correctly. The IPv4 broadcast group is not affected as the IPoIB network device handles joining that multicast group directly. This is because the primary IPoIB network device uses the pkey at ndex 0 in the associated RDMA device's pkey table. Anytime the pkey value of index 0 changes, the primary IPoIB network device automatically modifies it's broadcast address (i.e. /sys/class/net/[ib0]/broadcast), since the broadcast address includes the pkey value, and then bounces carrier. This includes initial pkey assignment, such as when the pkey at index 0 transitions from the opa default of invalid (0x0000) to some value such as the OPA default pkey for Virtual Fabric 0: 0x8001 or when the fabric manager is restarted with a configuration change causing the pkey at index 0 to change. Many network ULPs are not sensitive to the carrier bounce and are not expecting the broadcast address to change including the linux IPv6 stack. This problem does not affect IPoIB child network devices as their pkey value is constant for all time. To mitigate this issue, change the default pkey in at index 0 to 0x8001 to cover the predominant case and avoid issues as ipoib comes up and the FM sweeps. At some point, ipoib multicast support should automatically fix non-broadcast addresses as it does with the primary broadcast address. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://lore.kernel.org/r/[email protected] Suggested-by: Josh Collier <[email protected]> Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-30IB/hfi1: Indicate DMA wait when txq is queued for wakeupMike Marciniszyn1-0/+3
There is no counter for dmawait in AIP, which hampers debugging performance issues. Add the counter increment when the txq is queued. Fixes: d99dc602e2a5 ("IB/hfi1: Add functions to transmit datagram ipoib packets") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Marciniszyn <[email protected]> Signed-off-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-25IB/mlx5: Rename is_apu_thread_cq function to is_apu_cqTal Gilboa8-13/+12
is_apu_thread_cq() used to detect CQs which are attached to APU threads. This was extended to support other elements as well, so the function was renamed to is_apu_cq(). c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't reflected when the APU support was first introduced. Acked-by: Michael S. Tsirkin <[email protected]> # vdpa Signed-off-by: Tal Gilboa <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2021-07-20Merge branch 'mlx5_dcs' into rdma.git for-nextJason Gunthorpe4-5/+204
Leon Romanovsky says: ==================== Add ConnectX DCS offload support This patchset from Lior adds support of DCI stream channel (DCS) support. DCS is an offload to SW load balancing of DC initiator work requests. A single DC QP initiator (DCI) can be connected to only one target at the time and can't start new connection until the previous work request is completed. This limitation causes to delays when the initiator process needs to transfer data to multiple targets at the same time. ==================== * branch 'mlx5_dcs': RDMA/mlx5: Add DCS offload support RDMA/mlx5: Separate DCI QP creation logic net/mlx5: Add DCS caps & fields support
2021-07-20RDMA/mlx5: Add DCS offload supportLior Nahmanson3-2/+36
DCS is an offload to SW load balancing of DC initiator work requests. A single DCI can be connected to only one target at the time and can't start new connection until the previous work request is completed. This limitation will cause to delay when the initiator process needs to transfer data to multiple targets at the same time. The SW solution is to use a process that handling and spreading the work request on many DCIs according to destinations. This feature is an offload to this process and coming to reduce the load from the CPU and improve the performance. Link: https://lore.kernel.org/r/491c2c2afdb5b07de7f03eab3f93cf0704549dbc.1624258894.git.leonro@nvidia.com Reviewed-by: Meir Lichtinger <[email protected]> Signed-off-by: Lior Nahmanson <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-20RDMA/mlx5: Separate DCI QP creation logicLior Nahmanson1-0/+157
This patch isolates DCI QP creation logic to separate function, so this change will reduce complexity when adding new features to DCI QP without interfering with other QP types. The code was copied from create_user_qp() while taking only DCI relevant bits. Link: https://lore.kernel.org/r/b4530bdd999349c59691224f016ff1efb5dc3b92.1624258894.git.leonro@nvidia.com Reviewed-by: Meir Lichtinger <[email protected]> Signed-off-by: Lior Nahmanson <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-18net/mlx5: Add DCS caps & fields supportLior Nahmanson1-3/+11
This fields will be needed when adding a support for DCS offload max_dci_stream_channels - maximum DCI stream channels supported per DCI. max_dci_errored_streams - maximum DCI error stream channels supported per DCI before a DCI move to error state. Signed-off-by: Lior Nahmanson <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2021-07-16RDMA/rxe: Fix types in rxe_icrc.cBob Pearson1-14/+14
Currently the ICRC is generated as a u32 type and then forced to a __be32 and stored into the ICRC field in the packet. The actual type of the ICRC is __be32. This patch replaces u32 by __be32 and eliminates the casts. The computation is exactly the same as the original but the types are more consistent. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Add kernel-doc comments to rxe_icrc.cBob Pearson1-3/+29
This patch adds kernel-doc style comments to rxe_icrc.c Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Move crc32 init code to rxe_icrc.cBob Pearson4-9/+22
This patch collects the code from rxe_register_device() that sets up the crc32 calculation into a subroutine rxe_icrc_init() in rxe_icrc.c. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Fixup rxe_icrc_hdrBob Pearson2-4/+3
rxe_icrc_hdr() in rxe_icrc.c is no longer shared. This patch makes it static and changes the parameter list to match the other routines there. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Move rxe_crc32 to a subroutineBob Pearson2-21/+21
Move rxe_crc32() from rxe.h to rxe_icrc.c as a static local function. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Move ICRC generation to a subroutineBob Pearson7-64/+37
Isolate ICRC generation into a single subroutine named rxe_generate_icrc() in rxe_icrc.c. Remove scattered crc generation code from elsewhere. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Fixup rxe_send and rxe_loopbackBob Pearson2-16/+14
Fixup rxe_send() and rxe_loopback() in rxe_net.c to have the same calling sequence. This patch makes them static and have the same parameter list and return value. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Move rxe_xmit_packet to a subroutineBob Pearson2-43/+45
rxe_xmit_packet() was an overlong inline subroutine. This patch moves it into rxe_net.c as an ordinary subroutine. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Reviewed-by: Zhu Yanjun <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16RDMA/rxe: Move ICRC checking to a subroutineBob Pearson3-21/+42
Move the code in rxe_recv() that checks the ICRC on incoming packets to a subroutine rxe_check_icrc() and move that to rxe_icrc.c. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Reviewed-by: Zhu Yanjun <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16IB/core: Read subnet_prefix in ib_query_port via cache.Anand Khoje1-6/+2
ib_query_port() calls device->ops.query_port() to get the port attributes. The method of querying is device driver specific. The same function calls device->ops.query_gid() to get the GID and extract the subnet_prefix (gid_prefix). The GID and subnet_prefix are stored in a cache. But they do not get read from the cache if the device is an Infiniband device. The following change takes advantage of the cached subnet_prefix. Testing with RDBMS has shown a significant improvement in performance with this change. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Anand Khoje <[email protected]> Signed-off-by: Haakon Bugge <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16IB/core: Shifting initialization of device->cache_lockAnand Khoje2-2/+2
The lock cache_lock of struct ib_device is initialized in function ib_cache_setup_one(). This is much later than the device initialization in _ib_alloc_device(). This change shifts initialization of cache_lock in _ib_alloc_device(). Link: https://lore.kernel.org/r/[email protected] Suggested-by: Haakon Bugge <[email protected]> Signed-off-by: Anand Khoje <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-16IB/core: Updating cache for subnet_prefix in config_non_roce_gid_cache()Anand Khoje1-3/+5
Currently, cache for subnet_prefix was getting updated by reading port attributes via ib_query_port. ib_query_port() calls ops.query_gid() to get subnet_prefix and returns it via port_attr. In ib_cache_update(), config_non_roce_gid_cache() obtains GIDs by calling ops.query_gid(). We utilize this to store subnet_prefix in cache. Link: https://lore.kernel.org/r/[email protected] Suggested-by: Jason Gunthorpe <[email protected]> Suggested-by: Aru Kolappan <[email protected]> Signed-off-by: Anand Khoje <[email protected]> Signed-off-by: Haakon Bugge <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/efa: Split hardware stats to device and port statsGal Pressman1-48/+70
The hardware stats API distinguishes between device and port statistics, split the EFA stats accordingly instead of always dumping everything. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Firas JahJah <[email protected]> Reviewed-by: Yossi Leybovich <[email protected]> Signed-off-by: Gal Pressman <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15MAINTAINERS: Update maintainers of HiSilicon RoCEWeihang Li1-1/+1
Lijun has moved to work in other technical areas, and Wenpeng will maintain this modules instead of him. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Weihang Li <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rxe: Remove the repeated 'mr->umem = umem'Xiao Yang1-1/+0
Drop duplicated code Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Xiao Yang <[email protected]> Reviewed-by: Håkon Bugge <[email protected]> Reviewed-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()Ira Weiny1-11/+19
kmap() is being deprecated and will break uses of device dax after PKS protection is introduced.[1] The use of kmap() in siw_tx_hdt() is all thread local therefore kmap_local_page() is a sufficient replacement and will work with pgmap protected pages when those are implemented. siw_tx_hdt() tracks pages used in a page_array. It uses that array to unmap pages which were mapped on function exit. Not all entries in the array are mapped and this is tracked in kmap_mask. kunmap_local() takes a mapped address rather than a page. Alter siw_unmap_pages() to take the iov array to reuse the iov_base address of each mapping. Use PAGE_MASK to get the proper address for kunmap_local(). kmap_local_page() mappings are tracked in a stack and must be unmapped in the opposite order they were mapped in. Because segments are mapped into the page array in increasing index order, modify siw_unmap_pages() to unmap pages in decreasing order. Use kmap_local_page() instead of kmap() to map pages in the page_array. [1] https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ira Weiny <[email protected]> Reviewed-by: Bernard Metzler <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/siw: Remove kmap()Ira Weiny1-6/+8
kmap() is being deprecated and will break uses of device dax after PKS protection is introduced.[1] These uses of kmap() in the SIW driver are thread local. Therefore kmap_local_page() is sufficient to use and will work with pgmap protected pages when those are implemnted. There is one more use of kmap() in this driver which is split into its own patch because kmap_local_page() has strict ordering rules and the use of the kmap_mask over multiple segments must be handled carefully. Therefore, that conversion is handled in a stand alone patch. Use kmap_local_page() instead of kmap() in the 'easy' cases. [1] https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ira Weiny <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: Move sq_wr_avail to rtrs_conJack Wang5-5/+7
In order to account HB for sq_wr_avail properly, move sq_wr_avail from rtrs_srv_con to rtrs_con. Although rtrs-clt do not care sq_wr_avail, but still init it to max_send_wr. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: Remove unused flags parameterJack Wang1-3/+2
flags is not used, so remove it from rtrs_post_rdma_write_imm_empty. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: Make rtrs_post_rdma_write_imm_empty staticJack Wang2-7/+5
It's only used in rtrs.c, so no need to export. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: Enable the same selective signal for heartbeat and IOJack Wang4-8/+18
On idle session, because we do not do signal for heartbeat, it will overflow the send queue after sometime. To avoid that, we need to enable the signal for heartbeat. To do that, add a new member signal_interval in rtrs_path, which will set min of queue_depth and SERVICE_CON_QUEUE_DEPTH, and track it for both heartbeat and IO, so the sq queue full accounting is correct. Fixes: b38041d50add ("RDMA/rtrs: Do not signal for heatbeat") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: move wr_cnt from rtrs_srv_con to rtrs_conJack Wang5-8/+8
We need to track also the wr used for heatbeat. This is a preparation for that, will be used in later patch. The io_cnt in rtrs_clt is removed, use wr_cnt instead. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-15RDMA/rtrs: Add error messages for failed operations.Jack Wang1-0/+3
It could help debugging in case of error happens. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jack Wang <[email protected]> Reviewed-by: Aleksei Marov <[email protected]> Reviewed-by: Gioh Kim <[email protected]> Reviewed-by: Md Haris Iqbal <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2021-07-11Linux 5.14-rc1Linus Torvalds1-2/+2
2021-07-11mm/rmap: try_to_migrate() skip zone_device !device_privateHugh Dickins1-3/+3
I know nothing about zone_device pages and !device_private pages; but if try_to_migrate_one() will do nothing for them, then it's better that try_to_migrate() filter them first, than trawl through all their vmas. Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11mm/rmap: fix new bug: premature return from page_mlock_one()Hugh Dickins1-6/+5
In the unlikely race case that page_mlock_one() finds VM_LOCKED has been cleared by the time it got page table lock, page_vma_mapped_walk_done() must be called before returning, either explicitly, or by a final call to page_vma_mapped_walk() - otherwise the page table remains locked. Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/ Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11mm/rmap: fix old bug: munlocking THP missed other mlocksHugh Dickins1-5/+8
The kernel recovers in due course from missing Mlocked pages: but there was no point in calling page_mlock() (formerly known as try_to_munlock()) on a THP, because nothing got done even when it was found to be mapped in another VM_LOCKED vma. It's true that we need to be careful: Mlocked accounting of pte-mapped THPs is too difficult (so consistently avoided); but Mlocked accounting of only-pmd-mapped THPs is supposed to work, even when multiple mappings are mlocked and munlocked or munmapped. Refine the tests. There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so page_mlock_one() does not even have to worry about that complication. (I said the kernel recovers: but would page reclaim be likely to split THP before rediscovering that it's VM_LOCKED? I've not followed that up) Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages") Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11mm/rmap: fix comments left over from recent changesHugh Dickins2-7/+2
Parallel developments in mm/rmap.c have left behind some out-of-date comments: try_to_migrate_one() also accepts TTU_SYNC (already commented in try_to_migrate() itself), and try_to_migrate() returns nothing at all. TTU_SPLIT_FREEZE has just been deleted, so reword the comment about it in mm/huge_memory.c; and TTU_IGNORE_ACCESS was removed in 5.11, so delete the "recently referenced" comment from try_to_unmap_one() (once upon a time the comment was near the removed codeblock, but they drifted apart). Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Cc: Andrew Morton <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Yang Shi <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-11Merge tag 'irq-urgent-2021-07-11' of ↵Linus Torvalds6-12/+31
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Ingo Molnar: "Two fixes: - Fix a MIPS IRQ handling RCU bug - Remove a DocBook annotation for a parameter that doesn't exist anymore" * tag 'irq-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/mips: Fix RCU violation when using irqdomain lookup on interrupt entry genirq/irqdesc: Drop excess kernel-doc entry @lookup
2021-07-11Merge tag 'sched-urgent-2021-07-11' of ↵Linus Torvalds2-10/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Three fixes: - Fix load tracking bug/inconsistency - Fix a sporadic CFS bandwidth constraints enforcement bug - Fix a uclamp utilization tracking bug for newly woken tasks" * tag 'sched-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/uclamp: Ignore max aggregation if rq is idle sched/fair: Fix CFS bandwidth hrtimer expiry type sched/fair: Sync load_sum with load_avg after dequeue
2021-07-11Merge tag 'perf-urgent-2021-07-11' of ↵Linus Torvalds2-8/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A fix and a hardware-enablement addition: - Robustify uncore_snbep's skx_iio_set_mapping()'s error cleanup - Add cstate event support for Intel ICELAKE_X and ICELAKE_D" * tag 'perf-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/uncore: Clean up error handling path of iio mapping perf/x86/cstate: Add ICELAKE_X and ICELAKE_D support
2021-07-11Merge tag 'locking-urgent-2021-07-11' of ↵Linus Torvalds5-23/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Fix a Sparc crash - Fix a number of objtool warnings - Fix /proc/lockdep output on certain configs - Restore a kprobes fail-safe * tag 'locking-urgent-2021-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/atomic: sparc: Fix arch_cmpxchg64_local() kprobe/static_call: Restore missing static_call_text_reserved() static_call: Fix static_call_text_reserved() vs __init jump_label: Fix jump_label_text_reserved() vs __init locking/lockdep: Fix meaningless /proc/lockdep output of lock classes on !CONFIG_PROVE_LOCKING