blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2021-09-16	arm64: Mark __stack_chk_guard as __ro_after_init	Dan Li	1	-1/+1
	__stack_chk_guard is setup once while init stage and never changed after that. Although the modification of this variable at runtime will usually cause the kernel to crash (so does the attacker), it should be marked as __ro_after_init, and it should not affect performance if it is placed in the ro_after_init section. Signed-off-by: Dan Li <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2021-09-16	arm64/kernel: remove duplicate include in process.c	Lv Ruyi	1	-1/+0
	Remove all but the first include of linux/sched.h from process.c Reported-by: Zeal Robot <[email protected]> Signed-off-by: Lv Ruyi <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2021-09-16	arm64/sve: Use correct size when reinitialising SVE state	Mark Brown	1	-1/+1
	When we need a buffer for SVE register state we call sve_alloc() to make sure that one is there. In order to avoid repeated allocations and frees we keep the buffer around unless we change vector length and just memset() it to ensure a clean register state. The function that deals with this takes the task to operate on as an argument, however in the case where we do a memset() we initialise using the SVE state size for the current task rather than the task passed as an argument. This is only an issue in the case where we are setting the register state for a task via ptrace and the task being configured has a different vector length to the task tracing it. In the case where the buffer is larger in the traced process we will leak old state from the traced process to itself, in the case where the buffer is smaller in the traced process we will overflow the buffer and corrupt memory. Fixes: bc0ee4760364 ("arm64/sve: Core task context handling") Cc: <[email protected]> # 4.15.x Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]>
2021-09-16	dt-bindings: ufs: Add bindings for Samsung ufs host	Alim Akhtar	1	-0/+89
	This patch adds DT bindings for Samsung ufs hci Signed-off-by: Alim Akhtar <[email protected]> Signed-off-by: Chanho Park <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2021-09-16	drm/amdgpu/display: add a proper license to dc_link_dp.c	Alex Deucher	1	-1/+23
	Was missing. Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: Fix white screen page fault for gpuvm	Nicholas Kazlauskas	1	-0/+2
	[Why] The "base_addr_is_mc_addr" field was added for dcn3.1 support but pa_config was never updated to set it to false. Uninitialized memory causes it to be set to true which results in address mistranslation and white screen. [How] Use memset to ensure all fields are initialized to 0 by default. Fixes: 64b1d0e8d500 ("drm/amd/display: Add DCN3.1 HWSEQ") Signed-off-by: Nicholas Kazlauskas <[email protected]> Acked-by: Alex Deucher <[email protected]> Acked-by: Aaron Liu <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	amd/display: enable panel orientation quirks	Simon Ser	1	-0/+28
	This patch allows panel orientation quirks from DRM core to be used. They attach a DRM connector property "panel orientation" which indicates in which direction the panel has been mounted. Some machines have the internal screen mounted with a rotation. Since the panel orientation quirks need the native mode from the EDID, check for it in amdgpu_dm_connector_ddc_get_modes. Signed-off-by: Simon Ser <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Nicholas Kazlauskas <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: Demote TMZ unsupported log message from warning to info	Paul Menzel	1	-1/+1
	As the user cannot do anything about the unsupported Trusted Memory Zone (TMZ) feature, do not warn about it, but make it informational, so demote the log level from warning to info. Signed-off-by: Paul Menzel <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_count	Michel Dänzer	2	-2/+2
	This was unusual; normally, inline functions are declared static as well, and defined in a header file if used by multiple compilation units. The latter would be more involved in this case, so just drop the inline declaration for now. Fixes compile failure building for ppc64le on RHEL 8: In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32, from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available 90 \| inline uint32_t amdgpu_ras_eeprom_max_record_count(void); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here 1985 \| max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count(); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)" Reviewed-by: Lyude Paul <[email protected]> Signed-off-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/pm: fix runpm hang when amdgpu loaded prior to sound driver	Evan Quan	4	-4/+47
	Current RUNPM mechanism relies on PMFW to master the timing for BACO in/exit. And that needs cooperation from sound driver for dstate change notification for function 1(audio). Otherwise(on sound driver missing), BACO cannot be kicked in correctly and hang will be observed on RUNPM exit. By switching back to legacy message way on sound driver missing, we are able to fix the runpm hang observed for the scenario below: amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded Signed-off-by: Evan Quan <[email protected]> Reported-and-tested-by: Pierre-Eric Pelloux-Prayer <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	drm/radeon: pass drm dev radeon_agp_head_init directly	Nirmoy Das	1	-1/+1
	Pass drm dev directly as rdev->ddev gets initialized later on at radeon_device_init(). Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214375 Signed-off-by: Nirmoy Das <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	drm/amdgpu: move iommu_resume before ip init/resume	James Zhu	1	-0/+12
	Separate iommu_resume from kfd_resume, and move it before other amdgpu ip init/resume. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	drm/amdgpu: add amdgpu_amdkfd_resume_iommu	James Zhu	2	-0/+11
	Add amdgpu_amdkfd_resume_iommu for amdgpu. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	drm/amdkfd: separate kfd_iommu_resume from kfd_resume	James Zhu	2	-4/+14
	Separate kfd_iommu_resume from kfd_resume for fine-tuning of amdgpu device init/resume/reset/recovery sequence. v2: squash in fix for !CONFIG_HSA_AMD Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211277 Signed-off-by: James Zhu <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-09-16	drm/amd/display: Link training retry fix for abort case	Meenakshikumar Somasundaram	1	-3/+7
	[Why] If link training is aborted, it shall be retried if sink is present. [How] Check hpd status to find out whether sink is present or not. If sink is present, then link training shall be tried again with same settings. Otherwise, link training shall be aborted. Reviewed-by: Jimmy Kizito <[email protected]> Acked-by: Mikita Lipski <[email protected]> Signed-off-by: Meenakshikumar Somasundaram <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: Fix unstable HPCP compliance on Chrome Barcelo	Qingqing Zhuo	1	-2/+20
	[Why] Intermittently, there presents two occurrences of 0 stream commits in a single HPD event. Current HDCP sequence does not consider such scenerio, and will thus disable HDCP. [How] Add condition check to include stream remove and re-enable case for HDCP enable. Reviewed-by: Bhawanpreet Lakha <[email protected]> Acked-by: Mikita Lipski <[email protected]> Signed-off-by: Qingqing Zhuo <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: dsc mst 2 4K displays go dark with 2 lane HBR3	Hersen Wu	3	-17/+34
	[Why] call stack of amdgpu dsc mst pbn, slot num calculation is as below: -compute_bpp_x16_from_target_bandwidth -decide_dsc_target_bpp_x16 -setup_dsc_config -dc_dsc_compute_bandwidth_range -compute_mst_dsc_configs_for_link -compute_mst_dsc_configs_for_state from pbn -> dsc target bpp_x16 bpp_x16 is calulated by compute_bpp_x16_from_target_bandwidth. Beside pixel clock and bpp, num_slices_h and bpp_increment_div will also affect bpp_x16. from dsc target bpp_x16 -> pbn within dm_update_mst_vcpi_slots_for_dsc, pbn = drm_dp_calc_pbn_mode(clock, bpp_x16, true); drm_dp_calc_pbn_mode(int clock, int bpp, bool dsc) { return DIV_ROUND_UP_ULL(mul_u32_u32(clock * (bpp / 16), 64 * 1006), 8 * 54 * 1000 * 1000); } bpp / 16 trunc digits after decimal point. This will cause calculation delta. drm_dp_calc_pbn_mode does not have other informations, like num_slices_h, bpp_increment_div. therefore, it does not do revese calcuation properly from bpp_x16 to pbn. pbn from drm_dp_calc_pbn_mode is less than pbn from compute_mst_dsc_configs_for_state. This cause not enough mst slot allocated to display. display could not visually light up. [How] pass pbn from compute_mst_dsc_configs_for_state to dm_update_mst_vcpi_slots_for_dsc Cc: [email protected] Reviewed-by: Scott Foster <[email protected]> Acked-by: Mikita Lipski <[email protected]> Signed-off-by: Hersen Wu <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: Get backlight from PWM if DMCU is not initialized	Harry Wentland	2	-14/+12
	On Carrizo/Stoney systems we set backlight through panel_cntl, i.e. directly via the PWM registers, if DMCU is not initialized. We always read it back through ABM registers which leads to a mismatch and forces atomic_commit to program the backlight each time. Instead make sure we use the same logic for backlight readback, i.e. read it from panel_cntl if DMCU is not initialized. We also need to remove some extraneous and incorrect calculations at the end of dce_get_16_bit_backlight_from_pwm. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1666 Cc: [email protected] Reviewed-by: Josip Pavic <[email protected]> Acked-by: Mikita Lipski <[email protected]> Signed-off-by: Harry Wentland <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdkfd: make needs_pcie_atomics FW-version dependent	Felix Kuehling	2	-16/+29
	On some GPUs the PCIe atomic requirement for KFD depends on the MEC firmware version. Add a firmware version check for this. The minimum firmware version that works without atomics can be updated in the device_info structure for each GPU type. Move PCIe atomic detection from kgd2kfd_probe into kgd2kfd_device_init because the MEC firmware is not loaded yet at the probe stage. Signed-off-by: Felix Kuehling <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: add manual sclk/vddc setting support for cyan skilfish(v3)	Lang Yu	2	-1/+138
	Add manual sclk/vddc setting supoort via pp_od_clk_voltage sysfs to maintain consistency with other asics. As cyan skillfish doesn't support DPM, there is only a single frequency and voltage to adjust. v2: maintain consistency and add command guide. v3: adjust user settings storage and coding style. Command guide: echo vc point sclk vddc > pp_od_clk_voltage "vc" - sclk voltage curve "point" - must be 0 "sclk" - target value of sclk(MHz), should be in safe range "vddc" - target value of vddc(mV), a 6.25(mV) stepping is recommended and should be in safe range (the real vddc is an approximation of target value) echo c > pp_od_clk_voltage "c" - commit the changes of sclk and vddc, only after the commit command, the target values set by "vc" command will take effect echo r > pp_od_clk_voltage "r" - reset sclk and vddc to default value, a subsequent commit command is needed to take effect Example: 1) Check default sclk and vddc $ cat pp_od_clk_voltage OD_SCLK: 0: 1800Mhz * OD_VDDC: 0: 862mV * OD_RANGE: SCLK: 1000Mhz 2000Mhz VDDC: 700mV 1129mV 2) Set sclk to 1500MHz and vddc to 700mV $ echo vc 0 1500 700 > pp_od_clk_voltage $ echo c > pp_od_clk_voltage $ cat pp_od_clk_voltage OD_SCLK: 0: 1500Mhz * OD_VDDC: 0: 693mV * OD_RANGE: SCLK: 1000Mhz 2000Mhz VDDC: 700mV 1129mV 3) Reset sclk and vddc to default $ echo r > pp_od_clk_voltage $ echo c > pp_od_clk_voltage $ cat pp_od_clk_voltage OD_SCLK: 0: 1800Mhz * OD_VDDC: 0: 874mV * OD_RANGE: SCLK: 1000Mhz 2000Mhz VDDC: 700mV 1129mV NOTE: We don't specify an explicit safe range, you can set any values between min and max at your own risk. Enjoy! Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: add some pptable funcs for cyan skilfish(v3)	Lang Yu	1	-0/+347
	Add print_clk_levels and read_sensor pptable funcs for cyan skilfish. v2: keep consitency and add get_gpu_metrics callback. v3: use sysfs_emit_at() in sysfs show function. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: update SMU driver interface for cyan skilfish(v3)	Lang Yu	1	-51/+35
	Add SmuMetrics_t definition for cyan skilfish. v2: update SmuMetrics_t definition. v3: cleanup and rearrange the order of fields. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: update SMU PPSMC for cyan skilfish	Lang Yu	1	-1/+8
	Add some PPSMC MSGs for cyan skilfish. Signed-off-by: Lang Yu <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amdgpu: fix sysfs_emit/sysfs_emit_at warnings(v2)	Lang Yu	8	-16/+49
	sysfs_emit and sysfs_emit_at requrie a page boundary aligned buf address. Make them happy! v2: use an inline function. Warning Log: [ 492.545174] invalid sysfs_emit_at: buf:00000000f19bdfde at:0 [ 492.546416] WARNING: CPU: 7 PID: 1304 at fs/sysfs/file.c:765 sysfs_emit_at+0x4a/0xa0 [ 492.654805] Call Trace: [ 492.655353] ? smu_cmn_get_metrics_table+0x40/0x50 [amdgpu] [ 492.656780] vangogh_print_clk_levels+0x369/0x410 [amdgpu] [ 492.658245] vangogh_common_print_clk_levels+0x77/0x80 [amdgpu] [ 492.659733] ? preempt_schedule_common+0x18/0x30 [ 492.660713] smu_print_ppclk_levels+0x65/0x90 [amdgpu] [ 492.662107] amdgpu_get_pp_od_clk_voltage+0x13d/0x190 [amdgpu] [ 492.663620] dev_attr_show+0x1d/0x40 Signed-off-by: Lang Yu <[email protected]> Acked-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: dc_assert_fp_enabled assert only if FPU is not enabled	Anson Jacob	1	-1/+1
	Assert only when FPU is not enabled. Fixes: 0ea7ee821701 ("drm/amd/display: Add DC_FP helper to check FPU state") Signed-off-by: Anson Jacob <[email protected]> Cc: Christian König <[email protected]> Cc: Hersen Wu <[email protected]> Cc: Harry Wentland <[email protected]> Cc: Rodrigo Siqueira <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	drm/amd/display: Add NULL checks for vblank workqueue	Nicholas Kazlauskas	1	-14/+18
	[Why] If we're running a headless config with 0 links then the vblank workqueue will be NULL - causing a NULL pointer exception during any commit. [How] Guard access to the workqueue if it's NULL and don't queue or flush work if it is. Reported-by: Mike Lothian <[email protected]> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1700 Fixes: 58aa1c50e5a231 ("drm/amd/display: Use vblank control events for PSR enable/disable") Signed-off-by: Nicholas Kazlauskas <[email protected]> Reviewed-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-09-16	mlxbf_gige: clear valid_polarity upon open	David Thompson	1	-0/+7
	The network interface managed by the mlxbf_gige driver can get into a problem state where traffic does not flow. In this state, the interface will be up and enabled, but will stop processing received packets. This problem state will happen if three specific conditions occur: 1) driver has received more than (N * RxRingSize) packets but less than (N+1 * RxRingSize) packets, where N is an odd number Note: the command "ethtool -g <interface>" will display the current receive ring size, which currently defaults to 128 2) the driver's interface was disabled via "ifconfig oob_net0 down" during the window described in #1. 3) the driver's interface is re-enabled via "ifconfig oob_net0 up" This patch ensures that the driver's "valid_polarity" field is cleared during the open() method so that it always matches the receive polarity used by hardware. Without this fix, the driver needs to be unloaded and reloaded to correct this problem state. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Reviewed-by: Asmaa Mnebhi <[email protected]> Signed-off-by: David Thompson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-16	igc: fix tunnel offloading	Paolo Abeni	1	-1/+3
	Checking tunnel offloading, it turns out that offloading doesn't work as expected. The following script allows to reproduce the issue. Call it as `testscript DEVICE LOCALIP REMOTEIP NETMASK' === SNIP === if [ $# -ne 4 ] then echo "Usage $0 DEVICE LOCALIP REMOTEIP NETMASK" exit 1 fi DEVICE="$1" LOCAL_ADDRESS="$2" REMOTE_ADDRESS="$3" NWMASK="$4" echo "Driver: $(ethtool -i ${DEVICE} \| awk '/^driver:/{print $2}') " ethtool -k "${DEVICE}" \| grep tx-udp echo echo "Set up NIC and tunnel..." ip addr add "${LOCAL_ADDRESS}/${NWMASK}" dev "${DEVICE}" ip link set "${DEVICE}" up sleep 2 ip link add vxlan1 type vxlan id 42 \ remote "${REMOTE_ADDRESS}" \ local "${LOCAL_ADDRESS}" \ dstport 0 \ dev "${DEVICE}" ip addr add fc00::1/64 dev vxlan1 ip link set vxlan1 up sleep 2 rm -f vxlan.pcap echo "Running tcpdump and iperf3..." ( nohup tcpdump -i any -w vxlan.pcap >/dev/null 2>&1 ) & sleep 2 iperf3 -c fc00::2 >/dev/null pkill tcpdump echo echo -n "Max. Paket Size: " tcpdump -r vxlan.pcap -nnle 2>/dev/null \ \| grep "${LOCAL_ADDRESS}.> ${REMOTE_ADDRESS}.OTV" \ \| awk '{print $8}' \| awk -F ':' '{print $1}' \ \| sort -n \| tail -1 echo ip link del vxlan1 ip addr del ${LOCAL_ADDRESS}/${NWMASK} dev "${DEVICE}" === SNAP === The expected outcome is Max. Paket Size: 64904 This is what you see on igb, the code igc has been taken from. However, on igc the output is Max. Paket Size: 1516 so the GSO aggregate packets are segmented by the kernel before calling igc_xmit_frame. Inside the subsequent call to igc_tso, the check for skb_is_gso(skb) fails and the function returns prematurely. It turns out that this occurs because the feature flags aren't set entirely correctly in igc_probe. In contrast to the original code from igb_probe, igc_probe neglects to set the flags required to allow tunnel offloading. Setting the same flags as igb fixes the issue on igc. Fixes: 34428dff3679 ("igc: Add GSO partial support") Signed-off-by: Paolo Abeni <[email protected]> Tested-by: Corinna Vinschen <[email protected]> Acked-by: Sasha Neftin <[email protected]> Tested-by: Nechama Kraus <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-16	net/{mlx5\|nfp\|bnxt}: Remove unnecessary RTNL lock assert	Eli Cohen	3	-9/+0
	Remove the assert from the callback priv lookup function since it does not require RTNL lock and is already protected by flow_indr_block_lock. This will avoid warnings from being emitted to dmesg if the driver registers its callback after an ingress qdisc was created for a netdevice. The warnings started after the following patch was merged: commit 74fc4f828769 ("net: Fix offloading indirect devices dependency on qdisc order creation") Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-16	net: wan: wanxl: define CROSS_COMPILE_M68K	Adam Borowski	1	-0/+2
	It was used but never set. The hardcoded value from before the dawn of time was non-standard; the usual name for cross-tools is $TRIPLET-$TOOL Signed-off-by: Adam Borowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-16	selftests: nci: replace unsigned int with int	Xiang wangx	1	-1/+1
	Should not use comparison of unsigned expressions < 0. Signed-off-by: Xiang wangx <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-09-16	drm/etnaviv: add missing MMU context put when reaping MMU mapping	Lucas Stach	1	-0/+1
	When we forcefully evict a mapping from the the address space and thus the MMU context, the MMU context is leaked, as the mapping no longer points to it, so it doesn't get freed when the GEM object is destroyed. Add the mssing context put to fix the leak. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: reference MMU context when setting up hardware state	Lucas Stach	3	-12/+24
	Move the refcount manipulation of the MMU context to the point where the hardware state is programmed. At that point it is also known if a previous MMU state is still there, or the state needs to be reprogrammed with a potentially different context. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: fix MMU context leak on GPU reset	Lucas Stach	1	-0/+2
	After a reset the GPU is no longer using the MMU context and may be restarted with a different context. While the mmu_state proeprly was cleared, the context wasn't unreferenced, leading to a memory leak. Cc: [email protected] # 5.4 Reported-by: Michael Walle <[email protected]> Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: exec and MMU state is lost when resetting the GPU	Lucas Stach	1	-3/+2
	When the GPU is reset both the current exec state, as well as all MMU state is lost. Move the driver side state tracking into the reset function to keep hardware and software state from diverging. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: keep MMU context across runtime suspend/resume	Lucas Stach	1	-3/+3
	The MMU state may be kept across a runtime suspend/resume cycle, as we avoid a full hardware reset to keep the latency of the runtime PM small. Don't pretend that the MMU state is lost in driver state. The MMU context is pushed out when new HW jobs with a different context are coming in. The only exception to this is when the GPU is unbound, in which case we need to make sure to also free the last active context. Cc: [email protected] # 5.4 Reported-by: Michael Walle <[email protected]> Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: stop abusing mmu_context as FE running marker	Lucas Stach	2	-2/+9
	While the DMA frontend can only be active when the MMU context is set, the reverse isn't necessarily true, as the frontend can be stopped while the MMU state is kept. Stop treating mmu_context being set as a indication that the frontend is running and instead add a explicit property. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: put submit prev MMU context when it exists	Lucas Stach	1	-0/+2
	The prev context is the MMU context at the time of the job queueing in hardware. As a job might be queued multiple times due to recovery after a GPU hang, we need to make sure to put the stale prev MMU context from a prior queuing, to avoid the reference and thus the MMU context leaking. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	drm/etnaviv: return context from etnaviv_iommu_context_get	Lucas Stach	5	-11/+8
	Being able to have the refcount manipulation in an assignment makes it much easier to parse the code. Cc: [email protected] # 5.4 Signed-off-by: Lucas Stach <[email protected]> Tested-by: Michael Walle <[email protected]> Tested-by: Marek Vasut <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
2021-09-16	parisc: Use absolute_pointer() to define PAGE0	Helge Deller	1	-1/+1
	Use absolute_pointer() wrapper for PAGE0 to avoid this compiler warning: arch/parisc/kernel/setup.c: In function 'start_parisc': error: '__builtin_memcmp_eq' specified bound 8 exceeds source size 0 Signed-off-by: Helge Deller <[email protected]> Co-Developed-by: Guenter Roeck <[email protected]> Suggested-by: Linus Torvalds <[email protected]>
2021-09-15	Merge tag 'hyperv-fixes-signed-20210915' of ↵	Linus Torvalds	3	-19/+46
	git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: - Fix kernel crash caused by uio driver (Vitaly Kuznetsov) - Remove on-stack cpumask from HV APIC code (Wei Liu) * tag 'hyperv-fixes-signed-20210915' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: remove on-stack cpumask from hv_send_ipi_mask_allbutself asm-generic/hyperv: provide cpumask_to_vpset_noself Drivers: hv: vmbus: Fix kernel crash upon unbinding a device from uio_hv_generic driver
2021-09-15	Merge tag 'rtc-5.15-fixes' of ↵	Linus Torvalds	1	-0/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux Pull RTC fix from Alexandre Belloni: "Fix a locking issue in the cmos rtc driver" * tag 'rtc-5.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: rtc: cmos: Disable irq around direct invocation of cmos_interrupt()
2021-09-15	net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports	Vladimir Oltean	4	-15/+42
	Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error messages appear: mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2 (and similarly for other ports) What happens is that DSA has a policy "even if there are bugs, let's at least not leak memory" and dsa_port_teardown() clears the dp->fdbs and dp->mdbs lists, which are supposed to be empty. But deleting that cleanup code, the warnings go away. => the FDB and MDB lists (used for refcounting on shared ports, aka CPU and DSA ports) will eventually be empty, but are not empty by the time we tear down those ports. Aka we are deleting them too soon. The addresses that DSA complains about are host-trapped addresses: the local addresses of the ports, and the MAC address of the bridge device. The problem is that offloading those entries happens from a deferred work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this races with the teardown of the CPU and DSA ports where the refcounting is kept. In fact, not only it races, but fundamentally speaking, if we iterate through the port list linearly, we might end up tearing down the shared ports even before we delete a DSA user port which has a bridge upper. So as it turns out, we need to first tear down the user ports (and the unused ones, for no better place of doing that), then the shared ports (the CPU and DSA ports). In between, we need to ensure that all work items scheduled by our switchdev handlers (which only run for user ports, hence the reason why we tear them down first) have finished. Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-09-15	Revert "net: phy: Uniform PHY driver access"	Vladimir Oltean	1	-1/+3
	This reverts commit 3ac8eed62596387214869319379c1fcba264d8c6, which did more than it said on the box, and not only it replaced to_phy_driver with phydev->drv, but it also removed the "!drv" check, without actually explaining why that is fine. That patch in fact breaks suspend/resume on any system which has PHY devices with no drivers bound. The stack trace is: Unable to handle kernel NULL pointer dereference at virtual address 00000000000000e8 pc : mdio_bus_phy_suspend+0xd8/0xec lr : dpm_run_callback+0x38/0x90 Call trace: mdio_bus_phy_suspend+0xd8/0xec dpm_run_callback+0x38/0x90 __device_suspend+0x108/0x3cc dpm_suspend+0x140/0x210 dpm_suspend_start+0x7c/0xa0 suspend_devices_and_enter+0x13c/0x540 pm_suspend+0x2a4/0x330 Examples why that assumption is not fine: - There is an MDIO bus with a PHY device that doesn't have a specific PHY driver loaded, because mdiobus_register() automatically creates a PHY device for it but there is no specific PHY driver in the system. Normally under those circumstances, the generic PHY driver will be bound lazily to it (at phy_attach_direct time). But some Ethernet drivers attach to their PHY at .ndo_open time. Until then it, the to-be-driven-by-genphy PHY device will not have a driver. The blamed patch amounts to saying "you need to open all net devices before the system can suspend, to avoid the NULL pointer dereference". - There is any raw MDIO device which has 'plausible' values in the PHY ID registers 2 and 3, which is located on an MDIO bus whose driver does not set bus->phy_mask = ~0 (which prevents auto-scanning of PHY devices). An example could be a MAC's internal MDIO bus with PCS devices on it, for serial links such as SGMII. PHY devices will get created for those PCSes too, due to that MDIO bus auto-scanning, and although those PHY devices are not used, they do not bother anybody either. PCS devices are usually managed in Linux as raw MDIO devices. Nonetheless, they do not have a PHY driver, nor does anybody attempt to connect to them (because they are not a PHY), and therefore this patch breaks that. The goal itself of the patch is questionable, so I am going for a straight revert. to_phy_driver does not seem to have a need to be replaced by phydev->drv, in fact that might even trigger code paths which were not given too deep of a thought. For instance: phy_probe populates phydev->drv at the beginning, but does not clean it up on any error (including EPROBE_DEFER). So if the phydev driver requests probe deferral, phydev->drv will remain populated despite there being no driver bound. If a system suspend starts in between the initial probe deferral request and the subsequent probe retry, we will be calling the phydev->drv->suspend method, but _before_ any phydev->drv->probe call has succeeded. That is to say, if the phydev->drv is allocating any driver-private data structure in ->probe, it pretty much expects that data structure to be available in ->suspend. But it may not. That is a pretty insane environment to present to PHY drivers. In the code structure before the blamed patch, mdio_bus_phy_may_suspend would just say "no, don't suspend" to any PHY device which does not have a driver pointer _in_the_device_structure_ (not the phydev->drv). That would essentially ensure that ->suspend will never get called for a device that has not yet successfully completed probe. This is the code structure the patch is returning to, via the revert. Fixes: 3ac8eed62596 ("net: phy: Uniform PHY driver access") Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Florian Fainelli <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-09-15	net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup	Vladimir Oltean	1	-7/+5
	DSA supports connecting to a phy-handle, and has a fallback to a non-OF based method of connecting to an internal PHY on the switch's own MDIO bus, if no phy-handle and no fixed-link nodes were present. The -ENODEV error code from the first attempt (phylink_of_phy_connect) is what triggers the second attempt (phylink_connect_phy). However, when the first attempt returns a different error code than -ENODEV, this results in an unbalance of calls to phylink_create and phylink_destroy by the time we exit the function. The phylink instance has leaked. There are many other error codes that can be returned by phylink_of_phy_connect. For example, phylink_validate returns -EINVAL. So this is a practical issue too. Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-09-15	MAINTAINERS: Add Nirmal Patel as VMD maintainer	Jon Derrick	1	-1/+2
	Change my email to my unaffiliated address and move me to reviewer status. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jon Derrick <[email protected]> Signed-off-by: Jon Derrick <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: Nirmal Patel <[email protected]>
2021-09-15	PCI: Add AMD GPU multi-function power dependencies	Evan Quan	1	-2/+7
	Some AMD GPUs have built-in USB xHCI and USB Type-C UCSI controllers with power dependencies between the GPU and the other functions as in 6d2e369f0d4c ("PCI: Add NVIDIA GPU multi-function power dependencies"). Add device link support for the AMD integrated USB xHCI and USB Type-C UCSI controllers. Without this, runtime power management, including GPU resume and temp and fan sensors don't work correctly. Reported-at: https://gitlab.freedesktop.org/drm/amd/-/issues/1704 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Evan Quan <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Cc: [email protected]
2021-09-15	PCI/ACPI: Don't reset a fwnode set by OF	Jean-Philippe Brucker	1	-1/+1
	Commit 375553a93201 ("PCI: Setup ACPI fwnode early and at the same time with OF") added a call to pci_set_acpi_fwnode() in pci_setup_device(), which unconditionally clears any fwnode previously set by pci_set_of_node(). pci_set_acpi_fwnode() looks for ACPI_COMPANION(), which only returns the existing fwnode if it was set by ACPI_COMPANION_SET(). If it was set by OF instead, ACPI_COMPANION() returns NULL and pci_set_acpi_fwnode() accidentally clears the fwnode. To fix this, look for any fwnode instead of just ACPI companions. Fixes a virtio-iommu boot regression in v5.15-rc1. Fixes: 375553a93201 ("PCI: Setup ACPI fwnode early and at the same time with OF") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jean-Philippe Brucker <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Rob Herring <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]>
2021-09-15	PCI/VPD: Defer VPD sizing until first access	Bjorn Helgaas	1	-10/+26
	7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") reads VPD at enumeration-time to find the size. But this is quite slow, and we don't need the size until we actually need data from VPD. Dave reported a boot slowdown of more than two minutes [1]. Defer the VPD sizing until a driver or the user (via sysfs) requests information from VPD. If devices are quirked because VPD is known not to work, don't bother even looking for the VPD capability. The VPD will not be accessible at all. [1] https://lore.kernel.org/r/[email protected]/ Link: https://lore.kernel.org/r/20210914215543.GA1437800@bjorn-Precision-5520 Fixes: 7bac54497c3e ("PCI/VPD: Determine VPD size in pci_vpd_init()") Signed-off-by: Bjorn Helgaas <[email protected]>
2021-09-15	qnx4: avoid stringop-overread errors	Linus Torvalds	1	-17/+34
	The qnx4 directory entries are 64-byte blocks that have different contents depending on the a status byte that is in the last byte of the block. In particular, a directory entry can be either a "link info" entry with a 48-byte name and pointers to the real inode information, or an "inode entry" with a smaller 16-byte name and the full inode information. But the code was written to always just treat the directory name as if it was part of that "inode entry", and just extend the name to the longer case if the status byte said it was a link entry. That work just fine and gives the right results, but now that gcc is tracking data structure accesses much more, the code can trigger a compiler error about using up to 48 bytes (the long name) in a structure that only has that shorter name in it: fs/qnx4/dir.c: In function ‘qnx4_readdir’: fs/qnx4/dir.c:51:32: error: ‘strnlen’ specified bound 48 exceeds source size 16 [-Werror=stringop-overread] 51 \| size = strnlen(de->di_fname, size); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from fs/qnx4/qnx4.h:3, from fs/qnx4/dir.c:16: include/uapi/linux/qnx4_fs.h:45:25: note: source object declared here 45 \| char di_fname[QNX4_SHORT_NAME_MAX]; \| ^~~~~~~~ which is because the source code doesn't really make this whole "one of two different types" explicit. Fix this by introducing a very explicit union of the two types, and basically explaining to the compiler what is really going on. Signed-off-by: Linus Torvalds <[email protected]>