aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-11-24rxrpc: Express protocol timeouts in terms of RTTDavid Howells2-4/+25
Express protocol timeouts for data retransmission and deferred ack generation in terms on RTT rather than specified timeouts once we have sufficient RTT samples. For the moment, this requires just one RTT sample to be able to use this for ack deferral and two for data retransmission. The data retransmission timeout is set at RTT*1.5 and the ACK deferral timeout is set at RTT. Note that the calculated timeout is limited to a minimum of 4ns to make sure it doesn't happen too quickly. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Don't transmit DELAY ACKs immediately on proposalDavid Howells1-2/+2
Don't transmit a DELAY ACK immediately on proposal when the Rx window is rotated, but rather defer it to the work function. This means that we have a chance to queue/consume more received packets before we actually send the DELAY ACK, or even cancel it entirely, thereby reducing the number of packets transmitted. We do, however, want to continue sending other types of packet immediately, particularly REQUESTED ACKs, as they may be used for RTT calculation by the other side. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Fix call timeoutsDavid Howells11-201/+290
Fix the rxrpc call expiration timeouts and make them settable from userspace. By analogy with other rx implementations, there should be three timeouts: (1) "Normal timeout" This is set for all calls and is triggered if we haven't received any packets from the peer in a while. It is measured from the last time we received any packet on that call. This is not reset by any connection packets (such as CHALLENGE/RESPONSE packets). If a service operation takes a long time, the server should generate PING ACKs at a duration that's substantially less than the normal timeout so is to keep both sides alive. This is set at 1/6 of normal timeout. (2) "Idle timeout" This is set only for a service call and is triggered if we stop receiving the DATA packets that comprise the request data. It is measured from the last time we received a DATA packet. (3) "Hard timeout" This can be set for a call and specified the maximum lifetime of that call. It should not be specified by default. Some operations (such as volume transfer) take a long time. Allow userspace to set/change the timeouts on a call with sendmsg, using a control message: RXRPC_SET_CALL_TIMEOUTS The data to the message is a number of 32-bit words, not all of which need be given: u32 hard_timeout; /* sec from first packet */ u32 idle_timeout; /* msec from packet Rx */ u32 normal_timeout; /* msec from data Rx */ This can be set in combination with any other sendmsg() that affects a call. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Split the call params from the operation paramsDavid Howells4-45/+60
When rxrpc_sendmsg() parses the control message buffer, it places the parameters extracted into a structure, but lumps together call parameters (such as user call ID) with operation parameters (such as whether to send data, send an abort or accept a call). Split the call parameters out into their own structure, a copy of which is then embedded in the operation parameters struct. The call parameters struct is then passed down into the places that need it instead of passing the individual parameters. This allows for extra call parameters to be added. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Delay terminal ACK transmission on a client callDavid Howells5-13/+108
Delay terminal ACK transmission on a client call by deferring it to the connection processor. This allows it to be skipped if we can send the next call instead, the first DATA packet of which will implicitly ack this call. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Provide a different lockdep key for call->user_mutex for kernel callsDavid Howells3-6/+17
Provide a different lockdep key for rxrpc_call::user_mutex when the call is made on a kernel socket, such as by the AFS filesystem. The problem is that lockdep registers a false positive between userspace calling the sendmsg syscall on a user socket where call->user_mutex is held whilst userspace memory is accessed whereas the AFS filesystem may perform operations with mmap_sem held by the caller. In such a case, the following warning is produced. ====================================================== WARNING: possible circular locking dependency detected 4.14.0-fscache+ #243 Tainted: G E ------------------------------------------------------ modpost/16701 is trying to acquire lock: (&vnode->io_lock){+.+.}, at: [<ffffffffa000fc40>] afs_begin_vnode_operation+0x33/0x77 [kafs] but task is already holding lock: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++}: __might_fault+0x61/0x89 _copy_from_iter_full+0x40/0x1fa rxrpc_send_data+0x8dc/0xff3 rxrpc_do_sendmsg+0x62f/0x6a1 rxrpc_sendmsg+0x166/0x1b7 sock_sendmsg+0x2d/0x39 ___sys_sendmsg+0x1ad/0x22b __sys_sendmsg+0x41/0x62 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #2 (&call->user_mutex){+.+.}: __mutex_lock+0x86/0x7d2 rxrpc_new_client_call+0x378/0x80e rxrpc_kernel_begin_call+0xf3/0x154 afs_make_call+0x195/0x454 [kafs] afs_vl_get_capabilities+0x193/0x198 [kafs] afs_vl_lookup_vldb+0x5f/0x151 [kafs] afs_create_volume+0x2e/0x2f4 [kafs] afs_mount+0x56a/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #1 (k-sk_lock-AF_RXRPC){+.+.}: lock_sock_nested+0x74/0x8a rxrpc_kernel_begin_call+0x8a/0x154 afs_make_call+0x195/0x454 [kafs] afs_fs_get_capabilities+0x17a/0x17f [kafs] afs_probe_fileserver+0xf7/0x2f0 [kafs] afs_select_fileserver+0x83f/0x903 [kafs] afs_fetch_status+0x89/0x11d [kafs] afs_iget+0x16f/0x4f8 [kafs] afs_mount+0x6c6/0x8d7 [kafs] mount_fs+0x6a/0x109 vfs_kern_mount+0x67/0x135 do_mount+0x90b/0xb57 SyS_mount+0x72/0x98 do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 -> #0 (&vnode->io_lock){+.+.}: lock_acquire+0x174/0x19f __mutex_lock+0x86/0x7d2 afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 __clear_user+0x3d/0x60 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be return_from_SYSCALL_64+0x0/0x75 other info that might help us debug this: Chain exists of: &vnode->io_lock --> &call->user_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(&call->user_mutex); lock(&mm->mmap_sem); lock(&vnode->io_lock); *** DEADLOCK *** 1 lock held by modpost/16701: #0: (&mm->mmap_sem){++++}, at: [<ffffffff8104376a>] __do_page_fault+0x1ef/0x486 stack backtrace: CPU: 0 PID: 16701 Comm: modpost Tainted: G E 4.14.0-fscache+ #243 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack+0x67/0x8e print_circular_bug+0x341/0x34f check_prev_add+0x11f/0x5d4 ? add_lock_to_list.isra.12+0x8b/0x8b ? add_lock_to_list.isra.12+0x8b/0x8b ? __lock_acquire+0xf77/0x10b4 __lock_acquire+0xf77/0x10b4 lock_acquire+0x174/0x19f ? afs_begin_vnode_operation+0x33/0x77 [kafs] __mutex_lock+0x86/0x7d2 ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] ? afs_begin_vnode_operation+0x33/0x77 [kafs] afs_begin_vnode_operation+0x33/0x77 [kafs] afs_fetch_data+0x80/0x12a [kafs] afs_readpages+0x314/0x405 [kafs] __do_page_cache_readahead+0x203/0x2ba ? filemap_fault+0x179/0x54d filemap_fault+0x179/0x54d __do_fault+0x17/0x60 __handle_mm_fault+0x6d7/0x95c handle_mm_fault+0x24e/0x2a3 __do_page_fault+0x301/0x486 do_page_fault+0x236/0x259 page_fault+0x22/0x30 RIP: 0010:__clear_user+0x3d/0x60 RSP: 0018:ffff880071e93da0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 000000000000011c RCX: 000000000000011c RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000060f720 RBP: 000000000060f720 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: ffff8800b5459b68 R12: ffff8800ce150e00 R13: 000000000060f720 R14: 00000000006127a8 R15: 0000000000000000 padzero+0x1c/0x2b load_elf_binary+0x785/0xdc7 search_binary_handler+0x81/0x1ff do_execveat_common.isra.14+0x600/0x888 do_execve+0x1f/0x21 SyS_execve+0x28/0x2f do_syscall_64+0x89/0x1be entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fdb6009ee07 RSP: 002b:00007fff566d9728 EFLAGS: 00000246 ORIG_RAX: 000000000000003b RAX: ffffffffffffffda RBX: 000055ba57280900 RCX: 00007fdb6009ee07 RDX: 000055ba5727f270 RSI: 000055ba5727cac0 RDI: 000055ba57280900 RBP: 000055ba57280900 R08: 00007fff566d9700 R09: 0000000000000000 R10: 000055ba5727cac0 R11: 0000000000000246 R12: 0000000000000000 R13: 000055ba5727cac0 R14: 000055ba5727f270 R15: 0000000000000000 Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: Don't set upgrade by default in sendmsg()David Howells1-1/+1
Don't set upgrade by default when creating a call from sendmsg(). This is a holdover from when I was testing the code. Signed-off-by: David Howells <[email protected]>
2017-11-24rxrpc: The mutex lock returned by rxrpc_accept_call() needs releasingDavid Howells1-2/+3
The caller of rxrpc_accept_call() must release the lock on call->user_mutex returned by that function. Signed-off-by: David Howells <[email protected]>
2017-11-24s390: fix alloc_pgste check in init_new_context againMartin Schwidefsky1-1/+1
git commit badb8bb983e9 "fix alloc_pgste check in init_new_context" fixed the problem of 'current->mm == NULL' in init_new_context back in 2011. git commit 3eabaee998c7 "KVM: s390: allow sie enablement for multi- threaded programs" completely removed the check against alloc_pgste. git commit 23fefe119ceb "s390/kvm: avoid global config of vm.alloc_pgste=1" re-added a check against the alloc_pgste flag but without the required check for current->mm != NULL. For execve() called by a kernel thread init_new_context() reads from ((struct mm_struct *) NULL)->context.alloc_pgste to decide between 2K vs 4K page tables. If the bit happens to be set for the init process it will be created with large page tables. This decision is inherited by all the children of init, this waste quite some memory. Re-add the check for 'current->mm != NULL'. Fixes: 23fefe119ceb ("s390/kvm: avoid global config of vm.alloc_pgste=1") Signed-off-by: Martin Schwidefsky <[email protected]>
2017-11-24s390/disassembler: correct disassembly lines alignmentVasily Gorbik1-1/+1
176.718956 Krnl Code: 00000000004d38b0: a54c0018 llihh %r4,24 176.718956 00000000004d38b4: b9080014 agr %r1,%r4 ^ Using a tab to align disassembly lines which follow the first line with "Krnl Code: " doesn't always work, e.g. if there is a prefix (timestamp or syslog prefix) which is not 8 chars aligned. Go back to alignment with spaces. Fixes: b192571d1ae3 ("s390/disassembler: increase show_code buffer size") Signed-off-by: Vasily Gorbik <[email protected]> Signed-off-by: Martin Schwidefsky <[email protected]>
2017-11-24sched/debug: Fix task state recording/printoutThomas Gleixner1-3/+3
The recent conversion of the task state recording to use task_state_index() broke the sched_switch tracepoint task state output. task_state_index() returns surprisingly an index (0-7) which is then printed with __print_flags() applying bitmasks. Not really working and resulting in weird states like 'prev_state=t' instead of 'prev_state=I'. Use TASK_REPORT_MAX instead of TASK_STATE_MAX to report preemption. Build a bitmask from the return value of task_state_index() and store it in entry->prev_state, which makes __print_flags() work as expected. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Fixes: efb40f588b43 ("sched/tracing: Fix trace_sched_switch task-state printing") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1711221304180.1751@nanos Signed-off-by: Ingo Molnar <[email protected]>
2017-11-24x86/decoder: Add new TEST instruction patternMasami Hiramatsu1-1/+1
The kbuild test robot reported this build warning: Warning: arch/x86/tools/test_get_len found difference at <jump_table>:ffffffff8103dd2c Warning: ffffffff8103dd82: f6 09 d8 testb $0xd8,(%rcx) Warning: objdump says 3 bytes, but insn_get_length() says 2 Warning: decoded and checked 1569014 instructions with 1 warnings This sequence seems to be a new instruction not in the opcode map in the Intel SDM. The instruction sequence is "F6 09 d8", means Group3(F6), MOD(00)REG(001)RM(001), and 0xd8. Intel SDM vol2 A.4 Table A-6 said the table index in the group is "Encoding of Bits 5,4,3 of the ModR/M Byte (bits 2,1,0 in parenthesis)" In that table, opcodes listed by the index REG bits as: 000 001 010 011 100 101 110 111 TEST Ib/Iz,(undefined),NOT,NEG,MUL AL/rAX,IMUL AL/rAX,DIV AL/rAX,IDIV AL/rAX So, it seems TEST Ib is assigned to 001. Add the new pattern. Reported-by: kbuild test robot <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-11-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds62-409/+995
Pull networking fixes from David Miller: 1) Fix PCI IDs of 9000 series iwlwifi devices, from Luca Coelho. 2) bpf offload bug fixes from Jakub Kicinski. 3) Fix bpf verifier to NOP out code which is dead at run time because due to branch pruning the verifier will not explore such instructions. From Alexei Starovoitov. 4) Fix crash when deleting secondary chains in packet scheduler classifier. From Roman Kapl. 5) Fix buffer management bugs in smc, from Ursula Braun. 6) Fix regression in anycast route handling, from David Ahern. 7) Fix link settings regression in r8169, from Tobias Jakobi. 8) Add back enough UFO support so that live migration still works, from Willem de Bruijn. 9) Linearize enough packet data for the full extent to which the ipvlan code will inspect the packet headers, from Gao Feng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) ipvlan: Fix insufficient skb linear check for ipv6 icmp ipvlan: Fix insufficient skb linear check for arp geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6 net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY net: accept UFO datagrams from tuntap and packet net: realtek: r8169: implement set_link_ksettings() net: ipv6: Fixup device for anycast routes during copy net/smc: Fix preinitialization of buf_desc in __smc_buf_create() net/smc: use sk_rcvbuf as start for rmb creation ipv6: Do not consider linkdown nexthops during multipath net: sched: fix crash when deleting secondary chains net: phy: cortina: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE bpf: fix branch pruning logic bpf: change bpf_perf_event_output arg5 type to ARG_CONST_SIZE_OR_ZERO bpf: change bpf_probe_read_str arg2 type to ARG_CONST_SIZE_OR_ZERO bpf: remove explicit handling of 0 for arg2 in bpf_probe_read bpf: introduce ARG_PTR_TO_MEM_OR_NULL i40evf: Use smp_rmb rather than read_barrier_depends fm10k: Use smp_rmb rather than read_barrier_depends igb: Use smp_rmb rather than read_barrier_depends ...
2017-11-23Merge tag 'platform-drivers-x86-v4.15-2' of ↵Linus Torvalds4-3/+41
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver fixes from Darren Hart: "Fix two issues resulting from the dell-smbios refactoring and introduction of the dell-smbios-wmi dispatcher. The first ensures a proper error code is returned when kzalloc fails. The second avoids an issue in older Dell BIOS implementations which would fail if the more complex calls were made by limiting those platforms to the simple calls such as those used by the existing dell-laptop and dell-wmi drivers, preserving their functionality prior to the addition of the dell-smbios-wmi dispatcher" * tag 'platform-drivers-x86-v4.15-2' of git://git.infradead.org/linux-platform-drivers-x86: platform/x86: dell-laptop: fix error return code in dell_init() platform/x86: dell-smbios-wmi: Disable userspace interface if missing hotfix
2017-11-23Merge tag 'scsi-fixes' of ↵Linus Torvalds6-51/+78
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Two basic fixes: one for the sparse problem with the blacklist flags and another for a hang forever in bnx2i" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Use 'blist_flags_t' for scsi_devinfo flags scsi: bnx2fc: Fix hung task messages when a cleanup response is not received during abort
2017-11-23Merge tag 'sound-fix-4.15-rc1' of ↵Linus Torvalds11-23/+69
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "All commits found here are small fixes for regression or stable: - PCM timestamp behavior fix that could be seen as a regression - Remove spurious WARN_ON() from ALSA timer 32bit compat ioctl - HD-audio HDMI/DP channel mapping fix for 32bit archs - Fix the previous fix for HD-audio initialization code - More hardening USB-audio against malicious USB descriptors - HD-audio quirks/fixes (Realtek codec, AMD controller) - Missing help text for the recent Intel SST kconfig change" * tag 'sound-fix-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda: Add Raven PCI ID ALSA: hda/realtek - Fix ALC700 family no sound issue ALSA: hda - Fix yet remaining issue with vmaster 0dB initialization ALSA: usb-audio: Add sanity checks in v2 clock parsers ALSA: usb-audio: Fix potential zero-division at parsing FU ALSA: usb-audio: Fix potential out-of-bound access at parsing SU ALSA: usb-audio: Add sanity checks to FE parser ALSA: timer: Remove kernel warning at compat ioctl error paths ALSA: pcm: update tstamp only if audio_tstamp changed ALSA: hda/realtek: Add headset mic support for Intel NUC Skull Canyon ALSA: hda: Fix too short HDMI/DP chmap reporting ALSA: usb-audio: uac1: Invalidate ctl on interrupt ALSA: hda/realtek - Fix ALC275 no sound issue ASoC: Intel: Add help text for SND_SOC_INTEL_SST_TOPLEVEL
2017-11-23Merge tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds39-1399/+281
Pull more drm updates from Dave Airlie: "Fixes/cleanups for rc1, non-desktop flags for VR - remove the MSM dt-bindings file Rob managed to push in the previous pull. - add a property/edid quirk to denote HMD devices, I had these hanging around for a few weeks and Keith had done some work on them, they are fairly self contained and small, and only affect people using HTC Vive VR headsets so far. - amdgpu, tegra, tilcdc, fsl fixes - some imx-drm cleanups I missed, these seemed pretty small, and no reason to hold off. I have one TTM regression fix (fixes bochs-vga in qemu) sitting locally awaiting review I'll probably send that in a separate pull request tomorrow" * tag 'drm-for-v4.15-part2' of git://people.freedesktop.org/~airlied/linux: (33 commits) dt-bindings: remove file that was added accidentally drm/edid: quirk HTC vive headset as non-desktop. [v2] drm/fb: add support for not enabling fbcon on non-desktop displays [v2] drm: add connector info/property for non-desktop displays [v2] drm/amdgpu: fix rmmod KCQ disable failed error drm/amdgpu: fix kernel hang when starting VNC server drm/amdgpu: don't skip attributes when powerplay is enabled drm/amd/pp: fix typecast error in powerplay. drm/tilcdc: Remove obsolete "ti,tilcdc,slave" dts binding support drm/tegra: sor: Reimplement pad clock Revert "drm/radeon: dont switch vt on suspend" drm/amd/amdgpu: fix over-bound accessing in amdgpu_cs_wait_any_fence drm/amd/powerplay: fix unfreeze level smc message for smu7 drm/amdgpu:fix memleak drm/amdgpu:fix memleak in takedown drm/amd/pp: fix dpm randomly failed on Vega10 drm/amdgpu: set f_mapping on exported DMA-bufs drm/amdgpu: Properly allocate VM invalidate eng v2 drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume() drm/fsl-dcu: avoid disabling pixel clock twice on suspend ...
2017-11-23Merge tag 'docs-4.15-2' of git://git.lwn.net/linuxLinus Torvalds6-143/+137
Pull documentation updates from Jonathan Corbet: "A few late-arriving docs updates that have no real reason to wait. There's a new "Co-Developed-by" tag described by Greg, and a build enhancement from Willy to generate docs warnings during a kernel build (but only when additional warnings have been requested in general)" * tag 'docs-4.15-2' of git://git.lwn.net/linux: Add optional check for bad kernel-doc comments Documentation: fix profile= options in kernel-parameters.txt documentation/svga.txt: update outdated file kokr/memory-barriers.txt: Fix typo in paring example kokr/memory-barriers/txt: Replace uses of "transitive" Documentation/process: add Co-Developed-by: tag for patches with multiple authors
2017-11-23Merge branch 'next-keys' of ↵Linus Torvalds14-65/+66
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull keys update from James Morris: "There's nothing too controversial here: - Doc fix for keyctl_read(). - time_t -> time64_t replacement. - Set the module licence on things to prevent tainting" * 'next-keys' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: pkcs7: Set the module licence to prevent tainting security: keys: Replace time_t with time64_t for struct key_preparsed_payload security: keys: Replace time_t/timespec with time64_t KEYS: fix in-kernel documentation for keyctl_read()
2017-11-23Merge tag 'apparmor-pr-2017-11-21' of ↵Linus Torvalds11-71/+91
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor updates from John Johansen: "No features this time, just minor cleanups and bug fixes. Cleanups: - fix spelling mistake: "resoure" -> "resource" - remove unused redundant variable stop - Fix bool initialization/comparison Bug Fixes: - initialized returned struct aa_perms - fix leak of null profile name if profile allocation fails - ensure that undecidable profile attachments fail - fix profile attachment for special unconfined profiles - fix locking when creating a new complain profile. - fix possible recursive lock warning in __aa_create_ns" * tag 'apparmor-pr-2017-11-21' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: fix possible recursive lock warning in __aa_create_ns apparmor: fix locking when creating a new complain profile. apparmor: fix profile attachment for special unconfined profiles apparmor: ensure that undecidable profile attachments fail apparmor: fix leak of null profile name if profile allocation fails apparmor: remove unused redundant variable stop apparmor: Fix bool initialization/comparison apparmor: initialized returned struct aa_perms apparmor: fix spelling mistake: "resoure" -> "resource"
2017-11-24powerpc/kexec: Fix kexec/kdump in P9 guest kernelsMichael Ellerman1-0/+2
The code that cleans up the IAMR/AMOR before kexec'ing failed to remember that when we're running as a guest AMOR is not writable, it's hypervisor privileged. They symptom is that the kexec stops before entering purgatory and nothing else is seen on the console. If you examine the state of the system all threads will be in the 0x700 program check handler. Fix it by making the write to AMOR dependent on HV mode. Fixes: 1e2a516e89fc ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR") Cc: [email protected] # v4.10+ Reported-by: Yilin Zhang <[email protected]> Debugged-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Acked-by: Balbir Singh <[email protected]> Reviewed-by: David Gibson <[email protected]> Tested-by: David Gibson <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2017-11-24crypto: af_alg - remove locking in async callbackStephan Mueller4-29/+39
The code paths protected by the socket-lock do not use or modify the socket in a non-atomic fashion. The actions pertaining the socket do not even need to be handled as an atomic operation. Thus, the socket-lock can be safely ignored. This fixes a bug regarding scheduling in atomic as the callback function may be invoked in interrupt context. In addition, the sock_hold is moved before the AIO encrypt/decrypt operation to ensure that the socket is always present. This avoids a tiny race window where the socket is unprotected and yet used by the AIO operation. Finally, the release of resources for a crypto operation is moved into a common function of af_alg_free_resources. Cc: <[email protected]> Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management") Reported-by: Romain Izard <[email protected]> Signed-off-by: Stephan Mueller <[email protected]> Tested-by: Romain Izard <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2017-11-24crypto: algif_aead - skip SGL entries with NULL pageStephan Mueller1-9/+24
The TX SGL may contain SGL entries that are assigned a NULL page. This may happen if a multi-stage AIO operation is performed where the data for each stage is pointed to by one SGL entry. Upon completion of that stage, af_alg_pull_tsgl will assign NULL to the SGL entry. The NULL cipher used to copy the AAD from TX SGL to the destination buffer, however, cannot handle the case where the SGL starts with an SGL entry having a NULL page. Thus, the code needs to advance the start pointer into the SGL to the first non-NULL entry. This fixes a crash visible on Intel x86 32 bit using the libkcapi test suite. Cc: <[email protected]> Fixes: 72548b093ee38 ("crypto: algif_aead - copy AAD from src to dst") Signed-off-by: Stephan Mueller <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2017-11-23blk-wbt: fix comments typoweiping zhang1-1/+1
Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-23blk-wbt: move wbt_clear_stat to common place in wbt_doneweiping zhang1-2/+1
wbt_done call wbt_clear_stat no matter current stat was tracked or not, move it to common place. Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-23blk-sysfs: remove NULL pointer checking in queue_wb_lat_storeweiping zhang1-4/+1
wbt_init doesn't set q->rq_wb to NULL, if wbt_init return 0, so check return value is enough, remove NULL checking. Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-23blk-wbt: remove duplicated setting in wbt_initweiping zhang1-2/+0
rwb->wc and rwb->queue_depth were overwritten by wbt_set_write_cache and wbt_set_queue_depth, remove the default setting. Signed-off-by: weiping zhang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-11-23Merge branch 'nvme-4.15' of git://git.infradead.org/nvme into for-linusJens Axboe9-47/+107
Pull NVMe fixes from Christoph: "A couple nvme fixes for 4.15: - expand the queue ready fix that we only had for RDMA to also cover FC and loop by moving it to common code (Sagi) - fix an array out of bounds in the PCIe HMB code (Minwoo Im) - two new device quirks (Jeff Lien and Kai-Heng Feng) - static checkers fixes (Keith Busch) - FC target refcount fix (James Smart) - A trivial spelling fix in new code (Colin Ian King)"
2017-11-24Merge tag 'drm-misc-fixes-2017-11-20' of ↵Dave Airlie5-5/+22
git://anongit.freedesktop.org/drm/drm-misc into drm-next 4.15 merge window fixes 1 * tag 'drm-misc-fixes-2017-11-20' of git://anongit.freedesktop.org/drm/drm-misc: drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks drm/vc4: Account for interrupts in flight
2017-11-24Merge tag 'drm-intel-next-fixes-2017-11-23' of ↵Dave Airlie6-3/+23
git://anongit.freedesktop.org/drm/drm-intel into drm-next drm/i915 fixes for v4.15 * tag 'drm-intel-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-intel: drm/i915: Fix init_clock_gating for resume drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM drm/i915: Clear breadcrumb node when cancelling signaling drm/i915/gvt: ensure -ve return value is handled correctly drm/i915: Re-register PMIC bus access notifier on runtime resume drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2
2017-11-24Merge tag 'drm-misc-next-fixes-2017-11-23' of ↵Dave Airlie1-0/+1
git://anongit.freedesktop.org/drm/drm-misc into drm-next Fix crtc_id in page_flip event. * tag 'drm-misc-next-fixes-2017-11-23' of git://anongit.freedesktop.org/drm/drm-misc: drm/vblank: Pass crtc_id to page_flip_ioctl.
2017-11-24drm/ttm: don't attempt to use hugepages if dma32 requested (v2)Dave Airlie1-16/+20
The commit below introduced thp support for ttm allocations, however it didn't take into account the case where dma32 was requested. Some drivers always request dma32, and the bochs driver is one of those. This fixes an oops: [ 30.108507] ------------[ cut here ]------------ [ 30.108920] kernel BUG at ./include/linux/gfp.h:408! [ 30.109356] invalid opcode: 0000 [#1] SMP [ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net [ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop [ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1 [ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 30.118866] task: ffff923a77e03380 task.stack: ffffa78182228000 [ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430 [ 30.119810] RSP: 0000:ffffa7818222bba8 EFLAGS: 00010202 [ 30.120250] RAX: 0000000000000001 RBX: 00000000014382c6 RCX: 0000000000000006 [ 30.120840] RDX: 0000000000000000 RSI: 0000000000000009 RDI: 0000000000000000 [ 30.121443] RBP: ffff923a760d6000 R08: 0000000000000000 R09: 0000000000000006 [ 30.122039] R10: 0000000000000040 R11: 0000000000000300 R12: ffff923a729273c0 [ 30.122629] R13: 0000000000000000 R14: 0000000000000000 R15: ffff923a7483d400 [ 30.123223] FS: 00007fe48da7dac0(0000) GS:ffff923a7cc00000(0000) knlGS:0000000000000000 [ 30.123896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 30.124373] CR2: 00007fe457b73000 CR3: 0000000078313000 CR4: 00000000000006f0 [ 30.124968] Call Trace: [ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm] [ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm] [ 30.125964] __do_fault+0x19/0x11e [ 30.126255] __handle_mm_fault+0xcd3/0x1260 [ 30.126609] handle_mm_fault+0x14c/0x310 [ 30.126947] __do_page_fault+0x28c/0x530 [ 30.127282] do_page_fault+0x32/0x270 [ 30.127593] async_page_fault+0x22/0x30 [ 30.127922] RIP: 0033:0x7fe48aae39a8 [ 30.128225] RSP: 002b:00007ffc21c4d928 EFLAGS: 00010206 [ 30.128664] RAX: 00007fe457b73000 RBX: 000055cd4c1041a0 RCX: 00007fe457b73040 [ 30.129259] RDX: 0000000000300000 RSI: 0000000000000000 RDI: 00007fe457b73000 [ 30.129855] RBP: 0000000000000300 R08: 000000000000000c R09: 0000000100000000 [ 30.130457] R10: 0000000000000001 R11: 0000000000000246 R12: 000055cd4c1041a0 [ 30.131054] R13: 000055cd4bdfe990 R14: 000055cd4c104110 R15: 0000000000000400 [ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c [ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: ffffa7818222bba8 [ 30.133836] ---[ end trace d4f1deb60784f40a ]--- v2: handle free path as well. Reported-by: Laura Abbott <[email protected]> Reported-by: Adam Williamson <[email protected]> Fixes: 0284f1ead87463bc17cf5e81a24fc65c052486f3 (drm/ttm: add transparent huge page support for cached allocations v2) Reviewed-by: Christian König <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2017-11-24Merge tag 'keys-next-20171123' of ↵James Morris14-65/+66
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next-keys Merge keys subsystem changes from David Howells, for v4.15.
2017-11-23x86/PCI: Remove unused HyperTransport interrupt supportBjorn Helgaas11-454/+2
There are no in-tree callers of ht_create_irq(), the driver interface for HyperTransport interrupts, left. Remove the unused entry point and all the supporting code. See 8b955b0dddb3 ("[PATCH] Initial generic hypertransport interrupt support"). Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: Benjamin Herrenschmidt <[email protected]> Link: https://lkml.kernel.org/r/20171122221337.3877.23362.stgit@bhelgaas-glaptop.roam.corp.google.com
2017-11-23x86/umip: Fix insn_get_code_seg_params()'s return valueBorislav Petkov3-4/+4
In order to save on redundant structs definitions insn_get_code_seg_params() was made to return two 4-bit values in a char but clang complains: arch/x86/lib/insn-eval.c:780:10: warning: implicit conversion from 'int' to 'char' changes value from 132 to -124 [-Wconstant-conversion] return INSN_CODE_SEG_PARAMS(4, 8); ~~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~ ./arch/x86/include/asm/insn-eval.h:16:57: note: expanded from macro 'INSN_CODE_SEG_PARAMS' #define INSN_CODE_SEG_PARAMS(oper_sz, addr_sz) (oper_sz | (addr_sz << 4)) Those two values do get picked apart afterwards the opposite way of how they were ORed so wrt to the LSByte, the return value is the same. But this function returns -EINVAL in the error case, which is an int. So make it return an int which is the native word size anyway and thus fix the clang warning. Reported-by: Kees Cook <[email protected]> Reported-by: Nick Desaulniers <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2017-11-23x86/boot/KASLR: Remove unused variableChao Fan1-3/+2
There are two variables "rc" in mem_avoid_memmap. One at the top of the function and another one inside the while() loop. Drop the outer one as it is unused. Cleanup some whitespace damage while at it. Signed-off-by: Chao Fan <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2017-11-23genirq/matrix: Make - vs ?: Precedence explicitKees Cook1-1/+1
Noticed with a Clang build. This improves the readability of the ?: expression, as it has lower precedence than the - expression. Show explicitly that - is evaluated first. Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lkml.kernel.org/r/20171122205645.GA27125@beast
2017-11-23irqchip/imgpdc: Use resource_size function on resource objectVasyl Gomonovych1-1/+1
drivers/irqchip/irq-imgpdc.c:327:20-23: WARNING: Suspicious code. resource_size is maybe missing with res_regs Generated by: scripts/coccinelle/api/resource_size.cocci Signed-off-by: Vasyl Gomonovych <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2017-11-23irqchip/qcom: Fix u32 comparison with value less than zeroColin Ian King1-1/+1
The comparison of u32 nregs being less than zero is never true since nregs is unsigned. Fix this by making nregs a signed integer. Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: [email protected] Cc: Jason Cooper <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2017-11-24Merge branch 'ipvlan-Fix-insufficient-skb-linear-check'David S. Miller1-9/+27
Gao Feng says: ==================== ipvlan: Fix insufficient skb linear check The current ipvlan codes use pskb_may_pull to get the skb linear header in func ipvlan_get_L3_hdr, but the size isn't enough for arp and ipv6 icmp. So it may access the unexpected momory in ipvlan_addr_lookup. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-24ipvlan: Fix insufficient skb linear check for ipv6 icmpGao Feng1-1/+19
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to make sure the skb header has enough linear room for ipv6 header. But it would use the latter memory directly without linear check when it is icmp. So it still may access the unepxected memory in ipvlan_addr_lookup. Now invoke the pskb_may_pull again if it is ipv6 icmp. Signed-off-by: Gao Feng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24ipvlan: Fix insufficient skb linear check for arpGao Feng1-8/+8
In the function ipvlan_get_L3_hdr, current codes use pskb_may_pull to make sure the skb header has enough linear room for arp header. But it would access the arp payload in func ipvlan_addr_lookup. So it still may access the unepxected memory. Now use arp_hdr_len(port->dev) instead of the arp header as the param. Signed-off-by: Gao Feng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24geneve: only configure or fill UDP_ZERO_CSUM6_RX/TX info when CONFIG_IPV6Hangbin Liu1-1/+15
Stefano pointed that configure or show UDP_ZERO_CSUM6_RX/TX info doesn't make sense if we haven't enabled CONFIG_IPV6. Fix it by adding if IS_ENABLED(CONFIG_IPV6) check. Fixes: abe492b4f50c ("geneve: UDP checksum configuration via netlink") Fixes: fd7eafd02121 ("geneve: fix fill_info when link down") Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Stefano Brivio <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24Merge tag 'wireless-drivers-for-davem-2017-11-22' of ↵David S. Miller10-90/+335
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 4.15 First set of fixes for 4.15. Most important here is the iwlwifi fix for scan command firmware interface change. ath10k * fix CCMP-256, GCMP and GCMP-256 in raw mode, it was never working wcn36xx * fix device tree node search iwlwifi * fix a regression with firmware API change of scan cmd (introduced in firmware version 34) * add a bunch of PCI IDs and fix configuration structs for A000 devices * fix the exported firmware name strings for 9000 and A000 devices ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-24Merge branch '40GbE' of ↵David S. Miller16-120/+154
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Fixes 2017-11-21 This series contains fixes for igb/vf, ixgbe/vf, i40e/vf and fm10k. Jake fixes a regression issue with older firmware, where we were using the NVM lock to synchronize NVM reads for all devices and firmware versions, yet this caused issues with older firmware prior to version 1.5. Fixed this by only grabbing the lock for newer devices and firmware version 1.5 or newer. Zijie Pan fixes the calculation of the i40e VF MAC addresses, where it was possible to increment to the next MAC entry without calling i40e_add_mac_filter(). Amritha removes the upper limit of 64 queues on a channel VSI since the upper bound is determined by the VSI's num_queue_pairs. Filip fixes an issue during FLR resets, where should have been checking for upcoming core reset and if so, just return with I40E_ERR_NOT_READY. Alan fixes the notifying clients of l2 parameters by copying the parameters to the client instance struct and re-organizes the priority in which the client tasks fire so that if the flag for notifying l2 params is set, it will trigger before the client open task. Also fixed the promiscuous settings after reset for all the VSI's. Brian King from IBM fixes an issue seen on Power systems which would result in skb list corruption and eventual kernel oops. Brian provides the same fix for nearly all our drivers, to replace the read_barrier_depends with smp_rmb() to ensure loads are ordered with respect to the load of tx_buffer->next_to_watch. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-24net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHYFlorian Fainelli1-1/+1
The PHY on BCM7278 has an additional bit that needs to be cleared: IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out of suspend/resume cycles. Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller14-152/+216
Daniel Borkmann says: ==================== pull-request: bpf 2017-11-23 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Several BPF offloading fixes, from Jakub. Among others: - Limit offload to cls_bpf and XDP program types only. - Move device validation into the driver and don't make any assumptions about the device in the classifier due to shared blocks semantics. - Don't pass offloaded XDP program into the driver when it should be run in native XDP instead. Offloaded ones are not JITed for the host in such cases. - Don't destroy device offload state when moved to another namespace. - Revert dumping offload info into user space for now, since ifindex alone is not sufficient. This will be redone properly for bpf-next tree. 2) Fix test_verifier to avoid using bpf_probe_write_user() helper in test cases, since it's dumping a warning into kernel log which may confuse users when only running tests. Switch to use bpf_trace_printk() instead, from Yonghong. 3) Several fixes for correcting ARG_CONST_SIZE_OR_ZERO semantics before it becomes uabi, from Gianluca. More specifically: - Add a type ARG_PTR_TO_MEM_OR_NULL that is used only by bpf_csum_diff(), where the argument is either a valid pointer or NULL. The subsequent ARG_CONST_SIZE_OR_ZERO then enforces a valid pointer in case of non-0 size or a valid pointer or NULL in case of size 0. Given that, the semantics for ARG_PTR_TO_MEM in combination with ARG_CONST_SIZE_OR_ZERO are now such that in case of size 0, the pointer must always be valid and cannot be NULL. This fix in semantics allows for bpf_probe_read() to drop the recently added size == 0 check in the helper that would become part of uabi otherwise once released. At the same time we can then fix bpf_probe_read_str() and bpf_perf_event_output() to use ARG_CONST_SIZE_OR_ZERO instead of ARG_CONST_SIZE in order to fix recently reported issues by Arnaldo et al, where LLVM optimizes two boundary checks into a single one for unknown variables where the verifier looses track of the variable bounds and thus rejects valid programs otherwise. 4) A fix for the verifier for the case when it detects comparison of two constants where the branch is guaranteed to not be taken at runtime. Verifier will rightfully prune the exploration of such paths, but we still pass the program to JITs, where they would complain about using reserved fields, etc. Track such dead instructions and sanitize them with mov r0,r0. Rejection is not possible since LLVM may generate them for valid C code and doesn't do as much data flow analysis as verifier. For bpf-next we might implement removal of such dead code and adjust branches instead. Fix from Alexei. ==================== Signed-off-by: David S. Miller <[email protected]>
2017-11-24net: accept UFO datagrams from tuntap and packetWillem de Bruijn15-14/+209
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Acked-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24net: realtek: r8169: implement set_link_ksettings()Tobias Jakobi1-16/+22
Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially implemented the new ethtool API, by replacing get_settings() with get_link_ksettings(). This breaks ethtool, since the userspace tool (according to the new API specs) never tries the legacy set() call, when the new get() call succeeds. All attempts to chance some setting from userspace result in: > Cannot set new settings: Operation not supported Implement the missing set() call. Signed-off-by: Tobias Jakobi <[email protected]> Tested-by: Holger Hoffstätte <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-11-24net: ipv6: Fixup device for anycast routes during copyDavid Ahern1-1/+1
Florian reported a breakage with anycast routes due to commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address"). Prior to this commit anycast routes were added against the loopback device causing repetitive route entries with no insight into why they existed. e.g.: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium The point of commit 4832c30d5458 is to add the routes using the device with the address which is causing the route to be added. e.g.,: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth1 proto kernel metric 0 pref medium For traffic to work as it did before, the dst device needs to be switched to the loopback when the copy is created similar to local routes. Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>