Age | Commit message | Author | Files | Lines
2018-03-31 | bpf: Check attach type at prog load time | Andrey Ignatov | 8 | -29/+88
== The problem ==

There are use cases where a program of some type can be attached to multiple attach points, and those attach points must have different permissions to access context or to call helpers.

E.g. a context structure may have fields for both IPv4 and IPv6, but it doesn't make sense to read from / write to the IPv6 field when the attach point is somewhere in the IPv4 stack. The same applies to BPF helpers: it may make sense to call some helper from one attach point, but not from another, for the same prog type.

== The solution ==

Introduce an `expected_attach_type` field in `struct bpf_attr` for the `BPF_PROG_LOAD` command. If the scenario described in "The problem" section applies to some prog type, the field is checked twice:

1) At load time the prog type is checked to see whether the attach type for it must be known to validate program permissions correctly. The prog is rejected with EINVAL if that's the case and `expected_attach_type` is not specified or has an invalid value.

2) At attach time `attach_type` is compared with `expected_attach_type`, if the prog type requires one, and, if they differ, the attach is rejected with EINVAL.

The `expected_attach_type` is now available as part of `struct bpf_prog` in both `bpf_verifier_ops->is_valid_access()` and `bpf_verifier_ops->get_func_proto()` and can be used to check context accesses and calls to helpers respectively.

Initially the idea was discussed by Alexei Starovoitov <[email protected]> and Daniel Borkmann <[email protected]> here: https://marc.info/?l=linux-netdev&m=152107378717201&w=2

Signed-off-by: Andrey Ignatov <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
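For illustration, a minimal user-space sketch of supplying the new field at load time; the prog type here is a placeholder, since this patch only introduces the field and the check machinery:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  static int prog_load(const struct bpf_insn *insns, unsigned int insn_cnt,
                       __u32 expected_attach_type)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.prog_type = BPF_PROG_TYPE_CGROUP_SOCK;   /* placeholder */
          attr.insns = (__u64)(unsigned long)insns;
          attr.insn_cnt = insn_cnt;
          attr.license = (__u64)(unsigned long)"GPL";
          /* new field: checked at load time and compared against
           * attach_type at BPF_PROG_ATTACH time
           */
          attr.expected_attach_type = expected_attach_type;

          return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }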
2018-03-30 | Merge branch 'bpf-sockmap-sg-api-fixes' | Daniel Borkmann | 3 | -13/+27
Prashant Bhole says:

====================
These patches fix sg API usage in sockmap. Previously sockmap didn't use sg_init_table(), which caused a BUG_ON() to be hit in the sg API when CONFIG_DEBUG_SG is enabled.

v1: added sg_init_table() calls wherever needed.
v2:
 - Patch 1 adds a new helper function in the sg API: sg_init_marker()
 - Patch 2 uses sg_init_marker() and sg_init_table() in appropriate places

Background:
While reviewing v1, John Fastabend raised a valid point about the unnecessary memset in sg_init_table(), because sockmap uses an sg table which is embedded in a struct. As the enclosing struct is zeroed out, the memset in sg_init_table() is unnecessary. So Daniel Borkmann suggested defining another static inline function in scatterlist.h which only initializes sg_magic, and having sg_init_table() call it. From this suggestion I defined a function sg_init_marker() which sets sg_magic and calls sg_mark_end().
====================

Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-30 | bpf: sockmap: initialize sg table entries properly | Prashant Bhole | 1 | -5/+8
When CONFIG_DEBUG_SG is set, sg->sg_magic is initialized in sg_init_table() and verified by the sg API while navigating the list. A BUG_ON() is hit when the magic check fails.

In bpf_tcp_sendpage() and bpf_tcp_sendmsg(), the struct containing the scatterlist is already zeroed out. So, to avoid the extra memset, we use sg_init_marker() to initialize sg_magic.

Fixed the following:
 - In bpf_tcp_sendpage: initialize sg using sg_init_marker
 - In bpf_tcp_sendmsg: replace sg_init_table with sg_init_marker
 - In bpf_tcp_push: replace memset with sg_init_table where a consumed sg entry needs to be re-initialized

Signed-off-by: Prashant Bhole <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
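Schematically, the sendpage path now initializes the embedded (already zeroed) scatterlist like this; a simplified sketch of bpf_tcp_sendpage(), not the verbatim code:

  struct scatterlist *sg = md.sg_data;    /* inside a zeroed sk_msg_buff */

  sg_init_marker(sg, MAX_SKB_FRAGS);      /* sets SG_MAGIC + end mark, no memset */
  sg_set_page(sg, page, size, offset);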
2018-03-30 | lib/scatterlist: add sg_init_marker() helper | Prashant Bhole | 2 | -8/+19
sg_init_marker() initializes sg_magic in the sg table and calls sg_mark_end() on the last entry of the table. This can be useful to avoid the memset in sg_init_table() when the scatterlist is already zeroed out, for example when the scatterlist is embedded inside another struct and that container struct is zeroed out.

Suggested-by: Daniel Borkmann <[email protected]>
Signed-off-by: Prashant Bhole <[email protected]>
Acked-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
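For reference, the helper looks roughly like this (simplified; the authoritative version lives in include/linux/scatterlist.h, where sg_init_table() now calls it after its memset):

  static inline void sg_init_marker(struct scatterlist *sgl,
                                    unsigned int nents)
  {
  #ifdef CONFIG_DEBUG_SG
          unsigned int i;

          /* memory is assumed already zeroed; only the debug magic
           * needs to be set
           */
          for (i = 0; i < nents; i++)
                  sgl[i].sg_magic = SG_MAGIC;
  #endif
          sg_mark_end(&sgl[nents - 1]);
  }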
2018-03-30 | Merge branch 'bpf-sockmap-ingress' | Daniel Borkmann | 8 | -37/+430
John Fastabend says:

====================
This series adds BPF_F_INGRESS flag support to the redirect APIs, bringing the sockmap API in line with the cls_bpf redirect APIs. We add it to both variants of sockmap programs: the first patch adds support for the tx ulp hooks and the third patch adds support for the recv skb hooks. Patches two and four add tests for the corresponding ingress redirect hooks.

Follow-up patches can address busy polling support, but the next series from me will move the sockmap sample program into selftests.

v2: added static to function definition caught by kbuild bot
v3: fixed an error branch with missing mem_uncharge in recvmsg op;
    moved receive_queue check outside of RCU region
====================

Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-30 | bpf: sockmap, more BPF_SK_SKB_STREAM_VERDICT tests | John Fastabend | 3 | -4/+60
Add BPF_SK_SKB_STREAM_VERDICT tests for the ingress hook. While at it, also bring the stream tests in line with the MSG-based testing. A map for skb options is added so userland can push options to BPF programs.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-30 | bpf: sockmap, BPF_F_INGRESS flag for BPF_SK_SKB_STREAM_VERDICT | John Fastabend | 3 | -19/+78
Add support for the BPF_F_INGRESS flag in the skb redirect helper. To do this, the skb is converted into a scatterlist and pushed into the ingress queue. This is the same logic that is used in the sk_msg redirect helper, so it should feel familiar.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
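A sketch of a stream verdict program using the new flag; the sock_map map and key 0 are placeholders assumed to be defined elsewhere:

  SEC("sk_skb/stream_verdict")
  int prog_verdict(struct __sk_buff *skb)
  {
          /* deliver to the target socket's ingress queue rather
           * than its egress path
           */
          return bpf_sk_redirect_map(skb, &sock_map, 0, BPF_F_INGRESS);
  }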
2018-03-30 | bpf: sockmap, add BPF_F_INGRESS tests | John Fastabend | 3 | -11/+87
Add a set of tests to verify that the ingress flag in the redirect helpers works correctly with various msg sizes.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-30 | bpf: sockmap redirect ingress support | John Fastabend | 5 | -5/+207
Add support for the BPF_F_INGRESS flag in the sk_msg redirect helper. To do this, add a scatterlist ring for receiving socks to check before calling into the regular recvmsg call path. Additionally, because the poll wakeup logic only checks the skb recv queue, we need to add a hook in the TCP stack (similar to the write side) so that we have a way to wake up polling socks when a scatterlist is redirected to that sock.

After this, all that is needed is for the redirect helper to push the scatterlist into the psock receive queue.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
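The msg-side counterpart looks like this sketch; sock_map and key 0 are again placeholders:

  SEC("sk_msg")
  int prog_msg(struct sk_msg_md *msg)
  {
          /* push the scatterlist to the receiving socket's
           * psock ingress queue
           */
          return bpf_msg_redirect_map(msg, &sock_map, 0, BPF_F_INGRESS);
  }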
2018-03-28 | Merge branch 'nfp-bpf-updates' | Alexei Starovoitov | 10 | -80/+771
Jakub Kicinski says:

====================
This set adds support for update and delete calls from the datapath, as well as XADD instructions (32 and 64 bit) and pseudo random numbers. The XADD support depends on the verifier enforcing alignment, which Daniel recently added. XADD uses NFP's atomic engine, which requires values to be in big endian, therefore we need to keep track of which parts of the values are used as atomics and byte swap them accordingly. Pseudo random numbers are generated using NFP's HW pseudo random number generator.

Jiong tackles the initial implementation of a packet cache, which he describes as follows:

Memory reads on the NFP first fetch data from memory into transfer-in registers, then move them from transfer-in to general registers. Given the NFP is rich in transfer-in registers, they can serve as a memory cache. This patch tries to identify a sequence of packet data reads (BPF_LDX) that are executed sequentially; the total access range of the sequence is then calculated and attached to each read instruction, and the first instruction in the sequence is marked with a cache init flag so its execution brings in the whole range of packet data for the sequence. All later packet reads in this sequence fetch data from the transfer-in registers directly, with no need to JIT an NFP memory access.

Function calls, non-packet-data memory reads, packet writes and memcpy will invalidate the cache and start a new cache range. Cache invalidation could be improved in the future; for example, a packet write doesn't need to invalidate the cache if the write destination won't be read again.
====================

Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: improve wrong FW response warnings | Jakub Kicinski | 1 | -6/+6
When the FW responds with a message of the wrong size or type, make sure the type is checked first and included in the wrong-size message. This makes it easier to figure out which FW command failed.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: add support for bpf_get_prandom_u32() | Jakub Kicinski | 6 | -2/+47
NFP has a prng register, which we can read to obtain a u32 worth of pseudo-random data. Generate code for it.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
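Nothing changes on the BPF C side; an offloaded program keeps calling the standard helper, which now JITs to a read of the NFP prng register. An illustrative sketch:

  SEC("xdp")
  int sample(struct xdp_md *ctx)
  {
          /* when offloaded to the NFP, this reads the HW prng */
          __u32 r = bpf_get_prandom_u32();

          return (r & 1) ? XDP_PASS : XDP_DROP;
  }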
2018-03-28 | nfp: bpf: add support for atomic add of unknown values | Jakub Kicinski | 5 | -15/+88
Allow atomic add to be used even when the value is not guaranteed to fit into a 16 bit immediate. This requires the value to be pulled in as data, and therefore the use of a transfer register and a context swap.

Track the information about possible lengths of the value; if it's guaranteed to be larger than 16 bits, don't generate the code for the optimized case at all.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: expose command delay slots | Jakub Kicinski | 1 | -29/+24
Allow callers to control the delay slots of commands, instead of giving them just a wait/nowait choice. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Reviewed-by: Jiong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: add basic support for atomic adds | Jakub Kicinski | 6 | -3/+212
Implement atomic add operation for 32 and 64 bit values. Depend on the verifier to ensure alignment. Values have to be kept in big endian and swapped upon read/write. For now only support atomic add of a constant. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Reviewed-by: Jiong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
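On the BPF C side, these XADD instructions are what clang emits for __sync_fetch_and_add(); a sketch, with cnt_map an assumed array map holding a u64 counter:

  SEC("xdp")
  int count(struct xdp_md *ctx)
  {
          __u32 key = 0;
          __u64 *val = bpf_map_lookup_elem(&cnt_map, &key);  /* cnt_map assumed */

          if (val)
                  /* compiles to a 64-bit BPF_XADD on the map value */
                  __sync_fetch_and_add(val, 1);
          return XDP_PASS;
  }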
2018-03-28 | bpf: add parentheses around argument of BPF_LDST_BYTES() | Jakub Kicinski | 1 | -1/+1
BPF_LDST_BYTES() does not put its argument in parentheses when referencing it. This makes it impossible to pass pointers obtained by the address-of operator (e.g. BPF_LDST_BYTES(&insn)). Add the parentheses.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
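Schematically (simplified; the real macro in include/linux/filter.h also sanity-checks the size):

  /* before: with 'insn' unparenthesized, BPF_LDST_BYTES(&insn)
   * expands to BPF_SIZE(&insn->code), i.e. &(insn->code), which
   * does not compile
   */
  #define BPF_LDST_BYTES(insn) bpf_size_to_bytes(BPF_SIZE(insn->code))

  /* after */
  #define BPF_LDST_BYTES(insn) bpf_size_to_bytes(BPF_SIZE((insn)->code))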
2018-03-28 | nfp: bpf: add map deletes from the datapath | Jakub Kicinski | 4 | -0/+17
Support calling the map_delete_elem() FW helper from the datapath programs. For the JIT, the checks and code are basically equivalent to map lookups. As with the other map helpers, the key must be on the stack. Different pointer types are left for future extension.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: add map updates from the datapath | Jakub Kicinski | 4 | -0/+21
Support calling map_update_elem() from the datapath programs by calling into the FW-provided helper. The value pointer is passed in LM pointer #2. Keeping track of the old state for arg3 is not necessary; since LM pointer #2 will always be loaded in this case, the trivial optimization for a value at the bottom of the stack can't be done here.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: add helper for basic map call checks | Jakub Kicinski | 1 | -15/+25
Add a verifier helper for performing the basic state checks before a call to a map helper. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Quentin Monnet <[email protected]> Reviewed-by: Jiong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: add helper for validating stack pointers | Jakub Kicinski | 3 | -27/+50
Our implementation has restrictions on the stack pointers used for function calls. Move the common checks into a helper for reuse. The state has to be encapsulated into a structure to support parameters other than BPF_REG_2.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: rename map_lookup_stack() to map_call_stack_common() | Jakub Kicinski | 1 | -3/+3
We will reuse most of the map call code gen for other map calls. Rename the lookup gen function and use meta->func_id instead of hard-coding lookup.

Signed-off-by: Jakub Kicinski <[email protected]>
Reviewed-by: Quentin Monnet <[email protected]>
Reviewed-by: Jiong Wang <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: detect packet reads could be cached, enable the optimisation | Jiong Wang | 2 | -0/+145
This patch is the front end of this optimisation: it detects and marks those packet reads that could be cached. The optimisation "backend" will then be activated automatically.

Signed-off-by: Jiong Wang <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: support unaligned read offset | Jiong Wang | 1 | -3/+70
This patch adds support for an unaligned read offset, i.e. when the read offset to the start of the packet cache area is not aligned to REG_WIDTH. In this case, the read area might span a maximum of three transfer-in registers.

Signed-off-by: Jiong Wang <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | nfp: bpf: read from packet data cache for PTR_TO_PACKET | Jiong Wang | 3 | -2/+88
This patch assumes there is a packet data cache and tries to read packet data from the cache instead of from memory. It only implements the optimisation "backend"; it doesn't build the packet data cache, so the optimisation is not yet enabled.

This patch only enables aligned packet data reads, i.e. when the read offset to the start of the cache is REG_WIDTH aligned.

Signed-off-by: Jiong Wang <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
2018-03-28 | Merge branch 'bpf-raw-tracepoints' | Daniel Borkmann | 30 | -77/+607
Alexei Starovoitov says:

====================
v7->v8:
 - moved 'u32 num_args' from 'struct tracepoint' into 'struct bpf_raw_event_map'; that increases memory overhead, but can be optimized/compressed later. Now it's zero changes in tracepoint.[ch]

v6->v7:
 - adopted Steven's bpf_raw_tp_map section approach to find the tracepoint and corresponding bpf probe function instead of the kallsyms approach; dropped the kernel_tracepoint_find_by_name() patch

v5->v6:
 - avoid changing the semantics of the for_each_kernel_tracepoint() function; instead introduce a kernel_tracepoint_find_by_name() helper

v4->v5:
 - adopted Daniel's fancy REPEAT macro in bpf_trace.c in patch 6

v3->v4:
 - adopted Linus's CAST_TO_U64 macro to cast any integer, pointer, or small struct to u64. That nicely reduced the size of patch 1

v2->v3:
 - with Linus's suggestion introduced generic COUNT_ARGS and CONCATENATE macros (or rather moved them from apparmor), which cleaned up patch 6
 - added patch 4 to refactor trace_iwlwifi_dev_ucode_error() from 17 args to 4. Now any tracepoint with >12 args will cause a build error

v1->v2:
 - simplified the api by combining bpf_raw_tp_open(name) + bpf_attach(prog_fd) into bpf_raw_tp_open(name, prog_fd) as suggested by Daniel. That simplifies bpf_detach as well, which is now a simple close() of the fd.
 - fixed a memory leak in the error path which was spotted by Daniel.
 - fixed bpf_get_stackid() and bpf_perf_event_output() called from raw tracepoints
 - added more tests
 - fixed allyesconfig build caught by buildbot

v1:
This patch set is a different way to address the pressing need to access task_struct pointers in sched tracepoints from bpf programs.

The first approach simply added these pointers to sched tracepoints: https://lkml.org/lkml/2017/12/14/753 which Peter nacked. A few options were discussed and eventually the discussion converged on doing bpf-specific tracepoint_probe_register() probe functions. Details here: https://lkml.org/lkml/2017/12/20/929

Patch 1 is a kernel-wide cleanup of pass-struct-by-value into pass-struct-by-reference in tracepoints.
Patches 2 and 3 are minor cleanups to address the allyesconfig build.
Patch 4 refactors trace_iwlwifi_dev_ucode_error from 17 to 4 args.
Patch 5 introduces the COUNT_ARGS macro.
Patch 6 introduces the BPF_RAW_TRACEPOINT api. Auto-cleanup and multiple concurrent users are must-have features of a tracing api. For bpf raw tracepoints it looks like:

  // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
  prog_fd = bpf_prog_load(...);

  // receive anon_inode fd for given bpf_raw_tracepoint
  // and attach bpf program to it
  raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of the tracing daemon or cmdline tool will automatically detach the bpf program, unload it and unregister the tracepoint probe. More details in patch 6.

Patch 7 - trivial support in libbpf
Patches 8, 9 - user space tests

samples/bpf/test_overhead performance on 1 cpu:

  tracepoint     base  kprobe+bpf  tracepoint+bpf  raw_tracepoint+bpf
  task_rename    1.1M     769K         947K              1.0M
  urandom_read   789K     697K         750K              755K
====================

Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | selftests/bpf: test for bpf_get_stackid() from raw tracepoints | Alexei Starovoitov | 1 | -21/+70
Similar to the traditional tracepoint test, add a bpf_get_stackid() test from raw tracepoints, and reduce the verbosity of the existing stackmap test.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | samples/bpf: raw tracepoint test | Alexei Starovoitov | 4 | -0/+44
Add an empty raw_tracepoint bpf program to test overhead, similar to the kprobe and traditional tracepoint tests.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | libbpf: add bpf_raw_tracepoint_open helper | Alexei Starovoitov | 3 | -0/+23
Add a bpf_raw_tracepoint_open(const char *name, int prog_fd) API to libbpf.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
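The new API is a thin wrapper around the syscall, roughly (ptr_to_u64() and sys_bpf() are libbpf's existing internal helpers):

  int bpf_raw_tracepoint_open(const char *name, int prog_fd)
  {
          union bpf_attr attr;

          bzero(&attr, sizeof(attr));
          attr.raw_tracepoint.name = ptr_to_u64(name);
          attr.raw_tracepoint.prog_fd = prog_fd;

          /* returns an anon_inode fd; close() detaches the program */
          return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
  }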
2018-03-28 | bpf: introduce BPF_RAW_TRACEPOINT | Alexei Starovoitov | 9 | -0/+424
Introduce the BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access kernel-internal arguments of tracepoints in their raw form.

From the bpf program's point of view, access to the arguments looks like:

  struct bpf_raw_tracepoint_args {
          __u64 args[0];
  };

  int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
  {
          // program can read args[N] where N depends on tracepoint
          // and is statically verified at program load+attach time
  }

The kprobe+bpf infrastructure allows programs to access function arguments. This feature allows programs to access raw tracepoint arguments. Similar to the proposed 'dynamic ftrace events', there are no ABI guarantees about what the tracepoint arguments are and what their meaning is. The program needs to type cast the args properly and use the bpf_probe_read() helper to access struct fields when an argument is a pointer.

For every tracepoint a __bpf_trace_##call function is prepared. In assembler it looks like:

  (gdb) disassemble __bpf_trace_xdp_exception
  Dump of assembler code for function __bpf_trace_xdp_exception:
     0xffffffff81132080 <+0>: mov    %ecx,%ecx
     0xffffffff81132082 <+2>: jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

  TRACE_EVENT(xdp_exception,
          TP_PROTO(const struct net_device *dev,
                   const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting the 32-bit 'act' field into 'u64' to pass into bpf_trace_run3(), while the 'dev' and 'xdp' args are passed as-is. All ~500 of the __bpf_trace_*() functions are only 5-10 bytes long, and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead when calling trace_xdp_exception() from kernel C code and transitioning into bpf land. Since tracepoint+bpf is used at speeds of 1M+ events per second, this is a valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced; it returns an anon_inode FD of a 'bpf-raw-tracepoint' object. The user space usage looks like:

  // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
  prog_fd = bpf_prog_load(...);

  // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
  raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of a tracing daemon or cmdline tool that uses this feature will automatically detach the bpf program, unload it and unregister the tracepoint probe.

On the kernel side, the __bpf_raw_tp_map section of pointers to the tracepoint definition and to the __bpf_trace_*() probe function is used to find the tracepoint with the "xdp_exception" name and the corresponding __bpf_trace_xdp_exception() probe function, which are passed to tracepoint_probe_register() to connect the probe with the tracepoint.

The addition of bpf_raw_tracepoint doesn't interfere with the ftrace and perf tracepoint mechanisms. perf_event_open() can be used in parallel on the same tracepoint. Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are permitted, each with its own bpf program. The kernel will execute all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with query/introspection logic.

The __bpf_raw_tp_map section logic was contributed by Steven Rostedt.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | macro: introduce COUNT_ARGS() macro | Alexei Starovoitov | 2 | -6/+8
Move the COUNT_ARGS() macro from apparmor to a generic header and extend it to count up to twelve arguments. COUNT() was an alternative name for this logic, but it's used for a different purpose in many other places. Similarly for the CONCATENATE() macro.

Suggested-by: Linus Torvalds <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
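For reference, the counting trick is roughly (extended to twelve in the generic header):

  #define __COUNT_ARGS(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, \
                       _10, _11, _12, _n, X...) _n
  #define COUNT_ARGS(X...) __COUNT_ARGS(, ##X, 12, 11, 10, 9, 8, \
                                        7, 6, 5, 4, 3, 2, 1, 0)

  /* COUNT_ARGS()        -> 0
   * COUNT_ARGS(a, b, c) -> 3
   */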
2018-03-28 | net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint | Alexei Starovoitov | 4 | -33/+21
Fix the iwlwifi_dev_ucode_error tracepoint to pass a pointer to a table instead of all 17 arguments by value. dvm/main.c and mvm/utils.c have 'struct iwl_error_event_table' defined with very similar yet subtly different fields and offsets. The tracepoint is still common and uses the definition of 'struct iwl_error_event_table' from dvm/commands.h while copying fields. Long term, this tracepoint should probably be split into two.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | net/mac802154: disambiguate mac80211 vs mac802154 trace events | Alexei Starovoitov | 1 | -4/+4
Two trace events were defined with the same name and both are unused. They conflict in an allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | net/mediatek: disambiguate mt76 vs mt7601u trace events | Alexei Starovoitov | 1 | -3/+3
Two trace events were defined with the same name and both are unused. They conflict in an allyesconfig build. Rename one of them.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | treewide: remove large struct-pass-by-value from tracepoint arguments | Alexei Starovoitov | 5 | -10/+10
- fix trace_hfi1_ctxt_info() to pass a large struct by reference instead of by value
- convert 'type array[]' tracepoint arguments into 'type *array', since the compiler will warn that sizeof('type array[]') == sizeof('type *array') and the latter should be used instead

The CAST_TO_U64 macro in a later patch will enforce that tracepoint arguments can only be integers, pointers, or structures smaller than 8 bytes. Larger structures should be passed by reference.

Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-28 | bpf: Add sock_ops R/W access to ipv4 tos | Nikita V. Shirokov | 1 | -0/+35
Sample usage for tos:

  bpf_getsockopt(skops, SOL_IP, IP_TOS, &v, sizeof(v))

where skops is a pointer to the ctx (struct bpf_sock_ops).

Signed-off-by: Nikita V. Shirokov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
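A slightly fuller sketch exercising both directions (section name per libbpf convention; the TOS value is arbitrary):

  SEC("sockops")
  int set_tos(struct bpf_sock_ops *skops)
  {
          int v = 0;

          if (!bpf_getsockopt(skops, SOL_IP, IP_TOS, &v, sizeof(v))) {
                  v |= 0x10;      /* arbitrary example value */
                  bpf_setsockopt(skops, SOL_IP, IP_TOS, &v, sizeof(v));
          }
          return 1;
  }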
2018-03-28 | samples/bpf: fix spelling mistake: "revieve" -> "receive" | Colin Ian King | 1 | -1/+1
Trivial fix to a spelling mistake in error message text.

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-27 | bpf: follow idr code convention | Shaohua Li | 1 | -0/+4
Generally we do a preload before doing idr allocation. This also helps improve the allocation success rate under memory pressure.

Cc: Daniel Borkmann <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
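The convention in question is, roughly, how the bpf id allocators now look:

  idr_preload(GFP_KERNEL);      /* prealloc so the atomic alloc can succeed */
  spin_lock_bh(&map_idr_lock);
  id = idr_alloc_cyclic(&map_idr, map, 1, INT_MAX, GFP_ATOMIC);
  spin_unlock_bh(&map_idr_lock);
  idr_preload_end();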
2018-03-26 | Merge branch 'bpf-verifier-log-btf-prep' | Daniel Borkmann | 2 | -12/+22
Martin KaFai Lau says:

====================
This patch set has some changes and clean-up work for the bpf_verifier_log. It is prep work for the BTF (BPF Type Format).
====================

Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-26 | bpf: Add bpf_verifier_vlog() and bpf_verifier_log_needed() | Martin KaFai Lau | 2 | -8/+18
The BTF (BPF Type Format) verifier needs to reuse the current BPF verifier log. Hence, it requires the following changes:

(1) Expose log_write() in verifier.c for other users. It is renamed to bpf_verifier_vlog().

(2) The BTF verifier also needs to check 'log->level && log->ubuf && !bpf_verifier_log_full(log);' independently, outside of the current log_write(). This is because the BTF verifier will do one check before making multiple calls to btf_verifier_vlog() to log the details of a type. Hence, this check is also refactored into a new function, bpf_verifier_log_needed(). Since it is refactored out, we can check it before va_start() in the current bpf_verifier_log_write() and verbose().

Signed-off-by: Martin KaFai Lau <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
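The refactored check is, per the description above, roughly:

  static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log)
  {
          return log->level && log->ubuf && !bpf_verifier_log_full(log);
  }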
2018-03-26 | bpf: Rename bpf_verifer_log | Martin KaFai Lau | 2 | -5/+5
bpf_verifer_log => bpf_verifier_log Signed-off-by: Martin KaFai Lau <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-23 | Merge branch 'bpf-print-insns-api' | Daniel Borkmann | 4 | -53/+60
Jiri Olsa says:

====================
This patchset removes the struct bpf_verifier_env argument from the print_bpf_insn function (patch 1) and changes the user space bpftool to use it that way (patch 2).
====================

Reviewed-by: Quentin Monnet <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-23 | bpftool: Adjust to new print_bpf_insn interface | Jiri Olsa | 1 | -6/+6
Change bpftool to skip the removed struct bpf_verifier_env argument in print_bpf_insn. It was passed as NULL anyway. No functional change intended. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2018-03-23 | bpf: Remove struct bpf_verifier_env argument from print_bpf_insn | Jiri Olsa | 3 | -47/+54
We use print_bpf_insn in user space (bpftool and soon perf), so it'd be nice to keep it generic and strip it of the kernel struct bpf_verifier_env argument.

This argument can be safely removed, because its users can use struct bpf_insn_cbs::private_data to pass it.

By changing the argument type we can no longer have a clean 'verbose' alias to 'bpf_verifier_log_write' in verifier.c. Instead we add a 'verbose' cb_print callback and remove the alias. This way we have the new cb_print callback in place, and all the 'verbose(env, ...)' calls in verifier.c will cleanly cast to 'verbose(void *, ...)', so no other change is needed.

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
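After this change a caller wires up printing through the callbacks struct, whose shape is roughly (other callbacks elided):

  typedef void (*bpf_insn_print_t)(void *private_data,
                                   const char *fmt, ...);

  struct bpf_insn_cbs {
          bpf_insn_print_t  cb_print;     /* e.g. 'verbose' in verifier.c */
          /* ... callbacks for resolving calls/immediates elided ... */
          void              *private_data;/* passed back as the first arg */
  };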
2018-03-23 | intel: add SPDX identifiers to all the Intel drivers | Jeff Kirsher | 168 | -0/+168
Add the SPDX identifiers to all the Intel wired LAN driver files, as outlined in Documentation/process/license-rules.rst. Signed-off-by: Jeff Kirsher <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-23 | bridge: Allow max MTU when multiple VLANs present | Chas Williams | 4 | -7/+25
If the bridge is allowing multiple VLANs, some VLANs may have different MTUs. Instead of choosing the minimum MTU for the bridge interface, choose the maximum MTU of the bridge members. With this, the user only needs to set a larger MTU on the member ports that are participating in the large-MTU VLANs.

Signed-off-by: Chas Williams <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Acked-by: Roopa Prabhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
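A simplified sketch of the idea (the real helper keeps the old minimum behaviour when VLAN filtering is disabled):

  static int br_mtu(const struct net_bridge *br)
  {
          const struct net_bridge_port *p;
          int ret = ETH_DATA_LEN;

          /* with VLANs enabled, size the bridge for its largest member */
          list_for_each_entry(p, &br->port_list, list)
                  ret = max(ret, (int)p->dev->mtu);

          return ret;
  }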
2018-03-23 | virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS | Jay Vosburgh | 1 | -1/+1
The operstate update logic will leave an interface in the default UNKNOWN operstate if the interface carrier state never changes from the default carrier-up state set at creation. This includes the case of an explicit call to netif_carrier_on(), as the carrier on-to-on transition has no effect on operstate.

This affects virtio-net for the case that the virtio peer does not support VIRTIO_NET_F_STATUS (the feature that provides carrier state updates). Without this feature, the virtio specification states that "the link should be assumed active," so, logically, the operstate should be UP instead of UNKNOWN. This has an impact on user space applications that use the operstate to make availability decisions for the interface.

Resolve this by changing the virtio probe logic slightly to call netif_carrier_off() for both the "with" and "without" VIRTIO_NET_F_STATUS cases; the existing call to netif_carrier_on() for the "without" case will then cause an operstate transition.

Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Ben Hutchings <[email protected]>
Signed-off-by: Jay Vosburgh <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
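The probe-time logic after the fix is roughly:

  /* default to carrier off, so the later netif_carrier_on() is a
   * real off->on transition and moves operstate to UP
   */
  netif_carrier_off(dev);

  if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_STATUS)) {
          schedule_work(&vi->config_work);  /* read the real link state */
  } else {
          vi->status = VIRTIO_NET_S_LINK_UP;
          netif_carrier_on(dev);
  }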
2018-03-23 | devlink: Remove top_hierarchy arg for DEVLINK disabled path | David Ahern | 1 | -1/+0
Earlier change missed the path where CONFIG_NET_DEVLINK is disabled. Thanks to Jiri for spotting. Fixes: 145307460ba9 ("devlink: Remove top_hierarchy arg to devlink_resource_register") Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-03-23 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 636 | -4191/+6274
Fun set of conflict resolutions here...

For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved.

In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed.

In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here.

The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code.

The mlx5 infiniband conflict resolution was quite non-trivial; the RDMA tree's merge commit was used as a guide here, and here are their notes:

====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based.

Conflicts:
  drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f9524 (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit b5ca15ad7e61 (IB/mlx5: Add proper representors support) added as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch.

Updates:
  drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch
  drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch
====================

Signed-off-by: David S. Miller <[email protected]>
2018-03-22 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 16 | -79/+153
Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  mm, thp: do not cause memcg oom for thp
  mm/vmscan: wake up flushers for legacy cgroups too
  Revert "mm: page_alloc: skip over regions of invalid pfns where possible"
  mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink()
  mm/thp: do not wait for lock_page() in deferred_split_scan()
  mm/khugepaged.c: convert VM_BUG_ON() to collapse fail
  x86/mm: implement free pmd/pte page interfaces
  mm/vmalloc: add interfaces to free unmapped page table
  h8300: remove extraneous __BIG_ENDIAN definition
  hugetlbfs: check for pgoff value overflow
  lockdep: fix fs_reclaim warning
  MAINTAINERS: update Mark Fasheh's e-mail
  mm/mempolicy.c: avoid use uninitialized preferred_node
2018-03-22 | Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm | Linus Torvalds | 8 | -49/+57
Pull libnvdimm fixes from Dan Williams:

 "Two regression fixes, two bug fixes for older issues, two fixes for new functionality added this cycle that have userspace ABI concerns, and a small cleanup. These have appeared in a linux-next release and have a build success report from the 0day robot.

  * The 4.16 rework of altmap handling led to some configurations leaking page table allocations due to freeing from the altmap reservation rather than the page allocator. The impact without the fix is leaked memory and a WARN() message when tearing down libnvdimm namespaces. The rework also missed a place where error handling code needed to be removed, which can lead to a crash if devm_memremap_pages() fails.

  * acpi_map_pxm_to_node() had a latent bug whereby it could misidentify the closest online node to a given proximity domain.

  * Block integrity handling was reworked several kernels back to allow calling add_disk() after setting up the integrity profile. The nd_btt and nd_blk drivers are just now catching up to fix automatic partition detection at driver load time.

  * The new persistence_domain attribute, a platform indicator of whether cpu caches are powerfail-protected for example, is meant to be a single-value enum and not a set of flags. This oversight was caught while reviewing new userspace code in libndctl to communicate the attribute. Fix this new enabling up so that we are not stuck with an unwanted userspace ABI"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  libnvdimm, nfit: fix persistence domain reporting
  libnvdimm, region: hide persistence_domain when unknown
  acpi, numa: fix pxm to online numa node associations
  x86, memremap: fix altmap accounting at free
  libnvdimm: remove redundant assignment to pointer 'dev'
  libnvdimm, {btt, blk}: do integrity setup before add_disk()
  kernel/memremap: Remove stale devres_free() call