path: root/security
Age | Commit message | Author | Files | Lines
2013-08-14 | apparmor: add an optional profile attachment string for profiles | John Johansen | 4 | -0/+40
Add the ability to take in and report a human readable profile attachment string for profiles so that attachment specifications can be easily inspected. Signed-off-by: John Johansen <[email protected]> Acked-by: Seth Arnold <[email protected]>
2013-08-14 | apparmor: add interface files for profiles and namespaces | John Johansen | 7 | -29/+436
Add basic interface files to access namespace and profile information. The interface files are created when a profile is loaded and removed when the profile or namespace is removed. Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: allow setting any profile into the unconfined state | John Johansen | 5 | -9/+22
Allow emulating the default profile behavior from boot, by allowing loading of a profile in the unconfined state into a new NS. Signed-off-by: John Johansen <[email protected]> Acked-by: Seth Arnold <[email protected]>
2013-08-14 | apparmor: make free_profile available outside of policy.c | John Johansen | 3 | -7/+7
Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: rework namespace free path | John Johansen | 2 | -35/+10
namespaces now completely use the unconfined profile to track the refcount and rcu freeing cycle. So rework the code to simplify (track everything through the profile path right up to the end), and move the rcu_head from policy base to profile as the namespace no longer needs it. Signed-off-by: John Johansen <[email protected]> Acked-by: Seth Arnold <[email protected]>
2013-08-14 | apparmor: update how unconfined is handled | John Johansen | 3 | -83/+67
ns->unconfined is being used read side without locking, nor rcu but is being updated when a namespace is removed. This works for the root ns which is never removed but has a race window and can cause failures when children namespaces are removed. Also ns and ns->unconfined have a circular refcounting dependency that is problematic and must be broken. Currently this is done incorrectly when the namespace is destroyed. Fix this by forward referencing unconfined via the replacedby infrastructure instead of directly updating the ns->unconfined pointer. Remove the circular refcount dependency by making the ns and its unconfined profile share the same refcount. Signed-off-by: John Johansen <[email protected]> Acked-by: Seth Arnold <[email protected]>
2013-08-14 | apparmor: change how profile replacement update is done | John Johansen | 6 | -87/+125
Remove the use of replaced by chaining and move to profile invalidation and lookup to handle task replacement. Replacement chaining can result in large chains of profiles being pinned in memory when one profile in the chain is in use. With implicit labeling this will be even more of a problem, so move to a direct lookup method. Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: convert profile lists to RCU based locking | John Johansen | 4 | -111/+167
Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: provide base for multiple profiles to be replaced at once | John Johansen | 4 | -146/+283
Previously profiles had to be loaded one at a time, which could result in cases where a replacement of a set would partially succeed and then fail, resulting in inconsistent policy. Allow multiple profiles to be replaced "atomically" so that the replacement either succeeds or fails for the entire set of profiles. Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: add a features/policy dir to interface | John Johansen | 1 | -0/+5
Add a policy directory to features to contain features that can affect policy compilation but do not affect mediation. Examples of such features would be the types of dfa compression supported, etc. Signed-off-by: John Johansen <[email protected]> Acked-by: Kees Cook <[email protected]>
2013-08-14 | apparmor: enable users to query whether apparmor is enabled | John Johansen | 1 | -1/+1
Signed-off-by: John Johansen <[email protected]>
2013-08-14 | apparmor: remove minimum size check for vmalloc() | Tetsuo Handa | 1 | -5/+0
This is a follow-up to commit b5b3ee6c "apparmor: no need to delay vfree()". Since vmalloc() will do "size = PAGE_ALIGN(size);", we don't need to check for "size >= sizeof(struct work_struct)". Signed-off-by: Tetsuo Handa <[email protected]> Signed-off-by: John Johansen <[email protected]>
2013-08-12 | Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes | Rafal Krypa | 1 | -85/+82
The Smack interface for loading rules has always parsed only a single rule from data written to it. This requires a user program to call write() once for each rule it wants to load. This change makes it possible to write multiple rules, separated by newline characters. Smack will load at most PAGE_SIZE-1 characters and properly return the number of processed bytes. When the user buffer is larger, it will additionally be truncated. All characters after the last \n will not be parsed, to avoid a partial rule near the input buffer boundary. Signed-off-by: Rafal Krypa <[email protected]>
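The truncation rule described in this message can be sketched in a few lines of C. This is a minimal illustration, not the code from the patch: `SKETCH_PAGE_SIZE` and `complete_rule_bytes()` are hypothetical names, the 4096-byte page size is an assumption, and trimming back to the last newline only when the input was cut at the page limit is one reading of the message.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PAGE_SIZE; 4096 is an assumption. */
#define SKETCH_PAGE_SIZE 4096

/*
 * Return how many bytes of buf should be parsed as rules.
 * Input of SKETCH_PAGE_SIZE bytes or more is cut to SKETCH_PAGE_SIZE - 1
 * bytes and then trimmed back to the last '\n', so that a rule split by
 * the cut is never parsed. Shorter input is returned unchanged.
 * A result of 0 means no complete rule fits in the window.
 */
static size_t complete_rule_bytes(const char *buf, size_t count)
{
    if (count < SKETCH_PAGE_SIZE)
        return count;

    count = SKETCH_PAGE_SIZE - 1;
    while (count > 0 && buf[count - 1] != '\n')
        count--;
    return count;
}
```

A caller would parse only the returned number of bytes and report that number back to the writer, which can then resubmit the remainder in a following write().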
2013-08-08 | cgroup: make css_for_each_descendant() and friends include the origin css in the iteration | Tejun Heo | 1 | -1/+1
Previously, all css descendant iterators didn't include the origin (root of subtree) css in the iteration. The reasons were maintaining consistency with css_for_each_child() and that at the time of introduction more use cases needed skipping the origin anyway; however, given that css_is_descendant() considers self to be a descendant, omitting the origin css has become more confusing and looking at the accumulated use cases rather clearly indicates that including origin would result in simpler code overall. While this is a change which can easily lead to subtle bugs, cgroup API including the iterators has recently gone through major restructuring and no out-of-tree changes will be applicable without adjustments making this a relatively acceptable opportunity for this type of change. The conversions are mostly straight-forward. If the iteration block had explicit origin handling before or after, it's moved inside the iteration. If not, if (pos == origin) continue; is added. Some conversions add extra reference get/put around origin handling by consolidating origin handling and the rest. While the extra ref operations aren't strictly necessary, this shouldn't cause any noticeable difference. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Acked-by: Vivek Goyal <[email protected]> Acked-by: Aristeu Rozanski <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]>
2013-08-08 | cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup | Tejun Heo | 1 | -8/+3
cgroup is currently in the process of transitioning to using css (cgroup_subsys_state) as the primary handle instead of cgroup in subsystem API. For hierarchy iterators, this is beneficial because * In most cases, css is the only thing subsystems care about anyway. * On the planned unified hierarchy, iterations for different subsystems will need to skip over different subtrees of the hierarchy depending on which subsystems are enabled on each cgroup. Passing around css makes it unnecessary to explicitly specify the subsystem in question as css is intersection between cgroup and subsystem * For the planned unified hierarchy, css's would need to be created and destroyed dynamically independent from cgroup hierarchy. Having cgroup core manage css iteration makes enforcing deref rules a lot easier. Most subsystem conversions are straight-forward. Noteworthy changes are * blkio: cgroup_to_blkcg() is no longer used. Removed. * freezer: cgroup_freezer() is no longer used. Removed. * devices: cgroup_to_devcgroup() is no longer used. Removed. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vivek Goyal <[email protected]> Acked-by: Aristeu Rozanski <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Jens Axboe <[email protected]>
2013-08-08 | cgroup: pass around cgroup_subsys_state instead of cgroup in file methods | Tejun Heo | 1 | -6/+6
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup. Please see the previous commit which converts the subsystem methods for rationale. This patch converts all cftype file operations to take @css instead of @cgroup. cftypes for the cgroup core files don't have their subsystem pointer set. These will automatically use the dummy_css added by the previous patch and can be converted the same way. Most subsystem conversions are straightforward but there are some interesting ones. * freezer: update_if_frozen() is also converted to take @css instead of @cgroup for consistency. This will make the code look simpler too once iterators are converted to use css. * memory/vmpressure: mem_cgroup_from_css() needs to be exported to vmpressure while mem_cgroup_from_cont() can be made static. Updated accordingly. * cpu: cgroup_tg() doesn't have any user left. Removed. * cpuacct: cgroup_ca() doesn't have any user left. Removed. * hugetlb: hugetlb_cgroup_from_cgroup() doesn't have any user left. Removed. * net_cls: cgrp_cls_state() doesn't have any user left. Removed. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vivek Goyal <[email protected]> Acked-by: Aristeu Rozanski <[email protected]> Acked-by: Daniel Wagner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Steven Rostedt <[email protected]>
2013-08-08 | cgroup: pass around cgroup_subsys_state instead of cgroup in subsystem methods | Tejun Heo | 1 | -11/+11
cgroup is currently in the process of transitioning to using struct cgroup_subsys_state * as the primary handle instead of struct cgroup * in subsystem implementations for the following reasons. * With unified hierarchy, subsystems will be dynamically bound and unbound from cgroups and thus css's (cgroup_subsys_state) may be created and destroyed dynamically over the lifetime of a cgroup, which is different from the current state where all css's are allocated and destroyed together with the associated cgroup. This in turn means that cgroup_css() should be synchronized and may return NULL, making it more cumbersome to use. * Differing levels of per-subsystem granularity in the unified hierarchy means that the task and descendant iterators should behave differently depending on the specific subsystem the iteration is being performed for. * In majority of the cases, subsystems only care about its part in the cgroup hierarchy - ie. the hierarchy of css's. Subsystem methods often obtain the matching css pointer from the cgroup and don't bother with the cgroup pointer itself. Passing around css fits much better. This patch converts all cgroup_subsys methods to take @css instead of @cgroup. The conversions are mostly straight-forward. A few noteworthy changes are * ->css_alloc() now takes css of the parent cgroup rather than the pointer to the new cgroup as the css for the new cgroup doesn't exist yet. Knowing the parent css is enough for all the existing subsystems. * In kernel/cgroup.c::offline_css(), unnecessary open coded css dereference is replaced with local variable access. This patch shouldn't cause any behavior differences. v2: Unnecessary explicit cgrp->subsys[] deref in css_online() replaced with local variable @css as suggested by Li Zefan. Rebased on top of new for-3.12 which includes for-3.11-fixes so that ->css_free() invocation added by da0a12caff ("cgroup: fix a leak when percpu_ref_init() fails") is converted too. Suggested by Li Zefan. 
Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vivek Goyal <[email protected]> Acked-by: Aristeu Rozanski <[email protected]> Acked-by: Daniel Wagner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Matt Helsley <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Steven Rostedt <[email protected]>
2013-08-08 | cgroup: add css_parent() | Tejun Heo | 1 | -13/+5
Currently, controllers have to explicitly follow the cgroup hierarchy to find the parent of a given css. cgroup is moving towards using cgroup_subsys_state as the main controller interface construct, so let's provide a way to climb the hierarchy using just csses. This patch implements css_parent() which, given a css, returns its parent. The function is guaranteed to return a valid non-NULL parent css as long as the target css is not at the top of the hierarchy. freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices are converted to use css_parent() instead of accessing cgroup->parent directly. * __parent_ca() is dropped from cpuacct and its usage is replaced with parent_ca(). The only difference between the two was NULL test on cgroup->parent which is now embedded in css_parent() making the distinction moot. Note that eventually a css->parent field will be added to css and the NULL check in css_parent() will go away. This patch shouldn't cause any behavior differences. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2013-08-08 | cgroup: add/update accessors which obtain subsys specific data from css | Tejun Heo | 1 | -1/+1
css (cgroup_subsys_state) is usually embedded in a subsys specific data structure. Subsystems either use container_of() directly to cast from css to such data structure or have an accessor function wrapping such cast. As cgroup as a whole is moving towards using css as the main interface handle, add and update such accessors to ease dealing with css's. All accessors explicitly handle NULL input and return NULL in those cases. While this looks like an extra branch in the code, as all controller specific data structures have css as the first field, the casting doesn't involve any offsetting and the compiler can trivially optimize out the branch. * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such accessor. Added. * memory, hugetlb and devices already had one but didn't explicitly handle NULL input. Updated. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
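The accessor pattern described here can be illustrated with a small userspace sketch. All names below (`css_sketch`, `freezer_sketch`, `sketch_container_of`, `css_freezer_sketch`) are hypothetical stand-ins and not the kernel's definitions; the point is only that, with the embedded member as the first field, the cast involves no offset and a NULL input maps to a NULL result.

```c
#include <stddef.h>

/* Simplified stand-in for struct cgroup_subsys_state (assumption). */
struct css_sketch {
    int refcnt;
};

/* A controller-specific structure that embeds the css as its first field. */
struct freezer_sketch {
    struct css_sketch css;   /* first member, so offsetof(..., css) == 0 */
    unsigned int state;
};

/* Userspace equivalent of the kernel's container_of() idea. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Accessor that tolerates NULL input and returns NULL in that case. */
static struct freezer_sketch *css_freezer_sketch(struct css_sketch *css)
{
    return css ? sketch_container_of(css, struct freezer_sketch, css) : NULL;
}
```

Because the offset is zero, the NULL test and the pointer adjustment collapse into a plain cast, which is why the message expects the compiler to optimize the branch away.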
2013-08-08 | cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ | Tejun Heo | 1 | -2/+2
The names of the two struct cgroup_subsys_state accessors - cgroup_subsys_state() and task_subsys_state() - are somewhat awkward. The former clashes with the type name and the latter doesn't even indicate it's somehow related to cgroup. We're about to revamp large portion of cgroup API, so, let's rename them so that they're less awkward. Most per-controller usages of the accessors are localized in accessor wrappers and given the amount of scheduled changes, this isn't gonna add any noticeable headache. Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state() to task_css(). This patch is pure rename. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Li Zefan <[email protected]>
2013-08-06 | Smack: IPv6 casting error fix for 3.11 | Casey Schaufler | 1 | -13/+11
The original implementation of the Smack IPv6 port based local controls works most of the time using a sockaddr as a temporary variable, but not always as it overflows in some circumstances. The correct data is a sockaddr_in6. A struct sockaddr isn't as large as a struct sockaddr_in6. There would need to be casting one way or the other. This patch gets it the right way. Signed-off-by: Casey Schaufler <[email protected]> Signed-off-by: James Morris <[email protected]>
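The size mismatch behind this fix can be checked from userspace with the standard socket headers. The sketch below is illustrative only and is not the Smack code: `sockaddr_too_small_for_in6()` and `copy_port_safely()` are hypothetical helper names, and the second function simply shows a temporary that is large enough for IPv6 data.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 when a struct sockaddr_in6 does not fit in a struct sockaddr. */
static int sockaddr_too_small_for_in6(void)
{
    return sizeof(struct sockaddr) < sizeof(struct sockaddr_in6);
}

/*
 * Copy an IPv6 socket address into a temporary of the correct type and
 * read the port from it. Using a struct sockaddr temporary here instead
 * would overflow it during the copy.
 */
static in_port_t copy_port_safely(const struct sockaddr_in6 *src)
{
    struct sockaddr_in6 tmp;

    memcpy(&tmp, src, sizeof(tmp));
    return tmp.sin6_port;
}
```

On common Linux ABIs struct sockaddr is 16 bytes and struct sockaddr_in6 is 28 bytes, so the first helper reports a mismatch there.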
2013-08-01 | Smack: network label match fix | Casey Schaufler | 3 | -9/+31
The Smack code that matches incoming CIPSO tags with Smack labels reaches through the NetLabel interfaces and compares the network data with the CIPSO header associated with a Smack label. This was done in an ill-advised attempt to optimize performance. It works so long as the categories fit in a single capset, but this isn't always the case. This patch changes the Smack code to use the appropriate NetLabel interfaces to compare the incoming CIPSO header with the CIPSO header associated with a label. It will always match the CIPSO headers correctly. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <[email protected]>
2013-08-01 | security: smack: add a hash table to quicken smk_find_entry() | Tomasz Stanislawski | 3 | -9/+37
Accepted for the smack-next tree after changing the number of slots from 128 to 16. This patch adds a hash table to quicken searching of a smack label by its name. Basically, the patch improves performance of SMACK initialization. Parsing of rules involves translation from a string to a smack_known (aka label) entity which is done in smk_find_entry(). The current implementation of the function iterates over a global list of smack_known resulting in O(N) complexity for smk_find_entry(). The total complexity of SMACK initialization becomes O(rules * labels). Therefore it scales quadratically with the complexity of a system. Applying the patch reduces the complexity of smk_find_entry() to O(1) as long as the number of labels is in the hundreds. If the number of labels is increased please update the SMACK_HASH_SLOTS constant defined in security/smack/smack.h. Introducing the configuration of this constant with Kconfig or cmdline might be a good idea. The size of the hash table was adjusted experimentally. The rule set used by TIZEN contains circa 17K rules for 500 labels. The table below contains results of SMACK initialization using the 'time smackctl apply' bash command. 'Ref' is a kernel without this patch applied. The consecutive columns refer to the value of SMACK_HASH_SLOTS. Every measurement was repeated three times to reduce noise.

     | Ref   | 1     | 2     | 4     | 8     | 16    | 32    | 64    | 128   | 256   | 512
--------------------------------------------------------------------------------------------
Run1 | 1.156 | 1.096 | 0.883 | 0.764 | 0.692 | 0.667 | 0.649 | 0.633 | 0.634 | 0.629 | 0.620
Run2 | 1.156 | 1.111 | 0.885 | 0.764 | 0.694 | 0.661 | 0.649 | 0.651 | 0.634 | 0.638 | 0.623
Run3 | 1.160 | 1.107 | 0.886 | 0.764 | 0.694 | 0.671 | 0.661 | 0.638 | 0.631 | 0.624 | 0.638
AVG  | 1.157 | 1.105 | 0.885 | 0.764 | 0.693 | 0.666 | 0.653 | 0.641 | 0.633 | 0.630 | 0.627

Surprisingly, a single hlist is slightly faster than a double-linked list. The speed-up saturates near 64 slots. Therefore I chose value 128 to provide some margin if more labels were used. It looks like IO becomes a new bottleneck. Signed-off-by: Tomasz Stanislawski <[email protected]>
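The lookup change described in this message, from a single global list to a small array of buckets, can be sketched as follows. This is a hedged userspace illustration: `label_sketch`, `SKETCH_HASH_SLOTS`, `hash_name()`, `label_insert()` and `label_find()` are hypothetical names, and the string hash is an arbitrary choice for the example, since the message does not say which hash function the patch uses.

```c
#include <stddef.h>
#include <string.h>

/* Power-of-two slot count; 16 mirrors the value accepted for smack-next. */
#define SKETCH_HASH_SLOTS 16

/* Simplified stand-in for a label entry (assumption, not smack_known). */
struct label_sketch {
    const char *name;
    struct label_sketch *next;   /* singly linked chain within one bucket */
};

static struct label_sketch *buckets[SKETCH_HASH_SLOTS];

/* Arbitrary multiplicative string hash, reduced to a bucket index. */
static unsigned int hash_name(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h & (SKETCH_HASH_SLOTS - 1);
}

/* Add a label at the head of its bucket's chain. */
static void label_insert(struct label_sketch *l)
{
    unsigned int slot = hash_name(l->name);

    l->next = buckets[slot];
    buckets[slot] = l;
}

/* Walk only the one bucket that can contain the name. */
static struct label_sketch *label_find(const char *name)
{
    struct label_sketch *l;

    for (l = buckets[hash_name(name)]; l != NULL; l = l->next)
        if (strcmp(l->name, name) == 0)
            return l;
    return NULL;
}
```

With N labels spread over the slots, each lookup scans roughly N / SKETCH_HASH_SLOTS entries instead of N, which is consistent with the diminishing returns the measurements show once chains become short.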
2013-08-01 | security: smack: fix memleak in smk_write_rules_list() | Tomasz Stanislawski | 1 | -22/+11
The smack_parsed_rule structure is allocated. If a rule is successfully installed then the last reference to the object is lost. This patch fixes this leak. Moreover, smack_parsed_rule is now allocated on the stack because it is no longer needed after smk_write_rules_list() is finished. Signed-off-by: Tomasz Stanislawski <[email protected]>
2013-07-31 | net: split rt_genid for ipv4 and ipv6 | fan.du | 1 | -1/+6
The current net namespace has only one genid for both IPv4 and IPv6, which has the following drawbacks: - Adding or deleting an IPv4 address invalidates all IPv6 routing table entries. - Inserting or removing an XFRM policy also invalidates both IPv4 and IPv6 routing table entries, even when the policy applies to only one address family. Thus, this patch attempts to split the single genid in two, to cater for IPv4 and IPv6 separately at a finer granularity. Signed-off-by: Fan Du <[email protected]> Acked-by: Hannes Frederic Sowa <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2013-07-25 | Add SELinux policy capability for always checking packet and peer classes. | Chris PeBenito | 4 | -6/+30
Currently the packet class in SELinux is not checked if there are no SECMARK rules in the security or mangle netfilter tables. Some systems prefer that packets are always checked, for example, to protect the system should the netfilter rules fail to load or if the netfilter rules were maliciously flushed. Add the always_check_network policy capability which, when enabled, treats SECMARK as enabled, even if there are no netfilter SECMARK rules and treats peer labeling as enabled, even if there is no Netlabel or labeled IPSEC configuration. Includes definition of "redhat1" SELinux policy capability, which exists in the SELinux userspace library, to keep ordering correct. The SELinux userspace portion of this was merged last year, but this kernel change fell on the floor. Signed-off-by: Chris PeBenito <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: fix problems in netnode when BUG() is compiled out | Paul Moore | 1 | -0/+2
When the BUG() macro is disabled at compile time it can cause some problems in the SELinux netnode code: invalid return codes and uninitialized variables. This patch fixes this by making sure we take some corrective action after the BUG() macro. Reported-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: use a helper function to determine seclabel | Eric Paris | 1 | -14/+24
Use a helper to determine if a superblock should have the seclabel flag rather than doing it in the function. I'm going to use this in the security server as well. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: pass a superblock to security_fs_use | Eric Paris | 3 | -15/+11
Rather than passing pointers to memory locations, strings, and other stuff just give up on the separation and give security_fs_use the superblock. It just makes the code easier to read (even if not easier to reuse on some other OS) Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: do not handle seclabel as a special flag | Eric Paris | 2 | -4/+1
Instead of having special code around the 'non-mount' seclabel mount option just handle it like the mount options. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: change sbsec->behavior to short | Eric Paris | 3 | -3/+3
We only have 6 options, so char is good enough, but use a short as that packs nicely. This shrinks the superblock_security_struct just a little bit. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: renumber the superblock options | Eric Paris | 2 | -4/+5
Just to make it clear that we have mount time options and flags, separate them. Since I decided to move the non-mount options above 0x10, we need a short instead of a char. (x86 padding says this takes up no additional space as we have a 3-byte hole in the structure) Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: do all flags twiddling in one place | Eric Paris | 1 | -7/+5
Currently we set the initialize and seclabel flags in one place, do some unrelated printk, then unset the seclabel flag. Eww. Instead do the flag twiddling in one place in the code, not separated by an unrelated printk. Also don't set and unset the seclabel flag. Only set it if we need to. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: rename SE_SBLABELSUPP to SBLABEL_MNT | Eric Paris | 2 | -15/+15
Just a flag rename as we prepare to make it not so special. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: use define for number of bits in the mnt flags mask | Eric Paris | 1 | -1/+4
We had this random hard coded value of '8' in the code (I put it there) for the number of bits to check for mount options. This is stupid. Instead use the #define we already have which tells us the number of mount options. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: make it harder to get the number of mnt opts wrong | Eric Paris | 1 | -2/+3
Instead of just hard coding a value, use the enum to our benefit. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: remove crazy contortions around proc | Eric Paris | 1 | -1/+1
We check if the fsname is proc and if so set the proc superblock security struct flag. We then check if the flag is set and use the string 'proc' for the fsname instead of just using the fsname. What's the point? It's always proc... Get rid of the useless conditional. Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: fix selinuxfs policy file on big endian systems | Eric Paris | 1 | -2/+1
The /sys/fs/selinux/policy file is not valid on big endian systems like ppc64 or s390. Let's see why:

    static int hashtab_cnt(void *key, void *data, void *ptr)
    {
            int *cnt = ptr;
            *cnt = *cnt + 1;

            return 0;
    }

    static int range_write(struct policydb *p, void *fp)
    {
            size_t nel;
    [...]
            /* count the number of entries in the hashtab */
            nel = 0;
            rc = hashtab_map(p->range_tr, hashtab_cnt, &nel);
            if (rc)
                    return rc;
            buf[0] = cpu_to_le32(nel);
            rc = put_entry(buf, sizeof(u32), 1, fp);

So size_t is 64 bits. But then we pass a pointer to it as we do to hashtab_cnt. hashtab_cnt thinks it is a 32 bit int and only deals with the first 4 bytes. On x86_64 which is little endian, those first 4 bytes are the least significant, so this works out fine. On ppc64/s390 those first 4 bytes of memory are the high order bits. So at the end of the call to hashtab_map nel has a HUGE number. But the least significant 32 bits are all 0's. We then pass that 64 bit number to cpu_to_le32() which happily truncates it to a 32 bit number and does endian swapping. But the low 32 bits are all 0's. So no matter how many entries are in the hashtab, big endian systems always say there are 0 entries because I screwed up the counting. The fix is easy. Use a 32 bit int, as the hashtab_cnt expects, for nel. Signed-off-by: Eric Paris <[email protected]> Signed-off-by: Paul Moore <[email protected]>
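The effect described in this message can be reproduced without a big endian machine by laying out the bytes by hand. The sketch below is a host-independent simulation, not the kernel code: `low32_after_increment()` is a hypothetical helper that models an 8-byte counter whose first 4 bytes are incremented as if they were a 32-bit int, and then returns the low 32 bits of the 64-bit value.

```c
#include <stdint.h>

/*
 * Simulate "size_t nel = 0" being incremented once through an int pointer.
 * big_endian selects the byte order being modelled; the host's own byte
 * order does not affect the result because the bytes are placed by hand.
 */
static uint32_t low32_after_increment(int big_endian)
{
    unsigned char mem[8] = { 0 };   /* the 64-bit counter, all zero */
    uint64_t value = 0;
    int i;

    /*
     * The callback sees a 32-bit int at offset 0 and adds one to it.
     * Its least significant byte is mem[3] on big endian, mem[0] on
     * little endian.
     */
    if (big_endian)
        mem[3] += 1;
    else
        mem[0] += 1;

    /* Reassemble the full 64-bit value in the modelled byte order. */
    for (i = 0; i < 8; i++) {
        int shift = big_endian ? 8 * (7 - i) : 8 * i;

        value |= (uint64_t)mem[i] << shift;
    }

    /* Truncate to 32 bits, as the conversion to a u32 would. */
    return (uint32_t)value;
}
```

In the little endian model the truncated count is 1, while in the big endian model the increment lands in the upper half of the 64-bit value and the truncated count stays 0, matching the behavior the message describes.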
2013-07-25 | SELinux: Enable setting security contexts on rootfs inodes. | Stephen Smalley | 1 | -0/+7
rootfs (ramfs) can support setting of security contexts by userspace due to the vfs fallback behavior of calling the security module to set the in-core inode state for security.* attributes when the filesystem does not provide an xattr handler. No xattr handler required as the inodes are pinned in memory and have no backing store. This is useful in allowing early userspace to label individual files within a rootfs while still providing a policy-defined default via genfs. Signed-off-by: Stephen Smalley <[email protected]> Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: Increase ebitmap_node size for 64-bit configuration | Waiman Long | 1 | -1/+7
Currently, the ebitmap_node structure has a fixed size of 32 bytes. On a 32-bit system, the overhead is 8 bytes, leaving 24 bytes for being used as bitmaps. The overhead ratio is 1/4. On a 64-bit system, the overhead is 16 bytes. Therefore, only 16 bytes are left for bitmap purpose and the overhead ratio is 1/2. With a 3.8.2 kernel, a boot-up operation will cause the ebitmap_get_bit() function to be called about 9 million times. The average number of ebitmap_node traversal is about 3.7. This patch increases the size of the ebitmap_node structure to 64 bytes for 64-bit system to keep the overhead ratio at 1/4. This may also improve performance a little bit by making node to node traversal less frequent (< 2) as more bits are available in each node. Signed-off-by: Waiman Long <[email protected]> Acked-by: Stephen Smalley <[email protected]> Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | SELinux: Reduce overhead of mls_level_isvalid() function call | Waiman Long | 4 | -19/+27
While running the high_systime workload of the AIM7 benchmark on a 2-socket 12-core Westmere x86-64 machine running 3.10-rc4 kernel (with HT on), it was found that a pretty sizable amount of time was spent in the SELinux code. Below was the perf trace of the "perf record -a -s" of a test run at 1500 users:

    5.04%  ls  [kernel.kallsyms]  [k] ebitmap_get_bit
    1.96%  ls  [kernel.kallsyms]  [k] mls_level_isvalid
    1.95%  ls  [kernel.kallsyms]  [k] find_next_bit

The ebitmap_get_bit() was the hottest function in the perf-report output. Both the ebitmap_get_bit() and find_next_bit() functions were, in fact, called by mls_level_isvalid(). As a result, the mls_level_isvalid() call consumed 8.95% of the total CPU time of all the 24 virtual CPUs which is quite a lot. The majority of the mls_level_isvalid() function invocations come from the socket creation system call. Looking at the mls_level_isvalid() function, it is checking to see if all the bits set in one of the ebitmap structures are also set in another one, as well as that the highest set bit is no bigger than the one specified by the given policydb data structure. It is doing it in a bit-by-bit manner. So if the ebitmap structure has many bits set, the iteration loop will be done many times. The current code can be rewritten to use a similar algorithm as the ebitmap_contains() function with an additional check for the highest set bit. The ebitmap_contains() function was extended to cover an optional additional check for the highest set bit, and the mls_level_isvalid() function was modified to call ebitmap_contains(). With that change, the perf trace showed that the used CPU time dropped to just 0.08% (ebitmap_contains + mls_level_isvalid) of the total which is about 100X less than before.

    0.07%  ls  [kernel.kallsyms]  [k] ebitmap_contains
    0.05%  ls  [kernel.kallsyms]  [k] ebitmap_get_bit
    0.01%  ls  [kernel.kallsyms]  [k] mls_level_isvalid
    0.01%  ls  [kernel.kallsyms]  [k] find_next_bit

The remaining ebitmap_get_bit() and find_next_bit() function calls are made by other kernel routines as the new mls_level_isvalid() function will not call them anymore. This patch also improves the high_systime AIM7 benchmark result, though the improvement is not as impressive as is suggested by the reduction in CPU time spent in the ebitmap functions. The table below shows the performance change on the 2-socket x86-64 system (with HT on) mentioned above.

+--------------+---------------+----------------+-----------------+
| Workload     | mean % change | mean % change  | mean % change   |
|              | 10-100 users  | 200-1000 users | 1100-2000 users |
+--------------+---------------+----------------+-----------------+
| high_systime |     +0.1%     |     +0.9%      |     +2.6%       |
+--------------+---------------+----------------+-----------------+

Signed-off-by: Waiman Long <[email protected]> Acked-by: Stephen Smalley <[email protected]> Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
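The idea of replacing a bit-by-bit loop with a word-at-a-time containment check, plus a limit on the highest set bit, can be sketched on flat arrays. This is a simplification under stated assumptions: the real ebitmap is a sparse linked structure, and `bitmap_contains_sketch()` with its parameters is a hypothetical name rather than the kernel's ebitmap_contains() signature.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if every bit set in sub is also set in super and no bit with an
 * index above last_bit is set in sub; return 0 otherwise.
 * Bits are numbered from 0, with 64 bits per array word.
 */
static int bitmap_contains_sketch(const uint64_t *super, const uint64_t *sub,
                                  size_t nwords, size_t last_bit)
{
    size_t i;

    for (i = 0; i < nwords; i++) {
        uint64_t w = sub[i];
        size_t base = i * 64;   /* index of bit 0 of this word */

        /* One AND-NOT tests 64 bits at once instead of looping per bit. */
        if (w & ~super[i])
            return 0;

        if (w == 0)
            continue;

        /* Any set bit in a word that starts beyond last_bit is too high. */
        if (base > last_bit)
            return 0;

        /* Within the word holding last_bit, reject bits above it. */
        if (last_bit - base < 63 && (w >> (last_bit - base + 1)) != 0)
            return 0;
    }
    return 1;
}
```

The work per call is proportional to the number of words rather than the number of set bits, which is the kind of saving the message attributes to reusing the ebitmap_contains() approach.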
2013-07-25 | selinux: remove the BUG_ON() from selinux_skb_xfrm_sid() | Paul Moore | 2 | -5/+8
Remove the BUG_ON() from selinux_skb_xfrm_sid() and propagate the error code up to the caller. Also check the return values in the only caller function, selinux_skb_peerlbl_sid(). Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup the XFRM header | Paul Moore | 1 | -14/+5
Remove the unused get_sock_isec() function and do some formatting fixes. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup selinux_xfrm_decode_session() | Paul Moore | 1 | -11/+12
Some basic simplification. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup some comment and whitespace issues in the XFRM code | Paul Moore | 1 | -13/+10
Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup selinux_xfrm_sock_rcv_skb() and selinux_xfrm_postroute_last() | Paul Moore | 2 | -60/+42
Some basic simplification and comment reformatting. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup selinux_xfrm_policy_lookup() and selinux_xfrm_state_pol_flow_match() | Paul Moore | 1 | -36/+18
Do some basic simplification and comment reformatting. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | selinux: cleanup and consolidate the XFRM alloc/clone/delete/free code | Paul Moore | 1 | -31/+40
The SELinux labeled IPsec code state management functions have been long neglected and could use some cleanup and consolidation. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | lsm: split the xfrm_state_alloc_security() hook implementation | Paul Moore | 5 | -124/+110
The xfrm_state_alloc_security() LSM hook implementation is really a multiplexed hook with two different behaviors depending on the arguments passed to it by the caller. This patch splits the LSM hook implementation into two new hook implementations, which match the LSM hooks in the rest of the kernel: * xfrm_state_alloc * xfrm_state_alloc_acquire Also included in this patch are the necessary changes to the SELinux code; no other LSMs are affected. Signed-off-by: Paul Moore <[email protected]> Signed-off-by: Eric Paris <[email protected]>
2013-07-25 | xattr: Constify ->name member of "struct xattr". | Tetsuo Handa | 5 | -24/+14
Since everybody sets kstrdup()ed constant string to "struct xattr"->name but nobody modifies "struct xattr"->name , we can omit kstrdup() and its failure checking by constifying ->name member of "struct xattr". Signed-off-by: Tetsuo Handa <[email protected]> Reviewed-by: Joel Becker <[email protected]> [ocfs2] Acked-by: Serge E. Hallyn <[email protected]> Acked-by: Casey Schaufler <[email protected]> Acked-by: Mimi Zohar <[email protected]> Reviewed-by: Paul Moore <[email protected]> Tested-by: Paul Moore <[email protected]> Acked-by: Eric Paris <[email protected]> Signed-off-by: James Morris <[email protected]>