path: root/fs
Age | Commit message | Author | Files | Lines
2015-02-11 | mm: /proc/pid/clear_refs: avoid split_huge_page() | Kirill A. Shutemov | 1 | -3/+44
Currently the page walker splits all THP pages on any clear_refs request. It's not necessary: we can handle this at the PMD level. One side effect is that soft dirty will potentially see more dirty memory, since we will mark the whole THP page dirty at once. Sanity checked with the CRIU test suite. More testing is required. Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Naoya Horiguchi <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Dave Hansen <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP) | Naoya Horiguchi | 1 | -0/+3
walk_page_range() silently skips a vma that has VM_PFNMAP set, which leads to undesirable behaviour for the client (the caller of walk_page_range()). For example, when no callbacks are called for a VM_PFNMAP vma, pagemap_read() may place the pagemap data for the next virtual address range at the wrong index. That could confuse and/or break userspace applications. This patch avoids this misbehavior caused by vma(VM_PFNMAP) as follows:
 - for pagemap_read(), which has its own ->pte_hole(), call the ->pte_hole() over vma(VM_PFNMAP),
 - for clear_refs and queue_pages, which have their own ->test_walk(), just return 1 and skip vma(VM_PFNMAP). This is no problem because these are not interested in hole regions,
 - for other callers, just skip the vma(VM_PFNMAP) as the default behavior.
Signed-off-by: Naoya Horiguchi <[email protected]> Signed-off-by: Shiraz Hashim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
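Because /proc/pid/pagemap holds exactly one 64-bit entry per virtual page, a reader locates the entry for an address purely by arithmetic, which is why every page in the requested range has to produce an entry, holes included. A minimal sketch of that lookup, assuming the caller supplies the page size (for example from sysconf(_SC_PAGESIZE)); the helper name pagemap_offset() is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>

/* Each pagemap entry is a 64-bit word. */
#define PAGEMAP_ENTRY_SIZE 8u

/*
 * Hypothetical helper: byte offset in /proc/<pid>/pagemap of the entry
 * describing virtual address 'vaddr'. A reader would pread() 8 bytes here.
 */
static off_t pagemap_offset(uintptr_t vaddr, size_t page_size)
{
    return (off_t)(vaddr / page_size) * PAGEMAP_ENTRY_SIZE;
}
```

If the kernel emitted nothing for a skipped vma, the data for the following range would land at offsets computed for earlier pages, which is the misplacement the commit describes.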
2015-02-11 | numa_maps: remove numa_maps->vma | Naoya Horiguchi | 1 | -16/+13
pagewalk.c can handle the vma by itself, so we don't have to pass the vma via walk->private. And show_numa_map() walks pages on a per-vma basis, so using walk_page_vma() is preferable. Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | numa_maps: fix typo in gather_hugetbl_stats | Naoya Horiguchi | 1 | -3/+3
Just doing s/gather_hugetbl_stats/gather_hugetlb_stats/g, this makes code grep-friendly. Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | pagemap: use walk->vma instead of calling find_vma() | Naoya Horiguchi | 1 | -54/+14
The page table walker has the current vma in mm_walk, so we no longer have to call find_vma() in each pagemap_(pte|hugetlb)_range() call. Currently pagemap_pte_range() does the vma loop itself, so this patch removes many lines of code. The NULL-vma check is omitted because we assume that we never run these callbacks on any address outside a vma. And even if that assumption were broken, the NULL pointer dereference would be detected, so we would get enough information for debugging. Signed-off-by: Naoya Horiguchi <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
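The entries that pagemap_pte_range() and pagemap_hugetlb_range() produce can be decoded in userspace. A sketch based on the bit layout documented in Documentation/vm/pagemap.txt for kernels of this era (bits 0-54 PFN, or swap type in bits 0-4 and swap offset in bits 5-54; bit 55 soft-dirty; bit 61 file page or shared anon; bit 62 swapped; bit 63 present). The struct, macro, and function names are our own, not the kernel's; later kernels define more bits and may hide the PFN from unprivileged readers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local definitions of the documented pagemap bits. */
#define PM_PFN_MASK    ((UINT64_C(1) << 55) - 1)   /* bits 0-54 */
#define PM_SOFT_DIRTY  (UINT64_C(1) << 55)
#define PM_FILE        (UINT64_C(1) << 61)         /* file page or shared anon */
#define PM_SWAP        (UINT64_C(1) << 62)
#define PM_PRESENT     (UINT64_C(1) << 63)

struct pagemap_info {
    bool present, swapped, file_or_shared, soft_dirty;
    uint64_t pfn;          /* meaningful only when present */
    unsigned swap_type;    /* bits 0-4, meaningful only when swapped */
    uint64_t swap_offset;  /* bits 5-54, meaningful only when swapped */
};

static struct pagemap_info pagemap_decode(uint64_t e)
{
    struct pagemap_info i = {0};

    i.present = (e & PM_PRESENT) != 0;
    i.swapped = (e & PM_SWAP) != 0;
    i.file_or_shared = (e & PM_FILE) != 0;
    i.soft_dirty = (e & PM_SOFT_DIRTY) != 0;
    if (i.present)
        i.pfn = e & PM_PFN_MASK;
    if (i.swapped) {
        i.swap_type = (unsigned)(e & 0x1f);
        i.swap_offset = (e & PM_PFN_MASK) >> 5;
    }
    return i;
}
```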
2015-02-11 | clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk() | Naoya Horiguchi | 1 | -24/+22
clear_refs_write() has some prechecks to determine if we really walk over a given vma. Now we have a test_walk() callback to filter vmas, so let's utilize it. Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | smaps: remove mem_size_stats->vma and use walk_page_vma() | Naoya Horiguchi | 1 | -8/+4
pagewalk.c can handle the vma by itself, so we don't have to pass the vma via walk->private. And show_smap() walks pages on a per-vma basis, so using walk_page_vma() is preferable. Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | proc/pagemap: walk page tables under pte lock | Konstantin Khlebnikov | 1 | -5/+9
Lockless access to pte in pagemap_pte_range() might race with page migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

    CPU A (pagemap)                 CPU B (migration)
                                    lock_page()
                                    try_to_unmap(page, TTU_MIGRATION...)
                                      make_migration_entry()
                                      set_pte_at()
    <read *pte>
    pte_to_pagemap_entry()
                                    remove_migration_ptes()
                                    unlock_page()
      if(is_migration_entry())
        migration_entry_to_page()
          BUG_ON(!PageLocked(page))

Also, a lockless read might be non-atomic if the pte is larger than the word size. Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes. Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap") Signed-off-by: Konstantin Khlebnikov <[email protected]> Reported-by: Andrey Ryabinin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: <[email protected]> [3.5+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-02-11 | page_writeback: put account_page_redirty() after set_page_dirty() | Konstantin Khebnikov | 1 | -1/+1
The helper account_page_redirty() fixes the dirty page counters for redirtied pages. This patch calls it after dirtying, which prevents temporary underflows of the dirtied-page counters on zone/bdi and current->nr_dirtied. Signed-off-by: Konstantin Khebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
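The ordering effect can be modelled with a single counter: set_page_dirty() adds to the dirtied count, and account_page_redirty() subtracts it back out for a page that was only redirtied. The sketch below is a hypothetical userspace model, not kernel code; the model_* names are ours, and it records the lowest value the counter reaches so the transient underflow of the old order becomes visible.

```c
/* Hypothetical stand-in for one dirtied-pages counter. */
struct dirty_counter {
    long nr_dirtied;
    long min_seen;   /* lowest value observed, exposes transient underflow */
};

static void observe(struct dirty_counter *c)
{
    if (c->nr_dirtied < c->min_seen)
        c->min_seen = c->nr_dirtied;
}

/* Models set_page_dirty(): counts the page as newly dirtied. */
static void model_set_page_dirty(struct dirty_counter *c)
{
    c->nr_dirtied++;
    observe(c);
}

/* Models account_page_redirty(): undoes the count for a redirtied page. */
static void model_account_redirty(struct dirty_counter *c)
{
    c->nr_dirtied--;
    observe(c);
}

/* Redirty one page starting from 'start'; returns the minimum value seen. */
static long redirty(long start, int account_first)
{
    struct dirty_counter c = { start, start };

    if (account_first) {            /* old order: may dip below start */
        model_account_redirty(&c);
        model_set_page_dirty(&c);
    } else {                        /* patched order */
        model_set_page_dirty(&c);
        model_account_redirty(&c);
    }
    return c.min_seen;
}
```

Both orders end at the same value; only the old order passes through -1 when the counter starts at 0.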
2015-02-11 | mm: account pmd page tables to the process | Kirill A. Shutemov | 1 | -3/+6
Dave noticed that an unprivileged process can allocate a significant amount of memory (more than 500 MiB on x86_64) and stay unnoticed by the oom-killer and memory cgroup. The trick is to allocate a lot of PMD page tables: the Linux kernel doesn't account PMD tables to the process, only PTE tables. The use case below uses a few tricks to allocate a lot of PMD page tables while keeping VmRSS and VmPTE low. oom_score for the process will be 0.

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    #define PUD_SIZE (1UL << 30)
    #define PMD_SIZE (1UL << 21)

    #define NR_PUD 130000

    int main(void)
    {
        char *addr = NULL;
        unsigned long i;

        /* arg2 = 1 sets the flag: no THP, so faults go through PTE tables. */
        prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
        for (i = 0; i < NR_PUD; i++) {
            addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
                        MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
            if (addr == MAP_FAILED) {
                perror("mmap");
                break;
            }
            /* Fault in one page of this 1 GiB region. */
            *addr = 'x';
            /* Drop the first 2 MiB, keeping VmRSS and VmPTE low. */
            munmap(addr, PMD_SIZE);
            /* Check mmap's return value, not the stale addr. */
            if (mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
                     MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0) == MAP_FAILED)
                perror("re-mmap"), exit(1);
        }
        printf("PID %d consumed %lu KiB in PMD page tables\n",
               getpid(), i * 4096 >> 10);
        return pause();
    }

The patch addresses the issue by accounting PMD tables to the process the same way we account PTE tables. The main places where PMD tables are accounted are __pmd_alloc() and free_pmd_range(). But there are a few corner cases:
 - HugeTLB can share PMD page tables. The patch handles this by accounting the table to all processes who share it.
 - x86 PAE pre-allocates a few PMD tables on fork.
 - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity check on exit(2).
Accounting only happens on configurations where the PMD page table level is present (PMD is not folded). As with nr_ptes, we use a per-mm counter. The counter value is used to calculate the baseline for the badness score by the oom-killer. Signed-off-by: Kirill A. Shutemov <[email protected]> Reported-by: Dave Hansen <[email protected]> Cc: Hugh Dickins <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: David Rientjes <[email protected]> Tested-by: Sedat Dilek <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
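The figures in the reproducer can be checked with a little arithmetic. Assuming x86_64 with 4 KiB pages and 8-byte entries, one PMD table holds 512 entries of 2 MiB each and so covers exactly one 1 GiB PUD_SIZE region; the program's own printf counts one 4 KiB table per iteration. The constants and helper names below are ours, chosen for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096ULL        /* one page table page */
#define PMD_ENTRIES      512ULL         /* 4096 bytes / 8-byte entries */
#define PMD_SIZE_BYTES   (1ULL << 21)   /* 2 MiB mapped per PMD entry */
#define PUD_SIZE_BYTES   (1ULL << 30)   /* 1 GiB covered by one PMD table */

/* KiB of PMD tables after nr_pud iterations, one table per iteration. */
static uint64_t pmd_tables_kib(uint64_t nr_pud)
{
    return nr_pud * PAGE_SIZE_BYTES >> 10;
}

/* Virtual address span of nr_pud PUD-sized regions, in GiB. */
static uint64_t span_gib(uint64_t nr_pud)
{
    return nr_pud * PUD_SIZE_BYTES >> 30;
}
```

With NR_PUD = 130000 this gives 520000 KiB (about 508 MiB), the ">500 MiB" in the description, while the 130000 GiB span stays below the 131072 GiB (128 TiB) of a 47-bit x86_64 user address space.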
2015-02-11 | mm:add KPF_ZERO_PAGE flag for /proc/kpageflags | Wang, Yalin | 1 | -3/+13
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can detect zero_page in /proc/kpageflags, and then do memory analysis more accurately. Signed-off-by: Yalin Wang <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Naoya Horiguchi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
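To use the new flag, a tool looks up a page's PFN (for example via /proc/pid/pagemap) and reads the matching 64-bit word from /proc/kpageflags, which has one entry per PFN and normally requires root to read. A sketch of the offset calculation and bit test; KPF_ZERO_PAGE is bit 24 in include/uapi/linux/kernel-page-flags.h and is defined locally here so the sketch builds without kernel headers. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Bit number from include/uapi/linux/kernel-page-flags.h. */
#ifndef KPF_ZERO_PAGE
#define KPF_ZERO_PAGE 24
#endif

/* /proc/kpageflags holds one 64-bit flags word per PFN. */
static off_t kpageflags_offset(uint64_t pfn)
{
    return (off_t)(pfn * sizeof(uint64_t));
}

/* True if the flags word marks the page as the shared zero page. */
static bool kpf_is_zero_page(uint64_t flags)
{
    return (flags >> KPF_ZERO_PAGE) & 1;
}
```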
2015-02-11 | f2fs: use spinlock for segmap_lock instead of rwlock | Chao Yu | 2 | -12/+12
An rwlock can provide better concurrency when there are many more readers than writers, because readers can hold the rwlock simultaneously. But for the segmap_lock rwlock in struct free_segmap_info, there is only one reader, 'mount', on the call path below:

    ->f2fs_fill_super
     ->build_segment_manager
      ->build_dirty_segmap
       ->init_dirty_segmap
        ->find_next_inuse
          read_lock
          ...
          read_unlock

Since there is no other reader for this lock, an rwlock cannot improve concurrency here, so we do not need the rwlock_t type for segmap_lock; let's replace it with spinlock_t. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
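In userspace terms the change looks like the sketch below: with a single reader there is nothing for an rwlock to parallelize, so one exclusive lock suffices. This is a hypothetical analogue, not f2fs code; a C11 atomic_flag spinlock stands in for spinlock_t, and the 64-segment bitmap is an arbitrary simplification.

```c
#include <stdatomic.h>

#define NSEGS 64

/* Hypothetical analogue of struct free_segmap_info. */
struct free_segmap {
    atomic_flag segmap_lock;
    unsigned long long inuse;   /* bit n set => segment n is in use */
};

static void seg_lock(struct free_segmap *f)
{
    while (atomic_flag_test_and_set_explicit(&f->segmap_lock,
                                             memory_order_acquire))
        ;                       /* spin until the holder releases it */
}

static void seg_unlock(struct free_segmap *f)
{
    atomic_flag_clear_explicit(&f->segmap_lock, memory_order_release);
}

static void segmap_init(struct free_segmap *f)
{
    atomic_flag_clear(&f->segmap_lock);
    f->inuse = 0;
}

/* First in-use segment >= start, or NSEGS if there is none. */
static int find_next_inuse(struct free_segmap *f, int start)
{
    int seg = NSEGS;

    seg_lock(f);
    for (int i = start; i < NSEGS; i++) {
        if (f->inuse & (1ULL << i)) {
            seg = i;
            break;
        }
    }
    seg_unlock(f);
    return seg;
}
```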
2015-02-11 | f2fs: fix accessing wrong indexed data blocks | Jaegeuk Kim | 1 | -1/+9
This patch fixes the following test. This causes:

    attempt to access beyond end of device
    sdb2: rw=16384, want=14413962000, limit=16777216

The reason is:
 - f2fs_write_begin
 - f2fs_convert_inline_inode returns -ENOSPC
 - f2fs_write_failed
 - truncate_blocks
 - truncate_partial_data_page
 - find_data_page
 - get_dnode_of_data returns wrong data index retrieved from inline_data
 - f2fs_submit_page_bio(wrong data index)
 - submit_bio(wrong data index)
Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: avoid variable length array | Jaegeuk Kim | 3 | -2/+10
Instead of using a variable length array, this patch preallocates memory for them. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: fix sparse warnings | Jaegeuk Kim | 4 | -6/+5
This patch resolves the following warnings.

    include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool
    include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool
    include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
    include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool
    include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
    include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
    include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
    include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
    fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static?
    fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32
    fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32
    fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static?
    fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static?

Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: allocate data blocks in advance for f2fs_direct_IO | Jaegeuk Kim | 1 | -3/+54
This patch adds preallocation for data blocks to prepare f2fs_direct_IO. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: introduce macros to convert bytes and blocks in f2fs | Jaegeuk Kim | 3 | -10/+8
This patch adds two macros for transition between byte and block offsets. Currently, f2fs only supports 4KB blocks, so use the default size for now. Signed-off-by: Jaegeuk Kim <[email protected]>
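With a fixed block size, such conversions are plain shifts. A sketch assuming the 4 KiB (2^12) block size the commit mentions; BLKSIZE_BITS, BYTES_TO_BLK() and BLK_TO_BYTES() are illustrative names and may differ from the macros the patch actually adds. Widening to a 64-bit type before shifting avoids the 32-bit overflow described in the lseek fix further down.

```c
#include <stdint.h>

#define BLKSIZE_BITS 12                                   /* 4 KiB blocks */
#define BYTES_TO_BLK(bytes) ((uint64_t)(bytes) >> BLKSIZE_BITS)  /* rounds down */
#define BLK_TO_BYTES(blk)   ((uint64_t)(blk) << BLKSIZE_BITS)
```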
2015-02-11 | f2fs: call set_buffer_new for get_block | Jaegeuk Kim | 1 | -1/+3
This patch fixes the wrong handling of the buffer_new flag in get_block. If f2fs allocates new blocks and maps the buffer_head, it needs to set buffer_new for bh_result. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: check node page contents all the time | Jaegeuk Kim | 1 | -4/+3
In get_node_page, if the page is up-to-date, we assumed that the page had not been reclaimed at all. But it was sometimes reported that its contents were missing. So, to be safe, let's check its mapping and contents. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: avoid data offset overflow when lseeking huge file | Chao Yu | 1 | -1/+1
xfstest generic/285 reports an issue when lseeking a huge file. Here is the detailed output of generic/285:

    ./check -f2fs tests/generic/285
    Ran: generic/285
    Failures: generic/285
    Failed 1 of 1 tests

    10. Test a huge file for offset overflow
    10.01 SEEK_HOLE expected 65536 or 8589934592, got 65536. succ
    10.02 SEEK_HOLE expected 65536 or 8589934592, got 65536. succ
    10.03 SEEK_DATA expected 0 or 0, got 0. succ
    10.04 SEEK_DATA expected 1 or 1, got 1. succ
    10.05 SEEK_HOLE expected 8589934592 or 8589934592, got 0. FAIL
    10.06 SEEK_DATA expected 8589869056 or 8589869056, got 8589869056. succ
    10.07 SEEK_DATA expected 8589869057 or 8589869057, got 8589869057. succ
    10.08 SEEK_DATA expected 8589869056 or 8589869056, got 4294901760. FAIL

The reason for this issue is: we calculate the current offset by left-shifting the page offset by PAGE_CACHE_SHIFT bits, but the page offset is an unsigned long, which is 4 bytes on a 32-bit machine. So if the page offset is bigger than (2^32 / pagesize - 1), the result of the left shift overflows. Let's fix this issue by casting the page offset to the type of the current offset, loff_t. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
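The failing values can be reproduced with 32-bit arithmetic, assuming 4 KiB pages (a shift of 12): page offset 2097152 wraps to 0 instead of 8589934592 (case 10.05), and 2097136 wraps to 4294901760 instead of 8589869056 (case 10.08). In this sketch uint32_t stands in for a 32-bit unsigned long and int64_t for loff_t; the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12

/* Buggy pattern: shift in 32 bits, then widen; high bits are already lost. */
static int64_t offset_buggy(uint32_t pgofs)
{
    return (int64_t)(uint32_t)(pgofs << PAGE_SHIFT_4K);
}

/* Fixed pattern: widen to the 64-bit offset type first, then shift. */
static int64_t offset_fixed(uint32_t pgofs)
{
    return (int64_t)pgofs << PAGE_SHIFT_4K;
}
```

The two agree up to page offset 2^20 - 1, the (2^32 / pagesize - 1) threshold from the commit message, and diverge from 2^20 on.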
2015-02-11 | f2fs: fix to use highmem for pages of newly created directory | Chao Yu | 1 | -1/+1
In commit a78186ebe516 ("f2fs: use highmem for directory pages"), we set __GFP_HIGHMEM in the directory mapping's gfp flags in f2fs_iget, so high memory could be used for the pages of existing directories. But we forgot to set the flag for newly created directories; as a result, their pages could not be allocated from high memory. Fix it. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: introduce a batched trim | Jaegeuk Kim | 3 | -5/+21
This patch introduces a batched trimming feature, which submits the discard commands in split batches. This avoids the long latency caused by huge trim commands: if fstrim is triggered over the range from 0 to the end of the device, we have to hold all the checkpoint-related mutexes, resulting in very long latency. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
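The batching idea can be sketched as splitting one large range into fixed-size chunks, so that locks only need to be held per chunk rather than for the whole device. The segment units, batch size, and all names below are assumptions for illustration; the patch's actual batch granularity is not stated in this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch: [start, end) in segment units. */
struct trim_batch { uint64_t start, end; };

/*
 * Split [start, end) into at most max_out batches of at most batch_segs
 * segments each; returns the number of batches written. A filesystem
 * would take and drop its checkpoint locks around each batch.
 */
static size_t split_trim_range(uint64_t start, uint64_t end,
                               uint64_t batch_segs,
                               struct trim_batch *out, size_t max_out)
{
    size_t n = 0;

    if (batch_segs == 0)
        return 0;
    while (start < end && n < max_out) {
        uint64_t stop = (end - start > batch_segs) ? start + batch_segs : end;

        out[n].start = start;
        out[n].end = stop;
        n++;
        start = stop;
    }
    return n;
}
```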
2015-02-11 | f2fs: merge {invalidate,release}page for meta/node/data pages | Chao Yu | 4 | -51/+22
This patch merges the ->{invalidate,release}page functions for meta/node/data pages, which removes duplicated code. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: show the number of writeback pages in stat | Jaegeuk Kim | 2 | -3/+4
This patch adds the number of writeback pages to the stat info. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: keep PagePrivate during releasepage | Jaegeuk Kim | 3 | -0/+12
If PagePrivate is removed by releasepage, f2fs loses track of its dirty page count. For example, try_to_release_page will not release a page when it is dirty, but our releasepage still removes PagePrivate:

    [<ffffffff81188d75>] try_to_release_page+0x35/0x50
    [<ffffffff811996f9>] invalidate_inode_pages2_range+0x2f9/0x3b0
    [<ffffffffa02a7f54>] ? truncate_blocks+0x384/0x4d0 [f2fs]
    [<ffffffffa02b7583>] ? f2fs_direct_IO+0x283/0x290 [f2fs]
    [<ffffffffa02b7fb0>] ? get_data_block_fiemap+0x20/0x20 [f2fs]
    [<ffffffff8118aa53>] generic_file_direct_write+0x163/0x170
    [<ffffffff8118ad06>] __generic_file_write_iter+0x2a6/0x350
    [<ffffffff8118adef>] generic_file_write_iter+0x3f/0xb0
    [<ffffffff81203081>] new_sync_write+0x81/0xb0
    [<ffffffff81203837>] vfs_write+0xb7/0x1f0
    [<ffffffff81204459>] SyS_write+0x49/0xb0
    [<ffffffff817c286d>] system_call_fastpath+0x16/0x1b

Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: should fail mount when trying to recover data on read-only dev | Jaegeuk Kim | 1 | -0/+9
If the device is read-only, we should not proceed with data recovery. But if the previous checkpoint was done by a normal clean shutdown, it's safe to proceed with the recovery, since there will be no data to recover. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: split UMOUNT and FASTBOOT flags | Jaegeuk Kim | 5 | -17/+44
This patch adds a FASTBOOT flag to the checkpoint as follows:
 - CP_UMOUNT_FLAG is set when the system is umounted.
 - CP_FASTBOOT_FLAG is set when an intermediate checkpoint with node summaries was done.
So, if you get CP_UMOUNT_FLAG from the checkpoint, the system was umounted cleanly. If instead there was a sudden power-off, you get CP_FASTBOOT_FLAG or nothing. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
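Reading the last checkpoint's flags then comes down to two bit tests. A sketch of that classification; the flag values are assumed from include/linux/f2fs_fs.h of this era as we understand it, and the enum and function names are hypothetical. Checking CP_UMOUNT_FLAG first is our choice, since a clean umount is the only case the commit calls clean.

```c
#include <stdint.h>

/* Assumed values (include/linux/f2fs_fs.h). */
#define CP_UMOUNT_FLAG    0x00000001u
#define CP_FASTBOOT_FLAG  0x00000020u

/* Hypothetical classification of the last checkpoint. */
enum last_ckpt {
    CKPT_CLEAN_UMOUNT,  /* written at umount: clean shutdown */
    CKPT_FASTBOOT,      /* no clean umount; last cp was intermediate, with node summaries */
    CKPT_NO_FLAG,       /* no clean umount and no fastboot checkpoint */
};

static enum last_ckpt classify_checkpoint(uint32_t ckpt_flags)
{
    if (ckpt_flags & CP_UMOUNT_FLAG)
        return CKPT_CLEAN_UMOUNT;
    if (ckpt_flags & CP_FASTBOOT_FLAG)
        return CKPT_FASTBOOT;
    return CKPT_NO_FLAG;
}
```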
2015-02-11 | f2fs: avoid write_checkpoint if f2fs is mounted readonly | Jaegeuk Kim | 1 | -0/+2
Do not change any partition when f2fs is changed to readonly mode. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: support norecovery mount option | Jaegeuk Kim | 1 | -0/+8
This patch adds a mount option, norecovery, which is mostly the same as disable_roll_forward. The only difference is that norecovery must be used together with the read-only mount option. This can be used when the user wants to check whether f2fs is mountable without any recovery process (e.g., xfstests/200). Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: fix not to drop mount options when retrying fill_super | Jaegeuk Kim | 1 | -3/+13
If a wrong mount option was requested, f2fs tries fill_super again. But during the retry, f2fs has no valid mount options, since parse_options deleted all the separators in the original string. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
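The remedy is to tokenize a private copy of the option string. A userspace sketch of why that matters, using strtok() as a stand-in for the kernel's tokenizer: like the parsing the commit describes, it overwrites each ',' separator with '\0', so parsing the caller's buffer directly would leave nothing usable for a retry. count_options() is hypothetical; a real parser would act on each token, and strtok_r() would be the thread-safe choice.

```c
#include <stdlib.h>
#include <string.h>

/* Count comma-separated options without modifying 'options'. */
static int count_options(const char *options)
{
    size_t len = strlen(options) + 1;
    char *copy = malloc(len);
    int n = 0;

    if (!copy)
        return -1;
    memcpy(copy, options, len);
    /* strtok() writes '\0' over separators, but only in our copy. */
    for (char *tok = strtok(copy, ","); tok; tok = strtok(NULL, ","))
        n++;
    free(copy);
    return n;
}
```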
2015-02-11 | f2fs: merge flags in struct f2fs_sb_info | Chao Yu | 7 | -33/+44
Currently, there are several Boolean-style variables, as below:

    struct f2fs_sb_info {
        ...
        int s_dirty;
        bool need_fsck;
        bool s_closing;
        ...
        bool por_doing;
        ...
    }

This causes some issues:
1. Some space in f2fs_sb_info is wasted by the padding the compiler inserts after the Boolean variables.
2. If we keep adding new flags to f2fs_sb_info, the structure will become messy.
So in this patch, we:
1. switch s_dirty to a Boolean variable, since it only has the two states 0/1;
2. merge the s_dirty/need_fsck/s_closing/por_doing variables into s_flag;
3. introduce an enum type that indicates the different states of sbi;
4. use the newly introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate on the sbi flags.
After that, the above issues are fixed. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
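A minimal sketch of the flag-word approach: one unsigned int replaces the separate members, and the three helpers named in the commit test, set, and clear individual bits. The enum constant names and struct sb_info_model are assumptions for illustration, not copied from the patch.

```c
#include <stdbool.h>

/* One bit per former member (names assumed). */
enum {
    SBI_IS_DIRTY,    /* was: int s_dirty */
    SBI_IS_CLOSE,    /* was: bool s_closing */
    SBI_NEED_FSCK,   /* was: bool need_fsck */
    SBI_POR_DOING,   /* was: bool por_doing */
};

struct sb_info_model {
    unsigned int s_flag;
};

static bool is_sbi_flag_set(const struct sb_info_model *sbi, unsigned int type)
{
    return sbi->s_flag & (1u << type);
}

static void set_sbi_flag(struct sb_info_model *sbi, unsigned int type)
{
    sbi->s_flag |= 1u << type;
}

static void clear_sbi_flag(struct sb_info_model *sbi, unsigned int type)
{
    sbi->s_flag &= ~(1u << type);
}
```

New states then only need a new enum constant rather than another struct member.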
2015-02-11 | f2fs: clean up {in,de}create_sleep_time | Chao Yu | 2 | -18/+18
Clean up {in,de}create_sleep_time by passing the result through the pointer parameter @wait. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: make truncate_inline_date static | Chao Yu | 2 | -17/+9
1. make truncate_inline_date static; 2. remove parameter @from of truncate_inline_date as callers only pass zero. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: fix a bug of inheriting default ACL from parent | Kinglong Mee | 1 | -1/+1
Introduced by a6dda0e63e97122ce9e0ba04367e37cca28315fa ("f2fs: use generic posix ACL infrastructure"). When testing default ACLs on a recent kernel (3.19.0-rc5), I get:

    user::rwx
    group::r-x
    other::r-x
    default:user::rwx
    default:group::r-x
    default:group:root:rwx
    default:mask::rwx
    default:other::r-x

    ]# getfacl testdir/
    user::rwx
    group::rwx      // missing an acl "group:root:rwx" inherited from parent
    other::r-x
    default:user::rwx
    default:group::r-x
    default:group:root:rwx
    default:mask::rwx
    default:other::r-x

Signed-off-by: Kinglong Mee <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: use f2fs_radix_tree_insert to clean codes | Chao Yu | 1 | -5/+2
No functional change; just clean up the code with f2fs_radix_tree_insert. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: add F2FS_IOC_GETVERSION support | Chao Yu | 2 | -0/+10
In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from the inode; after that, users can list a file's generation number with "lsattr -v". Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: leave comment for code readability | Jaegeuk Kim | 1 | -0/+4
During recovery, no xattr blocks should be found, since they are written to the cold log, not the warm node chain. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: fix to release count of meta page in ->invalidatepage | Chao Yu | 1 | -0/+19
We will hit an infinite loop in the following scenario:
1. the page count for the F2FS_DIRTY_META type is increased on the following path:

    ->recover_fsync_data
     ->recover_data
      ->do_recover_data
       ->recover_data_page
        ->change_curseg
         ->write_sum_page
          ->set_page_dirty

2. recover_data() fails;
3. the meta pages are invalidated in truncate_inode_pages_final without decreasing the page count;
4. sync_meta_pages loops forever, as the page count will always be non-zero.

Message:

    NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
    [<c1129a37>] pagevec_lookup_tag+0x27/0x30
    [<f0e774c7>] sync_meta_pages+0x87/0x160 [f2fs]
    [<f0e86dd9>] recover_fsync_data+0xeb9/0xf10 [f2fs]
    [<f0e75398>] f2fs_fill_super+0x888/0x980 [f2fs]
    [<c11733ca>] mount_bdev+0x16a/0x1a0
    [<f0e7180f>] f2fs_mount+0x1f/0x30 [f2fs]
    [<c1173da6>] mount_fs+0x36/0x170
    [<c118b6f5>] vfs_kern_mount+0x55/0xe0
    [<c118d63f>] do_mount+0x1df/0x9f0
    [<c118e110>] SyS_mount+0x70/0xb0
    [<c15a0c48>] sysenter_do_call+0x12/0x12

To avoid leaking the page count, let's add ->invalidatepage and ->releasepage to f2fs_meta_aops, as in f2fs_node_aops, so that the meta page count is released correctly. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: do checkpoint when umount flag is not set | Jaegeuk Kim | 2 | -3/+9
If the previous checkpoint was done without CP_UMOUNT flag, it needs to do checkpoint with CP_UMOUNT for the next fast boot. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: trigger correct checkpoint during umount | Jaegeuk Kim | 2 | -2/+11
This patch makes f2fs trigger a checkpoint with the umount flag when kill_sb is called. In kill_sb, f2fs_sync_fs is eventually called, but at that point f2fs can't do a checkpoint with CP_UMOUNT. After that, f2fs_put_super does not do a checkpoint, since the filesystem is not dirty. So, this patch adds a flag to indicate that f2fs_sync_fs is called during umount. Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: update memory footprint information | Jaegeuk Kim | 2 | -8/+18
This patch adds the missing memory usage entries and breaks them down in detail. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: fix wrong memory footprint statistics in debugfs | Chao Yu | 1 | -4/+11
The memory footprint statistics shown in debugfs are not calculated correctly. Fix them in this patch. Signed-off-by: Chao Yu <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | f2fs: avoid infinite loop on cp_error | Jaegeuk Kim | 1 | -0/+4
If cp_error is set, we should avoid all infinite loops. f2fs_sync_file still has such a hole, and this patch fixes it. Signed-off-by: Jaegeuk Kim <[email protected]>
2015-02-11 | NFSv4.1: Convert open-coded array allocation calls to kmalloc_array() | Trond Myklebust | 1 | -2/+2
For added overflow protection... Signed-off-by: Trond Myklebust <[email protected]>
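kmalloc_array(n, size, flags) refuses the allocation when n * size would overflow, which an open-coded kmalloc(n * size, flags) does not: the product silently wraps and a too-small buffer is returned. A userspace sketch of the same guard; malloc_array() is a hypothetical name, and common C libraries' calloc() performs a comparable check.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of 'size' bytes, or NULL if n * size overflows. */
static void *malloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}
```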
2015-02-11 | NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args | Trond Myklebust | 1 | -1/+3
If the call to decode_rc_list() fails due to a memory allocation error, then we need to truncate the array size to ensure that we only call kfree() on those pointers that were allocated. Reported-by: David Ramos <[email protected]> Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation") Cc: [email protected] Signed-off-by: Trond Myklebust <[email protected]>
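The fix is an instance of a common error-path pattern: when a fill loop fails partway, shrink the recorded element count so that cleanup frees only the slots that were actually assigned. A userspace sketch under that reading; struct entry_array, fill_entries(), and free_entries() are hypothetical and do not reproduce the NFS callback code.

```c
#include <stdlib.h>

/* Hypothetical array of separately allocated entries. */
struct entry_array {
    size_t nr;       /* number of entries that are safe to free */
    int **entries;
};

/*
 * Allocate 'want' entries; 'fail_at' simulates an allocation failure at
 * that index (pass a value >= want for no failure). On failure, nr is
 * truncated to the number of entries actually allocated.
 */
static int fill_entries(struct entry_array *a, size_t want, size_t fail_at)
{
    a->nr = 0;
    a->entries = malloc(want * sizeof(*a->entries)); /* slots uninitialised */
    if (want && !a->entries)
        return -1;
    a->nr = want;
    for (size_t i = 0; i < want; i++) {
        a->entries[i] = (i == fail_at) ? NULL : malloc(sizeof(int));
        if (!a->entries[i]) {
            a->nr = i;   /* the fix: slots i..want-1 were never set */
            return -1;
        }
    }
    return 0;
}

static void free_entries(struct entry_array *a)
{
    for (size_t i = 0; i < a->nr; i++)
        free(a->entries[i]);
    free(a->entries);
    a->entries = NULL;
    a->nr = 0;
}
```

Without the truncation, free_entries() would pass uninitialised slot values to free(), which is the kind of bug the commit fixes for kfree().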
2015-02-10 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next | Linus Torvalds | 2 | -13/+8
Pull networking updates from David Miller: 1) More iov_iter conversion work from Al Viro. [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was wrong, and this pull actually adds an extra commit on top of the branch I'm pulling to fix that up, so that the pre-merge state is ok. - Linus ] 2) Various optimizations to the ipv4 forwarding information base trie lookup implementation. From Alexander Duyck. 3) Remove sock_iocb altogether, from Christoph Hellwig. 4) Allow congestion control algorithm selection via routing metrics. From Daniel Borkmann. 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet. 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet. 7) Add xmit_more support to r8169, e1000, and e1000e drivers. From Florian Westphal. 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross. 9) Add BPF packet actions to packet scheduler, from Jiri Pirko. 10) Add support for unique flow IDs to openvswitch, from Joe Stringer. 11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman Kwok. 12) More sanely handle out-of-window dupacks, which can result in serious ACK storms. From Neal Cardwell. 13) Various rhashtable bug fixes and enhancements, from Herbert Xu, Patrick McHardy, and Thomas Graf. 14) Support xmit_more in be2net, from Sathya Perla. 15) Group Policy extensions for vxlan, from Thomas Graf. 16) Remote Checksum Offload support for vxlan, from Tom Herbert. 17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits) crypto: fix af_alg_make_sg() conversion to iov_iter ipv4: Namespecify TCP PMTU mechanism i40e: Fix for stats init function call in Rx setup tcp: don't include Fast Open option in SYN-ACK on pure SYN-data openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set ipv6: Make __ipv6_select_ident static ipv6: Fix fragment id assignment on LE arches.
bridge: Fix inability to add non-vlan fdb entry net: Mellanox: Delete unnecessary checks before the function call "vunmap" cxgb4: Add support in cxgb4 to get expansion rom version via ethtool ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version net: dsa: Remove redundant phy_attach() IB/mlx4: Reset flow support for IB kernel ULPs IB/mlx4: Always use the correct port for mirrored multicast attachments net/bonding: Fix potential bad memory access during bonding events tipc: remove tipc_snprintf tipc: nl compat add noop and remove legacy nl framework tipc: convert legacy nl stats show to nl compat tipc: convert legacy nl net id get to nl compat tipc: convert legacy nl net id set to nl compat ...
2015-02-10 | Merge branch 'akpm' (patches from Andrew) | Linus Torvalds | 37 | -115/+92
Merge misc updates from Andrew Morton: "Bite-sized chunks this time, to avoid the MTA ratelimiting woes. - fs/notify updates - ocfs2 - some of MM" That laconic "some MM" is mainly the removal of remap_file_pages(), which is a big simplification of the VM, and which gets rid of a *lot* of random cruft and special cases because we no longer support the non-linear mappings that it used. From a user interface perspective, nothing has changed, because the remap_file_pages() syscall still exists, it's just done by emulating the old behavior by creating a lot of individual small mappings instead of one non-linear one. The emulation is slower than the old "native" non-linear mappings, but nobody really uses or cares about remap_file_pages(), and simplifying the VM is a big advantage. * emailed patches from Andrew Morton <[email protected]>: (78 commits) memcg: zap memcg_slab_caches and memcg_slab_mutex memcg: zap memcg_name argument of memcg_create_kmem_cache memcg: zap __memcg_{charge,uncharge}_slab mm/page_alloc.c: place zone_id check before VM_BUG_ON_PAGE check mm: hugetlb: fix type of hugetlb_treat_as_movable variable mm, hugetlb: remove unnecessary lower bound on sysctl handlers"? mm: memory: merge shared-writable dirtying branches in do_wp_page() mm: memory: remove ->vm_file check on shared writable vmas xtensa: drop _PAGE_FILE and pte_file()-related helpers x86: drop _PAGE_FILE and pte_file()-related helpers unicore32: drop pte_file()-related helpers um: drop _PAGE_FILE and pte_file()-related helpers tile: drop pte_file()-related helpers sparc: drop pte_file()-related helpers sh: drop _PAGE_FILE and pte_file()-related helpers score: drop _PAGE_FILE and pte_file()-related helpers s390: drop pte_file()-related helpers parisc: drop _PAGE_FILE and pte_file()-related helpers openrisc: drop _PAGE_FILE and pte_file()-related helpers nios2: drop _PAGE_FILE and pte_file()-related helpers ...
2015-02-10 | Merge tag 'gfs2-merge-window' of ↵ | Linus Torvalds | 6 | -17/+8
git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw Pull gfs2 updates from Steven Whitehouse: "This time we have mostly clean ups. There is a bug fix for a NULL dereference relating to ACLs, and another which improves (but does not fix entirely) an allocation fall-back code path. The other three patches are small clean ups" * tag 'gfs2-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: Fix crash during ACL deletion in acl max entry check in gfs2_set_acl() GFS2: use __vmalloc GFP_NOFS for fs-related allocations. GFS2: Eliminate a nonsense goto GFS2: fix sprintf format specifier GFS2: Eliminate __gfs2_glock_remove_from_lru
2015-02-10 | Merge tag 'xfs-for-linus-3.20-rc1' of ↵ | Linus Torvalds | 38 | -880/+807
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs update from Dave Chinner: "This update contains: - RENAME_EXCHANGE support - Rework of the superblock logging infrastructure - Rework of the XFS_IOCTL_SETXATTR implementation * enables use inside user namespaces * fixes inconsistencies setting extent size hints - fixes for missing buffer type annotations used in log recovery - more consolidation of libxfs headers - preparation patches for block based PNFS support - miscellaneous bug fixes and cleanups" * tag 'xfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (37 commits) xfs: only trace buffer items if they exist xfs: report proper f_files in statfs if we overshoot imaxpct xfs: fix panic_mask documentation xfs: xfs_ioctl_setattr_check_projid can be static xfs: growfs should use synchronous transactions xfs: fix behaviour of XFS_IOC_FSSETXATTR on directories xfs: factor projid hint checking out of xfs_ioctl_setattr xfs: factor extsize hint checking out of xfs_ioctl_setattr xfs: XFS_IOCTL_SETXATTR can run in user namespaces xfs: kill xfs_ioctl_setattr behaviour mask xfs: disaggregate xfs_ioctl_setattr xfs: factor out xfs_ioctl_setattr transaciton preamble xfs: separate xflags from xfs_ioctl_setattr xfs: FSX_NONBLOCK is not used xfs: don't allocate an ioend for direct I/O completions xfs: change kmem_free to use generic kvfree() xfs: factor out a xfs_update_prealloc_flags() helper xfs: remove incorrect error negation in attr_multi ioctl xfs: set superblock buffer type correctly xfs: set buf types when converting extent formats ...
2015-02-10 | Merge branch 'for_linus' of ↵ | Linus Torvalds | 15 | -212/+256
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota interface unification and misc cleanups from Jan Kara: "The first part of the series unifying XFS and VFS quota interfaces. This part unifies turning quotas on and off so quota-tools and xfs_quota can be used to manage any filesystem. This is useful so that userspace doesn't have to distinguish which filesystem it is working with. As a result we can then easily reuse tests for project quotas in XFS for ext4. This also contains minor cleanups and fixes for udf, isofs, and ext3" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (23 commits) udf: remove bool assignment to 0/1 udf: use bool for done quota: Store maximum space limit in bytes quota: Remove quota_on_meta callback ocfs2: Use generic helpers for quotaon and quotaoff ext4: Use generic helpers for quotaon and quotaoff quota: Add ->quota_{enable,disable} callbacks for VFS quotas quota: Wire up ->quota_{enable,disable} callbacks into Q_QUOTA{ON,OFF} quota: Split ->set_xstate callback into two xfs: Remove some pointless quota checks xfs: Remove some useless flags tests xfs: Remove useless test quota: Verify flags passed to Q_SETINFO quota: Cleanup flags definitions ocfs2: Move OLQF_CLEAN flag out of generic quota flags quota: Don't store flags for v2 quota format jbd: drop jbd_ENOSYS debug udf: destroy sbi mutex in put_super udf: Check length of extended attributes and allocation descriptors udf: Remove repeated loads blocksize ...