|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Richard Kuo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: David Howells <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Mikael Starvik <[email protected]>
Acked-by: Jesper Nilsson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Acked-by: Hans-Christian Egtvedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Russell King <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Vineet Gupta <[email protected]> [for arch/arc bits]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: Matt Turner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It will fix NR_PAGETABLE accounting. It's also required if the arch is
ever going to support split ptl.
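For reference, the usual shape of these hooks in an arch's pte allocation
paths is roughly the following (an illustrative sketch with generic GFP
flags, not the exact hunk from this patch):

    pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long addr)
    {
            struct page *pte;

            pte = alloc_page(GFP_KERNEL | __GFP_ZERO);
            if (!pte)
                    return NULL;
            if (!pgtable_page_ctor(pte)) {  /* accounts NR_PAGETABLE, inits the ptl */
                    __free_page(pte);
                    return NULL;
            }
            return pte;
    }

    void pte_free(struct mm_struct *mm, pgtable_t pte)
    {
            pgtable_page_dtor(pte);         /* undo the accounting before freeing */
            __free_page(pte);
    }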
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Jonas Bonn <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It will fix NR_PAGETABLE accounting. It's also required if the arch is
ever going to support split ptl.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: David Howells <[email protected]>
Cc: Koichi Yasutake <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It will fix NR_PAGETABLE accounting. It's also required if the arch is
ever going to support split ptl.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Michal Simek <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Change the pgtable_page_ctor() return type from void to bool. It returns
true if initialization is successful and false otherwise.
The current implementation never fails, but that will change later.
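Callers are then expected to handle the failure, along these lines (an
illustrative diff, not any specific arch's hunk):

    -       pgtable_page_ctor(pte);
    +       if (!pgtable_page_ctor(pte)) {
    +               __free_page(pte);
    +               return NULL;
    +       }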
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a missing check for memory allocation failure.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Max Filippov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a missing check for memory allocation failure.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add a missing check for memory allocation failure.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Mikael Starvik <[email protected]>
Acked-by: Jesper Nilsson <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the split page table lock case, we embed a spinlock_t into struct page.
For obvious reasons, we don't want to increase the size of struct page if
spinlock_t is too big, as with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC or on an
-rt kernel. So we disable the split page table lock if spinlock_t is too
big.
This patchset allows the lock to be allocated dynamically if spinlock_t is
big. In that case page->ptl stores a pointer to the spinlock instead of
the spinlock itself. It costs an additional cache line for the indirect
access, but fixes page fault scalability for multi-threaded applications.
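Conceptually, lock lookup then becomes something like this (a sketch of
the approach; the real helper names may differ):

    #if ALLOC_SPLIT_PTLOCKS                 /* spinlock_t too big to embed */
    static inline spinlock_t *ptlock_ptr(struct page *page)
    {
            return page->ptl;               /* page->ptl holds a pointer to a kmalloc'ed lock */
    }
    #else
    static inline spinlock_t *ptlock_ptr(struct page *page)
    {
            return &page->ptl;              /* lock is embedded directly in struct page */
    }
    #endif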
LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernels enabling
LOCK_STAT to analyse scalability issues breaks scalability. ;)
The patchset mostly fixes this. Results for ./thp_memscale -c 80 -b 512M
on a 4-socket machine:
baseline, no CONFIG_LOCK_STAT: 9.115460703 seconds time elapsed
baseline, CONFIG_LOCK_STAT=y: 53.890567123 seconds time elapsed
patched, no CONFIG_LOCK_STAT: 8.852250368 seconds time elapsed
patched, CONFIG_LOCK_STAT=y: 11.069770759 seconds time elapsed
The patch count is scary, but most of them are trivial. Overview:
Patches 1-4    A few bug fixes. No dependencies on other patches.
               They should probably be applied as soon as possible.
Patch 5        Changes the signature of pgtable_page_ctor(). We will use it
               for dynamic lock allocation, so it can fail.
Patches 6-8    Add missing constructor/destructor calls on a few archs.
               This fixes NR_PAGETABLE accounting and prepares for split ptl.
Patches 9-33   Add pgtable_page_ctor() failure handling to all archs.
Patch 34       Finally adds support for a dynamically allocated page->ptl.
               Also contains documentation for the split page table lock.
This patch (of 34):
I missed that we preallocate a few pmds in pgd_alloc() if X86_PAE is
enabled. Let's add the missing constructor/destructor calls.
I hadn't noticed it during testing since prep_new_page() clears
page->mapping and therefore page->ptl, which is effectively equivalent to
spin_lock_init(&page->ptl).
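The fix boils down to something like this in preallocate_pmds() and
free_pmds() (a sketch; details of the actual hunk may differ):

    pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);

    if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
            free_page((unsigned long)pmd);
            pmd = NULL;
            failed = true;
    }

    /* ... and on the free side: */
    pgtable_pmd_page_dtor(virt_to_page(pmd));
    free_page((unsigned long)pmd);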
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chen Liqin <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Howells <[email protected]>
Cc: David S. Miller <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Grant Likely <[email protected]>
Cc: Guan Xuetao <[email protected]>
Cc: Haavard Skinnemoen <[email protected]>
Cc: Hans-Christian Egtvedt <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: Hirokazu Takata <[email protected]>
Cc: Ivan Kokshaysky <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Jeff Dike <[email protected]>
Cc: Jesper Nilsson <[email protected]>
Cc: Jonas Bonn <[email protected]>
Cc: Koichi Yasutake <[email protected]>
Cc: Lennox Wu <[email protected]>
Cc: Martin Schwidefsky <[email protected]>
Cc: Matt Turner <[email protected]>
Cc: Max Filippov <[email protected]>
Cc: Michal Simek <[email protected]>
Cc: Mikael Starvik <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Paul Mundt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Russell King <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vineet Gupta <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Enable PMD split page table lock for X86_64 and PAE.
[[email protected]: coding-style fixes]
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The basic idea is the same as at the PTE level: the lock is embedded into
the struct page of the table's page.
We can't use mm->pmd_huge_pte to store pgtables for THP, since we don't
take mm->page_table_lock anymore. Let's reuse page->lru of the table's
page for that.
pgtable_pmd_page_ctor() returns true if initialization is successful and
false otherwise. The current implementation never fails, but the
assumption that the constructor can fail will help port it to -rt, where
spinlock_t is rather huge and cannot be embedded into struct page --
dynamic allocation is required.
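Roughly, the constructor/destructor pair looks like this (a sketch of the
idea, not the exact code):

    static inline bool pgtable_pmd_page_ctor(struct page *page)
    {
    #ifdef CONFIG_TRANSPARENT_HUGEPAGE
            page->pmd_huge_pte = NULL;      /* deposit list, reusing page->lru space */
    #endif
            spin_lock_init(&page->ptl);
            return true;                    /* never fails today; may once the ptl is allocated */
    }

    static inline void pgtable_pmd_page_dtor(struct page *page)
    {
    #ifdef CONFIG_TRANSPARENT_HUGEPAGE
            VM_BUG_ON(page->pmd_huge_pte);  /* all deposited pgtables must be withdrawn */
    #endif
    }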
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Only trivial cases are left. Let's convert them all together.
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Hugetlb supports multiple page sizes. We use the split lock only at the
PMD level, not for PUD.
[[email protected]: coding-style fixes]
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Currently mm->pmd_huge_pte is protected by the page table lock. That will
not work with the split lock: we need a per-pmd pmd_huge_pte for proper
access serialization.
For now, let's just introduce a wrapper to access mm->pmd_huge_pte.
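For now the wrapper can be as simple as (sketch):

    #define pmd_huge_pte(mm, pmd)   ((mm)->pmd_huge_pte)

Later, only this definition needs to change to make the deposited pgtable
per-pmd.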
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the split page table lock we can't know which lock we need to take
before we find the relevant pmd.
Let's move the lock taking inside the function.
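The conversion follows this pattern (purely illustrative; the helper name
below is generic, not the actual function touched by this patch):

    /* before: caller locks first, then searches */
    spin_lock(&mm->page_table_lock);
    pmd = find_relevant_pmd(mm, address);
    /* ... */
    spin_unlock(&mm->page_table_lock);

    /* after: the helper finds the pmd and returns with its ptl held */
    pmd = find_relevant_pmd(mm, address, &ptl);
    if (pmd) {
            /* ... */
            spin_unlock(ptl);
    }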
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With the split ptlock it's important to know which lock
pmd_trans_huge_lock() took. This patch adds one more parameter to the
function to return the lock.
In most places migration to the new API is trivial. The exception is
move_huge_pmd(): we need to take two locks if the pmd tables are different.
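With the extra parameter the call pattern becomes roughly (sketch;
prototypes may differ in detail):

    spinlock_t *ptl;

    if (pmd_trans_huge_lock(pmd, vma, &ptl) == 1) {
            /* *pmd is a stable huge pmd and ptl is held */
            /* ... */
            spin_unlock(ptl);
    }

    /* move_huge_pmd(): old and new pmd may live in different tables */
    new_ptl = pmd_lockptr(mm, new_pmd);
    if (new_ptl != old_ptl)
            spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);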
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Basic API, backed by mm->page_table_lock for now. The actual
implementation will be added later.
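The initial helpers simply funnel everything through mm->page_table_lock,
roughly:

    static inline spinlock_t *pmd_lockptr(struct mm_struct *mm, pmd_t *pmd)
    {
            return &mm->page_table_lock;
    }

    static inline spinlock_t *pmd_lock(struct mm_struct *mm, pmd_t *pmd)
    {
            spinlock_t *ptl = pmd_lockptr(mm, pmd);

            spin_lock(ptl);
            return ptl;
    }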
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
With split page table lock for PMD level we can't hold mm->page_table_lock
while updating nr_ptes.
Let's convert it to atomic_long_t to avoid races.
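The conversion is mechanical, along these lines (sketch):

    -       unsigned long nr_ptes;                  /* in struct mm_struct */
    +       atomic_long_t nr_ptes;

    -       mm->nr_ptes++;                          /* needed mm->page_table_lock */
    +       atomic_long_inc(&mm->nr_ptes);          /* lock-free */

    -       if (mm->nr_ptes)
    +       if (atomic_long_read(&mm->nr_ptes))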
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We're going to introduce a split page table lock for the PMD level. Let's
rename the existing split ptlock for the PTE level to avoid confusion.
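In effect, the rename is (sketch):

    -#define USE_SPLIT_PTLOCKS       (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)
    +#define USE_SPLIT_PTE_PTLOCKS   (NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS)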
Signed-off-by: Kirill A. Shutemov <[email protected]>
Tested-by: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Alex Thorlton noticed that some massively threaded workloads work poorly
if THP is enabled. This patchset fixes that by introducing a split page
table lock for PMD tables. hugetlbfs is not covered yet.
This patchset is based on work by Naoya Horiguchi.
: akpm result summary:
:
: THP off, v3.12-rc2: 18.059261877 seconds time elapsed
: THP off, patched: 16.768027318 seconds time elapsed
:
: THP on, v3.12-rc2: 42.162306788 seconds time elapsed
: THP on, patched: 8.397885779 seconds time elapsed
:
: HUGETLB, v3.12-rc2: 47.574936948 seconds time elapsed
: HUGETLB, patched: 19.447481153 seconds time elapsed
THP off, v3.12-rc2:
-------------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
1037072.835207 task-clock # 57.426 CPUs utilized ( +- 3.59% )
95,093 context-switches # 0.092 K/sec ( +- 3.93% )
140 cpu-migrations # 0.000 K/sec ( +- 5.28% )
10,000,550 page-faults # 0.010 M/sec ( +- 0.00% )
2,455,210,400,261 cycles # 2.367 GHz ( +- 3.62% ) [83.33%]
2,429,281,882,056 stalled-cycles-frontend # 98.94% frontend cycles idle ( +- 3.67% ) [83.33%]
1,975,960,019,659 stalled-cycles-backend # 80.48% backend cycles idle ( +- 3.88% ) [66.68%]
46,503,296,013 instructions # 0.02 insns per cycle
# 52.24 stalled cycles per insn ( +- 3.21% ) [83.34%]
9,278,997,542 branches # 8.947 M/sec ( +- 4.00% ) [83.34%]
89,881,640 branch-misses # 0.97% of all branches ( +- 1.17% ) [83.33%]
18.059261877 seconds time elapsed ( +- 2.65% )
THP on, v3.12-rc2:
------------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
3114745.395974 task-clock # 73.875 CPUs utilized ( +- 1.84% )
267,356 context-switches # 0.086 K/sec ( +- 1.84% )
99 cpu-migrations # 0.000 K/sec ( +- 1.40% )
58,313 page-faults # 0.019 K/sec ( +- 0.28% )
7,416,635,817,510 cycles # 2.381 GHz ( +- 1.83% ) [83.33%]
7,342,619,196,993 stalled-cycles-frontend # 99.00% frontend cycles idle ( +- 1.88% ) [83.33%]
6,267,671,641,967 stalled-cycles-backend # 84.51% backend cycles idle ( +- 2.03% ) [66.67%]
117,819,935,165 instructions # 0.02 insns per cycle
# 62.32 stalled cycles per insn ( +- 4.39% ) [83.34%]
28,899,314,777 branches # 9.278 M/sec ( +- 4.48% ) [83.34%]
71,787,032 branch-misses # 0.25% of all branches ( +- 1.03% ) [83.33%]
42.162306788 seconds time elapsed ( +- 1.73% )
HUGETLB, v3.12-rc2:
-------------------
Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
2588052.787264 task-clock # 54.400 CPUs utilized ( +- 3.69% )
246,831 context-switches # 0.095 K/sec ( +- 4.15% )
138 cpu-migrations # 0.000 K/sec ( +- 5.30% )
21,027 page-faults # 0.008 K/sec ( +- 0.01% )
6,166,666,307,263 cycles # 2.383 GHz ( +- 3.68% ) [83.33%]
6,086,008,929,407 stalled-cycles-frontend # 98.69% frontend cycles idle ( +- 3.77% ) [83.33%]
5,087,874,435,481 stalled-cycles-backend # 82.51% backend cycles idle ( +- 4.41% ) [66.67%]
133,782,831,249 instructions # 0.02 insns per cycle
# 45.49 stalled cycles per insn ( +- 4.30% ) [83.34%]
34,026,870,541 branches # 13.148 M/sec ( +- 4.24% ) [83.34%]
68,670,942 branch-misses # 0.20% of all branches ( +- 3.26% ) [83.33%]
47.574936948 seconds time elapsed ( +- 2.09% )
THP off, patched:
-----------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
943301.957892 task-clock # 56.256 CPUs utilized ( +- 3.01% )
86,218 context-switches # 0.091 K/sec ( +- 3.17% )
121 cpu-migrations # 0.000 K/sec ( +- 6.64% )
10,000,551 page-faults # 0.011 M/sec ( +- 0.00% )
2,230,462,457,654 cycles # 2.365 GHz ( +- 3.04% ) [83.32%]
2,204,616,385,805 stalled-cycles-frontend # 98.84% frontend cycles idle ( +- 3.09% ) [83.32%]
1,778,640,046,926 stalled-cycles-backend # 79.74% backend cycles idle ( +- 3.47% ) [66.69%]
45,995,472,617 instructions # 0.02 insns per cycle
# 47.93 stalled cycles per insn ( +- 2.51% ) [83.34%]
9,179,700,174 branches # 9.731 M/sec ( +- 3.04% ) [83.35%]
89,166,529 branch-misses # 0.97% of all branches ( +- 1.45% ) [83.33%]
16.768027318 seconds time elapsed ( +- 2.47% )
THP on, patched:
----------------
Performance counter stats for './thp_memscale -c 80 -b 512m' (5 runs):
458793.837905 task-clock # 54.632 CPUs utilized ( +- 0.79% )
41,831 context-switches # 0.091 K/sec ( +- 0.97% )
98 cpu-migrations # 0.000 K/sec ( +- 1.66% )
57,829 page-faults # 0.126 K/sec ( +- 0.62% )
1,077,543,336,716 cycles # 2.349 GHz ( +- 0.81% ) [83.33%]
1,067,403,802,964 stalled-cycles-frontend # 99.06% frontend cycles idle ( +- 0.87% ) [83.33%]
864,764,616,143 stalled-cycles-backend # 80.25% backend cycles idle ( +- 0.73% ) [66.68%]
16,129,177,440 instructions # 0.01 insns per cycle
# 66.18 stalled cycles per insn ( +- 7.94% ) [83.35%]
3,618,938,569 branches # 7.888 M/sec ( +- 8.46% ) [83.36%]
33,242,032 branch-misses # 0.92% of all branches ( +- 2.02% ) [83.32%]
8.397885779 seconds time elapsed ( +- 0.18% )
HUGETLB, patched:
-----------------
Performance counter stats for './thp_memscale_hugetlbfs -c 80 -b 512M' (5 runs):
395353.076837 task-clock # 20.329 CPUs utilized ( +- 8.16% )
55,730 context-switches # 0.141 K/sec ( +- 5.31% )
138 cpu-migrations # 0.000 K/sec ( +- 4.24% )
21,027 page-faults # 0.053 K/sec ( +- 0.00% )
930,219,717,244 cycles # 2.353 GHz ( +- 8.21% ) [83.32%]
914,295,694,103 stalled-cycles-frontend # 98.29% frontend cycles idle ( +- 8.35% ) [83.33%]
704,137,950,187 stalled-cycles-backend # 75.70% backend cycles idle ( +- 9.16% ) [66.69%]
30,541,538,385 instructions # 0.03 insns per cycle
# 29.94 stalled cycles per insn ( +- 3.98% ) [83.35%]
8,415,376,631 branches # 21.286 M/sec ( +- 3.61% ) [83.36%]
32,645,478 branch-misses # 0.39% of all branches ( +- 3.41% ) [83.32%]
19.447481153 seconds time elapsed ( +- 2.00% )
This patch (of 11):
CONFIG_GENERIC_LOCKBREAK increases sizeof(spinlock_t) to 8 bytes. That in
turn increases sizeof(struct page) by 4 bytes on 32-bit systems if the
split page table lock is in use, since page->ptl shares space in a union
with longs and pointers.
Let's disable the split page table lock on 32-bit systems with
GENERIC_LOCKBREAK enabled.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Cc: Alex Thorlton <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: "Eric W . Biederman" <[email protected]>
Cc: "Paul E . McKenney" <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: David Howells <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Sedat Dilek <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
There's only one caller of do_generic_file_read() and the only actor is
file_read_actor(). No reason to have a callback parameter.
Signed-off-by: Kirill A. Shutemov <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Reviewed-by: Wanpeng Li <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Hugh Dickins <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add Dave as maintainer of XFS.
Signed-off-by: Ben Myers <[email protected]>
Reviewed-by: Ric Wheeler <[email protected]>
Reviewed-by: Dave Chinner <[email protected]>
Reviewed-by: Alex Elder <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs update from Chris Mason:
"This is our usual merge window set of bug fixes, performance
improvements and cleanups. Miao Xie has some really nice
optimizations for writeback.
Josef also expanded our sanity checks quite a bit; these make up a big
chunk of the new lines"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (98 commits)
Btrfs: rename btrfs_start_all_delalloc_inodes
Btrfs: don't wait for the completion of all the ordered extents
Btrfs: don't wait for all the async delalloc when shrinking delalloc
Btrfs: fix the confusion between delalloc bytes and metadata bytes
Btrfs: pick up the code for the item number calculation in flush_space()
Btrfs: wait for the ordered extent only when we want
Btrfs: remove unnecessary initialization and memory barrior in shrink_delalloc()
Btrfs: avoid unnecessary scrub workers allocation
Btrfs: check file extent type before anything else
btrfs: Remove useless variable in write_ctree_super()
btrfs: Fix checkpatch.pl warning of spacing issues
btrfs: Replace kmalloc with kmalloc_array
btrfs: Enclose macros with complex values within parenthesis
btrfs: Use WARN_ON()'s return value in place of WARN_ON(1)
btrfs: Remove redundant local zero structure
btrfs: Pack struct btrfs_device
btrfs: Replace multiple atomic_inc() with atomic_add()
btrfs: Add helper function for free_root_pointers()
Btrfs: fix a crash when running balance and defrag concurrently
Btrfs: do not run snapshot-aware defragment on error
...
|
|
We had been using a DMI table workaround to select the right frequency
for devices, but this is fragile and must be updated with every new
platform.
Instead, the default case when the VBT is missing is changed to use a
120MHz clock for LVDS SSC on these generations.
The docs for 2010-Core, SandyBridge, and IvyBridge all indicate
that the reference frequency for LVDS is 120MHz:
"2010 Core"
http://intellinuxgraphics.org/IHD_OS_Vol3_Part3r2.pdf
page 38
Reference Frequency: 120MHz for CRT and LVDS. 100MHz for the FDI.
"2011 SandyBridge"
http://intellinuxgraphics.org/documentation/SNB/IHD_OS_Vol3_Part3.pdf
page 33
Reference Frequency: 120MHz for CRT, HDMI, LVDS. 100MHz for the FDI.
"2012 IvyBridge"
http://intellinuxgraphics.org/documentation/IVB/IHD_OS_Vol3_Part4.pdf
page 27
Reference Frequency: 120 MHz for CRT, HDMI, LVDS, 100MHz for the FDI.
Signed-off-by: Duncan Laurie <[email protected]>
[olof: Fixup for recent base, switched from if/else to single call]
Signed-off-by: Olof Johansson <[email protected]>
Reviewed-by: Rodrigo Vivi <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
|
|
The driver core clears the driver data to NULL after device_release or on
probe failure, so there is no need to manually clear the device driver
data to NULL.
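Hunks like this one in remove()/probe-error paths can therefore simply be
dropped (illustrative):

    -       pci_set_drvdata(pdev, NULL);    /* redundant: the driver core clears it */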
Signed-off-by: Jingoo Han <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Use module_pci_driver() macro which makes the code smaller and
simpler.
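For a driver this replaces the usual init/exit boilerplate (illustrative,
with a hypothetical driver name):

    -static int __init foo_init(void)
    -{
    -       return pci_register_driver(&foo_driver);
    -}
    -
    -static void __exit foo_exit(void)
    -{
    -       pci_unregister_driver(&foo_driver);
    -}
    -
    -module_init(foo_init);
    -module_exit(foo_exit);
    +module_pci_driver(foo_driver);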
Signed-off-by: Jingoo Han <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Casting the return value, which is a void pointer, is redundant. The
conversion from a void pointer to any other pointer type is guaranteed by
the C programming language.
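A typical instance (illustrative; the struct name is hypothetical):

    -       struct foo_priv *priv = (struct foo_priv *)netdev_priv(dev);
    +       struct foo_priv *priv = netdev_priv(dev);       /* netdev_priv() returns void * */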
Signed-off-by: Jingoo Han <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Mark the places where the system is in user mode and where it is in the
kernel. This is needed for full dynticks (tickless) operation -- a
CONFIG_NO_HZ_FULL dependency.
Signed-off-by: Kirill Tkhai <[email protected]>
CC: David Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
CONFIG_NO_HZ_FULL requires that smp_send_reschedule() work for the
calling CPU. Currently, it is used only in the inc_nr_running()
scheduler primitive.
Nobody calls smp_send_reschedule() from preemptible context (and it looks
like it would be safe even if anybody used it that way in the future),
but add a WARN_ON() here anyway so we revisit this if anything changes.
Signed-off-by: Kirill Tkhai <[email protected]>
CC: David Miller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The SIMBA APB bridges lack the 'ranges' OF property describing the PCI
I/O and memory areas located beneath the bridge. This information has
been faked by reading range registers in the APB bridge and calculating
the corresponding areas.
In commit 01f94c4a6ced476ce69b895426fc29bfc48c69bd
("Fix sabre pci controllers with new probing scheme.") a bug was
introduced into this calculation, causing the PCI memory areas
to be calculated incorrectly: The shift size was set to be
identical for I/O and MEM ranges, which is incorrect.
This patch sets the shift size of the MEM range back to the value used
before 01f94c4a6ced476ce69b895426fc29bfc48c69bd.
Signed-off-by: Kjetil Oftedal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ip4_datagram_connect() is called from process context, so it should use
IP_INC_STATS() instead of IP_INC_STATS_BH(); otherwise we can deadlock on
32-bit arches or corrupt the SNMP counters.
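The change is a one-liner of this shape (sketch):

    -       IP_INC_STATS_BH(net, IPSTATS_MIB_OUTNOROUTES);
    +       IP_INC_STATS(net, IPSTATS_MIB_OUTNOROUTES);     /* safe from process context */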
Fixes: 584bdf8cbdf6 ("[IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dave Jones <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
I have received a report about the BUG_ON() in free_basic_memory_bitmaps()
triggering mysteriously during an aborted s2disk hibernation attempt.
The only way I can explain that is that /dev/snapshot was first
opened for writing (resume mode), then closed and then opened again
for reading and closed again without freezing tasks. In that case
the first invocation of snapshot_open() would set the free_bitmaps
flag in snapshot_state, which is a static variable. That flag
wouldn't be cleared later and the second invocation of snapshot_open()
would just leave it like that, so the subsequent snapshot_release()
would see data->frozen set and free_basic_memory_bitmaps() would be
called unnecessarily.
To prevent that from happening clear data->free_bitmaps in
snapshot_open() when the file is being opened for reading (hibernate
mode).
In addition to that, replace the BUG_ON() in free_basic_memory_bitmaps()
with a WARN_ON() as the kernel can continue just fine if the condition
checked by that macro occurs.
Fixes: aab172891542 (PM / hibernate: Fix user space driven resume regression)
Reported-by: Oliver Lorenz <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
Cc: 3.12+ <[email protected]> # 3.12+
|
|
If 'hsr_get_node_data()' returns an error, going directly to the 'fail'
label doesn't free the memory pointed to by 'skb_out'.
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
pull request: wireless 2013-11-14
Please pull this batch of fixes intended for the 3.13 stream!
Amitkumar Karwar offers a quartet of mwifiex fixes, including an
endian fix and three fixes for invalid memory access.
Avinash Patil trims the packet length value for packets received from
an SDIO interface.
Colin Ian King fixes a NULL pointer dereference in the rtlwifi
efuse code.
Dan Carpenter cleans-up an mwifiex integer underflow, a potential
libertas oops, a memory corrupion bug in wcn36xx, and a locking issue
also in wcn36xx.
Dan Williams helps prism54 devices to avoid being misclassified as
Ethernet devices.
Felipe Pena fixes a couple of typo errors, one in rt2x00 and the
other in rtlwifi.
Janusz Dziedzic corrects a pair of DFS-related problems in ath9k.
Larry Finger patches three rtlwifi drivers to correctly report signal
strength even for an unassociated AP.
Mark Cave-Ayland rewrites some endian-illiterate packet type extraction
code in rtlwifi.
Stanislaw Gruszka addresses an rt2x00 regression related to setting
HT station WCID and AMPDU density parameters.
Sujith Manoharan corrects the initvals settings for AR9485.
Ujjal Roy patches an obscure bit of code in mwifiex that was using
the wrong definition of eth_hdr when bridging packets in AP mode.
Wei Yongjun fixes a couple of bugs: one is a return code handling
bug in libertas; and, the other is a locking issue in wcn36xx.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Now the struct acpi_device pointer can be obtained with ACPI_COMPANION()
on acpi_ac->pdev->dev, so the cached pointer is no longer necessary;
remove it.
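That is, users can simply do (sketch):

    struct acpi_device *adev = ACPI_COMPANION(&ac->pdev->dev);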
Signed-off-by: Lan Tianyu <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
Commit 2613af0ed18a ("virtio_net: migrate mergeable rx buffers to page
frag allocators") changed the mergeable receive buffer size from PAGE_SIZE
to MTU-size. However, the merge buffer size does not take into account the
size of the virtio-net header. Consequently, packets that are MTU-size
will take two buffers instead of one (to store the virtio-net header),
substantially decreasing the throughput of MTU-size traffic due to TCP
window / SKB truesize effects.
This commit changes the mergeable buffer size to include the virtio-net
header. The buffer size is cacheline-aligned because skb_page_frag_refill
will not automatically align the requested size.
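The resulting buffer length looks something like this (sketch; macro
names approximate):

    #define MERGE_BUFFER_LEN (ALIGN(GOOD_PACKET_LEN + \
                                    sizeof(struct virtio_net_hdr_mrg_rxbuf), \
                                    L1_CACHE_BYTES))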
Benchmarks taken from an average of 5 netperf 30-second TCP_STREAM runs
between two QEMU VMs on a single physical machine. Each VM has two VCPUs and
vhost enabled. All VMs and vhost threads run in a single 4 CPU cgroup
cpuset, using cgroups to ensure that other processes in the system will not
be scheduled on the benchmark CPUs. Transmit offloads and mergeable receive
buffers are enabled, but guest_tso4 / guest_csum are explicitly disabled to
force MTU-sized packets on the receiver.
net-next trunk before 2613af0ed18a (PAGE_SIZE buf): 3861.08Gb/s
net-next trunk (MTU 1500- packet uses two buf due to size bug): 4076.62Gb/s
net-next trunk (MTU 1480- packet fits in one buf): 6301.34Gb/s
net-next trunk w/ size fix (MTU 1500 - packet fits in one buf): 6445.44Gb/s
Suggested-by: Eric Northup <[email protected]>
Signed-off-by: Michael Dalton <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The current SPI bus_num.chip_select "spix.y" based device naming scheme
may not be stable enough to be used in name-based matching, for instance
within the ALSA SoC subsystem.
This can be a problem on PC-type platforms if there are changes in the
SPI bus configuration, the number of busses or the probe order.
This patch addresses the problem by using the ACPI device name with an
"spi-" prefix for ACPI-enumerated SPI slaves. For them the device name
"spix.y" becomes "spi-INTABCD:ij".
Signed-off-by: Jarkko Nikula <[email protected]>
Acked-by: Mark Brown <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The current I2C adapter id - client address "x-00yy" based device naming
scheme is not always stable enough to be used in name-based matching, for
instance within the ALSA SoC subsystem.
This is problematic on PC-type platforms where I2C adapter numbers can
change due to a variable number of bus controllers, probe order, add-on
cards, or just because of BIOS settings.
This patch addresses the problem by using the ACPI device name with an
"i2c-" prefix for ACPI-enumerated I2C slaves. For them the device name
"x-00yy" becomes "i2c-INTABCD:ij" after this patch.
Signed-off-by: Jarkko Nikula <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
struct acpi_device fields are only available when CONFIG_ACPI is set. We
may find use for dev_name(&adev->dev) in generic code that is also built
without CONFIG_ACPI, but currently this requires #ifdef CONFIG_ACPI churn.
Provide an accessor that returns the dev_name of the struct device
embedded in struct acpi_device, or NULL, depending on the CONFIG_ACPI
setting.
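The accessor is essentially (sketch):

    #ifdef CONFIG_ACPI
    static inline const char *acpi_dev_name(struct acpi_device *adev)
    {
            return dev_name(&adev->dev);
    }
    #else
    static inline const char *acpi_dev_name(struct acpi_device *adev)
    {
            return NULL;
    }
    #endif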
Signed-off-by: Jarkko Nikula <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
In af3e095a1fb4, Erik Jacobsen fixed one type of unaligned access
bug for ia64 by converting a 64-bit write to use put_unaligned().
Unfortunately, since gcc will convert a short memset() to a series
of appropriately-aligned stores, the problem is now visible again
on tilegx, where the memset that zeros out proc_event is converted
to three 64-bit stores, causing an unaligned access panic.
A better fix for the original problem is to ensure that proc_event
is aligned to 8 bytes here. We can do that relatively easily by
arranging to start the struct cn_msg aligned to 8 bytes and then
offset by 4 bytes. Doing so means that the immediately following
proc_event structure is then correctly aligned to 8 bytes.
The result is that the memset() stores are now aligned, and as an
added benefit, we can remove the put_unaligned() calls in the code.
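The idea, roughly (a sketch; the buffer-size macro name is illustrative):

    __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
    struct cn_msg *msg;
    struct proc_event *ev;

    /* cn_msg is 20 bytes; starting it 4 bytes into an 8-byte-aligned
     * buffer makes the proc_event that follows it 8-byte aligned, so
     * memset()'s wide stores are aligned too. */
    msg = (struct cn_msg *)(buffer + 4);
    ev = (struct proc_event *)msg->data;
    memset(&ev->event_data, 0, sizeof(ev->event_data));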
Signed-off-by: Chris Metcalf <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|