
Commit 3e07b3b

donettom-1 authored and intel-lab-lkp committed
powerpc/64s/slb: Fix SLB multihit issue during SLB preload
On systems using the hash MMU, there is a software SLB preload cache that mirrors the entries loaded into the hardware SLB buffer. This preload cache is subject to periodic eviction (typically after every 256 context switches) to remove old entries.

To optimize performance, the kernel skips switch_mmu_context() in switch_mm_irqs_off() when the prev and next mm_struct are the same. However, on hash MMU systems, this can lead to inconsistencies between the hardware SLB and the software preload cache.

If an SLB entry for a process is evicted from the software cache on one CPU, and the same process later runs on another CPU without executing switch_mmu_context(), the hardware SLB may retain stale entries. If the kernel then attempts to reload that entry, it can trigger an SLB multi-hit error.

The following timeline shows how stale SLB entries are created and can cause a multi-hit error when a process moves between CPUs without an MMU context switch.

CPU 0                                   CPU 1
-----                                   -----
Process P
exec                                    swapper/1
load_elf_binary
begin_new_exec
activate_mm
  switch_mm_irqs_off
    switch_mmu_context
      switch_slb
      /*
       * This invalidates all
       * the entries in the HW
       * and setup the new HW
       * SLB entries as per the
       * preload cache.
       */
context_switch
sched_migrate_task migrates process P to cpu-1

Process swapper/0                       context switch (to process P)
                                        (uses mm_struct of Process P)
                                        switch_mm_irqs_off()
                                          switch_slb
                                            load_slb++
                                            /*
                                             * load_slb becomes 0 here
                                             * and we evict an entry from
                                             * the preload cache with
                                             * preload_age(). We still
                                             * keep HW SLB and preload
                                             * cache in sync, that is
                                             * because all HW SLB entries
                                             * anyways gets evicted in
                                             * switch_slb during SLBIA.
                                             * We then only add those
                                             * entries back in HW SLB,
                                             * which are currently
                                             * present in preload_cache
                                             * (after eviction).
                                             */
                                        load_elf_binary continues...
                                        setup_new_exec()
                                          slb_setup_new_exec()

                                        sched_switch event
                                        sched_migrate_task migrates
                                        process P to cpu-0

context_switch from swapper/0 to Process P
switch_mm_irqs_off()
  /*
   * Since both prev and next mm struct are same we don't call
   * switch_mmu_context(). This will cause the HW SLB and SW preload
   * cache to go out of sync in preload_new_slb_context. Because there
   * was an SLB entry which was evicted from both HW and preload cache
   * on cpu-1. Now later in preload_new_slb_context(), when we will try
   * to add the same preload entry again, we will add this to the SW
   * preload cache and then will add it to the HW SLB. Since on cpu-0
   * this entry was never invalidated, hence adding this entry to the HW
   * SLB will cause a SLB multi-hit error.
   */
load_elf_binary continues...
START_THREAD
  start_thread
    preload_new_slb_context
    /*
     * This tries to add a new EA to preload cache which was earlier
     * evicted from both cpu-1 HW SLB and preload cache. This caused the
     * HW SLB of cpu-0 to go out of sync with the SW preload cache. The
     * reason for this was, that when we context switched back on CPU-0,
     * we should have ideally called switch_mmu_context() which will
     * bring the HW SLB entries on CPU-0 in sync with SW preload cache
     * entries by setting up the mmu context properly. But we didn't do
     * that since the prev mm_struct running on cpu-0 was same as the
     * next mm_struct (which is true for swapper / kernel threads). So
     * now when we try to add this new entry into the HW SLB of cpu-0,
     * we hit a SLB multi-hit error.
     */
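The trigger is the prev == next short-circuit described above. Below is a minimal, illustrative C sketch of that fast path, not the kernel implementation (mm_struct is an opaque stand-in and the switch_slb()/switch_mmu_context() bodies are reduced to comments); it only shows the one branch on which the hardware SLB is left untouched while the software preload cache may already have aged out entries on another CPU.

/* Illustrative sketch only, not the powerpc code. */
struct mm_struct;                       /* opaque stand-in */

static void switch_slb(struct mm_struct *next)
{
	/*
	 * In the real kernel this flushes the hardware SLB (SLBIA) and
	 * reloads only the entries still present in the thread's software
	 * preload cache, keeping hardware and cache in sync.
	 */
}

static void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
{
	switch_slb(next);       /* hash MMU: rebuild the HW SLB from the preload cache */
}

static void switch_mm_irqs_off_sketch(struct mm_struct *prev, struct mm_struct *next)
{
	/*
	 * Fast path: a kernel thread (e.g. swapper) borrowed process P's mm
	 * on this CPU, so prev == next when P is switched back in. The MMU
	 * context switch is skipped and stale hardware SLB entries survive.
	 */
	if (prev == next)
		return;

	switch_mmu_context(prev, next);
}

Because this branch never reaches switch_slb(), an entry that was aged out of the preload cache while the task ran on another CPU can still sit in this CPU's hardware SLB; re-inserting it from preload_new_slb_context() is what trips assert_slb_presence() in the warning below.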
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62 assert_slb_presence+0x2c/0x50
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12 VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected) 0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP:  c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700   Not tainted  (6.16.0-rc3-dirty)
MSR:  8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 28888482  XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
  0x7fffceb5ffff (unreliable)
  preload_new_slb_context+0x100/0x1a0
  start_thread+0x26c/0x420
  load_elf_binary+0x1b04/0x1c40
  bprm_execve+0x358/0x680
  do_execveat_common+0x1f8/0x240
  sys_execve+0x58/0x70
  system_call_exception+0x114/0x300
  system_call_common+0x160/0x2c4

From the above analysis, during early exec the hardware SLB is cleared, and entries from the software preload cache are reloaded into hardware by switch_slb. However, preload_new_slb_context() and slb_setup_new_exec() also attempt to load some of the same entries, which can trigger a multi-hit. In most cases, these additional preloads simply hit existing entries and add nothing new. Removing these functions avoids the redundant preloads and eliminates the multi-hit issue.

This patch removes these two functions.

We tested process switching performance using the context_switch benchmark on POWER9/hash and observed no regression:

Without this patch: 129041 ops/sec
With this patch:    129341 ops/sec

We also measured SLB faults during boot; the counts are essentially the same with and without this patch:

SLB faults without this patch: 19727
SLB faults with this patch:    19786

Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Donet Tom <[email protected]>
Cc: <[email protected]>
Fixes: 5434ae7 ("powerpc/64s/hash: Add a SLB preload cache")
Suggested-by: Nicholas Piggin <[email protected]>
Signed-off-by: Donet Tom <[email protected]>
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
1 parent 0843ba4 commit 3e07b3b

File tree

5 files changed: +0, -98 lines changed


arch/powerpc/include/asm/book3s/64/mmu-hash.h

Lines changed: 0 additions & 1 deletion
@@ -524,7 +524,6 @@ void slb_save_contents(struct slb_entry *slb_ptr);
 void slb_dump_contents(struct slb_entry *slb_ptr);
 
 extern void slb_vmalloc_update(void);
-void preload_new_slb_context(unsigned long start, unsigned long sp);
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
 void slb_set_size(u16 size);

arch/powerpc/kernel/process.c

Lines changed: 0 additions & 5 deletions
@@ -1897,18 +1897,13 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	return 0;
 }
 
-void preload_new_slb_context(unsigned long start, unsigned long sp);
-
 /*
  * Set up a thread for executing a new program
  */
 void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
 {
 #ifdef CONFIG_PPC64
 	unsigned long load_addr = regs->gpr[2];	/* saved by ELF_PLAT_INIT */
-
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
-		preload_new_slb_context(start, sp);
 #endif
 
 #ifdef CONFIG_PPC_TRANSACTIONAL_MEM

arch/powerpc/mm/book3s64/internal.h

Lines changed: 0 additions & 2 deletions
@@ -24,8 +24,6 @@ static inline bool stress_hpt(void)
 
 void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
 
-void slb_setup_new_exec(void);
-
 void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
 
 #endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */

arch/powerpc/mm/book3s64/mmu_context.c

Lines changed: 0 additions & 2 deletions
@@ -150,8 +150,6 @@ static int hash__init_new_context(struct mm_struct *mm)
 void hash__setup_new_exec(void)
 {
 	slice_setup_new_exec();
-
-	slb_setup_new_exec();
 }
 #else
 static inline int hash__init_new_context(struct mm_struct *mm)

arch/powerpc/mm/book3s64/slb.c

Lines changed: 0 additions & 88 deletions
@@ -328,94 +328,6 @@ static void preload_age(struct thread_info *ti)
 	ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
 }
 
-void slb_setup_new_exec(void)
-{
-	struct thread_info *ti = current_thread_info();
-	struct mm_struct *mm = current->mm;
-	unsigned long exec = 0x10000000;
-
-	WARN_ON(irqs_disabled());
-
-	/*
-	 * preload cache can only be used to determine whether a SLB
-	 * entry exists if it does not start to overflow.
-	 */
-	if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR)
-		return;
-
-	hard_irq_disable();
-
-	/*
-	 * We have no good place to clear the slb preload cache on exec,
-	 * flush_thread is about the earliest arch hook but that happens
-	 * after we switch to the mm and have already preloaded the SLBEs.
-	 *
-	 * For the most part that's probably okay to use entries from the
-	 * previous exec, they will age out if unused. It may turn out to
-	 * be an advantage to clear the cache before switching to it,
-	 * however.
-	 */
-
-	/*
-	 * preload some userspace segments into the SLB.
-	 * Almost all 32 and 64bit PowerPC executables are linked at
-	 * 0x10000000 so it makes sense to preload this segment.
-	 */
-	if (!is_kernel_addr(exec)) {
-		if (preload_add(ti, exec))
-			slb_allocate_user(mm, exec);
-	}
-
-	/* Libraries and mmaps. */
-	if (!is_kernel_addr(mm->mmap_base)) {
-		if (preload_add(ti, mm->mmap_base))
-			slb_allocate_user(mm, mm->mmap_base);
-	}
-
-	/* see switch_slb */
-	asm volatile("isync" : : : "memory");
-
-	local_irq_enable();
-}
-
-void preload_new_slb_context(unsigned long start, unsigned long sp)
-{
-	struct thread_info *ti = current_thread_info();
-	struct mm_struct *mm = current->mm;
-	unsigned long heap = mm->start_brk;
-
-	WARN_ON(irqs_disabled());
-
-	/* see above */
-	if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR)
-		return;
-
-	hard_irq_disable();
-
-	/* Userspace entry address. */
-	if (!is_kernel_addr(start)) {
-		if (preload_add(ti, start))
-			slb_allocate_user(mm, start);
-	}
-
-	/* Top of stack, grows down. */
-	if (!is_kernel_addr(sp)) {
-		if (preload_add(ti, sp))
-			slb_allocate_user(mm, sp);
-	}
-
-	/* Bottom of heap, grows up. */
-	if (heap && !is_kernel_addr(heap)) {
-		if (preload_add(ti, heap))
-			slb_allocate_user(mm, heap);
-	}
-
-	/* see switch_slb */
-	asm volatile("isync" : : : "memory");
-
-	local_irq_enable();
-}
-
 static void slb_cache_slbie_kernel(unsigned int index)
 {
 	unsigned long slbie_data = get_paca()->slb_cache[index];
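For reference, the preload cache that remains after this patch is a small per-thread ring whose oldest entry is what preload_age() drops. Below is a self-contained, illustrative C model of that eviction; the field and macro names mirror identifiers visible in the diff context above (slb_preload_nr, slb_preload_tail, SLB_PRELOAD_NR), but the struct layout, the value 16, and the surrounding program are assumptions for demonstration, not the kernel's thread_info.

/* Illustrative model of the SLB preload ring, not kernel code. */
#include <stdio.h>

#define SLB_PRELOAD_NR 16	/* assumed size for this model */

struct ti_model {
	unsigned char slb_preload_nr;	/* number of cached entries */
	unsigned char slb_preload_tail;	/* index of the oldest entry */
	unsigned long slb_preload_esid[SLB_PRELOAD_NR];
};

/* Drop the oldest entry, as preload_age() (kept by this patch) does. */
static void preload_age_model(struct ti_model *ti)
{
	if (!ti->slb_preload_nr)
		return;
	ti->slb_preload_nr--;
	ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
}

int main(void)
{
	struct ti_model ti = { .slb_preload_nr = 3, .slb_preload_tail = 0,
			       .slb_preload_esid = { 0x10, 0x7fff, 0x20 } };

	preload_age_model(&ti);	/* evicts the oldest cached ESID */
	printf("entries left: %u, oldest index: %u\n",
	       (unsigned)ti.slb_preload_nr, (unsigned)ti.slb_preload_tail);
	return 0;
}

In the kernel this aging runs from switch_slb() roughly every 256 context switches (when the task's load_slb counter wraps), which is the eviction the timeline in the commit message relies on.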
