Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
28 changes: 27 additions & 1 deletion lld/ELF/Arch/RISCV.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -651,12 +651,38 @@ static void relaxHi20Lo12(const InputSection &sec, size_t i, uint64_t loc,
if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA()))
return;

// The symbol may be accessed in multiple pieces with different addends.
// If we are relaxing the HI20 relocation, we need to ensure that we only
// relax (and delete the instruction) if all possible LO12 relocations
// that depend on it will be relaxable. The compiler will only access multiple
// pieces of an object with low relocations on the memory op if the alignment
// allows it. Therefore it should suffice to check that the smaller of the
// alignment and size can be reached from GP.
uint32_t alignAdjust =
r.sym->getOutputSection() ? r.sym->getOutputSection()->addralign : 0;
alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alignAdjust = std::min<uint32_t>(alignAdjust, r.sym->getSize()); is not tested

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, I added a test now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If addend < st_size, it seems that this can be improved to min(addralign, st_size-addend)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, I think that is indeed the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is only true if we assume that the addend on the LO12 will be equal or greater than the addend on the HI20.
Ultimately, we need to ensure that the entire window of
min(st_size, st_align) bytes that contains the relocation is reachable from GP.

if (alignAdjust)
alignAdjust--;

switch (r.type) {
case R_RISCV_HI20:
case R_RISCV_HI20: {
uint64_t hiAddr = r.sym->getVA(r.addend);
// If the addend is zero, the LO12 relocations can only be accessing the
// range [base, base+alignAdjust] (where base == r.sym->getVA()).
if (r.addend == 0 && !isInt<12>(hiAddr + alignAdjust - gp->getVA()))
Copy link
Member

@MaskRay MaskRay Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R_RISCV_HI20 and R_RISCV_LO12_I/R_RISCV_LO12_S should have consistent decisions on whether to do relaxation. Placing the condition only at R_RISCV_HI20 could lead to inconsistent results. I think the condition should be moved closer to if (!isInt<12>(r.sym->getVA(r.addend) - gp->getVA())) above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here was that only this direction involves a functional problem. Relaxing the LO12 and not relaxing the HI20 leaves a redundant HI20, but that's about it. Whereas the other way is the crux of the problem. However, if you'd prefer that I implement the LO12 pessimistically as well, I am not against it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we retain lui/HI20 but relax addi/LO12_i, the output will be lui a0, ...; addi a0, gp, ... with a redundant lui. I think it's confusing to leave lui there and the addi change has no benefit anyway. Therefore, I prefer that we disable the relaxation completely, which aligns with GNU ld.

return;

// However, if the addend is non-zero, the LO12 relocations may be accessing
// the range [HI-alignAdjust-1, HI+alignAdjust].
if (r.addend != 0 && (!isInt<12>(hiAddr - alignAdjust - 1 - gp->getVA()) ||
Copy link
Member

@MaskRay MaskRay Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addend ==0 and !=0 distinction appears to make the condition less strict and utilize pointer provenance: for char p[1], q[1];, we can't use q-1 to get p even if q is placed after p.

At the object file format level, I wonder whether we should optimize the addend==0 case (GNU ld doesn't).
I understand that optimizing the addend==0 case allows us to relax the first lui in riscv-relax-hi20-lo12.s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could check the negative range even in the addend==0 case with the assumption that the compiler won't allow for out-of-bounds access as you describe here. This is just another one of those assumptions that the linker makes about what the compiler will or will not do.
There are of course other ways we can accomplish this:

  • Disable relaxation if we see a negative addend on a LO12 relaxation
  • Keep track of symbols that are accessed with such negative addends and disable relaxation for relocations referencing those symbols
  • Emit a fatal error if such a negative addend is used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to hear from others about the reliability.

Ultimately I hope that we can derive some rules that are simple (ideally no addend==0/addend!=0 distinction), even if we give up some legitimate opportunities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if I just change it to reject relaxation for negative addends? That way, we always look only forward? Would that be too pessimistic?

!isInt<12>(hiAddr + alignAdjust - gp->getVA())))
return;

// Remove lui rd, %hi20(x).
sec.relaxAux->relocTypes[i] = R_RISCV_RELAX;
remove = 4;
break;
}
case R_RISCV_LO12_I:
sec.relaxAux->relocTypes[i] = INTERNAL_R_RISCV_GPREL_I;
break;
Expand Down
147 changes: 147 additions & 0 deletions lld/test/ELF/riscv-relax-gp-edges.s
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# REQUIRES: riscv
# RUN: rm -rf %t && split-file %s %t && cd %t

# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax a.s -o rv32a.o
# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax a.s -o rv64a.o
# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax b.s -o rv32b.o
# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax b.s -o rv64b.o
# RUN: llvm-mc -filetype=obj -triple=riscv32-unknown-elf -mattr=+relax c.s -o rv32c.o
# RUN: llvm-mc -filetype=obj -triple=riscv64-unknown-elf -mattr=+relax c.s -o rv64c.o

# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32a.o lds.a -o rv32a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a.lds b.lds c.lds instead of lds.a lds.b lds.c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64a.o lds.a -o rv64a
# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32b.o lds.b -o rv32b
# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64b.o lds.b -o rv64b
# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv32c.o lds.c -o rv32c
# RUN: ld.lld --relax-gp --undefined=__global_pointer$ rv64c.o lds.c -o rv64c
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32a | FileCheck %s --check-prefix=CHECK-ALIGN
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64a | FileCheck %s --check-prefix=CHECK-ALIGN
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32b | FileCheck %s --check-prefix=CHECK-HI-ADD
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64b | FileCheck %s --check-prefix=CHECK-HI-ADD
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv32c | FileCheck %s --check-prefix=CHECK-SIZE
# RUN: llvm-objdump -td -M no-aliases --no-show-raw-insn rv64c | FileCheck %s --check-prefix=CHECK-SIZE

# CHECK-ALIGN: 000017e0 l .data {{0+}}80 Var1
# CHECK-ALIGN: 00000ffc g .sdata {{0+}}00 __global_pointer$

# CHECK-ALIGN: <_start>:
# CHECK-ALIGN-NEXT: lui a1, 1
# CHECK-ALIGN-NEXT: lw a0, 2020(gp)
# CHECK-ALIGN-NEXT: lw a1, 2044(a1)

#--- a.s
# The relaxation on the lui is rejected because the alignment (which is smaller
# than the size) doesn't allow it.
.global _start
_start:
lui a1, %hi(Var1)
lw a0, %lo(Var1)(a1) # First part is reachable from gp
lw a1, %lo(Var1+28)(a1) # The second part is not reachable

.section .sdata,"aw"
.section .data,"aw"
.p2align 5
Var1:
.quad 0
.zero 120
.size Var1, 128

#--- lds.a
SECTIONS {
.text : { }
.sdata 0x07fc : { }
.data 0x17E0 : { }
}

# CHECK-HI-ADD: 00001000 l .data {{0+}}08 Var0
# CHECK-HI-ADD: 00001f80 l .data1 {{0+}}80 Var1
# CHECK-HI-ADD: 00001800 g .sdata {{0+}}00 __global_pointer$

# CHECK-HI-ADD: <_start>:
# CHECK-HI-ADD-NEXT: lui a1, 1
# CHECK-HI-ADD-NEXT: lw a0, -2048(gp)
# CHECK-HI-ADD-NEXT: lw a1, -2044(gp)
# CHECK-HI-ADD-NEXT: lui a1, 2
# CHECK-HI-ADD-NEXT: lw a0, 1920(gp)
# CHECK-HI-ADD-NEXT: lw a1, 2044(gp)

#--- b.s
# The relaxation on the two lui are rejected because the amount of data a LO12
# reloc is allowed to address below and above the respective HI20 goes past
# the amount reachable from GP.
.global _start
_start:
lui a1, %hi(Var0+4) # Cannot prove that %lo relocs will be reachable
lw a0, %lo(Var0)(a1) # Reachable from GP
lw a1, %lo(Var0+4)(a1) # Reachable from GP
lui a1, %hi(Var1+124) # Cannot prove that %lo relocs will be reachable
lw a0, %lo(Var1)(a1) # Reachable from GP
lw a1, %lo(Var1+124)(a1) # Reachable from GP

.section .sdata,"aw"
.section .data,"aw"
.p2align 3
Var0:
.quad 0
.size Var0, 8

.section .data1,"aw"
.p2align 7
Var1:
.quad 0
.zero 120
.size Var1, 128

#--- lds.b
SECTIONS {
.text : { }
.sdata 0x1000 : { }
.data 0x1000 : { }
.data1 0x1f80 : { }
}

# CHECK-SIZE: 00000080 l .data {{0+}}08 Var0
# CHECK-SIZE: 00001000 l .data1 {{0+}}80 Var1
# CHECK-SIZE: 00000815 g .sdata {{0+}}00 __global_pointer$

# CHECK-SIZE: <_start>:
# CHECK-SIZE-NOT: lui
# CHECK-SIZE-NEXT: lw a0, -1941(gp)
# CHECK-SIZE-NEXT: lw a1, -1937(gp)
# CHECK-SIZE-NEXT: lui a1, 1
# CHECK-SIZE-NEXT: lw a0, 2027(gp)
# CHECK-SIZE-NEXT: lw a1, 124(a1)

#--- c.s
# The relaxation on the second lui is rejected because the size (and alignment)
# allow for a LO12 that cannot reach its target from GP.
.global _start
_start:
lui a1, %hi(Var0)
lw a0, %lo(Var0)(a1)
lw a1, %lo(Var0+4)(a1)
lui a1, %hi(Var1)
lw a0, %lo(Var1)(a1) # First part is reachable from gp
lw a1, %lo(Var1+124)(a1) # The second part is not reachable

.section .sdata,"aw"
.section .data,"aw"
.p2align 3
Var0:
.quad 0
.size Var0, 8

.section .data1,"aw"
.p2align 7
Var1:
.quad 0
.zero 120
.size Var1, 128

#--- lds.c
SECTIONS {
.text : { }
.sdata 0x0015 : { }
.data 0x0080 : { }
.data1 0x1000 : { }
}