Skip to content

Conversation

@stephenkyle-ARM
Copy link
Contributor

Ensures that Aarch32 Arm builds with an Armv8 compiler do not set
BROTLI_64_BITS.

This scenario is possible with ChromeOS builds, as they may use a
toolchain with the target armv7-cros-gnueabi, but with -march=armv8.
This will set __ARM_ARCH to 8 (defining BROTLI_TARGET_ARMV8), but will
also set __ARM_32BIT_STATE and not __ARM_64BIT_STATE. Without this,
illegal 64-bit non-word-aligned reads (LDRD) may be emitted.

Also fix unaligned 64-bit reads on AArch32 - STRD was still possible to
emit.

@eustas
Copy link
Collaborator

eustas commented Jul 17, 2018

Hello. Thank you for the patch.

We try to keep different concepts separate, so it would be better to split BROTLI_TARGET_ARMV8 into BROTLI_TARGET_ARMV8_32 and BROTLI_TARGET_ARMV8_64 instead of making complex ifdef condition.

@stephenkyle-ARM
Copy link
Contributor Author

Thanks for the feedback, here's an updated version. I figured it would be best to avoid people mistakenly choosing the wrong guard for ARMV8* defines in the future by forcing them to choose _32, _64, or _ANY. Let me know what you think?

@stephenkyle-ARM stephenkyle-ARM force-pushed the fix-armv7-builds branch 2 times, most recently from 02c8def to c8b6083 Compare July 19, 2018 08:33
@eustas
Copy link
Collaborator

eustas commented Jul 20, 2018

Sorry for the late response.
Thanks, now patch is nearly ideal! One last thing to fix is to guard __attribute__((aligned(x))).
Unfortunately, can not find exact compiler versions that support it.
With clang it is easy, it has that nice has_attribute...
For GCC we can set 2.7, according to https://ohse.de/uwe/articles/gcc-attributes.html
For other compilers... wait until someone complains =)

Ensures that Aarch32 Arm builds with an Armv8 compiler do not set
BROTLI_64_BITS.

This scenario is possible with ChromeOS builds, as they may use a
toolchain with the target armv7-cros-gnueabi, but with -march=armv8.
This will set __ARM_ARCH to 8 (defining BROTLI_TARGET_ARMV8), but will
also set __ARM_32BIT_STATE and not __ARM_64BIT_STATE. Without this,
illegal 64-bit non-word-aligned reads (LDRD) may be emitted.

Also fix unaligned 64-bit reads on AArch32 - STRD was still possible to
emit.
@stephenkyle-ARM
Copy link
Contributor Author

Added the guard, and an alternative solution if that guard fails.

@eustas
Copy link
Collaborator

eustas commented Jul 24, 2018

LGTM.
@jan-wassenberg, could you also take a look, please.

@jan-wassenberg
Copy link
Member

How's code generation for uint64_t x; memcpy(&x, ptr, 8);? I'd recommend that approach instead because volatile semantics differ between compilers and type-punning might run into trouble with strict aliasing enabled.

@stephenkyle-ARM
Copy link
Contributor Author

stephenkyle-ARM commented Jul 24, 2018

Okay, here's a comparison of the code generation that I'm seeing with the versions of gcc/clang available in ChromeOS.

This is using the gcc compiler:

armv7a-cros-linux-gnueabi-gcc.real (4.9.2_cos_gg_4.9.2-r191-71959ce8f47f676a26bb21da7117101d9d73867e_4.9.2-r191) 4.9.x 20150123 (prerelease)

and the clang compiler:

Chromium OS 7.0_pre328903_p20180425-r5 clang version 7.0.0 (/var/cache/chromeos-cache/distfiles/host/egit-src/clang.git e7408fe366bb18923fa360b069b4e4566203f34f) (/var/cache/chromeos-cache/distfiles/host/egit-src/llvm.git 95561668f063fbcb8195bde05ecede721ece4ba4) (based on LLVM 7.0.0svn)
Target: armv7a-cros-linux-gnueabi

This produces the following assembly:

=== GCC NO NEON
=== using: -march=armv8-a+crc -mtune=cortex-a57.cortex-a53 -O3

00000000 <BrotliUnalignedWrite64_ORIGINAL>:
   0:	e9c0 2300 	strd	r2, r3, [r0]
   4:	4770      	bx	lr
   6:	bf00      	nop

00000008 <BrotliUnalignedRead64_ORIGINAL>:
   8:	e9d0 0300 	ldrd	r0, r3, [r0]
   c:	2200      	movs	r2, #0
   e:	ea42 0200 	orr.w	r2, r2, r0
  12:	4610      	mov	r0, r2
  14:	4619      	mov	r1, r3
  16:	4770      	bx	lr

00000018 <BrotliUnalignedWrite64_USE_VOLATILE>:
  18:	6002      	str	r2, [r0, #0]
  1a:	6043      	str	r3, [r0, #4]
  1c:	4770      	bx	lr
  1e:	bf00      	nop

00000020 <BrotliUnalignedRead64_USE_VOLATILE>:
  20:	6801      	ldr	r1, [r0, #0]
  22:	2200      	movs	r2, #0
  24:	6840      	ldr	r0, [r0, #4]
  26:	4603      	mov	r3, r0
  28:	ea42 0201 	orr.w	r2, r2, r1
  2c:	4610      	mov	r0, r2
  2e:	4619      	mov	r1, r3
  30:	4770      	bx	lr
  32:	bf00      	nop

00000034 <BrotliUnalignedWrite64_USE_MEMCPY>:
  34:	b082      	sub	sp, #8
  36:	6002      	str	r2, [r0, #0]
  38:	6043      	str	r3, [r0, #4]
  3a:	b002      	add	sp, #8
  3c:	4770      	bx	lr
  3e:	bf00      	nop

00000040 <BrotliUnalignedRead64_USE_MEMCPY>:
  40:	4b0a      	ldr	r3, [pc, #40]	; (6c <BrotliUnalignedRead64_USE_MEMCPY+0x2c>)
  42:	f84d ed04 	str.w	lr, [sp, #-4]!
  46:	447b      	add	r3, pc
  48:	681b      	ldr	r3, [r3, #0]
  4a:	b085      	sub	sp, #20
  4c:	6841      	ldr	r1, [r0, #4]
  4e:	6800      	ldr	r0, [r0, #0]
  50:	681a      	ldr	r2, [r3, #0]
  52:	9203      	str	r2, [sp, #12]
  54:	9a03      	ldr	r2, [sp, #12]
  56:	681b      	ldr	r3, [r3, #0]
  58:	e9cd 0100 	strd	r0, r1, [sp]
  5c:	429a      	cmp	r2, r3
  5e:	d102      	bne.n	66 <BrotliUnalignedRead64_USE_MEMCPY+0x26>
  60:	b005      	add	sp, #20
  62:	f85d fb04 	ldr.w	pc, [sp], #4
  66:	f7ff fffe 	bl	0 <__stack_chk_fail>
  6a:	bf00      	nop
  6c:	00000022 	.word	0x00000022

00000070 <BrotliUnalignedWrite64_USE_ALIGNED>:
  70:	6002      	str	r2, [r0, #0]
  72:	6043      	str	r3, [r0, #4]
  74:	4770      	bx	lr
  76:	bf00      	nop

00000078 <BrotliUnalignedRead64_USE_ALIGNED>:
  78:	4603      	mov	r3, r0
  7a:	6800      	ldr	r0, [r0, #0]
  7c:	6859      	ldr	r1, [r3, #4]
  7e:	4770      	bx	lr

=== GCC NEON
=== using: -march=armv8-a+crc -mtune=cortex-a57.cortex-a53 -O3 -mfpu=neon

00000000 <BrotliUnalignedWrite64_ORIGINAL>:
   0:	e9c0 2300 	strd	r2, r3, [r0]
   4:	4770      	bx	lr
   6:	bf00      	nop

00000008 <BrotliUnalignedRead64_ORIGINAL>:
   8:	e9d0 1300 	ldrd	r1, r3, [r0]
   c:	2200      	movs	r2, #0
   e:	ea42 0201 	orr.w	r2, r2, r1
  12:	4610      	mov	r0, r2
  14:	4619      	mov	r1, r3
  16:	4770      	bx	lr

00000018 <BrotliUnalignedWrite64_USE_VOLATILE>:
  18:	6002      	str	r2, [r0, #0]
  1a:	6043      	str	r3, [r0, #4]
  1c:	4770      	bx	lr
  1e:	bf00      	nop

00000020 <BrotliUnalignedRead64_USE_VOLATILE>:
  20:	6801      	ldr	r1, [r0, #0]
  22:	2200      	movs	r2, #0
  24:	6840      	ldr	r0, [r0, #4]
  26:	4603      	mov	r3, r0
  28:	ea42 0201 	orr.w	r2, r2, r1
  2c:	4610      	mov	r0, r2
  2e:	4619      	mov	r1, r3
  30:	4770      	bx	lr
  32:	bf00      	nop

00000034 <BrotliUnalignedWrite64_USE_MEMCPY>:
  34:	b082      	sub	sp, #8
  36:	6002      	str	r2, [r0, #0]
  38:	6043      	str	r3, [r0, #4]
  3a:	b002      	add	sp, #8
  3c:	4770      	bx	lr
  3e:	bf00      	nop

00000040 <BrotliUnalignedRead64_USE_MEMCPY>:
  40:	4b0a      	ldr	r3, [pc, #40]	; (6c <BrotliUnalignedRead64_USE_MEMCPY+0x2c>)
  42:	f84d ed04 	str.w	lr, [sp, #-4]!
  46:	447b      	add	r3, pc
  48:	681b      	ldr	r3, [r3, #0]
  4a:	b085      	sub	sp, #20
  4c:	6841      	ldr	r1, [r0, #4]
  4e:	6800      	ldr	r0, [r0, #0]
  50:	681a      	ldr	r2, [r3, #0]
  52:	9203      	str	r2, [sp, #12]
  54:	9a03      	ldr	r2, [sp, #12]
  56:	681b      	ldr	r3, [r3, #0]
  58:	e9cd 0100 	strd	r0, r1, [sp]
  5c:	429a      	cmp	r2, r3
  5e:	d102      	bne.n	66 <BrotliUnalignedRead64_USE_MEMCPY+0x26>
  60:	b005      	add	sp, #20
  62:	f85d fb04 	ldr.w	pc, [sp], #4
  66:	f7ff fffe 	bl	0 <__stack_chk_fail>
  6a:	bf00      	nop
  6c:	00000022 	.word	0x00000022

00000070 <BrotliUnalignedWrite64_USE_ALIGNED>:
  70:	ec43 2b30 	vmov	d16, r2, r3
  74:	f940 07cf 	vst1.64	{d16}, [r0]
  78:	4770      	bx	lr
  7a:	bf00      	nop

0000007c <BrotliUnalignedRead64_USE_ALIGNED>:
  7c:	f960 07cf 	vld1.64	{d16}, [r0]
  80:	ec51 0b30 	vmov	r0, r1, d16
  84:	4770      	bx	lr
  86:	bf00      	nop

=== CLANG 
=== using: -march=armv8-a+crc -mtune=cortex-a57.cortex-a53 -O3 
=== (using -mfpu=neon seems to have no effect)

00000000 <BrotliUnalignedWrite64_ORIGINAL>:
   0:	e9c0 2300 	strd	r2, r3, [r0]
   4:	4770      	bx	lr

00000006 <BrotliUnalignedRead64_ORIGINAL>:
   6:	e9d0 2100 	ldrd	r2, r1, [r0]
   a:	4610      	mov	r0, r2
   c:	4770      	bx	lr

0000000e <BrotliUnalignedWrite64_USE_VOLATILE>:
   e:	6002      	str	r2, [r0, #0]
  10:	6043      	str	r3, [r0, #4]
  12:	4770      	bx	lr

00000014 <BrotliUnalignedRead64_USE_VOLATILE>:
  14:	6802      	ldr	r2, [r0, #0]
  16:	6841      	ldr	r1, [r0, #4]
  18:	4610      	mov	r0, r2
  1a:	4770      	bx	lr

0000001c <BrotliUnalignedWrite64_USE_MEMCPY>:
  1c:	b580      	push	{r7, lr}
  1e:	466f      	mov	r7, sp
  20:	b084      	sub	sp, #16
  22:	4909      	ldr	r1, [pc, #36]	; (48 <BrotliUnalignedWrite64_USE_MEMCPY+0x2c>)
  24:	9301      	str	r3, [sp, #4]
  26:	4479      	add	r1, pc
  28:	9200      	str	r2, [sp, #0]
  2a:	eddd 0b00 	vldr	d16, [sp]
  2e:	6809      	ldr	r1, [r1, #0]
  30:	680a      	ldr	r2, [r1, #0]
  32:	9203      	str	r2, [sp, #12]
  34:	f940 070f 	vst1.8	{d16}, [r0]
  38:	9803      	ldr	r0, [sp, #12]
  3a:	6809      	ldr	r1, [r1, #0]
  3c:	1a08      	subs	r0, r1, r0
  3e:	d101      	bne.n	44 <BrotliUnalignedWrite64_USE_MEMCPY+0x28>
  40:	b004      	add	sp, #16
  42:	bd80      	pop	{r7, pc}
  44:	f7ff fffe 	bl	0 <__stack_chk_fail>
  48:	0000001e 	.word	0x0000001e

0000004c <BrotliUnalignedRead64_USE_MEMCPY>:
  4c:	6802      	ldr	r2, [r0, #0]
  4e:	6841      	ldr	r1, [r0, #4]
  50:	4610      	mov	r0, r2
  52:	4770      	bx	lr

00000054 <BrotliUnalignedWrite64_USE_ALIGNED>:
  54:	6043      	str	r3, [r0, #4]
  56:	6002      	str	r2, [r0, #0]
  58:	4770      	bx	lr

0000005a <BrotliUnalignedRead64_USE_ALIGNED>:
  5a:	6802      	ldr	r2, [r0, #0]
  5c:	6841      	ldr	r1, [r0, #4]
  5e:	4610      	mov	r0, r2
  60:	4770      	bx	lr

You can play around with the code I was using at https://godbolt.org/g/pE7Q9V

So I suggest we go with attribute(aligned) when available, and then memcpy in the rare case it's not?

Let me know if you agree, and then I'll push a new version. Thanks!

@jan-wassenberg
Copy link
Member

Thanks for checking the output! Looks like -march=armv8-a is necessary to elide the memcpy call. Your suggestion of attr+memcpy fallback sounds good to me.

@eustas eustas merged commit 6d027d1 into google:master Jul 24, 2018
@eustas
Copy link
Collaborator

eustas commented Jul 24, 2018

Thanks again!

eustas added a commit that referenced this pull request Jul 24, 2018
eustas added a commit that referenced this pull request Jul 24, 2018
eustas added a commit that referenced this pull request Jul 24, 2018
@eustas
Copy link
Collaborator

eustas commented Jul 24, 2018

Sorry, landed too soon. Yes, lets proceed with attribute(aligned) + memcpy fallback.

juj pushed a commit to Unity-Technologies/brotli that referenced this pull request May 13, 2023
Ensures that Aarch32 Arm builds with an Armv8 compiler do not set
BROTLI_64_BITS.

This scenario is possible with ChromeOS builds, as they may use a
toolchain with the target armv7-cros-gnueabi, but with -march=armv8.
This will set __ARM_ARCH to 8 (defining BROTLI_TARGET_ARMV8), but will
also set __ARM_32BIT_STATE and not __ARM_64BIT_STATE. Without this,
illegal 64-bit non-word-aligned reads (LDRD) may be emitted.

Also fix unaligned 64-bit reads on AArch32 - STRD was still possible to
emit.
juj pushed a commit to Unity-Technologies/brotli that referenced this pull request May 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants