-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[llvm-mca][AArch64] Refactor Neoverse tests to split out common inputs (NFC) #170324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm-mca][AArch64] Refactor Neoverse tests to split out common inputs (NFC) #170324
Conversation
…s (NFC) For many of these tests the inputs are very similiar with slight divergences, we should work to make common sources where we can to avoid subtle differences. Changes: - For basic tests there's a relatively large diff between V1 and all other cores because of 24f0901 (llvm#128892) which makes it more complete. I've been thru the entire diff and 99% of the time V1 makes more sense, except for a couple of small changes (might post a separate patch for). Therefore I decided it's best to take V1-basic-instructions.s as the common source. - Split out FEAT_RCPC_IMMO tests from basic since N1 doesnt have this feature. - Split out FEAT_MTE tests. V2/V3 also have this feature but were missing tests, so I've added them. The NEON and SVE tests are also quite substantial and could be made common, but there's also subtle differences that take time to go thru.
Asher8118
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the change makes sense and I agree that a common source for testing is a good direction.
I've been thru the entire diff and 99% of the time V1 makes more sense, except for a couple of small changes (might post a separate patch for).
One question, in the cases where we do have even slight divergence, what is the plan? Add those instructions specifically as a separate test to avoid duplication?
The NEON and SVE tests are also quite substantial and could be made common, but there's also subtle differences that take time to go thru
I think as long as exceptions are handled as well, that would still be an improvement to the amount of duplication we currently have in tests.
If the divergence is necessary then yes, but I imagine a lot of them arent. For the basic tests there were 3 types of divergence:
I think when a new core is added the tests are copied from the previous one, so the changes to v1 in #128892 for example were equally relevant to other Neoverse cores. When I went thru the diff I'd see an instruction like and we didnt 99% of the time.
Agreed 👍 |
Asher8118
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the divergence is necessary then yes, but I imagine a lot of them arent. For the basic tests there were 3 types of divergence:
For add(s)/sub(s)/add/cmn/cmp v1 would test shift <= 4 whereas v2/v3 didnt, e.g. add x17, x29, x20, lsl #3 (v1) vs add x17, x29, x20, lsl #63 (v2/v3). Looking at the SWOGs shift <= 4 is a special-case so I think V1 is better.
V2/V3 had quite a lot of duplicate instructions
Better opcode coverage in V1 than other cores.
Thank you for your answer. LGTM
rj-jesus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this - I just have a couple of questions/suggestions. :)
llvm/test/tools/llvm-mca/AArch64/Neoverse/Inputs/rcpc-immo-instructions.s
Outdated
Show resolved
Hide resolved
- take 16-bit load/store register offset from v2/v3 (better coverage) - update rcpc-immo header comment
rj-jesus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, cheers!
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/21773 Here is the relevant piece of the build log for the reference |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/51/builds/28061 Here is the relevant piece of the build log for the reference |
…s (NFC) (llvm#170324) For many of these tests the inputs are very similiar with slight divergences, ideally we'd have common sources where we can to avoid subtle differences. Changes: - For basic tests there's a relatively large diff between V1 and all other cores because of 24f0901 (llvm#128892) which makes it more complete. I've been thru the entire diff and 99% of the time V1 makes more sense, except for a couple of small changes (might post a separate patch for). Therefore I decided it's best to take V1-basic-instructions.s as the common source. - Split out FEAT_RCPC_IMMO tests from basic since N1 doesnt have this feature. - Split out FEAT_MTE tests. V2/V3 also have this feature but were missing tests, so I've added them. - Take 16-bit load/store register offset from V2/V3 (better coverage). The NEON and SVE tests are also quite substantial and could be made common, but there's also subtle differences that take time to go thru.
Follow-on from llvm#170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores. Posting as a single PR, but to help with review I've staged the commits to hopefully make it clearer how this is done as it's a little confusing understand the differences. I can also post as individual patches if that would help. The approach is as follows: - Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those. - Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from N-cores, so combine those separately. - Most significantly, input for V1 differs from all other cores primarily because of 24f0901 (llvm#128892). - Split out features that are not supported across all cores. - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests. - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Bring Neoverse V2/V3/V3AE and N1/N2/N3 neon tests inline. Comparing N[1-3] against V[2-3] the only change the N cores have that V[2-3] dont is: < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5 --- > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5 Checked if this has difference performance characteristics: llvm-mca -mtriple=aarch64 -mcpu=neoverse-n1 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 5 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n2 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 6 6 1.50 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] llvm-mca -mtriple=aarch64 -mcpu=neoverse-n3 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[1], [x0] 4 2 1.00 * st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0] and imm of 1 matches 9, so took that. The rest of the diff is instructions in V[2-3] that arent in N cores, so we take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr. - Finally bring Neoverse V1 inline with V2/V3/V3AE/N1/N2/N3 - loads/stores are blended - duplicates with different spaces like 'shll v0.2d, v0.2s, llvm#32' are removed - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
cofibrant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had accidentally commented on the commit, not the PR, so reposting this comment. Sorry if you've already seen it!
| # CHECK-NEXT: 2 1 0.50 U ldapursb w7, [x8] | ||
| # CHECK-NEXT: 2 1 0.50 U ldapursb x29, [x7] | ||
| # CHECK-NEXT: 2 1 0.50 U ldapursh w17, [x19] | ||
| # CHECK-NEXT: 2 1 0.50 U ldapursh x3, [x3] | ||
| # CHECK-NEXT: 2 1 0.50 U ldapursw x3, [x18] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment on why these instructions are marked as having side effects but not as 'may load'? Aren't these loads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes they're loads, looks like the mayLoad/mayStore properties are not set
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this sounds like a bug, then. Is this something you'd be able to take a look at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but just to clarify it's nothing to do with this PR, this was missed when the instructions were added. You can post a PR if you like since you spotted it, but I'm happy to fix it if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, yes, sorry, I didn't mean to suggest this PR was the root cause. I don't think I have scope to fix this and was just wondering if you did as you're working on Neoverse anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
posted a fix #171142, thanks for spotting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this!
…s (NFC) (llvm#170324) For many of these tests the inputs are very similiar with slight divergences, ideally we'd have common sources where we can to avoid subtle differences. Changes: - For basic tests there's a relatively large diff between V1 and all other cores because of 24f0901 (llvm#128892) which makes it more complete. I've been thru the entire diff and 99% of the time V1 makes more sense, except for a couple of small changes (might post a separate patch for). Therefore I decided it's best to take V1-basic-instructions.s as the common source. - Split out FEAT_RCPC_IMMO tests from basic since N1 doesnt have this feature. - Split out FEAT_MTE tests. V2/V3 also have this feature but were missing tests, so I've added them. - Take 16-bit load/store register offset from V2/V3 (better coverage). The NEON and SVE tests are also quite substantial and could be made common, but there's also subtle differences that take time to go thru.
Follow-on from #170324 to also refactor the NEON tests to reuse the input assembly across all Neoverse cores. The approach is as follows: - Inputs for Neoverse N1/N2/N3 NEON tests are already identical, so first combine those. - Inputs for V2/V3/V3AE NEON tests are also already identical, but differ from N-cores, so combine those separately. - Most significantly, input for V1 differs from all other cores primarily because of 24f0901 (#128892). - Split out features that are not supported across all cores. - Split out FEAT_I8MM, FEAT_FHM, FEAT_FCMA. N1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Split out FEAT_BF16. V1 doesn't have this feature but all other Neoverse cores do. Also adds coverage for N1/N2/N3 since they were missing tests. - Split out FEAT_FRINTTS. V1/N1 don't have this feature but all other Neoverse cores do. Also adds coverage for N2/N3 since they were missing tests. - Bring Neoverse V2/V3/V3AE and N1/N2/N3 neon tests inline. Comparing N[1-3] against V[2-3] the only change the N cores have that V[2-3] dont is: ``` < st4 { v0.d, v1.d, v2.d, v3.d }[1], [x0], x5 --- > st4 { v0.b, v1.b, v2.b, v3.b }[9], [x0], x5 ``` So we take it for all cores. The rest of the diff is instructions in V[2-3] that arent in N cores, so we also take them. All Neoverse cores can optionally support the Cryptographic Extension. The related features (AES, ...) are enabled by default for V1/N1 but not the other cores, so need to be explicitly enabled via -mattr. - Finally bring Neoverse V1 inline with V2/V3/V3AE/N1/N2/N3 - loads/stores are blended - duplicates with different spaces like `shll v0.2d, v0.2s, #32` are removed - the rest of the diff is instructions in V1 that are not tested in the other cores, so we add them for the other cores
For many of these tests the inputs are very similiar with slight divergences, ideally we'd have common sources where we can to avoid subtle differences.
Changes:
The NEON and SVE tests are also quite substantial and could be made common, but there's also subtle differences that take time to go thru.