Skip to content

Core file path set to tmpdirname#2544

Merged
jgmelber merged 7 commits intomainfrom
ypapadop-amd/core-file-location-fix
Oct 8, 2025
Merged

Core file path set to tmpdirname#2544
jgmelber merged 7 commits intomainfrom
ypapadop-amd/core-file-location-fix

Conversation

@ypapadop-amd
Copy link
Collaborator

This PR fixes the path where core files should go. Before that, tmpdir was ignored for those.

Copy link
Collaborator

@fifield fifield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might break code using the libxaiev2 backend (aie-translate --aie-generate-xaie) because it generates C++ to load the elf. @andrej is that still an issue after #2532 ?

@fifield fifield requested a review from andrej August 28, 2025 16:21
@ypapadop-amd
Copy link
Collaborator Author

This might break code using the libxaiev2 backend (aie-translate --aie-generate-xaie) because it generates C++ to load the elf. @andrej is that still an issue after #2532 ?

Happy to try an example. Do we have one?

@andrej
Copy link
Collaborator

andrej commented Aug 28, 2025

I think #2532 should probably fix this. I have tried to remove the reliance on any sort of path structure for where the core files are located, instead using the "elf_file" attribute in MLIR. #2532 passes tests, so if there is a test that tests this aspect of the XAIEV2 flow, then we should be robust to using any path.

@ypapadop-amd
Copy link
Collaborator Author

I think #2532 should probably fix this. I have tried to remove the reliance on any sort of path structure for where the core files are located, instead using the "elf_file" attribute in MLIR. #2532 passes tests, so if there is a test that tests this aspect of the XAIEV2 flow, then we should be robust to using any path.

Are core files still generated? Since I call compile_mlir_module() directly in my own build scripts, the core files are generated to the current directory, which is not expected.

@andrej
Copy link
Collaborator

andrej commented Aug 28, 2025

Yes, the files need to be generated since they contain the core's program memory (as per my understanding). What I meant by "fixed this" was meant in reply to Jeff's question, i.e. the ELF files could now be located elsewhere and the XAIEV2 lowering would still find them. All that's required is that the "elf_file=" attribute on the core is set.

@ypapadop-amd
Copy link
Collaborator Author

Yes, the files need to be generated since they contain the core's program memory (as per my understanding). What I meant by "fixed this" was meant in reply to Jeff's question, i.e. the ELF files could now be located elsewhere and the XAIEV2 lowering would still find them. All that's required is that the "elf_file=" attribute on the core is set.

That makes sense, because I read it the other way :)

I'm happy to wait until #2532 is merged, unless we really want core files to be generated in the cwd.

@ypapadop-amd ypapadop-amd force-pushed the ypapadop-amd/core-file-location-fix branch 2 times, most recently from 4fca133 to 99bccab Compare September 8, 2025 15:46
@ypapadop-amd ypapadop-amd force-pushed the ypapadop-amd/core-file-location-fix branch 2 times, most recently from b2b3010 to 19e0549 Compare September 16, 2025 15:28
@ypapadop-amd ypapadop-amd force-pushed the ypapadop-amd/core-file-location-fix branch from 19e0549 to 60278f4 Compare September 25, 2025 13:49
@ypapadop-amd ypapadop-amd force-pushed the ypapadop-amd/core-file-location-fix branch from 60278f4 to 711ebaf Compare September 26, 2025 17:59
@ypapadop-amd ypapadop-amd force-pushed the ypapadop-amd/core-file-location-fix branch from 711ebaf to 2b6ac7c Compare September 30, 2025 20:28
@ypapadop-amd
Copy link
Collaborator Author

@andrej @fifield would this be good to go?

@andrej
Copy link
Collaborator

andrej commented Sep 30, 2025

I'd merge main and rerun the CI. That might fix the failing tests by fixing the ELF file paths. Once it's all green, I see no reason not to merge this.

@andrej
Copy link
Collaborator

andrej commented Sep 30, 2025

Actually I see you already did merge main. I'd have a look at the generated MLIR and what the value of the elf file path is, and if those files actually exist.

@mawad-amd
Copy link
Collaborator

Is this PR also needed to make sure concurrent builds are correct?

@andrej
Copy link
Collaborator

andrej commented Oct 1, 2025

No, this is unrelated to that.

@mawad-amd
Copy link
Collaborator

How so? It looks like I am seeing the .elf file inside my working directory:

(ironenv) ➜  mlir-aie git:(muhaawad/process-safe-jit) ✗ ls -alth | grep .elf
-rwxrwxr-x  1 muhaawad muhaawad 2.5K Sep 30 17:37 main_core_0_2.elf
(ironenv) ➜  mlir-aie git:(muhaawad/process-safe-jit) ✗ rm *.elf
(ironenv) ➜  mlir-aie git:(muhaawad/process-safe-jit) ✗ pytest test/python/jit_compilation.py -sv
============================================== test session starts ===============================================
platform linux -- Python 3.12.3, pytest-8.4.2, pluggy-1.6.0 -- /home/muhaawad/git/amd/iron/mlir-aie/ironenv/bin/python3
cachedir: .pytest_cache
rootdir: /home/muhaawad/git/amd/iron/mlir-aie
plugins: anyio-4.11.0
collected 3 items                                                                                                

test/python/jit_compilation.py::test_multiple_jit_compilations[int32-16] PASSED
test/python/jit_compilation.py::test_multiple_jit_compilations[int32-64] PASSED
test/python/jit_compilation.py::test_parallel_compilation_subprocess PASSED

=============================================== 3 passed in 1.75s ================================================
(ironenv) ➜  mlir-aie git:(muhaawad/process-safe-jit) ✗ ls -alth | grep .elf
-rwxrwxr-x  1 muhaawad muhaawad 2.5K Sep 30 17:37 main_core_0_2.elf

@andrej
Copy link
Collaborator

andrej commented Oct 1, 2025

Yes, and each test runs in its own working directory. The other tests were failing because they put things in the home directory, the same home directory for all tests. That's at least my understanding. I think this was just a cleanup effort from Yiannis to make it possible to put these files inside the tmpdir created by aiecc.py.

@mawad-amd
Copy link
Collaborator

mawad-amd commented Oct 1, 2025

True but multiple tests will run inside the same working directory. E.g., here:

# RUN: %run_on_npu1% %pytest %s
# RUN: %run_on_npu2% %pytest %s

@ypapadop-amd
Copy link
Collaborator Author

ypapadop-amd commented Oct 1, 2025

I was getting failures because the core files end up in the current working directory under parallel builds. I am manipulating the current working dir at my level to avoid that.

@mawad-amd
Copy link
Collaborator

mawad-amd commented Oct 1, 2025

Yeah, I can see the file also getting overwritten since it has a very standard name it seems.

@andrej
Copy link
Collaborator

andrej commented Oct 1, 2025

True but multiple tests will run inside the same working directory. E.g., here:

# RUN: %run_on_npu1% %pytest %s
# RUN: %run_on_npu2% %pytest %s

Shouldn't only one of those two lines run, since a machine will be either npu1 or npu2 but not both? Or is the working directory an NFS mount across npu1 and npu2 machines?

If there are tests that run in the same working directory, I agree that will also cause issues.

Edit: Just realized the linked file contains multiple tests that pytest runs.

@ypapadop-amd
Copy link
Collaborator Author

I'm hitting some core file related failures on IRONCLAD too which does parallel builds.

@github-actions
Copy link
Contributor

github-actions bot commented Oct 8, 2025

Coverage Report

Created: 2025-10-08 15:05

Click here for information about interpreting this report.

FilenameFunction CoverageLine CoverageRegion CoverageBranch Coverage
home/runner/work/mlir-aie/mlir-aie/lib/Targets/AIERT.cpp 75.00% 61.99% 51.04% 27.63%
Totals 75.00% 61.99% 51.04% 27.63%
Generated by llvm-cov -- llvm version 18.1.3

@jgmelber jgmelber added this pull request to the merge queue Oct 8, 2025
@ypapadop-amd
Copy link
Collaborator Author

@jgmelber it seems it's working without any workarounds to change the cwd in user code.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 8, 2025
@jgmelber jgmelber merged commit fc54f91 into main Oct 8, 2025
54 of 57 checks passed
@jgmelber jgmelber deleted the ypapadop-amd/core-file-location-fix branch October 8, 2025 16:45
fifield pushed a commit to fifield/mlir-aie that referenced this pull request Nov 12, 2025
Co-authored-by: Joseph Melber <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants