Skip to content

Conversation

@kunalspathak
Copy link
Contributor

@kunalspathak kunalspathak commented Oct 1, 2024

For linux/arm64, in #106052 , we added a piece of code to retrieve the TLS resolver to check if it is static or dynamic resolver and disable the TLS optimization if it was dynamic. In that, we disassemble the resolver address to check the instruction stream in order to determine if it is static or dynamic resolver. However, when SingleApps are compiled, we noticed that the TLS resolver address retrieval code was linked with different code sequence because of which it did not return the correct address of resolver that the caller was expecting. The caller simply tries to de-reference the value (it assumes it is a function pointer), but with invalid value, we would segfault. I added a check for the TLS resolver retrieval code itself to make sure that it is in correct format.

On my local machine, I was able to repro the original issue, but when I tried reproing it by pointing it to locally build singleAppHost, the resolver code did return an incorrect address, but it was not a totally invalid one and the code sequence was getting dereferenced and disassembled. So, I was not able to see the segfault, but the existing checks were able to find out that the resolver is incorrect and was not doing the optimization. With the fix, I did verify that we would bail out even before invoking the resolver code.

I did locally verify that JIT linux/arm64 scenario still triggers the TLS optimization.

I do not have much knowledge about the SingleAppHost test, and there is a test hole because of which we were not able to catch this issue earlier. @agocke / @elinor-fung - I will need your help in writing and enabling a test for it. We can use the existing environment variable DOTNET_DisableOptimizedThreadStaticAccess to make sure we hit the code path and validate the code.

The issue was reported by the customer in #108327.

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@kunalspathak
Copy link
Contributor Author

FYI @shushanhf - I did not touch loongarch code because seems the instruction stream is different, but you might want to do something similar.

@kunalspathak
Copy link
Contributor Author

I have also moved DisableOptimizedThreadStaticAccess very early in the method so customers can get themselves unblocked if they see TLS issues. Earlier, it was at the end of method and the crash was happening before the check for DisableOptimizedThreadStaticAccess.

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I validated locally and it appears to work.

@kunalspathak kunalspathak merged commit ed05fa9 into dotnet:main Oct 4, 2024
@kunalspathak kunalspathak deleted the tlsresolver branch October 4, 2024 23:42
@kunalspathak
Copy link
Contributor Author

/backport to release/9.0

@github-actions
Copy link
Contributor

github-actions bot commented Oct 4, 2024

Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/11188481884

sirntar pushed a commit to sirntar/runtime that referenced this pull request Oct 8, 2024
* linux/arm64: Verify TLS resolver code

* Move the flag at the top so customer can turn it off to get unblocked
@github-actions github-actions bot locked and limited conversation to collaborators Nov 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants