hmelder (Contributor) commented Nov 21, 2025

This pull request adds initial support for compiling Objective-C to WebAssembly. I tested my changes with libobjc2 and the swift-corelibs-blocksruntime.

There are two outstanding issues, which I cannot fix as deeper knowledge of the subsystems is required:

  1. Symbols marked as explicitly hidden in code generation are exported
  2. Clang crashes in SelectionDAG when compiling an Objective-C try/catch block with -fwasm-exceptions

First Issue

Emscripten processes the generated .wasm file in emscripten.py and checks that all exported symbols are valid JavaScript identifiers (tools/js_manipulation.py#L104). However, hidden symbols such as .objc_init are intentionally invalid C identifiers.
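
For illustration, the kind of check that rejects these names can be approximated as follows. This is a minimal ASCII-only sketch, not Emscripten's actual implementation (which lives in Python and may also handle reserved words and non-ASCII characters); the function name `isAsciiJsIdentifier` is made up for this example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: ASCII-only approximation of a JavaScript identifier
// check. Letters, '_' and '$' may start an identifier; digits may follow.
bool isAsciiJsIdentifier(const std::string &name) {
  if (name.empty())
    return false;
  auto isStart = [](unsigned char c) {
    return std::isalpha(c) != 0 || c == '_' || c == '$';
  };
  if (!isStart(static_cast<unsigned char>(name[0])))
    return false;
  for (std::string::size_type i = 1; i < name.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(name[i]);
    if (!isStart(c) && std::isdigit(c) == 0)
      return false;  // e.g. the '.' in ".objc_init"
  }
  return true;
}
```

Under this approximation a name such as `.objc_init` is rejected because of the dot, while a name that uses `$` in its place would be accepted.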

The core of the problem is that symbols with the WASM_SYMBOL_NO_STRIP attribute are exported when targeting Emscripten (https://reviews.llvm.org/D62542). This attribute is added to the symbol during relocation in WasmObjectWriter::recordRelocation. So we are accidentally exporting a lot of hidden symbols and not only ones generated by ObjC CG...

I'm currently hacking around this by not exporting no-strip symbols. This is the default behaviour for Wasm.

Second Issue

Here is a minimal example that triggers the crash.

#include <stdio.h>

int main(void) {
	int ret = 0;
	@try {
	}
	@catch (id a)
	{
		ret = 1;
		puts("abc");
	}

	return ret;
}

The following assertion is triggered:

clang: /home/vm/llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp:124: void llvm::WebAssemblyExceptionInfo::recalculate(MachineFunction &, MachineDominatorTree &, const MachineDominanceFrontier &): Assertion `EHInfo' failed.

Here is the crash report main-c3884.zip.

You can use emcc with a modified LLVM build by exporting EM_LLVM_ROOT before sourcing emsdk/emsdk_env.sh:

emcc -fobjc-runtime=gnustep-2.2 -fwasm-exceptions -c main.m

or just invoke clang directly:

/home/vm/llvm-build-wasm/bin/clang -target wasm32-unknown-emscripten -mllvm -combiner-global-alias-analysis=false -mllvm -wasm-enable-sjlj -mllvm -wasm-use-legacy-eh=false -mllvm -disable-lsr --sysroot=/home/vm/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -fobjc-runtime=gnustep-2.2 -fwasm-exceptions -c main.m

Building libobjc2 and the BlocksRuntime

Building the BlocksRuntime

cmake -DCMAKE_TOOLCHAIN_FILE=$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake   -DCMAKE_INSTALL_PREFIX=/home/vm/demo-install -DCMAKE_BUILD_TYPE=Debug -B build -G Ninja

Building libobjc2

cmake -DCMAKE_TOOLCHAIN_FILE=$EMSDK/upstream/emscripten/cmake/Modules/Platform/Emscripten.cmake   -DCMAKE_INSTALL_PREFIX=/home/vm/demo-install -DBlocksRuntime_LIBRARIES=/home/vm/demo-install/lib/libBlocksRuntime.a -DBlocksRuntime_INCLUDE_DIR=/home/vm/demo-install/include/BlocksRuntime -DEMBEDDED_BLOCKS_RUNTIME=OFF -DTESTS=OFF  -B build  -DCMAKE_BUILD_TYPE=Debug  -G Ninja

llvmbot added the backend:WebAssembly, clang:driver, clang:codegen, and llvm:mc labels Nov 21, 2025
llvmbot (Member) commented Nov 21, 2025

@llvm/pr-subscribers-clang
@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-backend-webassembly

Author: Hugo Melder (hmelder)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/169043.diff

3 Files Affected:

  • (modified) clang/lib/CodeGen/CGObjCGNU.cpp (+10-4)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+2-1)
  • (modified) llvm/lib/MC/WasmObjectWriter.cpp (-3)
diff --git a/clang/lib/CodeGen/CGObjCGNU.cpp b/clang/lib/CodeGen/CGObjCGNU.cpp
index 06643d4bdc211..3b9f9f306829d 100644
--- a/clang/lib/CodeGen/CGObjCGNU.cpp
+++ b/clang/lib/CodeGen/CGObjCGNU.cpp
@@ -179,8 +179,15 @@ class CGObjCGNU : public CGObjCRuntime {
       (R.getVersion() >= VersionTuple(major, minor));
   }
 
-  std::string ManglePublicSymbol(StringRef Name) {
-    return (StringRef(CGM.getTriple().isOSBinFormatCOFF() ? "$_" : "._") + Name).str();
+  const std::string ManglePublicSymbol(StringRef Name) {
+    auto triple = CGM.getTriple();
+
+    // Exported symbols in Emscripten must be a valid Javascript identifier.
+    if (triple.isOSBinFormatCOFF() || triple.isOSBinFormatWasm()) {
+      return (StringRef("$_") + Name).str();
+    } else {
+      return (StringRef("._") + Name).str();
+    }
   }
 
   std::string SymbolForProtocol(Twine Name) {
@@ -4106,8 +4113,7 @@ llvm::Function *CGObjCGNU::ModuleInitFunction() {
   if (!ClassAliases.empty()) {
     llvm::Type *ArgTypes[2] = {PtrTy, PtrToInt8Ty};
     llvm::FunctionType *RegisterAliasTy =
-      llvm::FunctionType::get(Builder.getVoidTy(),
-                              ArgTypes, false);
+        llvm::FunctionType::get(BoolTy, ArgTypes, false);
     llvm::Function *RegisterAlias = llvm::Function::Create(
       RegisterAliasTy,
       llvm::GlobalValue::ExternalWeakLinkage, "class_registerAlias_np",
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 30d3e5293a31b..6cbec5e17ae1a 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -8001,7 +8001,8 @@ ObjCRuntime Clang::AddObjCRuntimeArgs(const ArgList &args,
     if ((runtime.getKind() == ObjCRuntime::GNUstep) &&
         (runtime.getVersion() >= VersionTuple(2, 0)))
       if (!getToolChain().getTriple().isOSBinFormatELF() &&
-          !getToolChain().getTriple().isOSBinFormatCOFF()) {
+          !getToolChain().getTriple().isOSBinFormatCOFF() &&
+          !getToolChain().getTriple().isOSBinFormatWasm()) {
         getToolChain().getDriver().Diag(
             diag::err_drv_gnustep_objc_runtime_incompatible_binary)
           << runtime.getVersion().getMajor();
diff --git a/llvm/lib/MC/WasmObjectWriter.cpp b/llvm/lib/MC/WasmObjectWriter.cpp
index 15590b31fd07f..d882146e21b8a 100644
--- a/llvm/lib/MC/WasmObjectWriter.cpp
+++ b/llvm/lib/MC/WasmObjectWriter.cpp
@@ -1794,9 +1794,6 @@ uint64_t WasmObjectWriter::writeOneObject(MCAssembler &Asm,
       Flags |= wasm::WASM_SYMBOL_UNDEFINED;
     if (WS.isNoStrip()) {
       Flags |= wasm::WASM_SYMBOL_NO_STRIP;
-      if (isEmscripten()) {
-        Flags |= wasm::WASM_SYMBOL_EXPORTED;
-      }
     }
     if (WS.hasImportName())
       Flags |= wasm::WASM_SYMBOL_EXPLICIT_NAME;

llvmbot (Member) commented Nov 21, 2025

@llvm/pr-subscribers-llvm-mc

Author: Hugo Melder (hmelder)

github-actions bot commented Nov 21, 2025

🐧 Linux x64 Test Results

  • 111606 tests passed
  • 4467 tests skipped

hmelder (Contributor, Author) commented Nov 28, 2025

@sunfishcode, I see that you are the original author of https://reviews.llvm.org/D62542. As @dschuff said in the review back then:

I'm hoping we can make that export behavior nicer soon; I find the attribute(used) -> export behavior a bit odd too. Once we drop fastcomp it will be easier to redefine EMSCRIPTEN_KEEPALIVE and other things.

This was in 2019. Is this hack still required now that fastcomp is deprecated? The problem with the current behaviour is that hidden no-strip symbols, added during codegen, are exported.

hmelder (Contributor, Author) commented Nov 28, 2025

@davidchisnall the changes in codegen are trivial:

  1. Mangle public symbols with '$' instead of '.' as the latter is not valid in a JavaScript identifier.
  2. Fix the function signature of class_registerAlias_np to return a bool instead of void.
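
The first change can be sketched outside of Clang as follows. This is only an illustration of the prefix selection from the diff: `ObjectFormat` and the free function `manglePublicSymbol` are hypothetical stand-ins for the `llvm::Triple` queries and the `CGObjCGNU` member, and the symbol names in the usage note are invented.

```cpp
#include <string>

// Hypothetical stand-in for the object-format queries on llvm::Triple.
enum class ObjectFormat { ELF, COFF, Wasm };

// Mirrors the patched logic: COFF and Wasm get the "$_" prefix, every other
// format keeps the "._" prefix.
std::string manglePublicSymbol(ObjectFormat format, const std::string &name) {
  const bool useDollar =
      format == ObjectFormat::COFF || format == ObjectFormat::Wasm;
  return std::string(useDollar ? "$_" : "._") + name;
}
```

With this sketch, a name such as `OBJC_CLASS_Foo` becomes `$_OBJC_CLASS_Foo` for Wasm and `._OBJC_CLASS_Foo` for ELF.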

hmelder (Contributor, Author) commented Nov 28, 2025

Assuming that the new WASM exception implementation implements the mandatory functions and data structures of the Itanium EH ABI correctly, not much needs to be done to get EH working with libobjc2. I just need to find the root cause of the crash...

davidchisnall (Contributor) left a comment

The Objective-C bits look fine to me, the MC bit possibly should be a separate PR.

github-actions bot commented Nov 28, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot added the clang label Nov 28, 2025
davidchisnall (Contributor) left a comment

Code LGTM once clang-format is happy.

We probably should have a test in tests/CodeGenObjC checking that the mangling is correct for WAsm.

dschuff (Member) commented Dec 5, 2025

+cc @sbc100 about issue 1 and @aheejin about issue 2.

Pardon my ignorance about objc here... When you say "explicitly hidden" what do you mean exactly? Do you mean something like __attribute__((visibility("hidden")))?

On exception handling, I took a quick look at the IR output of your example. The cc1 command includes
-target-feature +exception-handling -target-feature +multivalue -target-feature +reference-types -exception-model=wasm -mllvm -wasm-enable-eh -fobjc-exceptions -fexceptions -mllvm -wasm-enable-sjlj -mllvm -wasm-use-legacy-eh which looks approximately right. The IR output uses the @__gnustep_objc_personality_v0 and uses objc exception runtime functions including @objc_begin_catch(ptr %exn). Probably the exception handling ABI for objc is going to have to be tuned the way the libc++ EH ABI was, which will probably take some small tweaks in the frontend to have the same behavior, and some larger tweaks in the runtime, as there was with libc++ and libc++abi.
I believe all of our wasm-specific changes to the EH runtime have been upstreamed by @aheejin so you can take a look at the wasm-specific code there to get an idea of what it would take for objc.


// Exported symbols in Emscripten must be a valid Javascript identifier.
auto triple = CGM.getTriple();
if (triple.isOSBinFormatCOFF() || triple.isOSBinFormatWasm()) {
dschuff (Member) commented Dec 5, 2025

The restriction on valid JS identifiers is specific to Emscripten rather than wasm as a whole, so you might want to check for isOSEmscripten here rather than the bin format. But if you want to have a common ABI across Emscripten and WASI, then this would be OK with me too.

Collaborator

We should maybe fix emscripten to deal with these symbols instead of patching here?

Member

Yeah, that makes sense. We could just do a similar prefix or substitution as the one here? If the export name is invalid, we could just mangle the symbol on the Module object? Or have an alias so Module[".realSymbol"] could keep working?

Collaborator

Yes, we can do that. But remember that symbols are also available directly in the module scope as normal variables.

e.g. One can just write _malloc for symbols that are not exported on the Module. For exported symbols one can also write Module['_malloc']. So this change would just mean that symbols that are not valid JS symbol names would not be accessible via the first method... which is an odd difference but maybe better than "link failure"?

Member

Yeah, better than link failure I think. And given that the map-style accessors aren't going away, maybe it's sufficient to just leave it at that: if you have invalid identifiers, you just need to use that method (as opposed to trying to mangle them and change the symbol name altogether).

Collaborator

OK, I'll look into fixing this now on the emscripten side. We have an open bug there already: emscripten-core/emscripten#24825

Member

@hmelder / @HendrikHuebner Would Sam's Emscripten change be useful for the objc use case?

Contributor Author

The restriction on valid JS identifiers is specific to Emscripten rather than wasm as a whole, so you might want to check for isOSEmscripten here rather than the bin format. But if you want to have a common ABI across Emscripten and WASI, then this would be OK with me too.

I think it does not really matter whether we use $ or . and I would prefer to not add more complexity into the conditional.

We should maybe fix emscripten to deal with these symbols instead of patching here?
[...]
So this change would just mean that symbols that are not valid JS symbol names would not be accessible via the first method... which is an odd difference but maybe better than "link failure"?

We mangle the symbols so that they are not directly usable by the user in a C program, or at least this was the original intent with . on ELF. AFAIK there is an extension that allows $ to be used in a C identifier, which is why Emscripten supports it in the first place.

You can change this in Emscripten but it does not really matter for our use case.

sbc100 added a commit to sbc100/emscripten that referenced this pull request Dec 10, 2025
In this case we now generate a warning instead of an error.

Such symbols are not directly accessible in the module scope (since we
cannot declare them there).  They are only accessible via `wasmExports`
or `Module` dictionary objects.

See: llvm/llvm-project#169043
Fixes: emscripten-core#24825
dschuff (Member) commented Dec 10, 2025

@sbc100 I also re-read First Issue in OP, and it looks like (mangling aside) the underlying problem is that symbols are being exported that weren't intended to be in the first place. It looks like maybe OP is running into the issue of Emscripten's hack from the ancient times where EMSCRIPTEN_KEEPALIVE (which compiles to attribute("used") and then WASM_SYM_NO_STRIP) then also implies WASM_SYM_EXPORTED. IIUC they want to keep these symbols from being stripped but don't want them exported.

Normally these unused exports are not necessarily a problem; they are just extra binary size that can be removed with metadce. But when they cause linker errors because they are invalid C then everything breaks. So landing emscripten-core/emscripten#24825 will keep everything from breaking (we'd probably also want to suppress the warning), and it seems useful independently.

But it might also still be useful to take another crack at the EMSCRIPTEN_KEEPALIVE problem. It seems sticky because there are likely users depending on the fact that attribute("used") causes exporting (although hopefully most are using EMSCRIPTEN_KEEPALIVE instead of using the attribute directly). But there are probably other uses of attribute("used") that just want to prevent stripping, and we can't tell them apart. We could maybe just add 2 distinct attributes ("just-no-dead-strip" and "just-export-unless-dead") and deprecate "used" but keep it working ~forever. Not sure if there's a better way.

hmelder (Contributor, Author) commented Dec 12, 2025

Take .objc_ctor for example

auto *InitVar = new llvm::GlobalVariable(TheModule, LoadFunction->getType(),
    /*isConstant*/false, llvm::GlobalValue::LinkOnceAnyLinkage,
    LoadFunction, ".objc_ctor");

which is placed into comdat and explicitly marked as hidden here

InitVar->setVisibility(llvm::GlobalValue::HiddenVisibility);
InitVar->setComdat(TheModule.getOrInsertComdat(".objc_ctor"));

When compiling a basic Objective-C application with emcc (EM_LLVM_ROOT pointing to my patched LLVM), I get the following error message:

emcc: error: invalid export name: "_.objc_ctor"

Which originates from upstream/emscripten/tools/emscripten.py:

  for e in settings.EXPORTED_FUNCTIONS:
    if not js_manipulation.isidentifier(e):
      exit_with_error(f'invalid export name: "{e}"')

The hidden symbol is exported because the no-strip attribute was added implicitly, causing it to be exported in

if (WS.isNoStrip()) {
  Flags |= wasm::WASM_SYMBOL_NO_STRIP;
  if (isEmscripten()) {
    Flags |= wasm::WASM_SYMBOL_EXPORTED;
  }
}
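
The effect of that branch can be sketched in isolation. The following is a simplified model, not the actual WasmObjectWriter code: `SymbolInfo`, `symbolFlags`, and the `exportNoStrip` switch are hypothetical names, and the numeric flag values are assumed to match the Wasm linking metadata.

```cpp
#include <cstdint>

// Assumed flag values; treat the exact numbers as part of this sketch.
constexpr std::uint32_t kVisibilityHidden = 0x04;
constexpr std::uint32_t kExported = 0x20;
constexpr std::uint32_t kNoStrip = 0x80;

// Hypothetical, reduced view of a symbol: only the two properties that
// matter for this discussion.
struct SymbolInfo {
  bool hidden;
  bool noStrip;
};

// exportNoStrip == true models the behaviour quoted above;
// exportNoStrip == false models the behaviour with the branch removed.
std::uint32_t symbolFlags(const SymbolInfo &sym, bool isEmscripten,
                          bool exportNoStrip) {
  std::uint32_t flags = 0;
  if (sym.hidden)
    flags |= kVisibilityHidden;
  if (sym.noStrip) {
    flags |= kNoStrip;
    if (isEmscripten && exportNoStrip)
      flags |= kExported;  // a hidden symbol ends up exported here
  }
  return flags;
}
```

For a hidden no-strip symbol on Emscripten, this model sets both the hidden and the exported bit when `exportNoStrip` is true, which is the contradiction described here. With `exportNoStrip` set to false the symbol is kept but not exported.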

hmelder (Contributor, Author) commented Dec 12, 2025

It looks like maybe OP is running into the issue of Emscripten's hack from the ancient times where EMSCRIPTEN_KEEPALIVE (which compiles to attribute("used") and then WASM_SYM_NO_STRIP)

Exactly this is the problem.

hmelder (Contributor, Author) commented Dec 12, 2025

There is also some work to be done to support other personality functions in WebAssembly. Basic code generation with the wrong personality function works with this patch.

  1. Clang emits _Unwind_CallPersonality (defined in libunwind/src/Unwind-wasm.c) which always calls __gxx_personality_wasm0.
  2. WebAssembly catchpads are emitted based on the return value of isWasmPersonality.
  3. WasmEHPrepare hard-codes and reports a fatal error if the personality function is not __gxx_wasm_personality_v0
    if (!F.hasPersonalityFn() ||
        !isScopedEHPersonality(classifyEHPersonality(F.getPersonalityFn()))) {
      report_fatal_error("Function '" + F.getName() +
                         "' does not have a correct Wasm personality function "
                         "'__gxx_wasm_personality_v0'");
    }
    assert(F.hasPersonalityFn() && "Personality function not found");
The personality __gxx_personality_wasm0 is implemented in libcxxabi/src/cxa_personality.cpp.

I would like to avoid defining two new personality functions for ObjC and ObjCXX on WASM. Can't we just differentiate and emit code based on the target triple?

HendrikHuebner (Contributor) commented

There is also some work to be done to support other personality functions in WebAssembly. Basic code generation with the wrong personality function works with this patch.

1. Clang emits `_Unwind_CallPersonality` (defined in [libunwind/src/Unwind-wasm.c](https://github.com/llvm/llvm-project/blob/80ec43d455a5e47ba005112cd2b2c447bb40c42c/libunwind/src/Unwind-wasm.c#L58)) which always calls `__gxx_personality_wasm0`.

2. WebAssembly catchpads are emitted based on the return value of `isWasmPersonality`.

3. WasmEHPrepare hard-codes and reports a fatal error if the personality function is not `__gxx_wasm_personality_v0`
   https://github.com/llvm/llvm-project/blob/80ec43d455a5e47ba005112cd2b2c447bb40c42c/llvm/lib/CodeGen/WasmEHPrepare.cpp#L238-L244

The personality __gxx_personality_wasm0 is implemented in libcxxabi/src/cxa_personality.cpp.

I would like to avoid defining two new personality functions for ObjC and ObjCXX on WASM. Can't we just differentiate and emit code based on the target triple?

I don't understand why we need _Unwind_CallPersonality in the first place, since all it does at the moment is delegate to the (hardcoded) personality function and reset the selector (which could also be done inside the personality function). There needs to be more flexibility in the ABI to have different personality functions. @dschuff Do you think an ABI break involving _Unwind_CallPersonality would still be tolerable at this stage, such as changing the signature or just avoiding it altogether and directly emitting a call to the personality function?

sbc100 (Collaborator) commented Dec 12, 2025

It looks like maybe OP is running into the issue of Emscripten's hack from the ancient times where EMSCRIPTEN_KEEPALIVE (which compiles to attribute("used") and then WASM_SYM_NO_STRIP)

Exactly this is the problem.

It seems like there are actually two different issues:

  1. attribute("used") implies WASM_SYM_EXPORTED on emscripten.
  2. exported symbols that are not valid JS identifiers generate an "invalid export name" error under emscripten.

We already have a fix for (2) in flight which works around this issue for now I believe.

The fix for (1) is harder, but I would like to get this fixed eventually. The problem stems from the fact that we have an EMSCRIPTEN_KEEPALIVE macro in emscripten. This macro will cause both functions and data to be exported and currently relies on attribute("used"). I have a PR from back in 2022 to try to fix it: emscripten-core/emscripten#16149. However it ran into the issue that LLVM currently doesn't support attributes on data symbols. Fixing that seems like it could be hard, but I'm not sure.

dschuff (Member) commented Dec 12, 2025

I would like to avoid defining two new personality functions for ObjC and ObjCXX on WASM. Can't we just differentiate and emit code based on the target triple?

If objc follows exactly the same ABI and runtime as C++ then maybe it can just use the same personality function. I think we do want to support different personality functions, but maybe the test is which parts of the runtime code they want to share. IIRC we also have a separate personality function for using wasm EH for setjmp and longjmp (or maybe that's just a different tag?). I'd want to get @aheejin's opinion about the best design for this. But we can certainly make this work one way or another.

rjmccall (Contributor) commented Dec 12, 2025

I don't know why the GNUStep runtime needs different ObjC and ObjC++ personality functions. The ObjC++ personality has to support the complete superset of exception clauses (a single call site can have handlers for both C++ and ObjC exception types). It should therefore still be usable in either pure ObjC or pure C++.

In theory, the ObjC++ personality needs to be able to distinguish ObjC and C++ exceptions, so using it instead of a single-language personality could require a less compact LSDA format. In practice, they use the exact same format, and AFAIK the exception types are always distinguished in a more subtle way. I don't know the exact details of what GNUStep does, but the Apple ObjC runtime just provides a v-table for ObjC exception RTTI objects that implements the private exception-matching virtual methods on type_info differently. (I'm not even sure LLVM supports emitting a different LSDA format for different personalities.)

sbc100 added a commit to emscripten-core/emscripten that referenced this pull request Dec 16, 2025
In this case we now generate a warning instead of an error.

Such symbols are not directly accessible in the module scope (since we
cannot declare them there). They are only accessible via `wasmExports`
or `Module` dictionary objects.

See: llvm/llvm-project#169043
Fixes: #24825, #23560
@aheejin
Member

aheejin commented Dec 17, 2025

Sorry for the late reply. My email filters weren't set up right...

@hmelder

There is also some work to be done to support other personality functions in WebAssembly. Basic code generation with the wrong personality function works with this patch.

  1. Clang emits _Unwind_CallPersonality (defined in libunwind/src/Unwind-wasm.c) which always calls __gxx_personality_wasm0.

Calls to _Unwind_CallPersonality are emitted not by Clang but by WasmEHPrepare:

// Pseudocode: _Unwind_CallPersonality(exn);
CallInst *PersCI = IRB.CreateCall(CallPersonalityF, CatchCI,
OperandBundleDef("funclet", CPI));

  2. WebAssembly catchpads are emitted based on the return value of isWasmPersonality.
  3. WasmEHPrepare hard-codes the personality function and reports a fatal error if it is not __gxx_wasm_personality_v0:
    if (!F.hasPersonalityFn() ||
        !isScopedEHPersonality(classifyEHPersonality(F.getPersonalityFn()))) {
      report_fatal_error("Function '" + F.getName() +
                         "' does not have a correct Wasm personality function "
                         "'__gxx_wasm_personality_v0'");
    }
    assert(F.hasPersonalityFn() && "Personality function not found");

The personality __gxx_personality_wasm0 is implemented in libcxxabi/src/cxa_personality.cpp.

I would like to avoid defining two new personality functions for ObjC and ObjCXX on WASM. Can't we just differentiate and emit code based on the target triple?

I think the name of the personality function can be either the same or different, depending on your preference. Even if the name is the same, the library you link will be different (i.e., it's not going to be libc++abi), so as long as the signature matches the name doesn't matter much. It would be slightly simpler if your personality function has the same name because we don't need to fix WasmEHPrepare though.

EDIT: I assumed you can't share libc++abi, but if you can, things can be simpler.

@aheejin
Member

aheejin commented Dec 17, 2025

@HendrikHuebner

I don't understand why we need _Unwind_CallPersonality in the first place, since all it does at the moment is delegate to the (hardcoded) personality function and reset the selector (which could also be done inside the personality function). There needs to be more flexibility in the ABI to allow different personality functions.

It was a long time ago, but I think I made it that way in order to minimize the changes to libc++abi so that we share most of the code with the other platforms. The intention was also to hide Wasm-specific low-level details in libunwind, such as the use of the variable __wasm_lpad_context or calls to builtins for Wasm instructions.

@dschuff Do you think an ABI break involving _Unwind_CallPersonality would still be tolerable at this stage, such as changing the signature or avoiding it altogether and directly emitting a call to the personality function?

It's more our convention than an ABI, so I think it's doable if we really need to, but for our libraries I think the current code works fine. You can either

  1. Modify WasmEHPrepare to generate calls to your personality function in the ObjC case, or
  2. Create an _Unwind_CallPersonality wrapper in your library too

If 2 is not too much of a burden, I think it would help keep WasmEHPrepare simpler.
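A rough sketch of what option 2 could look like, following the description above (reset the selector, then delegate to the personality function). Everything here is a simplified stand-in: `LpadContextSketch` plays the role of __wasm_lpad_context, `call_personality_sketch` the role of _Unwind_CallPersonality, and `objc_personality_sketch` is a hypothetical ObjC personality. The field layout, the exception class constant, and the signatures are assumptions for illustration, not the real libunwind definitions.

```cpp
#include <cstdint>

// Stand-in for the landing-pad context kept by the unwinder.
// Field names and types are illustrative only.
struct LpadContextSketch {
  std::uintptr_t lpad_index = 0;
  std::uintptr_t lsda = 0;
  int selector = 0;
};

LpadContextSketch lpad_context_sketch;

// Stand-in for an exception header carrying a language/vendor class tag.
struct ExceptionSketch {
  std::uint64_t exception_class;
};

// Arbitrary tag value chosen for this sketch.
constexpr std::uint64_t kObjCClassSketch = 0x4f424a43;

// Hypothetical ObjC personality supplied by the runtime library. A real one
// would consult the LSDA; this one selects handler 1 for ObjC exceptions and
// otherwise leaves the selector untouched.
int objc_personality_sketch(ExceptionSketch *exn, LpadContextSketch *ctx) {
  if (exn->exception_class == kObjCClassSketch)
    ctx->selector = 1;
  return 0;
}

// Library-side wrapper in the role of _Unwind_CallPersonality: reset the
// selector, then delegate to the library's own personality function.
int call_personality_sketch(void *exception_ptr) {
  auto *exn = static_cast<ExceptionSketch *>(exception_ptr);
  lpad_context_sketch.selector = 0;
  return objc_personality_sketch(exn, &lpad_context_sketch);
}
```

Because the wrapper lives in the runtime library, the compiler-generated call site stays the same regardless of which personality ends up being invoked.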

@aheejin
Member

aheejin commented Dec 17, 2025

@dschuff

IIRC we also have a separate personality function for using wasm EH for setjmp and longjmp (or maybe that's just a different tag?).

We don't use a personality function to handle SjLj. I think what you're talking about might be this?

#ifdef __USING_SJLJ_EXCEPTIONS__
__gxx_personality_sj0

This is used in SjLj exception handling, which I believe is a kind of EH used in x86.

@aheejin
Member

aheejin commented Dec 17, 2025

@rjmccall

I don't know why the GNUStep runtime needs different ObjC and ObjC++ personality functions. The ObjC++ personality has to support the complete superset of exception clauses (a single call site can have handlers for both C++ and ObjC exception types). It should therefore still be usable in either pure ObjC or pure C++.

In theory, the ObjC++ personality needs to be able to distinguish ObjC and C++ exceptions, so using it instead of a single-language personality could require a less compact LSDA format. In practice, they use the exact same format, and AFAIK the exception types are always distinguished in a more subtle way. I don't know the exact details of what GNUStep does, but the Apple ObjC runtime just provides a v-table for ObjC exception RTTI objects that implements the private exception-matching virtual methods on type_info differently. (I'm not even sure LLVM supports emitting a different LSDA format for different personalities.)

If ObjC can share libc++abi (with a few #ifdefs here and there, I suppose), it will be simpler and we won't need to change much (if anything) in WasmEHPrepare.

@aheejin
Member

aheejin commented Dec 17, 2025

@hmelder

Second Issue

Here is a minimal example that triggers the crash.

#include<stdio.h>

int main(void) {
	int ret = 0;
	@try {
	}
	@catch (id a)
	{
		ret = 1;
                 puts("abc");
	}

	return ret;
}

The following assertion is triggered:

clang: /home/vm/llvm-project/llvm/lib/Target/WebAssembly/WebAssemblyExceptionInfo.cpp:124: void llvm::WebAssemblyExceptionInfo::recalculate(MachineFunction &, MachineDominatorTree &, const MachineDominanceFrontier &): Assertion `EHInfo' failed.

Here is the crash report main-c3884.zip.

You can use emcc with a modified LLVM build by exporting EM_LLVM_ROOT before sourcing emsdk/emsdk_env.sh:

emcc -fobjc-runtime=gnustep-2.2 -fwasm-exceptions -c main.m

or just invoke clang directly:

/home/vm/llvm-build-wasm/bin/clang -target wasm32-unknown-emscripten -mllvm -combiner-global-alias-analysis=false -mllvm -wasm-enable-sjlj -mllvm -wasm-use-legacy-eh=false -mllvm -disable-lsr --sysroot=/home/vm/emsdk/upstream/emscripten/cache/sysroot -DEMSCRIPTEN -fobjc-runtime=gnustep-2.2 -fwasm-exceptions -c main.m

This is because EH handling has not been implemented. WasmEHInfo is created and calculated when the wasm personality function is used:

if (isScopedEHPersonality(classifyEHPersonality(
        F.hasPersonalityFn() ? F.getPersonalityFn() : nullptr))) {
  WasmEHInfo = new (Allocator) WasmEHFuncInfo();
}

} else if (Personality == EHPersonality::Wasm_CXX) {
  WasmEHFuncInfo &EHInfo = *MF->getWasmEHFuncInfo();
  calculateWasmEHInfo(&fn, EHInfo);
  // Map all BB references in the Wasm EH data to MBBs.
  DenseMap<BBOrMBB, BBOrMBB> SrcToUnwindDest;
  for (auto &KV : EHInfo.SrcToUnwindDest) {
    const auto *Src = cast<const BasicBlock *>(KV.first);
    const auto *Dest = cast<const BasicBlock *>(KV.second);
    SrcToUnwindDest[getMBB(Src)] = getMBB(Dest);
  }
  EHInfo.SrcToUnwindDest = std::move(SrcToUnwindDest);
  DenseMap<BBOrMBB, SmallPtrSet<BBOrMBB, 4>> UnwindDestToSrcs;
  for (auto &KV : EHInfo.UnwindDestToSrcs) {
    const auto *Dest = cast<const BasicBlock *>(KV.first);
    MachineBasicBlock *DestMBB = getMBB(Dest);
    auto &Srcs = UnwindDestToSrcs[DestMBB];
    for (const auto P : KV.second)
      Srcs.insert(getMBB(cast<const BasicBlock *>(P)));
  }
  EHInfo.UnwindDestToSrcs = std::move(UnwindDestToSrcs);
}

If we decide to use the same personality function (or the same name), I think it will work. Or, if the name is different, we should make sure it is classified as a Wasm personality function.
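To illustrate the second option, here is a self-contained sketch of name-based classification. It is a simplified stand-in, not LLVM's classifyEHPersonality or isScopedEHPersonality: the enum, both helper functions, and in particular the ObjC personality name are hypothetical. Only __gxx_wasm_personality_v0 and __gxx_personality_v0 are real names.

```cpp
#include <string>

// Simplified stand-in for an EH personality classification.
enum class PersonalitySketch { Unknown, GNU_CXX, Wasm_CXX };

// Hypothetical name for a separate ObjC Wasm personality. The thread has not
// settled on a name, so this one is made up for the example.
const std::string kObjCWasmPersonalityHypothetical =
    "__hypothetical_objc_wasm_personality_v0";

// Classify a personality function by name. A differently named ObjC
// personality has to be listed here so that it lands in the Wasm bucket.
inline PersonalitySketch classify_sketch(const std::string &name) {
  if (name == "__gxx_personality_v0")
    return PersonalitySketch::GNU_CXX;
  if (name == "__gxx_wasm_personality_v0" ||
      name == kObjCWasmPersonalityHypothetical)
    return PersonalitySketch::Wasm_CXX;
  return PersonalitySketch::Unknown;
}

// In this sketch only the Wasm bucket counts as scoped EH; passes that gate
// on this check would then set up Wasm EH info for such functions.
inline bool is_scoped_eh_sketch(PersonalitySketch p) {
  return p == PersonalitySketch::Wasm_CXX;
}
```

If a new name were not added to the classification, it would fall into the unknown bucket and the Wasm EH info would never be created, which is consistent with the assertion failure reported above.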


Labels

backend:WebAssembly clang:codegen IR generation bugs: mangling, exceptions, etc. clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category llvm:mc Machine (object) code


8 participants