@anutosh491
Member

We can see the following when running clang-repl in C mode:

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h>
hi

In debug mode, dumping the generated module shows this:

clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type 'int (const char *, ...)'; ISO C99 and
      later do not support implicit function declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h>

=== compile-ptu 1 ===
[TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
[LLVM IR]
; ModuleID = 'incr_module_1'
source_filename = "incr_module_1"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"hi\0A\00", align 1
@llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 65535, ptr @_GLOBAL__sub_I_incr_module_1, ptr null }]

define internal void @__stmts__0() #0 {
entry:
  %call = call i32 (ptr, ...) @printf(ptr noundef @.str)
  ret void
}

declare i32 @printf(ptr noundef, ...) #1

; Function Attrs: noinline nounwind uwtable
define internal void @_GLOBAL__sub_I_incr_module_1() #2 section ".text.startup" {
entry:
  call void @__stmts__0()
  ret void
}

attributes #0 = { "min-legal-vector-width"="0" }
attributes #1 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { noinline nounwind uwtable "frame-pointer"="all" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3, !4}
!llvm.ident = !{!5}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{i32 7, !"frame-pointer", i32 2}
!5 = !{!"clang version 22.0.0git (https://github.com/anutosh491/llvm-project.git 81ad8fbc2bb09bae61ed59316468011e4a42cf47)"}
=== end compile-ptu ===

execute-ptu 1: [TU=0x55556cfbf830, M=0x55556cfc13a0 (incr_module_1)]
hi

Basically, CodeGen emits IR for a cell before we know whether DiagnosticsEngine has recorded an error. For C code like printf("hi\n"); without <stdio.h>, Sema emits a diagnostic but still produces a "codegen-able" TopLevelStmt, so the printf call is IR-generated into the current module.

Previously, when Diags.hasErrorOccurred() was true, we only cleaned up the PTU AST and left the CodeGen module untouched. The next successful cell then called GenModule(), which returned that same module (now also containing the next cell’s IR), so the failed cell's side effects (e.g. the printf call) ran along with it.
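
To make the leak easier to follow, here is a toy model of the pre-patch flow. It is only a sketch under assumptions: `ToyModule`, `ToyRepl`, `processCell` and `replayLeak` are hypothetical names invented for illustration and are not clang-repl's real classes. The one point it tries to capture is that a statement is appended to the current module before the error state is checked, and that the module is only handed over on the next successful cell.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for the CodeGen module: just the statements it would run.
struct ToyModule {
  std::vector<std::string> Stmts;
};

// Hypothetical model of the pre-patch flow; not clang-repl's real classes.
class ToyRepl {
  ToyModule Current;               // module CodeGen keeps appending to
  std::vector<std::string> Output; // statements that were "executed"

public:
  // HasError models Diags.hasErrorOccurred() after parsing the cell.
  void processCell(const std::string &Stmt, bool HasError) {
    // CodeGen has already emitted the statement before errors are checked.
    if (!Stmt.empty())
      Current.Stmts.push_back(Stmt);
    if (HasError)
      return; // AST cleanup only; Current still holds the failed cell's IR

    // Successful cell: take the accumulated module and execute all of it.
    ToyModule M = std::move(Current);
    Current = ToyModule{};
    for (const std::string &S : M.Stmts)
      Output.push_back(S);
  }

  const std::vector<std::string> &output() const { return Output; }
};

// Replays the reported session: a failing printf cell, then the #include.
inline std::vector<std::string> replayLeak() {
  ToyRepl R;
  R.processCell("printf(\"hi\\n\");", /*HasError=*/true);
  R.processCell("", /*HasError=*/false); // the #include adds no statements
  return R.output();
}
```

In this model `replayLeak()` returns the single printf statement even though the only successful cell was the #include, which corresponds to the stray "hi" in the session above.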

@llvmbot added the clang and clang:frontend labels Nov 29, 2025
@llvmbot
Member

llvmbot commented Nov 29, 2025

@llvm/pr-subscribers-clang

Author: Anutosh Bhat (anutosh491)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/169989.diff

3 Files Affected:

  • (modified) clang/lib/Interpreter/IncrementalAction.cpp (+11)
  • (modified) clang/lib/Interpreter/IncrementalAction.h (+2)
  • (modified) clang/lib/Interpreter/IncrementalParser.cpp (+1)
diff --git a/clang/lib/Interpreter/IncrementalAction.cpp b/clang/lib/Interpreter/IncrementalAction.cpp
index 3d489fce54bc6..e2b9d13017ada 100644
--- a/clang/lib/Interpreter/IncrementalAction.cpp
+++ b/clang/lib/Interpreter/IncrementalAction.cpp
@@ -120,6 +120,17 @@ std::unique_ptr<llvm::Module> IncrementalAction::GenModule() {
   return nullptr;
 }
 
+void IncrementalAction::discardCurrentCodeGenModule() {
+  if (CodeGenerator *CG = getCodeGen()) {
+    if (auto *CurM = CG->GetModule()) {
+      llvm::LLVMContext &Ctx = CurM->getContext();
+      std::string Name = CurM->getName().str();
+      std::unique_ptr<llvm::Module> Dead(CG->ReleaseModule());
+      CG->StartModule(Name, Ctx);
+    }
+  }
+}
+
 CodeGenerator *IncrementalAction::getCodeGen() const {
   FrontendAction *WrappedAct = getWrapped();
   if (!WrappedAct || !WrappedAct->hasIRSupport())
diff --git a/clang/lib/Interpreter/IncrementalAction.h b/clang/lib/Interpreter/IncrementalAction.h
index 725cdd0c27cf4..485cfaa45f793 100644
--- a/clang/lib/Interpreter/IncrementalAction.h
+++ b/clang/lib/Interpreter/IncrementalAction.h
@@ -74,6 +74,8 @@ class IncrementalAction : public WrapperFrontendAction {
 
   /// Generate an LLVM module for the most recent parsed input.
   std::unique_ptr<llvm::Module> GenModule();
+
+  void discardCurrentCodeGenModule();
 };
 
 class InProcessPrintingASTConsumer final : public MultiplexConsumer {
diff --git a/clang/lib/Interpreter/IncrementalParser.cpp b/clang/lib/Interpreter/IncrementalParser.cpp
index bf08911e23533..53379603c26da 100644
--- a/clang/lib/Interpreter/IncrementalParser.cpp
+++ b/clang/lib/Interpreter/IncrementalParser.cpp
@@ -82,6 +82,7 @@ IncrementalParser::ParseOrWrapTopLevelDecl() {
 
   DiagnosticsEngine &Diags = S.getDiagnostics();
   if (Diags.hasErrorOccurred()) {
+    Act->discardCurrentCodeGenModule();
     CleanUpPTU(C.getTranslationUnitDecl());
 
     Diags.Reset(/*soft=*/true);

@anutosh491
Member Author

So this fix discards the current CodeGen module whenever Diags.hasErrorOccurred() is true, so IR from a failing input is dropped and cannot leak into the next cell. After the patch, debug mode shows the following:

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --debug-only=clang-repl
compile-ptu 0: [TU=0x55556cf78168, M=0x55556cf72c60 (incr_module_0)]
execute-ptu 0: [TU=0x55556cf78168, M=0x55556cf72c60 (incr_module_0)]
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: use of undeclared identifier 'printf'
    1 | printf("hi\n");
      | ^~~~~~
error: Parsing failed.
clang-repl> #include <stdio.h>
compile-ptu 1: [TU=0x55556cfc7c08, M=0x55556cfbeee0 (incr_module_1)]
execute-ptu 1: [TU=0x55556cfc7c08, M=0x55556cfbeee0 (incr_module_1)]
clang-repl> printf("hi\n");
compile-ptu 2: [TU=0x55556d02cd68, M=0x55556cfca4e0 (incr_module_2)]
execute-ptu 2: [TU=0x55556d02cd68, M=0x55556cfca4e0 (incr_module_2)]
hi
clang-repl> 

anutosh491@vv-nuc:/build/anutosh491/llvm-project/build/bin$ ./clang-repl --Xcc=-x --Xcc=c --Xcc=-std=c23 --debug-only=clang-repl
compile-ptu 0: [TU=0x55556cfb2b18, M=0x55556cf95a90 (incr_module_0)]
execute-ptu 0: [TU=0x55556cfb2b18, M=0x55556cf95a90 (incr_module_0)]
clang-repl> printf("hi\n");
In file included from <<< inputs >>>:1:
input_line_1:1:1: error: call to undeclared library function 'printf' with type
      'int (const char *, ...)'; ISO C99 and later do not support implicit function
      declarations [-Wimplicit-function-declaration]
    1 | printf("hi\n");
      | ^
input_line_1:1:1: note: include the header <stdio.h> or explicitly provide a
      declaration for 'printf'
error: Parsing failed.
clang-repl> #include <stdio.h> 
compile-ptu 1: [TU=0x55556cfbfb20, M=0x55556cf979a0 (incr_module_1)]
execute-ptu 1: [TU=0x55556cfbfb20, M=0x55556cf979a0 (incr_module_1)]
clang-repl> printf("hi\n");
compile-ptu 2: [TU=0x55556d00b5f8, M=0x55556cfc1790 (incr_module_2)]
execute-ptu 2: [TU=0x55556d00b5f8, M=0x55556cfc1790 (incr_module_2)]
hi
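
The discard-on-error idea from this comment can be sketched with a toy code generator. This is an illustration under assumptions, not the real API: `ToyModule`, `ToyCodeGen` and `discardCurrentModule` are hypothetical stand-ins that only mimic the shape of discardCurrentCodeGenModule() in the diff (release the current module, let it be destroyed, start a fresh one with the same name).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy module: a name plus the statements emitted into it.
struct ToyModule {
  std::string Name;
  std::vector<std::string> Stmts;
};

// Hypothetical stand-in for the code generator that owns the current module.
class ToyCodeGen {
  std::unique_ptr<ToyModule> M;

public:
  explicit ToyCodeGen(std::string Name) { startModule(std::move(Name)); }

  ToyModule *getModule() { return M.get(); }

  // Gives up ownership of the current module; leaves no active module.
  std::unique_ptr<ToyModule> releaseModule() { return std::move(M); }

  // Starts a fresh, empty module with the given name.
  void startModule(std::string Name) {
    M = std::make_unique<ToyModule>();
    M->Name = std::move(Name);
  }

  void emit(const std::string &Stmt) {
    assert(M && "no active module");
    M->Stmts.push_back(Stmt);
  }
};

// Same shape as the patch: drop the current module, restart under its name.
inline void discardCurrentModule(ToyCodeGen &CG) {
  if (ToyModule *Cur = CG.getModule()) {
    std::string Name = Cur->Name;
    std::unique_ptr<ToyModule> Dead = CG.releaseModule(); // freed at scope exit
    CG.startModule(Name);
  }
}
```

Calling `discardCurrentModule` after a failing cell leaves an empty module with the original name, so the next successful cell starts from a clean slate in this model.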

@anutosh491
Member Author

This looks like the go-to approach for fixing this to me. I can add tests if we agree on it.

@anutosh491 changed the title from "[clang-repl] Drop CodeGen module when an input has parse errors" to "[clang-repl] Skip CodeGen for top-level decls when diagnostics report errors" Dec 8, 2025
@anutosh491
Member Author

My first approach (#169989 (comment)) assumed that the solution was to drop the faulty module containing the IR leaked from the previous input line. This is what the first commit was trying to do.

But after discussing with @vgvassilev, we discovered that we were calling into CodeGen even when an error had occurred.

So basically we don’t want failing inputs to produce IR or leak state into subsequent cells. This patch adds a guard in InProcessPrintingASTConsumer::HandleTopLevelDecl that returns early when DiagnosticsEngine::hasErrorOccurred() is true, so the MultiplexConsumer/CodeGen path is not invoked for erroneous inputs.
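
The guard described above can be sketched roughly as follows. The final patch's code is not shown in this thread, so everything here is an assumption made for illustration: `ToyDiagnostics`, `ToyCodeGenConsumer` and `ToyGuardedConsumer` are hypothetical stand-ins, and returning `true` on the early exit is a guess rather than what the real HandleTopLevelDecl does.

```cpp
#include <string>
#include <vector>

// Toy stand-in for the diagnostics engine's error flag.
struct ToyDiagnostics {
  bool ErrorOccurred = false;
  bool hasErrorOccurred() const { return ErrorOccurred; }
};

// Toy stand-in for the downstream CodeGen consumer.
struct ToyCodeGenConsumer {
  std::vector<std::string> Emitted;
  bool handleTopLevelDecl(const std::string &Decl) {
    Emitted.push_back(Decl); // "emit IR" for this declaration
    return true;
  }
};

// Wrapper that forwards declarations only while no error has been reported.
class ToyGuardedConsumer {
  const ToyDiagnostics &Diags;
  ToyCodeGenConsumer &Inner;

public:
  ToyGuardedConsumer(const ToyDiagnostics &D, ToyCodeGenConsumer &I)
      : Diags(D), Inner(I) {}

  bool handleTopLevelDecl(const std::string &Decl) {
    // Early return: an erroneous input never reaches CodeGen.
    // The return value here is an assumption, not taken from the patch.
    if (Diags.hasErrorOccurred())
      return true;
    return Inner.handleTopLevelDecl(Decl);
  }
};
```

Compared with discarding the module afterwards, the guard means no IR is emitted for the failing input in the first place, so there is nothing left to clean up on the CodeGen side.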

Contributor

@vgvassilev vgvassilev left a comment

LGTM!

@anutosh491
Member Author

Thanks for the review. Merging!

@anutosh491 anutosh491 merged commit cbce30e into llvm:main Dec 8, 2025
10 checks passed
@anutosh491 anutosh491 deleted the discard_curr_mod branch December 8, 2025 16:53
honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
… errors (llvm#169989)

@anutosh491
Member Author

anutosh491 commented Dec 9, 2025

A small corner case was raised here:

#171440

The corner case would only show up in C mode (C99 through C17).
