A complete, production-ready C compiler that transforms high-level C code into executable FRISC assembly. Built from scratch using formal compiler construction techniques, this compiler demonstrates every phase of modern compiler design—from lexical analysis through code generation.
This isn't just another compiler project—it's a complete, educational, and production-quality implementation that:
- 🎯 Compiles Real C Programs: Supports a comprehensive subset of C including functions, arrays, control flow, and more
- 🏗️ Built from Scratch: No external parser generators or regex libraries—everything is hand-crafted using formal automata theory
- 📚 Educational Excellence: Clear architecture, comprehensive documentation, and well-commented code perfect for learning compiler construction
- 🎨 Beautiful Output: Generates human-readable FRISC assembly with extensive comments and proper formatting
- ✅ Thoroughly Tested: 90+ test programs with 82% success rate, comprehensive HTML reports, and FRISC simulator integration
Before you begin, ensure you have:
- Java 21+ (uses modern features: records, sealed classes, pattern matching)
- Maven 3.8+ for build management
- Node.js 18+ (for running FRISC simulator—see FRISC Simulator Guide)
- Bash (Unix-like environment recommended)
Check your setup:
java -version # Should show Java 21 or higher
mvn -version # Should show Maven 3.8 or higher
node --version # Should show Node.js 18 or higher (for simulator)
Installing Node.js (if needed):
# Using Homebrew (macOS)
brew install node
# Using nvm (recommended)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 18
nvm use 18
# Or download from https://nodejs.org/
Installing FRISC Simulator (see the FRISC Simulator Guide):
# Install FRISC simulator dependencies
npm install friscjs
# This installs friscjs package in node_modules/
Option 1: Quick Build (Recommended)
./build.sh
This script:
- ✅ Compiles all modules
- ✅ Runs comprehensive tests
- ✅ Performs static analysis
- ✅ Generates executable JAR at cli/target/ccompiler.jar
Option 2: Manual Build
# Complete build with all checks
mvn clean verify
# Fast development build (skip tests and checks)
mvn clean package -DskipTests
Let's compile a simple C program:
1. Create a test program:
// hello.c
int main(void) {
return 42;
}
2. Compile it:
./run.sh hello.c
3. Check the output:
cat compiler-bin/a.frisc
You should see beautiful FRISC assembly code! 🎉
1. Execute with FRISC Simulator:
# Make sure FRISC simulator is installed first
npm install friscjs
# Run the generated assembly
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc
The simulator will output the program's return value (42) in register R6 as a decimal number (not hex).
2. Or use the built-in runner:
./run.sh run compiler-bin/a.frisc
Note: The FRISC simulator outputs decimal values to stdout. The compiler's test infrastructure automatically compares these decimal values with expected results, so no hex conversion is needed.
The compiler supports multiple execution modes:
# Lexical analysis only (outputs tokens to stdout)
./run.sh lexer program.c
# Syntax analysis (generates parse trees)
./run.sh syntax program.c
# Output: compiler-bin/generativno_stablo.txt
# compiler-bin/sintaksno_stablo.txt
# Semantic analysis (type checking, symbol resolution)
./run.sh semantic program.c
# Additional output: compiler-bin/tablica_simbola.txt
# compiler-bin/semanticko_stablo.txt
# Full compilation (all phases → FRISC assembly)
./run.sh program.c
# Final output: compiler-bin/a.frisc
Let's trace through a complete example:
1. Create a program:
// factorial.c
int factorial(int n) {
if (n <= 1) {
return 1;
}
return n * factorial(n - 1);
}
int main(void) {
return factorial(5);
}
2. Compile:
./run.sh factorial.c
3. View generated assembly:
cat compiler-bin/a.frisc
4. Run on FRISC simulator:
# Make sure FRISC simulator is installed
npm install
# Execute the generated assembly
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc
# Output: 120 (5! = 120)
# The simulator outputs the decimal value of the R6 register
5. Inspect intermediate outputs:
# See lexical tokens
cat compiler-bin/leksicke_jedinke.txt
# See parse tree
cat compiler-bin/sintaksno_stablo.txt
# See symbol table
cat compiler-bin/tablica_simbola.txt
This project includes extensive documentation organized as a comprehensive guide to compiler construction. All documentation is located in the docs/ directory and organized into chapter-like sections covering all aspects of compiler construction:
- Overview: Project overview, architecture, and quick start guide
- Project Architecture: Detailed architecture overview, module organization, and design patterns
- Formal Languages and Grammars: Formal language theory, regular languages, context-free grammars, and LR parsing foundations
- Automata and Parsing Theory: Detailed automata algorithms, parsing algorithms, and error recovery techniques
- Lexer Design: Lexer architecture, token specification, and design principles
- Implementation Notes: Complete technical documentation including regex parsing and NFA/DFA conversion algorithms
- Token Specification: User guide for writing lexer specifications and token patterns
- Grammar Specification: Grammar format, production rules, and parser module overview
- Parser Construction: Parser architecture, grammar parsing, and FIRST set computation
- Parsing Tables and Algorithms: Detailed LR(1) parser implementation, table construction, and runtime parsing
- Symbol Tables and Scopes: Symbol table implementation, scope management, and identifier resolution
- Type System and Checking: Type system design, type checking algorithms, and semantic validation
- Semantic Passes: Semantic analysis pipeline, attribute synthesis, and error reporting
- IR Design: AST structure, node hierarchy, and IR design principles
- AST Structure and Walkers: Detailed AST node specifications and traversal mechanisms
- Target Architecture Overview: FRISC architecture overview, code generation strategy, and runtime model
- Instruction Selection: Code generation algorithms, expression code generation, and statement code generation
- Calling Conventions and Runtime: Function calling conventions, stack management, and activation records
- FRISC Codegen Details: Complete FRISC processor reference including instruction set, addressing modes, and assembly directives
- Codegen Module Structure: Complete guide to code generation module architecture and package organization
- Codegen Rules and Conventions: Detailed rules and conventions for code generation (37 rules covering expressions, statements, functions, memory, types, labels, formatting, stack, and registers)
- Basic Optimizations: Optimization techniques including constant folding, dead code elimination, and register allocation
- Runtime Library: Runtime functions, helper function generation, and memory management
- Helper Functions on FRISC: Detailed implementation of helper functions including float operations (Q16.16 fixed-point)
- FRISC Simulator Guide: Complete guide to using the FRISC simulator for testing and debugging
- Configuration Overview: Configuration system overview, file loading, and validation
- Config File Reference: Complete reference for lexer, parser, and semantics configuration file formats
- Examples and Best Practices: Configuration examples and usage patterns
- Test Strategy: Testing methodology, test organization, and execution
- Example Programs: Test program catalog and validation results
- Debugging Workflow: Debugging techniques and tools
- Glossary: Complete glossary of compiler construction terms
- Notation and Conventions: Documentation notation, code conventions, and terminology
- Bibliography and Further Reading: References to textbooks, papers, and online resources
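The Parser Construction chapter listed above covers FIRST set computation. As a rough illustration of the idea only, here is a minimal fixed-point sketch. The class name `FirstSets`, the `"<eps>"` marker, and the map-based grammar encoding are assumptions made for this example and are not taken from the project's code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper (not the project's API): FIRST sets by fixed-point iteration. */
final class FirstSets {
    /** Marker used in this sketch for the empty string. */
    static final String EPSILON = "<eps>";

    /**
     * The grammar maps each nonterminal to its alternatives. Each alternative is a
     * list of symbols; an empty list stands for an epsilon production. Any symbol
     * that is not a key of the map is treated as a terminal.
     */
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new LinkedHashMap<>();
        for (String nonterminal : grammar.keySet()) {
            first.put(nonterminal, new LinkedHashSet<>());
        }
        boolean changed = true;
        while (changed) { // repeat until no set grows any further
            changed = false;
            for (var entry : grammar.entrySet()) {
                Set<String> target = first.get(entry.getKey());
                for (List<String> alternative : entry.getValue()) {
                    boolean allNullable = true;
                    for (String symbol : alternative) {
                        if (!grammar.containsKey(symbol)) { // terminal: it starts the alternative
                            changed |= target.add(symbol);
                            allNullable = false;
                            break;
                        }
                        // Nonterminal: copy its FIRST set (minus epsilon) into the target.
                        Set<String> copy = new LinkedHashSet<>(first.get(symbol));
                        boolean nullable = copy.remove(EPSILON);
                        changed |= target.addAll(copy);
                        if (!nullable) {
                            allNullable = false;
                            break;
                        }
                    }
                    if (allNullable) { // every symbol can vanish, so the alternative can too
                        changed |= target.add(EPSILON);
                    }
                }
            }
        }
        return first;
    }
}
```

For the textbook grammar E → T E', E' → + T E' | ε, T → id | ( E ), this sketch yields FIRST(E) = FIRST(T) = {id, (} and FIRST(E') = {+, <eps>}.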
For new users, start with:
- Introduction Overview: Project overview and quick start
- Project Architecture: Understanding the compiler structure
- Theoretical Foundations: Learn the theoretical background
- Lexical Analysis: Start with the first compiler phase
- FRISC Simulator Guide: Running and debugging FRISC assembly
The compiler follows a clean, modular architecture with four distinct phases:
flowchart LR
A[Source Code<br/>program.c] --> B[Lexical Analysis<br/>Tokenization]
B --> C[Syntax Analysis<br/>Parse Tree]
C --> D[Semantic Analysis<br/>Type Checking]
D --> E[Code Generation<br/>FRISC Assembly]
E --> F[a.frisc]
style A fill:#e1f5fe
style E fill:#c8e6c9
style F fill:#f3e5f5
compiler-lexer/ → Tokenization using hand-built DFAs
compiler-parser/ → LR(1) parsing with auto-generated tables
compiler-semantics/ → Type checking and symbol resolution
compiler-codegen/ → FRISC assembly generation
cli/ → Command-line interface
Each module is independently testable and follows a strict dependency hierarchy.
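To give a feel for what "tokenization using hand-built DFAs" can look like, here is a small sketch of a hand-written DFA with maximal munch. It is not the project's lexer: the class `TinyDfaLexer`, its two token kinds, and the `KIND:text` output format are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch (not the project's lexer): a hand-written DFA with maximal munch. */
final class TinyDfaLexer {
    // DFA states: START is initial, IDENT and NUMBER are accepting, DEAD rejects.
    private static final int START = 0, IDENT = 1, NUMBER = 2, DEAD = -1;

    /** Transition function delta(state, character). */
    private static int step(int state, char c) {
        boolean letter = Character.isLetter(c) || c == '_';
        boolean digit = Character.isDigit(c);
        return switch (state) {
            case START -> letter ? IDENT : digit ? NUMBER : DEAD;
            case IDENT -> (letter || digit) ? IDENT : DEAD;
            case NUMBER -> digit ? NUMBER : DEAD;
            default -> DEAD;
        };
    }

    /** Splits the input into "KIND:text" tokens, skipping whitespace. */
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            int state = START;
            int lastAcceptEnd = -1;  // end index of the longest accepted prefix so far
            int lastAcceptState = DEAD;
            for (int i = pos; i < input.length(); i++) {
                state = step(state, input.charAt(i));
                if (state == DEAD) {
                    break;
                }
                // In this tiny DFA every live state reached after START is accepting.
                lastAcceptEnd = i + 1;
                lastAcceptState = state;
            }
            if (lastAcceptEnd < 0) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            String kind = lastAcceptState == IDENT ? "IDENT" : "NUMBER";
            tokens.add(kind + ":" + input.substring(pos, lastAcceptEnd));
            pos = lastAcceptEnd; // resume scanning right after the matched lexeme
        }
        return tokens;
    }
}
```

Calling `TinyDfaLexer.tokenize("count1 42 x")` returns `[IDENT:count1, NUMBER:42, IDENT:x]`. A real lexer would typically build its transition table from token specifications rather than hard-code it as done here.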
The compiler supports a comprehensive subset of C:
- Data Types: int, char, void, arrays, functions
- Control Flow: if/else, while, for, break, continue, return
- Operators: Arithmetic, relational, logical, bitwise, assignment, increment/decrement
- Functions: Full function support with parameters and return values
- Arrays: Array declarations, indexing, and initialization
- Variables: Local and global variables with proper scoping
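Scoping of local and global variables is commonly implemented as a stack of name-to-type maps. The sketch below shows that general technique under stated assumptions: `ScopedSymbolTable` and its methods are hypothetical names, types are plain strings for brevity, and none of it is taken from the project's semantic analyzer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch (not the project's symbol table): a stack of scopes. */
final class ScopedSymbolTable {
    // Head of the deque is the innermost scope; the last element is the global scope.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        scopes.push(new HashMap<>()); // global scope
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("Cannot exit the global scope");
        }
        scopes.pop();
    }

    /** Declares a name in the current scope; returns false on redeclaration there. */
    boolean declare(String name, String type) {
        return scopes.peek().putIfAbsent(name, type) == null;
    }

    /** Resolves a name by searching from the innermost scope outward. */
    Optional<String> lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) {
                return Optional.of(type);
            }
        }
        return Optional.empty();
    }
}
```

With this structure, a local `x` declared inside a block shadows a global `x` until `exitScope()` is called, after which lookups resolve to the global declaration again.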
Check out the examples/ directory:
- Valid Programs (examples/valid/): 80+ working examples
- Invalid Programs (examples/invalid/): 70+ error examples
# Run all tests
mvn test
# Run tests for specific module
mvn test -pl compiler-lexer
mvn test -pl compiler-parser
mvn test -pl compiler-semantics
mvn test -pl compiler-codegen
Generate comprehensive HTML reports for all test programs:
# Using Java directly
java -cp "$(mvn dependency:build-classpath -q -pl cli -DincludeScope=compile | tail -1):cli/target/classes:compiler-codegen/target/classes:compiler-semantics/target/classes:compiler-parser/target/classes:compiler-lexer/target/classes" hr.fer.ppj.examples.ExamplesReportGenerator
# Reports generated:
# - examples/report_valid.html
# - examples/report_invalid.html
Reports include:
- ✅ Source code listings
- ✅ Lexical token analysis
- ✅ Parse tree visualizations
- ✅ Semantic analysis results
- ✅ Generated FRISC assembly code
- ✅ Execution results with FRISC simulator
- 90 valid C programs tested
- 82.2% success rate (74 programs compile successfully)
- 16 programs fail due to unsupported features (float, struct, advanced pointers)
- All successful programs execute correctly on FRISC simulator
The project enforces strict quality standards:
# Run all quality checks
mvn verify
# Individual tools
mvn checkstyle:check # Code style
mvn spotbugs:check # Bug detection
mvn spotless:check # Formatting
mvn spotless:apply # Auto-format
.
├── compiler-lexer/ # Lexical analysis module
├── compiler-parser/ # Syntax analysis module
├── compiler-semantics/ # Semantic analysis module
├── compiler-codegen/ # Code generation module
├── cli/ # Command-line interface
├── config/ # Grammar and lexer definitions
├── examples/ # Test programs
│ ├── valid/ # Valid C programs
│ └── invalid/ # Invalid programs (for error testing)
├── docs/ # Comprehensive documentation
└── pom.xml # Maven root configuration
- 🎓 Educational Value: Every phase is clearly documented and follows formal compiler construction principles
- 🏗️ Clean Architecture: Modular design with strict separation of concerns
- 📝 Comprehensive Documentation: 36+ detailed documentation files covering every aspect
- ✅ Production Quality: Extensive testing, error handling, and code quality tools
- 🎨 Beautiful Output: Human-readable assembly with extensive comments
- 🚀 Complete Pipeline: From source code to executable assembly in one command
- ✅ Manual Regex Parser: No external regex libraries—hand-built using formal automata theory
- ✅ Canonical LR(1) Parser: Auto-generated parsing tables with ~823 states
- ✅ Complete Type System: Full type checking with const-qualification support
- ✅ FRISC Code Generation: Complete assembly generation for all supported constructs
- ✅ Stack Management: Proper activation records and calling conventions
- ✅ Short-Circuit Evaluation: Correct implementation of && and || operators
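Short-circuit evaluation is usually compiled to "jumping code": each operand branches directly to the true or false target, so the right operand is never evaluated once the left one decides the result. The sketch below illustrates that general technique. The `Cond` AST, the class `ShortCircuitLowering`, and the `JUMP_IF_NONZERO` / `JUMP` pseudo-instructions are all hypothetical; they are not FRISC mnemonics and not the project's code generator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mini-AST for boolean conditions; not the project's AST. */
sealed interface Cond {
    record Var(String name) implements Cond {}
    record And(Cond left, Cond right) implements Cond {}
    record Or(Cond left, Cond right) implements Cond {}
}

/** Hypothetical sketch: lowers && and || to conditional jumps on pseudo-instructions. */
final class ShortCircuitLowering {
    private final List<String> code = new ArrayList<>();
    private int labelCounter = 0;

    /** Emits code that jumps to ifTrue when cond is nonzero, otherwise to ifFalse. */
    static List<String> lower(Cond cond, String ifTrue, String ifFalse) {
        ShortCircuitLowering lowering = new ShortCircuitLowering();
        lowering.branch(cond, ifTrue, ifFalse);
        return lowering.code;
    }

    private void branch(Cond cond, String ifTrue, String ifFalse) {
        switch (cond) {
            case Cond.Var v -> {
                code.add("JUMP_IF_NONZERO " + v.name() + ", " + ifTrue);
                code.add("JUMP " + ifFalse);
            }
            case Cond.And a -> {
                // Left false: skip the right operand entirely and go to ifFalse.
                String right = "L" + labelCounter++;
                branch(a.left(), right, ifFalse);
                code.add(right + ":");
                branch(a.right(), ifTrue, ifFalse);
            }
            case Cond.Or o -> {
                // Left true: skip the right operand entirely and go to ifTrue.
                String right = "L" + labelCounter++;
                branch(o.left(), ifTrue, right);
                code.add(right + ":");
                branch(o.right(), ifTrue, ifFalse);
            }
        }
    }
}
```

Lowering `a && b` with targets `T` and `F` produces `JUMP_IF_NONZERO a, L0`, `JUMP F`, `L0:`, `JUMP_IF_NONZERO b, T`, `JUMP F`. The test of `b` is reachable only through `L0`, that is, only when `a` was nonzero.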
- Try the Examples: Explore examples/valid/ to see what the compiler can do
- Read the Documentation: Start with Introduction Overview and FRISC Simulator Guide
- Write Your Own Programs: Compile your C programs and run them on the FRISC simulator
- Explore the Architecture: Read Project Architecture and Codegen Module Structure to understand the codebase
- Study the Rules: Review Codegen Rules and Conventions for implementation details
- Contribute: Check out the code quality standards and start contributing!
- Lexical Analysis: Complete with multi-state lexer and error recovery
- Syntax Analysis: Full LR(1) parser with automatic table generation
- Semantic Analysis: Complete type system with scope resolution
- Code Generation: Full FRISC assembly generation for all supported constructs
- Testing: Comprehensive test suite with HTML report generation
- Documentation: 36+ detailed documentation files organized into 12 chapters
- Advanced optimizations (dead code elimination, constant folding)
- Enhanced diagnostics (multiple errors, warnings, suggestions)
- Development tools (debugger, visualizations, profiling)
- Extended language support (structs, pointers, float types)
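As an illustration of what one of the planned optimizations, constant folding, could look like, here is a minimal bottom-up sketch using Java 21 sealed interfaces and record patterns. The `Expr` hierarchy and `ConstantFolder` are hypothetical and unrelated to the project's actual AST. The sketch also assumes that Java's wrapping 32-bit `int` arithmetic matches the target semantics, which would need to be checked for a real implementation.

```java
/** Hypothetical expression AST for illustration; not the project's node hierarchy. */
sealed interface Expr {
    record Const(int value) implements Expr {}
    record Name(String id) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}
    record Mul(Expr left, Expr right) implements Expr {}
}

/** Hypothetical sketch of constant folding over the Expr tree above. */
final class ConstantFolder {
    /** Folds children first, then the node itself if both operands became constants. */
    static Expr fold(Expr e) {
        return switch (e) {
            case Expr.Const c -> c;
            case Expr.Name n -> n;
            case Expr.Add a -> {
                Expr l = fold(a.left());
                Expr r = fold(a.right());
                // Assumption: wrapping int arithmetic is acceptable for the target.
                yield (l instanceof Expr.Const(int x) && r instanceof Expr.Const(int y))
                        ? new Expr.Const(x + y)
                        : new Expr.Add(l, r);
            }
            case Expr.Mul m -> {
                Expr l = fold(m.left());
                Expr r = fold(m.right());
                yield (l instanceof Expr.Const(int x) && r instanceof Expr.Const(int y))
                        ? new Expr.Const(x * y)
                        : new Expr.Mul(l, r);
            }
        };
    }
}
```

Folding `2 + 3 * 4` yields `Const(14)`, while `x + 3 * 4` becomes `Add(Name(x), Const(12))` because only the constant subtree can be evaluated at compile time.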
This is an educational project demonstrating formal compiler construction. Contributions are welcome! Please:
- Follow the code quality standards (Checkstyle, SpotBugs, Spotless)
- Add comprehensive tests for new features
- Update documentation for significant changes
- Maintain the educational focus and code clarity
This project is licensed under the MIT License—see the LICENSE file for details.
Karlo Knežević
- Website: karloknezevic.github.io
- GitHub: @karloknezevic
Here's a complete example to get you started:
# 1. Install FRISC simulator (if not already installed)
npm install
# 2. Build the compiler
./build.sh
# 3. Compile a test program
./run.sh examples/valid/program1.c
# 4. View the generated assembly
cat compiler-bin/a.frisc
# 5. Run the generated code on FRISC simulator
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc
# 6. Explore the comprehensive documentation
# All documentation is organized in docs/ directory by chapter
ls docs/
Happy Compiling! 🚀
The documentation is comprehensively organized into 12 chapters covering all aspects of compiler construction. See the Comprehensive Documentation section above for the complete structure.
Quick Links:
- Introduction - Start here for project overview
- Theoretical Foundations - Formal language theory
- Lexical Analysis - Token specification and lexer implementation
- Syntax Analysis - Grammar and parser construction
- Semantic Analysis - Type checking and symbol resolution
- Code Generation - FRISC assembly generation
- Configuration - Configuration file reference
- Testing - Testing methodology
- Appendices - Glossary and references
This compiler represents a complete implementation of formal compiler construction techniques, providing both educational value and practical functionality. Every phase—from lexical analysis through code generation—is implemented from scratch using rigorous theoretical foundations.