
A modular Java-based compiler framework for a custom C-like language, featuring a full lexer generator (ε-NFA→DFA), LR(1) parser, semantic analysis, and FRISC assembly codegen. Includes multi-stage testing, error recovery, and comprehensive developer/user documentation.


KarloKnezevic/ccompiler


🚀 PPJ Compiler: From C to FRISC Assembly

A complete C compiler that transforms a subset of C into executable FRISC assembly. Built from scratch using formal compiler construction techniques, this compiler demonstrates every phase of modern compiler design, from lexical analysis through code generation.


✨ What Makes This Compiler Special?

This isn't just another compiler project—it's a complete, educational, and production-quality implementation that:

  • 🎯 Compiles Real C Programs: Supports a comprehensive subset of C including functions, arrays, control flow, and more
  • 🏗️ Built from Scratch: No external parser generators or regex libraries—everything is hand-crafted using formal automata theory
  • 📚 Educational Excellence: Clear architecture, comprehensive documentation, and well-commented code perfect for learning compiler construction
  • 🎨 Beautiful Output: Generates human-readable FRISC assembly with extensive comments and proper formatting
  • Thoroughly Tested: 90 valid test programs (82% compile successfully), HTML test reports, and FRISC simulator integration

🎬 Quick Start

Prerequisites

Before you begin, ensure you have:

  • Java 21+ (uses modern features: records, sealed classes, pattern matching)
  • Maven 3.8+ for build management
  • Node.js 18+ (for running FRISC simulator—see FRISC Simulator Guide)
  • Bash (Unix-like environment recommended)

Check your setup:

java -version    # Should show Java 21 or higher
mvn -version     # Should show Maven 3.8 or higher
node --version   # Should show Node.js 18 or higher (for simulator)

Installing Node.js (if needed):

# Using Homebrew (macOS)
brew install node

# Using nvm (recommended)
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash
nvm install 18
nvm use 18

# Or download from https://nodejs.org/

Installing the FRISC simulator (see the FRISC Simulator Guide for details):

# Install FRISC simulator dependencies
npm install friscjs

# This installs the friscjs package into node_modules/

🏗️ Building the Compiler

Option 1: Quick Build (Recommended)

./build.sh

This script:

  • ✅ Compiles all modules
  • ✅ Runs comprehensive tests
  • ✅ Performs static analysis
  • ✅ Generates executable JAR at cli/target/ccompiler.jar

Option 2: Manual Build

# Complete build with all checks
mvn clean verify

# Fast development build (skip tests and checks)
mvn clean package -DskipTests

🎯 Compiling Your First Program

Let's compile a simple C program:

1. Create a test program:

// hello.c
int main(void) {
    return 42;
}

2. Compile it:

./run.sh hello.c

3. Check the output:

cat compiler-bin/a.frisc

You should see beautiful FRISC assembly code! 🎉

🚀 Running Generated Code

1. Execute with FRISC Simulator:

# Make sure FRISC simulator is installed first
npm install friscjs

# Run the generated assembly
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc

The simulator prints the program's return value, which is held in register R6, as a decimal number (42 here), not in hex.

2. Or use the built-in runner:

./run.sh run compiler-bin/a.frisc

Note: The FRISC simulator outputs decimal values to stdout. The compiler's test infrastructure automatically compares these decimal values with expected results—no hex conversion needed!
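The comparison described in the note can be sketched in a few lines of Java. This is not the project's test harness: it assumes, hypothetically, that the simulator prints the value of R6 in decimal as the last non-empty line of stdout, and `SimulatorOutputCheck` with its methods are illustrative names.

```java
import java.util.List;

// Hypothetical helper, not the project's test harness. It assumes the simulator
// prints the value of R6 in decimal as the last non-empty line of stdout.
final class SimulatorOutputCheck {

    private SimulatorOutputCheck() {}

    /** Returns the last non-empty, trimmed line of the simulator's stdout. */
    static String lastNonEmptyLine(String stdout) {
        List<String> lines = stdout.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .toList();
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("simulator produced no output");
        }
        return lines.get(lines.size() - 1);
    }

    /** True if the reported value equals the expected one; a plain decimal parse, no hex step. */
    static boolean matches(String stdout, long expected) {
        try {
            return Long.parseLong(lastNonEmptyLine(stdout)) == expected;
        } catch (NumberFormatException e) {
            return false; // non-numeric output never matches
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("120\n", 120)); // true
        System.out.println(matches("120\n", 42));  // false
    }
}
```

Because the value is already decimal, `Long.parseLong` is the whole conversion step.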

📖 Complete Usage Guide

Compiler Commands

The compiler supports multiple execution modes:

# Lexical analysis only (outputs tokens to stdout)
./run.sh lexer program.c

# Syntax analysis (generates parse trees)
./run.sh syntax program.c
# Output: compiler-bin/generativno_stablo.txt
#         compiler-bin/sintaksno_stablo.txt

# Semantic analysis (type checking, symbol resolution)
./run.sh semantic program.c
# Additional output: compiler-bin/tablica_simbola.txt
#                   compiler-bin/semanticko_stablo.txt

# Full compilation (all phases → FRISC assembly)
./run.sh program.c
# Final output: compiler-bin/a.frisc
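Each mode above runs the pipeline up to a particular phase, and later modes also produce the earlier modes' outputs. A minimal sketch of that idea follows; the `Phase` enum and `ModeDispatch` class are hypothetical and do not describe how `run.sh` or the cli module is actually implemented. The `run` mode, which executes an existing `.frisc` file, is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: each CLI mode runs the compiler pipeline up to one phase.
enum Phase { LEXER, SYNTAX, SEMANTIC, CODEGEN }

final class ModeDispatch {

    private ModeDispatch() {}

    /** Maps the first run.sh argument to the last phase that should run. */
    static Phase lastPhaseFor(String firstArg) {
        return switch (firstArg) {
            case "lexer" -> Phase.LEXER;
            case "syntax" -> Phase.SYNTAX;
            case "semantic" -> Phase.SEMANTIC;
            default -> Phase.CODEGEN; // a bare source file means full compilation
        };
    }

    /** The phases that run, in pipeline order, for the given first argument. */
    static List<Phase> phasesFor(String firstArg) {
        Phase last = lastPhaseFor(firstArg);
        return Arrays.stream(Phase.values())
                .filter(p -> p.compareTo(last) <= 0)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(phasesFor("syntax"));      // [LEXER, SYNTAX]
        System.out.println(phasesFor("factorial.c")); // [LEXER, SYNTAX, SEMANTIC, CODEGEN]
    }
}
```

Modelling the modes as prefixes of one ordered pipeline keeps them consistent: a mode can never skip a phase that a later phase depends on.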

Example: Complete Workflow

Let's trace through a complete example:

1. Create a program:

// factorial.c
int factorial(int n) {
    if (n <= 1) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main(void) {
    return factorial(5);
}

2. Compile:

./run.sh factorial.c

3. View generated assembly:

cat compiler-bin/a.frisc

4. Run on FRISC simulator:

# Make sure FRISC simulator is installed
npm install friscjs

# Execute the generated assembly
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc
# Output: 120 (5! = 120)
# The simulator outputs the decimal value of R6 register

5. Inspect intermediate outputs:

# See lexical tokens
cat compiler-bin/leksicke_jedinke.txt

# See parse tree
cat compiler-bin/sintaksno_stablo.txt

# See symbol table
cat compiler-bin/tablica_simbola.txt

📚 Comprehensive Documentation

This project includes extensive documentation, located in the docs/ directory and written as a guide to compiler construction.

📖 Documentation Structure

The documentation is organized into chapter-like sections covering all aspects of compiler construction:

1. Introduction

  • Overview: Project overview, architecture, and quick start guide
  • Project Architecture: Detailed architecture overview, module organization, and design patterns

2. Theoretical Foundations

3. Lexical Analysis

  • Lexer Design: Lexer architecture, token specification, and design principles
  • Implementation Notes: Complete technical documentation including regex parsing and NFA/DFA conversion algorithms
  • Token Specification: User guide for writing lexer specifications and token patterns

4. Syntax Analysis

5. Semantic Analysis

6. Intermediate Representation

7. Code Generation

8. Optimizations

  • Basic Optimizations: Optimization techniques including constant folding, dead code elimination, and register allocation

9. Runtime and Support

10. Configuration

11. Testing and Tooling

12. Appendices
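The Optimizations chapter covers constant folding. To make the technique concrete, and to show the Java 21 features listed in the prerequisites (records, sealed interfaces, pattern matching), here is a minimal constant folder over a hypothetical expression AST. `Expr`, `Const`, `Var`, `BinOp` and `ConstantFolder` are illustrative names, not the compiler's classes, and the sketch does not claim the compiler runs this pass.

```java
// Hypothetical expression AST; the compiler's real node classes may differ.
sealed interface Expr {}

record Const(int value) implements Expr {}

record Var(String name) implements Expr {}

record BinOp(char op, Expr left, Expr right) implements Expr {}

final class ConstantFolder {

    private ConstantFolder() {}

    /** Folds constant sub-expressions bottom-up; anything involving a variable is kept. */
    static Expr fold(Expr e) {
        return switch (e) {
            case Const c -> c;
            case Var v -> v;
            case BinOp(char op, Expr l, Expr r) -> {
                Expr fl = fold(l);
                Expr fr = fold(r);
                if (fl instanceof Const(int a) && fr instanceof Const(int b)) {
                    Integer folded = evaluate(op, a, b);
                    if (folded != null) {
                        yield new Const(folded);
                    }
                }
                yield new BinOp(op, fl, fr);
            }
        };
    }

    /** Evaluates op with Java's 32-bit wrapping int arithmetic; null means "do not fold". */
    private static Integer evaluate(char op, int a, int b) {
        return switch (op) {
            case '+' -> a + b;
            case '-' -> a - b;
            case '*' -> a * b;
            case '/' -> b == 0 ? null : a / b; // never fold a division by zero
            default -> null;
        };
    }

    public static void main(String[] args) {
        // (2 * 3) + x folds to 6 + x; the variable keeps the addition in place.
        Expr e = new BinOp('+', new BinOp('*', new Const(2), new Const(3)), new Var("x"));
        System.out.println(fold(e));
    }
}
```

Because `Expr` is sealed, the `switch` in `fold` is checked for exhaustiveness at compile time: adding a new node type without handling it is a compile error rather than a silent fall-through.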

🎓 Quick Start Documentation

For new users, start with:

  1. Introduction Overview: Project overview and quick start
  2. Project Architecture: Understanding the compiler structure
  3. Theoretical Foundations: Learn the theoretical background
  4. Lexical Analysis: Start with the first compiler phase
  5. FRISC Simulator Guide: Running and debugging FRISC assembly

🏛️ Architecture Overview

The compiler follows a clean, modular architecture with four distinct phases:

flowchart LR
    A[Source Code<br/>program.c] --> B[Lexical Analysis<br/>Tokenization]
    B --> C[Syntax Analysis<br/>Parse Tree]
    C --> D[Semantic Analysis<br/>Type Checking]
    D --> E[Code Generation<br/>FRISC Assembly]
    
    E --> F[a.frisc]
    
    style A fill:#e1f5fe
    style E fill:#c8e6c9
    style F fill:#f3e5f5

Module Structure

compiler-lexer/      → Tokenization using hand-built DFAs
compiler-parser/     → LR(1) parsing with auto-generated tables
compiler-semantics/  → Type checking and symbol resolution
compiler-codegen/    → FRISC assembly generation
cli/                 → Command-line interface

Each module is independently testable, and the modules follow a strict dependency hierarchy.
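The lexer module's DFAs come from ε-NFAs (the ε-NFA→DFA step in the project description). The standard way to perform that conversion is the subset construction, sketched below in plain Java. The `Nfa` record layout and the `SubsetConstruction` class are assumptions made for this example; the Implementation Notes in the Lexical Analysis chapter document the project's own version.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical ε-NFA representation: character moves plus ε-edges, states numbered from 0.
record Nfa(int start, Set<Integer> accepting,
           Map<Integer, Map<Character, Set<Integer>>> moves,
           Map<Integer, Set<Integer>> epsilon) {}

final class SubsetConstruction {

    private SubsetConstruction() {}

    /** ε-closure: all states reachable from the given ones using only ε-edges. */
    static Set<Integer> epsilonClosure(Nfa nfa, Set<Integer> states) {
        Set<Integer> closure = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : nfa.epsilon().getOrDefault(s, Set.of())) {
                if (closure.add(t)) {
                    work.push(t);
                }
            }
        }
        return closure;
    }

    /** States reachable from the given ones on character c, before taking the ε-closure. */
    static Set<Integer> move(Nfa nfa, Set<Integer> states, char c) {
        Set<Integer> result = new TreeSet<>();
        for (int s : states) {
            result.addAll(nfa.moves().getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
        }
        return result;
    }

    /** A DFA state accepts if it contains at least one accepting NFA state. */
    static boolean isAccepting(Nfa nfa, Set<Integer> dfaState) {
        return dfaState.stream().anyMatch(nfa.accepting()::contains);
    }

    /** Each DFA state is a set of NFA states; the empty (dead) state is left implicit. */
    static Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(Nfa nfa, Set<Character> alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Set<Integer> start = epsilonClosure(nfa, Set.of(nfa.start()));
        dfa.put(start, new TreeMap<>());
        Deque<Set<Integer>> work = new ArrayDeque<>(List.of(start));
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            for (char c : alphabet) {
                Set<Integer> next = epsilonClosure(nfa, move(nfa, current, c));
                if (next.isEmpty()) {
                    continue; // no transition on c: implicit dead state
                }
                dfa.get(current).put(c, next);
                if (!dfa.containsKey(next)) { // first time this subset is seen
                    dfa.put(next, new TreeMap<>());
                    work.push(next);
                }
            }
        }
        return dfa;
    }

    public static void main(String[] args) {
        // NFA for a|ab: 0 -ε-> 1, 0 -ε-> 3, 1 -a-> 2, 3 -a-> 4, 4 -b-> 5; accepting {2, 5}.
        Nfa nfa = new Nfa(0, Set.of(2, 5),
                Map.of(1, Map.of('a', Set.of(2)), 3, Map.of('a', Set.of(4)), 4, Map.of('b', Set.of(5))),
                Map.of(0, Set.of(1, 3)));
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = toDfa(nfa, Set.of('a', 'b'));
        dfa.forEach((state, edges) ->
                System.out.println(state + (isAccepting(nfa, state) ? " accepting " : " ") + edges));
    }
}
```

For the NFA of `a|ab` in `main`, the construction yields three DFA states, {0, 1, 3}, {2, 4} and {5}; the last two are accepting because they contain NFA states 2 and 5.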

🎨 Language Features

The compiler supports a comprehensive subset of C:

✅ Supported Features

  • Data Types: int, char, void, arrays, functions
  • Control Flow: if/else, while, for, break, continue, return
  • Operators: Arithmetic, relational, logical, bitwise, assignment, increment/decrement
  • Functions: Full function support with parameters and return values
  • Arrays: Array declarations, indexing, and initialization
  • Variables: Local and global variables with proper scoping
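Proper scoping of local and global variables is commonly implemented with a stack of scopes: declarations go into the innermost scope, and lookups search outward, so inner names shadow outer ones. A minimal generic sketch follows; `ScopedSymbolTable` is a hypothetical name, not the semantic module's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: block scoping as a stack of maps, innermost scope on top.
final class ScopedSymbolTable<T> {

    private final Deque<Map<String, T>> scopes = new ArrayDeque<>();

    ScopedSymbolTable() {
        enterScope(); // the outermost scope holds global declarations
    }

    void enterScope() {
        scopes.push(new HashMap<>());
    }

    void exitScope() {
        if (scopes.size() == 1) {
            throw new IllegalStateException("cannot leave the global scope");
        }
        scopes.pop();
    }

    /** Declares a non-null entry in the current scope; false means the name was already declared there. */
    boolean declare(String name, T info) {
        return scopes.peek().putIfAbsent(name, info) == null;
    }

    /** Searches from the innermost scope outwards, so inner declarations shadow outer ones. */
    Optional<T> lookup(String name) {
        for (Map<String, T> scope : scopes) { // ArrayDeque iterates from the most recently pushed scope
            T info = scope.get(name);
            if (info != null) {
                return Optional.of(info);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ScopedSymbolTable<String> table = new ScopedSymbolTable<>();
        table.declare("x", "int, global");
        table.enterScope();
        table.declare("x", "char, local");                   // shadows the global x
        System.out.println(table.lookup("x").orElseThrow()); // char, local
        table.exitScope();
        System.out.println(table.lookup("x").orElseThrow()); // int, global
    }
}
```

Redeclaring a name in the same scope fails, while declaring it in an inner scope is allowed; this is the usual C rule for block scopes.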

📝 Example Programs

Check out the examples/ directory:

  • Valid Programs (examples/valid/): 80+ example programs
  • Invalid Programs (examples/invalid/): 70+ error examples

🧪 Testing and Validation

Running Tests

# Run all tests
mvn test

# Run tests for specific module
mvn test -pl compiler-lexer
mvn test -pl compiler-parser
mvn test -pl compiler-semantics
mvn test -pl compiler-codegen

Generating HTML Reports

Generate comprehensive HTML reports for all test programs:

# Using Java directly
java -cp "$(mvn dependency:build-classpath -q -pl cli -DincludeScope=compile | tail -1):cli/target/classes:compiler-codegen/target/classes:compiler-semantics/target/classes:compiler-parser/target/classes:compiler-lexer/target/classes" hr.fer.ppj.examples.ExamplesReportGenerator

# Reports generated:
# - examples/report_valid.html
# - examples/report_invalid.html

Reports include:

  • ✅ Source code listings
  • ✅ Lexical token analysis
  • ✅ Parse tree visualizations
  • ✅ Semantic analysis results
  • ✅ Generated FRISC assembly code
  • ✅ Execution results with FRISC simulator

Test Results

  • 90 valid C programs tested
  • 82.2% success rate (74 programs compile successfully)
  • 16 programs fail due to unsupported features (float, struct, advanced pointers)
  • All successful programs execute correctly on FRISC simulator

🛠️ Development

Code Quality

The project enforces strict quality standards:

# Run all quality checks
mvn verify

# Individual tools
mvn checkstyle:check      # Code style
mvn spotbugs:check        # Bug detection
mvn spotless:check        # Formatting
mvn spotless:apply        # Auto-format

Project Structure

.
├── compiler-lexer/       # Lexical analysis module
├── compiler-parser/      # Syntax analysis module
├── compiler-semantics/   # Semantic analysis module
├── compiler-codegen/     # Code generation module
├── cli/                  # Command-line interface
├── config/               # Grammar and lexer definitions
├── examples/             # Test programs
│   ├── valid/            # Valid C programs
│   └── invalid/          # Invalid programs (for error testing)
├── docs/                 # Comprehensive documentation
└── pom.xml               # Maven root configuration

🎯 Key Highlights

What Makes This Compiler Stand Out?

  1. 🎓 Educational Value: Every phase is clearly documented and follows formal compiler construction principles
  2. 🏗️ Clean Architecture: Modular design with strict separation of concerns
  3. 📝 Comprehensive Documentation: Detailed documentation organized into 12 chapters, covering every aspect of the compiler
  4. ✅ Production Quality: Extensive testing, error handling, and code quality tools
  5. 🎨 Beautiful Output: Human-readable assembly with extensive comments
  6. 🚀 Complete Pipeline: From source code to executable assembly in one command

Technical Achievements

  • Manual Regex Parser: No external regex libraries—hand-built using formal automata theory
  • Canonical LR(1) Parser: Auto-generated parsing tables with ~823 states
  • Complete Type System: Full type checking with const-qualification support
  • FRISC Code Generation: Complete assembly generation for all supported constructs
  • Stack Management: Proper activation records and calling conventions
  • Short-Circuit Evaluation: Correct implementation of && and || operators
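A canonical LR(1) parser is driven by two tables, ACTION and GOTO. The project generates them automatically; the driver loop that consumes them is small. The sketch below uses a hand-built table for the toy grammar E → E + n | n, whose canonical LR(1) automaton has just 5 states (not the project's ~823). `Action`, `Shift`, `Reduce`, `Accept` and `LrDriver` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical action encoding for a hand-built toy table.
sealed interface Action {}

record Shift(int target) implements Action {}

record Reduce(String lhs, int rhsLength) implements Action {}

record Accept() implements Action {}

final class LrDriver {

    // Toy grammar: (1) E -> E + n   (2) E -> n   with augmented start S' -> E.
    // ACTION[state] maps a terminal ("$" = end of input) to an action; a missing entry is a syntax error.
    static final List<Map<String, Action>> ACTION = List.of(
            Map.of("n", new Shift(2)),
            Map.of("+", new Shift(3), "$", new Accept()),
            Map.of("+", new Reduce("E", 1), "$", new Reduce("E", 1)),
            Map.of("n", new Shift(4)),
            Map.of("+", new Reduce("E", 3), "$", new Reduce("E", 3)));

    // GOTO[state] maps a nonterminal to the state entered after a reduction.
    static final List<Map<String, Integer>> GOTO = List.of(
            Map.of("E", 1), Map.of(), Map.of(), Map.of(), Map.of());

    private LrDriver() {}

    /** Returns true if the token sequence is accepted by the toy grammar. */
    static boolean parse(List<String> tokens) {
        Deque<Integer> stack = new ArrayDeque<>(List.of(0));
        int pos = 0;
        while (true) {
            String lookahead = pos < tokens.size() ? tokens.get(pos) : "$";
            Action action = ACTION.get(stack.peek()).get(lookahead);
            if (action == null) {
                return false; // no table entry: syntax error
            }
            switch (action) {
                case Shift s -> {
                    stack.push(s.target()); // consume the token and enter the target state
                    pos++;
                }
                case Reduce r -> {
                    for (int i = 0; i < r.rhsLength(); i++) {
                        stack.pop(); // drop one state per right-hand-side symbol
                    }
                    // For this table the exposed state always has a GOTO entry for E.
                    stack.push(GOTO.get(stack.peek()).get(r.lhs()));
                }
                case Accept ignored -> {
                    return true;
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("n", "+", "n"))); // true
        System.out.println(parse(List.of("n", "n")));      // false
    }
}
```

The driver never inspects the grammar itself; all language knowledge lives in the tables, which is why a generator can replace this toy table with one of hundreds of states without changing the loop.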

🚀 Next Steps

For Users

  1. Try the Examples: Explore examples/valid/ to see what the compiler can do
  2. Read the Documentation: Start with Introduction Overview and FRISC Simulator Guide
  3. Write Your Own Programs: Compile your C programs and run them on the FRISC simulator

For Developers

  1. Explore the Architecture: Read Project Architecture and Codegen Module Structure to understand the codebase
  2. Study the Rules: Review Codegen Rules and Conventions for implementation details
  3. Contribute: Check out the code quality standards and start contributing!

📊 Project Status

✅ Completed Features

  • Lexical Analysis: Complete with multi-state lexer and error recovery
  • Syntax Analysis: Full LR(1) parser with automatic table generation
  • Semantic Analysis: Complete type system with scope resolution
  • Code Generation: Full FRISC assembly generation for all supported constructs
  • Testing: Comprehensive test suite with HTML report generation
  • Documentation: 36+ detailed documentation files organized into 12 chapters
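The lexer's error-recovery strategy is not spelled out here. One common scheme, sketched below as an assumption rather than the project's method, is maximal munch with panic-mode recovery: run the DFA as far as it goes, emit the longest accepted prefix, and if no prefix is accepted, report the character, skip it, and resume. The DFA, the token kinds and the `MaximalMunchLexer` class are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: maximal-munch scanning over a tiny hand-written DFA,
// with "skip one character" error recovery. Token kinds are illustrative.
final class MaximalMunchLexer {

    record Token(String kind, String text) {}

    private static final int DEAD = -1;

    private MaximalMunchLexer() {}

    /** Transition function: 0 = start, 1 = identifier, 2 = number, 3 = "<", 4 = "<=". */
    private static int next(int state, char c) {
        return switch (state) {
            case 0 -> Character.isLetter(c) ? 1 : Character.isDigit(c) ? 2 : c == '<' ? 3 : DEAD;
            case 1 -> Character.isLetterOrDigit(c) ? 1 : DEAD;
            case 2 -> Character.isDigit(c) ? 2 : DEAD;
            case 3 -> c == '=' ? 4 : DEAD;
            default -> DEAD;
        };
    }

    /** Token kind for an accepting state, or null if the state does not accept. */
    private static String acceptKind(int state) {
        return switch (state) {
            case 1 -> "IDENTIFIER";
            case 2 -> "NUMBER";
            case 3 -> "LT";
            case 4 -> "LE";
            default -> null;
        };
    }

    /** Tokenizes input; lexical errors are appended to errors and scanning continues. */
    static List<Token> tokenize(String input, List<String> errors) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;
                continue;
            }
            int state = 0;
            int lastAcceptEnd = -1;
            String lastKind = null;
            // Run the DFA as far as it goes, remembering the longest accepted prefix.
            for (int i = pos; i < input.length(); i++) {
                state = next(state, input.charAt(i));
                if (state == DEAD) {
                    break;
                }
                String kind = acceptKind(state);
                if (kind != null) {
                    lastAcceptEnd = i + 1;
                    lastKind = kind;
                }
            }
            if (lastKind == null) {
                // Panic-mode recovery: report the offending character, skip it, resume.
                errors.add("unexpected '" + input.charAt(pos) + "' at offset " + pos);
                pos++;
            } else {
                tokens.add(new Token(lastKind, input.substring(pos, lastAcceptEnd)));
                pos = lastAcceptEnd;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        System.out.println(tokenize("x1 <= 42 @ y", errors));
        System.out.println(errors); // [unexpected '@' at offset 9]
    }
}
```

On the input `x1 <= 42 @ y` the sketch produces the tokens `x1`, `<=`, `42` and `y`, and records one error for `@`. The `<=` comes out as a single token rather than `<` followed by `=` because the scanner keeps the longest match.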

🔮 Future Enhancements

  • Advanced optimizations (dead code elimination, constant folding)
  • Enhanced diagnostics (multiple errors, warnings, suggestions)
  • Development tools (debugger, visualizations, profiling)
  • Extended language support (structs, pointers, float types)

🤝 Contributing

This is an educational project demonstrating formal compiler construction. Contributions are welcome! Please:

  1. Follow the code quality standards (Checkstyle, SpotBugs, Spotless)
  2. Add comprehensive tests for new features
  3. Update documentation for significant changes
  4. Maintain the educational focus and code clarity

📄 License

This project is licensed under the MIT License—see the LICENSE file for details.

👤 Author

Karlo Knežević


🎉 Ready to Start?

Here's a complete example to get you started:

# 1. Install FRISC simulator (if not already installed)
npm install friscjs

# 2. Build the compiler
./build.sh

# 3. Compile a test program
./run.sh examples/valid/program1.c

# 4. View the generated assembly
cat compiler-bin/a.frisc

# 5. Run the generated code on FRISC simulator
node node_modules/friscjs/consoleapp/frisc-console.js compiler-bin/a.frisc

# 6. Explore the comprehensive documentation
# All documentation is organized in docs/ directory by chapter
ls docs/

Happy Compiling! 🚀


📚 Documentation Quick Links

The documentation is comprehensively organized into 12 chapters covering all aspects of compiler construction. See the Comprehensive Documentation section above for the complete structure.


This compiler represents a complete implementation of formal compiler construction techniques, providing both educational value and practical functionality. Every phase—from lexical analysis through code generation—is implemented from scratch using rigorous theoretical foundations.
