feat: Add LLM integration layer for semantic code analysis #82

Copilot · 2025-12-10T13:34:10Z

Pull Request

Description

Implements optional LLM integration for semantic code analysis beyond AST syntax comparison. Enables behavioral change detection, code execution simulation, and intelligent documentation synchronization for DocuMCP Orchestrator (ADR-009, ADR-010).

Architecture:

LLM Client (src/utils/llm-client.ts): Multi-provider abstraction (DeepSeek, OpenAI, Anthropic, Ollama) with rate limiting, timeout handling, and graceful degradation
Semantic Analyzer (src/utils/semantic-analyzer.ts): Three-mode analysis (LLM/AST/Hybrid) with automatic fallback and confidence-based selection
Complete isolation: Zero impact on existing functionality, works fully offline without LLM
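
The automatic fallback described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in src/utils/semantic-analyzer.ts: `AnalysisResult`, `Analyzer`, `analyzeWithFallback`, and the `minConfidence` threshold of 0.7 are hypothetical names and values chosen for illustration, and the way the hybrid result merges both signals is only one plausible policy.

```typescript
// Hypothetical sketch of LLM/AST/hybrid selection with graceful degradation.
// None of these names are taken from the actual DocuMCP source.
type AnalysisMode = 'llm' | 'ast' | 'hybrid';

interface AnalysisResult {
  hasBehavioralChange: boolean;
  confidence: number; // 0.0-1.0
  analysisMode: AnalysisMode;
}

type Analyzer = (
  before: string,
  after: string,
) => Promise<{ hasBehavioralChange: boolean; confidence: number }>;

async function analyzeWithFallback(
  before: string,
  after: string,
  astAnalyzer: Analyzer,
  llmAnalyzer?: Analyzer, // undefined when no LLM is configured
  minConfidence = 0.7, // assumed threshold
): Promise<AnalysisResult> {
  // The AST pass always runs, so there is a result even when fully offline.
  const ast = await astAnalyzer(before, after);
  if (!llmAnalyzer) {
    return { ...ast, analysisMode: 'ast' };
  }
  try {
    const llm = await llmAnalyzer(before, after);
    if (llm.confidence >= minConfidence) {
      return { ...llm, analysisMode: 'llm' };
    }
    // Low-confidence LLM answer: combine both signals.
    return {
      hasBehavioralChange: llm.hasBehavioralChange || ast.hasBehavioralChange,
      confidence: Math.max(llm.confidence, ast.confidence),
      analysisMode: 'hybrid',
    };
  } catch {
    // Network error, timeout, or rate limit: degrade to the AST-only result.
    return { ...ast, analysisMode: 'ast' };
  }
}
```

Running the AST pass first keeps the offline path identical whether or not an LLM is configured, which is one way to get the "zero impact" property the list above claims.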

Usage:

import { SemanticAnalyzer } from './utils/semantic-analyzer.js';

const analyzer = new SemanticAnalyzer();
await analyzer.initialize();

// Detects behavioral changes, not just signature changes
const analysis = await analyzer.analyzeSemanticImpact(
  'function multiply(a, b) { return a * b; }',
  'function multiply(a, b) { return a + b; }'  // Bug!
);

console.log(analysis.hasBehavioralChange);     // true
console.log(analysis.breakingForExamples);     // true
console.log(analysis.analysisMode);            // 'llm' | 'ast' | 'hybrid'
console.log(analysis.confidence);              // 0.0-1.0

// Validate documentation examples without execution
const implementation = 'function multiply(a, b) { return a * b; }';
const validation = await analyzer.validateExamples(
  ['const result = multiply(6, 7);  // Should return 42'],
  implementation
);

Configuration (optional):

export DOCUMCP_LLM_API_KEY="sk-..."
export DOCUMCP_LLM_PROVIDER="deepseek"  # default
export DOCUMCP_LLM_MODEL="deepseek-chat"
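
How these variables might be turned into a client configuration is sketched below. The three variable names and the `deepseek` provider default come from this description; everything else is an assumption made for illustration: the function name `loadLLMConfig`, treating `deepseek-chat` as the default model, returning `null` to signal AST-only mode, and allowing Ollama to run without an API key.

```typescript
type Provider = 'deepseek' | 'openai' | 'anthropic' | 'ollama';

interface LLMConfig {
  provider: Provider;
  apiKey?: string;
  model: string;
}

const PROVIDERS: readonly string[] = ['deepseek', 'openai', 'anthropic', 'ollama'];

function isProvider(value: string): value is Provider {
  return PROVIDERS.indexOf(value) !== -1;
}

// Hypothetical helper: returns null when no usable configuration is present,
// which a caller could treat as "run in AST-only mode".
function loadLLMConfig(env: Record<string, string | undefined>): LLMConfig | null {
  const rawProvider = env['DOCUMCP_LLM_PROVIDER'] ?? 'deepseek';
  if (!isProvider(rawProvider)) {
    return null;
  }
  const apiKey = env['DOCUMCP_LLM_API_KEY'];
  // Assumption: a local Ollama server needs no API key; hosted providers do.
  if (!apiKey && rawProvider !== 'ollama') {
    return null;
  }
  return {
    provider: rawProvider,
    model: env['DOCUMCP_LLM_MODEL'] ?? 'deepseek-chat', // assumed default model
    ...(apiKey ? { apiKey } : {}),
  };
}
```

In a Node.js process the helper would be called as `loadLLMConfig(process.env)`; taking the environment as a parameter keeps it easy to unit test.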

Type of Change

New feature (non-breaking change which adds functionality)
Documentation update

Testing

Tests pass locally with my changes
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes (48 new tests, 1554 total passing)
Test coverage maintains or improves upon 80% threshold (85-97% on new modules)

Code Quality

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have run npm run validate:rules and all checks pass

Documentation

I have made corresponding changes to the documentation
- Complete how-to guide: docs/how-to/llm-integration.md
- Provider configuration for all supported LLMs
- Usage patterns, best practices, troubleshooting
- README updated with feature highlights

Security

I have run npm run security:check and there are no new vulnerabilities
My changes do not introduce security risks
- API keys configurable via environment only
- No keys in code or version control
- Graceful fallback ensures offline operation
- Code never sent to LLM unless explicitly configured

ADR Impact

This change affects existing ADRs (list which ones)
- ADR-009: Content Accuracy Validation Framework (enables LLM-powered validation)
- ADR-010: MCP Resource Pattern Redesign (no resource storage, stateless operation)
This change requires a new ADR
No ADR impact

Checklist

I have read the CONTRIBUTING guidelines
I have linked this PR to relevant issues
I have updated the TODO.md if this completes tasks

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

example.com
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
https://api.github.com/test
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (http block)
httpstat.us
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
invalid-domain-that-does-not-exist-12345.com
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
invalid-url-12345.test
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
this-domain-definitely-does-not-exist-12345.com
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
this-domain-should-not-exist-12345.com
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)
www.google.com
- Triggering command: /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

Original prompt

This section details the original issue you should resolve

<issue_title>feat: Add LLM integration layer for semantic code analysis</issue_title>
<issue_description>Summary
Add an LLM integration layer to enable semantic analysis of code changes beyond AST-based syntax comparison.

Motivation

The DocuMCP Orchestrator (ADR-009, ADR-010) requires semantic understanding of code changes to:

Detect behavioral changes within the same function signature

Simulate code execution paths without running code

Generate intelligent documentation update suggestions

Proposed Solution

1. Create src/utils/llm-client.ts

// SemanticAnalysis is defined in semantic-analyzer.ts (step 2 below).
// SimulationResult is not specified in this issue.
export interface LLMConfig {
  provider: 'deepseek' | 'openai' | 'anthropic' | 'ollama';
  apiKey?: string;
  baseUrl?: string;
  model: string;
  maxTokens?: number;
}

export interface LLMClient {
  complete(prompt: string): Promise<string>;
  analyzeCodeChange(before: string, after: string): Promise<SemanticAnalysis>;
  simulateExecution(example: string, implementation: string): Promise<SimulationResult>;
}

2. Create src/utils/semantic-analyzer.ts

export interface SemanticAnalysis {
  hasBehavioralChange: boolean;
  breakingForExamples: boolean;
  changeDescription: string;
  affectedDocSections: string[];
  confidence: number;
}

// Declaration only: the method returns a Promise, so an implementation can be async.
export declare class SemanticAnalyzer {
  analyzeSemanticImpact(
    codeBefore: string,
    codeAfter: string,
    functionName: string
  ): Promise<SemanticAnalysis>;
}

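
One way the interfaces above could be exercised in unit tests without a network call is a small test double. The sketch below is an illustration only: `AnalysisClient` and `MockLLMClient` are hypothetical names, the prompt layout and the JSON response format are assumptions, and `simulateExecution` is left out because `SimulationResult` is not specified in this issue.

```typescript
interface SemanticAnalysis {
  hasBehavioralChange: boolean;
  breakingForExamples: boolean;
  changeDescription: string;
  affectedDocSections: string[];
  confidence: number;
}

// Subset of the proposed LLMClient (simulateExecution omitted).
interface AnalysisClient {
  complete(prompt: string): Promise<string>;
  analyzeCodeChange(before: string, after: string): Promise<SemanticAnalysis>;
}

// Hypothetical test double: returns a canned response and records every prompt.
class MockLLMClient implements AnalysisClient {
  readonly prompts: string[] = [];
  private readonly cannedResponse: string;

  constructor(cannedResponse: string) {
    this.cannedResponse = cannedResponse;
  }

  async complete(prompt: string): Promise<string> {
    this.prompts.push(prompt);
    return this.cannedResponse;
  }

  async analyzeCodeChange(before: string, after: string): Promise<SemanticAnalysis> {
    // Assumed prompt layout; a real client would likely add instructions.
    const raw = await this.complete(`BEFORE:\n${before}\nAFTER:\n${after}`);
    // Assumption: the model is asked to answer with JSON matching SemanticAnalysis.
    return JSON.parse(raw) as SemanticAnalysis;
  }
}
```

A test can then construct `new MockLLMClient(JSON.stringify({...}))`, call `analyzeCodeChange`, and assert on both the parsed result and the recorded prompts.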
Acceptance Criteria

LLM client supports multiple providers (DeepSeek default)

Configuration via environment variables

Fallback to AST-only mode if LLM unavailable

Rate limiting and error handling

Unit tests with mocked LLM responses
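
The rate-limiting criterion above does not prescribe an algorithm. As a minimal sketch, assuming a sliding-window policy, a limiter could look like the following; the class name `RateLimiter`, its parameters, and the injected `now` timestamp are all hypothetical and chosen so the behaviour can be tested without real timers.

```typescript
// Hypothetical sliding-window limiter: at most `maxRequests` calls per `windowMs`.
class RateLimiter {
  private readonly timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
  }

  // Returns true and records the call if a slot is free at time `now` (ms).
  tryAcquire(now: number = Date.now()): boolean {
    // Drop recorded calls that have aged out of the window.
    for (;;) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || now - oldest < this.windowMs) {
        break;
      }
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }
}
```

A client could call `tryAcquire()` before each request and fall back to AST-only analysis when it returns false, which would tie this criterion to the fallback criterion listed above.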

Related

Orchestrator ADR-009: Documentation Sync Agent

Orchestrator ADR-010: LLM-Validated Documentation Examples

Orchestrator Repo: https://github.com/tosin2013/documcp-orchestrator

Labels

enhancement, priority-high, orchestrator-integration</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes feat: Add LLM integration layer for semantic code analysis #71

…angelog (#1, #2)
- Add npm publication verification with retry mechanism
- Add package installation test after publication
- Enhance changelog extraction in release workflow
- Add commit message validation before release
- Update coverage threshold from 80% to 85%
- Create ADR-012: Priority Scoring System for Documentation Drift
- Create ADR-013: Release Pipeline and Package Distribution Architecture
- Update ADRs with GitHub issue references (#77, #78, #80, #81, #82, #83)
- Add LLM integration documentation to ADR-009
- Add Diataxis type tracking documentation to ADR-004

Fixes #1, #2
Related to #3 (coverage already exceeds target)

Initial plan

afd652c

Copilot AI assigned Copilot and tosin2013 Dec 10, 2025

Copilot started work on behalf of tosin2013 December 10, 2025 13:34

Copilot AI added 3 commits December 10, 2025 13:42

feat: Implement LLM client and semantic analyzer with comprehensive tests

3e6d415

docs: Add LLM integration documentation and update README

fa58e87

refactor: Address code review feedback - improve type safety and documentation

3a93522

Copilot AI changed the title ~~[WIP] Add LLM integration layer for semantic code analysis~~ feat: Add LLM integration layer for semantic code analysis Dec 10, 2025

Copilot AI requested a review from tosin2013 December 10, 2025 13:51

Copilot finished work on behalf of tosin2013 December 10, 2025 13:51

tosin2013 marked this pull request as ready for review December 10, 2025 13:55

tosin2013 merged commit f7b6fcd into main Dec 10, 2025
17 of 19 checks passed

tosin2013 deleted the copilot/add-llm-integration-layer branch December 10, 2025 13:55
