Copilot AI commented Dec 10, 2025

Pull Request

Description

Implements optional LLM integration for semantic code analysis beyond AST syntax comparison. Enables behavioral change detection, code execution simulation, and intelligent documentation synchronization for DocuMCP Orchestrator (ADR-009, ADR-010).

Architecture:

  • LLM Client (src/utils/llm-client.ts): Multi-provider abstraction (DeepSeek, OpenAI, Anthropic, Ollama) with rate limiting, timeout handling, and graceful degradation
  • Semantic Analyzer (src/utils/semantic-analyzer.ts): Three-mode analysis (LLM/AST/Hybrid) with automatic fallback and confidence-based selection
  • Complete isolation: Zero impact on existing functionality, works fully offline without LLM
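The confidence-based selection between the three modes could look roughly like the sketch below. It is not the code from src/utils/semantic-analyzer.ts: the `selectAnalysis` function, the `PartialAnalysis` shape, and the 0.7 threshold are assumptions made for illustration. Only the field names `hasBehavioralChange`, `confidence`, and `analysisMode` come from this PR.

```typescript
// Hypothetical types: only the field names are taken from the PR description.
type AnalysisMode = 'llm' | 'ast' | 'hybrid';

interface PartialAnalysis {
  hasBehavioralChange: boolean;
  confidence: number; // 0.0-1.0
}

interface SelectedAnalysis extends PartialAnalysis {
  analysisMode: AnalysisMode;
}

// Assumed threshold, not taken from the PR.
const MIN_LLM_CONFIDENCE = 0.7;

function selectAnalysis(
  llm: PartialAnalysis | null, // null when no LLM is configured or the call failed
  ast: PartialAnalysis,
  minConfidence: number = MIN_LLM_CONFIDENCE
): SelectedAnalysis {
  // Automatic fallback: without an LLM answer, use the AST result as-is.
  if (llm === null) {
    return { hasBehavioralChange: ast.hasBehavioralChange, confidence: ast.confidence, analysisMode: 'ast' };
  }
  // A confident LLM answer is used directly.
  if (llm.confidence >= minConfidence) {
    return { hasBehavioralChange: llm.hasBehavioralChange, confidence: llm.confidence, analysisMode: 'llm' };
  }
  // Low-confidence LLM answer: combine both signals conservatively.
  return {
    hasBehavioralChange: llm.hasBehavioralChange || ast.hasBehavioralChange,
    confidence: Math.max(llm.confidence, ast.confidence),
    analysisMode: 'hybrid',
  };
}
```

In this sketch the hybrid branch reports a behavioral change if either analysis sees one, which favors false positives over missed changes; the real implementation may weigh the two sources differently.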

Usage:

import { SemanticAnalyzer } from './utils/semantic-analyzer.js';

const analyzer = new SemanticAnalyzer();
await analyzer.initialize();

// Detects behavioral changes, not just signature changes
const analysis = await analyzer.analyzeSemanticImpact(
  'function multiply(a, b) { return a * b; }',
  'function multiply(a, b) { return a + b; }'  // Bug!
);

console.log(analysis.hasBehavioralChange);     // true
console.log(analysis.breakingForExamples);     // true
console.log(analysis.analysisMode);            // 'llm' | 'ast' | 'hybrid'
console.log(analysis.confidence);              // 0.0-1.0

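// Validate documentation examples without execution
// (`implementation` is the source of the function under test; this value is illustrative)
const implementation = 'function multiply(a, b) { return a * b; }';
const validation = await analyzer.validateExamples(
  ['const result = multiply(6, 7);  // Should return 42'],
  implementation
);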

Configuration (optional):

export DOCUMCP_LLM_API_KEY="sk-..."
export DOCUMCP_LLM_PROVIDER="deepseek"  # default
export DOCUMCP_LLM_MODEL="deepseek-chat"
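A minimal sketch of how these variables might be read into a config object, assuming a hypothetical `loadLLMConfig` helper that is not part of this PR. The variable names and the `deepseek` default come from the block above. Treating `deepseek-chat` as the default model and allowing Ollama to run without an API key are assumptions.

```typescript
type Provider = 'deepseek' | 'openai' | 'anthropic' | 'ollama';

interface LLMConfig {
  provider: Provider;
  apiKey?: string;
  model: string;
}

const PROVIDERS: readonly string[] = ['deepseek', 'openai', 'anthropic', 'ollama'];

function isProvider(value: string): value is Provider {
  return PROVIDERS.indexOf(value) !== -1;
}

// Hypothetical helper: returns null when no usable configuration is present,
// so the caller can stay in AST-only mode.
function loadLLMConfig(env: Record<string, string | undefined>): LLMConfig | null {
  const provider = (env['DOCUMCP_LLM_PROVIDER'] ?? 'deepseek').toLowerCase();
  if (!isProvider(provider)) return null;

  const apiKey = env['DOCUMCP_LLM_API_KEY'];
  // Assumption: a local Ollama server needs no key; hosted providers do.
  if (!apiKey && provider !== 'ollama') return null;

  // Assumption: only DeepSeek has a default model here.
  const model = env['DOCUMCP_LLM_MODEL'] ?? (provider === 'deepseek' ? 'deepseek-chat' : undefined);
  if (!model) return null;

  return apiKey ? { provider, apiKey, model } : { provider, model };
}
```

Under Node.js this would be called as `loadLLMConfig(process.env)`. Returning `null` instead of throwing keeps a missing key from being an error, which matches the optional, offline-first behavior described in this PR.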

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Testing

  • Tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes (48 new tests, 1554 total passing)
  • Test coverage maintains or improves upon 80% threshold (85-97% on new modules)

Code Quality

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have run npm run validate:rules and all checks pass

Documentation

  • I have made corresponding changes to the documentation
    • Complete how-to guide: docs/how-to/llm-integration.md
    • Provider configuration for all supported LLMs
    • Usage patterns, best practices, troubleshooting
    • README updated with feature highlights

Security

  • I have run npm run security:check and there are no new vulnerabilities
  • My changes do not introduce security risks
    • API keys configurable via environment only
    • No keys in code or version control
    • Graceful fallback ensures offline operation
    • Code never sent to LLM unless explicitly configured

ADR Impact

  • This change affects existing ADRs (list which ones)
    • ADR-009: Content Accuracy Validation Framework (enables LLM-powered validation)
    • ADR-010: MCP Resource Pattern Redesign (no resource storage, stateless operation)
  • This change requires a new ADR
  • No ADR impact

Checklist

  • I have read the CONTRIBUTING guidelines
  • I have linked this PR to relevant issues
  • I have updated the TODO.md if this completes tasks

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • example.com (dns block)
  • https://api.github.com/test (http block)
  • httpstat.us (dns block)
  • invalid-domain-that-does-not-exist-12345.com (dns block)
  • invalid-url-12345.test (dns block)
  • this-domain-definitely-does-not-exist-12345.com (dns block)
  • this-domain-should-not-exist-12345.com (dns block)
  • www.google.com (dns block)

Triggering command for every address (the Jest worker, including its run of tests/utils/llm-client.test.ts): /usr/local/bin/node /usr/local/bin/node /home/REDACTED/work/documcp/documcp/node_modules/jest-worker/build/workers/processChild.js


Original prompt

This section details the original issue you should resolve.

<issue_title>feat: Add LLM integration layer for semantic code analysis</issue_title>
<issue_description>## Summary
Add an LLM integration layer to enable semantic analysis of code changes beyond AST-based syntax comparison.

Motivation

The DocuMCP Orchestrator (ADR-009, ADR-010) requires semantic understanding of code changes to:

  • Detect behavioral changes within the same function signature
  • Simulate code execution paths without running code
  • Generate intelligent documentation update suggestions

Proposed Solution

1. Create src/utils/llm-client.ts

export interface LLMConfig {
  provider: 'deepseek' | 'openai' | 'anthropic' | 'ollama';
  apiKey?: string;
  baseUrl?: string;
  model: string;
  maxTokens?: number;
}

export interface LLMClient {
  complete(prompt: string): Promise<string>;
  analyzeCodeChange(before: string, after: string): Promise<SemanticAnalysis>;
  simulateExecution(example: string, implementation: string): Promise<SimulationResult>;
}

2. Create src/utils/semantic-analyzer.ts

export interface SemanticAnalysis {
  hasBehavioralChange: boolean;
  breakingForExamples: boolean;
  changeDescription: string;
  affectedDocSections: string[];
  confidence: number;
}

export class SemanticAnalyzer {
  async analyzeSemanticImpact(
    codeBefore: string,
    codeAfter: string,
    functionName: string
  ): Promise<SemanticAnalysis>;
}

Acceptance Criteria

  • LLM client supports multiple providers (DeepSeek default)
  • Configuration via environment variables
  • Fallback to AST-only mode if LLM unavailable
  • Rate limiting and error handling
  • Unit tests with mocked LLM responses
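One way the rate-limiting criterion could be met is a sliding-window limiter placed in front of every provider call. The sketch below is an illustration only: the `SlidingWindowRateLimiter` class and its parameters are hypothetical and do not describe the limiter that was eventually implemented. The clock is injectable so that unit tests do not need real delays.

```typescript
// Hypothetical sliding-window limiter: at most `maxRequests` calls per `windowMs`.
class SlidingWindowRateLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and records the request if it fits in the current window;
  // returns false without recording anything otherwise.
  tryAcquire(): boolean {
    const current = this.now();
    const cutoff = current - this.windowMs;
    // Drop timestamps that have left the window.
    this.timestamps = this.timestamps.filter((ts) => ts > cutoff);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(current);
    return true;
  }
}

// Usage with a fake clock, as a unit test with mocked time might do.
let fakeTime = 0;
const limiter = new SlidingWindowRateLimiter(2, 1000, () => fakeTime);
const first = limiter.tryAcquire();  // fits in the window
const second = limiter.tryAcquire(); // fits in the window
const third = limiter.tryAcquire();  // rejected: window is full
fakeTime = 1001;
const afterWindow = limiter.tryAcquire(); // earlier requests have expired
```

A caller that gets `false` could either wait and retry or fall back to AST-only analysis for that request, which lines up with the fallback criterion in the same list.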

Related

Labels

enhancement, priority-high, orchestrator-integration</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title [WIP] Add LLM integration layer for semantic code analysis feat: Add LLM integration layer for semantic code analysis Dec 10, 2025
Copilot AI requested a review from tosin2013 December 10, 2025 13:51
@tosin2013 tosin2013 marked this pull request as ready for review December 10, 2025 13:55
@tosin2013 tosin2013 merged commit f7b6fcd into main Dec 10, 2025
17 of 19 checks passed
@tosin2013 tosin2013 deleted the copilot/add-llm-integration-layer branch December 10, 2025 13:55
tosin2013 added a commit that referenced this pull request Dec 10, 2025
…angelog (#1, #2)

- Add npm publication verification with retry mechanism
- Add package installation test after publication
- Enhance changelog extraction in release workflow
- Add commit message validation before release
- Update coverage threshold from 80% to 85%
- Create ADR-012: Priority Scoring System for Documentation Drift
- Create ADR-013: Release Pipeline and Package Distribution Architecture
- Update ADRs with GitHub issue references (#77, #78, #80, #81, #82, #83)
- Add LLM integration documentation to ADR-009
- Add Diataxis type tracking documentation to ADR-004

Fixes #1, #2
Related to #3 (coverage already exceeds target)

Development

Successfully merging this pull request may close these issues.

feat: Add LLM integration layer for semantic code analysis

2 participants