| Version | v4.0 |
| Date | 2026-03-30 |
| Author | PAN CHAO |
| Contact | u3638376@connect.hku.hk |
System Architecture
Full system architecture (diagrams, data flow, deployment) is maintained in a separate document; Section 5 of this PRD contains only an architecture summary and index.
History
- v4.0: SSDLC + LangGraph. Full SSDLC lifecycle support (6 stages); LangChain/LangGraph as the orchestration engine; stage-specific skills and assessment flows. Pivoted to full-phase support with phase-specific SSDLC agents.
- v3.1: Performance & quality. Graph RAG, Docling parser, async pipeline, parallel orchestration, input guardrails, singleton KB, cached LLM.
- v3.0: Headless pivot. Removed the Streamlit frontend; pure API + MCP service.
- v2.0: Major upgrade. Added multi-agent orchestration, human-in-the-loop workflow, skill/persona management, and one-click deployment.
- v1.4: Split the PRD and the System Architecture document.
- v1.3: Added the non-functional "Security Requirements and Controls" section.
- v1.2: KB multi-format upload and open-source parsing; parser reuse.
- v1.1: Enterprise integration (ServiceNow), IAM (AAD/SSO, RBAC), deployment and connectivity.
This PRD is for the open-source "DocSentinel" project. It defines business pain points, the solution approach, system architecture, and product scope, serving as a single source of truth for subsequent design and development. The project aims to build an AI-powered SSDLC (Secure Software Development Lifecycle) platform that automates security activities across all six phases of the software development lifecycle, from requirements gathering to production operations. It automates the review of security-related documents, forms, and reports and the generation of recommendations, reduces the burden on enterprise security teams, and supports integration with mainstream and local LLMs, multi-format file parsing, and extensible Skills and knowledge bases. Powered by LangChain and LangGraph for intelligent agent orchestration, it helps enterprise security teams embed security into every stage of delivery, not just the final review.
Enterprise Cyber Security teams operate under multiple constraints:
- Diverse reference sources: Internal security policies, industry best practices (e.g. NIST SSDF, OWASP, CISA), past project cases, and compliance frameworks (e.g. SOC2, ISO 27001, PCI DSS).
- Full SSDLC coverage: Security review and control requirements exist at every stage — requirements/design, development, testing, deployment, and operations — but most tools only address one or two stages.
- Wide variety of deliverables: Security questionnaires, threat models, architecture documents, secure coding guidelines, SAST/DAST reports, penetration test findings, deployment checklists, compliance evidence, and audit materials all require manual reading, comparison, and sign-off.
- Shift-left pressure: Modern DevSecOps demands security involvement early in the lifecycle, but security teams lack tooling to scale across requirements, design, and development phases.
In agile and DevOps environments, enterprises ship dozens to hundreds of projects per year. Security teams must complete large volumes of assessments and reviews with limited headcount, creating a clear bottleneck — especially when coverage is expected across the entire SSDLC, not just pre-release reviews.
| Pain Point | Description |
|---|---|
| Fragmented SSDLC coverage | Most tools cover only testing/deployment; the requirements, design, and development phases lack automated security support. |
| Fragmented assessment criteria | Teams must align with policies, industry standards, and project precedents; manual lookup and alignment costs are high. |
| No unified threat modeling | Threat models are created ad hoc in the design phase; there is no automated STRIDE/DREAD analysis and no carry-forward to testing. |
| Heavy questionnaire workflow | Multiple rounds of questionnaire filling, assessment, evidence collection, and review; templates are inconsistent and evidence quality varies. |
| Development-phase control relies on people | Secure coding guidance, SAST result interpretation, policy definition, and exception approval still depend on security staff and are hard to scale. |
| Pre-release review pressure | Security must review every file and sign off; DAST/pentest reports need interpretation; technical documents are costly for non-specialists to read. |
| Post-deployment blind spots | Vulnerability monitoring, incident response, and patch tracking are disconnected from the development lifecycle. |
| Scale vs. consistency | Manual assessment tends to be inconsistent, incomplete, or delayed; reusable assessment patterns are hard to institutionalize. |
| SSDLC coverage gaps | Security involvement is unevenly distributed across the SSDLC; requirements and design often get less scrutiny than pre-release review, letting risks accumulate until release. |
- Full lifecycle coverage: Provide AI-assisted, stage-aware security support across all six SSDLC phases (requirements, design, development, testing, deployment, and operations) with stage-specific skills and checklists, not just testing and deployment.
- Automation: Automate analysis and initial assessment of security artifacts at each phase, from requirements to operations.
- Consistency: Produce consistent assessment conclusions and remediation recommendations based on a unified knowledge base and policies.
- Intelligence: Use LangGraph-orchestrated agents to reason about cross-phase dependencies (e.g., a threat identified in design must be tested and monitored).
- Extensibility: Support custom SSDLC workflows, assessment scenarios, phase-specific skills, and different compliance frameworks and customer/project types.
Build an AI-powered SSDLC platform for security teams, with the primary focus on automating security activities and assessment of all forms, documents, and reports across the entire secure software development lifecycle. After security staff submit project-related files to the Agent, the platform:
- Parses multi-format files: Convert Word, PDF, Excel, PPT, SAST/DAST reports, images, etc. into an intermediate format (e.g. JSON/Markdown).
- Uses knowledge base and policy: Rely on built-in or configurable compliance and policy knowledge to understand "what standards must be met."
- SSDLC-aware assessment: Automatically determine or accept the SSDLC stage and apply stage-specific assessment logic, checklists, and risk focus.
- Performs risk assessment and recommendations: Identify security/compliance risks and provide security advice and actionable remediation.
- Produces structured output: Enable security staff to quickly review, sign off, or hand off to business/development for remediation.
The platform covers six standard SSDLC phases with dedicated AI agents for each:
- Requirements Phase Agent: Analyze requirements documents to identify security requirements, compliance obligations (GDPR, PCI DSS, etc.), and perform initial risk analysis.
- Design Phase Agent: Review architecture/design documents, perform automated threat modeling (STRIDE/DREAD), evaluate security architecture, encryption schemes, and access control designs. Conduct Security Design Review (SDR).
- Development Phase Agent: Assess code against secure coding standards, review SAST findings, evaluate security controls (anti-injection, XSS prevention), and provide secure coding guidance.
- Testing Phase Agent: Analyze SAST/DAST scan reports, interpret penetration test results, prioritize vulnerability fixes, and verify remediation completeness.
- Deployment Phase Agent: Review deployment configurations, evaluate secret management, assess hardening measures, and perform pre-release security sign-off checks.
- Operations Phase Agent: Monitor vulnerability feeds, assist incident response, track patch management, and audit security logs.
The platform uses LangGraph to orchestrate these agents into configurable workflows — agents can run sequentially, in parallel, or conditionally based on project context. LangChain provides the unified LLM abstraction, tool integration, and RAG pipeline.
| Requirements Phase Activity | Agent Capability |
|---|---|
| Define security requirements | Extract security-relevant requirements from PRDs, user stories, and BRDs |
| Identify compliance obligations | Match requirements against GDPR, PCI DSS, SOC2, ISO 27001, etc. |
| Initial risk analysis | Classify project risk level based on data sensitivity, exposure, and scope |
| Security requirements checklist | Generate a checklist of security requirements that must be addressed |

| Design Phase Activity | Agent Capability |
|---|---|
| Security architecture review | Evaluate architecture documents for security patterns and anti-patterns |
| Threat modeling (STRIDE/DREAD) | Automated STRIDE analysis of design documents; DREAD risk scoring |
| Access control & encryption design | Review IAM design, data-flow encryption, and key management proposals |
| Security Design Review (SDR) | Structured SDR report with findings and recommendations |

| Development Phase Activity | Agent Capability |
|---|---|
| Secure coding standards assessment | Check code/documents against OWASP Secure Coding Practices |
| SAST findings review | Triage and interpret SAST tool output; reduce false positives |
| Built-in security controls | Evaluate anti-injection, XSS prevention, and CSRF protection implementations |
| Secure coding guidance | Provide language-specific secure coding recommendations |

| Testing Phase Activity | Agent Capability |
|---|---|
| SAST report analysis | Parse and prioritize static analysis findings |
| DAST report analysis | Parse and interpret dynamic scan results |
| Penetration test review | Analyze pentest reports; map findings to controls |
| Vulnerability fix verification | Verify remediation evidence against original findings |

| Deployment Phase Activity | Agent Capability |
|---|---|
| Pre-release security review | Checklist-based review of all phase outputs |
| Configuration security | Review deployment configs, secrets management, and least privilege |
| Security hardening assessment | Evaluate server/container hardening against CIS benchmarks |
| Release sign-off | Generate a structured sign-off report with a risk summary |

| Operations Phase Activity | Agent Capability |
|---|---|
| Vulnerability monitoring | Analyze CVE feeds and vulnerability advisories against the project stack |
| Incident response assistance | Provide structured incident analysis and response recommendations |
| Patch management tracking | Track vulnerability remediation progress and SLA compliance |
| Log audit analysis | Analyze security logs for anomalies and compliance evidence |
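As an illustration of the automated threat modeling capability listed above, STRIDE candidate enumeration over data-flow-diagram elements might start from a table like the one below. The element-type mapping follows common STRIDE-per-element practice; the function and element names are illustrative, not the product's actual logic:

```python
STRIDE = {
    "S": "Spoofing", "T": "Tampering", "R": "Repudiation",
    "I": "Information Disclosure", "D": "Denial of Service",
    "E": "Elevation of Privilege",
}

# Typical STRIDE-per-element applicability (assumption: standard mapping).
APPLICABLE = {
    "external_entity": ["S", "R"],
    "process": ["S", "T", "R", "I", "D", "E"],
    "data_store": ["T", "R", "I", "D"],
    "data_flow": ["T", "I", "D"],
}

def stride_candidates(elements):
    """Enumerate per-element STRIDE threat candidates from parsed DFD elements."""
    threats = []
    for name, etype in elements:
        for code in APPLICABLE.get(etype, []):
            threats.append({"element": name, "category": STRIDE[code]})
    return threats

# Hypothetical elements extracted from an architecture document.
dfd = [("User", "external_entity"), ("API Gateway", "process"), ("Orders DB", "data_store")]
candidates = stride_candidates(dfd)
```

The Design Agent would then ask the LLM to assess each candidate for relevance and score it (e.g. DREAD), rather than emitting the raw enumeration.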
- Full lifecycle: Security coverage from day one (requirements) through production operations, not just pre-release review.
- Cost reduction: Reduce the time security staff spend on repetitive document review across all SSDLC phases.
- Speed: Shorten the cycle time at each phase; enable security review to run in parallel with development.
- Intelligence: LangGraph-orchestrated agents maintain cross-phase context; a threat identified in design is automatically tracked through testing and deployment.
- Reproducibility: Assessment logic and criteria are captured in the knowledge base, skills, and graph-based workflows.
- Openness: Support multiple commercial and local LLMs to meet data residency and cost-control requirements.
| Goal | Description |
|---|---|
| SSDLC full coverage | Provide AI-assisted security assessment across all 6 SSDLC phases, with a dedicated agent, stage-specific skills, checklists, and flows for each. |
| Intelligent orchestration | Use LangGraph to create configurable, stateful agent workflows that maintain context across SSDLC phases. |
| Automated assessment | Support automatic parsing and risk assessment of common formats: security questionnaires, design documents, SAST/DAST reports, pentest findings, deployment configs, compliance evidence, and audit reports. |
| Configurable scenarios | Use the knowledge base and Skills to configure assessment criteria by compliance framework, SSDLC phase, project type, customer type, or risk level. |
| Multi-model support | Support mainstream commercial LLMs (e.g. ChatGPT, Qwen, Claude) and local/on-prem models (e.g. Ollama) through a unified LangChain interface. |
| Actionable results | Output risk items, compliance gaps, threat models, remediation suggestions, and sign-off reports with traceability across phases. |
- SSDLC coverage: Number of SSDLC phases with active agent support (target: 6/6).
- Document coverage: Number of supported document types (e.g. 8+ common formats) and knowledge base entries per phase.
- Efficiency: Average time from upload to report generation per phase; time saved vs. manual review.
- Cross-phase traceability: Percentage of findings tracked from identification to remediation across phases.
- Usability: Steps and time to complete one "upload → assess → review → sign-off" loop per phase.
- Extensibility: Configuration/development cost to add a new SSDLC phase workflow or assessment scenario.
Full Architecture Document
For detailed component descriptions, Mermaid architecture diagrams, data-flow and sequence diagrams, integration views, security architecture, and deployment views, see:
The system uses a layered design: Access (REST API / MCP Server / CLI) → SSDLC Orchestration (LangGraph state machine with phase-specific agents, SSDLC Pipeline, Memory, Skills) → Core Services (Knowledge Base RAG, Parser) → LLM Abstraction (LangChain) → Cloud/Local LLMs. The orchestrator is built on LangChain + LangGraph, enabling stateful, graph-based agent workflows with conditional branching per SSDLC stage. Optional integrations: AAD (identity/SSO), ServiceNow (project metadata), and SAST/DAST tools (scan results ingestion).
High-Level Diagram
┌─────────────────────────────────────────────────────────┐
│                  User / Security Staff                  │
└───────────────────────────┬─────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────┐
│             Access Layer (API / MCP / CLI)              │
└───────────────────────────────────────────┬─────────────┘
                                            │
┌───────────────────────────────────────────▼───────────────────────────────────────────┐
│                            SSDLC Orchestration (LangGraph)                            │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ Require- │  │  Design  │  │   Dev    │  │   Test   │  │  Deploy  │  │   Ops    │   │
│  │  ments   │  │  Agent   │  │  Agent   │  │  Agent   │  │  Agent   │  │  Agent   │   │
│  │  Agent   │  │          │  │          │  │          │  │          │  │          │   │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘   │
│       └─────────────┴─────────────┴───────┬─────┴─────────────┴─────────────┘         │
│                                           │                                           │
│  ┌─────────────┐  ┌─────────────┐         │         ┌─────────────┐  ┌─────────────┐  │
│  │   Memory    │  │   Skills    │         │         │  KB (RAG)   │  │   Parser    │  │
│  └─────────────┘  └─────────────┘         │         └─────────────┘  └─────────────┘  │
│                               ┌───────────▼───────────┐                               │
│                               │ LLM Abstraction Layer │                               │
│                               │      (LangChain)      │                               │
│                               └───────────┬───────────┘                               │
└───────────────────────────────────────────┼───────────────────────────────────────────┘
                                            │
┌───────────────────────────────────────────▼───────────────────────────────────────────┐
│          Commercial/Cloud LLM             │             Local/On-prem LLM             │
│        ChatGPT / Claude / Qwen            │             Ollama / vLLM / ...           │
└───────────────────────────────────────────┴───────────────────────────────────────────┘
| Component | Role | Details |
|---|---|---|
| SSDLC Orchestrator | LangGraph state machine coordinating phase agents with conditional routing and shared state. | ARCHITECTURE.md § Component Design |
| SSDLC Pipeline | Stage-aware routing (6 stages); selects stage-specific skills and checklists. | ARCHITECTURE.md § SSDLC Pipeline |
| Phase Agents | Six dedicated agents, each with phase-specific prompts, tools, and evaluation criteria. | ARCHITECTURE.md § SSDLC Agents |
| Memory | Manages working, episodic, cross-phase, and semantic state via LangGraph checkpointing. | ARCHITECTURE.md § Component Design |
| Skills | Reusable assessment capabilities (e.g. threat modeling, SAST triage, compliance check, SSDLC stage skills). | ARCHITECTURE.md § Component Design |
| Knowledge Base | Multi-format ingestion, chunking, embedding, hybrid RAG (vector + graph). | ARCHITECTURE.md § Component Design |
| Parser | Converts files (PDF, Word, Excel, SAST/DAST reports, etc.) to Markdown/JSON. | ARCHITECTURE.md § Component Design |
| LLM Abstraction | LangChain unified interface for model switching. | ARCHITECTURE.md § Component Design |
| Integrations | AAD (SSO), ServiceNow (metadata), SAST/DAST tool connectors. | ARCHITECTURE.md § Integration Points |
- User submits an SSDLC assessment task (files + phase + optional SSDLC stage / skill ID / project/scenario) via API/MCP. The API returns a `task_id` immediately (non-blocking).
- (Optional) Fetch project metadata from ServiceNow.
- Parser converts files to an intermediate Markdown/text format (Docling or legacy).
- SSDLC Router determines the lifecycle stage (auto-detected or user-specified) and selects the stage-specific skill and checklist.
- LangGraph Orchestrator routes to the appropriate Phase Agent(s). Policy+History and Evidence nodes run in parallel, followed by Drafter and Reviewer nodes.
- Phase Agent(s) load Knowledge Base chunks (RAG) and Skills, then call the LLM with context.
- Generate a structured assessment report (risks, gaps, threat model, remediations, confidence, sources, SSDLC stage) with cross-phase traceability.
- Results are stored for human-in-the-loop review and sign-off. The user polls `GET /assessments/{task_id}` to retrieve the completed report.
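The non-blocking submit-then-poll contract in the flow above can be sketched as follows. Function names, the in-memory task store, and report fields are all illustrative assumptions, standing in for the real API handlers and persistence layer:

```python
import uuid

# In-memory task store (assumption); a real service would persist tasks.
TASKS = {}  # task_id -> {"status": ..., "report": ...}

def submit_assessment(files, stage=None):
    """POST /assessments equivalent: enqueue and return a task_id immediately."""
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "pending", "report": None}
    return task_id

def run_pipeline(task_id, stage="design"):
    """Worker side (stub): parse -> route -> assess -> store the report."""
    TASKS[task_id] = {"status": "completed",
                      "report": {"stage": stage, "risks": [], "gaps": []}}

def get_assessment(task_id):
    """GET /assessments/{task_id} equivalent: return current status/report."""
    return TASKS[task_id]

tid = submit_assessment(files=["architecture.pdf"])
assert get_assessment(tid)["status"] == "pending"  # returned before the work is done
run_pipeline(tid)
```

In the real system the worker step would be the async LangGraph pipeline, with FastAPI handling the two endpoints.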
| Module | Feature | Priority |
|---|---|---|
| SSDLC Orchestrator | LangGraph-based state machine with 6 phase agents and conditional routing. | P0 |
| SSDLC Orchestrator | Cross-phase state management and finding traceability. | P0 |
| SSDLC Orchestrator | Configurable workflows: sequential, parallel, or selective phase execution. | P1 |
| Requirements Agent | Analyze requirements docs for security requirements and compliance obligations. | P0 |
| Design Agent | Automated threat modeling (STRIDE/DREAD) from architecture documents. | P0 |
| Design Agent | Security Design Review (SDR) report generation. | P0 |
| Design Agent | Threat modeling integration: support PyTM exports and Mermaid.js diagrams to help the agent “see” architecture, data flows, and trust boundaries. | P1 |
| Development Agent | Secure coding assessment against OWASP standards. | P0 |
| Development Agent | SAST findings triage and interpretation. | P1 |
| Testing Agent | SAST/DAST report parsing and vulnerability prioritization. | P0 |
| Testing Agent | Penetration test report analysis and remediation tracking. | P1 |
| Deployment Agent | Pre-release security checklist and configuration review. | P0 |
| Deployment Agent | CIS benchmark assessment for hardening. | P1 |
| Operations Agent | Vulnerability monitoring and CVE analysis against project stack. | P1 |
| Operations Agent | Incident response assistance and log audit. | P2 |
| SSDLC | Auto-detect stage from document content or accept explicit stage parameter. | P1 |
| Parser | Upload Word / PDF / Excel / PPT / SAST/DAST reports and convert to JSON/Markdown. | P0 |
| Parser | OCR / Vision support for images. | P1 |
| Parser | Ingest architecture diagrams as text inputs (e.g. Mermaid.js .mmd) for Design-stage reviews. | P1 |
| Knowledge Base | Upload multi-format docs, parse, chunk, embed, and retrieve (RAG). | P0 |
| Knowledge Base | Metadata filtering (e.g. by framework, SSDLC phase, project, customer). | P1 |
| Knowledge Base | Phase-specific knowledge collections (requirements policies, design patterns, coding standards, etc.). | P0 |
| Knowledge Base | Graph RAG: Map relationships across internal policies and controls for deeper compliance insights. | P1 |
| Assessment | Select SSDLC phase and scenario, upload files, trigger assessment. | P0 |
| Assessment | Output structured report (Risks, Gaps, Threat Model, Remediation, Confidence). | P0 |
| Assessment | Human-in-the-Loop: Review, approve, reject, comment workflow. | P0 |
| Assessment | HITL feedback learning: allow auditors to correct findings and feed accepted corrections back into history/KB to reduce future false positives. | P1 |
| Assessment | Per-finding Confidence Scores + evidence links (page/paragraph citations) to speed up manual verification and benchmarking. | P1 |
| LLM | Configurable commercial LLMs (OpenAI, Claude, etc.) via LangChain. | P0 |
| LLM | Configurable local models (Ollama) via LangChain. | P0 |
| Skill | Skill/Persona Management: Create custom roles and import templates. | P0 |
| Skill | Built-in personas per SSDLC phase (e.g. Threat Modeler, Secure Code Reviewer, Pentest Analyst, SOC2 Auditor, AppSec Engineer). | P0 |
| Memory | LangGraph checkpointing for cross-phase state persistence. | P0 |
| Memory | History Reuse: Retrieve past similar assessments. | P1 |
| Access | REST API + MCP Server for agent integration. | P0 |
| Integrations | ServiceNow: Read project metadata. | P0 |
| Integrations | ServiceNow: Write back results / Webhook trigger. | P1 |
| Integrations | SAST/DAST tool connectors (SonarQube, Checkmarx, Burp, etc.). | P1 |
| Integrations | Automated remediation tracking: create and sync remediation items to Jira or GitHub Issues (ticket links in report). | P1 |
| IAM | AAD (Azure AD) Login & SSO. | P0 |
| IAM | RBAC (Analyst, Lead, Project Owner, Admin, API Consumer). | P0 |
| IAM | API Authentication (Bearer Token / API Key). | P0 |
| IAM | Data isolation by project/role. | P0 |
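The stage auto-detection feature (P1) in the table above could start from a simple keyword heuristic before involving the LLM. The keyword mapping and default below are illustrative assumptions, not the product's actual detection logic:

```python
# Illustrative keyword signals per SSDLC stage.
STAGE_KEYWORDS = {
    "requirements": ["user story", "acceptance criteria", "business requirement"],
    "design": ["architecture", "data flow", "trust boundary", "threat model"],
    "development": ["pull request", "code review", "sast"],
    "testing": ["dast", "penetration test", "vulnerability scan"],
    "deployment": ["terraform", "kubernetes", "release checklist"],
    "operations": ["cve", "incident", "patch"],
}

def detect_stage(text: str, default: str = "design") -> str:
    """Score each stage by keyword hits; fall back to `default` on no signal."""
    text = text.lower()
    scores = {stage: sum(text.count(kw) for kw in kws)
              for stage, kws in STAGE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```

A production version would combine this with document-type metadata and an LLM classification pass, and always allow the explicit stage parameter to override the guess.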
- As a security team member, I want to upload a project's requirements document (or a Security Questionnaire) and have the Requirements Agent automatically identify missing security requirements, compliance obligations, and gaps vs. policy/standards so that I can provide early feedback before design begins.
- As a security architect, I want to submit an architecture document to the Design Agent and receive an automated STRIDE threat model so that I can focus on reviewing and validating threats rather than creating the initial model from scratch.
- As a security lead, I want to run a full SSDLC assessment across multiple phases for a project (or select a project from ServiceNow) so that I get a unified view of security posture from requirements through deployment.
- As a developer, I want to submit my code review package and SAST results to the Development Agent via REST API so that I get prioritized findings with secure coding guidance specific to my language and framework, in JSON format for integration into ticketing workflows.
- As a pentest manager, I want to upload penetration test reports to the Testing Agent so that findings are automatically mapped to the original threat model and remediation is tracked.
- As an operations engineer, I want the Operations Agent to analyze new CVE feeds against our deployment stack and evaluate incident response logs so that I know which vulnerabilities require immediate patching and can identify process gaps.
- As enterprise IT, I want to configure the platform to use only a local Ollama model so that all assessment data stays within the internal network.
- As a DevSecOps engineer, I want to integrate the assessment API into our CI/CD pipeline so that security checks run automatically at each stage.
- As a project manager, I want the Agent to auto-detect the SSDLC stage from the uploaded document type so that I don't need to manually specify it every time.
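For the developer story above, the JSON output might look like the sketch below. This is a hypothetical shape only; the concrete report schema is explicitly listed as an open question later in this PRD, and every field name here is an assumption:

```python
import json

# Hypothetical phase assessment report (illustrative schema).
report = {
    "task_id": "a1b2c3",
    "ssdlc_stage": "development",
    "findings": [
        {
            "id": "DEV-001",
            "severity": "high",
            "confidence": 0.82,
            "title": "SQL string concatenation in order lookup",
            "evidence": {"file": "orders.py", "citation": "p.3, para. 2"},
            "remediation": "Use parameterized queries.",
            "traces_to": ["T-001"],  # link back to a design-phase threat
        }
    ],
    "sign_off": {"status": "pending_review"},
}

print(json.dumps(report, indent=2))
```

The `confidence` and `evidence` fields correspond to the per-finding confidence scores and citation links planned in the feature list; `traces_to` carries the cross-phase traceability.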
The 6 standard SSDLC stages (aligned with NIST, OWASP, and Microsoft SDL):
| Stage | Name (EN) | 阶段名称 (CN) | Key Activities | Typical Inputs |
|---|---|---|---|---|
| 1 | Requirements | 需求阶段 | Define security requirements, compliance mapping (GDPR, ISO 27001, etc.), initial threat modeling, risk analysis | Requirements docs, compliance checklists, regulatory references |
| 2 | Design | 设计阶段 | Security architecture design, permission/access model, encryption scheme, threat modeling (STRIDE/DREAD), Security Design Review (SDR) | Architecture docs, design specs, threat models, data flow diagrams |
| 3 | Development | 开发阶段 | Secure coding standards compliance, security training verification, built-in security controls (anti-injection, XSS prevention, input validation) | Source code, coding guidelines, code review reports |
| 4 | Testing | 测试阶段 | SAST (static analysis), DAST (dynamic scanning), penetration testing, vulnerability fix & verification | SAST/DAST reports, pen-test findings, vulnerability scan results |
| 5 | Deployment | 部署阶段 | Security release readiness review, configuration security (key management, least privilege), hardening checklist | Deployment configs, infrastructure-as-code, release checklists |
| 6 | Operations | 运维阶段 | Vulnerability monitoring, incident response evaluation, patch management, log audit, ongoing compliance | Monitoring alerts, incident reports, audit logs, patch records |
Each stage maps to one or more built-in SSDLC Skills that define stage-specific system_prompt, risk_focus, compliance_frameworks, and assessment checklists. Users can also create custom SSDLC skills.
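A minimal sketch of such a stage skill, using the field names from the paragraph above; the class shape and the example values are illustrative, not the shipped contract:

```python
from dataclasses import dataclass, field

@dataclass
class SSDLCSkill:
    """Stage-specific skill definition (sketch; field names follow the PRD)."""
    stage: str
    system_prompt: str
    risk_focus: list
    compliance_frameworks: list
    checklist: list = field(default_factory=list)

# Hypothetical built-in skill for the Design stage SDR.
design_sdr = SSDLCSkill(
    stage="design",
    system_prompt="You are a security architect performing a Security Design Review.",
    risk_focus=["authentication", "encryption in transit/at rest", "trust boundaries"],
    compliance_frameworks=["ISO 27001", "PCI DSS"],
    checklist=["Threat model present?", "Key management defined?",
               "Least privilege enforced?"],
)
```

Custom skills would be instances of the same contract, loaded from config rather than code, per the maintainability requirement.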
| Category | Requirement |
|---|---|
| Security & Privacy | Support fully local/on-prem deployment and local LLMs; support audit logs. |
| Performance | Acceptable end-to-end latency for single-phase assessment; parallel phase execution for full SSDLC runs. |
| Maintainability | KB, Skills, LangGraph workflows, and LLM config are maintainable via config/API without code changes. |
| Observability | Log model usage, tokens, duration, errors, and agent state transitions. |
| Auth & Isolation | RBAC and data isolation by project/role; fine-grained authorization via AAD/ServiceNow. |
| Deployment | Support on-prem/private deployment; connectivity to AAD/ServiceNow/LLM/SAST/DAST tools. |
| Open Source | Architecture aligns with mainstream open-source agent projects (LangChain/LangGraph ecosystem) to ease community contribution. |
This section defines security controls for the system itself (not the documents being assessed).
7.2.1 Control Domains
- IAM: Identity and Access Control
- DATA: Data Security
- APP: Application Security
- OPS: Operations and Audit
- SCM: Supply Chain and Open Source
7.2.2 Identity and Access Control
- IAM-01: All user/integration endpoints must require authentication (except health checks).
- IAM-02: Strong auth: AAD/OIDC SSO; API Bearer JWT or API Key (no secrets in URL).
- IAM-03: RBAC with least privilege default.
- IAM-04: Session/Token timeout and revocation.
- IAM-05: Sensitive operations (e.g. delete KB, modify workflows) require confirmation or higher privilege.
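A deny-by-default sketch of IAM-03 and IAM-05; the role names match the RBAC feature list, while the permission strings are illustrative assumptions:

```python
# Illustrative role-to-permission mapping (least privilege: deny by default).
ROLE_PERMISSIONS = {
    "analyst": {"assessment:read", "assessment:create"},
    "lead": {"assessment:read", "assessment:create", "assessment:sign_off"},
    "admin": {"assessment:read", "assessment:create", "assessment:sign_off",
              "kb:delete", "workflow:modify"},
}

def authorize(role: str, permission: str) -> bool:
    """Unknown roles or permissions grant nothing (IAM-03)."""
    return permission in ROLE_PERMISSIONS.get(role, set())

assert authorize("lead", "assessment:sign_off")
# IAM-05: destructive operations require a higher-privileged role.
assert not authorize("analyst", "kb:delete")
```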
7.2.3 Data Security
- DATA-01: TLS (1.2+) for all transport.
- DATA-02: Encryption at rest for sensitive data; secrets management (no plaintext in code).
- DATA-03: Data minimization and retention policy.
- DATA-04: PII handling compliance (access control, audit).
- DATA-05: Clarify LLM data usage (cloud vs. local) for data sovereignty.
7.2.4 Application Security
- APP-01: Input validation (file type, size, path traversal).
- APP-02: Injection prevention (prompt injection mitigation, SQLi/Command injection).
- APP-03: Dependency scanning (SCA) and updates.
- APP-04: Safe error handling (no stack traces to users).
- APP-05: Web protections (CSRF, security headers, rate limiting).
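A minimal sketch of the APP-01 checks (file type, size, path traversal); the allowed extensions and the size limit are illustrative assumptions, since concrete limits are an open question in this PRD:

```python
from pathlib import PurePosixPath

# Illustrative allow-list and limit (assumptions, not shipped defaults).
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".pptx", ".md", ".json", ".mmd"}
MAX_SIZE_BYTES = 50 * 1024 * 1024

def validate_upload(filename: str, size: int) -> bool:
    """APP-01 sketch: reject path traversal, unexpected types, oversized files."""
    name = PurePosixPath(filename)
    if ".." in name.parts or name.is_absolute():
        return False  # path traversal attempt
    if name.suffix.lower() not in ALLOWED_EXTENSIONS:
        return False  # unexpected file type
    return 0 < size <= MAX_SIZE_BYTES

assert validate_upload("threat-model.pdf", 1024)
assert not validate_upload("../../etc/passwd", 10)
```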
7.2.5 Operations and Audit
- OPS-01: Audit logs (who, what, when, resource) protected from tampering.
- OPS-02: Operational logs (performance, errors, agent state transitions) without sensitive content.
- OPS-03: Security event detection and alerting.
- OPS-04: Backup and recovery for critical data (KB, assessment history, LangGraph checkpoints).
7.2.6 Supply Chain
- SCM-01: Trusted dependency sources.
- SCM-02: Vulnerability management process.
- SCM-03: License compliance.
| Component | Technology | Purpose |
|---|---|---|
| Workflow Engine | LangGraph | Stateful, graph-based agent orchestration with conditional routing, parallel execution, and checkpointing |
| LLM Framework | LangChain | Unified LLM abstraction, prompt management, tool integration, RAG chains |
| State Management | LangGraph Checkpointing | Cross-phase state persistence, conversation memory, assessment context |
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.10+ | Primary development language |
| Web/API | FastAPI | Async REST API with auto OpenAPI |
| Vector DB | ChromaDB | Chunk-level similarity search |
| Graph RAG | LightRAG | Entity-relationship aware retrieval |
| Embeddings | sentence-transformers | Vector embeddings for RAG |
| Parsing | Docling (primary) + legacy fallback | Multi-format document parsing |
| LLM Providers | OpenAI, Ollama | Cloud and local LLM support |
- LangGraph Documentation: Reference for stateful agent orchestration, conditional routing, and multi-agent patterns.
- LangChain Documentation: Reference for LLM abstraction, RAG patterns, and tool integration.
- NIST SSDF (Secure Software Development Framework): Reference for SSDLC phase definitions and security activities.
- OWASP SAMM (Software Assurance Maturity Model): Reference for security practice areas across the SDLC.
- Microsoft SDL: Reference for security development lifecycle practices.
- STRIDE/DREAD: Reference for threat modeling methodology.
- LangGraph Integration: Implement LangGraph state machine with phase agent nodes, conditional edges, and shared state.
- Phase Agent MVP: Implement Requirements and Design phase agents first (highest Shift-Left value).
- Knowledge Base per Phase: Build phase-specific knowledge collections (requirements policies, design patterns, coding standards, testing guides, deployment checklists, operations playbooks).
- SAST/DAST Connectors: Build parsers for common tool output formats (SonarQube, Checkmarx, Burp Suite, OWASP ZAP).
- Cross-Phase Traceability: Implement finding linkage from threat model → test case → deployment check → monitoring rule.
- Enterprise Integration: Align with IT on AAD registration and ServiceNow API access.
- Pilot: Run with 1-2 teams across a full SSDLC cycle to gather feedback.
- Open Source: Release as "DocSentinel" after MVP stabilization.
- LangGraph Workflow Schema: How to define and persist custom SSDLC workflow configurations?
- Phase Agent Granularity: Should each phase have a single agent or multiple sub-agents (e.g. Design → Threat Modeler + Architecture Reviewer)?
- SAST/DAST Integration: Which tool output formats to support first? Standard SARIF format?
- Cross-Phase State: How much context to carry between phases? Full report or summarized findings?
- Report Schema: Concrete JSON schema for phase-specific and cross-phase findings?
- Skill Contract: Input/output for the first phase-specific Skills?
- KB Partitioning: Separate vector collections per SSDLC phase or unified with metadata filtering?
- Limits: File size, concurrency, rate limits per phase?
- Technology & Architecture: `docs/01-architecture-and-tech-stack.md`
- API Specification: `docs/02-api-specification.yaml`
- Report & Skill Contract: `docs/03-assessment-report-and-skill-contract.md`
- Integration Guide: `docs/04-integration-guide.md`
- Deployment Runbook: `docs/05-deployment-runbook.md`
- Agent Integration (MCP): `docs/06-agent-integration.md`
- SSDLC Workflow Guide: `docs/07-ssdlc-workflow-guide.md` (new)
- Security Implementation: `SECURITY.md` and secure coding guidelines.
End of Document