| Level | Badge | Cloud Behavior | Applied To |
|---|---|---|---|
| standard | — | Full content sent per routing table | Default for all dirs |
| sensitive | 🟡 | Only entity names/types/metadata to cloud. Full text local. | `cortex privacy set <dir> sensitive` |
| restricted | 🔒 | BLOCKED. Zero data to cloud. Ever. | `cortex privacy set <dir> restricted` |

Classification is per-directory, inherited by files. Most restrictive wins.
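
One way to read "per-directory, inherited by files, most restrictive wins" is as a lookup over every classified directory that contains a file. The sketch below assumes classifications are held in a plain object keyed by absolute, normalized POSIX-style directory paths; `resolveLevel` and that storage shape are hypothetical and are not taken from Cortex's code.

```typescript
type PrivacyLevel = "standard" | "sensitive" | "restricted";

// Higher rank means more restrictive.
const RANK: Record<PrivacyLevel, number> = { standard: 0, sensitive: 1, restricted: 2 };

// Hypothetical helper: resolve a file's level from the classified
// directories that contain it. The default level is "standard".
function resolveLevel(
  filePath: string,
  classified: Record<string, PrivacyLevel>,
): PrivacyLevel {
  let level: PrivacyLevel = "standard";
  for (const dir of Object.keys(classified)) {
    // Compare against "dir/" so "/a/work" does not match "/a/workshop".
    const prefix = dir.endsWith("/") ? dir : dir + "/";
    const dirLevel = classified[dir];
    if (filePath.startsWith(prefix) && RANK[dirLevel] > RANK[level]) {
      level = dirLevel; // the most restrictive ancestor wins
    }
  }
  return level;
}

const classifiedDirs: Record<string, PrivacyLevel> = {
  "/home/me/work": "sensitive",
  "/home/me/work/clients": "restricted",
  "/home/me/work/clients/public": "standard",
};

console.log(resolveLevel("/home/me/work/clients/public/a.md", classifiedDirs));
```

Under this reading, a `standard` subdirectory cannot loosen a `restricted` parent, so the call above resolves to `restricted`.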
Status: Planned. Auto-classification is not yet implemented. Files matching these patterns are excluded from ingestion (see Auto-Excluded list below), but privacy levels are not automatically assigned based on file type. Use `cortex privacy set <dir> <level>` to classify manually.

- `.env*`, `*.pem`, `*.key` → `restricted`
- `*.sqlite`, `*.db` → `sensitive`
- `docker-compose*.yml` with env vars → `sensitive`

Runs BEFORE every cloud API call, in order:

1. Privacy check: Is content from restricted/sensitive directory? → Block or redact
2. Secret scan: Regex patterns for AWS keys, Anthropic keys, GitHub tokens, passwords, connection strings → Replace with `[SECRET_REDACTED]`
3. PII detection: Email, phone, SSN, credit card patterns → Replace with `[PII_REDACTED]`
4. Size validation: Max 50KB payload per request

If any check fails critically, the request is blocked (not queued, not retried to cloud).
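
The ordering described above can be sketched as a single function that either returns a scrubbed payload or blocks the request. This is an illustration of the described design, not the shipped code: `preTransmissionCheck`, `redact`, the result shape, and the specific regular expressions are all assumptions, and the patterns cover only a small subset of the secret and PII types listed.

```typescript
type Level = "standard" | "sensitive" | "restricted";

type CheckResult =
  | { ok: true; text: string; redactions: number }
  | { ok: false; reason: string };

const MAX_PAYLOAD_BYTES = 50 * 1024; // 50KB, taken here as 50 * 1024 bytes

// Illustrative patterns only; a real scanner would need a broader set.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g, // AWS access key ID
  /ghp_[A-Za-z0-9]{36}/g, // GitHub personal access token
];
const PII_PATTERNS: RegExp[] = [
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email address
  /\b\d{3}-\d{2}-\d{4}\b/g, // US SSN
];

// Replace every match of every pattern with the marker and count the hits.
function redact(text: string, patterns: RegExp[], marker: string) {
  let count = 0;
  let out = text;
  for (const pattern of patterns) {
    out = out.replace(pattern, () => {
      count += 1;
      return marker;
    });
  }
  return { text: out, count };
}

function preTransmissionCheck(input: string, level: Level): CheckResult {
  // 1. Privacy check: block restricted content, redact sensitive content.
  if (level === "restricted") {
    return { ok: false, reason: "restricted content is never sent to cloud" };
  }
  const text = level === "sensitive" ? "[REDACTED]" : input;

  // 2. Secret scan, then 3. PII detection.
  const secrets = redact(text, SECRET_PATTERNS, "[SECRET_REDACTED]");
  const pii = redact(secrets.text, PII_PATTERNS, "[PII_REDACTED]");

  // 4. Size validation on the UTF-8 encoded payload.
  const bytes = new TextEncoder().encode(pii.text).length;
  if (bytes > MAX_PAYLOAD_BYTES) {
    return { ok: false, reason: `payload is ${bytes} bytes, limit is ${MAX_PAYLOAD_BYTES}` };
  }
  return { ok: true, text: pii.text, redactions: secrets.count + pii.count };
}
```

A blocked result is returned to the caller as a failure; nothing in this sketch queues or retries the request.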
Implementation status:

- Step 1 (privacy check): Enforced during ingestion and query paths. `restricted` entities are excluded from query context entirely. `sensitive` entities have their `content` redacted to `[REDACTED]` before being sent to cloud LLMs. Applies to all query entry points (REST API, CLI, MCP).
- Step 2 (secret scan): Implemented. `secretPatterns` from config are compiled at pipeline construction and applied via `scrubSecrets()` before cloud LLM transmission (standard privacy level only; sensitive/restricted use local provider). Matches are replaced with `[SECRET_REDACTED]`.
- Step 3 (PII detection): Planned for future release.
- Step 4 (size validation): Not yet implemented.

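
The Step 1 behavior described above (drop `restricted` entities, redact `sensitive` ones) might look roughly like the following when assembling query context for a cloud model. The `Entity` shape, its `privacy` field, and `prepareForCloud` are hypothetical names used only for illustration; the real entity model is not shown in this document.

```typescript
type Level = "standard" | "sensitive" | "restricted";

// Hypothetical entity shape; only the fields needed for the sketch.
interface Entity {
  name: string;
  type: string;
  summary: string;
  content: string;
  properties: Record<string, unknown>;
  privacy: Level;
}

function prepareForCloud(entities: Entity[]): Entity[] {
  return entities
    // restricted entities never enter the cloud-bound context
    .filter((e) => e.privacy !== "restricted")
    // sensitive entities keep name/type/summary but lose content and properties
    .map((e) =>
      e.privacy === "sensitive"
        ? { ...e, content: "[REDACTED]", properties: {} }
        : e,
    );
}

const sample: Entity[] = [
  { name: "Roadmap", type: "doc", summary: "Q3 plan", content: "full text", properties: { owner: "me" }, privacy: "standard" },
  { name: "Payroll", type: "doc", summary: "Salaries", content: "numbers", properties: { rows: 40 }, privacy: "sensitive" },
  { name: "Keys", type: "doc", summary: "Secrets", content: "tokens", properties: {}, privacy: "restricted" },
];

const cloudContext = prepareForCloud(sample);
console.log(cloudContext.map((e) => `${e.name}: ${e.content}`));
```

Filtering before mapping keeps the invariant simple: anything that survives `prepareForCloud` is either `standard` or already redacted.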
When sensitive content needs cloud processing, send a redacted version:
```typescript
interface RedactedEntity {
  name: string;          // kept
  type: EntityType;      // kept
  summary: string;       // kept (metadata level)
  content: '[REDACTED]'; // replaced
  properties: {};        // stripped
}
```

Status: Planned. Transmission logging is not yet implemented. The `cortex privacy log` command exists but returns empty results. The interface below describes the target design.

Every cloud API call will be logged to `~/.cortex/transmission.log` (chmod 600):

```typescript
interface TransmissionLogEntry {
  id: string;
  timestamp: string;
  provider: string;
  model: string;
  task: LLMTask;
  requestSizeBytes: number;
  sourceFiles: string[];
  privacyLevels: string[];
  redactionsApplied: number;
  secretsDetected: number;
  status: 'sent' | 'blocked' | 'error';
}
```

Will be viewable via `cortex privacy log`.
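
Since transmission logging is still planned, the following is only a sketch of how such entries could be appended and read back. It assumes a JSON Lines file (one entry per line), which the document does not specify; `logTransmission`, `readLog`, and the `LLMTask` alias are hypothetical, and the demo writes to a temporary directory rather than `~/.cortex/`.

```typescript
import { appendFileSync, chmodSync, mkdtempSync, readFileSync } from "node:fs";
import { randomUUID } from "node:crypto";
import { tmpdir } from "node:os";
import { join } from "node:path";

type LLMTask = string; // placeholder; the real task type is not shown here

interface TransmissionLogEntry {
  id: string;
  timestamp: string;
  provider: string;
  model: string;
  task: LLMTask;
  requestSizeBytes: number;
  sourceFiles: string[];
  privacyLevels: string[];
  redactionsApplied: number;
  secretsDetected: number;
  status: "sent" | "blocked" | "error";
}

// Append one entry as a single JSON line and keep the file owner-only.
function logTransmission(
  logPath: string,
  entry: Omit<TransmissionLogEntry, "id" | "timestamp">,
): TransmissionLogEntry {
  const full: TransmissionLogEntry = {
    id: randomUUID(),
    timestamp: new Date().toISOString(),
    ...entry,
  };
  appendFileSync(logPath, JSON.stringify(full) + "\n", { mode: 0o600 });
  chmodSync(logPath, 0o600); // the mode option only applies on creation
  return full;
}

// Read entries back, roughly what a log viewer would need to do.
function readLog(logPath: string): TransmissionLogEntry[] {
  return readFileSync(logPath, "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as TransmissionLogEntry);
}

const demoLog = join(mkdtempSync(join(tmpdir(), "cortex-log-")), "transmission.log");
const written = logTransmission(demoLog, {
  provider: "anthropic",
  model: "example-model", // hypothetical model name
  task: "extraction",
  requestSizeBytes: 2048,
  sourceFiles: ["notes/plan.md"],
  privacyLevels: ["standard"],
  redactionsApplied: 1,
  secretsDetected: 1,
  status: "blocked",
});
```

Logging blocked requests as well as sent ones is what makes `status` useful for auditing.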
`node_modules/`, `.git/objects/`, `dist/`, `build/`, `out/`, `.env*`, `*.key`, `*.pem`, `*.min.js`, `*.min.css`, `package-lock.json`, `yarn.lock`, `__pycache__/`, `*.pyc`, `.DS_Store`, `Thumbs.db`, `~/.cortex/*.db`

Status: Partially implemented. `cortex.config.json` is written with `600` permissions. Directory and DB permissions are not yet explicitly set; they inherit from umask. Backup files (`cortex.db.backup`) do not have explicit permissions. See issue #12.

- `~/.cortex/` directory: `700` (planned)
- `cortex.db`: `600` (planned)
- `cortex.config.json`: `600` (implemented)
- `transmission.log`: `600` (planned; logging not yet implemented)
- Log files: `600` (planned)

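
For the planned items, explicit permissions could be set with Node's `fs` module instead of relying on umask. The helpers below (`secureDir`, `writeSecureFile`, `modeOf`) are hypothetical names, and the demo uses a temporary directory in place of `~/.cortex/`. The permission bits are only meaningful on POSIX systems.

```typescript
import { chmodSync, mkdirSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Create the directory if needed, then force owner-only access (700).
function secureDir(dir: string): void {
  mkdirSync(dir, { recursive: true, mode: 0o700 });
  chmodSync(dir, 0o700); // mkdir's mode is masked by umask and skipped if dir exists
}

// Write the file, then force owner read/write only (600).
function writeSecureFile(path: string, data: string): void {
  writeFileSync(path, data, { mode: 0o600 });
  chmodSync(path, 0o600); // the mode option only applies when the file is created
}

// Permission bits of a path, for example 0o600.
const modeOf = (path: string): number => statSync(path).mode & 0o777;

const demoRoot = join(mkdtempSync(join(tmpdir(), "cortex-perm-")), ".cortex");
const demoConfig = join(demoRoot, "cortex.config.json");
secureDir(demoRoot);
writeSecureFile(demoConfig, "{}\n");
console.log(modeOf(demoRoot).toString(8), modeOf(demoConfig).toString(8));
```

Calling `chmodSync` after creation matters for pre-existing files such as a database or backup created before the permissions were tightened.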
- Max file size: 10MB (configurable)
- Max files per directory: 10,000
- Max total watched files: 50,000
- Max parse time per file: 30s
- Max LLM extraction time: 60s
- Max context window per query: 50,000 tokens
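
As a rough illustration, the limits above could be carried in one typed object and checked before a file is admitted to the watcher. `ResourceLimits`, `DEFAULT_LIMITS`, and `admitFile` are hypothetical names, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```typescript
interface ResourceLimits {
  maxFileSizeBytes: number;
  maxFilesPerDirectory: number;
  maxTotalWatchedFiles: number;
  maxParseTimeMs: number;
  maxExtractionTimeMs: number;
  maxContextTokens: number;
}

const DEFAULT_LIMITS: ResourceLimits = {
  maxFileSizeBytes: 10 * 1024 * 1024, // 10MB (configurable)
  maxFilesPerDirectory: 10000,
  maxTotalWatchedFiles: 50000,
  maxParseTimeMs: 30000,
  maxExtractionTimeMs: 60000,
  maxContextTokens: 50000,
};

// Returns null when the file may be ingested, otherwise the reason it may not.
function admitFile(
  sizeBytes: number,
  filesInDirectory: number,
  totalWatched: number,
  limits: ResourceLimits = DEFAULT_LIMITS,
): string | null {
  if (sizeBytes > limits.maxFileSizeBytes) return "file exceeds max file size";
  if (filesInDirectory >= limits.maxFilesPerDirectory) return "directory is at its file limit";
  if (totalWatched >= limits.maxTotalWatchedFiles) return "watcher is at its total file limit";
  return null;
}
```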
- `https://api.anthropic.com` (Anthropic API)
- `https://api.openai.com` (OpenAI API)
- `http://localhost:11434` (Ollama, local only)

No analytics, no update checks, no phone-home. Ever.
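
An outbound allowlist like the one above can be enforced by comparing URL origins before any request is made. This is a sketch under the assumption that all outbound calls pass through one choke point; `isAllowedUrl` is a hypothetical helper.

```typescript
const ALLOWED_ORIGINS = new Set<string>([
  "https://api.anthropic.com",
  "https://api.openai.com",
  "http://localhost:11434",
]);

// Compare the parsed origin (scheme + host + port), not a string prefix,
// so "https://api.openai.com.evil.example" is rejected.
function isAllowedUrl(raw: string): boolean {
  try {
    return ALLOWED_ORIGINS.has(new URL(raw).origin);
  } catch {
    return false; // unparseable URLs are rejected
  }
}

console.log(isAllowedUrl("https://api.anthropic.com/v1/messages"));
console.log(isAllowedUrl("https://api.openai.com.evil.example/v1"));
```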
- REST API: `127.0.0.1:3710` (localhost only by default)
- WebSocket: `127.0.0.1:3710`
- Authentication: Bearer token auth on all `/api/v1/*` routes and WebSocket connections. Enforced automatically when `server.host` is non-localhost. Can be enabled explicitly via `server.auth.enabled = true` or `CORTEX_SERVER_AUTH_TOKEN` env var. Tokens are compared using `timingSafeEqual`. When serving on a non-localhost host without a configured token, one is auto-generated and saved to `~/.cortex/.env`.
- WebSocket auth uses query string: `ws://host:port/ws?token=<token>`
- No rate limiting on API endpoints yet. See issue #9.

Priority: OS Keychain > Environment Variable > File reference > Raw in config (warn)
Implementation status:

- `env:VAR_NAME` format: Implemented and recommended.
- OS Keychain (`keychain:ENTRY`): Planned for future release.
- File reference (`file:PATH`): Planned for future release.
- Raw key detection/warning: Not yet implemented. Raw keys in `apiKeySource` silently fail (return `undefined`). See issue #15.
- Startup API key validation via minimal API call: Implemented in `isAvailable()`, but this incurs real API cost on every call without caching. See issue #11.

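
To make the source formats concrete, here is one possible shape for resolving `apiKeySource` that reports why resolution failed instead of returning `undefined`. It is a sketch of what a fix for the silent-failure case might look like, not the current behavior; `resolveApiKey` and `KeyResolution` are hypothetical, and the environment is passed in explicitly so the function can be exercised without touching `process.env`.

```typescript
type KeyResolution =
  | { ok: true; key: string }
  | { ok: false; reason: string };

function resolveApiKey(
  source: string,
  env: Record<string, string | undefined>,
): KeyResolution {
  // env:VAR_NAME is the implemented, recommended format.
  if (source.startsWith("env:")) {
    const name = source.slice("env:".length);
    const value = env[name];
    return value
      ? { ok: true, key: value }
      : { ok: false, reason: `environment variable ${name} is not set` };
  }
  // keychain: and file: are planned, so they are recognized but refused.
  if (source.startsWith("keychain:") || source.startsWith("file:")) {
    const scheme = source.split(":")[0];
    return { ok: false, reason: `${scheme} sources are planned but not yet supported` };
  }
  // Anything else is treated as a raw key pasted into the config.
  return { ok: false, reason: "apiKeySource looks like a raw key; use env:VAR_NAME instead" };
}

// MY_PROVIDER_KEY is a made-up variable name for the demo.
console.log(resolveApiKey("env:MY_PROVIDER_KEY", { MY_PROVIDER_KEY: "sk-demo" }));
```

In real use the second argument would be `process.env`.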
- Structured output: JSON schema + Zod validation rejects non-conforming output
- Content isolation: `---CONTENT START---` / `---CONTENT END---` delimiters
- Output validation: Entity types from enum, confidence 0.0-1.0, relationship types valid
- No execution: Code is stored as text summaries, never executed
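
The content isolation and output validation defenses above can be illustrated as follows. The document names Zod for schema validation; to stay dependency-free this sketch uses hand-written checks instead. The entity type list is a placeholder (the real enum is not shown here), stripping delimiter look-alikes from untrusted content is an added assumption, and `wrapContent`, `validateEntity`, and `parseModelOutput` are hypothetical names.

```typescript
const CONTENT_START = "---CONTENT START---";
const CONTENT_END = "---CONTENT END---";

// Wrap untrusted file content so the prompt can tell data from instructions.
function wrapContent(content: string): string {
  const safe = content
    .split(CONTENT_START).join("[delimiter removed]")
    .split(CONTENT_END).join("[delimiter removed]");
  return `${CONTENT_START}\n${safe}\n${CONTENT_END}`;
}

// Placeholder enum values for illustration only.
const ENTITY_TYPES = ["person", "project", "concept"] as const;
type EntityType = (typeof ENTITY_TYPES)[number];

interface ExtractedEntity {
  name: string;
  type: EntityType;
  confidence: number;
}

function validateEntity(raw: unknown): ExtractedEntity | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (typeof r.name !== "string" || r.name.length === 0) return null;
  if (typeof r.type !== "string") return null;
  if ((ENTITY_TYPES as readonly string[]).indexOf(r.type) === -1) return null;
  // The negated range check also rejects NaN.
  if (typeof r.confidence !== "number" || !(r.confidence >= 0 && r.confidence <= 1)) return null;
  return { name: r.name, type: r.type as EntityType, confidence: r.confidence };
}

// Reject the whole response if it is not a JSON array of valid entities.
function parseModelOutput(text: string): ExtractedEntity[] | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    return null;
  }
  if (!Array.isArray(parsed)) return null;
  const entities: ExtractedEntity[] = [];
  for (const item of parsed) {
    const entity = validateEntity(item);
    if (entity === null) return null;
    entities.push(entity);
  }
  return entities;
}
```

Rejecting the entire response on the first invalid item is the stricter of two reasonable choices; dropping only the invalid items would be more forgiving but lets partially manipulated output through.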