Work-In-Progress-For-Health/malware-scanner

Architecture Design Overview: Malware Scanner Service

Document Details

Document author: Mark Evans

Document version: v0.1

Status: Draft

Approved by: …

Date approved: …

Review date: 2025-11-14

Template version: 1.2

Revision History

| Date | Version | Author | Revision Summary |
| --- | --- | --- | --- |
| 2025-11-14 | v0.1 | Mark Evans | Initial draft |

Approval/Scrutiny History

| Committee or Group | Date | Outcome |
| --- | --- | --- |
| | yyyy-mm-dd | Draft |

1. Introduction and Goals

The Malware Scanner Service provides asynchronous malware scanning for uploaded documents. Documents are stored in MinIO buckets, and a message is sent to RabbitMQ to trigger scanning by a Java Spring Boot worker. The goal is to provide a reliable, scalable, isolated malware‑scanning process that integrates cleanly with surrounding systems.

1.1 Requirements Overview

  • Accept file-upload notifications
  • Retrieve file objects for scanning
  • Scan files for malware
  • Emit clean/infected results
  • Quarantine infected files
  • Operate asynchronously and at scale
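
The requirements above can be read as a single handling step per notification: fetch the object, scan it, quarantine it if infected, and emit a result. The following is a minimal sketch of that step, not the service's actual code. `InMemoryStore`, `ScanPipeline`, the result string format, and the choice to quarantine by moving the object into a separate bucket are all assumptions made for illustration; a real worker would call MinIO and RabbitMQ where this sketch uses in-memory stand-ins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for object storage; a real worker would call MinIO here. */
final class InMemoryStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    void put(String bucket, String key, byte[] data) {
        objects.put(bucket + "/" + key, data);
    }

    /** Returns the object bytes, or null if the object does not exist. */
    byte[] get(String bucket, String key) {
        return objects.get(bucket + "/" + key);
    }

    /** Moves an object between buckets, keeping its key. */
    void move(String fromBucket, String key, String toBucket) {
        byte[] data = objects.remove(fromBucket + "/" + key);
        if (data == null) {
            throw new IllegalStateException("No such object: " + fromBucket + "/" + key);
        }
        objects.put(toBucket + "/" + key, data);
    }
}

/** Hypothetical orchestration of one scan: retrieve, scan, quarantine if needed, emit result. */
final class ScanPipeline {
    private final InMemoryStore store;
    private final Predicate<byte[]> isInfected;               // stand-in for the malware engine
    private final String quarantineBucket;                    // assumed quarantine location
    private final List<String> published = new ArrayList<>(); // stand-in for the result queue

    ScanPipeline(InMemoryStore store, Predicate<byte[]> isInfected, String quarantineBucket) {
        this.store = store;
        this.isInfected = isInfected;
        this.quarantineBucket = quarantineBucket;
    }

    /** Handles one upload notification and returns the emitted result. */
    String handle(String bucket, String key) {
        byte[] data = store.get(bucket, key);
        if (data == null) {
            throw new IllegalStateException("No such object: " + bucket + "/" + key);
        }
        boolean infected = isInfected.test(data);
        if (infected) {
            store.move(bucket, key, quarantineBucket);
        }
        String result = (infected ? "INFECTED " : "CLEAN ") + bucket + "/" + key;
        published.add(result);
        return result;
    }

    /** Returns a snapshot of every result emitted so far. */
    List<String> published() {
        return List.copyOf(published);
    }
}
```

Passing the engine in as a `Predicate<byte[]>` keeps the sketch independent of any particular scanner.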

1.2 Quality Goals

  • Reliability: Messages must not be lost; scans must complete.
  • Scalability: Multiple workers must run in parallel.
  • Security: Isolation during scanning; secure access to buckets and queues.
  • Observability: Metrics and logs for throughput, failures, and performance.

1.3 Stakeholders

| Role/Name | Contact | Expectations |
| --- | --- | --- |
| Solution Architect | | Accurate architecture documentation |
| DevOps / Platform Team | | Clarity on deployment, scaling, and ops |
| Security Team | | Clear visibility of scanning and quarantine flows |
| Integrating Services | | Clear API/queue contract and flow |

2. Architecture Constraints

  • Must use S3‑compatible object storage.
  • Must use a queue for asynchronous job triggering.
  • Must run as a containerized Spring Boot service.
  • Must not access external internet during scanning.
  • Must support horizontal scaling.

3. Context and Scope

The system sits between file‑uploading applications and consuming downstream services. It does not handle direct file uploads; it reacts to messages indicating a file has been uploaded to MinIO.

3.1 Business Context

  • Upstream systems store documents in MinIO and notify the scanner.
  • The scanner determines malware risk and publishes results.
  • Downstream systems use scan results for triage, acceptance, or rejection.

3.2 Technical Context

[Figure: C4 Level 2 diagram]


4. Solution Strategy

  • Event‑driven asynchronous processing via RabbitMQ.
  • Blob storage in MinIO; documents referenced by bucket/key.
  • Spring Boot application performing scanning and publishing results.
  • Optional ClamAV or compatible engine for scanning.
  • C4 model used for decomposition.
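
Because documents are referenced by bucket/key rather than carried in the message, the queue contract can stay small. The sketch below shows one possible shape for the request and result payloads. The type names (`ScanRequest`, `ScanResult`, `Verdict`) and the field names are hypothetical and are not taken from the service's actual contract.

```java
import java.util.Objects;

/** Hypothetical scan outcome; mirrors the clean/infected results named in the requirements. */
enum Verdict { CLEAN, INFECTED }

/** Hypothetical request payload: a pointer to the object, plus an optional trace ID. */
record ScanRequest(String bucket, String key, String traceId) {
    ScanRequest {
        Objects.requireNonNull(bucket, "bucket");
        Objects.requireNonNull(key, "key");
        if (bucket.isBlank() || key.isBlank()) {
            throw new IllegalArgumentException("bucket and key must be non-empty");
        }
    }
}

/** Hypothetical result payload: echoes the object reference so consumers can correlate. */
record ScanResult(String bucket, String key, Verdict verdict, String traceId) {
    static ScanResult of(ScanRequest request, Verdict verdict) {
        return new ScanResult(request.bucket(), request.key(), verdict, request.traceId());
    }
}
```

Echoing the bucket, key, and trace ID in the result lets downstream systems match a verdict to the original upload without keeping their own lookup table.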

5. Building Block View

5.1 C4 Level 1 View

[Figure: C4 diagram]

5.2 C4 Level 2 View

[Figure: C4 Level 2 diagram]


6. Runtime View

Representative workflow:

[Figure: runtime workflow diagram]


7. Deployment View

[Figure: deployment diagram]


8. Crosscutting Concepts

8.1 Design Patterns

  • Event‑driven architecture
  • Worker‑queue consumer pattern
  • Adapter pattern for plugging different malware engines
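
The adapter pattern listed above can be sketched as a small port that the worker depends on, with one adapter per engine. Everything here is illustrative: `MalwareEngine`, `EngineVerdict`, `MarkerEngine`, and `ScanWorker` are hypothetical names, and `MarkerEngine` is a toy that only looks for a fixed marker string. A ClamAV adapter would implement the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

enum EngineVerdict { CLEAN, INFECTED }

/** Hypothetical port that every engine adapter implements. */
interface MalwareEngine {
    EngineVerdict scan(InputStream content) throws IOException;
}

/** Toy adapter for illustration only: flags content containing a fixed marker string. */
final class MarkerEngine implements MalwareEngine {
    private final String marker;

    MarkerEngine(String marker) {
        this.marker = marker;
    }

    @Override
    public EngineVerdict scan(InputStream content) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so binary content survives decoding.
        String text = new String(content.readAllBytes(), StandardCharsets.ISO_8859_1);
        return text.contains(marker) ? EngineVerdict.INFECTED : EngineVerdict.CLEAN;
    }
}

/** The worker depends only on the port, so engines can be swapped without changing it. */
final class ScanWorker {
    private final MalwareEngine engine;

    ScanWorker(MalwareEngine engine) {
        this.engine = engine;
    }

    EngineVerdict scanBytes(byte[] bytes) throws IOException {
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            return engine.scan(in);
        }
    }
}
```

Switching engines then means constructing `ScanWorker` with a different `MalwareEngine` implementation, and the worker code stays the same.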

8.2 Security

  • TLS for RabbitMQ
  • IAM‑style scoped credentials for MinIO
  • Strict container isolation for scanning engine

8.3 Scalability

  • Horizontal pod autoscaling for scanner workers
  • RabbitMQ consumer scaling
  • MinIO bucket partitioning

8.4 Resilience

  • Retry and DLQ in RabbitMQ
  • Idempotent scan operations
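
One way to combine the two points above is to have the consumer decide, per delivery, whether to acknowledge, requeue, or dead-letter the message. The sketch below assumes a 1-based delivery count is available to the handler and tracks completed objects in memory. `ResilientHandler` and `Disposition` are hypothetical names. Several workers running in parallel would need a shared store for the completed set, which this sketch does not attempt.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** What the consumer should tell the broker about a delivery. */
enum Disposition { ACK, REQUEUE, DEAD_LETTER }

/** Hypothetical handler combining bounded retries with an idempotency check. */
final class ResilientHandler {
    private final int maxAttempts;
    private final Set<String> completed = ConcurrentHashMap.newKeySet();

    ResilientHandler(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /**
     * @param objectId identifies the scanned object, for example "bucket/key"
     * @param attempt  1-based delivery count for this message (assumed to be tracked elsewhere)
     * @param scan     the scan to run; signals failure by throwing a RuntimeException
     */
    Disposition handle(String objectId, int attempt, Runnable scan) {
        if (completed.contains(objectId)) {
            return Disposition.ACK; // duplicate delivery: already scanned, so do nothing
        }
        try {
            scan.run();
        } catch (RuntimeException e) {
            // Retry until the attempt budget is spent, then hand the message to the DLQ.
            return attempt < maxAttempts ? Disposition.REQUEUE : Disposition.DEAD_LETTER;
        }
        completed.add(objectId);
        return Disposition.ACK;
    }
}
```

In RabbitMQ terms, `DEAD_LETTER` would typically correspond to rejecting the message without requeueing it on a queue that has a dead-letter exchange configured.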

8.5 Observability

  • Prometheus metrics
  • Structured JSON logging
  • Trace IDs from incoming messages
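
To illustrate structured JSON logging with a propagated trace ID, here is a minimal dependency-free formatter. It is a sketch only: `JsonLog` and the field names `level`, `message`, and `traceId` are assumptions, and a Spring Boot service would more likely rely on a logging library's JSON encoder than hand-roll one.

```java
import java.util.Map;

/** Hypothetical helper that renders one log event as a single-line JSON object. */
final class JsonLog {
    private JsonLog() {}

    /** Builds the log line; extra fields are appended in the map's iteration order. */
    static String line(String level, String message, String traceId, Map<String, String> extra) {
        StringBuilder sb = new StringBuilder("{");
        appendField(sb, "level", level);
        sb.append(',');
        appendField(sb, "message", message);
        sb.append(',');
        appendField(sb, "traceId", traceId);
        for (Map.Entry<String, String> entry : extra.entrySet()) {
            sb.append(',');
            appendField(sb, entry.getKey(), entry.getValue());
        }
        return sb.append('}').toString();
    }

    private static void appendField(StringBuilder sb, String name, String value) {
        sb.append(quote(name)).append(':').append(quote(value));
    }

    /** Quotes a string for JSON: escapes quotes, backslashes, and control characters. */
    private static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                // Control characters become six-character unicode escapes.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

For example, `JsonLog.line("INFO", "scan complete", "abc-123", Map.of("verdict", "CLEAN"))` yields `{"level":"INFO","message":"scan complete","traceId":"abc-123","verdict":"CLEAN"}`, where the trace ID would be copied from the incoming message.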

8.6 Regulatory & Compliance

  • Malware scanning supports IG controls on unsafe attachments
  • Audit logs retained according to organisational policy

9. Architecture Decisions

9.1 New ADRs

| ID | Impact |
| --- | --- |
| ADR‑001 | Select RabbitMQ for asynchronous messaging |
| ADR‑002 | Use MinIO for object storage |
| ADR‑003 | Use Spring Boot for worker service |

9.2 Design History

| Date | Impact | Rationale |
| --- | --- | --- |
| yyyy-mm-dd | Introduced MinIO | Required S3‑compatible local storage |

10. Quality Requirements

  • Low false‑positive rate
  • High throughput under load
  • Secure isolation of untrusted files
  • Compatibility across environments

11. Risks and Technical Debt

11.1 Risks

| ID | Impact | Mitigation / Plan | Owner |
| --- | --- | --- | --- |
| R1 | Medium | Add DLQ monitoring | |
| R2 | High | Sandbox scan-engine | |

11.2 Technical Debt

| ID | Impact | Mitigation / Plan | Owner |
| --- | --- | --- | --- |
| TD1 | Medium | Replace tightly-coupled scanner code | |

12. Glossary

| Term | Definition |
| --- | --- |
| MinIO | S3-compatible object storage |
| RabbitMQ | AMQP message broker |
| Malware Engine | Component responsible for scanning files |

About

Asynchronous malware scanning service to check documents that are uploaded to health boards.
