Completed comprehensive spec-driven documentation:
- SPEC.md (222 lines): Project scope, objectives, constraints
- ARCHITECTURE.md (290 lines): System design, components, data flow
- REQUIREMENTS.md (168 lines): Functional & non-functional requirements
- API_SPEC.md (556 lines): 15 API endpoints with schemas
- SECURITY.md (188 lines): STRIDE threat model, security controls
- ROADMAP.md (203 lines): 5 phases, 8 milestones, risk register
Total: 1,627 lines of specification documentation
Milestone M1 complete - Ready for Phase 0 (Rust scaffolding)
Linux_Patch_API - Architecture Document
System Overview
The Linux_Patch_API is a secure, single-host API service that enables remote package and patch management on Linux systems. Each instance runs as a systemd service on the managed host, providing a REST API over mTLS with strict IP whitelist enforcement.
Architecture Type: Agent Per Host (Option B)
Deployment: One instance per managed Linux host
Network: Internal network only (no internet exposure)
Component Architecture
Core Components
- API Layer (Actix-web/Axum)
  - HTTP/HTTPS endpoint handling
  - mTLS termination
  - IP whitelist enforcement
  - Request routing
  - WebSocket support for real-time job status
- Authentication Layer
  - Certificate validation (mTLS)
  - Client identity extraction from certificate
  - No session management (stateless, cert-based auth only)
- Authorization Layer
  - IP whitelist checking (deny by default)
  - No permission validation (whitelisted IP + valid cert = full access)
- Job Manager
  - Async job queue for long-running operations
  - Job status tracking with file-based storage (cleared on service restart)
  - WebSocket broadcast for real-time status updates
  - 30-minute timeout enforcement
  - Job cleanup and expiration
- Package Manager Backend (Pluggable)
  - apt/dpkg adapter (Debian/Ubuntu - primary)
  - dnf/yum adapter (RHEL/CentOS/Fedora)
  - apk adapter (Alpine)
  - pacman adapter (Arch)
  - Distribution detection and adapter selection
- Audit Logger
  - systemd journal integration (primary)
  - Optional remote syslog server
  - Local file fallback (/var/log/linux_patch_api/)
  - 30-day retention with daily rotation and gzip compression
- Configuration Manager
  - YAML config file watcher (/etc/linux_patch_api/config.yaml)
  - Auto-reload on file change
  - Config validation before reload (prevents service downtime)
  - Runtime settings access for all components
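The "distribution detection and adapter selection" step of the Package Manager Backend could look roughly like the sketch below. It assumes detection reads the `ID` and `ID_LIKE` fields of `/etc/os-release`; the `Backend` enum, the function names, and the command names are hypothetical and not taken from the project.

```rust
/// Package manager families named in the component list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Apt,
    Dnf,
    Apk,
    Pacman,
}

impl Backend {
    /// Command an adapter might shell out to (assumed binary names).
    pub fn command(self) -> &'static str {
        match self {
            Backend::Apt => "apt-get",
            Backend::Dnf => "dnf",
            Backend::Apk => "apk",
            Backend::Pacman => "pacman",
        }
    }
}

/// Map a single os-release identifier to a backend, if it is known.
fn backend_for_id(id: &str) -> Option<Backend> {
    match id {
        "debian" | "ubuntu" => Some(Backend::Apt),
        "rhel" | "centos" | "fedora" => Some(Backend::Dnf),
        "alpine" => Some(Backend::Apk),
        "arch" => Some(Backend::Pacman),
        _ => None,
    }
}

/// Pick a backend from the text of /etc/os-release.
/// `ID` is tried first, then each entry of `ID_LIKE` in order.
pub fn detect_backend(os_release: &str) -> Option<Backend> {
    let mut id: Option<String> = None;
    let mut id_like: Vec<String> = Vec::new();
    for line in os_release.lines() {
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            match key.trim() {
                "ID" => id = Some(value.to_string()),
                "ID_LIKE" => {
                    id_like = value.split_whitespace().map(String::from).collect()
                }
                _ => {}
            }
        }
    }
    id.iter()
        .chain(id_like.iter())
        .find_map(|candidate| backend_for_id(candidate))
}
```

Falling back to `ID_LIKE` lets derivative distributions reuse an existing adapter without a dedicated table entry; an unknown distribution yields `None`, so the service can refuse to start rather than guess.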
External Integrations
- Package Managers: apt, dnf, yum, apk, pacman (via system commands)
- systemd: Service management and journal logging
- Internal CA: Certificate validation against self-hosted CA
- Remote Syslog: Optional external log aggregation
Technology Stack
Backend
- Language: Rust
- Framework: Actix-web or Axum
- Database: None (file-based job storage)
- mTLS: Rust TLS library (rustls or native-tls)
Infrastructure
- Service Manager: systemd
- Configuration: YAML
- Logging: systemd journal + optional syslog
Deployment
- Package Format: Native Linux packages (deb, rpm, apk, pkg.tar.zst)
- Distribution: Via target system package manager (apt, dnf, apk, pacman)
- Installation: Package installs binary, systemd service, and default config structure
- Updates: Handled through system package manager
Security Architecture
Authentication
- mTLS certificate-based authentication (required)
- Internal self-hosted CA
- Unique client certificates (1-year validity)
- Silent drop for non-mTLS connections
Authorization
- IP whitelist enforcement (block all by default)
- No granular permissions (binary access: allowed or denied)
- Whitelisted IP + valid cert = full API access
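The binary, deny-by-default decision described above can be sketched as follows. The `Whitelist` and `Decision` types are hypothetical, and the sketch assumes the whitelist holds individual addresses rather than CIDR ranges, which the document does not specify.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Outcome of the combined authentication/authorization check.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    SilentDrop,
}

/// Set of client addresses that may use the API.
pub struct Whitelist {
    allowed: HashSet<IpAddr>,
}

impl Whitelist {
    /// Build from textual addresses. Entries that fail to parse are
    /// skipped, so a malformed entry can never grant access.
    pub fn from_entries(entries: &[&str]) -> Self {
        let allowed = entries
            .iter()
            .filter_map(|entry| entry.parse::<IpAddr>().ok())
            .collect();
        Whitelist { allowed }
    }

    /// Full access requires BOTH a valid client certificate and a
    /// whitelisted source address; everything else is dropped.
    pub fn decide(&self, cert_valid: bool, peer: IpAddr) -> Decision {
        if cert_valid && self.allowed.contains(&peer) {
            Decision::Allow
        } else {
            Decision::SilentDrop
        }
    }
}
```

An empty whitelist therefore blocks every client, which matches the "block all by default" rule.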
Process Security (systemd Hardening)
- User: root (required for package management)
- NoNewPrivileges: true (prevent privilege escalation)
- ProtectSystem: strict (read-only filesystem except allowed paths)
- ProtectHome: true (no access to /home, /root, /run/user)
- PrivateTmp: true (isolated /tmp)
- SystemCallFilter: Restrict to required syscalls only (application whitelist)
- RestrictAddressFamilies: AF_INET, AF_INET6, AF_UNIX (network restrictions)
- CapabilityBoundingSet: CAP_NET_BIND_SERVICE, CAP_SYS_ADMIN (minimal capabilities)
Data Security
- All communications encrypted via TLS
- Certificates stored securely with restricted permissions
- Audit logging of all operations
Certificate Storage (Option A: Separate Files)
/etc/linux_patch_api/certs/
├── ca.pem (644) - CA certificate
├── server.pem (644) - Server certificate
└── server.key (600) - Server private key (restricted)
Rationale:
- Tighter permissions on private key only (600)
- Easier certificate rotation (replace cert without touching key)
- Standard practice for TLS deployments
- No extraction overhead
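A startup check that the private key really is owner-only could be sketched like this. The function names are hypothetical, and refusing to start on a failed check is an assumption rather than documented behavior.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// True when neither group nor others have any permission bit set
/// (for example 0o600 or 0o400). File-type bits in `mode` are ignored.
pub fn is_owner_only(mode: u32) -> bool {
    (mode & 0o077) == 0
}

/// Read the permission bits of `path` and check that they are owner-only.
pub fn private_key_is_restricted(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    Ok(is_owner_only(mode))
}
```

With the layout above, `private_key_is_restricted(Path::new("/etc/linux_patch_api/certs/server.key"))` would be expected to return `Ok(true)` for a key stored with mode 600.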
File System Layout
/etc/linux_patch_api/
├── config.yaml          # Main configuration
├── whitelist.yaml       # IP whitelist
└── certs/
    ├── ca.pem           # CA certificate (or server.p12)
    ├── server.pem       # Server certificate
    └── server.key       # Server private key

/var/lib/linux_patch_api/
├── jobs/                # Job storage (cleared on restart)
└── state/               # Runtime state

/var/log/linux_patch_api/
└── audit.log            # Local audit log fallback

/usr/bin/linux-patch-api                       # Binary location
/etc/systemd/system/linux-patch-api.service    # Systemd service
Data Flow
Synchronous Request Flow (Quick Operations):
Client → [mTLS Handshake] → [IP Whitelist Check] → [API Layer]
↓
[Auth: Cert Valid?] → No → Silent Drop
↓ Yes
[Authz: IP Allowed?] → No → Silent Drop
↓ Yes
[Route to Handler] → [Execute Package Op] → [Log to Audit]
↓
[Return JSON Response] → Client
Asynchronous Request Flow (Long Operations):
Client → [mTLS + IP Check] → [API Layer] → [Create Job] → [Return Job ID]
↓
[Job Manager Queue]
↓
[Package Manager Backend]
↓
[Update Job Status] → [WebSocket Broadcast]
↓
[Job Complete/Timeout]
↓
[Log to Audit]
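The job lifecycle in the asynchronous flow, including the 30-minute timeout, can be modeled as a small state machine. This is a minimal sketch: the `Job` and `JobStatus` types and the status names are hypothetical, and time is passed in explicitly so the logic can be exercised without waiting.

```rust
use std::time::{Duration, Instant};

/// 30-minute limit from the Job Manager description.
pub const JOB_TIMEOUT: Duration = Duration::from_secs(30 * 60);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Running,
    Succeeded,
    Failed,
    TimedOut,
}

#[derive(Debug)]
pub struct Job {
    pub id: u64,
    pub status: JobStatus,
    started_at: Option<Instant>,
}

impl Job {
    pub fn new(id: u64) -> Self {
        Job { id, status: JobStatus::Queued, started_at: None }
    }

    /// Queued -> Running; ignored in any other state.
    pub fn start(&mut self, now: Instant) {
        if self.status == JobStatus::Queued {
            self.status = JobStatus::Running;
            self.started_at = Some(now);
        }
    }

    /// Running -> Succeeded/Failed; terminal states are never overwritten.
    pub fn finish(&mut self, success: bool) {
        if self.status == JobStatus::Running {
            self.status = if success { JobStatus::Succeeded } else { JobStatus::Failed };
        }
    }

    /// Mark a running job as timed out once the limit has elapsed.
    /// Returns true only when the status changed.
    pub fn enforce_timeout(&mut self, now: Instant) -> bool {
        match (self.status, self.started_at) {
            (JobStatus::Running, Some(started))
                if now.saturating_duration_since(started) >= JOB_TIMEOUT =>
            {
                self.status = JobStatus::TimedOut;
                true
            }
            _ => false,
        }
    }
}
```

Each status change is the point at which a WebSocket broadcast and an audit log entry would be emitted.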
Job Status Endpoint Flow:
Client → [mTLS + IP Check] → [API Layer] → [GET /jobs/{id}]
↓
[Query Job Storage]
↓
[Return Job Status JSON]
Configuration Reload Flow:
[Config File Changed] → [File Watcher Detects]
↓
[Validate New Config] → Invalid → [Log Error, Keep Old Config]
↓ Valid
[Swap Config in Memory] → [Notify Components] → [Log Reload Event]
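The validate-then-swap step of the reload flow might be implemented along these lines. The `Config` fields and the validation rules are placeholders chosen for illustration; YAML parsing and the file watcher are left out.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical subset of the settings in config.yaml.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub port: u16,
    pub job_timeout_minutes: u32,
}

/// Reject obviously unusable settings before they reach the service.
pub fn validate(candidate: &Config) -> Result<(), String> {
    if candidate.port == 0 {
        return Err("port must be non-zero".to_string());
    }
    if candidate.job_timeout_minutes == 0 {
        return Err("job_timeout_minutes must be non-zero".to_string());
    }
    Ok(())
}

/// Holds the active configuration; readers get cheap snapshots.
pub struct ConfigStore {
    current: RwLock<Arc<Config>>,
}

impl ConfigStore {
    pub fn new(initial: Config) -> Self {
        ConfigStore { current: RwLock::new(Arc::new(initial)) }
    }

    /// Snapshot of the configuration that is active right now.
    pub fn get(&self) -> Arc<Config> {
        let snapshot = self.current.read().unwrap().clone();
        snapshot
    }

    /// Swap in `candidate` only if it validates; on error the old
    /// configuration stays active and the caller logs the message.
    pub fn reload(&self, candidate: Config) -> Result<(), String> {
        validate(&candidate)?;
        *self.current.write().unwrap() = Arc::new(candidate);
        Ok(())
    }
}
```

Because readers hold an `Arc` snapshot, a request that is already in flight keeps the configuration it started with while new requests see the reloaded one.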
Certificate Renewal Flow:
[Cert File Updated] → [File Watcher Detects]
↓
[Validate Certificate Chain] → Invalid → [Log Error, Keep Old Certs]
↓ Valid
[Reload TLS Context] → [New Connections Use New Certs] → [Log Reload Event]
Rollback Execution Flow (Exclusive):
[Rollback Triggered] → [Set Exclusive Mode] → [Reject New Requests]
↓
[Execute Rollback Operations] → [Log Each Step]
↓
[Rollback Complete] → [Clear Exclusive Mode] → [Accept New Requests]
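One way to enforce the exclusive mode in this flow is an atomic flag released by an RAII guard, so the flag is cleared even if the rollback code returns early. The type and method names below are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag: true while a rollback is running.
pub struct ExclusiveMode {
    active: AtomicBool,
}

/// Held for the duration of a rollback; dropping it clears the flag.
pub struct RollbackGuard<'a> {
    mode: &'a ExclusiveMode,
}

impl ExclusiveMode {
    pub fn new() -> Self {
        ExclusiveMode { active: AtomicBool::new(false) }
    }

    /// Enter exclusive mode. Returns None if a rollback is already running.
    pub fn begin_rollback(&self) -> Option<RollbackGuard<'_>> {
        self.active
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .ok()
            .map(|_| RollbackGuard { mode: self })
    }

    /// Request handlers check this before accepting new work.
    pub fn accepting_requests(&self) -> bool {
        !self.active.load(Ordering::Acquire)
    }
}

impl Drop for RollbackGuard<'_> {
    fn drop(&mut self) {
        // Leaving exclusive mode: new requests are accepted again.
        self.mode.active.store(false, Ordering::Release);
    }
}
```

The compare-and-exchange also guarantees that two rollbacks cannot run at the same time.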
Key Behaviors:
- Failed jobs are cleared on service restart (no persistence)
- Rollback execution is exclusive - no new requests accepted until complete
- Certificate renewal follows same validation pattern as config reload
- Status endpoint available (GET /jobs/{id}) in addition to WebSocket for job monitoring
API Design Principles
- Pure REST (resources as nouns, HTTP verbs for actions)
- JSON request/response with standard envelope
- Hybrid execution model (sync for quick ops, async for long ops)
- WebSocket for real-time job status streaming
- GET /jobs/{id} endpoint for job status polling
Network Configuration
- Bind Address: 0.0.0.0 (all interfaces)
- Port: 12443 (HTTPS/mTLS)
- Protocol: TLS 1.3 only
- Firewall: Host-level firewall should restrict inbound to whitelisted IPs only
Health Checks
Endpoint: GET /health
Purpose: General service status check
Response (200 OK - Healthy):
{
  "success": true,
  "request_id": "uuid",
  "timestamp": "2026-04-09T13:04:02Z",
  "data": {
    "status": "healthy",
    "uptime_seconds": 12345,
    "version": "0.0.1"
  },
  "error": null
}
Health Check Criteria:
- Service is listening on port 12443
- mTLS is configured and valid
- Config file is loaded and valid
- Package manager backend is accessible
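The four criteria above can be folded into a single status value as in the sketch below. The struct, the check labels, and the "unhealthy" status string are assumptions; the document only shows the "healthy" response.

```rust
/// Result of each health check criterion listed above.
#[derive(Debug, Clone, Copy)]
pub struct HealthChecks {
    pub listening: bool,
    pub mtls_valid: bool,
    pub config_valid: bool,
    pub backend_accessible: bool,
}

impl HealthChecks {
    /// Names of the criteria that did not pass (hypothetical labels).
    pub fn failed(&self) -> Vec<&'static str> {
        let checks = [
            ("listener", self.listening),
            ("mtls", self.mtls_valid),
            ("config", self.config_valid),
            ("package_manager", self.backend_accessible),
        ];
        checks
            .iter()
            .filter(|(_, ok)| !*ok)
            .map(|(name, _)| *name)
            .collect()
    }

    /// "healthy" only when every criterion passes.
    pub fn status(&self) -> &'static str {
        if self.failed().is_empty() { "healthy" } else { "unhealthy" }
    }
}
```

The value returned by `status()` would fill the `data.status` field of the response envelope.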
NOT Required:
- Metrics collection
- Alerting integration
- Prometheus/Grafana endpoints
Following kiro spec-driven development standards