Echo f6540133c2 Complete SDD specification documents
- SPEC.md: Full project specification including scope, objectives, constraints,
  architecture overview, API integration, certificate management, UI structure,
  error handling, audit logging, and out-of-scope items

- REQUIREMENTS.md: Functional requirements (host mgmt, patch monitoring,
  deployment, scheduling, reporting, user mgmt), non-functional requirements
  (security, performance, scalability, reliability, usability), interface
  requirements, data requirements, HIPAA/PCI-DSS compliance

- ARCHITECTURE.md: Architecture decisions, system architecture diagram,
  component design (Axum web server, background worker, PostgreSQL, React SPA,
  internal CA), data flows, technology stack, security architecture,
  deployment architecture, integration points, monitoring
2026-04-23 14:40:33 +00:00


# Linux_Patch_Manager - Architecture Document
## Project Overview
- **Title:** Linux_Patch_Manager
- **Version:** 0.0.1
- **Status:** Draft
## Architecture Decisions
| Decision | Choice | Rationale |
|----------|--------|-----------|
| Backend language/framework | Rust with Axum | Security-aligned with linux_patch_api, memory-safe, high async performance |
| Frontend framework | React + TypeScript SPA | Rich ecosystem for enterprise dashboards, strong typing |
| Database | PostgreSQL with SQLx | Enterprise-grade, type-safe Rust queries, handles concurrent access |
| Async runtime | Tokio | Standard Rust async runtime, integrates with Axum |
| Deployment model | Single bare-metal host or VM | Simplicity; supports up to 2,500 managed hosts |
| Frontend serving | Axum serves static files | Simplest deployment, single process |
| Background processing | Separate worker process | Clean separation of concerns, communicates via PostgreSQL |
| Session management | JWT + refresh tokens | Short-lived access tokens (15 min), revocable refresh tokens (1 hr inactivity timeout) |
| Encryption at rest | LUKS full-disk (infrastructure) | HIPAA/PCI-DSS compliant, handled at infrastructure level |
| Certificate management | Internal CA on Patch Manager host | Issues/renews mTLS certs, manual distribution to clients |
## System Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                   Linux Patch Manager Host                   │
│                        (Ubuntu 24.04)                        │
│                                                              │
│  ┌─────────────────────┐    ┌──────────────────────────────┐ │
│  │   Axum Web Server   │    │      Background Worker       │ │
│  │                     │    │                              │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │ REST API      │  │    │  │ Health Poller          │  │ │
│  │  │ (CRUD, auth)  │  │    │  │ (5 min intervals)      │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │ WebSocket     │  │    │  │ Patch Data Poller      │  │ │
│  │  │ Relay         │  │    │  │ (30 min intervals)     │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │ Static Files  │  │    │  │ Job Scheduler          │  │ │
│  │  │ (React SPA)   │  │    │  │ (maintenance windows)  │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │ mTLS Client   │  │    │  │ Retry Engine           │  │ │
│  │  │ (agent comm)  │◄─┼────┼─►│ (exp. backoff)         │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  └──────────┬──────────┘    │  ┌────────────────────────┐  │ │
│             │               │  │ Email Notifier         │  │ │
│             │               │  │ (optional/disabled)    │  │ │
│             │               │  └────────────────────────┘  │ │
│             │               └──────────────┬───────────────┘ │
│             │                              │                 │
│             │   ┌──────────────────────────┘                 │
│             │   │                                            │
│  ┌──────────▼───▼──────────────────────────────────────────┐ │
│  │                       PostgreSQL                        │ │
│  │  (hosts, groups, users, jobs, schedules, audit, etc.)   │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                Internal CA (mTLS certs)                 │ │
│  └─────────────────────────────────────────────────────────┘ │
└────────────────┬─────────────────────────────────────────────┘
                 │ mTLS / REST API (port 12443)
        ┌────────┼────────┐
        ▼        ▼        ▼
    ┌──────┐ ┌──────┐ ┌──────┐
    │ Host │ │ Host │ │ Host │   ← Linux Patch API agents
    │  A   │ │  B   │ │  C   │     (up to 2,500)
    └──────┘ └──────┘ └──────┘
```
## Component Design
### 1. Axum Web Server
**Responsibility:** Handle all HTTP/HTTPS requests from browsers and serve the React SPA.
- **REST API:** CRUD operations for hosts, groups, users, schedules, certificates, reports
- **WebSocket Relay:** Proxy real-time job status from agent WebSocket streams to browser clients
- **Static File Server:** Serve compiled React SPA (HTML, JS, CSS, assets)
- **Authentication:** JWT access token validation, refresh token management, MFA enforcement
- **Authorization:** RBAC middleware enforcing admin/operator/group-scoped access
- **mTLS Client:** HTTP client with client certificates for communicating with Linux Patch API agents

**API Versioning:** URL path versioning (`/api/v1/`) to match the upstream Linux Patch API convention.
### 2. Background Worker
**Responsibility:** All scheduled and asynchronous background processing.
- **Health Poller:** Periodic health checks to all registered agents (5-minute intervals)
- **Patch Data Poller:** Periodic patch availability queries to all agents (30-minute intervals)
- **Job Scheduler:** Execute queued patch operations when maintenance windows open
- **Retry Engine:** Handle agent communication failures with exponential backoff (3 retries, max 30 min)
- **Job Executor:** Trigger patch operations on agents, track async job status
- **Email Notifier:** Optional email notifications (disabled by default)
- **Data Pruner:** Clean up operational data older than 30 days, audit logs older than 6 months

**Communication:** The worker reads the job queue from PostgreSQL and writes results back to it; the web server reads those results from PostgreSQL to answer API requests.
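
The Retry Engine's backoff policy can be sketched as a pure function. This is a minimal sketch, not the project's implementation: the 10-minute base delay is a hypothetical value, and "max 30 min" is read here as a cap on any single delay (it could instead mean a total retry budget).

```rust
use std::time::Duration;

/// Retries after the first failed attempt (from the spec: 3 retries).
const MAX_RETRIES: u32 = 3;
/// Hypothetical base delay; the spec does not give one.
const BASE_DELAY: Duration = Duration::from_secs(10 * 60);
/// Assumed cap on any single delay (one reading of "max 30 min").
const MAX_DELAY: Duration = Duration::from_secs(30 * 60);

/// Delay before retry number `retry` (1-based), or `None` once retries are exhausted.
fn backoff_delay(retry: u32) -> Option<Duration> {
    if retry == 0 || retry > MAX_RETRIES {
        return None;
    }
    // Exponential growth: base * 2^(retry - 1), then clamp to the cap.
    let delay = BASE_DELAY * (1u32 << (retry - 1));
    Some(delay.min(MAX_DELAY))
}

fn main() {
    for retry in 1..=MAX_RETRIES + 1 {
        match backoff_delay(retry) {
            Some(d) => println!("retry {retry}: wait {} min", d.as_secs() / 60),
            None => println!("retry {retry}: retries exhausted, report the agent as unreachable"),
        }
    }
}
```

Under these assumptions the three retries wait 10, 20, and 30 minutes (the third is capped down from 40). Keeping the policy separate from the Tokio timers that actually sleep makes it easy to unit-test.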
### 3. PostgreSQL Database
**Responsibility:** Persistent storage for all application data.

**Key Tables:**
- `hosts` — registered hosts, metadata, health status, last seen
- `groups` — static groups for access control
- `host_groups` — many-to-many host ↔ group membership
- `users` — local accounts with hashed passwords, MFA secrets
- `user_groups` — many-to-many user ↔ group membership
- `refresh_tokens` — server-side refresh tokens for session management
- `maintenance_windows` — per-device recurring and one-time schedules
- `patch_jobs` — queued, running, completed, failed patch operations
- `patch_job_hosts` — per-host status within a batch job
- `host_patch_data` — cached patch availability data from agents
- `host_health_data` — cached health check results
- `certificates` — issued mTLS client certificates
- `audit_log` — tamper-evident audit trail
- `azure_sso_config` — Azure AD SSO configuration

**Data Retention:**
- Operational data (health, patches, jobs): 30 days
- Audit logs: 6 months
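
Each retention rule reduces to an age comparison per data category. A minimal sketch, assuming a hypothetical `DataKind` enum and approximating "6 months" as 183 days; in practice the Data Pruner would more likely run an equivalent time-bounded `DELETE` in PostgreSQL rather than compare timestamps in Rust.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical categories matching the two retention rules.
#[derive(Clone, Copy)]
enum DataKind {
    /// Health results, patch data, and job records.
    Operational,
    /// Hash-chained audit trail entries.
    Audit,
}

const DAY: u64 = 24 * 60 * 60;

/// Retention period per category. "6 months" is approximated as 183 days,
/// an assumption; a real pruner might use calendar months instead.
fn retention(kind: DataKind) -> Duration {
    match kind {
        DataKind::Operational => Duration::from_secs(30 * DAY),
        DataKind::Audit => Duration::from_secs(183 * DAY),
    }
}

/// True if a record created at `created` is past retention as of `now`.
fn is_expired(kind: DataKind, created: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created) {
        Ok(age) => age > retention(kind),
        // A timestamp in the future (clock skew) is never pruned.
        Err(_) => false,
    }
}

fn main() {
    let now = UNIX_EPOCH + Duration::from_secs(1_000 * DAY);
    let created = now - Duration::from_secs(45 * DAY);
    // A 45-day-old record: prunable as operational data, kept as audit data.
    println!("operational: {}", is_expired(DataKind::Operational, created, now));
    println!("audit: {}", is_expired(DataKind::Audit, created, now));
}
```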
### 4. React + TypeScript SPA
**Responsibility:** User-facing web interface.
**Pages:**
1. Dashboard — fleet overview, compliance %, health summary, upcoming windows, root CA download
2. Hosts — filterable host list by group, status, OS
3. Host Detail — system info, packages, patches, jobs, maintenance window config, host cert download
4. Patch Deployment — select hosts, review patches, deploy (queue or immediate)
5. Jobs — real-time job monitoring with WebSocket updates
6. Maintenance Windows — per-device recurring/one-time schedule management
7. Groups — manage static groups, assign hosts and operators
8. Reports — generate/export compliance, patch history, vulnerability, audit (CSV/PDF)
9. Users — local account management, MFA setup, group assignments
10. Certificates — view/manage internal CA, issue/renew client certs
11. Settings — system config, Azure SSO, polling intervals
### 5. Internal CA
**Responsibility:** mTLS certificate management for agent communication.
- Runs on the same Patch Manager host
- Issues client certificates for mTLS communication with agents
- Manages certificate renewal
- Root CA certificate downloadable from dashboard for manual distribution
- Host-specific mTLS certificates downloadable from host detail page
- No automated distribution to clients — server administrators handle this manually
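
Because certificates are installed on clients by hand, it helps to flag them well before they expire. A sketch of that classification, assuming a hypothetical 30-day renewal window (this document does not specify one) and representing the certificate's `notAfter` field as a `SystemTime`:

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical renewal window: flag certificates within 30 days of expiry.
const RENEWAL_WINDOW: Duration = Duration::from_secs(30 * 24 * 60 * 60);

#[derive(Debug, PartialEq)]
enum CertState {
    Valid,
    RenewalDue,
    Expired,
}

/// Classify a client certificate by its `notAfter` time relative to `now`.
fn cert_state(not_after: SystemTime, now: SystemTime) -> CertState {
    match not_after.duration_since(now) {
        // notAfter is in the past.
        Err(_) => CertState::Expired,
        // notAfter is exactly now.
        Ok(remaining) if remaining.is_zero() => CertState::Expired,
        Ok(remaining) if remaining <= RENEWAL_WINDOW => CertState::RenewalDue,
        Ok(_) => CertState::Valid,
    }
}

fn main() {
    let now = SystemTime::now();
    let not_after = now + Duration::from_secs(10 * 24 * 60 * 60);
    println!("{:?}", cert_state(not_after, now)); // RenewalDue
}
```

Surfacing `RenewalDue` on the dashboard gives administrators lead time for the manual reinstall that the CA cannot perform itself.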
## Data Flow
### Host Registration Flow
```
1. Admin enters FQDN/IP → Axum validates & resolves FQDN
2. Axum stores host in PostgreSQL
3. Worker picks up new host → initial health check via mTLS
4. Health result stored in PostgreSQL → visible in dashboard
```
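
Step 1 (validate and resolve) can be sketched with the standard library alone. A minimal sketch, assuming agents listen on port 12443 and hostnames follow RFC 1123 label rules; `is_valid_host` and `resolve_agent` are hypothetical names. `to_socket_addrs` blocks on DNS, so inside an Axum handler an async resolver such as `tokio::net::lookup_host` would be a better fit.

```rust
use std::io;
use std::net::{IpAddr, SocketAddr, ToSocketAddrs};

/// Agent port, as used for Linux Patch API agents elsewhere in this document.
const AGENT_PORT: u16 = 12443;

/// Accept an IP literal or an RFC 1123-style hostname (optional trailing dot).
fn is_valid_host(input: &str) -> bool {
    if input.parse::<IpAddr>().is_ok() {
        return true;
    }
    let name = input.strip_suffix('.').unwrap_or(input);
    !name.is_empty()
        && name.len() <= 253
        && name.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        })
}

/// Validate the admin's input, then resolve it to agent socket addresses.
fn resolve_agent(input: &str) -> io::Result<Vec<SocketAddr>> {
    if !is_valid_host(input) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid FQDN or IP"));
    }
    let addrs: Vec<SocketAddr> = (input, AGENT_PORT).to_socket_addrs()?.collect();
    if addrs.is_empty() {
        return Err(io::Error::new(io::ErrorKind::NotFound, "host did not resolve"));
    }
    Ok(addrs)
}

fn main() {
    match resolve_agent("127.0.0.1") {
        Ok(addrs) => println!("resolved: {addrs:?}"),
        Err(e) => println!("rejected: {e}"),
    }
}
```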
### Auto-Discovery Flow
```
1. Admin triggers CIDR scan → Axum sends request to Worker
2. Worker scans subnet for agents on port 12443
3. Discovered agents reported back → Admin selects which to register
4. Selected hosts stored in PostgreSQL
```
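
Before scanning, the worker has to expand the admin's CIDR into candidate addresses. A std-only sketch; the /16 lower bound on prefix length is a hypothetical safety guard, not something this document specifies. An open TCP port alone does not prove an agent is present, so a real scan would presumably confirm each candidate with an mTLS `GET /api/v1/health` call.

```rust
use std::net::Ipv4Addr;

/// Hypothetical guard: refuse prefixes broader than /16 (65,534 hosts).
const MIN_PREFIX: u32 = 16;

/// Parse "a.b.c.d/len" and return the usable host addresses in that IPv4 subnet.
/// Network and broadcast addresses are skipped for prefixes shorter than /31.
fn hosts_in_cidr(cidr: &str) -> Result<Vec<Ipv4Addr>, String> {
    let (addr, len) = cidr
        .split_once('/')
        .ok_or_else(|| format!("missing prefix length in {cidr}"))?;
    let addr: Ipv4Addr = addr.parse().map_err(|e| format!("bad address: {e}"))?;
    let len: u32 = len.parse().map_err(|e| format!("bad prefix: {e}"))?;
    if !(MIN_PREFIX..=32).contains(&len) {
        return Err(format!("prefix /{} outside allowed range /{}-/32", len, MIN_PREFIX));
    }
    // len is 16..=32 here, so the shift amount (0..=16) cannot overflow.
    let mask = u32::MAX << (32 - len);
    let network = u32::from(addr) & mask;
    let broadcast = network | !mask;
    let (first, last) = if len >= 31 {
        (network, broadcast)
    } else {
        (network + 1, broadcast - 1)
    };
    Ok((first..=last).map(Ipv4Addr::from).collect())
}

fn main() {
    match hosts_in_cidr("192.168.10.0/28") {
        Ok(hosts) => println!("{} candidates, first {}", hosts.len(), hosts[0]),
        Err(e) => println!("rejected: {e}"),
    }
}
```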
### Patch Deployment Flow (Queued)
```
1. Operator selects hosts + patches → chooses "Queue for next window"
2. Axum creates patch job in PostgreSQL (status: queued)
3. When maintenance window opens → Worker triggers patch operations on agents
4. Worker monitors async job status via agent API
5. Results stored in PostgreSQL → WebSocket relay pushes updates to browser
6. Failed jobs auto-retried once if still within window
```
### Patch Deployment Flow (Immediate)
```
1. Operator selects hosts + patches → chooses "Apply Now"
2. Axum creates patch job in PostgreSQL (status: pending)
3. Worker immediately triggers patch operations on agents
4. Same monitoring and retry logic as queued flow
```
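
The statuses in the two deployment flows form a small state machine. A sketch, assuming hypothetical `Event` names and modelling "auto-retried once if still within window" as one immediate re-run; for "Apply Now" jobs the caller would simply pass `in_window = true`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JobStatus {
    Queued,  // waiting for the next maintenance window
    Pending, // "Apply Now": dispatch as soon as the worker sees it
    Running,
    Completed,
    Failed,
}

/// Hypothetical events the worker reacts to.
#[derive(Clone, Copy, Debug)]
enum Event {
    WindowOpened,
    Dispatched,
    AgentSucceeded,
    AgentFailed,
}

struct Job {
    status: JobStatus,
    retries_used: u32,
}

/// Failed jobs are retried at most once.
const MAX_JOB_RETRIES: u32 = 1;

impl Job {
    /// Apply an event; `in_window` says whether the maintenance window is still open.
    fn on_event(&mut self, event: Event, in_window: bool) -> Result<(), String> {
        use JobStatus::*;
        let next = match (self.status, event) {
            (Queued, Event::WindowOpened) => Running,
            (Pending, Event::Dispatched) => Running,
            (Running, Event::AgentSucceeded) => Completed,
            // One automatic retry, only while the window is still open.
            (Running, Event::AgentFailed) if in_window && self.retries_used < MAX_JOB_RETRIES => {
                self.retries_used += 1;
                Running
            }
            (Running, Event::AgentFailed) => Failed,
            (s, e) => return Err(format!("invalid transition: {s:?} on {e:?}")),
        };
        self.status = next;
        Ok(())
    }
}

fn main() {
    let mut job = Job { status: JobStatus::Queued, retries_used: 0 };
    for (event, in_window) in [
        (Event::WindowOpened, true),
        (Event::AgentFailed, true),
        (Event::AgentFailed, true),
    ] {
        job.on_event(event, in_window).unwrap();
        println!("{event:?} -> {:?}", job.status);
    }
}
```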
### Health/Patch Polling Flow
```
1. Worker polls each agent on schedule (5 min health, 30 min patches)
2. Results cached in PostgreSQL
3. Unhealthy agents marked with visual alerts in dashboard
4. On-demand refresh: operator clicks refresh → Worker queries agent immediately
```
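
Per agent, the worker's scheduling decision compares last-poll times against the two intervals, with the operator's refresh button short-circuiting both. A sketch with hypothetical type and field names; a real worker would presumably read the last-poll timestamps from `host_health_data` and `host_patch_data` rather than hold `Instant`s in memory.

```rust
use std::time::{Duration, Instant};

const HEALTH_INTERVAL: Duration = Duration::from_secs(5 * 60);
const PATCH_INTERVAL: Duration = Duration::from_secs(30 * 60);

/// Hypothetical per-agent polling state.
struct PollState {
    last_health: Option<Instant>,
    last_patch: Option<Instant>,
    /// Set when an operator clicks refresh in the UI.
    refresh_requested: bool,
}

#[derive(Debug, PartialEq)]
enum Poll {
    Health,
    Patches,
}

/// A poll is due if it has never run or its interval has elapsed.
fn is_due(last: Option<Instant>, interval: Duration, now: Instant) -> bool {
    last.map_or(true, |t| now.duration_since(t) >= interval)
}

/// Which polls the worker should issue for this agent right now.
fn polls_due(state: &PollState, now: Instant) -> Vec<Poll> {
    let mut due = Vec::new();
    if state.refresh_requested || is_due(state.last_health, HEALTH_INTERVAL, now) {
        due.push(Poll::Health);
    }
    if state.refresh_requested || is_due(state.last_patch, PATCH_INTERVAL, now) {
        due.push(Poll::Patches);
    }
    due
}

fn main() {
    let start = Instant::now();
    let state = PollState { last_health: Some(start), last_patch: Some(start), refresh_requested: false };
    // Six minutes later only the health check is due.
    println!("{:?}", polls_due(&state, start + Duration::from_secs(6 * 60)));
}
```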
## Technology Stack
| Layer | Technology | Version/Notes |
|-------|-----------|---------------|
| Backend | Rust + Axum | Tokio async runtime, Tower middleware |
| Database | PostgreSQL | SQLx for type-safe queries, migrations via sqlx-cli |
| Frontend | React + TypeScript | Vite build tooling |
| UI Components | MUI (Material UI) | Enterprise dashboard components, dark mode, theming |
| WebSocket | Axum native WebSocket | Agent → Manager → Browser relay |
| Auth (Local) | Argon2 password hashing + TOTP/WebAuthn | MFA enforcement |
| Auth (SSO) | OAuth2/OIDC via Azure AD | Optional, with Azure MFA |
| Session | JWT (access) + PostgreSQL (refresh) | 15 min access, 1 hr refresh |
| mTLS Client | Rustls + client certs | TLS 1.3 only |
| Internal CA | rcgen + Rustls | Certificate issuance and renewal |
| Email | Lettre (Rust email crate) | Optional, disabled by default |
| PDF Export | Rust PDF generation crate | Compliance and audit reports |
| CSV Export | Rust CSV crate | Data export for all report types |
| Service Management | systemd | Ubuntu 24.04 |
| Static Files | Axum built-in static file serving | React SPA served directly |
## Security Architecture
### Authentication
- **Local accounts:** Argon2-hashed passwords + TOTP or WebAuthn for MFA
- **Azure SSO:** OAuth2/OIDC flow with Azure AD, using Azure's built-in MFA
- **Session tokens:** Short-lived JWT (15 min) for API access, server-side refresh tokens (1 hr inactivity timeout)
- **Refresh token revocation:** Stored in PostgreSQL, can be immediately revoked for forced logout
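
The session rules combine a fixed 15-minute access-token lifetime with a sliding 1-hour inactivity window on the refresh token. A sketch of the refresh decision, assuming a hypothetical in-memory `RefreshToken` standing in for a `refresh_tokens` row and treating each successful refresh as activity; JWT signing and validation are omitted.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

const ACCESS_TTL: Duration = Duration::from_secs(15 * 60);
const REFRESH_IDLE_TIMEOUT: Duration = Duration::from_secs(60 * 60);

/// Hypothetical stand-in for a row in `refresh_tokens`.
struct RefreshToken {
    last_used: SystemTime,
    revoked: bool,
}

#[derive(Debug, PartialEq)]
enum RefreshOutcome {
    /// A new access token is issued, valid until `access_expires`.
    Issued { access_expires: SystemTime },
    Revoked,
    IdleTimeout,
}

fn refresh(token: &mut RefreshToken, now: SystemTime) -> RefreshOutcome {
    // Revocation (forced logout) wins over everything else.
    if token.revoked {
        return RefreshOutcome::Revoked;
    }
    let idle = now.duration_since(token.last_used).unwrap_or(Duration::ZERO);
    if idle > REFRESH_IDLE_TIMEOUT {
        return RefreshOutcome::IdleTimeout;
    }
    // Sliding window: each successful refresh resets the inactivity clock.
    token.last_used = now;
    RefreshOutcome::Issued { access_expires: now + ACCESS_TTL }
}

fn main() {
    let t0 = UNIX_EPOCH + Duration::from_secs(1_700_000_000);
    let mut token = RefreshToken { last_used: t0, revoked: false };
    // Refresh after 10 minutes: allowed, and the idle clock resets.
    println!("{:?}", refresh(&mut token, t0 + Duration::from_secs(10 * 60)));
    // Next attempt 61 minutes after that: idle timeout.
    println!("{:?}", refresh(&mut token, t0 + Duration::from_secs(71 * 60)));
}
```

Because an active browser must refresh at least every 15 minutes, a gap of more than an hour between refreshes is a reasonable signal of inactivity, which is also how this design would satisfy HIPAA's automatic logoff control.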
### Authorization (RBAC)
- **Admin:** Full access to all resources and settings
- **Operator:** Can add/remove clients and manage schedules and patches, but only for devices in groups they are a member of
- **Group scoping:** Operators can only interact with hosts in their assigned groups
- **Ungrouped hosts:** Accessible by any operator or admin
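
The group-scoping rules come down to a set-intersection check. A sketch, assuming hypothetical numeric group IDs standing in for rows in `groups`, `host_groups`, and `user_groups`:

```rust
use std::collections::HashSet;

/// Hypothetical role model; operators carry their group memberships.
enum Role {
    Admin,
    Operator { groups: HashSet<u32> },
}

/// A host's group memberships; an empty set means the host is ungrouped.
struct Host {
    groups: HashSet<u32>,
}

/// Can this user add/remove, schedule, or patch the host?
fn can_manage(role: &Role, host: &Host) -> bool {
    match role {
        Role::Admin => true,
        // Ungrouped hosts are open to every operator; otherwise the operator
        // must share at least one group with the host.
        Role::Operator { groups } => host.groups.is_empty() || !host.groups.is_disjoint(groups),
    }
}

/// System settings are admin-only.
fn can_change_settings(role: &Role) -> bool {
    matches!(role, Role::Admin)
}

fn main() {
    let operator = Role::Operator { groups: HashSet::from([1, 2]) };
    let web = Host { groups: HashSet::from([2]) };
    let db = Host { groups: HashSet::from([5]) };
    println!("web: {}", can_manage(&operator, &web)); // true
    println!("db: {}", can_manage(&operator, &db)); // false
    println!("settings: {}", can_change_settings(&operator)); // false
}
```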
### Agent Communication
- **mTLS:** Client certificate authentication for all agent communication
- **TLS 1.3 only:** No older TLS versions
- **Internal CA:** Patch Manager manages CA, issues and renews client certificates
- **Manual distribution:** Server administrators manually install certs on managed clients
### Data Protection
- **Encryption at rest:** LUKS full-disk encryption (infrastructure-managed)
- **Encryption in transit:** TLS 1.3 for all connections (agent and web UI)
- **Audit log integrity:** Tamper-evident logging (hash chaining)
- **Password storage:** Argon2 with salt
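
Hash chaining makes the audit log tamper-evident by folding each entry's predecessor hash into its own. A std-only sketch with hypothetical field names: `DefaultHasher` (SipHash) is used purely to avoid dependencies and is **not** a cryptographic hash; a real `audit_log` would use something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One hypothetical `audit_log` row; `hash` covers `prev_hash` plus this row's data.
#[derive(Clone, Debug)]
struct AuditEntry {
    seq: u64,
    actor: String,
    action: String,
    prev_hash: u64,
    hash: u64,
}

/// PLACEHOLDER digest: SipHash is not collision-resistant against an attacker.
fn digest(prev_hash: u64, seq: u64, actor: &str, action: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    seq.hash(&mut h);
    actor.hash(&mut h);
    action.hash(&mut h);
    h.finish()
}

fn append(log: &mut Vec<AuditEntry>, actor: &str, action: &str) {
    let prev_hash = log.last().map_or(0, |e| e.hash); // 0 marks the genesis entry
    let seq = log.len() as u64;
    let hash = digest(prev_hash, seq, actor, action);
    log.push(AuditEntry { seq, actor: actor.to_string(), action: action.to_string(), prev_hash, hash });
}

/// Index of the first entry whose link or hash fails to verify, if any.
fn verify(log: &[AuditEntry]) -> Option<usize> {
    let mut expected_prev = 0;
    for (i, e) in log.iter().enumerate() {
        let recomputed = digest(e.prev_hash, e.seq, &e.actor, &e.action);
        if e.prev_hash != expected_prev || e.hash != recomputed {
            return Some(i);
        }
        expected_prev = e.hash;
    }
    None
}

fn main() {
    let mut log = Vec::new();
    append(&mut log, "admin", "login");
    append(&mut log, "admin", "register host web01");
    println!("intact: {:?}", verify(&log)); // None
    log[1].action = "register host db01".to_string();
    println!("tampered: {:?}", verify(&log)); // Some(1)
}
```

Chaining detects edits and deletions inside the log; detecting truncation of the newest entries additionally requires recording the latest hash somewhere the database cannot rewrite.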
### Compliance
- **HIPAA:** Audit controls, access controls, integrity controls, transmission security, automatic logoff
- **PCI-DSS:** Vulnerability management (core function), access restrictions, user identification, audit tracking, data protection
## Deployment Architecture
```
┌─────────────────────────────────────────┐
│ Patch Manager Host (Ubuntu 24.04)       │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ systemd: patch-manager-web          │ │
│ │ (Axum web server + static files)    │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ systemd: patch-manager-worker       │ │
│ │ (Background polling + jobs)         │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ PostgreSQL                          │ │
│ │ (Database)                          │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ Internal CA                         │ │
│ │ (Certificate management)            │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ LUKS (Full-disk encryption)         │ │
│ │ (Infrastructure-managed)            │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
```
- Two systemd services: `patch-manager-web` and `patch-manager-worker`
- PostgreSQL runs on the same host
- Internal CA runs on the same host
- LUKS full-disk encryption managed by infrastructure
- No Docker/LXC — bare metal/VM deployment
- Internal network only — no public internet exposure
## Scalability
- **Single-instance design:** Sized for a typical fleet of 500 hosts; supports up to 2,500
- **Manual horizontal scaling:** Divide clients between multiple Patch Manager hosts if needed
- **Connection handling:** Tokio's async runtime lets Axum serve many concurrent connections without a thread per connection, and SQLx pools database connections
- **Background worker:** Independent scaling of polling/jobs from web serving
- **Database:** PostgreSQL handles the workload easily on a single host
- **No automatic clustering or load balancing required**
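
A quick estimate of the steady-state polling load supports the single-instance sizing. This counts only scheduled health and patch polls, not jobs, manual refreshes, or retries:

```rust
/// Average agent requests per second generated by periodic polling alone.
fn polls_per_second(hosts: u32, interval_secs: u32) -> f64 {
    hosts as f64 / interval_secs as f64
}

fn main() {
    for hosts in [500u32, 2_500] {
        let health = polls_per_second(hosts, 5 * 60);
        let patches = polls_per_second(hosts, 30 * 60);
        println!(
            "{hosts} hosts: {health:.2} health/s + {patches:.2} patch/s = {:.2} req/s",
            health + patches
        );
    }
}
```

At 2,500 hosts this works out to roughly 8.33 + 1.39 = 9.72 agent requests per second, a modest rate for one Tokio process; spreading polls across each interval rather than firing them in a burst would keep that load even.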
## Integration Points
**Upstream Dependency:** [Linux Patch API](https://gitea.moon-dragon.us/echo/linux_patch_api)

| Integration | Protocol | Direction | Purpose |
|-------------|----------|-----------|----------|
| Agent REST API | HTTPS/mTLS (TLS 1.3) | Manager → Agent | Queries, patch operations |
| Agent WebSocket | WSS/mTLS | Agent → Manager | Real-time job status streaming |
| Azure AD | HTTPS/OAuth2 | Manager → Azure | SSO authentication (optional) |

**API Endpoints Used:**
- `GET /api/v1/health` — Agent health checks
- `GET /api/v1/system/info` — Host system information
- `GET /api/v1/packages` — List installed packages
- `GET /api/v1/patches` — List available patches
- `POST /api/v1/patches/apply` — Apply patches
- `PUT /api/v1/packages/{name}` — Update specific package
- `DELETE /api/v1/packages/{name}` — Remove package
- `POST /api/v1/packages` — Install packages
- `GET /api/v1/jobs` — List jobs
- `GET /api/v1/jobs/{id}` — Get job status
- `POST /api/v1/jobs/{id}/rollback` — Rollback a job
- `POST /api/v1/system/reboot` — Reboot host
- `WebSocket /api/v1/ws/jobs` — Real-time job status
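
Centralizing the upstream endpoints in one typed enum keeps methods and paths from drifting across the codebase. A sketch, assuming hypothetical `AgentCall`, `route`, and `url` names, string job IDs, and agents on port 12443; the WebSocket stream is left out because it is not a request/response call, and package names are assumed to be URL-safe.

```rust
/// Hypothetical typed view of the upstream Linux Patch API calls listed above.
enum AgentCall<'a> {
    Health,
    SystemInfo,
    ListPackages,
    ListPatches,
    ApplyPatches,
    InstallPackages,
    UpdatePackage(&'a str),
    RemovePackage(&'a str),
    ListJobs,
    JobStatus(&'a str),
    RollbackJob(&'a str),
    Reboot,
}

/// Agent port, as used elsewhere in this document.
const AGENT_PORT: u16 = 12443;

impl AgentCall<'_> {
    /// HTTP method and path for the call.
    fn route(&self) -> (&'static str, String) {
        match self {
            AgentCall::Health => ("GET", "/api/v1/health".to_string()),
            AgentCall::SystemInfo => ("GET", "/api/v1/system/info".to_string()),
            AgentCall::ListPackages => ("GET", "/api/v1/packages".to_string()),
            AgentCall::ListPatches => ("GET", "/api/v1/patches".to_string()),
            AgentCall::ApplyPatches => ("POST", "/api/v1/patches/apply".to_string()),
            AgentCall::InstallPackages => ("POST", "/api/v1/packages".to_string()),
            // No percent-encoding is applied; names are assumed URL-safe.
            AgentCall::UpdatePackage(name) => ("PUT", format!("/api/v1/packages/{name}")),
            AgentCall::RemovePackage(name) => ("DELETE", format!("/api/v1/packages/{name}")),
            AgentCall::ListJobs => ("GET", "/api/v1/jobs".to_string()),
            AgentCall::JobStatus(id) => ("GET", format!("/api/v1/jobs/{id}")),
            AgentCall::RollbackJob(id) => ("POST", format!("/api/v1/jobs/{id}/rollback")),
            AgentCall::Reboot => ("POST", "/api/v1/system/reboot".to_string()),
        }
    }

    /// Full HTTPS URL for a given agent host.
    fn url(&self, host: &str) -> String {
        format!("https://{}:{}{}", host, AGENT_PORT, self.route().1)
    }
}

fn main() {
    let call = AgentCall::RollbackJob("42");
    let (method, _) = call.route();
    println!("{method} {}", call.url("web01.example.internal"));
}
```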
## Monitoring and Observability
- **Application logging:** Structured JSON logging (tracing crate)
- **Log levels:** Configurable at runtime (DEBUG, INFO, WARN, ERROR)
- **Health endpoint:** `GET /api/v1/health` on the Patch Manager's own API for infrastructure monitoring
- **Dashboard alerts:** Visual indicators for unhealthy/unreachable agents (red/yellow status)
- **Audit logging:** All significant events logged to PostgreSQL with tamper-evident hash chaining
- **No external monitoring integration required** (dashboard-only alerts)