Complete SDD specification documents
- SPEC.md: Full project specification including scope, objectives, constraints, architecture overview, API integration, certificate management, UI structure, error handling, audit logging, and out-of-scope items
- REQUIREMENTS.md: Functional requirements (host mgmt, patch monitoring, deployment, scheduling, reporting, user mgmt), non-functional requirements (security, performance, scalability, reliability, usability), interface requirements, data requirements, HIPAA/PCI-DSS compliance
- ARCHITECTURE.md: Architecture decisions, system architecture diagram, component design (Axum web server, background worker, PostgreSQL, React SPA, internal CA), data flows, technology stack, security architecture, deployment architecture, integration points, monitoring
14
.gitignore
vendored
Normal file
@@ -0,0 +1,14 @@
# Agent Zero project data
.a0proj/

# Python environments & cache
venv/**
**/__pycache__/**

# Node.js dependencies
**/node_modules/**
**/.npm/**

# IDE
.vscode/
.idea/

306
ARCHITECTURE.md
@@ -7,42 +7,326 @@

## Architecture Decisions

<!-- Document key architectural decisions and rationale -->

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Backend language/framework | Rust with Axum | Security-aligned with linux_patch_api, memory-safe, high async performance |
| Frontend framework | React + TypeScript SPA | Rich ecosystem for enterprise dashboards, strong typing |
| Database | PostgreSQL with SQLx | Enterprise-grade, type-safe Rust queries, handles concurrent access |
| Async runtime | Tokio | Standard Rust async runtime, integrates with Axum |
| Deployment model | Single bare metal/VM | Simplicity, supports up to 2,500 managed hosts |
| Frontend serving | Axum serves static files | Simplest deployment, single process |
| Background processing | Separate worker process | Clean separation of concerns, communicates via PostgreSQL |
| Session management | JWT + refresh tokens | Short-lived access tokens (15 min), revocable refresh tokens (1 hr) |
| Encryption at rest | LUKS full-disk (infrastructure) | HIPAA/PCI-DSS compliant, handled at infrastructure level |
| Certificate management | Internal CA on Patch Manager host | Issues/renews mTLS certs, manual distribution to clients |

## System Architecture

<!-- High-level system architecture diagram and description -->

```
┌──────────────────────────────────────────────────────────────┐
│                   Linux Patch Manager Host                   │
│                        (Ubuntu 24.04)                        │
│                                                              │
│  ┌─────────────────────┐    ┌──────────────────────────────┐ │
│  │   Axum Web Server   │    │      Background Worker       │ │
│  │                     │    │                              │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │   REST API    │  │    │  │     Health Poller      │  │ │
│  │  │ (CRUD, auth)  │  │    │  │   (5 min intervals)    │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │   WebSocket   │  │    │  │   Patch Data Poller    │  │ │
│  │  │     Relay     │  │    │  │   (30 min intervals)   │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │ Static Files  │  │    │  │     Job Scheduler      │  │ │
│  │  │  (React SPA)  │  │    │  │ (maintenance windows)  │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  │  ┌───────────────┐  │    │  ┌────────────────────────┐  │ │
│  │  │  mTLS Client  │  │    │  │      Retry Engine      │  │ │
│  │  │ (agent comm)  │◄─┼────┼─►│     (exp. backoff)     │  │ │
│  │  └───────────────┘  │    │  └────────────────────────┘  │ │
│  └──────────┬──────────┘    │  ┌────────────────────────┐  │ │
│             │               │  │     Email Notifier     │  │ │
│             │               │  │  (optional/disabled)   │  │ │
│             │               │  └────────────────────────┘  │ │
│             │               └──────────────┬───────────────┘ │
│             │                              │                 │
│             │          ┌───────────────────┘                 │
│             │          │                                     │
│  ┌──────────▼──────────▼───────────────────────────────────┐ │
│  │                       PostgreSQL                        │ │
│  │  (hosts, groups, users, jobs, schedules, audit, etc.)   │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                Internal CA (mTLS certs)                 │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
                               │
                 mTLS / REST API (port 12443)
                        ┌──────┼──────┐
                        ▼      ▼      ▼
                    ┌──────┐┌──────┐┌──────┐
                    │ Host ││ Host ││ Host │  ← Linux Patch API agents
                    │  A   ││  B   ││  C   │    (up to 2,500)
                    └──────┘└──────┘└──────┘
```

## Component Design

<!-- Detailed component design and interactions -->

### 1. Axum Web Server

**Responsibility:** Handle all HTTP/HTTPS requests from browsers and serve the React SPA.

- **REST API:** CRUD operations for hosts, groups, users, schedules, certificates, reports
- **WebSocket Relay:** Proxy real-time job status from agent WebSocket streams to browser clients
- **Static File Server:** Serve compiled React SPA (HTML, JS, CSS, assets)
- **Authentication:** JWT access token validation, refresh token management, MFA enforcement
- **Authorization:** RBAC middleware enforcing admin/operator/group-scoped access
- **mTLS Client:** HTTP client with client certificates for communicating with Linux Patch API agents

**API Versioning:** URL path versioning (`/api/v1/`) to match the upstream Linux Patch API convention.

### 2. Background Worker

**Responsibility:** All scheduled and asynchronous background processing.

- **Health Poller:** Periodic health checks to all registered agents (5-minute intervals)
- **Patch Data Poller:** Periodic patch availability queries to all agents (30-minute intervals)
- **Job Scheduler:** Execute queued patch operations when maintenance windows open
- **Retry Engine:** Handle agent communication failures with exponential backoff (3 retries, at most 30 minutes between retries)
- **Job Executor:** Trigger patch operations on agents, track async job status
- **Email Notifier:** Optional email notifications (disabled by default)
- **Data Pruner:** Clean up operational data older than 30 days, audit logs older than 6 months

**Communication:** The worker reads the job queue from PostgreSQL and writes results back to PostgreSQL. The web server reads those results from PostgreSQL to build API responses.
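
The Retry Engine's backoff schedule can be sketched as follows. This is a minimal illustration, not the project's implementation: the `RetryPolicy` type, its field names, and the 5-minute base delay are assumptions; only the limits (3 retries, 30-minute cap) come from the spec.

```rust
use std::time::Duration;

/// Hypothetical retry policy; names and the base delay are illustrative assumptions.
pub struct RetryPolicy {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

impl RetryPolicy {
    /// Delay to wait before retry number `attempt` (1-based).
    /// Returns `None` once the retry budget is exhausted.
    pub fn delay_for(&self, attempt: u32) -> Option<Duration> {
        if attempt == 0 || attempt > self.max_retries {
            return None;
        }
        // Exponential growth: base * 2^(attempt - 1), saturating on overflow.
        let factor = 2u32.saturating_pow(attempt - 1);
        let delay = self.base_delay.saturating_mul(factor);
        // Never wait longer than the configured cap (30 minutes in the spec).
        Some(delay.min(self.max_delay))
    }
}
```

With an assumed 5-minute base, the three retries would wait 5, 10, and 20 minutes; a larger base would be clamped to the 30-minute cap.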

### 3. PostgreSQL Database

**Responsibility:** Persistent storage for all application data.

**Key Tables:**
- `hosts`: registered hosts, metadata, health status, last seen
- `groups`: static groups for access control
- `host_groups`: many-to-many host ↔ group membership
- `users`: local accounts with hashed passwords, MFA secrets
- `user_groups`: many-to-many user ↔ group membership
- `refresh_tokens`: server-side refresh tokens for session management
- `maintenance_windows`: per-device recurring and one-time schedules
- `patch_jobs`: queued, running, completed, failed patch operations
- `patch_job_hosts`: per-host status within a batch job
- `host_patch_data`: cached patch availability data from agents
- `host_health_data`: cached health check results
- `certificates`: issued mTLS client certificates
- `audit_log`: tamper-evident audit trail
- `azure_sso_config`: Azure AD SSO configuration

**Data Retention:**
- Operational data (health, patches, jobs): 30 days
- Audit logs: 6 months
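
The retention rules above amount to a per-record age check that a pruning task could apply. The sketch below is illustrative only: `RecordKind`, `retention`, and `is_expired` are hypothetical names, and "6 months" is approximated as 183 days, which is an assumption rather than something the spec defines.

```rust
use std::time::{Duration, SystemTime};

/// One day, used to express retention periods.
pub const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Hypothetical classification of stored records.
#[derive(Clone, Copy)]
pub enum RecordKind {
    Operational, // health, patch, and job data
    Audit,       // audit log entries
}

/// Retention period per record kind (6 months assumed to be 183 days).
pub fn retention(kind: RecordKind) -> Duration {
    match kind {
        RecordKind::Operational => DAY * 30,
        RecordKind::Audit => DAY * 183,
    }
}

/// True when a record is older than its retention period.
/// Records timestamped in the future are never treated as expired.
pub fn is_expired(kind: RecordKind, recorded_at: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(recorded_at) {
        Ok(age) => age > retention(kind),
        Err(_) => false,
    }
}
```

In a real deployment the same cutoff would more likely be computed once and passed to a SQL `DELETE`, rather than checked row by row in application code.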

### 4. React + TypeScript SPA

**Responsibility:** User-facing web interface.

**Pages:**
1. Dashboard: fleet overview, compliance %, health summary, upcoming windows, root CA download
2. Hosts: filterable host list by group, status, OS
3. Host Detail: system info, packages, patches, jobs, maintenance window config, host cert download
4. Patch Deployment: select hosts, review patches, deploy (queue or immediate)
5. Jobs: real-time job monitoring with WebSocket updates
6. Maintenance Windows: per-device recurring/one-time schedule management
7. Groups: manage static groups, assign hosts and operators
8. Reports: generate/export compliance, patch history, vulnerability, audit (CSV/PDF)
9. Users: local account management, MFA setup, group assignments
10. Certificates: view/manage internal CA, issue/renew client certs
11. Settings: system config, Azure SSO, polling intervals

### 5. Internal CA

**Responsibility:** mTLS certificate management for agent communication.

- Runs on the same Patch Manager host
- Issues client certificates for mTLS communication with agents
- Manages certificate renewal
- Root CA certificate downloadable from dashboard for manual distribution
- Host-specific mTLS certificates downloadable from host detail page
- No automated distribution to clients; server administrators handle this manually

## Data Flow

<!-- Data flow between components -->

### Host Registration Flow

```
1. Admin enters FQDN/IP → Axum validates & resolves FQDN
2. Axum stores host in PostgreSQL
3. Worker picks up new host → initial health check via mTLS
4. Health result stored in PostgreSQL → visible in dashboard
```

### Auto-Discovery Flow

```
1. Admin triggers CIDR scan → Axum sends request to Worker
2. Worker scans subnet for agents on port 12443
3. Discovered agents reported back → Admin selects which to register
4. Selected hosts stored in PostgreSQL
```
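
Step 2 of the auto-discovery flow needs the list of addresses to probe. A minimal sketch of expanding an IPv4 CIDR range into scan targets on port 12443 follows; the function name and the refusal of ranges wider than /16 are assumptions made here to keep scans bounded, not rules from the spec.

```rust
use std::net::{Ipv4Addr, SocketAddrV4};

/// Agent port from the spec.
pub const AGENT_PORT: u16 = 12443;

/// Expand an IPv4 CIDR string (e.g. "10.0.0.0/24") into scan targets.
/// Returns `None` for malformed input or ranges wider than /16 (assumed limit).
pub fn scan_targets(cidr: &str) -> Option<Vec<SocketAddrV4>> {
    let (addr, prefix) = cidr.split_once('/')?;
    let addr: Ipv4Addr = addr.parse().ok()?;
    let prefix: u32 = prefix.parse().ok()?;
    if prefix > 32 || prefix < 16 {
        return None;
    }
    let mask = u32::MAX << (32 - prefix);
    let network = u32::from(addr) & mask;
    let broadcast = network | !mask;
    // For /31 and /32 every address is usable; otherwise skip the
    // network and broadcast addresses.
    let range = if prefix >= 31 {
        network..=broadcast
    } else {
        (network + 1)..=(broadcast - 1)
    };
    Some(
        range
            .map(|ip| SocketAddrV4::new(Ipv4Addr::from(ip), AGENT_PORT))
            .collect(),
    )
}
```

The worker would then attempt an mTLS connection to each target and report the ones that answer as Linux Patch API agents.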

### Patch Deployment Flow (Queued)

```
1. Operator selects hosts + patches → chooses "Queue for next window"
2. Axum creates patch job in PostgreSQL (status: queued)
3. When maintenance window opens → Worker triggers patch operations on agents
4. Worker monitors async job status via agent API
5. Results stored in PostgreSQL → WebSocket relay pushes updates to browser
6. Failed jobs auto-retried once if still within window
```
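
The retry rule in step 6 (one automatic retry, only while the window is still open) can be sketched as a small state transition. The types below are hypothetical; timestamps are plain Unix seconds to keep the example dependency-free, and the status names follow the `patch_jobs` states listed in the component design.

```rust
/// Job states mirroring the `patch_jobs` table description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Running,
    Completed,
    Failed,
}

/// Hypothetical maintenance window, half-open interval in Unix seconds.
pub struct MaintenanceWindow {
    pub start: u64,
    pub end: u64,
}

impl MaintenanceWindow {
    pub fn contains(&self, now: u64) -> bool {
        now >= self.start && now < self.end
    }
}

/// Hypothetical in-memory view of a patch job.
pub struct PatchJob {
    pub status: JobStatus,
    pub retries_used: u32,
}

/// Apply the failure rule: re-queue once if the window is still open,
/// otherwise mark the job as failed so it is surfaced to operators.
pub fn record_failure(job: &mut PatchJob, window: &MaintenanceWindow, now: u64) {
    if job.retries_used < 1 && window.contains(now) {
        job.retries_used += 1;
        job.status = JobStatus::Queued;
    } else {
        job.status = JobStatus::Failed;
    }
}
```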

### Patch Deployment Flow (Immediate)

```
1. Operator selects hosts + patches → chooses "Apply Now"
2. Axum creates patch job in PostgreSQL (status: pending)
3. Worker immediately triggers patch operations on agents
4. Same monitoring and retry logic as queued flow
```

### Health/Patch Polling Flow

```
1. Worker polls each agent on schedule (5 min health, 30 min patches)
2. Results cached in PostgreSQL
3. Unhealthy agents marked with visual alerts in dashboard
4. On-demand refresh: operator clicks refresh → Worker queries agent immediately
```

## Technology Stack

<!-- Technology choices and rationale -->

| Layer | Technology | Version/Notes |
|-------|-----------|---------------|
| Backend | Rust + Axum | Tokio async runtime, Tower middleware |
| Database | PostgreSQL | SQLx for type-safe queries, migrations via sqlx-cli |
| Frontend | React + TypeScript | Vite build tooling |
| UI Components | MUI (Material UI) | Enterprise dashboard components, dark mode, theming |
| WebSocket | Axum native WebSocket | Agent → Manager → Browser relay |
| Auth (Local) | Argon2 password hashing + TOTP/WebAuthn | MFA enforcement |
| Auth (SSO) | OAuth2/OIDC via Azure AD | Optional, with Azure MFA |
| Session | JWT (access) + PostgreSQL (refresh) | 15 min access, 1 hr refresh |
| mTLS Client | rustls + client certs | TLS 1.3 only |
| Internal CA | rustls + rcgen | Certificate issuance and renewal |
| Email | Lettre (Rust email crate) | Optional, disabled by default |
| PDF Export | Rust PDF generation crate | Compliance and audit reports |
| CSV Export | Rust CSV crate | Data export for all report types |
| Service Management | systemd | Ubuntu 24.04 |
| Static Files | Served by the Axum process (for example via tower-http `ServeDir`) | React SPA served directly |

## Security Architecture

<!-- Security design including authentication, authorization, encryption -->

### Authentication
- **Local accounts:** Argon2-hashed passwords + TOTP or WebAuthn for MFA
- **Azure SSO:** OAuth2/OIDC flow with Azure AD, using Azure's built-in MFA
- **Session tokens:** Short-lived JWT (15 min) for API access, server-side refresh tokens (1 hr inactivity timeout)
- **Refresh token revocation:** Stored in PostgreSQL, can be immediately revoked for forced logout
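
How the two lifetimes interact can be shown with a small decision function. This is a sketch under assumptions: the `Session` fields and `SessionState` names are hypothetical, and timestamps are Unix seconds. Only the 15-minute access lifetime, the 1-hour inactivity timeout, and revocability come from the spec.

```rust
/// Lifetimes taken from the spec.
pub const ACCESS_TOKEN_TTL_SECS: u64 = 15 * 60;
pub const REFRESH_INACTIVITY_SECS: u64 = 60 * 60;

/// Hypothetical server-side view of a session.
pub struct Session {
    pub access_issued_at: u64,
    pub last_activity: u64,
    pub revoked: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SessionState {
    /// Access token still valid.
    Active,
    /// Access token expired, refresh token still usable.
    NeedsRefresh,
    /// Refresh token revoked or idle for too long; user must log in again.
    Expired,
}

pub fn session_state(s: &Session, now: u64) -> SessionState {
    let idle = now.saturating_sub(s.last_activity);
    if s.revoked || idle >= REFRESH_INACTIVITY_SECS {
        return SessionState::Expired;
    }
    let token_age = now.saturating_sub(s.access_issued_at);
    if token_age >= ACCESS_TOKEN_TTL_SECS {
        SessionState::NeedsRefresh
    } else {
        SessionState::Active
    }
}
```

Checking revocation first means a forced logout takes effect as soon as the current access token is next refreshed, which is why the access lifetime is kept short.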

### Authorization (RBAC)
- **Admin:** Full access to all resources and settings
- **Operator:** Can add/remove clients, manage schedules and patches only for devices in their group memberships
- **Group scoping:** Operators can only interact with hosts in their assigned groups
- **Ungrouped hosts:** Accessible by any operator or admin
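
The four rules above reduce to a single access check. The sketch below assumes group membership is available as sets of numeric group IDs; the type and function names are illustrative, not taken from the codebase.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
}

/// Hypothetical user record: a role plus the IDs of the groups the user belongs to.
pub struct User {
    pub role: Role,
    pub groups: HashSet<u32>,
}

/// Returns true when `user` may manage a host that belongs to `host_groups`.
pub fn can_manage_host(user: &User, host_groups: &HashSet<u32>) -> bool {
    match user.role {
        // Admins have full access.
        Role::Admin => true,
        // Operators may manage ungrouped hosts, or hosts that share
        // at least one group with them.
        Role::Operator => host_groups.is_empty() || !user.groups.is_disjoint(host_groups),
    }
}
```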

### Agent Communication
- **mTLS:** Client certificate authentication for all agent communication
- **TLS 1.3 only:** No older TLS versions
- **Internal CA:** Patch Manager manages CA, issues and renews client certificates
- **Manual distribution:** Server administrators manually install certs on managed clients

### Data Protection
- **Encryption at rest:** LUKS full-disk encryption (infrastructure-managed)
- **Encryption in transit:** TLS 1.3 for all connections (agent and web UI)
- **Audit log integrity:** Tamper-evident logging (hash chaining)
- **Password storage:** Argon2 with salt
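
Hash chaining makes the audit log tamper-evident because each entry's hash covers the previous entry's hash, so editing any entry invalidates everything after it. The sketch below illustrates the idea only. To stay dependency-free it uses the standard library's `DefaultHasher`, which is not a cryptographic hash; a real implementation would need a cryptographic hash such as SHA-256 from an external crate. All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical audit entry carrying its link to the previous entry.
pub struct AuditEntry {
    pub message: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash of (previous hash, message). Stand-in for a cryptographic hash.
fn link_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

/// Append an entry whose hash covers the previous entry's hash.
pub fn append(log: &mut Vec<AuditEntry>, message: &str) {
    let prev_hash = log.last().map_or(0, |entry| entry.hash);
    log.push(AuditEntry {
        message: message.to_string(),
        prev_hash,
        hash: link_hash(prev_hash, message),
    });
}

/// Recompute the chain from the start; false if any entry was altered.
pub fn verify(log: &[AuditEntry]) -> bool {
    let mut expected_prev = 0u64;
    for entry in log {
        if entry.prev_hash != expected_prev
            || entry.hash != link_hash(entry.prev_hash, &entry.message)
        {
            return false;
        }
        expected_prev = entry.hash;
    }
    true
}
```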

### Compliance
- **HIPAA:** Audit controls, access controls, integrity controls, transmission security, automatic logoff
- **PCI-DSS:** Vulnerability management (core function), access restrictions, user identification, audit tracking, data protection

## Deployment Architecture

<!-- How the system is deployed and configured -->

```
┌─────────────────────────────────────────┐
│    Patch Manager Host (Ubuntu 24.04)    │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ systemd: patch-manager-web          │ │
│ │ (Axum web server + static files)    │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ systemd: patch-manager-worker       │ │
│ │ (Background polling + jobs)         │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ PostgreSQL                          │ │
│ │ (Database)                          │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ Internal CA                         │ │
│ │ (Certificate management)            │ │
│ └─────────────────────────────────────┘ │
│                                         │
│ ┌─────────────────────────────────────┐ │
│ │ LUKS (Full-disk encryption)         │ │
│ │ (Infrastructure-managed)            │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
```

- Two systemd services: `patch-manager-web` and `patch-manager-worker`
- PostgreSQL runs on the same host
- Internal CA runs on the same host
- LUKS full-disk encryption managed by infrastructure
- No Docker/LXC: bare metal/VM deployment
- Internal network only: no public internet exposure

## Scalability

<!-- How the system scales horizontally and vertically -->

- **Single-instance design:** Sized for a typical fleet of about 500 hosts, up to 2,500
- **Manual horizontal scaling:** Divide clients between multiple Patch Manager hosts if needed
- **Concurrency:** Axum on Tokio can serve thousands of concurrent connections from a single process
- **Background worker:** Runs as a separate process, so polling and job load does not compete with web serving
- **Database:** PostgreSQL is expected to handle this workload on a single host
- **No automatic clustering or load balancing required**

## Integration Points

<!-- External system integrations, especially Linux Patch API -->

**Upstream Dependency:** [Linux Patch API](https://gitea.moon-dragon.us/echo/linux_patch_api)

| Integration | Protocol | Direction | Purpose |
|-------------|----------|-----------|----------|
| Agent REST API | HTTPS/mTLS (TLS 1.3) | Manager → Agent | Queries, patch operations |
| Agent WebSocket | WSS/mTLS | Agent → Manager | Real-time job status streaming |
| Azure AD | HTTPS/OAuth2 | Manager → Azure | SSO authentication (optional) |

**API Endpoints Used:**
- `GET /api/v1/health`: Agent health checks
- `GET /api/v1/system/info`: Host system information
- `GET /api/v1/packages`: List installed packages
- `GET /api/v1/patches`: List available patches
- `POST /api/v1/patches/apply`: Apply patches
- `PUT /api/v1/packages/{name}`: Update specific package
- `DELETE /api/v1/packages/{name}`: Remove package
- `POST /api/v1/packages`: Install packages
- `GET /api/v1/jobs`: List jobs
- `GET /api/v1/jobs/{id}`: Get job status
- `POST /api/v1/jobs/{id}/rollback`: Rollback a job
- `POST /api/v1/system/reboot`: Reboot host
- `WebSocket /api/v1/ws/jobs`: Real-time job status
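
A typed wrapper around a few of these endpoints keeps the port, base path, and HTTP method in one place. The sketch below covers only a subset of the list; the `AgentEndpoint` enum and `agent_url` helper are hypothetical names, and the example hostname in the usage note is invented.

```rust
/// Agent port and base path from the spec.
pub const AGENT_PORT: u16 = 12443;
pub const API_BASE: &str = "/api/v1";

/// Hypothetical typed view of a subset of the agent endpoints listed above.
pub enum AgentEndpoint<'a> {
    Health,
    SystemInfo,
    Patches,
    ApplyPatches,
    Job(&'a str),
    RollbackJob(&'a str),
}

impl AgentEndpoint<'_> {
    /// HTTP method used for this endpoint.
    pub fn method(&self) -> &'static str {
        match self {
            AgentEndpoint::Health
            | AgentEndpoint::SystemInfo
            | AgentEndpoint::Patches
            | AgentEndpoint::Job(_) => "GET",
            AgentEndpoint::ApplyPatches | AgentEndpoint::RollbackJob(_) => "POST",
        }
    }

    /// Path including the `/api/v1` prefix.
    pub fn path(&self) -> String {
        match self {
            AgentEndpoint::Health => format!("{}/health", API_BASE),
            AgentEndpoint::SystemInfo => format!("{}/system/info", API_BASE),
            AgentEndpoint::Patches => format!("{}/patches", API_BASE),
            AgentEndpoint::ApplyPatches => format!("{}/patches/apply", API_BASE),
            AgentEndpoint::Job(id) => format!("{}/jobs/{}", API_BASE, id),
            AgentEndpoint::RollbackJob(id) => format!("{}/jobs/{}/rollback", API_BASE, id),
        }
    }
}

/// Full HTTPS URL for an endpoint on a given agent host.
pub fn agent_url(host: &str, endpoint: &AgentEndpoint) -> String {
    format!("https://{}:{}{}", host, AGENT_PORT, endpoint.path())
}
```

For example, `agent_url("host-a.example.internal", &AgentEndpoint::Job("42"))` would produce a URL ending in `/api/v1/jobs/42` on port 12443.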

## Monitoring and Observability

<!-- Logging, metrics, tracing strategy -->

- **Application logging:** Structured JSON logging (tracing crate)
- **Log levels:** Configurable at runtime (DEBUG, INFO, WARN, ERROR)
- **Health endpoint:** `GET /api/v1/health` on the Patch Manager's own API for infrastructure monitoring
- **Dashboard alerts:** Visual indicators for unhealthy/unreachable agents (red/yellow status)
- **Audit logging:** All significant events logged to PostgreSQL with tamper-evident hash chaining
- **No external monitoring integration required** (dashboard-only alerts)

101
REQUIREMENTS.md
@@ -7,63 +7,146 @@

## Functional Requirements

<!-- Define all functional requirements -->

### FR-01: Host Management

- Manual host registration by FQDN or IP address (FQDN resolved to IP at add time)
- On-demand auto-discovery targeting a CIDR subnet range (scans for Linux Patch API agents on port 12443)
- Host metadata tracked: hostname, IP, OS, kernel, agent version, last seen, health status
- Static group-based organization with many-to-many relationships (hosts can belong to multiple groups)
- Ungrouped hosts can be managed by any operator or admin
- Host removal with audit logging

### FR-02: Patch Monitoring

- Scheduled background polling: 5-minute intervals for health checks, 30-minute intervals for patch data
- On-demand refresh triggered by operator/admin from the UI
- Visual dashboard alerts for unhealthy or unreachable agents (red/yellow status indicators)
- CVE severity, patch priority, and reboot requirement display per host

### FR-03: Patch Deployment

- Patches queue for the next available maintenance window by default
- Immediate-apply override option for urgent patches
- No approval gate required; operator/admin triggers deployment directly
- Auto-retry failed patch jobs once if still within the maintenance window, then surface failure prominently
- Batch operations across multiple hosts with partial failure handling (auto-retry once, then report failures)
- Rollback support via upstream Linux Patch API rollback endpoint

### FR-04: Scheduling

- Maintenance windows are per-device (not per-group)
- Recurring schedules: daily, weekly, or monthly
- One-time maintenance windows
- Patch operations execute automatically when a maintenance window opens

### FR-05: Reporting

- Compliance report: percentage of hosts fully patched, by group or fleet-wide
- Patch history: log of all patch operations per host or per group
- Vulnerability exposure: hosts with known CVEs pending patches
- Audit trail: who did what, and when (user actions, patch operations)
- Export formats: CSV and PDF
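
The compliance figure in FR-05 is a simple ratio. A minimal sketch follows, assuming "fully patched" means a host has zero pending patches in the cached patch data; that interpretation and the function name are assumptions, not definitions from the requirements.

```rust
/// Percentage of hosts with no pending patches.
/// `pending_per_host` holds the pending-patch count for each host in scope
/// (one group or the whole fleet). Returns `None` for an empty scope,
/// since a percentage is undefined there.
pub fn compliance_percent(pending_per_host: &[u32]) -> Option<f64> {
    if pending_per_host.is_empty() {
        return None;
    }
    let compliant = pending_per_host.iter().filter(|&&n| n == 0).count();
    Some(compliant as f64 / pending_per_host.len() as f64 * 100.0)
}
```

Running the same function over the hosts of a single group gives the per-group figure; running it over all hosts gives the fleet-wide one.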

### FR-06: User Management

- **Admin role**: Full access to manage all aspects of Linux Patch Manager
- **Operator role**: Can add/remove clients, manage schedules and patches only for devices in their group memberships
- Operators can belong to multiple groups
- Local accounts with MFA required (TOTP or WebAuthn)
- Azure SSO integration (optional, with Azure's built-in MFA)
- Group membership management for users and hosts

## Non-Functional Requirements

<!-- Define all non-functional requirements -->

### NFR-01: Security

- Combination authentication: local accounts + Azure SSO
- MFA required for all users (TOTP or WebAuthn; Azure MFA for SSO users)
- Session management: short-lived JWT access tokens (15 min) + server-side refresh tokens (1-hour inactivity timeout, revocable)
- mTLS for all agent communication (certificate-based, TLS 1.3 only)
- HTTPS enforced for web UI
- Internal CA managed by Patch Manager for mTLS certificate issuance and renewal
- Certificate distribution to managed clients is manual (server administrators responsible)
- RBAC with group-scoped access control

### NFR-02: Performance

- Support 500 typical managed hosts, up to 2,500
- Dashboard load time under 5 seconds for full fleet view
- Background polling must not degrade UI responsiveness
- Concurrent batch operations (e.g., patch 500 hosts simultaneously) must not overwhelm the system

### NFR-03: Scalability

- Single-instance design on bare metal/VM (Ubuntu 24.04)
- Manual horizontal scaling by dividing clients between multiple Patch Manager hosts if needed
- No automatic clustering or load balancing required

### NFR-04: Reliability

- Agent communication failures: retry with exponential backoff (3 retries, max 30 minutes between retries)
- Patch job failures: auto-retry once within maintenance window, then surface to operators
- Batch partial failures: auto-retry once, then report remaining failures to operator
- Continue processing healthy hosts regardless of individual host failures

### NFR-05: Usability

- 11-page web UI (React + TypeScript SPA)
- Responsive design for desktop/laptop screens
- Dark mode support
- Certificate download links integrated into dashboard (root CA) and host detail (host-specific mTLS)

## Interface Requirements

<!-- API and UI interface requirements -->

### IR-01: Web Interface

- React + TypeScript SPA served by Axum backend
- Real-time job status via WebSocket relay (agent WebSocket → Patch Manager → browser)
- RESTful API backend for all UI operations
- Certificate download endpoints for root CA and host-specific mTLS certs

### IR-02: Linux Patch API Integration

- All managed device communication via Linux Patch API (upstream agent)
- mTLS client certificate authentication to each agent
- Base path: `/api/v1/`, Port: 12443, TLS 1.3 only
- Sync operations: GET endpoints (packages, patches, system info, health)
- Async operations: POST/PUT/DELETE endpoints (install, update, remove, patch apply, reboot)
- Job status tracking via GET `/api/v1/jobs/{id}` and WebSocket `/api/v1/ws/jobs`
- Rollback via POST `/api/v1/jobs/{id}/rollback`

## Data Requirements

<!-- Data storage, retention, and processing requirements -->

- **Database:** PostgreSQL
- **Operational data retention:** 30 days (host patch history, job history, health history)
- **Audit log retention:** 6 months
- **Data storage:** All data on Patch Manager host

## Compliance Requirements

<!-- Regulatory and compliance requirements -->

### HIPAA (Health Insurance Portability and Accountability Act)

- **Audit Controls (§164.312(b)):** Comprehensive audit logging of all system activity (covered by audit logging requirements)
- **Access Controls (§164.312(a)(1)):** RBAC with group-scoped access, unique user identification, MFA enforcement
- **Integrity Controls (§164.312(c)(1)):** Audit log integrity protection (tamper-evident logging)
- **Transmission Security (§164.312(e)(1)):** mTLS for all agent communication, HTTPS for web UI, TLS 1.3 minimum
- **Encryption at Rest:** PostgreSQL data encryption (full-disk or column-level for sensitive fields)
- **Automatic Logoff (§164.312(a)(2)(iii)):** 1-hour inactivity session timeout

### PCI-DSS (Payment Card Industry Data Security Standard)

- **Requirement 6:** Vulnerability management. Patch management is a core PCI-DSS requirement; the system must track and enforce timely patching
- **Requirement 7:** Restrict access to need-to-know. RBAC with group-scoped operator access
- **Requirement 8:** Identify and authenticate users. MFA required, unique IDs, session timeouts
- **Requirement 10:** Track and monitor all access. Comprehensive audit logging with 6-month retention
- **Requirement 3:** Protect stored data. Encryption at rest for PostgreSQL
- **Requirement 4:** Encrypt transmission. mTLS (TLS 1.3) for agent communication, HTTPS for web UI

## Constraints

<!-- Implementation constraints -->

- Single bare metal/VM host running Ubuntu 24.04
- systemd service management
- Internal network only (no public internet exposure)
- Rust/Axum backend, React/TypeScript frontend, PostgreSQL database
- No direct permissions on managed clients
- Certificate distribution to clients is manual

137
SPEC.md
@ -8,63 +8,156 @@
|
|||||||
|
|
||||||
## Scope
|
## Scope
|
||||||
|
|
||||||
<!-- Define what is in scope and out of scope for this project -->
|
|
||||||
|
|
||||||
**In Scope:**
|
**In Scope:**
|
||||||
|
- Centralized dashboard for fleet-wide patch status monitoring (5 min health polling, 30 min patch polling, on-demand refresh) with visual alerts for unhealthy/unreachable agents
|
||||||
|
- Multi-distribution support (Debian/Ubuntu, RHEL/CentOS/Fedora, Alpine, Arch)
|
||||||
|
- Batch patch operations across multiple hosts
|
||||||
|
- Maintenance window scheduling (per-device, daily/weekly/monthly recurring + one-time) with immediate-apply override
|
||||||
|
- Compliance reporting and patch status dashboards (compliance, patch history, vulnerability exposure, audit trail — exportable as CSV and PDF)
|
||||||
|
- User management with RBAC
|
||||||
|
- Secure mTLS communication with Linux Patch API agents
|
||||||
|
- Real-time job status via WebSocket relay
|
||||||
|
- Host registration (manual FQDN/IP + on-demand CIDR auto-discover)
|
||||||
|
- Static group-based device organization with group-scoped operator access
|
||||||
|
- Email notifications (optional, disabled by default)
|
||||||
|
|
||||||
**Out of Scope:**
|
**Out of Scope:**
|
||||||
|
- Configuration management (Ansible/Puppet/Chef territory)
|
||||||
|
- OS provisioning, imaging, or bootstrapping
|
||||||
|
- Vulnerability scanning (manager consumes CVE data from agents, does not scan)
|
||||||
|
- Mobile UI / native apps
|
||||||
|
- Automated certificate distribution to agents
|
||||||
|
- Agent installation/management (separate concern)
|
||||||
|
- Webhook/Slack/other external notification integrations
|
||||||
|
- Multi-instance clustering / automatic horizontal scaling
|
||||||
|
|
||||||
## Objectives
|
## Objectives
|
||||||
|
|
||||||
<!-- Define primary and secondary objectives -->
|
**Primary Objective:** Provide a centralized web interface to monitor and control patch operations across a fleet of Linux hosts via the Linux Patch API.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
**Key Goals:**
|
|
||||||
|
- Fleet-wide visibility into patch status and compliance
|
||||||
|
- Zero-friction patch deployment via maintenance windows
|
||||||
|
- Secure-by-design architecture (Rust core, mTLS, MFA)
|
||||||
|
- Single-instance simplicity supporting up to 2,500 managed hosts
|
||||||
|
|
||||||
## Constraints
|
|
||||||
|
|
||||||
<!-- Define technical, deployment, and security constraints -->
|
|
||||||
|
|
||||||
**Deployment:**
|
|
||||||
|
- Single bare-metal or VM host running Ubuntu 24.04
|
||||||
|
- Systemd service management
|
||||||
|
- Internal network access only (same network as managed agents)
|
||||||
|
|
||||||
**Technical:**
|
|
||||||
|
- Backend: Rust with Axum framework, Tokio async runtime
|
||||||
|
- Frontend: React + TypeScript SPA
|
||||||
|
- Database: PostgreSQL with SQLx for type-safe queries
|
||||||
|
- Real-time: Axum native WebSocket support for agent-to-browser relay
|
||||||
|
- Single-instance design (if more capacity is needed, scale out manually by dividing clients among multiple Patch Manager hosts)
|
||||||
|
- Fleet capacity: about 500 hosts in a typical deployment, up to 2,500 hosts
|
||||||
|
|
||||||
**Security:**
|
|
||||||
|
- Combined authentication: local accounts plus Azure SSO
|
||||||
|
- MFA required for all users (TOTP or WebAuthn)
|
||||||
|
- Azure SSO users may use Azure's built-in MFA
|
||||||
|
- mTLS for all agent communication
|
||||||
|
- HTTPS for web UI
|
||||||
|
- Role-based access control:
|
||||||
|
- **Admin**: Full access to manage all aspects of Linux Patch Manager
|
||||||
|
- **Operator**: Can add and remove clients and manage schedules and patches, but only for devices in groups the operator belongs to
|
||||||
|
- Groups are static; devices and operators can belong to multiple groups
|
||||||
|
- Ungrouped devices can be managed by any operator or admin
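The access rules above can be sketched as a single check. This is a minimal illustration under stated assumptions, not the project's implementation: the `User`, `Device`, and `can_manage` names and the numeric group IDs are hypothetical.

```rust
use std::collections::HashSet;

/// Roles named in the spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
}

/// Hypothetical user record: a role plus the IDs of the groups the user belongs to.
pub struct User {
    pub role: Role,
    pub groups: HashSet<u32>,
}

/// Hypothetical device record: the IDs of the groups the device belongs to.
pub struct Device {
    pub groups: HashSet<u32>,
}

/// Returns true when `user` may manage `device` under the rules listed above.
pub fn can_manage(user: &User, device: &Device) -> bool {
    match user.role {
        // Admins have full access.
        Role::Admin => true,
        // Ungrouped devices are open to any operator; otherwise the operator
        // must share at least one group with the device.
        Role::Operator => {
            device.groups.is_empty() || !user.groups.is_disjoint(&device.groups)
        }
    }
}
```

Because devices and operators can each belong to several groups, the check is a set intersection rather than an equality test on a single group field.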
|
||||||
|
|
||||||
## Architecture Overview
|
|
||||||
|
|
||||||
<!-- High-level architecture description -->
|
Linux Patch Manager is a management-plane web application that communicates with the Linux Patch API agent on each managed host.
|
||||||
|
|
||||||
|
```
┌─────────────────────────────┐
│     Linux Patch Manager     │  ← Web UI (this project)
│     (Management Plane)      │    Rust/Axum + React/TS
│   PostgreSQL + WebSocket    │
└──────────────┬──────────────┘
               │ mTLS / REST API
        ┌──────┼──────┐
        ▼      ▼      ▼
    ┌──────┐┌──────┐┌──────┐
    │ Host ││ Host ││ Host │  ← Linux Patch API agents
    │  A   ││  B   ││  C   │    (up to 2,500)
    └──────┘└──────┘└──────┘
```
|
||||||
|
|
||||||
## API Integration
|
|
||||||
|
|
||||||
<!-- How Linux Patch Manager integrates with Linux Patch API -->
|
|
||||||
|
|
||||||
**Upstream Dependency:** [Linux Patch API](https://gitea.moon-dragon.us/echo/linux_patch_api)
|
|
||||||
|
- All managed device access uses the Linux Patch API
|
||||||
|
- mTLS certificate-based authentication to agents
|
||||||
|
- Hybrid sync/async operation model (sync for queries, async jobs for patch operations)
|
||||||
|
- WebSocket streaming for real-time job status from agents
|
||||||
|
- Base path: `/api/v1/`, Port: 12443, TLS 1.3 only
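As a small illustration of the port and base path listed above, agent URLs could be assembled as follows. The `agent_url` helper and the resource names passed to it (`health`, `jobs/42`) are hypothetical and are not taken from the Linux Patch API documentation.

```rust
/// Port and base path taken from the spec.
pub const AGENT_PORT: u16 = 12443;
pub const AGENT_BASE_PATH: &str = "/api/v1/";

/// Builds the HTTPS URL for an agent resource.
/// `resource` is a path relative to the base path; a leading slash is
/// tolerated so callers cannot produce a double slash by accident.
pub fn agent_url(host: &str, resource: &str) -> String {
    format!(
        "https://{}:{}{}{}",
        host,
        AGENT_PORT,
        AGENT_BASE_PATH,
        resource.trim_start_matches('/')
    )
}
```

This sketch covers only URL construction. The mTLS client identity and the TLS 1.3 restriction would be configured on the HTTP client, and an IPv6 literal host would need to be wrapped in brackets before being passed in.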
|
||||||
|
|
||||||
|
## Certificate Management
|
||||||
|
|
||||||
|
- Internal CA managed by Patch Manager, installed on the same host
|
||||||
|
- Patch Manager issues and renews client certificates for mTLS communication
|
||||||
|
- Certificate distribution to managed clients is manual and is the responsibility of server administrators
|
||||||
|
- Patch Manager has no direct permissions on managed clients
|
||||||
|
|
||||||
## User Interface
|
|
||||||
|
|
||||||
<!-- Web UI specifications -->
|
### Pages/Views
|
||||||
|
|
||||||
|
1. **Dashboard** — Fleet overview: patch compliance %, host health summary, pending patches, upcoming maintenance windows. Includes root CA certificate download icon.
|
||||||
|
2. **Hosts** — List of all managed hosts with filtering by group, health status, OS, patch status
|
||||||
|
3. **Host Detail** — Single host view: system info, installed packages, available patches, job history, maintenance window config. Includes host-specific mTLS certificate download icon.
|
||||||
|
4. **Patch Deployment** — Select hosts → review available patches → deploy (queue for window or apply now)
|
||||||
|
5. **Jobs** — Real-time job monitoring with WebSocket status updates
|
||||||
|
6. **Maintenance Windows** — Create/edit recurring and one-time windows per device
|
||||||
|
7. **Groups** — Manage static groups, assign hosts and operators
|
||||||
|
8. **Reports** — Generate and export compliance, patch history, vulnerability, audit reports (CSV and PDF)
|
||||||
|
9. **Users** — Manage local accounts, MFA setup, group assignments
|
||||||
|
10. **Certificates** — View/manage internal CA, issue/renew client certs
|
||||||
|
11. **Settings** — System configuration, Azure SSO setup, polling intervals
|
||||||
|
|
||||||
## Error Handling
|
|
||||||
|
|
||||||
<!-- Error handling strategy -->
|
**Agent Communication Failures:**
|
||||||
|
- Mark host as unhealthy in dashboard
|
||||||
|
- Retry with exponential backoff (3 retries, max 30 minutes between retries)
|
||||||
|
- Continue processing other hosts without blocking
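The retry policy above could be computed along these lines. The spec gives the retry count and the 30-minute cap but not the starting delay or growth factor, so the `base` parameter and the doubling factor are assumptions, and `retry_delay` is a hypothetical name.

```rust
use std::time::Duration;

/// Limits stated in the spec.
pub const MAX_RETRIES: u32 = 3;
pub const MAX_DELAY: Duration = Duration::from_secs(30 * 60);

/// Delay before retry number `attempt` (1-based), or `None` once the retry
/// budget is spent. `base` is an assumed starting delay.
pub fn retry_delay(attempt: u32, base: Duration) -> Option<Duration> {
    if attempt == 0 || attempt > MAX_RETRIES {
        return None;
    }
    // Double the delay on each attempt: base, 2 * base, 4 * base.
    let factor = 2u32.pow(attempt - 1);
    // On arithmetic overflow, fall back to the cap.
    let delay = base.checked_mul(factor).unwrap_or(MAX_DELAY);
    // Never wait longer than the 30-minute cap.
    Some(delay.min(MAX_DELAY))
}
```

With an assumed base of 10 minutes, the three retries would wait 10, 20, and 30 minutes, the last one being capped. Each host would track its own attempt counter so that one slow agent does not delay the others.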
|
||||||
|
|
||||||
|
**Patch Job Failures:**
|
||||||
|
- Auto-retry failed patch jobs once if still within the maintenance window
|
||||||
|
- If retry fails or window has closed, surface failure prominently to operators
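The decision described above fits in one function. This is a sketch only: `FailureAction` and `on_patch_failure` are hypothetical names, and representing times as Unix seconds is an assumption.

```rust
/// What to do after a patch job attempt fails.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    Retry,
    SurfaceToOperators,
}

/// Decides the next step for a failed patch job.
/// `retries_used` counts automatic retries already made; `now` and
/// `window_end` are Unix timestamps in seconds.
pub fn on_patch_failure(retries_used: u32, now: u64, window_end: u64) -> FailureAction {
    // One automatic retry, and only while the maintenance window is still open.
    if retries_used == 0 && now < window_end {
        FailureAction::Retry
    } else {
        FailureAction::SurfaceToOperators
    }
}
```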
|
||||||
|
|
||||||
|
**Batch Operations with Partial Failures:**
|
||||||
|
- Auto-retry failed hosts once
|
||||||
|
- If retry fails, report which hosts failed and let operator decide next steps
|
||||||
|
- Successful hosts proceed normally regardless of failures
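The batch behavior above can be sketched as a loop that retries each failed host once and collects the results. `BatchReport` and `run_batch` are hypothetical names, and the `patch` closure stands in for the real call to an agent. The sketch runs hosts one after another for clarity; it does not show how the project schedules them.

```rust
/// Outcome of a batch run: hosts that succeeded and hosts that still failed
/// after one automatic retry.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct BatchReport {
    pub succeeded: Vec<String>,
    pub failed: Vec<String>,
}

/// Runs `patch` once per host and retries each failure once. A failure on one
/// host never stops the others. `patch` returns true on success.
pub fn run_batch<F>(hosts: &[&str], mut patch: F) -> BatchReport
where
    F: FnMut(&str) -> bool,
{
    let mut report = BatchReport::default();
    for &host in hosts {
        // First attempt, then a single retry only if the first one failed.
        if patch(host) || patch(host) {
            report.succeeded.push(host.to_string());
        } else {
            report.failed.push(host.to_string());
        }
    }
    report
}
```

The `failed` list is what would be shown to the operator, who then decides the next step.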
|
||||||
|
|
||||||
## Assumptions
|
|
||||||
|
|
||||||
<!-- List assumptions -->
|
- Patch Manager host has network connectivity to all managed agents
|
||||||
|
- Linux Patch API agent is installed and running on each managed host
|
||||||
|
- Server administrators manually distribute mTLS and root certificates to managed clients
|
||||||
|
- PostgreSQL is available on the Patch Manager host
|
||||||
|
|
||||||
## Dependencies
|
|
||||||
|
|
||||||
<!-- External and internal dependencies -->
|
- Linux Patch API (upstream agent on each managed host)
|
||||||
|
- PostgreSQL
|
||||||
|
- Internal CA for mTLS certificates
|
||||||
|
- Azure AD (optional, for SSO)
|
||||||
|
|
||||||
## Audit Logging
|
|
||||||
|
|
||||||
<!-- Audit logging requirements -->
|
**Captured Events:**
|
||||||
|
- All user login/logout events (success and failure)
|
||||||
|
- All patch operations (who triggered, which hosts, what patches, queue vs immediate)
|
||||||
|
- All host registration/removal events
|
||||||
|
- All group membership changes (hosts and users)
|
||||||
|
- All certificate operations (issue, renew, download)
|
||||||
|
- All maintenance window changes
|
||||||
|
- All configuration changes
|
||||||
|
|
||||||
|
**Retention:** 6 months
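One way to model the captured events and the retention rule is sketched below. The type and function names are hypothetical, timestamps are assumed to be Unix seconds, and "6 months" is approximated as 183 days because the spec does not define it more precisely.

```rust
/// Event categories listed under "Captured Events".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditEventKind {
    Login,
    Logout,
    PatchOperation,
    HostRegistration,
    HostRemoval,
    GroupMembershipChange,
    CertificateOperation,
    MaintenanceWindowChange,
    ConfigurationChange,
}

/// Hypothetical audit record.
#[derive(Debug, Clone)]
pub struct AuditEvent {
    pub kind: AuditEventKind,
    pub actor: String,
    /// Unix timestamp in seconds.
    pub timestamp: u64,
}

/// Retention period in seconds (183 days, an approximation of 6 months).
pub const RETENTION_SECS: u64 = 183 * 24 * 60 * 60;

/// True when the event is older than the retention period.
pub fn is_expired(event: &AuditEvent, now: u64) -> bool {
    now.saturating_sub(event.timestamp) > RETENTION_SECS
}

/// Drops expired events from an in-memory list.
pub fn purge_expired(events: &mut Vec<AuditEvent>, now: u64) {
    events.retain(|e| !is_expired(e, now));
}
```

In a PostgreSQL-backed deployment the same rule would more likely be a scheduled delete on the audit table; the in-memory version only illustrates the cutoff logic.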
|
||||||
|
|||||||