ARCHITECTURE.md -> 0.0.3, REQUIREMENTS.md -> 0.0.2, SPEC.md -> 0.0.2

Closed OI-01 through OI-06 with concrete decisions:

- OI-01: Encryption at rest delegated to hardware-host (no OS-level LUKS, no column-level). Compliance intent preserved at infrastructure layer.
- OI-02: Argon2id starting parameters m=64MiB, t=3, p=1; 250-500 ms login-latency budget on Intel Xeon 4c/16GB; calibration recorded in system_config at deploy time.
- OI-03: JWT signing = EdDSA/Ed25519; 90-day key rotation with 24-hour overlap; web holds signing key, worker holds verifying key only.
- OI-04: CIDR scan concurrency = 128, per-host timeout = 1.5 s; /22 across sites completes under 10 s; progress UI + cancel required.
- OI-05: PDF stack = printpdf + plotters (in-process, no sidecar); charts required; no branding; no digital signatures.
- OI-06: /status/health = minimal unauthenticated liveness; /api/v1/status/fleet = authenticated fleet aggregates.

Added architecture decisions:

- AD-15: Web UI TLS certificate strategy (self-signed from internal CA by default; operator may supply external cert)
- AD-16: Azure SSO + SMTP runtime configuration via Settings GUI with test-connection actions
- AD-17: PDF generation via printpdf + plotters
- AD-18: IP whitelist enforcement at every listener

Added FR-07 (System Configuration) in REQUIREMENTS.md covering Azure SSO GUI, SMTP GUI, polling-interval tuning, Web UI TLS strategy, and IP whitelist management.

SDD review pass also added (from v0.0.2):

- IEEE 1016-aligned structure (Introduction, Stakeholders, Design Rationale, Risks, Open Issues, Glossary, References, Revision History)
- Portable ASCII diagrams; split into Context/Logical/Deployment/Process views
- Explicit WebSocket ticket authentication flow
- Rollback data flow (6.5)
- API error envelope + X-Request-Id correlation
- Configuration, migration, and backup/DR sections
- Worker heartbeat and dead-process detection
- Sizing math for 2,500-host scalability claim
- Split /status/health (Manager) from /api/v1/health (Agent) namespaces

See ARCHITECTURE.md section 18 for the full change log.

# Linux_Patch_Manager — Requirements Document

## Document Control

| Field | Value |
|-------|-------|
| Title | Linux_Patch_Manager — Requirements Document |
| Version | 0.0.2 |
| Status | Draft |
| Last Updated | 2026-04-23 |
| Related Docs | `SPEC.md`, `ARCHITECTURE.md`, `README.md` |

### Revision History

| Version | Date | Summary |
|---------|------|---------|
| 0.0.1 | 2026-04-21 | Initial draft |
| 0.0.2 | 2026-04-23 | Aligned with SDD v0.0.3: hardware-host encryption at rest (no OS-level LUKS), Argon2id, EdDSA JWTs, Azure SSO configuration GUI, web-UI TLS cert strategy, SMTP runtime configurability |

---

## Project Overview

**Title:** Linux_Patch_Manager

**Description:** Enterprise-class, secure, web-based management interface for controlling patching and updates on Linux servers and workstations

**Version:** 0.0.2

**Status:** Draft

## Functional Requirements

### FR-01: Host Management

- Manual host registration by FQDN or IP address (FQDN resolved to IP at add time)
- On-demand auto-discovery targeting a CIDR subnet range (scans for Linux Patch API agents on port 12443)
- Host metadata tracked: hostname, IP, OS, kernel, agent version, last seen, health status
- Static group-based organization with many-to-many relationships (hosts can belong to multiple groups)
- Ungrouped hosts can be managed by any operator or admin
- Host removal with audit logging

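Auto-discovery first has to turn the operator's CIDR range into the individual addresses to probe on port 12443. The following is a minimal Rust sketch that assumes IPv4 only and a hypothetical lower bound of `/16` to keep a single scan bounded. `expand_ipv4_cidr` and `MIN_PREFIX` are illustrative names, and the probing itself (TCP/TLS connect to the agent) is not shown.

```rust
use std::net::Ipv4Addr;

/// Hypothetical guard: refuse ranges larger than a /16 (65,534 hosts).
const MIN_PREFIX: u32 = 16;

/// Expands an IPv4 CIDR block such as "10.20.0.0/22" into the host addresses a
/// discovery scan would probe. For prefixes shorter than /31 the network and
/// broadcast addresses are skipped. IPv6 ranges are not handled.
pub fn expand_ipv4_cidr(cidr: &str) -> Result<Vec<Ipv4Addr>, String> {
    let (addr, prefix) = cidr
        .split_once('/')
        .ok_or_else(|| format!("missing prefix length in {cidr:?}"))?;
    let addr: Ipv4Addr = addr.parse().map_err(|e| format!("bad address: {e}"))?;
    let prefix: u32 = prefix.parse().map_err(|e| format!("bad prefix: {e}"))?;
    if !(MIN_PREFIX..=32).contains(&prefix) {
        return Err(format!("prefix /{prefix} outside /{MIN_PREFIX}../32"));
    }
    // prefix >= 16 here, so the shift amount (at most 16) cannot overflow.
    let mask = u32::MAX << (32 - prefix);
    let network = u32::from(addr) & mask; // tolerates inputs like "10.20.1.5/22"
    let broadcast = network | !mask;
    let (first, last) = if prefix >= 31 {
        (network, broadcast) // /31 and /32: every address is a host
    } else {
        (network + 1, broadcast - 1)
    };
    Ok((first..=last).map(Ipv4Addr::from).collect())
}
```

For example, `10.20.0.0/22` expands to the 1,022 addresses `10.20.0.1` through `10.20.3.254`.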
### FR-02: Patch Monitoring

- Scheduled background polling: every 5 minutes for health checks and every 30 minutes for patch data (defaults; tunable per FR-07)
- On-demand refresh triggered by operator/admin from the UI
- Visual dashboard alerts for unhealthy or unreachable agents (red/yellow status indicators)
- CVE severity, patch priority, and reboot requirement display per host

### FR-03: Patch Deployment

- Patches queue for the next available maintenance window by default
- Immediate-apply override option for urgent patches
- No approval gate required — operator/admin triggers deployment directly
- Auto-retry failed patch jobs once if still within the maintenance window, then surface failure prominently
- Batch operations across multiple hosts with partial failure handling (auto-retry once, then report failures)
- Rollback support via upstream Linux Patch API rollback endpoint

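The retry rule and the batch partial-failure handling both reduce to one decision per failed job. The following Rust sketch assumes times are Unix seconds, that the window end is exclusive, and that `attempts` counts the attempt that just failed. `FailureAction`, `on_job_failure`, and `plan_batch` are hypothetical names. How the window check applies to immediate-apply jobs is not specified in the requirements and is not modelled here.

```rust
/// What to do after a patch job attempt fails.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    Retry,
    Surface,
}

/// One automatic retry is allowed, and only while the window is still open.
/// `attempts` counts attempts already made, including the one that just failed.
pub fn on_job_failure(attempts: u32, now: u64, window_end: u64) -> FailureAction {
    if attempts < 2 && now < window_end {
        FailureAction::Retry
    } else {
        FailureAction::Surface
    }
}

/// Splits a batch's failed hosts into those to retry and those to report.
/// Successful hosts never appear here, so they are unaffected by failures.
pub fn plan_batch<'a>(
    failed: &[(&'a str, u32)],
    now: u64,
    window_end: u64,
) -> (Vec<&'a str>, Vec<&'a str>) {
    let mut retry = Vec::new();
    let mut report = Vec::new();
    for &(host, attempts) in failed {
        match on_job_failure(attempts, now, window_end) {
            FailureAction::Retry => retry.push(host),
            FailureAction::Surface => report.push(host),
        }
    }
    (retry, report)
}
```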
### FR-04: Scheduling

- Maintenance windows are per-device (not per-group)
- Recurring schedules: daily, weekly, or monthly
- One-time maintenance windows
- Patch operations execute automatically when a maintenance window opens

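Per-device windows with these recurrence options can be modelled as an enum plus an "is the window open now?" check, which a scheduler would evaluate before starting queued patches. This Rust sketch makes several assumptions:

- All times are UTC Unix seconds. Time zones are out of scope.
- A window lasts at most 24 hours.
- Monthly windows use days 1 to 28, so every month contains the day.

The type and function names are hypothetical. The calendar arithmetic is Howard Hinnant's `civil_from_days` algorithm, reduced to the day-of-month result.

```rust
const DAY: i64 = 86_400;

#[derive(Debug, Clone, Copy)]
pub enum Recurrence {
    OneTime { start: i64 },
    Daily,
    /// 0 = Monday ... 6 = Sunday.
    Weekly { weekday: u32 },
    /// Day of month; 1..=28 assumed so every month contains it.
    Monthly { day: u32 },
}

#[derive(Debug, Clone, Copy)]
pub struct MaintenanceWindow {
    pub recurrence: Recurrence,
    /// Seconds after 00:00 UTC at which a recurring window opens.
    pub start_secs_of_day: i64,
    /// Assumed to be at most one day.
    pub duration_secs: i64,
}

/// Day of month for a count of days since 1970-01-01 (Hinnant's civil_from_days).
fn day_of_month(days: i64) -> u32 {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z - era * 146_097; // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of March-based year
    let mp = (5 * doy + 2) / 153; // March-based month, 0..=11
    (doy - (153 * mp + 2) / 5 + 1) as u32
}

impl MaintenanceWindow {
    fn occurs_on(&self, days: i64) -> bool {
        match self.recurrence {
            Recurrence::OneTime { .. } => false,
            Recurrence::Daily => true,
            // 1970-01-01 was a Thursday (index 3 with Monday = 0).
            Recurrence::Weekly { weekday } => (days + 3).rem_euclid(7) as u32 == weekday,
            Recurrence::Monthly { day } => day_of_month(days) == day,
        }
    }

    /// True if a patch run may start at `now` (UTC Unix seconds).
    pub fn is_open(&self, now: i64) -> bool {
        if let Recurrence::OneTime { start } = self.recurrence {
            return now >= start && now < start + self.duration_secs;
        }
        // Also check yesterday's occurrence: a 23:00 window may run past midnight.
        let today = now.div_euclid(DAY);
        [today, today - 1].iter().any(|&d| {
            let open = d * DAY + self.start_secs_of_day;
            self.occurs_on(d) && now >= open && now < open + self.duration_secs
        })
    }
}
```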
### FR-05: Reporting

- Compliance report: percentage of hosts fully patched, by group or fleet-wide
- Patch history: log of all patch operations per host or per group
- Vulnerability exposure: hosts with known CVEs pending patches
- Audit trail: who did what, when (user actions, patch operations)
- Charts and graphs required in PDF exports (compliance trends, patch-status distributions)
- Export formats: CSV and PDF

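The compliance percentage can be computed directly from per-host status. This Rust sketch makes two assumptions:

- "Fully patched" means zero pending patches.
- A host counts toward every group it belongs to, since group membership is many-to-many (FR-01).

`HostStatus`, `fleet_compliance`, and `group_compliance` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Minimal per-host status; field names are illustrative.
pub struct HostStatus {
    pub groups: Vec<String>,
    pub pending_patches: u32,
}

/// Fleet-wide percentage of fully patched hosts; None when there are no hosts,
/// so a report can show "n/a" rather than a misleading 0 % or 100 %.
pub fn fleet_compliance(hosts: &[HostStatus]) -> Option<f64> {
    let patched = hosts.iter().filter(|h| h.pending_patches == 0).count();
    (!hosts.is_empty()).then(|| 100.0 * patched as f64 / hosts.len() as f64)
}

/// Per-group percentages. A host in several groups counts toward each group;
/// ungrouped hosts appear only in the fleet-wide figure.
pub fn group_compliance(hosts: &[HostStatus]) -> BTreeMap<String, f64> {
    // (patched, total) per group name
    let mut tallies: BTreeMap<String, (usize, usize)> = BTreeMap::new();
    for host in hosts {
        for group in &host.groups {
            let tally = tallies.entry(group.clone()).or_default();
            tally.1 += 1;
            if host.pending_patches == 0 {
                tally.0 += 1;
            }
        }
    }
    tallies
        .into_iter()
        .map(|(group, (patched, total))| (group, 100.0 * patched as f64 / total as f64))
        .collect()
}
```

A sorted `BTreeMap` keeps group rows in a stable order, which suits both the CSV export and the PDF charts.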
### FR-06: User Management

- **Admin role**: Full access to manage all aspects of Linux Patch Manager
- **Operator role**: Can add/remove clients and manage schedules and patches, limited to devices in their group memberships (plus ungrouped hosts, per FR-01)
- Operators can belong to multiple groups
- Local accounts with MFA required (TOTP or WebAuthn)
- Azure SSO integration (optional, with Azure's built-in MFA)
- Group membership management for users and hosts

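Combining the role rules with the FR-01 rule for ungrouped hosts gives a simple authorization predicate. This Rust sketch uses hypothetical types and assumes that group memberships are available as sets of group names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Admin,
    Operator,
}

pub struct User {
    pub role: Role,
    pub groups: HashSet<String>,
}

/// Admins may manage any host; operators only hosts sharing at least one
/// group with them. Ungrouped hosts are open to any operator (FR-01).
pub fn can_manage_host(user: &User, host_groups: &HashSet<String>) -> bool {
    match user.role {
        Role::Admin => true,
        Role::Operator => host_groups.is_empty() || !user.groups.is_disjoint(host_groups),
    }
}
```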
### FR-07: System Configuration

- Azure SSO configuration GUI in the Settings page (tenant ID, client ID, client secret, redirect URI, scopes)
- "Test connection" action in the Azure SSO config GUI that performs a round-trip against Azure AD and reports success/failure without enabling SSO
- SMTP configuration GUI (host, port, auth mode, username/password, TLS mode, from-address); disabled by default
- "Send test email" action in the SMTP config GUI
- Polling-interval tuning (health and patch pollers)
- Web UI TLS certificate strategy selection: self-signed from the internal CA (default) or operator-supplied certificate/key (e.g., existing infrastructure wildcard)

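The settings above, and the polling defaults from FR-02, map naturally onto a typed settings model whose `Default` encodes the required defaults. This Rust sketch uses hypothetical names throughout. It assumes the SMTP password lives in a separate secret store, and the SMTP port 587 and STARTTLS defaults are assumptions, not requirements.

```rust
use std::path::PathBuf;
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum WebTlsStrategy {
    /// Default: server certificate issued by the Patch Manager's internal CA.
    InternalCa,
    /// Operator-supplied certificate and key, e.g. an existing wildcard cert.
    OperatorSupplied { cert_pem: PathBuf, key_pem: PathBuf },
}

#[derive(Debug, Clone, PartialEq)]
pub enum SmtpTls { None, StartTls, Implicit }

#[derive(Debug, Clone, PartialEq)]
pub enum SmtpAuth {
    None,
    /// The password is assumed to live in a separate secret store, so it
    /// never appears in Debug output or settings exports.
    Login { username: String },
}

#[derive(Debug, Clone)]
pub struct SmtpSettings {
    pub enabled: bool,
    pub host: String,
    pub port: u16,
    pub auth: SmtpAuth,
    pub tls: SmtpTls,
    pub from_address: String,
}

#[derive(Debug, Clone)]
pub struct PollingSettings {
    pub health: Duration,
    pub patches: Duration,
}

#[derive(Debug, Clone)]
pub struct Settings {
    pub smtp: SmtpSettings,
    pub polling: PollingSettings,
    pub web_tls: WebTlsStrategy,
}

impl Default for Settings {
    fn default() -> Self {
        Settings {
            // SMTP stays disabled until an admin configures and tests it.
            // Port 587 + STARTTLS are assumed defaults, not from the requirements.
            smtp: SmtpSettings {
                enabled: false,
                host: String::new(),
                port: 587,
                auth: SmtpAuth::None,
                tls: SmtpTls::StartTls,
                from_address: String::new(),
            },
            // FR-02 default intervals: health every 5 min, patch data every 30 min.
            polling: PollingSettings {
                health: Duration::from_secs(5 * 60),
                patches: Duration::from_secs(30 * 60),
            },
            web_tls: WebTlsStrategy::InternalCa,
        }
    }
}
```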
## Non-Functional Requirements

### NFR-01: Security

- Combined authentication: local accounts plus Azure SSO
- MFA required for all users (TOTP or WebAuthn; Azure MFA for SSO users)
- Password hashing: **Argon2id**
- Session management: short-lived JWT access tokens (15 min, signed with **EdDSA / Ed25519**) plus server-side opaque refresh tokens (1-hour inactivity timeout, rotated on use, revocable)
- JWT signing key rotation every 90 days with a 24-hour overlap window for in-flight tokens
- mTLS for all agent communication (certificate-based, **TLS 1.3 only**)
- HTTPS enforced for web UI (TLS 1.3 only)
- Internal CA managed by Patch Manager for mTLS certificate issuance and renewal
- Certificate distribution to managed clients is manual (server administrators responsible)
- RBAC with group-scoped access control
- IP whitelist enforcement on all connection points

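The 90-day rotation with a 24-hour overlap can be expressed as two lookups over a key ring ordered by activation time:

- The web process signs with the newest active key.
- The worker accepts that key plus any predecessor still inside the overlap window.

This Rust sketch uses a hypothetical `SigningKey` record and hypothetical function names. The Ed25519 signing and verification themselves are not shown.

```rust
/// Hypothetical key-ring entry; `kid` would go in the JWT header.
pub struct SigningKey {
    pub kid: &'static str,
    pub activated_at: u64, // Unix seconds
}

pub const ROTATION_SECS: u64 = 90 * 86_400;
pub const OVERLAP_SECS: u64 = 24 * 3_600;

/// Key the web process signs with: the newest key already active at `now`.
pub fn signing_key(ring: &[SigningKey], now: u64) -> Option<&SigningKey> {
    ring.iter().filter(|k| k.activated_at <= now).max_by_key(|k| k.activated_at)
}

/// Whether a new key should be generated and activated.
pub fn rotation_due(ring: &[SigningKey], now: u64) -> bool {
    signing_key(ring, now).map_or(true, |k| now >= k.activated_at + ROTATION_SECS)
}

/// `kid`s the worker accepts at `now`: the current key, plus any predecessor
/// superseded less than OVERLAP_SECS ago.
pub fn verifying_kids(ring: &[SigningKey], now: u64) -> Vec<&'static str> {
    let mut active: Vec<&SigningKey> = ring.iter().filter(|k| k.activated_at <= now).collect();
    active.sort_by_key(|k| k.activated_at);
    active
        .iter()
        .enumerate()
        .filter(|&(i, _)| match active.get(i + 1) {
            None => true, // newest key: always valid
            Some(next) => now < next.activated_at + OVERLAP_SECS,
        })
        .map(|(_, k)| k.kid)
        .collect()
}
```

Keys with a future `activated_at` are ignored by both lookups. A new key can therefore be pre-provisioned to the worker before the web process starts signing with it.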
### NFR-02: Performance

- Support a typical fleet of 500 managed hosts, scaling up to 2,500
- Dashboard load time under 5 seconds for the full fleet view
- Background polling must not degrade UI responsiveness
- Concurrent batch operations (e.g., patching 500 hosts simultaneously) must not overwhelm the system
- Login latency budget: 250–500 ms on target hardware (Intel Xeon, 4 cores, 16 GB RAM); Argon2id parameters calibrated to land in this window
- CIDR auto-discovery of a `/22` network (~1,024 addresses) across sites completes within 10 seconds wall-clock

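The 10-second discovery target can be checked against the OI-04 scan settings (concurrency 128, per-host timeout 1.5 s). With a bounded pool, the worst case is that every probe runs to its full timeout. That gives ⌈1,022 / 128⌉ × 1.5 s = 8 × 1.5 s = 12 s. Under this model, the 10 s figure therefore relies on enough addresses answering or refusing faster than the timeout. The result is a bound, not a measurement. Below is a small Rust helper for redoing the arithmetic when the parameters change; the function name is hypothetical.

```rust
/// Upper bound on wall-clock time for a scan that keeps at most `concurrency`
/// probes in flight, assuming every probe runs to its full timeout.
/// Probes that are answered or refused return early, so real scans are faster.
pub fn worst_case_scan_secs(addresses: u32, concurrency: u32, timeout_secs: f64) -> f64 {
    assert!(concurrency > 0, "concurrency must be positive");
    // Number of back-to-back rounds of full-length timeouts (ceiling division).
    let rounds = (addresses + concurrency - 1) / concurrency;
    f64::from(rounds) * timeout_secs
}
```

For example, raising concurrency to 192 with the same 1.5 s timeout bounds the worst case at 6 × 1.5 s = 9 s.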
### NFR-03: Scalability

- Single-instance design on bare metal/VM (Ubuntu 24.04)
- Manual horizontal scaling by dividing clients between multiple Patch Manager hosts if needed
- No automatic clustering or load balancing required

### NFR-04: Reliability

- Agent communication failures: retry with exponential backoff (3 retries, at most 30 minutes between retries)
- Patch job failures: auto-retry once within maintenance window, then surface to operators
- Batch partial failures: auto-retry once, then report remaining failures to operator
- Continue processing healthy hosts regardless of individual host failures

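The requirement fixes the retry count (3) and the ceiling (30 minutes) but not the starting delay. This Rust sketch therefore takes the base delay as a parameter. Doubling per attempt and the absence of jitter are assumptions, and `backoff_schedule` is a hypothetical name.

```rust
use std::time::Duration;

/// Delays before each retry: base, 2*base, 4*base, ... each capped at `cap`.
pub fn backoff_schedule(base: Duration, retries: u32, cap: Duration) -> Vec<Duration> {
    (0..retries)
        .map(|n| base.saturating_mul(1u32 << n.min(31)).min(cap))
        .collect()
}
```

With a 1-minute base, the three retries wait 1, 2, and 4 minutes. The 30-minute cap only takes effect for bases of 7.5 minutes or more; for example, a 10-minute base yields 10, 20, and 30 minutes.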
### NFR-05: Usability

- 11-page web UI (React + TypeScript SPA)
- Responsive design for desktop/laptop screens
- Dark mode support
- Certificate download links integrated into dashboard (root CA) and host detail (host-specific mTLS)
- Long-running scans (CIDR discovery, full-fleet operations) must display progress and offer a cancel action

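Progress reporting and cancellation imply shared state that both the workers and the UI-facing handlers can see. This std-only Rust sketch uses scoped threads, an atomic progress counter, and an atomic cancel flag. `JobControl` and `run_cancellable` are hypothetical names. How progress reaches the browser (for example over the WebSocket relay in IR-01) is not shown.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// State shared between a long-running job and the API handlers that report
/// progress and accept the cancel action.
#[derive(Default)]
pub struct JobControl {
    pub done: AtomicUsize,
    pub cancelled: AtomicBool,
}

/// Processes `items` on `workers` threads. Cancellation is cooperative: items
/// already in flight finish, but no new item starts once `cancelled` is set.
/// Returns the number of items processed.
pub fn run_cancellable<T, F>(items: Vec<T>, workers: usize, ctl: &JobControl, work: F) -> usize
where
    T: Send,
    F: Fn(T) + Sync,
{
    let queue = Mutex::new(items.into_iter());
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                if ctl.cancelled.load(Ordering::Relaxed) {
                    break;
                }
                // Hold the queue lock only long enough to take one item.
                let next = queue.lock().unwrap().next();
                match next {
                    Some(item) => {
                        work(item);
                        ctl.done.fetch_add(1, Ordering::Relaxed);
                    }
                    None => break,
                }
            });
        }
    });
    ctl.done.load(Ordering::Relaxed)
}
```

A progress endpoint would read `done` against the total item count, and the cancel action would set `cancelled`.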
## Interface Requirements

### IR-01: Web Interface

- React + TypeScript SPA served by Axum backend
- Real-time job status via WebSocket relay (agent WebSocket → Patch Manager → browser)
- RESTful API backend for all UI operations
- Certificate download endpoints for root CA and host-specific mTLS certs
- Unauthenticated liveness endpoint at `/status/health` (minimal: process up, DB reachable, worker heartbeat fresh)
- Authenticated fleet-aggregate endpoint at `/api/v1/status/fleet` (counts of healthy / degraded / unreachable agents)

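The liveness check can be reduced to a pure function over the signals it names. "Process up" is implied by the handler responding at all. In this Rust sketch, the following are assumptions rather than requirements:

- the 60-second heartbeat freshness threshold
- the 503 status on failure
- the JSON field names

The struct and function names are also hypothetical.

```rust
/// Signals gathered by the liveness handler. How they are collected (a DB ping,
/// a heartbeat row written by the worker) is outside this sketch.
pub struct LivenessInputs {
    pub db_reachable: bool,
    /// Unix seconds of the worker's last heartbeat, if any was ever recorded.
    pub last_worker_heartbeat: Option<u64>,
    pub now: u64,
}

/// Assumed freshness threshold; the requirements only say "fresh".
pub const HEARTBEAT_MAX_AGE_SECS: u64 = 60;

/// Returns (HTTP status, JSON body) for `/status/health`. The body stays minimal
/// (no host counts or versions) because the endpoint is unauthenticated.
pub fn liveness(inp: &LivenessInputs) -> (u16, String) {
    let worker_fresh = inp
        .last_worker_heartbeat
        .map_or(false, |t| inp.now.saturating_sub(t) <= HEARTBEAT_MAX_AGE_SECS);
    let ok = inp.db_reachable && worker_fresh;
    let body = format!(
        r#"{{"status":"{}","db":{},"worker":{}}}"#,
        if ok { "ok" } else { "degraded" },
        inp.db_reachable,
        worker_fresh
    );
    (if ok { 200 } else { 503 }, body)
}
```

Keeping the computation pure makes it easy to test, and leaves fleet-level detail to the authenticated `/api/v1/status/fleet` endpoint.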
### IR-02: Linux Patch API Integration

- All managed device communication via Linux Patch API (upstream agent)
- mTLS client certificate authentication to each agent
- Base path `/api/v1/`, port 12443, TLS 1.3 only
- Sync operations: GET endpoints (packages, patches, system info, health)
- Async operations: POST/PUT/DELETE endpoints (install, update, remove, patch apply, reboot)
- Job status tracking via `GET /api/v1/jobs/{id}` and WebSocket `/api/v1/ws/jobs`
- Rollback via `POST /api/v1/jobs/{id}/rollback`

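FR-01 resolves FQDNs to IP addresses at add time, so agent URLs are naturally built from a stored IP. This Rust sketch shows builders for the job-related endpoints listed above. The function names are hypothetical. Job IDs are assumed to be URL-safe (for example UUIDs), so no percent-encoding is done. `SocketAddr`'s `Display` impl supplies the brackets IPv6 literals need.

```rust
use std::net::{IpAddr, SocketAddr};

/// Agent port and base path from IR-02.
pub const AGENT_PORT: u16 = 12443;
const BASE: &str = "/api/v1";

/// "host:port", with IPv6 literals bracketed by SocketAddr's Display impl.
fn authority(ip: IpAddr) -> String {
    SocketAddr::new(ip, AGENT_PORT).to_string()
}

/// Poll target for an async job: GET {BASE}/jobs/{id}.
pub fn job_status_url(ip: IpAddr, job_id: &str) -> String {
    format!("https://{}{BASE}/jobs/{job_id}", authority(ip))
}

/// Rollback trigger: POST {BASE}/jobs/{id}/rollback.
pub fn rollback_url(ip: IpAddr, job_id: &str) -> String {
    format!("https://{}{BASE}/jobs/{job_id}/rollback", authority(ip))
}

/// Job-status stream that the Patch Manager relays to browsers.
pub fn jobs_ws_url(ip: IpAddr) -> String {
    format!("wss://{}{BASE}/ws/jobs", authority(ip))
}
```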
## Data Requirements

- **Database:** PostgreSQL 16+
- **Operational data retention:** 30 days (host patch history, job history, health history)
- **Audit log retention:** 6 months
- **Data storage:** All data on Patch Manager host

## Compliance Requirements

### HIPAA (Health Insurance Portability and Accountability Act)

- **Audit Controls (§164.312(b)):** Comprehensive audit logging of all system activity (hash-chained rows for integrity)
- **Access Controls (§164.312(a)(1)):** RBAC with group-scoped access, unique user identification, MFA enforcement
- **Integrity Controls (§164.312(c)(1)):** Audit log integrity protection via hash chaining
- **Transmission Security (§164.312(e)(1)):** mTLS for all agent communication and HTTPS for the web UI, TLS 1.3 only
- **Encryption at Rest (§164.312(a)(2)(iv)):** Provided by the underlying hardware host (infrastructure-level full-disk encryption). The application does not manage disk encryption.
- **Automatic Logoff (§164.312(a)(2)(iii)):** 1-hour inactivity session timeout

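Automatic logoff is enforced through the refresh-token rules in NFR-01: a 1-hour inactivity timeout, rotation on use, and revocation. The following is a minimal in-memory Rust sketch with these assumptions:

- `RefreshStore` is hypothetical.
- Token values are passed in because a real implementation would generate them with a CSPRNG.
- A production store would persist only a hash of each token rather than the token itself.

```rust
use std::collections::HashMap;

/// Inactivity limit for refresh tokens (1 hour).
pub const REFRESH_IDLE_SECS: u64 = 60 * 60;

struct Session {
    user: String,
    last_used: u64, // Unix seconds
}

/// In-memory stand-in for the server-side refresh-token table.
pub struct RefreshStore {
    sessions: HashMap<String, Session>,
}

impl RefreshStore {
    pub fn new() -> Self {
        RefreshStore { sessions: HashMap::new() }
    }

    /// Records a freshly issued token after a successful (MFA) login.
    pub fn issue(&mut self, user: &str, token: String, now: u64) {
        self.sessions.insert(token, Session { user: user.to_string(), last_used: now });
    }

    /// Exchanges `presented` for `replacement`. The old token is removed first,
    /// so a replayed token is rejected even if the idle check would pass.
    /// On success the caller would mint a new 15-minute access JWT for the user.
    pub fn rotate(&mut self, presented: &str, replacement: String, now: u64) -> Result<String, &'static str> {
        let session = self.sessions.remove(presented).ok_or("unknown or revoked token")?;
        if now.saturating_sub(session.last_used) > REFRESH_IDLE_SECS {
            return Err("session expired after 1 hour of inactivity");
        }
        let user = session.user.clone();
        self.sessions.insert(replacement, Session { user: session.user, last_used: now });
        Ok(user)
    }

    /// Revokes every refresh token belonging to `user` (e.g. on logout or account disable).
    pub fn revoke_user(&mut self, user: &str) {
        self.sessions.retain(|_, s| s.user != user);
    }
}
```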
### PCI-DSS (Payment Card Industry Data Security Standard)

- **Requirement 3:** Protect stored data — encryption at rest provided by the hardware host
- **Requirement 4:** Encrypt transmission — mTLS (TLS 1.3) for agent communication, HTTPS (TLS 1.3) for web UI
- **Requirement 6:** Vulnerability management — patch management is the core function; system tracks and enforces timely patching
- **Requirement 7:** Restrict access to need-to-know — RBAC with group-scoped operator access
- **Requirement 8:** Identify and authenticate users — MFA required, unique IDs, session timeouts
- **Requirement 10:** Track and monitor all access — comprehensive audit logging with 6-month retention

## Audit Logging

**Captured Events:**

- All user login/logout events (success and failure)
- All patch operations (who triggered, which hosts, what patches, queue vs. immediate)
- All host registration/removal events
- All group membership changes (hosts and users)
- All certificate operations (issue, renew, download, revoke)
- All maintenance window changes
- All configuration changes (including Azure SSO and SMTP configuration)

**Integrity:** Tamper-evident via hash-chained rows (`prev_hash`, `row_hash`). The chain is verified both periodically and on demand.

**Retention:** 6 months

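The `prev_hash` / `row_hash` chaining and its verification can be illustrated with a short Rust routine. This sketch deliberately uses the standard library's `DefaultHasher` so it has no dependencies. That hasher is 64-bit and not collision-resistant, so the sketch only demonstrates the chaining logic; a real audit log would use a cryptographic hash such as SHA-256. The genesis value of 0 and all names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One audit row. `row_hash` covers both the link and the payload, so editing
/// or deleting any row breaks verification from that point on.
pub struct AuditRow {
    pub payload: String,
    pub prev_hash: u64,
    pub row_hash: u64,
}

/// STAND-IN ONLY: DefaultHasher is not a cryptographic hash.
fn hash_row(prev_hash: u64, payload: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    payload.hash(&mut h);
    h.finish()
}

/// Appends a row linked to the current chain head (0 for the first row).
pub fn append(chain: &mut Vec<AuditRow>, payload: &str) {
    let prev_hash = chain.last().map_or(0, |r| r.row_hash);
    chain.push(AuditRow {
        payload: payload.to_string(),
        prev_hash,
        row_hash: hash_row(prev_hash, payload),
    });
}

/// Returns the index of the first row that fails verification, if any.
pub fn first_broken(chain: &[AuditRow]) -> Option<usize> {
    let mut expected_prev = 0;
    for (i, row) in chain.iter().enumerate() {
        if row.prev_hash != expected_prev || row.row_hash != hash_row(row.prev_hash, &row.payload) {
            return Some(i);
        }
        expected_prev = row.row_hash;
    }
    None
}
```

Both periodic and on-demand verification can call the same `first_broken` routine. Reporting the first failing index points an auditor at where tampering begins.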
## Constraints

- Single bare metal/VM host running Ubuntu 24.04
- Service management via systemd
- Internal network only (no public internet exposure)
- Rust/Axum backend, React/TypeScript frontend, PostgreSQL 16+ database
- No direct access or permissions on managed clients; all host operations go through the Linux Patch API agent
- Certificate distribution to clients is manual
- Encryption at rest is provided by the hardware host; the application does not configure or manage disk encryption

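The IP whitelist required by NFR-01, together with the internal-network constraint, comes down to a CIDR membership test. That test runs on each connecting peer before any other processing. This std-only Rust sketch uses a hypothetical `Cidr` type and `is_allowed` function. It normalizes IPv4-mapped IPv6 peers, because a dual-stack listener can report IPv4 clients that way.

```rust
use std::net::IpAddr;

/// One whitelist entry: a network address and prefix length.
pub struct Cidr {
    net: IpAddr,
    prefix: u8,
}

impl Cidr {
    /// Parses "10.20.0.0/22", "fd00::/8", or a bare address (treated as /32 or /128).
    pub fn parse(s: &str) -> Option<Cidr> {
        let (addr, len) = s.split_once('/').unwrap_or((s, ""));
        let net: IpAddr = addr.parse().ok()?;
        let max: u8 = if net.is_ipv4() { 32 } else { 128 };
        let prefix: u8 = if len.is_empty() { max } else { len.parse().ok()? };
        (prefix <= max).then_some(Cidr { net, prefix })
    }

    pub fn contains(&self, ip: IpAddr) -> bool {
        match (self.net, ip) {
            (IpAddr::V4(n), IpAddr::V4(a)) => {
                same_prefix(u32::from(n).into(), u32::from(a).into(), self.prefix, 32)
            }
            (IpAddr::V6(n), IpAddr::V6(a)) => {
                same_prefix(u128::from(n), u128::from(a), self.prefix, 128)
            }
            _ => false, // address families never match each other
        }
    }
}

/// Compares the top `prefix` bits of two `width`-bit addresses held in u128s.
fn same_prefix(a: u128, b: u128, prefix: u8, width: u32) -> bool {
    if prefix == 0 {
        return true; // /0 matches every address of that family
    }
    let shift = width - u32::from(prefix);
    (a >> shift) == (b >> shift)
}

/// Gate applied to every accepted connection before any other processing.
pub fn is_allowed(allowlist: &[Cidr], peer: IpAddr) -> bool {
    // Dual-stack sockets can report IPv4 clients as ::ffff:a.b.c.d.
    let peer = match peer {
        IpAddr::V6(v6) => v6.to_ipv4_mapped().map_or(peer, IpAddr::V4),
        other => other,
    };
    allowlist.iter().any(|c| c.contains(peer))
}
```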