# Linux_Patch_API - Specification Document

## Project Overview
**Title:** Linux_Patch_API
**Description:** API service for secure remote management of patching processes and software add/removal
**Version:** 1.2.0
**Status:** Draft

## Scope

**Primary Focus:** Debian/Ubuntu (apt/dpkg)
**Secondary Support:** RHEL/CentOS/Fedora (dnf/yum), Alpine (apk), Arch (pacman)

**In Scope:**
- Remote package installation, removal, and updates
- System patch management (security and general updates)
- Multi-distribution support via pluggable package manager backend
- Secure authentication and authorization for remote operations
- Audit logging of all package/patch operations
- RESTful API design with JSON request/response

**Supported Operations:**
- **Core Package:** GET /packages (with filtering), GET /packages/{name}, POST /packages (install), PUT /packages/{name} (update), DELETE /packages/{name} (remove)
- **Patch Management:** GET /patches (list available), POST /patches/apply (apply all or specific)
- **System Info:** GET /system/info (OS version, kernel, last update time)

**Operation Features:**
- Version pinning support (e.g., package=1.2.3)
- Rollback capability for failed operations
- Batch operations: best-effort (not atomic)
- GET filtering: by name, version, status, upgradable
- No pagination (return all results)

**Out of Scope (for now):**
- GUI/frontend interface (API-only)
- Automatic scheduled patching (manual trigger only)
- Cross-distribution package compatibility

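As an illustration of the version pinning syntax listed under Operation Features, the following is a minimal sketch of how a `package=1.2.3` string could be split into a name and an optional pinned version. The `PackageSpec` type and `parse_package_spec` function are hypothetical names introduced for this example; the spec does not define them.

```rust
/// Hypothetical request type: a package name with an optional pinned version.
#[derive(Debug, PartialEq)]
struct PackageSpec {
    name: String,
    version: Option<String>,
}

/// Parse `name` or `name=version`.
/// Returns `None` when the name is empty or a `=` is present without a version.
fn parse_package_spec(input: &str) -> Option<PackageSpec> {
    let input = input.trim();
    match input.split_once('=') {
        // Pinned form: both sides of the first `=` must be non-empty.
        Some((name, version)) if !name.is_empty() && !version.is_empty() => Some(PackageSpec {
            name: name.to_string(),
            version: Some(version.to_string()),
        }),
        // A `=` with an empty side is rejected rather than guessed at.
        Some(_) => None,
        // Unpinned form: the whole input is the package name.
        None if !input.is_empty() => Some(PackageSpec {
            name: input.to_string(),
            version: None,
        }),
        None => None,
    }
}
```

Rejecting `name=` outright, instead of treating it as unpinned, keeps a malformed request from silently installing the latest version.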
## Objectives

**Primary Objective:** Provide secure API for remote patch/package management on individual Linux hosts

**Key Goals:**
- Run as a system service on each managed machine (Option B: Agent Per Host)
  - systemd for Debian/Ubuntu, RHEL/CentOS/Fedora
  - OpenRC for Alpine Linux
- Internal network access only (no internet exposure)
- Support Debian/Ubuntu first, then expand to other distributions
- Maintain audit trail of all operations
- Minimal resource footprint
- mTLS certificate-based authentication
- IP whitelist enforcement (deny by default)

## Constraints

**Deployment:**
- One API instance per host
- Internal network only (LAN/private network)
- No public internet exposure
- Must run as a system service (init system determined by distribution)
  - systemd: Debian, Ubuntu, RHEL, CentOS, Fedora
  - OpenRC: Alpine Linux

**Technical:**
- Must run with elevated privileges for package management (root/sudo)
- Must support multiple Linux distributions
- API-only (no GUI required)
- mTLS required for all client connections
- IP whitelist enforcement required (block all by default, allow only listed)
- Technology: Rust with Actix-web or Axum framework
- Default API port: 12443
- API Style: Pure REST (resources as nouns, HTTP verbs for actions)
- Data Format: JSON for all requests and responses
- Response Envelope: Standard envelope with success, request_id, timestamp, data, error fields
- Request IDs: Required for all requests (tracking and auditing)
- Execution Model: Hybrid (sync for quick ops, async with job ID for long ops)
- Real-time Updates: WebSocket support for job status streaming
- Job Timeout: Maximum 30 minutes per operation

**Security:**
- Certificate-based authentication (mTLS)
- Network-level access control via IP/subnet whitelist
- Silent drop for non-mTLS connections (no response)
- Detailed error messages for authenticated clients only

## Error Handling

- **HTTP Status Codes:** Standard HTTP status codes (200, 400, 401, 403, 404, 500, etc.)

- **Error Response Format** (inside envelope's `error` field):
  ```json
  {
    "code": "ERROR_CODE",
    "message": "Human-readable description",
    "details": {},
    "retryable": false
  }
  ```

- **Error Categories:**
  - Authentication failures (invalid/expired cert)
  - Authorization failures (valid cert but not whitelisted IP)
  - Package not found
  - Package manager errors (dpkg/apt failures)
  - Permission denied
  - System resource errors
  - Configuration errors
  - Enrollment failures:
    - `ENROLLMENT_DENIED`: Admin rejected enrollment request on linux_patch_manager
    - `ENROLLMENT_EXPIRED`: Polling token expired or purged (HTTP 404 from manager)
    - `ENROLLMENT_TIMEOUT`: 24-hour polling limit exceeded (1440 attempts exhausted)
    - `ENROLLMENT_RATE_LIMITED`: Request rate limit exceeded (1/minute per IP, HTTP 429)
    - `PKI_PROVISION_FAILED`: Certificate write or PEM validation failed during provisioning

- **Error Message Policy:**
  - mTLS confirmed clients: Detailed error messages with debugging info
  - Non-mTLS connections: Silent drop (no response sent)
  - DEBUG mode: Include additional diagnostic information

- **Idempotency:** Operations should be idempotent where possible (safe to retry)

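To make the error object concrete, here is a minimal sketch of how the `error` field could be represented and rendered, using two of the enrollment codes listed above. The `ApiError` struct, the `enrollment_error_from_status` helper, and the `retryable` values chosen are assumptions for illustration only. A real implementation would more likely serialize with a JSON library such as `serde_json` than with a hand-written formatter.

```rust
/// Hypothetical in-memory form of the envelope's `error` object.
struct ApiError {
    code: &'static str,
    message: String,
    retryable: bool,
}

/// Minimal escaping for the characters most likely to break a JSON string.
/// This is not a complete JSON encoder (other control characters are not handled).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

impl ApiError {
    /// Render the object with the four fields from the spec; `details` is left empty.
    fn to_json(&self) -> String {
        format!(
            "{{\"code\":\"{}\",\"message\":\"{}\",\"details\":{{}},\"retryable\":{}}}",
            self.code,
            escape_json(&self.message),
            self.retryable
        )
    }
}

/// Map a manager HTTP status seen during enrollment to an error code.
/// The `retryable` flags are assumptions, not values taken from the spec.
fn enrollment_error_from_status(status: u16) -> Option<ApiError> {
    match status {
        404 => Some(ApiError {
            code: "ENROLLMENT_EXPIRED",
            message: "Polling token expired or purged".to_string(),
            retryable: false,
        }),
        429 => Some(ApiError {
            code: "ENROLLMENT_RATE_LIMITED",
            message: "Request rate limit exceeded".to_string(),
            retryable: true,
        }),
        _ => None,
    }
}
```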
## Assumptions

- Host machines have network connectivity to internal clients
- API clients are trusted internal systems
- Host OS has Rust toolchain available (or can be installed)
- Package manager (apt/dnf/apk/pacman) is functional on target hosts

## Dependencies

- Linux OS with package manager support
- Init system for service management (distribution-dependent)
  - systemd (most distributions)
  - OpenRC (Alpine Linux)
- Network access for API communication
- mTLS certificate infrastructure (CA, client certs)
- IP whitelist configuration
- Rust toolchain (rustc, cargo)
- Actix-web or Axum framework
- Internal CA for certificate issuance (self-hosted)

## Certificate Management

- **CA Type:** Internal self-hosted Certificate Authority
- **Distribution:** Automated Self-Enrollment (preferred) OR manual certificate distribution
  - Auto-Enrollment: daemon automatically enrolls on startup when certs are missing/invalid and `enrollment.manager_url` is configured
  - Manual Enrollment: `linux-patch-api --enroll <url>` for explicit enrollment (exits after completion, does not start server)
  - Eliminates manual certificate copy/permission management for new hosts
- **Scope:** Limited distribution (small number of authorized clients)
- **Validity Period:** 1 year standard expiration
- **Client Identity:** Unique certificate per client (no shared certs)
- **Rotation:** Automatic re-enrollment when certs are expiring within threshold, or manual via `--renew-certs`

## Certificate Validation

On startup, the daemon validates all configured TLS certificates before attempting to bind the listening port. Validation checks (in order):

1. **Existence**: All three cert files (`ca_cert`, `server_cert`, `server_key`) must exist at configured paths
2. **Parse**: Each file must be valid PEM; CA and server cert must parse as X.509, server key must parse as PKCS#8 or PKCS#1
3. **Expiry**: CA cert and server cert must not be expired (`not_after > now`). Certs expiring within `cert_renewal_threshold_days` (default 7) trigger a warning and auto-re-enrollment
4. **Key match**: Server cert's public key must correspond to server key's private key
5. **CA trust**: Server cert must be signed by the CA cert (or chain validates to CA)

Validation results determine startup behavior:

| Result | Action |
|--------|--------|
| Valid | Start normally with mTLS |
| ExpiringSoon | Log warning, start normally, schedule background re-enrollment |
| Missing/Corrupt/Expired/KeyMismatch/Untrusted | Trigger auto-enrollment if `enrollment.manager_url` configured, otherwise exit with guidance |

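The expiry check and the result-to-action table above can be sketched as two small functions. This is an illustrative reading of the table, not the daemon's actual code: the `CertStatus` variants mirror the result names in the table, while `StartupAction`, `classify_expiry`, and `startup_action` are hypothetical names, and timestamps are assumed to be Unix seconds.

```rust
/// Validation results, named after the rows of the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CertStatus {
    Valid,
    ExpiringSoon,
    Missing,
    Corrupt,
    Expired,
    KeyMismatch,
    Untrusted,
}

/// Hypothetical names for the actions in the table's right-hand column.
#[derive(Debug, PartialEq)]
enum StartupAction {
    StartServer,
    StartServerAndScheduleRenewal,
    AutoEnroll,
    ExitWithGuidance,
}

/// Expiry check only: a cert is usable while `not_after > now`, and counts as
/// expiring soon when the remaining lifetime is within the renewal threshold.
fn classify_expiry(not_after: u64, now: u64, threshold_days: u64) -> CertStatus {
    const SECS_PER_DAY: u64 = 86_400;
    if not_after <= now {
        CertStatus::Expired
    } else if not_after - now <= threshold_days.saturating_mul(SECS_PER_DAY) {
        CertStatus::ExpiringSoon
    } else {
        CertStatus::Valid
    }
}

/// Decide what to do at startup from the validation result and whether
/// `enrollment.manager_url` is configured.
fn startup_action(status: CertStatus, manager_url: Option<&str>) -> StartupAction {
    match status {
        CertStatus::Valid => StartupAction::StartServer,
        CertStatus::ExpiringSoon => StartupAction::StartServerAndScheduleRenewal,
        // Every other result is handled the same way in the table.
        _ => match manager_url {
            Some(_) => StartupAction::AutoEnroll,
            None => StartupAction::ExitWithGuidance,
        },
    }
}
```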
## Self-Enrollment Workflow

The `linux_patch_api` daemon supports automated self-enrollment to securely request identity from the `linux_patch_manager` without manual PKI distribution. Enrollment can be triggered automatically on startup or manually via CLI.

### Auto-Enrollment on Startup

When cert validation fails AND `enrollment.manager_url` is configured in config.yaml, the daemon automatically enters enrollment mode:

1. Log: `Certs [status]. Auto-enrolling with <url>`
2. Skip cert validation (`skip_tls_validation=true`)
3. Register with manager (POST /api/v1/enroll)
   - If host already exists (HTTP 409): log warning, continue to step 4 (polling for re-provisioning)
   - If new registration: receive polling token
4. Poll for approval (GET /api/v1/enroll/status/{token})
   - Persist `polling_token` to config.yaml for resume after restart
   - Retry with exponential backoff on network errors
5. When approved: provision certs (ca.pem, server.pem, server.key)
6. Re-validate certs (should now be Valid)
7. Continue to normal mTLS server startup

If enrollment fails (network error, manager unreachable):
- Log: "Auto-enrollment failed: [error]. Retrying on next restart."
- Exit code 1 (triggers systemd restart with backoff)

If no enrollment URL is configured and certs are invalid:
- Log clear error with guidance (add URL, run `--enroll`, or place certs manually)
- Exit code 0 (don't trigger restart loop)

### Polling Token Resume

If the service restarts during enrollment polling:
1. Read `polling_token` from config.yaml (persisted during enrollment)
2. If token exists and `enrollment.manager_url` is configured:
   a. Resume polling from where it left off
   b. Do not re-register (host already has a pending request)
3. On successful provisioning:
   a. Clear `polling_token` from config.yaml
   b. Continue to normal server startup

### CLI Enrollment (`--enroll`)

```
linux-patch-api --enroll https://<manager_url>
```

The enrollment flow runs and **exits after completion**; it does NOT start the server. This prevents port conflicts with the systemd service.

- On success: prints "Enrollment complete. Start service: systemctl start linux-patch-api" and exits with code 0
- On failure: exits with code 1 (triggers systemd restart if configured)

### Security Model
- Initial connection uses TLS with verification disabled (`danger_accept_invalid_certs`)
- Manager approval workflow provides authorization; transport encryption is secondary during enrollment
- URL scheme validation prevents SSRF/path traversal (only `http` and `https` permitted)
- Host component required in manager URL

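The two URL rules above (scheme limited to `http`/`https`, host component required) could be checked along the following lines. This is a simplified, standard-library-only sketch under stated assumptions: `validate_manager_url` is a hypothetical name, IPv6 literals are not handled, and a production implementation would probably rely on a dedicated URL parser such as the `url` crate instead of manual splitting.

```rust
/// Check that a manager URL uses http/https and has a non-empty host.
/// Simplified: does not handle IPv6 literals such as `[::1]`.
fn validate_manager_url(url: &str) -> Result<(), String> {
    // The scheme is everything before the first "://".
    let (scheme, rest) = url
        .split_once("://")
        .ok_or_else(|| "missing scheme".to_string())?;

    if !scheme.eq_ignore_ascii_case("http") && !scheme.eq_ignore_ascii_case("https") {
        return Err(format!("unsupported scheme: {scheme}"));
    }

    // The authority ends at the first path, query, or fragment delimiter.
    let authority = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");

    // Drop optional userinfo ("user:pass@") and the optional ":port" suffix.
    let host_port = authority.rsplit('@').next().unwrap_or("");
    let host = host_port.split(':').next().unwrap_or("");

    if host.is_empty() {
        return Err("missing host".to_string());
    }
    Ok(())
}
```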
### Phase 1: Registration Request
- **Identity Extraction:**
  - `/etc/machine-id` (fallback: `/var/lib/dbus/machine-id`)
  - FQDN, tried in order: `hostname -f` (accepted only if it contains a `.`) → `hostname` + `hostname -d` → `/etc/hostname` → `hostname` → `localhost`
  - Non-loopback IPv4 addresses via network interface enumeration
  - OS details from `/etc/os-release` (distro, version, id_like, codename) + kernel version (`uname -r`)
- **Submission:** Unauthenticated `POST /api/v1/enroll` to manager with identity payload
- **Response:** HTTP 202 with temporary `polling_token` (bearer credential; never logged)
- **Rate Limiting:** Manager enforces 1 request/minute per IP (HTTP 429 on violation)

### Phase 2: Polling & Approval
- **Polling Loop:** `GET /api/v1/enroll/status/{token}` with configurable interval and max attempts
- **Default Interval:** 60 seconds (configurable via `enrollment.polling_interval_seconds`)
- **Hard Timeout:** 24 hours maximum (1440 attempts; values >1440 clamped to 1440)
- **Status States:**
  - `pending`: Continue polling
  - `approved`: Proceed to Phase 3 with PKI bundle
  - `denied`: Abort enrollment (`ENROLLMENT_DENIED`)
  - `not_found`: Token expired or purged; abort (`ENROLLMENT_EXPIRED`)
- **Signal Handling:** SIGINT (Ctrl+C) and SIGTERM interrupt polling gracefully
- **Transient Errors:** Network failures and HTTP 5xx retried with backoff; HTTP 404/429 terminate immediately
- **Log Throttling:** Status logged every 10 attempts or after 5 minutes elapsed

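The polling rules above reduce to three small decisions: clamping the attempt limit, reacting to a status string, and throttling log output. The sketch below illustrates them without any networking. All names (`effective_max_attempts`, `PollOutcome`, `handle_status`, `should_log`) are hypothetical, and measuring the 5 minutes from the previous log line is an assumption about how the throttle is intended to work.

```rust
use std::time::Duration;

/// Hard ceiling from the spec: 1440 attempts (24 hours at the default 60 s interval).
const MAX_POLL_ATTEMPTS: u32 = 1440;

/// Clamp a configured `max_poll_attempts` value to the hard ceiling.
fn effective_max_attempts(configured: u32) -> u32 {
    configured.min(MAX_POLL_ATTEMPTS)
}

/// What the polling loop should do after one status response.
#[derive(Debug, PartialEq)]
enum PollOutcome {
    /// Keep polling after the configured interval.
    Continue,
    /// Approved: move on to Phase 3 (PKI provisioning).
    Provision,
    /// Stop and report the given error code.
    Abort(&'static str),
}

/// Map a status string from the manager to an outcome.
/// Returns `None` for states the spec does not list, leaving that policy to the caller.
fn handle_status(status: &str) -> Option<PollOutcome> {
    match status {
        "pending" => Some(PollOutcome::Continue),
        "approved" => Some(PollOutcome::Provision),
        "denied" => Some(PollOutcome::Abort("ENROLLMENT_DENIED")),
        "not_found" => Some(PollOutcome::Abort("ENROLLMENT_EXPIRED")),
        _ => None,
    }
}

/// Log throttling: emit a status line every 10th attempt, or when at least
/// 5 minutes have passed since the previous one (assumed interpretation).
fn should_log(attempt: u32, since_last_log: Duration) -> bool {
    attempt % 10 == 0 || since_last_log >= Duration::from_secs(300)
}
```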
### Phase 3: PKI Provisioning
- **Certificate Validation:** PEM format verification for CA cert, server cert, and server key (supports PKCS#8, PKCS#1 RSA, EC keys)
- **Atomic Writes:** Temp file → set permissions → atomic rename pattern prevents partial writes
- **File Permissions:** Keys at `0600`, certificates at `0644`, directories at `0755`
- **Backup Strategy:** Existing certificate files renamed to `.bak` before overwrite
- **Target Paths:** Configured via TLS settings or defaults (`/etc/linux_patch_api/certs/{ca,server,server.key}.pem`)
- **Whitelist Auto-Append:** Manager IP resolved (hostname → DNS or direct IP) and appended to `/etc/linux_patch_api/whitelist.yaml`
- **Completion:** For auto-enrollment: daemon transitions to standard mTLS listening mode without requiring service restart. For `--enroll`: daemon exits with code 0.

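The atomic-write and backup rules above can be sketched with the standard library alone (Unix only). `write_atomic` is a hypothetical helper, and the `.tmp` suffix for the temporary file is an assumption; the spec only names the `.bak` suffix. The temporary file is created next to the target so that the final rename stays on one filesystem.

```rust
use std::fs::{self, File, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Write `contents` to `target` using temp file -> set permissions -> rename.
/// An existing target is first renamed to `<target>.bak`.
fn write_atomic(target: &Path, contents: &[u8], mode: u32) -> io::Result<()> {
    // Build "<file name>.tmp" in the same directory as the target.
    let mut tmp_name = target
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "target has no file name"))?
        .to_os_string();
    tmp_name.push(".tmp");
    let tmp = target.with_file_name(tmp_name);

    // Restrict permissions before any key material is written.
    let mut file = File::create(&tmp)?;
    file.set_permissions(Permissions::from_mode(mode))?;
    file.write_all(contents)?;
    file.sync_all()?;
    drop(file);

    // Keep the previous file as a backup, as the spec's backup strategy describes.
    if target.exists() {
        let mut bak = target.as_os_str().to_os_string();
        bak.push(".bak");
        fs::rename(target, &bak)?;
    }

    // The rename replaces the target in one step, so readers never see a partial file.
    fs::rename(&tmp, target)
}
```

For example, a server key would be written with `write_atomic(path, pem_bytes, 0o600)` and a certificate with mode `0o644`, matching the permissions listed above.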
## Audit Logging

- **Log Content (All Required):**
  - Every API request (endpoint, method, timestamp, client cert ID)
  - Package operations (package name, version, action: install/remove/update)
  - Authentication events (success/failure, cert validation)
  - IP whitelist denials (blocked connection attempts)
  - System changes made by the API
  - Configuration changes (whitelist updates, cert renewals)

- **Enrollment Events:**
  - Registration request submitted (machine-id, FQDN, manager URL; polling token never logged)
  - Polling status changes (`pending` → `approved`/`denied`/`not_found`)
  - PKI bundle provisioning success/failure with target file paths
  - Whitelist auto-append during enrollment (manager IP added)
  - Enrollment timeout or denial with reason
  - Signal interruption (SIGINT/SIGTERM) during polling
  - Auto-enrollment triggered (cert status and reason)
  - Certificate validation results on startup

- **Log Storage:**
  - Primary: Distribution-appropriate logging
    - systemd journal (journalctl) on systemd systems
    - syslog/local files on OpenRC systems
  - Secondary: Optional remote syslog server (universal)
  - Local file logs as fallback (`/var/log/linux_patch_api/`)

- **Log Retention:**
  - Retention period: 30 days
  - Rotation: Daily
  - Compression: Enabled (gzip)

- **Log Levels:** Configurable at runtime (DEBUG, INFO, WARN, ERROR)

## IP Whitelist Configuration

- **Config File:** `/etc/linux_patch_api/whitelist.yaml`
- **Format:** YAML
- **Management:** Static config file (edit file to change)
- **Apply Method:** Instant apply on file change (no restart required)
- **Logging:** All whitelist changes logged to audit log

- **Supported Entries:**
  - Individual IPv4 addresses (e.g., `192.168.1.100`)
  - CIDR subnets (e.g., `192.168.1.0/24`)
  - Hostnames (resolved at config load)
  - IPv6: Not supported (explicitly out of scope)

- **Default Behavior:** Block all connections not in whitelist

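Matching a client address against the supported entry types can be sketched as follows. This example covers individual IPv4 addresses and CIDR subnets only; hostname entries would need DNS resolution at config load and are left out. `Ipv4Net`, `parse_entry`, and `is_allowed` are hypothetical names, not part of the specified configuration format.

```rust
use std::net::Ipv4Addr;

/// A whitelist entry normalized to network address plus mask.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Ipv4Net {
    network: u32,
    mask: u32,
}

/// Parse `a.b.c.d` (treated as /32) or `a.b.c.d/prefix`.
/// Returns `None` for anything else, including hostnames and IPv6.
fn parse_entry(entry: &str) -> Option<Ipv4Net> {
    let (addr, prefix) = match entry.split_once('/') {
        Some((addr, prefix)) => (addr, prefix.parse::<u32>().ok()?),
        None => (entry, 32),
    };
    if prefix > 32 {
        return None;
    }
    let addr: Ipv4Addr = addr.parse().ok()?;
    // A /0 prefix needs special handling because shifting a u32 by 32 overflows.
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    Some(Ipv4Net {
        network: u32::from(addr) & mask,
        mask,
    })
}

/// Deny by default: a client is allowed only if some entry contains it.
fn is_allowed(client: Ipv4Addr, whitelist: &[Ipv4Net]) -> bool {
    let client = u32::from(client);
    whitelist.iter().any(|net| (client & net.mask) == net.network)
}
```

With an empty whitelist `is_allowed` returns `false` for every address, which matches the block-all default behavior.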
## API Configuration Management

- **Config File:** `/etc/linux_patch_api/config.yaml`
- **Format:** YAML
- **Reload Method:** Config file watch with auto-reload on change (no restart required)

- **Configurable Settings:**
  - **Server:** port, bind address, timeout settings
  - **mTLS:** CA cert path, server cert path, server key path
  - **Logging:** log level, log retention, remote syslog server (optional)
  - **Security:** job timeout, max concurrent jobs, rate limiting
  - **Enrollment:** manager_url, polling_interval_seconds, max_poll_attempts, polling_token (auto-populated), cert_renewal_threshold_days

- **Hard-Coded Paths (not configurable):**
  - Whitelist file: `/etc/linux_patch_api/whitelist.yaml`
  - Data directory: `/var/lib/linux_patch_api/`
  - Job storage: `/var/lib/linux_patch_api/jobs/`
  - Log directory: `/var/log/linux_patch_api/`

## Testing Requirements

- **Unit Test Coverage:** Minimum 95%
- **Integration Tests:** API endpoint testing with mock package manager
- **Security Tests:** mTLS validation, IP whitelist enforcement, authentication failures
- **End-to-End Tests:** Full workflow testing on actual Ubuntu systems

- **Test Environments:**
  - Primary: Ubuntu (latest LTS)
  - CI/CD Pipeline: Required for automated testing
  - Penetration Testing: Required before release

## CLI Arguments

| Flag | Description |
|------|-------------|
| `--config <PATH>` or `-c` | Path to configuration file (default: `/etc/linux_patch_api/config.yaml`) |
| `--verbose` or `-v` | Enable verbose (DEBUG-level) logging |
| `--enroll <MANAGER_URL>` | Run self-enrollment flow with manager at URL, then EXIT (does not start server) |
| `--renew-certs` | Validate existing certs and re-enroll if expiring within threshold or invalid |
| `--version` or `-V` | Print version information and exit |
| `--help` or `-h` | Display help information and exit |

### Enrollment Mode Behavior

- **`--enroll <URL>`**: Executes enrollment flow, provisions certs, then **exits with code 0**. Does NOT start server or bind port. Prints a guidance message on completion.
- **Auto-enrollment (startup)**: Triggered when cert validation fails and `enrollment.manager_url` is configured. After provisioning, continues to normal server startup.
- **`--renew-certs`**: Validates existing certs. If expiring within threshold or invalid, re-enrolls using `enrollment.manager_url` from config. Exits with code 0 after completion.
- TLS verification is disabled on initial manager connection (manager approval workflow provides security)

### Exit Codes

| Code | Meaning | systemd Behavior |
|------|---------|------------------|
| 0 | Clean exit: no certs and no enrollment URL configured, or `--enroll`/`--renew-certs` success | No restart |
| 1 | Error: config error, enrollment network failure, cert validation error | Restart with backoff |
| 2 | Certs invalid, auto-enrollment in progress (will retry) | Restart with backoff |

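One way to keep the exit code strategy in a single place is an enum with a mapping function, sketched below. The `ExitReason` variants paraphrase the Meaning column of the table; the type and function names are hypothetical and not taken from the codebase.

```rust
/// Reasons the process may terminate, paraphrasing the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExitReason {
    /// No usable certs and no `enrollment.manager_url` configured.
    NoCertsNoEnrollmentUrl,
    /// `--enroll` or `--renew-certs` finished successfully.
    CliCommandSucceeded,
    ConfigError,
    EnrollmentNetworkFailure,
    CertValidationError,
    /// Certs invalid, auto-enrollment still in progress (will retry).
    EnrollmentInProgress,
}

/// Map a termination reason to the process exit code from the table.
fn exit_code(reason: ExitReason) -> i32 {
    match reason {
        ExitReason::NoCertsNoEnrollmentUrl | ExitReason::CliCommandSucceeded => 0,
        ExitReason::ConfigError
        | ExitReason::EnrollmentNetworkFailure
        | ExitReason::CertValidationError => 1,
        ExitReason::EnrollmentInProgress => 2,
    }
}

/// Per the table, systemd restarts the service only for non-zero codes.
fn triggers_restart(code: i32) -> bool {
    code != 0
}
```

A caller would end the process with `std::process::exit(exit_code(reason))`, so that the unit's restart policy sees the intended code.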
- **Phase 1 Acceptance Criteria:**
  - All endpoints functional with mTLS authentication
  - IP whitelist enforced correctly
  - Audit logging working (journalctl + file)
  - Config auto-reload working
  - WebSocket status streaming functional
  - Rollback mechanism tested

- **Security Audit:** No formal audit planned at this time

---
*Following kiro spec-driven development standards*