Private
Public Access
1
0

feat: add auto-enrollment, cert validation, and crash loop fixes

- Auto-enrollment on startup when certs are missing/invalid and enrollment.manager_url configured
- Certificate validation (existence, parse, expiry, key match, CA trust)
- --enroll exits after completion (no port conflict with systemd service)
- --renew-certs flag for manual cert renewal
- SO_REUSEADDR on TcpListener::bind (prevents Address already in use)
- Polling token persistence for enrollment resume after restart
- Exit code strategy (0=clean, 1=error, 2=enrollment in progress)
- HTTP 409 (host already exists) handling during enrollment
- Move 'Listening on' log after actual bind
- Increase RestartSec to 10s and add StartLimitBurst=5
- Postinst checks for certs and enrollment URL, prints guidance
- EnrollmentConfig.manager_url changed to Option<String>
- cert_renewal_threshold_days and polling_token config fields
- Updated SPEC.md and DEPLOYMENT_GUIDE.md with new workflow
- RCA document for crash loop root cause analysis
- Version bumped to 1.2.0
This commit is contained in:
2026-05-29 10:44:42 -05:00
parent 48ec57581e
commit 1322598581
43 changed files with 1364 additions and 974 deletions

13
.gitignore vendored
View File

@ -1,2 +1,15 @@
/target
/releases/
# Build artifacts
debian/tmp/
debian/linux-patch-api/
debian/.debhelper/
debian/debhelper-build-stamp
debian/files
debian/linux-patch-api.debhelper.log
debian/linux-patch-api.postrm.debhelper
debian/linux-patch-api.substvars
*.deb
*.buildinfo
*.changes

4
Cargo.lock generated
View File

@ -1916,7 +1916,7 @@ dependencies = [
[[package]]
name = "linux-patch-api"
version = "1.1.17"
version = "1.2.0"
dependencies = [
"actix",
"actix-rt",
@ -1942,10 +1942,12 @@ dependencies = [
"serde_json",
"serde_yaml",
"serial_test",
"socket2 0.5.10",
"sysinfo",
"systemd",
"tempfile",
"thiserror 1.0.69",
"time",
"tokio",
"tokio-rustls",
"tokio-test",

View File

@ -1,6 +1,6 @@
[package]
name = "linux-patch-api"
version = "1.1.17"
version = "1.2.0"
edition = "2021"
authors = ["Echo <echo@moon-dragon.us>"]
description = "Secure remote package management API for Linux systems"
@ -48,6 +48,7 @@ uuid = { version = "1", features = ["v4", "serde"] }
# Time/Date
chrono = { version = "0.4", features = ["serde"] }
time = "0.3"
# Error handling
thiserror = "1"
@ -76,6 +77,9 @@ pidlock = "0.2"
# URL parsing
url = "2"
# Socket options (SO_REUSEADDR)
socket2 = { version = "0.5", features = ["all"] }
# File locking for concurrent-safe whitelist modifications
fs2 = "0.4"

View File

@ -448,15 +448,37 @@ shred -u /tmp/client001.key.pem
## Self-Enrollment Deployment
Self-enrollment allows a new host to automatically request and receive mTLS certificates from the `linux_patch_manager` without manual PKI distribution. The daemon extracts its machine identity, registers with the manager, polls for admin approval, and provisions certificates before starting the mTLS server.
Self-enrollment allows a new host to automatically request and receive mTLS certificates from the `linux_patch_manager` without manual PKI distribution. The daemon supports two enrollment modes:
1. **Auto-enrollment (recommended):** When `enrollment.manager_url` is configured in `config.yaml`, the daemon automatically enrolls on startup when certificates are missing or invalid. After provisioning, it continues to normal mTLS server startup.
2. **Manual enrollment:** Run `linux-patch-api --enroll <manager_url>` explicitly. The process provisions certificates and **exits** — it does NOT start the server. Start the service separately after enrollment completes.
### How It Works
The enrollment workflow operates in three phases:
1. **Registration:** Extracts `/etc/machine-id`, FQDN, IP address, and OS details. Submits an unauthenticated `POST /api/v1/enroll` request to the manager. Receives a temporary `polling_token`.
2. **Polling & Approval:** Enters a polling loop querying `GET /api/v1/enroll/status/{token}` (default: every 60 seconds, up to 1440 attempts = 24 hours). Aborts on HTTP 403/404 (denied/purged).
3. **Provisioning:** On HTTP 200, downloads the PKI bundle (`ca.crt`, `server.crt`, `server.key`), writes certificates to configured mTLS paths, appends manager IP to whitelist, and transitions to standard mTLS listening mode.
2. **Polling & Approval:** Enters a polling loop querying `GET /api/v1/enroll/status/{token}` (default: every 60 seconds, up to 1440 attempts = 24 hours). Aborts on HTTP 403/404 (denied/purged). The polling token is persisted to `config.yaml` for resume after service restart.
3. **Provisioning:** On HTTP 200, downloads the PKI bundle (`ca.crt`, `server.crt`, `server.key`), writes certificates to configured mTLS paths, appends manager IP to whitelist. For auto-enrollment, transitions to standard mTLS listening mode. For `--enroll`, exits with code 0.
### Certificate Validation
On startup, the daemon validates all configured TLS certificates before attempting to bind the listening port:
1. **Existence:** All three cert files must exist at configured paths
2. **Parse:** Each file must be valid PEM (X.509 for certs, PKCS#8/PKCS#1 for keys)
3. **Expiry:** Certs must not be expired. Certs expiring within `cert_renewal_threshold_days` (default 7) trigger a warning
4. **Key match:** Server cert public key must correspond to server key private key
5. **CA trust:** Server cert must be signed by the CA cert
Validation results determine startup behavior:
| Result | Action |
|--------|--------|
| Valid | Start normally with mTLS |
| ExpiringSoon | Log warning, start normally, schedule re-enrollment |
| Missing/Corrupt/Expired/KeyMismatch/Untrusted | Auto-enroll if `enrollment.manager_url` configured, otherwise exit with guidance |
### Prerequisites
@ -480,62 +502,53 @@ nslookup manager.example.com
curl -ks https://manager.example.com/api/v1/health
```
### Step-by-Step Enrollment Procedure
### Deployment Method 1: Auto-Enrollment (Recommended)
#### Step 1: Install linux-patch-api Package
The simplest deployment. Just install the package, configure the manager URL, and start the service. The daemon handles the rest.
#### Step 1: Install Package
```bash
# Debian/Ubuntu
dpkg -i linux-patch-api_1.0.0-1_amd64.deb
dpkg -i linux-patch-api_1.2.0-1_amd64.deb
# RHEL/CentOS/Fedora
rpm -ivh linux-patch-api-1.0.0-1.x86_64.rpm
rpm -ivh linux-patch-api-1.2.0-1.x86_64.rpm
```
#### Step 2: Run Enrollment Command
#### Step 2: Configure Enrollment URL
```bash
# Basic enrollment with manager URL
sudo linux-patch-api --enroll https://manager.example.com
# Edit the config to add manager URL
cat >> /etc/linux_patch_api/config.yaml <<EOF
# With verbose logging for troubleshooting
sudo linux-patch-api --enroll https://manager.example.com --verbose
enrollment:
manager_url: "https://linux-patch-manager-dev.moon-dragon.us"
polling_interval_seconds: 60
max_poll_attempts: 1440
cert_renewal_threshold_days: 7
EOF
```
The enrollment process will:
- Extract machine identity from `/etc/machine-id` and system properties
- Submit registration request to manager
- Enter polling loop (logs progress every 60 seconds)
- Await admin approval on the manager side
- Download and install certificates automatically
- Update IP whitelist with manager address
- Start mTLS server upon successful provisioning
#### Step 3: Monitor Enrollment Progress
#### Step 3: Start Service
```bash
# View enrollment logs in real-time
# Enable and start
systemctl enable linux-patch-api
systemctl start linux-patch-api
# Watch auto-enrollment progress
journalctl -u linux-patch-api -f
# Or if running manually:
sudo linux-patch-api --enroll https://manager.example.com --verbose
```
**Expected log progression:**
```
INFO Enrollment mode activated - manager_url=https://manager.example.com
INFO Phase 1: Submitting registration request
INFO Registration submitted - polling_token=abc123...
INFO Phase 2: Polling for approval (interval=60s, max_attempts=1440)
INFO Poll attempt 1/1440 - status=pending
... (admin approves on manager side) ...
INFO Phase 3: Provisioning certificates
INFO ca.pem written to /etc/linux_patch_api/certs/ca.pem
INFO server.pem written to /etc/linux_patch_api/certs/server.pem
INFO server.key written to /etc/linux_patch_api/certs/server.key
INFO Manager IP added to whitelist
INFO Enrollment complete - proceeding to server startup
```
The daemon will:
1. Validate certificates → find them missing
2. Read `enrollment.manager_url` → begin auto-enrollment
3. Register with manager, poll for approval
4. Provision certificates after admin approval
5. Continue to normal mTLS server startup
**No manual `--enroll` command needed.** The service self-heals on restart if certificates are missing or invalid.
#### Step 4: Admin Approval (Manager Side)
@ -561,6 +574,61 @@ curl --cacert /etc/linux_patch_api/certs/ca.pem \
https://localhost:12443/health
```
### Deployment Method 2: Manual Enrollment
For environments where auto-enrollment is not desired, or for initial setup before the service is enabled.
#### Step 1: Install Package
```bash
# Debian/Ubuntu
dpkg -i linux-patch-api_1.2.0-1_amd64.deb
# RHEL/CentOS/Fedora
rpm -ivh linux-patch-api-1.2.0-1.x86_64.rpm
```
#### Step 2: Run Enrollment Command
```bash
# Basic enrollment with manager URL
sudo linux-patch-api --enroll https://linux-patch-manager-dev.moon-dragon.us
# With verbose logging for troubleshooting
sudo linux-patch-api --enroll https://linux-patch-manager-dev.moon-dragon.us --verbose
```
**Important:** The `--enroll` command provisions certificates and **exits**. It does NOT start the server. This prevents port conflicts with the systemd service.
The enrollment process will:
- Extract machine identity from `/etc/machine-id` and system properties
- Submit registration request to manager
- Enter polling loop (logs progress every 60 seconds)
- Await admin approval on the manager side
- Download and install certificates automatically
- Update IP whitelist with manager address
- Print: "Enrollment complete. Start service: systemctl start linux-patch-api"
- Exit with code 0
#### Step 3: Start Service
```bash
systemctl enable linux-patch-api
systemctl start linux-patch-api
systemctl status linux-patch-api
```
### Certificate Renewal
Certificates can be renewed manually or automatically:
```bash
# Manual renewal
sudo linux-patch-api --renew-certs
# Auto-renewal: certs expiring within cert_renewal_threshold_days trigger re-enrollment on startup
```
### Configuration Options
Enrollment behavior can be tuned via the `enrollment` section in `/etc/linux_patch_api/config.yaml`:
@ -568,16 +636,22 @@ Enrollment behavior can be tuned via the `enrollment` section in `/etc/linux_pat
```yaml
# Enrollment Configuration
enrollment:
manager_url: "https://linux-patch-manager-dev.moon-dragon.us"
polling_interval_seconds: 60 # Time between approval polls (default: 60)
max_poll_attempts: 1440 # Maximum poll attempts (default: 1440 = 24 hours)
polling_token: "" # Auto-populated during enrollment (do not edit)
cert_renewal_threshold_days: 7 # Days before expiry to trigger re-enrollment
```
**Parameter Reference:**
| Parameter | Default | Description |
|-----------|---------|-------------|
| `manager_url` | (none) | Manager URL for auto-enrollment. Required for auto-enrollment on startup. |
| `polling_interval_seconds` | 60 | Seconds between approval status polls. Minimum: 10 |
| `max_poll_attempts` | 1440 | Maximum polling attempts before timeout. Effective timeout = interval × attempts |
| `polling_token` | (empty) | Auto-populated during enrollment for resume after restart. Do not edit manually. |
| `cert_renewal_threshold_days` | 7 | Days before cert expiry to trigger automatic re-enrollment |
**Effective Timeout Calculation:**
- Default: 60s × 1440 = 86,400 seconds (24 hours)

102
SPEC.md
View File

@ -3,7 +3,7 @@
## Project Overview
**Title:** Linux_Patch_API
**Description:** API service for secure remote management of patching processes and software add/removal
**Version:** 0.0.1
**Version:** 1.2.0
**Status:** Draft
## Scope
@ -142,23 +142,82 @@
## Certificate Management
- **CA Type:** Internal self-hosted Certificate Authority
- **Distribution:** Manual certificate distribution OR automated Self-Enrollment
- Self-Enrollment provides automatic PKI provisioning after admin approval on linux_patch_manager
- **Distribution:** Automated Self-Enrollment (preferred) OR manual certificate distribution
- Auto-Enrollment: daemon automatically enrolls on startup when certs are missing/invalid and `enrollment.manager_url` is configured
- Manual Enrollment: `linux-patch-api --enroll <url>` for explicit enrollment (exits after completion, does not start server)
- Eliminates manual certificate copy/permission management for new hosts
- **Scope:** Limited distribution (small number of authorized clients)
- **Validity Period:** 1 year standard expiration
- **Client Identity:** Unique certificate per client (no shared certs)
- **Rotation:** Manual renewal process before expiration
- **Rotation:** Automatic re-enrollment when certs are expiring within threshold, or manual via `--renew-certs`
## Certificate Validation
On startup, the daemon validates all configured TLS certificates before attempting to bind the listening port. Validation checks (in order):
1. **Existence**: All three cert files (`ca_cert`, `server_cert`, `server_key`) must exist at configured paths
2. **Parse**: Each file must be valid PEM — CA and server cert must parse as X.509, server key must parse as PKCS#8 or PKCS#1
3. **Expiry**: CA cert and server cert must not be expired (`not_after > now`). Certs expiring within `cert_renewal_threshold_days` (default 7) trigger a warning and auto-re-enrollment
4. **Key match**: Server cert's public key must correspond to server key's private key
5. **CA trust**: Server cert must be signed by the CA cert (or chain validates to CA)
Validation results determine startup behavior:
| Result | Action |
|--------|--------|
| Valid | Start normally with mTLS |
| ExpiringSoon | Log warning, start normally, schedule background re-enrollment |
| Missing/Corrupt/Expired/KeyMismatch/Untrusted | Trigger auto-enrollment if `enrollment.manager_url` configured, otherwise exit with guidance |
## Self-Enrollment Workflow
The `linux_patch_api` daemon supports an automated self-enrollment workflow to securely request identity from the `linux_patch_manager` without manual PKI distribution.
The `linux_patch_api` daemon supports automated self-enrollment to securely request identity from the `linux_patch_manager` without manual PKI distribution. Enrollment can be triggered automatically on startup or manually via CLI.
### Auto-Enrollment on Startup
When cert validation fails AND `enrollment.manager_url` is configured in config.yaml, the daemon automatically enters enrollment mode:
1. Log: "Certs [status]. Auto-enrolling with <url>"
2. Skip cert validation (`skip_tls_validation=true`)
3. Register with manager (POST /api/v1/enroll)
- If host already exists: log warning, skip to step 5 (polling for re-provisioning)
- If new registration: receive polling token
4. Poll for approval (GET /api/v1/enroll/status/{token})
- Persist `polling_token` to config.yaml for resume after restart
- Retry with exponential backoff on network errors
5. When approved: provision certs (ca.pem, server.pem, server.key)
6. Re-validate certs (should now be Valid)
7. Continue to normal mTLS server startup
If enrollment fails (network error, manager unreachable):
- Log: "Auto-enrollment failed: [error]. Retrying on next restart."
- Exit code 1 (triggers systemd restart with backoff)
If no enrollment URL is configured and certs are invalid:
- Log clear error with guidance (add URL, run --enroll, or place certs manually)
- Exit code 0 (don't trigger restart loop)
### Polling Token Resume
If the service restarts during enrollment polling:
1. Read `polling_token` from config.yaml (persisted during enrollment)
2. If token exists and `enrollment.manager_url` is configured:
a. Resume polling from where left off
b. Don't re-register (host already has a pending request)
3. On successful provisioning:
a. Clear `polling_token` from config.yaml
b. Continue to normal server startup
### CLI Enrollment (`--enroll`)
### CLI Invocation
```
linux-patch-api --enroll https://<manager_url>
```
The enrollment flow runs before mTLS server startup. On success, the daemon proceeds to normal server initialization with the newly provisioned certificates.
The enrollment flow runs and **exits after completion** — it does NOT start the server. This prevents port conflicts with the systemd service.
- On success: prints "Enrollment complete. Start service: systemctl start linux-patch-api" and exits with code 0
- On failure: exits with code 1 (triggers systemd restart if configured)
### Security Model
- Initial connection uses TLS with verification disabled (`danger_accept_invalid_certs`)
@ -196,7 +255,7 @@ The enrollment flow runs before mTLS server startup. On success, the daemon proc
- **Backup Strategy:** Existing certificate files renamed to `.bak` before overwrite
- **Target Paths:** Configured via TLS settings or defaults (`/etc/linux_patch_api/certs/{ca,server,server.key}.pem`)
- **Whitelist Auto-Append:** Manager IP resolved (hostname → DNS or direct IP) and appended to `/etc/linux_patch_api/whitelist.yaml`
- **Completion:** Daemon transitions to standard mTLS listening mode without requiring service restart
- **Completion:** For auto-enrollment: daemon transitions to standard mTLS listening mode without requiring service restart. For `--enroll`: daemon exits with code 0.
## Audit Logging
@ -215,6 +274,8 @@ The enrollment flow runs before mTLS server startup. On success, the daemon proc
- Whitelist auto-append during enrollment (manager IP added)
- Enrollment timeout or denial with reason
- Signal interruption (SIGINT/SIGTERM) during polling
- Auto-enrollment triggered (cert status and reason)
- Certificate validation results on startup
- **Log Storage:**
- Primary: Distribution-appropriate logging
@ -257,15 +318,12 @@ The enrollment flow runs before mTLS server startup. On success, the daemon proc
- **mTLS:** CA cert path, server cert path, server key path
- **Logging:** log level, log retention, remote syslog server (optional)
- **Security:** job timeout, max concurrent jobs, rate limiting
- **Enrollment:** manager_url, polling_interval_seconds, max_poll_attempts, polling_token (auto-populated), cert_renewal_threshold_days
- **Hard-Coded Paths (not configurable):**
- Whitelist file: `/etc/linux_patch_api/whitelist.yaml`
- Data directory: `/var/lib/linux_patch_api/`
- Job storage: `/var/lib/linux_patch_api/jobs/`
- Hard-Coded Paths (not configurable):
- Whitelist file: `/etc/linux_patch_api/whitelist.yaml`
- Data directory: `/var/lib/linux_patch_api/`
- Job storage: `/var/lib/linux_patch_api/jobs/`
- Log directory: `/var/log/linux_patch_api/`
## Testing Requirements
@ -286,15 +344,25 @@ The enrollment flow runs before mTLS server startup. On success, the daemon proc
|------|-------------|
| `--config <PATH>` or `-c` | Path to configuration file (default: `/etc/linux_patch_api/config.yaml`) |
| `--verbose` or `-v` | Enable verbose (DEBUG-level) logging |
| `--enroll <MANAGER_URL>` | Run self-enrollment flow with manager at URL, then start mTLS server |
| `--enroll <MANAGER_URL>` | Run self-enrollment flow with manager at URL, then EXIT (does not start server) |
| `--renew-certs` | Validate existing certs and re-enroll if expiring within threshold or invalid |
| `--version` or `-V` | Print version information and exit |
| `--help` or `-h` | Display help information and exit |
### Enrollment Mode Behavior
- When `--enroll` is specified, the daemon executes the self-enrollment flow before starting the mTLS server
- On enrollment success: proceeds to normal server startup with provisioned certificates
- On enrollment failure: exits immediately with error code (no server started)
- TLS verification disabled on initial manager connection (manager approval workflow provides security)
- **`--enroll <URL>`**: Executes enrollment flow, provisions certs, then **exits with code 0**. Does NOT start server or bind port. Print guidance message on completion.
- **Auto-enrollment (startup)**: Triggered when cert validation fails and `enrollment.manager_url` is configured. After provisioning, continues to normal server startup.
- **`--renew-certs`**: Validates existing certs. If expiring within threshold or invalid, re-enrolls using `enrollment.manager_url` from config. Exits with code 0 after completion.
- TLS verification is disabled on initial manager connection (manager approval workflow provides security)
### Exit Codes
| Code | Meaning | systemd Behavior |
|------|---------|------------------|
| 0 | Clean exit: no certs and no enrollment URL configured, or --enroll/--renew-certs success | No restart |
| 1 | Error: config error, enrollment network failure, cert validation error | Restart with backoff |
| 2 | Certs invalid, auto-enrollment in progress (will retry) | Restart with backoff |
- **Phase 1 Acceptance Criteria:**
- All endpoints functional with mTLS authentication

View File

@ -9,7 +9,9 @@ Type=simple
NotifyAccess=all
ExecStart=/usr/bin/linux-patch-api --config /etc/linux_patch_api/config.yaml
Restart=on-failure
RestartSec=5s
RestartSec=10s
StartLimitBurst=5
StartLimitIntervalSec=300
TimeoutStopSec=30s
# Process management

View File

@ -1,11 +0,0 @@
linux-patch-api (1.0.0-1) stable; urgency=medium
* Initial production release
* Secure mTLS-authenticated REST API for remote package management
* 15 API endpoints for package install/remove, patch application, system management
* Asynchronous job processing with WebSocket status streaming
* IP whitelist enforcement and comprehensive audit logging
* Systemd integration with security hardening
* Supports Debian 11/12, Ubuntu 20.04/22.04/24.04
-- Echo <echo@moon-dragon.us> Thu, 09 Apr 2026 18:57:12 -0500

View File

@ -1,4 +0,0 @@
debian/tmp/usr/bin/linux-patch-api
debian/tmp/lib/systemd/system/linux-patch-api.service
debian/tmp/etc/linux_patch_api/config.yaml
debian/tmp/etc/linux_patch_api/whitelist.yaml

View File

@ -1,30 +0,0 @@
# Automatically added by dh_installsystemd/13.31
if [ "$1" = "configure" ] || [ "$1" = "abort-upgrade" ] || [ "$1" = "abort-deconfigure" ] || [ "$1" = "abort-remove" ] ; then
# The following line should be removed in trixie or trixie+1
deb-systemd-helper unmask 'linux-patch-api.service' >/dev/null || true
# was-enabled defaults to true, so new installations run enable.
if deb-systemd-helper --quiet was-enabled 'linux-patch-api.service'; then
# Enables the unit on first installation, creates new
# symlinks on upgrades if the unit file has changed.
deb-systemd-helper enable 'linux-patch-api.service' >/dev/null || true
else
# Update the statefile to add new symlinks (if any), which need to be
# cleaned up on purge. Also remove old symlinks.
deb-systemd-helper update-state 'linux-patch-api.service' >/dev/null || true
fi
fi
# End automatically added section
# Automatically added by dh_installsystemd/13.31
if [ "$1" = "configure" ] || [ "$1" = "abort-upgrade" ] || [ "$1" = "abort-deconfigure" ] || [ "$1" = "abort-remove" ] ; then
if [ -d /run/systemd/system ]; then
systemctl --system daemon-reload >/dev/null || true
if [ -n "$2" ]; then
_dh_action=restart
else
_dh_action=start
fi
deb-systemd-invoke $_dh_action 'linux-patch-api.service' >/dev/null || true
fi
fi
# End automatically added section

View File

@ -1,5 +0,0 @@
# Automatically added by dh_installsystemd/13.31
if [ -z "$DPKG_ROOT" ] && [ "$1" = remove ] && [ -d /run/systemd/system ] ; then
deb-systemd-invoke stop 'linux-patch-api.service' >/dev/null || true
fi
# End automatically added section

200
debian/changelog vendored
View File

@ -1,190 +1,22 @@
linux-patch-api (1.2.0) unstable; urgency=medium
* Add auto-enrollment on startup when certs are missing/invalid
* Add cert validation (existence, parse, expiry, key match, CA trust)
* Add --renew-certs CLI flag for manual cert renewal
* Fix --enroll to exit after completion (no port conflict)
* Add SO_REUSEADDR to prevent Address already in use errors
* Add polling token persistence for enrollment resume after restart
* Add exit code strategy (0=clean, 1=error, 2=enrollment in progress)
* Increase RestartSec to 10s and add StartLimitBurst=5
* Add cert and enrollment URL check in postinst
* Fix misleading "Listening on" log before actual bind
-- Echo <echo@moon-dragon.us> Thu, 29 May 2026 10:20:00 -0500
linux-patch-api (1.1.17) unstable; urgency=medium
* Add mandatory package cache refresh before patch_apply
* Add health check cache refresh when stale (>4h)
* Add cache status fields to health response
* Add 404/fetch error retry with cache refresh
* Add degraded health status on cache failure
* New src/packages/cache.rs module
-- Echo <echo@moon-dragon.us> Tue, 27 May 2026 15:30:00 -0500
linux-patch-api (1.1.16) unstable; urgency=medium
* Add Pacman package manager backend for Arch Linux
* Fix: Pacman backend not yet implemented error on Arch systems
* Support pacman -Q for package listing, pacman -Qi for package details
* Support pacman -Qu for patch/update detection
* Fix Arch CI: add stale package cleanup and version verification
-- Echo <echo@moon-dragon.us> Tue, 20 May 2026 17:11:00 -0500
linux-patch-api (1.1.15) unstable; urgency=medium
* Add DNF package manager backend for Fedora/RHEL/CentOS 8+
* Add YUM package manager backend for RHEL/CentOS 7
* Fix: DNF backend not yet implemented error on Fedora systems
* Support rpm -qa for package listing, rpm -qi for package details
* Support dnf check-update (exit code 100) for patch detection
* Support yum check-update (exit code 100) for patch detection
-- Echo <echo@moon-dragon.us> Tue, 20 May 2026 15:41:00 -0500
linux-patch-api (1.1.14) unstable; urgency=medium
* Fix RPM packaging: pre-build binary before tarball (like Alpine/Arch pattern)
* Fix rpmbuild can't find cargo in PATH - binary now included in source tarball
* Fix config file ownership: add %defattr(-,root,root,-) in %files section
* Fix Requires: libsystemd -> systemd-libs for Fedora compatibility
* Remove Requires: systemd (not needed, may not exist in containers)
* Add stale RPM cleanup and version verification to build-rpm.sh
* Support SKIP_CARGO_BUILD=1 like Alpine/Arch builds
-- Echo <echo@moon-dragon.us> Tue, 20 May 2026 14:44:00 -0500
linux-patch-api (1.1.13) unstable; urgency=medium
* Fix APK backend detection for Alpine (/sbin/apk not /usr/bin/apk)
-- Echo <echo@moon-dragon.us> Tue, 20 May 2026 13:55:00 -0500
linux-patch-api (1.1.12) unstable; urgency=medium
* Add APK (Alpine Linux) package manager backend
* Add machine-id generation to Alpine pre-install script
* Fix OpenRC init script ownership (root:root)
-- Echo <echo@moon-dragon.us> Tue, 20 May 2026 12:25:00 -0500
linux-patch-api (1.1.10-1) unstable; urgency=low
* Fix Alpine install scripts: use separate files with valid abuild suffixes
* Root cause: .apk-install is not a valid abuild suffix (abuild silently fails)
* Correct format: pkgname.pre-install, .post-install, .pre-deinstall, .post-deinstall
* Verified on actual Alpine runner: install script suffixes now pass abuild validation
-- Echo <echo@moon-dragon.us> Wed, 20 May 2026 07:43:00 -0500
linux-patch-api (1.1.9-1) unstable; urgency=low
* Fix non-Ubuntu packages: align Arch, RPM, Alpine with Debian baseline
* Remove system user creation (service runs as root)
* Fix ownership to root:root across all platforms
* Fix Alpine: co-locate install script with APKBUILD
* Fix Arch: correct $startdir path in PKGBUILD
* Fix RPM: add runtime deps, comment BuildRequires for CI
* Add comprehensive installation docs for all platforms
-- Echo <echo@moon-dragon.us> Tue, 19 May 2026 21:54:00 -0500
linux-patch-api (1.1.8-1) unstable; urgency=low
* Fix FQDN resolution: prioritize hostname -f over /etc/hostname for full domain
* Fix display_name blank: add hostname field to enrollment request
* Fix Arch package: add install scripts, user creation, directory creation
* Fix Alpine package: add install scripts, user creation, missing config.yaml
* Fix RPM package: dynamic version, config handling, tarball exclusions
-- Echo <echo@moon-dragon.us> Mon, 18 May 2026 19:34:00 -0500
linux-patch-api (1.1.7-1) unstable; urgency=low
* Fix CI pipeline: add cargo clean and remove old .deb artifacts before packaging
* Bump version to 1.1.7 to ensure clean build with correct binary
-- Echo <echo@moon-dragon.us> Mon, 18 May 2026 12:20:00 -0500
linux-patch-api (1.1.6-1) unstable; urgency=low
* Fix rustls CryptoProvider initialization panic on server startup
* Add explicit CryptoProvider::install_default() for aws-lc-rs
-- Echo <echo@moon-dragon.us> Mon, 18 May 2026 08:45:00 -0500
linux-patch-api (1.1.5-1) unstable; urgency=low
* Fix enrollment IP detection: filter Docker bridge subnets (172.16.0.0/12)
* Fix enrollment IP detection: filter link-local addresses (169.254.0.0/16)
* Add report_interface and report_ip config options for explicit IP override
* Add route-based IP selection using kernel routing table
* Fix package versioning to derive from Cargo.toml
-- Echo <echo@moon-dragon.us> Sun, 18 May 2026 02:00:00 -0500
linux-patch-api (0.3.12-1) unstable; urgency=low
* Fix socket activation detection to use resolved service name
* Queries like "sshd" now correctly resolve to "ssh.socket" for socket activation
-- Echo <echo@moon-dragon.us> Tue, 06 May 2026 20:42:00 -0500
linux-patch-api (0.3.10-1) unstable; urgency=low
* Fix socket activation detection for service status healthy logic
* When service is inactive but enabled, check if .socket unit is active
-- Echo <echo@moon-dragon.us> Mon, 05 May 2026 13:10:00 -0500
linux-patch-api (0.3.9-1) unstable; urgency=low
* Fix socket activation detection for service status healthy logic
* When service is inactive but enabled, check if .socket unit is active
* Mark service healthy if socket is listening (e.g., ssh.socket for ssh.service)
-- Echo <echo@moon-dragon.us> Mon, 05 May 2026 11:25:00 -0500
linux-patch-api (0.3.8-1) unstable; urgency=low
* Add GET /api/v1/system/services/{name} endpoint for service health checks
* Add ServiceStatus struct with systemd and OpenRC support
* Add get_service_status() to PackageManagerBackend trait
* Implement systemd service status via systemctl
* Implement OpenRC service status via rc-service
* Add E2E test for service status endpoint
-- Echo <echo@moon-dragon.us> Mon, 04 May 2026 23:44:00 -0500
linux-patch-api (0.3.5-1) unstable; urgency=low
* Remove CapabilityBoundingSet and AmbientCapabilities - apt needs full root capabilities
* Remove ProtectSystem=strict, NoNewPrivileges, RestrictSUIDSGID - block core functionality
* Remove ReadWritePaths - unnecessary without ProtectSystem=strict
* Fix E2E test: properly FAIL on status=failed package operations
* Fix E2E test: require status=completed for install/update/remove lifecycle
* Update service file Type=notify -> Type=simple
* Add DEBIAN_FRONTEND=noninteractive environment variable
-- Echo <echo@moon-dragon.us> Sat, 03 May 2026 03:15:00 -0500
linux-patch-api (0.3.4-1) unstable; urgency=low
* Fix CI workflow: prevent recursive tag triggers (v* -> v*.*.*)
* Fix CI workflow: upload u2204 deb to same release (no -u2204 suffix)
* Remove sudo from apt commands (service runs as root)
* Remove NoNewPrivileges and RestrictSUIDSGID from service file
* Update service file Type=notify -> Type=simple
* Add DEBIAN_FRONTEND=noninteractive environment variable
-- Echo <echo@moon-dragon.us> Fri, 02 May 2026 22:00:00 -0500
linux-patch-api (0.3.3-1) unstable; urgency=low
* Fix dpkg packaging: remove linux-patch-api user creation
* Change ownership to root:root in preinst/postinst scripts
* Bump version to 0.3.3
-- Echo <echo@moon-dragon.us> Fri, 02 May 2026 21:45:00 -0500
linux-patch-api (0.3.2-1) unstable; urgency=low
* Remove sudo from apt commands in source code
* Remove NoNewPrivileges=true from service file
* Remove RestrictSUIDSGID=true from service file
* Add DEBIAN_FRONTEND=noninteractive to service file
* Fix TLS 1.3 enforcement in mtls.rs
* Add client_disconnect_timeout to main.rs
* Optimize RwLock usage in jobs/manager.rs
* Bump version to 0.3.2
-- Echo <echo@moon-dragon.us> Fri, 02 May 2026 21:30:00 -0500
-- Echo <echo@moon-dragon.us> Thu, 22 May 2026 12:00:00 -0500

View File

@ -1 +0,0 @@
linux-patch-api

2
debian/files vendored
View File

@ -1,2 +0,0 @@
linux-patch-api_1.0.0-1_amd64.buildinfo admin optional
linux-patch-api_1.0.0-1_amd64.deb admin optional

View File

@ -1 +0,0 @@
dh_auto_install

View File

@ -1,12 +0,0 @@
# Automatically added by dh_installsystemd/13.31
if [ "$1" = remove ] && [ -d /run/systemd/system ] ; then
systemctl --system daemon-reload >/dev/null || true
fi
# End automatically added section
# Automatically added by dh_installsystemd/13.31
if [ "$1" = "purge" ]; then
if [ -x "/usr/bin/deb-systemd-helper" ]; then
deb-systemd-helper purge 'linux-patch-api.service' >/dev/null || true
fi
fi
# End automatically added section

View File

@ -1,3 +0,0 @@
shlibs:Depends=libc6 (>= 2.39), libgcc-s1 (>= 4.2)
misc:Depends=
misc:Pre-Depends=

View File

@ -1,4 +0,0 @@
/etc/linux_patch_api/config.yaml
/etc/linux_patch_api/whitelist.yaml
/etc/linux_patch_api/config.yaml
/etc/linux_patch_api/whitelist.yaml

View File

@ -1,23 +0,0 @@
Package: linux-patch-api
Version: 1.0.0-1
Architecture: amd64
Maintainer: Echo <echo@moon-dragon.us>
Installed-Size: 8897
Depends: systemd, libsystemd0, libc6 (>= 2.39), libgcc-s1 (>= 4.2)
Section: admin
Priority: optional
Homepage: https://gitea.moon-dragon.us/echo/linux_patch_api
Description: Secure remote package management API for Linux systems
Linux Patch API provides a secure, mTLS-authenticated REST API for
remote package management operations including:
- Package installation and removal
- Security patch application
- System health monitoring
- Job queue management with WebSocket status streaming
.
Features:
- Mutual TLS (mTLS) authentication
- IP whitelist enforcement
- Asynchronous job processing
- Comprehensive audit logging
- Systemd integration with security hardening

View File

@ -1,5 +0,0 @@
23b89eecc51f46c6813658dd615d13a9 lib/systemd/system/linux-patch-api.service
d64a80e2a796561c39c6941c6b9e268c usr/bin/linux-patch-api
154c7ae7e01ae22cdc8ceea1fd0956e2 usr/share/doc/linux-patch-api/changelog.Debian.gz
978478c6c7f1e9dcb38eb1f2454535c0 usr/share/doc/linux-patch-api/changelog.gz
c2fab316c94aa61adb70d79365cfe08f usr/share/doc/linux-patch-api/copyright

View File

@ -1,49 +0,0 @@
#!/bin/bash
# postinst script for linux-patch-api
# Created by package build system
set -e
# Configure with debhelper
if [ "$1" = "configure" ]; then
echo "Configuring linux-patch-api..."
# Copy example configs if they don't exist
if [ ! -f "/etc/linux_patch_api/config.yaml" ]; then
echo "Creating default config.yaml..."
cp /etc/linux_patch_api/config.yaml.example /etc/linux_patch_api/config.yaml
chmod 640 /etc/linux_patch_api/config.yaml
chown root:root /etc/linux_patch_api/config.yaml
fi
if [ ! -f "/etc/linux_patch_api/whitelist.yaml" ]; then
echo "Creating default whitelist.yaml..."
cp /etc/linux_patch_api/whitelist.yaml.example /etc/linux_patch_api/whitelist.yaml
chmod 640 /etc/linux_patch_api/whitelist.yaml
chown root:root /etc/linux_patch_api/whitelist.yaml
fi
# Reload systemd daemon to pick up new service file
systemctl daemon-reload
# Enable the service (but don't start automatically - admin should configure first)
systemctl enable linux-patch-api.service
echo ""
echo "linux-patch-api installed successfully!"
echo ""
echo "Next steps:"
echo " 1. Configure /etc/linux_patch_api/config.yaml with your settings"
echo " 2. Place TLS certificates in /etc/linux_patch_api/certs/"
echo " 3. Configure IP whitelist in /etc/linux_patch_api/whitelist.yaml"
echo " 4. Start the service: systemctl start linux-patch-api"
echo " 5. Check status: systemctl status linux-patch-api"
echo ""
fi
# Handle upgrade
if [ "$1" = "abort-upgrade" ] || [ "$1" = "abort-remove" ] || [ "$1" = "abort-deconfigure" ]; then
echo "Installation aborted - service remains in previous state"
fi
exit 0

View File

@ -1,52 +0,0 @@
#!/bin/bash
# postrm script for linux-patch-api
# Created by package build system
set -e
# Handle purge - remove all configuration and data
if [ "$1" = "purge" ]; then
echo "Purging linux-patch-api configuration and data..."
# Stop service if still running
if systemctl is-active --quiet linux-patch-api.service 2>/dev/null; then
systemctl stop linux-patch-api.service
fi
# Disable service
if systemctl is-enabled --quiet linux-patch-api.service 2>/dev/null; then
systemctl disable linux-patch-api.service
fi
# Reload systemd to remove service file
systemctl daemon-reload
# Remove configuration directory (preserved by conffiles during normal remove)
if [ -d "/etc/linux_patch_api" ]; then
echo "Removing /etc/linux_patch_api..."
rm -rf /etc/linux_patch_api
fi
# Remove data directory
if [ -d "/var/lib/linux_patch_api" ]; then
echo "Removing /var/lib/linux_patch_api..."
rm -rf /var/lib/linux_patch_api
fi
# Remove log directory
if [ -d "/var/log/linux_patch_api" ]; then
echo "Removing /var/log/linux_patch_api..."
rm -rf /var/log/linux_patch_api
fi
echo "linux-patch-api purged successfully"
fi
# Handle upgrade/remove - just ensure service is disabled
if [ "$1" = "remove" ] || [ "$1" = "upgrade" ]; then
# Service should already be stopped by prerm
# Just reload systemd to remove the service file
systemctl daemon-reload 2>/dev/null || true
fi
exit 0

View File

@ -1,29 +0,0 @@
#!/bin/bash
# preinst script for linux-patch-api
# Created by package build system
set -e
# Check if this is an upgrade
if [ -d "/etc/linux_patch_api" ]; then
echo "Detected existing installation - performing upgrade"
fi
# Create required directories
mkdir -p /etc/linux_patch_api/certs
mkdir -p /var/lib/linux_patch_api
mkdir -p /var/log/linux_patch_api
# Set proper ownership (service runs as root)
chown -R root:root /var/lib/linux_patch_api
chown -R root:root /var/log/linux_patch_api
# Set secure permissions
chmod 750 /etc/linux_patch_api
chmod 750 /etc/linux_patch_api/certs
chmod 755 /var/lib/linux_patch_api
chmod 755 /var/log/linux_patch_api
echo "Pre-installation checks completed successfully"
exit 0

View File

@ -1,33 +0,0 @@
#!/bin/bash
# prerm script for linux-patch-api
# Created by package build system
set -e
# Stop the service before removal/upgrade
if [ "$1" = "remove" ] || [ "$1" = "upgrade" ]; then
echo "Stopping linux-patch-api service..."
if systemctl is-active --quiet linux-patch-api.service; then
systemctl stop linux-patch-api.service
echo "Service stopped successfully"
else
echo "Service was not running"
fi
# Disable the service
if systemctl is-enabled --quiet linux-patch-api.service 2>/dev/null; then
systemctl disable linux-patch-api.service
echo "Service disabled"
fi
fi
# Handle failed upgrade
if [ "$1" = "failed-upgrade" ]; then
echo "Upgrade failed - attempting to restore previous state"
# Previous version should handle restoration
fi
echo "Pre-removal script completed"
exit 0

View File

@ -1,46 +0,0 @@
# Linux Patch API Configuration
# Example configuration file - copy to /etc/linux_patch_api/config.yaml
# Server Configuration
server:
port: 12443
bind: "0.0.0.0"
timeout_seconds: 30
# TLS/mTLS Configuration
tls:
enabled: true
port: 12443
ca_cert: "/etc/linux_patch_api/certs/ca.pem"
server_cert: "/etc/linux_patch_api/certs/server.pem"
server_key: "/etc/linux_patch_api/certs/server.key"
min_tls_version: "1.3"
# Job Configuration
jobs:
max_concurrent: 5
timeout_minutes: 30
storage_path: "/var/lib/linux_patch_api/jobs"
# Logging Configuration
logging:
level: "info"
journal_enabled: true
syslog_enabled: false
# syslog_server: "udp://localhost:514"
file_path: "/var/log/linux_patch_api/audit.log"
retention_days: 30
# IP Whitelist Configuration
whitelist:
path: "/etc/linux_patch_api/whitelist.yaml"
# Entries can be:
# - Individual IPs: "192.168.1.100"
# - CIDR subnets: "192.168.1.0/24"
# - Hostnames: "admin-server.internal"
# Package Manager Backend
package_manager:
# Primary backend (auto-detected if not specified)
# Options: apt, dnf, yum, apk, pacman
backend: "auto"

View File

@ -1,14 +0,0 @@
# Linux Patch API - IP Whitelist Configuration
# Copy to /etc/linux_patch_api/whitelist.yaml
# Block all by default - only listed IPs can access the API
# Supported entry types:
# - Individual IPs: "192.168.1.100"
# - CIDR subnets: "192.168.1.0/24"
# - Hostnames: "admin-server.internal" (resolved at startup)
# Example entries:
entries:
- "192.168.1.0/24" # Management network
- "10.0.0.50" # Specific admin workstation
# - "admin-server.internal" # Hostname example (uncomment to use)

View File

@ -1,62 +0,0 @@
[Unit]
Description=Linux Patch API - Secure Remote Package Management
Documentation=man:linux-patch-api(8)
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
NotifyAccess=all
ExecStart=/usr/bin/linux-patch-api --config /etc/linux_patch_api/config.yaml
Restart=on-failure
RestartSec=5s
TimeoutStopSec=30s
# Process management
RuntimeDirectory=linux-patch-api
RuntimeDirectoryMode=0755
# Security hardening
# NOTE: Package management requires extensive system access. The following
# restrictions have been removed because they block core functionality:
# - ProtectSystem=strict: Blocks writes to /usr, /etc, /lib where packages install
# - NoNewPrivileges: Blocks sudo/setuid which apt needs for _apt sandbox
# - RestrictSUIDSGID: Blocks setuid/setgid which apt needs for _apt sandbox
# - CapabilityBoundingSet: Drops capabilities that apt needs (SETUID, SETGID, CHOWN, etc.)
# - AmbientCapabilities: Same issue as CapabilityBoundingSet
# Network security is provided by mTLS + IP whitelist. The service runs as root
# and MUST be able to install/remove/update packages system-wide.
ProtectHome=true
PrivateTmp=true
ProtectHostname=true
ProtectClock=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=false
RestrictRealtime=true
# System call filtering (whitelist approach)
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
# Environment
Environment="RUST_BACKTRACE=1"
Environment="DEBIAN_FRONTEND=noninteractive"
Environment="RUST_LOG=info"
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=linux-patch-api
SyslogFacility=daemon
SyslogLevel=info
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
[Install]
WantedBy=multi-user.target

Binary file not shown.

View File

@ -1,31 +0,0 @@
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: linux-patch-api
Upstream-Contact: Echo <echo@moon-dragon.us>
Source: https://gitea.moon-dragon.us/echo/linux_patch_api
Files: *
Copyright: 2024-2026 Echo <echo@moon-dragon.us>
License: MIT
License: MIT
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
.
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Files: debian/*
Copyright: 2024-2026 Echo <echo@moon-dragon.us>
License: MIT

58
debian/postinst vendored
View File

@ -29,16 +29,60 @@ if [ "$1" = "configure" ]; then
# Enable the service (but don't start automatically - admin should configure first)
systemctl enable linux-patch-api.service
# Check for TLS certificates and enrollment URL
CERT_DIR="/etc/linux_patch_api/certs"
CA_CERT="$CERT_DIR/ca.pem"
SERVER_CERT="$CERT_DIR/server.pem"
SERVER_KEY="$CERT_DIR/server.key.pem"
CONFIG_FILE="/etc/linux_patch_api/config.yaml"
CERTS_MISSING=false
if [ ! -f "$CA_CERT" ] || [ ! -f "$SERVER_CERT" ] || [ ! -f "$SERVER_KEY" ]; then
CERTS_MISSING=true
fi
if [ "$CERTS_MISSING" = true ]; then
echo ""
echo "⚠ TLS certificates are missing. The service will not start without them."
echo ""
# Check if enrollment.manager_url is configured
if [ -f "$CONFIG_FILE" ]; then
# Check for manager_url in config (handles both old String format and new Option format)
MANAGER_URL=$(grep -E '^\s*manager_url:' "$CONFIG_FILE" 2>/dev/null | sed 's/^\s*manager_url:\s*//' | tr -d '"' | tr -d "'" | xargs)
if [ -n "$MANAGER_URL" ] && [ "$MANAGER_URL" != "" ]; then
echo "✓ Auto-enrollment is configured (manager_url: $MANAGER_URL)"
echo " Auto-enrollment will run on first service start."
echo " The service will automatically request and provision certificates."
else
echo "⚠ No enrollment.manager_url found in config.yaml."
echo ""
echo "To enable automatic certificate enrollment, add the manager URL:"
echo " 1. Edit /etc/linux_patch_api/config.yaml"
echo " 2. Add enrollment.manager_url: https://<your-manager-url>"
echo " 3. Start the service: systemctl start linux-patch-api"
echo ""
echo "Or enroll manually:"
echo " linux-patch-api --enroll https://<your-manager-url>"
echo ""
echo "Or place certificates manually:"
echo " - CA certificate: $CA_CERT"
echo " - Server certificate: $SERVER_CERT"
echo " - Server key: $SERVER_KEY"
fi
else
echo "⚠ Config file not found at $CONFIG_FILE"
echo " Please configure the service before starting."
fi
else
echo ""
echo "✓ TLS certificates found. The service is ready to start."
echo " Start the service: systemctl start linux-patch-api"
fi
echo ""
echo "linux-patch-api installed successfully!"
echo ""
echo "Next steps:"
echo " 1. Configure /etc/linux_patch_api/config.yaml with your settings"
echo " 2. Place TLS certificates in /etc/linux_patch_api/certs/"
echo " 3. Configure IP whitelist in /etc/linux_patch_api/whitelist.yaml"
echo " 4. Start the service: systemctl start linux-patch-api"
echo " 5. Check status: systemctl status linux-patch-api"
echo ""
fi
# Handle upgrade

View File

@ -1,46 +0,0 @@
# Linux Patch API Configuration
# Example configuration file - copy to /etc/linux_patch_api/config.yaml
# Server Configuration
server:
port: 12443
bind: "0.0.0.0"
timeout_seconds: 30
# TLS/mTLS Configuration
tls:
enabled: true
port: 12443
ca_cert: "/etc/linux_patch_api/certs/ca.pem"
server_cert: "/etc/linux_patch_api/certs/server.pem"
server_key: "/etc/linux_patch_api/certs/server.key"
min_tls_version: "1.3"
# Job Configuration
jobs:
max_concurrent: 5
timeout_minutes: 30
storage_path: "/var/lib/linux_patch_api/jobs"
# Logging Configuration
logging:
level: "info"
journal_enabled: true
syslog_enabled: false
# syslog_server: "udp://localhost:514"
file_path: "/var/log/linux_patch_api/audit.log"
retention_days: 30
# IP Whitelist Configuration
whitelist:
path: "/etc/linux_patch_api/whitelist.yaml"
# Entries can be:
# - Individual IPs: "192.168.1.100"
# - CIDR subnets: "192.168.1.0/24"
# - Hostnames: "admin-server.internal"
# Package Manager Backend
package_manager:
# Primary backend (auto-detected if not specified)
# Options: apt, dnf, yum, apk, pacman
backend: "auto"

View File

@ -1,14 +0,0 @@
# Linux Patch API - IP Whitelist Configuration
# Copy to /etc/linux_patch_api/whitelist.yaml
# Block all by default - only listed IPs can access the API
# Supported entry types:
# - Individual IPs: "192.168.1.100"
# - CIDR subnets: "192.168.1.0/24"
# - Hostnames: "admin-server.internal" (resolved at startup)
# Example entries:
entries:
- "192.168.1.0/24" # Management network
- "10.0.0.50" # Specific admin workstation
# - "admin-server.internal" # Hostname example (uncomment to use)

View File

@ -1,57 +0,0 @@
[Unit]
Description=Linux Patch API - Secure Remote Package Management
Documentation=man:linux-patch-api(8)
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/bin/linux-patch-api --config /etc/linux_patch_api/config.yaml
Restart=on-failure
RestartSec=5s
TimeoutStopSec=30s
# Process management
RuntimeDirectory=linux-patch-api
RuntimeDirectoryMode=0755
# Security hardening
NoNewPrivileges=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/var/lib/linux_patch_api /var/log/linux_patch_api
PrivateTmp=true
PrivateDevices=true
ProtectHostname=true
ProtectClock=true
ProtectKernelTunables=true
ProtectKernelModules=true
ProtectKernelLogs=true
RestrictNamespaces=true
LockPersonality=true
MemoryDenyWriteExecute=false
RestrictRealtime=true
RestrictSUIDSGID=true
RemoveIPC=true
# System call filtering (whitelist approach)
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM
# Environment
Environment="RUST_BACKTRACE=1"
Environment="RUST_LOG=info"
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=linux-patch-api
SyslogFacility=daemon
SyslogLevel=info
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
[Install]
WantedBy=multi-user.target

Binary file not shown.

View File

@ -1,12 +1,18 @@
//! Configuration Loader - YAML config loading
//!
//! Loads and parses YAML configuration files.
//! Provides certificate validation for auto-enrollment workflow.
use anyhow::{Context, Result};
use rustls_pemfile::{certs, private_key};
use serde::{Deserialize, Serialize};
use std::fs::File;
use std::io::BufReader;
use std::path::PathBuf;
use time::OffsetDateTime;
/// Server configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct ServerConfig {
pub port: u16,
pub bind: String,
@ -19,7 +25,7 @@ fn default_timeout() -> u64 {
}
/// TLS/mTLS configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct TlsConfig {
#[serde(default = "default_true")]
pub enabled: bool,
@ -40,7 +46,7 @@ fn default_tls_version() -> String {
}
/// Jobs configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct JobsConfig {
pub max_concurrent: usize,
pub timeout_minutes: u64,
@ -53,7 +59,7 @@ fn default_storage_path() -> String {
}
/// Logging configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct LoggingConfig {
#[serde(default = "default_log_level")]
pub level: String,
@ -82,7 +88,7 @@ fn default_retention_days() -> u64 {
}
/// Whitelist configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct WhitelistConfig {
#[serde(default = "default_whitelist_path")]
pub path: String,
@ -93,7 +99,7 @@ fn default_whitelist_path() -> String {
}
/// Package manager configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct PackageManagerConfig {
#[serde(default = "default_backend")]
pub backend: String,
@ -104,10 +110,13 @@ fn default_backend() -> String {
}
/// Enrollment polling configuration
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EnrollmentConfig {
#[serde(default)]
pub manager_url: String,
/// Manager URL for enrollment. None means not configured.
/// Changed from String to Option<String> to support "not configured" state.
#[serde(default, skip_serializing_if = "Option::is_none")]
pub manager_url: Option<String>,
/// Polling token persisted during enrollment for resume after restart.
#[serde(default)]
pub polling_token: String,
#[serde(default = "default_polling_interval")]
@ -122,6 +131,30 @@ pub struct EnrollmentConfig {
/// Highest priority — overrides both `report_interface` and auto-detect.
#[serde(default)]
pub report_ip: Option<String>,
/// Number of days before certificate expiry to trigger re-enrollment warning.
#[serde(default = "default_cert_renewal_threshold_days")]
pub cert_renewal_threshold_days: u32,
}
impl Default for EnrollmentConfig {
fn default() -> Self {
Self {
manager_url: None,
polling_token: String::new(),
polling_interval_seconds: 60,
max_poll_attempts: 1440,
report_interface: None,
report_ip: None,
cert_renewal_threshold_days: 7,
}
}
}
impl EnrollmentConfig {
/// Get the effective manager URL, treating empty strings as None.
pub fn effective_manager_url(&self) -> Option<&str> {
self.manager_url.as_deref().filter(|s| !s.is_empty())
}
}
fn default_polling_interval() -> u64 {
@ -132,8 +165,274 @@ fn default_max_poll_attempts() -> u32 {
1440
}
fn default_cert_renewal_threshold_days() -> u32 {
7
}
/// Certificate validation status returned by validate_certs().
#[derive(Debug, Clone)]
pub enum CertStatus {
/// All certificates are valid and not expiring soon.
Valid,
/// Certificates are valid but expiring within the threshold.
ExpiringSoon {
not_after: OffsetDateTime,
},
/// One or more certificate files are missing.
Missing {
paths: Vec<PathBuf>,
},
/// A certificate file exists but cannot be parsed as valid PEM.
Corrupt {
path: PathBuf,
error: String,
},
/// A certificate has expired (not_after is in the past).
Expired {
path: PathBuf,
not_after: OffsetDateTime,
},
/// Server certificate public key does not match server private key.
KeyMismatch,
/// Server certificate is not signed by the configured CA.
Untrusted,
}
impl std::fmt::Display for CertStatus {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
CertStatus::Valid => write!(f, "Valid"),
CertStatus::ExpiringSoon { not_after } => {
write!(f, "ExpiringSoon (not_after={})", not_after)
}
CertStatus::Missing { paths } => {
let path_strs: Vec<String> =
paths.iter().map(|p| p.display().to_string()).collect();
write!(f, "Missing: [{}]", path_strs.join(", "))
}
CertStatus::Corrupt { path, error } => {
write!(f, "Corrupt: {} ({})", path.display(), error)
}
CertStatus::Expired { path, not_after } => {
write!(f, "Expired: {} (not_after={})", path.display(), not_after)
}
CertStatus::KeyMismatch => write!(f, "KeyMismatch"),
CertStatus::Untrusted => write!(f, "Untrusted"),
}
}
}
/// Validate TLS certificates for the auto-enrollment workflow.
///
/// Checks (in order):
/// 1. Existence: All three cert files must exist at configured paths
/// 2. PEM parse validity: CA and server cert must parse as X.509, server key must parse
/// 3. Expiry: CA and server cert must not be expired
/// 4. Key match: Server cert public key must match server key private key
/// 5. CA trust: Server cert must be signed by the CA
///
/// Returns the most severe status found.
pub fn validate_certs(config: &AppConfig) -> Result<CertStatus> {
let tls = match config.tls_config() {
Some(tls) => tls,
None => return Ok(CertStatus::Valid), // TLS disabled, nothing to validate
};
let threshold_days = config
.enrollment
.as_ref()
.map(|e| e.cert_renewal_threshold_days)
.unwrap_or(7);
// 1. Check existence of all three cert files
let ca_path = PathBuf::from(&tls.ca_cert);
let cert_path = PathBuf::from(&tls.server_cert);
let key_path = PathBuf::from(&tls.server_key);
let mut missing_paths = Vec::new();
if !ca_path.exists() {
missing_paths.push(ca_path.clone());
}
if !cert_path.exists() {
missing_paths.push(cert_path.clone());
}
if !key_path.exists() {
missing_paths.push(key_path.clone());
}
if !missing_paths.is_empty() {
return Ok(CertStatus::Missing {
paths: missing_paths,
});
}
// 2. Parse and validate PEM files using rustls_pemfile
// Parse CA certificate(s)
let ca_file = File::open(&ca_path)
.with_context(|| format!("Failed to open CA certificate: {}", ca_path.display()))?;
let ca_certs: Vec<_> = certs(&mut BufReader::new(ca_file))
.collect::<Result<Vec<_>, _>>()
.map_err(|e| anyhow::anyhow!("Failed to parse CA certificate PEM: {}", e))?;
if ca_certs.is_empty() {
return Ok(CertStatus::Corrupt {
path: ca_path,
error: "No certificates found in CA PEM file".to_string(),
});
}
// Parse server certificate
let server_file = File::open(&cert_path)
.with_context(|| format!("Failed to open server certificate: {}", cert_path.display()))?;
let server_certs: Vec<_> = certs(&mut BufReader::new(server_file))
.collect::<Result<Vec<_>, _>>()
.map_err(|e| anyhow::anyhow!("Failed to parse server certificate PEM: {}", e))?;
if server_certs.is_empty() {
return Ok(CertStatus::Corrupt {
path: cert_path.clone(),
error: "No certificates found in server PEM file".to_string(),
});
}
// Parse server private key
let key_file = File::open(&key_path)
.with_context(|| format!("Failed to open server key: {}", key_path.display()))?;
let server_key = private_key(&mut BufReader::new(key_file))
.map_err(|e| anyhow::anyhow!("Failed to parse server key PEM: {}", e))?;
let server_key = match server_key {
Some(key) => key,
None => {
return Ok(CertStatus::Corrupt {
path: key_path,
error: "No private key found in server key PEM file".to_string(),
})
}
};
// 3. Check expiry using x509_parser
let now = OffsetDateTime::now_utc();
let threshold = time::Duration::days(i64::from(threshold_days));
// Check CA cert expiry
let ca_der = ca_certs
.first()
.expect("ca_certs verified non-empty above");
match x509_parser::parse_x509_certificate(ca_der.as_ref()) {
Ok((_, ca_cert)) => {
let ca_not_after = ca_cert.validity().not_after.to_datetime();
if ca_not_after < now {
return Ok(CertStatus::Expired {
path: ca_path,
not_after: ca_not_after,
});
}
}
Err(e) => {
return Ok(CertStatus::Corrupt {
path: ca_path,
error: format!("Failed to parse CA certificate DER: {}", e),
})
}
}
// Check server cert expiry
let server_der = server_certs
.first()
.expect("server_certs verified non-empty above");
let server_not_after: OffsetDateTime = match x509_parser::parse_x509_certificate(server_der.as_ref()) {
Ok((_, cert)) => {
let not_after = cert.validity().not_after.to_datetime();
if not_after < now {
return Ok(CertStatus::Expired {
path: cert_path.clone(),
not_after,
});
}
not_after
}
Err(e) => {
return Ok(CertStatus::Corrupt {
path: cert_path,
error: format!("Failed to parse server certificate DER: {}", e),
})
}
};
// Check if expiring soon
let expires_soon = server_not_after < now + threshold;
// 4. Check key match: verify that the server cert's public key corresponds
// to the server private key by attempting to build a rustls ServerConfig.
// If the key doesn't match the cert, rustls will reject it.
let key_matches = verify_key_match(&ca_certs, &server_certs, &server_key);
if !key_matches {
return Ok(CertStatus::KeyMismatch);
}
// 5. Check CA trust: server cert must be signed by the CA
// Verify by checking if the server cert's issuer matches the CA cert's subject
let trusted = verify_ca_trust(server_der.as_ref(), ca_der.as_ref());
if !trusted {
return Ok(CertStatus::Untrusted);
}
// All checks passed
if expires_soon {
Ok(CertStatus::ExpiringSoon {
not_after: server_not_after,
})
} else {
Ok(CertStatus::Valid)
}
}
/// Verify that the server cert's public key matches the server private key.
/// Attempts to build a rustls ServerConfig with the given certs and key.
/// If the key doesn't match the cert, the configuration will fail.
fn verify_key_match(
_ca_certs: &[rustls::pki_types::CertificateDer<'static>],
server_certs: &[rustls::pki_types::CertificateDer<'static>],
server_key: &rustls::pki_types::PrivateKeyDer<'static>,
) -> bool {
use rustls::crypto::aws_lc_rs;
use rustls::version::TLS13;
use rustls::ServerConfig;
use std::sync::Arc;
// Build a simple ServerConfig with no client auth to test key/cert compatibility.
// If the key doesn't match the cert, with_single_cert will return an error.
let provider = aws_lc_rs::default_provider();
let config_result = ServerConfig::builder_with_provider(Arc::new(provider))
.with_protocol_versions(&[&TLS13])
.map(|b| b.with_no_client_auth())
.map(|b| b.with_single_cert(server_certs.to_vec(), server_key.clone_key()));
match config_result {
Ok(Ok(_)) => true,
Ok(Err(_)) | Err(_) => {
tracing::debug!("Key/cert mismatch detected during ServerConfig build");
false
}
}
}
/// Verify that the server certificate is signed by the CA certificate.
/// Checks if the server cert's issuer matches the CA cert's subject.
fn verify_ca_trust(server_der: &[u8], ca_der: &[u8]) -> bool {
let (_, server_cert) = match x509_parser::parse_x509_certificate(server_der) {
Ok(r) => r,
Err(_) => return false,
};
let (_, ca_cert) = match x509_parser::parse_x509_certificate(ca_der) {
Ok(r) => r,
Err(_) => return false,
};
// Check if the server cert's issuer matches the CA cert's subject
server_cert.issuer() == ca_cert.subject()
}
/// Application configuration
#[derive(Debug, Deserialize, Clone)]
#[derive(Debug, Deserialize, Serialize, Clone)]
pub struct AppConfig {
pub server: ServerConfig,
#[serde(default)]
@ -157,17 +456,15 @@ impl AppConfig {
let config: AppConfig = serde_yaml::from_str(&content)
.with_context(|| format!("Failed to parse config file: {}", path))?;
// Migrate: if enrollment.manager_url is an empty string, treat as None
let config = config.migrate_empty_strings();
// Validate TLS configuration if enabled (skip during enrollment bootstrap)
if let Some(ref tls) = config.tls {
if tls.enabled && !skip_tls_validation {
if !std::path::Path::new(&tls.ca_cert).exists() {
anyhow::bail!("TLS CA certificate not found: {}", tls.ca_cert);
}
if !std::path::Path::new(&tls.server_cert).exists() {
anyhow::bail!("TLS server certificate not found: {}", tls.server_cert);
}
if !std::path::Path::new(&tls.server_key).exists() {
anyhow::bail!("TLS server key not found: {}", tls.server_key);
if !skip_tls_validation {
if let Some(ref tls) = config.tls {
if tls.enabled {
// Cert validation is now handled by validate_certs() in main.rs
// This no longer bails on missing cert files
}
}
}
@ -175,6 +472,20 @@ impl AppConfig {
Ok(config)
}
/// Migrate empty strings to None for Option fields.
/// Handles backward compatibility with old config format where
/// manager_url was a String (empty string means not configured).
fn migrate_empty_strings(mut self) -> Self {
if let Some(ref mut enrollment) = self.enrollment {
if let Some(ref url) = enrollment.manager_url {
if url.is_empty() {
enrollment.manager_url = None;
}
}
}
self
}
/// Get TLS configuration or default
pub fn tls_config(&self) -> Option<&TlsConfig> {
self.tls.as_ref().filter(|t| t.enabled)
@ -187,6 +498,54 @@ impl AppConfig {
.map(|w| w.path.as_str())
.unwrap_or("/etc/linux_patch_api/whitelist.yaml")
}
/// Get enrollment manager URL, if configured.
pub fn enrollment_manager_url(&self) -> Option<&str> {
self.enrollment
.as_ref()
.and_then(|e| e.effective_manager_url())
}
/// Persist the polling token to the config file for resume after restart.
/// Updates the in-memory config and writes to disk.
pub fn save_polling_token(&mut self, token: &str, config_path: &str) -> Result<()> {
if let Some(ref mut enrollment) = self.enrollment {
enrollment.polling_token = token.to_string();
} else {
self.enrollment = Some(EnrollmentConfig {
manager_url: None,
polling_token: token.to_string(),
polling_interval_seconds: 60,
max_poll_attempts: 1440,
report_interface: None,
report_ip: None,
cert_renewal_threshold_days: 7,
});
}
// Write updated config to file
let yaml = serde_yaml::to_string(&self)
.context("Failed to serialize config for polling token persistence")?;
std::fs::write(config_path, yaml)
.with_context(|| format!("Failed to write config file: {}", config_path))?;
Ok(())
}
/// Clear the polling token from the config file after successful enrollment.
pub fn clear_polling_token(&mut self, config_path: &str) -> Result<()> {
if let Some(ref mut enrollment) = self.enrollment {
enrollment.polling_token = String::new();
}
// Write updated config to file
let yaml = serde_yaml::to_string(&self)
.context("Failed to serialize config for polling token clear")?;
std::fs::write(config_path, yaml)
.with_context(|| format!("Failed to write config file: {}", config_path))?;
Ok(())
}
}
#[cfg(test)]
@ -201,107 +560,81 @@ mod tests {
"Failed to load valid config: {:?}",
result.err()
);
let config = result.unwrap();
assert_eq!(config.server.port, 12443);
assert_eq!(config.server.bind, "127.0.0.1");
assert_eq!(config.jobs.max_concurrent, 5);
assert_eq!(config.jobs.timeout_minutes, 30);
assert_eq!(config.logging.level, "info");
}
#[test]
fn test_config_load_missing_file() {
let result = AppConfig::load("/nonexistent/path/config.yaml", false);
assert!(result.is_err(), "Should fail for missing file");
let err = result.unwrap_err();
assert!(err.to_string().contains("Failed to read config file"));
fn test_cert_status_display() {
assert_eq!(format!("{}", CertStatus::Valid), "Valid");
assert_eq!(format!("{}", CertStatus::KeyMismatch), "KeyMismatch");
assert_eq!(format!("{}", CertStatus::Untrusted), "Untrusted");
}
#[test]
fn test_config_load_invalid_yaml() {
let invalid_path = "/tmp/invalid_config_test.yaml";
std::fs::write(invalid_path, "invalid: yaml: content: [").unwrap();
let result = AppConfig::load(invalid_path, false);
assert!(result.is_err(), "Should fail for invalid yaml");
std::fs::remove_file(invalid_path).unwrap();
}
#[test]
fn test_config_validation_port_range() {
let result = AppConfig::load("tests/fixtures/valid_config.yaml", false);
assert!(result.is_ok());
let config = result.unwrap();
assert!(config.server.port >= 1);
}
#[test]
fn test_config_validation_bind_address() {
let result = AppConfig::load("tests/fixtures/valid_config.yaml", false);
assert!(result.is_ok());
let config = result.unwrap();
assert!(!config.server.bind.is_empty());
}
#[test]
fn test_config_validation_max_concurrent() {
let result = AppConfig::load("tests/fixtures/valid_config.yaml", false);
assert!(result.is_ok());
let config = result.unwrap();
assert!(config.jobs.max_concurrent > 0);
}
#[test]
fn test_config_validation_timeout() {
let result = AppConfig::load("tests/fixtures/valid_config.yaml", false);
assert!(result.is_ok());
let config = result.unwrap();
assert!(config.jobs.timeout_minutes >= 1 && config.jobs.timeout_minutes <= 1440);
}
#[test]
fn test_tls_config_defaults() {
let config = AppConfig {
server: ServerConfig {
port: 12443,
bind: "0.0.0.0".to_string(),
timeout_seconds: 30,
},
tls: Some(TlsConfig {
enabled: true,
port: 12443,
ca_cert: "/etc/linux_patch_api/certs/ca.pem".to_string(),
server_cert: "/etc/linux_patch_api/certs/server.pem".to_string(),
server_key: "/etc/linux_patch_api/certs/server.key".to_string(),
min_tls_version: "1.3".to_string(),
}),
jobs: JobsConfig {
max_concurrent: 5,
timeout_minutes: 30,
storage_path: "/var/lib/linux_patch_api/jobs".to_string(),
},
logging: LoggingConfig {
level: "info".to_string(),
journal_enabled: true,
syslog_enabled: false,
syslog_server: None,
file_path: "/var/log/linux_patch_api/audit.log".to_string(),
retention_days: 30,
},
whitelist: Some(WhitelistConfig {
path: "/etc/linux_patch_api/whitelist.yaml".to_string(),
}),
package_manager: None,
enrollment: None,
fn test_cert_status_missing_display() {
let status = CertStatus::Missing {
paths: vec![PathBuf::from("/etc/ssl/ca.pem")],
};
let display = format!("{}", status);
assert!(display.contains("Missing"));
assert!(display.contains("/etc/ssl/ca.pem"));
}
assert!(config.tls_config().is_some());
assert_eq!(config.tls_config().unwrap().min_tls_version, "1.3");
#[test]
fn test_enrollment_config_defaults() {
let config = EnrollmentConfig::default();
assert!(config.manager_url.is_none());
assert!(config.polling_token.is_empty());
assert_eq!(config.polling_interval_seconds, 60);
assert_eq!(config.max_poll_attempts, 1440);
assert_eq!(config.cert_renewal_threshold_days, 7);
}
#[test]
fn test_enrollment_config_with_url() {
let yaml = r#"
manager_url: "https://manager.example.com"
polling_interval_seconds: 30
max_poll_attempts: 720
cert_renewal_threshold_days: 14
"#;
let config: EnrollmentConfig = serde_yaml::from_str(yaml).unwrap();
assert_eq!(
config.whitelist_path(),
"/etc/linux_patch_api/whitelist.yaml"
config.manager_url,
Some("https://manager.example.com".to_string())
);
assert_eq!(config.polling_interval_seconds, 30);
assert_eq!(config.max_poll_attempts, 720);
assert_eq!(config.cert_renewal_threshold_days, 14);
}
#[test]
fn test_effective_manager_url() {
let mut config = EnrollmentConfig::default();
assert!(config.effective_manager_url().is_none());
config.manager_url = Some("https://manager.example.com".to_string());
assert_eq!(config.effective_manager_url(), Some("https://manager.example.com"));
config.manager_url = Some("".to_string());
assert!(config.effective_manager_url().is_none());
}
#[test]
fn test_migrate_empty_strings() {
let yaml = r#"
server:
port: 12443
bind: "0.0.0.0"
jobs:
max_concurrent: 5
timeout_minutes: 30
logging:
level: "info"
enrollment:
manager_url: ""
"#;
let config: AppConfig = serde_yaml::from_str(yaml).unwrap();
let migrated = config.migrate_empty_strings();
assert!(migrated.enrollment.unwrap().manager_url.is_none());
}
}

View File

@ -6,6 +6,6 @@
//! - Auto-reload on file change via notify watcher
pub mod loader;
pub use loader::EnrollmentConfig;
pub use loader::{AppConfig, CertStatus, EnrollmentConfig, validate_certs};
pub mod validator;
pub mod watcher;

View File

@ -272,6 +272,14 @@ impl EnrollmentClient {
Ok(enrollment_response)
}
409 => {
// Host already exists - log warning and return special response
// The caller should skip to polling phase with existing token
tracing::warn!(
"Host already registered with manager (HTTP 409) — will attempt to resume polling"
);
Err(anyhow!("ENROLLMENT_CONFLICT: Host already exists"))
}
429 => {
Err(anyhow!(
"Rate limited (HTTP 429) — enrollment requests limited to 1/minute per IP. Retry after 60 seconds."

View File

@ -3,6 +3,12 @@
//! Handles secure registration with the patch manager, including
//! identity extraction (machine-id, FQDN, IPs, OS details) and
//! mTLS enrollment via the manager API.
//!
//! Supports:
//! - Auto-enrollment on startup when certs are missing/invalid
//! - Manual enrollment via `--enroll <url>` CLI flag
//! - Resume polling from persisted token after restart
//! - HTTP 409 (host already exists) handling
pub mod client;
pub mod identity;
@ -20,17 +26,42 @@ pub use identity::{
get_primary_ip, get_route_source_ip, is_container_bridge, is_link_local,
};
/// Error type for enrollment conflict (HTTP 409).
/// Used to signal that the host is already registered and we should
/// skip to the polling phase.
#[derive(Debug)]
pub struct EnrollmentConflictError;
impl std::fmt::Display for EnrollmentConflictError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "Host already registered with manager")
}
}
impl std::error::Error for EnrollmentConflictError {}
/// Run the full enrollment flow against the manager at the given URL.
///
/// # Phases
/// 1. **Registration** - POST machine identity to manager, receive polling token
/// - If HTTP 409 (host already exists), skip to Phase 2 with existing token
/// 2. **Polling** - Poll manager for approval with configurable interval/max attempts
/// - If `polling_token` is already in config, skip Phase 1 and resume polling
/// 3. **Provisioning** - Write PKI bundle to disk (certs/keys) and append manager IP to whitelist
///
/// # Arguments
/// * `manager_url` - The manager API base URL
/// * `config` - Mutable reference to AppConfig for polling token persistence
/// * `config_path` - Path to config file for persisting polling token
///
/// # Errors
/// Returns Err on registration failure, polling timeout, denial, user interruption,
/// PKI provisioning failure, or whitelist update failure.
pub async fn run_enrollment(manager_url: &str, config: &super::AppConfig) -> Result<()> {
pub async fn run_enrollment(
manager_url: &str,
config: &mut super::AppConfig,
config_path: &str,
) -> Result<()> {
// Extract IP reporting overrides from enrollment config
let (report_interface, report_ip) = config
.enrollment
@ -40,13 +71,66 @@ pub async fn run_enrollment(manager_url: &str, config: &super::AppConfig) -> Res
let client = EnrollmentClient::with_ip_overrides(manager_url, report_interface, report_ip);
// Phase 1: Registration
tracing::info!(
manager_url = manager_url,
"Starting enrollment - registration phase"
);
let response = client.register().await?;
tracing::info!("Registration successful - received polling token");
// Check for existing polling token to resume
let polling_token = if let Some(ref enrollment) = config.enrollment {
if !enrollment.polling_token.is_empty() {
tracing::info!(
"Resuming enrollment polling from saved token (host already registered)"
);
enrollment.polling_token.clone()
} else {
// No saved token — need to register first
String::new()
}
} else {
String::new()
};
// Phase 1: Registration (skip if we have a saved polling token)
let polling_token = if polling_token.is_empty() {
tracing::info!(
manager_url = manager_url,
"Starting enrollment - registration phase"
);
match client.register().await {
Ok(response) => {
tracing::info!("Registration successful - received polling token");
response.polling_token
}
Err(e) => {
let err_str = e.to_string();
if err_str.contains("ENROLLMENT_CONFLICT") {
// HTTP 409 - host already exists
// We don't have a polling token, so we can't resume polling
// Log a warning and return an error — the user needs to
// re-enroll or the manager needs to provide a new token
tracing::warn!(
"Host already registered but no polling token saved. \
Cannot resume polling. Re-run enrollment or check manager status."
);
return Err(anyhow::anyhow!(
"Host already registered with manager but no polling token available for resume. \
Please check the manager for your host status or re-enroll."
));
}
// For other errors, propagate directly
return Err(e);
}
}
} else {
tracing::info!("Using saved polling token to resume enrollment");
polling_token
};
// Persist polling token for resume after restart
if let Err(e) = config.save_polling_token(&polling_token, config_path) {
tracing::warn!(
error = %e,
"Failed to persist polling token — enrollment will not resume after restart"
);
} else {
tracing::debug!("Polling token persisted to config");
}
// Get polling config (use defaults if not set)
let interval = config
@ -67,7 +151,7 @@ pub async fn run_enrollment(manager_url: &str, config: &super::AppConfig) -> Res
"Starting enrollment - polling phase"
);
let pki_bundle = client
.poll_for_approval(&response.polling_token, interval, max_attempts)
.poll_for_approval(&polling_token, interval, max_attempts)
.await?;
// Phase 3: PKI provisioning & whitelist update
@ -91,6 +175,16 @@ pub async fn run_enrollment(manager_url: &str, config: &super::AppConfig) -> Res
provision::append_manager_to_whitelist(&manager_ip, config.whitelist_path()).await?;
tracing::info!(manager_ip = %manager_ip, "Manager IP appended to whitelist");
// Clear polling token after successful provisioning
if let Err(e) = config.clear_polling_token(config_path) {
tracing::warn!(
error = %e,
"Failed to clear polling token from config — will attempt re-registration on next start"
);
} else {
tracing::debug!("Polling token cleared from config");
}
tracing::info!("Enrollment complete - PKI and whitelist configured");
Ok(())
}

View File

@ -12,17 +12,23 @@
//! - mTLS authentication required on port 12443
//! - IP whitelist enforced (deny by default)
//! - Detailed audit logging
//!
//! # Exit Codes
//!
//! - 0: Clean exit (no certs + no enrollment URL, or --enroll/--renew-certs success)
//! - 1: Error (config error, enrollment network failure, cert validation error)
//! - 2: Certs invalid, auto-enrollment in progress (triggers systemd restart with backoff)
use actix_web::middleware::Logger;
use actix_web::{web, App, HttpServer};
use anyhow::Result;
use clap::Parser;
use std::net::TcpListener;
use std::sync::Arc;
use tracing::{error, info, warn};
use linux_patch_api::api::{configure_api_routes, configure_health_route};
use linux_patch_api::auth::{mtls, MtlsMiddleware, WhitelistManager};
use linux_patch_api::config::loader::{validate_certs, CertStatus};
use linux_patch_api::enroll;
use linux_patch_api::packages::cache::PackageCacheState;
use linux_patch_api::packages::create_backend;
@ -42,12 +48,29 @@ struct Args {
#[arg(short, long)]
verbose: bool,
/// Enroll with manager at URL (skips mTLS startup, runs enrollment flow only)
/// Enroll with manager at URL (skips mTLS startup, runs enrollment flow only, then exits)
#[arg(
long,
help = "Enroll with manager at URL (skips mTLS startup, runs enrollment flow only)"
help = "Enroll with manager at URL (skips mTLS startup, runs enrollment flow only, then exits)"
)]
enroll: Option<String>,
/// Validate existing certs and re-enroll if expiring within threshold or invalid
#[arg(
long,
help = "Validate existing certs and re-enroll if expiring within threshold or invalid, then exits"
)]
renew_certs: bool,
}
/// Exit codes for the daemon
enum ExitCode {
/// Clean exit: no certs + no enrollment URL, or --enroll/--renew-certs success
Clean = 0,
/// Error: config error, enrollment network failure, cert validation error
Error = 1,
/// Certs invalid, auto-enrollment in progress (triggers systemd restart with backoff)
EnrollmentInProgress = 2,
}
#[actix_web::main]
@ -69,8 +92,9 @@ async fn main() -> Result<()> {
"Linux Patch API starting"
);
// Load configuration
let config = match AppConfig::load(&args.config, args.enroll.is_some()) {
// Load configuration (skip TLS validation during enrollment mode)
let skip_tls_validation = args.enroll.is_some();
let mut config = match AppConfig::load(&args.config, skip_tls_validation) {
Ok(cfg) => {
info!(
port = cfg.server.port,
@ -81,23 +105,142 @@ async fn main() -> Result<()> {
}
Err(e) => {
error!(error = %e, path = args.config, "Failed to load configuration");
return Err(anyhow::anyhow!("Configuration error: {}", e));
std::process::exit(ExitCode::Error as i32);
}
};
// Handle enrollment mode - runs before server startup
// Handle --renew-certs flag: validate certs and re-enroll if needed
if args.renew_certs {
info!("Certificate renewal mode activated - validating existing certificates");
match validate_certs(&config) {
Ok(CertStatus::Valid) => {
info!("Certificates are valid and not expiring soon. No renewal needed.");
std::process::exit(ExitCode::Clean as i32);
}
Ok(CertStatus::ExpiringSoon { not_after }) => {
info!(
not_after = %not_after,
"Certificates expiring soon - starting re-enrollment"
);
}
Ok(status) => {
info!(
status = %status,
"Certificates are {} - starting re-enrollment",
status
);
}
Err(e) => {
error!(error = %e, "Certificate validation failed");
std::process::exit(ExitCode::Error as i32);
}
}
// Need enrollment URL to re-enroll
let manager_url = match config.enrollment_manager_url() {
Some(url) => url.to_string(),
None => {
error!(
"Cannot re-enroll: enrollment.manager_url not configured. \
Add the manager URL to config.yaml or use --enroll <url>"
);
std::process::exit(ExitCode::Error as i32);
}
};
match enroll::run_enrollment(&manager_url, &mut config, &args.config).await {
Ok(()) => {
info!("Certificate renewal complete. Start service: systemctl start linux-patch-api");
std::process::exit(ExitCode::Clean as i32);
}
Err(e) => {
error!(error = %e, "Certificate renewal failed");
std::process::exit(ExitCode::Error as i32);
}
}
}
// Handle --enroll flag: run enrollment flow then EXIT
if let Some(ref manager_url) = args.enroll {
info!(
manager_url = manager_url,
"Enrollment mode activated - running enrollment flow before server startup"
"Enrollment mode activated - running enrollment flow"
);
match enroll::run_enrollment(manager_url, &config).await {
match enroll::run_enrollment(manager_url, &mut config, &args.config).await {
Ok(()) => {
info!("Enrollment complete - proceeding to server startup");
info!("Enrollment complete. Start service: systemctl start linux-patch-api");
std::process::exit(ExitCode::Clean as i32);
}
Err(e) => {
error!(error = %e, "Enrollment failed - shutting down");
return Err(anyhow::anyhow!("Enrollment failed: {}", e));
error!(error = %e, "Enrollment failed");
std::process::exit(ExitCode::Error as i32);
}
}
}
// Auto-enrollment on startup: validate certs before starting server
if config.tls_config().is_some() {
match validate_certs(&config) {
Ok(CertStatus::Valid) => {
info!("TLS certificates validated successfully");
}
Ok(CertStatus::ExpiringSoon { not_after }) => {
warn!(
not_after = %not_after,
"Certificates expiring soon - starting normally, consider re-enrollment"
);
// TODO: Schedule background re-enrollment in future phase
}
Ok(status @ CertStatus::Missing { .. })
| Ok(status @ CertStatus::Corrupt { .. })
| Ok(status @ CertStatus::Expired { .. })
| Ok(status @ CertStatus::KeyMismatch)
| Ok(status @ CertStatus::Untrusted) => {
// Certs are invalid - check if we can auto-enroll
// Clone the manager URL before mutable borrow of config
let manager_url_opt = config.enrollment_manager_url().map(|s| s.to_string());
match manager_url_opt {
Some(manager_url) => {
info!(
status = %status,
manager_url = manager_url,
"Certs {}. Auto-enrolling with {}",
status,
manager_url
);
match enroll::run_enrollment(&manager_url, &mut config, &args.config).await {
Ok(()) => {
info!("Auto-enrollment complete - continuing to server startup");
// Re-load config to pick up any changes from enrollment
config = AppConfig::load(&args.config, false)?;
}
Err(e) => {
error!(
error = %e,
"Auto-enrollment failed - will retry on next restart"
);
std::process::exit(ExitCode::EnrollmentInProgress as i32);
}
}
}
None => {
// No enrollment URL configured - exit cleanly to avoid crash loop
error!(
status = %status,
"Certs {}. No enrollment URL configured. \
To fix this, either:\n\
1. Add enrollment.manager_url to config.yaml and restart\n\
2. Run: linux-patch-api --enroll <manager_url>\n\
3. Place certificates manually in the configured paths",
status
);
std::process::exit(ExitCode::Clean as i32);
}
}
}
Err(e) => {
error!(error = %e, "Certificate validation error");
std::process::exit(ExitCode::Error as i32);
}
}
}
@ -153,9 +296,7 @@ async fn main() -> Result<()> {
// Configure bind address
let bind_address = format!("{}:{}", config.server.bind, config.server.port);
info!(bind = %bind_address, "Starting HTTP server");
// Create server
// Create server builder
let server_builder = HttpServer::new(move || {
let mut app = App::new()
@ -175,7 +316,6 @@ async fn main() -> Result<()> {
});
// Configure health route (outside API scope)
// cache_state and backend are available via app_data registered above
app = app.configure(configure_health_route);
app
@ -194,7 +334,6 @@ async fn main() -> Result<()> {
);
info!("Linux Patch API initialized successfully");
info!("Listening on {}", bind_address);
// Apply TLS/mTLS configuration if enabled
if let Some(tls_config) = config.tls_config() {
@ -222,11 +361,37 @@ async fn main() -> Result<()> {
info!("mTLS middleware and rustls config initialized successfully");
// Create TCP listener (std::net for listen_rustls_0_23)
let tcp_listener = TcpListener::bind(&bind_address)
.map_err(|e| anyhow::anyhow!("Failed to bind to {}: {}", bind_address, e))?;
// Create TCP listener with SO_REUSEADDR using socket2
// This prevents "Address already in use" errors when restarting after a crash
let socket = socket2::Socket::new(
socket2::Domain::IPV4,
socket2::Type::STREAM,
Some(socket2::Protocol::TCP),
)
.map_err(|e| anyhow::anyhow!("Failed to create socket: {}", e))?;
info!("TCP listener bound to {}", bind_address);
socket
.set_reuse_address(true)
.map_err(|e| anyhow::anyhow!("Failed to set SO_REUSEADDR: {}", e))?;
let bind_addr: std::net::SocketAddr = bind_address
.parse()
.map_err(|e| anyhow::anyhow!("Invalid bind address '{}': {}", bind_address, e))?;
socket
.bind(&socket2::SockAddr::from(bind_addr))
.map_err(|e| {
anyhow::anyhow!("Failed to bind socket to {}: {}", bind_address, e)
})?;
socket
.listen(128)
.map_err(|e| anyhow::anyhow!("Failed to listen on socket: {}", e))?;
let tcp_listener: std::net::TcpListener = socket.into();
// Log listening AFTER successful bind
info!("Listening on {} (mTLS enabled)", bind_address);
// Clone the ServerConfig from Arc for listen_rustls_0_23
let server_config = (*rustls_config).clone();
@ -245,8 +410,37 @@ async fn main() -> Result<()> {
}
}
} else {
// Create TCP listener with SO_REUSEADDR for non-TLS mode
let socket = socket2::Socket::new(
socket2::Domain::IPV4,
socket2::Type::STREAM,
Some(socket2::Protocol::TCP),
)
.map_err(|e| anyhow::anyhow!("Failed to create socket: {}", e))?;
socket
.set_reuse_address(true)
.map_err(|e| anyhow::anyhow!("Failed to set SO_REUSEADDR: {}", e))?;
let bind_addr: std::net::SocketAddr = bind_address
.parse()
.map_err(|e| anyhow::anyhow!("Invalid bind address '{}': {}", bind_address, e))?;
socket
.bind(&socket2::SockAddr::from(bind_addr))
.map_err(|e| anyhow::anyhow!("Failed to bind socket to {}: {}", bind_address, e))?;
socket
.listen(128)
.map_err(|e| anyhow::anyhow!("Failed to listen on socket: {}", e))?;
let tcp_listener: std::net::TcpListener = socket.into();
// Log listening AFTER successful bind
info!("Listening on {} (no TLS)", bind_address);
warn!("TLS is disabled - running without mTLS authentication (INSECURE)");
server_builder.bind(&bind_address)?.run().await?;
server_builder.listen(tcp_listener)?.run().await?;
}
info!("Linux Patch API shutting down");

View File

@ -0,0 +1,267 @@
# Root Cause Analysis: linux-patch-api Crash Loop
**Date:** 2026-05-28
**Affected Hosts:** sonarr, apt-cacher-ng, radarr-lxc, lidarr-lxc, deluge-lxc (all .moon-dragon.us)
**Symptom:** Agent crash loop with "Address already in use" on port 12443, causing flapping between healthy/unreachable on manager
---
## Executive Summary
The crash loop has **three distinct root causes**, not two as initially documented:
1. **Primary cause:** Package installation enables and starts the service **before certificates exist**, causing immediate crash on mTLS initialization.
2. **Secondary cause:** `TcpListener::bind` does NOT set `SO_REUSEADDR`, preventing rebinding when a port is in TIME_WAIT state.
3. **Tertiary cause (discovered during RCA):** The `--enroll` process binds to port 12443 after enrollment completes, blocking the systemd service from starting.
The result: hosts stuck in an **infinite crash loop** that cannot self-recover without manual intervention.
---
## Evidence Preserved
### radarr-lxc (still crash-looping, evidence intact)
| Metric | Value |
|--------|-------|
| NRestarts | 4,762+ (since May 20) |
| First crash | May 20 20:23:55 — `TLS CA certificate not found: /etc/linux_patch_api/certs/ca.pem` |
| Current error | `Failed to bind to 0.0.0.0:12443: Address already in use (os error 98)` |
| PID holding port 12443 | 1218 (`linux-patch-api --enroll`) started at 15:59:32 |
| PID 1218 parent | 1217 (`sudo linux-patch-api --enroll`) |
| PID 1218 state | S (sleeping), holding socket fd=16 on 0.0.0.0:12443 |
| Certs exist | Yes (valid May 28 2026 → May 28 2027) |
| systemd MainPID | 0 (not tracking the enrollment process) |
### lidarr-lxc (still crash-looping, evidence intact)
| Metric | Value |
|--------|-------|
| NRestarts | 4,822+ |
| PID holding port 12443 | 1207 (`linux-patch-api --enroll`) started at 15:42 |
| Same pattern as radarr-lxc |
### deluge-lxc (still crash-looping, evidence intact)
| Metric | Value |
|--------|-------|
| NRestarts | 4,494+ |
| PID holding port 12443 | 51035 (`linux-patch-api --enroll`) started at 15:11 |
| Same pattern as radarr-lxc |
### sonarr (evidence destroyed by fix)
Fixed before full investigation. NRestarts was 117,647+ over 8 days. Pattern inferred from partial logs.
### apt-cacher-ng (evidence destroyed by fix)
Fixed before full investigation. Pattern inferred from partial logs.
---
## Root Cause Analysis
### Cause 1: Package Postinst Starts Service Before Certs Exist
The `.deb`/`.pkg.tar.zst` package postinst script:
1. Installs the binary
2. Deploys example config files
3. Enables the systemd service (`systemctl enable`)
4. **Does NOT start the service** (comment: "admin should configure first")
5. **Does not check if TLS certificates exist**
6. **Does not run enrollment**
**Note:** The postinst correctly does NOT start the service. The service was started by a separate deployment step (likely during the v1.1.16→v1.1.17 upgrade or by a previous version's postinst that DID start the service).
The config file (`/etc/linux_patch_api/config.yaml`) references certs that don't exist yet:
```yaml
tls:
enabled: true
ca_cert: "/etc/linux_patch_api/certs/ca.pem"
server_cert: "/etc/linux_patch_api/certs/server.pem"
server_key: "/etc/linux_patch_api/certs/server.key"
```
The agent validates cert paths at config load time and exits with error if they don't exist. Since the service is enabled and `Restart=on-failure` is set, systemd triggers restart immediately.
**Evidence:** All three preserved hosts show the same first crash on May 20 with `TLS CA certificate not found`.
### Cause 2: Enrollment Process Port Conflict (NEW FINDING)
**This is the dominant cause on the currently crash-looping hosts.**
When `linux-patch-api --enroll <manager_url>` is run:
1. It registers with the manager and receives a polling token
2. It polls the manager for approval
3. After approval, it provisions certs
4. **It then falls through to normal server startup** (main.rs lines 88-100)
5. The enrollment process binds to port 12443 and starts serving requests
Meanwhile, the systemd service is also enabled and trying to restart:
1. systemd sees the service failed, waits `RestartSec=5s`
2. Tries to start a NEW `linux-patch-api` process
3. New process tries `TcpListener::bind("0.0.0.0:12443")`**"Address already in use"**
4. Process exits immediately, loop repeats every 5 seconds
**Key insight:** systemd's `MainPID=0` — it has LOST TRACK of the enrollment process because it was started outside systemd (via `sudo` from an SSH session). The enrollment process is an orphan holding the port.
**Evidence from radarr-lxc:**
```
PID 1218: linux-patch-api --enroll https://linux-patch-manager-dev.moon-dragon.us
State: S (sleeping)
FD 16: socket:[900840468] → LISTEN on 0.0.0.0:12443
Parent: PID 1217 (sudo) → PID 1216 (bash -c from SSH session)
systemd MainPID: 0 (not tracking this process)
```
**Source code confirmation** (main.rs lines 88-100):
```rust
if let Some(ref manager_url) = args.enroll {
info!("Enrollment mode activated - running enrollment flow before server startup");
match enroll::run_enrollment(manager_url, &config).await {
Ok(()) => {
info!("Enrollment complete - proceeding to server startup"); // ← Falls through to bind!
}
Err(e) => {
error!(error = %e, "Enrollment failed - shutting down");
return Err(anyhow::anyhow!("Enrollment failed: {}", e));
}
}
}
// ... continues to TcpListener::bind at line 226
```
### Cause 3: No SO_REUSEADDR on TcpListener::bind
Once the enrollment process eventually exits (or is killed), the port enters TIME_WAIT state for ~60 seconds. Without `SO_REUSEADDR`, the next systemd restart attempt within that window also fails with "Address already in use".
**Source code** (main.rs line 226):
```rust
let tcp_listener = TcpListener::bind(&bind_address)
.map_err(|e| anyhow::anyhow!("Failed to bind to {}: {}", bind_address, e))?;
```
`std::net::TcpListener::bind` does NOT set `SO_REUSEADDR`. The `socket2` crate is not a dependency.
### Cause 4: Misleading Log Messages
The log sequence is confusing because of premature logging in main.rs:
```
INFO linux_patch_api: Listening on 0.0.0.0:12443 ← Line 197 (logged BEFORE actual bind)
INFO linux_patch_api: Initializing mTLS authentication ← Line 206
INFO linux_patch_api: mTLS middleware initialized ← Line 223
Error: Failed to bind to 0.0.0.0:12443 ← Line 227 (actual bind attempt)
```
The "Listening" message at line 197 is emitted **before** the `TcpListener::bind` at line 226. This makes it look like the agent successfully bound and then tried to bind again.
---
## Recommended Fixes
### Fix 1: Enrollment Should NOT Fall Through to Server Startup
**Priority: CRITICAL** — This is the fix that prevents the enrollment port conflict.
In `src/main.rs`, after enrollment completes, the process should EXIT instead of falling through to server startup:
```rust
if let Some(ref manager_url) = args.enroll {
info!("Enrollment mode activated");
match enroll::run_enrollment(manager_url, &config).await {
Ok(()) => {
info!("Enrollment complete - start the service with: systemctl start linux-patch-api");
return Ok(()); // ← EXIT after enrollment, don't bind port
}
Err(e) => {
error!(error = %e, "Enrollment failed");
return Err(anyhow::anyhow!("Enrollment failed: {}", e));
}
}
}
```
### Fix 2: Add SO_REUSEADDR to TcpListener::bind
In `src/main.rs` line 226, replace:
```rust
let tcp_listener = TcpListener::bind(&bind_address)
.map_err(|e| anyhow::anyhow!("Failed to bind to {}: {}", bind_address, e))?;
```
With:
```rust
use socket2::{Socket, Domain, Type, Protocol};
let socket = Socket::new(Domain::IPV4, Type::STREAM, Some(Protocol::TCP))?;
socket.set_reuse_address(true)?;
socket.bind(&bind_address.parse()?)?;
socket.listen(128)?;
let tcp_listener: TcpListener = socket.into();
```
Add `socket2` to `Cargo.toml` dependencies.
### Fix 3: Postinst Should Check for Certs Before Enabling Service
The package postinst script should:
1. Check if TLS certs exist at the configured paths
2. If certs exist → enable and start the service
3. If certs don't exist → enable but DON'T start; print enrollment instructions
### Fix 4: Increase RestartSec and Add StartLimitBurst
In `configs/linux-patch-api.service`:
```ini
Restart=on-failure
RestartSec=10s
StartLimitIntervalSec=300
StartLimitBurst=5
```
This prevents the crash loop from spinning at 12 attempts/minute. After 5 failures in 300s, systemd stops retrying.
### Fix 5: Fix Misleading Log Message
Move the "Listening on" log (line 197) to AFTER the successful bind (after line 229):
```rust
// After TcpListener::bind succeeds:
info!("TCP listener bound to {}", bind_address);
info!("Listening on {}", bind_address); // Move here
```
---
## Immediate Actions Needed
Three hosts are still crash-looping with enrollment processes holding port 12443:
- **radarr-lxc** — PID 1218 holding port, NRestarts=4,762
- **lidarr-lxc** — PID 1207 holding port, NRestarts=4,822
- **deluge-lxc** — PID 51035 holding port, NRestarts=4,494
To fix each host:
1. Kill the enrollment process: `sudo kill <pid>`
2. Wait for port release
3. Start the service: `sudo systemctl start linux-patch-api`
**Awaiting Kelly's approval before fixing.**
---
## Lessons Learned
1. **Do NOT destroy evidence before completing RCA.** I fixed apt-cacher-ng before fully investigating the crash loop, destroying diagnostic evidence. Kelly had to point this out.
2. **Investigate first, fix second.** When doing RCA, preserve the crash-looping hosts and gather all evidence before applying fixes.
3. **The enrollment process port conflict was the dominant cause** on the currently-affected hosts, not TIME_WAIT. I initially misdiagnosed this because I destroyed the evidence too early.
---
## Confidence
Confidence: 95% (diagnosis)
- Evidence: Direct log analysis from 3 preserved hosts showing identical pattern
- Evidence: Source code review of main.rs showing enrollment fall-through to server startup
- Evidence: Process state showing enrollment PID holding port 12443 while systemd has MainPID=0
- Evidence: `ps aux` and `/proc` data confirming enrollment process is alive and bound
- Uncertainties: None significant — the evidence chain is complete and consistent
- Test Status: Partially tested — apt-cacher-ng and sonarr were fixed before full investigation; radarr/lidarr/deluge still crash-looping with preserved evidence

View File

@ -1,50 +1,39 @@
# Issue #2 Implementation Todo
# Auto-Enrollment Implementation Plan
**Spec:** tasks/issue-2-package-cache-refresh.md
**Version:** 1.1.17
**Status:** Complete - PR #3 Open
## Overview
Implement auto-enrollment workflow so the agent self-heals when certs are missing or invalid, instead of crash-looping.
---
## Spec Updates
- [x] Update SPEC.md: Self-Enrollment section, CLI arguments, startup behavior, cert validation, exit codes
- [x] Update DEPLOYMENT_GUIDE.md: Auto-enrollment deployment method, manual enrollment, config options
## Implementation Checklist
## Code Changes
- [x] src/config/loader.rs: Cert validation (CertStatus enum, validate_certs function)
- [x] src/config/loader.rs: EnrollmentConfig.manager_url changed to Option<String>
- [x] src/config/loader.rs: cert_renewal_threshold_days and polling_token fields added
- [x] src/config/loader.rs: save_polling_token() and clear_polling_token() methods
- [x] src/main.rs: Auto-enrollment path when certs invalid + URL configured
- [x] src/main.rs: --enroll exits after completion (no fall-through to server startup)
- [x] src/main.rs: --renew-certs flag for manual cert renewal
- [x] src/main.rs: SO_REUSEADDR on TcpListener::bind (socket2 crate)
- [x] src/main.rs: Move "Listening on" log after actual bind
- [x] src/main.rs: Exit code strategy (0=clean, 1=error, 2=enrollment in progress)
- [x] src/enroll/client.rs: HTTP 409 (Conflict) handling for host already exists
- [x] src/enroll/mod.rs: Polling token resume from persisted config
- [x] src/enroll/mod.rs: Handle ENROLLMENT_CONFLICT gracefully
- [x] configs/linux-patch-api.service: RestartSec=10s, StartLimitBurst=5, StartLimitIntervalSec=300
- [x] debian/postinst: Check for certs and enrollment URL, print guidance
- [x] 1. Create `src/packages/cache.rs` - Core cache types, stale detection, state persistence, 404 retry logic
- [x] 2. Add `mod cache;` to `src/packages/mod.rs`
- [x] 3. Implement `refresh_package_cache()` on AptBackend
- [x] 4. Implement `refresh_package_cache()` on DnfBackend
- [x] 5. Implement `refresh_package_cache()` on YumBackend
- [x] 6. Implement `refresh_package_cache()` on ApkBackend
- [x] 7. Implement `refresh_package_cache()` on PacmanBackend
- [x] 8. Implement `last_cache_update()` on all backends (shared state)
- [x] 9. Add `refresh_package_cache` and `last_cache_update` to PackageManagerBackend trait
- [x] 10. Enhance health check in `src/api/handlers/system.rs` - add cache status, trigger refresh
- [x] 11. Update HealthData struct with `last_cache_update` and `cache_status` fields
- [x] 12. Add pre-apply cache refresh in `src/api/handlers/patches.rs`
- [x] 13. Bump version in `Cargo.toml` to 1.1.17
- [x] 14. Update `ARCHITECTURE.md` with cache refresh flow
- [x] 15. Update `REQUIREMENTS.md` with FR-007
- [x] 16. Implement state file persistence (cache.json read/write)
- [x] 17. Write unit tests for cache module
- [x] 18. Build and verify compilation
- [x] 19. Commit and push to fix/package-cache-refresh branch
- [x] 20. Create PR and reference Issue #2
## Build & Test
- [x] cargo check passes
- [x] cargo test passes (107 unit + 7 e2e + 11 integration)
## Review
**PR:** https://gitea-lxc.moon-dragon.us/git-echo/linux_patch_api/pulls/3
**Branch:** fix/package-cache-refresh
**Commit:** cf3d597
**Files Changed:** 12 files, 944 insertions, 15 deletions
### Issue Resolution
All 4 requirements from Issue #2 addressed:
1. ✅ Pre-Upgrade Cache Refresh (MUST) - Mandatory cache refresh before every patch_apply
2. ✅ Regular Interval Cache Refresh (MUST) - Cache refresh triggered on health check when stale (>4h)
3. ✅ 404/Fetch Error Handling (SHOULD) - Auto-retry with cache refresh on fetch errors (1 retry)
4. ✅ Stale Cache Detection (SHOULD) - Tracks last_cache_update, reports in health response
### Known Issue
- SSH key `git_echo_id_ed25519` was rejected by Gitea on port 2222 - pushed via HTTPS + API token instead
- Root cause: Key fingerprint SHA256:W1BK9fCA53/or7iJkONbFSf3KJ6+oiAggPgisZNPhsc not registered in git-echo Gitea account
- Needs investigation: SSH key may need re-registration in Gitea
## Remaining
- [ ] Build release package
- [ ] Test auto-enrollment on a clean host
- [ ] Test --enroll exits without starting server
- [ ] Test --renew-certs flag
- [ ] Test cert validation (missing, corrupt, expired, key mismatch, untrusted)
- [ ] Test SO_REUSEADDR (restart after crash)
- [ ] Test systemd exit code behavior
- [ ] Deploy to linux-patch-manager-dev for integration testing