Private
Public Access
1
0

feat(pki): add CRL generation, distribution endpoint, and enrollment bundle extension (#26)
All checks were successful
CI Pipeline / Rust Format Check (push) Successful in 6s
CI Pipeline / Clippy Lints (push) Successful in 52s
CI Pipeline / Rust Unit Tests (push) Successful in 1m10s
CI Pipeline / Security Audit (push) Successful in 1m26s
CI Pipeline / Frontend Lint & Type Check (push) Successful in 15s
CI Pipeline / Build .deb & Release (push) Has been skipped

* feat(pki): add CRL generation, distribution endpoint, and enrollment bundle extension

Implements manager-side CRL infrastructure for issue #7:
- Add CertAuthority::generate_crl() using rcgen 0.13
- Add GET /api/v1/pki/crl.pem public endpoint
- Extend PkiBundle with ca_chain and crl_pem fields
- Update enrollment route to include CRL in bundle
- Mount pki route as public endpoint
- Add proptest dev-dependency

* style: fix cargo fmt in enrollment.rs

---------

Co-authored-by: Draco Lunaris <331325+Draco-Lunaris@users.noreply.github.com>
This commit is contained in:
Draco-Lunaris-Echo
2026-06-05 12:54:14 -05:00
committed by GitHub
parent 80ffb6b62f
commit 5aec9e629c
11 changed files with 871 additions and 4 deletions

404
tasks/issue-7-crl-design.md Normal file
View File

@ -0,0 +1,404 @@
# Issue #7: Certificate Revocation Enforcement — Full CRL Design
**GitHub Issue:** https://github.com/Draco-Lunaris/Linux-Patch-Manager/issues/7
**Companion issue (agent repo):** https://github.com/Draco-Lunaris/Linux-Patch-Api/issues/20
**Status:** Design finalized — implementation pending
**Repos affected:** linux-patch-manager (this), linux-patch-api (agent)
**Last updated:** 2026-06-05
---
## 1. Goal
Enforce certificate revocation at the mTLS handshake by having the manager (CA operator) publish a Certificate Revocation List (CRL) and the agent (linux-patch-api) consult it during TLS client certificate validation.
**Connection direction:** The manager (this repo) is the mTLS client. The agent (linux-patch-api) is the mTLS server. The manager connects TO the agent and presents a client cert. The agent validates it. Agent-to-manager connections occur only for enrollment.
---
## 2. Architecture
### 2.1 Components
```
┌──────────────────┐ ┌──────────────────────┐
│ pm-web │ │ linux-patch-api │
│ (manager) │ GET /pki/crl.pem │ (agent) │
│ │ ◄──────────────────────│ │
│ ┌────────────┐ │ on enrollment + │ ┌────────────────┐ │
│ │ pm-ca │ │ every 24h │ │ mTLS server │ │
│ │ (signs │ │ │ │ (validates │ │
│ │ certs + │ │ Bundle: CA chain + │ │ client certs │ │
│ │ CRLs) │ │ client cert + │ │ + CRL check) │ │
│ └────────────┘ │ client key + │ └────────────────┘ │
│ │ CRL │ │
└──────────────────┘ └──────────────────────┘
│ │
│ Health check (existing infra) │
│ + CRL age on agent side │
└───────────────────────────────────────────┘
```
### 2.2 Mermaid flow diagram
```mermaid
sequenceDiagram
participant Mgr as Manager (pm-web)
participant Agent as Agent (linux-patch-api)
participant CA as pm-ca
participant DB as certificates table
Note over Mgr,CA: Initial Enrollment
Agent->>Mgr: POST /api/v1/enroll (with CSR)
Mgr->>CA: issue cert (sign with CA key)
CA->>DB: INSERT certificate (status=active)
CA-->>Mgr: leaf cert
Mgr->>CA: generate_crl()
CA->>DB: SELECT serials WHERE status=revoked
CA-->>Mgr: signed CRL
Mgr-->>Agent: PKI bundle (CA chain + cert + key + CRL)
Agent->>Agent: persist all 4 to /etc/linux-patch-api/certs/
Agent->>Agent: verify CRL signature against pinned CA
Note over Agent: Background refresh (every 24h)
Agent->>Mgr: GET /api/v1/pki/crl.pem
Mgr->>CA: generate_crl() (cached or regenerate)
CA-->>Mgr: CRL
Mgr-->>Agent: CRL
Agent->>Agent: verify signature, persist, swap in-memory map
Note over Mgr,Agent: Normal operation (mTLS)
Mgr->>Agent: mTLS handshake (presents client cert)
Agent->>Agent: webpki verifies chain
Agent->>Agent: extract serial, check CRL
alt serial in CRL
Agent-->>Mgr: handshake rejected
else serial not in CRL
Agent-->>Mgr: handshake accepted
end
Note over Mgr,CA: Operator revokes a cert
Mgr->>CA: revoke_cert(serial)
CA->>DB: UPDATE status=revoked
CA->>CA: generate_crl() (regenerate)
Note over Agent: Next 24h refresh picks up the revocation
```
### 2.3 Sub-CA handling
**Both root and sub-CA modes are supported.**
- **Root mode:** Manager is a self-signed CA. The CA cert in the bundle is also the trust anchor. CRL signature chains directly to it.
- **Sub-CA mode:** Manager is a sub-CA under an external root. The enrollment bundle includes the full chain: external root + manager's intermediate cert. The agent pins both. CRL signature chains up to the external root.
**Required code change for sub-CA support:** Extend `PkiBundle` to include the full chain (new `ca_chain` field containing intermediate + root as a single PEM bundle). The existing single `ca_crt` field is preserved for backward compat (it becomes the leaf-most cert in the chain).
The external root's own CRL is **out of scope** for this design. Documented assumption: the external root is long-lived and trusted by the agent's system trust store, or the operator accepts the risk of a long-lived external root.
### 2.4 Cert lifetime
**No change to current 1-year lifetime.** Revocation lag of up to 24h is acceptable given the 1-year cert validity. Shortening to 90 days was considered and deferred (Phase 1+ works correctly with either lifetime).
---
## 3. Final Decisions (12 concerns walked through with Kelly)
| # | Concern | Decision |
|---|---------|----------|
| 1 | Sub-CA enrollment bundle chain | Extend `PkiBundle` to include full chain (intermediate + root) as new `ca_chain` field. Single `ca_crt` field preserved for backward compat. |
| 2 | CRL generation library | rcgen 0.13 on manager (sign). x509-parser on agent (parse). webpki for chain validation in custom verifier. No new system deps. |
| 3 | Custom ClientCertVerifier | Use rustls `danger::ClientCertVerifier` trait. Wrapper struct delegates chain validation to `WebPkiClientVerifier`, adds serial lookup against parsed CRL. Only ~80 lines of custom code. |
| 4 | Stale-CRL failure mode | (c) Degraded. Continue serving with stale CRL, log warning, health check reports degraded. Missing CRL = degraded. Invalid signature = refuse to start (fail-closed). |
| 5 | CRL size at scale | Not a concern. Max 2500 clients/manager. CRLs KB-range. No index on (status, not_after) needed. |
| 6 | Health check backward compat | Missing `crl_status` field from older agent = degraded (not unhealthy). New agent with missing > 24h after enrollment = unhealthy. UI: host details page + list icon + dashboard widget. |
| 7 | Test coverage | Layers 1-3 (unit + property + integration) required for ship. Layer 4 (E2E docker-compose) incremental. Layer 5 (fuzz) added now. Property-based tests with `proptest` added now. |
| 8 | Deployment order | 6 PRs sequential. No feature flag (disk state is the implicit flag). All-at-once rollout for PR 2 (agent). |
| 9 | Documentation | Full scope. New `docs/security/revocation.md` as top-level doc. Mermaid diagrams in markdown (GitHub renders natively). |
| 10 | Phasing risk | Low. Pre-production stage, no live users to disrupt. Bounded window between PR 1 and PR 2. |
| 11 | mTLS direction | Confirmed. Manager = client, agent = server. Agent-to-manager only for enrollment. |
| 12 | New host enrollment during CRL outage | Enrollment succeeds without CRL. Health check reports missing. Agent fetches CRL on next refresh cycle. |
---
## 4. Phased Implementation (6 PRs)
### PR 1 — Manager: CRL generation + endpoint + enrollment bundle
**Repo:** linux-patch-manager (this)
**Scope:**
- Extend `PkiBundle` to include full chain (new `ca_chain` field)
- Add `generate_crl()` to `pm-ca/src/ca.rs` using rcgen 0.13
- Add `GET /api/v1/pki/crl.pem` route in new `crates/pm-web/src/routes/pki.rs`
- Include CRL PEM in enrollment response
- Background task: regenerate CRL every 12h and on every `revoke_cert` call
- No DB schema changes
**Testing:**
- Unit tests for `generate_crl()` (revoked serials present, non-revoked absent, expired excluded)
- Property tests (proptest) for CRL generation roundtrip
- Fuzz harness for CRL generation
- Integration test: `GET /pki/crl.pem` returns 200 + valid PEM + correct `Cache-Control`
- Integration test: enrollment bundle includes CRL
**Backward compat:** Endpoint is dark until an agent is updated to consume it. Older agents ignore it. Zero impact on existing flows.
---
### PR 2 — Agent: CRL consumption + custom verifier
**Repo:** linux-patch-api
**Scope:**
- New `src/auth/crl.rs` module: CRL load, signature verification, in-memory serial map (ArcSwap)
- New `src/auth/crl_refresh.rs`: background task fetching CRL every 24h from `GET {manager_url}/api/v1/pki/crl.pem`
- Extend `src/auth/mtls.rs`: replace direct `WebPkiClientVerifier` usage with `CrlClientCertVerifier` wrapper
- Persist CRL to `/etc/linux-patch-api/certs/crl.pem`
- Config additions: `crl_path`, `crl_refresh_interval`, `manager_url`
**Custom verifier (sketch):**
```rust
pub struct CrlClientCertVerifier {
inner: Arc<dyn rustls::client::danger::ClientCertVerifier>,
crl: arc_swap::ArcSwap<Crl>,
}
impl rustls::client::danger::ClientCertVerifier for CrlClientCertVerifier {
fn verify_client_cert(
&self,
end_entity: &CertificateDer<'_>,
intermediates: &[CertificateDer<'_>],
now: UnixTime,
) -> Result<ClientCertVerified, rustls::Error> {
// Delegate chain validation to WebPKI (battle-tested)
self.inner.verify_client_cert(end_entity, intermediates, now)?;
// Extract serial from the leaf cert
let serial = extract_serial(end_entity)
.map_err(|e| rustls::Error::General(format!("serial extract: {}", e)))?;
// Check CRL (O(1) hash lookup)
let crl = self.crl.load();
if crl.is_revoked(serial) {
return Err(rustls::Error::General(format!(
"cert serial {} is revoked", serial
)));
}
Ok(ClientCertVerified::assertion())
}
// Delegate remaining trait methods to self.inner
fn supported_verify_schemes(&self) -> Vec<rustls::SignatureScheme> {
self.inner.supported_verify_schemes()
}
fn verify_tls12_signature(&self, ...) -> Result<...> {
self.inner.verify_tls12_signature(...)
}
fn verify_tls13_signature(&self, ...) -> Result<...> {
self.inner.verify_tls13_signature(...)
}
}
```
**Backward compat:** If CRL file is missing or fails signature verification, fall back to `WebPkiClientVerifier` directly (current behavior). Log warning. Health check from manager reports degraded.
**Testing:**
- Unit tests: CRL load (valid, malformed, missing, tampered, expired)
- Unit tests: custom verifier (valid cert accepted, revoked cert rejected, no false positives)
- Property tests (proptest): random certs + random CRLs, no false negs/pos
- Fuzz harness for CRL load and verifier
- Integration test: end-to-end mTLS (valid cert connects, revoked cert rejected)
- Integration test: stale CRL fallback to WebPKI (no connection rejection)
---
### PR 3 — Manager: Health check schema + UI
**Repo:** linux-patch-manager (this)
**Scope:**
- Extend health check response schema to include `crl_status` and `crl_age_seconds` fields (optional, backward compat)
- Add UI: CRL section in host details page
- Add hosts list icon (green/yellow/red) for CRL status
- Add dashboard widget: "hosts with degraded CRL: N"
**Backward compat:** Older agents don't report these fields → UI shows "CRL not configured". No regression.
---
### PR 4 — Agent: Health response includes CRL status
**Repo:** linux-patch-api
**Scope:**
- Add `crl_status` and `crl_age_seconds` to the agent's health response payload
- Logic: `valid` if CRL loaded + signature good + not expired, `expired` if `nextUpdate` passed, `missing` if no CRL on disk, `invalid` if signature fails
**Backward compat:** Field is additive. Manager treats missing field as "unknown" / "missing".
---
### PR 5 — Manager: Health aggregation logic
**Repo:** linux-patch-manager (this)
**Scope:**
- Aggregate per-host CRL health into the host's overall health
- Implement severity rules: invalid signature → unhealthy; missing > 24h on new agent → unhealthy; missing on old agent → degraded; > 25h old → degraded; otherwise healthy
- Add audit events: `CrlStaleDetected`, `CrlMissing`, `CrlInvalid`
**Backward compat:** Logic only fires when PR 3 + PR 4 are deployed. Safe to merge ahead of those.
---
### PR 6 — E2E integration test harness
**Repos:** both (new `tests/e2e/` directory in this repo, mirroring setup in agent repo)
**Scope:**
- docker-compose harness running both pm-web and linux-patch-api
- Test scenarios:
- Issue → enroll → connect (fresh agent connects successfully)
- Issue → enroll → revoke → refresh → connect (rejected)
- Issue → enroll → revoke → no refresh → connect (succeeds with stale CRL + warning)
- Manager down → connect (succeeds with stale CRL + degraded health)
- Independent CI for each repo; full E2E runs on main branch merges
**Backward compat:** Test-only, no production impact.
---
## 5. Failure Modes and Operational Behavior
### 5.1 Stale CRL on agent
**Scenario:** Agent's CRL has `nextUpdate` passed. Background refresh fails (manager unreachable).
**Behavior:**
- Agent continues serving mTLS connections using the stale CRL
- Logs warning every refresh attempt
- Reports `crl_status=expired` and `crl_age_seconds` in health response
- Manager's health aggregation marks host as `degraded`
- Worst case: ~24h of accepting a cert that was revoked after the agent's CRL was generated
- The cert's `not_after` is still the hard backstop (1 year from issuance)
### 5.2 Missing CRL on agent
**Scenario:** New agent enrolls, but CRL generation fails on the manager. Or older agent predates CRL feature.
**Behavior:**
- Agent starts with no CRL on disk
- Falls back to `WebPkiClientVerifier` (chain validation only, no CRL check)
- Logs warning, reports `crl_status=missing`
- Manager's health aggregation marks host as `degraded`
- If host is a newer agent: 24h after enrollment without CRL → escalates to `unhealthy`
- If host is an older agent: stays `degraded` indefinitely (feature gap, not a failure)
### 5.3 Invalid CRL signature on agent
**Scenario:** CRL file is corrupted, or the manager's CA key was compromised.
**Behavior:**
- Agent refuses to load the CRL
- **Refuses to start the mTLS server** (fail-closed here, because invalid signature is a security event)
- Logs critical error
- Reports `crl_status=invalid` in health response
- Operator must investigate: check manager's CA, re-fetch CRL manually, or restore from backup
### 5.4 Manager unreachable during enrollment
**Scenario:** New agent tries to enroll. Manager is down.
**Behavior:**
- Enrollment fails (manager is required for cert issuance)
- Agent retries on its configured enrollment schedule
- Once manager is back, enrollment succeeds, agent receives cert + CA + CRL (if available)
### 5.5 New host enrollment during CRL outage
**Scenario:** Manager is up, cert issuance works, but CRL generation fails (e.g., DB issue during `generate_crl`).
**Behavior:**
- Enrollment succeeds
- Agent receives cert + CA chain, but **no CRL** in the bundle
- Agent starts with no CRL, falls back to WebPKI
- Reports `crl_status=missing`
- Next 24h refresh attempts to fetch CRL from `/pki/crl.pem`
- If CRL generation is fixed by then, agent picks it up on next refresh
- If still failing, agent continues in degraded mode
---
## 6. Acceptance Criteria
### Phase 1 (Manager-side MVP)
- [ ] `generate_crl()` produces a valid X.509 CRL signed by the same CA key that signs leaf certs
- [ ] CRL includes only certs where `status='revoked' AND not_after > NOW()`
- [ ] `GET /api/v1/pki/crl.pem` returns 200 + valid PEM + `Cache-Control: max-age=3600`
- [ ] Enrollment PKI bundle includes the CRL
- [ ] Enrollment bundle includes the full CA chain (new `ca_chain` field)
- [ ] Background task regenerates CRL every 12h
- [ ] `revoke_cert` triggers immediate CRL regeneration
- [ ] Unit tests, property tests, fuzz harness, integration tests all pass
### Phase 2 (Agent-side consumption)
- [ ] Agent fetches CRL on enrollment from enrollment bundle
- [ ] Agent persists CRL to `/etc/linux-patch-api/certs/crl.pem`
- [ ] Agent verifies CRL signature against pinned CA on load
- [ ] Agent uses `CrlClientCertVerifier` wrapper that delegates to WebPKI + adds CRL check
- [ ] Revoked cert is rejected at mTLS handshake with clear error
- [ ] Valid (non-revoked) cert is accepted
- [ ] Background task refreshes CRL every 24h (configurable)
- [ ] Missing CRL falls back to WebPKI (degraded mode, not fail-closed)
- [ ] Invalid CRL signature causes agent to refuse to start
- [ ] Unit tests, property tests, fuzz harness, integration tests all pass
### Phase 3 (Health monitoring + UI)
- [ ] Health response includes `crl_status` and `crl_age_seconds`
- [ ] Host details page shows CRL section (status, age, next update, last refresh)
- [ ] Hosts list shows CRL status icon (green/yellow/red)
- [ ] Dashboard widget shows count of hosts with degraded CRL
- [ ] Health aggregation: invalid signature → unhealthy
- [ ] Health aggregation: new agent missing > 24h → unhealthy
- [ ] Health aggregation: old agent missing → degraded
- [ ] Health aggregation: > 25h old → degraded
- [ ] Audit events: `CertRevoked`, `CrlGenerated`, `CrlFetched`, `CrlStaleDetected`, `CrlMissing`, `CrlInvalid`
### Phase 4 (E2E tests)
- [ ] docker-compose harness runs both pm-web and linux-patch-api
- [ ] E2E test: issue → enroll → connect (succeeds)
- [ ] E2E test: issue → enroll → revoke → refresh → connect (rejected)
- [ ] E2E test: issue → enroll → revoke → no refresh → connect (succeeds with stale CRL)
- [ ] E2E test: manager down → connect (succeeds with stale CRL, degraded health)
### Documentation
- [ ] `docs/security/revocation.md` (NEW) — revocation policy and operational behavior
- [ ] `docs/architecture/pki.md` updated with CRL section + sub-CA section
- [ ] `docs/architecture/health-monitoring.md` updated with CRL health states
- [ ] `docs/architecture/agent-cert-flow.md` (NEW) — end-to-end flow with mermaid diagram
- [ ] `docs/api/REST_API.md` (or equivalent) updated with new endpoint
- [ ] `docs/operations/upgrade-guide.md` updated with rollout notes
- [ ] `docs/operations/crl-troubleshooting.md` (NEW) — common issues and diagnostics
- [ ] Inline code docs on all new public functions/structs
- [ ] `CHANGELOG.md` entry for the release that lands Phase 1
- [ ] `linux-patch-api/config.example.toml` updated with new CRL config keys
- [ ] `linux-patch-manager/config.example.toml` updated with new CRL config keys
---
## 7. Sign-off
**All 12 concerns resolved.** Design is finalized. Implementation can begin.
**Next action:** Start PR 1 (Manager: CRL generation + endpoint + enrollment bundle).
The companion issue on linux-patch-api (#20) is filed and tracks the agent-side changes for PR 2 and PR 4.
**Documented assumptions (must be confirmed before production deployment):**
1. The external root in sub-CA mode is long-lived and trusted. Its own CRL is not consulted.
2. 1-year cert lifetime is acceptable; revocation lag of up to 24h is the operational upper bound.
3. Operators accept that during a CRL refresh failure, revoked certs may be accepted for up to 24h (the cert's `not_after` is the hard backstop).
4. Max ~2500 clients per manager. If this changes, revisit CRL size and consider OCSP.

84
tasks/issue-7-pr1-todo.md Normal file
View File

@ -0,0 +1,84 @@
# PR 1: Manager-side CRL generation + endpoint + enrollment bundle
**Branch:** `feat/7-crl-manager-side`
**Target issue:** https://github.com/Draco-Lunaris/Linux-Patch-Manager/issues/7
## Pre-implementation
- [x] Read existing `pm-ca/src/ca.rs` to understand CA structure
- [x] Confirm rcgen 0.13 is the chosen library
- [x] Confirm sub-CA handling: extend `PkiBundle` with `ca_chain` field
- [x] Read design doc decisions table (concerns 1-12)
## Code changes
- [ ] **pm-ca/src/ca.rs**: Add `generate_crl(db: &PgPool) -> Result<String>` function
- Query `certificates` for `status='revoked' AND not_after > NOW()`
- Build CRL using rcgen 0.13 `CertificateRevocationList`
- Sign with CA private key
- Return PEM-encoded CRL
- [ ] **pm-core/src/models.rs**: Extend `PkiBundle` with `ca_chain: String` field
- Concatenated PEM bundle of full chain (intermediate + root) for sub-CA mode
- For root mode, contains just the root cert (same as ca_crt)
- [ ] **pm-web/src/routes/pki.rs** (NEW): `GET /api/v1/pki/crl.pem` route
- Public endpoint (no auth, CRLs are self-authenticating)
- `Cache-Control: max-age=3600`
- Returns latest cached CRL (regenerated on schedule or on revoke)
- [ ] **pm-web/src/routes/enrollment.rs**: Include CRL in enrollment response
- Fetch current CRL via `generate_crl()`
- Add `crl_pem: String` to response
- [ ] **pm-web/src/main.rs**: Wire up background CRL regeneration task
- Regenerate every 12 hours
- Hook into `revoke_cert` to trigger immediate regeneration
- Store latest CRL in shared state (ArcSwap or similar)
- [ ] **crates/pm-web/src/state.rs** (or similar): Shared state for cached CRL
## Tests
- [ ] **Unit tests** in `pm-ca/src/ca.rs`:
- `generate_crl` produces valid X.509 CRL signed by test CA
- Revoked serials appear in CRL
- Non-revoked serials do not appear
- Expired certs (not_after < now) are excluded
- Empty table produces CRL with zero revoked entries
- [ ] **Property tests** (proptest):
- Random revoked cert data: CRL is always parseable, signature always verifies
- Single-byte mutations to CRL fail signature verification
- [ ] **Fuzz harness** (cargo-fuzz):
- Target: `pm_ca::ca::generate_crl`
- Target: `pm_ca::ca::parse_crl` (if we add parsing)
- [ ] **Integration tests** in `pm-web/tests/`:
- `GET /pki/crl.pem` returns 200 + valid PEM + correct Cache-Control
- Enrollment bundle includes CRL
- Enrollment bundle includes `ca_chain` field
## Documentation
- [ ] **docs/security/revocation.md** (NEW): Revocation policy and operational behavior
- [ ] **docs/api/REST_API.md** (or equivalent): Document `GET /pki/crl.pem`
- [ ] **Inline doc comments** on new public functions/structs
- [ ] **CHANGELOG.md** entry for the release
## Pre-PR checklist
- [ ] `cargo build` clean
- [ ] `cargo test` all pass
- [ ] `cargo clippy --all-targets --all-features -- -D warnings` clean
- [ ] `cargo fmt --check` clean
- [ ] CI on GitHub passes
## Out of scope for PR 1 (deferred to later PRs)
- Agent-side consumption (PR 2, in linux-patch-api repo)
- Health check schema additions (PR 3)
- Agent health response field (PR 4)
- Health aggregation logic (PR 5)
- E2E test harness (PR 6)