Issue #7: Certificate Revocation Enforcement — Full CRL Design
- GitHub Issue: https://github.com/Draco-Lunaris/Linux-Patch-Manager/issues/7
- Companion issue (agent repo): https://github.com/Draco-Lunaris/Linux-Patch-Api/issues/20
- Status: Design finalized, implementation pending
- Repos affected: linux-patch-manager (this), linux-patch-api (agent)
- Last updated: 2026-06-05
1. Goal
Enforce certificate revocation at the mTLS handshake by having the manager (CA operator) publish a Certificate Revocation List (CRL) and the agent (linux-patch-api) consult it during TLS client certificate validation.
Connection direction: The manager (this repo) is the mTLS client. The agent (linux-patch-api) is the mTLS server. The manager connects TO the agent and presents a client cert. The agent validates it. Agent-to-manager connections occur only for enrollment.
2. Architecture
2.1 Components
    ┌──────────────────┐                        ┌──────────────────────┐
    │  pm-web          │                        │  linux-patch-api     │
    │  (manager)       │  GET /pki/crl.pem      │  (agent)             │
    │                  │ ◄──────────────────────│                      │
    │  ┌────────────┐  │  on enrollment +       │  ┌────────────────┐  │
    │  │ pm-ca      │  │  every 24h             │  │ mTLS server    │  │
    │  │ (signs     │  │                        │  │ (validates     │  │
    │  │ certs +    │  │  Bundle: CA chain +    │  │ client certs   │  │
    │  │ CRLs)      │  │  client cert +         │  │ + CRL check)   │  │
    │  └────────────┘  │  client key +          │  └────────────────┘  │
    │                  │  CRL                   │                      │
    └──────────────────┘                        └──────────────────────┘
             │                                           │
             │  Health check (existing infra)            │
             │  + CRL age on agent side                  │
             └───────────────────────────────────────────┘
2.2 Mermaid flow diagram
sequenceDiagram
participant Mgr as Manager (pm-web)
participant Agent as Agent (linux-patch-api)
participant CA as pm-ca
participant DB as certificates table
Note over Mgr,CA: Initial Enrollment
Agent->>Mgr: POST /api/v1/enroll (with CSR)
Mgr->>CA: issue cert (sign with CA key)
CA->>DB: INSERT certificate (status=active)
CA-->>Mgr: leaf cert
Mgr->>CA: generate_crl()
CA->>DB: SELECT serials WHERE status=revoked
CA-->>Mgr: signed CRL
Mgr-->>Agent: PKI bundle (CA chain + cert + key + CRL)
Agent->>Agent: persist all 4 to /etc/linux-patch-api/certs/
Agent->>Agent: verify CRL signature against pinned CA
Note over Agent: Background refresh (every 24h)
Agent->>Mgr: GET /api/v1/pki/crl.pem
Mgr->>CA: generate_crl() (cached or regenerate)
CA-->>Mgr: CRL
Mgr-->>Agent: CRL
Agent->>Agent: verify signature, persist, swap in-memory map
Note over Mgr,Agent: Normal operation (mTLS)
Mgr->>Agent: mTLS handshake (presents client cert)
Agent->>Agent: webpki verifies chain
Agent->>Agent: extract serial, check CRL
alt serial in CRL
Agent-->>Mgr: handshake rejected
else serial not in CRL
Agent-->>Mgr: handshake accepted
end
Note over Mgr,CA: Operator revokes a cert
Mgr->>CA: revoke_cert(serial)
CA->>DB: UPDATE status=revoked
CA->>CA: generate_crl() (regenerate)
Note over Agent: Next 24h refresh picks up the revocation
2.3 Sub-CA handling
Both root and sub-CA modes are supported.
- Root mode: Manager is a self-signed CA. The CA cert in the bundle is also the trust anchor. CRL signature chains directly to it.
- Sub-CA mode: Manager is a sub-CA under an external root. The enrollment bundle includes the full chain: external root + manager's intermediate cert. The agent pins both. CRL signature chains up to the external root.
Required code change for sub-CA support: Extend PkiBundle with a new ca_chain field that carries the full chain (intermediate + root) as a single PEM bundle. The existing single ca_crt field is preserved for backward compat and holds the issuing CA cert, i.e. the chain entry closest to the leaf.
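As a rough illustration of the bundle extension, the sketch below models an agent choosing its trust anchors from either the new chain field or the legacy single-cert field. Only `ca_crt` and `ca_chain` are named in this design; `client_crt`, `client_key`, `crl_pem`, the `Option` wrappers, and both helper methods are assumptions made for the example.

```rust
/// Hypothetical shape of the extended enrollment bundle. Only `ca_crt` and
/// `ca_chain` are named in the design; the other field names are assumed.
#[derive(Debug, Clone)]
pub struct PkiBundle {
    /// Existing field: the issuing CA certificate (PEM).
    pub ca_crt: String,
    /// New field: intermediate + root as one PEM bundle. `None` models a
    /// bundle produced by a manager that predates this change.
    pub ca_chain: Option<String>,
    /// Assumed name: the issued leaf certificate (PEM).
    pub client_crt: String,
    /// Assumed name: the private key for the leaf certificate (PEM).
    pub client_key: String,
    /// Assumed name: the CRL (PEM). `None` when no CRL was available.
    pub crl_pem: Option<String>,
}

impl PkiBundle {
    /// PEM text to pin as trust material: the full chain when present,
    /// otherwise the single legacy CA certificate.
    pub fn trust_anchors_pem(&self) -> &str {
        self.ca_chain.as_deref().unwrap_or(self.ca_crt.as_str())
    }

    /// Counts the PEM certificate blocks in the pinned trust material.
    pub fn trust_anchor_count(&self) -> usize {
        self.trust_anchors_pem()
            .matches("-----BEGIN CERTIFICATE-----")
            .count()
    }
}
```

In root mode this would yield one pinned certificate, in sub-CA mode two (intermediate and external root).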
The external root's own CRL is out of scope for this design. Documented assumption: the external root is long-lived and trusted by the agent's system trust store, or the operator accepts the risk of a long-lived external root.
2.4 Cert lifetime
No change to current 1-year lifetime. Revocation lag of up to 24h is acceptable given the 1-year cert validity. Shortening to 90 days was considered and deferred (Phase 1+ works correctly with either lifetime).
3. Final Decisions (12 concerns walked through with Kelly)
| # | Concern | Decision |
|---|---|---|
| 1 | Sub-CA enrollment bundle chain | Extend PkiBundle to include full chain (intermediate + root) as new ca_chain field. Single ca_crt field preserved for backward compat. |
| 2 | CRL generation library | rcgen 0.13 on manager (sign). x509-parser on agent (parse). webpki for chain validation in custom verifier. No new system deps. |
| 3 | Custom ClientCertVerifier | Use rustls danger::ClientCertVerifier trait. Wrapper struct delegates chain validation to WebPkiClientVerifier, adds serial lookup against parsed CRL. Only ~80 lines of custom code. |
| 4 | Stale-CRL failure mode | (c) Degraded. Continue serving with stale CRL, log warning, health check reports degraded. Missing CRL = degraded. Invalid signature = refuse to start (fail-closed). |
| 5 | CRL size at scale | Not a concern. Max 2500 clients/manager. CRLs KB-range. No index on (status, not_after) needed. |
| 6 | Health check backward compat | Missing crl_status field from older agent = degraded (not unhealthy). New agent with missing > 24h after enrollment = unhealthy. UI: host details page + list icon + dashboard widget. |
| 7 | Test coverage | Layers 1-3 (unit + property + integration) required for ship. Layer 4 (E2E docker-compose) incremental. Layer 5 (fuzz) added now. Property-based tests with proptest added now. |
| 8 | Deployment order | 6 PRs sequential. No feature flag (disk state is the implicit flag). All-at-once rollout for PR 2 (agent). |
| 9 | Documentation | Full scope. New docs/security/revocation.md as top-level doc. Mermaid diagrams in markdown (GitHub renders natively). |
| 10 | Phasing risk | Low. Pre-production stage, no live users to disrupt. Bounded window between PR 1 and PR 2. |
| 11 | mTLS direction | Confirmed. Manager = client, agent = server. Agent-to-manager only for enrollment. |
| 12 | New host enrollment during CRL outage | Enrollment succeeds without CRL. Health check reports missing. Agent fetches CRL on next refresh cycle. |
4. Phased Implementation (6 PRs)
PR 1 — Manager: CRL generation + endpoint + enrollment bundle
Repo: linux-patch-manager (this)

Scope:
- Extend `PkiBundle` to include full chain (new `ca_chain` field)
- Add `generate_crl()` to `pm-ca/src/ca.rs` using rcgen 0.13
- Add `GET /api/v1/pki/crl.pem` route in new `crates/pm-web/src/routes/pki.rs`
- Include CRL PEM in enrollment response
- Background task: regenerate CRL every 12h and on every `revoke_cert` call
- No DB schema changes
Testing:
- Unit tests for `generate_crl()` (revoked serials present, non-revoked absent, expired excluded)
- Property tests (proptest) for CRL generation roundtrip
- Fuzz harness for CRL generation
- Integration test: `GET /pki/crl.pem` returns 200 + valid PEM + correct `Cache-Control`
- Integration test: enrollment bundle includes CRL
Backward compat: Endpoint is dark until an agent is updated to consume it. Older agents ignore it. Zero impact on existing flows.
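The entry-selection rule that the unit tests exercise (revoked present, non-revoked absent, expired excluded) can be sketched without any PKI library. The `CertRecord` and `CertStatus` types and the `crl_serials` function are hypothetical stand-ins for the certificates table and the query inside `generate_crl()`; the real code would hand the resulting serials to rcgen for signing.

```rust
/// Lifecycle state of a certificates-table row (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CertStatus {
    Active,
    Revoked,
}

/// Minimal stand-in for a certificates-table row.
#[derive(Debug, Clone)]
pub struct CertRecord {
    pub serial: String,
    pub status: CertStatus,
    /// Expiry as seconds since the Unix epoch.
    pub not_after: u64,
}

/// Returns the serials that belong in the CRL: revoked and not yet expired.
/// Mirrors `status = 'revoked' AND not_after > NOW()`.
pub fn crl_serials(records: &[CertRecord], now: u64) -> Vec<String> {
    let mut serials: Vec<String> = records
        .iter()
        .filter(|r| r.status == CertStatus::Revoked && r.not_after > now)
        .map(|r| r.serial.clone())
        .collect();
    // Sort and dedup so repeated generations produce identical entry lists.
    serials.sort();
    serials.dedup();
    serials
}
```

Expired certificates are dropped because the agent's chain validation already rejects them, which keeps the CRL from growing without bound.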
PR 2 — Agent: CRL consumption + custom verifier
Repo: linux-patch-api

Scope:
- New `src/auth/crl.rs` module: CRL load, signature verification, in-memory serial map (ArcSwap)
- New `src/auth/crl_refresh.rs`: background task fetching CRL every 24h from `GET {manager_url}/api/v1/pki/crl.pem`
- Extend `src/auth/mtls.rs`: replace direct `WebPkiClientVerifier` usage with `CrlClientCertVerifier` wrapper
- Persist CRL to `/etc/linux-patch-api/certs/crl.pem`
- Config additions: `crl_path`, `crl_refresh_interval`, `manager_url`
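A minimal sketch of how the three config additions might fit together on the agent. The key names, the persisted path, the 24h interval, and the URL template come from this design; the `CrlConfig` struct, its field types, the empty default for `manager_url`, and the `crl_url` helper are assumptions.

```rust
use std::time::Duration;

/// Hypothetical agent-side CRL settings. The three keys come from the
/// design; the types and defaults are assumptions.
#[derive(Debug, Clone)]
pub struct CrlConfig {
    pub crl_path: String,
    pub crl_refresh_interval: Duration,
    pub manager_url: String,
}

impl Default for CrlConfig {
    fn default() -> Self {
        Self {
            crl_path: "/etc/linux-patch-api/certs/crl.pem".to_string(),
            crl_refresh_interval: Duration::from_secs(24 * 60 * 60),
            // No sensible default exists; the operator must set this.
            manager_url: String::new(),
        }
    }
}

impl CrlConfig {
    /// Builds `{manager_url}/api/v1/pki/crl.pem`, tolerating a trailing slash.
    pub fn crl_url(&self) -> String {
        format!(
            "{}/api/v1/pki/crl.pem",
            self.manager_url.trim_end_matches('/')
        )
    }
}
```

Trimming the trailing slash avoids a double slash in the request path when operators write the base URL either way.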
Custom verifier (sketch):
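    // Paths below match rustls 0.22+; `Crl` and `extract_serial` come from
    // the new src/auth/crl.rs module and are not shown here.
    use std::sync::Arc;

    use arc_swap::ArcSwap;
    use rustls::client::danger::HandshakeSignatureValid;
    use rustls::pki_types::{CertificateDer, UnixTime};
    use rustls::server::danger::{ClientCertVerified, ClientCertVerifier};
    use rustls::{DigitallySignedStruct, DistinguishedName, SignatureScheme};

    // ClientCertVerifier requires Debug, so `Crl` must implement it too.
    #[derive(Debug)]
    pub struct CrlClientCertVerifier {
        inner: Arc<dyn ClientCertVerifier>,
        crl: ArcSwap<Crl>,
    }

    impl ClientCertVerifier for CrlClientCertVerifier {
        // Required trait method: advertise the same CA hints as the inner verifier
        fn root_hint_subjects(&self) -> &[DistinguishedName] {
            self.inner.root_hint_subjects()
        }

        fn verify_client_cert(
            &self,
            end_entity: &CertificateDer<'_>,
            intermediates: &[CertificateDer<'_>],
            now: UnixTime,
        ) -> Result<ClientCertVerified, rustls::Error> {
            // Delegate chain validation to WebPKI (battle-tested)
            let verified = self.inner.verify_client_cert(end_entity, intermediates, now)?;
            // Extract serial from the leaf cert
            let serial = extract_serial(end_entity)
                .map_err(|e| rustls::Error::General(format!("serial extract: {}", e)))?;
            // Check CRL (O(1) hash lookup) against the current snapshot
            let crl = self.crl.load();
            if crl.is_revoked(&serial) {
                return Err(rustls::Error::General(format!(
                    "cert serial {} is revoked", serial
                )));
            }
            // Return the inner verifier's result instead of a fresh assertion
            Ok(verified)
        }

        // Delegate remaining trait methods to self.inner
        fn supported_verify_schemes(&self) -> Vec<SignatureScheme> {
            self.inner.supported_verify_schemes()
        }

        fn verify_tls12_signature(
            &self,
            message: &[u8],
            cert: &CertificateDer<'_>,
            dss: &DigitallySignedStruct,
        ) -> Result<HandshakeSignatureValid, rustls::Error> {
            self.inner.verify_tls12_signature(message, cert, dss)
        }

        fn verify_tls13_signature(
            &self,
            message: &[u8],
            cert: &CertificateDer<'_>,
            dss: &DigitallySignedStruct,
        ) -> Result<HandshakeSignatureValid, rustls::Error> {
            self.inner.verify_tls13_signature(message, cert, dss)
        }
    }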
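Backward compat: If the CRL file is missing, fall back to WebPkiClientVerifier directly (current behavior) and log a warning; the manager's health check reports degraded. A CRL that fails signature verification is handled differently: per decision 4 and section 5.3, the agent refuses to start rather than falling back.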
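The in-memory serial map and its swap-on-refresh behavior can be illustrated with the standard library alone. This sketch deliberately substitutes `RwLock<Arc<HashSet<..>>>` for the `ArcSwap` named in the design so that it has no dependencies; `RevocationStore` and its methods are hypothetical names, and serials are modeled as raw bytes.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Std-only stand-in for the ArcSwap-backed serial map: readers take a
/// cheap snapshot, the refresh task replaces the whole set at once.
#[derive(Debug, Default)]
pub struct RevocationStore {
    current: RwLock<Arc<HashSet<Vec<u8>>>>,
}

impl RevocationStore {
    /// Returns the set in effect right now. A poisoned lock is recovered
    /// because the stored value is always a complete set.
    pub fn snapshot(&self) -> Arc<HashSet<Vec<u8>>> {
        let guard = self.current.read().unwrap_or_else(|e| e.into_inner());
        Arc::clone(&*guard)
    }

    /// Serial lookup for the handshake path (hash lookup, O(1) on average).
    pub fn is_revoked(&self, serial: &[u8]) -> bool {
        self.snapshot().contains(serial)
    }

    /// Swaps in the serials of a freshly verified CRL. Readers that already
    /// hold a snapshot keep using the old set until they drop it.
    pub fn replace(&self, serials: HashSet<Vec<u8>>) {
        let mut guard = self.current.write().unwrap_or_else(|e| e.into_inner());
        *guard = Arc::new(serials);
    }
}
```

Replacing the whole set, rather than mutating it in place, means a handshake never observes a half-updated CRL.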
Testing:
- Unit tests: CRL load (valid, malformed, missing, tampered, expired)
- Unit tests: custom verifier (valid cert accepted, revoked cert rejected, no false positives)
- Property tests (proptest): random certs + random CRLs, no false negs/pos
- Fuzz harness for CRL load and verifier
- Integration test: end-to-end mTLS (valid cert connects, revoked cert rejected)
- Integration test: degraded modes (stale CRL is still enforced, missing CRL falls back to WebPKI; neither rejects a valid connection)
PR 3 — Manager: Health check schema + UI
Repo: linux-patch-manager (this)

Scope:
- Extend health check response schema to include `crl_status` and `crl_age_seconds` fields (optional, backward compat)
- Add UI: CRL section in host details page
- Add hosts list icon (green/yellow/red) for CRL status
- Add dashboard widget: "hosts with degraded CRL: N"
Backward compat: Older agents don't report these fields → UI shows "CRL not configured". No regression.
PR 4 — Agent: Health response includes CRL status
Repo: linux-patch-api

Scope:
- Add `crl_status` and `crl_age_seconds` to the agent's health response payload
- Logic: `valid` if CRL loaded + signature good + not expired, `expired` if `nextUpdate` passed, `missing` if no CRL on disk, `invalid` if signature fails
Backward compat: Field is additive. Manager treats missing field as "unknown" / "missing".
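The status logic above amounts to a small classification function. In this sketch, `LoadedCrl`, `classify_crl`, and `crl_age_seconds` are hypothetical names, and measuring age from the CRL's `thisUpdate` is an assumption, since the design does not say which timestamp `crl_age_seconds` is counted from.

```rust
/// Values reported in the `crl_status` health field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrlStatus {
    Valid,
    Expired,
    Missing,
    Invalid,
}

impl CrlStatus {
    /// Wire representation used in the health payload.
    pub fn as_str(self) -> &'static str {
        match self {
            CrlStatus::Valid => "valid",
            CrlStatus::Expired => "expired",
            CrlStatus::Missing => "missing",
            CrlStatus::Invalid => "invalid",
        }
    }
}

/// What the agent knows about the CRL on disk (hypothetical helper type).
/// Timestamps are seconds since the Unix epoch.
#[derive(Debug, Clone, Copy)]
pub struct LoadedCrl {
    pub signature_ok: bool,
    pub this_update: u64,
    pub next_update: u64,
}

/// `None` means no CRL file exists on disk. A bad signature takes priority
/// over expiry because it is the more serious condition.
pub fn classify_crl(crl: Option<LoadedCrl>, now: u64) -> CrlStatus {
    match crl {
        None => CrlStatus::Missing,
        Some(c) if !c.signature_ok => CrlStatus::Invalid,
        Some(c) if now > c.next_update => CrlStatus::Expired,
        Some(_) => CrlStatus::Valid,
    }
}

/// Age measured from `thisUpdate` (assumption), never negative.
pub fn crl_age_seconds(crl: &LoadedCrl, now: u64) -> u64 {
    now.saturating_sub(crl.this_update)
}
```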
PR 5 — Manager: Health aggregation logic
Repo: linux-patch-manager (this)

Scope:
- Aggregate per-host CRL health into the host's overall health
- Implement severity rules: invalid signature → unhealthy; missing > 24h on new agent → unhealthy; missing on old agent → degraded; > 25h old → degraded; otherwise healthy
- Add audit events: `CrlStaleDetected`, `CrlMissing`, `CrlInvalid`
Backward compat: Logic only fires when PR 3 + PR 4 are deployed. Safe to merge ahead of those.
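One way to read the severity rules is as an ordered match on the reported status. The sketch below is an interpretation, not the implementation: `CrlReport`, `Health`, and `crl_health` are hypothetical names, an absent `crl_status` field stands for an older agent, and treating an unrecognized status string as degraded is an assumption the design does not cover.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Degraded,
    Unhealthy,
}

/// CRL portion of an agent health report. `status` is `None` when the
/// agent predates the CRL feature and omits the field.
#[derive(Debug, Clone, Copy)]
pub struct CrlReport<'a> {
    pub status: Option<&'a str>,
    pub age_seconds: Option<u64>,
    pub seconds_since_enrollment: u64,
}

const MISSING_GRACE_SECS: u64 = 24 * 3600;
const MAX_AGE_SECS: u64 = 25 * 3600;

/// Applies the severity rules in priority order.
pub fn crl_health(report: &CrlReport<'_>) -> Health {
    match report.status {
        // Older agent: feature gap, not a failure.
        None => Health::Degraded,
        Some("invalid") => Health::Unhealthy,
        Some("missing") if report.seconds_since_enrollment > MISSING_GRACE_SECS => {
            Health::Unhealthy
        }
        Some("missing") | Some("expired") => Health::Degraded,
        Some("valid") if report.age_seconds.map_or(false, |age| age > MAX_AGE_SECS) => {
            Health::Degraded
        }
        Some("valid") => Health::Healthy,
        // Unknown value: assumed degraded rather than silently healthy.
        Some(_) => Health::Degraded,
    }
}
```

The 25h age threshold sits one hour above the 24h refresh interval, so a refresh that runs slightly late does not flip a host to degraded.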
PR 6 — E2E integration test harness
Repos: both (new tests/e2e/ directory in this repo, mirroring setup in agent repo)
Scope:
- docker-compose harness running both pm-web and linux-patch-api
- Test scenarios:
- Issue → enroll → connect (fresh agent connects successfully)
- Issue → enroll → revoke → refresh → connect (rejected)
- Issue → enroll → revoke → no refresh → connect (succeeds with stale CRL + warning)
- Manager down → connect (succeeds with stale CRL + degraded health)
- Independent CI for each repo; full E2E runs on main branch merges
Backward compat: Test-only, no production impact.
5. Failure Modes and Operational Behavior
5.1 Stale CRL on agent
Scenario: Agent's CRL has nextUpdate passed. Background refresh fails (manager unreachable).
Behavior:
- Agent continues serving mTLS connections using the stale CRL
- Logs warning every refresh attempt
- Reports `crl_status=expired` and `crl_age_seconds` in health response
- Manager's health aggregation marks host as `degraded`
- Worst case: ~24h of accepting a cert that was revoked after the agent's CRL was generated
- The cert's `not_after` is still the hard backstop (1 year from issuance)
5.2 Missing CRL on agent
Scenario: New agent enrolls, but CRL generation fails on the manager. Or older agent predates CRL feature.
Behavior:
- Agent starts with no CRL on disk
- Falls back to `WebPkiClientVerifier` (chain validation only, no CRL check)
- Logs warning, reports `crl_status=missing`
- Manager's health aggregation marks host as `degraded`
- If host is a newer agent: 24h after enrollment without CRL → escalates to `unhealthy`
- If host is an older agent: stays `degraded` indefinitely (feature gap, not a failure)
5.3 Invalid CRL signature on agent
Scenario: CRL file is corrupted, or the manager's CA key was compromised.
Behavior:
- Agent refuses to load the CRL
- Refuses to start the mTLS server (fail-closed here, because invalid signature is a security event)
- Logs critical error
- Reports `crl_status=invalid` in health response
- Operator must investigate: check manager's CA, re-fetch CRL manually, or restore from backup
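Sections 5.1 to 5.3 together describe a single startup decision: which verifier, if any, the agent serves with. The sketch below condenses that decision into one function; `CrlLoad`, `StartupAction`, and `startup_action` are hypothetical names introduced for illustration.

```rust
/// Result of loading the CRL from disk at agent startup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CrlLoad {
    /// Signature good, `nextUpdate` still in the future.
    Fresh,
    /// Signature good, `nextUpdate` already passed.
    Stale,
    /// No CRL file on disk.
    Missing,
    /// File present but the signature does not verify.
    InvalidSignature,
}

/// What the agent does with its mTLS server as a consequence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartupAction {
    /// Wrap WebPKI with the CRL check; `warn` is set for a stale CRL.
    ServeWithCrl { warn: bool },
    /// Chain validation only, with a logged warning (degraded mode).
    ServeChainOnly,
    /// Fail closed: an invalid signature is treated as a security event.
    RefuseToStart,
}

pub fn startup_action(load: CrlLoad) -> StartupAction {
    match load {
        CrlLoad::Fresh => StartupAction::ServeWithCrl { warn: false },
        CrlLoad::Stale => StartupAction::ServeWithCrl { warn: true },
        CrlLoad::Missing => StartupAction::ServeChainOnly,
        CrlLoad::InvalidSignature => StartupAction::RefuseToStart,
    }
}
```

Only the invalid-signature case stops the server; staleness and absence degrade health without rejecting connections.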
5.4 Manager unreachable during enrollment
Scenario: New agent tries to enroll. Manager is down.
Behavior:
- Enrollment fails (manager is required for cert issuance)
- Agent retries on its configured enrollment schedule
- Once manager is back, enrollment succeeds, agent receives cert + CA + CRL (if available)
5.5 New host enrollment during CRL outage
Scenario: Manager is up, cert issuance works, but CRL generation fails (e.g., DB issue during generate_crl).
Behavior:
- Enrollment succeeds
- Agent receives cert + CA chain, but no CRL in the bundle
- Agent starts with no CRL, falls back to WebPKI
- Reports `crl_status=missing`
- Next 24h refresh attempts to fetch CRL from `/pki/crl.pem`
- If CRL generation is fixed by then, agent picks it up on next refresh
- If still failing, agent continues in degraded mode
6. Acceptance Criteria
Phase 1 (Manager-side MVP)
- `generate_crl()` produces a valid X.509 CRL signed by the same CA key that signs leaf certs
- CRL includes only certs where `status='revoked' AND not_after > NOW()`
- `GET /api/v1/pki/crl.pem` returns 200 + valid PEM + `Cache-Control: max-age=3600`
- Enrollment PKI bundle includes the CRL
- Enrollment bundle includes the full CA chain (new `ca_chain` field)
- Background task regenerates CRL every 12h
- `revoke_cert` triggers immediate CRL regeneration
- Unit tests, property tests, fuzz harness, integration tests all pass
Phase 2 (Agent-side consumption)
- Agent fetches CRL on enrollment from enrollment bundle
- Agent persists CRL to `/etc/linux-patch-api/certs/crl.pem`
- Agent verifies CRL signature against pinned CA on load
- Agent uses `CrlClientCertVerifier` wrapper that delegates to WebPKI + adds CRL check
- Revoked cert is rejected at mTLS handshake with clear error
- Valid (non-revoked) cert is accepted
- Background task refreshes CRL every 24h (configurable)
- Missing CRL falls back to WebPKI (degraded mode, not fail-closed)
- Invalid CRL signature causes agent to refuse to start
- Unit tests, property tests, fuzz harness, integration tests all pass
Phase 3 (Health monitoring + UI)
- Health response includes `crl_status` and `crl_age_seconds`
- Host details page shows CRL section (status, age, next update, last refresh)
- Hosts list shows CRL status icon (green/yellow/red)
- Dashboard widget shows count of hosts with degraded CRL
- Health aggregation: invalid signature → unhealthy
- Health aggregation: new agent missing > 24h → unhealthy
- Health aggregation: old agent missing → degraded
- Health aggregation: > 25h old → degraded
- Audit events: `CertRevoked`, `CrlGenerated`, `CrlFetched`, `CrlStaleDetected`, `CrlMissing`, `CrlInvalid`
Phase 4 (E2E tests)
- docker-compose harness runs both pm-web and linux-patch-api
- E2E test: issue → enroll → connect (succeeds)
- E2E test: issue → enroll → revoke → refresh → connect (rejected)
- E2E test: issue → enroll → revoke → no refresh → connect (succeeds with stale CRL)
- E2E test: manager down → connect (succeeds with stale CRL, degraded health)
Documentation
- `docs/security/revocation.md` (NEW): revocation policy and operational behavior
- `docs/architecture/pki.md` updated with CRL section + sub-CA section
- `docs/architecture/health-monitoring.md` updated with CRL health states
- `docs/architecture/agent-cert-flow.md` (NEW): end-to-end flow with mermaid diagram
- `docs/api/REST_API.md` (or equivalent) updated with new endpoint
- `docs/operations/upgrade-guide.md` updated with rollout notes
- `docs/operations/crl-troubleshooting.md` (NEW): common issues and diagnostics
- Inline code docs on all new public functions/structs
- `CHANGELOG.md` entry for the release that lands Phase 1
- `linux-patch-api/config.example.toml` updated with new CRL config keys
- `linux-patch-manager/config.example.toml` updated with new CRL config keys
7. Sign-off
All 12 concerns resolved. Design is finalized. Implementation can begin.
Next action: Start PR 1 (Manager: CRL generation + endpoint + enrollment bundle).
The companion issue on linux-patch-api (#20) is filed and tracks the agent-side changes for PR 2 and PR 4.
Documented assumptions (must be confirmed before production deployment):
- The external root in sub-CA mode is long-lived and trusted. Its own CRL is not consulted.
- 1-year cert lifetime is acceptable; revocation lag of up to 24h is the operational upper bound.
- Operators accept that during a CRL refresh failure, revoked certs may be accepted for up to 24h (the cert's `not_after` is the hard backstop).
- Max ~2500 clients per manager. If this changes, revisit CRL size and consider OCSP.