M5: Patch Deployment & Job Management

Backend:
- migrations/003_jobs_scheduling.sql: retry_next_at/last_error columns, pg_notify trigger for immediate job dispatch, retry index
- pm-agent-client: ApplyPatchesRequest/Response, AgentJobStatus, RollbackResponse types; apply_patches/job_status/rollback_job client methods + generic POST helper
- pm-core/models: JobStatus, JobKind, PatchJob, PatchJobHost, CreateJobRequest, PatchJobSummary
- pm-web/routes/jobs.rs: POST/GET /api/v1/jobs, GET /jobs/:id, POST /jobs/:id/cancel, POST /jobs/:id/rollback
- pm-worker/job_executor.rs: NOTIFY listener, periodic scanner, execute_host_job, poll_running_jobs, handle_host_failure (3-retry exponential backoff 1m/5m/30m), sync_job_status, retry_pending_jobs
- pm-worker/main.rs: spawn job_executor

Frontend:
- types/index.ts: PatchInfo, PatchJobHost, PatchJob, PatchJobSummary, CreateJobRequest interfaces
- api/client.ts: jobsApi (list/get/create/cancel/rollback), patchesApi (getHostPatches)
- pages/PatchDeploymentPage.tsx: 3-step MUI Stepper (host select → configure → result)
- pages/JobsPage.tsx: job list table, expandable per-host detail, cancel/rollback actions with confirm dialog, load-more pagination
- App.tsx: /jobs and /deployment routes wired to real pages

cargo check: 0 errors | vite build: 0 errors
@@ -116,32 +116,32 @@ Each milestone produces a **testable vertical slice** — backend + frontend + d
 ### M4: Agent Communication Layer + Dashboard
 **Goal:** mTLS client works, health/patch polling operational, dashboard shows fleet status.
 
-- [ ] Implement `pm-agent-client` — Rustls-based mTLS HTTP client with client certificate, TLS 1.3 only
-- [ ] Implement agent API calls: `GET /api/v1/health`, `GET /api/v1/system/info`, `GET /api/v1/packages`, `GET /api/v1/patches`
-- [ ] Implement worker health poller: 5-minute intervals, bounded concurrency (64 semaphore), update `host_health_data`
-- [ ] Implement worker patch data poller: 30-minute intervals, bounded concurrency, update `host_patch_data`
-- [ ] Implement on-demand refresh: `POST /api/v1/hosts/{id}/refresh` → `NOTIFY refresh_requested` → worker queries immediately
-- [ ] Implement host health status tracking: healthy/degraded/unreachable with timestamps
-- [ ] Implement dashboard API: `GET /api/v1/status/fleet` (authenticated, fleet aggregates)
-- [ ] Frontend: Dashboard page — compliance %, health summary, pending patches, upcoming windows, root CA download icon
-- [ ] Frontend: Real-time health status indicators (green/yellow/red) on host lists
-- [ ] Verify: polling works, dashboard shows live fleet data, on-demand refresh works, visual alerts for unhealthy agents
+- [x] Implement `pm-agent-client` — Rustls-based mTLS HTTP client with client certificate, TLS 1.3 only
+- [x] Implement agent API calls: `GET /api/v1/health`, `GET /api/v1/system/info`, `GET /api/v1/packages`, `GET /api/v1/patches`
+- [x] Implement worker health poller: 5-minute intervals, bounded concurrency (64 semaphore), update `host_health_data`
+- [x] Implement worker patch data poller: 30-minute intervals, bounded concurrency, update `host_patch_data`
+- [x] Implement on-demand refresh: `POST /api/v1/hosts/{id}/refresh` → `NOTIFY refresh_requested` → worker queries immediately
+- [x] Implement host health status tracking: healthy/degraded/unreachable with timestamps
+- [x] Implement dashboard API: `GET /api/v1/status/fleet` (authenticated, fleet aggregates)
+- [x] Frontend: Dashboard page — compliance %, health summary, pending patches, upcoming windows, root CA download icon
+- [x] Frontend: Real-time health status indicators (green/yellow/red) on host lists
+- [x] Verify: polling works, dashboard shows live fleet data, on-demand refresh works, visual alerts for unhealthy agents
 
 ### M5: Patch Deployment & Job Management + Frontend Pages
 **Goal:** Full patch lifecycle — queue, immediate, retry, rollback, job monitoring.
 
-- [ ] Implement job creation: `POST /api/v1/jobs` (queue for window or apply now)
-- [ ] Implement `patch_jobs` and `patch_job_hosts` row creation
-- [ ] Implement `NOTIFY job_enqueued` for immediate-apply wake
-- [ ] Implement worker job executor: call agent `POST /api/v1/patches/apply`, track async job IDs
-- [ ] Implement worker retry engine: exponential backoff (1min, 5min, 30min), 3 retries max
-- [ ] Implement patch job auto-retry within maintenance window (1 retry)
-- [ ] Implement batch partial failure handling: auto-retry once, then report
-- [ ] Implement rollback: `POST /api/v1/jobs/{id}/rollback` → worker calls agent rollback endpoint
-- [ ] Implement job status tracking: poll agent `GET /api/v1/jobs/{id}` for running jobs
-- [ ] Implement job listing/detail API: `GET /api/v1/jobs`, `GET /api/v1/jobs/{id}`
-- [ ] Frontend: Patch Deployment page (select hosts → review patches → queue or apply now)
-- [ ] Frontend: Jobs page (job list, per-host status, rollback action)
+- [x] Implement job creation: `POST /api/v1/jobs` (queue for window or apply now)
+- [x] Implement `patch_jobs` and `patch_job_hosts` row creation
+- [x] Implement `NOTIFY job_enqueued` for immediate-apply wake
+- [x] Implement worker job executor: call agent `POST /api/v1/patches/apply`, track async job IDs
+- [x] Implement worker retry engine: exponential backoff (1min, 5min, 30min), 3 retries max
+- [x] Implement patch job auto-retry within maintenance window (1 retry)
+- [x] Implement batch partial failure handling: auto-retry once, then report
+- [x] Implement rollback: `POST /api/v1/jobs/{id}/rollback` → worker calls agent rollback endpoint
+- [x] Implement job status tracking: poll agent `GET /api/v1/jobs/{id}` for running jobs
+- [x] Implement job listing/detail API: `GET /api/v1/jobs`, `GET /api/v1/jobs/{id}`
+- [x] Frontend: Patch Deployment page (select hosts → review patches → queue or apply now)
+- [x] Frontend: Jobs page (job list, per-host status, rollback action)
 - [ ] Verify: queued job waits for window, immediate job runs now, retry logic works, rollback works, batch partial failures reported
 
 ### M6: Maintenance Windows & Scheduling + Frontend Page
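
Note on the M5 retry engine: the commit message and checklist give the schedule as 1 min, 5 min and 30 min with at most 3 retries, but the body of `handle_host_failure` is not shown here. The following is a minimal sketch of how such a fixed schedule could be looked up, not the actual pm-worker code. The names `RETRY_DELAYS` and `next_retry_delay` are hypothetical, and mapping the result onto the `retry_next_at` column is an assumption.

```rust
use std::time::Duration;

/// Retry delays as stated in the checklist: 1 min, 5 min, 30 min (3 retries max).
/// The constant name is hypothetical, not taken from pm-worker.
const RETRY_DELAYS: [Duration; 3] = [
    Duration::from_secs(60),
    Duration::from_secs(5 * 60),
    Duration::from_secs(30 * 60),
];

/// Hypothetical helper: `retries_used` is the number of retries already consumed
/// for a host. Returns the delay before the next retry, or `None` once all
/// three retries are used up and the host should be reported as failed.
fn next_retry_delay(retries_used: usize) -> Option<Duration> {
    RETRY_DELAYS.get(retries_used).copied()
}

fn main() {
    for retries_used in 0..=3 {
        match next_retry_delay(retries_used) {
            // A worker would presumably store now + delay as the next retry time.
            Some(delay) => println!(
                "retry {} scheduled in {}s",
                retries_used + 1,
                delay.as_secs()
            ),
            None => println!("retries exhausted after {} retries", retries_used),
        }
    }
}
```

Keeping the schedule in a single table means the retry cap and the delays cannot drift apart: the number of entries is the maximum number of retries.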