linux_patch_manager/tasks/todo.md
Echo e3a27eb2ed
fix: add ALPN http/1.1 for WebSocket, polling fallback, and job-level WS events
- ws_relay.rs: Add ALPN protocol http/1.1 to rustls ClientConfig to prevent
  HTTP/2 negotiation which breaks WebSocket upgrades (Sec-WebSocket-Accept mismatch)
- ws_relay.rs: Add detailed TLS error chain logging for debugging connection failures
- ws_relay.rs: Add HTTP polling fallback when WebSocket connection fails, using
  AgentClient to poll /api/v1/jobs/{id} every ws_relay_poll_interval_secs
- config.rs: Add ws_relay_poll_interval_secs field (default: 10 seconds)
- config.example.toml: Add ws_relay_poll_interval_secs documentation
- jobs.rs: Fire pg_notify with event_type job on cancel
- job_executor.rs: Fire pg_notify with event_type job when parent job transitions
- ws_relay.rs: Add event_type field to NotifyPayload (host vs job events)
- Frontend: Add event_type, succeeded_count, failed_count, host_count to JobWsEvent
- Frontend: handleWsEvent distinguishes host vs job events for accurate status updates
2026-05-04 15:16:20 +00:00


# WebSocket + Polling Fallback Implementation Plan
## Problem
The linux-patch-api agent's `/api/v1/ws/jobs` endpoint is a stub that returns HTTP 101
with a JSON body but doesn't compute the required `Sec-WebSocket-Accept` header. This
causes the pm-worker WS relay to fail with "Key mismatch in Sec-WebSocket-Accept header".
Additionally, the pm-worker WS relay's rustls ClientConfig didn't set ALPN to http/1.1,
causing HTTP/2 negotiation which also breaks WebSocket upgrades.
## Root Causes
1. **Agent WS handler is a stub** — doesn't implement RFC 6455 WebSocket handshake
2. **WS relay missing ALPN** — rustls ClientConfig didn't set `alpn_protocols` to `http/1.1`
3. **No fallback** — WS relay has no fallback if WebSocket fails
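
For reference, the handshake value the stub fails to produce is defined by RFC 6455 as `base64(SHA-1(Sec-WebSocket-Key + GUID))`. The sketch below computes it with the standard library only; SHA-1 and base64 are hand-rolled here purely to keep the example dependency-free, and the function names are illustrative rather than taken from the linux-patch-api code. In the planned implementation `actix-web-actors` would perform this step.

```rust
/// GUID fixed by RFC 6455 for the opening handshake.
const WS_GUID: &str = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

/// Minimal SHA-1, written out only so the sketch has no dependencies.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            let j = 4 * i;
            w[i] = u32::from_be_bytes([chunk[j], chunk[j + 1], chunk[j + 2], chunk[j + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Standard base64 alphabet with `=` padding.
fn base64(data: &[u8]) -> String {
    const T: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in data.chunks(3) {
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = ((b[0] as u32) << 16) | ((b[1] as u32) << 8) | (b[2] as u32);
        out.push(T[((n >> 18) & 63) as usize] as char);
        out.push(T[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { T[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { T[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Value a server must return in `Sec-WebSocket-Accept` for a given client key.
pub fn sec_websocket_accept(key: &str) -> String {
    let mut input = key.trim().as_bytes().to_vec();
    input.extend_from_slice(WS_GUID.as_bytes());
    base64(&sha1(&input))
}
```

With the sample key from RFC 6455, `sec_websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. A client that receives a 101 response without this header, or with a different value, reports the key mismatch described above.
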
## Completed
- [x] ALPN fix in pm-worker ws_relay.rs (forces HTTP/1.1 for WebSocket)
- [x] Error chain logging in pm-worker ws_relay.rs (for future debugging)
- [x] Job-level WS event_type fix (frontend + backend)
## Remaining Tasks
### Phase 1: Implement proper WebSocket in linux-patch-api
- [ ] Replace stub `websocket_handler` in `src/api/handlers/websocket.rs` with proper actix-web-actors WebSocket
- [ ] Create `WsJobActor` that:
  - Accepts WebSocket connections via `actix_web_actors::ws::start()`
  - Subscribes to job status updates from `JobManager`
  - Streams job status events to connected clients
  - Handles subscribe/unsubscribe messages
- [ ] Wire up broadcast channel from JobManager to WebSocket actors
- [ ] Build and deploy to dev LXC
### Phase 2: Add polling fallback in pm-worker WS relay
- [ ] In `relay_one_job()`, if WebSocket connection fails, fall back to HTTP polling
- [ ] Use existing `AgentClient` (reqwest + mTLS) to poll `/api/v1/jobs/{id}`
- [ ] Poll interval: configurable via `ws_relay_poll_interval_secs`, default 10 seconds
- [ ] Convert polled job status to same event format as WebSocket messages
- [ ] Fire `pg_notify('job_update')` for polled status changes
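
The Phase 2 control flow can be sketched roughly as follows. This is a synchronous, standard-library-only illustration, not the pm-worker code: the real relay is presumably async, and the closure parameters stand in for the WebSocket client, `AgentClient`, the `pg_notify` call, and the sleep timer. The `JobStatus` variants and the `JobEvent` shape are assumptions.

```rust
use std::time::Duration;

/// Hypothetical job states; the real agent API may expose others.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JobStatus {
    Running,
    Succeeded,
    Failed,
    Cancelled,
}

impl JobStatus {
    fn is_terminal(&self) -> bool {
        !matches!(self, JobStatus::Running)
    }
}

/// Assumed common event shape for both the WebSocket path and the polling path.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JobEvent {
    pub job_id: String,
    pub status: JobStatus,
}

/// Polls until the job reaches a terminal state, emitting one event per status change.
pub fn poll_job_status<P, N, S>(
    job_id: &str,
    interval: Duration,
    mut poll: P,   // stand-in for AgentClient polling /api/v1/jobs/{id}
    mut notify: N, // stand-in for firing pg_notify
    mut sleep: S,  // injected so the loop can be tested without waiting
) -> Result<JobStatus, String>
where
    P: FnMut(&str) -> Result<JobStatus, String>,
    N: FnMut(JobEvent),
    S: FnMut(Duration),
{
    let mut last: Option<JobStatus> = None;
    loop {
        let status = poll(job_id)?;
        // Only notify when the status differs from the previous poll.
        if last.as_ref() != Some(&status) {
            notify(JobEvent { job_id: job_id.to_string(), status: status.clone() });
            last = Some(status.clone());
        }
        if status.is_terminal() {
            return Ok(status);
        }
        sleep(interval);
    }
}

/// Tries the WebSocket path first and falls back to polling if it fails.
pub fn relay_one_job<W, P, N, S>(
    job_id: &str,
    interval: Duration,
    connect_ws: W,
    poll: P,
    notify: N,
    sleep: S,
) -> Result<JobStatus, String>
where
    W: FnOnce(&str) -> Result<JobStatus, String>,
    P: FnMut(&str) -> Result<JobStatus, String>,
    N: FnMut(JobEvent),
    S: FnMut(Duration),
{
    match connect_ws(job_id) {
        Ok(status) => Ok(status),
        Err(ws_err) => {
            eprintln!("WebSocket relay failed for job {}: {}; falling back to polling", job_id, ws_err);
            poll_job_status(job_id, interval, poll, notify, sleep)
        }
    }
}
```

Emitting an event only when the status changes keeps the polled path from flooding listeners with identical updates every interval, so downstream consumers see roughly the same sequence they would get over WebSocket.
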
### Phase 3: Testing & Deployment
- [ ] Test WebSocket connection on dev LXC
- [ ] Test polling fallback on dev LXC
- [ ] Verify job completion status updates in UI
- [ ] Push to Gitea
- [ ] Update dev LXC deployment
## Architecture Notes
### linux-patch-api WebSocket (Phase 1)
- Uses `actix_web_actors::ws` (from the `actix-web-actors` crate) for proper RFC 6455 WebSocket handshake
- `WsJobActor` implements `actix::Actor` + `StreamHandler<ws::Message>`
- JobManager has a `tokio::sync::broadcast` channel for status updates
- WsJobActor subscribes to this channel and forwards events to clients
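
The fan-out from `JobManager` to several connected actors can be illustrated with a small hub. The plan names `tokio::sync::broadcast` for this; the sketch below uses `std::sync::mpsc` as a dependency-free stand-in, so its API differs from the real channel. `StatusHub` and `StatusUpdate` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical status message; the real JobManager payload may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct StatusUpdate {
    pub job_id: String,
    pub status: String,
}

/// Stand-in for the broadcast channel: one sender per subscribed actor.
#[derive(Default)]
pub struct StatusHub {
    subscribers: Vec<Sender<StatusUpdate>>,
}

impl StatusHub {
    /// Called when a WebSocket actor starts; the actor keeps the receiver.
    pub fn subscribe(&mut self) -> Receiver<StatusUpdate> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends a copy of the update to every live subscriber and prunes the
    /// ones whose receiver has been dropped (for example, a closed connection).
    /// Returns the number of subscribers that received the update.
    pub fn publish(&mut self, update: StatusUpdate) -> usize {
        self.subscribers.retain(|tx| tx.send(update.clone()).is_ok());
        self.subscribers.len()
    }
}
```

Each subscriber gets its own copy of every update published after it subscribed, which is the property the actors rely on. Unlike this stand-in, a bounded broadcast channel can also report lagging receivers, which the actor would need to handle.
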
### pm-worker WS relay fallback (Phase 2)
- `relay_one_job()` tries WebSocket first
- On connection failure, falls back to `poll_job_status()` using AgentClient
- Poll interval configurable via `ws_relay_poll_interval_secs` in the `[worker]` config (default: 10s)
- Status changes trigger `pg_notify('job_update')` same as WebSocket events
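
A payload for the `job_update` notification might be built along these lines. Only the `event_type` field and its `host` / `job` values come from the change description; the `job_id` and `status` fields, the JSON layout, and the helper names are assumptions for illustration. The JSON is assembled by hand to keep the sketch dependency-free.

```rust
/// Distinguishes per-host progress events from job-level transitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventType {
    Host,
    Job,
}

impl EventType {
    pub fn as_str(self) -> &'static str {
        match self {
            EventType::Host => "host",
            EventType::Job => "job",
        }
    }
}

/// Assumed payload shape; the real NotifyPayload likely carries more fields.
pub struct NotifyPayload {
    pub event_type: EventType,
    pub job_id: String,
    pub status: String,
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl NotifyPayload {
    /// Serializes the payload to the text passed as the second pg_notify argument.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"event_type\":\"{}\",\"job_id\":\"{}\",\"status\":\"{}\"}}",
            self.event_type.as_str(),
            json_escape(&self.job_id),
            json_escape(&self.status)
        )
    }
}

/// PostgreSQL's pg_notify takes a channel name and a text payload;
/// the payload would be bound as parameter $1.
pub const NOTIFY_SQL: &str = "SELECT pg_notify('job_update', $1)";
```

Using the same serializer for the WebSocket path and the polled path is one way to keep both producing identical notifications, so listeners do not need to know which transport delivered the status.
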