- ws_relay.rs: Add ALPN protocol http/1.1 to rustls ClientConfig to prevent
HTTP/2 negotiation which breaks WebSocket upgrades (Sec-WebSocket-Accept mismatch)
- ws_relay.rs: Add detailed TLS error chain logging for debugging connection failures
- ws_relay.rs: Add HTTP polling fallback when WebSocket connection fails, using
AgentClient to poll /api/v1/jobs/{id} every ws_relay_poll_interval_secs
- config.rs: Add ws_relay_poll_interval_secs field (default: 10 seconds)
- config.example.toml: Add ws_relay_poll_interval_secs documentation
- jobs.rs: Fire pg_notify with event_type job on cancel
- job_executor.rs: Fire pg_notify with event_type job when parent job transitions
- ws_relay.rs: Add event_type field to NotifyPayload (host vs job events)
- Frontend: Add event_type, succeeded_count, failed_count, host_count to JobWsEvent
- Frontend: handleWsEvent distinguishes host vs job events for accurate status updates
WebSocket + Polling Fallback Implementation Plan
Problem
The linux-patch-api agent's /api/v1/ws/jobs endpoint is a stub that returns HTTP 101
with a JSON body but doesn't compute the required Sec-WebSocket-Accept header. This
causes the pm-worker WS relay to fail with "Key mismatch in Sec-WebSocket-Accept header".
Additionally, the pm-worker WS relay's rustls ClientConfig didn't set ALPN to http/1.1, causing HTTP/2 negotiation which also breaks WebSocket upgrades.
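For reference, the header the stub omits is derived from the client's Sec-WebSocket-Key as described in RFC 6455, section 4.2.2: append a fixed GUID, hash with SHA-1, then Base64-encode. The sketch below is illustrative only. The function names are made up for this example, and SHA-1 and Base64 are hand-rolled purely to keep it dependency-free; a real handler would rely on the WebSocket library to do this.

```rust
// Sketch: compute Sec-WebSocket-Accept per RFC 6455, section 4.2.2.
const WS_GUID: &str = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

/// Minimal SHA-1 (illustration only; use a vetted crate in real code).
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64) * 8;
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        let mut w = [0u32; 80];
        for i in 0..16 {
            let j = 4 * i;
            w[i] = u32::from_be_bytes([chunk[j], chunk[j + 1], chunk[j + 2], chunk[j + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1u32),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDCu32),
                _ => (b ^ c ^ d, 0xCA62C1D6u32),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Standard Base64 with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in input.chunks(3) {
        let b1 = *chunk.get(1).unwrap_or(&0);
        let b2 = *chunk.get(2).unwrap_or(&0);
        let n = ((chunk[0] as u32) << 16) | ((b1 as u32) << 8) | (b2 as u32);
        out.push(TABLE[((n >> 18) & 63) as usize] as char);
        out.push(TABLE[((n >> 12) & 63) as usize] as char);
        out.push(if chunk.len() > 1 { TABLE[((n >> 6) & 63) as usize] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[(n & 63) as usize] as char } else { '=' });
    }
    out
}

/// Value the server must return in Sec-WebSocket-Accept for a given client key.
fn sec_websocket_accept(client_key: &str) -> String {
    let mut input = client_key.trim().as_bytes().to_vec();
    input.extend_from_slice(WS_GUID.as_bytes());
    base64_encode(&sha1(&input))
}
```

A client compares this value against the response header, which is why a 101 response without it is rejected with a key mismatch.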
Root Causes
- Agent WS handler is a stub: it doesn't implement the RFC 6455 WebSocket handshake
- WS relay missing ALPN: rustls ClientConfig didn't set alpn_protocols to http/1.1
- No fallback: WS relay has no fallback if WebSocket fails
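The ALPN root cause comes down to one field. In rustls, ClientConfig exposes alpn_protocols as a Vec<Vec<u8>>, and offering only http/1.1 keeps the server from selecting h2. The sketch below shows the value's shape without pulling in the crate; both helper names are hypothetical, and the guard function is an optional extra that is not described in the source.

```rust
// With the real crate, the fix would be roughly:
//     config.alpn_protocols = websocket_alpn_protocols();
// where `config` is a rustls ClientConfig.

/// ALPN list offering only HTTP/1.1, so "h2" can never be negotiated.
fn websocket_alpn_protocols() -> Vec<Vec<u8>> {
    vec![b"http/1.1".to_vec()]
}

/// Hypothetical guard: does the negotiated protocol still allow an
/// HTTP/1.1 Upgrade request? `None` means no ALPN was negotiated, in which
/// case HTTP/1.1 is assumed.
fn allows_websocket_upgrade(negotiated: Option<&[u8]>) -> bool {
    match negotiated {
        None => true,
        Some(proto) => proto == &b"http/1.1"[..],
    }
}
```

A guard like this could be fed from the connection's negotiated protocol after the handshake to fail fast with a clear message instead of a Sec-WebSocket-Accept mismatch.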
Completed
- ALPN fix in pm-worker ws_relay.rs (forces HTTP/1.1 for WebSocket)
- Error chain logging in pm-worker ws_relay.rs (for future debugging)
- Job-level WS event_type fix (frontend + backend)
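Error chain logging generally means walking std::error::Error::source() until it returns None. The following is a minimal sketch of that idea, not the code in ws_relay.rs; ConnectError and format_error_chain are hypothetical names used only for illustration.

```rust
use std::error::Error;
use std::fmt;

/// Render an error followed by every underlying cause, separated by ": ".
fn format_error_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

/// Hypothetical wrapper that adds context on top of a lower-level error.
#[derive(Debug)]
struct ConnectError {
    context: String,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.context)
    }
}

impl Error for ConnectError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}
```

Logging the whole chain matters for TLS failures because the top-level message often only says that the connection failed, while the certificate or protocol detail sits one or two levels down.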
Remaining Tasks
Phase 1: Implement proper WebSocket in linux-patch-api
- Replace stub websocket_handler in src/api/handlers/websocket.rs with proper actix-web-actors WebSocket
- Create WsJobActor that:
  - Accepts WebSocket connections via actix_web_actors::ws::start()
  - Subscribes to job status updates from JobManager
  - Streams job status events to connected clients
  - Handles subscribe/unsubscribe messages
- Wire up broadcast channel from JobManager to WebSocket actors
- Build and deploy to dev LXC
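The plan does not specify a wire format for subscribe/unsubscribe messages. As one possible shape, the sketch below assumes a plain-text protocol of the form "subscribe <job_id>" and "unsubscribe <job_id>" and tracks the per-connection subscription set that an actor such as WsJobActor could hold. The message format, ClientMessage, and Subscriptions are all assumptions made for this example.

```rust
use std::collections::HashSet;

/// Hypothetical client-to-server control messages.
#[derive(Debug, PartialEq)]
enum ClientMessage {
    Subscribe(String),
    Unsubscribe(String),
}

/// Parse "subscribe <job_id>" or "unsubscribe <job_id>"; anything else is rejected.
fn parse_client_message(text: &str) -> Option<ClientMessage> {
    let mut parts = text.split_whitespace();
    let command = parts.next()?;
    let job_id = parts.next()?;
    if parts.next().is_some() {
        return None; // trailing tokens are not part of the assumed format
    }
    match command {
        "subscribe" => Some(ClientMessage::Subscribe(job_id.to_string())),
        "unsubscribe" => Some(ClientMessage::Unsubscribe(job_id.to_string())),
        _ => None,
    }
}

/// Per-connection state: the job IDs this client wants events for.
#[derive(Default)]
struct Subscriptions {
    job_ids: HashSet<String>,
}

impl Subscriptions {
    fn apply(&mut self, msg: ClientMessage) {
        match msg {
            ClientMessage::Subscribe(id) => {
                self.job_ids.insert(id);
            }
            ClientMessage::Unsubscribe(id) => {
                self.job_ids.remove(&id);
            }
        }
    }

    /// Should an event for `job_id` be forwarded to this client?
    fn wants(&self, job_id: &str) -> bool {
        self.job_ids.contains(job_id)
    }
}
```

In an actor-based handler, the text-frame branch would call parse_client_message and apply, and the status-forwarding path would check wants before writing to the socket.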
Phase 2: Add polling fallback in pm-worker WS relay
- In relay_one_job(), if WebSocket connection fails, fall back to HTTP polling
- Use existing AgentClient (reqwest + mTLS) to poll /api/v1/jobs/{id}
- Poll interval: configurable via ws_relay_poll_interval_secs, default 10 seconds
- Convert polled job status to same event format as WebSocket messages
- Fire pg_notify('job_update') for polled status changes
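The control flow of these steps can be sketched as follows. This is a blocking, dependency-free stand-in, not the pm-worker code: the real relay is presumably async and uses AgentClient, whereas here the WebSocket attempt, the HTTP poll, and the notify step are passed in as closures. JobEvent, is_terminal, and the status strings are assumptions for illustration.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical event shape shared by the WebSocket and polling paths.
#[derive(Debug, Clone, PartialEq)]
struct JobEvent {
    job_id: String,
    status: String,
}

/// Assumed terminal states; the real status names may differ.
fn is_terminal(status: &str) -> bool {
    matches!(status, "succeeded" | "failed" | "cancelled")
}

/// Try WebSocket first; on connection failure, poll until the job finishes.
/// `notify` is called only when the polled status changes.
fn relay_one_job<W, P, N>(
    job_id: &str,
    connect_ws: W,
    mut poll_status: P,
    mut notify: N,
    poll_interval: Duration,
) -> Result<(), String>
where
    W: FnOnce() -> Result<(), String>,
    P: FnMut(&str) -> Result<String, String>,
    N: FnMut(&JobEvent),
{
    match connect_ws() {
        // The streaming path is out of scope for this sketch.
        Ok(()) => return Ok(()),
        Err(e) => eprintln!("WebSocket failed for job {}: {}; falling back to polling", job_id, e),
    }

    let mut last_status: Option<String> = None;
    loop {
        let status = poll_status(job_id)?;
        if last_status.as_deref() != Some(status.as_str()) {
            notify(&JobEvent {
                job_id: job_id.to_string(),
                status: status.clone(),
            });
            last_status = Some(status.clone());
        }
        if is_terminal(&status) {
            return Ok(());
        }
        thread::sleep(poll_interval);
    }
}
```

Deduplicating on the last seen status keeps the polling path from firing a notification on every tick, so downstream consumers see roughly the same sequence of events they would have received over WebSocket.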
Phase 3: Testing & Deployment
- Test WebSocket connection on dev LXC
- Test polling fallback on dev LXC
- Verify job completion status updates in UI
- Push to Gitea
- Update dev LXC deployment
Architecture Notes
linux-patch-api WebSocket (Phase 1)
- Uses actix_web_actors::ws (from the actix-web-actors crate) for proper RFC 6455 WebSocket handshake
- WsJobActor implements actix::Actor + StreamHandler<ws::Message>
- JobManager has a tokio::sync::broadcast channel for status updates
- WsJobActor subscribes to this channel and forwards events to clients
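The fan-out relationship between JobManager and its subscribers can be illustrated without tokio. The sketch below uses one std::sync::mpsc channel per subscriber as a stand-in for tokio::sync::broadcast, where new receivers would instead come from Sender::subscribe(). The struct fields and JobStatusUpdate are assumptions; only the publish/subscribe shape is the point.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical status update carried on the channel.
#[derive(Debug, Clone, PartialEq)]
struct JobStatusUpdate {
    job_id: String,
    status: String,
}

/// Stand-in for the job manager's broadcast side.
#[derive(Default)]
struct JobManager {
    subscribers: Vec<Sender<JobStatusUpdate>>,
}

impl JobManager {
    /// Each WebSocket actor would call this once when its connection starts.
    fn subscribe(&mut self) -> Receiver<JobStatusUpdate> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send a copy of the update to every live subscriber and drop the ones
    /// whose receiving end has gone away (client disconnected).
    fn publish(&mut self, update: JobStatusUpdate) {
        self.subscribers
            .retain(|tx| tx.send(update.clone()).is_ok());
    }
}
```

With a real broadcast channel the pruning step is unnecessary, but slow receivers can lag and miss messages, which is worth handling explicitly in the actor.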
pm-worker WS relay fallback (Phase 2)
- relay_one_job() tries WebSocket first
- On connection failure, falls back to poll_job_status() using AgentClient
- Poll interval configurable via [worker] config (default: 10s)
- Status changes trigger pg_notify('job_update') same as WebSocket events
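To make "same as WebSocket events" concrete, here is one way the payload passed to pg_notify could be built so that host-level and job-level events stay distinguishable. Only the event_type field and its host/job values come from the commit notes; the other fields, the JSON layout, and the hand-written escaping are assumptions, and real code would more likely serialize with serde_json.

```rust
/// Distinguishes per-host progress from whole-job transitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EventType {
    Host,
    Job,
}

impl EventType {
    fn as_str(self) -> &'static str {
        match self {
            EventType::Host => "host",
            EventType::Job => "job",
        }
    }
}

/// Hypothetical payload; fields other than `event_type` are illustrative.
struct NotifyPayload {
    event_type: EventType,
    job_id: String,
    status: String,
}

/// Escape the two characters that would break a JSON string literal here.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl NotifyPayload {
    /// JSON text intended as the second argument of
    /// `SELECT pg_notify('job_update', $1)`.
    fn to_json(&self) -> String {
        format!(
            r#"{{"event_type":"{}","job_id":"{}","status":"{}"}}"#,
            self.event_type.as_str(),
            escape_json(&self.job_id),
            escape_json(&self.status)
        )
    }
}
```

Because both the WebSocket path and the polling path would emit the same payload shape, the frontend handler can branch on event_type alone without knowing which transport produced the event.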