linux_patch_manager/tasks/todo.md
Echo e3a27eb2ed
fix: add ALPN http/1.1 for WebSocket, polling fallback, and job-level WS events
- ws_relay.rs: Add ALPN protocol http/1.1 to rustls ClientConfig to prevent
  HTTP/2 negotiation which breaks WebSocket upgrades (Sec-WebSocket-Accept mismatch)
- ws_relay.rs: Add detailed TLS error chain logging for debugging connection failures
- ws_relay.rs: Add HTTP polling fallback when WebSocket connection fails, using
  AgentClient to poll /api/v1/jobs/{id} every ws_relay_poll_interval_secs
- config.rs: Add ws_relay_poll_interval_secs field (default: 10 seconds)
- config.example.toml: Add ws_relay_poll_interval_secs documentation
- jobs.rs: Fire pg_notify with event_type job on cancel
- job_executor.rs: Fire pg_notify with event_type job when parent job transitions
- ws_relay.rs: Add event_type field to NotifyPayload (host vs job events)
- Frontend: Add event_type, succeeded_count, failed_count, host_count to JobWsEvent
- Frontend: handleWsEvent distinguishes host vs job events for accurate status updates
2026-05-04 15:16:20 +00:00


WebSocket + Polling Fallback Implementation Plan

Problem

The linux-patch-api agent's /api/v1/ws/jobs endpoint is a stub that returns HTTP 101 with a JSON body but doesn't compute the required Sec-WebSocket-Accept header. This causes the pm-worker WS relay to fail with "Key mismatch in Sec-WebSocket-Accept header".
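For reference, RFC 6455 defines the value the stub never computes. The server appends the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the client's Sec-WebSocket-Key, takes the SHA-1 digest, and base64-encodes the result. The std-only Rust sketch below spells this out. The hand-rolled `sha1` and `base64_encode` helpers and the `sec_websocket_accept` name are illustrative only and are not part of linux-patch-api. Real code should rely on a vetted crate, or on actix-web-actors, whose `ws::start` performs this handshake.

```rust
/// GUID fixed by RFC 6455; the server appends it to the client's Sec-WebSocket-Key.
const WS_GUID: &str = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

/// Minimal SHA-1 (FIPS 180-4). Illustration only; use a vetted crate in real code.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];

    // Padding: one 0x80 byte, zeros up to 56 mod 64, then the bit length (big-endian).
    let mut msg = data.to_vec();
    let bit_len = (data.len() as u64).wrapping_mul(8);
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 big-endian words expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }

        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            // Round function and constant change every 20 rounds.
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let temp = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *slot = slot.wrapping_add(v);
        }
    }

    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Standard base64 (RFC 4648 alphabet) with '=' padding.
fn base64_encode(bytes: &[u8]) -> String {
    const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        let b0 = chunk[0] as u32;
        let b1 = chunk.get(1).copied().unwrap_or(0) as u32;
        let b2 = chunk.get(2).copied().unwrap_or(0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(TABLE[(n >> 18) as usize & 63] as char);
        out.push(TABLE[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { TABLE[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { TABLE[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Value the server must return in the Sec-WebSocket-Accept response header.
fn sec_websocket_accept(client_key: &str) -> String {
    let mut input = client_key.trim().as_bytes().to_vec();
    input.extend_from_slice(WS_GUID.as_bytes());
    base64_encode(&sha1(&input))
}
```

With the sample key from RFC 6455 (dGhlIHNhbXBsZSBub25jZQ==), this yields s3pPLMBiTxaQ9kYGzzhZRbK+xOo=. A client checks the response header against this value, and the relay's "Key mismatch" error means that check failed.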

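Additionally, the pm-worker WS relay's rustls ClientConfig didn't set ALPN to http/1.1, so the TLS handshake negotiated HTTP/2. This also breaks WebSocket upgrades, because the RFC 6455 handshake relies on the HTTP/1.1 Upgrade mechanism, which HTTP/2 does not provide.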
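ALPN (RFC 7301) works as follows: the client lists application protocols in its ClientHello, and the server picks one. In rustls, the client's list is the public `alpn_protocols` field on `ClientConfig`. Offering only HTTP/1.1 therefore amounts to something like `config.alpn_protocols = vec![b"http/1.1".to_vec()];`.

The toy model below is not rustls code. It assumes a server that prefers h2 and selects by its own preference order, and shows why restricting the client's offer to http/1.1 keeps the connection on HTTP/1.1. The `select_alpn` name is hypothetical.

```rust
/// Toy model of RFC 7301 server-side ALPN selection. Assumes the client sent the
/// ALPN extension (non-empty offer) and that the server walks its own preference
/// list, picking the first protocol the client also offered.
fn select_alpn(server_prefs: &[&str], client_offer: &[&str]) -> Option<String> {
    for pref in server_prefs {
        if client_offer.iter().any(|offered| offered == pref) {
            return Some(pref.to_string());
        }
    }
    // No overlap: RFC 7301 has the server abort with a no_application_protocol alert.
    None
}
```

With a server preferring `["h2", "http/1.1"]`, a client offering both ends up on h2, while a client offering only `http/1.1` stays on HTTP/1.1, where the Upgrade handshake works.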

Root Causes

  1. Agent WS handler is a stub — doesn't implement RFC 6455 WebSocket handshake
  2. WS relay missing ALPN — rustls ClientConfig didn't set alpn_protocols to http/1.1
  3. No fallback — WS relay has no fallback if WebSocket fails

Completed

  • ALPN fix in pm-worker ws_relay.rs (forces HTTP/1.1 for WebSocket)
  • Error chain logging in pm-worker ws_relay.rs (for future debugging)
  • Job-level WS event_type fix (frontend + backend)

Remaining Tasks

Phase 1: Implement proper WebSocket in linux-patch-api

  • Replace the stub websocket_handler in src/api/handlers/websocket.rs with a proper actix-web-actors WebSocket handler
  • Create WsJobActor that:
    • Accepts WebSocket connections via actix_web_actors::ws::start()
    • Subscribes to job status updates from JobManager
    • Streams job status events to connected clients
    • Handles subscribe/unsubscribe messages
  • Wire up broadcast channel from JobManager to WebSocket actors
  • Build and deploy to dev LXC
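The "handles subscribe/unsubscribe messages" step could look roughly like the sketch below. It leaves actix out so it runs with std alone. The text protocol (`subscribe <job_id>` / `unsubscribe <job_id>`) and the `ClientCommand`, `parse_command`, and `Subscriptions` names are all hypothetical, since the plan does not specify a message format.

```rust
use std::collections::HashSet;

/// Commands a client might send over the socket (made-up text protocol).
#[derive(Debug, PartialEq)]
enum ClientCommand {
    Subscribe(String),
    Unsubscribe(String),
}

/// Parse "subscribe <job_id>" or "unsubscribe <job_id>"; anything else is rejected.
fn parse_command(text: &str) -> Option<ClientCommand> {
    let mut parts = text.split_whitespace();
    let verb = parts.next()?;
    let job_id = parts.next()?.to_string();
    if parts.next().is_some() {
        return None; // reject trailing tokens
    }
    match verb {
        "subscribe" => Some(ClientCommand::Subscribe(job_id)),
        "unsubscribe" => Some(ClientCommand::Unsubscribe(job_id)),
        _ => None,
    }
}

/// Per-connection state a WsJobActor could keep: which jobs this client follows.
#[derive(Default)]
struct Subscriptions {
    jobs: HashSet<String>,
}

impl Subscriptions {
    /// Apply one text frame from the client; returns false for unrecognised input.
    fn handle_text(&mut self, text: &str) -> bool {
        match parse_command(text) {
            Some(ClientCommand::Subscribe(id)) => {
                self.jobs.insert(id);
                true
            }
            Some(ClientCommand::Unsubscribe(id)) => {
                self.jobs.remove(&id);
                true
            }
            None => false,
        }
    }

    /// Whether a job status event from the broadcast channel should go to this client.
    fn wants(&self, job_id: &str) -> bool {
        self.jobs.contains(job_id)
    }
}
```

In an actix version, `handle_text` would be called from the `ws::Message::Text` arm of the actor's stream handler. `wants` would then gate which broadcast events are forwarded with `ctx.text(...)`.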

Phase 2: Add polling fallback in pm-worker WS relay

  • In relay_one_job(), if the WebSocket connection fails, fall back to HTTP polling
  • Use the existing AgentClient (reqwest + mTLS) to poll /api/v1/jobs/{id}
  • Poll interval: configurable via ws_relay_poll_interval_secs (default: 10 seconds)
  • Convert polled job status to the same event format as WebSocket messages
  • Fire pg_notify('job_update') for polled status changes
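Here is a minimal sketch of the polling side. It assumes the relay only needs to emit an event when the polled status changes, and can stop once the job is finished. Two closures stand in for the real I/O: `fetch_status` for the AgentClient request and `notify` for the pg_notify('job_update') call. `JobEvent`, `is_terminal`, and the status strings are hypothetical; the agent's real status names may differ.

```rust
use std::thread;
use std::time::Duration;

/// Event shape handed to pg_notify('job_update'). Field names are hypothetical;
/// event_type "job" mirrors the NotifyPayload change described in the commit.
#[derive(Debug, Clone, PartialEq)]
struct JobEvent {
    job_id: String,
    status: String,
    event_type: &'static str,
}

/// Assumed terminal statuses.
fn is_terminal(status: &str) -> bool {
    matches!(status, "completed" | "failed" | "cancelled")
}

/// Poll `fetch_status` every `interval`, emitting an event only when the status
/// changes, until the job reaches a terminal status or `max_polls` requests have
/// been made. Fetch errors are returned to the caller rather than retried.
fn poll_job_status<F, N>(
    job_id: &str,
    interval: Duration,
    max_polls: usize,
    mut fetch_status: F,
    mut notify: N,
) -> Result<Option<String>, String>
where
    F: FnMut(&str) -> Result<String, String>,
    N: FnMut(JobEvent),
{
    let mut last: Option<String> = None;
    for attempt in 0..max_polls {
        if attempt > 0 {
            thread::sleep(interval);
        }
        let status = fetch_status(job_id)?;
        if last.as_deref() != Some(status.as_str()) {
            notify(JobEvent {
                job_id: job_id.to_string(),
                status: status.clone(),
                event_type: "job",
            });
        }
        let done = is_terminal(&status);
        last = Some(status);
        if done {
            break;
        }
    }
    Ok(last)
}
```

In the relay, `interval` would come from the configured ws_relay_poll_interval_secs, for example via `Duration::from_secs(...)`. Only status transitions reach pg_notify, so the UI sees the same event stream it would get from the WebSocket path.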

Phase 3: Testing & Deployment

  • Test WebSocket connection on dev LXC
  • Test polling fallback on dev LXC
  • Verify job completion status updates in UI
  • Push to Gitea
  • Update dev LXC deployment

Architecture Notes

linux-patch-api WebSocket (Phase 1)

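  • Uses actix_web_actors::ws (from the actix-web-actors crate) for a proper RFC 6455 WebSocket handshake
  • WsJobActor implements actix::Actor (with Context = ws::WebsocketContext<Self>) + StreamHandler<Result<ws::Message, ws::ProtocolError>>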
  • JobManager has a tokio::sync::broadcast channel for status updates
  • WsJobActor subscribes to this channel and forwards events to clients
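The fan-out from JobManager to the connected actors can be modelled without tokio. The sketch below is a std-only stand-in for tokio::sync::broadcast that uses one std::sync::mpsc channel per subscriber. `JobBroadcaster` and `JobStatusUpdate` are hypothetical names, not the real JobManager API.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical status update published by JobManager.
#[derive(Debug, Clone, PartialEq)]
struct JobStatusUpdate {
    job_id: String,
    status: String,
}

/// Every subscriber gets its own unbounded queue and a clone of each update.
#[derive(Default)]
struct JobBroadcaster {
    subscribers: Mutex<Vec<Sender<JobStatusUpdate>>>,
}

impl JobBroadcaster {
    /// Called once per WebSocket connection (one per WsJobActor).
    fn subscribe(&self) -> Receiver<JobStatusUpdate> {
        let (tx, rx) = channel();
        self.subscribers.lock().unwrap().push(tx);
        rx
    }

    /// Send a clone of `update` to every live subscriber, dropping senders whose
    /// receiver is gone (the WebSocket closed). Returns how many received it.
    fn publish(&self, update: JobStatusUpdate) -> usize {
        let mut subs = self.subscribers.lock().unwrap();
        subs.retain(|tx| tx.send(update.clone()).is_ok());
        subs.len()
    }
}
```

In the planned design, `tokio::sync::broadcast::channel(capacity)` plus `Sender::subscribe()` takes the place of this type. One practical difference is that tokio's channel is bounded. A WebSocket client that falls behind sees `RecvError::Lagged` there, instead of the unbounded queue growth this stand-in would allow.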

pm-worker WS relay fallback (Phase 2)

  • relay_one_job() tries WebSocket first
  • On connection failure, falls back to poll_job_status() using AgentClient
  • Poll interval configurable via ws_relay_poll_interval_secs in the [worker] config (default: 10s)
  • Status changes trigger pg_notify('job_update') same as WebSocket events
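The WebSocket-first control flow in relay_one_job() can be sketched as follows. The closures stand in for the real WebSocket connect/stream code and the AgentClient polling loop. `RelayOutcome`, `poll_interval`, and `DEFAULT_POLL_INTERVAL_SECS` are hypothetical; the 10-second default comes from the ws_relay_poll_interval_secs setting. Treating 0 as "use the default" is an added assumption, meant to avoid a busy loop.

```rust
use std::time::Duration;

/// How a relay attempt ended (hypothetical; the real relay_one_job may differ).
#[derive(Debug, PartialEq)]
enum RelayOutcome {
    WebSocket,
    Polled { final_status: Option<String> },
}

/// Documented default for ws_relay_poll_interval_secs.
const DEFAULT_POLL_INTERVAL_SECS: u64 = 10;

/// Resolve the configured interval; None or 0 falls back to the default (assumption).
fn poll_interval(configured: Option<u64>) -> Duration {
    Duration::from_secs(
        configured
            .filter(|&secs| secs > 0)
            .unwrap_or(DEFAULT_POLL_INTERVAL_SECS),
    )
}

/// Try the WebSocket path first; if connecting fails, log the error and poll instead.
fn relay_one_job<W, P>(job_id: &str, try_websocket: W, poll: P) -> Result<RelayOutcome, String>
where
    W: FnOnce(&str) -> Result<(), String>,
    P: FnOnce(&str) -> Result<Option<String>, String>,
{
    match try_websocket(job_id) {
        Ok(()) => Ok(RelayOutcome::WebSocket),
        Err(e) => {
            eprintln!("ws relay for job {job_id} failed ({e}); falling back to HTTP polling");
            let final_status = poll(job_id)?;
            Ok(RelayOutcome::Polled { final_status })
        }
    }
}
```

Keeping the fallback decision separate from the polling loop means a WebSocket failure (such as the Sec-WebSocket-Accept mismatch) degrades to slower updates instead of no updates.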