Linux Patch API - Phase 4 Optimization Recommendations
Date: 2026-04-09
Version: 0.1.0
Author: Performance Optimization Agent
Status: Ready for Implementation
Executive Summary
This document provides prioritized optimization recommendations based on comprehensive performance benchmarking and CPU profiling analysis. Recommendations are categorized by priority (P1-P3) with estimated effort and impact assessments.
Priority Matrix
| Priority | Count | Total Effort | Expected Impact |
|---|---|---|---|
| P1 (Critical) | 5 | 3 days | High |
| P2 (Important) | 8 | 5 days | Medium |
| P3 (Nice-to-have) | 6 | 4 days | Low |
1. Critical Optimizations (P1)
1.1 Enable TLS Session Resumption
Location: src/auth/mtls.rs, src/main.rs
Effort: 4 hours
Impact: ~87% reduction in TLS handshake time (15ms → 2ms)
Risk: Low
Current State
Full TLS 1.3 Handshake: ~15ms per connection
No session resumption configured
Recommended Implementation
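// In src/auth/mtls.rs (targets rustls 0.23, matching the rustls-0_23 feature)
use rustls::server::ServerSessionMemoryCache;
use std::sync::Arc;

pub fn build_rustls_config_with_resumption(&self) -> Result<Arc<rustls::ServerConfig>> {
    // rustls 0.22+ has no with_safe_defaults(); builder() already applies safe defaults
    let mut config = rustls::ServerConfig::builder()
        .with_client_cert_verifier(self.build_verifier()?)
        .with_single_cert(self.load_certs()?, self.load_key()?)?;
    // Enable stateful session resumption. The argument is the maximum number
    // of cached sessions (not a byte size); ~250k sessions here.
    config.session_storage = ServerSessionMemoryCache::new(250_000);
    // Enable stateless resumption via session tickets. The ticket lifetime is
    // fixed by the Ticketer implementation. Assumes the aws-lc-rs provider;
    // use rustls::crypto::ring::Ticketer when building with ring.
    config.ticketer = rustls::crypto::aws_lc_rs::Ticketer::new()?;
    Ok(Arc::new(config))
}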
Expected Results
- Handshake time: 15ms → 2ms (87% reduction)
- CPU usage: -12% under high connection churn
- Connection throughput: +400% for short-lived connections
1.2 Implement Request Timeout Middleware
Location: src/main.rs, new src/middleware/timeout.rs
Effort: 3 hours
Impact: Prevents slow client attacks, improves resource utilization
Risk: Low
Recommended Implementation
// In src/middleware/timeout.rs
use std::future::Future;
use std::time::Duration;

/// Returned when a handler future exceeds its deadline.
#[derive(Debug)]
pub struct TimeoutError;

// Wrapper for handler timeout. The deadline is enforced by wrapping the
// handler future; a response header alone cannot cancel a slow request.
pub async fn with_timeout<F, T>(duration: Duration, future: F) -> Result<T, TimeoutError>
where
    F: Future<Output = T>,
{
    tokio::time::timeout(duration, future)
        .await
        .map_err(|_| TimeoutError)
}
Configuration
# In config.yaml
server:
request_timeout_seconds: 30
keep_alive_timeout_seconds: 75
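The deadline semantics behind the timeout wrapper can be illustrated without an async runtime. The following is a minimal, std-only sketch: `run_with_deadline` is a hypothetical helper, not part of the service, and it uses a worker thread plus a channel where the service itself would use `tokio::time::timeout`.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Returned when the work does not finish before the deadline.
#[derive(Debug, PartialEq)]
pub struct TimeoutError;

/// Runs `work` on a separate thread and waits at most `limit` for its result.
/// Hypothetical helper for illustration only.
pub fn run_with_deadline<T, F>(limit: Duration, work: F) -> Result<T, TimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that case.
        let _ = tx.send(work());
    });
    // A missed deadline (or a panicked worker) is reported as TimeoutError.
    rx.recv_timeout(limit).map_err(|_| TimeoutError)
}
```

Unlike a cancelled future, the worker thread in this sketch keeps running after the deadline passes; only the caller stops waiting.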
1.3 Add Connection Limits
Location: src/main.rs
Effort: 2 hours
Impact: Prevents resource exhaustion under load
Risk: Low
Recommended Implementation
// In src/main.rs
use std::time::Duration;

let server_builder = HttpServer::new(move || {
    // ... app configuration
})
.workers(4)
// max_connections is a per-worker limit in actix-web 4:
// 4 workers × 256 = 1024 concurrent connections in total
.max_connections(256)
.keep_alive(Duration::from_secs(75))              // Keep-alive timeout
.client_request_timeout(Duration::from_secs(30)); // Client request timeout
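To make the idea of a hard connection cap concrete, here is a minimal, std-only sketch of a counting limiter. `ConnectionLimiter` and `ConnectionPermit` are hypothetical names used for illustration; the server's built-in limits remain the recommended mechanism.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Caps the number of simultaneously held permits (hypothetical type).
pub struct ConnectionLimiter {
    active: Arc<AtomicUsize>,
    max: usize,
}

/// Releases its slot automatically when dropped.
pub struct ConnectionPermit {
    active: Arc<AtomicUsize>,
}

impl ConnectionLimiter {
    pub fn new(max: usize) -> Self {
        Self { active: Arc::new(AtomicUsize::new(0)), max }
    }

    /// Returns a permit if a slot is free, or None when the limit is reached.
    pub fn try_acquire(&self) -> Option<ConnectionPermit> {
        let mut current = self.active.load(Ordering::Acquire);
        loop {
            if current >= self.max {
                return None;
            }
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Acquire,
            ) {
                Ok(_) => return Some(ConnectionPermit { active: Arc::clone(&self.active) }),
                Err(observed) => current = observed, // lost a race, retry
            }
        }
    }

    /// Number of permits currently held.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::Acquire)
    }
}

impl Drop for ConnectionPermit {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Tying the release to `Drop` means a slot is returned even when a connection handler exits early with an error.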
1.4 Reduce JSON Allocation Overhead
Location: src/api/handlers/*.rs
Effort: 6 hours
Impact: 15-20% reduction in memory allocation
Risk: Low
Recommended Implementation
// Use pre-allocated buffers
use serde::Serialize;

pub fn serialize_response<T: Serialize>(data: &T) -> serde_json::Result<Vec<u8>> {
    let mut buffer = Vec::with_capacity(4096); // Pre-allocate 4KB
    serde_json::to_writer(&mut buffer, data)?;
    Ok(buffer)
}

// For responses, pass the serialized bytes to .body() instead of calling .json()
HttpResponse::Ok()
    .content_type("application/json")
    .body(serialized_bytes)
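The effect of pre-allocation can be shown without any serialization library. This is a minimal sketch under stated assumptions: `JobStatus` and `render_status` are hypothetical, and the `state` string is assumed to need no JSON escaping.

```rust
use std::io::Write;

/// Hypothetical response shape used only for this illustration.
pub struct JobStatus<'a> {
    pub id: u32,
    pub state: &'a str,
}

/// Writes `{"id":N,"state":"..."}` into a buffer that is allocated once up front,
/// so small responses never trigger a reallocation while being written.
pub fn render_status(status: &JobStatus<'_>, capacity: usize) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(capacity);
    write!(&mut buffer, "{{\"id\":{},\"state\":\"{}\"}}", status.id, status.state)
        .expect("writing to a Vec<u8> cannot fail");
    buffer
}
```

Choosing the capacity from observed response sizes (for example, the P95 body size from benchmarks) keeps the buffer from being either wasteful or too small.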
Alternative: Use simd-json for Critical Paths
# In Cargo.toml
[dependencies]
simd-json = "0.13"
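// For high-throughput endpoints
use actix_web::{error::ErrorInternalServerError, HttpResponse};

pub async fn list_packages_fast(...) -> actix_web::Result<HttpResponse> {
    let data = backend.list_packages(...).map_err(ErrorInternalServerError)?;
    // Return a 500 instead of panicking if serialization fails
    let json_bytes = simd_json::to_vec(&data).map_err(ErrorInternalServerError)?;
    Ok(HttpResponse::Ok()
        .content_type("application/json")
        .body(json_bytes))
}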
1.5 Optimize Job Manager Locking
Location: src/jobs/manager.rs
Effort: 8 hours
Impact: 30% improvement under high concurrency
Risk: Medium
Current Bottleneck
JobManager::update_job → RwLock::write
Lock contention: 12% under 100 concurrent requests
Wait time: 50µs average
Recommended Implementation
// Use sharded job state to reduce contention
use dashmap::DashMap;
use uuid::Uuid;
pub struct JobManager {
// Replace single RwLock<HashMap> with sharded DashMap
jobs: DashMap<Uuid, Job>,
max_concurrent: usize,
// ...
}
impl JobManager {
pub async fn update_job(&self, job_id: &Uuid, ...) -> Result<()> {
// DashMap provides per-shard locking
if let Some(mut job) = self.jobs.get_mut(job_id) {
job.status = new_status;
job.progress = new_progress;
// Lock is automatically released when guard drops
}
Ok(())
}
}
Dependency Update
[dependencies]
dashmap = "5"
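The sharding idea can be sketched with the standard library alone. This is an illustration of the technique, not DashMap's actual implementation: `ShardedMap` is a hypothetical type and `u64` keys stand in for job UUIDs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// A map split into independently locked shards (hypothetical type).
pub struct ShardedMap<V> {
    shards: Vec<RwLock<HashMap<u64, V>>>,
}

impl<V: Clone> ShardedMap<V> {
    pub fn new(shard_count: usize) -> Self {
        let shards = (0..shard_count.max(1))
            .map(|_| RwLock::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard that owns `key` by hashing it.
    fn shard_for(&self, key: u64) -> &RwLock<HashMap<u64, V>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    pub fn insert(&self, key: u64, value: V) {
        self.shard_for(key).write().unwrap().insert(key, value);
    }

    /// Applies `apply` under the write lock of one shard only.
    /// Returns false when the key is absent.
    pub fn update<F: FnOnce(&mut V)>(&self, key: u64, apply: F) -> bool {
        let mut shard = self.shard_for(key).write().unwrap();
        match shard.get_mut(&key) {
            Some(value) => {
                apply(value);
                true
            }
            None => false,
        }
    }

    pub fn get(&self, key: u64) -> Option<V> {
        self.shard_for(key).read().unwrap().get(&key).cloned()
    }
}
```

Writers that touch different shards no longer wait on each other, which is where the reduction in lock contention comes from; writers that hash to the same shard still serialize.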
2. Important Optimizations (P2)
2.1 Cache Parsed Certificates
Location: src/auth/mtls.rs
Effort: 4 hours
Impact: 40% reduction in certificate validation time
use moka::sync::Cache;

pub struct MtlsConfig {
    // Cache parsed certificate data
    cert_cache: Cache<String, ParsedCertificate>,
    // ...
}

impl MtlsConfig {
    pub fn get_parsed_cert(&self, fingerprint: &str) -> Option<ParsedCertificate> {
        self.cert_cache.get(fingerprint)
    }
}
2.2 Enable Response Compression
Location: src/main.rs
Effort: 2 hours
Impact: 60-80% reduction in response size
[dependencies]
actix-web = { version = "4", features = ["rustls-0_23", "compress-gzip", "compress-brotli"] }
// In main.rs
use actix_web::middleware::Compress;
let app = App::new()
.wrap(Compress::default()) // Auto-select gzip/brotli
// ...
2.3 Cache Package Lists
Location: src/packages/mod.rs
Effort: 4 hours
Impact: 90% reduction for repeated list operations
use moka::sync::Cache;
use std::time::Duration;

pub struct PackageManagerBackend {
    package_cache: Cache<String, Vec<Package>>,
    // Takes effect only if the cache is built with
    // Cache::builder().time_to_live(cache_ttl); otherwise entries never expire
    cache_ttl: Duration,
}

impl PackageManagerBackend {
    pub fn list_packages(&self, filter: Option<&str>) -> Result<Vec<Package>> {
        let cache_key = filter.unwrap_or("all").to_string();
        if let Some(cached) = self.package_cache.get(&cache_key) {
            return Ok(cached);
        }
        // Fetch from system
        let packages = self.fetch_packages(filter)?;
        self.package_cache.insert(cache_key, packages.clone());
        Ok(packages)
    }
}
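The expiry and invalidation behavior that such a cache relies on can be sketched with the standard library. This is a simplified illustration, not a replacement for moka: `TtlCache` is a hypothetical type, and it holds a single lock while loading, which a production cache would avoid.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// A small time-to-live cache keyed by string (hypothetical type).
pub struct TtlCache<V> {
    ttl: Duration,
    entries: Mutex<HashMap<String, (Instant, V)>>,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached value for `key`, calling `load` only on a miss
    /// or after the entry has outlived the TTL.
    pub fn get_or_load<F: FnOnce() -> V>(&self, key: &str, load: F) -> V {
        let mut entries = self.entries.lock().unwrap();
        if let Some((stored_at, value)) = entries.get(key) {
            if stored_at.elapsed() < self.ttl {
                return value.clone();
            }
        }
        let value = load();
        entries.insert(key.to_string(), (Instant::now(), value.clone()));
        value
    }

    /// Drops an entry, for example after a package install or removal.
    pub fn invalidate(&self, key: &str) {
        self.entries.lock().unwrap().remove(key);
    }
}
```

Calling `invalidate` from the install and remove paths keeps a repeated list operation from returning stale package data for the remainder of the TTL.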
2.4 Optimize sysinfo Calls
Location: src/packages/mod.rs
Effort: 3 hours
Impact: 20% reduction in system info endpoint latency
// Cache system info with TTL
use std::sync::Mutex;
use std::time::{Duration, Instant};

pub struct CachedSystemInfo {
    info: SystemInfo,
    fetched_at: Instant,
    ttl: Duration,
}

impl PackageManagerBackend {
    // Assumes the field is declared as
    // `cached_system_info: Mutex<Option<CachedSystemInfo>>`
    // so the cache can be refreshed through `&self`.
    pub fn get_system_info(&self) -> Result<SystemInfo> {
        let mut cache = self.cached_system_info.lock().unwrap();
        if let Some(cached) = cache.as_ref() {
            if cached.fetched_at.elapsed() < cached.ttl {
                return Ok(cached.info.clone());
            }
        }
        // Refresh cache
        let info = self.fetch_system_info()?;
        *cache = Some(CachedSystemInfo {
            info: info.clone(), // one copy stays in the cache, the other is returned
            fetched_at: Instant::now(),
            ttl: Duration::from_secs(60),
        });
        Ok(info)
    }
}
2.5 Add Prometheus Metrics Endpoint
Location: New src/metrics/mod.rs
Effort: 6 hours
Impact: Production observability
[dependencies]
prometheus = "0.13"
actix-web-prom = "0.6"
// In main.rs
use actix_web_prom::PrometheusMetricsBuilder;
let prometheus = PrometheusMetricsBuilder::new("linux_patch_api")
.endpoint("/metrics")
.build()
.unwrap();
let app = App::new()
.wrap(prometheus)
// ...
2.6 Implement Request Logging Sampling
Location: src/logging/*.rs
Effort: 3 hours
Impact: 50% reduction in log I/O under high load
// Reduce log volume at high request rates: per-target level filter
use tracing_subscriber::filter;
use tracing_subscriber::layer::SubscriberExt;

let filter = filter::Targets::new()
    .with_target("linux_patch_api::api", tracing::Level::INFO)
    .with_target("linux_patch_api::requests", tracing::Level::DEBUG);
// Move log writes off the request path with a non-blocking writer.
// Keep `_guard` alive for the life of the process; dropping it shuts down
// the background writer. This step does not drop events by itself, so
// sampling still needs a separate layer or filter.
let (writer, _guard) = tracing_appender::non_blocking(std::io::stdout());
let subscriber = tracing_subscriber::registry()
    .with(filter)
    .with(tracing_subscriber::fmt::layer().with_writer(writer));
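Sampling itself can be as simple as keeping one event out of every N. The sketch below shows that decision logic only; `LogSampler` is a hypothetical type and is not wired into `tracing` here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Keeps one event out of every `every` events (hypothetical type).
pub struct LogSampler {
    every: u64,
    seen: AtomicU64,
}

impl LogSampler {
    pub fn new(every: u64) -> Self {
        // Guard against a zero divisor: a rate of 0 is treated as "keep all".
        Self { every: every.max(1), seen: AtomicU64::new(0) }
    }

    /// Returns true when the current event should be written.
    pub fn should_log(&self) -> bool {
        self.seen.fetch_add(1, Ordering::Relaxed) % self.every == 0
    }
}
```

With `every` set to 2, half of the request log lines are dropped, which corresponds to the 50% log I/O reduction stated for this item. Error-level events would normally bypass the sampler.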
2.7 Tune Worker Pool Size
Location: src/main.rs
Effort: 1 hour
Impact: 10-20% throughput improvement
// Calculate optimal worker count
use num_cpus;
let worker_count = num_cpus::get().max(2); // At least 2 workers
let server_builder = HttpServer::new(move || {
// ...
})
.workers(worker_count);
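If adding the `num_cpus` dependency is undesirable, the same calculation can be done with the standard library. This is a minimal sketch; `worker_count` is a hypothetical helper name.

```rust
use std::thread;

/// One worker per available CPU, but never fewer than `minimum`.
/// Falls back to `minimum` when the parallelism cannot be determined.
pub fn worker_count(minimum: usize) -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(minimum)
        .max(minimum)
}
```

The result would be passed to `.workers(...)` in place of a hard-coded value, for example `worker_count(2)`.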
2.8 Add Health Check Enhancements
Location: src/api/handlers/system.rs
Effort: 2 hours
Impact: Better load balancer integration
#[derive(Serialize)]
struct HealthDetail {
status: String,
version: String,
uptime_seconds: u64,
active_jobs: usize,
tls_enabled: bool,
whitelist_entries: usize,
}
pub async fn health_check_detailed(
job_manager: web::Data<JobManager>,
whitelist: web::Data<Option<WhitelistManager>>,
) -> impl Responder {
let detail = HealthDetail {
status: "healthy".to_string(),
version: env!("CARGO_PKG_VERSION").to_string(),
uptime_seconds: get_uptime(),
active_jobs: job_manager.running_count().await,
tls_enabled: true,
whitelist_entries: whitelist.get_ref().as_ref().map_or(0, |w| w.entry_count()),
};
HttpResponse::Ok().json(detail)
}
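The handler above calls `get_uptime()` without showing it. One possible implementation, offered as an assumption rather than the project's actual code, records the start time once and measures from it:

```rust
use std::sync::OnceLock;
use std::time::Instant;

// Process start time, set on first use (assumed design, not from the source).
static STARTED_AT: OnceLock<Instant> = OnceLock::new();

/// Call once during startup so uptime is measured from process launch.
pub fn mark_started() {
    STARTED_AT.get_or_init(Instant::now);
}

/// Whole seconds elapsed since `mark_started` (or since the first call to
/// this function, if `mark_started` was never called).
pub fn get_uptime() -> u64 {
    STARTED_AT.get_or_init(Instant::now).elapsed().as_secs()
}
```

Using `Instant` rather than wall-clock time keeps the value monotonic even if the system clock is adjusted while the service runs.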
3. Nice-to-have Optimizations (P3)
3.1 HTTP/2 Support
Effort: 4 hours
Impact: Improved multiplexing for concurrent requests
[dependencies]
actix-web = { version = "4", features = ["http2"] }
3.2 Connection Keep-Alive Defaults
Effort: 1 hour
Impact: Reduced TLS handshake frequency
# In config.yaml
server:
keep_alive: true
keep_alive_timeout: 75
3.3 Use io_uring for File Operations
Effort: 8 hours
Impact: 20-30% I/O improvement on Linux 5.1+
[dependencies]
io-uring = "0.6"
3.4 Arena Allocation for Short-lived Objects
Effort: 6 hours
Impact: Fewer allocator calls for short-lived objects (Rust has no garbage collector, so the gain comes from reduced malloc/free traffic)
[dependencies]
bumpalo = "3"
3.5 SIMD-accelerated UUID Generation
Effort: 2 hours
Impact: Marginal improvement (the fast-rng feature swaps in a faster random number generator; it is not SIMD-specific)
[dependencies]
uuid = { version = "1", features = ["v4", "fast-rng"] }
3.6 Precompiled Template Responses
Effort: 3 hours
Impact: Reduced serialization for static responses
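For a response whose body never changes while the process runs, the bytes can be built once and reused. The following is a minimal sketch: `health_body` and the JSON shape are assumptions for illustration, with the version string taken from this document's header.

```rust
use std::sync::OnceLock;

// Serialized once, then shared by every request (assumed design).
static HEALTH_BODY: OnceLock<Vec<u8>> = OnceLock::new();

/// Returns the serialized health body, building it on first use only.
pub fn health_body() -> &'static [u8] {
    HEALTH_BODY
        .get_or_init(|| {
            format!("{{\"status\":\"healthy\",\"version\":\"{}\"}}", "0.1.0").into_bytes()
        })
        .as_slice()
}
```

Every later call returns the same static slice, so no serialization or allocation happens on the request path.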
4. Implementation Roadmap
Week 1 (P1 Critical)
| Day | Task | Owner | Status |
|---|---|---|---|
| 1 | TLS Session Resumption | Dev Team | ☐ |
| 2 | Request Timeout Middleware | Dev Team | ☐ |
| 3 | Connection Limits | Dev Team | ☐ |
| 4 | JSON Allocation Optimization | Dev Team | ☐ |
| 5 | Job Manager Locking | Dev Team | ☐ |
Week 2-3 (P2 Important)
| Task | Effort | Priority |
|---|---|---|
| Cache Parsed Certificates | 4h | High |
| Response Compression | 2h | High |
| Package List Caching | 4h | Medium |
| sysinfo Optimization | 3h | Medium |
| Prometheus Metrics | 6h | Medium |
| Log Sampling | 3h | Low |
| Worker Pool Tuning | 1h | High |
| Health Check Enhancements | 2h | Medium |
Month 2 (P3 Nice-to-have)
| Task | Effort | Priority |
|---|---|---|
| HTTP/2 Support | 4h | Low |
| Keep-Alive Defaults | 1h | Low |
| io_uring Integration | 8h | Low |
| Arena Allocation | 6h | Low |
| SIMD UUID Generation | 2h | Low |
| Precompiled Templates | 3h | Low |
5. Testing & Validation
5.1 Performance Regression Tests
# Run benchmarks after each optimization
cargo bench --bench api_benchmarks
# Compare results
hyperfine --warmup 3 'curl -k --cert client.pem --key client.key https://localhost:12443/health'
5.2 Load Testing
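# Using wrk for HTTP load testing
# (wrk has no client-certificate option, so point it at a test listener with mTLS disabled)
wrk -t12 -c400 -d30s https://localhost:12443/api/v1/packages
# Using vegeta for sustained load (client certificate supplied for mTLS)
echo "GET https://localhost:12443/health" | vegeta attack -rate=100 -duration=60s -cert client.pem -key client.key -insecure | vegeta report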
5.3 Monitoring Checklist
- CPU usage under 70% at peak load
- Memory usage stable (no leaks)
- P99 latency < 100ms
- Error rate < 0.1%
- TLS handshake success rate > 99%
6. Risk Assessment
| Optimization | Risk | Mitigation |
|---|---|---|
| TLS Session Resumption | Low | Test with various clients |
| Job Manager Sharding | Medium | Extensive integration testing |
| Response Compression | Low | Enable gradually, monitor CPU |
| Package Caching | Low | Short TTL, invalidate on changes |
| io_uring | Medium | Kernel version check, fallback |
7. Success Metrics
Before Optimization (Baseline)
| Metric | Value |
|---|---|
| TLS Handshake | 15ms |
| P99 Latency | 50ms |
| Max Concurrent | 100 |
| Memory (idle) | 45MB |
| Memory (load) | 78MB |
After Optimization (Target)
| Metric | Target | Improvement |
|---|---|---|
| TLS Handshake | 2ms | -87% |
| P99 Latency | 20ms | -60% |
| Max Concurrent | 500 | +400% |
| Memory (idle) | 40MB | -11% |
| Memory (load) | 60MB | -23% |
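The Improvement column follows from the baseline and target values. A small helper (hypothetical, for checking the arithmetic) makes the calculation explicit:

```rust
/// Relative change from `baseline` to `target`, in percent.
/// Negative values are reductions, positive values are increases.
pub fn percent_change(baseline: f64, target: f64) -> f64 {
    (target - baseline) / baseline * 100.0
}
```

For example, the TLS handshake row is (2 - 15) / 15 × 100 ≈ -86.7, which rounds to the -87% shown in the table.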
8. Conclusion
The Linux Patch API has solid performance characteristics with clear optimization paths. Implementing P1 recommendations will provide immediate, measurable improvements. P2 and P3 optimizations can be addressed based on production requirements and resource availability.
Recommended Next Steps:
- ✅ Implement TLS session resumption (highest ROI)
- ✅ Add connection limits and timeouts (security + performance)
- ✅ Optimize JSON serialization (low effort, good impact)
- ⏳ Address job manager locking (requires careful testing)
- ⏳ Add monitoring for production visibility
Appendices
A. Related Documents
- PERFORMANCE_BENCHMARK.md - Benchmark results
- PROFILING_REPORT.md - CPU profiling analysis
- ROADMAP.md - Phase 4 completion status
B. Tool References
| Tool | Purpose | Command |
|---|---|---|
| cargo-flamegraph | CPU profiling | cargo flamegraph --bin linux-patch-api |
| criterion | Benchmarking | cargo bench --bench api_benchmarks |
| hyperfine | CLI benchmarking | hyperfine 'curl ...' |
| wrk | HTTP load testing | wrk -t12 -c400 -d30s URL |
| perf | System profiling | perf record -F 99 -p <pid> |
C. Configuration Examples
See configs/config.yaml.example for recommended production settings.