# Threading & Concurrency Model

## Overview

Languard uses a hybrid concurrency model:

- **FastAPI (asyncio)** handles HTTP requests and WebSocket connections on the main event loop
- **Python `threading.Thread`** handles long-running background work per server
- **`queue.Queue`** bridges the thread world to the asyncio world for WebSocket broadcasting
- **SQLAlchemy sync sessions** with thread-local connections provide thread-safe database access

## Thread Architecture

For N running servers, the system runs up to 4N+1 background threads:

| Thread Type | Count | Purpose |
|---|---|---|
| `BroadcastThread` | 1 (global) | Bridges `queue.Queue` to asyncio WebSocket broadcasts |
| `LogTailThread` | 1 per server | Tails .rpt log files, parses lines, persists to DB, broadcasts events |
| `ProcessMonitorThread` | 1 per server | Monitors server process, detects crashes, triggers auto-restart |
| `MetricsCollectorThread` | 1 per server | Collects CPU/RAM metrics via psutil every 10 seconds |
| `RemoteAdminPollerThread` | 1 per server | Polls player list via RCon, syncs join/leave events |

All server-specific threads are managed by `ThreadRegistry`, which creates and destroys thread bundles as servers start and stop.

## BaseServerThread

All background threads extend `BaseServerThread`, which provides:

- **Stop event**: `threading.Event` for graceful shutdown
- **Thread-local DB**: Creates a fresh SQLAlchemy connection per thread via `get_thread_db()`
- **Exception backoff**: On an unhandled exception, sleeps with exponential backoff (5s → 30s max), then retries. If the stop event is set, exits cleanly.
- **Abstract `run_loop()` method**: Subclasses implement the main loop body, which is called repeatedly until the stop event is set

```python
import threading


class BaseServerThread(threading.Thread):
    def __init__(self, server_id: int):  # other constructor arguments omitted
        super().__init__(daemon=True)
        self.server_id = server_id
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run_loop(self):
        # Subclasses implement one iteration of their work here.
        raise NotImplementedError

    def run(self):
        backoff = 5  # seconds; doubles after each failure, capped at 30
        while not self._stop_event.is_set():
            try:
                self.run_loop()
            except Exception:
                # wait() returns early if stop() is called during the backoff
                self._stop_event.wait(backoff)
                backoff = min(backoff * 2, 30)
```

## ThreadRegistry

`ThreadRegistry` manages thread lifecycle per server:

- **`start_server_threads(server_id, db)`**: Creates and starts all 4 thread types for a server
- **`stop_server_threads(server_id)`**: Sets stop events and joins all threads for a server
- **`reattach_server_threads(server_id, db)`**: Recovers threads for a server that survived a process restart
- **`stop_all()`**: Stops all threads for all servers (called on shutdown)

Thread bundles are stored in a dict, `{server_id → ThreadBundle}`, where `ThreadBundle` is a dataclass holding all thread references.

## BroadcastThread

The `BroadcastThread` is the single global thread that bridges synchronous background threads to asynchronous WebSocket clients:

1. Background threads push events into a `queue.Queue(maxsize=1000)`
2. `BroadcastThread` runs a loop reading from the queue
3. For each event, it calls `asyncio.run_coroutine_threadsafe()` to schedule a WebSocket broadcast on the main event loop
4. If the queue is full, events are dropped (non-blocking put)

Events are broadcast to WebSocket clients subscribed to the relevant `server_id` (or to `None`, meaning all servers).

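
The queue-to-event-loop handoff described for `BroadcastThread` can be sketched as follows. This is a minimal illustration, not Languard's code: `publish`, `bridge`, `broadcast`, and `demo` are hypothetical names, and the `broadcast` coroutine simply records events where a real one would send them to WebSocket subscribers.

```python
import asyncio
import queue
import threading


def publish(q: queue.Queue, event: dict) -> bool:
    """Producer side: non-blocking put that drops the event when the queue is full."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        return False


def bridge(q, loop, broadcast, stop_event) -> None:
    """Consumer side: drain the queue and schedule each broadcast on the event loop."""
    while not stop_event.is_set():
        try:
            event = q.get(timeout=0.1)
        except queue.Empty:
            continue  # wake up periodically to re-check the stop event
        future = asyncio.run_coroutine_threadsafe(broadcast(event), loop)
        future.result(timeout=5)  # wait so ordering is kept and errors surface here


def demo() -> list:
    q = queue.Queue(maxsize=1000)
    received = []

    async def broadcast(event: dict) -> None:
        received.append(event)  # stand-in for a WebSocket send

    async def main() -> None:
        loop = asyncio.get_running_loop()
        stop_event = threading.Event()
        worker = threading.Thread(
            target=bridge, args=(q, loop, broadcast, stop_event), daemon=True
        )
        worker.start()
        publish(q, {"type": "log", "server_id": 1})
        publish(q, {"type": "metrics", "server_id": 1})
        # Keep yielding to the loop until both events are delivered (bounded wait).
        for _ in range(500):
            if len(received) == 2:
                break
            await asyncio.sleep(0.01)
        stop_event.set()
        await loop.run_in_executor(None, worker.join)

    asyncio.run(main())
    return received
```

Calling `demo()` returns the two events in the order they were published. Dropping on a full queue keeps producer threads from blocking on a slow consumer, at the cost of losing events under sustained load.
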
## ProcessManager

`ProcessManager` is a singleton that manages server processes via `subprocess.Popen`:

- **`start_process(server_id, cmd, cwd, env)`**: Starts a new subprocess and stores the PID
- **`stop_process(server_id, timeout)`**: Sends a terminate signal, waits for exit, and force-kills after the timeout
- **`kill_process(server_id)`**: Force-kills the process immediately
- **`recover_on_startup(db)`**: On startup, checks all stored PIDs against running processes via `psutil.pid_exists()`. If a process is still alive, marks the server as running. If not, marks it as stopped.
- All operations are thread-safe, guarded by a per-server `threading.Lock`

## LogTailThread

Tails the Arma 3 .rpt log file for each server:

- Resolves the latest log file path using `Path(server["exe_path"]).parent / "server"`. Arma 3 writes .rpt files next to its executable, not in the Languard server data directory
- Reads new lines from the end of the file, detecting log rotation (Windows/NTFS safe)
- Parses each line using `RPTParser.parse_line()` to extract timestamp, level, and message
- Persists parsed entries to the `logs` table via `LogRepository`
- Broadcasts `log` events via the global queue

## ProcessMonitorThread

Monitors each server process for crashes:

- Checks every 5 seconds whether the process is still alive
- If the process has exited unexpectedly:
  1. Updates server status to `crashed`
  2. Logs the crash event
  3. If `auto_restart` is enabled and the restart count has not exceeded `max_restarts` within `restart_window_seconds`:
     - Triggers a restart via `ServerService.start_server()`
     - Increments `restart_count`

## MetricsCollectorThread

Collects CPU and RAM metrics for each running server:

- Uses `psutil.Process(pid)` to get CPU and memory usage
- Collects every 10 seconds
- Stores metrics in the `metrics` table via `MetricsRepository`
- Broadcasts `metrics` events via the global queue

## RemoteAdminPollerThread

Polls the BattlEye RCon interface for player list updates:

- Connects via `Arma3RemoteAdmin` using `BERConClient`
- Polls the player list every 10 seconds
- Compares current players with the previous state to detect joins and leaves
- On player join: upserts to the `players` table, inserts into `player_history`, broadcasts a `players` event
- On player leave: removes from `players`, updates `left_at` in `player_history`, broadcasts a `players` event
- On RCon connection failure: reconnects with exponential backoff

## WebSocketManager

Runs on the main asyncio event loop:

- Clients connect to `/ws?token=JWT&server_id=N`
- The JWT is validated on connection; invalid tokens close the connection with code 4001
- Clients subscribe to specific `server_id`s or to `None` (all servers)
- `broadcast(server_id, message)` sends JSON-encoded messages to matching subscribers
- `disconnect(websocket)` removes the client from the registry
- Registry access is serialized between coroutines via `asyncio.Lock`. An `asyncio.Lock` is not thread-safe, so background threads never call the manager directly and instead go through `BroadcastThread`

## Thread Safety Rules

1. **Database access**: Each thread uses its own connection via `get_thread_db()`. No shared DB connections.
2. **WebSocket broadcasting**: Threads write to `queue.Queue`, which is thread-safe. Only `BroadcastThread` reads from the queue.
3. **Process management**: `ProcessManager` uses per-server locks for thread-safe start/stop operations.
4. **SQLite WAL mode**: Enables concurrent reads from multiple threads while a single writer operates.
5. **Asyncio locks**: `WebSocketManager` uses `asyncio.Lock` for connection registry modifications.

## Scheduled Jobs

APScheduler `BackgroundScheduler` runs 3 cleanup cron jobs:

| Job | Schedule | SQL statement |
|---|---|---|
| Clean up old log entries | Daily at 03:00 | `DELETE FROM logs WHERE created_at < datetime('now', '-7 days')` |
| Clean up old metrics | Every 6 hours | `DELETE FROM metrics WHERE timestamp < datetime('now', '-1 day')` |
| Clean up old events | Weekly (Sunday 04:00) | `DELETE FROM server_events WHERE created_at < datetime('now', '-30 days')` |

## Startup Sequence

1. Init DB engine and run pending migrations
2. Register built-in adapters (Arma 3) and scan for third-party plugins
3. Create `WebSocketManager` (asyncio-only)
4. Create global `BroadcastThread` (queue → asyncio bridge)
5. Create `ThreadRegistry` with `ProcessManager` and adapter registry
6. Recover processes that survived a restart (PID validation via psutil)
7. Re-attach monitoring threads for running servers
8. Seed default admin user if no users exist
9. Register and start APScheduler cleanup jobs

## Shutdown Sequence

1. Stop all server threads via `ThreadRegistry.stop_all()`
2. Stop `BroadcastThread` and join with 5s timeout
3. Stop APScheduler
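
The first two shutdown steps can be sketched as follows. `Worker` and `shutdown` are hypothetical stand-ins rather than Languard's classes; the sketch assumes only that each thread exposes `stop()` and `join()` in the way `BaseServerThread` does, and it leaves out the APScheduler step.

```python
import threading


class Worker(threading.Thread):
    """Hypothetical stand-in for a stoppable background thread."""

    def __init__(self, name: str, interval: float = 0.05):
        super().__init__(name=name, daemon=True)
        self._stop_event = threading.Event()
        self._interval = interval

    def stop(self) -> None:
        self._stop_event.set()

    def run(self) -> None:
        # wait() acts as an interruptible sleep: it returns True once stop() is called.
        while not self._stop_event.wait(self._interval):
            pass  # one unit of periodic work would go here


def shutdown(server_threads, broadcast_thread, join_timeout: float = 5.0) -> list:
    """Stop server threads first, then the broadcast bridge; return names still alive."""
    for t in server_threads:
        t.stop()  # signal every thread before joining any of them
    for t in server_threads:
        t.join(join_timeout)
    broadcast_thread.stop()
    broadcast_thread.join(join_timeout)
    return [t.name for t in [*server_threads, broadcast_thread] if t.is_alive()]
```

Signalling all threads before joining lets them wind down in parallel, so the total wait is closer to one timeout than to the sum of all of them. Stopping the bridge last plausibly gives server threads a chance to flush their final events, though the source does not state that as the reason for the order.
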