languard-servers-manager/THREADING.md
Tran G. (Revernomad) Khoa 6511353b55 feat: implement full backend + frontend server detail, settings, and create server pages
2026-04-17 11:58:34 +07:00


# Threading & Concurrency Model
## Overview
Languard uses a hybrid concurrency model:
- **FastAPI (asyncio)** handles HTTP requests and WebSocket connections on the main event loop
- **Python `threading.Thread`** handles long-running background work per server
- **`queue.Queue`** bridges the thread world to the asyncio world for WebSocket broadcasting
- **SQLAlchemy sync sessions** with thread-local connections provide thread-safe database access
## Thread Architecture
For N running servers, the system runs up to 4N+1 background threads:
| Thread Type | Count | Purpose |
|---|---|---|
| `BroadcastThread` | 1 (global) | Bridges `queue.Queue` to asyncio WebSocket broadcasts |
| `LogTailThread` | 1 per server | Tails .rpt log files, parses lines, persists to DB, broadcasts events |
| `ProcessMonitorThread` | 1 per server | Monitors server process, detects crashes, triggers auto-restart |
| `MetricsCollectorThread` | 1 per server | Collects CPU/RAM metrics via psutil every 10 seconds |
| `RemoteAdminPollerThread` | 1 per server | Polls player list via RCon, syncs join/leave events |
All server-specific threads are managed by `ThreadRegistry`, which creates/destroys thread bundles as servers start/stop.
## BaseServerThread
All background threads extend `BaseServerThread`, which provides:
- **Stop event**: `threading.Event` for graceful shutdown
- **Thread-local DB**: Creates a fresh SQLAlchemy connection per thread via `get_thread_db()`
- **Exception backoff**: On unhandled exceptions, sleeps with exponential backoff (5s → 30s max), then retries. If stop event is set, exits cleanly.
- **Abstract `run_loop()` method**: Subclasses implement the main loop, called repeatedly until stop event is set
```python
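import threading


class BaseServerThread(threading.Thread):
    """Base class for per-server background threads."""

    def __init__(self, server_id: int):  # further constructor arguments omitted
        super().__init__(daemon=True)
        self.server_id = server_id
        self._stop_event = threading.Event()

    def stop(self):
        # Request a graceful shutdown; run() exits at its next check.
        self._stop_event.set()

    def run_loop(self):
        raise NotImplementedError  # subclasses implement one loop iteration

    def run(self):
        backoff = 5  # seconds; doubles after each failure, capped at 30
        while not self._stop_event.is_set():
            try:
                self.run_loop()
                backoff = 5  # reset after a clean iteration (assumed behavior)
            except Exception:
                # wait() returns early if stop() is called during the backoff
                self._stop_event.wait(backoff)
                backoff = min(backoff * 2, 30)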
```
## ThreadRegistry
`ThreadRegistry` manages thread lifecycle per server:
- **`start_server_threads(server_id, db)`** — Creates and starts all 4 thread types for a server
- **`stop_server_threads(server_id)`** — Sets stop events and joins all threads for a server
- **`reattach_server_threads(server_id, db)`** — Recovers threads for a server that survived a process restart
- **`stop_all()`** — Stops all threads for all servers (called on shutdown)
Thread bundles are stored in a dict: `{server_id → ThreadBundle}`, where `ThreadBundle` is a dataclass holding all thread references.
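The registry described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `thread_factory` callable, the `ThreadBundle` field names, and the `timeout` parameter are assumptions, and the `db` argument from the real signatures is left out for brevity.

```python
import threading
from dataclasses import dataclass, fields


@dataclass
class ThreadBundle:
    # Field names are illustrative; one slot per per-server thread type.
    log_tail: threading.Thread
    process_monitor: threading.Thread
    metrics: threading.Thread
    rcon_poller: threading.Thread

    def threads(self):
        return [getattr(self, f.name) for f in fields(self)]


class ThreadRegistry:
    def __init__(self, thread_factory):
        # thread_factory(kind, server_id) returns an unstarted thread that
        # exposes start(), stop() and join() (hypothetical hook).
        self._factory = thread_factory
        self._bundles = {}  # server_id -> ThreadBundle
        self._lock = threading.Lock()

    def start_server_threads(self, server_id):
        with self._lock:
            if server_id in self._bundles:
                return self._bundles[server_id]  # already running
            bundle = ThreadBundle(
                *(self._factory(f.name, server_id) for f in fields(ThreadBundle))
            )
            self._bundles[server_id] = bundle
        for thread in bundle.threads():
            thread.start()
        return bundle

    def stop_server_threads(self, server_id, timeout=5.0):
        with self._lock:
            bundle = self._bundles.pop(server_id, None)
        if bundle is None:
            return
        for thread in bundle.threads():
            thread.stop()  # signal every thread first...
        for thread in bundle.threads():
            thread.join(timeout)  # ...then wait, so shutdowns overlap

    def stop_all(self):
        with self._lock:
            server_ids = list(self._bundles)
        for server_id in server_ids:
            self.stop_server_threads(server_id)
```

Signalling all stop events before joining lets the threads wind down in parallel, so the worst-case wait is one timeout per thread only if every thread hangs.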
## BroadcastThread
The `BroadcastThread` is the single global thread that bridges synchronous background threads to asynchronous WebSocket clients:
1. Background threads push events into a `queue.Queue(maxsize=1000)`
2. `BroadcastThread` runs a loop reading from the queue
3. For each event, it calls `asyncio.run_coroutine_threadsafe()` to schedule a WebSocket broadcast on the main event loop
4. Producers use a non-blocking put, so when the queue is full new events are dropped instead of stalling the background thread
Events are broadcast to WebSocket clients subscribed to the relevant `server_id` (or `None` for all servers).
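The queue-to-event-loop bridge could look roughly like the sketch below. It assumes the event is a dict carrying a `server_id` key and that the broadcast target is an async callable taking `(server_id, message)`; the `publish` helper and the constructor parameters are hypothetical names, not the project's API.

```python
import asyncio
import queue
import threading


def publish(event_queue, event):
    """Called from background threads; drops the event if the queue is full."""
    try:
        event_queue.put_nowait(event)
        return True
    except queue.Full:
        return False


class BroadcastThread(threading.Thread):
    def __init__(self, event_queue, loop, broadcast):
        super().__init__(daemon=True)
        self._queue = event_queue
        self._loop = loop            # the main asyncio event loop
        self._broadcast = broadcast  # async callable: (server_id, message)
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            try:
                # A short timeout keeps the thread responsive to stop().
                event = self._queue.get(timeout=0.5)
            except queue.Empty:
                continue
            # Schedule the coroutine on the event loop from this thread.
            asyncio.run_coroutine_threadsafe(
                self._broadcast(event.get("server_id"), event), self._loop
            )
```

`asyncio.run_coroutine_threadsafe()` returns a `concurrent.futures.Future`. The sketch ignores it, which means exceptions raised inside the broadcast coroutine go unnoticed unless the future is inspected or given a done callback.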
## ProcessManager
`ProcessManager` is a singleton that manages server processes via `subprocess.Popen`:
- **`start_process(server_id, cmd, cwd, env)`** — Starts a new subprocess, stores the PID
- **`stop_process(server_id, timeout)`** — Sends terminate signal, waits for exit, force-kills after timeout
- **`kill_process(server_id)`** — Force-kills the process immediately
- **`recover_on_startup(db)`** — On startup, checks all stored PIDs against running processes via `psutil.pid_exists()`. If a process is still alive, marks the server as running. If not, marks it as stopped.
- Thread-safe with per-server `threading.Lock`
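A minimal sketch of the terminate, wait, then force-kill pattern with per-server locks, using only the standard library. The internal attribute names and the default timeout are assumptions, and PID persistence and startup recovery are omitted.

```python
import subprocess
import sys
import threading
from collections import defaultdict


class ProcessManager:
    def __init__(self):
        self._procs = {}                            # server_id -> Popen
        self._locks = defaultdict(threading.Lock)   # one lock per server
        self._locks_guard = threading.Lock()        # protects the lock table

    def _lock_for(self, server_id):
        with self._locks_guard:
            return self._locks[server_id]

    def start_process(self, server_id, cmd, cwd=None, env=None):
        with self._lock_for(server_id):
            proc = self._procs.get(server_id)
            if proc is not None and proc.poll() is None:
                raise RuntimeError(f"server {server_id} is already running")
            proc = subprocess.Popen(cmd, cwd=cwd, env=env)
            self._procs[server_id] = proc
            return proc.pid

    def stop_process(self, server_id, timeout=10.0):
        """Return the exit code, or None if no process was tracked."""
        with self._lock_for(server_id):
            proc = self._procs.pop(server_id, None)
            if proc is None:
                return None
            proc.terminate()  # polite request first
            try:
                return proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()   # force-kill after the grace period
                return proc.wait()
```

For example, `ProcessManager().start_process(1, [sys.executable, "-c", "import time; time.sleep(60)"])` starts a placeholder child process that `stop_process(1, timeout=5)` then terminates.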
## LogTailThread
Tails the Arma 3 .rpt log file for each server:
- Resolves the latest log file path using the adapter's `LogParser.get_latest_log_file()`
- Reads new lines from the end of the file, detecting log rotation (Windows/NTFS safe)
- Parses each line using `RPTParser.parse_line()` to extract timestamp, level, and message
- Persists parsed entries to the `logs` table via `LogRepository`
- Broadcasts `log` events via the global queue
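One way to tail a file while tolerating rotation is to remember a byte offset and reopen the file on every poll, which avoids holding a handle open on Windows. The sketch below is an assumption about the approach rather than the project's implementation; `LogTailer` and `start_at_end` are hypothetical names, and rotation is detected only by the file shrinking.

```python
import os


class LogTailer:
    def __init__(self, path, start_at_end=False):
        self.path = path
        self._partial = b""  # bytes of a line that has no newline yet
        self._pos = 0
        if start_at_end and os.path.exists(path):
            self._pos = os.path.getsize(path)

    def read_new_lines(self):
        try:
            size = os.path.getsize(self.path)
        except OSError:
            return []  # file does not exist (yet)
        if size < self._pos:
            # The file shrank: treat it as rotated/truncated and start over.
            self._pos = 0
            self._partial = b""
        with open(self.path, "rb") as fh:
            fh.seek(self._pos)
            data = fh.read()
        self._pos += len(data)
        # Everything before the last newline is complete; keep the remainder.
        *complete, self._partial = (self._partial + data).split(b"\n")
        return [
            line.rstrip(b"\r").decode("utf-8", errors="replace")
            for line in complete
        ]
```

Working in bytes keeps the stored offset comparable to the file size, and buffering the trailing fragment prevents a half-written line from being parsed twice.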
## ProcessMonitorThread
Monitors each server process for crashes:
- Checks every 5 seconds whether the process is still alive
- If the process has exited unexpectedly:
  1. Updates server status to `crashed`
  2. Logs the crash event
  3. If `auto_restart` is enabled and the restart count has not exceeded `max_restarts` within `restart_window_seconds`:
     - Triggers a restart via `ServerService.start_server()`
     - Increments `restart_count`
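The restart limit can be read as a sliding window over recent restart times. The sketch below shows that interpretation; the `RestartPolicy` class and its injectable `clock` are hypothetical, and the real code may instead keep a plain counter that is reset when the window expires.

```python
import time
from collections import deque


class RestartPolicy:
    def __init__(self, max_restarts, restart_window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window = restart_window_seconds
        self._clock = clock
        self._restarts = deque()  # timestamps of recent restarts

    def should_restart(self):
        """Return True and record the restart if the limit allows one more."""
        now = self._clock()
        # Forget restarts that have fallen out of the window.
        while self._restarts and now - self._restarts[0] >= self.window:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False
        self._restarts.append(now)
        return True
```

With `max_restarts=2` and a 60-second window, two crashes in quick succession are restarted, a third inside the same minute is not, and restarts become available again once the oldest one ages out.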
## MetricsCollectorThread
Collects CPU and RAM metrics for each running server:
- Uses `psutil.Process(pid)` to get CPU and memory usage
- Collects every 10 seconds
- Stores metrics in the `metrics` table via `MetricsRepository`
- Broadcasts `metrics` events via the global queue
## RemoteAdminPollerThread
Polls the BattlEye RCon interface for player list updates:
- Connects via `Arma3RemoteAdmin` using `BERConClient`
- Polls player list every 10 seconds
- Compares current players with previous state to detect joins/leaves
- On player join: upserts to `players` table, inserts to `player_history`, broadcasts `players` event
- On player leave: removes from `players`, updates `left_at` in `player_history`, broadcasts `players` event
- On RCon connection failure: reconnects with exponential backoff
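Join/leave detection reduces to a set difference between two snapshots of the player list. The sketch below assumes each snapshot is a dict keyed by some stable player identifier; the function name and data shape are illustrative.

```python
def diff_players(previous, current):
    """Compare two {player_id: name} snapshots.

    Returns (joined, left), each a dict of the players that appeared in or
    disappeared from the current snapshot.
    """
    joined = {pid: current[pid] for pid in current.keys() - previous.keys()}
    left = {pid: previous[pid] for pid in previous.keys() - current.keys()}
    return joined, left
```

For example, going from `{"a": "Alice", "b": "Bob"}` to `{"b": "Bob", "c": "Carol"}` reports Carol as joined and Alice as left, while Bob produces no event.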
## WebSocketManager
Runs on the main asyncio event loop:
- Clients connect to `/ws?token=JWT&server_id=N`
- JWT is validated on connection; invalid tokens close with code 4001
- Clients subscribe to specific `server_id`s or `None` (all servers)
- `broadcast(server_id, message)` sends JSON-encoded messages to matching subscribers
- `disconnect(websocket)` removes the client from the registry
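- Coroutine-safe via `asyncio.Lock`; the lock serializes tasks on the event loop and does not protect against calls from other threads, which must go through the broadcast queue instead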
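A minimal sketch of the subscription registry follows. It assumes connection objects expose an async `send_text()` method, as Starlette's `WebSocket` does; the `connect` signature and the choice to drop clients whose send fails are assumptions, and JWT validation is omitted.

```python
import asyncio
import json


class WebSocketManager:
    def __init__(self):
        self._clients = {}  # websocket -> subscribed server_id (None = all)
        self._lock = asyncio.Lock()

    async def connect(self, websocket, server_id=None):
        async with self._lock:
            self._clients[websocket] = server_id

    async def disconnect(self, websocket):
        async with self._lock:
            self._clients.pop(websocket, None)

    async def broadcast(self, server_id, message):
        payload = json.dumps(message)
        # Snapshot the matching clients under the lock, send outside it so a
        # slow client cannot block connects and disconnects.
        async with self._lock:
            targets = [
                ws for ws, subscribed in self._clients.items()
                if subscribed is None or subscribed == server_id
            ]
        for ws in targets:
            try:
                await ws.send_text(payload)
            except Exception:
                await self.disconnect(ws)  # assume the client is gone
```

All of these methods must run on the event loop; background threads reach `broadcast()` only indirectly, through the queue consumed by `BroadcastThread`.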
## Thread Safety Rules
1. **Database access**: Each thread uses its own connection via `get_thread_db()`. No shared DB connections.
2. **WebSocket broadcasting**: Threads write to `queue.Queue`, which is thread-safe. Only `BroadcastThread` reads from the queue.
3. **Process management**: `ProcessManager` uses per-server locks for thread-safe start/stop operations.
4. **SQLite WAL mode**: Enables concurrent reads from multiple threads while a single writer operates.
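5. **Asyncio locks**: `WebSocketManager` uses `asyncio.Lock` to serialize connection registry modifications between coroutines. The lock is not thread-safe, so background threads never touch the registry directly (see rule 2).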
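Rules 1 and 4 can be illustrated with a thread-local connection helper. The sketch below uses the standard-library `sqlite3` module as a stand-in for the project's SQLAlchemy setup; the `path` parameter and the 5-second busy timeout are assumptions.

```python
import sqlite3
import threading

_local = threading.local()


def get_thread_db(path):
    """Return this thread's own connection, creating it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # timeout: how long to wait on a locked database before raising.
        conn = sqlite3.connect(path, timeout=5.0)
        # WAL lets readers proceed while a single writer is active.
        conn.execute("PRAGMA journal_mode=WAL")
        _local.conn = conn
    return conn
```

Because the connection lives in `threading.local()`, two threads calling `get_thread_db()` never share a connection object, while repeated calls within one thread reuse the same one. WAL mode requires a file-backed database; an in-memory database reports `memory` as its journal mode.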
## Scheduled Jobs
APScheduler `BackgroundScheduler` runs 3 cleanup cron jobs:
| Job | Schedule | Cleanup |
|---|---|---|
| Clean up old log entries | Daily at 03:00 | `DELETE FROM logs WHERE created_at < datetime('now', '-7 days')` |
| Clean up old metrics | Every 6 hours | `DELETE FROM metrics WHERE timestamp < datetime('now', '-1 day')` |
| Clean up old events | Weekly (Sunday 04:00) | `DELETE FROM server_events WHERE created_at < datetime('now', '-30 days')` |
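The retention statements in the table can be exercised directly with `sqlite3`. In the sketch below the `run_cleanup` helper and the `RETENTION_SQL` mapping are illustrative names; wiring the helper into APScheduler is left out.

```python
import sqlite3

# The DELETE statements are taken from the table above.
RETENTION_SQL = {
    "logs": "DELETE FROM logs WHERE created_at < datetime('now', '-7 days')",
    "metrics": "DELETE FROM metrics WHERE timestamp < datetime('now', '-1 day')",
    "server_events": (
        "DELETE FROM server_events "
        "WHERE created_at < datetime('now', '-30 days')"
    ),
}


def run_cleanup(conn, job):
    """Run one retention job and return the number of rows deleted."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(RETENTION_SQL[job]).rowcount
```

The comparison is a plain text comparison, so it is only correct when the timestamp columns hold UTC values in the `YYYY-MM-DD HH:MM:SS` form that SQLite's `datetime()` produces.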
## Startup Sequence
1. Init DB engine and run pending migrations
2. Register built-in adapters (Arma 3) and scan for third-party plugins
3. Create `WebSocketManager` (asyncio-only)
4. Create global `BroadcastThread` (queue → asyncio bridge)
5. Create `ThreadRegistry` with `ProcessManager` and adapter registry
6. Recover game server processes that survived a backend restart (PID validation via psutil)
7. Re-attach monitoring threads for running servers
8. Seed default admin user if no users exist
9. Register and start APScheduler cleanup jobs
## Shutdown Sequence
1. Stop all server threads via `ThreadRegistry.stop_all()`
2. Stop `BroadcastThread` and join with 5s timeout
3. Stop APScheduler