feat: initial system design documents for Languard Server Manager
Complete backend design for an Arma 3 dedicated server management panel:

- ARCHITECTURE.md: System architecture, tech stack, component responsibilities, data flows
- DATABASE.md: SQLite schema with WAL mode, CHECK constraints, 16+ tables
- API.md: REST + WebSocket API contract with auth, CRUD, and real-time channels
- MODULES.md: Python module breakdown with class definitions and dependencies
- THREADING.md: Concurrency model with thread safety, auto-restart, and WS bridge
- IMPLEMENTATION_PLAN.md: 7-phase implementation plan with security from Phase 1

Key design decisions:

- Sync SQLAlchemy only (no aiosqlite), thread-local DB connections
- Structured config builder (not f-strings) preventing config injection
- RCon request multiplexer for concurrent UDP access
- BackgroundScheduler for sync DB cleanup jobs
- ban.txt bidirectional sync with documented field mapping
- Auto-restart sequenced after thread cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
445
IMPLEMENTATION_PLAN.md
Normal file
@@ -0,0 +1,445 @@
# Languard Server Manager — Implementation Plan

## Prerequisites

Before starting, ensure the following are available:
- Python 3.11+
- A working Arma 3 dedicated server installation (for testing)
- Node.js 18+ (for frontend dev server)
- The reference docs: ARCHITECTURE.md, DATABASE.md, API.md, MODULES.md, THREADING.md

---

## Phase 1 — Foundation (Start Here)

**Goal:** Running FastAPI server with DB, auth, and basic server CRUD.

### Step 1.1 — Project scaffold

```
mkdir backend
cd backend
python -m venv venv
# Windows shown; on Linux/macOS use: source venv/bin/activate
venv\Scripts\activate
pip install fastapi uvicorn[standard] sqlalchemy python-jose[cryptography] passlib[bcrypt] cryptography psutil apscheduler python-multipart slowapi pytest pytest-asyncio httpx
# uvloop (faster event loop) is Linux/macOS only — skip on Windows:
# pip install uvloop  # only on Linux/macOS
pip freeze > requirements.txt
```

Create:
- `backend/config.py` — Settings class (see MODULES.md)
- `backend/main.py` — FastAPI app factory, startup/shutdown hooks
- `backend/conftest.py` — pytest fixtures (in-memory SQLite, test client)
- `.env.example` — All env vars documented

### Step 1.2 — Database + Migrations

1. Create `backend/migrations/001_initial_schema.sql` — all tables from DATABASE.md
   - Include all CHECK constraints (role, status, verify_signatures, von_codec_quality, etc.)
   - Include `PRAGMA busy_timeout=5000` in engine setup
   - **Important:** Put `CREATE TABLE IF NOT EXISTS schema_migrations` as the very first
     statement — the migration runner queries this table before it can track anything.
2. Create `backend/dal/event_repository.py` — `ServerEventRepository` (needed by Phase 3 threads)
3. Create `backend/database.py`:
   - `get_engine()` with WAL + FK pragma
   - `run_migrations()` — reads and applies `.sql` files from migrations/
   - `get_db()` — FastAPI dependency (sync session)
   - `get_thread_db()` — thread-local session factory
4. Call `run_migrations()` in `main.py:on_startup()`

**Test:** Start the app and confirm `languard.db` is created with all tables. Run `pytest` with in-memory SQLite to verify the schema creates cleanly.

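The pragma setup and migration runner described in this step can be sketched as follows. This is a minimal illustration using the standard-library `sqlite3` module rather than the SQLAlchemy engine the plan calls for. The `connect()` helper and the `schema_migrations` column names (`filename`, `applied_at`) are assumptions, not taken from DATABASE.md. For simplicity, the runner creates the tracking table itself instead of relying on the first migration file to do so.

```python
import sqlite3
from pathlib import Path


def connect(db_path: str) -> sqlite3.Connection:
    """Open a connection with the pragmas this step asks for."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")   # readers do not block the writer
    conn.execute("PRAGMA foreign_keys=ON")    # SQLite leaves FK enforcement off by default
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked database
    return conn


def run_migrations(conn: sqlite3.Connection, migrations_dir: Path) -> list[str]:
    """Apply unapplied .sql files in filename order; return the names applied."""
    # Hypothetical tracking-table layout; the real one lives in DATABASE.md.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "filename TEXT PRIMARY KEY, "
        "applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT filename FROM schema_migrations")}
    newly_applied = []
    for path in sorted(migrations_dir.glob("*.sql")):
        if path.name in applied:
            continue  # already recorded, so skip it
        conn.executescript(path.read_text(encoding="utf-8"))
        conn.execute("INSERT INTO schema_migrations (filename) VALUES (?)", (path.name,))
        conn.commit()
        newly_applied.append(path.name)
    return newly_applied
```

Calling `run_migrations()` a second time returns an empty list, which makes it safe to call on every startup.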
### Step 1.3 — Auth module

1. `backend/auth/utils.py` — `hash_password`, `verify_password`, `create_access_token`, `decode_access_token`
2. `backend/auth/schemas.py` — `LoginRequest`, `TokenResponse`, `UserResponse`
3. `backend/auth/service.py` — `AuthService` (create user, login, list users)
4. `backend/auth/router.py` — login, me, users CRUD
5. `backend/dependencies.py` — `get_current_user`, `require_admin`
6. `main.py` — seed default admin user on first startup if users table empty
   - **Generate a random password** and print it to stdout once (NOT admin/admin)
   - Add rate limiting to `POST /auth/login` (5 attempts/minute per IP via slowapi)
   - Add input sanitization for all string fields in auth schemas

**Test:** `POST /api/auth/login` returns JWT. `GET /api/auth/me` with token returns user. Rate limiting returns 429 after 5 failed attempts.

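The first-run admin seeding in item 6 could look roughly like this. It is a sketch under stated assumptions: `seed_admin_if_empty()` and its `create_user` callback are hypothetical names standing in for whatever `AuthService` exposes, and the callback is assumed to hash the password before storing it.

```python
import secrets
from typing import Callable, Optional


def generate_initial_password(n_bytes: int = 18) -> str:
    """Return a URL-safe random password drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(n_bytes)


def seed_admin_if_empty(
    user_count: int,
    create_user: Callable[[str, str], None],
) -> Optional[str]:
    """Create the default admin only when no users exist; return the password once."""
    if user_count > 0:
        return None  # users already exist, so never reseed
    password = generate_initial_password()
    create_user("admin", password)  # assumed to hash before storing
    print(f"Initial admin password: {password}")
    return password
```

Using `secrets` rather than `random` matters here because the value is a credential. The password is printed once and never written to the log file or the database in clear text.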
### Step 1.4 — Server CRUD (no process management yet)

1. `backend/dal/server_repository.py`
2. `backend/dal/config_repository.py`
3. `backend/servers/schemas.py`
4. `backend/servers/router.py` — GET, POST, PUT, DELETE /servers and /servers/{id}
5. `backend/servers/service.py` — CRUD methods only (skip start/stop for now)
6. `backend/utils/file_utils.py` — `ensure_server_dirs()`, `sanitize_filename()`
7. `backend/utils/port_checker.py` — `is_port_in_use()`, `check_server_ports_available()`
8. Port validation on create/start: check game_port through game_port+4

**Test:** Create server via API, confirm DB row + directory created.

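A minimal sketch of the port check in items 7 and 8 follows, assuming the derived ports are UDP (Arma 3's game traffic is UDP) and that a failed bind is a good enough signal that a port is taken. The `host` parameter and the choice to return the list of conflicting ports are illustrative, not taken from MODULES.md.

```python
import socket


def is_port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a UDP bind to (host, port) fails."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False


def check_server_ports_available(game_port: int, host: str = "0.0.0.0") -> list[int]:
    """Return the ports in game_port..game_port+4 that are already taken."""
    return [p for p in range(game_port, game_port + 5) if is_port_in_use(p, host)]
```

An empty list means all five ports are free. The check is inherently racy, because another process can grab a port between the check and the server start, so it should be treated as an early warning rather than a guarantee.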
---

## Phase 2 — Process Management

**Goal:** Start/stop actual `arma3server.exe` processes.

### Step 2.1 — Config Generator

1. `backend/servers/config_generator.py`
2. **Use a structured builder** (NOT f-strings) — escape double quotes and newlines in all user-supplied string values to prevent config injection
3. Write `server.cfg` covering all params from DATABASE.md, including mission rotation as `class Missions {}` block
4. Write `basic.cfg`
5. Write `server.Arma3Profile` — **written to `servers/{id}/server/server.Arma3Profile`** (Arma 3 reads from the `-name` subdirectory)
6. Write `BESERVER_CFG_TEMPLATE` — **required for BattlEye RCon to work**
   ```
   # servers/{id}/battleye/beserver.cfg
   RConPassword {rcon_password}
   RConPort {rcon_port}
   ```
   `write_beserver_cfg()` must create the `battleye/` directory and write this file.
   Without it BattlEye will not open an RCon port regardless of launch parameters.
7. `build_launch_args()` — assembles full CLI arg list
   - Include `-bepath=./battleye` to point BE at the generated config (relative to cwd)
   - Include `-profiles=./` and `-name=server` for profile directory
   - All relative paths resolve against `cwd=servers/{id}/` set in ProcessManager
8. Set file permissions 0600 on config files containing passwords (server.cfg, beserver.cfg). On Windows, `os.chmod` only toggles the read-only flag, so restrict access with NTFS ACLs there.

**Test:** `ConfigGenerator.write_all(server_id)` → inspect all generated files for correctness.
- Verify `servers/{id}/battleye/beserver.cfg` exists with the correct RCon password.
- Verify `servers/{id}/server/server.Arma3Profile` exists.
- Test config injection prevention: set hostname to `X"; passwordAdmin = "pwned"; //` and verify that the injected text stays inside the quoted `hostname` value and does NOT become a separate `passwordAdmin` directive in the generated server.cfg.
- Validate generated `server.cfg` manually by running the server with it.

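The structured builder and escaping rule from item 2 can be illustrated with a small sketch. The `ConfigBuilder` class and `escape_config_string()` are hypothetical names. The sketch also assumes that Arma config strings escape an embedded double quote by doubling it and that newlines can simply be flattened to spaces. Both assumptions should be checked against real server behaviour before use.

```python
from typing import Union

Scalar = Union[str, int, float, bool]


def escape_config_string(value: str) -> str:
    """Neutralise characters that could end a quoted value early."""
    cleaned = value.replace("\r", " ").replace("\n", " ")  # no multi-line values
    return cleaned.replace('"', '""')  # assumed Arma convention: double the quote


class ConfigBuilder:
    """Collects key/value pairs and renders them; raw values never reach the output."""

    def __init__(self) -> None:
        self._settings: list[tuple[str, Scalar]] = []

    def set(self, key: str, value: Scalar) -> "ConfigBuilder":
        if not key.isidentifier():
            raise ValueError(f"invalid config key: {key!r}")
        self._settings.append((key, value))
        return self

    def render(self) -> str:
        lines = []
        for key, value in self._settings:
            if isinstance(value, bool):  # check bool first: bool is a subclass of int
                rendered = "1" if value else "0"
            elif isinstance(value, (int, float)):
                rendered = str(value)
            else:
                rendered = '"' + escape_config_string(value) + '"'
            lines.append(key + " = " + rendered + ";")
        return "\n".join(lines) + "\n"
```

With the hostname from the injection test, `ConfigBuilder().set("hostname", 'X"; passwordAdmin = "pwned"; //').render()` yields a single line whose embedded quotes are all doubled, so the payload remains part of the `hostname` string.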
### Step 2.2 — Process Manager

1. `backend/servers/process_manager.py` — `ProcessManager` singleton
2. `start(server_id, exe_path, args, cwd=servers/{id}/)` — `subprocess.Popen` with cwd set to the server instance dir
3. `stop(server_id, timeout=30)` — on Windows, `Popen.terminate()` calls `TerminateProcess()`, which is a hard kill (there is no SIGTERM equivalent). Graceful shutdown is via RCon `#shutdown` in ServerService.
4. `kill()`, `is_running()`, `get_pid()`
5. `recover_on_startup()` — verify PID is alive AND process name matches arma3server (guards against PID reuse)
6. Wire `ServerService.start()` and `ServerService.stop()`
7. Add `POST /servers/{id}/start`, `POST /servers/{id}/stop`, `POST /servers/{id}/kill` endpoints

**Test:** Start a server via API → confirm process appears in Task Manager. Stop it → confirm process ends.

### Step 2.3 — Config endpoints

1. `GET /servers/{id}/config`
2. `PUT /servers/{id}/config/server`
3. `PUT /servers/{id}/config/basic`
4. `PUT /servers/{id}/config/profile`
5. `PUT /servers/{id}/config/launch`
6. `GET /servers/{id}/config/preview`

**Test:** Update hostname via API → regenerate and start server → confirm new hostname appears in server browser.

---

## Phase 3 — Background Threads

**Goal:** Live monitoring — process crash detection, log tailing, metrics.

### Step 3.1 — Thread infrastructure

1. `backend/threads/base_thread.py` — `BaseServerThread`
2. `backend/threads/thread_registry.py` — `ThreadRegistry` singleton
3. Wire `start_server_threads()` / `stop_server_threads()` into `ServerService.start()` / `ServerService.stop()`

### Step 3.2 — Process Monitor Thread

1. `backend/threads/process_monitor.py`
2. Crash detection + status update in DB
3. Auto-restart with exponential backoff

**Test:** Start server → kill process manually → confirm DB status changes to 'crashed'.

**Test:** Enable auto_restart → kill → confirm server restarts automatically.

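The exponential backoff in item 3 comes down to one small function. The base delay of 5 seconds and the 300-second cap below are placeholder values chosen for illustration. THREADING.md is the authority for the real numbers and for any limit on restart attempts.

```python
def restart_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before restart number `attempt` (1-based), doubling each time."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(cap, base * 2 ** (attempt - 1))
```

With these defaults the delays grow as 5, 10, 20, 40, 80, 160 seconds and then stay at the 300-second cap. The monitor thread would typically reset the attempt counter once the server has stayed up for some minimum period.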
### Step 3.3 — Log Tail Thread

1. `backend/logs/parser.py` — `RPTParser`
2. `backend/dal/log_repository.py`
3. `backend/threads/log_tail.py`
4. `backend/logs/service.py`
5. `backend/logs/router.py` — `GET /servers/{id}/logs`

**Test:** Start server → `GET /api/servers/{id}/logs` returns recent RPT lines.

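The core of a log tail loop is reading only what has been appended since the last poll. The sketch below shows one way to do that with a plain file handle. `read_new_lines()` is a hypothetical helper, and the real `LogTailThread` would additionally need to handle the server rotating to a new RPT file.

```python
from typing import IO, Iterator


def read_new_lines(handle: IO[str]) -> Iterator[str]:
    """Yield complete lines appended since the last call.

    A trailing partial line (the writer has not emitted its newline yet)
    is left unread so the next poll returns it whole.
    """
    while True:
        position = handle.tell()
        line = handle.readline()
        if not line:
            return  # reached end of file
        if not line.endswith("\n"):
            handle.seek(position)  # partial write: rewind and retry later
            return
        yield line.rstrip("\r\n")
```

A thread would call `read_new_lines()` in a loop with a short sleep between polls, passing each line to `RPTParser` and then to the repository.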
### Step 3.4 — Metrics Collector Thread

1. `backend/metrics/service.py`
2. `backend/dal/metrics_repository.py`
3. `backend/threads/metrics_collector.py`
4. `backend/metrics/router.py` — `GET /servers/{id}/metrics`

**Test:** Running server → query metrics endpoint → see CPU/RAM data points.

---

## Phase 4 — BattlEye RCon

**Goal:** Real-time player list, in-game admin commands.

### Step 4.1 — RCon Client

1. `backend/rcon/client.py` — `BERConClient`
2. Implement BE RCon UDP protocol:
   - Packet structure: `'BE'` + CRC32 (little-endian) + `0xFF` + type byte + payload
   - Login: type `0x00`, payload = password
   - Command: type `0x01`, payload = sequence byte + command string
   - Keepalive: an empty command packet (type `0x01`, sequence byte, no command string), sent at least every 45 seconds
3. **Request multiplexer**: track pending requests by sequence byte, route responses to correct caller via `threading.Event` per request. Background receiver thread reads all incoming packets.
4. `parse_players_response()` — parse `players` command output
5. Handle unsolicited server messages (type `0x02`): acknowledge each with a type `0x02` packet echoing its sequence byte, then enqueue it for event logging

BattlEye RCon packet format reference:
```
Login packet (client → server):
  42 45          # 'BE'
  [CRC32 LE]     # checksum of bytes after CRC
  FF             # packet type prefix
  00             # login type
  [password]     # ASCII password

Command packet:
  42 45
  [CRC32 LE]
  FF
  01
  [seq byte]     # 0x00-0xFF, wraps around
  [command]      # ASCII command string

Command response (server → client):
  42 45
  [CRC32 LE]
  FF
  01             # 0x01 = command response (same type byte as outgoing command)
  [seq byte]
  [response]     # ASCII response text

Server-pushed message (server → client, unsolicited):
  42 45
  [CRC32 LE]
  FF
  02             # 0x02 = server message (chat events, kill events, etc.)
  [seq byte]
  [message]      # ASCII message text

Server message acknowledgement (client → server):
  42 45
  [CRC32 LE]
  FF
  02
  [seq byte]     # sequence byte copied from the server message
```

**Test:** Connect BERConClient to a running server with BattlEye → successfully login → send `players` → receive response.

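The packet layout in the reference above maps directly onto a few lines of `struct` and `zlib`. This is a sketch of framing and validation only, with hypothetical function names. It leaves out the socket handling, the request multiplexer and multi-part responses.

```python
import struct
import zlib

LOGIN, COMMAND, SERVER_MESSAGE = 0x00, 0x01, 0x02


def build_packet(packet_type: int, payload: bytes = b"") -> bytes:
    """Frame a payload: 'BE' + CRC32 (little-endian) + 0xFF + type + payload."""
    body = bytes([0xFF, packet_type]) + payload
    return b"BE" + struct.pack("<I", zlib.crc32(body)) + body


def build_login(password: str) -> bytes:
    return build_packet(LOGIN, password.encode("ascii"))


def build_command(seq: int, command: str) -> bytes:
    """An empty `command` doubles as the keepalive packet."""
    return build_packet(COMMAND, bytes([seq & 0xFF]) + command.encode("ascii"))


def parse_packet(data: bytes) -> tuple[int, bytes]:
    """Validate header and checksum; return (type byte, payload)."""
    if len(data) < 8 or data[:2] != b"BE" or data[6] != 0xFF:
        raise ValueError("not a BattlEye RCon packet")
    (expected,) = struct.unpack("<I", data[2:6])
    if zlib.crc32(data[6:]) != expected:
        raise ValueError("CRC32 mismatch")
    return data[7], data[8:]
```

For a command response, the first payload byte is the sequence number, which is what the multiplexer would use to find the waiting caller.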
### Step 4.2 — RCon Service + Poller Thread

1. `backend/rcon/service.py` — `RConService`
2. `backend/threads/rcon_poller.py`
3. `backend/dal/player_repository.py`
4. `backend/players/service.py`
5. `backend/players/router.py` — `GET /servers/{id}/players`

**Test:** Players join server → `GET /players` returns them with pings.

### Step 4.3 — Admin Actions via RCon

1. `POST /servers/{id}/players/{num}/kick`
2. `POST /servers/{id}/players/{num}/ban`
3. `POST /servers/{id}/rcon/command`
4. `POST /servers/{id}/rcon/say`
5. `backend/dal/ban_repository.py`
6. `GET/POST/DELETE /servers/{id}/bans`
7. **ban.txt bidirectional sync**: on ban add/delete via API, write to `battleye/ban.txt`; on startup, read `ban.txt` and upsert into DB

**Test:** Kick a player via API → confirm player disconnected from server.

---

## Phase 5 — WebSocket Real-Time

**Goal:** Live updates to React frontend without polling.

### Step 5.1 — Broadcast infrastructure

1. `backend/websocket/broadcaster.py` — `BroadcastThread` + `enqueue()`
2. `backend/websocket/manager.py` — `ConnectionManager`
3. Store event loop reference in `main.py:on_startup()`:
   ```python
   import asyncio

   # on_startup() runs inside the asyncio event loop, so get_running_loop()
   # returns that loop directly. It is the recommended call inside coroutines
   # and callbacks; get_event_loop() is deprecated (Python 3.10+) when no loop
   # is running.
   _event_loop = asyncio.get_running_loop()
   broadcaster.init(_event_loop, connection_manager)
   ```
4. Start `BroadcastThread` in `on_startup()`
5. Wire `BroadcastThread.enqueue()` calls into all background threads

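The reason for storing the loop reference is that background threads must not touch asyncio objects directly. A minimal sketch of the hand-off, using `loop.call_soon_threadsafe()`, is shown here. `enqueue_from_thread()` and `demo()` are illustrative names, and the actual `BroadcastThread` design in THREADING.md may use a different mechanism such as `asyncio.run_coroutine_threadsafe()`.

```python
import asyncio
import threading


def enqueue_from_thread(
    loop: asyncio.AbstractEventLoop, queue: asyncio.Queue, message: dict
) -> None:
    """Hand a message to the event loop from any thread.

    asyncio.Queue is not thread-safe, so the put is scheduled onto the loop
    instead of being called from the worker thread.
    """
    loop.call_soon_threadsafe(queue.put_nowait, message)


async def demo() -> list:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        # Stands in for a log tail or metrics thread producing events.
        for n in range(3):
            enqueue_from_thread(loop, queue, {"type": "log", "line": n})

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    received = [await queue.get() for _ in range(3)]  # the WS sender would live here
    thread.join()
    return received
```

Running `asyncio.run(demo())` returns the three messages in the order the worker produced them, because callbacks scheduled with `call_soon_threadsafe()` run in FIFO order.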
### Step 5.2 — WebSocket endpoint

1. `backend/websocket/router.py`
2. JWT validation from query param
3. Subscribe/unsubscribe message handling
4. Ping/pong keepalive

**Test:** Connect to `ws://localhost:8000/ws/1?token=...` → see live log lines stream in terminal.

### Step 5.3 — Integrate all event sources

Wire `BroadcastThread.enqueue()` into:
- `ProcessMonitorThread` → status updates, crash events
- `LogTailThread` → log lines
- `MetricsCollectorThread` → metrics snapshots
- `RConPollerThread` → player list updates
- `ServerService.start/stop` → status transitions

**Test:** React frontend connects to WS → server starts → see status, logs, metrics all update in real time.

---

## Phase 6 — Mission & Mod Management

### Step 6.1 — Missions

1. `backend/missions/service.py`
2. `backend/missions/router.py`
3. Upload PBO validation (check `.pbo` extension, parse name)
4. Mission rotation CRUD

**Test:** Upload a `.pbo` → appears in `GET /missions` → set as rotation → start server → mission available.

### Step 6.2 — Mods

1. `backend/mods/service.py`
2. `backend/mods/router.py`
3. `build_mod_string()` — assemble `-mod=` and `-serverMod=` args
4. Wire mod string into `ConfigGenerator.build_launch_args()`

**Test:** Register `@CBA_A3` → enable on server → start → server loads mod.

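Assembling the mod arguments in item 3 amounts to joining folder names with semicolons. The sketch below uses a hypothetical `build_mod_args()` that returns a list ready to append to the launch arguments. Rejecting names that contain `;` or `"` is an added precaution, not something the plan specifies.

```python
def build_mod_args(mods: list[str], server_mods: list[str]) -> list[str]:
    """Return the -mod= / -serverMod= arguments, omitting empty lists."""
    for name in [*mods, *server_mods]:
        if not name or ";" in name or '"' in name:
            # A separator inside a name would silently split it into two mods.
            raise ValueError(f"invalid mod folder name: {name!r}")
    args = []
    if mods:
        args.append("-mod=" + ";".join(mods))
    if server_mods:
        args.append("-serverMod=" + ";".join(server_mods))
    return args
```

Because the arguments are passed to `subprocess.Popen` as a list, no shell quoting of the semicolons is needed.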
---

## Phase 7 — Polish & Production

### Step 7.1 — APScheduler jobs

Add to `on_startup()`:
```python
# Use BackgroundScheduler (not AsyncIOScheduler) because cleanup methods
# perform sync SQLite operations. AsyncIOScheduler would block the event loop.
from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()
scheduler.add_job(log_service.cleanup_old_logs, 'cron', hour=3)
scheduler.add_job(metrics_service.cleanup_old_metrics, 'cron', hour=3, minute=30)
scheduler.add_job(player_service.cleanup_old_history, 'cron', hour=4)  # 90-day retention
scheduler.start()
```

### Step 7.2 — Startup recovery

In `on_startup()` → `ProcessManager.recover_on_startup()`:
- Query DB for servers with `status='running'`
- Check if PID is still alive (`psutil.pid_exists(pid)`) and that the process name matches arma3server, as required in Step 2.2
- If alive: re-attach threads (skip process start, just start monitoring threads)
- If dead: mark as `crashed`, clear players

### Step 7.3 — Events log

1. `backend/dal/event_repository.py` (already created in Step 1.2)
2. Insert events for: start, stop, crash, kick, ban, config change, mission change
3. `GET /servers/{id}/events` endpoint

### Step 7.4 — Security hardening (additional layers)

1. Encrypt sensitive DB fields: `password`, `password_admin`, `rcon_password`
   - `backend/utils/crypto.py` with Fernet
   - **Key format:** `LANGUARD_ENCRYPTION_KEY` must be a Fernet base64 key, NOT hex.
     Generate with: `python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"`
     Passing a hex string to `Fernet()` raises `ValueError` at startup.
   - Encrypt on write, decrypt on read in repositories
   - **NOTE:** Core security (rate limiting, input sanitization, config escaping, exe path validation) is already in Phases 1-2.
2. Additional penetration testing and security audit
3. Content-Security-Policy headers for frontend

### Step 7.5 — Frontend integration checklist

Verify React app can:
- [ ] Login and store JWT
- [ ] List servers with live status
- [ ] Start/stop server and see status update via WebSocket (no page refresh)
- [ ] View streaming log output
- [ ] See player list update every 10s
- [ ] See CPU/RAM charts update every 5s
- [ ] Edit all config sections and see preview
- [ ] Upload a mission PBO
- [ ] Kick a player
- [ ] Send a message to all players

---

## Testing Strategy

### Unit tests (pytest)
- `ConfigGenerator.write_server_cfg()` — compare output against expected string; test config injection prevention
- `ConfigGenerator._escape_config_string()` — test double-quote and newline escaping
- `RPTParser.parse_line()` — test all log formats
- `BERConClient.parse_players_response()` — test with sample output
- `AuthService.login()` — correct password / wrong password / rate limiting
- Repository methods — use in-memory SQLite (`:memory:`)
- `check_server_ports_available()` — test derived port validation
- `sanitize_filename()` — test path traversal prevention
- In-memory SQLite setup in `conftest.py` — shared fixture for all repository tests

### Integration tests
- Full start/stop cycle with a real arma3server.exe (manual — requires licensed Arma 3 installation, not in CI)
- WebSocket message delivery (can be automated with FastAPI's `TestClient.websocket_connect()`; httpx alone does not speak WebSocket)
- RCon command round-trip (manual — requires running server with BattlEye)

### Load notes
- SQLite with WAL handles concurrent reads from 4 threads per server well
- For >10 simultaneous servers, consider connection pool size tuning
- WebSocket broadcast is expected to handle ~100 concurrent connections without issue

---

## Environment Setup (Developer)

```bash
# 1. Clone repo
git clone <repo>
cd languard-server-manager

# 2. Backend
cd backend
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

# 3. Environment
cp .env.example .env
# Edit .env: set LANGUARD_ARMA_EXE to your arma3server_x64.exe path

# 4. Run backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000

# 5. Frontend (separate)
cd ../frontend
npm install
npm run dev
```

Backend auto-creates `languard.db` and seeds an admin user on first run:
- Username: `admin`
- Password: **randomly generated** and printed to stdout once (e.g., `Initial admin password: a7b9c2d4e5f6...`)
- Change immediately via `PUT /api/auth/password`

---

## Phase Summary

| Phase | Deliverable | Est. Complexity |
|-------|-------------|-----------------|
| 1 | Foundation (auth + server CRUD) | Low |
| 2 | Process management + config gen | Medium |
| 3 | Background threads (monitor, logs, metrics) | Medium-High |
| 4 | BattlEye RCon (player list, admin cmds) | High |
| 5 | WebSocket real-time | Medium |
| 6 | Mission + mod management | Low-Medium |
| 7 | Polish, security, recovery | Medium |

Implement phases in order — each phase builds on the previous and is independently testable.