feat: multi-game adapter revamp, council protocol merge, and frontend design doc
- Revamp architecture for modular game server support (Arma 3 first, extensible)
- Merge ConfigSchema into ConfigGenerator per council decision (8→7 protocols)
- Add has_capability() method to GameAdapter protocol for explicit capability probing
- Add FRONTEND.md: production-grade dark neumorphism design with amber/orange palette
- Update all docs (ARCHITECTURE, MODULES, DATABASE, API, IMPLEMENTATION_PLAN, THREADING) to reflect protocol merge and multi-game adapter patterns
@@ -8,6 +8,8 @@ The system uses a hybrid concurrency model:
- **Queue** bridges the thread world → asyncio world for WebSocket broadcasting
- **SQLAlchemy sync sessions** are used in threads (thread-local connections)
+
+The key change for multi-game support: **core threads are game-agnostic** and receive game-specific behavior (log parsers, remote admin clients) via dependency injection from the adapter.
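For orientation, here is a minimal sketch of the adapter surface the core threads rely on. The method names are taken from the `ThreadRegistry` code below and `has_capability()` from the commit message; the authoritative protocol definition lives in the other docs, so treat this as illustrative:

```python
# Hypothetical sketch of the GameAdapter surface used by the core threads.
# get_log_parser / get_remote_admin / get_custom_thread_factories appear in
# the ThreadRegistry code below; has_capability() comes from the commit message.
from typing import Callable, Optional, Protocol

class GameAdapter(Protocol):
    def has_capability(self, name: str) -> bool:
        """Explicit capability probing, e.g. adapter.has_capability('remote_admin')."""
        ...

    def get_log_parser(self) -> "LogParser": ...

    def get_remote_admin(self) -> Optional["RemoteAdmin"]:
        """Return None if the game has no remote admin protocol."""
        ...

    def get_custom_thread_factories(self) -> list[Callable[[int, "Connection"], "BaseServerThread"]]: ...
```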
---

## Thread Map
@@ -16,7 +18,7 @@ The system uses a hybrid concurrency model:

Main Process (FastAPI / asyncio event loop)
│
├── [uvicorn] HTTP/WS event loop (asyncio)
-│     ├── REST request handlers (async def)
+│     ├── REST request handlers (async def / plain def)
│     └── WebSocket handlers (async def)
│
├── BroadcastThread (daemon thread, 1 global)
||||
@@ -25,14 +27,66 @@ Main Process (FastAPI / asyncio event loop)
│         → ConnectionManager.broadcast()
│
└── Per-running-server thread group (started when server starts, stopped when server stops):
-      ├── ProcessMonitorThread (1 per server, 1s interval)
-      ├── LogTailThread (1 per server, 100ms interval)
-      ├── MetricsCollectorThread (1 per server, 5s interval)
-      └── RConPollerThread (1 per server, 10s interval, 30s startup delay)
+      ├── ProcessMonitorThread (1 per server, 1s interval) — CORE
+      ├── LogTailThread (1 per server, 100ms interval) — CORE + adapter LogParser
+      ├── MetricsCollectorThread (1 per server, 5s interval) — CORE
+      └── RemoteAdminPollerThread (1 per server, 10s interval) — CORE + adapter RemoteAdmin
```

For **N running servers**, there are:
- `4*N` background threads + 1 BroadcastThread = `4N+1` background threads total
+- (If the adapter has no `remote_admin`, RemoteAdminPollerThread is skipped → `3*N+1`)
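To make the arithmetic concrete, a quick illustrative sanity check:

```python
# Illustrative arithmetic for the thread counts above.
def background_thread_count(n_servers: int, has_remote_admin: bool) -> int:
    per_server = 4 if has_remote_admin else 3
    return per_server * n_servers + 1  # +1 for the global BroadcastThread

assert background_thread_count(10, True) == 41
assert background_thread_count(10, False) == 31
```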
---

## Adapter Injection into Threads

The `ThreadRegistry` resolves the adapter at thread creation time and injects game-specific components into the generic core threads:

```python
class ThreadRegistry:
    @classmethod
    def start_server_threads(cls, server_id: int, db: Connection) -> None:
        server = ServerRepository(db).get_by_id(server_id)
        adapter = GameAdapterRegistry.get(server["game_type"])

        threads: dict[str, BaseServerThread] = {}

        # Core threads — always present
        threads["process_monitor"] = ProcessMonitorThread(server_id)
        threads["metrics_collector"] = MetricsCollectorThread(server_id)

        # Core thread with adapter's log parser injected
        log_parser = adapter.get_log_parser()
        threads["log_tail"] = LogTailThread(
            server_id,
            log_parser=log_parser,
            log_file_resolver=log_parser.get_log_file_resolver(server_id),
        )

        # Core thread with adapter's remote admin injected (if supported)
        remote_admin = adapter.get_remote_admin()
        if remote_admin is not None:
            threads["remote_admin_poller"] = RemoteAdminPollerThread(
                server_id,
                remote_admin_factory=lambda: remote_admin.create_client(
                    host="127.0.0.1",
                    port=server["rcon_port"],
                    password=_get_remote_admin_password(server_id, db),
                ),
            )

        # Adapter-declared custom threads (for game-specific background work)
        for thread_factory in adapter.get_custom_thread_factories():
            thread = thread_factory(server_id, db)
            threads[thread.name_key] = thread

        with cls._lock:
            cls._threads[server_id] = threads

        for thread in threads.values():
            thread.start()
```

---
@@ -46,6 +100,37 @@ For **N running servers**, there are:
| `ConnectionManager._connections` | async, single event loop | `asyncio.Lock` |
| SQLite connections | one connection per thread | Thread-local via `threading.local()` |
| Config files on disk | write on start, read-only during run | No lock needed (regenerated before start) |
+| Adapter objects | read-only after registration | No lock needed (registered once at startup) |
+| RemoteAdminClient calls | called from RemoteAdminPollerThread only | **Core wraps with per-server `threading.Lock`** (see below) |

### RemoteAdminClient Thread Safety

Adapters do NOT need to make their `RemoteAdminClient` implementations thread-safe. The core wraps every RemoteAdminClient call with a **per-server `threading.Lock`**, so only one call executes at a time against a given server's admin client.

```python
# In RemoteAdminPollerThread
class RemoteAdminPollerThread(BaseServerThread):
    def __init__(self, server_id: int,
                 remote_admin_factory: Callable[[], "RemoteAdminClient"]):
        super().__init__(server_id, self.interval)
        self._client_factory = remote_admin_factory
        self._client: RemoteAdminClient | None = None
        self._connected = False
        self._call_lock = threading.Lock()  # per-server lock

    def _call(self, method, *args, **kwargs):
        """All RemoteAdminClient calls go through this to serialize access."""
        with self._call_lock:
            return method(*args, **kwargs)

# In tick(), replace direct self._client.get_players() with:
#     players = self._call(self._client.get_players)
```

This means:
- Adapter authors write simple, non-thread-safe clients
- The core guarantees no concurrent calls to the same client
- Different servers' clients can call concurrently (different locks)
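To illustrate how simple an adapter client can afford to be, here is a hypothetical minimal client. The names and the `RemoteAdminClient` shape are assumed from the surrounding code, not taken from the actual adapter API:

```python
# Hypothetical minimal adapter client — deliberately not thread-safe;
# the core's per-server lock (above) serializes all calls to it.
import socket

class EchoAdminClient:
    def __init__(self, host: str, port: int, password: str):
        self._sock = socket.create_connection((host, port), timeout=5)
        self._password = password

    def get_players(self) -> list[dict]:
        # A real client would speak the game's admin protocol here.
        self._sock.sendall(b"players\n")
        raw = self._sock.recv(4096)
        return [{"name": line.decode(errors="replace")} for line in raw.splitlines()]

    def disconnect(self) -> None:
        self._sock.close()
```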
### SQLite Thread Safety
```python
@@ -55,7 +140,6 @@ For **N running servers**, there are:

class BaseServerThread(threading.Thread):
    def run(self):
        # Create thread-local DB connection — single connection per thread
        engine = get_engine()
        self._db = engine.connect()
        try:
@@ -69,46 +153,49 @@ class BaseServerThread(threading.Thread):
        except Exception as e:
            logger.error(f"{self.name} setup error: {e}")
        finally:
-            self.teardown()  # always release resources (even on setup failure)
-            self._db.close()  # always close connection
+            self.teardown()
+            self._db.close()
```
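The `get_thread_db()` helper used later in this document is not shown in the diff; a plausible sketch using `threading.local()`, matching the shared-state table above (assumption, not the project's actual code):

```python
# Plausible sketch of database.get_thread_db() — not shown in this diff.
# One SQLite connection per thread, cached in threading.local().
import threading

_local = threading.local()

def get_thread_db():
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = get_engine().connect()  # get_engine() as used in run() above
        _local.conn = conn
    return conn
```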
---

## BroadcastThread — Asyncio Bridge

-This is the critical bridge between background threads and the asyncio WebSocket layer.
+This is the critical bridge between background threads and the asyncio WebSocket layer. **Game-agnostic.**

```
Background Thread                        Asyncio Event Loop
─────────────────                        ──────────────────
-BroadcastThread.enqueue(                 uvicorn runs here
-    server_id=1,
+Any background thread                    uvicorn runs here
+    │
+    ▼
+BroadcastThread.enqueue(                 loop = asyncio.get_running_loop()
+    server_id=1,                         (stored at app startup)
    msg_type='log',
    data={...}
)
    │
    ▼
-broadcast_queue.put({                    loop = asyncio.get_event_loop()
-    'server_id': 1,                      (stored at app startup)
-    'type': 'log',
-    'data': {...}
-})
-    │
-    ▼
-BroadcastThread.run() ──────────────────► asyncio.run_coroutine_threadsafe(
-  while True:                                connection_manager.broadcast(
-    msg = queue.get()                          server_id=1,
-    fut = run_coroutine_threadsafe(            message={type, data}
-      broadcast_coro,                        ),
-      self._loop                             loop=self._loop
-    )                                      )
+broadcast_queue.put({                    asyncio.run_coroutine_threadsafe(
+    'server_id': 1,                        connection_manager.broadcast(
+    'type': 'log',                           server_id=1,
+    'data': {...}                            message={type, data}
+})                                         ),
+    │                                      loop=self._loop
+    ▼                                    )
+BroadcastThread.run() ──────────────────►
+  while True:
+    msg = queue.get()
+    fut = run_coroutine_threadsafe(
+      broadcast_coro,
+      self._loop
+    )
+    fut.result(timeout=5)
```

### Implementation Sketch
```python
-# broadcaster.py
+# core/websocket/broadcaster.py
import asyncio
import queue
import threading
@@ -130,9 +217,6 @@ class BroadcastThread(threading.Thread):
            try:
                msg = _broadcast_queue.get(timeout=1.0)
                server_id = msg['server_id']
-                # Build the outgoing WebSocket message envelope.
-                # Include server_id so clients subscribed to 'all' can identify the source.
-                # API contract: {type, server_id, data}
                outgoing = {
                    'type': msg['type'],
                    'server_id': server_id,
@@ -145,7 +229,6 @@ class BroadcastThread(threading.Thread):
                try:
                    future.result(timeout=5.0)
                except TimeoutError:
-                    # Don't block the queue — log and continue
                    logger.warning(f"Broadcast timeout for server {server_id} msg type {msg['type']}")
            except queue.Empty:
                continue
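The `enqueue()` side and the loop capture marked "(stored at app startup)" in the diagram fall outside these hunks; a hedged sketch of how they might look, with `maxsize` and drop-on-Full behavior taken from the Recommendations section below:

```python
# Sketch only — the actual enqueue()/loop-capture code is outside these hunks.
_broadcast_queue: "queue.Queue[dict]" = queue.Queue(maxsize=10000)

class BroadcastThread(threading.Thread):
    _loop: asyncio.AbstractEventLoop | None = None

    @classmethod
    def set_event_loop(cls, loop: asyncio.AbstractEventLoop) -> None:
        # Called once from app startup with asyncio.get_running_loop()
        cls._loop = loop

    @classmethod
    def enqueue(cls, server_id: int, msg_type: str, data: dict) -> None:
        try:
            _broadcast_queue.put_nowait(
                {'server_id': server_id, 'type': msg_type, 'data': data})
        except queue.Full:
            # Backpressure: drop rather than block the producing thread
            logger.warning("broadcast queue full — dropping message")
```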
@@ -172,6 +255,8 @@ class BroadcastThread(threading.Thread):

## ProcessMonitorThread — Crash Detection & Auto-Restart

+**Game-agnostic.** This thread only checks OS-level process status and updates the core `servers` table.
+
```python
class ProcessMonitorThread(BaseServerThread):
    interval = 1.0
@@ -184,7 +269,6 @@ class ProcessMonitorThread(BaseServerThread):

        exit_code = proc.poll()
        if exit_code is not None:
-            # Process has exited
            self._handle_process_exit(exit_code)
            self.stop()
@@ -210,18 +294,13 @@ class ProcessMonitorThread(BaseServerThread):
            'detail': {'exit_code': exit_code}
        })

-        # Stop other threads for this server. Must NOT be called synchronously
-        # from within this thread's own run() if stop_server_threads() joins threads,
-        # as a thread cannot join itself. Use a daemon thread to do the cleanup
-        # after this thread's run() returns naturally.
-        # IMPORTANT: The auto-restart Timer must be started AFTER thread cleanup
-        # completes. The cleanup daemon thread starts the restart timer when done.
+        # Stop other threads for this server via daemon cleanup thread
+        # (avoids thread joining itself)
        import threading as _threading

        def _cleanup_and_maybe_restart():
            try:
                ThreadRegistry.get().stop_server_threads(self.server_id)
                # Only schedule restart after threads are fully cleaned up
                if is_crash and server.get('auto_restart'):
                    self._schedule_auto_restart(server)
            except Exception as e:
@@ -238,10 +317,8 @@ class ProcessMonitorThread(BaseServerThread):
        ).start()

    def _schedule_auto_restart(self, server: dict):
-        # IMPORTANT: This method runs in the daemon cleanup thread, NOT the
-        # ProcessMonitorThread. Must create its own DB connection — do NOT
-        # use self._db (it belongs to the ProcessMonitorThread's thread context
-        # and may be closed by teardown() already).
+        # IMPORTANT: Runs in daemon cleanup thread, NOT ProcessMonitorThread.
+        # Must create its own DB connection.
        from database import get_thread_db
        db = get_thread_db()
@@ -250,7 +327,6 @@ class ProcessMonitorThread(BaseServerThread):
        window = server['restart_window_seconds']
        last_restart = server.get('last_restart_at')

        # Reset restart_count if last restart was outside the window
        if last_restart:
            last_dt = datetime.fromisoformat(last_restart)
            elapsed = (datetime.utcnow() - last_dt).total_seconds()
@@ -270,7 +346,7 @@ class ProcessMonitorThread(BaseServerThread):
        })

    def _auto_restart(self):
-        from servers.service import ServerService
+        from core.servers.service import ServerService
        try:
            ServerService().start(self.server_id)
        except Exception as e:
@@ -279,63 +355,61 @@ class ProcessMonitorThread(BaseServerThread):

---

-## LogTailThread — RPT File Tailing
+## LogTailThread — Generic File Tailing with Adapter Parser

-The Arma 3 RPT file grows while the server runs. This thread tails it like `tail -f`.
+**Core thread** that takes an adapter-provided `LogParser` for game-specific log line parsing and file discovery.

```python
class LogTailThread(BaseServerThread):
    interval = 0.1  # 100ms

-    def setup(self):
-        self._file = None
+    def __init__(self, server_id: int, log_parser: "LogParser",
+                 log_file_resolver: Callable[[Path], Path | None]):
+        super().__init__(server_id, self.interval)
+        self._parser = log_parser
+        self._log_file_resolver = log_file_resolver
+        self._file: TextIO | None = None
+        self._current_path: Path | None = None
+        self._last_size: int = 0
-        self._open_latest_rpt()

-    def _open_latest_rpt(self):
+    def setup(self):
+        self._open_latest_log()
+
+    def _open_latest_log(self):
        """
-        Arma 3 writes timestamped RPT files in the profile subdirectory:
-        servers/{id}/server/arma3server_YYYY-MM-DD_HH-MM-SS.rpt
-
-        Use rglob('*.rpt') to search recursively within the server dir.
-        The profile subdirectory is determined by -profiles + -name flags.
+        Uses the adapter-provided log_file_resolver to find the current log file.
+        Opens it and seeks to end (tail behavior).

        NOTE: Do NOT use os.stat().st_ino for rotation detection — on Windows/NTFS
-        st_ino is always 0, making inode comparison completely non-functional.
-        Instead, track the filename and file size. If a newer .rpt appears or the
-        current file shrinks (truncated/replaced), reopen.
+        st_ino is always 0. Instead, track filename and file size.
        """
-        rpt_files = list(Path(get_server_dir(self.server_id)).rglob("*.rpt"))
-        if not rpt_files:
-            return  # Server hasn't created RPT yet; retry in next tick
+        server_dir = get_server_dir(self.server_id)
+        log_path = self._log_file_resolver(server_dir)
+        if log_path is None:
+            return  # Server hasn't created log yet; retry in next tick

-        latest = max(rpt_files, key=lambda p: p.stat().st_mtime)
        try:
-            self._file = open(latest, 'r', encoding='utf-8', errors='replace')
-            self._file.seek(0, 2)  # seek to end — tail, don't replay old output
-            self._current_path = latest
+            self._file = open(log_path, 'r', encoding='utf-8', errors='replace')
+            self._file.seek(0, 2)  # seek to end
+            self._current_path = log_path
            self._last_size = self._file.tell()
        except OSError:
            self._file = None

    def tick(self):
        if self._file is None:
-            self._open_latest_rpt()
+            self._open_latest_log()
            return

-        # Rotation detection: only re-glob every 5 seconds (not every 100ms tick)
-        # to avoid excessive filesystem I/O with large mpmissions directories.
+        # Rotation detection: only re-check every 5 seconds
        now = time.monotonic()
        if now - getattr(self, '_last_glob_time', 0) > 5.0:
            self._last_glob_time = now
-            rpt_files = list(Path(get_server_dir(self.server_id)).rglob("*.rpt"))
-            if rpt_files:
-                latest = max(rpt_files, key=lambda p: p.stat().st_mtime)
-                if latest != self._current_path:
-                    # A new RPT file was created — switch to it
+            server_dir = get_server_dir(self.server_id)
+            log_path = self._log_file_resolver(server_dir)
+            if log_path is not None and log_path != self._current_path:
                self._file.close()
-                self._open_latest_rpt()
+                self._open_latest_log()
                return

        try:
@@ -344,12 +418,11 @@ class LogTailThread(BaseServerThread):
            return

        if current_size < self._last_size:
            # File shrank — truncated or replaced; reopen
            self._file.close()
-            self._open_latest_rpt()
+            self._open_latest_log()
            return

-        # Read new lines
+        # Read new lines and parse using adapter's parser
        while True:
            line = self._file.readline()
            if not line:
@@ -359,13 +432,13 @@ class LogTailThread(BaseServerThread):
            if not line:
                continue

-            entry = RPTParser.parse_line(line)
+            # Adapter parses the line — game-specific format
+            entry = self._parser.parse_line(line)
            if entry:
                LogRepository(self._db).insert(self.server_id, entry)
                BroadcastThread.enqueue(self.server_id, 'log', entry)

    def teardown(self):
        """Close the open RPT file handle when the thread stops."""
        if self._file is not None:
            try:
                self._file.close()
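For reference, an Arma 3 adapter's `log_file_resolver` could reproduce the removed rglob logic above; a hypothetical sketch:

```python
# Hypothetical Arma 3 resolver — mirrors the removed rglob('*.rpt') logic.
from pathlib import Path

def arma3_log_file_resolver(server_dir: Path) -> Path | None:
    # Arma 3 writes timestamped RPT files in the profile subdirectory,
    # e.g. servers/{id}/server/arma3server_YYYY-MM-DD_HH-MM-SS.rpt
    rpt_files = list(server_dir.rglob("*.rpt"))
    if not rpt_files:
        return None  # server hasn't created an RPT yet
    return max(rpt_files, key=lambda p: p.stat().st_mtime)
```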
@@ -376,48 +449,108 @@ class LogTailThread(BaseServerThread):

---

-## RConPollerThread — Player List Synchronization
+## MetricsCollectorThread — Game-Agnostic Resource Monitoring
+
+**Fully game-agnostic.** Uses psutil to monitor any process.

```python
-class RConPollerThread(BaseServerThread):
-    interval = 10.0
-    STARTUP_DELAY = 30.0  # wait for server to fully initialize
-    _rcon_ready = False  # flag: True only after successful setup
-
-    def setup(self):
-        # Wait for server to start up before attempting RCon
-        if self._stop_event.wait(self.STARTUP_DELAY):
-            self._rcon_ready = False
-            return  # stop was requested during wait
-        self._rcon = RConService(self.server_id)
-        self._connected = self._rcon.connect()
-        self._rcon_ready = True
+class MetricsCollectorThread(BaseServerThread):
+    interval = 5.0

+    def tick(self):
+        pid = ProcessManager.get().get_pid(self.server_id)
+        if pid is None:
+            return
+
+        try:
+            proc = psutil.Process(pid)
+            cpu = proc.cpu_percent(interval=0.5)
+            ram = proc.memory_info().rss / (1024 * 1024)  # MB
+        except (psutil.NoSuchProcess, psutil.AccessDenied):
+            return
+
+        player_count = PlayerRepository(self._db).count(self.server_id)
+
+        MetricsRepository(self._db).insert(self.server_id, cpu, ram, player_count)
+        BroadcastThread.enqueue(self.server_id, 'metrics', {
+            'cpu_percent': cpu,
+            'ram_mb': ram,
+            'player_count': player_count,
+        })
```

---

+## RemoteAdminPollerThread — Generic Polling with Adapter Client
+
+**Core thread** that takes an adapter-provided `RemoteAdmin` factory for game-specific admin protocol communication. Skipped entirely if the adapter has no `remote_admin` capability.

```python
+class RemoteAdminPollerThread(BaseServerThread):
+    interval = 10.0
+    STARTUP_DELAY = 30.0
+
+    def __init__(self, server_id: int,
+                 remote_admin_factory: Callable[[], "RemoteAdminClient"]):
+        super().__init__(server_id, self.interval)
+        self._client_factory = remote_admin_factory
+        self._client: RemoteAdminClient | None = None
+        self._connected = False
+
+    def setup(self):
+        # Wait for server to start up before attempting connection
+        # Uses _stop_event.wait() instead of time.sleep() for immediate shutdown
+        startup_delay = self._get_startup_delay()
+        if self._stop_event.wait(startup_delay):
+            return  # stop was requested during wait
+        self._connect()
+
+    def _get_startup_delay(self) -> float:
+        # Default delay; adapter may override via RemoteAdmin.get_startup_delay()
+        return self.STARTUP_DELAY
+
+    def _connect(self):
+        try:
+            self._client = self._client_factory()
+            self._connected = True
+        except Exception as e:
+            logger.warning(f"Remote admin connection failed for server {self.server_id}: {e}")
+            self._connected = False

    def tick(self):
-        if not self._rcon_ready:
-            return  # setup() failed or was interrupted
        if not self._connected:
            self._reconnect_attempts = getattr(self, '_reconnect_attempts', 0) + 1
            delay = min(10 * 2 ** self._reconnect_attempts, 120)  # exponential backoff
            if self._reconnect_attempts > 1:
-                logger.info(f"RCon reconnect attempt {self._reconnect_attempts} for server {self.server_id} (next in {delay}s)")
+                logger.info(f"Remote admin reconnect attempt {self._reconnect_attempts} for server {self.server_id}")
            if self._stop_event.wait(delay):
                return
-            self._connected = self._rcon.connect()
+            self._connect()
            if not self._connected:
                return
-            self._reconnect_attempts = 0  # reset on successful connection
+            self._reconnect_attempts = 0

        try:
-            players = self._rcon.get_players()
-            PlayerService(self._db).update_from_rcon(self.server_id, players)
+            players = self._call(self._client.get_players)
+            PlayerService(self._db).update_from_remote_admin(self.server_id, players)
            BroadcastThread.enqueue(self.server_id, 'players', {
-                'players': [p.dict() for p in players],
-                'count': len(players)
+                'players': [p for p in players],
+                'count': len(players),
            })
        except ConnectionError:
            self._connected = False
-            logger.warning(f"RCon connection lost for server {self.server_id}")
+            logger.warning(f"Remote admin connection lost for server {self.server_id}")
+        except RemoteAdminError as e:
+            logger.error(f"Remote admin adapter error for server {self.server_id}: {e}")
+            self._connected = False
+
+    def teardown(self):
+        if self._client is not None:
+            try:
+                self._client.disconnect()
+            except Exception:
+                pass
+            self._client = None
```

---
@@ -429,29 +562,45 @@ class RConPollerThread(BaseServerThread):

POST /servers/{id}/start
  │
  ├── ServerService.start()
-  │    ├── ConfigGenerator.write_all()
+  │    ├── adapter = GameAdapterRegistry.get(server.game_type)
  │    ├── check_server_ports_available(server_id)
+  │    │     └── For ALL running servers, resolve each adapter,
+  │    │         get port conventions, check full derived port set
+  │    │         (cross-game: Arma 3 game+steam query + other games' ports)
+  │    ├── adapter.config_generator.write_configs()
  │    │     └── Atomic write: write to .tmp files first, then os.replace()
  │    │         On failure: .tmp files cleaned up, originals untouched
+  │    ├── launch_args = adapter.config_generator.build_launch_args()
  │    ├── ProcessManager.start() ← creates subprocess.Popen
-  │    └── ThreadRegistry.start_server_threads(id)
-  │         ├── ProcessMonitorThread(id).start()
-  │         ├── LogTailThread(id).start()
-  │         ├── MetricsCollectorThread(id).start()
-  │         └── RConPollerThread(id).start()
+  │    └── ThreadRegistry.start_server_threads(id, db)
+  │         ├── ProcessMonitorThread(id) ← core, always
+  │         ├── LogTailThread(id, adapter.log_parser) ← core + adapter
+  │         ├── MetricsCollectorThread(id) ← core, always
+  │         └── RemoteAdminPollerThread(id, adapter.remote_admin)
+  │              ← core + adapter (if available)
  │
  └── BroadcastThread.enqueue(id, 'status', {status: 'starting'})

Error paths on start:
  ├── ConfigWriteError → rollback .tmp files, return 500 to client
  ├── ConfigValidationError → return 422 with validation details
  ├── LaunchArgsError → return 400 with invalid arg info
  ├── ExeNotAllowedError → return 403 with executable name
  └── PortInUseError → return 409 with conflicting port info
```

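The atomic-write step above is the conventional tmp-then-rename pattern; a generic sketch for illustration (not the project's actual ConfigGenerator code):

```python
# Illustrative atomic config write — write to .tmp, fsync, then os.replace().
import os

def atomic_write(path: str, content: str) -> None:
    tmp_path = path + ".tmp"
    try:
        with open(tmp_path, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename; original never half-written
    except OSError:
        # On failure: clean up the .tmp file, leave the original untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```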
### Stop Server Flow
```
POST /servers/{id}/stop
  │
-  ├── RConService.shutdown() ← sends #shutdown via RCon
+  ├── adapter.remote_admin.shutdown() ← if adapter has remote_admin
  ├── Wait up to 30s for process exit (ProcessManager.stop(timeout=30))
  ├── If still running: ProcessManager.kill()
  ├── ThreadRegistry.stop_server_threads(id)
-  │    ├── ProcessMonitorThread.stop()  (sets _stop_event)
+  │    ├── ProcessMonitorThread.stop()
  │    ├── LogTailThread.stop()
  │    ├── MetricsCollectorThread.stop()
-  │    └── RConPollerThread.stop()
+  │    └── RemoteAdminPollerThread.stop() ← if present
  │    └── Thread.join(timeout=5) for each
  │
  └── BroadcastThread.enqueue(id, 'status', {status: 'stopped'})
```
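A plausible shape for `ThreadRegistry.stop_server_threads()` based on the flow above (sketch only; the real implementation is not shown in this diff):

```python
# Sketch of ThreadRegistry.stop_server_threads() — inferred from the stop flow.
import threading

class ThreadRegistry:
    _lock = threading.Lock()
    _threads: dict[int, dict[str, "BaseServerThread"]] = {}

    @classmethod
    def stop_server_threads(cls, server_id: int) -> None:
        with cls._lock:
            threads = cls._threads.pop(server_id, {})
        for thread in threads.values():
            thread.stop()           # sets the thread's _stop_event
        for thread in threads.values():
            thread.join(timeout=5)  # bounded wait, per the flow above
```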
@@ -496,7 +645,7 @@ class BaseServerThread(threading.Thread):
            self.setup()
        except Exception as e:
            logger.error(f"{self.name} setup error: {e}")
-            return  # setup failed completely — no partial resources to clean
+            return  # setup failed completely

        try:
            while not self._stop_event.is_set():
@@ -504,24 +653,34 @@ class BaseServerThread(threading.Thread):
                    self.tick()
                except Exception as e:
                    self.on_error(e)
-                # Use wait() instead of sleep() — responds immediately to stop()
                self._stop_event.wait(self.interval)
        finally:
-            self.teardown()  # always runs; subclasses close files/sockets here
+            self.teardown()

+    def on_error(self, error: Exception):
+        """Default error handler. Adapter exceptions are typed for specific handling."""
+        if isinstance(error, RemoteAdminError):
+            logger.error(f"{self.name} remote admin error: {error}")
+            # RemoteAdminPollerThread overrides to set _connected = False
+        elif isinstance(error, ConfigWriteError):
+            logger.critical(f"{self.name} config write error (atomic write failed): {error}")
+        elif isinstance(error, ConfigValidationError):
+            logger.error(f"{self.name} config validation error: {error}")
+        else:
+            logger.error(f"{self.name} unhandled error: {error}")
```
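The typed exceptions referenced in `on_error` are defined elsewhere; a hedged sketch of the hierarchy implied here (the shared base class is an assumption):

```python
# Assumed shape of the adapter exception types used above — the real
# definitions live outside this diff; the AdapterError base is an assumption.
class AdapterError(Exception):
    """Base class for adapter-raised errors."""

class RemoteAdminError(AdapterError):
    """Remote admin protocol failure; the poller marks itself disconnected."""

class ConfigWriteError(AdapterError):
    """Atomic config write failed; .tmp files are rolled back."""

class ConfigValidationError(AdapterError):
    """Config values failed validation (422 on the API)."""
```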
---

## WebSocket Connection Manager (asyncio)

+**Game-agnostic.** No changes from single-game design.

```python
-# websocket/manager.py
+# core/websocket/manager.py
class ConnectionManager:
    def __init__(self):
        # server_id → set[WebSocket]
        # Use set (not list) so .add()/.discard() work correctly.
        self._connections: dict[str, set[WebSocket]] = defaultdict(set)
        # Per-connection channel subscriptions: ws → set[str]
        self._channel_subs: dict[WebSocket, set[str]] = defaultdict(set)
        self._lock = asyncio.Lock()
@@ -529,8 +688,7 @@ class ConnectionManager:
        await ws.accept()
        async with self._lock:
            self._connections[server_id].add(ws)
-            self._channel_subs[ws].add('status')  # default channel
-            # Only add to 'all' bucket if server_id is explicitly 'all'
+            self._channel_subs[ws].add('status')
            if server_id == 'all':
                self._connections['all'].add(ws)
@@ -549,15 +707,12 @@ class ConnectionManager:
            self._channel_subs[ws].difference_update(channels)

    async def broadcast(self, server_id: str, message: dict, channel: str = None):
        """Send to all clients subscribed to server_id AND the message's channel."""
        targets: set[WebSocket] = set()
        async with self._lock:
-            # Collect clients for this server_id + 'all' subscribers
            server_clients = self._connections.get(server_id, set())
            all_clients = self._connections.get('all', set())
            candidates = server_clients | all_clients

-            # Filter by channel subscription if specified
            if channel:
                targets = {ws for ws in candidates
                           if channel in self._channel_subs.get(ws, set())}
@@ -571,7 +726,6 @@ class ConnectionManager:
            except Exception:
                dead.append(ws)

-        # Clean up dead connections
        if dead:
            async with self._lock:
                for ws in dead:
@@ -587,9 +741,9 @@ class ConnectionManager:

| Thread | Memory Impact | CPU Impact |
|--------|---------------|------------|
| ProcessMonitorThread | Minimal (one `os.kill` check) | Negligible |
-| LogTailThread | Buffer for unread log lines | Low (file I/O) |
+| LogTailThread | Buffer for unread log lines | Low (file I/O + adapter parsing) |
| MetricsCollectorThread | psutil subprocess scan | Low-Medium |
-| RConPollerThread | UDP socket + response buffer | Low |
+| RemoteAdminPollerThread | Adapter client socket + buffer | Low (varies by adapter protocol) |
| BroadcastThread | Queue buffer (max 10000 entries) | Low |

### Recommendations
@@ -597,4 +751,5 @@ class ConnectionManager:
- `broadcast_queue.maxsize=10000` — backpressure; drop on Full (log warning)
- `LogTailThread` buffers max ~100 lines per tick before writing to DB in batch
- `MetricsCollectorThread` uses `psutil.Process.cpu_percent(interval=0.5)` — blocks 500ms, acceptable at 5s interval
-- For N=10 servers: 41 background threads — well within Python's thread limits
+- For N=10 servers: 31-41 background threads — well within Python's thread limits
+- Games without remote admin skip the RemoteAdminPollerThread entirely