fix: address design review ACT NOW items (6 risk gaps)
- Add migrate_config() to ConfigGenerator protocol for schema version upgrades - Add per-server operation lock to ProcessManager to prevent start/stop races - Add busy_timeout retry/backoff strategy (exponential: 1s, 2s, 4s) for DB lock exhaustion - Add ConfigForm testing strategy and error boundary for malformed schemas - Add schema cache invalidation on adapter version change - Add ConfigMigrationError to typed adapter exceptions
This commit is contained in:
@@ -211,6 +211,7 @@ def start(self, server_id: int) -> dict:
|
|||||||
### ProcessManager (Core)
|
### ProcessManager (Core)
|
||||||
- Singleton that owns all subprocess handles
|
- Singleton that owns all subprocess handles
|
||||||
- Thread-safe dict: `{server_id: subprocess.Popen}`
|
- Thread-safe dict: `{server_id: subprocess.Popen}`
|
||||||
|
- **Per-server operation lock** (`_operation_locks: dict[int, threading.Lock]`) — serializes start/stop/restart for the same server. Prevents race conditions when two admins hit "start" simultaneously or when start+stop overlap. Each server gets its own lock; different servers operate independently.
|
||||||
- `start()` sets `cwd=servers/{server_id}/` so relative config paths resolve correctly
|
- `start()` sets `cwd=servers/{server_id}/` so relative config paths resolve correctly
|
||||||
- On Windows: `terminate()` = `TerminateProcess` (hard kill, no SIGTERM) — graceful shutdown must go through adapter's RemoteAdmin
|
- On Windows: `terminate()` = `TerminateProcess` (hard kill, no SIGTERM) — graceful shutdown must go through adapter's RemoteAdmin
|
||||||
- Provides: `start()`, `stop()`, `restart()`, `is_running()`, `send_command()`
|
- Provides: `start()`, `stop()`, `restart()`, `is_running()`, `send_command()`
|
||||||
@@ -232,6 +233,7 @@ def start(self, server_id: int) -> dict:
|
|||||||
- **Validation is delegated to adapter's Pydantic models** — core never inspects config content
|
- **Validation is delegated to adapter's Pydantic models** — core never inspects config content
|
||||||
- **Sensitive field encryption**: calls `adapter.get_config_generator().get_sensitive_fields(section)` to identify which JSON keys to encrypt/decrypt via Fernet
|
- **Sensitive field encryption**: calls `adapter.get_config_generator().get_sensitive_fields(section)` to identify which JSON keys to encrypt/decrypt via Fernet
|
||||||
- **Optimistic locking**: each row includes `config_version` (integer). On PUT, client sends the version they read. If version mismatch, return 409 Conflict.
|
- **Optimistic locking**: each row includes `config_version` (integer). On PUT, client sends the version they read. If version mismatch, return 409 Conflict.
|
||||||
|
- **Schema migration**: on read, if `schema_version` differs from `adapter.get_config_version()`, core calls `adapter.migrate_config(old_version, config_json)`. On success, updates the row with migrated JSON and new `schema_version`. On `ConfigMigrationError`, keeps original config and logs a warning.
|
||||||
- Provides: `get_section()`, `get_all_sections()`, `upsert_section()`, `delete_sections()`
|
- Provides: `get_section()`, `get_all_sections()`, `upsert_section()`, `delete_sections()`
|
||||||
|
|
||||||
### Adapter Exceptions (Standard Error Types)
|
### Adapter Exceptions (Standard Error Types)
|
||||||
@@ -245,6 +247,7 @@ Adapters raise specific exception types so the core can handle errors precisely:
|
|||||||
| `LaunchArgsError` | build_launch_args() fails (missing mod, bad path) | Set server status='error', return 400 |
|
| `LaunchArgsError` | build_launch_args() fails (missing mod, bad path) | Set server status='error', return 400 |
|
||||||
| `RemoteAdminError` | Remote admin connection/command fails | Log warning, return 503 with detail |
|
| `RemoteAdminError` | Remote admin connection/command fails | Log warning, return 503 with detail |
|
||||||
| `ExeNotAllowedError` | Executable not in adapter's allowlist | Return 400 with allowed list |
|
| `ExeNotAllowedError` | Executable not in adapter's allowlist | Return 400 with allowed list |
|
||||||
|
| `ConfigMigrationError` | `migrate_config()` fails to transform old schema | Keep original config, log warning, server runs with old schema |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -529,10 +532,11 @@ Steps:
|
|||||||
| Sync vs async DB | **Sync SQLAlchemy only** | All DB access is synchronous; background threads are non-async; no aiosqlite dependency |
|
| Sync vs async DB | **Sync SQLAlchemy only** | All DB access is synchronous; background threads are non-async; no aiosqlite dependency |
|
||||||
| WebSocket auth | JWT in query param on connect | Browser WS API doesn't support headers |
|
| WebSocket auth | JWT in query param on connect | Browser WS API doesn't support headers |
|
||||||
| Process ownership | **ProcessManager singleton** | Single source of truth; prevents duplicate launches |
|
| Process ownership | **ProcessManager singleton** | Single source of truth; prevents duplicate launches |
|
||||||
|
| Server operation safety | **Per-server operation lock** | ProcessManager holds a lock per server_id that serializes start/stop/restart. Two concurrent start requests for the same server: the second waits for the first to complete, then sees the server is already running. Different servers are independent (no cross-server locking). |
|
||||||
| Config files | **Adapter regenerates on each start** | Always fresh from DB; no sync drift; adapter's structured builder prevents config injection |
|
| Config files | **Adapter regenerates on each start** | Always fresh from DB; no sync drift; adapter's structured builder prevents config injection |
|
||||||
| Config write failure | **Atomic write + rollback** | Adapter writes to temp files first, then atomic rename. On failure, temp files are cleaned up — original files remain untouched. Server start never proceeds with partial config. |
|
| Config write failure | **Atomic write + rollback** | Adapter writes to temp files first, then atomic rename. On failure, temp files are cleaned up — original files remain untouched. Server start never proceeds with partial config. |
|
||||||
| Sensitive field encryption | **Adapter declares via get_sensitive_fields()** | ConfigGenerator protocol returns list of JSON keys per section that need Fernet encryption. Core's ConfigRepository handles encrypt/decrypt transparently. |
|
| Sensitive field encryption | **Adapter declares via get_sensitive_fields()** | ConfigGenerator protocol returns list of JSON keys per section that need Fernet encryption. Core's ConfigRepository handles encrypt/decrypt transparently. |
|
||||||
| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, adapter provides a migration function. |
|
| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, core calls `adapter.migrate_config(old_version, config_json)` which returns the migrated dict. On migration failure (ConfigMigrationError), core keeps the original config and logs a warning — the server can still run with the old schema. |
|
||||||
| Adapter error communication | **Typed adapter exceptions** | Adapters raise specific exception types (ConfigWriteError, ConfigValidationError, LaunchArgsError, RemoteAdminError). Core catches specifically and sets appropriate DB status + returns clear API errors. |
|
| Adapter error communication | **Typed adapter exceptions** | Adapters raise specific exception types (ConfigWriteError, ConfigValidationError, LaunchArgsError, RemoteAdminError). Core catches specifically and sets appropriate DB status + returns clear API errors. |
|
||||||
| Remote admin thread safety | **Core wraps with lock** | Core wraps RemoteAdminClient calls with a threading.Lock. Adapter clients don't need to be thread-safe. One lock per server — API requests and poller thread share safely. |
|
| Remote admin thread safety | **Core wraps with lock** | Core wraps RemoteAdminClient calls with a threading.Lock. Adapter clients don't need to be thread-safe. One lock per server — API requests and poller thread share safely. |
|
||||||
| Third-party adapter loading | **Setuptools entry_points** | Third-party adapters register via `languard.adapters` entry_point group. Core scans entry_points at startup and auto-registers. Built-in adapters registered on import. |
|
| Third-party adapter loading | **Setuptools entry_points** | Third-party adapters register via `languard.adapters` entry_point group. Core scans entry_points at startup and auto-registers. Built-in adapters registered on import. |
|
||||||
|
|||||||
13
DATABASE.md
13
DATABASE.md
@@ -6,6 +6,7 @@
|
|||||||
- WAL mode enabled: `PRAGMA journal_mode=WAL` — allows concurrent reads during writes
|
- WAL mode enabled: `PRAGMA journal_mode=WAL` — allows concurrent reads during writes
|
||||||
- Foreign keys enabled: `PRAGMA foreign_keys=ON`
|
- Foreign keys enabled: `PRAGMA foreign_keys=ON`
|
||||||
- Busy timeout: `PRAGMA busy_timeout=5000` — prevents "database is locked" errors under concurrent thread writes
|
- Busy timeout: `PRAGMA busy_timeout=5000` — prevents "database is locked" errors under concurrent thread writes
|
||||||
|
- **Retry on exhaustion**: If busy_timeout is exceeded (5s), writes fail with `OperationalError("database is locked")`. Background threads retry with exponential backoff (1s, 2s, 4s), then skip the tick. API handlers retry up to 2 times with 1s backoff, then return 503. See THREADING.md for the full retry implementation.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -537,7 +538,17 @@ The `config_version` column in `game_configs` prevents lost updates when two adm
|
|||||||
4. If match: increment version, write new JSON, return 200
|
4. If match: increment version, write new JSON, return 200
|
||||||
5. If no match (version changed): return **409 Conflict** with current config for client-side merge
|
5. If no match (version changed): return **409 Conflict** with current config for client-side merge
|
||||||
|
|
||||||
## game_data JSON Schema
|
## Config Schema Migration
|
||||||
|
|
||||||
|
When the adapter is updated and `get_config_version()` returns a newer version than what's stored in `game_configs.schema_version`, the core automatically migrates:
|
||||||
|
|
||||||
|
1. On read, detect `stored_schema_version != adapter.get_config_version()`
|
||||||
|
2. Call `adapter.migrate_config(stored_schema_version, config_json)` → returns migrated dict
|
||||||
|
3. Update the row: `SET config_json = ?, schema_version = ? WHERE id = ?`
|
||||||
|
4. On `ConfigMigrationError`: keep original config, log warning, server runs with old schema
|
||||||
|
5. Migration is per-section — each section can have a different stored version
|
||||||
|
|
||||||
|
This ensures config data is always compatible with the current adapter without manual intervention.
|
||||||
|
|
||||||
The `game_data` columns on `players`, `missions`, `mods`, and `bans` are validated by adapter-provided Pydantic models. Each capability protocol optionally provides a schema:
|
The `game_data` columns on `players`, `missions`, `mods`, and `bans` are validated by adapter-provided Pydantic models. Each capability protocol optionally provides a schema:
|
||||||
|
|
||||||
|
|||||||
41
FRONTEND.md
41
FRONTEND.md
@@ -895,6 +895,14 @@ frontend/
|
|||||||
- Optimistic locking (send `config_version` on save)
|
- Optimistic locking (send `config_version` on save)
|
||||||
- 409 conflict resolution (show diff, allow override/merge)
|
- 409 conflict resolution (show diff, allow override/merge)
|
||||||
- Validation errors from adapter (field-level, from Pydantic)
|
- Validation errors from adapter (field-level, from Pydantic)
|
||||||
|
- **Error boundary**: if `jsonSchemaToFields()` encounters an unsupported or malformed schema type (deeply nested objects, unknown formats), render a fallback "This section uses an unsupported field type. Edit raw JSON." with a raw JSON editor instead of crashing the form
|
||||||
|
|
||||||
|
**ConfigForm test plan** (critical — highest-risk component):
|
||||||
|
- **Unit: `jsonSchemaToFields`** — test every supported type mapping (string→text, integer→number, boolean→toggle, enum→select, array→repeatable), test sensitive field masking, test unknown type → fallback descriptor with `type: "raw_json"`
|
||||||
|
- **Unit: `jsonSchemaToFields`** — test malformed schema input (missing `properties`, nested `$ref`, `oneOf`/`anyOf`) → returns fallback descriptor, never throws
|
||||||
|
- **Integration: `ConfigForm`** — render with a 2-section schema, verify field rendering, toggle a boolean, verify dirty state, submit and verify request payload
|
||||||
|
- **Integration: 409 conflict** — after save, mock a 409 response, verify ConflictDialog appears with diff
|
||||||
|
- **Integration: error boundary** — mount `ConfigForm` with a schema that has unsupported `type: "object"` nested 3 levels deep, verify raw JSON fallback renders instead of crash
|
||||||
|
|
||||||
**`LogViewer`** — performance-critical:
|
**`LogViewer`** — performance-critical:
|
||||||
- Virtualized rendering (react-window or similar)
|
- Virtualized rendering (react-window or similar)
|
||||||
@@ -929,7 +937,7 @@ All API data uses TanStack Query with appropriate stale times:
|
|||||||
| Mods | 5min | — |
|
| Mods | 5min | — |
|
||||||
| Bans | 2min | — |
|
| Bans | 2min | — |
|
||||||
| Game types | 30min | — (rarely changes) |
|
| Game types | 30min | — (rarely changes) |
|
||||||
| Config schema | 30min | — (tied to adapter version) |
|
| Config schema | 30min | — (tied to adapter version) **Invalidation**: when `GET /games/{type}` returns a different `schema_version` than previously cached, TanStack Query invalidates all config schema queries for that game type. This prevents stale forms after an adapter update. |
|
||||||
|
|
||||||
**Optimistic updates** on:
|
**Optimistic updates** on:
|
||||||
- Server start/stop → immediately update status in cache, rollback on error
|
- Server start/stop → immediately update status in cache, rollback on error
|
||||||
@@ -1071,7 +1079,7 @@ const { hasMissions, hasMods, hasRemoteAdmin } = useCapability(serverId);
|
|||||||
interface FieldDescriptor {
|
interface FieldDescriptor {
|
||||||
name: string;
|
name: string;
|
||||||
label: string;
|
label: string;
|
||||||
type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array";
|
type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array" | "raw_json";
|
||||||
default?: unknown;
|
default?: unknown;
|
||||||
min?: number;
|
min?: number;
|
||||||
max?: number;
|
max?: number;
|
||||||
@@ -1087,6 +1095,9 @@ function jsonSchemaToFields(
|
|||||||
): FieldDescriptor[] {
|
): FieldDescriptor[] {
|
||||||
// Walk schema.properties, map each to a FieldDescriptor
|
// Walk schema.properties, map each to a FieldDescriptor
|
||||||
// Mark fields in sensitiveFields as type: "password"
|
// Mark fields in sensitiveFields as type: "password"
|
||||||
|
// Unknown types (nested objects, oneOf/anyOf, $ref) → type: "raw_json"
|
||||||
|
// The form renders a textarea with JSON editing for these fields
|
||||||
|
// rather than crashing or hiding the field silently
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -1105,6 +1116,31 @@ function jsonSchemaToFields(
|
|||||||
|
|
||||||
Future adapters register themselves; the card list auto-populates from `GET /games`.
|
Future adapters register themselves; the card list auto-populates from `GET /games`.
|
||||||
|
|
||||||
|
### Schema Cache Invalidation
|
||||||
|
|
||||||
|
Adapter config schemas are cached with `staleTime: 30min`. When an adapter is updated and its `schema_version` changes, the frontend must not serve a stale schema. The invalidation pattern:
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
// When fetching game type info, compare schema_version
|
||||||
|
function useGameTypeWithInvalidation(gameType: string) {
|
||||||
|
const queryClient = useQueryClient();
|
||||||
|
return useQuery({
|
||||||
|
queryKey: ["gameType", gameType],
|
||||||
|
staleTime: 30 * 60 * 1000,
|
||||||
|
onSuccess: (data) => {
|
||||||
|
const prevVersion = localStorage.getItem(`schema_version_${gameType}`);
|
||||||
|
if (prevVersion && prevVersion !== data.schema_version) {
|
||||||
|
// Adapter was updated — invalidate all config schema queries
|
||||||
|
queryClient.invalidateQueries({ queryKey: ["configSchema", gameType] });
|
||||||
|
localStorage.setItem(`schema_version_${gameType}`, data.schema_version);
|
||||||
|
} else if (!prevVersion) {
|
||||||
|
localStorage.setItem(`schema_version_${gameType}`, data.schema_version);
|
||||||
|
}
|
||||||
|
},
|
||||||
|
});
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Error Handling UX
|
## Error Handling UX
|
||||||
@@ -1121,6 +1157,7 @@ Future adapters register themselves; the card list auto-populates from `GET /gam
|
|||||||
| WS disconnect | StatusBar indicator + stale data with timestamp |
|
| WS disconnect | StatusBar indicator + stale data with timestamp |
|
||||||
| WS reconnect | Automatic; no user action needed |
|
| WS reconnect | Automatic; no user action needed |
|
||||||
| Server crashed | Toast notification + status dot turns red |
|
| Server crashed | Toast notification + status dot turns red |
|
||||||
|
| Malformed adapter schema | Raw JSON fallback in ConfigForm section ("This section uses an unsupported field type. Edit raw JSON.") |
|
||||||
|
|
||||||
**No `alert()` calls.** All feedback uses toast (transient) or inline (persistent) patterns.
|
**No `alert()` calls.** All feedback uses toast (transient) or inline (persistent) patterns.
|
||||||
|
|
||||||
|
|||||||
@@ -447,6 +447,7 @@ Verify React app can:
|
|||||||
### Adapter contract tests
|
### Adapter contract tests
|
||||||
- Template test suite that any new adapter should pass
|
- Template test suite that any new adapter should pass
|
||||||
- Tests: ConfigGenerator produces valid sections and valid config files, ProcessConfig returns allowed executables, LogParser parses sample lines
|
- Tests: ConfigGenerator produces valid sections and valid config files, ProcessConfig returns allowed executables, LogParser parses sample lines
|
||||||
|
- ConfigGenerator migration test: `migrate_config(old_version, config_json)` returns valid migrated dict; `ConfigMigrationError` on invalid old_version
|
||||||
|
|
||||||
### Load notes
|
### Load notes
|
||||||
- SQLite with WAL handles concurrent reads from 4 threads per server well
|
- SQLite with WAL handles concurrent reads from 4 threads per server well
|
||||||
|
|||||||
10
MODULES.md
10
MODULES.md
@@ -281,10 +281,14 @@ class ProcessManager:
|
|||||||
_instance = None
|
_instance = None
|
||||||
_processes: dict[int, subprocess.Popen] = {}
|
_processes: dict[int, subprocess.Popen] = {}
|
||||||
_lock: threading.Lock
|
_lock: threading.Lock
|
||||||
|
_operation_locks: dict[int, threading.Lock] # per-server operation lock
|
||||||
|
|
||||||
@classmethod
|
@classmethod
|
||||||
def get() -> ProcessManager
|
def get() -> ProcessManager
|
||||||
|
|
||||||
|
def get_operation_lock(server_id) -> threading.Lock
|
||||||
|
# Returns a per-server lock that serializes start/stop/restart
|
||||||
|
# Prevents concurrent start+stop races for the same server
|
||||||
def start(server_id, exe_path, args: list[str], cwd: str) -> int # returns PID
|
def start(server_id, exe_path, args: list[str], cwd: str) -> int # returns PID
|
||||||
def stop(server_id, timeout=30) -> bool
|
def stop(server_id, timeout=30) -> bool
|
||||||
def kill(server_id) -> bool
|
def kill(server_id) -> bool
|
||||||
@@ -579,6 +583,12 @@ class ConfigGenerator(Protocol):
|
|||||||
def get_config_version(self) -> str:
|
def get_config_version(self) -> str:
|
||||||
"""Current adapter schema version. Stored in game_configs.schema_version."""
|
"""Current adapter schema version. Stored in game_configs.schema_version."""
|
||||||
...
|
...
|
||||||
|
def migrate_config(self, old_version: str, config_json: dict[str, dict]) -> dict[str, dict]:
|
||||||
|
"""Transform config JSON from an older schema version to the current one.
|
||||||
|
Called by ConfigRepository when a section's stored schema_version
|
||||||
|
differs from get_config_version(). Returns the migrated config dict.
|
||||||
|
Raises ConfigMigrationError on failure (core rolls back; original stays)."""
|
||||||
|
...
|
||||||
def write_configs(self, server_id: int, server_dir: Path,
|
def write_configs(self, server_id: int, server_dir: Path,
|
||||||
config_sections: dict[str, dict]) -> list[Path]: ...
|
config_sections: dict[str, dict]) -> list[Path]: ...
|
||||||
def build_launch_args(self, config_sections: dict[str, dict],
|
def build_launch_args(self, config_sections: dict[str, dict],
|
||||||
|
|||||||
27
THREADING.md
27
THREADING.md
@@ -137,8 +137,16 @@ This means:
|
|||||||
# Each background thread creates its own SQLAlchemy connection
|
# Each background thread creates its own SQLAlchemy connection
|
||||||
# from the same engine (WAL mode allows concurrent reads)
|
# from the same engine (WAL mode allows concurrent reads)
|
||||||
# PRAGMA busy_timeout=5000 prevents "database is locked" errors
|
# PRAGMA busy_timeout=5000 prevents "database is locked" errors
|
||||||
|
#
|
||||||
|
# If busy_timeout is exhausted (5s), the write fails with
|
||||||
|
# OperationalError. Background threads retry with exponential
|
||||||
|
# backoff: 1s, 2s, 4s — then log and skip the tick.
|
||||||
|
# API request handlers retry up to 2 times with 1s backoff,
|
||||||
|
# then return 503 "database temporarily unavailable".
|
||||||
|
|
||||||
class BaseServerThread(threading.Thread):
|
class BaseServerThread(threading.Thread):
|
||||||
|
_db_retry_delays = [1.0, 2.0, 4.0] # seconds, exponential backoff
|
||||||
|
|
||||||
def run(self):
|
def run(self):
|
||||||
engine = get_engine()
|
engine = get_engine()
|
||||||
self._db = engine.connect()
|
self._db = engine.connect()
|
||||||
@@ -147,6 +155,13 @@ class BaseServerThread(threading.Thread):
|
|||||||
while not self._stop_event.is_set():
|
while not self._stop_event.is_set():
|
||||||
try:
|
try:
|
||||||
self.tick()
|
self.tick()
|
||||||
|
except OperationalError as e:
|
||||||
|
if "database is locked" in str(e):
|
||||||
|
retried = self._retry_db_write(self.tick)
|
||||||
|
if not retried:
|
||||||
|
logger.warning(f"{self.name}: DB locked after all retries, skipping tick")
|
||||||
|
else:
|
||||||
|
self.on_error(e)
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
self.on_error(e)
|
self.on_error(e)
|
||||||
self._stop_event.wait(self.interval)
|
self._stop_event.wait(self.interval)
|
||||||
@@ -155,6 +170,18 @@ class BaseServerThread(threading.Thread):
|
|||||||
finally:
|
finally:
|
||||||
self.teardown()
|
self.teardown()
|
||||||
self._db.close()
|
self._db.close()
|
||||||
|
|
||||||
|
def _retry_db_write(self, fn, max_retries=3):
|
||||||
|
for i, delay in enumerate(self._db_retry_delays[:max_retries]):
|
||||||
|
self._stop_event.wait(delay)
|
||||||
|
if self._stop_event.is_set():
|
||||||
|
return False
|
||||||
|
try:
|
||||||
|
fn()
|
||||||
|
return True
|
||||||
|
except OperationalError:
|
||||||
|
continue
|
||||||
|
return False
|
||||||
```
|
```
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|||||||
Reference in New Issue
Block a user