fix: address design review ACT NOW items (6 risk gaps)
- Add migrate_config() to ConfigGenerator protocol for schema version upgrades - Add per-server operation lock to ProcessManager to prevent start/stop races - Add busy_timeout retry/backoff strategy (exponential: 1s, 2s, 4s) for DB lock exhaustion - Add ConfigForm testing strategy and error boundary for malformed schemas - Add schema cache invalidation on adapter version change - Add ConfigMigrationError to typed adapter exceptions
This commit is contained in:
@@ -211,6 +211,7 @@ def start(self, server_id: int) -> dict:
|
||||
### ProcessManager (Core)
|
||||
- Singleton that owns all subprocess handles
|
||||
- Thread-safe dict: `{server_id: subprocess.Popen}`
|
||||
- **Per-server operation lock** (`_operation_locks: dict[int, threading.Lock]`) — serializes start/stop/restart for the same server. Prevents race conditions when two admins hit "start" simultaneously or when start+stop overlap. Each server gets its own lock; different servers operate independently.
|
||||
- `start()` sets `cwd=servers/{server_id}/` so relative config paths resolve correctly
|
||||
- On Windows: `terminate()` = `TerminateProcess` (hard kill, no SIGTERM) — graceful shutdown must go through adapter's RemoteAdmin
|
||||
- Provides: `start()`, `stop()`, `restart()`, `is_running()`, `send_command()`
|
||||
@@ -232,6 +233,7 @@ def start(self, server_id: int) -> dict:
|
||||
- **Validation is delegated to adapter's Pydantic models** — core never inspects config content
|
||||
- **Sensitive field encryption**: calls `adapter.get_config_generator().get_sensitive_fields(section)` to identify which JSON keys to encrypt/decrypt via Fernet
|
||||
- **Optimistic locking**: each row includes `config_version` (integer). On PUT, client sends the version they read. If version mismatch, return 409 Conflict.
|
||||
- **Schema migration**: on read, if `schema_version` differs from `adapter.get_config_version()`, core calls `adapter.migrate_config(old_version, config_json)`. On success, updates the row with migrated JSON and new `schema_version`. On `ConfigMigrationError`, keeps original config and logs a warning.
|
||||
- Provides: `get_section()`, `get_all_sections()`, `upsert_section()`, `delete_sections()`
|
||||
|
||||
### Adapter Exceptions (Standard Error Types)
|
||||
@@ -245,6 +247,7 @@ Adapters raise specific exception types so the core can handle errors precisely:
|
||||
| `LaunchArgsError` | build_launch_args() fails (missing mod, bad path) | Set server status='error', return 400 |
|
||||
| `RemoteAdminError` | Remote admin connection/command fails | Log warning, return 503 with detail |
|
||||
| `ExeNotAllowedError` | Executable not in adapter's allowlist | Return 400 with allowed list |
|
||||
| `ConfigMigrationError` | `migrate_config()` fails to transform old schema | Keep original config, log warning, server runs with old schema |
|
||||
|
||||
---
|
||||
|
||||
@@ -529,10 +532,11 @@ Steps:
|
||||
| Sync vs async DB | **Sync SQLAlchemy only** | All DB access is synchronous; background threads are non-async; no aiosqlite dependency |
|
||||
| WebSocket auth | JWT in query param on connect | Browser WS API doesn't support headers |
|
||||
| Process ownership | **ProcessManager singleton** | Single source of truth; prevents duplicate launches |
|
||||
| Server operation safety | **Per-server operation lock** | ProcessManager holds a lock per server_id that serializes start/stop/restart. Two concurrent start requests for the same server: the second waits for the first to complete, then sees the server is already running. Different servers are independent (no cross-server locking). |
|
||||
| Config files | **Adapter regenerates on each start** | Always fresh from DB; no sync drift; adapter's structured builder prevents config injection |
|
||||
| Config write failure | **Atomic write + rollback** | Adapter writes to temp files first, then atomic rename. On failure, temp files are cleaned up — original files remain untouched. Server start never proceeds with partial config. |
|
||||
| Sensitive field encryption | **Adapter declares via get_sensitive_fields()** | ConfigGenerator protocol returns list of JSON keys per section that need Fernet encryption. Core's ConfigRepository handles encrypt/decrypt transparently. |
|
||||
| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, adapter provides a migration function. |
|
||||
| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, core calls `adapter.migrate_config(old_version, config_json)` which returns the migrated dict. On migration failure (ConfigMigrationError), core keeps the original config and logs a warning — the server can still run with the old schema. |
|
||||
| Adapter error communication | **Typed adapter exceptions** | Adapters raise specific exception types (ConfigWriteError, ConfigValidationError, LaunchArgsError, RemoteAdminError). Core catches specifically and sets appropriate DB status + returns clear API errors. |
|
||||
| Remote admin thread safety | **Core wraps with lock** | Core wraps RemoteAdminClient calls with a threading.Lock. Adapter clients don't need to be thread-safe. One lock per server — API requests and poller thread share safely. |
|
||||
| Third-party adapter loading | **Setuptools entry_points** | Third-party adapters register via `languard.adapters` entry_point group. Core scans entry_points at startup and auto-registers. Built-in adapters registered on import. |
|
||||
|
||||
13
DATABASE.md
13
DATABASE.md
@@ -6,6 +6,7 @@
|
||||
- WAL mode enabled: `PRAGMA journal_mode=WAL` — allows concurrent reads during writes
|
||||
- Foreign keys enabled: `PRAGMA foreign_keys=ON`
|
||||
- Busy timeout: `PRAGMA busy_timeout=5000` — prevents "database is locked" errors under concurrent thread writes
|
||||
- **Retry on exhaustion**: If busy_timeout is exceeded (5s), writes fail with `OperationalError("database is locked")`. Background threads retry with exponential backoff (1s, 2s, 4s), then skip the tick. API handlers retry up to 2 times with 1s backoff, then return 503. See THREADING.md for the full retry implementation.
|
||||
|
||||
---
|
||||
|
||||
@@ -537,7 +538,17 @@ The `config_version` column in `game_configs` prevents lost updates when two adm
|
||||
4. If match: increment version, write new JSON, return 200
|
||||
5. If no match (version changed): return **409 Conflict** with current config for client-side merge
|
||||
|
||||
## game_data JSON Schema
|
||||
## Config Schema Migration
|
||||
|
||||
When the adapter is updated and `get_config_version()` returns a newer version than what's stored in `game_configs.schema_version`, the core automatically migrates:
|
||||
|
||||
1. On read, detect `stored_schema_version != adapter.get_config_version()`
|
||||
2. Call `adapter.migrate_config(stored_schema_version, config_json)` → returns migrated dict
|
||||
3. Update the row: `SET config_json = ?, schema_version = ? WHERE id = ?`
|
||||
4. On `ConfigMigrationError`: keep original config, log warning, server runs with old schema
|
||||
5. Migration is per-section — each section can have a different stored version
|
||||
|
||||
This ensures config data is always compatible with the current adapter without manual intervention.
|
||||
|
||||
The `game_data` columns on `players`, `missions`, `mods`, and `bans` are validated by adapter-provided Pydantic models. Each capability protocol optionally provides a schema:
|
||||
|
||||
|
||||
41
FRONTEND.md
41
FRONTEND.md
@@ -895,6 +895,14 @@ frontend/
|
||||
- Optimistic locking (send `config_version` on save)
|
||||
- 409 conflict resolution (show diff, allow override/merge)
|
||||
- Validation errors from adapter (field-level, from Pydantic)
|
||||
- **Error boundary**: if `jsonSchemaToFields()` encounters an unsupported or malformed schema type (deeply nested objects, unknown formats), render a fallback "This section uses an unsupported field type. Edit raw JSON." with a raw JSON editor instead of crashing the form
|
||||
|
||||
**ConfigForm test plan** (critical — highest-risk component):
|
||||
- **Unit: `jsonSchemaToFields`** — test every supported type mapping (string→text, integer→number, boolean→toggle, enum→select, array→repeatable), test sensitive field masking, test unknown type → fallback descriptor with `type: "raw_json"`
|
||||
- **Unit: `jsonSchemaToFields`** — test malformed schema input (missing `properties`, nested `$ref`, `oneOf`/`anyOf`) → returns fallback descriptor, never throws
|
||||
- **Integration: `ConfigForm`** — render with a 2-section schema, verify field rendering, toggle a boolean, verify dirty state, submit and verify request payload
|
||||
- **Integration: 409 conflict** — after save, mock a 409 response, verify ConflictDialog appears with diff
|
||||
- **Integration: error boundary** — mount `ConfigForm` with a schema that has unsupported `type: "object"` nested 3 levels deep, verify raw JSON fallback renders instead of crash
|
||||
|
||||
**`LogViewer`** — performance-critical:
|
||||
- Virtualized rendering (react-window or similar)
|
||||
@@ -929,7 +937,7 @@ All API data uses TanStack Query with appropriate stale times:
|
||||
| Mods | 5min | — |
|
||||
| Bans | 2min | — |
|
||||
| Game types | 30min | — (rarely changes) |
|
||||
| Config schema | 30min | — (tied to adapter version) |
|
||||
| Config schema | 30min | — (tied to adapter version) **Invalidation**: when `GET /games/{type}` returns a different `schema_version` than previously cached, TanStack Query invalidates all config schema queries for that game type. This prevents stale forms after an adapter update. |
|
||||
|
||||
**Optimistic updates** on:
|
||||
- Server start/stop → immediately update status in cache, rollback on error
|
||||
@@ -1071,7 +1079,7 @@ const { hasMissions, hasMods, hasRemoteAdmin } = useCapability(serverId);
|
||||
interface FieldDescriptor {
|
||||
name: string;
|
||||
label: string;
|
||||
type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array";
|
||||
type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array" | "raw_json";
|
||||
default?: unknown;
|
||||
min?: number;
|
||||
max?: number;
|
||||
@@ -1087,6 +1095,9 @@ function jsonSchemaToFields(
|
||||
): FieldDescriptor[] {
|
||||
// Walk schema.properties, map each to a FieldDescriptor
|
||||
// Mark fields in sensitiveFields as type: "password"
|
||||
// Unknown types (nested objects, oneOf/anyOf, $ref) → type: "raw_json"
|
||||
// The form renders a textarea with JSON editing for these fields
|
||||
// rather than crashing or hiding the field silently
|
||||
}
|
||||
```
|
||||
|
||||
@@ -1105,6 +1116,31 @@ function jsonSchemaToFields(
|
||||
|
||||
Future adapters register themselves; the card list auto-populates from `GET /games`.
|
||||
|
||||
### Schema Cache Invalidation
|
||||
|
||||
Adapter config schemas are cached with `staleTime: 30min`. When an adapter is updated and its `schema_version` changes, the frontend must not serve a stale schema. The invalidation pattern:
|
||||
|
||||
```typescript
|
||||
// When fetching game type info, compare schema_version
|
||||
function useGameTypeWithInvalidation(gameType: string) {
|
||||
const queryClient = useQueryClient();
|
||||
return useQuery({
|
||||
queryKey: ["gameType", gameType],
|
||||
staleTime: 30 * 60 * 1000,
|
||||
onSuccess: (data) => {
|
||||
const prevVersion = localStorage.getItem(`schema_version_${gameType}`);
|
||||
if (prevVersion && prevVersion !== data.schema_version) {
|
||||
// Adapter was updated — invalidate all config schema queries
|
||||
queryClient.invalidateQueries({ queryKey: ["configSchema", gameType] });
|
||||
localStorage.setItem(`schema_version_${gameType}`, data.schema_version);
|
||||
} else if (!prevVersion) {
|
||||
localStorage.setItem(`schema_version_${gameType}`, data.schema_version);
|
||||
}
|
||||
},
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error Handling UX
|
||||
@@ -1121,6 +1157,7 @@ Future adapters register themselves; the card list auto-populates from `GET /gam
|
||||
| WS disconnect | StatusBar indicator + stale data with timestamp |
|
||||
| WS reconnect | Automatic; no user action needed |
|
||||
| Server crashed | Toast notification + status dot turns red |
|
||||
| Malformed adapter schema | Raw JSON fallback in ConfigForm section ("This section uses an unsupported field type. Edit raw JSON.") |
|
||||
|
||||
**No `alert()` calls.** All feedback uses toast (transient) or inline (persistent) patterns.
|
||||
|
||||
|
||||
@@ -447,6 +447,7 @@ Verify React app can:
|
||||
### Adapter contract tests
|
||||
- Template test suite that any new adapter should pass
|
||||
- Tests: ConfigGenerator produces valid sections and valid config files, ProcessConfig returns allowed executables, LogParser parses sample lines
|
||||
- ConfigGenerator migration test: `migrate_config(old_version, config_json)` returns valid migrated dict; `ConfigMigrationError` on invalid old_version
|
||||
|
||||
### Load notes
|
||||
- SQLite with WAL handles concurrent reads from 4 threads per server well
|
||||
|
||||
10
MODULES.md
10
MODULES.md
@@ -281,10 +281,14 @@ class ProcessManager:
|
||||
_instance = None
|
||||
_processes: dict[int, subprocess.Popen] = {}
|
||||
_lock: threading.Lock
|
||||
_operation_locks: dict[int, threading.Lock] # per-server operation lock
|
||||
|
||||
@classmethod
|
||||
def get() -> ProcessManager
|
||||
|
||||
def get_operation_lock(server_id) -> threading.Lock
|
||||
# Returns a per-server lock that serializes start/stop/restart
|
||||
# Prevents concurrent start+stop races for the same server
|
||||
def start(server_id, exe_path, args: list[str], cwd: str) -> int # returns PID
|
||||
def stop(server_id, timeout=30) -> bool
|
||||
def kill(server_id) -> bool
|
||||
@@ -579,6 +583,12 @@ class ConfigGenerator(Protocol):
|
||||
def get_config_version(self) -> str:
|
||||
"""Current adapter schema version. Stored in game_configs.schema_version."""
|
||||
...
|
||||
def migrate_config(self, old_version: str, config_json: dict[str, dict]) -> dict[str, dict]:
|
||||
"""Transform config JSON from an older schema version to the current one.
|
||||
Called by ConfigRepository when a section's stored schema_version
|
||||
differs from get_config_version(). Returns the migrated config dict.
|
||||
Raises ConfigMigrationError on failure (core rolls back; original stays)."""
|
||||
...
|
||||
def write_configs(self, server_id: int, server_dir: Path,
|
||||
config_sections: dict[str, dict]) -> list[Path]: ...
|
||||
def build_launch_args(self, config_sections: dict[str, dict],
|
||||
|
||||
27
THREADING.md
27
THREADING.md
@@ -137,8 +137,16 @@ This means:
|
||||
# Each background thread creates its own SQLAlchemy connection
|
||||
# from the same engine (WAL mode allows concurrent reads)
|
||||
# PRAGMA busy_timeout=5000 prevents "database is locked" errors
|
||||
#
|
||||
# If busy_timeout is exhausted (5s), the write fails with
|
||||
# OperationalError. Background threads retry with exponential
|
||||
# backoff: 1s, 2s, 4s — then log and skip the tick.
|
||||
# API request handlers retry up to 2 times with 1s backoff,
|
||||
# then return 503 "database temporarily unavailable".
|
||||
|
||||
class BaseServerThread(threading.Thread):
|
||||
_db_retry_delays = [1.0, 2.0, 4.0] # seconds, exponential backoff
|
||||
|
||||
def run(self):
|
||||
engine = get_engine()
|
||||
self._db = engine.connect()
|
||||
@@ -147,6 +155,13 @@ class BaseServerThread(threading.Thread):
|
||||
while not self._stop_event.is_set():
|
||||
try:
|
||||
self.tick()
|
||||
except OperationalError as e:
|
||||
if "database is locked" in str(e):
|
||||
retried = self._retry_db_write(self.tick)
|
||||
if not retried:
|
||||
logger.warning(f"{self.name}: DB locked after all retries, skipping tick")
|
||||
else:
|
||||
self.on_error(e)
|
||||
except Exception as e:
|
||||
self.on_error(e)
|
||||
self._stop_event.wait(self.interval)
|
||||
@@ -155,6 +170,18 @@ class BaseServerThread(threading.Thread):
|
||||
finally:
|
||||
self.teardown()
|
||||
self._db.close()
|
||||
|
||||
def _retry_db_write(self, fn, max_retries=3):
|
||||
for i, delay in enumerate(self._db_retry_delays[:max_retries]):
|
||||
self._stop_event.wait(delay)
|
||||
if self._stop_event.is_set():
|
||||
return False
|
||||
try:
|
||||
fn()
|
||||
return True
|
||||
except OperationalError:
|
||||
continue
|
||||
return False
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
Reference in New Issue
Block a user