diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 737b81a..3f8bba3 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -211,6 +211,7 @@ def start(self, server_id: int) -> dict: ### ProcessManager (Core) - Singleton that owns all subprocess handles - Thread-safe dict: `{server_id: subprocess.Popen}` +- **Per-server operation lock** (`_operation_locks: dict[int, threading.Lock]`) — serializes start/stop/restart for the same server. Prevents race conditions when two admins hit "start" simultaneously or when start+stop overlap. Each server gets its own lock; different servers operate independently. - `start()` sets `cwd=servers/{server_id}/` so relative config paths resolve correctly - On Windows: `terminate()` = `TerminateProcess` (hard kill, no SIGTERM) — graceful shutdown must go through adapter's RemoteAdmin - Provides: `start()`, `stop()`, `restart()`, `is_running()`, `send_command()` @@ -232,6 +233,7 @@ def start(self, server_id: int) -> dict: - **Validation is delegated to adapter's Pydantic models** — core never inspects config content - **Sensitive field encryption**: calls `adapter.get_config_generator().get_sensitive_fields(section)` to identify which JSON keys to encrypt/decrypt via Fernet - **Optimistic locking**: each row includes `config_version` (integer). On PUT, client sends the version they read. If version mismatch, return 409 Conflict. +- **Schema migration**: on read, if `schema_version` differs from `adapter.get_config_version()`, core calls `adapter.migrate_config(old_version, config_json)`. On success, updates the row with migrated JSON and new `schema_version`. On `ConfigMigrationError`, keeps original config and logs a warning. - Provides: `get_section()`, `get_all_sections()`, `upsert_section()`, `delete_sections()` ### Adapter Exceptions (Standard Error Types) @@ -245,6 +247,7 @@ Adapters raise specific exception types so the core can handle errors precisely: | `LaunchArgsError` | build_launch_args() fails (missing mod, bad path) | Set server status='error', return 400 | | `RemoteAdminError` | Remote admin connection/command fails | Log warning, return 503 with detail | | `ExeNotAllowedError` | Executable not in adapter's allowlist | Return 400 with allowed list | +| `ConfigMigrationError` | `migrate_config()` fails to transform old schema | Keep original config, log warning, server runs with old schema | --- @@ -529,10 +532,11 @@ Steps: | Sync vs async DB | **Sync SQLAlchemy only** | All DB access is synchronous; background threads are non-async; no aiosqlite dependency | | WebSocket auth | JWT in query param on connect | Browser WS API doesn't support headers | | Process ownership | **ProcessManager singleton** | Single source of truth; prevents duplicate launches | +| Server operation safety | **Per-server operation lock** | ProcessManager holds a lock per server_id that serializes start/stop/restart. Two concurrent start requests for the same server: the second waits for the first to complete, then sees the server is already running. Different servers are independent (no cross-server locking). | | Config files | **Adapter regenerates on each start** | Always fresh from DB; no sync drift; adapter's structured builder prevents config injection | | Config write failure | **Atomic write + rollback** | Adapter writes to temp files first, then atomic rename. On failure, temp files are cleaned up — original files remain untouched. Server start never proceeds with partial config. | | Sensitive field encryption | **Adapter declares via get_sensitive_fields()** | ConfigGenerator protocol returns list of JSON keys per section that need Fernet encryption. Core's ConfigRepository handles encrypt/decrypt transparently. | -| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, adapter provides a migration function. | +| Adapter schema versioning | **config_version in game_configs row** | Each config section row stores a version string. On adapter update, if version differs, core calls `adapter.migrate_config(old_version, config_json)` which returns the migrated dict. On migration failure (ConfigMigrationError), core keeps the original config and logs a warning — the server can still run with the old schema. | | Adapter error communication | **Typed adapter exceptions** | Adapters raise specific exception types (ConfigWriteError, ConfigValidationError, LaunchArgsError, RemoteAdminError). Core catches specifically and sets appropriate DB status + returns clear API errors. | | Remote admin thread safety | **Core wraps with lock** | Core wraps RemoteAdminClient calls with a threading.Lock. Adapter clients don't need to be thread-safe. One lock per server — API requests and poller thread share safely. | | Third-party adapter loading | **Setuptools entry_points** | Third-party adapters register via `languard.adapters` entry_point group. Core scans entry_points at startup and auto-registers. Built-in adapters registered on import. | diff --git a/DATABASE.md b/DATABASE.md index 3127434..4948540 100644 --- a/DATABASE.md +++ b/DATABASE.md @@ -6,6 +6,7 @@ - WAL mode enabled: `PRAGMA journal_mode=WAL` — allows concurrent reads during writes - Foreign keys enabled: `PRAGMA foreign_keys=ON` - Busy timeout: `PRAGMA busy_timeout=5000` — prevents "database is locked" errors under concurrent thread writes +- **Retry on exhaustion**: If busy_timeout is exceeded (5s), writes fail with `OperationalError("database is locked")`. Background threads retry with exponential backoff (1s, 2s, 4s), then skip the tick. API handlers retry up to 2 times with 1s backoff, then return 503. See THREADING.md for the full retry implementation. --- @@ -537,7 +538,17 @@ The `config_version` column in `game_configs` prevents lost updates when two adm 4. If match: increment version, write new JSON, return 200 5. If no match (version changed): return **409 Conflict** with current config for client-side merge -## game_data JSON Schema +## Config Schema Migration + +When the adapter is updated and `get_config_version()` returns a newer version than what's stored in `game_configs.schema_version`, the core automatically migrates: + +1. On read, detect `stored_schema_version != adapter.get_config_version()` +2. Call `adapter.migrate_config(stored_schema_version, config_json)` → returns migrated dict +3. Update the row: `SET config_json = ?, schema_version = ? WHERE id = ?` +4. On `ConfigMigrationError`: keep original config, log warning, server runs with old schema +5. Migration is per-section — each section can have a different stored version + +This ensures config data is always compatible with the current adapter without manual intervention. The `game_data` columns on `players`, `missions`, `mods`, and `bans` are validated by adapter-provided Pydantic models. Each capability protocol optionally provides a schema: diff --git a/FRONTEND.md b/FRONTEND.md index 2f9200b..be7903e 100644 --- a/FRONTEND.md +++ b/FRONTEND.md @@ -895,6 +895,14 @@ frontend/ - Optimistic locking (send `config_version` on save) - 409 conflict resolution (show diff, allow override/merge) - Validation errors from adapter (field-level, from Pydantic) +- **Error boundary**: if `jsonSchemaToFields()` encounters an unsupported or malformed schema type (deeply nested objects, unknown formats), render a fallback "This section uses an unsupported field type. Edit raw JSON." with a raw JSON editor instead of crashing the form + +**ConfigForm test plan** (critical — highest-risk component): +- **Unit: `jsonSchemaToFields`** — test every supported type mapping (string→text, integer→number, boolean→toggle, enum→select, array→repeatable), test sensitive field masking, test unknown type → fallback descriptor with `type: "raw_json"` +- **Unit: `jsonSchemaToFields`** — test malformed schema input (missing `properties`, nested `$ref`, `oneOf`/`anyOf`) → returns fallback descriptor, never throws +- **Integration: `ConfigForm`** — render with a 2-section schema, verify field rendering, toggle a boolean, verify dirty state, submit and verify request payload +- **Integration: 409 conflict** — after save, mock a 409 response, verify ConflictDialog appears with diff +- **Integration: error boundary** — mount `ConfigForm` with a schema that has unsupported `type: "object"` nested 3 levels deep, verify raw JSON fallback renders instead of crash **`LogViewer`** — performance-critical: - Virtualized rendering (react-window or similar) @@ -929,7 +937,7 @@ All API data uses TanStack Query with appropriate stale times: | Mods | 5min | — | | Bans | 2min | — | | Game types | 30min | — (rarely changes) | -| Config schema | 30min | — (tied to adapter version) | +| Config schema | 30min | — (tied to adapter version) **Invalidation**: when `GET /games/{type}` returns a different `schema_version` than previously cached, TanStack Query invalidates all config schema queries for that game type. This prevents stale forms after an adapter update. | **Optimistic updates** on: - Server start/stop → immediately update status in cache, rollback on error @@ -1071,7 +1079,7 @@ const { hasMissions, hasMods, hasRemoteAdmin } = useCapability(serverId); interface FieldDescriptor { name: string; label: string; - type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array"; + type: "text" | "number" | "boolean" | "select" | "textarea" | "password" | "array" | "raw_json"; default?: unknown; min?: number; max?: number; @@ -1087,6 +1095,9 @@ function jsonSchemaToFields( ): FieldDescriptor[] { // Walk schema.properties, map each to a FieldDescriptor // Mark fields in sensitiveFields as type: "password" + // Unknown types (nested objects, oneOf/anyOf, $ref) → type: "raw_json" + // The form renders a textarea with JSON editing for these fields + // rather than crashing or hiding the field silently } ``` @@ -1105,6 +1116,31 @@ function jsonSchemaToFields( Future adapters register themselves; the card list auto-populates from `GET /games`. +### Schema Cache Invalidation + +Adapter config schemas are cached with `staleTime: 30min`. When an adapter is updated and its `schema_version` changes, the frontend must not serve a stale schema. The invalidation pattern: + +```typescript +// When fetching game type info, compare schema_version +function useGameTypeWithInvalidation(gameType: string) { + const queryClient = useQueryClient(); + return useQuery({ + queryKey: ["gameType", gameType], + staleTime: 30 * 60 * 1000, + onSuccess: (data) => { + const prevVersion = localStorage.getItem(`schema_version_${gameType}`); + if (prevVersion && prevVersion !== data.schema_version) { + // Adapter was updated — invalidate all config schema queries + queryClient.invalidateQueries({ queryKey: ["configSchema", gameType] }); + localStorage.setItem(`schema_version_${gameType}`, data.schema_version); + } else if (!prevVersion) { + localStorage.setItem(`schema_version_${gameType}`, data.schema_version); + } + }, + }); +} +``` + --- ## Error Handling UX @@ -1121,6 +1157,7 @@ Future adapters register themselves; the card list auto-populates from `GET /gam | WS disconnect | StatusBar indicator + stale data with timestamp | | WS reconnect | Automatic; no user action needed | | Server crashed | Toast notification + status dot turns red | +| Malformed adapter schema | Raw JSON fallback in ConfigForm section ("This section uses an unsupported field type. Edit raw JSON.") | **No `alert()` calls.** All feedback uses toast (transient) or inline (persistent) patterns. diff --git a/IMPLEMENTATION_PLAN.md b/IMPLEMENTATION_PLAN.md index e601902..2bf1244 100644 --- a/IMPLEMENTATION_PLAN.md +++ b/IMPLEMENTATION_PLAN.md @@ -447,6 +447,7 @@ Verify React app can: ### Adapter contract tests - Template test suite that any new adapter should pass - Tests: ConfigGenerator produces valid sections and valid config files, ProcessConfig returns allowed executables, LogParser parses sample lines +- ConfigGenerator migration test: `migrate_config(old_version, config_json)` returns valid migrated dict; `ConfigMigrationError` on invalid old_version ### Load notes - SQLite with WAL handles concurrent reads from 4 threads per server well diff --git a/MODULES.md b/MODULES.md index e494358..c7c7060 100644 --- a/MODULES.md +++ b/MODULES.md @@ -281,10 +281,14 @@ class ProcessManager: _instance = None _processes: dict[int, subprocess.Popen] = {} _lock: threading.Lock + _operation_locks: dict[int, threading.Lock] # per-server operation lock @classmethod def get() -> ProcessManager + def get_operation_lock(server_id) -> threading.Lock + # Returns a per-server lock that serializes start/stop/restart + # Prevents concurrent start+stop races for the same server def start(server_id, exe_path, args: list[str], cwd: str) -> int # returns PID def stop(server_id, timeout=30) -> bool def kill(server_id) -> bool @@ -579,6 +583,12 @@ class ConfigGenerator(Protocol): def get_config_version(self) -> str: """Current adapter schema version. Stored in game_configs.schema_version.""" ... + def migrate_config(self, old_version: str, config_json: dict[str, dict]) -> dict[str, dict]: + """Transform config JSON from an older schema version to the current one. + Called by ConfigRepository when a section's stored schema_version + differs from get_config_version(). Returns the migrated config dict. + Raises ConfigMigrationError on failure (core rolls back; original stays).""" + ... def write_configs(self, server_id: int, server_dir: Path, config_sections: dict[str, dict]) -> list[Path]: ... def build_launch_args(self, config_sections: dict[str, dict], diff --git a/THREADING.md b/THREADING.md index 850a0ac..82f88a3 100644 --- a/THREADING.md +++ b/THREADING.md @@ -137,8 +137,16 @@ This means: # Each background thread creates its own SQLAlchemy connection # from the same engine (WAL mode allows concurrent reads) # PRAGMA busy_timeout=5000 prevents "database is locked" errors +# +# If busy_timeout is exhausted (5s), the write fails with +# OperationalError. Background threads retry with exponential +# backoff: 1s, 2s, 4s — then log and skip the tick. +# API request handlers retry up to 2 times with 1s backoff, +# then return 503 "database temporarily unavailable". class BaseServerThread(threading.Thread): + _db_retry_delays = [1.0, 2.0, 4.0] # seconds, exponential backoff + def run(self): engine = get_engine() self._db = engine.connect() @@ -147,6 +155,13 @@ class BaseServerThread(threading.Thread): while not self._stop_event.is_set(): try: self.tick() + except OperationalError as e: + if "database is locked" in str(e): + retried = self._retry_db_write(self.tick) + if not retried: + logger.warning(f"{self.name}: DB locked after all retries, skipping tick") + else: + self.on_error(e) except Exception as e: self.on_error(e) self._stop_event.wait(self.interval) @@ -155,6 +170,18 @@ class BaseServerThread(threading.Thread): finally: self.teardown() self._db.close() + + def _retry_db_write(self, fn, max_retries=3): + for i, delay in enumerate(self._db_retry_delays[:max_retries]): + self._stop_event.wait(delay) + if self._stop_event.is_set(): + return False + try: + fn() + return True + except OperationalError: + continue + return False ``` ---