arma-modlist-tools/CLAUDE.md

# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Common Commands

```bash
# Run all tests (no network required)
python test_suite.py

# Check Python version and dependencies
python check_deps.py

# Full pipeline (parse → compare → fetch → link)
python run.py

# Parse + compare only (no download, no linking)
python run.py --skip-fetch --skip-link

# Diagnose mod folder name / steam_id issues
python check_names.py
python check_names.py --fix --fix-ids

# Launch the GUI
python gui.py
```

There is no build step, linter config, or package install beyond `pip install -r requirements.txt`.

## Architecture

### Package vs CLI layer

`arma_modlist_tools/` is a pure library: it never prints, never calls `sys.exit`, and leaves all console I/O to the caller. All CLI scripts (`run.py`, `fetch_mods.py`, `link_mods.py`, etc.) sit at the project root and call into the package. New functionality goes in the package first, then a CLI script wraps it.
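
A minimal sketch of that split, using hypothetical names (`ModlistError`, `count_mods`, and `main` are illustrative and not taken from the package): the library function returns data or raises, and only the CLI wrapper prints and decides the exit code.

```python
import sys
from typing import Dict, List, Optional


class ModlistError(Exception):
    """Hypothetical library-level error; the real package may use other types."""


def count_mods(presets: Dict[str, List[str]]) -> int:
    """Library layer: returns data or raises, never prints or exits."""
    if not presets:
        raise ModlistError("no presets given")
    return len({mod for mods in presets.values() for mod in mods})


def main(argv: Optional[List[str]] = None) -> int:
    """CLI layer: owns printing and the process exit code."""
    presets = {"alpha": ["CBA_A3", "ace"], "bravo": ["CBA_A3"]}  # stand-in data
    try:
        total = count_mods(presets)
    except ModlistError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print(f"{total} unique mods")
    return 0

# A root-level script would end with: sys.exit(main())
```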

### Data flow

```
modlist_html/*.html
    └─ parser.parse_modlist_dir()
           └─ compare.compare_presets()
                  └─ comparison.json  ←─ source of truth for groups + mod identity
                         ├─ fetcher.build_server_index()  ←─ Caddy JSON API
                         │      └─ fetcher.find_mod_folder()  (steam_id first, name fallback)
                         │             └─ downloads/{group}/@ModName/
                         │                    └─ linker.link_group()
                         │                           └─ arma_dir/@ModName  (junction/symlink)
                         └─ reporter.build_missing_report()  →  missing_report.json
```

### Group naming convention

- `"shared"` — mods present in **all** compared presets
- `"<preset_name>"` — mods unique to one preset (key from `comparison["unique"]`)

This group label is stored in `missing_report.json` per-mod so `sync_missing.py` knows where to place newly available mods without re-reading `comparison.json`.
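
As a rough illustration of how a group label could be derived, the sketch below assumes `comparison.json` has a top-level `"shared"` list next to the `"unique"` mapping and that each mod entry carries a `"steam_id"` key. Only `comparison["unique"]` is confirmed above; the rest of the shape and the function name `group_for` are assumptions.

```python
from typing import Optional


def group_for(comparison: dict, steam_id: str) -> Optional[str]:
    """Return "shared", a preset name, or None if the mod is not listed."""
    # Assumed shape: {"shared": [mod, ...], "unique": {preset_name: [mod, ...]}}
    if any(mod.get("steam_id") == steam_id for mod in comparison.get("shared", [])):
        return "shared"
    for preset_name, mods in comparison.get("unique", {}).items():
        if any(mod.get("steam_id") == steam_id for mod in mods):
            return preset_name
    return None
```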

### Server index structure

`build_server_index()` returns:
```python
{
    "by_steam_id": {"450814997": "https://server/@cba_a3/"},  # primary lookup
    "by_name":     {"cbaa3":     "https://server/@cba_a3/"},  # normalized fallback
    "folders":     [...]                                       # raw Caddy listing
}
```

`_normalize_name` strips `@`, lowercases, and removes all non-alphanumeric characters: `"@CBA_A3"` → `"cbaa3"`. It is used by both the index builder and every lookup, so both sides of a comparison are normalized the same way.
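
The normalization and the "steam_id first, name fallback" lookup can be sketched as follows. This is not the package's code: `normalize_name` and `lookup_mod_url` are hypothetical stand-ins, and treating "alphanumeric" as ASCII `a-z0-9` is an assumption.

```python
import re
from typing import Optional


def normalize_name(name: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def lookup_mod_url(index: dict, steam_id: Optional[str], name: str) -> Optional[str]:
    """Try the steam_id table first, then fall back to the normalized name."""
    if steam_id and steam_id in index["by_steam_id"]:
        return index["by_steam_id"][steam_id]
    return index["by_name"].get(normalize_name(name))
```

Because `@` is itself non-alphanumeric, a single substitution covers both the prefix and the punctuation.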

### Junction / symlink critical rules

**Detection:** `os.path.islink()` returns `False` for Windows junctions. Always use `_is_junction()` from `linker.py`, which checks `st_file_attributes & 0x400` (`FILE_ATTRIBUTE_REPARSE_POINT`) on Windows.

**Removal:** Use `os.rmdir()` on Windows and `os.unlink()` on Linux. **Never** `shutil.rmtree()` — it follows the junction and deletes the target mod files.

**Creation:** `cmd /c mklink /J <link> <target>` on Windows, `os.symlink()` on Linux.
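
The three rules above can be sketched together. The function names here are illustrative (the real helper is `_is_junction()` in `linker.py`, whose exact body is not shown in this file), and the refusal to remove a non-link is an added safety assumption.

```python
import os
import subprocess
import sys

FILE_ATTRIBUTE_REPARSE_POINT = 0x400


def is_junction(path: str) -> bool:
    """True for a Windows junction/reparse point or a POSIX symlink."""
    if sys.platform != "win32":
        return os.path.islink(path)
    try:
        attrs = os.lstat(path).st_file_attributes
    except OSError:
        return False
    return bool(attrs & FILE_ATTRIBUTE_REPARSE_POINT)


def create_link(link: str, target: str) -> None:
    """Create a junction on Windows, a directory symlink elsewhere."""
    if sys.platform == "win32":
        subprocess.run(["cmd", "/c", "mklink", "/J", link, target],
                       check=True, capture_output=True)
    else:
        os.symlink(target, link, target_is_directory=True)


def remove_link(path: str) -> None:
    """Remove only the link itself; the target folder is left untouched."""
    if not is_junction(path):
        raise ValueError(f"refusing to remove non-link: {path}")
    if sys.platform == "win32":
        os.rmdir(path)
    else:
        os.unlink(path)
```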

### check_names.py classification (two-pass)

- **Pass 1** collects raw `(server_name, local_steam_id)` for every disk folder.
- **Pass 2** builds `ok_disk_names`, the set of disk names that already match the server exactly. Any `MISMATCH` whose proposed server name is in `ok_disk_names` is reclassified as `ID_COLLISION`: the local `meta.cpp` has a wrong `publishedid` that belongs to a different mod. This prevents false rename suggestions caused by shared or duplicate steam IDs on the server.

`--fix-ids` corrects `meta.cpp` using steam IDs from `comparison.json` (sourced from Steam Workshop URLs in the HTML presets) as the authoritative source.
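
The two-pass classification can be illustrated with a simplified model. Here the Pass 1 result is reduced to a mapping of disk folder name to the server name proposed for it; the `"OK"` label and the function name `classify` are assumptions, while `MISMATCH` and `ID_COLLISION` come from the description above.

```python
from typing import Dict


def classify(proposed: Dict[str, str]) -> Dict[str, str]:
    """Map each disk folder name to OK, MISMATCH, or ID_COLLISION."""
    # Pass 2a: disk names that already equal their server name.
    ok_disk_names = {disk for disk, server in proposed.items() if disk == server}

    result = {}
    for disk, server in proposed.items():
        if disk in ok_disk_names:
            result[disk] = "OK"
        elif server in ok_disk_names:
            # Renaming would clash with a folder that is already correct,
            # so the local steam ID is the suspect, not the folder name.
            result[disk] = "ID_COLLISION"
        else:
            result[disk] = "MISMATCH"
    return result
```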

### GUI package

`gui/` is a CustomTkinter desktop application wrapping the CLI toolchain. Entry point is `gui.py` at the project root, which calls `gui.run_app()`.

**Key files:**
- `gui/__init__.py` — sets dark theme + blue color scheme; exports `run_app()`
- `gui/app.py` — `ArmaModManagerApp` main window; manages view routing, config loading, thread-safe log queue, and background pipeline execution
- `gui/wizard.py` — `SetupWizard` dialog shown on first launch when no `config.json` exists
- `gui/_constants.py` — window dimensions, status color constants, file paths
- `gui/_io.py` — `_QueueWriter` redirects stdout/stderr to a thread-safe queue so pipeline output streams into the Logs view. `write()` strips ANSI/CSI escape codes and converts bare `\r` to `\n` before enqueuing, so `tqdm` progress output is legible in the textbox.
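
A rough sketch of the `write()` behaviour described for `gui/_io.py`, under the assumption that a standard CSI regex is enough for `tqdm` output. The class name `QueueWriter` and the regex are illustrative, not copied from the project.

```python
import queue
import re

# Matches CSI sequences such as "\x1b[31m" or "\x1b[0K".
_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# A carriage return that is not part of a "\r\n" pair.
_BARE_CR = re.compile(r"\r(?!\n)")


class QueueWriter:
    """File-like object that forwards cleaned text to a thread-safe queue."""

    def __init__(self, log_queue: "queue.Queue[str]") -> None:
        self._queue = log_queue

    def write(self, text: str) -> int:
        cleaned = _BARE_CR.sub("\n", _CSI.sub("", text))
        if cleaned:
            self._queue.put(cleaned)
        return len(text)

    def flush(self) -> None:
        # Nothing is buffered, but print() and tqdm expect flush() to exist.
        pass
```

An instance can be passed to `contextlib.redirect_stdout()` for the duration of a pipeline run.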

**Views** (`gui/views/`): each inherits `BaseView`; `build()` runs once on creation, `refresh()` runs on each navigation:
- `dashboard.py` — overview, status, quick stats
- `mods.py` — browse and manage downloaded mods by group
- `tools.py` — link/unlink, rename folders, sync missing mods, check server
- `logs.py` — real-time log viewer fed from the stdout/stderr queue
- `settings.py` — in-app editor for `config.json` (server URL, paths, credentials)

**`_find_folder` (mods.py) — four-level name matching:** The mods view resolves a mod's local folder by mod name from `comparison.json`, which may differ from the server-canonical folder name used by the fetcher. Lookup order:
1. Exact: `@{mod_name}`
2. Case-insensitive: `@cba_a3` matches `CBA_A3`
3. Normalized (`_normalize_name`): strips all non-alphanumeric — handles punctuation/spacing differences, e.g. `@US GEAr- Units (IFA3)` matches `US GEAr: Units (IFA3)` (both → `usgearunitsifa3`)
4. Steam ID via `meta.cpp`: reads `publishedid` from each folder's `meta.cpp` and matches against `mod["steam_id"]` — handles the case where the folder name bears no resemblance to the modlist name but the mod content is correct
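
The lookup order above can be sketched as a chain of progressively looser checks. This is not the code in `mods.py`: the `mod["name"]` key, the helper names, and the `publishedid = <digits>` regex for `meta.cpp` are assumptions.

```python
import re
from pathlib import Path
from typing import Optional

_PUBLISHED_ID = re.compile(r"publishedid\s*=\s*(\d+)", re.IGNORECASE)


def _norm(name: str) -> str:
    return re.sub(r"[^a-z0-9]", "", name.lower())


def _read_published_id(folder: Path) -> Optional[str]:
    meta = folder / "meta.cpp"
    if not meta.is_file():
        return None
    match = _PUBLISHED_ID.search(meta.read_text(encoding="utf-8", errors="replace"))
    return match.group(1) if match else None


def find_folder(group_dir: Path, mod: dict) -> Optional[Path]:
    """Return the first folder matching any level, strictest level first."""
    folders = [p for p in group_dir.iterdir() if p.is_dir()]
    wanted = "@" + mod["name"]
    steam_id = mod.get("steam_id")
    checks = (
        lambda p: p.name == wanted,                      # 1. exact
        lambda p: p.name.lower() == wanted.lower(),      # 2. case-insensitive
        lambda p: _norm(p.name) == _norm(wanted),        # 3. normalized
        lambda p: steam_id is not None
        and _read_published_id(p) == str(steam_id),      # 4. meta.cpp steam ID
    )
    for check in checks:
        for folder in folders:
            if check(folder):
                return folder
    return None
```

Running each level over all folders before moving to the next keeps a loose match from shadowing a stricter one.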

**`selection.json`** — GUI selection state file, tracked in git. Persists which mods/groups are selected between GUI sessions. Written by the GUI; safe to delete (GUI recreates it on next save).

**`run_tool` subprocess streaming:** Tool scripts are launched via `subprocess.Popen` (not `subprocess.run`) with `stdout=PIPE, stderr=STDOUT`, read line-by-line via `iter(proc.stdout.readline, "")`, and posted to the log queue immediately. Python's own output buffering is disabled with the `-u` flag and `PYTHONUNBUFFERED=1` in the environment — without these, output would batch inside the pipe and only appear when the script exits. The `Popen` call uses `encoding="utf-8", errors="replace"` and sets `PYTHONUTF8=1` in the child environment so that tqdm's Unicode block characters (e.g. `▉`) don't crash the pipe reader on Windows, where the default `charmap` codec cannot decode them.
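
A minimal sketch of the streaming pattern described above. The function name `stream_script` and the `on_line` callback are hypothetical; in the GUI the callback would post to the log queue.

```python
import os
import subprocess
import sys
from typing import Callable, List


def stream_script(args: List[str], on_line: Callable[[str], None]) -> int:
    """Run a Python child process and forward each output line as it arrives."""
    env = dict(os.environ, PYTHONUNBUFFERED="1", PYTHONUTF8="1")
    proc = subprocess.Popen(
        [sys.executable, "-u", *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into the same stream
        encoding="utf-8",
        errors="replace",           # never crash on undecodable bytes
        env=env,
    )
    assert proc.stdout is not None
    # readline() returns "" only at EOF, which ends the iteration.
    for line in iter(proc.stdout.readline, ""):
        on_line(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait()
```

For example, `stream_script(["check_names.py"], log_queue.put)` would stream that script's output into a queue, assuming `log_queue` is a `queue.Queue`.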

**GUI threading model:** Every network or long-running operation runs in a `threading.Thread(daemon=True)` so the Tkinter event loop is never blocked. The only safe way to update widgets from a background thread is `self.after(0, callback)` — never touch widgets directly from a worker thread. `_poll_log` drains the entire log queue in one `after(80, ...)` tick and does a single batched `CTkTextbox.insert()` call rather than one per log entry, keeping the UI smooth even when `tqdm` emits many rapid updates during downloads. The wizard's "Test Connection" button follows the same pattern: `requests.get` runs in a daemon thread; the result is posted back via `self.after(0, ...)` with widget references captured *before* the thread starts, so stale references cannot update the wrong widgets if the user navigates away mid-request.
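
The batched drain in `_poll_log` can be sketched without Tkinter by injecting the two GUI operations as callables. `LogPoller`, `insert`, and `schedule` are hypothetical names; in the real app they would correspond to the textbox insert and `self.after`.

```python
import queue
from typing import Callable


def drain(log_queue: "queue.Queue[str]") -> str:
    """Empty the queue without blocking and return everything as one string."""
    parts = []
    while True:
        try:
            parts.append(log_queue.get_nowait())
        except queue.Empty:
            break
    return "".join(parts)


class LogPoller:
    def __init__(self, log_queue: "queue.Queue[str]",
                 insert: Callable[[str], None],
                 schedule: Callable[[int, Callable[[], None]], object]) -> None:
        self._queue = log_queue
        self._insert = insert        # e.g. lambda text: textbox.insert("end", text)
        self._schedule = schedule    # e.g. widget.after

    def poll(self) -> None:
        text = drain(self._queue)
        if text:
            self._insert(text)       # one widget call per tick, not per entry
        self._schedule(80, self.poll)
```

Worker threads only ever call `log_queue.put(...)`; all widget access stays inside `poll()`, which runs on the Tk main loop.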

**`run_pipeline` worker — import guard:** `from run import step_fetch, step_link` is performed inside its own `try/except` *before* stdout is redirected. If this import fails for any reason the exception is posted to the log via `self.after(0, ...)` and `_pipeline_done` is called so the UI resets cleanly. Previously an import failure would silently kill the worker thread and leave the pipeline button disabled forever.

**`build_server_index` progress callback:** Accepts an optional `progress_fn(current, total, name)` callback. `step_fetch` in `run.py` uses this to print `Indexing N/M: @FolderName` every 25 folders so the log never goes silent during the server scan phase. The library itself never calls `print` — the caller owns the I/O.
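
The callback contract can be illustrated with a stand-in for the indexing loop. `scan_folders` is hypothetical and does no network access; only the `progress_fn(current, total, name)` signature and the "every 25 folders" message come from the description above. Reporting the final folder as well is an added assumption.

```python
from typing import Callable, List, Optional

ProgressFn = Callable[[int, int, str], None]


def scan_folders(names: List[str], progress_fn: Optional[ProgressFn] = None) -> List[str]:
    """Library side: reports progress through the callback, never prints."""
    seen = []
    total = len(names)
    for current, name in enumerate(names, start=1):
        seen.append(name)            # real code would fetch folder metadata here
        if progress_fn is not None:
            progress_fn(current, total, name)
    return seen


def print_progress(current: int, total: int, name: str) -> None:
    """Caller side: decides how often and where to report."""
    if current % 25 == 0 or current == total:
        print(f"Indexing {current}/{total}: {name}")
```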

### `update_mods.py` — orphan file removal

After downloading updated files, `update_mods.py` compares every file in the local mod folder against the server's file list and **deletes any local files that no longer exist on the server**. This prevents stale `.pbo` or `.bisign` files from accumulating when a mod's content changes upstream. Each removed file is logged as `[-] orphan removed: <rel_path>` and the final summary line includes an orphan count. The orphan check runs even when no files need downloading (e.g. timestamps match but the local folder has extras).
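
A minimal sketch of the orphan check, assuming the server's file list is available as a set of POSIX-style relative paths and that comparison is case-sensitive. Both are assumptions, and `remove_orphans` is an illustrative name rather than the function in `update_mods.py`.

```python
from pathlib import Path
from typing import List, Set


def remove_orphans(mod_dir: Path, server_files: Set[str]) -> List[str]:
    """Delete local files absent from the server list; return their relative paths."""
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(mod_dir.rglob("*")):
        if not path.is_file():
            continue
        rel_path = path.relative_to(mod_dir).as_posix()
        if rel_path not in server_files:
            path.unlink()
            removed.append(rel_path)
            print(f"[-] orphan removed: {rel_path}")
    return removed
```

Since the function does not depend on a download step, it can run on its own, which matches the case where timestamps agree but the local folder has extras.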

### GUI localization (`gui/locales.py`)

All user-facing strings are centralised in `gui/locales.py`. Two languages are supported: English (`"en"`) and Vietnamese (`"vi"`).

**API:**
```python
from gui.locales import t, set_language, get_language

t("nav.dashboard")                          # → "Dashboard" or "Tổng quan"
t("dashboard.stats", total=42, shared=10)   # → "42 mods · 10 shared"
set_language("vi")                          # switch active language
get_language()                              # → "vi"
```

**Key naming:** flat dot-notation — `"<view>.<widget_purpose>"`, e.g. `"dashboard.run_btn"`, `"wizard.step1_title"`, `"tools.cn_warn"`.

**Dynamic strings** use `str.format_map` with keyword args. The dict value contains `{placeholder}` and the caller passes `t("key", placeholder=value)`.

**Hot-swap:** `app.switch_language(lang)` calls `set_language()`, saves the preference to `config.json` under `"ui": {"language": "..."}`, retranslates sidebar nav buttons, then calls `view.refresh()` on every cached view. Views that build all content in `refresh()` (Settings, Mods) update automatically. Views with static `build()`-time widgets (Dashboard, Logs, Tools) store widget references and retranslate them at the top of `refresh()`.

**Constraints:**
- `CTkTabview` tab names in `tools.py` are kept in English — they double as frame lookup keys (`tv.tab("Check Names")`) and cannot be renamed after creation.
- Segmented button values in `tools.py` (`"Status"`, `"Link"`, `"Unlink"`) are kept in English — they drive the logic in `_lm_on_change()`.
- `_VIEW_NAMES` routing keys (`"Dashboard"`, `"Mods"`, etc.) are kept in English — they are `_view_cache` dict keys.

**Adding a new string:** Add the key to both `_EN` and `_VI` dicts in `locales.py`. The `assert set(_EN.keys()) == set(_VI.keys())` guard at module load will catch any mismatch.
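
How the pieces of `locales.py` could fit together is sketched below. The two keys are taken from the API example above; the Vietnamese text for `dashboard.stats` is a placeholder, and the fallback to English (then to the key itself) for unknown keys is an assumption.

```python
_EN = {
    "nav.dashboard": "Dashboard",
    "dashboard.stats": "{total} mods · {shared} shared",
}
_VI = {
    "nav.dashboard": "Tổng quan",
    "dashboard.stats": "{total} mod · {shared} chung",  # placeholder translation
}

# Fails at import time if a key exists in only one language.
assert set(_EN.keys()) == set(_VI.keys())

_LANGUAGES = {"en": _EN, "vi": _VI}
_current = "en"


def set_language(lang: str) -> None:
    global _current
    if lang not in _LANGUAGES:
        raise ValueError(f"unsupported language: {lang}")
    _current = lang


def get_language() -> str:
    return _current


def t(key: str, **kwargs) -> str:
    """Look up a string in the active language and fill in placeholders."""
    template = _LANGUAGES[_current].get(key, _EN.get(key, key))
    return template.format_map(kwargs) if kwargs else template
```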

## Python Version Compatibility

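Minimum is Python **3.9**. All files that use `X | Y` union type annotations **must** have `from __future__ import annotations` as the first import, ahead of every statement except the module docstring. Without it, the `|` syntax raises `TypeError` at runtime on Python < 3.10. The import only defers annotations: an `X | Y` expression evaluated at runtime (for example in `isinstance()` or a module-level type alias) still fails on 3.9. Every module in `arma_modlist_tools/` already has it; any new CLI script you add must include it too.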
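
A small illustration of the rule, with a hypothetical `resolve` function: the annotations use `|` safely because the future import turns them into strings, while the runtime alias uses `typing.Union` so it also works on 3.9.

```python
from __future__ import annotations

from pathlib import Path
from typing import Union

# Evaluated at import time, so "str | Path" would fail on Python 3.9 here.
PathLike = Union[str, Path]


def resolve(path: str | Path | None = None) -> Path | None:
    """Annotations are never evaluated, so `|` is fine on 3.9."""
    return Path(path) if path is not None else None


def is_path_like(value: object) -> bool:
    # isinstance() needs a tuple on 3.9; `str | Path` is only valid from 3.10.
    return isinstance(value, (str, Path))
```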

### `fix_console_encoding` — `None` stdout guard

When the GUI is launched via `pythonw.exe` (no console window), Python sets `sys.stdout` and `sys.stderr` to `None`. `fix_console_encoding()` must check `if sys.stdout is None or sys.stderr is None: return` **before** accessing `.encoding`, otherwise it raises `AttributeError: 'NoneType' object has no attribute 'encoding'`. This error surfaces in the GUI as *"Failed to load pipeline"* because `run.py` calls `fix_console_encoding()` at module level and the exception is caught by the pipeline import guard.
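
A sketch of the guard. Only the early return is taken from the description above; what the real function does afterwards is not shown in this file, so the `reconfigure` step is an assumption about a typical implementation.

```python
import sys


def fix_console_encoding() -> None:
    # Under pythonw.exe both streams are None; bail out before touching them.
    if sys.stdout is None or sys.stderr is None:
        return
    for stream in (sys.stdout, sys.stderr):
        encoding = (getattr(stream, "encoding", None) or "").lower()
        # Assumed behaviour: switch non-UTF-8 consoles to UTF-8 where supported.
        if encoding.replace("-", "") != "utf8" and hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")
```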

## Test Suite

`test_suite.py` uses a custom harness (no pytest/unittest dependency). Structure:

```python
group("section name")   # prints header
test("description", callable)  # runs fn, catches exceptions, tracks pass/fail
skip("description", "reason")  # marks skipped
```

Tests that exercise the linker use `tempfile.TemporaryDirectory()` — never the real `arma_dir`. Tests that would require network calls mock `list_mod_files` with `unittest.mock.patch`.

## Key Files Not in Git

- `config.json` — credentials + paths (copy from `config.template.json`)
- `downloads/` — downloaded mod files, can be several GB
- `modlist_json/` — generated JSON output

The `.html` preset files in `modlist_html/` **are** tracked as example inputs.