# Setup Guide — Live Transcription Display

This guide walks through everything needed to get the system running on a
Windows 11 PC from scratch. Follow each section in order.

**Total setup time: approximately 30–60 minutes** (most of that is download time).

---

## Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the
combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed
natively on the host machine regardless. Packaging everything into a single
`.exe` is not practical for software of this type.

Instead, the project folder provides two scripts:

- `install.bat` — run **once** to set everything up
- `start.bat` — run each time to launch the full system

After setup, operation is a double-click.

---

## Part 1 — System Requirements

Before starting, confirm your PC meets these requirements:

| Requirement | Minimum | Recommended |
| --- | --- | --- |
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup | Not needed during services |

> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
> The RTX 5060 Ti (production hardware) is also confirmed working.

---

## Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. You will also need the CUDA Toolkit
(Part 2b below) — the driver alone is not sufficient for all components.

1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.

   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your
   GPU model, download and run the installer.

2. Choose **Express Installation**.

3. Restart the PC when prompted.

4. Verify the driver is working:

   - Press `Win + R`, type `cmd`, press Enter.
   - Type `nvidia-smi` and press Enter.
   - You should see a table with your GPU name and driver version.

   ```text
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 595.x   Driver Version: 595.x   CUDA Version: 13.x              |
   +-----------------------------------------------------------------------------+
   | RTX 5060 Ti ...
   ```

   If this command is not found, the driver did not install correctly.

---

## Part 2b — CUDA Toolkit

The NVIDIA driver alone is not enough for all GPU components. The CUDA Toolkit
provides compiler tools (`nvcc`) and low-level libraries used by WhisperLiveKit.

> `nvidia-smi` showing "CUDA Version: 13.x" means your *driver supports* up
> to that version — it does **not** mean the toolkit is installed.

1. Go to [developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

2. Select: **Windows → x86_64 → 11 → exe (local)**

3. Download and run the installer. Choose **Custom install** and ensure
   **CUDA Runtime** and **cuBLAS** are ticked.

4. Restart the PC after installation.

5. Verify:

   ```cmd
   nvcc --version
   ```

   Expected output ends with something like `release 13.x, V13.x.xxx` (the
   exact version will match whatever you downloaded).

   > If `nvcc` is not found, add the toolkit's `bin` folder to your system
   > PATH (e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin`)
   > using the same method as the Mosquitto PATH fix in Part 4.
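
Once Python is available (Part 3), the difference between the two version numbers can be made concrete with a short script. This is an illustrative sketch, not part of the project: it assumes both tools are on PATH and print the usual `CUDA Version: X.Y` and `release X.Y` strings, and the function names are invented for this guide.

```python
import re
import shutil
import subprocess
from typing import Optional


def driver_cuda_version(smi_output: str) -> Optional[str]:
    """Highest CUDA version the driver supports, read from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def toolkit_cuda_version(nvcc_output: str) -> Optional[str]:
    """Version of the installed CUDA Toolkit, read from `nvcc --version`."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def run(tool: str, *args: str) -> str:
    """Return the tool's stdout, or an empty string if it is not on PATH."""
    if shutil.which(tool) is None:
        return ""
    return subprocess.run([tool, *args], capture_output=True, text=True).stdout


def report() -> str:
    """One-line summary; a value of None means the tool or version was not found."""
    driver = driver_cuda_version(run("nvidia-smi"))
    toolkit = toolkit_cuda_version(run("nvcc", "--version"))
    return f"driver supports up to: {driver}; toolkit installed: {toolkit}"
```

Calling `print(report())` shows both values side by side. A driver version with `toolkit installed: None` is exactly the situation described in the note above.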

**Triton kernel warning** — after installing the CUDA Toolkit you will still
see this at bridge startup:

```text
Failed to launch Triton kernels, likely due to missing CUDA toolkit;
falling back to a slower median kernel implementation...
```

This message is **misleading**. The official `triton` Python package has no
Windows build, so the fallback occurs whether or not the CUDA Toolkit is
installed. It is expected and has no practical effect on transcription quality.

---

## Part 3 — Python 3.12

Python 3.12 is the required version. PyTorch (the AI engine that powers
WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or
3.13, so newer versions will fail at the PyTorch install step.

> If you already have Python 3.13 or 3.14 installed, **do not uninstall it**
> — just install 3.12 alongside it. Windows supports multiple Python versions
> at the same time and `install.bat` will automatically pick the right one.

1. Go to [python.org/downloads](https://www.python.org/downloads/) and look for
   the latest **Python 3.12.x** release. Download the **Windows installer (64-bit)**.

2. Run the installer. On the first screen:

   - **Tick "Add python.exe to PATH"** (important: do this before clicking Install Now)
   - Click **Install Now**

3. Once complete, verify in a new Command Prompt window:

   ```cmd
   py -3.12 --version
   ```

   Expected output: `Python 3.12.x`
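
When several Python versions coexist, it helps to confirm which interpreter actually runs a given script. The sketch below is a hypothetical helper, not something shipped with the project. It accepts only the 3.12 series, following the requirement stated above.

```python
import sys

REQUIRED = (3, 12)  # the major.minor version this guide targets


def is_supported(version) -> bool:
    """True only when the major.minor pair matches the required version."""
    return tuple(version[:2]) == REQUIRED


def describe(version=sys.version_info) -> str:
    """Human-readable verdict for the given (or the running) interpreter."""
    major, minor = version[:2]
    verdict = "OK" if is_supported((major, minor)) else "not the version this guide expects"
    return f"Python {major}.{minor}: {verdict}"
```

Saved under any name (say `check_python.py`, a made-up filename) with a final `print(describe())` line, it can be run with `py -3.12 check_python.py` to check the interpreter that the launcher selects.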

---

## Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the message relay between the transcription bridge and the display.

1. Download the Windows installer from
   [mosquitto.org/download](https://mosquitto.org/download/) — choose the
   `.exe` installer for Windows.

2. Run the installer, accept all defaults.

3. **Add Mosquitto to the system PATH** (the installer does not do this
   automatically). Run Command Prompt **as Administrator**:

   ```cmd
   setx /M PATH "%PATH%;C:\Program Files\mosquitto"
   ```

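   Close and reopen the Command Prompt window after running this; PATH changes
   don't take effect in the current window.

   > `setx` truncates values longer than 1024 characters, and `%PATH%` expands
   > to the combined user and system PATH. If your PATH is already long, add
   > the folder through **System Properties → Environment Variables** instead.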

4. Start Mosquitto as a Windows service (still as Administrator):

   ```cmd
   net start mosquitto
   ```

5. Set it to start automatically with Windows:

   ```cmd
   sc config mosquitto start= auto
   ```

6. Verify the tools are working:

   ```cmd
   mosquitto_sub -h localhost -t test -v
   ```

   The command connects to the broker and then waits silently for messages, so
   an empty window with no error means Mosquitto is working. Press `Ctrl+C` to
   stop the test.
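
The same check can be scripted. The sketch below only tests whether something accepts TCP connections on the broker port; it assumes Mosquitto listens on its default port 1883 (which follows from accepting the installer defaults), and `broker_reachable` is a name invented for this example.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `False` result usually means the service is stopped (`net start mosquitto`) or is listening on a different port. A `True` result does not prove the listener is an MQTT broker, only that the port is open.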

---

## Part 5 — HuggingFace Account (required for speaker diarization)

The automatic speaker detection uses a model from HuggingFace that requires
accepting its licence terms. This is free — it just needs an account.

1. Go to [huggingface.co](https://huggingface.co) and create a free account.

2. Accept the licence for the diarization model:

   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
   - Click **"Agree and access repository"**
   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
   - Click **"Agree and access repository"**

   > If you skip this step, the server will fail to start with a 403 error.

3. Create an access token:

   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
   - Click **New token**
   - Name: `church-transcription` (or anything you like)
   - Role: **Read**
   - Click **Generate token**
   - Copy the token — it starts with `hf_`

4. **Save this token somewhere safe** (Notepad or a password manager). You will
   paste it into `start.bat` in Part 7.

---

## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:

- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
- Installs the bridge script dependencies

**Steps:**

1. Open File Explorer and navigate to this project folder.

2. Double-click **`install.bat`**.

   A Command Prompt window will open. You will see packages downloading and
   installing. This will take **10–20 minutes** depending on your internet speed.
   The PyTorch download alone is ~2.5 GB.

3. Near the end you will see the Whisper model downloading for the first time:

   ```text
   Downloading model large-v3 (~3 GB) ...
   ```

   Wait for this to complete. The model is cached after the first download.

4. When you see `Installation complete.` the window will pause. Press any key
   to close it.
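
To confirm that the installed PyTorch can actually see the GPU, a check along the following lines can be run with the virtual environment's interpreter. This is a sketch under the assumption that `install.bat` installed `torch` into `.venv`. The function name is hypothetical, and the script is not part of the project.

```python
def cuda_status() -> str:
    """Summarise whether PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} installed, but no CUDA device is visible"
    return (
        f"PyTorch {torch.__version__}, CUDA build {torch.version.cuda}, "
        f"GPU: {torch.cuda.get_device_name(0)}"
    )
```

With a final `print(cuda_status())` line and a filename of your choice, run it as `.venv\Scripts\python.exe your_script.py`. If no CUDA device is visible, a CPU-only PyTorch build or a driver problem are the usual suspects.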

> **If install.bat fails** — see the Troubleshooting section at the bottom.

---

## Part 7 — Configure start.bat

Before running the system for the first time, you need to add your HuggingFace
token to the startup script. The token is passed as an **environment variable**
— `start.bat` sets it automatically before launching WhisperLiveKit, so
pyannote can download the diarization model.

1. Right-click **`start.bat`** → **Edit** (opens in Notepad).

2. Find this line near the top:

   ```bat
   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
   ```

3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5.
   Example:

   ```bat
   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
   ```

4. Save the file (`Ctrl+S`).
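
Mistakes on this one line are easy to make (a leftover placeholder, stray spaces, added quotes), and `set` stores everything after the `=` literally. The sketch below inspects the text of a batch file for those slips. It assumes the line has the `set HF_TOKEN=...` form shown above; `check_token_line` is a hypothetical helper, not part of the project.

```python
import re

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"


def check_token_line(bat_text: str) -> str:
    """Inspect the `set HF_TOKEN=...` line found in a batch file's text."""
    match = re.search(r"(?im)^[ \t]*set[ \t]+HF_TOKEN=(.*)$", bat_text)
    if match is None:
        return "no HF_TOKEN line found"
    value = match.group(1).rstrip("\r")  # tolerate Windows line endings
    if value == PLACEHOLDER:
        return "placeholder has not been replaced"
    if value != value.strip():
        return "token has leading or trailing spaces"
    if not value.startswith("hf_"):
        return "value does not start with hf_"
    return "looks plausible"
```

For example, `check_token_line(open("start.bat").read())` run from the project folder reports on the real file. A "looks plausible" result only checks the shape of the value, not whether HuggingFace accepts the token.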

---

## Part 8 — First run

1. Double-click **`start.bat`**.

   Two Command Prompt windows will open:

   - **Window 1 — Bridge**: the transcription pipeline. Wait until you see
     `Audio pipeline running`.
   - **Window 2 — Admin**: the web server. Wait until it shows
     `Application startup complete`.

2. Open the speaker admin page:

   - Open a browser and go to `http://localhost:8001`
   - You should see the Speaker Admin table.

3. Open the display page on a tablet or spare screen:

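   - On any device on the same Wi-Fi network, open `http://[PC-IP]:8001/display`,
     replacing `[PC-IP]` with this PC's IPv4 address (run `ipconfig` in a
     Command Prompt to find it).
   - Press `F11` for fullscreen if the device has a keyboard. A green dot in
     the corner means it is connected and receiving updates.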

4. Send a test message to verify the MQTT-to-display path. Open a Command Prompt and run:

   ```cmd
   mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
   ```

   The display page should update immediately with those three lines.

5. Full pipeline test:

   - Speak naturally into the microphone.
   - After a sentence or natural pause, text should appear on the display within
     3–5 seconds.
   - If two people take turns speaking, a speaker label line (for example
     `PASTOR` or `READER`) should appear between their sections.
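
The escaped quotes in the step 4 test message are easy to get wrong at the prompt. As an alternative, the payload can be built with Python's `json` module and handed to `mosquitto_pub` as an argument list. This is a minimal sketch: the `{"lines": [...]}` shape and the `display/text` topic are taken from the command above, while the function names are invented for illustration.

```python
import json
import subprocess


def build_payload(lines: list) -> str:
    """Serialise display lines into the {"lines": [...]} JSON shape."""
    return json.dumps({"lines": list(lines)})


def publish(lines: list, host: str = "localhost", topic: str = "display/text") -> None:
    """Send the payload using the mosquitto_pub tool installed in Part 4."""
    # Passing arguments as a list avoids the \" escaping needed at the prompt.
    subprocess.run(
        ["mosquitto_pub", "-h", host, "-t", topic, "-m", build_payload(lines)],
        check=True,
    )
```

Calling `publish(["Test line 1", "Test line 2", "Ready"])` should have the same effect as the command in step 4, provided `mosquitto_pub` is on PATH.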

---

## Part 9 — Assigning speaker names

The speaker admin page at `http://localhost:8001` shows all detected speakers.
The system automatically labels them `SPEAKER_00`, `SPEAKER_01`, etc.

- The defaults (Pastor, Reader, Guest, Choir) are loaded on first run.
- When a new speaker appears, click their name in the table and type the
  correct name. Changes take effect within 5 seconds.
- Speaker labels appear on the display as a gold heading line (e.g. `PASTOR`)
  whenever the speaker changes.
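
The heading behaviour described above can be pictured with a small sketch. It is not the bridge's actual code: the `(speaker_id, text)` segment shape, the example mapping, and the function name are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical mapping; in the real system the names come from the admin page.
EXAMPLE_NAMES = {"SPEAKER_00": "Pastor", "SPEAKER_01": "Reader"}


def with_speaker_headings(
    segments: Iterable[Tuple[str, str]], names: Dict[str, str]
) -> Iterator[str]:
    """Yield display lines, inserting a heading whenever the speaker changes."""
    previous = None
    for speaker_id, text in segments:
        if speaker_id != previous:
            # Fall back to the raw diarization label if no name is assigned yet.
            yield names.get(speaker_id, speaker_id).upper()
            previous = speaker_id
        yield text
```

In this sketch, a speaker who has not been named yet appears under the raw `SPEAKER_NN` label until a name is assigned.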

---

## Ongoing use (every Sunday)

1. Double-click `start.bat`.
2. Wait ~30 seconds until the Bridge window shows `Audio pipeline running` and
   the Admin window shows `Application startup complete`.
3. Open `http://[PC-IP]:8001/display` on the tablet and press `F11`.
4. Begin the service — transcription runs automatically.
5. Close both windows when done.
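
Instead of watching the Admin window, readiness of the web server can be polled from a script. The sketch below treats any HTTP response from the admin address (`http://localhost:8001`, as used in Part 8) as "up". That criterion and the function name are assumptions of this example, and it says nothing about the Bridge window.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers with any HTTP response or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except OSError:
            time.sleep(interval)  # not listening yet; try again
    return False
```

For instance, `wait_for_http("http://localhost:8001")` returns `True` as soon as the admin server responds, or `False` after a minute.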

---

## Troubleshooting

### `SyntaxError: f-string expression part cannot include a backslash`

WhisperLiveKit uses f-string syntax that is only valid from Python 3.12
onwards, so this error means your virtual environment was built with Python
3.11 or older. To fix:

1. Install Python 3.12 from python.org/downloads (the older version can stay; they coexist).
2. Delete the `.venv` folder in the project directory.
3. Run `install.bat` again; it will detect and use Python 3.12.

### `mosquitto_sub` or `mosquitto_pub` is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools
to the system PATH. Run Command Prompt **as Administrator** and execute:

```bat
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
```

Close and reopen the Command Prompt, then retry the command.

### `nvidia-smi` not found

The NVIDIA driver is not installed or not in PATH. Re-run the driver installer
and restart the PC.

### `python --version` shows wrong version or "not found"

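If the command is not found, Python was not added to PATH: re-run the Python
installer, choose "Modify", and tick "Add Python to environment variables".
If it reports a different version, another Python installation comes first on
PATH; that is harmless, because `install.bat` selects 3.12 itself. Confirm
that 3.12 is present with `py -3.12 --version`.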

### install.bat fails with "torch" errors — `No matching distribution found`

PyTorch does not publish pre-built packages for Python 3.13 or 3.14 (see
Part 3). Install **Python 3.12** from python.org alongside your current
version; they coexist safely. Then delete `.venv` and re-run `install.bat`,
which will automatically select Python 3.12.

If the error occurs on Python 3.12, the PyTorch download may have failed
mid-way. Delete `.venv` and re-run `install.bat` with a stable connection.

### Whisper server fails with `401` or `403`

Your HuggingFace token is incorrect, or you have not accepted the model licence
terms. Re-check Part 5 — both model pages must have "Agree and access
repository" clicked while logged into the same account that generated the token.

### Bridge starts but no text appears on the display

Check that the correct audio input device is selected:

- Open Windows **Sound Settings** → Input → ensure the microphone or audio
  interface is set as the default device.
- Or set `AUDIO_DEVICE` to a specific device index in `bridge/bridge.py`.

### Display page does not update

- Check the green/red dot in the bottom-right corner of the display page.
  Red means the browser lost its connection to the admin server.
- Confirm `admin.py` is running and accessible at `http://[PC-IP]:8001`.
- Confirm Mosquitto is running: `sc query mosquitto`
- Verify the PC's IP has not changed — tablets store the URL, so update it
  if the PC was assigned a new address.
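
To look up the current address without reading `ipconfig` output, a script can ask the operating system which local address it would use for outgoing traffic. This is a best-effort sketch with invented function names. On a PC with several adapters (for example a VPN) it may pick a different interface than the one the tablet can reach, so treat `ipconfig` as the authoritative check.

```python
import socket


def local_ip() -> str:
    """Best-effort guess at this PC's LAN address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packets; it only makes the OS
        # choose the outgoing interface, whose address is then read back.
        # 192.0.2.1 is a reserved documentation address, used as a dummy target.
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable network route
    finally:
        sock.close()


def display_url(port: int = 8001) -> str:
    """URL to open on the tablet, using the port from Part 8."""
    return f"http://{local_ip()}:{port}/display"
```

Printing `display_url()` gives the address to enter on the tablet.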

### `large-v3` is too slow (display lags more than 5–6 seconds)

Switch to a faster model by editing `bridge/bridge.py`:

```python
engine = TranscriptionEngine(model_size="distil-large-v3", ...)
```

`distil-large-v3` is ~50% faster with only a small accuracy reduction.