# Setup Guide — Live Transcription Display

This guide walks through everything needed to get the system running on a
Windows 11 PC from scratch. Follow each section in order.

**Total setup time: approximately 30–60 minutes** (most of that is download time).

---

## Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch and CUDA — the
combined download is ~4–5 GB and requires NVIDIA GPU drivers to be installed
natively on the host machine regardless. Packaging everything into a single
`.exe` is not practical for software of this type.

Instead this guide provides:
- `install.bat` — run **once** to set everything up
- `start.bat` — run each time to launch the full system

After setup, operation is a double-click.

---

## Part 1 — System Requirements

Before starting, confirm your PC meets these requirements:

| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup | Not needed during services |

> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
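
The storage row can also be checked from a script before the large downloads begin. This is a minimal sketch using only the Python standard library, so it can only be run once Python from Part 3 is installed. The helper name `has_free_space` is illustrative and is not part of the project scripts; the 10 GB threshold is the Minimum value from the table.

```python
import shutil


def has_free_space(path: str, min_gb: float) -> bool:
    """Return True if the drive containing `path` has at least `min_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= min_gb * 1024**3


if __name__ == "__main__":
    # "." means the drive that holds the current folder.
    ok = has_free_space(".", 10)
    print("Enough free space" if ok else "Less than 10 GB free on this drive")
```

Run it from the project folder so that the drive being measured is the one the model and packages will be stored on.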

---

## Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. PyTorch bundles the CUDA libraries it
needs, but faster-whisper does not, so the CUDA Toolkit is installed separately
in Part 2b.

1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.

   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your
   GPU model, download and run the installer.

2. Choose **Express Installation**.

3. Restart the PC when prompted.

4. Verify the driver is working:
   - Press `Win + R`, type `cmd`, press Enter.
   - Type `nvidia-smi` and press Enter.
   - You should see a table with your GPU name and driver version.

   ```
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 560.x   Driver Version: 560.x   CUDA Version: 12.6              |
   +-----------------------------------------------------------------------------+
   | RTX 4070 Super ...
   ```

   If this command is not found, the driver did not install correctly.
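
The same driver check can be scripted, which is handy when diagnosing a machine remotely. The sketch below assumes Python is already installed (Part 3) and uses `nvidia-smi`'s `--query-gpu` and `--format` options to print only the GPU name and driver version. The function name `driver_summary` is illustrative.

```python
import shutil
import subprocess


def driver_summary() -> str:
    """Return GPU name and driver version from nvidia-smi, or a short error message."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return "nvidia-smi not found on PATH"
    result = subprocess.run(
        [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return "nvidia-smi failed: " + result.stderr.strip()
    return result.stdout.strip()


if __name__ == "__main__":
    print(driver_summary())
```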

---

## Part 2b — CUDA Toolkit 12.x

The NVIDIA driver alone is not enough. WhisperLiveKit uses **faster-whisper**
(via ctranslate2) for inference, which requires the CUDA runtime libraries to
be installed separately. Without this you will see `cublas64_12.dll not found`
and the server will fall back to CPU-only mode, making transcription too slow
for live use.

> `nvidia-smi` showing "CUDA Version: 12.6" means your *driver supports* up
> to that version — it does **not** mean the toolkit is installed.

1. Go to [developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

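2. Select: **Windows → x86_64 → 11 → exe (local)** (choose **10** on Windows 10).
   If the page offers a newer major version than 12.x, pick a 12.x release from
   NVIDIA's CUDA Toolkit Archive instead, because the server looks for
   `cublas64_12.dll`.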

3. Download and run the installer. Choose **Custom install** and ensure
   **CUDA Runtime** and **cuBLAS** are ticked.

4. Restart the PC after installation.

5. Verify:

   ```
   nvcc --version
   ```

   Expected: `release 12.x, V12.x.xxx`

   > If `nvcc` is not found, add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin`
   > to your system PATH (same method as the Mosquitto PATH fix in Part 4).
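
Since the failure mode described above is a missing `cublas64_12.dll`, it can help to confirm that the file is visible in one of the PATH folders. The following is a rough sketch, assuming Python from Part 3 is installed. Finding the DLL on PATH is a useful signal but not proof that the server will load it, and the helper name `find_on_path` is illustrative.

```python
import os
from pathlib import Path
from typing import Optional


def find_on_path(filename: str, path_value: Optional[str] = None) -> Optional[str]:
    """Return the first full path to `filename` in the PATH directories, or None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = Path(directory) / filename
        if candidate.is_file():
            return str(candidate)
    return None


if __name__ == "__main__":
    location = find_on_path("cublas64_12.dll")
    print(location or "cublas64_12.dll is not in any PATH directory")
```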

---

## Part 3 — Python 3.12

Python 3.12 is the required version. PyTorch (the AI engine that powers
WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or
3.13, so newer versions will fail at the PyTorch install step.

> If you already have Python 3.13 or 3.14 installed, **do not uninstall it**
> — just install 3.12 alongside it. Windows supports multiple Python versions
> at the same time and `install.bat` will automatically pick the right one.

1. Go to [python.org/downloads](https://www.python.org/downloads/) and look for
   the latest **Python 3.12.x** release. Download the **Windows installer (64-bit)**.

2. Run the installer. On the first screen:
   - **Tick "Add python.exe to PATH"** (important: do this before clicking Install Now)
   - Click **Install Now**

3. Once complete, verify in a new Command Prompt window:

   ```
   py -3.12 --version
   ```

   Expected output: `Python 3.12.x`
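
If several Python versions are installed, a short script can confirm which interpreter is actually running. This is a minimal sketch; the file name `check_python.py` and the helper `is_required_version` are hypothetical and not part of the project.

```python
import sys

REQUIRED = (3, 12)  # the version this guide targets


def is_required_version(version_info=sys.version_info) -> bool:
    """True when the interpreter's major.minor pair is exactly 3.12."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    found = f"{sys.version_info.major}.{sys.version_info.minor}"
    if is_required_version():
        print(f"OK: running Python {found}")
    else:
        print(f"Expected Python 3.12 but this is Python {found}")
```

Saved as `check_python.py`, it would be run with `py -3.12 check_python.py`, so that the launcher selects 3.12 explicitly.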

---

## Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the message relay between the PC and the display.

1. Download the Windows installer from
   [mosquitto.org/download](https://mosquitto.org/download/) — choose the
   `.exe` installer for Windows.

2. Run the installer, accept all defaults.

3. **Add Mosquitto to the system PATH** (the installer does not do this
   automatically). Run Command Prompt **as Administrator**:

   ```
   setx /M PATH "%PATH%;C:\Program Files\mosquitto"
   ```

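   Close and reopen the Command Prompt window after running this, because PATH
   changes do not take effect in the current window. Note that `setx` truncates
   values longer than 1024 characters; if your PATH is already long, add the
   folder through **System Properties → Environment Variables** instead.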

4. Start Mosquitto as a Windows service (still as Administrator):

   ```
   net start mosquitto
   ```

5. Set it to start automatically with Windows (keep the space after `start=`;
   older versions of `sc` require it):

   ```
   sc config mosquitto start= auto
   ```

6. Verify the tools are working:

   ```
   mosquitto_sub -h localhost -t test -v
   ```

   The command connects to the broker and then waits silently for messages, so
   no output is expected. If it shows no errors, Mosquitto is working. Press
   `Ctrl+C` to stop the test.
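
A quick way to confirm that the broker is listening, without the Mosquitto command-line tools, is to try a plain TCP connection. The sketch below assumes Mosquitto is using its default port, 1883. The function name `broker_reachable` is illustrative, and a successful connection shows only that something is listening on the port, not that it speaks MQTT.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 1883, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    print("Broker reachable" if broker_reachable() else "No listener on localhost:1883")
```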

---

## Part 5 — HuggingFace Account (required for speaker diarization)

The automatic speaker detection uses a model from HuggingFace that requires
accepting its licence terms. This is free — it just needs an account.

1. Go to [huggingface.co](https://huggingface.co) and create a free account.

2. Accept the licence for the diarization model:
   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
   - Click **"Agree and access repository"**
   - Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
   - Click **"Agree and access repository"**

   > If you skip this step, the server will fail to start with a 403 error.

3. Create an access token:
   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
   - Click **New token**
   - Name: `church-transcription` (or anything you like)
   - Role: **Read**
   - Click **Generate token**
   - Copy the token — it starts with `hf_`
   

4. **Save this token somewhere safe** (Notepad or a password manager). You will
   paste it into `start.bat` in Part 7.

---

## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:
- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
- Installs the bridge script dependencies

**Steps:**

1. Open File Explorer and navigate to this project folder.

2. Double-click **`install.bat`**.

   A Command Prompt window will open. You will see packages downloading and
   installing. This will take **10–20 minutes** depending on your internet speed.
   The PyTorch download alone is ~2.5 GB.

3. Near the end you will see the Whisper model downloading for the first time:

   ```
   Downloading model large-v3 (~3 GB) ...
   ```

   Wait for this to complete. The model is cached after the first download.

4. When you see `Installation complete.` the window will pause. Press any key
   to close it.

> **If install.bat fails** — see the Troubleshooting section at the bottom.
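
After installation, you may want to confirm that the PyTorch build in the virtual environment can see the GPU. This is a small sketch using `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the function name `cuda_status` is illustrative and the script is not part of `install.bat`.

```python
def cuda_status() -> str:
    """Describe whether PyTorch is importable and can see a CUDA device."""
    try:
        import torch  # only present inside the project's virtual environment
    except ImportError:
        return "torch not installed"
    if torch.cuda.is_available():
        return "cuda available: " + torch.cuda.get_device_name(0)
    return "torch installed, cuda not available"


if __name__ == "__main__":
    print(cuda_status())
```

Run it with the virtual environment's interpreter (`.venv\Scripts\python.exe`) rather than the system Python, otherwise it will report that torch is not installed.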

---

## Part 7 — Configure start.bat

Before running the system for the first time, you need to add your HuggingFace
token to the startup script. The token is passed as an **environment variable**
— `start.bat` sets it automatically before launching WhisperLiveKit, so
pyannote can download the diarization model.

1. Right-click **`start.bat`** → **Edit** (opens in Notepad).

2. Find this line near the top:

   ```bat
   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
   ```

3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5.
   Example:

   ```bat
   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
   ```

4. Save the file (`Ctrl+S`).
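
A common mistake is leaving the placeholder in place or pasting only part of the token. The sketch below reads `start.bat` and checks the `set HF_TOKEN=` line without printing the token itself. It assumes the line has the form shown above; the helper names are hypothetical, and the `hf_` prefix check is only a sanity check, not validation against HuggingFace.

```python
import re
from pathlib import Path
from typing import Optional

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"


def extract_hf_token(bat_text: str) -> Optional[str]:
    """Return the value of the first `set HF_TOKEN=...` line, or None if absent."""
    match = re.search(r"^\s*set\s+HF_TOKEN=(.*)$", bat_text, re.IGNORECASE | re.MULTILINE)
    return match.group(1).strip() if match else None


def token_looks_valid(token: Optional[str]) -> bool:
    """True if the placeholder was replaced and the value has the `hf_` prefix."""
    return bool(token) and token != PLACEHOLDER and token.startswith("hf_") and len(token) > 3


if __name__ == "__main__":
    bat = Path("start.bat")
    if bat.is_file():
        token = extract_hf_token(bat.read_text(encoding="utf-8", errors="replace"))
        if token_looks_valid(token):
            print("Token looks valid")
        else:
            print("HF_TOKEN is missing or still the placeholder")
    else:
        print("start.bat not found in the current folder")
```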

---

## Part 8 — First run

1. Double-click **`start.bat`**.

   Two windows will open:
   - **Window 1 — Whisper Server**: shows the transcription engine loading.
     On first run this downloads the speaker diarization model (~500 MB).
     Wait until you see `Server running on ws://0.0.0.0:8000`.
   - **Window 2 — Bridge**: the speaker name mapping window appears, and the
     Command Prompt behind it shows connection status.

2. Verify the Whisper server is working:
   - Open a browser and go to `http://localhost:8000`
   - You should see a simple web interface. Speak into the microphone — text
     should appear.

3. Verify the display:
   - With the ESP32 powered on and connected to the same WiFi, send a test
     message. Open a third Command Prompt and run:

     ```
     mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
     ```

   - The e-ink display should refresh within 2 seconds showing those three lines.

4. Full pipeline test:
   - Speak naturally into the microphone.
   - After a sentence or natural pause, text should appear on the display within
     3–5 seconds.
   - If two people take turns speaking, a `[PASTOR]` / `[READER]` label line
     should appear between their sections.
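
The backslash escaping in the test message from step 3 is easy to get wrong when editing the text by hand. As an alternative, the payload can be built with `json.dumps` and passed to `mosquitto_pub` as a single argument. This is a sketch only: the function names are illustrative, and the topic and JSON shape are copied from the command in step 3.

```python
import json
import shutil
import subprocess


def build_payload(lines: list) -> str:
    """Serialize display lines into the JSON shape used by the test message."""
    return json.dumps({"lines": list(lines)})


def publish(lines: list, host: str = "localhost", topic: str = "display/text") -> bool:
    """Publish via mosquitto_pub; return False if the tool is missing or fails."""
    exe = shutil.which("mosquitto_pub")
    if exe is None:
        return False
    # Passing a list lets subprocess handle quoting, so no manual escaping is needed.
    result = subprocess.run([exe, "-h", host, "-t", topic, "-m", build_payload(lines)])
    return result.returncode == 0


if __name__ == "__main__":
    sent = publish(["Test line 1", "Test line 2", "Ready"])
    print("Published" if sent else "mosquitto_pub not found or publish failed")
```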

---

## Part 9 — Assigning speaker names

The bridge window shows a **Speaker Name Mapping** panel. The system
automatically detects different speakers and labels them SPEAKER_00,
SPEAKER_01, etc.

- The defaults (Pastor, Reader, Guest, Choir) are applied immediately when the
  bridge starts.
- If a different person is speaking than expected, type their name in the
  matching row and click **Apply**.
- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`)
  whenever the speaker changes.
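
The relationship between detected speaker IDs, assigned names and the heading line can be pictured as a simple lookup. The sketch below is an illustration, not the bridge's actual code: it assumes the defaults are assigned to `SPEAKER_00`, `SPEAKER_01`, and so on in the order listed above, and that an unmapped speaker falls back to the raw ID.

```python
DEFAULT_NAMES = ["Pastor", "Reader", "Guest", "Choir"]  # defaults listed above


def default_mapping(names=DEFAULT_NAMES) -> dict:
    """Map SPEAKER_00, SPEAKER_01, ... to names in list order (assumed ordering)."""
    return {f"SPEAKER_{i:02d}": name for i, name in enumerate(names)}


def heading_for(speaker_id: str, mapping: dict) -> str:
    """Return the bracketed upper-case heading, falling back to the raw ID."""
    return f"[{mapping.get(speaker_id, speaker_id).upper()}]"


if __name__ == "__main__":
    mapping = default_mapping()
    mapping["SPEAKER_01"] = "Anna"  # equivalent of typing a name and clicking Apply
    for speaker in ("SPEAKER_00", "SPEAKER_01", "SPEAKER_07"):
        print(speaker, "->", heading_for(speaker, mapping))
```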

---

## Ongoing use (every Sunday)

1. Double-click `start.bat`.
2. Wait ~30 seconds for both windows to show "ready" status.
3. The display will show `DISPLAY READY` when the ESP32 connects.
4. Begin the service — transcription runs automatically.
5. Close both windows when done.

---

## Troubleshooting

### `SyntaxError: f-string expression part cannot include a backslash`

This error means the virtual environment was built with Python 3.11 or older.
WhisperLiveKit uses f-string syntax that only Python 3.12 and later accept. To fix:

1. Install Python 3.12 from python.org/downloads (older versions can stay; they coexist). Avoid 3.13 and 3.14, because the PyTorch install step fails on them (see Part 3).
2. Delete the `.venv` folder in the project directory.
3. Run `install.bat` again. It will detect and use the newest compatible version.

### `mosquitto_sub` or `mosquitto_pub` is not recognised

The Mosquitto installer sets up the Windows service but does not add its tools
to the system PATH. Run Command Prompt **as Administrator** and execute:

```bat
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
```

Close and reopen the Command Prompt, then retry the command.

### `nvidia-smi` not found
The NVIDIA driver is not installed or not in PATH. Re-run the driver installer
and restart the PC.

### `python --version` shows wrong version or "not found"
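Python was not added to PATH, or another installed version comes first. Check
with `py -3.12 --version`, which does not depend on PATH order. If that also
fails, re-run the Python 3.12 installer, choose "Modify", and tick "Add Python
to environment variables".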

### install.bat fails with "torch" errors — `No matching distribution found`

PyTorch does not publish pre-built packages for Python 3.14 (or very new
versions). Install **Python 3.12** from python.org alongside your current
version — they coexist safely. Then delete `.venv` and re-run `install.bat`;
it will automatically select Python 3.12.

If the error occurs on Python 3.12, the PyTorch download may have failed
mid-way. Delete `.venv` and re-run `install.bat` with a stable connection.

### Whisper server fails with `401` or `403`
Your HuggingFace token is incorrect, or you have not accepted the model licence
terms. Re-check Part 5 — both model pages must have "Agree and access
repository" clicked while logged into the same account that generated the token.

### Whisper server starts but no text appears
Check that the correct audio input device is selected:
- Open Windows **Sound Settings** → Input → ensure the microphone or audio
  interface is set as the default device.
- The bridge uses the Windows default input device.

### Display does not update
- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
- Verify `MQTT_HOST` in `main.cpp` matches the PC's IP address (`ipconfig` →
  look for the WiFi adapter IPv4 address).
- Confirm Mosquitto is running: `sc query mosquitto`
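
The `sc query mosquitto` check can be wrapped in a script that reports only the service state. This sketch assumes an English-language Windows installation, where the output contains a line such as `STATE : 4 RUNNING`. The function name `parse_service_state` is illustrative.

```python
import re
import subprocess
import sys
from typing import Optional


def parse_service_state(sc_output: str) -> Optional[str]:
    """Extract the state name (e.g. 'RUNNING') from `sc query` output, or None."""
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if sys.platform == "win32":
        result = subprocess.run(["sc", "query", "mosquitto"], capture_output=True, text=True)
        state = parse_service_state(result.stdout)
        print(f"mosquitto service state: {state or 'unknown (service not installed?)'}")
    else:
        print("This check only applies on Windows")
```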

### `large-v3` is too slow (display lags more than 5–6 seconds)
Switch to a faster model by editing `start.bat`:

```
set WHISPER_MODEL=distil-large-v3
```

`distil-large-v3` is ~50% faster with only a small accuracy reduction.