# Setup Guide — Church Live Transcription Display

This guide walks through everything needed to get the system running from scratch on a Windows 10 or Windows 11 PC. Follow each section in order.

**Total setup time: approximately 30–60 minutes** (most of that is download time).

---
## Why no installer / executable?

The transcription engine (WhisperLiveKit) depends on PyTorch with CUDA support. The combined download is ~4–5 GB, and the NVIDIA GPU driver has to be installed natively on the host machine in any case. Packaging all of that into a single `.exe` is not practical for software of this type.

Instead, the project provides two scripts:

- `install.bat`: run **once** to set everything up
- `start.bat`: run each time to launch the full system

After setup, operation is a double-click.

---
## Part 1 — System Requirements

Before starting, confirm that your PC meets these requirements:

| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
| RAM | 16 GB | 32 GB |
| Storage | 10 GB free | 20 GB free |
| Internet | Required for setup and the first run (model downloads) | Not needed during services after that |

> The RTX 4070 Super (tested hardware) runs `large-v3` comfortably in real time.

---
## Part 2 — NVIDIA Driver

You need an up-to-date NVIDIA driver. You do **not** need to install the CUDA Toolkit separately, because the CUDA build of PyTorch bundles the CUDA runtime libraries it needs.

1. Open the **NVIDIA App** or **GeForce Experience** (whichever is installed) → **Drivers** → check for updates.

   **Or** visit [nvidia.com/drivers](https://www.nvidia.com/drivers), enter your GPU model, then download and run the installer.

2. Choose **Express Installation**.

3. Restart the PC when prompted.

4. Verify that the driver is working:
   - Press `Win + R`, type `cmd`, and press Enter.
   - Type `nvidia-smi` and press Enter.
   - You should see a table with your GPU name and driver version, for example:

   ```
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 560.x Driver Version: 560.x CUDA Version: 12.6 |
   +-----------------------------------------------------------------------------+
   | RTX 4070 Super ...
   ```

   If the command is not found, the driver did not install correctly.
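
To check the GPU against the Part 1 minimums without reading the table by eye, `nvidia-smi` can also print only the fields you need. The Python sketch below is an illustrative helper, not part of this project. It assumes Python is already installed (Part 3) and that `nvidia-smi` is on PATH. It runs `nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv,noheader,nounits`, which reports total memory in MiB, and compares the result with the 6 GB minimum. The threshold is set slightly below 6 GiB because drivers may report a little less than a card's nominal size.

```python
import subprocess

# Part 1 minimum is 6 GB; allow some headroom because drivers may report
# slightly less than the nominal memory size.
MIN_VRAM_MIB = 5900

# "nounits" drops the "MiB" suffix so memory.total parses as an integer.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader,nounits",
]


def parse_gpus(csv_text: str) -> list[dict]:
    """Parse one 'name, memory.total, driver_version' line per GPU."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit keeps any comma inside the GPU name intact.
        name, mem_mib, driver = (field.strip() for field in line.rsplit(",", 2))
        gpus.append({"name": name, "vram_mib": int(mem_mib), "driver": driver})
    return gpus


def meets_minimum(gpu: dict) -> bool:
    return gpu["vram_mib"] >= MIN_VRAM_MIB


if __name__ == "__main__":
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    for gpu in parse_gpus(result.stdout):
        verdict = "OK" if meets_minimum(gpu) else "below the 6 GB minimum"
        print(f"{gpu['name']}: {gpu['vram_mib']} MiB, driver {gpu['driver']} -> {verdict}")
```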

---

## Part 3 — Python 3.11

1. Go to the [Python 3.11.9 release page](https://www.python.org/downloads/release/python-3119/) and download the **Windows installer (64-bit)**.

   > Use Python **3.11** specifically. Some ML libraries have known issues with Python 3.13 on Windows.

2. Run the installer. On the first screen:
   - **Tick "Add python.exe to PATH"** (important: do this before clicking Install Now)
   - Click **Install Now**

3. Once it completes, open a **new** Command Prompt window and verify:

   ```
   python --version
   ```

   Expected output: `Python 3.11.x`
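
If more than one Python is installed, `python` may resolve to a different version than the one you just installed. A short check like the one below shows which interpreter actually ran and whether it belongs to the 3.11 line. It is an illustrative helper, not one of the project's scripts. Save it under any name (for example, a hypothetical `check_python.py`) and run `python check_python.py`.

```python
import sys

# This guide asks for Python 3.11 specifically.
REQUIRED = (3, 11)


def is_supported(version_info) -> bool:
    """True when the major.minor version matches the required release line."""
    return tuple(version_info[:2]) == REQUIRED


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("OK" if is_supported(sys.version_info) else "Expected Python 3.11.x")
```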
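
---

## Part 4 — Mosquitto (MQTT Broker)

Mosquitto is the MQTT broker that relays messages between the PC and the ESP32 display.

1. Download the Windows installer from [mosquitto.org/download](https://mosquitto.org/download/), choosing the `.exe` installer for Windows.

2. Run the installer and accept all defaults.

3. Start Mosquitto as a Windows service. Open Command Prompt **as Administrator** and run:

   ```
   net start mosquitto
   ```

4. Set it to start automatically with Windows. Note the space after `start=`, which `sc` requires:

   ```
   sc config mosquitto start= auto
   ```

5. Verify that it is running. The Mosquitto command-line tools live in the install folder (by default `C:\Program Files\mosquitto`), which may not be on PATH, so change to that folder first:

   ```
   cd "C:\Program Files\mosquitto"
   mosquitto_sub -h localhost -t test -v
   ```

   The command connects and then waits silently for messages. If it shows no errors, Mosquitto is working. To see a message arrive, run `mosquitto_pub -h localhost -t test -m hello` from the same folder in a second window. The first window should then print `test hello`. Press `Ctrl+C` to stop the test.

6. Allow the ESP32 to connect over the network. Mosquitto 2.x accepts connections only from the PC itself unless a network listener is configured:
   - Open `mosquitto.conf` in the install folder with Notepad (run Notepad as Administrator).
   - Add the two lines `listener 1883` and `allow_anonymous true`, then save.
   - Restart the service with `net stop mosquitto` followed by `net start mosquitto`.

   `allow_anonymous true` lets any device on the local network connect, so use it only on a trusted network.

   You also need to allow inbound TCP port 1883 through Windows Defender Firewall. For example, run this from an Administrator Command Prompt: `netsh advfirewall firewall add rule name="Mosquitto MQTT" dir=in action=allow protocol=TCP localport=1883`.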
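
`mosquitto_sub` only proves that the broker answers on `localhost`. The ESP32 connects over the network, so it also helps to check that TCP port 1883 (the default MQTT port) is reachable at the PC's LAN address. Below is a minimal standard-library Python sketch for that check. The IP address in it is a placeholder, not a value from this project. If `localhost` is reachable but the PC's LAN address is not, the usual causes are Mosquitto listening only on the loopback interface or the firewall blocking the port.

```python
import socket

MQTT_PORT = 1883  # default MQTT port used by Mosquitto


def port_open(host: str, port: int = MQTT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unreachable
        return False


if __name__ == "__main__":
    # Placeholder: replace with the PC's IPv4 address from `ipconfig`.
    pc_ip = "192.168.1.50"
    for host in ("localhost", pc_ip):
        print(host, "reachable" if port_open(host) else "NOT reachable")
```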

---

## Part 5 — HuggingFace Account (required for speaker diarization)

Automatic speaker detection uses pyannote models hosted on HuggingFace. These models are gated: you must accept their licence terms with a logged-in account before they can be downloaded. This is free; it just needs an account.

1. Go to [huggingface.co](https://huggingface.co) and create a free account.

2. While logged in, accept the licence for both diarization models:
   - Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) and click **"Agree and access repository"**.
   - Visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0) and click **"Agree and access repository"**.

   > If you skip this step, the server will fail to start with a 403 error.

3. Create an access token:
   - Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens).
   - Click **New token**.
   - Name: `church-transcription` (or anything you like).
   - Role: **Read**.
   - Click **Generate token**.
   - Copy the token. It starts with `hf_`.

4. **Save this token somewhere safe** (Notepad or a password manager). You will paste it into `start.bat` in Part 7. Treat it like a password and do not share it.

---

## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:

- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
- Installs the bridge script dependencies

**Steps:**

1. Open File Explorer and navigate to this project folder.

2. Double-click **`install.bat`**.

   A Command Prompt window will open, and you will see packages downloading and installing. This takes **10–20 minutes** depending on your internet speed. The PyTorch download alone is ~2.5 GB.

3. Near the end, you will see the Whisper model downloading for the first time:

   ```
   Downloading model large-v3 (~3 GB) ...
   ```

   Wait for this to complete. The model is cached after the first download.

4. When you see `Installation complete.`, the window pauses. Press any key to close it.

> **If install.bat fails**, see the Troubleshooting section at the bottom.
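
To confirm that the installed PyTorch build can actually use the GPU, you can run a short check with the virtual environment's interpreter, for example `.venv\Scripts\python.exe check_cuda.py`. The file name is hypothetical, and this script is not part of the project. It uses the standard `torch.cuda` functions. The report logic takes the `torch` module as a parameter so it can be exercised without a GPU.

```python
def cuda_report(torch_mod) -> str:
    """Summarise whether PyTorch can see a CUDA device."""
    if not torch_mod.cuda.is_available():
        # Typically a CPU-only PyTorch wheel or a missing/outdated NVIDIA driver.
        return "CUDA NOT available: PyTorch will fall back to the CPU"
    props = torch_mod.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    return (
        f"CUDA OK: {torch_mod.cuda.get_device_name(0)}, "
        f"{vram_gb:.1f} GB VRAM, CUDA runtime {torch_mod.version.cuda}"
    )


if __name__ == "__main__":
    import torch

    print("PyTorch", torch.__version__)
    print(cuda_report(torch))
```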

---

## Part 7 — Configure start.bat

Before running the system for the first time, add your HuggingFace token to the startup script.

1. Right-click **`start.bat`** → **Edit** (opens in Notepad). On Windows 11 this may be under **Show more options**, or listed as **Edit in Notepad**.

2. Find this line near the top:

   ```
   set HF_TOKEN=PASTE_YOUR_TOKEN_HERE
   ```

3. Replace `PASTE_YOUR_TOKEN_HERE` with the token you copied in Part 5. Do not add spaces around `=` or after the token, because the batch `set` command keeps them as part of the name or value. Example:

   ```
   set HF_TOKEN=hf_aBcDeFgHiJkLmNoPqRsTuVwXyZ
   ```

4. Save the file (`Ctrl+S`).
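
Stray spaces and a leftover placeholder are easy to miss in Notepad. A small Python check like the one below reads `start.bat` and flags common problems with the token line. It is an illustrative sketch, not part of the project. It assumes the line has the plain `set HF_TOKEN=...` form shown above; the quoted `set "HF_TOKEN=..."` form is not handled.

```python
from pathlib import Path

PLACEHOLDER = "PASTE_YOUR_TOKEN_HERE"


def token_problems(bat_text: str) -> list[str]:
    """Check the `set HF_TOKEN=...` line; an empty list means it looks OK."""
    for line in bat_text.splitlines():
        body = line.lstrip()
        if not body.lower().startswith("set "):
            continue
        name, sep, value = body[4:].partition("=")
        # cmd.exe variable names are case-insensitive.
        if not sep or name.strip().upper() != "HF_TOKEN":
            continue
        problems = []
        if name != name.rstrip():
            problems.append("space before '=' (cmd would name the variable 'HF_TOKEN ')")
        if value != value.strip():
            problems.append("spaces around the token value (cmd keeps them)")
        token = value.strip()
        if token == PLACEHOLDER:
            problems.append("placeholder not replaced")
        elif not token.startswith("hf_"):
            problems.append("token does not start with hf_")
        return problems
    return ["no 'set HF_TOKEN=' line found"]


if __name__ == "__main__":
    text = Path("start.bat").read_text(encoding="utf-8", errors="replace")
    problems = token_problems(text)
    print("HF_TOKEN line looks OK" if not problems else "\n".join(problems))
```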

---

## Part 8 — First run

1. Double-click **`start.bat`**.

   Two windows will open:
   - **Window 1 (Whisper Server)**: shows the transcription engine loading. On first run it also downloads the speaker diarization model (~500 MB). Wait until you see `Server running on ws://0.0.0.0:8000`.
   - **Window 2 (Bridge)**: the speaker name mapping window appears, and the Command Prompt behind it shows the connection status.

2. Verify that the Whisper server is working:
   - Open a browser and go to `http://localhost:8000`.
   - You should see a simple web interface. Speak into the microphone (allow microphone access if the browser asks), and text should appear.

3. Verify the display. With the ESP32 powered on and connected to the same WiFi network as the PC, open a third Command Prompt in the Mosquitto folder (by default `C:\Program Files\mosquitto`) and send a test message:

   ```
   mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
   ```

   The e-ink display should refresh within 2 seconds and show those three lines.

4. Full pipeline test:
   - Speak naturally into the microphone.
   - After a sentence or natural pause, text should appear on the display within 3–5 seconds.
   - If two people take turns speaking, a speaker label line such as `[PASTOR]` or `[READER]` should appear between their sections.
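
Escaped JSON at the Command Prompt is error-prone, so the same test message can also be sent from Python with the `paho-mqtt` client library. This sketch rests on three assumptions:
- `paho-mqtt` is installed (`pip install paho-mqtt`). The project's bridge may or may not use this library.
- The broker runs on the same PC.
- The display expects the `{"lines": [...]}` payload on the `display/text` topic, exactly as in the `mosquitto_pub` example above.

```python
import json

TOPIC = "display/text"  # topic used in the mosquitto_pub test


def build_payload(lines: list[str]) -> str:
    """Encode display lines as the JSON shape {"lines": [...]}."""
    return json.dumps({"lines": list(lines)})


if __name__ == "__main__":
    # Imported here so build_payload() works without paho-mqtt installed.
    import paho.mqtt.publish as publish

    payload = build_payload(["Test line 1", "Test line 2", "Ready"])
    publish.single(TOPIC, payload=payload, hostname="localhost", port=1883)
    print("Sent:", payload)
```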

---

## Part 9 — Assigning speaker names

The bridge window shows a **Speaker Name Mapping** panel. The system automatically detects different speakers and labels them `SPEAKER_00`, `SPEAKER_01`, and so on.

- The default names (Pastor, Reader, Guest, Choir) are applied as soon as the bridge starts.
- If a different person is speaking than expected, type their name in the matching row and click **Apply**.
- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`) whenever the speaker changes.
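
The behaviour described above can be sketched in a few lines of Python: map diarization labels to names, and emit a heading line only when the speaker changes. This is an illustration, not the bridge's actual code. The default mapping order (`SPEAKER_00` → Pastor, `SPEAKER_01` → Reader, and so on) is an assumption, and so is falling back to the raw label for unmapped speakers. Clicking **Apply** in the panel would correspond to changing an entry in the mapping.

```python
# Hypothetical defaults, assumed to follow the order listed in Part 9.
DEFAULT_NAMES = {
    "SPEAKER_00": "Pastor",
    "SPEAKER_01": "Reader",
    "SPEAKER_02": "Guest",
    "SPEAKER_03": "Choir",
}


def to_display_lines(segments, names=DEFAULT_NAMES):
    """Turn (speaker_label, text) pairs into display lines.

    A heading such as "[PASTOR]" is inserted only when the speaker changes;
    labels missing from `names` fall back to the raw diarization label.
    """
    lines = []
    previous = None
    for label, text in segments:
        if label != previous:
            lines.append(f"[{names.get(label, label).upper()}]")
            previous = label
        lines.append(text)
    return lines
```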

---

## Ongoing use (every Sunday)

1. Double-click `start.bat`.
2. Wait about 30 seconds, until the Whisper Server window shows `Server running on ws://0.0.0.0:8000` and the bridge window shows it is connected.
3. The display will show `DISPLAY READY` when the ESP32 connects.
4. Begin the service. Transcription runs automatically.
5. Close both windows when done.

---

## Troubleshooting

### `nvidia-smi` not found
The NVIDIA driver is not installed or is not on PATH. Re-run the driver installer and restart the PC.

### `python --version` shows the wrong version or "not found"
Python was not added to PATH. Re-run the Python installer, choose **Modify**, click **Next** to reach **Advanced Options**, and tick **Add Python to environment variables**. If typing `python` opens the Microsoft Store instead, turn off the `python.exe` aliases under **Settings → Apps → Advanced app settings → App execution aliases**.

### install.bat fails with "torch" errors
PyTorch may have failed to download. Delete the `.venv` folder in the project folder and run `install.bat` again on a stable internet connection.

### Whisper server fails with `401` or `403`
Your HuggingFace token is incorrect, or you have not accepted the model licence terms. A `401` usually points to a missing or wrong token, and a `403` usually means a licence was not accepted. Re-check Part 5: both model pages must have "Agree and access repository" clicked while logged into the same account that generated the token.
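
To tell a bad token apart from a missing licence acceptance, a short Python sketch can try to fetch a small file from each gated repository using your token. It is illustrative only and not part of the project. It assumes two things:
- The `huggingface_hub` package is importable from the project's `.venv`. It is usually installed as a dependency; otherwise run `pip install huggingface_hub`.
- Each repository contains a `config.yaml` file.

Save it under any name, such as the hypothetical `check_hf_access.py`, and run `.venv\Scripts\python.exe check_hf_access.py hf_yourtoken`, or set `HF_TOKEN` in the environment instead.

```python
import os
import sys

# Gated repositories listed in Part 5.
GATED_REPOS = ["pyannote/speaker-diarization-3.1", "pyannote/segmentation-3.0"]


def looks_like_token(token: str) -> bool:
    """Cheap format check: hf_ prefix, something after it, no whitespace."""
    return (
        token.startswith("hf_")
        and len(token) > 3
        and not any(ch.isspace() for ch in token)
    )


def check_access(token: str) -> None:
    # Imported here so the format check works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    for repo in GATED_REPOS:
        try:
            # Assumes each repository has a small config.yaml at its root.
            hf_hub_download(repo_id=repo, filename="config.yaml", token=token)
            print(f"{repo}: access OK")
        except Exception as exc:  # e.g. 401 (bad token) or 403/gated-repo error
            print(f"{repo}: FAILED ({type(exc).__name__}: {exc})")


if __name__ == "__main__":
    token = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("HF_TOKEN", "")
    if not looks_like_token(token):
        sys.exit("Token missing or malformed: it should start with hf_")
    check_access(token)
```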

### Whisper server starts but no text appears
Check that the correct audio input device is selected:
- Open Windows **Sound Settings** → **Input** and make sure the microphone or audio interface is set as the default device.
- The bridge uses the Windows default input device.
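
To see which device is currently the default input, you can list capture devices from Python. This sketch assumes the `sounddevice` package (`pip install sounddevice`), which may not be what the bridge itself uses. It only helps confirm that the intended microphone or interface is visible and marked as the default. PortAudio lists a device once per Windows audio API, so the same name may appear several times.

```python
def input_devices(devices, default_input: int):
    """Return (index, name, is_default) for every device that can record."""
    return [
        (i, dev["name"], i == default_input)
        for i, dev in enumerate(devices)
        if dev["max_input_channels"] > 0
    ]


if __name__ == "__main__":
    import sounddevice as sd  # assumption: installed with pip install sounddevice

    default_in = sd.default.device[0]  # (input, output) pair; take the input index
    for index, name, is_default in input_devices(sd.query_devices(), default_in):
        marker = "  <-- default input" if is_default else ""
        print(f"{index:3d}  {name}{marker}")
```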
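
### Display does not update
- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
- Verify that `MQTT_HOST` in `main.cpp` matches the PC's IP address. Run `ipconfig` and look for the IPv4 address of the WiFi (or Ethernet) adapter the PC uses.
- If the ESP32 reports that it cannot reach the broker, confirm that Mosquitto accepts connections from the network (not only from `localhost`) and that Windows Firewall allows inbound TCP port 1883.
- Confirm that Mosquitto is running: `sc query mosquitto` should report the state as `RUNNING`.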
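
To compare `MQTT_HOST` with the PC's addresses without reading `ipconfig` output by eye, a short Python sketch can do two things:
- List the IPv4 addresses Windows associates with the PC's host name.
- Extract a quoted IPv4 literal from the `MQTT_HOST` line of `main.cpp`.

It assumes `MQTT_HOST` is defined as a quoted IP address string, for example via `#define` or a `const char*`. The firmware's real definition may differ, and hostnames are not handled. The `main.cpp` path in the script is also an assumption. If the addresses differ, update `MQTT_HOST` and re-upload the firmware.

```python
import re
import socket
from pathlib import Path

# A dotted-quad IPv4 address inside double quotes.
IPV4_LITERAL = re.compile(r'"(\d{1,3}(?:\.\d{1,3}){3})"')


def mqtt_host_from_source(cpp_text: str):
    """Return the first quoted IPv4 literal on a line mentioning MQTT_HOST, or None."""
    for line in cpp_text.splitlines():
        if "MQTT_HOST" in line:
            match = IPV4_LITERAL.search(line)
            if match:
                return match.group(1)
    return None


def local_ipv4_addresses() -> list[str]:
    """IPv4 addresses associated with this PC's host name."""
    return socket.gethostbyname_ex(socket.gethostname())[2]


if __name__ == "__main__":
    # Assumed location: adjust to where the ESP32 project's main.cpp lives.
    source = Path("main.cpp").read_text(encoding="utf-8", errors="replace")
    configured = mqtt_host_from_source(source)
    addresses = local_ipv4_addresses()
    print("MQTT_HOST in main.cpp:", configured)
    print("This PC's IPv4 addresses:", ", ".join(addresses))
    print("MATCH" if configured in addresses else "MISMATCH: update MQTT_HOST")
```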

### `large-v3` is too slow (display lags more than 5–6 seconds)
Switch to a faster model by changing the `WHISPER_MODEL` line in `start.bat` to:

```
set WHISPER_MODEL=distil-large-v3
```

`distil-large-v3` is ~50% faster with only a small reduction in accuracy. Note that it transcribes English only.