@@ -15,6 +15,7 @@ natively on the host machine regardless. Packaging everything into a single
`.exe` is not practical for software of this type.

Instead this guide provides:
+
- `install.bat` — run **once** to set everything up
- `start.bat` — run each time to launch the full system

@@ -27,7 +28,7 @@ After setup, operation is a double-click.
Before starting, confirm your PC meets these requirements:

| Requirement | Minimum | Recommended |
-|---|---|---|
+| --- | --- | --- |
| OS | Windows 10 64-bit | Windows 11 |
| GPU | NVIDIA GTX 1060 6 GB | NVIDIA RTX 3070 or better |
| VRAM | 6 GB | 8 GB+ |
@@ -36,13 +37,14 @@ Before starting, confirm your PC meets these requirements:
| Internet | Required for setup | Not needed during services |

> The RTX 4070 Super (tested hardware) runs `large-v3` in real time comfortably.
+> The RTX 5060 Ti (production hardware) is also confirmed working.

---

## Part 2 — NVIDIA Driver

-You need an up-to-date NVIDIA driver. You do **not** need to install the CUDA
-Toolkit separately — PyTorch bundles everything it needs.
+You need an up-to-date NVIDIA driver. You will also need the CUDA Toolkit
+(Part 2b below) — the driver alone is not sufficient for all components.

1. Open **GeForce Experience** (if installed) → Drivers → Check for updates.

@@ -54,30 +56,28 @@ Toolkit separately — PyTorch bundles everything it needs.
3. Restart the PC when prompted.

4. Verify the driver is working:
+
- Press `Win + R`, type `cmd`, press Enter.
- Type `nvidia-smi` and press Enter.
- You should see a table with your GPU name and driver version.

- ```
+ ```text
+-----------------------------------------------------------------------------+
- | NVIDIA-SMI 560.x Driver Version: 560.x CUDA Version: 12.6 |
+ | NVIDIA-SMI 595.x Driver Version: 595.x CUDA Version: 13.x |
+-----------------------------------------------------------------------------+
- | RTX 4070 Super ...
+ | RTX 5060 Ti ...
```

If this command is not found, the driver did not install correctly.

---

-## Part 2b — CUDA Toolkit 12.x
+## Part 2b — CUDA Toolkit

-The NVIDIA driver alone is not enough. WhisperLiveKit uses **faster-whisper**
-(via ctranslate2) for inference, which requires the CUDA runtime libraries to
-be installed separately. Without this you will see `cublas64_12.dll not found`
-and the server will fall back to CPU-only mode, making transcription too slow
-for live use.
+The NVIDIA driver alone is not enough for all GPU components. The CUDA Toolkit
+provides compiler tools (`nvcc`) and low-level libraries used by WhisperLiveKit.

-> `nvidia-smi` showing "CUDA Version: 12.6" means your *driver supports* up
+> `nvidia-smi` showing "CUDA Version: 13.x" means your *driver supports* up
> to that version — it does **not** mean the toolkit is installed.

1. Go to [developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
@@ -91,14 +91,28 @@ for live use.

5. Verify:

- ```
+ ```cmd
nvcc --version
```

- Expected: `release 12.x, V12.x.xxx`
+ Expected output ends with something like `release 13.x, V13.x.xxx` (the
+ exact version will match whatever you downloaded).
+
+ > If `nvcc` is not found, add the toolkit's `bin` folder to your system
+ > PATH (e.g. `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.2\bin`)
+ > using the same method as the Mosquitto PATH fix in Part 4.
+
+**Triton kernel warning** — after installing the CUDA Toolkit you will still
+see this at bridge startup:
+
+```text
+Failed to launch Triton kernels, likely due to missing CUDA toolkit;
+falling back to a slower median kernel implementation...
+```

- > If `nvcc` is not found, add `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.x\bin`
- > to your system PATH (same method as the Mosquitto PATH fix in Part 4).
+This message is **misleading**. The `triton` Python package does not support
+Windows — there is no Windows build. The fallback is expected and has no
+practical effect on transcription quality.
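
The `nvcc --version` check from step 5 can also be scripted once Python is installed. The following is a minimal sketch, not part of the project: the helper names `parse_nvcc_release` and `toolkit_release` are hypothetical, and the only assumption about `nvcc` is that its version banner contains the word `release` followed by a `major.minor` number, as in the expected output above.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Return the release number (e.g. '12.6') found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def toolkit_release() -> Optional[str]:
    """Return the installed toolkit release, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = toolkit_release()
    if release:
        print(f"CUDA Toolkit release: {release}")
    else:
        print("nvcc not found on PATH (see the PATH note in step 5)")
```

A `None` result only says that `nvcc` is not reachable from PATH; the toolkit may still be installed in its default folder.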

---

@@ -116,12 +130,13 @@ WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or
the latest **Python 3.12.x** release. Download the **Windows installer (64-bit)**.

2. Run the installer. On the first screen:
+
- **Tick "Add Python to PATH"** (important — do this before clicking Install Now)
- Click **Install Now**

3. Once complete, verify in a new Command Prompt window:

- ```
+ ```cmd
py -3.12 --version
```

@@ -131,7 +146,7 @@ WhisperLiveKit) does not yet publish pre-built packages for Python 3.14 or

## Part 4 — Mosquitto (MQTT Broker)

-Mosquitto is the message relay between the PC and the display.
+Mosquitto is the message relay between the transcription bridge and the display.

1. Download the Windows installer from
[mosquitto.org/download](https://mosquitto.org/download/) — choose the
@@ -142,7 +157,7 @@ Mosquitto is the message relay between the PC and the display.
3. **Add Mosquitto to the system PATH** (the installer does not do this
automatically). Run Command Prompt **as Administrator**:

- ```
+ ```cmd
setx /M PATH "%PATH%;C:\Program Files\mosquitto"
```

@@ -151,19 +166,19 @@ Mosquitto is the message relay between the PC and the display.

4. Start Mosquitto as a Windows service (still as Administrator):

- ```
+ ```cmd
net start mosquitto
```

5. Set it to start automatically with Windows:

- ```
+ ```cmd
sc config mosquitto start=auto
```

6. Verify the tools are working:

- ```
+ ```cmd
mosquitto_sub -h localhost -t test -v
```

@@ -180,6 +195,7 @@ accepting its licence terms. This is free — it just needs an account.
1. Go to [huggingface.co](https://huggingface.co) and create a free account.

2. Accept the licence for the diarization model:
+
- Visit [huggingface.co/pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
- Click **"Agree and access repository"**
- Also visit [huggingface.co/pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0)
@@ -188,13 +204,13 @@ accepting its licence terms. This is free — it just needs an account.
> If you skip this step, the server will fail to start with a 403 error.

3. Create an access token:
+
- Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
- Click **New token**
- Name: `church-transcription` (or anything you like)
- Role: **Read**
- Click **Generate token**
- Copy the token — it starts with `hf_`
-

4. **Save this token somewhere safe** (Notepad or a password manager). You will
paste it into `start.bat` in Part 7.
@@ -204,6 +220,7 @@ accepting its licence terms. This is free — it just needs an account.
## Part 6 — Run install.bat

The `install.bat` script in this folder does the following automatically:
+
- Creates a Python virtual environment in `.venv\`
- Installs PyTorch with CUDA support
- Installs WhisperLiveKit
@@ -221,7 +238,7 @@ The `install.bat` script in this folder does the following automatically:

3. Near the end you will see the Whisper model downloading for the first time:

- ```
+ ```text
Downloading model large-v3 (~3 GB) ...
```

@@ -264,29 +281,34 @@ pyannote can download the diarization model.

1. Double-click **`start.bat`**.

- Two windows will open:
- - **Window 1 — Whisper Server**: shows the transcription engine loading.
- On first run this downloads the speaker diarization model (~500 MB).
- Wait until you see `Server running on ws://0.0.0.0:8000`.
- - **Window 2 — Bridge**: the speaker name mapping window appears, and the
- Command Prompt behind it shows connection status.
+ Two Command Prompt windows will open:
+
+ - **Window 1 — Bridge**: the transcription pipeline. Wait until you see
+ `Audio pipeline running`.
+ - **Window 2 — Admin**: the web server. Wait until it shows
+ `Application startup complete`.
+
+2. Open the speaker admin page:

-2. Verify the Whisper server is working:
- - Open a browser and go to `http://localhost:8000`
- - You should see a simple web interface. Speak into the microphone — text
- should appear.
+ - Open a browser and go to `http://localhost:8001`
+ - You should see the Speaker Admin table.

-3. Verify the display:
- - With the ESP32 powered on and connected to the same WiFi, send a test
- message. Open a third Command Prompt and run:
+3. Open the display page on a tablet or spare screen:
+
+ - On any device on the same WiFi, open `http://[PC-IP]:8001/display`
+ - Press `F11` for fullscreen. A green dot in the corner means it is
+ connected and receiving updates.
+
+4. Send a test message to verify the full pipeline. Open a Command Prompt and run:
+
+ ```cmd
+ mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
+ ```

- ```
- mosquitto_pub -h localhost -t display/text -m "{\"lines\":[\"Test line 1\",\"Test line 2\",\"Ready\"]}"
- ```
+ The display page should update immediately with those three lines.

- - The e-ink display should refresh within 2 seconds showing those three lines.
+5. Full pipeline test:

-4. Full pipeline test:
- Speak naturally into the microphone.
- After a sentence or natural pause, text should appear on the display within
3–5 seconds.
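
The test message in step 4 is a JSON object with a single `lines` array, published to the `display/text` topic. Escaping the quotes by hand in Command Prompt is error-prone, so the sketch below builds the same payload with Python's `json` module. The helper names `build_payload` and `build_command` are hypothetical and not part of the project; only the topic, the payload shape, and the `mosquitto_pub` flags come from the command above.

```python
import json

TOPIC = "display/text"  # topic used by the mosquitto_pub test in step 4


def build_payload(lines):
    """Serialise display lines into the {"lines": [...]} JSON shape."""
    return json.dumps({"lines": list(lines)})


def build_command(lines, host="localhost"):
    """Return the mosquitto_pub argument list for the given display lines."""
    return ["mosquitto_pub", "-h", host, "-t", TOPIC, "-m", build_payload(lines)]


if __name__ == "__main__":
    command = build_command(["Test line 1", "Test line 2", "Ready"])
    # Printed rather than executed, so the sketch runs without Mosquitto installed.
    print(command)
```

To actually send the message, the list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string leaves the quoting to Python.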
@@ -297,15 +319,13 @@ pyannote can download the diarization model.

## Part 9 — Assigning speaker names

-The bridge window shows a **Speaker Name Mapping** panel. The system
-automatically detects different speakers and labels them SPEAKER_00,
-SPEAKER_01, etc.
+The speaker admin page at `http://localhost:8001` shows all detected speakers.
+The system automatically labels them `SPEAKER_00`, `SPEAKER_01`, etc.

-- The defaults (Pastor, Reader, Guest, Choir) are applied immediately when the
- bridge starts.
-- If a different person is speaking than expected, type their name in the
- matching row and click **Apply**.
-- Speaker labels appear on the display as a short heading line (e.g. `[PASTOR]`)
+- The defaults (Pastor, Reader, Guest, Choir) are loaded on first run.
+- When a new speaker appears, click their name in the table and type the
+ correct name. Changes take effect within 5 seconds.
+- Speaker labels appear on the display as a gold heading line (e.g. `PASTOR`)
whenever the speaker changes.
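
The label-to-name mapping described above can be pictured as a small lookup table. This is an illustrative sketch only, not the bridge's actual code. It assumes the defaults are assigned in the order listed (`SPEAKER_00` to Pastor, `SPEAKER_01` to Reader, and so on) and that headings are upper-cased as in the `PASTOR` example. The function names are hypothetical.

```python
DEFAULT_NAMES = ["Pastor", "Reader", "Guest", "Choir"]


def default_mapping(names=DEFAULT_NAMES):
    """Map SPEAKER_00, SPEAKER_01, ... to the default names, in order (assumed)."""
    return {f"SPEAKER_{index:02d}": name for index, name in enumerate(names)}


def heading_for(label, mapping):
    """Heading text for a detected label; unmapped labels pass through unchanged."""
    return mapping.get(label, label).upper()


if __name__ == "__main__":
    mapping = default_mapping()
    mapping["SPEAKER_02"] = "Visiting Speaker"  # a rename made on the admin page
    for label in ("SPEAKER_00", "SPEAKER_02", "SPEAKER_07"):
        print(label, "->", heading_for(label, mapping))
```

In this sketch a speaker without an assigned name keeps the raw label as its heading.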

---
@@ -314,7 +334,7 @@ SPEAKER_01, etc.

1. Double-click `start.bat`.
2. Wait ~30 seconds for both windows to show "ready" status.
-3. The display will show `DISPLAY READY` when the ESP32 connects.
+3. Open `http://[PC-IP]:8001/display` on the tablet and press `F11`.
4. Begin the service — transcription runs automatically.
5. Close both windows when done.

@@ -343,10 +363,12 @@ setx /M PATH "%PATH%;C:\Program Files\mosquitto"
Close and reopen the Command Prompt, then retry the command.

### `nvidia-smi` not found
+
The NVIDIA driver is not installed or not in PATH. Re-run the driver installer
and restart the PC.

### `python --version` shows wrong version or "not found"
+
Python was not added to PATH. Re-run the Python installer, choose "Modify",
and tick "Add Python to environment variables".

@@ -361,27 +383,34 @@ If the error occurs on Python 3.12, the PyTorch download may have failed
mid-way. Delete `.venv` and re-run `install.bat` with a stable connection.

### Whisper server fails with `401` or `403`
+
Your HuggingFace token is incorrect, or you have not accepted the model licence
terms. Re-check Part 5 — both model pages must have "Agree and access
repository" clicked while logged into the same account that generated the token.

-### Whisper server starts but no text appears
+### Bridge starts but no text appears on the display
+
Check that the correct audio input device is selected:
+
- Open Windows **Sound Settings** → Input → ensure the microphone or audio
interface is set as the default device.
-- The bridge uses the Windows default input device.
+- Or set `AUDIO_DEVICE` to a specific device index in `bridge/bridge.py`.
+
+### Display page does not update

-### Display does not update
-- Check the ESP32 Serial Monitor for WiFi/MQTT connection messages.
-- Verify `MQTT_HOST` in `main.cpp` matches the PC's IP address (`ipconfig` →
- look for the WiFi adapter IPv4 address).
+- Check the green/red dot in the bottom-right corner of the display page.
+ Red means the browser lost its connection to the admin server.
+- Confirm `admin.py` is running and accessible at `http://[PC-IP]:8001`.
- Confirm Mosquitto is running: `sc query mosquitto`
+- Verify the PC's IP has not changed — tablets store the URL, so update it
+ if the PC was assigned a new address.
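
To check which address belongs in `http://[PC-IP]:8001/display`, the PC's current LAN address can be looked up with a few lines of Python. This is a rough sketch using only the standard library; the function names are hypothetical. It reports the IPv4 address of whichever interface the operating system would use for outbound traffic, which is assumed here to be the WiFi adapter the tablet also connects through.

```python
import socket


def lan_ip():
    """Best-effort IPv4 address of the interface used for outbound traffic."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # Connecting a UDP socket sends no packets; it only selects a route.
            # 192.0.2.1 is a reserved documentation address, never a real host.
            sock.connect(("192.0.2.1", 9))
            return sock.getsockname()[0]
        except OSError:
            # No usable route (for example, no network connection at all).
            return "127.0.0.1"


def display_url(ip, port=8001):
    """Build the display page URL for the given PC address."""
    return f"http://{ip}:{port}/display"


if __name__ == "__main__":
    print(display_url(lan_ip()))
```

If the printed URL differs from the one saved on the tablet, the PC has been assigned a new address. A result of `127.0.0.1` means no network route was found.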

### `large-v3` is too slow (display lags more than 5–6 seconds)
-Switch to a faster model by editing `start.bat`:

-```
-set WHISPER_MODEL=distil-large-v3
+Switch to a faster model by editing `bridge/bridge.py`:
+
+```python
+engine = TranscriptionEngine(model_size="distil-large-v3", ...)
```

-`distil-large-v3` is ~50%% faster with only a small accuracy reduction.
+`distil-large-v3` is ~50% faster with only a small accuracy reduction.