# CLAUDE.md — AI Development Context

This file provides context for AI-assisted development sessions on the Church Live Transcription Display project.

## Project Summary

A live captioning system for deaf/hard-of-hearing church congregants. A Windows PC captures audio, transcribes it locally using Whisper (GPU-accelerated), and sends rolling text over MQTT to an ESP32 driving a large e-ink display. No cloud services. No internet required during operation.

## Architecture

```
[Audio source]
     ↓ (USB mic or mixer line-in)
[Windows PC]
  ├── WhisperLiveKit (local Whisper server, WebSocket on port 8000)
  ├── Mosquitto MQTT broker (port 1883)
  └── bridge.py (Python: WS subscriber → sentence buffer → MQTT publisher)
     ↓ (WiFi / MQTT topic: display/text)
[ESP32-WROOM or S3]
  └── Waveshare 7.5" V2 e-ink display (SPI, GxEPD2 library)
```

## PC Environment

- OS: Windows 10/11
- GPU: NVIDIA RTX series (tested with RTX 4070 Super)
- Python: 3.11+
- MQTT broker: Mosquitto (localhost:1883)
- Whisper server: WhisperLiveKit (`wlk --model large-v3 --language en`)
- Whisper WebSocket: `ws://localhost:8000/asr`

## ESP32 Environment

- Board: ESP32-WROOM-32 or ESP32-S3
- Framework: Arduino (via PlatformIO)
- Display: Waveshare 7.5" V2 (800×480 pixels, black/white)
- Display library: GxEPD2
- MQTT library: PubSubClient
- Build tool: PlatformIO (VSCode)

### SPI Wiring (Waveshare 7.5" V2 to ESP32)

| Display Pin | ESP32 Pin |
|---|---|
| BUSY | GPIO 4 |
| RST | GPIO 16 |
| DC | GPIO 17 |
| CS | GPIO 5 |
| CLK | GPIO 18 |
| DIN | GPIO 23 |
| GND | GND |
| VCC | 3.3V |

## MQTT Topics

| Topic | Direction | Payload |
|---|---|---|
| `display/text` | PC → ESP32 | JSON: `{"lines": ["line1", "line2", "line3"]}` |
| `display/clear` | PC → ESP32 | Any (payload is ignored; may be empty) |
| `display/status` | ESP32 → PC | JSON: `{"ready": true}` |
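
The payload shapes in the table can be produced and checked with the standard `json` module. This is a minimal sketch, not the contents of `bridge.py`: the helper names `encode_text_payload` and `is_display_ready` are hypothetical, and the 3-line limit is taken from the refresh strategy described later in this file.

```python
import json

TEXT_TOPIC = "display/text"      # topic names copied from the table above
STATUS_TOPIC = "display/status"


def encode_text_payload(lines):
    """Serialize up to three display lines as the display/text JSON payload."""
    if len(lines) > 3:
        raise ValueError("the display shows at most 3 lines")
    return json.dumps({"lines": list(lines)}).encode("utf-8")


def is_display_ready(payload):
    """Return True only if a display/status payload is {"ready": true}."""
    try:
        return json.loads(payload).get("ready") is True
    except (ValueError, AttributeError):
        # Malformed JSON, or valid JSON that is not an object.
        return False
```

Treating anything other than a well-formed `{"ready": true}` as "not ready" keeps the bridge from pushing text to a display that has not finished booting.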

## Key Files

- `bridge/bridge.py` — Main Python bridge. Connects to Whisper WS, buffers text, publishes to MQTT.
- `esp32/src/main.cpp` — ESP32 firmware. WiFi + MQTT client, renders text to e-ink.
- `esp32/platformio.ini` — Board and library config.

## Design Constraints & Decisions

### Refresh Strategy
- Full e-ink refresh: ~1.5–2 seconds with flash. Acceptable for sentence-level updates.
- Partial refresh: ~300ms, some ghosting. Use for rapid updates if needed.
- **Current approach**: buffer until sentence boundary or 4-second silence, then push full screen update.
- Display shows 3 lines of text. New text pushes old text up; oldest line drops off.
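
The "sentence boundary or 4-second silence" rule can be expressed as a small pure function. The sketch below assumes a sentence boundary is detected from trailing punctuation, which is only one possible signal; `should_flush` and its parameters are hypothetical names, not taken from `bridge.py`.

```python
import time

SENTENCE_END = (".", "?", "!")   # assumption: punctuation marks a sentence boundary
SILENCE_TIMEOUT_S = 4.0          # the 4-second silence rule described above


def should_flush(buffer_text, last_word_time, now=None):
    """Return True when the buffered text should be pushed to the display.

    last_word_time and now are time.monotonic() timestamps in seconds.
    """
    text = buffer_text.strip()
    if not text:
        return False                      # nothing to show yet
    if text.endswith(SENTENCE_END):
        return True                       # sentence boundary reached
    now = time.monotonic() if now is None else now
    return (now - last_word_time) >= SILENCE_TIMEOUT_S
```

Passing `now` explicitly makes the timeout path testable without sleeping.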

### Text Formatting
- Target font size: large enough to read at 3–5 metres (approx 36–48px equivalent at 800px wide)
- At ~800px wide with a large font: approximately 35–45 characters per line
- Lines wrap at word boundaries
- All caps optional for readability (configurable)
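
Word-boundary wrapping with an optional all-caps mode can be sketched with the standard library's `textwrap`. The 40-character default is an assumption taken from the middle of the 35–45 range above, and `wrap_for_display` is a hypothetical helper name.

```python
import textwrap


def wrap_for_display(text, width=40, all_caps=False):
    """Wrap text at word boundaries into lines of at most `width` characters."""
    if all_caps:
        text = text.upper()
    # break_long_words=True splits any single word longer than `width`
    # so that no line can overflow the display.
    return textwrap.wrap(text, width=width, break_long_words=True)


verse = "Blessed are the poor in spirit, for theirs is the kingdom of heaven."
for line in wrap_for_display(verse, all_caps=True):
    print(line)
```

Note that `textwrap` counts characters, not pixels, so this only approximates the fit for a proportional font; the firmware may still need to measure rendered text width.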

### Audio Input
- Preferred: direct feed from church mixing desk (line-in or USB audio interface)
- Fallback: USB condenser microphone near pulpit/lectern
- Whisper performs best with clean, low-noise input
- VAD (Voice Activity Detection) in WhisperLiveKit handles silence automatically

### Network
- All on local WiFi (church LAN or dedicated hotspot)
- MQTT broker on Windows PC
- ESP32 connects to same WiFi network
- Static IP recommended for ESP32 to avoid reconnection delays

## Bridge Script Logic (bridge.py)

```
1. Connect to Mosquitto MQTT broker
2. Connect to WhisperLiveKit WebSocket (ws://localhost:8000/asr)
3. Receive partial transcription updates
4. Accumulate words into a sentence buffer
5. On sentence-end signal (or timeout):
   a. Word-wrap text into lines (max ~40 chars each)
   b. Maintain a rolling 3-line buffer
   c. Publish JSON payload to MQTT topic display/text
6. On connection loss: re-establish WS and MQTT connections
```
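
Steps 3–5 can be sketched without any network code by injecting the publish function. This is an illustration under stated assumptions, not the actual `bridge.py`: the class name `SentenceBridge` and the `publish` hook are hypothetical, and the message shape (`text` plus `is_final`) follows the Development Notes in this file.

```python
import json
import textwrap
from collections import deque


class SentenceBridge:
    """Turns finalized transcript segments into rolling display payloads."""

    def __init__(self, publish, max_lines=3, width=40):
        self.publish = publish                 # hypothetical hook: callable(topic, payload_bytes)
        self.lines = deque(maxlen=max_lines)   # oldest line drops off automatically
        self.width = width

    def on_ws_message(self, raw):
        """Handle one WebSocket message; publish only when a segment is final."""
        msg = json.loads(raw)
        text = (msg.get("text") or "").strip()
        if not text or not msg.get("is_final"):
            return False                       # partial update: keep waiting
        self.lines.extend(textwrap.wrap(text, width=self.width))
        payload = json.dumps({"lines": list(self.lines)}).encode("utf-8")
        self.publish("display/text", payload)
        return True
```

In a real bridge, `publish` would wrap whichever MQTT client library is in use. One limitation of this approach: a sentence that wraps to more than three lines loses its beginning in a single update, so long sentences may need to be paged across several publishes.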

## Known Issues / Open Questions

- [ ] Partial refresh ghosting threshold — how many partial refreshes before forcing a full clear?
- [ ] Whisper latency with large-v3 model — may need to test medium or distil-large-v3 for lower latency
- [ ] Line-wrapping edge cases with long words (e.g. proper nouns, scripture references)
- [ ] ESP32 RAM: the WROOM-32 has 520KB of SRAM and no PSRAM; large font bitmaps may not fit, so an S3 module with PSRAM may be needed
- [ ] WiFi reconnection handling in firmware — need watchdog/retry logic

## Development Notes

- WhisperLiveKit WebSocket returns incremental JSON with `text` and `is_final` fields (confirm the field names against the installed version before relying on them)
- GxEPD2 supports both full and partial refresh; partial requires `setPartialWindow()`
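- PubSubClient's default buffer (`MQTT_MAX_PACKET_SIZE`) is 256 bytes in v2.8 and 128 bytes in older releases. The buffer must hold the whole packet (header, topic, and payload), so a ~200-byte JSON payload leaves little or no headroom
- Call `client.setBufferSize(512)` during PubSubClient setup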
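
To check whether a given payload fits a given client buffer, the size of the MQTT PUBLISH packet can be estimated on the PC side. The sketch below follows the MQTT 3.1.1 packet layout (fixed header, variable-length "remaining length", 2-byte topic length prefix, topic, optional packet identifier, payload); `publish_packet_size` is a hypothetical helper name.

```python
import json


def publish_packet_size(topic, payload, qos=0):
    """Estimate the on-wire size in bytes of an MQTT 3.1.1 PUBLISH packet."""
    remaining = 2 + len(topic.encode("utf-8")) + len(payload)
    if qos > 0:
        remaining += 2                 # packet identifier is present for QoS 1 and 2
    # "Remaining length" is a variable-length integer carrying 7 bits per byte.
    length_bytes = 1
    n = remaining
    while n > 127:
        n //= 128
        length_bytes += 1
    return 1 + length_bytes + remaining


# Worst case for this project: three full 40-character lines.
worst_case = json.dumps({"lines": ["X" * 40] * 3}).encode("utf-8")
print(publish_packet_size("display/text", worst_case))
```

Comparing this figure against the value passed to `setBufferSize()` gives a quick sanity check whenever the line width or line count changes.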

## Testing Approach

1. Test Whisper server standalone: speak into mic, verify text in browser at `http://localhost:8000`
2. Test MQTT: use MQTT Explorer or `mosquitto_sub` to verify bridge publishes correctly
3. Test ESP32 display: send static MQTT messages manually before connecting bridge
4. End-to-end: full pipeline test with recorded sermon audio
5. In-situ trial: 1–2 Sunday services with a volunteer congregant providing feedback
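
For step 3, static test messages can be sent by scripting the `mosquitto_pub` command-line client that ships with Mosquitto. This is a sketch that assumes `mosquitto_pub` is on the `PATH` and the broker is at `localhost:1883`; the helper names are hypothetical.

```python
import json
import subprocess


def mosquitto_pub_args(topic, payload=None, host="localhost", port=1883):
    """Build the argument list for a single mosquitto_pub invocation."""
    args = ["mosquitto_pub", "-h", host, "-p", str(port), "-t", topic]
    if payload is None:
        args.append("-n")              # -n sends a zero-length message
    else:
        args += ["-m", payload]
    return args


def send_test_screen(lines, run=subprocess.run):
    """Publish a static three-line screen to display/text."""
    args = mosquitto_pub_args("display/text", json.dumps({"lines": lines}))
    return run(args, check=True)       # raises if mosquitto_pub exits non-zero


def send_clear(run=subprocess.run):
    """Publish an empty message to display/clear."""
    return run(mosquitto_pub_args("display/clear"), check=True)
```

Calling `send_test_screen(["Line one", "Line two", "Line three"])` followed by `send_clear()` exercises both inbound topics on the ESP32 without Whisper or the bridge running.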