# Changelog

All notable changes to the TAS Councils scraping pipeline are recorded here. Entries are grouped by push/session in reverse-chronological order.

---

## 2026-04-13 — Code Quality & Bug Fixes (current)

**Bug fixes**

- Fixed `Mysql2::Error Unknown column '''' in 'SET'` — MariaDB 10.11's prepared-statement parser mishandles string literals (`''`) inside `NULLIF`/`IF` expressions in `SET` clauses. Replaced `COALESCE(NULLIF(?, ''), col)` with `COALESCE(?, col)` passing `nil` when the value is empty (`lib/enrich.rb`).
- Fixed `private method 'da_tables' called` error in `lib/migrate.rb` — migration lambdas call `Migrate.da_tables` with an explicit receiver, which counts as a public call. Removed `da_tables` from `private_class_method` declaration.
- Fixed unmatched `end` / dangling `rescue` syntax error in `scrapers/launcestoncity.rb` introduced during a prior cleanup pass.
- Eliminated duplicate "Docs page had no usable links" warning (fired twice per DA) in `scrapers/launcestoncity.rb`.

**Removed**

- Deleted `scrapers/enrich.rb` — stale copy of `lib/enrich.rb` with wrong `require_relative` paths, old broken `COALESCE(NULLIF(?, ''))` query, and no main batch loop. Was being picked up by `run_all.sh`'s `scrapers/*.rb` glob and failing every full run with a `LoadError`.

**Docs**

- Updated `CLAUDE.md`: corrected geocode-backfill command to use `tools/backfill_geocode.rb`, updated schema-change guidance to point to `lib/migrate.rb`.
- Updated `README.md`: removed stale `tools/enrich.rb` references, corrected enrichment/backfill examples, updated tools table.

---

## 2026-04-13 — Structure Updates (5f60868)

- General structural cleanup across scrapers.

---

## 2026-04-13 — Launceston City Scraper (3fc874c → bc3490f)

- Implemented `scrapers/launcestoncity.rb` for the Launceston eProperty portal (ASP.NET session-based site).
- Session cookie management (`merge_set_cookie!`) to maintain `ASP.NET_SessionId` across requests.
- Document listing via `docget.asp` with multi-variant URL probing (path-case and route-param variants).
- `probe_common_docs` fallback: constructs known PDF filenames from DA number when the document list page returns no links.
- PDF download to `DOWNLOAD_DIR/launceston//` when `DOWNLOAD_ATTACHMENTS=1`.
- Enriches each DA from the details page (applicant, received date, advertised date, legal description).

---

## 2026-04-13 — Structured Logging (c03bfae)

- Added `lib/log.rb` — `Log.debug`, `Log.info`, `Log.warn`, `Log.error` with `LOG_LEVEL` env filtering.
- Replaced `puts`/`warn` calls across `lib/` with `Log.*` calls.
- Added `LOG_LEVEL` env var to `docker-compose.yml` (default: `info`).

---

## 2026-04-13 — Schema Migrations (0e4e035)

- Added `lib/migrate.rb` — lightweight sequential migration runner backed by a `schema_migrations` table.
- Migration v1: adds enrichment and geocode columns to all existing `da_*` tables.
- Migration v2: creates `geo_cache` table.
- `run_all.sh` now runs `ruby /app/lib/migrate.rb` before scrapers.

---

## 2026-04-13 — SQL Injection Hardening (f3c06ab)

- Added `DB.validate_table_name!` — enforces `da_[a-z0-9_]+` pattern on every table name before interpolation into SQL.
- Applied `DB.client.escape()` on all remaining identifier interpolations.
- Applied `validate_table_name!` in `lib/geocode.rb` and `lib/enrich.rb`.

---

## 2026-04-12 — Initial Upload (ab11792)

- 28 council scrapers covering all Tasmanian councils.
- `lib/db.rb` — DB client, `ensure_table!`, upsert with write-once semantics.
- `lib/http.rb` — HTTP client with retries, cookie jar, 403/406 warmup, curl fallback.
- `lib/geocode.rb` — Google Maps geocoding with SHA1 cache in `geo_cache`.
- `lib/enrich.rb` — `enrich_after_upsert!` for per-row geocoding and property lookup.
- `lib/util.rb` — `parse_aus_date`, council/table name mappings.
- `web/index.php` — PHP search portal with dynamic UNION across all `da_*` tables.
- `tools/backfill_geocode.rb` — batch geocode backfill.
- `tools/import_sqlites.rb` — import from legacy SQLite exports.
- Docker Compose stack: MariaDB 10.11, Ruby 3.2 scraper, PHP/Apache web, Adminer.
- `run_all.sh` — discovers and runs scrapers with `ONLY`/`SKIP` filtering.
- `entrypoint.sh` — Docker entry with optional loop via `SCRAPE_EVERY_MINUTES`.
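
For readers unfamiliar with the SHA1-keyed `geo_cache` listed for `lib/geocode.rb` above, the idea can be sketched roughly as follows. This is an illustration only, not the code in `lib/geocode.rb`: the helper `geo_cache_key`, the class `GeoCacheSketch`, the whitespace/case normalisation, and the in-memory hash standing in for the `geo_cache` table are all assumptions.

```ruby
require "digest/sha1"

# Hypothetical helper (not from lib/geocode.rb): normalise an address and
# derive a SHA1 hex digest to use as the cache key. The normalisation
# rules (trim, lowercase, collapse whitespace) are an assumption.
def geo_cache_key(address)
  normalised = address.to_s.strip.downcase.gsub(/\s+/, " ")
  Digest::SHA1.hexdigest(normalised)
end

# In-memory stand-in for the geo_cache table, for illustration only.
class GeoCacheSketch
  def initialize
    @rows = {}
  end

  # Returns the cached result for the address, or runs the caller's
  # block once (the expensive geocoding call) and stores its result.
  def fetch(address)
    key = geo_cache_key(address)
    @rows.fetch(key) { @rows[key] = yield(address) }
  end
end
```

With a cache like this, two spellings of the same address that differ only in case or spacing share one key, so the geocoding block runs once for both.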
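
The table-name check described in the SQL Injection Hardening entry can be sketched in a similar spirit. The changelog only states that `DB.validate_table_name!` enforces the `da_[a-z0-9_]+` pattern; the module name `DBSketch`, the `\A`/`\z` anchoring, the `ArgumentError`, and returning the name on success are assumptions made for this example.

```ruby
# Hypothetical stand-in for DB.validate_table_name! (not the real lib/db.rb).
module DBSketch
  # Anchored with \A and \z so that a trailing newline or any appended
  # SQL cannot slip through; the anchoring is an assumption.
  TABLE_NAME_PATTERN = /\Ada_[a-z0-9_]+\z/

  # Returns the name unchanged when it matches, raises otherwise.
  def self.validate_table_name!(name)
    unless name.is_a?(String) && name.match?(TABLE_NAME_PATTERN)
      raise ArgumentError, "invalid table name: #{name.inspect}"
    end
    name
  end
end
```

Because identifiers cannot be bound as prepared-statement parameters, a check of this kind is typically called immediately before a table name is interpolated into SQL.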