# Changelog
All notable changes to the TAS Councils scraping pipeline are recorded here.
Entries are grouped by push/session in reverse-chronological order.

---
## 2026-04-13 — Scraper Fixes & Audit
**`scrapers/planbuild.rb`** — rewrote to fix crash on first item:
- Added missing `require "zlib"`, `require "stringio"`, `require_relative "../lib/log"`
- `fetch_detail` now always returns a Hash (`parsed.is_a?(Hash) ? parsed : {}`); bare `rescue {}` replaced with `rescue JSON::ParserError, Zlib::Error`
- Removed debug `puts` — replaced with `Log.debug`/`Log.info`
- `local_document_url` now passes `nil` (not `""`) when no downloads — prevents COALESCE overwriting an existing URL with empty string
- Per-item rescue so one bad reference skips and logs rather than killing the run
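
The "always returns a Hash" and "`nil`, not `""`" points above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: `parse_detail` and `blank_to_nil` are hypothetical names, and the gzip detection via magic bytes is an assumption about how the response body is handled.

```ruby
require "json"
require "zlib"
require "stringio"

# Hypothetical helper: decode a body that may be gzip-compressed and always
# return a Hash, even when the payload is malformed or not a JSON object.
def parse_detail(body)
  raw = body.to_s
  # Gzip streams start with the magic bytes 0x1f 0x8b.
  if raw.getbyte(0) == 0x1f && raw.getbyte(1) == 0x8b
    raw = Zlib::GzipReader.new(StringIO.new(raw)).read
  end
  parsed = JSON.parse(raw)
  parsed.is_a?(Hash) ? parsed : {}
rescue JSON::ParserError, Zlib::Error
  # Only the two expected failure modes are swallowed; anything else propagates.
  {}
end

# Hypothetical helper: empty strings become nil, so an update of the form
# COALESCE(?, col) keeps the stored value instead of overwriting it with "".
def blank_to_nil(value)
  value.to_s.strip.empty? ? nil : value
end
```

Rescuing only `JSON::ParserError` and `Zlib::Error` means unexpected failures still reach the per-item rescue and get logged.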
**`scrapers/southernmidlands.rb`** — rewrote detail page parser:
- Detail pages use `Location: / Proposal:` paragraph format, not table rows — old `table tr th/td` selector found nothing, causing 0 saves
- New parser splits each paragraph on line breaks, extracts Location/Proposal fields, and handles multiple DAs per item page
- Removed redundant `ALTER TABLE` block (columns already in `DB.ensure_table!`)
- Added explicit `require_relative "../lib/http"`, `"../lib/db"`, `"../lib/util"`
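
A rough sketch of the paragraph-style parsing described above, working on plain text rather than on the HTML itself. The function name `parse_da_paragraph` and the output keys `:address` and `:description` are assumptions made for illustration; the real scraper's field names are not given here.

```ruby
# Hypothetical parser: takes the text of one paragraph whose lines look like
# "Location: ..." / "Proposal: ..." and returns one Hash per DA found.
def parse_da_paragraph(text)
  records = []
  current = nil
  text.to_s.each_line do |line|
    label, value = line.strip.split(":", 2)
    next if value.nil? # blank line or a line without a "Label:" prefix
    case label.strip.downcase
    when "location"
      # Each Location line starts a new DA record.
      current = { address: value.strip, description: nil }
      records << current
    when "proposal"
      current[:description] = value.strip if current
    end
  end
  records
end
```

Starting a new record on every `Location:` line is what lets one item page yield several DAs.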
**Missing `require_relative "../lib/log"` — 20 scrapers fixed:**
- `break_oday`, `brighton`, `burnie`, `centralcoast`, `circularhead`, `clarence`, `derwentvalley`, `devonportcity`, `dorset`, `flinders_council`, `glenorchy`, `huonvalley`, `kentish`, `launcestoncity`, `meandervalley`, `northernmidlands`, `southernmidlands`, `waratah_wynyard`, `westcoast`, `westtamar`
- `Log.warn` called in rescue blocks in all of these — without the require, the first error would raise `NameError: uninitialized constant Log` instead of logging
**`enrich_after_upsert!` variable scope bugs — 4 scrapers fixed:**
- `flinders_council.rb`: `council_reference` (undefined) → `ref`; folded separate `UPDATE document_url` into `DB.upsert`; removed redundant `ALTER TABLE`
- `huonvalley.rb`: `council_reference`/`address` (undefined) → `r[:council_reference]`/`r[:address]`; folded `UPDATE document_url` into upsert; removed redundant `ALTER TABLE`
- `kentish.rb`: `council_reference`/`address` (undefined) → `r[:council_reference]`/`r[:address]`; folded extras UPDATE into upsert
- `westcoast.rb`: `address` (undefined) → `item[:address]`; fixed upsert field names (`on_notice` → `on_notice_to`, `on_notice_raw` → `on_notice_to_raw`); fixed values referencing non-existent item keys; folded extras UPDATE into upsert
**Redundant `ALTER TABLE` blocks removed** from `circularhead.rb` and `waratah_wynyard.rb` — all columns already created by `DB.ensure_table!`
---
## 2026-04-13 — Code Quality Pass 3
**Logging**
- All 63 bare `warn "..."` calls across `scrapers/*.rb` replaced with `Log.warn "scraper", "..."` — structured logging now consistent throughout; stderr output is now filtered by `LOG_LEVEL`.
**DB.upsert dynamic rewrite** (`lib/db.rb`)
- Removed hardcoded 22-column array: `upsert` now derives columns from `row.keys`, so scraper-specific columns (e.g. `advertised_date`, `legal_description`) are no longer silently dropped.
- Added `SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/` — each key is validated before interpolation into SQL; unsafe names raise `ArgumentError` rather than silently passing.
- Extracted write-once/merge semantics into `UPSERT_ON_DUP` constant (`date_received`, `date_received_raw`, `document_url`, `local_document_url`) — easier to audit and extend.
- Non-existent columns now raise `Mysql2::Error` (caught by scraper rescue) instead of silently being dropped, surfacing schema mismatches early.
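
The dynamic upsert can be sketched along these lines. `build_upsert` is a hypothetical name, and the `ON DUPLICATE KEY UPDATE` expressions are an assumption: the changelog does not show the real SQL, so this sketch uses `COALESCE(VALUES(col), col)` for the `UPSERT_ON_DUP` columns so that a `nil` never overwrites a stored value. The table name is assumed to have been validated already.

```ruby
SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/
UPSERT_ON_DUP = %w[date_received date_received_raw document_url local_document_url].freeze

# Hypothetical builder: returns [sql, bind_values] for a row Hash.
def build_upsert(table, row)
  cols = row.keys.map(&:to_s)
  cols.each do |c|
    # Column names are interpolated into SQL, so reject anything unusual.
    raise ArgumentError, "unsafe column name: #{c.inspect}" unless c.match?(SAFE_COLUMN_RE)
  end
  quoted = cols.map { |c| "`#{c}`" }
  updates = cols.map do |c|
    if UPSERT_ON_DUP.include?(c)
      "`#{c}` = COALESCE(VALUES(`#{c}`), `#{c}`)" # assumed merge rule
    else
      "`#{c}` = VALUES(`#{c}`)"
    end
  end
  sql = "INSERT INTO `#{table}` (#{quoted.join(', ')}) " \
        "VALUES (#{Array.new(cols.size, '?').join(', ')}) " \
        "ON DUPLICATE KEY UPDATE #{updates.join(', ')}"
  [sql, row.values]
end
```

Values always travel as bind parameters; only the validated column names are interpolated.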
---
## 2026-04-13 — Code Quality Pass 2
**Security**
- `lib/http.rb` curl fallback: replaced shell-interpolated backtick call with `Open3.capture2` array form — eliminates shell injection risk from URL-derived `ref`/`uri` values.
- `web/index.php`: added `validate_table_name()` helper (enforces `/\Ada_[a-z0-9_]+\z/`) applied before every backtick-quoted table name interpolation (`tableHasColumn`, `$stageT` stages fetch, UNION SELECT builder).
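
To illustrate why the array form of `Open3.capture2` removes the injection risk, here is a minimal sketch. `capture_argv` and `curl_fetch` are hypothetical names, and the curl flags are assumed; the pipeline's real invocation is not shown in this changelog.

```ruby
require "open3"

# Hypothetical helper: run a command given as separate arguments.
# With two or more arguments, Ruby executes the program directly and no shell
# parses the command line, so metacharacters in an argument stay literal.
def capture_argv(*argv)
  raise ArgumentError, "need a program and at least one argument" if argv.size < 2
  out, status = Open3.capture2(*argv)
  status.success? ? out : nil
end

# Assumed flags: -f fail on HTTP errors, -s silent, -S show errors, -L follow redirects.
def curl_fetch(url)
  capture_argv("curl", "-fsSL", url)
end
```

The size guard matters because a single-string command may still be handed to the shell.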
**Schema consolidation**
- Removed `Geocode.ensure_da_columns!` from `lib/geocode.rb` — redundant, covered by `DB.ensure_table!` (new tables) and migration v1 (existing tables). Removed its call from `tools/backfill_geocode.rb`.
- Removed `ensure_extra_columns!` from `lib/enrich.rb` and all 10 scraper call-sites — same reasoning; was also using wrong column types (`DOUBLE`/`VARCHAR(50)`) vs canonical schema (`DECIMAL(10,7)`/`TEXT`).
**Error handling**
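- 66 bare `rescue => e` replaced with `rescue StandardError => e` across all scrapers, lib, and tools. This makes the rescued class explicit; behaviour is unchanged, since a bare `rescue => e` already catches only `StandardError` and never swallowed `SystemExit`/`SignalException`.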
- `lib/enrich.rb`: two `warn` calls replaced with `Log.warn` for structured logging; stale file header comment removed.
**Removed**
- Deleted `scrapers/enrich.rb` — stale duplicate with wrong `require_relative` paths, old broken `COALESCE(NULLIF(?, ''))` query, no main batch loop. Was picked up by `run_all.sh`'s glob and failing every full run with `LoadError`.
**Docs**
- `CLAUDE.md`: corrected scraper pattern (removed `ensure_extra_columns!(TABLE)` step), updated geocode-backfill command, corrected schema-change guidance.
- `README.md`: removed stale `tools/enrich.rb` references; corrected enrichment/backfill examples and tools table; added link to VERSIONS.md.
- `VERSIONS.md`: created — changelog covering all changes from initial upload.
---
## 2026-04-13 — Code Quality & Bug Fixes
**Bug fixes**
- Fixed `Mysql2::Error Unknown column '''' in 'SET'` — MariaDB 10.11's prepared-statement parser mishandles string literals (`''`) inside `NULLIF`/`IF` expressions in `SET` clauses. Replaced `COALESCE(NULLIF(?, ''), col)` with `COALESCE(?, col)` passing `nil` when the value is empty (`lib/enrich.rb`).
- Fixed `private method 'da_tables' called` error in `lib/migrate.rb` — migration lambdas call `Migrate.da_tables` with an explicit receiver, which counts as a public call. Removed `da_tables` from `private_class_method` declaration.
- Fixed unmatched `end` / dangling `rescue` syntax error in `scrapers/launcestoncity.rb` introduced during a prior cleanup pass.
- Eliminated duplicate "Docs page had no usable links" warning (fired twice per DA) in `scrapers/launcestoncity.rb`.
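
The `private_class_method` failure can be reproduced in isolation. The module and method names below are made up for the demonstration; only the mechanism (a lambda calling a private class method through an explicit receiver) mirrors the entry above.

```ruby
module MigrateSketch
  def self.da_tables
    %w[da_example_one da_example_two]
  end

  def self.hidden_tables
    da_tables
  end
  private_class_method :hidden_tables

  # Both lambdas use an explicit receiver, as a migration body would.
  PUBLIC_CALL  = -> { MigrateSketch.da_tables }
  PRIVATE_CALL = -> { MigrateSketch.hidden_tables }
end

begin
  MigrateSketch::PRIVATE_CALL.call
rescue NoMethodError => e
  puts "private call failed: #{e.class}"
end
puts MigrateSketch::PUBLIC_CALL.call.inspect
```

Calling through the constant counts as an outside call even though the lambda is defined inside the module, so the method has to stay public.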
**Removed**
- Deleted `scrapers/enrich.rb` — stale copy of `lib/enrich.rb` with wrong `require_relative` paths, old broken `COALESCE(NULLIF(?, ''))` query, and no main batch loop. Was being picked up by `run_all.sh`'s `scrapers/*.rb` glob and failing every full run with a `LoadError`.
**Docs**
- Updated `CLAUDE.md`: corrected geocode-backfill command to use `tools/backfill_geocode.rb`, updated schema-change guidance to point to `lib/migrate.rb`.
- Updated `README.md`: removed stale `tools/enrich.rb` references, corrected enrichment/backfill examples, updated tools table.
---
## 2026-04-13 — Structure Updates (5f60868)
- General structural cleanup across scrapers.
---
## 2026-04-13 — Launceston City Scraper (3fc874c → bc3490f)
- Implemented `scrapers/launcestoncity.rb` for the Launceston eProperty portal (ASP.NET session-based site).
- Session cookie management (`merge_set_cookie!`) to maintain ASP.NET_SessionId across requests.
- Document listing via `docget.asp` with multi-variant URL probing (path-case and route-param variants).
- `probe_common_docs` fallback: constructs known PDF filenames from DA number when the document list page returns no links.
- PDF download to `DOWNLOAD_DIR/launceston//` when `DOWNLOAD_ATTACHMENTS=1`.
- Enriches each DA from the details page (applicant, received date, advertised date, legal description).
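
One plausible shape for the session cookie handling is sketched below. The signature of `merge_set_cookie!` here is an assumption (a Hash jar plus an Array of raw `Set-Cookie` values), and `cookie_header` is a hypothetical companion helper; the scraper's real implementation may differ.

```ruby
# Hypothetical cookie merge: jar is a Hash of name => value, and
# set_cookie_headers is an Array of raw Set-Cookie header values.
def merge_set_cookie!(jar, set_cookie_headers)
  Array(set_cookie_headers).each do |header|
    # Keep only "name=value"; attributes such as Path or HttpOnly are dropped.
    pair = header.to_s.split(";", 2).first.to_s
    name, value = pair.split("=", 2)
    next if name.nil? || value.nil? || name.strip.empty?
    jar[name.strip] = value.strip
  end
  jar
end

# Builds the Cookie request header from the jar.
def cookie_header(jar)
  jar.map { |k, v| "#{k}=#{v}" }.join("; ")
end
```

Merging into one jar on every response keeps `ASP.NET_SessionId` current if the server rotates it mid-session.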
---
## 2026-04-13 — Structured Logging (c03bfae)
- Added `lib/log.rb` — `Log.debug`, `Log.info`, `Log.warn`, `Log.error` with `LOG_LEVEL` env filtering.
- Replaced `puts`/`warn` calls across `lib/` with `Log.*` calls.
- Added `LOG_LEVEL` env var to `docker-compose.yml` (default: `info`).
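
A minimal sketch of level-filtered logging in the same spirit. `LogSketch` is a stand-in name, the `[LEVEL] [tag] message` output format is invented, and the `io:` keyword exists only to make the sketch testable; none of these are confirmed details of `lib/log.rb`.

```ruby
module LogSketch
  LEVELS = { "debug" => 0, "info" => 1, "warn" => 2, "error" => 3 }.freeze

  # Reads LOG_LEVEL on each call; unknown or missing values fall back to "info".
  def self.threshold
    LEVELS.fetch(ENV.fetch("LOG_LEVEL", "info").downcase, LEVELS["info"])
  end

  # Returns true when the message was written, false when it was filtered out.
  def self.log(level, tag, msg, io: $stderr)
    return false if LEVELS.fetch(level) < threshold
    io.puts "[#{level.upcase}] [#{tag}] #{msg}"
    true
  end

  # Defines LogSketch.debug / .info / .warn / .error.
  LEVELS.each_key do |level|
    define_singleton_method(level) { |tag, msg, io: $stderr| log(level, tag, msg, io: io) }
  end
end
```

With `LOG_LEVEL=warn`, a call such as `LogSketch.info "scraper", "..."` is dropped while `LogSketch.warn` still reaches stderr.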
---
## 2026-04-13 — Schema Migrations (0e4e035)
- Added `lib/migrate.rb` — lightweight sequential migration runner backed by a `schema_migrations` table.
- Migration v1: adds enrichment and geocode columns to all existing `da_*` tables.
- Migration v2: creates `geo_cache` table.
- `run_all.sh` now runs `ruby /app/lib/migrate.rb` before scrapers.
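
The core idea of a sequential migration runner can be sketched without a database. In this illustration the `applied` array stands in for the `schema_migrations` table, and `MigrationRunnerSketch` is a hypothetical class, not the contents of `lib/migrate.rb`.

```ruby
# Hypothetical runner: applies pending migrations in version order.
class MigrationRunnerSketch
  # migrations: { Integer version => callable }
  # applied:    versions already recorded (stand-in for schema_migrations)
  def initialize(migrations, applied = [])
    @migrations = migrations
    @applied = applied
  end

  # Runs every migration not yet applied and returns the versions it ran.
  def run!
    ran = []
    @migrations.keys.sort.each do |version|
      next if @applied.include?(version)
      @migrations[version].call
      @applied << version # recorded only after the step succeeds
      ran << version
    end
    ran
  end
end
```

Because a version is recorded only after its step succeeds, a failed migration is retried on the next run, and a second run over an up-to-date schema does nothing.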
---
## 2026-04-13 — SQL Injection Hardening (f3c06ab)
- Added `DB.validate_table_name!` — enforces `da_[a-z0-9_]+` pattern on every table name before interpolation into SQL.
- Applied `DB.client.escape()` on all remaining identifier interpolations.
- Applied `validate_table_name!` in `lib/geocode.rb` and `lib/enrich.rb`.
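
A sketch of the table-name check, assuming the pattern is anchored at both ends and that a failure raises `ArgumentError`. The constant name `TABLE_NAME_RE` and the error class are illustrative choices, not confirmed details of `lib/db.rb`.

```ruby
# Anchored with \A and \z so the whole string must match, not just one line of it.
TABLE_NAME_RE = /\Ada_[a-z0-9_]+\z/

# Hypothetical validator: returns the name unchanged when it is safe to
# interpolate into SQL, and raises otherwise.
def validate_table_name!(name)
  unless name.is_a?(String) && name.match?(TABLE_NAME_RE)
    raise ArgumentError, "invalid table name: #{name.inspect}"
  end
  name
end
```

Using `\A`/`\z` rather than `^`/`$` matters in Ruby, where `^` and `$` match at line boundaries and would accept a name with an embedded newline.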
---
## 2026-04-12 — Initial Upload (ab11792)
- 28 council scrapers covering all Tasmanian councils.
- `lib/db.rb` — DB client, `ensure_table!`, upsert with write-once semantics.
- `lib/http.rb` — HTTP client with retries, cookie jar, 403/406 warmup, curl fallback.
- `lib/geocode.rb` — Google Maps geocoding with SHA1 cache in `geo_cache`.
- `lib/enrich.rb` — `enrich_after_upsert!` for per-row geocoding and property lookup.
- `lib/util.rb` — `parse_aus_date`, council/table name mappings.
- `web/index.php` — PHP search portal with dynamic UNION across all `da_*` tables.
- `tools/backfill_geocode.rb` — batch geocode backfill.
- `tools/import_sqlites.rb` — import from legacy SQLite exports.
- Docker Compose stack: MariaDB 10.11, Ruby 3.2 scraper, PHP/Apache web, Adminer.
- `run_all.sh` — discovers and runs scrapers with `ONLY`/`SKIP` filtering.
- `entrypoint.sh` — Docker entry with optional loop via `SCRAPE_EVERY_MINUTES`.