@@ -0,0 +1,80 @@
+# Changelog
+
+All notable changes to the TAS Councils scraping pipeline are recorded here.
+Entries are grouped by push/session in reverse-chronological order.
+
+---
+
+## 2026-04-13 — Code Quality & Bug Fixes (current)
+
+**Bug fixes**
+- Fixed `Mysql2::Error Unknown column '''' in 'SET'`: MariaDB 10.11's prepared-statement parser mishandles empty-string literals (`''`) inside `NULLIF`/`IF` expressions in `SET` clauses. Replaced `COALESCE(NULLIF(?, ''), col)` with `COALESCE(?, col)`, passing `nil` when the value is empty (`lib/enrich.rb`).
+- Fixed `private method 'da_tables' called` error in `lib/migrate.rb`: migration lambdas call `Migrate.da_tables` with an explicit receiver, which Ruby rejects for private methods. Removed `da_tables` from the `private_class_method` declaration.
+- Fixed unmatched `end` / dangling `rescue` syntax error in `scrapers/launcestoncity.rb` introduced during a prior cleanup pass.
+- Eliminated duplicate "Docs page had no usable links" warning (fired twice per DA) in `scrapers/launcestoncity.rb`.
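A minimal sketch of the first fix above, assuming the empty-value check moved out of SQL and into Ruby. The helper name `blank_to_nil`, the `da_example` table, and the column names are hypothetical and not taken from `lib/enrich.rb`; nothing here talks to a database.

```ruby
# Hypothetical helper: map nil, empty, or whitespace-only values to nil so
# that COALESCE(?, col) falls through to the existing column value.
def blank_to_nil(value)
  (value.nil? || value.to_s.strip.empty?) ? nil : value
end

# Illustrative statement only; table and column names are assumptions.
def update_sql
  <<~SQL
    UPDATE da_example
       SET applicant = COALESCE(?, applicant),
           legal_description = COALESCE(?, legal_description)
     WHERE id = ?
  SQL
end

# Build the bind parameters for one row without touching the database.
def update_params(row)
  [blank_to_nil(row[:applicant]), blank_to_nil(row[:legal_description]), row[:id]]
end

params = update_params(applicant: "  ", legal_description: "Lot 1", id: 42)
p params # => [nil, "Lot 1", 42]
```

With the mysql2 gem, parameters like these would typically be passed as `client.prepare(update_sql).execute(*params)`. A `nil` bind value reaches MariaDB as SQL `NULL`, so `COALESCE` keeps the existing column value and no string literal is needed in the statement.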
+
+**Removed**
+- Deleted `scrapers/enrich.rb`, a stale copy of `lib/enrich.rb` with wrong `require_relative` paths, the old broken `COALESCE(NULLIF(?, ''))` query, and no main batch loop. It was being picked up by the `scrapers/*.rb` glob in `run_all.sh` and failing every full run with a `LoadError`.
+
+**Docs**
+- Updated `CLAUDE.md`: corrected geocode-backfill command to use `tools/backfill_geocode.rb`, updated schema-change guidance to point to `lib/migrate.rb`.
+- Updated `README.md`: removed stale `tools/enrich.rb` references, corrected enrichment/backfill examples, updated tools table.
+
+---
+
+## 2026-04-13 — Structure Updates (5f60868)
+
+- General structural cleanup across scrapers.
+
+---
+
+## 2026-04-13 — Launceston City Scraper (3fc874c → bc3490f)
+
+- Implemented `scrapers/launcestoncity.rb` for the Launceston eProperty portal (ASP.NET session-based site).
+- Session cookie management (`merge_set_cookie!`) to maintain `ASP.NET_SessionId` across requests.
+- Document listing via `docget.asp` with multi-variant URL probing (path-case and route-param variants).
+- `probe_common_docs` fallback: constructs known PDF filenames from the DA number when the document list page returns no links.
+- PDF download to `DOWNLOAD_DIR/launceston/<da_ref>/` when `DOWNLOAD_ATTACHMENTS=1`.
+- Enriches each DA from the details page (applicant, received date, advertised date, legal description).
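The session handling listed above could look roughly like the following. Only the name `merge_set_cookie!` comes from the changelog; its signature, the plain-Hash cookie jar, and the `cookie_header` helper are assumptions made for illustration, and the scraper's actual code may differ.

```ruby
# Merge Set-Cookie header values into a simple name => value jar.
# Only the leading name=value pair is kept; attributes such as Path or
# HttpOnly are ignored in this sketch.
def merge_set_cookie!(jar, set_cookie_headers)
  Array(set_cookie_headers).each do |header|
    pair = header.to_s.split(";", 2).first.to_s.strip
    name, value = pair.split("=", 2)
    next if name.nil? || name.empty? || value.nil?
    jar[name] = value
  end
  jar
end

# Hypothetical helper: serialise the jar into a Cookie request header value.
def cookie_header(jar)
  jar.map { |name, value| "#{name}=#{value}" }.join("; ")
end

jar = {}
merge_set_cookie!(jar, ["ASP.NET_SessionId=abc123; path=/; HttpOnly"])
merge_set_cookie!(jar, ["ASP.NET_SessionId=def456; path=/", "theme=dark"])
puts cookie_header(jar) # prints ASP.NET_SessionId=def456; theme=dark
```

A later `Set-Cookie` for the same name overwrites the earlier value, which is what keeps the session id current when the server rotates it.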
+
+---
+
+## 2026-04-13 — Structured Logging (c03bfae)
+
+- Added `lib/log.rb` — `Log.debug`, `Log.info`, `Log.warn`, `Log.error` with `LOG_LEVEL` env filtering.
+- Replaced `puts`/`warn` calls across `lib/` with `Log.*` calls.
+- Added `LOG_LEVEL` env var to `docker-compose.yml` (default: `info`).
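As a rough illustration of `LOG_LEVEL` filtering, a level-filtered logger might be sketched as below. The method names and the `info` default follow the entries above; the message format, the `threshold` helper, and the `io:` keyword are assumptions, not the contents of `lib/log.rb`.

```ruby
# Minimal level-filtered logger sketch.
module Log
  LEVELS = { "debug" => 0, "info" => 1, "warn" => 2, "error" => 3 }.freeze

  # Resolve the numeric threshold from LOG_LEVEL, defaulting to info when
  # the variable is unset or holds an unknown value.
  def self.threshold
    LEVELS.fetch(ENV.fetch("LOG_LEVEL", "info").downcase, LEVELS["info"])
  end

  # Define Log.debug, Log.info, Log.warn and Log.error. Each returns true
  # when the message was written and false when it was filtered out.
  LEVELS.each do |name, severity|
    define_singleton_method(name) do |message, io: $stderr|
      if severity >= threshold
        io.puts("[#{name.upcase}] #{message}")
        true
      else
        false
      end
    end
  end
end

ENV["LOG_LEVEL"] = "warn"
Log.info("fetched page 1")  # filtered out at warn level
Log.error("request failed") # written to $stderr
```

Reading the environment variable on every call keeps the sketch simple; caching the threshold at load time would be a reasonable alternative.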
+
+---
+
+## 2026-04-13 — Schema Migrations (0e4e035)
+
+- Added `lib/migrate.rb` — lightweight sequential migration runner backed by a `schema_migrations` table.
+- Migration v1: adds enrichment and geocode columns to all existing `da_*` tables.
+- Migration v2: creates `geo_cache` table.
+- `run_all.sh` now runs `ruby /app/lib/migrate.rb` before scrapers.
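The idea of a sequential migration runner can be sketched in memory as follows. This is not the code in `lib/migrate.rb`: the applied versions live in a `Set` rather than in the `schema_migrations` table, the lambdas return descriptions instead of issuing SQL, and the `run!` method and the table names are hypothetical.

```ruby
require "set"

# In-memory sketch of a sequential migration runner.
module Migrate
  # Public on purpose: the migration lambdas call Migrate.da_tables with an
  # explicit receiver, which Ruby does not allow for private methods.
  def self.da_tables(all_tables)
    all_tables.select { |name| name.start_with?("da_") }
  end

  # Hypothetical migrations keyed by version number.
  MIGRATIONS = {
    1 => ->(tables) { Migrate.da_tables(tables).map { |t| "alter #{t}" } },
    2 => ->(_tables) { ["create geo_cache"] }
  }.freeze

  # Apply pending migrations in ascending version order and record each
  # version so that a second run does nothing.
  def self.run!(applied, tables)
    log = []
    MIGRATIONS.keys.sort.each do |version|
      next if applied.include?(version)
      log.concat(MIGRATIONS.fetch(version).call(tables))
      applied << version
    end
    log
  end
end

applied = Set.new
tables = ["da_hobart", "unrelated", "da_launceston"]
p Migrate.run!(applied, tables) # => ["alter da_hobart", "alter da_launceston", "create geo_cache"]
p Migrate.run!(applied, tables) # => []
```

Recording each version as soon as it succeeds is what makes the runner safe to invoke before every scraper run.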
+
+---
+
+## 2026-04-13 — SQL Injection Hardening (f3c06ab)
+
+- Added `DB.validate_table_name!` — enforces `da_[a-z0-9_]+` pattern on every table name before interpolation into SQL.
+- Applied `DB.client.escape()` on all remaining identifier interpolations.
+- Applied `validate_table_name!` in `lib/geocode.rb` and `lib/enrich.rb`.
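A table-name check of this kind might be sketched as below. The pattern `da_[a-z0-9_]+` is from the entry above; anchoring it with `\A` and `\z`, raising `ArgumentError`, and the example table name are assumptions about how `DB.validate_table_name!` could behave.

```ruby
module DB
  # Anchored form of the da_[a-z0-9_]+ pattern. \A and \z match the whole
  # string, so a trailing newline or appended SQL is rejected.
  TABLE_NAME = /\Ada_[a-z0-9_]+\z/

  # Return the name unchanged when it is safe to interpolate, raise otherwise.
  def self.validate_table_name!(name)
    unless name.is_a?(String) && name.match?(TABLE_NAME)
      raise ArgumentError, "invalid table name: #{name.inspect}"
    end
    name
  end
end

table = DB.validate_table_name!("da_hobart")
sql = "SELECT COUNT(*) FROM `#{table}`"
puts sql # prints SELECT COUNT(*) FROM `da_hobart`
```

Identifiers cannot be bound as prepared-statement parameters, so an allow-list check before interpolation is the usual way to keep table names safe.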
+
+---
+
+## 2026-04-12 — Initial Upload (ab11792)
+
+- 28 council scrapers covering all Tasmanian councils.
+- `lib/db.rb` — DB client, `ensure_table!`, upsert with write-once semantics.
+- `lib/http.rb` — HTTP client with retries, cookie jar, 403/406 warmup, curl fallback.
+- `lib/geocode.rb` — Google Maps geocoding with SHA1 cache in `geo_cache`.
+- `lib/enrich.rb` — `enrich_after_upsert!` for per-row geocoding and property lookup.
+- `lib/util.rb` — `parse_aus_date`, council/table name mappings.
+- `web/index.php` — PHP search portal with dynamic UNION across all `da_*` tables.
+- `tools/backfill_geocode.rb` — batch geocode backfill.
+- `tools/import_sqlites.rb` — import from legacy SQLite exports.
+- Docker Compose stack: MariaDB 10.11, Ruby 3.2 scraper, PHP/Apache web, Adminer.
+- `run_all.sh` — discovers and runs scrapers with `ONLY`/`SKIP` filtering.
+- `entrypoint.sh` — Docker entry with optional loop via `SCRAPE_EVERY_MINUTES`.
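For readers unfamiliar with day-first dates, a helper like `parse_aus_date` could be as small as the sketch below. Only the name comes from the list above; the `dd/mm/yyyy` input format and the nil-on-failure behaviour are assumptions, and the real helper in `lib/util.rb` may accept more formats.

```ruby
require "date"

# Parse a day-first date string such as "13/04/2026".
# Returns a Date, or nil when the text is not a valid dd/mm/yyyy date.
def parse_aus_date(text)
  Date.strptime(text.to_s.strip, "%d/%m/%Y")
rescue ArgumentError
  # Date::Error is a subclass of ArgumentError, so invalid dates land here.
  nil
end

p parse_aus_date("13/04/2026") == Date.new(2026, 4, 13) # => true
p parse_aus_date("not a date")                          # => nil
```

Using an explicit format avoids the ambiguity of guessing between day-first and month-first input.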