All notable changes to the TAS Councils scraping pipeline are recorded here. Entries are grouped by push/session in reverse-chronological order.
Logging
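Log.warn takes a tag and a message, and its stderr output is filtered by LOG_LEVEL. The block below is a minimal sketch of a logger with that shape. It assumes that LOG_LEVEL holds one of debug, info, warn or error (default info), that lines go to $stderr with a level and tag prefix, and that the io: keyword is a hypothetical hook added for testing. It is not the contents of lib/log.rb.

```ruby
# Hypothetical sketch of a levelled logger in the spirit of lib/log.rb.
# Assumptions: LOG_LEVEL is debug|info|warn|error (default "info"),
# output goes to $stderr, and each line is "[LEVEL] tag: message".
module Log
  LEVELS = { "debug" => 0, "info" => 1, "warn" => 2, "error" => 3 }.freeze

  # Read the threshold on every call so a change to ENV["LOG_LEVEL"]
  # takes effect without reloading the module.
  def self.threshold
    LEVELS.fetch(ENV.fetch("LOG_LEVEL", "info").downcase, LEVELS["info"])
  end

  def self.enabled?(level)
    LEVELS.fetch(level) >= threshold
  end

  # Defines Log.debug, Log.info, Log.warn and Log.error.
  # Each returns true if the line was written, false if it was filtered.
  LEVELS.each_key do |level|
    define_singleton_method(level) do |tag, msg, io: $stderr|
      return false unless enabled?(level)

      io.puts("[#{level.upcase}] #{tag}: #{msg}")
      true
    end
  end
end
```

A call such as Log.warn "scraper", "timed out fetching list page" then prints only when LOG_LEVEL is warn or lower.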
- warn "..." calls across scrapers/*.rb replaced with Log.warn "scraper", "...": structured logging is now consistent throughout, and stderr output is filtered by LOG_LEVEL.

DB.upsert dynamic rewrite (lib/db.rb)
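A dynamic upsert builds its column list from row.keys and must validate every identifier before interpolating it into SQL. The following is a minimal sketch of that technique. SAFE_COLUMN_RE, the da_ table pattern and the UPSERT_ON_DUP list come from this changelog. The method name build_upsert, the ? placeholders, the use of INSERT IGNORE when no refreshable column is present, and the rule that every column outside UPSERT_ON_DUP is write-once are assumptions, not lib/db.rb's actual code.

```ruby
# Hypothetical sketch of a dynamic upsert builder (not lib/db.rb itself).
SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/
TABLE_RE = /\Ada_[a-z0-9_]+\z/ # same pattern as validate_table_name
# Assumed: the only columns overwritten when the row already exists.
UPSERT_ON_DUP = %w[date_received date_received_raw document_url local_document_url].freeze

# Returns [sql, params] suitable for a prepared statement.
def build_upsert(table, row)
  raise ArgumentError, "unsafe table name: #{table.inspect}" unless TABLE_RE.match?(table)
  raise ArgumentError, "empty row" if row.empty?

  cols = row.keys.map(&:to_s)
  cols.each do |c|
    # Reject instead of silently dropping, so bad keys surface early.
    raise ArgumentError, "unsafe column name: #{c.inspect}" unless SAFE_COLUMN_RE.match?(c)
  end

  quoted = cols.map { |c| "`#{c}`" }.join(", ")
  placeholders = (["?"] * cols.size).join(", ")
  # Refresh only whitelisted columns that this row supplies; all other
  # columns keep their first-written value (write-once, assumed).
  updates = (UPSERT_ON_DUP & cols).map { |c| "`#{c}` = VALUES(`#{c}`)" }

  sql =
    if updates.empty?
      # ON DUPLICATE KEY UPDATE needs at least one assignment.
      "INSERT IGNORE INTO `#{table}` (#{quoted}) VALUES (#{placeholders})"
    else
      "INSERT INTO `#{table}` (#{quoted}) VALUES (#{placeholders}) " \
        "ON DUPLICATE KEY UPDATE #{updates.join(', ')}"
    end
  [sql, row.values]
end
```

With the mysql2 gem, the pair could be run as client.prepare(sql).execute(*params); Mysql2::Client#prepare and Mysql2::Statement#execute exist, but how DB wraps the client is not shown here.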
- upsert now derives its column list from row.keys, so scraper-specific columns (e.g. advertised_date, legal_description) are no longer silently ignored.
- SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/: each key is validated before interpolation into SQL; an unsafe name raises ArgumentError instead of passing through silently.
- UPSERT_ON_DUP constant (date_received, date_received_raw, document_url, local_document_url): the on-duplicate column list now lives in one constant, which is easier to audit and extend.
- Mysql2::Error (caught by scraper rescue) instead of silently being dropped, surfacing schema mismatches early.

Security
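The curl fallback fix relies on the array form of Open3.capture2: when the command and its arguments are passed as separate strings, Ruby spawns the program directly and no shell ever sees the URL. The sketch below shows that pattern. The helper names curl_argv and curl_fetch and the chosen curl flags are illustrative assumptions, not the code in lib/http.rb.

```ruby
require "open3"

# Hypothetical helpers illustrating a shell-free curl call.
# Unsafe equivalent, for contrast: `curl -s '#{url}'` lets a URL
# containing a quote break out and run arbitrary shell commands.
def curl_argv(url, referer: nil, timeout: 30)
  argv = ["curl", "--silent", "--show-error", "--location", "--max-time", timeout.to_s]
  argv.push("--referer", referer) if referer
  argv.push(url) # passed verbatim as one argv element
end

def curl_fetch(url, referer: nil, timeout: 30)
  # Several arguments => exec without /bin/sh, so ; | $( ) are inert.
  body, status = Open3.capture2(*curl_argv(url, referer: referer, timeout: timeout))
  raise IOError, "curl exited with status #{status.exitstatus}" unless status.success?

  body
end
```

The protection depends on passing more than one argument: a single command string that contains shell metacharacters is still handed to the shell by Process.spawn and therefore by Open3.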
- lib/http.rb curl fallback: replaced the shell-interpolated backtick call with the array form of Open3.capture2, eliminating the shell-injection risk from URL-derived ref/uri values.
- web/index.php: added a validate_table_name() helper (enforces /\Ada_[a-z0-9_]+\z/), applied before every backtick-quoted table-name interpolation (tableHasColumn, the $stageT stages fetch, the UNION SELECT builder).

Schema consolidation
- Geocode.ensure_da_columns! from lib/geocode.rb: redundant, since new tables are covered by DB.ensure_table! and existing tables by migration v1. Removed its call from tools/backfill_geocode.rb.
- ensure_extra_columns! from lib/enrich.rb and all 10 scraper call-sites: same reasoning; it also used the wrong column types (DOUBLE/VARCHAR(50)) instead of the canonical schema's (DECIMAL(10,7)/TEXT).

Error handling
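In Ruby, a rescue clause with no class list defaults to StandardError, while SystemExit and SignalException (including Interrupt) inherit directly from Exception. The sketch below puts the bare and explicit forms side by side, plus the overly broad rescue Exception form for contrast. The method names are hypothetical.

```ruby
# Bare rescue: implicitly `rescue StandardError`.
def with_bare_rescue
  yield
  :ok
rescue => e
  [:rescued, e.class]
end

# Explicit form used after this change: same set of exceptions.
def with_explicit_rescue
  yield
  :ok
rescue StandardError => e
  [:rescued, e.class]
end

# Too broad, shown only for contrast: this one also swallows
# exit (SystemExit) and Ctrl-C (Interrupt).
def with_exception_rescue
  yield
  :ok
rescue Exception => e
  [:rescued, e.class]
end
```

Both of the first two methods catch a RuntimeError and let a call to exit propagate. Only the third one intercepts SystemExit.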
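- rescue => e replaced with rescue StandardError => e across all scrapers, lib, and tools, making the rescued class explicit. A bare rescue already defaults to StandardError, so SystemExit and SignalException were not being swallowed before; the change improves readability rather than behaviour.
- lib/enrich.rb: two warn calls replaced with Log.warn for structured logging; stale file-header comment removed.

Removed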
- scrapers/enrich.rb: stale duplicate with wrong require_relative paths, the old broken COALESCE(NULLIF(?, '')) query, and no main batch loop. It was picked up by run_all.sh's glob and failed every full run with a LoadError.

Docs
- CLAUDE.md: corrected the scraper pattern (removed the ensure_extra_columns!(TABLE) step), updated the geocode-backfill command, corrected schema-change guidance.
- README.md: removed stale tools/enrich.rb references; corrected the enrichment/backfill examples and the tools table; added a link to VERSIONS.md.
- VERSIONS.md: created, a changelog covering all changes since the initial upload.

Bug fixes
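private_class_method makes a module's singleton method private, and Ruby then rejects any call that names an explicit receiver other than a literal self (the self. exemption exists since Ruby 2.7). The minimal reproduction below mirrors the da_tables error. The module names, the STEP lambdas, and the table list are placeholders, not lib/migrate.rb itself.

```ruby
# Reproduction of the "private method 'da_tables' called" error.
module MigrateBroken
  def self.da_tables
    %w[da_example] # placeholder table list
  end
  private_class_method :da_tables

  # MigrateBroken.da_tables names an explicit receiver, so calling
  # this lambda raises NoMethodError.
  STEP = -> { MigrateBroken.da_tables }
end

# The fix: leave da_tables public so migration lambdas can call it.
module MigrateFixed
  def self.da_tables
    %w[da_example]
  end

  STEP = -> { MigrateFixed.da_tables }
end
```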
- Mysql2::Error Unknown column '''' in 'SET': MariaDB 10.11's prepared-statement parser mishandles string literals ('') inside NULLIF/IF expressions in SET clauses. Replaced COALESCE(NULLIF(?, ''), col) with COALESCE(?, col), passing nil when the value is empty (lib/enrich.rb).
- private method 'da_tables' called error in lib/migrate.rb: the migration lambdas call Migrate.da_tables with an explicit receiver, and Ruby only allows a private method to be called without one (or with a literal self). Removed da_tables from the private_class_method declaration.
- end / dangling rescue syntax error in scrapers/launcestoncity.rb introduced during a prior cleanup pass.
- scrapers/launcestoncity.rb.

Removed
- scrapers/enrich.rb: stale copy of lib/enrich.rb with wrong require_relative paths, the old broken COALESCE(NULLIF(?, '')) query, and no main batch loop. It was being picked up by run_all.sh's scrapers/*.rb glob and failed every full run with a LoadError.

Docs
- CLAUDE.md: corrected the geocode-backfill command to use tools/backfill_geocode.rb; updated schema-change guidance to point to lib/migrate.rb.
- README.md: removed stale tools/enrich.rb references, corrected enrichment/backfill examples, updated the tools table.

5f60868)

3fc874c → bc3490f)
- scrapers/launcestoncity.rb for the Launceston eProperty portal (ASP.NET session-based site).
- merge_set_cookie!) to maintain ASP.NET_SessionId across requests.
- docget.asp with multi-variant URL probing (path-case and route-param variants).
- probe_common_docs fallback: constructs known PDF filenames from the DA number when the document list page returns no links.
- DOWNLOAD_DIR/launceston/<da_ref>/ when DOWNLOAD_ATTACHMENTS=1.

c03bfae)
- lib/log.rb: Log.debug, Log.info, Log.warn, Log.error with LOG_LEVEL env filtering.
- puts/warn calls across lib/ with Log.* calls.
- LOG_LEVEL env var to docker-compose.yml (default: info).

0e4e035)
- lib/migrate.rb: lightweight sequential migration runner backed by a schema_migrations table.
- da_* tables.
- geo_cache table.
- run_all.sh now runs ruby /app/lib/migrate.rb before the scrapers.

f3c06ab)
- DB.validate_table_name!: enforces the da_[a-z0-9_]+ pattern on every table name before interpolation into SQL.
- DB.client.escape() on all remaining identifier interpolations.
- validate_table_name! in lib/geocode.rb and lib/enrich.rb.

ab11792)
- lib/db.rb: DB client, ensure_table!, upsert with write-once semantics.
- lib/http.rb: HTTP client with retries, cookie jar, 403/406 warmup, curl fallback.
- lib/geocode.rb: Google Maps geocoding with SHA1 cache in geo_cache.
- lib/enrich.rb: enrich_after_upsert! for per-row geocoding and property lookup.
- lib/util.rb: parse_aus_date, council/table name mappings.
- web/index.php: PHP search portal with dynamic UNION across all da_* tables.
- tools/backfill_geocode.rb: batch geocode backfill.
- tools/import_sqlites.rb: import from legacy SQLite exports.
- run_all.sh: discovers and runs scrapers with ONLY/SKIP filtering.
- entrypoint.sh: Docker entry with optional loop via SCRAPE_EVERY_MINUTES.
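The 0e4e035 entry describes lib/migrate.rb as a lightweight sequential migration runner backed by a schema_migrations table. The sketch below illustrates that pattern under stated assumptions. Migrations are versioned blocks applied in ascending order, a version is recorded only after its block succeeds, and an in-memory MemoryStore stands in for the schema_migrations table. MigrationRunner, MemoryStore and define are hypothetical names, not the runner's real API.

```ruby
require "set"

# Hypothetical sequential migration runner (not lib/migrate.rb itself).
class MigrationRunner
  Migration = Struct.new(:version, :description, :up)

  # store must respond to #applied_versions and #record(version);
  # in production that would be the schema_migrations table.
  def initialize(store)
    @store = store
    @migrations = []
  end

  def define(version, description, &up)
    raise ArgumentError, "duplicate version #{version}" if @migrations.any? { |m| m.version == version }

    @migrations << Migration.new(version, description, up)
    self
  end

  # Runs each pending migration once, lowest version first. A version is
  # recorded only after its block returns, so a failure is retried next run.
  def run!
    done = @store.applied_versions
    ran = []
    @migrations.sort_by(&:version).each do |m|
      next if done.include?(m.version)

      m.up.call
      @store.record(m.version)
      ran << m.version
    end
    ran
  end
end

# In-memory stand-in for schema_migrations, for illustration only.
class MemoryStore
  def initialize
    @versions = Set.new
  end

  def applied_versions
    @versions.dup
  end

  def record(version)
    @versions << version
  end
end
```

Running the runner twice applies nothing the second time, which makes it safe to call before every scraper run from run_all.sh.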