All notable changes to the TAS Councils scraping pipeline are recorded here. Entries are grouped by push/session in reverse-chronological order.
lib/http.rb: Full browser fingerprint headers
- Added Upgrade-Insecure-Requests: 1, Sec-Fetch-Dest, Sec-Fetch-Mode, Sec-Fetch-Site, Sec-Fetch-User, sec-ch-ua, sec-ch-ua-mobile, and sec-ch-ua-platform to BASE_HEADERS. These are sent automatically by all scrapers using Http.get/Http.request.

scrapers/burnie.rb: Two bug fixes
- next inside a Net::HTTP.start block only exits the block, not the enclosing while loop. Fixed by setting a redirect_to variable inside the block and calling next on the outer loop.
- URI::InvalidURIError on PDF URLs containing non-ASCII characters (e.g. an en-dash – in the filename): non-ASCII characters in the href are now percent-encoded before URI.join.

scrapers/kingisland.rb: Complete rewrite
- Parses div#accordion-1-c4 for DA notices.
- Extracts the reference (DA YYYY/NN), address, description, on-notice date, and PDF link from structured paragraph text.

scrapers/latrobe.rb: Complete rewrite
- Fetches https://www.latrobe.tas.gov.au/services/building-and-planning-services/planningapp directly.
- Parses li.generic-list__item h3.generic-list__title a; link text format: L-DA007/2026 ADDRESS - DESCRIPTION (submissions by DATE).

scrapers/derwentvalley.rb: Complete rewrite
- Fetches /home/latest-news?...=Public+Notice; for each news-listing__item link, extracts the index_url parameter from the lgasa href, GETs lgasa-web.squiz.cloud/?a=ID without following redirects, and reads the Location header to get the real DV detail page URL.
- Parses the APP No / SITE / PROPOSAL table.

scrapers/georgetown.rb: Fixed field name matching
- "Location" was not matched by /(Address|Property)/i, so address was always empty and every row was skipped.
- "Opening Date" was not matched by the date-received regex.
- Added Location and Opening Date to the respective patterns.
- Added applicant ("Applicant Name"), title_reference ("Title reference"), and on_notice_to ("Closing Date") to the upsert.

scrapers/kingisland.rb (original stub) → replaced with full implementation (see above)
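
The non-ASCII URL fix listed for scrapers/burnie.rb above can be sketched as follows. This is a minimal illustration rather than the scraper's actual code: the helper name encode_non_ascii, the base URL, and the sample href are all hypothetical.

```ruby
require "uri"

# Hypothetical helper (not taken from scrapers/burnie.rb): percent-encode each
# non-ASCII character as its UTF-8 bytes, leaving ASCII characters untouched.
def encode_non_ascii(href)
  href.gsub(/[^[:ascii:]]/) do |ch|
    ch.bytes.map { |b| format("%%%02X", b) }.join
  end
end

base = "https://example.invalid/planning/"  # placeholder base URL
href = "docs/DA-2026\u201307-plans.pdf"     # filename containing an en-dash

# Encode first, then join; the changelog entry reports that joining the raw
# href raised URI::InvalidURIError.
safe_href = encode_non_ascii(href)
pdf_url   = URI.join(base, safe_href).to_s
```

Encoding only the non-ASCII characters leaves any percent-escapes already present in the href untouched, which a blanket escape of the whole string would double-encode.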
Docs
- CLAUDE.md: added WAF/Cloudflare handling section, warmup pattern guidance, template scraper recommendations, new common gotchas (non-ASCII PDF URLs, redirect-in-block bug)
- README.md: added WAF and Cloudflare Handling section; updated project structure tree to include scraper_helpers.rb and migrate.rb
- VERSIONS.md: this entry

scrapers/planbuild.rb rewritten to fix crash on first item:
- Added require "zlib", require "stringio", and require_relative "../lib/log".
- fetch_detail now always returns a Hash (parsed.is_a?(Hash) ? parsed : {}); the bare rescue returning {} was replaced with rescue JSON::ParserError, Zlib::Error.
- puts calls replaced with Log.debug/Log.info.
- local_document_url now passes nil (not "") when there are no downloads, which prevents COALESCE from overwriting an existing URL with an empty string.

scrapers/southernmidlands.rb detail page parser rewritten:
- Detail pages use a Location: / Proposal: paragraph format, not table rows; the old table tr th/td selector found nothing, causing 0 saves.
- The parser now splits <br>-separated lines per paragraph, extracts Location/Proposal fields, and handles multiple DAs per item page.
- Removed the redundant ALTER TABLE block (columns already in DB.ensure_table!).
- Added require_relative "../lib/http", "../lib/db", "../lib/util".

Missing require_relative "../lib/log" (20 scrapers fixed):
- break_oday, brighton, burnie, centralcoast, circularhead, clarence, derwentvalley, devonportcity, dorset, flinders_council, glenorchy, huonvalley, kentish, launcestoncity, meandervalley, northernmidlands, southernmidlands, waratah_wynyard, westcoast, westtamar
- Log.warn is called in rescue blocks in all of these. Without the require, the first error would raise NameError: uninitialized constant Log instead of logging.

enrich_after_upsert! variable scope bugs (4 scrapers fixed):
- flinders_council.rb: council_reference (undefined) → ref; folded separate UPDATE document_url into DB.upsert; removed redundant ALTER TABLE
- huonvalley.rb: council_reference/address (undefined) → r[:council_reference]/r[:address]; folded UPDATE document_url into upsert; removed redundant ALTER TABLE
- kentish.rb: council_reference/address (undefined) → r[:council_reference]/r[:address]; folded extras UPDATE into upsert
- westcoast.rb: address (undefined) → item[:address]; fixed upsert field names (on_notice → on_notice_to, on_notice_raw → on_notice_to_raw); fixed values referencing non-existent item keys; folded extras UPDATE into upsert

Redundant ALTER TABLE blocks removed from circularhead.rb and waratah_wynyard.rb; all columns are already created by DB.ensure_table!
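
The "always returns a Hash" change noted for fetch_detail in scrapers/planbuild.rb can be illustrated with a small sketch. It assumes the response body is JSON that may or may not be gzip-compressed; parse_detail and GZIP_MAGIC are hypothetical names, and the real method presumably also performs the HTTP request.

```ruby
require "json"
require "zlib"
require "stringio"

GZIP_MAGIC = "\x1F\x8B".b # first two bytes of a gzip stream

# Hypothetical stand-in for the parsing half of fetch_detail: takes a raw
# response body and always returns a Hash, never nil, an Array, or a String.
def parse_detail(raw)
  body = if raw.b.start_with?(GZIP_MAGIC)
           Zlib::GzipReader.new(StringIO.new(raw)).read
         else
           raw
         end
  parsed = JSON.parse(body)
  parsed.is_a?(Hash) ? parsed : {}
rescue JSON::ParserError, Zlib::Error
  # Only the two expected failure modes are swallowed; anything else propagates.
  {}
end

detail = parse_detail('{"id": 7}')
list   = parse_detail("[1, 2]")
junk   = parse_detail("not json")
```

Because the return type is fixed, callers can index into the result without first checking what came back, while unexpected exception classes still surface instead of being hidden by a catch-all rescue.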
Logging
- warn "..." calls across scrapers/*.rb replaced with Log.warn "scraper", "...". Structured logging is now consistent throughout, and stderr output is now filtered by LOG_LEVEL.

DB.upsert dynamic rewrite (lib/db.rb)
- upsert now derives columns from row.keys, so scraper-specific columns (e.g. advertised_date, legal_description) are no longer silently ignored.
- SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/ validates each key before interpolation into SQL; unsafe names raise ArgumentError rather than silently passing.
- UPSERT_ON_DUP constant (date_received, date_received_raw, document_url, local_document_url): easier to audit and extend.
- Columns missing from the table now raise Mysql2::Error (caught by scraper rescue) instead of being silently dropped, surfacing schema mismatches early.

Security
- lib/http.rb curl fallback: replaced shell-interpolated backtick call with Open3.capture2 array form, which eliminates shell injection risk from URL-derived ref/uri values.
- web/index.php: added validate_table_name() helper (enforces /\Ada_[a-z0-9_]+\z/) applied before every backtick-quoted table name interpolation (tableHasColumn, $stageT stages fetch, UNION SELECT builder).

Schema consolidation
- Removed Geocode.ensure_da_columns! from lib/geocode.rb: redundant, covered by DB.ensure_table! (new tables) and migration v1 (existing tables). Removed its call from tools/backfill_geocode.rb.
- Removed ensure_extra_columns! from lib/enrich.rb and all 10 scraper call-sites for the same reason; it was also using the wrong column types (DOUBLE/VARCHAR(50)) instead of the canonical schema (DECIMAL(10,7)/TEXT).

Error handling
- rescue => e replaced with rescue StandardError => e across all scrapers, lib, and tools. A bare rescue => e already defaults to StandardError, so this makes the rescued class explicit; neither form catches SystemExit/SignalException (only rescue Exception would).
- lib/enrich.rb: two warn calls replaced with Log.warn for structured logging; stale file header comment removed.

Removed
- scrapers/enrich.rb: stale duplicate with wrong require_relative paths, old broken COALESCE(NULLIF(?, '')) query, no main batch loop. It was picked up by run_all.sh's glob and failed every full run with LoadError.

Docs
- CLAUDE.md: corrected scraper pattern (removed ensure_extra_columns!(TABLE) step), updated geocode-backfill command, corrected schema-change guidance.
- README.md: removed stale tools/enrich.rb references; corrected enrichment/backfill examples and tools table; added link to VERSIONS.md.
- VERSIONS.md: created; changelog covering all changes from initial upload.

Bug fixes
- Mysql2::Error Unknown column '''' in 'SET': MariaDB 10.11's prepared-statement parser mishandles string literals ('') inside NULLIF/IF expressions in SET clauses. Replaced COALESCE(NULLIF(?, ''), col) with COALESCE(?, col), passing nil when the value is empty (lib/enrich.rb).
- private method 'da_tables' called error in lib/migrate.rb: migration lambdas call Migrate.da_tables with an explicit receiver, which counts as a public call. Removed da_tables from the private_class_method declaration.
- end / dangling rescue syntax error in scrapers/launcestoncity.rb introduced during a prior cleanup pass.
- scrapers/launcestoncity.rb.

Removed
- scrapers/enrich.rb: stale copy of lib/enrich.rb with wrong require_relative paths, old broken COALESCE(NULLIF(?, '')) query, and no main batch loop. It was being picked up by run_all.sh's scrapers/*.rb glob and failing every full run with a LoadError.

Docs
- CLAUDE.md: corrected geocode-backfill command to use tools/backfill_geocode.rb, updated schema-change guidance to point to lib/migrate.rb.
- README.md: removed stale tools/enrich.rb references, corrected enrichment/backfill examples, updated tools table.

(5f60868)

(3fc874c → bc3490f)
- Added scrapers/launcestoncity.rb for the Launceston eProperty portal (ASP.NET session-based site).
- Cookie handling (merge_set_cookie!) to maintain ASP.NET_SessionId across requests.
- docget.asp with multi-variant URL probing (path-case and route-param variants).
- probe_common_docs fallback: constructs known PDF filenames from DA number when the document list page returns no links.
- Attachments saved to DOWNLOAD_DIR/launceston/<da_ref>/ when DOWNLOAD_ATTACHMENTS=1.

(c03bfae)
- Added lib/log.rb: Log.debug, Log.info, Log.warn, Log.error with LOG_LEVEL env filtering.
- Replaced puts/warn calls across lib/ with Log.* calls.
- Added LOG_LEVEL env var to docker-compose.yml (default: info).

(0e4e035)
- Added lib/migrate.rb: lightweight sequential migration runner backed by a schema_migrations table.
- da_* tables.
- geo_cache table.
- run_all.sh now runs ruby /app/lib/migrate.rb before scrapers.

(f3c06ab)
- DB.validate_table_name!: enforces da_[a-z0-9_]+ pattern on every table name before interpolation into SQL.
- DB.client.escape() on all remaining identifier interpolations.
- validate_table_name! in lib/geocode.rb and lib/enrich.rb.

(ab11792)
- lib/db.rb: DB client, ensure_table!, upsert with write-once semantics.
- lib/http.rb: HTTP client with retries, cookie jar, 403/406 warmup, curl fallback.
- lib/geocode.rb: Google Maps geocoding with SHA1 cache in geo_cache.
- lib/enrich.rb: enrich_after_upsert! for per-row geocoding and property lookup.
- lib/util.rb: parse_aus_date, council/table name mappings.
- web/index.php: PHP search portal with dynamic UNION across all da_* tables.
- tools/backfill_geocode.rb: batch geocode backfill.
- tools/import_sqlites.rb: import from legacy SQLite exports.
- run_all.sh: discovers and runs scrapers with ONLY/SKIP filtering.
- entrypoint.sh: Docker entry with optional loop via SCRAPE_EVERY_MINUTES.
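
The identifier checks recorded in this changelog (table names in f3c06ab, column names in the DB.upsert rewrite) share one idea: validate every identifier before it is interpolated into SQL, and pass values only as placeholders. The sketch below is an illustration under assumptions, not the code in lib/db.rb. build_insert_sql and the table name da_burnie are hypothetical, the \A...\z anchoring on the table pattern is assumed for the Ruby side, and the ON DUPLICATE KEY UPDATE clause is omitted because its exact semantics are not described here.

```ruby
TABLE_NAME_RE  = /\Ada_[a-z0-9_]+\z/    # anchoring is an assumption
SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/  # as quoted in the DB.upsert entry

# Returns the name unchanged when it is safe, otherwise raises ArgumentError.
def validate_table_name!(name)
  return name if name.is_a?(String) && name.match?(TABLE_NAME_RE)

  raise ArgumentError, "unsafe table name: #{name.inspect}"
end

# Hypothetical builder: derives the column list from row.keys and validates
# each identifier before it reaches the statement text. Values never appear
# in the SQL string; they are represented by ? placeholders.
def build_insert_sql(table, row)
  validate_table_name!(table)
  cols = row.keys.map(&:to_s)
  cols.each do |col|
    raise ArgumentError, "unsafe column name: #{col.inspect}" unless col.match?(SAFE_COLUMN_RE)
  end
  col_list     = cols.map { |col| "`#{col}`" }.join(", ")
  placeholders = Array.new(cols.size, "?").join(", ")
  "INSERT INTO `#{table}` (#{col_list}) VALUES (#{placeholders})"
end

sql = build_insert_sql("da_burnie", { council_reference: "DA 2026/1", address: "1 Main St" })
```

Raising on an unsafe name, rather than skipping it, matches the changelog's stated preference for surfacing problems early instead of silently dropping data.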