DB & Warn Fixes

Benjamin Harris, 2 months ago
parent
commit
add1f78a4b

+ 9 - 1
.claude/settings.local.json

@@ -14,7 +14,15 @@
       "Bash(grep -n '\\\\$t\\\\|tableHasColumn\\\\|tableExists' /f/GIT_REPO/tas_councils/web/index.php)",
       "Bash(xargs sed:*)",
       "Bash(grep -n \"Open3\\\\|capture2\\\\|backtick\\\\|\\\\`#{\" f:/GIT_REPO/tas_councils/lib/http.rb)",
-      "Bash(grep -n \"validate_table_name\\\\|FROM \\\\`{\\\\|SHOW COLUMNS\" f:/GIT_REPO/tas_councils/web/index.php)"
+      "Bash(grep -n \"validate_table_name\\\\|FROM \\\\`{\\\\|SHOW COLUMNS\" f:/GIT_REPO/tas_councils/web/index.php)",
+      "Bash(grep -l \"DB.ensure_table!\" /f/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(grep -L \"DB.ensure_table!\" /f/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(grep -A 2 \"rescue StandardError => e\" /f/GIT_REPO/tas_councils/lib/*.rb /f/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(grep -n \"enrich_after_upsert\" /f/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(wc -l /f/GIT_REPO/tas_councils/lib/*.rb /f/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(grep -n \"def.*$\" /f/GIT_REPO/tas_councils/lib/util.rb)",
+      "Bash(grep -l \"require.*log\\\\b\" f:/GIT_REPO/tas_councils/scrapers/*.rb)",
+      "Bash(grep -l \"require.*scraper_helpers\\\\b\" f:/GIT_REPO/tas_councils/scrapers/*.rb)"
     ]
   }
 }

+ 38 - 1
VERSIONS.md

@@ -5,7 +5,44 @@ Entries are grouped by push/session in reverse-chronological order.
 
 ---
 
-## 2026-04-13 — Code Quality & Bug Fixes (current)
+## 2026-04-13 — Code Quality Pass 3
+
+**Logging**
+- All 63 bare `warn "..."` calls across `scrapers/*.rb` replaced with `Log.warn "scraper", "..."` — structured logging now consistent throughout; stderr output is now filtered by `LOG_LEVEL`.
+
+**DB.upsert dynamic rewrite** (`lib/db.rb`)
+- Removed hardcoded 22-column array — `upsert` now derives columns from `row.keys`, so scrapers that pass scraper-specific columns (e.g. `advertised_date`, `legal_description`) are no longer silently ignored.
+- Added `SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/` — each key is validated before interpolation into SQL; unsafe names raise `ArgumentError` rather than silently passing.
+- Extracted write-once/merge semantics into `UPSERT_ON_DUP` constant (`date_received`, `date_received_raw`, `document_url`, `local_document_url`) — easier to audit and extend.
+- Non-existent columns now raise `Mysql2::Error` (caught by scraper rescue) instead of silently being dropped, surfacing schema mismatches early.
+
+---
+
+## 2026-04-13 — Code Quality Pass 2
+
+**Security**
+- `lib/http.rb` curl fallback: replaced shell-interpolated backtick call with `Open3.capture2` array form — eliminates shell injection risk from URL-derived `ref`/`uri` values.
+- `web/index.php`: added `validate_table_name()` helper (enforces `/\Ada_[a-z0-9_]+\z/`) applied before every backtick-quoted table name interpolation (`tableHasColumn`, `$stageT` stages fetch, UNION SELECT builder).
+
+**Schema consolidation**
+- Removed `Geocode.ensure_da_columns!` from `lib/geocode.rb` — redundant, covered by `DB.ensure_table!` (new tables) and migration v1 (existing tables). Removed its call from `tools/backfill_geocode.rb`.
+- Removed `ensure_extra_columns!` from `lib/enrich.rb` and all 10 scraper call-sites — same reasoning; was also using wrong column types (`DOUBLE`/`VARCHAR(50)`) vs canonical schema (`DECIMAL(10,7)`/`TEXT`).
+
+**Error handling**
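+- 66 bare `rescue => e` replaced with explicit `rescue StandardError => e` across all scrapers, lib, and tools. Behaviour is unchanged, since a bare `rescue` already catches only `StandardError` (never `SystemExit`/`SignalException`); the explicit form documents that intent and guards against a later widening to `rescue Exception`.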
+- `lib/enrich.rb`: two `warn` calls replaced with `Log.warn` for structured logging; stale file header comment removed.
+
+**Removed**
+- Deleted `scrapers/enrich.rb` — stale duplicate with wrong `require_relative` paths, old broken `COALESCE(NULLIF(?, ''))` query, no main batch loop. Was picked up by `run_all.sh`'s glob and failing every full run with `LoadError`.
+
+**Docs**
+- `CLAUDE.md`: corrected scraper pattern (removed `ensure_extra_columns!(TABLE)` step), updated geocode-backfill command, corrected schema-change guidance.
+- `README.md`: removed stale `tools/enrich.rb` references; corrected enrichment/backfill examples and tools table; added link to VERSIONS.md.
+- `VERSIONS.md`: created — changelog covering all changes from initial upload.
+
+---
+
+## 2026-04-13 — Code Quality & Bug Fixes
 
 **Bug fixes**
 - Fixed `Mysql2::Error Unknown column '''' in 'SET'` — MariaDB 10.11's prepared-statement parser mishandles string literals (`''`) inside `NULLIF`/`IF` expressions in `SET` clauses. Replaced `COALESCE(NULLIF(?, ''), col)` with `COALESCE(?, col)` passing `nil` when the value is empty (`lib/enrich.rb`).
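
The Pass 2 security note about swapping a backtick call for the array form of `Open3.capture2` can be illustrated with a minimal sketch. This is not the code from `lib/http.rb`, which is not shown in this diff; the helper name `echo_arg` and the child Ruby process standing in for `curl` are assumptions made so the example runs anywhere Ruby is installed.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: passes `value` to a child process as one argv entry.
# With the array form no shell is involved, so quotes, semicolons and
# backticks inside `value` are never interpreted.
def echo_arg(value)
  out, status = Open3.capture2(RbConfig.ruby, "-e", "print ARGV[0]", value)
  raise "child failed: #{status.exitstatus}" unless status.success?
  out
end

# A URL-like value carrying shell metacharacters (made up for illustration).
hostile = "https://example.invalid/x'; echo injected; `id`"
puts echo_arg(hostile) == hostile
```

The same value interpolated into a backtick string would be parsed by the shell first, which is the risk the changelog entry describes.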

+ 26 - 54
lib/db.rb

@@ -62,65 +62,37 @@ module DB
         SQL
     end
 
+    # Write-once / merge semantics for specific columns on duplicate key.
+    # All other columns default to VALUES(`col`) (last-write-wins).
+    UPSERT_ON_DUP = {
+        date_received:      "`date_received` = IFNULL(`date_received`, VALUES(`date_received`))",
+        date_received_raw:  "`date_received_raw` = CASE WHEN `date_received_raw` IS NULL OR `date_received_raw` = '' THEN VALUES(`date_received_raw`) ELSE `date_received_raw` END",
+        document_url:       "`document_url` = COALESCE(VALUES(`document_url`), `document_url`)",
+        local_document_url: "`local_document_url` = COALESCE(VALUES(`local_document_url`), `local_document_url`)",
+    }.freeze
+
+    SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/
+
     def self.upsert(table, row)
         validate_table_name!(table)
-        columns = [
-            :description,
-            :date_received,
-            :date_received_raw,
-            :address,
-            :council_reference,
-            :applicant,
-            :owner,
-            :local_document_url,
-            :document_url,
-            :on_notice_to,
-            :on_notice_to_raw,
-            :title_reference,
-            :property_id,
-            :area_sqm,
-            :area_ha,
-            :address_std,
-            :street,
-            :locality,
-            :state,
-            :postcode,
-            :lat,
-            :lng
-            ]
 
-        esc_table   = client.escape(table)
-        col_names   = columns.map { |c| "`#{c}`" }.join(", ")
-        placeholders = (["?"] * columns.size).join(", ")
+        columns = row.keys
+        columns.each do |c|
+            raise ArgumentError, "Unsafe column name: #{c.inspect}" unless c.to_s.match?(SAFE_COLUMN_RE)
+        end
 
-        updates = columns.map { |c|
-            case c
-            when :date_received
-                # write-once: only set if currently NULL
-                "`#{c}` = IFNULL(`#{c}`, VALUES(`#{c}`))"
-            when :date_received_raw
-                # write-once for strings: only set if NULL or ''
-                "`#{c}` = CASE WHEN `#{c}` IS NULL OR `#{c}` = '' THEN VALUES(`#{c}`) ELSE `#{c}` END"
-            when :document_url, :local_document_url
-                # don't blank out existing value if new is NULL
-                "`#{c}` = COALESCE(VALUES(`#{c}`), `#{c}`)"
-            else
-                "`#{c}` = VALUES(`#{c}`)"
-            end
-            }.join(", ")
+        esc_table    = client.escape(table)
+        col_names    = columns.map { |c| "`#{c}`" }.join(", ")
+        placeholders = (["?"] * columns.size).join(", ")
+        updates      = columns.map { |c| UPSERT_ON_DUP[c.to_sym] || "`#{c}` = VALUES(`#{c}`)" }.join(", ")
 
         sql = <<~SQL
-        INSERT INTO `#{esc_table}` (#{col_names}, created_at, updated_at)
+            INSERT INTO `#{esc_table}` (#{col_names}, created_at, updated_at)
             VALUES (#{placeholders}, NOW(), NOW())
-                ON DUPLICATE KEY UPDATE
-                #{updates},
-                updated_at = NOW()
-                SQL
-
-                stmt = client.prepare(sql)
-                values = columns.map { |c| row[c] }
-                stmt.execute(*values)
-                end
-
+            ON DUPLICATE KEY UPDATE #{updates}, updated_at = NOW()
+        SQL
 
-            end
+        stmt = client.prepare(sql)
+        stmt.execute(*columns.map { |c| row[c] })
+    end
+end
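
To see what statement the rewritten `upsert` produces, the string-building part can be exercised without a database. The sketch below copies `SAFE_COLUMN_RE` and two of the `UPSERT_ON_DUP` entries from the diff; the function name `build_upsert_sql` and the sample table and row are hypothetical, and it leaves out `validate_table_name!` and `client.escape`, whose definitions are not shown here.

```ruby
SAFE_COLUMN_RE = /\A[a-z][a-z0-9_]*\z/

# Subset of the write-once / merge rules from the diff.
UPSERT_ON_DUP = {
  date_received: "`date_received` = IFNULL(`date_received`, VALUES(`date_received`))",
  document_url:  "`document_url` = COALESCE(VALUES(`document_url`), `document_url`)",
}.freeze

# Hypothetical helper: returns [sql, bind_values] instead of executing.
def build_upsert_sql(table, row)
  columns = row.keys
  columns.each do |c|
    raise ArgumentError, "Unsafe column name: #{c.inspect}" unless c.to_s.match?(SAFE_COLUMN_RE)
  end

  col_names    = columns.map { |c| "`#{c}`" }.join(", ")
  placeholders = (["?"] * columns.size).join(", ")
  # Columns without a special rule fall back to last-write-wins.
  updates      = columns.map { |c| UPSERT_ON_DUP[c.to_sym] || "`#{c}` = VALUES(`#{c}`)" }.join(", ")

  sql = "INSERT INTO `#{table}` (#{col_names}, created_at, updated_at) " \
        "VALUES (#{placeholders}, NOW(), NOW()) " \
        "ON DUPLICATE KEY UPDATE #{updates}, updated_at = NOW()"
  [sql, columns.map { |c| row[c] }]
end

sql, values = build_upsert_sql("da_example", { address: "1 Example St", date_received: nil })
puts sql
p values
```

Two edge cases follow from deriving the column list from `row.keys`: an empty row would yield invalid SQL, and a row that already contains `created_at` or `updated_at` would name that column twice. Whether any caller can reach either case is not visible in this diff.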

+ 2 - 2
scrapers/break_oday.rb

@@ -77,7 +77,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             # return web-accessible relative path if needed
             "/downloads/breakoday/#{safe_name(council_reference)}/#{fname}"
         rescue StandardError => e
-            warn "PDF download failed for #{url}: #{e.class} #{e.message}"
+            Log.warn "scraper", "PDF download failed for #{url}: #{e.class} #{e.message}"
             nil
         end
     end
@@ -149,7 +149,7 @@ LIMIT 1
             row = DB.client.prepare(sql).execute(council_reference, address).first
             puts "  enriched -> #{row ? row.inspect : 'nil'}"
         rescue StandardError => e
-            warn "  enriched probe failed: #{e.class} #{e.message}"
+            Log.warn "scraper", "  enriched probe failed: #{e.class} #{e.message}"
         end
 
         puts "Upserted #{council_reference} -> #{address}"
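
`Log.warn` takes a tag followed by the message, and the changelog says its output is filtered by `LOG_LEVEL`. The `Log` module itself is not part of this diff, so the following is only a guess at its shape: the module name `SketchLog`, the level names, the output format, and the `io:` keyword are all assumptions.

```ruby
require "stringio"

# Hypothetical stand-in for the project's Log module (not shown in this diff).
module SketchLog
  LEVELS = { "debug" => 0, "info" => 1, "warn" => 2, "error" => 3 }.freeze

  # Threshold read from LOG_LEVEL; unknown or missing values fall back to "info".
  def self.threshold
    LEVELS.fetch(ENV.fetch("LOG_LEVEL", "info").downcase, LEVELS["info"])
  end

  # Writes a tagged warning unless the threshold is above "warn".
  def self.warn(tag, message, io: $stderr)
    return if LEVELS["warn"] < threshold
    io.puts "[WARN] [#{tag}] #{message}"
  end
end

SketchLog.warn "scraper", "PDF download failed for https://example.invalid/a.pdf"
```

With a design along these lines, setting `LOG_LEVEL=error` silences the scraper warnings, which a bare `warn` call cannot offer.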

+ 3 - 3
scrapers/brighton.rb

@@ -68,7 +68,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             puts "Saved PDF #{path}"
             "/downloads/brighton/#{safe_name(council_reference)}/#{fname}"
         rescue StandardError => e
-            warn "PDF download failed for #{url}: #{e.class} #{e.message}"
+            Log.warn "scraper", "PDF download failed for #{url}: #{e.class} #{e.message}"
             nil
         end
     end
@@ -152,7 +152,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ? WHERE council_reference = ? AND address = ?")
             upd.execute(document_url, council_reference, address)
         rescue StandardError => e
-            warn "document_url update skipped for #{council_reference}: #{e.class} #{e.message}"
+            Log.warn "scraper", "document_url update skipped for #{council_reference}: #{e.class} #{e.message}"
         end
 
         # local copy
@@ -162,7 +162,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
                 upd2 = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET local_document_url = ? WHERE council_reference = ? AND address = ?")
                 upd2.execute(local_doc_url, council_reference, address)
             rescue StandardError => e
-                warn "local_document_url update skipped for #{council_reference}: #{e.class} #{e.message}"
+                Log.warn "scraper", "local_document_url update skipped for #{council_reference}: #{e.class} #{e.message}"
             end
         end
 

+ 5 - 5
scrapers/burnie.rb

@@ -179,7 +179,7 @@ def first_pdf_on_detail(detail_url, jar)
   return "" unless a
   URI.join(detail_url, a["href"].to_s).to_s
 rescue StandardError => e
-  warn "Detail fetch failed for #{detail_url}: #{e.class} #{e.message}"
+  Log.warn "scraper", "Detail fetch failed for #{detail_url}: #{e.class} #{e.message}"
   ""
 end
 
@@ -194,7 +194,7 @@ def decode_seamless_viewstate(doc)
   end
   Nokogiri::HTML(html)
 rescue StandardError => e
-  warn "Failed to decode __SEAMLESSVIEWSTATE: #{e.class} #{e.message}"
+  Log.warn "scraper", "Failed to decode __SEAMLESSVIEWSTATE: #{e.class} #{e.message}"
   nil
 end
 
@@ -234,10 +234,10 @@ def save_pdf(document_url, council_reference, jar, referer:)
     File.open(out_path, "wb") { |f| f.write(data) }
     puts "Saved PDF to #{out_path} (#{data.bytesize} bytes)"
   else
-    warn "PDF fetch failed (#{code} #{msg}) for #{document_url}"
+    Log.warn "scraper", "PDF fetch failed (#{code} #{msg}) for #{document_url}"
   end
 rescue StandardError => e
-  warn "PDF save error for #{document_url}: #{e.class} #{e.message}"
+  Log.warn "scraper", "PDF save error for #{document_url}: #{e.class} #{e.message}"
 end
 
 # ----- Warm-up sequence to appease WAF -----
@@ -363,7 +363,7 @@ nodes.each do |a|
     title_reference = a.at_css(".list-item-title")&.text&.strip.to_s
     upd.execute(document_url, on_notice_to, on_notice_to_raw, title_reference, council_reference, address)
   rescue StandardError => e
-    warn "Extra fields update skipped for #{council_reference}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extra fields update skipped for #{council_reference}: #{e.class} #{e.message}"
   end
 
   puts "Upserted #{council_reference} -> #{address}"

+ 2 - 2
scrapers/centralcoast.rb

@@ -66,7 +66,7 @@ puts "  saved #{File.basename(path)} (#{body.to_s.bytesize} bytes)"
 # adjust if your web container mounts differently
 "/downloads/centralcoast/#{safe_name(council_reference)}/#{fname}"
 rescue StandardError => e
-warn "Download failed for #{url}: #{e.class} #{e.message}"
+Log.warn "scraper", "Download failed for #{url}: #{e.class} #{e.message}"
 nil
 end
 
@@ -162,7 +162,7 @@ begin
         )
     upd.execute(document_url, local_doc_url, title_reference, council_reference, address)
 rescue StandardError => e
-    warn "Extras update skipped for #{council_reference}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extras update skipped for #{council_reference}: #{e.class} #{e.message}"
 end
 
 puts "Upserted #{council_reference} -> #{address} #{local_doc_url ? '(downloaded)' : ''}"

+ 2 - 2
scrapers/circularhead.rb

@@ -16,7 +16,7 @@ begin
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS document_url VARCHAR(1024) NULL")
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS title_reference TEXT NULL")
 rescue StandardError => e
-  warn "Optional column add skipped: #{e.class} #{e.message}"
+  Log.warn "scraper", "Optional column add skipped: #{e.class} #{e.message}"
 end
 
 def abs_url(base, href)
@@ -80,7 +80,7 @@ items.each_with_index do |li, idx|
     upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ?, title_reference = ? WHERE council_reference = ? AND address = ?")
     upd.execute(document_url, title_reference, council_reference, address)
   rescue Mysql2::Error => e
-    warn "[circularhead] db update skipped for #{council_reference}: #{e.message}"
+    Log.warn "scraper", "[circularhead] db update skipped for #{council_reference}: #{e.message}"
   end
 
   puts "Upserted #{council_reference} -> #{address}"
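
The scrapers above mix `rescue StandardError => e` with narrower clauses such as `rescue Mysql2::Error => e`. As a quick check of what each form catches, this sketch (the helper name `caught_by_bare_rescue?` is made up for illustration) probes a bare `rescue => e` with several exception classes. In Ruby a bare `rescue` is equivalent to `rescue StandardError`, so `SystemExit` and `Interrupt` pass through it.

```ruby
# Returns true if a bare `rescue => e` catches an exception of class `klass`.
def caught_by_bare_rescue?(klass)
  begin
    raise klass, "boom"
  rescue => e
    true
  end
rescue Exception # outer net so the probe itself never aborts the script
  false
end

[RuntimeError, ArgumentError, SystemExit, Interrupt, NoMemoryError].each do |k|
  puts "#{k}: #{caught_by_bare_rescue?(k)}"
end
```

Only the first two classes descend from `StandardError`; the other three are caught by the outer `rescue Exception` instead.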

+ 2 - 2
scrapers/clarence.rb

@@ -86,7 +86,7 @@ def parse_date_token(s)
                 # Web-accessible path (served by your web container)
                 "/downloads/clarence/#{safe_name(council_reference)}/#{fname}"
             rescue StandardError => e
-                warn "PDF download failed for #{url}: #{e.class} #{e.message}"
+                Log.warn "scraper", "PDF download failed for #{url}: #{e.class} #{e.message}"
                 nil
             end
         end
@@ -192,7 +192,7 @@ def parse_date_token(s)
                     )
                 upd.execute(r[:pdf], local_doc_url, r[:on_notice], r[:on_notice_raw], r[:title_reference], cr, addr)
             rescue StandardError => e
-                warn "Extras update skipped for #{cr}: #{e.class} #{e.message}"
+                Log.warn "scraper", "Extras update skipped for #{cr}: #{e.class} #{e.message}"
             end
 
             puts "Upserted #{cr} -> #{addr}  saved: #{local_doc_url ? 1 : 0}"

+ 3 - 3
scrapers/derwentvalley.rb

@@ -114,14 +114,14 @@ links = []
 begin
   links = detail_links_from_list(LIST_URL)
 rescue StandardError => e
-  warn "List fetch failed, will try news listing: #{e.class} #{e.message}"
+  Log.warn "scraper", "List fetch failed, will try news listing: #{e.class} #{e.message}"
 end
 
 if links.empty?
   begin
     links = detail_links_from_news(NEWS_URL)
   rescue StandardError => e
-    warn "News fetch failed: #{e.class} #{e.message}"
+    Log.warn "scraper", "News fetch failed: #{e.class} #{e.message}"
   end
 end
 
@@ -135,7 +135,7 @@ links.each do |u|
   begin
     item = parse_detail(u)
   rescue StandardError => e
-    warn "Skip #{u}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Skip #{u}: #{e.class} #{e.message}"
     next
   end
   next unless item

+ 3 - 3
scrapers/devonportcity.rb

@@ -92,12 +92,12 @@ def extract_on_notice_to_from_title(title)
                         end
                         puts "Saved PDF to #{out_path}"
                     else
-                        warn "PDF fetch failed (#{resp.code} #{resp.message}) for #{url}"
+                        Log.warn "scraper", "PDF fetch failed (#{resp.code} #{resp.message}) for #{url}"
                     end
                 end
             end
         rescue StandardError => e
-            warn "PDF save error for #{url}: #{e.class} #{e.message}"
+            Log.warn "scraper", "PDF save error for #{url}: #{e.class} #{e.message}"
         end
 
         # ---------- Fetch + parse ----------
@@ -188,7 +188,7 @@ def extract_on_notice_to_from_title(title)
                 upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ?, title_reference = ? WHERE council_reference = ? AND address = ?")
                 upd.execute(document_url, title_reference, council_reference, address)
             rescue Mysql2::Error => e
-                warn "[devonportcity] db update skipped for #{council_reference}: #{e.message}"
+                Log.warn "scraper", "[devonportcity] db update skipped for #{council_reference}: #{e.message}"
             end
 
             puts "Upserted #{council_reference} -> #{address}"

+ 5 - 5
scrapers/dorset.rb

@@ -231,7 +231,7 @@ def download_all(urls, jar, council_reference)
       saved << path
       first_web_rel ||= "/files/dorset/#{safe_name(council_reference)}/#{File.basename(path)}"
     rescue StandardError => e
-      warn "Download failed for #{u}: #{e.class} #{e.message}"
+      Log.warn "scraper", "Download failed for #{u}: #{e.class} #{e.message}"
     end
   end
 
@@ -240,7 +240,7 @@ def download_all(urls, jar, council_reference)
       DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET local_document_url = ? WHERE council_reference = ?")
                .execute(first_web_rel, council_reference)
     rescue StandardError => e
-      warn "Failed to set local_document_url for #{council_reference}: #{e.class} #{e.message}"
+      Log.warn "scraper", "Failed to set local_document_url for #{council_reference}: #{e.class} #{e.message}"
     end
   end
 
@@ -285,7 +285,7 @@ list_items.each do |r|
         end
       end
     rescue StandardError => e
-      warn "Detail fetch failed for #{detail_url}: #{e.class} #{e.message}"
+      Log.warn "scraper", "Detail fetch failed for #{detail_url}: #{e.class} #{e.message}"
     end
   end
 
@@ -296,7 +296,7 @@ list_items.each do |r|
   begin
     geo = Geocode.format_au(r[:address])
   rescue StandardError => e
-    warn "Geocode error for #{r[:council_reference]}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Geocode error for #{r[:council_reference]}: #{e.class} #{e.message}"
   end
   
   council_reference = r[:council_reference][0,100]
@@ -332,7 +332,7 @@ begin
   row = DB.client.prepare(sql).execute(council_reference, address).first
   puts "  enriched -> #{row ? row.inspect : 'nil'}"
 rescue StandardError => e
-  warn "  enriched probe failed: #{e.class} #{e.message}"
+  Log.warn "scraper", "  enriched probe failed: #{e.class} #{e.message}"
 end
 
   puts "Upserted #{r[:council_reference]} -> #{r[:address]}  docs: #{doc_urls.length} saved: #{saved_paths.length} stages: #{stages.length}"

+ 2 - 2
scrapers/flinders_council.rb

@@ -16,7 +16,7 @@ DB.ensure_table!(TABLE)
 begin
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS document_url VARCHAR(1024) NULL")
 rescue StandardError => e
-  warn "document_url add skipped: #{e.class} #{e.message}"
+  Log.warn "scraper", "document_url add skipped: #{e.class} #{e.message}"
 end
 
 def abs_url(base, href)
@@ -90,7 +90,7 @@ links.each do |a|
     upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ? WHERE council_reference = ? AND address = ?")
     upd.execute(pdf, ref, address)
   rescue Mysql2::Error => e
-    warn "[flinders] db update skipped for #{ref}: #{e.message}"
+    Log.warn "scraper", "[flinders] db update skipped for #{ref}: #{e.message}"
   end
 
   puts "Upserted #{ref} -> #{address}"

+ 4 - 4
scrapers/glenorchy.rb

@@ -20,7 +20,7 @@ DB.ensure_table!(TABLE)
 begin
     DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS document_url VARCHAR(1024) NULL")
 rescue StandardError => e
-    warn "Could not add document_url column: #{e.class} #{e.message}"
+    Log.warn "scraper", "Could not add document_url column: #{e.class} #{e.message}"
 end
 
 def text_or(node, default = "")
@@ -67,7 +67,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             # web-facing relative path (match your nginx/apache mapping)
             "/downloads/glenorchy/#{safe_name(council_reference)}/#{fname}"
         rescue StandardError => e
-            warn "Download failed for #{doc_url}: #{e.class} #{e.message}"
+            Log.warn "scraper", "Download failed for #{doc_url}: #{e.class} #{e.message}"
             ""
         end
     end
@@ -129,7 +129,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ? WHERE council_reference = ? AND address = ?")
             .execute(document_url, council_reference, address)
         rescue StandardError => e
-            warn "document_url update failed: #{e.class} #{e.message}"
+            Log.warn "scraper", "document_url update failed: #{e.class} #{e.message}"
         end
 
         # download + store local_document_url
@@ -139,7 +139,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
                 DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET local_document_url = ? WHERE council_reference = ? AND address = ?")
                 .execute(local_rel, council_reference, address)
             rescue StandardError => e
-                warn "local_document_url update failed: #{e.class} #{e.message}"
+                Log.warn "scraper", "local_document_url update failed: #{e.class} #{e.message}"
             end
         end
 

+ 3 - 3
scrapers/huonvalley.rb

@@ -17,7 +17,7 @@ DB.ensure_table!(TABLE)
 begin
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS document_url TEXT NULL")
 rescue StandardError => e
-  warn "document_url add skipped: #{e.class} #{e.message}"
+  Log.warn "scraper", "document_url add skipped: #{e.class} #{e.message}"
 end
 
 REF_RX = %r{\bDA[-\s]?\d{1,4}/20\d{2}\b}i
@@ -103,7 +103,7 @@ loop do
   begin
     html = Http.get(url)
   rescue StandardError => e
-    warn "Failed to fetch #{url}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Failed to fetch #{url}: #{e.class} #{e.message}"
     break
   end
 
@@ -134,7 +134,7 @@ loop do
       upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ? WHERE council_reference = ? AND address = ?")
       upd.execute(r[:document_url], r[:council_reference], r[:address])
     rescue Mysql2::Error => e
-      warn "[huonvalley] db update skipped for #{r[:council_reference]}: #{e.message}"
+      Log.warn "scraper", "[huonvalley] db update skipped for #{r[:council_reference]}: #{e.message}"
     end
 
     puts "Upserted #{r[:council_reference]} -> #{r[:address]}"

+ 2 - 2
scrapers/kentish.rb

@@ -131,7 +131,7 @@ end
 begin
   html = Http.get(URL)
 rescue StandardError => e
-  warn "Failed to fetch #{URL}: #{e.class} #{e.message}"
+  Log.warn "scraper", "Failed to fetch #{URL}: #{e.class} #{e.message}"
   exit 1
 end
 
@@ -161,7 +161,7 @@ items.each do |r|
     upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ?, on_notice_to = ?, on_notice_to_raw = ? WHERE council_reference = ? AND address = ?")
     upd.execute(r[:document_url], r[:date_received], r[:date_received_raw], r[:council_reference], r[:address])
   rescue StandardError => e
-    warn "Extras update skipped for #{r[:council_reference]}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extras update skipped for #{r[:council_reference]}: #{e.class} #{e.message}"
   end
 
   puts "Upserted #{r[:council_reference]} -> #{r[:address]}"

+ 9 - 9
scrapers/launcestoncity.rb

@@ -228,13 +228,13 @@ def probe_common_docs(base_url:, key:, danum:, referer:)
             saved = download_doc(pdf_url, referer: referer, council_reference: danum_raw, jar: SESSION_JAR)
             local_rel = "/files/launceston/#{safe_name(danum_raw)}/#{File.basename(saved)}"
           rescue StandardError => e
-            warn "DOC download failed (probe) for #{danum_raw} #{File.basename(pdf_url)}: #{e.class} #{e.message}"
+            Log.warn "scraper", "DOC download failed (probe) for #{danum_raw} #{File.basename(pdf_url)}: #{e.class} #{e.message}"
           end
         end
         found << { name: File.basename(pdf_url), url: pdf_url, local_url: local_rel }
       end
     rescue StandardError => e
-      warn "[launcestoncity] probe failed for #{pdf_url}: #{e.class} #{e.message}"
+      Log.warn "scraper", "[launcestoncity] probe failed for #{pdf_url}: #{e.class} #{e.message}"
       next
     end
   end
@@ -371,7 +371,7 @@ tables.each do |t|
 						saved = download_doc(href, referer: candidate_url, council_reference: council_reference, jar: SESSION_JAR)
 						local_rel = "/files/launceston/#{safe_name(council_reference)}/#{File.basename(saved)}"
 					  rescue StandardError => e
-						warn "DOC download failed for #{council_reference} #{name}: #{e.class} #{e.message}"
+						Log.warn "scraper", "DOC download failed for #{council_reference} #{name}: #{e.class} #{e.message}"
 					  end
 					end
 
@@ -398,7 +398,7 @@ tables.each do |t|
 					  anchors_added = probed.size if probed.any?
 					end
 				  rescue StandardError => e
-					warn "Probe fallback failed for #{council_reference}: #{e.class} #{e.message}"
+					Log.warn "scraper", "Probe fallback failed for #{council_reference}: #{e.class} #{e.message}"
 				  end
 				  end
 
@@ -413,23 +413,23 @@ tables.each do |t|
 					  FileUtils.mkdir_p(dump_dir)
 					  File.write(File.join(dump_dir, "#{safe_name(council_reference)}.html"), list_html[0, 5000])
 					rescue StandardError => e
-					  warn "Failed to write dump for #{council_reference}: #{e.class} #{e.message}"
+					  Log.warn "scraper", "Failed to write dump for #{council_reference}: #{e.class} #{e.message}"
 					end
 				  end
 
 				rescue StandardError => e
-				  warn "Doc list fetch failed for #{council_reference} at #{candidate_url} (referer: #{ref}): #{e.class} #{e.message}"
+				  Log.warn "scraper", "Doc list fetch failed for #{council_reference} at #{candidate_url} (referer: #{ref}): #{e.class} #{e.message}"
 				end
 			  end
 			end
 
 			if used_url.nil?
-			  warn "Docs page had no usable links for #{council_reference} after variants: #{variants_for_doc_list(doc_list_url).join(' | ')}"
+			  Log.warn "scraper", "Docs page had no usable links for #{council_reference} after variants: #{variants_for_doc_list(doc_list_url).join(' | ')}"
 			end
 
 
 		  rescue StandardError => e
-			warn "Doc list fetch failed for #{council_reference}: #{e.class} #{e.message}"
+			Log.warn "scraper", "Doc list fetch failed for #{council_reference}: #{e.class} #{e.message}"
 		  end
 		end
 
@@ -469,7 +469,7 @@ tables.each do |t|
 		})
 
     rescue StandardError => e
-      warn "Enrich failed for #{council_reference}: #{e.class} #{e.message}"
+      Log.warn "scraper", "Enrich failed for #{council_reference}: #{e.class} #{e.message}"
     end
   end
 

+ 2 - 2
scrapers/meandervalley.rb

@@ -101,7 +101,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
             puts "Saved PDF #{path}"
             "/downloads/meandervalley/#{safe_name(council_reference)}/#{fname}"
         rescue StandardError => e
-            warn "PDF download failed for #{url}: #{e.class} #{e.message}"
+            Log.warn "scraper", "PDF download failed for #{url}: #{e.class} #{e.message}"
             nil
         end
     end
@@ -224,7 +224,7 @@ def safe_name(s) = s.to_s.gsub(/[^\w\-.]+/, "_")
                 r[:address]
                 )
         rescue StandardError => e
-            warn "Extras update skipped for #{r[:council_reference]}: #{e.class} #{e.message}"
+            Log.warn "scraper", "Extras update skipped for #{r[:council_reference]}: #{e.class} #{e.message}"
         end
 
         puts "Upserted #{r[:council_reference]} -> #{r[:address]}  saved: #{local_doc_url ? 1 : 0}"

+ 2 - 2
scrapers/northernmidlands.rb

@@ -141,7 +141,7 @@ def parse_items(doc, base_url)
 end
 
 if URL.empty?
-  warn "NORTHERN_MIDLANDS_URL is not set. Example:\n  ONLY=northernmidlands NORTHERN_MIDLANDS_URL='https://.../advertised-applications' docker compose run --rm scraper /app/run_all.sh"
+  Log.warn "scraper", "NORTHERN_MIDLANDS_URL is not set. Example:\n  ONLY=northernmidlands NORTHERN_MIDLANDS_URL='https://.../advertised-applications' docker compose run --rm scraper /app/run_all.sh"
   exit 0
 end
 
@@ -153,7 +153,7 @@ begin
     Http.get(URL)
   end
 rescue StandardError => e
-  warn "Failed to fetch #{URL}: #{e.class} #{e.message}"
+  Log.warn "scraper", "Failed to fetch #{URL}: #{e.class} #{e.message}"
   exit 1
 end
 

+ 2 - 2
scrapers/planbuild.rb

@@ -111,7 +111,7 @@ items.each do |r|
     begin
         detail = fetch_detail(uuid, jar, token, hdr) if uuid
     rescue StandardError => e
-        warn "Detail fetch failed for #{ref}: #{e.class} #{e.message}"
+        Log.warn "scraper", "Detail fetch failed for #{ref}: #{e.class} #{e.message}"
     end
 
     puts "Council: #{table}"
@@ -161,7 +161,7 @@ items.each do |r|
     begin
         geo = Geocode.format_au(addr)
     rescue StandardError => e
-        warn "Geocode error for #{ref}: #{e.class} #{e.message}"
+        Log.warn "scraper", "Geocode error for #{ref}: #{e.class} #{e.message}"
     end
 
     # --- upsert into DB ---

+ 3 - 3
scrapers/southernmidlands.rb

@@ -20,7 +20,7 @@ begin
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS on_notice_to_raw VARCHAR(80) NULL")
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS title_reference TEXT NULL")
 rescue StandardError => e
-  warn "Optional column add skipped: #{e.class} #{e.message}"
+  Log.warn "scraper", "Optional column add skipped: #{e.class} #{e.message}"
 end
 
 def abs_url(base, href)
@@ -97,7 +97,7 @@ detail_links.each do |url|
   begin
     html = Http.get(url)
   rescue StandardError => e
-    warn "Skip #{url}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Skip #{url}: #{e.class} #{e.message}"
     next
   end
 
@@ -174,7 +174,7 @@ detail_links.each do |url|
     upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ?, on_notice_to = ?, on_notice_to_raw = ?, title_reference = ? WHERE council_reference = ? AND address = ?")
     upd.execute(document_url, on_notice, on_notice_raw.to_s, title_reference, council_reference, address)
   rescue StandardError => e
-    warn "Extras update skipped for #{council_reference}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extras update skipped for #{council_reference}: #{e.class} #{e.message}"
   end
 
   puts "Upserted #{council_reference} -> #{address}"

+ 4 - 4
scrapers/waratah_wynyard.rb

@@ -20,7 +20,7 @@ begin
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS on_notice_to_raw VARCHAR(80) NULL")
   DB.client.query("ALTER TABLE `#{DB.client.escape(TABLE)}` ADD COLUMN IF NOT EXISTS title_reference TEXT NULL")
 rescue StandardError => e
-  warn "Optional column add skipped: #{e.class} #{e.message}"
+  Log.warn "scraper", "Optional column add skipped: #{e.class} #{e.message}"
 end
 
 def abs_url(base, href)
@@ -180,7 +180,7 @@ end
 begin
   html = URL.include?("/eservice/") ? Http.dorset_session_get(URL) : Http.get(URL)
 rescue StandardError => e
-  warn "Failed to fetch #{URL}: #{e.class} #{e.message}"
+  Log.warn "scraper", "Failed to fetch #{URL}: #{e.class} #{e.message}"
   exit 1
 end
 
@@ -227,7 +227,7 @@ anchors.each do |u|
       item = parse_detail_page(u)
       rows << item if item
     rescue StandardError => e
-      warn "Skip detail #{u}: #{e.class} #{e.message}"
+      Log.warn "scraper", "Skip detail #{u}: #{e.class} #{e.message}"
     end
   end
 end
@@ -271,7 +271,7 @@ rows.each do |r|
     )
     upd.execute(r[:document_url], r[:date_received], r[:date_received_raw], r[:title_reference], cr, addr)
   rescue StandardError => e
-    warn "Extras update skipped for #{cr}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extras update skipped for #{cr}: #{e.class} #{e.message}"
   end
 
   puts "Upserted #{cr} -> #{addr}"

+ 2 - 2
scrapers/westcoast.rb

@@ -148,7 +148,7 @@ detail_links.each do |u|
   begin
     item = parse_detail(u)
   rescue StandardError => e
-    warn "Skip #{u}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Skip #{u}: #{e.class} #{e.message}"
     next
   end
   next unless item
@@ -175,7 +175,7 @@ detail_links.each do |u|
     upd = DB.client.prepare("UPDATE `#{DB.client.escape(TABLE)}` SET document_url = ?, on_notice_to = ?, on_notice_to_raw = ?, title_reference = ? WHERE council_reference = ? AND address = ?")
     upd.execute(item[:document_url], item[:date_received], item[:date_received_raw], item[:title_reference], item[:council_reference], item[:address])
   rescue StandardError => e
-    warn "Extras update skipped for #{item[:council_reference]}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Extras update skipped for #{item[:council_reference]}: #{e.class} #{e.message}"
   end
 
   puts "Upserted #{item[:council_reference]} -> #{item[:address]}"

+ 1 - 1
scrapers/westtamar.rb

@@ -115,7 +115,7 @@ detail_links.each do |u|
   begin
     item = parse_detail(u)
   rescue StandardError => e
-    warn "Skip #{u}: #{e.class} #{e.message}"
+    Log.warn "scraper", "Skip #{u}: #{e.class} #{e.message}"
     next
   end
   next unless item