BQ Bulletproof

Status: in-flight
Tier: Tier 2 — Platform
Owner: Ryan Colston
Started: 2026-05-03

Bulletproof every BigQuery transformation, ingestion, and read so surprise bills are physically impossible and every table has tests, lineage, and ownership.

Why

Started 2026-05-03. Two threads converged:

  1. Cost surprise. In the Feb 4-5 2026 incident, a loop of per-row lookups (WHERE email_id = @id) scanned 1.4 TB (~$9) in 2 days, because BQ bills a minimum of 10 MB per query no matter how few rows it returns. No project-level guardrail caught it. The audit found zero per-query caps in the wrapper code and no billing budget on the API (one existed at billing-account level but was uncatalogued).

  2. Transformation sprawl. ~80 transformation views across 7 datasets (pos_advanced 24, pos_digest 17, pos_staging 11, pos_analytics 9, pos_coaching 8, master_data 6, pos_dashboard 5) plus ~5 pure-SQL transform CFs. They were created ad hoc, with no version control, no tests, and no enforced tier discipline, so "Why is this number off?" has no audit trail.
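
The billing arithmetic behind the cost-surprise thread can be sketched as follows. This is a minimal illustration, not the project's wrapper code: the 10 MB minimum is treated as binary megabytes, the $6.25 per TiB rate is an assumed on-demand list price, and the lookup count and scan sizes are hypothetical.

```python
MIN_BILLED_BYTES = 10 * 1024**2   # minimum billed per query (assumed binary MB)
TIB = 1024**4
USD_PER_TIB = 6.25                # assumed on-demand rate, not taken from this doc


def billed_bytes(bytes_scanned: int) -> int:
    """Bytes billed for one query: the scan size, floored at the minimum."""
    return max(bytes_scanned, MIN_BILLED_BYTES)


def cost_usd(total_billed_bytes: int) -> float:
    """Convert billed bytes to dollars at the assumed on-demand rate."""
    return total_billed_bytes / TIB * USD_PER_TIB


# Hypothetical workload: 100,000 single-row lookups that each read ~2 KiB.
n_lookups = 100_000
loop_billed = n_lookups * billed_bytes(2 * 1024)

# The same ids fetched by one batched query (WHERE email_id IN UNNEST(@ids))
# that scans a hypothetical 200 MiB once.
batch_billed = billed_bytes(200 * 1024**2)

print(f"loop:  {loop_billed / TIB:.3f} TiB billed, ${cost_usd(loop_billed):.2f}")
print(f"batch: {batch_billed / TIB:.6f} TiB billed, ${cost_usd(batch_billed):.4f}")
```

Under these assumptions the per-iteration minimum, not the data actually read, dominates the bill, which is why a lookup loop can be costly even when every individual query is tiny.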

Bulletproof = caps stop bad queries, every transform is version-controlled with assertions, every gold table has a freshness watchdog, and there's exactly one source of truth per metric.
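
Two of the guards named above, the per-query cap and the freshness watchdog, reduce to small checks. The sketch below is illustrative only: `QueryTooExpensive`, `check_query_cap`, `is_stale`, the 1 GiB default cap, and the 24-hour threshold are all hypothetical names and values, not the contents of lib/bigquery_client.py. In practice the byte estimate would likely come from a dry run, and the google-cloud-bigquery client also accepts `maximum_bytes_billed` on `QueryJobConfig` for server-side enforcement.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_CAP_BYTES = 1 * 1024**3  # hypothetical 1 GiB per-query cap


class QueryTooExpensive(Exception):
    """Raised when a query's estimated scan exceeds the per-query cap."""


def check_query_cap(estimated_bytes: int, cap_bytes: int = DEFAULT_CAP_BYTES) -> None:
    """Refuse to run a query whose estimated scan is above the cap."""
    if estimated_bytes > cap_bytes:
        raise QueryTooExpensive(
            f"estimated {estimated_bytes} bytes exceeds cap of {cap_bytes} bytes"
        )


def is_stale(
    last_modified: datetime,
    max_age: timedelta = timedelta(hours=24),  # hypothetical freshness threshold
    now: Optional[datetime] = None,
) -> bool:
    """True if a table has not been modified within max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_modified > max_age


if __name__ == "__main__":
    check_query_cap(estimated_bytes=200 * 1024**2)  # under the cap, no error
    try:
        check_query_cap(estimated_bytes=5 * 1024**3)  # over the cap
    except QueryTooExpensive as exc:
        print(f"blocked: {exc}")

    two_days_ago = datetime.now(timezone.utc) - timedelta(days=2)
    print("stale:", is_stale(two_days_ago))
```

Keeping both checks as pure functions means they can be unit-tested without a BigQuery connection; the wrapper would supply the estimate and the table's last-modified timestamp.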

Current state

Next 3 actions

  1. Phase 3 scheduling — tracked in the umbrella, not here. Pick a slot in the umbrella's priority queue.
  2. Optional Phase 2 micro-cleanup: retire the now-orphan release_config.code_compilation_config.default_schema = "reporting_dataform" fallback in infra/terraform/bq-bulletproof-dataform/dataform.tf. Separate behavior-change MR. Low priority. Not part of the umbrella.
  3. When Phase 3 closes via the umbrella: restart BQ Bulletproof here for Phases 4-10 (Dataform reliability — assertions, lineage, dataset caps, freshness watchdog, cost monitoring).

Decisions log

Phase 11 — SUPERSEDED 2026-05-15 (added 2026-05-03)

Phase 11 originally coordinated the CCPJ→platform strangler-fig migration so that no bq-bulletproof code got lost in the sweep. That strangler-fig migration never ran: the 2026-05-15 monorepo consolidation instead moved all 5 source repos into ~/rylobasic/ in a single big-bang cutover. lib/bigquery_client.py and all bq-bulletproof code already live under ~/rylobasic/; ~/CCPJ and ~/platform are archived read-only. Phase 11 is closed by supersession, and its only live consequence is the removal of the Phase 2 cutover timing gate (see Blocked-on / Next actions). The Phase 11 section of CHECKLIST.md was rewritten accordingly (MR !135).

Files in this project

Canonical location since 2026-05-18: ~/rylobasic/infra/bq-bulletproof/ (relocated from the now-frozen ~/_archived-platform-2026-05-15/projects/bq-bulletproof/; old ~/platform/... paths are dead).

Open issues

References

Memory pointer

~/.claude/projects/-Users-rycolston/memory/project_bq_bulletproof.md