The Real-Time Fraud Playbook: A Block‑Sized Lesson in Snowflake‑Backed Feature Stores

Block, Inc.'s Cash App faced a real-time fraud scoring dilemma: scale ML-driven detection across streaming and batch signals while labels can arrive up to 24 hours late [1]. That story isn't just about speed; it's about governance, reproducibility, and keeping data sane at scale. This article traces that challenge into a Snowflake‑backed feature store blueprint: streaming feature computation, online inference with latency targets, delayed feedback, TTL caching, versioning, drift monitoring, and canary-based governance. The journey reveals a practical playbook for production‑grade fraud systems at fintech scale.


Building the Pipeline: Signals to Scores

Building on Block's case, the core is a streaming-to-online pipeline where signals arrive continuously and features are computed on the fly. A TTL-driven cache keeps feature values fresh without repeatedly hitting the warehouse, while a versioned feature registry ensures reproducibility when late labels arrive or experiments roll back. Online scoring targets latency in the tens-of-milliseconds range, supported by in-database feature computation and in-system caching. Meanwhile, delayed labels (up to 24 hours) feed back into offline evaluation and model retraining pipelines, closing the loop between inference and ground truth [2][3]. The main components:

- Streaming signals → feature computation → Snowflake feature store
- TTL cache with explicit versioning to guarantee reproducibility [4]
- Online scoring with strict latency targets; delayed labels feed offline training [5]
- Governance hooks for tracing data lineage and model provenance [6]
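As a concrete illustration, here is a minimal Python sketch of the TTL-plus-versioning idea. The names (`FeatureCache`, `get_or_compute`) are hypothetical, and the `compute_fn` callback stands in for the warehouse query (e.g. against Snowflake) that a real system would run on a cache miss:

```python
import time

class FeatureCache:
    """Hypothetical TTL cache keyed by (entity, feature, version)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # (entity_id, feature, version) -> (value, stored_at)

    def get_or_compute(self, entity_id, feature, version, compute_fn):
        key = (entity_id, feature, version)
        hit = self._store.get(key)
        now = time.monotonic()
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]  # fresh hit: skip the warehouse round trip
        value = compute_fn(entity_id)  # in production, e.g. a Snowflake query
        self._store[key] = (value, now)
        return value
```

Putting the version in the cache key means a new feature definition never reads stale values computed under the old definition; reproducibility then comes from the registry pinning which version a given model was trained against.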

Schemas and Monitors: Designing for Reproducibility

Next is a disciplined schema strategy: versioned feature specs, strong typing, and clear TTL semantics. The schema acts as a contract between feature engineering, storage, and serving layers. Monitors keep an eye on drift, data quality, and latency, ensuring that any deviation triggers alerts and a rollback path. A simple pseudo schema helps illustrate the idea:

```
// Pseudo schema
const FeatureSpec = { name: string, type: 'float' | 'int' | 'string', ttl: number }
```

This approach supports offline/online parity, ensuring features produced during training align with those used at inference time [7][8]. When combined with a telemetry-rich registry, teams can answer questions like: which feature versions are used by a given model, and how did a drift event affect performance [9]?
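A runnable version of that contract might look like the following Python sketch. The fields mirror the pseudo schema above, while the `version` field and the `validate` helper are assumptions added here for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureSpec:
    """Versioned feature contract shared by engineering, storage, and serving."""
    name: str
    type: str        # one of 'float', 'int', 'string'
    ttl: int         # seconds a cached value stays valid
    version: int = 1

    # Unannotated class attribute: not a dataclass field.
    _PYTHON_TYPES = {'float': float, 'int': int, 'string': str}

    def __post_init__(self):
        if self.type not in self._PYTHON_TYPES:
            raise ValueError(f"unsupported feature type: {self.type!r}")

    def validate(self, value) -> bool:
        # Strong typing at the serving boundary: reject mistyped values early.
        return isinstance(value, self._PYTHON_TYPES[self.type])
```

Freezing the dataclass makes a spec immutable once registered, so changing a feature's type or TTL forces a new version rather than a silent in-place edit.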

Governance, Drift, and Canary Rollouts

A robust governance layer sits atop the pipeline to audit data access, feature lineage, and model decisions. Drift monitoring surfaces shifts in feature distributions or target leakage, prompting automatic canary rollouts and, when risks rise, rollbacks. The cadence is deliberate: deploy new features to a small subset, measure impact for a defined window, and progressively widen rollout while maintaining a rollback path if drift or latency spikes occur [10]. In short:

- Drift detection triggers canaries and rollback plans [11]
- Canaries decouple risk from velocity; roll back quickly if metrics degrade [12]
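One common way to quantify the distribution shift mentioned above is the Population Stability Index (PSI). The sketch below uses conventional rule-of-thumb thresholds (0.1 warn, 0.25 critical); `canary_action` is a hypothetical policy for illustration, not Block's actual mechanism:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index over matching histogram bins (shares summing to 1)."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

def canary_action(drift_score, warn=0.1, crit=0.25):
    """Map a drift score to a rollout decision."""
    if drift_score >= crit:
        return "rollback"   # distribution shifted badly: pull the canary
    if drift_score >= warn:
        return "hold"       # suspicious: freeze rollout, investigate
    return "widen"          # healthy: expand traffic to the new features
```

The `expected` bins come from the training (or baseline) window and the `actual` bins from live serving traffic, so the same monitor works for both feature drift and score drift.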

Rollout Plan and A/B Architecture

The rollout strategy embraces a staged A/B approach: start with a 5–10% traffic split to the new feature set, with latency budgets and drift alarms in place. If signals stay healthy for a defined period, incrementally increase to 50% and finally full production. Delayed labels feed into continuous evaluation dashboards, ensuring that the new features outperform the baseline not just on immediate latency, but on long-horizon fraud metrics and business outcomes [1][5]. Key questions to answer during rollout:

- How does the new feature version affect precision/recall under real-world traffic [2]?
- Do latency targets hold as shard counts grow, or do caching strategies need tuning [8]?
- Is there any drift that requires model retraining or feature retirement [9]?
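The staged split can be sketched as a tiny state machine; the stage ladder (5 → 10 → 50 → 100) follows the percentages above, and the function name is illustrative:

```python
def next_traffic_split(current_pct, window_healthy, stages=(5, 10, 50, 100)):
    """Advance rollout one stage when the health window stayed green; else roll back.

    window_healthy means latency budgets held and no drift alarms fired
    for the defined observation period.
    """
    if not window_healthy:
        return 0  # rollback path: route all traffic back to the baseline
    for stage in stages:
        if stage > current_pct:
            return stage
    return 100  # already at full production
```

Keeping the ladder explicit (rather than multiplying traffic by a factor) makes each widening step auditable, which matters when governance reviews the rollout history.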

Monitors and Metrics: What to Watch

A practical monitoring plan includes: latency distribution (p95/p99), feature-by-feature data freshness, drift scores, calibration metrics, and rollback readiness. An end-to-end view ties streaming signals, feature compute time, cache hits/misses, and online scoring latency into a single dashboard. Regular drills ensure the rollback canaries work as intended and that governance traces survive audits over extended periods [13][14].

Real-World Case Study: Block, Inc.

Block, the fintech behind Cash App, faced a common but critical challenge: scaling real-time fraud scoring for digital payments. The company relied on Snowflake's Data Cloud to centralize data and power ML-driven fraud detection across streaming and batch signals, enabling visibility and governance across teams. Key takeaway: a unified, governed data platform with streaming capabilities and in-database ML support unlocks rapid experimentation and safe, scalable fraud scoring at fintech scale. Clear feature versioning, TTL caching, drift monitoring, and canary-based rollouts are essential to production-grade fraud systems.
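For the latency side of that dashboard, a nearest-rank percentile over a window of request timings is enough to track p95/p99 against budget. `latency_report` is a hypothetical helper, and the 50 ms default budget echoes the target mentioned earlier:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

def latency_report(latencies_ms, budget_ms=50):
    p95, p99 = percentile(latencies_ms, 95), percentile(latencies_ms, 99)
    return {"p95": p95, "p99": p99, "within_budget": p99 <= budget_ms}
```

Checking p99 (not the mean) against the budget is deliberate: fraud scoring sits in the payment path, so the tail of the distribution is what users and downstream timeouts actually feel.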

System Flow

```mermaid
graph TD
  S[Streaming Signals] --> FC[Feature Computation]
  FC --> FS["Feature Store (Snowflake)"]
  FS --> OL[Online Scoring Service]
  OL --> LA[Latency Targets ~50ms]
  S2["Delayed Labels (up to 24h)"] -->|Feedback| OL
  OL --> GOV[Governance & Auditing]
  GOV --> DM[Drift Monitoring]
  DM --> Canary[Canary Rollouts]
  Canary --> Rollback[Rollback & Canaries]
  OL --> Cache[TTL Cache & Feature Versioning]
  Cache --> Offline[Offline Training Parity]
```

Did you know? Some fintechs run feature caches at the edge to shave tens of milliseconds off latency while keeping a single source of truth in the data cloud.

Key Takeaways

- Feature versioning enables reproducibility across training and serving
- TTL caching balances freshness and latency
- Drift monitoring triggers governance canaries and rollbacks

References

[1] Fraud Detection & Financial Crimes - Snowflake (article)
[2] Fraud detection - Wikipedia (article)
[3] Feast (Open-Source Feature Store) - GitHub
[4] Feast Documentation
[5] Amazon SageMaker Feature Store - documentation
[6] Amazon CloudFront Expiration - documentation
[7] HTTP Caching - MDN documentation
[8] RFC 7234 - HTTP Caching
[9] Kubernetes Documentation
[10] Snowpark for Python - GitHub


Wrapping Up

From Block’s real-world challenge to a structured, governance‑driven pipeline, the path to reliable real-time fraud scoring lies in a unified feature store with transparent versioning, TTL caching, and disciplined canary deployments. Tomorrow’s teams can reuse this blueprint to move faster without sacrificing trust.

Satishkumar Dhule
Software Engineer
