Designed and built a distributed data platform processing 100M+ customer records daily — powering identity resolution, real-time search, and enterprise analytics at scale.
Client: TELUS (Enterprise Telecom)
Role: Senior Backend / Data Platform Engineer
Duration: Feb 2025 – Present
Stack: Python · GCP · PySpark · BigQuery
Records processed daily: 100M+
Query cost reduction via Parquet architecture: 41%
Performance improvement on SQL & backend services: 30%
// data flow architecture
Ingestion: Enterprise Sources (CRM · CDC · APIs)
Processing: GCP Dataflow (PySpark · ETL)
Storage: GCS + Parquet (Optimized Storage)
Warehouse: BigQuery (Analytics · Search)
Insights: Looker / BI (Exec Dashboards)
Overview
🔴 The Problem
Fragmented customer data across 10+ enterprise systems with no unified identity layer
BigQuery queries scanning full tables — slow and expensive at 100M+ record scale
Manual workflows causing data latency of 24–48 hours in reporting
No automated monitoring — pipeline failures discovered post-mortem
🟢 The Solution
Built Veritas — a unified customer identity resolution and search platform
Implemented Parquet-based serving layer to bypass full BigQuery table scans
Designed distributed PySpark pipelines on GCP Dataflow for parallel processing
Automated workflows via GCP Workflows + Cloud Functions with alerting
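To make the identity resolution idea concrete, here is a minimal sketch of one common approach: linking records from different systems whenever they share a normalized identifier, using union-find. This is not the Veritas implementation. The field names (`id`, `email`, `phone`), the sample records, and the matching rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


def resolve_identities(records):
    """Group records that share any identifier value (union-find).

    `records` is a list of dicts with a unique "id" and optional
    "email" / "phone" keys. These field names are hypothetical.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first record id seen
    for r in records:
        for field in ("email", "phone"):
            value = r.get(field)
            if not value:
                continue
            key = (field, value.strip().lower())
            if key in first_seen:
                union(first_seen[key], r["id"])
            else:
                first_seen[key] = r["id"]

    clusters = defaultdict(list)
    for r in records:
        clusters[find(r["id"])].append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical records from four source systems.
records = [
    {"id": "crm-1", "email": "Ana@example.com"},
    {"id": "portal-7", "email": "ana@example.com", "phone": "555-0100"},
    {"id": "billing-3", "phone": "555-0100"},
    {"id": "support-9", "email": "bo@example.com"},
]
clusters = resolve_identities(records)
```

Here the CRM, portal, and billing records end up in one cluster because they are connected through a shared email and a shared phone number, while the support record stays on its own. At 100M+ records the same logic would need to run as a distributed connected-components job rather than in a single process.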
Live Platform Preview
Veritas · Customer Intelligence Platform · TELUS
RECORDS TODAY: 103.4M (↑ 2.1% vs yesterday)
PIPELINE STATUS: LIVE (All 6 jobs healthy)
QUERY LATENCY: 180ms (↓ 62% from baseline)
COST / QUERY: $0.003 (↓ 41% Parquet savings)
// records processed, last 7 days, Mon to Sun (millions) [chart]
// recent pipeline jobs
identity-resolution-v3: COMPLETE
cdc-ingestion-telco: COMPLETE
parquet-optimizer: RUNNING
bigquery-sync: COMPLETE
looker-refresh: SCHEDULED
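A health check over a job list like this one could be sketched as follows. The job names and statuses are taken from the list above. The set of statuses treated as healthy and the `FAILED` status in the second call are assumptions for illustration, not the platform's actual alerting rules.

```python
# Assumption: these three statuses count as healthy; anything else alerts.
HEALTHY = {"COMPLETE", "RUNNING", "SCHEDULED"}


def unhealthy_jobs(jobs):
    """Return the sorted names of jobs whose status is not in HEALTHY."""
    return sorted(name for name, status in jobs.items() if status not in HEALTHY)


jobs = {
    "identity-resolution-v3": "COMPLETE",
    "cdc-ingestion-telco": "COMPLETE",
    "parquet-optimizer": "RUNNING",
    "bigquery-sync": "COMPLETE",
    "looker-refresh": "SCHEDULED",
}

alerts = unhealthy_jobs(jobs)

# Hypothetical failure, to show what the check would flag.
alerts_on_failure = unhealthy_jobs({**jobs, "bigquery-sync": "FAILED"})
```

In a real deployment the returned names would feed a notification channel, so failures surface as they happen instead of being discovered afterwards.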
// data sources connected
CRM Platform: 38.2M rows
Telco CDR Events: 41.7M rows
Customer Portal: 12.4M rows
Support Tickets: 6.1M rows
Billing System: 5.0M rows
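As a quick sanity check, the per-source row counts can be reconciled against the daily total shown on the dashboard. The sketch below uses only figures from this page. The variable names are illustrative, and `Decimal` is used so the comparison is not affected by floating-point rounding.

```python
from decimal import Decimal

# Row counts per source, in millions, as listed above.
source_rows_millions = {
    "CRM Platform": Decimal("38.2"),
    "Telco CDR Events": Decimal("41.7"),
    "Customer Portal": Decimal("12.4"),
    "Support Tickets": Decimal("6.1"),
    "Billing System": Decimal("5.0"),
}

total = sum(source_rows_millions.values(), Decimal("0"))
reported_today = Decimal("103.4")  # RECORDS TODAY from the dashboard
matches = total == reported_today
```

The five sources add up to 103.4M, matching the RECORDS TODAY figure.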
Outcomes & Impact
📈 30% Processing Performance Gain
Optimized SQL queries and backend services significantly reduced compute time across all pipeline stages.