In Progress
This project is currently under development. Once complete, it will showcase:
- Automated ingestion of stock market data from APIs (Alpha Vantage, Yahoo Finance)
- Serverless data transformation via AWS Lambda
- Cloud data warehousing with Google BigQuery
- CI/CD and Infrastructure-as-Code with GitHub Actions and Terraform
- Airflow orchestration for reliable scheduling
- Monitoring, alerting, and cost optimization strategies
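As a rough illustration of the ingestion step, the sketch below flattens a daily time-series response into one row per trading day. It assumes the JSON layout Alpha Vantage uses for its daily endpoint (a `"Time Series (Daily)"` object keyed by date, with fields such as `"1. open"` and `"5. volume"`). The function name `normalize_daily_payload` and the output column names are hypothetical, not taken from the project code.

```python
from __future__ import annotations

from typing import Any


def normalize_daily_payload(symbol: str, payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten an Alpha Vantage-style daily payload into one row per trading day."""
    series = payload.get("Time Series (Daily)")
    if not series:
        # Assumption: error and rate-limit responses arrive as JSON without the series key.
        raise ValueError(f"no daily series for {symbol}; keys={sorted(payload)}")

    rows = []
    # ISO dates sort correctly as strings, so this yields oldest-first order.
    for day, bar in sorted(series.items()):
        rows.append(
            {
                "symbol": symbol,
                "date": day,
                "open": float(bar["1. open"]),
                "high": float(bar["2. high"]),
                "low": float(bar["3. low"]),
                "close": float(bar["4. close"]),
                "volume": int(bar["5. volume"]),
            }
        )
    return rows
```

Keeping the parsing separate from the HTTP call means it can be exercised with a saved sample response, without consuming API quota.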
Why I'm Building This (Educational Project)
I designed CloudScale ETL as a structured way to upskill in modern data engineering. It's an educational, portfolio-focused project intended to reinforce the fundamentals:
- Build a working end-to-end pipeline with clear code
- Practice cloud-native patterns (serverless, IaC, monitoring) without over-engineering
- Document architecture and tradeoffs so others can learn from my approach
This is not a production workload; it's a teaching vehicle and a skills benchmark that I'm iterating on publicly.
Architecture Overview (Preview)
The pipeline will follow this general flow:
┌────────────┐
│ Financial  │
│ APIs       │
│ (e.g., AV) │
└─────┬──────┘
      │
      ▼
┌─────────────┐
│ AWS S3      │
│ Raw Storage │
└─────┬───────┘
      │
      ▼
┌──────────────┐
│ AWS Lambda   │
│ (Transform)  │
└─────┬────────┘
      │
      ▼
┌─────────────────┐
│ Google BigQuery │
│ Warehouse       │
└─────┬───────────┘
      │
      ▼
┌──────────────┐
│ Data Studio /│
│ Looker / BI  │
└──────────────┘
Both development and production architectures will be containerized and cloud-integrated for scalability and cost-effectiveness.
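To make the S3-to-Lambda hop in the flow above more concrete, here is a minimal sketch of what a transform handler could look like. It assumes the Lambda is triggered by S3 object-created notifications and that raw files are JSON arrays of row dictionaries. The `transform` logic (dropping zero-volume rows and adding a `typical_price` column) is purely illustrative, and the BigQuery load step is deliberately left out.

```python
from __future__ import annotations

import json
from typing import Any
from urllib.parse import unquote_plus


def s3_objects_from_event(event: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def transform(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical transform: skip zero-volume rows, add a typical price."""
    out = []
    for row in rows:
        if row["volume"] <= 0:
            continue
        typical = (row["high"] + row["low"] + row["close"]) / 3
        out.append({**row, "typical_price": typical})
    return out


def handler(event: dict[str, Any], context: Any) -> dict[str, int]:
    """Lambda entry point: read each new raw object and transform its rows."""
    # Imported lazily so the pure functions above can be tested without AWS.
    import boto3

    s3 = boto3.client("s3")
    processed = 0
    for bucket, key in s3_objects_from_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = transform(json.loads(body))
        # Loading `rows` into the warehouse would happen here (not shown).
        processed += len(rows)
    return {"processed_rows": processed}
```

Splitting event parsing and transformation into pure functions keeps most of the logic testable locally, with only the thin `handler` wrapper touching AWS.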
Tech Stack
- Languages & Tools: Python, Docker, Terraform
- Cloud Providers: AWS (S3, Lambda, CloudWatch), GCP (BigQuery)
- Workflow Orchestration: Apache Airflow
- DevOps: GitHub Actions, LocalStack, Monitoring/Alerting
- Data Processing: Pandas, Technical Indicators (SMA, RSI, VWAP)
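For readers unfamiliar with the indicators listed above, the following sketch shows one common way to compute each of them. It is written in dependency-free Python for clarity, whereas the project itself lists Pandas for data processing. The RSI here uses simple averages over the lookback window rather than Wilder's smoothing, and the VWAP is cumulative over the whole input; both are assumptions about which variant is intended.

```python
from __future__ import annotations


def sma(values: list[float], window: int) -> list[float | None]:
    """Simple moving average; None until a full window is available."""
    out: list[float | None] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def vwap(prices: list[float], volumes: list[float]) -> list[float | None]:
    """Cumulative volume-weighted average price over the input sequence."""
    out: list[float | None] = []
    cum_pv = 0.0
    cum_v = 0.0
    for price, volume in zip(prices, volumes):
        cum_pv += price * volume
        cum_v += volume
        out.append(cum_pv / cum_v if cum_v else None)
    return out


def rsi(closes: list[float], period: int = 14) -> list[float | None]:
    """Relative Strength Index using simple averages of gains and losses."""
    out: list[float | None] = [None] * len(closes)
    for i in range(period, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - period + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            # No losses in the window: treated as 100 here (a flat window also lands here).
            out[i] = 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out
```

For example, `sma([1, 2, 3, 4, 5], 3)` leaves the first two positions empty and then averages each trailing three-value window.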
GitHub
You can follow the latest development progress or explore the source code here:
View on GitHub
Status
Architecture diagrams, transformation code, validation logic, and monitoring dashboards are under active development. Full documentation, a demo video, and a live walkthrough will be added once the project is complete.
Stay tuned!