CloudScale Finance ETL (In Progress)

🚧 In Progress

This project is currently under development. Once complete, it will showcase:

  • Automated ingestion of stock market data from APIs (Alpha Vantage, Yahoo Finance)
  • Serverless data transformation via AWS Lambda
  • Cloud data warehousing with Google BigQuery
  • CI/CD and Infrastructure-as-Code with GitHub Actions and Terraform
  • Airflow orchestration for reliable scheduling
  • Monitoring, alerting, and cost optimization strategies
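
As a rough illustration of the ingestion step, here is a minimal sketch using only the Python standard library. The endpoint and the `TIME_SERIES_DAILY` function name follow Alpha Vantage's public API; the helper names (`build_daily_url`, `fetch_json`), the retry count, and the backoff values are assumptions made for this example, not the project's final code.

```python
import json
import time
import urllib.parse
import urllib.request

ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"


def build_daily_url(symbol, api_key):
    """Build a request URL for Alpha Vantage's daily time series endpoint."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{ALPHA_VANTAGE_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url, retries=3, backoff_seconds=1.0, opener=urllib.request.urlopen):
    """Fetch and decode JSON, retrying with exponential backoff on failure.

    `opener` is injectable (hypothetical design choice) so the function can be
    exercised without network access.
    """
    last_error = None
    for attempt in range(retries):
        try:
            with opener(url) as response:
                return json.loads(response.read().decode("utf-8"))
        except (OSError, ValueError) as exc:  # network errors and bad JSON
            last_error = exc
            if attempt < retries - 1:
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

Passing the opener in as a parameter keeps the retry logic testable offline, which should also fit the LocalStack-based workflow planned for this project.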

🧠 Why I’m Building This (Educational Project)

I designed CloudScale Finance ETL as a structured way to upskill in modern data engineering. It’s an educational, portfolio-focused project intended to reinforce the fundamentals:

  • Build a working end-to-end pipeline with clear code
  • Practice cloud-native patterns (serverless, IaC, monitoring) without over-engineering
  • Document architecture and tradeoffs so others can learn from my approach

This is not a production workload; it’s a teaching vehicle and a skills benchmark that I’m iterating on publicly.

📊 Architecture Overview (Preview)

The pipeline will follow this general flow:

         ┌────────────┐
         │ Financial  │
         │   APIs     │
         │ (e.g., AV) │
         └────┬───────┘
              │
              ▼
       ┌─────────────┐
       │  AWS S3     │
       │ Raw Storage │
       └────┬────────┘
            │
            ▼
     ┌──────────────┐
     │ AWS Lambda   │
     │ (Transform)  │
     └────┬─────────┘
          │
          ▼
   ┌─────────────────┐
   │ Google BigQuery │
   │   Warehouse     │
   └────┬────────────┘
        │
        ▼
   ┌──────────────┐
   │ Data Studio /│
   │ Looker / BI  │
   └──────────────┘

Both development and production architectures will be containerized and cloud-integrated for scalability and cost-effectiveness.
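
To make the S3-to-Lambda-to-BigQuery hop more concrete, here is a hedged sketch of what the transform stage might look like. It assumes the raw objects hold Alpha Vantage-style daily payloads (`"Meta Data"` and `"Time Series (Daily)"` keys) and that the function is triggered by a standard S3 event notification. The names `flatten_daily_series`, `to_ndjson`, and `transform_event`, and the `read_object` callback, are hypothetical. In a real Lambda, a `handler(event, context)` function would presumably call `transform_event` with a reader backed by an S3 client.

```python
import json
import urllib.parse


def flatten_daily_series(payload):
    """Turn an Alpha Vantage-style daily payload into one flat row per trading day."""
    symbol = payload["Meta Data"]["2. Symbol"]
    rows = []
    for day, bar in sorted(payload["Time Series (Daily)"].items()):
        rows.append({
            "symbol": symbol,
            "date": day,
            "open": float(bar["1. open"]),
            "high": float(bar["2. high"]),
            "low": float(bar["3. low"]),
            "close": float(bar["4. close"]),
            "volume": int(bar["5. volume"]),
        })
    return rows


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, a format BigQuery load jobs accept."""
    return "\n".join(json.dumps(row) for row in rows)


def transform_event(event, read_object):
    """Process every object referenced in an S3 event notification.

    `read_object(bucket, key)` is a hypothetical callback returning the object's
    text; it stands in for an S3 client so the sketch runs without AWS access.
    """
    outputs = {}
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        payload = json.loads(read_object(bucket, key))
        outputs[key] = to_ndjson(flatten_daily_series(payload))
    return outputs
```

Keeping the parsing logic separate from the AWS plumbing means the same functions can run locally, in CI, or inside a container without changes.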

🛠️ Tech Stack

  • Languages & Tools: Python, Docker, Terraform
  • Cloud Providers: AWS (S3, Lambda, CloudWatch), GCP (BigQuery)
  • Workflow Orchestration: Apache Airflow
  • DevOps: GitHub Actions, LocalStack, Monitoring/Alerting
  • Data Processing: Pandas, Technical Indicators (SMA, RSI, VWAP)
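
For reference, the three indicators named above can be sketched as follows. This version is written in plain Python rather than Pandas so it has no dependencies; a Pandas implementation would likely lean on `rolling` and `cumsum` instead. The RSI uses Wilder's smoothing with a default period of 14, and the VWAP is cumulative over whatever bars are passed in (VWAP is normally reset per trading session, so applying it to daily bars is only an approximation). Function names and the bar dictionary layout are assumptions for this example.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def rsi(closes, period=14):
    """Relative Strength Index using Wilder's smoothing; None for the first `period` points."""
    out = [None] * len(closes)
    if len(closes) <= period:
        return out
    changes = [b - a for a, b in zip(closes, closes[1:])]
    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(max(c, 0.0) for c in changes[:period]) / period
    avg_loss = sum(max(-c, 0.0) for c in changes[:period]) / period
    for i in range(period, len(closes)):
        if i > period:
            change = changes[i - 1]
            avg_gain = (avg_gain * (period - 1) + max(change, 0.0)) / period
            avg_loss = (avg_loss * (period - 1) + max(-change, 0.0)) / period
        out[i] = 100.0 if avg_loss == 0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out


def vwap(bars):
    """Cumulative volume-weighted average price over bars with high, low, close, volume."""
    out = []
    cum_pv = 0.0
    cum_volume = 0.0
    for bar in bars:
        typical = (bar["high"] + bar["low"] + bar["close"]) / 3.0
        cum_pv += typical * bar["volume"]
        cum_volume += bar["volume"]
        out.append(cum_pv / cum_volume if cum_volume else None)
    return out
```

The rows produced by the transform stage already carry `high`, `low`, `close`, and `volume`, so they could be passed to `vwap` directly under these assumptions.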

🔗 GitHub

You can follow the latest development progress or explore the source code here:
👉 View on GitHub


📝 Status

Architecture diagrams, transformation code, validation logic, and monitoring dashboards are under active development. Full documentation, a demo video, and a live walkthrough will be added once the project is complete.

Stay tuned!