Infrastructure-as-Code Setup on GCP

    Provision a GCP environment using Terraform with BigQuery & Cloud Storage, staying within free tier limits


    This project was designed by data engineering professionals to simulate real-world scenarios used at companies like Netflix, Airbnb, and Spotify. Master Terraform, GCP, BigQuery, Cloud Storage, and IAM through hands-on implementation. Rated intermediate level, with comprehensive documentation and starter code.


    ☁️ Project: Infrastructure-as-Code Setup on GCP

    📌 Project Overview

    In this project, you'll use Terraform to provision a Google Cloud Platform (GCP) environment tailored for data engineering workflows. You'll define and deploy resources including a BigQuery dataset and a Cloud Storage bucket, which together form the foundation of a modern, cloud-native data stack.

    All resources used in this project are available under GCP's free tier. However, some services have usage limits — for example:

    • BigQuery: 10 GB of storage and 1 TB of query processing per month
    • Cloud Storage: 5 GB of regional storage and 1 GB of egress per month (in select regions)

    This project is designed to stay well within those limits by provisioning infrastructure only and running lightweight validation steps.


    🎯 Learning Objectives

    • Understand the role of Infrastructure-as-Code (IaC) in modern data workflows
    • Learn Terraform syntax, structure, and lifecycle
    • Use variables, outputs, and modules to write clean, reusable Terraform code
    • Implement access controls and best practices for cloud resource security
    • Set up a working GCP foundation for future analytics or ingestion pipelines

    📂 Project Structure

    terraform-gcp/
    ├── terraform/
    │   ├── main.tf             # Provider configuration and resource definitions
    │   ├── variables.tf        # Input variables (project ID, region, resource names)
    │   ├── outputs.tf          # Dataset and bucket IDs returned after apply
    │   └── versions.tf         # Terraform and provider version pins (sketched below)
    ├── config/
    │   └── credentials.json       # Service account key (excluded from version control)
    ├── docs/
    │   └── architecture.md        # Documentation for setup and resource structure
    ├── data/
    │   └── test.csv               # Dummy file for GCS test
    ├── README.md
    └── .gitignore
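
    A minimal versions.tf pins the Terraform CLI and Google provider versions so the project builds reproducibly. The exact version constraints below are illustrative assumptions, not requirements:

        terraform {
          required_version = ">= 1.5.0"

          required_providers {
            google = {
              source  = "hashicorp/google"
              version = "~> 5.0"   # assumed provider major version; pin to what you test against
            }
          }
        }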
    

    🔄 Step-by-Step Guide

    1. 🛠 Set Up Your Local Environment

    • Install Terraform CLI (free)
    • Install the Google Cloud SDK
    • Create a GCP project (new accounts come with $300 in free credits)
    • Enable required APIs:
      • BigQuery
      • Cloud Storage
      • IAM & Service Accounts
    • Create a service account with the following roles:
      • BigQuery Admin
      • Storage Admin
      • Project Viewer
    • Generate and save a JSON key for use with Terraform

    2. 📦 Initialize Terraform Project

    • Define provider configuration for GCP (see the sketch after this list)
    • Set up input variables for project ID, region, and resource names
    • Use outputs to return dataset and bucket IDs
    • Optionally configure a backend for remote state (for teams)
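
    A sketch of how main.tf, variables.tf, and outputs.tf could fit together is shown below. The defaults and resource names are placeholders, and the outputs reference the resources declared in steps 3 and 4:

        # main.tf -- provider configuration (credentials path matches the project layout above)
        provider "google" {
          project     = var.project_id
          region      = var.region
          credentials = file("../config/credentials.json")
        }

        # variables.tf -- input variables
        variable "project_id" {
          description = "GCP project ID to deploy into"
          type        = string
        }

        variable "region" {
          description = "Default region for all resources"
          type        = string
          default     = "us-central1"
        }

        variable "dataset_id" {
          description = "BigQuery dataset ID"
          type        = string
          default     = "demo_dataset"
        }

        variable "bucket_name" {
          description = "Globally unique Cloud Storage bucket name"
          type        = string
        }

        # outputs.tf -- values returned after apply
        output "dataset_id" {
          value = google_bigquery_dataset.demo.dataset_id
        }

        output "bucket_name" {
          value = google_storage_bucket.data_lake.name
        }

    Run terraform init once to download the Google provider before planning or applying. Alternatively, set the GOOGLE_APPLICATION_CREDENTIALS environment variable and omit the credentials argument.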

    3. 🧮 Provision BigQuery Resources

    • Use google_bigquery_dataset (sketched below)
    • Configure:
      • Dataset name and location
      • Default table expiration
      • Descriptive labels and environment metadata
    • (Optional) Create a sample table schema to validate setup
    • Use test queries with minimal data to stay within limits
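
    A possible dataset definition, with a small optional table to validate the setup. The location, expiration, labels, and schema are illustrative:

        # BigQuery dataset for the demo environment
        resource "google_bigquery_dataset" "demo" {
          dataset_id                  = var.dataset_id
          location                    = "US"
          default_table_expiration_ms = 7 * 24 * 3600 * 1000   # tables expire after 7 days

          labels = {
            env     = "dev"
            purpose = "iac-demo"
          }
        }

        # Optional sample table to confirm the dataset works end to end
        resource "google_bigquery_table" "sample" {
          dataset_id = google_bigquery_dataset.demo.dataset_id
          table_id   = "sample_events"

          schema = jsonencode([
            { name = "event_id", type = "STRING",    mode = "REQUIRED" },
            { name = "event_ts", type = "TIMESTAMP", mode = "NULLABLE" }
          ])
        }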

    4. 📁 Set Up Cloud Storage

    • Use google_storage_bucket (sketched below)
    • Configure:
      • Lifecycle rules to auto-delete test files
      • Folder structure (/raw/, /staging/, /processed/)
      • Uniform bucket-level access and IAM bindings
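
    One way the bucket might be declared, with a lifecycle rule and placeholder objects for the folder prefixes (GCS "folders" are just object name prefixes). The 7-day age and prefix names are assumptions:

        # Cloud Storage bucket; the name must be globally unique
        resource "google_storage_bucket" "data_lake" {
          name     = var.bucket_name
          location = var.region

          uniform_bucket_level_access = true
          force_destroy               = true    # lets terraform destroy remove leftover test objects

          lifecycle_rule {
            condition {
              age = 7                           # delete objects older than 7 days
            }
            action {
              type = "Delete"
            }
          }
        }

        # Placeholder objects that act as the raw/staging/processed "folders"
        resource "google_storage_bucket_object" "folders" {
          for_each = toset(["raw/", "staging/", "processed/"])

          bucket  = google_storage_bucket.data_lake.name
          name    = each.value
          content = " "                         # near-empty placeholder content
        }

    IAM bindings (for example, granting your service account roles/storage.objectAdmin on the bucket) can be added with google_storage_bucket_iam_member.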

    5. ✅ Testing and Validation

    • Run terraform plan to review the proposed changes, then terraform apply and confirm infrastructure creation
    • Use GCP Console and CLI to verify:
      • Dataset is listed and configured
      • Bucket is present and accessible
    • Upload a test .csv file to storage and preview it
    • Run a lightweight BigQuery query (e.g., row count)
    • Run terraform destroy when finished to clean up

    📦 Deliverables

    • Full Terraform configuration for BigQuery + Cloud Storage
    • Documentation on setup, architecture, and best practices
    • Sample test file and BQ table
    • Secure, reusable GCP environment for future data work

    🧪 Optional Extensions

    • Add support for environment separation (e.g., dev/staging/prod modules)
    • Set up CI/CD for Terraform using GitHub Actions
    • Configure audit logs or alerts on GCS and BQ usage
    • Use Terraform locals and modules for reuse and structure (sketched below)
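
    As a sketch of the last two ideas, a root configuration could instantiate one reusable child module per environment. The module path, its input names, and the per-environment values below are hypothetical:

        # Hypothetical reuse pattern: one child module, instantiated per environment
        locals {
          environments = {
            dev  = { dataset_id = "analytics_dev",  bucket_suffix = "dev" }
            prod = { dataset_id = "analytics_prod", bucket_suffix = "prod" }
          }
        }

        module "env" {
          for_each = local.environments
          source   = "./modules/gcp-data-stack"   # hypothetical child module path

          project_id  = var.project_id
          region      = var.region
          dataset_id  = each.value.dataset_id
          bucket_name = "${var.project_id}-data-${each.value.bucket_suffix}"
        }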

    💰 Cost Notes

    All resources used in this project are covered by GCP's always-free tier:

    • BigQuery: 10 GB of active storage and 1 TB of queries/month
    • Cloud Storage: 5 GB/month of regional storage (in select locations) and minimal egress
    • IAM & APIs: Free to use and configure

    ⚠️ To stay within free limits:

    • Keep test files and queries small
    • Delete files after use
    • Destroy resources once validation is complete

    Project Details

    Tools & Technologies

    Terraform
    GCP
    BigQuery
    Cloud Storage
    IAM

    Difficulty Level

    Intermediate

    Estimated Duration

    4-6 hours
