Infrastructure-as-Code Setup on GCP
Provision a GCP environment using Terraform with BigQuery & Cloud Storage, staying within free tier limits
📌 Project Overview
In this project, you'll use Terraform to provision a Google Cloud Platform (GCP) environment tailored for data engineering workflows. You'll define and deploy resources including a BigQuery dataset and a Cloud Storage bucket, which together form the foundation of a modern, cloud-native data stack.
All resources used in this project are available under GCP's free tier. However, some services have usage limits — for example:
- BigQuery: 10 GB of storage and 1 TB of query processing per month
- Cloud Storage: 5 GB of regional storage and 1 GB of egress per month (in select regions)
This project is designed to stay well within those limits by provisioning infrastructure only and running lightweight validation steps.
🎯 Learning Objectives
- Understand the role of Infrastructure-as-Code (IaC) in modern data workflows
- Learn Terraform syntax, structure, and lifecycle
- Use variables, outputs, and modules to write clean, reusable Terraform code
- Implement access controls and best practices for cloud resource security
- Set up a working GCP foundation for future analytics or ingestion pipelines
📂 Project Structure
```
terraform-gcp/
├── terraform/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── versions.tf
├── config/
│   └── credentials.json   # Service account key (excluded from version control)
├── docs/
│   └── architecture.md    # Documentation for setup and resource structure
├── data/
│   └── test.csv           # Dummy file for GCS test
├── README.md
└── .gitignore
```
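A minimal `.gitignore` for this layout might look like the following sketch; the entries for Terraform state and the service account key are the ones that matter most, since both can contain secrets:

```gitignore
# Terraform working directory and state
.terraform/
*.tfstate
*.tfstate.backup

# Variable files that may contain secrets
*.tfvars

# Service account key -- never commit credentials
config/credentials.json
```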
🔄 Step-by-Step Guide
1. 🛠 Set Up Your Local Environment
- Install Terraform CLI (free)
- Install the Google Cloud SDK
- Create a GCP project (new accounts come with $300 in free credits)
- Enable required APIs:
- BigQuery
- Cloud Storage
- IAM & Service Accounts
- Create a service account with the following roles:
- BigQuery Admin
- Storage Admin
- Project Viewer
- Generate and save a JSON key for use with Terraform
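If you prefer to keep even this bootstrap step in code, the API enablement, service account, and role grants can themselves be declared in Terraform. This is a sketch, assuming a `project_id` variable is defined; the account name `terraform-deployer` is illustrative:

```hcl
# Enable the APIs this project depends on
resource "google_project_service" "services" {
  for_each = toset([
    "bigquery.googleapis.com",
    "storage.googleapis.com",
    "iam.googleapis.com",
  ])
  project = var.project_id
  service = each.key
}

# Service account that Terraform will use for subsequent runs
resource "google_service_account" "terraform" {
  account_id   = "terraform-deployer" # hypothetical name
  display_name = "Terraform deployment account"
  project      = var.project_id
}

# Grant the three roles listed above
resource "google_project_iam_member" "roles" {
  for_each = toset([
    "roles/bigquery.admin",
    "roles/storage.admin",
    "roles/viewer",
  ])
  project = var.project_id
  role    = each.key
  member  = "serviceAccount:${google_service_account.terraform.email}"
}
```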
2. 📦 Initialize Terraform Project
- Define provider configuration for GCP
- Set up input variables for project ID, region, and resource names
- Use outputs to return dataset and bucket IDs
- Optionally configure a remote state backend (e.g., a GCS bucket) so teams can share state safely
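The provider and variable wiring described above could be sketched as follows. The region default and the relative credentials path are assumptions based on the project structure (running Terraform from the `terraform/` directory):

```hcl
terraform {
  required_version = ">= 1.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project     = var.project_id
  region      = var.region
  credentials = file("../config/credentials.json")
}

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "Region for regional resources"
  type        = string
  default     = "us-central1" # one of the always-free storage regions
}
```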
3. 🧮 Provision BigQuery Resources
- Use the `google_bigquery_dataset` resource
- Configure:
- Dataset name and location
- Default table expiration
- Descriptive labels and environment metadata
- (Optional) Create a sample table schema to validate setup
- Use test queries with minimal data to stay within limits
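A dataset definition covering the points above might look like this sketch; the dataset name, labels, and 7-day expiration are illustrative choices, not requirements:

```hcl
resource "google_bigquery_dataset" "analytics" {
  dataset_id                  = "analytics_dev" # illustrative name
  location                    = var.region
  default_table_expiration_ms = 604800000 # 7 days, keeps test tables from lingering

  labels = {
    environment = "dev"
    managed_by  = "terraform"
  }
}

output "dataset_id" {
  value = google_bigquery_dataset.analytics.dataset_id
}
```

Setting a default table expiration is a cheap safeguard for staying inside the 10 GB free storage limit.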
4. 📁 Set Up Cloud Storage
- Use the `google_storage_bucket` resource
- Configure:
- Lifecycle rules to auto-delete test files
- Folder structure (`/raw/`, `/staging/`, `/processed/`)
- Uniform bucket-level access and IAM bindings
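The bucket configuration above could be sketched as follows; the naming scheme and 7-day deletion rule are assumptions (bucket names must be globally unique, so prefixing with the project ID is a common convention):

```hcl
resource "google_storage_bucket" "data_lake" {
  name                        = "${var.project_id}-data-lake" # names are globally unique
  location                    = var.region
  uniform_bucket_level_access = true
  force_destroy               = true # lets `terraform destroy` remove a non-empty bucket

  # Auto-delete test files so storage stays within the free tier
  lifecycle_rule {
    condition {
      age = 7 # days since object creation
    }
    action {
      type = "Delete"
    }
  }
}

output "bucket_name" {
  value = google_storage_bucket.data_lake.name
}
```

Note that GCS has no real folders; `/raw/`, `/staging/`, and `/processed/` are simply object-name prefixes you adopt by convention when uploading.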
5. ✅ Testing and Validation
- Run `terraform apply` and confirm infrastructure creation
- Use the GCP Console and CLI to verify:
- Dataset is listed and configured
- Bucket is present and accessible
- Upload a test `.csv` file to storage and preview it
- Run a lightweight BigQuery query (e.g., a row count)
- Run `terraform destroy` when finished to clean up
📦 Deliverables
- Full Terraform configuration for BigQuery + Cloud Storage
- Documentation on setup, architecture, and best practices
- Sample test file and BQ table
- Secure, reusable GCP environment for future data work
🧪 Optional Extensions
- Add support for environment separation (e.g., dev/staging/prod modules)
- Set up CI/CD for Terraform using GitHub Actions
- Configure audit logs or alerts on GCS and BQ usage
- Use Terraform `locals` and modules for reuse and structure
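As a sketch of that last extension, shared values can be centralized in `locals` and passed into a module. The `./modules/storage` path and its input names are hypothetical; you would define that module yourself:

```hcl
locals {
  env = "dev"
  labels = {
    environment = local.env
    managed_by  = "terraform"
  }
}

# Hypothetical local module wrapping the bucket resources
module "storage" {
  source     = "./modules/storage"
  project_id = var.project_id
  labels     = local.labels
}
```

Repeating the module block with different inputs (e.g., `env = "staging"`) is the usual path to the dev/staging/prod separation mentioned above.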
💰 Cost Notes
All resources used in this project are covered by GCP's always-free tier:
- BigQuery: 10 GB of active storage and 1 TB of queries/month
- Cloud Storage: 5 GB/month of regional storage (in select locations) and minimal egress
- IAM & APIs: Free to use and configure
⚠️ To stay within free limits:
- Keep test files and queries small
- Delete files after use
- Destroy resources once validation is complete