Geolocation Data Curation

🏡 Explore: Residential Real Estate Data Curation Project

Transforming public real estate data into actionable insights with AWS and geolocation enrichment.

Overview

This project transforms public residential real estate data from Zillow Research into curated, geo-enriched datasets. We combine home values and rental trends with geographic intelligence from Google Maps, Esri ArcGIS, and U.S. Census data, all processed using AWS-native services. Everything is transparent, reproducible, and free.

📥 1. Where the Data Came From

  • Zillow Research – Home value, rent, and sales data
  • Google Maps Platform – Geolocation and place metadata
  • Esri ArcGIS – Regional overlays and metro boundaries
  • U.S. Census Bureau – ZIP-to-FIPS mappings and demographic overlays

🛠️ 2. How It Was Processed

We used a three-tier data pipeline:
  • Bronze – Raw Zillow CSVs stored in S3
  • Silver – Data cleaned and standardized using AWS Glue
  • Gold – Enriched using AWS Lambda with:
    • Google Maps API for location details
    • Esri ArcGIS for boundaries and overlays
    • Census data for FIPS and ZIP alignment
    • Inflation-adjusted pricing
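
As a rough illustration of the Bronze-to-Silver step, the sketch below reshapes a raw CSV into one row per region and period. It is plain Python standing in for the Glue job, not the job itself. It assumes the raw files use a wide layout with one column per month; the column names in ID_COLUMNS and the sample row are hypothetical.

```python
import csv
import io

# Assumed identifier columns; the real raw files may name these differently.
ID_COLUMNS = ("RegionID", "RegionName", "State")


def wide_to_long(csv_text, id_columns=ID_COLUMNS):
    """Turn one-column-per-month rows into one row per (region, period)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    long_rows = []
    for row in reader:
        ids = {col: row[col] for col in id_columns}
        for column, raw in row.items():
            if column in id_columns:
                continue
            text = (raw or "").strip()
            # Keep blanks as None so later steps can decide how to treat gaps.
            value = float(text) if text else None
            long_rows.append({**ids, "period": column, "value": value})
    return long_rows


# Made-up sample row, for illustration only.
sample = (
    "RegionID,RegionName,State,2023-01-31,2023-02-28\n"
    "1001,02134,MA,500000.0,\n"
)
rows = wide_to_long(sample)
```

Reading every field as text keeps ZIP codes such as 02134 intact, and leaving blanks as None defers the decision about missing values to a later step.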

☁️ 3. AWS Services Used

  • S3 – Storage of raw, cleaned, and enriched data
  • AWS Glue – ETL and transformations
  • AWS Lambda – External enrichment and logic
  • CloudWatch – Logging and monitoring
  • IAM – Scoped permissions for all services
  • Step Functions (planned) – Visual orchestration
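
To make the Lambda enrichment role more concrete, here is a minimal sketch of a handler that adds coordinates to each record. The event shape ({"records": [...]}), the field names, and the fake_geocode stand-in are all assumptions for illustration; a real function would call an external geocoding service where the stand-in is used.

```python
def enrich_record(record, geocode):
    """Return a copy of `record` with lat/lng added when a lookup succeeds."""
    enriched = dict(record)
    location = geocode(record["zip"])
    if location is not None:
        enriched["lat"], enriched["lng"] = location
    # Flag the outcome so failed lookups stay visible downstream.
    enriched["geocoded"] = location is not None
    return enriched


def fake_geocode(zip_code):
    """Hypothetical stand-in for an external lookup; makes no network call."""
    placeholder = {"12345": (1.0, 2.0)}  # placeholder coordinates, not real data
    return placeholder.get(zip_code)


def handler(event, context, geocode=fake_geocode):
    """Lambda-style entry point: enrich every record in event['records']."""
    records = event.get("records", [])
    return {"records": [enrich_record(r, geocode) for r in records]}


result = handler({"records": [{"zip": "12345"}, {"zip": "99999"}]}, None)
```

Passing the geocoder in as a parameter keeps the handler testable without network access or API keys.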

🧠 4. What Decisions Were Made During Curation

  • Used Google and Esri APIs for detailed geolocation enrichment
  • Standardized ZIP and region mappings across sources
  • Adjusted all pricing data for inflation using CPI
  • Interpolated missing values only where statistically supported
  • Unified all datasets into a consistent schema for querying
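
Three of these decisions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the CPI figures in the usage lines are made up, and the "fill only short interior gaps" rule is a stand-in for whatever statistical criterion the pipeline applies.

```python
def normalize_zip(raw):
    """Return a 5-character ZIP string, restoring dropped leading zeros."""
    base = str(raw).strip().split("-")[0]  # discard any ZIP+4 suffix
    if not base.isdigit() or len(base) > 5:
        raise ValueError(f"not a ZIP code: {raw!r}")
    return base.zfill(5)


def adjust_for_inflation(nominal, cpi_then, cpi_base):
    """Express a nominal price in base-period dollars using a CPI ratio."""
    return nominal * cpi_base / cpi_then


def fill_short_gaps(values, max_gap=1):
    """Linearly fill runs of None no longer than max_gap that have
    known values on both sides; leave every other gap untouched."""
    filled = list(values)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start
        if start > 0 and i < n and gap <= max_gap:
            left, right = filled[start - 1], filled[i]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


zip_code = normalize_zip(2134)                          # "02134"
real_price = adjust_for_inflation(200000, 250.0, 300.0)  # 240000.0
series = fill_short_gaps([100.0, None, 120.0, None, None, 150.0])
# series == [100.0, 110.0, 120.0, None, None, 150.0]
```

The two-month gap is left unfilled because it exceeds max_gap, which mirrors the idea of interpolating only where the surrounding data supports it.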

🔍 Why It Matters

This project enables:
  • Real estate market trend analysis
  • Geospatial and demographic modeling
  • Regional housing comparisons
  • Smart dashboards and public policy insights

Whether you’re a developer, analyst, policymaker, or just curious, this dataset is built to support your work.