HouseTS: A Large-Scale, Multimodal Spatiotemporal U.S. Housing Dataset

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Long-term housing price forecasting lacks a large-scale, reproducible benchmark dataset that simultaneously captures spatiotemporal depth, multimodal richness, and real-world granularity. Method: We construct a monthly multimodal spatiotemporal dataset covering 6,000 ZIP codes across 30 U.S. metropolitan areas from 2012–2023 (890K+ records), integrating satellite imagery, point-of-interest (POI) data, socioeconomic indicators, fine-grained real-estate features, and natural-language descriptions. We propose a novel multimodal urban evolution analysis paradigm, unifying vision-language models (VLMs), POI embeddings, geographically weighted feature engineering, and a unified time-series modeling framework. Contribution/Results: We release an open-source dataset and a standardized benchmark comprising 14 baseline models. Our approach enables interpretable, satellite-driven textual generation of urban-change narratives, significantly enhancing prediction transparency and policy relevance. The benchmark establishes new standards for reproducibility, scalability, and multimodal integration in urban forecasting.

📝 Abstract
Accurate house-price forecasting is essential for investors, planners, and researchers. However, reproducible benchmarks with sufficient spatiotemporal depth and contextual richness for long-horizon prediction remain scarce. To address this, we introduce HouseTS, a large-scale, multimodal dataset covering monthly house prices from March 2012 to December 2023 across 6,000 ZIP codes in 30 major U.S. metropolitan areas. The dataset includes over 890K records, enriched with points of interest (POI), socioeconomic indicators, and detailed real-estate metrics. To establish standardized performance baselines, we evaluate 14 models, spanning classical statistical approaches, deep neural networks (DNNs), and pretrained time-series foundation models. We further demonstrate the value of HouseTS in a multimodal case study, where a vision-language model extracts structured textual descriptions of geographic change from time-stamped satellite imagery. This enables interpretable, grounded insights into urban evolution. HouseTS is hosted on Kaggle, while all preprocessing pipelines, benchmark code, and documentation are openly maintained on GitHub to ensure full reproducibility and easy adoption.
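To make the benchmarking setup concrete, here is a minimal sketch of the simplest kind of classical baseline the abstract alludes to: a last-value (naive) forecast on one ZIP code's monthly price series, scored with MAPE. The toy series, horizon, and metric choice are assumptions for illustration, not the dataset's actual schema or the paper's evaluation protocol.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed price for each future month (naive baseline)."""
    return [history[-1]] * horizon

def mape(actual, predicted):
    """Mean absolute percentage error, a common forecasting metric."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Toy monthly series standing in for one ZIP code's median price (hypothetical values).
prices = [300_000, 305_000, 310_000, 308_000, 315_000, 320_000]

train, test = prices[:4], prices[4:]              # hold out the last 2 months
preds = naive_forecast(train, horizon=len(test))  # [308_000, 308_000]
print(round(mape(test, preds), 4))                # prints 0.0299
```

Stronger baselines in the benchmark (statistical models, DNNs, foundation models) would replace `naive_forecast` while keeping the same train/test split and metric, which is what makes the comparison standardized.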
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale housing datasets with spatiotemporal depth
Need for standardized benchmarks in house-price forecasting
Limited multimodal integration of housing data with contextual factors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale multimodal housing dataset
Standardized benchmarks with 14 models
Vision-language model for interpretable urban-change insights