DiTEC-WDN: A Large-Scale Dataset of Water Distribution Network Scenarios under Diverse Hydraulic Conditions

📅 2025-03-21
🤖 AI Summary
Privacy constraints hinder the sharing of real-world water distribution network (WDN) models, impeding the development of data-driven methods. To address this, we propose DiTEC-WDN, the first large-scale, publicly licensed synthetic WDN simulation dataset. It encompasses 36,000 operational scenarios, each simulated over either a short-term (24-hour) or long-term (1-year) period, yielding 228 million graph-structured hydraulic states. Our methodology integrates automated EPANET-based simulation, multi-objective parameter optimization, graph-state encoding, and rule-based consistency verification, which keeps the states hydraulically realistic while eliminating the risk of exposing sensitive data. DiTEC-WDN supports tasks at several granularities: graph-level, node-level, and edge-level regression, as well as time-series forecasting. It has already been used to train and benchmark multiple AI models for water systems, filling a gap in publicly available benchmarks and supporting standardized comparison in AI research for the water industry.

📝 Abstract
Privacy restrictions hinder the sharing of real-world Water Distribution Network (WDN) models, limiting the application of emerging data-driven machine learning, which typically requires extensive observations. To address this challenge, we propose the dataset DiTEC-WDN that comprises 36,000 unique scenarios simulated over either short-term (24 hours) or long-term (1 year) periods. We constructed this dataset using an automated pipeline that optimizes crucial parameters (e.g., pressure, flow rate, and demand patterns), facilitates large-scale simulations, and records discrete, synthetic but hydraulically realistic states under standard conditions via rule validation and post-hoc analysis. With a total of 228 million generated graph-based states, DiTEC-WDN can support a variety of machine-learning tasks, including graph-level, node-level, and link-level regression, as well as time-series forecasting. This contribution, released under a public license, encourages open scientific research in the critical water sector, eliminates the risk of exposing sensitive data, and fulfills the need for a large-scale water distribution network benchmark for study comparisons and scenario analysis.

Problem

Research questions and friction points this paper is trying to address.

Privacy restrictions limit sharing real-world water network data
Lack of large-scale datasets for water network machine learning
Need synthetic but realistic hydraulic scenarios for research

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline optimizes hydraulic parameters
Generates synthetic hydraulically realistic states
Large-scale graph-based states for ML tasks
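
To make "graph-based states" concrete, here is a minimal sketch of what one hydraulic snapshot and its regression targets could look like. The class name, field names, units, and pressure bounds are all hypothetical and chosen only for illustration; they are not the dataset's actual schema or the authors' validation rules.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class HydraulicState:
    """One network snapshot at a single time step (hypothetical layout)."""
    node_pressure: Dict[str, float]            # node id -> pressure (assumed metres)
    node_demand: Dict[str, float]              # node id -> demand (assumed L/s)
    link_flow: Dict[Tuple[str, str], float]    # (from node, to node) -> flow (assumed L/s)


def is_plausible(state: HydraulicState,
                 min_pressure: float = 0.0,
                 max_pressure: float = 150.0) -> bool:
    """Illustrative rule check: every node pressure lies within assumed bounds."""
    return all(min_pressure <= p <= max_pressure
               for p in state.node_pressure.values())


def targets(state: HydraulicState) -> dict:
    """Split one state into node-level, link-level, and graph-level targets."""
    return {
        "node": state.node_pressure,               # one value per node
        "link": state.link_flow,                   # one value per link
        "graph": sum(state.node_demand.values()),  # one value per snapshot
    }


# Toy two-node network with a single pipe.
state = HydraulicState(
    node_pressure={"J1": 42.0, "J2": 38.5},
    node_demand={"J1": 1.5, "J2": 2.5},
    link_flow={("J1", "J2"): 2.5},
)
```

A time-series forecasting task would then operate on an ordered sequence of such states for one scenario, while a rule check like `is_plausible` stands in for the kind of filter a validation step might apply before a state is recorded.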