gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation

📅 2025-12-16
🤖 AI Summary
Existing PF/OPF datasets suffer from three key limitations: (1) inaccurate perturbation modeling that lacks realistic temporal load variations and N-k topological contingencies; (2) PF samples confined to the feasible region, omitting critical constraint violations (e.g., line overloads, voltage limit violations); and (3) fixed OPF cost functions, which limit generalizability. To address these, we introduce gridfm-datakit-v1, the first open-source Python library supporting large-scale power systems (up to 10,000 buses). It integrates realistic load scaling, localized noise injection, and arbitrary N-k topology perturbations in a unified data generation pipeline. It systematically produces PF samples that include all types of constraint violations, as well as diverse OPF datasets with multiple cost functions. Implemented via high-performance parallel computation on PyPower/Pandapower, our framework achieves over 3× greater scenario diversity and 100% coverage of violation states compared to tools like OPFData, which improves the robustness and generalization of ML-based OPF solvers.

📝 Abstract
We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existing datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting scenario diversity; (2) PF datasets are restricted to OPF-feasible points, hindering generalization of ML solvers to cases that violate operating limits (e.g., branch overloads or voltage violations); and (3) OPF datasets use fixed generator cost functions, limiting generalization across varying costs. gridfm-datakit addresses these challenges by: (1) combining global load scaling from real-world profiles with localized noise and supporting arbitrary N-k topology perturbations to create diverse yet realistic datasets; (2) generating PF samples beyond operating limits; and (3) producing OPF data with varying generator costs. It also scales efficiently to large grids (up to 10,000 buses). Comparisons with OPFData, OPF-Learn, PGLearn, and PFΔ are provided. Available on GitHub at https://github.com/gridfm/gridfm-datakit under Apache 2.0 and via `pip install gridfm-datakit`.
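
The load perturbation described in the abstract (a global scaling factor taken from a load profile, multiplied by localized per-bus noise) can be illustrated with a minimal standard-library sketch. This is not the gridfm-datakit API: the function name `perturb_loads`, the `noise_std` parameter, the Gaussian noise model, and the normalisation of the profile by its peak are all assumptions made for illustration.

```python
import random


def perturb_loads(base_load_mw, profile, noise_std=0.05, seed=0):
    """Return one perturbed load vector per profile step (hypothetical helper).

    base_load_mw: nominal active load per bus.
    profile: aggregated load values, one per scenario (assumed input format).
    noise_std: assumed standard deviation of the per-bus multiplicative noise.
    """
    rng = random.Random(seed)
    peak = max(profile)
    scenarios = []
    for value in profile:
        # Global factor: the same scaling is applied to every bus in a scenario.
        global_factor = value / peak
        scenario = []
        for load in base_load_mw:
            # Local factor: independent noise per bus, clipped at zero so that
            # a load never changes sign.
            local_factor = max(0.0, 1.0 + rng.gauss(0.0, noise_std))
            scenario.append(global_factor * local_factor * load)
        scenarios.append(scenario)
    return scenarios


base = [50.0, 20.0, 80.0]               # illustrative bus loads in MW
profile = [600.0, 750.0, 900.0, 820.0]  # illustrative aggregated profile
loads = perturb_loads(base, profile)
print(len(loads), len(loads[0]))  # 4 3
```

Separating the global and local factors keeps scenarios correlated across buses, as a real system-wide load curve would, while still letting individual buses deviate from it.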

Problem

Research questions and friction points this paper is trying to address.

Generates realistic PF and OPF datasets for ML solvers
Addresses lack of scenario diversity and limited generalization
Scales efficiently to large power grids up to 10,000 buses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates diverse PF/OPF data with realistic load and topology perturbations
Produces PF samples beyond operating limits for better ML generalization
Creates OPF data with varying generator costs to enhance model robustness
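
The topology and cost perturbations listed above can be sketched in the same spirit. The code below is an illustrative assumption, not the library's implementation: `sample_n_minus_k`, `is_connected`, and `scale_costs` are hypothetical helper names, the grid is reduced to a list of `(from_bus, to_bus)` pairs, the connectivity check is an extra step added here, and the uniform cost multiplier is one possible choice among many.

```python
import random


def sample_n_minus_k(branches, k, rng):
    """Take k randomly chosen branches out of service (hypothetical helper).

    Returns (kept, removed) as two lists of (from_bus, to_bus) tuples.
    """
    if not 0 <= k <= len(branches):
        raise ValueError("k must be between 0 and the number of branches")
    outaged = set(rng.sample(range(len(branches)), k))
    kept = [b for i, b in enumerate(branches) if i not in outaged]
    removed = [b for i, b in enumerate(branches) if i in outaged]
    return kept, removed


def is_connected(n_buses, branches):
    """Depth-first search from bus 0; True if every bus is reachable."""
    adjacency = {bus: [] for bus in range(n_buses)}
    for from_bus, to_bus in branches:
        adjacency[from_bus].append(to_bus)
        adjacency[to_bus].append(from_bus)
    seen, stack = {0}, [0]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == n_buses


def scale_costs(linear_costs, spread, rng):
    """Multiply each cost coefficient by a factor in [1 - spread, 1 + spread]."""
    return [c * rng.uniform(1.0 - spread, 1.0 + spread) for c in linear_costs]


rng = random.Random(42)
branches = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # toy 4-bus grid
kept, removed = sample_n_minus_k(branches, 2, rng)   # an N-2 scenario
usable = is_connected(4, kept)                       # False if buses are islanded
costs = scale_costs([10.0, 25.0, 40.0], 0.2, rng)
print(len(kept), len(removed), usable)
```

Whether an islanded topology should be discarded or kept as a sample is a design decision; the sketch only reports connectivity and leaves that choice to the caller.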
Alban Puech
IBM Research, ETH Zurich
Matteo Mazzonelli
IBM Research, ETH Zurich
Celia Cintas
IBM Research
Tamara R. Govindasamy
IBM Research
Mangaliso Mngomezulu
IBM Research
Jonas Weiss
IBM Research
Matteo Baù
RSE S.p.A.
Anna Varbella
ETH Zurich
François Mirallès
Hydro-Québec Research Institute
Kibaek Kim
Argonne National Laboratory
Optimization, Distributed Learning, Power Systems
Le Xie
Gordon McKay Professor of Electrical Engineering, Harvard University
Power Systems Economics, Data Sciences, Public Policy, Artificial Intelligence
Hendrik F. Hamann
Stony Brook University, Brookhaven National Laboratory
Etienne Vos
IBM Research
Thomas Brunschwiler
IBM Research
Physics & AI for Climate Impact