gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation

📅 2025-12-16

🤖 AI Summary

Existing PF/OPF datasets suffer from three key limitations: (1) inaccurate perturbation modeling—lacking realistic temporal load variations and N-k topological contingencies; (2) PF samples confined to the feasible region, omitting critical constraint violations (e.g., line overloads, voltage limit violations); and (3) fixed OPF cost functions, limiting generalizability. To address these, we introduce the first open-source Python library supporting large-scale power systems (up to 10,000 buses). Our method pioneers a unified data generation paradigm integrating realistic load scaling, localized noise injection, and arbitrary N-k topology perturbations. It systematically produces comprehensive PF samples—including all types of constraint violations—and diverse OPF datasets with multiple cost functions. Implemented via high-performance parallel computation on PyPower/Pandapower, our framework achieves over 3× greater scenario diversity and 100% coverage of violation states compared to tools like OPFData, significantly enhancing robustness and generalization of ML-based OPF solvers.

📝 Abstract

We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existing datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting scenario diversity; (2) PF datasets are restricted to OPF-feasible points, hindering generalization of ML solvers to cases that violate operating limits (e.g., branch overloads or voltage violations); and (3) OPF datasets use fixed generator cost functions, limiting generalization across varying costs. gridfm-datakit addresses these challenges by: (1) combining global load scaling from real-world profiles with localized noise and supporting arbitrary N-k topology perturbations to create diverse yet realistic datasets; (2) generating PF samples beyond operating limits; and (3) producing OPF data with varying generator costs. It also scales efficiently to large grids (up to 10,000 buses). Comparisons with OPFData, OPF-Learn, PGLearn, and PFΔ are provided. Available on GitHub at https://github.com/gridfm/gridfm-datakit under Apache 2.0 and via `pip install gridfm-datakit`.
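To make the load perturbation idea in the abstract concrete, here is a minimal sketch of global scaling combined with localized noise. It does not use the gridfm-datakit API: the function `perturb_loads`, its parameters, the uniform noise model, and the sample numbers are all assumptions chosen for illustration.

```python
import random


def perturb_loads(base_loads, profile, noise=0.05, seed=0):
    """Hypothetical helper: one load scenario per profile step.

    base_loads: nominal load per bus (MW).
    profile:    aggregated demand over time (any unit); it is normalised
                by its peak so the global factor lies in (0, 1].
    noise:      half-width of the per-bus multiplicative noise (assumed uniform).
    """
    rng = random.Random(seed)
    peak = max(profile)
    scenarios = []
    for value in profile:
        scale = value / peak  # global factor shared by every bus at this step
        scenarios.append([
            # local factor drawn independently for each bus
            load * scale * rng.uniform(1.0 - noise, 1.0 + noise)
            for load in base_loads
        ])
    return scenarios


base_loads = [100.0, 50.0, 25.0]   # illustrative nominal loads in MW
profile = [0.6, 0.8, 1.0, 0.7]     # illustrative demand profile
scenarios = perturb_loads(base_loads, profile)
print(len(scenarios), "scenarios for", len(scenarios[0]), "buses")
```

The global factor keeps the bus loads correlated across the grid, as a real demand profile would, while the per-bus factor adds variation between buses within each scenario.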

Problem

Research questions and friction points this paper is trying to address.

Generates realistic PF and OPF datasets for ML solvers

Addresses lack of scenario diversity and limited generalization

Scales efficiently to large power grids up to 10,000 buses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates diverse PF/OPF data with realistic load and topology perturbations

Produces PF samples beyond operating limits for better ML generalization

Creates OPF data with varying generator costs to enhance model robustness
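The topology and cost perturbations listed above can be sketched in a few lines of plain Python. This is an illustration only, not the library's implementation: `sample_n_minus_k`, `perturb_costs`, the rejection of outage sets that island the grid, and the uniform cost scaling are all assumptions.

```python
import random


def is_connected(n_buses, branches):
    """Depth-first search from bus 0 over the remaining branches."""
    adjacency = {bus: [] for bus in range(n_buses)}
    for u, v in branches:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == n_buses


def sample_n_minus_k(n_buses, branches, k, rng, max_tries=100):
    """Hypothetical helper: drop k random branches, retrying when the
    outage splits the grid (an assumed acceptance rule)."""
    for _ in range(max_tries):
        dropped = set(rng.sample(range(len(branches)), k))
        kept = [br for i, br in enumerate(branches) if i not in dropped]
        if is_connected(n_buses, kept):
            return kept
    return None  # no connected topology found within max_tries


def perturb_costs(costs, spread, rng):
    """Hypothetical helper: scale each generator cost coefficient by an
    independent uniform factor in [1 - spread, 1 + spread]."""
    return [c * rng.uniform(1.0 - spread, 1.0 + spread) for c in costs]


rng = random.Random(0)
branches = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # toy 4-bus grid
topology = sample_n_minus_k(4, branches, k=2, rng=rng)
costs = perturb_costs([20.0, 35.0], spread=0.2, rng=rng)
print(topology, costs)
```

Each sampled topology and cost vector would then be passed to a PF or OPF solver to produce one data point.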
