🤖 AI Summary
This work proposes a configurable large language model (LLM)-assisted pipeline to systematically translate high-level sustainability regulations into sector-specific, verifiable public procurement criteria—a process traditionally reliant on manual effort and expert knowledge. By integrating contextual prompting with structured policy documents, the approach enables auditable, cross-sector automation aligned with Swiss regulatory requirements. The framework incorporates automated validation and LLM-driven quality assessment mechanisms to ensure rigor and consistency. Experimental results demonstrate that the generated criteria exhibit strong alignment with official guidelines, achieving high performance in both automated checks and expert evaluations, thereby substantially reducing the burden of manual drafting.
📝 Abstract
Public procurement refers to the process by which public sector institutions, such as governments, municipalities, and publicly funded bodies, acquire goods and services. Swiss law requires the integration of ecological, social, and economic sustainability requirements into tender evaluations in the form of criteria that bidders must fulfill. However, translating high-level sustainability regulations into concrete, verifiable, and sector-specific procurement criteria (such as selection criteria, award criteria, and technical specifications) remains a labor-intensive and error-prone manual task that requires substantial domain expertise across several groups of goods and services. This paper presents a configurable, LLM-assisted pipeline, realized as a software system, that supports the systematic generation and evaluation of sustainability-oriented procurement criteria catalogs for Switzerland. The system integrates in-context prompting, interchangeable LLM backends, and automated output validation to enable auditable criteria generation across different procurement sectors. As a proof of concept, we instantiate the pipeline using official sustainability guidelines published by the Swiss government and the European Commission, which are ingested as structured reference documents. We evaluate the system through a combination of automated quality checks, including an LLM-based evaluation component, and expert comparison against a manually curated gold standard. Our results demonstrate that the proposed pipeline can substantially reduce manual drafting effort while producing criteria catalogs that are consistent with official guidelines. We further discuss system limitations, failure modes, and design trade-offs observed during deployment, highlighting key considerations for integrating generative AI into public sector software workflows.
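The pipeline stages named in the abstract (structured reference ingestion, in-context prompting, an interchangeable LLM backend, and automated output validation) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all function names, the JSON criteria schema, and the stub backend are assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable
import json

# Hypothetical sketch of the described pipeline. The schema and all
# identifiers are illustrative assumptions, not the authors' code.

LLMBackend = Callable[[str], str]  # interchangeable backend: prompt -> raw output

@dataclass
class Criterion:
    sector: str
    kind: str   # e.g. "award criterion", "technical specification"
    text: str

def build_prompt(sector: str, reference_excerpts: list[str]) -> str:
    """Embed structured guideline excerpts as in-context material."""
    context = "\n".join(f"- {e}" for e in reference_excerpts)
    return (
        f"Using the sustainability guidelines below, draft procurement "
        f"criteria for the sector '{sector}' as a JSON list of objects "
        f"with keys 'sector', 'kind', 'text'.\nGuidelines:\n{context}"
    )

def validate(raw_output: str) -> list[Criterion]:
    """Automated output validation: parse JSON, check required keys."""
    items = json.loads(raw_output)
    criteria = []
    for item in items:
        if not all(k in item for k in ("sector", "kind", "text")):
            raise ValueError(f"missing keys in {item!r}")
        criteria.append(Criterion(**item))
    return criteria

def generate_catalog(backend: LLMBackend, sector: str,
                     reference_excerpts: list[str]) -> list[Criterion]:
    return validate(backend(build_prompt(sector, reference_excerpts)))

# A stub backend stands in for a real model so the sketch runs offline.
def stub_backend(prompt: str) -> str:
    return json.dumps([{"sector": "construction",
                        "kind": "award criterion",
                        "text": "Share of recycled aggregate >= 20%."}])

catalog = generate_catalog(stub_backend, "construction",
                           ["Prefer recycled construction materials."])
print(catalog[0].kind)  # → award criterion
```

Keeping the backend as a plain prompt-to-text callable is one way to make model providers swappable while the validation layer stays fixed, which matches the auditability goal the abstract emphasizes.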