🤖 AI Summary
Problem: Converting over-the-counter (OTC) financial derivative contracts into the standardized Common Domain Model (CDM) remains challenging because legal text is unstructured and heterogeneous.
Method: This paper introduces CDMizer, a template-driven, hierarchical generation framework for CDM structuring. It combines depth-aware retrieval with large language models (LLMs) and uses hierarchical structured output generation, grammar-constrained templates, and schema-consistency verification to encode raw legal text into CDM with high fidelity.
Contribution/Results: The paper also proposes an LLM-powered automated evaluation framework that validates two properties: semantic accuracy and structural completeness. Demonstrated on OTC derivative contracts, CDMizer is reported to improve both the accuracy and the scalability of CDM encoding compared with direct generation. The framework provides a foundation for AI-driven contract understanding, structured synthesis, and automated validation.
📝 Abstract
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) are reshaping how AI systems extract and organize information from unstructured text. A key challenge is designing AI methods that can incrementally extract, structure, and validate information while preserving hierarchical and contextual relationships. We introduce CDMizer, a template-driven framework for structured text transformation built on LLMs and RAG. By leveraging depth-based retrieval and hierarchical generation, CDMizer provides a controlled, modular process that aligns generated outputs with a predefined schema. Its template-driven approach guarantees syntactic correctness and schema adherence and improves scalability, addressing key limitations of direct generation methods. Additionally, we propose an LLM-powered evaluation framework to assess the completeness and accuracy of structured representations. Demonstrated on the transformation of Over-the-Counter (OTC) financial derivative contracts into the Common Domain Model (CDM), CDMizer establishes a scalable foundation for AI-driven document understanding, structured synthesis, and automated validation in broader contexts.
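To make the idea of template-driven, hierarchical generation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy nested template (the field names are hypothetical and far simpler than the real CDM schema) and a stand-in `toy_extract` function in place of the retrieval and LLM calls. The point it illustrates is that the structure comes from the template and only leaf values are generated, so the output cannot drift from the schema's shape.

```python
from typing import Any, Callable, Tuple

Path = Tuple[str, ...]

# Hypothetical, heavily simplified template; NOT the real CDM schema.
# None marks a leaf whose value must be extracted from the contract text.
TEMPLATE = {
    "trade": {
        "tradeDate": None,
        "party": {"name": None},
        "notional": {"amount": None, "currency": None},
    }
}


def fill(template: Any, extract: Callable[[Path], Any], path: Path = ()) -> Any:
    """Walk the template top-down and generate values only at the leaves.

    The path (and hence the depth, len(path)) is passed to the extractor so
    that it could, in principle, retrieve context specific to that level.
    """
    if isinstance(template, dict):
        return {key: fill(sub, extract, path + (key,)) for key, sub in template.items()}
    return extract(path)


def same_shape(template: Any, filled: Any) -> bool:
    """Check that the filled object has exactly the template's nested keys."""
    if isinstance(template, dict):
        return (
            isinstance(filled, dict)
            and template.keys() == filled.keys()
            and all(same_shape(template[k], filled[k]) for k in template)
        )
    return not isinstance(filled, dict)


def toy_extract(path: Path) -> Any:
    """Stand-in for a retrieval + LLM call (assumption: fixed lookup table)."""
    lookup = {
        ("trade", "tradeDate"): "2024-01-15",
        ("trade", "party", "name"): "Bank A",
        ("trade", "notional", "amount"): 1000000,
        ("trade", "notional", "currency"): "USD",
    }
    return lookup.get(path)


filled = fill(TEMPLATE, toy_extract)
```

In this sketch, `same_shape(TEMPLATE, filled)` plays the role of a very basic schema-consistency check; a real system would also validate types and allowed values for each field.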