🤖 AI Summary
Macedonian culinary culture has long lacked high-quality, structured corpora for digital humanities and computational gastronomy research. Method: We introduce the first publicly available, high-quality Macedonian-language recipe dataset, comprising over 1,200 traditional dishes and constructed via multi-source web crawling, structured parsing, and standardized cleaning (including unit normalization, quantity extraction, and harmonization of heterogeneous ingredient descriptions). We further propose a co-occurrence pattern mining framework, grounded in pointwise mutual information and lift, to quantitatively identify culturally distinctive ingredient pairings. Results: Our analysis reveals signature Macedonian flavor combinations (for instance, paprika–yogurt–oregano), providing the first empirical characterization of the cuisine's ingredient-pairing patterns. This dataset fills a gap in food culture studies for under-resourced languages and provides a foundational resource and methodology for cross-lingual gastronomic computing and cultural heritage modeling.
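The cleaning step described above (quantity extraction and unit normalization) might be sketched as follows. This is only an illustration under stated assumptions: the unit table `UNIT_TABLE`, the pattern `LINE_RE`, and the function `parse_ingredient` are hypothetical names, and the example ingredient lines are made up rather than taken from the dataset.

```python
import re
from fractions import Fraction

# Hypothetical unit table (an assumption, not the authors' list):
# surface form -> (canonical base unit, multiplier to reach that base unit).
UNIT_TABLE = {
    "г": ("g", 1), "гр": ("g", 1), "g": ("g", 1),
    "кг": ("g", 1000), "kg": ("g", 1000),
    "мл": ("ml", 1), "ml": ("ml", 1),
    "л": ("ml", 1000), "l": ("ml", 1000),
}

# Leading quantity: a simple fraction ("1/2") or a decimal with "." or ",".
LINE_RE = re.compile(r"^\s*(?P<qty>\d+/\d+|\d+(?:[.,]\d+)?)\s*(?P<rest>.*)$")


def parse_ingredient(line):
    """Split a free-text ingredient line into quantity, unit, and name."""
    match = LINE_RE.match(line)
    if match is None:
        # No leading number (e.g. "salt to taste"): keep the text as the name.
        return {"quantity": None, "unit": None, "name": line.strip().lower()}

    quantity = float(Fraction(match.group("qty").replace(",", ".")))
    parts = match.group("rest").strip().lower().split(None, 1)

    unit = None
    if parts and parts[0].rstrip(".") in UNIT_TABLE:
        # Known unit: convert the quantity to the base unit and drop the token.
        unit, factor = UNIT_TABLE[parts[0].rstrip(".")]
        quantity *= factor
        parts = parts[1:]

    # An unknown first token (e.g. in "2 јајца") stays part of the name.
    return {"quantity": quantity, "unit": unit, "name": " ".join(parts)}


if __name__ == "__main__":
    for text in ["500 г брашно", "1,5 кг компири", "1/2 л млеко", "2 јајца", "сол по вкус"]:
        print(parse_ingredient(text))
```

A real pipeline would also need to handle descriptors, ranges, and household measures; the sketch covers only the simplest "number, optional unit, name" shape.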
📝 Abstract
Computational gastronomy increasingly relies on diverse, high-quality recipe datasets to capture regional culinary traditions. Although large-scale collections exist for major languages, Macedonian recipes remain underrepresented in digital research. In this work, we present the first systematic effort to construct a Macedonian recipe dataset through web scraping and structured parsing. We address challenges in processing heterogeneous ingredient descriptions, including the normalization of units, quantities, and descriptors. An exploratory analysis of ingredient frequency and co-occurrence patterns, using Pointwise Mutual Information (PMI) and Lift, highlights distinctive ingredient combinations that characterize Macedonian cuisine. The resulting dataset contributes a new resource for studying food culture in underrepresented languages and offers insights into the distinctive patterns of Macedonian culinary tradition.
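To make the co-occurrence measures concrete, here is a minimal sketch of how PMI and Lift can be computed for ingredient pairs, using the standard definitions Lift(a, b) = P(a, b) / (P(a) P(b)) and PMI(a, b) = log2 Lift(a, b), with probabilities estimated as the fraction of recipes containing the ingredient or pair. The function name `pair_scores`, the `min_count` threshold, and the toy recipes are assumptions for illustration; they are not the authors' implementation or data.

```python
import math
from collections import Counter
from itertools import combinations


def pair_scores(recipes, min_count=2):
    """Compute Lift and PMI for every ingredient pair seen at least min_count times.

    `recipes` is a list of ingredient lists; each recipe counts an
    ingredient at most once, however often it is listed.
    """
    n = len(recipes)
    item_counts = Counter()
    pair_counts = Counter()
    for ingredients in recipes:
        unique = sorted(set(ingredients))  # sorted so each pair has one canonical order
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), count_ab in pair_counts.items():
        if count_ab < min_count:
            continue  # rare pairs give unstable, inflated scores
        p_ab = count_ab / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        lift = p_ab / (p_a * p_b)
        scores[(a, b)] = {"count": count_ab, "lift": lift, "pmi": math.log2(lift)}
    return scores


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_recipes = [
        ["onion", "garlic", "pepper"],
        ["onion", "garlic"],
        ["pepper", "cheese"],
        ["onion", "cheese"],
    ]
    for pair, stats in pair_scores(toy_recipes).items():
        print(pair, stats)
```

A Lift above 1 (PMI above 0) means the pair appears together more often than independence would predict. Because both measures overweight rare pairs, a minimum-count filter such as `min_count` is a common safeguard.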