Building a Macedonian Recipe Dataset: Collection, Parsing, and Comparative Analysis

📅 2025-10-15
🤖 AI Summary
Problem: Macedonian culinary culture has long lacked high-quality, structured corpora for digital humanities and computational gastronomy research. Method: We introduce the first publicly available, high-quality Macedonian-language recipe dataset, comprising over 1,200 traditional dishes and constructed via multi-source web crawling, structured parsing, and standardized cleaning (unit normalization, quantity extraction, and harmonization of heterogeneous ingredient descriptions). We further propose a co-occurrence pattern mining framework, grounded in pointwise mutual information and lift, to quantitatively identify culturally distinctive ingredient pairings. Results: Our analysis reveals signature Macedonian flavor combinations (for instance, paprika–yogurt–oregano), providing the first empirical characterization of the cuisine's culinary syntax. The dataset fills a gap in food culture studies for under-resourced languages and provides a foundational resource and methodology for cross-lingual gastronomic computing and cultural heritage modeling.

📝 Abstract
Computational gastronomy increasingly relies on diverse, high-quality recipe datasets to capture regional culinary traditions. Although large-scale collections exist for major languages, Macedonian recipes remain underrepresented in digital research. In this work, we present the first systematic effort to construct a Macedonian recipe dataset through web scraping and structured parsing. We address challenges in processing heterogeneous ingredient descriptions, including unit, quantity, and descriptor normalization. An exploratory analysis of ingredient frequency and co-occurrence patterns, using measures such as Pointwise Mutual Information (PMI) and Lift, highlights distinctive ingredient combinations that characterize Macedonian cuisine. The resulting dataset contributes a new resource for studying food culture in underrepresented languages and offers insights into the unique patterns of Macedonian culinary tradition.
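
For readers unfamiliar with the two measures named in the abstract, their standard textbook definitions are given here. This is a reference sketch, not the paper's exact formulation: the symbols a, b (two ingredients), N (number of recipes), and n(·) (recipe counts) are notation introduced for illustration, and the log base and any smoothing the authors applied are not stated on this page.

```latex
% Probabilities estimated from recipe counts (assumed estimator):
%   P(a) = n(a) / N,   P(a, b) = n(a, b) / N
\[
\mathrm{Lift}(a, b) = \frac{P(a, b)}{P(a)\,P(b)},
\qquad
\mathrm{PMI}(a, b) = \log_2 \frac{P(a, b)}{P(a)\,P(b)} = \log_2 \mathrm{Lift}(a, b)
\]
```

Under these definitions, a Lift above 1 (equivalently, a positive PMI) means the two ingredients appear together more often than would be expected if they were chosen independently.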
Problem

Research questions and friction points this paper is trying to address.

Constructing the first systematic Macedonian recipe dataset through web scraping
Addressing normalization challenges in ingredient descriptions and quantities
Analyzing distinctive ingredient combinations characterizing Macedonian culinary traditions
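
To make the normalization challenge concrete, the following is a minimal sketch of how a raw ingredient line might be split into quantity, unit, and name. It is not the authors' pipeline: the `UNIT_ALIASES` table, the `Ingredient` record, and the function names are hypothetical, and the Macedonian unit spellings are a small illustrative sample rather than an exhaustive list.

```python
import re
from typing import NamedTuple, Optional


class Ingredient(NamedTuple):
    quantity: Optional[float]  # None when the line gives no number
    unit: Optional[str]        # canonical unit, None when absent or unknown
    name: str                  # remaining lowercased description


# Hypothetical alias table mapping surface forms to canonical units.
UNIT_ALIASES = {
    "г": "g", "гр": "g", "грам": "g", "грама": "g",
    "кг": "kg",
    "мл": "ml",
    "л": "l",
    "лажица": "tbsp", "лажици": "tbsp",
    "лажиче": "tsp", "лажичиња": "tsp",
}

# Leading quantity: a simple fraction ("1/2") or a decimal with "." or ",".
QTY_RE = re.compile(r"^(\d+/\d+|\d+(?:[.,]\d+)?)\s*(.*)$")


def parse_quantity(text: str) -> float:
    """Convert '1/2', '2,5' or '500' to a float."""
    if "/" in text:
        num, den = text.split("/")
        return int(num) / int(den)
    return float(text.replace(",", "."))


def parse_ingredient(line: str) -> Ingredient:
    """Split one raw ingredient line into quantity, unit and name."""
    rest = line.strip()
    quantity = None
    match = QTY_RE.match(rest)
    if match:
        quantity = parse_quantity(match.group(1))
        rest = match.group(2)

    unit = None
    tokens = rest.split(maxsplit=1)
    if tokens:
        # Drop a trailing abbreviation dot ("гр." -> "гр") before lookup.
        key = tokens[0].rstrip(".").lower()
        if key in UNIT_ALIASES:
            unit = UNIT_ALIASES[key]
            rest = tokens[1] if len(tokens) > 1 else ""

    return Ingredient(quantity, unit, rest.strip().lower())


if __name__ == "__main__":
    for raw in ["500 гр. брашно", "1/2 лажиче сол", "2,5 кг пиперки", "сол по вкус"]:
        print(raw, "->", parse_ingredient(raw))
```

Lines without a leading number, such as "сол по вкус" (salt to taste), are kept with `quantity=None` rather than discarded, so a later cleaning step can decide how to treat them.
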
Innovation

Methods, ideas, or system contributions that make the work stand out.

Web scraping and structured parsing for dataset construction
Normalizing ingredient units, quantities, and descriptors
Analyzing ingredient patterns with PMI and Lift scores
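
The PMI and Lift analysis can be illustrated with a short sketch that scores ingredient pairs from recipe-level co-occurrence counts. It assumes each recipe is already reduced to a list of normalized ingredient names; the function name `pair_scores`, the `min_count` threshold, and the toy recipes are hypothetical and are not taken from the paper's data or code.

```python
import math
from collections import Counter
from itertools import combinations


def pair_scores(recipes, min_count=2):
    """Return Lift and PMI (base 2) for ingredient pairs seen in at least
    `min_count` recipes. Probabilities are plain recipe-level frequencies."""
    n = len(recipes)
    item_counts = Counter()
    pair_counts = Counter()
    for recipe in recipes:
        # Deduplicate and sort so each pair has one canonical key.
        items = sorted(set(recipe))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_count:
            continue  # rare pairs give unstable, inflated scores
        # Lift = P(a,b) / (P(a) P(b)) = n_ab * n / (n_a * n_b)
        lift = (n_ab * n) / (item_counts[a] * item_counts[b])
        scores[(a, b)] = {"count": n_ab, "lift": lift, "pmi": math.log2(lift)}
    return scores


if __name__ == "__main__":
    toy_recipes = [
        ["paprika", "yogurt", "oregano"],
        ["paprika", "yogurt"],
        ["flour", "salt"],
        ["flour", "salt", "paprika"],
    ]
    ranked = sorted(pair_scores(toy_recipes).items(),
                    key=lambda kv: kv[1]["pmi"], reverse=True)
    for pair, stats in ranked:
        print(pair, stats)
```

The `min_count` filter reflects a common design choice: PMI tends to overrate pairs that occur only once or twice, so a frequency floor is usually applied before ranking. Whether the authors used such a threshold is not stated here.
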
Darko Sasanski
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Dimitar Peshevski
Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje, North Macedonia
Riste Stojanov
Associate Professor, Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University
Semantic Web, Natural Language Processing, Software Engineering
Dimitar Trajanov
Professor, Ss. Cyril and Methodius University in Skopje; Visiting Research Professor, Boston University, US
Data science, AI Agents, NLP, Semantic web, Open data