🤖 AI Summary
Existing recipe generation methods rely on cross-entropy loss, which struggles to ensure the fidelity of structured cooking elements such as ingredients, steps, time, and temperature. This work proposes a composite loss function that models ingredient lists as point clouds in an embedding space and employs topological optimal transport to align predicted and ground-truth ingredient distributions. It also studies a Dice loss and a mixed objective aimed at attributes such as cooking time, temperature, and ingredient quantities. Evaluated on the RECIPE-NLG benchmark with standard NLG metrics and recipe-specific metrics, the topological loss significantly improves ingredient- and action-level metrics, the Dice loss gives the best time and temperature precision, and the mixed loss offers competitive trade-offs with gains in quantity and time. In a human preference analysis, the proposed model is preferred in 62% of cases.
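To make the point-cloud idea concrete, here is a minimal sketch of a plain entropy-regularised optimal transport (Sinkhorn) cost between two small sets of embedding vectors. It is an illustration under stated assumptions, not the paper's loss: the topological variant used by the authors is not specified in this summary, and the function names, the squared Euclidean ground cost, the uniform weights, `eps`, and `n_iters` are all hypothetical choices. The sketch computes only the forward value; a training setup would need a differentiable implementation.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_cost(pred, gold, eps=0.1, n_iters=200):
    """Entropy-regularised OT cost between two point clouds.

    Assumptions (not from the source): uniform mass on every point,
    squared Euclidean ground cost, fixed number of Sinkhorn iterations.
    For large distances relative to eps the kernel can underflow; a
    log-domain version would be needed in practice.
    """
    n, m = len(pred), len(gold)
    cost = [[sq_dist(p, g) for g in gold] for p in pred]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform mass on predicted ingredients
    b = [1.0 / m] * m  # uniform mass on gold ingredients
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; return its total cost.
    return sum(
        u[i] * kernel[i][j] * v[j] * cost[i][j]
        for i in range(n)
        for j in range(m)
    )


# Toy 2-D "embeddings" standing in for ingredient vectors.
gold_cloud = [(0.0, 0.0), (1.0, 0.0)]
close_pred = [(0.0, 0.0), (1.0, 0.0)]
far_pred = [(0.0, 1.0), (1.0, 1.0)]

cost_close = sinkhorn_cost(close_pred, gold_cloud)
cost_far = sinkhorn_cost(far_pred, gold_cloud)
```

In this toy case the cost for the matching cloud is close to zero and the cost for the shifted cloud is close to one, which is the behaviour one would want from a term that penalises predicted ingredient sets drifting away from the gold set.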
📝 Abstract
Cooking recipes are complex procedures that require not only fluent and factual text, but also accurate timing, temperature, and procedural coherence, as well as the correct composition of ingredients. Standard training procedures are primarily based on cross-entropy and focus solely on fluency. Building on RECIPE-NLG, we investigate the use of several composite objectives and present a new topological loss that represents ingredient lists as point clouds in embedding space, minimizing the divergence between predicted and gold ingredients. Using both standard NLG metrics and recipe-specific metrics, we find that our loss significantly improves ingredient- and action-level metrics. Meanwhile, the Dice loss excels in time/temperature precision, and the mixed loss yields competitive trade-offs with synergistic gains in quantity and time. A human preference analysis supports our findings, showing that our model is preferred in 62% of cases.
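As a rough illustration of the Dice and mixed objectives mentioned above, the following sketch shows the standard soft Dice loss and a weighted sum of loss terms. How the authors apply Dice to time, temperature, and quantity tokens is not described in this abstract, so the inputs here (per-token probabilities and 0/1 targets), the smoothing constant, and the weights `w_dice` and `w_topo` are assumptions for illustration only.

```python
def soft_dice_loss(probs, targets, smooth=1.0):
    """Standard soft Dice loss: 1 - (2|P∩T| + s) / (|P| + |T| + s).

    probs:   predicted probabilities in [0, 1], e.g. for tokens that carry
             a time or temperature value (hypothetical usage).
    targets: matching 0/1 indicators from the gold recipe.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    total = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)


def mixed_loss(ce, dice, topo, w_dice=0.5, w_topo=0.5):
    """Weighted sum of cross-entropy, Dice, and topological terms.

    The weights are placeholders, not values reported by the authors.
    """
    return ce + w_dice * dice + w_topo * topo


perfect = soft_dice_loss([1.0, 0.0, 1.0], [1, 0, 1])  # full overlap
wrong = soft_dice_loss([0.0, 1.0, 0.0], [1, 0, 1])    # no overlap
combined = mixed_loss(ce=1.0, dice=wrong, topo=1.0)
```

Unlike cross-entropy, Dice is driven by overlap between predicted and gold positives, so it remains informative when the attribute tokens of interest are a small fraction of the sequence. That is a general property of the loss, not a claim about the mechanism behind the reported results.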