🤖 AI Summary
This work addresses discourse relation classification in the DISRPT 2025 shared task, where limited annotated data for low-resource languages hinders robust generalization. Method: Two approaches are tested: an mT5-based encoder approach and a decoder-based approach using the openly available Qwen model. Training data for low-resource languages is augmented with matched data translated automatically from English, supplemented by additional linguistic features inspired by entries in previous editions of the shared task. Contribution/Results: The system achieves a macro-accuracy score of 71.28 on the DISRPT 2025 test set, and the paper provides interpretation and error analysis of the results.
📝 Abstract
This paper presents DeDisCo, Georgetown University's entry in the DISRPT 2025 shared task on discourse relation classification. We test two approaches: an mT5-based encoder approach and a decoder-based approach using the openly available Qwen model. We also experiment with training on augmented datasets for low-resource languages, using matched data translated automatically from English, as well as with some additional linguistic features inspired by entries in previous editions of the shared task. Our system achieves a macro-accuracy score of 71.28, and we provide some interpretation and error analysis of our results.