🤖 AI Summary
Deploying multilingual neural machine translation (NMT) on resource-constrained devices, such as IoT endpoints, remains challenging, especially for bidirectional translation between Indian and international languages. Method: This paper proposes an algorithm–hardware co-optimized lightweight NMT system, featuring quantization-aware training combined with sub-eight-bit mixed-precision quantization (FP4/INT4/FP8/INT8), tightly integrated with an FPGA architecture for hardware-accelerated inference. Contribution/Results: The system achieves a 4.1× reduction in model size and a 4.2× speedup in inference latency, delivering 66 tokens/s throughput. FPGA resource utilization improves significantly: LUT usage drops by 1.96× and flip-flop usage by 1.65×. Compared to the OPU and HPTA baselines, throughput increases by 2.2–4.6×, enabling real-time performance and resource efficiency under ultra-low power constraints.
📝 Abstract
This paper introduces Bhasha-Rupantarika, a lightweight and efficient multilingual translation system tailored through algorithm–hardware co-design for resource-limited settings. The method investigates model deployment at sub-8-bit precision levels (FP8, INT8, INT4, and FP4), with experimental results indicating a 4.1× reduction in model size (FP4) and a 4.2× speedup in inference, yielding a throughput of 66 tokens/s (a 4.8× improvement). This underscores the viability of ultra-low-precision quantization for real-time deployment on IoT devices using FPGA accelerators. Our evaluation covers bidirectional translation between Indian and international languages, showcasing adaptability to low-resource linguistic contexts. The FPGA deployment demonstrated a 1.96× reduction in LUTs and a 1.65× reduction in flip-flops (FFs), resulting in a 2.2× throughput improvement over OPU and a 4.6× improvement over HPTA. Overall, the evaluation provides a viable solution combining quantization-aware translation with hardware efficiency for deployable multilingual AI systems. The complete code [https://github.com/mukullokhande99/Bhasha-Rupantarika/] and dataset are publicly available for reproducibility, facilitating rapid integration and further development by researchers.
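To make the sub-8-bit idea concrete, the sketch below shows symmetric INT4 weight quantization in NumPy: each float is mapped to a 4-bit integer code in [-8, 7] plus a shared scale. This is a minimal illustration only, not the paper's actual quantization-aware training pipeline or FPGA kernel; the function names and the toy weight matrix are invented for the example.

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-tensor INT4 quantization: map floats to integer codes in [-8, 7]."""
    scale = np.max(np.abs(w)) / 7.0  # largest magnitude maps to +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    """Recover an approximate float tensor from INT4 codes and the shared scale."""
    return q.astype(np.float32) * scale

# Toy weight matrix: 4 bits per value instead of 32 (plus one scale),
# with reconstruction error bounded by half the quantization step.
w = np.array([[0.25, -1.4, 0.7], [2.1, -0.05, 1.0]], dtype=np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
```

In a quantization-aware training setup, a fake-quantize step like `dequantize_int4(*quantize_int4(w))` would be applied in the forward pass so the model learns weights that tolerate the rounding; mixed precision then assigns FP8/INT8 to sensitivity-critical layers and INT4/FP4 elsewhere.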