CLOT: Closed Loop Optimal Transport for Unsupervised Action Segmentation

📅 2025-07-04
🤖 AI Summary
Unsupervised action segmentation lacks segment-level supervision, which weakens the learning feedback between frame-level and segment-level representations. To address this, the paper proposes the Closed Loop Optimal Transport (CLOT) framework. CLOT uses an encoder-decoder architecture to jointly learn frame embeddings, segment embeddings, and pseudo-labels. It combines a three-stage optimal transport (OT) scheme with cross-level attention, so that frame-level and segment-level representations refine each other in a closed loop. The multi-level OT formulation adds segment-level self-supervision, which is intended to improve temporal consistency and make the representations more discriminative. Experiments on four benchmark datasets show that CLOT outperforms state-of-the-art methods, including ASOT, supporting the value of its cyclic feature learning mechanism for unsupervised action segmentation.

📝 Abstract
Unsupervised action segmentation has recently pushed its limits with ASOT, an optimal transport (OT)-based method that simultaneously learns action representations and performs clustering using pseudo-labels. Unlike other OT-based approaches, ASOT makes no assumptions on the action ordering, and it is able to decode a temporally consistent segmentation from a noisy cost matrix between video frames and action labels. However, the resulting segmentation lacks segment-level supervision, which limits the effectiveness of the feedback between frames and action representations. To address this limitation, we propose Closed Loop Optimal Transport (CLOT), a novel OT-based framework that introduces a multi-level cyclic feature learning mechanism. Leveraging its encoder-decoder architecture, CLOT learns pseudo-labels alongside frame and segment embeddings by solving two separate OT problems. It then refines both frame embeddings and pseudo-labels through cross-attention between the learned frame and segment embeddings, integrating a third OT problem. Experimental results on four benchmark datasets demonstrate the benefits of cyclical learning for unsupervised action segmentation.
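To make the idea of decoding pseudo-labels from a cost matrix between video frames and action labels concrete, here is a minimal sketch using plain entropic OT solved with Sinkhorn iterations. This is a simplified stand-in, not the ASOT or CLOT formulation: it assumes uniform marginals over frames and actions and has no temporal-consistency term. The function name `sinkhorn_pseudo_labels` and the parameters `eps` and `n_iters` are hypothetical choices for illustration.

```python
import math

def sinkhorn_pseudo_labels(cost, eps=0.1, n_iters=200):
    """Entropic OT between N frames and K actions (illustrative only).

    cost: list of N rows, each with K frame-to-action costs.
    Returns the N x K transport plan and the per-frame argmax labels.
    """
    n, k = len(cost), len(cost[0])
    # Gibbs kernel exp(-C / eps); small eps gives a sharper plan.
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    a = [1.0 / n] * n  # frame marginal (uniform: an assumption of this sketch)
    b = [1.0 / k] * k  # action marginal (uniform: an assumption of this sketch)
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows so each frame carries mass a[i].
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        # Rescale columns so each action receives mass b[j].
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Pseudo-label of a frame = action with the largest transported mass.
    labels = [max(range(k), key=lambda j: row[j]) for row in plan]
    return plan, labels

# Toy example: 4 frames, 2 actions; the first two frames are cheap to assign to action 0.
toy_cost = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
toy_plan, toy_labels = sinkhorn_pseudo_labels(toy_cost)
```

On the toy cost matrix the first two frames are assigned to action 0 and the last two to action 1. Note that the uniform action marginal pushes the plan toward equally sized clusters, which is one reason a balanced sketch like this should not be read as the method described in the abstract.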
Problem

Research questions and friction points this paper is trying to address.

Improves unsupervised action segmentation with cyclical learning
Addresses the lack of segment-level supervision in the ASOT method
Integrates multi-level OT problems for better feature learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed Loop Optimal Transport (CLOT) framework
Multi-level cyclic feature learning mechanism
Cross-attention between frame and segment embeddings
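As a rough illustration of cross-attention between frame and segment embeddings, the sketch below lets each frame embedding (query) attend over the segment embeddings (keys and values) with scaled dot-product attention. It is an assumption-laden toy, not the paper's architecture: there are no learned projection matrices, no multiple heads, and the function name `cross_attention` is hypothetical.

```python
import math

def cross_attention(frames, segments):
    """Scaled dot-product cross-attention (illustrative only).

    frames:   list of N frame embeddings, each of dimension d (queries).
    segments: list of M segment embeddings, each of dimension d (keys and values).
    Returns N refined frame embeddings, each a convex combination of segments.
    """
    d = len(frames[0])
    scale = 1.0 / math.sqrt(d)
    refined = []
    for q in frames:
        # Similarity of this frame to every segment.
        scores = [scale * sum(qi * si for qi, si in zip(q, s)) for s in segments]
        # Numerically stable softmax over segments.
        m = max(scores)
        exps = [math.exp(sc - m) for sc in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of segment embeddings.
        refined.append(
            [sum(w * s[t] for w, s in zip(weights, segments)) for t in range(d)]
        )
    return refined

# Toy example: two frames, two segments, dimension 2.
toy_frames = [[1.0, 0.0], [0.0, 1.0]]
toy_segments = [[10.0, 0.0], [0.0, 10.0]]
toy_refined = cross_attention(toy_frames, toy_segments)
```

In the toy example each frame is pulled toward the segment it is most similar to, which is the kind of segment-to-frame feedback the highlights above describe in general terms.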