An Adaptive CMSA for Solving the Longest Filled Common Subsequence Problem with an Application in Audio Querying

📅 2025-09-12
🤖 AI Summary
This paper addresses the NP-hard Longest Filled Common Subsequence (LFCS) problem, with applications in genome reconstruction and in identifying songs from degraded audio. To overcome the poor scalability of existing approaches and the lack of large-scale benchmarks, we use an adaptive Construct, Merge, Solve, Adapt (CMSA) framework, a metaheuristic that combines component-based solution construction, merging into subproblems, feedback-driven iterative refinement, and an external black-box solver. We introduce a new LFCS benchmark dataset with significantly larger instances, propose applying LFCS to real-world audio querying via music energy profiles, and conduct an explainability analysis to identify the instance features that most influence algorithm performance. Experiments show that our method outperforms five leading algorithms on both standard and newly constructed benchmarks. It solves 1,486 of the 1,510 instances with known optima (about 98.4%), with a reported solution quality of over 99.9% of optimal.

📝 Abstract
This paper addresses the Longest Filled Common Subsequence (LFCS) problem, a challenging NP-hard problem with applications in bioinformatics, including gene mutation prediction and genomic data reconstruction. Existing approaches, including exact, metaheuristic, and approximation algorithms, have primarily been evaluated on small-sized instances, which offer limited insights into their scalability. In this work, we introduce a new benchmark dataset with significantly larger instances and demonstrate that existing datasets lack the discriminative power needed to meaningfully assess algorithm performance at scale. To solve large instances efficiently, we utilize an adaptive Construct, Merge, Solve, Adapt (CMSA) framework that iteratively generates promising subproblems via component-based construction and refines them using feedback from prior iterations. Subproblems are solved using an external black-box solver. Extensive experiments on both standard and newly introduced benchmarks show that the proposed adaptive CMSA achieves state-of-the-art performance, outperforming five leading methods. Notably, on 1,510 problem instances with known optimal solutions, our approach solves 1,486 of them, achieving over 99.9% optimal solution quality and demonstrating exceptional scalability. We additionally propose a novel application of LFCS for song identification from degraded audio excerpts as an engineering contribution, using real-world energy-profile instances from popular music. Finally, we conduct an empirical explainability analysis to identify the critical feature combinations influencing algorithm performance, revealing the key problem features that contribute to the success or failure of the approaches across different instance types.

Problem

Research questions and friction points this paper is trying to address.

Solving the NP-hard Longest Filled Common Subsequence problem efficiently
Addressing scalability limitations of existing algorithms on large instances
Developing an adaptive CMSA framework with external solver integration
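
This page does not restate the formal definition of LFCS, so the following is only a minimal sketch under an assumed formulation, the one commonly given in the literature: two sequences A and B plus a multiset M of symbols, where any symbols of M may be inserted into B so as to maximize the length of the longest common subsequence with A. The function name `lfcs_length` and its parameters are hypothetical. The memoized recursion is exact but takes exponential time in the worst case, so it is suitable only for tiny instances and is not the paper's method.

```python
from collections import Counter
from functools import lru_cache


def lfcs_length(a, b, fill):
    """Exact LFCS length for tiny instances (assumed problem formulation).

    a, b : sequences of hashable, sortable symbols
    fill : iterable of symbols (the multiset M) that may be inserted into b
    """
    symbols = sorted(set(fill))
    index = {s: k for k, s in enumerate(symbols)}
    counts = Counter(fill)
    start = tuple(counts[s] for s in symbols)

    @lru_cache(maxsize=None)
    def best(i, j, remaining):
        # remaining[k] = unused copies of symbols[k] still available in M
        if i == len(a):
            return 0
        # Option 1: leave a[i] unmatched.
        result = best(i + 1, j, remaining)
        if j < len(b):
            # Option 2: skip b[j].
            result = max(result, best(i, j + 1, remaining))
            # Option 3: match a[i] with b[j].
            if a[i] == b[j]:
                result = max(result, 1 + best(i + 1, j + 1, remaining))
        # Option 4: insert a copy of a[i] from M just before b[j] and match it.
        k = index.get(a[i])
        if k is not None and remaining[k] > 0:
            used = remaining[:k] + (remaining[k] - 1,) + remaining[k + 1:]
            result = max(result, 1 + best(i + 1, j, used))
        return result

    return best(0, 0, start)


if __name__ == "__main__":
    # Inserting "b" and "c" into "ad" can yield "abcd", matching all of A.
    print(lfcs_length("abcd", "ad", "bc"))
```

The state space grows with the number of distinct sub-multisets of M, which is consistent with the scalability concerns listed above and motivates heuristic approaches for large instances.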

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive CMSA framework for large instances
External black-box solver for subproblems
Component-based construction with iterative refinement
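
The Construct, Merge, Solve, Adapt cycle named above can be sketched as a generic loop. This is an illustration only: it uses the classic age-based adapt step, whereas the paper's adaptive variant and its feedback mechanism are not detailed on this page and may differ. The toy knapsack instance, the exhaustive `solve_subinstance` standing in for the external black-box solver, and all names and parameter values are assumptions made for the example.

```python
import random
from itertools import combinations

# Hypothetical toy instance (not from the paper): a tiny 0/1 knapsack.
WEIGHTS = [4, 3, 2, 5, 1, 6]
VALUES = [9, 6, 4, 10, 2, 11]
CAPACITY = 10


def total(items, table):
    return sum(table[i] for i in items)


def construct(rng):
    """Construct: randomized greedy solution; components are item indices."""
    order = list(range(len(WEIGHTS)))
    rng.shuffle(order)
    chosen, load = [], 0
    for i in order:
        if load + WEIGHTS[i] <= CAPACITY:
            chosen.append(i)
            load += WEIGHTS[i]
    return chosen


def solve_subinstance(components):
    """Solve: exhaustive search restricted to the merged components.

    Stands in for the external black-box exact solver.
    """
    pool = sorted(components)
    best = ()
    for size in range(len(pool) + 1):
        for subset in combinations(pool, size):
            feasible = total(subset, WEIGHTS) <= CAPACITY
            if feasible and total(subset, VALUES) > total(best, VALUES):
                best = subset
    return frozenset(best)


def cmsa(iterations=20, n_constructions=5, max_age=3, seed=0):
    rng = random.Random(seed)
    age = {}  # component -> age; the keys form the current sub-instance
    best, best_value = frozenset(), float("-inf")
    for _ in range(iterations):
        # Construct + Merge: add components of new solutions with age 0.
        for _ in range(n_constructions):
            for component in construct(rng):
                age.setdefault(component, 0)
        # Solve: optimize over the sub-instance only.
        candidate = solve_subinstance(age)
        candidate_value = total(candidate, VALUES)
        if candidate_value > best_value:
            best, best_value = candidate, candidate_value
        # Adapt: reset the age of used components, age the rest,
        # and drop components that have gone unused for too long.
        for component in list(age):
            if component in candidate:
                age[component] = 0
            else:
                age[component] += 1
                if age[component] >= max_age:
                    del age[component]
    return best, best_value


best_items, best_value = cmsa()
print(sorted(best_items), best_value)
```

The design point the sketch tries to convey is that the exact solver only ever sees the merged sub-instance, so its cost is governed by how many components survive the adapt step and not by the size of the full problem.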