🤖 AI Summary
Existing video-based deep learning models for cardiac amyloidosis (CA) classification often rely on clinically irrelevant regions of echocardiographic videos, undermining interpretability and robustness.
Method: We propose an anatomy-constrained Video Transformer framework: (1) dynamically generating myocardial masks from endo- and epicardial point clouds to extract only myocardial image patches and corresponding deformation points as tokens; (2) embedding this anatomical prior into the masked autoencoder (MAE) pretraining objective to enforce focus on pathologically relevant myocardial motion patterns; and (3) leveraging attention visualization to spatially localize decision evidence exclusively within the myocardium.
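As a toy illustration of step (1), the anatomy-constrained tokenization might look like the following NumPy sketch. This is not the paper's implementation: the annulus stand-in for the myocardial mask (a ring between an inner endocardial and outer epicardial contour), the patch size, and the overlap threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def annulus_mask(h, w, center, r_endo, r_epi):
    # Toy stand-in for a myocardial mask: the ring between an
    # endocardial (inner) and epicardial (outer) contour.
    yy, xx = np.mgrid[0:h, 0:w]
    d = np.hypot(yy - center[0], xx - center[1])
    return (d >= r_endo) & (d <= r_epi)

def myocardial_patch_tokens(frame, mask, patch=8, min_overlap=0.25):
    # Keep only patches whose overlap with the myocardial mask exceeds
    # min_overlap; return flattened patches and their grid positions.
    h, w = frame.shape
    tokens, index = [], []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            m = mask[i:i + patch, j:j + patch]
            if m.mean() >= min_overlap:
                tokens.append(frame[i:i + patch, j:j + patch].ravel())
                index.append((i // patch, j // patch))
    return np.stack(tokens), index

frame = np.random.rand(64, 64).astype(np.float32)
mask = annulus_mask(64, 64, (32, 32), r_endo=10, r_epi=24)
tokens, index = myocardial_patch_tokens(frame, mask)
print(tokens.shape, len(index))  # far fewer tokens than the full 8x8 grid of 64
```

In the full model, the kept patches (and the corresponding deforming myocardial points) would be embedded as transformer input tokens, so background patches never enter the network.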
Results: Our method achieves significantly higher CA classification accuracy than full-video Transformers and, crucially, enables dynamic, anatomy-aware, and spatially grounded model interpretation. It thus delivers both improved diagnostic performance and greater clinical trustworthiness through interpretable, myocardium-specific reasoning.
📝 Abstract
Cardiac amyloidosis (CA) is a rare cardiomyopathy that produces characteristic abnormalities in clinical measurements derived from echocardiograms, such as reduced global longitudinal strain of the myocardium. An alternative approach to detecting CA is to apply video classification models, such as convolutional neural networks, directly to echocardiographic clips. These models process entire video clips, but provide no assurance that classification is based on clinically relevant features known to be associated with CA. Another paradigm for disease classification is to apply models to quantitative features such as strain, ensuring that the classification relates to clinically relevant features. Drawing inspiration from this approach, we explicitly constrain a transformer model to the anatomical region where many known CA abnormalities occur: the myocardium, which we embed as a set of deforming points and corresponding sampled image patches into input tokens. We show that our anatomical constraint can also be applied to the popular masked autoencoder (MAE) self-supervised pre-training, where we propose to mask and reconstruct only anatomical patches. By constraining both the transformer and the pre-training task to the myocardium, where CA imaging features are localized, we achieve increased performance on a CA classification task compared to full video transformers. Our model provides an explicit guarantee that the classification is focused only on anatomically relevant regions of the echocardiogram, and enables us to visualize transformer attention scores over the deforming myocardium.
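The anatomy-constrained MAE pre-training described in the abstract can be sketched as an MAE-style random split applied only to the anatomical (myocardial) tokens, rather than to the full patch grid. The function name, mask ratio, and seeding below are assumptions for illustration, not the authors' code.

```python
import numpy as np

def anatomical_mae_split(num_anat_tokens, mask_ratio=0.75, rng=None):
    # MAE-style random split restricted to anatomical tokens:
    # hide `mask_ratio` of them for reconstruction, keep the rest
    # visible to the encoder. Background tokens never participate.
    rng = rng if rng is not None else np.random.default_rng(0)
    perm = rng.permutation(num_anat_tokens)
    n_keep = int(num_anat_tokens * (1 - mask_ratio))
    visible = np.sort(perm[:n_keep])
    hidden = np.sort(perm[n_keep:])
    return visible, hidden

visible, hidden = anatomical_mae_split(20, mask_ratio=0.75)
print(len(visible), len(hidden))  # 5 15
```

The reconstruction loss would then be computed only on the hidden myocardial patches, so the pre-training signal comes entirely from pathologically relevant tissue.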