Hyrax: An Extensible Framework for Rapid ML Experimentation and Unsupervised Discovery in the Era of Rubin, Roman, and Euclid

📅 2026-05-18

🤖 AI Summary
This work addresses the infrastructure bottleneck in machine learning for next-generation astronomical surveys such as Rubin, Roman, and Euclid, whose imaging, spectroscopic, and time-domain data arrive at scales that shift the main difficulty from model design to infrastructure. The authors present Hyrax, an open-source, modular, GPU-enabled Python framework that covers the ML lifecycle from data acquisition and training to inference and experiment comparison, with multimodal dataset support, integrated vector databases for similarity search, and interactive two- and three-dimensional latent-space exploration. Five applications on real survey data demonstrate its versatility: unsupervised representation learning on Rubin DP1 galaxies, which surfaces new merger and low-surface-brightness candidates and isolates imaging artifacts without labeled training data; hybrid density-based clustering for cluster-scale gravitational lens candidates; multimodal early-time transient classification in the Zwicky Transient Facility; supervised false-positive filtering in shift-and-stack searches for distant solar system objects; and supervised detection of semi-resolved dwarf galaxies.
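To make the idea of density-based clustering in a latent space concrete, here is a minimal sketch of plain DBSCAN in standard-library Python. It is an illustration only: it is not Hyrax's API and not the hybrid method used in the paper, whose details are not given here. The function names, the toy 2-D points, and the `eps` and `min_samples` values are all assumptions chosen for the example.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1  # label for points that belong to no dense region


def region_query(points: Sequence[Sequence[float]], idx: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[idx] (including itself)."""
    return [j for j, p in enumerate(points) if math.dist(points[idx], p) <= eps]


def dbscan(points: Sequence[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Plain DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels: List[Optional[int]] = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # j is a core point, so keep expanding
    return [label if label is not None else NOISE for label in labels]


# Hypothetical 2-D latent coordinates: two tight groups and one isolated point.
toy_points = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
    (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
    (10.0, 10.0),
]
labels = dbscan(toy_points, eps=0.3, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In a discovery setting, the points labelled as noise are often as interesting as the clusters, since rare objects and imaging artifacts tend to sit away from dense regions of the latent space.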
📝 Abstract
The NSF-DOE Vera C. Rubin Observatory, Roman Space Telescope, Euclid, and other next-generation surveys will deliver imaging, spectroscopic, and time-domain data at scales that increasingly shift the bottleneck in astronomical machine learning (ML) projects from model design to infrastructure. We present Hyrax, an open-source, modular, GPU-enabled Python framework that supports the full ML lifecycle in astronomy: from data acquisition and training to inference and experiment comparison, with capabilities including multimodal dataset support, integrated vector databases for similarity search, and interactive two- and three-dimensional latent-space exploration for unsupervised discovery. We demonstrate Hyrax's versatility through five representative applications on real survey data: (i) unsupervised representation learning on $\sim 4\times10^5$ Rubin Legacy Survey of Space and Time (LSST) Data Preview 1 (DP1) galaxies, surfacing new merger and low-surface-brightness candidates missing from reference Euclid and Dark Energy Survey catalogs, while also isolating imaging artifacts -- all without labeled training data; (ii) hybrid density-based clustering for identifying cluster-scale gravitational lens candidates in DP1 data; (iii) multimodal early-time transient classification in the Zwicky Transient Facility leveraging light curves, spectra, images, and metadata; (iv) supervised false-positive filtering in shift-and-stack searches for distant solar system objects in the Dark Energy Camera Ecliptic Exploration Project survey; and (v) supervised detection of semi-resolved dwarf galaxies in Hyper Suprime-Cam and LSST-like imaging using synthetic source injection. Together, these results demonstrate that Hyrax provides astronomy-specific ML infrastructure that enables systematic discovery and rapid methodological iteration across next-generation astronomical surveys.
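The similarity search mentioned in the abstract can be pictured as a nearest-neighbour lookup over learned embedding vectors. The sketch below uses a brute-force cosine-similarity scan in standard-library Python as a stand-in for a vector database. It does not use Hyrax's interface; the function names, object identifiers, and 3-D vectors are hypothetical and chosen only to keep the example small.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(
    query_id: str, embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k objects whose embeddings are closest to the query object's."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vec))
        for other_id, vec in embeddings.items()
        if other_id != query_id  # do not return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical latent vectors; a real encoder would produce many more dimensions.
toy_embeddings = {
    "galaxy_a": [1.0, 0.0, 0.0],
    "galaxy_b": [0.9, 0.1, 0.0],
    "galaxy_c": [0.0, 1.0, 0.0],
    "artifact_x": [-1.0, 0.0, 0.0],
}
neighbours = most_similar("galaxy_a", toy_embeddings, k=2)
for name, score in neighbours:
    print(f"{name}: {score:.3f}")
```

A brute-force scan costs one comparison per stored object for every query, which is why catalogs of the size described above are typically paired with an indexed vector database instead.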
Problem

Research questions and friction points this paper is trying to address.

astronomical machine learning
next-generation surveys
ML infrastructure
unsupervised discovery
multimodal data
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal astronomy
unsupervised discovery
latent-space exploration
GPU-accelerated ML framework
vector database for similarity search
Aritra Ghosh
Research Scientist, Meta
Drew Oldag
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Michael Tauraso
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Andrew J. Connolly
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Peter Ferguson
Royal Prince Alfred Hospital, Melanoma Institute Australia, The University of Sydney
Derek Jones
Lawrence Livermore National Laboratory, University of California - San Diego
Gourav Khullar
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Argyro Sasli
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
Samarth Venkatesh
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Gracia Wang
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Maxine West
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Dylan Berry
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Neven Caplar
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Colin Orion Chandler
Department of Astronomy and Planetary Science, Northern Arizona University, Flagstaff, USA
Tanawan Chatchadanoraset
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Michael W. Coughlin
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
Melissa DeLucchi
McWilliams Center for Cosmology and Astrophysics, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Alexandra Junell
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
Diego Miura
Department of Astronomy, Yale University, 219 Prospect Street, New Haven, CT 06511, USA
Felipe Fontinele Nunes
School of Physics and Astronomy, University of Minnesota, Minneapolis, MN 55455, USA
Wilson Beebe
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Doug Branton
Dept. of Astronomy & the DiRAC Institute, University of Washington, Box 351580, Seattle, WA 98195, USA
Sandro Campos
McWilliams Center for Cosmology and Astrophysics, Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
Liam Cunningham
Department of Physics and Astronomy and PITT PACC, University of Pittsburgh, Pittsburgh, PA 15260, USA
Mi Dai
Department of Physics and Astronomy and PITT PACC, University of Pittsburgh, Pittsburgh, PA 15260, USA