Quality assessment of spectroscopic data reduction pipelines using ML

We are implementing and scaling a UMAP + friends-of-friends (FoF) framework, following Suárez-Pérez (2023), to identify anomalous DESI spectra and prioritize them for expert review at survey scale.

UMAP embedding animation from DESI spectra
≈ 71,000,000 coadded spectra analyzed (DESI DR2)
14,199 tiles processed (tile-level embeddings)
~1% typical outlier fraction per tile

UMAP (cosine, 2D) · FoF groups · Per-tile processing · Inspector-ready IDs · Cross-match: HSC/KiDS
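As an illustration of the FoF grouping step, here is a minimal pure-Python sketch that operates on a 2D embedding that has already been computed (for example by UMAP with a cosine metric). It is not the project's implementation. The names `friends_of_friends` and `flag_small_groups`, the linking length, and the rule "points in small groups are outlier candidates" are all assumptions made for this example, and the coordinates are toy values.

```python
import math
from collections import Counter, defaultdict


def friends_of_friends(points, linking_length):
    """Label 2D points by FoF group.

    Two points are "friends" if they lie within linking_length of each other;
    a group is a connected component of the friendship graph.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Hash points into square cells of side linking_length so that only the
    # 3x3 neighborhood of each cell has to be searched for friends.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cell = (math.floor(x / linking_length), math.floor(y / linking_length))
        cells[cell].append(idx)

    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and math.dist(points[i], points[j]) <= linking_length:
                            ri, rj = find(i), find(j)
                            if ri != rj:
                                parent[rj] = ri

    # Relabel roots as consecutive integers in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]


def flag_small_groups(labels, min_group_size):
    """Flag points whose FoF group has fewer than min_group_size members (assumed rule)."""
    sizes = Counter(labels)
    return [sizes[label] < min_group_size for label in labels]


# Toy embedding: one compact clump plus two isolated points.
embedding = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0), (-4.0, 3.0)]
labels = friends_of_friends(embedding, linking_length=0.3)
flags = flag_small_groups(labels, min_group_size=3)
```

In this toy case the four clumped points share one group and the two isolated points are flagged. On real tiles the linking length and minimum group size would need tuning against the embedding's scale.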


What we have found

  • Distribution of candidates across tiles: a minority of fields dominate the counts.
  • Instrumental patterns: elevated rates near spectrograph/petal boundaries (e.g., FIBERID banding).
  • Recurrent morphologies: negative blue-arm continuum, arm-join flux steps, and arm-dependent template mismatches.
  • Astrophysical outliers: cross-matching flagged objects reveals confirmed gravitational lenses present in external catalogs (e.g., HSC, KiDS).
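The tile-level and instrument-level patterns listed above come down to grouped outlier rates. The sketch below shows one way to compute them. The record fields (`tile_id`, `fiber_id`, `is_outlier`) and the helper `outlier_rate_by` are hypothetical names for this example, and the petal mapping `fiber_id // 500` assumes the usual DESI layout of 10 petals with 500 fibers each. The records are toy values, not survey data.

```python
from collections import defaultdict

# Assumption: 5000 fibers split into 10 petals of 500 consecutive FIBERIDs.
FIBERS_PER_PETAL = 500


def outlier_rate_by(records, key):
    """Return {group: outlier fraction}, where key(record) gives the group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        flagged[group] += bool(rec["is_outlier"])
    return {group: flagged[group] / totals[group] for group in totals}


# Toy records for illustration only.
records = [
    {"tile_id": 1, "fiber_id": 10, "is_outlier": False},
    {"tile_id": 1, "fiber_id": 499, "is_outlier": True},
    {"tile_id": 1, "fiber_id": 500, "is_outlier": False},
    {"tile_id": 2, "fiber_id": 501, "is_outlier": False},
]

rate_per_tile = outlier_rate_by(records, key=lambda r: r["tile_id"])
rate_per_petal = outlier_rate_by(records, key=lambda r: r["fiber_id"] // FIBERS_PER_PETAL)
```

Passing a different `key`, such as the fiber's position within its petal (`fiber_id % FIBERS_PER_PETAL`), would give the kind of view in which FIBERID banding near petal boundaries should stand out.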

Why it matters

The pipeline highlights the small fraction of spectra that most warrant expert inspection, delivers stable, class-stratified QA metrics, and localizes issues to specific tiles, petals, and fibers. This makes it practical to monitor reduction performance consistently across data releases.
At the same time, catalog cross-matching naturally brings forward astrophysically interesting systems, such as known gravitational lenses.

Cross-matching: For candidates, we resolve DESI target IDs to sky positions and verify matches against published lens catalogs (HSC, KiDS, etc.).
Several flagged spectra coincide with known lenses, validating the approach on real survey data.
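A positional cross-match of this kind can be sketched with a plain angular-separation test. This is a brute-force illustration, not the project's code. The function names, the 1 arcsec match radius, the target IDs, the catalog entry name, and all coordinates are hypothetical. For real catalogs, a library routine such as astropy's `SkyCoord.match_to_catalog_sky` would be the more practical choice.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600


def cross_match(candidates, catalog, radius_arcsec=1.0):
    """Map each candidate ID to its nearest catalog entry within radius_arcsec.

    candidates: {target_id: (ra_deg, dec_deg)}
    catalog:    {entry_name: (ra_deg, dec_deg)}
    Returns {target_id: (entry_name, separation_arcsec)} for matched candidates only.
    """
    matches = {}
    for target_id, (ra, dec) in candidates.items():
        best = None
        for name, (cat_ra, cat_dec) in catalog.items():
            sep = angular_separation_arcsec(ra, dec, cat_ra, cat_dec)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (name, sep)
        if best is not None:
            matches[target_id] = best
    return matches


# Toy positions for illustration only.
candidates = {101: (150.0, 2.0), 102: (10.0, -5.0)}
lens_catalog = {"lens_A": (150.0001, 2.0)}
matches = cross_match(candidates, lens_catalog, radius_arcsec=1.0)
```

Here candidate 101 lies about 0.36 arcsec from `lens_A` and is matched, while candidate 102 has no counterpart. The double loop costs O(N × M), so a spatial index would be needed at survey scale.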

Collaborators


Resources

The results summarized here are drafts from a paper in progress: “Quality assessment of spectroscopic data reduction pipelines using AI: scrutinizing DESI DR2.”