Galaxy Stellar Mass Estimation with Graph Neural Networks
This project explores stellar-mass estimation for galaxies in the
DESI Early Data Release Bright Galaxy Survey (EDR/BGS) using machine-learning models
built from photometric measurements and spectroscopic redshifts. The main goal is to compare
simple and interpretable regressors with an initial Graph Neural Network (GNN)
framework that incorporates spatial relationships between galaxies through graph-based representations.
Research project:
This work was developed at Universidad de los Andes as a study of machine-learning methods
for fast stellar-mass estimation in DESI BGS data, using both tabular models and graph-based
approaches built from observational galaxy samples.
DESI EDR / BGS: 428,758 BGS objects in the EDR sample
10 rosettes: region-based subsets for training and testing
Features: g, r, z, W1, W2 fluxes + redshift
Models: Linear, Random Forest, and initial GNN
DESI EDR / BGS
PROVABGS stellar masses
Fluxes: g r z W1 W2
Spectroscopic redshift
Graph from spatial proximity
PyTorch Geometric
Why it matters
Stellar mass is one of the central quantities for understanding galaxy evolution because it is closely
related to star-formation history, quenching, morphology, and the interpretation of population-level trends.
For large spectroscopic surveys such as DESI, efficient methods to estimate stellar mass are especially useful:
traditional SED-based approaches can be computationally expensive when applied at scale.
Motivation:
The project investigates whether graph-based machine learning can exploit not only galaxy photometry
and redshift but also the relational structure of the observed galaxy distribution, providing a possible
path toward richer models beyond standard tabular regression.
Methodology
The analysis starts from DESI BGS galaxies with associated stellar-mass estimates from PROVABGS.
Input features include broadband flux measurements in the g, r, z, W1,
and W2 bands, together with redshift. In the graph-based version, galaxies are represented as
nodes and connected according to spatial proximity, allowing the model to propagate information through
local neighborhoods.
- Input features: optical and infrared fluxes plus spectroscopic redshift.
- Reference labels: stellar-mass estimates matched to PROVABGS entries.
- Tabular baselines: multiple linear regression and Random Forest regression trained on the same observational features.
- Graph construction: galaxies are treated as nodes, with edges defined from spatial separations below a fixed distance threshold.
- GNN model: an initial graph-convolution architecture implemented with message passing and batch normalization.
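The graph-construction step above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes each galaxy already has a 3D Cartesian position (for example, derived from sky coordinates and redshift), and the names `build_radius_graph`, `mean_aggregate`, and the toy threshold are hypothetical. The edge list uses the `(2, E)` layout that PyTorch Geometric expects for `edge_index`, and the second function shows one plain mean-aggregation message-passing step in NumPy rather than a trained graph-convolution layer.

```python
import numpy as np


def build_radius_graph(positions, radius):
    """Return a (2, E) edge index linking every pair of nodes closer than `radius`."""
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances via broadcasting: shape (N, N).
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep pairs below the threshold and drop self-loops.
    mask = (dist < radius) & ~np.eye(len(positions), dtype=bool)
    src, dst = np.nonzero(mask)
    return np.stack([src, dst])


def mean_aggregate(features, edge_index):
    """One message-passing step: average the features of each node's neighbors."""
    features = np.asarray(features, dtype=float)
    src, dst = edge_index
    summed = np.zeros_like(features)
    # Accumulate each source node's features at its destination node.
    np.add.at(summed, dst, features[src])
    counts = np.bincount(dst, minlength=len(features)).reshape(-1, 1)
    # Isolated nodes have no neighbors and receive a zero message.
    return summed / np.maximum(counts, 1)


# Toy positions (arbitrary units, NOT DESI data): two close galaxies, one isolated.
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
edge_index = build_radius_graph(positions, radius=2.0)

# One hypothetical scalar feature per galaxy.
node_features = np.array([[1.0], [3.0], [10.0]])
messages = mean_aggregate(node_features, edge_index)
```

The brute-force distance matrix scales as N² in memory, so it only suits small subsets such as a single rosette sample; for larger samples a spatial index (for instance `scipy.spatial.cKDTree.query_pairs`) would be a more practical choice.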
What this project shows
- Strong baseline performance: simple regression models already recover stellar masses very accurately from DESI BGS observables.
- Linear structure in the data: redshift appears as one of the most strongly correlated predictors of stellar mass.
- Robust nonlinear baseline: Random Forest captures more complex feature interactions while remaining stable against irrelevant inputs.
- Exploratory graph approach: the initial GNN establishes a graph-based framework for future improvements using observational galaxy relations.
Model comparison
- Multiple linear regression: the strongest results in the current study, with very high predictive accuracy across training and held-out rosettes.
- Random Forest: also performs well, with test values around R² ≈ 0.90 and low MSE in the rosettes shown.
- Initial GNN: in its first implementation, the graph model shows more modest performance and is best interpreted as a proof of concept rather than a final, best-performing method.
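The rosette-based comparison above relies on R² and MSE computed on held-out rosettes. The sketch below shows how such an evaluation could look for the linear baseline. Everything in it is an assumption made for illustration: the data are synthetic random numbers (not DESI measurements), the coefficients in `true_w` are arbitrary, the choice of rosettes 8 and 9 as the test set is hypothetical, and the fit uses ordinary least squares via NumPy rather than whatever library the project used. The resulting scores say nothing about the project's actual results.

```python
import numpy as np


def r2_and_mse(y_true, y_pred):
    """Return the coefficient of determination and the mean squared error."""
    residual = y_true - y_pred
    mse = float(np.mean(residual ** 2))
    total = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - float(np.sum(residual ** 2)) / total
    return r2, mse


rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-ins for six inputs (five fluxes plus redshift).
X = rng.normal(size=(n, 6))
true_w = np.array([0.5, -0.3, 0.2, 0.4, 0.1, 1.0])  # arbitrary, illustrative only
y = X @ true_w + 10.0 + rng.normal(scale=0.1, size=n)  # mock log stellar mass

# Assign each mock galaxy to one of 10 rosettes and hold out two of them.
rosette = rng.integers(0, 10, size=n)
test_mask = rosette >= 8
train_mask = ~test_mask

# Multiple linear regression: least squares with an intercept column.
A_train = np.column_stack([X[train_mask], np.ones(train_mask.sum())])
coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)

A_test = np.column_stack([X[test_mask], np.ones(test_mask.sum())])
y_pred = A_test @ coef
r2, mse = r2_and_mse(y[test_mask], y_pred)
```

Splitting by rosette rather than by random rows keeps spatially neighboring galaxies on the same side of the train/test boundary, which matters once graph edges link nearby objects.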
Current interpretation:
In the current project material, the main result is not that the GNN already outperforms the simpler models
but that graph-based learning is a promising extension worth refining further on observational DESI data.
Resources
This project combines observational DESI data, simple interpretable regressors, and an initial graph-neural-network
implementation to study fast stellar-mass estimation in bright galaxies.
Collaborators