2 years of IRIS Projects

Two years, two projects, two posters...

...and 3,094,081 parameters.


In Years 9 and 10, I participated in two of the Institute of Research in Schools' (IRIS) projects and completed a research project for each, culminating in presenting both at their annual conference in London.

  • Cosmic Mining - Automated classification of astronomical spectra using image classification models
  • Big Data: ATLAS - Detecting the Higgs Boson using Graph Neural Networks

Cosmic Mining

Manual classification of astronomical spectra is slow and can be subjective, so in this project I tested whether a Convolutional Neural Network (CNN) could classify spectral features automatically with human-level accuracy. I used 1,635 unclassified low-resolution spectra from the Spitzer Space Telescope, obtained from the CASSIS database, which I quantised, flux-normalised, classified (by me!), and converted into fixed-resolution images as training data. For each classification task below, I fine-tuned a pre-trained MobileNet model (with 3 million trainable parameters):

  • overall continuum
  • absorption at 13.7 µm
  • red excess
  • stellar component
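
As a rough illustration of the preprocessing idea, the sketch below (a hypothetical helper, not the project's actual pipeline code) flux-normalises a 1-D spectrum and quantises it onto a fixed-size pixel grid, producing the kind of fixed-resolution image a CNN can ingest:

```python
import numpy as np

def spectrum_to_image(wavelength, flux, size=224):
    """Rasterise a 1-D spectrum into a square image.

    Illustrative only: flux-normalise to [0, 1], then quantise
    the trace onto a size x size pixel grid.
    """
    # Flux-normalise to the [0, 1] range
    f = (flux - flux.min()) / (flux.max() - flux.min())
    # Quantise: map each sample to a column, its flux to a row
    x = np.linspace(0, size - 1, len(wavelength)).astype(int)
    y = ((1 - f) * (size - 1)).astype(int)  # row 0 = top of image
    image = np.zeros((size, size), dtype=np.uint8)
    image[y, x] = 255  # draw the spectrum as white pixels
    return image

# Synthetic example: a flat continuum with a dip near 13.7 µm
wl = np.linspace(5, 38, 300)  # microns (made-up sampling)
fl = 1.0 - 0.5 * np.exp(-((wl - 13.7) ** 2) / 0.1)
img = spectrum_to_image(wl, fl)
print(img.shape)  # (224, 224)
```

In the real pipeline, images like these became the training data for the fine-tuned MobileNet classifiers.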

I evaluated on an unseen validation set of 251 spectra, and the models showed strong performance across all four tasks (approximately 0.91–0.96 accuracy), with the 13.7 µm absorption feature achieving the lowest accuracy and red excess the highest.

You can read the full write-up here

Detecting the Higgs Boson using Graph Neural Networks

Detecting Higgs Boson decay events requires separating signal events from the Standard Model background. In this project, I investigated designing a Graph Neural Network (GNN) to classify events from ATLAS Open Data for the H → WW → ℓνℓν decay channel. Each event can be modelled as a graph where:

  • Nodes model the reconstructed particles (e.g. leptons/jets) described by features such as pT, η, and φ.
  • Edges model relationships such as angular separation ΔR.
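
To make the graph construction concrete, here is a minimal sketch (with made-up particle values) of computing ΔR between every pair of particles in an event, taking care to wrap Δφ into (−π, π]:

```python
import numpy as np

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation ΔR = sqrt(Δη² + Δφ²), with Δφ wrapped to (-π, π]."""
    d_eta = eta1 - eta2
    d_phi = (phi1 - phi2 + np.pi) % (2 * np.pi) - np.pi
    return np.hypot(d_eta, d_phi)

# Hypothetical event: two leptons and one jet, features (pT [GeV], η, φ)
particles = np.array([
    [45.0,  0.5,  1.2],   # lepton 1
    [30.0, -0.3, -2.9],   # lepton 2
    [60.0,  1.1,  0.4],   # jet
])

# Fully connected graph: one edge per particle pair, weighted by ΔR
n = len(particles)
edges = [(i, j, delta_r(particles[i, 1], particles[i, 2],
                        particles[j, 1], particles[j, 2]))
         for i in range(n) for j in range(i + 1, n)]
for i, j, dr in edges:
    print(f"edge {i}-{j}: ΔR = {dr:.2f}")
```

The φ-wrapping matters because φ is periodic: two particles at φ ≈ +π and φ ≈ −π are nearly collinear, not far apart.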

I built a three-layer GNN (with 94,081 trainable parameters) using PyTorch Geometric, with global pooling and an MLP classifier (outputting a value between 0 and 1), and trained it using an 80/20 train–validation split on signal and background samples. While initial results do not show high accuracy, the model achieves ~65% weighted accuracy, demonstrating in principle that the network can learn to differentiate between signal and background events.
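
The real model used PyTorch Geometric, but the shape of the forward pass can be sketched library-free in NumPy (random weights standing in for trained ones, and a toy fully connected 3-particle graph standing in for a real event): three message-passing layers, global mean pooling, then an MLP ending in a sigmoid so the output lies between 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def gnn_layer(h, adj, w_self, w_neigh):
    """One message-passing step: combine each node with its neighbours' mean."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    neigh_mean = adj @ h / deg          # mean over neighbouring node features
    return relu(h @ w_self + neigh_mean @ w_neigh)

# Toy event graph: 3 particles with 4 features each, fully connected
h = rng.normal(size=(3, 4))
adj = np.ones((3, 3)) - np.eye(3)       # adjacency matrix, no self-loops

# Three message-passing layers (random weights, not trained ones)
for _ in range(3):
    w_self = rng.normal(size=(4, 4)) * 0.5
    w_neigh = rng.normal(size=(4, 4)) * 0.5
    h = gnn_layer(h, adj, w_self, w_neigh)

# Global mean pooling -> MLP classifier -> sigmoid output in (0, 1)
g = h.mean(axis=0)                      # one vector summarising the whole event
w1, b1 = rng.normal(size=(4, 8)) * 0.5, np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)
score = 1.0 / (1.0 + np.exp(-(relu(g @ w1 + b1) @ w2 + b2)))
print(f"signal probability: {score[0]:.3f}")
```

Global pooling is what lets the network make a single per-event prediction even though events contain varying numbers of particles.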

You can read the full write-up here