Two years, two projects, two posters...
...and 3,094,081 parameters.
In Years 9 and 10, I participated in two of the Institute of Research in Schools' (IRIS) projects and completed a research project for each, culminating in presenting both at their annual conference in London.
- Cosmic Mining - Automated classification of astronomical spectra using image classification models
- Big Data: ATLAS - Detecting the Higgs Boson using Graph Neural Networks
Cosmic Mining
Manual classification of astronomical spectra is slow and can be subjective, so in this project I tested whether a Convolutional Neural Network (CNN) could classify spectral features automatically with human-level accuracy. I used 1,635 unclassified low-resolution spectra from the Spitzer Space Telescope, obtained from the CASSIS database, which were quantised, flux-normalised, classified (by me!), then converted into fixed-resolution images for training data. For each classification task below, I fine-tuned a pre-trained MobileNet model (with 3 million trainable parameters):
- overall continuum
- absorption at 13.7 µm
- red excess
- stellar component
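The fine-tuning setup can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the image size, head architecture, and hyperparameters are assumptions, and `weights=None` is used here only so the sketch runs offline (the real model started from pre-trained weights).

```python
import tensorflow as tf

# Hypothetical shape for the fixed-resolution spectrum images.
IMG_SHAPE = (128, 128, 3)

# MobileNet backbone as a feature extractor. weights=None keeps this sketch
# offline; in practice you would load pre-trained (e.g. ImageNet) weights.
base = tf.keras.applications.MobileNet(
    input_shape=IMG_SHAPE, include_top=False, weights=None, pooling="avg"
)
base.trainable = True  # the post fine-tunes ~3M trainable parameters

# One binary head per task, e.g. "is the 13.7 µm absorption feature present?"
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(
    optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"]
)
```

A separate copy of this model would be fine-tuned for each of the four classification tasks.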
I evaluated the models on an unseen validation set of 251 spectra; they showed strong performance across all four tasks (approximately 0.91–0.96 accuracy), with the 13.7 µm absorption feature achieving the lowest accuracy and red excess the highest.
You can read the full write-up here
Detecting the Higgs Boson using Graph Neural Networks
Detecting Higgs Boson decay events requires separating signal events from the Standard Model background. In this project, I investigated designing a Graph Neural Network (GNN) for classifying events from ATLAS Open Data for the H → WW → ℓνℓν decay channel. Each event can be modelled as a graph where:
- Nodes model the reconstructed particles (e.g. leptons and jets), described by features such as pT, η, and φ.
- Edges model relationships such as angular separation ΔR.
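One way to build such graphs is to connect particle pairs whose angular separation ΔR = √(Δη² + Δφ²) falls below some cut. The sketch below is illustrative: the threshold value and the toy particle kinematics are assumptions, not the project's actual choices.

```python
import numpy as np

def delta_r_edges(eta, phi, threshold=1.0):
    """Connect particle pairs with ΔR = sqrt(Δη² + Δφ²) below a
    (hypothetical) threshold. Returns a (2, E) edge-index array."""
    eta, phi = np.asarray(eta), np.asarray(phi)
    src, dst = [], []
    n = len(eta)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dphi = abs(phi[i] - phi[j])
            dphi = min(dphi, 2 * np.pi - dphi)  # wrap Δφ into [0, π]
            dr = np.hypot(eta[i] - eta[j], dphi)
            if dr < threshold:
                src.append(i)
                dst.append(j)
    return np.array([src, dst])

# Toy event with three reconstructed particles (e.g. two leptons and a jet).
eta = [0.1, 0.3, 2.0]
phi = [0.0, 0.2, 3.0]
edges = delta_r_edges(eta, phi)  # only the nearby lepton pair is connected
```

Edges are stored in both directions, matching the (source, target) edge-index convention PyTorch Geometric expects.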
I built a three-layer GNN (with 94,081 trainable parameters) using PyTorch Geometric, with global pooling and an MLP classifier that outputs a value between 0 and 1, and trained it using an 80/20 train-validation split on signal and background samples. While initial results do not show high accuracy, the network achieves ~65% weighted accuracy, demonstrating in principle that it can learn to differentiate between signal and background events.
You can read the full write-up here
