SUMMARY
My team tackled two related problems about asteroids and meteorites using supervised and unsupervised machine learning. The first question was whether we could predict if an asteroid is potentially hazardous from its physical characteristics. The second was whether clustering could reveal geographic or class-based patterns in meteorite landings. We used a separate dataset for each question to get a fuller picture of whether machine learning can make asteroid tracking more efficient.
For the asteroid problem, we worked with two datasets: a Kaggle CSV of 683 asteroids and live data pulled from NASA's Near-Earth Object API. We cleaned and aligned them so that a Random Forest classifier trained on the labeled NASA data could predict hazardousness for the unlabeled Kaggle dataset. For the meteorite problem, we applied K-Means clustering to over 30,000 landing records with mass, latitude, longitude, and class as features, and chose five clusters based on an elbow plot.
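The train-on-labeled, predict-on-unlabeled workflow described above might look roughly like the following sketch. It assumes scikit-learn and uses synthetic stand-in data: the feature names, the labeling rule, and the NASA sample size are hypothetical and are not the columns or values from our actual datasets. Only the 683-row size of the Kaggle set comes from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical shared features, in the same column order for both datasets:
# [diameter_km, velocity_kps, miss_distance_km]
def make_features(n):
    return np.column_stack([
        rng.uniform(0.01, 2.0, n),   # diameter_km
        rng.uniform(1.0, 40.0, n),   # velocity_kps
        rng.uniform(1e5, 7e7, n),    # miss_distance_km
    ])

# Stand-in for the labeled NASA data (sample size is an assumption).
X_nasa = make_features(500)
# Synthetic labeling rule for illustration only, not NASA's hazard definition.
y_nasa = ((X_nasa[:, 0] > 0.5) & (X_nasa[:, 2] < 3e7)).astype(int)

# Stand-in for the unlabeled Kaggle data (683 asteroids, no hazard column).
X_kaggle = make_features(683)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validated accuracy on the labeled data.
cv_scores = cross_val_score(model, X_nasa, y_nasa, cv=5, scoring="accuracy")

# Fit on all labeled rows, then predict labels for the unlabeled rows.
model.fit(X_nasa, y_nasa)
kaggle_predictions = model.predict(X_kaggle)

print("Mean CV accuracy:", cv_scores.mean())
print("Predicted hazardous count:", int(kaggle_predictions.sum()))
```

The key requirement is that both datasets expose the same features in the same order and units before the model is applied. Any accuracy printed here reflects only the synthetic rule, not our real results.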
The Random Forest performed well on the NASA data, with strong cross-validated accuracy. The K-Means clustering, however, revealed that year of impact was the dominant factor driving cluster formation, so the geographic patterns we hoped to uncover were not supported by the feature set we had.
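As a rough illustration of the elbow method and of checking which feature drives cluster formation, here is a minimal sketch assuming scikit-learn. The data is synthetic, the integer encoding of meteorite class is an assumption (we do not claim this is how class was encoded in the project), and only the four features listed above are included. A year column could be inspected the same way if it were part of the feature matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000  # small synthetic sample; the real dataset has over 30,000 records

feature_names = ["mass", "latitude", "longitude", "class_code"]
X = np.column_stack([
    rng.lognormal(mean=5.0, sigma=2.0, size=n),  # mass (hypothetical units)
    rng.uniform(-90, 90, n),                     # latitude
    rng.uniform(-180, 180, n),                   # longitude
    rng.integers(0, 10, n),                      # class as an integer code (assumption)
])

# Standardize so no feature dominates purely because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Elbow method: record inertia (within-cluster sum of squares) for each k.
inertias = {}
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = km.inertia_

# Fit the chosen model (five clusters, as in the project).
final = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_scaled)

# Spread of the centroids along each standardized feature: a feature whose
# centroids are far apart contributes more to separating the clusters.
centroid_spread = final.cluster_centers_.std(axis=0)

for name, spread in zip(feature_names, centroid_spread):
    print(f"{name}: centroid spread = {spread:.2f}")
```

Plotting `inertias` against k and looking for the point where the curve flattens gives the elbow. Comparing `centroid_spread` across features is one simple way to see whether a single feature is responsible for most of the separation.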
Below is a video presentation that further describes methodology and results:
