There’s nothing like bonding over science with your seventh grader to make you feel both proud and profoundly inadequate. My son and I recently tackled his honors science project by diving headfirst into machine learning (ML) and exoplanet hunting. It was a bold move. I mean, who doesn’t want to turn a simple middle school project into a crash course on Python-based ML on Linux? As it turns out, finding exoplanets in a sea of “nothing-to-see-here” light curve data wasn’t just challenging — it was humbling.

Here’s the kicker: our struggles with this project made me realize something bigger — cybersecurity and exoplanet detection have a lot in common, and not in the “NASA uses Wi-Fi too” kind of way. Both involve sifting through an overwhelming amount of data, looking for that one-in-a-million anomaly, and both face a serious imbalance problem when it comes to training AI models.

So let’s take a journey through exoplanets, ML models, cybersecurity, and why AI might just be a hacker’s best friend.

How It Started: A Python, a CNN, and a Science Fair Walk into a Dataset

Our project began like most great adventures — with optimism and zero idea what we were getting into. The plan was simple: use light curve data from NASA’s Kepler mission to train a Python-based ML model to detect exoplanets. Light curves are like cosmic heartbeats — graphs showing how a star’s brightness changes over time. A dip in brightness might mean a planet is passing in front of the star, like a tiny celestial peekaboo.
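To make the "dip in brightness" idea concrete, here is a minimal sketch that fabricates a toy light curve in plain Python. It is not Kepler data and not the code from our project. The function name, the box-shaped dip, and every number in it (period, duration, depth, noise) are assumptions chosen only for illustration.

```python
import random

def synthetic_light_curve(n_points=200, period=50, duration=5,
                          depth=0.01, noise=0.001, seed=0):
    """Return normalized flux values with a box-shaped dip every `period` samples.

    All parameters are hypothetical; real transits are not perfect boxes.
    """
    rng = random.Random(seed)
    flux = []
    for t in range(n_points):
        in_transit = (t % period) < duration
        base = 1.0 - depth if in_transit else 1.0
        flux.append(base + rng.gauss(0.0, noise))
    return flux

flux = synthetic_light_curve()

# Compare average brightness inside and outside the fake transits.
in_transit = [f for t, f in enumerate(flux) if t % 50 < 5]
out_transit = [f for t, f in enumerate(flux) if t % 50 >= 5]
mean_in = sum(in_transit) / len(in_transit)
mean_out = sum(out_transit) / len(out_transit)
print(f"mean flux in transit: {mean_in:.4f}, out of transit: {mean_out:.4f}")
```

Even in this cartoon version the dip is only about 1% of the star's brightness, which hints at why real detections are hard.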

We started with a feed-forward neural network (FFNN) because, well, it seemed approachable. Spoiler alert: it wasn’t. The FFNN essentially looked at the data and went, “Nah, I’m just going to guess ‘no exoplanet’ every time.” And you know what? Since almost every star in the dataset had no planet, that guess scored a high accuracy. It just never found a single planet, which made it useless.

Next came the heavy artillery: convolutional neural networks (CNNs). CNNs are like the Sherlock Holmes of ML, designed to pick up patterns in complex data. Still, even with a CNN, our model had one favorite prediction: “no exoplanet.” Every. Single. Time.

Houston, We Have an Imbalance Problem

The real issue was our dataset. Exoplanet examples were outnumbered by non-exoplanet examples by about a bajillion to one (okay, a little less, but you get the idea). Machine learning models are like kids at a buffet: they’ll pick what’s easiest. When one class dominates, always predicting it already gives a low average training error, so the model has little incentive to learn what a transit looks like. It was easier to just say “no exoplanet” and call it a day.
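Here is a small sketch of why "always say no" looks so good on paper. The 10-in-1,000 split is a made-up ratio, not the one in our dataset, and `evaluate` is just a helper written for this example.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels (1 = exoplanet)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical dataset: 10 stars with planets among 1,000 stars.
y_true = [1] * 10 + [0] * 990

# A "model" that predicts "no exoplanet" every single time.
y_pred = [0] * 1000

metrics = evaluate(y_true, y_pred)
print(metrics)
```

The lazy model lands at 99% accuracy with a recall of zero, so accuracy alone is the wrong scoreboard for a problem like this. Precision and recall on the rare class tell the real story.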

We tried everything to address this:

Class Weights: These are like putting your thumb on the scale to make the model pay more attention to the underrepresented class. Didn’t work.
SMOTE (Synthetic Minority Oversampling Technique): A fancy way of creating fake exoplanet data. Still didn’t work.
Ensemble Models: Multiple CNNs working together in a “voting” system. Better but not great.
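For the curious, the first two ideas look roughly like this in plain Python. This is a sketch, not what we actually ran. `balanced_class_weights` uses the common inverse-frequency heuristic (total samples divided by number of classes times class count). `smote_like` is a simplified stand-in: real SMOTE interpolates toward one of a sample's nearest neighbors, while this version picks a random partner to stay short. The feature values are invented.

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency weights: rare classes get proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def smote_like(minority, n_new, seed=0):
    """Create synthetic minority samples by interpolating between random pairs.

    Simplification: real SMOTE interpolates toward a k-nearest neighbor.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # position along the line segment from a to b
        synthetic.append([x + lam * (y - x) for x, y in zip(a, b)])
    return synthetic

# Hypothetical labels: 990 "no exoplanet" (0) and 10 "exoplanet" (1).
labels = [0] * 990 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)

# Hypothetical two-feature minority samples.
minority = [[0.980, 0.990], [0.970, 0.995], [0.985, 0.990]]
synthetic = smote_like(minority, n_new=5)
print(synthetic)
```

With this split, a mistake on an exoplanet costs about a hundred times more than a mistake on an ordinary star. The synthetic points all sit between existing ones, which is also the technique's weakness: it cannot invent a kind of exoplanet signal the dataset never contained.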

No matter what we did, the imbalance problem reigned supreme.

The Cybersecurity Parallel: Attackers in a Galaxy of Defenders

Here’s where things got interesting. Our ML struggles mirrored a core challenge in cybersecurity: spotting attacks in oceans of normal behavior. Cybersecurity tools sift through terabytes of logs daily, looking for that one indicator of compromise (IoC). But like our exoplanet model, they’re battling an imbalance problem. Attack data is scarce and highly varied, while normal activity dominates the dataset.

Now imagine this: attackers leveraging AI. Offensive AI tools can learn to mimic normal behavior while crafting attacks. Think phishing emails indistinguishable from legitimate ones, or malware that adapts faster than you can say “zero-day exploit.”

Meanwhile, defensive AI tools struggle because:

Data Scarcity: Logs are packed with “no attack” entries, making true positives rare.
False Positives: Overreacting to anomalies is a quick way to lose credibility.
Adaptation: Attackers can train their tools on the same defenses, essentially gaming the system.
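The first two points combine into some uncomfortable arithmetic. The sketch below applies Bayes' rule to made-up numbers: one malicious event in 10,000, a detector that catches 99% of attacks, and a 1% false positive rate. None of these figures come from a real product or log set.

```python
def alert_precision(prevalence, true_positive_rate, false_positive_rate):
    """Fraction of alerts that are real attacks, via Bayes' rule."""
    true_alerts = prevalence * true_positive_rate
    false_alerts = (1.0 - prevalence) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)

# Hypothetical detector and traffic mix.
precision = alert_precision(
    prevalence=1e-4,           # 1 in 10,000 events is malicious
    true_positive_rate=0.99,   # catches 99% of attacks
    false_positive_rate=0.01,  # wrongly flags 1% of normal events
)
print(f"share of alerts that are real attacks: {precision:.2%}")
```

Under these assumptions, fewer than 1 alert in 100 is a real attack, even though the detector sounds excellent on paper. That is the imbalance problem showing up as analyst workload.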

It’s like trying to find a needle in a haystack when the needle has camouflage and knows how you’re searching.

The Models and Methods Behind the Madness

Let’s get nerdy for a minute. Python is the de facto standard language for ML, on Linux and everywhere else, and PyTorch and TensorFlow are the go-to frameworks. For our exoplanet project, we cycled through several model types:

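Feed-Forward Neural Networks (FFNNs): Great for structured data but outmatched by the complexity of light curves. A plain FFNN treats every brightness sample as an unrelated input, so it has no built-in sense that a dip is a run of neighboring points.
Convolutional Neural Networks (CNNs): Designed for pattern recognition, like identifying cats in photos, or exoplanets in light curves. A CNN slides small filters along the data, so it can pick up a local shape such as a dip wherever it occurs. Still not magic.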
Ensemble Learning: Combining multiple models to vote on predictions. Like democracy, it works better in theory than practice.
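The voting step in an ensemble is simple enough to sketch in a few lines. The three prediction lists below are invented, and `majority_vote` is a helper written for this example rather than the code from our project.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote per sample."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]

# Hypothetical predictions from three models on four stars (1 = exoplanet).
model_a = [0, 0, 1, 0]
model_b = [0, 1, 1, 0]
model_c = [0, 0, 1, 1]

combined = majority_vote([model_a, model_b, model_c])
print(combined)  # [0, 0, 1, 0]
```

With an odd number of models and two classes there are no ties. The catch is that voting only helps when the models make different mistakes. If all three lean toward "no exoplanet," so does the vote.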

Cybersecurity AI often uses similar approaches but tailored for anomaly detection. Autoencoders learn to reconstruct normal activity and flag whatever they reconstruct poorly, while recurrent neural networks (RNNs) model sequences and can flag unusual ones in time-series data. Still, they’re only as good as the data they’re trained on, and therein lies the rub.
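The core idea of anomaly detection, learning what normal looks like and flagging what deviates, does not need a neural network to demonstrate. The sketch below swaps in a much simpler technique: a rolling median with a robust z-score. It is not an autoencoder or an RNN. The "logins per minute" series, the window size, and the threshold are all assumptions made for this example.

```python
import statistics

def robust_anomalies(series, window=20, threshold=5.0):
    """Flag indices whose value is far from the rolling median of recent history.

    Distance is measured in units of the median absolute deviation (MAD),
    scaled by 1.4826 so it is comparable to a standard deviation.
    """
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        median = statistics.median(history)
        mad = statistics.median([abs(x - median) for x in history])
        scale = 1.4826 * mad
        if scale == 0:
            scale = 1e-9  # avoid dividing by zero on perfectly flat history
        if abs(series[i] - median) / scale > threshold:
            flagged.append(i)
    return flagged

# Hypothetical logins per minute: a steady pattern with one burst.
logins = [100 + (i % 5) for i in range(60)]
logins[45] = 400

anomalies = robust_anomalies(logins)
print(anomalies)  # [45]
```

The median and MAD are used instead of the mean and standard deviation so that one spike in the history does not hide the next one. A patient attacker who stays under the threshold sails right through, which is the adaptation problem in miniature.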

[Image: Galaxy light reaching the James Webb telescope from up to 13 billion years ago]

Why Attackers Have the Edge

Let’s be real: AI is going to be a game-changer for attackers. They don’t have to deal with the same data imbalance issues because they’re the ones creating the anomalies.

Offensive AI can:

Generate Phishing Content: AI tools like ChatGPT can craft phishing emails that pass human scrutiny.
Mimic Legitimate Behavior: Malware that looks and acts like a benign application? Check.
Automate Attacks: Tools that probe for vulnerabilities faster than any human ever could.

Defenders, on the other hand, are stuck playing catch-up. Their AI tools rely on historical data, which may not capture the latest attack methods. And when they do detect something, they face a new problem: what now? Alert fatigue is real, and most SOC teams can’t chase every lead.

The Path Forward: Breaking the Cycle

So, is there hope? Maybe. Just like exoplanet detection could improve with better data (and maybe quantum computing), cybersecurity can evolve. Here’s what needs to happen:

Better Data Curation: Balance datasets with synthetic but realistic attack scenarios.
Continuous Learning: Deploy models that adapt to new threats in real time.
Collaboration: Share threat intelligence across organizations to improve the collective defense.

And maybe — just maybe — we need a breakthrough in AI technology, akin to going from the Wright brothers to supersonic jets.

[Image: The confusion matrix was very confused on this 3-model ensemble run (there aren’t 31 exoplanets in the training set)]

Final Thoughts: The Cosmos and the Cloud

At the end of our project, my son and I didn’t find exoplanets, but we did find perspective. In both the vastness of space and the chaos of cyberspace, the real challenge is the same: finding the lone anomalies that matter.

AI can help, but it’s not a silver bullet. Whether you’re searching for planets or preventing breaches, the key is recognizing the limits of technology and working smarter to overcome them.

Now if you’ll excuse me, I’ve got to explain to a seventh grader why “learning from failure” is as good as winning. Wish me luck.

Thanks to the James Webb telescope for providing all the images

Addendum: Confessions of an AI Newbie

Before you finish this piece and start worrying that I’m sitting here building Skynet in my free time, let me set the record straight: while I certainly know my way around the world of cybersecurity (it’s been my professional home for decades), I am a relative noob when it comes to the intricacies of AI and ML coding. My son and I may have dabbled in neural networks and Python, but let’s be honest — there are people out there who eat, sleep, and breathe this stuff. They’re the real wizards of the AI world, and I tip my hat to them.

If my light curve escapades taught me anything, it’s that AI is incredibly complex, and getting it to do something truly revolutionary — whether it’s spotting exoplanets or stopping a cyberattack — requires a level of expertise and innovation that far exceeds what I brought to my son’s seventh-grade science project. Thankfully, those folks are out there. As you’re reading this, I’m hopeful that brilliant data scientists, AI researchers, and other experts are tackling these challenges with tools and techniques I can only dream of understanding.

In the meantime, I’ll keep doing what I do best: asking hard questions, cracking a few dad jokes, and doing my part to make the cyber world a little safer. And maybe, just maybe, inspiring a seventh grader to aim for the stars — both figuratively and literally.