I’ve been quiet lately, not because I ran out of things to say, but because I’ve been busy tending my modest garden and hunting for my next role. Mostly, though, I’ve been building an AI assistant that doesn’t forget everything the moment you close a terminal window. Some of you may recall my previous post, where I introduced a command-line-based vulnerability analysis tool. It did what it was supposed to: parsed CVEs, connected them to internal policies and configurations, and spat out reasonably coherent summaries and an adequate risk analysis for the enterprise.
That should’ve been enough. But of course, it wasn’t.
What started as a proof-of-concept slowly evolved into a fully-fledged side project: an LLM (Large Language Model)-powered CISO sidekick with document ingestion, memory, and orchestration that finds new and interesting ways to break every time I make a change.
As luck would have it, I have a bit of time to write today, because I’m currently waiting on a full re-ingestion of source documents used for memory and RAG (Retrieval Augmented Generation). It turns out rebuilding a vector DB of nearly 900,000 chunks takes a while. Long enough to reflect on how this all spiraled and tee up a few posts about what I’ve learned (or at least broken) along the way.
What Took So Long
The short version? This was supposed to be a small project. Then I started coding. Here are a few of my thoughts on building an AI-based agent/tool.
- LLM-as-Coding-Assistant: Yes, I used an LLM to help build the LLM assistant. Yes, I see the irony. When it worked, it was super helpful, spitting out boilerplate code, fixing obvious bugs, scaffolding classes and modules, inserting logging statements, even writing decent first drafts of pipeline logic. But when it failed, it failed confidently—hallucinating imports from imaginary libraries, misreading its own context, forgetting what it suggested just minutes prior, and suggesting “fixes” that subtly broke everything. Eventually, I stopped trusting anything longer than 10 lines and started verifying every response like I was reviewing an audit report of my security program.
- Vibe Coding: Since my coding copilot wasn’t being as helpful as I wanted, and my Python skills are mediocre at best, I thought I’d try vibe coding on my first attempt at a rewrite. I didn’t start with a design document. I started with, “Let’s just get something working.” That worked fine until I had five partially overlapping modules, three conflicting chunking strategies, and an internal function named maybe_fix_it(). At some point, I found myself reverse-engineering my own project to understand why it was behaving the way it was. I also found that, even with frontier models, the AI would often forget or willfully ignore existing decisions, reference old code that had been deprecated or refactored, or get stuck in a debugging loop (try X, try Y, try X, try Y). You have to stay on top of your vibe coding partner constantly or you’ll end up with a broken app, reverting commits in git, or even branching your code from a known working version into a new repo and starting over.
- Models: Local ones were private and cheap, but slow and occasionally unhelpful. Frontier models were fast and fluent, but expensive and a bit too eager to guess what I meant instead of what I asked. I eventually settled on a hybrid model setup that uses local LLMs for most use cases, a frontier API for niche ones, and a frontier AI for vibe coding.
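The hybrid setup boils down to a dispatch decision per task. Here’s a minimal sketch of that idea in Python; the function names, task labels, and stub backends are all illustrative, not the project’s actual interfaces:

```python
# Sketch of hybrid routing: cheap local model by default, frontier API only
# for tasks explicitly flagged as needing it. Backends are stand-in stubs.
from typing import Callable

def make_router(local: Callable[[str], str],
                frontier: Callable[[str], str],
                frontier_tasks: set[str]) -> Callable[[str, str], str]:
    """Return a dispatch function that picks a backend per task type."""
    def route(task: str, prompt: str) -> str:
        backend = frontier if task in frontier_tasks else local
        return backend(prompt)
    return route

# Usage: stubs standing in for a local LLM and a paid frontier API.
route = make_router(
    local=lambda p: f"[local] {p}",
    frontier=lambda p: f"[frontier] {p}",
    frontier_tasks={"code-review", "long-context-summarize"},
)
```

The point of pulling the routing into one place is that the local/frontier tradeoff becomes a config change rather than a code change.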
- RAG: Retrieval-augmented generation is conceptually simple—chunk documents, embed them, pull them back as needed. In practice, it was two weeks of debugging chunking logic, tuning retrieval thresholds, and reading embeddings with a level of obsession normally reserved for horoscopes. Getting RAG right is more complicated than it looks: ingestion is time-consuming, mistakes are often catastrophic (usually meaning a hard reset), and retrieval can be buggy and inconsistent (inconsistency is a recurring theme with AI).
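That chunk/embed/retrieve loop looks something like the sketch below. To keep it self-contained I’ve swapped a real embedding model for a toy bag-of-words vector; every function name here is mine, not the project’s:

```python
# Minimal RAG retrieval sketch: overlapping chunks, toy embeddings,
# cosine similarity with a tunable threshold.
import math
from collections import Counter

def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-window chunks."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2,
             threshold: float = 0.1) -> list[str]:
    """Return the top-k chunks scoring above the similarity threshold."""
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return [c for c in scored[:k] if cosine(q, embed(c)) >= threshold]
```

Every knob in this sketch—chunk size, overlap, k, threshold—is one of the things that took those two weeks to tune, and changing the chunking strategy invalidates the whole stored index, which is why mistakes mean a re-ingestion.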
- Memory: I thought memory would be easy. Spoiler: it’s not. The assistant needs to remember enough to feel coherent, but not so much that it repeats itself or derails. Storing and recalling context across sessions involved building custom logic that now exists somewhere between “clever” and “duct-taped.” I’m sure the frontier systems (not just the LLM) are more the former, and mine is more the latter.
- Orchestration: This thing now runs on a small army of containers—tokenizer, embedder, LLM, vector DB, orchestrator, API service, and a handful of background workers. When it works, it feels like magic. When it doesn’t, it’s 2 AM, and I’m debugging why a service named seed-orchestrator suddenly thinks it’s responsible for chunking PDFs upside down.
- Security: As a CISO, I couldn’t not threat model it. That led to discovering potential data exposures, misconfigured API endpoints, and the sinking feeling that I had built the kind of tool I’d normally yell at someone about. Locking it down took time and is still not where I want it, because the nature of generative AI, augmented memory, and RAG makes keeping sensitive data sandboxed difficult.
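One sandboxing tactic from that lockdown effort, sketched: scrub obviously sensitive tokens before a prompt ever leaves for a frontier API. The patterns and function name below are mine, and two regexes are nowhere near real coverage—this is the shape of the control, not the control itself:

```python
# Sketch of a pre-flight redaction filter for outbound frontier API calls.
# Real coverage needs far more than these two illustrative patterns.
import re

PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),  # email addresses
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),     # IPv4 addresses
]

def redact(prompt: str) -> str:
    """Replace matches of each sensitive pattern with a placeholder."""
    for pattern, placeholder in PATTERNS:
        prompt = pattern.sub(placeholder, prompt)
    return prompt
```

The catch the security bullet hints at: once sensitive text is embedded into the vector DB or folded into memory, a filter at the API boundary is too late—which is what makes sandboxing across RAG and memory genuinely hard.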
What’s Coming Next (Maybe)
I want to say this is the first of a regular series where I will dive into one of the topics above or share an update on progress, but let’s be honest, I’m still kind of addicted to getting this thing working just right. Which means I’ll probably disappear again the next time I decide to re-embed 900,000 chunks, swap out my LLM backend, or rewrite an entire API because I don’t like how a container logs errors.
That said, I do plan to share more. Maybe between rebuilds. Maybe while waiting for my vector DB to re-index itself for the third time this week.
Here’s what’s on deck—unless something explodes, or I change chunking strategies again:
- Vibe Coding an AI Sidekick – Building first, thinking later, and regretting just enough to keep going.
- Model Wars: Local vs. Frontier – How I tried to balance performance, cost, and security.
- RAG Isn’t Magic, It’s a Maintenance Plan – The dirty truth behind “just drop your docs into a folder and let Python do the rest.”
- Memory Without Regret – How to make an assistant that remembers what matters and forgets what doesn’t (kind of).
- Threat Modeling Myself – What happened when I looked at my own LLM stack with a critical eye.
Each post will go deeper into the pain, tradeoffs, and quiet victories of building a serious, usable AI assistant from scratch, as a security leader who knows better but did it anyway.