This question burned in my brain for months.
Every time I read about the latest Large Language Model (LLM) development, it popped into my head. I couldn’t stop thinking that most of the people pumping out content (positive and negative) about AI writing code were missing a far larger point.
What if, and I know this is a bit crazy, we leveraged the autonomous, non-deterministic problem-solving skills of AI systems to not just CREATE code, but to also ITERATIVELY solve it, fix it, secure it, and improve it?
Imagine a world where a vulnerability is disclosed and, without a single human lifting a hand, your LLM conglomerate swings into action and autonomously updates every instance of the vulnerable software, to the point where your production system is no longer impacted.
What if, on the back of that, it autonomously reviewed historic logs, searched for indicators of compromise, developed an understanding of potential compromise vectors and threat actors, reviewed open source threat intelligence, and analyzed your systems accordingly?
You turn up on Monday morning and are presented with a fully scoped response, along with all the outcomes.
Wouldn’t that be amazing?
Let's take it further. What if we took the concepts of CI/CD, Test Driven Development, and threat modeling, and designed a development system that integrated them at every stage of software development? What if we then integrated that system into our overall software AND incident response pipeline, ALONG WITH autonomous learning skills that systematically improved all aspects of the pipeline?
Now that’s a compelling vision!
Initial Starting Principles
The more I thought about it, the more I found myself convinced that this was indeed the right way to treat the ever-more-powerful LLMs being released. Not as a replacement for software developers, but rather as an integrated developer assistant that systematically improves the discipline of effective software development and cybersecurity.
I also found myself reflecting on a famous book I studied during my engineering degree: The Innovator's Dilemma. It reminded me that while I don’t believe AI systems are quite there yet, if this system could be created and implemented, it would almost certainly become the gold standard moving forward, and completely overtake existing approaches.
Compelling indeed.
After several months turning this over in my head, I finally felt like I was ready to have a go at tackling this. Building on my TradeOxy YouTubing experience, I decided to go all in on this, creating a new channel, Building the Age of AI, a new GitHub repository, Production AI, and new blog content (which you’re reading right now).
My first step was defining some starting principles. I’m sure they’ll be added to, adjusted, and refined over the next few years, but by golly, they represent a great starting point!
So here goes.
Principle #1: First Class Autonomy
My first principle speaks to the heart of a full, self-healing, self-improving system. It must operate autonomously. Any goal less than that doesn’t revolutionize the status quo.
Specifically, it must be free to:
- Develop code solutions based on objectives and outcomes, not just advanced autocomplete solutions.
- Plan, specify, and solve for lower-level objectives based on human-directed higher-level outcomes.
- Systematically identify and learn from existing and evolving problem sets over time.
Put in simpler terms, autonomy must be a first-class requirement for all systems moving forward.
Principle #2: Proof by Testing, Not by Assertion
If autonomous operation is the bedrock of my proposal, then proof-by-testing is the answer to the question: “How do we know it’s actually doing what it says?”
If you’ve done any vibe coding at all, you’ve already experienced the frustrating moment when your AI confidently tells you that everything is good, and then you try it in the real world, and it breaks.
In many ways, it reminds me of junior software engineers and cybersecurity practitioners. They build something in isolation, run it on their local machine in perfect conditions, follow only the happy path, and then 💥. It falls apart when it meets real-world conditions.
Just like with human engineers, the way to fix this (while still enabling forward progress) is integration of rigorous testing into all stages of development.
With AI systems, we need to take this to the nth degree, to the point where literally every single function, feature, and known attack vector is designed and solved for. Always.
This sounds massive (and it is), but I would challenge you to refer back to my goal. If we can pull this off, we will radically change the security and effectiveness of software development!
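To make the idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and purely illustrative: `Case`, `prove`, and `parse_port` are names I made up for this example, not part of any library. The point is the shape. A change is accepted because every passing AND failing case produced evidence, not because the model said it works.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Type


@dataclass
class Case:
    """One test case: inputs plus either an expected value or an expected error."""
    name: str
    args: tuple
    expected: Any = None
    raises: Optional[Type[BaseException]] = None


def prove(func: Callable[..., Any], cases: list) -> dict:
    """Run every case and return per-case evidence instead of trusting a claim."""
    evidence = {}
    for case in cases:
        try:
            result = func(*case.args)
            # A case that expected an error fails if no error was raised.
            evidence[case.name] = case.raises is None and result == case.expected
        except Exception as exc:
            # A raised error only counts as a pass if the case expected that error type.
            evidence[case.name] = case.raises is not None and isinstance(exc, case.raises)
    return evidence


# Hypothetical function under test, standing in for AI-generated code.
def parse_port(value: str) -> int:
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


CASES = [
    Case("happy_path", ("8080",), expected=8080),
    Case("not_a_number", ("http",), raises=ValueError),
    Case("out_of_range", ("70000",), raises=ValueError),
    Case("zero", ("0",), raises=ValueError),
]

evidence = prove(parse_port, CASES)
accepted = all(evidence.values())  # the change is accepted only if every case passed
```

If you hand `prove` an implementation that skips the range check, the happy path still passes but the `out_of_range` and `zero` cases fail, so `accepted` is `False`. That is exactly the "works on my machine" failure the principle is meant to catch.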
Principle #3: Integrated Self-Learning
One of the things I spent a lot of time thinking about was the duality of computer-driven technological advancement. If you’ve ever looked at an old Commodore 64, you would quickly realize something interesting.
Every component has been hand-soldered. In fact, with tools freely available on the open market right now, most people, with a bit of study and research, could rebuild that computer by hand. No need for advanced factories and sub-visible soldering. No need for hyper-advanced material engineering practices.
So how did we get from there to where we are today?
The answer to this question is a case study in why integrating self-learning in my AI principles is so important.
The Commodore 64 allowed a group of engineers to design slightly better computer chips, RAM, and circuitry, leading to the next generation of components. These components improved the ability of people to research the next stage of development and brought about a new generation of material science, software expertise, and so on. In turn, this evolving ecosystem birthed the next generation of components.
The cycle continued to the point where, today, even the most basic smartphone on the market is unbelievably powerful compared to the most advanced systems of the time.
In many ways, I think AI is the same, and it has a duality of application. Let me explain.
LLM Capability Improvement
On the one hand, and most obviously, LLMs are themselves improving. Each new release from the frontier builders adds more capability, reasoning, and specificity. The contrast between ChatGPT 3.5 and ChatGPT 5.5 is incredible.
Even more compelling is the developing LLM ecosystem. Synthetic datasets, accurate world modeling, and deterministic predictive datasets mean that the ongoing pace of this development will accelerate over time.
As a result, the capability of LLMs is getting better, which leads to ever better solutions to a given problem set.
LLM Usage Improvement
Less obvious is the way integrating LLMs into your software development improves the quality of how you deliver.
Take, for instance, my earlier assertion that literally every single function should be tested for passing and failing cases. Think about the sheer number of human-hours this would take without LLMs. Then, layer on threat modeling, end-to-end testing, and enforcing this at every layer of your development lifecycle.
Quite frankly, it is overwhelming.
Then, remember that in order for this to work, you need to create feedback loops so that what person X learns and improves can be fed back to person Y, Z, A, B, and C, so that you don’t keep repeating the same problems over and over again. Then, think about how each of these people needs to be managed, led, and enabled to do their best work all the time.
…
Crazy.
…
In contrast, with an LLM, all you’re doing is paying for token usage. It is completely indifferent to which problems you ask it to solve, as each problem is just another task to be completed.
Testing 100% of functions? That’s just a requirement.
Therefore, what we really need to do is create an environment where learning outcomes are systematically distilled back into our overall requirements in such a way that they result in updated, more efficient outcome definitions.
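As a rough sketch of that feedback loop, assume lessons arrive as short rule statements from anywhere in the pipeline. The `Lesson` and `RequirementSet` classes below are hypothetical names invented for this illustration. A real system would need far smarter deduplication than an exact string match, but the mechanism is the same: whatever one run learns becomes a requirement for every future run.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Lesson:
    """Something learned from a failure, wherever in the pipeline it surfaced."""
    source: str       # e.g. "incident-review" or "failed-test" (illustrative labels)
    requirement: str  # the rule this lesson should become


@dataclass
class RequirementSet:
    """The shared requirements that every future task is checked against."""
    rules: list = field(default_factory=list)

    def distill(self, lessons: list) -> list:
        """Fold lessons into the rules and return only the newly added ones."""
        added = []
        for lesson in lessons:
            rule = lesson.requirement.strip()
            # Skip empty rules and rules that are already known.
            if rule and rule not in self.rules:
                self.rules.append(rule)
                added.append(rule)
        return added


requirements = RequirementSet(rules=["Every function has passing and failing tests."])

new_rules = requirements.distill([
    Lesson("incident-review", "Reject file paths containing '..'."),
    Lesson("failed-test", "Every function has passing and failing tests."),  # already known
    Lesson("failed-test", "Reject file paths containing '..'."),             # duplicate lesson
])
```

Three lessons go in, but only one new rule comes out, because the other two repeat what the system already knows. The same lesson never has to be relearned by person Y, Z, A, B, or C.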
Combined
Without getting over the top, imagine these two aspects working together. Each time LLMs improve, the quality of what they output goes up. Then, this quality improvement systematically improves literally every aspect of your software and cybersecurity lifecycle.
Like I said. Compelling.
Principle #4: Holistic Cybersecurity
My final principle is one that is near and dear to my heart. Cybersecurity.
As someone who has operated in and around cybersecurity at the highest levels, I’ve seen firsthand how difficult it is to integrate throughout your software stack. I’ve seen how scary it is when teams introduce bugs because they didn’t test their changes properly, or when inexperienced senior leaders make absolutely crazy risk-based decisions.
Perhaps even more importantly, I’ve also come to understand how difficult it is to do cybersecurity well. It is an incredibly technical domain, and truly defending a system requires advanced knowledge of every aspect of the software AND hardware stack: something that is almost impossible for any one individual to get across.
An LLM-based approach completely flips this. Once again, we find ourselves coming back to a very compelling observation. Cybersecurity, when done well, is simply a set of technical requirements, proven through effective testing, that must be addressed before a product goes live.
LLMs are exceptional at solving for requirements.
Imagine that…
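Here is a small sketch of what "security as a set of tested requirements" might look like in Python. The control (`is_safe_upload_path`), the requirement IDs, and the `release_gate` function are all hypothetical, chosen only to illustrate the pattern. Each requirement pairs a plain-language statement with an executable check, and the product only goes live when nothing fails.

```python
from pathlib import PurePosixPath
from typing import Callable, Dict, List


def is_safe_upload_path(path: str) -> bool:
    """Hypothetical control: allow only relative paths with no parent traversal."""
    parsed = PurePosixPath(path)
    return not parsed.is_absolute() and ".." not in parsed.parts


# Each requirement pairs a human-readable statement with an executable check.
SECURITY_REQUIREMENTS: Dict[str, Callable[[], bool]] = {
    "SEC-1: normal relative paths are accepted":
        lambda: is_safe_upload_path("reports/q1.pdf"),
    "SEC-2: parent-directory traversal is rejected":
        lambda: not is_safe_upload_path("../etc/passwd"),
    "SEC-3: nested traversal is rejected":
        lambda: not is_safe_upload_path("reports/../../etc/passwd"),
    "SEC-4: absolute paths are rejected":
        lambda: not is_safe_upload_path("/etc/passwd"),
}


def release_gate(requirements: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of failed requirements; an empty list means go-live is allowed."""
    return [name for name, check in requirements.items() if not check()]


failures = release_gate(SECURITY_REQUIREMENTS)
```

Adding a newly discovered attack vector means adding one more entry to the dictionary. From that moment on, no release passes the gate without proving it is handled.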
So, Back to Technical Debt and Cybersecurity
At the start of this article, I posed a compelling question that I now want to return to and examine.
What If Your LLM AI Started Fixing Its Own Tech Debt (and Cyber Security Problems)?
If you took the principles I’ve just outlined and applied them across your entire stack, what do you think would happen?
Well, first of all, your cybersecurity stack would probably start to identify all the redundant code paths that are hanging out there just waiting to be exploited.
Next, your LLM would start working through the codebase and identifying unnecessary code paths that introduce complexity. It would eliminate them (without degrading the product), which would then trigger your cybersecurity skill to confirm they are gone.
Your LLM would start identifying inefficient functions because you could set it to use your testing environment to optimize for code efficiency. In turn, this would trigger the previous two aspects.
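To give a feel for the first of these, here is a deliberately naive Python sketch that flags top-level functions nothing else in a module mentions. It uses only the standard library `ast` module. The `unused_functions` helper and the sample source are hypothetical, and the approach assumes a single file with no dynamic lookups, so treat its output as candidates for review rather than a verdict.

```python
import ast


def unused_functions(source: str) -> list:
    """Return names of top-level functions never mentioned elsewhere in the module."""
    tree = ast.parse(source)
    defined = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    # Collect every bare name or attribute name that appears anywhere in the module.
    mentioned = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            mentioned.add(node.id)
        elif isinstance(node, ast.Attribute):
            mentioned.add(node.attr)
    return [name for name in defined if name not in mentioned]


SAMPLE = '''
def current_login(user):
    return check(user)

def legacy_login(user):      # nothing calls this any more
    return True

def check(user):
    return bool(user)

print(current_login("alice"))
'''

candidates = unused_functions(SAMPLE)  # only legacy_login is never mentioned
```

In a full pipeline, a list like `candidates` would only be the starting point: the test suite decides whether removing each function degrades the product, and the security checks run again on whatever remains.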
In the interests of your time (and my article length), I’ll stop there, but consider how these three things alone would be a step change in your software development.
Your technical debt would be consistently and constantly drawn down. Your cybersecurity would improve.
And really, we’re just at the start.