What If Your AI Code Fixed Its Own Tech Debt?

When I started planning Building The Age of AI, I sat down and wrote out what I actually wanted from AI-assisted software development. I was not interested in one more demo where a model spits out a small app that looks impressive for five minutes and then collapses when you click around it. I wanted something much more useful than that.

The dream was simple to describe and difficult to build: I wanted large language models to take over as many parts of software development and cyber security as they safely could, and in doing so to improve the quality of the final product automatically.

To me, this was the critical distinction. Faster code writing is not enough. If AI only helps us generate insecure software faster, then we have not really solved anything; we have just made it easier to create future incidents. What I want to build instead is an ecosystem where AI helps with requirements, design, implementation, testing, security review, vulnerability management, logging, maintenance, and, eventually, the ongoing reduction of technical debt.

In other words, I do not want AI to be a better autocomplete. I want it to become part of the discipline of building software properly.

In the end, I broke this down into six principles.

Principle #1: The Starting Point Is High Quality Code

High quality code eliminates a lot of the problems that cyber security teams end up dealing with later. It is easier to read, easier to review, easier to test, easier to deploy, and easier to change. When the code is messy, ambiguous, and full of hidden assumptions, every later step becomes more expensive.

This is also where AI can either help or hurt. A model can produce a lot of code very quickly, but if that code is not structured, tested, and aligned with the rest of the application, it just increases the blast radius. A system built with AI needs standards that force the model to produce code that people can understand and maintain.

That is why Production AI is built around skills rather than one magic prompt. A skill gives the model a repeatable way to handle a class of work. It can define what evidence to gather, what questions to ask, what tests to create, what standards to apply, and what counts as done.

That is the only way I can see AI software development becoming trustworthy at scale.
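To make the idea of a skill more concrete, here is a minimal sketch of one as a data structure, assuming a skill simply bundles the five things listed above. The `Skill` class, its field names, the `unmet_criteria` helper and the example skill are all hypothetical illustrations, not Production AI's actual skill format.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill definition; field names are illustrative only."""
    name: str
    evidence: list[str] = field(default_factory=list)        # what to gather before changing code
    questions: list[str] = field(default_factory=list)       # what to ask when requirements are unclear
    required_tests: list[str] = field(default_factory=list)  # tests the change must add or update
    standards: list[str] = field(default_factory=list)       # coding and security standards to apply
    done_criteria: list[str] = field(default_factory=list)   # what counts as "done"


def unmet_criteria(skill: Skill, satisfied: set[str]) -> list[str]:
    """Return the done criteria not yet satisfied, in the order they were declared."""
    return [c for c in skill.done_criteria if c not in satisfied]


# Hypothetical skill for adding a new form field to an application.
add_form_field = Skill(
    name="add-form-field",
    evidence=["existing validation helpers", "related API handlers"],
    questions=["What is the maximum length?", "Who may submit this field?"],
    required_tests=["unit: validator", "integration: API rejects bad input", "e2e: form submit"],
    standards=["validate on the server", "no raw SQL"],
    done_criteria=["all required tests pass", "security review checklist complete"],
)

print(unmet_criteria(add_form_field, {"all required tests pass"}))
```

The point of the structure is that "done" becomes something the system can check rather than something the model asserts.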

Principle #2: Secure By Design Has To Be Proven

One of my strongest views is that secure software has to be backed by tests. It is not enough to say that a feature appears to work. Every meaningful user interaction should have tests that prove it behaves correctly and rejects the things it should reject.

This is where a lot of teams still fall short. A form field accepts user input, that input flows to a server, and nobody has properly tested what happens with unexpected data. That is still happening in 2026, and it should not be normal.

The AI version of this problem is even sharper. If I ask a model to build a feature and I do not force it to think through validation, edge cases, negative paths, permissions, and abuse cases, it will often optimize for the happy path. It will make the thing appear to work. But appearing to work is not the same as being production ready.

So the system has to make testing non-negotiable. Unit tests need to cover the logic. Integration tests need to prove the parts work together. End-to-end tests need to cover real user behavior. Security-focused tests need to prove that bad inputs are rejected and that trust boundaries are respected.
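As a sketch of what a security-focused test can look like, assume a hypothetical "display name" form field with an allow-list rule of 1 to 50 letters, digits, spaces, underscores or hyphens. The validator name, the rule and the sample payloads are assumptions for illustration. The tests are plain functions so that pytest can collect them, or they can be run directly.

```python
import re

# Hypothetical allow-list rule for a "display name" field.
_DISPLAY_NAME = re.compile(r"[A-Za-z0-9 _-]{1,50}")


def validate_display_name(value: object) -> str:
    """Return the cleaned name or raise ValueError; never trust the client."""
    if not isinstance(value, str):
        raise ValueError("display name must be a string")
    cleaned = value.strip()
    if not _DISPLAY_NAME.fullmatch(cleaned):
        raise ValueError("display name has invalid characters or length")
    return cleaned


BAD_INPUTS = [
    "",                               # empty
    "   ",                            # whitespace only
    "a" * 51,                         # too long
    "<script>alert(1)</script>",      # markup
    "Robert'); DROP TABLE users;--",  # injection-style payload
    None,                             # wrong type
    42,                               # wrong type
]


def test_accepts_normal_name():
    # Happy path: surrounding whitespace is trimmed, the name is kept.
    assert validate_display_name("  Ada Lovelace ") == "Ada Lovelace"


def test_rejects_bad_inputs():
    # Negative paths: every bad input must raise, not just "appear to work".
    for value in BAD_INPUTS:
        try:
            validate_display_name(value)
        except ValueError:
            continue
        raise AssertionError(f"accepted bad input: {value!r}")


if __name__ == "__main__":
    test_accepts_normal_name()
    test_rejects_bad_inputs()
    print("all checks passed")
```

The negative list is where the proof lives: a model optimizing for the happy path would typically only write the first test.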

That is the direction I want AI development to move in: not less testing because AI is faster, but more consistent testing because AI can help produce and maintain the proof.

Principle #3: Vulnerability Management Should Not Be A Panic Button

Another part of the dream is automated vulnerability management. Modern software is full of dependencies. Your application is not just your code; it is your code plus the packages, frameworks, runtimes, base images, operating systems, and build tools that support it.

When a high-impact vulnerability is disclosed, too many teams are still trying to figure out where the affected package exists, whether they are exposed, what has to be updated, and whether the update breaks anything. They are building the response while the incident is already underway.

That process should be far more automated.

At a minimum, AI should be able to help identify the affected dependency, locate where it is used, propose the update, run the relevant tests, summarize the risk, and prepare the change for human approval. Even automating half of that workflow would make teams faster and safer.
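The first two steps, identifying the affected dependency and locating where it is pinned, can be sketched as a simple match between a requirements file and advisory records. This assumes a hypothetical advisory format where versions in `[introduced, fixed)` are affected, and plain dotted numeric versions; the package names and advisory IDs are made up. Real tooling should pull from an actual advisory feed and compare versions with something like `packaging.version.Version`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record: versions in [introduced, fixed) are affected."""
    advisory_id: str
    package: str
    introduced: str
    fixed: str


def parse_version(text: str) -> tuple[int, ...]:
    # Simplification: assumes plain dotted numeric versions such as "2.31.0".
    return tuple(int(part) for part in text.split("."))


def parse_pins(requirements: str) -> dict[str, str]:
    """Read 'name==version' lines, ignoring blank lines and comments."""
    pins = {}
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def find_exposures(requirements: str, advisories: list[Advisory]) -> list[str]:
    """Return one human-readable finding per affected pinned dependency."""
    pins = parse_pins(requirements)
    findings = []
    for adv in advisories:
        current = pins.get(adv.package.lower())
        if current is None:
            continue  # not a dependency of this project
        if parse_version(adv.introduced) <= parse_version(current) < parse_version(adv.fixed):
            findings.append(f"{adv.advisory_id}: {adv.package} {current} -> upgrade to {adv.fixed}")
    return findings


# Hypothetical pins and advisories.
requirements = """
# hypothetical pins
examplelib==1.4.2
otherlib==2.32.3
"""
advisories = [
    Advisory("EXAMPLE-0001", "examplelib", "1.0.0", "1.4.5"),
    Advisory("EXAMPLE-0002", "otherlib", "2.0.0", "2.31.0"),
]

for finding in find_exposures(requirements, advisories):
    print(finding)
```

Here only `examplelib` is flagged, because `otherlib` 2.32.3 is already past the fixed version. The remaining steps (propose the update, run the tests, summarize the risk) would hang off each finding, with a human approving the final change.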

Vulnerabilities are not rare, exceptional events. They are part of the normal operating environment for any codebase that depends on other people's software.

Principle #4: The System Has To Improve As Models Improve

A key reason I am building this around skills is that models keep getting better. Every time the underlying model improves, the skills should be able to improve with it. This creates a self-improvement loop that should compound over time, accelerating genuinely secure software development.

The only way to do this at scale is to encode each workflow element into a skill, and then create a meta-skill layer over the top that focuses on optimizing those skills. This allows your AI to capture and learn from mistakes, identify effective patterns, and, over time, eliminate whole classes of issues.
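One small piece of such a meta-skill layer can be sketched as bookkeeping: record the outcome of each skill run, then flag skills whose failures keep landing in the same category, since those are the skills worth revising. The `SkillRun` record, the failure categories and the threshold are hypothetical; how a flagged skill actually gets rewritten (for example by prompting a model with the failures) is not shown.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkillRun:
    """Hypothetical record of one skill execution."""
    skill: str
    passed: bool
    failure_category: Optional[str] = None  # e.g. "missing-validation"


def revision_candidates(runs: list[SkillRun], threshold: int = 3) -> dict[str, list[str]]:
    """Map each skill to the failure categories that recurred at least `threshold` times."""
    failures: dict[str, Counter] = defaultdict(Counter)
    for run in runs:
        if not run.passed and run.failure_category:
            failures[run.skill][run.failure_category] += 1
    return {
        skill: sorted(cat for cat, n in counts.items() if n >= threshold)
        for skill, counts in failures.items()
        if any(n >= threshold for n in counts.values())
    }


runs = [SkillRun("add-form-field", False, "missing-validation")] * 3 + [
    SkillRun("add-form-field", True),
    SkillRun("bump-dependency", False, "flaky-test"),
]

print(revision_candidates(runs))
```

A recurring category is the signal that a whole class of issue could be eliminated by changing the skill, rather than fixing each instance by hand.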

Principle #5: Continuously Reduce Technical Debt

One of the points I keep coming back to is that technical debt directly affects the cost of AI development. If the codebase is messy, duplicated, poorly tested, or hard to navigate, the model has to read more context, reason through more uncertainty, and spend more iterations to make a safe change.

That means technical debt becomes token debt.

A cleaner codebase is not just nicer for humans. It is cheaper and more reliable for AI agents to work with. When a model can understand the boundaries quickly, find the relevant tests, and make a focused change, the cost of development goes down.
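A rough worked example of "token debt", assuming about 4 characters per token (a common rule of thumb; real counts depend on the model's tokenizer). The file names and the 800-character stand-in for a validation routine are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; ~4 characters per token is only a heuristic."""
    return round(len(text) / chars_per_token)


def context_tokens(files: dict[str, str]) -> int:
    """Estimated tokens an agent spends just reading these files once."""
    return sum(estimate_tokens(body) for body in files.values())


# Hypothetical: the same 800-character validation routine copied into
# three handlers, versus kept once in a shared module.
routine = "x" * 800  # stand-in for real source text
duplicated = {"signup.py": routine, "profile.py": routine, "admin.py": routine}
shared = {"validation.py": routine}

print(context_tokens(duplicated), context_tokens(shared))  # 600 200
```

The duplicated version costs three times as much to read before the agent has changed anything, and that multiplier applies again on every iteration it needs to make the change safely.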

Interestingly, because LLM agents work well when given a clear objective, you can feed these observations into the skill-improvement loop from principle #4 and reduce technical debt continuously and autonomously. You can identify when patterns are degrading, when tests are missing, or when abstractions no longer fit, and then set your model to work on fixing those problems.
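As one example of a debt signal a model could be pointed at, here is a sketch that uses Python's standard `ast` module to list public functions in a module that are never called from its tests. It is a crude, name-based heuristic (it does not measure real coverage the way a tool like coverage.py does), and the example module and test sources are hypothetical.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Names of top-level, non-private functions defined in a module."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    }


def called_names(source: str) -> set[str]:
    """Names of functions called anywhere in the source (plain or attribute calls)."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                names.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                names.add(node.func.attr)
    return names


def untested_functions(module_source: str, test_source: str) -> list[str]:
    """Public functions never called by the tests: candidates for new tests."""
    return sorted(defined_functions(module_source) - called_names(test_source))


# Hypothetical module and test file.
module_source = '''
def parse_order(raw):
    return raw.split(",")

def apply_discount(total, pct):
    return total * (1 - pct)

def _helper():
    pass
'''

test_source = '''
from shop import parse_order

def test_parse_order():
    assert parse_order("a,b") == ["a", "b"]
'''

print(untested_functions(module_source, test_source))  # ['apply_discount']
```

Output like this gives the model a concrete, checkable objective ("add tests for `apply_discount`") instead of a vague instruction to improve the codebase.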

More Than The Sum Of Its Parts

The end goal is not one impressive skill. It is the combination of skills working together. If we can get that right, AI will not just change who writes code. It will change what good software development looks like.

That is the real dream behind Production AI. I want to see whether we can build a system where AI does not merely produce more code, but helps produce better software: software that is more secure, easier to maintain, cheaper to evolve, and more reliable in production.


References

Episode: What If Your AI Code Fixed Its Own Tech Debt?
Channel: Building The Age of AI
Source repository: Production AI on GitHub
Transcript page: Transcript - BAAI-Production-AI-Intro