The Dream: AI That Builds Truly Secure Software

For the first time, large language models are good enough to take over a serious portion of the software development lifecycle. That is an extraordinary thing to say, because for most of my career, software has required layers of people, process, review, tooling, security checks, and operational discipline just to get something small into production with confidence.

At the time I recorded this episode, I was looking at the latest model releases and felt that we had crossed an important threshold. I referenced OpenAI's 5.5 and Anthropic's Fable in the transcript because, from my perspective at the time, those releases made the next stage of AI-driven software development feel materially different. Whether every model lands perfectly or not is not the point. The point is that the capability curve has moved far enough that the real question is no longer just "can AI write code?" It is "can AI help us run the whole software development lifecycle properly?"

This means that writing code that appears to achieve an outcome is no longer the hard part. I can ask an LLM to build a feature, wire an API, refactor a module, create a test, or explain a failure, and it can often get surprisingly close. Sometimes it gets there faster than I could have done manually. That part is exciting.

These days, the hard part is everything else.

To be honest, this is the real reason I started the YouTube channel Building The Age of AI. I am interested in what happens when AI moves beyond isolated code generation and starts taking on the whole software development lifecycle. Not just faster software. Not just impressive demos. Software that is tested, threat modeled, observable, maintainable, and safe enough to put in front of real users.

My background is in cyber security, and that gives me a particular lens on this. I have seen too many incidents that did not begin with some dramatic architectural failure. They began with small changes that seemed sensible at the time. A form field that accepted the wrong kind of input. A package that had not been updated. A missing negative test. A feature that worked in the happy path but failed when a user, attacker, or integration behaved differently than expected.

That is why I do not think the future is simply "AI writes code." The future that actually matters is AI helping us build the surrounding system of quality. If we do not solve that, AI coding just makes it easier to produce insecure software faster.

Watch The Episode

Watch the episode on YouTube

Code Generation Is Only One Layer

When people talk about AI replacing software development, they often focus on the visible act of writing code. That makes sense because code is the obvious output. You ask for a thing, the model writes files, and suddenly something runs.

But real software development is not only the act of producing code. It is the act of producing reliable behavior inside a changing system.

A feature does not exist in isolation. It touches routes, permissions, input validation, storage, logs, deployment, dependencies, user flows, tests, and sometimes payment, authentication, email, analytics, or third-party APIs. A feature that works in one demo can still be unsafe, untested, unobservable, or impossible to maintain.

That is where LLMs can get dangerous if they are treated as magic. They will often optimize for the request in front of them. If I ask for "a working upload feature," a model can build something that accepts a file and stores it. But did it validate size limits? Did it test a genuinely large file? Did it consider timeouts? Did it protect against malicious file names? Did it think about retries, cleanup, permissions, storage cost, observability, and failure states?

Sometimes it will. Often it will not, unless the workflow around the model forces those questions into the process.

That is the gap I care about.
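As a minimal sketch of two of those upload questions (size limits and malicious file names), a validation gate might look like the following. The 5 MiB limit, the extension allow-list, and the names `validate_upload` and `UploadRejected` are assumptions made for illustration, not part of any particular framework. Timeouts, retries, cleanup, and permissions would still need their own handling.

```python
import os
import re

MAX_UPLOAD_BYTES = 5 * 1024 * 1024             # assumed limit: 5 MiB
ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}  # assumed allow-list
# Must start with a letter or digit; no slashes, backslashes, or control characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,99}")


class UploadRejected(ValueError):
    """Raised when an upload fails one of the checks (hypothetical name)."""


def validate_upload(filename: str, size_bytes: int) -> str:
    """Return the file name if it passes every check, else raise UploadRejected."""
    if size_bytes <= 0:
        raise UploadRejected("empty upload")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise UploadRejected("file too large")
    # Rejects path traversal such as "../../etc/passwd" because "/" and a
    # leading "." are outside the allowed pattern.
    if not SAFE_NAME.fullmatch(filename):
        raise UploadRejected("unsafe file name")
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise UploadRejected("file type not allowed")
    return filename
```

The design choice here is to reject suspicious names outright rather than try to clean them up, so a failure is visible instead of silently "fixed".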

The Security Work Most People Skip

Many people never realize just HOW MUCH STUFF goes into good software development, and those who do often go out of their way to ignore it. Focused on speed and ‘tightly scoped requirements’, they skip these surrounding protections and take on avoidable risk that systematically degrades the overall security of their product.

Take testing. Unit tests for the passing case are useful, but they are not enough. Secure software development also writes negative tests that prove the system rejects bad input. Integration tests prove that services talk to each other correctly, and that they fail safely when bad input arrives. End-to-end tests should cover EVERY ACTION THAT A REAL USER (GOOD OR BAD) COULD TAKE.

Key takeaway: all of this should be tested before every release, including empty states, failure states, retries, permissions, and invalid data.
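To make the idea of a negative test concrete, here is a small sketch using Python's standard `unittest` module. The function `parse_quantity` and its 1 to 100 range are hypothetical, invented only to give the tests something to reject.

```python
import re
import unittest


def parse_quantity(raw) -> int:
    """Hypothetical function under test: parse an order quantity from 1 to 100."""
    if not isinstance(raw, str) or not re.fullmatch(r"[0-9]{1,3}", raw):
        raise ValueError("quantity must be a whole number")
    value = int(raw)
    if not 1 <= value <= 100:
        raise ValueError("quantity out of range")
    return value


class ParseQuantityTests(unittest.TestCase):
    def test_accepts_valid_quantity(self):
        # The happy path: useful, but not enough on its own.
        self.assertEqual(parse_quantity("3"), 3)

    def test_rejects_bad_input(self):
        # Negative tests: each of these must be refused, not quietly accepted.
        bad_inputs = ["", " ", "0", "101", "-1", "1.5", "abc",
                      "1; DROP TABLE orders", None]
        for raw in bad_inputs:
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    parse_quantity(raw)
```

Saved in a test module, this runs with `python -m unittest`. The list of bad inputs is the part worth growing over time as new failure cases are discovered.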

Then there’s my favorite: threat modeling. Before a feature is shipped, you should understand and model how it could be abused. What can an attacker control? What happens if they send unexpected input? What data can they access? What trust boundaries does the feature cross? What assumptions am I making about identity, network, storage, or third-party systems? A lot of insecure software comes from never asking those questions until after something breaks.

Then there is vulnerability management. I’ve written countless articles for companies and analyzed innumerable platforms attempting to do this well, and yet vulnerable software is still released every day. Every package you install becomes part of your risk surface. In Python, Node, or any modern ecosystem, you are rarely just shipping your own code. You are shipping a tree of dependencies written by other people, updated on their schedules, with their own bugs and vulnerabilities. It is not enough to check them once at release. You need to keep checking. You need to update. Sometimes you need to replace a dependency entirely because the risk profile changes.
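One small, repeatable check in this area is confirming that dependencies are pinned to exact versions, so you at least know what you are shipping. The sketch below is a simplified illustration with a hypothetical helper name, `find_unpinned`, and a deliberately strict pattern. It does not look up known vulnerabilities; that needs a dedicated scanner backed by an advisory database.

```python
import re

# Simplified pattern: "name==version", optional extras and environment marker.
# Wildcards such as "==4.*" are treated as unpinned.
PINNED = re.compile(
    r"[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?==[^\s;*]+\s*(;.*)?"
)


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for line in requirements_text.splitlines():
        entry = line.split("#", 1)[0].strip()    # drop comments and whitespace
        if not entry or entry.startswith("-"):   # skip blanks and pip options
            continue
        if not PINNED.fullmatch(entry):
            unpinned.append(entry)
    return unpinned


sample = """
# web stack
requests==2.31.0
flask>=2.0
numpy
-r other.txt
uvicorn[standard]==0.30.0  # pinned, with extras
django==4.*
"""
print(find_unpinned(sample))  # ['flask>=2.0', 'numpy', 'django==4.*']
```

A check like this is cheap enough to run on every change, which is the point: the value comes from running it continuously rather than once at release.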

Logging and incident response matter as well. When something goes wrong, you need to know what happened. That means the system has to emit useful logs. Those logs need to be ingestible, searchable, and structured enough that you can reconstruct events quickly. Without that, you are guessing during an incident, and guessing is expensive.
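A minimal sketch of structured logging, using only Python's standard `logging` and `json` modules, might look like this. The `JsonFormatter` class, the `build_logger` helper, and the `context` field are assumptions for illustration. A production setup would also need to handle exception details and avoid logging secrets.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "context" is a hypothetical field supplied through logging's `extra` argument.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        return json.dumps(payload, default=str)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines (to stderr when stream is None)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]   # replace handlers so repeated calls do not duplicate output
    logger.propagate = False
    return logger


log = build_logger("uploads")
log.info("upload rejected", extra={"context": {"user_id": 42, "reason": "file too large"}})
```

Because every line is a self-contained JSON object with consistent keys, it can be ingested and searched by field, which is what makes reconstructing events during an incident quick.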

Input validation is one of the oldest lessons in security, and it is still one of the most important. If a user can type into a box, upload a file, pass a parameter, or call an endpoint, the system needs to treat that input as untrusted. It needs to validate it, sanitize it, constrain it, and fail safely. Issues like XSS, injection, broken access control, and unsafe parsing often begin with a system trusting input it should never have trusted.
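Two of those habits, constraining input and never splicing it into a query or a page, can be sketched with Python's standard library. The `users` table, the 32-character limit, and the function names are hypothetical. The parameterized query uses `sqlite3` placeholders and the output escaping uses `html.escape`.

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user by name, treating the name as untrusted input."""
    # Constrain first: assumed rule of 1 to 32 alphanumeric characters.
    if not (1 <= len(username) <= 32) or not username.isalnum():
        raise ValueError("invalid username")
    # Parameterized query: the value is bound, never concatenated into the SQL.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchone()


def render_comment(comment: str) -> str:
    """Escape user text before placing it inside HTML."""
    return "<p>" + html.escape(comment) + "</p>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO users (username) VALUES ('alice')")
print(find_user(conn, "alice"))                     # (1, 'alice')
print(render_comment("<script>alert(1)</script>"))  # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The validation and the parameterized query overlap on purpose. If one layer is later loosened or bypassed, the other still prevents the input from being interpreted as SQL.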

Even the transport layer matters. How is data protected in transit? What assumptions are being made about the network? What happens if a certificate expires, a proxy behaves unexpectedly, or a service boundary is compromised? Good software has to think through these layers because attackers and failures do not respect the boundaries of a neat feature ticket.
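For the certificate questions in particular, a rough sketch with Python's standard `ssl` module is shown below. The function names and the TLS 1.2 floor are assumptions for illustration, and the expiry helper only does date arithmetic on a certificate's `notAfter` string. It does not fetch or validate a certificate itself.

```python
import ssl
import time
from typing import Optional


def strict_tls_context() -> ssl.SSLContext:
    """Client-side context that verifies certificates and host names."""
    context = ssl.create_default_context()  # certificate and hostname checks are on by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed policy: refuse older protocols
    return context


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a certificate's notAfter time.

    `not_after` uses the format found in certificate details,
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400  # a negative result means already expired
```

A scheduled job could call `days_until_expiry` and alert when the result drops below a threshold, turning "what happens if a certificate expires?" into something checked routinely rather than discovered during an outage.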

Why This Has Been So Hard To Do Well

The reason most teams do not do all of this perfectly is not because they are careless. It is because the workload is enormous. I know because I’ve been there.

If you take software quality seriously, one small change can create a long chain of necessary work. You need design review, implementation, tests, security checks, dependency scans, logging review, deployment checks, documentation updates, rollback planning, and sometimes migration sequencing. All of this requires coordination and integration, and it produces software whose quality depends as much on team dynamics as on the code itself.

That simply does not scale.

As a result, the choice becomes: ship quickly and accept the risk, or slow down so much that the opportunity disappears. Or, in Silicon Valley speak: "move fast and break things," or take the Apple monolithic approach. Quality is necessary, but the process to achieve it has been too expensive.

AI Changes This Equation

This is where I think AI changes the equation.

Not because the model is magically better than every specialist. It is not. I am not claiming that an LLM is already a better software developer than the best engineer, or a better security expert than the best incident responder. I am saying that the model can help execute repeatable quality work if we give it the right structure.

This means we can eliminate the basic errors and focus on solving the hard problems.

That structure is what I mean by skills.

Why Skills Matter

A skill is not just a prompt. A prompt can be useful, but it is usually temporary. It depends on what you remember to ask in that moment. A skill is more like a reusable operating procedure for an AI system. It tells the model how to approach a class of work, what evidence to gather, what standards to apply, what questions to ask, what outputs to produce, and what must be proven before the job is considered done.

That matters because AI systems are highly sensitive to context. If I ask vaguely, I get vague work. If I ask narrowly, I may get a narrow answer that misses important adjacent risk. If I ask with a repeatable skill, I can force the model to move through the same quality gates every time.
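One way to picture the difference between a prompt and a skill is as a small data structure: a named procedure plus the evidence it demands before the work counts as done. The sketch below is purely illustrative. The `Skill` class, its fields, and the example gate names are hypothetical and are not taken from any existing skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical model of a skill: a reusable procedure plus the proof it requires."""
    name: str
    steps: tuple[str, ...]
    required_evidence: tuple[str, ...]

    def missing_evidence(self, evidence: dict[str, str]) -> list[str]:
        """Return the quality gates that have no non-empty evidence yet."""
        return [
            gate for gate in self.required_evidence
            if not evidence.get(gate, "").strip()
        ]

    def is_done(self, evidence: dict[str, str]) -> bool:
        """The job is done only when every gate has evidence."""
        return not self.missing_evidence(evidence)


repo_testing = Skill(
    name="repo-testing",
    steps=("find the test runner", "add negative tests", "run the full suite"),
    required_evidence=("test_command", "negative_tests", "passing_run"),
)

evidence = {"test_command": "python -m unittest", "passing_run": ""}
print(repo_testing.missing_evidence(evidence))  # ['negative_tests', 'passing_run']
print(repo_testing.is_done(evidence))           # False
```

The gates are the same on every run, regardless of how the request was phrased, which is the property a one-off prompt cannot guarantee.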

That is the heart of the Production AI idea. Instead of relying on one giant prompt or one heroic model, I want an ecosystem of skills that makes good engineering behavior repeatable. A clarify-before-build skill should stop vague requirements from becoming vague software. A feature design skill should turn a requirement into a testable implementation plan. A repo testing skill should make sure the project has a real local proof system. A threat modeling skill should force evidence-grounded security thinking. A logging skill should make observability part of the build, not an afterthought.

Individually, each skill handles one part of the problem. Together, they start to look like a software development lifecycle.

That is the dream: not AI that writes a file and walks away, but AI that participates in the discipline of building well.

The Innovator's Dilemma Angle

I have also been thinking about this through the lens of The Innovator's Dilemma. Disruptive technologies often start out looking worse than the incumbent approach in specific, high-end use cases. They are easier to dismiss because a specialist can still outperform them in a narrow comparison.

That is how I see a lot of AI criticism today. Can an LLM outperform the best software engineer on a complex architecture decision? Not always. Can it outperform a senior security specialist on a subtle threat model? Not necessarily. Can it replace a whole engineering organization tomorrow? No.

But that may be the wrong comparison.

The more important question is whether AI can make previously expensive disciplines accessible to far more people. Can a solo builder get meaningful test coverage, threat modeling, dependency scanning, logging review, and implementation planning into their workflow? Can a small business build with more discipline than it could afford before? Can a team use skills to reduce the amount of quality work that depends purely on memory, habit, or heroics?

I think the answer is yes, or at least that it is now worth finding out properly.

The exciting part is the compounding effect. If the workflow is built around reusable skills, then every model improvement makes the entire workflow better. Better reasoning improves planning. Better code understanding improves tests. Better tool use improves validation. Better long-context handling improves cross-repo analysis. The skills provide the structure, and the models keep getting more capable inside that structure.

How This Changes The Work

If this works, I do not think it means software developers and cyber security professionals vanish. I think their jobs change.

The valuable work becomes less about manually remembering every checklist item and more about designing the systems that make good work happen by default. It becomes more about judgment, review, prioritization, and knowing when the AI has missed something. It becomes more about building the workflows, skills, gates, and feedback loops that make the model useful in real production contexts.

That is a very different job from simply typing code into an editor.

It also raises the bar. If AI makes it cheap to generate code, then the differentiator becomes whether that code is trustworthy. Can it be tested? Can it be reviewed? Can it be deployed safely? Can it be observed in production? Can it be maintained six months later? Can it survive hostile input and dependency churn?

Those are the questions I want this channel to explore.

Where Builders Should Start

If you are building with AI today, my practical advice is to stop treating the model as just a code generator. Treat it as a junior system that needs process around it.

Start with requirements. Before building, force clarity. What does the feature actually need to do? What are the non-goals? What would make it fail in production? What are the edge cases? What does "done" mean in observable terms?

Then force test design before implementation. What unit tests prove the core logic? What integration tests prove the boundaries? What end-to-end tests prove the user workflow? What negative tests prove the system rejects bad input?

Then look at the security and operational layers. What dependencies are being introduced? What inputs are trusted? What needs to be logged? What permissions are required? What could fail? What happens if an external provider is down? What does rollback look like?

None of that is glamorous, but it is the difference between a demo and software.

The reason I am optimistic is that these questions are repeatable. If they are repeatable, they can be encoded into skills. If they can be encoded into skills, they can be run every time. And if they can be run every time, then individuals and small teams can start operating with a level of discipline that used to require far more people.

The Thesis Of This Channel

This is the opening thesis for Building The Age of AI. I am not claiming LLMs are already the best engineer or the best security expert in the room. I am saying that the surrounding work of software quality needs to happen, and for the first time we may have tools that make that work feasible at a much smaller scale.

That matters for builders, because it means we can move faster without accepting as much hidden risk. It matters for businesses, because better software quality should reduce incidents, rework, and long-term maintenance cost. It matters for users, because they deserve software that is more secure and reliable than the average rushed build usually gives them.

The question is whether we can get the skills right.

If we can, then every improvement in the underlying models improves not only code generation, but the whole software development lifecycle around it: planning, testing, threat modeling, logging, dependency management, incident response, and maintenance.

That is the dream I am working toward. AI should not just help us write more code. It should help us build better software.

References

Episode: The Dream - AI That Builds Truly Secure Software
Channel: Building The Age of AI
Source repository: Production AI on GitHub
Book: The Innovator's Dilemma: When New Technologies Cause Great Firms to Fail by Clayton M. Christensen