From GPT-3 to GPT-5: Three Years Building With AI
Exploring the limitations of AI-powered development through the process of building a startup. I still refuse to let it use strongly typed Python.
AI and I go way back. For as long as Large Language Models (LLMs) have been around, I’ve been consistently leveraging them to develop software. From finishing school through building a startup, my relationship with development has grown in parallel with the AI boom. These tools allowed me and my co-founder to stand up a production-ready identity platform with two people and a shoestring budget, long after we should have needed to hire more engineers.
These experiences have given me unique insight into the nuances and trajectory of one of the foremost practical applications of LLMs: software development¹. Through this lens I’ve learned where the dragons are, and how to balance AI-powered tools effectively in full-scale development. Moreover, this story illustrates the continually narrowing but ever-present uncanny valley of coding. Yes, I will use em-dashes. No, none of this was written by AI.
Growing With AI
In 2022, I was sitting in my dorm room playing with beta access to a new platform called GPT-3. It was clunky and archaic by today’s standards but, at the time, mind-blowing. While limited in its immediate applicability, it was a harbinger of coming change.
At the beginning of 2023 I met my co-founder, Marty Weiner, and began working on the project that would become our startup, VerifyYou. By this point ChatGPT and the surrounding hype were in full swing, but these were still the early days, when that hype outpaced the reality of barely comprehensible prompt responses and often unexecutable code generation. The utility of AI-driven development tools was minimal at best and actively detrimental at worst, so we used them sparingly in our development process, and with great scrutiny when we did. That changed rapidly.
2023 was a busy year. Cue the release of GPT-4, Claude, Llama, and Copilot: LLMs were suddenly able to write meaningful code, with the efficacy of a bright intern. Under strict management and version control, one could get them to enumerate solid integration tests, or perhaps build out a UI component. At the very least, autocomplete became much more useful, able to [tab] over whole lines or blocks rather than just variable names. AI became a useful development tool, but I still referenced Stack Overflow (RIP) to double-check its assertions.
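To give a flavor of what “enumerate solid integration tests” meant in practice, here is a minimal sketch of the kind of suite a model of that era could draft from a short endpoint description. Everything in it is hypothetical: the /verify route, the user_id payload, and the FastAPI stand-in app are invented for illustration, not taken from VerifyYou’s actual API.

```python
# A sketch of model-drafted integration tests for a hypothetical
# verification endpoint. The route, payload shape, and app below are
# invented stand-ins so the tests are runnable on their own.
from fastapi import FastAPI, HTTPException
from fastapi.testclient import TestClient

app = FastAPI()

@app.post("/verify")
def start_verification(payload: dict):
    # Minimal stand-in endpoint so the tests below have something to hit.
    # (Note the untyped dict, in keeping with the subtitle.)
    if "user_id" not in payload:
        raise HTTPException(status_code=422, detail="user_id is required")
    return {"user_id": payload["user_id"], "status": "pending"}

client = TestClient(app)

def test_verify_happy_path():
    # A well-formed request should start a pending verification.
    resp = client.post("/verify", json={"user_id": "abc123"})
    assert resp.status_code == 200
    assert resp.json()["status"] == "pending"

def test_verify_rejects_missing_user_id():
    # A request without a user_id should be rejected with 422.
    resp = client.post("/verify", json={})
    assert resp.status_code == 422
```

The value was never in any single test; it was in having the tedious enumeration of happy paths and edge cases drafted for review, rather than written by hand.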
In 2024, reasoning models made for yet another jump in AI lucidity. Instead of an intern, capable of executing rote tasks so long as they required no larger context, AI became a collaborator, able to discuss architecture tradeoffs and work out solutions to ambiguous problems. It was around this time that we began to raise our seed round, and in the meantime we were able to scale development (from demo to MVP) without actually scaling the number of developers.
Now let’s jump ahead to early 2025, when AI development tools are fully integrated and full-service platforms like Lovable and Cursor have hit the scene. At this point I can reliably build out endpoint suites, draft full features, and convert a React Native component to work in a NextJS app with a single prompt and little pain. Developing with AI becomes a matter of speed, ensuring that all possible tedious drafting is outsourced, balanced against the risk of letting it run away from you. The dangers lie in assuming models can do too much, and in believing the unabashed confidence that leads the unwary into debugging doom loops and inane redundancies.
Finally, we arrive at present day: GPT-5, Codex, Claude Code, and the big players promising more to come. This tsunami, though arguably (partially) an investment bubble, is quite unstoppable in its push to reshape the way we build, live, and interact. Agent-driven development has finally met git, and I can sit back as it goes through debugging processes, looks at live local outputs using image analysis, and behaves in all respects like a functional developer. Almost.
The New Uncanny Valley
Despite incredible leaps over the last three years, LLM development still sits in a new uncanny valley: capable of attention to detail beyond any human developer, yet prone to occasional astonishing feats of stupidity and failures of inference. Models are, after all, only as good as their inputs. So, as long as they lack the complete project-specific or general context that humans are so good at inferring, LLMs will remain powerful tools that require an informed hand to guide them effectively.
I have no doubt that one day humans will not be needed to write code, but I think we have a long while before we aren’t needed to translate our often confounding and contradictory condition. Therefore, this uncanny valley persists, not as a product of inferior intelligence on the part of LLMs, but of a fundamental inability to understand the specific, ever-changing needs of society.
¹ See this great article by Nathan Lambert: