With vibe coding you do end up spending a lot of time waiting on prompts, so I get the results of that study.
I fall pretty deep in the power user category for LLMs, so I don’t really feel that the study applies well to me, but also I acknowledge I can be biased there.
I have custom proprietary MCPs for semantic search over my code bases that let the AI do repeated graph searches on my code (imagine combining a language server, ctags, networkx, and grep + fuzzy search). That is way faster than iteratively grepping and scanning code manually, with a low chance of LLM errors. By the time I open GitHub code search or run ripgrep, Claude has already prioritized and listed the modules for me to investigate.
That tool alone with an LLM can save me half a day of research and debugging on complex tickets, which by itself pays for an AI subscription. I have other internal tools to accelerate work too.
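A minimal sketch of the graph-search idea, under loud assumptions: this is not the proprietary tool described above, it only uses Python's standard `ast` module instead of a language server, ctags, or networkx, and the names `build_import_graph`, `rank_modules`, and `SOURCES` are hypothetical. It builds an import graph from in-memory sources and ranks modules by how many hops they are from a starting module, which is roughly the "prioritized list of modules to investigate" step.

```python
import ast
from collections import deque


def build_import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module name to the set of in-project modules it imports."""
    graph: dict[str, set[str]] = {name: set() for name in sources}
    for name, code in sources.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                targets = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                targets = [node.module]
            else:
                continue
            # Ignore third-party and stdlib imports; keep project modules only.
            graph[name].update(t for t in targets if t in sources)
    return graph


def rank_modules(graph: dict[str, set[str]], start: str) -> list[tuple[str, int]]:
    """Breadth-first search from `start`; returns (module, hops), nearest first."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in sorted(graph[current]):
            if neighbour not in hops:
                hops[neighbour] = hops[current] + 1
                queue.append(neighbour)
    return sorted(hops.items(), key=lambda item: (item[1], item[0]))


# Hypothetical project: module name -> source text.
SOURCES = {
    "api": "import billing\nimport auth\n",
    "billing": "from db import connect\n",
    "auth": "import db\n",
    "db": "import sqlite3\n",
    "reports": "import billing\n",
}

ranking = rank_modules(build_import_graph(SOURCES), "api")
```

Exposed as a tool, something like this lets the model ask "what does `api` depend on, nearest first" repeatedly instead of grepping file by file. Modules that are not reachable from the start (here `reports`) are left out of the ranking.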
I use it to organize my JIRA tickets and plan my daily goals. I actually get Claude to do a lot of triage for me before I even start a task, which cuts the investigation phase to a few minutes on small tasks.
I use it to review all my PRs before I ask a human to look. It catches a lot of small things and can correct them, so the PR avoids the bikeshedding nitpicks some reviewers love. Claude can do this; Copilot will only ever point out nitpicks, so the model makes a huge difference here. But regardless, one fewer review request cycle helps keep things moving.
It’s a huge boon to debugging, much faster than searching errors manually. It’s especially helpful on the types of errors where you have to rabbit-hole through chains of GitHub issues to solve them.
It’s very fast to get projects to MVP while following common structure/idioms, and can help write unit tests quickly for me. After the MVP stage it sucks and I go back to manually coding.
I use it to generate code snippets where documentation sucks. If you look at the ibis library in Python for example the docs are Byzantine and poorly organized. LLMs are better at finding the relevant docs than I am there. I mostly use LLM search instead of manual for doc search now.
I have a lot of custom scripts and calculators and apps that I made with it which keep me more focused on my actual work and accelerate things.
I regularly have the LLM help me write bash, Python, or jq scripts when I need to audit codebases for large refactors. That’s low-maintenance, one-off work that is complex to write but easy to verify. I never remember the syntax for bash and jq even after using them for years.
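As an illustration of the kind of one-off audit script meant here, a minimal sketch assuming the refactor is "find every call to a function that is about to be renamed". The function name `old_fetch`, the helper `find_calls`, and the example files are all hypothetical; only the standard `ast` module is used.

```python
import ast


def find_calls(sources: dict[str, str], func_name: str) -> list[tuple[str, int]]:
    """Return sorted (path, line) pairs for every call to `func_name`."""
    hits = []
    for path, code in sources.items():
        for node in ast.walk(ast.parse(code, filename=path)):
            if not isinstance(node, ast.Call):
                continue
            target = node.func
            # Plain call `old_fetch(...)` or attribute call `utils.old_fetch(...)`.
            if isinstance(target, ast.Name):
                called = target.id
            else:
                called = getattr(target, "attr", None)
            if called == func_name:
                hits.append((path, node.lineno))
    return sorted(hits)


# Hypothetical files: path -> source text.
EXAMPLE = {
    "jobs.py": "import utils\n\nutils.old_fetch(1)\nprint('ok')\n",
    "views.py": "from utils import old_fetch\n\ndef show():\n    return old_fetch(2)\n",
}

call_sites = find_calls(EXAMPLE, "old_fetch")
```

The result is easy to verify by opening the reported lines, which is what makes this sort of script a good fit for LLM help: hard to remember how to write, trivial to check.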
I guess the short version is I tend to build tools for the AI, then let the LLM use those tools to improve and accelerate my workflows. That returns a lot of time back to me.
I do try vibe coding but end up in the same time-sink traps the study found. If the LLM is ever wrong, you save more time by forking the chat than by trying to realign it, but it’s still likely to be slower than doing it yourself. Repeat chats fall into the same pitfalls on complex issues and bugs, so you have to abandon that state quickly.
Vibe coding small revisions can still be a bit faster and it’s great at helping me with documentation.
Don’t you have any security concerns with sending all your code and JIRA tickets to some company’s servers? My boss wouldn’t be pleased if I sent anything that’s deemed a company secret over unencrypted channels.
The tool isn’t returning all code, but it is sending code.
I had discussions with my CTO and security team before integrating Claude Code.
I have to use Gemini in one specific workflow, and Gemini had a lot of landmines in how they use your data. Anthropic was easier to understand.
Anthropic also has some guidance for running Claude Code in a container with a firewall and your specified dev tools. It works, but that’s not my area of expertise.
The container doesn’t solve all the issues like using remote servers, but it does let you restrict what files and network requests Claude can access (so e.g. Claude can’t read your env vars or ssh key files).
I do try local LLMs but they’re not there yet on my machine for most use cases. Gemma 3n is decent if you need small model performance and tool calls, phi4 works but isn’t thinking (the thinking variants are awful), and I’m exploring dream coder and diffusion models. R1 is still one of the best local models but frequently overthinks, even the new release. Context window is the largest limiting factor I find locally.
I would love some story on why AI is needed at all.
Batch processing: turning unstructured free-form text data into structured outputs.
As a crappy example, imagine you wanted to download metadata about your albums but they’re all labelled “Various Artists”. You can use an LLM call to read the album description and fix the artist for each track, and now you can properly organize your collection.
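A minimal sketch of that album example, with assumptions called out: the model call is passed in as a plain function (`call_llm`), so no real provider API is shown, and `Track`, `PROMPT`, `extract_tracks`, and `fake_llm` are hypothetical names. The point is the shape of the workflow: prompt for JSON, then validate what comes back before trusting it.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Track:
    title: str
    artist: str


PROMPT = (
    "Read the album description and return only a JSON list of objects "
    'with the keys "title" and "artist", one per track.\n\n'
    "Description:\n{description}"
)


def extract_tracks(description: str, call_llm: Callable[[str], str]) -> list[Track]:
    """Ask the model for per-track artists and validate the JSON it returns."""
    raw = call_llm(PROMPT.format(description=description))
    data = json.loads(raw)  # raises ValueError on non-JSON output
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of tracks")
    tracks = []
    for item in data:
        if not isinstance(item, dict) or set(item) != {"title", "artist"}:
            raise ValueError(f"unexpected track shape: {item!r}")
        tracks.append(Track(str(item["title"]).strip(), str(item["artist"]).strip()))
    return tracks


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: the model replies with JSON)."""
    return '[{"title": "Song A", "artist": "Artist A"}, {"title": "Song B", "artist": "Artist B "}]'


tracks = extract_tracks("Song A by Artist A, then Song B by Artist B.", fake_llm)
```

In a batch job, a failed validation is a cheap signal to retry or to queue that record for manual review, rather than importing bad metadata.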
I’m using the same idea, different domain and a complex set of inputs.
It can be much more cost effective than manually spending days tagging data and writing custom importers.
You can definitely go lighter than LLMs. You can use gensim to do category matching, or sentence transformers and nearest neighbours (this is basically what Semantle does), but the LLM performed best on more complex document input.
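A minimal sketch of the lighter nearest-neighbour approach, with a big caveat: to stay dependency-free it swaps in plain word counts for real embeddings, so `embed` here is a toy stand-in and not what gensim or sentence transformers produce. The category names and keyword strings are hypothetical. The matching step (cosine similarity, pick the closest category) is the same idea either way.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real setup would use a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_category(text: str, categories: dict[str, str]) -> str:
    """Return the category whose description is closest to `text`."""
    query = embed(text)
    return max(categories, key=lambda name: cosine(query, embed(categories[name])))


# Hypothetical categories: name -> representative keywords.
CATEGORIES = {
    "billing": "invoice payment refund charge card",
    "login": "password login account reset email",
}

label = nearest_category("I need a refund for a duplicate card charge", CATEGORIES)
```

Word counts only match exact vocabulary, which is one reason this breaks down on more complex documents; learned embeddings handle synonyms better, and an LLM can also use context the keywords never mention.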
Thanks.
That’s pretty much what Google says they use AI for: structuring.
Thanks for your insight.