There is no doubt that AI is a productivity boost (although maybe not as much of a boost as people think — I’ll discuss that in a future post), and one of the most amazing and obvious boosts is AI’s ability to write code.1
This ability moved from being just a handy tool to being a meme about 5 months ago with this post on X:
And “vibe coding” as a term was born.
If you play around with vibe coding, you’ll find that when it works, it’s impressive how good a job it can do. But when it doesn’t, it seems to just spiral down in an endless stream of mistakes, misunderstandings, and misfires. I’m sure you’ve seen it, where it does something wrong and no matter how much you try to get it to fix its errors it just seems incapable.
I think there are some limitations to vibe coding that aren’t immediately obvious, but if you’re aware of them you can either overcome them or know when to give up.
The first is that LLMs have poorer results when you get further from what they’ve been trained on. Ask to vibe code something the LLM saw a lot of during training (or very similar), and it’s great. Ask for something that’s obscure, and it gets more hit and miss. Unfortunately, we don’t really know what the LLM “knows”, so it can seem random to us and hard to diagnose when it fails. It just doesn’t seem to get what seem like simple concepts. You try to explain, you try to refine, but it just seems like it never clicks.
The second is that LLMs work better with context and examples. Vibe coding from scratch is usually (but not always) a “zero-shot” approach. You’re just counting on what the LLM remembers and prioritizes. But if you ask AI for help with augmenting existing code, it has a ton of context (the existing code) to draw upon. So “vibe maintenance” is a lot more likely to produce good results.2 Even if you’re doing something from scratch, giving it something to chew on (an existing application that has some similarities) might get it over the hump.
The third is that while you and I have an intuition about what will work well for users, LLMs don’t. LLMs cannot imagine themselves using the code they’re creating and think through what its strengths and weaknesses might be. The symptom of this is an application that “does what you asked” but inexplicably just doesn’t make any sense. Call it lack of empathy, if you like. We can fix this by distilling our empathy into a prompt, but that’s not a vibe anymore: that’s a specification.
The fourth is that once an LLM has gone down the wrong path, it seems very, very hard to get it back on track. You might think “I can just correct it” and yet you discover it is 10× harder than it should be. My sense is that this is because of the ever increasing size of the input to the LLM: every time you make another request, the entire conversation history is sent along with your request. Before too long, the total size of the calls to the LLM grows really large; unfortunately, LLMs are known to have problems staying focused as the size of the input grows. Put simply, the LLM just loses track of what it’s supposed to do.
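To get a feel for how quickly this adds up, here is a back-of-the-envelope sketch in Python. The token counts are made up for illustration, and `request_sizes` is a hypothetical helper, not part of any LLM API; it simply assumes the whole history (every earlier prompt and reply) is re-sent with each new request.

```python
def request_sizes(turns):
    """turns: list of (prompt_tokens, reply_tokens) pairs, in order.

    Returns the input size of each request, assuming the full history
    of earlier prompts and replies is re-sent every time.
    """
    sizes = []
    history = 0
    for prompt_tokens, reply_tokens in turns:
        sizes.append(history + prompt_tokens)    # what this request sends
        history += prompt_tokens + reply_tokens  # both sides join the history
    return sizes


# Made-up numbers: a 1,000-token specification followed by 13 short
# corrections, with every reply regenerating about 4,000 tokens of code.
turns = [(1000, 4000)] + [(100, 4000)] * 13
sizes = request_sizes(turns)
print(sizes[0], sizes[-1])  # 1000 54300
```

Under these assumptions, a 100-token correction late in the session drags more than 50,000 tokens of history along with it, most of it stale versions of the application.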
Let me give a lengthy demonstration of the third point: that sometimes LLMs can’t tell if the app they’ve created is unusable.
I decided I wanted to build what seemed to me to be a trivial application: an app that helps edit social media posts to conform to the rather strict length limitations they have. Here’s my description of the challenge:
If you post on social media sites like X, Threads, or BlueSky, you’re painfully aware of the character limits they impose on your posts. There’s often a lot of shoehorning involved to get things to fit into one post, yet sometimes it’s just not possible. In those circumstances, you end up having to break your message up across multiple posts by creating an initial post and then replying to it with more (and replying to that with even more ad infinitum).
Splitting a message up into pieces is hard, as you’re trying to find the best breaking point which is going to be some trade off of finding a place where the split is the least jarring and where you can cram as many words into a post as possible.
That seems simple enough to vibe code, right? I’m not super fussy about how it does it, only that it be usable and suitable for the purpose. So vibe coding should be just about one and done, right? I gave AI the text above and added in “Build an app that makes it easier to do this. Use Bootstrap 5 / Flask / Jinja2.” Here’s the app it produced:
Oooh, pretty! But … when I tested it by pasting in a bunch of text and pressing Split Message … nothing happened. It doesn’t work at all!! How did that happen?3
I would have been a bit happier if it worked, but I noticed this wasn’t the process I was looking for. I didn't want to just split the text, I wanted to be able to adjust the splitting points by hand to produce posts that didn’t awkwardly break sentences in two, etc. There was no point in trying to fix this app as it was too far from what I wanted.
I went back to the drawing board and wrote up a much more precise specification to fill in a lot of the missing details. Did my better / lengthier / more precise prompt get me (a) a better design and (b) one that worked, all in a single request?
For (a), sort of; for (b), not even close. I probably should have started over, but for this example I just kept hammering the LLM to see how much work it would take to get the app I wanted. And it was a lot: it wasn’t until version 14 that the app was close enough to call it done.
What would I have done differently if I had started over? In this case, I think I would have two avenues I might pursue:
1. Most likely, I would decompose the task and use an incremental approach: give the LLM a simpler initial target to build, and then do “vibe enhancement” to build out the features I want. This has the advantage that after each increment, you can start a new session with the LLM, which keeps the history from growing too large and starting to affect performance in a bad way.
2. If not #1, I could try changing things up a bit: rewrite the specification, try a different LLM, or have a lengthier dialog before I let it start building code.
Neither of these is particularly brilliant; rather, my main point is “if at first you don’t succeed, don’t just try and try again: try something different.”
Here we go!
Want to share my pain? This section isn’t exciting reading, but if you want a turn-by-turn view of how (and how hard) it was to finish vibe coding the post splitting app, here’s an edited transcript of my interactions.
Initial Prompt
Since my first prompt produced a too simplistic (and non-functioning!) application, I rewrote my prompt to be (I thought) comprehensive and clear:
I want to write a web app that allows people to compose messages for social media. The idea is that while each site has a different message length, they all have limits that are fairly short. So you will want to break a message into multiple posts if it goes long. The app should help people write a lengthy message, to start, and then help them break it up into below the limit posts.
Some requirements:
I want, at the very least, spell checking, but it can be "for free" from the browser.
Support for the different limits in X (280 characters max), Threads (500 max) and Bluesky (300 max) by asking which service
URLs in BlueSky always take 31 characters, in X 23 characters, but in Threads they just take their string length. The length-of-post calculation has to take this into account.
Let the user start by composing their entire message, regardless of length.
There are some editing buttons for the user to use:
"Remove Extra Spaces": it changes all runs of two or more spaces into a single space
"Remove Extra Lines": it changes all runs of three or more newlines into two newlines (so no more than one blank line in a row)
"and → &": Changes all instances of the word "and" (case insensitive) to "&"
Include an indicator of the post length (as relevant to the service they've chosen up front to post to) and the limit.
If their post is below the limit, offer a button that allows them to Copy the text as is to the clipboard
If their post is above the limit, offer a button that allows them to Split the post into multiple posts
The splitting process has to be incremental and interactive. I'm envisioning something like the following:
There's a text box at the top that starts with the full limit of characters from their longer post, and a box below that with the remainder of their post. Both boxes have indicators of nnn/lll, where nnn is the actual number of characters in the box and lll is the limit for the service.
There is some sort of UI element (perhaps a slider) that allows the user to adjust how many characters are held in the top box; as they move it up and down characters move back and forth between the two boxes as the user adjusts the point where the post breaks for the next post. This UI element should make it easy to do 1 character at a time adjustments.
When the user is happy with how the top box is filled, they can click a button called "Freeze". At this point, if the bottom box is at or under the limit, they are done. If not, the bottom box breaks into two boxes and the process repeats with these two new boxes. This goes on until the post has been broken up into at or under the limit size posts.
Next to each box is a button labeled copy, which copies its contents to the clipboard.
There should be a "reset" button that brings everything back into one box that contains the entire post
I want to use Bootstrap 5 for my UI
There should be a title bar across the top of all pages with an icon (TBD) and the title "Post Slicer", and three menu options:
Home, which takes you back to the starting page.
About, which links to a page with lorem ipsum text (real text TBD)
Help, which links to a different page with lorem ipsum text (real text TBD)
If they are on the home page, and there is any text that the user has already entered, a warning pop-up will come up first telling them that they will lose their text if they navigate away should they click any of the meny choices (and of course give them the option of staying on the page). There is no user enterable text on About and Help, so no pop-up needed there.
I want to use Flask / Jinja2 to serve this app (so we can use the concept of a base template in Jinja2 to create the title bar etc.). I do not anticipate that it needs any runtime logic (apart from serving the pages) at this point, although perhaps in the future I might want to offer some server-side processing (perhaps "save my WIP" or even "post this for me" kind of capabilities) so I don't want to preclude doing that in the future.
Is there anything you need to know before writing this application?
Far beyond what a normal “vibe” should require4, I think. And as you can see from the last line, rather than just saying “build it!”, I gave it the chance to ask follow-up questions if there was any ambiguity, which it did:
Looking at your requirements, I have a good understanding of what you want to build. Before I start coding, I have a few clarifying questions to ensure I implement this exactly as you envision:
URL Detection: For the character counting with URLs, should I detect URLs automatically (using regex patterns) or do you want users to manually indicate what counts as a URL? Also, should I handle common URL formats like http://, https://, www., or just anything that looks like a domain?
Text Splitting Logic: When splitting posts, should I try to be "smart" about where to break (e.g., prefer breaking at word boundaries, sentence endings, or paragraph breaks) or just let the user have full manual control with the slider?
Character Counting: Should the character count include or exclude trailing whitespace? And should newlines count as 1 character or 2 (like Windows line endings)?
Post Numbering: When displaying multiple split posts, would you like them numbered (e.g., "1/3", "2/3", "3/3") or just shown in sequence?
Clipboard Feedback: Should there be any visual feedback when text is copied to clipboard (like a brief "Copied!" message)?
Service Selection: Should the service selection (X/Threads/Bluesky) be persistent across page refreshes, or reset each time?
The requirements are quite comprehensive, so I can start building with reasonable assumptions for any of these details, but wanted to check if you have specific preferences. Should I proceed with a complete implementation?
These were reasonable questions, and here are the answers I gave it:
Detect automatically with a regex. The services seem to detect anything that looks like a URL, but I don't want to try to hardcode a list of TLDs to validate against. So I would say anything that looks like a domain but a bit more sophisticated than just a period with no spaces on either side of it.
How about we have ←Word, Word→, ←Character, and Character→ controls where the "Word" controls are smart and the "Character" ones just move one character at time?
Trailing spaces should not be counted, but that implies that our copy to clipboard will trim them. Newlines are 1 character.
Great point! They will be shown in sequence.
Let's add an option at the top that says "auto number posts", which means that when we copy them they will have nn/yy added to them, where nn is the number of this post, and yy is the total number of posts (we will assume it will always be less than 100 posts total).
There will be options for auto number style offered: nn/yy, … nn/yy, and (nn/yy), with the second being default.
We will decrease the max post size to leave room for the auto number text
Yes, a brief message
When you first come to the page, the default will be X (as it has the smallest window, so it will work sub-optimally for the other services), but they should pick the right service and it should be obvious that they need to do that.
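The counting rules in the prompt and answers above are compact enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not the code the LLM produced: the limits and per-URL costs are the ones given in my prompt, while `URL_RE` is a deliberately loose, hypothetical "looks like a domain" pattern, and `effective_length`, `usable_limit`, and `fits` are names invented for this example. It also assumes the auto-number text reserves exactly the length of its two-digit template, leaving open whether a separating space is needed.

```python
import re

# Limits and per-URL costs as stated in the prompt.
LIMITS = {"x": 280, "threads": 500, "bluesky": 300}
URL_COST = {"x": 23, "bluesky": 31, "threads": None}  # None: use the URL's real length

# Assumption: anything shaped like a domain, with an optional scheme and path.
# It will also match things like "file.txt"; a real app would need more care.
URL_RE = re.compile(r"(?:https?://)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}(?:/\S*)?")

# Auto-number styles; "nn/yy" stands in for two-digit post numbers.
NUMBER_STYLES = ("nn/yy", "… nn/yy", "(nn/yy)")


def effective_length(text, service):
    """Length of text as the chosen service would count it."""
    # Newlines count as 1 character; trailing whitespace is not counted.
    text = text.replace("\r\n", "\n").rstrip()
    cost = URL_COST[service]
    if cost is None:
        return len(text)
    urls = URL_RE.findall(text)
    # Swap each URL's real length for the service's fixed cost.
    return len(text) - sum(len(u) for u in urls) + cost * len(urls)


def usable_limit(service, number_style=None):
    """Service limit minus the room reserved for auto-numbering."""
    reserve = len(number_style) if number_style else 0
    return LIMITS[service] - reserve


def fits(text, service, number_style=None):
    return effective_length(text, service) <= usable_limit(service, number_style)


sample = "Read https://example.com/some/long/path now"
print(effective_length(sample, "x"))        # 32
print(effective_length(sample, "bluesky"))  # 40
print(effective_length(sample, "threads"))  # 43
```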
I gave it a pretty precise definition of what I wanted, and it identified places of uncertainty that I then addressed, so it should get pretty damn close to what I expect, right?
If at first AI doesn't succeed … Try, try, try, try …, and try again
Wrong. Version 1 of the app it generated turned out to be comically bad, as you actually couldn’t edit the splitting point of your post, the very purpose of the app. So after I tested what it generated, I replied:
So when I click split post, where are my buttons to move text back and forth between posts?
“You’re absolutely right!” might imply it realized the error of its ways. But it turns out to be just a form of faux groveling as I was about to see over and over again.
It generated a new version of the application. In this new version there was no way to edit the second post or break it, if it was too long, into smaller pieces, so it was still useless. I gave it feedback:
The point of the freeze button is to lock a post and move on to working with the next post
Where's my ability to split the second post if it's too large?
“You’re absolutely right!” At this point, I’m starting to hum Bloody Well Right by Supertramp: the notable feature of that song is the heavy repetition of “You’re bloody well right” in the lyrics.
Unfortunately, the new revision of the application still had a lot of problems, which I told AI about:
I enter a block of text that is too large. I hit Split post. So far so good. However, the ←Word and Word→ buttons move much more than a word at a time. The Character buttons lose spaces as they copy. And they all go backwards from what they should do: ←Word should move a word from the front of a higher number post to the end of a lower numbered one (e.g., move the first word in post 2 to the last word of post 1).
When I am working with just two posts, and the second post is still over the limit, I am offered a button to freeze post 2 but at that point I want to freeze post 1 and then split post 2 into post 2 and 3 and start moving text between those two (with post one now being finished.)
“You’re absolutely right on all points!” ????
Now I’m moving from humming to singing the song:
♩ Right, right, you're bloody well right, You got a bloody right to say ♩
I have a new version of the code to try. And, sure, we’re getting closer, but the program is still completely unusable:
I enter a lot of text. I press Split. I have two posts. There is no Freeze Post 1, only Freeze Post 2. There is no ability to split post 2. And the arrow buttons are still backwards. And, at this point, they should appear under Post 1, not Post 2.
♩ Right, you're bloody well right, You know you got a right to say ♩
Of course, yet another new revision. This time, there’s a weird JavaScript function it generated that isn’t part of the application, so I ask:
What am I supposed to do with function splitPost which is not in a template?
♩ Ha-ha, you're bloody well right, You know you're right to say ♩
It tells me it will fix it and generates a new version of the application, which still has that problem, so I decide to just ignore it for now and plow ahead. The application it generated is better, but the freeze button also seems to split a child post (which I had not asked it to do). Rather than trying to fix it, I decide to simplify the UI to try to get done sooner. So I say:
The Freeze and Split functionality seems mostly redundant. How about we just offer a Split button that actually does what the current Freeze button does?
♩ Yeah, yeah, you're bloody well right, You know you're right to say ♩
Which it does, and gives me another version of the application. Testing it, I see that even after I split a post it is offering to split the post again (or something, it doesn’t do anything at that point), so I give it my final correction:
I enter a large amount of text, and press Split Post. Now I'm shown a button that says "Split Post 1" EVEN THOUGH IT IS ALREADY SPLIT.
♩ Right, (quite right), you're bloody well right, You got a bloody right to say ♩
Finally, after 14 versions of the application, it behaves well enough (and I’m tired enough) to call it done.
The song, like my vibe coding session, continues until it gets tired of being bloody well right:
♩ You've had your cry, no, I shouldn't say wail, In the meantime, hush your face ♩
The app it generated works, albeit not quite in the way I initially anticipated. Should I have tossed everything, re-written my prompt, and tried again much earlier on? That might be the right idea.
Should I have broken the application up into phases and vibe coded each phase independently? Asked it first for a skeleton of the application, gotten that working, and then vibe enhanced it incrementally, adding features and getting them to work one at a time? That would be a better idea.
In any case, I have an app that’s good enough, which generally is the goal of vibe coding, so I’ll call it a painful win.
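For what it’s worth, the move behavior I kept asking for is easy to state in code. Here is a minimal sketch in Python; it is my own illustration with hypothetical function names, not the code in the repository. It assumes two adjacent posts are kept as one string plus a split index, so that moving text between them only moves the index and can never lose a space.

```python
def posts(text, split):
    """The two adjacent posts implied by a split index."""
    return text[:split], text[split:]


def char_left(text, split):
    """←Character: the first character of the later post joins the earlier one."""
    return min(split + 1, len(text))


def char_right(text, split):
    """Character→: the last character of the earlier post moves to the later one."""
    return max(split - 1, 0)


def word_left(text, split):
    """←Word: the first word of the later post joins the end of the earlier one."""
    i = split
    while i < len(text) and text[i].isspace():      # skip the gap before the word
        i += 1
    while i < len(text) and not text[i].isspace():  # then take the word itself
        i += 1
    return i


def word_right(text, split):
    """Word→: the last word of the earlier post moves to the front of the later one."""
    i = split
    while i > 0 and text[i - 1].isspace():          # skip trailing whitespace
        i -= 1
    while i > 0 and not text[i - 1].isspace():      # then give up the word itself
        i -= 1
    return i


text = "one two three four"
split = word_left(text, 7)   # start from "one two" | " three four"
print(posts(text, split))    # ('one two three', ' four')
```

Because both posts are always slices of the same string, joining them back together reproduces the original text exactly, which is the property the early versions of the app kept violating.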
If you want to see the finished application running, it is here:
https://web-production-7e686.up.railway.app
The source is here:
https://github.com/cmcguinness/PostSplitter
In fact, coding is the poster child for productivity boosts from AI.
Also, one- to many-shot prompting helps a lot in supplementing what the LLM knows. As does RAG. These are all approaches to get around blind spots in an LLM’s training.
It could be that a different LLM would have done better; perhaps a different vendor’s model, or a more powerful model. That’s all part of coming at things from a different angle.
Remember I tried to zen vibe code it and it was a failure, so I’m on to “write a specification” to try to get better results.