We were doing a user test about a year ago. Someone had just generated a 12-slide pitch deck with PitchShow — looked great, they were happy with it. Then they wanted to change one word on slide 4. One word. "Quarterly" to "monthly."
Eight minutes later, they were still trying to find it.
They had to figure out which file the slide lived in. Find the right component. Locate the text string inside the JSX. Make sure it was the right instance and not a label somewhere else. Save. Wait for the reload. Check if it looked right.
I sat there watching this and thought: this is actually insane. The word is right there on the screen. You can see it. You should be able to just point at it.
That thought became Vibe Mode.
The problem with describing WHERE something is
Every AI editor has a chat box. You type what you want. The AI tries to figure out what you mean and makes a change.
This works OK for big things: "rewrite the whole narrative arc" or "make this deck more formal." But for small, specific changes, it's maddening — because half the instruction is spatial. "The word in the top-right of slide 4." "The second bar in the chart on the left side of slide 7." "That heading — not the subheading, the main heading."
You're describing a location using language, when you have a mouse. It's like calling someone on the phone to explain which lamp in your living room you want moved, when they're standing right next to you.
Vibe Mode's premise is simple: stop describing where the thing is. Point at it. Then describe what you want changed.
How it actually works
PitchShow runs every presentation as a live webpage. When you switch to Vibe Mode, we inject a tiny script into that page. The script intercepts your mouse clicks before the slide handles them. When you click something, it collects everything it knows about that element — its type, its text, its position, what's around it — and sends that back to the app.
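The interceptor described above can be sketched roughly as follows. This is an illustration, not PitchShow's actual code: every name here (`ClickPayload`, `buildClickPayload`, `installInterceptor`) is invented for the sketch. The key browser mechanics are real, though — listening in the capture phase and stopping propagation is what lets the script see the click before the slide's own handlers do.

```typescript
// What the injected script collects about a clicked element.
interface ClickPayload {
  tag: string;                                          // element type, e.g. "h2"
  text: string;                                         // trimmed text content
  rect: { x: number; y: number; w: number; h: number }; // on-screen position
  ancestors: string[];                                  // enclosing tags, innermost first
}

// Pure part: turn raw facts about the clicked element into the
// payload the editor receives. Kept separate from DOM wiring so the
// data shape is easy to see (and to test).
function buildClickPayload(
  tag: string,
  text: string,
  rect: { x: number; y: number; w: number; h: number },
  ancestors: string[]
): ClickPayload {
  return { tag: tag.toLowerCase(), text: text.trim(), rect, ancestors };
}

// Browser-only wiring. capture: true + stopPropagation intercepts the
// click before the slide handles it.
function installInterceptor(send: (p: ClickPayload) => void): void {
  document.addEventListener(
    "click",
    (e: MouseEvent) => {
      e.preventDefault();
      e.stopPropagation();
      const el = e.target as HTMLElement;
      const r = el.getBoundingClientRect();
      const ancestors: string[] = [];
      for (let p = el.parentElement; p; p = p.parentElement) {
        ancestors.push(p.tagName.toLowerCase());
      }
      send(
        buildClickPayload(
          el.tagName,
          el.textContent ?? "",
          { x: r.x, y: r.y, w: r.width, h: r.height },
          ancestors
        )
      );
    },
    { capture: true }
  );
}
```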
A pin appears on screen, floating over whatever you clicked. That pin is your anchor.
Now here's the part that required the most engineering: translating "the thing I clicked" into "the specific lines of code that render that thing."
Every slide component can be hundreds of lines. The heading you clicked could be on line 12 or line 147. We solve this by tagging every meaningful element at generation time with a fingerprint — a unique identifier derived from its role, its content, and its position. When the click comes in with that identifier, we look it up in the source file and extract the relevant code block.
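One plausible version of that scheme — not PitchShow's actual one — hashes the element's role, text, and sibling index into a short identifier, stamps it on the element at generation time (say, as a `data-fp` attribute), and later scans the source file for that attribute. The hash, attribute name, and line-based lookup below are all assumptions for illustration; real JSX extraction would want a proper parser rather than a line scan.

```typescript
// Tiny stable string hash (FNV-1a, 32-bit), hex-encoded. Any stable
// hash works; the point is that identical (role, text, index) always
// yields the same fingerprint.
function hash32(s: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

// Fingerprint derived from what the element is, what it says, and
// where it sits among its siblings.
function fingerprint(role: string, text: string, index: number): string {
  return hash32(`${role}|${text.trim()}|${index}`);
}

// Given the generated source, find the line tagged with a fingerprint.
// A simplification: a real implementation would extract the whole
// element, not one line.
function extractTaggedLine(source: string, fp: string): string | null {
  const needle = `data-fp="${fp}"`;
  for (const line of source.split("\n")) {
    if (line.includes(needle)) return line.trim();
  }
  return null;
}
```

Hashing on trimmed text means cosmetic whitespace changes in the generated file don't invalidate the fingerprint, while a real content edit deliberately does.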
That code block, plus your typed instruction, goes to the AI. The AI edits exactly that section and nothing else. The new code gets written to disk, the slide updates in about two seconds, and you see the result immediately.
The whole loop — click, type, see — takes under three seconds on a normal connection. Most of the time it's closer to two.
The things that broke and how we fixed them
It took six months to get this right. Here's what kept going wrong:
The pin would lose track of the element. After the AI made an edit, the slide re-rendered. If the AI rearranged anything, the pin's original coordinates were pointing at empty space. We fixed this by re-anchoring the pin using the fingerprint after every update — look up the element in the new DOM, recalculate position.
Background elements had no good fingerprint. Decorative gradients, animated overlays, abstract SVG shapes — these don't have text content to fingerprint on. Clicking them produced a pin but no reliable code location. We added a fallback: when the match is ambiguous, the AI gets the whole parent component. Slower, but usually correct.
Users wanted to edit multiple things at once. Holding Shift to select several elements, then describing one change that affects all of them. This required sending multiple code locations as a single structured request and making sure the AI's edits didn't break the code between those locations. Still not perfect, but good enough.
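The shape of that structured request might look like this sketch. The field names (`CodeLocation`, `targets`, and so on) are assumptions, not PitchShow's actual wire format; the deduplication step reflects a real hazard of multi-select, where two clicks can land on the same element.

```typescript
// One code location per Shift-selected element.
interface CodeLocation {
  file: string;        // the component file the element lives in
  fingerprint: string; // the element's generation-time identifier
  snippet: string;     // the extracted code block for this element
}

// One instruction, several targets.
interface EditRequest {
  instruction: string;
  targets: CodeLocation[];
}

// Bundle the selected elements into a single request, dropping
// duplicates so the model is never asked to edit the same block twice.
function buildEditRequest(
  instruction: string,
  locations: CodeLocation[]
): EditRequest {
  const seen = new Set<string>();
  const targets = locations.filter((loc) => {
    const key = `${loc.file}:${loc.fingerprint}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { instruction, targets };
}
```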
The version of Vibe Mode we have now is probably version 7 or 8 of the underlying mechanism. Every iteration was prompted by watching someone use it and getting frustrated. That's still how we find the bugs — not in test suites, but by sitting next to a real person who's trying to do real work.
What it feels like when it works
The best version of Vibe Mode doesn't feel like using a tool. It feels like thinking out loud.
"This chart needs to show the decline more dramatically." Click the chart. Type that. Two seconds later — more dramatic chart.
"The color of this heading is wrong." Click the heading. "Make this the same orange as our logo." Two seconds later — done.
"This whole slide feels too cluttered." Click anywhere. "Remove the subtitle and make the chart larger." Done.
The reason it feels different from chat editing is scope. In a chat, every instruction implies "apply this to the whole deck unless I say otherwise." In Vibe Mode, every interaction implies "apply this to exactly what I clicked, nothing else." You make surgical changes with the confidence that nothing else will move.
We're nowhere near done with it
Vibe Mode is good. It's not yet what I want it to be.
Right now, you're pointing at one element at a time. What I want is to point at a slide and say "this needs to feel more urgent" and have the AI understand that's an instruction about tone, pacing, visual weight — not a request to edit a specific element. A higher-level edit that touches everything appropriately.
We're not there. But we know what direction we're walking in. And the foundation — the click-to-code mechanism, the live update loop, the pin system — that infrastructure is ready for it.
The person from the user test who spent eight minutes on one word? We showed them Vibe Mode a few months later. They changed the same word in about four seconds. Then immediately clicked three more things and changed them too, just because it was suddenly easy.
That's what we were building toward.