The Generator, the Critic, and the Evaluator Walk Into a Chat…
How “let’s make LLMs argue” turned into my go-to 4-role workflow (and a weekend CLI project).
This project actually started as a lunch-break joke at the office:
“What if we let two LLMs argue with each other and just… see where it ends?”
Fast-forward a couple of weeks, and that “joke” has become one of my go-to prompt-engineering tricks. I’ll often ask one LLM to critique another’s work, like this:
“Hey Gemini, ChatGPT claims he’s the best consumer LLM product because yada yada… do you agree?”
Then I flip it back and let ChatGPT defend itself, refine the answer, or even rewrite the whole thing. Eventually I standardized the flow into four roles: Generator → Critic → Refiner → Evaluator.
And you know what? It works ridiculously well. It’s like having your own mini debate club of robots—except they don’t eat all your snacks.
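For the curious, here is roughly what the Generator → Critic → Refiner → Evaluator flow looks like as code. This is a minimal sketch, not the actual CLI source. Each role is assumed to be a plain function from prompt to text (the hypothetical `LLM` type below), the prompt wording is made up, and the evaluator is assumed to reply with a line like `SCORE: 8`. Plugging a different vendor into each role is what gets the models arguing. The fake models at the bottom return canned text so the sketch runs offline.

```python
import re
from typing import Callable, Dict

# Hypothetical interface: any function that takes a prompt and returns a reply.
LLM = Callable[[str], str]


def run_four_roles(task: str, models: Dict[str, LLM], rounds: int = 2) -> Dict[str, object]:
    """Generator -> Critic -> Refiner -> Evaluator, repeated for a few rounds.

    `models` maps each role name to the LLM that plays it, so different
    vendors can critique each other. Returns the best-scoring draft.
    """
    draft = models["generator"](f"Task: {task}\nGive your best answer.")
    best_draft, best_score = draft, -1
    for _ in range(rounds):
        critique = models["critic"](
            f"Task: {task}\nAnswer:\n{draft}\nList concrete weaknesses."
        )
        draft = models["refiner"](
            f"Task: {task}\nAnswer:\n{draft}\nCritique:\n{critique}\nRewrite the answer."
        )
        verdict = models["evaluator"](
            f"Task: {task}\nAnswer:\n{draft}\nRate it 0-10 as 'SCORE: n'."
        )
        # Assumed reply format; an unparseable verdict counts as 0.
        match = re.search(r"SCORE:\s*(\d+)", verdict)
        score = int(match.group(1)) if match else 0
        if score > best_score:
            best_draft, best_score = draft, score
    return {"answer": best_draft, "score": best_score}


if __name__ == "__main__":
    # Offline stand-ins for real vendor calls.
    counter = {"n": 0}

    def fake_refiner(prompt: str) -> str:
        counter["n"] += 1
        return f"draft v{counter['n'] + 1}"

    fakes = {
        "generator": lambda p: "draft v1",
        "critic": lambda p: "Too generic; check domain availability.",
        "refiner": fake_refiner,
        "evaluator": lambda p: "SCORE: 9" if "draft v3" in p else "SCORE: 6",
    }
    print(run_four_roles("5 domain names for an RSS-to-podcast SaaS", fakes))
    # prints {'answer': 'draft v3', 'score': 9}
```

Keeping the best-scoring draft rather than simply the last one is a deliberate choice in this sketch: a refiner can make an answer worse, so the evaluator gets the final say.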
Here’s a real example: “Help me come up with 5 domain name ideas for a B2C SaaS product that converts RSS feeds into podcasts”. (I loved “Podmatic.io”… sadly, someone else loved it first. 😭)
If you’re curious, give this multi-LLM ping-pong a try; you might discover insights you’d never get from a single model. And if you’re in a weekend-coder mood, I even hacked together a little CLI version that bakes in the four roles and lets you swap between different LLM vendors. No more tedious copy-paste battles: just let the bots duke it out while you sip your coffee. ☕️🤖⚔️ (at your own LLM token cost…)
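Vendor swapping mostly comes down to hiding every provider behind the same tiny interface and letting a flag decide who plays which role. Here’s a hedged sketch of that wiring using Python’s standard `argparse`. The flag names (`--generator`, `--critic`, `--refiner`, `--evaluator`, `--default-vendor`), the vendor registry, and the stub adapters are all hypothetical; a real adapter would call the provider’s SDK instead of echoing the prompt.

```python
import argparse
from typing import Callable, Dict, List, Optional, Tuple

LLM = Callable[[str], str]
ROLES = ("generator", "critic", "refiner", "evaluator")


def make_stub(vendor: str) -> LLM:
    # Placeholder adapter: a real one would call the vendor's API here.
    return lambda prompt: f"[{vendor}] {prompt}"


# Hypothetical registry: one entry per supported provider.
VENDORS: Dict[str, LLM] = {name: make_stub(name) for name in ("chatgpt", "gemini")}


def parse_cli(argv: Optional[List[str]] = None) -> Tuple[str, Dict[str, str]]:
    """Return the task plus a role -> vendor-name mapping."""
    parser = argparse.ArgumentParser(description="Let LLMs argue in four roles.")
    parser.add_argument("task", help="what the bots should work on")
    parser.add_argument("--default-vendor", choices=sorted(VENDORS), default="chatgpt",
                        help="vendor for any role not set explicitly")
    for role in ROLES:
        parser.add_argument(f"--{role}", choices=sorted(VENDORS),
                            help=f"vendor that plays the {role}")
    args = parser.parse_args(argv)
    # Roles without their own flag fall back to --default-vendor.
    assignment = {role: getattr(args, role) or args.default_vendor for role in ROLES}
    return args.task, assignment


def build_models(assignment: Dict[str, str]) -> Dict[str, LLM]:
    """Turn vendor names into callables, one per role."""
    return {role: VENDORS[vendor] for role, vendor in assignment.items()}
```

For example, `parse_cli(["name ideas please", "--critic", "gemini"])` returns the task plus `{"generator": "chatgpt", "critic": "gemini", "refiner": "chatgpt", "evaluator": "chatgpt"}`, so only the critic switches vendors. Feeding that mapping to `build_models` gives one callable per role, which is the shape a role-by-role pipeline needs.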