The Generator, the Critic, and the Evaluator Walk Into a Chat…

How “let’s make LLMs argue” turned into my go-to 4-role workflow (and a weekend CLI project).

Aug 24, 2025

This project actually started as a lunch-break joke at the office:
“What if we let two LLMs argue with each other and just… see where it ends?”

Fast-forward a couple of weeks, and that “joke” has become one of my go-to prompt engineering tricks. I’ll often ask one LLM to critique another’s work, like:

“Hey Gemini, ChatGPT claims he’s the best consumer LLM product because yada yada… do you agree?”

Then I flip it back and let ChatGPT defend itself, refine the answer, or even rewrite the whole thing. Eventually I standardized the flow into four roles: Generator → Critic → Refiner → Evaluator.
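To make the flow concrete, here is a minimal Python sketch of one Generator → Critic → Refiner → Evaluator round. It assumes each role is simply a function that takes a prompt string and returns text. The `four_role_round` and `fake_model` helpers and the prompt wording are hypothetical illustrations, not the code inside ai-cli-chat, and the fake models stand in for real vendor API calls.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any function that maps a prompt string to a reply string.
Ask = Callable[[str], str]


@dataclass
class Round:
    draft: str
    critique: str
    refined: str
    verdict: str


def four_role_round(task: str, generator: Ask, critic: Ask, refiner: Ask, evaluator: Ask) -> Round:
    """Hypothetical helper: run one pass of the four-role workflow on a task."""
    # 1. Generator: produce a first answer.
    draft = generator(f"Task: {task}\nGive your best answer.")
    # 2. Critic: a (possibly different) model points out weaknesses in the draft.
    critique = critic(f"Task: {task}\nAnswer:\n{draft}\nList concrete flaws and missing points.")
    # 3. Refiner: rewrite the draft so it addresses the critique.
    refined = refiner(
        f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
        "Rewrite the answer to address the critique."
    )
    # 4. Evaluator: judge the refined answer.
    verdict = evaluator(f"Task: {task}\nFinal answer:\n{refined}\nRate it 1-10 and justify briefly.")
    return Round(draft, critique, refined, verdict)


def fake_model(name: str) -> Ask:
    """Offline stand-in for a vendor API: echoes the model name and the prompt's last line."""
    return lambda prompt: f"[{name}] {prompt.splitlines()[-1]}"


r = four_role_round(
    "5 domain names for an RSS-to-podcast SaaS",
    generator=fake_model("gpt"),
    critic=fake_model("gemini"),
    refiner=fake_model("gpt"),
    evaluator=fake_model("gemini"),
)
print(r.verdict)  # [gemini] Rate it 1-10 and justify briefly.
```

Because each role is its own callable, you can point the Generator and Refiner at one vendor and the Critic and Evaluator at another, which is what makes the cross-model ping-pong work.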

And you know what? It works ridiculously well. It’s like having your own mini debate club of robots—except they don’t eat all your snacks.

Here’s a real example prompt: “Help me come up with 5 domain name ideas for a B2C SaaS product that converts RSS feeds into podcasts.” (I loved “Podmatic.io”… sadly, someone else loved it first. 😭)

[Image: ChatGPT running through the four roles to brainstorm domain names]

If you’re curious, give this multi-LLM ping-pong a try: you might discover insights you’d never have gotten from a single model. And if you’re in a weekend-coder mood, I even hacked together a little CLI version that bakes in the four roles and lets you swap between different LLM vendors. No more tedious copy-paste battles; just let the bots duke it out while you sip your coffee. ☕️🤖⚔️ (At your own LLM token cost…)
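Swapping vendors mostly comes down to hiding each one behind the same small interface and choosing a vendor per role. The sketch below assumes a hypothetical `ChatBackend` protocol with a single `complete` method and a hypothetical `BACKENDS` registry keyed by vendor name; it is not ai-cli-chat’s actual configuration or API, and `EchoBackend` is an offline placeholder where a real vendor SDK call would go.

```python
from typing import Callable, Dict, Protocol


class ChatBackend(Protocol):
    """Hypothetical common interface every vendor backend implements."""

    def complete(self, prompt: str) -> str: ...


class EchoBackend:
    """Offline placeholder; a real backend would call the vendor's SDK here."""

    def __init__(self, vendor: str) -> None:
        self.vendor = vendor

    def complete(self, prompt: str) -> str:
        return f"{self.vendor}: {prompt[:40]}"


# Hypothetical registry: vendor name -> factory that builds a backend.
BACKENDS: Dict[str, Callable[[], ChatBackend]] = {
    "openai": lambda: EchoBackend("openai"),
    "gemini": lambda: EchoBackend("gemini"),
}


def build_roles(assignment: Dict[str, str]) -> Dict[str, ChatBackend]:
    """Map each role (generator, critic, ...) to a backend chosen by vendor name."""
    unknown = set(assignment.values()) - set(BACKENDS)
    if unknown:
        raise ValueError(f"unknown vendor(s): {sorted(unknown)}")
    return {role: BACKENDS[vendor]() for role, vendor in assignment.items()}


roles = build_roles(
    {"generator": "openai", "critic": "gemini", "refiner": "openai", "evaluator": "gemini"}
)
print(roles["critic"].complete("Is Podmatic.io a good name?"))  # gemini: Is Podmatic.io a good name?
```

Adding another vendor then means writing one more backend class and one registry entry; the four-role loop itself doesn’t change.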

👉 PyPI package: ai-cli-chat
👉 GitHub repo

Yusi’s Read, Think and Build