n8n tutorial - Lesson 05: Comparing AI Models in n8n: Claude vs Gemini vs ChatGPT
Hi everyone, in this post we are going to compare three major AI models — Claude, Gemini, and ChatGPT — inside a real n8n workflow. This is part of our ongoing n8n Workflow Automation Tutorial series, and by the end you will have a working benchmark workflow that runs all three models in parallel and logs the results to Google Sheets.
This post is based on a real hands-on session from Week 1 of the series. We built the workflow T1-B13-Benchmark-3-Models end to end and ran into several real bugs and gotchas along the way. Everything described here comes from building and running that workflow ourselves.
Why Compare AI Models in n8n?
When you are building n8n workflow automation for real projects, picking the right AI model matters. Claude, Gemini, and OpenAI (ChatGPT) each behave differently in terms of output quality, response format, latency, and reliability. Running a benchmark inside n8n lets you see the differences side by side on your exact prompt and use case — not just on generic benchmarks from a marketing page.
This n8n AI models comparison approach is also reusable. Once the workflow is built, you can swap in any topic and re-run it anytime you want to evaluate model outputs for a new task.
What the Final Workflow Looks Like
The completed workflow T1-B13-Benchmark-3-Models has 8 nodes arranged in 6 stages, in this order:
Manual Trigger → Set Topic (Edit Fields) → 3 AI branches in parallel (Claude, Gemini, OpenAI) → Merge → Build Row (Code node) → Google Sheets Append
Results are written to a Google Sheet named T1-Benchmark-3-Models, tab Results, with columns: topic, claude_output, gemini_output, openai_output, timestamp.
How to build it:
-
Step 1 — Set up your Google Sheet
Create a new Google Sheet and name it T1-Benchmark-3-Models. Inside it, rename the first tab to Results. Add these 5 column headers in row 1: topic, claude_output, gemini_output, openai_output, timestamp. Keep this sheet open — you will need the Spreadsheet ID from the URL later.
-
Step 2 — Set up Google Sheets OAuth credential in n8n
This is a 5-phase process. Go to Google Cloud Console and open your project (in the session we used Default Gemini Project). Enable both Google Sheets API and Google Drive API — you need Drive enabled too because n8n uses it to list and find your Sheet.
Next, go to OAuth Consent Screen, select External, and keep it in Testing mode. Add yourself as a Test User. One important trap here: if you accidentally click Publish app, the status changes to "In production" and the Test Users section disappears. Fix this by clicking Back to testing.
Then create an OAuth Client ID named n8n-sheets-client. Copy the Redirect URI from n8n and paste it into the Authorized Redirect URIs field in Google Cloud. Finally, go back to n8n, create a new credential called Google Sheets (Personal), paste in your Client ID and Client Secret, then click Sign in with Google. You will see a warning that says "App isn't verified" — click Advanced and continue. That is expected behavior for apps in Testing mode.
-
Step 3 — Create the workflow and add the Manual Trigger and Set Topic nodes
Create a new workflow and name it T1-B13-Benchmark-3-Models. Add a Manual Trigger node as the start. Then add an Edit Fields node, rename it Set Topic, and add one field: key = topic, value = your default test topic. In the session we used "How to effectively self-learn AI Automation as a beginner" as the default topic.
-
Step 4 — Add 3 parallel AI branches
This is where the n8n AI models comparison actually happens. You need to add three separate AI nodes — one for Claude (Anthropic), one for Gemini, one for OpenAI — all connected directly from the Set Topic node output.
Important trap: n8n's + button always belongs to whichever node is currently selected. If you click + without having Set Topic selected, n8n will create a sequential connection instead of a parallel branch. To add parallel branches correctly, click on the Set Topic node first, then click its + button, add the first AI node, then go back and click Set Topic again before adding the next one.
Use the same prompt for all three nodes, asking for a response of 120-150 words. This keeps the comparison fair.
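One way to make sure the three branches really receive an identical prompt is to build it in a single place. The sketch below is plain JavaScript of the kind an n8n Code node accepts. The helper names buildPrompt and countWords and the exact prompt wording are our own illustrative assumptions, not the text used in the session.

```javascript
// Hypothetical helper: builds the one prompt that all three AI nodes share.
function buildPrompt(topic, minWords = 120, maxWords = 150) {
  return (
    `Write a practical answer on the topic: "${topic}". ` +
    `Keep the response between ${minWords} and ${maxWords} words.`
  );
}

// Counts whitespace-separated words so an output can be checked against the limit.
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const prompt = buildPrompt("How to effectively self-learn AI Automation as a beginner");
console.log(prompt);
console.log(countWords("one two  three")); // 3
```

Inside the workflow itself, the same sentence can simply be pasted into each AI node with {{ $json.topic }} in place of the topic. The word counter is handy later if you want to check which model actually respected the 120-150 word range.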
For Gemini: during testing, gemini-2.0-flash failed with a limit:0 error, so we switched to gemini-2.5-flash instead. Gemini on the free tier also returned 503 errors more often than Claude or OpenAI. The likely causes we identified are that the free tier gets lower priority, that 2.5-flash is in high demand, and that our working hours in Vietnam overlapped with peak usage in the US. If you get a 503, just retry the node.
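If you would rather not retry by hand, the retry idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not what we ran in the session: the names retryOn503 and makeFlakyCall are hypothetical, the Gemini call is simulated, and the err.status field is an assumed error shape that a real client may not use.

```javascript
// Hypothetical retry wrapper: re-runs `fn` when it fails with an HTTP 503.
async function retryOn503(fn, { maxTries = 3, waitMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // `err.status` is an assumed shape; real client errors may differ.
      if (err.status !== 503) throw err;
      lastError = err;
      if (attempt < maxTries) {
        // Wait a little longer after each failed attempt.
        await new Promise((resolve) => setTimeout(resolve, waitMs * attempt));
      }
    }
  }
  throw lastError;
}

// Simulated model call that fails `failures` times with 503, then succeeds.
function makeFlakyCall(failures) {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) {
      const err = new Error("Service Unavailable");
      err.status = 503;
      throw err;
    }
    return { text: "ok", calls };
  };
}

retryOn503(makeFlakyCall(2), { maxTries: 3, waitMs: 10 })
  .then((result) => console.log(result)); // { text: 'ok', calls: 3 }
```

n8n nodes also have a Retry On Fail option in their Settings tab, which may cover this case without any code. It is worth trying that first.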
-
Step 5 — Add a Merge node
Add a Merge node and connect all three AI branch outputs into it. This collects the results from Claude, Gemini, and OpenAI into a single stream before building the row.
One thing to keep in mind: n8n runs these three branches sequentially by default, not in true parallel. The "parallel branches" in n8n are logical, meaning they share the same single-threaded executor. If you need real parallel execution you would use a Code node with Promise.all or a sub-workflow. For a benchmark like this, sequential execution is fine.
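To make the difference concrete, here is a small stand-alone sketch comparing one-after-another execution with Promise.all. The model calls are simulated with timers, and fakeModelCall, runSequential, and runParallel are hypothetical names, so treat it as an illustration of the idea rather than code taken from the workflow.

```javascript
// Simulated model call: resolves with a fake answer after `ms` milliseconds.
function fakeModelCall(name, ms) {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ model: name, text: `${name} answer` }), ms)
  );
}

// One call after another, similar in spirit to how n8n walks the branches.
async function runSequential(calls) {
  const results = [];
  for (const call of calls) {
    results.push(await call());
  }
  return results;
}

// All calls started at once; Promise.all returns results in input order.
async function runParallel(calls) {
  return Promise.all(calls.map((call) => call()));
}

const calls = [
  () => fakeModelCall("Claude", 30),
  () => fakeModelCall("Gemini", 30),
  () => fakeModelCall("OpenAI", 30),
];

async function demo() {
  let start = Date.now();
  await runSequential(calls);
  const sequentialMs = Date.now() - start;

  start = Date.now();
  await runParallel(calls);
  const parallelMs = Date.now() - start;

  // The parallel run should take roughly a third of the sequential time.
  console.log({ sequentialMs, parallelMs });
}

demo();
```

With three calls of similar latency, the sequential total is about the sum of the three, while the parallel total is about the slowest single call. For a benchmark that runs once on demand, that saving rarely justifies the extra complexity.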
-
Step 6 — Build the row with a Code node
Add a Code node after Merge and rename it Build Row. This node assembles one object with all four values (topic, claude_output, gemini_output, openai_output) plus a timestamp, ready to be appended to your Sheet.
Two bugs we hit here during the session: First, sibling nodes (branches at the same level) cannot reference each other using $('NodeName') directly — you need to reference them from a downstream node like this Code node instead. Second, the OpenAI output field is text, not output. This changed in a recent n8n update — n8n now standardizes all three providers to use the text field. The lesson: never trust convention alone. Always check the actual JSON output in the JSON tab of the node to verify the field name before writing your code.
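A defensive way to live with field names that change between versions is to look the value up by a list of candidates and fail loudly when none match. The helper below is a sketch under that assumption. The name pickText and the candidate list are ours, and the JSON tab remains the source of truth for what your node really returns.

```javascript
// Returns the first non-empty string found under the candidate field names.
// The candidate list is an assumption; check the node's JSON tab for the real name.
function pickText(json, candidates = ["text", "output"]) {
  for (const key of candidates) {
    const value = json[key];
    if (typeof value === "string" && value.trim() !== "") {
      return value;
    }
  }
  throw new Error(
    `None of [${candidates.join(", ")}] found; available keys: ${Object.keys(json).join(", ")}`
  );
}

console.log(pickText({ text: "hello" }));   // hello
console.log(pickText({ output: "world" })); // world
```

The error message lists the keys that were actually present, which usually tells you at a glance what the field was renamed to.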
The Build Row code follows a simple structure: reference each AI node using $('Claude').first().json.text, $('Gemini').first().json.text, and $('OpenAI').first().json.text, then return a single object with the five fields. The .first() call is required because every node output in n8n is an array of items, so you have to select which item you want.
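Put together, the Build Row logic might look like the sketch below. The row-building part is written as a pure function so it can be run outside n8n, and the n8n wiring is shown in a comment. The function name buildRow and the ISO timestamp format are our assumptions; the node names and the text field are the ones described above.

```javascript
// Pure helper so the row-building logic can be tested outside n8n.
function buildRow(topic, outputs, now = new Date()) {
  return {
    topic,
    claude_output: outputs.claude,
    gemini_output: outputs.gemini,
    openai_output: outputs.openai,
    timestamp: now.toISOString(), // assumed format; any sortable string works
  };
}

// Inside the n8n Code node, the wiring would look roughly like this:
//
//   return [{
//     json: buildRow($('Set Topic').first().json.topic, {
//       claude: $('Claude').first().json.text,
//       gemini: $('Gemini').first().json.text,
//       openai: $('OpenAI').first().json.text,
//     }),
//   }];

// Stand-alone demo with placeholder outputs.
const row = buildRow(
  "How to effectively self-learn AI Automation as a beginner",
  { claude: "claude text", gemini: "gemini text", openai: "openai text" },
  new Date("2025-01-01T00:00:00Z")
);
console.log(row.timestamp); // 2025-01-01T00:00:00.000Z
```

The keys of the returned object match the column headers from Step 1 exactly, which is what lets the Google Sheets node map them without manual configuration.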
-
Step 7 — Append to Google Sheets
Add a Google Sheets node, set the operation to Append, and select your Google Sheets (Personal) credential. Use the Resource Locator to find your spreadsheet. Watch out for a common trap here: there are two different IDs involved. The Spreadsheet ID is the long string in the URL of your Google Sheet. The Sheet ID (gid) is a separate number that identifies which tab inside the spreadsheet. Make sure you are entering each one in the correct field.
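To see where each ID lives, the sketch below pulls both out of a Google Sheets URL. The function name parseSheetUrl is hypothetical and the spreadsheet ID in the example is a made-up placeholder.

```javascript
// Extracts the Spreadsheet ID and the tab's gid from a Google Sheets URL.
function parseSheetUrl(url) {
  const idMatch = url.match(/\/spreadsheets\/d\/([a-zA-Z0-9_-]+)/);
  const gidMatch = url.match(/[#?&]gid=(\d+)/);
  return {
    spreadsheetId: idMatch ? idMatch[1] : null,
    sheetId: gidMatch ? Number(gidMatch[1]) : null,
  };
}

// The ID below is a made-up placeholder, not a real spreadsheet.
const example = "https://docs.google.com/spreadsheets/d/1AbC_dEf-123/edit#gid=0";
console.log(parseSheetUrl(example)); // { spreadsheetId: '1AbC_dEf-123', sheetId: 0 }
```

The long string after /d/ is the Spreadsheet ID, and the number after gid= identifies the tab. The first tab of a new spreadsheet normally has gid 0, which is easy to mistake for a missing value.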
Set the mapping mode to Map Automatically so n8n matches your output field names to your column headers automatically. Run the workflow and check your Sheet — you should see a new row with all four values plus the timestamp.
-
Step 8 (Bonus) — Add an AI Judge to pick the best model
This is an optional bonus step. After the Merge node, add another AI node using Claude Haiku (Anthropic) and configure it to read all three outputs and decide which model gave the best answer.
There are three fixes you must apply to make this work correctly. First, add a Limit node set to 1 item between Merge and the Judge node. Without it, Merge outputs 3 items, the Judge runs 3 times, and because AI is non-deterministic you get 3 different evaluations, which defeats the purpose. The Limit node here is not really filtering data; it acts as a gate that makes sure the Judge fires only once. The Judge reads its input data from $('NodeName') references anyway, not from the item on the wire, so it does not matter which item the Limit node keeps.
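The effect of the Limit node is easy to reproduce outside n8n. The sketch below uses simplified stand-ins (runNode, limit, and countJudgeRuns are hypothetical names, and the Judge is a stub), so it only models the once-per-item behaviour, not n8n itself.

```javascript
// Stand-in for n8n's per-item behaviour: a node runs once for each incoming item.
function runNode(items, nodeFn) {
  return items.map((item) => nodeFn(item));
}

// Stand-in for the Limit node: keep only the first `maxItems` items.
function limit(items, maxItems = 1) {
  return items.slice(0, maxItems);
}

// Counts how many times the (simulated) Judge is invoked for a set of items.
function countJudgeRuns(items) {
  let runs = 0;
  runNode(items, () => {
    runs += 1;
    return { json: { best_model: "placeholder" } };
  });
  return runs;
}

// Merge emits one item per branch.
const merged = [
  { json: { text: "claude answer" } },
  { json: { text: "gemini answer" } },
  { json: { text: "openai answer" } },
];

console.log(countJudgeRuns(merged));        // 3
console.log(countJudgeRuns(limit(merged))); // 1
```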
Second, add an Output Parser (Structured, Generate from Example) on the Judge node with the schema {best_model, reason}. Without this, the AI returns a 300-word essay instead of a structured value, and your Sheet column breaks.
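For reference, here is the kind of example object the parser can be generated from, plus a small check you could run on the Judge's reply before writing it to the Sheet. Only the two keys come from the schema above. The example values, the allowed model labels, and the function name parseJudgeReply are assumptions for illustration.

```javascript
// Example object of the kind pasted into the parser's "Generate from Example" field.
// The values are placeholders; only the two keys come from the schema.
const judgeExample = {
  best_model: "claude",
  reason: "Clearest structure and most actionable steps.",
};

// Assumed label set; adjust to whatever names your Judge prompt asks for.
const ALLOWED_MODELS = ["claude", "gemini", "openai"];

// Parses the Judge's reply and checks it against the {best_model, reason} shape.
function parseJudgeReply(raw) {
  const data = typeof raw === "string" ? JSON.parse(raw) : raw;
  const model = String(data.best_model || "").toLowerCase();
  if (!ALLOWED_MODELS.includes(model)) {
    throw new Error(`Unexpected best_model: ${data.best_model}`);
  }
  if (typeof data.reason !== "string" || data.reason.trim() === "") {
    throw new Error("Missing reason");
  }
  return { best_model: model, reason: data.reason.trim() };
}

console.log(parseJudgeReply(JSON.stringify(judgeExample)));
```

A check like this fails fast when the Judge drifts back to free-form text, instead of letting a long essay land in a single Sheet cell.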
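Third, set Sampling Temperature = 0 on the Judge's chat model sub-node to make the evaluation as repeatable as possible. Temperature 0 greatly reduces variation between runs, although it does not strictly guarantee identical output every time.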
Key Takeaways from This n8n AI Models Comparison
After running this workflow several times, here are the practical differences we observed:
Claude gave the most consistently structured and readable output. Gemini was fast when it worked but had the most reliability issues on the free tier (503 errors, model version resets). OpenAI was the most stable and predictable. For production n8n workflow automation, stability matters as much as output quality — a 503 that breaks your workflow at 2 AM is worse than a slightly shorter response.
One thing that came up in this session that is worth sharing: when you build an n8n tutorial project like this benchmark, you learn far more from the bugs than from the steps that work on the first try. The Gemini model version resetting, the OpenAI field name change, the parallel branch trap with the + button — none of these show up in official documentation, but all of them will hit you in real projects.
Conclusion:
In this post we walked through building a full n8n AI models comparison workflow that benchmarks Claude, Gemini, and ChatGPT on the same prompt and logs the results to Google Sheets. This is a practical, reusable pattern you can apply to any n8n workflow automation project where you need to evaluate model outputs before committing to one provider.
If you have any questions, feel free to leave a comment below. Thank you!
Tags: n8n tutorial, n8n workflow automation, n8n AI models comparison, Claude vs Gemini vs ChatGPT, n8n Google Sheets integration, n8n beginner tutorial, AI automation workflow, n8n benchmark workflow