ChatGPT vs Google's Bard: A Side-by-Side Comparison and Why It Matters | Join Me for an AI Happy Hour in SF!
The stakes couldn't be higher for Google.
👋🏽 Hey friends,
Before I get to the meat of this week’s newsletter, here are a couple of awesome happy hours and events, if you are interested in Generative AI!
I will be part of a Generative AI happy hour + panel next Wednesday, March 29 at 6 PM at The Modernist in San Francisco featuring Cathy Gao (Sapphire Ventures), Luigi Congedo (AI/web3 VC, ex-Bootstrap Labs), and me! It’s free to attend.
First come, first served, so register FAST. Mention that you heard about the event from my newsletter! 👉🏽 Link: https://lu.ma/x09s7512
If you can't make the 6 PM happy hour / panel (or registration fills up), there is also going to be a larger social hour starting at 8 PM, also at The Modernist. 👉🏽 Link: https://lnkd.in/eg6QzWpj
I’ll also be in SF the next day (March 30th) for the Cerebral Valley Summit. (Incredible speaker and attendee line-up, well done Eric!) If you are going to be attending Cerebral Valley, let me know and let’s chat.
Okay, now on to the main event.
How Does Bard Compare to ChatGPT?
Earlier this week, Google finally released Bard, its answer to ChatGPT. The release comes six weeks after Google first publicly announced the Bard chatbot and three-and-a-half months after OpenAI released ChatGPT at the end of November, kicking off an AI arms race unlike anything we’ve ever seen before.
To test Bard’s capabilities, I gave OpenAI’s ChatGPT and Google Bard the same questions and prompts. I’m using the GPT-4 version of ChatGPT.
The stakes couldn’t be higher for Google. The company will soon roll out generative AI across Docs, Gmail and the rest of the Workspace suite. The quality of its AI will shape public opinion of the tech giant and, with it, Google’s future.
With all that said, let’s get into the side-by-side comparison. The result surprised me.
🗣️ First Prompt: “Write me a unique and unusual 5-paragraph essay about the history of Super Mario.”
This is the first prompt I always use to test how unique an LLM can be with its writing. It’s also a test of whether the AI can follow my simple-but-explicit instructions.
Here is what Bard wrote for its first paragraph:
Now compare that to ChatGPT’s first paragraph:
You can immediately see some major differences. Google’s essay is far more generic: it uses simple, straightforward language that, in my opinion, isn’t even as good as GPT-3.5. (Here’s what the older version of ChatGPT, GPT-3.5, wrote with the same prompt for comparison.)
ChatGPT with GPT-4, on the other hand, pulled unique language and facts into its essay. It even created special subheadings for each section. I didn’t know that Shigeru Miyamoto came up with the idea after repurposing an old arcade game called Radar Scope, and ChatGPT got this fact right! Yes, ChatGPT went long with its essay, but it followed my instructions and wrote a surprisingly poignant piece.
If you are curious to read the whole of each essay, I’ve included them below. On the left is Google Bard’s response and the right is ChatGPT’s response.
Picking the winner of this challenge is easy.
🏆 Winner: ChatGPT
🗣️ Second Prompt: “What is (−3−9i)(1+10i)4!”
Large language models (LLMs) like GPT-4 and LaMDA (the LLM powering Bard) are notoriously mistake-prone when it comes to solving math problems. GPT-4 made major improvements over GPT-3.5 in its ability to solve math problems and pass mathematical tests.
For this test, I pulled a complex-number problem from the Lamar University website and added 4! (4 factorial) to it, because I personally love factorials. (I’ve been intrigued by them since I was a kid. I don’t know why; I’m weird.)
To my surprise, ChatGPT and Google Bard came up with different responses.
I really thought both chatbots would get this question correct. Both were able to compute the factorial correctly (4! = 24) but only ChatGPT got the final answer correct (2088 - 936i). It’s also the only AI to show its work. Bard doesn’t break down its logic.
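For anyone who wants to check the answer by hand, here is the worked arithmetic, using i² = −1. It confirms ChatGPT's result:

```latex
\begin{aligned}
(-3-9i)(1+10i) &= -3 - 30i - 9i - 90i^2 \\
               &= -3 - 39i + 90 \\
               &= 87 - 39i \\
4! &= 4 \cdot 3 \cdot 2 \cdot 1 = 24 \\
(87 - 39i) \cdot 24 &= 2088 - 936i
\end{aligned}
```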
This issue becomes even more stark when you compare Bard’s result (wrong) to a Google search (correct):
I also asked both AIs to give me the answer to 100 factorial (because, again, I LOVE factorials). Again, ChatGPT got it right. Bard initially got it right, then went on a weird tangent with a second answer that was wrong. Very strange.
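If you want to check a chatbot's answer to 100! yourself, a few lines of Python will do it, because Python integers have arbitrary precision and never overflow. This is a minimal sketch: the helper names are just for illustration, and the trailing-zero count (Legendre's formula applied to the factors of 5) is an extra sanity check, not something either chatbot was asked for.

```python
import math


def factorial_digits(n: int) -> str:
    """Return n! as an exact decimal string (Python ints never overflow)."""
    return str(math.factorial(n))


def trailing_zeros(n: int) -> int:
    """Count the trailing zeros of n! by counting its factors of 5 (Legendre's formula)."""
    count = 0
    power = 5
    while power <= n:
        count += n // power
        power *= 5
    return count


if __name__ == "__main__":
    digits = factorial_digits(100)
    print(digits)
    print(f"{len(digits)} digits, {trailing_zeros(100)} trailing zeros")
```

Running it prints all 158 digits of 100!, which end in exactly 24 zeros, so a wrong answer is easy to spot.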
I thought Bard would pull through here. I was wrong.
🏆 Winner: ChatGPT, By a Mile
🗣️ Third Prompt: “Pretend you are a programming genius, and I am your human liaison. Pretend I know nothing about coding. Write me some code for a simple game of hangman.”
Building a hangman game is a really simple task for any programmer. In fact, one of my first assignments in my college Intro to Programming class was to make a hangman game using C. It’s a good way to test for basic programming skills.
So, naturally, I asked both ChatGPT and Bard to write some code for a hangman game, and then I copied that code with no edits into Replit. (It’s a development environment that lets even a complete novice run code and build products. I suggest playing with it even if you have never written a line of code in your life.)
ChatGPT’s code was far more detailed. It included a list of words for the game to use, with one chosen at random by a simple function call. It knew the game wouldn’t work without some starter words like ‘grape’ and ‘elderberry’. ChatGPT also explained the key functions of its code. (It remembered that I was a beginner in this example and included instructions.) Bard’s output was far less detailed: it didn’t include starter words for the hangman game or any context on how its code worked.
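For readers who have never written one, here is a minimal sketch of the structure described above: a starter word list, a random pick, and a loop that reveals letters as you guess. It is written in Python (which runs in Replit as-is) and is not ChatGPT's or Bard's actual output; the word list beyond ‘grape’ and ‘elderberry’, the six-guess limit, and the function names are assumptions for illustration.

```python
from __future__ import annotations

import random

# Hypothetical starter words; only 'grape' and 'elderberry' come from the newsletter.
WORDS = ["apple", "banana", "cherry", "grape", "elderberry"]
MAX_WRONG_GUESSES = 6  # assumed limit, one per body part of the hangman drawing


def masked_word(word: str, guessed: set[str]) -> str:
    """Show guessed letters and hide the rest with underscores."""
    return " ".join(ch if ch in guessed else "_" for ch in word)


def apply_guess(word: str, guessed: set[str], letter: str) -> bool:
    """Record a guess and return True if the letter appears in the word."""
    guessed.add(letter)
    return letter in word


def is_solved(word: str, guessed: set[str]) -> bool:
    """The player wins once every letter of the word has been guessed."""
    return all(ch in guessed for ch in word)


def play(word: str | None = None) -> None:
    """Run one interactive game in the console."""
    word = word or random.choice(WORDS)  # pick a random starter word
    guessed: set[str] = set()
    wrong = 0
    while wrong < MAX_WRONG_GUESSES and not is_solved(word, guessed):
        print(masked_word(word, guessed), f"({MAX_WRONG_GUESSES - wrong} wrong guesses left)")
        letter = input("Guess a letter: ").strip().lower()
        if len(letter) != 1 or not letter.isalpha():
            print("Please enter a single letter.")
            continue
        if letter in guessed:
            print("You already guessed that letter.")
            continue
        if not apply_guess(word, guessed, letter):
            wrong += 1
    if is_solved(word, guessed):
        print(f"You win! The word was '{word}'.")
    else:
        print(f"Out of guesses. The word was '{word}'.")


if __name__ == "__main__":
    play()
```

Keeping the display and guess-checking logic in small functions, separate from the input() loop, makes each piece easy to test on its own.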
Below are the complete results of my prompts from ChatGPT and Bard. Top is ChatGPT; bottom is Bard:
In the end though, the only thing that matters is: does the code work?
The answer: ChatGPT’s code ran perfectly every time, but Bard’s code failed to work. I was allowed to guess one letter before the Bard-written program crashed.
Here’s the console output in Replit if you want to see for yourself. On the left is Bard’s code and on the right is ChatGPT’s code. The red text means there was an error running the code.
🏆 Winner: ChatGPT, By a Hundred Miles
🤔 Why Bard’s Shortcomings Matter
I planned on running at least two more side-by-side comparisons, but there’s no point. At least for now, OpenAI’s ChatGPT (running on GPT-4) is far superior to Google Bard. There’s just no comparison. Bard doesn’t even measure up to the GPT-3.5 version of ChatGPT released in late 2022: ChatGPT 3.5 correctly solved the same math problem Bard couldn’t.
I’m not the only one who has noticed this glaring problem: the AI Explained YouTube channel came to the same conclusion I did. (It’s an excellent channel if you aren’t already following it.)
Is Bard worse because Google is holding back for safety or business reasons? Is Google actually that far behind OpenAI? Or does Bard just need a lot more human interaction to improve quickly? I suspect the answer is a combination of all three. But it’s still disappointing to see Bard come up this short.
If this is the technology Google is using for Google Docs and Gmail, I am going to have to rely on GPT-4 plug-ins instead. The gap is that wide right now, and users will notice.
Bard’s shortcomings really matter because Google’s users may end up not trusting Google’s AI to do things like summarize emails and write first drafts in Google Docs. That was the core of Google’s announcement last week that generative AI would be integrated into Google Workspace. Those users may instead turn to Microsoft’s AI, which is powered by GPT-4 and will soon be in Word, Excel and PowerPoint.
Bard’s lack of sophistication pours cold water on last week’s Google AI announcements. Google still has some of the most advanced AI on the planet (DeepMind, anyone?), but when it comes to public-facing large language models, its offering almost feels antiquated.
I am still rooting for Google though, because I really want to have generative AI integrated into Gmail, Docs, and Slides. It would take my productivity to the next level. But the large language model they are using now needs to improve, and quickly.
This is what Midjourney gave me when I asked it to generate a sad bard.
Before Closing Out: Some Other Upcoming Travel
In addition to being in SF next week, I’ll be hopping around the world in the next few months.
I will be in Las Vegas March 26th to March 28th for Shoptalk, the biggest ecommerce and retail conference of the year. I have made a private WhatsApp group for friends attending Shoptalk — let me know if you want me to add you to the group.
I will be in Japan starting April 11th, so if you are based in Japan or know someone I should meet there, let me know. My fiancé (still not used to saying that!) and I will also be visiting Thailand and Singapore while we are in Asia, so let me know if you live in either of those countries too!
I will also be in Austin, Miami, NYC, and possibly Europe in the next 2-3 months. Same as above — let me know if you’re in one of these cities!
Look out for another newsletter soon — this next one will be about what VCs really think of generative AI and generative AI companies.