• The AI Analyst by Ben Parr

ChatGPT vs Google's Bard: A Side-by-Side Comparison and Why It Matters | Join Me for an AI Happy Hour in SF!

The stakes couldn't be higher for Google.

👋🏽 Hey friends,

Before I get to the meat of this week's newsletter, a couple of awesome happy hours and events if you are interested in Generative AI!

  1. I will be part of a Generative AI happy hour + panel next Wednesday, March 29 at 6 PM at The Modernist in San Francisco featuring Cathy Gao (Sapphire Ventures), Luigi Congedo (AI/web3 VC, ex-Bootstrap Labs), and me! It's free to attend.

    First come, first served, so register FAST. Mention you heard about the event from my newsletter! 👉🏽 Link: https://lu.ma/x09s7512

    If you can't make the 6 PM happy hour / panel (or registration fills up), there is also going to be a larger social hour starting at 8 PM, also at The Modernist. 👉🏽 Link: https://lnkd.in/eg6QzWpj

  2. I'll also be in SF the next day, March 30th, for the Cerebral Valley Summit. (Incredible speaker and attendee line-up, well done Eric!) If you are going to be attending Cerebral Valley, let me know and let's chat.

Okay, now on to the main event.

How Does Bard Compare to ChatGPT?

Earlier this week, Google finally released Bard, its answer to ChatGPT. It comes six weeks after Google first publicly announced its Bard chatbot and more than three-and-a-half months since OpenAI first released ChatGPT at the end of November and kicked off an AI arms race unlike anything we've ever seen before.

To test Bard's capabilities, I asked OpenAI's ChatGPT and Google Bard to answer the same questions/prompts. I'm using the GPT-4 version of ChatGPT.

The stakes couldn't be higher for Google. They will soon roll out generative AI across Docs, Gmail and the entire Workspace suite. The quality of their AI will steer public opinion about the tech giant and, with it, Google's future.

With all that said, let's get into the side-by-side comparison. The result surprised me.

🗣️ First Prompt: "Write me a unique and unusual 5-paragraph essay about the history of Super Mario."

This is the first prompt I always use to test how unique an LLM can be with its writing. It's also a test of whether the AI can follow my simple-but-explicit instructions.

Here is what Bard wrote for its first paragraph:

"Super Mario is one of the most iconic video game characters of all time. He has starred in over 200 games and is known for his friendly demeanor, his red overalls, and his ability to jump incredibly high. But how did Mario come to be?"

Bard

Now compare that to ChatGPT's first paragraph:

The history of Super Mario is a tale of serendipity and unbridled creativity that has defined the world of video games for over four decades. It all began in 1981 when a young game designer named Shigeru Miyamoto found himself tasked with repurposing an unsold arcade game called Radar Scope. As he toyed with the idea of a love triangle between a carpenter, a gorilla, and a damsel in distress, the character we now know as Mario emerged. Originally called "Jumpman," Mario's occupation and appearance were altered due to a striking resemblance to the landlord of Nintendo's American warehouse, Mario Segale. Thus, the mustachioed Italian plumber was born.

ChatGPT

You can immediately see some major differences: Google's essay is far more generic. It uses simple, straightforward language that doesn't beat even GPT-3.5, in my opinion. (Here's what the older version of ChatGPT, running GPT-3.5, wrote with the same prompt for comparison.)

ChatGPT 4, on the other hand, pulled unique language and facts for its essay. It even created special subheadings for each section. I didn't even know that Shigeru Miyamoto came up with the idea while repurposing an old arcade game called Radar Scope. And ChatGPT got this fact right! And yes, ChatGPT went long with its essay, but it followed my instructions and wrote a surprisingly poignant essay.

If you are curious to read each essay in full, I've included them below. On the left is Google Bard's response and on the right is ChatGPT's response.

Picking the winner of this challenge is easy.

🏆 Winner: ChatGPT

🗣️ Second Prompt: "What is (−3−9i)(1+10i)4!"

Large language models (LLMs) like GPT-4 and LaMDA (the LLM powering Bard) are notoriously mistake-prone when it comes to solving math problems. GPT-4 made major improvements over GPT-3.5 in its ability to solve math problems and pass mathematical tests.

For this test, I pulled a complex number problem from the Lamar University website and I added 4! (4 factorial) to it, because I personally love factorials. (I've been intrigued by them since I was a kid. I don't know why; I'm weird.)

To my surprise, ChatGPT and Google Bard came up with different responses.

I really thought both chatbots would get this question correct. Both were able to compute the factorial correctly (4! = 24), but only ChatGPT got the final answer correct (2088 - 936i). It's also the only AI to show its work. Bard doesn't break down its logic.

This issue becomes even more stark when you compare Bard's result (wrong) to a Google search (correct).

I also asked both AIs to give me the answer to 100 factorial (because, again, I LOVE factorials). Again, ChatGPT got it right. Bard initially got it right, then went on a weird tangent with a second answer that was wrong. Very strange.
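Both results in this section are easy to check independently. Here is a minimal Python sketch, assuming the prompt is read as the product of the two complex numbers multiplied by 4!. The helper name `gaussian_mul` is made up for illustration, and this is not the working either chatbot showed.

```python
from math import factorial


def gaussian_mul(a, b):
    """Multiply two complex numbers given as (real, imag) integer pairs."""
    (ar, ai), (br, bi) = a, b
    # (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ar*bi + ai*br)*i
    return (ar * br - ai * bi, ar * bi + ai * br)


product = gaussian_mul((-3, -9), (1, 10))  # (-3 - 9i)(1 + 10i)
scale = factorial(4)                       # 4! = 24
answer = (product[0] * scale, product[1] * scale)
print(product, scale, answer)              # (87, -39) 24 (2088, -936)

# Python integers have arbitrary precision, so 100! is computed exactly.
big = factorial(100)
print(len(str(big)), "digits")
```

Working with integer pairs rather than floating-point complex numbers keeps every step exact, which makes it a fair yardstick for a chatbot's answer.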

I thought Bard would pull through here. I was wrong.

🏆 Winner: ChatGPT, By a Mile

🗣️ Third Prompt: "Pretend you are a programming genius, and I am your human liaison. Pretend I know nothing about coding. Write me some code for a simple game of hangman."

Building a hangman game is a really simple task for any programmer. In fact, one of my first assignments in my college Intro to Programming class was to make a hangman game using C. It's a good way to test for basic programming skills.

So, naturally, I asked both ChatGPT and Bard to write some code for a hangman game, and then I copied that code with no edits into Replit. (It's a development environment that lets even a complete novice run code and build products. I suggest playing with it even if you have never written a line of code in your life.)

ChatGPT's code was far more detailed: it included a set of words the program would use for the hangman game, randomly chosen by a simple array function. It knew the game wouldn't work without some starter words like 'grape' and 'elderberry'. ChatGPT also included additional explanation for the key functions of its code. (It remembered I was a beginner in this example and included instructions.) Bard's output was far less detailed and didn't include starter words for the hangman game or additional context on how its code worked.
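To make those ingredients concrete, here is a minimal sketch of a hangman game in Python. It is not the code either chatbot produced. The function names, the six-miss limit and every starter word other than 'grape' and 'elderberry' are assumptions for illustration.

```python
import random

# Hypothetical starter words; any list of lowercase words would do.
WORDS = ["apple", "banana", "cherry", "grape", "elderberry"]


def pick_word(rng=random):
    """Choose the secret word at random from the starter list."""
    return rng.choice(WORDS)


def masked(secret, guessed):
    """Show guessed letters and hide the rest behind underscores."""
    return " ".join(ch if ch in guessed else "_" for ch in secret)


def play(secret, guesses, max_misses=6):
    """Run one game over an iterable of guesses; return True on a win."""
    guessed = set()
    misses = 0
    for raw in guesses:
        letter = raw.strip().lower()
        # Skip anything that is not a single, not-yet-guessed letter.
        if len(letter) != 1 or not letter.isalpha() or letter in guessed:
            continue
        guessed.add(letter)
        if letter not in secret:
            misses += 1
        print(masked(secret, guessed), f"(misses: {misses}/{max_misses})")
        if all(ch in guessed for ch in secret):
            return True
        if misses >= max_misses:
            return False
    return False  # Ran out of guesses before the word was solved.
```

Because `play` accepts any iterable of guesses, it can be checked without a keyboard: `play("grape", "grape")` wins, while six wrong letters in a row lose. For an interactive round, something like `play(pick_word(), iter(lambda: input("Guess a letter: "), ""))` keeps asking until the game ends or you enter a blank line.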

Below are the complete results of my prompts from ChatGPT and Bard. Top is ChatGPT; bottom is Bard:

In the end though, the only thing that matters is: does the code work?

The answer: ChatGPT's code ran perfectly every time, but Bard's code failed to work. I was allowed to guess one letter before the Bard-written program crashed.

Here's the console output in Replit if you want to see for yourself. On the left is Bard's code and on the right is ChatGPT's code. The red text means there was an error running the code.

🏆 Winner: ChatGPT, By a Hundred Miles

🤔 Why Bard's Shortcomings Matter

I planned on running at least two more side-by-side comparisons, but there's no point. At least for now, OpenAI's ChatGPT (running on GPT-4) is far superior to Google Bard. There's just no comparison. Bard doesn't even compare to the GPT-3.5 version of ChatGPT released at the end of November: ChatGPT 3.5 was able to correctly solve the same math challenge Bard couldn't solve.

I'm not the only one who has noticed this glaring problem: the AI Explained YouTube channel came to the same conclusion I did. (It's an excellent YouTube channel if you aren't already following it.)

Is Bard worse because Google is holding back for safety or business reasons? Is Google actually that far behind OpenAI? Or does Bard just need a lot more human interaction to quickly improve? I suspect the answer is a combination of all three. But it's still disappointing to see Bard come up short like this.

If this is the technology Google is using for Google Docs and Gmail, I am going to have to rely on GPT-4 plug-ins instead. The gap is that wide right now, and users will notice.

Bard's shortcomings really matter because Google's users may end up not trusting Google's AI to do things like summarize emails and write first drafts in Google Docs. Those features were the core of the announcement Google made last week that generative AI would be integrated into Google Workspace. Users may instead turn to Microsoft's AI, which will soon be in Word, Excel and PowerPoint and is powered by GPT-4.

Bard's lack of sophistication pours cold water on last week's Google AI announcements. Google still has some of the most advanced AI on the planet (DeepMind, anyone?), but when it comes to public-facing large language models, it almost feels antiquated.

I am still rooting for Google though, because I really want to have generative AI integrated into Gmail, Docs, and Slides. It would take my productivity to the next level. But the large language model they are using now needs to improve, and quickly.

This is what Midjourney gave when I asked it to generate a sad bard.

Before Closing Out: Some Other Upcoming Travel

In addition to being in SF next week, I'll be hopping around the world in the next few months.

  1. I will be in Las Vegas March 26th to March 28th for Shoptalk, the biggest ecommerce and retail conference of the year. I have made a private WhatsApp group for friends attending Shoptalk, so let me know if you want me to add you to the group.

  2. I will be in Japan starting April 11th, so if you are based in Japan or have someone you think I should meet in Japan, let me know. I will also be visiting Thailand and Singapore while my fiancé (still not used to saying that!) and I are in Asia. Let me know if you live in Japan, Thailand or Singapore!

  3. I will also be in Austin, Miami, NYC, and possibly Europe in the next 2-3 months. Same as above: let me know if you're in one of these cities!

Look out for another newsletter soon. The next one will be about what VCs really think of generative AI and generative AI companies.

Cheers,

~ Ben