Comparing ChatGPT, Gemini, Mistral & Anthropic

Comparing the large language models from OpenAI, Anthropic, Mistral and Google on technical subjects (not trying to trip them up with wordplay)

Updated April, 2024

Introduction

As of March 2024, I've been subscribed to ChatGPT for a year. I'm in the UK, which I'm assuming is the reason I never get the latest developments as quickly as I'd like, but it is what it is. I have Custom GPTs and they're great; if you haven't unlocked the power of Custom GPTs yet, check out my piece on that here.

I'm pretty sure I get my money's worth out of ChatGPT. I just tried to count back to the beginning of the year (about 80 days ago): after loading the list of chats, I had to hit the space bar 8 times before I got to the 2023 section. At about 35 chats per scroll, plus the last half a page, I estimate that's about 300 different threads, and some of those chats go on for pages.

The Models

Right now I'm paying for Gemini Advanced & Claude Opus, but I won't be keeping both. I've seen a bunch of videos and blog posts comparing the various models, but they often include a lot of things I'm not interested in: "I have a box with a hole and a ball smaller than the box", or "I own 3 cars, last year I sold 2 cars", etc. These are mildly interesting to me in that they show the vulnerabilities of AI right now, but I'm not going to waste my time on that. I'm going to compare the models on the things I care about, which are:

The Tasks

Terraform: I use Terraform a lot. I've put a lot of problems to the various models that have been available over time, and I think I've built up a pretty good idea of what they get wrong.
Python: I use Python a bit less than Terraform, but still fairly frequently. I'm going to try and come up with a task that they have a chance of completing in one go, and if not, I'll see if I can feed the error back to them to fix it.
Web Development: I'm not a web developer, and CSS seems to be a particular weakness of mine. I mean, when I read the stuff it's logical, but I struggle with its lack of structure, or am I just doing it wrong? Again, I'm going to try and give the models something they have a chance of completing in one go.
Website Planning: Structure, what should go where, what I need to think about, etc. The stuff that I wish I'd known before trying to figure out how to build a website from the ground up.

The Prompts & Results

Terraform

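What I'm asking here is not possible, but it fools some models: dynamic blocks can only generate nested blocks, and aws_security_group_rule takes cidr_blocks and source_security_group_id as plain arguments. I built a Custom GPT that gets this right; the full details of that are here.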

At the moment I've got these security group rule blocks:

resource "aws_security_group_rule" "ingress_rules_cidr" {
  for_each = {
    for k, v in local.sg_rules : k => v
    if can(cidrsubnet(v.4, 0, 0)) == true
  }
  type              = "ingress"
  from_port         = each.value.1
  to_port           = each.value.2
  protocol          = each.value.3
  cidr_blocks       = [each.value.4]
  description       = each.key
  security_group_id = aws_security_group.sg[each.value.0].id
}

resource "aws_security_group_rule" "ingress_rules_sg" {
  for_each = {
    for k, v in local.sg_rules : k => v
    if can(cidrsubnet(v.4, 0, 0)) == false
  }
  type                     = "ingress"
  from_port                = each.value.1
  to_port                  = each.value.2
  protocol                 = each.value.3
  source_security_group_id = module.lb-sg[each.value.4].security_group_id
  description              = each.key
  security_group_id        = aws_security_group.sg[each.value.0].id
}

Is it possible to combine the methods, cidr and security group, into one block by using dynamic blocks? Any idea if dynamic blocks are allowed in sg rule blocks?

ChatGPT GPT-4, Claude Opus & Gemini Advanced: All failed, and all offered exactly the same code block back in response. I'm not going to post that here, as it could end up in some sort of infinite loop of AI robots scraping erroneous code to reinforce their own errors. Suffice to say it had 2 dynamic blocks in it.
Mistral Large: A Pass, with a side of Fail. It said "Yes, you can simplify your code by using dynamic blocks, but dynamic blocks are not directly supported in the aws_security_group_rule resource", which is 100% correct, but then it went on to offer another solution that was incorrect.
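For anyone less used to the can(cidrsubnet(...)) trick in the two blocks above, here is a rough Python sketch of the same split: rules whose source parses as a CIDR go one way, everything else is treated as a security group key. The sample rules and the names is_cidr and split_rules are made up for illustration, and Python's ipaddress module is only an approximation of Terraform's cidrsubnet check.

```python
import ipaddress

# Hypothetical sample data shaped like the entries in local.sg_rules:
# (security group key, from_port, to_port, protocol, source)
SG_RULES = {
    "https from office": ("web", 443, 443, "tcp", "203.0.113.0/24"),
    "http from lb": ("web", 80, 80, "tcp", "lb-public"),
}


def is_cidr(source: str) -> bool:
    """Approximate can(cidrsubnet(source, 0, 0)): True for CIDR notation only."""
    if "/" not in source:
        # A bare IP address is not CIDR notation, so treat it as a non-match.
        return False
    try:
        ipaddress.ip_network(source, strict=False)
        return True
    except ValueError:
        return False


def split_rules(rules):
    """Partition rules into (CIDR-sourced, security-group-sourced) dicts."""
    cidr_rules = {k: v for k, v in rules.items() if is_cidr(v[4])}
    sg_rules = {k: v for k, v in rules.items() if not is_cidr(v[4])}
    return cidr_rules, sg_rules


if __name__ == "__main__":
    cidr_rules, sg_rules = split_rules(SG_RULES)
    print("CIDR rules:", list(cidr_rules))
    print("Security group rules:", list(sg_rules))
```

Each half then feeds its own resource block, which is why the original configuration ends up with two for_each filters rather than one combined block.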

Terraform Winner: Mistral Large

Python

As someone from England, it pleases me that I get to talk about the weather. Let's get into it.

I want a python script that can fetch the weather forecast for a given location using the OpenWeatherMap API. The script should return the weather forecast for the next 5 days.

I then want the data visualised in a google colab notebook, I will let you decide how to visualise the data but it should be clear and easy to understand and include all the main metrics.

The user should be prompted for an API key, City, Country code and the Units they want the data in (metric, imperial, standard). All of these details should be stored in a config file, and the next time the script is run, the user should be prompted to use the same details or enter new ones.

You will need to have the script use a public API to look up the latitude and longitude of the location based on the city and country code entered by the user.

This is the call you will need to make:
api.openweathermap.org/data/2.5/forecast?lat={lat}&lon={lon}&appid={API key}

Gemini, Mistral & Claude all tried to import Nominatim to get the location. I'm assuming you either need an API key to do that or it's been banned from being used from Google Colab, because none of those scripts worked. When each one was told it could have used the OpenWeatherMap API to get the location, they all presented a working script.
Gemini and Mistral both picked one temperature reading per day for their visualisations, despite data being available at 3-hour intervals. They also both had x-axis labels of Day 1, Day 2, etc., which I thought was poor.
ChatGPT worked first time. It plotted a small graph at 3-hour intervals for the next 5 days, and it gave the x-axis labels in Month-DD format, which I thought was a nice touch. This script seemed to be much more efficient than the others, just in terms of the amount of code it took to get the job done.
Claude pumped out a behemoth of a script and, despite needing to be told to use the OpenWeatherMap API to get the location, once it worked it gave wider graphs more befitting of this range of data. It also presented Temperature, Humidity and Wind Speed graphs, all plotted at 3-hour intervals.
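To make the moving parts concrete, here is a minimal sketch of the two requests and of pulling the 3-hourly series out of the response. It is not any of the models' actual output. It assumes OpenWeatherMap's geocoding endpoint (geo/1.0/direct) for the location lookup and the response fields dt_txt, main.temp, main.humidity and wind.speed as I understand them from the API docs. The function names are made up, and the fetching and plotting are left out.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoints; the forecast path is the one given in the prompt.
GEO_URL = "https://api.openweathermap.org/geo/1.0/direct"
FORECAST_URL = "https://api.openweathermap.org/data/2.5/forecast"


def geocoding_url(city, country_code, api_key):
    """Build the URL that looks up latitude/longitude for a city."""
    query = {"q": f"{city},{country_code}", "limit": 1, "appid": api_key}
    return f"{GEO_URL}?{urlencode(query)}"


def forecast_url(lat, lon, api_key, units="metric"):
    """Build the 5-day / 3-hour forecast URL for a set of coordinates."""
    query = {"lat": lat, "lon": lon, "units": units, "appid": api_key}
    return f"{FORECAST_URL}?{urlencode(query)}"


def extract_series(forecast):
    """Turn a decoded forecast response into plot-ready lists.

    Keeps every 3-hour entry and labels it Month-DD HH:MM rather than
    collapsing to one reading per day.
    """
    labels, temps, humidity, wind = [], [], [], []
    for entry in forecast["list"]:
        when = datetime.strptime(entry["dt_txt"], "%Y-%m-%d %H:%M:%S")
        labels.append(when.strftime("%b-%d %H:%M"))
        temps.append(entry["main"]["temp"])
        humidity.append(entry["main"]["humidity"])
        wind.append(entry["wind"]["speed"])
    return labels, temps, humidity, wind


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(geocoding_url("London", "GB", "YOUR_API_KEY"))
    print(forecast_url(51.5, -0.12, "YOUR_API_KEY"))
```

The three value lists line up with the Temperature, Humidity and Wind Speed graphs mentioned above, so each can be plotted against the same labels.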

Python Winner: Claude Opus, with ChatGPT a clear 2nd

Web Development

Ok, this was a big one I think, probably too much to ask for in one go, but there's no harm in trying.

Create a single HTML file that represents a mockup of a modern, minimal, and stylish food blog homepage. The mockup should be fully self-contained, including all necessary HTML, CSS, and JavaScript code within the same file.

The food blog homepage should include the following elements:

A responsive header with a navigation menu containing links to "Home," "Recipes," "About," and "Contact" sections.
A hero section showcasing a featured recipe with an enticing image and a brief description.
A diverse grid or list of 4 dummy articles, each displaying a food image, title, and a short excerpt.
A sidebar with a search bar, categories, and a list of recent posts.
A footer with social media icons, a newsletter subscription form, and a copyright notice.
Please use a placeholder for the logo image for the header, even if it is too large for the designated space. Resize and position the logo appropriately using CSS to ensure it fits well within the design.

Incorporate the supplied food images for the featured recipe and dummy articles. Resize and optimize the images for web usage and ensure they are visually appealing within the layout.

Use modern CSS techniques such as flexbox or CSS grid to create a responsive and visually pleasing layout. Implement hover effects, transitions, or animations to enhance the user experience.

Include basic JavaScript functionality, such as a responsive mobile menu toggle and a smooth scrolling effect when clicking on navigation menu links.

Ensure that the code is well-structured, properly indented, and follows best practices for HTML, CSS, and JavaScript. Use appropriate semantic HTML tags and include comments where necessary to improve code readability.

Please provide the complete HTML file with all the necessary code (HTML, CSS, and JavaScript) included within the same file. The mockup should be fully functional and can be viewed in a web browser without the need for any external files or dependencies.

You can assume that the logo and food images will be added, so please include placeholders for those images in the code, using appropriate alt tags and dimensions.

I appreciate your effort in creating a visually appealing and functional mockup of a food blog homepage. Thank you!

Mistral's page looked old and used the full width. There were placeholders and an attempt at a sidebar in the code, but none of it rendered. The worst of the 4.
Gemini's page was really basic: there was nothing in a small footer and no placeholder text. The images were nicely sized, with one big featured image and 3 smaller images. Nothing rearranged, but things did resize as the browser window was made thinner, and I'd assume that any text would fall into place. A fairly basic effort.
ChatGPT just cut off towards the end of the body. It tried to squeeze all the CSS into a big mass of instructions, bless it, but it couldn't make it. I just responded with "please continue" and it made a decent job of it. It used a fixed-width, centred layout at full browser width, included an email signup box and social media links in the footer, and re-ordered the images and articles really well when I made the browser thin, going from 3 columns to 1. Pretty decent.
Claude was impressive. The layout was tidy and well executed, everything fit where it was supposed to fit, and the columns rearranged nicely. It added a full-width hero recipe and then 4 more horizontally underneath, just to show off I think. Below that it added a search bar, then a list of categories and recent posts. The footer had a nice font, was laid out well and included social media links and a newsletter signup box. Very impressive.

Web Development Winner: Claude Opus

Website Planning

When I was building this site, I kept putting off the cookie consent coding, and I realised quite late that I needed a sitemap, which meant I had to rework some things to make sure the timestamp was correct on the various pages. So these elements are of particular interest to me.

I am planning to create a blog called "arrywalker.com" that focuses on codified automation topics. As someone who is just starting out with this project, I want to ensure that I consider all the essential elements and best practices for building a successful blog website. Can you provide me with a comprehensive list of things I should think about and include in my blog?

Please cover aspects such as:

The essential pages and sections I should include on my website
How to organize and structure my content effectively
Technical considerations for optimal website performance and security
SEO strategies to improve my blog's visibility on search engines
Legal and privacy requirements I should be aware of
Any other tips or advice you think would be helpful for a beginner setting up a blog on codified automation
Please provide a detailed and well-structured response that covers each of these points, as well as any additional insights you think would be valuable for me to consider as I embark on this project.

I started making a lot of notes on this one but in the end it became clear where things were missing.

Gemini was light on details everywhere. Strangely, it was quite noticeable in the SEO strategies section, but it made up a couple of points in the additional tips section.
ChatGPT was marginally better than Gemini but still kinda felt like it was responding with the bare minimum to please. Bonus point for mentioning analytics.
Mistral was just slightly better everywhere than the first 2. It didn't mention analytics, but it's the first one to mention cookies. Generally a good performance.
Claude felt classy on this one: it matched or excelled in every section, mentioned analytics and cookies, and was the only one to mention a sitemap.
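Since the sitemap and its timestamps were what caught me out, here is a minimal sketch of what generating one can look like, assuming you already know each page's URL and last-modified date. The build_sitemap name and the example.com URLs are placeholders for illustration, not how this site actually does it.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod is the per-page timestamp, written as YYYY-MM-DD.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages and dates.
    pages = [
        ("https://example.com/", date(2024, 4, 1)),
        ("https://example.com/posts/comparing-models", date(2024, 3, 20)),
    ]
    print(build_sitemap(pages))
```

The key design point is that the last-modified date has to come from somewhere reliable for every page, which is the bit worth planning for up front.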

Website Planning Winner: Claude Opus

Conclusion

Let me say, this has been really eye-opening for me. As I set out to do this experiment, I didn't really know whether it would produce interesting results or how much I'd learn from it, but the results, the differences between the models and the way they handle the tasks have been quite a revelation.

Claude Opus: If you need a longer output, I haven't seen anything to beat this yet, and it seems to know it, taking its time and laying things out nicely. I'd seen people on X (Twitter) saying good things about Claude but I hadn't really had a chance to put it through its paces. Claude right now is the purring Rolls-Royce of the AI world. Quite impressive.
ChatGPT 4 is a clear 2nd despite its shortcomings in the more "fluffy" Website Planning category. I won't be getting rid of ChatGPT any time soon; as I've said, the Custom GPTs make the difference for me, converting it from failing at Terraform to being the best at it. It seems to try and squeeze what it needs into a shortened output, which is quite clever, and if you ask it to continue it still does a very good job.
Mistral Large: I was rooting for Mistral a bit after having experienced good results solving Terraform problems, but its basic outputs in the Python and Web Development categories showed that it's not up to GPT-4 standards yet. It is free right now though, so it's got that going for it.
Gemini Advanced needs some serious improvement: bottom or equal bottom in everything but Web Development, and I think if Mistral had managed to actually render the features it added, Gemini might have been bottom there too. Surprisingly poor performance.