
AI vs Human Written Metadata: Which Leads to Higher CTRs?

How will this post help me?

If you're considering using AI like ChatGPT for writing metadata, then this post is for you.

We'll show you a detailed test we ran on the Seer Interactive site comparing ChatGPT-4, manual writing, and Google-generated metadata for SEO purposes.

By the end, you'll understand the efficiency and performance trade-offs of each method and whether AI is a suitable choice for your metadata optimization strategy.


Table of Contents

TLDR

Methodology

Results

What's Next?

Additional Data + Test Variables

Conclusions

Ever since ChatGPT was released in November 2022, digital marketers have been continually testing and pushing the boundaries of the tool.

Seer is no exception - from keyword categorization to internal linking, we’ve been on a mission to figure out how ChatGPT can drive efficiency to benefit our team and clients.

Our culture of testing was also highlighted in Anthony’s post from May 2023 about using ChatGPT to write metadata.

Spoiler alert: Anthony’s test uncovered that using ChatGPT led to decreased performance.

However, one of his noted variables was the small sample size of 4 pages in the test.

So, we decided to test how the results may vary with a larger sample size.

Test Results TLDR

When it comes to writing metadata with ChatGPT vs having our team manually write them, this is what we found:

+176% lift in click-through rate for manually written metadata vs ChatGPT-4

21.5% decline in CTR for pages with ChatGPT-4-written metadata

>6x faster to write metadata with ChatGPT vs manually

While there are benefits to using ChatGPT, the resulting performance is not strong enough to justify relying on this tool exclusively for metadata optimization.

There are multiple variables at play, which call for a third test. Read on for a more detailed breakdown of this test.

Hypotheses

  1. If ChatGPT-4 can provide metadata with comparable performance, then we should utilize the tool more often than manually providing metadata optimizations.
  2. If Google overwrites meta descriptions and titles, then we should limit the time we spend writing them from scratch.

Methodology

Test groups

We categorized a list of 237 pages from the Seer blog into three groups:

| Method | Number of Pages in Group | Author | Process |
| --- | --- | --- | --- |
| Manual Test Group | 57 | Meghan Evans | Our current process of keyword research and bespoke titles and descriptions. |
| GPT Test Group | 90 | ChatGPT-4 + LinkReader plugin | We used the LinkReader plugin so the tool could crawl the content of each page and provide relevant metadata. |
| Google Test Group | 90 | N/A | We created this group intending to leave all title tags and meta descriptions blank for Google to fill in automatically. |
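The post doesn’t specify how pages were assigned to groups, so purely as an illustration, here is one way a 237-page list could be randomized into groups of 57, 90, and 90 (function name and seed are ours):

```python
# Illustrative only: randomly split 237 blog URLs into the three test groups.
# The post does not state how assignment was actually done.
import random

def split_into_groups(urls: list[str], seed: int = 42) -> dict[str, list[str]]:
    assert len(urls) == 237, "this test used 237 pages"
    shuffled = urls[:]
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    return {
        "manual": shuffled[:57],      # 57 pages
        "gpt": shuffled[57:147],      # 90 pages
        "google": shuffled[147:],     # remaining 90 pages
    }
```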

Generating the Metadata

ChatGPT-4 Metadata

We “trained” ChatGPT-4 by asking the tool to explain what an SEO-optimized meta description and title tag would look like and to “act as an SEO practitioner,” to ensure it was as prepared for the task as possible.
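We did this in the ChatGPT interface with the LinkReader plugin, but the same priming approach can be approximated programmatically. Below is a minimal sketch, assuming the OpenAI Python client; the model name, prompt wording, and character limits are illustrative, not our exact setup:

```python
# Minimal sketch of priming a GPT model to write SEO metadata.
# Assumptions: OpenAI Python client (>=1.0); prompt wording is illustrative,
# not the exact prompt used in this test.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def write_metadata(page_text: str) -> str:
    """Ask the model for a title tag and meta description for one page."""
    response = client.chat.completions.create(
        model="gpt-4",  # we used ChatGPT-4; the API model name here is illustrative
        messages=[
            # "Train" the model by setting the role and the definition of
            # SEO-optimized metadata, as described above.
            {"role": "system", "content": (
                "Act as an SEO practitioner. Explain what an SEO-optimized "
                "title tag (under ~60 characters) and meta description "
                "(under ~155 characters) look like, then follow those rules."
            )},
            # In the test, the LinkReader plugin crawled each page; here we
            # pass the page content in directly instead.
            {"role": "user", "content": (
                "Write a title tag and meta description for this page:\n\n"
                + page_text
            )},
        ],
    )
    return response.choices[0].message.content
```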

Google Metadata

As mentioned, the Google test group’s metadata was initially intended to be left blank. However, we discovered that HubSpot won’t allow a page to be published without these elements.

Due to this, we used each article's name as the title tag and followed one of two options for the meta description:

  • Left as a period. (Example)
  • Or input “We are not including an optimized meta description here to test how often Google rewrites metadata.” (Example)

We did this so the meta descriptions wouldn’t contain any information relevant to the page and give Google full rein to update as it pleased.

Measurement

Our measurement plan included standard SEO metrics:

  • URL click-through rate (CTR)
  • URL rankings
  • # of rewritten titles and descriptions

We also assessed efficiency (time spent) using ChatGPT-4 for metadata vs. the typical manual process.
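For reference, CTR and its period-over-period change are computed directly from clicks and impressions (e.g., from a Google Search Console export). A quick sketch; function names are ours:

```python
# How the two core metrics are computed from clicks and impressions.
# Function names are ours, not from the test.

def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate as a percentage."""
    return 100.0 * clicks / impressions if impressions else 0.0

def pop_change(ctr_pre: float, ctr_post: float) -> float:
    """Period-over-period % change in CTR."""
    return 100.0 * (ctr_post - ctr_pre) / ctr_pre

# Example with the Manual group's figures from the Results table below:
# pop_change(0.96, 1.12) ≈ +16.7%, in line with the reported +16.26%
# (the small gap comes from rounding the published CTRs).
```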

Results

After a month of testing our meta description optimizations, here’s what we saw for each group:

| Method | CTR (Pre) | CTR (Post) | CTR PoP % Change |
| --- | --- | --- | --- |
| Manual | 0.96% | 1.12% | +16.26% |
| Google | 0.25% | 0.27% | +10.55% |
| ChatGPT-4 | 0.17% | 0.13% | -21.48% |

SEO Metrics

These metrics help us determine what actually performs best in terms of driving traffic to the website.

Winner: Manually Written Metadata

After testing these groups over the course of a month, the Manual group outperformed both ChatGPT-4 and Google methods.

ChatGPT-4 was the only group that experienced a decrease in CTR, at -21.5%.

Manual Method CTR Data:

  • +176% lift vs ChatGPT-4
  • +54.1% lift vs Google rewrite
  • +16.3% lift vs the previous period’s Manual CTR
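For transparency on how these lift figures line up: they are consistent with comparing the groups’ period-over-period CTR changes rather than their raw CTRs. This is our reading of the numbers in the Results table, not a formula stated in the test:

```python
# Reproducing the lift figures from the period-over-period CTR changes
# in the Results table (our interpretation of how they were derived).

def lift(change_a: float, change_b: float) -> float:
    """Relative lift of change_a over change_b, in percent."""
    return 100.0 * (change_a - change_b) / abs(change_b)

manual, google, gpt4 = 16.26, 10.55, -21.48  # PoP % changes from the table

print(lift(manual, gpt4))    # ≈ +175.7%, reported as the +176% lift vs ChatGPT-4
print(lift(manual, google))  # ≈ +54.1%, matching the lift vs Google rewrite
```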

Efficiency Metrics

These metrics help us understand what lowers the level of effort for the practitioners creating the metadata.

Winner: ChatGPT-4 Written Metadata

ChatGPT-4 enabled us to complete metadata >6x faster than the manual process.

ChatGPT-4 Method Efficiency Data:

  • 2.5 minutes per page vs 19.1 minutes per page writing metadata manually
  • 3.75 hours for 90 pages vs 18.12 hours manually writing metadata for 57 pages

What’s Next?

So I shouldn’t use ChatGPT for Metadata?

We don’t believe that it’s time to give up manually writing metadata yet, but "don't use ChatGPT for metadata" is not the conclusion we’ve drawn here.

This technology is evolving at a quicker clip than we’ve ever seen before, so we will continue to test and learn in this space.

The main benefit of using ChatGPT-4 is efficiency; however, at this time the performance doesn’t seem comparable enough to justify generating metadata exclusively with this tool.

Next test

Due to the multiple variables in this test, our next test is going to be swapping half of the Manual group with metadata written by ChatGPT-4, following a similar process as before.

Will this change result in a significant shift in CTR once variables are as consistent as possible? Stay tuned.

In our next test, Supernova™ will be utilized to a greater extent.

We’ll integrate both paid data (such as converting keywords) and organic features (such as People Also Ask questions) in order to provide ChatGPT additional context for metadata optimizations.

Additional Data + Test Variables

Organic rankings

Did it impact the test? Yes.

The average organic ranking across the test periods experienced minimal shifts, so we can’t determine a correlation between organic rankings and metadata methods.

While only an incremental improvement, the ChatGPT-4 group experienced the largest gain in average ranking across test periods. This ranking shift did not drive an increase in CTR, possibly further highlighting the decreased performance of utilizing this tool for metadata.

| Method | Beginning Avg. Rank | End Avg. Rank | Difference (negative = improved rank) |
| --- | --- | --- | --- |
| Manual | 18.0 | 17.6 | -2.2% |
| ChatGPT-4 | 22.3 | 21.4 | -4.0% |
| Google | 23.3 | 24.3 | +4.3% |

Rewritten metadata

Did it impact the test? Yes.

We used Supernova, our cloud-based data platform, to analyze how often our implemented metadata was rewritten by Google in the search results. Here’s an example of how that report is viewed within the product’s dashboard:

[Screenshot: Supernova rewritten-metadata report dashboard]

Manually written meta descriptions were rewritten 130% more often than URLs in the Google group. This includes the Screaming Frog Guide page’s meta description, which was rewritten 17 times over the course of 30 days.

| Method | Avg. Title Rewrites | Avg. Description Rewrites |
| --- | --- | --- |
| Manual | 1.30 | 3.77 |
| ChatGPT-4 | 0.74 | 1.72 |
| Google | 0.85 | 1.64 |

SERP Features

Did it impact the test? Yes.

The Manual group was shown alongside 8,001 different SERP Features, compared to 2,272 and 1,534 for the Google and ChatGPT-4 groups, respectively. This gap can be attributed to the better average ranking of this group compared to others as shown in the Organic Rankings table above.


People Also Ask results were, by far, the most visible SERP Feature; however, Image results drove the highest CTR when visible alongside our testing pages.

| SERP Feature | Count | CTR |
| --- | --- | --- |
| PAA | 7,968 | 0.99% |
| Image | 2,378 | 1.42% |
| Video | 924 | 1.01% |
| Answer Box - Paragraph | 318 | 1.28% |
| Answer Box - List | 183 | 1.34% |
| Answer Box - Other | 34 | 0.48% |
| Answer Box - Table | 2 | 0.00% |

Impressions

Did it impact the test? Not as much as expected.

One group having an outlier number of impressions could increase the margin of error for our test, so measuring the impressions of each group is important to understand whether each method had a comparable sample size of CTR data.

As expected, the Manual group had the largest number of impressions. However, there was just an 11.4% (21,177 impressions) difference between the highest- and lowest-impression groups.

| Method | Impressions | CTR |
| --- | --- | --- |
| Manual | 206,364 | 1.12% |
| Google | 196,151 | 0.27% |
| ChatGPT-4 | 185,187 | 0.13% |

While the pages within the Manual grouping were viewed the most, there was only a 5.2% difference in impressions compared to the Google group, despite a 38% average ranking difference.

Conclusions

We know that we’re still in the early stages of what will be possible with AI for SEO.

Sign up for the Seer newsletter or get in touch with us to hear more about our tests and how we’re using AI to impact our team and clients.


Nick Haigler
Manager, SEO