Can Article Forge 2.5 content pass the Turing Test?
This test was last run on November 12th, 2020 with Article Forge 2.5.
This case study was designed to determine if readers can differentiate Article Forge 2.5 content from content written by humans.
This is a key benchmark for Article Forge because producing content that is indistinguishable from human-written content is a strong indicator of quality.
Testing artificial intelligence to determine if it can think (and in this case write) like a human being was popularized by Alan Turing and his Turing Test. We are running our own Turing Test to see if Article Forge can write articles in a convincingly human way.
To determine if Article Forge’s content is distinguishable from human-written content, we paid 600 people via Mechanical Turk to read and evaluate Article Forge and human-written articles. Specifically, the readers were asked if the articles were: “written by a human”, “written by a machine”, or “do not know”.
The article topics were randomly selected from the over 2 million topics that users have entered into our system in the past. These topics range from “bitcoin” to “beer brewing” to “what to do in Cambodia” and cover both short tail (broad) and long tail (specific) keywords and topics.
Establishing a Baseline
The human-written articles were included as a control group to assess the validity of the test itself. They were what we consider to be 1-star content, which we determined to be the baseline for acceptable human-written content.
This control group was tested to account for any bias in our readers. More specifically, there was a chance that readers might be more inclined to select the “written by a human” option by default or on the basis of being charitable.
Passing Conditions
For Article Forge content to pass the test and be considered indistinguishable from human content, two conditions must be met:
- A majority of readers must think the Article Forge content was written by humans.
- Readers must think the Article Forge content was written by humans at a rate at least as high as for the human-written content.
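The two passing conditions can be expressed as a simple check. The following is only a minimal sketch: the function name `passes_turing_test` and its parameters are hypothetical, and it assumes each rate is the share of readers who chose "written by a human" for that group of articles.

```python
def passes_turing_test(af_human_rate: float, control_human_rate: float) -> bool:
    """Return True when both passing conditions hold.

    af_human_rate: share of readers (0.0 to 1.0) who labeled the
        Article Forge articles "written by a human" (assumed input).
    control_human_rate: the same share for the human-written
        control articles (assumed input).
    """
    # Condition 1: a majority of readers think the content is human-written.
    majority = af_human_rate > 0.5
    # Condition 2: the rate is at least as high as for the control group.
    at_least_control = af_human_rate >= control_human_rate
    return majority and at_least_control
```

Comparing against the control rate, rather than only against 50%, is what lets the check account for readers who lean towards one answer by default.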
Readers were 69.16% more likely to think that Article Forge 2.5 articles were written by a human than by a machine, which is statistically significant with a p-value of 0.0000000002187.
Readers were 29.69% more likely to think that the human-written articles were written by a human than by a machine, which is also statistically significant with a p-value of 0.0009785.
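For readers who want to see how figures of this kind can be computed, here is a rough sketch. It rests on assumptions that this case study does not state: that "more likely" means the ratio of "human" votes to "machine" votes minus one, that "do not know" answers are set aside, and that significance is checked with an exact two-sided binomial test against a 50/50 split. The function names are hypothetical, and the test actually used for the results above may differ.

```python
from math import comb


def relative_preference(human_votes: int, machine_votes: int) -> float:
    """How much more likely a 'human' vote was than a 'machine' vote.

    A return value of 0.5 means 'human' was chosen 50% more often.
    """
    return human_votes / machine_votes - 1.0


def two_sided_binomial_p(human_votes: int, machine_votes: int) -> float:
    """Exact two-sided binomial p-value under the null of a 50/50 split.

    Assumes 'do not know' answers were excluded before counting.
    """
    n = human_votes + machine_votes
    k = max(human_votes, machine_votes)
    # Upper-tail probability P(X >= k) for X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # The null distribution is symmetric, so doubling gives the two-sided value.
    return min(1.0, 2 * tail)
```

Called with the real vote counts for each group of articles, `relative_preference` would give the "more likely" percentage and `two_sided_binomial_p` an accompanying p-value under these assumptions.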
1. A statistically significant number of readers thought the Article Forge content was written by humans, which meets the first passing condition.
2. More readers thought Article Forge articles were written by humans than in the control group of actual human-written articles. This meets the second passing condition and indicates that the results are valid and that, if anything, readers were biased towards thinking any content, including content actually written by a human, was written by a machine.
It is important to note that the readers were specifically asked to consider if the content was written by a machine. In day-to-day life, readers will likely not even consider whether or not Article Forge content was written by a machine because it reads as naturally as human content.
Therefore, you can conclude that people think that Article Forge 2.5 content was written by humans and that Article Forge passes a basic Turing Test.
What does this mean?
In the past, machine-written content has earned a negative connotation. This was largely due to the stilted grammar and syntax, abrupt and unnatural transitions, and the overall lack of readability and quality of the content being produced.
The above tests show that Article Forge content is largely indistinguishable from human content. This means that you can comfortably use Article Forge's machine-generated content knowing that it is syntactically on par with human-written content.
As we continue to improve Article Forge, we will continue to run this case study and update it with the most recent results and version information.