Let me start this post by noting that I will not attempt to test Godwin’s Law, which states that:
As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1.
In this post, I’ll only try to find out how many Reddit comments mention Nazis or Hitler, ignoring the context in which those mentions are made. The data source for this analysis is the Reddit dataset, which is publicly available on Google BigQuery. The following graph is based on 4.6 million comments and shows the share of comments mentioning Nazis or Hitler by subreddit.
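To make the counting step concrete, here is a minimal Python sketch of how a share per subreddit could be computed. It is not the query used for the graph: the regular expression, the function names `mentions` and `share_by_subreddit`, and the tiny `sample` list are all assumptions made for illustration, and the real analysis ran against the BigQuery dataset rather than an in-memory list.

```python
import re
from collections import defaultdict

# Assumed matching rule: case-insensitive whole-word match on
# "nazi", "nazis" or "hitler". The post does not state the exact rule.
PATTERN = re.compile(r"\b(nazis?|hitler)\b", re.IGNORECASE)


def mentions(text):
    """Return True if the comment text contains one of the keywords."""
    return PATTERN.search(text) is not None


def share_by_subreddit(comments):
    """Compute the share of matching comments per subreddit.

    `comments` is an iterable of (subreddit, body) pairs.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for subreddit, body in comments:
        totals[subreddit] += 1
        if mentions(body):
            hits[subreddit] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Made-up sample data, only to show the shape of the input.
sample = [
    ("history", "Hitler came to power in 1933."),
    ("history", "The Weimar Republic collapsed."),
    ("funny", "That is a great cat picture."),
    ("funny", "Grammar nazis everywhere."),
    ("funny", "lol"),
]
shares = share_by_subreddit(sample)
```

Note that a whole-word rule like this one counts "Nazis" but not "Nazism", so the choice of pattern directly affects the resulting shares.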
Then I excluded history subreddits and looked at the probability that a Reddit thread mentions Nazis or Hitler at least once. Unsurprisingly, the probability of a Nazi reference increases as threads get bigger. Nevertheless, I didn’t expect the probability to be over 70% for threads with more than 1,000 comments.
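The thread-level calculation can be sketched in the same way. The snippet below groups comments by thread, flags a thread as soon as one comment matches, and then reports the share of flagged threads per size bucket. The bucket boundaries in `BUCKETS`, the function names, and the sample data are assumptions for illustration only, and the sketch assumes history subreddits have already been filtered out of the input.

```python
import re
from collections import defaultdict

# Assumed matching rule, not necessarily the one used for the post.
PATTERN = re.compile(r"\b(nazis?|hitler)\b", re.IGNORECASE)

# Assumed thread-size buckets as (low, high); None means "no upper bound".
BUCKETS = [(1, 10), (11, 100), (101, 1000), (1001, None)]


def bucket_label(size):
    """Map a thread size (number of comments) to a bucket label."""
    for low, high in BUCKETS:
        if high is None:
            if size >= low:
                return f"{low}+"
        elif low <= size <= high:
            return f"{low}-{high}"
    raise ValueError(f"no bucket for thread size {size}")


def thread_probability(comments):
    """Share of threads per size bucket with at least one matching comment.

    `comments` is an iterable of (thread_id, body) pairs.
    """
    sizes = defaultdict(int)
    flagged = set()
    for thread_id, body in comments:
        sizes[thread_id] += 1
        if PATTERN.search(body):
            flagged.add(thread_id)

    threads = defaultdict(int)
    hits = defaultdict(int)
    for thread_id, size in sizes.items():
        label = bucket_label(size)
        threads[label] += 1
        if thread_id in flagged:
            hits[label] += 1
    return {label: hits[label] / threads[label] for label in threads}


# Made-up sample: two small threads and one 12-comment thread.
sample = (
    [("a", "Nice post"), ("a", "Typical Hitler comparison")]
    + [("b", "first"), ("b", "second"), ("b", "third")]
    + [("c", "filler")] * 11
    + [("c", "Nazi")]
)
probabilities = thread_probability(sample)
```

Because a single match is enough to flag a thread, longer threads have more chances to be flagged, which is consistent with the upward trend described above.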
The next step would be to use more sophisticated text mining techniques to identify comments that use Nazi analogies in the way Godwin described. Unfortunately, due to time constraints and the complexity of the problem, I was not able to attempt this for this blog post.