What meaning is there to be found in online discussion?
Users of the internet are exposed to the opinions of thousands of individuals, often without the slightest introduction or any other evidence of their existence. Most of us will happily accept exchanges that take place on the internet as though we witnessed the interaction taking place in front of us. Increasingly, I find myself forming an impression of what other people think of key issues that is based on “interactions” that I see taking place online, and not in the physical world.
Understanding how social media and online discussion impact political and social discourse is naturally of interest to researchers, and there is a wealth of data available to be analysed. Unfortunately, there are some serious methodological issues impeding its use.
Let’s say you decide to collect a sample of data from the popular content-aggregation site Reddit in order to research online deliberative democracy. You find a large political thread with 4,000 comments made by a total of 1,000 unique user accounts.
Problem 1: identifying genuine interactions
Reddit, like many similar sites, does not require users to create any kind of link between the person behind the keyboard and their online persona. It also doesn’t limit the number of accounts an individual can register. This means that your 1,000 users may contain not only genuine users with real comments about the topic, but also an unknown number of sockpuppet accounts created to artificially support arguments or to imitate and misrepresent those with opposing viewpoints. There may also be systematic use of such false accounts by governmental or corporate entities in order to conduct astroturfing. Commenters may further include a number of automated bots, some serving a useful purpose and others simply posting copied comments in order to create a believable user account that can later be sold as an astroturfing resource. There may also be missing data where users and comments have been removed, either by the user themselves or by a moderator.
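As a rough illustration of how a researcher might begin to size up two of these issues (missing data and copied comments), here is a minimal Python sketch. It assumes each comment has already been collected as a dictionary with hypothetical "author" and "body" keys, and that deleted or removed content appears as the placeholder strings "[deleted]" and "[removed]"; both are assumptions about the export format, not a description of any particular API. The sample data is invented for the example. Verbatim duplication is only a weak signal: short stock replies are also repeated by genuine users, and this check does nothing to detect sockpuppets that write original text.

```python
from collections import defaultdict

# Placeholder strings assumed to mark deleted or removed content.
MISSING_MARKERS = {"[deleted]", "[removed]"}


def audit_comments(comments):
    """Count missing comments and find texts posted by several accounts.

    Each comment is assumed to be a dict with 'author' and 'body' keys.
    """
    missing = 0
    authors_by_body = defaultdict(set)
    for comment in comments:
        body = comment["body"].strip()
        if body in MISSING_MARKERS or comment["author"] in MISSING_MARKERS:
            missing += 1
            continue
        # Normalise case and whitespace so trivially altered copies still match.
        key = " ".join(body.lower().split())
        authors_by_body[key].add(comment["author"])
    # Keep only texts posted by more than one distinct account.
    duplicated = {
        text: sorted(authors)
        for text, authors in authors_by_body.items()
        if len(authors) > 1
    }
    return {"missing": missing, "duplicated": duplicated}


# Invented sample data, for illustration only.
sample = [
    {"author": "alice", "body": "I think the policy is flawed."},
    {"author": "bob", "body": "Great point, totally agree!"},
    {"author": "carol", "body": "great point,  totally agree!"},
    {"author": "[deleted]", "body": "[deleted]"},
    {"author": "dave", "body": "[removed]"},
]

report = audit_comments(sample)
print(report["missing"], "comments missing")
for text, authors in report["duplicated"].items():
    print(repr(text), "posted by", authors)
```

A report like this cannot say which accounts are genuine; at best it gives a lower bound on how much of the thread is missing or mechanically repeated.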
Problem 2: what other forces are at play?
Reddit users can normally both upvote and downvote comments, depending on the subreddit and the user’s means of accessing the content. Typically, upvoted comments will float towards the top of the thread, and downvoted comments will be sent towards the bottom, but this will also depend on the user’s choice in how to sort comments. Many users treat upvotes and downvotes as a way of registering their side in a debate, instead of commenting, and they have a passive impact on the conversation as a result. Upvotes/downvotes may also (consciously or otherwise) be used to choose a side, forming a positive feedback loop where users’ votes can determine the popularity of content rather than just represent it. As a result, opinions can be formed based on what an individual sees as being the most popular side. Upvoted good; downvoted bad. Naturally, such voting systems are open to manipulation and abuse. A manufactured picture of popular opinion can be induced using false accounts to raise up one side of a debate and bury the other. How can you, with access to only a single number (itself subject to vote-fuzzing), hope to gain any insight into how conversations are being viewed, navigated and manipulated?
Using data like this to inform our research seems fraught with complexity, and the example complications presented above are only two of many. We will face a different set of issues if we decide to analyse data from other websites such as Facebook, Twitter or 4chan. So, can we learn anything from this kind of data? I would argue that we can, but any meaning that we can infer will be inseparably tied to the design choices made by the platform creators. As a result, the context of any meaning found within the data is bound not only to a subset of the general online population, but also to the period of time in which those particular design choices are in place.
Author: Jack Holt