What meaning is there to be found in online discussion?

Users of the internet are exposed to the opinions of thousands of individuals, often without the slightest introduction or any other evidence of their existence. Most of us will happily accept exchanges that take place on the internet as though we had witnessed the interaction taking place in front of us. Increasingly, I find myself forming impressions of what other people think about key issues based on “interactions” that I see taking place online rather than in the physical world.

Understanding how social media and online discussion affect political and social discourse is naturally of interest to researchers, and there is a wealth of data available to be analysed. Unfortunately, some serious methodological issues impede its use.

Let’s say you decide to collect a sample of data from the popular content-aggregation site Reddit in order to research online deliberative democracy. You find a large political thread with 4,000 comments made by a total of 1,000 unique user accounts.

Problem 1: identifying genuine interactions

Reddit, like many similar sites, does not require users to create any kind of link between the person behind the keyboard and their online persona. It also doesn’t limit the number of accounts an individual can register. This means that your 1,000 accounts may include not only genuine users with real comments about the topic, but also an unknown number of sockpuppet accounts created to artificially support arguments or to imitate and misrepresent those with opposing viewpoints. There may also be systematic use of such false accounts by governmental or corporate entities in order to conduct astroturfing. The commenters may further include a number of automated bots, some serving a useful purpose and others simply posting copied comments in order to build a believable user account that can later be sold as an astroturfing resource. There may also be missing data where users and comments have been removed, either by the users themselves or by a moderator.
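To make the scale of these gaps visible before any analysis begins, a first pass over the sample might look something like the sketch below. It is only a rough illustration under several assumptions: that each comment has been exported as a dictionary with `author` and `body` fields, and that deleted or removed content appears as the placeholder strings `[deleted]` or `[removed]`. The function name `audit_comments` is hypothetical. Identical text posted by several accounts is a hint worth a closer look, not proof of a bot or sockpuppet.

```python
from collections import defaultdict

# Assumed placeholder strings for content deleted by a user or removed by a moderator.
MISSING_MARKERS = {"[deleted]", "[removed]"}


def audit_comments(comments):
    """Count missing comments and find identical text posted by several accounts.

    `comments` is assumed to be a list of dicts with 'author' and 'body' keys.
    """
    missing = 0
    authors_by_text = defaultdict(set)
    for comment in comments:
        author, body = comment["author"], comment["body"]
        if author in MISSING_MARKERS or body in MISSING_MARKERS:
            missing += 1
            continue
        # Normalise case and whitespace so trivial changes do not hide a copy.
        key = " ".join(body.lower().split())
        authors_by_text[key].add(author)
    # Keep only texts that more than one distinct account has posted.
    duplicated = {
        text: sorted(authors)
        for text, authors in authors_by_text.items()
        if len(authors) > 1
    }
    return {"total": len(comments), "missing": missing, "duplicated": duplicated}


# A tiny made-up sample, purely for illustration.
sample = [
    {"author": "alice", "body": "I think the policy is flawed."},
    {"author": "bob", "body": "Great point,   totally agree!"},
    {"author": "carol", "body": "great point, totally agree!"},
    {"author": "[deleted]", "body": "[deleted]"},
    {"author": "dave", "body": "[removed]"},
]
report = audit_comments(sample)
print(report)
```

Even a crude count like this tells you how much of the thread you cannot see, and which accounts deserve manual inspection before their comments are treated as independent voices.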

Problem 2: what other forces are at play?

Reddit users can normally both upvote and downvote comments, depending on the subreddit and the user’s means of accessing the content. Typically, upvoted comments will float towards the top of the thread and downvoted comments will sink towards the bottom, although this also depends on how the user chooses to sort comments. Many users treat upvotes and downvotes as a way of registering their side in a debate instead of commenting, and so have a passive impact on the conversation. Visible scores may also (consciously or otherwise) guide readers in choosing a side, forming a positive feedback loop in which users’ votes determine the popularity of content rather than just represent it. As a result, opinions can be formed based on what an individual sees as being the most popular side. Upvoted good; downvoted bad. Naturally, such voting systems are open to manipulation and abuse. A manufactured picture of popular opinion can be induced by using false accounts to raise up one side of a debate and bury the other. How can you, with access to only a single number (itself subject to vote-fuzzing), hope to gain any insight into how conversations are being viewed, navigated and manipulated?
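The feedback loop described above can be illustrated with a toy simulation. This is a deliberately simplified sketch, not a model of Reddit’s actual ranking: it assumes just two equally good comments, that every voter upvotes exactly one of them, and that the comment currently ranked first attracts a fixed share of the votes. The function name `simulate_thread` and the parameters `head_start` and `top_share` are invented for illustration, and their values are not measured from real data.

```python
import random


def simulate_thread(voters, head_start=0, top_share=0.7, seed=0):
    """Simulate upvotes on two equally good comments, A and B.

    Each voter upvotes exactly one comment: the one currently ranked first
    with probability `top_share`, otherwise the other one. `head_start` is
    the number of votes A receives from false accounts before anyone votes.
    """
    rng = random.Random(seed)  # Seeded so that a run can be repeated.
    scores = {"A": head_start, "B": 0}
    for _ in range(voters):
        if scores["A"] == scores["B"]:
            # A tie is sorted arbitrarily, so pick the top comment at random.
            top = rng.choice(["A", "B"])
        else:
            top = max(scores, key=scores.get)
        other = "B" if top == "A" else "A"
        choice = top if rng.random() < top_share else other
        scores[choice] += 1
    return scores


# Compare a thread with no interference against one where comment A
# is given five early votes by false accounts.
print(simulate_thread(1000))
print(simulate_thread(1000, head_start=5))
```

The final scores here say more about who was ranked first early on than about the quality of either comment, which is exactly why a single score is so hard to interpret.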

Using data like this to inform our research seems fraught with complexity, and the example complications presented above are only two of many. We will face a different set of issues if we decide to analyse data from other websites such as Facebook, Twitter or 4chan. So, can we learn anything from this kind of data? I would argue that we can, but any meaning that we can infer will be inseparably tied to the design choices made by the platform creators. As a result, the context of any meaning found within the data is bound not only to a subset of the general online population, but also to the period of time in which those particular design choices are in place.

Author: Jack Holt

One response to “What meaning is there to be found in online discussion?”

  1. Matt Wood says:

    Lovely stuff Jack, and well done on incorporating more of your own arguments into your piece, which you do very nicely. You do well to identify some of the issues surrounding researching online communities, which you relate to concepts developed in Reddit – and in relation to how these platforms are designed, such as the nature of up-voting, which you articulate well. I might have liked to see some academic references alongside your hyperlink referencing – what can we learn from existing analyses of social media data? Is there any literature about how Reddit users upvote/downvote? Also do be cautious of referencing Wikipedia, it can raise some eyebrows even in blog form! I would have liked to hear a bit more about what we can learn from this online data – you indicate potential challenges, but discussing more explicitly what we can learn from this data might have led to a slightly more balanced argument. Having said that, you reach some nice conclusions about insights being bound by the design of the interface – maybe you could give an example of this in practice? Great progress Jack, well done.

