Reflections on Quantitative Methods

As an example of a quantitative assessment of experimental data, we tested the hypothesis that the “buddhify” app affected anxiety levels. Dummy data was provided for the anxiety level (dependent variable) of 12 participants after 1 week of using the app and after 1 week of not using it (app use being the independent variable). As this was repeated testing of the same participants, the t-test for correlated samples was applied as described in Robson (1994). The resulting t-statistic (t = 2.474) exceeded the critical value for a 2-tailed test at the 5% significance level in the t-statistic table (t = 2.201 at 11 degrees of freedom). As there was a less than 5% probability of a difference this large arising by chance if the app had no effect, there was a statistically significant decrease in mean anxiety, and the null hypothesis that using the app had no effect on anxiety in this group could be rejected.
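The calculation behind a t-test for correlated samples can be sketched in a few lines of Python. This is a minimal illustration rather than the worksheet used in class: the function name `paired_t` and the two lists of scores are hypothetical, and the scores are invented purely to show the mechanics, so the printed t-value will not match the 2.474 reported above. Only the critical value of 2.201 at 11 degrees of freedom is taken from the text.

```python
import math
import statistics


def paired_t(with_app, without_app):
    """Return (t, df) for a t-test on correlated (paired) samples."""
    if len(with_app) != len(without_app):
        raise ValueError("each participant needs a score in both conditions")
    # One difference per participant; positive means lower anxiety with the app.
    diffs = [off - on for on, off in zip(with_app, without_app)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # t is the mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical anxiety scores for 12 participants; NOT the class data.
with_app = [4, 6, 5, 7, 3, 5, 6, 4, 5, 6, 4, 5]
without_app = [6, 6, 7, 8, 3, 7, 5, 6, 6, 8, 5, 6]

t, df = paired_t(with_app, without_app)
critical = 2.201  # two-tailed 5% critical value at 11 df, as quoted in the text
print(f"t = {t:.3f}, df = {df}, significant at 5%: {abs(t) > critical}")
```

Working on the per-participant differences, rather than on the two group means separately, is what distinguishes this test from a t-test for independent samples: each participant acts as their own control.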

Prior to conducting the test, we discussed the experiment in class. We recognised that anxiety cannot be measured directly and a proxy must be used. Self-reporting may be subjective and unreliable, while objectively measurable variables like cortisol levels have multiple drivers that cannot be controlled for. Any assessment is temporally constrained, whilst anxiety is dynamic, contextual and difficult to define and constrain. We identified several confounding factors that could affect the outcome of the experiment. These included placebo effects and user backgrounds. A habitual user may experience “withdrawal” during the week when app use is not permitted. A participant unfamiliar with smartphones would face a steep learning curve in using the app, which could itself cause anxiety.

What inferences could be drawn from our results? We do not know what population the sample is drawn from, nor whether this data is representative of that population. For an investment in developing and marketing the app to be reasonable, a more rigorous sampling programme from a defined population would be required as evidence of the potential efficacy of the app within the target population. The 5% significance threshold is commonly used, but arbitrary. If resources are very limited, a stricter threshold, such as 1%, might be more appropriate to determine whether developing this app is a good investment.
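To illustrate how much the conclusion depends on the chosen threshold, the sketch below compares the reported t-statistic (2.474) with two-tailed critical values at 11 degrees of freedom. The critical values are copied from a standard t-table; the names `CRITICAL_T_DF11` and `rejects_null` are hypothetical and exist only for this illustration.

```python
# Two-tailed critical values of t at 11 degrees of freedom (standard t-table).
CRITICAL_T_DF11 = {0.10: 1.796, 0.05: 2.201, 0.01: 3.106}

T_OBSERVED = 2.474  # t-statistic reported for the dummy data


def rejects_null(t_observed, alpha):
    """True if |t| exceeds the two-tailed critical value for this alpha."""
    return abs(t_observed) > CRITICAL_T_DF11[alpha]


# Walk from the most lenient threshold to the strictest.
for alpha in sorted(CRITICAL_T_DF11, reverse=True):
    verdict = "reject" if rejects_null(T_OBSERVED, alpha) else "do not reject"
    print(f"alpha = {alpha:.2f}: critical t = {CRITICAL_T_DF11[alpha]:.3f}, "
          f"{verdict} the null hypothesis")
```

On these figures the same result counts as significant at the 5% level but not at the 1% level, which is why the threshold should be chosen before the data is examined and in light of what a wrong decision would cost.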

In the natural sciences, poor experimental design limits the reliability of the data and restricts the validity of any inferences drawn from the results (Hurlbert, 1984). Such concerns are exacerbated in the social sciences due to the complexity of social context and of individual differences and the tendency to use convenience samples (see further Henrich, Heine, & Norenzayan, 2010). The difference between correlation or association of two variables and causation by one variable of an effect in another variable causes considerable confusion both within academia and in the public domain (Goldacre, 2014). Outliers and the absence of positive effects are typically viewed as failures in an academic research community with a publication bias towards positive effects and subtle pressure to support the prevailing narrative of stakeholders including employers, sponsors and funding providers (Goldacre, 2014).

Using quantitative methods to answer research questions requires reflection around experimental design.



Goldacre, B. (2014). I Think You’ll Find It’s a Bit More Complicated Than That. UK: 4th Estate.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2–3), 61–83.

Hurlbert, S. H. (1984). Pseudoreplication and the Design of Ecological Field Experiments. Ecological Monographs, 54(2), 187–211.


Author: Matthew Snape
