25 Repeated Measures ANOVA
Learning how to use Random Intercept ANOVA Models
25.1 Introduction
Repeated measures ANOVA is used when the same participants are measured on the same outcome at more than two time points. With only two time points a paired \(t\)-test is sufficient, but with three or more a repeated measures ANOVA is required. (2013-) Many complex designs can make use of repeated measures, but throughout this guide we refer to the simplest case, the one-way repeated measures ANOVA. This test requires one independent variable and one dependent variable. The dependent variable needs to be continuous (interval or ratio) and the independent variable categorical (either nominal or ordinal). (2018-)
25.2 Necessary packages
Make sure that you have installed the following R packages:
- tidyverse for data manipulation and visualization.
- ggpubr for easily creating publication-ready plots.
- rstatix provides pipe-friendly R functions for easy statistical analyses. (2018-)
- datarium contains the required data sets for this chapter.
Start by loading the following R packages
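The loading code itself is not shown here. A minimal sketch, assuming the four packages listed above are the only ones needed; the variable names `pkgs`, `is_installed` and `not_installed` are illustrative and not part of the original chapter:

```r
# Packages named in this section.
pkgs <- c("tidyverse", "ggpubr", "rstatix", "datarium")

# Check which ones are installed, so a missing package is reported
# up front rather than causing an error halfway through.
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
}

# Attach the packages that are available.
for (pkg in pkgs[is_installed]) {
  library(pkg, character.only = TRUE)
}
```

When all four packages are installed, this has the same effect as calling library() on each of them in turn.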
25.3 Data source and description
For this example we will use the selfesteem data set from the datarium package, which contains the self-esteem scores of 10 individuals at three time points during a specific diet. The goal is to determine whether their self-esteem improved.
One-way repeated measures ANOVA can be performed in order to determine the effect of time on the self-esteem score.
# Data preparation; wide format
data("selfesteem", package = "datarium")
selfesteem
# A tibble: 10 × 4
id t1 t2 t3
<int> <dbl> <dbl> <dbl>
1 1 4.01 5.18 7.11
2 2 2.56 6.91 6.31
3 3 3.24 4.44 9.78
4 4 3.42 4.71 8.35
5 5 2.87 3.91 6.46
6 6 2.05 5.34 6.65
7 7 3.53 5.58 6.84
8 8 3.18 4.37 7.82
9 9 3.51 4.40 8.47
10 10 3.04 4.49 8.58
Now we “gather” columns t1, t2, and t3 into “long” format, then convert id and time into factor variables.
selfesteem_df <-
  selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
selfesteem_df
# A tibble: 30 × 3
id time score
<fct> <fct> <dbl>
1 1 t1 4.01
2 2 t1 2.56
3 3 t1 3.24
4 4 t1 3.42
5 5 t1 2.87
6 6 t1 2.05
7 7 t1 3.53
8 8 t1 3.18
9 9 t1 3.51
10 10 t1 3.04
# ℹ 20 more rows
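gather() still works, but the tidyr documentation marks it as superseded and recommends pivot_longer() for new code. A minimal sketch of the same reshaping, assuming tidyr is installed; the data frame `wide` is typed in from the rounded values printed above so that the example runs on its own, and the names `wide` and `long` are illustrative:

```r
library(tidyr)

# Rounded values copied from the printed selfesteem table.
wide <- data.frame(
  id = 1:10,
  t1 = c(4.01, 2.56, 3.24, 3.42, 2.87, 2.05, 3.53, 3.18, 3.51, 3.04),
  t2 = c(5.18, 6.91, 4.44, 4.71, 3.91, 5.34, 5.58, 4.37, 4.40, 4.49),
  t3 = c(7.11, 6.31, 9.78, 8.35, 6.46, 6.65, 6.84, 7.82, 8.47, 8.58)
)

# Stack the three time columns into a time/score pair of columns.
long <- pivot_longer(
  wide,
  cols = c(t1, t2, t3),
  names_to = "time",
  values_to = "score"
)

# Convert the identifier and the within-subjects factor to factors.
long$id <- factor(long$id)
long$time <- factor(long$time)

head(long)
```

Unlike gather(), pivot_longer() keeps the three rows of each individual together, so the row order differs from the output above while the content is the same.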
The one-way repeated measures ANOVA can be used to determine whether the mean self-esteem scores are significantly different between the three time points.
Note: The one-way repeated measures ANOVA is used when you have just one independent variable. If you have two independent variables (e.g., you measured both time and condition), you will need to use a two-way repeated measures ANOVA. Two- and three-way repeated measures ANOVA examples with this data can be found here.
25.3.1 Summary statistics
Compute summary statistics of the self-esteem score by group (time): the mean and the standard deviation (sd).
# Statistics-summary
selfesteem_df %>%
  group_by(time) %>%
  get_summary_stats(score, type = "mean_sd")
# A tibble: 3 × 5
time variable n mean sd
<fct> <fct> <dbl> <dbl> <dbl>
1 t1 score 10 3.14 0.552
2 t2 score 10 4.93 0.863
3 t3 score 10 7.64 1.14
25.3.2 Visualization
Create a box plot and add points corresponding to individual values:
bxp <- ggboxplot(selfesteem_df, x = "time", y = "score", add = "point")
bxp
25.4 Test Assumptions
Before computing the repeated measures ANOVA, you need to perform some preliminary tests to check whether its assumptions are met.
25.4.1 Outliers
Outliers can be easily identified using box plot methods, implemented in the R function identify_outliers() in the rstatix package.
selfesteem_df %>%
  group_by(time) %>%
  identify_outliers(score)
# A tibble: 2 × 5
time id score is.outlier is.extreme
<fct> <fct> <dbl> <lgl> <lgl>
1 t1 6 2.05 TRUE FALSE
2 t2 2 6.91 TRUE FALSE
There were no extreme outliers. If there are extreme outliers, you can still include them in the analysis if you do not believe the result will be substantially affected; this can be evaluated by comparing the results of the ANOVA with and without the outlier. It is also possible to keep the outliers in the data and perform a robust ANOVA test using the WRS2 package.
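The box plot rule flags a value as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile, and as extreme when that distance exceeds 3 times the IQR. A base R sketch of the rule for the t2 scores, typed in from the rounded values printed in Section 25.3; the variable names are illustrative, and using quantile() with its default settings is an assumption about how the quartiles are computed:

```r
# t2 scores for the 10 individuals (rounded values from the printed table).
t2 <- c(5.18, 6.91, 4.44, 4.71, 3.91, 5.34, 5.58, 4.37, 4.40, 4.49)

# First and third quartiles and the interquartile range.
q <- quantile(t2, probs = c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Outlier: beyond 1.5 x IQR from the quartiles; extreme: beyond 3 x IQR.
is_outlier <- t2 < q[1] - 1.5 * iqr | t2 > q[2] + 1.5 * iqr
is_extreme <- t2 < q[1] - 3 * iqr | t2 > q[2] + 3 * iqr

t2[is_outlier]  # values flagged as outliers
t2[is_extreme]  # values flagged as extreme
```

For t2 this flags 6.91 (id 2) as an outlier but not as extreme, in line with the table above.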
25.4.2 Normality Assumption
The outcome (or dependent) variable should be approximately normally distributed in each cell of the design. This can be checked using the Shapiro-Wilk normality test (shapiro_test() in the rstatix package) or by visual inspection using a QQ plot (ggqqplot() in the ggpubr package). If the data are normally distributed, the \(p\)-value should be greater than 0.05.
selfesteem_df %>%
  group_by(time) %>%
  shapiro_test(score)
# A tibble: 3 × 4
time variable statistic p
<fct> <chr> <dbl> <dbl>
1 t1 score 0.967 0.859
2 t2 score 0.876 0.117
3 t3 score 0.923 0.380
The self-esteem score was normally distributed at each time point, as assessed by Shapiro-Wilk’s test (\(p > 0.05\)).
Note that if your sample size is greater than 50, the normal QQ plot is preferred, because at larger sample sizes the Shapiro-Wilk test becomes very sensitive even to minor deviations from normality. A QQ plot compares the quantiles of the data with those of a normal distribution. Create QQ plots for each time point:
ggqqplot(selfesteem_df, "score", facet.by = "time")
From the plot above, as all the points fall approximately along the reference line, we can assume normality.
25.4.3 Assumption of Sphericity
The variances of the differences between all pairs of within-subjects levels (here, the time points) should be equal. This can be checked using Mauchly's test of sphericity, which the R function anova_test() in the rstatix package runs automatically when computing the ANOVA. Click HERE to learn more about the assumption of sphericity and Mauchly's test, and to understand why it is important.
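To make the assumption concrete: with three time points there are three sets of paired differences (t1 - t2, t1 - t3 and t2 - t3), and sphericity requires their variances to be equal in the population. A base R sketch that computes the three sample variances from the rounded values printed in Section 25.3; the names `wide` and `diff_vars` are illustrative:

```r
# Rounded values copied from the printed selfesteem table.
wide <- data.frame(
  id = 1:10,
  t1 = c(4.01, 2.56, 3.24, 3.42, 2.87, 2.05, 3.53, 3.18, 3.51, 3.04),
  t2 = c(5.18, 6.91, 4.44, 4.71, 3.91, 5.34, 5.58, 4.37, 4.40, 4.49),
  t3 = c(7.11, 6.31, 9.78, 8.35, 6.46, 6.65, 6.84, 7.82, 8.47, 8.58)
)

# Variance of the within-subject differences for each pair of time points.
diff_vars <- c(
  "t1-t2" = var(wide$t1 - wide$t2),
  "t1-t3" = var(wide$t1 - wide$t3),
  "t2-t3" = var(wide$t2 - wide$t3)
)
diff_vars
```

Sample variances are never exactly equal; Mauchly's test assesses whether they differ by more than would be expected by chance.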
When the function get_anova_table() is used to extract the ANOVA table, the Greenhouse-Geisser sphericity correction is automatically applied to factors violating the sphericity assumption.
res.aov <- anova_test(
  data = selfesteem_df,
  # Selfesteem variable
  dv = score,
  # Sample individuals
  wid = id,
  # Independent variable time
  within = time
)
# Get table
get_anova_table(res.aov)
ANOVA Table (type III tests)
Effect DFn DFd F p p<.05 ges
1 time 2 18 55.469 2.01e-08 * 0.829
The self-esteem score was statistically significantly different at the different time points during the diet, \(F_{(2, 18)} = 55.5\), \(p < 0.0001\), \(\eta^2_g = 0.83\), where:
- \(F\) indicates that we are comparing to an \(F\)-distribution (\(F\)-test),
- (2, 18) indicates the degrees of freedom in the numerator (DFn) and the denominator (DFd), respectively,
- 55.5 indicates the obtained \(F\)-statistic value,
- \(p\) specifies the \(p\)-value, and
- \(\eta^2_g\) is the generalized effect size (amount of variability due to the within-subjects factor).
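The quantities in this list can be reproduced by hand from the sums of squares. The following base R sketch is a cross-check, not the rstatix implementation: it uses the rounded values printed in Section 25.3, all variable names are illustrative, and the expression for \(\eta^2_g\) is the form it takes for a design with a single within-subjects factor.

```r
# Rounded values copied from the printed selfesteem table.
wide <- data.frame(
  id = 1:10,
  t1 = c(4.01, 2.56, 3.24, 3.42, 2.87, 2.05, 3.53, 3.18, 3.51, 3.04),
  t2 = c(5.18, 6.91, 4.44, 4.71, 3.91, 5.34, 5.58, 4.37, 4.40, 4.49),
  t3 = c(7.11, 6.31, 9.78, 8.35, 6.46, 6.65, 6.84, 7.82, 8.47, 8.58)
)

scores <- as.matrix(wide[, c("t1", "t2", "t3")])
n <- nrow(scores)  # number of individuals
k <- ncol(scores)  # number of time points
grand_mean <- mean(scores)

# Partition the total sum of squares into time, subjects and error.
ss_time <- n * sum((colMeans(scores) - grand_mean)^2)
ss_subjects <- k * sum((rowMeans(scores) - grand_mean)^2)
ss_total <- sum((scores - grand_mean)^2)
ss_error <- ss_total - ss_time - ss_subjects

# Degrees of freedom in the numerator and the denominator.
dfn <- k - 1
dfd <- (k - 1) * (n - 1)

# F statistic, its p-value, and generalized eta squared.
f_stat <- (ss_time / dfn) / (ss_error / dfd)
p_value <- pf(f_stat, dfn, dfd, lower.tail = FALSE)
ges <- ss_time / (ss_time + ss_subjects + ss_error)

round(c(DFn = dfn, DFd = dfd, F = f_stat, ges = ges), 3)
```

Because the inputs are rounded, F and ges agree with the ANOVA table above only approximately.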
25.4.4 Post-hoc test
You can perform multiple pairwise paired \(t\)-tests between the levels of the within-subjects factor (here time). We adjust \(p\)-values using the Bonferroni multiple testing correction method.
# pairwise comparisons
pwc <- pairwise_t_test(
  data = selfesteem_df,
  formula = score ~ time,
  paired = TRUE,
  p.adjust.method = "bonferroni"
)
pwc
# A tibble: 3 × 10
.y. group1 group2 n1 n2 statistic df p p.adj p.adj.signif
* <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <chr>
1 score t1 t2 10 10 -4.97 9 0.000772 2e-3 **
2 score t1 t3 10 10 -13.2 9 0.000000334 1e-6 ****
3 score t2 t3 10 10 -4.87 9 0.000886 3e-3 **
All the pairwise differences are statistically significant.
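The Bonferroni adjustment multiplies each unadjusted \(p\)-value by the number of comparisons (here three) and caps the result at 1. A base R sketch using the unadjusted \(p\)-values from the table above; the vector names `p_raw`, `p_adj` and `p_manual` are illustrative:

```r
# Unadjusted p-values copied from the p column of the pairwise table.
p_raw <- c("t1-t2" = 0.000772, "t1-t3" = 0.000000334, "t2-t3" = 0.000886)

# Bonferroni adjustment with the built-in helper.
p_adj <- p.adjust(p_raw, method = "bonferroni")

# The same adjustment by hand: multiply by the number of tests, cap at 1.
p_manual <- pmin(p_raw * length(p_raw), 1)

p_adj
```

All three adjusted values stay below 0.05, which is why every pairwise difference is reported as significant.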
25.5 Results
We could report the results of the post-hoc test as follows: post-hoc analyses with a Bonferroni adjustment revealed that all the pairwise differences between time points were statistically significant (\(p < 0.05\)).
# Visualization: box plot annotated with the pairwise p-values
pwc <- pwc %>% add_xy_position(x = "time")
bxp +
  stat_pvalue_manual(pwc) +
  labs(
    subtitle = get_test_label(res.aov, detailed = TRUE),
    caption = get_pwc_label(pwc)
  )
25.6 Conclusion
This chapter describes how to compute, interpret and report a repeated measures ANOVA in R, specifically the one-way repeated measures ANOVA. It also explains the assumptions made by the one-way repeated measures ANOVA and provides practical examples of R code to check whether those assumptions are met.