The Wilcoxon signed-rank test is another example of a non-parametric test we can run in Julius. The data consist of paired observations (before and after). The test is designed to evaluate the median difference between paired observations (observations that are directly related to one another and are therefore not independent of each other), making it an ideal candidate for before-and-after studies, repeated measurements on the same sample, or matched pairs of data.

The Wilcoxon signed-rank test does not assume that the data are normally distributed, which parametric tests require (non-normal data are common with smaller datasets). Additionally, it is less sensitive to outliers, since it works with the ranks of the paired differences rather than their raw values. Here are the assumptions that we need to make sure this dataset follows:

- The data must be paired data: the test requires that the data consists of pairs of observations (before and after).
- Data must be continuous or ordinal: the values need a meaningful order so that the differences can be ranked.
- Symmetry of the distribution of differences: the differences between the paired measurements should be symmetrically distributed around the median.
- There must be independence between pairs: the difference for one test subject should not influence the difference for any other test subject.
- In addition to these, your data should also follow the non-parametric assumptions. Please check my other guide out for more information on that!

Now that we know a little more about this test, let’s see an example on how to run it in Julius!

*Prompt* You’re a coach for the local 18U superhuman cross country team. You want to assess whether there is an improvement in the athletes’ sprint times after they are enrolled in a new training program. You record the time (in seconds) it takes each athlete to run a 50m dash before and after the program. The results are displayed below:

id | before | after |
---|---|---|
1 | 0.4693 | 1.4196 |
2 | 3.0101 | 0.2254 |
3 | 1.3167 | 0.5183 |
4 | 0.9129 | 0.6844 |
5 | 0.1696 | 0.9134 |
6 | 0.1696 | 2.3069 |
7 | 0.0598 | 0.3341 |
8 | 2.0112 | 1.083 |
9 | 0.9191 | 1.3463 |
10 | 1.2313 | 0.0713 |
11 | 0.0208 | 1.403 |
12 | 3.5036 | 0.2804 |
13 | 1.7864 | 0.1009 |
14 | 0.2387 | 4.4605 |
15 | 0.2007 | 5.0559 |
16 | 0.2026 | 2.4785 |
17 | 0.3628 | 0.5449 |
18 | 0.7439 | 0.1542 |
19 | 0.5655 | 1.7291 |
20 | 0.3442 | 0.8701 |

This dataset is considered a “paired” dataset (check 1 off the assumption list) because we are looking **at the same individual** over a period of **time** (paired results). It is also continuous data (check 2 off the assumption list), meaning that it can take on any value within a range. So, let’s get to it!

**Steps of running a Wilcoxon signed-rank Test**

**Step 1: Run your descriptive statistics to get the characteristics of your dataset!**

As always, this should be the first step in any data analysis process. So let’s run some descriptive stats in Julius!
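If you want to double-check these numbers outside Julius, here is a minimal sketch using only Python's standard library. The `describe` helper and the choice of statistics are my own assumptions for illustration, not what Julius runs internally; the data are copied from the table above.

```python
import statistics

# Sprint times (seconds) copied from the table above, in id order 1..20.
before = [0.4693, 3.0101, 1.3167, 0.9129, 0.1696, 0.1696, 0.0598, 2.0112, 0.9191, 1.2313,
          0.0208, 3.5036, 1.7864, 0.2387, 0.2007, 0.2026, 0.3628, 0.7439, 0.5655, 0.3442]
after = [1.4196, 0.2254, 0.5183, 0.6844, 0.9134, 2.3069, 0.3341, 1.083, 1.3463, 0.0713,
         1.403, 0.2804, 0.1009, 4.4605, 5.0559, 2.4785, 0.5449, 0.1542, 1.7291, 0.8701]


def describe(values):
    """Hypothetical helper: return a few descriptive statistics for one column."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
    }


summary = {"before": describe(before), "after": describe(after)}
for label, column_stats in summary.items():
    print(label, {name: round(value, 4) for name, value in column_stats.items()})
```

Comparing the mean with the median of each column already hints at how skewed the times are.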

Great! Now we know a little more about our dataset and the characteristics it has. We can use this information to chat about the results in the dataset if need be. Let’s take a look at the assumptions now.

**Step 2: Check data for normal distribution**

The first step is to see whether our dataset follows a normal or non-normal distribution. We can perform a **Shapiro-Wilk test** to determine which type of distribution the dataset follows:
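For readers who want to run the same check outside Julius, the sketch below uses `scipy.stats.shapiro` on each column. It assumes SciPy is installed and uses the 0.05 significance level; the variable names are mine, and this is not necessarily how Julius performs the test.

```python
from scipy import stats

# Sprint times (seconds) copied from the table above, in id order 1..20.
before = [0.4693, 3.0101, 1.3167, 0.9129, 0.1696, 0.1696, 0.0598, 2.0112, 0.9191, 1.2313,
          0.0208, 3.5036, 1.7864, 0.2387, 0.2007, 0.2026, 0.3628, 0.7439, 0.5655, 0.3442]
after = [1.4196, 0.2254, 0.5183, 0.6844, 0.9134, 2.3069, 0.3341, 1.083, 1.3463, 0.0713,
         1.403, 0.2804, 0.1009, 4.4605, 5.0559, 2.4785, 0.5449, 0.1542, 1.7291, 0.8701]

alpha = 0.05  # significance level
shapiro_results = {}
for label, values in {"before": before, "after": after}.items():
    # Null hypothesis: the sample was drawn from a normal distribution.
    result = stats.shapiro(values)
    shapiro_results[label] = result
    verdict = "reject normality" if result.pvalue < alpha else "no evidence against normality"
    print(f"{label}: W = {result.statistic:.4f}, p = {result.pvalue:.4f} -> {verdict}")
```

A p-value below `alpha` means we reject the null hypothesis that the column is normally distributed.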

The Shapiro-Wilk test came back significant for both the **before** and **after** times, so we can **reject the null hypothesis** of normality and say that the data follow a non-normal distribution. Additionally, we can see this non-normal distribution in the histograms, which show the data are skewed to the right. We can also check for outliers in this dataset by prompting Julius!

We do seem to have about four outliers in this dataset, which is fine because the Wilcoxon signed-rank test can handle them!
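The post does not say which rule Julius used to flag outliers, so the sketch below assumes the common 1.5 × IQR rule (an assumption on my part), with quartiles computed by linear interpolation. The `iqr_outliers` helper is hypothetical.

```python
import statistics

# Sprint times (seconds) copied from the table above, in id order 1..20.
before = [0.4693, 3.0101, 1.3167, 0.9129, 0.1696, 0.1696, 0.0598, 2.0112, 0.9191, 1.2313,
          0.0208, 3.5036, 1.7864, 0.2387, 0.2007, 0.2026, 0.3628, 0.7439, 0.5655, 0.3442]
after = [1.4196, 0.2254, 0.5183, 0.6844, 0.9134, 2.3069, 0.3341, 1.083, 1.3463, 0.0713,
         1.403, 0.2804, 0.1009, 4.4605, 5.0559, 2.4785, 0.5449, 0.1542, 1.7291, 0.8701]


def iqr_outliers(values, k=1.5):
    """Hypothetical helper: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


outliers = {"before": iqr_outliers(before), "after": iqr_outliers(after)}
for label, flagged in outliers.items():
    print(label, flagged)
```

Under this rule, two times in each column are flagged, four in total, which agrees with the count above.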

**Step 3: Check for symmetry of the distribution of differences in the dataset**

One of the assumptions of the Wilcoxon signed-rank test is that the **differences between the paired measurements are symmetrically distributed around the median**. To check this manually, you would take each pair of observations, subtract the **after** score from the **before** score, and plot the differences on a histogram. But we have Julius, so I prompted it to create a visualization showing the distribution (and skewness) of my differences:

There is a slight skewness in my differences (as indicated by the slight peaks outside the median value on either side, as well as the gap between values 2 to 4), but it is not too detrimental to the analysis. Therefore, we can continue on to the final step!
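To put a rough number on the symmetry check, the sketch below computes the before minus after differences and a simple moment-based skewness coefficient. The `sample_skewness` helper is my own illustration, not Julius's method; a value near zero suggests approximate symmetry.

```python
import statistics

# Sprint times (seconds) copied from the table above, in id order 1..20.
before = [0.4693, 3.0101, 1.3167, 0.9129, 0.1696, 0.1696, 0.0598, 2.0112, 0.9191, 1.2313,
          0.0208, 3.5036, 1.7864, 0.2387, 0.2007, 0.2026, 0.3628, 0.7439, 0.5655, 0.3442]
after = [1.4196, 0.2254, 0.5183, 0.6844, 0.9134, 2.3069, 0.3341, 1.083, 1.3463, 0.0713,
         1.403, 0.2804, 0.1009, 4.4605, 5.0559, 2.4785, 0.5449, 0.1542, 1.7291, 0.8701]

# Paired differences (before minus after), rounded to the precision of the data.
diffs = [round(b - a, 4) for b, a in zip(before, after)]


def sample_skewness(values):
    """Hypothetical helper: Fisher-Pearson skewness g1 = m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])  # 2nd central moment
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])  # 3rd central moment
    return m3 / m2 ** 1.5


skew = sample_skewness(diffs)
print("median difference:", round(statistics.median(diffs), 4))
print("mean difference:  ", round(statistics.fmean(diffs), 4))
print("skewness:         ", round(skew, 3))
```

A histogram is still the better visual check; the coefficient is just a quick summary of which way the tails lean.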

**Step 4: Perform the Wilcoxon signed-rank test!**

So now we know the following about our dataset:

- it is considered a **paired observation** because it follows the same individual over a period of time, but each pair is independent from the other pairs (meaning that each runner doesn’t impact the other runners’ scores in any way)
- it is continuous data, and the pairs are independent of one another
- we have a (mostly) symmetrical distribution of differences
- it is not normally distributed
- it has some outliers in the dataset (~4)
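With the checklist above complete, the test itself can be sketched in a couple of lines with `scipy.stats.wilcoxon` (assuming SciPy is installed; this is an illustration, not necessarily the code Julius generates). For the default two-sided test, SciPy reports the smaller of the two signed-rank sums as the statistic.

```python
from scipy import stats

# Sprint times (seconds) copied from the table above, in id order 1..20.
before = [0.4693, 3.0101, 1.3167, 0.9129, 0.1696, 0.1696, 0.0598, 2.0112, 0.9191, 1.2313,
          0.0208, 3.5036, 1.7864, 0.2387, 0.2007, 0.2026, 0.3628, 0.7439, 0.5655, 0.3442]
after = [1.4196, 0.2254, 0.5183, 0.6844, 0.9134, 2.3069, 0.3341, 1.083, 1.3463, 0.0713,
         1.403, 0.2804, 0.1009, 4.4605, 5.0559, 2.4785, 0.5449, 0.1542, 1.7291, 0.8701]

# Two-sided test on the paired differences (before - after).
result = stats.wilcoxon(before, after)

alpha = 0.05
print(f"W = {result.statistic}, p = {result.pvalue:.3f}")
if result.pvalue < alpha:
    print("Significant change in sprint times.")
else:
    print("No significant change in sprint times.")
```

The statistic and p-value printed here are the numbers that go into the write-up.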

So our superhuman runners are too good for the training regime, according to the findings. We would report them as follows:

“A Wilcoxon signed-rank test was used to assess the effectiveness of a new training regime on the performance of superhuman cross country runners. The test revealed a statistic of 85.0, with a p-value of 0.475, indicating there was no significant change in sprint performance after implementing the training regime (p > 0.05).”

Those superhumans are just too good!

**After the analysis**

For fun, let’s say that we did find statistically significant differences between before and after sprint times. What would we do next? I asked Julius to come up with a list of potential post hoc tests that we could use to further analyze the data:

Julius has provided us with a nice little list of potential post hoc tests that may be suitable for our dataset (some mention gender, age, etc., which we did not record for this specific example but could be tested). Of course, you would pick one (or several) depending on your specific question and what you are trying to see. But it is nice to have options!

Keywords: AI Statistics, GPT, Wilcoxon signed-rank test, paired data, data distribution, non-normal distribution, Shapiro-Wilk test, post hoc test, statistical analysis