Guide: Running Friedman’s Test (Non-parametric)

Welcome to another post about running non-parametric tests in Julius! Today we are going to examine how to set up and run Friedman’s test. Let’s take a look at this prompt I made (sorry, it is lengthy):

Prompt: You discovered a new species of plastic-eating bacteria (Mangia plastica) located in the Pacific Rim National Park Reserve in Canada. Previous studies have shown that some plastic-eating bacteria function better at higher temperatures. Being the eager scientist you are, you want to test this novel species’ ability to decompose 500 mL plastic water bottles at three different temperatures: room temperature (20-22°C), warm (30°C) and freezing point (0°C). Each observation represents the rate of plastic decay (inches/hr) for one individual under each treatment. A rest time was given before each individual was introduced to the next temperature change. The order of treatments for each individual was randomized as well. A total of 30 individuals were tested.

id treatment1 treatment2 treatment3
1 1.64 0.87 1.91
2 4.59 0.79 0.79
3 4.85 2.15 0.63
4 1.72 0.63 0.63
5 1.27 0.15 0.18
6 0.57 0.36 1.37
7 0.40 0.24 4.33
8 0.80 1.07 0.24
9 0.58 1.12 0.32
10 1.46 0.55 0.75
11 0.55 6.37 0.99
12 0.35 2.28 0.29
13 1.23 0.14 0.26
14 1.22 2.09 1.19
15 0.89 0.74 0.23
16 0.49 0.63 2.88
17 1.41 0.17 1.38
18 0.68 0.51 1.84
19 2.80 2.54 0.43
20 0.73 1.39 2.65
21 0.62 0.83 0.33
22 0.30 2.25 3.88
23 0.93 2.73 1.44
24 0.52 1.44 4.66
25 0.96 4.78 0.07
26 2.27 1.09 0.74
27 1.10 0.14 0.80
28 1.43 4.38 0.60
29 0.45 0.61 2.50
30 1.39 0.59 1.67

Determine whether there are any statistically significant differences among treatment conditions in the rate of plastic decomposition in Mangia plastica.

Step 1: Let’s do some descriptive statistics and visualizations!

If you’ve looked at any of my guides, you should know that descriptive stats are the first step in any good data analysis method. So let’s prompt Julius!

Prompt: Can you perform descriptive statistics on my dataset, please? Can you provide a visualization of my dataset as well?

Treatment 2 (the warm treatment) has the highest mean (about 1.45 inches/hr) and appears to be the most variable. Going by the table above, treatment 3 (about 1.33) and treatment 1 (about 1.27) follow. In the boxplot, treatment 2’s upper whisker is very high (~4.5), higher than in the other two treatments. There are also quite a few outliers in each treatment, with treatment 2 showing more extreme outliers than the other two.
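If you want to reproduce this step outside of Julius, here is a minimal sketch in Python using pandas. It is not the code Julius generated (I don't have that), and the names `df`, `summary` and `outlier_counts` are just my own choices. It prints the usual summary table and counts the points a default boxplot would flag as outliers (anything more than 1.5 × IQR beyond the quartiles).

```python
import pandas as pd

# Decay rates (inches/hr) copied from the table in the prompt, ids 1-30 in order.
df = pd.DataFrame(
    {
        "treatment1": [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
                       0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
                       0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39],
        "treatment2": [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
                       6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
                       0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59],
        "treatment3": [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
                       0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
                       0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67],
    },
    index=pd.RangeIndex(1, 31, name="id"),
)

# Count, mean, standard deviation, min, quartiles and max for each treatment.
summary = df.describe()
print(summary.round(2))

# Boxplot-style outlier rule: beyond 1.5 * IQR from the first or third quartile.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1
is_outlier = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
outlier_counts = is_outlier.sum()
print(outlier_counts)
```

If matplotlib is installed, calling `df.boxplot()` on the same data frame should give you a boxplot similar to the one discussed here.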

Step 2: Testing for the assumptions of the Friedman’s Test

Next we should look at the assumptions to make sure that our dataset is suitable for the Friedman’s test. Below are the assumptions:

1. Dependent Samples & Independence: the samples must be dependent (repeated measures). In other words, each individual must go through each temperature treatment. Additionally, each individual must not be influenced by any other individual. For this particular setup, each individual was tested in its own area, without any connection to its brethren.

2. Ordinal or Continuous Data: the data must be either ordinal or continuous values. Since our data is continuous, we can say that it passes this assumption.

3. Randomness: assignment of treatment must be randomized in each individual. From the prompt we know that this was a randomized experiment. So this also passes the assumption check.

4. No Significant Outliers: Friedman’s test can be sensitive to outliers, as they can interfere with the ranking system. From our boxplot visualization, we can see that we do have outliers. This may be a problem… so, let’s ask Julius what we should do:

Prompt: Are there any specific adjustments or considerations you’d like to discuss regarding the outliers in the dataset?

I’m very interested in winsorizing the data, but first I want to check whether my outliers are severe enough to pose a problem.

We know that my outliers are consistent across the three treatments, with treatment 1 having one more outlier than treatments 2 and 3. Based on this, I am okay with running Friedman’s test. What I will do is run the test first on the original dataset and then again on the winsorized (is this even a word?) dataset.

5. Similarity of Experimental Conditions: Excluding the treatment variable, all other conditions should be the same across all treatments.

Note: this last assumption is hard to fulfill at times, especially if you are not testing within a controlled environment (e.g., lab conditions, greenhouse conditions). My thesis was conducted in an operational vineyard, so a lot of environmental variables such as wind, temperature, light and weather could not be controlled. So just be mindful of your experimental conditions.

6. Non-normality: Friedman’s test can handle non-normal distributions. Let’s run the Shapiro-Wilk test to check the normality of our dataset.

Nope, not a normal distribution. Let’s run Friedman’s Test!
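For anyone who wants to run the normality check by hand, here is a rough sketch using `scipy.stats.shapiro`, applied to each treatment column separately. The 0.05 cutoff and the variable names are my own assumptions, not something taken from the Julius output.

```python
from scipy import stats

# Decay rates (inches/hr) copied from the table in the prompt, ids 1-30 in order.
treatments = {
    "treatment1": [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
                   0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
                   0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39],
    "treatment2": [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
                   6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
                   0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59],
    "treatment3": [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
                   0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
                   0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67],
}

alpha = 0.05  # assumed significance level
shapiro_results = {}
for name, values in treatments.items():
    # Null hypothesis: the sample comes from a normal distribution.
    w_stat, p_value = stats.shapiro(values)
    shapiro_results[name] = (w_stat, p_value)
    verdict = "looks non-normal" if p_value < alpha else "no evidence against normality"
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.4f} ({verdict})")
```

A small p-value here means the normality assumption is doubtful for that treatment, which is exactly the situation where a non-parametric test like Friedman’s is a sensible choice.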

Step 3: Running Friedman’s Test

Prompt: Let’s conduct a Friedman’s test

Ha, so these test results were non-significant. In other words, we found no evidence that temperature affected the rate at which M. plastica decomposed plastic water bottles.
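Here is a sketch of the same test in Python, assuming SciPy is available. It calls `scipy.stats.friedmanchisquare` and then recomputes the statistic by hand from the within-individual rank sums, using the textbook formula 12 / (n k (k + 1)) × ΣR² − 3 n (k + 1). One caveat: the table above is rounded to two decimals, so two rows (ids 2 and 4) contain tied values. SciPy applies a tie correction, so its statistic comes out slightly larger than the uncorrected one, which is the value that matches the 0.800 reported at the end of this guide.

```python
import numpy as np
from scipy import stats

# Decay rates (inches/hr) copied from the table in the prompt, ids 1-30 in order.
t1 = [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
      0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
      0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39]
t2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
      6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
      0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]
t3 = [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
      0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
      0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67]

# Library version (includes a correction for tied values within a row).
stat, p_value = stats.friedmanchisquare(t1, t2, t3)
print(f"SciPy: chi2 = {stat:.3f}, p = {p_value:.3f}")

# Hand computation: rank the three treatments within each individual (row),
# sum the ranks per treatment, then plug the rank sums into the formula.
data = np.column_stack([t1, t2, t3])
n, k = data.shape                      # 30 individuals, 3 treatments
ranks = stats.rankdata(data, axis=1)   # ties get the average rank
rank_sums = ranks.sum(axis=0)          # 64, 58 and 58 for this table
q_uncorrected = 12.0 / (n * k * (k + 1)) * np.sum(rank_sums**2) - 3 * n * (k + 1)
p_uncorrected = stats.chi2.sf(q_uncorrected, df=k - 1)
print(f"By hand (no tie correction): chi2 = {q_uncorrected:.3f}, p = {p_uncorrected:.3f}")
```

Either way, the p-value sits well above 0.05, so the conclusion does not change.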

Let’s see whether winsorizing my dataset has an effect.

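I was correct in my assumption that my dataset would not benefit from winsorizing, as we can see there was no change in the test statistic or p-value. This makes sense: Friedman’s test ranks the values within each individual, so pulling in extreme values only changes the result if it changes those within-individual ranks.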
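To try this comparison yourself, here is a small sketch using `scipy.stats.mstats.winsorize`. I don't know which limits Julius used, so the 5% cut at each end of every treatment column is purely an assumption on my part. With 30 values per column, that replaces the single lowest and single highest value with their nearest neighbours.

```python
import numpy as np
from scipy import stats
from scipy.stats.mstats import winsorize

# Decay rates (inches/hr) copied from the table in the prompt, ids 1-30 in order.
t1 = [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
      0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
      0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39]
t2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
      6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
      0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]
t3 = [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
      0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
      0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67]

limits = (0.05, 0.05)  # assumed: clip the lowest and highest 5% of each column

# winsorize returns a masked array; convert back to a plain NumPy array.
wins = [np.asarray(winsorize(np.array(col), limits=limits)) for col in (t1, t2, t3)]

stat_raw, p_raw = stats.friedmanchisquare(t1, t2, t3)
stat_wins, p_wins = stats.friedmanchisquare(*wins)

print(f"original:   chi2 = {stat_raw:.3f}, p = {p_raw:.3f}")
print(f"winsorized: chi2 = {stat_wins:.3f}, p = {p_wins:.3f}")
```

With these limits the clipped values keep their rank within each individual, so the two lines of output agree. More aggressive limits could create new ties and shift the statistic a little.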

I can also prompt to see which tests we can use for a post hoc analysis. Since my data did not show any significant results, there really isn’t any need for this analysis. But if you are curious, here is an example of me running Dunn’s post hoc test.
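If you would like to try a post hoc comparison in code, here is one hedged sketch. Note that it is a swapped-in alternative and not the Dunn’s test mentioned above: it runs pairwise Wilcoxon signed-rank tests (`scipy.stats.wilcoxon`), which respect the repeated-measures design, and applies a simple Bonferroni correction. The variable names are my own.

```python
from itertools import combinations
from scipy import stats

# Decay rates (inches/hr) copied from the table in the prompt, ids 1-30 in order.
treatments = {
    "treatment1": [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
                   0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
                   0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39],
    "treatment2": [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
                   6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
                   0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59],
    "treatment3": [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
                   0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
                   0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67],
}

pairs = list(combinations(treatments, 2))  # 3 pairwise comparisons
adjusted_p = {}
for a, b in pairs:
    # Paired test on the within-individual differences between two treatments.
    result = stats.wilcoxon(treatments[a], treatments[b])
    # Bonferroni: multiply each p-value by the number of comparisons, cap at 1.
    adjusted_p[(a, b)] = min(1.0, result.pvalue * len(pairs))
    print(f"{a} vs {b}: W = {result.statistic:.1f}, adjusted p = {adjusted_p[(a, b)]:.3f}")
```

Since the overall Friedman’s test was non-significant, treat this purely as a demonstration of the mechanics.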

Ta da! We didn’t get any statistically significant results, who would have thought? Let’s report our findings anyways:

“Friedman’s test was used to examine whether the rate at which M. plastica decomposed 500 mL plastic bottles differed among treatment conditions. The test showed no statistically significant differences, χ²(2, N = 30) = 0.800, p = 0.670.”

Finished (for now). Go have some fun running more statistical analyses!

Keywords: AI Statistics, GPT, Non-parametric, Friedman’s Test, Descriptive Statistics, Outliers, Data preprocessing, Data visualization, post hoc analysis