Guide: Friedman's Test

Welcome to another post about running non-parametric tests in Julius! Today we are going to explore how to set up and execute a Friedman’s Test! Let’s take a look at this detailed example.

Prompt: You have discovered a new species of plastic-eating bacteria (Mangia plastica) located in the Pacific Rim National Park Reserve in Canada. Previous studies have shown that some plastic-eating bacteria perform better at higher temperatures. As an enthusiastic scientist, you aim to test this new species’ ability to decompose 500 mL plastic water bottles at three different temperatures: room temperature (20-22°C), warm (30°C), and freezing point (0°C). Each observation represents the rate of plastic decay (inches/hr) for an individual bacterial sample under each treatment condition. There was a rest period before each temperature change, and the order of treatments was randomized for each sample. A total of 30 samples were tested.

id treatment1 treatment2 treatment3
1 1.64 0.87 1.91
2 4.59 0.79 0.79
3 4.85 2.15 0.63
4 1.72 0.63 0.63
5 1.27 0.15 0.18
6 0.57 0.36 1.37
7 0.40 0.24 4.33
8 0.80 1.07 0.24
9 0.58 1.12 0.32
10 1.46 0.55 0.75
11 0.55 6.37 0.99
12 0.35 2.28 0.29
13 1.23 0.14 0.26
14 1.22 2.09 1.19
15 0.89 0.74 0.23
16 0.49 0.63 2.88
17 1.41 0.17 1.38
18 0.68 0.51 1.84
19 2.80 2.54 0.43
20 0.73 1.39 2.65
21 0.62 0.83 0.33
22 0.30 2.25 3.88
23 0.93 2.73 1.44
24 0.52 1.44 4.66
25 0.96 4.78 0.07
26 2.27 1.09 0.74
27 1.10 0.14 0.80
28 1.43 4.38 0.60
29 0.45 0.61 2.50
30 1.39 0.59 1.67

We aim to determine whether there are statistically significant differences in the rate of plastic decomposition by Mangia plastica under different temperature conditions.

Step 1: Descriptive Statistics and Visualizations

As highlighted in many of my previous guides, descriptive statistics are the first step in any robust analysis. So let’s prompt Julius!

Prompt: Can you perform descriptive statistics on my dataset please? Can you provide a visualization of my dataset as well?


Based on the descriptive statistics and boxplot, treatment 2 (the warm treatment) appears to have the highest average value and variability, followed by treatment 1 and then treatment 3. The boxplot shows that treatment 2 has the highest upper whisker (~4.5), suggesting a broader range of values than in the other treatments. There are also several outliers in each treatment, with treatment 2 showing the most extreme ones.
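If you would like to cross-check the summary outside of Julius, here is a minimal Python sketch that uses only the standard library. The list names and the `describe` helper are illustrative choices, not Julius output, and the quartiles use the "inclusive" interpolation method, which is an assumption about how the boxplot quartiles were computed.

```python
import statistics

# Decay rates (inches/hr) copied from the table above, one list per treatment.
treatment1 = [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
              0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
              0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39]
treatment2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
              6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
              0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]
treatment3 = [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
              0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
              0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67]


def describe(values):
    """Return basic summary statistics for one treatment column."""
    # Assumption: "inclusive" quartiles (linear interpolation over the sample).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


columns = {"treatment1": treatment1, "treatment2": treatment2, "treatment3": treatment3}
summary = {name: describe(values) for name, values in columns.items()}

for name, stats in summary.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

Small differences from the Julius table are possible if it uses a different quartile convention.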

Step 2: Testing Assumptions of the Friedman’s Test

Before running the Friedman’s Test, we need to verify that the data meets the following assumptions:

1. Dependent Samples & Independence: The data must consist of dependent samples, meaning that each individual must undergo all treatments. The individuals themselves must be independent of one another, which was achieved here by testing each sample in isolation.

2. Ordinal or Continuous Data: The test requires data to be either ordinal or continuous. Since our data is continuous, we can say that it passes this assumption.

3. Randomness: The assignment of treatment must be randomized for each individual. According to the prompt, the experiment was conducted with randomized treatment orders, passing this assumption.

4. No Significant Outliers: Friedman’s test can be sensitive to outliers, as they can interfere with the ranking system. The boxplot reveals some outliers, which may cause an issue. We can consult Julius on how to handle them:

Prompt: Are there any specific adjustments or considerations you’d like to discuss regarding the outliers in the dataset?


We have observed consistent outliers across the three treatments, with treatment 1 containing slightly more outliers than treatments 2 and 3. Based on this observation, we will first run the Friedman’s test without adjusting for outliers. If necessary, we will re-run the test on winsorized data.

5. Similarity of Experimental Conditions: Apart from the treatment variable, all other conditions should be the same across all treatments. This assumption is crucial, especially when experiments are conducted outside a controlled environment (e.g., a lab or a greenhouse). While environmental variables can vary in field studies, efforts should be made to maintain consistency.

6. Non-normality: Friedman’s test can handle non-normal distributions. Let’s run the Shapiro-Wilk test to check the normality of our dataset.

The Shapiro-Wilk test indicated that the data does not follow a normal distribution. Therefore, we can continue with the Friedman’s Test.
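The outlier check from assumption 4 can also be sketched in plain Python. This example assumes the usual boxplot rule (values more than 1.5 × IQR beyond the quartiles count as outliers) and "inclusive" quartiles; Julius may use a different convention, and the `iqr_outliers` helper is a hypothetical name introduced for illustration.

```python
import statistics

# Decay rates (inches/hr) copied from the table above, one list per treatment.
treatment1 = [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
              0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
              0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39]
treatment2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
              6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
              0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]
treatment3 = [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
              0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
              0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67]


def iqr_outliers(values, factor=1.5):
    """Return the sorted values lying outside the fences Q1/Q3 -/+ factor * IQR."""
    # Assumption: "inclusive" quartiles and the conventional factor of 1.5.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - factor * iqr
    upper_fence = q3 + factor * iqr
    return sorted(v for v in values if v < lower_fence or v > upper_fence)


columns = {"treatment1": treatment1, "treatment2": treatment2, "treatment3": treatment3}
outliers = {name: iqr_outliers(values) for name, values in columns.items()}

for name, flagged in outliers.items():
    print(name, flagged)
```

Under these assumptions, treatment 1 has three flagged values and treatments 2 and 3 have two each, which is in line with the observation above that treatment 1 contains slightly more outliers.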

Step 3: Running Friedman’s Test

Prompt: Let’s conduct a Friedmans test

The test results were not statistically significant, indicating that temperature did not significantly affect the rate at which M. plastica decomposed plastic water bottles.
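For readers curious about what happens under the hood, here is a minimal sketch of the Friedman statistic in plain Python. It ranks the three treatments within each sample (averaging tied ranks), sums the ranks per treatment, and applies the classic formula without a tie correction. The function names are illustrative, and the p-value shortcut exp(-χ²/2) is only valid for three treatments (2 degrees of freedom), so treat this as a sketch rather than a general implementation.

```python
import math

# Decay rates (inches/hr) copied from the table above, one list per treatment.
treatment1 = [1.64, 4.59, 4.85, 1.72, 1.27, 0.57, 0.40, 0.80, 0.58, 1.46,
              0.55, 0.35, 1.23, 1.22, 0.89, 0.49, 1.41, 0.68, 2.80, 0.73,
              0.62, 0.30, 0.93, 0.52, 0.96, 2.27, 1.10, 1.43, 0.45, 1.39]
treatment2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
              6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
              0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]
treatment3 = [1.91, 0.79, 0.63, 0.63, 0.18, 1.37, 4.33, 0.24, 0.32, 0.75,
              0.99, 0.29, 0.26, 1.19, 0.23, 2.88, 1.38, 1.84, 0.43, 2.65,
              0.33, 3.88, 1.44, 4.66, 0.07, 0.74, 0.80, 0.60, 2.50, 1.67]


def average_ranks(row):
    """Rank one sample's values from 1 (smallest) upward, averaging ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and row[order[end + 1]] == row[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def friedman_three_treatments(col_a, col_b, col_c):
    """Friedman chi-square (no tie correction) for exactly three treatments."""
    columns = (col_a, col_b, col_c)
    n, k = len(col_a), len(columns)
    rank_sums = [0.0] * k
    for row in zip(*columns):
        for idx, rank in enumerate(average_ranks(row)):
            rank_sums[idx] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    p_value = math.exp(-chi2 / 2)  # chi-square survival function when df = 2
    return rank_sums, chi2, p_value


rank_sums, chi2, p_value = friedman_three_treatments(treatment1, treatment2, treatment3)
print("rank sums:", rank_sums)
print("chi-square:", round(chi2, 3), "p-value:", round(p_value, 3))
```

Two samples (ids 2 and 4) contain tied values, so software that applies a tie correction may report a slightly different statistic than this uncorrected version.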

Exploring Winsorizing

Next, we will explore whether winsorizing the data impacts the test results. Winsorizing limits the extreme values in a dataset, replacing them with less extreme ones, to reduce the influence of outliers.


After running the Friedman’s Test on the winsorized dataset, we did not observe much of a difference in the p-value.
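To make the idea concrete, here is a small winsorizing sketch in plain Python applied to the warm treatment. The 5% limits on each tail are an assumption chosen for illustration (the limits Julius used are not stated here), and the `winsorize` helper is a hypothetical name rather than a Julius function.

```python
# Warm-treatment decay rates (inches/hr) copied from the treatment2 column above.
treatment2 = [0.87, 0.79, 2.15, 0.63, 0.15, 0.36, 0.24, 1.07, 1.12, 0.55,
              6.37, 2.28, 0.14, 2.09, 0.74, 0.63, 0.17, 0.51, 2.54, 1.39,
              0.83, 2.25, 2.73, 1.44, 4.78, 1.09, 0.14, 4.38, 0.61, 0.59]


def winsorize(values, lower=0.05, upper=0.05):
    """Clamp the lowest and highest fractions of values to the nearest kept value."""
    n = len(values)
    ordered = sorted(values)
    k_low = int(lower * n)    # number of values to pull up at the bottom
    k_high = int(upper * n)   # number of values to pull down at the top
    low_cap = ordered[k_low]
    high_cap = ordered[n - 1 - k_high]
    # Keep the original order so each value stays paired with its sample id.
    return [min(max(v, low_cap), high_cap) for v in values]


winsorized2 = winsorize(treatment2)
print("original max:", max(treatment2), "winsorized max:", max(winsorized2))
```

With 30 samples and 5% limits, only one value per tail is capped, so the 6.37 reading for sample 11 is replaced by the next-highest value. Because Friedman’s test works on within-sample ranks, capping a few extremes changes few ranks, which fits with the p-value barely moving.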

Post Hoc Analysis Consideration

Although our data did not show significant results, post hoc tests like Dunn’s test can be used for exploratory purposes. Here is an example of me running Dunn’s post hoc test.



Conclusion

We can now report our results:

“A Friedman’s test was used to examine whether there were statistically significant differences in the M. plastica decomposition rate of 500 mL plastic bottles between treatment conditions. The test yielded no significant differences, χ²(2, N = 30) = 0.800, p = 0.670.”

Finished (for now). Feel free to explore more statistical analyses!

Reference & More Reading

Pereira, D., Afonso, A., & Medeiros, F. (2015). Overview of Friedman’s Test and Post-hoc Analysis. Communications in Statistics - Simulation and Computation, 44, 2636-2653. doi:10.1080/03610918.2014.931971

Keywords: AI Statistics, GPT, Non-parametric, Friedman’s Test, Descriptive Statistics, Outliers, Data preprocessing, Data visualization, post hoc analysis


TLDR
The bacteria could potentially create algal blooms. “We need to address the problem at the source.” Numbers: χ²(2, N = 30) = 0.800, p = 0.670.

Thanks for the guide! I believe there’s a misunderstanding about the TLDR. This post isn’t about algal blooms; it’s about a synthetic study that examines how temperature and pollution could affect the rate of decay of plastic bottles by a fake species of bacteria.