Credits to Dr. Szuroczki, Dr. Gotanda, and the Brock 2Q04 Ecology for their original ideas, which inspired the creation of this guide!
From my previous post, we know a little more about some of the coding terms. Now let’s walk through a simple example using Base R functions.
(Please excuse the formatting of the code chunks; I am aware that the spacing makes them look a bit funky at times.)
Prompt: You work for an Integrated Pest Management company monitoring the different invertebrate species located in various vineyards within California. You were tasked with performing vine sweeps on the vines at Jim & Sons Vineyard Co. You have collected the following invertebrates from your vine sweeps.
| invertebrate_species | count |
|---|---|
| Frankliniella occidentalis | 128 |
| Hercinothrips femoralis | 169 |
| Erythroneura comes | 247 |
| Erythroneura ziczac | 26 |
| Empoasca fabae | 75 |
| Orius spp. | 38 |
| Nabis spp. | 75 |
| Macrosteles spp. | 49 |
Taking this information, create a frequency chart to visualize the differences in invertebrate species. Let’s start by prompting Julius to create a frequency table in R.
Bonus: who can tell me which species are beneficial, pest or neutral invertebrate species?
Step 1: Bring in the dataset and create a data.frame
Prompt: create a data.frame using this dataset shown in the photo.
Great! You should notice that Julius has not run the code yet. Instead, it created an R script, which is a text file containing the code that you would enter and run in R. I suggest doing this before you run any code, as it lets you preview the script first. The entire code is here:
# Create a data frame in R
# Both columns must sit inside the same data.frame() call
invertebrate_data <- data.frame(
  invertebrate_species = c(
    "Frankliniella occidentalis", "Hercinothrips femoralis",
    "Erythroneura comes", "Erythroneura ziczac",
    "Empoasca fabae", "Orius spp.",
    "Nabis spp.", "Macrosteles spp."
  ),
  count = c(128, 169, 247, 26, 75, 38, 75, 49)
)
# Print the data frame
print(invertebrate_data)
We can break this code down into two separate chunks. The first chunk is related to the invertebrate_species column, where Julius brought in the 8 invertebrate species found in the dataset.
The second chunk, starting with count, assigns the count of each invertebrate species. The order of the numerical values in the count vector must correspond to the order of the invertebrate species listed in the invertebrate_species vector. For example, we see that Frankliniella occidentalis is matched with the count of 128, and Hercinothrips femoralis is subsequently matched with 169.
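If you want to double-check that each species really is paired with the right count, here is a minimal sketch using only base R. It rebuilds the data frame from the table in the prompt and then looks up one count by species name; the lookup and the total are my own illustration, not something Julius produced.

```r
# Rebuild the data frame from the table in the prompt
invertebrate_data <- data.frame(
  invertebrate_species = c(
    "Frankliniella occidentalis", "Hercinothrips femoralis",
    "Erythroneura comes", "Erythroneura ziczac",
    "Empoasca fabae", "Orius spp.",
    "Nabis spp.", "Macrosteles spp."
  ),
  count = c(128, 169, 247, 26, 75, 38, 75, 49)
)

# Show the structure: 8 observations of 2 variables
str(invertebrate_data)

# Look up a count by species name to confirm the pairing
comes_count <- invertebrate_data$count[
  invertebrate_data$invertebrate_species == "Erythroneura comes"
]
print(comes_count)  # 247, matching the table

# Total number of invertebrates collected across all species
total_count <- sum(invertebrate_data$count)
print(total_count)  # 807
```

If the lookup returns a different number than the table shows, the two vectors are out of order.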
Step 2: Bringing in the Bar graph
Now let’s prompt Julius to keep this original script but add a plotting section.
Prompt: working off of this R script and using base R, add a code chunk that plots the count values on the y-axis and invertebrate_species on the x-axis.
Julius has added in a chunk of code that is related to plotting the dataset. Julius has also provided a little explanation on the code and has confirmed that the count values are plotted on the y-axis, and the invertebrate_species on the x-axis. Here is the remainder of the code chunk:
#Plotting the data
barplot(invertebrate_data$count, names.arg = invertebrate_data$invertebrate_species,
main = "Count of Invertebrate Species",
xlab = "Invertebrate Species",
ylab = "Count",
las = 2,
col = "blue")
We have a new function that we have not seen yet: the barplot() function. This is used, you guessed it, to make bar plots! Within the function call we can see several different arguments.
- invertebrate_data$count
  This tells R to plot the count column of the invertebrate_data data frame we created. The $ between invertebrate_data and count is what selects that column.
- names.arg = invertebrate_data$invertebrate_species
  The names.arg argument specifies the labels for the bars on the x-axis. As above, we are telling R to take the names from the invertebrate_species column of invertebrate_data and use them as the x-axis labels.
- main, xlab, ylab
  These specify the plot title, the x-axis title, and the y-axis title.
- las = 2
  This cute lil’ guy sets the orientation of the axis labels. With las = 2, labels are drawn perpendicular to the axis, so the species names run vertically along the x-axis. There are three other las values we can use:
  - las = 0: Labels are displayed parallel to the axis (horizontal for the x-axis, vertical for the y-axis). This is the default.
  - las = 1: Labels are always displayed horizontally. We can add the argument cex.names = to customize the size of the bar labels (cex.axis = does the same for the numeric axis).
  - las = 3: Labels are always displayed vertically.
- col = "blue"
  This tells R what colour we want the bars to be. You can change it to whatever you like! Base R can also fill the bars with stripe patterns through the density = and angle = arguments, and some add-on packages even offer patterns such as polka dots.
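To see what the four las values actually do, here is a small sketch that draws the same bars four times in a 2-by-2 grid, followed by one striped plot. It uses only the first three species to keep the panels readable; the variable names (species, counts, old_par, bar_midpoints) and the cex.names, density, and angle values are my own picks for illustration.

```r
# First three species from the dataset, to keep each panel small
species <- c("Frankliniella occidentalis", "Hercinothrips femoralis",
             "Erythroneura comes")
counts <- c(128, 169, 247)

# Split the plotting area into a 2 x 2 grid and remember the old setting
old_par <- par(mfrow = c(2, 2))

# Draw one panel per las value (0, 1, 2, 3)
for (las_value in 0:3) {
  barplot(counts,
          names.arg = species,
          main = paste("las =", las_value),
          las = las_value,
          cex.names = 0.6,  # shrink the bar labels
          col = "blue")
}

# Restore the single-panel layout
par(old_par)

# Striped bars: density sets how close the shading lines are,
# angle sets their slope in degrees
bar_midpoints <- barplot(counts,
                         names.arg = species,
                         las = 2,
                         cex.names = 0.6,
                         col = "blue",
                         density = 20,
                         angle = 45)
```

barplot() also returns the x-position of each bar's midpoint, which is handy later if you want to add text above the bars.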
Step 3: Run the code!
Now that we have the code ready, we can ask Julius to run it! Let’s do that:
Prompt: please run this code.
It looks like our code needs to be refined a bit, as we have some issues with formatting. Let’s ask Julius to help us add another code chunk that will allow us to fit the names of the invertebrate species on the x-axis.
Step 4: Adjusting the plot area
Prompt: how can I adjust the R script to get the names of the species to appear fully on the x-axis?
Julius has given us an additional section of code to add to our R script. This is called par(), which adjusts the margins to provide more space for us to fit the entirety of the x-axis labels. Let’s add this in and run the code!
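The exact lines Julius suggested are not reproduced here, so treat the following as a minimal sketch of the usual approach: widening the bottom margin through the mar argument of par(). The value of 12 lines for the bottom margin is an assumption that you would tune to your own labels.

```r
invertebrate_data <- data.frame(
  invertebrate_species = c(
    "Frankliniella occidentalis", "Hercinothrips femoralis",
    "Erythroneura comes", "Erythroneura ziczac",
    "Empoasca fabae", "Orius spp.",
    "Nabis spp.", "Macrosteles spp."
  ),
  count = c(128, 169, 247, 26, 75, 38, 75, 49)
)

# mar = c(bottom, left, top, right), measured in lines of text.
# The default is c(5, 4, 4, 2) + 0.1; here only the bottom is enlarged.
# par() returns the previous settings, so we keep them to restore later.
old_par <- par(mar = c(12, 4, 4, 2) + 0.1)

barplot(invertebrate_data$count,
        names.arg = invertebrate_data$invertebrate_species,
        main = "Count of Invertebrate Species",
        xlab = "Invertebrate Species",
        ylab = "Count",
        las = 2,
        col = "blue")

# Put the margins back the way they were
par(old_par)
```

Note that par() has to be called before barplot(), because the margins are fixed at the moment the plot is drawn.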
Looking better! We just need to adjust the x-axis title so that it does not intersect with the names of the species.
Prompt: How can I adjust the x-axis so that it does not overlap with the invertebrate species names?
Julius has recommended that we add in the function mtext() AFTER we plot the bar plot. It then explains what numbers it will assign to both the side = and the line =. For side = it has four values that it can take on:
1 corresponds to the bottom
2 corresponds to the left
3 corresponds to the top
4 corresponds to the right
For line =, it specifies which margin line the text is drawn on, counting outward from the edge of the plot. A simple rule of thumb is that 0 places the text right at the edge of the plot, positive values move it away from the plot area and further into the margin, and negative values move it into the plot area. You can play around with these values to see how they affect the result.
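Put together, the mtext() approach might look something like the sketch below. It assumes the built-in x-axis title is switched off with xlab = "" so that the two titles do not stack, and it uses hypothetical helper variables (bottom_margin, title_line) so the relationship between the margin size and the line = value is easy to see.

```r
invertebrate_data <- data.frame(
  invertebrate_species = c(
    "Frankliniella occidentalis", "Hercinothrips femoralis",
    "Erythroneura comes", "Erythroneura ziczac",
    "Empoasca fabae", "Orius spp.",
    "Nabis spp.", "Macrosteles spp."
  ),
  count = c(128, 169, 247, 26, 75, 38, 75, 49)
)

bottom_margin <- 12  # assumed bottom margin, in lines of text
title_line <- 9      # margin line on which the x-axis title is drawn

# Widen the bottom margin so there is room below the species names
old_par <- par(mar = c(bottom_margin, 4, 4, 2) + 0.1)

# xlab = "" leaves the x-axis title blank so mtext() can place it instead
barplot(invertebrate_data$count,
        names.arg = invertebrate_data$invertebrate_species,
        main = "Count of Invertebrate Species",
        xlab = "",
        ylab = "Count",
        las = 2,
        col = "blue")

# side = 1 is the bottom; a larger line value sits further from the plot
mtext("Invertebrate Species", side = 1, line = title_line)

par(old_par)
```

The title only shows up if title_line is smaller than the bottom margin; otherwise it is drawn off the edge of the figure.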
For now, let’s run the code and see what we get:
Great! The Invertebrate Species x-axis title does not overlap as much as it did before. However, it is still touching the Erythroneura ziczac name. So, let’s adjust it further to see if we can move it down a bit more.
Instead of prompting Julius again, I went in and edited the code manually by going to the upper right corner and clicking on edit code. There I changed line = 8 to line = 9. This moved the title down a little more, so there is now a slight gap between the title and the names (see figure below)!
There is one more adjustment I want to make (yes, yes, I know this is enough already!). I want to add a section of code that will increase the y-axis to 300 and adjust the order of the invertebrate species so that the species counts are descending from left to right. Let’s ask Julius for help on this!
Step 5: Adding the finishing touches
Prompt: can you adjust the R script to increase the y-axis to 300 (ylim code) and also order the invertebrate species from greatest to least from left to right on the x-axis please?
Julius added two more pieces of code here: order(-invertebrate_data$count), which sorts the species from the highest count to the lowest, and ylim = c(), which sets the y-axis limits. Now let’s run the entire code!
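As a rough sketch of how those two pieces fit into the script: the version below sorts the whole data frame by row before plotting, which is one common way to use order(). Julius may have applied order() to each column separately instead, and sorted_data is a name I made up for illustration.

```r
invertebrate_data <- data.frame(
  invertebrate_species = c(
    "Frankliniella occidentalis", "Hercinothrips femoralis",
    "Erythroneura comes", "Erythroneura ziczac",
    "Empoasca fabae", "Orius spp.",
    "Nabis spp.", "Macrosteles spp."
  ),
  count = c(128, 169, 247, 26, 75, 38, 75, 49)
)

# order() returns row positions; the minus sign makes the sort descending
sorted_data <- invertebrate_data[order(-invertebrate_data$count), ]

old_par <- par(mar = c(12, 4, 4, 2) + 0.1)

# ylim = c(0, 300) stretches the y-axis from 0 up to 300
barplot(sorted_data$count,
        names.arg = sorted_data$invertebrate_species,
        main = "Count of Invertebrate Species",
        xlab = "",
        ylab = "Count",
        ylim = c(0, 300),
        las = 2,
        col = "blue")
mtext("Invertebrate Species", side = 1, line = 9)

par(old_par)
```

Sorting the rows rather than a single column keeps every species name attached to its own count.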
This is the final product! Our invertebrate species now descend from left to right, and the y-axis has been extended. We could have shortened the names of our invertebrate species so that they fit without needing par(), side =, and line =. However, I wanted to introduce you to these pieces of code as you may need them in the future.
This has been a journey, but I hope that this has helped you understand some of the coding terms used in Base R.
Keywords: AI, R, RStudio, Frequency Table, coding, troubleshooting, bar graph, AI coding