If you’re here reading this, that means my title has intrigued you to learn more about R in Julius. This guide is a simple introduction to the different aspects of R, some common terms, and what coding with R looks like. The aim is to slowly introduce new users to the capabilities of R in Julius and how you can use it for your data analysis.
What Is R (ft. Alysha’s Storytime)
Great question! To give you an unwanted storytime of how I got introduced to R, I was essentially thrown into it during my convoluted thesis adventure!
My Master’s program required all graduate students to work as teaching assistants while completing their research at the institution. I was accepted as an Ecology TA since my thesis was in Ecology and Evolution. Did I know I was going to be teaching R? Absolutely not. Did I panic about teaching students something I was just learning? Naturally.
So, this is how my life with R began! I had absolutely no background knowledge in it, but I needed to learn how to navigate most of Base R within a week’s time, as I was going to teach it to students. #Graduatestudentproblems
Some Background on R
R was introduced in 1993 by Ross Ihaka and Robert Gentleman from the University of Auckland.(1) It is a very powerful resource for data scientists, researchers, academics, and basically anyone in between.(2,3) It is (mostly) user-friendly, as the coding is very straightforward once you get the hang of it. It is mainly used for data visualization and statistical analysis but is versatile thanks to community-made packages.(2) These packages extend the base functionality of R, allowing it to perform different tasks such as spatial analysis, bioinformatics, working with large language models (LLMs), web development, and much more.(2,4) If you want to learn more about R, check out this literature review on the history of R.
Terms with Examples
There will be many different terms I’ll be using when I work with R in Julius. I have already used some, but below I’ll highlight some common ones. This isn’t a comprehensive list, just a short sample of the terms.
R Packages: These are collections of R functions, data, and compiled code bundled together in a well-defined format. You can think of packages as toolboxes for R that help you perform specific tasks such as making graphs or analyzing data. Some packages come with example datasets to help you learn how to use them effectively.
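As a small sketch of the toolbox idea, the snippet below loads the datasets package (one of the packages that ships with R) and peeks at mtcars, one of its example datasets. The choice of mtcars and the variable name first_rows are mine, purely for illustration.

```r
# Load the 'datasets' package (it ships with R, so nothing needs installing)
library(datasets)

# Look at the first three rows of the built-in mtcars example dataset
first_rows <- head(mtcars, 3)
print(first_rows)

# When working interactively, this opens the help page for the dataset
# ?mtcars
```

In Julius you could ask for the same thing in plain language, but it helps to recognise what the generated code is doing.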
Function: This refers to a piece of the code that performs a specific task. You can call the function by its name and provide it with the desired inputs. The format usually looks like this:
results <- add_numbers(3, 5)
print(results)
Here, add_numbers is the function, and 3 and 5 are its inputs. (The function itself is defined in the Variable example below.)
Variable: A name that stores a value or data. In R code, assignment is written with the <- operator. For example, if I write ‘x <- 5’, this means that the value 5 is assigned to the variable x.
add_numbers <- function(a, b) {
  return(a + b)
}
results <- add_numbers(3, 5)
We assigned the result of the function add_numbers(3, 5) to the variable named ‘results’.
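To make the idea of a variable a little more concrete, here is a minimal sketch of how stored values behave when you reuse and reassign them. The names x and y are just placeholders for illustration.

```r
# Assign a value to a variable with <-
x <- 5

# Use the variable in a calculation and store the answer in a new variable
y <- x * 2

# Reassigning x does not change y, because y already stored the value 10
x <- 100

print(x)
print(y)
```

The takeaway is that a variable holds whatever was last assigned to it, and nothing updates automatically unless you run the assignment again.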
Vector: This is a sequence of data elements of the same type. You’ll usually see it written like: ‘c(1, 2, 3, 4)’. This creates a numeric vector with four elements.
# Creating a vector of numbers
numbers <- c(3, 5, 7, 9)
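Once a vector exists, a few Base R functions cover most everyday needs. The sketch below reuses the numbers vector from above; the other variable names are made up for this example. The last part shows what "same type" means in practice: mixing numbers and text turns everything into text.

```r
numbers <- c(3, 5, 7, 9)

# How many elements are in the vector?
n_elements <- length(numbers)

# Pull out the second element (R starts counting at 1)
second <- numbers[2]

# Many functions and operators work on the whole vector at once
average <- mean(numbers)
doubled <- numbers * 2

# Mixing types forces everything to a single type (here, character)
mixed <- c(1, "two", 3)
mixed_type <- class(mixed)

print(n_elements)
print(second)
print(average)
print(doubled)
print(mixed_type)
```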
Data Frame: A tabular data structure used for storing datasets, similar to a spreadsheet: each column holds one variable and each row holds one record. You can build a data frame from an existing dataset or by entering the data manually.
# Creating a data frame using data.frame()
random_data <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 28),
  Gender = c("Male", "Female", "Male"))
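After creating a data frame, you will usually want to inspect it and pull pieces out of it. Here is a rough sketch using the same random_data example; the variable names ages, older, n_rows, and n_cols are mine.

```r
random_data <- data.frame(
  Name = c("John", "Alice", "Bob"),
  Age = c(25, 30, 28),
  Gender = c("Male", "Female", "Male"))

# Show the structure: column names, types, and a preview of the values
str(random_data)

# Pull out a single column with $
ages <- random_data$Age

# Keep only the rows where Age is greater than 26
older <- random_data[random_data$Age > 26, ]

# Count rows and columns
n_rows <- nrow(random_data)
n_cols <- ncol(random_data)

print(ages)
print(older)
```

Note the comma inside the square brackets: what comes before it selects rows, and what comes after it (left empty here) selects columns.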
Factor: A type of data that is used to represent categorical variables.
# Creating a vector of categorical data
gender <- c("Male", "Female", "Female", "Male", "Male")
# Converting the vector to a factor
gender_factor <- factor(gender)
# Viewing the levels of the factor
levels(gender_factor)
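To see what a factor gives you beyond a plain character vector, the sketch below counts each category and shows the integer codes R stores behind the labels. It reuses gender and gender_factor from above; the other names are illustrative.

```r
gender <- c("Male", "Female", "Female", "Male", "Male")
gender_factor <- factor(gender)

# Levels are sorted alphabetically by default
gender_levels <- levels(gender_factor)

# Count how many observations fall in each category
gender_counts <- table(gender_factor)

# Behind the scenes, each category is stored as an integer code
gender_codes <- as.integer(gender_factor)

print(gender_levels)
print(gender_counts)
print(gender_codes)
```

Because "Female" comes first alphabetically, it becomes level 1 and "Male" becomes level 2, which matters when you read model output or plot legends.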
Library: The folder on your system where installed R packages are stored. The library() function loads an installed package into your current session so you can use it.
install.packages(): This function installs packages that are available for R. Note the lowercase ‘i’: R is case-sensitive, so Install.packages() will not work.
Base R: These are the packages that are included in every R installation. They include: base, compiler, datasets, graphics, grDevices, grid, methods, parallel, splines, stats, stats4, tcltk, tools, and utils.(4) You can load any of these packages by prompting Julius with, for example, library(stats).
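The last three terms fit together as "install once, load every session". Here is a minimal sketch of that workflow. The package name ggplot2 is only an example I picked, and the install line is commented out so the snippet runs without downloading anything.

```r
# Install a package once (needs an internet connection), then load it each session
# install.packages("ggplot2")
# library(ggplot2)

# Base R packages are already installed, so they can be loaded straight away
library(stats)

# Check whether a package is installed without attaching it
has_stats <- requireNamespace("stats", quietly = TRUE)

# .libPaths() lists the library folders where installed packages live
library_folders <- .libPaths()

print(has_stats)
print(library_folders)
```

In Julius the environment is managed for you, so how often you need the install step yourself may vary; treat this as the general R pattern rather than a Julius-specific instruction.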
Reminders:
Like Python, R is very sensitive to how your dataset is formatted, so it is best to keep column names as simple as possible: avoid capital letters, spaces, and special characters (#@%^&*), and stick to short, simple titles.
For example, if you named one of your columns Bear_data and then, while coding, drop the capital B and type bear_data, R will treat it as a different name and will not find the column, because R is case-sensitive. Thankfully, Julius should catch this issue and fix it, as it is very good at troubleshooting. But it is best to avoid this issue altogether.
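The Bear_data situation can be reproduced in a few lines. This is a rough sketch with a made-up data frame called bears; lowercasing every column name up front is one possible habit, not the only fix.

```r
# A small made-up data frame with a capitalised column name
bears <- data.frame(Bear_data = c(1, 2, 3))

# R is case-sensitive: asking for the wrong capitalisation finds nothing
wrong_name <- bears$bear_data   # NULL, because no column has this exact name
right_name <- bears$Bear_data

# One way to avoid the problem: make every column name lowercase up front
names(bears) <- tolower(names(bears))
clean_names <- names(bears)

print(is.null(wrong_name))
print(right_name)
print(clean_names)
```

Notice that the wrong capitalisation quietly returns NULL here instead of stopping with an error, which is exactly why this kind of slip can be hard to spot.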
References used:
- Ihaka, R. (2008). The R Project: A Brief History and Thoughts About the Future (PDF) (p. 12). Retrieved from Auckland University website. (Archived (PDF) from the original on 28 December 2022)
- Staples T. 2023. Data from: Diversification and change in the R programming language. Dryad Digital Repository. (10.5061/dryad.h18931zrg)
- Pechenick EA, Danforth CM, Dodds PS. 2015. Characterizing the Google Books corpus: strong limits to inferences of socio-cultural and linguistic evolution. PLoS ONE 10, e0137041.
- Giorgi, F. M., Ceraolo, C., & Mercatelli, D. (2022). The R Language: An Engine for Bioinformatics and Data Science. Life, 12, 648. https://doi.org/10.3390/life12050648
Keywords: R, AI, Base R, coding, data frame, vectors