Have you ever wondered what the daily movements of bears in Pennsylvania looked like? No? Just me? Well now we can plot the movement of species using R in Julius! This walkthrough will show you how to use a simple dataset to plot the movement of three bear species in Pennsylvania. Let’s get started!
Prompt: You own a wildlife rescue centre in Pennsylvania and are interested in monitoring the release of three bear species: black bear, polar bear, and grizzly bear (in real life there are no polar bears in PA, of course). You place a non-invasive transmitter collar on each bear to monitor their movement across PA. You have collected the following data and want to map out the area they have been occupying.
Note: I will be giving the entire R code at the end of this guide for you to examine.
Step 1: Load in your dataset
We can start by asking Julius to load the dataset into the chat. Since it is stored in a Google Sheet, I can ask Julius to fetch it for me by providing a link to the sheet.
Prompt: load dataset from: animal_movement_data - Google Sheets
Code output by Julius:
library(googlesheets4)
gs4_deauth()
sheet_url <- 'https://docs.google.com/spreadsheets/d/1axYVyt1Qll6D60OA9ZjdL2n2FLVy0kSby82j071IgUs/edit?usp=sharing'
sheet <- read_sheet(sheet_url)
print(head(sheet))
Breakdown of code:
If you are new to R, here is a breakdown of what each line of code means:
- library(googlesheets4)
R packages are collections of functions that make it easier to work with your data, and library() loads an installed package so that its functions are available. Since our dataset is in a Google Sheet, we are using the googlesheets4 package to read the dataset into R.
- gs4_deauth()
This function tells googlesheets4 not to send any Google login credentials with its requests. That works for sheets that anyone with the link can view, so no sign-in is needed.
- sheet_url
This line stores the link to the sheet in an object named sheet_url.
- sheet <- read_sheet(sheet_url)
This function tells R to read the sheet found at sheet_url and store its contents in an object called sheet.
- print(head(sheet))
This tells R to print the first few rows of sheet (six by default).
Julius Answer:
Here is the final output from this code block:
The dataset contains time, bears, x and y coordinates, and longitude and latitude coordinates. We are going to focus on the longitude and latitude coordinates when mapping spatial data.
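If you want to double-check what kind of data each column holds before plotting, here is a minimal sketch. Since the real sheet needs an internet connection, it uses a tiny made-up data frame called demo_sheet as a stand-in. The values and the format of the time column are assumptions on my part; only the column names bears, long, and lat come from the code used later in this guide.

```r
# Stand-in for `sheet`: made-up values, NOT the real tracking data
demo_sheet <- data.frame(
  time  = 1:4,  # assumed format; the real column may look different
  bears = c("Black Bear", "Polar Bear", "Grizzly Bear", "Black Bear"),
  long  = c(-77.9, -76.5, -79.2, -78.1),
  lat   = c(40.8, 41.2, 40.3, 41.0)
)

# The type stored in each column, as a named character vector
column_types <- sapply(demo_sheet, class)
print(column_types)

# Compact overview: number of rows, column types, and the first few values
str(demo_sheet)
```

With the real data you would run the same two calls on sheet instead of demo_sheet. The coordinate columns should come back as numeric; if they do not, that is a hint the sheet contains some text where numbers were expected.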
This entire R code was given to us based on that specific prompt. For people who are new to R, this is a great way to learn it! I encourage you to follow along with this dataset to learn more about how to prompt Julius when working in R and what each segment of code is.
For people who are R veterans, I encourage you to write the code you know and see what Julius does with it. As I continue to go through this example, I will explain each section of code as we go. This will give new users a chance to understand each aspect of the code as it pertains to R.
Step 2: Check to see if the species column is read in
Prompt: name the different bears
Code output by Julius:
unique_bears <- unique(sheet$bears)
print(unique_bears)
Breakdown of code:
- unique_bears <- unique(sheet$bears)
There is a lot to unpack here, but let’s start with unique(sheet$bears). Within the brackets, you can see the code sheet$bears. The sheet part tells R to look at the Google Sheet we loaded. The $ sign after sheet tells R which specific column you want it to look at; for us this is the column named bears.
The unique function returns each distinct value in the bears column once, which gives us the species names without repeats.
The <- tells R to store the result of unique(sheet$bears) in an object called unique_bears, so we can refer to it by that name later.
- print(unique_bears)
This prints the result. In our case it shows the species of bears in the bears column.
Julius Answer:
Now that I know which three species of bears are present in the dataset, let’s move on to bringing in the map that we are going to be plotting the latitude and longitude data on.
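Before moving on, a related check you might find handy is counting how many location records each species has. This is a small sketch using base R's table() on a made-up stand-in column (the values are hypothetical, not the real dataset); with the real data you would pass sheet$bears instead.

```r
# Made-up stand-in for the bears column of `sheet`
demo_sheet <- data.frame(
  bears = c("Black Bear", "Polar Bear", "Grizzly Bear",
            "Black Bear", "Polar Bear", "Black Bear")
)

# table() counts how many times each distinct value appears
bear_counts <- table(demo_sheet$bears)
print(bear_counts)
```

If one species has far fewer rows than the others, its points will look sparse on the map, which is worth knowing before you read too much into the plot.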
Step 3: Bringing in Map of Pennsylvania
Prompt: Create a map of Pennsylvania using the library(maps) function and load it.
Code output by Julius:
library(maps)
penn_map <- map('state', region = 'pennsylvania', fill = TRUE, col = 'gray', bg = 'white')
Breakdown of code:
- library(maps)
As you can probably guess, this tells R to load a package called maps, which provides map data for regions such as countries, states, and provinces, depending on the map database you ask for. Of course, there are other packages that can do the same, such as library(osm), library(rnaturalearth), and library(basemaps). But for a simple representation of the bear locations, a plain outline of the state of PA is sufficient.
- penn_map <- map('state', region = 'pennsylvania', fill = TRUE, col = 'gray', bg = 'white')
The map() function draws a map from one of the databases that come with the maps package. Everything within the brackets tells R which area we want to focus on. For this example, 'state' selects the database of US state boundaries, and region = 'pennsylvania' narrows it down to Pennsylvania. The fill = TRUE is just telling R that we want the polygon representing Pennsylvania to be filled in with a colour. The col = 'gray' means we want the polygon to be gray, and bg = 'white' sets the background colour to white. As you already know, the <- means that the result is stored under the name penn_map.
Visual output of code:
Solid, now we have the map of Pennsylvania ready to go! What’s next? Time to plot the longitude and latitude coordinates for each bear onto this map.
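Before layering points, it can help to know which coordinate range the map actually covers, so you can spot locations that would land outside the state. Here is a rough sketch, assuming the maps package is installed. It calls map() with plot = FALSE so nothing is drawn, and reads the range element of the result. The two demo points are made up for illustration, and a bounding box is only a coarse check, since the state is not a perfect rectangle.

```r
library(maps)

# Fetch the outline without drawing it; the result is a list
pa_outline <- map('state', region = 'pennsylvania', plot = FALSE)

# range holds: min longitude, max longitude, min latitude, max latitude
pa_range <- pa_outline$range
print(pa_range)

# Two made-up points: the first is in central PA, the second is far to the east
demo_points <- data.frame(long = c(-77.9, -70.0), lat = c(40.8, 41.0))

# TRUE when a point falls inside the bounding box of the state
inside_box <- demo_points$long >= pa_range[1] & demo_points$long <= pa_range[2] &
  demo_points$lat >= pa_range[3] & demo_points$lat <= pa_range[4]
print(inside_box)
```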
Step 4: Layering the longitude and latitude coordinates onto the map
Prompt: layer all three species latitude and longitude coordinates in the dataset onto the map of Pennsylvania we just created.
Note: this code may look intimidating to new users, but do not get too overwhelmed as I will go over each aspect of code that we have not seen yet.
Code output by Julius:
library(maps)
library(ggplot2)
# Create a base map of Pennsylvania
penn_map <- map_data('state', region = 'pennsylvania')
# Convert the sheet data to a data frame for ggplot
bear_data <- data.frame(long = as.numeric(sheet$long),
                        lat = as.numeric(sheet$lat),
                        species = as.factor(sheet$bears))
# Plot the map with bear locations
penn_plot <- ggplot() +
  geom_polygon(data = penn_map, aes(x = long, y = lat, group = group),
               fill = 'gray', color = 'white') +
  geom_point(data = bear_data, aes(x = long, y = lat, color = species),
             size = 3, alpha = 0.6) +
  scale_color_manual(values = c('Black Bear' = 'black',
                                'Grizzly Bear' = 'brown',
                                'Polar Bear' = 'blue')) +
  labs(title = 'Bear Species in Pennsylvania', x = 'Longitude', y = 'Latitude') +
  theme_minimal()
# Display the plot
print(penn_plot)
Breakdown of code:
- library(ggplot2)
This is one of the most popular packages in R for data analysis (from my experience anyways). This package is useful for creating graphics and visualizations in R.
- penn_map <- map_data('state', region = 'pennsylvania')
This code looks slightly different from the code we wrote earlier, but it is still creating the base map of Pennsylvania. The difference is that map_data() returns the outline as a data frame of coordinates that ggplot2 can draw, instead of drawing the map right away. As you can see, we are missing the fill = TRUE, col = 'gray', and bg = 'white' portion of the code. The colours will be set later on, inside the plot.
- bear_data <- data.frame(long = as.numeric(sheet$long), lat = as.numeric(sheet$lat), species = as.factor(sheet$bears))
This line of code converts the data from the sheet object (the information found in the Google Sheet) into a data frame that can be plotted using the ggplot2 package. The data.frame() function tells R that we are creating a new data frame based on the sheet. The data frame will consist of the following:
a. long = as.numeric(sheet$long) & lat = as.numeric(sheet$lat)
This tells R that we are creating a column called “long” in the new data frame. The as.numeric() function makes sure that the values R reads from the “long” column are stored as numbers. The same is done for the “lat” column. The new data frame now has the coordinate values from the original dataset.
b. species = as.factor(sheet$bears)
The new data frame will also have a column labeled species. The as.factor(sheet$bears) tells R to treat the values from the bears column in the Google Sheet as categories (factors), one per species.
- penn_plot <- ggplot() + …
This line begins constructing the plot using the R package ggplot2. Each + adds the next layer or setting to the plot.
- geom_polygon(data = penn_map, aes(x = long, y = lat, group = group), fill = 'gray', color = 'white') + …
This is the layer responsible for adding the polygon that represents the borders of Pennsylvania. You can see it uses the map we created, penn_map, in the data = argument. The aes() is known as the aesthetics function, and it is used to map variables in the data to properties of the plot. This can be position, colour, shape, size, transparency, etc.
The code within the brackets of aes() tells R that the x-axis will be longitude and the y-axis will be latitude. This will make more sense once you see the plot. The group = group part tells ggplot2 which rows of coordinates belong to the same polygon, so that the outline is drawn as separate closed shapes rather than one continuous line joining every point. The fill = 'gray' and color = 'white' are the colours we left out earlier: gray for the inside of the polygon and white for its border.
- geom_point(data = bear_data, aes(x = long, y = lat, color = species), size = 3, alpha = 0.6) + …
This line is responsible for adding the layer of points to the plot representing the location of the bears. It takes the bear_data data frame we created as the source of the data points, which is what data = bear_data means. The aes() part tells you that the longitude coordinates will be plotted on the x-axis and the latitude on the y-axis. The color = species tells R that we want different colors for each species so that we can distinguish between them. After the closing bracket of aes(), we have the code size = 3, which specifies the size of the data points being plotted. The code alpha = 0.6 sets the transparency of the data points, where 1 is fully solid and 0 is invisible.
- scale_color_manual(values = c('Black Bear' = 'black', 'Grizzly Bear' = 'brown', 'Polar Bear' = 'blue')) + …
This line is responsible for specifying the colour of each bear species. Although we told R in the previous line that we want color = species, we did not specify which colours until this line. The values = c() tells R which colour to use for which species. You can see inside the brackets that 'Black Bear' = 'black', 'Grizzly Bear' = 'brown', and 'Polar Bear' = 'blue'. The names have to match the values in the species column exactly.
- labs(title = 'Bear Species in Pennsylvania', x = 'Longitude', y = 'Latitude') + theme_minimal()
The labs() line is responsible for placing a title over the plot as well as the x- and y-axis labels. This line is pretty nifty when it comes to formatting, so keep this friend on hand. The addition of + theme_minimal() just applies a minimalistic theme to the plot and adjusts its appearance to not be as bold. If you’re curious about what it does exactly, try taking it out of the code and see what happens.
- print(penn_plot)
Prints the picture of our plot! Let’s see what it looks like.
Awesome! We now have a map with the longitude and latitude coordinates, the three different bear species with their representative colours, a legend AND a title! Oh, and it is on a map of Pennsylvania.
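If fewer points show up than you expected, the as.numeric() step is a good place to look. When a coordinate cell contains text that is not a number, as.numeric() turns it into NA (with a warning), and geom_point() then drops that row from the plot. Here is a minimal sketch of how you could find such rows. The raw_long values, including the 'n/a' entry, are made up for illustration and are not from the real sheet.

```r
# Made-up longitude values as text, with one entry that is not a number
raw_long <- c("-77.9", "-76.5", "n/a", "-78.1")

# as.numeric() converts what it can; anything else becomes NA
# (suppressWarnings() hides the "NAs introduced by coercion" warning)
clean_long <- suppressWarnings(as.numeric(raw_long))

# Row numbers where the conversion failed
bad_rows <- which(is.na(clean_long))
print(bad_rows)

# as.factor() stores each distinct name once as a "level"
species <- as.factor(c("Black Bear", "Polar Bear", "Black Bear"))
print(levels(species))
```

With the real data you could run the same check on sheet$long and sheet$lat and then look at those rows in the Google Sheet.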
We can take this a step further by adding in heatmaps, but I think this was a lot for one post, so I encourage you to find the code on how to make a heatmap for this dataset (hint: try asking Julius to add some code to create a heat map).
I hope this guide has helped you learn a little bit about coding in Julius with R. Although R can be kind of intimidating when you first start, I highly recommend breaking the code into little chunks like we did here to really understand what is going on in each step. I promise you that coding gets easier as you do it more, so I encourage you to play around with some of the lines and change the colour, the size of the dots, the title, etc.
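To give you a starting point for playing around, here is a small sketch that changes the colours, the dot size, and the title. It assumes ggplot2 is installed and uses a made-up data frame called demo_bears instead of the real sheet, so the points themselves are hypothetical. It also leaves out the map layer to keep things short.

```r
library(ggplot2)

# Made-up bear locations, NOT the real tracking data
demo_bears <- data.frame(
  long    = c(-77.9, -76.5, -79.2, -78.1),
  lat     = c(40.8, 41.2, 40.3, 41.0),
  species = factor(c("Black Bear", "Polar Bear", "Grizzly Bear", "Black Bear"))
)

tweaked_plot <- ggplot(demo_bears, aes(x = long, y = lat, color = species)) +
  geom_point(size = 5, alpha = 0.9) +  # bigger, less transparent dots
  scale_color_manual(values = c('Black Bear' = 'darkgreen',
                                'Grizzly Bear' = 'orange',
                                'Polar Bear' = 'purple')) +
  labs(title = 'My Tweaked Bear Plot', x = 'Longitude', y = 'Latitude') +
  theme_minimal()

print(tweaked_plot)
```

Try changing one value at a time and re-running the code, so you can see exactly what each setting controls.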
Happy coding!
Full Code used
For people starting out in R, we typically use # to mark comments, which are notes for the reader that R does not run. You can see that #load the google sheet sits above the code that loads the Google Sheet into R.
You also may notice that before we start loading the Google Sheet, we load all of the libraries required for the entire script. This is the usual convention for R scripts.
library(maps)
library(ggplot2)
library(googlesheets4)
#load the google sheet
gs4_deauth()
sheet_url <- 'https://docs.google.com/spreadsheets/d/1axYVyt1Qll6D60OA9ZjdL2n2FLVy0kSby82j071IgUs/edit?usp=sharing'
sheet <- read_sheet(sheet_url)
print(head(sheet))
#get the distinct values in the bears column, store them as unique_bears, then print them
unique_bears <- unique(sheet$bears)
print(unique_bears)
#Create a base map of Pennsylvania
penn_map <- map_data('state', region = 'pennsylvania')
#Convert the sheet data to a data frame for ggplot
bear_data <- data.frame(long = as.numeric(sheet$long),
                        lat = as.numeric(sheet$lat),
                        species = as.factor(sheet$bears))
#Plot the map with bear locations
penn_plot <- ggplot() +
  geom_polygon(data = penn_map, aes(x = long, y = lat, group = group),
               fill = 'gray', color = 'white') +
  geom_point(data = bear_data, aes(x = long, y = lat, color = species),
             size = 3, alpha = 0.6) +
  scale_color_manual(values = c('Black Bear' = 'black',
                                'Grizzly Bear' = 'brown',
                                'Polar Bear' = 'blue')) +
  labs(title = 'Bear Species in Pennsylvania', x = 'Longitude', y = 'Latitude') +
  theme_minimal()
#Display the plot
print(penn_plot)
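One thing that can trip up new users running this outside of Julius: library() only works for packages that are already installed on your machine. Here is a small sketch, using only base R, that checks which of the three packages are missing. The install.packages() line is left commented out so nothing gets installed unless you choose to.

```r
# The three packages this guide relies on
needed <- c("maps", "ggplot2", "googlesheets4")

# requireNamespace() returns TRUE when a package is installed and can be loaded
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All packages are installed.")
}
```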
Keywords: mapping, spatial analysis, AI, data visualization, longitude, latitude, ggplot2, R, RStudio, Maps, AI coding