Getting to grips with R functions
The situation
In R, as with most languages, variables have a scope in which they can be used.
When you use functions in R, any variable assigned inside the function with the usual <- can’t be used outside the function afterwards.
This is because those variables are stored in an environment that R creates for the function call, separate from the environment the function was called from.
interestingFunction <- function() {
# Assign a variable within the function
thisVariable <- 'the value of thisVariable'
# A print statement that works
print(thisVariable)
}
# Run the function
interestingFunction()
# A print statement that doesn't work: it stops with an "object not found" error
print(thisVariable)
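As a minimal sketch of the same behaviour that does not halt a script, exists() reports whether a name is bound, and tryCatch() captures the error that the failing print would raise. The names scopedFunction, localOnly, insideResult, outsideResult and errorMessage are made up for this illustration.

```r
# Hypothetical function, used only for this illustration
scopedFunction <- function() {
  localOnly <- "visible only inside the function"
  # Inside the function the name is bound, so this evaluates to TRUE
  exists("localOnly")
}

insideResult <- scopedFunction()

# Outside the function the name is not bound
outsideResult <- exists("localOnly")

# Capture the error message instead of letting the error stop the script
errorMessage <- tryCatch(
  {
    print(localOnly)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(insideResult)
print(outsideResult)
print(errorMessage)
```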
Using return()
That’s normally fine, because you can use return() to pass a value out of the function and store it for use elsewhere.
Note that the name can change on the way out: the value takes whatever variable name you assign the function’s output to.
interestingFunction <- function() {
# Assign a variable within the function
thisVariable <- 'the value of thisVariable'
# A print statement that works
print(thisVariable)
return(thisVariable)
}
# Run the function, assigning the output to thatVar
thatVar <- interestingFunction()
# A print statement that works, with a different name
print(thatVar)
Using super assignment <<- (not recommended, see below)
Let’s say you have a function that returns a variable, and you want to run it over a list of inputs.
You have heard that R devs frown upon people using for loops, so you want to use lapply() instead.
You create a function which is meant to manipulate a value outside of it.
You can’t simply use return() here, because lapply() collects the return value of each call into a list: you end up with a list of the individual outputs of each run of the function, rather than one updated value.
Super assignment solves the issue by letting the function assign to a variable outside its own scope.
# Define the inputs (c() creates a vector, despite the name), and the initial value of the output var
listOfInputs <- c(1, 5, 9, 3, 5)
outVar <- 1
# Define function
interestingFunction <- function(inVar) {
# Assign a variable within the function
outVar <<- outVar * inVar
}
# Run the function using lapply, then print the final result
lapply(listOfInputs, FUN = interestingFunction)
print(outVar)
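To see what return() gives you in this situation, the sketch below runs a returning version of the function through lapply() and inspects the result. It also uses Reduce(), a base R function that passes a running value from one call to the next, as one possible way to get the same product without <<-. The names timesStart, perElement and runningProduct are illustrative and not part of the example above.

```r
listOfInputs <- c(1, 5, 9, 3, 5)

# Hypothetical helper: returns a value instead of modifying anything outside
timesStart <- function(inVar, start = 1) {
  return(start * inVar)
}

# lapply() collects one return value per input into a list
perElement <- lapply(listOfInputs, FUN = timesStart)
print(class(perElement))
print(length(perElement))

# Reduce() feeds the accumulated value into each call, starting from 1,
# so no variable outside the function has to be modified
runningProduct <- Reduce(function(acc, inVar) acc * inVar, listOfInputs, 1)
print(runningProduct)
```

Whether this reads better than super assignment depends on the task. It only applies when each step needs nothing but the previous result and the next input.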
Nested functions don’t work well with super assignment
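Super assignment behaves slightly differently in nested functions. <<- searches the enclosing environments, starting with the parent of the current function, and modifies the first existing variable of that name that it finds. So if the variable you are changing in an inner function is also defined in the outer function, the outer function’s copy is changed and the global one is left alone. The assignment only reaches the global environment when no enclosing function defines the variable.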
# Define the outermost outVar
outVar <- 1
print(paste("Before running:", outVar)) # outVar = 1
# Define the nested functions
outerFunction <- function() {
# Define outVar within this outer function
outVar <- 5
print(paste("In the outer function, before the inner:", outVar)) # outVar = 5
# Define inner function
innerFunction <- function() {
outVar <<- 9
print(paste("After assignment in the inner:", outVar)) # outVar = 9
}
innerFunction()
print(paste("In the outer function, after the inner:", outVar)) # outVar = 9
}
outerFunction()
# Check the global outVar, which is unchanged
print(paste("Outside again:", outVar)) # outVar = 1
To get the value all the way out, you can pass the assignment out in a stepwise manner. The inner function super-assigns to the outer function’s variable, and the outer function then super-assigns that value again so that it reaches the global one.
# Define the outermost outVar
outVar <- 1
print(paste("Before running:", outVar)) # outVar = 1
# Define the nested functions
outerFunction <- function() {
# Define outVar within this outer function
outVar <- 5
print(paste("In the outer function, before the inner:", outVar)) # outVar = 5
# Define inner function
innerFunction <- function() {
outVar <<- 9
print(paste("After assignment in the inner:", outVar)) # outVar = 9
}
innerFunction()
print(paste("In the outer function, after the inner:", outVar)) # outVar = 9
outVar <<- outVar
}
outerFunction()
# Check the global outVar, which has now been updated
print(paste("Outside again:", outVar)) # outVar = 9
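One case the two snippets above do not cover is what happens when the outer function does not define the variable at all. As a sketch (the names counterTotal, outerNoLocal, innerAssign and createdLocally are made up for this example), <<- then keeps searching upward past the outer function and updates the variable defined outside it:

```r
counterTotal <- 1

outerNoLocal <- function() {
  # Note: no counterTotal is defined in this function
  innerAssign <- function() {
    counterTotal <<- 9
  }
  innerAssign()
  # Report whether the assignment created a variable local to outerNoLocal()
  return(exists("counterTotal", envir = environment(), inherits = FALSE))
}

createdLocally <- outerNoLocal()
print(createdLocally)
print(counterTotal)
```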
Specifying environments for assignment (recommended)
Super assignment is useful, but it isn’t very explicit about where the variable ends up. By working with environments directly, you can specify exactly where you want to put a variable.
What’s an environment?
The first few paragraphs of this article describe what environments are.
When you make functions, whatever happens inside the function isn’t visible to the outside, so any variables you assign within the function belong to the environment of that function.
That’s why you need to use return(), <<-, or something similar to get values out.
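A small sketch can make this concrete. Calling environment() with no arguments inside a function returns the environment of that call, so a function can hand its own environment back for inspection. The names whereAmI, localVar, firstCall and secondCall are hypothetical.

```r
# Hypothetical helper that returns the environment created for its own call
whereAmI <- function() {
  localVar <- "only here"
  return(environment())
}

firstCall <- whereAmI()
secondCall <- whereAmI()

# Each call gets a fresh environment, distinct from the global one
print(is.environment(firstCall))
print(identical(firstCall, secondCall))
print(identical(firstCall, globalenv()))

# The variable assigned inside the function lives in that call's environment
print(ls(firstCall))
print(get("localVar", envir = firstCall))
```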
How to use environments
The main functions that you will need are assign() and environment().
In this example, there are two nested functions outerFunction() and innerFunction().
The variable outVar is assigned in three places:
- Outside the functions, in the “global environment”, where it can be accessed by everything
- At the beginning of outerFunction()
- At the beginning of innerFunction()
An explanation of how this code works follows the snippet.
# Define the outermost outVar
outVar <- 1
print(paste("Before running:", outVar)) # outVar = 1
# Define the nested functions
outerFunction <- function() {
# Define outVar within this outer function
outVar <- 5
print(paste("In the outer function, before the inner:", outVar)) # outVar = 5
envName <- environment()
# Define inner function, which assigns to the global environment and to the outer function's environment, and run it
innerFunction <- function() {
outVar <- 13
assign("outVar", 9, envir = .GlobalEnv)
assign("outVar", 8, envir = envName)
print(paste("In the inner function:", outVar)) # outVar = 13
}
innerFunction()
print(paste("In the outer function, after the inner:", outVar)) # outVar = 8
}
# Run the function and check the output
outerFunction()
print(paste("Outside again:", outVar)) # outVar = 9
How this code works
The assign() function can be used with three parameters: the name of the variable, the value you want to store in the variable, and the environment in which you want to store your variable.
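As a minimal sketch of those three parameters in isolation, the snippet below assigns into a standalone environment made with new.env() and reads the value back with get(). The names scratchEnv and storedValue are hypothetical, chosen only for this illustration.

```r
# A standalone environment to assign into (hypothetical name)
scratchEnv <- new.env()

# Name of the variable, value to store, environment to store it in
assign("outVar", 8, envir = scratchEnv)

# Read the value back, looking only in that environment
storedValue <- get("outVar", envir = scratchEnv, inherits = FALSE)
print(storedValue)

# List what the environment now contains
print(ls(scratchEnv))
```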
Within innerFunction(), the assign() function overwrites the outVar that’s in the second line of the snippet, by specifying .GlobalEnv as the environment you want to assign it to.
Two lines work together to store 8 in the outVar that belongs to outerFunction().
These are:
envName <- environment()
from the outerFunction(), and:
assign("outVar", 8, envir = envName)
in the innerFunction().
Every function call has its own environment, and to assign into it we need a reference to that environment.
The first line gets that reference using environment(), which returns the environment it is called from, and stores it in envName.
The second line uses envName to assign into the environment of the outer function.
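The two lines can be checked in a cut-down sketch. It uses the hypothetical names outerSketch, innerSketch and sameEnv, and calls parent.env(), a base R function not used in the snippet above, to confirm that envName really is the environment the inner function was defined in.

```r
outerSketch <- function() {
  outVar <- 5
  envName <- environment()

  innerSketch <- function() {
    # envName is found in the enclosing function, so it can be used here
    assign("outVar", 8, envir = envName)
    # Compare envName with the environment this function was defined in
    return(identical(envName, parent.env(environment())))
  }

  sameEnv <- innerSketch()
  # Return the comparison along with the outer function's outVar after the assignment
  return(list(sameEnv = sameEnv, outVar = outVar))
}

result <- outerSketch()
print(result$sameEnv)
print(result$outVar)
```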