20 Basic programming

In this chapter we will look at some basic programming with R. This will involve loops, if statements, and creating your own functions.

Create and use the directory “Chapter_20-21” as the working directory for this chapter. Additionally create a new script called “Programming.R” for this chapter.

20.1 Recycle rule

When two objects of different lengths act upon each other, the shorter object is recycled so that every element of the longer object is acted upon.

We will demonstrate this with the below example. Run the command below and then read the explanation.

small_vec <- 1:3
large_vec <- rep(1,9) 
small_vec * large_vec

small_vec is a vector with 3 elements, the numbers 1 to 3.
large_vec is a vector with 9 elements: the number 1 repeated 9 times.
As large_vec is 3 times the length of small_vec, each element of small_vec is used 3 times (its first use plus 2 recycles).
- The first small_vec element is used to multiply the 1st, 4th, and 7th positions of large_vec.
- The second small_vec element is used to multiply the 2nd, 5th, and 8th positions of large_vec.
- The last/third small_vec element is used to multiply the 3rd, 6th, and 9th positions of large_vec.
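To see the recycling explicitly, the sketch below writes out the recycled vector by hand with rep() and its length.out option, then checks that it gives the same result as letting R recycle. The name recycled_vec is made up for this illustration.

```r
small_vec <- 1:3
large_vec <- rep(1, 9)
#Manually repeat small_vec until it is the same length as large_vec
recycled_vec <- rep(small_vec, length.out = length(large_vec))
recycled_vec
#The manual version gives the same result as letting R recycle
identical(small_vec * large_vec, recycled_vec * large_vec)
```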

Below is an example with a visualisation. large_vec is the numbers 1 to 9 this time:

small_vec <- 1:3
large_vec <- 1:9 
small_vec * large_vec

Depending on the length of the longer vector, the elements of the shorter vector may be recycled an unequal number of times. For example:

The smaller vector has 3 elements.
The larger vector has 4 elements.
The 1st element of the smaller vector would be used twice (recycled once).
The 2nd & 3rd elements of the smaller vector would be used once (not recycled).

Below is a quick example of this. You should notice a warning is printed because the length of the longer object is not a multiple of the length of the shorter object. However, the correct output is printed above the warning.

1:3 * rep(1,4)

Warnings usually mean the code ran but R thinks you may have done something wrong, which you should check. Errors, on the other hand, mean the R code did not work.
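The difference can be sketched with the two commands below. suppressWarnings() and tryCatch() are base R functions that have not been covered in this chapter; they are only used here so the example runs cleanly from start to finish. The names warn_result and err_result are made up for this illustration.

```r
#A warning: the result is still returned
#suppressWarnings() hides the warning message but keeps the result
warn_result <- suppressWarnings(1:3 * rep(1, 4))
warn_result
#An error: no result is produced
#tryCatch() catches the error and returns its message instead
err_result <- tryCatch(
  1:3 * "a",
  error = function(e) conditionMessage(e)
)
err_result
```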

Below are a few more examples of using the recycle rule:

#When you multiply one number (scalar) by a numeric vector
#You are recycling the one number (scalar)
2 * 1:6
#Some numeric recycling
1:2 * 1:10
1:4 + 1:8
8:1 - 1:4
2:1 / 1:4
(1:2 + 3:6) * seq(10, 120, 10)
#Strings can be recycled with the paste() function
#You'll notice that no warnings appear this time
#even though the lengths of the variables are not multiples
recyclable_materials <- c("A:aluminium","B:glass","C:paper")
recycle_centres <- rep(
  c("1:old swan","2:otterspool","3:south sefton","4:Huyton","5:Kirkby"),
  2
)
paste0(recyclable_materials, " can be recycled at ", recycle_centres)

20.2 Loops

With loops we can carry out the same commands and functions on multiple elements of a 1-dimensional object (vector or list) without having to type them out multiple times.

There are 2 main types of loops: for loops and while loops.

20.2.1 For loops

A for loop runs through a set of values. Each value is run through the set of commands once.

The basic format of a for loop is (don’t run the below):

for (variable in vector/list) {
  command/s
}

We will run a simple example below. We need the print() function so the results are printed to the console. Without it nothing would be printed, as results are not printed automatically inside a loop:

for (i in 1:5) {
  print(i * 10)
}

In essence the below was carried out:

loop_vec <- 1:5
i <- loop_vec[1]
print(i * 10)
i <- loop_vec[2]
print(i * 10)
i <- loop_vec[3]
print(i * 10)
i <- loop_vec[4]
print(i * 10)
i <- loop_vec[5]
print(i * 10)

Except that, with the loop, we carried it out in one command rather than many lines.
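Instead of printing each result, you may want to keep the results. A minimal sketch of this is below: it creates a vector of the right length before the loop and uses i to fill in each position. The name results_vec is made up for this illustration.

```r
#Create a numeric vector with 5 elements (all 0) to hold the results
results_vec <- numeric(5)
for (i in 1:5) {
  #Assign the result to position i of the vector
  results_vec[i] <- i * 10
}
#Check the vector
results_vec
```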

Analogy:

The loop is a circular assembly line.
Machines carry out the commands on the provided variables.
A worker will put the first element of the provided vector/list on the assembly line, therefore setting it as i.
The machines will carry out the commands on i.
Once the worker gets the 1st set of results they will put the 2nd element on the assembly line, setting it as i.
This will repeat till the results from all the elements are returned.

A few more examples:

for (i in c(3,6,9)) {
  print(i / 3)
  print(i / 2)
}
for (i in c(3,6,9)) {
  print(i %% 3)
  print(i %% 2)
}
#You can have loops in a loop
for (y in 1:5) {
  for (x in 1:6) {
    print(paste0(y, " + ", x, " = ", y+x))
  }
}

#Let us make the Fibonacci sequence (first 10 numbers) as a vector
#First we create a vector that contains the number 1 twice
fibo_vec <- c(1,1)
#In a loop we can use the loop variable (i) to index objects
#We can use this for assignment and subsetting
for (i in 3:10) {
  fibo_vec[i] <- fibo_vec[i-1] + fibo_vec[i-2]
}
#Check out the vector
fibo_vec

A lot of what we have been doing we could do without loops. Where I find for loops handy is when I want to carry out a task on multiple columns or rows.

We can use for loops to quickly create a multiplication table in a matrix.

#First we create a matrix with NA values
#Make sure it contains the number of rows and columns we want
multiplication_mat <- matrix(data = NA, nrow = 10, ncol = 10)
#Next we create a vector to loop over
loop_vec <-  1:10
#Now to loop through the loop_vec for the columns
for (c in loop_vec) {
  #Next we loop through the loop_vec for the rows
  for (r in loop_vec){
    #Calculate the multiplication
    multiply_number <- c * r
    #Assign the relevant position in the matrix to the multiplication
    multiplication_mat[r,c] <- multiply_number
  }
}
#Check the matrix
multiplication_mat
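As mentioned, a lot of loop tasks can also be done without loops. As a hedged aside, the sketch below rebuilds the same table with nested loops and then with the base R function outer(), which multiplies every element of its first vector by every element of its second. The names loop_mat, outer_mat, c_idx, and r_idx are made up for this illustration.

```r
#Build the table with nested loops, as above
loop_mat <- matrix(data = NA, nrow = 10, ncol = 10)
for (c_idx in 1:10) {
  for (r_idx in 1:10) {
    loop_mat[r_idx, c_idx] <- c_idx * r_idx
  }
}
#Build the table in one command with outer()
outer_mat <- outer(1:10, 1:10)
#Check every position holds the same number in both matrices
all(loop_mat == outer_mat)
```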

We can then use for loops to carry out specific commands on each column or each row.

#First thing we are going to do is divide each number in the matrix
#By the column total of its column
#We'll save the results in a new matrix called mult_prop_col_mat
#Create the new matrix
mult_prop_col_mat <- multiplication_mat
#Loop through the columns
for (c in 1:ncol(mult_prop_col_mat)) {
  #Calculate column total
  column_total <- sum(mult_prop_col_mat[,c])
  #Assign calculated proportion vector to the column
  mult_prop_col_mat[,c] <- mult_prop_col_mat[,c] / column_total
}
#check new matrix
mult_prop_col_mat

#We'll do the same again but for each row
#This time with no annotation
mult_prop_row_mat <- multiplication_mat
for (r in 1:nrow(mult_prop_row_mat)) {
  mult_prop_row_mat[r,] <- 
    mult_prop_row_mat[r,] / 
    sum(mult_prop_row_mat[r,])
}
mult_prop_row_mat
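The loops above show the logic step by step. For comparison only, base R also has the function prop.table(), which calculates proportions by row (margin = 1) or by column (margin = 2). The sketch below checks that it agrees with the column loop. The names mult_mat, loop_prop_mat, vec_prop_mat, and col_idx are made up for this illustration.

```r
#outer() builds the same 10 by 10 multiplication table as the loop above
mult_mat <- outer(1:10, 1:10)
#Column proportions with a loop
loop_prop_mat <- mult_mat
for (col_idx in 1:ncol(loop_prop_mat)) {
  loop_prop_mat[, col_idx] <-
    loop_prop_mat[, col_idx] / sum(loop_prop_mat[, col_idx])
}
#Column proportions with prop.table()
vec_prop_mat <- prop.table(mult_mat, margin = 2)
#Check the two methods agree
all.equal(loop_prop_mat, vec_prop_mat)
#Each column of proportions should sum to 1
colSums(vec_prop_mat)
```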

Hopefully it is now clear how for loops can be used. Further down we’ll work with a real dataset to show some real world applications. However, for now we will go onto while loops.

20.2.2 While loops

I do not use while loops often so we will only briefly go over them. A while loop will loop over a series of commands until a condition is no longer met. Conditions will either be TRUE or FALSE (logical).

The format of a while loop:

while (condition) {
  command/s
}

Below is an example:

i <- 1
while (i < 10) {
  print(paste0(
    "At the start of loop number:", i, ", the variable i is ", i))
  i <- i + 1
  print(paste0("-----"))
}
i

In this case 1 is added to i each time the loop runs. The loop keeps going until i is no longer less than 10. Therefore, at the end of the while loop i is equal to 10.
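While loops are most useful when you do not know in advance how many times the loop needs to run. A small made-up example is below: it keeps halving a number until it drops below 1 and counts how many halvings were needed. The names value and n_halvings are only for this illustration.

```r
value <- 100
n_halvings <- 0
#Keep halving while the value is 1 or more
while (value >= 1) {
  value <- value / 2
  n_halvings <- n_halvings + 1
}
#Number of halvings needed
n_halvings
#Final value, which is now below 1
value
```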

One final point on while loops. You can write a while loop whose condition is always met. If this is the case the while loop will never finish and you’ll need to press “ESC” in the console to stop it. An example is below, which you should not run:

i <- 1
while (i > 0) {
  print(paste0(
    "At the start of loop number:", i, ", the variable i is ", i))
  i <- i + 1
  print(paste0("-----"))
}
i
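One way to guard against a never-ending loop is sketched below. It uses break, a base R command not covered elsewhere in this chapter, which stops a loop immediately. The cap of 5 loops and the name max_loops are assumptions made for this illustration. This version is safe to run.

```r
i <- 1
#Maximum number of loops we will allow
max_loops <- 5
while (i > 0) {
  #Stop the loop once the maximum has been passed
  if (i > max_loops) {
    break
  }
  print(paste0("Loop number: ", i))
  i <- i + 1
}
i
```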

20.3 If statements

If statements allow for different commands to be carried out depending on whether a condition is met.

This can be thought of like a flow chart where you go one way if you answer yes and another if you answer no. Example:

Are you thirsty?
If yes, have something to drink.
If no, do not have something to drink.

Basic format of an if statement:

if (condition) {
  command/s if condition true
} else {
  command/s if condition false
}
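As a rough sketch, the thirsty flow chart above could be written in this format as follows. The names thirsty and action are made up for this illustration.

```r
#Logical variable: TRUE means yes, FALSE means no
thirsty <- TRUE
if (thirsty) {
  action <- "have something to drink"
} else {
  action <- "do not have something to drink"
}
action
```

Try changing thirsty to FALSE and running it again.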

I’ll show you a coding example using the modulus (%%) operator.

#Create a scalar
i <- 3
#If statement determining if the remainder of i/2 is 1
if (( i %% 2) == 1) {
  paste0(i, " is odd")
} else {
  paste0(i, " is even")
}

You can have an if statement within a loop.

Reminder: We need to use print() within loops to print results to the console.

for (i in 0:9) {
  #If statement determining if the remainder of i/2 is 1
  if (( i %% 2) == 1) {
    print(paste0(i, " is odd"))
  } else {
    print(paste0(i, " is even"))
  }
}
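As an aside, base R also has the function ifelse(), which checks a condition for every element of a vector at once. The sketch below should produce the same ten messages as the loop above, but as one character vector. The names num_vec, parity_vec, and parity_out are made up for this illustration.

```r
num_vec <- 0:9
#ifelse(condition, value if TRUE, value if FALSE)
parity_vec <- ifelse((num_vec %% 2) == 1, "odd", "even")
#paste0() joins the numbers and the odd/even labels element by element
parity_out <- paste0(num_vec, " is ", parity_vec)
parity_out
```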

20.4 Functions

Throughout this course we have used many functions. However, you may want a function that does not exist. In this case you can create your own.

The format to create your own function is:

function_name <- function(inputs_and_option_names){
  command/s
  return(new_variable)
}

There are 2 new functions above:

function(): This function creates the new function.
return(): This specifies what will be returned when we run our new function.

A quick example that adds 20% VAT to each number in a vector:

#Create the function
add_vat <- function(input_num_vec) {
  vat_vec <- input_num_vec * 1.2
  return(round(vat_vec, digits = 2))
}
#Example running the new function
add_vat(0.99)
add_vat(c(9.99, 8.78, 2.45))

In the below example we will create a function that will return all numbers that are multiples of 5 from the input vector.

#Create the function
all_mult_5_func <- function(input_num_vec){
  #Create an empty vector prior to loop to contain multiples of 5
  out_vec <- vector(mode="numeric", length=0)
  #Loop to add multiples of 5 to out_vec
  for (i in input_num_vec){
    #If statement so only multiples of 5 are added to the out_vec
    #In this case no else part is required
    if ((i %% 5) == 0 ){
      out_vec <- c(out_vec, i)
    }
  }
  #Have the return function at the very end
  return(out_vec)
}
#Try out the function with some vectors
all_mult_5_func(1:20)
all_mult_5_func(seq(from = 0, to = 100, by = 8))
all_mult_5_func(1:10 * 5)
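The function above is good practice for combining loops, if statements, and functions. For comparison, the same numbers can be picked out without a loop by indexing the vector with a logical vector. A minimal sketch is below; the names is_mult_5 and subset_out are made up for this illustration.

```r
input_num_vec <- 1:20
#Logical vector: TRUE where the number is a multiple of 5
is_mult_5 <- (input_num_vec %% 5) == 0
is_mult_5
#Index with the logical vector to keep only the TRUE positions
subset_out <- input_num_vec[is_mult_5]
subset_out
```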

As with built-in functions, a custom function can be given multiple arguments if built that way. Knowing this, we will alter the above function so it returns all numbers that are multiples of a chosen number.

#When creating the function add in another input/option name
all_mult_x_func <- function(input_num_vec, multiple){
  #Create an empty vector prior to loop to contain the multiples
  out_vec <- vector(mode="numeric", length=0)
  #Loop to add the multiples to out_vec
  for (i in input_num_vec){
    #If statement so only multiples of the chosen number are added to the out_vec
    #In this case no else part is required
    if ((i %% multiple) == 0 ){
      out_vec <- c(out_vec, i)
    }
  }
  #Have the return function at the very end
  return(out_vec)
}
#Try out the function with some vectors
all_mult_x_func(input_num_vec = 1:20, multiple = 2)
all_mult_x_func(seq(from = 0, to = 100, by = 8), 4)
all_mult_x_func(1:10 * 5, 10)
#As always we can assign the output of a function to a new variable
multiples_of_6_btwn_1_100 <- all_mult_x_func(1:100, 6)
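Arguments can also be given a default value, which is used when the argument is not provided. The sketch below is a variation on the function above with an assumed default of 5 for multiple. The function name all_mult_default_func and the variable names default_out and chosen_out are made up for this illustration.

```r
#multiple = 5 sets the default value of the multiple option
all_mult_default_func <- function(input_num_vec, multiple = 5){
  out_vec <- vector(mode="numeric", length=0)
  for (i in input_num_vec){
    if ((i %% multiple) == 0 ){
      out_vec <- c(out_vec, i)
    }
  }
  return(out_vec)
}
#No multiple provided so the default of 5 is used
default_out <- all_mult_default_func(1:20)
default_out
#Providing multiple overrides the default
chosen_out <- all_mult_default_func(1:20, multiple = 4)
chosen_out
```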

With all that covered, we are now going to move on to a real dataset in the next session to show the use and applications of these new techniques.