8 Files exercises
This exercise will look at two files. You will be assigned tasks requiring you to read and write files as well as index data frames.
Use your “Exercises.R” file, ensuring you are using code sections to separate the different exercises. Additionally, set your working directory to your main workshop directory.
Create a new directory called “Chapter_8_files” within your main workshop directory and download the files into it.
Because the files we are going to read are in a different directory from our working directory, we have to specify the directory along with the file names. For example, with the main workshop directory as your working directory, you could read in the “Liverpool_beaches.csv” file from the “Chapter_7” directory using the following command.
liv_beaches_df <- read.csv("Chapter_7/Liverpool_beaches.csv", row.names = 1)
Ensure you also write any output to “Chapter_8_files”.
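As a rough sketch of writing output into a subdirectory of the working directory, the example below builds the path with file.path() and reads the file back to check it. The data frame “toy_df” and the file name “toy_output.csv” are made up for illustration and are not part of the exercise files.

```r
# Toy data frame; names and values are hypothetical.
toy_df <- data.frame(count = c(3, 5), row.names = c("site_a", "site_b"))

# Create the output directory if it does not already exist.
out_dir <- "Chapter_8_files"
dir.create(out_dir, showWarnings = FALSE)

# file.path() joins the directory and file name into one path.
out_file <- file.path(out_dir, "toy_output.csv")
write.csv(toy_df, file = out_file)

# Read the file back, using the first column as the row names.
toy_check_df <- read.csv(out_file, row.names = 1)
print(toy_check_df)
```

Writing "Chapter_8_files/toy_output.csv" directly as a string works as well; file.path() simply avoids typing the separator by hand.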
8.1 Bats
First we will look at the file bat_roosts.csv. This contains the maximum number of roosts for different bat species in different UK regions.
The data is from: “Bat Conservation Trust 2020. Roost Count peak counts summary data”. Available from https://www.bats.org.uk/our-work/national-bat-monitoring-programme/reports/nbmp-annual-report
For this file carry out the below tasks:
- Read in the file “bat_roosts.csv” as a data frame variable called “bat_df”. Ensure the row names contain the Regions (Channel Islands, East Midlands, etc.).
- Inspect the variable and ensure the data frame contains only numeric values, with strings appearing only in the column and row names.
- Add a row to “bat_df” called “UK” that contains the totals for each Species.
- Add a column to “bat_df” called “All_Bat_Species” that contains the totals for each Region.
- Create a transposed data frame of “bat_df” called “bat_t_df”.
- Write the data frame “bat_t_df” to a comma separated file called “bat_roosts_t.csv”. Ensure there are no quotes surrounding the row or column names.
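If you are unsure where to start, the sketch below shows one possible way to do the same kinds of operations on a small made-up data frame. The names “toy_df” and “toy_t_df”, the region and species labels, and the counts are all hypothetical. The output goes to a temporary file rather than to the exercise directory.

```r
# Toy data; labels and counts are made up for illustration.
toy_df <- data.frame(
  species_a = c(1, 0, 4),
  species_b = c(2, 0, 3),
  row.names = c("North", "South", "West")
)

# Check that every column is numeric.
print(all(sapply(toy_df, is.numeric)))

# Add a row holding the total of each column.
toy_df["Total", ] <- colSums(toy_df)

# Add a column holding the total of each row (including the "Total" row).
toy_df$All <- rowSums(toy_df)

# t() returns a matrix, so convert the result back to a data frame.
toy_t_df <- as.data.frame(t(toy_df))

# quote = FALSE stops write.csv() wrapping names in quotes.
out_file <- tempfile(fileext = ".csv")
write.csv(toy_t_df, file = out_file, quote = FALSE)
```

The order matters here: adding the total row before the total column means the new column also gets a grand total in the “Total” row.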
Now that you have carried that out, can you answer the following questions?
- Which region has no roosts?
- Which Bat species has the highest number of roosts across the UK?
- Which Bat species has the lowest number of roosts across the UK?
8.2 UK retail
Next we have the file UK_retail.tsv containing UK retail information for each month from September 2017 to September 2020. The values are seasonally adjusted volume sales. The data comes from: https://www.ons.gov.uk/businessindustryandtrade/retailindustry/bulletins/retailsales/september2020.
Carry out the below tasks:
- Read in the file “UK_retail.tsv” as a data frame variable called “uk_retail_df”. Ensure the row names contain the YearMonth info (2017SEP, 2017OCT, etc.).
- Create a data frame called “uk_retail_2020_df” containing the rows for 2020 from “uk_retail_df”.
- For each month in 2020 print out the phrase “The Food retail index for <YearMonth> was <Food>”. For example the first phrase will be “The Food retail index for 2020JAN was 101.9”. This can be done with one line of code using the paste() function.
- Make a total row and an average (mean) row for “uk_retail_2020_df”. Ensure you are not including the total in the mean.
- Finally write out the data frame “uk_retail_2020_df” as a tab separated file called “UK_retail_2020.tsv”.
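The following sketch illustrates the ideas behind these tasks on a small made-up data frame: selecting rows by their names, using the vectorised paste() function, and adding summary rows without letting one summary leak into another. The names “toy_df”, “toy_2020_df” and “month_rows” and all values are hypothetical, and this is only one of several possible approaches.

```r
# Toy data; the values are made up for illustration.
toy_df <- data.frame(
  Food = c(100, 102, 98),
  Fuel = c(90, 80, 100),
  row.names = c("2019DEC", "2020JAN", "2020FEB")
)

# Keep only the rows whose names start with "2020".
toy_2020_df <- toy_df[grepl("^2020", rownames(toy_df)), ]

# paste() is vectorised, so one call gives one phrase per row.
phrases <- paste("The Food value for", rownames(toy_2020_df),
                 "was", toy_2020_df$Food)
print(phrases)

# Store the month row names first, so both summaries use the months only.
month_rows <- rownames(toy_2020_df)
toy_2020_df["Total", ] <- colSums(toy_2020_df[month_rows, ])
toy_2020_df["Mean", ] <- colMeans(toy_2020_df[month_rows, ])

# Write a tab separated file; col.names = NA adds an empty header cell
# above the row names column.
out_file <- tempfile(fileext = ".tsv")
write.table(toy_2020_df, file = out_file, sep = "\t", col.names = NA)
```

Indexing with the stored month names is what keeps the “Total” row out of the mean, whichever summary row is added first.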
Now that you have carried that out, can you answer the following questions?
- Which retail sectors have a lower average than their February 2020 value?
- Which retail sector was the highest for 2020?
- Which sector was the most stable?
Great! Have a look at the solutions and see how they compare to your code.