Chapter 8 Files exercises

This exercise will look at two files. You will be assigned tasks requiring you to read and write files as well as index data frames.

Use your "Exercises.R" file, ensuring you are using code sections to separate the different exercises. Additionally, set your working directory to your main workshop directory.

Create a new directory called "Chapter_8_files" within your main workshop directory to download the files bat_roosts.csv & honey_bee.tsv.

As the files we are going to read are in a different directory to our working directory we will have to specify the directory along with the file names. For example, to read in the "Liverpool_beaches.csv" file from the main directory you could use the following command.

liv_beaches_df <- read.csv("Chapter_7/Liverpool_beaches.csv", row.names = 1)

Ensure you also write any output to "Chapter_8_files".

8.1 Bats

First we will look at the file bat_roosts.csv. This contains information on the max number of roosts for different Bat species in different UK regions.

The data is from: "Bat Conservation Trust 2020. Roost Count peak counts summary data". Available from https://www.bats.org.uk/our-work/national-bat-monitoring-programme/reports/nbmp-annual-report

For this file carry out the below tasks:

  1. Read in the file "bat_roosts.csv" as a data frame variable called "bat_df". Ensure the row names contain the Regions (Channel Islands, East Midlands, etc.).
  2. Inspect the variable and ensure there are only numerics within the data frame with all strings only being in column and row names.
  3. Add a row to "bat_df" called "UK" that contains the totals for each Species.
  4. Add a column to "bat_df" called "All_Bat_Species" that contains the totals for each Region.
  5. Create a transposed data frame of "bat_df" called "bat_t_df".
  6. Write the data frame "bat_t_df" to a comma separated file called "bat_roosts_t.csv". Ensure there are no quotes surrounding the row or column names.

After you have carried the above tasks, attempt the following MCQs using the "bat_t_df" object.

  1. Which region has no roosts?
  2. Which region has the largest amount of Myotis brandtii roosts (excluding UK)?
  3. Which Bat species has the highest number of roosts across the UK?
  4. Which Bat species has the lowest number of roosts across the UK?

8.2 Honey bee colonies

Next we have a file (honey_bee.tsv) that contains information on the number of Honey Bee colonies in 4 different USA states. It is temporal data containing information on the 4 quarters for the years 2015-2018.

For more details and even more data please see the following link: https://www.kaggle.com/datasets/kyleahmurphy/nass-honey-bee-20152021

Carry out the below tasks:

  1. Read in the file "honey_bee.tsv" as a data frame variable called "bee_colonies_df". Ensure the row names contain the Year info (2015-Q1, 2015-Q2, etc.).
  2. Create a data frame called "bee_colonies_2017_2018_df" containing the rows for 2017 & 2018 from "bee_colonies_df".
  3. For each month in 2017 & 2018 print out the phrase "The number of colonies in Minnesota for <Year> was <Value>.
    • For example, the first phrase will be "The number of colonies in Minnesota for 2017-Q1 was 27000".
    • This can be done with one line of code using the paste() function.
  4. Make an average (mean) row for "bee_colonies_2017_2018_df".
  5. Finally write out the data frame "bee_colonies_2017_2018_df" as a tab separated file called "bee_colonies_2017_2018.tsv".

Now that you have carried that out, attempt the following questions based on the "bee_colonies_2017_2018_df" object.

  1. Which state has the lowest average?
  2. Which state has the highest minimum value of colonies?
  3. Which state had 20,000 colonies in Q3 of 2018 (2018-Q3)?

Tip: If you use the summary(), ensure you do not include the "Average" row.

Great! Have a look at the solutions and ask any questions you would like.