Chapter 11 Handy exercises

A lot has been covered this session, so these exercises are intended to be straightforward.

Please set the working directory to your main workshop directory and use your "Exercises.R" script. Ensure you are using annotations and code sections to keep the contents clear and separated.

Additionally, read files in from and write files out to the "Chapter_10-11" directory.

Solutions are in the expandable boxes. Try your best to solve each challenge but use the solutions for help if you would like. Even if your method works it can be good to check the solution as there are many ways to do the same thing in R.

11.1 Tea exercise

The first task you will carry out is printing out information from "tea_df". Below is an example statement for the country Turkey:

"Turkey is the number 1 consumer of tea. It consumes 5.8kg of tea annually per capita."

Print out this statement for the countries Ireland, United Kingdom, France, and Australia with their relevant information. Make sure the kilogram value only has one decimal place.

Tip: You will require the functions paste() and round() from day 1.
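
As a quick refresher, the sketch below shows the two functions on made-up values. The country name and the number are hypothetical and are not taken from "tea_df".

```r
# Hypothetical values for illustration only; not from tea_df
example_country <- "Exampleland"
example_kg <- 3.14159

# round() keeps the requested number of decimal places
example_kg_rounded <- round(x = example_kg, digits = 1)

# paste() joins its arguments, separated by a space by default
example_sentence <- paste(example_country, "consumes", example_kg_rounded, "kg")
print(example_sentence)
```

Because paste() is vectorised, the same call also works when the inputs are whole columns of a data frame.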

First ensure you have "tea_df" loaded (remember your working directory will need to be in the correct location first). It also needs to be preprocessed with the gsub() function.

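# Read the file, keeping the column names exactly as written
tea_df <- read.csv("Chapter_10-11/tea_consumption.csv", check.names=FALSE)
# Copy the combined kg/lb column, then keep only the text after the "_" (lb)
tea_df$lb <- tea_df$KG_LB_annual_per_capita
tea_df$lb <- gsub(pattern = ".*_", replacement = "", tea_df$lb)
# Rename the combined column and keep only the text before the "_" (kg)
colnames(tea_df)[3] <- "kg"
tea_df$kg <- gsub(pattern = "_.*", replacement = "", tea_df$kg)
# Convert both columns from character to numeric
tea_df$kg <- as.numeric(tea_df$kg)
tea_df$lb <- as.numeric(tea_df$lb)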

Remember there are many ways to carry this out but here is one.

First create a vector with the names of the countries we want:

countries <- c("Ireland", "United Kingdom", "France", "Australia")

Set the row names to the countries for easy indexing:

Note: Row names must be unique, which is the case here.

row.names(tea_df) <- tea_df$Country
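
Assigning duplicated row names raises an error, so it can be worth checking for duplicates first. The following is a minimal sketch on a small made-up data frame ("toy_df" is hypothetical and only stands in for "tea_df"). It relies on anyDuplicated(), which returns 0 when every value is unique.

```r
# Hypothetical toy data frame standing in for tea_df
toy_df <- data.frame(Country = c("A", "B", "C"),
                     kg = c(1.2, 3.4, 5.6))

# anyDuplicated() returns 0 if every value is unique,
# otherwise the index of the first duplicated value
n_dup <- anyDuplicated(toy_df$Country)

# Only set the row names when it is safe to do so
if (n_dup == 0) {
  row.names(toy_df) <- toy_df$Country
}

# Rows can now be selected by name, in any order
toy_df[c("C", "A"), ]
```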

Create a data frame that only contains our countries of interest. We use the vector as an index for the rows.

tea_df_subset <- tea_df[countries,]

Because "tea_df_subset" is a temporary variable, we can overwrite its kg column so the values have only one decimal place.

tea_df_subset$kg <- round(x = tea_df_subset$kg, digits = 1)

The last step is to print out the statement. We will use paste0(), which is exactly like paste() but with the sep option set to "".

paste0(tea_df_subset$Country, " is the number ", 
       tea_df_subset$Rank, " consumer of tea. It consumes ", 
       tea_df_subset$kg, "kg of tea annually per capita.")
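
paste0() is vectorised, so a call like this returns one string per row. The sketch below uses made-up vectors ("toy_country" and "toy_rank" are hypothetical) to compare paste() with paste0(), and shows writeLines() as one possible way to print each statement on its own line.

```r
# Hypothetical vectors for illustration only
toy_country <- c("A", "B")
toy_rank <- c(1, 2)

# paste() inserts a space between arguments by default
with_paste <- paste(toy_country, "is number", toy_rank)

# paste0() inserts nothing, so spaces must be written in the strings
with_paste0 <- paste0(toy_country, " is number ", toy_rank)

# Both approaches give the same result here
identical(with_paste, with_paste0)

# writeLines() prints one element per line, without quotes
writeLines(with_paste0)
```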

11.2 English speakers across the world exercise

The last exercise is to create the following table as a data frame called "english_100mil_df". Use the "english_complete_datasets_df" data frame as a start.

	Eligible population	Total English speakers	As first language	As an additional language	Fraction of population that are English speakers
United States	296603003	283160411	234171556	48988855	0.9546782
Nigeria	156493000	79000000	0	79000000	0.5048149
Philippines	110000000	64025890	36935	63988955	0.5820535
Bangladesh	163323100	30108031	709873	29398158	0.1843464
China	1210000000	10000000	0	10000000	0.0082645
Brazil	205000000	10542000	292000	10250000	0.0514244
Mexico	120664000	15686262	0	15686262	0.1299995
Mean	323154729	70360371	33601481	36758890	0.3450831
Total	2262083103	492522594	235210364	257312230	0.2177297

The data frame only contains countries that have an eligible population greater than 100 million (100000000). Ensure the "Total" row was not calculated using the "Mean" row.

When you have created yours, check it against the one above. Is your value for the "Total" "Fraction of population that are English speakers" correct?

Once you have created the data frame, write it out with the function write.table() as a comma-separated file called "English_top_7_populated_countries.csv". Have the row and column names surrounded by quotes in your file. Make sure there is an empty value above your row names.

First make sure the data frame is created. Remember to set your working directory to where the file is.

english_df <- read.csv(
  "Chapter_10-11/english_speaking_population_of_countries.tsv", 
  sep = "\t", 
  row.names = 1,
  check.names = FALSE)
english_df[is.na(english_df)] <- 0
english_complete_datasets_df <- 
  english_df[
    (english_df$`As first language` + english_df$`As an additional language`) ==
      english_df$`Total English speakers`,
    ]

Create a new data frame containing only countries with an eligible population of > 100 million.

english_100mil_df <- english_complete_datasets_df[
  english_complete_datasets_df$`Eligible population` > 100000000,
  ]

Create a column with the fraction of the eligible population that are English speakers.

english_100mil_df$`Fraction of population that are English speakers` <- 
  english_100mil_df$`Total English speakers` /
  english_100mil_df$`Eligible population`

Create a row with the mean values.

english_100mil_df["Mean",] <- colMeans(english_100mil_df)

Create a row with the totals. Only the seven country rows are summed, so the "Mean" row is excluded.

english_100mil_df["Total",1:4] <- colSums(english_100mil_df[1:7,1:4])

Calculate the total fraction of English speakers from the totals.

english_100mil_df["Total","Fraction of population that are English speakers"] <- 
  english_100mil_df["Total","Total English speakers"] /
  english_100mil_df["Total","Eligible population"]
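
To see why the "Total" fraction is calculated from the two totals rather than by summing or averaging the per-country fractions, consider the sketch below. All numbers are hypothetical and chosen only to make the difference obvious.

```r
# Hypothetical populations and speaker counts for two countries
toy_population <- c(100, 900)
toy_speakers <- c(90, 90)

# Per-country fractions
toy_fraction <- toy_speakers / toy_population

# Fraction for the combined population: ratio of the sums
total_fraction <- sum(toy_speakers) / sum(toy_population)

# Summing or averaging the per-country fractions gives different numbers
sum_of_fractions <- sum(toy_fraction)
mean_of_fractions <- mean(toy_fraction)

print(c(total = total_fraction,
        sum = sum_of_fractions,
        mean = mean_of_fractions))
```

The ratio of the sums weights each country by its population, which is what a fraction of the combined population requires.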

Write the data frame out as a file.

write.table(x = english_100mil_df, 
            file = "Chapter_10-11/English_top_7_populated_countries.csv", 
            col.names=NA,
            quote = TRUE, 
            sep = ",")
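
Setting col.names = NA is what produces the empty value above the row names. One way to check this is sketched below with a made-up data frame written to a temporary file ("toy_df" and "toy_file" are hypothetical names), so nothing in the workshop directory is touched.

```r
# Hypothetical toy data frame with row names
toy_df <- data.frame(x = c(1, 2), y = c(3, 4),
                     row.names = c("first", "second"))

# Write to a temporary file rather than the workshop directory
toy_file <- tempfile(fileext = ".csv")
write.table(x = toy_df, file = toy_file,
            col.names = NA, quote = TRUE, sep = ",")

# Inspect the header line: it starts with an empty quoted value
header_line <- readLines(toy_file, n = 1)
print(header_line)

# Reading the file back with row.names = 1 recovers the row names
toy_df_check <- read.csv(toy_file, row.names = 1, check.names = FALSE)
row.names(toy_df_check)
```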

11.3 Extra exercise

If you still have time this session and do not have any questions, please attempt the following task:

Create a multiplication table like the one in today's practice, but with row and column names running from one to twelve.

Then write the data frame to a file. The name and format of the file is up to you.

There is no solution for this exercise.