Chapter 12 Handy Solutions

12.1 Tea solution

First ensure you have the "tea_df" loaded (remember your working directory will need to be in the correct location first). Also it needs to be preprocessed with the gsub() function.

tea_df <- read.csv("Chapter_10-11/tea_consumption.csv", check.names=FALSE)
tea_df$lb <- tea_df$KG_LB_annual_per_capita
tea_df$lb <- gsub(pattern = ".*_", replacement = "", tea_df$lb)
colnames(tea_df)[3] <- "kg"
tea_df$kg <- gsub(pattern = "_.*", replacement = "", tea_df$kg)
tea_df$kg <- as.numeric(tea_df$kg)
tea_df$lb <- as.numeric(tea_df$lb)

Remember there are many ways to carry this out but here is one.

First create a vector with the names of the countries we want:

countries <- c("Ireland", "United Kingdom", "France", "Australia")

Set the row names to the countries for easy indexing:

Note: Row name must be unique which is the case here.

row.names(tea_df) <- tea_df$Country

Create a data frame that only contains our countries of interest. We use the vector as an index for the rows.

tea_df_subset <- tea_df[countries,]

Here because we are working with a temporary variable we will overwrite the kg column so the values only contain one decimal place

tea_df_subset$kg <- round(x = tea_df_subset$kg, digits = 1)

Last step is to print out the statement. We will use paste0() which is exactly like paste() but the sep = option is set to "".

paste0(tea_df_subset$Country, " is the number ", 
       tea_df_subset$Rank, " consumer of tea. It consumes ", 
       tea_df_subset$kg, "kg of tea annually per capita.")

12.2 English speakers across the world solution

First make sure the data frame is created. Remember to set your working directory to where the file is.

english_df <- read.csv(
  "Chapter_10-11/english_speaking_population_of_countries.tsv", 
  sep = "\t", 
  row.names = 1,
  check.names = FALSE)
english_df[is.na(english_df)] <- 0
english_complete_datasets_df <- 
  english_df[
    (english_df$`As first language` + english_df$`As an additional language`) ==
      english_df$`Total English speakers`,
    ]

Create new data frame only containing countries with an eligible population of > 100 million.

english_100mil_df <- english_complete_datasets_df[
  english_complete_datasets_df$`Eligible population` > 100000000,
  ]

Create column with fraction of total english speakers against population

english_100mil_df$`Fraction of population that are English speakers` <- 
  english_100mil_df$`Total English speakers` /
  english_100mil_df$`Eligible population`

Create row with mean values

english_100mil_df["Mean",] <- colMeans(english_100mil_df)

Create row with totals

english_100mil_df["Total",1:4] <- colSums(english_100mil_df[1:7,1:4])

Create the total fraction of english speakers

english_100mil_df["Total","Fraction of population that are English speakers"] <- 
  english_100mil_df["Total","Total English speakers"] /
  english_100mil_df["Total","Eligible population"]

Write the data as a file

write.table(x = english_100mil_df, 
            file = "Chapter_10-11/English_top_7_populated_countries.csv", 
            col.names=NA,
            quote = TRUE, 
            sep = ",")