Chapter 11 Handy exercises

A lot has been covered this session, so these exercises should be fairly straightforward.
Please set the working directory to your main workshop directory and use your "Exercises.R" script. Ensure you are using annotations and code sections to keep the contents clear and separated.
Additionally, read files in from and write files out to the "Chapter_10-11" directory.
Solutions are in the expandable boxes. Try your best to solve each challenge but use the solutions for help if you would like. Even if your method works it can be good to check the solution as there are many ways to do the same thing in R.
11.1 Tea exercise

The first task you will carry out is printing out information from "tea_df". Below is an example statement for the country Turkey:
"Turkey is the number 1 consumer of tea. It consumes 5.8kg of tea annually per capita."
Print out this statement for the countries Ireland, United Kingdom, France, and Australia with their relevant information. Make sure the kilogram value only has one decimal place.
Tip: You will require the functions paste() and round() from day 1.
First ensure you have "tea_df" loaded (remember your working directory will need to be in the correct location first). It also needs to be preprocessed with the gsub() function.
tea_df <- read.csv("Chapter_10-11/tea_consumption.csv", check.names=FALSE)
tea_df$lb <- tea_df$KG_LB_annual_per_capita
tea_df$lb <- gsub(pattern = ".*_", replacement = "", tea_df$lb)
colnames(tea_df)[3] <- "kg"
tea_df$kg <- gsub(pattern = "_.*", replacement = "", tea_df$kg)
tea_df$kg <- as.numeric(tea_df$kg)
tea_df$lb <- as.numeric(tea_df$lb)
Remember there are many ways to carry this out but here is one.
First create a vector with the names of the countries we want:
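A minimal sketch of this step. The variable name `countries` is our own choice for illustration; any name would do.

```r
# Names of the four countries requested in the exercise
countries <- c("Ireland", "United Kingdom", "France", "Australia")
countries
```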
Set the row names to the countries for easy indexing:
Note: Row names must be unique, which is the case here.
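One way this could look is sketched below. So that the snippet runs on its own it builds a small stand-in for "tea_df"; the column names "Rank" and "Country" are assumptions about the file, and the numbers are placeholders rather than the real values.

```r
# Stand-in for the preprocessed tea_df.
# "Rank" and "Country" are assumed column names; values are placeholders.
tea_df <- data.frame(
  Rank = 1:3,
  Country = c("Turkey", "Ireland", "France"),
  kg = c(9.87, 6.54, 3.21)
)

# Use the country names as row names so rows can be indexed by name
row.names(tea_df) <- tea_df$Country

# A single row can now be selected with the country name
tea_df["Ireland", ]
```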
Create a data frame that only contains our countries of interest. We use the vector as an index for the rows.
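A sketch of the subsetting step, again on a small stand-in for "tea_df" so it runs by itself. The names `countries` and `tea_subset_df`, the "Rank" and "Country" columns, and all values are assumptions made for illustration.

```r
# Stand-in for tea_df with country names as row names (placeholder values)
tea_df <- data.frame(
  Rank = 1:5,
  Country = c("Turkey", "Ireland", "United Kingdom", "France", "Australia"),
  kg = c(9.87, 6.54, 5.43, 3.21, 2.10)
)
row.names(tea_df) <- tea_df$Country

# Vector of the countries of interest
countries <- c("Ireland", "United Kingdom", "France", "Australia")

# Index the rows by name; the empty column index keeps all columns
tea_subset_df <- tea_df[countries, ]
tea_subset_df
```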
Because we are working with a temporary variable here, we will overwrite the kg column so that the values only contain one decimal place.
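The rounding could be done roughly as follows. `tea_subset_df` is a hypothetical name for the temporary data frame, and the values are placeholders.

```r
# Stand-in for the temporary data frame (placeholder values)
tea_subset_df <- data.frame(
  Rank = c(2, 3),
  Country = c("Ireland", "United Kingdom"),
  kg = c(4.56, 3.21)
)

# Overwrite the kg column with values rounded to one decimal place
tea_subset_df$kg <- round(tea_subset_df$kg, digits = 1)
tea_subset_df$kg
```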
The last step is to print out the statement. We will use paste0(), which works like paste() but with the sep = option set to "".
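A sketch of the printing step under the same assumptions as before: `tea_subset_df`, the "Rank" and "Country" column names, and the values are illustrative stand-ins. Because paste0() is vectorised, one call builds a statement for every row.

```r
# Stand-in for the rounded temporary data frame (placeholder values)
tea_subset_df <- data.frame(
  Rank = c(2, 3),
  Country = c("Ireland", "United Kingdom"),
  kg = c(4.6, 3.2)
)

# Build one statement per row; paste0() joins the pieces with no separator
statements <- paste0(
  tea_subset_df$Country, " is the number ", tea_subset_df$Rank,
  " consumer of tea. It consumes ", tea_subset_df$kg,
  "kg of tea annually per capita."
)
print(statements)
```

Note that a value which rounds to a whole number would print without a decimal place; format() with nsmall = 1 is one option if a trailing ".0" is wanted.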
11.2 English speakers across the world exercise

The last exercise is to create the following table as a data frame called "english_100mil_df". Use the "english_complete_datasets_df" data frame as a start.
| | Eligible population | Total English speakers | As first language | As an additional language | Fraction of population that are English speakers |
---|---|---|---|---|---|
United States | 296603003 | 283160411 | 234171556 | 48988855 | 0.9546782 |
Nigeria | 156493000 | 79000000 | 0 | 79000000 | 0.5048149 |
Philippines | 110000000 | 64025890 | 36935 | 63988955 | 0.5820535 |
Bangladesh | 163323100 | 30108031 | 709873 | 29398158 | 0.1843464 |
China | 1210000000 | 10000000 | 0 | 10000000 | 0.0082645 |
Brazil | 205000000 | 10542000 | 292000 | 10250000 | 0.0514244 |
Mexico | 120664000 | 15686262 | 0 | 15686262 | 0.1299995 |
Mean | 323154729 | 70360371 | 33601481 | 36758890 | 0.3450831 |
Total | 2262083103 | 492522594 | 235210364 | 257312230 | 0.2177297 |
The data frame only contains countries that have an eligible population greater than 100 million (100000000). Ensure the "Total" row was not calculated using the "Mean" row.
When you have created yours, check it against the one above. Is your value for the "Total" "Fraction of population that are English speakers" correct?
Once you have created the data frame, write it out as a comma separated file called "English_top_7_populated_countries.csv" with the function write.table(). Have the row and column names surrounded by quotes in your file. Make sure there is an empty value above your row names.
First make sure the data frame is created. Remember to set your working directory to where the file is.
english_df <- read.csv(
  "Chapter_10-11/english_speaking_population_of_countries.tsv",
  sep = "\t",
  row.names = 1,
  check.names = FALSE)
english_df[is.na(english_df)] <- 0
english_complete_datasets_df <-
  english_df[
    (english_df$`As first language` + english_df$`As an additional language`) ==
      english_df$`Total English speakers`,
  ]
Create a new data frame containing only the countries with an eligible population of > 100 million.
english_100mil_df <- english_complete_datasets_df[
  english_complete_datasets_df$`Eligible population` > 100000000,
]
Create a column with the fraction of total English speakers against population
english_100mil_df$`Fraction of population that are English speakers` <-
  english_100mil_df$`Total English speakers` /
  english_100mil_df$`Eligible population`
Create row with mean values
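One possible way to add the mean row is sketched below. To run on its own it rebuilds a cut-down copy of the data frame from two columns of the table above; in your script you would apply the last line to your full "english_100mil_df".

```r
# Cut-down copy of the table above (two of the five columns)
english_100mil_df <- data.frame(
  `Eligible population` = c(296603003, 156493000, 110000000, 163323100,
                            1210000000, 205000000, 120664000),
  `Total English speakers` = c(283160411, 79000000, 64025890, 30108031,
                               10000000, 10542000, 15686262),
  row.names = c("United States", "Nigeria", "Philippines", "Bangladesh",
                "China", "Brazil", "Mexico"),
  check.names = FALSE
)

# Assigning to a new row name appends a row holding each column's mean
english_100mil_df["Mean", ] <- colMeans(english_100mil_df)
english_100mil_df
```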
Create row with totals
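A sketch of the totals step, using the same cut-down copy of the table so that it runs by itself. The helper name `country_rows` is our own. The point it illustrates is that the "Mean" row is excluded before summing.

```r
# Cut-down copy of the table above (two of the five columns)
english_100mil_df <- data.frame(
  `Eligible population` = c(296603003, 156493000, 110000000, 163323100,
                            1210000000, 205000000, 120664000),
  `Total English speakers` = c(283160411, 79000000, 64025890, 30108031,
                               10000000, 10542000, 15686262),
  row.names = c("United States", "Nigeria", "Philippines", "Bangladesh",
                "China", "Brazil", "Mexico"),
  check.names = FALSE
)
english_100mil_df["Mean", ] <- colMeans(english_100mil_df)

# Keep only the country rows so the "Mean" row is not added into the totals
country_rows <- !(row.names(english_100mil_df) %in% c("Mean", "Total"))
english_100mil_df["Total", ] <- colSums(english_100mil_df[country_rows, ])
english_100mil_df
```

With the full data frame, colSums() would also add up the fraction column, which is not a meaningful total; that value is why the "Total" fraction is recalculated in the next step.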
Create the total fraction of English speakers
english_100mil_df["Total","Fraction of population that are English speakers"] <-
  english_100mil_df["Total","Total English speakers"] /
  english_100mil_df["Total","Eligible population"]
Write the data as a file
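A sketch of the write step on a two-row stand-in taken from the table above. It writes to a temporary directory so it can be run anywhere; in the workshop the path would instead be inside "Chapter_10-11". Setting col.names = NA together with row.names = TRUE is what produces the empty value above the row names.

```r
# Two-row stand-in for english_100mil_df (values from the table above)
english_100mil_df <- data.frame(
  `Eligible population` = c(296603003, 156493000),
  `Total English speakers` = c(283160411, 79000000),
  row.names = c("United States", "Nigeria"),
  check.names = FALSE
)

# Temporary location used only so this sketch is runnable as-is
out_file <- file.path(tempdir(), "English_top_7_populated_countries.csv")

write.table(
  english_100mil_df,
  file = out_file,
  sep = ",",          # comma separated
  quote = TRUE,       # surround row and column names with quotes
  row.names = TRUE,
  col.names = NA      # adds an empty header cell above the row names
)

# Inspect the first line of the file to check the header
readLines(out_file, n = 1)
```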
11.3 Extra exercise

If you still have time this session and you do not have any questions, please attempt the following task:
Create a multiplication table like the one in today's practice. However, have the row and column names run from one to twelve.
Then write the data frame to a file. The name and format of the file is up to you.
There is no solution for this exercise.