Histograms are commonly used to display the distribution of a single continuous variable. This is carried out by creating bins (i.e. number ranges) and counting the number of values that are within each bin. These bin counts displayed as a bar chart.
In this page we will create histograms with ggplot2::geom_histogram(). Through examples we will demonstrate creating:
For demonstration we’ll load the mammal_sleep_tbl data from the mgrtibbles package (hyperlink includes install instructions). We will select the “total_sleep” and “predation” columns for our plots.
#Load packagelibrary("mgrtibbles")#mammal_sleep_tbl tibble for demonstrationmammal_sleep_tbl<- mgrtibbles::mammal_sleep_tbl |> dplyr::select("total_sleep","danger") |>#Remove rows with NA values tidyr::drop_na() |>#Reorder danger factor levelsmutate(danger = forcats::fct_relevel(danger, "1","2","3","4","5"))mammal_sleep_tbl
When creating a histogram with geom_histogram() a continuous numerical variable/column can be mapped to the x aesthetic. The function will create bins (number ranges) and count the number of values that fit within each bin.
For our default histogram the data will be fit into 30 bins. This is always the default value of ggplot2::geom_histogram() and the function will output a message saying how to choose a better value. However, the best method on choosing a better value is to create the default plot first and then experiment with bin numbers afterwards. Choosing bin widths or bin numbers is shown below.
Create a histogram of the total_sleepvariable/column. This is a column of continuous doubles <dbl> representing the total hours of sleep per day for different mammals.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Bin width
The default number of bins for ggplot2::geom_histogram() is rarely going to be the best option. To choose the bin width (i.e. the number range of the bins) we can use the option binwidth=.
Create a histogram of total_sleep with a bin size of 1 hour (i.e. 1).
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Frequency polygon
Histograms are great if you are only looking at one group but not as suitable for looking at distributions between groups. Instead you can create a frequency polygon with ggplot2::geom_freqpoly(). A frequency polygon shows the same information as a histogram but uses lines instead of bars.
Create a frequency polygon of total__sleep separated by the danger levels of the mammals (1 = least danger from other animals, 5 = most danger from other animals).