Summarise

The function summarise() collapses each group of a grouped tibble into a single row of summary values. It produces a new tibble containing the summarised information.

Various helper functions can be used to get specific info including:

n(): Count the number of rows in each group.
mean(): Calculate the mean of a column.
median(): Calculate the median of a column.
sd(): Calculate the standard deviation of a column.
IQR(): Calculate the interquartile range of a column (IQR = upper quartile - lower quartile).
first(): Extract the first value.
last(): Extract the last value.
nth(): Extract the specified nth value.

Tidyverse reference page

Dataset

For demonstration we’ll load the amphibian_div_tbl data from the mgrtibbles package (hyperlink includes install instructions).

We’ll remove any rows containing NAs, as NAs cause mean() and other calculations to return NA.
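As a brief illustration of why the NAs are removed, the sketch below uses a small made-up vector (body_sizes is hypothetical and not part of amphibian_div_tbl). It also shows the na.rm = TRUE argument of mean(), an alternative to dropping rows that is not used on this page.

```r
#Hypothetical vector with one missing value
body_sizes <- c(38, 51, NA, 87)
#A single NA makes the result NA
mean_with_na <- mean(body_sizes)
mean_with_na
#na.rm = TRUE drops the NA before the mean is calculated
mean_without_na <- mean(body_sizes, na.rm = TRUE)
mean_without_na
```

Removing the rows up front with na.omit() means none of the later summarise() calls need the na.rm argument.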

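#Load packages
library("mgrtibbles")
#dplyr is attached so helpers such as n(), first(), last(), and nth() can be called without a prefix
library("dplyr")
#amphibian_div_tbl tibble for demonstration, with rows containing NAs removed
amphibian_div_tbl <- mgrtibbles::amphibian_div_tbl |> na.omit()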

Summarise

The default of summarise() is to produce a tibble of unique group values.

amphibian_div_tbl |> 
    dplyr::group_by(IUCN.Red.List.Status) |>
    dplyr::summarise()

# A tibble: 5 × 1
  IUCN.Red.List.Status            
  <fct>                           
1 Least Concern (LC)              
2 Near Threatened (NT)            
3 Vulnerable (VU)                 
4 Endangered (EN)                 
5 Least Concern (LC) - Provisional

Count

The counts of each unique value can be added with n().

Notice that the new column’s name is specified before the = sign. The resulting tibble is the same as the one produced by the count() function.

amphibian_div_tbl |> 
    dplyr::group_by(IUCN.Red.List.Status) |>
    dplyr::summarise(n = n())

# A tibble: 5 × 2
  IUCN.Red.List.Status                 n
  <fct>                            <int>
1 Least Concern (LC)                  55
2 Near Threatened (NT)                 5
3 Vulnerable (VU)                      3
4 Endangered (EN)                      1
5 Least Concern (LC) - Provisional     1
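The link between summarise(n = n()) and count() can be checked with a minimal sketch. It uses a small made-up tibble (toy_tbl and its status column are hypothetical stand-ins for amphibian_div_tbl) so that it runs without the mgrtibbles package.

```r
library("dplyr")
#Small made-up tibble standing in for amphibian_div_tbl
toy_tbl <- dplyr::tibble(status = c("LC", "LC", "NT", "LC", "VU"))
#Count with group_by() and summarise()
summarise_counts <- toy_tbl |>
    dplyr::group_by(status) |>
    dplyr::summarise(n = n())
#Count with count(), which names its count column n by default
count_counts <- toy_tbl |>
    dplyr::count(status)
#Compare the two results as plain data frames
same_result <- all.equal(as.data.frame(summarise_counts),
                         as.data.frame(count_counts))
same_result
```

count() is the shorter option when only counts are needed, whereas summarise() lets further summary columns be added alongside n.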

Mean and median

The mean() function can be used to calculate the mean of specified columns/variables within each group.

amphibian_div_tbl |>
    #Group by IUCN.Red.List.Status
    dplyr::group_by(IUCN.Red.List.Status) |>
    #Summarise
    dplyr::summarise(n = n(), 
                    mean_body_size = mean(Body_size_mm), 
                    mean_max_litter_size = mean(Litter_size_max_n))

# A tibble: 5 × 4
  IUCN.Red.List.Status                 n mean_body_size mean_max_litter_size
  <fct>                            <int>          <dbl>                <dbl>
1 Least Concern (LC)                  55           173.                2570.
2 Near Threatened (NT)                 5           114.                 369.
3 Vulnerable (VU)                      3           133                   19 
4 Endangered (EN)                      1            87                  350 
5 Least Concern (LC) - Provisional     1            48                 1012

The median() function calculates medians.

amphibian_div_tbl |>
    #Group by IUCN.Red.List.Status
    dplyr::group_by(IUCN.Red.List.Status) |>
    #Summarise
    dplyr::summarise(n = n(), 
                    mean_body_size = mean(Body_size_mm),
                    median_body_size = median(Body_size_mm))

# A tibble: 5 × 4
  IUCN.Red.List.Status                 n mean_body_size median_body_size
  <fct>                            <int>          <dbl>            <dbl>
1 Least Concern (LC)                  55           173.              127
2 Near Threatened (NT)                 5           114.              111
3 Vulnerable (VU)                      3           133               170
4 Endangered (EN)                      1            87                87
5 Least Concern (LC) - Provisional     1            48                48

Standard deviation and IQR

The sd() and IQR() functions calculate the standard deviation and the interquartile range (upper quartile - lower quartile).

Note that sd() returns NA and IQR() returns 0 when there is only one value in the group.

amphibian_div_tbl |>
    #Group by IUCN.Red.List.Status
    dplyr::group_by(IUCN.Red.List.Status) |>
    #Summarise
    dplyr::summarise(n = n(),
                    sd_body_size = sd(Body_size_mm),
                    iqr_body_size = IQR(Body_size_mm))

# A tibble: 5 × 4
  IUCN.Red.List.Status                 n sd_body_size iqr_body_size
  <fct>                            <int>        <dbl>         <dbl>
1 Least Concern (LC)                  55        200.           94.5
2 Near Threatened (NT)                 5         51.7          83  
3 Vulnerable (VU)                      3         71.1          63.5
4 Endangered (EN)                      1         NA             0  
5 Least Concern (LC) - Provisional     1         NA             0
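The single-value behaviour can be reproduced outside summarise() with a short sketch in base R. The vector vu_body_size holds the three Vulnerable (VU) body sizes that appear in the outputs on this page, and 87 is the single Endangered (EN) body size. The variable names are illustrative only.

```r
#Body sizes (mm) of the three Vulnerable (VU) species
vu_body_size <- c(51, 178, 170)
#IQR calculated by hand: upper quartile - lower quartile
upper_quartile <- quantile(vu_body_size, 0.75, names = FALSE)
lower_quartile <- quantile(vu_body_size, 0.25, names = FALSE)
iqr_by_hand <- upper_quartile - lower_quartile
iqr_by_hand
#IQR() gives the same value
iqr_vu <- IQR(vu_body_size)
iqr_vu
#With a single value, sd() divides by n - 1 = 0 and returns NA
sd_single <- sd(87)
sd_single
#With a single value, both quartiles equal that value, so the IQR is 0
iqr_single <- IQR(87)
iqr_single
```

Both calculations give 63.5 for the Vulnerable (VU) group, matching the iqr_body_size column above.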

First, last and nth values

The first, last, and nth values can be extracted with the functions first(), last(), and nth().

Note that nth() returns NA for the groups that contain only one value, as they have no second value. first() and last() still work for these groups because the single value is both the first and the last value.

amphibian_div_tbl |>
    #Group by IUCN.Red.List.Status
    dplyr::group_by(IUCN.Red.List.Status) |>
    #Summarise
    dplyr::summarise(n = n(),
                    first_body_size = first(Body_size_mm),
                    last_body_size = last(Body_size_mm),
                    second_body_size = nth(Body_size_mm, 2)
                    )

# A tibble: 5 × 5
  IUCN.Red.List.Status         n first_body_size last_body_size second_body_size
  <fct>                    <int>           <dbl>          <dbl>            <dbl>
1 Least Concern (LC)          55              38            197               33
2 Near Threatened (NT)         5              51             76              111
3 Vulnerable (VU)              3              51            170              178
4 Endangered (EN)              1              87             87               NA
5 Least Concern (LC) - Pr…     1              48             48               NA
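As a minimal sketch of how nth() behaves on short vectors, the code below calls it directly on the Vulnerable (VU) and Endangered (EN) body sizes shown in the outputs on this page. The variable names are illustrative only. The default argument and negative positions are documented features of dplyr::nth() that are not otherwise used on this page.

```r
#Body sizes (mm) taken from the outputs on this page
vu_body_size <- c(51, 178, 170)
en_body_size <- 87
#Second value of a three-value vector
second_vu <- dplyr::nth(vu_body_size, 2)
second_vu
#There is no second value in a one-value vector, so NA is returned
second_en <- dplyr::nth(en_body_size, 2)
second_en
#default supplies a replacement when the position does not exist
second_en_default <- dplyr::nth(en_body_size, 2, default = 0)
second_en_default
#A negative position counts from the end, so -1 is the last value
last_vu <- dplyr::nth(vu_body_size, -1)
last_vu
```

Whether a default other than NA is appropriate depends on the analysis, since a filled-in value can be mistaken for a real measurement.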