Mutate

The function dplyr::mutate() allows you to:

  • Create new columns
  • Modify existing columns
  • Delete columns

Within mutate() you can use operators and/or functions on one or more columns (variables) of the tibble being mutated.
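As a rough illustration, the sketch below combines operators and functions on one and on two columns within a single mutate() call. It assumes dplyr is installed. The three-row tibble is hypothetical, built from values shown in the printed mammal_sleep_tbl output on this page. The new column names (awake, dreaming_share, dreaming_share_2dp) are made up for the example, and awake assumes total_sleep is hours of sleep per day.

```r
# Assumes dplyr is installed (and R >= 4.1.0 for the native |> pipe).
# Hypothetical toy tibble: values transcribed from the printed mammal_sleep_tbl output.
sleep_tbl <- dplyr::tibble(
  species     = c("Africangiantpouchedrat", "Bigbrownbat", "Cat"),
  dreaming    = c(2, 3.9, 3.6),
  total_sleep = c(8.3, 19.7, 14.5)
)

sleep_new <- sleep_tbl |>
  dplyr::mutate(
    # Operator with one column: hours awake per day (assumes total_sleep is hours/day)
    awake = 24 - total_sleep,
    # Operator with two columns: share of sleep spent dreaming
    dreaming_share = dreaming / total_sleep,
    # Function applied to the result of an operator: rounded to 2 decimal places
    dreaming_share_2dp = round(dreaming / total_sleep, 2)
  )

sleep_new
```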

Tidyverse reference page

Dataset

For demonstration we’ll load the mammal_sleep_tbl data from the mgrtibbles package (hyperlink includes install instructions).

#Load package
library("mgrtibbles")
#mammal_sleep_tbl tibble for demonstration
mgrtibbles::mammal_sleep_tbl
# A tibble: 62 × 11
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6.65e+3   5.71           NA       NA           3.3      38.6
 2 Africangiantpou… 1   e+0   0.0066          6.3      2           8.3       4.5
 3 ArcticFox        3.38e+0   0.0445         NA       NA          12.5      14  
 4 Arcticgroundsqu… 9.2 e-1   0.0057         NA       NA          16.5      NA  
 5 Asianelephant    2.55e+3   4.60            2.1      1.8         3.9      69  
 6 Baboon           1.06e+1   0.180           9.1      0.7         9.8      27  
 7 Bigbrownbat      2.3 e-2   0.0003         15.8      3.9        19.7      19  
 8 Braziliantapir   1.6 e+2   0.169           5.2      1           6.2      30.4
 9 Cat              3.3 e+0   0.0256         10.9      3.6        14.5      28  
10 Chimpanzee       5.22e+1   0.44            8.3      1.4         9.7      50  
# ℹ 52 more rows
# ℹ 4 more variables: gestation <dbl>, predation <fct>, exposure <fct>,
#   danger <fct>

Create new column

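The format of the mutate() function is:

dplyr::mutate(mutated = mutation)

where mutated is the name of the new (or existing) column and mutation is the expression that computes its values.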
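A single mutate() call can hold several mutated = mutation pairs. dplyr evaluates them in order, so a later pair can use a column created by an earlier one. The following minimal sketch assumes dplyr is installed. The two-row tibble uses values transcribed from the printed output, and the column names body_brain_ratio and log10_ratio are hypothetical.

```r
# Assumes dplyr is installed; toy rows transcribed from the printed mammal_sleep_tbl output.
wt_tbl <- dplyr::tibble(
  species  = c("ArcticFox", "Cat"),
  body_wt  = c(3.38, 3.3),
  brain_wt = c(0.0445, 0.0256)
)

wt_new <- wt_tbl |>
  dplyr::mutate(
    # mutated = mutation: column name on the left, expression on the right
    body_brain_ratio = body_wt / brain_wt,
    # Pairs are evaluated in order, so this one can use the column created above
    log10_ratio = log10(body_brain_ratio)
  )

wt_new
```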

Create a column containing the brain to body weight ratio.

mammal_sleep_tbl |>
    #Select columns of interest
    dplyr::select(species, body_wt, brain_wt) |>
    #Calculate and add brain_body_wt_ratio column
    dplyr::mutate(brain_body_wt_ratio = brain_wt/(body_wt*1000))
# A tibble: 62 × 4
   species                 body_wt brain_wt brain_body_wt_ratio
   <chr>                     <dbl>    <dbl>               <dbl>
 1 Africanelephant        6654       5.71           0.000000858
 2 Africangiantpouchedrat    1       0.0066         0.0000066  
 3 ArcticFox                 3.38    0.0445         0.0000131  
 4 Arcticgroundsquirrel      0.92    0.0057         0.00000620 
 5 Asianelephant          2547       4.60           0.00000181 
 6 Baboon                   10.6     0.180          0.0000170  
 7 Bigbrownbat               0.023   0.0003         0.0000130  
 8 Braziliantapir          160       0.169          0.00000106 
 9 Cat                       3.3     0.0256         0.00000776 
10 Chimpanzee               52.2     0.44           0.00000844 
# ℹ 52 more rows

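Create a column for lifetime sleep in hours. The calculation multiplies life_span (years) by 365.25 days and then by total_sleep (hours per day). A row with NA in either column gets NA, as happens for Arcticgroundsquirrel.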

mammal_sleep_tbl |>
    #Select columns of interest
    dplyr::select(species, total_sleep, life_span) |>
    #Calculate and add total_life_sleep column
    dplyr::mutate(total_life_sleep = (life_span*365.25) * total_sleep)
# A tibble: 62 × 4
   species                total_sleep life_span total_life_sleep
   <chr>                        <dbl>     <dbl>            <dbl>
 1 Africanelephant                3.3      38.6           46526.
 2 Africangiantpouchedrat         8.3       4.5           13642.
 3 ArcticFox                     12.5      14             63919.
 4 Arcticgroundsquirrel          16.5      NA                NA 
 5 Asianelephant                  3.9      69             98289.
 6 Baboon                         9.8      27             96645.
 7 Bigbrownbat                   19.7      19            136713.
 8 Braziliantapir                 6.2      30.4           68842.
 9 Cat                           14.5      28            148292.
10 Chimpanzee                     9.7      50            177146.
# ℹ 52 more rows
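The NA in the Arcticgroundsquirrel row comes from R's arithmetic rules, where any calculation involving NA returns NA. The sketch below reproduces this and adds a logical flag column with is.na() so the affected rows are easy to find. It assumes dplyr is installed. The two rows are transcribed from the output above, and the life_span_missing column name is hypothetical.

```r
# Assumes dplyr is installed; toy rows transcribed from the printed output above.
life_tbl <- dplyr::tibble(
  species     = c("ArcticFox", "Arcticgroundsquirrel"),
  total_sleep = c(12.5, 16.5),
  life_span   = c(14, NA)
)

life_new <- life_tbl |>
  dplyr::mutate(
    # Arithmetic with NA returns NA, so the squirrel's value is NA
    total_life_sleep = (life_span * 365.25) * total_sleep,
    # Hypothetical flag column marking rows where life_span is missing
    life_span_missing = is.na(life_span)
  )

life_new
```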

Modify existing column

Modify the body_wt column with the round() function.

mammal_sleep_tbl |>
    #Round the body_wt values to whole numbers
    dplyr::mutate(body_wt = round(body_wt))
# A tibble: 62 × 11
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant     6654   5.71           NA       NA           3.3      38.6
 2 Africangiantpou…       1   0.0066          6.3      2           8.3       4.5
 3 ArcticFox              3   0.0445         NA       NA          12.5      14  
 4 Arcticgroundsqu…       1   0.0057         NA       NA          16.5      NA  
 5 Asianelephant       2547   4.60            2.1      1.8         3.9      69  
 6 Baboon                11   0.180           9.1      0.7         9.8      27  
 7 Bigbrownbat            0   0.0003         15.8      3.9        19.7      19  
 8 Braziliantapir       160   0.169           5.2      1           6.2      30.4
 9 Cat                    3   0.0256         10.9      3.6        14.5      28  
10 Chimpanzee            52   0.44            8.3      1.4         9.7      50  
# ℹ 52 more rows
# ℹ 4 more variables: gestation <dbl>, predation <fct>, exposure <fct>,
#   danger <fct>
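Rounding to whole numbers turns small values such as Bigbrownbat's 0.023 into 0. The digits argument of round() keeps a chosen number of decimal places instead. This is a minimal sketch, assuming dplyr is installed, with a hypothetical three-row tibble built from the printed body_wt values.

```r
# Assumes dplyr is installed; toy body_wt values transcribed from the printed output.
body_tbl <- dplyr::tibble(
  species = c("ArcticFox", "Bigbrownbat", "Baboon"),
  body_wt = c(3.38, 0.023, 10.6)
)

body_new <- body_tbl |>
  dplyr::mutate(
    # digits = 2 keeps two decimal places instead of rounding to whole numbers
    body_wt = round(body_wt, digits = 2)
  )

body_new
```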

Delete columns

Remove the species column by setting it to NULL.

mammal_sleep_tbl |>
    #Remove species column
    dplyr::mutate(species = NULL)
# A tibble: 62 × 10
    body_wt brain_wt non_dreaming dreaming total_sleep life_span gestation
      <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>     <dbl>
 1 6654       5.71           NA       NA           3.3      38.6       645
 2    1       0.0066          6.3      2           8.3       4.5        42
 3    3.38    0.0445         NA       NA          12.5      14          60
 4    0.92    0.0057         NA       NA          16.5      NA          25
 5 2547       4.60            2.1      1.8         3.9      69         624
 6   10.6     0.180           9.1      0.7         9.8      27         180
 7    0.023   0.0003         15.8      3.9        19.7      19          35
 8  160       0.169           5.2      1           6.2      30.4       392
 9    3.3     0.0256         10.9      3.6        14.5      28          63
10   52.2     0.44            8.3      1.4         9.7      50         230
# ℹ 52 more rows
# ℹ 3 more variables: predation <fct>, exposure <fct>, danger <fct>
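Several columns can be set to NULL in the same mutate() call. For comparison, the sketch also drops the same columns with dplyr::select() and a minus sign, a different function that gives the same result here. It assumes dplyr is installed and uses a hypothetical two-row tibble with values from the printed output.

```r
# Assumes dplyr is installed; toy rows transcribed from the printed output.
del_tbl <- dplyr::tibble(
  species  = c("Cat", "Chimpanzee"),
  body_wt  = c(3.3, 52.2),
  brain_wt = c(0.0256, 0.44)
)

# Remove two columns in one mutate() call by setting both to NULL
via_mutate <- del_tbl |> dplyr::mutate(species = NULL, brain_wt = NULL)

# Drop the same columns with dplyr::select() and negative selection
via_select <- del_tbl |> dplyr::select(-species, -brain_wt)

via_mutate
```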

Multiple columns with across()

The dplyr::across() function can be used within a dplyr::mutate() function to apply a change across multiple columns.

across() Tidyverse reference page

Round the body_wt and brain_wt columns to whole numbers.

mammal_sleep_tbl |>
    #Round numbers of body_wt and brain_wt columns
    dplyr::mutate(
        dplyr::across(c(body_wt, brain_wt), round)
        )
# A tibble: 62 × 11
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant     6654        6         NA       NA           3.3      38.6
 2 Africangiantpou…       1        0          6.3      2           8.3       4.5
 3 ArcticFox              3        0         NA       NA          12.5      14  
 4 Arcticgroundsqu…       1        0         NA       NA          16.5      NA  
 5 Asianelephant       2547        5          2.1      1.8         3.9      69  
 6 Baboon                11        0          9.1      0.7         9.8      27  
 7 Bigbrownbat            0        0         15.8      3.9        19.7      19  
 8 Braziliantapir       160        0          5.2      1           6.2      30.4
 9 Cat                    3        0         10.9      3.6        14.5      28  
10 Chimpanzee            52        0          8.3      1.4         9.7      50  
# ℹ 52 more rows
# ℹ 4 more variables: gestation <dbl>, predation <fct>, exposure <fct>,
#   danger <fct>
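By default across() overwrites the columns it selects. Its .names argument takes a glue-style specification that writes the results to new columns and leaves the originals unchanged. The sketch below is a minimal example, assuming dplyr is installed. The "_rounded" suffix and the two-row tibble (values from the printed output) are hypothetical.

```r
# Assumes dplyr is installed; toy rows transcribed from the printed output.
wt_tbl <- dplyr::tibble(
  species  = c("Africanelephant", "Asianelephant"),
  body_wt  = c(6654, 2547),
  brain_wt = c(5.71, 4.60)
)

wt_rounded <- wt_tbl |>
  dplyr::mutate(
    # {.col} is replaced by each selected column's name, e.g. body_wt_rounded
    dplyr::across(c(body_wt, brain_wt), round, .names = "{.col}_rounded")
  )

wt_rounded
```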

Convert hour values of non_dreaming, dreaming, and total_sleep to minutes by multiplying by 60.

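Note: \(x) x*60 is not a purrr formula. It is base R's shorthand for an anonymous function (available from R 4.1.0) and is equivalent to function(x) x*60. The purrr-style formula equivalent would be ~ .x*60.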

mammal_sleep_tbl |>
    #Convert non_dreaming, dreaming and total_sleep from hours to minutes
    dplyr::mutate(
        dplyr::across(c(non_dreaming, dreaming, total_sleep), \(x) x*60)
        )
# A tibble: 62 × 11
   species          body_wt brain_wt non_dreaming dreaming total_sleep life_span
   <chr>              <dbl>    <dbl>        <dbl>    <dbl>       <dbl>     <dbl>
 1 Africanelephant  6.65e+3   5.71             NA       NA         198      38.6
 2 Africangiantpou… 1   e+0   0.0066          378      120         498       4.5
 3 ArcticFox        3.38e+0   0.0445           NA       NA         750      14  
 4 Arcticgroundsqu… 9.2 e-1   0.0057           NA       NA         990      NA  
 5 Asianelephant    2.55e+3   4.60            126      108         234      69  
 6 Baboon           1.06e+1   0.180           546       42         588      27  
 7 Bigbrownbat      2.3 e-2   0.0003          948      234        1182      19  
 8 Braziliantapir   1.6 e+2   0.169           312       60         372      30.4
 9 Cat              3.3 e+0   0.0256          654      216         870      28  
10 Chimpanzee       5.22e+1   0.44            498       84         582      50  
# ℹ 52 more rows
# ℹ 4 more variables: gestation <dbl>, predation <fct>, exposure <fct>,
#   danger <fct>
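across() accepts the function in several equivalent forms. The sketch below applies the same hours-to-minutes conversion with the base R lambda shorthand, a full function() definition, and a purrr-style formula, then checks that all three agree. It assumes dplyr is installed and uses a hypothetical two-row tibble with values from the printed output.

```r
# Assumes dplyr is installed and R >= 4.1.0 (needed for the \(x) shorthand).
# Toy rows transcribed from the printed output.
hours_tbl <- dplyr::tibble(
  species      = c("Africangiantpouchedrat", "Asianelephant"),
  non_dreaming = c(6.3, 2.1),
  dreaming     = c(2, 1.8),
  total_sleep  = c(8.3, 3.9)
)

# 1. Base R lambda shorthand, as used above
m_lambda <- hours_tbl |>
  dplyr::mutate(dplyr::across(c(non_dreaming, dreaming, total_sleep), \(x) x * 60))

# 2. The same anonymous function written out in full
m_function <- hours_tbl |>
  dplyr::mutate(dplyr::across(c(non_dreaming, dreaming, total_sleep), function(x) x * 60))

# 3. purrr-style formula, where .x stands for the current column
m_formula <- hours_tbl |>
  dplyr::mutate(dplyr::across(c(non_dreaming, dreaming, total_sleep), ~ .x * 60))

m_lambda
```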

Changing column classes

You can convert a column to a different class with the as.*() family of functions, including:

  • as.numeric(): Convert to numeric
    • Especially useful when a column of numbers is stored as character strings
  • as.character(): Convert to character (strings)
  • as.factor(): Convert to factor
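The three conversions listed above can be combined in one mutate() call. This minimal sketch assumes dplyr is installed. The tibble is hypothetical, with body_wt deliberately stored as character strings to mimic a column of numbers read in as text.

```r
# Assumes dplyr is installed; hypothetical toy tibble with numbers stored as text.
chr_tbl <- dplyr::tibble(
  species   = c("Cat", "Chimpanzee"),
  body_wt   = c("3.3", "52.2"),  # numbers stored as character strings
  life_span = c(28, 50)
)

conv_tbl <- chr_tbl |>
  dplyr::mutate(
    body_wt   = as.numeric(body_wt),    # character -> numeric
    life_span = as.character(life_span), # numeric -> character
    species   = as.factor(species)       # character -> factor
  )

conv_tbl
```

A string that cannot be parsed as a number (for example "unknown") becomes NA, and as.numeric() issues a "NAs introduced by coercion" warning.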

First we’ll use forcats::fct_recode() to recode the factor levels of predation, exposure, and danger from the numbers 1 to 5 into labels running from "least likely" to "most likely".

Glimpse the original numbered factors.

#Original numbered factors
mammal_sleep_tbl |>
    #Select columns of interest
    dplyr::select(species,predation:danger) |>
    dplyr::glimpse()
Rows: 62
Columns: 4
$ species   <chr> "Africanelephant", "Africangiantpouchedrat", "ArcticFox", "A…
$ predation <fct> 3, 3, 1, 5, 3, 4, 1, 4, 1, 1, 5, 5, 2, 5, 1, 2, 2, 2, 1, 1, …
$ exposure  <fct> 5, 1, 1, 2, 5, 4, 1, 5, 2, 1, 4, 5, 1, 5, 1, 2, 2, 2, 2, 1, …
$ danger    <fct> 3, 3, 1, 3, 4, 4, 1, 4, 1, 1, 4, 5, 2, 5, 1, 2, 2, 2, 1, 1, …

Recode the numbered factor levels to descriptive labels (the columns remain factors) and glimpse the new tibble.

#New tibble with descriptive factor levels
mammal_sleep_fct_indexes_tbl <-
    mammal_sleep_tbl |>
    #Select columns of interest
    dplyr::select(species,predation:danger) |>
    #Mutate predation to convert indexes from number to character values
    #Demonstrates factor recoding on one column
    dplyr::mutate(
        predation=forcats::fct_recode(
            predation, "least likely"="1","unlikely"="2",
            "neither"="3","likely"="4","most likely"="5")
    ) |>
    #Mutate exposure and danger to convert indexes from number to character values
    #Demonstrates the same factor recoding on multiple columns
    dplyr::mutate(
        dplyr::across(
            exposure:danger,
            ~ forcats::fct_recode(.x,
            "least likely"="1","unlikely"="2",
            "neither"="3","likely"="4","most likely"="5")
        )
    )
#Glimpse created tibble
mammal_sleep_fct_indexes_tbl |> dplyr::glimpse()
Rows: 62
Columns: 4
$ species   <chr> "Africanelephant", "Africangiantpouchedrat", "ArcticFox", "A…
$ predation <fct> neither, neither, least likely, most likely, neither, likely…
$ exposure  <fct> most likely, least likely, least likely, unlikely, most like…
$ danger    <fct> neither, neither, least likely, neither, likely, likely, lea…
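Base R's factor() with its levels and labels arguments is an alternative to forcats::fct_recode(). It is a swapped-in technique, not the method used above. It also fixes the level order explicitly, so the integer codes match the original 1 to 5 ratings. The sketch assumes dplyr is installed, and the toy values are the first six predation entries from the glimpse() output.

```r
# Assumes dplyr is installed; toy values are the first six predation entries shown above.
idx_tbl <- dplyr::tibble(predation = factor(c("3", "3", "1", "5", "3", "4")))

lbl_tbl <- idx_tbl |>
  dplyr::mutate(
    predation = factor(
      as.character(predation),
      # levels fixes the order 1 to 5; labels renames them in that same order
      levels = c("1", "2", "3", "4", "5"),
      labels = c("least likely", "unlikely", "neither", "likely", "most likely")
    )
  )

levels(lbl_tbl$predation)
```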

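Convert the exposure factor column to strings and the danger factor column to numerics. Note that as.numeric() on a factor returns the factor's underlying integer level codes, not its labels. The resulting numbers therefore follow the order of the factor levels and need not match the original 1 to 5 ratings (for example, "neither" becomes 1 for Africanelephant).

mammal_sleep_fct_indexes_tbl |>
    #Convert exposure to character and danger to numeric (level codes)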
    dplyr::mutate(exposure=as.character(exposure), danger=as.numeric(danger))
# A tibble: 62 × 4
   species                predation    exposure     danger
   <chr>                  <fct>        <chr>         <dbl>
 1 Africanelephant        neither      most likely       1
 2 Africangiantpouchedrat neither      least likely      1
 3 ArcticFox              least likely least likely      2
 4 Arcticgroundsquirrel   most likely  unlikely          1
 5 Asianelephant          neither      most likely       3
 6 Baboon                 likely       likely            3
 7 Bigbrownbat            least likely least likely      2
 8 Braziliantapir         likely       most likely       3
 9 Cat                    least likely unlikely          2
10 Chimpanzee             least likely least likely      2
# ℹ 52 more rows
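The danger values above are consistent with level codes assigned in order of first appearance rather than 1 to 5. The sketch below shows the pitfall on a hypothetical factor that holds numbers as labels. In that case, converting to character first, with as.numeric(as.character(x)), recovers the original numbers. For factors that already carry descriptive labels, such as danger here, that route would give NA, so the level order has to be set explicitly instead (for example with factor(levels = ...) or forcats::fct_relevel()).

```r
# Hypothetical factor whose levels are in order of first appearance, not 1 to 5
danger <- factor(c("3", "3", "1", "4"), levels = c("3", "1", "4"))

# as.numeric() on a factor returns the integer level codes
codes <- as.numeric(danger)

# Converting to character first recovers the numbers written in the labels
values <- as.numeric(as.character(danger))

codes
values
```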

Convert the columns predation:danger from factors to characters.

mammal_sleep_fct_indexes_tbl |>
    #Convert every column from predation to danger to character
    dplyr::mutate(
        dplyr::across(predation:danger, as.character)
    )
# A tibble: 62 × 4
   species                predation    exposure     danger      
   <chr>                  <chr>        <chr>        <chr>       
 1 Africanelephant        neither      most likely  neither     
 2 Africangiantpouchedrat neither      least likely neither     
 3 ArcticFox              least likely least likely least likely
 4 Arcticgroundsquirrel   most likely  unlikely     neither     
 5 Asianelephant          neither      most likely  likely      
 6 Baboon                 likely       likely       likely      
 7 Bigbrownbat            least likely least likely least likely
 8 Braziliantapir         likely       most likely  likely      
 9 Cat                    least likely unlikely     least likely
10 Chimpanzee             least likely least likely least likely
# ℹ 52 more rows
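You can also pick the columns by type instead of by name. Inside across(), the tidyselect helper where() (re-exported by dplyr) can select every factor column regardless of position. This is a minimal sketch, assuming dplyr is installed, with a hypothetical two-row tibble using labels from the output above.

```r
# Assumes dplyr is installed; toy rows use labels from the printed output above.
fct_tbl <- dplyr::tibble(
  species   = c("Cat", "Chimpanzee"),
  predation = factor(c("least likely", "least likely")),
  exposure  = factor(c("unlikely", "least likely"))
)

chr_tbl <- fct_tbl |>
  # where(is.factor) selects every factor column, whatever its name or position
  dplyr::mutate(dplyr::across(dplyr::where(is.factor), as.character))

chr_tbl
```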

NA filling

For NA filling/editing please see the tidyr drop_na() page.