Dplyr

Overview
dplyr is the main data manipulation package for tibbles in the tidyverse.
It is described as a “grammar of data manipulation”: its functions are named as verbs that describe what they do.
This website aims to quickly cover the most commonly used dplyr functions and uses, so there are many more dplyr functions than those covered here. Please check the below link for the full list.
Sections
There are many sections for dplyr. These are summarised below.
Pipes
Pipes (|>) are a vital part of creating efficient and clear code with the tidyverse. Pipes allow you to chain functions together, passing the result of one function on as the first argument of the next. They can be used with all functions, not just those from the tidyverse.
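As a minimal sketch of what piping looks like in practice, the example below compares nested calls with the piped equivalent. The scores tibble and its column names are made up for illustration, and the native pipe (|>) assumes R 4.1 or later.

```r
library(dplyr)

# Hypothetical example data, not from the original page.
scores <- tibble(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  score = c(82, 45, 91, 67)
)

# Nested calls have to be read from the inside out ...
nested <- head(arrange(filter(scores, score > 50), desc(score)), 2)

# ... whereas the piped version reads from top to bottom.
piped <- scores |>
  filter(score > 50) |>
  arrange(desc(score)) |>
  head(2)

# The pipe also works with base R functions, not just dplyr verbs.
mean_score <- scores$score |> mean()
```

Both versions should give the same two rows; the piped form is usually easier to extend with further steps.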
Rows
There are four main verbs (i.e. functions) to manipulate rows. These are:
- arrange(): Arranges the rows of a tibble. Can be used to reorder the rows based on the values of a column.
- distinct(): Extracts unique/distinct rows from a tibble.
- filter(): Extracts rows by filtering with conditions. This can be used to pick rows of certain groups, filter based on numeric sizes, and more.
- slice(): A set of methods to choose a slice of rows based on index positions, top and bottom observations, and min and max values based on a specific column. This is especially useful for piping (|>).
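The row verbs above could be tried out along these lines. This is only a sketch: the pets tibble is hypothetical, and slice_max() and slice_min() are used here as two members of the slice() family.

```r
library(dplyr)

# Hypothetical example data; note the duplicated dog row.
pets <- tibble(
  species = c("cat", "dog", "dog", "cat", "fish"),
  weight  = c(4.2, 12.5, 12.5, 3.8, 0.1)
)

sorted      <- pets |> arrange(weight)           # reorder rows, ascending by weight
unique_rows <- pets |> distinct()                # drop the duplicated dog row
dogs        <- pets |> filter(species == "dog")  # keep rows where the condition is TRUE
first_two   <- pets |> slice(1:2)                # choose rows by index position

# One row with the largest / smallest weight.
heaviest <- pets |> slice_max(weight, n = 1, with_ties = FALSE)
lightest <- pets |> slice_min(weight, n = 1)
```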
Columns
There are six main verbs (i.e. functions) to manipulate columns. These are:
- glimpse(): Prints a tibble in a transposed manner. Useful for seeing the data types of all the columns.
- mutate(): Mutates columns to create new columns based on existing ones, modify existing columns, and delete columns.
- pull(): Pulls out a single column from a tibble, resulting in a vector.
- relocate(): Relocates columns. You can relocate columns to the start or end, and you can move them after or before specified columns.
- rename(): Renames columns in a tibble.
- select(): Selects specific columns of a tibble. Can be used with a variety of helper functions such as starts_with(), ends_with(), contains(), and matches().
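A short sketch of the column verbs in action is given below. The people tibble and its column names are invented for this example.

```r
library(dplyr)

# Hypothetical example data.
people <- tibble(
  first_name = c("Ana", "Ben"),
  height_cm  = c(170, 182),
  weight_kg  = c(65, 80)
)

glimpse(people)  # prints one line per column, including its data type

# Create a new column from existing ones.
with_bmi <- people |>
  mutate(bmi = weight_kg / (height_cm / 100)^2)

heights   <- people |> pull(height_cm)                           # a plain numeric vector
reordered <- people |> relocate(weight_kg, .before = height_cm)  # move a column
renamed   <- people |> rename(name = first_name)                 # new_name = old_name
kg_only   <- people |> select(first_name, ends_with("_kg"))      # helper function in select()
```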
If you would like to carry out one of the column functions with multiple columns you can look at the official documentation for the following functions:
Grouping
Tibbles can be grouped by one or more variables/columns. This allows for group-wise calculations.
- group_by(): Converts a tibble to a grouped tibble.
- count(): Counts the number of instances of each unique value for the grouping in a tibble.
- summarise(): Produces a tibble with summary information on the group members in a grouped tibble.
  - Various functions can be used to calculate summary information, including n(), mean(), median(), sd(), IQR(), first(), last(), and nth().
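To illustrate the grouping functions, here is a minimal sketch using a made-up trials tibble. The output column names (n_rows, mean_value, and so on) are arbitrary choices for this example.

```r
library(dplyr)

# Hypothetical example data.
trials <- tibble(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 10, 20, 30)
)

# count() gives the number of rows for each unique value of `group`.
counts <- trials |> count(group)

# group_by() followed by summarise() gives one row per group.
stats <- trials |>
  group_by(group) |>
  summarise(
    n_rows       = n(),
    mean_value   = mean(value),
    median_value = median(value),
    first_value  = first(value),
    last_value   = last(value)
  )
```

The result of summarise() here has one row for group "a" and one for group "b", with one column per summary.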
Bind tibbles
Tibbles can be combined/bound together with the following functions:
- bind_cols(): Binds tibbles by columns (i.e. binds the tibbles side by side). The tibbles must have the same number of rows.
- bind_rows(): Binds tibbles by rows (i.e. binds one tibble on top of the other). Columns are matched by name, so columns with the same name must have compatible types; a column that is missing from one tibble is filled with NA.
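Binding could look something like the sketch below. The three small tibbles (left, right, more) are hypothetical and exist only to show the shape of the results.

```r
library(dplyr)

# Hypothetical example data.
left  <- tibble(id = 1:2, x = c("a", "b"))
right <- tibble(y = c(TRUE, FALSE))   # same number of rows as `left`
more  <- tibble(id = 3L, x = "c")     # same column names and types as `left`

side_by_side <- bind_cols(left, right)  # 2 rows; columns id, x, y
stacked      <- bind_rows(left, more)   # 3 rows; columns id, x
```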