Merging and appending datasets with dplyr (R) (2024)

Introduction

This brief post explains how to merge and append datasets in R, specifically with the dplyr package. First of all, the package must be installed (only once) and loaded, i.e. install.packages("dplyr") and library(dplyr).

library(dplyr)

Before starting, the following illustration shows what merge and append mean in this post:

[Illustration 1: merge versus append]

Merge functions

In this post I explain six useful dplyr functions to merge datasets and one useful function to append datasets. First, I briefly describe the six merging functions (each function joins the two datasets indicated):

  • inner_join(): the output dataset only contains matched observations from both initial datasets.
  • full_join(): the output dataset contains all observations from both initial datasets.
  • left_join(): the output dataset contains all observations from the first (or left) dataset indicated and only matched observations from the second (or right) dataset indicated.
  • right_join(): the output dataset contains only matched observations from the first (or left) dataset indicated and all observations from the second (or right) dataset indicated.
  • semi_join(): the output dataset only contains the observations from the first (or left) dataset indicated that have a match in the second (or right) dataset, and only the variables from the left dataset.
  • anti_join(): the output dataset only contains the observations from the first (or left) dataset indicated that have no match in the second (or right) dataset, again with only the variables from the left dataset.

The following illustration summarises these merging functions visually:

[Illustration 2: the six merging functions]
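To make the six descriptions concrete, here is a minimal sketch that counts the rows each function returns. The data frames left_df and right_df are made up for this illustration (they are not the datasets used later in the post): ids 1 to 3 on the left and ids 2 to 4 on the right, so only ids 2 and 3 match.

```r
library(dplyr)

# Hypothetical toy data: ids 2 and 3 appear in both data frames
left_df  <- data.frame(id = c(1, 2, 3), x = c("a", "b", "c"))
right_df <- data.frame(id = c(2, 3, 4), y = c("B", "C", "D"))

# Number of rows returned by each merging function
rows <- c(
  inner = nrow(inner_join(left_df, right_df, by = "id")),
  full  = nrow(full_join(left_df, right_df, by = "id")),
  left  = nrow(left_join(left_df, right_df, by = "id")),
  right = nrow(right_join(left_df, right_df, by = "id")),
  semi  = nrow(semi_join(left_df, right_df, by = "id")),
  anti  = nrow(anti_join(left_df, right_df, by = "id"))
)
print(rows)  # inner 2, full 4, left 3, right 3, semi 2, anti 1

# semi_join() and anti_join() keep only the variables of the left dataset
semi_names <- names(semi_join(left_df, right_df, by = "id"))
print(semi_names)  # "id" "x"
```

In the full, left and right joins, the unmatched rows get NA in the variables that come from the other dataset.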

Append function

Second, the function to append two datasets:

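  • bind_rows(): this is the verb to append two datasets, i.e. the output contains all the rows and all the variables from the two datasets. Variables are matched by name, and a variable that exists in only one dataset is filled with NA for the rows of the other.

[Illustration 3: appending two datasets]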
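As a small sketch of how bind_rows() handles variables that are not shared, assume two made-up data frames, first_df and second_df, that have id in common and one extra variable each:

```r
library(dplyr)

# Hypothetical toy data: one shared variable (id) and one distinct variable each
first_df  <- data.frame(id = 1:2, names = c("Paul", "Sara"))
second_df <- data.frame(id = 3:4, years = c("29", "18"))

# Rows of second_df are stacked below the rows of first_df
appended <- bind_rows(first_df, second_df)
print(appended)

# names is NA for the rows coming from second_df,
# years is NA for the rows coming from first_df
print(is.na(appended))
```

The output has four rows and three variables (id, names, years), so nothing is dropped; the gaps are simply marked as missing.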

Practise – merge

I first create two small datasets to practise:

id <- c(1:5)
names <- c("Paul", "Sara", "John", "Sandra", "Helen")
years <- c("24", "30", "45", "29", "18")
people <- data.frame(id, names)
age <- data.frame(id, years)

The syntax for the six merging functions is exactly the same, but the output differs according to the descriptions in the previous section. For example, I want to merge the people and age datasets with inner_join():

# Merge the two datasets by the "id" variable
output_dataset <- inner_join(people, age, by = "id")
  • The result is a data frame named output_dataset. Since I have used inner_join(), only matched observations from the two datasets remain.
  • people is the first (or left) dataset to merge.
  • age is the second (or right) dataset to merge.
  • I indicate the variable which connects both datasets with by =. In this example, I merge two datasets by one variable, but merging datasets by several variables is also possible.
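To illustrate merging by several variables, here is a minimal sketch with two made-up data frames, scores and visits, that hold one row per person and year. Passing a character vector to by = means a row matches only when all the listed variables are equal.

```r
library(dplyr)

# Hypothetical data: one row per person (id) and year
scores <- data.frame(
  id    = c(1, 1, 2, 2),
  year  = c(2023, 2024, 2023, 2024),
  score = c(10, 12, 9, 11)
)
visits <- data.frame(
  id       = c(1, 2, 2),
  year     = c(2024, 2023, 2024),
  n_visits = c(3, 1, 4)
)

# A row is matched only when BOTH id and year are equal
by_two <- inner_join(scores, visits, by = c("id", "year"))
print(by_two)
```

Here the combination id 1 and year 2023 has no partner in visits, so inner_join() returns three rows.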

The merging variable (called id in this example) might be named differently in each dataset. If this is the case, I must write the by = syntax as follows:

# Rename the id variable in the age dataset to show this example:
age <- rename(age, age_id = id)
output_dataset <- inner_join(people, age, by = c("id" = "age_id"))

The first variable, id, corresponds to the first (or left) dataset, i.e. people, and the second variable, age_id, corresponds to the second (or right) one, i.e. age.
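The sketch below rebuilds the two datasets so that it runs on its own and checks which key name survives the merge. The join_by() variant at the end is an addition of mine, not part of the original example, and it assumes dplyr 1.1.0 or later, so it is guarded by a version check.

```r
library(dplyr)

people <- data.frame(id = 1:5,
                     names = c("Paul", "Sara", "John", "Sandra", "Helen"))
age <- data.frame(age_id = 1:5,
                  years = c("24", "30", "45", "29", "18"))

# Named vector: "key in the left dataset" = "key in the right dataset"
out_named <- inner_join(people, age, by = c("id" = "age_id"))
print(names(out_named))  # "id" "names" "years" (only the left key name is kept)

# Same specification written with join_by() (assumes dplyr >= 1.1.0)
if (packageVersion("dplyr") >= "1.1.0") {
  out_join_by <- inner_join(people, age, by = join_by(id == age_id))
  print(names(out_join_by))
}
```

Note that the output has a single key variable named after the left dataset, so age_id does not appear in the result.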

Moreover, the two initial datasets might contain two different variables that share the same name. I can label each of these variables to know where it comes from with suffix =:

# New variable in each dataset with the same name but different values to show this example:
random1 <- c("a", "2", "c", "8", "a")
random2 <- c("d", "23", "c", "8a", "a")
# Prepare the people dataset for this example
people <- data.frame(id, names, random1)
people <- rename(people, random = random1)
# Prepare the age dataset for this example
age <- data.frame(id, years, random2)
age <- rename(age, random = random2)
# Then, merge these two datasets
output_dataset <- inner_join(people, age, by = "id", suffix = c("_people", "_age"))

The first element in suffix, "_people", is for variables in the first dataset, people, and the second element, "_age", is for variables in the second dataset, age. Remember that these suffixes are added only to non-key variables that have the same name in both datasets. If I do not specify these suffixes, dplyr adds ".x" for the first dataset and ".y" for the second one.
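A quick way to see the difference is to compare the variable names with and without suffix =. This sketch uses shortened, made-up versions of the two datasets (three rows, only id and random) so that it is self-contained:

```r
library(dplyr)

# Shortened illustrative datasets: both have a non-key variable called random
people <- data.frame(id = 1:3, random = c("a", "2", "c"))
age    <- data.frame(id = 1:3, random = c("d", "23", "c"))

default_suffix <- inner_join(people, age, by = "id")
custom_suffix  <- inner_join(people, age, by = "id",
                             suffix = c("_people", "_age"))

print(names(default_suffix))  # "id" "random.x" "random.y"
print(names(custom_suffix))   # "id" "random_people" "random_age"
```

The key variable id is never suffixed, because it is the variable the datasets are merged by.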

Note that to join more than two datasets, I must chain the function, adding one call for each additional dataset. For example, to merge three datasets:

# Create a third dataset:
location <- c("Spain", "Germany", "France", "France", "Italy")
country <- data.frame(id, location)
output_dataset <- people %>% inner_join(age, by = "id") %>% inner_join(country, by = "id")
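When the number of datasets grows, writing one inner_join() per dataset becomes repetitive. One possible alternative, which is my own suggestion rather than part of the original example, is to put the datasets in a list and fold over it with base R's Reduce(). The sketch assumes every dataset shares the id variable and rebuilds the three datasets without the random variable:

```r
library(dplyr)

id <- 1:5
people  <- data.frame(id, names = c("Paul", "Sara", "John", "Sandra", "Helen"))
age     <- data.frame(id, years = c("24", "30", "45", "29", "18"))
country <- data.frame(id, location = c("Spain", "Germany", "France", "France", "Italy"))

# Join the list pairwise, from left to right, always by "id"
datasets <- list(people, age, country)
output_dataset <- Reduce(function(x, y) inner_join(x, y, by = "id"), datasets)

print(names(output_dataset))  # "id" "names" "years" "location"
```

This gives the same result as the chained version, and adding a fourth dataset only requires adding it to the list.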

Practise – append

Finally, to append two datasets:

output_dataset <- people %>% bind_rows(age)
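bind_rows() also accepts more than two datasets in a single call. In the sketch below, the three batches are made up for illustration, and the .id argument adds a variable (here called source, a name I chose) that records which dataset each row came from, using the argument names as labels:

```r
library(dplyr)

# Hypothetical batches with the same variables
batch_a <- data.frame(id = 1:2, names = c("Paul", "Sara"))
batch_b <- data.frame(id = 3:4, names = c("John", "Sandra"))
batch_c <- data.frame(id = 5L, names = "Helen")

# Append the three batches and label each row with its origin
stacked <- bind_rows(a = batch_a, b = batch_b, c = batch_c, .id = "source")
print(stacked)
```

The result has five rows, and the source variable takes the values "a", "a", "b", "b" and "c".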

Now practise the other merge functions yourself with these datasets, and try to append more than three datasets. Good luck!

Article information

Author: Nicola Considine CPA
