How To Import Data Into Rstudio

Importing data and managing files

Learning objectives:

Create projects in Rstudio
Proper steps to import data
Intro to data transformation using dplyr

Project-oriented workflow

It is best to create a separate folder that contains all related files. You can do the same for research projects. In Rstudio, this is called a Project. For example, you can create a project for each of the chapters in this book. Projects are self-contained: they can be moved to another computer or folder and the code will still run.

If the first line of your R script is

setwd("C:\Users\jenny\path\that\just\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE

—@JennyBryan on Twitter

Animated GIF: creating a project and importing data.

Create a project in a new folder

Start by setting up a project in a new folder by going to File->New Project->New Directory->Empty Project. Then choose where the directory will be created on your hard drive. I created a directory called "Chapter4" under "D:\RBook." Rstudio creates a project file named like "Chapter4.Rproj," which contains information such as script files and working folders. Project files can be saved and later opened from File->Open Project. In Windows you can also just double-click on the project file. You get everything ready to continue a particular assignment or research project.

This folder is your working directory, the default folder for reading files, writing outputs, etc.
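Because the project folder is the working directory, scripts can refer to files with paths relative to it, such as "datasets/heartatk4R.txt", instead of an absolute setwd() path. The minimal sketch below imitates a project in a temporary folder (the folder and file names are hypothetical) so that it runs on any computer; the setwd() call only plays the role that Rstudio plays when it opens a project.

```r
# Hypothetical project folder created under R's temporary directory
project_dir <- file.path(tempdir(), "Chapter4_demo")
dir.create(file.path(project_dir, "datasets"), recursive = TRUE, showWarnings = FALSE)

# Put a tiny tab-delimited file into the datasets subfolder
writeLines(c("id\tvalue", "1\t10", "2\t20"),
           file.path(project_dir, "datasets", "demo.txt"))

# Imitate opening the project: the project folder becomes the working directory
old_wd <- setwd(project_dir)

# A relative path works wherever the project folder lives on disk
demo <- read.table(file.path("datasets", "demo.txt"), sep = "\t", header = TRUE)
print(demo)

setwd(old_wd)  # restore the previous working directory
```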

More advanced users should consider creating a project with version control, so that all versions of your code are backed up and can be easily shared via GitHub. See Happy Git and GitHub for the useR by Jenny Bryan.

Copy data files to the new directory

This is done outside of Rstudio, in the Windows or Mac interface. Download the heartatk4R.txt file from here. It is a tab-delimited text file, meaning the different columns are separated by tabs. We also need to get some context about the data and how it was collected. At a minimum, we need to know the meaning of each column.

File unzipping, conversion, and context. If data is compressed, unzip it using 7-zip, WinRAR, Winzip, or gzip. Any of these will do. Is it a text file (CSV, txt, …) or a binary file (XLS, XLSX, …)? Convert a binary file to a text file using the corresponding application. Comma separated values (CSV) files use commas to separate the columns. Another common type is tab-delimited text files, which use the tab character, written as \t because it is invisible.
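Unzipping first is always fine, but for a single gzip-compressed text file R can also decompress on the fly through a gzfile() connection. The minimal sketch below writes a small, made-up tab-delimited file in gzip format and reads it back; note how the tab delimiter is given to R as "\t". Archives holding several files (.zip) still need to be extracted first.

```r
# Made-up data written to a gzip-compressed, tab-delimited file
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")                        # compressing connection for writing
writeLines(c("SEX\tAGE", "F\t79", "M\t34"), con)   # "\t" is the invisible tab character
close(con)

# read.table() accepts a connection, so the file is decompressed while being read
dat <- read.table(gzfile(gz_path), sep = "\t", header = TRUE)
str(dat)
```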

Checking the file with a text editor and Excel. Before reading files into R, we often need to open the files to take a look. Notepad or WordPad that come with Windows are very limited (and so amateur)! Do not even think about using Microsoft Word! I strongly recommend that you install a powerful text editor such as NotePad++ (https://notepad-plus-plus.org/), or TextPad (https://www.textpad.com/). If you are a Mac user, try TextMate, TextWrangler, etc. I use NotePad++ almost every day to look into data, and also to write R programs, as it can highlight R commands based on R syntax. I even use a tool called NppToR (https://sourceforge.net/projects/npptor/) to send R commands from NotePad++ directly to R, and I love it!
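A quick look is also possible from inside R. readLines() with a small n returns the first raw lines exactly as stored (tabs print as \t), and count.fields() reports how many columns each line splits into for a given delimiter. The file in this sketch is made up for the demonstration.

```r
path <- tempfile(fileext = ".txt")   # stand-in for a downloaded data file
writeLines(c("Patient\tSEX\tAGE", "1\tF\t79", "2\tF\t34", "3\tM\t76"), path)

first_lines <- readLines(path, n = 2)   # read only the first 2 lines
print(first_lines)                      # tabs are displayed as "\t"

# Fields per line: equal counts suggest the delimiter is the right one
n_fields <- count.fields(path, sep = "\t")
n_fields
```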

Regardless of the extensions in their file names, all plain text files can be opened by these text editors. Plain text files contain only text, without any formatting, links, or images. The file names can be "poems.txt," "poems.tex," "students.csv," or just "data" without an extension. I often save my R scripts as text files with names like "code_1-22-2017.R." You can import text files, regardless of file names, into Microsoft Excel, which can properly parse your file into columns if the correct delimiter is specified.

Comma separated values (CSV) files use commas to separate the columns. CSV files can also be conveniently opened by Excel, and Rstudio likes them too. So let's try to use CSV files. Another common type is tab-delimited text files, which use the tab character, written as \t. Other types of files, such as Excel .xls or .xlsx files, often need to be saved as CSV files.
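The only difference between the two formats is the delimiter, and R has to be told which one is used. In this sketch the same made-up rows are stored once as CSV and once as tab-delimited text, passed through the text argument so no files are needed; read.csv() assumes commas and read.delim() assumes tabs.

```r
csv_text <- "Patient,SEX,AGE
1,F,79
2,M,34"
tab_text <- "Patient\tSEX\tAGE
1\tF\t79
2\tM\t34"

from_csv <- read.csv(text = csv_text)    # sep = "," is the default for read.csv()
from_tab <- read.delim(text = tab_text)  # sep = "\t" is the default for read.delim()

identical(from_csv, from_tab)  # same data frame once the right delimiter is used
```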

Import data files

In Rstudio, click File->Import Dataset->From Text (readr)…, and find the file on your hard drive. You should change the Delimiter to "Tab," and the preview then shows that the data is correctly parsed into multiple columns. You can also change the name of your data object by changing the default "heartatk4R" to "df" on the lower left of the import interface.

We need to check each of the columns and see whether the data types are correctly read in. The first column is just the patient ID number, which runs from 1 to 12844. It will not be useful in our analysis. The numbers in DIAGNOSIS, DRG, and DIED are integers, but they actually code for certain categories. They are not measurements. It does not make sense, for example, to add or average them. Most of the time, there is no particular order. The same is true for SEX. So in this dialog, DIAGNOSIS, DRG, and DIED should be changed from "double" to "character". As shown in Figure 4.2, you can click on the automatically guessed data type under each of the column names. By selecting "character" from the drop-down, you can format this column as character.

On the other hand, LOS (length of stay in days) and AGE should be numbers. But because 10 is presented as "0010," these columns are automatically recognized as characters. We have to force R to read these columns as integers by clicking on the column title and selecting integer, changing LOS and AGE from "character" to "integer".
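Both points can be checked with a few made-up values. Averaging category codes yields a number that matches no category, while a frequency table is meaningful; and zero-padded text converts cleanly to integers. When reading with base R instead of the dialog, colClasses plays the role of the drop-down menus: it keeps an ID as text (leading zeros preserved) and reads LOS as integer. The values and the ID column here are hypothetical.

```r
# Numeric-looking category codes (made-up DRG values)
drg <- c(122, 122, 121, 123, 121, 122)
mean(drg)                                 # about 121.83, not a real DRG code
drg_counts <- table(as.character(drg))    # counts per category are what we want
drg_counts

# Zero-padded numbers stored as text
los_text <- c("0010", "0006", "0015")
as.integer(los_text)                      # 10 6 15

raw <- "ID\tLOS
0007\t0010
0012\t0006"
# colClasses: ID stays character, LOS is parsed as integer
dat <- read.delim(text = raw, colClasses = c("character", "integer"))
str(dat)
```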

The awesome nerds at Rstudio actually helped you generate these three commands:

    library(readr)
    df <- read_delim("datasets/heartatk4R.txt",
                     "\t",
                     escape_double = FALSE,
                     col_types = cols(AGE = col_integer(),
                                      DIAGNOSIS = col_character(),
                                      DIED = col_character(),
                                      DRG = col_character(),
                                      LOS = col_integer()),
                     trim_ws = TRUE)
    View(df)

Before you click on the Import button, I highly recommend that you select all the code and copy it. After clicking Import, you can paste the code into a script window. If you do not have a script window open, you can create one by clicking the File + icon on the top left. You will need this code when you want to re-run the analysis without going through the above steps. You can see the data appear as a spreadsheet, which can be sorted by clicking on the column names. This spreadsheet can be closed. To reopen it, click on the df object, which is a data frame. Your data is now available as df.


Figure 4.2: Changing data types while importing data into Rstudio.

Check and convert data types

Most of the time, R can guess the type of data in each column. But we always need to double-check using the str command. If not satisfied, we can enforce data type conversion in R using the as.numeric, as.factor, or as.character functions.

    str(df)  # structure of data object, data types for each column

    ## spec_tbl_df [12,844 x 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
    ##  $ Patient  : num [1:12844] 1 2 3 4 5 6 7 8 9 10 ...
    ##  $ DIAGNOSIS: chr [1:12844] "41041" "41041" "41091" "41081" ...
    ##  $ SEX      : chr [1:12844] "F" "F" "F" "F" ...
    ##  $ DRG      : chr [1:12844] "122" "122" "122" "122" ...
    ##  $ DIED     : chr [1:12844] "0" "0" "0" "0" ...
    ##  $ CHARGES  : num [1:12844] 4752 3941 3657 1481 1681 ...
    ##  $ LOS      : int [1:12844] 10 6 5 2 1 9 15 15 2 1 ...
    ##  $ AGE      : int [1:12844] 79 34 76 80 55 84 84 70 76 65 ...
    ##  - attr(*, "spec")=
    ##   .. cols(
    ##   ..   Patient = col_double(),
    ##   ..   DIAGNOSIS = col_character(),
    ##   ..   SEX = col_character(),
    ##   ..   DRG = col_character(),
    ##   ..   DIED = col_character(),
    ##   ..   CHARGES = col_double(),
    ##   ..   LOS = col_integer(),
    ##   ..   AGE = col_integer()
    ##   .. )
    ##  - attr(*, "problems")=<externalptr>

We can reformat categorical values as factors. We use df$SEX to refer to the SEX column of the data frame df:

    df$DIAGNOSIS <- as.factor(df$DIAGNOSIS)  # convert this column to factor
    df$SEX <- as.factor(df$SEX)
    df$DRG <- as.factor(df$DRG)
    df$DIED <- as.factor(df$DIED)

Factors are very similar to character vectors. But factors have defined levels, and their values are stored as integers.

    ## [1] 9

              ## [1] "41001" "41011" "41021" "41031" "41041" "41051" "41071" "41081" "41091"

Note that "41001" is the reference level for this factor, and it is coded as 1. The reference level of a factor is important when we interpret results from regression, as effects are represented relative to the reference level. By default, as.factor() sorts the levels alphabetically, so the first level in sorted order becomes the reference level. Sometimes we need to change the reference level.

    df$DIAGNOSIS <- relevel(df$DIAGNOSIS, "41091")
    levels(df$DIAGNOSIS)

    ## [1] "41091" "41001" "41011" "41021" "41031" "41041" "41051" "41071" "41081"
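A small made-up factor shows both points above: values are stored as integer positions in the level set, and relevel() changes which level comes first. The last two lines list the design-matrix columns that a regression function such as lm() would build. Under R's default treatment contrasts the reference level gets no column of its own, which is why the other effects are read relative to it.

```r
# Made-up sample of diagnosis codes, converted like df$DIAGNOSIS
f <- as.factor(c("41041", "41041", "41091", "41081"))
levels(f)        # "41041" "41081" "41091": sorted, so "41041" is the reference
as.integer(f)    # 1 1 3 2: each value is the position of its level

g <- relevel(f, ref = "41091")
levels(g)        # "41091" "41041" "41081"
as.integer(g)    # 2 2 1 3: same data, new integer codes

# Columns of the design matrix; the reference level has no column
colnames(model.matrix(~ f))   # "(Intercept)" "f41081" "f41091"
colnames(model.matrix(~ g))   # "(Intercept)" "g41041" "g41081"
```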

    str(df)  # double check structure of data

    ## spec_tbl_df [12,844 x 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
    ##  $ Patient  : num [1:12844] 1 2 3 4 5 6 7 8 9 10 ...
    ##  $ DIAGNOSIS: Factor w/ 9 levels "41091","41001",..: 6 6 1 9 1 1 1 1 6 6 ...
    ##  $ SEX      : Factor w/ 2 levels "F","M": 1 1 1 1 2 2 1 1 2 1 ...
    ##  $ DRG      : Factor w/ 3 levels "121","122","123": 2 2 2 2 2 1 1 1 1 3 ...
    ##  $ DIED     : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
    ##  $ CHARGES  : num [1:12844] 4752 3941 3657 1481 1681 ...
    ##  $ LOS      : int [1:12844] 10 6 5 2 1 9 15 15 2 1 ...
    ##  $ AGE      : int [1:12844] 79 34 76 80 55 84 84 70 76 65 ...
    ##  - attr(*, "spec")=
    ##   .. cols(
    ##   ..   Patient = col_double(),
    ##   ..   DIAGNOSIS = col_character(),
    ##   ..   SEX = col_character(),
    ##   ..   DRG = col_character(),
    ##   ..   DIED = col_character(),
    ##   ..   CHARGES = col_double(),
    ##   ..   LOS = col_integer(),
    ##   ..   AGE = col_integer()
    ##   .. )
    ##  - attr(*, "problems")=<externalptr>

    summary(df)  # a summary often gives us a lot of useful information

    ##     Patient        DIAGNOSIS     SEX       DRG        DIED        CHARGES     
    ##  Min.   :    1   41091  :5213   F:5065   121:5387   0:11434   Min.   :    3  
    ##  1st Qu.: 3212   41041  :2665   M:7779   122:6047   1: 1410   1st Qu.: 5422  
    ##  Median : 6422   41011  :1824            123:1410             Median : 8445  
    ##  Mean   : 6422   41071  :1703                                 Mean   : 9879  
    ##  3rd Qu.: 9633   41001  : 467                                 3rd Qu.:12569  
    ##  Max.   :12844   41081  : 287                                 Max.   :47910  
    ##                  (Other): 685                                 NA's   :699    
    ##       LOS              AGE        
    ##  Min.   : 0.000   Min.   : 20.00  
    ##  1st Qu.: 4.000   1st Qu.: 57.00  
    ##  Median : 7.000   Median : 67.00  
    ##  Mean   : 7.569   Mean   : 66.29  
    ##  3rd Qu.:10.000   3rd Qu.: 77.00  
    ##  Max.   :38.000   Max.   :103.00  

The summary() function is very useful for getting basic information about data frames.

Note that for numeric columns we are shown the mean, median, etc., while for factors the frequencies are shown. This reassures us that the data types are correctly recognized. The summary also shows missing values for CHARGES. Did some people get free treatment for heart attack? Probably not. Missing does not mean zero. Perhaps the data was not entered for some patients.
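Because missing values are stored as NA rather than 0, they need explicit handling. In this sketch with made-up charges, is.na() finds them, mean() returns NA unless told to drop them with na.rm = TRUE, and the last line shows how wrongly treating NA as 0 pulls the average down.

```r
charges <- c(4752, NA, 3657, 1481, NA)   # made-up CHARGES with two missing entries

sum(is.na(charges))              # 2 missing values
mean(charges)                    # NA: missing values propagate
mean(charges, na.rm = TRUE)      # 3296.667, from the 3 observed values
# Replacing NA by 0 gives a misleadingly low average (1978)
mean(ifelse(is.na(charges), 0, charges))
```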

Besides enforcing data type conversion with as.factor, as.numeric, and so on, we can also reformat the columns before clicking Import, as described in 4.2.4.

Close a project when you are done

Once you are done with a project, you can close it from File->Close Project. If you do not close a project, Rstudio assumes that you will continue working on the same project; even after you close Rstudio, the same project and files will be open the next time you start Rstudio. This is convenient most of the time. But I've seen some students' Rstudio environments cluttered with everything they have been doing for an entire semester. This is not only confusing, but could also lead to errors. For example, in this class, you can start a new project for each chapter.

To open a project, use File->Open Project and then navigate to the project. Alternatively, you can double-click on the Chapter4.Rproj file from Windows or Mac. When a project file is loaded, the entire computing environment is set up for you. The working directory is set properly, and some of the script files are open. If a script file is not open, you can open it by clicking on it in the Files tab in the lower right window.

Reading files directly using read.table

As you gain more experience with R programming, you will find many other options for importing data besides Import Dataset in Rstudio.

In summary, we have the following code to read in the heart attack dataset. We have to make sure the file can be found relative to the current working directory (here, in its datasets subfolder). To set the working directory from the Rstudio main menu, go to Session -> Set Working Directory.

    rm(list = ls())  # Erase all objects in memory
    getwd()          # show working directory
    df <- read.table("datasets/heartatk4R.txt", sep = "\t", header = TRUE)
    head(df)         # show the first few rows
    # change several columns to factors
    df$DRG <- as.factor(df$DRG)
    df$DIED <- as.factor(df$DIED)
    df$DIAGNOSIS <- as.factor(df$DIAGNOSIS)
    df$SEX <- as.factor(df$SEX)
    str(df)          # show the data types of columns
    summary(df)      # show summary of dataset

Alternatively, you can skip all of the above and do this:

    URL <- "https://raw.githubusercontent.com/gexijin/learnR/master/datasets/heartatk4R.txt"
    df <- read.table(URL,
                     header = TRUE,
                     sep = "\t",
                     colClasses = c("character", "factor", "factor", "factor",
                                    "factor", "numeric", "numeric", "numeric"))

Here we read the data directly from the internet using the URL, and specify the data type of each column with colClasses.

General procedure to read data into R:

If data is compressed, unzip it using 7-zip, WinRAR, Winzip, or gzip. Any of these will do.
Is it a text file (CSV, txt, …) or a binary file (XLS, XLSX, …)? Convert a binary file to a text file using the corresponding application. Comma separated values (CSV) files use commas to separate the columns. Another common type is tab-delimited text files, which use the tab character, written as \t.
Open it with a text editor (TextPad, NotePad++) to have a look.
Rows and columns? Row and column names? row.names = 1, header = TRUE
Delimiters between columns? (space, comma, tab…) sep = "\t"
Missing values? NA, na, NULL, blank, NaN, 0: na.strings =
Open it as a text file in Excel and choose the appropriate delimiter while importing, or use Text to Columns under Data in Excel. Beware of Excel's annoying automatic conversions, such as "OCT4" -> "4-OCT". Edit column names by removing spaces, or shorten them for ease of reference in R. Save as CSV for reading in R.
Use read.table() or read.csv(). For example, x <- read.table("somefile.txt", sep = "\t", header = TRUE, na.strings = "NA")
Double-check the data with str(x), and make sure each column is recognized correctly as "character", "factor", or "numeric". Pay attention to columns that contain numbers but are actually IDs (e.g., student IDs); these should be treated as character. For example, x$ids <- as.character(x$ids), where x is the data frame and ids is the column name. Also pay attention to columns that contain numbers but actually code for discrete categories (1, 2, 3, representing treatment 1, treatment 2, and treatment 3). These need to be reformatted as factors, with something like x$treatment <- as.factor(x$treatment).

A refresher using cheat sheets that summarize many R functions is available here: https://www.rstudio.com/resources/cheatsheets/. It is important to know the different types of R objects: scalars, vectors, data frames, matrices, and lists.

Exercise 4.1

If you have not created a project for chapter 4, it is time to create one. Download the tab-delimited text file pulse.txt from this page (http://statland.org/R/R/pulse.txt). Import pulse.txt into R using two methods: the R menu (show the process by attaching the necessary screenshots) and an R script.
a. Rename the file as chapter4Pulse.
b. Change the class of ActivityL from double to integer.
c. After importing pulse.txt into R, convert the class of Sex from character to factor using R code. Don't forget to use the class() function to check your answer.

Exercise 4.2

Type in Table 4.1 in Excel and save it as a CSV file and as a tab-delimited text file. Create a new Rstudio project as outlined above. Copy the files to the new folder. Import the CSV file into Rstudio. Create a script file that includes the rm(list = ls()) and getwd() commands, the R code generated when importing the CSV file (similar to that shown in Figure 4.2), and the code that converts data types (Age, BloodPressure and Weight should be numeric, LastName should be character, and HeartAttack should be factor). Name the data set patients. Submit the R script you created, the data structure of the data set patients, and use head(patients) to show the data.

Table 4.1: An example of a multivariate dataset.
LastName	Age	Sex	BloodPressure	Weight	HeartAttack
Smith	19	M	100	130.2	1
Bird	55	F	86	300	0
Wilson	23	M	200	212.7	0

Enter data manually

There are many different ways to get data into R. You can enter data manually or semi-manually (see below). You can read data into R from a local file or a file on the internet. You can also use R to retrieve data from databases, local or remote. The most important thing is to read the data set into R correctly. A dataset not read in correctly will never be analyzed or visualized correctly.

    x <- c(2.1, 3.1, 3.2, 5.4)
    sum(x)

    ## [1] 13.8

    A <- matrix(
      c(2, 4, 3, 1, 5, 7),  # the data elements
      nrow = 2,             # number of rows
      ncol = 3)             # number of columns
    A  # show the matrix

    ##      [,1] [,2] [,3]
    ## [1,]    2    3    5
    ## [2,]    4    1    7

    x <- scan()  # Enter values from keyboard, separated by Return key. End by empty line.
    2.1
    3.1
    4.1

    ## [1] 2.1

    ## [1] 3.1

    ## [1] 4.1

You can even use the scan() function, then copy and paste a column of numbers from Excel.
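Another semi-manual route is to paste the values into a character string and let R parse it: scan() and read.table() both accept a text argument instead of a file or the keyboard. The values below are the ones typed into scan() above, plus a made-up group column.

```r
# scan() reading from a string; quiet = TRUE suppresses the "Read 3 items" message
x <- scan(text = "2.1 3.1 4.1", quiet = TRUE)
x

# read.table() parsing a small pasted block with a header row
pasted <- "value group
2.1 a
3.1 b
4.1 a"
dat <- read.table(text = pasted, header = TRUE)  # whitespace separates columns by default
dat
```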

Data manipulation in a data frame

For our heart attack dataset df, we can sort the data by age. Again, type these commands in the script window instead of directly into the Console window, and save the script once in a while.

    df2 <- df[order(df$AGE), ]  # sort in ascending order by AGE
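order() can also sort in descending order or by several columns at once. A minimal sketch on a small made-up data frame with the same column names:

```r
pts <- data.frame(SEX = c("F", "M", "F", "M"),
                  AGE = c(79, 34, 76, 80))   # made-up rows

by_age_desc <- pts[order(pts$AGE, decreasing = TRUE), ]   # oldest first
by_age_desc

by_sex_age <- pts[order(pts$SEX, -pts$AGE), ]   # by SEX, then oldest first within SEX
by_sex_age
```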

The Global Environment window contains the names and sizes of all the variables or objects in the computer memory. R programming is all about creating and modifying these objects in memory with clear, step-by-step instructions. We can also sort the data by clicking on the column names in the spreadsheet opened from the Global Environment.

Just like in Excel, you can add a new column with computed results:

    df2$pdc <- df2$CHARGES / df2$LOS

Here we created a new column pdc to represent per-day cost. We can also create a column representing age groups using the floor function, which rounds a number down to the nearest integer.

    df2$ag <- floor(df2$AGE / 10) * 10
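A quick check with a few made-up ages: dividing by 10, rounding down, and multiplying by 10 maps every age to the start of its decade. cut() is a base R alternative that returns labeled intervals as a factor; the breaks chosen here are an assumption for illustration.

```r
ages <- c(20, 25, 57, 81, 103)   # made-up ages
ag <- floor(ages / 10) * 10
ag                               # 20 20 50 80 100

# Intervals closed on the left, e.g. [20,30) and [100,110)
ag_cut <- cut(ages, breaks = seq(20, 110, by = 10), right = FALSE)
ag_cut
```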

You can now do things like this:

    boxplot(df2$CHARGES ~ df2$ag)

Each box represents an age group. This plot shows CHARGES; to see whether older patients tend to stay longer in the hospital after being admitted for heart attack, plot LOS in the same way with boxplot(df2$LOS ~ df2$ag).
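Group-wise numbers make a box plot easier to read. tapply() applies a summary function to each group; this sketch uses median() on made-up lengths of stay (with the real data you would pass df2$LOS and df2$ag).

```r
los <- c(4, 6, 5, 9, 12, 8)        # made-up lengths of stay
ag  <- c(50, 50, 60, 80, 80, 60)   # matching age groups
med_los <- tapply(los, ag, median)
med_los                            # 50: 5, 60: 6.5, 80: 10.5
```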

You can extract a subset of cases:

    df3 <- subset(df2, SEX == "F")  # Only females. "==" is for comparison and "=" is for assigning a value.
    df4 <- subset(df3, AGE > 80)    # only people older than 80
    summary(df4)

    ##     Patient        DIAGNOSIS    SEX       DRG      DIED       CHARGES     
    ##  Min.   :    7   41091  :564   F:1263   121:679   0:929   Min.   :   92  
    ##  1st Qu.: 3416   41071  :224   M:   0   122:250   1:334   1st Qu.: 5179  
    ##  Median : 6816   41041  :181            123:334           Median : 8580  
    ##  Mean   : 6624   41011  :166                              Mean   :10143  
    ##  3rd Qu.: 9814   41001  : 41                              3rd Qu.:13347  
    ##  Max.   :12841   41081  : 38                              Max.   :46915  
    ##                  (Other): 49                              NA's   :74     
    ##       LOS              AGE              pdc                ag        
    ##  Min.   : 1.000   Min.   : 81.00   Min.   :   18.4   Min.   : 80.00  
    ##  1st Qu.: 5.000   1st Qu.: 83.00   1st Qu.:  796.7   1st Qu.: 80.00  
    ##  Median : 8.000   Median : 85.00   Median : 1113.7   Median : 80.00  
    ##  Mean   : 9.457   Mean   : 85.92   Mean   : 1331.4   Mean   : 81.96  
    ##  3rd Qu.:13.000   3rd Qu.: 89.00   3rd Qu.: 1523.8   3rd Qu.: 80.00  
    ##  Max.   :38.000   Max.   :102.00   Max.   :11246.1   Max.   :100.00  
    ##                                    NA's   :74                        

Try not to attach the data when you are manipulating data like this.
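attach() places a copy of the data frame's columns on the search path, where they can mask or be masked by other objects with the same names, and later changes to the data frame are not reflected in that copy. with() offers the same short column names for a single expression without these side effects. A sketch on made-up rows:

```r
pts <- data.frame(SEX = c("F", "M", "F"),
                  AGE = c(85, 82, 90),
                  CHARGES = c(9000, 4000, NA))   # made-up rows

# Column names are visible inside with(), but only for this one call
female_mean <- with(pts, mean(CHARGES[SEX == "F"], na.rm = TRUE))
female_mean          # 9000
exists("CHARGES")    # FALSE: no stray copies of the columns are left behind
```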

Data transformation using dplyr

Following the same style as ggplot, the dplyr package, a part of the Tidyverse, makes data transformation more intuitive.

First we need to install the dplyr package.

    install.packages("dplyr")

With dplyr, data transformation becomes fun.

    library(dplyr)
    df2 <- df %>%   # pipe operator; data is sent to the next step
      arrange(AGE)  # sort in ascending order; desc(AGE) for descending order
    head(df2)

    ##   Patient DIAGNOSIS SEX DRG DIED  CHARGES LOS AGE
    ## 1    5411     41041   M 122    0  6214.00   4  20
    ## 2   10853     41091   F 122    0  6726.27   4  21
    ## 3    4126     41041   M 122    0 10781.00   8  23
    ## 4   10738     41011   M 121    0       NA   8  23
    ## 5    4247     41091   F 122    0 10672.00   6  24
    ## 6    5199     41041   M 121    0  7596.00   8  24

    df2 <- df2 %>%
      mutate(pdc = CHARGES / LOS)  # new column by dividing two columns
    head(df2)

    ##   Patient DIAGNOSIS SEX DRG DIED  CHARGES LOS AGE      pdc
    ## 1    5411     41041   M 122    0  6214.00   4  20 1553.500
    ## 2   10853     41091   F 122    0  6726.27   4  21 1681.568
    ## 3    4126     41041   M 122    0 10781.00   8  23 1347.625
    ## 4   10738     41011   M 121    0       NA   8  23       NA
    ## 5    4247     41091   F 122    0 10672.00   6  24 1778.667
    ## 6    5199     41041   M 121    0  7596.00   8  24  949.500

    df2 <- df2 %>%
      mutate(ag = floor(AGE / 10) * 10)  # new column, age group
    head(df2)

    ##   Patient DIAGNOSIS SEX DRG DIED  CHARGES LOS AGE      pdc ag
    ## 1    5411     41041   M 122    0  6214.00   4  20 1553.500 20
    ## 2   10853     41091   F 122    0  6726.27   4  21 1681.568 20
    ## 3    4126     41041   M 122    0 10781.00   8  23 1347.625 20
    ## 4   10738     41011   M 121    0       NA   8  23       NA 20
    ## 5    4247     41091   F 122    0 10672.00   6  24 1778.667 20
    ## 6    5199     41041   M 121    0  7596.00   8  24  949.500 20

    df3 <- df %>%
      filter(SEX == "F", AGE > 80)  # filtering
    head(df3)

    ##   Patient DIAGNOSIS SEX DRG DIED  CHARGES LOS AGE
    ## 1       7     41091   F 121    0 10958.52  15  84
    ## 2      13     41091   F 122    0       NA   9  83
    ## 3      27     41011   F 123    1  3214.90   4  83
    ## 4      36     41041   F 121    0  2584.10   9  81
    ## 5      51     41011   F 121    0  7589.00  11  81
    ## 6      67     41011   F 123    1 16428.80   8  81

The real power and efficiency of dplyr come when you connect the pipes to transform data in multiple steps.

    df2 <- df %>%
      arrange(AGE) %>%
      mutate(pdc = CHARGES / LOS) %>%
      mutate(ag = floor(AGE / 10) * 10) %>%
      filter(SEX == "F", AGE > 80)
    head(df2)

    ##   Patient DIAGNOSIS SEX DRG DIED  CHARGES LOS AGE       pdc ag
    ## 1      36     41041   F 121    0  2584.10   9  81  287.1222 80
    ## 2      51     41011   F 121    0  7589.00  11  81  689.9091 80
    ## 3      67     41011   F 123    1 16428.80   8  81 2053.6000 80
    ## 4     225     41041   F 123    1  4674.50   7  81  667.7857 80
    ## 5     265     41091   F 122    0  9401.05   9  81 1044.5611 80
    ## 6     284     41041   F 121    0  7240.30   7  81 1034.3286 80
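For comparison, here is a minimal sketch of the same four steps in base R, using the native pipe |> (R 4.1 or later) with transform() and subset() standing in for dplyr's mutate() and filter(). This is a swapped-in base R technique, not part of dplyr; it runs on a small made-up data frame with the same column names, so no package or data file is needed.

```r
df_demo <- data.frame(SEX = c("F", "M", "F", "F"),
                      AGE = c(84, 90, 81, 70),
                      CHARGES = c(10958.52, 5000, 2584.10, 3000),
                      LOS = c(15, 5, 9, 3))   # made-up rows

out <- df_demo[order(df_demo$AGE), ] |>      # like arrange(AGE)
  transform(pdc = CHARGES / LOS) |>          # like mutate(pdc = ...)
  transform(ag = floor(AGE / 10) * 10) |>    # like mutate(ag = ...)
  subset(SEX == "F" & AGE > 80)              # like filter(SEX == "F", AGE > 80)
out
```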

arrange, mutate, and filter are called action verbs. For more action verbs, see the dplyr cheat sheet from the Rstudio main menu: Help -> Cheatsheets -> Data Transformation with dplyr. It is also available online as the dplyr cheat sheet.