1 Introduction to R

R is a programming language developed by Ross Ihaka and Robert Gentleman in 1993. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

Data analysis with R is done in a series of steps; programming, transforming, discovering, modeling and communicate the results.

  • Program: R is a clear and accessible programming tool
  • Transform: R is made up of a collection of libraries designed specifically for data science
  • Discover: Investigate the data, refine your hypothesis and analyze them
  • Model: R provides a wide array of tools to capture the right model for your data
  • Communicate: Integrate codes, graphs, and outputs to a report with R Markdown or build Shiny apps to share with the world

2 Objects

The : operator gives you a nice way to create a group of numbers.

1:6
#> [1] 1 2 3 4 5 6

Running 1:6 generated a vector of numbers for you to see, but it didn’t save that vector anywhere in your computer’s memory. If you want to use those numbers again, you’ll have to ask your computer to save them somewhere. You can do that by creating an R object.

R lets you save data by storing it inside an R object. What is an object? Just a name that you can use to call up stored data. For example, you can save data into an object like a or b. Wherever R encounters the object, it will replace it with the data saved inside, like so:

a <- 1
a
#> [1] 1
a + 2
#> [1] 3

So, for another example, the following code would create an object named die that contains the numbers one through six. To see what is stored in an object, just type the object’s name by itself:

die <- 1:6
die
#> [1] 1 2 3 4 5 6

When you create an object, the object will appear in the environment pane of RStudio.

You can name an object in R almost anything you want, but there are a few rules.

  • It should contain letters, numbers, and only dot or underscore characters.
  • It should not contain use some special symbols, like ^, !, $, @, +, -, /, or *.
  • It should not start with a number (eg: 2iota)
  • It should not start with a dot followed by a number (eg: .2iota)
  • It should not start with an underscore (eg: _iota)
  • It should not be a reserved keyword.
Good names Names that cause errors
a 1trial
b $
FOO ^mean
my_var 2nd
.day !bad

But you can skip those rules by using ``.

`!1` <- 2
`!1`
#> [1] 2

R is case-sensitive, so name and Name will refer to different objects:

Name <- 1
name <- 0
Name + 1
#> [1] 2

Finally, R will overwrite any previous information stored in an object without asking you for permission. So, it is a good idea to not use names that are already taken:

my_number <- 1
my_number
#> [1] 1
my_number <- 999
my_number
#> [1] 999

The function seq can generate sequences of real numbers as follows:

seq(5)
#> [1] 1 2 3 4 5
seq(2, 5)
#> [1] 2 3 4 5
seq(1, 5, 0.5)
#> [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0

The function rep creates a vector with all its elements identical:

rep(1, 5)
#> [1] 1 1 1 1 1
rep(c(1, 3), 3)
#> [1] 1 3 1 3 1 3
rep(c(1, 3), each = 3)
#> [1] 1 1 1 3 3 3
rep(c(1, 3), c(2, 4))
#> [1] 1 1 3 3 3 3

You can see which object names you have already used with the function ls:

ls()
#>  [1] "!1"        "a"         "die"       "layout"    "my_number" "name"     
#>  [7] "Name"      "p1"        "p12"       "p123"      "p2"        "p24"      
#> [13] "p3"        "p34"       "p4"

You can also see which names you have used by examining RStudio’s environment pane.

To delete objects in memory, we use the function rm:

rm(die)
ls()
#>  [1] "!1"        "a"         "layout"    "my_number" "name"      "Name"     
#>  [7] "p1"        "p12"       "p123"      "p2"        "p24"       "p3"       
#> [13] "p34"       "p4"

The statement rm(list = ls()) will remove all objects from the working environment:

rm(list = ls())
ls()
#> character(0)

3 Vectors

3.1 Vector Basics

There are two types of vectors:

  1. Atomic vectors, of which there are six types: logical, integer, double, character, complex, and raw. Integer and double vectors are collectively known as numeric vectors.

  2. Lists, which are sometimes called recursive vectors because lists can contain other lists.

The chief difference between atomic vectors and lists is that atomic vectors are homogeneous, while lists can be heterogeneous.

There’s one other related object: NULL. NULL is often used to represent the absence of a vector (as opposed to NA which is used to represent the absence of a value in a vector). NULL typically behaves like a vector of length 0.

The hierarchy of R's vector types

The hierarchy of R’s vector types

Every vector has two key properties:

  1. Its type, which you can determine with typeof().
typeof(letters)
#> [1] "character"
typeof(1:10)
#> [1] "integer"
  1. Its length, which you can determine with length().
x <- list("a", "b", 1:10)
length(x)
#> [1] 3

Vectors can also contain arbitrary additional metadata in the form of attributes. These attributes are used to create augmented vectors which build on additional behaviour. There are three important types of augmented vector:

  • Factors are built on top of integer vectors.
  • Dates and date-times are built on top of numeric vectors.
  • Data frames and tibbles are built on top of lists.

Further, there are mainly six data types present in R:

  1. Vectors
  2. Matrices
  3. Arrays
  4. Factors
  5. Lists
  6. DataFrames / tibble

We’ll start with the very basics. The most simple type of object in R is an atomic vector. If you look closely enough, you’ll see that most structures in R are built from atomic vectors.

3.2 Atomic Vectors

An atomic vector is just a simple vector of data. You can make an atomic vector by grouping some values of data together with c:

die <- c(1, 2, 3, 4, 5, 6)
die
#> [1] 1 2 3 4 5 6
is.vector(die)
#> [1] TRUE

is.vector tests whether an object is an atomic vector. It returns TRUE if the object is an atomic vector and FALSE otherwise.

You can also make an atomic vector with just one value. R saves single values as an atomic vector of length 1:

five <- 5
five
#> [1] 5
is.vector(five)
#> [1] TRUE
length(five)
#> [1] 1
length(die)
#> [1] 6

Each atomic vector stores its values as a one-dimensional vector, and each atomic vector can only store one type of data. You can save different types of data in R by using different types of atomic vectors.

3.2.1 Doubles

A double vector stores regular numbers. The numbers can be positive or negative, large or small, and have digits to the right of the decimal place or not. In general, R will save any number that you type in R as a double.

die <- c(1, 2, 3, 4, 5, 6)
die
#> [1] 1 2 3 4 5 6
typeof(die)
#> [1] "double"

Some R functions refer to doubles as “numerics”.

3.2.2 Integers

Integer vectors store integers, numbers that can be written without a decimal component.

You can specifically create an integer in R by typing a number followed by an uppercase L. For example:

int <- c(-1L, 2L, 4L)
int
#> [1] -1  2  4
typeof(int)
#> [1] "integer"

Note that R won’t save a number as an integer unless you include the L. Integer numbers without the L will be saved as doubles. The only difference between 4 and 4L is how R saves the number in your computer’s memory. Integers are defined more precisely in your computer’s memory than doubles (unless the integer is very large or small).

Sometimes a difference in precision can have surprising effects. Your computer allocates 64 bits of memory to store each double in an R program. As a result, each double is accurate to about 16 significant digits. This introduces a little bit of error. For example, you may expect the result of the expression below to be zero, but it is not:

sqrt(2)^2
#> [1] 2
sqrt(2)^2 - 2
#> [1] 4.440892e-16
dplyr::near(sqrt(2)^2, 2)
#> [1] TRUE

These errors are known as floating-point errors, and doing arithmetic in these conditions is known as floating-point arithmetic. Floating-point arithmetic is not a feature of R; it is a feature of computer programming. Luckily, the errors caused by floating-point arithmetic are usually insignificant (and when they are not, they are easy to spot). As a result, you’ll generally use doubles instead of integers as a data scientist.

3.2.3 Characters

A character vector stores small pieces of text. You can create a character vector in R by typing a character or string of characters surrounded by quotes:

text <- c("Hello",  "World!")
text
#> [1] "Hello"  "World!"
typeof(text)
#> [1] "character"
typeof("Hello")
#> [1] "character"

The individual elements of a character vector are known as strings. Note that a string can contain more than just letters. You can assemble a character string from numbers or symbols as well. In fact, anything surrounded by quotes in R will be treated as a character string—no matter what appears between the quotes.

We can use the function paste and paste0 joins multiple vectors together.

paste("Hello", "World!", sep = " ")
#> [1] "Hello World!"
paste("x", 1:3, sep = "_")
#> [1] "x_1" "x_2" "x_3"
paste(text, collapse = " ")
#> [1] "Hello World!"
paste0("Today is ", date())
#> [1] "Today is Mon Oct 25 20:52:33 2021"

3.2.4 Logicals

Logical vectors store TRUEs and FALSEs, R’s form of Boolean data. Logicals are very helpful for doing things like comparisons:

3 > 4
#> [1] FALSE

Any time you type TRUE or FALSE in capital letters (without quotation marks), R will treat your input as logical data. R also assumes that T and F are shorthand for TRUE and FALSE, unless they are defined elsewhere (e.g. T <- 500). Since the meaning of T and F can change, its best to stick with TRUE and FALSE:

logic <- c(TRUE, FALSE, TRUE)
logic
#> [1]  TRUE FALSE  TRUE
typeof(logic)
#> [1] "logical"
typeof(F)
#> [1] "logical"
F <- 500
typeof(F)
#> [1] "double"

3.2.5 Complex and Raw

Doubles, integers, characters, and logicals are the most common types of atomic vectors in R, but R also recognizes two more types: complex and raw.

Complex vectors store complex numbers. To create a complex vector, add an imaginary term to a number with i:

comp <- c(1 + 1i, 1 + 2i, 1 + 3i)
comp
#> [1] 1+1i 1+2i 1+3i
typeof(comp)
#> [1] "complex"

Raw vectors store raw bytes of data. Making raw vectors gets complicated, but you can make an empty raw vector of length n with raw(n). See the help page of raw for more options when working with this type of data:

raw(3)
#> [1] 00 00 00
typeof(raw(3))
#> [1] "raw"

3.3 Attributes

An attribute is a piece of information that you can attach to an atomic vector (or any R object). The attribute won’t affect any of the values in the object, and it will not appear when you display your object. You can think of an attribute as “metadata”; it is just a convenient place to put information associated with an object. R will normally ignore this metadata, but some R functions will check for specific attributes. These functions may use the attributes to do special things with the data.

You can see which attributes an object has with attributes. attributes will return NULL if an object has no attributes. An atomic vector, like die, won’t have any attributes unless you give it some:

attributes(die)
#> NULL

R uses NULL to represent the null set, an empty object. NULL is often returned by functions whose values are undefined. You can create a NULL object by typing NULL in capital letters.

3.3.1 Names

The most common attributes to give an atomic vector are names, dimensions (dim), and classes. Each of these attributes has its own helper function that you can use to give attributes to an object. You can also use the helper functions to look up the value of these attributes for objects that already have them. For example, you can look up the value of the names attribute of die with names:

names(die)
#> NULL

NULL means that die does not have a names attribute. You can give one to die by assigning a character vector to the output of names. The vector should include one name for each element in die:

names(die) <- c("one", "two", "three", "four", "five", "six")

Now die has a names attribute:

names(die)
#> [1] "one"   "two"   "three" "four"  "five"  "six"
attributes(die)
#> $names
#> [1] "one"   "two"   "three" "four"  "five"  "six"

R will display the names above the elements of die whenever you look at the vector:

die
#>   one   two three  four  five   six 
#>     1     2     3     4     5     6

However, the names won’t affect the actual values of the vector, nor will the names be affected when you manipulate the values of the vector:

die + 1
#>   one   two three  four  five   six 
#>     2     3     4     5     6     7

You can also use names to change the names attribute or remove it all together. To change the names, assign a new set of labels to names:

names(die) <- c("uno", "dos", "tres", "quatro", "cinco", "seis")
die
#>    uno    dos   tres quatro  cinco   seis 
#>      1      2      3      4      5      6

To remove the names attribute, set it to NULL:

names(die) <- NULL
die
#> [1] 1 2 3 4 5 6

3.3.2 Dim

You can transform an atomic vector into an n-dimensional array by giving it a dimensions attribute with dim. To do this, set the dim attribute to a numeric vector of length n. R will reorganize the elements of the vector into n dimensions. Each dimension will have as many rows (or columns, etc.) as the nth value of the dim vector. For example, you can reorganize die into a 2 × 3 matrix (which has 2 rows and 3 columns):

dim(die) <- c(2, 3)
die
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

or a 3 × 2 matrix (which has 3 rows and 2 columns):

dim(die) <- c(3, 2)
die
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6

or a 1 × 2 × 3 hypercube (which has 1 row, 2 columns, and 3 “slices”). This is a three-dimensional structure, but R will need to show it slice by slice by slice on your two-dimensional computer screen:

dim(die) <- c(1, 2, 3)
die
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]    1    2
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]    3    4
#> 
#> , , 3
#> 
#>      [,1] [,2]
#> [1,]    5    6

R will always use the first value in dim for the number of rows and the second value for the number of columns. In general, rows always come first in R operations that deal with both rows and columns.

You may notice that you don’t have much control over how R reorganizes the values into rows and columns. For example, R always fills up each matrix by columns, instead of by rows. If you’d like more control over this process, you can use one of R’s helper functions, matrix or array. They do the same thing as changing the dim attribute, but they provide extra arguments to customize the process.

3.4 Matrices

Matrices store values in a two-dimensional array, just like a matrix from linear algebra. To create one, first give matrix an atomic vector to reorganize into a matrix. Then, define how many rows should be in the matrix by setting the nrow argument to a number. matrix will organize your vector of values into a matrix with the specified number of rows. Alternatively, you can set the ncol argument, which tells R how many columns to include in the matrix:

m <- matrix(die, nrow = 2)
m
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6

matrix will fill up the matrix column by column by default, but you can fill the matrix row by row if you include the argument byrow = TRUE:

m <- matrix(die, nrow = 2, byrow = TRUE)
m
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

You can use nrow and ncol to access the number of rows and columns of the matrix:

nrow(m)
#> [1] 2
ncol(m)
#> [1] 3

The matrix has a dim attribute that contains a vector of two elements, the number of rows and columns of the matrix.The dim property can be accessed using the dim:

attributes(m)
#> $dim
#> [1] 2 3
dim(m)
#> [1] 2 3

3.5 Arrays

The array function creates an n-dimensional array. For example, you could use array to sort values into a cube of three dimensions or a hypercube in 4, 5, or n dimensions. array is not as customizeable as matrix and basically does the same thing as setting the dim attribute. To use array, provide an atomic vector as the first argument, and a vector of dimensions as the second argument, now called dim:

ar <- array(c(11:14, 21:24, 31:34), dim = c(2, 2, 3))
ar
#> , , 1
#> 
#>      [,1] [,2]
#> [1,]   11   13
#> [2,]   12   14
#> 
#> , , 2
#> 
#>      [,1] [,2]
#> [1,]   21   23
#> [2,]   22   24
#> 
#> , , 3
#> 
#>      [,1] [,2]
#> [1,]   31   33
#> [2,]   32   34

3.6 Class

Notice that changing the dimensions of your object will not change the type of the object, but it will change the object’s class attribute:

dim(die) <- c(2, 3)
typeof(die)
#> [1] "double"
class(die)
#> [1] "matrix" "array"

A matrix is a special case of an atomic vector. For example, the die matrix is a special case of a double vector. Every element in the matrix is still a double, but the elements have been arranged into a new structure. R added a class attribute to die when you changed its dimensions. This class describes die’s new format. Many R functions will specifically look for an object’s class attribute, and then handle the object in a predetermined way based on the attribute.

Note that an object’s class attribute will not always appear when you run attributes; you may need to specifically search for it with class:

attributes(die)
#> $dim
#> [1] 2 3

You can apply class to objects that do not have a class attribute. class will return a value based on the object’s atomic type. Notice that the “class” of a double is “numeric”:

class("Hello")
#> [1] "character"
class(5)
#> [1] "numeric"

3.6.1 Dates and Times

Dates in R are numeric vectors that represent the number of days since 1 January 1970.

today <- Sys.Date()
today
#> [1] "2021-10-25"
typeof(today)
#> [1] "double"
class(today)
#> [1] "Date"
unclass(today)
#> [1] 18925

Date-times are numeric vectors with class POSIXct that represent the number of seconds since 12:00 AM January 1st 1970 (in the Universal Time Coordinated (UTC) zone). (In case you were wondering, “POSIXct” stands for “Portable Operating System Interface”, calendar time.)

now <- Sys.time()
now
#> [1] "2021-10-25 20:52:33 PDT"
typeof(now)
#> [1] "double"
class(now)
#> [1] "POSIXct" "POSIXt"
unclass(now)
#> [1] 1635220353
Sys.timezone()
#> [1] "America/Los_Angeles"
lubridate::with_tz(now, tzone = "Asia/Shanghai")
#> [1] "2021-10-26 11:52:33 CST"

3.6.2 Factors

Factors are R’s way of storing categorical information, like ethnicity or eye color. To make a factor, pass an atomic vector into the factor function. R will recode the data in the vector as integers and store the results in an integer vector. R will also add a levels attribute to the integer, which contains a set of labels for displaying the factor values, and a class attribute, which contains the class factor:

gender <- factor(c("male", "female", "female", "male"))
gender
#> [1] male   female female male  
#> Levels: female male
typeof(gender)
#> [1] "integer"
attributes(gender)
#> $levels
#> [1] "female" "male"  
#> 
#> $class
#> [1] "factor"

You can see exactly how R is storing your factor with unclass:

unclass(gender)
#> [1] 2 1 1 2
#> attr(,"levels")
#> [1] "female" "male"

Factors make it easy to put categorical variables into a statistical model because the variables are already coded as numbers. However, factors can be confusing since they look like character strings but behave like integers.

R will often try to convert character strings to factors when you load and create data. You can convert a factor to a character string with the as.character function. R will retain the display version of the factor, not the integers stored in memory:

as.character(gender)
#> [1] "male"   "female" "female" "male"

3.7 Using Atomic Vectors

Now that you understand the different types of atomic vector, it’s useful to review some of the important tools for working with them. These include:

  1. How to convert from one type to another, and when that happens automatically.

  2. How to tell if an object is a specific type of vector.

  3. What happens when you work with vectors of different lengths.

  4. How to pull out elements of interest.

  5. How to merge the elements of vectors.

3.7.1 Coercion

If you try to put multiple types of data into a vector, R will convert the elements to a single type of data.R’s coercion behavior may seem inconvenient, but it is not arbitrary. R always follows the same rules when it coerces data types. Once you are familiar with these rules, you can use R’s coercion behavior to do surprisingly useful things.

So how does R coerce data types? If a character string is present in an atomic vector, R will convert everything else in the vector to character strings. If a vector only contains logicals and numbers, R will convert the logicals to numbers; every TRUE becomes a 1, and every FALSE becomes a 0.

R always uses the same rules to coerce data to a single type.

R always uses the same rules to coerce data to a single type.

R uses the same coercion rules when you try to do math with logical values. So the following code:

sum(c(TRUE, TRUE, FALSE, FALSE))
#> [1] 2

will become:

sum(c(1, 1, 0, 0))
#> [1] 2

You can explicitly ask R to convert data from one type to another with the as functions. R will convert the data whenever there is a sensible way to do so:

as.character(1)
#> [1] "1"
as.logical(1)
#> [1] TRUE
as.numeric(FALSE)
#> [1] 0

In some cases, using only a single type of data is a huge advantage. Vectors, matrices, and arrays make it very easy to do math on large sets of numbers because R knows that it can manipulate each value the same way. Operations with vectors, matrices, and arrays also tend to be fast because the objects are so simple to store in memory.

3.7.2 Test Functions

Sometimes you want to do different things based on the type of vector. One option is to use typeof(). Another is to use a test function which returns a TRUE or FALSE. Base R provides many functions like is.vector() and is.atomic(), but they often return surprising results. Instead, it’s safer to use the is_* functions provided by purrr, which are summarised in the table below.

logical integer double character list
is_logical() x
is_integer() x
is_double() x
is_numeric() x x
is_character() x
is_atomic() x x x x
is_list() x
is_vector() x x x x x

3.7.3 Scalars and Recycling Rules

As well as implicitly coercing the types of vectors to be compatible, R will also implicitly coerce the length of vectors. This is called vector recycling, because the shorter vector is repeated, or recycled, to the same length as the longer vector.

This is generally most useful when you are mixing vectors and “scalars”. I put scalars in quotes because R doesn’t actually have scalars: instead, a single number is a vector of length 1. Because there are no scalars, most built-in functions are vectorised, meaning that they will operate on a vector of numbers. That’s why, for example, this code works:

sample(10) + 100
#>  [1] 106 108 105 109 103 110 104 102 107 101
runif(10) > 0.5
#>  [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

In R, basic mathematical operations work with vectors. That means that you should never need to perform explicit iteration when performing simple mathematical computations.

It’s intuitive what should happen if you add two vectors of the same length, or a vector and a “scalar”, but what happens if you add two vectors of different lengths?

1:10 + 1:2
#>  [1]  2  4  4  6  6  8  8 10 10 12

Here, R will expand the shortest vector to the same length as the longest, so called recycling. This is silent except when the length of the longer is not an integer multiple of the length of the shorter:

1:10 + 1:3
#> Warning in 1:10 + 1:3: longer object length is not a multiple of shorter object
#> length
#>  [1]  2  4  6  5  7  9  8 10 12 11

While vector recycling can be used to create very succinct, clever code, it can also silently conceal problems.

3.7.4 Subsetting

[ is the subsetting function, and is called like x[a]. There are four types of things that you can subset a vector with:

  1. A numeric vector containing only integers. The integers must either be all positive, all negative, or zero.

    Subsetting with positive integers keeps the elements at those positions:

    x <- c("one", "two", "three", "four", "five")
    x[c(3, 2, 5)]
    #> [1] "three" "two"   "five"

    By repeating a position, you can actually make a longer output than input:

    x[c(1, 1, 5, 5, 5, 2)]
    #> [1] "one"  "one"  "five" "five" "five" "two"

    Negative values drop the elements at the specified positions:

    x[c(-1, -3, -5)]
    #> [1] "two"  "four"

    It’s an error to mix positive and negative values:

    x[c(1, -1)]
    #> Error in x[c(1, -1)]: only 0's may be mixed with negative subscripts

    The error message mentions subsetting with zero, which returns no values:

    x[0]
    #> character(0)

    This is not useful very often, but it can be helpful if you want to create unusual data structures to test your functions with.

  2. Subsetting with a logical vector keeps all values corresponding to a TRUE value. This is most often useful in conjunction with the comparison functions.

    x <- c(10, 3, NA, 5, 8, 1, NA)
    # All non-missing values of x
    x[!is.na(x)]
    #> [1] 10  3  5  8  1
    # All even (or missing!) values of x
    x[x %% 2 == 0]
    #> [1] 10 NA  8 NA
    x[!is.na(x) & x %% 2 == 0]
    #> [1] 10  8
  3. If you have a named vector, you can subset it with a character vector:

    x <- c(abc = 1, def = 2, xyz = 5)
    x[c("xyz", "def")]
    #> xyz def 
    #>   5   2

    Like with positive integers, you can also use a character vector to duplicate individual entries.

  4. The simplest type of subsetting is nothing, x[], which returns the complete x. This is not useful for subsetting vectors, but it is useful when subsetting matrices (and other high dimensional structures) because it lets you select all the rows or all the columns, by leaving that index blank.

    m <- matrix(1:6, nrow = 2)
    m
    #>      [,1] [,2] [,3]
    #> [1,]    1    3    5
    #> [2,]    2    4    6
    m[1, ]
    #> [1] 1 3 5
    m[ ,-1]
    #>      [,1] [,2]
    #> [1,]    3    5
    #> [2,]    4    6

    Notice that when you take a subset of a matrix, if you take a subset that has only one row or only one column, you’re no longer going to be a matrix but you’re going to be an R vector, which is neither a row vector nor a column vector.To prevent this rule from working, add the drop = FALSE option to the square bracket subscript.

    m1 <- m[ , 1]
    m1
    #> [1] 1 2
    attributes(m1)
    #> NULL
    m2 <- m[ , 1, drop = FALSE]
    m2
    #>      [,1]
    #> [1,]    1
    #> [2,]    2
    attributes(m2)
    #> $dim
    #> [1] 2 1

3.7.5 Merging

If x is a vector, cbind(x) turns x into a column vector, that is, a matrix with columns of 1, and rbind(x) turns x into a row vector.

If x1, x2, x3 are equal length vectors, cbind(x1, x2, x3) treats them as column vectors and puts them together to form a matrix.The arguments of cbind can contain both a vector and a matrix, and the length of the vector must be equal to the number of rows in the matrix.

c1 <- cbind(c(1,2,3))
c1
#>      [,1]
#> [1,]    1
#> [2,]    2
#> [3,]    3
cbind(c(1,2), c(3,4), c(5,6))
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
c2 <- matrix(1:6, nrow = 3)
c2
#>      [,1] [,2]
#> [1,]    1    4
#> [2,]    2    5
#> [3,]    3    6
cbind(c2, c(1,-1,10))
#>      [,1] [,2] [,3]
#> [1,]    1    4    1
#> [2,]    2    5   -1
#> [3,]    3    6   10

cbind is also allowed to have scalars in its arguments, where the scalars are reused

cbind(1, c(1,-1,10))
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    1   -1
#> [3,]    1   10
cbind(c(1,2), c(1,-1,10))
#> Warning in cbind(c(1, 2), c(1, -1, 10)): number of rows of result is not a
#> multiple of vector length (arg 1)
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    2   -1
#> [3,]    1   10

rbind is used in a similar way. You can put a vector of equal length together as a row vector, or a matrix together with a vector of length equal to the number of columns in the matrix, or a vector of length one.

r1 <- rbind(c(1,2,3))
r1
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
rbind(c(1,2), c(3,4), c(5,6))
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
#> [3,]    5    6
r2 <- matrix(1:6, ncol = 3)
r2
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
rbind(r2, c(1,-1,10))
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
#> [3,]    1   -1   10
rbind(1, c(1,-1,10))
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> [2,]    1   -1   10
rbind(c(1,2), c(1,-1,10))
#> Warning in rbind(c(1, 2), c(1, -1, 10)): number of columns of result is not a
#> multiple of vector length (arg 1)
#>      [,1] [,2] [,3]
#> [1,]    1    2    1
#> [2,]    1   -1   10

3.8 Lists

Lists are like atomic vectors because they group data into a one-dimensional set. However, lists do not group together individual values; lists group together R objects, such as atomic vectors and other lists. For example, you can make a list that contains a numeric vector of length 31 in its first element, a character vector of length 1 in its second element, and a new list of length 2 in its third element. To do this, use the list function.

list creates a list the same way c creates a vector. Separate each element in the list with a comma:

list1 <- list(100:130, "R", list(TRUE, FALSE))
list1
#> [[1]]
#>  [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
#> [20] 119 120 121 122 123 124 125 126 127 128 129 130
#> 
#> [[2]]
#> [1] "R"
#> 
#> [[3]]
#> [[3]][[1]]
#> [1] TRUE
#> 
#> [[3]][[2]]
#> [1] FALSE

Lists are a basic type of object in R, on par with atomic vectors. Like atomic vectors, they are used as building blocks to create many more spohisticated types of R objects.

As you can imagine, the structure of lists can become quite complicated, but this flexibility makes lists a useful all-purpose storage tool in R: you can group together anything with a list.

A very useful tool for working with lists is str() because it focusses on the structure, not the contents.

str(list1)
#> List of 3
#>  $ : int [1:31] 100 101 102 103 104 105 106 107 108 109 ...
#>  $ : chr "R"
#>  $ :List of 2
#>   ..$ : logi TRUE
#>   ..$ : logi FALSE

3.8.1 Subsetting

There are three ways to subset a list, which I’ll illustrate with a list named a:

a <- list(a = 1:3, b = "a string", c = pi, d = list(-1, -5))
str(a)
#> List of 4
#>  $ a: int [1:3] 1 2 3
#>  $ b: chr "a string"
#>  $ c: num 3.14
#>  $ d:List of 2
#>   ..$ : num -1
#>   ..$ : num -5
  1. [ extracts a sub-list. The result will always be a list.

    str(a[1:2])
    #> List of 2
    #>  $ a: int [1:3] 1 2 3
    #>  $ b: chr "a string"
    str(a[4])
    #> List of 1
    #>  $ d:List of 2
    #>   ..$ : num -1
    #>   ..$ : num -5

    Like with vectors, you can subset with a logical, integer, or character vector.

  2. [[ extracts a single component from a list. It removes a level of hierarchy from the list.

    str(a[[1]])
    #>  int [1:3] 1 2 3
    str(a[[4]])
    #> List of 2
    #>  $ : num -1
    #>  $ : num -5
  3. $ is a shorthand for extracting named elements of a list. It works similarly to [[ except that you don’t need to use quotes.

    a$a
    #> [1] 1 2 3
    a[["a"]]
    #> [1] 1 2 3

The distinction between [ and [[ is really important for lists, because [[ drills down into the list while [ returns a new, smaller list.

Subsetting a list, visually.

Subsetting a list, visually.

3.9 Data Frames

Data frames are the two-dimensional version of a list. They are far and away the most useful storage structure for data analysis. You can think of a data frame as R’s equivalent to the Excel spreadsheet because it stores data in a similar format.

Data frames group vectors together into a two-dimensional table. Each vector becomes a column in the table. As a result, each column of a data frame can contain a different type of data; but within a column, every cell must be the same type of data.

Creating a data frame by hand takes a lot of typing, but you can do it (if you like) with the data.frame function. Give data.frame any number of vectors, each separated with a comma. Each vector should be set equal to a name that describes the vector. data.frame will turn each vector into a column of the new data frame:

df <- data.frame(face = c("ace", "two", "six"), suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3), stringsAsFactors = TRUE)
df
#>   face  suit value
#> 1  ace clubs     1
#> 2  two clubs     2
#> 3  six clubs     3

You’ll need to make sure that each vector is the same length (or can be made so with R’s recycling rules), as data frames cannot combine columns of different lengths.

If you look at the type of a data frame, you will see that it is a list. In fact, each data frame is a list with class data.frame. You can see what types of objects are grouped together by a list (or data frame) with the str function:

typeof(df)
#> [1] "list"
class(df)
#> [1] "data.frame"
str(df)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ face : Factor w/ 3 levels "ace","six","two": 1 3 2
#>  $ suit : Factor w/ 1 level "clubs": 1 1 1
#>  $ value: num  1 2 3

Notice that R saved your character strings as factors. It is not a very big deal here, but you can prevent this behavior by adding the argument stringsAsFactors = FALSE to data.frame:

Note: StringsAsFactors determines should character vectors be converted to factors. The ‘factory-fresh’ default has been TRUE previously but has been changed to FALSE for R 4.0.0. Only as short time workaround, you can revert by setting options(stringsAsFactors = TRUE) which now warns about its deprecation.

df1 <- data.frame(face = c("ace", "two", "six"), suit = c("clubs", "clubs", "clubs"), value = c(1, 2, 3))
str(df1)
#> 'data.frame':    3 obs. of  3 variables:
#>  $ face : chr  "ace" "two" "six"
#>  $ suit : chr  "clubs" "clubs" "clubs"
#>  $ value: num  1 2 3

You can use View to See the full dataframe. And you can use head to list the first few lines(default = 6) of the dataframe.

# View(df1)
head(df1, n=2L)
#>   face  suit value
#> 1  ace clubs     1
#> 2  two clubs     2

3.9.1 Tibbles

Tibbles are augmented lists: they have class “tbl_df” + “tbl” + “data.frame”, and names (column) and row.names attributes:

tb <- tibble::tibble(x = 1:5, y = 5:1)
tb
#> # A tibble: 5 x 2
#>       x     y
#>   <int> <int>
#> 1     1     5
#> 2     2     4
#> 3     3     3
#> 4     4     2
#> 5     5     1
typeof(tb)
#> [1] "list"
attributes(tb)
#> $names
#> [1] "x" "y"
#> 
#> $row.names
#> [1] 1 2 3 4 5
#> 
#> $class
#> [1] "tbl_df"     "tbl"        "data.frame"

The difference between a tibble and a list is that all the elements of a data frame must be vectors with the same length. All functions that work with tibbles enforce this constraint.

Traditional data.frames have a very similar structure:

df2 <- data.frame(x = 1:5, y = 5:1)
df2
#>   x y
#> 1 1 5
#> 2 2 4
#> 3 3 3
#> 4 4 2
#> 5 5 1
typeof(df2)
#> [1] "list"
attributes(df2)
#> $names
#> [1] "x" "y"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2 3 4 5

The main difference is the class. The class of tibble includes “data.frame” which means tibbles inherit the regular data frame behaviour by default.

4 Operators

There are four main categories of Operators in R programming language.

  1. Assignment Operators

  2. Arithmetic Operators

  3. Relational Operators

  4. Logical Operators

4.1 Assignment Operators

Assignment Operators are those that help in assigning a value to the variable.

a = 2
a
#> [1] 2
a <- TRUE
a
#> [1] TRUE
454 -> a
a
#> [1] 454
a <<- 2.9
a
#> [1] 2.9
c(6, 8, 9) -> a
a
#> [1] 6 8 9

4.2 Arithmetic Operators

Arithmetic Operators are used to accomplish arithmetic operations.

Example for scalars:

a <- 7.5
b <- 2

print ( a+b ) #addition
#> [1] 9.5
print ( a-b ) #subtraction
#> [1] 5.5
print ( a*b ) #multiplication
#> [1] 15
print ( a/b ) #Division
#> [1] 3.75
print ( a%%b ) #Reminder
#> [1] 1.5
print ( a%/%b ) #Quotient
#> [1] 3
print ( a^b ) #Power of
#> [1] 56.25

Example for vectors:

a <- c(8, 9, 6)
b <- c(2, 4, 5)

print ( a+b ) #addition
#> [1] 10 13 11
print ( a-b ) #subtraction
#> [1] 6 5 1
print ( a*b ) #multiplication
#> [1] 16 36 30
print ( a/b ) #Division
#> [1] 4.00 2.25 1.20
print ( a%%b ) #Reminder
#> [1] 0 1 1
print ( a%/%b ) #Quotient
#> [1] 4 2 1
print ( a^b ) #Power of
#> [1]   64 6561 7776

You can do inner multiplication with the %*% operator and outer multiplication with the %o% operator:

die <- c(1:6)
die %*% die
#>      [,1]
#> [1,]   91
die %o% die
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]    1    2    3    4    5    6
#> [2,]    2    4    6    8   10   12
#> [3,]    3    6    9   12   15   18
#> [4,]    4    8   12   16   20   24
#> [5,]    5   10   15   20   25   30
#> [6,]    6   12   18   24   30   36

4.3 Relational Operators

Relational Operators are those that find out relation between the two operands provided to them. The output is boolean (TRUE or FALSE) for all of the Relational Operators in R programming language.

Operator Syntax Tests
> a > b Is a greater than b?
>= a >= b Is a greater than or equal to b?
< a < b Is a less than b?
<= a <= b Is a less than or equal to b?
== a == b Is a equal to b?
!= a != b Is a not equal to b?
%in% a %in% c(a, b, c) Is a in the group c(a, b, c)?

Example for scalars:

a <- 7.5
b <- 2

print ( a<b ) # less than
#> [1] FALSE
print ( a>b ) # greater than
#> [1] TRUE
print ( a==b ) # equal to
#> [1] FALSE
print ( a<=b ) # less than or equal to
#> [1] FALSE
print ( a>=b ) # greater than or equal to
#> [1] TRUE
print ( a!=b ) # not equal to
#> [1] TRUE

Example for vectors:

a <- c(7.5, 3, 5)
b <- c(2, 7, 0)

print ( a<b ) # less than
#> [1] FALSE  TRUE FALSE
print ( a>b ) # greater than
#> [1]  TRUE FALSE  TRUE
print ( a==b ) # equal to
#> [1] FALSE FALSE FALSE
print ( a<=b ) # less than or equal to
#> [1] FALSE  TRUE FALSE
print ( a>=b ) # greater than or equal to
#> [1]  TRUE FALSE  TRUE
print ( a!=b ) # not equal to
#> [1] TRUE TRUE TRUE

%in% is the only operator that does not do normal element-wise execution. %in% tests whether the value(s) on the left side are in the vector on the right side. If you provide a vector on the left side, %in% will not pair up the values on the left with the values on the right and then do element-wise tests. Instead, %in% will independently test whether each value on the left is somewhere in the vector on the right:

1 %in% c(3, 4, 5)
#> [1] FALSE
c(1, 2) %in% c(3, 4, 5)
#> [1] FALSE FALSE
c(1, 2, 3) %in% c(3, 4, 5)
#> [1] FALSE FALSE  TRUE
c(1, 2, 3, 4) %in% c(3, 4, 5)
#> [1] FALSE FALSE  TRUE  TRUE

Notice that you test for equality with a double equals sign, ==, and not a single equals sign, =, which is another way to write <-.

4.4 Logical Operators

Logical Operators in R programming language work only for the basic data types logical, numeric and complex and vectors of these basic data types.

Operator Syntax Tests
& cond1 & cond2 Are both cond1 and cond2 true?
| cond1 | cond2 Is one or more of cond1 and cond2 true?
xor xor(cond1, cond2) Is exactly one of cond1 and cond2 true?
! !cond1 Is cond1 false? (e.g., ! flips the results of a logical test)
any any(cond1, cond2, cond3, ...) Are any of the conditions true?
all all(cond1, cond2, cond3, ...) Are all of the conditions true?

Example for basic logical elements:

a <- 0 # logical FALSE
b <- 2 # logical TRUE

print ( a & b ) # logical AND element wise
#> [1] FALSE
print ( a | b ) # logical OR element wise
#> [1] TRUE
print ( !a ) # logical NOT element wise
#> [1] TRUE
print ( a && b ) # logical AND consolidated for all elements
#> [1] FALSE
print ( a || b ) # logical OR consolidated for all elements
#> [1] TRUE

Example for boolean vectors:

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

print ( a & b ) # logical AND element wise
#> [1]  TRUE FALSE FALSE FALSE
print ( a | b ) # logical OR element wise
#> [1]  TRUE  TRUE  TRUE FALSE
print ( !a ) # logical NOT element wise
#> [1] FALSE FALSE  TRUE  TRUE
print ( a && b ) # logical AND consolidated for all elements
#> [1] TRUE
print ( a || b ) # logical OR consolidated for all elements
#> [1] TRUE

You can use || (or) and && (and) to combine multiple logical expressions. These operators are “short-circuiting”: as soon as || sees the first TRUE it returns TRUE without computing anything else. As soon as && sees the first FALSE it returns FALSE. You should never use | or & in an if statement: these are vectorised operations that apply to multiple values. If you do have a logical vector, you can use any() or all() to collapse it to a single value.

a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

xor(a,b)
#> [1] FALSE  TRUE  TRUE FALSE
any(a,b)
#> [1] TRUE
all(a,b)
#> [1] FALSE

4.5 Missing Information

Missing information problems happen frequently in data science. Usually, they are more straightforward: you don’t know a value because the measurement was lost, corrupted, or never taken to begin with. R has a way to help you manage these missing values.

The NA character is a special symbol in R. It stands for “not available” and can be used as a placeholder for missing information. R will treat NA exactly as you should want missing information treated.

1 + NA
#> [1] NA
NA == 1
#> [1] NA
NA != 1
#> [1] NA
NA == NA
#> [1] NA
!NA
#> [1] NA
NA %in% c(TRUE, 0, FALSE)
#> [1] FALSE
NA %in% c(TRUE, NA, FALSE)
#> [1] TRUE
NA & TRUE
#> [1] NA
NA | TRUE
#> [1] TRUE
NA & FALSE
#> [1] FALSE
NA | FALSE
#> [1] NA

4.5.1 na.rm

Missing values can help you work around holes in your data sets, but they can also create some frustrating problems.

c(NA, 1:50)
#>  [1] NA  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#> [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
#> [51] 50
mean(c(NA, 1:50))
#> [1] NA

Most R functions come with the optional argument, na.rm, which stands for NA remove. R will ignore NAs when it evaluates a function if you add the argument na.rm = TRUE:

mean(c(NA, 1:50), na.rm = TRUE)
#> [1] 25.5

4.5.2 is.na

On occasion, you may want to identify the NAs in your data set with a logical test, but that too creates a problem.

NA == NA
#> [1] NA
c(1, 2, 3, NA) == NA
#> [1] NA NA NA NA

R supplies a special function that can test whether a value is an NA. The function is sensibly named is.na:

is.na(NA)
#> [1] TRUE
vec <- c(1, 2, 3, NA)
is.na(vec)
#> [1] FALSE FALSE FALSE  TRUE

5 Conditional Statement

An if statement allows you to conditionally execute code. It looks like this:

if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}

5.1 If Statement

The If statement helps you in evaluating a single expression as part of the flow. To perform this evaluation, you just need to write the If keyword followed by the expression to be evaluated.It looks like this:

num1 <- 10
num2 <- 20

if(num1 <= num2){
  print("Num1 is less or equal to Num2")
}
#> [1] "Num1 is less or equal to Num2"

The condition must evaluate to either TRUE or FALSE. If it’s a vector, you’ll get a warning message; if it’s an NA, you’ll get an error. Watch out for these messages in your own code:

if (c(TRUE, FALSE)) {}
#> Warning in if (c(TRUE, FALSE)) {: the condition has length > 1 and only the
#> first element will be used
#> NULL
if (NA) {}
#> Error in if (NA) {: missing value where TRUE/FALSE needed

5.2 Else If Statement:

The Else if statement helps you in extending branches to the flow created by the If statement and give you the opportunity to evaluate multiple conditions by creating new branches of flow.

Num1 <-  10
Num2 <-  20

if (Num1 < Num2) { 
  print("Num1 is lesser than Num2") 
} else if( Num1 > Num2) {
  print("Num2 is lesser than Num1") 
} else {
  print("Num1 and Num2 are Equal")
}
#> [1] "Num1 is lesser than Num2"
x <- 2

if (x > 2) {
  print("x is greater than 2")
} else if (x == 2) {
  print("x is equal to 2")
} else {
  print("x is not greater than 2")
}
#> [1] "x is equal to 2"

5.3 Else Statement:

The else statement is used when all the other expressions are checked and found invalid. This will be the last statement that gets executed as part of the If – Else if branch.

x <- -5

if (x > 0) {
  print("Non-negative number")
} else {
  print("Negative number")
}
#> [1] "Negative number"

5.4 ifelse

ifelse() has three arguments. The first argument test should be a logical vector. The result will contain the value of the second argument, yes, when test is TRUE, and the value of the third argument, no, when it is false.

x <- c(-2, 0, 1)
y <- ifelse(x >= 0, 1, 0); print(y)
#> [1] 0 1 1

ifelse((1:6) >= 3, 1:2, c(-1,-2))
#> [1] -1 -2  1  2  1  2

6 Loops

Loops are R’s method for repeating a task, which makes them a useful tool for programming simulations. A loop statement allows us to execute a statement or group of statements multiple times. There are mainly 3 types of loops in R:

6.1 Repeat Loop

It repeats a statement or group of statements while a given condition is TRUE. Repeat loop is the best example of an exit controlled loop where the code is first executed and then the condition is checked to determine if the control should be inside the loop or exit from it.

A block of statements are repeated forever. A breaking condition has to be provided inside the repeat block to come out of the loop.

It looks like this:

repeat {   
  statements  
  if (stop_condition) {    
    break   
  } 
}
a <-  1

repeat {
  # starting of repeat statements block
  print(a)
  a <- a + 1
  # ending of repeat statements block
  if(a > 5) { # breaking condition
    break
  }
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5

6.2 While Loop

It helps to repeats a statement or group of statements while a given condition is TRUE. While loop, when compared to the repeat loop is slightly different, it is an example of an entry controlled loop where the condition is first checked and only if the condition is found to be true does the control be delivered inside the loop to execute the code.

A block of statements are executed repeatedly in a loop till the condition provided to while statement returns TRUE.

It looks like this:

while (condition) {  
  statements
}
a <- 1

while(a <= 5) {
  print(a)
  a <- a + 1
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5

6.3 For Loop

It is used to repeats a statement or group of for a fixed number of times. Unlike repeat and while loop, the for loop is used in situations where we are aware of the number of times the code needs to executed beforehand. It is similar to the while loop where the condition is first checked and then only does the code written inside get executed.

A block of statements are executed for each of the items in the list provided to the for loop.

It looks like this:

for (x in vector) {
  statements
}
a <- c(1:5)

for(i in a) {
  print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3
#> [1] 4
#> [1] 5

To save output from a for loop, you must write the loop so that it saves its own output as it runs. You can do this by creating an empty vector or list before you run the for loop. Then use the for loop to fill up the vector or list. When the for loop is finished, you’ll be able to access the vector or list, which will now have all of your results.

a <- c(1:5)
output <- vector(length = length(a))

for(i in seq_along(a)) {
  output[i] <- a[i] * 2
}
output
#> [1]  2  4  6  8 10

You might not have seen seq_along() before.It’s a safe version of the familiar 1:length(l), with an important difference: if you have a zero-length vector, seq_along() does the right thing:

y <- vector("double", 0)
seq_along(y)
#> integer(0)
1:length(y)
#> [1] 1 0

You probably won’t create a zero-length vector deliberately, but it’s easy to create them accidentally. If you use 1:length(x) instead of seq_along(x), you’re likely to get a confusing error message.

7 Functions

R comes with many functions that you can use to do sophisticated tasks like random sampling. Using a function is pretty simple. Just write the name of the function and then the data you want the function to operate on in parentheses:

round(3.1415)
#> [1] 3
factorial(3)
#> [1] 6
cumsum(1:5)
#> [1]  1  3  6 10 15
cumprod(1:5)
#> [1]   1   2   6  24 120

The data that you pass into the function is called the function’s argument. The argument can be raw data, an R object, or even the results of another R function. And R will work from the innermost function to the outermost.

die <- c(1:6)
mean(die)
#> [1] 3.5
round(mean(die))
#> [1] 4
sample(die, size = 3)
#> [1] 5 3 4
sample(die, size = 3, replace = TRUE)
#> [1] 3 1 2

If you’re not sure which names to use with a function, you can look up the function’s arguments with args. To do this, place the name of the function in the parentheses behind args.

args(round)
#> function (x, digits = 0) 
#> NULL
args(sample)
#> function (x, size, replace = FALSE, prob = NULL) 
#> NULL

7.1 Writing Your Own Functions

Every function in R has three basic parts: a name, a body of code, and a set of arguments. To make your own function, you need to replicate these parts and store them in an R object, which you can do with the function function. To do this, call function() and follow it with a pair of braces, {}:

my_function <- function(arguments) {
  # code
}
roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

You can think of the parentheses as the “trigger” that causes R to run the function. If you type in a function’s name without the parentheses, R will show you the code that is stored inside the function. If you type in the name with the parentheses, R will run that code:

roll
#> function() {
#>   die <- 1:6
#>   dice <- sample(die, size = 2, replace = TRUE)
#>   sum(dice)
#> }
roll()
#> [1] 7

The code that you place inside your function is known as the body of the function. When you run a function in R, R will execute all of the code in the body and then return the result of the last line of code. If the last line of code doesn’t return a value, neither will your function, so you want to ensure that your final line of code returns a value.

Here’s some code that would display a result:

dice
1 + 1
sqrt(2)

And here’s some code that would not:

dice <- sample(die, size = 2, replace = TRUE)
two <- 1 + 1
a <- sqrt(2)

These lines of code do not return a value to the command line; they save a value to an object.

You can supply bones when you call roll2 if you make bones an argument of the function. To do this, put the name bones in the parentheses that follow function when you define roll2:

roll2 <- function(bones) {
  dice <- sample(bones, size = 2, replace = TRUE)
  sum(dice)
}
roll2(bones = 1:4)
#> [1] 6
roll2(1:6)
#> [1] 3

Notice that roll2 will still give an error if you do not supply a value for the bones argument when you call roll2:

roll2()
#> Error in sample(bones, size = 2, replace = TRUE): argument "bones" is missing, with no default

You can prevent this error by giving the bones argument a default value. To do this, set bones equal to a value when you define roll2:

roll2 <- function(bones = 1:6) {
  dice <- sample(bones, size = 2, replace = TRUE)
  sum(dice)
}

Now you can supply a new value for bones if you like, and roll2 will use the default if you do not:

roll2()
#> [1] 6
roll2(1:20)
#> [1] 13

You can give your functions as many arguments as you like. Just list their names, separated by commas, in the parentheses that follow function. When the function is run, R will replace each argument name in the function body with the value that the user supplies for the argument. If the user does not supply a value, R will replace the argument name with the argument’s default value (if you defined one).