Objects in R

The entities that R creates and manipulates are referred to as ‘objects’. Examples of objects are arrays, matrices, character strings, factors, functions and data structures. The name of an object must start with a letter (A-Z or a-z), or with a dot not followed by a digit, and can include letters, digits (0-9), dots and underscores. Note that R distinguishes between uppercase and lowercase letters in object names, so ‘x’ and ‘X’ are different objects. By ‘workspace’ we mean the collection of objects created during an R session. An object can be created with an assignment operator (‘<-’, ‘->’ or, in most contexts, ‘=’) or with the assign() function. For instance, the following commands are all equivalent:
> x <- 5
> 5 -> x
> x = 5
> assign('x',5)
To display the content of an object, simply type its name (e.g., x) and press Enter in the RStudio console.

> x

Alternatively, issue the command

> print(x)

The output includes the digit 1 within brackets, indicating that the display starts at the first element of x.

[1] 5
Objects are characterized not only by their names and contents, but also by their attributes. All objects have two intrinsic attributes: mode and length. The mode specifies the type of the elements contained in the object. There are four main modes to represent data in R: numeric, character, complex and logical. The length specifies the number of elements contained in the object. The functions mode() and length() return the mode and the length of an object.

> mode(x)
[1] "numeric"
> length(x)
[1] 1

> a <- 'red'; b <- TRUE; c <- 1i
> mode(a); mode(b); mode(c)
[1] "character"
[1] "logical"
[1] "complex"
Note that character strings must be delimited with single (') or double (") quotes. Note also the use of the semicolon to separate distinct commands on the same line.
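
As a complement to mode(), the base function typeof() reports how values are stored internally, which distinguishes integers from doubles. The following is only a small sketch; the object names i and d are chosen for illustration.

```r
# Illustrative objects (names are arbitrary): an integer and a double
i <- 5L    # the L suffix requests integer storage
d <- 5     # a plain number literal is stored as a double

# Both objects share the mode "numeric" ...
mode_i <- mode(i)
mode_d <- mode(d)

# ... but typeof() tells them apart
type_i <- typeof(i)   # "integer"
type_d <- typeof(d)   # "double"

print(c(mode_i, mode_d, type_i, type_d))
```

This is why str() may label one numeric column as int and another as num.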

Vectors

A vector is the simplest data structure in R. It consists of an ordered collection of data elements of the same basic type (numeric, character, logical or complex). To set up a vector, use the function c().

> v1 <- c(2,4.5,3); v2 <- c('red','green','blue'); v3 <- c(TRUE,FALSE)
> mode(v1); mode(v2); mode(v3)
[1] "numeric"
[1] "character"
[1] "logical"
> length(v1); length(v2); length(v3)
[1] 3
[1] 3
[1] 2
Remember that a vector must have elements of the same type. If we try to set up a vector by concatenating elements of different types, the function c() will coerce the elements to a common type. Coercion goes from lower to higher types: from logical to integer to double to complex to character.

> v4 <- c(FALSE,5,5.9,"brain")
> v4
[1] "FALSE" "5"     "5.9"   "brain"
> mode(v4)
[1] "character"
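
The coercion order can be checked one step at a time. The sketch below mixes two types per call and inspects the result with typeof(); the variable names t1 to t4 and back are illustrative only.

```r
# Each call to c() mixes two types; typeof() shows the common type chosen
t1 <- typeof(c(TRUE, 2L))     # logical + integer
t2 <- typeof(c(2L, 2.5))      # integer + double
t3 <- typeof(c(2.5, 1i))      # double + complex
t4 <- typeof(c(2.5, "a"))     # double + character

print(c(t1, t2, t3, t4))

# Coercion can be reversed explicitly when the strings look like numbers
back <- as.numeric(c("5", "5.9"))
print(back)
```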

If you need to create a vector of consecutive integers, the colon (:) operator is very helpful. For example,

> 1:5

is equivalent to

> c(1,2,3,4,5)
but is easier to use. More complex sequences of numbers can be created using the seq() function. For example,

> seq(1,3,by=0.5)
[1] 1.0 1.5 2.0 2.5 3.0
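
Besides a fixed step, seq() also accepts a target length, and seq_len() is a convenient companion for index sequences. A minimal sketch, with illustrative variable names:

```r
# Five equally spaced values from 0 to 1 (the step is computed by seq)
s1 <- seq(0, 1, length.out = 5)

# A decreasing sequence: the step given to 'by' must be negative
s2 <- seq(10, 2, by = -4)

# seq_len(n) gives 1, 2, ..., n and returns an empty vector when n is 0
s3 <- seq_len(3)
s0 <- seq_len(0)

print(s1); print(s2); print(s3); print(length(s0))
```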

To select specific elements of a vector, append an ‘index vector’ in square brackets to the name of the vector. For example,

> v4[2:3]
[1] "5"   "5.9"

A negative index vector specifies the elements to be excluded rather than included:

> v4[-(2:3)]
[1] "FALSE" "brain"
The index vector can also be a logical vector, in which case the elements corresponding to TRUE are selected:

> v1[v1>2]
[1] 4.5 3.0
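
To make the logical index explicit, the following sketch stores it in a variable first and shows how conditions can be combined. The names keep, pos and mid are illustrative.

```r
v1 <- c(2, 4.5, 3)            # same values as v1 in the text

keep <- v1 > 2                 # logical index vector: FALSE TRUE TRUE
pos  <- which(keep)            # positions of the TRUE entries

# Conditions can be combined with & (and) and | (or)
mid <- v1[v1 > 2 & v1 < 4]     # elements strictly between 2 and 4

print(keep); print(pos); print(mid)
```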

In the following example, we create an empty numeric vector using the function numeric() without any argument. Then we assign numeric values to the third and fifth components, leaving the other components unassigned.

> x <- numeric()
> x[3] <- 5
> x[5] <- 8
> x
[1] NA NA  5 NA  8

R uses the special value ‘NA’ to indicate that an element or value is ‘not available’. The following command creates an object y that contains the non-missing values of x.

> y <- x[!is.na(x)]
> y
[1] 5 8

To replace any missing values in x by zeros, we can use the command

> x[is.na(x)] <- 0
> x
[1] 0 0 5 0 8
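
Missing values also affect arithmetic summaries. As a hedged illustration, the sketch below rebuilds a vector with the same NA pattern and compares sum() and mean() with and without the na.rm argument; the variable names are illustrative.

```r
x <- c(NA, NA, 5, NA, 8)       # same pattern as x in the text before replacement

n_missing <- sum(is.na(x))      # TRUE counts as 1, so this counts the NAs

total_raw   <- sum(x)                 # NA: any missing value propagates
total_clean <- sum(x, na.rm = TRUE)   # missing values are dropped first
avg_clean   <- mean(x, na.rm = TRUE)  # mean of the two available values

print(n_missing); print(total_raw); print(total_clean); print(avg_clean)
```

Dropping missing values with na.rm = TRUE and replacing them by zeros are not interchangeable: the mean above is 6.5, whereas the mean of the zero-filled vector would be 2.6.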
Arrays

Arrays are multi-dimensional generalizations of vectors. Like vectors, arrays can only store values of the same type. Two-dimensional arrays are also called matrices. There are several ways to create an array in R. To begin with, a vector ‘x’ can be treated as an array if we assign the ‘dim’ attribute to it. The dim attribute specifies the array size along each dimension. For instance, if dim(x) is a vector with components 5 and 2, then x will be treated as a matrix with 5 rows and 2 columns. The elements fill the matrix column by column.

> x <- c(1:5,seq(3,11,by=2))
> dim(x) <- c(5,2)
> x
     [,1] [,2]
[1,]    1    3
[2,]    2    5
[3,]    3    7
[4,]    4    9
[5,]    5   11
To select individual elements of an array, use subscripts in square brackets, separated by commas. For example, to select the element in the fourth row and second column, use the command

> x[4,2]
[1] 9

If any index position is empty, the full range of that subscript is taken. For example, to select the third row, use the command

> x[3,]
[1] 3 7

To select the second column, use the command

> x[,2]
[1]  3  5  7  9 11
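
Selecting a single row or column returns a plain vector, because R drops dimensions of extent one by default. The sketch below contrasts this with the drop = FALSE argument of the subscript operator; the variable names are illustrative.

```r
x <- c(1:5, seq(3, 11, by = 2))
dim(x) <- c(5, 2)                  # the 5 x 2 matrix from the text

row3_vec <- x[3, ]                 # dimensions are dropped: a plain vector
row3_mat <- x[3, , drop = FALSE]   # still a matrix, with 1 row and 2 columns

print(dim(row3_vec))
print(dim(row3_mat))

# Several rows and columns can be selected at once
block <- x[2:3, 1:2]
print(block)
```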
Other functions, such as array() or matrix(), are available for defining arrays.

> x <- array(1:9, dim=c(3,3))

Or, equivalently

> x <- matrix(1:9,nrow=3,ncol=3)
> x
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
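
Both functions fill the array column by column unless told otherwise. As a brief sketch, matrix() accepts byrow = TRUE to fill row by row, and array() handles more than two dimensions; the names by_col, by_row and a3 are illustrative.

```r
by_col <- matrix(1:6, nrow = 2)                 # default: column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row instead

print(by_col)
print(by_row)

# A three-dimensional array: 2 rows, 3 columns, 2 layers
a3 <- array(1:12, dim = c(2, 3, 2))
print(a3[1, 2, 2])   # row 1, column 2 of the second layer
```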

Factors

Factors are important objects for statistical analysis and plotting, as they represent categorical data. Although factors look like character vectors, they are stored internally as integer codes. Factors can only contain a predefined set of values, known as ‘levels’. The factor() function creates and modifies factors in R.

> v <- c('male','female')
> f <- factor(v)
> f
[1] male   female
Levels: female male
As you can see, R sorts the levels in alphabetical order by default. Under the hood, R assigned 1 to the level ‘female’ and 2 to the level ‘male’, because ‘f’ comes before ‘m’, even though the first element in the vector is ‘male’. Sometimes the order of the levels matters and you might want to specify it explicitly. For example,

> f2 <- factor(c('low','medium','high'),levels=c('low','medium','high'))
> f2
[1] low    medium high
Levels: low medium high

Note that the levels of f2 are sorted in a more meaningful way than alphabetical order.
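
The integer codes behind a factor can be inspected directly. The following sketch also shows an ordered factor, created with the ordered = TRUE argument of factor(), for which comparisons between levels are defined. The data values and variable names are made up for illustration.

```r
f <- factor(c('male', 'female', 'male'))

codes  <- as.integer(f)    # underlying integer codes
labs   <- levels(f)        # the levels, in alphabetical order
counts <- table(f)         # number of observations per level

# ordered = TRUE additionally makes comparisons between levels meaningful
size <- factor(c('low', 'high', 'medium'),
               levels = c('low', 'medium', 'high'), ordered = TRUE)
is_above_low <- size > 'low'

print(codes); print(labs); print(counts); print(is_above_low)
```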

Lists

Lists are more general objects than vectors, since the elements of a list can be of different types. For example, a list could consist of a logical value, a matrix, a character array, a complex vector, a function, and so on. The list() function creates a list.

> l <- list(name='Richard',wife='Kelly',nChildren=2, childAges=c(5,3))
> l
$name
[1] "Richard"

$wife
[1] "Kelly"

$nChildren
[1] 2

$childAges
[1] 5 3
In the example above, we have created a list ‘l’ with four components. To check the number of (top-level) components in a list, use the command

> length(l)
[1] 4

Components are always numbered and may be referred to by specifying the corresponding number in double square brackets. For instance,

> l[[2]]
[1] "Kelly"

Components may also be named, as in our example. In such cases, components may be referred to by specifying the corresponding name in double square brackets

> l[['wife']]
[1] "Kelly"
or by giving an expression of the form ‘listName$componentName’

> l$wife
[1] "Kelly"

So, in our example, l$wife is the same as l[[2]] and as l[['wife']]. Likewise, l$childAges[2] is the same as l[[4]][2] and as l[['childAges']][2], and is 3. It is worth emphasizing that l[[2]] is different from l[2]. The operator ‘[[…]]’ selects a single element of a list, whereas ‘[…]’ is a general subscripting operator that returns a sublist. Like any subscripted object, lists can be extended by specifying additional components. For example,

> l[['hairCol']] <- 'red'

or, more simply,

> l$hairCol <- 'red'

We can delete a component by assigning ‘NULL’ to it.

> l$hairCol <- NULL
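
The difference between double and single square brackets is easiest to see by inspecting what each returns. A minimal sketch using the list from the text; the names elem, sub, two and second_age are illustrative.

```r
l <- list(name = 'Richard', wife = 'Kelly', nChildren = 2, childAges = c(5, 3))

elem <- l[[2]]    # the component itself: a character vector
sub  <- l[2]      # a list of length 1 that still carries the name 'wife'

print(class(elem))
print(class(sub))
print(names(sub))

# Single brackets can pick several components at once
two <- l[c('name', 'nChildren')]
print(length(two))

# Indexing inside a component
second_age <- l$childAges[2]
print(second_age)
```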
Data frames

Data frames are matrix-like lists in which the columns can be of different types. Many experiments are adequately described by means of data frames, with one row per observational unit and with the possibility to manage both numerical and categorical variables. The main characteristics of a data frame are summarized as follows:

To create a data frame, use the data.frame() command.

> df <- data.frame(id=c(1:3),name=c('Richard','Kelly','Alan'),nChildren=c(2,1,0))
> df
  id    name nChildren
1  1 Richard         2
2  2   Kelly         1
3  3    Alan         0
By using the str() function, you can see the structure of the data frame.

> str(df)
'data.frame': 3 obs. of  3 variables:
 $ id       : int  1 2 3
 $ name     : chr  "Richard" "Kelly" "Alan"
 $ nChildren: num  2 1 0

You can easily extract specific columns and store them in another data frame.

> df2 <- data.frame(df$id,df$nChildren)
> df2
  df.id df.nChildren
1     1            2
2     2            1
3     3            0

You may want to extract specific columns and specific rows. For example, the command

> df3 <- df[1:2,c(2,3)]

extracts the first two rows and the second and third columns. You can also add columns very easily.

> df$hasPets <- c(TRUE,TRUE,FALSE)
> df
  id    name nChildren hasPets
1  1 Richard         2    TRUE
2  2   Kelly         1    TRUE
3  3    Alan         0   FALSE
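
Rows can also be selected by a condition on a column, and columns by name. The sketch below rebuilds the data frame from the text and shows two ways to write the same selection, using the subscript operator and the base function subset(); the names parents and parents2 are illustrative.

```r
df <- data.frame(id = 1:3,
                 name = c('Richard', 'Kelly', 'Alan'),
                 nChildren = c(2, 1, 0))

# Rows are selected with a logical vector, columns by name
parents <- df[df$nChildren > 0, c('name', 'nChildren')]
print(parents)

# subset() expresses the same selection more compactly
parents2 <- subset(df, nChildren > 0, select = c(name, nChildren))
print(parents2)

print(nrow(parents))
```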
Many datasets are supplied with R as data frames. To see the list of available datasets, use the following command.

> data()

To load a specific dataset from the list, pass its name to the function data(). For example,

> data(npk)

Instead of printing out the entire data frame, it is often useful to preview it with the head() function

> head(npk)

The simplest way to make small changes to a data frame is to invoke the edit() function. The command

> dfNew <- edit(npk)

allows you to save the changes to a new object. If instead you want to edit the original dataset in place, the simplest way is to use the fix() function. If you want to enter new data via the spreadsheet interface, use the following command

> dfNew <- edit(data.frame())
However, if your dataset is large, it is more convenient to read it from external files, rather than enter each value at the keyboard. You can read data stored in text files by using different functions: read.table(), scan(), read.csv(), read.csv2(), read.delim(), read.delim2(). R can also read files in other formats (e.g., Excel, SPSS), but the functions needed for this purpose are not present in base packages. In what follows, we will illustrate three different methods to read data from Excel files (.xls or .xlsx). As for the first method,
As for the second method,
The last R command assumes that the file ‘myFile.xlsx’ is in your current working directory (see below). As for the third method,
Note that the file.choose() function lets you choose a file interactively.
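
The text-file functions listed earlier in this section can be tried without any external data by first writing a small file. The sketch below uses only base R; the file contents and the names path, people and people2 are made up for illustration, and it does not cover the Excel methods.

```r
# Write a small CSV file to a temporary location (the path is generated
# by tempfile() and is only used for this illustration)
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,nChildren",
             "1,Richard,2",
             "2,Kelly,1",
             "3,Alan,0"), path)

# read.csv() assumes a header line and comma-separated fields
people <- read.csv(path)
str(people)

# read.table() reads the same file once the header and separator are stated
people2 <- read.table(path, header = TRUE, sep = ",")

unlink(path)   # remove the temporary file
```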

Saving the workspace

So far, we have created several objects in the R workspace. The functions objects() and ls() can be used to display the names of all objects currently available.

> ls()
To remove objects (e.g., v1, v2, v3 and v4) from the workspace, use the command

> rm(v1, v2, v3, v4)

All objects created during an R session can be stored in a file and reused in future sessions. When the workspace is saved at the end of the session, the objects are written to a file called .RData in the current directory, and the command lines used in the session are written to a file called .Rhistory. Note that the leading dot in these names makes the files invisible in default file listings on Unix-like systems. When R is started from the same directory, it reloads the workspace and the command history from the corresponding files. For this reason, it is recommended to use separate working directories for separate data analyses. The command

> getwd()

returns the current working directory. If you want to change it, you can use

> setwd('C:/data')
specifying the path in quotation marks.
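
Selected objects can also be saved and restored explicitly with the base functions save() and load(), independently of the end-of-session prompt. The following is only a sketch: the object names scores and label are made up, and a temporary file is used instead of .RData.

```r
# Illustrative objects; names are arbitrary
scores <- c(2, 4.5, 3)
label  <- 'red'

# save() writes the listed objects to a file; save.image() would write all of them
path <- tempfile(fileext = ".RData")   # temporary file, for illustration only
save(scores, label, file = path)

rm(scores, label)                       # the objects are removed from the workspace

restored <- load(path)                  # load() returns the names it restored
print(restored)
print(scores)

unlink(path)                            # remove the temporary file
```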