a <- 100
b = 100
100 -> c
Week 2
R basics
Basic operations
While much more versatile, R can be used as a calculator with some basic syntax.
- + (addition)
- - (subtraction)
- * (multiplication)
- / (division)
- ^ (raise to a power), e.g., 2^4 will compute \(2^4 = 16\)
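As a quick illustration of the operators listed above, here is a minimal sketch you can type at the console. The particular numbers are arbitrary choices for this example.

```r
# Each line is evaluated much like a calculator entry
3 + 4               # addition: 7
10 - 6              # subtraction: 4
5 * 2               # multiplication: 10
9 / 3               # division: 3
2^4                 # raise to a power: 16

# Parentheses control the order of operations, as in ordinary arithmetic
(1 + 2) * 3^2 / 9   # 3
```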
Variable assignment
One of the key aspects of coding in R is assignment. That is, assigning objects or values to a name (we will come back to objects shortly).
Any of =, <-, and -> can act as assignment operators. For example, in the statements a <- 100, b = 100, and 100 -> c, all of a, b, and c are assigned the value 100. We can then use these objects in other computations.
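To see assigned objects being reused, here is a small sketch. The names total and ratio are made up for this illustration.

```r
# Three equivalent ways of assigning the value 100
a <- 100
b = 100
100 -> c

# Use the stored values in further computations
total <- a + b + c      # 300
ratio <- a / (b + c)    # 0.5

total
ratio
```

Most R style guides prefer <- for assignment, but all three forms produce the same result here.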
Basic data types
There are a handful of basic data types in R, but the ones you will encounter most often are:
- integer
- numeric
- character
- logical
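One way to check which of these types a value has is the function class(). The values below are arbitrary examples.

```r
class(5L)        # "integer" (the L suffix requests an integer)
class(5)         # "numeric" (numbers are stored as doubles by default)
class("five")    # "character"
class(TRUE)      # "logical"

# Integers also count as numeric values
is.numeric(5L)   # TRUE
```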
Classes can get “elevated”
Classes may change in R depending on what operation an object or value is involved in. Let’s explore some scenarios in which this happens.
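The scenarios themselves are not reproduced here, so the following is a sketch of a few common cases rather than the original examples. The object x and the values used are assumptions made for illustration.

```r
x <- 5L
class(x)                 # "integer"

# Arithmetic with a non-integer value promotes the result to numeric
class(x + 0.5)           # "numeric"

# Division returns a numeric (double) result, even for two integers
class(x / 2L)            # "numeric"

# Combining types in one vector converts everything to the most general type
class(c(TRUE, 2L))       # "integer"
class(c(1, "a"))         # "character"
class(c(1, TRUE, "a"))   # "character"

# Logical values act as 0 and 1 in arithmetic
TRUE + TRUE              # 2
```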
Objects and object classes
While there are only a few basic data types, there are many, many classes of objects in R. To R, everything is an object, each with a specified class, and the class determines what sorts of operations an object can be involved in. Some other examples of object classes include:
- data.frame
- matrix
- array
Methods for classes
Classes of objects have certain methods defined for them. These are generic functions, such as summary() or print(), that have specific instructions for how to handle different classes of objects. In order to see what functions work with a specific class of object, you can use the function methods(). This will give you an idea of what sorts of things you can do with a given object class.
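As a sketch of how methods() can be used, the example below lists methods in both directions and then calls one generic on two different classes. The data frame df is a made-up example, and the lists printed by methods() will depend on which packages you have loaded.

```r
# Which generic functions have a method for data frames?
methods(class = "data.frame")

# Which classes does the generic summary() have methods for?
methods(summary)

# The same generic behaves differently depending on the class it is given
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
summary(df)      # handled by the data.frame method
summary(df$x)    # handled by the default method for plain vectors
```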
Dataframes, vectors, and matrices
Some of the most common objects used in the R language are dataframes, vectors, and matrices. These are objects that can store a lot of data in a logical format. You have already seen how to read in data from a CSV file in week 1. What we didn’t cover was that the data were automatically stored in a dataframe when loaded into the environment.
Working with vectors
Vectors are essentially lists of values. They can be lists of character values, numeric values, logical values, or even more general lists of just about any object type there is. These most general versions of vectors are called lists in R.
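A minimal sketch of creating each kind of vector with c(), and a general list with list(), is shown below. All names and values are invented for this example.

```r
chr <- c("a", "b", "c")          # character vector
num <- c(1.5, 2, 3.25)           # numeric vector
lgl <- c(TRUE, FALSE, TRUE)      # logical vector

# A list can hold elements of different types, including other vectors
lst <- list(1, "two", TRUE, c(4, 5))

class(chr)     # "character"
class(lst)     # "list"
length(lst)    # 4
```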
Accessing elements of a vector is one of the most important things to know. Indexing in R is 1-based, meaning the first element of a vector is considered to be in position 1 (rather than position 0, like many other languages).
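Here is a short sketch of the most common ways to index a vector with square brackets. The vector v and its values are assumptions for illustration only.

```r
v <- c(10, 20, 30, 40)

v[1]          # first element: 10
v[c(2, 4)]    # second and fourth elements: 20 40
v[-1]         # everything except the first element: 20 30 40
v[v > 15]     # elements that satisfy a condition: 20 30 40

# Indexing also works on the left-hand side of an assignment
v[2] <- 25
v             # 10 25 30 40
```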
We can also combine multiple vectors into one. For example, try
v <- c(1, 2, 3)
w <- c(4, 5, 6)
vw <- c(v, w)
vw
[1] 1 2 3 4 5 6
Working with dataframes and matrices
Matrices and dataframes are two-dimensional generalizations of vectors, as if stacking vectors on top of each other or side by side. For example, two vectors,
\[ {\bf v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\ \ {\bf w} = \begin{bmatrix} 4 \\ 5 \\ 6\end{bmatrix} \]
can be concatenated into a single matrix
\[ {\bf M} = \begin{bmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{bmatrix}. \]
In R, we can also have character and logical matrices.
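The concatenation above can be sketched in R with cbind(), which binds vectors side by side as columns. The names chr_mat, lgl_mat, and df are hypothetical and used only for this example.

```r
v <- c(1, 2, 3)
w <- c(4, 5, 6)

# Bind the vectors as columns to build the 3 x 2 matrix M
M <- cbind(v, w)
dim(M)              # 3 2

# rbind() stacks them as rows instead, giving a 2 x 3 matrix
rbind(v, w)

# Matrices can also hold character or logical values
chr_mat <- matrix(c("a", "b", "c", "d"), nrow = 2)
lgl_mat <- M > 2    # TRUE wherever an entry of M exceeds 2

# A dataframe stores the same vectors as named columns
df <- data.frame(v = v, w = w)
```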
Accessing elements, columns, and rows
Again, being able to access, add, or remove elements (or even whole columns or rows) from data structures is one of the most important skills to learn. Try the following:
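As a minimal sketch of these operations (the objects M and df, their column names, and their values are illustrative assumptions, not taken from the course materials):

```r
# --- Matrices: index with [row, column] ---
M <- cbind(v = c(1, 2, 3), w = c(4, 5, 6))
M[2, 1]                          # row 2, column 1: 2
M[, "w"]                         # the whole column w: 4 5 6
M[3, ]                           # the whole third row
M <- cbind(M, u = c(7, 8, 9))    # add a column
M <- M[-1, ]                     # remove the first row

# --- Dataframes: columns can also be accessed with $ ---
df <- data.frame(v = c(1, 2, 3), w = c(4, 5, 6))
df$w                             # the column w: 4 5 6
df[2, "v"]                       # row 2 of column v: 2
df$total <- df$v + df$w          # add a column: 5 7 9
df <- df[df$v > 1, ]             # keep only rows where v exceeds 1
df$w <- NULL                     # remove the column w
```

Leaving the row or column position empty inside the brackets selects every row or every column, respectively.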
Missing values
You will almost certainly have some missing data in your own graduate studies, either due to faulty instrumentation, a lapse in focus, or one of many other reasons. That is normal. So normal, that R uses a constant (a special object) NA to encode missing data. It’s relatively easy to convert other codes for missing data (e.g., 999 or a blank cell in an Excel spreadsheet) to NA values, once data are loaded into R or during the loading process. There are even methods designed to identify missing values. For example, try:
is.na(c(2, 3, NA, 5, 6))
[1] FALSE FALSE TRUE FALSE FALSE
Note that the functions mean() and sd(), which take the mean and standard deviation of a vector of values, respectively, behave similarly.
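The comparison this note refers to is not shown here. As a sketch, assuming it concerns how summary functions treat missing values, the example below shows that mean() and sd() return NA when any value is missing unless na.rm = TRUE is supplied. It also shows one way to recode a placeholder such as 999 after loading. The vectors x and y are made up for illustration.

```r
x <- c(2, 3, NA, 5, 6)

# By default, a single missing value makes the result missing
mean(x)                  # NA
sd(x)                    # NA

# na.rm = TRUE drops missing values before computing
mean(x, na.rm = TRUE)    # 4
sd(x, na.rm = TRUE)

# Recode a placeholder value (here 999) as NA after the data are loaded
y <- c(12, 999, 7, 999)
y[y == 999] <- NA
sum(is.na(y))            # number of missing values: 2
```

When reading a file with read.csv(), the na.strings argument can do this recoding during loading instead.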
Vectorization
R is optimized to work with vectors, matrices, and arrays (matrices but in even more dimensions). This is done through vectorized functions which apply the same function to all the elements of a vector or array at once.
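To make this concrete, here is a small sketch comparing vectorized calls with an explicit loop. The vector x and the name out are arbitrary choices for this example.

```r
x <- c(1, 4, 9, 16)

# Vectorized functions and operators act on every element at once
sqrt(x)                  # 1 2 3 4
x * 2                    # 2 8 18 32
x + c(10, 20, 30, 40)    # element-wise sum: 11 24 39 56
x > 5                    # FALSE FALSE TRUE TRUE

# The loop below does the same work as sqrt(x), one element at a time
out <- numeric(length(x))
for (i in seq_along(x)) {
  out[i] <- sqrt(x[i])
}
identical(out, sqrt(x))  # TRUE
```

The vectorized form is shorter and is generally faster than the loop, because the iteration happens in compiled code rather than in R itself.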
Working with “strings”
Learning how to work with strings is extremely valuable and can save you a lot of manual labor. We will see some more of how useful it can be next week, but first, let’s cover some basics.
Double versus single quotes
Most of the time, you can use double or single quotes interchangeably.
str1 <- "This is a string."
str2 <- 'This is a string.'

# are they the same?
str1 == str2
[1] TRUE
However, if you want to include double quotes inside the string without escaping them, you need to wrap the string in single quotes (and vice versa).
str3 <- 'This is a "string".'
str3
[1] "This is a \"string\"."
Note that printing a string, as we did above with str3, is different from writing the string itself: printing shows the escaped form, with a backslash before each inner quote. To write the string itself, we want
writeLines(str3)
This is a "string".
Special characters and escapes
There are a handful of special characters, including things like \, ', and ". Other special characters you are likely to use are \n and \t, which are newline and tab characters, respectively. For example:
str4 <- "Using\nnewline\ncharacters"
writeLines(str4)
Using
newline
characters
To escape these special characters and use them literally in your string, precede them with \.
writeLines("\'Escape\'")
'Escape'
writeLines("\"Escape\"")
"Escape"
writeLines("Some LaTeX: $\\bar{x} = \\frac{1}{n}\\sum_{i=1}^n x_i$")
Some LaTeX: $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$
We can also combine strings with other R output. For example:
approx_pi <- round(pi, digits = 2)

writeLines(
paste0("I ate some ", approx_pi, ".\n")
)
I ate some 3.14.
# using paste() with a separator between the strings
writeLines(
paste("Lions", "Tigers", "and Bears", "oh my!", sep = ", ")
)
Lions, Tigers, and Bears, oh my!
Final thoughts
- Everything in R is an object with a specified class.
- Get used to working with dataframes and vectors. They will be your bread and butter for most analyses!