Discrete data types
One of the features of the R environment is the rich collection of data types that are available. Here, we briefly list some of the built-in data types that describe discrete data. The four data types discussed are the integer, logical, character, and factor data types. We also introduce the idea of a vector, which is the default data structure for any variable. A list of the commands discussed here is given in Table 2 and Table 3.
It should be noted that the default data type in R, for a number, is a double precision number. Strings can be interpreted in a variety of ways, usually as either a string or a factor. Always double-check that R is storing information in the format that you want, because the storage type affects how the data behaves in later operations.
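As a quick illustration of checking how a value is stored, the following sketch queries a few values. The variable names are hypothetical, and the L suffix for integer literals is standard R syntax that is not otherwise covered in this section.

```r
# A bare numeric literal is stored as a double, not as an integer.
x <- 42
typeof(x)        # "double"

# The L suffix requests an integer literal.
y <- 42L
typeof(y)        # "integer"

# The same text can be held as a character vector or as a factor.
s <- c("low", "high", "low")
typeof(s)        # "character"
f <- factor(s)
class(f)         # "factor"
```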
Integer
The first discrete data type examined is the integer type. Values are 32-bit integers. In most circumstances, a number must be explicitly cast as being an integer, as the default type in R is a double precision number. There are a variety of commands used to cast integers as well as allocate space for integers. The integer command takes a number for an argument and will return a vector of integers whose length is given by the argument:
> bubba <- integer(12)
> bubba
 [1] 0 0 0 0 0 0 0 0 0 0 0 0
> bubba[1]
[1] 0
> bubba[2]
[1] 0
> bubba[[4]]
[1] 0
> bubba[4] <- 15
> bubba
 [1]  0  0  0 15  0  0  0  0  0  0  0  0
In the preceding example, a vector of twelve integers was defined. The default values are zero, and the individual entries in the vector are accessed using square brackets. The first entry in the vector has index 1, so in this example, bubba[1] refers to the initial entry in the vector. Note that there are two ways to access an element in the vector: single versus double brackets. For a vector, the two methods are nearly the same, but when we explore the use of lists as opposed to vectors, the meaning will change. In short, the double brackets return objects of the same type as the elements within the vector, and the single brackets return values of the same type as the variable itself. For example, using single brackets on a list will return a list, while double brackets may return a vector.
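The difference between the two forms of indexing can be sketched with a small vector and a small list. The names v and items are hypothetical and are used only for illustration.

```r
# An atomic vector: both forms return a length-one integer vector.
v <- c(10L, 20L, 30L)
typeof(v[2])        # "integer"
typeof(v[[2]])      # "integer"

# A list: single brackets keep the list wrapper,
# while double brackets extract the element itself.
items <- list(1:3, "text")
typeof(items[1])    # "list"
typeof(items[[1]])  # "integer"
```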
A number can be cast as an integer using the as.integer command. A variable's type can be checked using the typeof command. The typeof command indicates how R stores the object and is different from the class command, which reports an attribute that you can change or query:
> as.integer(13.2)
[1] 13
> thisNumber <- as.integer(8/3)
> typeof(thisNumber)
[1] "integer"
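The following sketch illustrates two points from the text: as.integer discards the fractional part rather than rounding, and the class attribute can be changed without affecting what typeof reports. The class name myLabel is a made-up value chosen for the example.

```r
# as.integer truncates toward zero; it does not round.
as.integer(2.9)     # 2
as.integer(-2.9)    # -2

# typeof reports the storage type; class is an attribute that can be reassigned.
m <- 1:3
typeof(m)               # "integer"
class(m)                # "integer"
class(m) <- "myLabel"   # hypothetical class name
typeof(m)               # still "integer"
class(m)                # "myLabel"
```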
Note that a sequence of numbers can be automatically created using either the : operator or the seq command:
> 1:5
[1] 1 2 3 4 5
> myNum <- as.integer(1:5)
> myNum[1]
[1] 1
> myNum[3]
[1] 3
> seq(4,11,by=2)
[1]  4  6  8 10
> otherNums <- seq(4,11,by=2)
> otherNums[3]
[1] 8
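The two ways of building a sequence do not necessarily produce the same storage type. The sketch below, using the hypothetical names evens and evensInt, checks the type of each result and casts where integer storage is wanted.

```r
# The colon operator produces integers when its endpoints are whole numbers.
typeof(1:5)                      # "integer"

# seq with double arguments produces doubles, even when every value is whole.
evens <- seq(4, 11, by = 2)
typeof(evens)                    # "double"

# An explicit cast converts the result when integer storage is required.
evensInt <- as.integer(evens)
typeof(evensInt)                 # "integer"
```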
A common task is to determine whether or not a variable is of a certain type. For integers, the is.integer command is used to determine whether or not a variable has an integer type:
> a <- 1.2
> typeof(a)
[1] "double"
> is.integer(a)
[1] FALSE
> a <- as.integer(1.2)
> typeof(a)
[1] "integer"
> is.integer(a)
[1] TRUE
Logical
Logical data consists of variables that are either true or false. The words TRUE and FALSE are used to designate the two possible values of a logical variable. (The TRUE value can also be abbreviated to T, and the FALSE value can be abbreviated to F.) The basic commands associated with logical variables are similar to the commands for integers discussed in the previous subsection. The logical command is used to allocate a vector of Boolean values. In the following example, a logical vector of length 10 is created. The default value is FALSE, and the Boolean not operator is used to flip the values to evaluate to TRUE:
> b <- logical(10)
> b
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> b[3]
[1] FALSE
> !b
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> !b[5]
[1] TRUE
> typeof(b)
[1] "logical"
> mode(b)
[1] "logical"
> storage.mode(b)
[1] "logical"
> b[3] <- TRUE
> b
 [1] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
To cast a value to a logical type, you can use the as.logical command. Note that zero is mapped to a value of FALSE and other numbers are mapped to a value of TRUE:
> a <- -1:1
> a
[1] -1  0  1
> as.logical(a)
[1]  TRUE FALSE  TRUE
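The mapping also works in the other direction, which is what makes it possible to count TRUE values with sum. This is a minimal sketch with hypothetical variable names.

```r
# Any nonzero number becomes TRUE; zero becomes FALSE.
vals <- c(-2.5, 0, 3)
flags <- as.logical(vals)    # TRUE FALSE TRUE

# Casting back gives 1 for TRUE and 0 for FALSE.
as.integer(flags)            # 1 0 1

# Summing a logical vector counts the TRUE entries.
sum(flags)                   # 2
```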
To determine whether or not a value has a logical type, you use the is.logical command:
> b <- logical(4)
> b
[1] FALSE FALSE FALSE FALSE
> is.logical(b)
[1] TRUE
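The standard operators for logical operations are available, and a list of some of the more common operations is given in Table 1. Note that there is a difference between operations such as & and &&. A single & is used to perform an and operation on each pairwise element of two vectors, while the double && returns a single logical result using only the first elements of the vectors. (This describes versions of R before 4.3.0. From R 4.3.0 onward, passing a vector of length greater than one to && or || is an error, so the double forms should be applied to single values only.) The following output was produced with an earlier version of R: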
> l1 <- c(TRUE,FALSE)
> l2 <- c(TRUE,TRUE)
> l1&l2
[1]  TRUE FALSE
> l1&&l2
[1] TRUE
> l1|l2
[1] TRUE TRUE
> l1||l2
[1] TRUE
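Because recent versions of R reject vectors longer than one in && and ||, a version-independent way to combine whole vectors into a single result is to use the element-wise operators together with any or all. The sketch below uses hypothetical vectors p and q and restricts && to single values.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

# Element-wise operators work across whole vectors.
p & q          # TRUE FALSE FALSE
p | q          # TRUE TRUE TRUE

# The double forms are meant for single values, such as in an if condition.
if (p[1] && q[1]) {
  both <- "first elements are both TRUE"
}

# any and all reduce a logical vector to one value.
any(p & q)     # TRUE
all(p | q)     # TRUE
```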
Tip
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. An additional source for the examples in this book can be found at https://github.com/KellyBlack/R-Object-Oriented-Programming. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
The following table shows various logical operators and their description:
Table 1 – list of operators for logical variables
Character
One common way to store information is to save data as characters or strings. Character data is defined using either single or double quotes:
> a <- "hello"
> a
[1] "hello"
> b <- 'there'
> b
[1] "there"
> typeof(a)
[1] "character"
The character command can be used to allocate a vector of character-valued strings, as follows:
> many <- character(3)
> many
[1] "" "" ""
> many[2] <- "this is the second"
> many[3] <- 'yo, third!'
> many[1] <- "and the first"
> many
[1] "and the first"      "this is the second" "yo, third!"
A value can be cast as a character using the as.character command, as follows:
> a <- 3.0
> a
[1] 3
> b <- as.character(a)
> b
[1] "3"
Finally, the is.character command takes a single argument, and it returns a value of TRUE if the argument is a string:
> a <- as.character(4.5)
> a
[1] "4.5"
> is.character(a)
[1] TRUE
Factors
Another common way to record data is to provide a discrete set of levels. For example, the results of an individual trial in an experiment may be denoted by a value of a, b, or c. Categorical data of this kind is referred to as a factor in R. The commands and ideas are roughly parallel to the data types described previously. There are some subtle differences with factors, though. Factors are used to designate different levels and can be considered ordered or unordered. There are a large number of options, and it is wise to consult the help pages for factors using the help(factor) command. One thing to note, though, is that the typeof command for a factor will return "integer", because the levels are stored internally as integer codes.
Factors can be defined using the factor command, as follows:
> lev <- factor(x=c("one","two","three","one"))
> lev
[1] one   two   three one
Levels: one three two
> levels(lev)
[1] "one"   "three" "two"
> sort(lev)
[1] one   one   three two
Levels: one three two
> lev <- factor(x=c("one","two","three","one"),levels=c("one","two","three"))
> lev
[1] one   two   three one
Levels: one two three
> levels(lev)
[1] "one"   "two"   "three"
> sort(lev)
[1] one   one   two   three
Levels: one two three
The techniques used to cast a variable to a factor or test whether a variable is a factor are similar to the previous examples. A variable can be cast as a factor using the as.factor command. Also, the is.factor command can be used to determine whether or not a variable has a type of factor.
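The points made about factors can be drawn together in a short sketch. The variables sizes, answers, and ranked are hypothetical, and the ordered argument of factor is one of the options described on the help page.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

typeof(sizes)        # "integer"
class(sizes)         # "factor"
as.integer(sizes)    # 1 3 2 1, the positions within levels(sizes)
is.factor(sizes)     # TRUE

# as.factor converts an existing vector; levels default to sorted order.
answers <- as.factor(c("b", "a", "c", "a"))
levels(answers)      # "a" "b" "c"

# ordered = TRUE makes comparisons between levels meaningful.
ranked <- factor(c("small", "large"),
                 levels = c("small", "medium", "large"),
                 ordered = TRUE)
ranked[1] < ranked[2]   # TRUE
```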
Continuous data types
The data types for continuous data are given here. The two types discussed are the double and complex data types. A list of the commands discussed here is given in Table 2 and Table 3.
Double
The default numeric data type in R is a double precision number. The commands are similar to those of the integer data type discussed previously. The double command can be used to allocate a vector of double precision numbers, and the numbers within the vector are accessed using square brackets:
> d <- double(8)
> d
[1] 0 0 0 0 0 0 0 0
> typeof(d)
[1] "double"
> d[3] <- 17
> d
[1]  0  0 17  0  0  0  0  0
The techniques used to cast a variable to a double precision number and test whether a variable is a double precision number are similar to the examples seen previously. A variable can be cast as a double precision number using the as.double command. Also, to determine whether a variable is a double precision number, the is.double command can be used.
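A minimal sketch of casting and testing for double precision numbers follows. The variable names count and ratio are hypothetical.

```r
# An integer value is not a double until it is cast.
count <- 7L
is.double(count)          # FALSE

ratio <- as.double(count)
is.double(ratio)          # TRUE

# Strings that look like numbers can also be cast.
as.double("2.5")          # 2.5
```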
Complex
Arithmetic for complex numbers is supported in R, and most math functions will react properly when given a complex number. You can append i to the end of a number to force it to be the imaginary part of a complex number, as follows:
> 1i
[1] 0+1i
> 1i*1i
[1] -1+0i
> z <- 3+2i
> z
[1] 3+2i
> z*z
[1] 5+12i
> Mod(z)
[1] 3.605551
> Re(z)
[1] 3
> Im(z)
[1] 2
> Arg(z)
[1] 0.5880026
> Conj(z)
[1] 3-2i
The complex command can also be used to define a vector of complex numbers. There are a number of options for the complex command, so a quick check of the help page, help(complex), is recommended:
> z <- complex(3)
> z
[1] 0+0i 0+0i 0+0i
> typeof(z)
[1] "complex"
> z <- complex(real=c(1,2),imag=c(3,4))
> z
[1] 1+3i 2+4i
> Re(z)
[1] 1 2
The techniques to cast a variable to a complex number and to test whether or not a variable is a complex number are similar to the methods seen previously. A variable can be cast as complex using the as.complex command. Also, to test whether or not a variable is a complex number, the is.complex command can be used.
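One situation where the cast matters is taking the square root of a negative number, since the result only exists as a complex value. The sketch below uses the hypothetical name w.

```r
# A plain negative number is a double, not a complex value.
is.complex(-4)     # FALSE

# After the cast, complex arithmetic applies.
w <- as.complex(-4)
is.complex(w)      # TRUE
sqrt(w)            # 0+2i
```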
Special data types
Two other special values occur frequently and are important: NA and NULL. We will discuss both of them and provide a note about objects. These are brief comments, as these are recurring topics that we will revisit many times.
The first is the constant NA, which is used to indicate a missing value. A variable can be tested for missing values using the is.na command, as follows:
> n <- c(NA,2,3,NA,5)
> n
[1] NA  2  3 NA  5
> is.na(n)
[1]  TRUE FALSE FALSE  TRUE FALSE
> n[!is.na(n)]
[1] 2 3 5
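Missing values propagate through most calculations, which is why testing with is.na matters. The following sketch reuses the vector n from the preceding example; the na.rm argument of mean is a standard option that is not otherwise discussed here.

```r
n <- c(NA, 2, 3, NA, 5)

# Count the missing entries.
sum(is.na(n))            # 2

# A single NA makes the ordinary mean missing as well.
mean(n)                  # NA

# na.rm = TRUE drops the missing entries before averaging.
mean(n, na.rm = TRUE)    # 3.333333

# Comparing with == does not detect NA; the result is itself NA.
NA == NA                 # NA
```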
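Another special type is the NULL type. It plays a role similar to that of the NULL pointer in the C language. It does not hold a value in the way that the other data types do. Instead, it represents the absence of an object and is commonly used to check whether or not something has been defined: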
> a <- NULL
> typeof(a)
[1] "NULL"
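The contrast between NULL and NA can be sketched as follows. The is.null command is the standard test for NULL, and the variable a is the one from the preceding example.

```r
a <- NULL

# is.null is the test for NULL, in the same way that is.na tests for NA.
is.null(a)       # TRUE

# NULL has no elements at all.
length(a)        # 0

# Combining with NULL leaves a vector unchanged.
c(1, NULL, 2)    # 1 2

# NA, by contrast, is a value that occupies a position.
length(NA)       # 1
is.null(NA)      # FALSE
```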
Finally, we'll quickly explore the term objects. The variables that we defined in all of the preceding examples are treated as objects within the R environment. When we start writing functions and creating classes, it will be important to realize that they are treated like variables. The names used to assign variables are just a shortcut for R to determine where an object is located.
For example, the complex command is used to allocate a vector of complex values. The command is defined to be a set of instructions, and there is an object called complex that points to those instructions:
> complex
function (length.out = 0L, real = numeric(), imaginary = numeric(),
    modulus = 1, argument = 0)
{
    if (missing(modulus) && missing(argument)) {
        .Internal(complex(length.out, real, imaginary))
    }
    else {
        n <- max(length.out, length(argument), length(modulus))
        rep_len(modulus, n) * exp((0+1i) * rep_len(argument, n))
    }
}
<bytecode: 0x2489c80>
<environment: namespace:base>
There is a difference between calling the complex() function and referring to the set of instructions located at complex.
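The distinction can be sketched by treating the function as an ordinary object. The alias makeComplex is a hypothetical name introduced only for this illustration.

```r
# Without parentheses, the name refers to the function object itself.
is.function(complex)              # TRUE

# Like any other object, it can be assigned to a second name.
makeComplex <- complex            # hypothetical alias
identical(makeComplex, complex)   # TRUE

# With parentheses, the instructions are executed.
z <- makeComplex(real = 1, imaginary = -1)
z                                 # 1-1i
```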
Notes on the as and is functions
Two common tasks are to determine whether a variable is of a given type and to cast a variable to different types. The commands to determine whether a variable is of a given type generally start with the is prefix, and the commands to cast a variable to a different type generally start with the as prefix. The list of commands to determine whether a variable is of a given type is given in the following table:
Table 2 – commands to determine whether a variable is of a particular type
The commands used to cast a variable to a different type are given in Table 3. These commands take a single argument and return a variable of the given type. For example, the as.character command can be used to convert a number to a string.
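The two families of commands are often used together: test the type first, then cast if necessary. The following is a minimal sketch with hypothetical variable names. It also shows that a cast that cannot succeed yields NA along with a warning, which is suppressed here to keep the output clean.

```r
value <- "12"

# Test the current type.
is.character(value)    # TRUE
is.numeric(value)      # FALSE

# Cast to a number and test again.
number <- as.numeric(value)
is.numeric(number)     # TRUE
number + 1             # 13

# A string that is not a number cannot be converted; the result is NA.
bad <- suppressWarnings(as.numeric("twelve"))
is.na(bad)             # TRUE
```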
The commands in the previous table are used to test what type a variable has. The following table provides the commands that are used to change a variable of one type to another type:
Table 3 – commands to cast a variable into a particular type