# to read files faster
library(readr)

# to munge data
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# to make dplyr even better
library(magrittr)

# to tidy data
library(tidyr)
## 
## Attaching package: 'tidyr'
## 
## The following object is masked from 'package:magrittr':
## 
##     extract
# to plot
library(ggplot2)

# to make ggplot look even better
library(ggthemes)

# to get even more out of ggplot2
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:ggplot2':
## 
##     ggsave

R overview

R is a language for statistical computing and graphics. In contrast to MATLAB or Mathematica, it is free and open source, so an ecosystem of diverse packages is growing rapidly. Some packages make working with R itself more pleasant (e.g. dplyr and tidyr), some improve R’s plotting capabilities (e.g. ggplot2), and others address the specific needs of countless fields such as finance, astronomy, linguistics, …, and of course biology. Many biological packages, especially those that deal with high-throughput genomic data, are associated with Bioconductor (https://www.bioconductor.org/). People working with structures should check out Bio3D (http://thegrantlab.org/bio3d/index.php).

Some very useful commands before we get started

Before we get started, I would like to introduce some basic navigation commands:

  • getwd() will show you your working directory
  • setwd("some/path/to/the/directory") allows you to change the working directory
  • dir() lists the contents of the working directory.

Three more commands that come in very handy are:

  • ?some_command: This yields help pages for a command
  • ??some_term: Searches all installed help pages for a term, which is useful when you do not know the exact name of a command or package
  • str(some_object): Displays the internal structure of an R object
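
A short session that strings these commands together might look like the sketch below. The file name pattern is only an illustration, and the help commands are commented out because they open the help viewer.

```r
# Where is R currently looking for files?
wd <- getwd()
wd

# Which CSV files are in the working directory?
# (pattern is a regular expression; it is an assumption made for this example)
csv_files <- dir(pattern = "\\.csv$")
csv_files

# Inspect the structure of any object
str(csv_files)

# Help pages (commented out because they open the help viewer)
# ?read.csv    # help page for one function
# ??"csv"      # search all installed help pages for a term
```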

A very brief and incomplete overview of data structures in R

Vectors

The basic data structure in R is the vector. Vectors come in two forms: atomic vectors and lists. The difference is that an atomic vector contains elements of one type only, whereas a list can mix types.

atomic_vector_1 <- c(0, 1, 10.1, 10**3)
atomic_vector_1
## [1]    0.0    1.0   10.1 1000.0
atomic_vector_2 <- c("hello", "world")
atomic_vector_2
## [1] "hello" "world"
list_1 <- list("a bit of this", "...", 2+2, atomic_vector_1, atomic_vector_2)
list_1
## [[1]]
## [1] "a bit of this"
## 
## [[2]]
## [1] "..."
## 
## [[3]]
## [1] 4
## 
## [[4]]
## [1]    0.0    1.0   10.1 1000.0
## 
## [[5]]
## [1] "hello" "world"
list_2 <- list(letters[1:8], 1:8, 42, "Life", "The Universe", "And Everything")
list_2
## [[1]]
## [1] "a" "b" "c" "d" "e" "f" "g" "h"
## 
## [[2]]
## [1] 1 2 3 4 5 6 7 8
## 
## [[3]]
## [1] 42
## 
## [[4]]
## [1] "Life"
## 
## [[5]]
## [1] "The Universe"
## 
## [[6]]
## [1] "And Everything"

Atomic vectors are created using c() and lists are created using list(). When different types of data are mixed in an atomic vector, they are coerced to a common type:

atomic_vector_3 <- c(42, "Life", "The Universe", "And Everything")
atomic_vector_3
## [1] "42"             "Life"           "The Universe"   "And Everything"

In the example above 42 will be treated as a string.
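
The same rule can be checked for other combinations of types. The following is a minimal sketch with made-up values: c() always settles on the most flexible type present, whereas list() leaves every element as it is.

```r
# c() coerces all elements to the most flexible type present
typeof(c(TRUE, 2L))      # "integer"
typeof(c(2L, 2.5))       # "double"
typeof(c(2.5, "Life"))   # "character"

# a list keeps every element's own type
mixed <- list(42, "Life", TRUE)
types <- sapply(mixed, typeof)
types                    # "double" "character" "logical"
```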

Vectors have three properties:

  • type
  • length
  • attributes

typeof(list_1)
## [1] "list"
length(list_1)
## [1] 5

Attributes are used to store metadata:

attr(atomic_vector_3, "my_attribute") <- "Hitchhiker's guide"
str(atomic_vector_3)
##  atomic [1:4] 42 Life The Universe And Everything
##  - attr(*, "my_attribute")= chr "Hitchhiker's guide"
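
Attributes can be read back individually or all at once. The sketch below uses a made-up vector and a hypothetical "unit" attribute. It also shows a common pitfall: subsetting returns a new vector that no longer carries the custom attribute.

```r
impact_force <- c(1205, 2527, 1745)   # illustrative values
attr(impact_force, "unit") <- "mN"    # hypothetical attribute name

# read a single attribute, or all of them at once
attr(impact_force, "unit")            # "mN"
attributes(impact_force)

# subsetting creates a new vector without the custom attribute
first_two <- impact_force[1:2]
attr(first_two, "unit")               # NULL
```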

A bit more about coercion

logical_vector <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

# checking the type
typeof(logical_vector)
## [1] "logical"
str(logical_vector)
##  logi [1:5] TRUE FALSE FALSE TRUE TRUE
# it's also possible to verify the type by:
is.logical(logical_vector)
## [1] TRUE
# the type can also be changed 
x <- as.numeric(logical_vector)
str(x)
##  num [1:5] 1 0 0 1 1

Note that coercion from less to more flexible types (logical < integer < double < character) usually works. Coercion in the other direction may produce NA values (with a warning) or other undesired results.

For example:

y <- c("six", "seven")
as.numeric(y)
## Warning: NAs introduced by coercion
## [1] NA NA

Converting a logical vector to numeric turns TRUE into 1 and FALSE into 0. This allows for quick analysis of logical data:

x
## [1] 1 0 0 1 1
# counting all TRUEs
sum(x)
## [1] 3
# proportion of TRUEs
mean(x)
## [1] 0.6

It’s important to remember that many operations will coerce the type automatically.
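
A typical case of automatic coercion is counting how many values meet a condition. In this sketch the numbers are made up for illustration; the comparison yields a logical vector, and sum() and mean() coerce it to 0/1 without an explicit as.numeric().

```r
impact_force <- c(1205, 2527, 1745, 1556, 493, 2276)  # illustrative values

# the comparison returns a logical vector ...
strong <- impact_force > 1500
strong                        # FALSE TRUE TRUE TRUE FALSE TRUE

# ... which sum() and mean() silently coerce to 0/1
n_strong <- sum(strong)       # 4
share_strong <- mean(strong)  # 4 out of 6

# arithmetic coerces as well
TRUE + TRUE                   # 2
```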

Dataframes

As in pandas, the main workhorse is the dataframe. (For this tutorial we ignore matrices, arrays, and factors.) A dataframe is a list of equal-length vectors. Let’s have a look at the frog tongue adhesion data to illustrate this.

# a first attempt fails to parse because the file starts with comment lines
whole_frog_file <- read_csv("frog_tongue_adhesion.csv")
## Warning: 93 parsing failures.
## row col  expected    actual
##   1  -- 2 columns 5 columns
##   2  -- 2 columns 1 columns
##   3  -- 2 columns 1 columns
##   4  -- 2 columns 1 columns
##   5  -- 2 columns 1 columns
## ... ... ......... .........
## .See problems(...) for more details.
# read in the csv file and skip the comments
frog <- read_csv("frog_tongue_adhesion.csv", skip = 14)
head(frog)
## Source: local data frame [6 x 15]
## 
##         date    ID trial number impact force (mN) impact time (ms)
##       (date) (chr)        (int)             (int)            (int)
## 1 2013-02-26     I            3              1205               46
## 2 2013-02-26     I            4              2527               44
## 3 2013-03-01     I            1              1745               34
## 4 2013-03-01     I            2              1556               41
## 5 2013-03-01     I            3               493               36
## 6 2013-03-01     I            4              2276               31
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
##   (int), time frog pulls on target (ms) (int), adhesive force / body
##   weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
##   (int), contact area without mucus (mm2) (int), contact area with mucus /
##   contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
##   strength (Pa) (int)

Let’s have a look at some of the dataframe’s properties:

str(frog)
## Classes 'tbl_df', 'tbl' and 'data.frame':    80 obs. of  15 variables:
##  $ date                                                : Date, format: "2013-02-26" "2013-02-26" ...
##  $ ID                                                  : chr  "I" "I" "I" "I" ...
##  $ trial number                                        : int  3 4 1 2 3 4 1 2 3 4 ...
##  $ impact force (mN)                                   : int  1205 2527 1745 1556 493 2276 556 1928 2641 1897 ...
##  $ impact time (ms)                                    : int  46 44 34 41 36 31 43 46 50 41 ...
##  $ impact force / body weight                          : num  1.95 4.08 2.82 2.51 0.8 3.68 0.9 3.11 4.27 3.06 ...
##  $ adhesive force (mN)                                 : int  -785 -983 -850 -455 -974 -592 -512 -804 -690 -462 ...
##  $ time frog pulls on target (ms)                      : int  884 248 211 1025 499 969 835 508 491 839 ...
##  $ adhesive force / body weight                        : num  1.27 1.59 1.37 0.74 1.57 0.96 0.83 1.3 1.12 0.75 ...
##  $ adhesive impulse (N-s)                              : num  -0.29 -0.181 -0.157 -0.17 -0.423 -0.176 -0.285 -0.285 -0.239 -0.328 ...
##  $ total contact area (mm2)                            : int  387 101 83 330 245 341 359 246 269 266 ...
##  $ contact area without mucus (mm2)                    : int  70 94 79 158 216 106 110 178 224 176 ...
##  $ contact area with mucus / contact area without mucus: num  0.82 0.07 0.05 0.52 0.12 0.69 0.69 0.28 0.17 0.34 ...
##  $ contact pressure (Pa)                               : int  3117 24923 21020 4718 2012 6676 1550 7832 9824 7122 ...
##  $ adhesive strength (Pa)                              : int  -2030 -9695 -10239 -1381 -3975 -1737 -1427 -3266 -2568 -1733 ...
# column and row names
colnames(frog)
##  [1] "date"                                                
##  [2] "ID"                                                  
##  [3] "trial number"                                        
##  [4] "impact force (mN)"                                   
##  [5] "impact time (ms)"                                    
##  [6] "impact force / body weight"                          
##  [7] "adhesive force (mN)"                                 
##  [8] "time frog pulls on target (ms)"                      
##  [9] "adhesive force / body weight"                        
## [10] "adhesive impulse (N-s)"                              
## [11] "total contact area (mm2)"                            
## [12] "contact area without mucus (mm2)"                    
## [13] "contact area with mucus / contact area without mucus"
## [14] "contact pressure (Pa)"                               
## [15] "adhesive strength (Pa)"
# or
names(frog)
##  [1] "date"                                                
##  [2] "ID"                                                  
##  [3] "trial number"                                        
##  [4] "impact force (mN)"                                   
##  [5] "impact time (ms)"                                    
##  [6] "impact force / body weight"                          
##  [7] "adhesive force (mN)"                                 
##  [8] "time frog pulls on target (ms)"                      
##  [9] "adhesive force / body weight"                        
## [10] "adhesive impulse (N-s)"                              
## [11] "total contact area (mm2)"                            
## [12] "contact area without mucus (mm2)"                    
## [13] "contact area with mucus / contact area without mucus"
## [14] "contact pressure (Pa)"                               
## [15] "adhesive strength (Pa)"
# and
rownames(frog)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14"
## [15] "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28"
## [29] "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42"
## [43] "43" "44" "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56"
## [57] "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69" "70"
## [71] "71" "72" "73" "74" "75" "76" "77" "78" "79" "80"
# Since there are no row names the row numbers are displayed. Note, the first 
# line is 1, not 0!

# number of columns
length(frog)
## [1] 15
#or
ncol(frog)
## [1] 15
# number of rows
nrow(frog)
## [1] 80
# summary of data
summary(frog)
##       date                 ID             trial number impact force (mN)
##  Min.   :2013-02-26   Length:80          Min.   :1.0   Min.   :  22.0   
##  1st Qu.:2013-03-18   Class :character   1st Qu.:1.0   1st Qu.: 456.0   
##  Median :2013-05-04   Mode  :character   Median :2.0   Median : 601.0   
##  Mean   :2013-04-30                      Mean   :2.4   Mean   : 801.7   
##  3rd Qu.:2013-06-15                      3rd Qu.:3.0   3rd Qu.:1005.0   
##  Max.   :2013-06-26                      Max.   :5.0   Max.   :2641.0   
##  impact time (ms) impact force / body weight adhesive force (mN)
##  Min.   :  6.00   Min.   :0.170              Min.   :-983.0     
##  1st Qu.: 29.75   1st Qu.:1.470              1st Qu.:-567.8     
##  Median : 34.00   Median :3.030              Median :-335.0     
##  Mean   : 39.06   Mean   :2.920              Mean   :-397.8     
##  3rd Qu.: 42.00   3rd Qu.:4.277              3rd Qu.:-224.5     
##  Max.   :143.00   Max.   :6.490              Max.   : -92.0     
##  time frog pulls on target (ms) adhesive force / body weight
##  Min.   : 189.0                 Min.   :0.220               
##  1st Qu.: 682.2                 1st Qu.:0.990               
##  Median : 927.0                 Median :1.320               
##  Mean   :1132.5                 Mean   :1.445               
##  3rd Qu.:1381.2                 3rd Qu.:1.772               
##  Max.   :4251.0                 Max.   :3.400               
##  adhesive impulse (N-s) total contact area (mm2)
##  Min.   :-0.76800       Min.   : 19.0           
##  1st Qu.:-0.27725       1st Qu.:104.8           
##  Median :-0.16500       Median :134.5           
##  Mean   :-0.18746       Mean   :166.5           
##  3rd Qu.:-0.08125       3rd Qu.:238.2           
##  Max.   :-0.00100       Max.   :455.0           
##  contact area without mucus (mm2)
##  Min.   :  0.00                  
##  1st Qu.: 16.75                  
##  Median : 43.00                  
##  Mean   : 61.40                  
##  3rd Qu.: 92.50                  
##  Max.   :260.00                  
##  contact area with mucus / contact area without mucus
##  Min.   :0.010                                       
##  1st Qu.:0.280                                       
##  Median :0.665                                       
##  Mean   :0.569                                       
##  3rd Qu.:0.885                                       
##  Max.   :1.000                                       
##  contact pressure (Pa) adhesive strength (Pa)
##  Min.   :  397         Min.   :-17652        
##  1st Qu.: 2579         1st Qu.: -3443        
##  Median : 4678         Median : -2186        
##  Mean   : 6073         Mean   : -3006        
##  3rd Qu.: 7250         3rd Qu.: -1736        
##  Max.   :28641         Max.   :  -678
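
The statement that a dataframe is a list of equal-length vectors can be checked on a small hand-made example. This is a sketch in base R; mini_frog is a hypothetical name, and the values are only for illustration.

```r
# a small hand-made data frame
mini_frog <- data.frame(
  ID = c("I", "II", "III"),
  svl = c(63, 70, 28),
  stringsAsFactors = FALSE
)

is.list(mini_frog)          # TRUE
typeof(mini_frog)           # "list"
length(mini_frog)           # 2, the number of columns

# columns can be extracted like list elements
mini_frog[["svl"]]          # 63 70 28
mini_frog$ID                # "I" "II" "III"

# vectors whose lengths do not fit together are rejected
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```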

tidyr, dplyr, and magrittr

Much of the data at our disposal needs to be prepared before it is useful. The packages tidyr, dplyr, and magrittr help us transform data into a usable shape. Performance-critical parts of dplyr are implemented in C++, so many operations are faster than their base R equivalents. The pipe operator (%>%, see below for examples) comes from magrittr and is re-exported by dplyr; it makes code surprisingly easy to read.
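
To see what the pipe does, it helps to compare a nested call with its piped equivalent. This is a minimal sketch with made-up numbers: x %>% f(y) is evaluated as f(x, y), so a chain reads from left to right instead of from the inside out.

```r
library(magrittr)

values <- c(4, 9, 16)

# nested calls are read from the inside out
nested <- round(mean(sqrt(values)), 1)

# the pipe passes the left-hand side as the first argument of the next call
piped <- values %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)   # TRUE
```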

Let’s have another look at the frog data. The first column holds the date in the format Year-Month-Day. Suppose we want to compare the results from different months. It would then be convenient to have three date columns (year, month, and day) instead:

frog %>% separate(date, c("Year", "Month", "Day"), sep = "-", remove = TRUE)
## Source: local data frame [80 x 17]
## 
##     Year Month   Day    ID trial number impact force (mN) impact time (ms)
##    (chr) (chr) (chr) (chr)        (int)             (int)            (int)
## 1   2013    02    26     I            3              1205               46
## 2   2013    02    26     I            4              2527               44
## 3   2013    03    01     I            1              1745               34
## 4   2013    03    01     I            2              1556               41
## 5   2013    03    01     I            3               493               36
## 6   2013    03    01     I            4              2276               31
## 7   2013    03    05     I            1               556               43
## 8   2013    03    05     I            2              1928               46
## 9   2013    03    05     I            3              2641               50
## 10  2013    03    05     I            4              1897               41
## ..   ...   ...   ...   ...          ...               ...              ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
##   (int), time frog pulls on target (ms) (int), adhesive force / body
##   weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
##   (int), contact area without mucus (mm2) (int), contact area with mucus /
##   contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
##   strength (Pa) (int)
frog
## Source: local data frame [80 x 15]
## 
##          date    ID trial number impact force (mN) impact time (ms)
##        (date) (chr)        (int)             (int)            (int)
## 1  2013-02-26     I            3              1205               46
## 2  2013-02-26     I            4              2527               44
## 3  2013-03-01     I            1              1745               34
## 4  2013-03-01     I            2              1556               41
## 5  2013-03-01     I            3               493               36
## 6  2013-03-01     I            4              2276               31
## 7  2013-03-05     I            1               556               43
## 8  2013-03-05     I            2              1928               46
## 9  2013-03-05     I            3              2641               50
## 10 2013-03-05     I            4              1897               41
## ..        ...   ...          ...               ...              ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
##   (int), time frog pulls on target (ms) (int), adhesive force / body
##   weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
##   (int), contact area without mucus (mm2) (int), contact area with mucus /
##   contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
##   strength (Pa) (int)

magrittr also provides the %<>% operator, which pipes the left-hand side into the expression and assigns the result back to it, overwriting the input:

frog %<>% separate(date, c("Year", "Month", "Day"), sep = "-", remove = TRUE)
frog
## Source: local data frame [80 x 17]
## 
##     Year Month   Day    ID trial number impact force (mN) impact time (ms)
##    (chr) (chr) (chr) (chr)        (int)             (int)            (int)
## 1   2013    02    26     I            3              1205               46
## 2   2013    02    26     I            4              2527               44
## 3   2013    03    01     I            1              1745               34
## 4   2013    03    01     I            2              1556               41
## 5   2013    03    01     I            3               493               36
## 6   2013    03    01     I            4              2276               31
## 7   2013    03    05     I            1               556               43
## 8   2013    03    05     I            2              1928               46
## 9   2013    03    05     I            3              2641               50
## 10  2013    03    05     I            4              1897               41
## ..   ...   ...   ...   ...          ...               ...              ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
##   (int), time frog pulls on target (ms) (int), adhesive force / body
##   weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
##   (int), contact area without mucus (mm2) (int), contact area with mucus /
##   contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
##   strength (Pa) (int)
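
The difference between %>% and %<>% is easiest to see on a tiny vector. A minimal sketch with made-up values:

```r
library(magrittr)

x <- c(3, 1, 2)

# %>% returns a new value and leaves x untouched
sorted <- x %>% sort()
x                      # 3 1 2

# %<>% pipes x into sort() and assigns the result back to x
x %<>% sort()
x                      # 1 2 3
```

Because %<>% overwrites its input, re-running the same line changes the object again. The separate() call above, for instance, can only be run once on frog, since the date column no longer exists afterwards.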

tidyr also allows the user to fuse columns:

frog %>% unite(date, Year, Month, Day, sep = "_" )
## Source: local data frame [80 x 15]
## 
##          date    ID trial number impact force (mN) impact time (ms)
##         (chr) (chr)        (int)             (int)            (int)
## 1  2013_02_26     I            3              1205               46
## 2  2013_02_26     I            4              2527               44
## 3  2013_03_01     I            1              1745               34
## 4  2013_03_01     I            2              1556               41
## 5  2013_03_01     I            3               493               36
## 6  2013_03_01     I            4              2276               31
## 7  2013_03_05     I            1               556               43
## 8  2013_03_05     I            2              1928               46
## 9  2013_03_05     I            3              2641               50
## 10 2013_03_05     I            4              1897               41
## ..        ...   ...          ...               ...              ...
## Variables not shown: impact force / body weight (dbl), adhesive force (mN)
##   (int), time frog pulls on target (ms) (int), adhesive force / body
##   weight (dbl), adhesive impulse (N-s) (dbl), total contact area (mm2)
##   (int), contact area without mucus (mm2) (int), contact area with mucus /
##   contact area without mucus (dbl), contact pressure (Pa) (int), adhesive
##   strength (Pa) (int)
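
separate() and unite() are inverse operations, which can be verified on a two-row toy dataframe. This is a sketch with hypothetical object names; the convert = TRUE argument is an optional extra that turns the new columns into numbers instead of leaving them as strings.

```r
library(tidyr)

dates <- data.frame(date = c("2013-02-26", "2013-03-01"),
                    stringsAsFactors = FALSE)

# split one column into three character columns
parts <- separate(dates, date, c("Year", "Month", "Day"), sep = "-")
parts$Month                # "02" "03"

# convert = TRUE additionally converts the pieces to numbers
parts_num <- separate(dates, date, c("Year", "Month", "Day"),
                      sep = "-", convert = TRUE)
parts_num$Month            # 2 3

# unite() glues the pieces back together
back <- unite(parts, date, Year, Month, Day, sep = "-")
identical(back$date, dates$date)   # TRUE
```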

The spaces, parentheses, and slashes in the column names are not very helpful. Let’s get rid of them before we move on:

# regular expressions exist in R too!
temp_names <- gsub(" |/|\\(", "_", colnames(frog))
temp_names <- gsub("_+", "_", temp_names)
temp_names <- gsub("\\)", "", temp_names)

colnames(frog) <- temp_names
frog %>% head
## Source: local data frame [6 x 17]
## 
##    Year Month   Day    ID trial_number impact_force_mN impact_time_ms
##   (chr) (chr) (chr) (chr)        (int)           (int)          (int)
## 1  2013    02    26     I            3            1205             46
## 2  2013    02    26     I            4            2527             44
## 3  2013    03    01     I            1            1745             34
## 4  2013    03    01     I            2            1556             41
## 5  2013    03    01     I            3             493             36
## 6  2013    03    01     I            4            2276             31
## Variables not shown: impact_force_body_weight (dbl), adhesive_force_mN
##   (int), time_frog_pulls_on_target_ms (int), adhesive_force_body_weight
##   (dbl), adhesive_impulse_N-s (dbl), total_contact_area_mm2 (int),
##   contact_area_without_mucus_mm2 (int),
##   contact_area_with_mucus_contact_area_without_mucus (dbl),
##   contact_pressure_Pa (int), adhesive_strength_Pa (int)
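
The three gsub() steps can be followed on a few of the original column names. The sketch below reproduces them in base R. The last step is a variation that is not part of the original code: it replaces every run of non-alphanumeric characters at once, which also removes the hyphen that survives in adhesive_impulse_N-s.

```r
old_names <- c("impact force (mN)",
               "impact force / body weight",
               "adhesive impulse (N-s)")

# step 1: spaces, slashes, and opening parentheses become underscores
step1 <- gsub(" |/|\\(", "_", old_names)
# step 2: collapse runs of underscores into one
step2 <- gsub("_+", "_", step1)
# step 3: drop closing parentheses
step3 <- gsub("\\)", "", step2)
step3
# "impact_force_mN" "impact_force_body_weight" "adhesive_impulse_N-s"

# variation (not in the original): replace every run of non-alphanumeric
# characters, then trim the trailing underscore left by ")"
alt <- gsub("_$", "", gsub("[^A-Za-z0-9]+", "_", old_names))
alt
# "impact_force_mN" "impact_force_body_weight" "adhesive_impulse_N_s"
```

A name that still contains a hyphen has to be wrapped in backticks whenever it is used unquoted, e.g. inside select() or aes().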

You are already familiar with the concept of tidy data. Let’s apply it to the modified frog dataframe:

tidy_frog <- 
  frog %>% gather(experiment, result, impact_force_mN:adhesive_strength_Pa)
tidy_frog
## Source: local data frame [960 x 7]
## 
##     Year Month   Day    ID trial_number      experiment result
##    (chr) (chr) (chr) (chr)        (int)          (fctr)  (dbl)
## 1   2013    02    26     I            3 impact_force_mN   1205
## 2   2013    02    26     I            4 impact_force_mN   2527
## 3   2013    03    01     I            1 impact_force_mN   1745
## 4   2013    03    01     I            2 impact_force_mN   1556
## 5   2013    03    01     I            3 impact_force_mN    493
## 6   2013    03    01     I            4 impact_force_mN   2276
## 7   2013    03    05     I            1 impact_force_mN    556
## 8   2013    03    05     I            2 impact_force_mN   1928
## 9   2013    03    05     I            3 impact_force_mN   2641
## 10  2013    03    05     I            4 impact_force_mN   1897
## ..   ...   ...   ...   ...          ...             ...    ...
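
What gather() does is easier to see on a dataframe small enough to print in full. This sketch uses a hypothetical two-row dataframe with made-up values: every measurement column is stacked into a key column (experiment) and a value column (result).

```r
library(tidyr)

wide <- data.frame(
  ID = c("I", "II"),
  impact_force_mN = c(1205, 556),
  impact_time_ms = c(46, 43),
  stringsAsFactors = FALSE
)

# two measurement columns x two rows = four rows in long format
long <- gather(wide, experiment, result, impact_force_mN:impact_time_ms)
long
dim(long)   # 4 3
```

This also explains the row count above: 80 rows times 12 gathered columns gives 960 rows.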

dplyr is built around a small set of verbs. Seven key ones are:

  • select()
  • filter()
  • group_by()
  • summarise()
  • arrange()
  • inner_join() (and other SQL-style joins)
  • mutate()

Remember, you can find out more about each of these using ?command. Let’s look at a few arbitrary examples to illustrate these capabilities. For this we use the wide (not tidied) dataframe:

dry_contact_area <- 
frog %>% 
  # Select a subset of the data
  select(Year:trial_number, 
         total_contact_area_mm2, 
         contact_area_without_mucus_mm2) %>%
  # subtract the mucus-free contact area from the total contact area
  # and store the difference in a new column
  mutate(dry_contact_area_mm2 = 
           total_contact_area_mm2 - contact_area_without_mucus_mm2) %>%
  group_by(ID) %>% 
  summarize(mean_dry_contact = mean(dry_contact_area_mm2), 
            median_dry_contact = median(dry_contact_area_mm2),
            min_dry_contact = min(dry_contact_area_mm2), 
            max_dry_contact = max(dry_contact_area_mm2),
            sd_dry_contact = sd(dry_contact_area_mm2),
            counts = n())

dry_contact_area
## Source: local data frame [4 x 7]
## 
##      ID mean_dry_contact median_dry_contact min_dry_contact
##   (chr)            (dbl)              (dbl)           (int)
## 1     I           128.55               90.5               4
## 2    II           163.50              146.5               1
## 3   III            56.40               61.0             -95
## 4    IV            71.85               63.0             -17
## Variables not shown: max_dry_contact (int), sd_dry_contact (dbl), counts
##   (int)
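
The pipeline above uses select(), mutate(), group_by(), and summarize(). The remaining single-table verbs, filter() and arrange(), work in the same way. This is a small sketch on a made-up dataframe; toy and its values are hypothetical.

```r
library(dplyr)

toy <- data.frame(
  ID = c("I", "I", "II", "II"),
  force = c(1205, 2527, 556, 1928),
  stringsAsFactors = FALSE
)

# keep only rows that meet a condition, then sort in descending order
strong <- toy %>%
  filter(force > 1000) %>%
  arrange(desc(force))
strong$force          # 2527 1928 1205

# one summary row per group
by_id <- toy %>%
  group_by(ID) %>%
  summarise(mean_force = mean(force), counts = n())
by_id$mean_force      # 1866 1242
```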

There is some potentially useful information in the comments. Let’s create a new dataframe with this data:

names <- c("ID", "age_group", "svl", "weight")
ID <- c("I", "II", "III", "IV")
age_group <- c("adult", "adult", "juvenile", "juvenile")
svl <- c(63, 70, 28, 31)
weight <- c(63.1, 72.7, 12.7, 12.7)

frog_characteristics <- data.frame(ID, age_group, svl, weight)
colnames(frog_characteristics) <- names

frog_characteristics
##    ID age_group svl weight
## 1   I     adult  63   63.1
## 2  II     adult  70   72.7
## 3 III  juvenile  28   12.7
## 4  IV  juvenile  31   12.7

The inner_join() function allows us to combine this data with our dry_contact_area dataframe:

dry_contact_area_characteristics <- 
  dry_contact_area %>% inner_join(frog_characteristics)
## Joining by: "ID"
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
dry_contact_area_characteristics
## Source: local data frame [4 x 10]
## 
##      ID mean_dry_contact median_dry_contact min_dry_contact
##   (chr)            (dbl)              (dbl)           (int)
## 1     I           128.55               90.5               4
## 2    II           163.50              146.5               1
## 3   III            56.40               61.0             -95
## 4    IV            71.85               63.0             -17
## Variables not shown: max_dry_contact (int), sd_dry_contact (dbl), counts
##   (int), age_group (fctr), svl (dbl), weight (dbl)
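
The join type matters as soon as the two tables do not contain exactly the same IDs. The sketch below uses two made-up dataframes (measurements and traits are hypothetical names). It names the key explicitly with by = "ID" and builds both tables with stringsAsFactors = FALSE so that the key columns are character vectors on both sides, which should avoid the coercion warning shown above.

```r
library(dplyr)

measurements <- data.frame(
  ID = c("I", "II", "V"),
  mean_force = c(1500, 700, 300),
  stringsAsFactors = FALSE
)
traits <- data.frame(
  ID = c("I", "II", "III"),
  age_group = c("adult", "adult", "juvenile"),
  stringsAsFactors = FALSE
)

# inner_join() keeps only IDs present in both tables
inner <- inner_join(measurements, traits, by = "ID")
nrow(inner)           # 2

# left_join() keeps every row of the left table and fills gaps with NA
left <- left_join(measurements, traits, by = "ID")
nrow(left)            # 3
left$age_group        # "adult" "adult" NA
```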

Let’s also add it to the frog dataframe:

new_frog <- frog %>%
  mutate(dry_contact_area_mm2 = 
           total_contact_area_mm2 - contact_area_without_mucus_mm2) %>%
  inner_join(frog_characteristics) %>% 
  # rearrange the columns 
  select(Year:trial_number, age_group:weight, 
         impact_force_mN:dry_contact_area_mm2) %>%
  gather(experiment, result, impact_force_mN:dry_contact_area_mm2)
## Joining by: "ID"
## Warning in inner_join_impl(x, y, by$x, by$y): joining character vector and
## factor, coercing into character vector
new_frog
## Source: local data frame [1,040 x 10]
## 
##     Year Month   Day    ID trial_number age_group   svl weight
##    (chr) (chr) (chr) (chr)        (int)    (fctr) (dbl)  (dbl)
## 1   2013    02    26     I            3     adult    63   63.1
## 2   2013    02    26     I            4     adult    63   63.1
## 3   2013    03    01     I            1     adult    63   63.1
## 4   2013    03    01     I            2     adult    63   63.1
## 5   2013    03    01     I            3     adult    63   63.1
## 6   2013    03    01     I            4     adult    63   63.1
## 7   2013    03    05     I            1     adult    63   63.1
## 8   2013    03    05     I            2     adult    63   63.1
## 9   2013    03    05     I            3     adult    63   63.1
## 10  2013    03    05     I            4     adult    63   63.1
## ..   ...   ...   ...   ...          ...       ...   ...    ...
## Variables not shown: experiment (fctr), result (dbl)

ggplot2, ggthemes, and cowplot

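ggplot2 implements the “grammar of graphics”: a plot is assembled from data, aesthetic mappings (aes()) that link variables to visual properties, and geometric layers (geom_*()) that are added with +. Let’s illustrate this by plotting a few things.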

Examples for one variable:

a <- ggplot(frog, aes(contact_pressure_Pa))
a + geom_area(stat = "bin")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

a + geom_histogram()
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

a + geom_dotplot()
## stat_bindot: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

# the histogram will be saved as:
plot1 <- a + geom_histogram()  + xlab("") + ylab("")
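
The one-variable plots above all follow the same pattern: data, an aesthetic mapping, and then one layer. The following self-contained sketch uses made-up numbers (toy is a hypothetical dataframe). It also sets binwidth explicitly, as the stat_bin message suggests; the value 5000 is an arbitrary choice for this toy data.

```r
library(ggplot2)

# made-up measurements, only for illustration
toy <- data.frame(pressure = c(397, 2579, 4678, 6073, 7250, 28641))

p <- ggplot(toy, aes(x = pressure)) +   # data + aesthetic mapping
  geom_histogram(binwidth = 5000) +     # geometric layer with explicit binwidth
  xlab("contact pressure (Pa)") +       # axis labels
  ylab("count")

# the plot is only drawn when the object is printed
# print(p)
```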

Examples for two continuous variables:

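# new_frog is in long format and no longer has one column per measurement,
# so join the wide frog data with the frog characteristics for plotting.
# Column names are passed to aes() unquoted; "impact_force_mN-s" is not a
# column of frog, so impact_force_mN is assumed here.
wide_frog <- frog %>% inner_join(frog_characteristics, by = "ID")
b <- ggplot(wide_frog, aes(x = impact_force_mN, y = contact_pressure_Pa))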
b + geom_jitter()

# color the points by age group
b + geom_jitter(aes(color = factor(age_group))) 

# this plot will be saved as:
plot2 <- b + geom_jitter(aes(color = factor(age_group))) + xlab("") + ylab("")

Examples for discrete X and continuous Y:

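# new_frog is in long format, so plot from the wide data joined with the
# frog characteristics instead
c <- ggplot(inner_join(frog, frog_characteristics, by = "ID"),
            aes(x = age_group, y = contact_pressure_Pa))
# bin along the y axis because x is discrete
c + geom_dotplot(binaxis = "y", stackdir = "center")
c + geom_boxplot()

# according to Edward Tufte this plot can be improved on
c + geom_tufteboxplot() + theme_tufte()

# a personal favorite:
c + geom_violin() + geom_tufteboxplot() + theme_tufte()

# this plot will be saved as:
plot3 <- c + geom_violin() + geom_tufteboxplot() + theme_tufte() + xlab("") + ylab("")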

cowplot makes it possible to combine the saved plots into a single figure:

plot_grid(plot1, plot2, plot3, labels = LETTERS[1:3], align = "h", nrow = 3)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.