Auxiliary tutorial 1: Introduction to Bokeh

This tutorial was generated from a Jupyter notebook. You can download the notebook here.

In [1]:
# Our numerical workhorses
import numpy as np
import pandas as pd

# Import Bokeh modules for interactive plotting
import bokeh.charts
import bokeh.charts.utils
import bokeh.io
import bokeh.models
import bokeh.palettes
import bokeh.plotting

# Display graphics in this notebook
bokeh.io.output_notebook()
BokehJS successfully loaded.

In this tutorial, we will explore browser-based interactive plotting using Bokeh. It is important that you are using the latest version of Bokeh, which at the time of writing is v. 0.10.0. After importing, verify that this is the case.

In [2]:
bokeh.__version__
Out[2]:
'0.10.0'
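If you want to check the version programmatically, a plain string comparison is not safe, because '0.10.0' sorts before '0.9.3' character by character. A minimal sketch of a numeric comparison, using a hypothetical helper version_tuple that assumes purely numeric, dot-separated version strings (no suffixes such as dev or rc):

```python
def version_tuple(version):
    """Convert a dot-separated numeric version string to a tuple of ints.

    Hypothetical helper: assumes every component is an integer
    (e.g. '0.10.0'), so strings like '0.10.0dev4' are not handled.
    """
    return tuple(int(part) for part in version.split('.'))


# Character-by-character string comparison gets this wrong...
print('0.10.0' >= '0.9.3')  # False
# ...while comparing tuples of integers gets it right
print(version_tuple('0.10.0') >= version_tuple('0.9.3'))  # True
```

In the notebook, you would compare version_tuple(bokeh.__version__) against (0, 10, 0).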

If you do not have the most recent version, you can update it:

conda update bokeh


Why is it so important to use the most recent version? Bokeh is currently in very active development. It is certainly not feature-complete yet, and many more features are slated to be added.

For browser-based interactive data visualization, D3.js is the most widely used and most fully featured tool. However, it is a lower-level package and requires writing JavaScript. Bokeh, like Shiny (http://shiny.rstudio.com) for R and others, is an attempt to bring the kind of functionality D3 offers to high-level languages like Python. In other words, the goal is to let you build browser-based interactive data visualizations with only a few lines of code. Bokeh has the additional goal of being able to handle big data sets, including streaming data.

Why browser-based interactive data visualization?

I think the interactive part is easy to answer. The more you can interact with your data, particularly during the exploratory phase of data analysis, the more you can learn. When doing exploratory data analysis, we typically make lots and lots of plots to see patterns. If we can expedite this process, we can be more efficient and effective in our analysis.

Why browser-based? There are two simple answers to this. First, everyone has a browser, and browsers are relatively standardized. This makes your graphics very portable. Second, there are lots of tools for efficiently rendering graphics in browsers; Bokeh uses HTML5 canvas elements to accomplish this. These tools are mature and stable, which makes backend rendering of the graphics easy.

Data for this tutorial

We will use the tidy DataFrames from the first couple weeks of class as we explore Bokeh's features and do some interactive visualizations. So, let's load in the DataFrames now.

In [3]:
# The frog data from tutorial 1a
df_frog = pd.read_csv('../data/frog_tongue_adhesion.csv', comment='#')

# The MT catastrophe data
df_mt = pd.read_csv(
    '../data/gardner_et_al/gardner_et_al_2011_time_to_catastrophe_dic.csv',
    comment='#')

# These were generated in tutorial 2a
df_fish = pd.read_csv('../data/130315_10_minute_intervals.csv')
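The comment='#' argument tells pd.read_csv() to ignore everything after a # on a line, so lines that start with # (such as a header describing the data set) are skipped entirely. A small self-contained check, using made-up CSV text in place of the real data file:

```python
import io

import pandas as pd

# Made-up CSV text standing in for a data file with a commented header
csv_text = """# Hypothetical data file
# Lines starting with '#' describe the data
ID,impact force (mN)
I,1205
II,1021
"""

# Fully commented lines are skipped; the first real line is the header
df = pd.read_csv(io.StringIO(csv_text), comment='#')
print(df)
```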

Before moving on, we'll go ahead and tidy the MT catastrophe DataFrame.

In [4]:
# Tidy MT_catastrophe DataFrame
df_mt.columns = ['labeled', 'unlabeled']
df_mt = pd.melt(df_mt, var_name='fluor', value_name='tau').dropna()
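Here pd.melt() stacks the two columns into a single tau column, storing the original column name in fluor. Because the two columns hold different numbers of measurements, the shorter one is padded with NaN when the file is read, which is why dropna() follows. A toy illustration with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up wide data: the unlabeled column has fewer entries, padded with NaN
df_wide = pd.DataFrame({'labeled': [200.0, 350.0, 410.0],
                        'unlabeled': [180.0, 290.0, np.nan]})

# Stack into tidy form (one row per measurement), then drop the padding
df_tidy = pd.melt(df_wide, var_name='fluor', value_name='tau').dropna()
print(df_tidy)
```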

High-level charts

Perhaps the easiest way to get started with Bokeh is to use its high-level charts. These allow for rapid plotting of data coming from Pandas DataFrames, much like the plotting utilities in Pandas itself.

Line plot

We'll start with a simple line plot of zebrafish sleep data.

In [5]:
# Pull out fish record
df_fish2 = df_fish[df_fish['fish']=='FISH2']

# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', color='firebrick')

# Display it
bokeh.io.show(p)

There are many things to note here. First, and most obviously, you can play with the various tools. You can select the tools in the upper right corner of the plot. Hovering over an icon will reveal what the tool does.

When we instantiate the bokeh.charts.Line object, a plot is returned, which we assigned to the variable p. We can further modify or add attributes to this object. Importantly, the bokeh.io.show() function displays the object. We specified in the first cell, with bokeh.io.output_notebook(), that the graphics will be shown in the current notebook. We can also export the plot as its own standalone HTML document. We won't do it here, but to do so, simply put

bokeh.plotting.output_file('filename.html')

before the bokeh.io.show(p) function call.

Note also that we chose a color of "firebrick." We can choose any of the named CSS colors, or specify a hexadecimal color. Notice also that the axes were automatically labeled with the column headings of the DataFrame. We can specify the axis labels with keyword arguments as well.

In [6]:
# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish2, x='zeit', y='activity', color='firebrick',
                      xlabel='time (h)', ylabel='sec of activity / 10 min')

# Display it
bokeh.io.show(p)

We can also put multiple lines on the same plot.

In [7]:
# Select three fish to plot
df_fish_multi = df_fish[df_fish['fish'].isin(['FISH11', 'FISH12', 'FISH23'])]

# Use Bokeh chart to make plot
p = bokeh.charts.Line(df_fish_multi, x='zeit', y='activity', color='fish',
                      legend="top_left")

# Display it
bokeh.io.show(p)
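The isin() call keeps only the rows whose fish entry is one of the listed names, and color='fish' then draws a separate line for each distinct value in that column. The same split can be checked directly in pandas; the tiny DataFrame below is made up and only mimics the fish, zeit, and activity columns of the real data, and the groupby loop is just an approximation of what the chart does per fish:

```python
import pandas as pd

# Made-up tidy data with the same column names as df_fish
df = pd.DataFrame({'fish': ['FISH11', 'FISH11', 'FISH12', 'FISH12', 'FISH40'],
                   'zeit': [0.0, 0.17, 0.0, 0.17, 0.0],
                   'activity': [1.2, 0.0, 3.4, 2.1, 9.9]})

# Keep only the fish of interest (names absent from the data are ignored)
df_multi = df[df['fish'].isin(['FISH11', 'FISH12', 'FISH23'])]

# One activity series per fish, roughly one line per fish in the chart
for fish, group in df_multi.groupby('fish'):
    print(fish, group['activity'].tolist())
```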

Box plots

Bokeh's high-level charts interface also allows for easy construction of box plots. As an example, we'll make box plots of the striking force of the frog tongues.

In [8]:
# Use Bokeh chart to make plot
p = bokeh.charts.BoxPlot(df_frog, values='impact force (mN)', label='ID',
                        color='ID', xlabel='frog', ylabel='impact force (mN)')

# Display it
bokeh.io.show(p)

The problem with bokeh.charts's way of doing box plots is that it uses the convention that the whiskers always extend $\pm 1.5\,\text{IQR}$ beyond the box, even when there are no outliers. That is, the whiskers can extend past the actual measurements. I prefer to have the whiskers show the extent of the data, so I wrote my own box plot function (below) to do the task more to my specification. This highlights a disadvantage of using the higher-level tools: you have less control. Of course, sacrificing control to have a one-liner is often worth it.
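To make the difference concrete, the sketch below computes both whisker conventions for a made-up sample with no outliers. With the $\pm 1.5\,\text{IQR}$ convention, the upper whisker lands well above the largest measurement; with the extent-of-data convention, it stops at the largest value inside the cutoffs. Quartiles are computed with NumPy's default linear interpolation, which is also the default of pandas' quantile().

```python
import numpy as np

# Made-up sample with no outliers
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Quartiles, interquartile range, and outlier cutoffs
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

# Convention 1: whiskers always at the 1.5 IQR cutoffs
print('cutoff whiskers:', lower_cutoff, upper_cutoff)

# Convention 2: whiskers at the most extreme data points that are not outliers
inside = data[(data >= lower_cutoff) & (data <= upper_cutoff)]
print('data whiskers:', inside.min(), inside.max())
```

Here the cutoff whiskers sit at -1 and 7, while the data only span 1 to 5.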

Scatter plots

We can also make scatter plots. As a useful feature, we can color the points in the scatter plot according to values in the DataFrame.

In [9]:
# Use Bokeh chart to make plot
p = bokeh.charts.Scatter(df_frog, x='impact force (mN)', y='adhesive force (mN)',
                         color='ID', ylabel='adhesive force (mN)',
                         xlabel='impact force (mN)', legend='top_right')

# Display it
bokeh.io.show(p)

Histograms

And, of course, we can do histograms. We'll use the microtubule catastrophe data to do that.

In [10]:
# Use Bokeh chart to make plot
p = bokeh.charts.Histogram(df_mt, values='tau', color='fluor',
                           bins=20, legend='top_right')

# Display it
bokeh.io.show(p)

More control with the plotting interface

Bokeh's charts interface is useful for quickly making plots from DataFrames, but the lower-level bokeh.plotting interface allows more control over the plots. For example, we'll use my favorite background fill with a white grid for our plot.

In [11]:
# Set up the figure (this is like a canvas you will paint on)
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'Impact force (mN)'
p.yaxis.axis_label = 'Adhesive force (mN)'

# Specify the glyphs
p.circle(df_frog['impact force (mN)'], df_frog['adhesive force (mN)'], size=7,
         alpha=0.5)

bokeh.io.show(p)

We can also add multiple glyphs to the same plot.

In [12]:
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'τ (s)'
p.yaxis.axis_label = 'ECDF'

# Build ECDFs
ecdf_lab_x = np.sort(df_mt[df_mt.fluor=='labeled']['tau'].values)
ecdf_lab_y = np.arange(1, len(ecdf_lab_x)+1) / len(ecdf_lab_x)
ecdf_un_x = np.sort(df_mt[df_mt.fluor=='unlabeled']['tau'].values)
ecdf_un_y = np.arange(1, len(ecdf_un_x)+1) / len(ecdf_un_x)

# Specify the glyphs
p.circle(ecdf_lab_x, ecdf_lab_y, size=7, alpha=0.5, legend='labeled',
        color='dodgerblue')
p.circle(ecdf_un_x, ecdf_un_y, size=7, alpha=0.5, legend='unlabeled',
        color='tomato')

p.legend.orientation = 'bottom_right'

bokeh.io.show(p)
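The two ECDF computations above follow the same pattern: sort the values, then assign the $i$th smallest value the height $i/n$. A small reusable function captures it; the name ecdf is introduced here for illustration and is not part of NumPy or Bokeh.

```python
import numpy as np


def ecdf(data):
    """Return sorted x values and ECDF y values for a 1D array.

    The i-th smallest value (counting from 1) gets y = i / n.
    """
    x = np.sort(data)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Made-up catastrophe times (s)
x, y = ecdf(np.array([300.0, 100.0, 200.0, 400.0]))
print(x)
print(y)
```

With it, ecdf_lab_x, ecdf_lab_y = ecdf(df_mt[df_mt.fluor=='labeled']['tau'].values) would replace two lines of the cell above.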

We can also exercise this increased control with the fish activity data. First, we'll write a small function to get the starting and ending points of nights.

In [13]:
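def nights(df):
    """
    Take the light series from a single fish and return the start and end
    times of nights.
    """
    # Use positional (NumPy) indexing; label-based indexing could pick the
    # wrong rows on a filtered DataFrame whose index is not 0, 1, 2, ...
    zeit = df.zeit.values
    light = df.light.values.astype(int)
    lefts = zeit[np.where(np.diff(light) == -1)[0]]
    rights = zeit[np.where(np.diff(light) == 1)[0]]
    return lefts, rights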
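The function relies on np.diff() of the light record cast to integers: a step from light (1) to dark (0) gives -1, marking the start of a night, and a step from dark to light gives +1, marking its end. Below is a check on a made-up light schedule. The toy DataFrame only mimics the zeit and light columns, and the function is restated with positional indexing so the block runs on its own.

```python
import numpy as np
import pandas as pd


def nights(df):
    """Return arrays of start and end times of nights for one fish."""
    zeit = df['zeit'].values
    light = df['light'].values.astype(int)
    # Light -> dark (diff == -1) starts a night; dark -> light (diff == 1) ends it
    lefts = zeit[np.where(np.diff(light) == -1)[0]]
    rights = zeit[np.where(np.diff(light) == 1)[0]]
    return lefts, rights


# Made-up record: last light point at zeit 2, last dark point at zeit 5
df = pd.DataFrame({'zeit': [0, 1, 2, 3, 4, 5, 6, 7],
                   'light': [True, True, True, False, False, False, True, True]})
lefts, rights = nights(df)
print(lefts, rights)
```

Note that each boundary is the last time point before the transition, so the shaded boxes may be offset from the true switch by up to one sampling interval.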

Now that we have this function, we can proceed to make our nicely shaded plot.

In [14]:
# Create figure
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'
p.xaxis.axis_label = 'time (hours)'
p.yaxis.axis_label = 'sec. of activity / 10 min.'

# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']

# Populate glyphs
for i, fish in enumerate(['FISH11', 'FISH12', 'FISH23']):
    source = bokeh.models.ColumnDataSource(df_fish[df_fish['fish']==fish])
    p.line(x='zeit', y='activity', line_width=0.5, alpha=0.75, source=source,
          color=colors[i])

# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish=='FISH1'])

# Make shaded boxes for nights
night_boxes = []
for i, left in enumerate(lefts):
    night_boxes.append(
            bokeh.models.BoxAnnotation(plot=p, left=left, right=rights[i], 
                                       fill_alpha=0.3, fill_color='gray'))
p.renderers.extend(night_boxes)

bokeh.io.show(p)

A better box plotting function

As I mentioned before, I would prefer to do box plots differently than the Bokeh default. With the added control of the bokeh.plotting module, I can do that.

In [15]:
def box_plot(df, vals, label, ylabel=None):
    """
    Make a Bokeh box plot from a tidy DataFrame.
    
    Parameters
    ----------
    df : tidy Pandas DataFrame
        DataFrame to be used for plotting
    vals : hashable object
        Column of DataFrame containing data to be used.
    label : hashable object
        Column of DataFrame used to categorize.
    ylabel : str, default None
        Text for y-axis label
        
    Returns
    -------
    output : Bokeh plotting object
        Bokeh plotting object that can be rendered with
        bokeh.io.show()
        
    Notes
    -----
    .. Based largely on example code found here:
     https://github.com/bokeh/bokeh/blob/master/examples/plotting/file/boxplot.py
    """
    # Group DataFrame by category
    df_gb = df.groupby(label)

    # Compute quartiles for each group
    q1 = df_gb[vals].quantile(q=0.25)
    q2 = df_gb[vals].quantile(q=0.5)
    q3 = df_gb[vals].quantile(q=0.75)

    # Take the categories in the same (sorted) order as the quartiles, so
    # each glyph below pairs a category with its own statistics
    cats = list(q1.index)

    # Compute interquartile range and upper and lower bounds for outliers
    iqr = q3 - q1
    upper_cutoff = q3 + 1.5*iqr
    lower_cutoff = q1 - 1.5*iqr

    # For each category, collect the outliers and place the whiskers at the
    # smallest and largest data points that are not outliers
    outx = []
    outy = []
    upper = []
    lower = []
    for cat in cats:
        group_vals = df_gb.get_group(cat)[vals]
        is_outlier = ((group_vals > upper_cutoff[cat])
                      | (group_vals < lower_cutoff[cat]))
        outx += [cat] * int(is_outlier.sum())
        outy += list(group_vals[is_outlier])
        upper.append(group_vals[~is_outlier].max())
        lower.append(group_vals[~is_outlier].min())

    # Build figure
    p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                          plot_height=450, x_range=cats)
    p.ygrid.grid_line_color = 'white'
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_width = 2
    p.yaxis.axis_label = ylabel
    
    # stems
    p.segment(cats, upper, cats, q3, line_width=2, line_color="black")
    p.segment(cats, lower, cats, q1, line_width=2, line_color="black")

    # boxes
    p.rect(cats, (q3 + q1)/2, 0.5, q3 - q1, fill_color="mediumpurple", 
           alpha=0.7, line_width=2, line_color="black")

    # median (almost-0 height rects simpler than segments)
    p.rect(cats, q2, 0.5, 0.01, line_color="black", line_width=2)

    # whiskers (almost-0 height rects simpler than segments)
    p.rect(cats, lower, 0.2, 0.01, line_color="black")
    p.rect(cats, upper, 0.2, 0.01, line_color="black")

    # outliers
    p.circle(outx, outy, size=6, color="black")

    return p

p = box_plot(df_frog, 'impact force (mN)', 'ID', ylabel='Impact force (mN)')
bokeh.io.show(p)

Specifying tools

Using the bokeh.plotting interface, we can also specify which tools we want available. For example, we can add a HoverTool that will give information about each data point if we hover the mouse over it.

In [16]:
# Eliminate spaces from column headings to allow tooltip to work
df_frog = df_frog.rename(columns={'impact force (mN)': 'impf',
                                  'adhesive force (mN)': 'adhf'})

# Specify data source
source = bokeh.models.ColumnDataSource(df_frog)

# What pops up on hover?
tooltips = [('frog', '@ID'),
           ('imp', '@impf'),
           ('adh', '@adhf')]

# Make the hover tool
hover = bokeh.models.HoverTool(tooltips=tooltips)

# Create figure
p = bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                          plot_height=450)
p.xgrid.grid_line_color = 'white'
p.ygrid.grid_line_color = 'white'

# Add the hover tool
p.add_tools(hover)

# Populate glyphs
p.circle(x='adhf', y='impf', size=7, alpha=0.5, source=source)

bokeh.io.show(p)
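The renaming is needed because, as the comment in the cell notes, a tooltip field such as @impf refers to a column by name, and the original headings with spaces and parentheses do not work there. Rather than choosing short names by hand, you could derive identifier-friendly names automatically. A sketch with a hypothetical helper, tooltip_safe_names, that replaces every run of characters other than letters, digits, and underscores with a single underscore:

```python
import re

import pandas as pd


def tooltip_safe_names(columns):
    """Map column names to names containing only letters, digits, and _.

    Hypothetical helper: runs of other characters become a single
    underscore, and leading/trailing underscores are stripped.
    """
    return {col: re.sub(r'\W+', '_', str(col)).strip('_') for col in columns}


# Made-up one-row frame with the same headings as the frog data
df = pd.DataFrame({'ID': ['I'], 'impact force (mN)': [1205],
                   'adhesive force (mN)': [-785]})
df = df.rename(columns=tooltip_safe_names(df.columns))
print(list(df.columns))
```

The tooltips would then reference @impact_force_mN and @adhesive_force_mN.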

Linking subplots

Bokeh also has the wonderful capability of linking subplots. The key here is to specify that the plots share the same ranges of the $x$ and $y$ variables. To enable linked selections, they also need to have their data come from the same source. We can construct a ColumnDataSource from a Pandas DataFrame, but we need to untidy our data first: each subplot shows one fish, so we need a DataFrame with one activity column per fish, all sharing a common time column.

In [22]:
# Unmelt the DataFrame
df_fish_unmelt = df_fish.pivot_table(index=['zeit', 'light', 'day', 'CLOCK'], 
                    columns='fish', values='activity').reset_index()

# Create data source
source = bokeh.plotting.ColumnDataSource(df_fish_unmelt)

# Determine when nights start and end
lefts, rights = nights(df_fish[df_fish.fish=='FISH1'])

# Create figures
ps = [bokeh.plotting.figure(background_fill='#DFDFE5', plot_width=650, 
                            plot_height=250) for i in range(3)] 

# Link ranges (enable linked panning/zooming)
for i in (1, 2):
    ps[i].x_range = ps[0].x_range
    ps[i].y_range = ps[0].y_range
    
# Label the axes
for i in range(3):
    ps[i].yaxis.axis_label = 'sec of activity / 10 min'
    ps[i].xaxis.axis_label = 'time (h)'

# Specify colors
colors = ['dodgerblue', 'tomato', 'indigo']

# Stylize
for fig in ps:
    fig.xgrid.grid_line_color = 'white'
    fig.ygrid.grid_line_color = 'white'
    
# Populate glyphs
for i, fish in enumerate(['FISH11', 'FISH12', 'FISH23']):
    # Put in line
    ps[i].line(x='zeit', y=fish, line_width=1, source=source,
               color=colors[i])
    
    # Label with title
    ps[i].title = fish
        
    # Make shaded boxes for nights
    night_boxes = []
    for j, left in enumerate(lefts):
        night_boxes.append(
                bokeh.models.BoxAnnotation(plot=ps[i], left=left, right=rights[j], 
                                           fill_alpha=0.3, fill_color='gray'))
    ps[i].renderers.extend(night_boxes)
        
my_plot = bokeh.plotting.vplot(*ps)
        
bokeh.io.show(my_plot)
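The pivot_table() call at the top of this cell is what turns the tidy fish data into one column per fish. The toy example below uses made-up values and only a zeit index (the real call also keeps light, day, and CLOCK) to show the change in shape:

```python
import pandas as pd

# Made-up tidy data: one row per (time point, fish)
df_tidy = pd.DataFrame({'zeit': [0.0, 0.0, 0.17, 0.17],
                        'fish': ['FISH11', 'FISH12', 'FISH11', 'FISH12'],
                        'activity': [1.2, 3.4, 0.0, 2.1]})

# One row per time point, one activity column per fish
df_wide = df_tidy.pivot_table(index='zeit', columns='fish',
                              values='activity').reset_index()
print(df_wide)
```

Note that pivot_table() aggregates duplicate entries with the mean by default; with one measurement per fish per time point, the values pass through unchanged.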