HoloViews is a high-level, open-source data visualization library for Python that creates interactive plots with concise syntax and only a few lines of code. This lets you focus on solving the problem instead of spending time writing plotting code.
HoloViews was initially built on top of Bokeh, a Python data visualization library for creating highly interactive plots. Besides Bokeh, HoloViews can also use Matplotlib and Plotly as plotting backends. It works seamlessly with Jupyter Notebook, JupyterLab and widely used data analysis libraries such as NumPy, SciPy and Pandas.
This post shows how to use HoloViews to create the most widely used basic interactive plots with a real-world dataset.
pip
pip install "holoviews[recommended]"
pip also supports other installation options; see the HoloViews online documentation for details.
conda
If you use Anaconda or Miniconda, the recommended way to install HoloViews is with the conda command. The Anaconda distribution usually includes HoloViews already, so first check whether it is installed:
import holoviews as hv
print("Holoviews Version : {}".format(hv.__version__))
Holoviews Version : 1.15.0
If this prints a version number, HoloViews is already installed in your Anaconda environment. Otherwise, install it with the following command:
conda install -c pyviz holoviews bokeh
If you work with JupyterLab < 2.0, you should also install the PyViz extension:
jupyter labextension install @pyviz/jupyterlab_pyviz
import pandas as pd
import holoviews as hv
from holoviews import opts
# Load the Bokeh plotting backend; hv.extension() selects which library renders the plots
hv.extension('bokeh')
This post uses an economic dataset of China's top 5 provinces by GDP, which I have used in many of my previous posts.
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-pydm/gdp_china_outlier_treated.csv'
df = pd.read_csv(url)
df.head()
| | prov | gdpr | year | gdp | pop | finv | trade | fexpen | uinc |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Guangdong | First | 2000 | 1.074125 | 8.650000 | 0.314513 | 1.408147 | 0.108032 | 0.976157 |
| 1 | Guangdong | First | 2001 | 1.203925 | 8.733000 | 0.348443 | 1.501391 | 0.132133 | 1.041519 |
| 2 | Guangdong | First | 2002 | 1.350242 | 8.842000 | 0.385078 | 1.830169 | 0.152108 | 1.113720 |
| 3 | Guangdong | First | 2003 | 1.584464 | 8.963000 | 0.481320 | 2.346735 | 0.169563 | 1.238043 |
| 4 | Guangdong | First | 2004 | 1.886462 | 9.052298 | 0.587002 | 2.955899 | 0.185295 | 1.362765 |
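where prov is the province, gdpr is its GDP rank (e.g. First for Guangdong) and year is the observation year; the remaining columns (gdp, pop, finv, trade, fexpen and uinc) are that province's economic indicators for the year.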
HoloViews needs only a few lines of simple, expressive code to create an interactive plot. This section shows how to create the most widely used basic plots with HoloViews.
A line plot, created with the Curve element, needs only one line of code:
hv.Curve(df, kdims='year', vdims='gdp')
# or just
hv.Curve(df, 'year', 'gdp')
In HoloViews, kdims denotes the key dimensions, such as the x-axis, while vdims denotes the value dimensions, such as the y-axis. HoloViews treats the kdims as the primary (independent) dimensions that index the data, and the vdims as secondary (dependent) dimensions that add further information at each key.
By default, every plot setting takes its default value, which often does not look very good. We can change the plot attributes by passing additional options.
HoloViews divides the configuration options into two categories: plot options and style options. Plot options control the overall plot, such as xlabel, ylabel, height and width. Style options control how the data itself is drawn, such as the point size, alpha and color. In the %%opts syntax, plot options go inside square brackets and style options inside parentheses.
There are two ways to set options: the .opts() method and the Jupyter notebook magic command %%opts.
opts method
You can add .opts() directly after the plot code, as follows:
hv.Curve(df, 'year', 'gdp').opts(width=600, height=400)
For readability, let's assign the plot to a variable and then call .opts() on that variable.
line_plot = hv.Curve(df, kdims='year', vdims='gdp',
label="GDP of China's top 5 GDP provinces ")
line_plot.opts(width=600,
height=400,
xlabel='Time (year)',
ylabel='GDP (x10⁸CNY)',
tools=["hover"],
color="red",
show_grid=True)
%%opts
%%opts Curve [width=600 height=400 tools=["hover"] title="GDP of China's top 5 GDP provinces"]
%%opts Curve [xlabel='Time (year)' ylabel='GDP (x10⁸CNY)' show_grid=True]
%%opts Curve (color='blue')
line = hv.Curve(df,
kdims='year',
vdims='gdp')
line
For more information about opts, or about a particular plot element, use the following command:
hv.opts?
For example, to see more information about the line plot element, Curve, use either of the following:
hv.Curve?
hv.help(hv.Curve)
We can also inspect the plot structure with the print() function, which gives useful insight, especially for complex, composed plots.
print(line)
:Curve [year] (gdp)
The output shows the element type (Curve), its key dimensions in square brackets and its value dimensions in parentheses.
Now let's create a scatter plot. In this and most of the following examples, we use .opts() to configure the plots.
There are two ways to create a scatter plot: the Scatter element and the Points element.
scat = hv.Scatter(df, kdims="year", vdims="gdp", label="GDP vs year scatter plot")
scat.opts(xlabel="Year",
ylabel="GDP (x10⁸CNY)",
height=400, width=600,
tools=["hover"],
alpha=0.7, size=15,
color="purple", line_color="black")
The Points element produces a similar plot to Scatter. The difference is that Points treats both x and y as key dimensions, which is why both columns are passed as kdims here.
point = hv.Points(df, kdims=["year", "gdp"], label="GDP vs year scatter plot")
point.opts(xlabel="Year",
ylabel="GDP (x10⁸CNY)",
height=400, width=600,
tools=["hover"],
alpha=0.7, size=15,
color="purple", line_color="black")
To create a bar plot, use the Bars() element.
bar = hv.Bars(df,
kdims='prov',
vdims='gdp')
bar.opts(width=600, height=400,
tools=['hover'],
ylim=(0.0,10),
title="GDP of China's top 5 GDP provinces",
xrotation=45,
xlabel='Province', ylabel='GDP (x10⁸CNY)',
show_grid=True,
color='blue', hover_color='red',bar_width=0.5)
Setting the invert_axes=True option swaps the axes, producing a horizontal bar plot.
bar.opts(invert_axes=True,width=500)
The BoxWhisker element creates a box-and-whisker plot.
box_plot = hv.BoxWhisker(df,
kdims="prov",
vdims="gdp" ,
label="GDP distribution of China's top 5 GDP provinces")
box_plot.opts(width=600, height=400,
xlabel="Province", ylabel="GDP (x10⁸CNY)",
tools=["hover"],
xrotation=45)
An area plot is created with the Area() element. This example plots the GDP of Guangdong province.
area = hv.Area(df[df["prov"] == "Guangdong"], kdims="year", vdims="gdp", label="Guangdong GDP")
area.opts(height=500, width=600,
xlabel="Year", ylabel="GDP (x10⁸CNY)",
tools=["hover"],
title="Area plot of GDP of Guangdong Province in China",
fill_alpha=0.5 )
For a histogram, first compute the bin counts and edges with NumPy's np.histogram() function, then pass the result to the HoloViews Histogram element.
import numpy as np
hist = hv.Histogram(np.histogram(df['gdp'], bins=20),
kdims="gdp", label="GDP histogram")
hist.opts(width=500, height=400, tools=["hover"])
The Violin element produces a violin plot just as easily.
violin = hv.Violin(df, kdims='prov', vdims='gdp')
violin.opts(height=500, width=600,
xlabel='Province', ylabel='GDP (x10⁸CNY)',
tools=["hover"],
ylim=(-7,18))
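Next, let's use HoloViews to create a correlation heatmap that shows the pairwise correlations between variables. First, we calculate the correlation matrix of the numeric columns with the Pandas corr() method.
# Keep numeric columns only: recent Pandas versions raise an error on the text columns prov and gdpr
corr = df.select_dtypes("number").corr().reset_index().rename(columns={"index": "column names"})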
corr
| | column names | year | gdp | pop | finv | trade | fexpen | uinc |
|---|---|---|---|---|---|---|---|---|
| 0 | year | 1.000000 | 0.888484 | 0.129927 | 0.914602 | 0.475826 | 0.904711 | 0.918817 |
| 1 | gdp | 0.888484 | 1.000000 | 0.246513 | 0.896596 | 0.713338 | 0.969129 | 0.872392 |
| 2 | pop | 0.129927 | 0.246513 | 1.000000 | 0.183929 | 0.237565 | 0.257292 | -0.128331 |
| 3 | finv | 0.914602 | 0.896596 | 0.183929 | 1.000000 | 0.367408 | 0.884774 | 0.827754 |
| 4 | trade | 0.475826 | 0.713338 | 0.237565 | 0.367408 | 1.000000 | 0.670646 | 0.567929 |
| 5 | fexpen | 0.904711 | 0.969129 | 0.257292 | 0.884774 | 0.670646 | 1.000000 | 0.868094 |
| 6 | uinc | 0.918817 | 0.872392 | -0.128331 | 0.827754 | 0.567929 | 0.868094 | 1.000000 |
Next, we melt the correlation matrix into a long-format dataframe, with one row per pair of variables, so that HoloViews can use it.
melted_corr = corr.melt(id_vars=["column names"],
var_name="variable",
value_name="corr coeffs")
melted_corr.head()
| | column names | variable | corr coeffs |
|---|---|---|---|
| 0 | year | year | 1.000000 |
| 1 | gdp | year | 0.888484 |
| 2 | pop | year | 0.129927 |
| 3 | finv | year | 0.914602 |
| 4 | trade | year | 0.475826 |
Then we use the HoloViews HeatMap() element to create a heatmap of the correlation coefficients. HoloViews takes the first two columns, column names and variable, as the kdims and the corr coeffs column as the vdims.
%%opts HeatMap [height=600 width=700 tools=["hover"]]
%%opts HeatMap [xrotation=45 colorbar=True]
%%opts HeatMap (cmap="Purples")
hv.HeatMap(melted_corr)
Saving a plot as an HTML file is easy, and you can then embed that file, with its interactive plot, in your website. For example, we save the violin plot above to the results folder in the working directory.
hv.save(violin,'./results/gdp_violin.html',fmt='html')
Now let's load the plot back into the Jupyter notebook; you can see that it is still interactive.
from IPython.display import HTML
HTML(filename='./results/gdp_violin.html')
The sections above show how to create interactive plots with HoloViews using Bokeh as the backend. Next, let's see how to use Matplotlib and Plotly as backends instead. Note that the Matplotlib backend renders static images, while Plotly plots are interactive.
As the hv.extension('bokeh') call at the beginning of this post shows, the backend is set simply by calling hv.extension().
In this example, we set Matplotlib as the HoloViews backend to create a box plot.
hv.extension("matplotlib")
box_whisker = hv.BoxWhisker(df, kdims="prov", vdims="gdp")
box_whisker.opts(height=500, width=600,
tools=["hover"],
xlabel='Province', ylabel='GDP (x10⁸CNY)',
xrotation=45)
WARNING:param.main: Option 'height' for BoxWhisker type not valid for selected backend ('matplotlib'). Option only applies to following backends: ['bokeh']
WARNING:param.main: Option 'width' for BoxWhisker type not valid for selected backend ('matplotlib'). Option only applies to following backends: ['bokeh']
WARNING:param.main: Option 'tools' for BoxWhisker type not valid for selected backend ('matplotlib'). Option only applies to following backends: ['bokeh']
The warnings show that the height, width and tools options for BoxWhisker are not valid for the Matplotlib backend; they apply only to the Bokeh backend.
Now let's set Plotly as the backend to create the same box plot.
hv.extension("plotly")
box_plotly = hv.BoxWhisker(df, kdims="prov", vdims="gdp")
box_plotly.opts(height=500, width=600,
xlabel='Province', ylabel='GDP (x10⁸CNY)',
xrotation=45)