In this post, we investigate how to use HoloViews to create multiple interactive plots, focusing on overlays and layouts of multiple plots.
import pandas as pd
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
Now let's import the dataset again for the multiple plot creations in the following sections.
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-pydm/gdp_china_outlier_treated.csv'
df = pd.read_csv(url)
df.head()
| | prov | gdpr | year | gdp | pop | finv | trade | fexpen | uinc |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Guangdong | First | 2000 | 1.074125 | 8.650000 | 0.314513 | 1.408147 | 0.108032 | 0.976157 |
| 1 | Guangdong | First | 2001 | 1.203925 | 8.733000 | 0.348443 | 1.501391 | 0.132133 | 1.041519 |
| 2 | Guangdong | First | 2002 | 1.350242 | 8.842000 | 0.385078 | 1.830169 | 0.152108 | 1.113720 |
| 3 | Guangdong | First | 2003 | 1.584464 | 8.963000 | 0.481320 | 2.346735 | 0.169563 | 1.238043 |
| 4 | Guangdong | First | 2004 | 1.886462 | 9.052298 | 0.587002 | 2.955899 | 0.185295 | 1.362765 |
where:
We can easily merge more than one graph object in HoloViews. There are two operations: overlays and layouts of multiple plots.
Overlay (*): It overlays graphs on one another to create a single graph combining all the individual ones.
Layout (+): It merges graphs by putting them next to each other, i.e. a layout of multiple plots, like subplots in Matplotlib.
In the following sections, we will see how to create these two types of multiple plots.
In the previous post, we discussed the opts() method and the Jupyter notebook cell magic %%opts for setting options, and we mainly used the opts() method. In this post, we will focus on the %%opts magic, which is also very convenient if you like working with Jupyter Notebook or JupyterLab.
In this example, we will create an overlay line plot of the GDP of the 5 provinces. In HoloViews, we create a line chart for each of the 5 provinces separately and then merge them into one diagram.
%%opts Curve [tools=["hover"] xlabel="Year" ylabel="GDP (x10⁸CNY)" height=500 width=700]
line1 = hv.Curve(df[df["prov"] == "Guangdong"], kdims="year", vdims="gdp",label="Guangdong")
line2 = hv.Curve(df[df["prov"] == "Jiangsu"], kdims="year", vdims="gdp",label="Jiangsu")
line3 = hv.Curve(df[df["prov"] == "Shandong"], kdims="year", vdims="gdp",label="Shandong")
line4 = hv.Curve(df[df["prov"] == "Zhejiang"], kdims="year", vdims="gdp",label="Zhejiang")
line5 = hv.Curve(df[df["prov"] == "Henan"], kdims="year", vdims="gdp",label="Henan")
lines=(line1 * line2 * line3 * line4 * line5)
lines.opts(legend_position='top_left')
The label argument gives each curve its legend entry, and the available legend positions include [top_right, top_left, bottom_left, bottom_right, right, left, top, bottom]. Besides, we can print the overlay structure of the multiple plots.
print(lines)
:Overlay
   .Curve.Guangdong :Curve   [year]   (gdp)
   .Curve.Jiangsu   :Curve   [year]   (gdp)
   .Curve.Shandong  :Curve   [year]   (gdp)
   .Curve.Zhejiang  :Curve   [year]   (gdp)
   .Curve.Henan     :Curve   [year]   (gdp)
Then, we can display a single plot from the overlay using the following command.
lines.Curve.Guangdong
Similarly, we can generate an overlay bar plot of the GDP of the five provinces, or of other variables.
%%opts Bars [tools=["hover"] xlabel="Province" ylabel="GDP (x10⁸CNY)" height=500 width=700]
%%opts Bars (bar_width=0.6 line_color="black")
bar1 = hv.Bars(df[df["prov"] == "Guangdong"], kdims="prov", vdims="gdp",label="Guangdong")
bar2 = hv.Bars(df[df["prov"] == "Jiangsu"], kdims="prov", vdims="gdp",label="Jiangsu")
bar3 = hv.Bars(df[df["prov"] == "Shandong"], kdims="prov", vdims="gdp",label="Shandong")
bar4 = hv.Bars(df[df["prov"] == "Zhejiang"], kdims="prov", vdims="gdp",label="Zhejiang")
bar5 = hv.Bars(df[df["prov"] == "Henan"], kdims="prov", vdims="gdp",label="Henan")
bars=(bar1 * bar2 * bar3 * bar4 * bar5)
bars
%%opts Scatter [tools=["hover"] xlabel="year" ylabel="GDP (x10⁸CNY)" height=500 width=700]
%%opts Scatter (alpha=0.5 size=10 line_color="black")
scat1 = hv.Scatter(df[df["prov"] == "Guangdong"], kdims="year", vdims="gdp")
scat2 = hv.Scatter(df[df["prov"] == "Jiangsu"], kdims="year", vdims="gdp")
scat3 = hv.Scatter(df[df["prov"] == "Shandong"], kdims="year", vdims="gdp")
scat4 = hv.Scatter(df[df["prov"] == "Zhejiang"], kdims="year", vdims="gdp")
scat5 = hv.Scatter(df[df["prov"] == "Henan"], kdims="year", vdims="gdp")
scatters = scat1 * scat2 * scat3 * scat4 * scat5
scatters
Matrix scatter plots are very helpful for visualizing the correlations between variables. It is very easy to create a matrix scatter plot, which combines many individual scatter charts. Here we also convert the Pandas DataFrame to a HoloViews Dataset with hv.Dataset.
from holoviews.operation import gridmatrix
ds = hv.Dataset(df[['prov','year', 'gdp', 'pop', 'finv', 'trade', 'fexpen','uinc']])
grouped_by = ds.groupby('prov', container_type=hv.NdOverlay)
grid = gridmatrix(grouped_by, diagonal_type=hv.Scatter)
grid.options('Scatter', tools=['hover', 'box_select'], bgcolor='#efe8e2', fill_alpha=0.2, size=4)
%%opts BoxWhisker [tools=["hover"] height=500 width=700 xlabel="Province" ylabel="GDP (x10⁸CNY)"]
box1 = hv.BoxWhisker(df[df["prov"]=="Guangdong"], kdims="prov", vdims="gdp")
box2 = hv.BoxWhisker(df[df["prov"]=="Jiangsu"], kdims="prov", vdims="gdp")
box3 = hv.BoxWhisker(df[df["prov"]=="Zhejiang"], kdims="prov", vdims="gdp")
box4 = hv.BoxWhisker(df[df["prov"]=="Shandong"], kdims="prov", vdims="gdp")
box5 = hv.BoxWhisker(df[df["prov"]=="Henan"], kdims="prov", vdims="gdp")
box1 * box2 * box3 * box4 * box5
For example, we plot all the numerical variables of the five provinces. First, we need to create a new column named variables, which contains the column names of the numerical variables (i.e. 'gdp', 'pop', 'finv', 'trade', 'fexpen', 'uinc') of the DataFrame.
melted_df = df.melt(id_vars=["prov"], var_name="variables")
melted_df = melted_df[melted_df["variables"].isin(['gdp', 'pop', 'finv', 'trade', 'fexpen','uinc'])]
melted_df.head()
| | prov | variables | value |
|---|---|---|---|
| 190 | Guangdong | gdp | 1.074125 |
| 191 | Guangdong | gdp | 1.203925 |
| 192 | Guangdong | gdp | 1.350242 |
| 193 | Guangdong | gdp | 1.584464 |
| 194 | Guangdong | gdp | 1.886462 |
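Because melt() is called with only id_vars, every other column (including gdpr and year) is melted as well, which is why a filtering step is needed afterwards. As a hedged alternative, the sketch below passes value_vars so that only the wanted columns are melted in the first place. The toy_df frame and its numbers are made up for illustration and only mimic the column layout of the real dataset.

```python
import pandas as pd

# Made-up rows that mimic the column layout of the real dataset.
toy_df = pd.DataFrame({
    "prov": ["Guangdong", "Jiangsu"],
    "gdpr": ["First", "Second"],
    "year": [2000, 2000],
    "gdp":  [1.07, 0.86],
    "pop":  [8.65, 7.33],
})

numeric_cols = ["gdp", "pop"]

# value_vars restricts the melt to the listed columns, so no filtering is needed.
melted = toy_df.melt(id_vars=["prov"], value_vars=numeric_cols, var_name="variables")
```

The result has one row per province and variable, with the same three columns (prov, variables, value) as melted_df above.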
Then, we use the newly created melted_df to easily create an overlay of multiple box plots of the numerical columns of the DataFrame.
%%opts BoxWhisker [tools=["hover"] height=500 width=700]
box1 = hv.BoxWhisker(melted_df[melted_df["variables"]=="gdp"], kdims="variables", vdims="value")
box2 = hv.BoxWhisker(melted_df[melted_df["variables"]=="pop"], kdims="variables", vdims="value")
box3 = hv.BoxWhisker(melted_df[melted_df["variables"]=="finv"], kdims="variables", vdims="value")
box4 = hv.BoxWhisker(melted_df[melted_df["variables"]=="trade"], kdims="variables", vdims="value")
box5 = hv.BoxWhisker(melted_df[melted_df["variables"]=="fexpen"], kdims="variables", vdims="value")
box6 = hv.BoxWhisker(melted_df[melted_df["variables"]=="uinc"], kdims="variables", vdims="value")
box1 * box2 * box3 * box4 * box5 * box6
In this example, we plot the box plots of all the numerical columns of all the five provinces.
%%opts BoxWhisker [tools=["hover"] height=500 width=700 ]
%%opts BoxWhisker [xrotation=45]
%%opts BoxWhisker (box_color="prov" box_cmap="Category20")
multiboxs = hv.BoxWhisker(melted_df, kdims=["prov","variables"], vdims="value")
multiboxs
The previous post discussed how we pass histogram entries, generated with NumPy's histogram function, to hv.Histogram to produce a histogram.
import numpy as np
%%opts Histogram [height=500 width=800]
%%opts Histogram (alpha=0.6)
hist1 = hv.Histogram(np.histogram(df[df["prov"]=="Guangdong"]['gdp'], bins=30), kdims="gdp", label="Guangdong")
hist2 = hv.Histogram(np.histogram(df[df["prov"]=="Jiangsu"]['gdp'], bins=30), kdims="gdp", label="Jiangsu")
hist3 = hv.Histogram(np.histogram(df[df["prov"]=="Zhejiang"]['gdp'], bins=30), kdims="gdp", label="Zhejiang")
hist4 = hv.Histogram(np.histogram(df[df["prov"]=="Shandong"]['gdp'], bins=30), kdims="gdp", label="Shandong")
hist5 = hv.Histogram(np.histogram(df[df["prov"]=="Henan"]['gdp'], bins=30), kdims="gdp", label="Henan")
hist1 * hist2 * hist3 * hist4 * hist5
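Note that np.histogram with bins=30 derives the bin edges from each province's own value range, so the overlaid histograms above do not share the same bins. If comparable bins are wanted, one option is to compute common edges first and reuse them for every group. The sketch below shows only the NumPy part, on randomly generated stand-in data (gdp_a and gdp_b are hypothetical names); each resulting (counts, edges) pair could then be passed to hv.Histogram in the same way as above.

```python
import numpy as np

# Random stand-in samples for two provinces (not the real data).
rng = np.random.default_rng(0)
gdp_a = rng.normal(3.0, 1.0, 50)
gdp_b = rng.normal(5.0, 1.5, 50)

# Compute one set of 30 bins spanning both samples.
edges = np.histogram_bin_edges(np.concatenate([gdp_a, gdp_b]), bins=30)

# Reuse the same edges for every group so the bars line up.
counts_a, edges_a = np.histogram(gdp_a, bins=edges)
counts_b, edges_b = np.histogram(gdp_b, bins=edges)
```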
%%opts Violin [height=500 width=700 xlabel='Province' ylabel='GDP (x10⁸CNY)' ylim=(-7,18)]
violin1 = hv.Violin(df[df["prov"]=="Guangdong"],kdims='prov', vdims='gdp')
violin2 = hv.Violin(df[df["prov"]=="Jiangsu"],kdims='prov', vdims='gdp')
violin3 = hv.Violin(df[df["prov"]=="Zhejiang"],kdims='prov', vdims='gdp')
violin4 = hv.Violin(df[df["prov"]=="Shandong"],kdims='prov', vdims='gdp')
violin5 = hv.Violin(df[df["prov"]=="Henan"],kdims='prov', vdims='gdp')
violin1 * violin2 * violin3 * violin4 * violin5
We treat each numerical column across all 5 provinces as a whole to create the violin plots. For multiple violin plots, an error is raised because the value column has the object data type. The trick is to convert the melted DataFrame to a dictionary and then back to a DataFrame. Maybe this is a bug, because BoxWhisker works well with the same data.
melted_df = pd.DataFrame(melted_df.to_dict())
%%opts Violin [height=500 width=700 ylim=(-5,15)]
violin1 = hv.Violin(melted_df[melted_df["variables"]=="gdp"], kdims="variables", vdims="value")
violin2 = hv.Violin(melted_df[melted_df["variables"]=="pop"], kdims="variables", vdims="value")
violin3 = hv.Violin(melted_df[melted_df["variables"]=="finv"], kdims="variables", vdims="value")
violin4 = hv.Violin(melted_df[melted_df["variables"]=="trade"], kdims="variables", vdims="value")
violin5 = hv.Violin(melted_df[melted_df["variables"]=="fexpen"], kdims="variables", vdims="value")
violin6 = hv.Violin(melted_df[melted_df["variables"]=="uinc"], kdims="variables", vdims="value")
violin1 * violin2 * violin3 * violin4 * violin5 * violin6
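A likely reason for the object data type mentioned above is that melt() stacked the text column gdpr together with the numeric columns, so the value column stays object even after the text rows are filtered out. The dictionary round trip lets Pandas infer the type again. A more explicit alternative, sketched here on made-up data, is to convert the column with pd.to_numeric. This is an assumption about the cause, not a confirmed diagnosis of the Violin error.

```python
import pandas as pd

# Made-up rows with one text column and one numeric column.
toy_df = pd.DataFrame({
    "prov": ["Guangdong", "Jiangsu"],
    "gdpr": ["First", "Second"],
    "gdp":  [1.07, 0.86],
})

# Melting text and numbers together gives an object-typed value column.
melted = toy_df.melt(id_vars=["prov"], var_name="variables")
melted = melted[melted["variables"].isin(["gdp"])]
dtype_before = melted["value"].dtype

# Convert the remaining values back to a numeric dtype explicitly.
fixed = melted.assign(value=pd.to_numeric(melted["value"]))
dtype_after = fixed["value"].dtype
```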
Similar to what we did with BoxWhisker, we create a violin plot of multiple variables for the different categories.
%%opts Violin [height=500 width=800 ylim=(-8,18)]
%%opts Violin [xrotation=45]
%%opts Violin (box_color="prov" box_cmap="Category20")
violin_all = hv.Violin(melted_df, kdims=["prov","variables"], vdims="value")
violin_all
In this section, we show how to create a hexagonal binning plot, commonly called a hexbin plot, using HoloViews. In the following example, we create a hexbin plot to illustrate the relationship between uinc (urban disposable income per capita) and gdp.
HoloViews uses the HexTiles element to create a hexbin plot. In this example, we also include a kernel density estimate using the Bivariate element. Then we overlay the hexbin plot and the kernel density plot to merge them into one plot.
%%opts HexTiles [width=550 height=500 tools=["hover"] xlabel='Urban disposable income per capita (CNY)' ylabel='GDP (x10⁸CNY)' colorbar=True]
%%opts HexTiles (cmap="OrRd")
%%opts Bivariate [show_legend=False]
%%opts Bivariate (cmap="OrRd")
hextiles = hv.HexTiles(data=df, kdims=["uinc", "gdp"])
bivariate = hv.Bivariate(data=df, kdims=["uinc", "gdp"])
hextiles * bivariate
%%opts Curve [tools=["hover"] xlabel="Year" ylabel="GDP (x10⁸CNY)"]
line1 = hv.Curve(df[df["prov"] == "Guangdong"], kdims="year", vdims="gdp",label="Guangdong")
line2 = hv.Curve(df[df["prov"] == "Jiangsu"], kdims="year", vdims="gdp",label="Jiangsu")
line3 = hv.Curve(df[df["prov"] == "Shandong"], kdims="year", vdims="gdp",label="Shandong")
line4 = hv.Curve(df[df["prov"] == "Zhejiang"], kdims="year", vdims="gdp",label="Zhejiang")
line5 = hv.Curve(df[df["prov"] == "Henan"], kdims="year", vdims="gdp",label="Henan")
lines=(line1 + line2 + line3 + line4 + line5)
lines.cols(3)
%%opts Bars [tools=["hover"] xlabel="Province"]
%%opts Bars [height=400 width=400 xrotation=45]
bar1 = hv.Bars(df, "prov","gdp", label="GDP of each province")
bar2 = hv.Bars(df, "prov","pop", label="Population of each province")
bar3 = hv.Bars(df, "prov","finv", label="Fixed assets investment of each province")
bar4 = hv.Bars(df, "prov","trade", label="Trade of each province")
bar1.opts(color="red",ylabel="GDP (x10⁸CNY)")
bar2.opts(color="blue",ylabel="Population (x10⁴ person)")
bar3.opts(color="green",ylabel="Fixed assets investment (x10⁸ CNY)")
bar4.opts(color="orange",ylabel="Trade (CNY)")
bars = bar1 + bar2 + bar3 + bar4
bars.cols(2)
Now, we can print the layout structure and display only a single plot.
print(bars)
:Layout
   .Bars.GDP_of_each_province                     :Bars   [prov]   (gdp)
   .Bars.Population_of_each_province              :Bars   [prov]   (pop)
   .Bars.Fixed_assets_investment_of_each_province :Bars   [prov]   (finv)
   .Bars.Trade_of_each_province                   :Bars   [prov]   (trade)
bars.Bars.GDP_of_each_province
In many cases, we want to stack multiple bar plots rather than lay them out side by side. Here, we explain how to create a stacked bar chart using HoloViews.
%%opts Bars [stacked=True width=600 height=400 tools=["hover"]]
%%opts Bars [show_legend=True legend_position="right" legend_opts={"title": "GDP:"}]
%%opts Bars [xrotation=45 ylabel="GDP (x10⁸CNY)"]
bar = hv.Bars(df,
kdims=["year", "prov"],
vdims=["gdp"])
bar
We can also create a grouped bar chart for each category, i.e. province in this example. The code is essentially the same as in the previous example, except that we remove stacked=True or set it to False.
%%opts Bars [width=950 height=450 tools=["hover"]]
%%opts Bars [show_legend=True legend_position="top" legend_opts={"title": "GDP"}]
%%opts Bars [xrotation=45 xlabel="Province, Year"]
provinces = ['Guangdong', 'Jiangsu', 'Zhejiang','Shandong','Henan']
bar_group = hv.Bars(df[df["prov"].isin(provinces)],
kdims=["prov","year"],
vdims=["gdp"])
bar_group
We create 6 box plots for the 6 numerical variables, then lay them out two per row. As discussed in the previous section, we can easily set different options for each subplot separately using the opts() method. For example, we set the correct ylabel for each variable in the subplots.
%%opts BoxWhisker [tools=["hover"] xlabel="Provinces" height=400 width=400 xrotation=45]
boxs1 = hv.BoxWhisker(df, kdims="prov", vdims="gdp", label="GDP of the 5 provinces")
boxs2 = hv.BoxWhisker(df, kdims="prov", vdims="pop", label="Population of the 5 provinces")
boxs3 = hv.BoxWhisker(df, kdims="prov", vdims="finv", label="Fixed investment of the 5 provinces")
boxs4 = hv.BoxWhisker(df, kdims="prov", vdims="trade", label="Trade of the 5 provinces")
boxs5 = hv.BoxWhisker(df, kdims="prov", vdims="fexpen", label="Fiscal expenditure of the 5 provinces")
boxs6 = hv.BoxWhisker(df, kdims="prov", vdims="uinc", label="Urban income of the 5 provinces")
# set different ylabels
boxs1.opts(ylabel="GDP (x10⁸CNY)")
boxs2.opts(ylabel="Population (x10⁴person)")
boxs3.opts(ylabel="Fixed assets investment (x10⁸CNY)")
boxs4.opts(ylabel="Trade (CNY)")
boxs5.opts(ylabel="Fiscal expenditure (x10⁹CNY)")
boxs6.opts(ylabel="Urban disposable income per capita (CNY)")
(boxs1 + boxs2 + boxs3 + boxs4 + boxs5 + boxs6).cols(2)
Similar to box-whisker plots, we can easily create different layouts for multiple violin plots by changing BoxWhisker into Violin.
%%opts Violin [tools=["hover"] xlabel="Provinces" height=400 width=400 xrotation=45 ylim=(-7,18) ]
violin1 = hv.Violin(df, kdims="prov", vdims="gdp", label="GDP of the 5 provinces")
violin2 = hv.Violin(df, kdims="prov", vdims="pop", label="Population of the 5 provinces")
violin3 = hv.Violin(df, kdims="prov", vdims="finv", label="Fixed investment of the 5 provinces")
violin4 = hv.Violin(df, kdims="prov", vdims="trade", label="Trade of the 5 provinces")
violin5 = hv.Violin(df, kdims="prov", vdims="fexpen", label="Fiscal expenditure of the 5 provinces")
violin6 = hv.Violin(df, kdims="prov", vdims="uinc", label="Urban income of the 5 provinces")
# set different ylabels
violin1.opts(ylabel="GDP (x10⁸CNY)")
violin2.opts(ylabel="Population (x10⁴person)")
violin3.opts(ylabel="Fixed assets investment (x10⁸CNY)")
violin4.opts(ylabel="Trade (CNY)")
violin5.opts(ylabel="Fiscal expenditure (x10⁹CNY)")
violin6.opts(ylabel="Urban disposable income per capita (CNY)")
(violin1 + violin2 + violin3 + violin4 + violin5 + violin6).cols(2)
Since we now know all the other types of plots, creating a histogram layout is similar. Let's create a 2 x 2 layout of four histogram plots.
import numpy as np
%%opts Histogram [tools=["hover"] height=300 width=400 show_legend=True]
%%opts Histogram (alpha=0.6)
hist1 = hv.Histogram(np.histogram(df['gdp'], bins=24), kdims="gdp", label="GDP histogram")
hist2 = hv.Histogram(np.histogram(df['pop'], bins=24), kdims="pop", label="Population histogram")
hist3 = hv.Histogram(np.histogram(df['finv'], bins=24), kdims="finv", label="Fixed investment histogram")
hist4 = hv.Histogram(np.histogram(df['trade'], bins=24), kdims="trade", label="Trade histogram")
(hist1 + hist2 + hist3 + hist4).cols(2)
An interactive plot app is another special layout of multiple plots, in which a plot comes with sliders and/or selection menus. We can conveniently show other plots by selecting values on the slider or different items in the menu.
First, we need to convert the Pandas DataFrame into a HoloViews Dataset.
In the section on matrix scatter plots, we briefly touched on converting a Pandas DataFrame into a HoloViews Dataset using hv.Dataset(). Let's repeat it to show the whole process.
hd = hv.Dataset(df)
hd
:Dataset [prov,gdpr,year,gdp,pop,finv,trade,fexpen,uinc]
# the dataframe columns of pandas
df.columns
Index(['prov', 'gdpr', 'year', 'gdp', 'pop', 'finv', 'trade', 'fexpen', 'uinc'], dtype='object')
We can choose which variables to convert. For example, we remove 'gdpr'.
hd = hv.Dataset(df, kdims=['year'], vdims=['prov','year', 'gdp', 'pop', 'finv', 'trade', 'fexpen',
'uinc'])
hd
:Dataset [year] (prov,year,gdp,pop,finv,trade,fexpen,uinc)
The advantage of converting the data to a HoloViews Dataset is that we can use some more convenient HoloViews functions, such as the select() and aggregate() methods. For example:
hv.Curve(hd.select(prov='Guangdong'), 'year', 'gdp').opts(tools=["hover"],width=400)
You can see that this is more convenient than the Pandas boolean indexing we used earlier, so you can convert your dataset first before creating different plots. However, the purpose of this and the previous posts is to show how to create the most widely used plots using HoloViews and our Pandas DataFrame.
We simply use the .to() method to map the variables of a HoloViews Dataset into an interactive app. It uses groupby to create the slider or menu. HoloViews also provides powerful functions for creating interactive dashboards, which are better covered in a future post. Thus, we only touch on this topic here with the 2 simple examples that follow.
curves_app = hd.to(hv.Curve, kdims=['year'], vdims=['gdp'], groupby='prov')
curves_app.opts(height=400,width=500,
tools=["hover"],
xlabel="Year",
ylabel="GDP (x10⁸CNY)")
bars_app = hd.to(hv.Bars, kdims=['prov'], vdims=['gdp'], groupby='year')
bars_app.opts(height=400,width=500,
xrotation=45,
tools=["hover"],
xlabel="Province",
ylabel="GDP (x10⁸CNY)",
ylim=(0.0,10),
color="red")