Before creating the plots, let's import the required libraries and read the data. In this tutorial, we use pandas to read the dataset, and hvPlot's pandas-like API with its default Bokeh plotting backend to create the plots, so we need to import the `pandas` library and the `hvplot.pandas` module.
import pandas as pd
import hvplot.pandas
We continue to use the GDP (current US$) dataset of the world's top six economies from 1991 to 2020 for the examples in this article. Let's read it directly from one of my GitHub repositories as follows.
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-pydm/gdp_top_six_economies.csv'
df = pd.read_csv(url)
df.head()
| | Year | China | Germany | India | Japan | United Kingdom | United States |
|---|---|---|---|---|---|---|---|
| 0 | 1991 | 3.833733e+11 | 1.868945e+12 | 2.701053e+11 | 3.584420e+12 | 1.142797e+12 | 6.158129e+12 |
| 1 | 1992 | 4.269157e+11 | 2.131572e+12 | 2.882084e+11 | 3.908809e+12 | 1.179660e+12 | 6.520327e+12 |
| 2 | 1993 | 4.447313e+11 | 2.071324e+12 | 2.792960e+11 | 4.454144e+12 | 1.061389e+12 | 6.858559e+12 |
| 3 | 1994 | 5.643247e+11 | 2.205074e+12 | 3.272756e+11 | 4.998798e+12 | 1.140490e+12 | 7.287236e+12 |
| 4 | 1995 | 7.345479e+11 | 2.585792e+12 | 3.602820e+11 | 5.545564e+12 | 1.346423e+12 | 7.639749e+12 |
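If the URL is unreachable, the examples can still be tried on a small stand-in frame. The sketch below simply retypes the five rows printed by `df.head()`; the name `df_demo` is hypothetical, and the frame covers only 1991 to 1995 rather than the full dataset.

```python
import pandas as pd

# Hand-typed copy of the df.head() output (1991-1995 only).
# 'df_demo' is an illustrative name, not part of the original tutorial.
df_demo = pd.DataFrame({
    'Year': [1991, 1992, 1993, 1994, 1995],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11, 5.643247e+11, 7.345479e+11],
    'Germany': [1.868945e+12, 2.131572e+12, 2.071324e+12, 2.205074e+12, 2.585792e+12],
    'India': [2.701053e+11, 2.882084e+11, 2.792960e+11, 3.272756e+11, 3.602820e+11],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12, 4.998798e+12, 5.545564e+12],
    'United Kingdom': [1.142797e+12, 1.179660e+12, 1.061389e+12, 1.140490e+12, 1.346423e+12],
    'United States': [6.158129e+12, 6.520327e+12, 6.858559e+12, 7.287236e+12, 7.639749e+12],
})

# Same column layout as the full dataset: 'Year' plus six country columns.
print(df_demo.shape)
print(list(df_demo.columns))
```

Any of the plotting calls in this article should accept `df_demo` in place of `df`, although the resulting charts will of course show only five years.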
By default, hvPlot overlays the plots on a single set of axes when we plot multiple columns. For example, plotting the GDP of the six economies produces one chart with multiple lines, as shown in the last article. We repeat it here to make it easier to see how subplots differ.
df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=700, height=400,
title="GDP of World top 6 Economies",
group_label='GDP',
legend='top_left')
In our case, a single multi-line plot works well because all the columns measure the same indicator, GDP.
In many cases, however, each column represents a different quantity, for example GDP, population, age or education. For such cases, it is probably better to create a separate subplot for each column. To create subplots, we just need to specify `subplots=True` and use the `cols()` method to set the number of subplots per row. In the following example, we lay out 3 subplots per row.
By default, hvPlot creates all subplots with linked and normalized axes, so zooming or panning one plot updates the others as well. This makes it convenient to compare numerical values across plots.
df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=300, height=250,
group_label='GDP',
subplots=True).cols(3)
However, it is better to give each plot its own range if the data cover widely different numerical ranges or the variables have different units. To create subplots with independent ranges, we just need to specify `shared_axes=False`.
df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=300, height=300,
group_label='GDP',
subplots=True,
shared_axes=False).cols(3)
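One quick way to judge whether independent axes are worth using is to compare the largest values of the columns. The sketch below is only a rough check, not an hvPlot feature: it uses two columns retyped from the `df.head()` output (1991 to 1995), and the names `sample`, `col_max` and `ratio` are hypothetical.

```python
import pandas as pd

# Two columns retyped from the df.head() output (1991-1995 only).
sample = pd.DataFrame({
    'India': [2.701053e+11, 2.882084e+11, 2.792960e+11, 3.272756e+11, 3.602820e+11],
    'United States': [6.158129e+12, 6.520327e+12, 6.858559e+12, 7.287236e+12, 7.639749e+12],
})

# Largest value in each column.
col_max = sample.max()

# How many times larger the biggest column peak is than the smallest one.
ratio = col_max.max() / col_max.min()
print(round(ratio, 1))
```

A large ratio suggests that, on a shared y-axis, the smaller series would be squeezed toward the bottom of its panel, which is the situation `shared_axes=False` is meant to address.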
Let's use the pandas `melt()` function to reshape the dataset into long format, as we did in the last article.
df_melt = pd.melt(df, id_vars=['Year'],
value_vars=['China', 'Germany', 'India', 'Japan', 'United Kingdom','United States'],
var_name="Country",
value_name='GDP')
df_melt
| | Year | Country | GDP |
|---|---|---|---|
| 0 | 1991 | China | 3.833733e+11 |
| 1 | 1992 | China | 4.269157e+11 |
| 2 | 1993 | China | 4.447313e+11 |
| 3 | 1994 | China | 5.643247e+11 |
| 4 | 1995 | China | 7.345479e+11 |
| ... | ... | ... | ... |
| 175 | 2016 | United States | 1.869511e+13 |
| 176 | 2017 | United States | 1.947962e+13 |
| 177 | 2018 | United States | 2.052716e+13 |
| 178 | 2019 | United States | 2.137257e+13 |
| 179 | 2020 | United States | 2.089374e+13 |

180 rows × 3 columns
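The row count follows directly from the reshape: long format holds one row per (Year, Country) pair, so 30 years times 6 countries gives 180 rows. The sketch below checks the same relationship on a small two-country excerpt and pivots back to wide format; the names `wide`, `long_df` and `back` are hypothetical.

```python
import pandas as pd

# Small excerpt retyped from the tables above (three years, two countries).
wide = pd.DataFrame({
    'Year': [1991, 1992, 1993],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11],
    'India': [2.701053e+11, 2.882084e+11, 2.792960e+11],
})

# Wide -> long: one row per (Year, Country) pair.
long_df = pd.melt(wide, id_vars=['Year'],
                  value_vars=['China', 'India'],
                  var_name='Country',
                  value_name='GDP')
print(len(long_df))  # 3 years x 2 countries

# Long -> wide again: one column per country, indexed by Year.
back = long_df.pivot(index='Year', columns='Country', values='GDP')
print(back)
```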
For data in this long format, we can also use the `subplots=True` and/or `shared_axes` arguments when using the `by` keyword to group the data along a dimension. In the following example, we create 6 scatter subplots with 2 plots per row.
df_melt.hvplot.scatter(x='Year', y='GDP',
by='Country',
subplots=True,
shared_axes=False,
width=400, height=250,
alpha=0.5).cols(2)
hvPlot also provides a convenient way to arrange multidimensional data into an explicit 1D row of plots or a 2D grid of plots with shared axes. Such grids make it easier to compare values across a large number of plots. To make a row of plots, you just need to specify the `col` keyword; add the `row` keyword if you want a grid.
df_melt.sort_values('GDP').hvplot.scatter(x='Year', y='GDP',
row='Country',col='Year',
width=400, height=250,
alpha=0.9,rot=90)
Similar to HoloViews, hvPlot provides an easy way to overlay two or more plots using the `*` operator. Let's look at some examples.
In this example, we overlay the GDP line plot of the United States with the GDP bar plot of China.
line_usa = df.hvplot(x='Year', y='United States',
width=700, height=400,
ylabel='GDP (current US$)',
rot=90)
bar_china = df.hvplot.bar(x='Year',
y='China',
width=700, height=400,
ylabel='GDP (current US$)',
color='tomato',
rot=90)
line_usa * bar_china
In this example, we overlay a scatter plot on a bivariate plot. We can overlay them directly using `*`.
df.hvplot.bivariate(x='China', y='United States', legend=False) *\
df.hvplot.scatter(x="China", y="United States", size=30, alpha=0.7,
xlabel="GDP of China",
ylabel="GDP of the United States",
color='red')
In the following example, we overlay a jitter scatter plot over a box plot. Let's show the process step by step.
boxplot = df_melt.hvplot.box(y='GDP', by='Country', height=400, width=700, legend=False)
boxplot
scatterplot = df_melt.hvplot.scatter(x="Country", y="GDP",c='orange').opts(jitter=0.5)
scatterplot
boxplot * scatterplot
hvPlot also lets us lay out plots side by side using the `+` operator. In this example, we place the multiple line chart next to a stacked bar chart.
multiline = df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=450, height=400,
title="GDP of World top 6 Economies",
group_label='GDP',
legend='top_left')
stackedbar = df.hvplot.bar(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
stacked=True,
width=450, height=400,
ylabel='GDP (current US$)',
legend='top_left',
rot=90)
multiline + stackedbar
It is also easy to set the number of plots per row using the `cols()` method when there are many plots. In the following example, we add a box plot and an area plot to the two plots above.
boxplot = df.hvplot.box(y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
width=450,height=400,
box_width=0.6,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)')
areaplot = df.hvplot.area(x='Year',
y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
xlabel= 'Year',
ylabel='GDP (current US$)',
width=450,height=400,
legend='top_left',
alpha=0.4)
Then, let's arrange them with 2 plots per row rather than the default of up to 4 plots in a single row.
(multiline + stackedbar + boxplot + areaplot).cols(2)
hvPlot also allows us to combine overlays and layouts. In this example, we overlay the bivariate plot and the scatter plot from above, and then place the data table beside them.
bivariate = df.hvplot.bivariate(x='China', y='United States', legend=False,
width=550, height=400)
scatter = df.hvplot.scatter(x="China", y="United States", size=30, alpha=0.7,
xlabel="GDP of China",
ylabel="GDP of the United States",
color='red')
table = df.hvplot.table(['China', 'United States'], width=350, height=350)
bivariate * scatter + table
This article demonstrates how to easily create subplots and how to overlay and lay out multiple plots using hvPlot. It covers five essential topics: (1) creating subplots, (2) overlaying multiple plots, (3) laying out multiple plots, (4) combining overlays and layouts, and (5) arranging the number of plots per row.