Let's read the dataset directly from one of my GitHub repositories as follows.
import pandas as pd
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-pydm/gdp_top_six_economies.csv'
df = pd.read_csv(url)
df.head()
 | Year | China | Germany | India | Japan | United Kingdom | United States |
---|---|---|---|---|---|---|---|
0 | 1991 | 3.833733e+11 | 1.868945e+12 | 2.701053e+11 | 3.584420e+12 | 1.142797e+12 | 6.158129e+12 |
1 | 1992 | 4.269157e+11 | 2.131572e+12 | 2.882084e+11 | 3.908809e+12 | 1.179660e+12 | 6.520327e+12 |
2 | 1993 | 4.447313e+11 | 2.071324e+12 | 2.792960e+11 | 4.454144e+12 | 1.061389e+12 | 6.858559e+12 |
3 | 1994 | 5.643247e+11 | 2.205074e+12 | 3.272756e+11 | 4.998798e+12 | 1.140490e+12 | 7.287236e+12 |
4 | 1995 | 7.345479e+11 | 2.585792e+12 | 3.602820e+11 | 5.545564e+12 | 1.346423e+12 | 7.639749e+12 |
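If the URL cannot be reached, the five rows printed above are enough to try most of the calls in this article. The following is only a minimal sketch: the name df_sample is my own choice, and just three of the six economies are typed in to keep it short.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: the five rows printed above,
# limited to three economies to keep the sketch short.
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993, 1994, 1995],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11, 5.643247e+11, 7.345479e+11],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12, 4.998798e+12, 5.545564e+12],
    'United States': [6.158129e+12, 6.520327e+12, 6.858559e+12, 7.287236e+12, 7.639749e+12],
})

# Year is an integer column, the GDP columns are floats.
print(df_sample.dtypes)
```

With only five years of data the distribution plots (histogram, KDE, violin) will look sparse, so this stand-in is best suited to the line, bar and scatter examples.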
To use the pandas-like plotting API with the default Bokeh plotting backend, we first need to import hvplot.pandas as follows.
import hvplot.pandas
In general, similar to pandas, hvPlot offers two ways to call its pandas-like plotting API on a DataFrame:
df.hvplot.<kind>(...)
df.hvplot(kind=<kind>, ...)
In the following sections, we will use the df.hvplot.<kind>(...) form.
We start with line plots because they are probably the most widely used plots in data visualization.
Let's create a single line plot using the GDP of the United States. We can call hvplot() directly instead of hvplot.line() because the default plot kind is line.
df.hvplot(x='Year',
y='United States',
ylabel= 'GDP of United States(current US$)',
width=700, height=400,
title="GDP of the United States")
In this example, we create a multi-line plot for the GDP of the 6 economies. If you like, you can pass only the economies that you want to plot.
df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=700, height=400,
title='GDP of World top 6 Economies')
Besides, we can easily change the legend title with group_label if the default name 'Variable' does not fit. Let's use GDP instead. In addition, the legend can be moved to another position, for example top_left in the following example.
df.hvplot(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
ylabel= 'GDP (current US$)',
width=700, height=400,
title='GDP of World top 6 Economies',
group_label='GDP',
legend='top_left')
gdp_mean = df.agg('mean')
gdp_mean = gdp_mean.drop('Year')
gdp_mean
China             5.070277e+12
Germany           2.941571e+12
India             1.199736e+12
Japan             4.864806e+12
United Kingdom    2.206106e+12
United States     1.320973e+13
dtype: float64
gdp_mean.hvplot.bar(width=700, height=400,
xlabel='World top 6 economies',
ylabel='Average GDP (current US$)')
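The averages can also be computed by dropping the Year column first, and sorting them before plotting puts the bars in descending order. The sketch below is one possible variant, not the only way to do it; df_sample is a hypothetical stand-in holding the five rows printed at the top of the article for two economies.

```python
import pandas as pd

# Hypothetical stand-in data (first five rows of two columns).
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993, 1994, 1995],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11, 5.643247e+11, 7.345479e+11],
    'United States': [6.158129e+12, 6.520327e+12, 6.858559e+12, 7.287236e+12, 7.639749e+12],
})

# Approach used in the article: aggregate everything, then drop Year.
mean_a = df_sample.agg('mean').drop('Year')

# Alternative: drop Year first, then take the column means.
mean_b = df_sample.drop(columns='Year').mean()

# Sort so that the largest average comes first.
mean_sorted = mean_b.sort_values(ascending=False)
print(mean_sorted)
```

The sorted Series can be passed to hvplot.bar() in exactly the same way as gdp_mean.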
We can easily change the bar color with the color parameter. For example, we set color='tomato' in the following example.
gdp_mean.hvplot.bar(width=700, height=400,
xlabel='World top 6 economies',
ylabel='GDP (current US$)',
color='tomato')
Here, we create a time series bar plot to show how the annual GDP of China grows over the years.
df.hvplot.bar(x='Year',
y='China',
width=700, height=400,
ylabel='GDP (current US$)',
rot=90)
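The bar plot above shows GDP levels. If the year-over-year growth rate is of interest instead, it can be derived with pandas before plotting. This is a minimal sketch under the assumption that growth means the percentage change from the previous year; the Series china_gdp is a hypothetical stand-in holding the first five values of the China column.

```python
import pandas as pd

# Hypothetical stand-in: the first five China values, indexed by year.
china_gdp = pd.Series(
    [3.833733e+11, 4.269157e+11, 4.447313e+11, 5.643247e+11, 7.345479e+11],
    index=pd.Index([1991, 1992, 1993, 1994, 1995], name='Year'),
    name='China',
)

# Percentage change from the previous year; the first year has no
# predecessor, so its value is NaN.
growth_pct = china_gdp.pct_change() * 100
print(growth_pct.round(2))
```

The resulting Series can then be plotted with hvplot.bar() in the same way as the gdp_mean Series earlier.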
In the following example, we create a stacked bar plot on the annual GDP of 6 economies.
df.hvplot.bar(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
stacked=True,
width=700, height=400,
ylabel='GDP (current US$)',
rot=90)
We can use the barh() method to create a horizontal bar plot.
df.hvplot.barh(x='Year',
y = ['United States', 'China', 'Japan','Germany','United Kingdom','India'],
stacked=True,
width=700, height=400,
ylabel='GDP (current US$)',
legend='bottom_right')
We can easily create a scatter plot by calling the scatter() method on the hvplot accessor and passing the x and y columns. We can set the marker size, the transparency, the color and other elements of the chart. In the following example, we create a scatter plot of the GDP of the United States against the GDP of China.
df.hvplot.scatter(x="China",
y="United States",
size=50,
alpha=0.7,
xlabel="GDP of China",
ylabel="GDP of the United States",
color='red',
title="GDP of United States Vs. China",
)
Besides, we can color the points by another column, Year in this case, with the c parameter, which also adds a color bar.
df.hvplot.scatter(x="China",
y="United States",
size=50,
alpha=0.7,
xlabel="GDP of China",
ylabel="GDP of the United States",
title="GDP of United States Vs. China",
c='Year')
df.hvplot.hist(y='Japan',
width=700, height=400,
bins=20,
alpha=0.9,
xlabel= 'GDP of Japan (current US$)',
ylabel='Frequency',
title='GDP Distribution of Japan')
We can also create an overlapped histogram to compare the distributions of different categories, i.e. economies in our case. In the following example, we create an overlapped histogram for the world's top 3 economies, i.e. the United States, China and Japan.
df.hvplot.hist(y = ['United States', 'China', 'Japan'],
width=700, height=400,
alpha=0.7,
bins=30,
xlabel= 'GDP of United States, China and Japan (current US$)',
ylabel='Frequency')
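The bins parameter sets how many intervals the value range is divided into. To see what that means in numbers, the sketch below counts values per bin with NumPy, assuming equal-width bins that span the data from its minimum to its maximum. The array japan_gdp is a hypothetical stand-in holding the first five Japan values printed at the top of the article.

```python
import numpy as np

# Hypothetical stand-in: the first five Japan values.
japan_gdp = np.array([3.584420e+12, 3.908809e+12, 4.454144e+12,
                      4.998798e+12, 5.545564e+12])

# Split the range [min, max] into 4 equal-width bins and count the values.
counts, edges = np.histogram(japan_gdp, bins=4)

print(counts)      # number of values that fall in each bin
print(len(edges))  # 4 bins are delimited by 5 edges
```

A larger bins value gives narrower intervals and a more detailed but noisier picture, which matters for a short series like 30 years of annual data.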
df.hvplot.box(y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
height=400,
box_width=0.6,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)')
A box plot is vertical by default, but hvPlot can also create a horizontal box plot by setting invert=True.
df.hvplot.box(y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
height=400,
box_width=0.6,
invert=True,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)')
First, let's reshape the dataset by creating a Country column (i.e. a categorical variable) and a GDP column (a value variable) using the pandas melt() method.
df_melt = pd.melt(df, id_vars=['Year'],
value_vars=['China', 'Germany', 'India', 'Japan', 'United Kingdom','United States'],
var_name='Country',
value_name='GDP')
df_melt
 | Year | Country | GDP |
---|---|---|---|
0 | 1991 | China | 3.833733e+11 |
1 | 1992 | China | 4.269157e+11 |
2 | 1993 | China | 4.447313e+11 |
3 | 1994 | China | 5.643247e+11 |
4 | 1995 | China | 7.345479e+11 |
... | ... | ... | ... |
175 | 2016 | United States | 1.869511e+13 |
176 | 2017 | United States | 1.947962e+13 |
177 | 2018 | United States | 2.052716e+13 |
178 | 2019 | United States | 2.137257e+13 |
179 | 2020 | United States | 2.089374e+13 |
180 rows × 3 columns
In many cases, datasets are organized in this long format, with a category/class variable. Now, we create a box plot for this kind of dataset by setting the by parameter to the category variable, i.e. by='Country' in this example.
df_melt.hvplot.box(y='GDP',
by='Country',
height=400,
box_width=0.6,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)')
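Grouping with by='Country' has a direct pandas counterpart in groupby. Box plots are usually drawn from the quartiles of each group, so the sketch below computes them per country as a rough numeric companion to the plot; it does not reproduce how hvPlot draws whiskers or outliers. The names df_sample and long_sample are hypothetical, and the data are the first five rows of two columns.

```python
import pandas as pd

# Hypothetical stand-in data (first five rows of two columns).
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993, 1994, 1995],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11, 5.643247e+11, 7.345479e+11],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12, 4.998798e+12, 5.545564e+12],
})

# Reshape to long format, as done with pd.melt() above.
long_sample = df_sample.melt(id_vars=['Year'], var_name='Country', value_name='GDP')

# One row per country, one column per quartile (0.25, 0.5, 0.75).
quartiles = (long_sample.groupby('Country')['GDP']
             .quantile([0.25, 0.5, 0.75])
             .unstack())
print(quartiles)
```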
df.hvplot.violin(y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
height=400,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)',
ylim=(-3.500e+13,5.000e+13))
Similar to the box plot, we can create a violin plot from a long-format DataFrame such as df_melt above.
df_melt.hvplot.violin(y='GDP',
by='Country',
height=400,
xlabel= 'World top 6 economies',
ylabel='GDP (current US$)',
ylim=(-3.500e+13,5.000e+13))
df.hvplot.area(x='Year',
y='Germany',
height=400,
alpha=0.4,
xlabel= 'Germany',
ylabel='GDP (current US$)')
We can pass more than one column to the area() method to create an overlapped area plot. By default, the areas are stacked.
df.hvplot.area(x='Year',
y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
xlabel= 'Year',
ylabel='GDP (current US$)',
width=700,height=400,
alpha=0.4)
To prevent the areas from stacking on top of one another, just set stacked=False.
df.hvplot.area(x='Year',
y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
xlabel= 'Year',
ylabel='GDP (current US$)',
width=700,height=400,
stacked=False,
alpha=0.4)
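The difference between the two area plots can be expressed with a cumulative sum: in a stacked plot each band sits on top of the previous one, so its upper edge is the running total across the columns, while with stacked=False every area starts from zero. The sketch below illustrates this with pandas, assuming the bands are stacked in the order in which the columns are listed; df_sample is a hypothetical stand-in with three rows.

```python
import pandas as pd

# Hypothetical stand-in data (first three rows of three columns).
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993],
    'United States': [6.158129e+12, 6.520327e+12, 6.858559e+12],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11],
})
cols = ['United States', 'Japan', 'China']
wide = df_sample.set_index('Year')[cols]

# Upper edge of each band when the columns are stacked in list order.
stacked_tops = wide.cumsum(axis=1)

# The topmost edge equals the combined GDP of the listed economies.
total = wide.sum(axis=1)
print(stacked_tops)
```

This is why the y-axis of the stacked plot reaches much higher values than that of the unstacked one.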
df.hvplot.kde(y='United States',
width=700, height=400,
alpha=0.9,
color='orangered',
xlabel='The United States GDP (current US$)',
ylabel='Density',
title='The United States GDP Distribution')
Similar to creating the overlapped histograms, we can easily create overlapped KDEs.
df.hvplot.kde(y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
width=700, height=400,
alpha=0.9,
ylabel='Density')
hvPlot also provides a step() method to create a step plot.
df.hvplot.step(x='Year',
y=['United States', 'China', 'Japan','Germany','United Kingdom','India'],
value_label='GDP (current US$)',
legend='top',
height=500, width=700)
It is easiest to create a heatmap with the heatmap() method when the dataset has a categorical column. Let's use the df_melt DataFrame for this example. The color bar is displayed by default, and you can turn it off with the parameter colorbar=False.
df_melt.hvplot.heatmap(x='Year',
y='Country',
C='GDP',
height=500, width=700,
colorbar=False)
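The long and the wide layouts hold the same information, so one can be converted into the other. As a minimal sketch, pandas pivot() undoes what melt() did. The names df_sample, long_sample and wide_again are hypothetical, and the data are the first three rows of two columns.

```python
import pandas as pd

# Hypothetical stand-in data (first three rows of two columns).
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12],
})

# Wide to long: one row per (Year, Country) pair.
long_sample = pd.melt(df_sample, id_vars=['Year'],
                      var_name='Country', value_name='GDP')

# Long back to wide: years become the index, countries become columns.
wide_again = long_sample.pivot(index='Year', columns='Country', values='GDP')
print(wide_again)
```

Note that the pivoted result already has Year as its index, which is the layout the wide-format heatmap call expects.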
For the original wide-format dataset that we read from GitHub, we can create a heatmap in the following way. First, we set 'Year' as the index column.
df_new = df.set_index('Year')
df_new.head()
Year | China | Germany | India | Japan | United Kingdom | United States |
---|---|---|---|---|---|---|
1991 | 3.833733e+11 | 1.868945e+12 | 2.701053e+11 | 3.584420e+12 | 1.142797e+12 | 6.158129e+12 |
1992 | 4.269157e+11 | 2.131572e+12 | 2.882084e+11 | 3.908809e+12 | 1.179660e+12 | 6.520327e+12 |
1993 | 4.447313e+11 | 2.071324e+12 | 2.792960e+11 | 4.454144e+12 | 1.061389e+12 | 6.858559e+12 |
1994 | 5.643247e+11 | 2.205074e+12 | 3.272756e+11 | 4.998798e+12 | 1.140490e+12 | 7.287236e+12 |
1995 | 7.345479e+11 | 2.585792e+12 | 3.602820e+11 | 5.545564e+12 | 1.346423e+12 | 7.639749e+12 |
Then we create the heatmap by setting x='columns' and y='index'. We can also specify a custom color map with cmap and put the x-axis on the top or bottom with xaxis. Besides, we can use opts to add more options.
htmap = df_new.hvplot.heatmap(x='columns',
y='index',
title='GDP of World Top 6 Economies 1991—2020',
cmap=["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"],
xaxis='top',
rot=70,
width=700, height=400)
htmap.opts(
toolbar=None,
fontsize={'title': 10, 'xticks': 5, 'yticks': 5}
)
If you want to put the years on the x-axis and the country names on the y-axis, you can transpose the DataFrame first, and then set the parameter xaxis='bottom'.
df_trans = df_new.transpose()
htmap = df_trans.hvplot.heatmap(x='columns',
y='index',
title='GDP of World Top 6 Economies 1991—2020',
cmap=["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"],
xaxis='bottom',
rot=70,
width=700, height=400)
htmap.opts(
toolbar=None,
fontsize={'title': 10, 'xticks': 5, 'yticks': 5})
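The reason the same x='columns', y='index' call gives swapped axes is that transposing exchanges the row labels and the column labels. A short sketch, using a hypothetical three-row stand-in named df_sample, makes this visible:

```python
import pandas as pd

# Hypothetical stand-in data (first three rows of two columns).
df_sample = pd.DataFrame({
    'Year': [1991, 1992, 1993],
    'China': [3.833733e+11, 4.269157e+11, 4.447313e+11],
    'Japan': [3.584420e+12, 3.908809e+12, 4.454144e+12],
})

wide = df_sample.set_index('Year')
flipped = wide.transpose()

# Before: years label the rows, countries label the columns.
print(list(wide.index), list(wide.columns))

# After: countries label the rows, years label the columns.
print(list(flipped.index), list(flipped.columns))
```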
In the following example, we create a bivariate plot between the United States and China using hvplot.bivariate.
df.hvplot.bivariate('United States', 'China')
Let's see how to create a hexbin plot, which offers a straightforward way to plot dense data. Similar to pandas, we can set gridsize to a small value to make large bins, and vice versa.
df.hvplot.hexbin('United States', 'China',
clabel='Count',
gridsize=20,
height=400, width=700)
The scatter matrix is also widely used to show all the pairwise relationships between the columns of a dataset. Each non-diagonal plot shows one column against another, while each diagonal plot shows the distribution of the data within an individual column.
The scatter matrix function in hvPlot is hvplot.plotting.scatter_matrix, which is closely modelled on pandas.plotting.scatter_matrix.
First, we have to import the scatter_matrix function from the hvplot.plotting module.
from hvplot.plotting import scatter_matrix
By default, each non-diagonal entry plot is a scatter plot and each diagonal plot is a histogram.
scatter_matrix(df, alpha=0.5,
width=600, height=600,
tools=['box_select', 'hover'],
xrotation=90)
This example shows how easy it is to change the non-diagonal plots to bivariate plots and/or the diagonal plots to KDEs.
df_sub = df[['United States', 'China', 'Japan','Germany','United Kingdom','India']]
scatter_matrix(df_sub, chart='bivariate',
width=600, height=600,
tools=['box_select', 'hover'],
diagonal='kde',
xrotation=90
)
Similarly, we can easily change the non-diagonal plots to hexbin plots.
df_sub = df[['United States', 'China', 'Japan','Germany','United Kingdom','India']]
scatter_matrix(df_sub, chart='hexbin',
width=600, height=600,
gridsize=20,
tools=['box_select', 'hover'],
diagonal='kde',
xrotation=90)
In this last section, we look at another convenient feature of hvPlot: it can display DataFrame data as an interactive table by calling the table() method with the column names.
df.hvplot.table(columns=['Year','United States', 'China', 'Japan','Germany','United Kingdom','India'],
sortable=True,
selectable=True,
width=900, height=200)
This article demonstrates how easy it is to use hvPlot to create the most widely used basic plots, using a GDP dataset of the world's top 6 economies from 1991 to 2020. The examples show that hvPlot provides a high-level, pandas-like and easy-to-use API that can create a variety of modern, interactive plots. However, it seems that hvPlot does not yet provide a method for creating pie charts.
This article only covers basic plots, so I will show how to create subplots and how to overlay and lay out multiple plots in future articles.