import pandas as pd
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-pydm/gdp_china_outlier_treated.csv'
df = pd.read_csv(url)
df.head()
| | prov | gdpr | year | gdp | pop | finv | trade | fexpen | uinc |
---|---|---|---|---|---|---|---|---|---|
0 | Guangdong | First | 2000 | 1.074125 | 8.650000 | 0.314513 | 1.408147 | 0.108032 | 0.976157 |
1 | Guangdong | First | 2001 | 1.203925 | 8.733000 | 0.348443 | 1.501391 | 0.132133 | 1.041519 |
2 | Guangdong | First | 2002 | 1.350242 | 8.842000 | 0.385078 | 1.830169 | 0.152108 | 1.113720 |
3 | Guangdong | First | 2003 | 1.584464 | 8.963000 | 0.481320 | 2.346735 | 0.169563 | 1.238043 |
4 | Guangdong | First | 2004 | 1.886462 | 9.052298 | 0.587002 | 2.955899 | 0.185295 | 1.362765 |
Let's rename the columns so that plot labels are readable without looking up the variable abbreviations. For more ways to rename columns, see the earlier article "Convenient Methods to Rename Columns of Dataset with Pandas in Python".
df.columns=['Province','GDP Rank','Year','GDP','Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income']
df.head()
| | Province | GDP Rank | Year | GDP | Population | Fix Investment | Trade | Fiscal Expenditure | Urban Income |
---|---|---|---|---|---|---|---|---|---|
0 | Guangdong | First | 2000 | 1.074125 | 8.650000 | 0.314513 | 1.408147 | 0.108032 | 0.976157 |
1 | Guangdong | First | 2001 | 1.203925 | 8.733000 | 0.348443 | 1.501391 | 0.132133 | 1.041519 |
2 | Guangdong | First | 2002 | 1.350242 | 8.842000 | 0.385078 | 1.830169 | 0.152108 | 1.113720 |
3 | Guangdong | First | 2003 | 1.584464 | 8.963000 | 0.481320 | 2.346735 | 0.169563 | 1.238043 |
4 | Guangdong | First | 2004 | 1.886462 | 9.052298 | 0.587002 | 2.955899 | 0.185295 | 1.362765 |
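Assigning a list to df.columns only works if the columns arrive in exactly the expected order. As a hedged alternative, the sketch below renames by name with DataFrame.rename. The small frame is typed in from the first two rows of the table above so the snippet runs without a download, and the name_map variable is purely illustrative.

```python
import pandas as pd

# Two rows copied from the table above (only a few columns, to keep the sketch short)
sample = pd.DataFrame({
    'prov': ['Guangdong', 'Guangdong'],
    'gdpr': ['First', 'First'],
    'year': [2000, 2001],
    'gdp': [1.074125, 1.203925],
    'pop': [8.650000, 8.733000],
})

# Illustrative mapping from the original short names to readable labels
name_map = {
    'prov': 'Province',
    'gdpr': 'GDP Rank',
    'year': 'Year',
    'gdp': 'GDP',
    'pop': 'Population',
}

# rename() matches columns by name, so the result does not depend on column order
renamed = sample.rename(columns=name_map)
print(renamed.columns.tolist())
```

Because rename() returns a new DataFrame, the original short names remain available in sample.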
hvPlot can apply a groupby along a particular column or dimension to automatically create plot widgets.
To create interactive plots using the pandas-like API, first import hvplot.pandas, which was discussed in the previous articles of the hvPlot series.
import hvplot.pandas
As an example, we create box plots of the socioeconomic variables, grouped by the categorical (string) variable 'Province'.
df.hvplot.box(y=['GDP','Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income'],
groupby='Province',
width=600,height=400)
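Conceptually, groupby='Province' makes the widget switch between one subset of the data per province. The pandas-only sketch below shows those subsets and the quartiles that a box plot summarizes. The two-province frame uses made-up toy values (not taken from the dataset), chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Toy values for illustration only; these are NOT rows from the GDP dataset
toy = pd.DataFrame({
    'Province': ['A', 'A', 'A', 'B', 'B', 'B'],
    'GDP': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Each unique value of the grouping column becomes one entry in the widget
for name, subset in toy.groupby('Province'):
    print(name, subset['GDP'].tolist())

# The median and quartiles per group are what the box of each box plot represents
summary = toy.groupby('Province')['GDP'].quantile([0.25, 0.5, 0.75]).unstack()
print(summary)
```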
By default, the widget is placed on the right. We can change its position with widget_location. In the following example, let's create a bivariate plot of 'Trade' and 'GDP' with the widget on the left side, aligned to the top.
df.hvplot.bivariate(x='Trade', y='GDP',
width=500, groupby='Province',
widget_location='left_top')
hvPlot provides a method to use the Panel library to customize the interactivity of the hvPlot output with more fine-grained control over the layout. Panel is an open-source Python library that lets you create custom interactive web apps and dashboards.
If you do not have Panel installed, you can install it using either conda, provided by Anaconda or Miniconda:
conda install -c pyviz panel
or using pip:
pip install panel
After installing Panel, you can use it with hvPlot. For example, we can replace the default selector with a slider by passing a dict of widgets as follows.
import panel as pn
df.hvplot.bivariate(x='Trade',
y='GDP',
width=600, groupby='Province',
widgets={'Province': pn.widgets.DiscreteSlider})
With Panel, hvPlot provides many other ways of expanding the interactivity of its objects. For example, it allows us to select which variable to plot on the x and y axes, and what kind of plot to create.
x = pn.widgets.Select(name='x',
options=['Year', 'Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income'])
y = pn.widgets.Select(name='y',
options=['GDP','Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income'])
kind = pn.widgets.Select(name='kind', value='scatter',
options=['bivariate', 'scatter'])
plot = df.hvplot(x=x, y=y, kind=kind, colorbar=False, width=600)
pn.Row(pn.WidgetBox(x, y, kind), plot)
In addition to passing widgets directly as arguments, we can also pass functions decorated with pn.depends.
x = pn.widgets.Select(name='x',
options=['Year', 'Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income'])
y = pn.widgets.Select(name='y',
options=['GDP','Population','Fix Investment','Trade','Fiscal Expenditure','Urban Income'])
kind = pn.widgets.Select(name='kind', value='scatter',
options=['line','bivariate', 'scatter'])
by_province = pn.widgets.Checkbox(name='Province')
color = pn.widgets.ColorPicker(value='#ff0000')
@pn.depends(by_province, color)
def by_province_name(by_province, color):
    return 'Province' if by_province else color
plot = df.hvplot(x=x, y=y, kind=kind, c=by_province_name, colorbar=False, width=600)
pn.Row(pn.WidgetBox(x, y, kind, color, by_province), plot)
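The decorated function is plain Python, so its logic can be checked without any widgets. The sketch below repeats the body of by_province_name as an undecorated function (the name choose_color is made up for this illustration) and calls it with the two checkbox states.

```python
def choose_color(by_province, color):
    """Undecorated copy of the callback body, for illustration only."""
    # Checkbox ticked: return the column name 'Province', which is then passed as c=
    # Checkbox cleared: return the color picker's value, a single fixed color
    return 'Province' if by_province else color

print(choose_color(True, '#ff0000'))
print(choose_color(False, '#ff0000'))
```

In the article's version, pn.depends re-runs this logic whenever the checkbox or the color picker changes.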
hvPlot's .interactive API provides a convenient way to connect widgets directly into an expression that you want to control. In this section, we look at some simple examples.
For example, we create a float slider ranging from 0 to the maximum GDP, rounded to the nearest integer.
gdp_slider = pn.widgets.FloatSlider(name='Minimum GDP', start=0, end=round(max(df['GDP'])), value=0)
gdp_slider
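One detail worth checking: round() can round down, which would leave the slider's upper end slightly below the largest GDP value. The sketch below compares round() with math.ceil() on a hypothetical maximum (9.4 is an invented number, not the dataset's actual maximum).

```python
import math

hypothetical_max_gdp = 9.4  # invented value for illustration, not from the dataset

end_rounded = round(hypothetical_max_gdp)     # rounds to the nearest integer
end_ceiled = math.ceil(hypothetical_max_gdp)  # always rounds up

print(end_rounded, end_ceiled)
```

With math.ceil(), the slider range always covers the maximum. Whether that matters depends on the filter you build on top of the slider.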
We filter with dfi['GDP'] > gdp_slider and use hvPlot's .interactive() to connect the widget directly into this filter expression. The result displays the slider together with the filtered DataFrame table.
dfi = df.interactive(width=700)
dfi[dfi['GDP'] > gdp_slider]
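For any single slider position, the interactive expression behaves like an ordinary boolean-mask filter. The sketch below applies the same comparison with a fixed threshold of 1.3 (an arbitrary value chosen for this illustration) to the five Guangdong rows shown in the table earlier in the article.

```python
import pandas as pd

# Year and GDP for the five Guangdong rows shown in the table earlier
sample = pd.DataFrame({
    'Year': [2000, 2001, 2002, 2003, 2004],
    'GDP': [1.074125, 1.203925, 1.350242, 1.584464, 1.886462],
})

threshold = 1.3  # stands in for the current slider value (arbitrary choice)

# Same comparison as dfi['GDP'] > gdp_slider, but with a plain number
filtered = sample[sample['GDP'] > threshold]
print(filtered)
```

The interactive version re-evaluates this filter each time the slider moves.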
In this example, we create a histogram that updates as the slider value changes.
import numpy as np
dfi[dfi['GDP'] > gdp_slider].hvplot.hist(y='GDP', bins=np.linspace(0, 10, 30))
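Note that np.linspace(0, 10, 30) returns 30 bin edges, which means 29 bins. The NumPy sketch below checks this on the five Guangdong GDP values from the table earlier in the article. np.histogram is used here only to illustrate the binning, on the assumption that hvPlot bins the data in a comparable way; it is not necessarily what hvPlot calls internally.

```python
import numpy as np

# GDP values of the five Guangdong rows shown in the table earlier
gdp = np.array([1.074125, 1.203925, 1.350242, 1.584464, 1.886462])

edges = np.linspace(0, 10, 30)             # 30 edges define 29 bins
counts, _ = np.histogram(gdp, bins=edges)  # np.histogram ignores values outside [0, 10]

print(len(edges), len(counts), counts.sum())
```

If the data can exceed 10, a data-driven upper edge such as the maximum GDP may be a safer choice for the last bin edge.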
In another example, let's create a bivariate plot of Trade and GDP.
dfi[dfi['GDP'] > gdp_slider].hvplot.bivariate(x='Trade', y='GDP', width=700)
The Explorer is a graphical user interface (GUI) available in hvPlot since version 0.8.0, which provides a simple way to create customized plots. It lets you explore both the data and hvPlot's extensive API, so it is more powerful than the plot widgets shown so far.
The simplest way is to pass the whole DataFrame to the explorer as follows. Then, in the GUI, choose scatter as the plot kind, Trade as x, GDP as y, and Province under By.
hvexplorer = hvplot.explorer(df)
hvexplorer
We can get the code for the plot using .plot_code(), and then use that code anywhere to reproduce the plot.
hvexplorer.plot_code()
"df.hvplot(by=['Province'], kind='scatter', x='Trade', y=['GDP'])"
Then we can copy and paste it wherever the plot is needed, for example into the following cell, where a title is also added.
df.hvplot(by=['Province'], kind='scatter', title='GDP Scatter', x='Trade', y=['GDP'])
We can also customize the settings programmatically to get the plot we want. For example,
hvexplorer.param.set_param(kind='scatter', x='Population', y_multi=['GDP'], by=['Province'])
hvexplorer.labels.title = 'GDP Scatter'
Display the customized settings as a dictionary.
settings = hvexplorer.settings()
settings
{'by': ['Province'], 'kind': 'scatter', 'title': 'GDP Scatter', 'x': 'Population', 'y': ['GDP']}
To produce the plot, just pass these settings to hvplot.
df.hvplot(**settings)
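The ** operator unpacks the dictionary into keyword arguments, so the call above is equivalent to writing each setting out by hand. The sketch below shows this with a stand-in function (fake_plot is hypothetical and only echoes its arguments; it is not part of hvPlot).

```python
def fake_plot(**kwargs):
    """Hypothetical stand-in for df.hvplot that just returns its keyword arguments."""
    return kwargs

# The settings dictionary shown above
settings = {'by': ['Province'], 'kind': 'scatter', 'title': 'GDP Scatter',
            'x': 'Population', 'y': ['GDP']}

# Unpacking the dict and spelling out the keywords give the same call
unpacked = fake_plot(**settings)
explicit = fake_plot(by=['Province'], kind='scatter', title='GDP Scatter',
                     x='Population', y=['GDP'])

print(unpacked == explicit)
```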
It also provides an easy way to save the plot, hvexplorer.save(filename). For example, save the plot to an HTML file as follows.
hvexplorer.save('./results/plot.html')
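The path above points into a results folder. In case that folder does not exist yet, the sketch below creates it first with pathlib. Whether save() would create missing folders on its own is not something the article states, so treat this as a precaution. The actual save call is left as a comment because hvexplorer is defined earlier in the article.

```python
from pathlib import Path

out_path = Path('./results/plot.html')

# Create the target folder if needed; exist_ok avoids an error when it already exists
out_path.parent.mkdir(parents=True, exist_ok=True)

# hvexplorer.save(str(out_path))  # uncomment where hvexplorer is defined
print(out_path.parent.is_dir())
```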
This article demonstrates 5 easy methods to generate interactive plot widgets and a GUI for data visualization using hvPlot. The groupby method is the simplest way to create a plot widget, while the explorer is the most powerful yet still easy to use: it provides a graphical user interface for creating various customized plots.