Descripstats package adds more descriptive statistics to the default describe of Pandas
For numeric data, the describe() function of the Python Pandas library provides a convenient way to generate a summary table of descriptive statistics. However, the index of the result includes only count, mean, std, min, and max, as well as the lower, 50th, and upper percentiles. By default, the lower percentile is 25 and the upper percentile is 75; the 50th percentile is the same as the median.
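As a quick illustration of the default output described above, here is a minimal sketch. The DataFrame and its column name x are made up for the example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, used only to show what describe() returns.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})

# Default summary table for the numeric columns.
summary = df.describe()
print(summary)

# The row labels (index) of the summary table.
print(summary.index.tolist())

# The 50% row should match the median of the column.
print(summary.loc["50%", "x"], df["x"].median())
```

The percentiles argument of describe() can be used to change the lower and upper percentiles, for example df.describe(percentiles=[0.1, 0.9]).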
In many cases, such as writing a scientific or data analysis report or a journal paper, we need more statistics than these defaults, such as the mean absolute deviation (mad), variance, standard error of the mean (sem), sum, skewness, and kurtosis. Pandas also provides methods to calculate most of them (the built-in mad method was deprecated in Pandas 1.5 and removed in 2.0), but we have to write a code snippet to add them to the summary table produced by describe().
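The kind of snippet this refers to might look like the following. It is only a sketch of the manual approach using plain Pandas, not the implementation of the package discussed in this post. The function name extended_describe and the toy data are hypothetical, and the mean absolute deviation is computed by hand so that the code does not depend on the mad method.

```python
import pandas as pd


def extended_describe(df: pd.DataFrame) -> pd.DataFrame:
    """Append extra statistics to the describe() table (illustrative helper)."""
    numeric = df.select_dtypes(include="number")
    base = numeric.describe()

    # Each entry is a Series indexed by column name; transposing the
    # resulting DataFrame makes the statistic names the row labels.
    extra = pd.DataFrame(
        {
            # Mean absolute deviation around the mean, computed manually.
            "mad": (numeric - numeric.mean()).abs().mean(),
            "var": numeric.var(),
            "sem": numeric.sem(),
            "sum": numeric.sum(),
            "skew": numeric.skew(),
            "kurt": numeric.kurt(),
        }
    ).T

    # Stack the extra rows under the default describe() rows.
    return pd.concat([base, extra])


# Hypothetical toy data for demonstration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [2.0, 4.0, 6.0, 8.0, 10.0]})
out = extended_describe(df)
print(out)
```

The extra rows are appended below the default ones, so the first eight rows of the result are the same as in the plain describe() table.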
To address this, Dr. Shouke Wei of Deepsim Intelligence Inc. (Deepsim) created a Python package that generates a summary statistics table extending the indices of Pandas describe(). For convenience, I published it on PyPI as descripstats, so you can easily install and use it.
If you are interested in how to use this package with a concrete real-world dataset, please visit the post.