A Python Package That Generates Richer Descriptive Statistics for Pandas DataFrames

The descripstats package adds more descriptive statistics to the default describe() output of Pandas

For numeric data, the describe() function of the Python Pandas library provides a very convenient way to generate a summary table of descriptive statistics. However, the index of the result only includes count, mean, std, min, and max, plus the lower, 50th, and upper percentiles. By default, the lower percentile is the 25th and the upper percentile is the 75th; the 50th percentile is the same as the median.
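As a quick illustration of the default output described above, here is a minimal sketch. The small DataFrame and its column names (height, weight) are made up for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight": [50.0, 60.0, 65.0, 80.0, 95.0],
})

# Default summary: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)

# The lower and upper percentiles can be changed with the percentiles argument
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
```

The 50% row of the default summary matches df.median(), which is why the median does not appear as a separate row.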

In many cases, such as writing a scientific or data analysis report or a journal paper, we need more statistics than these default ones, for example the mean absolute deviation (mad), variance, standard error of the mean (sem), sum, skewness, and kurtosis. Pandas provides methods to calculate most of them, but we have to write our own code snippet to add them to the summary table produced by describe().
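To show what such a hand-written snippet might look like, here is a minimal sketch using plain Pandas. The function name describe_extended and the toy data are hypothetical, and this is not the implementation used by the package. The mean absolute deviation is computed by hand here because recent Pandas releases no longer ship a built-in mad() method.

```python
import pandas as pd

def describe_extended(df):
    """Append extra statistics to the describe() table (illustrative helper)."""
    numeric = df.select_dtypes(include="number")
    base = numeric.describe()
    # One row per extra statistic, one column per numeric variable
    extra = pd.DataFrame({
        "mad": (numeric - numeric.mean()).abs().mean(),  # mean absolute deviation
        "var": numeric.var(),    # sample variance (ddof=1)
        "sem": numeric.sem(),    # standard error of the mean
        "sum": numeric.sum(),
        "skew": numeric.skew(),
        "kurt": numeric.kurt(),
    }).T
    return pd.concat([base, extra])

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight": [50.0, 60.0, 65.0, 80.0, 95.0],
})
table = describe_extended(df)
print(table)
```

The extra rows are stacked under the standard describe() rows, so the result can be used anywhere the ordinary summary table would be.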

To address this, Dr. Shouke Wei from Deepsim Intelligence Inc. (Deepsim) created a Python package that easily generates a summary statistics table extending the indices of Pandas describe(). For convenience, I made it into a PyPI package named descripstats, so you can easily install it and use it.

If you are interested in how to use this package with a concrete real-world dataset, please visit the post.
