## Descripstats: A Python Package That Generates Richer Descriptive Statistics for Pandas DataFrames

For numeric data, the `describe()` function of the Python Pandas library provides a convenient way to generate a summary table of descriptive statistics. However, the result’s index only includes `count`, `mean`, `std`, `min`, and `max`, as well as the lower, `50`, and upper percentiles. By default, the lower percentile is `25`, the upper percentile is `75`, and the `50` percentile is the same as the median.
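The lower and upper percentiles can be changed through the `percentiles` argument of `describe()`. Here is a minimal sketch on a small made-up DataFrame; the column name `x` and its values are assumptions for illustration only:

```python
import pandas as pd

# Toy data, not the dataset used later in this post
df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Default summary: count, mean, std, min, 25%, 50%, 75%, max
default_summary = df.describe()

# Ask for the 10th and 90th percentiles instead of the 25th and 75th;
# 0.5 is listed explicitly so the median row is kept
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])

print(default_summary.index.tolist())
print(custom_summary.index.tolist())
```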

In many cases, such as writing a scientific or data analysis report or a journal paper, we need more statistics than these default ones, such as the mean absolute deviation (`mad`), `variance`, standard error of the mean (`sem`), `sum`, `skewness`, and `kurtosis`. Pandas also provides methods to calculate them, but we have to write a code snippet to add them to the summary table produced by `describe()`.
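To give an idea of what such a snippet involves, here is a rough sketch of the manual approach on a made-up DataFrame. The data, the row labels, and the variable names are assumptions for illustration, and this is not necessarily how the package does it internally. The mean absolute deviation is computed by hand here because pandas 2.0 and later no longer provide a `mad()` method.

```python
import pandas as pd

# Toy data for illustration only
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, 4.0, 4.0, 10.0]})

# Each entry is a Series indexed by column name; after .T every
# statistic becomes one row, matching the layout of describe()
extra = pd.DataFrame({
    "mad": (df - df.mean()).abs().mean(),  # mean absolute deviation by hand
    "variance": df.var(),
    "sem": df.sem(),
    "sum": df.sum(),
    "skewness": df.skew(),
    "kurtosis": df.kurt(),
}).T

# Stack the extra rows under the default summary
summary = pd.concat([df.describe(), extra])
print(summary)
```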

To make this easier, I wrote a Python function that generates a summary statistics table extending the indices of Pandas `describe()`. For convenience, I published it as a PyPI package named `descripstats`, so you can easily install and use it.

Let’s see how to use this package with a concrete real-world dataset.

### 1. Brief Description of the Package

The `descripstats` package adds more descriptive statistics to the default `describe()` of Pandas, including:

• variance: variance
• sem: standard error of the mean
• sum: sum
• skewness: skewness
• kurtosis: kurtosis

#### Method:

`Describe(data)`

Parameters:

• data: data in NumPy array or Pandas DataFrame

Return:

• stats: the descriptive statistics summary in Pandas DataFrame
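The input and return contract above (array or DataFrame in, DataFrame out) can be sketched with plain Pandas. The helper name `describe_any` is hypothetical and only shows how both input types might be accepted; it is not the package's code and it returns only the default measures.

```python
import numpy as np
import pandas as pd

def describe_any(data):
    """Hypothetical helper: summarize a NumPy array or a DataFrame."""
    # A 2-D array is wrapped in a DataFrame with integer column labels;
    # a DataFrame is used as it is
    frame = data if isinstance(data, pd.DataFrame) else pd.DataFrame(data)
    return frame.describe()

arr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
summary = describe_any(arr)
print(summary)
```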

### 2. Install the Package

Pandas is the only dependency of this package. You can easily install it using `pip` as follows:

`pip install descripstats`

### 3. Use the Package

After installation, using the package takes only a few steps.

#### (1) Import the packages

You can import the package with:

`from descripstats import Describe`

Then use `Describe()` directly. Or use:

`import descripstats as ds`

Then use `ds.Describe()`.

We use the second method in this example as follows:

`import pandas as pd`

`import descripstats as ds`

#### (2) Read the dataset

We read the dataset directly from GitHub. If you are not familiar with how to read a dataset directly from GitHub, you can read one of my previous posts.

`url = 'https://raw.githubusercontent.com/Sid-149/Life-Expectancy-Predictor-Comparative-Analysis/main/Notebooks/Life%20Expectancy%20Data.csv'`

`df = pd.read_csv(url, index_col=False)`

`# display the first rows`

`df.head()`

#### (3) Display the default descriptive statistic measures of Pandas

First, let’s use the `describe()` function of Pandas so that you can clearly see which measures this package adds later.

`df.describe()`

#### (4) Descriptive statistic measures added by this package

Now, let’s call the package’s function with `Describe(data)`; note the uppercase `D`. Here, `df` is the variable name of our imported dataset.

`ds.Describe(df)`

#### (5) Remove some of them

You can remove any measures you do not want as follows.

`stats = ds.Describe(df)`

`stats`

#### (i) Remove one index

For example, suppose you want to exclude `mad` (mean absolute deviation) from the summary table.

`stats.drop('mad')`

#### (ii) Remove more than one index

For example, remove `mad`, `variance` and `sem`. `inplace=False` is the default, which returns a new table and leaves the original summary table unchanged, so `mad` is still there if you display `stats` again after the previous step. To change the table itself, use `inplace=True`.

`stats.drop(['mad','variance','sem'],inplace=True)`
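The difference between the two modes of `drop()` can be checked on a small stand-in table. This sketch uses the default `describe()` output of a made-up DataFrame and drops `std` and `min`, which are assumptions chosen so the example runs with Pandas alone:

```python
import pandas as pd

# Stand-in summary table built from toy data
stats = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}).describe()

# Default (inplace=False): a trimmed copy is returned, `stats` keeps every row
trimmed = stats.drop(["std", "min"])
rows_after_copy = list(stats.index)

# inplace=True: `stats` itself is modified and the call returns None
result = stats.drop(["std", "min"], inplace=True)
rows_after_inplace = list(stats.index)

print(rows_after_copy)
print(rows_after_inplace)
```

Because `inplace=True` returns `None`, assigning its result back to `stats` would lose the table.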

#### (6) Transpose the table

We usually use the transposed table in a thesis, a journal paper or a book, so we need to transpose the summary table. We also `round` the values to one decimal place.

`stats.round(1).T`
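As a quick check of what rounding and transposing do to the layout, here is a small sketch on a made-up DataFrame. The column names `height` and `weight` and their values are assumptions, and the default `describe()` output stands in for the package's table:

```python
import pandas as pd

# Toy data for illustration only
df = pd.DataFrame({"height": [1.5, 1.7, 1.9], "weight": [50.2, 61.4, 72.8]})

stats = df.describe()       # statistics as rows, variables as columns
report = stats.round(1).T   # variables as rows, statistics as columns

print(stats.shape, report.shape)
print(report)
```

In the transposed table each variable takes one row, which is the layout most report tables use.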

