Different Methods to Access General Information of a Dataset with Python Pandas

Convenient Pandas methods for getting to know a dataset before you process and analyze it

This article is Part III of the Data Analysis Series. I suggest you start with the first part so that you can better understand the whole process.

In Parts I and II, we discussed How to Read Dataset from GitHub and Save it using Pandas and Convenient Methods to Rename Columns of Dataset with Pandas. In this article, we will see how to access basic information about a dataset held in a DataFrame.

First, let’s import the required package and read the dataset into a Pandas DataFrame. We use gdp_china_renamed.csv, the dataset whose columns we renamed from the original GitHub version and then saved in the local working directory. If you are already familiar with reading a dataset into Pandas and renaming its columns, you can use your own dataset. Still, I strongly suggest reading the previous two articles so that you better understand the methods and the overall process I use.

# Load the required package 
import pandas as pd

# Read the data
df = pd.read_csv('./data/gdp_china_renamed.csv')

1. Access the First Few Rows

(1) Access the first 5 rows

df.head()

(2) Access the first N rows

To access the first N rows, use df.head(N). For example, to get the first 3 rows:

df.head(3)
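For readers who do not have the CSV file at hand, here is a minimal sketch of how head() behaves. The `toy` frame and its values are made up for illustration and are not the article's GDP data.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001, 2002, 2003],
                    "gdp": [1.0, 1.1, 1.3, 1.4]})

first_two = toy.head(2)     # the first 2 rows
everything = toy.head(10)   # N larger than the frame: all 4 rows, no error

print(first_two)
print(len(everything))
```

Asking for more rows than the frame holds simply returns every row, so head(N) is safe to call on small datasets.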

2. Access the Last Few Rows

(1) Access the last 5 rows

df.tail()

(2) Access the last N rows

For example, to get the last 2 rows:

df.tail(2)

3. Randomly Access N Rows

For example, randomly draw 5 rows; the result varies from run to run.

df.sample(5)
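If you need the same random rows on every run, for instance to make a notebook reproducible, sample() accepts a random_state seed. The sketch below uses a made-up `toy` frame, and the seed values are arbitrary choices.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": range(2000, 2010), "gdp": range(10)})

# The same seed gives the same rows on every run
run_1 = toy.sample(5, random_state=42)
run_2 = toy.sample(5, random_state=42)
print(run_1.equals(run_2))

# frac draws a fraction of the rows instead of a fixed count
half = toy.sample(frac=0.5, random_state=0)
print(len(half))
```

By default the rows are drawn without replacement, so no row appears twice in the sample.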

4. Access All but a Few Rows

(1) Access all rows except the last N rows

For example, skip the last 2 rows:

df.head(-2)

(2) Access all rows except the first N rows

For example, access all but the first 3 rows.

df.tail(-3)
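A negative N in head() or tail() behaves like a positional slice. The following sketch, on a made-up `toy` frame, checks that equivalence against iloc.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001, 2002, 2003, 2004]})

all_but_last_two = toy.head(-2)      # drops the last 2 rows
all_but_first_three = toy.tail(-3)   # drops the first 3 rows

# The same selections written as positional slices
print(all_but_last_two.equals(toy.iloc[:-2]))
print(all_but_first_three.equals(toy.iloc[3:]))
```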

5. Obtain a General Data Information Summary

We use info() to get a concise summary of the data: the total number of rows, the column names, each column’s data type, and the non-null count per column, which reveals any missing values.

df.info()

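The report lists each column with its non-null count and data type, followed by a count of columns per data type and the memory usage.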
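The exact report depends on the dataset. To see how missing values show up in it, here is a minimal sketch on a small made-up frame with one missing entry (the column names and values are illustrative assumptions). It also captures the report as text and counts missing values per column with isna().sum().

```python
import io

import pandas as pd

# Toy data for illustration only; one gdp value is deliberately missing
toy = pd.DataFrame({"year": [2000, 2001, 2002],
                    "gdp": [1.0, None, 1.3]})

# info() prints to the screen by default; buf= redirects it into a text buffer
buffer = io.StringIO()
toy.info(buf=buffer)
report = buffer.getvalue()
print(report)

# An explicit count of missing values per column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

A column whose non-null count is lower than the number of rows contains missing values, which is what the explicit count confirms.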

6. Check a Column’s Data Type

df['year'].dtype

The output is:

dtype('int64')
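To inspect every column at once instead of one at a time, the dtypes attribute can be used. The sketch below works on a made-up `toy` frame and also shows the type-checking helpers in pd.api.types and select_dtypes(); the `region` column is a hypothetical addition for illustration.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001],
                    "gdp": [1.0, 1.1],
                    "region": ["east", "west"]})

# Data type of every column in one call
print(toy.dtypes)

# Helper checks that do not depend on the exact bit width
year_is_int = pd.api.types.is_integer_dtype(toy["year"])
gdp_is_float = pd.api.types.is_float_dtype(toy["gdp"])

# Keep only the numeric columns
numeric_only = toy.select_dtypes(include="number")
print(list(numeric_only.columns))
```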

7. Check Column Names

df.columns

It returns an Index object holding the column labels.
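The column labels come back as a pandas Index rather than a plain list. A small sketch on a made-up `toy` frame shows how to turn them into a list and how to test whether a column exists; `population` is a hypothetical name used only for the check.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000], "gdp": [1.0]})

column_list = toy.columns.tolist()              # plain Python list of labels
has_year = "year" in toy.columns                # membership test on the labels
has_population = "population" in toy.columns    # hypothetical column name

print(column_list, has_year, has_population)
```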

8. Check Dataset Shape

df.shape

The output is as follows:

(95, 9)
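The shape is a plain tuple of (rows, columns), so it can be unpacked directly. A minimal sketch on a made-up `toy` frame:

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001, 2002],
                    "gdp": [1.0, 1.1, 1.3]})

n_rows, n_cols = toy.shape
print(f"{n_rows} rows, {n_cols} columns")

# len() gives the row count alone; size gives rows * columns
print(len(toy), toy.size)
```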

9. Generate a General Descriptive Statistics Summary

df.describe()

We can transpose the table by adding .T at the end.

df.describe().T
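By default, describe() summarizes only the numeric columns when a frame mixes numeric and text data. The sketch below, on a made-up `toy` frame with a hypothetical `region` column, shows the include and percentiles parameters that adjust this.

```python
import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001, 2002, 2003],
                    "gdp": [1.0, 1.1, 1.3, 1.4],
                    "region": ["east", "west", "east", "east"]})

numeric_summary = toy.describe()             # numeric columns only
full_summary = toy.describe(include="all")   # text columns as well
custom = toy.describe(percentiles=[0.1, 0.9])  # custom percentile rows

print(numeric_summary.T)   # one row per column, one column per statistic
print(full_summary)
print(custom)
```

For text columns, the full summary reports statistics such as the number of unique values and the most frequent value, while the numeric statistics are left empty for them.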

We can save this descriptive statistics summary into a .csv file in the working directory.

df.describe().to_csv('./results/describe_result.csv')
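Note that to_csv() does not create missing folders, so the call fails if the results directory does not exist yet. The following sketch creates the folder first, writes the describe() output, and reads it back to check it. It uses a made-up `toy` frame, and a temporary directory stands in for the working directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy data for illustration only (not the article's GDP dataset)
toy = pd.DataFrame({"year": [2000, 2001, 2002],
                    "gdp": [1.0, 1.1, 1.3]})

# A temporary directory stands in for the working directory in this sketch
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "results"
    out_dir.mkdir(parents=True, exist_ok=True)   # create the folder if needed

    out_file = out_dir / "describe_result.csv"
    toy.describe().to_csv(out_file)              # save the summary, not the raw data

    # Read it back, using the statistic names as the index
    restored = pd.read_csv(out_file, index_col=0)

print(restored)
```

In your own project you would replace the temporary directory with a path such as ./results.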

10. Online Course

If you are interested in learning Python data analysis in detail, you are welcome to enroll in one of my courses:

Master Python Data Analysis and Modelling Essentials
