Convenient Methods to Rename Columns of Dataset with Pandas in Python

This post shows how easy it is to rename the columns of a dataset in different situations using the Python Pandas library.

We often rename, extend, or completely replace variable names because the names in the original data files do not follow our preferred naming conventions. Typical reasons include, but are not limited to, the following:

names are very long
names contain unwanted symbols, special characters or spaces
names are capitalized, but we need lowercase names, or
we need to add prefixes or suffixes

Renaming variables is therefore a common first step in data management or data analysis. This post shows how to change the variable names, i.e. the column names, of a dataset stored in a Pandas DataFrame.

This article is Part II of the Data Analysis Series. I suggest you start from the first part so that you can better follow the whole process.

1. Load Dataset

In Part I, I showed how to read the dataset gdp_china_clean.csv from GitHub into a Pandas DataFrame and save it in the working directory of the local computer. Now, let’s read the saved dataset from the working directory into a DataFrame using the pd.read_csv() function.

# import the required packages
import pandas as pd

# read the data, using the first column of the file as the row index
df = pd.read_csv('./data/gdp_china_clean.csv', index_col=0)

# display the first five rows
df.head()
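To see what index_col=0 does, here is a minimal sketch that reads a tiny, made-up CSV from memory instead of the real file. The CSV text, column names and values are assumptions for illustration only.

```python
import io
import pandas as pd

# Made-up CSV text whose first (unnamed) column is a saved row index
csv_text = ",Province,GDP\n0,A,1.0\n1,B,2.0\n"

# index_col=0 turns the first column into the row index
with_index_col = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Without it, the first column is kept as a regular column
without_index_col = pd.read_csv(io.StringIO(csv_text))

print(list(with_index_col.columns))
print(list(without_index_col.columns))
```

Without index_col=0, the saved index shows up as an extra column named 'Unnamed: 0', which we would then have to drop or rename.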

2. Display the column names

Now let’s show the column names only and see what they look like.

df.columns

The output shows that the column names do not follow the preferred naming conventions for analysis: they contain spaces, units, long strings, etc. It is therefore necessary to change them.
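Leading or trailing spaces are hard to spot in the normal output. One possible way to reveal them is to print each name with repr(). The sketch below uses a small toy DataFrame whose names and values are made up for illustration; it is not the real dataset.

```python
import pandas as pd

# Toy frame; names and values are made up for illustration only
toy = pd.DataFrame({
    'Province': ['A', 'B'],
    'GDP ranking ': [1, 2],                   # note the trailing space
    'population (x10^4 person)': [100, 200],
})

# repr() shows the quotes, so stray whitespace becomes visible
for name in toy.columns:
    print(repr(name))

# Names that change when stripped contain leading or trailing whitespace
padded = [name for name in toy.columns if name != name.strip()]
print(padded)
```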

3. Rename columns

In this section, we will see different methods to change column names for different situations that we might meet.

3.1 Rename certain columns

In many cases, we only need to rename one or two columns because the names of the remaining columns already meet our naming conventions. For example, suppose we only need to change ‘GDP ranking’ and ‘population (x10^4 person)’.

In this case, we can use the dataframe.rename() function to change ‘GDP ranking’ to ‘GDP_rank’ and ‘population (x10^4 person)’ to ‘POP’, for instance. In general, there are three different ways to write the call. The old names must match the existing column names exactly, including any stray spaces such as the trailing space in 'GDP ranking ' in the code below.

Method 1

We use the structure dataframe.rename(columns={'old name1': 'new name1', 'old name2': 'new name2',...}) to rename certain columns.

df_r1 = df.rename(columns={'GDP ranking ': 'GDP_rank', 'population (x10^4 person)': 'POP'})

df_r1.head()

Method 2

The structure of this method is dataframe.rename({'old name1': 'new name1', 'old name2': 'new name2',...}, axis=1). The axis argument was added to rename() in pandas 0.21, so this form is newer than the first one.

df_r2 = df.rename({'GDP ranking ': 'GDP_rank', 'population (x10^4 person)': 'POP'}, axis=1)  

df_r2.head()

Method 3

In this method, we just change axis=1 to axis='columns'. Methods 2 and 3 are in fact equivalent, because axis 1 refers to the columns.

df_r3 = df.rename({'GDP ranking ': 'GDP_rank', 'population (x10^4 person)': 'POP'}, axis='columns')  

df_r3.head()
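The following sketch checks on a toy DataFrame (made-up names and values, not the real dataset) that the three calls give the same result. It also shows one thing to watch out for: by default rename() silently ignores a key that matches no column, while the errors='raise' option makes it raise a KeyError instead.

```python
import pandas as pd

# Toy frame with made-up names and values (not the real dataset)
toy = pd.DataFrame({'GDP ranking ': [1, 2], 'population (x10^4 person)': [100, 200]})
mapping = {'GDP ranking ': 'GDP_rank', 'population (x10^4 person)': 'POP'}

r1 = toy.rename(columns=mapping)           # Method 1
r2 = toy.rename(mapping, axis=1)           # Method 2
r3 = toy.rename(mapping, axis='columns')   # Method 3

# All three calls produce the same column names
same = list(r1.columns) == list(r2.columns) == list(r3.columns)
print(same)

# A key without the trailing space matches nothing and is ignored by default
unchanged = toy.rename(columns={'GDP ranking': 'GDP_rank'})
print(list(unchanged.columns))

# With errors='raise', the same mismatch raises a KeyError
try:
    toy.rename(columns={'GDP ranking': 'GDP_rank'}, errors='raise')
    raised = False
except KeyError:
    raised = True
print(raised)
```

Passing errors='raise' can be a useful safeguard when the old names are typed by hand.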

3.2 Remove unwanted spaces or symbols from column names

In some cases, we only need to rename the columns whose names contain unwanted spaces or symbols.

(1) Remove unwanted spaces

The Python string method strip() removes whitespace at the beginning and at the end of a string. We use a lambda function to apply it to every column name.

df_r4 = df.rename(columns=lambda x: x.strip())

print(df.columns)
print(df_r4.columns)

Comparing the two outputs, we can see that the leading and trailing spaces have been removed from the column names.
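As an alternative sketch, the same cleanup can be written with the vectorized .str methods of the column Index. The toy DataFrame below uses made-up names and values.

```python
import pandas as pd

# Toy frame; the padded names are made up for illustration
toy = pd.DataFrame({' Province': ['A'], 'GDP ranking ': [1]})

# Option 1: rename() with a function applied to every name
r_a = toy.rename(columns=lambda x: x.strip())

# Option 2: vectorized string method on the column Index
r_b = toy.copy()
r_b.columns = r_b.columns.str.strip()

print(list(r_a.columns))
print(list(r_b.columns))
```

Both options only remove spaces at the two ends of each name; the space inside 'GDP ranking' is kept.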

(2) Remove unwanted symbols

We can treat ‘(x10^8CNY)’ in the column name ‘total imports and exports (x10^8CNY)’ as an unwanted symbol or string of special characters, and remove it by using the replace() method to replace it with ''.

df_r5 = df.rename(columns=lambda x: x.replace('(x10^8CNY)',''))
df_r5.columns
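A literal replace() leaves the space in front of the bracket behind and handles only one unit string at a time. As a possible alternative, a regular expression can drop any bracketed unit together with the space before it. This is only a sketch on a toy DataFrame; the values are made up, and the pattern assumes the units are always written in round brackets.

```python
import pandas as pd

# Toy frame; values are made up for illustration
toy = pd.DataFrame({
    'total imports and exports (x10^8CNY)': [1.0],
    'population (x10^4 person)': [2.0],
})

# Literal replacement: the space before the bracket remains
literal = toy.rename(columns=lambda x: x.replace('(x10^8CNY)', ''))
print(list(literal.columns))

# Regex replacement: remove any "(...)" group plus the whitespace before it
cleaned = toy.copy()
cleaned.columns = cleaned.columns.str.replace(r'\s*\([^)]*\)', '', regex=True)
print(list(cleaned.columns))
```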

3.3 Lowercase or uppercase column names

Sometimes we just need to rename columns by lowercasing or uppercasing their names.

(1) Lowercase column names

df_r6 = df.rename(columns=str.lower)
df_r6.columns

(2) Uppercase column names

For example, we uppercase the column names.

df_r7 = df.rename(columns=str.upper)

df_r7.columns
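Changing the case is often combined with other cleanup in one function. The sketch below lowercases each name and replaces the spaces with underscores. The toy column names and values are assumptions, not the real ones.

```python
import pandas as pd

# Toy frame; names and values are made up for illustration
toy = pd.DataFrame({'GDP Ranking': [1], 'Fixed Asset Investment': [2.0]})

# Lowercase each name, then replace spaces with underscores
snake = toy.rename(columns=lambda name: name.lower().replace(' ', '_'))
print(list(snake.columns))
```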

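(3) Lowercase the values of a string column

We can also uppercase or lowercase the values of a whole string column, such as the ‘Province’ column, using the .str accessor.

# assumes the column is named exactly 'Province'
df_r8 = df.assign(Province=df['Province'].str.lower())

df_r8.head()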
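The difference between changing the column names and changing the values of a column can be seen on a toy DataFrame. The names and values below are made up for illustration.

```python
import pandas as pd

# Toy frame; values are made up for illustration
toy = pd.DataFrame({'Province': ['Beijing', 'Shanghai'], 'GDP': [1, 2]})

# rename() changes the labels only; the values stay as they are
names_lower = toy.rename(columns=str.lower)

# The .str accessor changes the values only; the label stays as it is
values_upper = toy.assign(Province=toy['Province'].str.upper())

print(list(names_lower.columns), names_lower['province'].tolist())
print(list(values_upper.columns), values_upper['Province'].tolist())
```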

3.4 Rename all columns

In many cases, we want to rename all the columns, for instance to short or abbreviated names.

(1) Use `set_axis` with a list

The inplace argument of set_axis was deprecated in pandas 1.5 and removed in pandas 2.0, so set_axis now always returns a new DataFrame.

new_colnames = ['prov','gdpr','year','gdp','pop','finv','trade','fexpen','uinc']

df_r9 = df.set_axis(new_colnames, axis='columns')  # returns a new DataFrame; df itself is unchanged

df_r9.columns

(2) Use the `.columns` attribute with a list

This method assigns the new names directly and modifies df in place.

df.columns = ['prov','gdpr','year','gdp','pop','finv','trade','fexpen','uinc']

df.columns

The advantage of using `set_axis` is that it returns a new DataFrame and can therefore be used as part of a method chain, while assigning to the .columns attribute changes the original DataFrame.
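Here is a minimal sketch of such a method chain on a toy DataFrame (made-up names and values). It also shows that the list must contain exactly one name per column; otherwise pandas raises a ValueError.

```python
import pandas as pd

# Toy frame; names and values are made up for illustration
toy = pd.DataFrame({'Province': ['A', 'B'], 'GDP ranking ': [2, 1]})

# set_axis returns a new DataFrame, so it can sit inside a chain
chained = (
    toy
    .set_axis(['prov', 'gdpr'], axis='columns')
    .sort_values('gdpr')
    .reset_index(drop=True)
)
print(list(chained.columns))
print(list(toy.columns))   # the original names are untouched

# The list needs one name per column; a wrong length raises a ValueError
try:
    toy.set_axis(['prov'], axis='columns')
    mismatch_error = False
except ValueError:
    mismatch_error = True
print(mismatch_error)
```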

3.5 Prefix or suffix the column names

In some cases, we need to add a prefix or suffix to the column names. This is easy to do with df.add_prefix() and df.add_suffix() in Pandas. Let’s add ‘china’ as an example, because the dataset is about China.

(1) Prefix the column names

df_r10 = df.add_prefix('china_') 
df_r10.head()

(2) Suffix the column names

df_r11 = df.add_suffix('_china') 
df_r11.head()
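add_prefix() and add_suffix() always change every column. If only some columns should get the prefix, one possible approach is rename() with a small function. The sketch below uses a toy DataFrame with made-up values, and the choice to leave 'year' unprefixed is purely hypothetical.

```python
import pandas as pd

# Toy frame; values are made up for illustration
toy = pd.DataFrame({'gdp': [1.0], 'pop': [2.0], 'year': [2000]})

# add_prefix() applies the prefix to all columns
all_prefixed = toy.add_prefix('china_')
print(list(all_prefixed.columns))

# Hypothetical case: prefix every column except those listed in keep
keep = {'year'}
some_prefixed = toy.rename(columns=lambda n: n if n in keep else f'china_{n}')
print(list(some_prefixed.columns))
```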

4. Save the new DataFrame

After renaming, do not forget to save the modified dataset in your working directory under a new file name, for example gdp_china_renamed.csv.

df.to_csv('./data/gdp_china_renamed.csv', index=False)
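As a quick check, we can read a saved file back and confirm that the new names were written. The sketch below does this round trip with a toy DataFrame in a temporary directory; the file name and values are made up for illustration.

```python
import os
import tempfile
import pandas as pd

# Toy frame; values are made up for illustration
toy = pd.DataFrame({'prov': ['A', 'B'], 'gdp': [1.0, 2.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_renamed.csv')   # hypothetical file name
    toy.to_csv(path, index=False)                 # index=False: no index column is written
    back = pd.read_csv(path)                      # so no index_col is needed here

print(list(back.columns))
```

Because the file is written with index=False, it should be read back without index_col=0; otherwise the first data column would become the row index.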

5. Online course

If you are interested in learning Python data analysis in detail, you are welcome to enroll in one of my courses:

Master Python Data Analysis and Modelling Essentials
