Before starting the project, let's import the required libraries, read the data, slice the signal, and create a simple plot to see what the signal looks like.
In this project, we need PyWavelets for Discrete wavelet transform, NumPy to create arrays, pandas to read the date, Matplotlib for data and result visualization. We import these libraries as follows.
import pywt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
We directly read the data from one of my GitHub repositories.
url = 'https://raw.githubusercontent.com/shoukewei/data/main/data-1dwt/water_ph_do.csv'
df = pd.read_csv(url)
df.head()
timestemp | ID | pH | DO(mg/l) | |
---|---|---|---|---|
0 | 27-01-2008 0:00 | 1 | 7.180000 | 7.800000 |
1 | 27-01-2008 1:00 | 2 | 7.234500 | 7.656634 |
2 | 27-01-2008 2:00 | 3 | 7.234519 | 7.656024 |
3 | 27-01-2008 3:00 | 4 | 7.234537 | 7.655414 |
4 | 27-01-2008 4:00 | 5 | 7.234555 | 7.654804 |
To display the column names, we can use
df.columns
Index(['timestemp', 'ID', 'pH', 'DO(mg/l)'], dtype='object')
In this project, we use the 'ID' as the x-axis, representing an hourly interval, and we select 'DO(mg/l)' as the time series signal.
t = df['ID']
s = df['DO(mg/l)']
Let's plot the signal and see generally what it looks like.
fig,ax = plt.subplots(figsize=(15,7))
plt.rc('xtick',labelsize=15)
plt.rc('ytick',labelsize=15)
ax.plot(t,s)
ax.set_xlabel('t (hour)',fontsize=15)
ax.set_ylabel('s (DO (mg/L))',fontsize=15)
plt.savefig('./results/water_do_signal.png',dpi=300)
plt.show()
In this section, we will check the maximum decomposition levels, specify the decomposed levels, make the decomposition, as well as print the coefficients and their length.
First, we check how many levels of the signal can be decomposed into.
w = pywt.Wavelet('db6')
l = len(s)
f_l = w.dec_len
max_l = pywt.dwt_max_level(data_len=l, filter_len=f_l)
print(max_l)
13
Thus, we can decompose to the maximum 13 levels.
It is really not necessary to decompose the signal into the maximum levels. Normally, it is enough to decompose a signal into 3 to 9 levels. Here, we specify the decomposition level to 6.
coeffs = pywt.wavedec(s,'db6','sym',level=6)
(cA6,cD6,cD5,cD4,cD3,cD2,cD1) = coeffs
We just print the approximation and detail coefficient at the last level for example.
print('cA6 = ',cA6)
print('cD6 = ',cD6)
cA6 = [61.36074204 61.36848392 61.36335069 ... 61.1648816 61.58335401 61.61328704] cD6 = [ 0.00983414 -0.01039727 0.01776662 ... -0.01860011 -0.11101357 0.01917679]
Let's check the coefficient length at the last level to compare them with the length of the orignal signal.
print(len(s))
print(len(cA6))
print(len(cD6))
93409 1470 1470
Printing the coefficient values seems no sense, and visualizing them is more meaningful to see what they are like.
fig,axs = plt.subplots(4,2,figsize=(15,12))
plt.rc('xtick',labelsize=12)
plt.rc('ytick',labelsize=12)
axs[0,0].plot(s)
axs[0,0].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[0,1].plot(cA6)
axs[0,1].set_ylabel('cA6',fontsize=13)
axs[1,0].plot(cD6)
axs[1,0].set_ylabel('cD6',fontsize=13)
axs[1,1].plot(cD5)
axs[1,1].set_ylabel('cD5',fontsize=13)
axs[2,0].plot(cD4)
axs[2,0].set_ylabel('cD4',fontsize=13)
axs[2,1].plot(cD3)
axs[2,1].set_ylabel('cD3',fontsize=13)
axs[3,0].plot(cD2)
axs[3,0].set_ylabel('cD2',fontsize=13)
axs[3,0].set_xlabel('t (hour)',fontsize=13)
axs[3,1].plot(cD1)
axs[3,1].set_ylabel('cD1',fontsize=13)
axs[3,1].set_xlabel('t (hour)',fontsize=13)
plt.tight_layout()
plt.savefig('./results/water_s_coeffs.png',dpi=300)
plt.show()
s_r = pywt.waverec(coeffs,'db6','sym')
Next, we check the length of the reconstructed signal and see if it has the same length with the original signal.
print(len(s))
print(len(s_r))
93409 93410
In the wavelet reconstruction method, it has already discussed that the reconstructed signal will have one more value at the end when the original has an odd length. Thus, the real constructed signal is the one excluded the last value.
s_r = s_r[:-1]
First, we compare them in terms of plot.
fig,ax = plt.subplots(figsize=(15,6))
ax.plot(t,s)
ax.plot(t,s_r)
plt.show()
From the above visualization result, we cannot see the difference between them. Let's further check the performance of wavelet reconstruction from Mean Absolute Error (MAE) and the maximum absolute error (Max. AE).
err = sum(abs(s-s_r))/len(s)
max_err = max(abs(s-s_r))
print('MAE: ',err)
print('Max. AE: ', max_err)
MAE: 2.734037557153005e-15 Max. AE: 1.5987211554602254e-14
The above errors confirm that the signal reconstruction is very accuracy.
In this section, let's reconstruct the approximations and details of the original signal.
Based on the method, we should create zero arrays for all the coefficients.
cA06 = np.zeros(len(cA6))
cD06 = np.zeros(len(cD6))
cD05 = np.zeros(len(cD5))
cD04 = np.zeros(len(cD4))
cD03 = np.zeros(len(cD3))
cD02 = np.zeros(len(cD2))
cD01 = np.zeros(len(cD1))
Just print cA06, for example
print(cA06)
[0. 0. 0. ... 0. 0. 0.]
Now, let's reconstruct approximation and details with the process and method that were introduced in the previous articles.
A6 = pywt.waverec((cA6,cD06,cD05,cD04,cD03,cD02,cD01),'db6','sym')
D6 = pywt.waverec((cA06,cD6,cD05,cD04,cD03,cD02,cD01),'db6','sym')
D5 = pywt.waverec((cA06,cD06,cD5,cD04,cD03,cD02,cD01),'db6','sym')
D4 = pywt.waverec((cA06,cD06,cD05,cD4,cD03,cD02,cD01),'db6','sym')
D3 = pywt.waverec((cA06,cD06,cD05,cD04,cD3,cD02,cD01),'db6','sym')
D2 = pywt.waverec((cA06,cD06,cD05,cD04,cD03,cD2,cD01),'db6','sym')
D1 = pywt.waverec((cA06,cD06,cD05,cD04,cD03,cD02,cD1),'db6','sym')
Check the length of the last level.
print(len(s))
print(len(A6))
93409 93410
Similarly, the reconstructed approximation and details have extra values at the end that we should remove. Thus, the real approximation and details are as follows.
A6 = A6[:-1]
D6 = D6[:-1]
D5 = D5[:-1]
D4 = D4[:-1]
D3 = D3[:-1]
D2 = D2[:-1]
D1 = D1[:-1]
Based on the approximation at the last level and details at different levels, we can calculate the approximations at the other levels.
A5 = A6 + D6
A4 = A6 + D6 + D5
A3 = A6 + D6 + D5 + D4
A2 = A6 + D6 + D5 + D4 + D3
A1 = A6 + D6 + D5 + D4 + D3 + D2
Now, let's see how well the reconstructed signal from approximation and details.
s_r = A6 + D6 + D5 + D4 + D3 + D2 + D1
Let's check the performance of signal reconstruction from its approximation and details in terms of Mean absolute error (MAE) and Maximum absolute error (Max. AE).
err = sum(abs(s-s_r))/len(s)
max_err = max(abs(s-s_r))
print('MAE: ',err)
print('Max. AE: ',max_err)
MAE: 2.742300433821414e-15 Max. AE: 1.9539925233402755e-14
Now, let's visualize the approximation at the last level and details at all the levels. Besides, we also save the plot into the result folder of the current working directory.
fig,axs = plt.subplots(4,2,figsize=(15,12))
plt.rc('xtick',labelsize=12)
plt.rc('ytick',labelsize=12)
axs[0,0].plot(t,s)
axs[0,0].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[0,1].plot(t,D1)
axs[0,1].set_ylabel('D1',fontsize=13)
axs[1,0].plot(t,D2)
axs[1,0].set_ylabel('D2',fontsize=13)
axs[1,1].plot(t,D3)
axs[1,1].set_ylabel('D3',fontsize=13)
axs[2,0].plot(t,D4)
axs[2,0].set_ylabel('D4',fontsize=13)
axs[2,1].plot(t,D5)
axs[2,1].set_ylabel('D5',fontsize=13)
axs[3,0].plot(t,D6)
axs[3,0].set_ylabel('D6',fontsize=13)
axs[3,0].set_xlabel('t (hour)',fontsize=13)
axs[3,1].plot(t,A6)
axs[3,1].set_ylabel('A6',fontsize=13)
axs[3,1].set_xlabel('t (hour)',fontsize=13)
plt.tight_layout()
plt.savefig('./results/water_s_approx_details.png',dpi=300)
plt.show()
Detail constituents are the high frequency data, which can be removed as noise. We can just keep the approximation constituent, or we just reconstruct the approximation constituent.
For example, we can reconstruct only reconstruct the approximations at different levels, which we have done in the section 6.2. Let's copy them below again.
A5 = A6 + D6
A4 = A6 + D6 + D5
A3 = A6 + D6 + D5 + D4
A2 = A6 + D6 + D5 + D4 + D3
A1 = A6 + D6 + D5 + D4 + D3 + D2
Let's check the difference between the approximation at the level of 6 in the terms of Mean absolute error (MAE), for example.
err = sum(abs(s-A6))/len(s)
max_err = max(abs(s-A6))
print('MAE: ',err)
print('Max. AE',max_err)
MAE: 0.01060595687642763 Max. AE 4.57962460477175
The small MAE result indicates that the approximations at different levels can well present the original signal.
We can visualize the difference between the approximations at different levels and the original signal, from which it can further illustrate how well the approximations at each level represent the noise reduced signals.
fig,axs = plt.subplots(3,2,figsize=(15,12))
plt.rc('xtick',labelsize=12)
plt.rc('ytick',labelsize=12)
axs[0,0].plot(t,s)
axs[0,0].plot(t,A1)
axs[0,0].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[0,0].legend(('Orignal signal (s)','Approimation at level 1 (A1)'),loc='best')
axs[0,1].plot(t,s)
axs[0,1].plot(t,A2)
axs[0,1].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[0,1].legend(('Orignal signal (s)','Approimation at level 2 (A2)'),loc='best')
axs[1,0].plot(t,s)
axs[1,0].plot(t,A3)
axs[1,0].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[1,0].legend(('Orignal signal (s)','Approimation at level 3 (A3)'),loc='best')
axs[1,1].plot(t,s)
axs[1,1].plot(t,A4)
axs[1,1].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[1,1].legend(('Orignal signal (s)','Approimation at level 4 (A4)'),loc='best')
axs[2,0].plot(t,s)
axs[2,0].plot(t,A5)
axs[2,0].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[2,0].legend(('Orignal signal (s)','Approimation at level 5 (A5)'),loc='best')
axs[2,0].set_xlabel('t (hour)',fontsize=13)
axs[2,1].plot(t,s)
axs[2,1].plot(t,A6)
axs[2,1].set_ylabel('s (DO (mg/L))',fontsize=13)
axs[2,1].legend(('Orignal signal (s)','Approimation at level 6 (A6)'),loc='best')
axs[2,1].set_xlabel('t (hour)',fontsize=13)
plt.tight_layout()
plt.savefig('./results/water_s_approx.png',dpi=300)
plt.show()
This article is a real-world project on multilevel Discrete wavelet transform of 1D time series signal. In details, it includes 6 topics: (1) decompose the signal into multilevel approximation and detail coefficients; (2) visualize these approximation and detail coefficients; (3) reconstruct the time series signal from these coefficients; (4) reconstruct the approximation and details at different levels from their coefficients; (5) visualize these approximations and details of the signal; and (6) reduce data noise at different levels and visualize the results.