Data Normalisation & Standardisation

Dataset: Past record of Air Quality Health Index (English) Jul 1999 Hong Kong

import pandas as pd
import numpy as np

fi = pd.read_csv("hr071999.csv", skiprows=8)  # The first 8 rows contain notes and comments
fi["Date"] = fi["Date"].ffill()                                           # Fill down dates (fillna(method='pad') is deprecated)
fi = fi.interpolate(method='pchip')                                       # Interpolate missing values (PCHIP; requires SciPy)
for col in fi:                                                            # Round values to nearest integer
    if fi[col].dtype == np.float64:
        fi[col] = fi[col].round()
fi.head(10)
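
To see what the fill-down and interpolation steps do without the CSV file, here is a minimal sketch on a made-up frame. The `toy` DataFrame and its `Station A` column are hypothetical, and linear interpolation is swapped in for `pchip` so that the example does not need SciPy.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw data: the date appears only on the
# first row of each day, and one reading is missing.
toy = pd.DataFrame({
    "Date": ["01/07/1999", np.nan, np.nan, "02/07/1999"],
    "Hour": [0, 1, 2, 0],
    "Station A": [10.0, np.nan, 13.0, 14.0],
})

# Fill each missing date with the last date seen above it.
toy["Date"] = toy["Date"].ffill()

# Estimate the missing reading from its neighbours (linear here, not pchip),
# then round to the nearest integer as the notebook does.
toy["Station A"] = toy["Station A"].interpolate(method="linear").round()

print(toy)
```

The missing reading sits halfway between 10 and 13, so it is interpolated to 11.5 and then rounded.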

Specify the columns to normalise or standardise: every column from 'Causeway Bay' onwards, which leaves out Date and Hour.

cols = list(fi.loc[:,'Causeway Bay':])
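
The label-based slice can be tried on a small stand-in frame. This is only a sketch: the `toy` frame and the `Station A` to `Station C` names are hypothetical. It shows that an open-ended `.loc` column slice includes its starting label and everything to its right, and that calling `list` on a DataFrame yields its column labels.

```python
import pandas as pd

# Hypothetical frame with two leading columns that should not be scaled.
toy = pd.DataFrame({
    "Date": ["01/07/1999"],
    "Hour": [0],
    "Station A": [67],
    "Station B": [44],
    "Station C": [31],
})

# Slice columns by label from "Station A" to the end; list() of a
# DataFrame iterates over its column labels.
cols = list(toy.loc[:, "Station A":])
print(cols)
```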

Normalisation

Normalisation, also known as min-max scaling (one form of feature scaling), rescales values to a specified range, usually 0 to 1.

Define a normalise function that takes a DataFrame, applies normalisation to each column, and returns the result. The optional parameter range, a tuple, sets the lower and upper bounds of the output; the default is (0, 1).
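
Written out, the rescaling that the function below applies to each value $x$ of a column is as follows. The symbols $a$ and $b$ are introduced here only for illustration and stand for `range[0]` and `range[1]`; $x_{\min}$ and $x_{\max}$ are the column's minimum and maximum.

```latex
x' = a + \frac{(x - x_{\min})\,(b - a)}{x_{\max} - x_{\min}}
```

With the default range (0, 1) this reduces to $(x - x_{\min}) / (x_{\max} - x_{\min})$.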

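def normalise(df, range=(0,1)):
    # Min-max scale each column to [range[0], range[1]]; df.min() and df.max() are per-column.
    # Note: the parameter name shadows the built-in range() inside this function,
    # and a constant column (max == min) divides by zero and becomes NaN.
    return range[0] + ((df - df.min()) * (range[1] - range[0])) / (df.max() - df.min())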

Apply normalisation with specified range of 0 to 10.

fi_norm = fi.copy()
fi_norm[cols] = normalise(fi_norm[cols], range=(0,10))
fi_norm.head(10)
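
As a sanity check, the same formula can be run on numbers small enough to verify by hand. This is a sketch on a hypothetical `toy` frame, not on the air quality data. It also shows one edge case: a column with a single repeated value has `max - min` equal to 0, so it comes out as NaN.

```python
import pandas as pd

def normalise(df, range=(0, 1)):
    # Same formula as the notebook's normalise function.
    return range[0] + ((df - df.min()) * (range[1] - range[0])) / (df.max() - df.min())

# Hypothetical data chosen so the results are easy to check by hand.
toy = pd.DataFrame({
    "a": [2, 4, 6],       # min 2, max 6
    "b": [10, 30, 20],    # min 10, max 30
    "flat": [5, 5, 5],    # constant column: max - min is 0
})

scaled = normalise(toy, range=(0, 10))
print(scaled)
```

For column `a`, the middle value 4 maps to (4 - 2) * 10 / (6 - 2) = 5, and the minimum and maximum map to 0 and 10.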

Standardisation

Standardisation, also known as Z-score scaling, rescales the data so that each column has a mean of 0 and a standard deviation of 1.

Define a standardise function that takes a DataFrame, applies standardisation to each column, and returns the result.
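
In symbols, each value $x$ of a column is replaced by its Z-score. The notation is introduced here for illustration: $\bar{x}$ is the column mean, $n$ the number of rows, and $s$ the sample standard deviation, which is what pandas' `std()` computes by default (`ddof=1`).

```latex
z = \frac{x - \bar{x}}{s},
\qquad
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```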

def standardise(df):
    # Z-score each column; df.std() is the sample standard deviation (ddof=1) by default.
    return (df - df.mean()) / df.std()

Apply standardisation.

fi_stds = fi.copy()
fi_stds[cols] = standardise(fi_stds[cols])
fi_stds.head(10)
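
A quick way to confirm the result is to check the column means and standard deviations afterwards. The sketch below does this on a hypothetical `toy` frame. Because pandas' `std()` uses the sample formula (`ddof=1`), the standardised columns have a sample standard deviation of 1; measured with the population formula (`ddof=0`, the default of NumPy's `np.std`) the figure is slightly below 1.

```python
import numpy as np
import pandas as pd

def standardise(df):
    # Same formula as the notebook's standardise function.
    return (df - df.mean()) / df.std()

# Hypothetical data with four rows per column.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 10.0, 20.0, 40.0],
})

z = standardise(toy)

means = z.mean()            # approximately 0 for every column
stds = z.std()              # sample standard deviation (ddof=1)
pop_stds = z.std(ddof=0)    # population standard deviation

print(means, stds, pop_stds, sep="\n")
```

With n rows the population figure equals sqrt((n - 1) / n), so the gap shrinks as the dataset grows.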

Output of fi.head(10), the cleaned data:
	Date	Hour	Causeway Bay	Central	Mong Kok	Central/Western	Eastern	Kwai Chung	Kwun Tong	Sha Tin	Sham Shui Po	Tai Po	Tap Mun	Tsuen Wan	Tung Chung	Yuen Long
0	01/07/1999	0	67	44	31	14	13	29	32	19	21	21	10	21	14	20
1	01/07/1999	1	67	43	31	14	13	29	32	19	21	21	9	21	15	20
2	01/07/1999	2	67	43	31	14	13	30	32	19	21	21	9	21	13	20
3	01/07/1999	3	67	42	31	14	13	30	32	19	21	20	9	21	11	20
4	01/07/1999	4	67	43	31	14	12	29	32	19	20	20	9	21	11	20
5	01/07/1999	5	67	44	31	14	12	29	32	19	20	20	9	21	12	20
6	01/07/1999	6	68	44	31	14	12	30	32	20	20	21	9	22	11	20
7	01/07/1999	7	67	43	30	13	12	29	31	19	20	22	9	21	13	20
8	01/07/1999	8	67	41	30	13	12	27	31	19	19	21	9	21	13	20
9	01/07/1999	9	67	38	30	12	12	26	30	20	19	20	10	20	13	20

Output of fi_norm.head(10), normalised to the range 0 to 10:
	Date	Hour	Causeway Bay	Central	Mong Kok	Central/Western	Eastern	Kwai Chung	Kwun Tong	Sha Tin	Sham Shui Po	Tai Po	Tap Mun	Tsuen Wan	Tung Chung	Yuen Long
0	01/07/1999	0	5.172414	4.285714	0.952381	0.714286	0.357143	1.923077	1.320755	1.132075	1.818182	1.063830	0.31250	0.652174	0.675676	0.714286
1	01/07/1999	1	5.172414	4.047619	0.952381	0.714286	0.357143	1.923077	1.320755	1.132075	1.818182	1.063830	0.15625	0.652174	0.810811	0.714286
2	01/07/1999	2	5.172414	4.047619	0.952381	0.714286	0.357143	2.115385	1.320755	1.132075	1.818182	1.063830	0.15625	0.652174	0.540541	0.714286
3	01/07/1999	3	5.172414	3.809524	0.952381	0.714286	0.357143	2.115385	1.320755	1.132075	1.818182	0.851064	0.15625	0.652174	0.270270	0.714286
4	01/07/1999	4	5.172414	4.047619	0.952381	0.714286	0.000000	1.923077	1.320755	1.132075	1.590909	0.851064	0.15625	0.652174	0.270270	0.714286
5	01/07/1999	5	5.172414	4.285714	0.952381	0.714286	0.000000	1.923077	1.320755	1.132075	1.590909	0.851064	0.15625	0.652174	0.405405	0.714286
6	01/07/1999	6	5.517241	4.285714	0.952381	0.714286	0.000000	2.115385	1.320755	1.320755	1.590909	1.063830	0.15625	0.869565	0.270270	0.714286
7	01/07/1999	7	5.172414	4.047619	0.714286	0.476190	0.000000	1.923077	1.132075	1.132075	1.590909	1.276596	0.15625	0.652174	0.540541	0.714286
8	01/07/1999	8	5.172414	3.571429	0.714286	0.476190	0.000000	1.538462	1.132075	1.132075	1.363636	1.063830	0.15625	0.652174	0.540541	0.714286
9	01/07/1999	9	5.172414	2.857143	0.714286	0.238095	0.000000	1.346154	0.943396	1.320755	1.363636	0.851064	0.31250	0.434783	0.540541	0.714286

Output of fi_stds.head(10), standardised:
	Date	Hour	Causeway Bay	Central	Mong Kok	Central/Western	Eastern	Kwai Chung	Kwun Tong	Sha Tin	Sham Shui Po	Tai Po	Tap Mun	Tsuen Wan	Tung Chung	Yuen Long
0	01/07/1999	0	0.483218	-0.667016	-1.411001	-1.498267	-1.527556	-1.094393	-0.431131	-0.916338	-1.283588	-1.018649	-1.235283	-1.161153	-0.708754	-0.930821
1	01/07/1999	1	0.483218	-0.795775	-1.411001	-1.498267	-1.527556	-1.094393	-0.431131	-0.916338	-1.283588	-1.018649	-1.348860	-1.161153	-0.628011	-0.930821
2	01/07/1999	2	0.483218	-0.795775	-1.411001	-1.498267	-1.527556	-0.998289	-0.431131	-0.916338	-1.283588	-1.018649	-1.348860	-1.161153	-0.789497	-0.930821
3	01/07/1999	3	0.483218	-0.924534	-1.411001	-1.498267	-1.527556	-0.998289	-0.431131	-0.916338	-1.283588	-1.122442	-1.348860	-1.161153	-0.950982	-0.930821
4	01/07/1999	4	0.483218	-0.795775	-1.411001	-1.498267	-1.680869	-1.094393	-0.431131	-0.916338	-1.386302	-1.122442	-1.348860	-1.161153	-0.950982	-0.930821
5	01/07/1999	5	0.483218	-0.667016	-1.411001	-1.498267	-1.680869	-1.094393	-0.431131	-0.916338	-1.386302	-1.122442	-1.348860	-1.161153	-0.870239	-0.930821
6	01/07/1999	6	0.617939	-0.667016	-1.411001	-1.498267	-1.680869	-0.998289	-0.431131	-0.790443	-1.386302	-1.018649	-1.348860	-1.037640	-0.950982	-0.930821
7	01/07/1999	7	0.483218	-0.795775	-1.517923	-1.619651	-1.680869	-1.094393	-0.550836	-0.916338	-1.386302	-0.914857	-1.348860	-1.161153	-0.789497	-0.930821
8	01/07/1999	8	0.483218	-1.053293	-1.517923	-1.619651	-1.680869	-1.286601	-0.550836	-0.916338	-1.489017	-1.018649	-1.348860	-1.161153	-0.789497	-0.930821
9	01/07/1999	9	0.483218	-1.439569	-1.517923	-1.741035	-1.680869	-1.382704	-0.670541	-0.790443	-1.489017	-1.122442	-1.235283	-1.284666	-0.789497	-0.930821