Reducing time-series data.

Sometimes you have temporal datasets where the variables remain constant for long periods and you are really only interested in the points where they change. If your data has such periods, it is also useful to reduce it before plotting, since fewer points make for lighter-weight plots.

For example, imagine you have discrete data that holds each value for several days before jumping to a new one.

You can generate sample data of this kind with this code:

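import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.offline as plt  # assumed alias: the code below calls plt.iplot

# Create a daily temporal index and random piecewise-constant data
data_len = 100  # number of samples
max_num = 3     # values are drawn from 0 .. max_num - 1
n_cols = 1      # number of data columns
ind = pd.date_range('1/1/2018', periods=data_len, freq='D')
df = pd.DataFrame()
df["DATE"] = ind
for c in range(n_cols):
    ac_len = 0
    data = np.empty(0)
    while ac_len < data_len:
        # Hold a random value for a random run of 5 to 9 samples
        rand_len = np.random.randint(5, 10)
        rand_num = np.random.randint(0, max_num)
        # Truncate the last run so the total length is exactly data_len
        n_len = min(rand_len, data_len - ac_len)
        data = np.append(data, rand_num * np.ones(n_len))
        ac_len += n_len
    df["DATA" + str(c)] = data
df.head(10)
plt.iplot([go.Scatter(x=df.DATE, y=df.DATA0, mode="lines+markers")])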

You can drop every point whose value is exactly the same as both its left and right neighbors (in time) and still obtain the same plot shape, because a line plot only needs the first and last point of each constant run.
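To make the rule concrete, here is a minimal sketch on a tiny hand-written series. The series `s` and the names `differs_from_prev`, `differs_from_next` and `reduced` are made up for illustration, and it uses `shift` comparisons rather than `diff`, which is just one way to express the same idea.

```python
import pandas as pd

# Tiny illustrative series: three constant runs
s = pd.Series([0, 0, 0, 1, 1, 1, 1, 2, 2])

# A point is kept if it differs from its previous or its next neighbor.
# Comparing against the NaN produced by shift() yields True, so the
# first and last points are always kept.
differs_from_prev = s != s.shift(1)
differs_from_next = s != s.shift(-1)
reduced = s[differs_from_prev | differs_from_next]

print(list(reduced.index))  # [0, 2, 3, 6, 7, 8]
```

Only the two end points of each run survive, which is all a line plot needs to draw the same steps.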

For a DataFrame with one or more data columns, this filtering can be achieved like this:

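def filter_temporal_data(df, columns):
    # Work on a copy so the caller's DataFrame is not modified
    dff = df.copy()
    # Absolute change with respect to the previous row;
    # the first row has no previous row, so fillna(1) keeps it
    dff["USE"] = dff[columns].diff().fillna(1).abs().sum(axis=1)
    # Absolute change with respect to the next row;
    # the last row has no next row, so fillna(1) keeps it
    dff["USE"] += dff[columns].diff(periods=-1).fillna(1).abs().sum(axis=1)
    # Keep only the rows that differ from at least one neighbor
    dff = dff[dff["USE"] != 0]
    return dff

dff = filter_temporal_data(df, ["DATA0"])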
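One way to check that nothing visible is lost is to interpolate linearly between the kept points and compare the result with the original data. The sketch below does that on a small deterministic stand-in for the random sample data. The names `values`, `reduced`, `rebuilt` and `same_shape` are illustrative, and the function is restated only so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def filter_temporal_data(df, columns):
    # Same rule as the function above, applied to a copy of the input
    dff = df.copy()
    dff["USE"] = dff[columns].diff().fillna(1).abs().sum(axis=1)
    dff["USE"] += dff[columns].diff(periods=-1).fillna(1).abs().sum(axis=1)
    return dff[dff["USE"] != 0]

# Deterministic stand-in for the random data: four constant runs
values = np.repeat([0.0, 2.0, 1.0, 0.0], [6, 5, 7, 8])
df = pd.DataFrame({
    "DATE": pd.date_range("1/1/2018", periods=len(values), freq="D"),
    "DATA0": values,
})
reduced = filter_temporal_data(df, ["DATA0"])

# Linear interpolation between the kept points, evaluated at every
# original row position, mimics what a line plot draws
rebuilt = np.interp(
    df.index.to_numpy(),
    reduced.index.to_numpy(),
    reduced["DATA0"].to_numpy(),
)
same_shape = np.allclose(rebuilt, df["DATA0"].to_numpy())
print(len(df), "->", len(reduced), "points; same shape:", same_shape)
```

Plotting `reduced.DATE` against `reduced.DATA0` should therefore trace the same line as the full data with far fewer markers.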
