“Data is becoming the new raw material of business.” – Craig Mundie
Data is never constant. It changes over time, and the period of change may be years, months or even fractions of a second. Data scientists build their models on data captured at a particular point in time. Predictive models that assume a fixed relationship between the explanatory and the explained variables can therefore show substandard and dwindling performance as the data moves on. This loss of performance caused by the dynamism of data is what is known as drift in machine learning.
What Is It?
Let us consider an example of Market Basket Analysis where we want to group various loan types into “baskets” based on loan attributes, historical consumer data, loan uptake patterns, consumer income patterns, credit history and various other factors. Together, these attributes help credit agencies decide which type of loan is the most probable next offer. A predictive model is built on these factors. When the distribution of these input factors changes, the performance of the model can deteriorate. This is a major reason why a model’s accuracy and reliability degrade with time. Thus, by monitoring data drift, one can spot this degradation early and retrain or adjust the model to maintain its performance. Closely related terms for this phenomenon include ‘concept drift’, ‘dataset shift’, ‘covariate shift’ and ‘non-stationarity’.
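To make the idea of monitoring data drift a little more concrete, here is a minimal sketch that compares one input feature at training time with the same feature observed later. It uses the two-sample Kolmogorov-Smirnov statistic, which is one common choice among several and not necessarily the one a production system would use. The income figures, the function names and the threshold of 0.1 are assumptions made up for this illustration.

```python
import bisect
import random


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    cur = sorted(current)
    largest_gap = 0.0
    for x in ref + cur:
        # Fraction of each sample that is <= x.
        ref_cdf = bisect.bisect_right(ref, x) / len(ref)
        cur_cdf = bisect.bisect_right(cur, x) / len(cur)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


def has_drifted(reference, current, threshold=0.1):
    """Flag drift when the KS statistic exceeds a hand-picked threshold
    (0.1 is an assumption for this sketch, not a recommended value)."""
    return ks_statistic(reference, current) > threshold


rng = random.Random(0)
# Hypothetical applicant incomes (in thousands): at training time,
# later with no change, and later after a shift in the population.
train_income = [rng.gauss(50, 10) for _ in range(2000)]
same_income = [rng.gauss(50, 10) for _ in range(2000)]
shifted_income = [rng.gauss(58, 10) for _ in range(2000)]

print("no change :", has_drifted(train_income, same_income))
print("shifted   :", has_drifted(train_income, shifted_income))
```

In practice a check like this would be run per feature on a schedule, and a flagged feature would prompt a closer look at the model’s recent predictions.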
What Are Its Types?
- Sudden / Abrupt Drift
In this type of drift, a new concept appears in the input variable within a short interval of time. For instance, in medical science, when an ongoing experiment is abruptly halted because it is hazardous to its subjects, the change is sudden. Such a change can be detected and monitored because it shows up as a clear movement of an average or of the data distribution from one point to another.
- Gradual Drift
In this type of drift, a new concept slowly replaces an old concept over a period. For instance, vaccines for various diseases are developed gradually over time through numerous experiments. Similarly, consumer lifestyles and habits change with time. It can take a very long time for the change to be reflected in the model.
- Recurring Drift
In this type of drift, old concepts may reoccur after a period of time. These changes reoccur either cyclically or in an unordered way. For instance, changes in weather prediction models occur in accordance with the seasons. Also, some trends on social media networks might gain popularity at specific times, say during particular seasons, festivals or elections. In the case of recurring changes, a model that has been used previously can be applied again in the future.
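The three kinds of drift described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the “old” concept is modelled as values centred on 0 and the “new” concept as values centred on 3, and each drift type is expressed as the share of observations drawn from the new concept at time t. The function names, the window size of 200 and the threshold of 1.0 are all hypothetical choices for this example.

```python
import random
import statistics


def sudden(t, n):
    """The new concept fully replaces the old one halfway through."""
    return 0.0 if t < n // 2 else 1.0


def gradual(t, n):
    """The new concept becomes steadily more frequent over the stream."""
    return t / n


def recurring(t, n):
    """Old and new concepts alternate across four equal seasons."""
    return 1.0 if (4 * t // n) % 2 else 0.0


def make_stream(new_concept_share, n=1600, seed=0):
    """Draw n values; each comes from the new concept (mean 3) with the
    probability given by new_concept_share, else from the old (mean 0)."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        use_new = rng.random() < new_concept_share(t, n)
        stream.append(rng.gauss(3.0 if use_new else 0.0, 1.0))
    return stream


def window_shifts(stream, window=200):
    """Shift of each window's mean from the first window's mean,
    measured in standard deviations of the first window."""
    chunks = [stream[i:i + window] for i in range(0, len(stream), window)]
    base_mean = statistics.fmean(chunks[0])
    base_sd = statistics.stdev(chunks[0])
    return [(statistics.fmean(c) - base_mean) / base_sd for c in chunks]


def drift_flags(stream, window=200, threshold=1.0):
    """True for every window whose mean has moved past the threshold."""
    return [shift > threshold for shift in window_shifts(stream, window)]


for name, share in [("sudden", sudden), ("gradual", gradual),
                    ("recurring", recurring)]:
    print(name, drift_flags(make_stream(share)))
```

With these settings, the sudden stream should flip from unflagged to flagged in a single step, the gradual stream should cross the threshold only some way into the stream, and the recurring stream should switch back and forth. The windows that return to the old concept are the ones where a previously used model could be brought back.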
In our further research, we explore a comprehensive method for countering the consequences of drift, such as increasing costs and decreasing accuracy. We will focus on making our models more robust and systematic by monitoring these changes in data and taking the necessary corrective steps. Watch this space for further information!