The science of predictive modeling has evolved across many fields, including physics, chemistry, computer science and statistics.
Predictive modeling is often called “machine learning” or “artificial intelligence” to make the field look more interesting. However, predictive modeling is nothing more than the art of building a “pattern recognition” tool (predictive analytics) to find “something” (something valuable) inside large amounts of data (also known as “data mining” or “knowledge discovery”).
What is Predictive Modeling?
Predictive modeling is the process of developing mathematical tools (models) that generate accurate predictions. Put another way, predictive modeling is the process of creating a model that predicts the probability of an outcome.
In predictive modeling, the main goal is always the same, even if each field approaches the problem with different tools and perspectives: “to make an accurate prediction”.
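To make “a model that predicts the probability of an outcome” more concrete, here is a minimal sketch in Python. It assumes a one-variable logistic model fitted by plain gradient descent; the data (weekly hours of product usage and whether the customer renewed) and all function names are hypothetical and chosen only for illustration. Real projects would normally rely on an established library rather than hand-written code like this.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: hours of usage per week -> renewed (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0]
renewed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, renewed)

def predict_probability(x):
    """Estimated probability that a customer with x weekly hours renews."""
    return sigmoid(w * x + b)

print("1 hour/week :", round(predict_probability(1.0), 2))
print("5 hours/week:", round(predict_probability(5.0), 2))
```

The output is not a yes/no answer but a number between 0 and 1, which is what lets a business rank customers by risk instead of only labeling them.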
What is the main problem that predictive modeling solves?
At first glance, the answer may not seem obvious. The main problem that predictive modeling solves is the “inability of our brain to see beyond”. Let me explain with an example: when you look at an Excel spreadsheet with tons of data spread over rows and columns, you cannot reach any conclusion about that bunch of numbers. Predictive modeling extracts value (commonly, financial value) from these huge amounts of data, which (even when they appear well ordered) hide value that is “invisible” to the human eye.
The human brain can gather a vast amount of data, both consciously and subconsciously. However, it cannot process and assemble the even greater amount of easily obtainable information that is relevant to the problem at hand. Predictive modeling helps us make decisions based on real data (the real world) and not just on our imperfect intuition.
What examples of predictive models do we use every day?
In our daily lives, we consciously and subconsciously use data models to help us make appropriate decisions and save time. Think about the following process, which most of the predictive models we use daily carry out:
- the model takes our current information
- sifts through data looking for patterns (relevant to solve our problem) and
- returns answers.
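The three steps above can be sketched with a toy “customers who bought this also bought” recommender. This is only an illustration under stated assumptions: the purchase history is invented, and `recommend` is a hypothetical helper that simply counts co-purchases, not how any real shop or search engine works.

```python
from collections import Counter

# Hypothetical purchase history: each set is one customer's basket
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"stove", "kettle"},
    {"sleeping bag", "tent", "lantern"},
]

def recommend(viewed_item, history, top_n=2):
    # Step 1: take the current information (the item being viewed)
    co_counts = Counter()
    # Step 2: sift through the data for a pattern, here the items
    # that appear in the same baskets as the viewed item
    for basket in history:
        if viewed_item in basket:
            co_counts.update(basket - {viewed_item})
    # Step 3: return the answer, most frequent co-purchases first
    return [item for item, _ in co_counts.most_common(top_n)]

print(recommend("tent", baskets))  # ['sleeping bag', 'lantern']
```

In this invented history, the sleeping bag appears in three of the four baskets that contain a tent and the lantern in two, so those are the two suggestions returned.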
The best-known predictive modeling product that we consciously use every day is the Google search engine.
There are tons of predictive models that we consciously use on the Internet (such as symptoms.webmd.com, which suggests a diagnosis based on our symptoms, or the weather forecasts we check). However, there are many others that we use unconsciously, such as Amazon’s suggestions of similar products (a model predicts that you probably want to buy product B because you’ve looked at product A).
What can be done with predictive models?
The truth is that examples of these predictive models can be found everywhere. They range from Google’s global machine, which interprets cryptic human “queries” (searches or keywords), to the algorithms banks use to detect credit card fraud, the Netflix model that recommends movies to subscribers and, of course, the financial systems that handle billions of trades (with only the occasional meltdown).
Some of the questions that can be answered with data science are the following:
- How much will this house sell for in the current market?
- Should I sell/buy these assets at this moment?
- Will this customer move their contract to a different company?
- How many products are we going to sell this season?
- Does this patient have a specific disease, and will they respond to this therapy?
- Which people should we match in our dating app?
Predictive models already permeate our existence much more than we think. They guide us towards more satisfying products when we shop online and towards better medical treatments.
But what happens when you look at tomorrow’s weather forecast? Predictive models can generate inaccurate predictions and provide the wrong answers. For example, spam lands in your e-mail inbox when a predictive model (a.k.a. the e-mail filter) incorrectly identifies the message as important. Don’t be angry if you receive inappropriate suggestions while shopping online; inaccuracies like these can cause far bigger trouble, as in the 2010 market “flash crash” that the U.S. Securities & Exchange Commission reported on.
Where else is predictive modeling applied?
As another example, insurance companies predict potential health risks to determine whether an individual should receive a policy. Governments sift through large amounts of data to predict risks, protect their citizens or make social investments. As you can probably guess, predictive models can also be used for fraud detection in different areas, for biometric identification of terror suspects, for modeling unrest and turmoil, and for solving thousands of problems in many other sectors.
How can the accuracy of these predictive models be improved?
There are many reasons why predictive models fail. These are the most common ones:
- inadequate pre-processing of the data
- inadequate model validation
- unjustified extrapolation
- over-fitting the model to the existing data
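Two of the failure modes listed above, over-fitting and inadequate validation, can be illustrated with a small sketch. It assumes invented data that roughly follows y = 2x plus noise, and compares a straight-line fit with a hypothetical “memorizer” that returns the outcome of the closest training point. The function names and numbers are illustrative assumptions only.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Over-fitted 'model': repeat the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Hypothetical noisy data that roughly follows y = 2x
train_x, train_y = [0, 2, 4, 6, 8, 10], [1, 3, 9, 11, 17, 19]
holdout_x, holdout_y = [1, 3, 5, 7, 9], [2.5, 5.5, 10.5, 13.5, 18.5]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(name,
          "| training error:", round(mse(model, train_x, train_y), 3),
          "| held-out error:", round(mse(model, holdout_x, holdout_y), 3))
```

The memorizer reproduces the training data perfectly, so judging it on the data it was built from would make it look flawless. Only the held-out points, which neither model saw during fitting, reveal that the simpler line predicts better here.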
Beyond these common problems, there is another one: failing to explore other models that could solve a given problem. When searching for predictive relationships, many modelers explore only a handful of models. In most cases this is due to (1) their preference for, or knowledge of, just a few models; in other cases it is due to (2) a lack of available software that would let them explore other techniques.
In any case, the best approach is to explore a wide range of techniques to find the most efficient or the most accurate one (depending on the goal of the problem to be solved).
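As a rough sketch of what exploring several techniques can look like, the snippet below scores three candidate models with leave-one-out cross-validation and keeps the one with the lowest error. Everything here is an assumption made for illustration: the data is invented, the three candidates (a mean baseline, a straight line and a nearest-neighbour rule) are arbitrary choices, and accuracy is taken as the only selection criterion.

```python
def fit_mean(xs, ys):
    """Baseline: always predict the average outcome."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: slope * (x - mean_x) + mean_y

def fit_nearest(xs, ys):
    """Predict the outcome of the closest known point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def loo_cv_error(fit, xs, ys):
    """Leave-one-out cross-validation: average squared error on the
    single point that was hidden from the model in each round."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (model(xs[i]) - ys[i]) ** 2
    return total / len(xs)

# Hypothetical data set
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

candidates = {
    "mean baseline": fit_mean,
    "straight line": fit_line,
    "nearest neighbour": fit_nearest,
}
scores = {name: loo_cv_error(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.3f}")
print("best candidate:", best)
```

On this invented, almost linear data the straight line wins. On a different problem another candidate might, which is the reason for trying more than one.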
What do you think about this last point (not trying to solve the problem with different models)? Is it because (a) most problems don’t require better results, or (b) a lack of interest or funding prevents people from investing more time and resources in better predictive models? Please comment below.