What is the difference between supervised and unsupervised learning algorithms? I’ve heard this question many times. This post is dedicated to explaining that difference in detail.
The difference between Supervised and Unsupervised Learning
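In supervised learning, the training data includes the desired outputs (labels), and these are used to train the model (or machine) to produce those outputs for new inputs. In unsupervised learning, no output labels are provided; instead, the algorithm looks for structure in the inputs alone, for example by clustering the data into groups.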
What is the difference between supervised and unsupervised learning algorithms?
Simple examples, some methods, and the general differences between supervised and unsupervised learning are described below.
Supervised Learning
In supervised learning, we are given a data set in which the correct “output” is already known for each case. Therefore, the data scientist knows what the correct output should look like, and these known outputs give an idea of the relationship between the input and the output. Since we have both the input variables and their outputs, we can fit a function that maps the inputs to an outcome.
Supervised learning problems can be categorized into “regression” or “classification” problems. In a regression problem, we predict a continuous output value. In a classification problem, we predict a discrete output (a category).
Supervised learning examples:
- Regression example of supervised learning: On the real estate market, try to predict the price of houses given data about their size, number of rooms, location, etc. For a given number of cases (the population you use to create your prediction model), you already know the price, a continuous output, as a function of size, number of rooms, location, etc.
- Classification example of supervised learning: Instead of having the price of the house as output, we classify the houses into two discrete categories, for example “sells for more than the asking price” and “sells for less than the asking price”.
In all supervised learning problems, we want to predict a given output from predictors (or features), and we create this prediction/classification model from a population of cases for which the output is known. Therefore, a data matrix in which we choose one feature as the output and the others as the inputs lets us build a model.
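The two examples above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the house data is made up, and the helper names `fit_line` and `predict_label` are hypothetical. The regression part fits a straight line with ordinary least squares, and the classification part uses a 1-nearest-neighbour rule; many other methods would work just as well.

```python
# Made-up training cases: house size (m2) and the known sale price.
sizes = [50.0, 70.0, 90.0, 110.0]
prices = [150000.0, 210000.0, 270000.0, 330000.0]


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Regression: the output is a continuous value (a price).
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 80.0 + intercept

# Made-up labelled cases: (size in m2, rooms) and the known category.
labelled = [
    ((50.0, 2), "below asking"),
    ((60.0, 2), "below asking"),
    ((100.0, 4), "above asking"),
    ((120.0, 5), "above asking"),
]


def predict_label(features, examples):
    """1-nearest-neighbour: copy the label of the closest training case."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = min(examples, key=lambda ex: sq_dist(ex[0], features))
    return nearest[1]


# Classification: the output is one of a fixed set of categories.
predicted_label = predict_label((110.0, 4), labelled)
```

In both cases the known outputs (prices or categories) are what make the problem supervised: the model is built from cases where the answer is already available.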
Unsupervised Learning
Unsupervised learning concerns problems where we have little or no idea at all about what the results of the model should look like. So, we derive structure from the data, but we don’t necessarily know the effect of the variables. As you might guess, since there are no known correct outputs, there are no performance indicators based on prediction results to check against in unsupervised learning.
We can derive this structure by clustering the data based on relationships among the variables in the data.
Unsupervised learning examples:
- Clustering example of unsupervised learning: Given a collection of 1000 essays written on Economy, the data scientist finds a way to automatically group these essays into groups that are similar or related by different variables (such as signatory-university countries, word frequency, image count, etc.).
- Associative example of unsupervised learning: Based on his/her experience, a doctor forms associations between patient characteristics and illnesses. Based on a patient’s characteristics (symptoms, physical attributes, family medical history, mental outlook, etc.), the doctor associates them with one or more possible illnesses. In this case, we can estimate a mapping from patient characteristics to different illnesses.
In unsupervised learning problems, we want to classify/organize/map/understand a bunch of data (this has nothing to do with predictors). We just want to see beyond the numbers in order to understand the data and the knowledge hidden in its features. In unsupervised problems, a matrix with data is also given, but we don’t try to label a feature as the output. Instead, we observe all variables (features) to be able to understand how the data is “organized” inside this bunch of cases (or population).
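The clustering example can be sketched with k-means, one common clustering method (the post itself does not name a specific algorithm). Everything here is an assumption made for illustration: each essay is reduced to two made-up word-frequency numbers, the `kmeans` helper is hypothetical, and the starting centroids are simply the first k points so that the run is repeatable.

```python
# Made-up features per essay: (count of "inflation", count of "trade").
essays = [(9.0, 1.0), (1.0, 8.0), (8.0, 2.0), (2.0, 9.0), (10.0, 0.0), (0.0, 10.0)]


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points are used as starting centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# No labels are passed in: the groups come from the features alone.
labels, centroids = kmeans(essays, k=2)
```

Note that the cluster numbers (0 and 1) carry no meaning by themselves. It is up to the data scientist to look at each group and interpret what its members have in common.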
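To make the "same matrix, different use" idea concrete, here is a small sketch. The rows, the column names and the helper `split_features_target` are all hypothetical. In the supervised setting one column is set aside as the output; in the unsupervised setting every column is treated as a feature.

```python
# A made-up data matrix: one row per house, one column per variable.
rows = [
    {"size_m2": 50.0, "rooms": 2, "price": 150000.0},
    {"size_m2": 70.0, "rooms": 3, "price": 210000.0},
    {"size_m2": 110.0, "rooms": 5, "price": 330000.0},
]


def split_features_target(rows, target):
    """Supervised view: separate the chosen output column from the inputs."""
    inputs = [[v for k, v in row.items() if k != target] for row in rows]
    outputs = [row[target] for row in rows]
    return inputs, outputs


# Supervised: "price" is labelled as the output, the rest are predictors.
X_supervised, y_supervised = split_features_target(rows, "price")

# Unsupervised: no column is singled out; all variables are observed together.
X_unsupervised = [list(row.values()) for row in rows]
```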
Do you have a better example for describing the difference between supervised and unsupervised learning algorithms? Do you agree with the given example? Is there a shorter description to differentiate between supervised and unsupervised learning? Comment below.