Supervised Learning, Unsupervised Learning, And More: The Main Types of Machine Learning
In this article, we’ll focus on the second word contained in the concept of Machine learning, which is… well… LEARNING!
Yep, things are getting serious, so drink a coffee and let’s start.
… Just to avoid tragic misunderstandings: by “coffee” I mean espresso.
Not cappuccino or americano.
Especially after 11 am.
OK, OK, I’ll get to the point. There are four main categories of machine learning: supervised, unsupervised, semi-supervised, and reinforcement learning.
Each approach offers a different way to train our systems. Let’s discover them together!
What is Supervised learning?
Supervised learning is an approach that teaches machines by example.
It involves training algorithms on previously classified examples, on the assumption that there is a relationship between the input and the resulting output.
So we are talking about inputs that are already paired with their outputs.
How does Supervised learning work in practice?
In supervised learning, machines are exposed to large amounts of labeled data.
This means the data has been annotated with one or more labels.
The process of attaching labels to unstructured data is known as data annotation or data labeling.
Labels can indicate if a photo contains a car or a person (image annotation), what the topic of an essay is (text classification), which words were uttered in an audio recording (audio transcription), and many others.
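To make “labeled data” a bit more concrete, here is a minimal sketch of how annotated items could be stored. Everything in it is an assumption made up for illustration: the `LabeledExample` class, the file names, and the labels themselves. Real annotation tools use their own formats.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    """One annotated item: the raw input plus the labels a human attached."""
    data: str   # hypothetical file name of the raw input
    task: str   # which kind of annotation this is
    labels: list[str] = field(default_factory=list)  # one or more labels


# Made-up annotated items, one per task mentioned above.
dataset = [
    LabeledExample("street.jpg", "image annotation", ["car", "person"]),
    LabeledExample("essay.txt", "text classification", ["history"]),
    LabeledExample("memo.wav", "audio transcription", ["see you at noon"]),
]


def items_with_label(examples, label):
    """Return the inputs that were annotated with the given label."""
    return [ex.data for ex in examples if label in ex.labels]


print(items_with_label(dataset, "car"))
```

Note that the photo carries two labels at once, which is the “one or more labels” situation described above.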
Hello, I see you!
Another interesting example of image annotation?
Images of handwritten numbers annotated to indicate which digit they correspond to.
Given enough examples, a supervised-learning system will learn to recognize and distinguish the shape of each handwritten digit!
Yes, applications that recognize text in handwritten notes work this way. Crazy stuff, really.
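Here is a toy sketch of that idea in plain Python. It is not how real handwriting apps work. It assumes a very simple technique (a nearest-centroid classifier) and a handful of made-up 3x5 bitmaps instead of real scans. The names `TRAINING`, `train`, and `predict` are all hypothetical.

```python
from collections import defaultdict

# Made-up 3x5 "handwritten" digits: '#' is ink, '.' is blank paper.
# Each bitmap is written row by row and paired with its label.
TRAINING = [
    ("###" "#.#" "#.#" "#.#" "###", 0),
    (".#." "#.#" "#.#" "#.#" ".#.", 0),
    (".#." ".#." ".#." ".#." ".#.", 1),
    (".#." "##." ".#." ".#." "###", 1),
    ("###" "..#" "..#" "..#" "..#", 7),
    ("###" "..#" ".#." ".#." ".#.", 7),
]


def to_pixels(bitmap):
    """Turn a bitmap string into a list of 0.0/1.0 pixel values."""
    return [1.0 if ch == "#" else 0.0 for ch in bitmap]


def train(examples):
    """Average the pixels of all examples that share a label."""
    grouped = defaultdict(list)
    for bitmap, label in examples:
        grouped[label].append(to_pixels(bitmap))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, bitmap):
    """Pick the label whose average shape is closest to the bitmap."""
    pixels = to_pixels(bitmap)

    def distance(label):
        return sum((p - c) ** 2 for p, c in zip(pixels, centroids[label]))

    return min(centroids, key=distance)


model = train(TRAINING)
sloppy_seven = "###" "..#" "..#" ".#." ".#."  # not in the training data
print(predict(model, sloppy_seven))  # prints 7
```

The system never gets a rule like “a seven has a bar on top”. It only sees labeled examples and compares new shapes against what it has averaged from them.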
However, these tools are not powerful enough to recognize my handwriting.
I write SO badly…
Labeling data takes a really long time. Longer than reading The Divine Comedy!
Training these systems usually requires an incredible amount of labeled data.
Some systems literally need to be exposed to millions of examples before they can master a task properly.
That was the case with ImageNet, a famous image database that contains more than 14 million labeled images.
The size of training datasets continues to grow. In 2018, Facebook announced it had compiled 3.5 billion images publicly available on Instagram, using hashtags attached to each image as labels.
Using one billion of these photos to train an image-recognition system produced record levels of accuracy (85.4 percent) on ImageNet’s benchmark.
The laborious process of labeling datasets is often carried out through crowdworking services, such as Amazon Mechanical Turk, which provides access to a large pool of low-cost labor spread around the globe.
ImageNet took over two years of work by nearly 50,000 people, mainly recruited through this service.
It’s capitalism, baby!
Facebook followed an alternative approach, using immense and publicly available datasets to train systems without the overhead of manual labeling.
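To give a feeling for how hashtags can stand in for hand-made labels, here is a small sketch. It is not Facebook’s actual pipeline: the `HASHTAG_TO_LABEL` mapping, the file names, and the captions are all invented for illustration, and real hashtags are far noisier than these.

```python
# Hypothetical mapping from hashtags to the labels we care about.
HASHTAG_TO_LABEL = {
    "#dog": "dog",
    "#puppy": "dog",
    "#cat": "cat",
    "#espresso": "coffee",
}


def labels_from_hashtags(caption):
    """Derive training labels from the hashtags found in a caption."""
    labels = set()
    for word in caption.lower().split():
        # Drop trailing punctuation so "#puppy!" still matches "#puppy".
        label = HASHTAG_TO_LABEL.get(word.strip(".,!?"))
        if label:
            labels.add(label)
    return sorted(labels)


# Made-up posts: (image file, caption written by the user).
posts = [
    ("img_001.jpg", "Morning walk with my #puppy! #Dog"),
    ("img_002.jpg", "Nothing beats an #espresso."),
    ("img_003.jpg", "No hashtags here"),
]

# Keep only the images that ended up with at least one label.
weakly_labeled = [
    (image, labels_from_hashtags(caption))
    for image, caption in posts
    if labels_from_hashtags(caption)
]
print(weakly_labeled)
```

Nobody had to sit down and annotate these images: the users did it for free when they wrote their captions. The price is that an image with no useful hashtag, like the third one, simply gets left out.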