What Is Machine Learning: A Very Informative Explanation


Machine Learning is all around us.  

Google uses it to provide millions of search results every hour. It helps Facebook guess your next love interest. Even Elon Musk’s Tesla uses it in its self-driving technology. However, if you’re new to the field, Machine Learning can seem daunting.

In this article, we’ll give you an introduction to what Machine Learning actually is.  

How Do You Build a Machine Learning Model?


Machine Learning (ML) is a method of data analysis that is considered a branch of Artificial Intelligence (AI).

During the Machine Learning process, we build predictive models by applying algorithms to data. Building a good Machine Learning model can be similar to parenting.

In this analogy, the ML model is the child and the parent is the data scientist working on it. The parent’s main goal is to raise a child capable of solving problems. To become an excellent problem solver, the child has to learn how to deal with the surrounding environment. There are many unknowns at first, but over time the child’s logic improves. Given enough life experience and useful lessons, the child becomes a brilliant problem solver.

This is precisely what we want from an ML model – problem solving skills! 

It’s all about learning from experience. Machine Learning does not rely on a pre-written equation. Instead, the algorithm learns from experience in the form of training data. The more high-quality data you have, the stronger the results you can obtain from the model.
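A minimal Python sketch of "learning from training data" (an illustration, not code from any particular library): rather than hard-coding an equation, the function below estimates the slope and intercept of a straight line from example pairs using ordinary least squares. The data points are made up and roughly follow y = 2x + 1.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data, roughly following y = 2x + 1 with a little noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(f"learned: y = {slope:.2f}x + {intercept:.2f}")
print("prediction for x = 5:", round(slope * 5 + intercept, 2))
```

The "equation" here is never written by hand; it is whatever line best fits the examples the model was given.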

A child (or adult) may be talented but lack experience, especially if they haven’t practiced much. In such cases, they are likely to be outperformed by someone of average talent who keeps learning and working on themselves.

The same goes for Machine Learning models: the more training data you have, the better the output you will receive. In many situations, a sophisticated Machine Learning algorithm trained on a small amount of data will perform worse than a fairly simple algorithm trained on a large amount of data.

What are The Main Types of Machine Learning? 

There are three main types of Machine Learning: 

  • Supervised learning 
  • Unsupervised learning 
  • Reinforcement learning

Let’s briefly explain each of them.

Supervised Learning

Supervised learning relies on labeled data. Following our previous analogy, the parent is very active in this case, and points out to the child whether a type of behavior is ‘good’ or ‘bad’. In fact, the parent provides plenty of pre-labeled examples. Based on this existing knowledge, the child tries to produce a pattern of behavior that fits the parent’s initial guidelines. 
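To illustrate learning from labeled examples, here is a hedged Python sketch of one very simple supervised method, a nearest-centroid classifier: training averages the examples of each label, and a new input is assigned the label whose average is closest. The two-feature examples and the "good"/"bad" labels are hypothetical, chosen to mirror the parenting analogy.

```python
from math import dist

def train_nearest_centroid(examples):
    """examples: list of (features, label) pairs. Returns the mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance) to the input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical labeled training data: (features, label)
training_data = [
    ((1.0, 1.0), "good"), ((1.5, 2.0), "good"),
    ((8.0, 8.0), "bad"), ((9.0, 8.5), "bad"),
]
centroids = train_nearest_centroid(training_data)
print(predict(centroids, (2.0, 1.0)))  # good
print(predict(centroids, (7.0, 9.0)))  # bad
```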

Unsupervised Learning

Unsupervised learning, on the other hand, is an approach used when we don’t have labeled data. Our experiences are unlabeled: they’re not categorized as ‘good’ or ‘bad’. The parent lets the child explore the world on their own. Without initial guidance, the child won’t be able to recognize and categorize experiences as ‘good’ or ‘bad’. However, that is not the objective. The parent’s aim with this technique is for the child to eventually distinguish and point out different types of behavior based on their similarities and differences.

Reinforcement Learning

The third type of Machine Learning is called reinforcement learning. This type is based on feedback. Every time the parent sees a positive behavior from the child, they reward them. Similarly, bad behavior is discouraged with punishment. 
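Full reinforcement learning involves states, actions, and long-term rewards. The hedged sketch below shows only its simplest form, a multi-armed-bandit-style learner with epsilon-greedy exploration: the agent tries actions, receives a reward or a punishment, and keeps a running average of how well each action pays off. The action names and reward values are hypothetical.

```python
import random

def learn_from_feedback(reward_for, actions, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning: mostly repeat the action with the best average
    reward so far, occasionally try a random one, and update the averages."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in actions}  # estimated reward of each action
    count = {a: 0 for a in actions}    # times each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(actions)          # explore
        else:
            action = max(actions, key=value.get)  # exploit the current best guess
        reward = reward_for(action)
        count[action] += 1
        # Incremental mean: nudge the estimate toward the reward just received
        value[action] += (reward - value[action]) / count[action]
    return value

# Hypothetical feedback: punishment (-1) for bad behavior, reward (+1) for good
feedback = {"bad_behavior": -1.0, "good_behavior": 1.0}
values = learn_from_feedback(feedback.get, list(feedback))
print(max(values, key=values.get))  # good_behavior
```

The `epsilon` parameter controls how often the agent explores an action other than its current favorite; without some exploration it could get stuck on whatever it happened to try first.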

As with parenting styles, Machine Learning models can be adjusted over time when the data scientist believes that changing some of the model’s parameters could produce more accurate results. Much of the art of the data scientist and Machine Learning engineer professions lies in fine-tuning an already well-performing model. In some cases, a 0.1% improvement in accuracy can be significant, especially when the ML model is applied in areas like healthcare, fraud prevention, and self-driving vehicles.
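Fine-tuning can be sketched as a simple search over one parameter: try several candidate values, score each on held-out validation data, and keep the best. In this hedged Python example the tuned parameter is a classification threshold applied to model scores; the scores, labels, and candidate values are hypothetical, and real projects usually tune several hyperparameters at once with grid or random search.

```python
def accuracy(threshold, scores, labels):
    """Share of examples classified correctly when score >= threshold means class 1."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def tune_threshold(candidates, scores, labels):
    """Try each candidate value and keep the one with the best validation accuracy."""
    return max(candidates, key=lambda t: accuracy(t, scores, labels))

# Hypothetical validation data: model scores and true labels
val_scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
val_labels = [0, 0, 1, 1, 1, 1]
candidates = [0.2, 0.33, 0.5, 0.8]

best = tune_threshold(candidates, val_scores, val_labels)
print(best, accuracy(best, val_scores, val_labels))  # 0.33 1.0
```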

In terms of the complexity of a model that a data scientist can create, we can distinguish between traditional Machine Learning methods and Deep Learning. 

Some of the most popular traditional supervised Machine Learning techniques are: 

  • Linear regression
  • Logistic regression
  • Time series forecasting
  • Support vector machines
  • Decision trees 

These methods allow us to predict a future value or classify our data based on predefined classes. 
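As a hedged illustration of the classification side of these methods, here is logistic regression trained with plain batch gradient descent in Python. It learns a weight w and bias b so that sigmoid(w * x + b) estimates the probability that an input belongs to class 1. The one-feature dataset, learning rate, and epoch count are hypothetical choices; libraries such as scikit-learn provide tuned implementations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, labels, lr=0.1, epochs=1000):
    """Fit p(y = 1 | x) = sigmoid(w * x + b) with batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, labels)]
        # Gradients of the mean log loss with respect to w and b
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical one-feature training data
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
labels = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(xs, labels)

def predict_proba(x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)

print(round(predict_proba(2.5), 3), round(predict_proba(-2.5), 3))
```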

On the other hand, traditional unsupervised ML techniques, such as K-means clustering, are mainly used for grouping items in the input data into clusters and analyzing patterns in these clusters. 
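A minimal pure-Python sketch of K-means (Lloyd's algorithm), offered as an illustration rather than a production implementation: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The 2-D points are hypothetical, and seeding the centroids with the first k points is a simplification chosen here for reproducibility.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm. This sketch seeds the centroids with the first k points;
    library implementations usually use random or k-means++ seeding."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

# Hypothetical unlabeled data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```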

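In some instances, data scientists use Principal Component Analysis (PCA) for dimensionality reduction: it replaces the original variables with a smaller set of new ones (principal components), each a weighted combination of the originals, chosen to capture as much of the dataset’s variation as possible. The weights also hint at which original variables contribute most.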
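For two features, PCA can be written out in a few lines, which makes the idea visible: compute the covariance matrix, take its leading eigenvector as the first principal component, and project the data onto it to go from two features to one. This is a hedged Python sketch using the closed-form eigen decomposition of a 2x2 matrix; the data points are hypothetical, and datasets with many features are normally handled with a library routine such as scikit-learn's `PCA`.

```python
import math

def pca_2d(points):
    """PCA for two features via the closed-form eigen decomposition of the 2x2 covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum(dx * dx for dx, _ in centered) / (n - 1)
    c = sum(dy * dy for _, dy in centered) / (n - 1)
    b = sum(dx * dy for dx, dy in centered) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix
    half_trace = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + spread, half_trace - spread
    # Eigenvector for the largest eigenvalue solves (A - lam1 * I) v = 0
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    component = (vx / norm, vy / norm)
    explained = lam1 / (lam1 + lam2)  # share of total variance captured
    # Project each centered point onto the first component: 2 features -> 1
    scores = [dx * component[0] + dy * component[1] for dx, dy in centered]
    return component, explained, scores

# Hypothetical data where the two features move almost together
points = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
component, explained, scores = pca_2d(points)
print(component, round(explained, 3))
```

Here a single component captures nearly all of the variation, so the two correlated features can be summarized by one score per point.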

What Is Deep Learning? 

If Machine Learning is considered a branch of AI, then we can say that Deep Learning is a branch of ML. 

The inspiration for Deep Learning (DL) arose from studying how the human brain works. It relies on a structure called a neural network, consisting of multiple layers. In a sense, each of these layers can be considered a classic ML mini-model, and they all learn together.  

A common rule of thumb is that a neural network does Deep Learning when it has more than three layers. The more layers a neural network has, the more complex it is and the greater its capacity for learning. In a neural network, each layer’s outputs are the inputs for the next layer.
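The idea that each layer's outputs feed the next layer can be shown with a tiny forward pass in pure Python. This is a hedged sketch: the weights are hand-picked hypothetical values rather than learned ones, training (backpropagation) is omitted, and the network has only two layers, so by the three-layer rule of thumb it would not count as deep. A deep network stacks more of the same layers.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each neuron takes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def forward(inputs, layers):
    """Feed the inputs through the layers; each layer's outputs become the next layer's inputs."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:  # non-linearity on hidden layers only
            activations = relu(activations)
    return activations

# Hypothetical hand-picked weights for a 2 -> 2 -> 1 network (no training involved)
layers = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.5]),  # hidden layer: 2 neurons
    ([[1.0, 2.0]], [-1.0]),                   # output layer: 1 neuron
]
print(forward([1.0, 2.0], layers))  # [6.0]
```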

Deep Learning is currently the leading approach for tasks like:

  • Image recognition and video recognition
  • Speech classification and speech recognition
  • Natural Language Processing (NLP) 

Basically, all the cool AI stuff presented at innovation summits. 

Machine Learning vs Deep Learning: How to Choose?  

The short answer is: it depends on the complexity of your data.

The classic approach suffices when we have simpler data, whereas complex data will likely require neural networks. On complex data such as images, audio, and text, Deep Learning usually achieves higher accuracy than traditional Machine Learning methods. However, it requires a higher degree of sophistication, is more difficult to interpret, and typically takes longer to build and train than traditional methods.

The important thing to remember is that Machine Learning is a tool that can potentially empower people, if applied ethically. It allows us to reduce our workload at scale and is invaluable when we have to deal with a lot of incoming data and constantly make a great number of micro-decisions.

Machine Learning: Next Steps 

Now that you have a basic understanding of what Machine Learning is, you can start learning how to apply it yourself.  

Are you ready to dive in? 

Try our Machine Learning in Python course for free.


Google Professional Machine Learning Engineer Exam: Questions and Answers

Question 1.

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. The options below assign a service to each of the pipeline’s three numbered stages (1, 2, 3). How should you configure the pipeline?

  • A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery
  • B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable
  • C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
  • D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

Answer 1. A

Question 2.

Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?

  • A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.
  • B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.
  • C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.
  • D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.

Answer 2. C
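Option C treats the task as route optimization rather than prediction, since riders confirm their presence a day in advance. A hedged Python sketch of that idea: brute-force the shortest route through the confirmed stations and pick the smallest shuttle that fits everyone. The coordinates, passenger counts, and shuttle sizes are hypothetical, and brute force is only practical for a handful of stops; larger instances need a dedicated routing solver.

```python
from itertools import permutations
from math import dist

def shortest_route(depot, stops):
    """Brute-force the shortest open route from the depot through every stop.
    stops: dict of station name -> (x, y)."""
    best_order, best_length = None, float("inf")
    for order in permutations(stops):
        path = [depot] + [stops[name] for name in order]
        length = sum(dist(p, q) for p, q in zip(path, path[1:]))
        if length < best_length:
            best_order, best_length = order, length
    return list(best_order), best_length

def pick_shuttle(passengers, shuttle_sizes):
    """Smallest shuttle whose capacity covers all confirmed passengers."""
    return min(size for size in shuttle_sizes if size >= passengers)

# Hypothetical confirmed stations and passenger counts for one time slot
depot = (0.0, 0.0)
confirmed_stops = {"A": (0.0, 1.0), "B": (0.0, 3.0), "C": (4.0, 0.0)}
passengers = {"A": 4, "B": 6, "C": 3}

route, length = shortest_route(depot, confirmed_stops)
shuttle = pick_shuttle(sum(passengers.values()), [8, 15, 30])
print(route, shuttle)  # ['A', 'B', 'C'] 15
print(round(length, 2))
```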

Question 3.

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

  • A. Use the class distribution to generate 10% positive examples.
  • B. Use a convolutional neural network with max pooling and softmax activation.
  • C. Downsample the data with upweighting to create a sample with 10% positive examples.
  • D. Remove negative examples until the numbers of positive and negative examples are equal.

Answer 3. C
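Option C's "downsampling with upweighting" can be sketched in a few lines of Python: keep every rare positive example, randomly sample the negatives down to a target share, and give each kept negative a weight equal to the downsampling factor so the weighted class totals still reflect the original data. This is a hedged illustration; the 10% target, the (features, label) layout, and the synthetic dataset are assumptions.

```python
import random

def downsample_with_upweighting(examples, target_positive_share=0.1, seed=0):
    """examples: list of (features, label), label 1 = failure (rare), 0 = normal.
    Returns (features, label, weight) triples."""
    positives = [ex for ex in examples if ex[1] == 1]
    negatives = [ex for ex in examples if ex[1] == 0]
    # Number of negatives needed so positives make up the target share
    n_keep = round(len(positives) * (1 - target_positive_share) / target_positive_share)
    n_keep = min(n_keep, len(negatives))
    kept = random.Random(seed).sample(negatives, n_keep)
    # Upweight: each kept negative stands in for this many original negatives
    weight = len(negatives) / n_keep
    return [(f, y, 1.0) for f, y in positives] + [(f, y, weight) for f, y in kept]

# Synthetic data: 10 failures (label 1) among 1,000 readings
examples = [(i, 1) for i in range(10)] + [(i, 0) for i in range(10, 1000)]
weighted = downsample_with_upweighting(examples)
positive_count = sum(1 for _, label, _ in weighted if label == 1)
print(len(weighted), positive_count / len(weighted))  # 100 0.1
```

The weights would then be passed to training as per-example sample weights, so the model sees a more balanced sample without its predicted probabilities drifting away from the true class frequencies.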

Question 4.

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

  • A. Use Data Fusion’s GUI to build the transformation pipelines, and then write the data into BigQuery.
  • B. Convert your PySpark into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.
  • C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.
  • D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

Answer 4. D

Question 5.

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, Scikit-learn, and custom libraries. What should you do?

  • A. Use the AI Platform custom containers feature to receive training jobs using any framework.
  • B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.
  • C. Create a library of VM images on Compute Engine, and publish these images on a centralized repository.
  • D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

Answer 5. A

Question 6.

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company’s product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform’s continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?

  • A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.
  • B. Extend your test dataset with images of the newer products when they are introduced to retraining.
  • C. Replace your test dataset with images of the newer products when they are introduced to retraining.
  • D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.

Answer 6. B

Question 7.

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

  • A. Configure AutoML Tables to perform the classification task.
  • B. Run a BigQuery ML task to perform logistic regression for the classification.
  • C. Use AI Platform Notebooks to run the classification model with pandas library.
  • D. Use AI Platform to run the classification model job configured for hyperparameter tuning.

Answer 7. A

Question 8.

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?

  • A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.
  • B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.
  • C. Write a Cloud Functions script that launches a training and deploying job on AI Platform that is triggered by Cloud Scheduler.
  • D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

Answer 8. A

Question 9.

You are developing ML models with AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers, and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

  • A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.
  • B. Use the gcloud command-line tool to submit training jobs on AI Platform when you update your code.
  • C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.
  • D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

Answer 9. C

Question 10.

Your team needs to build a model that predicts whether images contain a driver’s license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver’s licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: ['drivers_license', 'passport', 'credit_card']. Which loss function should you use?

  • A. Categorical hinge
  • B. Binary cross-entropy
  • C. Categorical cross-entropy
  • D. Sparse categorical cross-entropy

Answer 10. D
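The two cross-entropy options in this question compute the same loss for a single-label, multi-class problem; they differ only in how labels are encoded: one-hot vectors for categorical cross-entropy, integer class indices for the sparse variant (as with Keras's `CategoricalCrossentropy` and `SparseCategoricalCrossentropy`). A hedged pure-Python sketch, using a hypothetical softmax output for one image:

```python
import math

def categorical_cross_entropy(one_hot, probs):
    """Loss when the label is a one-hot vector, e.g. [0, 1, 0]."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(class_index, probs):
    """Loss when the label is an integer class index, e.g. 1."""
    return -math.log(probs[class_index])

label_map = ["drivers_license", "passport", "credit_card"]
probs = [0.7, 0.2, 0.1]  # hypothetical predicted probabilities for one image

# The same "passport" label, encoded two ways
print(categorical_cross_entropy([0, 1, 0], probs))
print(sparse_categorical_cross_entropy(label_map.index("passport"), probs))
```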
