Hi Everyone,
Welcome to the second edition of my newsletter “ML & AI Cupcakes!”
In the first edition, I gave you a sneak peek into the topics that will be covered in the upcoming weeks. Just in case you missed it, check it out here:
Let's Begin! #1
Today, we will discuss the building blocks of a machine learning model. These building blocks remain the same across most machine learning models, from Linear Regression to Neural Networks, so understanding them will help you gain deeper insight into how different ML models work.
To keep things simple, I will use the analogy of baking a cake. Think of these building blocks as different parts of a cake-baking recipe.
So, are you ready to bake the cake?
I think I heard a big “Yes”!
So, here we go:
Input Data
Input data is like the raw materials or ingredients you gather to bake a cake.
Input data is the foundation of every ML model. It is the raw material the model learns from: a collection of features. For example, the input data for a house price prediction task contains features like size, location, number of bedrooms, and number of bathrooms.
Always remember the saying “Garbage in, Garbage out”. The quality and performance of an ML model heavily depend on the quality of its input data, so always make sure the data is collected from trusted and authorized sources.
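To make this concrete, here is a minimal sketch of loading input data with pandas. The file name “house_prices.csv” and its columns are hypothetical, just for illustration:

```python
import pandas as pd

# Gather the raw ingredients: a table of houses and their attributes.
# "house_prices.csv" is a hypothetical file with columns like
# size, location, bedrooms, bathrooms and price.
data = pd.read_csv("house_prices.csv")

print(data.head())   # peek at the first few rows
print(data.shape)    # how many houses (rows) and columns we have
```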
Features
In baking, these are the individual ingredients, like flour, milk, and sugar, needed to bake the cake. Each ingredient contributes to the flavour and texture of the cake.
Similarly, features are the individual characteristics of the data, and each has a specific role to play in the model’s output. In the house price prediction dataset, each feature (size, location, number of bedrooms) contributes to determining the final price of the house.
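Continuing the sketch above (the column names are again hypothetical), the features are typically separated from the target we want to predict:

```python
# Separate the individual "ingredients" (features) from the target.
feature_columns = ["size", "location", "bedrooms", "bathrooms"]
X = data[feature_columns]   # the features
y = data["price"]           # the target: the final price of the house
```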
Preprocessing Pipeline
Before you start baking your cake, you need to make sure all the ingredients are clean, organized, and ready for baking. This includes steps like washing and chopping the fruit, cracking eggs into a bowl, and scaling the quantities of ingredients to the size of the cake you want to bake.
Similarly, in preprocessing, we need to make sure the data is ready for model building. This includes steps like handling missing values and outliers, removing inconsistencies, and feature scaling.
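As a sketch, here is how such a preprocessing pipeline might look with scikit-learn. For simplicity it handles only the numeric features; a categorical feature like location would need encoding first:

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# "Wash and chop the ingredients": fill missing values, then scale
# the features so they are on comparable ranges.
preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # feature scaling
])

X_numeric = X[["size", "bedrooms", "bathrooms"]]   # numeric features only
X_ready = preprocess.fit_transform(X_numeric)
```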
Model
The model is like the recipe you follow to bake the cake. Just as a recipe provides instructions for combining ingredients to bake the cake, an ML model defines the structure and method by which the input data is transformed into predictions.
Depending on the type of cake you want to bake (chocolate cake or fruit cake), there is a different set of procedures and techniques to follow. In the same way, depending on the problem (regression or classification), we have different types of models.
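For instance, with scikit-learn the choice of “recipe” might look like this, continuing the sketch above:

```python
from sklearn.linear_model import LinearRegression

# Regression problem: predict a number (the house price).
model = LinearRegression()
model.fit(X_ready, y)   # learn the recipe from the prepared ingredients

# For a classification problem (predicting a category), we would pick
# a different recipe, e.g. sklearn.linear_model.LogisticRegression.
```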
Parameters
In baking, you need to decide how much of each ingredient (flour, milk, sugar, etc.) to use. These quantities determine the final characteristics of the cake.
Similarly, in ML models, parameters are the values that directly impact the model’s final output. Examples include the coefficients in Linear Regression and the weights and biases in Neural Networks. Their values determine how much a particular feature contributes to the predicted outcome.
Parameters are adjusted automatically during the training process in a way that minimizes the difference between predicted and actual outcomes.
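In the sketch above, the learned parameters can be inspected directly after training:

```python
# The coefficients are the learned "quantities of each ingredient":
# one per feature, telling how much that feature moves the prediction.
print(model.coef_)        # one coefficient each for size, bedrooms, bathrooms
print(model.intercept_)   # the bias/intercept term
```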
Hyperparameters
Hyperparameters can be compared to oven settings like baking temperature and baking time. They are external settings and configurations, not part of the recipe itself, and they must be set beforehand, perhaps based on experience. Similarly, hyperparameters are not part of the trained model. They are external settings that influence the model’s performance indirectly.
Baking temperature can impact the texture and consistency of the cake. Similarly, hyperparameters like the learning rate can influence the learning process and performance of the model.
Baking time determines whether the cake is perfectly cooked or not. Similarly, hyperparameters like the number of iterations or epochs in training determine how well the model learns from the data. Too many iterations may lead to overfitting, while too few may lead to underfitting.
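As an illustration, here is how the “oven settings” are chosen before training with scikit-learn’s SGDRegressor (the specific values are arbitrary):

```python
from sklearn.linear_model import SGDRegressor

# Hyperparameters are set *before* training, like oven settings:
# eta0 is the initial learning rate, max_iter caps the training epochs.
model = SGDRegressor(eta0=0.01, max_iter=1000)
model.fit(X_ready, y)   # the coefficients (parameters) are learned;
                        # eta0 and max_iter (hyperparameters) are not
```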
Loss Function
You have baked the cake following the recipe. Now it is time to run a taste and texture test. You have a particular taste and texture in mind, and this test tells you how well the cake turned out compared to your expectations. Similarly, the loss function acts as this “taste and texture test”, checking how close the model’s predictions are to the actual outcomes in the training data.
While tasting the cake, you may find that it is too hard, not sweet enough, or too dry, that it does not align with your expectations. You then make the required adjustments in your next batch, like increasing the sugar if the cake is not sweet enough.
Similarly, the loss function guides the training process by providing feedback on how to adjust the model’s parameters, and those adjustments are made so as to minimize the value of the loss function, i.e. the discrepancy between predictions and actual outcomes.
Basically, the “taste and texture test” and the loss function act as feedback mechanisms for the cake and the ML model, respectively, helping each do better.
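For a regression model, a common “taste test” is the mean squared error, which we can sketch in a few lines:

```python
import numpy as np

# The "taste and texture test": compare predictions with actual prices.
y_pred = model.predict(X_ready)
mse = np.mean((y - y_pred) ** 2)   # mean squared error: smaller is better
print(mse)
```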
Optimization
The ultimate goal of the “taste and texture test” and the loss function is optimization.
In baking, optimization refers to refining the recipe to produce a great cake. In an ML model, optimization refers to fine-tuning the model’s parameters to improve its predictive accuracy and generalization capacity on unseen data.
Optimization is an iterative process. Just as you taste the cake at different stages of its preparation and make adjustments to reach the desired outcome, you keep adjusting the model’s parameters during each iteration to reach the desired predictive accuracy.
The most commonly used optimization algorithms in ML are Gradient Descent, Stochastic Gradient Descent, and Adam. Just as different recipes may result in varying tastes and textures, each optimization algorithm has its own strengths and limitations.
Basically, optimization is the action the model takes to do better, making the necessary adjustments to its parameters based on the feedback received from the loss function.
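To see this feedback loop in action, here is a self-contained toy sketch of gradient descent for a one-feature linear model, with made-up numbers:

```python
import numpy as np

# Made-up data: house size vs. price, roughly price = 2 * size.
sizes  = np.array([1.0, 2.0, 3.0, 4.0])
prices = np.array([2.0, 4.1, 5.9, 8.2])

w, b = 0.0, 0.0   # parameters: start with a "blank recipe"
lr = 0.01         # learning rate (a hyperparameter)

for _ in range(1000):          # epochs (another hyperparameter)
    preds = w * sizes + b      # current predictions
    error = preds - prices     # feedback from the loss
    # Nudge the parameters along the gradient of the mean squared error:
    w -= lr * 2 * np.mean(error * sizes)
    b -= lr * 2 * np.mean(error)

print(w, b)   # w should end up close to 2, b close to 0
```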
Evaluation Metrics
We set various standards, like taste, texture, and appearance, to judge our success in baking the cake. In the same way, you use evaluation metrics like RMSE, accuracy, precision, recall, and F1-score to assess the performance of your ML model.
Evaluation metrics often highlight the areas where the model is underperforming or making incorrect predictions. Adjustments can then be made to the hyperparameters, feature engineering, or model selection to enhance the model’s performance.
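Continuing the regression sketch, the evaluation step might look like this with scikit-learn’s metrics:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Judge the cake: RMSE tells us, on average, how far the predicted
# prices are from the actual ones, in the same units as the price.
y_pred = model.predict(X_ready)
rmse = np.sqrt(mean_squared_error(y, y_pred))
print(rmse)
```

In practice you would compute this on held-out test data rather than the training data, so the metric reflects performance on unseen examples.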
In summary, just as baking a delicious cake needs careful selection of ingredients, precise measurements, strategic steps, and incorporation of feedback, building an ML model requires attention to detail, thoughtful preprocessing, optimization, and hyperparameter tuning.
Hope you all had a nice baking experience…:)
Just remember that mastering the art of cake baking takes time, practice, and experimentation. The same goes for mastering the art of Machine Learning.
I would like to wrap up with the following powerful reminder:
“One hour per day of study in your chosen field is all it takes. One hour per day of study will put you at the top of your field within three years. Within five years you’ll be a national authority. In seven years, you can be one of the best people in the world at what you do.” — Earl Nightingale
Curious about a specific ML topic? Share your topic suggestions with me. This will help guide this newsletter’s content and ensure we are diving into the topics you are most eager to explore.
Also, feel free to share your feedback and suggestions. That will help me keep going.
See you next Friday!
-Kavita
P.S. Let’s grow our tribe. Know someone who is curious to dive into ML and AI? Share this newsletter with them and invite them to be a part of this exciting learning journey.