Understanding Key Formulas in Supervised Machine Learning: A Guide to Regression and Classification
- Ayesha Anzer
- Aug 4, 2024
- 3 min read
Supervised Machine Learning is essential in many data-driven applications, from predicting house prices to classifying emails as spam. In this blog, we'll explore the fundamental formulas that power two of the most important types of supervised learning: regression and classification. Understanding these formulas will give you deeper insights into how these models work.
1. Linear Regression: Predicting Continuous Outcomes
Linear regression is one of the simplest and most widely used algorithms for predicting a continuous target variable. The goal is to find the line (or hyperplane, in higher dimensions) that best fits the data.
Hypothesis Function (Model Formula): The hypothesis function is a linear combination of the input features:
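h(x) = theta_0 + theta_1 * x_1 + theta_2 * x_2 + ... + theta_n * x_n

Here theta_0, theta_1, ..., theta_n are the model parameters (theta_0 is the intercept, or bias term) that the learning algorithm fits to the data.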
This formula represents the linear relationship between the input features (x_1, x_2, ... , x_n) and the output (y).
Cost Function (Mean Squared Error): The cost function is used to measure how well the model’s predictions match the actual data:
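J(theta) = (1/(2m)) * sum_{i=1..m} (h(x_i) - y_i)^2

(The extra factor of 1/2 is a common convention that cancels when the cost is differentiated; some texts use 1/m instead.)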
Here, m is the number of training examples, h(x_i) is the predicted value, and y_i is the actual value.
Gradient Descent Update Rule: Gradient descent is used to minimize the cost function by iteratively updating the model parameters:
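theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x_i) - y_i) * x_i,j

where x_i,j denotes the j-th feature of the i-th training example (with x_i,0 = 1 for the intercept term).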
In this formula, alpha is the learning rate, which controls the size of each step, and the update is applied simultaneously to every parameter theta_j.
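To make the update rule concrete, here is a minimal NumPy sketch of batch gradient descent for linear regression. The function name, learning rate, and toy data are illustrative choices, not part of the original post:

```python
# A minimal NumPy sketch of batch gradient descent for linear regression
# (illustrative only; the learning rate and toy data are my own choices).
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Fit theta for h(x) = X @ theta by minimizing the squared-error cost."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        predictions = X @ theta           # h(x_i) for every training example
        errors = predictions - y          # h(x_i) - y_i
        gradient = (X.T @ errors) / m     # partial derivatives of the cost
        theta = theta - alpha * gradient  # simultaneous update of every theta_j
    return theta

# Usage: prepend a column of ones so theta_0 acts as the intercept
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(X, y))  # approaches [0, 2] for y = 2x
```

Each pass computes the predictions for every training example, averages the errors into a gradient, and nudges all of the parameters at once, which is exactly what the update rule above describes.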
2. Polynomial Regression: Capturing Non-Linearity
When the relationship between the input features and output is not linear, polynomial regression can be a powerful tool. It extends linear regression by including polynomial terms, allowing the model to capture more complex patterns.
Hypothesis Function (Model Formula): Polynomial regression adds powers of the original features:
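h(x) = theta_0 + theta_1 * x + theta_2 * x^2 + ... + theta_d * x^d

This is shown for a single feature x raised up to degree d; with several features, squared and interaction terms (for example x_1 * x_2) can be added in the same way.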
This allows the model to fit a curve rather than just a straight line.
Cost Function (Mean Squared Error): The cost function remains the same as in linear regression:
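J(theta) = (1/(2m)) * sum_{i=1..m} (h(x_i) - y_i)^2, only now h(x_i) is the polynomial hypothesis above.

In practice you rarely build the polynomial terms by hand. Here is a minimal scikit-learn sketch; the degree of 3 and the toy data are illustrative assumptions:

```python
# A minimal sketch of polynomial regression with scikit-learn
# (assumes numpy and scikit-learn are installed; degree=3 is an arbitrary choice).
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data: y is a cubic function of x plus noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel() ** 3 - x.ravel() + rng.normal(scale=1.0, size=100)

# Expand x into [x, x^2, x^3] and fit ordinary least squares on the expanded features
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False), LinearRegression())
model.fit(x, y)
print(model.predict([[2.0]]))  # predicted y at x = 2
```

Because the model is still linear in its parameters, ordinary least squares fits it directly once the features have been expanded.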
3. Logistic Regression: Predicting Binary Outcomes
Logistic regression is the go-to method for binary classification problems, where the goal is to predict one of two possible outcomes, such as "spam" or "not spam."
Hypothesis Function (Sigmoid Function): Logistic regression uses the sigmoid function to map predicted values to probabilities:
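h(x) = 1 / (1 + e^(-z)), where z = theta_0 + theta_1 * x_1 + ... + theta_n * x_n is the same linear combination used in linear regression.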
This function outputs a probability between 0 and 1, making it ideal for classification tasks.
Cost Function (Log Loss or Binary Cross-Entropy): The cost function for logistic regression is designed to penalize incorrect predictions:
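J(theta) = -(1/m) * sum_{i=1..m} [ y_i * log(h(x_i)) + (1 - y_i) * log(1 - h(x_i)) ]

When the true label y_i is 1, only the log(h(x_i)) term is active, so a confident wrong prediction (h(x_i) close to 0) is punished heavily; the symmetric argument applies when y_i is 0.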
Gradient Descent Update Rule: The parameters are updated similarly to linear regression:
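theta_j := theta_j - alpha * (1/m) * sum_{i=1..m} (h(x_i) - y_i) * x_i,j

The update has exactly the same form as in linear regression; the difference is that h(x_i) is now the sigmoid of the linear combination. A minimal NumPy sketch of the pieces (the toy data, learning rate, and number of iterations are illustrative assumptions):

```python
# A minimal NumPy sketch of gradient descent for logistic regression
# (illustrative; the data and hyperparameters are made up for the example).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    # Binary cross-entropy averaged over the m training examples
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_step(X, y, theta, alpha=0.1):
    p = sigmoid(X @ theta)               # h(x_i): predicted probabilities
    gradient = (X.T @ (p - y)) / len(y)  # same form as in linear regression
    return theta - alpha * gradient

# Usage: two features plus an intercept column of ones
X = np.array([[1.0, 0.5, 1.2], [1.0, 2.0, 0.3], [1.0, 1.5, 2.5]])
y = np.array([0.0, 1.0, 1.0])
theta = np.zeros(3)
for _ in range(100):
    theta = gradient_step(X, y, theta)
print(theta, log_loss(y, sigmoid(X @ theta)))
```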
4. Regularization: Preventing Overfitting
One of the challenges in machine learning is finding a model that generalizes well to new data. Regularization techniques, like L2 and L1 regularization, help to prevent overfitting by adding a penalty to the cost function.
L2 Regularization (Ridge Regression): L2 regularization adds the sum of the squared parameter values to the cost function:
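J(theta) = (1/(2m)) * sum_{i=1..m} (h(x_i) - y_i)^2 + lambda * sum_{j=1..n} theta_j^2

Here lambda is the regularization strength: larger values shrink the parameters more aggressively. (Conventions differ on whether the penalty is also scaled by 1/(2m), and the intercept theta_0 is usually left unpenalized.)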
This penalty discourages large parameter values, making the model simpler and less prone to overfitting.
L1 Regularization (Lasso Regression): L1 regularization adds the sum of the absolute values of the parameters to the cost function:
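J(theta) = (1/(2m)) * sum_{i=1..m} (h(x_i) - y_i)^2 + lambda * sum_{j=1..n} |theta_j|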
L1 regularization promotes sparsity in the model, often leading to many coefficients being reduced to zero.
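To see the difference in behavior, here is a minimal scikit-learn sketch comparing the two penalties on the same synthetic data. The data and the penalty value are illustrative assumptions, and note that scikit-learn calls the regularization strength alpha rather than lambda:

```python
# A minimal scikit-learn sketch comparing Ridge (L2) and Lasso (L1)
# (alpha=0.5 and the synthetic data are arbitrary illustrative choices).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)  # only 2 of 5 features matter

ridge = Ridge(alpha=0.5).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefficients:", ridge.coef_)  # all shrunk, but typically non-zero
print("Lasso coefficients:", lasso.coef_)  # irrelevant features driven to exactly zero
```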
Conclusion
These formulas form the backbone of many supervised learning models, providing the mathematical framework to predict outcomes and make decisions based on data. By understanding these formulas, you not only gain insight into how models work but also how to tweak and optimize them for better performance. Whether you're building a model to predict house prices or classify images, these tools are essential in your machine learning toolkit.
References:
Information for this post was adapted from the "Supervised Learning: Regression and Classification" course on Coursera.
