Building a Machine Learning Model from Scratch

Machine Learning (ML) has revolutionised industries by enabling systems to learn from data and make predictions or decisions. Building a machine learning model from scratch can seem daunting, but a structured approach makes it manageable. In this post, we’ll walk step by step through creating a simple ML model in Python with scikit-learn.

Step 1: Define the Problem

The first step is to understand the problem you want to solve. For this example, we’ll predict median house values for California districts from features such as median income, house age, average number of rooms, and location (latitude and longitude).

Step 2: Collect and Prepare the Data

Quality data is the backbone of any ML model. We’ll use the California Housing dataset, which scikit-learn downloads and caches on first use through fetch_california_housing.

Importing Libraries and Dataset

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
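Before modelling, it is worth a quick look at the data for missing values and odd ranges. The sketch below shows the kind of check you might run. It uses a tiny hand-made DataFrame (`df_demo`, with hypothetical values) rather than the real dataset, so it runs without a download; only the column names mirror three of the dataset’s features.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration only; these are not real dataset values
df_demo = pd.DataFrame({
    "MedInc": [8.3, 7.2, np.nan, 5.6],
    "HouseAge": [41.0, 21.0, 52.0, 52.0],
    "AveRooms": [6.9, 6.2, 8.2, 5.8],
})

# Count missing values in each column
missing_per_column = df_demo.isna().sum()
print(missing_per_column)

# Summary statistics (count, mean, std, min, quartiles, max) per column
print(df_demo.describe())
```

The same two calls work unchanged on the `X` DataFrame built above.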

Splitting the Data

We split the data into training and testing sets to evaluate our model’s performance.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
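To see what the split produces, here is a minimal sketch on synthetic data (`X_demo` and `y_demo` are hypothetical stand-ins, not the housing data). With test_size=0.2, one fifth of the rows go to the test set, and a fixed random_state makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)  # (80, 3) (20, 3)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(np.array_equal(X_te, X_te2))  # True
```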

Normalising the Data

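Feature scaling matters for algorithms that are sensitive to feature magnitudes, such as regularised or distance-based models. Plain linear regression makes the same predictions either way, but standardising puts the coefficients on a comparable scale. StandardScaler rescales each feature to zero mean and unit variance. We fit it on the training set only and reuse those statistics on the test set, so no information from the test set leaks into training.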

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
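The following sketch shows what the scaler is doing under the hood, using synthetic arrays (`train` and `test` are hypothetical stand-ins). It reproduces the transform by hand from the training mean and standard deviation and checks that the result matches.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data with a non-zero mean and a large spread
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 2))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train)  # learns mean/std from train
test_scaled = scaler_demo.transform(test)        # reuses the train statistics

# Manual equivalent: subtract the training mean, divide by the training std.
# StandardScaler uses the population standard deviation (ddof=0).
train_mean = train.mean(axis=0)
train_std = train.std(axis=0)
manual_test_scaled = (test - train_mean) / train_std

print(np.allclose(test_scaled, manual_test_scaled))  # True
```

Note that the scaled test set will not have exactly zero mean or unit variance, because it is scaled with the training statistics. That is expected.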

Step 3: Choose an Algorithm

For this example, we’ll use Linear Regression, one of the simplest and most interpretable ML algorithms.

Step 4: Build the Model

Implementing Linear Regression

We’ll use the LinearRegression class from scikit-learn.

from sklearn.linear_model import LinearRegression

# Initialise the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)
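For a sense of what fit does, the sketch below solves the same least-squares problem directly with NumPy and compares the result with LinearRegression. The data is synthetic (`X_demo`, `y_demo`, and `true_w` are hypothetical), and this illustrates ordinary least squares in general rather than scikit-learn’s exact internals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data generated from known weights plus a little noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = X_demo @ true_w + 4.0 + rng.normal(scale=0.1, size=200)

# Append a column of ones so the intercept is learned as an extra weight
X_aug = np.hstack([X_demo, np.ones((X_demo.shape[0], 1))])

# Least-squares solution: minimises the sum of squared residuals
solution, *_ = np.linalg.lstsq(X_aug, y_demo, rcond=None)
w_manual, b_manual = solution[:-1], solution[-1]

sk_model = LinearRegression().fit(X_demo, y_demo)

print(np.allclose(w_manual, sk_model.coef_))      # True
print(np.isclose(b_manual, sk_model.intercept_))  # True
```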

Step 5: Evaluate the Model

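Evaluation shows how well your model generalises to unseen data. We’ll use two metrics: Mean Squared Error (MSE), the average squared difference between predictions and true values (lower is better), and R-squared, the proportion of variance in the target that the model explains (1.0 is a perfect fit).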

from sklearn.metrics import mean_squared_error, r2_score

# Make predictions
y_pred = model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
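To make the two metrics concrete, this small sketch computes them by hand on four made-up values (`y_true` and `y_hat` are hypothetical) and checks them against the scikit-learn functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 1.5, 2.0, 4.5])
y_hat = np.array([2.5, 1.0, 2.5, 4.0])

# MSE: mean of the squared residuals
mse_manual = np.mean((y_true - y_hat) ** 2)

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(f"MSE: {mse_manual:.2f}")       # MSE: 0.25
print(f"R-squared: {r2_manual:.2f}")  # R-squared: 0.81

print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))  # True
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))             # True
```

Because MSE is in squared units of the target, its square root (RMSE) is often easier to read. For this dataset the target is expressed in units of $100,000.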

Step 6: Interpret the Results

Coefficients and Intercept

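Linear regression learns one coefficient per feature. Because we standardised the features, each coefficient is the expected change in the target for a one-standard-deviation increase in that feature, holding the others fixed.

# Scaling returned NumPy arrays, so take the feature names from the dataset
for name, coef in zip(data.feature_names, model.coef_):
    print(f"{name}: {coef:.3f}")
print("Intercept:", model.intercept_)

The sign of a coefficient gives the direction of the relationship and its magnitude gives the strength. The intercept is the predicted value when every feature sits at its training mean.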

Step 7: Iterate and Improve

Model building is an iterative process. You can:

Try different algorithms (e.g., Decision Trees, Random Forests).
Perform feature engineering to create new meaningful features.
Use cross-validation to get a more reliable performance estimate than a single train/test split.
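The options above can be combined. The sketch below compares two candidate models with 5-fold cross-validation on synthetic data from make_regression (a stand-in for the housing data). The names `candidates` and `cv_results` are hypothetical. Putting the scaler inside a pipeline means it is refit on each training fold, which avoids leaking validation data into the scaling step.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem used only for illustration
X_demo, y_demo = make_regression(
    n_samples=300, n_features=5, noise=10.0, random_state=42
)

candidates = {
    # The scaler is part of the pipeline, so it is fit per training fold
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    # Tree-based models do not need feature scaling
    "forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

cv_results = {}
for name, estimator in candidates.items():
    # One R-squared score per fold
    scores = cross_val_score(estimator, X_demo, y_demo, cv=5, scoring="r2")
    cv_results[name] = scores
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```

Which model scores higher depends on the data. On a real dataset, compare the fold-to-fold spread as well as the mean before picking one.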

Conclusion

Building a machine learning model from scratch involves clear steps: understanding the problem, preparing the data, choosing the right algorithm, and evaluating the results. This structured approach can be applied to various ML problems, making it a valuable skill for aspiring data scientists.

Next Steps

Explore other machine learning models and techniques to tackle more complex problems. Consider diving into libraries like TensorFlow and PyTorch for deep learning tasks.
