Machine Learning (ML) has revolutionised industries by enabling systems to learn from data and make predictions or decisions. Building a machine learning model from scratch can seem daunting, but with a structured approach it becomes manageable. In this blog post, we’ll walk step by step through building a simple regression model in Python with scikit-learn.
Step 1: Define the Problem
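The first step is to define the problem you want to solve. In this example, we’ll predict house prices, specifically the median house value of California districts, from features such as median income, the average number of rooms, and location (latitude and longitude). Because the target is a continuous number, this is a regression problem.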
Step 2: Collect and Prepare the Data
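Quality data is the backbone of any ML model. We’ll use the California Housing dataset, which scikit-learn downloads (and caches locally) the first time fetch_california_housing is called. It contains 20,640 districts, each described by 8 numeric features, with the median house value (in units of $100,000) as the target.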
Importing Libraries and Dataset
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load the dataset
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
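Before modelling, it is worth a quick look at what was loaded. The helper below, quick_data_check, is a hypothetical name used only for illustration: it reports the shape, the number of missing values per column, and a few summary statistics using standard pandas methods. The tiny DataFrame in the usage lines is made-up data so the snippet runs without downloading anything; in the tutorial you would pass X instead.

```python
import numpy as np
import pandas as pd


def quick_data_check(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks before training a model."""
    return {
        "shape": df.shape,  # (rows, columns)
        # Number of missing (NaN) values in each column, as plain Python ints
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        # Per-column summary statistics, one row per feature
        "summary": df.describe().T[["mean", "std", "min", "max"]],
    }


# Made-up example frame with one missing value
df = pd.DataFrame({
    "rooms": [3.0, 4.0, np.nan, 5.0],
    "income": [2.5, 3.1, 4.0, 5.2],
})
report = quick_data_check(df)
print(report["shape"])    # (4, 2)
print(report["missing"])  # {'rooms': 1, 'income': 0}
print(report["summary"])
```

If a check like this turned up missing values, you would need to impute or drop them before fitting, since LinearRegression does not accept NaN inputs.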
Splitting the Data
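We hold out 20% of the rows as a test set (test_size=0.2) so we can measure performance on data the model never saw during training. Setting random_state=42 makes the shuffle, and therefore the split, reproducible.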
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
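To see what those two arguments do, here is a small sketch on a made-up 10-row array rather than the housing data: test_size=0.2 holds out 20% of the rows, train_test_split shuffles before splitting (its default behaviour), and fixing random_state gives the same split on every run. The X_demo and y_demo names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: row i of X_demo is [2i, 2i + 1] and its target is i
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# The same random_state reproduces exactly the same split
_, X_te_again, _, y_te_again = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
print(np.array_equal(y_te, y_te_again))  # True

# Rows stay aligned with their targets after shuffling
print(all(X_te[i, 0] == 2 * y_te[i] for i in range(len(y_te))))  # True
```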
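Standardising the Data
StandardScaler rescales each feature to zero mean and unit variance. The scaler is fitted on the training set only, and the test set is transformed with those same statistics, so no information leaks from the test data into training. For plain linear regression, scaling does not change the predictions, but it puts the coefficients on a comparable scale. It matters much more for algorithms sensitive to feature magnitudes, such as k-nearest neighbours, support vector machines, and regularised or gradient-descent-based models.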
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
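Under the hood, StandardScaler learns each feature's mean and standard deviation from the data passed to fit, then maps every value x to (x - mean) / std. The sketch below reproduces this with NumPy on made-up numbers, which makes it easy to see that the test set is transformed with the training statistics rather than its own. Note that scikit-learn uses the population standard deviation (ddof=0), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # made-up features
X_test_demo = rng.normal(loc=55.0, scale=12.0, size=(20, 3))

# Statistics come from the training set only (population std, ddof=0)
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)

X_train_manual = (X_train_demo - mean) / std
X_test_manual = (X_test_demo - mean) / std  # reuse the training mean and std

scaler = StandardScaler().fit(X_train_demo)
print(np.allclose(scaler.transform(X_train_demo), X_train_manual))  # True
print(np.allclose(scaler.transform(X_test_demo), X_test_manual))    # True

# Training columns are centred with unit variance; test columns need not be
print(np.allclose(X_train_manual.mean(axis=0), 0.0))  # True
```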
Step 3: Choose an Algorithm
For this example, we’ll use Linear Regression, one of the simplest and most interpretable ML algorithms.
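Linear regression models the target as a weighted sum of the features plus an intercept, and ordinary least squares chooses the weights and intercept that minimise the sum of squared errors. As a sketch of that idea (not of scikit-learn's internal implementation), the NumPy code below solves the least-squares problem with np.linalg.lstsq after appending a column of ones for the intercept. The data are synthetic, generated from known weights without noise, so the recovered values can be checked.

```python
import numpy as np

rng = np.random.default_rng(1)
X_syn = rng.normal(size=(200, 3))          # synthetic features
true_w = np.array([2.0, -1.0, 0.5])        # known weights
true_b = 3.0                               # known intercept
y_syn = X_syn @ true_w + true_b            # noiseless target for a clean check

# Append a column of ones so the intercept is learned as an extra weight
X_design = np.column_stack([X_syn, np.ones(len(X_syn))])

# Least squares: minimise ||X_design @ params - y_syn||^2
params, *_ = np.linalg.lstsq(X_design, y_syn, rcond=None)
w, b = params[:-1], params[-1]

print(w)  # close to [2.0, -1.0, 0.5]
print(b)  # close to 3.0
```

With real, noisy data the recovered weights would only approximate the underlying relationship, which is why we evaluate on held-out data later.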
Step 4: Build the Model
Implementing Linear Regression
We’ll use the LinearRegression class from scikit-learn.
from sklearn.linear_model import LinearRegression
# Initialise the model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
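An alternative way to organise the scaling and fitting steps is scikit-learn's make_pipeline, which chains StandardScaler and LinearRegression into one estimator: fit fits the scaler on the training data only, and predict reuses those statistics. The sketch below uses synthetic data from make_regression, deliberately put on very different scales, rather than the housing data. It also checks a property of plain least squares with an intercept: standardising the features changes the coefficients but not the predictions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (a stand-in for the housing features)
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=5.0, random_state=0)
# Put the features on very different scales and shift them
X_syn = X_syn * np.array([1.0, 10.0, 100.0, 1000.0]) + 50.0

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

# Scaler + model as one estimator; fit() fits both steps on training data only
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

# The same model without scaling, for comparison
raw = LinearRegression().fit(X_tr, y_tr)

# Predictions agree up to floating-point error; coefficients generally do not
print(np.allclose(pipe.predict(X_te), raw.predict(X_te)))  # True
print(pipe.named_steps["linearregression"].coef_)
print(raw.coef_)
```

A pipeline becomes especially useful with cross-validation, because the scaler is then re-fitted inside every fold automatically.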
Step 5: Evaluate the Model
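Evaluation shows how well the model generalises to unseen data. We’ll use two metrics: Mean Squared Error (MSE), the average squared difference between predicted and actual values (lower is better), and R-squared (R²), the proportion of the target’s variance explained by the model (1.0 is a perfect fit, while 0 means no better than always predicting the mean).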
from sklearn.metrics import mean_squared_error, r2_score
# Make predictions
y_pred = model.predict(X_test)
# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared: {r2:.2f}")
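To make the two metrics concrete: MSE is the mean of the squared residuals, and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The sketch below computes both with NumPy on four made-up values and checks them against scikit-learn. It also computes RMSE, the square root of MSE, which is in the same units as the target (for the California housing data, hundreds of thousands of dollars) and is often easier to read.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # made-up targets
y_hat = np.array([2.5, 0.0, 2.0, 8.0])    # made-up predictions

residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)            # (0.25 + 0.25 + 0 + 1) / 4
ss_res = np.sum(residuals ** 2)                 # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1.0 - ss_res / ss_tot
rmse_manual = np.sqrt(mse_manual)               # same units as the target

print(f"MSE:  {mse_manual:.3f}")   # MSE:  0.375
print(f"R^2:  {r2_manual:.3f}")    # R^2:  0.949
print(f"RMSE: {rmse_manual:.3f}")  # RMSE: 0.612

# Cross-check against scikit-learn
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))  # True
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))             # True
```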
Step 6: Interpret the Results
Coefficients and Intercept
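Linear regression learns one coefficient per feature plus an intercept. Because the features were standardised, each coefficient is the change in the predicted value (in units of $100,000) for a one-standard-deviation increase in that feature, with the other features held fixed; its sign gives the direction of the relationship. The intercept is the prediction for a district whose features all sit at their training-set means.
# X_train is a NumPy array after scaling, so take the feature names from X
print("Coefficients:\n", pd.Series(model.coef_, index=X.columns).sort_values())
print("Intercept:", model.intercept_)
Read these values as associations, not causal effects. When features are strongly correlated (for example, average rooms and average bedrooms per household, which tend to move together), individual coefficients can be unstable, so compare them with care.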
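If you would rather read coefficients per original unit of each feature than per standard deviation, the fitted scaler's scale_ and mean_ attributes let you convert them: divide each coefficient by scale_, and subtract the sum of coefficient × mean / scale from the intercept. The sketch below shows the conversion on synthetic data with two features on very different scales (the variable names are illustrative) and checks it against a model fitted directly on the unscaled features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Synthetic features on very different scales
X_syn = rng.normal(loc=[3.0, 1500.0], scale=[1.5, 800.0], size=(500, 2))
y_syn = 0.4 * X_syn[:, 0] + 0.001 * X_syn[:, 1] + 1.0 + rng.normal(scale=0.1, size=500)

scaler = StandardScaler().fit(X_syn)
model = LinearRegression().fit(scaler.transform(X_syn), y_syn)

# Per-standard-deviation coefficients -> per-original-unit coefficients
coef_original = model.coef_ / scaler.scale_
intercept_original = model.intercept_ - np.sum(model.coef_ * scaler.mean_ / scaler.scale_)

# Compare with a model fitted directly on the unscaled features
raw = LinearRegression().fit(X_syn, y_syn)
print(np.allclose(coef_original, raw.coef_))           # True
print(np.isclose(intercept_original, raw.intercept_))  # True
```

In the tutorial itself, the same two lines of conversion would apply to model and scaler as fitted in Steps 2 and 4.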
Step 7: Iterate and Improve
Model building is an iterative process. You can:
- Try different algorithms (e.g., Decision Trees, Random Forests).
- Perform feature engineering to create new meaningful features.
- Use cross-validation to get a more reliable estimate of performance than a single train/test split gives.
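As a sketch combining the first and third suggestions, the code below uses cross_val_score with 5-fold cross-validation to compare a scaled linear-regression pipeline with a RandomForestRegressor. It runs on synthetic data from make_regression, so the numbers say nothing about the housing data; there you would pass X and y instead, and the ranking could well differ. The model settings are illustrative, not tuned. Wrapping the scaler in a pipeline means it is re-fitted inside each fold, so no statistics leak from the validation fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with a linear relationship plus noise
X_syn, y_syn = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for name, estimator in models.items():
    # One R^2 score per fold; the spread shows how stable the estimate is
    scores = cross_val_score(estimator, X_syn, y_syn, cv=cv, scoring="r2")
    results[name] = scores
    print(f"{name}: mean R^2 = {scores.mean():.3f} (std {scores.std():.3f})")
```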
Conclusion
Building a machine learning model from scratch involves clear steps: understanding the problem, preparing the data, choosing the right algorithm, and evaluating the results. This structured approach can be applied to various ML problems, making it a valuable skill for aspiring data scientists.
Next Steps
Explore other machine learning models and techniques to tackle more complex problems. Consider diving into libraries like TensorFlow and PyTorch for deep learning tasks.
Want to learn Machine Learning with an instructor, either offline or online? Find experienced tutors near you and join Coaching Wallah—completely free for students!