Machine Learning

Fraud Detection in Car Insurance – Random Forest

University project · December 2025 – January 2026

Detecting fraudulent insurance claims using machine learning and pattern recognition.

Project Overview

This was a university machine learning project in which the team explored multiple approaches to detecting fraudulent car insurance claims.

While other team members experimented with different algorithms, my focus was on implementing and optimizing a Random Forest model to identify suspicious patterns in the data.

The overall goal was to compare different machine learning techniques and evaluate their effectiveness in detecting fraud.

Data Preparation

A key part of the project was preparing the dataset for machine learning:

Cleaning and preprocessing raw data
Handling missing values
Encoding categorical features

In addition, I applied feature engineering and feature selection to identify the most relevant variables influencing fraudulent behavior.
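The preparation steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the column names (`claim_amount`, `vehicle_type`) and the sample rows are hypothetical. It assumes missing numeric values are filled with the column median and that missing categories get their own `unknown` label before one-hot encoding.

```python
from statistics import median

# Hypothetical raw claims; None marks a missing value.
raw_claims = [
    {"claim_amount": 1200.0, "vehicle_type": "sedan"},
    {"claim_amount": None, "vehicle_type": "suv"},
    {"claim_amount": 5400.0, "vehicle_type": "sedan"},
    {"claim_amount": 800.0, "vehicle_type": None},
]


def impute_median(rows, column):
    """Replace missing numeric values in `column` with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column, missing_label="unknown"):
    """Replace a categorical column with one 0/1 column per category."""
    values = [r[column] if r[column] is not None else missing_label for r in rows]
    categories = sorted(set(values))
    encoded = []
    for row, value in zip(rows, values):
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(value == category)
        encoded.append(new_row)
    return encoded


prepared = one_hot(impute_median(raw_claims, "claim_amount"), "vehicle_type")
```

In a real pipeline, the fill value and the category list would normally be computed on the training split only and then reused on the test split, so that no information leaks from the held-out data.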

Model: Random Forest

The core model used in my part of the project was a Random Forest classifier.

This ensemble method combines multiple decision trees and is particularly effective for classification tasks involving complex and non-linear relationships.

More robust against overfitting than a single decision tree
Handles non-linear patterns
Works well with tabular datasets
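As a rough sketch of what training such a model can look like, the example below uses scikit-learn. The library choice, the synthetic data from `make_classification`, and every hyperparameter value are assumptions made for illustration; the project's real dataset and settings are not shown here. It also assumes that fraud is the rare class, which is typical for this kind of problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims data: about 10% positive ("fraud") rows.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency,
# which is one common way to handle a rare positive class.
model = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",
    random_state=0,
)
model.fit(X_train, y_train)

# Estimated probability of the positive class for each held-out row.
fraud_probability = model.predict_proba(X_test)[:, 1]
```

Each of the 200 trees is fitted on a bootstrap sample of the training rows and considers a random subset of features at each split; the forest averages the trees' class probabilities, which is where the reduced variance compared with a single tree comes from.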

Collaboration & Comparison

As part of the team project, different models were developed and compared across the group.

This allowed us to evaluate how various algorithms perform on the same dataset and better understand their strengths and weaknesses in fraud detection scenarios.

Training & Insights

During model training, I analyzed which features were most strongly associated with fraudulent behavior.

This provided valuable insights into how certain patterns and variables contribute to suspicious claims.
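One model-agnostic way to rank features is permutation importance: shuffle one column at a time and measure how much the model's accuracy drops. The sketch below is a plain-Python illustration of that idea, not the method used in the project, which is not specified here. The toy data and `toy_model` are hypothetical stand-ins for the claims data and the trained classifier.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows where predict(row) matches the label."""
    hits = sum(predict(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(n_features):
        # Shuffle column j only, leaving all other columns untouched.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [
            row[:j] + [value] + row[j + 1:] for row, value in zip(rows, column)
        ]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return drops


# Hypothetical toy data: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [[i % 2, data_rng.random()] for i in range(200)]
labels = [row[0] for row in rows]


def toy_model(row):
    """Stand-in for a trained classifier; it only looks at feature 0."""
    return row[0]


importances = permutation_importance(toy_model, rows, labels, n_features=2)
```

Here shuffling feature 0 hurts accuracy while shuffling feature 1 changes nothing, so feature 0 ranks first. Importance scores describe what the model relies on; they do not show that a feature causes fraud.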

Evaluation & Optimization

The model was evaluated and optimized to achieve a strong balance between:

High detection rate (recall)
Low misclassification rate

In fraud detection, minimizing false negatives is especially critical, as undetected fraud can lead to significant financial loss.
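The trade-off between catching fraud and raising false alarms can be made concrete with a small example. The labels and scores below are hypothetical, and the two thresholds are arbitrary values chosen for illustration. The point of the sketch is that lowering the decision threshold removes false negatives at the cost of more false positives.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, scores, threshold):
    """Flag a claim as fraud when its score reaches the threshold."""
    y_pred = [int(score >= threshold) for score in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        # Recall: share of actual fraud cases that were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Precision: share of flagged claims that were actually fraud.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_negatives": fn,
        "false_positives": fp,
    }


# Hypothetical ground truth (1 = fraud) and model scores for eight claims.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]

default = evaluate(y_true, scores, threshold=0.5)
lenient = evaluate(y_true, scores, threshold=0.25)
```

With the threshold at 0.5, one of the three fraud cases is missed. At 0.25 all three are caught, but the number of legitimate claims flagged rises from one to two. Which point on that curve is acceptable depends on the relative cost of a missed fraud case versus a manual review.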

Key Learnings

End-to-end machine learning workflow
Working in a collaborative ML project
Comparing different algorithms
Importance of evaluation metrics beyond accuracy

Conclusion

This project provided hands-on experience with real-world machine learning challenges in the insurance domain.

It demonstrates how different machine learning approaches can be applied and compared to solve complex classification problems like fraud detection.
