Performance Metrics: Confusion Matrix, Precision, Recall, and F1-score. The Basics

Confusion Matrix

A confusion matrix is a table that is often used to describe the performance of a classification algorithm. For binary classification it is a 2×2 matrix with the actual values on one axis and the predicted values on the other, giving a way to visualize the number of correct and incorrect predictions made by a model. The table is divided into four quadrants, listed below and illustrated in the code sketch that follows the list: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
  • True Positives (TP) are the cases where the model correctly predicted the positive class.
  • False Positives (FP) are the cases where the model predicted the positive class but the actual class was negative.
  • True Negatives (TN) are the cases where the model correctly predicted the negative class.
  • False Negatives (FN) are the cases where the model predicted the negative class but the actual class was positive.
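
Here is a minimal sketch of how these four counts can be read off with scikit-learn's confusion_matrix function (the example labels are the same ones used later in this post):

from sklearn.metrics import confusion_matrix

# Example true and predicted labels (1 = positive class, 0 = negative class)
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# confusion_matrix returns a 2x2 array laid out as [[TN, FP], [FN, TP]];
# ravel() flattens it so the four counts can be unpacked in that order
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "FP:", fp, "TN:", tn, "FN:", fn)

For these labels the counts are TP = 3, FP = 2, TN = 4, FN = 1.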

Accuracy, Precision, Recall, and F1 Score

Accuracy, precision, recall, and F1 score are all ways to measure how well a prediction model is doing.

Accuracy is like a test score. It tells us how many answers we got right out of all the answers we gave. A higher accuracy score means we got more answers right.
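In confusion-matrix terms, accuracy = (TP + TN) / (TP + TN + FP + FN), the fraction of all predictions that were correct.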

Precision is like aiming for a target. It tells us how many of the answers we called positive were actually positive. A higher precision score means that when we claim a hit, we are usually right.
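In confusion-matrix terms, precision = TP / (TP + FP), the fraction of predicted positives that were actually positive.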

Recall is like finding all the hidden treasure. It tells us how many of the actual positive cases we found out of all the positive cases there were. A higher recall score means we found more of the treasure.
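In confusion-matrix terms, recall = TP / (TP + FN), the fraction of actual positives that the model found.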

F1 score is a balance between precision and recall. It's a way to balance out the trade-off between flagging only the cases we are sure of (high precision but low recall) and flagging everything that might be positive (high recall but low precision). A higher F1 score means we found a good balance.
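In confusion-matrix terms, F1 = 2 × precision × recall / (precision + recall), the harmonic mean of precision and recall.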

Imagine you are playing a game where you have to guess the color of a hidden ball: you want to get as many guesses right as you can (accuracy), hit the right color whenever you do call it (precision), find all the hidden balls (recall), and strike a balance between calling only the balls you are sure of and calling every ball you might have found (F1 score).

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Example true and predicted labels
y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

# Calculate accuracy
acc = accuracy_score(y_true, y_pred)
print("Accuracy: ", acc)

# Calculate precision
prec = precision_score(y_true, y_pred)
print("Precision: ", prec)

# Calculate recall
rec = recall_score(y_true, y_pred)
print("Recall: ", rec)

# Calculate F1 score
f1 = f1_score(y_true, y_pred)
print("F1 Score: ", f1)


The accuracy_score(y_true, y_pred), precision_score(y_true, y_pred), recall_score(y_true, y_pred), and f1_score(y_true, y_pred) functions each take two input arguments, the actual labels (y_true) and the predicted labels (y_pred), and return the accuracy, precision, recall, and F1-score of the model, respectively.
