In machine learning, cross_val_score and cross_val_predict are both scikit-learn functions used with cross-validation, a method for assessing how a model performs on unseen data. However, they serve different purposes and return different kinds of results.
cross_val_score is used to evaluate a model's performance by applying cross-validation and returning an array with one score per fold. It computes a score (e.g., accuracy, precision, or recall) on each fold and returns all of the fold scores.
Here’s an example using cross_val_score with a simple decision tree classifier:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Calculate cross-validation scores (default scoring is accuracy)
scores = cross_val_score(clf, X, y, cv=5)
print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())
```
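The default scoring above is accuracy, but the scoring parameter accepts other named scorers as well. A minimal sketch using macro-averaged F1 on the same Iris setup (the random_state is added here only to make the runs reproducible):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# Same cross-validation loop, but score each fold with macro-averaged F1
f1_scores = cross_val_score(clf, iris.data, iris.target, cv=5, scoring="f1_macro")
print("F1 (macro) per fold:", f1_scores)
```

Any scorer name accepted by scikit-learn's scoring parameter (e.g., "precision_macro", "neg_mean_squared_error" for regression) can be used the same way.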
cross_val_predict is used to generate cross-validated predictions for each data point. It performs cross-validation like cross_val_score, but instead of returning scores, it returns an array of predicted labels (or values), one for each data point in the dataset.
Here’s an example using cross_val_predict with a linear regression model:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt

# Load the Diabetes dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Create a linear regression model
model = LinearRegression()

# Generate cross-validated predictions
predictions = cross_val_predict(model, X, y, cv=5)

# Plot actual vs. predicted values
plt.scatter(y, predictions)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs. Predicted Values using cross_val_predict")
plt.show()
```
In this example, cross_val_predict returns an array of predicted values, and you can compare the actual and predicted values to assess the model.
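Since cross_val_predict returns one prediction per sample, you can also feed those predictions into any metric function. A minimal sketch using R² and mean absolute error on the same Diabetes setup (note that the scikit-learn documentation cautions that a metric computed over pooled cross-validated predictions can differ from the mean of per-fold cross_val_score results):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score, mean_absolute_error

diabetes = load_diabetes()
model = LinearRegression()

# One cross-validated prediction per sample
predictions = cross_val_predict(model, diabetes.data, diabetes.target, cv=5)

# Aggregate metrics computed over all pooled predictions
r2 = r2_score(diabetes.target, predictions)
mae = mean_absolute_error(diabetes.target, predictions)
print("R^2:", r2)
print("MAE:", mae)
```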
Use cross_val_score when you want to assess the model’s performance using cross-validation scores.
Use cross_val_predict when you want to generate a cross-validated prediction for each data point and analyze the prediction quality.
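For a classifier, the per-sample predictions from cross_val_predict plug directly into diagnostic tools such as a confusion matrix. A minimal sketch reusing the Iris data from the earlier example (random_state is added only for reproducibility):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# One cross-validated predicted label per sample
y_pred = cross_val_predict(clf, iris.data, iris.target, cv=5)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(iris.target, y_pred)
print(cm)
```

This kind of per-sample analysis (which classes get confused with which) is something the fold-level scores from cross_val_score cannot give you.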
Both functions are useful tools for evaluating and understanding the performance of your machine learning models.