In scikit-learn, `cross_val_score` and `cross_val_predict` are both functions for cross-validation, a method for estimating how well a model performs on unseen data. However, they serve different purposes and return different types of results.
`cross_val_score`: evaluates a model by running cross-validation and returning an array with one score per fold. The metric is chosen with the `scoring` parameter (e.g., accuracy, precision, or recall); by default the estimator's own `score` method is used, which is accuracy for classifiers.
Here’s an example using `cross_val_score` with a simple decision tree classifier:
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create a decision tree classifier
clf = DecisionTreeClassifier()

# Calculate cross-validation scores (for classifiers the default score is accuracy)
scores = cross_val_score(clf, X, y, cv=5)

print("Cross-validation scores:", scores)
print("Mean accuracy:", scores.mean())
```
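Since the description above mentions metrics other than accuracy, here is a minimal sketch of switching the metric through the `scoring` parameter. The choice of `"f1_macro"` and the fixed `random_state=0` are assumptions made for illustration and repeatability; they are not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# random_state is fixed only so repeated runs give the same trees (an assumption)
clf = DecisionTreeClassifier(random_state=0)

# Same 5-fold cross-validation, scored with two different metrics
acc_scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
f1_scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")

print("Accuracy per fold:  ", acc_scores)
print("Macro F1 per fold:  ", f1_scores)
print("Mean macro F1:      ", f1_scores.mean())
```

Each call still returns one value per fold; only the metric used to compute that value changes.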
`cross_val_predict`: generates a cross-validated prediction for every data point. It splits the data like `cross_val_score`, but instead of returning scores it returns an array of predicted labels (or values), one per sample. Each prediction comes from a model that was trained on folds that did not contain that sample.
Here’s an example using `cross_val_predict` with a linear regression model:
```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
import matplotlib.pyplot as plt

# Load the Diabetes dataset
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target

# Create a linear regression model
model = LinearRegression()

# Generate cross-validated predictions
predictions = cross_val_predict(model, X, y, cv=5)

# Plot actual vs. predicted values
plt.scatter(y, predictions)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs. Predicted Values using cross_val_predict")
plt.show()
```
In this example, `cross_val_predict` returns an array of predicted values, which you can compare with the actual values to assess the model.
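Besides plotting, the comparison between actual and predicted values can be put into numbers. The sketch below is one possible way to do that, assuming `r2_score` and `mean_squared_error` from `sklearn.metrics` as the metrics of interest (the original example does not compute any). Note that a metric computed on the pooled predictions may differ from the mean of the per-fold values returned by `cross_val_score`, because the samples are grouped differently.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_predict

X, y = load_diabetes(return_X_y=True)

# One out-of-fold prediction per sample
predictions = cross_val_predict(LinearRegression(), X, y, cv=5)

# Metrics computed once over all pooled out-of-fold predictions
r2 = r2_score(y, predictions)
mse = mean_squared_error(y, predictions)

print("R^2 on pooled predictions:", r2)
print("MSE on pooled predictions:", mse)
```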
In summary:
- Use `cross_val_score` when you want to assess the model’s performance using cross-validation scores.
- Use `cross_val_predict` when you want to generate cross-validated predictions for each data point and analyze the prediction quality.
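To make the difference in return values concrete, here is a small side-by-side sketch on the Iris data. The fixed `random_state=0` and the use of `confusion_matrix` to analyze the predictions are assumptions added for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)  # seed fixed for repeatability (an assumption)

scores = cross_val_score(clf, X, y, cv=5)    # one score per fold
preds = cross_val_predict(clf, X, y, cv=5)   # one prediction per sample

print(scores.shape)  # (5,)
print(preds.shape)   # (150,)

# Per-sample predictions allow analyses that fold scores cannot,
# such as a confusion matrix over the whole dataset
cm = confusion_matrix(y, preds)
print(cm)
```

The score array has one entry per fold, while the prediction array has one entry per sample, which is what makes per-class or per-sample error analysis possible.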
Both functions are useful tools for evaluating and understanding the performance of your machine learning models.