Choosing the Right Scorer
When creating a machine learning model, selecting the appropriate scorer is critical: the scorer determines how model performance is evaluated and optimized, so it should align directly with your specific business objectives.
For Regression Models
When predicting continuous values (like thickness, temperature, or yield):
Mean Absolute Error (MAE)
- Business objective: Minimize the average magnitude of errors regardless of direction
- Best for: Cases where all prediction errors are equally important regardless of size
- Example use case: Predicting material thickness where you care about average deviation in millimeters across all parts
- When not to use: When very large errors incur disproportionately high business cost or safety risk, in which case a quadratic or custom weighted loss (e.g., MSE, RMSE) may be more appropriate
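As a minimal sketch (assuming scikit-learn is available, with made-up thickness values in millimeters), MAE is simply the average absolute deviation between predicted and actual values, reported in the target's own units:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical measured vs. predicted thickness values in millimeters
y_true = [2.10, 2.05, 1.98, 2.20, 2.15]
y_pred = [2.08, 2.10, 2.00, 2.12, 2.20]

# MAE = average of |y_true - y_pred|; every error counts the same,
# regardless of direction or size
mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae:.3f} mm")
```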
Root Mean Squared Error (RMSE)
- Business objective: Minimize larger errors more aggressively than smaller ones
- Best for: Situations where large errors are disproportionately problematic
- Example use case: Predicting critical process parameters where large deviations could cause significant quality issues
- When not to use: When your data contains outliers that would disproportionately influence the model, or when all error magnitudes should be treated equally in your business context
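A small illustration of how RMSE penalizes one large error more heavily than MAE does (invented numbers; taking the square root explicitly keeps the sketch compatible across scikit-learn versions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Two hypothetical prediction sets with the same total absolute error:
# one spreads it evenly, the other concentrates it in a single large miss
y_true = [100.0, 100.0, 100.0, 100.0]
even   = [102.0, 102.0, 102.0, 102.0]   # four errors of 2
spiky  = [100.0, 100.0, 100.0, 108.0]   # one error of 8

for name, y_pred in [("even errors", even), ("one large error", spiky)]:
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt(MSE)
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")

# MAE is 2.0 in both cases; RMSE rises from 2.0 to 4.0 for the spiky case,
# reflecting the heavier penalty on large deviations.
```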
Median Absolute Error (MedAE)
- Business objective: Find typical error magnitude while ignoring outliers
- Best for: Datasets with known outliers or when occasional extreme errors are expected
- Example use case: Predicting process parameters in environments with occasional sensor glitches
- When not to use: When outliers represent important edge cases that must be captured by your model, or when you need to minimize all errors including infrequent large ones
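A sketch of MedAE's robustness to an occasional glitch, using invented sensor-style readings; the single bad point pulls MAE up noticeably but barely affects the median:

```python
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Hypothetical readings where one prediction is ruined by a sensor glitch
y_true = [50.0, 51.0, 49.5, 50.5, 50.0]
y_pred = [50.2, 50.8, 49.7, 50.4, 65.0]   # last value is the glitch

print("MAE:  ", mean_absolute_error(y_true, y_pred))    # pulled up by the glitch
print("MedAE:", median_absolute_error(y_true, y_pred))  # stays near the typical error
```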
R2 Score
- Business objective: Maximize the proportion of variance explained by the model
- Best for: Understanding how well your model captures the underlying pattern
- Example use case: When you need to explain how much of a quality parameter's variation is due to your production variables
- When not to use: When the absolute scale of prediction errors is more important than their relative performance, or when working with non-linear relationships that may not be well-captured by variance explanations
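R2 can be read as the fraction of the target's variance that the predictions account for; a quick sketch with toy quality-parameter values:

```python
from sklearn.metrics import r2_score

# Hypothetical quality parameter: actual values vs. model predictions
y_true = [10.0, 12.0, 11.0, 14.0, 13.0]
y_pred = [10.5, 11.5, 11.2, 13.6, 13.1]

# R2 = 1 - (residual sum of squares / total sum of squares);
# 1.0 is a perfect fit, 0.0 is no better than predicting the mean,
# and it can go negative for models worse than the mean
print(f"R2: {r2_score(y_true, y_pred):.3f}")
```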
For Classification Models
When predicting categories (like pass/fail, conforming/non-conforming):
F1 Score
- Business objective: Balance precision and recall with equal importance
- Best for: Cases where both false positives and false negatives have similar business impact
- Example use case: Identifying defective parts when both missing defects and unnecessary rework are equally problematic
- When not to use: When false positives and false negatives have significantly different business costs, or when class distribution is extremely imbalanced
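A minimal sketch (scikit-learn, made-up inspection labels where 1 marks a defective part) showing F1 as the harmonic mean of precision and recall:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical inspection results: 1 = defective part, 0 = conforming part
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]   # one false alarm, one missed defect

precision = precision_score(y_true, y_pred)  # of parts flagged defective, how many really were
recall = recall_score(y_true, y_pred)        # of truly defective parts, how many were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean: low if either one is low

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```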
Precision
- Business objective: Minimize false positives
- Best for: Cases where wrongly predicting the positive class is very costly
- Example use case: Predicting when expensive rework is needed, where unnecessary rework is costly
- When not to use: When false negatives (missing actual positive cases) are more problematic than false positives, or when recall is the primary concern
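A hedged sketch of why precision alone can hide missed positives: a model that rarely raises the flag scores perfect precision even while it misses most parts that actually needed rework (toy labels):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = rework actually needed
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # very conservative: flags only one part

# Precision is perfect (the single flag was correct), but recall shows
# that three of the four parts needing rework were missed
print("precision:", precision_score(y_true, y_pred))  # 1.0
print("recall:   ", recall_score(y_true, y_pred))     # 0.25
```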
Balanced Accuracy
- Business objective: Perform well across all classes, even with imbalanced data
- Best for: Datasets with uneven class distributions
- Example use case: Detecting rare failure modes that occur infrequently in production
- When not to use: When you need to prioritize one type of error over another based on business costs, or when working with well-balanced classes where standard accuracy would be sufficient
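A sketch on invented, imbalanced labels showing why balanced accuracy is more informative than plain accuracy when one class is rare:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical production data: only 2 of 20 parts show the rare failure mode (1)
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20            # a trivial model that never predicts a failure

# Plain accuracy looks excellent; balanced accuracy (the average of per-class
# recall) exposes that the rare class is never detected
print("accuracy:         ", accuracy_score(y_true, y_pred))           # 0.90
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.50
```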
Matthews Correlation Coefficient (MCC)
- Business objective: Get a balanced measure of performance even with very skewed class distributions
- Best for: Highly imbalanced datasets where other metrics might be misleading
- Example use case: Detecting rare quality issues that happen less than 1% of the time
- When not to use: When you need a more intuitive and directly interpretable metric for business stakeholders, or when you specifically need to optimize for precision or recall independently
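A sketch contrasting MCC with accuracy on a heavily skewed, invented dataset; MCC stays at zero when the model carries no real information about the rare class:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical quality data: 1 rare issue in 100 parts
y_true = [0] * 99 + [1]
y_pred = [0] * 100            # model that always predicts "no issue"

# Accuracy is 0.99, yet the model never finds the rare issue;
# MCC comes out as 0.0 (scikit-learn's convention for this degenerate case),
# signalling no correlation between predictions and reality
print("accuracy:", accuracy_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```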
How to Choose
- Consider the cost of errors in your specific business context:
  - Is missing a defect more costly than a false alarm?
  - Are all prediction errors equally important, or are large errors disproportionately problematic?
- Consider your data characteristics:
  - For highly imbalanced classification problems, prefer MCC or balanced accuracy
  - For regression with outliers, consider MedAE for robustness
- Consider interpretability requirements:
  - R2 is easily explained as "percentage of variance explained"
  - RMSE and MAE are in the same units as your target variable
- Test multiple scorers and compare results (see the sketch after this list):
  - Create parallel models with different scorers
  - Review all performance metrics, not just the optimization target
  - Evaluate real-world performance based on business outcomes
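One hedged way to start that comparison, assuming a scikit-learn workflow: cross-validate a candidate model against several scorers at once and review the full picture before committing to one. The data below is a synthetic placeholder standing in for your own production features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Placeholder data: an imbalanced binary problem (~10% positive class)
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Evaluate the same candidate model under several scorers at once
scoring = {
    "f1": "f1",
    "precision": "precision",
    "balanced_accuracy": "balanced_accuracy",
    "mcc": "matthews_corrcoef",
}
results = cross_validate(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring=scoring)

for name in scoring:
    scores = results[f"test_{name}"]
    print(f"{name}: mean={scores.mean():.3f} (std={scores.std():.3f})")
```

Reviewing several metrics side by side like this makes it easier to see when a scorer that looks good in isolation is hiding a weakness that matters for your business outcome.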
By selecting a scorer that aligns with your business objectives, you ensure the model is optimized for what actually matters in your manufacturing process rather than an arbitrary statistical measure.