Choosing the Right Scorer
When creating a machine learning model, selecting the appropriate scorer is critical: the scorer determines how model performance is evaluated and optimized, so it should align directly with your specific business objectives.
For Regression Models
When predicting continuous values (like thickness, temperature, or yield):
Mean Absolute Error (MAE)
- Business objective: Minimize the average magnitude of errors regardless of direction
- Best for: Cases where all prediction errors are equally important regardless of size
- Example use case: Predicting material thickness where you care about average deviation in millimeters across all parts
- When not to use: When very large errors incur disproportionately high business cost or safety risk, in which case a quadratic or custom weighted loss (e.g., MSE, RMSE) may be more appropriate
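As a minimal sketch (assuming scikit-learn is available, with made-up thickness values in millimeters), MAE is simply the average absolute deviation between predicted and actual values, reported in the target's own units:

```python
from sklearn.metrics import mean_absolute_error

# Hypothetical measured vs. predicted thickness values in millimeters
y_true = [2.10, 2.05, 1.98, 2.20, 2.15]
y_pred = [2.08, 2.10, 2.00, 2.12, 2.20]

# MAE = average of |y_true - y_pred|; every error counts the same,
# regardless of direction or size
mae = mean_absolute_error(y_true, y_pred)
print(f"MAE: {mae:.3f} mm")
```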
Root Mean Squared Error (RMSE)
- Business objective: Minimize larger errors more aggressively than smaller ones
- Best for: Situations where large errors are disproportionately problematic
- Example use case: Predicting critical process parameters where large deviations could cause significant quality issues
- When not to use: When your data contains outliers that would disproportionately influence the model, or when all error magnitudes should be treated equally in your business context
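A small illustration of how RMSE penalizes one large error more heavily than MAE does (invented numbers; taking the square root explicitly keeps the sketch compatible across scikit-learn versions):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Two hypothetical prediction sets with the same total absolute error:
# one spreads it evenly, the other concentrates it in a single large miss
y_true = [100.0, 100.0, 100.0, 100.0]
even   = [102.0, 102.0, 102.0, 102.0]   # four errors of 2
spiky  = [100.0, 100.0, 100.0, 108.0]   # one error of 8

for name, y_pred in [("even errors", even), ("one large error", spiky)]:
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # RMSE = sqrt(MSE)
    print(f"{name}: MAE={mae:.2f}, RMSE={rmse:.2f}")

# MAE is 2.0 in both cases; RMSE rises from 2.0 to 4.0 for the spiky case,
# reflecting the heavier penalty on large deviations.
```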
Median Absolute Error (MedAE)
- Business objective: Find typical error magnitude while ignoring outliers
- Best for: Datasets with known outliers or when occasional extreme errors are expected
- Example use case: Predicting process parameters in environments with occasional sensor glitches
- When not to use: When outliers represent important edge cases that must be captured by your model, or when you need to minimize all errors including infrequent large ones
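A sketch of MedAE's robustness to an occasional glitch, using invented sensor-style readings; the single bad point pulls MAE up noticeably but barely affects the median:

```python
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Hypothetical readings where one prediction is ruined by a sensor glitch
y_true = [50.0, 51.0, 49.5, 50.5, 50.0]
y_pred = [50.2, 50.8, 49.7, 50.4, 65.0]   # last value is the glitch

print("MAE:  ", mean_absolute_error(y_true, y_pred))    # pulled up by the glitch
print("MedAE:", median_absolute_error(y_true, y_pred))  # stays near the typical error
```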
R2 Score
- Business objective: Maximize the proportion of variance explained by the model
- Best for: Understanding how well your model captures the underlying pattern
- Example use case: When you need to explain how much of a quality parameter's variation is due to your production variables
- When not to use: When the absolute scale of prediction errors is more important than their relative performance, or when working with non-linear relationships that may not be well-captured by variance explanations
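R2 can be read as the fraction of the target's variance that the predictions account for; a quick sketch with toy quality-parameter values:

```python
from sklearn.metrics import r2_score

# Hypothetical quality parameter: actual values vs. model predictions
y_true = [10.0, 12.0, 11.0, 14.0, 13.0]
y_pred = [10.5, 11.5, 11.2, 13.6, 13.1]

# R2 = 1 - (residual sum of squares / total sum of squares);
# 1.0 is a perfect fit, 0.0 is no better than predicting the mean,
# and it can go negative for models worse than the mean
print(f"R2: {r2_score(y_true, y_pred):.3f}")
```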
For Classification Models
When predicting categories (like pass/fail, conforming/non-conforming):
F1 Score
- Business objective: Balance precision and recall with equal importance
- Best for: Cases where both false positives and false negatives have similar business impact
- Example use case: Identifying defective parts when both missing defects and unnecessary rework are equally problematic
- When not to use: When false positives and false negatives have significantly different business costs, or when class distribution is extremely imbalanced
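A minimal sketch (scikit-learn, made-up inspection labels where 1 marks a defective part) showing F1 as the harmonic mean of precision and recall:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical inspection results: 1 = defective part, 0 = conforming part
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1]   # one false alarm, one missed defect

precision = precision_score(y_true, y_pred)  # of parts flagged defective, how many really were
recall = recall_score(y_true, y_pred)        # of truly defective parts, how many were caught
f1 = f1_score(y_true, y_pred)                # harmonic mean: low if either one is low

print(f"precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```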
Precision
- Business objective: Minimize false positives
- Best for: Cases where wrongly predicting the positive class is very costly
- Example use case: Predicting when expensive rework is needed, where unnecessary rework is costly
- When not to use: When false negatives (missing actual positive cases) are more problematic than false positives, or when recall is the primary concern
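A hedged sketch of why precision alone can hide missed positives: a model that rarely raises the flag scores perfect precision even while it misses most parts that actually needed rework (toy labels):

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = rework actually needed
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # very conservative: flags only one part

# Precision is perfect (the single flag was correct), but recall shows
# that three of the four parts needing rework were missed
print("precision:", precision_score(y_true, y_pred))  # 1.0
print("recall:   ", recall_score(y_true, y_pred))     # 0.25
```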
Balanced Accuracy
- Business objective: Perform well across all classes, even with imbalanced data
- Best for: Datasets with uneven class distributions
- Example use case: Detecting rare failure modes that occur infrequently in production
- When not to use: When you need to prioritize one type of error over another based on business costs, or when working with well-balanced classes where standard accuracy would be sufficient
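A sketch on invented, imbalanced labels showing why balanced accuracy is more informative than plain accuracy when one class is rare:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical production data: only 2 of 20 parts show the rare failure mode (1)
y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20            # a trivial model that never predicts a failure

# Plain accuracy looks excellent; balanced accuracy (the average of per-class
# recall) exposes that the rare class is never detected
print("accuracy:         ", accuracy_score(y_true, y_pred))           # 0.90
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))  # 0.50
```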
Matthews Correlation Coefficient (MCC)
- Business objective: Get a balanced measure of performance even with very skewed class distributions
- Best for: Highly imbalanced datasets where other metrics might be misleading
- Example use case: Detecting rare quality issues that happen less than 1% of the time
- When not to use: When you need a more intuitive and directly interpretable metric for business stakeholders, or when you specifically need to optimize for precision or recall independently
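A sketch contrasting MCC with accuracy on a heavily skewed, invented dataset; MCC stays at zero when the model carries no real information about the rare class:

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

# Hypothetical quality data: 1 rare issue in 100 parts
y_true = [0] * 99 + [1]
y_pred = [0] * 100            # model that always predicts "no issue"

# Accuracy is 0.99, yet the model never finds the rare issue;
# MCC comes out as 0.0 (scikit-learn's convention for this degenerate case),
# signalling no correlation between predictions and reality
print("accuracy:", accuracy_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
```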
How to Choose
- Consider the cost of errors in your specific business context:
  - Is missing a defect more costly than a false alarm?
  - Are all prediction errors equally important, or are large errors disproportionately problematic?
- Consider your data characteristics:
  - For highly imbalanced classification problems, prefer MCC or balanced accuracy
  - For regression with outliers, consider MedAE for robustness
- Consider interpretability requirements:
  - R2 is easily explained as "percentage of variance explained"
  - RMSE and MAE are in the same units as your target variable
- Test multiple scorers and compare results (see the sketch after this list):
  - Create parallel models with different scorers
  - Review all performance metrics, not just the optimization target
  - Evaluate real-world performance based on business outcomes
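One hedged way to start that comparison, assuming a scikit-learn workflow: cross-validate a candidate model against several scorers at once and review the full picture before committing to one. The data below is a synthetic placeholder standing in for your own production features and labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Placeholder data: an imbalanced binary problem (~10% positive class)
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Evaluate the same candidate model under several scorers at once
scoring = {
    "f1": "f1",
    "precision": "precision",
    "balanced_accuracy": "balanced_accuracy",
    "mcc": "matthews_corrcoef",
}
results = cross_validate(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring=scoring)

for name in scoring:
    scores = results[f"test_{name}"]
    print(f"{name}: mean={scores.mean():.3f} (std={scores.std():.3f})")
```

Reviewing several metrics side by side like this makes it easier to see when a scorer that looks good in isolation is hiding a weakness that matters for your business outcome.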
By selecting a scorer that aligns with your business objectives, you ensure the model is optimized for what actually matters in your manufacturing process rather than an arbitrary statistical measure.