Access Athinia Catalyst at catalyst.athinia.io.

Univariate Outlier Detection

The Univariate Outlier Detection tool analyzes each variable independently to identify data points that fall significantly outside the expected range. This is the most straightforward approach to outlier detection: for each selected column, the tool calculates statistical boundaries and flags any values that exceed them. Because it operates on one variable at a time, it is easy to interpret and often the best starting point when you suspect individual measurements contain errors or extreme readings. The tool supports three statistical methods (IQR, MAD, and Z-Score), each with adjustable sensitivity, and provides both a summary overview showing outlier counts per column and detailed box plot visualizations for deeper investigation.

[Figure: Univariate outlier detection interface showing method selection and outlier summary]

Methods

The tool offers three detection methods. Each defines a boundary around "normal" values, scaled by the threshold t, and flags anything outside it.

Method  | Boundary Formula                  | Default Threshold | Best For
IQR     | Q1 - (t × IQR) to Q3 + (t × IQR)  | 1.5               | Skewed or non-normal data
MAD     | Median ± (t × MAD)                | 2.0               | Data with many extreme values
Z-Score | Mean ± (t × Std Dev)              | 3.0               | Normally distributed data
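
To make the boundary formulas concrete, here is a minimal Python sketch that computes them for a single column. It is an illustration only, not Catalyst's implementation: the function names outlier_bounds and find_outliers are hypothetical, the quartile interpolation rule is an assumption, MAD is taken as the raw median absolute deviation with no scaling constant, and Std Dev is taken as the sample standard deviation. The tool may differ on any of these points.

```python
import statistics

# Default thresholds taken from the methods table.
DEFAULT_T = {"iqr": 1.5, "mad": 2.0, "zscore": 3.0}


def outlier_bounds(values, method="iqr", t=None):
    """Return (lower, upper) boundaries for one column of numbers."""
    if t is None:
        t = DEFAULT_T[method]
    if method == "iqr":
        # quantiles(n=4) returns [Q1, median, Q3]; the interpolation
        # rule ("inclusive") is an assumption of this sketch.
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        return q1 - t * iqr, q3 + t * iqr
    if method == "mad":
        med = statistics.median(values)
        # Raw median absolute deviation (no 1.4826 scaling; assumption).
        mad = statistics.median(abs(v - med) for v in values)
        return med - t * mad, med + t * mad
    if method == "zscore":
        mean = statistics.fmean(values)
        std = statistics.stdev(values)  # sample standard deviation (assumption)
        return mean - t * std, mean + t * std
    raise ValueError(f"unknown method: {method}")


def find_outliers(values, method="iqr", t=None):
    """Return the values that fall strictly outside the boundaries."""
    lower, upper = outlier_bounds(values, method, t)
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 11, 13, 12, 11, 12, 100]  # made-up sample data
for name in ("iqr", "mad", "zscore"):
    print(name, outlier_bounds(readings, name), find_outliers(readings, name))
```

On this made-up sample, the IQR and MAD boundaries flag the value 100, while the Z-Score boundaries at the default threshold do not, because the extreme value itself inflates the mean and standard deviation. This is one reason the median-based methods are suggested for data with extreme values.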

How to choose:

  • Use IQR as a general-purpose default that works well with most distributions
  • Use MAD when your data contains extreme outliers that might bias IQR calculations
  • Use Z-Score when your data is approximately symmetric and normally distributed

Key Features

Column Selection

Select the numerical columns you want to analyze. The tool automatically filters to numerical data types (Float64, Int64).

  • Use column groups to quickly select related fields
  • Search for specific columns by name
  • Select or deselect individual columns with checkboxes
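
The numerical filter can be pictured with a short sketch. It assumes, purely for illustration, that the dataset schema is available as a mapping from column name to data type name. The function numeric_columns and the sample column names are hypothetical and not part of Catalyst.

```python
# Data types the tool treats as numerical, per the description above.
NUMERIC_DTYPES = {"Float64", "Int64"}


def numeric_columns(schema):
    """Return the names of numerical columns, preserving their order."""
    return [name for name, dtype in schema.items() if dtype in NUMERIC_DTYPES]


# Hypothetical schema: column name -> data type name.
schema = {
    "lot_id": "String",
    "thickness_nm": "Float64",
    "defect_count": "Int64",
    "tool": "Categorical",
}
print(numeric_columns(schema))
```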

Threshold Control

The threshold controls sensitivity across all methods:

  • Range: 1.0 to 5.0 (adjustable in 0.25 increments)
  • Lower values: More sensitive, flags more potential outliers
  • Higher values: Less sensitive, flags only extreme deviations

Start with the default for your chosen method and adjust based on results. Lower the threshold when data quality is critical, and raise it when working with naturally variable data.
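
The effect of the threshold can be seen by sweeping it across its range and counting flagged values. The sketch below does this for the IQR method only. The helper iqr_outlier_count and the sample readings are hypothetical, and the quartile interpolation rule is an assumption that may not match the tool.

```python
import statistics


def iqr_outlier_count(values, t):
    """Count values outside Q1 - t*IQR .. Q3 + t*IQR."""
    # Interpolation rule ("inclusive") is an assumption of this sketch.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - t * iqr, q3 + t * iqr
    return sum(1 for v in values if v < lower or v > upper)


# Thresholds from 1.0 to 5.0 in 0.25 increments (17 settings).
thresholds = [1.0 + 0.25 * i for i in range(17)]

readings = [10, 11, 11, 12, 12, 12, 13, 14, 18, 100]  # made-up sample data
counts = {t: iqr_outlier_count(readings, t) for t in thresholds}

for t, n in counts.items():
    print(f"t={t:.2f}: {n} outlier(s)")
```

Because raising the threshold only widens the boundaries, the count can stay the same or drop as the threshold increases, but it never rises.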

Summary View

Displays a bar chart of outlier counts per column:

  • Columns sorted by outlier count (highest first)
  • Paginated for large datasets (30 columns per page)
  • Click on bars to select columns for detailed analysis
  • Updates in real time as you adjust the threshold
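
The ordering and pagination described above can be sketched as follows. The function summary_pages is hypothetical, and the handling of ties (columns with equal counts keep their original order) is an assumption, since the tool's tie-breaking rule is not documented here.

```python
def summary_pages(outlier_counts, page_size=30):
    """Sort columns by outlier count (highest first) and split into pages."""
    # sorted() is stable, so columns with equal counts keep input order.
    ranked = sorted(outlier_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [ranked[i:i + page_size] for i in range(0, len(ranked), page_size)]


# Made-up counts for 65 hypothetical columns.
example_counts = {f"col_{i}": i % 7 for i in range(65)}
pages = summary_pages(example_counts)
print(len(pages), [len(p) for p in pages])
```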

Detailed Analysis

Select up to 10 columns from the summary to see box plot visualizations showing:

  • Data distribution boundaries (Q1 to Q3)
  • Individual outlier values and their positions
  • Statistical context based on the selected method

Using the Tool

  1. Choose a method based on your data characteristics (see table above)
  2. Select columns relevant to your analysis
  3. Adjust the threshold if the default sensitivity is too high or too low
  4. Review the summary to identify which columns have the most outliers
  5. Click bars to select columns for detailed box plot analysis
  6. Investigate patterns to determine if outliers are errors or valid observations

Frequently Asked Questions

How do I choose between the three methods?

Consider your data distribution. For general use, start with IQR. If your data has extreme values that might skew the IQR calculation, switch to MAD. If your data is well-behaved and approximately normal, Z-Score provides the most interpretable boundaries.

What threshold should I use?

Start with the method's default (IQR: 1.5, MAD: 2.0, Z-Score: 3.0). Lower it if you need higher sensitivity for critical quality applications. Raise it if you are seeing too many false positives or working with naturally variable data.

Why can't I see all my columns?

Only numerical columns (Float64, Int64) are available for outlier analysis. Text and categorical columns are automatically filtered out.

What does "No outliers found" mean?

All values in the selected columns fall within the normal range at your current threshold. Try lowering the threshold if you expected to find outliers.

How many outliers is too many?

As a general rule, for the share of a column's values that are flagged: less than 5% is typical for most datasets, 5 to 10% may indicate data quality issues, and more than 10% likely indicates systematic problems worth investigating.
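
This rule of thumb can be written as a small helper. The function outlier_rate_band and its labels are hypothetical, and treating exactly 5% and exactly 10% as part of the middle band is an assumption about where the boundaries fall.

```python
def outlier_rate_band(n_outliers, n_values):
    """Classify the share of flagged values using the rule of thumb above."""
    rate = n_outliers / n_values
    if rate < 0.05:
        return "typical"
    if rate <= 0.10:
        return "possible data quality issues"
    return "likely systematic problems"


print(outlier_rate_band(3, 100))
print(outlier_rate_band(8, 100))
print(outlier_rate_band(15, 100))
```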

Why is the detailed view empty?

The detailed view shows data only after you click on bars in the summary chart. Select up to 10 columns to compare their distributions.