Statistics

Info: Access Athinia Catalyst at catalyst.athinia.io.

The Statistical Data Explorer provides a comprehensive view of your dataset's statistics, helping you understand your data's characteristics at a glance. The component displays key metrics for each column in your dataset, allowing you to quickly identify patterns, outliers, and data quality issues.

Key Features

Data Overview

At the top of the component, you'll see a summary of your dataset showing:

Total number of columns
Total number of rows
Dataset size in MiB

Filtering

Use the Filter component to restrict your dataset to rows that match specific conditions. The statistics update immediately to reflect the filtered data.

Column Statistics

For each column in your dataset, the following information is displayed:

Column: The name of the data field, with a checkbox for selecting it
Data Type: The data type of the column (String, Float64, Int32, etc.)
Density: A sparkline visualization showing the distribution of values
Boxplot: A visual representation of the five-number summary (minimum, first quartile, median, third quartile, maximum)
Unique Values: The count of distinct values in the column
Null Count: The number of missing values, shown as both count and percentage
Mean: The average value (for numerical columns)
Median: The middle value (for numerical columns)
Min: The minimum value in the column
Max: The maximum value in the column
Standard Deviation: The measure of dispersion in the data (for numerical columns)
Q1: The first quartile value
Q3: The third quartile value
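As a rough illustration of how these per-column figures relate to each other, the sketch below computes them for one numerical column with Python's standard library. It is not Catalyst's implementation: the function name `column_stats`, the use of `None` for missing values, the sample (n - 1) standard deviation, and the inclusive quartile method are all assumptions made for this example, and the application may use different conventions.

```python
import statistics


def column_stats(values):
    """Summarize one numerical column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    # Inclusive quartiles are one common convention; the product may use another.
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "unique": len(set(present)),
        "null_count": len(values) - len(present),
        "mean": statistics.fmean(present),
        "median": median,
        "min": min(present),
        "max": max(present),
        "std": statistics.stdev(present),  # sample standard deviation (assumption)
        "q1": q1,
        "q3": q3,
    }


stats = column_stats([2.0, 4.0, None, 4.0, 6.0, 8.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

For this input the column has one missing value and four distinct values, and the quartiles fall at 4.0, 4.0, and 6.0.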

Using the Component

Searching for Columns

Use the search box in the top-right corner to quickly find specific columns by name. The table automatically filters to show only the matching columns.

Adjusting Precision

The numeric input field with the "floating-point" icon allows you to adjust the number of decimal places shown for numeric values. You can set any value between 0 and 38 to control the display precision.
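The effect of the precision setting can be pictured as fixed-point formatting. The following is a minimal sketch, assuming the display simply formats each value to the chosen number of decimal places; `format_value` and `MAX_DECIMALS` are hypothetical names for this example, not part of the application.

```python
MAX_DECIMALS = 38  # upper bound stated for the precision input


def format_value(value, decimals):
    """Format a number with a fixed count of decimal places (illustrative only)."""
    if not 0 <= decimals <= MAX_DECIMALS:
        raise ValueError(f"decimals must be between 0 and {MAX_DECIMALS}")
    return f"{value:.{decimals}f}"


two_places = format_value(3.14159, 2)
no_places = format_value(3.14159, 0)
print(two_places, no_places)  # prints "3.14 3"
```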

Column Selection

Use the checkboxes in the first column to select specific columns for further analysis or operations.

Visualizing Distributions

The Density column shows sparklines that visualize the distribution of values in numerical columns. This helps you quickly identify patterns such as:

Normal distributions (bell curves)
Skewed distributions
Bimodal or multimodal distributions
Uniform distributions

Understanding Boxplots

The boxplots visually represent the five-number summary:

The vertical line inside the box represents the median
The left edge of the box represents the first quartile (Q1)
The right edge of the box represents the third quartile (Q3)
The horizontal lines (whiskers) extend to the minimum and maximum values

Tips for Effective Use

Data Quality Assessment: Check the "Null Count" column to quickly identify fields with missing values.
Outlier Detection: Because the whiskers extend to the minimum and maximum, a whisker that is much longer than the box indicates values far from the bulk of the data, which are potential outliers.
Distribution Analysis: Examine the density sparklines to understand the distribution shape of your numerical data.
Data Type Verification: Review the "Data Type" column to ensure your columns have the appropriate data types for your analysis.
Column Filtering: Use the search box when working with datasets that have many columns to focus on specific fields of interest.
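To follow up on a suspicious boxplot outside the interface, one common approach is Tukey's 1.5 × IQR rule, which flags values far below Q1 or far above Q3. This rule is not described as part of the Statistical Data Explorer; it is shown here only as a sketch, and the function name `tukey_outliers`, the sample data, and the inclusive quartile method are assumptions.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


readings = [10, 12, 12, 13, 14, 15, 48]
flagged = tukey_outliers(readings)
print(flagged)  # prints [48]
```

Here Q1 is 12.0 and Q3 is 14.5, so the fences sit at 8.25 and 18.25, and only 48 falls outside them.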

The Statistical Data Explorer is designed to give you immediate insights into your data structure and quality, setting the foundation for more advanced analysis within the data science module.

Frequently Asked Questions

Why don't I see density plots for some of my columns?

Density plots are only generated for numerical data types. Categorical and String columns don't have density visualizations because a density curve is only meaningful for numeric values.

What does the percentage in the "Null Count" column represent?

This percentage shows what share of that column's values are missing, relative to the total number of rows in the dataset. It helps you quickly assess data completeness.
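As a small worked example of that ratio, the sketch below counts missing values in one column and expresses them as a percentage of the rows. The helper name `null_summary` and the use of `None` for a missing value are assumptions for illustration.

```python
def null_summary(values):
    """Return (null count, null percentage) for one column."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    # Guard against an empty column to avoid dividing by zero.
    pct = 100.0 * nulls / total if total else 0.0
    return nulls, pct


nulls, pct = null_summary(["a", None, "b", None])
print(f"{nulls} ({pct:.1f}%)")  # prints "2 (50.0%)"
```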

How are the boxplots calculated?

Boxplots represent the five-number summary of your data: minimum, Q1 (25th percentile), median (50th percentile), Q3 (75th percentile), and maximum. The box itself spans from Q1 to Q3, with a line at the median.

Why do some numeric values show "bytes" next to them?

For String and Categorical data types, the numeric values represent the byte size of the data stored in that column rather than the actual data values.
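To give a sense of what a byte size means for text, the sketch below measures strings by their UTF-8 encoded length. This assumes UTF-8 encoding and a per-value measurement, neither of which is confirmed for Catalyst; `byte_sizes` is a hypothetical helper.

```python
def byte_sizes(values):
    """Return the UTF-8 encoded size in bytes of each non-null string."""
    return [len(v.encode("utf-8")) for v in values if v is not None]


# "\u00e9" is a single accented character that occupies two bytes in UTF-8.
sizes = byte_sizes(["wafer", "lot-7", None, "\u00e9"])
smallest, largest = min(sizes), max(sizes)
print(sizes, smallest, largest)  # prints "[5, 5, 2] 2 5"
```

Note that the byte size and the character count differ whenever a string contains non-ASCII characters.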

How can I export these statistics?

Currently, the statistics are presented for visual analysis within the interface. If you need to export them, consider using the data science module's export functionality available in other parts of the application.

What happens if my dataset is empty?

If your dataset has no rows (possibly due to filtering conditions), you'll see a message indicating "Dataset is empty" with a suggestion to remove filter conditions.

Why do some values appear truncated?

Numeric values are displayed with the number of decimal places set by the precision control, so a low setting can make them look truncated. You can adjust the precision using the numeric input with the floating-point icon in the top-right corner.

Can I change the data type of a column?

The data types shown in the Data Explorer are informational. To modify data types, you'll need to use the data transformation features available elsewhere in the application.
