Data Types
Introduction
When working with semiconductor manufacturing data, it's important to understand how different types of information are classified. This guide explains how data is categorized in our system to help you make better sense of your analytics.
Data Column Basics
Each column in your dataset has two important aspects:
- What it contains (data type) - like numbers, text, dates
- What it means (semantic type) - how the information should be understood
Common Data Types
Text Values (STRING)
- Examples: Tool IDs, Wafer Batch Numbers, Manufacturer Names
- How it's used: Identifies specific equipment, batches, or sources
Whole Numbers (INTEGER)
- Examples: Defect Count, Wafer Count, Production Cycles
- How it's used: Counts discrete items or events
Decimal Numbers (FLOAT)
- Examples: Temperature (°C), Motor Current (Amps), Film Thickness (nm)
- How it's used: Precise measurements with decimal points
Yes/No Values (BOOLEAN)
- Examples: Pass/Fail Test Results, Equipment Status (On/Off)
- How it's used: Binary conditions
Date and Time (DATE or DATETIME)
- Examples: Production Start Time, Maintenance Date, Process Completion
- How it's used: Tracks when events occurred
Understanding Semantic Types
Labels (NOMINAL)
- What it means: Categories without any natural order
- Examples:
  - Tool ID (ETC-001, PVD-102, CMP-023)
  - Material Supplier (Supplier A, Supplier B, Supplier C)
Ranked Categories (ORDINAL)
- What it means: Categories with a meaningful order
- Examples:
  - Process Quality Rating (Low, Medium, High, Premium)
  - Alert Severity (Minor, Major, Critical)
Continuous Measurements (CONTINUOUS)
- What it means: Numeric measurements that can take any value within a range, including fractional values
- Examples:
  - Chamber Pressure (mTorr)
  - Motor Current (Amps)
  - Chemical Viscosity (cP)
Count Values (DISCRETE)
- What it means: Whole number counts
- Examples:
  - Defects per Wafer
  - Number of Production Cycles
  - Particles Detected
Time-Based Values (TEMPORAL)
- What it means: Points or periods in time
- Examples:
  - Production Start Time
  - Process Duration
  - Maintenance Schedule Date
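The two aspects of a column described in this guide, data type and semantic type, can be pictured as a small record attached to each column. The sketch below is illustrative only: `DataType`, `SemanticType`, and `ColumnSpec` are hypothetical names chosen for this example, not part of the product's API.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """What a column contains (types listed under Common Data Types)."""
    STRING = "STRING"
    INTEGER = "INTEGER"
    FLOAT = "FLOAT"
    BOOLEAN = "BOOLEAN"
    DATE = "DATE"
    DATETIME = "DATETIME"


class SemanticType(Enum):
    """What a column means (types listed under Understanding Semantic Types)."""
    NOMINAL = "NOMINAL"
    ORDINAL = "ORDINAL"
    CONTINUOUS = "CONTINUOUS"
    DISCRETE = "DISCRETE"
    TEMPORAL = "TEMPORAL"


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical description of one dataset column."""
    name: str
    data_type: DataType
    semantic_type: SemanticType


# Example columns taken from this guide.
tool_id = ColumnSpec("Tool ID", DataType.STRING, SemanticType.NOMINAL)
film_thickness = ColumnSpec(
    "Film Thickness (nm)", DataType.FLOAT, SemanticType.CONTINUOUS
)
defect_count = ColumnSpec("Defect Count", DataType.INTEGER, SemanticType.DISCRETE)
```

Keeping the two aspects as separate fields makes it explicit that the same data type (for example INTEGER) can carry different meanings depending on the semantic type.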
Quick Reference Guide
What You're Measuring | Suggested Data Type | Other Supported | Semantic Type |
---|---|---|---|
Equipment IDs, Batch Numbers | STRING | INTEGER | NOMINAL |
Quality Levels, Priorities | STRING | INTEGER | ORDINAL |
Temperatures, Currents, Thickness | FLOAT | INTEGER | CONTINUOUS |
Defect Count, Cycle Count | INTEGER | FLOAT | DISCRETE |
Process Dates, Maintenance Times | DATE, DATETIME | STRING | TEMPORAL |
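The table above can also be read as a lookup from semantic type to data types. The following sketch transcribes it into a dictionary and classifies a pairing; `QUICK_REFERENCE` and `check_pairing` are hypothetical names, and the "not listed" result only means the pairing does not appear in the table, not that the system necessarily rejects it.

```python
# Transcribed from the Quick Reference Guide:
# semantic type -> (suggested data types, other supported data types)
QUICK_REFERENCE = {
    "NOMINAL": ({"STRING"}, {"INTEGER"}),
    "ORDINAL": ({"STRING"}, {"INTEGER"}),
    "CONTINUOUS": ({"FLOAT"}, {"INTEGER"}),
    "DISCRETE": ({"INTEGER"}, {"FLOAT"}),
    "TEMPORAL": ({"DATE", "DATETIME"}, {"STRING"}),
}


def check_pairing(data_type: str, semantic_type: str) -> str:
    """Classify a pairing as 'suggested', 'supported', or 'not listed'."""
    suggested, supported = QUICK_REFERENCE[semantic_type]
    if data_type in suggested:
        return "suggested"
    if data_type in supported:
        return "supported"
    return "not listed"


print(check_pairing("FLOAT", "CONTINUOUS"))  # suggested
print(check_pairing("FLOAT", "DISCRETE"))    # supported
print(check_pairing("BOOLEAN", "TEMPORAL"))  # not listed
```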
Frequently Asked Questions
What if my data doesn't fit neatly into one type?
Semiconductor data does not always map cleanly to one type. For example, a defect code such as "D123" looks like text, but its numeric part may encode a ranked category: "D123" might denote a more severe defect than "D122". In that case, convert the defect code to the INTEGER data type and assign it the ORDINAL semantic type.
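As a rough illustration of that conversion, the sketch below strips the letter prefix from a defect code and keeps the number. It assumes codes always look like "D" followed by digits and that a larger number means a more severe defect; both are assumptions for this example, and `defect_code_to_int` is a hypothetical helper, not a built-in function of the system.

```python
import re


def defect_code_to_int(code: str) -> int:
    """Convert a code such as 'D123' to the integer 123.

    Assumes the format 'D' + digits; anything else raises ValueError.
    """
    match = re.fullmatch(r"D(\d+)", code.strip())
    if match is None:
        raise ValueError(f"unexpected defect code format: {code!r}")
    return int(match.group(1))


codes = ["D123", "D122", "D130"]

# Under the stated assumption, sorting by the integer value
# orders the codes from least to most severe.
ranked = sorted(codes, key=defect_code_to_int)
print(ranked)  # ['D122', 'D123', 'D130']
```

Rejecting unexpected formats early, rather than guessing, keeps a stray code from silently receiving a meaningless rank.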
Why is identifying semantic types important?
Semantic types help our analytics system understand the meaning behind your data. For example, knowing that "Lot002" is a nominal label rather than a number helps prevent meaningless calculations (like averaging lot numbers).
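A minimal sketch of how a semantic type can guard against such calculations is shown below. It is not the analytics system's actual logic: `column_mean` is a hypothetical helper, and restricting averages to CONTINUOUS and DISCRETE columns is an assumption made for this example.

```python
from statistics import mean

# Assumption for this sketch: averaging is only meaningful
# for these two semantic types.
AVERAGEABLE = {"CONTINUOUS", "DISCRETE"}


def column_mean(values, semantic_type: str):
    """Return the mean of values, refusing semantic types where it is meaningless."""
    if semantic_type not in AVERAGEABLE:
        raise TypeError(f"mean is not meaningful for {semantic_type} columns")
    return mean(values)


defects_per_wafer = [1, 2, 3]
print(column_mean(defects_per_wafer, "DISCRETE"))  # 2

lot_numbers = [1, 2, 3]  # numeric part of labels such as "Lot002"
try:
    column_mean(lot_numbers, "NOMINAL")
except TypeError as exc:
    print(exc)  # mean is not meaningful for NOMINAL columns
```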
Can I change data types after importing?
Yes, but it's best to get them right from the start. Changing a column from FLOAT to INTEGER can discard the fractional part of each value, and changing a column from STRING to a numeric type can cause errors if any value is not a valid number.
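Both risks can be seen in plain Python, which is used here only as an analogy; the system's own conversion rules may differ in detail. `to_float_or_none` is a hypothetical helper for this example.

```python
# FLOAT -> INTEGER: Python's int() drops the fractional part.
thickness_nm = [12.7, 8.2, -0.9]
as_integers = [int(v) for v in thickness_nm]
print(as_integers)  # [12, 8, 0]


# STRING -> numeric: text that is not a valid number cannot be converted.
def to_float_or_none(text: str):
    """Return the float value of text, or None if it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


raw_readings = ["3.5", "N/A", "7"]
converted = [to_float_or_none(t) for t in raw_readings]
print(converted)  # [3.5, None, 7.0]
```

Checking a column for values like "N/A" before changing its type helps avoid a failed or partial conversion.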
How should I handle chemical concentration measurements?
Chemical concentrations (like Iron at 0.5 ppb) are typically FLOAT data type with CONTINUOUS semantic type, even when measured in parts per billion or million, as they can take any decimal value.
What about timestamp data in log files?
Timestamps from equipment logs should be stored in one consistent format, using the DATE or DATETIME data type with the TEMPORAL semantic type. This enables time-based analysis such as trend detection and cycle time calculations.
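One way to normalize log timestamps before import is sketched below. The formats in `LOG_FORMATS` are invented for illustration and would need to match what your equipment actually writes; `parse_log_timestamp` is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical formats; replace with the ones found in your logs.
LOG_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%m/%d/%Y %H:%M",
    "%Y%m%dT%H%M%S",
]


def parse_log_timestamp(text: str) -> datetime:
    """Try each known format in turn and return the first successful parse."""
    cleaned = text.strip()
    for fmt in LOG_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {text!r}")


start = parse_log_timestamp("2024-03-01 08:15:00")
end = parse_log_timestamp("03/01/2024 09:45")

# Once both values are datetimes, cycle time is a simple subtraction.
cycle_time = end - start
print(cycle_time)  # 1:30:00
```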
What about the DECIMAL data type?
Data stored as DECIMAL is converted to FLOAT for analysis. The DECIMAL type is used for storage and display purposes, but all calculations are performed using FLOAT.
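The practical effect of calculating in FLOAT can be illustrated with Python's `decimal` module. This shows general binary floating-point behavior, not the system's internals, and the values are made up for the example.

```python
import math
from decimal import Decimal

stored = [Decimal("0.10"), Decimal("0.20")]

# Exact decimal arithmetic, as the values are stored and displayed.
exact_total = stored[0] + stored[1]
print(exact_total)  # 0.30

# The same values converted to floats for calculation.
as_floats = [float(d) for d in stored]
float_total = as_floats[0] + as_floats[1]
print(float_total == 0.3)  # False: a tiny rounding difference remains

# Compare float results with a tolerance rather than exact equality.
print(math.isclose(float_total, 0.3))  # True
```

The difference is far below typical measurement precision, but it matters when results are compared for exact equality.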