Feature Tags
The Feature Analysis panel tags each column to highlight important characteristics of your data. Tags are generated automatically and are colour-coded to indicate their level of significance.
The tags are organised into three main categories:
| Tag Category | Tag Name | Description |
|---|---|---|
| Distribution | Small number of outliers | Column contains few statistical outliers |
| Distribution | Medium number of outliers | Column contains a moderate number of statistical outliers |
| Distribution | Significant number of outliers | Column contains many statistical outliers |
| Distribution | High-skewness | Data shows significant asymmetry |
| Distribution | Low-variance | Data shows minimal variation |
| Distribution | High-variance | Data shows significant spread |
| Distribution | Zeros-ratio | High proportion of zero values |
| Pattern | Chronological-order | Values in ascending time sequence |
| Pattern | Reverse-chronological-order | Values in descending time sequence |
| Pattern | All-unique-value | Every value is unique |
| Pattern | Unary | Only one unique value |
| Pattern | High-cardinality | Large number of unique values |
| Balance | Multi-imbalance | Uneven distribution across categories |
| Balance | Multi-balance | Even distribution across categories |
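
The descriptions above can be read as simple per-column checks. The following is a minimal sketch of how a few of these tags could be derived, not the panel's actual implementation: the function names `pattern_tags` and `has_high_zeros_ratio` and the `ZEROS_RATIO_THRESHOLD` value are hypothetical, and the ordering check assumes the column holds comparable date/time values.

```python
# Illustrative only: the threshold below is an assumed value, not the panel's.
ZEROS_RATIO_THRESHOLD = 0.5  # hypothetical cut-off for the Zeros-ratio tag


def pattern_tags(values):
    """Return Pattern tags for one column, following the table's descriptions."""
    if not values:
        return []
    tags = []
    n_unique = len(set(values))
    if n_unique == 1:
        tags.append("Unary")
    elif n_unique == len(values):
        tags.append("All-unique-value")
    # Ordering checks only make sense when there is more than one distinct value.
    if n_unique > 1:
        pairs = list(zip(values, values[1:]))
        if all(a <= b for a, b in pairs):
            tags.append("Chronological-order")
        elif all(a >= b for a, b in pairs):
            tags.append("Reverse-chronological-order")
    return tags


def has_high_zeros_ratio(values, threshold=ZEROS_RATIO_THRESHOLD):
    """True when the share of zero values reaches the assumed threshold."""
    if not values:
        return False
    return sum(1 for v in values if v == 0) / len(values) >= threshold


print(pattern_tags([7, 7, 7]))  # ['Unary']
```

The real thresholds for tags such as Zeros-ratio, High-cardinality, or the outlier levels are not stated here, so any cut-off in a sketch like this is a placeholder.
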
Feature tags use a traffic-light colour system to indicate their level of significance or to flag potential data quality concerns, for example imbalanced datasets, high- or low-variance features, or the presence of outliers.
| Colour | Significance | Example Tags |
|---|---|---|
| Green | Good/Normal condition | Multi-balance, Normal distribution |
| Yellow | Warning/Moderate concern | Small number of outliers, Minor skewness |
| Red | Critical/Significant concern | Significant number of outliers, Multi-imbalance |
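
If you export tags and want to summarise a column by its most serious tag, a lookup like the one below may help. It is a sketch under stated assumptions: `TAG_COLOURS` covers only the example tags listed in the table above, and `most_severe_colour` is a hypothetical helper, not part of the product.

```python
# Colours for the example tags listed in the table above; other tags are
# deliberately left out because their colours are not documented here.
TAG_COLOURS = {
    "Multi-balance": "Green",
    "Normal distribution": "Green",
    "Small number of outliers": "Yellow",
    "Minor skewness": "Yellow",
    "Significant number of outliers": "Red",
    "Multi-imbalance": "Red",
}

# Traffic-light order, from least to most severe.
SEVERITY = {"Green": 0, "Yellow": 1, "Red": 2}


def most_severe_colour(tags):
    """Return the most severe known colour among tags, or None if none is known."""
    colours = [TAG_COLOURS[t] for t in tags if t in TAG_COLOURS]
    if not colours:
        return None
    return max(colours, key=SEVERITY.__getitem__)


print(most_severe_colour(["Multi-balance", "Small number of outliers"]))  # Yellow
```
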