Randomised Dependence Coefficient (RDC)

Definition

The Randomised Dependence Coefficient (RDC) is a statistical measure used to identify and quantify the level of association between two features. It is a non-parametric measure, meaning it does not make any assumptions about the underlying distribution of the data. RDC is particularly useful because it can capture both linear and non-linear dependencies, making it a versatile tool for data exploration and analysis. It works with both numerical and categorical features.

Range of scores: 0 to 1

A score of 0 indicates that the two features are independent, meaning there is no relationship between them. A score of 1, on the other hand, indicates complete dependence, meaning one feature can be perfectly predicted from the other. Scores between 0 and 1 indicate varying degrees of dependence.

How it works

The RDC is calculated in three steps. First, each feature is passed through a copula transformation: every value is replaced by its rank-based empirical cumulative distribution value, which standardises the marginal distribution of each feature and makes the measure insensitive to how each feature is individually scaled or monotonically transformed. Next, the copula-transformed features are mapped through a set of random non-linear projections (random linear combinations passed through a sinusoidal function), turning each feature into several non-linear views of itself. Finally, canonical correlation analysis is applied to the two sets of projections, and the RDC is the largest canonical correlation, that is, the strongest linear correlation that can be found between linear combinations of the projected features. Because the random non-linear projections expose non-linear structure to an otherwise linear measure, the RDC can capture both linear and non-linear dependencies between features.
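
The three steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the exact implementation used to compute the scores shown here: the function name `rdc`, the number of random projections `k`, the projection scale `s`, and the sine non-linearity are illustrative choices loosely following the original RDC construction (Lopez-Paz et al., 2013).

```python
import numpy as np

def rdc(x, y, k=20, s=1.0 / 6.0, seed=0):
    """Randomised Dependence Coefficient between two 1-D samples (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = x.shape[0]

    # 1. Copula transformation: replace each value by its empirical CDF rank,
    #    standardising the marginals to be roughly uniform on [0, 1].
    cx = np.argsort(np.argsort(x, axis=0), axis=0) / float(n)
    cy = np.argsort(np.argsort(y, axis=0), axis=0) / float(n)

    # Append a constant column so the random projections include a bias term.
    cx = np.hstack([cx, np.ones((n, 1))])
    cy = np.hstack([cy, np.ones((n, 1))])

    # 2. Random non-linear projections: k random linear projections of the
    #    copula-transformed data, passed through a sine non-linearity.
    wx = rng.normal(scale=s, size=(cx.shape[1], k))
    wy = rng.normal(scale=s, size=(cy.shape[1], k))
    fx = np.sin(cx @ wx)
    fy = np.sin(cy @ wy)

    # 3. Canonical correlation analysis: the RDC is the largest canonical
    #    correlation between the two sets of projections.
    C = np.cov(np.hstack([fx, fy]).T)
    Cxx, Cyy = C[:k, :k], C[k:, k:]
    Cxy, Cyx = C[:k, k:], C[k:, :k]
    eps = 1e-9 * np.eye(k)  # small regularisation for numerical stability
    M = np.linalg.solve(Cxx + eps, Cxy) @ np.linalg.solve(Cyy + eps, Cyx)
    eigvals = np.linalg.eigvals(M).real
    return float(np.sqrt(np.clip(eigvals.max(), 0.0, 1.0)))

# Example: a purely non-linear relationship scores close to 1,
# while independent samples score much closer to 0.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
print(rdc(x, x ** 2))                    # strong non-linear dependence
print(rdc(x, rng.normal(size=2000)))     # independent features
```

Because the final step is an ordinary canonical correlation, the non-linearity of the measure comes entirely from the copula transform and the random sinusoidal projections; with enough projections, non-linear relationships such as y = x² show up as strong linear correlations in the projected space.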