The concordance index (c-index) is a metric used to evaluate how well a predictive model ranks outcomes. By definition, it is the proportion of concordant pairs among all comparable pairs, where a pair of subjects is comparable if their survival times can be ordered, and concordant if the predictions rank the two subjects in the same order as their actual survival times. This metric is widely used in biomedical contexts such as cancer prognosis, where it helps assess how well survival time predictions are ordered.
In Python, you can compute it using the concordance_index function from the lifelines package.
Let’s look at a concrete example to understand its meaning. Suppose we have six patients with actual survival times of 1 month, 6 months, 12 months, 2 years, 3 years, and 5 years (that is, 1, 6, 12, 24, 36, and 60 months). If the predictions exactly match the actual values, the c-index is 1.0, indicating that every pair of patients is ranked correctly.
# Import necessary packages
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
# Install lifelines if not already installed
# (the leading "!" works only in Jupyter/IPython; from a shell, run: pip install lifelines)
!pip install lifelines
from lifelines.utils import concordance_index
# Define the data (survival times in months)
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 6, 12, 24, 36, 60]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output:
#         name  survive  predicted
# 0  Zhang San        1          1
# 1      Li Si        6          6
# 2    Wang Wu       12         12
# 3    Zhao Er       24         24
# 4      Ma Zi       36         36
# 5    someone       60         60
# 1.0
In fact, the c-index depends only on the ordering of the predictions, not on their actual values, which makes it a non-parametric, rank-based measure similar to Spearman’s correlation. If we change the predicted values while preserving their order, the c-index remains 1.
Example 1:
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 1.1, 1.2, 2.4, 3.6, 6]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 1.0
Example 2:
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 60, 120, 240, 360, 600]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 1.0
However, if some predictions are out of order, the c-index drops below 1, and the more pairs are ranked incorrectly, the lower it gets.
Example 3:
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 12, 6, 36, 24, 60]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 0.8666666666666667
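To see where the value in Example 3 comes from, the pairs can be counted by hand. The following is a minimal sketch, assuming every event is observed (no censoring) and no two actual survival times are equal. The helper count_pairs is hypothetical, written here for illustration, and is not part of lifelines. Six patients form 15 pairs, and the two swaps in Example 3 make 2 of them discordant, which leaves 13/15.

```python
from itertools import combinations

def count_pairs(survive, predicted):
    """Count concordant, discordant, and prediction-tied pairs.

    Hypothetical helper: assumes every event is observed (no censoring)
    and that no two actual survival times are equal.
    """
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(survive)), 2):
        actual_diff = survive[i] - survive[j]
        predicted_diff = predicted[i] - predicted[j]
        if predicted_diff == 0:
            tied += 1          # the prediction cannot order this pair
        elif actual_diff * predicted_diff > 0:
            concordant += 1    # same order in actual and predicted
        else:
            discordant += 1    # opposite order
    return concordant, discordant, tied

survive = [1, 6, 12, 24, 36, 60]
predicted = [1, 12, 6, 36, 24, 60]   # Example 3

concordant, discordant, tied = count_pairs(survive, predicted)
total = concordant + discordant + tied
# Tied predictions are counted as half-concordant, a common convention
manual_c_index = (concordant + 0.5 * tied) / total

print(concordant, discordant, tied)   # 13 2 0
print(manual_c_index)
```

The two discordant pairs are (Li Si, Wang Wu) and (Zhao Er, Ma Zi), the only pairs whose predicted order is flipped.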
Example 4 (reverse order):
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [60, 36, 24, 12, 6, 1]
})
c_index = concordance_index(df.survive, df.predicted)
print(df)
print(c_index)
# Output: 0.0
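The examples above point to a close link with rank correlation. As a rough sketch, assuming no censoring and no ties, the c-index is a linear rescaling of Kendall's tau: with C concordant and D discordant pairs, tau = (C - D) / (C + D) and c-index = C / (C + D), so c-index = (tau + 1) / 2. The helper kendall_tau below is hypothetical and written in plain Python for illustration only.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau for two sequences without ties (hypothetical helper)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        if (x[i] - x[j]) * (y[i] - y[j]) > 0:
            concordant += 1    # pair ordered the same way in x and y
        else:
            discordant += 1    # pair ordered in opposite ways
    return (concordant - discordant) / (concordant + discordant)

survive = [1, 6, 12, 24, 36, 60]
examples = {
    "same order (Example 2)": [1, 60, 120, 240, 360, 600],
    "two swaps (Example 3)": [1, 12, 6, 36, 24, 60],
    "reversed (Example 4)": [60, 36, 24, 12, 6, 1],
}

c_from_tau = {}
for label, predicted in examples.items():
    tau = kendall_tau(survive, predicted)
    c_from_tau[label] = (tau + 1) / 2   # rescale from [-1, 1] to [0, 1]
    print(label, round(tau, 4), round(c_from_tau[label], 4))
```

Under these assumptions, a c-index of 1 corresponds to tau = 1, a c-index of 0 to tau = -1, and a c-index of 0.5 to tau = 0, that is, no rank association at all.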
Summary
The concordance index (c-index) is a useful metric in survival analysis for evaluating the performance of predictive models. It is sensitive to the ranking order of predictions, but insensitive to the specific numerical values. This makes it especially suitable for assessing models where rank accuracy is more important than exact value prediction.