December 23, 2021

The c-index and Its Application in Survival Analysis

The concordance index (c-index) is a metric for evaluating how well a predictive model ranks outcomes. By definition, it is the proportion of concordant pairs among all comparable pairs, where a pair of subjects is concordant if the one predicted to survive longer actually does. It is widely used in biomedical settings such as cancer prognosis, where it helps assess how well predicted survival times agree with the observed ones.
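To make the definition concrete, here is a minimal pair-counting sketch. The helper name `pairwise_c_index` is hypothetical, and the sketch assumes every event is observed (no censoring), skips pairs with equal actual times, and counts tied predictions as half concordant. It is meant to illustrate the definition, not to reproduce any library's exact behavior.

```python
from itertools import combinations


def pairwise_c_index(actual, predicted):
    """Fraction of comparable pairs whose predicted order matches the actual order.

    Assumptions (for illustration only): no censoring, pairs with equal
    actual times are not comparable, tied predictions count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for (a_i, p_i), (a_j, p_j) in combinations(zip(actual, predicted), 2):
        if a_i == a_j:
            continue  # equal survival times: no defined order, skip the pair
        comparable += 1
        if p_i == p_j:
            concordant += 0.5  # tied prediction: half credit
        elif (a_i < a_j) == (p_i < p_j):
            concordant += 1.0  # predicted order agrees with actual order
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


survive = [1, 6, 12, 24, 36, 60]
print(pairwise_c_index(survive, survive))                 # 1.0
print(pairwise_c_index(survive, [60, 36, 24, 12, 6, 1]))  # 0.0
```

With six patients there are 15 pairs, so each misordered pair lowers the score by 1/15.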

In Python, you can compute it using the concordance_index function from the lifelines package.

Let’s look at a concrete example to understand its meaning. Suppose we have six patients with actual survival times of 1 month, 6 months, 12 months, 2 years, 3 years, and 5 years. If the predictions exactly match the actual values, the c-index is 1.0, indicating perfect prediction.

# Import necessary packages
import pandas as pd

# Install lifelines first if it is not already installed.
# In a shell:             pip install lifelines
# In a Jupyter notebook:  !pip install lifelines
from lifelines.utils import concordance_index

# Define the data
df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 6, 12, 24, 36, 60]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output:
#         name  survive  predicted
# 0  Zhang San        1          1
# 1      Li Si        6          6
# 2    Wang Wu       12         12
# 3    Zhao Er       24         24
# 4      Ma Zi       36         36
# 5    someone       60         60
# 1.0

In fact, the c-index depends only on the ordering of the predictions, not on their actual values, which makes it a rank-based (non-parametric) measure similar to Spearman’s rank correlation. If we change the predicted values while preserving their order, the c-index remains 1.

Example 1:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 1.1, 1.2, 2.4, 3.6, 6]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# 1.0

Example 2:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 60, 120, 240, 360, 600]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# 1.0

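However, if the order is incorrect, the c-index drops. In Example 3, two adjacent pairs of predictions are swapped, so 2 of the 15 patient pairs are discordant and the c-index falls to 13/15 ≈ 0.867.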

Example 3:

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [1, 12, 6, 36, 24, 60]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 0.8666666666666667

Example 4 (reversed order):

df = pd.DataFrame({
    "name": ["Zhang San", "Li Si", "Wang Wu", "Zhao Er", "Ma Zi", "someone"],
    "survive": [1, 6, 12, 24, 36, 60],
    "predicted": [60, 36, 24, 12, 6, 1]
})
c_index = concordance_index(df.survive, df.predicted)

print(df)
print(c_index)

# Output: 0.0

Summary

The concordance index (c-index) is a useful metric in survival analysis for evaluating the performance of predictive models. It is sensitive to the ranking order of predictions, but insensitive to the specific numerical values. This makes it especially suitable for assessing models where rank accuracy is more important than exact value prediction.

December 22, 2021

Python Native Lists vs. NumPy Arrays

TECHNOLOGY

In Python, you can choose from various native data types to store collection data, including list, array, tuple, and dictionary. Among these, the list is highly flexible, can store any content, and is mutable, making it widely applicable. However, for scientific computing and storing purely numerical data, NumPy is widely used and has practically replaced lists. So, what are the differences between them, how significant are these differences, and how should they be applied in practice?

December 28, 2020

Hands-on Implementation of Random Forest Algorithm with Python

TECHNOLOGY

This article walks through a hands-on implementation of a random forest, a powerful machine learning model. It is meant to complement my conceptual explanation of random forests, but you can follow it as long as you have a basic understanding of decision trees and random forests. Later, we will discuss how to improve the model built here.

July 29, 2020

Python 3 Solution to a LeetCode Medium Problem

TECHNOLOGY

This article analyzes a problem from the coding practice site LeetCode.

May 31, 2020

Calculating the Gini Coefficient and Plotting the Lorenz Curve with matplotlib

TECHNOLOGY

The Gini coefficient and Lorenz curve are widely used to represent data inequality, especially wealth inequality. However, currently in Python, there isn't a very good function to directly plot the Lorenz curve. Since the current project requires it, this article records how to use numpy, pandas, matplotlib, and other packages to calculate the Gini coefficient and plot the Lorenz curve for practical use.
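As a rough illustration of the calculation described above, the following sketch computes the Gini coefficient and the Lorenz curve points in plain Python. The function names `gini` and `lorenz_points` are hypothetical, the input is assumed to be non-negative, and this is not necessarily the approach taken in the full article.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where x_i are sorted ascending and i runs from 1 to n.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def lorenz_points(values):
    """(cumulative population share, cumulative value share), starting at (0, 0)."""
    xs = sorted(values)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points


print(gini([1, 1, 1, 1]))  # 0.0
print(gini([0, 0, 0, 1]))  # 0.75
```

The points returned by `lorenz_points` can be unpacked into x and y sequences and passed to matplotlib's `plot` to draw the curve against the diagonal line of equality.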

May 3, 2020

Using folium to Draw a COVID-19 Pandemic Map

TECHNOLOGY

After being contained in China, the COVID-19 pandemic became increasingly severe worldwide. Countries and regions publish daily new infection and death data to help fight the pandemic globally.

January 8, 2020

Application of Python Implementation of Gradient Descent in Practice

TECHNOLOGY

Gradient descent is a first-order optimization algorithm, commonly called the steepest descent method. To find a local minimum of a function using gradient descent, one must iteratively move from the current point in the opposite direction of the gradient (or approximate gradient) by a specified step size.
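The update rule described above can be sketched in a few lines. The function name `gradient_descent`, the fixed step size, and the one-dimensional test function are all assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, step_size=0.1, n_iters=100):
    """Repeatedly step from x in the direction opposite to the gradient."""
    x = x0
    for _ in range(n_iters):
        x = x - step_size * grad(x)  # move against the gradient
    return x


# Example: f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 6))  # 3.0
```

With this step size the distance to the minimum shrinks by a factor of 0.8 per iteration; a step size that is too large would make the iterates diverge instead.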

December 27, 2019

Statistical Skew Distributions Reveal Statistical Traps in Life

TECHNOLOGY

90% of drivers believe their driving skills are above average, and 90% of people think their IQ is above the population average. The surprising part is that such claims can actually be consistent with real data when the underlying distribution is skewed; they are not necessarily fabricated.
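A toy example shows how this can happen. The scores below are made up purely for illustration: nine values of 100 and a single value of 0 give a strongly left-skewed distribution.

```python
from statistics import mean, median

# Hypothetical scores: one very low value drags the mean below most of the data.
scores = [100] * 9 + [0]

avg = mean(scores)  # 900 / 10 = 90
share_above = sum(s > avg for s in scores) / len(scores)

print("mean:", avg)
print("median:", median(scores))
print("share above the mean:", share_above)
```

Nine of the ten scores exceed the mean of 90, so "90% are above average" is literally true here, while only half of the values can ever lie above the median.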

May 17, 2019

Decoding Real Addresses from Xunlei Thunder Download Links

MISCELLANEOUS

Students who frequently download videos and games often encounter Xunlei download links starting with 'thunder://', but are often unable to download due to copyright issues. Here, we will explain the conversion between regular download URLs and Xunlei download links.
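The conversion can be sketched as follows, assuming the commonly described convention that a `thunder://` link is the Base64 encoding of the original URL wrapped as `AA` + URL + `ZZ`. The function names and the example URL are hypothetical, and UTF-8 is assumed for the URL bytes (some real links may use another encoding).

```python
import base64

PREFIX = "thunder://"


def encode_thunder(url):
    """Wrap the URL as AA<url>ZZ, Base64-encode it, and add the thunder:// prefix."""
    wrapped = ("AA" + url + "ZZ").encode("utf-8")  # assumption: UTF-8 bytes
    return PREFIX + base64.b64encode(wrapped).decode("ascii")


def decode_thunder(link):
    """Recover the original URL from a thunder:// link."""
    if not link.startswith(PREFIX):
        raise ValueError("not a thunder:// link")
    wrapped = base64.b64decode(link[len(PREFIX):]).decode("utf-8")
    if not (wrapped.startswith("AA") and wrapped.endswith("ZZ")):
        raise ValueError("payload is not wrapped in AA...ZZ")
    return wrapped[2:-2]  # strip the AA prefix and ZZ suffix


url = "http://example.com/file.zip"  # placeholder URL for illustration
link = encode_thunder(url)
print(link)
print(decode_thunder(link) == url)  # True
```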

March 28, 2019

Longest Palindromic Substring Algorithm - Manacher

TECHNOLOGY

While solving LeetCode problems, I encountered a question about finding the longest palindromic substring.
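For reference, here is a compact sketch of Manacher's algorithm, the method named in the title. The function name `longest_palindrome` and the use of `#` as a separator are illustrative choices, and the full article may structure its solution differently.

```python
def longest_palindrome(s):
    """Return a longest palindromic substring of s in linear time (Manacher)."""
    if not s:
        return ""
    # Interleave separators so odd- and even-length palindromes look the same.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n    # radius[i]: palindrome radius centered at t[i]
    center = right = 0  # center and right edge of the rightmost palindrome so far
    for i in range(n):
        if i < right:
            # Reuse the mirrored position's radius, clipped to the known boundary.
            radius[i] = min(right - i, radius[2 * center - i])
        # Try to expand beyond what is already known.
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    best = max(range(n), key=radius.__getitem__)
    start = (best - radius[best]) // 2  # map back to an index in s
    return s[start:start + radius[best]]


print(longest_palindrome("babad"))  # bab
print(longest_palindrome("cbbd"))   # bb
```

The radius of a palindrome in the transformed string equals the length of the corresponding palindrome in the original string, which is why the final slice uses `radius[best]` directly.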

March 8, 2019

Finding Common Values in Two Python Lists

TECHNOLOGY

In everyday programming, we often need to find the values that two lists have in common. This article presents several simple, practical ways to do this elegantly in Python.
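Two common approaches are sketched below: a set intersection when order does not matter, and a single pass over the first list when it does. The helper name `common_values` is hypothetical, and both approaches assume the elements are hashable.

```python
def common_values(a, b):
    """Values present in both lists, in order of first appearance in a."""
    b_set = set(b)  # set membership tests are O(1) on average
    seen = set()
    result = []
    for x in a:
        if x in b_set and x not in seen:
            seen.add(x)
            result.append(x)
    return result


first = [1, 2, 2, 3, 4]
second = [4, 2, 5]

# Order-insensitive: set intersection (sorted here only for stable display).
print(sorted(set(first) & set(second)))  # [2, 4]

# Order-preserving and de-duplicated.
print(common_values(first, second))      # [2, 4]
```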

January 20, 2019

Detailed Examples of Seaborn Plotting Kernel Density Curves

TECHNOLOGY

January 13, 2019

Python Implementation for Kugou Music MP3 Download

TECHNOLOGY
