Quick Links:
The development of methods for predicting mortality rates in Intensive Care Unit (ICU) populations has been motivated primarily by the need to compare the efficacy of medications, care guidelines, surgery, and other interventions when, as is common, it is necessary to control for differences in severity of illness or trauma, age, and other factors. For example, a comparison of overall mortality rates among trauma units in a community hospital, a teaching hospital, and a military field hospital is likely to reflect differences in the patient populations more than any differences in standards of care. Acuity scores such as APACHE and SAPS-II are widely used to account for these differences in such studies.
By contrast, the focus of the PhysioNet/CinC Challenge 2012 is to develop methods for patient-specific prediction of in-hospital mortality. Participants will use information collected during the first two days of an ICU stay to predict which patients survive their hospitalizations, and which patients do not.
See the Quick Links at the top of this page to download the Challenge data!
The data used for the challenge consist of records from 12,000 ICU stays. All patients were adults who were admitted for a wide variety of reasons to cardiac, medical, surgical, and trauma ICUs. ICU stays of less than 48 hours have been excluded.
Four thousand records comprise training set A, and the remaining records form test sets B and C. Outcomes are provided for the training set records, and are withheld for the test set records.
Up to 41 variables were recorded at least once during the first 48 hours after admission to the ICU. Not all variables are available in all cases, however. Five of these variables are general descriptors (collected on admission), and the remainder are time series, for which multiple observations may be available.
Each observation has an associated time stamp giving the time elapsed between ICU admission and the observation, in hours and minutes. Thus, for example, a time stamp of 35:19 means that the associated observation was made 35 hours and 19 minutes after the patient was admitted to the ICU.
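The time-stamp convention above can be sketched in a few lines of Python. This is only an illustration: the function name `parse_elapsed` is hypothetical, and the sketch assumes that every time stamp is a string of the form hours:minutes, as in the 35:19 example.

```python
def parse_elapsed(stamp: str) -> int:
    """Convert an 'HH:MM' elapsed-time stamp to minutes since ICU admission."""
    hours_text, minutes_text = stamp.strip().split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    # Assumption: hours are non-negative and minutes lie in 0..59.
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError(f"unexpected time stamp: {stamp!r}")
    return hours * 60 + minutes


# 35 hours and 19 minutes after admission
elapsed = parse_elapsed("35:19")
```

Working in whole minutes keeps the values as integers, which makes it simple to sort observations or to compute the gaps between them.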
Each record is stored as a comma-separated value (CSV) text file. To simplify downloading, participants may download a zip file or tarball containing all of training set A or test set B. Test set C will be used for validation only and will not be made available to participants.
Five additional outcome-related descriptors, described below, are known for each record. These are stored in separate CSV text files for each of sets A, B, and C, but only those for set A are available to challenge participants.
All valid values for general descriptors, time series variables, and outcome-related descriptors are non-negative (≥ 0). A value of -1 indicates missing or unknown data (for example, if a patient's height was not recorded).
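One way to handle the -1 convention is to convert it to an explicit "no value" marker before computing any statistics, so that it is never averaged in as if it were a measurement. The sketch below is a minimal illustration; the names `clean_value` and `mean_observed` are hypothetical, and it assumes the values arrive as text fields read from a CSV file.

```python
from typing import Iterable, Optional

MISSING = -1.0  # sentinel for missing or unknown data, as described above


def clean_value(raw: str) -> Optional[float]:
    """Return the value as a float, or None if it is the missing-data sentinel."""
    value = float(raw)
    if value == MISSING:
        return None
    return value


def mean_observed(raw_values: Iterable[str]) -> Optional[float]:
    """Average only the values that were actually recorded."""
    observed = [v for v in map(clean_value, raw_values) if v is not None]
    return sum(observed) / len(observed) if observed else None
```

For example, `mean_observed(["-1", "70", "80"])` averages only the two recorded values, and returns `None` when nothing was recorded.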
As noted, these five descriptors are collected at the time the patient is admitted to the ICU. Their associated time-stamps are set to 00:00 (thus they appear at the beginning of each patient's record).
These 37 variables may be observed once, more than once, or not at all in some cases:
The time series measurements are recorded in chronological order within each record, and the associated time stamps indicate the elapsed time since admission to the ICU. Measurements may be recorded at regular intervals ranging from hourly to daily, or at irregular intervals as required. Not all time series are available in all cases.
In a few cases, such as blood pressure, different measurements made using two or more methods or sensors may be recorded with the same or only slightly different time stamps. Occasional outliers should be expected as well.
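Because a variable may be observed many times within a record, it can be convenient to group the observations by variable while preserving their chronological order. The sketch below assumes, purely for illustration, that each record file has a header row with the columns `Time`, `Parameter`, and `Value`; the column names, the function name `load_record`, the variable names `Age` and `HR`, and the sample values are all hypothetical and should be checked against the actual files.

```python
import csv
import io
from collections import defaultdict


def load_record(text: str) -> dict:
    """Group a record's observations by variable name.

    Assumes (hypothetically) a header row of Time,Parameter,Value.
    Returns {variable: [(minutes_since_admission, value), ...]}.
    """
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        hours, minutes = row["Time"].split(":")
        elapsed = int(hours) * 60 + int(minutes)
        series[row["Parameter"]].append((elapsed, float(row["Value"])))
    return dict(series)


# Made-up sample, not taken from the Challenge data
sample = """Time,Parameter,Value
00:00,Age,54
00:07,HR,73
01:07,HR,77
"""
record = load_record(sample)
```

Since the measurements are stored in chronological order, each list comes out already sorted by time. Two entries with the same time stamp, as can happen with blood pressure, are simply kept side by side.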
The outcome-related descriptors are kept in a separate CSV text file for each of the three record sets; as noted, only the file associated with training set A is available to participants. Each line of the outcomes file contains these descriptors:
The Length of stay is the number of days between the patient's admission to the ICU and the end of hospitalization (including any time spent in the hospital after discharge from the ICU). If the patient's death was recorded (in or out of hospital), then Survival is the number of days between ICU admission and death; otherwise, Survival is assigned the value -1. Since patients who spent less than 48 hours in the ICU have been excluded, Length of stay and Survival never have the values 0 or 1 in the challenge data sets. Given these definitions and constraints,
Survival > Length of stay ⇒ Survivor
Survival = -1 ⇒ Survivor
2 ≤ Survival ≤ Length of stay ⇒ In-hospital death
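The three rules above translate directly into a small function. This is a sketch only: the name `in_hospital_death` is hypothetical, and raising an error for a Survival value of 0 or 1 is an assumption based on the statement that those values never occur in the challenge data sets.

```python
def in_hospital_death(survival: int, length_of_stay: int) -> bool:
    """Apply the outcome rules: True for in-hospital death, False for survivor."""
    if survival == -1:
        # No death was recorded.
        return False
    if survival > length_of_stay:
        # Death was recorded, but after the end of hospitalization.
        return False
    if 2 <= survival <= length_of_stay:
        # Death occurred during the hospitalization.
        return True
    # Assumption: any other combination is treated as a data error.
    raise ValueError(
        f"unexpected values: survival={survival}, length_of_stay={length_of_stay}"
    )
```

For example, a patient with Survival = 5 and Length of stay = 10 is an in-hospital death, while Survival = 20 with the same Length of stay is a survivor.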
The rules for participating in the Challenge will be posted here shortly.
To begin, we recommend studying the training set as preparation for the Challenge itself. In particular, note that the SAPS-I score can be calculated readily from the time series, as the sample entry does. To succeed in the Challenge, you should aim to outperform the sample entry.
Awards will be presented to the most successful eligible participants during Computing in Cardiology 2012. To be eligible for an award, you must:
As in previous challenges, participants may compete in multiple events:
Scoring for these events is based on four metrics: sensitivity (Se), specificity (Sp), positive predictivity (+P), and negative predictivity (-P). We define the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) as below:
| | | Observed outcome | |
| | | Death | Survivor |
| Predicted outcome | Death | TP | FP |
| | Survivor | FN | TN |
Using these definitions, the four metrics are given by:
| Se = TP / (TP + FN) | [the fraction of in-hospital deaths that are predicted] |
| +P = TP / (TP + FP) | [the fraction of correct predictions of in-hospital deaths] |
| Sp = TN / (TN + FP) | [the fraction of survivors that are predicted] |
| -P = TN / (TN + FN) | [the fraction of correct predictions of survivors] |
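As an illustration of these definitions, the sketch below counts TP, FP, FN, and TN from paired predictions and observed outcomes and then evaluates the four metrics. It is not the official scoring code. The function name `challenge_metrics`, the encoding of death as 1 and survival as 0, and the choice to return `None` when a denominator is zero are all assumptions made for this example.

```python
def challenge_metrics(predicted, observed):
    """Compute Se, +P, Sp, and -P from binary labels (1 = death, 0 = survivor)."""
    tp = fp = fn = tn = 0
    for p, o in zip(predicted, observed):
        if p == 1 and o == 1:
            tp += 1  # predicted death, observed death
        elif p == 1 and o == 0:
            fp += 1  # predicted death, observed survivor
        elif p == 0 and o == 1:
            fn += 1  # predicted survivor, observed death
        else:
            tn += 1  # predicted survivor, observed survivor

    def ratio(numerator, denominator):
        # Assumption: an undefined metric is reported as None.
        return numerator / denominator if denominator else None

    return {
        "Se": ratio(tp, tp + fn),
        "+P": ratio(tp, tp + fp),
        "Sp": ratio(tn, tn + fp),
        "-P": ratio(tn, tn + fn),
    }


# Made-up labels for five patients
metrics = challenge_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In this made-up example there are two true positives, one false positive, one false negative, and one true negative, so Se and +P are both 2/3 while Sp and -P are both 1/2.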