Final Week Data Analysis Questions

Summer Institute for Training in Biostatistics (SIBS)

Preface

These questions are meant to let you exercise your new skills in data analysis and programming. There is a series of questions for each dataset. You are required to collaborate with the classmates assigned the same dataset and to submit a report describing your findings by 5:00 pm on Friday, 22nd July 2005. Graphics will probably form a major part of your analysis.

We will read your report and answer any questions you raise in it; assessment will be very lenient. Feel free to explore the data beyond the scope of the questions, and include in your report any questions you have about the data, along with your email addresses so that we can contact you after the course.

Please contact Deepayan Sarkar with questions.

Enjoy the rest of your summer. We enjoyed having you here.

Questions

VA Lung Cancer trial

You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with

The Karnofsky score is a measure of a patient's general health, ranging from 0 (dead) to 100 (unimpaired). For further details, see http://virtualtrials.com/karnofsky.cfm.
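The Kaplan-Meier estimator is the usual starting point for questions of this kind. Below is a minimal standard-library Python sketch, assuming each patient contributes a follow-up time and an event indicator (1 = death observed, 0 = censored); the function name kaplan_meier and the toy data are illustrative and not taken from the trial dataset.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient (e.g. days)
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, S(time)) pairs, one per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # patients leaving the risk set at each time
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Censorings tied with deaths are still counted as at risk at t.
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= leaving[t]
    return curve


# Toy data, not from the VA trial.
print(kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1]))
```

Computing one curve per treatment arm and plotting the curves as step functions gives a first visual answer to whether the treatment appears effective; in R, survfit from the survival package does the same job.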

COAST

For the COAST study, consider the following questions. Did cytokine levels (IL-5, IL-10, IL-13, IFN-gamma) change from cord blood to 1 year of age? Did the change in cytokine levels appear to vary according to

DIG

You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Use death as the endpoint, but treat deaths from non-cardiac, non-vascular causes as censored observations. Does the treatment appear to be effective in reducing the time to first hospitalization (for any reason)? Does the effectiveness, such as it may be, appear to vary with prior history of hypertension?
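Treating deaths from other causes as censored changes only the event indicator, not the follow-up time. A minimal sketch, assuming hypothetical per-patient fields for days of follow-up, a death flag and a cause-of-death label; the real DIG variable names and cause codes will differ, and CARDIOVASCULAR_CAUSES and death_endpoint are illustrative names.

```python
# Hypothetical cause-of-death labels; the real DIG coding will differ.
CARDIOVASCULAR_CAUSES = {"cardiac", "vascular"}


def death_endpoint(days_followed, died, cause):
    """Return (time, event) for the death endpoint in the question.

    A death counts as an event only if its cause is cardiac or vascular;
    other deaths are censored at the time of death, and survivors are
    censored at the end of their follow-up.
    """
    event = 1 if died and cause in CARDIOVASCULAR_CAUSES else 0
    return days_followed, event


patients = [(400, True, "cardiac"), (250, True, "cancer"), (900, False, None)]
print([death_endpoint(*p) for p in patients])  # [(400, 1), (250, 0), (900, 0)]
```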

VEST

You can address the following questions with survival analysis methods. Does either the high-dose (60 mg) or the low-dose (30 mg) treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with New York Heart Association class (a scale reflecting the severity of symptoms)?
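A common formal comparison of two survival curves is the log-rank test. The sketch below compares two groups, for instance one dose arm against placebo, assuming each group is supplied as follow-up times plus event indicators (1 = event, 0 = censored). It sums observed minus expected events in the first group over the distinct event times and refers the squared, variance-standardized sum to a chi-square distribution with 1 degree of freedom. The function name logrank_test and the toy data are illustrative.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test.

    Each group is given as follow-up times plus event indicators
    (1 = event observed, 0 = censored). Returns (statistic, p-value).
    """
    data = [(t, e, "a") for t, e in zip(times_a, events_a)]
    data += [(t, e, "b") for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0  # observed minus expected events in group a
    variance = 0.0
    for t in event_times:
        at_risk = [g for s, _, g in data if s >= t]
        n = len(at_risk)
        n_a = at_risk.count("a")
        d = sum(1 for s, e, _ in data if s == t and e == 1)
        d_a = sum(1 for s, e, g in data if s == t and e == 1 and g == "a")
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    statistic = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Toy data: every event in group a happens before any in group b.
stat, p = logrank_test([1, 2], [1, 1], [3, 4], [1, 1])
print(round(stat, 4), round(p, 4))
```

With three arms, one option is to test each dose against placebo separately; to see whether the effect varies with NYHA class, the same comparison can be repeated within each class. In R, survdiff from the survival package performs this test.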

PROMISE

You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with

PRAISE

You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with

PRAISE 2

You can address the following questions with survival analysis methods. Does the treatment appear to be effective? Does the effectiveness, such as it may be, appear to vary with

The PRAISE and PRAISE 2 studies had the same treatment regimens and nearly identical eligibility requirements. PRAISE enrolled patients with both ischemic and non-ischemic etiology of heart failure, while PRAISE 2 enrolled only patients with non-ischemic etiology. Looking only at patients with non-ischemic etiology, compare the survival of the placebo groups in the two trials. Do any of the baseline characteristics available in your datasets differ between these two groups of subjects?

FRAMINGHAM

We are interested in whether survival differs between men and women and between smokers and non-smokers. The event times recorded in the study are measured from the date of the first clinic exam, which has no particular biological meaning; a more natural time origin is each patient's birth. Unfortunately, with that origin the survival analysis techniques we have learnt cannot be applied naively. Those techniques assume that every patient has been followed from the time origin, whereas here we only observe patients who were still alive at their first clinic exam.
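The problem described above is usually called left truncation, or delayed entry: on the age scale, a patient only joins the risk set at their age at the first exam. The hypothetical helper below counts how many patients are at risk at a given age t, once naively and once respecting each patient's entry age, to show how the naive approach inflates the risk set at young ages. The function name and toy ages are illustrative only.

```python
def at_risk_counts(entry_ages, exit_ages, t):
    """Return (naive, left_truncated) numbers at risk at age t.

    entry_ages -- age at the first clinic exam (when observation starts)
    exit_ages  -- age at death or censoring
    """
    # Naive: pretends everyone was followed from birth, so anyone
    # who exits at or after age t is counted as at risk.
    naive = sum(1 for x in exit_ages if x >= t)
    # Delayed entry: a patient is at risk only after entering the study.
    truncated = sum(1 for a, x in zip(entry_ages, exit_ages) if a < t <= x)
    return naive, truncated


# Toy data: three patients first examined at ages 35, 50 and 62.
print(at_risk_counts([35, 50, 62], [70, 80, 75], 40))  # (3, 1)
```

An inflated risk set at young ages makes early deaths look rarer than they are, which is why the naive analysis is biased.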

To answer the questions below, you will first need to extract some information from the dataset. For each of the 4434 patients, obtain their

From this information, derive each patient's age in years at the time of death or censoring, assuming 365.25 days in a year, and round the result to a whole number of years.
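A minimal sketch of the age calculation, assuming (hypothetically) that the extracted information includes age at the first exam in years and the number of days from the first exam to death or censoring; adjust it to whatever fields the dataset actually provides. The name age_at_exit is illustrative.

```python
DAYS_PER_YEAR = 365.25


def age_at_exit(age_at_exam_years, days_to_exit):
    """Age in whole years at death or censoring.

    Hypothetical inputs: age at the first clinic exam (years) and the
    number of days from that exam to death or censoring.
    """
    # Note: Python's round() sends exact halves to the nearest even integer.
    return round(age_at_exam_years + days_to_exit / DAYS_PER_YEAR)


print(age_at_exit(45, 8766))  # 69
print(age_at_exit(50, 4000))  # 61
```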

Look only at the subset of 1550 patients who died during follow-up. Construct a two-way frequency table of age at death against sex. Plot this table as a bar chart and conduct a chi-square test of whether sex is independent of age at death (you can pool adjacent age groups if you feel that the assumptions of the chi-square test are violated, for example if many expected cell counts are below 5). Repeat the analysis with smoking status in place of sex.
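The sketch below builds the two-way table from (row, column) pairs, one per patient, computes Pearson's chi-square statistic with (rows - 1)(columns - 1) degrees of freedom, and also returns the smallest expected count so you can judge whether pooling is needed. To stay within the Python standard library, the p-value uses the closed-form chi-square tail probabilities for integer degrees of freedom; in R, table() and chisq.test() do all of this directly. The function names and toy counts are illustrative.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square(df) variable."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over r < df/2 of (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*Q(chi) + 2*phi(chi) * sum of chi^(2r-1) / (1*3*...*(2r-1)),
    # with chi = sqrt(x), Q the normal upper tail and phi the normal density.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    tail = math.erfc(chi / math.sqrt(2))
    return tail + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * total


def chi_square_independence(pairs):
    """Pearson chi-square test of independence for a two-way table.

    pairs -- iterable of (row_label, column_label), one per subject
    Returns (table, statistic, df, p-value, smallest expected count).
    """
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    n = sum(counts.values())
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat, min_expected = 0.0, float("inf")
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            min_expected = min(min_expected, expected)
            stat += (counts[(r, c)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return counts, stat, df, chi2_sf(stat, df), min_expected


# Toy counts of (sex, age-at-death group); not Framingham data.
pairs = ([("M", "<65")] * 10 + [("M", "65+")] * 20
         + [("F", "<65")] * 20 + [("F", "65+")] * 10)
table, stat, df, p, min_exp = chi_square_independence(pairs)
print(round(stat, 3), df, min_exp)  # 6.667 1 15.0
```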

In a study like this, one interesting question is whether survival patterns change over time (due to improvements in medical care, etc.). Artificially group the patients who died according to their age at the first exam (say 34-48, 49-55, 56-61 and 62-70); the cut function may be helpful here. Repeat the plots and tests above for each subgroup and comment on the results.
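In R, the grouping could be done with cut(age, breaks = c(33, 48, 55, 61, 70)), whose intervals are closed on the right by default. A minimal Python equivalent for the suggested groups is sketched below; age_group, UPPER and LABELS are illustrative names, and ages outside 34-70 are returned as None.

```python
import bisect

# Upper bounds and labels of the suggested groups 34-48, 49-55, 56-61, 62-70.
UPPER = [48, 55, 61, 70]
LABELS = ["34-48", "49-55", "56-61", "62-70"]


def age_group(age_at_exam):
    """Group label for an age at first exam, using right-closed intervals."""
    if age_at_exam < 34 or age_at_exam > 70:
        return None  # outside the suggested range
    # bisect_left finds the first upper bound >= age, i.e. its interval.
    return LABELS[bisect.bisect_left(UPPER, age_at_exam)]


print([age_group(a) for a in (34, 48, 49, 61, 62, 70)])
```

Once each patient has a group label, the table, bar chart and chi-square test can be repeated within each group.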

Spellman

The questions for this dataset are available as a separate PDF file.

Last modified: Tue Jul 19 10:24:08 CDT 2005