Comment from the Stata technical group
An Introduction to Survival Analysis Using Stata, Revised Edition, is an ideal tutorial
for professional data analysts who want to learn survival analysis or want to learn to use Stata to analyze survival data. This text also
serves as a valuable reference for those who already have experience using
Stata's survival-analysis routines.
Because survival analysis requires specialized data management
and analysis procedures, Stata provides the st family of
commands for organizing and summarizing survival data. The authors of this
text developed Stata's st commands,
as well as Stata's NetCourse 631, An Introduction to Survival
Analysis. This text is an outgrowth of the lecture notes for that course,
and those who have taken the course will find in this text the companion
text that many students have asked for.
This book includes statistical theory, step-by-step procedures for
analyzing survival data, a detailed usage guide for Stata's most widely used
st commands, and pointers for using Stata to
analyze survival data and present the results. This book develops
from first principles the statistical concepts unique to survival data and
assumes only a knowledge of basic probability and statistics and a working
knowledge of Stata.
The first three chapters cover basic theoretical concepts:
hazard and cumulative hazard functions and their interpretations;
survivor functions; hazard models; and a comparison of nonparametric,
semiparametric, and parametric methodologies. Chapter 4 deals with censoring
and truncation.
The next three chapters cover the formatting, manipulation,
stsetting, and error-checking involved in preparing survival data
for analysis using Stata's st analysis commands. Chapter 8 covers
nonparametric methods, including the Kaplan–Meier and Nelson–Aalen
estimators, and the various nonparametric tests for the equality of survival
experience.
Chapters 9, 10, and 11 are devoted to Cox regression and include various
examples of fitting a Cox model, obtaining predictions, interpreting results,
building models, and model diagnostics.
The final four chapters cover
parametric models, which are fitted using Stata's streg command. Included
in these chapters are detailed derivations of all six parametric models
currently supported in Stata; methods for determining which model is
appropriate, for obtaining predictions, and for stratification; and advanced topics,
such as frailty models.
Table of contents
Preface to the revised edition
Preface
Notation and Typography
1 The problem of survival analysis
- 1.1 Parametric modeling
- 1.2 Semiparametric modeling
- 1.3 Nonparametric analysis
- 1.4 Linking the three approaches
2 Describing the distribution of failure times
- 2.1 The survivor and hazard functions
- 2.2 The quantile function
- 2.3 Interpreting the hazard and cumulative hazard
- 2.3.1 Interpreting the cumulative hazard
- 2.3.2 Interpreting the hazard rate
- 2.4 Means and medians
3 Hazard models
- 3.1 Parametric models
- 3.2 Semiparametric models
- 3.3 Analysis time (time at risk)
4 Censoring and truncation
- 4.1 Censoring
- 4.1.1 Right censoring
- 4.1.2 Interval censoring
- 4.1.3 Left censoring
- 4.2 Truncation
- 4.2.1 Left truncation (delayed entry)
- 4.2.2 Interval truncation (gaps)
- 4.2.3 Right truncation
5 Recording survival data
- 5.1 The desired format
- 5.2 Other formats
- 5.3 Example
6 Using stset
- 6.1 A short lesson on dates
- 6.2 The purpose of the stset command
- 6.3 The syntax of the stset command
- 6.3.1 Specifying analysis time
- 6.3.2 Variables defined by stset
- 6.3.3 Specifying what constitutes failure
- 6.3.4 Specifying when subjects exit from the analysis
- 6.3.5 Specifying when subjects enter the analysis
- 6.3.6 Specifying the subject-id variable
- 6.3.7 Specifying the begin-of-span variable
- 6.3.8 Convenience options
7 After stset
- 7.1 Look at stset's output
- 7.2 List some of your data
- 7.3 Use stdes
- 7.4 Use stvary
- 7.5 Perhaps use stfill
- 7.6 Example: Hip fracture data
8 Nonparametric analysis
- 8.1 Inadequacies of standard univariate methods
- 8.2 The Kaplan–Meier estimator
- 8.2.1 Calculation
- 8.2.2 Censoring
- 8.2.3 Left truncation (delayed entry)
- 8.2.4 Interval truncation (gaps)
- 8.2.5 Relationship to the empirical distribution function
- 8.2.6 Other uses of sts list
- 8.2.7 Graphing the Kaplan–Meier estimate
- 8.3 The Nelson–Aalen estimator
- 8.4 Estimating the hazard function
- 8.5 Tests of hypothesis
- 8.5.1 The log-rank test
- 8.5.2 The Wilcoxon test
- 8.5.3 Other tests
- 8.5.4 Stratified tests
9 The Cox proportional hazards model
- 9.1 Using stcox
- 9.1.1 The Cox model has no intercept
- 9.1.2 Interpreting coefficients
- 9.1.3 The effect of units on coefficients
- 9.1.4 Estimating the baseline cumulative hazard and survivor functions
- 9.1.5 Estimating the baseline hazard function
- 9.1.6 The effect of units on the baseline functions
- 9.2 Likelihood calculations
- 9.2.1 No tied failures
- 9.2.2 Tied failures
- The marginal calculation
- The partial calculation
- The Breslow approximation
- The Efron approximation
- 9.2.3 Summary
- 9.3 Stratified analysis
- 9.3.1 Obtaining coefficient estimates
- 9.3.2 Obtaining estimates of baseline functions
- 9.4 Cox models with shared frailty
- 9.4.1 Parameter estimation
- 9.4.2 Obtaining estimates of baseline functions
10 Model building using stcox
- 10.1 Indicator variables
- 10.2 Categorical variables
- 10.3 Continuous variables
- 10.4 Interactions
- 10.5 Time-varying variables
- 10.5.1 Using stcox, tvc() texp()
- 10.5.2 Using stsplit
11 The Cox model: Diagnostics
- 11.1 Testing the proportional hazards assumption
- 11.1.1 Tests based on re-estimation
- 11.1.2 Test based on Schoenfeld residuals
- 11.1.3 Graphical methods
- 11.2 Residuals
- Reye's syndrome data
- 11.2.1 Determining functional form
- 11.2.2 Goodness of fit
- 11.2.3 Outliers and influential points
12 Parametric models
- 12.1 Motivation
- 12.2 Classes of parametric models
- 12.2.1 Parametric proportional hazards models
- 12.2.2 Accelerated failure-time models
- 12.2.3 Comparing the two parameterizations
13 A survey of parametric regression models in Stata
- 13.1 The exponential model
- 13.1.1 Exponential regression in the PH metric
- 13.1.2 Exponential regression in the AFT metric
- 13.2 Weibull regression
- 13.2.1 Weibull regression in the PH metric
- Fitting null models
- 13.2.2 Weibull regression in the AFT metric
- 13.3 Gompertz regression (PH metric)
- 13.4 Log-normal regression (AFT metric)
- 13.5 Log-logistic regression (AFT metric)
- 13.6 Generalized gamma regression (AFT metric)
- 13.7 Choosing among parametric models
- 13.7.1 Nested models
- 13.7.2 Non-nested models
14 Post-estimation commands for parametric models
- 14.1 Use of predict after streg
- 14.1.1 Predicting the time of failure
- 14.1.2 Predicting the hazard and related functions
- 14.1.3 Calculating residuals
- 14.2 Using stcurve
15 Generalizing the parametric regression model
- 15.1 Using the ancillary() option
- 15.2 Stratified models
- 15.3 Frailty models
- 15.3.1 Unshared frailty models
- 15.3.2 Kidney data
- 15.3.3 Testing for heterogeneity
- 15.3.4 Shared frailty models
References
Author index
Subject index