Comment from the Stata technical group
Regression Models for Categorical Dependent Variables Using Stata, 2nd
Edition, by J. Scott Long and Jeremy Freese, shows how to fit and
interpret regression models for categorical data with Stata. Nearly 50% longer
than the previous edition, the book covers new topics for fitting and
interpreting models included in Stata 9, such as multinomial probit models,
the stereotype logistic model, and zero-truncated count models. Many of the
interpretation techniques have been updated to include interval as well as
point estimates.
Although regression models for categorical dependent variables are common, few
texts explain how to interpret such models. Regression Models for Categorical
Dependent Variables Using Stata, 2nd Edition, fills this void, showing how to
fit and interpret regression models for categorical data with Stata. The
authors also provide a suite of commands for hypothesis testing and model
diagnostics to accompany the book.
The book begins with an excellent introduction to Stata and then provides a
general treatment of estimation, testing, fit, and interpretation in this
class of models. Binary, ordinal, nominal, and count outcomes are covered in
detail in separate chapters. The final chapter discusses how to fit and
interpret models with special characteristics, such as ordinal and nominal
independent variables, interactions, and nonlinear terms. One appendix
discusses the syntax of the author-written commands, and a second gives
details of the datasets used by the authors in the book.
This book is filled with concrete examples. Because all the examples,
datasets, and author-written commands are available from the authors’ web site,
readers can easily replicate the examples using Stata. This book is
ideal for students or applied researchers who want to know how to fit and
interpret models for categorical data.
Table of contents
Preface (pdf)
Part I General Information
1 Introduction
- 1.1 What is this book about?
- 1.2 Which models are considered?
- 1.3 Whom is this book for?
- 1.4 How is the book organized?
- 1.5 What software do you need?
- 1.5.1 Updating Stata 9
- 1.5.2 Installing SPost
- Installing SPost using search
- Installing SPost using net install
- 1.5.3 What if commands do not work?
- 1.5.4 Uninstalling SPost
- 1.5.5 Using spex to load data and run examples
- 1.5.6 More files available on the web site
- 1.6 Where can I learn more about the models?
2 Introduction to Stata
- 2.1 The Stata interface
- Changing the scrollback buffer size
- Changing the display of variable names in the Variables window
- 2.2 Abbreviations
- 2.3 How to get help
- 2.3.1 Online help
- 2.3.2 Manuals
- 2.3.3 Other resources
- 2.4 The working directory
- 2.5 Stata file types
- 2.6 Saving output to log files
- Options
- 2.6.1 Closing a log file
- 2.6.2 Viewing a log file
- 2.6.3 Converting from SMCL to plain text or PostScript
- 2.7 Using and saving datasets
- 2.7.1 Data in Stata format
- 2.7.2 Data in other formats
- 2.7.3 Entering data by hand
- 2.8 Size limitations on datasets*
- 2.9 Do-files
- 2.9.1 Adding comments
- 2.9.2 Long lines
- 2.9.3 Stopping a do-file while it is running
- 2.9.4 Creating do-files
- Using Stata's Do-file Editor
- Using other editors to create do-files
- 2.9.5 Recommended structure for do-files
- 2.10 Using Stata for serious data analysis
- 2.11 Syntax of Stata commands
- 2.11.1 Commands
- 2.11.2 Variable lists
- 2.11.3 if and in qualifiers
- Examples of if qualifier
- 2.11.4 Options
- 2.12 Managing data
- 2.12.1 Looking at your data
- 2.12.2 Getting information about variables
- 2.12.3 Missing values
- 2.12.4 Selecting observations
- 2.12.5 Selecting variables
- 2.13 Creating new variables
- 2.13.1 generate command
- 2.13.2 replace command
- 2.13.3 recode command
- 2.13.4 Common transformations for RHS variables
- Breaking a categorical variable into a set of binary variables
- More examples of creating binary variables
- Nonlinear transformations
- Interaction terms
- 2.14 Labeling variables and values
- 2.14.1 Variable labels
- 2.14.2 Value labels
- 2.14.3 notes command
- 2.15 Global and local macros
- 2.16 Graphics
- 2.16.1 graph command
- 2.16.2 Displaying previously drawn graphs
- 2.16.3 Printing graphs
- 2.16.4 Combining graphs
- 2.17 A brief tutorial
- A batch version
3 Estimation, testing, fit, and interpretation
- 3.1 Estimation
- 3.1.1 Stata's output for ML estimation
- 3.1.2 ML and sample size
- 3.1.3 Problems in obtaining ML estimates
- 3.1.4 Syntax of estimation commands
- Variable lists
- Specifying the estimation sample
- Weights
- Options
- 3.1.5 Reading the output
- Header
- Estimates and standard errors
- Confidence intervals
- 3.1.6 Storing estimation results
- 3.1.7 Reformatting output with estimates table
- 3.1.8 Reformatting output with estout
- 3.1.9 Alternative output with listcoef
- Options for types of coefficients
- Options for mlogit, mprobit, and slogit
- Other options
- Standardized coefficients
- Factor and percent change
- 3.2 Postestimation analysis
- 3.3 Testing
- 3.3.1 Wald tests
- The accumulate option
- 3.3.2 LR tests
- Avoiding invalid LR tests
- 3.4 estat command
- 3.5 Measures of fit
- Syntax of fitstat
- Options
- Models and measures
- Example of fitstat
- Methods and formulas for fitstat
- 3.6 Interpretation
- 3.6.1 Approaches to interpretation
- 3.6.2 Predictions using predict
- 3.6.3 Overview of prvalue, prchange, prtab, and prgen
- Specifying the levels of variables
- Options controlling output
- 3.6.4 Syntax for prvalue
- Options
- Options for confidence intervals
- Options used for bootstrapped confidence intervals
- 3.6.5 Syntax for prchange
- Options
- 3.6.6 Syntax for prtab
- Options
- 3.6.7 Syntax for prgen
- Options
- Options for confidence intervals and marginals
- Variables generated
- 3.6.8 Computing marginal effects using mfx
- 3.7 Confidence intervals for prediction
- 3.8 Next steps
Part II Models for Specific Kinds of Outcomes
4 Models for binary outcomes
- 4.1 The statistical model
- 4.1.1 A latent-variable model
- 4.1.2 A nonlinear probability model
- 4.2 Estimation using logit and probit
- Variable lists
- Specifying the estimation sample
- Weights
- Options
- Example
- 4.2.1 Observations predicted perfectly
- 4.3 Hypothesis testing with test and lrtest
- 4.3.1 Testing individual coefficients
- One- and two-tailed tests
- Testing single coefficients using test
- Testing single coefficients using lrtest
- 4.3.2 Testing multiple coefficients
- Testing multiple coefficients using test
- Testing multiple coefficients using lrtest
- 4.3.3 Comparing LR and Wald tests
- 4.4 Residuals and influence using predict
- 4.4.1 Residuals
- Example
- 4.4.2 Influential cases
- 4.4.3 Least likely observations
- Syntax
- Options
- Options for controlling the list of values
- 4.5 Measuring fit
- 4.5.1 Scalar measures of fit using fitstat
- 4.5.2 Hosmer–Lemeshow statistic
- 4.6 Interpretation using predicted values
- 4.6.1 Predicted probabilities with predict
- 4.6.2 Individual predicted probabilities with prvalue
- 4.6.3 Tables of predicted probabilities with prtab
- 4.6.4 Graphing predicted probabilities with prgen
- 4.6.5 Plotting confidence intervals
- 4.6.6 Changes in predicted probabilities
- Marginal change
- Discrete change
- 4.7 Interpretation using odds ratios with listcoef
- Multiplicative coefficients
- Effect of the base probability
- Percent change in the odds
- 4.8 Other commands for binary outcomes
5 Models for ordinal outcomes
- 5.1 The statistical model
- 5.1.1 A latent-variable model
- 5.1.2 A nonlinear probability model
- 5.2 Estimation using ologit and oprobit
- Variable lists
- Specifying the estimation sample
- Weights
- Options
- 5.2.1 Example of attitudes toward working mothers
- 5.2.2 Predicting perfectly
- 5.3 Hypothesis testing with test and lrtest
- 5.3.1 Testing individual coefficients
- 5.3.2 Testing multiple coefficients
- 5.4 Scalar measures of fit using fitstat
- 5.5 Converting to a different parameterization*
- 5.6 The parallel regression assumption
- 5.7 Residuals and outliers using predict
- 5.8 Interpretation
- 5.8.1 Marginal change in y*
- 5.8.2 Predicted probabilities
- 5.8.3 Predicted probabilities with predict
- 5.8.4 Individual predicted probabilities with prvalue
- 5.8.5 Tables of predicted probabilities with prtab
- 5.8.6 Graphing predicted probabilities with prgen
- 5.8.7 Changes in predicted probabilities
- Marginal change with prchange
- Marginal change with mfx
- Discrete change with prchange
- Confidence intervals for discrete changes
- Computing discrete change for a 10-year increase in age
- 5.8.8 Odds ratios using listcoef
- 5.9 Less common models for ordinal outcomes
- 5.9.1 The stereotype model
- 5.9.2 The generalized ordered logit model
- 5.9.3 The continuation ratio model
6 Models for nominal outcomes with case-specific data
- 6.1 The multinomial logit model
- 6.1.1 Formal statement of the model
- 6.2 Estimation using mlogit
- Variable lists
- Specifying the estimation sample
- Weights
- Options
- 6.2.1 Example of occupational attainment
- 6.2.2 Using different base categories
- 6.2.3 Predicting perfectly
- 6.3 Hypothesis testing of coefficients
- 6.3.1 mlogtest for tests of the MNLM
- Options
- 6.3.2 Testing the effects of the independent variables
- A likelihood-ratio test
- A Wald test
- Testing multiple independent variables
- 6.3.3 Tests for combining alternatives
- A Wald test for combining alternatives
- Using test [category]*
- An LR test for combining alternatives
- Using constraint with lrtest*
- 6.4 Independence of irrelevant alternatives
- Hausman test of IIA
- Small–Hsiao test of IIA
- 6.5 Measures of fit
- 6.6 Interpretation
- 6.6.1 Predicted probabilities
- 6.6.2 Predicted probabilities with predict
- Using predict to compare mlogit and ologit
- 6.6.3 Predicted probabilities and discrete change with prvalue
- 6.6.4 Tables of predicted probabilities with prtab
- 6.6.5 Graphing predicted probabilities with prgen
- Plotting probabilities for one outcome and two groups
- Graphing probabilities for all outcomes for one group
- 6.6.6 Changes in predicted probabilities
- Computing marginal and discrete change with prchange
- Marginal change with mfx
- 6.6.7 Plotting discrete changes with prchange and mlogview
- 6.6.8 Odds ratios using listcoef and mlogview
- Listing odds ratios with listcoef
- Plotting odds ratios
- 6.6.9 Using mlogplot*
- 6.6.10 Plotting estimates from matrices with mlogplot*
- Options for using matrices with mlogplot
- Global macros and matrices used by mlogplot
- Example
- 6.7 Multinomial probit model with IIA
- 6.8 Stereotype logistic regression
- 6.8.1 Formal statement of the one-dimensional SLM
- 6.8.2 Fitting the SLM with slogit
- Options
- Example
- 6.8.3 Interpretation using predicted probabilities
- 6.8.4 Interpretation using odds ratios
- 6.8.5 Distinguishability and the φ parameters
- 6.8.6 Ordinality in the one-dimensional SLM
- Higher-dimension SLM
7 Models for nominal outcomes with alternative-specific data
- 7.1 Alternative-specific data organization
- 7.1.1 Syntax for case2alt
- 7.2 The conditional logit model
- 7.2.1 Fitting the conditional logit model
- Example of the clogit model
- 7.2.2 Interpreting odds ratios from clogit
- 7.2.3 Interpreting probabilities from clogit
- Using predict
- Using asprvalue
- 7.2.4 Fitting the multinomial logit model using clogit
- Setting up the data with case2alt
- Fitting multinomial logit with clogit
- 7.2.5 Using clogit with case- and alternative-specific variables
- Example of a mixed model
- Interpretation of odds ratios using listcoef
- Interpretation of predicted probabilities using asprvalue
- Allowing the effects of alternative-specific variables to vary over the alternatives
- 7.3 Alternative-specific multinomial probit
- 7.3.1 The model
- 7.3.2 Informal explanation of estimation by simulation
- 7.3.3 Alternative-based data with uncorrelated errors
- Options
- Examples
- 7.3.4 Alternative-based data with correlated errors
- 7.4 The structural covariance matrix
- 7.4.1 Interpretation using probabilities
- Using predict
- Using asprvalue
- 7.4.2 Identification, discrete change, and marginal effects
- 7.4.3 Testing for IIA
- 7.4.4 Adding case-specific data
- 7.5 Rank-ordered logistic regression
- 7.5.1 Fitting the rank-ordered logit model
- Options
- Example of the rank-ordered logit model
- 7.5.2 Interpreting results from rologit
- Interpretation using odds ratios
- Interpretation using predicted probabilities
- 7.6 Conclusions
8 Models for count outcomes
- 8.1 The Poisson distribution
- 8.1.1 Fitting the Poisson distribution with the poisson command
- 8.1.2 Computing predicted probabilities with prcounts
- Syntax
- Options
- Variables generated
- 8.1.3 Comparing observed and predicted counts with prcounts
- 8.2 The Poisson regression model
- 8.2.1 Estimating the PRM with poisson
- Variable lists
- Specifying the estimation sample
- Weights
- Options
- 8.2.2 Example of fitting the PRM
- 8.2.3 Interpretation using the rate, μ
- Factor change in E(y|x)
- Percent change in E(y|x)
- Example of factor and percent change
- Marginal change in E(y|x)
- Example of marginal change using prchange
- Example of marginal change using mfx
- Discrete change in E(y|x)
- Example of discrete change using prchange
- Example of discrete change with confidence intervals
- 8.2.4 Interpretation using predicted probabilities
- Example of predicted probabilities using prvalue
- Example of predicted probabilities using prgen
- Example of predicted probabilities using prcounts
- 8.2.5 Exposure time*
- 8.3 The negative binomial regression model
- 8.3.1 Fitting the NBRM with nbreg
- NB1 and NB2 variance functions
- 8.3.2 Example of fitting the NBRM
- Comparing the PRM and NBRM using estimates table
- 8.3.3 Testing for overdispersion
- 8.3.4 Interpretation using the rate μ
- 8.3.5 Interpretation using predicted probabilities
- 8.4 Models for truncated counts
- 8.4.1 Fitting zero-truncated models
- 8.4.2 Example of fitting zero-truncated models
- 8.4.3 Interpretation of parameters
- 8.4.4 Interpretation using predicted probabilities and rates
- 8.4.5 Computing predicted rates and probabilities in the estimation sample
- 8.5 The hurdle regression model*
- 8.5.1 In-sample predictions for the hurdle model
- 8.5.2 Predictions for user-specified values
- 8.6 Zero-inflated count models
- 8.6.1 Fitting zero-inflated models with zinb and zip
- Variable lists
- Options
- 8.6.2 Example of fitting the ZIP and ZINB models
- 8.6.3 Interpretation of coefficients
- 8.6.4 Interpretation of predicted probabilities
- Predicted probabilities with prvalue
- Confidence intervals with prvalue
- Predicted probabilities with prgen
- 8.7 Comparisons among count models
- 8.7.1 Comparing mean probabilities
- 8.7.2 Tests to compare count models
- LR tests of α
- Vuong test of nonnested models
- 8.8 Using countfit to compare count models
9 More topics
- 9.1 Ordinal and nominal independent variables
- 9.1.1 Coding a categorical independent variable as a set of dummy variables
- 9.1.2 Estimation and interpretation with categorical independent variables
- 9.1.3 Tests with categorical independent variables
- Testing the effect of membership in one category versus the reference category
- Testing the effect of membership in two nonreference categories
- Testing that a categorical independent variable has no effect
- Testing whether treating an ordinal variable as interval loses information
- 9.1.4 Discrete change for categorical independent variables
- Computing discrete change with prchange
- Computing discrete change with prvalue
- 9.2 Interactions
- 9.2.1 Computing sex differences in predictions with interactions
- 9.2.2 Computing sex differences in discrete change with interactions
- 9.3 Nonlinear nonlinear models
- 9.3.1 Adding nonlinearities to linear predictors
- 9.3.2 Discrete change in nonlinear models
- 9.4 Using praccum and forvalues to plot predictions
- Options
- 9.4.1 Example using age and age-squared
- 9.4.2 Using forvalues with praccum
- 9.4.3 Using praccum for graphing a transformed variable
- 9.4.4 Using praccum to graph interactions
- 9.4.5 Using forvalues with prvalue to create tables
- 9.4.6 A more advanced example*
- 9.4.7 Using forvalues to create tables with other commands
- 9.5 Extending SPost to other estimation commands
- 9.6 Using Stata more efficiently
- 9.6.1 profile.do
- 9.6.2 Changing screen fonts and window preferences
- 9.6.3 Using ado-files for changing directories
- 9.6.4 me.hlp file
- 9.7 Conclusions
A Syntax for SPost commands
- A.1 asprvalue
- Syntax
- Description
- Options
- Examples
- A.2 brant
- Syntax
- Description
- Options
- Examples
- Saved results
- A.3 case2alt
- Syntax
- Description
- Options
- Examples
- A.4 countfit
- Syntax
- Description
- Options for specifying the model
- Options to select the models to fit
- Options to label and save results
- Options to control what is printed
- Example
- A.5 fitstat
- Syntax
- Description
- Options
- Examples
- Saved results
- A.6 leastlikely
- Syntax
- Description
- Options
- Options for listing
- Examples
- A.7 listcoef
- Syntax
- Description
- Options
- Options for nominal outcomes
- Examples
- Saved results
- A.8 misschk
- Syntax
- Options
- Examples
- A.9 mlogplot
- Syntax
- Description
- Options
- Examples
- A.10 mlogtest
- Syntax
- Description
- Options
- Examples
- Saved results
- Acknowledgment
- A.11 mlogview
- Syntax
- Description
- Dialog box controls
- A.12 Overview of prchange, prgen, prtab, and prvalue
- Syntax
- Examples
- A.13 praccum
- Syntax
- Description
- Options
- Examples
- Variables generated
- A.14 prchange
- Syntax
- Description
- Options
- Examples
- A.15 prcounts
- Syntax
- Description
- Options
- Variables generated
- Examples
- A.16 prgen
- Syntax
- Description
- Options
- Options for confidence intervals and marginals
- Examples
- Variables generated
- A.17 prtab
- Syntax
- Description
- Options
- Examples
- A.18 prvalue
- Syntax
- Description
- Options
- Options for confidence intervals
- Options used for bootstrapped confidence intervals
- Examples
- Saved results
- A.19 spex
- Syntax
- Description
- Options
- Examples
B Description of datasets
- B.1 binlfp2
- B.2 couart2
- B.3 gsskidvalue2
- B.4 nomocc2
- B.5 ordwarm2
- B.6 science2
- B.7 travel2
- B.8 wlsrnk
References
Author index (pdf)
Subject index (pdf)