Comment from the Stata technical group
Data Analysis Using Stata provides a comprehensive introduction to
Stata that will be useful to those who are just learning statistics
and Stata as well as users of other statistical packages making the
switch to Stata. Throughout the book, the authors make extensive use
of examples using data from the German Socio-Economic Panel (GSOEP), a large
survey of households containing demographic, income, employment, and
other key information.
The book begins with an introduction to the Stata interface and then
proceeds with a discussion of Stata syntax and simple programming
tools like foreach loops. The core of the book includes chapters on
producing tables and graphs, performing linear regression, and using
logistic regression. All key concepts are illustrated with multiple
examples.
The remainder of the book includes chapters on reading text files,
writing programs and ado-files, and Internet resources, such as the
search command and the SSC archive.
Overall, Kohler and Kreuter's book will serve as a valuable introduction
to Stata, both for those who are new to statistics and statistical
computing and for those who are new to Stata but familiar with other
statistical packages. The book also makes a handy reference guide for
existing Stata users.
Table of contents
Preface
0 About the book
- 0.1 Structure
- 0.2 Using this book: Materials and hints
- 0.3 Teaching with this manual
1 "The first time"
- 1.1 Starting Stata
- 1.2 Setting up your screen
- 1.3 Your first analysis
- 1.4 Do-files
- 1.5 Exiting Stata
2 Working with do-files
- 2.1 From interactive work to working with a do-file
- 2.1.1 Alternative 1
- 2.1.2 Alternative 2
- 2.2 Designing do-files
- 2.2.1 Comments
- 2.2.2 Line breaks
- 2.2.3 Some crucial commands
- 2.3 Organizing your work
- 2.4 Summary
3 The grammar of Stata
- 3.1 The elements of Stata commands
- 3.1.1 Stata commands
- 3.1.2 The variable list
- List of variables: required or optional
- Abbreviation rules
- Special listings
- 3.1.3 Options
- 3.1.4 The in qualifier
- 3.1.5 The if qualifier
- 3.1.6 Expressions
- Operators
- Functions
- 3.1.7 Lists of numbers
- 3.1.8 Using filenames
- 3.2 Repeating similar commands
- 3.2.1 The by prefix
- 3.2.2 The foreach loop
- 3.2.3 The forvalues loop
- 3.3 Weights
4 Some general comments on the statistical commands
5 Creating and changing variables
- 5.1 The commands generate and replace
- 5.1.1 Variable names
- 5.1.2 Some examples
- 5.1.3 Changing codes with by, _n, and _N
- 5.1.4 Subscripts
- 5.2 Specialized recoding commands
- 5.2.1 The recode command
- 5.2.2 The egen command
- 5.3 Additional tools for recoding data
- 5.3.1 String functions
- 5.3.2 Date functions
- 5.4 Commands for dealing with missing values
- 5.5 Labels
- 5.6 Storage types, or, the ghost in the machine
6 Creating and changing graphs
- 6.1 A primer on graph syntax
- 6.2 Graph types
- 6.2.1 Examples
- 6.2.2 Specialized graphs
- 6.3 Graph elements
- 6.3.1 Appearance of data
- Choice of marker
- Marker colors
- Marker size
- Lines
- 6.3.2 Graph and plot regions
- Graph size
- Plot region
- Scaling the axes
- 6.3.3 Information inside the plot region
- Reference lines
- Labeling inside the plot region
- 6.3.4 Information outside the plot region
- Labeling the axes
- Tick lines
- Axis titles
- The legend
- Graph titles
- 6.4 Multiple graphs
- 6.4.1 Overlaying numerous twoway graphs
- 6.4.2 Option by()
- 6.4.3 Combining graphs
- 6.5 Saving and printing graphs
7 Describing and comparing distributions
- 7.1 Categories: Few or many?
- 7.2 Variables with few categories
- 7.2.1 Tables
- Frequency tables
- More than one frequency table
- Comparing distributions
- Summary statistics
- 7.2.2 Graphs
- Histograms
- Bar charts
- Dot chart
- 7.3 Variables with many categories
- 7.3.1 Frequencies of grouped data
- Some remarks on grouping data
- Special techniques for grouping data
- 7.3.2 Describing data using statistics
- Important summary statistics
- The summarize command
- The tabstat command
- Comparing distributions using statistics
- 7.3.3 Graphs
- Box plots
- Histograms
- Kernel density estimation
- Quantile plot
- 7.3.4 Summary
- 7.4 Summary
8 Introduction to linear regression
- 8.1 Simple linear regression
- 8.1.1 The basic principle
- 8.1.2 Linear regression using Stata
- The table of coefficients
- Standard errors
- The table of ANOVA results
- The model fit table
- 8.2 Multiple regression
- 8.2.1 Multiple regression using Stata
- 8.2.2 Additional computations
- 8.2.3 What does "under control" mean?
- 8.3 Regression diagnostics
- 8.3.1 Violation of E(εᵢ) = 0
- Linearity
- Influential cases
- Omitted variables
- 8.3.2 Violation of Var(εᵢ) = σ²
- 8.3.3 Violation of Cov(εᵢ, εⱼ) = 0, i ≠ j
- 8.4 Model extensions
- 8.4.1 Categorical independent variables
- 8.4.2 Interaction terms
- 8.4.3 Regression models using transformed variables
- Nonlinear relations
- Eliminating heteroskedasticity
- 8.5 More on standard errors
- 8.5.1 Bootstrap techniques
- 8.5.2 Confidence intervals in cluster samples
- 8.6 Advanced techniques
- 8.6.1 Median regression
- 8.6.2 Regression models for panel data
- From wide to long format
- Fixed-effects models
- 8.6.3 Error-components models
- 8.7 Summary
9 Regression models for categorical dependent variables
- 9.1 The linear probability model
- 9.2 Basic concepts
- 9.2.1 Odds, log odds, and odds ratios
- 9.2.2 Excursion: The maximum likelihood principle
- 9.3 Logistic regression with Stata
- 9.3.1 The coefficients block
- Sign interpretation
- Interpretation with odds ratios
- Probability interpretation
- 9.3.2 The iteration block
- 9.3.3 The model fit block
- Classification tables
- Pearson chi-squared
- 9.4 Logistic regression diagnostics
- 9.4.1 Linearity
- 9.4.2 Influential cases
- 9.5 Likelihood-ratio test
- 9.6 Refined models
- 9.7 Advanced techniques
- 9.7.1 Probit models
- 9.7.2 Multinomial logistic regression
- 9.7.3 Models for ordinal data
- 9.8 Summary
10 Reading and writing data
- 10.1 The goal: The data matrix
- 10.2 Importing machine-readable data
- 10.2.1 Reading system files from other packages
- 10.2.2 Reading ASCII text files
- Reading data in spreadsheet format
- Reading data in free format
- Reading data in fixed format
- 10.3 Inputting data
- 10.3.1 Input data using the editor
- 10.3.2 The input command
- 10.4 Combining data
- 10.4.1 The GSOEP database
- 10.4.2 The merge command
- The merge procedure
- Keeping track of observations
- Merging more than two files
- Merging data on different levels
- 10.4.3 The append command
- 10.5 Saving and exporting data
- 10.6 Handling big datasets
- 10.6.1 Rules for handling the working memory
- 10.6.2 Using oversized datasets
- 10.7 Summary
11 Do-files for advanced users and user-written programs
- 11.1 Two examples of usage
- 11.2 Four programming tools
- 11.2.1 Local macros
- 11.2.2 Do-files
- 11.2.3 Programs
- 11.2.4 Programs in do-files and ado-files
- 11.3 User-written Stata commands
- 11.3.1 Parsing variable lists
- 11.3.2 Parsing options
- 11.3.3 Parsing if and in qualifiers
- 11.3.4 Generating an unknown number of variables
- 11.3.5 Default values
- 11.3.6 Extended macro functions
- 11.3.7 Avoiding changes in the dataset
- 11.3.8 Help files
- 11.4 Summary
12 Around Stata
- 12.1 Resources and information
- 12.2 Taking care of Stata
- 12.3 Additional procedures
- 12.3.1 SJ and STB ado-files
- 12.3.2 SSC ado-files
- 12.3.3 Other ado-files
- 12.4 Summary
References
Author index
Subject index