
VARIANCE COMPONENTS AND MIXED MODEL ANOVA/ANCOVA. Variance
Components and Mixed Model ANOVA/ANCOVA is a
specialized module for designs with random effects and/or
factors with many levels; options for handling random
effects and for estimating variance components are also
provided in the General Linear Models module. Random effects
(factors) occur frequently in industrial research, when the
levels of a factor represent values sampled from a random
variable (as opposed to being deliberately chosen or
arranged by the experimenter). The Variance Components
module will allow you to analyze designs with any
combinations of fixed effects, random effects, and
covariates. Extremely large ANOVA/ANCOVA designs can be
efficiently analyzed: factors can have several hundred
levels. The program will analyze standard factorial
(crossed) designs as well as hierarchically nested designs,
and compute the standard Type I, II, and III
analysis of variance sums of squares and mean squares for
the effects in the model. In addition, you can compute the
table of expected mean squares for the effects in the
design, the variance components for the random effects in
the model, the coefficients for the denominator synthesis,
and the complete ANOVA table with tests based on synthesized
error sums of squares and degrees of freedom (using
Satterthwaite's method). Other methods for estimating
variance components are also supported (e.g., MIVQUE0,
Maximum Likelihood [ML], Restricted Maximum
Likelihood [REML]). For maximum likelihood
estimation, both the Newton-Raphson and Fisher scoring
algorithms are used, and the model will not be arbitrarily
changed (reduced) during estimation to handle situations
where most components are at or near zero. Several options
for reviewing the weighted and unweighted marginal means,
and their confidence intervals, are also available.
Extensive graphics options can be used to visualize the
results.
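
As a rough illustration of how variance components follow from the table of expected mean squares, the sketch below handles only the simplest case: a balanced one-way design with a single random factor. The function name, the dictionary keys, and the sample data are hypothetical, and this is the textbook ANOVA (method-of-moments) estimator, not a description of the module's internal algorithm.

```python
def one_way_variance_components(groups):
    """ANOVA-method estimates for a balanced one-way random-effects design.

    `groups` holds one list of observations per sampled level of the
    random factor (hypothetical input layout for this sketch).
    """
    a = len(groups)      # number of sampled factor levels
    n = len(groups[0])   # observations per level
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("sketch assumes a balanced design with a >= 2 and n >= 2")

    grand_mean = sum(sum(g) for g in groups) / (a * n)
    level_means = [sum(g) / n for g in groups]

    ss_between = n * sum((m - grand_mean) ** 2 for m in level_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, level_means) for y in g)
    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (a * (n - 1))

    # Expected mean squares for this design:
    #   E[MS_between] = var_error + n * var_factor
    #   E[MS_within]  = var_error
    # Equating observed and expected mean squares gives the components.
    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "var_error": ms_within,
        "var_factor": (ms_between - ms_within) / n,  # can be negative in small samples
    }


batches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(one_way_variance_components(batches))
```

Unbalanced or mixed designs need the synthesized error terms described above, which this sketch does not attempt.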

SURVIVAL/FAILURE TIME ANALYSIS. This module
features a comprehensive implementation of a variety of
techniques for analyzing censored data from social,
biological, and medical research, as well as procedures used
in engineering and marketing (e.g., quality control,
reliability estimation, etc.). In addition to computing life
tables with various descriptive statistics and Kaplan-Meier
product limit estimates, the user can compare the
survivorship functions in different groups using a large
selection of methods (including the Gehan test, Cox F-test,
Cox-Mantel test, Log-rank test, and Peto & Peto
generalized Wilcoxon test). Also, Kaplan-Meier plots can be
computed for groups (uncensored observations are identified
in graphs with different point markers). The program also
features a selection of survival function fitting procedures
(including the Exponential, Linear Hazard, Gompertz,
and Weibull functions) based on either unweighted or
weighted least squares methods (maximum-likelihood parameter
estimates for various distributions, including Weibull,
can also be computed via the STATISTICA
Process Analysis module). Finally, the program
offers full implementations of four general explanatory
models (Cox's proportional hazard model, exponential
regression model, log-normal and normal regression models)
with extended diagnostics, including stratified analysis and
graphs of survival for user-specified values of predictors.
For Cox proportional hazard regression, the user can choose
to stratify the sample to permit different baseline hazards
in different strata (but a constant coefficient vector), or
the user can allow for different baseline hazards as well as
coefficient vectors. In addition, general facilities are
provided to define one or more time-dependent covariates.
Time-dependent covariates can be specified via a flexible
formula interpreter that allows the user to define the
covariates via arithmetic expressions which may include
time, as well as the standard logical functions (e.g., timdep=age+age*log(t_)*(age>45),
where t_ references survival time) and a wide variety
of distribution functions. As in all other modules of STATISTICA,
the user can access and change the technical parameters of
all procedures (or accept dynamic defaults). The module also
offers an extensive selection of graphics and specialized
diagrams to aid in the interpretation of results (including
plots of cumulative proportions surviving/failing, patterns
of censored data, hazard and cumulative hazard functions,
probability density functions, group comparison plots,
distribution fitting plots, various residual plots, and many
others). For engineering applications, see also Weibull
Analysis.
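
The Kaplan-Meier product limit estimate mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name and the input layout (parallel lists of times and event flags) are hypothetical, and the code shows only the standard estimator, not the module's implementation.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- observed survival or censoring times
    events -- True for an observed failure (uncensored), False for censored
    Returns a list of (time, estimated survival) at each distinct failure time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Gather all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            leaving += 1
            i += 1
        if failures:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Failed and censored cases both leave the risk set after t.
        at_risk -= leaving
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False]))
```

Censored cases never lower the curve directly; they only shrink the risk set used at later failure times.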

GENERAL NONLINEAR ESTIMATION (and Quick Logit/Probit
Regression). The Nonlinear Estimation module
allows the user to fit essentially any type of nonlinear
model. One of the unique features of this module is that
(unlike traditional nonlinear estimation programs) it does
not impose any limits on the size of data files that it can
process.
Estimation Methods. The models can be fit using
least squares or maximum-likelihood estimation, or any
user-specified loss function. When using the least-squares
criterion, the very efficient Levenberg-Marquardt and
Gauss-Newton algorithms can be used to estimate the
parameters for arbitrary linear and nonlinear regression
problems. For large datasets or for difficult nonlinear
regression problems (such as those rated "higher
difficulty" among the Statistical Reference Datasets
provided by the National Institute of Standards and
Technology; see http://www.nist.gov/itl/div898/strd/index.html),
when using the least-squares criterion, these algorithms are the
recommended methods for computing precise parameter
estimates. When using arbitrary loss functions, the user can
choose from among four very different, powerful estimation
procedures (quasi-Newton, Simplex, Hooke-Jeeves pattern
moves, and Rosenbrock pattern search method of rotating
coordinates) so that stable parameter estimates can be
obtained in practically all cases, even under extremely
numerically demanding conditions.
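
To make the idea of a user-specified loss function concrete, here is a minimal sketch of a Hooke-Jeeves style pattern search minimizing a Huber loss for a straight-line model. Everything named here (hooke_jeeves, huber_loss, the step and tolerance defaults, the data) is an assumption made for illustration; it is a textbook variant of the method and says nothing about how the module itself implements it.

```python
def _explore(loss, point, value, h):
    """Try +/- h along each coordinate, keeping every improvement."""
    point = list(point)
    for i in range(len(point)):
        for delta in (h, -h):
            trial = list(point)
            trial[i] += delta
            trial_value = loss(trial)
            if trial_value < value:
                point, value = trial, trial_value
                break
    return point, value


def hooke_jeeves(loss, start, step=0.5, shrink=0.5, tol=1e-8, max_iter=100000):
    """Derivative-free minimization by exploratory and pattern moves."""
    base = list(start)
    base_value = loss(base)
    h = step
    for _ in range(max_iter):
        if h < tol:
            break
        new, new_value = _explore(loss, base, base_value, h)
        if new_value >= base_value:
            h *= shrink  # no improvement at this step size: refine the step
            continue
        while new_value < base_value:
            # Pattern move: jump further along the direction that just worked,
            # then explore around the landing point.
            pattern = [2.0 * n - b for n, b in zip(new, base)]
            base, base_value = new, new_value
            new, new_value = _explore(loss, pattern, loss(pattern), h)
    return base, base_value


def huber_loss(params, xs, ys, delta=1.0):
    """Example user-defined loss for the model y = a + b*x."""
    a, b = params
    total = 0.0
    for x, y in zip(xs, ys):
        r = abs(y - (a + b * x))
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 1 + 2*x
estimates, final_loss = hooke_jeeves(lambda p: huber_loss(p, xs, ys), [0.0, 0.0])
print(estimates, final_loss)
```

Because the search only compares loss values, any loss the user can write as a function of the parameters can be plugged in, at the cost of more function evaluations than a gradient-based method.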
Models. The user can specify any type of model by
typing in the respective equation into an equation editor.
The equations may include logical operators; thus,
discontinuous (piecewise) regression models and models
including indicator variables can also be estimated. The
equations may also include a wide selection of distribution
functions and cumulative distribution functions (Beta,
Binomial, Cauchy, Chi-square, Exponential, Extreme value, F,
Gamma, Geometric, Laplace, Logistic, Normal, Log-Normal,
Pareto, Poisson, Rayleigh, t (Student), or Weibull
distribution). The user has full control over all aspects of
the estimation procedure (e.g., starting values, step sizes,
convergence criteria, etc.). The most common nonlinear
regression models are predefined in the Nonlinear
Estimation module, and can be chosen simply as menu
options. Those regression models include stepwise Probit and
Logit regression, the exponential regression model, and
linear piecewise (break point) regression. Note that STATISTICA
also includes implementations of powerful algorithms for
fitting generalized linear models, including probit and
multinomial logit models, and generalized additive models;
see the respective descriptions for additional details.
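
The role of logical operators in a model equation can be shown with a small sketch of a piecewise (break point) regression function. It assumes, as in the formula interpreter example given for time-dependent covariates, that a logical expression evaluates to 1 when true and 0 when false; the function name, parameter names, and data are hypothetical.

```python
def piecewise_linear(x, b0, b1, b2, b3, breakpoint):
    """Two-segment model written as a single equation.

    (x <= breakpoint) and (x > breakpoint) act as 0/1 indicators, so only
    one of the two linear pieces contributes for any given x.
    """
    return (b0 + b1 * x) * (x <= breakpoint) + (b2 + b3 * x) * (x > breakpoint)


def sum_of_squares(params, xs, ys):
    """Least-squares loss for the piecewise model; params includes the break point."""
    b0, b1, b2, b3, breakpoint = params
    return sum((y - piecewise_linear(x, b0, b1, b2, b3, breakpoint)) ** 2
               for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 10.0, 10.0]
print(sum_of_squares([0.0, 1.0, 10.0, 0.0, 2.0], xs, ys))
```

Since the break point is itself a parameter of the loss, it can be estimated along with the slopes and intercepts by a general-purpose minimizer.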
Results. In addition to various descriptive
statistics, standard results of the nonlinear estimation
include the parameter estimates and their standard errors,
the variance/covariance matrix of parameter estimates, the
predicted values, residuals, and appropriate measures of
goodness-of-fit (e.g., log-likelihood of estimated/null
models and Chi-square test of difference, proportion
of variance accounted for, classification of cases and
odds-ratios for Logit and Probit models, etc.). Predicted
and residual values can be appended to the data file for
further analyses. For Probit and Logit models, the
incremental fit is also automatically computed when adding
or deleting parameters from the regression model (thus, the
user can explore the data via a stepwise nonlinear
estimation procedure; options for automatic forward and
backward stepwise regression as well as best-subset
selection of predictors in logit and probit models are
provided in the Generalized Linear Models
module, below).
Graphs. All output is integrated with extensive
selections of graphs, including interactively-adjustable 2D
and 3D (surface) arbitrary function fitting graphs which
allow the user to visualize the quality of the fit and
identify outliers or ranges of discrepancy between the model
and the data; the user can interactively adjust the equation
of the fitted function (as shown in the graph) without
re-processing the data and visualize practically all aspects
of the nonlinear fitting process. Many other specialized
graphs are provided to evaluate the fitting process and
visualize the results, such as histograms of all selected
variables and residual values, scatterplots of observed
versus predicted values and predicted versus residual
values, normal and half-normal probability plots of
residuals, and many others.

LOG-LINEAR ANALYSIS OF FREQUENCY TABLES. This module offers a
complete implementation of log-linear modeling procedures
for multi-way frequency tables. Note that STATISTICA
also includes the Generalized Linear Models module,
which provides options for analyzing binomial and
multinomial logit models with coded ANOVA/ANCOVA-like
designs. In the Log-Linear Analysis module, the user
can analyze up to 7-way tables in a single run. Both
complete and incomplete tables (with structural zeros) can
be analyzed. Frequency tables can be computed from raw data,
or may be entered directly into the program. The Log-Linear
Analysis module provides a comprehensive selection of
advanced modeling procedures in an interactive and flexible
environment that greatly facilitates exploratory and
confirmatory analyses of complex tables. The user may at all
times review the complete observed table as well as marginal
tables, and fitted (expected) values, and may evaluate the
fit of all partial and marginal association models or select
specific models (marginal tables) to be fitted to the
observed data. The program also offers an intelligent
automatic model selection procedure that first determines
the necessary order of interaction terms required for a
model to fit the data, and then, through backwards
elimination, determines the best sufficient model to
satisfactorily fit the data (using criteria determined by
the user). The standard output includes G-square
(Maximum-Likelihood Chi-square), the standard Pearson
Chi-square with the appropriate degrees of freedom
and significance levels, the observed and expected tables,
marginal tables, and other statistics. Graphics options
available in the Log-Linear module include a variety
of 2D and 3D graphs designed to visualize 2-way and
multi-way frequency tables (including interactive,
user-controlled cascades of categorized histograms
and 3D histograms revealing "slices" of multi-way
tables), plots of observed and fitted frequencies, plots of
various residuals (standardized, components of
Maximum-Likelihood Chi-square, Freeman-Tukey
deviates, etc.), and many others.
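
For readers unfamiliar with the two fit statistics named above, the following sketch computes fitted (expected) frequencies, G-square, and the Pearson Chi-square for the simplest log-linear model, independence in a 2-way table. The function name and the example table are hypothetical; models for higher-order tables generally require iterative fitting, which is not shown.

```python
import math


def fit_independence(table):
    """Fit the independence model to a 2-way frequency table.

    Returns (expected, g_square, pearson_chi_square, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Under independence the fitted count is (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    g_square = 0.0
    pearson = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            if o > 0:  # a zero observed count contributes nothing to G-square
                g_square += 2.0 * o * math.log(o / e)
            pearson += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, g_square, pearson, df


expected, g2, x2, df = fit_independence([[10, 20], [30, 40]])
print(expected, g2, x2, df)
```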

TIME SERIES ANALYSIS/FORECASTING. The Time
Series module contains a wide range of descriptive,
modeling, decomposition, and forecasting methods for both
time and frequency domain models. These procedures are
integrated, that is, the results of one analysis (e.g.,
ARIMA residuals) can be used directly in subsequent analysis
(e.g., to compute the autocorrelation of the residuals).
Also, numerous flexible options are provided to review and
plot single or multiple series. Analyses can be performed on
even very long series. Multiple series can be maintained in
the active work area of the program (e.g., multiple
raw input data series or series resulting from different
stages of the analysis); the series can be reviewed and
compared. The program will automatically keep track of
successive analyses, and maintain a log of transformations
and other results (e.g., ARIMA residuals, seasonal
components, etc.). Thus, the user can always return to prior
transformations or compare (plot) the original series
together with its transformations. Information about the
consecutive transformations is maintained in the form of
long variable labels, so if you save the newly created
variables into a dataset, the "history" of each of
the series will be permanently preserved. The specific Time
Series procedures are described in the following
subsections.
Transformations, Modeling, Plots, Autocorrelations.
The available time series transformations allow the user to
fully explore patterns in the input series, and to perform
all common time series transformations, including:
de-trending, removal of autocorrelation, moving average
smoothing (unweighted and weighted, with user-defined or
Daniell, Tukey, Hamming, Parzen, or Bartlett weights),
moving median smoothing, simple exponential smoothing (see
also the description of all exponential smoothing options
below), differencing, integrating, residualizing, shifting,
4253H smoothing, tapering, Fourier (and inverse)
transformations, and others. Autocorrelation, partial
autocorrelation, and crosscorrelation analyses can also be
performed.
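
Two of the operations listed above, differencing and autocorrelation, are simple enough to sketch directly. The function names are hypothetical, and the autocorrelation uses one common definition (lagged cross-products of deviations divided by the lag-0 sum of squares); other normalizations exist.

```python
def difference(series, lag=1):
    """Return x[t] - x[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def autocorrelation(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


trend_series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorrelation(trend_series, 2))
print(difference(trend_series))  # differencing removes the linear trend
```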
ARIMA and Interrupted Time Series (Intervention)
Analysis. The Time Series module offers a
complete implementation of ARIMA. Models may include a
constant, and the series can be transformed prior to the
analysis; these transformations will automatically be
"undone" when ARIMA forecasts are computed, so
that the forecasts and their standard errors are expressed
in terms of the values of the original input series.
Approximate and exact maximum-likelihood conditional sums of
squares can be computed, and the ARIMA implementation in the
Time Series module is uniquely suited to fitting
models with long seasonal periods (e.g., periods of 30
days). Standard results include the parameter estimates and
their standard errors and the parameter correlations.
Forecasts and their standard errors can be computed and
plotted, and appended to the input series. In addition,
numerous options for examining the ARIMA residuals (for
model adequacy) are available, including a large selection
of graphs. The implementation of ARIMA in the Time Series
module also allows the user to perform interrupted time
series (intervention) analysis. Several simultaneous
interventions may be modeled, which can either be
single-parameter abrupt-permanent interventions, or
two-parameter gradual or temporary interventions (graphs of
different impact patterns can be reviewed). Forecasts can be
computed for all intervention models, which can be plotted
(together with the input series) as well as appended to the
original series.
Seasonal and Non-Seasonal Exponential Smoothing. The Time
Series module contains a complete implementation of all
12 common exponential smoothing models. Models can be
specified to contain an additive or multiplicative seasonal
component and/or linear, exponential, or damped trend; thus,
available models include the popular Holt-Winters linear
trend models. The user may specify the initial value for the
smoothing transformation, initial trend value, and seasonal
factors (if appropriate). Separate smoothing parameters can
be specified for the trend and seasonal components. The user
can also perform a grid search of the parameter space in
order to identify the best parameters; the respective
results spreadsheet will report for all combinations of
parameter values the mean error, mean absolute error, sum of
squares error, mean square error, mean percentage error, and
mean absolute percentage error. The smallest value for these
fit indices will be highlighted in the spreadsheet. In
addition, the user can also request an automatic search for
the best parameters with regard to the mean square error,
mean absolute error, or mean absolute percentage error (a
general function minimization procedure is used for this
purpose). The results of the respective exponential
smoothing transformation, the residuals, as well as the
requested number of forecasts, are available for further
analyses and plots. A summary plot is also available to
assess the adequacy of the respective exponential smoothing
model; that plot will show the original series together with
the smoothed values and forecasts, as well as the smoothing
residuals plotted separately against the right-Y axis.
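
The grid search over smoothing parameters can be illustrated for the simplest of these models, simple exponential smoothing with no trend or seasonal component. The sketch below is written under stated assumptions: the function names and dictionary keys are hypothetical, the first observation is used as the default initial value, and the percentage indices assume that no observation equals zero.

```python
def simple_exponential_smoothing(series, alpha, initial=None):
    """Return (one-step-ahead forecasts, final smoothed level)."""
    level = series[0] if initial is None else initial
    forecasts = []
    for x in series:
        forecasts.append(level)  # forecast made before x is observed
        level = alpha * x + (1.0 - alpha) * level
    return forecasts, level


def fit_indices(series, forecasts):
    """Error summaries of the kind reported for each parameter value."""
    errors = [x - f for x, f in zip(series, forecasts)]
    n = len(errors)
    sse = sum(e * e for e in errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "sse": sse,
        "mse": sse / n,
        "mean_pct_error": 100.0 * sum(e / x for e, x in zip(errors, series)) / n,
        "mean_abs_pct_error": 100.0 * sum(abs(e / x) for e, x in zip(errors, series)) / n,
    }


def grid_search(series, alphas, criterion="mse"):
    """Evaluate every alpha on the grid and return the best one with all results."""
    results = {}
    for alpha in alphas:
        forecasts, _ = simple_exponential_smoothing(series, alpha)
        results[alpha] = fit_indices(series, forecasts)
    best = min(results, key=lambda a: results[a][criterion])
    return best, results


best_alpha, table = grid_search([1.0, 2.0, 3.0, 4.0], [0.1, 0.5, 0.9])
print(best_alpha, table[best_alpha])
```

For models with trend and seasonal parameters the grid simply gains one dimension per parameter, which is why an automatic minimization over the same criteria is offered as an alternative.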
Classical Seasonal Decomposition (Census Method I). The user
may specify the length of
the seasonal period, and choose either the additive or
multiplicative seasonal model. The program will compute the
moving averages, ratios or differences, seasonal factors,
the seasonally adjusted series, the smoothed trend-cycle
component, and the irregular component. Those components are
available for further analysis; for example, the user may
compute histograms, normal probability plots, etc. for any
or all of these components (e.g., to test model adequacy).
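
The sequence of computations described above can be sketched for the additive model as follows. This is a simplified textbook version under stated assumptions: the function name is hypothetical, a centered moving average is used for even seasonal periods, and the moving average itself stands in for the trend-cycle component without the additional smoothing step the module reports.

```python
def classical_decomposition_additive(series, period):
    """Additive decomposition: series = trend + seasonal + irregular."""
    n = len(series)
    if n < 2 * period:
        raise ValueError("sketch needs at least two full seasonal cycles")
    half = period // 2

    # Step 1: moving average spanning one seasonal cycle.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Centered average: the two end points get half weight.
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period

    # Step 2: differences between the series and the moving average,
    # averaged separately for each position within the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += series[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]

    # Step 3: center the seasonal factors so they sum to zero.
    mean_raw = sum(raw) / period
    seasonal = [r - mean_raw for r in raw]

    # Step 4: seasonally adjusted series and irregular component.
    adjusted = [series[t] - seasonal[t % period] for t in range(n)]
    irregular = [None if trend[t] is None else adjusted[t] - trend[t]
                 for t in range(n)]
    return trend, seasonal, adjusted, irregular


pattern = [1.0, -1.0, 2.0, -2.0]
demo = [t + pattern[t % 4] for t in range(12)]
print(classical_decomposition_additive(demo, 4)[1])
```

The multiplicative model follows the same steps with ratios in place of differences and factors centered around 1 instead of 0.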
X-11 Monthly and Quarterly Seasonal Decomposition and Seasonal
Adjustment (Census Method II). The Time Series module
contains a full-featured implementation of the US Bureau of
the Census X-11 variant of the Census Method II seasonal
adjustment procedure. While the original X-11 algorithms
were not year-2000 compatible (only data prior to January
2000 could be analyzed), the STATISTICA
implementation of X-11 can handle data containing dates prior
to January 1, 2000, after that date, or series that will
start prior to that date but terminate in or after the year
2000. The arrangement of options and dialogs closely follows
the definitions and conventions described in the Bureau of
the Census documentation. Additive and multiplicative
seasonal models may be specified. The user may also specify
prior trading-day factors and seasonal adjustment factors.
Trading-day variation can be estimated via regression
(controlling for extreme observations), and used to adjust
the series (conditionally if requested). The standard
options are provided for graduating extreme observations,
for computing the seasonal factors, and for computing the
trend-cycle component (the user can choose between various
types of weighted moving averages; optimal lengths and types
of moving averages can also automatically be chosen by the
program). The final components (seasonal, trend-cycle,
irregular) and the seasonally adjusted series are
automatically available for further analyses and plots;
those components can also be saved for further analyses with
other programs. The program will produce the plots of the
different components, including categorized plots by months
(or quarters).
Polynomial Distributed Lag Models.
The implementation of the polynomial distributed lag methods
in the Time Series module will estimate models with
unconstrained lags as well as (constrained) Almon
distributed lag models. A selection of graphs is available
to examine the distributions of the model variables.
Spectrum (Fourier) and Cross-Spectrum Analysis. The Time
Series module includes a full implementation of spectrum
(Fourier decomposition) analysis and cross-spectrum analysis
techniques. The program is particularly suited for the
analysis of unusually long time series (e.g., with over
250,000 observations), and it will not impose any
constraints on the length of the series (i.e., the length of
input series does not have to be a power of 2). However,
the user may also choose to pad or truncate the series prior
to the analysis. Standard pre-analysis transformations
include tapering, subtraction of the mean, and detrending.
For single spectrum analysis, the standard results include
the frequency, period, sine and cosine coefficients,
periodogram values, and spectral density estimates. The
density estimates can be computed using Daniell, Hamming,
Bartlett, Tukey, Parzen, or user-defined weights and
user-defined window widths. An option that is particularly
useful for long input series is to display only a
user-defined number of the largest periodogram or density
values in descending order; thus, the most salient
periodogram or density peaks can be easily identified in
long series. The user can compute the Kolmogorov-Smirnov d
test for the periodogram values to test whether they follow
an exponential distribution (i.e., whether the input is a
white-noise series). Numerous plots are available to
summarize the results; the user can plot the sine and cosine
coefficients, periodogram values, log-periodogram values,
spectral density values, and log-density values against the
frequencies, period, or log-period. For long input series,
the user can choose the segment (period) for which to plot
the respective periodogram or density values, thus enhancing
the "resolution" of the periodogram or density
plot. For cross-spectrum analysis, in addition to the single
spectrum results for each series, the program computes the
cross-periodogram (real and imaginary part), co-spectral
density, quadrature spectrum, cross-amplitude, coherency
values, gain values, and the phase spectrum. All of these
can also be plotted against the frequency, period, or
log-period, either for all periods (frequencies) or only for
a user-defined segment. A user-defined number of the largest
cross-periodogram values (real or imaginary) can also be
displayed in a spreadsheet in descending order of magnitude
to facilitate the identification of salient peaks when
analyzing long input series. As with all other procedures in
the Time Series module, all of these result series
can be appended to the active work area, and will be
available for further analyses with other time series
methods or other STATISTICA modules.
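
The single-spectrum quantities listed above can be illustrated with a direct Fourier decomposition that works for any series length. This is a sketch under stated assumptions: the function names and dictionary keys are hypothetical, the periodogram scaling shown is one common convention, the Nyquist frequency is skipped for simplicity, and the direct summation is far slower than the transforms a production implementation would use.

```python
import math


def periodogram(series):
    """Sine/cosine coefficients and periodogram values for a series of any length."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # subtract the mean before the analysis

    rows = []
    for k in range(1, (n - 1) // 2 + 1):
        freq = k / n
        angle = 2.0 * math.pi * freq
        cos_coef = 2.0 / n * sum(v * math.cos(angle * t) for t, v in enumerate(x))
        sin_coef = 2.0 / n * sum(v * math.sin(angle * t) for t, v in enumerate(x))
        rows.append({
            "frequency": freq,
            "period": 1.0 / freq,
            "cosine": cos_coef,
            "sine": sin_coef,
            "periodogram": (cos_coef ** 2 + sin_coef ** 2) * n / 2.0,
        })
    return rows


def largest_peaks(rows, count):
    """Return the `count` largest periodogram values in descending order."""
    return sorted(rows, key=lambda r: r["periodogram"], reverse=True)[:count]


# Twelve observations (not a power of 2) with a cycle every 6 observations.
wave = [3.0 * math.cos(2.0 * math.pi * 2 * t / 12) for t in range(12)]
for row in largest_peaks(periodogram(wave), 2):
    print(row)
```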