Design of Experiments. STATISTICA Design of Experiments
offers an extremely comprehensive selection of procedures to design
and analyze the experimental designs used in industrial (quality)
research: 2**(k-p) factorial designs with blocking (for over 100
factors, including unique, highly efficient search algorithms for
finding minimum aberration and maximum unconfounding designs, where
the user can specify the interaction effects of interest that are to
be unconfounded), screening designs (for over 100 factors, including
Plackett-Burman designs), 3**(k-p) factorial designs with blocking
(including Box-Behnken designs), mixed-level designs, central
composite (or response surface) designs (including small central
composite designs), Latin square designs, Taguchi robust design
experiments via orthogonal arrays, mixture designs and triangular
surface designs, vertices and centroids for constrained surfaces
and mixtures, and D- and A-optimal designs for
factorial designs, surfaces, and mixtures. The specific types of
available designs, and methods for generating and analyzing them,
are described in the following sections.
STATISTICA Design of Experiments is compatible with
Windows 95, Windows 98, Windows Me, Windows NT, Windows 2000, and
Windows XP.
Analysis of experiments: General features.
The options for analyzing all factorial, response surface, and
mixture designs are general in nature, can handle unbalanced and
incomplete designs, and give the user full control of the choice of
models to be fitted to the data. The program will compute the
generalized inverse of the X'X matrix (where X stands
for the design matrix) to determine the estimable effects, and the
effects that are aliases of other effects. The program will then
automatically report the table of aliases and compute the parameter
estimates for all non-redundant effects. You can also manually
"toggle" specific effects in and out of the current model
quickly and easily, and observe the effect on the overall fit. All
analyses can be performed in terms of recoded factor values or the
original factor values, and a large number of output options are
provided to review the parameter estimates, analysis of variance
table, etc. Numerous additional options are provided for exploring
the predicted (fitted) means, surfaces, etc.; these options will be
further described in the context of the respective designs below.
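The role of the generalized inverse can be illustrated with a minimal Python/NumPy sketch. It is not STATISTICA's implementation; the half-fraction design, the model terms, and all variable names are assumptions chosen for illustration. The product of the generalized inverse of X'X with X'X projects onto the row space of X: a diagonal entry equal to 1 marks an effect that is estimable on its own, and a non-zero off-diagonal entry marks a pair of aliased effects.

```python
import itertools
import numpy as np

# Half fraction 2**(3-1) with generator C = A*B (defining relation I = ABC).
base = np.array(list(itertools.product([-1, 1], repeat=2)))
A, B = base[:, 0], base[:, 1]
C = A * B

# Model with the mean, all main effects, and all two-way interactions.
names = ["Mean", "A", "B", "C", "AB", "AC", "BC"]
X = np.column_stack([np.ones(4), A, B, C, A * B, A * C, B * C])

XtX = X.T @ X
H = np.linalg.pinv(XtX) @ XtX  # projector onto the row space of X

# An individual effect is estimable when its diagonal entry equals 1.
estimable = [n for i, n in enumerate(names) if np.isclose(H[i, i], 1.0)]

# Non-zero off-diagonal entries identify effects that are aliased.
aliases = sorted(
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(H[i, j]) > 1e-8
)

print("Estimable on their own:", estimable)
print("Alias pairs:", aliases)
```

In this small design each main effect is aliased with one two-way interaction, so only one effect from each alias pair can be kept in the model at a time.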
Residual analyses and transformations.
A large number of graphs and other output options are provided for
further analyses of residuals from a given model. Specifically, the
program will compute predicted (fitted) and residual values and
their standard errors, user-defined prediction intervals and
confidence intervals for the predicted (fitted) values, standardized
predicted and residual values, studentized residuals, deleted
residuals, studentized deleted residuals, leverage scores,
Mahalanobis and Cook distances, and DFFIT and standardized DFFIT
values. All of these residual statistics can be saved for further
analysis using other STATISTICA modules (e.g., in order to
analyze serial correlations of errors via the Time
Series module). Also, these residual statistics for each
observation can be reviewed in the order of the observation (case)
numbers, or displayed in the order sorted by their magnitudes; thus,
outliers with respect to any of the residual statistics can quickly
be identified. As further aids for evaluating the fit of the
respective model, and for identifying outliers, you can review
histograms of residual (and deleted residual) and predicted values,
scatterplots of (deleted) residual versus predicted values, or
normal, half-normal, and de-trended normal probability plots of
(deleted) residuals. Also, as a check for serial correlation of
residuals, you can plot the (deleted) residual values against the
case numbers. In all plots of individual observations (e.g.,
residual values for cases), the points are identified by their
respective case numbers or labels, and therefore, it is very easy to
identify outliers in a dataset. Finally, maximum-likelihood lambda
values can be computed for the Box-Cox transformation of the
response variables; a plot of the residual sums of squares versus
lambda, along with the confidence limit of lambda,
accompanies the results in the Box-Cox transformation plot.
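The residual sum of squares versus lambda computation can be sketched as follows. This is an illustrative Python/NumPy fragment under stated assumptions, not the program's own code: the data are synthetic and noise-free, the lambda grid is arbitrary, the function names are hypothetical, and the confidence limit for lambda is omitted. The transformed response is scaled by the geometric mean so that residual sums of squares are comparable across lambda values.

```python
import numpy as np

def boxcox_normalized(y, lam):
    """Box-Cox transform scaled by the geometric mean of y."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

def residual_ss(X, z):
    """Residual sum of squares of the least squares fit of z on X."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    return float(resid @ resid)

# Synthetic, noise-free data (an assumption for illustration): log(y) is
# exactly linear in x, so lambda = 0 should give the smallest residual SS.
x = np.linspace(0.0, 4.0, 15)
y = np.exp(1.0 + 0.5 * x)
X = np.column_stack([np.ones_like(x), x])

lambdas = np.linspace(-2.0, 2.0, 41)
rss = np.array([residual_ss(X, boxcox_normalized(y, lam)) for lam in lambdas])
best_lambda = lambdas[np.argmin(rss)]
print("lambda with the smallest residual SS:", best_lambda)
```

Plotting rss against lambdas gives the kind of curve described above; with real data the minimum is flatter and a confidence limit for lambda is needed to judge which transformations are plausible.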
Optimization
of single or multiple response variables: The response
(desirability) profiler. A unique set of options is
provided to allow the user to interactively optimize single or
multiple response variables, given the current model. First, for
second-order response surface models and mixture surface models, the
program will compute the factor settings associated with the
minimum, maximum, or saddle point value of the respective surface
(i.e., determine the critical value of the current surface, along
with the respective eigenvalues and eigenvectors, to indicate the
curvature and orientation of the quadratic response surface). Note
that for mixture designs, the desirability profiler options are not
based on a simple reparameterization of the mixture model to an
unconstrained surface model (which can lead to erroneous
results, such as optimum factor settings that are not valid
mixtures); instead all computations will be performed based on the
actual (currently fitted constrained) mixture model. Thus, when
searching for the optimum factor settings given the desirability
function for one or more response variables, it is assured that only
the constrained (mixture) experimental region is inspected, and that
the resulting factor settings sum to a valid mixture. Second, a
comprehensive set of graphical options is provided for visualizing
the predicted values of one or more response variables as a function
of each factor in the analysis, while holding all other factors
constant at particular values. Specifically, for multiple response
variables you can specify a desirability function that reflects the
most desirable value for each response variable, and the importance
of each variable for the overall desirability. Then you can plot the
profiles of the desirability function (computed from the predicted
values of each response variable) across a user-defined number of
levels for each factor. Also, the profiles for each individual
response variable, along with confidence intervals, can be displayed
in the same graph.
Moreover,
the desirability function can be plotted in 3D surface plots or
contour plots (desirability contours), and the user can request
matrices of such plots for all factors in the analysis (see the
illustration at left). All settings, such as the factor grid or the
desirability function, can quickly be modified for interactive
analyses (e.g., you can quickly exclude specific response variables
from the analysis, and observe the effect on the overall
desirability function). Also, the specifications for complex
desirability functions for many response variables can be saved to a
file, and later quickly retrieved when you want to analyze other
experiments using the same response variables. Finally, options are
provided for determining the optimum value of the desirability
function, either by using a grid search over the experimental
region, or by using an efficient general function optimization
algorithm (which is particularly useful for optimizing desirability
functions for experiments with many factors). Note that desirability
profiling options are also provided in STATISTICA
General Linear Models (GLM), General
Regression Models (GRM), and General
Discriminant Analysis Models (GDA) (for categorical
responses).
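A desirability analysis of the kind described above can be sketched in Python. The text does not state the exact form of the desirability functions used by the program, so this fragment assumes the common one-sided ramp functions combined through a weighted geometric mean; the two fitted models, the limits, the importance weights, and all function names are hypothetical.

```python
import itertools
import numpy as np

def desirability_larger(y, low, high, s=1.0):
    """0 at or below `low`, 1 at or above `high`, power ramp in between."""
    d = (np.asarray(y, dtype=float) - low) / (high - low)
    return np.clip(d, 0.0, 1.0) ** s

def desirability_smaller(y, low, high, s=1.0):
    """1 at or below `low`, 0 at or above `high`, power ramp in between."""
    d = (high - np.asarray(y, dtype=float)) / (high - low)
    return np.clip(d, 0.0, 1.0) ** s

def overall_desirability(ds, importance):
    """Weighted geometric mean of the individual desirabilities."""
    ds = np.asarray(ds, dtype=float)
    w = np.asarray(importance, dtype=float)
    if np.any(ds == 0.0):
        return 0.0  # one unacceptable response makes the setting unacceptable
    return float(np.exp(np.sum(w * np.log(ds)) / np.sum(w)))

# Hypothetical fitted models for two responses in coded factors x1 and x2.
def yield_model(x1, x2):  # response to be maximized
    return 80 + 5 * x1 + 3 * x2 - 4 * x1 ** 2 - 2 * x2 ** 2

def cost_model(x1, x2):   # response to be minimized
    return 20 + 4 * x1 + 1 * x2

def score(point):
    return overall_desirability(
        [desirability_larger(yield_model(*point), 70, 85),
         desirability_smaller(cost_model(*point), 15, 25)],
        importance=[2, 1],
    )

# Grid search over the coded experimental region [-1, 1] x [-1, 1].
grid = np.linspace(-1, 1, 21)
best = max(itertools.product(grid, grid), key=score)
print("best grid point:", best, "overall desirability:", score(best))
```

A grid search is adequate for two or three factors; for many factors the number of grid points grows quickly, which is why a general function optimizer is the more practical choice there.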
Standard
two-level 2**(k-p) fractional factorial designs with blocks
(Box-Hunter-Hunter minimum aberration designs). STATISTICA
Design of Experiments provides the complete catalog of all
standard (so-called, minimum aberration) designs (as, for example,
reproduced in the widely used textbooks by Box and Draper, 1987;
Box, Hunter, and Hunter, 1978; Montgomery, 1991). The user can
review designs in a Spreadsheet; the runs may be randomized (overall
or within blocks), and blank columns may be added to the
Spreadsheet. Options are provided for specifying the factor highs
and lows, and the design can be reviewed and saved in terms of the
coded factor levels or the original metric of factors. The user can
also request replications, add center points to the design, or add a
fold-over of the original design. The fractional design
generators and block generators of the design, as well as the matrix
of aliases of main effects and interactions can also be reviewed. STATISTICA
Design of Experiments will automatically perform a complete
ANOVA on the design. The user has full control over the effects and
interactions to be included in the model, and can review the
correlations among the columns of the design matrix (X) as
well as the inverse of the X'X matrix (i.e., the covariance
and correlation matrices of the parameter estimates). The program
will compute the ANOVA parameter estimates and their standard errors
and confidence intervals, the coefficients for the recoded (-1,
+1) factor values and their standard errors and confidence
intervals, and the coefficients (standard errors, confidence
intervals) for the untransformed factor values. Based on those
estimates, the program can compute predicted values (standard
errors, confidence intervals) for user-specified factor levels.
The
program will compute the complete ANOVA table, based on the
mean-square (ms) residual term, or, when the design is at least
partially replicated, based on the estimate of pure error. When a
pure error estimate is available, the program will also compute a
test for overall lack-of-fit; when the design contains center
points, the program will perform an overall curvature check. The
user can review the table of means and marginal means, and their
confidence intervals. Numerous options are available for reviewing
the results in graphs: Pareto charts of effects, normal and
half-normal probability plots of effects, square and cube plots,
means plots and interaction plots (with confidence intervals for
marginal means), response surface plots, and response contour plots.
In addition, all general features described above (under the
headings Design of experiments, Analysis of experiments: General
features, Residual analyses and transformations, and Optimization
of single or multiple response variables) are available, for
performing detailed analyses of residuals, to evaluate the fit of
the model, and for finding the optimum factor settings, given one
or more response variables.
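The construction of a fractional factorial design from its generators, and the effect of a fold-over, can be sketched in Python. This is a minimal illustration, not the program's catalog or search code; the function name and the choice of a 2**(3-1) design with generator C = AB are assumptions.

```python
import itertools
import numpy as np

def fractional_factorial(base_factors, generators):
    """Build a 2**(k-p) design in coded (-1, +1) units.

    base_factors: names of the k-p factors that form a full factorial.
    generators:   dict mapping each added factor to the base factors whose
                  product defines it, e.g. {"C": "AB"}.
    """
    runs = np.array(list(itertools.product([-1, 1], repeat=len(base_factors))))
    design = {name: runs[:, i] for i, name in enumerate(base_factors)}
    for new, word in generators.items():
        design[new] = np.prod([design[f] for f in word], axis=0)
    names = list(design)
    return names, np.column_stack([design[n] for n in names])

# Resolution III half fraction: 4 runs, 3 factors, I = ABC.
names, D = fractional_factorial("AB", {"C": "AB"})

# Full fold-over: repeat the design with every sign reversed.
combined = np.vstack([D, -D])

# In the half fraction A equals B*C in every run (A is aliased with BC);
# after the fold-over the A column is orthogonal to the B*C column.
print(names)
print(combined)
```

Here the fold-over adds the complementary half fraction, so the eight combined runs form the full 2**3 factorial and the main effects are no longer aliased with two-way interactions.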
Minimum
aberration and maximum unconfounding 2**(k-p) fractional factorial
designs with blocks: General design search. In addition to
the standard 2**(k-p) designs, STATISTICA Design of Experiments
includes a general design search option for generating minimum
aberration (least confounded) fractional factorial designs with or
without blocks with over 100 factors and over 2,000 runs. These
types of efficient designs have only recently been discovered and
they allow you to evaluate a greater number of (specific) factor
interactions than the standard Box-Hunter designs; STATISTICA
Design of Experiments is the only program that currently offers
this functionality. Given a desired resolution, you can either
perform a comprehensive search of all (non-isomorphic) sets of
generators, or specify particular sets of interactions that you
would like to keep unconfounded at the respective resolution. In
addition to the common search criterion of "minimum
aberration," you can also choose the criterion of "maximum
unconfounding" which will lead to the design with the largest
possible number of unconfounded effects (unconfounded with all other
effects, given the current resolution of the design). These designs
can be further enhanced in the same manner as the standard 2**(k-p)
designs described in the previous paragraph (by adding replications,
center points, foldover, etc.). Also, all analysis options described
in the previous paragraph are applicable to these designs (or any
arbitrary 2**(k-p) design).
See also the white paper entitled Minimum Aberration Designs
Are Not Maximally Unconfounded.
Screening (Plackett-Burman) designs. STATISTICA
Design of Experiments allows the user to design and analyze
screening designs for a large number of factors. The program will
generate Plackett-Burman (Hadamard matrix) designs and saturated
fractional factorial designs with up to 127 factors. As with
2**(k-p) designs, the user can request replications of the design,
manually add points, add center points, and print or save the
design. For the analysis of screening designs, the same options are
available as those described for the analysis of 2**(k-p) designs
(see the previous paragraphs).
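As an illustration of how a Plackett-Burman design is built, the following Python sketch constructs the 12-run design for up to 11 factors from the generating row published by Plackett and Burman (1946), using cyclic shifts plus a final run with all factors at the low level. The function and variable names are hypothetical, and the sketch covers only this one design size.

```python
import numpy as np

# Generating row of the 12-run Plackett-Burman design (coded -1/+1).
GENERATOR_12 = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]

def plackett_burman_12():
    """Return the 12-run, 11-column Plackett-Burman design."""
    rows = [np.roll(GENERATOR_12, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))  # final run: all factors low
    return np.array(rows)

design = plackett_burman_12()

# For an orthogonal two-level design, X'X is diagonal (12 on the diagonal).
gram = design.T @ design
print(design)
print("columns orthogonal:", np.array_equal(gram, 12 * np.eye(11)))
```

With fewer than 11 factors, the unused columns are simply dropped; the remaining columns stay mutually orthogonal.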
Mixed-level factorial designs. The
program also supports mixed designs (as enumerated for the National
Bureau of Standards of the U.S. Department of Commerce). The design
and analysis options available for those designs are identical to
those described for 3**(k-p) designs (see the following paragraph).
Three-level
3**(k-p) fractional factorial designs with blocks and Box-Behnken
designs. STATISTICA Design of Experiments contains a
complete implementation of the standard (blocked) 3**(k-p) designs.
Also included are the standard Box-Behnken designs. As with all
other designs, the user can display and save those designs in
standard or randomized order, request replications or add individual
runs, review the design and block generators, etc. The program will
perform a complete analysis for 3**(k-p) designs. The user has full
control over the effects that are to be included in the analysis.
The main effects are broken down into linear and quadratic effects,
and the interactions are broken down into linear-linear,
linear-quadratic, quadratic-linear, and quadratic-quadratic effects.
The user can review the correlation matrix of the design matrix (X)
as well as the inverse of X'X. The program will compute the
standard ANOVA parameter estimates (standard errors, confidence
intervals, statistical significance, etc.), coefficients for the
recoded (-1, 0, +1) factors, and coefficients for the
unrecoded factors. Based on those values, the program provides
options for computing predicted values (and standard errors,
confidence intervals) based on user-specified values of the factors.
The ANOVA table will include tests for the linear and quadratic
components of each effect as well as combined
multiple-degree-of-freedom tests for the effects. If the design
includes replications, then the estimate of pure error can be used
for the ANOVA and significance testing; in that case an overall
lack-of-fit test will also be performed.
To aid in the interpretation of results, the program will compute
the table of means (and confidence intervals) as well as marginal
means (and confidence intervals) for interactions. Graphical options
include plots of means and marginal means (with confidence
intervals), the Pareto chart of effects, normal and half-normal
probability plots of effects, and response surface and contour
plots. In addition, all general features described above (under the
headings Design of experiments, Analysis of experiments: General
features, Residual analyses and transformations, and Optimization
of single or multiple response variables) are available, for
performing detailed analyses of residuals, to evaluate the fit of
the model, and for finding the optimum factor settings, given one
or more response variables.
Central
composite (response surface) designs. The user can choose
from a catalog of standard designs, including small central
composite designs (based on Plackett-Burman designs). In addition to
the standard options available for all designs (adding runs,
randomization, replications, factor highs and lows, etc.; refer to
the description of 2**(k-p) designs) the user has the choice of
star-points that are face-centered, or computed for rotatability,
orthogonality, or both. The analysis options are very similar to
those described for 3**(k-p) and 2**(k-p) designs above. The user
can compute the ANOVA parameters, coefficients for the recoded
factor values, and the coefficients for the untransformed factors.
Predicted values for user-specified factor values can also be
computed. The user has full control over the effects to be included
in the model, and can review the correlation matrix for the design
matrix (X) as well as the inverse of X'X. If
replicates are available, the ANOVA table may include the estimate
of pure error, and an overall lack-of-fit test. The standard results
graphics options include the Pareto chart of effects, probability
plot of effects, and response surface and contour plots (if there
are more than two factors, for user-specified values of additional
factors). In addition, all general features described above (under
the headings Design of experiments, Analysis of experiments:
General features, Residual analyses and transformations, and
Optimization of single or multiple response variables) are
available, for performing detailed analyses of residuals, to
evaluate the fit of the model, and for finding the optimum factor
settings, given one or more response variables.
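The choice of star-point distance can be illustrated with a short Python sketch. It assumes a full-factorial cube portion and uses the standard rotatability value, the fourth root of the number of cube points; the value needed for orthogonality depends on the run counts and is not computed here. The function name and its arguments are hypothetical.

```python
import itertools
import numpy as np

def central_composite(k, alpha="rotatable", n_center=1):
    """Cube points + 2k star points + center points, in coded units."""
    cube = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    if alpha == "rotatable":
        a = len(cube) ** 0.25   # (number of cube points) ** (1/4)
    elif alpha == "face-centered":
        a = 1.0                 # star points lie on the faces of the cube
    else:
        a = float(alpha)        # any user-supplied axial distance
    star = np.vstack([a * np.eye(k), -a * np.eye(k)])
    center = np.zeros((n_center, k))
    return np.vstack([cube, star, center])

ccd = central_composite(2, "rotatable", n_center=3)
print(ccd)
```

For two factors the rotatable axial distance is the square root of 2, so the cube points and the star points all lie on one circle around the center.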
Latin squares. The user can choose
between different Latin square designs, with up to nine levels.
Whenever possible, the program will also make available Greco-Latin
squares and Hyper-Greco Latin squares. When there are several
alternative Latin squares available, the program will either choose
randomly from among them, or the user can select the desired Latin
square(s). Designs can be reviewed in a Spreadsheet, the run order
can be randomized, and blank columns may be added to create convenient data
entry forms. The design can also be saved in a standard STATISTICA
data file. After appending the observed data to this file, the
experiment can then be easily analyzed. In addition to the full
ANOVA table, STATISTICA Design of Experiments will compute
the means for all factors. These means can be plotted in a summary
plot.
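A Latin square and a Greco-Latin square can be constructed with a few lines of Python. This sketch uses the simple cyclic construction, which is only one of many possible squares and is not necessarily the one the program selects; the function names and the choice of five levels are assumptions.

```python
import numpy as np

def cyclic_latin_square(n, step=1):
    """Square with entry (i + step*j) mod n; Latin when gcd(step, n) == 1."""
    i, j = np.indices((n, n))
    return (i + step * j) % n

def is_latin(square):
    """Check that every symbol occurs once in each row and each column."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row.tolist()) == symbols for row in square)
    cols_ok = all(set(col.tolist()) == symbols for col in square.T)
    return rows_ok and cols_ok

rng = np.random.default_rng(0)
n = 5
latin = cyclic_latin_square(n)

# Randomize by permuting rows and columns; this keeps the Latin property.
randomized = latin[rng.permutation(n)][:, rng.permutation(n)]

# For odd n, the square built with step=2 is orthogonal to the first one:
# superimposing them gives a Greco-Latin square in which every ordered
# pair of symbols occurs exactly once.
greek = cyclic_latin_square(n, step=2)
pairs = set(zip(latin.ravel().tolist(), greek.ravel().tolist()))
print(randomized)
print("distinct symbol pairs:", len(pairs))
```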
Taguchi
robust design experiments. STATISTICA Design of
Experiments will generate orthogonal arrays for up to 31
factors; designs with up to 65 factors can be analyzed. As in all
other types of designs, the runs of the experiment can be
randomized, and the user can add blank columns to the Spreadsheet to
generate convenient data entry forms. The user can also examine the
aliases of two-way interactions. STATISTICA Design of Experiments
will automatically compute the standard signal-to-noise (S/N)
ratios for problems of these types: (1) Smaller-the-better,
(2) Nominal-the-best, (3) Larger-the-better, (4) Signed
target, (5) Fraction defective, and (6) Number
defective per interval (accumulation analysis). In
addition, untransformed data can also be analyzed; thus, the user
can produce any type of customized S/N ratios via STATISTICA
Visual Basic and analyze them with this procedure. In addition
to comprehensive descriptive statistics, the user can review the
computed S/N ratios. The full ANOVA results are displayed in
an interactive Spreadsheet in which the user can "toggle"
effects into or out of the error term. A similar interactive
Spreadsheet allows the user to predict Eta (the S/N
ratio) under optimum conditions, that is, settings of levels of
factors. Again, the user can "toggle" effects into or out
of the model, and specify particular levels for factors. Finally,
the means can be summarized in a standard main effect plot of Eta
by factor level; if an accumulation analysis on categorical data is
performed, the results can be summarized in a stacked bar plot as
well as line plots of the cumulative probabilities across categories
for the levels of selected factors. Note that different types of
response desirability functions for single or multiple variables can
also be optimized via the response (desirability) profiler
described earlier, available in conjunction with 2**(k-p), 3**(k-p),
central composite designs, etc. (or in GLM,
GRM, GDA).
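The signal-to-noise ratios named above can be written out in Python. The formulas below are the forms commonly given in the Taguchi literature; that the program uses exactly these definitions (for example, the n-1 divisor in the variance) is an assumption, and the function names are hypothetical. The accumulation analysis is not covered.

```python
import numpy as np

def sn_smaller_the_better(y):
    """Eta = -10 * log10(mean of y squared)."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(y ** 2))

def sn_larger_the_better(y):
    """Eta = -10 * log10(mean of 1 / y squared)."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.mean(1.0 / y ** 2))

def sn_nominal_the_best(y):
    """Eta = 10 * log10(mean squared / variance)."""
    y = np.asarray(y, dtype=float)
    return 10 * np.log10(np.mean(y) ** 2 / np.var(y, ddof=1))

def sn_signed_target(y):
    """Eta = -10 * log10(variance)."""
    y = np.asarray(y, dtype=float)
    return -10 * np.log10(np.var(y, ddof=1))

def sn_fraction_defective(p):
    """Eta = -10 * log10(p / (1 - p)) for a fraction defective p."""
    return -10 * np.log10(p / (1.0 - p))

# Two repeated measurements in one run of the orthogonal array.
run = [9.0, 11.0]
print(sn_smaller_the_better(run), sn_larger_the_better(run),
      sn_nominal_the_best(run), sn_signed_target(run))
```

In every case a larger Eta is better, so the optimum factor levels are those that maximize the mean Eta.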
Designs
for mixtures and triangular graphs. This procedure includes
options for designing the simplex-lattice and simplex-centroid
designs for mixture variables. These designs can be enhanced by
additional interior points and a centroid. The user can enter
lower-bound constraints for each factor, and the program will
automatically construct the respective design in the sub-simplex
defined by the constraints. Multiple upper and lower constraints can
be handled via the general facilities for constructing designs in
constrained experimental regions (see below). The user can add
individual runs or replications, and display and save the design in
standard or randomized order. The program will compute the
coefficients for the pseudo-components and the components in their
original metric, along with the standard errors, confidence
intervals, and tests of statistical significance. (Note that the STATISTICA
General Linear Models (GLM) module also includes facilities for
analyzing mixture experiments; those options are particularly useful
for analyzing designs that combine both mixture and non-mixture
variables in complex designs.) The user has full control over the
terms that are to be included in the model; standard models include
the linear, quadratic, special cubic, and full cubic models. The
ANOVA table will include tests for the incremental fit of the
different models, and if the design includes replicated runs, a test
for lack-of-fit based on the estimate of pure error will also be
computed. Results options include the table of means, the
correlations for the columns of the design matrix (X), the
inverse of the X'X matrix (the variance/covariance
matrix for the parameter estimates), the Pareto chart, probability
plots of parameter estimates, etc. Also, the user can compute
predicted values, based on user-defined values of the factors.
Specialized graphs to summarize the results of mixture experiments
include response trace plots for user-defined reference blends, and
triangular surface and contour plots. If there are more than 3
components in the experiment, then the surface and contour plots can
be produced for user-defined values of the additional components.
Finally, all general features described above (under the headings Design
of experiments, Analysis of experiments: General features, Residual
analyses and transformations, and Optimization of single or
multiple response variables) are available, for performing
detailed analyses of residuals, to evaluate the fit of the model,
and for finding the optimum factor settings, given one or more
response variables. Note that the response (desirability) profiler
options available for mixture designs are not based on a simple
reparameterization of the mixture model to an unconstrained surface
model; instead all computations will be performed based on the
actual (fitted) mixture model. Thus, when searching for the optimum
factor settings given the desirability function for one or more
response variables, it is assured that only the constrained
(mixture) experimental region is inspected, and that the resulting
factor settings sum to a valid mixture.
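The simplex-lattice construction and the mapping from pseudo-components to original proportions can be sketched in Python. This is a minimal illustration under stated assumptions: the function names, the {3, 2} lattice, and the lower bounds are hypothetical, and only lower-bound constraints are handled.

```python
import itertools
import numpy as np

def simplex_lattice(q, m):
    """{q, m} simplex-lattice: proportions 0, 1/m, ..., 1 that sum to 1."""
    points = [p for p in itertools.product(range(m + 1), repeat=q)
              if sum(p) == m]
    return np.array(points, dtype=float) / m

def from_pseudo_components(pseudo, lower):
    """Map pseudo-components to original proportions given lower bounds."""
    lower = np.asarray(lower, dtype=float)
    return lower + (1.0 - lower.sum()) * pseudo

lattice = simplex_lattice(3, 2)  # 3 pure blends and 3 binary 50:50 blends
blends = from_pseudo_components(lattice, [0.2, 0.1, 0.3])
print(blends)
```

Each row of blends still sums to 1 and respects the lower bounds, so the design lies in the sub-simplex defined by the constraints.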
Designs
for constrained surfaces and mixtures. STATISTICA Design
of Experiments contains procedures for computing vertex and
centroid points for constrained surfaces and mixtures defined by
linear constraints. The user can enter upper and lower limits for
the factors, and specify any additional linear constraints (of the
form A1*x1 + ... + An*xn + A0 >= 0) on the factor values. The program will
then compute the vertex points, and optional centroid points, for
the constrained region. The constraints will be processed
sequentially, and unnecessary constraints will be identified. There
are numerous additional options for reviewing the characteristics of
the constrained region. The user can review the vertex and centroid
points in 3D and triangular scatterplots (for mixtures). The
correlation matrix for the columns of the design matrix X,
for various standard types of designs, can also be computed as well
as the inverse of the X'X matrix (i.e., the
variance/covariance matrix of the parameter estimates). This allows
the user to evaluate the characteristics of the design, based on the
vertex and centroid points. These points can then be submitted to
the optimal design facilities (see below), to construct designs with
the minimum number of runs.
D- and A-optimal designs. The program includes several
algorithms for constructing optimal designs. The user can choose
between the D (determinant) optimality and the A (or
trace) optimality criterion, and specify models for surfaces and
mixtures. A list of candidate points for the design can be entered
by hand or retrieved from a STATISTICA data file (e.g., a
design previously created via the facilities for computing vertex
and centroid points for constrained surfaces and mixtures, see
above). Points in the candidate list can be marked for forced
inclusion in the final design; thus, the user can enhance or
"repair" existing experiments. The program includes all
common search algorithms developed for constructing D- and A-optimal
designs: Dykstra's sequential search procedure, the Wynn-Mitchell
simple exchange procedure, the Mitchell DETMAX procedure
(exchange with excursions), Fedorov's simultaneous switching
procedure, and a modified simultaneous switching procedure. For the
final design, the program will compute the determinant of X'X and
the D, A, and G efficiencies. The user can also review
the correlation matrix for the columns of the final design matrix (X),
and the inverse of the X'X matrix (the variance/covariance
matrix of parameter estimates). The final design points can be
visualized in 3D and triangular scatterplots (for mixtures).
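The D and A criteria for a final design can be computed as in the following Python sketch. It does not implement any of the search algorithms listed above; it only evaluates two fixed four-run designs for a first-order model in two factors. The efficiency definitions used here (relative to an orthogonal design in coded units) are the commonly cited ones and are assumed, not taken from the program; G-efficiency is omitted and all names are hypothetical.

```python
import itertools
import numpy as np

def model_matrix(points):
    """Intercept plus main effects for each design point (coded units)."""
    points = np.asarray(points, dtype=float)
    return np.column_stack([np.ones(len(points)), points])

def design_criteria(points):
    """Determinant of X'X and the D- and A-efficiencies in percent."""
    X = model_matrix(points)
    n, p = X.shape
    XtX = X.T @ X
    det = np.linalg.det(XtX)
    inv = np.linalg.inv(XtX)
    return {
        "det": det,
        "D_eff": 100 * det ** (1 / p) / n,
        "A_eff": 100 * p / (n * np.trace(inv)),
    }

# The four corners of the square versus a design with one point moved inward.
corners = list(itertools.product([-1, 1], repeat=2))
shrunk = [(-1, -1), (1, -1), (-1, 1), (0.5, 0.5)]
full = design_criteria(corners)
alt = design_criteria(shrunk)
print(full)
print(alt)
```

A search algorithm would repeatedly exchange design points with candidate points and keep the exchanges that improve the chosen criterion; the comparison of the two designs above shows the quantity being improved.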
Alternative procedures for analyzing data
collected in experiments. STATISTICA includes an
extremely large number of computational methods for analyzing data
collected in experiments, and for fitting ANOVA/ANCOVA-like
designs to continuous or categorical outcome variables.
Specifically, STATISTICA includes complete implementations
of:
- General Linear Models (GLM) and General Regression Models (GRM)
(available in STATISTICA Advanced Linear/Non-Linear Models) with
sophisticated model-building procedures (stepwise and
best-subset selection of predictor effects),
- Generalized Linear Models (GLZ) (available in STATISTICA
Advanced Linear/Non-Linear Models), which also
offers stepwise and best-subset selection of predictor
effects in ANOVA/ANCOVA-like designs, for various
popular alternatives to linear least squares models,
such as logit, multinomial logit, and probit models,
- General Discriminant Analysis Models (GDA) (available in
STATISTICA Multivariate Exploratory Techniques), which
allows you to use ANOVA/ANCOVA-like experimental
designs for classification, and to use stepwise and
best-subset selection of predictor effects; GDA also
includes desirability profiler and response optimization
methods, which can be used to determine the factor
combinations, levels, and/or values that maximize the
posterior classification probabilities for one or more
categories of the dependent (outcome) variable,
- General Classification and Regression Trees Models (available
in STATISTICA Multivariate Exploratory Techniques), and General
CHAID models (available in STATISTICA
Enterprise-wide Data Mining System), which allow
you to evaluate the efficacy of ANOVA/ANCOVA-like
experimental designs for building highly non-linear
hierarchical classification or regression trees.
Thus, STATISTICA can be applied to quality-improvement
research in creative and innovative ways, when the dependent
variables of interest are categorical in nature, or when the effect
of the predictor variables (effects) is clearly non-linear in
nature.
STATISTICA Design of Experiments is an add-on package that
requires a base product such as STATISTICA
Base or STATISTICA Quality Control
Charts.