Features of STATISTICA Multivariate Exploratory Techniques
STATISTICA Multivariate Exploratory Techniques offers a broad
selection of exploratory techniques, from cluster
analysis to advanced classification tree methods, with a wide
array of interactive visualization tools for exploring relationships
and patterns, and complete built-in Visual Basic scripting.
STATISTICA Multivariate Exploratory Techniques is
compatible with Windows 95, Windows 98, Windows Me, Windows NT,
Windows 2000, and Windows XP. It features the following modules:
Cluster Analysis Techniques
Factor Analysis
Principal Components & Classification Analysis
Canonical Correlation Analysis
Reliability/Item Analysis
Classification Trees
Correspondence Analysis
Multidimensional Scaling
Discriminant Analysis
General Discriminant Analysis Models (GDA)

CLUSTER ANALYSIS. This module includes a
comprehensive implementation of clustering methods (k-means,
hierarchical clustering, 2-way joining). The program can
process data from either raw data files or matrices of
distance measures (e.g., correlation matrices). The user can
cluster cases, variables, or both based on a wide variety of
distance measures (including Euclidean, squared Euclidean,
City-block (Manhattan), Chebychev, Power distances, Percent
disagreement, and 1-r) and amalgamation/linkage rules
(including single, complete, weighted and unweighted group
average or centroid, Ward's method, and others). Matrices of
distances can be saved for further analysis with other
modules of the STATISTICA system. In k-means
clustering, the user has full control over the initial
cluster centers. Extremely large analysis designs can be
processed; for example, hierarchical (tree) joining can
analyze matrices with over 1,000 variables, or with over 1
million distances. In addition to the standard cluster
analysis output, a comprehensive set of descriptive
statistics and extended diagnostics (e.g., the complete
amalgamation schedule with cohesion levels in hierarchical
clustering, the ANOVA table in k-means clustering) is
available. Cluster membership data can be appended to the
current data file for further processing. Graphics options
in the Cluster Analysis module include customizable
tree diagrams, discrete contour-style two-way joining matrix
plots, plots of amalgamation schedules, plots of means in k-means
clustering, and many others.
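
The distance measures named above can be sketched in a few lines of Python. This is an illustrative sketch only, not the module's implementation: the function names are hypothetical, and the power distance is assumed to take the common form (sum of |x - y|^p)^(1/r).

```python
import math

def euclidean(x, y):
    # Straight-line distance between two cases (or variables).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def squared_euclidean(x, y):
    # Gives progressively more weight to objects that are further apart.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def city_block(x, y):
    # Manhattan distance: sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def chebychev(x, y):
    # Largest absolute difference on any single dimension.
    return max(abs(a - b) for a, b in zip(x, y))

def power_distance(x, y, p=2.0, r=2.0):
    # Assumed form: (sum |a - b|^p)^(1/r); p = r = 2 gives Euclidean.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / r)

def percent_disagreement(x, y):
    # Share of dimensions on which two objects differ (categorical data).
    return sum(a != b for a, b in zip(x, y)) / len(x)

def one_minus_r(x, y):
    # 1 - Pearson r; undefined if either input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)
```

For example, `euclidean([0, 0], [3, 4])` and `city_block([0, 0], [3, 4])` differ for the same pair of points, which is why the choice of measure can change the resulting clusters.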
FACTOR ANALYSIS. The Factor Analysis module contains
a wide range of statistics and options, and provides a
comprehensive implementation of factor (and hierarchical
factor) analytic techniques with extended diagnostics and a
wide variety of analytic and exploratory graphs. It will
perform principal components, common, and hierarchical
(oblique) factor analysis, and can handle extremely large
analysis problems (e.g., with thousands of variables).
Confirmatory factor analysis (as well as path analysis) can
also be performed via the Structural
Equation Modeling and Path Analysis (SEPATH) module
found in the add-on STATISTICA Advanced Linear/Non-Linear
Models.
PRINCIPAL COMPONENTS & CLASSIFICATION ANALYSIS.
STATISTICA also includes a dedicated program for
principal components and classification analysis. The output
includes eigenvalues (regular, cumulative, relative), factor
loadings, factor scores (which can be appended to the input
data file, reviewed graphically as icons, and interactively
recoded), and a number of more technical statistics and
diagnostics. Available rotations include Varimax, Equimax,
Quartimax, Biquartimax (either normalized or raw), and
Oblique rotations. The factorial space can be plotted and
reviewed "slice by slice" in either 2D or 3D
scatterplots with labeled variable-points; other integrated
graphs include Scree plots, various scatterplots, bar and
line graphs, and others. After a factor solution is
determined, the user can recalculate (i.e., reconstruct) the
correlation matrix from the respective number of factors to
evaluate the fit of the factor model. Both raw data files
and matrices of correlations can be used as input.
Confirmatory factor analysis and other related analyses can
be performed with the Structural
Equation Modeling and Path Analysis (SEPATH) module
available in STATISTICA Advanced Linear/Non-Linear Models,
where a designated Confirmatory Factor Analysis Wizard
will guide you step by step through the process of
specifying the model.
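
The eigenvalue summary and the reconstruction of the correlation matrix described above can be illustrated with a short Python sketch. It is not the program's code: the function names are hypothetical, the loadings are assumed to come from an orthogonal solution, and the relative eigenvalues rely on the fact that the eigenvalues of a correlation matrix sum to the number of variables.

```python
def eigenvalue_table(eigenvalues, n_variables):
    # Returns (eigenvalue, relative, cumulative, cumulative relative) rows.
    # For a correlation matrix the total variance equals n_variables.
    rows = []
    cumulative = 0.0
    for value in eigenvalues:
        cumulative += value
        rows.append((value, value / n_variables,
                     cumulative, cumulative / n_variables))
    return rows

def reproduced_correlations(loadings):
    # loadings[i][k] is the loading of variable i on factor k.
    # The reproduced matrix is L * L-transpose; its diagonal holds
    # the communalities rather than 1.0.
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in loadings]
            for row_i in loadings]

def residual_correlations(observed, loadings):
    # Observed minus reproduced correlations; small off-diagonal
    # residuals suggest the retained factors fit well.
    reproduced = reproduced_correlations(loadings)
    return [[o - r for o, r in zip(obs_row, rep_row)]
            for obs_row, rep_row in zip(observed, reproduced)]
```

With one factor and loadings of 0.8 and 0.6, the reproduced correlation between the two variables is 0.8 * 0.6 = 0.48; comparing that value with the observed correlation is the fit check the text refers to.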
CANONICAL CORRELATION ANALYSIS.
This module offers a comprehensive implementation of
canonical analysis procedures; it can process raw data files
or correlation matrices and it computes all of the standard
canonical correlation statistics (including eigenvectors,
eigenvalues, redundancy coefficients, canonical weights,
loadings, extracted variances, significance tests for each
root, etc.) and a number of extended diagnostics. The scores
of canonical variates can be computed for each case,
appended to the data file, and visualized via integrated
icon plots. The Canonical Analysis module also
includes a variety of integrated graphs (including plots of
eigenvalues, canonical correlations, scatterplots of
canonical variates, and many others). Note that confirmatory
analyses of structural relationships between latent
variables can also be performed via the SEPATH
(Structural Equation Modeling and Path Analysis)
module in STATISTICA Advanced Linear/Non-Linear Models;
advanced stepwise and best-subset selection of predictor
variables for MANOVA/MANCOVA designs (with multiple
dependent variables) is available in the General
Regression Models (GRM) module in STATISTICA
Advanced Linear/Non-Linear Models.
RELIABILITY/ITEM ANALYSIS.
This module includes a comprehensive selection of procedures
for the development and evaluation of surveys and
questionnaires. As in all other modules of STATISTICA,
extremely large designs can be analyzed. The user can
calculate reliability statistics for all items in a scale,
interactively select subsets, or obtain comparisons between
subsets of items via the "split-half" (or
split-part) method. In a single run, the user can evaluate
the reliability of a sum-scale as well as subscales. When
interactively deleting items, the new reliability is
computed instantly without processing the data file again.
The output includes correlation matrices and descriptive
statistics for items, Cronbach's alpha, the
standardized alpha, the average inter-item
correlation, the complete ANOVA table for the scale, the
complete set of item-total statistics (including multiple
item-total R's), the split-half reliability, and the
correlation between the two halves corrected for
attenuation. A selection of graphs (including various
integrated scatterplots, histograms, line plots and other
plots) and a set of interactive what-if procedures
are provided to aid in the development of scales. For
example, the user can calculate the expected reliability
after adding a particular number of items to the scale, and
can estimate the number of items that would have to be added
to the scale in order to achieve a particular reliability.
Also, the user can estimate the correlation corrected for
attenuation between the current scale and another measure
(given the reliability of the current scale).
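
The reliability statistics and what-if calculations described above follow well-known formulas, sketched here in Python. This is a minimal illustration, not the module's implementation: the function names are hypothetical, and the what-if calculations are assumed to use the Spearman-Brown formula.

```python
import math

def variance(values):
    # Sample variance (n - 1 in the denominator).
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(items):
    # items: one list of scores per item, respondents in the same order.
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(column) for column in items)
    return k / (k - 1) * (1.0 - item_variance / variance(totals))

def spearman_brown(reliability, length_factor):
    # Expected reliability if the scale becomes length_factor times
    # as long (e.g. 2.0 means doubling the number of items).
    return (length_factor * reliability
            / (1.0 + (length_factor - 1.0) * reliability))

def length_factor_needed(reliability, target):
    # Factor by which the scale must be lengthened to reach target.
    return target * (1.0 - reliability) / (reliability * (1.0 - target))

def correct_for_attenuation(r_xy, reliability_x, reliability_y=1.0):
    # Estimated correlation between the underlying true scores.
    return r_xy / math.sqrt(reliability_x * reliability_y)
```

For instance, doubling a scale with a reliability of 0.5 gives `spearman_brown(0.5, 2.0)`, about 0.67; multiplying the length factor returned by `length_factor_needed` by the current number of items gives the required scale length.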

CLASSIFICATION TREES. STATISTICA's
Classification Trees module provides a comprehensive
implementation of the most recently developed algorithms for
efficiently producing and testing the robustness of
classification trees (a classification tree is a rule for
predicting the class of an object from the values of its
predictor variables). Advanced methods for tree
classification, including flexible options for model
building and interactive tools to explore the trees, are also
available in the General Classification and Regression
Tree Models (GTrees) and General CHAID (Chi-square
Automatic Interaction Detection) models facilities.
Classification trees can be produced using categorical
predictor variables, ordered predictor variables, or both,
and using univariate splits or linear combination splits.
Analysis options include performing exhaustive splits (as in
THAID and C&RT) or discriminant-based
splits; unbiased variable selection (as in QUEST);
direct stopping rules (as in FACT) or bottom-up
pruning (as in C&RT); pruning based on
misclassification rates or on the deviance function;
generalized Chi-square, G-square, or Gini-index
goodness of fit measures. Priors and misclassification costs
can be specified as equal, estimated from the data, or
user-specified. The user can also specify the v value for
v-fold cross-validation during tree building, the v value
for v-fold cross-validation for error estimation, the
size of the SE rule, the minimum node size before pruning, the seeds
for random number generation, and the alpha value for
variable selection. Integrated graphics options are provided
to explore the input and output data.
See Also: General Classification and Regression Trees (GTrees);
General CHAID (Chi-square Automatic Interaction Detection) Models
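
To make the idea of an exhaustive univariate split with the Gini index concrete, the following Python sketch searches every cut point on one ordered predictor. It is an illustration under simplifying assumptions (equal priors and misclassification costs, a single predictor, hypothetical function names) and does not reproduce the module's algorithms.

```python
def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    # Try every midpoint between distinct adjacent predictor values
    # and keep the one with the lowest weighted child impurity.
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best_threshold, best_impurity = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between tied values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity
```

A full tree repeats this search over all predictors at each node and then applies a stopping or pruning rule of the kind listed above.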

CORRESPONDENCE ANALYSIS. This module features a
full implementation of simple and multiple correspondence
analysis techniques, and can analyze even extremely large
tables. The program will accept input data files with
grouping (coding) variables that are to be used to compute
the crosstabulation table, data files that contain
frequencies (or some other measure of correspondence,
association, similarity, confusion, etc.) and coding
variables that identify (enumerate) the cells in the input
table, or data files with frequencies (or other measure of
correspondence) only (e.g., the user can directly type in
and analyze a frequency table). For multiple correspondence
analysis the user can also directly specify a Burt
table as input for the analysis. The program will compute
various tables, including the table of row percentages,
column percentages, total percentages, expected values,
observed minus expected values, standardized deviates, and
contributions to the Chi-square values. The Correspondence
Analysis module will compute the generalized eigenvalues
and eigenvectors, and report all standard diagnostics
including the singular values, eigenvalues, and proportions
of inertia for each dimension. The user can either manually
choose the number of dimensions, or specify a cutoff value
for the maximum cumulative percent of inertia. The program
will compute the standard coordinate values for column and
row points. The user has the choice of row-profile
standardization, column-profile standardization, row and
column profile standardization, or canonical
standardization. For each dimension and row or column point,
the program will compute the inertia, quality, and
cosine-square values. In addition, the user can display (in
spreadsheets) the matrices of the generalized singular
vectors; like the values in all spreadsheets, these matrices
can be accessed via STATISTICA Visual Basic, for
example, in order to implement non-standard methods of
computing the coordinates. The user can compute coordinate
values and related statistics (quality and cosine-square
values) for supplementary points (row or column), and
compare the results with the regular row and column points.
Supplementary points can also be specified for multiple
correspondence analysis. In addition to the 3D histograms
that can be computed for all tables, the user can produce a
line plot for the eigenvalues, and 1D, 2D, and 3D plots for
the row or column points. Row and column points can also be
combined in a single graph, along with any supplementary
points (each type of point will use a different color and
point marker, so the different types of points can easily be
identified in the plots). All points are labeled, and an
option is available to truncate the names for the points to
a user-specified number of characters.
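
Several of the tables listed above can be derived directly from a frequency table, as the following Python sketch shows. It is illustrative only: the function name is hypothetical, and standardized deviates are assumed to be (observed - expected) / sqrt(expected). The total inertia is the Chi-square statistic divided by the grand total.

```python
import math

def correspondence_tables(table):
    # table: list of rows of non-negative frequencies.
    grand_total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    # Expected frequencies under independence of rows and columns.
    expected = [[r * c / grand_total for c in column_totals]
                for r in row_totals]
    deviates = [[(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
                for obs_row, exp_row in zip(table, expected)]
    # Each cell's contribution to Chi-square is its squared deviate.
    contributions = [[d * d for d in row] for row in deviates]
    chi_square = sum(sum(row) for row in contributions)
    return {
        "row_percentages": [[100.0 * o / r for o in row]
                            for row, r in zip(table, row_totals)],
        "expected": expected,
        "standardized_deviates": deviates,
        "contributions": contributions,
        "chi_square": chi_square,
        "total_inertia": chi_square / grand_total,
    }
```

The eigenvalues reported for the individual dimensions add up to this total inertia, which is why the proportions of inertia can be read as shares of the Chi-square statistic.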
MULTIDIMENSIONAL SCALING. The Multidimensional Scaling module
includes a full implementation of (nonmetric)
multidimensional scaling. Matrices of similarities,
dissimilarities, or correlations between variables (i.e.,
"objects" or cases) can be analyzed. The starting
configuration can be computed by the program (via principal
components analysis) or specified by the user. The program
employs an iterative procedure to minimize the stress value
and the coefficient of alienation. The user can monitor the
iterations and inspect the changes in these values. The
final configurations can be reviewed via spreadsheets, and
via 2D and 3D scatterplots of the dimensional space with
labeled item-points. The output includes the values for the
raw stress (raw Φ), Kruskal stress coefficient S,
and the coefficient of alienation. The goodness of fit can
be evaluated via Shepard diagrams (with d-hats and d-stars).
Like all other results in STATISTICA, the final
configuration can be saved to a data file.
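
The stress measures can be sketched in Python as follows. This is an illustration, not the module's code: the function names are hypothetical, the disparities (d-hats) are assumed to be already available from the monotone regression step, and the Kruskal coefficient is assumed to be the usual stress-1 formula.

```python
import math

def raw_stress(distances, disparities):
    # Sum of squared differences between the distances reproduced by
    # the configuration and the monotonically transformed input data.
    return sum((d - d_hat) ** 2 for d, d_hat in zip(distances, disparities))

def kruskal_stress(distances, disparities):
    # Stress-1: raw stress scaled by the sum of squared distances,
    # so that the value does not depend on the size of the configuration.
    scale = sum(d * d for d in distances)
    return math.sqrt(raw_stress(distances, disparities) / scale)
```

A value of 0 means the configuration reproduces the rank order of the input dissimilarities perfectly; the iterative procedure moves the points so as to lower this value.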

DISCRIMINANT ANALYSIS. The Discriminant
Analysis module is a full implementation of multiple
stepwise discriminant function analysis. STATISTICA
also includes the General Discriminant
Analysis Models module (below) for fitting
ANOVA/ANCOVA-like designs to categorical dependent
variables, and to perform various advanced types of analyses
(e.g., best subset selection of predictors, profiling of
posterior probabilities, etc.). The Discriminant
Analysis program will perform forward or backward
stepwise analyses, or enter user-specified blocks of
variables into the model. In addition to the numerous
graphics and diagnostics describing the discriminant
functions, the program also provides a wide range of options
and statistics for the classification of old or new
cases (for validation of the model). The output includes the
respective Wilks' lambdas, partial lambdas, F
to enter (or remove), the p levels, the tolerance
values, and the R-square. The program will perform a
full canonical analysis and report the raw and cumulative
eigenvalues for all roots, and their p levels, the
raw and standardized discriminant (canonical) function
coefficients, the structure coefficient matrix (of factor
loadings), the means for the discriminant functions, and the
discriminant scores for each case (which can also be
automatically appended to the data file). Integrated graphs
include histograms of the canonical scores within each group
(and all groups combined), special scatterplots for pairs of
canonical variables (where group membership of individual
cases is visibly marked), a comprehensive selection of
categorized (multiple) graphs allowing the user to explore
the distribution and relations between dependent variables
across the groups (including multiple box-and-whisker plots,
histograms, scatterplots, and probability plots), and many
others. The Discriminant Analysis module will also
compute the standard classification functions for each
group. The classification of cases can be reviewed in terms
of Mahalanobis distances, posterior probabilities, or
actual classifications, and the scores for individual cases
can be visualized via exploratory icon plots and other
multidimensional graphs integrated directly with the results
spreadsheets. All of these values can be automatically
appended to the current data file for further analyses. The
summary classification matrix of the number and percent of
correctly classified cases can also be displayed. The user
has several options to specify the a priori
classification probabilities and can specify selection
conditions to include or exclude selected cases from the
classification (e.g., to validate the classification
functions in a new sample).
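
The link between Mahalanobis distances, a priori probabilities, and posterior probabilities can be illustrated with a short Python sketch. It assumes multivariate normal groups with a common covariance matrix, and the function names are hypothetical; it is not the module's implementation.

```python
import math

def posterior_probabilities(squared_distances, priors):
    # squared_distances[k]: squared Mahalanobis distance of the case
    # from the centroid of group k; priors must all be positive.
    # Posterior is proportional to prior * exp(-D^2 / 2).
    log_scores = [math.log(p) - 0.5 * d2
                  for d2, p in zip(squared_distances, priors)]
    top = max(log_scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - top) for score in log_scores]
    total = sum(weights)
    return [w / total for w in weights]

def classify(squared_distances, priors):
    # Assign the case to the group with the highest posterior.
    posteriors = posterior_probabilities(squared_distances, priors)
    return max(range(len(posteriors)), key=lambda k: posteriors[k])
```

With equal priors the case is simply assigned to the nearest group centroid; unequal priors shift borderline cases toward the group that is more common a priori.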
GENERAL DISCRIMINANT ANALYSIS MODELS (GDA). The STATISTICA
General Discriminant Analysis Models (GDA) module is an
application and extension of the General Linear Model
to classification problems. Like the Discriminant
Analysis module, GDA allows you to perform
standard and stepwise discriminant analyses. GDA
implements the discriminant analysis problem as a special
case of the general linear model, which makes the model-building
and diagnostic techniques of the general linear model available
for classification problems. As in traditional
discriminant analysis, GDA allows you to specify a
categorical dependent variable. For the analysis, the group
membership (with regard to the dependent variable) is then
coded into indicator variables, and all methods of GRM
can be applied. In the results dialogs, the extensive
selection of residual statistics of GRM and GLM
is available in GDA as well. GDA provides
powerful and efficient tools for data mining as well as
applied research. GDA will compute all standard
results for discriminant analysis, including discriminant
function coefficients, canonical analysis results
(standardized and raw coefficients, step-down tests of
canonical roots, etc.), classification statistics (including
Mahalanobis distances, posterior probabilities, actual
classification of cases in the analysis sample and
validation sample, misclassification matrix, etc.).
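
The coding of group membership into indicator variables can be pictured with a minimal Python sketch. The one-column-per-group scheme and the function name are assumptions made for illustration; the parameterization that GDA actually uses may differ.

```python
def indicator_columns(groups):
    # groups: the categorical dependent variable, one entry per case.
    # Returns the sorted group levels and, for each case, a row with
    # 1 in the column of its own group and 0 elsewhere.
    levels = sorted(set(groups))
    rows = [[1 if group == level else 0 for level in levels]
            for group in groups]
    return levels, rows
```

Each indicator column can then be treated as a dependent variable in a linear model, which is what allows the general linear model machinery to be applied to classification.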