STATISTICA 7 -
New Features and Enhancements


  Enhancements to Existing Functionality
      Extended Import/Export
      Enhanced Graph Updating
      Interactive Graphs, Brushing
      "By Group" Analysis for Statistics and Graphs
      Variable and Case Metadata
          Case Metadata
          Variable Metadata
      Workbook Multi-Item Display
      New Recording / Reporting Options for Case Selection Conditions
      Web Browser Document Type
      Enhanced Text Importing
      Automatic Variable Classification
      Licensing Changes
      Sorting
      Merge Data
      Stacking/Unstacking
      Enhanced Spreadsheet Formulas and Case Selection Conditions
      Further Expanded STATISTICA Visual Basic Functionality
      "All Values" Categorization Method
      Basic Statistics
      Quality Control
      Process Analysis
  SEWSS Enhancements
      Aggregated Data
      Sets
      21 CFR Part 11 Compliance
  New Products and Analysis Modules
      STATISTICA NIPALS Algorithm (PCA/PLS)
      STATISTICA Sequence, Association and Link Analysis
      STATISTICA Multivariate Statistical Process Control (MSPC)
      Random Forests


Enhancements to Existing Functionality:

Extended Import/Export

Added support for importing and exporting to/from:

  • SAS Data Files (binary)
  • SAS Transport Files
  • SPSS Data Files (binary)
  • SPSS Portable Files (Replaces V6 SPSS POR import functionality)
  • Minitab Data Files (binary)
  • JMP Data Files (binary)
Enhanced Graph Updating
  • Support for maintaining integrated "data-graphs" exploratory environments.
  • STATISTICA Graphs will update when the source Spreadsheet data change even after the respective STATISTICA analyses are closed.
  • Graphs can be re-linked to new Spreadsheets and Variables, allowing currently customized graphs (titles, scaling, embedded objects, bar shading, etc.) to be used as "Templates" for deployment to different data sets.
Interactive Graphs
  • Tight integration between Graphs and their source Spreadsheets.
  • Brush points on Scatterplots and the Cases will automatically become marked in the respective Spreadsheet, so the subsets can be used in subsequent analyses.
  • Brushing states will propagate to the source Spreadsheet and then to all other open Graphs based on the same Spreadsheet; this feature enables the user to brush points on one graph and view the corresponding Cases highlighted on other open Graphs.
  • Brushing events will update the Spreadsheet marking Cases as Labeled/Unlabeled, Excluded/Included, Marked/Unmarked. Related graphs tied to the same data can then be updated to reflect the brushing events performed on the first graph.
"By Group" Analysis for Statistics and Graphs
  • All STATISTICA Analyses and Graphs now support the selection of one or more "By Variables." The specified analysis is repeated for each unique level (value) of the "By Variables." For example, a Multiple Linear Regression model can be specified and calculated independently for subsets of cases defined by each unique value of variable City (e.g., Dallas, Atlanta, Pittsburgh, Chicago...).
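
The "By Group" idea can be sketched outside STATISTICA as follows. This is a minimal illustration in Python, not STATISTICA's implementation: it fits a simple one-predictor regression (rather than a multiple regression) separately for each level of a By variable. The column names Ads and Sales and all data values are hypothetical.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_by_group(rows, by, x, y):
    """Repeat the same regression for each unique value of the By variable."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    return {
        level: fit_line([r[x] for r in cases], [r[y] for r in cases])
        for level, cases in groups.items()
    }

# Hypothetical data: made-up values for illustration only.
rows = [
    {"City": "Dallas", "Ads": 1.0, "Sales": 3.0},
    {"City": "Dallas", "Ads": 2.0, "Sales": 5.0},
    {"City": "Dallas", "Ads": 3.0, "Sales": 7.0},
    {"City": "Atlanta", "Ads": 1.0, "Sales": 2.0},
    {"City": "Atlanta", "Ads": 2.0, "Sales": 2.5},
    {"City": "Atlanta", "Ads": 3.0, "Sales": 3.0},
]
models = fit_by_group(rows, by="City", x="Ads", y="Sales")
```

Each entry of models holds an (intercept, slope) pair estimated only from the cases of that city.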
Variable and Case Metadata

Metadata can now be defined for Cases and Variables to offer new analytic options and simplify and speed up specifying new analyses.

Case Metadata:

  • Marker Type: Defines the point marker shape to be used for the respective Case(s); used in Graph Types such as Scatterplots (for example, one particular case can be assigned a "red star" marker, and it will appear as such in all scatterplots).
  • Marker Color: Defines the point marker color to be used for the respective Case(s).
  • Excluded: User can mark a case as Excluded. An Excluded case will be omitted from calculations, but will still be present in graphical displays.
  • Hidden: User can turn off a point in graph, i.e., the point will still be used in computations, but will not be displayed in a graph.
  • Label: User can select to label individual cases within graphs.
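
The difference between Excluded and Hidden cases can be illustrated with a small Python sketch. The Case class and helper names below are hypothetical and only model the behavior described above; they are not part of STATISTICA.

```python
from dataclasses import dataclass

@dataclass
class Case:
    value: float
    excluded: bool = False  # omitted from calculations, still shown in graphs
    hidden: bool = False    # used in calculations, not shown in graphs

def cases_for_computation(cases):
    """Excluded cases drop out of the calculations."""
    return [c for c in cases if not c.excluded]

def cases_for_display(cases):
    """Hidden cases drop out of the graph."""
    return [c for c in cases if not c.hidden]

cases = [Case(10.0), Case(12.0, excluded=True), Case(14.0, hidden=True)]
used = cases_for_computation(cases)
mean = sum(c.value for c in used) / len(used)
plotted = [c.value for c in cases_for_display(cases)]
```

Here the mean is computed from the first and third cases, while the graph would show the first and second.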

Variable Metadata:

  • Measurement Type (Auto, Continuous, Categorical, Ordinal): Used for automatic variable classification in Analyses and, optionally, automatically populating variable selection list boxes only with variables of the appropriate types.
  • Excluded: Prevents display in Variable selection dialogs.
  • Label: The User can define a variable as a Label variable. The values of a Label variable will be used as point labels within appropriate graphs.
  • Case state: User can save case states to a specified variable.
  • Properties: User can create custom metadata fields (name-value pairs) to be stored and associated with a Variable. For example, a User can define an "Upper Control Limit" property for a variable assigned a value of "2.6". A STATISTICA Visual Basic (SVB) macro can query the Variable Properties, including the custom Upper Control Limit Property, to apply it to Quality Control Charts based on this Variable. With this approach, the same SVB macro can be applied to different data and dynamically use appropriate QC Chart limits and specifications.
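
The custom-property workflow might look roughly like the sketch below. It is written in Python rather than SVB, and the dictionary is a hypothetical stand-in for variable metadata, not the SVB object model; the variable name Thickness and the data values are made up.

```python
# Hypothetical stand-in for variable metadata (name-value pairs per variable).
variable_properties = {
    "Thickness": {"Upper Control Limit": "2.6"},
}

def points_above_limit(variable, values, properties):
    """Look up the stored limit for a variable and flag values above it."""
    ucl = float(properties[variable]["Upper Control Limit"])
    return [index for index, value in enumerate(values) if value > ucl]

flagged = points_above_limit("Thickness", [2.1, 2.7, 2.5, 3.0], variable_properties)
```

Because the limit is read from the metadata rather than hard-coded, the same function can be pointed at another variable that carries its own "Upper Control Limit" property.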
Workbook Multi-Item Display
  • By default, selecting a folder in a Workbook now displays its contents in a single pane, with the Spreadsheets and Graphs from that folder arranged adjacent to each other in a grid.
  • Workbooks now support the ability to View/Print the contents of a Workbook folder in a user-defined grid configuration.
New Recording / Reporting Options for Case Selection Conditions
  • Currently specified case selection conditions can now be automatically displayed in title areas of all respective graphs (generated from the case selected subsets) and in the header areas of all result spreadsheets.
Web Browser Document Type
  • Support for Integrated Internet Explorer (IE) Windows in the STATISTICA Application.
  • The integrated IE Window offers one more method supported in STATISTICA to easily build custom User Interfaces, in this case, using the standard HTML scripting.
  • IE Windows support HTML applications and can host native STATISTICA Spreadsheet and Graph objects for interactive editing, brushing, etc.
  • Seamless integration of desktop STATISTICA and WebSTATISTICA running on a remote server.
Enhanced Text Importing
  • The import of text files has been enhanced through the "auto" text import method. Users can now have the system automatically determine which columns should be imported as variables of type Text (instead of variables of type Double with text labels), or users can manually specify which columns are to be imported as text.
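
The rule STATISTICA uses to decide column types is not described here. As a rough illustration only, the Python sketch below assumes a simple heuristic: a column is treated as Double if every non-empty value parses as a number, and as Text otherwise. The function names are hypothetical.

```python
def is_number(text):
    """True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def classify_columns(header, rows):
    """Return {column name: "Double" or "Text"} (assumed heuristic)."""
    kinds = {}
    for index, name in enumerate(header):
        # Empty fields are treated as missing data and ignored.
        values = [row[index] for row in rows if row[index] != ""]
        numeric = bool(values) and all(is_number(v) for v in values)
        kinds[name] = "Double" if numeric else "Text"
    return kinds

kinds = classify_columns(
    ["ID", "Weight", "Batch"],
    [["1", "2.5", "A7"], ["2", "", "B2"]],
)
```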
Automatic Variable Classification
  • To speed up and simplify the process of selecting variables for analyses, Variable selection in Analyses and Graphs will (optionally) limit the display of Variables to the types that are appropriate for their respective roles in the Analyses. For example, in "By Group" Analyses, by default only Categorical Variables will be displayed for selection as the By Variables.
Licensing Changes
  • STATISTICA Concurrent Licensing has been enhanced to allow for more granular licensing of modules and offline usage while a STATISTICA User is disconnected from the network (as well as supporting "trial period" usage of individual modules).
Sorting
  • Improved user interface for defining complex sorting scenarios with a large number of keys.
  • Support via automation for up to 14 sort keys.
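
Sorting on several keys with mixed directions can be sketched in Python as shown below. This is a generic illustration, not the STATISTICA automation interface; the function name and the (column, descending) key format are assumptions.

```python
def sort_cases(rows, keys):
    """Sort rows by several keys.

    keys is a list of (column, descending) pairs, most significant first.
    """
    result = list(rows)
    # A stable sort applied from the least to the most significant key
    # yields the combined multi-key ordering.
    for column, descending in reversed(keys):
        result.sort(key=lambda row: row[column], reverse=descending)
    return result

rows = [
    {"Group": "a", "Value": 1},
    {"Group": "b", "Value": 3},
    {"Group": "a", "Value": 2},
]
ordered = sort_cases(rows, [("Group", False), ("Value", True)])
```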
Merge Data
  • Merging from both open and disk-based Spreadsheets.
  • Addition of "Cartesian-join" merge.
  • Enhanced user interface.
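
A Cartesian join pairs every case of one file with every case of the other. The Python sketch below shows the general idea with hypothetical column names; it is not STATISTICA's merge implementation.

```python
from itertools import product

def cartesian_join(left, right):
    """Combine every row of left with every row of right."""
    return [{**l, **r} for l, r in product(left, right)]

left = [{"Machine": "M1"}, {"Machine": "M2"}]
right = [{"Shift": 1}, {"Shift": 2}, {"Shift": 3}]
merged = cartesian_join(left, right)
```

Two rows joined with three rows give six rows in the result, so the output grows multiplicatively with the input sizes.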
Stacking/Unstacking
  • Added ability to Interleave output when stacking.
  • Stacking - Unstacked variables can be included/excluded from results.
  • Unstacking - Added options for handling multiple cross tab values.
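
The sketch below illustrates stacking in Python. It assumes that "Interleave" means the stacked output is ordered case by case instead of variable by variable; that reading, along with the function name, is an assumption and not taken from the STATISTICA documentation.

```python
def stack(rows, variables, interleave=False):
    """Stack several variables into (code, value) pairs.

    interleave=False: all values of the first variable, then the second, ...
    interleave=True:  all variables of the first case, then the second, ...
    """
    if interleave:
        return [(name, row[name]) for row in rows for name in variables]
    return [(name, row[name]) for name in variables for row in rows]

rows = [{"A": 1, "B": 10}, {"A": 2, "B": 20}]
blocked = stack(rows, ["A", "B"])
interleaved = stack(rows, ["A", "B"], interleave=True)
```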
Enhanced Spreadsheet Formulas and Case Selection Conditions

STATISTICA now provides an even broader selection of regular expression (including so-called fuzzy text searching) functions that can be used in spreadsheet and case selection formulas. For example:
  • RE_SEARCH - search for text in a variable using regular expressions.
  • RE_MATCH - compare text using regular expressions.
  • RE_REPLACE - text replacement in a variable using regular expressions.
  • LIKE - compare text using an operator similar to SQL's LIKE keyword.
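
The exact syntax of these spreadsheet functions is not shown here. As a rough analogue, the Python sketch below uses the standard re module for search, match, and replace, and a small hypothetical like() helper that translates SQL-style wildcards (% and _) into a regular expression.

```python
import re

def like(text, pattern):
    """SQL-style LIKE: % matches any run of characters, _ matches exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None

names = ["Dallas", "Atlanta", "Pittsburgh", "Chicago"]

# Search: the pattern may occur anywhere in the text.
found = [n for n in names if re.search(r"tt|ll", n)]
# Match: the whole text must fit the pattern.
exact = [n for n in names if re.fullmatch(r"[A-Z][a-z]+go", n)]
# Replace: substitute the part of the text that fits the pattern.
replaced = re.sub(r"burgh$", "burg", "Pittsburgh")
# LIKE: wildcard comparison against the whole text.
like_hits = [n for n in names if like(n, "%a_")]
```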
Further Expanded STATISTICA Visual Basic Functionality
  • Go even further using STATISTICA as an efficient programming platform for developing highly interactive custom graphics applications.
  • Embed a wide variety of ActiveX controls within STATISTICA graphs.
"All Values" Categorization Method
  • A new method of categorization in graphs allows for up to 255 distinct categories of integer or non-integer values.

Basic Statistics
  • Enhanced breakdown tables: empty rows are now eliminated from the generated tables.
Quality Control
  • STATISTICA QC Charts support aggregated data (means, ranges, standard deviations) as input. This capability is particularly useful when automated data collection equipment and instruments output only aggregated data for each sample.
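
To show why aggregated input is sufficient, the Python sketch below computes X-bar chart limits from sample means and ranges alone, using the standard A2 control chart constants. It is a textbook calculation, not STATISTICA's code, and the data values are hypothetical.

```python
# Standard A2 constants for X-bar charts, indexed by sample size.
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}

def xbar_limits(sample_means, sample_ranges, n):
    """Return (lower limit, center line, upper limit) from aggregated data."""
    center = sum(sample_means) / len(sample_means)
    mean_range = sum(sample_ranges) / len(sample_ranges)
    half_width = A2[n] * mean_range
    return center - half_width, center, center + half_width

# Hypothetical aggregated output from a measuring instrument (n = 5 per sample).
lower, center, upper = xbar_limits([10.0, 10.2, 9.8], [1.0, 1.2, 0.8], n=5)
```

No individual measurements are needed: the center line comes from the sample means and the limit width from the average range.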
Process Analysis
  • Gage Linearity analysis.
  • 5,000 cases limit has been removed.
SEWSS Enhancements:

Aggregated Data
SEWSS supports aggregated data (means, ranges, standard deviations) as input. This capability is particularly useful when automated data collection equipment and instruments output only aggregated data for each sample.

Sets
Samples can be grouped together using labels which allow for unique specifications and limits per category in the label. This label, also called Set Name, will provide the user with the following options:

  • Set Names are for labeling only.
  • Calculate limits and sigma from the samples in a set and apply to the same Set.
  • Calculate limits and sigma from the first X samples in a set and apply to all future samples in the same Set.
  • Use the values returned in the Query with specification and limit column types assigned to them and apply them to the same Set.
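
The third option (limits from the first X samples of a Set, applied to later samples of the same Set) could be sketched in Python as follows. The use of mean plus or minus three standard deviations, the function names, and the data are all assumptions made for illustration; SEWSS may compute limits differently.

```python
from statistics import mean, stdev

def limits_from_first(samples, x):
    """Limits (mean +/- 3 sigma) estimated from the first x samples."""
    base = samples[:x]
    center = mean(base)
    sigma = stdev(base)
    return center - 3 * sigma, center + 3 * sigma

def out_of_limits(samples_by_set, x):
    """Apply each Set's own limits to the samples that follow the first x."""
    flagged = {}
    for set_name, samples in samples_by_set.items():
        low, high = limits_from_first(samples, x)
        flagged[set_name] = [v for v in samples[x:] if not low <= v <= high]
    return flagged

# Hypothetical Sets with different process levels.
data = {
    "Line A": [10, 11, 9, 10, 20],
    "Line B": [100, 102, 98, 100, 101],
}
flagged = out_of_limits(data, x=4)
```

Each Set is judged against limits derived from its own history, so a value that is normal for one Set does not distort the limits of another.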

21 CFR Part 11 Compliance
Extensions to SEWSS offer options to better keep track of SEWSS users' activities and to increase the administrator's control over the way in which SEWSS is used. These features are also required for complete compliance in a 21 CFR Part 11 environment. They include logging of system changes, implementing a Windows integrated logon environment, and locking Spreadsheets and Graphs against modification.

New Products and Analysis Modules:

STATISTICA NIPALS Algorithm (PCA/PLS) - an implementation of a number of techniques known as Principal Component Analysis (PCA) and Partial Least Squares (PLS). In STATISTICA, PCA and PLS are implemented using the state-of-the-art NIPALS algorithm (Nonlinear Iterative Partial Least Squares), a mathematical procedure designed to extract systematic variations, relationships, and information from datasets. STATISTICA NIPALS simplifies the analysis at hand while effectively combating the curse of high dimensionality (typically present when the number of variables is large). STATISTICA NIPALS is also particularly suited for use in data diagnostics, making it an ideal tool for Quality Control in many areas of science and technology, for example the pharmaceutical, biochemical, and semiconductor industries. Important features include:

  • Scalability: The ability to handle datasets with a very large number of variables.
  • Data diagnostics and inter-variable relations: Capable of applying PCA to data diagnostics, while also using Partial Least Squares for relating a number of predictors to a set of outcome variables (whether in a classification or a regression problem).
  • Integrated Graphical Analysis: Wide selection of integrated graphical techniques including batches plotted in the component space, importance plot of components, and univariate and multivariate QC Charts.
  • Cross-validation: Integrated options for cross-validation to evaluate the number of components to extract.
  • Quality Control: Wide selection of univariate and multivariate QC Charts for offline analysis or automatically updated as new data are collected.
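
For readers unfamiliar with NIPALS, the Python sketch below shows the basic iteration for PCA: alternate between estimating loadings and scores until the scores stop changing, then deflate the data and extract the next component. It is a minimal textbook version in pure Python, assuming non-degenerate data with no missing values; it is not STATISTICA's implementation, and the function name and parameters are hypothetical.

```python
def nipals_pca(data, n_components, tol=1e-10, max_iter=500):
    """Return (scores, loadings) of mean-centered data using NIPALS."""
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    x = [[row[j] - means[j] for j in range(n_cols)] for row in data]
    scores, loadings = [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        start = max(range(n_cols), key=lambda j: sum(row[j] ** 2 for row in x))
        t = [row[start] for row in x]
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loadings: project the columns of x onto the score vector.
            p = [sum(x[i][j] * t[i] for i in range(n_rows)) / tt
                 for j in range(n_cols)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Scores: project the rows of x onto the normalized loadings.
            t_new = [sum(x[i][j] * p[j] for j in range(n_cols))
                     for i in range(n_rows)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component before the next one.
        x = [[x[i][j] - t[i] * p[j] for j in range(n_cols)]
             for i in range(n_rows)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings

# Hypothetical data lying exactly on the line y = 2x.
scores, loadings = nipals_pca([[1, 2], [2, 4], [3, 6], [4, 8]], n_components=1)
```

Because components are extracted one at a time, only as many as are needed have to be computed, which is one reason the approach suits data with many variables.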

STATISTICA Sequence, Association and Link Analysis - this new, stand-alone product addresses the needs of clients in industries such as retailing, banking, and insurance by implementing the fastest known, highly scalable sequence analysis algorithm with the ability to derive Association and Sequence rules in a single analysis. Furthermore, the program is a stand-alone application that can be used for model building and deployment.
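
Association rules are commonly evaluated by their support and confidence. The Python sketch below computes both for a hypothetical set of transactions; it illustrates the measures only and says nothing about the algorithm used in the product.

```python
def support(transactions, items):
    """Fraction of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(wanted <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, body, head):
    """Support of body and head together, relative to support of the body."""
    both = support(transactions, set(body) | set(head))
    return both / support(transactions, body)

# Hypothetical market-basket data.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
rule_support = support(transactions, ["bread", "milk"])
rule_confidence = confidence(transactions, ["bread"], ["milk"])
```

For the rule "bread implies milk", the support is the share of baskets containing both items, and the confidence is the share of bread baskets that also contain milk.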

STATISTICA Multivariate Statistical Process Control (MSPC) - this new, stand-alone product (available in enterprise, client-server versions) is designed for advanced process control applications in many industries, including pharmaceutical, chemical and bio-chemical, food production and others; it provides the widest selection of univariate and multivariate techniques for statistical process control applications. Analytic capabilities include, among many others:

  • Partial Least Squares - comprehensive implementation of NIPALS algorithm for partial least squares regression including hierarchical PLS and multi-way PLS.
  • Principal Components - comprehensive implementation of NIPALS algorithm for Principal Components Analysis including hierarchical PCA and multi-way PCA.
  • Scalable to hundreds of thousands of parameters, including process parameters, in-process tests, and finished product tests.
  • Integrated Graphical Analysis - wide selection of integrated graphical techniques including batches plotted in the component space, importance plot of components, and univariate and multivariate QC Charts.
  • Cross-validation - integrated options for cross-validation to evaluate the number of components to extract.
  • Quality Control - wide selection of univariate and multivariate QC Charts for offline analysis or automatically updated as new data are collected.

Random Forests - this new module of STATISTICA Data Miner applications offers cutting-edge techniques for building flexible models for classification and regression; particularly well-suited for extremely large numbers of predictor variables.


© Copyright StatSoft, Inc., 1984-2004. StatSoft, StatSoft logo, STATISTICA, SEWSS, SEDAS, Data Miner, SEPATH and GTrees are trademarks of StatSoft, Inc.