Purpose
Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical
constraints such as limits on data volume or data types. Note that data selection covers selection of attributes (columns)
as well as selection of records (rows) in a table.
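As an illustration of selecting both attributes and records, here is a minimal Python sketch. The records, the field names (`customer_id`, `age`, `region`, `churned`, `internal_note`), and the helper `select_data` are all hypothetical and only serve to show the two kinds of selection side by side.

```python
# Hypothetical customer records; the field names are illustrative only.
records = [
    {"customer_id": 1, "age": 34, "region": "north", "churned": False, "internal_note": "vip"},
    {"customer_id": 2, "age": None, "region": "south", "churned": True, "internal_note": ""},
    {"customer_id": 3, "age": 51, "region": "north", "churned": True, "internal_note": ""},
]


def select_data(rows, attributes, keep_record):
    """Keep only the chosen attributes (columns) of the rows that pass keep_record."""
    return [
        {name: row[name] for name in attributes}
        for row in rows
        if keep_record(row)
    ]


# Attribute selection: keep three columns.
# Record selection: drop rows whose age is missing.
selected = select_data(
    records,
    attributes=["customer_id", "age", "churned"],
    keep_record=lambda row: row["age"] is not None,
)
```

Keeping the attribute list and the record filter as separate arguments makes each selection decision visible and easy to document on its own.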
Key Considerations
List the data to be included or excluded and the reasons for these decisions. Bear in mind the following:
- Is a given attribute relevant to your data mining goals?
- Does the quality of a particular data set or attribute preclude the validity of your results?
- Can you salvage such data?
- Are there any constraints on using particular fields such as gender or race?
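One way to record inclusion and exclusion decisions together with their reasons is sketched below in Python. Everything here is an assumption made for illustration: the `Decision` record, the `decide` helper, the sample rows, and the 50% missing-value threshold used as a stand-in for a quality check. Whether poor-quality data can be salvaged is a judgment this sketch does not attempt to automate.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    attribute: str
    include: bool
    reason: str


def missing_rate(rows, attribute):
    """Fraction of rows where the attribute is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(attribute) is None)
    return missing / len(rows)


def decide(rows, attributes, relevant, restricted, max_missing=0.5):
    """Return one Decision per attribute; max_missing is an illustrative threshold."""
    decisions = []
    for name in attributes:
        rate = missing_rate(rows, name)
        if name in restricted:
            decisions.append(Decision(name, False, "restricted field"))
        elif name not in relevant:
            decisions.append(Decision(name, False, "not relevant to the data mining goals"))
        elif rate > max_missing:
            decisions.append(Decision(name, False, f"{rate:.0%} of values missing"))
        else:
            decisions.append(Decision(name, True, "relevant and of acceptable quality"))
    return decisions


# Hypothetical sample data.
rows = [
    {"age": 34, "gender": "f", "income": None, "signup_code": "a1"},
    {"age": 51, "gender": "m", "income": None, "signup_code": "b2"},
    {"age": None, "gender": "f", "income": 42000, "signup_code": "c3"},
]

decisions = decide(
    rows,
    attributes=["age", "gender", "income", "signup_code"],
    relevant={"age", "gender", "income"},
    restricted={"gender"},
)

for d in decisions:
    print(f"{d.attribute}: {'include' if d.include else 'exclude'} ({d.reason})")
```

The resulting list doubles as the written rationale for the selection, which can be kept alongside the data set itself.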
Licensed Materials - Property of IBM. (c) Copyright IBM Corp. 2015.
IBM, the IBM logo, and SPSS are trademarks of International Business Machines Corp.,
registered in many jurisdictions worldwide. Other products and service names may be trademarks of IBM or
other companies. You may use the Content "AS IS" or modify them, however IBM will not be responsible for
any deficiencies or errors that result from modifications that you make.