Task: Select Data
Purpose
Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types. Note that data selection covers selection of attributes (columns) as well as selection of records (rows) in a table.
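As an illustration of the two kinds of selection mentioned above, here is a minimal Python sketch. The records, the field names, and the `select_data` helper are all hypothetical and are not part of the task definition; they only show attribute (column) selection and record (row) selection applied together.

```python
# Hypothetical records; the field names are illustrative only.
records = [
    {"customer_id": 1, "age": 34, "region": "north", "spend": 120.0},
    {"customer_id": 2, "age": None, "region": "south", "spend": 80.5},
    {"customer_id": 3, "age": 51, "region": "north", "spend": 310.0},
]


def select_data(rows, attributes, keep_row):
    """Keep only the chosen attributes (columns) of the rows that pass keep_row."""
    return [
        {name: row[name] for name in attributes}
        for row in rows
        if keep_row(row)
    ]


selected = select_data(
    records,
    attributes=["customer_id", "age", "spend"],   # attribute (column) selection
    keep_row=lambda row: row["age"] is not None,  # record (row) selection
)
```

In this sketch the `region` attribute is dropped and the record with a missing `age` is excluded; in practice, each such choice would be justified against the criteria above.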
Key Considerations

List the data to be included or excluded and the reasons for these decisions. Bear in mind the following questions:

  • Is a given attribute relevant to your data mining goals?
  • Is the quality of a particular data set or attribute poor enough to undermine the validity of your results?
  • If so, can you salvage such data?
  • Are there any constraints on using particular fields such as gender or race?
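
One possible way to record the inclusion and exclusion decisions, together with their reasons, is sketched below in Python. Everything here is an assumption made for illustration: the `Decision` record, the `decide` helper, the sample attributes, and in particular the 50% missing-value threshold, which stands in for whatever quality criterion a project actually adopts.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    attribute: str
    include: bool
    reason: str


def missing_rate(rows, attribute):
    """Fraction of rows in which the attribute is missing (None or absent)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(attribute) is None) / len(rows)


def decide(rows, attributes, relevant, restricted, max_missing=0.5):
    """Apply the checklist to each attribute and log a reason for every decision.

    max_missing is an assumed, project-specific quality threshold.
    """
    decisions = []
    for name in attributes:
        rate = missing_rate(rows, name)
        if name in restricted:
            decisions.append(Decision(name, False, "restricted field"))
        elif name not in relevant:
            decisions.append(Decision(name, False, "not relevant to the data mining goals"))
        elif rate > max_missing:
            decisions.append(Decision(name, False, f"{rate:.0%} missing, not salvageable"))
        elif rate > 0:
            decisions.append(Decision(name, True, f"{rate:.0%} missing, salvage in cleaning"))
        else:
            decisions.append(Decision(name, True, "relevant and complete"))
    return decisions


# Hypothetical sample data.
rows = [
    {"spend": 10.0, "age": 30, "income": None, "gender": "f", "shoe_size": 38},
    {"spend": 22.5, "age": None, "income": None, "gender": "m", "shoe_size": 44},
    {"spend": 5.0, "age": 41, "income": 52000, "gender": "f", "shoe_size": 39},
    {"spend": 17.0, "age": 28, "income": None, "gender": "m", "shoe_size": 42},
]

decisions = decide(
    rows,
    attributes=["spend", "age", "income", "gender", "shoe_size"],
    relevant={"spend", "age", "income", "gender"},
    restricted={"gender"},
)

for d in decisions:
    print(f"{d.attribute}: {'include' if d.include else 'exclude'} ({d.reason})")
```

The order of the checks is a design choice of this sketch: constraints on a field are tested first, so a restricted attribute is excluded even when it is relevant and complete.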