Task: Collect Initial Data
Purpose
Acquire the data (or access to the data) listed in the project resources. This initial collection includes loading the data into SPSS Modeler, which is necessary for data understanding. This effort may lead to initial data preparation steps.
Note: If you acquire multiple data sources, integrating them is an additional issue, to be addressed either here or in the later data preparation activity.
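SPSS Modeler loads data through its own source nodes. As a tool-neutral illustration only, the following Python sketch uses the standard library, not any Modeler API, to load two small sources and check how well their shared key lines up before any integration is attempted. The sample data, the `customer_id` key, and the helper names `load_records` and `key_overlap` are hypothetical.

```python
import csv
import io

# Hypothetical sample sources; in practice these would be files or database extracts.
CUSTOMERS_CSV = """customer_id,region
1,North
2,South
3,East
"""

TRANSACTIONS_CSV = """customer_id,amount
1,120.50
2,75.00
4,30.25
"""

def load_records(text):
    """Parse CSV text into a list of dicts, one dict per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def key_overlap(left, right, key):
    """Return (matched, only_left, only_right) sets of values for the shared key."""
    left_keys = {row[key] for row in left}
    right_keys = {row[key] for row in right}
    return left_keys & right_keys, left_keys - right_keys, right_keys - left_keys

customers = load_records(CUSTOMERS_CSV)
transactions = load_records(TRANSACTIONS_CSV)
matched, only_customers, only_transactions = key_overlap(
    customers, transactions, "customer_id"
)
print(sorted(matched))            # ['1', '2']
print(sorted(only_customers))     # ['3']
print(sorted(only_transactions))  # ['4']
```

Keys that appear in only one source (here, customer 3 with no transactions and a transaction for an unknown customer 4) are the kind of integration issue worth noting now, even if it is resolved later during data preparation.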
Key Considerations

Data come from a variety of sources, such as:

Existing data. This includes many kinds of data, such as transactional data, survey data, and web logs. Consider whether the existing data are sufficient for your needs.

Purchased data. Does your organization use supplemental data, such as demographics? If not, consider whether such data may be needed.

Additional data. If the above sources don’t meet your needs, you may need to conduct surveys or begin additional tracking to supplement the existing data stores.

Examine the data in SPSS Modeler and consider the following questions. Be sure to record your findings:

  • Which attributes (columns) from the database seem most promising?
  • Which attributes seem irrelevant and can be excluded?
  • Is there enough data to draw generalizable conclusions or make accurate predictions?
  • Are there too many attributes for your modeling method of choice?
  • Are you merging various data sources? If so, are there areas that might pose a problem when merging?
  • Have you considered how missing values are handled in each of your data sources?
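Several of these questions (how much data there is, how many attributes, how many missing values per attribute) can be answered with a quick profile of each source. The following Python sketch is a stand-in for that kind of first look, not a description of Modeler's own features. The survey extract and the helper `profile` are hypothetical, and treating empty strings, `NA`, and `?` as missing is an assumption; each source may encode missing values differently.

```python
import csv
import io

# Hypothetical survey extract; empty fields stand in for missing values.
SURVEY_CSV = """respondent_id,age,income,satisfaction
1,34,52000,4
2,,61000,5
3,45,,3
4,29,48000,
"""

def profile(text, missing_markers=("", "NA", "?")):
    """Summarize row count, attribute count, and missing values per attribute.

    missing_markers is an assumed convention; adjust it for each data source.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    missing = {
        # Short rows yield None for absent fields, so normalize to "" first.
        col: sum(1 for row in rows if (row[col] or "").strip() in missing_markers)
        for col in columns
    }
    return {"rows": len(rows), "attributes": len(columns), "missing": missing}

summary = profile(SURVEY_CSV)
print(summary["rows"], summary["attributes"])  # 4 4
print(summary["missing"])
# {'respondent_id': 0, 'age': 1, 'income': 1, 'satisfaction': 1}
```

A very small row count relative to the number of attributes, or an attribute that is mostly missing, is worth recording in your notes, since it bears directly on whether the data can support generalizable conclusions.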