Examine the “gross” or “surface” properties of the acquired data and report on the results. |
|
Purpose
Create a report that describes the acquired data, including the format of the data, the quantity of data (for
example, the number of records and fields in each table), the identities of the fields, and any other surface features
that have been discovered. Evaluate whether the acquired data satisfies the relevant requirements.
|
Main Description
There are many ways to describe data, but most descriptions focus on the quantity and quality of the data: how much
data is available and what condition it is in. Listed below are some key characteristics to address when describing
data:
- Amount of data. For most modeling techniques, there are trade-offs associated with data size. Large data sets can
produce more accurate models, but they can also lengthen the processing time. Consider whether using a subset of
data is a possibility. When taking notes for the final report, be sure to include size statistics for all data sets,
and remember to consider both the number of records and the number of fields (attributes) when describing data.
- Value types. Data can take a variety of formats, such as numeric, categorical (string), or Boolean (true/false).
Paying attention to value type can head off problems during later modeling.
- Coding schemes. Frequently, values in the database are representations of characteristics such as gender or product
type. For example, one data set may use M and F to represent male and female, while another may use the numeric
values 1 and 2. Note any conflicting schemes in the data report.
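
The three characteristics above can be collected with a short script. The following is a minimal sketch in Python using only the standard library, not a prescribed procedure: the sample data, the column names, and the helper functions `infer_type` and `describe` are all hypothetical and introduced here for illustration. It counts records and fields, guesses a coarse value type for each field, and lists the distinct codes of categorical fields so that conflicting coding schemes can be spotted.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data; column names and values are invented for illustration.
SAMPLE_CSV = """customer_id,gender,age,is_active
1,M,34,true
2,F,29,false
3,F,,true
4,M,51,true
"""


def infer_type(values):
    """Guess a coarse value type from the non-empty string values of one field."""
    present = [v for v in values if v != ""]
    if not present:
        return "empty"
    if all(v.lower() in ("true", "false") for v in present):
        return "boolean"
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        return "categorical"


def describe(csv_text, max_codes=10):
    """Return surface properties: record count, field count, per-field details."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    report = {"records": len(rows), "fields": len(fields), "columns": {}}
    for name in fields:
        values = [row[name] for row in rows]
        kind = infer_type(values)
        entry = {"type": kind, "missing": sum(v == "" for v in values)}
        if kind == "categorical":
            # List the most frequent codes so the coding scheme is visible.
            counts = Counter(v for v in values if v != "")
            entry["codes"] = dict(counts.most_common(max_codes))
        report["columns"][name] = entry
    return report


report = describe(SAMPLE_CSV)
print(report["records"], "records,", report["fields"], "fields")
for name, entry in report["columns"].items():
    print(name, entry)
```

Note that a field coded as 1 and 2 would be reported as numeric by this sketch, so comparing coding schemes across data sets still requires checking the reported types and codes against the documentation of each source.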
|
Licensed Materials - Property of IBM. (c) Copyright IBM Corp. 2015.
IBM, the IBM logo, and SPSS are trademarks of International Business Machines Corp,
registered in many jurisdictions worldwide. Other products and service names may be trademarks of IBM or
other companies. You may use the Content "AS IS" or modify them, however IBM will not be responsible for
any deficiencies or errors that result from modifications that you make.