You’ll frequently need to construct new data. For example, it may be useful to create a new column
flagging the purchase of an extended warranty for each transaction.
There are two ways to construct new data:
- Deriving attributes (columns or characteristics)
- Generating records (rows)
Derived attributes are new attributes that are constructed from one or more existing attributes in the same record.
Example: area = length * width.
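
As a minimal sketch of deriving attributes, the following Python snippet computes an area column and an extended-warranty flag from fields already present in each record. The record layout and the names `length`, `width`, `items`, and `has_extended_warranty` are assumptions made for illustration, not part of any particular dataset or tool.

```python
# Hypothetical input records; all field names here are illustrative assumptions.
rectangles = [
    {"id": 1, "length": 4.0, "width": 2.5},
    {"id": 2, "length": 3.0, "width": 3.0},
]

transactions = [
    {"id": 101, "items": ["laptop", "extended warranty"]},
    {"id": 102, "items": ["mouse"]},
]


def add_area(record):
    """Return a copy of the record with a derived 'area' attribute."""
    derived = dict(record)  # copy so the raw record is left unchanged
    derived["area"] = derived["length"] * derived["width"]
    return derived


def add_warranty_flag(record):
    """Return a copy of the record with a derived boolean warranty flag."""
    derived = dict(record)
    derived["has_extended_warranty"] = "extended warranty" in derived["items"]
    return derived


with_area = [add_area(r) for r in rectangles]
with_flag = [add_warranty_flag(t) for t in transactions]
```

Each derived value depends only on attributes of the same record, so the derivation can be applied row by row without looking at any other row.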
Generating records describes the creation of completely new records. Example: create records for customers who made
no purchase during the past year. There was no reason to have such records in the raw data, but for modeling purposes
it might make sense to explicitly represent the fact that certain customers made zero purchases.
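
One way to generate such records, sketched here in Python under stated assumptions, is to start from the full customer list and count purchases per customer, so that customers absent from the purchase data still receive a row with a count of zero. The customer IDs, the field names, and the helper `purchase_summary` are hypothetical.

```python
from collections import Counter

# Hypothetical data: the full customer list and the raw purchase rows.
customers = ["C001", "C002", "C003"]
purchases = [
    {"customer_id": "C001", "amount": 120.0},
    {"customer_id": "C001", "amount": 35.5},
    {"customer_id": "C003", "amount": 60.0},
]


def purchase_summary(customer_ids, purchase_rows):
    """Build one record per customer, including customers with no purchases."""
    counts = Counter(row["customer_id"] for row in purchase_rows)
    # A Counter returns 0 for missing keys, which yields the generated
    # zero-purchase records for customers absent from the raw data.
    return [
        {"customer_id": cid, "purchase_count": counts[cid]}
        for cid in customer_ids
    ]


summary = purchase_summary(customers, purchases)
```

Here customer `C002` never appears in `purchases`, yet the summary contains a record for that customer with `purchase_count` set to 0.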