DSPA Chapter 16 (Variable Selection)

As we mentioned in Chapter 15,
variable selection is especially important for bioinformatics,
healthcare, and biomedical data, where we may have more features than
observations. Variable selection, or feature selection, helps us focus
on the core information contained in the observations rather than on
every piece of information. Due to the presence of intrinsic and
extrinsic noise, the volume and complexity of big health data, and
various methodological and technological challenges, identifying the
salient features can resemble finding a needle in a haystack. Here, we
illustrate alternative feature selection strategies based on filtering
(e.g., correlation-based feature selection), wrapping (e.g., recursive
feature elimination), and embedding (e.g., variable importance via
random forest classification) techniques.
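To make the filtering idea concrete, here is a minimal sketch of one simple correlation filter. It is not the chapter's own code. It ranks candidate features by their absolute Pearson correlation with the outcome, then keeps up to `k` of them while skipping any feature that is nearly collinear with one already kept. The function names `pearson` and `correlation_filter`, the `redundancy` cutoff of 0.9, and the toy columns (`gene_a`, `gene_b`, and so on) are all hypothetical, and the code uses only the Python standard library.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(features: Dict[str, List[float]], y: List[float],
                       k: int, redundancy: float = 0.9) -> List[str]:
    """Keep up to k features ranked by |corr(feature, y)|, skipping any
    feature whose |corr| with an already kept feature exceeds `redundancy`.
    (Hypothetical helper; the 0.9 default is an arbitrary assumption.)"""
    # Filter step: score each feature independently against the outcome.
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], y)),
                    reverse=True)
    kept: List[str] = []
    for name in ranked:
        if len(kept) == k:
            break
        # Redundancy check against the features selected so far.
        if all(abs(pearson(features[name], features[other])) <= redundancy
               for other in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: six observations of an outcome and four candidates.
y = [1, 2, 3, 4, 5, 6]
features = {
    "gene_a": [1, 2, 3, 4, 5, 6],       # tracks y perfectly
    "gene_a_copy": [1, 2, 3, 4, 6, 5],  # near-duplicate of gene_a
    "gene_b": [2, 1, 4, 3, 3, 5],       # moderately related to y
    "noise": [3, 1, 2, 2, 1, 3],        # uncorrelated with y
}

selected = correlation_filter(features, y, k=2)
print(selected)  # ['gene_a', 'gene_b']
```

Here `gene_a_copy` ranks second by relevance (r ≈ 0.94) but is dropped because it is almost collinear with `gene_a`, so the less redundant `gene_b` is kept instead. Because a filter like this never fits a predictive model, it is cheap to run on wide data, but it judges features only through pairwise linear association; wrapper methods such as recursive feature elimination and embedded methods such as random forest importance instead assess features through a fitted model.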