DSPA Chapter 19: Text Mining (TM) and Natural Language Processing (NLP)
Natural Language Processing (NLP) and Text Mining (TM) refer to
automated, machine-driven algorithms for semantically mapping,
extracting information from, and understanding (natural) human language.
Often, this involves extracting salient information from large amounts of
unstructured text. To do so, we need semantic and syntactic mapping
algorithms that can process large volumes of text effectively. Chapter 7
already presented one NLP/TM application: a powerful text classifier
based on the naive Bayes algorithm.
In this chapter, we present more details about various text processing
strategies in R. Specifically, we present simulated and real examples of
text processing and of computing term frequency (TF), inverse document
frequency (IDF), and the cosine similarity transformation.
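The TF, IDF, and cosine similarity quantities mentioned above can be illustrated with a minimal sketch. The chapter itself works in R; the example below uses plain Python (standard library only) purely for illustration. It assumes a simple whitespace tokenizer, raw-count TF, and the natural-log IDF variant idf(t) = ln(N / df_t), where N is the number of documents and df_t is the number of documents containing term t. Other weightings (log-scaled TF, smoothed IDF, different log bases) are also common, and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tf_idf(docs):
    """Return the sorted vocabulary and one TF-IDF vector per document.

    TF  = raw count of term t in document d
    IDF = ln(N / df_t)  (assumed variant; no smoothing)
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n_docs / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
docs = [
    "the patient has a fever",
    "the patient has a cough",
    "stock prices fell sharply",
]
vocab, vecs = tf_idf(docs)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # prints 0.353
print(round(cosine_similarity(vecs[0], vecs[2]), 3))  # prints 0.0
```

Documents 0 and 1 share the words "the patient has a", but because those words occur in two of the three documents they receive a low IDF weight, while the distinguishing words "fever" and "cough" dominate; the resulting similarity is therefore only moderate. Document 2 shares no terms with document 0, so their cosine similarity is 0.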