DSPA Chapter 19 Text Mining (TM) and Natural Language Processing (NLP)
From Tina Chang
Natural Language Processing (NLP) and Text Mining (TM) refer to automated machine-driven algorithms for semantically mapping, extracting information, and understanding of (natural) human language. Sometimes, this involves extracting salient information from large amounts of unstructured text. To do so, we need to build a semantic and syntactic mapping algorithm for effective processing of heavy text. Related to NLP/TM, the work we did in Chapter 7 showed a powerful text classifier using the naive Bayes algorithm.
In this Chapter, we will present more details about various text processing strategies in R. Specifically, we will present simulated and real examples of text processing and computing document term frequency (TF), inverse document frequency (IDF), and cosine similarity transformation.