DSPA Chapter 19 Text Mining (TM) and Natural Language Processing (NLP)


Natural Language Processing (NLP) and Text Mining (TM) refer to automated, machine-driven algorithms for semantically mapping, extracting information from, and understanding (natural) human language. Often, this involves extracting salient information from large amounts of unstructured text. To do so, we need to build semantic and syntactic mapping algorithms that can effectively process large volumes of text. As a related NLP/TM application, Chapter 7 demonstrated a powerful text classifier based on the naive Bayes algorithm.

In this chapter, we present more details about various text processing strategies in R. Specifically, we present simulated and real examples of text processing and of computing document term frequency (TF), inverse document frequency (IDF), and cosine similarity transformations.
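To make these three quantities concrete, here is a minimal base-R sketch on a toy corpus. It is an illustration under stated assumptions, not the chapter's own code: the three sentences in `docs` are made up, tokenization is a plain whitespace split, and IDF uses the common unsmoothed form log(N / document frequency). The names `docs`, `tf`, `idf`, `tfidf`, and `cosine_sim` are hypothetical choices for this example.

```r
# Toy corpus (made-up sentences, for illustration only)
docs <- c(
  d1 = "text mining extracts information from text",
  d2 = "natural language processing maps human language",
  d3 = "text mining and natural language processing"
)

# Tokenize: lower-case, then split on whitespace (no stemming or stop-word removal)
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term frequency (TF): raw counts, rows = documents, columns = terms
tf <- t(vapply(
  tokens,
  function(tk) as.numeric(table(factor(tk, levels = vocab))),
  numeric(length(vocab))
))
dimnames(tf) <- list(names(docs), vocab)

# Inverse document frequency (IDF), assuming the unsmoothed definition
# idf(t) = log(N / number of documents containing t)
idf <- log(nrow(tf) / colSums(tf > 0))

# TF-IDF weights: scale each term column by its IDF
tfidf <- sweep(tf, 2, idf, `*`)

# Cosine similarity between all pairs of document vectors (rows of m)
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

sim <- cosine_sim(tfidf)
print(round(sim, 3))
```

In this toy corpus, `d1` and `d2` share no terms, so their cosine similarity is 0, while `d3` overlaps with both. For real corpora, dedicated packages typically handle tokenization, sparsity, and weighting variants; the sketch above only shows the arithmetic.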
