Book Information
· Category: Foreign Books > Science/Mathematics/Ecology > Mathematics > Probability & Statistics > General
· ISBN: 9781119282082
· Pages: 320
Table of Contents
Foreword
Chapter 1: What is Text Mining?
1.1 What is it?
1.1.1 What is text mining in practice?
1.1.2 Where does text mining fit?
1.2 Why do we care about text mining?
1.2.1 What are the consequences of ignoring text?
1.2.2 What are the benefits of text mining?
1.2.3 Setting Expectations: When text mining should (and should not) be used
1.3 A Basic Workflow: How the Process Works
1.4 What tools do I need to get started with this?
1.5 A Simple Example
1.6 A Real-World Use Case
1.7 Summary
Chapter 2: Basics of Text Mining
2.1 What is Text Mining in a practical sense?
2.2 Types of Text Mining: Bag of Words
2.2.1 Types of Text Mining: Syntactic Parsing
2.3 The text mining process in context
2.4 String Manipulation: Number of Characters & Substitutions
2.4.1 String Manipulations: Paste, Character Splits & Extractions
2.5 Keyword Scanning
2.6 String Packages stringr & stringi
2.7 Preprocessing Steps for Bag of Words Text Mining
2.8 Spell Check
2.9 Frequent Terms & Associations
2.10 Delta Assist Wrap Up
2.11 Summary
Chapter 3: Common Text Mining Visualizations
3.1 A tale of two (or three) cultures
3.2 Simple Exploration: Term Frequency, Associations & Word Networks
3.2.1 Term Frequency
3.2.2 Word Associations
3.2.3 Word Networks
3.3 Simple Word Clusters: Hierarchical Dendrograms
3.4 Word Clouds: Overused but Effective
3.4.1 One Corpus Word Clouds
3.4.2 Comparing and Contrasting Corpora in Word Clouds
3.4.3 Polarized Tag Plot
3.5 Summary
Chapter 4: Sentiment Scoring
4.1 What is Sentiment Analysis?
4.2 Sentiment Scoring: Parlor Trick or Insightful?
4.3 Polarity: Simple Sentiment Scoring
4.3.1 Subjectivity Lexicons
4.3.2 qdap’s Scoring for Positive and Negative Word Choice
4.3.3 Revisiting Word Clouds…Sentiment Word Clouds
4.4 Emoticons :) Dealing with these perplexing clues
4.4.1 Symbol-Based Emoticons Native to R
4.4.2 Punctuation-Based Emoticons
4.4.3 Emoji
4.5 R’s Archived Sentiment Scoring Library
4.6 Sentiment the tidytext way
4.7 Airbnb.com Boston Wrap Up
4.8 Summary
Chapter 5: Hidden Structures: Clustering, String Distance, Text Vectors & Topic Modeling
5.1 What is clustering?
5.1.1 K-Means Clustering
5.1.2 Spherical K-Means Clustering
5.1.3 K-Medoids Clustering
5.1.4 Evaluating the cluster approaches
5.2 Calculating & Exploring String Distance
5.2.1 What is string distance?
5.2.2 Fuzzy Matching: amatch, ain
5.2.3 Similarity Distances: stringdist, stringdistmatrix
5.3 LDA Topic Modeling Explained
5.3.1 Topic Modeling Case Study
5.3.2 LDA & LDAvis
5.4 Text to Vectors using “text2vec”
5.4.1 text2vec
5.5 Summary
Chapter 6: Document Classification: Finding Clickbait from Headlines
6.1 What is document classification?
6.2 Clickbait Case Study
6.2.2 Session & Data Set Up
6.2.3 GLMNET Training
6.2.4 GLMNET Test Predictions
6.2.5 Test Set Evaluation
6.2.6 Finding the most impactful words
6.2.7 Case study Wrap Up: Model Accuracy & Improving Performance Recommendations
6.3 Summary
Chapter 7: Predictive Modeling: Using text for classifying & predicting outcomes
7.1 Classification vs. Prediction
7.2 Case Study I: Will this patient come back to the hospital?
7.2.2 Patient Readmission in the Text Mining Workflow
7.2.3 Session & Data Set Up
7.2.4 Patient Modeling
7.2.5 More Model KPIs: AUC, Recall, Precision & F1
7.2.5.1 Additional Evaluation Metrics
7.2.6 Apply the model to new patients
7.2.7 Patient Readmission Conclusion
7.3 Case Study II: Predicting Box Office Success
7.3.2 Opening Weekend Revenue in the Text Mining Workflow
7.3.3 Session & Data Set Up
7.3.4 Opening Weekend Modeling
7.3.5 Model Evaluation
7.3.6 Apply the Model to new Movie Reviews
7.3.7 Movie Revenue Conclusion
7.4 Summary
Chapter 8: The OpenNLP Project
8.1 What is the OpenNLP project?
8.2 R’s OpenNLP Package
8.3 Named Entities in Hillary Clinton’s Email
8.3.1 R Session Set-up
8.3.2 Minor Text Cleaning
8.3.3 Using OpenNLP on a single email
8.3.4 Using OpenNLP on multiple documents
8.3.5 Revisiting the Text Mining Workflow
8.4 Analyzing the Named Entities
8.4.1 Worldwide Map of Hillary Clinton’s Location Mentions
8.4.2 Mapping Only European Locations
8.4.3 Entities & Polarity: How does Hillary Clinton feel about an entity?
8.4.4 Stock Charts for Entities
8.4.5 Reach an Insight or Conclusion about Hillary Clinton’s Emails
8.5 Summary
Chapter 9: Text Sources
9.1 Sourcing Text
9.2 Web Sources
9.2.1 Web Scraping a Single Page with rvest
9.2.2 Web Scraping Multiple Pages with rvest
9.2.3 Application Programming Interfaces (APIs)
9.2.4 Newspaper Articles from The Guardian Newspaper
9.2.5 Tweets using the “twitteR” Package
9.2.6 Calling an API without a dedicated R package
9.2.7 Using jsonlite to access the New York Times
9.2.8 Using RCurl & XML to Parse Google News Feeds
9.2.9 The tm library Web-Mining Plugin
9.3 Getting Text from File Sources
9.3.1 Individual CSV, TXT and Microsoft Office Files
9.3.2 Reading multiple files quickly
9.3.3 Extracting Text from PDFs
9.3.4 Optical Character Recognition: Extracting Text from Images
9.4 Summary