Book information
· Category: Foreign Books > Computers > Database Management > General
· ISBN: 9781484264997
· Pages: 410
· Publication date: 2020-12-18
Table of Contents
Chapter 1: Setting up the PySpark Environment
Chapter Goal: Introduce readers to the PySpark environment, walk them through the steps to set up the environment, and execute some basic operations
Number of pages: 20
Subtopics:
1. Setting up your environment & data
2. Basic operations
Chapter 2: Basic Statistics and Visualizations
Chapter Goal: Introduce readers to the predictive model-building framework and help them get acclimated to basic data operations
Number of pages: 30
Subtopics:
1. Basic Statistics
2. Data manipulations/feature engineering
3. Data visualizations
4. Model building framework
Chapter 3: Variable Selection
Chapter Goal: Illustrate the different variable selection techniques to identify the top variables in a dataset and how they can be implemented using PySpark pipelines
Number of pages: 40
Subtopics:
1. Principal Component Analysis
2. Weight of Evidence & Information Value
3. Chi square selector
4. Singular Value Decomposition
5. Voting based approach
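The Weight of Evidence and Information Value calculation from subtopic 2 can be sketched in plain Python. This is a minimal, hypothetical illustration and not the book's PySpark implementation. It assumes the following:

- the feature is already binned;
- the target is coded 0/1;
- every bin contains at least one event and one non-event, since no smoothing is applied.

It uses one common convention, WoE = ln(%non-events / %events). Some texts flip the ratio, which flips the sign of WoE but leaves IV unchanged.

```python
import math
from collections import defaultdict


def woe_iv(bins, targets):
    """Return ({bin: WoE}, IV) for a pre-binned feature and a 0/1 target.

    Convention used here: WoE = ln(%non-events / %events) per bin, and
    IV = sum over bins of (%non-events - %events) * WoE.
    Assumes every bin has at least one event and one non-event (no smoothing).
    """
    events = defaultdict(int)
    non_events = defaultdict(int)
    for b, y in zip(bins, targets):
        if y == 1:
            events[b] += 1
        else:
            non_events[b] += 1

    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    iv = 0.0
    for b in sorted(set(bins)):
        pct_events = events[b] / total_events
        pct_non_events = non_events[b] / total_non_events
        woe[b] = math.log(pct_non_events / pct_events)
        iv += (pct_non_events - pct_events) * woe[b]
    return woe, iv


# Toy data: bin "A" is mostly non-events, bin "B" is mostly events.
bins = ["A", "A", "A", "A", "B", "B", "B", "B"]
targets = [1, 0, 0, 0, 1, 1, 1, 0]
woe, iv = woe_iv(bins, targets)
print(woe, round(iv, 4))
```

With this toy data, bin A gets WoE = ln 3 (about 1.10) and bin B gets -ln 3, so the feature's IV is ln 3. On a Spark DataFrame, the same per-bin counts would typically come from a groupBy aggregation rather than a Python loop.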
Chapter 4: Introduction to different supervised machine learning algorithms, implementations & fine-tuning techniques
Chapter Goal: Explain and demonstrate supervised machine learning techniques and help readers understand the challenges and nuances of model fitting with multiple evaluation metrics
Number of pages: 40
Subtopics:
1. Supervised:
· Linear regression
· Logistic regression
· Decision Trees
· Random Forests
· Gradient Boosting
· Neural Nets
· Support Vector Machine
· One Vs Rest Classifier
· Naive Bayes
2. Model hyperparameter tuning:
· L1 & L2 regularization
· Elastic net
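In Spark ML, LinearRegression and LogisticRegression expose the L1/L2 trade-off through two parameters, regParam and elasticNetParam. As described in the Spark MLlib documentation, the penalty added to the loss is regParam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2), where alpha = elasticNetParam. Setting alpha = 0 gives a pure L2 (ridge) penalty, and alpha = 1 gives a pure L1 (lasso) penalty.

The helper below is hypothetical and not part of any library. It only evaluates that term for a given weight vector, to make the mixing explicit.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Evaluate reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).

    alpha = elastic_net_param: 0.0 gives pure L2 (ridge), 1.0 gives pure L1 (lasso).
    Hypothetical helper for illustration; Spark applies this term internally.
    """
    if not 0.0 <= elastic_net_param <= 1.0:
        raise ValueError("elastic_net_param must be in [0, 1]")
    alpha = elastic_net_param
    l1 = sum(abs(w) for w in weights)         # ||w||_1
    l2_squared = sum(w * w for w in weights)  # ||w||_2^2
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


w = [1.0, -2.0]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, elastic_net_penalty(w, reg_param=0.1, elastic_net_param=alpha))
```

In PySpark the equivalent setting would be passed as, for example, LogisticRegression(regParam=0.1, elasticNetParam=0.5). Tuning then amounts to searching over these two values.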
Chapter 5: Model Validation and selecting the best model
Chapter Goal: Illustrate the different techniques used to validate models, demonstrate which technique should be used for a particular model selection task and finally pick the best model out of the candidate models
Number of pages: 30
Subtopics:
1. Model Validation Statistics:
· ROC
· Accuracy
· Precision
· Recall
· F1 Score
· Misclassification
· KS (Kolmogorov-Smirnov) statistic
· Decile
· Lift & Gain
· R square
· Adjusted R square
· Mean squared error
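Most of the statistics listed above reduce to a few counts and sums. The plain-Python helpers below use hypothetical names and are neither the book's code nor a Spark API. They compute three groups of metrics:

- accuracy, misclassification, precision, recall and F1, from a 0/1 confusion matrix;
- KS, in the credit-scoring sense: the maximum gap between the cumulative event and non-event rates as the score threshold is lowered (it is often reported multiplied by 100);
- MSE, R-squared and adjusted R-squared, for a regression model.

In PySpark, BinaryClassificationEvaluator, MulticlassClassificationEvaluator and RegressionEvaluator cover several of these metrics directly on DataFrames.

```python
from itertools import groupby


def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for 0/1 labels and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "misclassification": 1 - accuracy,
            "precision": precision, "recall": recall, "f1": f1}


def ks_statistic(y_true, scores):
    """Max gap between cumulative event and non-event rates, scanning scores high to low."""
    total_events = sum(y_true)
    total_non_events = len(y_true) - total_events
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    cum_events = cum_non_events = 0
    ks = 0.0
    # Group tied scores so the gap is only measured at distinct thresholds.
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in group:
            if label == 1:
                cum_events += 1
            else:
                cum_non_events += 1
        gap = abs(cum_events / total_events - cum_non_events / total_non_events)
        ks = max(ks, gap)
    return ks


def r2_scores(y_true, y_pred, n_features):
    """Return (MSE, R-squared, adjusted R-squared) for a regression model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes R-squared for the number of predictors.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return ss_res / n, r2, adj_r2


print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 0, 1, 0, 1]))
print(ks_statistic([1, 1, 0, 1, 0, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]))
print(r2_scores([3, 5, 7, 9], [2.8, 5.1, 7.2, 8.9], n_features=1))
```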
Chapter 6: Unsupervised and recommendation algorithms
Chapter Goal: Readers explore a different set of algorithms (unsupervised and recommendation algorithms) and the use cases for when to apply them
Number of pages: 30
Subtopics:
1. Unsupervised:
· K-Means
· Latent Dirichlet Allocation
2. Collaborative filtering using Alternating Least Squares (ALS)
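Spark's pyspark.ml.clustering.KMeans runs k-means at scale and, by default, picks its starting centroids with the k-means|| initialization. The core iteration it builds on (Lloyd's algorithm) can be sketched in plain Python. This is a minimal, hypothetical sketch: the initial centroids are passed in rather than chosen by k-means||. It also uses math.dist, which requires Python 3.8 or later.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, assignments


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.5), (8.0, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
print(centroids, labels)
```

The result depends on the starting centroids, which is why Spark uses a smarter initialization. Running several seeds and comparing the within-cluster cost is a common safeguard.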
Chapter 7: End-to-end modeling pipelines
Chapter Goal: Exemplify building an automated model framework and introduce the reader to an end-to-end model-building pipeline, including experimentation and model tracking
Number of pages: 40
Subtopics:
1. MLflow
Chapter 8: Productionalizing a machine learning model
Chapter Goal: Demonstrate multiple model deployment techniques that can fit and serve a variety of real-world use cases
Number of pages: 60
Subtopics:
1. Model Deployment using an HDFS object
2. Model Deployment using Docker
3. Creating a simple Flask API
Chapter 9: Experimentations
Chapter Goal: The purpose of this chapter is to introduce hypothesis testing, its use cases, and optimizations for experiment-based data science applications
Number of pages: 40
Subtopics:
1. Hypothesis testing
2. Sampling techniques
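A typical experimentation use case is an A/B test on conversion rates. As one illustration (the chapter may well use different tests), a two-sided, two-proportion z-test with a pooled standard error can be sketched in plain Python. This is a hypothetical sketch under three assumptions:

- the samples are independent;
- they are large enough for the normal approximation;
- the pooled rate lies strictly between 0 and 1.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference in proportions (pooled standard error)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up A/B counts: control converts 200/1000, variant converts 250/1000.
z, p = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With these made-up counts, z is about 2.68 and p is about 0.007, which would count as significant at the 5% level. Identical rates in both groups give z = 0 and p = 1.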
Chapter 10: Other Tips: Optional
Chapter Goal: This bonus chapter is optional and offers the reader some handy tips and tricks of the trade
Number of pages: 20
Subtopics:
1. Tips on when to switch between Python and PySpark
2. Graph networks