Statistical Regression and Classification : From Linear Models to Machine Learning

Norman Matloff (author)
Taylor & Francis Ltd
378,000 won

New books

Discounted price: 309,960 won (-18%)
Shipping: 0 won
Benefits/extras: 15,500 won
Effective lowest price: 294,460 won

Note: The search results may include other books.


Book information

· Title: Statistical Regression and Classification : From Linear Models to Machine Learning (Hardcover)
· Category: Foreign books > Science/Math/Ecology > Mathematics > Probability and Statistics > General
· ISBN: 9781138066465
· Pages: 490
· Publication date: 2017-07-20

Table of contents

*Statistical Regression and Classification: From Linear Models to Machine Learning was awarded the 2017 Ziegel Award for the best book reviewed in Technometrics in 2017.*

Chapter One

Setting the Stage


Example: Predicting Bike-Sharing Activity

Example of the Prediction Goal: Body Fat

Example of the Description Goal: Who Clicks Web Ads?

Optimal Prediction

A Note About E(), Samples and Populations

Example: Do Baseball Players Gain Weight As They Age?

Prediction vs Description
A First Estimator
A Possibly Better Estimator, Using a Linear Model

Parametric vs Nonparametric Models

Example: Click-Through Rate

Several Predictor Variables
Multipredictor Linear Models
Estimation of Coefficients
The Description Goal
Nonparametric Regression Estimation: k-NN
Looking at Nearby Points
Measures of Nearness
The k-NN Method, and Tuning Parameters
Nearest-Neighbor Analysis in the regtools Package
Example: Baseball Player Data

After Fitting a Model, How Do We Use It for Prediction?
Parametric Settings
Nonparametric Settings
The Generic predict() Function

Overfitting, and the Variance-Bias Tradeoff
Intuition
Example: Student Evaluations of Instructors

Cross-Validation
Linear Model Case
The Code
Applying the Code
k-NN Case
Choosing the Partition Sizes


Important Note on Tuning Parameters


Rough Rule of Thumb


Example: Bike-Sharing Data
Linear Modeling of μ(t)
Nonparametric Analysis


Interaction Terms, Including Quadratics
Example: Salaries of Female Programmers and Engineers

Saving Your Work
Higher-Order Polynomial Models


Classification Techniques

It's a Regression Problem!

Example: Bike-Sharing Data


Crucial Advice: Don't Automate, Participate!


Mathematical Complements
Indicator Random Variables
Mean Squared Error of an Estimator
μ(t) Minimizes Mean Squared Prediction Error
μ(t) Minimizes the Misclassification Rate
Kernel-Based Nonparametric Estimation of Regression Functions
General Nonparametric Regression
Some Properties of Conditional Expectation
Conditional Expectation As a Random Variable
The Law of Total Expectation
Law of Total Variance
Tower Property
Geometric View

Computational Complements
CRAN Packages
The Function tapply() and Its Cousins
The Innards of the k-NN Code
Function Dispatch


Centering and Scaling

Further Exploration: Data, Code and Math Problems



Chapter Two

Linear Regression Models



Notation

The "Error Term"


Random- vs Fixed-X Cases

Least-Squares Estimation
Motivation
Matrix Formulations
() in Matrix Terms
Using Matrix Operations to Minimize ()
Models Without an Intercept Term


A Closer Look at lm() Output
Statistical Inference

Assumptions
Classical
Motivation: the Multivariate Normal Distribution Family


Unbiasedness and Consistency
β̂ Is Unbiased
Bias As an Issue/Nonissue
β̂ Is Statistically Consistent


Inference under Homoscedasticity
Review: Classical Inference on a Single Mean
Back to Reality
The Concept of a Standard Error
Extension to the Regression Case
Example: Bike-Sharing Data

Collective Predictive Strength of the X(j)
Basic Properties
Definition of R²
Bias Issues
Adjusted-R²
The "Leaving-One-Out Method"
Extensions of LOOM
LOOM for k-NN
Other Measures


The Practical Value of p-Values: Small OR Large
Misleadingly Small p-Values
Example: Forest Cover Data
Example: Click Through Data
Misleadingly LARGE p-Values
The Verdict

Missing Values

Mathematical Complements
Covariance Matrices
The Multivariate Normal Distribution Family
The Central Limit Theorem
Details on Models Without a Constant Term
Unbiasedness of the Least-Squares Estimator
Consistency of the Least-Squares Estimator
Biased Nature of S

The Geometry of Conditional Expectation
Random Variables As Inner Product Spaces
Projections
Conditional Expectations As Projections
Predicted Values and Error Terms Are Uncorrelated
Classical "Exact" Inference
Asymptotic (p + 1)-Variate Normality of β̂


Computational Complements
Details of the Computation of ()
R Functions Relating to the Multivariate Normal Distribution Family
Example: Simulation Computation of a Bivariate Normal Quantity
More Details of 'lm' Objects

Further Exploration: Data, Code and Math Problems



Chapter Three

Homoscedasticity and Other Assumptions in Practice

Normality Assumption


Independence Assumption: Don't Overlook It

Estimation of a Single Mean
Inference on Linear Regression Coefficients
What Can Be Done?
Example: MovieLens Data

Dropping the Homoscedasticity Assumption
Robustness of the Homoscedasticity Assumption
Weighted Least Squares
A Procedure for Valid Inference
The Methodology
Example: Female Wages
Simulation Test
Variance-Stabilizing Transformations
The Verdict


Further Reading


Computational Complements
The R merge() Function


Mathematical Complements
The Delta Method
Distortion Due to Transformation


Further Exploration: Data, Code and Math Problems



Chapter Four

Generalized Linear and Nonlinear Models


Example: Enzyme Kinetics Model

The Generalized Linear Model (GLM)
Definition
Poisson Regression
Exponential Families
GLM Computation
R's glm() Function


GLM: the Logistic Model
Motivation
Example: Pima Diabetes Data
Interpretation of Coefficients
The predict() Function Again
Overall Prediction Accuracy
Example: Predicting Spam E-mail
Linear Boundary


GLM: the Poisson Regression Model


Least-Squares Computation for Nonlinear Models
The Gauss-Newton Method
Eicker-White Asymptotic Standard Errors
Example: Bike Sharing Data
The "Elephant in the Room": Convergence Issues


Further Reading


Computational Complements
R Factors

Mathematical Complements
Maximum Likelihood Estimation

Further Exploration: Data, Code and Math Problems



Chapter Five

Multiclass Classification Problems


Key Notation

Key Equations

Estimating the Functions μ_i(t)

How Do We Use Models for Prediction?

One vs All or All vs All?
Which Is Better?
Example: Vertebrae Data
Intuition
Example: Letter Recognition Data
Example: k-NN on the Letter Recognition Data
The Verdict


The Classical Approach: Fisher Linear Discriminant Analysis
Background
Derivation
Example: Vertebrae Data
LDA Code and Results


Multinomial Logistic Model
Model
Software
Example: Vertebrae Data


The Issue of "Unbalanced" (and Balanced) Data
Why the Concern Regarding Balance?
A Crucial Sampling Issue
It All Depends on How We Sample
Remedies
Example: Letter Recognition

Going Beyond Using the 0.5 Threshold
Unequal Misclassification Costs
Revisiting the Problem of Unbalanced Data
The Confusion Matrix and the ROC Curve
Code
Example: Spam Data


Mathematical Complements
Classification via Density Estimation
Methods for Density Estimation
Time Complexity Comparison, OVA vs AVA
Optimal Classification Rule for Unequal Error Costs


Computational Complements
R Code for OVA and AVA Logit Analysis
ROC Code


Further Exploration: Data, Code and Math Problems



Chapter Six

Model Fit: Assessment and Improvement


Aims of This Chapter


Methods


Notation


Goals of Model Fit-Checking
Prediction Context
Description Context
Center vs Fringes of the Data Set


Example: Currency Data


Overall Measures of Model Fit
R-Squared, Revisited
Cross-Validation, Revisited
Plotting Parametric Fit Against Nonparametric One
Residuals vs Smoothing


Diagnostics Related to Individual Predictors
Partial Residual Plots
Plotting Nonparametric Fit Against Each Predictor
The freqparcoord Package
Parallel Coordinates
The regdiag() Function


Effects of Unusual Observations on Model Fit
The influence() Function
Example: Currency Data
Use of freqparcoord for Outlier Detection


Automated Outlier Resistance
Median Regression
Example: Currency Data


Example: Vocabulary Acquisition


Classification Settings
Example: Pima Diabetes Study


Improving Fit
Deleting Terms from the Model
Adding Polynomial Terms
Example: Currency Data
Example: Programmer/Engineer Census Data
Boosting
View from the 30,000 Foot Level
Performance


A Tool to Aid Model Selection


Special Note on the Description Goal


Computational Complements
Data Wrangling for the Word Bank Dataset
Mathematical Complements
The Hat Matrix
Matrix Inverse Update
The Median Minimizes Mean Absolute Deviation


Further Exploration: Data, Code and Math Problems




Chapter Seven


Disaggregating Regressor Effects


A Small Analytical Example

Example: Baseball Player Data

Simpson's Paradox
Example: UCB Admissions Data (Logit)
The Verdict


Unobserved Predictor Variables
Instrumental Variables (IVs)
The IV Method
Two-Stage Least Squares
Example: Years of Schooling
Multiple Predictors
The Verdict
Random Effects Models
Example: Movie Ratings Data, Random Effects
Multiple Random Effects
Why Use Random/Mixed Effects Models?


Regression Function Averaging
Estimating the Counterfactual
Example: Job Training
Small Area Estimation: "Borrowing from Neighbors"
The Verdict


Multiple Inference
The Frequent Occurrence of Extreme Events
Relation to Statistical Inference
The Bonferroni Inequality
Scheffe's Method
Example: MovieLens Data
The Verdict


Computational Complements
MovieLens Data Wrangling
More Data Wrangling in the MovieLens Example


Mathematical Complements
Iterated Projections
Standard Errors for RFA
Asymptotic Chi-Square Distributions


Further Exploration: Data, Code and Math Problems



Chapter Eight

Shrinkage Estimators


Relevance of James-Stein to Regression Estimation


Multicollinearity
What's All the Fuss About?
A Simple Guiding Model
"Wrong" Signs in Estimated Coefficients
Checking for Multicollinearity
The Variance Inflation Factor
Example: Currency Data
What Can/Should One Do?
Do Nothing
Eliminate Some Predictors
Employ a Shrinkage Method


Ridge Regression
Alternate Definitions
Yes, It Is Smaller
Choosing the Value of λ
Example: Currency Data


The LASSO
Definition
The lars Package
Example: Currency Data
The Elastic Net


Cases of Exact Multicollinearity, Including p > n
Why It May Work
Example: R mtcars Data
Additional Motivation for the Elastic Net


Bias, Standard Errors and Significance Tests

Generalized Linear Models
Example: Vertebrae Data


Other Terminology


Further Reading


Mathematical Complements
James-Stein Theory
Definition
Theoretical Properties
When Might Shrunken Estimators Be Helpful?
Ridge Action Increases Eigenvalues


Computational Complements
Code for ridgelm()


Further Exploration: Data, Code and Math Problems


Chapter Nine

Variable Selection and Dimension Reduction


A Closer Look at Under/Overfitting
A Simple Guiding Example


How Many Is Too Many?


Fit Criteria
Some Common Measures
No Panacea!

Variable Selection Methods


Simple Use of p-Values: Pitfalls


Asking "What If" Questions


Stepwise Selection
Basic Notion
Forward vs Backward Selection
R Functions for Stepwise Regression
Example: Bodyfat Data
Classification Settings
Example: Bank Marketing Data
Example: Vertebrae Data
Nonparametric Settings
Is Dimension Reduction Important in the Nonparametric Setting?
The LASSO
Why the LASSO Often Performs Subsetting
Example: Bodyfat Data


Post-Selection Inference


Direct Methods for Dimension Reduction
Informal Nature
Role in Regression Analysis
PCA
Issues
Example: Bodyfat Data
Example: Instructor Evaluations
Nonnegative Matrix Factorization (NMF)
Overview
Interpretation
Sum-of-Parts Property
Example: Spam Detection
Use of freqparcoord for Dimension Reduction
Example: Student Evaluations of Instructors
Dimension Reduction for Dummy/R Factor Variables


The Verdict


Further Reading


Computational Complements
Computation for NMF


Mathematical Complements
MSEs for the Simple Example


Further Exploration: Data, Code and Math Problems

 

Chapter Ten

Partition-Based Methods


CART


Example: Vertebral Column Data


Technical Details


Statistical Consistency


Tuning Parameters


Random Forests
Bagging
Example: Vertebrae Data
Example: Letter Recognition


Other Implementations of CART


Further Exploration: Data, Code and Math Problems



Chapter Eleven

Semi-Linear Methods



k-NN with Linear Smoothing
Extrapolation Via lm()
Multicollinearity Issues
Example: Bodyfat Data
Tuning Parameter

Linear Approximation of Class Boundaries
SVMs
Geometric Motivation
Reduced Convex Hulls
Tuning Parameter
Nonlinear Boundaries
Statistical Consistency
Example: Letter Recognition Data
Neural Networks
Example: Vertebrae Data
Tuning Parameters and Other Technical Details
Dimension Reduction
Statistical Consistency


The Verdict


Mathematical Complements

Edge Bias with k-NN and Kernel Methods
Dual Formulation for SVM
The Kernel Trick


Further Reading


Further Exploration: Data, Code and Math Problems

 

Chapter Twelve

Regression and Classification in Big Data


Solving the Big-n Problem
Software Alchemy
Example: Flight Delay Data
More on the Insufficient Memory Issue
Deceivingly Big-n
The Independence Assumption in Big-n Data


Addressing Big-p
How Many Is Too Many?
Toy Model
Results from the Research Literature
A Much Simpler and More Direct Approach
Nonparametric Case
The Curse of Dimensionality
Example: Currency Data
Example: Quiz Documents
The Verdict


Mathematical Complements
Speedup from Software Alchemy

Computational Complements
The partools Package
Use of the tm Package

Further Exploration: Data, Code and Math Problems

Appendix A

Matrix Algebra


Terminology and Notation
Matrix Addition and Multiplication


Matrix Transpose


Linear Independence


Matrix Inverse


Eigenvalues and Eigenvectors


Rank of a Matrix


Matrices of the Form B'B


Partitioned Matrices


Matrix Derivatives


Matrix Algebra in R


Further Reading

Book database provided by Aladin (www.aladin.co.kr)