BOOKPRICE.co.kr
Book price comparison site
[eBook Code] Real-Time Analytics (eBook Code, 1st)

(Techniques to Analyze and Visualize Streaming Data)

Byron Ellis (author) | Wiley | 2014-06-23 | 62,550 won

New books

Bookstore | Sale price | Discount | Shipping | Benefits/extras | Effective lowest price
Aladin | 50,040 won | -20% | 0 won | 0 won | 50,040 won

Note: search results may include other books.

Book information

· Title: [eBook Code] Real-Time Analytics (eBook Code, 1st) (Techniques to Analyze and Visualize Streaming Data)
· Category: Foreign books > Computers > Database management > Data warehousing
· ISBN: 9781118838020
· Pages: 432

Table of contents

Introduction xv

Chapter 1 Introduction to Streaming Data 1

Sources of Streaming Data 2

Operational Monitoring 3

Web Analytics 3

Online Advertising 4

Social Media 5

Mobile Data and the Internet of Things 5

Why Streaming Data Is Different 7

Always On, Always Flowing 7

Loosely Structured 8

High-Cardinality Storage 9

Infrastructures and Algorithms 10

Conclusion 10

Part I Streaming Analytics Architecture 13

Chapter 2 Designing Real-Time Streaming Architectures 15

Real-Time Architecture Components 16

Collection 16

Data Flow 17

Processing 19

Storage 20

Delivery 22

Features of a Real-Time Architecture 24

High Availability 24

Low Latency 25

Horizontal Scalability 26

Languages for Real-Time Programming 27

Java 27

Scala and Clojure 28

JavaScript 29

The Go Language 30

A Real-Time Architecture Checklist 30

Collection 31

Data Flow 31

Processing 32

Storage 32

Delivery 33

Conclusion 34

Chapter 3 Service Configuration and Coordination 35

Motivation for Configuration and Coordination Systems 36

Maintaining Distributed State 36

Unreliable Network Connections 36

Clock Synchronization 37

Consensus in an Unreliable World 38

Apache ZooKeeper 39

The znode 39

Watches and Notifications 41

Maintaining Consistency 41

Creating a ZooKeeper Cluster 42

ZooKeeper’s Native Java Client 47

The Curator Client 56

Curator Recipes 63

Conclusion 70

Chapter 4 Data-Flow Management in Streaming Analysis 71

Distributed Data Flows 72

At Least Once Delivery 72

The “n+1” Problem 73

Apache Kafka: High-Throughput Distributed Messaging 74

Design and Implementation 74

Configuring a Kafka Environment 80

Interacting with Kafka Brokers 89

Apache Flume: Distributed Log Collection 92

The Flume Agent 92

Configuring the Agent 94

The Flume Data Model 95

Channel Selectors 95

Flume Sources 98

Flume Sinks 107

Sink Processors 110

Flume Channels 110

Flume Interceptors 112

Integrating Custom Flume Components 114

Running Flume Agents 114

Conclusion 115

Chapter 5 Processing Streaming Data 117

Distributed Streaming Data Processing 118

Coordination 118

Partitions and Merges 119

Transactions 119

Processing Data with Storm 119

Components of a Storm Cluster 120

Configuring a Storm Cluster 122

Distributed Clusters 123

Local Clusters 126

Storm Topologies 127

Implementing Bolts 130

Implementing and Using Spouts 136

Distributed Remote Procedure Calls 142

Trident: The Storm DSL 144

Processing Data with Samza 151

Apache YARN 151

Getting Started with YARN and Samza 153

Integrating Samza into the Data Flow 157

Samza Jobs 157

Conclusion 166

Chapter 6 Storing Streaming Data 167

Consistent Hashing 168

“NoSQL” Storage Systems 169

Redis 170

MongoDB 180

Cassandra 203

Other Storage Technologies 215

Relational Databases 215

Distributed In-Memory Data Grids 215

Choosing a Technology 215

Key-Value Stores 216

Document Stores 216

Distributed Hash Table Stores 216

In-Memory Grids 217

Relational Databases 217

Warehousing 217

Hadoop as ETL and Warehouse 218

Lambda Architectures 223

Conclusion 224

Part II Analysis and Visualization 225

Chapter 7 Delivering Streaming Metrics 227

Streaming Web Applications 228

Working with Node 229

Managing a Node Project with NPM 231

Developing Node Web Applications 235

A Basic Streaming Dashboard 238

Adding Streaming to Web Applications 242

Visualizing Data 254

HTML5 Canvas and Inline SVG 254

Data-Driven Documents: D3.js 262

High-Level Tools 272

Mobile Streaming Applications 277

Conclusion 279

Chapter 8 Exact Aggregation and Delivery 281

Timed Counting and Summation 285

Counting in Bolts 286

Counting with Trident 288

Counting in Samza 289

Multi-Resolution Time-Series Aggregation 290

Quantization Framework 290

Stochastic Optimization 296

Delivering Time-Series Data 297

Strip Charts with D3.js 298

High-Speed Canvas Charts 299

Horizon Charts 301

Conclusion 303

Chapter 9 Statistical Approximation of Streaming Data 305

Numerical Libraries 306

Probabilities and Distributions 307

Expectation and Variance 309

Statistical Distributions 310

Discrete Distributions 310

Continuous Distributions 312

Joint Distributions 315

Working with Distributions 316

Inferring Parameters 316

The Delta Method 317

Distribution Inequalities 319

Random Number Generation 319

Generating Specific Distributions 321

Sampling Procedures 324

Sampling from a Fixed Population 325

Sampling from a Streaming Population 326

Biased Streaming Sampling 327

Conclusion 329

Chapter 10 Approximating Streaming Data with Sketching 331

Registers and Hash Functions 332

Registers 332

Hash Functions 332

Working with Sets 336

The Bloom Filter 338

The Algorithm 338

Choosing a Filter Size 340

Unions and Intersections 341

Cardinality Estimation 342

Interesting Variations 344

Distinct Value Sketches 347

The Min-Count Algorithm 348

The HyperLogLog Algorithm 351

The Count-Min Sketch 356

Point Queries 356

Count-Min Sketch Implementation 357

Top-K and “Heavy Hitters” 358

Range and Quantile Queries 360

Other Applications 364

Conclusion 364

Chapter 11 Beyond Aggregation 367

Models for Real-Time Data 368

Simple Time-Series Models 369

Linear Models 373

Logistic Regression 378

Neural Network Models 380

Forecasting with Models 389

Exponential Smoothing Methods 390

Regression Methods 393

Neural Network Methods 394

Monitoring 396

Outlier Detection 397

Change Detection 399

Real-Time Optimization 400

Conclusion 402

Index 403

About the author

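Byron Ellis (author)
CTO of Spongecell, an advertising technology company, where he is responsible for research and development as well as for maintaining the company's computing infrastructure. Before joining Spongecell, he was Chief Data Scientist at Liveperson, a leading provider of online engagement technology. He also held a variety of positions at adBrite, one of the world's largest advertising exchange platforms. He holds a PhD in statistics from Harvard University.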
Book database provided by: Aladin (www.aladin.co.kr)