Book information
· Category: Foreign books > Computers > Database management > Data mining
· ISBN: 9781484214800
· Pages: 230
· Publication date: 2016-06-14
Table of contents
Chapter 1: Introduction to Spark
Chapter Goal: Introduce the reader to Spark in general. This book does not assume that the reader is already familiar with Spark.
Sub-Topics:
· Introduction to Spark and its key selling points
· The programming model
· Architecture
· Introduction to other systems within the ecosystem, such as MLlib, GraphX, SparkSQL, and SparkR

Chapter 2: Spark Streaming
Chapter Goal: Introduce Spark Streaming and the concept of micro-batch processing (DStreams).
Sub-Topics:
· Introduction to Spark Streaming/DStreams
· Comparison with traditional stream processing
· How Spark Streaming works under the hood
· Programming API and how it relates to the general Spark API
· First sample application using FileInputDStream

Chapter 3: Best Practices
Chapter Goal: Convey best practices for application development.
Sub-Topics:
· Maintaining state in an application
· Data caching to reduce redundant work
· Offloading RDD maintenance to Tachyon
· Fault-tolerance and check-pointing

Chapter 4: Ingesting data from external data sources
Chapter Goal: Enable the reader to understand the various data ingestion options, their pros and cons, and their integration with Spark Streaming.
Sub-Topics:
1. Introduction to Receivers
2. Kafka
3. Twitter
4. Flume
5. Other sources
6. Writing your own connector. Example: Apache Qpid

Chapter 5: Optimizing and maintaining a Spark Streaming application/deployment
Chapter Goal: Help the user optimize an application and maintain it in production.
Sub-Topics:
· Different configuration parameters and how they affect the application
· Parallelism
· Serialization, memory, etc. enhancements
· Various monitoring and instrumentation options

Chapter 6: Spark Streaming, SQL, and R
Chapter Goal: Illustrate how a SQL/DataFrame interface can simplify common transforms.
Sub-Topics:
· Introduction to SparkSQL and SQLContext
· Various SQL constructs
· Integration with R
· Design of a few real-world applications

Chapter 7: Streaming Machine Learning
Chapter Goal: Employ MLlib to implement streaming machine learning applications.
Sub-Topics:
· Introduction to streaming algorithms in MLlib
· Real-world applications using streaming MLlib

Chapter 8: Lambda Architecture using Spark
Chapter Goal: Blending data at rest with data in motion.
Sub-Topics:
· Introduction to the Lambda Architecture
· Design of Lambda Architecture using Spark

Chapter 9: Java and Python APIs for Spark Streaming
Chapter Goal: Introduction to Spark Streaming in Java and Python.
Sub-Topics:
· Java API
· Python API

Chapter 10: Spark Streaming and Beyond
Chapter Goal: Overview of some of the future plans for Spark Streaming from the open source community.
Sub-Topics:
· Project Tungsten and how its CPU and memory improvements can benefit streaming applications
· Links to useful resources














