Book information
· Category: Foreign books > Economics & Business > Industry > Computers
· ISBN: 9781484257807
· Pages: 274
· Publication date: 2020-06-12
Table of contents
Chapter 1, Introduction to large-scale data analytics.
Chapter goal: Reader should understand data analytics and its workflow.
· So what is data analysis?
· The process of running a data analysis project.
· Real world example.
· What about data scientists?
Chapter 3, Distributed processing, Spark and Databricks.
Chapter goal: Reader should understand Spark on a high level and what Databricks is.
· Computational history.
· Scale up vs scale out.
· Traditional analytics platforms.
· The power of Spark.
· The simplicity of Databricks.
Chapter 4, Getting started with Databricks.
Chapter goal: Reader should understand how to get a Databricks installation up and running.
· A short introduction to Spark architecture.
· Setting up a cloud account.
· Getting Databricks running.
· Finally, time to start Databricks.
Chapter 5, Workspaces, Clusters and Notebooks.
Chapter goal: Reader should understand how to find his or her way around the UI.
· Finding your way around the user interface.
· Starting the engine: cluster creation.
· A short note about checkboxes and configurations.
· Picking the right notebook.
· Keeping track of the workspace.
Chapter 6, Getting data into Databricks.
Chapter goal: Reader should understand the many ways they can get data into Databricks.
· Filesystems and data formats.
· Working with schemas.
· Importing Excel data.
· Picking up information from the web.
· Mounting the cloud data lake.
Chapter 7, Querying data using SQL.
Chapter goal: Reader should understand how to use SQL for looking at and manipulating data.
· Databases and tables in the Hive Metastore.
· Pulling some data.
· Joining, grouping and summarizing.
· Views and procedures.
· Hey, what’s up with updates?
Chapter 8, Python (and a little bit of Scala and R).
Chapter goal: Reader should understand how to use Python for playing around with data.
· An introduction to DataFrames.
· Python vs SQL.
· Working with data.
· But what about Scala and R?
Chapter 9, ETL and more advanced data wrangling.
Chapter goal: Reader should understand even more about manipulating data.
· Stars and snowflakes.
· Cleaning the data.
· Speeding things up.
· Working with partitions.
· Setting parameters.
Chapter 10, Connecting from afar.
Chapter goal: Reader should understand how they can connect to Databricks from other tools.
· Setting up ODBC and JDBC.
· Getting to know the APIs.
· Example: Connecting Power BI.
Chapter 11, Running in production.
Chapter goal: Reader should understand how to run and monitor jobs in production.
· How to set up jobs.
· Working with schedules.
· Monitoring the jobs.
Chapter 12, Removing the training wheels.
Chapter goal: Reader should get to know the more advanced options.
· Security in Databricks.
· Machine learning using MLlib.
· Going full ACID with Delta Lake.
· High speed streaming.
· A deep dive into Spark architecture.