Book Information
· Category: Foreign Books > Business & Economics > Mathematical Economics
· ISBN: 9781032340197
· Pages: 576
· Publication date: 2022-07-12
Table of Contents
Introduction
· So What Is Big Data?
· Growing Interest in Decision Making
· What This Book Addresses
· The Conversation about Big Data
· Technological Change as a Driver of Big Data
· The Central Question: So What?
· Our Goals as Authors
· References

The Mother of Invention’s Triplets: Moore’s Law, the Proliferation of Data, and Data Storage Technology
· Moore’s Law
· Parallel Computing, Between and Within Machines
· Quantum Computing
· Recap of Growth in Computing Power
· Storage, Storage Everywhere
· Grist for the Mill: Data Used and Unused
· Agriculture
· Automotive
· Marketing in the Physical World
· Online Marketing
· Asset Reliability and Efficiency
· Process Tracking and Automation
· Toward a Definition of Big Data
· Putting Big Data in Context
· Key Concepts of Big Data and Their Consequences
· Summary
· References

Hadoop
· Power through Distribution
· Cost Effectiveness of Hadoop
· Not Every Problem Is a Nail
· Some Technical Aspects
· Troubleshooting Hadoop
· Running Hadoop
· Hadoop File System
· MapReduce
· Pig and Hive
· Installation
· Current Hadoop Ecosystem
· Hadoop Vendors
· Cloudera
· Amazon Web Services (AWS)
· Hortonworks
· IBM
· Intel
· MapR
· Microsoft
· To Run Pig Latin Using Powershell
· Pivotal
· References

HBase and Other Big Data Databases
· Evolution from Flat File to the Three V’s
· Flat File
· Hierarchical Database
· Network Database
· Relational Database
· Object-Oriented Databases
· Relational-Object Databases
· Transition to Big Data Databases
· What Is Different about HBase?
· What Is Bigtable?
· What Is MapReduce?
· What Are the Various Modalities for Big Data Databases?
· Graph Databases
· How Does a Graph Database Work?
· What is the Performance of a Graph Database?
· Document Databases
· Key-Value Databases
· Column-Oriented Databases
· HBase
· Apache Accumulo
· References

Machine Learning
· Machine Learning Basics
· Classifying with Nearest Neighbors
· Naive Bayes
· Support Vector Machines
· Improving Classification with Adaptive Boosting
· Regression
· Logistic Regression
· Tree-Based Regression
· K-Means Clustering
· Apriori Algorithm
· Frequent Pattern-Growth
· Principal Component Analysis (PCA)
· Singular Value Decomposition
· Neural Networks
· Big Data and MapReduce
· Data Exploration
· Spam Filtering
· Ranking
· Predictive Regression
· Text Regression
· Multidimensional Scaling
· Social Graphing
· References

Statistics
· Statistics, Statistics Everywhere
· Digging into the Data
· Standard Deviation: The Standard Measure of Dispersion
· The Power of Shapes: Distributions
· Distributions: Gaussian Curve
· Distributions: Why Be Normal?
· Distributions: The Long Arm of the Power Law
· The Upshot? Statistics Are not Bloodless
· Fooling Ourselves: Seeing What We Want to See in the Data
· We Can Learn Much from an Octopus
· Hypothesis Testing: Seeking a Verdict
· Two-Tailed Testing
· Hypothesis Testing: A Broad Field
· Moving on to Specific Hypothesis Tests
· Regression and Correlation
· p Value in Hypothesis Testing: A Successful Gatekeeper?
· Specious Correlations and Overfitting the Data
· A Sample of Common Statistical Software Packages
· Minitab
· SPSS
· R
· SAS
· Big Data Analytics
· Hadoop Integration
· Angoss
· Statistica
· Capabilities
· Summary
· References

Google
· Big Data Giants
· Google
· Go
· Android
· Google Product Offerings
· Google Analytics
· Advertising and Campaign Performance
· Analysis and Testing
· Facebook
· Ning
· Non-United States Social Media
· Tencent
· Line
· Sina Weibo
· Odnoklassniki
· Vkontakte
· Nimbuzz
· Ranking Network Sites
· Negative Issues with Social Networks
· Amazon
· Some Final Words
· References

Geographic Information Systems (GIS)
· GIS Implementations
· A GIS Example
· GIS Tools
· GIS Databases
· References

Discovery
· Faceted Search versus Strict Taxonomy
· First Key Ability: Breaking Down Barriers
· Second Key Ability: Flexible Search and Navigation
· Underlying Technology
· The Upshot
· Summary
· References

Data Quality
· Know Thy Data and Thyself
· Structured, Unstructured, and Semistructured Data
· Data Inconsistency: An Example from This Book
· The Black Swan and Incomplete Data
· How Data Can Fool Us
· Ambiguous Data
· Aging of Data or Variables
· Missing Variables May Change the Meaning
· Inconsistent Use of Units and Terminology
· Biases
· Sampling Bias
· Publication Bias
· Survivorship Bias
· Data as a Video, Not a Snapshot: Different Viewpoints as a Noise Filter
· What Is My Toolkit for Improving My Data?
· Ishikawa Diagram
· Interrelationship Digraph
· Force Field Analysis
· Data-Centric Methods
· Troubleshooting Queries from Source Data
· Troubleshooting Data Quality beyond the Source System
· Using Our Hidden Resources
· Summary
· References

Benefits
· Data Serendipity
· Converting Data Dreck to Usefulness
· Sales
· Returned Merchandise
· Security
· Medical
· Travel
· Lodging
· Vehicle
· Meals
· Geographical Information Systems
· New York City
· Chicago CLEARMAP
· Baltimore
· San Francisco
· Los Angeles
· Tucson, Arizona, University of Arizona, and COPLINK
· Social Networking
· Education
· General Educational Data
· Legacy Data
· Grades and other Indicators
· Testing Results
· Addresses, Phone Numbers, and More
· Concluding Comments
· References

Concerns
· Part Two: Basic Principles of National Application
· Collection Limitation Principle
· Data Quality Principle
· Purpose Specification Principle
· Use Limitation Principle
· Security Safeguards Principle
· Openness Principle
· Individual Participation Principle
· Accountability Principle
· Logical Fallacies
· Affirming the Consequent
· Denying the Antecedent
· Ludic Fallacy
· Cognitive Biases
· Confirmation Bias
· Notational Bias
· Selection/Sample Bias
· Halo Effect
· Consistency and Hindsight Biases
· Congruence Bias
· Von Restorff Effect
· Data Serendipity
· Converting Data Dreck to Usefulness
· Sales
· Merchandise Returns
· Security
· CompStat
· Medical
· Travel
· Lodging
· Vehicle
· Meals
· Social Networking
· Education
· Making Yourself Harder to Track
· Misinformation
· Disinformation
· Reducing/Eliminating Profiles
· Social Media
· Self Redefinition
· Identity Theft
· Facebook
· Concluding Comments
· References

Epilogue
· Michael Porter’s Five Forces Model
· Bargaining Power of Customers
· Bargaining Power of Suppliers
· Threat of New Entrants
· Others
· The OODA Loop
· Implementing Big Data
· Nonlinear, Qualitative Thinking
· Closing
· References