Data analysis in the cloud : models, techniques and applications / [electronic resource]
by Talia, Domenico [author.]; Trunfio, Paolo [author.]; Marozzo, Fabrizio [author.].
Material type: Book
Series: Computer science reviews and trends
Publisher: Amsterdam, Netherlands : Elsevier Ltd., 2016.
Description: 1 online resource.
ISBN: 9780128029145; 0128029145.
Subject(s): Quantitative research | Data mining | Cloud computing | COMPUTERS -- General | Electronic books
Online resources: ScienceDirect
Vendor-supplied metadata.
Includes bibliographical references.
Data Analysis in the Cloud introduces and discusses models, methods, techniques, and systems to analyze the large number of digital data sources available on the Internet using the computing and storage facilities of the cloud. Coverage includes scalable data mining and knowledge discovery techniques together with cloud computing concepts, models, and systems. Specific sections focus on map-reduce and NoSQL models. The book also includes techniques for conducting high-performance distributed analysis of large data on clouds. Finally, the book examines research trends such as Big Data pervasive computing, data-intensive exascale computing, and massive social network analysis.
Cover; Title Page; Copyright Page; Dedication; Contents; Preface; Chapter 1 -- Introduction to Data Mining; 1.1 -- Data mining concepts ; 1.1.1 -- Classification ; 1.1.1.1 -- Decision Trees ; 1.1.1.2 -- Classification with kNN ; 1.1.2 -- Clustering ; 1.1.2.1 -- Bayesian Classification ; 1.1.2.2 -- The K-Means Algorithm ; 1.1.3 -- Association Rules ; 1.2 -- Parallel and distributed data mining ; 1.2.1 -- Parallel Classification ; 1.2.2 -- Parallel Clustering ; 1.2.3 -- Parallelism in Association Rules ; 1.2.4 -- Distributed Data Mining ; 1.2.4.1 -- Meta-Learning.
1.2.4.2 -- Collective Data Mining ; 1.2.4.3 -- Ensemble Learning ; 1.3 -- Summary ; References; Chapter 2 -- Introduction to Cloud Computing; 2.1 -- Cloud computing: definition, models, and architectures ; 2.1.1 -- Service Models ; 2.1.2 -- Deployment Models ; 2.1.3 -- Cloud Environments ; 2.1.3.1 -- Microsoft Azure ; 2.1.3.2 -- Amazon Web Services ; 2.1.3.3 -- OpenNebula ; 2.1.3.4 -- OpenStack ; 2.2 -- Cloud computing systems for data-intensive applications ; 2.2.1 -- Functional Requirements ; 2.2.1.1 -- Resource Management ; 2.2.1.2 -- Application Management.
2.2.2 -- Nonfunctional Requirements ; 2.2.2.1 -- User Requirements ; 2.2.2.2 -- Architecture Requirements ; 2.2.2.3 -- Infrastructure Requirements ; 2.2.3 -- Cloud Models for Distributed Data Analysis ; 2.3 -- Summary ; References ; Chapter 3 -- Models and Techniques for Cloud-Based Data Analysis; 3.1 -- MapReduce for data analysis ; 3.1.1 -- MapReduce Paradigm ; 3.1.2 -- MapReduce Frameworks ; 3.1.3 -- MapReduce Algorithms and Applications ; 3.2 -- Data analysis workflows ; 3.2.1 -- Workflow Programming ; 3.2.2 -- Workflow Management Systems ; 3.2.3 -- Workflow Management Systems for Clouds.
3.3 -- NoSQL models for data analytics ; 3.3.1 -- Key Features of NoSQL ; 3.3.2 -- Classification of NoSQL Databases ; 3.3.3 -- NoSQL Systems ; 3.3.3.1 -- Dynamo ; 3.3.3.2 -- MongoDB ; 3.3.3.3 -- Bigtable ; 3.3.4 -- Use Cases ; 3.4 -- Summary ; References ; Chapter 4 -- Designing and Supporting Scalable Data Analytics ; 4.1 -- Data analysis systems for clouds ; 4.1.1 -- Pegasus ; 4.1.2 -- Swift ; 4.1.3 -- Hunk ; 4.1.4 -- Sector/Sphere ; 4.1.5 -- BigML ; 4.1.6 -- Kognitio Analytical Platform ; 4.1.7 -- Mahout ; 4.1.8 -- Spark ; 4.1.9 -- Microsoft Azure Machine Learning ; 4.1.10 -- ClowdFlows.
4.2 -- How to design a scalable data analysis framework in clouds ; 4.2.1 -- Architecture and Execution Mechanisms ; 4.2.2 -- Implementation on Microsoft Azure ; 4.3 -- Programming workflow-based data analysis ; 4.3.1 -- VL4Cloud ; 4.3.2 -- JS4Cloud ; 4.3.3 -- Workflow Patterns in DMCF ; 4.3.3.1 -- Single Task ; 4.3.3.2 -- Pipeline ; 4.3.3.3 -- Data Partitioning ; 4.3.3.4 -- Data Aggregation ; 4.3.3.5 -- Parameter Sweeping ; 4.3.3.6 -- Input Sweeping ; 4.3.3.7 -- Tool Sweeping ; 4.3.3.8 -- Combination of Sweeping Patterns ; 4.4 -- Data analysis case studies.
4.4.1 -- Trajectory Mining Workflow Using VL4Cloud.