Advances in Data Science and Analytics

Concepts and Paradigms

Gianey, Hemant Kumar; Niranjanamurthy, M.; Gandomi, Amir H.

John Wiley & Sons Inc

11/2022

352

Hardcover

English

9781119791881

15 to 20 days

573

Description not available.
Preface xv

1 Implementation Tools for Generating Statistical Consequence Using Data Visualization Techniques 1
Dr. Ajay B. Gadicha, Dr. Vijay B. Gadicha, Prof. Sneha Bohra and Dr. Niranjanamurthy M.

1.1 Introduction 2

1.2 Literature Review 4

1.3 Tools in Data Visualization 4

1.4 Methodology 14

1.4.1 Plotting the Data 14

1.4.2 Plotting the Model on Data 15

1.4.3 Quantifying Linear Relationships 16

1.4.4 Covariance vs. Correlation 17

1.5 Conclusion 18

References 18

2 Decision Making and Predictive Analysis for Real Time Data 21
Umesh Pratap Singh

2.1 Introduction 22

2.2 Data Analytics 23

2.2.1 Descriptive Analytics 23

2.2.2 Diagnostic Analytics 23

2.2.3 Predictive Analytics 23

2.2.4 Prescriptive Analytics 24

2.3 Predictive Modeling 24

2.4 Categories of Predictive Models 24

2.5 Process of Predictive Modeling 25

2.5.1 Requirement Gathering 26

2.5.2 Data Gathering 26

2.5.3 Data Analysis and Massaging 26

2.5.4 Machine Learning Statistics 26

2.5.5 Predictive Modeling 26

2.5.6 Prediction and Decision Making 27

2.6 Predictive Analytics Opportunities 27

2.6.1 Detecting Fraud 27

2.6.2 Reduction of Risk 27

2.6.3 Marketing Campaign Optimization 28

2.6.4 Operation Improvement 28

2.6.5 Clinical Decision Support System 28

2.7 Classification of Predictive Analytics Models 28

2.7.1 Predictive Models 28

2.7.2 Descriptive Models 29

2.7.3 Decision Models 29

2.8 Predictive Analytics Techniques 29

2.8.1 Predictive Analytics Software 29

2.8.2 The Importance of Good Data 30

2.8.3 Predictive Analytics vs. Business Intelligence 30

2.8.4 Pricing Information 30

2.9 Data Analysis Tools 30

2.9.1 Excel 30

2.9.2 Tableau 31

2.9.3 Power BI 31

2.9.4 FineReport 31

2.9.5 R & Python 31

2.10 Advantages & Disadvantages of Predictive Modeling 31

2.10.1 Advantages 31

2.10.2 Disadvantages 32

2.10.2.1 Data Labeling 32

2.10.2.2 Obtaining Massive Training Datasets 32

2.10.2.3 The Explainability Problem 32

2.10.2.4 Generalizability of Learning 33

2.10.2.5 Bias in Algorithms and Data 33

2.11 Predictive Analytics Biggest Impact 33

2.11.1 Predicting Demand 33

2.11.2 Transformation Using Technology and Process 34

2.11.3 Improved Pricing 34

2.11.4 Predictive Maintenance 35

2.12 Application of Predictive Analytics 35

2.12.1 Financial and Banking Services 35

2.12.2 Retail 35

2.12.3 Health and Insurance 36

2.12.4 Oil and Gas Utilities 36

2.12.5 Public Sector 36

2.13 Future Scope of Predictive Modeling 36

2.13.1 Technological Advancements 37

2.13.2 Changes in Work 37

2.13.3 Risk Mitigation 37

2.14 Conclusion 37

References 38

3 Optimizing Water Quality with Data Analytics and Machine Learning 39
Bin Liang, Zhidong Li, Hongda Tian, Shuming Liang, Yang Wang and Fang Chen

3.1 Introduction 39

3.2 Related Work 41

3.3 Data Sources and Collection 42

3.4 Water Demand Forecasting 43

3.4.1 Network Flow and Zone Demand Estimation 43

3.4.2 Demand Forecasting 44

3.4.2.1 Feature Importance 45

3.4.2.2 Forecast Horizon 46

3.4.3 Performance Characterization 46

3.5 Re-Chlorination Optimization 49

3.5.1 Data 51

3.5.2 Water Age Estimation 52

3.5.2.1 Travel Time Estimation 53

3.5.2.2 Residential Time Estimation 54

3.5.3 Ammonia Prediction 54

3.5.4 Optimization Model Definition 57

3.5.5 Improvements in Customer Water Quality 59

3.5.6 Plant Dosing Optimization 62

3.6 Conclusion 63

Acknowledgements 63

References 63

4 Lip Reading Framework using Deep Learning and Machine Learning 67
Hemant Kumar Gianey, Parth Khandelwal, Prakhar Goel, Rishav Maheshwari, Bhannu Galhotra and Divyanshu Pratap Singh

4.1 Introduction 68

4.1.1 Overview 68

4.1.2 Motivation 68

4.1.3 Lip Reading System Outcomes and Deliverables 69

4.2 The Emergence and Definition of the Lip-Reading System 70

4.2.1 Background of Domain 70

4.2.2 Identified Problems 78

4.2.3 Tools and Technologies Used 78

4.2.4 Implementation Aspects 78

4.2.4.1 Data Preparation 79

4.3 Design and Components of Lip-Reading System 82

4.4 Lip Reading System Architecture 82

4.5 Testing 84

4.6 Problems Encountered During Implementation 84

4.6.1 Assumptions and Constraints 85

4.7 Conclusion 85

4.8 Future Work 85

References 86

5 New Perspective to Management, Economic Growth and Debt Nexus Analysis: Evidence from Indian Economy 89
Edmund Ntom Udemba, Festus Victor Bekun, Dervis Kirikkaleli and Esra Sipahi Doenguel

5.1 Introduction 90

5.2 Literature Review 92

5.2.1 External Debt and Economic Growth 92

5.2.2 Trade Openness, FDI, and Economic Growth 94

5.2.3 FDI and Economic Growth 94

5.3 Data 95

5.3.1 Analytical Framework and Data Description 96

5.3.2 Theoretical Background and Specifications 96

5.3.2.1 Model Specification 98

5.4 Methodology and Findings 99

5.4.1 Unit Root Testing 99

5.4.2 Cointegration 99

5.4.3 Vector Error Correction Model 103

5.4.4 Long-Run Relationship Estimation 105

5.4.5 Causality Test 107

5.5 Conclusion and Policy Implications 108

Declarations 109

Availability of Data and Materials 109

Competing Interests 110

Funding 110

Authors' Contributions 110

Acknowledgments 110

References 110

6 Data-Driven Delay Analysis with Applications to Railway Networks 115
Boyu Li, Ting Guo, Yang Wang and Fang Chen

6.1 Introduction 116

6.2 Related Works 118

6.3 Background Knowledge 119

6.3.1 Background and Problem Formulation 120

6.3.1.1 Train Delay 120

6.3.1.2 Delay Propagation 121

6.3.2 Preliminaries 122

6.3.2.1 Bayesian Inference 123

6.3.2.2 Markov Property 123

6.4 Delay Propagation Model 123

6.4.1 Conditional Bayesian Delay Propagation 123

6.4.1.1 Delay Self-Propagation 124

6.4.1.2 Incremental Run-Time Delay 125

6.4.1.3 Incremental Dwell Time Delay 125

6.4.1.4 Accumulative Departure Delay 126

6.4.2 Cross-Line Propagation, Backward Propagation and Train Connection Propagation 127

6.5 Primary Delay Tracing Back 130

6.5.1 Delay Candidates Selection 130

6.5.2 Relation Construction 131

6.5.2.1 Preceding and Following Trains 131

6.5.2.2 Preceding and Connecting Trains 131

6.6 Evaluation on Dwell Time Improvement Strategy 132

6.7 Experiments 135

6.7.1 Experiment Setting 135

6.7.2 Temporal Prediction of Delay Propagation 137

6.7.3 Spatial Prediction of Delay Propagation 138

6.7.4 Case Study of Primary Delay Tracing Down 139

6.7.5 Evaluation of Dwell Time Improvement Strategy 140

6.8 Conclusion 142

References 142

7 Proposing a Framework to Analyze Breast Cancer in Mammogram Images Using Global Thresholding, Gray Level Co-Occurrence Matrix, and Convolutional Neural Network (CNN) 145
Ms. Tanishka Dixit and Ms. Namrata Singh

7.1 Introduction & Purpose of Study 146

7.1.1 Segmentation 146

7.1.1.1 Types of Segmentation 147

7.1.2 Compression 150

7.2 Literature Review & Motivation 153

7.3 Proposed Work 161

7.3.1 Algorithm 161

7.3.2 Explanation 162

7.3.3 Flowchart 162

7.4 Observation Tables and Figures 163

7.5 Conclusion 176

7.6 Future Work 176

References 176

8 IoT Technologies for Smart Healthcare 181
Rehab A. Rayan, Imran Zafar and Christos Tsagkaris

8.1 Introduction 182

8.2 Literature Review 183

8.2.1 IoT-Based Smart Health 183

8.2.2 Advantages of Applying IoT in Health 186

8.3 Findings 187

8.3.1 Significant Features and Applications of IoT in Health 187

8.3.1.1 Simultaneous Monitoring and Reporting 189

8.3.1.2 End-to-End Connectivity and Affordability 190

8.3.1.3 Data Analysis 190

8.3.1.4 Tracking, Alerts, and Remote Medical Care 190

8.3.1.5 Research 191

8.3.1.6 Patient-Generated Health Data (PGHD) 191

8.3.1.7 Management of Chronic Diseases and Preventative Care 191

8.3.1.8 Home-Based and Short-Term Care 192

8.4 Case Study: CyberMed as an IoT-Based Smart Health Model 192

8.5 Discussions 193

8.5.1 Limitations of Adopting IoT in Health 193

8.5.1.1 Data Security and Privacy 193

8.5.1.2 Connectivity 194

8.5.1.3 Compatibility and Data Integration 195

8.5.1.4 Implementation Cost 195

8.5.1.5 Complexity and Risk of Errors 195

8.6 Future Insights 196

8.7 Conclusions 197

References 197

9 Enhancement of Scalability of SVM Classifiers for Big Data 203
Vijaykumar Bhajantri, Shashikumar G. Totad and Geeta R. Bharamagoudar

9.1 Introduction 204

9.2 Support Vector Machine 205

9.2.1 Challenges 208

9.3 Parallel and Distributed Mechanism 209

9.3.1 Shared-Memory Parallelism 209

9.4 Distributed Big Data Architecture 210

9.4.1 Hadoop MapReduce 210

9.4.2 Spark 210

9.4.3 Akka 211

9.5 Distributed High Performance Computing 212

9.5.1 GASNet 212

9.5.2 Charm++ 213

9.6 GPU Based Parallelism 214

9.6.1 CUDA 215

9.6.2 OpenCL 215

9.7 Parallel and Distributed SVM Algorithms 217

9.7.1 LS-SVM 218

9.7.2 Cascade SVM 219

9.7.3 DC-SVM 220

9.7.4 Parallel Distributed Multiclass SVM Algorithms 222

9.8 Conclusion and Future Research Directions 222

References 225

10 Electrical Network-Related Incident Prediction Based on Weather Factors 233
Hongda Tian, Jessie Nghiem and Fang Chen

10.1 Introduction 233

10.2 Related Work 235

10.3 Methodology 235

10.3.1 Binary Classification of Incident and Normality 235

10.3.2 Incident Categorization Using Natural Language Processing 236

10.3.3 Classification of Multiple Types of Incidents 236

10.4 Experiments 237

10.4.1 Data Sets 237

10.4.2 Evaluation Metrics 239

10.4.3 Binary Classification 239

10.4.4 Incident Categorization 241

10.4.5 Multi-Class Classification 242

10.5 Conclusion and Future Work 244

Acknowledgements 244

References 245

11 Green IoT: Environment-Friendly Approach to IoT 247
Abhishek Goel and Siddharth Gautam

11.1 Introduction 247

11.2 G-IoT (Green Internet of Things) 249

11.3 Layered Architecture of G-IoT 251

11.3.1 Data Center/Cloud 252

11.3.2 Data Analytics and Control Applications 252

11.3.3 Data Aggregation and Storage 253

11.3.4 Edge Computing 253

11.3.5 Communication and Processing Unit 254

11.4 Techniques for Implementation of G-IoT 257

11.5 Power Saving Methods Based on Components 266

11.6 Applications of G-IoT 266

11.7 Challenges and Future Scope 269

11.8 Case Study 269

11.9 Conclusion 270

References 271

12 Big-Data Analytics: A New Paradigm Shift in Micro Finance Industry 275
Vinay Pal Singh, Rohit Bansal and Ram Singh

12.1 Introduction 276

12.2 Reality of Area and Transcendent Difficulties 276

12.2.1 Probable Overlending 278

12.2.2 Information Imbalance 278

12.2.3 Retreating Not-for-Profit Sector 278

12.2.4 Neighbourhood Pressure 279

12.3 Data Analytics in Microfinance 280

12.3.1 Types of Data Analytics Used in Microfinance 280

12.3.2 Use of Big Data in Microfinance Industry 281

12.3.3 Risk and Data Based Credit Decisions 282

12.3.4 Product Development and Selection 283

12.3.5 Product or Service Positioning 283

12.3.6 M-Commerce and E-Payments 283

12.3.7 Making Reliable Credit Decisions 284

12.3.8 Big Data-Driven Model Promises Psychometric Evaluations 284

12.3.9 Product Build-Up, Service Positioning, and Offering 284

12.4 Opportunities and Risks in Using Data Analytics 284

12.5 Risk in Utilizing Big Data 287

12.6 Conclusion 290

References 290

13 Big Data Storage and Analysis 293
Namrata Dhanda

13.1 Introduction 293

13.1.1 6 V's of Big Data 294

13.1.2 Types of Data 295

13.1.3 Issues in Handling Big Data 297

13.2 Hadoop as a Solution to Challenges of Big Data 297

13.2.1 The Hadoop Ecosystem 298

13.2.2 Rack Awareness Policy in HDFS 307

13.3 In-Memory Storage and NoSQL 308

13.3.1 Key-Value Data Stores 309

13.3.2 Document Stores 309

13.3.3 Wide Column Stores 310

13.3.4 Graph Stores 310

13.3.5 Multi-Modal Databases 310

13.4 Advantages of NoSQL Database 310

13.5 Conclusion 311

References 311

14 A Framework for Analysing Social Media and Digital Data by Applying Machine Learning Techniques for Pandemic Management 313
Mutyala Sridevi

14.1 Introduction 314

14.2 Literature Review 314

14.3 Understanding Pandemic Analogous to a Disaster 317

14.4 Application of Machine Learning Techniques at Various Phases of Pandemic Management 318

14.4.1 Mitigation Phase 319

14.4.2 Preparedness Phase 320

14.4.3 Response Phase 321

14.4.4 Recovery Phase 321

14.5 Generalized Framework to Apply Machine Learning Techniques for Pandemic Management 322

14.6 Conclusion 324

References 324

About the Editors 327

Index 329
This title belongs to the subject(s) listed below.
Data Science; Machine Learning; Big Data; Business Intelligence; Four Types of Analytics; Descriptive Analytics; Diagnostic Analytics; Predictive Analytics; Prescriptive Analytics; Introduction to Data Science; What is Data Science and why is it so important?; Overview of Data Science and Analytics; Mathematics for Data Science; Introduction to Python and R; Data Visualization Techniques; Understanding and Visualizing Data; Data Visualization in Tableau; Decision Making and Predictive Analysis; Implementing Scientific Decision Making; Using Predictive Data Analysis; Data Modeling and Optimization; Modeling Uncertainty and Risk; Optimization and Modeling Simultaneous Decisions; Machine Learning (Supervised Learning); Regression Techniques; Data Exploration; Evaluation Methods; Classification Techniques; Machine Learning (Unsupervised Learning); Clustering Techniques; Anomaly Detection; Dimensionality Reduction; Association Rule Learning; Hands-on Clustering; Hands-on Association Rule Mining; Hands-on Dimensionality Reduction; Hands-on Anomaly Detection; Deep Learning; Neural Networks; Big Data Analytics; Introduction to Big Data and Hadoop; HDFS and YARN; MapReduce and Sqoop; Hive and Impala; Apache Flume and HBase; Pig; Apache Spark; Spark RDD Optimization Techniques; Spark Algorithm; Spark SQL; Data Science with R; Python for Data Science; Building a Data Team; Data Processing; Data Storage; Data Privacy and Security; Bayesian Networks; Association Rules Learning; Clustering; Analytical Case Studies Based on Domains; Advanced Tools to Support Data Science and Analytics