Computational Statistics in Data Science

Lee, Thomas C. M.; Piegorsch, Walter W.; Levine, Richard A.; Zhang, Hao Helen

John Wiley & Sons Inc

04/2022

672

Hardcover

English

9781119561071

15 to 20 days

1270

Description not available.
List of Contributors xxiii

Preface xxix


Part I Computational Statistics and Data Science 1


1 Computational Statistics and Data Science in the Twenty-first Century 3

Andrew J. Holbrook, Akihiko Nishimura, Xiang Ji, and Marc A. Suchard

1 Introduction 3

2 Core Challenges 1-3 5

3 Model-Specific Advances 8

4 Core Challenges 4 and 5 12

5 Rise of Data Science 16



2 Statistical Software 23

Alfred G. Schissler and Alexander D. Knudson

1 User Development Environments 23

2 Popular Statistical Software 26

3 Noteworthy Statistical Software and Related Tools 30

4 Promising and Emerging Statistical Software 36

5 The Future of Statistical Computing 38

6 Concluding Remarks 39


3 An Introduction to Deep Learning Methods 43

Yao Li, Justin Wang and Thomas C.M. Lee

1 Introduction 43

2 Machine Learning: An Overview 43

3 Feedforward Neural Networks 45

4 Convolutional Neural Networks 48

5 Autoencoders 52

6 Recurrent Neural Networks 54

7 Conclusion 57



4 Streaming Data and Data Streams 59

Taiwo Kolajo, Olawande Daramola, and Ayodele Adebiyi

1 Introduction 59

2 Data Stream Computing 61

3 Issues in Data Stream Mining 61

4 Streaming Data Tools and Technologies 64

5 Streaming Data Pre-Processing: Concept and Implementation 65

6 Streaming Data Algorithms 65

7 Strategies for Processing Data Streams 68

8 Best Practices for Managing Data Streams 69

9 Conclusion and the Way Forward 70


Part II Simulation-Based Methods 79



5 Monte Carlo Simulation: Are We There Yet? 81

Dootika Vats, James M. Flegal, and Galin L. Jones

1 Introduction 81

2 Estimation 83

3 Sampling Distribution 84

4 Estimating Σ 87

5 Stopping Rules 88

6 Workflow 89

7 Examples 90



6 Sequential Monte Carlo: Particle Filters and Beyond 99

Adam M. Johansen

1 Introduction 99

2 Sequential Importance Sampling and Resampling 99

3 SMC in Statistical Contexts 106

4 Selected Recent Developments 112



7 Markov Chain Monte Carlo Methods, A Survey with Some Frequent Misunderstandings 119

Christian P. Robert and Wu Changye

1 Introduction 119

2 Monte Carlo Methods 121

3 Markov Chain Monte Carlo Methods 128

4 Approximate Bayesian Computation 141

5 Further Reading 145


8 Bayesian Inference with Adaptive Markov Chain Monte Carlo 151

Matti Vihola

1 Introduction 151

2 Random-Walk Metropolis Algorithm 151

3 Adaptation of Random-Walk Metropolis 152

4 Multimodal Targets with Parallel Tempering 156

5 Dynamic Models with Particle Filters 157

6 Discussion 159


9 Advances in Importance Sampling 165

Victor Elvira and Luca Martino

1 Introduction and Problem Statement 165

2 Importance Sampling 167

3 Multiple Importance Sampling (MIS) 171

4 Adaptive Importance Sampling (AIS) 174


Part III Statistical Learning 183



10 Supervised Learning 185

Weibin Mo and Yufeng Liu

1 Introduction 185

2 Penalized Empirical Risk Minimization 186

3 Linear Regression 190

4 Classification 193

5 Extensions for Complex Data 200

6 Discussion 203


11 Unsupervised and Semisupervised Learning 209

Jia Li and Vincent A. Pisztora

1 Introduction 209

2 Unsupervised Learning 210

3 Semisupervised Learning 219

4 Conclusions 224


12 Random Forest 231

Peter Calhoun, Xiaogang Su, Kelly M. Spoon, Richard A. Levine, and Juanjuan Fan

1 Introduction 231

2 Random Forest (RF) 232

3 Random Forest Extensions 235

4 Random Forests of Interaction Trees (RFIT) 239

5 Random Forest of Interaction Trees for Observational Studies 243

6 Discussion 249


13 Network Analysis 253

Rong Ma and Hongzhe Li

1 Introduction 253

2 Gaussian Graphical Models for Mixed Partial Compositional Data 255

3 Theoretical Properties 257

4 Graphical Model Selection 260

5 Analysis of a Microbiome-Metabolomics Data 260

6 Discussion 261


14 Tensors in Modern Statistical Learning 269

Will Wei Sun, Botao Hao, and Lexin Li

1 Introduction 269

2 Background 270

3 Tensor Supervised Learning 272

4 Tensor Unsupervised Learning 276

5 Tensor Reinforcement Learning 282

6 Tensor Deep Learning 286


15 Computational Approaches to Bayesian Additive Regression Trees 297

Hugh Chipman, Edward George, Richard Hahn, Robert McCulloch, Matthew Pratola, and Rodney Sparapani

1 Introduction 297

2 Bayesian CART 298

3 Tree MCMC 302

4 The BART Model 308

5 BART Example: Boston Housing Values and Air Pollution 310

6 BART MCMC 311

7 BART Extensions 313

8 Conclusion 320


Part IV High-Dimensional Data Analysis 323



16 Penalized Regression 325

Seung Jun Shin and Yichao Wu

1 Introduction 325

2 Penalization for Smoothness 326

3 Penalization for Sparsity 328

4 Tuning Parameter Selection 330


17 Model Selection in High-Dimensional Regression 333

Hao H. Zhang

1 Model Selection Problem 333

2 Model Selection in High-Dimensional Linear Regression 335

3 Interaction-Effect Selection for High-Dimensional Data 339

4 Model Selection in High-Dimensional Nonparametric Models 342

5 Concluding Remarks 349



18 Sampling Local Scale Parameters in High-Dimensional Regression Models 355

Anirban Bhattacharya and James E. Johndrow

1 Introduction 355

2 A Blocked Gibbs Sampler for the Horseshoe 356

3 Sampling (ξ, σ², β) 359

4 Sampling η 360

5 Appendix: A. Newton-Raphson Steps for the Inverse-cdf Sampler for η 367


19 Factor Modeling for High-Dimensional Time Series 371

Chun Yip Yau

1 Introduction 371

2 Identifiability 372

3 Estimation of High-Dimensional Factor Model 373

4 Determining the Number of Factors 383



Part V Quantitative Visualization 387



20 Visual Communication of Data: It Is Not a Programming Problem, It Is Viewer Perception 389

Edward Mulrow and Nola du Toit

1 Introduction 389

2 Case Studies Part 1 391

3 Let StAR Be Your Guide 393

4 Case Studies Part 2: Using StAR Principles to Develop Better Graphics 394

5 Ask Colleagues Their Opinion 397

6 Case Studies: Part 3 398

7 Iterate 401

8 Final Thoughts 402


21 Uncertainty Visualization 405

Lace Padilla, Matthew Kay, and Jessica Hullman

1 Introduction 405

2 Uncertainty Visualization Theories 408

3 General Discussion 420



22 Big Data Visualization 427

Leland Wilkinson

1 Introduction 427

2 Architecture for Big Data Analytics 428

3 Filtering 430

4 Aggregating 430

5 Analyzing 436

6 Big Data Graphics 436

7 Conclusion 440


23 Visualization-Assisted Statistical Learning 443

Catherine B. Hurley and Katarina Domijan

1 Introduction 443

2 Better Visualizations with Seriation 444

3 Visualizing Machine Learning Fits 445

4 Condvis2 Case Studies 447

5 Discussion 453


24 Functional Data Visualization 457

Marc G. Genton and Ying Sun

1 Introduction 457

2 Univariate Functional Data Visualization 458

3 Multivariate Functional Data Visualization 461

4 Conclusions 465


Part VI Numerical Approximation and Optimization 469



25 Gradient-Based Optimizers for Statistics and Machine Learning 471

Cho-Jui Hsieh

1 Introduction 471

2 Convex Versus Nonconvex Optimization 472

3 Gradient Descent 473

4 Proximal Gradient Descent: Handling Nondifferentiable Regularization 475

5 Stochastic Gradient Descent 476


26 Alternating Minimization Algorithms 481

David R. Hunter

1 Introduction 481

2 Coordinate Descent 482

3 EM as Alternating Minimization 484

3.1 Finite Mixture Models 485

4 Matrix Approximation Algorithms 486

5 Conclusion 489


27 A Gentle Introduction to Alternating Direction Method of Multipliers (ADMM) for Statistical Problems 493



Shiqian Ma and Mingyi Hong

1 Introduction 493

2 Two Perfect Examples of ADMM 494

3 Variable Splitting and Linearized ADMM 496

4 Multiblock ADMM 499

5 Nonconvex Problems 501

6 Stopping Criteria 502

7 Convergence Results of ADMM 502


28 Nonconvex Optimization via MM Algorithms: Convergence Theory 509

Kenneth Lange, Joong-Ho Won, Alfonso Landeros, and Hua Zhou

1 Background 509

2 Convergence Theorems 510

3 Paracontraction 521

4 Bregman Majorization 523


Part VII High-Performance Computing 535


29 Massive Parallelization 537

Robert B. Gramacy

1 Introduction 537

2 Gaussian Process Regression and Surrogate Modeling 539

3 Divide-and-Conquer GP Regression 542

4 Empirical Results 548

5 Conclusion 552


30 Divide-and-Conquer Methods for Big Data Analysis 559

Xueying Chen, Jerry Q. Cheng, and Min-ge Xie

1 Introduction 559

2 Linear Regression Model 560

3 Parametric Models 561

4 Nonparametric and Semiparametric Models 567

5 Online Sequential Updating 568

6 Splitting the Number of Covariates 569

7 Bayesian Divide-and-Conquer and Median-Based Combining 570

8 Real-World Applications 571

9 Discussion 572


31 Bayesian Aggregation 577

Yuling Yao

1 From Model Selection to Model Combination 577

2 From Bayesian Model Averaging to Bayesian Stacking 580

3 Asymptotic Theories of Stacking 584

4 Stacking in Practice 586

5 Discussion 588


32 Asynchronous Parallel Computing 593

Ming Yan

1 Introduction 593

2 Asynchronous Parallel Coordinate Update 597

3 Asynchronous Parallel Stochastic Approaches 602

4 Doubly Stochastic Coordinate Optimization with Variance Reduction 604

5 Concluding Remarks 605
This title belongs to the following subject(s):
Data science statistics; data statistics; big data statistics; data stream processing; data stream statistics; data science statistics introduction; data science stats intro; big data; computational statistics