Data Science For Dummies

John Wiley & Sons Inc

03/2017

384

Softcover

English

9781119327639

512


ebook 22,99 €

Description not available.
Foreword
Introduction
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here

Part 1: Getting Started with Data Science

Chapter 1: Wrapping Your Head around Data Science
  Seeing Who Can Make Use of Data Science
  Analyzing the Pieces of the Data Science Puzzle
    Collecting, querying, and consuming data
    Applying mathematical modeling to data science tasks
    Deriving insights from statistical methods
    Coding, coding, coding - it's just part of the game
    Applying data science to a subject area
    Communicating data insights
  Exploring the Data Science Solution Alternatives
    Assembling your own in-house team
    Outsourcing requirements to private data science consultants
    Leveraging cloud-based platform solutions
  Letting Data Science Make You More Marketable

Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
  Defining Big Data by the Three Vs
    Grappling with data volume
    Handling data velocity
    Dealing with data variety
  Identifying Big Data Sources
  Grasping the Difference between Data Science and Data Engineering
    Defining data science
    Defining data engineering
    Comparing data scientists and data engineers
  Making Sense of Data in Hadoop
    Digging into MapReduce
    Stepping into real-time processing
    Storing data on the Hadoop distributed file system (HDFS)
    Putting it all together on the Hadoop platform
  Identifying Alternative Big Data Solutions
    Introducing massively parallel processing (MPP) platforms
    Introducing NoSQL databases
  Data Engineering in Action: A Case Study
    Identifying the business challenge
    Solving business problems with data engineering
    Boasting about benefits

Chapter 3: Applying Data-Driven Insights to Business and Industry
  Benefiting from Business-Centric Data Science
  Converting Raw Data into Actionable Insights with Data Analytics
    Types of analytics
    Common challenges in analytics
    Data wrangling
  Taking Action on Business Insights
  Distinguishing between Business Intelligence and Data Science
    Business intelligence, defined
    The kinds of data used in business intelligence
    Technologies and skillsets that are useful in business intelligence
  Defining Business-Centric Data Science
    Kinds of data that are useful in business-centric data science
    Technologies and skillsets that are useful in business-centric data science
    Making business value from machine learning methods
  Differentiating between Business Intelligence and Business-Centric Data Science
  Knowing Whom to Call to Get the Job Done Right
  Exploring Data Science in Business: A Data-Driven Business Success Story

Part 2: Using Data Science to Extract Meaning from Your Data

Chapter 4: Machine Learning: Learning from Data with Your Machine
  Defining Machine Learning and Its Processes
    Walking through the steps of the machine learning process
    Getting familiar with machine learning terms
  Considering Learning Styles
    Learning with supervised algorithms
    Learning with unsupervised algorithms
    Learning with reinforcement
  Seeing What You Can Do
    Selecting algorithms based on function
    Using Spark to generate real-time big data analytics

Chapter 5: Math, Probability, and Statistical Modeling
  Exploring Probability and Inferential Statistics
    Probability distributions
    Conditional probability with Naive Bayes
  Quantifying Correlation
    Calculating correlation with Pearson's r
    Ranking variable-pairs using Spearman's rank correlation
  Reducing Data Dimensionality with Linear Algebra
    Decomposing data to reduce dimensionality
    Reducing dimensionality with factor analysis
    Decreasing dimensionality and removing outliers with PCA
  Modeling Decisions with Multi-Criteria Decision Making
    Turning to traditional MCDM
    Focusing on fuzzy MCDM
  Introducing Regression Methods
    Linear regression
    Logistic regression
    Ordinary least squares (OLS) regression methods
  Detecting Outliers
    Analyzing extreme values
    Detecting outliers with univariate analysis
    Detecting outliers with multivariate analysis
  Introducing Time Series Analysis
    Identifying patterns in time series
    Modeling univariate time series data

Chapter 6: Using Clustering to Subdivide Data
  Introducing Clustering Basics
    Getting to know clustering algorithms
    Looking at clustering similarity metrics
  Identifying Clusters in Your Data
    Clustering with the k-means algorithm
    Estimating clusters with kernel density estimation (KDE)
    Clustering with hierarchical algorithms
    Dabbling in the DBScan neighborhood
  Categorizing Data with Decision Tree and Random Forest Algorithms

Chapter 7: Modeling with Instances
  Recognizing the Difference between Clustering and Classification
    Reintroducing clustering concepts
    Getting to know classification algorithms
  Making Sense of Data with Nearest Neighbor Analysis
  Classifying Data with Average Nearest Neighbor Algorithms
  Classifying with K-Nearest Neighbor Algorithms
    Understanding how the k-nearest neighbor algorithm works
    Knowing when to use the k-nearest neighbor algorithm
    Exploring common applications of k-nearest neighbor algorithms
  Solving Real-World Problems with Nearest Neighbor Algorithms
    Seeing k-nearest neighbor algorithms in action
    Seeing average nearest neighbor algorithms in action

Chapter 8: Building Models That Operate Internet-of-Things Devices
  Overviewing the Vocabulary and Technologies
    Learning the lingo
    Procuring IoT platforms
    Spark streaming for the IoT
    Getting context-aware with sensor fusion
  Digging into the Data Science Approaches
    Taking on time series
    Geospatial analysis
    Dabbling in deep learning
  Advancing Artificial Intelligence Innovation

Part 3: Creating Data Visualizations That Clearly Communicate Meaning

Chapter 9: Following the Principles of Data Visualization Design
  Data Visualizations: The Big Three
    Data storytelling for organizational decision makers
    Data showcasing for analysts
    Designing data art for activists
  Designing to Meet the Needs of Your Target Audience
    Step 1: Brainstorm (about Brenda)
    Step 2: Define the purpose
    Step 3: Choose the most functional visualization type for your purpose
  Picking the Most Appropriate Design Style
    Inducing a calculating, exacting response
    Eliciting a strong emotional response
  Choosing How to Add Context
    Creating context with data
    Creating context with annotations
    Creating context with graphical elements
  Selecting the Appropriate Data Graphic Type
    Standard chart graphics
    Comparative graphics
    Statistical plots
    Topology structures
    Spatial plots and maps
  Choosing a Data Graphic

Chapter 10: Using D3.js for Data Visualization
  Introducing the D3.js Library
  Knowing When to Use D3.js (and When Not To)
  Getting Started in D3.js
    Bringing in the HTML and DOM
    Bringing in the JavaScript and SVG
    Bringing in the Cascading Style Sheets (CSS)
    Bringing in the web servers and PHP
  Implementing More Advanced Concepts and Practices in D3.js
    Getting to know chain syntax
    Getting to know scales
    Getting to know transitions and interactions

Chapter 11: Web-Based Applications for Visualization Design
  Designing Data Visualizations for Collaboration
    Visualizing and collaborating with Plotly
    Talking about Tableau Public
  Visualizing Spatial Data with Online Geographic Tools
    Making pretty maps with OpenHeatMap
    Mapmaking and spatial data analytics with CartoDB
  Visualizing with Open Source: Web-Based Data Visualization Platforms
    Making pretty data graphics with Google Fusion Tables
    Using iCharts for web-based data visualization
    Using RAW for web-based data visualization
  Knowing When to Stick with Infographics
    Making cool infographics with Infogr.am
    Making cool infographics with Piktochart

Chapter 12: Exploring Best Practices in Dashboard Design
  Focusing on the Audience
  Starting with the Big Picture
  Getting the Details Right
  Testing Your Design

Chapter 13: Making Maps from Spatial Data
  Getting into the Basics of GIS
    Spatial databases
    File formats in GIS
    Map projections and coordinate systems
  Analyzing Spatial Data
    Querying spatial data
    Buffering and proximity functions
    Using layer overlay analysis
    Reclassifying spatial data
  Getting Started with Open-Source QGIS
    Getting to know the QGIS interface
    Adding a vector layer in QGIS
    Displaying data in QGIS

Part 4: Computing for Data Science

Chapter 14: Using Python for Data Science
  Sorting Out the Python Data Types
    Numbers in Python
    Strings in Python
    Lists in Python
    Tuples in Python
    Sets in Python
    Dictionaries in Python
  Putting Loops to Good Use in Python
  Having Fun with Functions
  Keeping Cool with Classes
  Checking Out Some Useful Python Libraries
    Saying hello to the NumPy library
    Getting up close and personal with the SciPy library
    Peeking into the Pandas offering
    Bonding with MatPlotLib for data visualization
    Learning from data with Scikit-learn
  Analyzing Data with Python - an Exercise
    Installing Python on the Mac and Windows OS
    Loading CSV files
    Calculating a weighted average
    Drawing trendlines

Chapter 15: Using Open Source R for Data Science
  R's Basic Vocabulary
  Delving into Functions and Operators
  Iterating in R
  Observing How Objects Work
  Sorting Out Popular Statistical Analysis Packages
  Examining Packages for Visualizing, Mapping, and Graphing in R
    Visualizing R statistics with ggplot2
    Analyzing networks with statnet and igraph
    Mapping and analyzing spatial point patterns with spatstat

Chapter 16: Using SQL in Data Science
  Getting a Handle on Relational Databases and SQL
  Investing Some Effort into Database Design
    Defining data types
    Designing constraints properly
    Normalizing your database
  Integrating SQL, R, Python, and Excel into Your Data Science Strategy
  Narrowing the Focus with SQL Functions

Chapter 17: Doing Data Science with Excel and Knime
  Making Life Easier with Excel
    Using Excel to quickly get to know your data
    Reformatting and summarizing with pivot tables
    Automating Excel tasks with macros
  Using KNIME for Advanced Data Analytics
    Reducing customer churn via KNIME
    Using KNIME to make the most of your social data
    Using KNIME for environmental good stewardship

Part 5: Applying Domain Expertise to Solve Real-World Problems Using Data Science

Chapter 18: Data Science in Journalism: Nailing Down the Five Ws (and an H)
  Who Is the Audience?
    Who made the data
    Who comprises the audience
  What: Getting Directly to the Point
  Bringing Data Journalism to Life: The Black Budget
  When Did It Happen?
    When as the context to your story
    When does the audience care the most?
  Where Does the Story Matter?
    Where is the story relevant?
    Where should the story be published?
  Why the Story Matters
    Asking why in order to generate and augment a storyline
    Why your audience should care
  How to Develop, Tell, and Present the Story
    Integrating how as a source of data and story context
    Finding stories in your data
    Presenting a data-driven story
  Collecting Data for Your Story
    Scraping data
    Setting up data alerts
  Finding and Telling Your Data's Story
    Spotting strange trends and outliers
    Examining context to understand the significance of data
    Emphasizing the story through visualization
    Creating compelling and highly focused narratives

Chapter 19: Delving into Environmental Data Science
  Modeling Environmental-Human Interactions with Environmental Intelligence
    Examining the types of problems solved
    Defining environmental intelligence
    Identifying major organizations that work in environmental intelligence
    Making positive impacts with environmental intelligence
  Modeling Natural Resources in the Raw
    Exploring natural resource modeling
    Dabbling in data science
    Modeling natural resources to solve environmental problems
  Using Spatial Statistics to Predict for Environmental Variation across Space
    Addressing environmental issues with spatial predictive analytics
    Describing the data science that's involved
    Addressing environmental issues with spatial statistics

Chapter 20: Data Science for Driving Growth in E-Commerce
  Making Sense of Data for E-Commerce Growth
  Optimizing E-Commerce Business Systems
    Angling in on analytics
    Talking about testing your strategies
    Segmenting and targeting for success

Chapter 21: Using Data Science to Describe and Predict Criminal Activity
  Temporal Analysis for Crime Prevention and Monitoring
  Spatial Crime Prediction and Monitoring
    Crime mapping with GIS technology
    Going one step further with location-allocation analysis
    Analyzing complex spatial statistics to better understand crime
  Probing the Problems with Data Science for Crime Analysis
    Caving in on civil rights
    Taking on technical limitations

Part 6: The Part of Tens

Chapter 22: Ten Phenomenal Resources for Open Data
  Digging through data.gov
  Checking Out Canada Open Data
  Diving into data.gov.uk
  Checking Out U.S. Census Bureau Data
  Knowing NASA Data
  Wrangling World Bank Data
  Getting to Know Knoema Data
  Queuing Up with Quandl Data
  Exploring Exversion Data
  Mapping OpenStreetMap Spatial Data

Chapter 23: Ten Free Data Science Tools and Applications
  Making Custom Web-Based Data Visualizations with Free R Packages
    Getting Shiny by RStudio
    Charting with rCharts
    Mapping with rMaps
  Examining Scraping, Collecting, and Handling Tools
    Scraping data with import.io
    Collecting images with ImageQuilts
    Wrangling data with DataWrangler
  Looking into Data Exploration Tools
    Getting up to speed in Gephi
    Machine learning with the WEKA suite
  Evaluating Web-Based Visualization Tools
    Getting a little Weave up your sleeve
    Checking out Knoema's data visualization offerings

Index
This title belongs to the following subject(s):
Data science; data; statistical analysis; data sets; massive data sets; big data; capitalize on data; data science jobs; jobs in data science; big data processing; big data processing tools; data science skills; IT; IT professional; IT professionals; make sense of data; make sense of data sets; interpret data; Data Science For Dummies; Lillian Pierson