Details

Practical Machine Learning in R



1st ed.

by: Fred Nwanganga, Mike Chapple

25,99 €

Publisher: Wiley
Format: PDF
Published: 10 April 2020
ISBN/EAN: 9781119591573
Language: English
Number of pages: 464

DRM-protected eBook. To read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<p><b>Guides professionals and students through the rapidly growing field of machine learning with hands-on examples in the popular R programming language</b></p> <p>Machine learning, a branch of artificial intelligence (AI) that enables computers to improve their results and learn new approaches without explicit instructions, allows organizations to reveal patterns in their data and incorporate predictive analytics into their decision-making process. <i>Practical Machine Learning in R</i> provides a hands-on approach to solving business problems with intelligent, self-learning computer algorithms.</p> <p>Bestselling authors and data analytics experts Fred Nwanganga and Mike Chapple explain what machine learning is, demonstrate its organizational benefits, and provide hands-on examples created in the R programming language. A perfect guide for professional self-taught learners or students in an introductory machine learning course, this reader-friendly book illustrates the numerous real-world business uses of machine learning approaches. Clear and detailed chapters cover data wrangling, R programming with the popular RStudio tool, classification and regression techniques, performance evaluation, and more. 
</p> <ul> <li>Explores data management techniques, including data collection, exploration and dimensionality reduction</li> <li>Covers unsupervised learning, where readers identify and summarize patterns using approaches such as apriori, eclat and clustering</li> <li>Describes the principles behind the Nearest Neighbor, Decision Tree and Naive Bayes classification techniques</li> <li>Explains how to evaluate and choose the right model, as well as how to improve model performance using ensemble methods such as Random Forest and XGBoost</li> </ul> <p><i>Practical Machine Learning in R </i>is a must-have guide for business analysts, data scientists, and other professionals interested in leveraging the power of AI to solve business problems, as well as students and independent learners seeking to enter the field.</p>
<p>About the Authors vii</p> <p>About the Technical Editors ix</p> <p>Acknowledgments xi</p> <p>Introduction xxi</p> <p><b>Part I: Getting Started 1</b></p> <p><b>Chapter 1 What is Machine Learning? 3</b></p> <p>Discovering Knowledge in Data 5</p> <p>Introducing Algorithms 5</p> <p>Artificial Intelligence, Machine Learning, and Deep Learning 6</p> <p>Machine Learning Techniques 7</p> <p>Supervised Learning 8</p> <p>Unsupervised Learning 12</p> <p>Model Selection 14</p> <p>Classification Techniques 14</p> <p>Regression Techniques 15</p> <p>Similarity Learning Techniques 16</p> <p>Model Evaluation 16</p> <p>Classification Errors 17</p> <p>Regression Errors 19</p> <p>Types of Error 20</p> <p>Partitioning Datasets 22</p> <p>Holdout Method 23</p> <p>Cross-Validation Methods 23</p> <p>Exercises 24</p> <p><b>Chapter 2 Introduction to R and RStudio 25</b></p> <p>Welcome to R 26</p> <p>R and RStudio Components 27</p> <p>The R Language 27</p> <p>RStudio 28</p> <p>RStudio Desktop 28</p> <p>RStudio Server 29</p> <p>Exploring the RStudio Environment 29</p> <p>R Packages 38</p> <p>The CRAN Repository 38</p> <p>Installing Packages 38</p> <p>Loading Packages 39</p> <p>Package Documentation 40</p> <p>Writing and Running an R Script 41</p> <p>Data Types in R 44</p> <p>Vectors 45</p> <p>Testing Data Types 47</p> <p>Converting Data Types 50</p> <p>Missing Values 51</p> <p>Exercises 52</p> <p><b>Chapter 3 Managing Data 53</b></p> <p>The Tidyverse 54</p> <p>Data Collection 55</p> <p>Key Considerations 55</p> <p>Collecting Ground Truth Data 55</p> <p>Data Relevance 55</p> <p>Quantity of Data 56</p> <p>Ethics 56</p> <p>Importing the Data 56</p> <p>Reading Comma-Delimited Files 56</p> <p>Reading Other Delimited Files 60</p> <p>Data Exploration 60</p> <p>Describing the Data 61</p> <p>Instance 61</p> <p>Feature 61</p> <p>Dimensionality 62</p> <p>Sparsity and Density 62</p> <p>Resolution 62</p> <p>Descriptive Statistics 63</p> <p>Visualizing the Data 69</p> <p>Comparison 69</p> 
<p>Relationship 70</p> <p>Distribution 72</p> <p>Composition 73</p> <p>Data Preparation 74</p> <p>Cleaning the Data 75</p> <p>Missing Values 75</p> <p>Noise 79</p> <p>Outliers 81</p> <p>Class Imbalance 82</p> <p>Transforming the Data 84</p> <p>Normalization 84</p> <p>Discretization 89</p> <p>Dummy Coding 89</p> <p>Reducing the Data 92</p> <p>Sampling 92</p> <p>Dimensionality Reduction 99</p> <p>Exercises 100</p> <p><b>Part II: Regression 101</b></p> <p><b>Chapter 4 Linear Regression 103</b></p> <p>Bicycle Rentals and Regression 104</p> <p>Relationships Between Variables 106</p> <p>Correlation 106</p> <p>Regression 114</p> <p>Simple Linear Regression 115</p> <p>Ordinary Least Squares Method 116</p> <p>Simple Linear Regression Model 119</p> <p>Evaluating the Model 120</p> <p>Residuals 121</p> <p>Coefficients 121</p> <p>Diagnostics 122</p> <p>Multiple Linear Regression 124</p> <p>The Multiple Linear Regression Model 124</p> <p>Evaluating the Model 125</p> <p>Residual Diagnostics 127</p> <p>Influential Point Analysis 130</p> <p>Multicollinearity 133</p> <p>Improving the Model 135</p> <p>Considering Nonlinear Relationships 135</p> <p>Considering Categorical Variables 137</p> <p>Considering Interactions Between Variables 139</p> <p>Selecting the Important Variables 141</p> <p>Strengths and Weaknesses 146</p> <p>Case Study: Predicting Blood Pressure 147</p> <p>Importing the Data 148</p> <p>Exploring the Data 149</p> <p>Fitting the Simple Linear Regression Model 151</p> <p>Fitting the Multiple Linear Regression Model 152</p> <p>Exercises 161</p> <p><b>Chapter 5 Logistic Regression 165</b></p> <p>Prospecting for Potential Donors 166</p> <p>Classification 169</p> <p>Logistic Regression 170</p> <p>Odds Ratio 172</p> <p>Binomial Logistic Regression Model 176</p> <p>Dealing with Missing Data 178</p> <p>Dealing with Outliers 182</p> <p>Splitting the Data 187</p> <p>Dealing with Class Imbalance 188</p> <p>Training a Model 190</p> <p>Evaluating the Model 190</p> <p>Coefficients 
193</p> <p>Diagnostics 195</p> <p>Predictive Accuracy 195</p> <p>Improving the Model 198</p> <p>Dealing with Multicollinearity 198</p> <p>Choosing a Cutoff Value 205</p> <p>Strengths and Weaknesses 206</p> <p>Case Study: Income Prediction 207</p> <p>Importing the Data 208</p> <p>Exploring and Preparing the Data 208</p> <p>Training the Model 212</p> <p>Evaluating the Model 215</p> <p>Exercises 216</p> <p><b>Part III: Classification 221</b></p> <p><b>Chapter 6 <i>k</i>-Nearest Neighbors 223</b></p> <p>Detecting Heart Disease 224</p> <p><i>k</i>-Nearest Neighbors 226</p> <p>Finding the Nearest Neighbors 228</p> <p>Labeling Unlabeled Data 230</p> <p>Choosing an Appropriate <i>k </i>231</p> <p><i>k</i>-Nearest Neighbors Model 232</p> <p>Dealing with Missing Data 234</p> <p>Normalizing the Data 234</p> <p>Dealing with Categorical Features 235</p> <p>Splitting the Data 237</p> <p>Classifying Unlabeled Data 237</p> <p>Evaluating the Model 238</p> <p>Improving the Model 239</p> <p>Strengths and Weaknesses 241</p> <p>Case Study: Revisiting the Donor Dataset 241</p> <p>Importing the Data 241</p> <p>Exploring and Preparing the Data 242</p> <p>Dealing with Missing Data 243</p> <p>Normalizing the Data 245</p> <p>Splitting and Balancing the Data 246</p> <p>Building the Model 248</p> <p>Evaluating the Model 248</p> <p>Exercises 249</p> <p><b>Chapter 7 Naïve Bayes 251</b></p> <p>Classifying Spam Email 252</p> <p>Naïve Bayes 253</p> <p>Probability 254</p> <p>Joint Probability 255</p> <p>Conditional Probability 256</p> <p>Classification with Naïve Bayes 257</p> <p>Additive Smoothing 261</p> <p>Naïve Bayes Model 263</p> <p>Splitting the Data 266</p> <p>Training a Model 267</p> <p>Evaluating the Model 267</p> <p>Strengths and Weaknesses of the Naïve Bayes Classifier 269</p> <p>Case Study: Revisiting the Heart Disease Detection Problem 269</p> <p>Importing the Data 270</p> <p>Exploring and Preparing the Data 270</p> <p>Building the Model 272</p> <p>Evaluating the Model 273</p> 
<p>Exercises 274</p> <p><b>Chapter 8 Decision Trees 277</b></p> <p>Predicting Build Permit Decisions 278</p> <p>Decision Trees 279</p> <p>Recursive Partitioning 281</p> <p>Entropy 285</p> <p>Information Gain 286</p> <p>Gini Impurity 290</p> <p>Pruning 290</p> <p>Building a Classification Tree Model 291</p> <p>Splitting the Data 294</p> <p>Training a Model 295</p> <p>Evaluating the Model 295</p> <p>Strengths and Weaknesses of the Decision Tree Model 298</p> <p>Case Study: Revisiting the Income Prediction Problem 299</p> <p>Importing the Data 300</p> <p>Exploring and Preparing the Data 300</p> <p>Building the Model 302</p> <p>Evaluating the Model 302</p> <p>Exercises 304</p> <p><b>Part IV: Evaluating and Improving Performance 305</b></p> <p><b>Chapter 9 Evaluating Performance 307</b></p> <p>Estimating Future Performance 308</p> <p>Cross-Validation 311</p> <p><i>k</i>-Fold Cross-Validation 311</p> <p>Leave-One-Out Cross-Validation 315</p> <p>Random Cross-Validation 316</p> <p>Bootstrap Sampling 318</p> <p>Beyond Predictive Accuracy 321</p> <p>Kappa 323</p> <p>Precision and Recall 326</p> <p>Sensitivity and Specificity 328</p> <p>Visualizing Model Performance 332</p> <p>Receiver Operating Characteristic Curve 333</p> <p>Area Under the Curve 336</p> <p>Exercises 339</p> <p><b>Chapter 10 Improving Performance 341</b></p> <p>Parameter Tuning 342</p> <p>Automated Parameter Tuning 342</p> <p>Customized Parameter Tuning 348</p> <p>Ensemble Methods 354</p> <p>Bagging 355</p> <p>Boosting 358</p> <p>Stacking 361</p> <p>Exercises 366</p> <p><b>Part V: Unsupervised Learning 367</b></p> <p><b>Chapter 11 Discovering Patterns with Association Rules 369</b></p> <p>Market Basket Analysis 370</p> <p>Association Rules 371</p> <p>Identifying Strong Rules 373</p> <p>Support 373</p> <p>Confidence 373</p> <p>Lift 374</p> <p>The Apriori Algorithm 374</p> <p>Discovering Association Rules 376</p> <p>Generating the Rules 377</p> <p>Evaluating the Rules 382</p> <p>Strengths and 
Weaknesses 386</p> <p>Case Study: Identifying Grocery Purchase Patterns 386</p> <p>Importing the Data 387</p> <p>Exploring and Preparing the Data 387</p> <p>Generating the Rules 389</p> <p>Evaluating the Rules 389</p> <p>Exercises 392</p> <p>Notes 393</p> <p><b>Chapter 12 Grouping Data with Clustering 395</b></p> <p>Clustering 396</p> <p><i>k</i>-Means Clustering 399</p> <p>Segmenting Colleges with <i>k</i>-Means Clustering 403</p> <p>Creating the Clusters 404</p> <p>Analyzing the Clusters 407</p> <p>Choosing the Right Number of Clusters 409</p> <p>The Elbow Method 409</p> <p>The Average Silhouette Method 411</p> <p>The Gap Statistic 412</p> <p>Strengths and Weaknesses of <i>k</i>-Means Clustering 414</p> <p>Case Study: Segmenting Shopping Mall Customers 415</p> <p>Exploring and Preparing the Data 415</p> <p>Clustering the Data 416</p> <p>Evaluating the Clusters 418</p> <p>Exercises 420</p> <p>Notes 420</p> <p>Index 421</p>
<p><b>FRED NWANGANGA</b>, <b>P<small>H</small>D</b>, is an assistant teaching professor of business analytics at the University of Notre Dame's Mendoza College of Business. He has over 15 years of technology leadership experience. <p><b>MIKE CHAPPLE</b>, <b>P<small>H</small>D</b>, is associate teaching professor of information technology, analytics, and operations at the Mendoza College of Business. Mike is a bestselling author of over 25 books, and he currently serves as academic director of the University's Master of Science in Business Analytics program.
<p><b>INTRODUCING MACHINE LEARNING THROUGH THE INTUITIVE R PROGRAMMING LANGUAGE</b> <p>Machine learning and data analytics have emerged as important avenues of value creation. Through machine learning, you can discover hidden patterns in data, leading to new ideas and understandings that might remain unknown without this powerful technique. <i>Practical Machine Learning in R</i> offers a hands-on introduction to working with large datasets using the R programming language, which is simple to understand and was built specifically for statistical analysis. Even if you have no prior coding experience, this book will show you how data scientists put machine learning into practice to generate business insights, solid predictions, and better decisions. <p>Unlike other books on the topic, <i>Practical Machine Learning in R</i> provides both a conceptual and technical introduction to machine learning. Examples and exercises use the R programming language and the latest data analytics tools, so you can get started without getting bogged down by advanced mathematics. With this book, machine learning techniques—from logistic regression to association rules and clustering—are within reach. <p>The only book to integrate an intuitive introduction to machine learning with step-by-step technical applications, <i>Practical Machine Learning in R</i> shows you how to: <ul> <li>Conceptualize the different types of machine learning</li> <li>Discover patterns that exist within large datasets</li> <li>Begin writing and executing R scripts with RStudio</li> <li>Use R with Tidyverse to manage and visualize data</li> <li>Apply core statistical techniques like logistic regression and Naïve Bayes</li> <li>Evaluate and improve upon machine learning models</li> </ul>
