Preface from the Second Edition
by Jan Pearce and Jacqueline Boggs
We are excited to bring you this enhanced version of the book. As we were planning to teach a course in data analytics, one that is cross-listed in computer science and business at our institution, we found it quite challenging to identify a book with appropriate content for this type of interdisciplinary course. We were delighted to find this open-source book because of its clear focus on the data. We both believe that curiosity is exactly what drives data science and data analytics. When we encounter a set of data, it leads us to ask provocative questions that can often be answered by the data techniques covered in this book.
As professors, we believe it is crucially important that students build lifelong learning skills. We have found that students sometimes struggle to transfer what they have learned to another area, topic, or dataset. For these reasons, we wanted to add datasets to this book so that we could help students learn to better apply and transfer their knowledge.
Some of the key changes from the First Edition include:
Learning Goals, Learning Objectives, and Glossaries added to each chapter.
Chapter titles that identify the data technique to be utilized while still letting curiosity about each of the datasets drive the exploration.
The fourth chapter has been significantly expanded to include a targeted introduction to, or review of, Python.
The option to choose to use Google Colaboratory Notebooks or an Anaconda installation using Jupyter Notebooks.
Additional datasets, presented as case studies that focus on business applications, alongside the existing case studies on other interesting topics.
One can find data science offered by departments such as computer science, math, or statistics, as well as business, so this edition strives to appeal to the interests of students in each of these disciplines. Of course, the applications of data science reach even further, across the entire curriculum. Our best hope is that the second edition of this text can be used for courses in Data Science, Data Analytics, Business Analytics, and possibly beyond!
We hope you like it and would love to hear from you!
Preface from the First Edition
by Brad Miller
It is said that the most important characteristic of a data scientist is curiosity. Curiosity has certainly led me on a path of discovery throughout the world of data science and to the many fascinating data sets that I have encountered. So, the premise of this book is to let the data sets lead you to learning. The best and most interesting way to learn is to find some data and then begin to ask questions about it, analyze it, visualize it, and then write down the new questions that have occurred to you during your initial analysis.
This is how I organized the first two data science courses I ever taught, and surprisingly it worked. In fact, it worked so well that I would never want to teach it any other way. Nevertheless, it may not be clear from a high-level look at the table of contents what this course covers and the learning goals it strives to achieve. So let me lay it out for you in a different organization.
Articulate the data science processing pipeline
Extract data using SQL
Gather data from the Internet using web APIs and screen scraping
Combine data from different sources
Clean the data
Handle missing data, find outliers, and fix data
Normalize and rescale data
Visualize the data
Translate questions to analysis and analysis to interesting stories
Single variable regression, logistic regression
Market basket analysis
Sentiment analysis, exposure to Bayes' Theorem
Simulations, Monte Carlo
Understand statistical significance and how to test for it using practical simulation techniques
More Traditional Topic Outline
Using Web APIs
Reading CSV files
Reading data from relational databases with SQL
Dealing with missing data
Re-encoding data (one-hot)
Group by and aggregation
Market basket analysis
Box and whisker plot
Bar chart / stacked bar chart