# Preface

It is said that the most important characteristic of a data scientist is curiosity. Curiosity has certainly led me on a path of discovery through the world of data science and the many fascinating data sets I have encountered. So, the premise of this book is to let the data sets lead you to learning. The best and most interesting way to learn is to find some data and begin to ask questions about it: analyze it, visualize it, and then write down the new questions that occurred to you during your initial analysis.

This is how I organized the first two data science courses I ever taught, and, surprisingly, it worked. In fact, it worked so well that I would never want to teach the course any other way. Nevertheless, it may not be clear from a high-level look at the table of contents what this course covers and which learning goals it strives to achieve. So let me lay it out for you in a different organization.

## Learning Objectives

- Articulate the data science processing pipeline
- Extract data using SQL
- Gather data from the Internet using web APIs and screen scraping
- Combine data from different sources
- Clean the data
  - Handle missing data, find outliers, and fix data
  - Normalize and rescale data
- Visualize the data
- Translate questions to analysis and analysis to interesting stories
- Analyze data
  - Single variable regression, logistic regression
  - Market basket analysis
  - Cohort analysis
  - Sentiment analysis, exposure to Bayes' Theorem
  - Time series
  - Geographic analysis
  - Simulations, Monte Carlo
- Understand statistical significance and how to test for it using practical simulation techniques

## More Traditional Topic Outline

- Data Gathering
  - Using web APIs
  - Reading CSV files
  - Screen scraping
  - Reading data from relational databases with SQL
- Data Munging
  - Dealing with missing data
  - String processing
  - Regular expressions
  - Re-encoding data (one-hot)
  - Re-scaling data
- Data Querying
  - Filter
  - Group by and aggregation
  - Joining
  - Sorting
  - Reshaping
  - Pivoting
- Analytical Techniques
  - Linear regression
  - Sentiment analysis
  - Market basket analysis
  - Cohort analysis
  - Time series
- Visualization
  - Understanding distributions
    - Histogram
    - Box and whisker plot
    - Violin plot
  - Understanding relationships
    - Scatter plot
    - Bubble plot
    - Heat map
    - Network diagrams
    - Chord charts
  - Making comparisons
    - Bar chart / stacked bar chart
    - Line chart
    - Spider plot
  - Geographic analysis
    - Choropleth maps