# Big Data

## Introduction and Goals

We live in the information age, a time of exponential growth in data. In 2010, Eric Schmidt, the CEO of Google, said, "There were five exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." In 2019, the World Economic Forum estimated that "the entire digital universe is expected to reach 44 zettabytes by 2020."

How much is an exabyte or a zettabyte? Here is a visualization and a table from the same World Economic Forum article.
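To make these units concrete, here is a small Python sketch that converts between them. It assumes decimal (SI) prefixes, where each unit is 1,000 times the one before; some software instead uses binary prefixes such as GiB, where each step is 1,024. The names `UNITS`, `to_bytes`, and `convert` are illustrative and do not come from any library.

```python
# Decimal (SI) byte units, smallest to largest; each step is a factor of 1,000.
# (Binary prefixes such as KiB or GiB use a factor of 1,024 instead.)
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(amount: int, unit: str) -> int:
    """Return the number of bytes in `amount` of `unit` (e.g. 44 ZB)."""
    return amount * 1000 ** UNITS.index(unit)


def convert(amount: int, from_unit: str, to_unit: str) -> float:
    """Convert an amount from one unit to another."""
    return to_bytes(amount, from_unit) / 1000 ** UNITS.index(to_unit)


# The 44 zettabytes estimated for 2020, expressed in smaller units.
print(convert(44, "ZB", "EB"))  # 44000.0 exabytes
print(convert(44, "ZB", "TB"))  # 44000000000.0 terabytes
# Schmidt's figure of 5 exabytes every 2 days, as gigabytes per day.
print(convert(5, "EB", "GB") / 2)  # 2500000000.0
```

Under these assumptions, 44 zettabytes is 44,000 exabytes, or 44 billion terabytes.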

Learning Objectives: I will learn to
• describe what information can be extracted from data
• identify what qualifies as big data
• describe challenges associated with processing big data sets
• recognize both benefits and harms of using big data
Language Objectives: I will be able to
• discuss privacy and security concerns related to a data set
• use target vocabulary, such as megabyte, gigabyte, and terabyte, while describing the effects of big data, with the support of concept definitions from this lesson

## Learning Activities

### Big Data

We live in the era of Big Data, a term for data sets that are too large to fit on a normal computer or to be processed by a standard spreadsheet or database program. Large data sets are difficult to process using a single computer and may require parallel systems (multiple computers working together to run an algorithm). Scalability, a system's ability to handle growing amounts of data by adding resources, is an important consideration when working with large data sets, because the computational capacity of a system affects how data sets can be processed and stored.
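The idea of a parallel system can be sketched as three steps: split the data into chunks, let each worker process one chunk, then combine the partial results. The Python sketch below counts words this way. It is only an illustration of the pattern: the threads in `ThreadPoolExecutor` all run on one machine and stand in for separate computers, so this code shows the shape of the approach rather than a real speed-up. The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count words in one chunk of lines (the work one 'computer' does)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, num_chunks):
    """Divide the data set into roughly equal pieces, one per worker."""
    size = max(1, -(-len(lines) // num_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines, workers=4):
    """Split the data, count each chunk concurrently, then combine the results."""
    chunks = split_into_chunks(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total += partial  # merge step: add up each worker's counts
    return total


lines = ["big data is big", "data about data", "Big ideas"]
print(parallel_word_count(lines, workers=2))
```

Because each chunk is counted independently, the same code works whether there are two workers or two thousand; only the split and merge steps need to know how many there are. That independence is what makes a problem easy to scale.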

We will explore Big Data through a number of videos from the PBS documentary, The Human Face of Big Data. We will start with a short (2:31) video, Everything Is Quantifiable.

### Data Science

The field of Data Science deals with manipulating large data sets, extracting information from them, and visualizing the results. The size of a data set affects the amount and quality of information that can be extracted from it. From this information, further analysis may yield knowledge or even wisdom. Tables, diagrams, text, and other visual tools can be used to communicate the insight and knowledge gained from data. We often think of data, information, knowledge, and wisdom as forming a pyramid, called the DIKW pyramid, with data at the base and wisdom at the top.

Data provide opportunities for identifying trends, making connections, and addressing problems. Computing enables new methods of deriving information from data, driving monumental change across many disciplines, from art to business to science. Keep the DIKW pyramid in mind as you watch the short 3-minute video, Learning Revealed: Acquiring Language.
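As a small illustration of the first steps up the pyramid, the Python sketch below turns raw numbers (data) into summaries (information): an average and a trend. The readings are made up for illustration, and the trend is a least-squares slope, one common way to measure how a value changes over time. Deciding what the trend means, the knowledge step, is still up to a person. (Python 3.10 and later also provide `statistics.linear_regression`.)

```python
def mean(values):
    return sum(values) / len(values)


def trend_slope(xs, ys):
    """Least-squares slope: average change in y per unit of x."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


# Data: hypothetical raw measurements, one per day (made up for illustration).
days = [1, 2, 3, 4, 5]
readings = [3, 5, 4, 6, 7]

# Information: summaries computed from the raw data.
print("average reading:", mean(readings))             # average reading: 5.0
print("change per day:", trend_slope(days, readings))  # change per day: 0.9
```

The individual readings go up and down, but the slope summarizes them as a steady rise of about 0.9 per day.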

### Impacts of Big Data

Careful analysis of data can help us solve many problems. Watch the following 4-minute video, The Smallest Heartbeat, to see how tracking data can help save a child's life.

### Bias in Data

The path from data to information to knowledge is not always straightforward. Bias can be introduced into the collection and analysis of data with dangerous results. Care must be taken when collecting and analyzing data. Problems of bias are often caused by the type or source of data that is being collected. Bias is not eliminated by simply collecting more data.
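The claim that more data does not remove bias can be checked with a small simulation. The Python sketch below builds a hypothetical population split evenly between two groups whose values differ, then compares a random sample with a biased sample that only ever reaches one group. Everything here (the groups, the values around 50 and 70, the function names) is invented for illustration, and fixed seeds make the run repeatable.

```python
import random


def make_population(size, seed=0):
    """Hypothetical population split evenly between groups 'A' and 'B'.

    Group A's values lie between 45 and 55 and group B's between 65 and 75,
    so the true population average is close to 60.
    """
    rng = random.Random(seed)
    population = []
    for i in range(size):
        group = "A" if i % 2 == 0 else "B"
        center = 50 if group == "A" else 70
        population.append((group, center + rng.uniform(-5, 5)))
    return population


def random_sample(population, n, rng):
    """Every member is equally likely to be collected."""
    return [rng.choice(population) for _ in range(n)]


def biased_sample(population, n, rng):
    """The collection method only ever reaches group A."""
    reachable = [person for person in population if person[0] == "A"]
    return [rng.choice(reachable) for _ in range(n)]


def average(sample):
    return sum(value for _, value in sample) / len(sample)


population = make_population(10_000)
rng = random.Random(1)
print(f"true average: {average(population):.1f}")
for n in (100, 1_000, 10_000):
    print(
        f"n={n:>6}  random sample: {average(random_sample(population, n, rng)):.1f}"
        f"  biased sample: {average(biased_sample(population, n, rng)):.1f}"
    )
```

As the sample grows, the random sample's average settles near the true value, but the biased sample's average stays near 50 no matter how large it gets, because every value it can collect comes from group A.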

Joy Buolamwini of the MIT Media Lab studies the impact of bias in face recognition systems. Watch the following video about her research.

The following spoken word piece by Joy Buolamwini highlights how computer systems built on incomplete data misinterpret images of iconic Black women.
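One common way to look for this kind of bias is to report a system's accuracy separately for each group instead of as a single overall number. A high overall score can hide poor results for a group that is underrepresented in the test data. The Python sketch below shows the calculation on made-up test results; the groups "X" and "Y" and all counts are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(records):
    """Fraction of all predictions that were correct."""
    return sum(predicted == actual for _, predicted, actual in records) / len(records)


def accuracy_by_group(records):
    """Fraction of correct predictions, computed separately for each group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += predicted == actual
    return {group: correct[group] / total[group] for group in total}


# Hypothetical test results: (group, system's answer, correct answer).
# Group X makes up most of the test set; group Y is underrepresented.
records = (
    [("X", "match", "match")] * 78
    + [("X", "no match", "match")] * 2
    + [("Y", "match", "match")] * 13
    + [("Y", "no match", "match")] * 7
)

print(overall_accuracy(records))   # 0.91
print(accuracy_by_group(records))  # {'X': 0.975, 'Y': 0.65}
```

The overall accuracy of 91% looks strong, but splitting it by group shows the system is wrong about one time in three for group Y.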

### Big Data Activity: Exploring Data Sets

Explore some examples of big data and find at least two data sets that interest you. Some ideas for where to find data sets are below. Then answer the following reflection questions in your portfolio.
1. What specific types of data (text, sounds, transactions, etc.) were included in the data set you chose?
2. What new facts did you learn when exploring the data set? List at least 3 facts.
3. Write a question you have about the data set you chose. Now, convert that question into a hypothesis (a statement) with your prediction about the data.
4. Identify at least one security and/or privacy concern that is associated with the data in the data set you chose.
5. If your data set included a visualization, explain the purpose of the visualization. How would you change or improve the visualization? If it did not include a visualization, describe one that you think would be useful in understanding the data.

Here are some websites where you can explore big data sets.
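Many data portals let you download a data set as a CSV file. If yours does, a short script can help with the first two reflection questions by listing each column, guessing whether it holds numbers or text, and reporting simple facts such as the minimum and maximum. The Python sketch below uses the standard `csv` module on a small made-up sample; the function name `summarize_csv` and the sample data are hypothetical, and the numeric-or-text check is a rough heuristic.

```python
import csv
import io


def summarize_csv(text):
    """Report the row count and a rough type summary for each column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = {}
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows if row[name]]  # skip blank cells
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None  # at least one value is not a number
        if numbers:
            columns[name] = {"type": "numeric", "min": min(numbers), "max": max(numbers)}
        else:
            columns[name] = {"type": "text", "distinct": len(set(values))}
    return {"rows": len(rows), "columns": columns}


# Hypothetical sample; note the missing population value in the second row.
SAMPLE = """city,year,population
Northtown,2020,167000
Southtown,2020,
Northtown,2021,168500
"""

print(summarize_csv(SAMPLE))
```

For a downloaded file, you would read its contents with `open(path, newline="")` instead of using a string. Blank cells are skipped, so missing values are worth noting as a possible limitation of the data set.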

## Summary

In this lesson, you learned how to: