7.2 Big Data
Time Estimate: 45 minutes
Introduction and Goals
We live in the information age, in which the amount of data is growing exponentially. In 2010 Eric Schmidt, then the CEO of Google, said, "There were five exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days." In 2019, the World Economic Forum estimated that "the entire digital universe is expected to reach 44 zettabytes by 2020."
How much is an exabyte or a zettabyte? Here is a visualization and a table from the same article at the World Economic Forum.
In this lesson, you will learn to:
- describe what information can be extracted from data
- identify what qualifies as big data
- describe challenges associated with processing big data sets
- recognize both benefits and harms of using big data
- discuss privacy and security concerns related to a data set
- use target vocabulary, such as megabyte, gigabyte, and terabyte, while describing the effects of big data, with the support of concept definitions from this lesson
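To get a feel for the units named above, here is a minimal Python sketch that converts between them. It assumes the decimal (SI) convention, where each unit is 1000 times the previous one; some systems use 1024 instead. The names `UNITS`, `bytes_in`, and `convert` are made up for this illustration.

```python
# Decimal (SI) storage units, smallest to largest.
# Assumption: each unit is 1000 times the one before it.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte",
         "terabyte", "petabyte", "exabyte", "zettabyte"]

def bytes_in(unit):
    """Return the number of bytes in one of the given unit."""
    return 1000 ** UNITS.index(unit)

def convert(amount, from_unit, to_unit):
    """Convert an amount of data from one unit to another."""
    return amount * bytes_in(from_unit) / bytes_in(to_unit)

# The 44 zettabytes from the World Economic Forum estimate, in exabytes.
print(convert(44, "zettabyte", "exabyte"))

# The 5 exabytes from the Eric Schmidt quote, in gigabytes.
print(convert(5, "exabyte", "gigabyte"))
```

Try changing the units to see how many terabyte drives it would take to hold one exabyte.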
We live in the era of Big Data, a term that refers to data sets that are too large to fit on a normal computer or be processed by a standard spreadsheet or database program. Large data sets are difficult to process using a single computer and may require parallel systems (multiple computers working together to run an algorithm). Scalability of systems is an important consideration when working with large data sets, as the computational capacity of a system affects how data sets can be processed and stored.
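The idea behind a parallel system can be sketched on a single machine: split the data into chunks, let several workers each process one chunk, then combine the partial results. The Python sketch below is only an illustration under that assumption. Threads stand in for the separate computers, and the function names `split_into_chunks` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, num_chunks):
    """Split a list into at most num_chunks pieces of similar size."""
    size = max(1, -(-len(data) // num_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, num_workers=4):
    """Add up a list by giving each worker its own chunk."""
    chunks = split_into_chunks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Each worker computes the sum of one chunk.
        partial_sums = list(pool.map(sum, chunks))
    # Combine the partial results into the final answer.
    return sum(partial_sums)

print(parallel_sum(list(range(1, 101)), num_workers=4))
```

A real big data system follows the same split, process, and combine pattern, but spreads the chunks across many computers, which is why adding more machines can help a system scale.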
We will explore Big Data through a number of videos from the PBS documentary, The Human Face of Big Data. We will start with a short (2:31) video, Everything Is Quantifiable.
The field of Data Science deals with extracting information from large data sets and visualizing the results. The size of a data set affects the amount and quality of information that can be extracted from it. From this information, further analysis may yield knowledge or even wisdom. Tables, diagrams, text, and other visual tools can be used to communicate insight and knowledge gained from data. We often think of data, information, knowledge, and wisdom as forming a pyramid, known as the DIKW pyramid.
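As a small illustration of the first step up the pyramid, the Python sketch below turns raw records (data) into a per-city summary (information). The cities and temperature values are invented for this example, and `summarize` is a hypothetical helper, not part of any library.

```python
from statistics import mean

# Hypothetical raw data: (city, year, average temperature in degrees C).
readings = [
    ("Springfield", 2014, 11.2),
    ("Springfield", 2015, 11.6),
    ("Springfield", 2016, 12.1),
    ("Riverton", 2014, 15.0),
    ("Riverton", 2015, 15.1),
    ("Riverton", 2016, 15.7),
]

def summarize(records):
    """Group records by city, then report the average and the change over time."""
    by_city = {}
    for city, year, temp in records:
        by_city.setdefault(city, []).append((year, temp))

    summary = {}
    for city, rows in by_city.items():
        rows.sort()  # put the rows in order by year
        temps = [temp for _, temp in rows]
        summary[city] = {
            "average": round(mean(temps), 2),
            "change": round(temps[-1] - temps[0], 2),
        }
    return summary

for city, info in summarize(readings).items():
    print(city, info)
```

The six raw rows are data. The statement that a city warmed over the period is information. Deciding what to do about it takes knowledge and wisdom that the program does not supply.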
Data provide opportunities for identifying trends, making connections, and addressing problems. Computing enables new methods of deriving information from data, driving monumental change across many disciplines, from art to business to science. Keep the DIKW pyramid (data, information, knowledge, wisdom) in mind as you watch the short 3-minute video, Learning Revealed: Acquiring Language.
Impacts of Big Data
Careful analysis of data can help us solve many problems. Watch the following 4-minute video to see how tracking data on The Smallest Heartbeat can help save a child's life.
Bias in Data
The path from data to information to knowledge is not always straightforward. Bias can be introduced into the collection and analysis of data with dangerous results. Care must be taken when collecting and analyzing data. Problems of bias are often caused by the type or source of data that is being collected. Bias is not eliminated by simply collecting more data.
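A short simulation can show why more data does not remove bias. In the hypothetical Python sketch below, a population has two equally sized groups, but the collection method picks group A 90% of the time. All of the numbers and the function name `biased_sample_mean` are invented for illustration.

```python
import random

# Assumption: in the real population, groups A and B are the same size,
# group A values center on 10 and group B values center on 20.
TRUE_MEAN = 15.0

def biased_sample_mean(sample_size, share_from_group_a, rng):
    """Average of a sample in which group A is picked with the given probability."""
    total = 0.0
    for _ in range(sample_size):
        if rng.random() < share_from_group_a:
            total += rng.gauss(10, 2)  # a value from group A
        else:
            total += rng.gauss(20, 2)  # a value from group B
    return total / sample_size

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    estimate = biased_sample_mean(n, 0.9, rng)
    print(n, "samples -> estimate", round(estimate, 2), "true mean", TRUE_MEAN)
```

As the sample grows, the estimate settles near 11 rather than the true mean of 15. A larger sample makes the answer more stable, but it stays wrong because the source of the data is skewed.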
Joy Buolamwini from the MIT Media Lab studies the impact of bias in face recognition systems. Watch this video about her research.
This spoken word piece by Joy Buolamwini highlights how computer systems based on incomplete data misinterpret the images of iconic black women.
Big Data Activity: Exploring Data Sets
Explore some examples of big data and find at least two data sets that interest you. Some ideas of where to find data sets are listed after the questions below. Then, answer the following reflection questions in your portfolio.
- What specifically were the types of data (text, sounds, transactions, etc.) included in the data set you chose?
- What new facts did you learn when exploring the data set? List at least 3 facts.
- Write a question you have about the data set you chose. Now, convert that question into a hypothesis (a statement) with your prediction about the data.
- Identify at least one security and/or privacy concern that is associated with the data in the data set you chose.
- If your data set included a visualization, explain the purpose of the visualization. How would you change or improve the visualization? If it did not include a visualization, describe one that you think would be useful in understanding the data.
- Wikipedia Article on Big Data
- Reddit maintains a Data is Beautiful site that has lots of visualizations of interesting data sets. Browse through that collection.
- These data sets allow you to create visualizations with different types of graphs to explore the data.
- Here's a nice visualization of student debt that was put together by the New York Times.
- This is a nice interactive visualization of how the Internet has grown and when various technologies have been introduced.
- NY Times "How much warmer was your city in 2016?" visualization
- NY Times Air Pollution in Cities visualization
Reflection: For Your Portfolio
Answer the following portfolio reflection questions as directed by your instructor. Questions are also available in this Google Doc where you may use File/Make a Copy to make your own editable copy.