# Unit 3: Data

The goal of Unit 3 is for students to see that large-scale data collection and analysis are available in every area they can imagine. Students examine very large data sets tied to themselves as well as to areas of work and society. They learn a variety of data visualization techniques and work to recognize opportunities to apply algorithmic thinking and automation when considering questions whose answers are embedded in data. The complexity of the data sets, visualizations, and analysis increases in the second lesson of the unit, challenging students to generalize concepts developed in the first lesson.

**3.1 Visualizing Data**

- The goal of this lesson is for students to be able to create visualizations to analyze large data sets and to meaningfully interpret the patterns they uncover. They draw conclusions about themselves from relevant data, including local weather, the economics of their community, and naming trends for their own name. At the beginning of the lesson, students weigh societal concerns around the collection and persistence of Big Data. The students learn how to use Python to make useful graphic representations of data, progressing from familiar visualizations to more modern visual analyses like scaled-dot or colorized scatter plots of multidimensional data sets. Students are introduced to basic Excel® spreadsheet programming and cell manipulation. A Monte Carlo simulation is used to help students appreciate what counts as evidence for an association between two variables.
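
One way a Monte Carlo simulation can illustrate evidence for association is a shuffle test: scramble one variable many times and count how often chance alone produces a correlation as strong as the observed one. The sketch below is only an illustration of that idea, not the lesson's actual activity. The function names `pearson_r` and `shuffle_test` and the temperature and sales numbers are made up for this example.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def shuffle_test(xs, ys, trials=2000, seed=0):
    """Fraction of random shufflings of ys whose correlation with xs
    is at least as strong (in absolute value) as the observed one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Hypothetical data, invented for illustration only.
temperature = [12, 15, 18, 21, 24, 27, 30, 33]
sales = [20, 24, 31, 35, 44, 48, 55, 61]

r = pearson_r(temperature, sales)
p = shuffle_test(temperature, sales)
print(f"observed r = {r:.3f}, share of shuffles at least as strong = {p:.4f}")
```

A small share means shuffled data rarely looks as strongly related as the real pairing, which is what "evidence for association" means here. It does not say anything about cause.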

**3.2 Discovering Knowledge from Data**

- As in the previous lesson, the goal of this lesson is for students to be able to create a range of visualizations to analyze complex, large data sets and to meaningfully interpret the patterns they uncover. Students use statistics to deepen the knowledge gained by visualization. The hooks are again conclusions they can draw about themselves from relevant data, including various geographic perspectives on their life and facial recognition of their own features. The lesson uses Excel as well as Python to manipulate and visualize data. Students examine multidimensional data sets using scatter plot arrays and view geographic and social data using heat maps and directed graphs. Students experiment with object recognition and face recognition. They are challenged to discover clustering and linear correlation patterns lurking in data sets distributed across student computers and school sites, such that data cleaning and warehousing are necessary. Finally, student teams choose a question and answer it using a large data set.
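
To make the cleaning and linear-pattern steps concrete, here is a minimal sketch, assuming each site contributes rows of two text fields. The `site_a` and `site_b` rows, the `clean` and `fit_line` helpers, and the choice to drop exact duplicates are all assumptions made for this example, not part of the course materials.

```python
def clean(rows):
    """Keep rows whose two fields parse as numbers; drop exact duplicates."""
    seen = set()
    cleaned = []
    for raw_x, raw_y in rows:
        try:
            pair = (float(raw_x), float(raw_y))
        except (TypeError, ValueError):
            continue  # skip blanks, None, and text such as "n/a"
        if pair not in seen:
            seen.add(pair)
            cleaned.append(pair)
    return cleaned


def fit_line(pairs):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Hypothetical rows collected at two sites, with gaps and one repeated row.
site_a = [("1.0", "2.1"), ("2.0", "3.9"), ("", "5.0"), ("3.0", "6.2")]
site_b = [("3.0", "6.2"), ("4.0", "n/a"), ("5.0", "9.8"), (None, "1.0")]

merged = clean(site_a + site_b)  # combine the sites, then clean once
slope, intercept = fit_line(merged)
print(len(merged), "usable rows")
print(f"y is roughly {slope:.2f} * x + {intercept:.2f}")
```

Combining the sources before cleaning means one set of rules applies to every site, which is the basic idea behind warehousing the data in one place.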