Data Visualization

Introduction

Data visualization is an important tool when working with very large data sets. When it takes hours to run all of your data through a program, a single visualization can help identify patterns that might otherwise take some time to discover. It can also help people who are not familiar with the data better understand the story it is telling.

When working with data visualization, it is important to keep some core principles in mind so that your visualizations communicate your data clearly and do not misrepresent it. It is very easy to accidentally create visuals that are difficult to understand or that misrepresent values, particularly when you expect the viewer to draw comparisons between elements in your visualization.

Edward Tufte, a key figure in data visualization, breaks it down into two categories: Graphical Integrity (don’t lie with data) and Graphical Excellence (be efficient and clear). These are the two factors we will look at in the following explorations.
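
One way to make "don't lie with data" concrete is Tufte's lie factor from The Visual Display of Quantitative Information: the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below is only an illustration. The function name and the numbers are made up; they assume a bar chart whose value axis starts at 90 instead of 0.

```python
def lie_factor(shown_first, shown_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data.

    Each effect is the relative change |second - first| / |first|.
    A result near 1 means the graphic is proportional to the data.
    """
    effect_in_graphic = abs(shown_second - shown_first) / abs(shown_first)
    effect_in_data = abs(data_second - data_first) / abs(data_first)
    return effect_in_graphic / effect_in_data


# Hypothetical data values: 100 and 110 (a 10% increase).
# With the axis starting at 90, the drawn bar heights are 10 and 20 units,
# so the graphic shows a 100% increase.
truncated_axis = lie_factor(10, 20, 100, 110)

# With the axis starting at 0, the bar heights match the data values.
zero_based_axis = lie_factor(100, 110, 100, 110)

print(truncated_axis, zero_based_axis)
```

With these made-up numbers, the truncated axis exaggerates the change by roughly a factor of ten, while the zero-based axis gives a lie factor of 1.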

Key Questions

  • What are the major issues to consider when trying to accurately visualize data?
  • What are important factors to consider in terms of design when creating visualizations?
  • What are the various components of matplotlib figures?
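
As a starting point for the last question, here is a minimal sketch of the main pieces of a matplotlib figure: the Figure (the whole canvas), an Axes (one plot area), and the artists drawn on it, such as a line, a title, axis labels, and a legend. The data values and label text are invented for illustration, and the course notebook may organize things differently.

```python
import matplotlib.pyplot as plt

# Made-up sample values, purely for illustration.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# plt.subplots() creates a Figure and, by default, a single Axes inside it.
fig, ax = plt.subplots()

# ax.plot() returns a list of Line2D artists; here there is exactly one.
(line,) = ax.plot(x, y, label="y = x squared")

# The title, axis labels, and legend are also artists attached to the Axes.
ax.set_title("Example figure")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

In a Jupyter notebook the figure is normally rendered below the cell; in a plain script you would call plt.show() or fig.savefig(...) to see it.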

Assignment Overview

This is a lighter week in terms of assignments, to let you do more work where you see fit. You will need to critique some visualizations and use matplotlib, but there won’t be much in the way of major deliverables.

Explore the Topics

Graphical Excellence and Integrity
A look at best visualization practices based on the concepts of graphical excellence and integrity.
matplotlib
The content for this exploration is located in a Jupyter notebook, which lets you explore matplotlib interactively.

Additional Resources

Jupyter Notebook
This is the tool you will use to interact with matplotlib. It allows you to view and modify code while working within a notebook that guides you through the matplotlib components.
https://matplotlib.org/
The official Matplotlib website, where you can find the documentation. I find that I get the most value from looking at their examples and experimenting. The actual documentation is subpar in my opinion.
The Visual Display of Quantitative Information
This is a print book by Edward R. Tufte, a leading figure in the area of data visualization. If you are interested in learning about data visualization design, this is a must-have book. A lot of newer work is based on Tufte’s initial works.

Review

Matplotlib is more or less the standard when it comes to data visualization in Python. There are other languages I would prefer to use if I were doing interactive visualization (e.g., JavaScript with the D3 library), but they are inferior when it comes to data processing. So if you want to easily tie your visualizations to your data in one language, Matplotlib is really the only way to go.

In addition to learning Matplotlib, you should have started to think about what makes a good and useful visualization, and you should be prepared to make some visuals of your own for your upcoming projects.