Data visualization is an important tool when working with very large data sets. When it takes hours to run all of your data through a program, a single visualization can help identify patterns that might otherwise take a long time to discover. It can also help people who are not familiar with the data better understand the story it is telling.
When working with data visualization, it is important to keep some core principles in mind so that your visualizations communicate your data clearly and do not misrepresent it. It is easy to accidentally create visuals that are difficult to understand or that misrepresent values, particularly when you expect the viewer to draw comparisons between elements in your visualization.
Edward Tufte, a key figure in data visualization, breaks it down into two categories: Graphical Integrity (don’t lie with data) and Graphical Excellence (be efficient and clear). These are the two factors we will look at in the following explorations.
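To make Graphical Integrity concrete, here is a minimal sketch of one common way a comparison gets misrepresented: starting a bar chart's y-axis above zero. It uses Tufte's "lie factor" (the size of the effect shown in the graphic divided by the size of the effect in the data). The function names `lie_factor` and `compare_baselines` and the sample values are made up for illustration, and measuring the "effect shown" by visible bar height is a simplifying assumption.

```python
import matplotlib.pyplot as plt


def lie_factor(values, baseline):
    """Ratio of the effect shown by the bars to the effect in the data.

    Assumes bars are drawn from `baseline` up to each value, so the visible
    height of a bar is (value - baseline). Compares the first and last value.
    A result of 1.0 means the picture is proportional to the data.
    """
    first, last = values[0], values[-1]
    first_height = first - baseline
    last_height = last - baseline
    shown_effect = (last_height - first_height) / first_height
    data_effect = (last - first) / first
    return shown_effect / data_effect


def compare_baselines(labels, values, truncated_baseline):
    """Draw the same bars twice: once from zero, once from a truncated baseline."""
    fig, (ax_zero, ax_truncated) = plt.subplots(1, 2, figsize=(8, 3))

    ax_zero.bar(labels, values)
    ax_zero.set_ylim(0, max(values) * 1.1)
    ax_zero.set_title("Baseline at 0")

    ax_truncated.bar(labels, values)
    ax_truncated.set_ylim(truncated_baseline, max(values) * 1.1)
    ax_truncated.set_title(f"Baseline at {truncated_baseline}")

    fig.tight_layout()
    return fig


# Hypothetical values: B is 10% larger than A.
values = [50, 55]
print(lie_factor(values, baseline=0))   # bar heights proportional to the data
print(lie_factor(values, baseline=45))  # the same 10% difference, exaggerated
fig = compare_baselines(["A", "B"], values, truncated_baseline=45)
```

With the baseline at 45, bar B appears twice as tall as bar A even though the underlying value is only 10% larger. In a notebook, the returned figure will display the two versions side by side.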
matplotlib figures?
This is a lighter week in terms of assignments, to let you do some more work where you see fit. You will need to critique some visualizations and use matplotlib, but there won’t be a whole lot in terms of major deliverables.
matplotlib it allows you to view and modify code while working within a notebook that guides you through the matplotlib components.
Matplotlib is more or less the standard when it comes to data visualization in Python. There are other languages I would prefer to use if I were doing interactive visualization (e.g. JavaScript with the D3 library), but they are inferior when it comes to data processing. So if you want to easily tie your visualizations to your data in one language, Matplotlib is really the only way to go.
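As a rough illustration of what tying your visualization to your data in one language can look like, the sketch below does a small processing step and then plots the result in the same script. The readings are made up for illustration, and the variable names are hypothetical. It is one possible arrangement, not the notebook's own code.

```python
from collections import defaultdict
from statistics import mean

import matplotlib.pyplot as plt

# Hypothetical readings, made up for illustration: (month, temperature in °C).
readings = [
    ("Jan", 2.0), ("Jan", 4.0),
    ("Feb", 5.0), ("Feb", 7.0),
    ("Mar", 10.0), ("Mar", 12.0),
]

# Data processing step: group the readings by month and average each group.
by_month = defaultdict(list)
for month, temp in readings:
    by_month[month].append(temp)

months = list(by_month)  # dicts keep insertion order
averages = [mean(by_month[m]) for m in months]

# Visualization step: one Figure containing one Axes.
fig, ax = plt.subplots()
ax.plot(months, averages, marker="o")
ax.set_xlabel("Month")
ax.set_ylabel("Mean temperature (°C)")
ax.set_title("Monthly mean temperature")
```

Because the processed values go straight into `ax.plot`, changing the processing step (for example, swapping `mean` for `max`) updates the figure the next time the cell runs.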
In addition to learning Matplotlib, you should have started to think about what makes a good and useful visualization when it comes to data, and you should be prepared to make some visuals of your own for your upcoming projects.