According to Michael Friendly, in Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization (PDF), data visualization is…

…the science of visual representation of “data”, defined as information which has been abstracted in some schematic form, including attributes or variables for the units of information.1
After that, the article starts to get a little thick. I would describe data visualization as a variety of methods for expressing large amounts of data in ways that help that data tell its story. Those of you who have seen my presentations about contemporary literacy have probably watched me import earthquake data from the Advanced National Seismic System (ANSS) into Excel™ and then express that data by plotting each event’s longitude and latitude on a scatter plot. The result is a fairly recognizable map of the world, traced through its numerous fault lines. It’s a dramatic demonstration of data visualization accomplished with just a handful of mouse clicks.
But data vis and its cousins (information graphics, information visualization, scientific visualization, and statistical graphics) have become something of an art form in the past few years, owing to the availability of data-processing tools, the astounding increase in access to data, and the imaginations of some very creative people.
I have an ongoing Twitter search for posts that mention visualization, and — aside from the few new-age graphic psychedelics and wisp music, designed to cure me — I often end my work day by browsing the various announcements of data vis examples, new tools, and blog posts on the subject. It relaxes me.
I’ve decided to take a stab at some of the more sophisticated methods, and chose Processing 1.0, a language initiated by Ben Fry and Casey Reas, which grew out of the Aesthetics and Computation Group at the MIT Media Lab.
Processing is an open source programming language and environment for people who want to program images, animation, and interactions. It is used by students, artists, designers, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool.
The tutorials are decent and there are some fairly rich reference pages — which is where you do most of your learning when tackling a new programming language. What frustration I experienced came from wanting to do things I’m accustomed to doing with PHP, and finding that Processing is similar enough to PHP to feel familiar, yet different enough that I had to resort to a lot of experimentation to make things work.
I finished the first project several days ago, starting in a realm of familiarity — earthquake data. I basically doctored the same dataset that I typically use from ANSS, using Excel to extract the dates, magnitudes, depths at the epicenter, and lon/lat coordinates — saving it all as a text file.
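For readers who want to try something similar, reading that text file back into a program is only a few lines of code. Here is a minimal sketch in Python rather than Processing; the tab-delimited layout and the column order (date, magnitude, depth, longitude, latitude) are my assumptions for illustration, not the actual ANSS export format.

```python
# Parse a tab-delimited earthquake catalog into a list of records.
# The column order (date, magnitude, depth, lon, lat) is an assumed
# layout for this sketch, not the actual ANSS file format.

def parse_quakes(lines):
    quakes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        date, mag, depth, lon, lat = line.split("\t")
        quakes.append({
            "date": date,
            "mag": float(mag),
            "depth": float(depth),
            "lon": float(lon),
            "lat": float(lat),
        })
    return quakes

# Two made-up sample rows, just to exercise the parser.
sample = [
    "2004-12-26\t9.0\t30.0\t95.982\t3.295",
    "2004-12-26\t7.1\t39.0\t92.958\t6.898",
]
records = parse_quakes(sample)
print(len(records), records[0]["mag"])
```

Once the rows are structured records like this, each one can drive a dot on the screen.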
Excel’s graphing capabilities let me plot the locations of the quakes, or their magnitudes, or depths, or frequency — but not all at the same time. When I can literally talk to the computer, explaining in its language what I want to happen with each data record, I can have it plot the locations and also size and color each dot according to the magnitude of the seismic event. I’ve also had it run the data chronologically, so that you see the month of December 2004 progress, with each day displaying a bar graph of the number of quakes. The magnitude and depth of each quake are also illustrated. (see YouTube video above)
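The heart of that kind of sketch is simple arithmetic: turning each longitude/latitude pair into a pixel position, and each magnitude into a dot size. A minimal sketch of that math in Python (the 720×360 canvas and the scaling factor are my choices for illustration, not the ones in my actual Processing sketch):

```python
# Map lon/lat to screen coordinates on an assumed 720x360 canvas
# (2 pixels per degree), using a simple equirectangular projection.
WIDTH, HEIGHT = 720, 360

def to_screen(lon, lat):
    # lon -180..180 maps to x 0..WIDTH;
    # lat 90..-90 maps to y 0..HEIGHT (screen y grows downward).
    x = (lon + 180.0) / 360.0 * WIDTH
    y = (90.0 - lat) / 180.0 * HEIGHT
    return x, y

def dot_size(mag):
    # Magnitude is already a logarithmic scale, so a simple linear
    # radius (an assumed factor of 2 pixels per unit) is one choice.
    return mag * 2.0

# The 2004 Sumatra quake, roughly: lon 95.982, lat 3.295, M 9.0.
x, y = to_screen(95.982, 3.295)
print(x, y, dot_size(9.0))
```

In Processing itself, the same idea would feed those values to an `ellipse()` call inside a loop over the records, with a fill color keyed to magnitude.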
My second vis is based on data from Bowker, which manages the ISBNs associated with published books. The data includes the number of books published from 2002 to 2008, broken out by genre (agriculture, arts, biology, business, computers, etc.). My visualization (to the right) illustrates each genre’s percentage of the total number of traditionally published books for each year. Of particular interest to me is the bottom part, where I show not only the increase in book publishing over the last few years, but also the portion of published books that are on-demand printed and short runs — often self-published books. This indicates both increased access to publishing by people like you and me and the growth of self-publishing.
The point of this for education has four parts:
- Literacy – It’s about communication: creating a lens through which the numbers that describe some object, action, or condition in the world can be readily seen and understood by an audience — in much the same way that sentences serve as lenses through which readers see, hear, smell, and touch your world.
- Technology – It’s about learning to use the available and emerging technologies in order to work those numbers into a lens. The emphasis is on the learning, not on the technology. What is new five years from now (or one year, or one month from now) will have to be learned. It will not be taught.
- Mathematics – Building these visualizations was almost entirely about mathematics. It was not deep math, though I suspect that a better memory of Mr. Gilianti’s Trigonometry 104 class at Gaston College might have come in handy. But it was a lot of thinking in the language of numbers — working them so that they would express themselves in ways that were visual and consistent with a visual theme.
- Creativity – I mentioned that data visualization has become something of an art form. It requires a certain amount of inventiveness to imagine the most effective expression of the numbers and to creatively evoke the unexpected. We learn well what surprises us.
1. Friendly, Michael. “Milestones in the History of Thematic Cartography, Statistical Graphics, and Data Visualization.” York University Mathematics Department, 24 Aug 2009. Web. 14 Dec 2009.