Data Journalism Reading: Wk 8

This week’s reading talks about the proposed interactive design for a bubble tree. This is one of those instances where a picture makes much more sense than trying to describe something in words.

A bubble tree is basically used when one wants to illustrate hierarchies or subsets of categories. The core of the design here is that once you click on a bubble, the bubble ‘bursts’ and reveals what it contains. Each click takes you into a subcategory bubble. If you click outside the bubble, it closes up and takes you back up one level.

This is fairly easy to visualize and intuitive to use, as most of us are now familiar with the folder system used in both Mac and PC operating systems. Each folder contains sub-folders, and if you click the back or up button, you go up one folder level.
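To make the click-in, click-out behaviour concrete, here is a minimal sketch of the navigation logic only (no drawing). The class names `Bubble` and `BubbleTreeView` and the example categories are my own assumptions for illustration, not something taken from the reading.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bubble:
    """One bubble; `children` are the bubbles revealed when it bursts."""
    name: str
    children: List[Bubble] = field(default_factory=list)
    parent: Optional[Bubble] = field(default=None, repr=False, compare=False)

    def add(self, child: Bubble) -> Bubble:
        """Attach a sub-bubble and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


class BubbleTreeView:
    """Tracks which bubble is currently open (hypothetical helper)."""

    def __init__(self, root: Bubble) -> None:
        self.current = root

    def click(self, name: str) -> None:
        """Clicking a visible bubble bursts it: move one level down."""
        for child in self.current.children:
            if child.name == name:
                self.current = child
                return
        raise KeyError(f"no visible bubble named {name!r}")

    def click_outside(self) -> None:
        """Clicking outside closes the bubble: move one level up."""
        if self.current.parent is not None:
            self.current = self.current.parent

    def visible(self) -> List[str]:
        """Names of the bubbles shown at the current level."""
        return [child.name for child in self.current.children]


# Made-up hierarchy, purely for illustration.
root = Bubble("Animals")
mammals = root.add(Bubble("Mammals"))
root.add(Bubble("Birds"))
mammals.add(Bubble("Cats"))
mammals.add(Bubble("Dogs"))

view = BubbleTreeView(root)
view.click("Mammals")      # burst the "Mammals" bubble
print(view.visible())
view.click_outside()       # back up to the top level
print(view.visible())
```

This is the same structure as a folder tree: each bubble only needs to know its parent and its children.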

This visualization method is great for scientific representation, where classification is a major part of the understanding process. It can also be used to chart out workflow and responsibilities for a team.


Data Journalism Critique: Wk 8


What we are looking at here is a simple interactive map with a sliding scale to show how the world’s urban population has changed from 1950 to present day, with an additional feature of projecting the population all the way up to 2050.

What I liked best about this visualization was that it chose to use circles to represent countries rather than the shapes of the countries. This cleans up a lot of clutter and makes it easier to see just how much the population is increasing as we slide the timescale from left to right. If we don’t want to do it manually, we can press the play button and see a gradual increase as the years go by.

When you mouse over one of the country circles, it gives you only two figures: urban population and the percentage of the population that is urban. If you click on the circle, you can zoom in and mouse over the stats for smaller surrounding countries that may not be visible in the zoomed-out version.

While the power of this visualization comes from its simplicity, I couldn’t help but think it was slightly incomplete. The urban population of each country is an important statistic, but we need to make it relevant. How exactly does an increase in urban population affect a country? Possibly a decrease in agricultural output, or an increase in service-related jobs? Maybe an increase in education? I would have liked to see some more stats on the mouse-over that would give additional information about how the increase in urban population affects the country as a whole.

I also wish the map had an option to switch the bubbles between representing the percentage of the population that is urban and the total urban population. I think percentage is a more accurate indicator. For example, China leads with 630 million urban dwellers, but that is only 47 percent of its population. The U.S., on the other hand, has a much smaller population but is 82 percent urban. If we were to compare the two nations, percentage would be a more accurate standard and the map should reflect that.

Data Visualization Critique: Wk 7

This week’s visualization comes to us from Information is Beautiful and it represents the profitability of Hollywood movies.

At first, the visual is intimidating because there are just so many bars and so many colors that you feel like you’re never going to get anything out of it. But once you start playing around with it, things become pretty simple.

I like that the chart does only a few things but does them well. The top bars show how much the movie made and the bars extending downward show how much the movie cost to make (you can also reverse this). This means it’s pretty easy to spot the highest-grossing movies, and you can run several different combos of the drop-down menus to check some hypotheses.

For example, I wanted to see if the most profitable movies made the most money or simply cost the least. So I arranged the chart in order of profitability. On the left side are low-budget movies that didn’t really gross much, but on the far right side are much taller bars showing larger revenues and also larger costs. In Hollywood, the majority rule is: you gotta spend money to make money.

The chart is also interesting from a trivia point of view because you can hover on each bar and it gives you some handy stats about the movie. The one stat I wasn’t sold on was “rating”, which was supposed to be a combination of ‘audience’ and the film website ‘Rotten Tomatoes’. What do they mean by audience? Did they survey moviegoers after they watched the movie? What did the audience rank the movie on?

Data Journalism Readings Week 7:

Time Series/Oversampling

The author makes the case for the use of ‘oversampling’ to standardize a subgroup of a particular set of data. This is used to correct for a bias or a higher variability in one group. You simply increase the number of instances sampled from that group and then weight them back down in proportion to the group’s share of the overall population. This is a helpful tool to use when representing data for analysis. While there are arguments to be made about how artificially setting irregular intervals to standardize a graph or representation can be misleading, if used correctly, it helps give us a more accurate representation of our data.
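Here is a rough sketch of how I understand the weighting step, assuming each respondent’s weight is the group’s share of the population divided by its share of the sample. The function names and all of the numbers are made up for illustration; the reading does not give a formula.

```python
def design_weights(population_share, sample_counts):
    """Weight per group = population share / sample share.

    An oversampled group gets a weight below 1, so it is "weighted down".
    """
    n = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / n)
        for group, count in sample_counts.items()
    }


def weighted_mean(samples, weights):
    """samples is a list of (group, value) pairs."""
    numerator = sum(weights[group] * value for group, value in samples)
    denominator = sum(weights[group] for group, _ in samples)
    return numerator / denominator


# Hypothetical survey: group B is 10% of the population,
# but we deliberately sample it as heavily as group A.
population_share = {"A": 0.9, "B": 0.1}
samples = [("A", 10)] * 50 + [("B", 20)] * 50
sample_counts = {"A": 50, "B": 50}

weights = design_weights(population_share, sample_counts)
naive = sum(value for _, value in samples) / len(samples)

print("unweighted mean:", naive)
print("weighted mean:", round(weighted_mean(samples, weights), 2))
```

The unweighted mean treats group B as if it were half the population. The weighted mean pulls the estimate back toward group A, while the larger B sample still gives a steadier estimate for that subgroup on its own.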

In the map example, the author gives different examples of how manipulating the classes can help illustrate different aspects of the data. For example, in the population map of the US, he sets up a large middle class for most of the numbers to fall into, so that the map does a better job of highlighting the extremes of the spectrum.

Wall Street Journal Style Guide

The first two chapters from the Wall Street Journal style guide to charts illustrate the dos and don’ts of data visualization. It is a comprehensive guide telling us what we should keep in mind regarding each kind of chart, whether it’s a bar chart, a pie chart or a graph.

The guide walks us through the four steps of creating a visualization: Research, Edit, Plot and Review. This order is important to remember, instead of plotting first and then having to go back and edit the numbers or the research. It also makes an important point that simply acquiring data is not enough. A large quantity of data cannot work for us on its own; it has to be made relevant, either by changing its value representations (e.g. changing numerical increase to percentage increase) or by taking only a certain section of the data so that it is easier to digest.
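The numerical-versus-percentage point is easy to see with a couple of lines of arithmetic. This is just a sketch with made-up figures, and the function name `percent_change` is my own.

```python
def percent_change(old, new):
    """Change from old to new, as a percentage of the starting value."""
    if old == 0:
        raise ValueError("percent change is undefined when the starting value is 0")
    return (new - old) / old * 100


# Two hypothetical series that both grow by the same absolute amount.
small = (50, 100)
large = (1000, 1050)

for old, new in (small, large):
    print(f"{old} -> {new}: +{new - old} units, "
          f"{percent_change(old, new):.1f}% increase")
```

Both series gain 50 units, but one doubles and the other barely moves, which is the kind of difference a chart of raw increases would hide.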

The rest of the reading discusses things we should keep in mind, such as using the right units, starting base and choosing the right kind of chart for the corresponding data.

We are also given a guide to color schemes, what works and what stands out as garish or with too much contrast.

Readings: Wk 6

Choropleth Maps

The author discusses the various issues with creating a choropleth map, which is basically a map in which areas are shaded to represent differing amounts of a variable statistic.

Setting the class intervals can radically change how a map appears, even with the same data. It’s therefore usually helpful to set the intervals so that there is an equal distribution within each class. This will help us when it comes to comparisons across regions. Then there is the choice of class colors and the color gradient. If the shades are too close together, you won’t be able to see a significant difference across the map, but if they are radically different, we won’t get a sense of increasing intensity. Finally, another issue to consider is that larger areas on the map may distort the overall picture or story we are getting because they take up more visual space, which doesn’t necessarily correspond to the actual statistical number they represent.
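To see how much the choice of class intervals matters, here is a small sketch comparing two common schemes on the same data: equal-width intervals, and quantile breaks that put roughly the same number of areas in each class. The values are invented and the function names are my own; the reading does not prescribe a particular method.

```python
import bisect


def equal_interval_breaks(values, k):
    """Split the range min..max into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Pick breaks so each class holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def class_counts(values, breaks):
    """Count how many values fall into each class.

    A value equal to a break is placed in the upper class.
    """
    counts = [0] * (len(breaks) + 1)
    for value in values:
        counts[bisect.bisect_right(breaks, value)] += 1
    return counts


# Hypothetical values for nine regions, with one extreme outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

print("equal interval:", class_counts(values, equal_interval_breaks(values, 3)))
print("quantile:      ", class_counts(values, quantile_breaks(values, 3)))
```

With equal-width intervals the outlier stretches the scale, so almost every region lands in the lowest class and the map would look nearly uniform. Quantile breaks spread the regions evenly across the shades, at the cost of hiding how extreme the outlier really is.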

Proportional Maps and Cartographic Projections by Demographics

To solve the problem of areas distorting representation, we have the proportional map, where the physical count of each statistic dictates the size of the area representing it. In this proportional map of the electoral votes in the US, the designer has designated one square to represent each electoral vote in each state. This affects the perspective of the election in a very significant way. When we look at the original map from the New York Times, it looks significantly red, and it seems like an overwhelming majority voted for Bush. With the proportional map, you can now see much more blue and it’s obvious that the election was much closer than represented.

Data Visualization Critique: Wk 6

This map, created by Columbia University’s Engineering School, does an excellent job of showing the estimated annual consumption of energy by buildings in New York City. Right from the start we’re able to identify that red means more energy consumption than green, and we are able to calibrate all the color shades in the middle accordingly.

The layout of the map is simple and easy to understand, and yet it contains a remarkable amount of information if the viewer is so inclined. If you pull up the interactive version of the map, you can click on each block and it will give you the energy consumption stats for that area. More importantly, it gives us the breakdown of energy consumption, which is arguably more relevant. We can see how much a building expends on heating, cooling, electricity and hot water. This helps us answer questions about how we use energy over the year. Some of the results are unexpected: heating takes up more energy than cooling, despite how much we run air conditioners in the summer. The information is also relevant to landlords or real estate developers who might own blocks of property in the same area and could come up with collective plans to cut down their energy consumption.

There is also the element of personalization. We can easily navigate our map to the street we live on or the building we work in and get the figures for our energy consumption. We could also use this information to compare different areas or buildings in the same areas that have differing energy consumptions, leading us to investigate the causes for that.

Via NYTimes Green.

Data Visualization Critique: Wk 5

In the spirit of Valentine’s Day, The Guardian created a map of all the places in the world where Valentine’s Day is celebrated. It’s a sweet, simple visualization that doesn’t claim to convey any heavy data, mostly just trivia, but even then they get a few things wrong.

It’s cute and pretty obvious to use a heart as an icon for Valentine’s Day but why would they use a double heart? It looks like there are two spots in each country and it would drive a user mad to click twice thinking there are two icons so close together when in fact it is just one. This also gives the map an extremely cluttered look, something that could have been easily avoided with a simpler, sharper icon.

When you click on the double heart, you get an interesting tidbit about how Valentine’s is celebrated in that particular country. This is not very elegant, but it is effective. However, the info boxes that popped up also had fields like

2nd Day:
2nd Date Celebrated:
2nd Reason:
3rd Day:
3rd Date Celebrated:
3rd Reason:

These were all basically empty for the majority of the countries; even after much random clicking I couldn’t find one that had those fields filled. Were those fields really necessary? They didn’t contain any vital information, and they were so sparsely filled that you could barely compare them across countries. There could have been so many more stats in those info boxes, like “percentage of population married” or “percentage of single men/women”, something to do with love or romance.

This ended up being a pretty dull representation of what could have been a really fun visual.
