On Thursday Facebook had the third-largest I.P.O. ever. In the week leading up to it, my colleague Amanda Cox spent some time thinking about how best to explain and contextualize this offering for readers. What follows is a series of sketches from Amanda, who shared her project folder with me for this post, and Matt Ericson, who edited the piece.
The universe of initial public offerings is seemingly simple: about 2,400 tech companies since 1980, compiled by Jay Ritter, a professor of finance at the University of Florida.
As a first step, Amanda charted the companies by I.P.O. date (x-axis) and value at I.P.O. (y-axis), and colored them by their 3-year return. (The key’s not included in her sketch, but for these purposes, know that red is bad and green is good.)
This chart’s not bad (even if, like me, you have low standards), but it doesn’t say much other than that there was a dot-com boom, that most of those companies didn’t do so well, and that Facebook is worth a ton of money.
Next, a plot of 3-year return by I.P.O. date:
Then she tried to add more nuance to this picture, shading the companies by their price-to-sales ratio at I.P.O. and including Facebook in a random position just for size:
But rather than bringing clarity, it just sort of looked chaotic, even to the seasoned chart freaks of 620 8th Avenue. So she tried another form: a histogram of 3-year returns, colored by I.P.O. date:
Or the same chart but piled into three time periods (not that anyone asked me, but I really like this one):
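For anyone who wants to try the "piled into three time periods" idea at home, here is a minimal sketch in Python (Amanda's sketches were not made this way, as far as I know). The sample rows, the period cutoffs and the 50-point bin width are all assumptions made up for illustration, not values from the Ritter dataset.

```python
from collections import Counter, defaultdict
import math

# Hypothetical sample rows: (ipo_year, three_year_return_pct). Not real data.
SAMPLE = [
    (1986, 40.0), (1995, -20.0), (1999, -85.0),
    (1999, -92.0), (2000, -60.0), (2004, 15.0), (2007, 130.0),
]

def period_of(year):
    """Assign an I.P.O. year to one of three periods (cutoffs are assumed)."""
    if year < 1999:
        return "before 1999"
    if year <= 2000:
        return "1999-2000"
    return "after 2000"

def bin_floor(value, width=50.0):
    """Return the lower edge of the fixed-width bin that contains value."""
    return math.floor(value / width) * width

def histogram_by_period(rows, width=50.0):
    """Count companies per (period, return bin)."""
    counts = defaultdict(Counter)
    for year, ret in rows:
        counts[period_of(year)][bin_floor(ret, width)] += 1
    return counts

hist = histogram_by_period(SAMPLE)
for period in ("before 1999", "1999-2000", "after 2000"):
    for edge in sorted(hist[period]):
        bar = "#" * hist[period][edge]
        print(f"{period:>12} [{edge:>6.0f}, {edge + 50:>6.0f}): {bar}")
```

Each period gets its own little text histogram, which is roughly the small-multiples version of coloring one big histogram by date.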
By the way, even the queen bee of statistical charting screws up that chart the first time (be conservative with your “cex” values, folks):
Another idea, vaguely reminiscent of the balloons from “Up,” is sales vs. market cap at I.P.O. colored by year. I won’t lie, I don’t get this one:
Going back to time series, which many readers are more accustomed to reading and understanding, Amanda focused on one thing that always gets talked about with IPOs: almost all of the companies have a bump in market cap after their first day of trading. So she charted the “trails” of companies over their first day on the market (a log scale makes percentage changes look the same):
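The parenthetical about log scales is easy to check with a little arithmetic. This is a minimal sketch with made-up market caps (in millions of dollars): two hypothetical companies that both gain 50 percent on their first day cover the same vertical distance on a log axis, even though their dollar gains are wildly different.

```python
import math

def log_distance(start, end):
    """Vertical distance between two values on a log10 axis."""
    return math.log10(end) - math.log10(start)

# Hypothetical market caps in millions of dollars; both companies gain 50%.
small = log_distance(100, 150)
large = log_distance(10_000, 15_000)

# On a linear axis the same two moves would differ by a factor of 100.
linear_small = 150 - 100
linear_large = 15_000 - 10_000

print(round(small, 6), round(large, 6))
print(linear_small, linear_large)
```

That equal distance is what lets a tiny company's trail and a giant's trail be compared at a glance.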
The trails felt promising, so she pursued them with sales, too. (Along with some screw-ups.) Again, full transparency here, I don’t get this one either, but since there are some screw-ups in there I think we’re safe:
At this point, there were a lot of charts made, but no clear answers about form or the best things to show. Matt Ericson, eyeing the looming deadlines, looked through Amanda’s analysis and offered a compromise of sorts, related to the histogram she had generated earlier, and suggested a slightly different form:
Which turned into this:
And, ultimately, into this:
If you’ve seen the web version, though, you know it doesn’t look like this. [Amanda thinks print graphics can be smarter than web graphics.] For one, the browser window doesn’t give us this kind of space. But the medium itself plays a part too. Online, if you’re not engaged in 10 seconds, you’re not going to stay on the page, so they needed to keep it fun. For that, Amanda and Matt got some help from three (pretty badass) colleagues: Jeremy Ashkenas, Matt Bloch and Shan Carter. Together, they made an interactive chart that stepped through a handful of the steps above, slowly explaining the dataset, with each step building on the last:
A couple major design processes are at work in this piece. First, sketching with data is massively important. Only by looking at the data in multiple forms, from different angles, did this group of visual journalists really peel back what was most interesting about it. Here, we saw histograms, crazy arrow charts, bubble charts, time series and others – all shaded with different variables. All but one, more or less, got cut.
Second, and related, is that you go with the chart you have when the deadline comes – or that you’re only as good as the last chart you threw away. (Her words, not mine.)
To be quite honest, Amanda wasn’t thrilled with her graphics that went in the paper and online. (She is always searching for The Perfect Form, whether or not it’s there.) If the I.P.O. were delayed another week, there would be another dozen charts in the trash can and maybe something else would be the last good chart. But you go to print with the charts you have, not the charts you want. So, you know, make a lot of them.
Last week the Times published their interactive electoral map. Although a medium-sized team of reporters, editors, designers and developers (including, but not limited to, Jeremy Ashkenas, Matt Ericson, Alan McLean, David Nolen and Derek Willis) had a hand in designing and building the project, Shan Carter did much of the developing of the main visualization, and he agreed to let me post some of his sketches here. (I had no hand in this – I’m just the image copy-paster this evening.)
You’ll notice some similarities – there is analysis for every state and the option to share your own map. But they wanted to explore some different options this year, too. First, Shan started by making a cartogram in Illustrator, overlaid on a (pretty terrible) hand trace of the US:
And then slowly tinkering with it:
One idea was to take the geography out of the graphic completely:
Or at least minimize it further by dividing states into regions:
Another was to compare two maps side-by-side, similar to the “split screen” view of the Senate in 2008:
But no one was really super thrilled with maps as the main conduit for the analysis. Instead, they decided on minimizing the geography and using “bins” for states. (Shan has sort of been obsessed with “bins” since 2008, when his dream of having states magically fall into buckets on election night ultimately didn’t pan out. I personally had to cheer him up after that and it was not pretty.)
Anyway, an early prototype of that concept:
And how that part of the graphic ultimately looked:
If you’ve seen this piece by now, you’ll notice that they didn’t make just one decision – they expanded on a few of them in a compelling mix of interactive and linear storytelling that told a few different stories and also let you make your own and share it wherever you wanted.
It’s also a fun insight into Shan’s workflow, which is to mostly experiment directly with markup rather than with flat outputs from R or Adobe Illustrator mockups, which many of us do. (OK, technically, he tells me the cartograms, being more art than science, were hand-made in Illustrator and then their xy positions were exported to D3, but still, he’s on the record saying “mockups are for suckers.”)
Also, this was made using D3 and implemented a technique that let the graphic function properly even in Internet Explorer 8. (A sharp guy named Jim Vallandingham chronicled this in extreme detail if you’re interested in doing this sort of thing.)
One of the best things about working at a newspaper is that you can come into work and do something different every day. Yesterday I had planned on spending the day doing some longer-term work in preparation for the Olympics and generally phoning it in Friday-style when a handful of us got assigned a daily – a graphic that looked back on Mariano Rivera’s career in light of his A.C.L. injury on Thursday. I was totally going to do an insane 3D-video that analyzed his cutter, but apparently someone did that already, so we went with charts instead. I looked at saves over time of top pitchers while my colleague Tom Giratikanon, who just started this week, compared Rivera across different categories.
We had a broad idea for what we were going for, which Matt Ericson sketched out by hand:
I scraped the data for the players with the most saves from baseball-reference.com (using an old template Shan Carter made using hpricot, which I learned is now “over”), then sketched the top 250 or so in R. This only takes a couple seconds to read about, but it was in fact at least two hours of screw-ups and swearing before I saw this chart:
Which eventually turned to this (we export odd colors to pick them up easily in Adobe Illustrator):
And the final print version:
Online, we took basically the same approach, except we wanted to make them interactive, so Shan Carter pitched in some D3 expertise and Tom made his in Raphael, and six painless hours later, after all the programming, browser checking, conditional loading (which might not be a term) and Matt Ericson VPNing in from New Jersey to fix everything, we had a nice interactive, mostly mobile-friendly graphic:
Our approach wasn’t revolutionary or anything – in fact, Amanda and I used an identical charting form to chart home runs a couple years ago – but the package worked well, and if anything, Rivera stands out more in the saves chart than Barry Bonds does in the homers chart. And it was a promising start to the possibility of turning around this kind of work on deadline.
Elisabeth Bumiller’s recent profile of Jeremy Bernard, the first man and first openly gay person to be the White House social secretary, used an interesting dataset: a list of everyone who has attended a state dinner in the Obama administration. I don’t have a ton of experience with Styles (or with “style”, for that matter), but this was a good chance to do something different with a new section. Except not that different, since charts are pretty much the only trick.
Alicia Parlapiano and I ended up using a sort of spiral plot, which we then just joined together in Illustrator. I remembered that we had used a similar technique in one of my first graphics at the Times to visualize which countries were good at which sports. (Then, as now, Amanda did the hard stuff.) So I ported the code from ActionScript to use for this, while also sizing for frequency of visits.
Here’s the sketch:
And how it looked in print:
Matt Ericson and Amanda Cox helped out on a late night to make a fun interactive version, perfect for gawking at all those people who were invited instead of you.
In last Sunday’s paper, Mike McIntire and Michael Luo published their investigation into White House visits by large Democratic donors. As simple as the chart was, we pondered many complex options before publishing it.
Early on, I thought some large-scale visualization of all major donors might be interesting, so I plotted a couple hundred of the top donors (based loosely on first and last names) with donations and WH visits on the same axis to see if there was any meaningful pattern. It looked like this:
Although it looked sort of cool (in a meaningless data-art kind of way), nothing there illuminated the real focus of the story – namely, the possibility that large donors might get more access to the White House. Really, that was my only idea, and I was being annoying and complaining about it when Amanda Cox matter-of-factly told me to make a sketch that showed the percent chance of visiting the White House based on one’s total donation size. An hour later, I had this:
We all liked it right away. Most of the remaining work went to matching the databases of donors and visitors as well as we could. That data work is important, but horribly unsexy and not really conducive to sketches. In general, we matched on middle initials where we could, and Matt Ericson helped me implement his handy Mr. People gem to get the various names parsed in a uniform fashion. Otherwise, all the data work was done in R, with a typically heavy-bordering-on-embarrassing level of assistance from Amanda.
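To give a flavor of the name-matching problem, here is a rough sketch in Python. It is not the Mr. People gem (that's Ruby) and not the R code we actually ran. The parsing rules, the suffix list and the example names are all hypothetical, and real data needs far more care than this.

```python
import re

# Assumed list of suffixes to ignore; a real list would be longer.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

def name_key(raw):
    """Reduce a name to (first, middle_initial, last) using simple, assumed rules."""
    raw = raw.strip().lower()
    # Turn "Last, First Middle" into "First Middle Last".
    if "," in raw:
        last, rest = raw.split(",", 1)
        raw = f"{rest} {last}"
    tokens = [t for t in re.sub(r"[.,]", " ", raw).split() if t not in SUFFIXES]
    if len(tokens) < 2:
        return (tokens[0] if tokens else "", "", "")
    middle = tokens[1][0] if len(tokens) > 2 else ""
    return (tokens[0], middle, tokens[-1])

def same_person(a, b):
    """Match on first and last name, and on middle initial when both names have one."""
    first_a, middle_a, last_a = name_key(a)
    first_b, middle_b, last_b = name_key(b)
    if (first_a, last_a) != (first_b, last_b):
        return False
    return not (middle_a and middle_b) or middle_a == middle_b

print(same_person("Doe, Jane Q., Jr.", "jane quincy doe"))
print(same_person("Jane Q. Doe", "Jane R. Doe"))
```

The middle initial only breaks a match when both records have one, which is one reading of "matched on middle initials where we could."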
Once we published, there was some discussion about the form of the chart on Twitter, and I admit it’s slightly odd. We had a lot of discussion about form on our end, too. So I present 4 options, each named for a delightful animal (we do a lot of animal-based filenames in the department, for some reason):
First, the “Blue Whale,” arguably the most straightforward, accessible approach. This form makes the trend the focus of the graphic:
“Polar Bear” is perhaps the best chart for a more technical audience…
…but it might mean fewer people understand it. And is it me, or do the horizontal segments look like error margins instead of donation ranges? It’s not quite a scatterplot, since the percentages plotted represent “buckets” of donation sizes rather than individual points.
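For the curious, the "buckets" idea boils down to something like the following Python sketch (the real work was done in R). The bucket edges and the donor rows here are invented for illustration. Only the approach, percent of donors in each donation range who visited, comes from the chart.

```python
from bisect import bisect_right

# Hypothetical lower edges of the donation buckets, in dollars.
EDGES = [0, 1_000, 10_000, 100_000]

def bucket_label(i):
    """Human-readable label for bucket i."""
    low = EDGES[i]
    if i == len(EDGES) - 1:
        return f"${low:,}+"
    return f"${low:,} to under ${EDGES[i + 1]:,}"

def visit_rates(donors):
    """donors: iterable of (total_donated, visited). Returns {label: percent who visited}."""
    totals = [0] * len(EDGES)
    visits = [0] * len(EDGES)
    for amount, visited in donors:
        i = max(0, bisect_right(EDGES, amount) - 1)
        totals[i] += 1
        visits[i] += 1 if visited else 0
    return {
        bucket_label(i): 100.0 * visits[i] / totals[i]
        for i in range(len(EDGES))
        if totals[i]  # skip empty buckets rather than divide by zero
    }

# Made-up donors: (total donated, whether they appear in the visitor logs).
DONORS = [
    (500, False), (800, False), (5_000, False), (7_000, True),
    (50_000, True), (60_000, False), (250_000, True), (300_000, True),
]

rates = visit_rates(DONORS)
for label, pct in rates.items():
    print(f"{label}: {pct:.0f}%")
```

Each percentage describes a whole range of donation sizes, which is why the chart is not quite a scatterplot.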
A slightly different approach, the “Tree Lobster” might indeed be the most accurate representation of this dataset:
But where’s the continuity? And seriously, how boring are bar charts? Also, labeling is hard on this thing, which is not a trivial problem.
Lastly, (Dull) Giraffe:
Seriously, this one is dull and maybe not worth discussing. Or is it? Discuss. Any discussion of these forms might happen on Twitter under the hashtag #chartingSpiritAnimals until I figure out how to put comments into this site, which, let’s face it, isn’t ever going to happen.
If you’ve seen the graphic online or in print, you’ll know that we went with the Blue Whale. Aside from carrying the crucial Steve Duenes/Matt Ericson/Amanda Cox voting bloc (their decisions somehow track the majority vote 100% of the time), it felt suited for the data and the story it was published with.
(It looks fine online too, but it’s sort of stranded on its own URL.)
Finally, as a disclaimer, the data plotted in these examples is slightly different than what went into print last week, as we did some manual tweaking on a handful of names, which moved a couple percentages up or down a tiny bit.
Look forward to seeing if any data visualizers Tweet silly animal names this week. I’ll go first…
We had a medium-sized graphic in today’s paper looking back on Rick Santorum’s campaign. The map was made in R using maptools, a package I find increasingly easy and fun to use. For me, the best part about visualizing data in R is that even when you screw things up pretty badly, the result usually looks pretty cool.
Anyway, the map is not revolutionary or anything, but it worked well to tell the story we wanted to tell. I took a screenshot of it at various points in the process (although a small army of people took care of most of the hard parts). Looked great online, too, thanks to that same small army.
Here, making sure I remember how to plot counties:
Sizing bubbles by margin of victory (too big, it turns out):
Getting the colors and sizing closer:
Exporting everything to a PDF so Illustrator can easily clean up the vector work:
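On the bubble-sizing step above: one common convention (an assumption here, not necessarily what the final map did) is to scale a bubble's area, rather than its radius, to the value, so that a margin four times as big gets a bubble four times as big. A minimal Python sketch, with a made-up maximum radius and made-up margins:

```python
import math

def bubble_radius(value, max_value, max_radius=20.0):
    """Radius such that bubble area is proportional to value (assumed convention)."""
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    return max_radius * math.sqrt(max(value, 0) / max_value)

# Hypothetical margins of victory, in votes.
margins = [100, 2_500, 10_000]
radii = [bubble_radius(m, max(margins)) for m in margins]
print([round(r, 2) for r in radii])
```

If the bubbles come out too big, shrinking `max_radius` rescales all of them at once without changing their relative areas.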
This week the graphics department published a couple graphics based on exit poll data. The first one, made mostly by Shan Carter, was similar in many respects to the one he made in 2008 to show the differences in voters supporting Barack Obama and Hillary Clinton. (Known internally as the “delightful dancing boxes.”)
This view, which focused on Mitt Romney and Rick Santorum, was perfect for capturing the differences between their supporters, but we also wanted to show the influence of the other candidates, who have gotten substantial amounts of delegates.
Shan addressed this with a quick sketch:
Next they tried a ternary plot (I had to look it up myself), which is apparently beloved in geology and frequently used to describe soil samples. Anyway, I came on to the project late, after the concept had been more or less decided.
First, a sketch showing whom voters of a single demographic group supported in 7 different states. (Groups that supported Mitt Romney are farther to the right; groups for Santorum are farther to the left; groups supporting anyone else are toward the bottom.)
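If you also had to look up ternary plots, the gist is that three shares that sum to one can be placed inside a triangle as a weighted average of its corners. Here is a minimal Python sketch. The corner placement (Santorum top left, Romney top right, everyone else at the bottom) is my assumption based on the description above, and the sample shares are invented.

```python
import math

HEIGHT = math.sqrt(3) / 2  # height of an equilateral triangle with side 1

# Assumed corner positions (x, y), with y pointing up.
CORNERS = {
    "santorum": (0.0, HEIGHT),
    "romney": (1.0, HEIGHT),
    "other": (0.5, 0.0),
}

def ternary_xy(santorum, romney, other):
    """Map three vote shares to an (x, y) point inside the triangle."""
    total = santorum + romney + other
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    weights = {
        "santorum": santorum / total,
        "romney": romney / total,
        "other": other / total,
    }
    x = sum(weights[name] * CORNERS[name][0] for name in CORNERS)
    y = sum(weights[name] * CORNERS[name][1] for name in CORNERS)
    return x, y

# A hypothetical group that split 30% Santorum, 50% Romney, 20% other.
print(ternary_xy(30, 50, 20))
```

A group that went entirely for one candidate lands exactly on that candidate's corner, and an even three-way split lands in the middle.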
A different approach, and one we eventually went with, showed all the groups across a single state. This is for Iowa.
Then we just tried to show this as best we could. One thought was to label the biggest groups and draw lines for the shift from another state. Here’s who Michigan voters supported, with the lines emphasizing the main groups’ change from New Hampshire.
We really liked the lines in print, but once you animate the transitions you don’t really need them, since the motion has the same effect. (Plus I didn’t know how to program the lines anyway.)
Then we just had to build the thing, which we made using the D3 libraries. In Flash this thing would not have been so hard, but here it was slow going at the beginning. But we’re as good as anyone at copy/pasting from demos, so it wasn’t too long before this:
A couple weeks ago, just in time for the Super Bowl, we published a couple fun graphics that used transcripts of ESPN’s “SportsCenter” as a way to look back on the NFL season.
A colleague suggested instead using 3D players rather than photos, in part just to do something new and in part to give us a way to put more players on a field at one time. Here’s a progression of sketches on that concept:
Original whiteboard sketch:
A drawing for how it would fit on a print page:
Graham Roberts’s proof of concept (with sizes semi-randomly assigned):
We added some labels and charts to Graham’s final rendering:
It ended up looking pretty cool and we were happy with it, but in the course of our analysis we really noticed a lot of funny quotes and cliches that the announcers said but I couldn’t really find an interesting way to present them.
We made a ton of charts looking for keywords we wanted to inspect, which let us sift through the data a little faster (though eventually we would have to weed out non-NFL references by hand). This output showed charts for mentions of words, both cumulative and week-by-week, along with a list of the usage of each word in context:
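The keyword output described above can be approximated in a few lines. This is a Python sketch rather than the R we actually used, and the transcript snippets are invented (they are not real "SportsCenter" quotes). It counts mentions per week, keeps a running total, and pulls each mention with a bit of surrounding context.

```python
import re
from collections import Counter
from itertools import accumulate

# Hypothetical transcript snippets: (week_number, text). Not real quotes.
TRANSCRIPTS = [
    (1, "The quarterback threw deep. What a quarterback!"),
    (2, "The defense held firm all night."),
    (3, "Another comeback led by the quarterback."),
]

def word_pattern(word):
    """Case-insensitive whole-word pattern for a keyword."""
    return re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)

def weekly_mentions(transcripts, word):
    """Return (weeks, mentions per week, cumulative mentions)."""
    pattern = word_pattern(word)
    counts = Counter()
    for week, text in transcripts:
        counts[week] += len(pattern.findall(text))
    weeks = sorted(counts)
    weekly = [counts[w] for w in weeks]
    return weeks, weekly, list(accumulate(weekly))

def keyword_in_context(transcripts, word, width=20):
    """Return (week, snippet) for every mention, with `width` characters on each side."""
    pattern = word_pattern(word)
    hits = []
    for week, text in transcripts:
        for match in pattern.finditer(text):
            start = max(0, match.start() - width)
            end = min(len(text), match.end() + width)
            hits.append((week, text[start:end]))
    return hits

weeks, weekly, cumulative = weekly_mentions(TRANSCRIPTS, "quarterback")
print(weeks, weekly, cumulative)
for week, snippet in keyword_in_context(TRANSCRIPTS, "quarterback"):
    print(week, "...", snippet, "...")
```

The context list is the part that still needs a human, since it is where the non-NFL references show up.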
But presenting them was kind of a challenge. A straightforward approach (the only kind I know how to do, really) didn’t do much for anyone and took up a ton of space, so we dumped it:
We tried highlighting individual sentences (like “You’ll have more luck getting a ticktack out of the mouth of an alligator than getting information, especially about injuries, out of the mouth of Bill Belichick,” Aug. 10), but there wasn’t anything cohesive about a random list of quotes.
Then my boss said to write something original with the quotes as if I were writing for McSweeney’s. I said, great idea, imagining something like “Is It OK To Dunk On the President?”, one of my favorite McSweeney’s articles ever. Unfortunately, I couldn’t get it to work. Luckily, our intern, Ritchie King, who was already helping me with the analysis, could.
He turned a handful of silly cliches into a hilarious narrative about sports, war and Tim Tebow. We made his cliches piece the center of the graphic and had Sam Sifton read it online. (If you haven’t heard it yet, it’s worth a listen.)
Anyway, it was a fun project and proof that data is out there for almost any crazy idea. It also emphasized two important lessons. One, from Amanda Cox, is that you should make a hundred charts and pick the best one. We definitely did that – our project folder is full of boring analysis of various players and ideas. The second lesson is that the design and editing machine of the NYT graphics department can take a decent idea and turn it into something much better.
For the nerds out there, most of the analysis was done in R using the tm, openNLP and Rstem packages, but I can’t be sure which methods I used from which since Amanda just told me to import all of them.
A few weeks ago we published a “Defense Budget Puzzle” (a sequel of sorts to one we made in 2010 that dealt with the federal deficit) that focused on a series of choices that the Pentagon is making to cut its budget. This time, however, we stored the choices readers made when they “submitted” their plans. And last week, when Elisabeth Bumiller and Thom Shanker highlighted the Pentagon’s first major step toward that goal, we published the results of the more than 12,000 readers who submitted a plan.
We weren’t sure how to visualize the results to include a choice’s popularity and its cost. So we grouped them by category and explored a couple different presentations (both using very few lines of R). The first one used proportional circles and I sort of liked it, but almost none of the smart people I showed it to did, which is pretty much the end of the story there.
The second one was based on a simple chart we had run about ads the week before (the link to which I can’t seem to find).
It looked cleaner, fit better in the space and was pretty straightforward to make. So we went with it and built it. (Alas, my neon colors were changed to something more sensible.) Still looks pretty close to the final output, though.
This weekend the NYT published Shaila Dewan and Robert Gebeloff’s story about the richest 1 percent of Americans (a more diverse bunch than you’d think). The graphics department published a lot of work in print and online to accompany the article. Online, there was an interactive map that shows you where you and your income rank in 344 zones across the country and a treemap of what jobs the 1 percent hold. But the print version, made by Alicia DeSantis and Ford Fessenden, was really imaginative. (I’m writing this only as a fan - my involvement was limited to about 10 minutes of data monkeying.)
First, Alicia’s original sketch, written on some junk paper:
Originally, they wanted to export the “labels-map” using ArcMap, but to make it as easy as possible to style (it’s not so fun to try to dynamically color or manipulate strings in Arcview, as far as I know, anyway), I used R (specifically, the maptools library) to make a PDF, which takes only about 5 lines of code.
Here’s the original output as a proof of concept:
Then, after a couple iterations, we did more styling on the programmatic side to cut down on manual labor.
And the final product:
This is a good example, I think, of using each medium to its best potential, meeting the design constraints of each. More and more, this means making totally separate versions of things – admittedly, it frequently takes twice the time and energy – but the mediums are just so different that what works well in one just doesn’t work well in another.
One thing I do wish we could do better online is integrating graphics in the context of stories and other assets – photos, videos, whatever. Unfortunately, we don’t get to make every web page by hand once per day like we do in print.