May 15, 2013
Sketches from Money on the Bench

On Monday we published something a little different from most of the graphics we make – a running, updating tracker of how much money major league teams are paying to players on the disabled list.

I love sports, but I’m not a huge baseball fan and I’m neutral on the Yankees scale – I don’t really hate them but I can’t say I care whether they win or lose. But early in the year, I remember seeing a fun New Yorker cover that planted a seed:

image

Talking to some friends and colleagues, Joe Ward and I thought it would be fun to do something that put a dollar figure on the Yankees’ disabled list. We certainly weren’t the first people to notice this – in addition to coverage from traditional outlets, the Onion wrote about how “stacked” the D.L. was and there was a well-circulated blog post when their payroll approached $100 million in annual salaries – but we wanted to make something that showed all major league teams and was updated throughout the season. To do that, you only need two data sources: salaries for every player in the league and a list of all major league transactions, both of which are updated regularly.
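As a rough sketch of how those two sources might be combined (this is not the code behind the published graphic, which ran as an R script and later a node script), the Python below prorates each player's salary over the days spent on the disabled list and totals it by team. The player and team names, the record layout, the `money_on_bench` function and the 183-day season length are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Assumption: annual salary is spread evenly over a ~183-day regular season.
SEASON_DAYS = 183

# Hypothetical placeholder records; real data would come from the
# salary list and the league transaction list.
salaries = {"Player A": 18_300_000, "Player B": 1_830_000}
transactions = [
    # (player, team, date placed on the D.L., date activated or None if still out)
    ("Player A", "Team X", date(2013, 4, 1), None),
    ("Player B", "Team Y", date(2013, 4, 10), date(2013, 4, 20)),
]


def money_on_bench(salaries, transactions, as_of):
    """Return {team: dollars paid to players while on the D.L.} up to as_of."""
    totals = defaultdict(float)
    for player, team, placed, activated in transactions:
        # A stint ends when the player is activated, or today if still out.
        end = min(activated, as_of) if activated else as_of
        days = max((end - placed).days, 0)
        totals[team] += salaries[player] / SEASON_DAYS * days
    return dict(totals)


if __name__ == "__main__":
    print(money_on_bench(salaries, transactions, date(2013, 4, 11)))
```

Rerunning a calculation like this whenever the transaction list changes is enough to keep a per-team total current.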

We wanted users to be able to find their own team, but also to see the big picture. Some of our original sketches focused on the amount spent per team per day. Below, a chart where each line represents one team’s amount paid to players per day (the jumps and dips represent players coming on and off the list):

image

Another sketch showed the teams as small multiples:

image

And another used stacked bars (poorly):

image

Or one that just showed the players on the bench and how long they’d been on it, regardless of team or salary:

image

But the one that stuck out the most in the end was the simplest – an aggregate per-team calculation:

image

With that, we started developing things in the browser. The following are sketches made with D3 based on the previous R charts.

Originally, this started as an idea for a phone with just a couple numbers per team. (These sketches are old and the numbers are calculated incorrectly… I screwed some things up.)

image

But we also wanted to see individual players. Here, a first attempt at the data join in D3:

image

Later, hooking it up to real salary data:

image

And making it a little less boring (or “adding sugar,” as Shan says):

image

Before coming to the version that’s online now:

image

We still kept a mobile view that I think turned out as well as or better than the desktop version:

image

Is this an earth-shattering example of data journalism? I suppose it is not. It’s two data sets, a timer and a giant photograph of A-Rod updated a couple times a day. But I must say I like it. It’s fun and engaging for the users it’s aimed at; it’s not tied to a single news event but it’s not aimless either; it was developed and published in less than two weeks; it works on all sorts of devices and it updates every day (originally an R script running on a crontab, now a node script). It’s also a good example of using D3 to make data-driven applications without using SVG at all.

Normally I show what we did in print, but in this case, we didn’t make anything. Most of the fun of this is seeing the numbers tick up in front of you (Shan’s idea) as you’re on the page. In print, it’s just another bar chart. At the same time, if something happens, we’ll be ready on short notice with all the data we need.  

April 29, 2013
Charting Skill and Chance in the N.F.L. Draft

Last week we published an interactive graphic about the N.F.L. draft. Our goal was to show an odd reality: even though N.F.L. teams do tend to pick the “best” players early in the draft, there’s a tremendous amount of chance involved. The best 10 eventual N.F.L. performers will not be the first 10 players drafted – or even close.

How do we know that both of these are true, and how do we decide which is more important? We used draft and performance data from pro-football-reference.com. (One note: N.F.L. performance is hard to measure across positions – how do you decide if a tight end is “better” than a linebacker or a defensive tackle? Most analyses use a combination of games started and pro bowls; the one developed by pro-football-reference uses both of those but has some fine-tuning by position.)

So, for every pick in the draft, we have one number encompassing that player’s N.F.L. performance. Here are the top 20 since 1995:

image

Here’s a first sketch, where every dot represents one player. The Y axis is “how good” every player is, and the X axis is where in the draft they were selected. I actually screwed something up here – there aren’t more than 250 or so picks in a draft – but otherwise the distribution is more or less right:

image

My colleague Mike Bostock cleaned this up by coloring the picks by round and adding some labels:

image

Although that shows all the data, it’s too noisy to really interpret. Wanting to simplify this, I tried taking the average of all players who went at a certain round and certain pick – here, each dot represents the average value of all players at a certain pick (for example, the players drafted at Round 1, Pick 1, or Round 2, Pick 13). As before, the dots are colored by round:

image

The dot on the top-left represents the average value of all first picks in the draft since 1995 – on average, this group, which includes Peyton Manning, Cam Newton, Andrew Luck, Michael Vick, Keyshawn Johnson and others, clearly outperforms the other picks. (This might be obvious, but then again, the group also includes Tim Couch and JaMarcus Russell.)

I admit I liked this chart more than I probably should have. (My colleagues corrected me!) Averaging this way is a little misleading because every round doesn’t have the same number of picks (the league has grown and there are extra picks at the end of each round, which leads to some funny business with the math), and hiding the distribution oversimplifies things a little. But this chart does make a simple point – the better players tend to go first.

Instead, Mike offered a boxplot, which shows the distribution without being so noisy:

image

Even this was a little too busy for the point we wanted to make, so we settled for a small bar chart.

image

What we wanted to focus on was the reality that there’s much more randomness in the draft than people realize. Cade Massey and Richard H. Thaler, behavioral psychologists, analyzed the draft and found that not only is there no persistent skill among teams in picking players – teams have good years and bad years in equal measure – but that across all players and positions, teams only picked a player better than the person who went next at that position 52 percent of the time. Their academic paper is here, but Massey explained this in a much more accessible way in a recent talk at the Sloan/MIT sports analytics conference. 

I took a stab at replicating some of their findings just to see what it would look like. Here’s a rough chart of the percentage of teams picking a player who ended up being better than the guy drafted after him at the same position. For example, if you chose Peyton Manning (Pick 1 in the 1998 draft) over Ryan Leaf (Pick 2), your guy is better than the next guy at that position, but if you chose Spergon Wynn (Pick 183 in the 2000 draft) over Tom Brady (Pick 199), you did not. (Sorry, Cleveland Browns.)

image
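For anyone who wants to try the same comparison, here is a minimal Python sketch of one way to compute it. It is my reading of the idea, not the code behind the chart: the record fields, the `better_than_next_rate` function and the sample values are hypothetical, ties count as "not better," and the last player taken at a position in a given year has nobody to be compared with.

```python
from collections import defaultdict


def better_than_next_rate(picks):
    """Share of picks, by round, that outperformed the next player
    drafted at the same position in the same year."""
    by_group = defaultdict(list)
    for p in picks:
        by_group[(p["year"], p["position"])].append(p)

    wins = defaultdict(int)
    total = defaultdict(int)
    for group in by_group.values():
        group.sort(key=lambda p: p["pick"])  # order by overall pick number
        # Pair every player with the next one taken at his position.
        for earlier, later in zip(group, group[1:]):
            total[earlier["round"]] += 1
            if earlier["value"] > later["value"]:
                wins[earlier["round"]] += 1
    return {rnd: wins[rnd] / total[rnd] for rnd in total}


# Hypothetical placeholder data: one position in one draft year.
sample_picks = [
    {"year": 2000, "round": 1, "pick": 1, "position": "QB", "value": 90},
    {"year": 2000, "round": 1, "pick": 2, "position": "QB", "value": 10},
    {"year": 2000, "round": 2, "pick": 40, "position": "QB", "value": 20},
    {"year": 2000, "round": 3, "pick": 70, "position": "QB", "value": 5},
]

if __name__ == "__main__":
    print(better_than_next_rate(sample_picks))
```

Each comparison is credited to the round of the earlier pick, which is one of several reasonable choices when the next player at a position comes off the board a round or two later.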

Simply put, teams don’t pick the “right” player as often as you think, and tend to do better than a coin flip only in the first round. This chart goes under 50 percent after the third round, but that reflects some noise in the data towards the end of the draft – most of these players don’t actually get in the game, so it’s not very meaningful to say that one benchwarmer is marginally better than another. But this concept is hard to explain in a chart like this (the title would be something like “percent of players who were better than the next player at the same position by round”), so we took a simpler approach.

I had been tinkering with a chart that showed where the best eventual players were drafted:

image

This chart highlights where the 10 “best” players in each draft were picked. My colleague Joe Ward thought it would look good in print, where we have more space, and this chart ended up closely resembling what was eventually printed:

image

Online, Shan Carter suggested an interface that showed this uncertainty with two sentences: the percent of the best players that came in the first round and the percent that came after:

image

A slider and about a hundred commits later, you have a tool that lets you explore where the best N players from the draft came from every year.

image
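The number behind a sentence like that is simple to work out. A minimal Python sketch, assuming each draft class is a list of records with a round and a single performance value (the field names, the `first_round_share` function and the sample numbers are hypothetical, not taken from the published graphic):

```python
def first_round_share(draft_class, n):
    """Share of the n highest-valued players in one draft class
    who were picked in the first round."""
    top = sorted(draft_class, key=lambda p: p["value"], reverse=True)[:n]
    if not top:
        return 0.0
    in_first = sum(1 for p in top if p["round"] == 1)
    return in_first / len(top)


# Hypothetical placeholder data for a single draft class.
sample_class = [
    {"round": 1, "value": 50},
    {"round": 3, "value": 80},
    {"round": 1, "value": 10},
    {"round": 6, "value": 60},
]

if __name__ == "__main__":
    for n in (2, 4):
        share = first_round_share(sample_class, n)
        print(f"Top {n}: {share:.0%} first round, {1 - share:.0%} after")
```

Moving a slider amounts to calling the function again with a different `n`.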

Mike also made a similar implementation based on the Fisher-Yates shuffle, which is a thing I learned about when he showed me, but it wasn’t the right application for this data, and anyway it was getting too late to change our minds:

image
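If, like me, you hadn’t met the Fisher-Yates shuffle before: it is a standard way to put a list into a uniformly random order by walking from the end and swapping each item with a randomly chosen item at or before it. The Python below is a generic sketch of the algorithm itself, not Mike’s implementation; the function name and the optional `rng` parameter are my own choices.

```python
import random


def fisher_yates_shuffle(items, rng=None):
    """Return a new list holding items in uniformly random order."""
    if rng is None:
        rng = random.Random()
    result = list(items)  # copy so the caller's list is left untouched
    # Walk backward; swap each slot with a random slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both endpoints
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    print(fisher_yates_shuffle(range(1, 11)))
```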

These charts and sketches were made in R and D3. Normally, at the end of these posts, I write about how other people implemented the best parts of this graphic, but this time it’s especially true. 

One of the great things about working in a department with a staff of 25 people is that you can be in big trouble three days before something publishes. Then you make a phone call to San Francisco and everything works out fine.

November 12, 2012
Some sketches from the Times’ scenario builder

Probably the best-known of the department’s graphics this election season is Mike Bostock and Shan Carter’s 512 Paths to the White House. Instead of posting on this in detail, I’ll just put up a few images and direct you to some stuff that’s already out there.

First, an interview on Source with the authors. 

Next, Shan Carter’s recent talk at the Visualized conference in New York. (There was apparently a burst of applause when the first slides for this graphic came on the screen.)

These photos are from that talk, but there are dozens more if you read through the whole thing, which you should, obv.

s1

s2

And the final graphic, which was wired up to results on election night.

final

The only meaningful footnote I can add to this is that Mike Bostock described programming the animations as “really, really hard.” I read that to mean I need to give up programming immediately, but your mileage may vary.

November 12, 2012
Sketches from the Swinging Swing States

Now that the election is over and there’s a bit more time, I can post some sketches I have been sitting on for a while. (Early disclaimer: sitting on them was as close to this content as I got – these are courtesy of Mike Bostock, Shan Carter and Amanda Cox. I’m just doing the manual cut-and-paste labor.)

This first sketch, from Amanda, was made with R, and it had been in her ideas folder for a while (more than six months, I think).

orig

Some brief styling of that idea in D3:

s1

s2

Then, before eventually coming back to this concept, some experimentation with other forms:

e1

e2

e3

e4

Before homing in on their final idea:

e5

h2

h3

And then just refining the view:

h5

r2

r3

And here’s the final piece:

f1

August 19, 2012
Droughts on display

Chris Fennewald, an editor with the Missouri Farm Bureau Publications, sent me these photos of our recent drought graphic on display at the Missouri State Fair.

pic1

pic1

Nice to see the maps out in the wild! 

Also, if you missed it, Shan Carter and Mike Bostock revisited the drought data for a recent piece in the Sunday Review.

July 25, 2012
Shan Carter’s track

Yesterday the graphics desk published the third in its “How to Win” series; here, Shan Carter and Joe Ward explain the handoff in track relays. (You should really be checking those out, btw.)

Shan sent along one of his first passes at the 3D track, which I believe he made using Modo. This only reinforces my current belief that 3D rendering is basically a series of inexplicably magical dials.

track

Here’s his final rendering from the published graphic:

published

May 15, 2012
Shan Carter (and an army of others) share some sketches from the NYT electoral map

Last week the Times published their interactive electoral map. Although a medium-sized team of reporters, editors, designers and developers (including, but not limited to, Jeremy Ashkenas, Matt Ericson, Alan McLean, David Nolen and Derek Willis) had a hand in designing and building the project, Shan Carter did much of the developing of the main visualization, and he agreed to let me post some of his sketches here. (I had no hand in this – I’m just the image copy-paster this evening.)

First, a look back to the Times’ electoral map of 2008:

2008 Map

You’ll notice some similarities – there is analysis for every state and the option to share your own map. But they wanted to explore some different options this year, too. First, Shan started by making a cartogram in Illustrator, overlaid on a (pretty terrible) hand trace of the US:

First map sketch

And then slowly tinkering with it:

Testy

Improvement

One idea was to take the geography out of the graphic completely:

NO GEO

Or at least minimize it further by dividing states into regions:

Minimized geo

Another was to compare two maps side-by-side, similar to the "split screen" view of the Senate in 2008:

Split screen img

But no one was really super thrilled with maps as the main conduit for the analysis. Instead, they decided on minimizing the geography and using “bins” for states. (Shan has sort of been obsessed with “bins” since 2008, when his dream of having states magically fall into buckets on election night ultimately didn’t pan out. I personally had to cheer him up after that and it was not pretty.)

Anyway, an early prototype of that concept:

prototype

And how that part of the graphic ultimately looked:

Final blobs

If you’ve seen this piece by now, you’ll notice that they didn’t make just one decision – they expanded on a few of them in a compelling mix of interactive and linear storytelling that told a few different stories and also let you make your own and share it wherever you wanted.

It’s also a fun insight into Shan’s workflow, which is to mostly experiment directly with markup rather than with flat outputs from R or Adobe Illustrator mockups, which many of us do. (OK, technically, he tells me the cartograms, being more art than science, were hand-made in Illustrator and then their xy positions were exported to D3, but still, he’s on the record saying “mockups are for suckers.”)

Also, this was made using D3 and implemented a technique that let the graphic function properly even in Internet Explorer 8. (A sharp guy named Jim Vallandingham chronicled this in extreme detail if you’re interested in doing this sort of thing.)
