Mostly as a test, but also to allow for some increased flexibility, I am porting this blog to github pages. It’s now available, along with a new post about NYT 4th Down Bot, right here.
I’ll try to add today’s expected internet behavior to make sure it’s easy to follow along from whatever device/portal/dark corner you prefer your internet to come from, but if you have any suggestions or ideas, feel free to email me. My email address is really easy to find.
On Sunday Eli Manning started his 150th consecutive game for the Giants, the highest active streak in the NFL and the third-longest streak in NFL history. (One of the other two people above him is his brother, Peyton.)
The graphics department published an interactive graphic that put Eli’s streak in the context of about 2,000 streaks from about 500 pro quarterbacks. The graphic lets you search for any quarterback or explore a team to go down memory lane.
It’s not particularly important news, but the data provided by pro-football-reference is incredibly detailed and the concept lent itself to a variety of sketches. It was also good practice sketching in D3, which, once you memorize a few things, isn’t as painful as I had thought. (Being in SF for the fall makes learning easier, too, since I can interrupt Mike more easily when he’s sitting one desk over.)
A couple bar charts in R. First, total games started (this compares Eli to QBs in his draft class or later).
And percent of games started (the people at 100% are players like Andrew Luck or RGIII who just haven’t played a lot of seasons).
Ported to a browser, just using total starts:
And share of total possible starts
A different angle, showing teams with all the QBs who have started for them since 2004. (Sorry, Cleveland Browns.)
A similar idea, but mapping when each quarterback played for a particular team
Or just showing all the quarterbacks, regardless of team, going back to 2004…
…or all the way back to 1970
Focusing on the teams took up less space, with a new color for each new QB. (Please wear sunglasses.)
Simplifying the output, only labeling prominent quarterbacks:
At this point this was the general thought, but there were still some thoughts on what the bars could look like. Lines were one option:
And then some more streak-like streaks, which eventually became more like arrows and less like, I don’t know, whatever you think these look like:
And the eventual one published Sunday, with search and fullscreen mode:
My colleague Graham Roberts is teaching a SkillShare class about animated motion graphics. It’s only $20. If you’re interested in this sort of thing, it’s probably worth not drinking those three beers and spending the money on Graham instead. (I get nothing by posting this except his eternal gratitude, which means maybe next time we do a narrated video I’ll win the fight and he’ll cut all the background music.)
These kinds of storytelling skills are pretty hard to come by. If you have them, your portfolio looks like Graham’s. If you don’t, you could be stuck making broken 2D Sankey diagrams like this one for the rest of your career:
From a chat transcript with Amanda Cox this morning (with permission).
The spirit-lifting content she is describing is the print version of this interactive article, for lack of a better word, about income mobility, mostly developed by Amanda and Shan Carter. Subway man’s fingers were tracing this chart:
It’s always hard watching print readers, most of whom are ruthless scanners, come across something you made; I’m glad it appears to have worked out. It goes without saying that “winning” Amanda is preferable to “spirits fell” Amanda by all parties everywhere.
Was just at a sandwich place outside work, chatting with a co-worker about an upcoming map project. Co-worker leaves. A guy in the sandwich place walks up to me, asks me if I make maps. I tell him yes, sort of, sometimes. He says he loves maps. I say, cool, me too. Then he says, You know what would be really cool is, a map of quiet places in New York. I say are you messing with me? He says no. I say, are you joking? He says no. I then tell him that in the last day or two the Times published THIS EXACT MAP. I then watched his mind get blown in slow motion.
When you work on a team that has 5+ professional cartographers, you don’t often get called on to make maps. But sometimes they’re busy, alas, and you use Google’s maps, which are actually pretty great these days. Today’s update of our bike map lets users submit short bits of bike wisdom anywhere in the world (though we’re focusing on North America).
The new maps use Canvas and were implemented by the peerless Tom Giratikanon with a handy backend comment moderation tool from Marc Lavallee and our colleagues in Interactive News.
Tom spent a bit of time on the labeling algorithm before getting it right:
I know all the rage is about mobile devices these days, but I must say if you have a 40-inch monitor this looks pretty good in fullscreen mode really zoomed-in.
On Monday we published something a little different than most of the graphics we make – a running, updating tracker of how much money major league teams are paying to players on the disabled list.
I love sports, but I’m not a huge baseball fan and I’m neutral on the Yankees scale – I don’t really hate them but I can’t say I care whether they win or lose. But early in the year, I remember seeing a fun New Yorker cover that planted a seed:
Talking to some friends and colleagues, Joe Ward and I thought it would be fun to do something that put a dollar figure on the Yankees’ disabled list. We certainly weren’t the first people to notice this – in addition to coverage from traditional outlets, the Onion wrote about how “stacked” the D.L. was and there was a well-circulated blog post when their payroll approached $100 million in annual salaries – but we wanted to make something that showed all major league teams and was updated throughout the season. To do that, you only need two data sources: salaries for every player in the league and a list of all major league transactions, both of which are updated regularly.
We wanted users to be able to find their own team, but also to see the big picture. Some of our original sketches focused on the amount spent per team per day. Below, a chart where each line represents one team’s amount paid to players per day (the jumps and dips represent players coming on and off the list):
Another sketch showed the teams as small multiples:
And another used stacked bars (poorly):
Or one that just showed the players on the bench and how long they’d been on it, regardless of team or salary:
But the one that stuck out the most in the end was the simplest – an aggregate per-team calculation:
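A minimal sketch of that aggregate calculation in R, not the code behind the published tracker. The teams, players, salaries and dates are made up, the column names are hypothetical, and the daily rate assumes a salary spread evenly over a 183-day season.

```r
# Made-up salary and disabled-list data; all names and numbers are hypothetical.
salaries <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  team   = c("Team X", "Team X", "Team Y"),
  salary = c(18300000, 9150000, 3660000)
)
dl <- data.frame(
  player = c("Player A", "Player B", "Player C"),
  on     = as.Date(c("2013-04-01", "2013-04-10", "2013-04-05")),
  off    = as.Date(c(NA, "2013-04-25", NA))  # NA means still on the list
)

as.of       <- as.Date("2013-05-01")  # the day the numbers are calculated
season.days <- 183                    # assumption: salary is spread evenly over the season

# players still on the list are counted up to the calculation date
dl$off[is.na(dl$off)] <- as.of
dl$days <- as.numeric(dl$off - dl$on)

# join the two data sources, then total the money per team
both      <- merge(dl, salaries, by = "player")
both$paid <- both$days * both$salary / season.days
per.team  <- aggregate(paid ~ team, data = both, FUN = sum)
per.team
```

Re-running a script like this on a schedule with a fresh `as.of` date is one way the totals could tick up through the season.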
With that, we started developing things in the browser. The following are sketches made with D3 based on the previous R charts.
Originally, this started as an idea for a phone with just a couple numbers per team. (These sketches are old and the numbers are calculated incorrectly… I screwed some things up.)
But we also wanted to see individual players. Here, a first attempt at the data join in D3:
Later, hooking it up to real salary data:
And making it a little less boring (or “adding sugar,” as Shan says)
Before coming to the version that’s online now:
We still kept a mobile view that I think turned out as well as or better than the desktop version:
Is this an earth-shattering example of data journalism? I suppose it is not. It’s two data sets, a timer and a giant photograph of A-Rod updated a couple times a day. But I must say I like it. It’s fun and engaging for the users it’s aimed at; it’s not tied to a single news event but it’s not aimless either; it was developed and published in less than two weeks; it works on all sorts of devices and it updates every day (originally an R script running on a crontab, now a node script). It’s also a good example of using D3 to make data-driven applications without using SVG at all.
Normally I show what we did in print, but in this case, we didn’t make anything. Most of the fun of this is seeing the numbers tick up in front of you (Shan’s idea) as you’re on the page. In print, it’s just another bar chart. At the same time, if something happens, we’ll be ready on short notice with all the data we need.
Since we had already done some of the data work with the interactive graphic, we were able to turn around a quick chart after the draft. Below, some charts about what kind of players were drafted in the first round this year. (Nine offensive linemen and no running backs, both records.)
Last week we published an interactive graphic about the N.F.L. draft. Our goal was to show an odd reality: even though N.F.L. teams do tend to pick the “best” players early in the draft, there’s a tremendous amount of chance involved. The best 10 eventual N.F.L. performers will not be the first 10 players drafted – or even close.
How to know that both of these are true and decide which is most important? We used draft and performance data from pro-football-reference.com. (One note: N.F.L. performance is hard to measure across positions – how do you decide if a tight end is “better” than a linebacker or a defensive tackle? Most analyses use a combination of games started and pro bowls; the one developed by pro-football-reference uses both of those but has some fine-tuning by position.)
So, for every pick in the draft, we have one number encompassing their N.F.L. performance. Here are the top 20 since 1995:
Here’s a first sketch, where every dot represents one player. The Y axis is “how good” every player is, and the X axis is where in the draft they were selected. I actually screwed something up here – there aren’t more than 250 or so picks in a draft – but otherwise the distribution is more or less right:
My colleague Mike Bostock cleaned this up by coloring the picks by round and adding some labels:
Although that shows all the data, it’s too noisy to really interpret. Wanting to simplify this, I tried taking the average of all players who went at a certain round and certain pick – here, each dot represents the average value of all players at a certain pick (for example, the players drafted at Round 1, Pick 1, or Round 2, Pick 13). As before, the dots are colored by round:
The dot on the top-left represents the average value of all first picks in the draft since 1995 – on average, this group, which includes Peyton Manning, Cam Newton, Andrew Luck, Michael Vick, Keyshawn Johnson and others, clearly outperforms the other picks. (This might be obvious, but then again, the group also includes Tim Couch and JaMarcus Russell.)
I admit I liked this chart more than I probably should have. (My colleagues corrected me!) Averaging this way is a little misleading because every round doesn’t have the same number of picks (the league has grown and there are extra picks at the end of each round, which leads to some funny business with the math), and hiding the distribution oversimplifies things a little. But this chart does make a simple point – the better players tend to go first.
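The averaging described above can be sketched in a few lines of R. This is a rough illustration, not the original script: the data is made up and the column names (`round`, `pick.in.round`, `value`) are assumptions.

```r
# Made-up draft data for illustration; column names are assumptions.
picks <- data.frame(
  year          = c(1995, 1995, 1995, 1996, 1996, 1996),
  round         = c(1, 1, 2, 1, 1, 2),
  pick.in.round = c(1, 2, 1, 1, 2, 1),
  value         = c(100, 60, 30, 80, 20, 50)
)

# one row per (round, pick) slot, averaging across draft years
avg <- aggregate(value ~ round + pick.in.round, data = picks, FUN = mean)
avg <- avg[order(avg$round, avg$pick.in.round), ]

# one dot per slot, colored by round
plot(seq_len(nrow(avg)), avg$value, pch = 16,
     col = c('red', 'blue')[avg$round],
     xlab = "Draft slot", ylab = "Average value")
```

Because each slot collapses to a single mean, the spread within a slot disappears, which is the oversimplification mentioned above.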
Instead, Mike offered a boxplot, which shows the distribution without being so noisy:
Even this was a little too busy for the point we wanted to make, so we settled for a small bar chart.
What we wanted to focus on was the reality that there’s much more randomness in the draft than people realize. Cade Massey and Richard H. Thaler, behavioral psychologists, analyzed the draft and found that not only is there no persistent skill among teams in picking players – teams have good years and bad years in equal measure – but that across all players and positions, teams only picked a player better than the person who went next at that position 52 percent of the time. Their academic paper is here, but Massey explained this in a much more accessible way in a recent talk at the Sloan/MIT sports analytics conference.
I took a stab at replicating some of their findings just to see what it would look like. Here’s a rough chart of the percentage of teams picking a player who ended up being better than the guy drafted after him at the same position. For example, if you chose Peyton Manning (Pick 1 in the 1998 draft) over Ryan Leaf (Pick 2), your guy is better than the next guy at that position, but if you chose Spergon Wynn (Pick 183 in the 2000 draft) over Tom Brady (Pick 199), you did not. (Sorry, Cleveland Browns.)
Simply put, teams don’t pick the “right” player as often as you think, and tend to do better than a coin flip only in the first round. This chart goes under 50 percent after the third round, but that reflects some noise in the data towards the end of the draft – most of these players don’t actually get in the game, so it’s not very meaningful to say that one benchwarmer is marginally better than another. But this concept is hard to explain in a chart like this (the title would be something like “percent of players who were better than the next player at the same position by round”), so we took a simpler approach.
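For anyone who wants to try the "better than the next guy at the same position" comparison, here is a minimal sketch in R. It uses the four quarterbacks named above with made-up performance values, the column names are assumptions, and it is not the analysis from the academic paper.

```r
# The players and picks are from the example above; the values are made up.
picks <- data.frame(
  year   = c(1998, 1998, 2000, 2000),
  pick   = c(1, 2, 183, 199),
  round  = c(1, 1, 6, 6),
  pos    = c("QB", "QB", "QB", "QB"),
  player = c("Peyton Manning", "Ryan Leaf", "Spergon Wynn", "Tom Brady"),
  value  = c(170, 1, 0, 170)
)

# within each draft and position, line players up in the order they were picked
picks <- picks[order(picks$year, picks$pos, picks$pick), ]
group <- paste(picks$year, picks$pos)

# the value of the next player taken at the same position (NA for the last one)
picks$next.value <- ave(picks$value, group, FUN = function(v) c(v[-1], NA))
picks$better     <- picks$value > picks$next.value

# share of picks per round that beat the next player; NA rows are dropped
pct <- aggregate(better ~ round, data = picks, FUN = mean)
pct
```

With real data, the last player taken at each position in a draft has nobody to be compared against, so those rows fall out of the percentages.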
I had been tinkering on a version of a chart I had that showed where the best eventual players were drafted:
This chart highlights where the 10 “best” players in each draft were picked. My colleague Joe Ward thought it would look good in print, where we have more space, and this chart ended up closely resembling what was eventually printed:
Online, Shan Carter suggested an interface that showed this uncertainty with two sentences: the percent of the best players that came in the first round and the percent that came after:
A slider and about a hundred commits later, you have a tool that lets you explore where the best N players from the draft came from every year.
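The number behind those two sentences is simple to compute. A minimal sketch in R, assuming one row per drafted player with hypothetical `round` and `value` columns and made-up numbers:

```r
# Made-up single-year draft; column names are assumptions.
draft <- data.frame(
  round = c(1, 1, 1, 2, 3, 6),
  value = c(90, 10, 50, 70, 5, 100)
)

# share of the best n eventual players who were drafted in the first round
top.n.share <- function(draft, n) {
  n    <- min(n, nrow(draft))
  best <- draft[order(-draft$value), ][seq_len(n), ]
  mean(best$round == 1)
}

top.n.share(draft, 3)  # the best three went in rounds 6, 1 and 2
```

The share that came after the first round is just one minus this number, and a slider amounts to calling the function with a different `n`.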
Mike also made a similar implementation based on the Fisher-Yates shuffle, which is a thing I learned about when he showed me, but it wasn’t the right application for this data, and anyway it was getting too late to change our minds:
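For readers who, like me, had not met it before: the Fisher-Yates shuffle walks a list from the end and swaps each item with a randomly chosen item at or before it, which gives every ordering the same chance. A minimal sketch in R of the shuffle itself (not of how it was used in that version of the graphic):

```r
# A plain Fisher-Yates shuffle; returns a shuffled copy of x.
fisher.yates <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  for (i in n:2) {
    j    <- sample.int(i, 1)  # random position between 1 and i, inclusive
    tmp  <- x[i]
    x[i] <- x[j]
    x[j] <- tmp
  }
  x
}

set.seed(1)  # only so the example is repeatable
shuffled <- fisher.yates(1:10)
shuffled
```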
These charts and sketches were made in R and D3. Normally, at the end of these posts, I write about how other people implemented the best parts of this graphic, but this time it’s especially true.
One of the great things about working in a department with a staff of 25 people is that you can be in big trouble three days before something publishes. Then you make a phone call to San Francisco and everything works out fine.
A couple weeks back, we used PitchFX data to show the relative “nastiness” (for lack of a better word) of the Mets’ pitcher Matt Harvey. The chart below shows pitches that batters swung at outside the strike zone during a recent game against the Phillies.
Just over a week ago we published a graphic – more of an interactive short blog post without a blog, really – that accompanied Tyler Kepner’s piece about strikeouts for the Times’ 2013 baseball preview. The subject of both pieces was the steep increase in strikeouts across the board in the past decade: last year, ten Major League clubs set franchise records for strikeouts.
The fact Tyler came to us with was one he’d found on his own: 18 teams struck out at least 1,200 times last season; through 2005, there had never been a season in which more than two teams topped that total. Below, the first sketch, based on that stat – the number of teams with 1,200 strikeouts or more in a season going back to 1968:
That’s a compelling chart, but it’s also a little misleading because the league has expanded a few times and not all seasons are the same length.
Instead, Joe Ward and I thought about making small multiples of the teams and arranging them in a sort of histogram, sort of like my colleague Bill Marsh did with exit polls in 2008 and 2012.
Here are the first nine teams in alphabetical order, with the league average in grey:
We didn’t really care for these, and I complained about it to my colleague and cubicle-partner Alicia Desantis, who suggested I make it look like the climate change “hockey stick charts.” (FYI, the image below, one of the better ones from Wikipedia, is meant to convey the form, not wade into the “Hockey Stick controversy” if you believe there is one.)
Here’s what the first R sketch of that idea looked like – every team’s average strikeouts per game per year. (Red is the league average.)
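A rough sketch of both calculations in R: the count of teams at or above 1,200 strikeouts, and the per-game version that sidesteps the expansion and season-length problem. The team-season numbers are made up and the column names are assumptions.

```r
# Made-up team-season totals; 'so' is strikeouts, 'g' is games played.
seasons <- data.frame(
  year = c(2005, 2005, 2005, 2012, 2012, 2012),
  team = c("AAA", "BBB", "CCC", "AAA", "BBB", "CCC"),
  so   = c(1000, 1100, 1210, 1250, 1190, 1300),
  g    = c(162, 162, 162, 162, 162, 162)
)

# the first stat: how many teams hit 1,200 strikeouts in each season
seasons$over.1200 <- seasons$so >= 1200
over <- aggregate(over.1200 ~ year, data = seasons, FUN = sum)

# the per-game version, for each team and for the league as a whole
seasons$so.per.game <- seasons$so / seasons$g
league <- aggregate(cbind(so, g) ~ year, data = seasons, FUN = sum)
league$so.per.game <- league$so / league$g

# every team as a light grey line, the league average in red
plot(range(seasons$year), range(seasons$so.per.game), type = 'n',
     xlab = "Year", ylab = "Strikeouts per game")
for (tm in unique(seasons$team)) {
  one <- seasons[seasons$team == tm, ]
  lines(one$year, one$so.per.game, col = 'lightgrey')
}
lines(league$year, league$so.per.game, col = 'red', lwd = 3)
```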
At this point, we had a chart we liked and the process went forward like many of our other projects do. However, there was a key difference with this one that’s worth mentioning - all the rest of the sketches, edits and design improvements happened in a web browser. (More on this later.)
Here are a few successions of this chart, made using D3:
Getting this data from baseball-reference.com requires a bit of scraping, and this project sold me for life on R’s XML package, which makes scraping fast and shamefully easy.
In the final project, there are three interactive charts and a table on the page, and they are all generated in D3 with just one data file. The whole chart form – line selection, tooltip, calculating averages – is easily abstracted out, and for the first time I felt some of the same sketching power in a browser that I’d seen only with R: the concept that if you can make one chart, you can make a hundred with the same effort. But with D3, the sketches are already in a browser and wired for interaction! From a development point of view, it felt tremendously powerful. (For many of you this might be obvious, but old habits die hard.)
Also, thanks to the open-source SVG Crowbar bookmarklet developed by Shan Carter, this project represented a recent change in development process, for me, at least. Instead of developing both print and online charts separately, we were able to generate all the charts for print in a web browser at precisely the sizes we wanted, then save them down to Illustrator. Aside from being a useful shift in thinking, it saved a ton of time. (This isn’t the first time the department has done something like this – just the first time I did.)
For example, we included the small multiples in print, but we made them in D3 first:
Here’s the two-page spread in print. Again, all these charts were produced in a browser, saved to SVG and edited lightly in Illustrator.
Finally, for the record, most of the best parts of this graphic were made by Shan while I was on vacation (with standard last-minute triage from Amanda Cox and Mike Bostock), and all the meaningful annotation was from Joe Ward, who, did you know, played D1 baseball and was a scout for the Cleveland Indians before coming to the Times?
On Christmas we published an interactive game of sorts that lets you pick your own all-time team of New Yorkers who went on to the N.B.A, along with a selection of teams from some notable pundits and former players. There weren’t really a lot of sketches to post, but for me, the best part of the project was looking through old N.B.A. photographs from the NYT archives and, in some cases, from the players’ colleges. (Below, Bill Bolger, courtesy of Georgetown University.)
The project didn’t end up being a real hit traffic-wise, possibly because people were spending time with their loved ones on Christmas rather than playing games on the internet, or possibly because the actual audience is relatively small. Still, it was worth it for me – this feature had a lot of new features we hope to use again, including customized sharing and, I think, a good integration of Isotope – some of these features were used again on Interactive News’s Year on Page 1 project. I also got much better at scraping with R’s XML package, and I’ll try to post a demo here soon.
Finally, if you can’t tell, Dan Nguyen’s SOPA Opera was an obvious design inspiration for this.
Last week I posted a video from Jeremy White, loosely describing how he turned LIDAR data into a stunning model of Tunnel Creek. But more modeling yet went into showing exactly where the avalanche happened and how it traveled. My colleague Graham Roberts added trees, elevation lines and an actual model of the avalanche – its shape, depth, and size — as it flowed down the mountain. (The Swiss Federal Institute for Snow and Avalanche Research created the model specifically for this project.)
Below, a set of drafts that show the animation at various points along completion. These are courtesy of Graham, who rendered these in 3D, and Hannah Fairfield, one of the project’s editors.
Contrast these to the version that made it into the project (I failed at internet in trying to post that video here, but it looks better on the Snow Fall page anyway). You’ll see that they added elevation lines, toned back the background sound a bit and added a faint “tick” to help show the speed of the avalanche as it moved down the mountain.
The NYT published its Snow Fall project this week. (You’ve seen it, right?) It’s a large, immersive and complex multimedia storytelling piece by more than a dozen people. I had zero (zilch, none, undefined) to do with it, but I do have a blog, and Jeremy White, one of the folks responsible for the 3D animated flyover in the first chapter (it’s a video, not a gif), made a relatively face-melting video showing how it came to pass:
For those interested in making these on your own, it may be dispiriting to learn that Jeremy is all-but-dissertation in a PhD program for cartography and we are not. But he told me he didn’t use a ton of proper GIS for this – mostly 3D and data skills. (I don’t buy it totally, but whatever.)
In short, he made a 3D mesh in 3ds max from King County LIDAR data, added and georeferenced satellite imagery from the USGS, added some snow and atmospheric conditions (like fog) with V-Ray, threw in a touch of color correction, sent it to the department’s render farm (16 Mac Pros), and 48 hours later, boom, a 43 second video. Simple! (Obviously, it’s not; it took weeks.)
For those of you with extreme technical questions, Jeremy’s on Twitter and he loves talking about this stuff all day long. I sit right next to him, so I know.
The same plot, with extra arguments to clean it up a little:
plot(data$Year,data$total.unified,type='l',ylim=c(0,50),xlab="Year",ylab="States",main="States with unified control of state government since 1938",col="red",lwd=3)
abline(h=c(0,10,20,30,40,50),col='lightgrey')
abline(v=c(1940,1960,1980,2000),col='lightgrey')
Adding more layers onto the plot, drawing lines for Democratic- and Republican-unified states. (In general, “plot” makes a chart and “lines” adds to an existing plot.)
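A minimal sketch of that layering step. The numbers are made up so the snippet runs on its own, and the party column names are borrowed from the barplot code further down.

```r
# Made-up numbers for illustration only; not the real data.
years <- seq(1940, 2012, by = 4)
data <- data.frame(
  Year      = years,
  Unified.D = rep(c(12, 18, 15), length.out = length(years)),
  Unified.R = rep(c(10, 6, 14), length.out = length(years))
)
data$total.unified <- data$Unified.D + data$Unified.R

# plot() draws the chart with the total in black...
plot(data$Year, data$total.unified, type = 'l', ylim = c(0, 50),
     xlab = "Year", ylab = "States", col = "black", lwd = 3)

# ...and each lines() call adds another layer on top of it
lines(data$Year, data$Unified.D, col = "blue", lwd = 2)
lines(data$Year, data$Unified.R, col = "red", lwd = 2)
```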
Now we’ll make a barplot instead. The syntax here is a little weird, and I had to get Amanda to fix mine originally, but it’s not so bad. Basically, our data needs to be transposed and reduced to just the columns we want to plot. You can do this in one step, but for clarity I’ll break it up here. It looks like a waffle chart just because of the horizontal axis lines, but it’s just a barplot.
#just the numbers we want to plot
data.we.need<-data[,c("Unified.D","Divided","Unified.R")]
#a simple reshaping, transposing our data
transposed<-t(data.we.need)
barplot(transposed,ylim=c(0,50),col=c('blue','grey','red'),border=F)
abline(h=c(1:50),col='white')
We end up doing the same plot for the final output; it’s just shaped differently and has fewer axis lines. We’re also saving it as a pdf in the dimensions we want:
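A minimal sketch of the pdf step, with made-up numbers so it runs on its own. The file name, the 6 by 2.5 inch dimensions and the margins are placeholders, not the values used for the published chart.

```r
# Made-up numbers for illustration only; not the real data.
years <- seq(1940, 2012, by = 4)
data <- data.frame(
  Year      = years,
  Unified.D = rep(c(12, 18, 15), length.out = length(years)),
  Unified.R = rep(c(10, 6, 14), length.out = length(years))
)
data$Divided <- 50 - data$Unified.D - data$Unified.R
transposed <- t(data[, c("Unified.D", "Divided", "Unified.R")])

# open a pdf device at the size we want (in inches), draw, then close it
out.file <- file.path(tempdir(), "unified-states.pdf")
pdf(out.file, width = 6, height = 2.5)
par(mar = c(2, 3, 1, 1))  # smaller margins for a short, wide chart
barplot(transposed, ylim = c(0, 50), col = c('blue', 'grey', 'red'),
        border = NA, names.arg = data$Year)
abline(h = seq(10, 40, by = 10), col = 'white')  # fewer axis lines than before
invisible(dev.off())
```

Nothing lands in the file until `dev.off()` closes the device, so that call is the one not to forget.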
Last week, my colleague Monica Davey reported that starting in January, one party will control both the state legislature and governor’s office in 37 states, the highest that figure has been since 1952.
Numbers like that don’t always mean a chart will be good, but they usually mean it’s worth at least checking out, so I got data from the National Conference of State Legislatures, which had previously published a chart on their blog.
Getting data behind a chart you see on the internet isn’t groundbreaking work or anything, but it happens regularly in our daily work, and just because you can get the data easily doesn’t mean you can’t screw it up. Anyway, there are a number of forms this could have taken, so I thought I would share some.
The most basic chart to do here is to show the news: that the number of states with unified governments is at a 60-year high:
That does show the news, but not much else. Adding on lines depicting which parties have unified control per year (the black line is just the sum of the other two) helped a little:
But the lines look super noisy and I thought maybe someone would want to see the states more prominently. Here’s a waffle chart, with each square a state. (One cool byproduct with the area chart forms is you get to see the U.S. add Hawaii and Alaska – the “last” bump on this chart is when Minnesota switched from a nonpartisan legislature in 1972.)
That might look a little better, but it’s not like you get to identify individual states or anything, and it takes up more space than it deserves.
So a compromise was made, making it shorter, but in a similar style:
One problem with both of these forms is that you don’t actually get to see the main point of the story: that there are more unified states than ever before. But I couldn’t think of a smart way to get all those, and I admit I liked being able to see the distributions.
But another approach yet made it into the pages of the NYT. Charles Blow, an op-ed columnist and the paper’s former graphics director, liked the chart, and wanted to use the same data in his column. But he used it in a slightly different way. His approach lets you compare all three numbers by separating them into two charts:
So, given the news and the data, which form is best? Or care to make your own, better chart? The data is already online, but it’s in a cleaner format right here. I’ll happily post any charts as long as they’re politely submitted or worse than mine.
I’ll also post the (very few lines) of R code used to make these if you want to do some learnin.
Readers, aggregators and bored skimmers of chartsnthings will know that this is frequently a place for statistical sketches, many of which are made in R. Yet this is not because the New York Times Graphics department only makes statistical charts; more realistically, it’s because this blog’s frequent contributors stink at drawing. The department has a wide assortment of (frankly badass) illustrators, cartographers and 3D modelers, and I’ll try to include some more of their sketches in future posts.
In that spirit, my colleague Alicia Desantis agreed to share sketches from her recent Thanksgiving flow chart of turkey preparation decisions.
Our original idea was to qualify 80 different turkey combinations. What’s the difference between a heritage bird that is roasted whole, brined and air-dried and one that is butterflied, brined and air-dried? Supposedly these decisions have consequences, right? The final turkeys would be rated in a number of factors: juiciness, crispness, cost, time-prep etc.
But this way of thinking about the story severely limited the number of variables and bogged us down in meaningless differences. So we moved to a decision chart — this way we could more clearly articulate what was at stake in each individual cooking choice. It also left some room for basic “tips” and commentary — and gave us an opportunity to experiment with a different voice.
Instead of not talking to your family at the Thanksgiving table, why not take a look through her design process? First, some thoughts in Illustrator…
Probably the best-known of the department’s graphics this election season is Mike Bostock and Shan Carter’s 512 Paths to the White House. Instead of posting on this in detail, I’ll just put up a few images and direct you to some stuff that’s already out there.
These photos are from that talk, but there are dozens more if you read through the whole thing, which you should, obv.
And the final graphic, which was wired up to results on election night.
The only meaningful footnote I can add to this is that Mike Bostock described programming the animations as “really, really hard.” I read that to mean I need to give up programming immediately, but your mileage may vary.
Now that the election is over and there’s a bit more time, I can post some sketches I have been sitting on for a while. (Early disclaimer: sitting on them was as close to this content as I got – these are courtesy of Mike Bostock, Shan Carter and Amanda Cox. I’m just doing the manual cut-and-paste labor.)
This first sketch, from Amanda, was made with R, and it had been in her ideas folder for a while (more than six months, I think).
Some brief styling of that idea in D3:
Then, before eventually coming back to this concept, some experimentation with other forms:
Election night is (mercifully) over. Aside from the dozens of maps, tables and charts that ran live during the election, including a bunch of live graphical updates to the live blog (some of which I reblogged here last night), the department also published a more analytical piece this morning.
There had been many sketches and ideas about what to do for the “How the Winner Won” graphic (Amanda made at least 40 ideas, some of which she might share someday), but by the end of the night there was a pretty good plan.
This sketch, by Steve Duenes, doesn’t show all the research or failed ideas that got us to this point, but it does show a clear plan that was eventually executed overnight by about ten graphics editors. The main map, by Mike Bostock and Shan Carter, was a variation on earlier maps we had published, including the map of House shifts in 2010 (which itself was part of the Times’s election app this year) with some clear inspiration by Fernanda Viégas and Martin Wattenberg’s wind map.
Anyway, here’s the sketch:
And what was on the NYTimes homepage this morning: