Issue

Metro collects data from riders when they enter and exit the system. This tells you where people come from and where they go, but not which trains they ride. For example, which trains do people ride from L'Enfant Plaza to Dupont Circle? There are many combinations of trains people can take, and the choice is likely determined by whichever train heading in their intended direction arrives at the station first.

Solution

The solution is simple in theory but difficult in practice: I assume people take the shortest, most convenient route (or routes) available, based on the following assumptions:

Assumption 1

The number of station stops is a reasonable proxy for the time people spend on the train. This holds for most of the system inside the city. Stops toward the ends of most lines are farther apart and take longer, but that doesn't affect the results: where the difference is large, riders have only one line to choose from anyway (at the ends of the Orange or Red lines, for example). This assumption will not always hold, so keep it in mind when interpreting my results.

Assumption 2

People perceive transferring between trains as a cost (obviously they would rather ride a single train from origin to destination if possible), and that cost has two components: Time Cost and Convenience Cost.

Time Cost depends on the day and time. It assumes that the time someone spends at a transfer point equals the time it takes to walk from train 1 to train 2 (defined here as 2 minutes) plus half the time between trains (the average wait for the next train). If trains run every 6 minutes during peak rush hour, this becomes 2 minutes + 6/2 minutes = 5 minutes. At roughly 3 minutes between station stops on average, 5 minutes is about 1.67 stops, which the model rounds to 2 station stops.
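The Time Cost arithmetic can be written as a small function. This is a sketch rather than the code behind the model: the function name is hypothetical, and rounding to the nearest whole stop with halves rounded up is an assumption, since the text only says the result is rounded.

```python
import math


def time_cost_in_stops(headway_min: float,
                       walk_min: float = 2.0,
                       min_per_stop: float = 3.0) -> int:
    """Transfer Time Cost expressed in station stops.

    Time at a transfer = walk time + half the headway (the average wait).
    Minutes are converted to stops and rounded to the nearest whole stop;
    rounding halves up is an assumption, not taken from the model.
    """
    wait_min = walk_min + headway_min / 2
    return math.floor(wait_min / min_per_stop + 0.5)


# Peak example: 2 + 6/2 = 5 minutes, and 5 / 3 is about 1.67, so 2 stops.
print(time_cost_in_stops(6))  # 2
```

Because the headway is a parameter, the same function gives a different Time Cost for each day and time period once the corresponding headway is plugged in.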

Convenience Cost is a constant, defined here as 2 station stops. It represents the convenience a rider loses by not staying on the same train (e.g., keeping a seat, avoiding the hassle of a transfer). It ensures that the model does NOT treat taking the Green Line to Gallery Place and then the Red Line to Metro Center as equally attractive as taking the Orange or Blue Line directly from L'Enfant Plaza to Metro Center, which is what almost everyone would do.
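Putting the two components together, a route's cost in station stops is its number of stops plus, for each transfer, the Time Cost and the Convenience Cost. The hypothetical helper below applies this to the L'Enfant Plaza example. The stop counts (three stops on the Orange/Blue line to Metro Center; two on the Green Line to Gallery Place, then one on the Red Line) are my reading of the system map and are worth double-checking. Counted purely in stops, both routes are three stops long, so with no transfer penalty at all the model would split riders between them.

```python
TIME_COST_STOPS = 2         # peak-period Time Cost, in station stops
CONVENIENCE_COST_STOPS = 2  # constant Convenience Cost, in station stops


def route_cost(stops: int, transfers: int,
               time_cost: int = TIME_COST_STOPS,
               convenience_cost: int = CONVENIENCE_COST_STOPS) -> int:
    """Generalized route cost: stops ridden plus a penalty per transfer."""
    return stops + transfers * (time_cost + convenience_cost)


# Orange/Blue directly from L'Enfant Plaza to Metro Center (assumed 3 stops).
direct = route_cost(stops=3, transfers=0)
# Green to Gallery Place (assumed 2 stops), then Red to Metro Center (1 stop).
via_gallery = route_cost(stops=2 + 1, transfers=1)
# The same transfer route with no penalty at all ties with the direct route.
no_penalty = route_cost(stops=3, transfers=1, time_cost=0, convenience_cost=0)

print(direct, via_gallery, no_penalty)  # 3 7 3
```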

End Result

Using these assumptions, the model calculates all the shortest routes for each Origin-Destination (O-D) pair in the system (over 7,000 O-D pairs in all). If more than one route ties for shortest, every tied route is recorded for that O-D pair and given a weight of 1 divided by the number of tied routes. For example, if an O-D pair has two equally short routes, the model assumes half of its riders take each one.
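One way to find every tied shortest route is a Dijkstra search over (station, line) states, where riding one stop costs 1 and changing lines costs the Time Cost plus the Convenience Cost. The sketch below is an illustration, not the code behind the published spreadsheet: the network is a hypothetical fragment around L'Enfant Plaza and Metro Center, the transfer penalty is fixed at the peak value of 2 + 2 stops, treating Orange and Blue as separate routes on shared track is an assumption, and all function names are hypothetical. Ties are kept by recording every equally short predecessor of each state.

```python
import heapq
from collections import defaultdict

# Hypothetical network fragment; each line is an ordered list of stations.
# Assumption: Orange and Blue are modeled as separate lines sharing track.
LINES = {
    "OR": ["L'Enfant Plaza", "Smithsonian", "Federal Triangle", "Metro Center"],
    "BL": ["L'Enfant Plaza", "Smithsonian", "Federal Triangle", "Metro Center"],
    "GR": ["L'Enfant Plaza", "Archives", "Gallery Place"],
    "RD": ["Gallery Place", "Metro Center"],
}
TRANSFER_PENALTY = 2 + 2  # Time Cost + Convenience Cost, in station stops


def build_graph(lines, transfer_penalty):
    """Nodes are (station, line). Riding one stop costs 1; changing lines costs the penalty."""
    graph = defaultdict(list)
    lines_at = defaultdict(set)
    for line, stations in lines.items():
        for a, b in zip(stations, stations[1:]):
            graph[(a, line)].append(((b, line), 1))
            graph[(b, line)].append(((a, line), 1))
        for s in stations:
            lines_at[s].add(line)
    for s, here in lines_at.items():
        for l1 in here:
            for l2 in here:
                if l1 != l2:
                    graph[(s, l1)].append(((s, l2), transfer_penalty))
    return graph, lines_at


def to_legs(path):
    """Collapse a (station, line) path into (line, board, alight) legs."""
    legs = []
    for station, line in path:
        if legs and legs[-1][0] == line:
            legs[-1][2] = station
        else:
            legs.append([line, station, station])
    return tuple(tuple(leg) for leg in legs)


def shortest_routes(graph, lines_at, origin, dest):
    """Dijkstra that keeps every equally short predecessor; returns (cost, routes)."""
    sources = [(origin, line) for line in lines_at[origin]]
    dist = {s: 0 for s in sources}
    preds = defaultdict(list)
    heap = [(0, s) for s in sources]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in graph[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v], preds[v] = nd, [u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                preds[v].append(u)  # a tie: keep both ways of reaching v

    def paths(node):
        if not preds[node]:  # only origin nodes have no predecessor
            return [[node]]
        return [p + [node] for prev in preds[node] for p in paths(prev)]

    ends = [(dest, line) for line in lines_at[dest] if (dest, line) in dist]
    best = min(dist[e] for e in ends)
    routes = [to_legs(p) for e in ends if dist[e] == best for p in paths(e)]
    return best, sorted(routes)


graph, lines_at = build_graph(LINES, TRANSFER_PENALTY)
cost, routes = shortest_routes(graph, lines_at, "L'Enfant Plaza", "Metro Center")
weight = 1 / len(routes)  # inverse of the number of tied shortest routes
for legs in routes:
    print(cost, legs, weight)
# Expect cost 3 and two tied routes (Blue and Orange), each weighted 0.5.
```

Running the search once per O-D pair and storing each route with its weight produces the kind of route table the next step needs.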

The resulting Origin-Destination route spreadsheet can be combined with WMATA's Origin-Destination data to calculate the number of people on each line at each station, and how many of the people boarding at each stop get off at each other stop in the system. This data is aggregated in the spreadsheet posted on the data page.
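The aggregation step can be sketched as follows: each O-D trip count is multiplied by each route's weight and added to every segment and leg that route rides. The route table and trip counts below are made-up placeholders, not WMATA data, and the function and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical route table: O-D pair -> [(legs, weight)], where each leg is
# (line, stations ridden in order). Weights are 1 / (number of tied routes).
ROUTES = {
    ("L'Enfant Plaza", "Metro Center"): [
        ([("OR", ["L'Enfant Plaza", "Smithsonian", "Federal Triangle", "Metro Center"])], 0.5),
        ([("BL", ["L'Enfant Plaza", "Smithsonian", "Federal Triangle", "Metro Center"])], 0.5),
    ],
    ("Archives", "Metro Center"): [
        ([("GR", ["Archives", "Gallery Place"]),
          ("RD", ["Gallery Place", "Metro Center"])], 1.0),
    ],
}

# Made-up trip counts standing in for WMATA's O-D data.
OD_TRIPS = {
    ("L'Enfant Plaza", "Metro Center"): 1000,
    ("Archives", "Metro Center"): 200,
}


def aggregate(routes, od_trips):
    """Return (segment_loads, leg_flows) as expected rider counts.

    segment_loads[(line, a, b)]: riders between adjacent stations a -> b.
    leg_flows[(line, board, alight)]: riders boarding a line at one stop
    and leaving it at another.
    """
    segment_loads = defaultdict(float)
    leg_flows = defaultdict(float)
    for od, trips in od_trips.items():
        for legs, weight in routes.get(od, []):
            riders = trips * weight
            for line, stations in legs:
                leg_flows[(line, stations[0], stations[-1])] += riders
                for a, b in zip(stations, stations[1:]):
                    segment_loads[(line, a, b)] += riders
    return dict(segment_loads), dict(leg_flows)


loads, flows = aggregate(ROUTES, OD_TRIPS)
print(loads[("OR", "Smithsonian", "Federal Triangle")])  # 500.0
print(flows[("RD", "Gallery Place", "Metro Center")])    # 200.0
```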

Any questions? Feel free to contact me on Twitter at @GrahamIMac.