How to store public transport data

Question

How to store public transport data

I'm currently trying to implement my own path search path for finding tram / bus connections, etc. with given schedules. All data is generated by me (just adding stop coordinates from google map). Thanks to this, I can freely choose my own way of storing data and processing it. The entire transport network is represented by a balanced schedule. So, the question arises: how to store public transport data in a standard SQL database so that some selected algorithms can be easily processed? How easy is it to convert it later to an extended time graph form so that a simple Dijkstra algorithm is enough?

+3

algorithm database graph path-finding

Niemand Mar 19 '15 at 11:41

source share

1 answer

dube · Accepted Answer · 2015-03-19T12:42:14+0000

Since I wrote a bachelor’s diploma on “how far you can go in X minutes using only public transport,” I can tell you about your problem.

Problem

First of all, forget about traditional algorithms. Go for those who know the time. What works for regular road networks fails completely on time-limited schedules. Time awareness is one of the problems, but there are many more that you will never think of as an ordinary passenger.

some trains are guaranteed to wait for other trains.
you have minimal downtime when switching trains (from train a to b)
downtime depends on the station (large or small stations) and the platform (switching from platform 1 to 2 is faster than from 1 to 20).
train schedules vary depending on the date, your data table contains many entries that are not relevant to the date you selected.

Decision

Judging by your nickname, I assume that you can read German. You can read my analysis of the algorithms and the settings of my database in the test . The database model and page 50 show the inmemory model. Also take a look at the links on pages 55-57 (they are mostly English).

Even time-aware-dijkstra is very slow. A good RAPTOR algorithm (a scientific description with an example can be found here ). RAPTOR uses schedule templates instead of discouraging this.

How RAPTOR works: Schedules mainly consist of stations (location), trips (one run of one train), stops (combo rides, time, place). The predator's trick is to organize all trips to routes. The route can only contain attractions that have the same sequence of stops and do not overtake each other . Thus, you can make the first trip on a route that matches your time and omit checking all other trips on the same route (because they are guaranteed to appear later). The number of routes is much less (a factor of 1000 in my data) than the number of trips. In addition, the RAPTOR algorithm works in “hopping cycles”, which allows it to check one station exactly once (dijkstra cannot) and follow

This happens as follows:

reachableStations = (startStation,timeX) for i=0; i < maxHops; i++ { if( reachableStations contains targetStation) finished! usableRoutes = collect all routes leaving from reachableStations reachableStations = follow all usableRoutes to the end and collect stations for the next cycle. keep station-time combos for each find. }

.

For my application, I used a modified version of RAPTOR, which I called REAS. It is optimized for receiving data on a complete network (instead of searching for a single route). You should just stick with RAPTOR. One of the key knowledge about algorithms in public transport is that the problem is much more complicated than it seems.

My application can be shown here . It uses data from HAFAS of the Swiss railway company (SBB and ZVV), but currently the data refer only to 2014. The calculation itself is fast, so it takes a long time to create a graphic overlay.

If you have more questions, feel free to contact me directly (contact information is available at ttm.ti8m.ch)

How to store public transport data

Problem

Decision

More articles: