Internal model GTFS'
The internal model used, GTFS', is close to GTFS but simplified / normalized / expanded for ease of use.
A calendar is a simple list of calendar dates. Date ranges, days of the week, and positive/negative exceptions no longer exist.
We also force a calendar to exist if it is defined in calendars OR calendar_dates (in the GTFS model the calendars table is optional; a calendar can be defined in calendar_dates only).

This greatly simplifies the following queries:
- List of calendars active on a given day or a set of days (SELECT ... WHERE calendar_dates.date = ?)
- List of calendars active before/after a given date or within an interval (SELECT ... WHERE calendar_dates.date <= ?)
- Computing the number of days a calendar is active (SELECT calendar.id, COUNT(calendar_dates.*) ...)
- Computing the union of days on which a set of calendars is active (SELECT DISTINCT calendar_dates.date ...)
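The flattening of a GTFS calendar into a plain list of dates can be sketched as follows. This is a minimal illustration, not the library's actual code: the function name expand_calendar and its parameters are hypothetical, and it assumes the weekly pattern is passed as a set of weekday indexes (Monday = 0).

```python
from datetime import date, timedelta


def expand_calendar(start, end, weekdays, added=(), removed=()):
    """Flatten a GTFS calendar into a sorted list of active dates.

    weekdays: set of active weekday indexes, Monday=0 ... Sunday=6.
    added / removed: calendar_dates exceptions (exception_type 1 / 2).
    All names here are hypothetical, chosen for illustration only.
    """
    active = set()
    day = start
    while day <= end:
        if day.weekday() in weekdays:
            active.add(day)
        day += timedelta(days=1)
    active |= set(added)      # positive exceptions add service
    active -= set(removed)    # negative exceptions remove service
    return sorted(active)


# One week (Monday 2024-01-01 to Sunday 2024-01-07), weekdays only,
# with Saturday added and Monday removed.
dates = expand_calendar(
    date(2024, 1, 1), date(2024, 1, 7),
    weekdays={0, 1, 2, 3, 4},
    added=[date(2024, 1, 6)],
    removed=[date(2024, 1, 1)],
)
print(len(dates))  # 5 active days: Jan 2 to Jan 6
```

Once the dates are stored this way, every query in the list above reduces to a filter, count, or DISTINCT on a single column.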
A few fields have been renamed:
- stop.parent_station has been renamed to stop.parent_station_id, for consistency with other objects, and because it conflicts with the parent_station field, which now refers to the linked parent station object.
All other fields and class/table names are identical to GTFS.
All optional fields with a default value are initialized to that default if not defined. The list is below:
- stop.location_type
- stop.wheelchair_boarding
- trip.wheelchair_accessible
- trip.bikes_allowed
- agency.agency_id (in case a single agency exists)
- route.agency (in case a single agency exists)
This allows for simpler queries, since the caller does not have to check for missing values.
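As a rough sketch of this normalization step, the snippet below fills missing values for the four enumerated fields. The DEFAULTS table and the apply_defaults helper are hypothetical names; the value 0 is assumed as the default because GTFS treats an empty value and 0 identically for these fields.

```python
# Hypothetical defaults table; 0 is the GTFS "empty" value for these fields.
DEFAULTS = {
    "stop": {"location_type": 0, "wheelchair_boarding": 0},
    "trip": {"wheelchair_accessible": 0, "bikes_allowed": 0},
}


def apply_defaults(kind, row):
    """Return a copy of row with missing or empty optional fields filled in."""
    filled = dict(row)
    for field, default in DEFAULTS[kind].items():
        if filled.get(field) in (None, ""):
            filled[field] = default
    return filled


stop = apply_defaults("stop", {"stop_id": "S1", "wheelchair_boarding": 1})
print(stop["location_type"], stop["wheelchair_boarding"])  # 0 1
```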
All missing stop times are interpolated based on the distance between stops. Interpolated stop times have the field interpolated set to True.

This allows for simpler processing of trip times, since a stop time always has its times set (except the first arrival and the last departure). For example, to query for all departures in a given hour range: ... WHERE stop_times.departure_time >= ? AND stop_times.departure_time <= ?.
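Distance-based interpolation can be illustrated with the sketch below. It is a simplification under stated assumptions: each stop time carries a single time value in seconds since midnight (the real model has separate arrival and departure times), distances are in meters, and the first and last stop times of the trip are known. The function name and dictionary keys are hypothetical.

```python
def interpolate_times(stop_times):
    """Fill missing times linearly along the travelled distance.

    stop_times: list of dicts ordered by stop sequence, each with
    'time' (seconds since midnight, or None) and 'dist' (meters).
    Assumes the first and last entries have a known time.
    """
    result = [dict(st, interpolated=False) for st in stop_times]
    known = [i for i, st in enumerate(result) if st["time"] is not None]
    for a, b in zip(known, known[1:]):
        t0, t1 = result[a]["time"], result[b]["time"]
        d0, d1 = result[a]["dist"], result[b]["dist"]
        for i in range(a + 1, b):
            # Guard against zero-length segments (stops at the same place).
            ratio = (result[i]["dist"] - d0) / (d1 - d0) if d1 > d0 else 0.0
            result[i]["time"] = t0 + ratio * (t1 - t0)
            result[i]["interpolated"] = True
    return result


trip = interpolate_times([
    {"time": 28800, "dist": 0.0},     # 8:00:00
    {"time": None, "dist": 1000.0},
    {"time": None, "dist": 3000.0},
    {"time": 29400, "dist": 4000.0},  # 8:10:00
])
print([st["time"] for st in trip])
```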
The first arrival time and last departure time of each trip are set to NULL (None).
This allows simpler queries. For example, listing all departures from a stop only requires selecting non-null departure times (... WHERE stop_times.departure_time IS NOT NULL), which ensures that the last stop time of each trip is not included in the result. The same applies to all arrivals at a stop, where the first stop time of each trip should not be included.
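The effect of the NULL convention can be tried out with an in-memory SQLite database. The schema below is a reduced, hypothetical version of the stop_times table, and times are assumed to be stored as integer seconds since midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stop_times (trip_id TEXT, stop_id TEXT, "
    "stop_sequence INTEGER, arrival_time INTEGER, departure_time INTEGER)"
)
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?, ?, ?)",
    [
        ("t1", "A", 0, None, 28800),   # first stop: no arrival
        ("t1", "B", 1, 29100, 29160),
        ("t1", "C", 2, 29400, None),   # last stop: no departure
        ("t2", "C", 0, None, 30000),
        ("t2", "B", 1, 30300, None),
    ],
)


def departures_from(stop_id):
    """All real departures from a stop; trip terminations are skipped."""
    cur = conn.execute(
        "SELECT trip_id, departure_time FROM stop_times "
        "WHERE stop_id = ? AND departure_time IS NOT NULL "
        "ORDER BY departure_time",
        (stop_id,),
    )
    return cur.fetchall()


print(departures_from("C"))  # only t2 departs from C; t1 ends there
```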
All missing travelled distances (stop_times.shape_dist_traveled, shape.shape_dist_traveled) are computed, and all distances (including existing ones) are converted to meters. If no shapes are available, the distance is simply the straight-line distance between stops.

This allows for simpler queries based on distance (... WHERE stb.shape_dist_traveled - sta.shape_dist_traveled > ?) or speed (... WHERE (stb.shape_dist_traveled - sta.shape_dist_traveled) / (stb.departure_time - sta.departure_time) > ?).
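When no shape is available, the cumulative distance along a trip could be derived from stop coordinates roughly as shown below. This sketch assumes the haversine great-circle formula on a spherical Earth of radius 6,371 km as the "straight-line" distance; the source does not state which formula is actually used, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cumulative_distances(stops):
    """shape_dist_traveled-like values (meters) for ordered (lat, lon) stops."""
    dists = [0.0]
    for (lat1, lon1), (lat2, lon2) in zip(stops, stops[1:]):
        dists.append(dists[-1] + haversine_m(lat1, lon1, lat2, lon2))
    return dists


# Three stops one degree of latitude apart (roughly 111 km each).
d = cumulative_distances([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)])
print([round(x) for x in d])
```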
All stop_times.stop_sequence values are renumbered from 0 using a consecutive index (0, 1, 2, 3...). The number of stop times for a trip is always equal to the last stop sequence + 1.
This allows for simpler queries for hops. For example, all hops between two consecutive stops (... WHERE stb.stop_sequence = sta.stop_sequence + 1) or the number of stops between two stop_times (... stb.stop_sequence - sta.stop_sequence). Another example is selecting the hop count of a trip (SELECT trip.trip_id, MAX(stop_times.stop_sequence)+1 ...), although the latter can also be done using a simple SQL COUNT().
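The renumbering itself is straightforward; a minimal sketch, assuming stop times are plain dictionaries with trip_id and stop_sequence keys (a hypothetical in-memory representation, not the library's objects):

```python
from collections import defaultdict


def renumber_stop_sequences(stop_times):
    """Return copies of the stop times with stop_sequence = 0, 1, 2... per trip.

    The relative order given by the original stop_sequence is preserved.
    """
    by_trip = defaultdict(list)
    for st in stop_times:
        by_trip[st["trip_id"]].append(st)
    result = []
    for sts in by_trip.values():
        ordered = sorted(sts, key=lambda s: s["stop_sequence"])
        for index, st in enumerate(ordered):
            result.append(dict(st, stop_sequence=index))
    return result


renumbered = renumber_stop_sequences([
    {"trip_id": "t1", "stop_id": "C", "stop_sequence": 30},
    {"trip_id": "t1", "stop_id": "A", "stop_sequence": 5},
    {"trip_id": "t1", "stop_id": "B", "stop_sequence": 10},
])
print([(st["stop_id"], st["stop_sequence"]) for st in renumbered])
```

GTFS only requires stop_sequence to increase along the trip, so gaps such as 5, 10, 30 are valid input; after renumbering, consecutive stops always differ by exactly 1.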
All frequencies are expanded to normal trips, which are flagged as such with the boolean frequency_generated. The exact_times flag is carried over to the trip ("standard" trips have exact_times=1, that is, exactly scheduled). Both the original frequencies and the original trips are deleted.
The ID for frequency-expanded trips is constructed by appending the trip departure time to the original trip ID, such as trip42@8:30:00, trip42@8:40:00, etc. This assumes frequency-expanded trips do not overlap for the same original trip. (Note: this should be true according to the GTFS specification, but may be wrong if two frequency rows associated with the same trip overlap.)
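The ID scheme can be sketched as follows. This is an illustration under assumptions: times are integer seconds since midnight, end_time is treated as exclusive, and the function names and returned dictionary keys (other than frequency_generated and exact_times, which come from the text above) are hypothetical.

```python
def format_time(seconds):
    """Format seconds since midnight as H:MM:SS (hours not zero-padded)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def expand_frequency(trip_id, start_time, end_time, headway_secs, exact_times=0):
    """Generate one trip per departure in [start_time, end_time).

    Treating end_time as exclusive is an assumption of this sketch.
    """
    trips = []
    departure = start_time
    while departure < end_time:
        trips.append({
            "trip_id": f"{trip_id}@{format_time(departure)}",
            "departure": departure,
            "frequency_generated": True,
            "exact_times": exact_times,
        })
        departure += headway_secs
    return trips


# Every 10 minutes from 8:30 to 9:00.
generated = expand_frequency("trip42", 8 * 3600 + 1800, 9 * 3600, 600)
print([t["trip_id"] for t in generated])
# ['trip42@8:30:00', 'trip42@8:40:00', 'trip42@8:50:00']
```

If two frequency rows for the same original trip overlapped and produced the same departure time, this scheme would generate duplicate IDs, which is the caveat mentioned above.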
