GTFSTK 6.0 Documentation¶
GTFSTK is a Python 3.5 tool kit for processing General Transit Feed Specification (GTFS) data in memory without a database. It is mostly for computing statistics, such as daily service distance per route and daily number of trips per stop. It uses Pandas and Shapely to do the heavy lifting.
Installation¶
Create a Python 3.5 virtual environment and pip install gtfstk.
Examples¶
You can play with ipynb/examples.ipynb in a Jupyter notebook.
Conventions¶
In conformance with GTFS and unless specified otherwise, dates are encoded as date strings of the form YYYYMMDD and times are encoded as time strings of the form HH:MM:SS with the possibility that the hour is greater than 23. Unless specified otherwise, ‘data frame’ and ‘series’ refer to Pandas data frames and series, respectively.
constants Module¶
-
gtfstk.constants.
CRS_WGS84
= {'ellps': 'WGS84', 'no_defs': True, 'proj': 'longlat', 'datum': 'WGS84'}¶
-
gtfstk.constants.
DIST_UNITS
= ['ft', 'mi', 'm', 'km']¶
-
gtfstk.constants.
DTYPE
= {'zone_id': <class 'str'>, 'route_short_name': <class 'str'>, 'date': <class 'str'>, 'trip_id': <class 'str'>, 'shape_id': <class 'str'>, 'parent_station': <class 'str'>, 'to_stop_id': <class 'str'>, 'route_id': <class 'str'>, 'service_id': <class 'str'>, 'agency_id': <class 'str'>, 'origin_id': <class 'str'>, 'end_date': <class 'str'>, 'contains_id': <class 'str'>, 'destination_id': <class 'str'>, 'start_date': <class 'str'>, 'stop_code': <class 'str'>, 'from_stop_id': <class 'str'>, 'fare_id': <class 'str'>, 'stop_id': <class 'str'>}¶
-
gtfstk.constants.
FEED_ATTRS_PRIVATE
= ['_trips_i', '_calendar_i', '_calendar_dates_g']¶
-
gtfstk.constants.
FEED_ATTRS_PUBLIC
= ['agency', 'stops', 'routes', 'trips', 'stop_times', 'calendar', 'calendar_dates', 'fare_attributes', 'fare_rules', 'shapes', 'frequencies', 'transfers', 'feed_info', 'dist_units']¶
-
gtfstk.constants.
GTFS_TABLES_OPTIONAL
= ['calendar_dates', 'fare_attributes', 'fare_rules', 'shapes', 'frequencies', 'transfers', 'feed_info']¶
-
gtfstk.constants.
GTFS_TABLES_REQUIRED
= ['agency', 'stops', 'routes', 'trips', 'stop_times', 'calendar']¶
-
gtfstk.constants.
INT_COLUMNS
= ['location_type', 'wheelchair_boarding', 'route_type', 'direction_id', 'stop_sequence', 'wheelchair_accessible', 'bikes_allowed', 'pickup_type', 'drop_off_type', 'timepoint', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday', 'exception_type', 'payment_method', 'transfers', 'shape_pt_sequence', 'exact_times', 'transfer_type', 'transfer_duration', 'min_transfer_time']¶
utilities Module¶
-
gtfstk.utilities.
almost_equal
(f, g)¶ Return
True
if and only if the given data frames are equal after sorting their column names, sorting their values, and resetting their indices.
-
gtfstk.utilities.
datestr_to_date
(x, format_str='%Y%m%d', inverse=False)¶ Given a string object
x
representing a date in the given format, convert it to a datetime.date object and return the result. If inverse
, then assume that x
is a date object and return its corresponding string in the given format.
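The conversion described above can be illustrated with a minimal standard-library sketch. The name datestr_to_date_sketch is hypothetical and the code is not taken from the GTFSTK source; it only assumes the behaviour documented here.

```python
from datetime import date, datetime


def datestr_to_date_sketch(x, format_str="%Y%m%d", inverse=False):
    """Hypothetical stand-in for the documented behaviour, not the library code."""
    if inverse:
        # x is assumed to be a datetime.date; format it as a string
        return x.strftime(format_str)
    # x is assumed to be a string in the given format; parse it into a date
    return datetime.strptime(x, format_str).date()


d = datestr_to_date_sketch("20160229")
s = datestr_to_date_sketch(d, inverse=True)
```

Applying the function and then its inverse should return the original string, which is a convenient sanity check when a feed uses a nonstandard date format.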
-
gtfstk.utilities.
get_convert_dist
(dist_units_in, dist_units_out)¶ Return a function of the form
distance in the units dist_units_in
-> distance in the units dist_units_out
Only supports distance units in
DIST_UNITS
.
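One way to build such a converter is to route every unit through metres. The sketch below is an illustration under that assumption, with the hypothetical names METRES_PER_UNIT and get_convert_dist_sketch; it may differ from how GTFSTK does it internally.

```python
DIST_UNITS = ["ft", "mi", "m", "km"]

# Length of one unit in metres (international foot and mile)
METRES_PER_UNIT = {"ft": 0.3048, "mi": 1609.344, "m": 1.0, "km": 1000.0}


def get_convert_dist_sketch(dist_units_in, dist_units_out):
    """Return a function mapping a distance in dist_units_in to dist_units_out."""
    for units in (dist_units_in, dist_units_out):
        if units not in DIST_UNITS:
            raise ValueError("Distance units must lie in {}".format(DIST_UNITS))
    factor = METRES_PER_UNIT[dist_units_in] / METRES_PER_UNIT[dist_units_out]
    return lambda x: x * factor


km_to_m = get_convert_dist_sketch("km", "m")
metres = km_to_m(1.5)
```

Returning a function rather than a number lets the same converter be applied to a whole column of distances.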
-
gtfstk.utilities.
get_max_runs
(x)¶ Given a list of numbers, return a NumPy array of pairs (start index, end index + 1) of the runs of max value.
EXAMPLES:
>>> get_max_runs([7, 1, 2, 7, 7, 1, 2])
array([[0, 1], [3, 5]])
Assume x is not empty. Recipe from here
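For readers without NumPy at hand, the same idea can be sketched in plain Python. The function get_max_runs_sketch is hypothetical and returns a list of lists instead of a NumPy array; it is not the recipe the library uses.

```python
def get_max_runs_sketch(x):
    """Return [start, end + 1] index pairs for each run of the maximum value of x."""
    m = max(x)
    runs = []
    start = None
    for i, value in enumerate(x):
        if value == m and start is None:
            start = i  # a run of the maximum begins here
        elif value != m and start is not None:
            runs.append([start, i])  # the run ended just before index i
            start = None
    if start is not None:
        runs.append([start, len(x)])  # the last run reaches the end of the list
    return runs


runs = get_max_runs_sketch([7, 1, 2, 7, 7, 1, 2])
```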
-
gtfstk.utilities.
get_peak_indices
(times, counts)¶ Given an increasing list of times as seconds past midnight and a list of trip counts at those times, return a pair of indices i, j such that times[i] to times[j] is the first longest time period such that for all i <= x < j, counts[x] is the max of counts. Assume times and counts have the same nonzero length.
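A minimal plain-Python sketch of this selection follows. The name get_peak_indices_sketch is hypothetical, and clamping the end index to the last available time is an assumption made here so that a run reaching the end of the list still has a duration.

```python
def get_peak_indices_sketch(times, counts):
    """Return (i, j) bounding the first longest run of the maximum count."""
    peak = max(counts)
    best, best_duration = None, -1
    start = None
    # A trailing sentinel closes a run that reaches the end of the list
    for i, c in enumerate(list(counts) + [None]):
        if c == peak and start is None:
            start = i  # a run of the maximum begins here
        elif c != peak and start is not None:
            # Assumption: clamp the end index so that times[end] always exists
            end = min(i, len(times) - 1)
            duration = times[end] - times[start]
            if duration > best_duration:  # strict, so the first longest run wins
                best, best_duration = (start, end), duration
            start = None
    return best


peak = get_peak_indices_sketch([0, 60, 120, 180, 240, 300], [1, 3, 3, 1, 3, 2])
```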
-
gtfstk.utilities.
get_segment_length
(linestring, p, q=None)¶ Given a Shapely linestring and two Shapely points, project the points onto the linestring, and return the distance along the linestring between the two points. If
q is None
, then return the distance from the start of the linestring to the projection of p
. The distance is measured in the native coordinates of the linestring.
-
gtfstk.utilities.
get_utm_crs
(lat, lon)¶ Return a GeoPandas coordinate reference system (CRS) dictionary corresponding to the UTM projection appropriate to the given WGS84 latitude and longitude.
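The usual rule for choosing a UTM zone from a longitude can be sketched as below. This is an illustration only: the name get_utm_crs_sketch and the exact dictionary keys are assumptions modelled on PROJ-style parameters, and the sketch ignores the special zones around Norway and Svalbard.

```python
import math


def get_utm_crs_sketch(lat, lon):
    """Return a PROJ-style dictionary for the UTM zone containing (lat, lon)."""
    # Zones are 6 degrees wide and numbered 1 to 60 eastwards from 180 degrees west
    zone = min(int(math.floor((lon + 180) / 6)) + 1, 60)
    return {
        "proj": "utm",
        "zone": zone,
        "south": lat < 0,  # southern hemisphere uses a false northing
        "ellps": "WGS84",
        "datum": "WGS84",
        "units": "m",
        "no_defs": True,
    }


portland = get_utm_crs_sketch(45.52, -122.68)
auckland = get_utm_crs_sketch(-36.85, 174.76)
```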
-
gtfstk.utilities.
is_not_null
(data_frame, column_name)¶ Return
True
if the given data frame has a column of the given name (string), and there exists at least one non-NaN value in that column; return False
otherwise.
-
gtfstk.utilities.
linestring_to_utm
(linestring)¶ Given a Shapely LineString in WGS84 coordinates, convert it to the appropriate UTM coordinates. If
inverse
, then do the inverse.
-
gtfstk.utilities.
time_it
(f)¶
-
gtfstk.utilities.
timestr_mod24
(timestr)¶ Given a GTFS time string in the format %H:%M:%S, return a timestring in the same format but with the hours taken modulo 24.
-
gtfstk.utilities.
timestr_to_seconds
(x, inverse=False, mod24=False)¶ Given a time string of the form ‘%H:%M:%S’, return the number of seconds past midnight that it represents. In keeping with GTFS standards, the hours entry may be greater than 23. If
mod24
, then return the number of seconds modulo 24*3600
. If inverse
, then do the inverse operation. In this case, if mod24
also, then first take the number of seconds modulo 24*3600
.
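The arithmetic behind this conversion is short enough to sketch in plain Python. The name timestr_to_seconds_sketch is hypothetical; the code follows the description above rather than the library source.

```python
def timestr_to_seconds_sketch(x, inverse=False, mod24=False):
    """Convert 'HH:MM:SS' to seconds past midnight, or back if inverse."""
    day = 24 * 3600
    if not inverse:
        hours, minutes, seconds = (int(part) for part in x.split(":"))
        total = hours * 3600 + minutes * 60 + seconds
        return total % day if mod24 else total
    total = int(x)
    if mod24:
        total %= day  # wrap times after midnight back into one day
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)


late = timestr_to_seconds_sketch("25:30:00")
wrapped = timestr_to_seconds_sketch("25:30:00", mod24=True)
```

Keeping hours above 23 by default preserves the GTFS convention that a trip starting late on one service day can run past midnight.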
-
gtfstk.utilities.
weekday_to_str
(weekday, inverse=False)¶ Given a weekday, that is, an integer in
range(7)
, return its corresponding weekday name as a lowercase string. Here 0 -> ‘monday’, 1 -> ‘tuesday’, and so on. If inverse
, then perform the inverse operation.
feed Module¶
This module defines the Feed class, which represents a GTFS feed as a collection of data frames, and defines some basic operations on Feed objects.
Almost all other operations on Feed objects are defined as functions living outside of the Feed class rather than methods of the Feed class.
Every function that acts on a Feed object assumes that every attribute of the feed that represents a GTFS file, such as agency
or stops
, is either None
or a data frame with the columns required in the GTFS.
-
class
gtfstk.feed.
Feed
(dist_units, agency=None, stops=None, routes=None, trips=None, stop_times=None, calendar=None, calendar_dates=None, fare_attributes=None, fare_rules=None, shapes=None, frequencies=None, transfers=None, feed_info=None)¶ Bases:
object
A class that represents a GTFS feed, where GTFS tables are stored as data frames. Beware, the stop times data frame can be big (several gigabytes), so make sure you have enough memory to handle it. Feed (public) attributes are
dist_units
: a string in constants.DIST_UNITS
; specifies the distance units to use when calculating various stats, such as route service distance; should match the implicit distance units of the shape_dist_traveled
column values, if present
agency
stops
routes
trips
stop_times
calendar
calendar_dates
fare_attributes
fare_rules
shapes
frequencies
transfers
feed_info
There are also a few private Feed attributes that are derived from some public attributes and are automatically updated when those public attributes change. However, for this update to work, you must properly update the primary attributes like this:
feed.trips['route_short_name'] = 'bingo'
feed.trips = feed.trips
and not like this:
feed.trips['route_short_name'] = 'bingo'
The first way ensures that the altered trips data frame is saved as the new
trips
attribute, but the second way does not.
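The reason reassignment matters can be seen in a small stand-alone sketch of the property pattern. MiniFeed is a hypothetical class that uses a list of dictionaries in place of a data frame, and its derived index only loosely imitates the private Feed attributes; it is not the Feed implementation.

```python
class MiniFeed:
    """Hypothetical stand-in for Feed, using a list of dicts instead of a data frame."""

    def __init__(self, trips):
        self.trips = trips  # goes through the property setter below

    @property
    def trips(self):
        return self._trips

    @trips.setter
    def trips(self, value):
        self._trips = value
        # Derived private attribute: trip_id -> row copy, rebuilt on every assignment
        self._trips_i = {row["trip_id"]: dict(row) for row in value}


feed = MiniFeed([{"trip_id": "t1", "route_short_name": "51"}])

# In-place change: the setter never runs, so the derived index is stale
feed.trips[0]["route_short_name"] = "bingo"
stale = feed._trips_i["t1"]["route_short_name"]

# Reassignment: the setter runs and the derived index is rebuilt
feed.trips = feed.trips
fresh = feed._trips_i["t1"]["route_short_name"]
```

In this sketch stale still holds the old name while fresh holds the new one, which mirrors why the reassignment step is needed.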
calendar
¶ A public Feed attribute made into a property for easy auto-updating of private feed attributes based on the calendar data frame.
-
calendar_dates
¶ A public Feed attribute made into a property for easy auto-updating of private feed attributes based on the calendar dates data frame.
-
copy
()¶ Return a copy of this feed, that is, a feed with all the same public and private attributes.
-
dist_units
¶ A public Feed attribute made into a property for easy validation.
-
trips
¶ A public Feed attribute made into a property for easy auto-updating of private feed attributes based on the trips data frame.
-
gtfstk.feed.
read_gtfs
(path, dist_units=None)¶ Create a Feed object from the given path and given distance units. The path points to a directory containing GTFS text files or a zip file that unzips as a collection of GTFS text files (but not as a directory containing GTFS text files).
-
gtfstk.feed.
write_gtfs
(feed, path, ndigits=6)¶ Export the given feed to a zip archive located at
path
. Round all decimals to ndigits
decimal places. All distances will be displayed in units feed.dist_units
.
validator Module¶
This module contains functions that supplement but do not replace the feedvalidator
module of the transitfeed package.
The latter module checks if GTFS feeds adhere to the GTFS specification.
-
exception
gtfstk.validator.
GTFSError
(feed, msg)¶ Bases:
Exception
Exception raised for Feed objects that do not conform to the GTFS specification. Attributes:
- msg: explanation of the error
-
gtfstk.validator.
check_calendar
(feed)¶ Check that one of
feed.calendar
or feed.calendar_dates
is nonempty.
cleaner Module¶
This module contains functions for cleaning Feed objects.
-
gtfstk.cleaner.
aggregate_routes
(feed, by='route_short_name', route_id_prefix='route_')¶ Given a GTFSTK Feed object, group routes by the
by
column of feed.routes
and for each group,
- choose the first route in the group,
- assign a new route ID based on the given
route_id_prefix
string and a running count, e.g. 'route_013'
- assign all the trips associated with routes in the group to that first route.
Update
feed.routes
and feed.trips
with the new routes, and return the resulting feed.
-
gtfstk.cleaner.
assess
(feed)¶ Return a Pandas series containing various feed assessments, such as the number of trips missing shapes. This is not a GTFS validator.
-
gtfstk.cleaner.
clean
(feed)¶ Given a GTFSTK Feed instance, apply the following functions to it and return the resulting feed.
-
gtfstk.cleaner.
clean_ids
(feed)¶ Strip whitespace from all string IDs and then replace every remaining whitespace chunk with an underscore. Return the resulting feed.
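For a single ID value, the cleaning rule can be sketched with a regular expression. The helper clean_id_sketch is hypothetical and works on one string; applying it across every ID column of a feed is left out.

```python
import re


def clean_id_sketch(value):
    """Strip outer whitespace, then replace each inner whitespace chunk with '_'."""
    return re.sub(r"\s+", "_", value.strip())


cleaned = clean_id_sketch("  route  10 A\t")
```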
-
gtfstk.cleaner.
clean_route_short_names
(feed)¶ In
feed.routes
, assign ‘n/a’ to missing route short names and strip whitespace from route short names. Then disambiguate each route short name that is duplicated by appending ‘-‘ and its route ID. Return the resulting feed.
-
gtfstk.cleaner.
clean_stop_times
(feed)¶ In
feed.stop_times
, prefix a zero to arrival and departure times if necessary. This makes sorting by time work as expected. Return the resulting feed.
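The sorting problem that zero-padding solves is easy to reproduce with plain strings. The helper pad_timestr_sketch below is hypothetical and handles one time string at a time.

```python
def pad_timestr_sketch(timestr):
    """Prefix a zero to times like '7:05:00' so that they sort correctly as text."""
    hours, rest = timestr.split(":", 1)
    return hours.zfill(2) + ":" + rest


raw = ["10:00:00", "7:05:00"]
sorted_raw = sorted(raw)  # text sort puts '10:00:00' before '7:05:00'
sorted_padded = sorted(pad_timestr_sketch(t) for t in raw)
```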
-
gtfstk.cleaner.
drop_invalid_columns
(feed)¶ Given a GTFSTK Feed instance, drop all data frame columns not listed in
constants.VALID_COLS
. Return the resulting feed.
-
gtfstk.cleaner.
prune_dead_routes
(feed)¶ Remove all routes from
feed.routes
that do not have trips listed in feed.trips
. Return the resulting feed.
calculator Module¶
This module contains functions for calculating properties of Feed objects, such as daily service duration per route.
-
gtfstk.calculator.
append_dist_to_shapes
(feed)¶ Calculate and append the optional
shape_dist_traveled
field in feed.shapes
in terms of the distance units feed.dist_units
. Return the resulting feed.
Assume the following feed attributes are not
None
: feed.shapes
- NOTES:
- All of the calculated
shape_dist_traveled
values for the Portland feed https://transitfeeds.com/p/trimet/43/1400947517 differ by at most 0.016 km in absolute value from the original values.
-
gtfstk.calculator.
append_dist_to_stop_times
(feed, trips_stats)¶ Calculate and append the optional
shape_dist_traveled
field in feed.stop_times
in terms of the distance units feed.dist_units
. Need trip stats in the form output by compute_trip_stats()
for this. Return the resulting feed. Does not always give accurate results, as described below.
Assume the following feed attributes are not
None
: feed.stop_times
- Those used in
build_geometry_by_shape()
- Those used in
build_geometry_by_stop()
- ALGORITHM:
Compute the
shape_dist_traveled
field by using Shapely to measure the distance of a stop along its trip linestring. If for a given trip this process produces a non-monotonically increasing, hence incorrect, list of (cumulative) distances, then fall back to estimating the distances as follows.
Get the average speed of the trip via
trips_stats
and use it to linearly interpolate distances for stop times, assuming that the first stop is at shape_dist_traveled = 0 (the start of the shape) and the last stop is at shape_dist_traveled = the length of the trip (taken from trips_stats and equal to the length of the shape, unless trips_stats was computed with compute_dist_from_shapes == False
). This fallback method usually kicks in on trips with self-intersecting linestrings. Unfortunately, this fallback method will produce incorrect results when the first stop does not start at the start of its shape (so shape_dist_traveled != 0). This is the case for several trips in the Portland feed at https://transitfeeds.com/p/trimet/43/1400947517, for example.
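The fallback can be sketched for a single trip without Shapely. All names below are hypothetical, the projected distances are assumed to be given, and treating equal consecutive distances as acceptable is an assumption of this sketch.

```python
def is_increasing(dists):
    """Return True if the cumulative distances never decrease."""
    return all(a <= b for a, b in zip(dists, dists[1:]))


def interpolate_dists_sketch(times, trip_length):
    """Spread trip_length over the stop times at the trip's average speed."""
    t0, t1 = times[0], times[-1]
    if t1 == t0:
        return [0.0] * len(times)
    speed = trip_length / (t1 - t0)  # average speed over the whole trip
    return [speed * (t - t0) for t in times]


def stop_dists_sketch(projected, times, trip_length):
    """Keep the projected distances if they are monotone; otherwise interpolate."""
    if is_increasing(projected):
        return list(projected)
    return interpolate_dists_sketch(times, trip_length)


# The third projected distance goes backwards, so the fallback kicks in
dists = stop_dists_sketch([0.0, 2.5, 1.0, 4.0], [0, 300, 600, 1200], 4.0)
```

The interpolated list always starts at 0 and ends at the trip length, which is exactly the assumption that fails when the first stop is not at the start of its shape.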
-
gtfstk.calculator.
append_route_type_to_shapes
(feed)¶ Append a
route_type
column to a copy of feed.shapes
and return the resulting shapes data frame. Note that a single shape can be linked to multiple trips on multiple routes of multiple route types. In that case the route type of the shape is the route type of the last route (sorted by ID) with a trip with that shape.
Assume the following feed attributes are not
None
: feed.routes
feed.trips
feed.shapes
-
gtfstk.calculator.
build_geometry_by_shape
(feed, use_utm=False, shape_ids=None)¶ Return a dictionary with structure shape_id -> Shapely linestring of shape. If
feed.shapes is None
, then return None
. If use_utm
, then return each linestring in UTM coordinates. Otherwise, return each linestring in WGS84 longitude-latitude coordinates. If a list of shape IDs shape_ids
is given, then only include the given shape IDs.
Assume the following feed attributes are not
None
: feed.shapes
-
gtfstk.calculator.
build_geometry_by_stop
(feed, use_utm=False, stop_ids=None)¶ Return a dictionary with structure stop_id -> Shapely point object. If
use_utm
, then return each point in UTM coordinates. Otherwise, return each point in WGS84 longitude-latitude coordinates. If a list of stop IDs stop_ids
is given, then only include the given stop IDs.
Assume the following feed attributes are not
None
: feed.stops
-
gtfstk.calculator.
combine_time_series
(time_series_dict, kind, split_directions=False)¶ Given a dictionary of time series data frames, combine the time series into one time series data frame with multi-index (hierarchical) columns and return the result. The top level columns are the keys of the dictionary and the second and third level columns are ‘route_id’ and ‘direction_id’, if
kind == 'route'
, or ‘stop_id’ and ‘direction_id’, if kind == 'stop'
. If split_directions == False
, then there is no third column level, that is, no ‘direction_id’ column.
-
gtfstk.calculator.
compute_bounds
(feed)¶ Return the tuple (min longitude, min latitude, max longitude, max latitude) where the longitudes and latitudes vary across all the stop (WGS84) coordinates.
-
gtfstk.calculator.
compute_busiest_date
(feed, dates)¶ Given a list of dates, return the first date that has the maximum number of active trips. If the list of dates is empty, then raise a
ValueError
.
Assume the following feed attributes are not
None
:
- Those used in
compute_trip_activity()
-
gtfstk.calculator.
compute_center
(feed, num_busiest_stops=None)¶ Compute the convex hull of all the given feed’s stop coordinates and return the centroid. If an integer
num_busiest_stops
is given, then compute the num_busiest_stops
busiest stops in the feed on the first Monday of the feed and return the mean of the longitudes and the mean of the latitudes of these stops, respectively.
-
gtfstk.calculator.
compute_feed_stats
(feed, trips_stats, date)¶ Given
trips_stats
, which is the output of feed.compute_trip_stats()
and a date, return a data frame including the following feed stats for the date.
- num_trips: number of trips active on the given date
- num_routes: number of routes active on the given date
- num_stops: number of stops active on the given date
- peak_num_trips: maximum number of simultaneous trips in service
- peak_start_time: start time of first longest period during which the peak number of trips occurs
- peak_end_time: end time of first longest period during which the peak number of trips occurs
- service_distance: sum of the service distances for the active routes
- service_duration: sum of the service durations for the active routes
- service_speed: service_distance/service_duration
If there are no stats for the given date, return an empty data frame with the specified columns.
Assume the following feed attributes are not
None
:
- Those used in
get_trips()
- Those used in
get_routes()
- Those used in
get_stops()
-
gtfstk.calculator.
compute_feed_time_series
(feed, trips_stats, date, freq='5Min')¶ Given trips stats (output of
feed.compute_trip_stats()
), a date, and a Pandas frequency string, return a time series of stats for this feed on the given date at the given frequency with the following columns:
- num_trip_starts: number of trips starting at this time
- num_trips: number of trips in service during this time period
- service_distance: distance traveled by all active trips during this time period
- service_duration: duration traveled by all active trips during this time period
- service_speed: service_distance/service_duration
If there is no time series for the given date, return an empty data frame with specified columns.
Assume the following feed attributes are not
None
:
- Those used in
compute_route_time_series()
-
gtfstk.calculator.
compute_route_stats
(feed, trips_stats, date, split_directions=False, headway_start_time='07:00:00', headway_end_time='19:00:00')¶ Take
trips_stats
, which is the output of compute_trip_stats()
, cut it down to the subset S
of trips that are active on the given date, and then call compute_route_stats_base()
with S
and the keyword arguments split_directions
, headway_start_time
, and headway_end_time
.
See
compute_route_stats_base()
for a description of the output.
Assume the following feed attributes are not
None
:
- Those used in
compute_route_stats_base()
- NOTES:
- This is a more user-friendly version of
compute_route_stats_base()
. The latter function works without a feed, though.
- Return
None
if the date does not lie in this feed’s date range.
-
gtfstk.calculator.
compute_route_stats_base
(trips_stats_subset, split_directions=False, headway_start_time='07:00:00', headway_end_time='19:00:00')¶ Given a subset of the output of
Feed.compute_trip_stats()
, calculate stats for the routes in that subset.
Return a data frame with the following columns:
- route_id
- route_short_name
- route_type
- direction_id
- num_trips: number of trips
- is_loop: 1 if at least one of the trips on the route has its
is_loop
field equal to 1; 0 otherwise
- is_bidirectional: 1 if the route has trips in both directions; 0 otherwise
- start_time: start time of the earliest trip on the route
- end_time: end time of latest trip on the route
- max_headway: maximum of the durations (in minutes) between trip starts on the route between
headway_start_time
and headway_end_time
on the given dates
- min_headway: minimum of the durations (in minutes) mentioned above
- mean_headway: mean of the durations (in minutes) mentioned above
- peak_num_trips: maximum number of simultaneous trips in service (for the given direction, or for both directions when
split_directions==False
)
- peak_start_time: start time of first longest period during which the peak number of trips occurs
- peak_end_time: end time of first longest period during which the peak number of trips occurs
- service_duration: total of the duration of each trip on the route in the given subset of trips; measured in hours
- service_distance: total of the distance traveled by each trip on the route in the given subset of trips; measured in wunits, that is, whatever distance units are present in trips_stats_subset; contains all
np.nan
entries if feed.shapes is None
- service_speed: service_distance/service_duration; measured in wunits per hour
- mean_trip_distance: service_distance/num_trips
- mean_trip_duration: service_duration/num_trips
If
split_directions == False
, then remove the direction_id column and compute each route’s stats, except for headways, using its trips running in both directions. In this case, (1) compute max headway by taking the max of the max headways in both directions; (2) compute mean headway by taking the weighted mean of the mean headways in both directions.
If
trips_stats_subset
is empty, return an empty data frame with the columns specified above.
Assume the following feed attributes are not
None
: none.
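The headway columns can be illustrated for a single route and direction with plain Python. The names to_seconds and headway_stats_sketch are hypothetical, and treating both ends of the headway window as inclusive is an assumption of this sketch.

```python
def to_seconds(timestr):
    """Convert 'HH:MM:SS' to seconds past midnight."""
    hours, minutes, seconds = (int(part) for part in timestr.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headway_stats_sketch(start_times, headway_start_time="07:00:00",
                         headway_end_time="19:00:00"):
    """Return max, min, and mean headway in minutes, or None if undefined."""
    lo, hi = to_seconds(headway_start_time), to_seconds(headway_end_time)
    # Assumption: trip starts exactly on the window boundaries are included
    starts = sorted(t for t in map(to_seconds, start_times) if lo <= t <= hi)
    gaps = [(b - a) / 60 for a, b in zip(starts, starts[1:])]
    if not gaps:
        return None  # fewer than two trip starts inside the window
    return {
        "max_headway": max(gaps),
        "min_headway": min(gaps),
        "mean_headway": sum(gaps) / len(gaps),
    }


stats = headway_stats_sketch(
    ["06:30:00", "07:00:00", "07:10:00", "07:30:00", "20:00:00"])
```

Only the three trip starts inside the default window contribute, giving headways of 10 and 20 minutes.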
-
gtfstk.calculator.
compute_route_time_series
(feed, trips_stats, date, split_directions=False, freq='5Min')¶ Take
trips_stats
, which is the output of compute_trip_stats()
, cut it down to the subset S
of trips that are active on the given date, and then call compute_route_time_series_base()
with S
and the given keyword arguments split_directions
and freq
and with date_label = ut.date_to_str(date)
.
See
compute_route_time_series_base()
for a description of the output.
If there are no active trips on the date, then return
None
.
Assume the following feed attributes are not
None
:
- Those used in
get_trips()
- NOTES:
- This is a more user-friendly version of
compute_route_time_series_base()
. The latter function works without a feed, though.
-
gtfstk.calculator.
compute_route_time_series_base
(trips_stats_subset, split_directions=False, freq='5Min', date_label='20010101')¶ Given a subset of the output of
Feed.compute_trip_stats()
, calculate time series for the routes in that subset.
Return a time series version of the following route stats:
- number of trips in service by route ID
- number of trip starts by route ID
- service duration in hours by route ID
- service distance in kilometers by route ID
- service speed in kilometers per hour
The time series is a data frame with a timestamp index for a 24-hour period sampled at the given frequency. The maximum allowable frequency is 1 minute.
date_label
is used as the date for the timestamp index.
The columns of the data frame are hierarchical (multi-index) with
- top level: name = ‘indicator’, values = [‘service_distance’, ‘service_duration’, ‘num_trip_starts’, ‘num_trips’, ‘service_speed’]
- middle level: name = ‘route_id’, values = the active routes
- bottom level: name = ‘direction_id’, values = 0s and 1s
If
split_directions == False
, then don’t include the bottom level.
If
trips_stats_subset
is empty, then return an empty data frame with the indicator columns.
- NOTES:
- To resample the resulting time series use the following methods:
- for ‘num_trips’ series, use
how=np.mean
- for the other series, use
how=np.sum
- ‘service_speed’ can’t be resampled and must be recalculated from ‘service_distance’ and ‘service_duration’
To remove the date and seconds from the time series f, do
f.index = [t.time().strftime('%H:%M') for t in f.index.to_datetime()]
-
gtfstk.calculator.
compute_screen_line_counts
(feed, linestring, date, geo_shapes=None)¶ Compute all the trips active in the given feed on the given date that intersect the given Shapely LineString (with WGS84 longitude-latitude coordinates), and return a data frame with the columns:
'trip_id'
'route_id'
'route_short_name'
'crossing_time'
: time that the trip’s vehicle crosses the linestring; one trip could cross multiple times
'orientation'
: 1 or -1; 1 indicates trip travel from the left side to the right side of the screen line; -1 indicates trip travel in the opposite direction
- NOTES:
Requires GeoPandas.
The first step is to geometrize
feed.shapes
via geometrize_shapes()
. Alternatively, use the geo_shapes
GeoDataFrame, if given.
Assume
feed.stop_times
has an accurate shape_dist_traveled
column.
- Assume the following feed attributes are not
None
: feed.shapes
, if geo_shapes
is not given
Assume that trips travel in the same direction as their shapes. That restriction is part of GTFS, by the way. To calculate direction quickly and accurately, assume that the screen line is straight and doesn’t double back on itself.
Probably does not give correct results for trips with self-intersecting shapes.
- ALGORITHM:
- Compute all the shapes that intersect the linestring.
- For each such shape, compute the intersection points.
- For each point p, scan through all the trips in the feed that have that shape and are active on the given date.
- Interpolate a stop time for p by assuming that the feed has the shape_dist_traveled field in stop times.
- Use that interpolated time as the crossing time of the trip vehicle, and compute the trip orientation to the screen line via a cross product of a vector in the direction of the screen line and a tiny vector in the direction of trip travel.
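The last two steps can be sketched without GeoPandas. Both function names are hypothetical. The sign convention is an assumption: here 'left' means left of the screen line's own direction (first vertex to last vertex), and the library may map the sign of the cross product to 1 and -1 differently.

```python
def interpolate_time_sketch(dists, times, d):
    """Linearly interpolate the time at which a trip reaches distance d."""
    for i in range(len(dists) - 1):
        d0, d1 = dists[i], dists[i + 1]
        t0, t1 = times[i], times[i + 1]
        if d0 <= d <= d1:
            if d1 == d0:
                return t0
            return t0 + (t1 - t0) * (d - d0) / (d1 - d0)
    raise ValueError("d lies outside the trip's distance range")


def orientation_sketch(line_vec, travel_vec):
    """Return 1 for left-to-right travel across the line, -1 otherwise."""
    # z-component of the cross product of the two planar vectors
    cross = line_vec[0] * travel_vec[1] - line_vec[1] * travel_vec[0]
    # Assumption: a negative cross product means travel towards the right side
    return 1 if cross < 0 else -1


# Stops at 0, 2, and 6 distance units with departure times in seconds
crossing_time = interpolate_time_sketch([0, 2, 6], [0, 200, 600], 3)

# Screen line pointing north, trip travelling east (from its left to its right)
orientation = orientation_sketch((0, 1), (1, 0))
```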
-
gtfstk.calculator.
compute_station_stats
(feed, date, split_directions=False, headway_start_time='07:00:00', headway_end_time='19:00:00')¶ If this feed has station data, that is,
location_type
and parent_station
columns in feed.stops
, then compute the same stats that feed.compute_stop_stats()
does, but for stations. Otherwise, return an empty data frame with the specified columns.
Assume the following feed attributes are not
None
:
- Those used in
get_stops_in_stations()
- Those used in
get_stop_times()
-
gtfstk.calculator.
compute_stop_activity
(feed, dates)¶ Return a data frame with the columns
- stop_id
dates[0]
: 1 if the stop has at least one trip visiting it on dates[0]
; 0 otherwise
dates[1]
: 1 if the stop has at least one trip visiting it on dates[1]
; 0 otherwise
- etc.
dates[-1]
: 1 if the stop has at least one trip visiting it on dates[-1]
; 0 otherwise
If
dates
is None
or the empty list, then return an empty data frame with the column ‘stop_id’.
Assume the following feed attributes are not
None
: feed.stop_times
- Those used in
compute_trip_activity()
-
gtfstk.calculator.
compute_stop_stats
(feed, date, split_directions=False, headway_start_time='07:00:00', headway_end_time='19:00:00')¶ Call
compute_stop_stats_base()
with the subset of trips active on the given date and with the keyword arguments split_directions
, headway_start_time
, and headway_end_time
.
See
compute_stop_stats_base()
for a description of the output.
Assume the following feed attributes are not
None
: feed.stop_times
- Those used in
get_trips()
NOTES:
This is a more user-friendly version of
compute_stop_stats_base()
. The latter function works without a feed, though.
-
gtfstk.calculator.
compute_stop_stats_base
(stop_times, trips_subset, split_directions=False, headway_start_time='07:00:00', headway_end_time='19:00:00')¶ Given a stop times data frame and a subset of a trips data frame, return a data frame that provides summary stats about the stops in the (inner) join of the two data frames.
The columns of the output data frame are:
- stop_id
- direction_id: present if and only if
split_directions
- num_routes: number of routes visiting stop (in the given direction)
- num_trips: number of trips visiting stop (in the given direction)
- max_headway: maximum of the durations (in minutes) between trip departures at the stop between
headway_start_time
and headway_end_time
on the given date
- min_headway: minimum of the durations (in minutes) mentioned above
- mean_headway: mean of the durations (in minutes) mentioned above
- start_time: earliest departure time of a trip from this stop on the given date
- end_time: latest departure time of a trip from this stop on the given date
If
split_directions == False
, then compute each stop’s stats using trips visiting it from both directions.
If
trips_subset
is empty, then return an empty data frame with the columns specified above.
-
gtfstk.calculator.
compute_stop_time_series
(feed, date, split_directions=False, freq='5Min')¶ Call
compute_stop_time_series_base()
with the subset of trips active on the given date and with the keyword arguments split_directions and freq
and with date_label
equal to date
. See compute_stop_time_series_base()
for a description of the output.
Assume the following feed attributes are not
None
: feed.stop_times
- Those used in
get_trips()
NOTES:
This is a more user-friendly version of
compute_stop_time_series_base()
. The latter function works without a feed, though.
-
gtfstk.calculator.
compute_stop_time_series_base
(stop_times, trips_subset, split_directions=False, freq='5Min', date_label='20010101')¶ Given a stop times data frame and a subset of a trips data frame, return a data frame that provides summary stats about the stops in the (inner) join of the two data frames.
The time series is a data frame with a timestamp index for a 24-hour period sampled at the given frequency. The maximum allowable frequency is 1 minute. The timestamp includes the date given by
date_label
, a date string of the form ‘%Y%m%d’.
The columns of the data frame are hierarchical (multi-index) with
- top level: name = ‘indicator’, values = [‘num_trips’]
- middle level: name = ‘stop_id’, values = the active stop IDs
- bottom level: name = ‘direction_id’, values = 0s and 1s
If
split_directions == False
, then don’t include the bottom level.
If
trips_subset
is empty, then return an empty data frame with the indicator columns.
NOTES:
- ‘num_trips’ should be resampled with
how=np.sum
- To remove the date and seconds from the time series f, do
f.index = [t.time().strftime('%H:%M') for t in f.index.to_datetime()]
-
gtfstk.calculator.
compute_trip_activity
(feed, dates)¶ Return a data frame with the columns
- trip_id
dates[0]
: 1 if the trip is active on dates[0]
; 0 otherwise
dates[1]
: 1 if the trip is active on dates[1]
; 0 otherwise
- etc.
dates[-1]
: 1 if the trip is active on dates[-1]
; 0 otherwise
If
dates
is None
or the empty list, then return an empty data frame with the column ‘trip_id’.
Assume the following feed attributes are not
None
: feed.trips
- Those used in
is_active_trip()
-
gtfstk.calculator.
compute_trip_locations
(feed, date, times)¶ Return a data frame of the positions of all trips active on the given date and times. Include the columns:
- trip_id
- route_id
- direction_id
- time
- rel_dist: number between 0 (start) and 1 (end) indicating the relative distance of the trip along its path
- lon: longitude of trip at given time
- lat: latitude of trip at given time
Assume
feed.stop_times
has an accurate shape_dist_traveled
column.
Assume the following feed attributes are not
None
: feed.trips
- Those used in
get_stop_times()
- Those used in
build_geometry_by_shape()
-
gtfstk.calculator.
compute_trip_stats
(feed, compute_dist_from_shapes=False)¶ Return a data frame with the following columns:
- trip_id
- route_id
- route_short_name
- route_type
- direction_id
- shape_id
- num_stops: number of stops on trip
- start_time: first departure time of the trip
- end_time: last departure time of the trip
- start_stop_id: stop ID of the first stop of the trip
- end_stop_id: stop ID of the last stop of the trip
- is_loop: 1 if the start and end stop are less than 400m apart and 0 otherwise
- distance: distance of the trip in
feed.dist_units
; contains all np.nan
entries if feed.shapes is None
- duration: duration of the trip in hours
- speed: distance/duration
Assume the following feed attributes are not
None
: feed.trips
feed.routes
feed.stop_times
feed.shapes
(optionally)
- Those used in
build_geometry_by_stop()
- NOTES:
If
feed.stop_times
has ashape_dist_traveled
column with at least one non-NaN value andcompute_dist_from_shapes == False
, then use that column to compute the distance column. Else iffeed.shapes is not None
, then compute the distance column using the shapes and Shapely. Otherwise, set the distances tonp.nan
.Calculating trip distances with
compute_dist_from_shapes=True
seems pretty accurate. For example, calculating trip distances on the Portland feed at https://transitfeeds.com/p/trimet/43/1400947517 usingcompute_dist_from_shapes=False
andcompute_dist_from_shapes=True
, yields a difference of at most 0.83km.
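The three-way rule in the notes above can be written as a small decision function. This is a sketch of the documented rule only. The function name choose_distance_source and its boolean inputs are hypothetical and not part of GTFSTK.

```python
def choose_distance_source(has_shape_dist_traveled, has_shapes,
                           compute_dist_from_shapes=False):
    """Return which data the distance column would be computed from.

    has_shape_dist_traveled: True if feed.stop_times has a
        shape_dist_traveled column with at least one non-NaN value.
    has_shapes: True if feed.shapes is not None.
    Hypothetical helper mirroring the rule described in the notes.
    """
    if has_shape_dist_traveled and not compute_dist_from_shapes:
        return "stop_times"  # use shape_dist_traveled directly
    if has_shapes:
        return "shapes"      # measure along the shape geometries
    return "nan"             # nothing to measure with


default_choice = choose_distance_source(True, True)
forced_shapes = choose_distance_source(True, True, compute_dist_from_shapes=True)
no_data = choose_distance_source(False, False)
```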
-
gtfstk.calculator.
convert_dist
(feed, new_dist_units)¶ Convert the distances recorded in the shape_dist_traveled columns of the given feed from the feed’s native distance units (recorded in feed.dist_units) to the given new distance units. New distance units must lie in constants.DIST_UNITS.
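A conversion between any two of the four units in constants.DIST_UNITS can go through a common base unit. The sketch below uses meters as that base. The names METERS_PER_UNIT and conversion_factor are hypothetical and do not describe how GTFSTK performs the conversion internally.

```python
# Meters per unit, for the units listed in constants.DIST_UNITS
METERS_PER_UNIT = {"ft": 0.3048, "mi": 1609.344, "m": 1.0, "km": 1000.0}


def conversion_factor(old_units, new_units):
    """Return the number f such that f * (distance in old_units)
    equals the same distance in new_units. Hypothetical helper."""
    for units in (old_units, new_units):
        if units not in METERS_PER_UNIT:
            raise ValueError("Unknown distance units: {!r}".format(units))
    return METERS_PER_UNIT[old_units] / METERS_PER_UNIT[new_units]


km_to_m = conversion_factor("km", "m")
mi_to_ft = conversion_factor("mi", "ft")
```

Multiplying every shape_dist_traveled value by the factor converts a whole column in one step.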
-
gtfstk.calculator.
count_active_trips
(trip_times, time)¶ Given a data frame trip_times containing the columns
- trip_id
- start_time: start time of the trip in seconds past midnight
- end_time: end time of the trip in seconds past midnight
and a time in seconds past midnight, return the number of trips in the data frame that are active at the given time. A trip is considered active at time t if start_time <= t < end_time.
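The activity rule can be sketched without Pandas as follows. This is an illustration of the half-open interval test, not GTFSTK's implementation. The helper name count_active and the plain-tuple input are assumptions made for the example.

```python
def count_active(trip_times, time):
    """Count trips with start_time <= time < end_time.

    trip_times: iterable of (trip_id, start_time, end_time) tuples with
    times in seconds past midnight (a plain-Python stand-in for the
    data frame described above).
    """
    return sum(1 for _, start, end in trip_times if start <= time < end)


trips = [
    ("t1", 6 * 3600, 7 * 3600),          # 06:00 to 07:00
    ("t2", 6 * 3600 + 1800, 8 * 3600),   # 06:30 to 08:00
    ("t3", 25 * 3600, 26 * 3600),        # 25:00 to 26:00, past midnight
]
n_at_0645 = count_active(trips, 6 * 3600 + 2700)
n_at_0700 = count_active(trips, 7 * 3600)
```

Because the interval is closed on the left and open on the right, trip t1 is no longer counted at exactly 07:00.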
-
gtfstk.calculator.
create_shapes
(feed, all_trips=False)¶ Given a feed, create a shape for every trip that is missing a shape ID. Do this by connecting the stops on the trip with straight lines. Return the resulting feed which has updated shapes and trips data frames.
If all_trips, then create new shapes for all trips by connecting stops, and remove the old shapes.
Assume the following feed attributes are not None:
- feed.stop_times
- feed.trips
- feed.stops
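Connecting stops with straight lines amounts to emitting one shape point per stop, in stop order. The sketch below builds GTFS-style shape rows for a single trip. The helper name shape_from_stops, the shape ID, and the dictionary inputs are hypothetical and only illustrate the idea.

```python
def shape_from_stops(shape_id, stop_ids, stop_coords):
    """Build shape rows by visiting the trip's stops in order.

    stop_ids: stop IDs of one trip, ordered by stop sequence.
    stop_coords: dict mapping stop ID to (lon, lat).
    Hypothetical helper; one shape point is emitted per stop.
    """
    return [
        {
            "shape_id": shape_id,
            "shape_pt_sequence": i,
            "shape_pt_lon": stop_coords[stop_id][0],
            "shape_pt_lat": stop_coords[stop_id][1],
        }
        for i, stop_id in enumerate(stop_ids)
    ]


coords = {"a": (174.76, -36.85), "b": (174.77, -36.86), "c": (174.78, -36.87)}
new_shape = shape_from_stops("shape_for_trip_1", ["a", "b", "c"], coords)
```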
-
gtfstk.calculator.
downsample
(time_series, freq)¶ Downsample the given route, stop, or feed time series (outputs of Feed.compute_route_time_series(), Feed.compute_stop_time_series(), or Feed.compute_feed_time_series(), respectively) to the given Pandas frequency. Return the given time series unchanged if the given frequency is shorter than the original frequency.
-
gtfstk.calculator.
geometrize_shapes
(shapes, use_utm=False)¶ Given a shapes data frame, convert it to a GeoPandas GeoDataFrame and return the result. The result has a ‘geometry’ column of WGS84 line strings instead of ‘shape_pt_sequence’, ‘shape_pt_lon’, ‘shape_pt_lat’, and ‘shape_dist_traveled’ columns. If use_utm, then use UTM coordinates for the geometries. Requires GeoPandas.
-
gtfstk.calculator.
geometrize_stops
(stops, use_utm=False)¶ Given a stops data frame, convert it to a GeoPandas GeoDataFrame and return the result. The result has a ‘geometry’ column of WGS84 points instead of ‘stop_lon’ and ‘stop_lat’ columns. If use_utm, then use UTM coordinates for the geometries. Requires GeoPandas.
-
gtfstk.calculator.
get_dates
(feed, as_date_obj=False)¶ Return a chronologically ordered list of dates for which this feed is valid. If as_date_obj, then return the dates as datetime.date objects.
If feed.calendar and feed.calendar_dates are both None, then return the empty list.
-
gtfstk.calculator.
get_first_week
(feed, as_date_obj=False)¶ Return a list of dates corresponding to the first Monday–Sunday week for which this feed is valid. If the given feed does not cover a full Monday–Sunday week, then return whatever initial segment of the week it does cover, which could be the empty list. If as_date_obj, then return the dates as datetime.date objects.
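The "initial segment" behaviour can be illustrated with the standard library. The sketch below assumes the input is a chronologically ordered list of consecutive GTFS date strings (YYYYMMDD), such as the documented output of get_dates(). The helper name first_week is hypothetical and this is not GTFSTK's implementation.

```python
from datetime import date, datetime, timedelta


def first_week(dates):
    """Return the first Monday-to-Sunday week covered by the given
    dates, or the initial segment of it that is covered.

    dates: chronologically ordered, consecutive YYYYMMDD strings
    (an assumption of this sketch).
    """
    parsed = [datetime.strptime(d, "%Y%m%d").date() for d in dates]
    # weekday() == 0 means Monday
    monday = next((d for d in parsed if d.weekday() == 0), None)
    if monday is None:
        return []
    last = parsed[-1]
    week = [monday + timedelta(days=i) for i in range(7)]
    return [d.strftime("%Y%m%d") for d in week if d <= last]


start = date(2024, 1, 3)  # a Wednesday
dates = [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(8)]
partial_week = first_week(dates)                 # feed ends mid-week
no_monday = first_week(["20240103", "20240104"])  # feed has no Monday
```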
-
gtfstk.calculator.
get_route_timetable
(feed, route_id, date)¶ Return a data frame encoding the timetable for the given route ID on the given date. The columns are all those in feed.trips plus those in feed.stop_times. The result is sorted by grouping by trip ID and sorting the groups by their first departure time.
Assume the following feed attributes are not None:
- feed.stop_times
- Those used in get_trips()
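The sort order described above (rows grouped by trip, groups ordered by first departure) can be sketched in plain Python. The helper name sort_timetable is hypothetical, and ordering rows within a trip by stop_sequence is an assumption of this sketch. Comparing time strings directly works here because zero-padded HH:MM:SS strings sort in time order.

```python
def sort_timetable(rows):
    """Group rows by trip and order the groups by first departure.

    rows: list of dicts with trip_id, stop_sequence, and departure_time
    (zero-padded HH:MM:SS strings). Hypothetical helper.
    """
    first_departure = {}
    for row in rows:
        trip_id, dep = row["trip_id"], row["departure_time"]
        if trip_id not in first_departure or dep < first_departure[trip_id]:
            first_departure[trip_id] = dep
    return sorted(
        rows,
        key=lambda r: (first_departure[r["trip_id"]], r["trip_id"],
                       r["stop_sequence"]),
    )


rows = [
    {"trip_id": "b", "stop_sequence": 1, "departure_time": "08:00:00"},
    {"trip_id": "a", "stop_sequence": 2, "departure_time": "07:40:00"},
    {"trip_id": "b", "stop_sequence": 2, "departure_time": "08:10:00"},
    {"trip_id": "a", "stop_sequence": 1, "departure_time": "07:30:00"},
]
ordered = sort_timetable(rows)
order = [(r["trip_id"], r["stop_sequence"]) for r in ordered]
```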
-
gtfstk.calculator.
get_routes
(feed, date=None, time=None)¶ Return the section of feed.routes that contains only routes active on the given date. If no date is given, then return all routes. If a date and time are given, then return only those routes with trips active at that date and time. Do not take times modulo 24.
Assume the following feed attributes are not None:
- feed.routes
- Those used in get_trips()
-
gtfstk.calculator.
get_shapes_intersecting_geometry
(feed, geometry, geo_shapes=None, geometrized=False)¶ Return the slice of feed.shapes that contains all shapes that intersect the given Shapely geometry object (e.g. a Polygon or LineString). Assume the geometry is specified in WGS84 longitude-latitude coordinates.
To do this, first geometrize feed.shapes via geometrize_shapes(). Alternatively, use the geo_shapes GeoDataFrame, if given. Requires GeoPandas.
Assume the following feed attributes are not None:
- feed.shapes, if geo_shapes is not given
If geometrized is True, then return the resulting shapes data frame in geometrized form.
-
gtfstk.calculator.
get_start_and_end_times
(feed, date=None)¶ Return the first departure time and last arrival time (time strings) listed in feed.stop_times, respectively. Restrict to the given date if specified.
-
gtfstk.calculator.
get_stop_times
(feed, date=None)¶ Return the section of feed.stop_times that contains only trips active on the given date. If no date is given, then return all stop times.
Assume the following feed attributes are not None:
- feed.stop_times
- Those used in get_trips()
-
gtfstk.calculator.
get_stop_timetable
(feed, stop_id, date)¶ Return a data frame encoding the timetable for the given stop ID on the given date. The columns are all those in feed.trips plus those in feed.stop_times. The result is sorted by departure time.
Assume the following feed attributes are not None:
- feed.trips
- Those used in get_stop_times()
-
gtfstk.calculator.
get_stops
(feed, date=None, trip_id=None, route_id=None)¶ Return feed.stops. If a date is given, then restrict the output to stops that are visited by trips active on the given date. If a trip ID (string) is given, then restrict the output possibly further to stops that are visited by the trip. Else if a route ID (string) is given, then restrict the output possibly further to stops that are visited by at least one trip on the route.
Assume the following feed attributes are not None:
- feed.stops
- Those used in get_stop_times()
- feed.routes
-
gtfstk.calculator.
get_stops_in_stations
(feed)¶ If this feed has station data, that is, location_type and parent_station columns in feed.stops, then return a data frame that has the same columns as feed.stops but only includes stops with parent stations, that is, stops with location type 0 or blank and non-blank parent station. Otherwise, return an empty data frame with the specified columns.
Assume the following feed attributes are not None:
- feed.stops
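The filter described above (location type 0 or blank, and a non-blank parent station) can be sketched on plain dictionaries. The helper names is_blank and stops_in_stations are hypothetical, and the sketch treats values as strings, as they would appear in stops.txt.

```python
def is_blank(value):
    """Return True for None or a whitespace-only string."""
    return value is None or str(value).strip() == ""


def stops_in_stations(stops):
    """Keep stops with location type 0 or blank and a parent station.

    stops: list of dicts with stop_id, location_type, parent_station.
    Hypothetical helper illustrating the documented filter.
    """
    result = []
    for stop in stops:
        location_type = stop.get("location_type")
        is_plain_stop = (is_blank(location_type)
                         or str(location_type).strip() == "0")
        if is_plain_stop and not is_blank(stop.get("parent_station")):
            result.append(stop)
    return result


stops = [
    {"stop_id": "s1", "location_type": "0", "parent_station": "st1"},
    {"stop_id": "s2", "location_type": "", "parent_station": "st1"},
    {"stop_id": "st1", "location_type": "1", "parent_station": ""},
    {"stop_id": "s3", "location_type": "0", "parent_station": ""},
]
kept = [s["stop_id"] for s in stops_in_stations(stops)]
```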
-
gtfstk.calculator.
get_stops_intersecting_polygon
(feed, polygon, geo_stops=None)¶ Return the slice of feed.stops that contains all stops that intersect the given Shapely Polygon object. Assume the polygon is specified in WGS84 longitude-latitude coordinates.
To do this, first geometrize feed.stops via geometrize_stops(). Alternatively, use the geo_stops GeoDataFrame, if given. Requires GeoPandas.
Assume the following feed attributes are not None:
- feed.stops, if geo_stops is not given
-
gtfstk.calculator.
get_trips
(feed, date=None, time=None)¶ Return the section of feed.trips that contains only trips active on the given date. If feed.trips is None or the date is None, then return all feed.trips. If a date and time are given, then return only those trips active at that date and time. Do not take times modulo 24.
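The remark about not taking times modulo 24 matters for trips that run past midnight. The sketch below converts GTFS time strings to seconds without wrapping the hour, and shows that 24:45:00 and 00:45:00 are treated as different times. The helper name to_seconds is hypothetical.

```python
def to_seconds(timestr):
    """Convert an HH:MM:SS time string to seconds past midnight.

    The hour may exceed 24 and is deliberately not reduced modulo 24.
    Hypothetical helper for illustration.
    """
    hours, minutes, seconds = (int(x) for x in timestr.split(":"))
    return 3600 * hours + 60 * minutes + seconds


# A trip that starts at 23:30:00 and ends at 25:10:00 on the service day
start, end = to_seconds("23:30:00"), to_seconds("25:10:00")

active_at_2445 = start <= to_seconds("24:45:00") < end
active_at_0045 = start <= to_seconds("00:45:00") < end
```

To query the small hours after midnight for such a trip, pass the time as 24:45:00 on the same service date rather than 00:45:00.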
-
gtfstk.calculator.
is_active_trip
(feed, trip, date)¶ If the given trip (trip ID) is active on the given date, then return True; otherwise return False. To avoid error checking in the interest of speed, assume trip is a valid trip ID in the given feed and date is a valid date object.
Assume the following feed attributes are not None:
- feed.trips
NOTES:
- This function is key for getting all trips, routes, etc. that are active on a given date, so the function needs to be fast.
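In GTFS, a trip is active on a date when its service runs on that date: calendar.txt gives weekday flags and a start_date/end_date range per service, and calendar_dates.txt lists exceptions, with exception_type 1 adding service and 2 removing it. The sketch below applies those rules to plain dictionaries keyed for fast lookup. It is an illustration under these assumptions, not GTFSTK's implementation, and the function name service_is_active and the input shapes are hypothetical.

```python
from datetime import datetime

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def service_is_active(service_id, date, calendar, calendar_dates):
    """Return True if the service runs on the given YYYYMMDD date.

    calendar: dict service_id -> row dict with weekday flags,
        start_date, and end_date.
    calendar_dates: dict (service_id, date) -> exception_type.
    Hypothetical helper; exceptions take precedence over the calendar.
    """
    exception = calendar_dates.get((service_id, date))
    if exception == 1:   # service added on this date
        return True
    if exception == 2:   # service removed on this date
        return False
    row = calendar.get(service_id)
    if row is None:
        return False
    # YYYYMMDD strings compare correctly as plain strings
    if not row["start_date"] <= date <= row["end_date"]:
        return False
    weekday = WEEKDAYS[datetime.strptime(date, "%Y%m%d").weekday()]
    return row[weekday] == 1


calendar = {"wk": {"monday": 1, "tuesday": 1, "wednesday": 1, "thursday": 1,
                   "friday": 1, "saturday": 0, "sunday": 0,
                   "start_date": "20240101", "end_date": "20241231"}}
calendar_dates = {("wk", "20240101"): 2,   # holiday: service removed
                  ("wk", "20240106"): 1}   # Saturday: service added

tuesday = service_is_active("wk", "20240102", calendar, calendar_dates)
holiday = service_is_active("wk", "20240101", calendar, calendar_dates)
added_saturday = service_is_active("wk", "20240106", calendar, calendar_dates)
sunday = service_is_active("wk", "20240107", calendar, calendar_dates)
```

Dictionary lookups keep each check constant-time, which matters because the check runs once per trip per date.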
-
gtfstk.calculator.
restrict_by_polygon
(feed, polygon)¶ Build a new feed by taking the given one, keeping only the trips that have at least one stop intersecting the given polygon, and then restricting stops, routes, stop times, etc. to those associated with that subset of trips. Return the resulting feed. Requires GeoPandas.
Assume the following feed attributes are not None:
- feed.stop_times
- feed.trips
- feed.stops
- feed.routes
- Those used in get_stops_intersecting_polygon()
-
gtfstk.calculator.
restrict_by_routes
(feed, route_ids)¶ Build a new feed by taking the given one and chopping it down to only the stops, trips, shapes, etc. used by the routes specified in the given list of route IDs. Return the resulting feed.
-
gtfstk.calculator.
route_to_geojson
(feed, route_id, include_stops=False)¶ Given a feed and a route ID (string), return a (decoded) GeoJSON feature collection comprising a MultiLineString feature of distinct shapes of the trips on the route. If include_stops, then also include one Point feature for each stop visited by any trip on the route. The MultiLineString feature will contain as properties all the columns in feed.routes pertaining to the given route, and each Point feature will contain as properties all the columns in feed.stops pertaining to the stop, except the stop_lat and stop_lon properties.
Assume the following feed attributes are not None:
- feed.routes
- feed.shapes
- feed.trips
- feed.stops
-
gtfstk.calculator.
shapes_to_geojson
(feed)¶ Return a (decoded) GeoJSON feature collection of linestring features representing feed.shapes. Each feature will have a shape_id property. If feed.shapes is None, then return None. The coordinate reference system is the default one for GeoJSON, namely WGS84.
Assume the following feed attributes are not None:
- Those used in build_geometry_by_shape()
- Those used in
-
gtfstk.calculator.
trip_to_geojson
(feed, trip_id, include_stops=False)¶ Given a feed and a trip ID (string), return a (decoded) GeoJSON feature collection comprising a LineString feature representing the trip’s shape. If include_stops, then also include one Point feature for each stop visited by the trip. The LineString feature will contain as properties all the columns in feed.trips pertaining to the given trip, and each Point feature will contain as properties all the columns in feed.stops pertaining to the stop, except the stop_lat and stop_lon properties.
Assume the following feed attributes are not None:
- feed.trips
- feed.shapes
- feed.stops
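The shape of the returned object can be sketched with plain dictionaries. The example below builds a decoded GeoJSON feature collection with one LineString feature and optional Point features whose properties omit stop_lat and stop_lon, as described above. The helper name trip_feature_collection and its inputs are hypothetical, and this is not GTFSTK's implementation.

```python
import json


def trip_feature_collection(trip_props, shape_coords, stops=None):
    """Build a decoded GeoJSON feature collection for one trip.

    trip_props: dict of the trip's columns from feed.trips.
    shape_coords: list of (lon, lat) pairs along the trip's shape.
    stops: optional list of stop dicts with stop_lon and stop_lat.
    Hypothetical helper for illustration.
    """
    features = [{
        "type": "Feature",
        "properties": dict(trip_props),
        "geometry": {
            "type": "LineString",
            "coordinates": [list(c) for c in shape_coords],
        },
    }]
    for stop in stops or []:
        # Coordinates go in the geometry, so drop them from properties
        props = {k: v for k, v in stop.items()
                 if k not in ("stop_lat", "stop_lon")}
        features.append({
            "type": "Feature",
            "properties": props,
            "geometry": {
                "type": "Point",
                "coordinates": [stop["stop_lon"], stop["stop_lat"]],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = trip_feature_collection(
    {"trip_id": "t1", "route_id": "r1"},
    [(174.76, -36.85), (174.77, -36.86)],
    stops=[{"stop_id": "a", "stop_name": "First",
            "stop_lon": 174.76, "stop_lat": -36.85}],
)
encoded = json.dumps(collection)  # "decoded" means a dict, not a string
```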
-
gtfstk.calculator.
ungeometrize_shapes
(geo_shapes)¶ The inverse of geometrize_shapes(). Produces the columns:
- shape_id
- shape_pt_sequence
- shape_pt_lon
- shape_pt_lat
If geo_shapes is in UTM (has a UTM CRS property), then convert UTM coordinates back to WGS84 coordinates.
-
gtfstk.calculator.
ungeometrize_stops
(geo_stops)¶ The inverse of geometrize_stops(). If geo_stops is in UTM (has a UTM CRS property), then convert UTM coordinates back to WGS84 coordinates.
plotter Module¶
This module contains functions for plotting various graphs related to Feed objects. It is optional and requires Matplotlib.
-
gtfstk.plotter.
plot_feed_time_series
(feed_time_series)¶ Given a routes time series data frame, sum each time series indicator over all routes, plot each series indicator using Matplotlib, and return the resulting figure of subplots.
NOTES:
Take the resulting figure f and do f.tight_layout() for a nice-looking plot.
-
gtfstk.plotter.
plot_headways
(stats, max_headway_limit=60)¶ Given a stops or routes stats data frame, return bar charts of the max and mean headways as a Matplotlib figure. Only include the stops/routes with max headways at most max_headway_limit minutes. If max_headway_limit is None, then include them all in a giant plot. If there are no stops/routes within the max headway limit, then return None.
NOTES:
Take the resulting figure f and do f.tight_layout() for a nice-looking plot.