red-datasets-parquet is a Ruby gem that provides datasets stored in the Apache Parquet format. It includes datasets from the New York City Taxi and Limousine Commission (TLC), such as green and yellow taxi trip records.
You can install the red-datasets-parquet gem by adding the following line to your Gemfile:
gem 'red-datasets-parquet'
Then execute:
$ bundle
Or install it yourself as:
$ gem install red-datasets-parquet
Here's an example of how you can use the red-datasets-parquet gem to access the taxi trip datasets:
require 'datasets-parquet'
# Get the green taxi trip dataset for January 2022
green_taxi_trips = Datasets::TLC::GreenTaxiTrip.new(year: 2022, month: 1)
# Access the dataset as an Apache Arrow table
green_arrow_table = green_taxi_trips.to_arrow
# Iterate over the green taxi trip records
green_taxi_trips.each do |trip|
  # Access data about the taxi trip
  p trip.vendor
  p trip.lpep_pickup_datetime
  p trip.lpep_dropoff_datetime
  # ...
end
# Get the yellow taxi trip dataset for January 2022
yellow_taxi_trips = Datasets::TLC::YellowTaxiTrip.new(year: 2022, month: 1)
# Access the dataset as an Apache Arrow table
yellow_arrow_table = yellow_taxi_trips.to_arrow
# Iterate over the yellow taxi trip records
yellow_taxi_trips.each do |trip|
  # Access data about the taxi trip
  p trip.vendor
  p trip.tpep_pickup_datetime
  p trip.tpep_dropoff_datetime
  # ...
end
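Once you can iterate over records, a common next step is to aggregate them. The following is a minimal sketch, not part of the gem: `average_duration_minutes` and `SampleTrip` are hypothetical names introduced here for illustration, and the sketch assumes the pickup and drop-off fields are `Time` values. The stand-in records let it run without downloading any data.

```ruby
# Hypothetical stand-in for a trip record so this sketch runs without
# downloading any data; real records would come from the dataset's #each.
SampleTrip = Struct.new(:lpep_pickup_datetime, :lpep_dropoff_datetime)

# Returns the average trip duration in minutes for any enumerable of
# records that respond to the two given timestamp readers.
# Assumption: both readers return Time objects (or nil when missing).
def average_duration_minutes(trips,
                             pickup: :lpep_pickup_datetime,
                             dropoff: :lpep_dropoff_datetime)
  durations = trips.filter_map do |trip|
    started = trip.public_send(pickup)
    ended = trip.public_send(dropoff)
    next if started.nil? || ended.nil? # skip incomplete records

    (ended - started) / 60.0 # Time - Time yields seconds as a Float
  end
  return nil if durations.empty?

  durations.sum / durations.size
end

sample = [
  SampleTrip.new(Time.utc(2022, 1, 1, 0, 0), Time.utc(2022, 1, 1, 0, 10)),
  SampleTrip.new(Time.utc(2022, 1, 1, 1, 0), Time.utc(2022, 1, 1, 1, 20)),
  SampleTrip.new(Time.utc(2022, 1, 1, 2, 0), nil) # skipped: no drop-off time
]

puts average_duration_minutes(sample) # prints 15.0
```

If the dataset objects behave as enumerables of such records, the same helper could be called as `average_duration_minutes(green_taxi_trips)`, or with `pickup: :tpep_pickup_datetime, dropoff: :tpep_dropoff_datetime` for the yellow taxi records.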
This gem currently provides the following datasets:
- New York City Taxi and Limousine Commission: green taxi trip record dataset
- New York City Taxi and Limousine Commission: yellow taxi trip record dataset
Note that the datasets provided by this gem may be updated periodically, so the records you receive for a given year and month may change over time.
The red-datasets-parquet code is available under the MIT License. Please note that the datasets themselves may be subject to different licenses and terms of use.