The Hubverse is in the process of making hub data available via publicly accessible AWS S3 buckets.
Cloud-based hubs "mirror" the data stored in a hub's GitHub repository and provide a few advantages for data consumers:
- No need to clone a repository to access data
- Cloud-based model-output files are in Parquet format, a compressed columnar format that is easier to work with and faster to read
The examples here use the CDC's FluSight Forecast Hub, which is available in the following S3 bucket:
s3://cdcepi-flusight-forecast-hub
Other files in the gist provide examples of accessing cloud-based hub data via DuckDB, Python, and R.
Alternatively, you can use the AWS CLI to copy the files from S3 to your local machine for use with your favorite data analysis software.
List the available directories in the hub's S3 bucket:
aws s3 ls s3://cdcepi-flusight-forecast-hub/ --no-sign-request
PRE auxiliary-data/
PRE hub-config/
PRE model-metadata/
PRE model-output/
PRE raw/
PRE target-data/
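To look inside one of these directories, the same ls command accepts a directory prefix. The following is a minimal sketch, not part of the hub's tooling: `hub_prefix` and `list_hub_dir` are hypothetical helper names, and it assumes the bucket layout matches the listing above.

```shell
# Bucket name taken from the text above.
HUB_BUCKET="cdcepi-flusight-forecast-hub"

# Hypothetical helper: build the S3 URI for a top-level hub directory.
hub_prefix() {
  printf 's3://%s/%s/' "$HUB_BUCKET" "$1"
}

# Hypothetical helper: list every object under a hub directory,
# with human-readable sizes and a summary of object count and total size.
list_hub_dir() {
  aws s3 ls "$(hub_prefix "$1")" --recursive --human-readable --summarize --no-sign-request
}
```

For example, `list_hub_dir hub-config` lists the hub's configuration files without downloading them.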
Copy files from a specific directory (in this case, model-output) to a local machine:
aws s3 cp s3://cdcepi-flusight-forecast-hub/model-output/ . --recursive --no-sign-request
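Copying all of model-output can be a large download. The sketch below narrows the copy to a single model, assuming model-output contains one subdirectory per model. `model_uri` and `copy_model_output` are hypothetical helper names, and `example-team-model` in the usage note is a placeholder rather than a real model directory.

```shell
# Bucket name taken from the text above.
HUB_BUCKET="cdcepi-flusight-forecast-hub"

# Hypothetical helper: build the S3 URI for one model's output directory.
# Assumes model-output/ holds one subdirectory per model.
model_uri() {
  printf 's3://%s/model-output/%s/' "$HUB_BUCKET" "$1"
}

# Hypothetical helper: copy one model's files into a local directory.
# Arguments after the destination are passed through to "aws s3 cp",
# so adding --dryrun previews the copy without downloading anything.
copy_model_output() {
  model="$1"
  dest="$2"
  shift 2
  aws s3 cp "$(model_uri "$model")" "$dest" --recursive --no-sign-request "$@"
}
```

Running `copy_model_output example-team-model ./example-team-model --dryrun` shows which files would be copied. Dropping `--dryrun` performs the download.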