The U.S. Historical Climatology Network USHCN data are used to quantify national and regional-scale temperature changes in the contiguous United States (CONUS). The dataset provides adjustments for systematic, non-climatic changes that bias temperature trends of monthly temperature records of long-term COOP stations.
USHCN defines these elements as follows:
- tmax = monthly mean maximum temperature
- tmin = monthly mean minimum temperature
- tavg = average monthly temperature (tmax+tmin)/2
- prcp = total monthly precipitation
Unfortunately the data is stored in fixed record format which makes it difficult to process and use easily. The attached SQL file is a set of tables and views for SQLite that allows direct access to the USHCN data. At the very bottom of the SQL file are several commands, commented out, that show how to import data and export it in a usable form for further processing in Excel or other systems.
There are tables that hold the fixed format data from the USHCN. In addition, there are views that understand the data format and how to separate it into fields. The following tables are defined:
- tavg_fixed
- tmin_fixed
- tmax_fixed
- prcp_fixed
For each of these tables there is a corresponding view that splits the data out into one record per month. These views are named the same as the table, but without the "_fixed" on the end. So the temperature average (tavg) view is named "tavg" and so on. The views have the following fields:
- tavg(station,year,month,value,celsius,fahrenheit,dmflag,qcflag,dsflag)
- tmin(station,year,month,value,celsius,fahrenheit,dmflag,qcflag,dsflag)
- tmax(station,year,month,value,celsius,fahrenheit,dmflag,qcflag,dsflag)
- prcp(station,year,month,value,mm,inches,dmflag,qcflag,dsflag)
In each case the 'value' field is the original value from the USHCN file. In the case of temperature, this is an integer value in degrees celsius times 100. In the case of precipitation, it's in millimeters times 10. The views adjust these stored values into usable units and present them in other fields (celsius, fahrenheit, mm, inches). Note that in all cases a value of -9999 indicates a missing value. This sentinal value is preserved across the conversions meaning that the celsius, fahrenheit, mm, and inches values will still be -9999 for missing values.
In addition to the main data files there are the following support tables
- stations_fixed
- dmflag
- qcflag
- dsflag
There is a stations view that breaks the fixed stations record into usable fields:
- stations(station,lat,lon,elevation,state,name,comp1,comp2,comp3,UTCoffset)
The USHCN reports use a "station ID" which can be translated to a real location via the stations view. Each value stored in the data tables (tavg, tmin, tmax, prcp) includes 3 flags, DM (data measurement), QC (quality control), and DS (data source). Explainations of these are represented in the associated 'flag' table. This data is taken from seciton 2.2.1 of the USHCN readme file.
Finally there is a 'combined' view that aggregates all the information by station into a single record. Note that this is a complex join and so it takes significant work for SQLite to process this.
-
combined(station,lat,lon,elevation,state,name,comp1,comp2,comp3,UTCoffset,year,month,min_c,min_f,min_dm,min_qc,min_ds,max_c,max_f,max_dm,max_qc,max_ds,avg_c,avg_f,avg_dm,avg_qc,avg_ds,mm,inches,prcp_dm,prcp_qc,prcp_ds)
-
station: the station identifier
-
lat: the station's latitude
-
lon: the station's longitude
-
state: the state the station is located in
-
comp1 - comp3: previous stations combined into this one, see USHCN documentation
-
UTCoffset: the offset from UTC where this station is located
-
year: the year for these readings
-
month: the month for these readings
-
min_c: tmin (monthly mean minimum temperature) in celsius
-
min_f: tmin in fahrenheit
-
min_dm, min_qc, min_ds: flags for the minimum temperature
-
max_, avg_: same as mininimum above but for maximum and average temperature
-
mm: precipitation in millimeters
-
inches: precipitation in inches
-
prcp_dm, prcp_qc, prcp_ds: flags for the precipitation value
In order to use this database you will need to download data from USHCN. Data will be loaded into the following tables as indicated. USHCN has 3 formats of the same data. The RAW format is exactly as it was reported by the station. The TOB format has been corrected for time of observation errors. And finally the FLs.52j data has been fully adjusted based on NOAAs algorithim as described in detail in their documentation (see their readme for more information).
File | Table |
---|---|
ushcn-v2.5-stations.txt | stations_fixed |
ushcn.prcp.latest.[format].tar.gz | prcp_fixed |
ushcn.tavg.latest.[format].tar.gz | tavg_fixed |
ushcn.tmax.latest.[format].tar.gz | tmax_fixed |
ushcn.tmin.latest.[format].tar.gz | tmin_fixed |
The station text file can be imported into the SQLite database directly (.import ushcn-v2.5-stations.txt stations_fixed
). For the other data, the downloaded file needs to be unzipped. Instructions for doing this based on your operating system are included in section 1.3 of the USHCN readme. Once unzipped you will have one file per station per element. You can then import each station file into the appropriate table shown above (e.g. .import USH00011084.raw.tmin tmin_fixed
) . If you plan to use the combined view it is strongly encouraged to only import station data of interest as there are no indexes used in the database, therefore the more data you have the longer the processing will take.
If you find that you need to analyze more data than the unindexed views can handle, you can always create real tables from the views by doing something like create table tmin_r as select * from tmin;
to materialize the view into an actual table. Then you can create indexes to speed up searches, joins, etc. Since the views are just based on table (view) names, you can even replace a view with a materialized table by creating the table as above, then drop the original view (drop view tmin;
) and then rename the table to the view name (alter table tmin_r rename to tmin;
). Now the 'combined' view will use the materialized data and any associated indexes rather than the view. Just keep in mind that uploading new data to the underlying 'fixed' (tmin_fixed in this example) table will no longer be reflected in the materialized data (obviously). One could create a more complex system using triggers to automatically update the materialized data each time new records were inserted into the fixed tables, but that's beyond the scope of what I was trying to do here.
Hopefully others will find this helpful.