This gist explains how to interpret the DUSCMPUB-formatted mortality data found here in conjunction with the reference PDF.
The reference PDF uses 3 components to describe the type of data, its location within the row, and its size:
- data item: A specific datapoint, for example, a value representing the highest completed level of education.
- tape location: The column within the line where a data item is located. Each character in a line represents a column, with each line having 472 columns.
- size: The number of columns used to represent a particular data item.
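As a minimal sketch of how tape location and size translate into code, the snippet below slices one data item out of a fixed-width row. The function name `extract_item` and the sample string are hypothetical, and the sketch assumes tape locations are 1-based (the first character of a line is column 1), which is why 1 is subtracted before slicing.

```python
def extract_item(line: str, tape_location: int, size: int) -> str:
    """Return the characters of one data item from a fixed-width row.

    Assumes tape locations are 1-based, so column 1 is line[0].
    """
    if tape_location < 1 or size < 1:
        raise ValueError("tape_location and size must be positive")
    start = tape_location - 1  # convert the 1-based column to a 0-based index
    return line[start:start + size]


# A made-up 25-column string used only for illustration, not real data.
sample = "ABCDEFGHIJKLMNOPQRS1UVWXY"
print(extract_item(sample, 20, 1))  # prints 1
```

A real row would be read from the data file without stripping or collapsing whitespace, since blank columns are part of the layout.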
The reference PDF contains one or more tables for each data item describing how it is to be interpreted.
Page 3 of the reference PDF says that the data item resident status is at tape location 20 with a size of 1. Referencing figure 1 below, the value at column 20 is `1`:
1 3101 M1084 422210 1M1 2015U7BN I500230 067 22 0211I500 61L031 02 I500 L031 01 11 100 601
Figure 1: An example row of data in DUSCMCPUB format.
To understand what a `1` means when representing resident status, we reference table 1 below:
1 ... RESIDENTS
State and County of Occurrence and Residence are the same.
2 ... INTRASTATE NONRESIDENTS
State of Occurrence and Residence are the same, but County is different.
3 ... INTERSTATE NONRESIDENTS
State of Occurrence and Residence are different, but both are in the U.S.
4 ... FOREIGN RESIDENTS
State of Occurrence is one of the 50 States or the District of Columbia, but Place of Residence is outside of the U.S.
Table 1: A definition of the numerical values used to represent resident status, taken from page 3 of the reference PDF.
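One way to apply table 1 in code is a plain lookup from the one-character code to its label. This is only a sketch: the names `RESIDENT_STATUS` and `decode_resident_status` are hypothetical, and the fallback string for unknown codes is an assumption rather than anything the reference PDF defines.

```python
# Labels transcribed from table 1; keys are the single character at column 20.
RESIDENT_STATUS = {
    "1": "RESIDENTS",
    "2": "INTRASTATE NONRESIDENTS",
    "3": "INTERSTATE NONRESIDENTS",
    "4": "FOREIGN RESIDENTS",
}


def decode_resident_status(code: str) -> str:
    """Translate a resident status code into its table 1 label."""
    # Unknown codes are reported rather than raising, so a bad row is visible.
    return RESIDENT_STATUS.get(code, f"unrecognized code {code!r}")


print(decode_resident_status("1"))  # prints RESIDENTS
```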
Some data items have an "internal format." For example, education data starts at tape location 61 and has a size of 4, but consists of 3 distinct values, which are broken down in table 2 below:
61-62: Education (1989 revision)
63: Education (2003 revision)
64: Education Reporting flag
Table 2: A breakdown of the columns used to represent education data, taken from page 5 of the reference PDF.
Table 2 indicates that column 64 describes which version of education data is present:
0 ... 1989 revision of education item on certificate
1 ... 2003 revision of education item on certificate
2 ... no education item on certificate
Table 3: A definition of values used to represent the version of education data present, taken from page 5 of the reference PDF.
In other words:
- If the value at column 64 is `0`, the relevant value is at columns 61 and 62
- If the value at column 64 is `1`, the relevant value is at column 63
- If the value at column 64 is `2`, there is no education data present
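The three cases above can be sketched as a small function that takes the 4-character education data item (columns 61 through 64) and returns which revision applies together with the raw code. The name `select_education` and the tuple return shape are hypothetical choices made for this illustration.

```python
def select_education(field: str):
    """Pick the relevant education code from the 4-character data item.

    field[0:2] is columns 61-62 (1989 revision), field[2] is column 63
    (2003 revision), and field[3] is column 64 (the reporting flag).
    Returns (revision, code), or (None, None) when no education item exists.
    """
    if len(field) != 4:
        raise ValueError("education data item must be exactly 4 characters")
    flag = field[3]
    if flag == "0":
        return ("1989", field[0:2])
    if flag == "1":
        return ("2003", field[2])
    if flag == "2":
        return (None, None)
    raise ValueError(f"unexpected education reporting flag {flag!r}")


# Made-up field values for illustration only.
print(select_education("  31"))  # prints ('2003', '3')
print(select_education("12 0"))  # prints ('1989', '12')
```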
We then reference table 4 to interpret that value, depending on which revision of the education data is present:
Education (1989 revision)
00 ... No formal education
01-08 ... Years of elementary school
09 ... 1 year of high school
10 ... 2 years of high school
11 ... 3 years of high school
12 ... 4 years of high school
13 ... 1 year of college
14 ... 2 years of college
15 ... 3 years of college
16 ... 4 years of college
17 ... 5 or more years of college
99 ... Not stated
Education (2003 revision)
Field is blank for registration areas that are using the 1989 revision format of the item.
1 ... 8th grade or less
2 ... 9 - 12th grade, no diploma
3 ... high school graduate or GED completed
4 ... some college credit, but no degree
5 ... Associate degree
6 ... Bachelor’s degree
7 ... Master’s degree
8 ... Doctorate or professional degree
9 ... Unknown
Table 4: A definition of values used to represent level of education, broken down by format, taken from page 5 of the reference PDF.
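Table 4 can likewise be expressed as two lookup tables, one per revision. The sketch below is illustrative only: `EDUCATION_1989`, `EDUCATION_2003`, and `describe_education` are hypothetical names, the wording for codes 01 through 08 (appending the year count in parentheses) is a choice made here, and a blank 2003 field is not given special handling.

```python
# Values transcribed from table 4; keys are the raw codes as strings.
EDUCATION_1989 = {
    "00": "No formal education",
    "09": "1 year of high school",
    "10": "2 years of high school",
    "11": "3 years of high school",
    "12": "4 years of high school",
    "13": "1 year of college",
    "14": "2 years of college",
    "15": "3 years of college",
    "16": "4 years of college",
    "17": "5 or more years of college",
    "99": "Not stated",
}
# Codes 01-08 share one meaning; the code itself is the number of years.
for years in range(1, 9):
    EDUCATION_1989[f"{years:02d}"] = f"Years of elementary school ({years})"

EDUCATION_2003 = {
    "1": "8th grade or less",
    "2": "9 - 12th grade, no diploma",
    "3": "high school graduate or GED completed",
    "4": "some college credit, but no degree",
    "5": "Associate degree",
    "6": "Bachelor's degree",
    "7": "Master's degree",
    "8": "Doctorate or professional degree",
    "9": "Unknown",
}


def describe_education(revision: str, code: str) -> str:
    """Look up an education code in the table for the given revision."""
    tables = {"1989": EDUCATION_1989, "2003": EDUCATION_2003}
    if revision not in tables:
        raise ValueError(f"unknown revision {revision!r}")
    return tables[revision].get(code, f"unrecognized code {code!r}")


print(describe_education("2003", "3"))  # prints high school graduate or GED completed
```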
Using these references, we can interpret the first two data items from figure 1 as being a `RESIDENT` with an education level of `high school graduate or GED completed`. Using the reference PDF, all data items in a row of DUSCMCPUB-formatted data can be meaningfully interpreted.