Created
April 4, 2017 19:29
-
-
Save tswast/19f1e7988095207e9dd34d9031e0c816 to your computer and use it in GitHub Desktop.
USA Names Conditional Probabilities
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#standardSQL | |
SELECT | |
a.name AS name, | |
a.state AS state, | |
a.gender AS gender, | |
a.year AS year, | |
a.number AS number, | |
(a.number / b.total_number) AS name_frequency | |
FROM | |
`bigquery-public-data.usa_names.usa_1910_current` a | |
JOIN ( | |
SELECT | |
name, | |
gender, | |
year, | |
SUM(number) AS total_number | |
FROM | |
`bigquery-public-data.usa_names.usa_1910_current` | |
GROUP BY | |
name, | |
gender, | |
year) b | |
ON | |
a.name = b.name | |
AND a.gender = b.gender | |
AND a.year = b.year |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Why is name frequency equal to “number / total number”? The name frequency values represent conditional probabilities, for example