Last active December 10, 2015 23:29
Calculating the "Linguistic Diversity Index" (% chance that two random residents of a given country have different native languages)
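As a sketch of the quantity being computed, here is the formula read off the index calculation in the script. The symbols are notation introduced here, not names from the script: $s_i$ is the number of native speakers of language $i$, $n$ is the number of languages recorded for the country, and $S$ is the total number of speakers.

```latex
\[
\mathrm{LDI} = 100 \left( 1 - \frac{\sum_{i=1}^{n} s_i^{2}}{S^{2}} \right),
\qquad S = \sum_{i=1}^{n} s_i
\]
```

Two residents drawn independently both speak language $i$ with probability $(s_i / S)^2$, so the sum is the chance that they share a native language and the index is its complement. This treats the draw as being with replacement, which makes little difference for large populations. For example, a country split evenly between two languages scores 50, and a country with a single language scores 0.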
require 'mechanize'

agent = Mechanize.new
# Captures the speaker count that follows the "]" at the end of a language entry.
language_regex = /\]\r\n\t\t\t\t\t\t([0-9,]*)(\.| )/

# Parses the speaker count out of a table cell; yields 0 when nothing matches.
parse_speakers = lambda do |cell|
  language_regex.match(cell.children[0].content)[1].gsub(',', '').to_i rescue 0
end

index_page = agent.get('http://www.ethnologue.com/country_index.asp?place=all')
index_page.links_with(:href => /show_country.asp/).each do |country_link|
  country = country_link.text.strip
  speakers = []
  country_page = agent.get('http://www.ethnologue.com/' + country_link.href)

  # get language speakers from table
  speakers += (country_page / "#main table p").map(&parse_speakers)

  # get immigrant language speakers
  description = (country_page / "#main blockquote")[0].children[0].content
  immigrant_languages_description = description.scan(/Immigrant languages:.*?\./)[0]
  if immigrant_languages_description
    speakers += immigrant_languages_description.scan(/\(([0-9,]*)\)/)
                                               .map { |x| x[0].gsub(',', '').to_i }
  end

  # get language speakers from subpages (if any)
  country_page.links_with(:href => /show_country.asp/).each do |subpage_link|
    subpage = agent.get('http://www.ethnologue.com/' + subpage_link.href)
    # map returns [] for a subpage without language rows, so nil is never added
    speakers += (subpage / "#main table p").map(&parse_speakers)
  end

  # compute diversity index
  total_speakers = speakers.inject(0, &:+)
  diversity_index = if total_speakers > 0
    (1 - speakers.map { |x| x**2 }.inject(0, &:+).to_f / total_speakers**2) * 100
  else
    "Insufficient data for meaningful answer"
  end
  puts "#{country.ljust(40)}\t#{diversity_index}"
end
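The index calculation at the end of the script can be tried without scraping anything. The following is a minimal sketch, assuming a hypothetical helper named `diversity_index` (not part of the original script) that takes an array of native-speaker counts, one entry per language, and returns `nil` where the script prints its "Insufficient data" message.

```ruby
# Hypothetical helper, not part of the original script: computes the index
# from an array of native-speaker counts, one entry per language.
def diversity_index(speakers)
  total = speakers.inject(0, :+)
  # No speaker data at all: the probability is undefined.
  return nil if total.zero?

  sum_of_squares = speakers.inject(0) { |acc, count| acc + count**2 }
  # Chance (in percent) that two randomly drawn residents differ in native language.
  (1 - sum_of_squares.to_f / total**2) * 100
end

puts diversity_index([500, 500])     # two equally sized languages => 50.0
puts diversity_index([1000])         # a single language => 0.0
puts diversity_index([900, 50, 50])  # one dominant language, two small ones
puts diversity_index([]).inspect     # no data => nil
```

Returning `nil` instead of a message string keeps the result numeric or absent, which is easier to sort or filter afterwards; the original script mixes floats and a string in the same column.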
Algeria	31.699673192820164
Angola	81.29370798817328
Benin	92.11057717539474
Botswana	51.22767828568089
British Indian Ocean Territory	0.0
Burkina Faso	77.2756605083412
Burundi	0.4098220647649864
Cameroon	94.63046943265564
Cape Verde Islands	6.978565492626021
Central African Republic	95.9003856837838
Chad	94.42896815988286
Comoros	54.54390605981542
Congo	85.77254657779197
Côte d’Ivoire	91.65124005127421
Democratic Republic of the Congo	94.90941126069495
Djibouti	57.05889315356345
Egypt	53.57875674539042
Equatorial Guinea	41.68156707806562
Eritrea	62.7137123854154
Ethiopia	86.39875771469482
Gabon	76.18998906031202
Gambia	78.04879967155473
Ghana	80.51300067120881
Guinea	75.4053290379022
Guinea-Bissau	87.14282048237423
Kenya	87.66138280471371
Lesotho	26.043526957427876
Liberia	91.64831584153298
Libya	50.01589550409926
Madagascar	72.1205706762556
Malawi	52.47145692691115
Mali	87.40234794866716
Mauritania	18.340606668180826
Mauritius	58.69801400867735
Mayotte	45.93301997343626
Morocco	46.57984833648553
Mozambique	93.23020888409424
Namibia	80.9691286020928
Niger	63.982686924673885
Nigeria	87.5880637525699
Réunion	6.611325321420535
Rwanda	0.3991325973138782
Saint Helena	0.0
São Tomé e Príncipe	38.96049208000429
Senegal	77.47010999068402
Seychelles	6.674832661570374
Sierra Leone	82.15575424124413
Somalia	34.8653561366311
South Africa	87.40920547395258
Sudan	54.468506177066715
Swaziland	21.03374771229205
Tanzania	94.68256584379323
Togo	89.7438085924971
Tunisia	1.1554334899803953
Uganda	92.7806797491654
Western Sahara	Insufficient data for meaningful answer
Zambia	87.7741043652666
Zimbabwe	51.81854504340745
Anguilla	14.096546829889844
Antigua and Barbuda	24.840236686390536
Argentina	23.305694740552884
Aruba	38.658268085533635
Bahamas	38.61544731346203
Barbados	9.101967993079585
Belize	76.93329284299442
Bermuda	8.138973497503343
Bolivia	68.05554408956584
Brazil	10.337024836154285
British Virgin Islands	16.73426914990762
Canada	59.93624971621565
Cayman Islands	54.72500325835451
Chile	3.4993045673172074
Colombia	3.6925509076310314
Costa Rica	4.952362177378156
Cuba	0.06995102571299983
Dominica	31.316000918273645
Dominican Republic	5.3372797375142
Ecuador	29.123402165765732
El Salvador	0.4327304895766826
Falkland Islands	0.0
French Guiana	44.99106681494588
Greenland	24.18808769285289
Grenada	6.434382826213081
Guadeloupe	8.390447425825997
Guatemala	68.98774716682014
Guyana	7.960544985142537
Haiti	0.017238407043085324
Honduras	5.56690668783939
Jamaica	1.1129074207005596
Martinique	4.274563437697953
Mexico	13.876397101094373
Montserrat	2.573565033512437
Netherlands Antilles	28.06978816243101
Nicaragua	8.225830508450994
Panama	32.37349331972893
Paraguay	35.201130577432096
Peru	38.75688964989019
Puerto Rico	5.823930331220106
Saint Kitts and Nevis	1.0152019991670147
Saint Lucia	1.984912155074403
Saint Pierre and Miquelon	13.427438016528924
Saint Vincent and the Grenadines	0.8628857113705646
Suriname	78.8380729218009
Trinidad and Tobago	69.63512296915673
Turks and Caicos Islands	14.581068310616452
United States	31.86691914915486
U.S. Virgin Islands	35.408447211924155
Uruguay	9.227889391811617
Venezuela	5.063642596432083
Afghanistan	74.06260141634601
Armenia	15.937405486856449
Azerbaijan	45.53382480660305
Bahrain	66.29017268103308
Bangladesh	38.65796084557438
Bhutan	88.37058473560809
Brunei	58.530730550563305
Cambodia	16.934725805345497
China	50.94033487596905
Cyprus	33.146694856656445
East Timor	89.67768750191938
Georgia	58.2130826546376
India	94.00709564817782
Indonesia	81.6407750978348
Iran	82.19822545378335
Iraq	67.44527018279865
Israel	66.51619374537717
Japan	3.3222413272624762
Jordan	49.56195172750171
Kazakhstan	69.86098216151181
Korea, North	0.0
Korea, South	0.301941997578703
Kuwait	55.64118217196852
Kyrgyzstan	67.00557480388112
Laos	67.373776901974
Lebanon	16.1187517582416
Malaysia	74.35744255468673
Maldives	0.7803670668814044
Mongolia	33.153601936743286
Myanmar	53.45618901811102
Nepal	73.7517920705598
Oman	69.31238634619146
Pakistan	76.20274896145973
Palestinian West Bank and Gaza	20.80927249779493
Philippines	85.53401557476266
Qatar	60.786779506476684
Russian Federation	24.995010207148162
Saudi Arabia	13.799749634195669
Singapore	77.33491872022185
Sri Lanka	31.96150969145147
Syria	52.702457034896156
Taiwan	48.83268568594645
Tajikistan	48.46487950273295
Thailand	74.05745652100431
Turkey	27.117282890229287
Turkmenistan	38.55994503735436
United Arab Emirates	77.71106874603448
Uzbekistan	43.6943321595747
Viet Nam	24.23233459945745
Yemen	57.88924643577778
Albania	57.74006706496839
Andorra	57.4397807553035
Austria	53.483226201624355
Belarus	39.65698563991642
Belgium	74.7369790880361
Bosnia and Herzegovina	65.9494825787697
Bulgaria	26.34263804471374
Croatia	21.062584554220997
Czech Republic	14.588923337734116
Denmark	5.48362302023897
Estonia	45.71375248167826
Finland	14.973124353299482
France	26.813924266089685
Germany	36.92089095990695
Gibraltar	49.79188345473465
Greece	14.381700973809085
Hungary	2.4042740252828265
Iceland	0.0
Ireland	16.53529661205034
Italy	58.60722197497383
Latvia	58.3538653000266
Liechtenstein	12.807973449406019
Lithuania	34.061970346000884
Luxembourg	48.85987807319667
Macedonia	57.81180262435082
Malta	41.194965421903476
Moldova	58.97702521384139
Monaco	52.123456790123456
Montenegro	56.70522262634346
Netherlands	29.21576730753316
Norway	7.385205099860115
Poland	6.609509310244688
Portugal	2.199399068252028
Romania	16.880122492038062
Russian Federation	24.995010207148162
San Marino	49.40978657922036
Serbia	62.84260207262302
Slovakia	23.657541831815355
Slovenia	17.40638784015417
Spain	51.17915479068357
Sweden	14.653394547023424
Switzerland	57.734424451117384
Turkey	27.117282890229287
Ukraine	49.48674649222507
United Kingdom	13.410166985364558
Vatican State	0.0
American Samoa	11.60779188508898
Australia	12.492019130827991
Cook Islands	60.165320871372295
Fiji	60.84507344863677
French Polynesia	59.61834448804556
Guam	65.49054866014649
Kiribati	3.297694785495231
Marshall Islands	0.0
Micronesia	77.20501006525059
Nauru	59.633273819088004
New Caledonia	83.37247480837159
New Zealand	10.701019572854731
Niue	7.126551422429783
Norfolk Island	31.065088757396452
Northern Mariana Islands	64.21251949209179
Palau	7.7521647037822605
Papua New Guinea	99.03440960795126
Pitcairn	0.0
Samoa	0.2006016031999458
Solomon Islands	96.71358879627837
Tokelau	5.365041617122468
Tonga	1.4127049108573209
Tuvalu	13.908052073396382
Vanuatu	97.35776298681776
Wallis and Futuna	40.70498483891487