I think that password entropy could be calculated from the standard deviation of the code points: log2((deviation * 2) ** length).
The log2 expresses the entropy in bits.
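Written out as a formula, assuming c_1 to c_n are the password's code points, n is its length, and σ is their population standard deviation (dividing by n, as the Ruby script in this post does):

```latex
\bar{c} = \frac{1}{n}\sum_{i=1}^{n} c_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(c_i - \bar{c}\right)^2}, \qquad
H = \log_2\!\left((2\sigma)^n\right) = n \log_2(2\sigma)
```

The second form gives the same number without computing the potentially huge power (2σ)^n.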
Unicode code points are grouped into blocks: all Latin letters have close code points, and the same goes for other alphabets and for mathematical symbols. So the spread of the code points used in a password gives a rough idea of how large its character set is, and it makes sense to compute the password entropy from the standard deviation of all the code points used.
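A quick sanity check of what 2σ measures, assuming an alphabet of k consecutive code points that are all used equally often: the population variance of k consecutive integers is (k**2 - 1) / 12, so for the 26 lowercase Latin letters 2σ is exactly 15. The helper name `population_sd` is hypothetical, for illustration only.

```ruby
# Population standard deviation (divides by n), same definition as the script's deviation.
def population_sd(values)
  mean = values.sum.to_f / values.size
  Math.sqrt(values.sum { |v| (v - mean)**2 } / values.size)
end

# An alphabet of k consecutive code points: 'a'..'z', so k = 26.
letters = ('a'..'z').map(&:ord)
k = letters.size

two_sigma = 2 * population_sd(letters)
# Closed form for k consecutive integers: variance = (k**2 - 1) / 12.
closed_form = 2 * Math.sqrt((k**2 - 1) / 12.0)

puts "k = #{k}, 2 * sigma = #{two_sigma}, closed form = #{closed_form}"
# prints "k = 26, 2 * sigma = 15.0, closed form = 15.0"
```

Under this reading, an evenly used block of k code points counts as an alphabet of roughly k / √3 ≈ 0.58 k symbols.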
However, I don't get the same entropy as the famous XKCD 936.
This is probably because XKCD computes the entropy from its generation rules, whereas I compute it from the result.
correcthorsebatterystaple is 4 English words.
If you know the rules, the entropy is log2(number_of_english_words ** 4).
But for the formula, it's just 25 Latin characters.
So there is an inside entropy (knowing the rules) and an outside entropy (not knowing the rules).
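To put numbers on both views of correcthorsebatterystaple: the inside entropy below assumes, as XKCD 936 does, a list of 2048 (2**11) common words, which yields its 44 bits; the outside entropy applies the code-point formula to the raw characters. Both method names are hypothetical.

```ruby
# Inside entropy: the attacker knows the rule "4 words from a 2048-word list".
# 2048 is the dictionary size assumed by XKCD 936 (11 bits per word).
def inside_entropy(dictionary_size, word_count)
  Math.log2(dictionary_size**word_count)
end

# Outside entropy: the code-point formula n * log2(2 * sigma), population sigma.
def outside_entropy(string)
  codes = string.codepoints
  mean = codes.sum.to_f / codes.size
  sigma = Math.sqrt(codes.sum { |c| (c - mean)**2 } / codes.size)
  sigma > 0 ? codes.size * Math.log2(2 * sigma) : 0
end

inside = inside_entropy(2048, 4)
outside = outside_entropy("correcthorsebatterystaple")
puts "inside: #{inside.round}, outside: #{outside.round}"
# prints "inside: 44, outside: 98"
```

The outside figure is more than twice the inside one, because the formula cannot see that the 25 characters were drawn as 4 dictionary words.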
But according to this formula, ∀∁∂∃∄∅∆∇∈∉∊∋∌∍∎∏ is exactly as strong as abcdefghijklmnop.
This isn't surprising if you look at the Unicode table: both strings are runs of 16 consecutive code points.
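Concretely, the symbols are U+2200 to U+220F and the letters are U+0061 to U+0070, so each symbol is the matching letter shifted by the same constant. Shifting every value by a constant leaves the standard deviation unchanged, hence the identical scores. A minimal check, with a hypothetical `sigma` helper:

```ruby
# Population standard deviation of a string's code points.
def sigma(string)
  codes = string.codepoints
  mean = codes.sum.to_f / codes.size
  Math.sqrt(codes.sum { |c| (c - mean)**2 } / codes.size)
end

symbols = "∀∁∂∃∄∅∆∇∈∉∊∋∌∍∎∏"
latin = "abcdefghijklmnop"

# Every pair differs by the same offset: 0x2200 - 0x61 = 8607.
offsets = symbols.codepoints.zip(latin.codepoints).map { |s, l| s - l }
p offsets.uniq                    # prints [8607]

# Same spread, so same sigma and therefore the same entropy.
p sigma(symbols) == sigma(latin)  # prints true
```

The same holds for any run of 16 consecutive code points, wherever it sits in the Unicode table.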
This is just an idea I had in the middle of the night. I'm posting it here unfinished so I can focus on something else.
Below is a small Ruby script.
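module SolidPassword
  # Population variance of a list of code points (divides by n).
  def self.variance(codes)
    return 0.0 if codes.empty?

    average = codes.sum.to_f / codes.size
    codes.sum { |code| (code - average)**2 } / codes.size
  end

  # Population standard deviation of a list of code points.
  def self.deviation(codes)
    Math.sqrt(variance(codes))
  end

  # Entropy in bits: log2((2 * sigma) ** length), computed as
  # length * log2(2 * sigma) so long strings cannot overflow a Float.
  def self.entropy(string)
    sigma = deviation(string.codepoints)
    sigma > 0 ? string.size * Math.log2(2 * sigma) : 0
  end
end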
# https://en.wikipedia.org/wiki/Unicode#Standardized_subsets
passwords = [
  "password",
  "Tr0ub4dor&3",
  "Troubador&3",
  "troubador&3",
  "Tr0ub4dor",
  "Tr0ubador",
  "troubador",
  "correcthorsebatterystaple",
  "mot de passe", # "password" in French
  "κωδικός πρόσβασης", # "password" in Greek
  "暗号", # "password" in Chinese
  "パスワード", # "password" in Japanese
  "пассщорд", # "password" transliterated into Cyrillic
  "a9624317cb0f55b2ba1e04485b05cf6f77961ef5", # SHA-1 hex digest
  "∀∁∂∃∄∅∆∇∈∉∊∋∌∍∎∏",
  "abcdefghijklmnop"
]
puts passwords.map { |str| "#{str}: #{SolidPassword.entropy(str).round}" }
Output:
password: 31
Tr0ub4dor&3: 65
Troubador&3: 62
troubador&3: 63
Tr0ub4dor: 51
Tr0ubador: 48
troubador: 35
correcthorsebatterystaple: 98
mot de passe: 70
κωδικός πρόσβασης: 149
暗号: 24
パスワード: 28
пассщорд: 31
a9624317cb0f55b2ba1e04485b05cf6f77961ef5: 220
∀∁∂∃∄∅∆∇∈∉∊∋∌∍∎∏: 51
abcdefghijklmnop: 51
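The two-character passwords have a simple closed form. With only two code points a and b, σ = |a − b| / 2, so 2σ = |a − b| and the formula reduces to 2 · log2(|a − b|). A quick check on 暗号 (U+6697, U+53F7), assuming the same n * log2(2σ) computation as the script; the standalone `entropy` method here is a hypothetical re-implementation, not the script's module:

```ruby
# Entropy as in the script: n * log2(2 * sigma), population sigma.
def entropy(string)
  codes = string.codepoints
  mean = codes.sum.to_f / codes.size
  sigma = Math.sqrt(codes.sum { |c| (c - mean)**2 } / codes.size)
  sigma > 0 ? codes.size * Math.log2(2 * sigma) : 0
end

a, b = "暗号".codepoints # 0x6697 and 0x53F7
closed_form = 2 * Math.log2((a - b).abs)

puts entropy("暗号").round # prints 24
puts closed_form.round     # prints 24
```

So a two-character password scores purely on how far apart its two code points are in the Unicode table.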