You might already know that security is a very “special” area. You might also know that there are plenty of surprises when dealing with Unicode. As Mikael Goldman describes in his blog post “Creative usernames and Spotify account hijacking”, Spotify had a vulnerability simply because it allowed users to use Unicode characters in user names.
In my opinion, allowing user names with Unicode was not really a fault, and even using the “buggy” version of nodeprep.prepare(str) was not the problem. The problem in this case was using a display string (the login user name) as an identifier in subsequent processing.
So what would have been the alternative approach? I don’t trust strings, so for identifying objects (and users) I like GUIDs much more. Randomly generated GUIDs are practically impossible to guess, and they carry no information other than being an ID.
How would that have prevented the issue? As far as I understand what Mikael Goldman shared in his blog post, the problem was that the user requests a password reset for the user “ᴮᴵᴳᴮᴵᴿᴰ”. This name was canonicalized to “BIGBIRD” to look up the email address, and the canonicalized user name “BIGBIRD” was then used for the password update. In that process a second canonicalization took place, resulting in the user name “bigbird”, which is a different user.
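To make the double canonicalization concrete, here is a minimal Python sketch. The function `canonicalize` is a hypothetical stand-in that I made up for illustration: it lowercases first and then applies NFKC normalization. It is not the real nodeprep.prepare code, it only shares the relevant property that applying it twice gives a different result than applying it once.

```python
import unicodedata


def canonicalize(name: str) -> str:
    """Hypothetical, deliberately non-idempotent canonicalization.

    Lowercasing happens BEFORE compatibility normalization, so letters
    that only become plain ASCII capitals through NFKC are never lowercased
    on the first pass. This is an assumption for illustration, not the
    actual nodeprep implementation.
    """
    return unicodedata.normalize("NFKC", name.lower())


# The user name from the example, written with escapes: "ᴮᴵᴳᴮᴵᴿᴰ"
requested = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"

first_pass = canonicalize(requested)    # used to look up the email address
second_pass = canonicalize(first_pass)  # applied again during the update

print(first_pass)   # BIGBIRD
print(second_pass)  # bigbird
```

The two passes name two different accounts, which is exactly why a name that gets re-processed along the way makes a poor identifier.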
Let’s assume we use the canonicalized user name only for authentication and from then on use only the user’s ID (and assume that it is a GUID). In that case the update would have been done using the GUID for “ᴮᴵᴳᴮᴵᴿᴰ”, since the password reset request would carry that GUID instead of the user name itself. Likewise, the update statement against the user database would change the password for the GUID of user “ᴮᴵᴳᴮᴵᴿᴰ”.
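A rough sketch of that flow in Python could look like the following. Everything here is an assumption for illustration: the in-memory dictionaries stand in for the user database, the names `register`, `request_password_reset` and `complete_password_reset` are invented, `canonicalize` is the same hypothetical non-idempotent function as before, and passwords are stored in plain text only to keep the sketch short.

```python
import secrets
import unicodedata
import uuid
from dataclasses import dataclass


def canonicalize(name: str) -> str:
    # Hypothetical "buggy" canonicalization: not idempotent on purpose.
    return unicodedata.normalize("NFKC", name.lower())


@dataclass
class User:
    id: uuid.UUID
    display_name: str
    password: str  # plain text only for the sketch; hash it in real code


users_by_id = {}       # GUID -> User (stand-in for the user table)
id_by_login_name = {}  # canonical login name -> GUID (lookup index only)
reset_tokens = {}      # reset token -> GUID


def register(display_name: str, password: str) -> uuid.UUID:
    key = canonicalize(display_name)
    if key in id_by_login_name:
        raise ValueError("user name already taken")
    user = User(uuid.uuid4(), display_name, password)
    users_by_id[user.id] = user
    id_by_login_name[key] = user.id
    return user.id


def request_password_reset(login_name: str) -> str:
    # The name is used exactly once, to find the GUID.
    user_id = id_by_login_name[canonicalize(login_name)]
    token = secrets.token_urlsafe(32)
    reset_tokens[token] = user_id  # the token carries the GUID, not the name
    return token


def complete_password_reset(token: str, new_password: str) -> None:
    user_id = reset_tokens.pop(token)
    users_by_id[user_id].password = new_password  # update keyed by GUID


victim_id = register("bigbird", "victim-secret")
attacker_id = register("\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30", "attacker-old")

token = request_password_reset("\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30")
complete_password_reset(token, "attacker-new")

print(users_by_id[victim_id].password)    # victim-secret
print(users_by_id[attacker_id].password)  # attacker-new
```

Even with the flawed canonicalization still in place, the password change lands on the account that requested it, because nothing after the initial lookup touches the name again.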
As you can see, simply NOT using a string as an identifier gets rid of potential security issues. A GUID should be parsed and really used as a strongly typed GUID if you are using a language that supports strong typing. Even in Python, where no compiler checks type consistency, you can ensure that the ID is a valid GUID.
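In Python, one way to do that check is to parse every incoming ID with the standard library’s `uuid.UUID` before using it. The helper name `parse_user_id` below is my own invention, and this is only a sketch of the idea, not a complete input validation layer.

```python
import uuid


def parse_user_id(raw: object) -> uuid.UUID:
    """Return a uuid.UUID or raise; never pass a raw string further on."""
    if isinstance(raw, uuid.UUID):
        return raw
    if not isinstance(raw, str):
        raise TypeError("user ID must be a str or uuid.UUID")
    # uuid.UUID raises ValueError for anything that is not a well-formed GUID.
    return uuid.UUID(raw)


user_id = parse_user_id("12345678-1234-5678-1234-567812345678")
print(type(user_id).__name__)  # UUID

try:
    parse_user_id("bigbird")
except ValueError:
    print("rejected: not a GUID")
```

Code that only accepts `uuid.UUID` values after this point cannot be handed a user name by accident.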
As a general rule: NEVER use “names” as “identifiers”! For authentication you might need to look up the user by name, but after that you should always use the user’s ID for any further processing, not a name.