I am trying to learn how to defend against security attacks on websites. The link below shows a good tutorial, but I am puzzled by one statement:
In http://google-gruyere.appspot.com/part3#3__client_state_manipulation , under "Cookie manipulation", Gruyere says Pythons hash is insecure since it hashes from left-to-right.
The Gruyere application is using this to encrypt data:
# global cookie_secret; only use positive hash values
h_data = str(hash(cookie_secret + c_data) & 0x7FFFFFF)
c_data is a username; cookie_secret is a static string (which is just '' by default)
I understand that in more secure hash functions, one change generates a whole new result, but I don't understand why this insecure, because different c_data generates whole different hashes!
EDIT: How would one go about beating a hash like this?
What the comment may be trying to get at is that for most hash functions, if you are given
HASH(m) then it is easy to calculate
HASH(m . x), for any
. is concatenation).
Therefore, if you are user
ro, and the server sends you
HASH(secret . ro), then you can easily calculate
HASH(secret . root), and login as a different user.
I think that's just a bad explanation there. Python's
hash() is insecure because it's easy to find collisions, but "hashes from left to right" has nothing to do with why it's easy to find collisions. Cryptographically secure hashes also process data strictly in sequence; they're likely to operate on data 128 or 256 bits at a time rather than one byte at a time, but that's just a detail of the implementation.
(It should be said that
hash() being insecure is not a bug in Python, because that's not what it's for. It's an exposed detail of the implementation of Python's dictionaries as hash tables, and you generally don't want a secure hash function for your hash table, because that would slow it down so much that it would defeat the purpose. Python does provide secure hash functions in the hashlib module.)
(The use of an insecure hash is not the only problem with the code you show, but it is by far the most important problem.)
Python's default hashing algorithm (for all types, but it has the most severe consequences for strings as those are commonly hashed for security) is geared towards running fast and playing nice with the implementation of dicts. It's not a cryptographic hashing function, you shouldn't use it for security. Use
hashlib for this.
The python built-in hash function is not intended for secure, cryptographic hashing. It's intention is to facilitate storing Python objects into dictionaries efficiently.
The internal hash implementations are too predictable (too many collisions) for secure uses. For example, the following assertions are all true:
hash('a') < hash('b') hash('b') < hash('c') hash('c') < hash('d')
This sequential nature makes for great dictionary storage behaviour, for which it was designed.
To create a secure hash, use the hashlib library instead.
One would go about "beating" a hash like that by appending their data to the end of the string being hashed and predicting the hash function output. Let me illustrate this:
Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> data = 'root|admin|author' >>> str(hash('' + data) & 0x7FFFFFF); '116042699' >>> data = 'root|admin|authos' >>> str(hash('' + data) & 0x7FFFFFF); '116042698' >>>
Empty string ('') is the cookie secret you mentioned to be an empty string. In this particular example, though not really exploitable, one can see that the hash changed by 1 and the last byte of
data changed "by one" too. Now, this example is not really an exploit (omitting the fact that creating a username of the
anything_here|admin format makes that user admin) because there's some data after the username (left to right) so even if you create a username that's very close to the one being attacked then the rest of the string changes the hash in a completely undesirable manner. However, if the cookie was in the form of
105770185|user07 instead of
105770185|user07||author then you'd easily create a user "user08" or "user06" and
computepredict the hash (hometask: what's the hash for "user08"? ).