I would like to hash images that have been converted to byte arrays. The faster the process, the better, so I was wondering how much of a 300000-element byte array I really need to feed into the hash function (SHA-1 in this case) to get a unique hash string. Does anybody know whether, in image binaries, the first x bytes are all metadata? Is there a magic index number that I can use instead of the full length of the byte array? E.g. only hash the first .
My opinion is that this is a matter of the probability of getting a unique hash for each image. If you hash only the first 5000 bytes of a 300000-byte image, then two images that differ only after those first 5000 bytes will have exactly the same hash, because the hash function never sees the bytes that differ. This has nothing to do with SHA-1 specifically; it holds for any hash function applied to a truncated input.
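To make the point about truncated input concrete, here is a minimal sketch in Python using the standard `hashlib` module. The helper names `sha1_prefix` and `sha1_full` and the synthetic all-zero "images" are assumptions made up for illustration; they are not from the original question.

```python
import hashlib


def sha1_prefix(data: bytes, n: int = 5000) -> str:
    """Hash only the first n bytes of data (hypothetical helper)."""
    return hashlib.sha1(data[:n]).hexdigest()


def sha1_full(data: bytes) -> str:
    """Hash the whole byte array (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


# Two synthetic 300000-byte "images" that differ only in the very last byte.
image_a = bytes(300_000)
image_b = bytes(299_999) + b"\x01"

# The prefix hashes are identical, because the first 5000 bytes are identical.
print(sha1_prefix(image_a) == sha1_prefix(image_b))  # True

# The full hashes differ, because the full inputs differ.
print(sha1_full(image_a) == sha1_full(image_b))  # False
```

Whether this kind of collision matters in practice depends on how likely it is that two of your images share the same leading bytes.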
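You could also spread your 5000 bytes evenly over the 300000 bytes, for example by taking every 60th byte, so that the sample covers the whole image rather than just its beginning.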
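One possible way to sketch the evenly distributed sample in Python is with a strided slice. The function name `sha1_sampled` and its `sample_size` parameter are hypothetical, and the fixed-stride scheme is just one assumption about what "evenly" could mean here.

```python
import hashlib


def sha1_sampled(data: bytes, sample_size: int = 5000) -> str:
    """Hash roughly sample_size bytes taken at a regular stride across data.

    Hypothetical helper: inputs no longer than sample_size are hashed in full.
    """
    if len(data) <= sample_size:
        return hashlib.sha1(data).hexdigest()
    stride = len(data) // sample_size  # 60 for 300000 bytes and 5000 samples
    sample = data[::stride][:sample_size]  # bytes at indices 0, stride, 2*stride, ...
    return hashlib.sha1(sample).hexdigest()


# Synthetic 300000-byte "image" and a copy that differs at a sampled index.
image_a = bytes(300_000)
image_b = bytearray(image_a)
image_b[299_940] = 1  # 299940 is a multiple of 60, so this byte is sampled

print(sha1_sampled(image_a) == sha1_sampled(bytes(image_b)))  # False
```

Note that this only reduces the risk: a change that falls entirely between sampled positions still goes unnoticed, so the sampled hash remains a trade-off between speed and the chance of two different images getting the same hash.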