We have some MySQL tables with 100,000 to 10,000,000 records. Some of the fields are VARCHAR(100) when in fact no entry exceeds 11 characters.
Clearly we are using up way more space then we should be... If one VARCHAR(100) field for a million-record table uses 100MB of space, then we might be wasting as much as several GB of space.
If we were to streamline these tables, and reduce the VARCHAR fields to their proper size, would it help us with more than just storage space? Could it possibly improve the lookup times for queries?
As of MySQL documentation to Data type storage requirements the varchar type stores the values as follows:
L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes, where L represents the actual length in bytes of a given string value
Seems to me that if your plan is to change the type from VARCHAR(100) to VARCHAR(11) it will not affect query performance because MySQL already stores the value on its "optimum".
If you had a type CHAR(100) your strings with less than 100 characters would be right padded with blank spaces and in this case you will have a bad space consumption and I think that a bad query performance too.
The length of CHAR type, referring the documentation, is:
M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set, where M represents the declared column length in characters
But if all your records have fixed length 11 you should use CHAR(11) and it will improve the storage and performance of queries.
Another important point about string storage refers to the char set, as says in documentation:
To calculate the number of bytes used to store a particular CHAR, VARCHAR, or TEXT column value, you must take into account the character set used for that column and whether the value contains multi-byte characters. In particular, when using the utf8 Unicode character set, you must keep in mind that not all characters use the same number of bytes and can require up to three bytes per character.
Hope it helps!
I don't know the specifics of the mysql implementation, but I do know the typical implementation of a relational database, and in that implementation it does help.
Typically, records are stored consecutively in a file called a RID table. The record number in the RID table (using zero based counting) times the record size is an offset to where in the file the record is stored.
If the record size is smaller, then more records from the RID table fit into a disk sector fetched from the disk and more records fit into memory.
Even with a different implementation, a smaller record buffer allows more records to be cached in memory, which can reduce the number of disk accesses.