The storage requirements for each of the data types supported by MySQL are listed here by category.
The maximum size of a row in a MyISAM table is
65,535 bytes. (However, each BLOB
or TEXT column contributes only
9-12 bytes toward this size.) This limitation may be shared by
other storage engines as well. See
Chapter 13, Storage Engines, for more information.
For tables using the NDBCLUSTER storage
engine, there is the factor of 4-byte
alignment to be taken into account when calculating
storage requirements. This means that all NDB
data storage is done in multiples of 4 bytes. Thus, a column
value that would take 15 bytes in a table using a storage engine
other than NDB requires 16 bytes in an
NDB table. This requirement applies in
addition to any other considerations that are discussed in this
section. For example, in NDBCLUSTER tables,
the TINYINT,
SMALLINT,
MEDIUMINT, and
INTEGER
(INT) column types each require 4
bytes storage per record due to the alignment factor.
An exception to this rule is the
BIT type, which is
not 4-byte aligned. In MySQL Cluster
tables, a BIT(M)
column takes M bits of storage space.
However, if a table definition contains 1 or more
BIT columns (up to 32
BIT columns), then
NDBCLUSTER reserves 4 bytes (32 bits) per row
for these. If a table definition contains more than 32
BIT columns (up to 64 such
columns), then NDBCLUSTER reserves 8 bytes
(that is, 64 bits) per row.
In addition, while a NULL itself does not
require any storage space, NDBCLUSTER
reserves 4 bytes per row if the table definition contains any
columns defined as NULL, up to 32
NULL columns. (If a MySQL Cluster table is
defined with more than 32 NULL columns up to
64 NULL columns, then 8 bytes per row is
reserved.)
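The alignment and reservation rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the function names are made up for this example and are not part of any MySQL tool.

```python
def ndb_aligned_size(raw_bytes: int) -> int:
    """Round a raw column size up to the next multiple of 4 bytes."""
    if raw_bytes < 0:
        raise ValueError("raw_bytes must be non-negative")
    return (raw_bytes + 3) // 4 * 4


def ndb_reserved_bytes(column_count: int) -> int:
    """Bytes reserved per row for BIT columns or, counted separately,
    for columns defined as NULL.

    0 columns -> 0 bytes, 1..32 -> 4 bytes, 33..64 -> 8 bytes.
    Counts above 64 are not covered by the text, so they are rejected.
    """
    if not 0 <= column_count <= 64:
        raise ValueError("the text only covers 0 to 64 columns")
    if column_count == 0:
        return 0
    return 4 if column_count <= 32 else 8


print(ndb_aligned_size(15))                           # 16
print([ndb_aligned_size(n) for n in (1, 2, 3, 4)])    # [4, 4, 4, 4]
print(ndb_reserved_bytes(3), ndb_reserved_bytes(40))  # 4 8
```

The second line of output corresponds to TINYINT, SMALLINT, MEDIUMINT, and INT, which all end up at 4 bytes per record.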
When calculating storage requirements for MySQL Cluster tables,
you must also remember that every table using the
NDBCLUSTER storage engine requires a primary
key; if no primary key is defined by the user, then a
“hidden” primary key will be created by
NDB. This hidden primary key consumes 31-35
bytes per table record.
You may find the ndb_size.pl utility to be
useful for estimating NDB storage requirements.
This Perl script connects to a current MySQL (non-Cluster)
database and creates a report on how much space that database
would require if it used the NDBCLUSTER storage
engine. See Section 17.10.14, “ndb_size.pl — NDBCLUSTER Size Requirement Estimator”,
for more information.
Storage Requirements for Numeric Types
| Data Type | Storage Required |
| TINYINT | 1 byte |
| SMALLINT | 2 bytes |
| MEDIUMINT | 3 bytes |
| INT, INTEGER | 4 bytes |
| BIGINT | 8 bytes |
| FLOAT(p) | 4 bytes if 0 <= p <= 24, 8 bytes if 25 <= p <= 53 |
| FLOAT | 4 bytes |
| DOUBLE [PRECISION], REAL | 8 bytes |
| DECIMAL(M,D), NUMERIC(M,D) | Varies; see following discussion |
| BIT(M) | approximately (M+7)/8 bytes |
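The two parameterized rows of the table above, FLOAT(p) and BIT(M), can be expressed as small Python helpers. This is a sketch of the table's arithmetic only; the function names are hypothetical, and the BIT result assumes that "(M+7)/8" means integer division.

```python
def float_storage_bytes(p: int) -> int:
    """Bytes for FLOAT(p): 4 for 0 <= p <= 24, 8 for 25 <= p <= 53."""
    if not 0 <= p <= 53:
        raise ValueError("precision p must be between 0 and 53")
    return 4 if p <= 24 else 8


def bit_storage_bytes(m: int) -> int:
    """Approximate bytes for BIT(M): (M + 7) / 8 with integer division."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return (m + 7) // 8


print(float_storage_bytes(24), float_storage_bytes(25))  # 4 8
print(bit_storage_bytes(1), bit_storage_bytes(9))        # 1 2
```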
The storage requirements for
DECIMAL (and
NUMERIC) are version-specific:
As of MySQL 5.0.3, values for
DECIMAL columns are represented
using a binary format that packs nine decimal (base 10) digits
into four bytes. Storage for the integer and fractional parts of
each value is determined separately. Each multiple of nine digits
requires four bytes, and the “leftover” digits
require some fraction of four bytes. The storage required for
excess digits is given by the following table:
| Leftover Digits | Number of Bytes |
| 0 | 0 |
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 3 |
| 6 | 3 |
| 7 | 4 |
| 8 | 4 |
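The packing rule and the leftover-digits table can be combined into a short calculation. The following Python sketch assumes a column declared as DECIMAL(M,D), where the integer part has M − D digits and the fractional part has D digits; the function names are illustrative only.

```python
# Bytes needed for the digits left over after grouping by nine,
# copied from the table above.
LEFTOVER_BYTES = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 4, 8: 4}


def decimal_part_bytes(digits: int) -> int:
    """Bytes for one part (integer or fractional) of a DECIMAL value."""
    full_groups, leftover = divmod(digits, 9)
    return full_groups * 4 + LEFTOVER_BYTES[leftover]


def decimal_storage_bytes(m: int, d: int) -> int:
    """Bytes for DECIMAL(M,D) in the binary format (MySQL 5.0.3 and later)."""
    if not 0 <= d <= m:
        raise ValueError("expected 0 <= D <= M")
    # The integer and fractional parts are sized separately, then added.
    return decimal_part_bytes(m - d) + decimal_part_bytes(d)


print(decimal_storage_bytes(18, 9))  # 8  (9 digits + 9 digits: 4 + 4)
print(decimal_storage_bytes(20, 6))  # 10 (14 digits: 4 + 3, 6 digits: 3)
```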
Before MySQL 5.0.3, DECIMAL columns
are represented as strings and storage requirements are:
M+2 bytes if D > 0, M+1 bytes if D = 0 (D+2 if M < D)
Storage Requirements for Date and Time Types
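| Data Type | Storage Required |
| DATE | 3 bytes |
| TIME | 3 bytes |
| DATETIME | 8 bytes |
| TIMESTAMP | 4 bytes |
| YEAR | 1 byte |
The storage requirements shown in the table arise from the way that MySQL represents temporal values: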
DATE: A three-byte integer
packed as DD +
MM×32 +
YYYY×16×32
TIME: A three-byte integer
packed as DD×24×3600 +
HH×3600 +
MM×60 + SS
DATETIME: Eight bytes:
A four-byte integer packed as
YYYY×10000 +
MM×100 +
DD
A four-byte integer packed as
HH×10000 +
MM×100 +
SS
TIMESTAMP: A four-byte integer
representing seconds UTC since the epoch ('1970-01-01
00:00:00' UTC)
YEAR: A one-byte integer
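To make the packing formulas above concrete, here is a small Python sketch that evaluates them. It only reproduces the arithmetic from the list; the function names are invented for this example, and nothing here says how the resulting integers are laid out on disk.

```python
def pack_date(year: int, month: int, day: int) -> int:
    """DATE: DD + MM*32 + YYYY*16*32."""
    return day + month * 32 + year * 16 * 32


def unpack_date(packed: int) -> tuple:
    """Reverse pack_date(): returns (year, month, day)."""
    day = packed % 32
    month = (packed // 32) % 16
    year = packed // (16 * 32)
    return year, month, day


def pack_time(days: int, hours: int, minutes: int, seconds: int) -> int:
    """TIME: DD*24*3600 + HH*3600 + MM*60 + SS."""
    return days * 24 * 3600 + hours * 3600 + minutes * 60 + seconds


def pack_datetime(year, month, day, hour, minute, second) -> tuple:
    """DATETIME: two four-byte integers, one for the date and one for the time."""
    return (year * 10000 + month * 100 + day,
            hour * 10000 + minute * 100 + second)


packed = pack_date(2008, 12, 31)
print(packed, packed < 2 ** 24)                  # 1028511 True
print(unpack_date(packed))                       # (2008, 12, 31)
print(pack_time(0, 12, 30, 15))                  # 45015
print(pack_datetime(2008, 12, 31, 23, 59, 58))   # (20081231, 235958)
```

The `packed < 2 ** 24` check shows that the DATE value fits in three bytes.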
Storage Requirements for String Types
In the following table, M represents
the declared column length in characters for non-binary string
types and bytes for binary string types.
L represents the actual length in bytes
of a given string value.
| Data Type | Storage Required |
| CHAR(M) | M × w bytes, 0 <= M <= 255, where w is the number of bytes required for the maximum-length character in the character set |
| BINARY(M) | M bytes, 0 <= M <= 255 |
| VARCHAR(M), VARBINARY(M) | L + 1 bytes if column values require 0 – 255 bytes, L + 2 bytes if values may require more than 255 bytes |
| TINYBLOB, TINYTEXT | L + 1 bytes, where L < 2^8 |
| BLOB, TEXT | L + 2 bytes, where L < 2^16 |
| MEDIUMBLOB, MEDIUMTEXT | L + 3 bytes, where L < 2^24 |
| LONGBLOB, LONGTEXT | L + 4 bytes, where L < 2^32 |
| ENUM('value1','value2',...) | 1 or 2 bytes, depending on the number of enumeration values (65,535 values maximum) |
| SET('value1','value2',...) | 1, 2, 3, 4, or 8 bytes, depending on the number of set members (64 members maximum) |
Variable-length string types are stored using a length prefix plus
data. The length prefix requires from one to four bytes depending
on the data type, and the value of the prefix is
L (the byte length of the string). For
example, storage for a MEDIUMTEXT
value requires L bytes to store the
value plus three bytes to store the length of the value.
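As a rough illustration of the length-prefix rule, the following Python sketch computes the storage for a BLOB or TEXT value from its byte length. The prefix sizes come from the table above; the function and dictionary names are hypothetical.

```python
# Length-prefix size in bytes for each BLOB/TEXT type.
PREFIX_BYTES = {
    "TINYBLOB": 1, "TINYTEXT": 1,
    "BLOB": 2, "TEXT": 2,
    "MEDIUMBLOB": 3, "MEDIUMTEXT": 3,
    "LONGBLOB": 4, "LONGTEXT": 4,
}


def blob_text_storage_bytes(type_name: str, value: bytes) -> int:
    """Return L + prefix, where L is the byte length of the value."""
    prefix = PREFIX_BYTES[type_name.upper()]
    length = len(value)
    # A prefix of n bytes can only describe lengths below 2^(8n).
    if length >= 2 ** (8 * prefix):
        raise ValueError(f"value too long for {type_name}")
    return length + prefix


print(blob_text_storage_bytes("MEDIUMTEXT", b"x" * 1000))  # 1003
print(blob_text_storage_bytes("TINYTEXT", b"abcd"))        # 5
```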
To calculate the number of bytes used to store a particular
CHAR,
VARCHAR, or
TEXT column value, you must take
into account the character set used for that column and whether
the value contains multi-byte characters. In particular, when
using the utf8 Unicode character set, you must
keep in mind that not all utf8 characters use
the same number of bytes and can require up to three bytes per
character. For a breakdown of the storage used for different
categories of utf8 characters, see
Section 9.1.9, “Unicode Support”.
VARCHAR,
VARBINARY, and the
BLOB and
TEXT types are variable-length
types. For each, the storage requirements depend on these factors:
The actual length of the column value
The column's maximum possible length
The character set used for the column, because some character sets contain multi-byte characters
For example, a VARCHAR(255) column can hold a
string with a maximum length of 255 characters. Assuming that the
column uses the latin1 character set (one byte
per character), the actual storage required is the length of the
string (L), plus one byte to record the
length of the string. For the string 'abcd',
L is 4 and the storage requirement is
five bytes. If the same column is instead declared to use the
ucs2 double-byte character set, the storage
requirement is 10 bytes: The length of 'abcd'
is eight bytes and the column requires two bytes to store lengths
because the maximum length is greater than 255 (up to 510 bytes).
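The 'abcd' example can be checked with a short Python sketch. It uses Python's `latin-1` and `utf-16-be` codecs as stand-ins for MySQL's latin1 and ucs2 character sets, which is an assumption that holds for plain ASCII text such as 'abcd' but not necessarily for every character. The function name is illustrative.

```python
def varchar_storage_bytes(value: str, declared_chars: int,
                          codec: str, max_bytes_per_char: int) -> int:
    """Storage for one VARCHAR value: encoded length plus length prefix."""
    data = value.encode(codec)
    # One prefix byte is enough only if the longest possible value
    # fits in 255 bytes; otherwise two bytes are needed.
    max_bytes = declared_chars * max_bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return len(data) + prefix


# VARCHAR(255) with a one-byte character set: 4 + 1
print(varchar_storage_bytes("abcd", 255, "latin-1", 1))    # 5
# VARCHAR(255) with a two-byte character set: 8 + 2
print(varchar_storage_bytes("abcd", 255, "utf-16-be", 2))  # 10
```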
The effective maximum number of bytes that
can be stored in a VARCHAR or
VARBINARY column is subject to
the maximum row size of 65,535 bytes, which is shared among all
columns. For a VARCHAR column
that stores multi-byte characters, the effective maximum number
of characters is less. For example,
utf8 characters can require up to three bytes
per character, so a VARCHAR
column that uses the utf8 character set can
be declared to be a maximum of 21,844 characters.
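One way to arrive at the 21,844 figure is sketched below in Python. It assumes that the column is the only one in the row and that two bytes of the 65,535-byte row limit go to the length prefix; both assumptions are mine, and other per-row overhead is ignored.

```python
MAX_ROW_BYTES = 65535


def max_varchar_chars(max_bytes_per_char: int,
                      other_columns_bytes: int = 0) -> int:
    """Largest declarable VARCHAR length in characters (rough estimate)."""
    length_prefix = 2  # values this long always need a two-byte prefix
    available = MAX_ROW_BYTES - other_columns_bytes - length_prefix
    return max(available // max_bytes_per_char, 0)


print(max_varchar_chars(3))  # 21844 for utf8 (up to 3 bytes per character)
```

Passing a non-zero `other_columns_bytes` shows how quickly the limit shrinks when the row contains other columns.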
As of MySQL 5.0.3, the NDBCLUSTER engine
supports only fixed-width columns. This means that a
VARCHAR column from a table in a
MySQL Cluster will behave as follows:
If the size of the column is fewer than 256 characters, the column requires one byte extra storage per row.
If the size of the column is 256 characters or more, the column requires two bytes extra storage per row.
The number of bytes required per character varies according to the
character set used. For example, if a
VARCHAR(100) column in a Cluster table uses the
utf8 character set, each character requires 3
bytes storage. This means that each record in such a column takes
up 100 × 3 + 1 = 301 bytes for storage,
regardless of the length of the string actually stored in any
given record. For a VARCHAR(1000) column in a
table using the NDBCLUSTER storage engine with
the utf8 character set, each record will use
1000 × 3 + 2 = 3002 bytes storage; that is, the
column is 1,000 characters wide, each character requires 3 bytes
storage, and each record has a 2-byte overhead because 1,000 >=
256.
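The two worked examples above follow the same pattern, which can be written as a small Python function. This is a sketch of the arithmetic in the text, with a made-up function name.

```python
def ndb_varchar_row_bytes(declared_chars: int, bytes_per_char: int) -> int:
    """Fixed per-row storage for a VARCHAR column in an NDBCLUSTER table."""
    # One extra byte below 256 characters, two extra bytes otherwise.
    overhead = 1 if declared_chars < 256 else 2
    return declared_chars * bytes_per_char + overhead


print(ndb_varchar_row_bytes(100, 3))   # 301
print(ndb_varchar_row_bytes(1000, 3))  # 3002
```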
TEXT and
BLOB columns are implemented
differently in the NDB Cluster storage engine, wherein each row in
a TEXT column is made up of two
separate parts. One of these is of fixed size (256 bytes), and is
actually stored in the original table. The other consists of any
data in excess of 256 bytes, which is stored in a hidden table.
The rows in this second table are always 2,000 bytes long. This
means that the size of a TEXT
column is 256 if size <= 256 (where
size represents the size of the row);
otherwise, the size is 256 + size +
(2000 – (size – 256) %
2000).
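The size formula above can be transcribed directly into Python. The sketch below evaluates the formula exactly as it is written in the text and makes no claim beyond that; the function name is hypothetical.

```python
def ndb_text_bytes(size: int) -> int:
    """Size of a TEXT value in an NDB table, per the formula in the text."""
    if size <= 256:
        return 256
    return 256 + size + (2000 - (size - 256) % 2000)


print(ndb_text_bytes(100))   # 256
print(ndb_text_bytes(1000))  # 2512
```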
The size of an ENUM object is
determined by the number of different enumeration values. One byte
is used for enumerations with up to 255 possible values. Two bytes
are used for enumerations having between 256 and 65,535 possible
values. See Section 10.4.4, “The ENUM Type”.
The size of a SET object is
determined by the number of different set members. If the set size
is N, the object occupies (N+7)/8 bytes,
rounded up to 1, 2, 3, 4, or 8 bytes. A SET can have a maximum of 64
members. See Section 10.4.5, “The SET Type”.
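The ENUM and SET sizing rules can be summarized in a short Python sketch. The function names are illustrative, and the SET calculation assumes that "(N+7)/8" uses integer division before rounding up to an allowed size.

```python
def enum_storage_bytes(n_values: int) -> int:
    """1 byte for up to 255 enumeration values, 2 bytes up to 65,535."""
    if not 1 <= n_values <= 65535:
        raise ValueError("an ENUM has between 1 and 65,535 values")
    return 1 if n_values <= 255 else 2


def set_storage_bytes(n_members: int) -> int:
    """(N + 7) / 8 bytes, rounded up to 1, 2, 3, 4, or 8 bytes."""
    if not 1 <= n_members <= 64:
        raise ValueError("a SET has between 1 and 64 members")
    raw = (n_members + 7) // 8
    for size in (1, 2, 3, 4, 8):
        if raw <= size:
            return size


print(enum_storage_bytes(255), enum_storage_bytes(256))  # 1 2
print([set_storage_bytes(n) for n in (8, 9, 24, 25, 33, 64)])
# [1, 2, 3, 4, 8, 8]
```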


User Comments
Had a lot of trouble finding the maximum table size in bytes for capacity planning. More specifically it was InnoDB tables that I had a problem with. Average row size is good, but I wanted maximum row size.
I checked several products and could not find what I wanted. Some of the tables I deal with are 300+ fields and so manual calculation was not practical.
So I wrote a little Perl script that does it. Thought it might be of some use, so I include it here. It handles all field types except ENUM and SET, and it does not calculate anything regarding index size.
Just do a mysqldump -d (just the schema) of your DB to a file, and run this Perl script specifying the schema file as the only argument.
----------------------------------------------------------------
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1,
"SMALLINT"=>2,
"MEDIUMINT"=>3,
"INT"=>4,
"BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3,
"DATETIME"=>8,
"TIMESTAMP"=>4,
"TIME"=>3,
"YEAR"=>1,
"CHAR"=>'$M',
"VARCHAR"=>'$M+1',
"TINYBLOB"=>'$M+1',
"TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2',
"TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3',
"MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4',
"LONGTEXT"=>'$M+4');
my $D;
my $M;
my $dt;
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
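# Stop with a clear message if the schema file cannot be read.
open (TABLEFILE, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";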
LOGPARSE:while (<TABLEFILE>)
{
chomp;
if ( $_ =~ s/create table[ ]*([a-zA-Z_]*).*/$1/i )
{
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
# DATETIME must come before DATE in the alternation; otherwise DATE matches
# first and DATETIME columns are counted as 3 bytes instead of 8.
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";
Here's a modification of Marc's script above that also handles ENUMs. Enjoy.
#!/usr/bin/perl
use Data::Dumper;
use strict;
$| = 1;
my %DataType =
("TINYINT"=>1, "SMALLINT"=>2, "MEDIUMINT"=>3,
"INT"=>4, "BIGINT"=>8,
"FLOAT"=>'if ($M <= 24) {return 4;} else {return 8;}',
"DOUBLE"=>8,
"DECIMAL"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"NUMERIC"=>'if ($M < $D) {return $D + 2;} elsif ($D > 0) {return $M + 2;} else {return $M + 1;}',
"DATE"=>3, "DATETIME"=>8, "TIMESTAMP"=>4, "TIME"=>3, "YEAR"=>1,
"CHAR"=>'$M', "VARCHAR"=>'$M+1',
"ENUM"=>1,
"TINYBLOB"=>'$M+1', "TINYTEXT"=>'$M+1',
"BLOB"=>'$M+2', "TEXT"=>'$M+2',
"MEDIUMBLOB"=>'$M+3', "MEDIUMTEXT"=>'$M+3',
"LONGBLOB"=>'$M+4', "LONGTEXT"=>'$M+4');
my ($D, $M, $dt);
my $fieldCount = 0;
my $byteCount = 0;
my $fieldName;
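# Stop with a clear message if the schema file cannot be read.
open (TABLEFILE, "<", $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";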
LOGPARSE:while (<TABLEFILE>) {
chomp;
if ( $_ =~ s/create table[ ]`*([a-zA-Z_]*).*`/$1/i ) {
print "Fieldcount: $fieldCount Bytecount: $byteCount\n" if $fieldCount;
$fieldCount = 0;
$byteCount = 0;
print "\nTable: $_\n";
next;
}
# DATETIME must come before DATE in the alternation; otherwise DATE matches
# first and DATETIME columns are counted as 3 bytes instead of 8.
next if $_ !~ s/(.*)[ ]+(TINYINT[ ]*\(*[0-9,]*\)*|SMALLINT[ ]*\(*[0-9,]*\)*|MEDIUMINT[ ]*\(*[0-9,]*\)*|INT[ ]*\(*[0-9,]*\)*|BIGINT[ ]*\(*[0-9,]*\)*|FLOAT[ ]*\(*[0-9,]*\)*|DOUBLE[ ]*\(*[0-9,]*\)*|DECIMAL[ ]*\(*[0-9,]*\)*|NUMERIC[ ]*\(*[0-9,]*\)*|DATETIME[ ]*\(*[0-9,]*\)*|DATE[ ]*\(*[0-9,]*\)*|TIMESTAMP[ ]*\(*[0-9,]*\)*|TIME[ ]*\(*[0-9,]*\)*|YEAR[ ]*\(*[0-9,]*\)*|CHAR[ ]*\(*[0-9,]*\)*|VARCHAR[ ]*\(*[0-9,]*\)*|TINYBLOB[ ]*\(*[0-9,]*\)*|TINYTEXT[ ]*\(*[0-9,]*\)*|ENUM[ ]*\(*['A-Za-z_,]*\)*|BLOB[ ]*\(*[0-9,]*\)*|TEXT[ ]*\(*[0-9,]*\)*|MEDIUMBLOB[ ]*\(*[0-9,]*\)*|MEDIUMTEXT[ ]*\(*[0-9,]*\)*|LONGBLOB[ ]*\(*[0-9,]*\)*|LONGTEXT[ ]*\(*[0-9,]*\)*).*/$2/gix;
$fieldName=$1;
$_=uc;
$D=0;
($D = $_) =~ s/.*\,([0-9]+).*/$1/g if ( $_ =~ m/\,/ );
$_ =~ s/\,([0-9]*)//g if ( $_ =~ m/\,/ );
($M = $_) =~ s/[^0-9]//g;
$M=0 if ! $M;
($dt = $_) =~ s/\(.*\)//g;
$dt =~ s/[^A-Za-z_]*//g;
print "$fieldName $_:\t".eval($DataType{"$dt"})." bytes\n";
++$fieldCount;
$byteCount += eval($DataType{"$dt"});
}
print "Fieldcount: $fieldCount Bytecount: $byteCount\n";