MySQL Cluster uses the new NDBCluster storage engine (available from MySQL 4.1.2) to enable running several MySQL servers in a cluster.
This chapter represents work in progress. Other documents describing MySQL Cluster can be found at http://www.mysql.com/cluster/ and http://dev.mysql.com/doc/#cluster.
MySQL Cluster is a new technology that enables the clustering of in-memory databases in a shared-nothing system. The shared-nothing architecture allows the system to work with very inexpensive hardware, without any special requirements on hardware or software. It also has no single point of failure, because each component has its own memory and disk.

MySQL Cluster integrates the standard MySQL server with an in-memory clustered storage engine called NDB. In our documentation, the term NDB refers to the storage-engine-specific part of the setup, whereas MySQL Cluster refers to the combination of MySQL and the new storage engine.
A MySQL Cluster consists of computers with a set of processes executing several MySQL servers, storage nodes for NDB Cluster, management servers and possibly also specialized data access programs. All these programs work together to form MySQL Cluster. When data is stored in the NDBCluster storage engine, the tables are stored in the storage nodes for NDB Cluster. Those tables are directly accessible also from all other MySQL servers in the cluster. Thus, if one application updates the salary of an employee all other MySQL servers that query this data can see it immediately.
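As a hypothetical illustration of this (the host names, database, and table below are invented, and the ENGINE=NDBCLUSTER syntax assumes a 4.1 server with the engine enabled):

```
shell> mysql -h server1 -e "CREATE TABLE test.employee (id INT PRIMARY KEY, salary INT) ENGINE=NDBCLUSTER"
shell> mysql -h server1 -e "UPDATE test.employee SET salary = salary + 1000 WHERE id = 1"
shell> mysql -h server2 -e "SELECT salary FROM test.employee WHERE id = 1"
```

Because the table lives in the storage nodes rather than in either mysqld, the SELECT issued through the second server sees the update immediately.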
The data stored in the storage nodes for MySQL Cluster can be mirrored, and the cluster can handle failures of storage nodes with no other impact than that a number of transactions are aborted due to losing the transaction state. This should cause no problems, because transactional applications should be written to handle transaction failure.
By bringing MySQL Cluster to the open source world, MySQL makes clustered data management with high availability, high performance and scalability available to all who need it.
NDB is an in-memory storage engine offering high-availability and data-persistence features. NDB can (although this requires extensive knowledge) be used as an independent database system, supporting the traditional relational data model with full ACID transactions.

The NDB storage engine can be configured with a range of fail-over and load-balancing options, but it is easiest to start with the storage engine at the cluster level. The NDB storage engine of MySQL Cluster contains a complete set of data, dependent only on other data within the cluster itself.
A MySQL Cluster may also replicate clustered data to other MySQL Clusters, but this is a complex configuration. Here, we focus on how to set up a single MySQL Cluster consisting of an NDB storage engine and some MySQL servers.

The cluster part of MySQL Cluster is currently configured independently of the MySQL servers. In a MySQL Cluster, each part of the cluster is considered to be a node.
Note: In many contexts, a node is a computer, but for MySQL Cluster it is a process. There can be any number of nodes on a single computer.
Each node has a type, and there can be multiple nodes in the MySQL Cluster of each type. In a minimal MySQL Cluster configuration, there will be at least three nodes:
The management (MGM) node. The role of this type of node is to manage the other nodes within the MySQL Cluster, such as providing configuration data, starting and stopping nodes, running backups, and so forth. Because this node type manages the configuration of the other nodes, a node of this type must always be started first, before any other node. Once the cluster is running, the MGM node does not necessarily have to be running all the time.
The storage or database (DB) node. This is the type of node that manages and stores the database itself. There are as many DB nodes as you have replicas times the number of fragments. That is, with two fragments, each with two replicas, you need four DB nodes. Note that it is not necessary to have more than one replica, so a minimal MySQL Cluster may contain just one DB node.
The client (API) node. This is the client node that accesses the cluster; in the case of MySQL Cluster, these are traditional MySQL servers with the new NDBCluster storage engine, which enables access to clustered tables. Basically, the MySQL daemon is a client of the NDB cluster. If you have applications that use the NDB API directly, these are considered API nodes too.
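The three node types above can be sketched in a single cluster configuration file. This is only an illustration: the section and parameter names below follow the config.ini conventions as far as we know for this MySQL version, and the host name and path are example values.

```shell
# Write a minimal one-of-each configuration: one MGM node, one DB node
# (a single replica), and one API node, all on the same example computer.
cat > config.ini <<'EOF'
[COMPUTER]
Id: 1
HostName: ndb_mgmd.mysql.com

[MGM]
Id: 1
ExecuteOnComputer: 1
PortNumber: 2200

[DB]
Id: 2
ExecuteOnComputer: 1
FileSystemPath: /var/lib/mysql-cluster

[API]
Id: 3
ExecuteOnComputer: 1
EOF
# Each section carries its own Id; count them as a sanity check.
grep -c '^Id:' config.ini
```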
We refer to these cluster processes as nodes in the cluster. Setting up the configuration of the cluster involves configuring each individual node and each individual communication link between the nodes. MySQL Cluster is currently designed with the intention that storage nodes are homogeneous in terms of processor power, memory space, and communication bandwidth. Also, to provide a single point of configuration, it was decided to move the entire cluster configuration into one configuration file.
The management server manages the cluster configuration file and the cluster log. All nodes in the cluster contact the management server to retrieve their part of the configuration, so they need a way to determine where the management server resides. When interesting events occur in the storage nodes, they transfer the information about these events to the management server, which then writes the information to the cluster log.
In addition, there are any number of clients to the cluster. These are of two types. First, there are the normal MySQL clients that are no different for MySQL Cluster. MySQL Cluster can be accessed from all MySQL applications written in PHP, Perl, C, C++, Java, Ruby, and so forth. Second, there are management clients. These clients access the management server and provide commands to start and stop nodes gracefully, to start and stop message tracing (only in debug versions), to print configuration, to show node status of all nodes in the cluster, to show versions of all nodes in the cluster, to start and stop backups, and so forth.
A MySQL server that is part of a MySQL Cluster differs in only one aspect from what we are used to: it has an additional storage engine (NDB or NDBCLUSTER), which is initially disabled. Except for this, the MySQL server is not much different from previous MySQL releases, apart from any other new 4.1 features, of course. By default, MySQL is configured with the NDB storage engine disabled; to enable it you need to modify `my.cnf'.

Also, because the MySQL daemon is an API client to the NDB storage engine, the minimal configuration data needed to access the MGM node from the MySQL server must be set. When this is done, all MGM nodes (one is sufficient to start) and DB nodes must be up and running before the MySQL server is started.
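Putting the start-up order together, a minimal sketch, assuming all processes run on one machine and that each binary finds its connect information as described above:

```
shell> ndb_mgmd              # 1. the management server, started first
shell> ndbd                  # 2. one or more storage (DB) nodes
shell> mysqld --ndbcluster   # 3. finally the MySQL server itself
```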
Currently, you need to build from source, using MySQL 4.1 from the BitKeeper tree. Note that some tools need different versions than are typically used when building MySQL.
Tool/library | Version | Comments |
libncurses | 5.2.2 | Used by some of the command-line tools. |
Make | 3.79.1 | |
Gawk | 3.1.0 | Some Linux distributions (such as Debian) come with mawk instead. This will NOT work. |
Autoconf | 2.56 | It is very important to have the right version here, at least 2.5x. |
Automake | 1.7.6 | Also very important; some build files rely on recent functionality and the absence of bugs. Having the wrong version causes strange build errors that are not obviously related to automake. |
Libtool | 1.5.6 | Again, the version is important; strange errors occur if very old versions are used. |
Gcc | 2.95.4 | |
Bison | 1.75 | The build will fail if an old version of Bison is used; the table size in the SQL parser will overflow. |
Zlib | 1.1.4 | This is not normally required, but due to a small bug the build will not complete if zlib is missing, regrettably at a very late stage. |
To start the build, use the BUILD/compile-pentium-max script. This build script also includes OpenSSL, so you either have to get OpenSSL or modify the build script to exclude it.
Apart from these things, follow the standard instructions to build the binaries, run the tests and perform the installation procedure. See section 2.3.3 Installing from the Development Source Tree.
You need to have all the MGM and DB nodes up and running first, and this will probably be the most time-consuming part of the configuration, not least because we assume that you are already familiar with MySQL to a certain extent. The MySQL configuration and the `my.cnf' file are very straightforward, and this section only covers the differences from configuring MySQL without clustering.
There are four processes that are important to know about when using MySQL Cluster. We will cover how to work with those processes, which options to use when starting and so forth.
mysqld is the traditional MySQL server process. To be used with MySQL Cluster, it needs to be built with support for the NDBCluster storage engine. Even if the mysqld binary has been built in such a manner, the NDBCluster storage engine is still disabled by default.

There are two ways to enable the NDBCluster storage engine: either use --ndbcluster as a start-up option when starting mysqld, or insert a line with ndbcluster in the my.cnf file. An easy way to verify that your server runs with support for the NDBCluster storage engine is to issue the command SHOW TABLE TYPES from a mysql client.
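A quick check, sketched below; the exact output format varies between versions, but a server with cluster support should list NDBCLUSTER among the table types, and typically shows it as DISABLED rather than YES when the engine is compiled in but not enabled:

```
mysql> SHOW TABLE TYPES;
```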
The MySQL server needs to know how to get the configuration of the cluster. To access this configuration, it needs to know three things: its own node id in the cluster, the hostname (or IP address) where the management server resides, and the port on which it can connect to the management server.
There are two possible ways to provide this information to the mysqld process. The first option is to include it in a file called `Ndb.cfg', which should reside in the data directory of the MySQL server. The second option is to set an environment variable called NDB_CONNECTSTRING. The string is the same in both cases:

nodeid=3;hostname=ndb_mgmd.mysql.com:2200

where 3 is the node id, ndb_mgmd.mysql.com is the host where the management server resides, and 2200 is the port it is listening on.
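The two options can be sketched as follows; the node id, host name, and port are the example values from the text, and the current directory stands in for the server's data directory:

```shell
# Option 1: put the connect string in a file named Ndb.cfg in the
# MySQL server's data directory.
echo 'nodeid=3;hostname=ndb_mgmd.mysql.com:2200' > Ndb.cfg

# Option 2: export the same string in the NDB_CONNECTSTRING
# environment variable before starting mysqld.
export NDB_CONNECTSTRING='nodeid=3;hostname=ndb_mgmd.mysql.com:2200'
```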
With this set-up, the MySQL server will be a full citizen of the MySQL Cluster and will be fully aware of all storage nodes in the cluster and their status. It will set up connections to all storage engine nodes and will be able to use any of them as transaction coordinator and to access their data for reading and updating.
ndbd, the Storage Engine Node Process

ndbd is the process used to handle all the data in tables using the NDBCluster storage engine. This is the process that contains all the logic of distributed transaction handling, node recovery, checkpointing to disk, on-line backup, and much other functionality.
In a cluster there is a set of ndbd processes cooperating in handling the data. These processes can execute on the same computer or on different computers, in a completely configurable manner.
Each ndbd process should be started from a separate directory. The reason for this is that ndbd generates a set of log files in its starting directory. These log files are:
error.log is a file that contains information about all the crashes the ndbd process has encountered, together with a short error string and a reference to the trace file for each crash. An entry could look like this:
Date/Time: Saturday 31 January 2004 - 00:20:01
Type of error: error
Message: Internal program error (failed ndbrequire)
Fault ID: 2341
Problem data: DbtupFixAlloc.cpp
Object of reference: DBTUP (Line: 173)
ProgramName: NDB Kernel
ProcessID: 14909
TraceFile: NDB_TraceFile_1.trace
***EOM***
NDB_TraceFile_1.trace is a trace file describing exactly what happened before the error occurred. This information is useful to the MySQL Cluster team when analysing any bugs occurring in MySQL Cluster. The information in this file is described in the section MySQL Cluster Troubleshooting. There can be a configurable number of these trace files in the directory before old files are overwritten. 1 in this context is simply the number of the trace file.
NextTraceFileNo.log is the file that keeps track of the number of the next trace file.
node2.out is the file that contains any data printed by the ndbd process when executing as a daemon process. 2 in this context is the node id. This file only exists when starting ndbd as a daemon, since stdout and stderr are then redirected to this file.
node2.pid is a file used to ensure that only one ndb node is started with this node id. This is the normal pid-file created when starting a daemon. 2 in this context is the node id. This file only exists when starting ndbd as a daemon process.
Signal.log is a file that is only used in debug versions of ndbd, where it is possible to trace all incoming, outgoing, and internal messages with their data in the ndbd process.
It is recommended not to use a directory mounted through NFS, since in some environments this can cause problems with the lock on the pid-file remaining even after the process has stopped.
Also, when starting the ndbd process it is necessary to specify which node id the process is to use, the host of the management server, and the port it is listening on. Again there are two ways of specifying this information: either in a string in the file `Ndb.cfg' (this file should be stored in the starting directory of the ndbd process), or by setting the environment variable NDB_CONNECTSTRING before starting the process.
When ndbd starts, it will actually start two processes. The starting process is called the "angel", and its only job is to discover when the execution process has completed and then, if configured to do so, to restart the ndbd process. Thus, if you attempt to kill ndbd with the kill command on a Unix variant, it is necessary to kill both processes.
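For example, on a system where pgrep is available (a hypothetical sketch; check the actual process ids with ps before killing anything in a real installation):

```
shell> kill $(pgrep ndbd)    # kills both the angel and the execution process
```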
The execution process uses one thread for reading, writing, and scanning data, as well as all other activities. This thread is designed with asynchronous programming so it can easily handle thousands of concurrent activities. In addition, there is a watch-dog thread supervising the execution thread to ensure that it does not get stuck in an eternal loop or some other problem. A pool of threads handles file I/O, with each thread able to handle one open file. Threads can also be used for the connection activities of the transporters in the ndbd process. Thus, in a system that performs a large number of activities, including update activities, the ndbd process will consume up to about two CPUs if allowed to. In a large SMP box with many CPUs it is therefore recommended to use several ndbd processes, configured to be part of different node groups.
An example connect string for an ndbd process: nodeid=2;hostname=ndb_mgmd.mysql.com:2200
ndb_mgmd, the Management Server Process

The management server is the process that reads the cluster configuration file and distributes this information to all nodes in the cluster that request it. It also maintains the log of cluster activities. Management clients can connect to the management server and use commands to check the status of the cluster in various respects.
Also, when starting ndb_mgmd it is necessary to supply the same information as for the ndbd and mysqld processes, and again there are two options: using the file `Ndb.cfg' or the environment variable NDB_CONNECTSTRING. The `Ndb.cfg' file, if used, is placed in the start directory of ndb_mgmd.
An example connect string for the management server: nodeid=1;hostname=ndb_mgmd.mysql.com:2200
The following files are created or used by ndb_mgmd in its starting directory:

config.ini is the configuration file of the cluster. It is created by the user and read by the management server. How to write this file is described in the section MySQL Cluster Configuration.
cluster.log is the file where events in the cluster are reported. Examples of events are checkpoints started and completed, node failures and nodes starting, levels of memory usage passed, and so forth. The events reported are described in section 17.5 Management of MySQL Cluster.
node1.out is the file used for stdout and stderr when executing the management server as a daemon process. 1 in this context is the node id.
node1.pid is the pid file used when executing the management server as a daemon process. 1 in this context is the node id.
cluster.log.1: when the cluster log becomes bigger than one million bytes, the file is renamed cluster.log.1, where 1 is the number of the cluster log file. So, if 1, 2, and 3 already exist, the next will have the number 4.
ndb_mgm, the Management Client Process

The final important process to know about is the management client. This process is not needed to run the cluster. Its value lies in its ability to check the status of the cluster, start backups, and perform other management activities. It does so by providing access to a set of commands.
Actually, the management client uses a C API to access the management server, so for advanced users it is also possible to program dedicated management processes that can do the same things the management client can do.
When starting the management client, it is necessary to state the hostname and port of the management server, as in the example below:

ndb_mgm localhost 2200
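A short hypothetical session; SHOW and START BACKUP stand for the status and backup commands mentioned above, and the prompt strings are illustrative:

```
shell> ndb_mgm localhost 2200
ndb_mgm> SHOW
ndb_mgm> START BACKUP
ndb_mgm> QUIT
```

Here SHOW prints the status of all nodes in the cluster, and START BACKUP starts an on-line backup.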
mysqld

--ndbcluster
If the binary is built with support for the NDBCluster storage engine, the default disabling of support for the NDB storage engine can be overruled by using this option. Using the NDBCluster storage engine is necessary for using MySQL Cluster.

--skip-ndbcluster
Disables the NDBCluster storage engine. This is disabled by default for binaries where it is included, so this option only applies if the server was configured to use the NDBCluster storage engine.
ndbd

-?, --usage

-c string, --connect-string string
When starting ndbd it is also possible to set the connect string to the management server as a command option:
nodeid=2;host=ndb_mgmd.mysql.com:2200

-d, --daemon
Instructs ndbd to execute as a daemon process.

-i, --initial
Instructs ndbd to perform an initial start. An initial start will erase any files created by earlier ndbd instances for recovery. It will also re-create the recovery log files, which on some operating systems can take a substantial amount of time.

-n, --no-start
Instructs ndbd not to start automatically. ndbd will connect to the management server, get the configuration, and initialise communication objects, but will not start the execution engine until requested to do so by the management server. The management server makes this request through a command issued by the management client.

-s, --start
Instructs the ndbd process to start immediately. This is the default behavior, so the option is not really needed.

-v, --version
Prints the version number of the ndbd process. The version number is the MySQL Cluster version number. This version number is important because at start-up the MySQL Cluster processes verify that the versions of the processes in the cluster can co-exist. It is also important for on-line software upgrade of MySQL Cluster (see the section Software Upgrade of MySQL Cluster).
ndb_mgmd
-?, --usage
-c filename
-d
Instructs ndb_mgmd to start as a daemon process.
-l filename
-n
--version
ndb_mgm
-?, --usage
[hostname [port]]
--try-reconnect=number
Managing a MySQL Cluster involves a number of activities. The first activity is to configure and start MySQL Cluster; this is covered by section 17.3 MySQL Cluster Configuration and section 17.4 Process Management in MySQL Cluster. This section covers how to manage a running MySQL Cluster.
There are essentially two ways of actively managing a running MySQL Cluster. The first is by commands entered into the management client, whereby the status of the cluster can be checked, log levels changed, backups started and stopped, and nodes stopped and started. The second method is to study the output in the cluster log. The cluster log is written to the file cluster.log in the directory where the management server was started. The cluster log contains event reports generated by the ndbd processes in the cluster. It is also possible to send the cluster log entries to a Unix system log.