This consists of the following steps:
The master sets the latest GCI as the restart GCI, and then synchronizes its system file to all other nodes involved in the system restart.
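For illustration only, the following C++ sketch shows this first step in a heavily simplified form: the master records the latest GCI as the restart GCI and pushes its system file to every node taking part in the system restart. The SystemFile structure and the send function are assumptions made for the example, not the actual DBDIH code.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct SystemFile {
      uint32_t restartGci;   // GCI up to which data will be restored
      // ... other restart metadata kept in the real system file
    };

    // Hypothetical transport call standing in for the kernel's signal send.
    void sendSystemFile(uint32_t nodeId, const SystemFile& sysfile) {
      std::printf("system file with restart GCI %u sent to node %u\n",
                  (unsigned) sysfile.restartGci, (unsigned) nodeId);
    }

    // The master sets the restart GCI and synchronizes its system file
    // to all other nodes involved in the system restart.
    void distributeRestartGci(uint32_t latestGci,
                              const std::vector<uint32_t>& startingNodes) {
      SystemFile sysfile{latestGci};
      for (uint32_t node : startingNodes)
        sendSystemFile(node, sysfile);
    }

    int main() { distributeRestartGci(1000, {2, 3, 4}); }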
The next step is to synchronize the schema of all the nodes in the system restart. This is performed in 15 passes. The problem we are trying to solve here occurs when a schema object has been created while the node was up but was dropped while the node was down, and possibly a new object was even created with the same schema ID while that node was unavailable. In order to handle this situation, it is necessary first to re-create all objects that are supposed to exist from the viewpoint of the starting node. After this, any objects that were dropped by other nodes in the cluster while this node was “dead” are dropped; this also applies to any tables that were dropped during the outage. Finally, any tables that have been created by other nodes while the starting node was unavailable are re-created on the starting node. All these operations are local to the starting node. As part of this process, it is also necessary to ensure that all tables that need to be re-created have been created locally and that the proper data structures have been set up for them in all kernel blocks.
After performing the procedure described previously for the master node, the new schema file is sent to all other participants in the system restart, and they perform the same synchronization.
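The following C++ sketch illustrates, in very simplified form, the kind of reconciliation these passes must achieve: keep the objects that still exist, drop objects that disappeared (or whose schema ID was reused) during the outage, and create the objects added while the node was down. The SchemaObject structure, its version field, and the reconcileSchemas function are assumptions made for the example, not the kernel's actual schema-handling code.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct SchemaObject {
      uint32_t schemaId;
      uint32_t version;      // distinguishes reused schema IDs
      std::string name;
    };

    struct SchemaActions {
      std::vector<SchemaObject> keepOrRecreate;  // expected by the starting node, still valid
      std::vector<uint32_t>     dropLocally;     // dropped (or replaced) while the node was down
      std::vector<SchemaObject> createLocally;   // created while the node was down
    };

    // Compare the starting node's saved schema information with the schema
    // distributed by the master and derive the local operations to perform.
    SchemaActions reconcileSchemas(const std::map<uint32_t, SchemaObject>& local,
                                   const std::map<uint32_t, SchemaObject>& cluster) {
      SchemaActions out;
      for (const auto& [id, obj] : local) {
        auto it = cluster.find(id);
        if (it == cluster.end() || it->second.version != obj.version)
          out.dropLocally.push_back(id);       // gone, or schema ID was reused
        else
          out.keepOrRecreate.push_back(obj);   // still exists: ensure local structures
      }
      for (const auto& [id, obj] : cluster) {
        auto it = local.find(id);
        if (it == local.end() || it->second.version != obj.version)
          out.createLocally.push_back(obj);    // new (or re-created) during the outage
      }
      return out;
    }

    int main() {
      std::map<uint32_t, SchemaObject> local   = { {1, {1, 1, "t1"}}, {2, {2, 1, "t2"}} };
      std::map<uint32_t, SchemaObject> cluster = { {1, {1, 1, "t1"}}, {2, {2, 2, "t2b"}},
                                                   {3, {3, 1, "t3"}} };
      SchemaActions actions = reconcileSchemas(local, cluster);
      // t1 is kept; the stale t2 is dropped (its schema ID was reused); t2b and t3 are created.
      return actions.dropLocally.size() == 1 ? 0 : 1;
    }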
All fragments involved in the restart must be given the proper parameters as derived from DBDIH. This causes a number of START_FRAGREQ signals to be sent from DBDIH to DBLQH, and also starts the restoration of the fragments. The fragments are restored one by one and one record at a time: the restore data is read from disk and, in parallel with the reading, applied into main memory. This restores only the main memory parts of the tables.
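The restore loop can be pictured roughly as follows. This is a compilable sketch only: the record format, the batch reader, and the apply function are assumptions, not the real DBLQH or restore interfaces; it is meant to show how reading restore data from disk can be overlapped with applying records, one at a time, into main memory.

    #include <cstdint>
    #include <future>
    #include <vector>

    struct RestoreRecord { uint64_t rowId; std::vector<uint8_t> payload; };

    // Stub: in the real kernel this would read the next chunk of the fragment's
    // local checkpoint data from disk; here it simply runs dry after two batches.
    static int g_batchesLeft = 2;
    std::vector<RestoreRecord> readNextBatchFromDisk() {
      if (g_batchesLeft-- <= 0) return {};
      return { {1, {0x01}}, {2, {0x02}} };
    }

    // Stub: insert one row into the fragment's in-memory structures.
    void applyToMainMemory(uint32_t /*fragId*/, const RestoreRecord& /*rec*/) {}

    // Restore one fragment: overlap the next disk read with applying the
    // current batch, one record at a time, into main memory.
    void restoreFragment(uint32_t fragId) {
      std::vector<RestoreRecord> current = readNextBatchFromDisk();
      while (!current.empty()) {
        auto next = std::async(std::launch::async, readNextBatchFromDisk);
        for (const RestoreRecord& rec : current)
          applyToMainMemory(fragId, rec);     // main-memory parts only
        current = next.get();
      }
    }

    int main() { restoreFragment(0); }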
Once all fragments have been restored, a START_RECREQ message is sent to all nodes in the starting cluster, and then all undo logs for any Disk Data parts of the tables are applied. After applying the undo logs in LGMAN, it is necessary to perform some restore work in TSMAN that requires scanning the extent headers of the tablespaces.
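In simplified form, the ordering of these disk-data recovery steps looks like the sketch below. The Lgman and Tsman types are stand-ins invented for the example; they only mirror the ordering described above (undo log application first, then extent-header scanning), not the real block interfaces.

    #include <cstdio>

    struct Lgman {
      void applyUndoLog() {
        std::printf("LGMAN: undo log applied to the Disk Data parts\n");
      }
    };

    struct Tsman {
      void scanExtentHeaders() {
        std::printf("TSMAN: extent headers of all tablespaces scanned\n");
      }
    };

    // Undo must be complete before the extent/allocation state is rebuilt
    // and before redo log execution can begin.
    void recoverDiskDataParts(Lgman& lgman, Tsman& tsman) {
      lgman.applyUndoLog();
      tsman.scanExtentHeaders();
    }

    int main() {
      Lgman lgman; Tsman tsman;
      recoverDiskDataParts(lgman, tsman);
    }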
Next, it is necessary to prepare for execution of the redo log, which can be performed in up to four phases. For each fragment, execution of redo logs from several different nodes may be required. This is handled by executing the redo logs in different phases for a specific fragment, as decided in DBDIH when sending the START_FRAGREQ signal. An EXEC_FRAGREQ signal is sent for each phase and fragment that requires execution in that phase. After these signals are sent, an EXEC_SRREQ signal is sent to all nodes to tell them that they can start executing the redo log. Before starting execution of the first redo log, it is necessary to make sure that the setup started earlier (in Phase 4) by DBLQH has finished; if it has not, execution waits until it does.
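A rough sketch of the phased dispatch is shown below. The FragmentRedoPlan structure and the two send functions are assumptions made for the example (including the assumption that an EXEC_SRREQ follows each phase); they are not the actual DBDIH signal interfaces, but they show how the per-fragment EXEC_FRAGREQ signals for one phase are completed before the nodes are told to start executing.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct FragmentRedoPlan {
      uint32_t fragId;
      uint32_t nodeId;   // node whose redo log must be executed for this fragment
      uint32_t phase;    // 0..3: the phase assigned when START_FRAGREQ was sent
    };

    // Stubs standing in for the kernel's signal sends.
    void sendExecFragReq(uint32_t nodeId, uint32_t fragId, uint32_t phase) {
      std::printf("EXEC_FRAGREQ node=%u frag=%u phase=%u\n",
                  (unsigned) nodeId, (unsigned) fragId, (unsigned) phase);
    }
    void sendExecSrReqToAllNodes(uint32_t phase) {
      std::printf("EXEC_SRREQ   all nodes, phase=%u\n", (unsigned) phase);
    }

    // Send the per-fragment requests for each phase, then tell all nodes that
    // the phase is fully described so they can start executing their redo logs.
    void executeRedoPhases(const std::vector<FragmentRedoPlan>& plan) {
      for (uint32_t phase = 0; phase < 4; ++phase) {
        for (const auto& f : plan)
          if (f.phase == phase)
            sendExecFragReq(f.nodeId, f.fragId, phase);
        sendExecSrReqToAllNodes(phase);
      }
    }

    int main() {
      // Fragment 10 needs redo from nodes 1 and 2, handled in separate phases.
      executeRedoPhases({ {10, 1, 0}, {10, 2, 1}, {11, 1, 0} });
    }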
Prior to executing the redo log, it is necessary to calculate where to start reading it and where its end should be found; the end of the redo log is reached when the last GCI to restore has been reached.
After completing the execution of the redo logs, all redo log pages that were written beyond the last GCI to be restored are invalidated. Given the cyclic nature of the redo logs, this invalidation may carry over into redo log files beyond the last one executed.
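The invalidation of the redo log tail can be sketched as follows. The file and page layout constants and the helper functions are assumptions for the example; the point is only that the forward walk over pages written after the last restored GCI may wrap from one redo log file into the next, because the files are used cyclically.

    #include <cstdint>
    #include <cstdio>

    constexpr uint32_t PAGES_PER_FILE = 8;   // assumed file size, for illustration
    constexpr uint32_t NUM_LOG_FILES  = 4;   // redo log files used cyclically

    // Stub: does this page hold records written after the last GCI to restore?
    bool pageIsBeyondGci(uint32_t file, uint32_t page, uint32_t /*lastGci*/) {
      return file == 0 && page < 3;          // pretend three stale pages exist
    }
    void invalidatePage(uint32_t file, uint32_t page) {
      std::printf("invalidated file %u page %u\n", (unsigned) file, (unsigned) page);
    }

    // Walk forward from where redo execution stopped, invalidating pages until
    // one that is not beyond the last restored GCI is found; the walk may wrap
    // from the end of one redo log file into the next one.
    void invalidateTail(uint32_t file, uint32_t page, uint32_t lastGci) {
      while (pageIsBeyondGci(file, page, lastGci)) {
        invalidatePage(file, page);
        if (++page == PAGES_PER_FILE) {
          page = 0;
          file = (file + 1) % NUM_LOG_FILES;  // cyclic use of the log files
        }
      }
    }

    int main() { invalidateTail(0, 0, 1000); }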
After the completion of the previous step, DBLQH reports this back to DBDIH using a START_RECCONF message. When the master has received this message from all starting nodes, it sends an NDB_STARTCONF signal back to NDBCNTR.
The NDB_STARTCONF message signals the end of STTOR phase 4 to NDBCNTR, which is the only block involved to any significant degree in this phase.
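As a final illustration, the sketch below shows one way the master could track the outstanding START_RECCONF replies before sending NDB_STARTCONF. The bookkeeping structure is an assumption made for the example, not DBDIH's actual restart state.

    #include <cstdint>
    #include <cstdio>
    #include <set>

    struct MasterRestartState {
      std::set<uint32_t> waiting;   // starting nodes that have not yet confirmed

      // Returns true when the last outstanding node has confirmed; the caller
      // then sends NDB_STARTCONF to NDBCNTR, ending STTOR phase 4.
      bool onStartRecConf(uint32_t nodeId) {
        waiting.erase(nodeId);
        return waiting.empty();
      }
    };

    int main() {
      MasterRestartState state{{1, 2, 3}};
      state.onStartRecConf(2);
      state.onStartRecConf(1);
      if (state.onStartRecConf(3))
        std::printf("all START_RECCONF received: send NDB_STARTCONF to NDBCNTR\n");
    }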