Multi-Layer Distributed Storage of LHD Plasma Diagnostic Database

H. Nakanishi, M. Kojima, M. Ohsuna, M. Nonomura, S. Imazu, and Y. Nagayama

National Institute for Fusion Science, 322-6 Oroshi-cho, Toki 509-5292, Japan

Near the end of the 2003 LHD experimental campaign, the raw data acquired from all plasma diagnostics in a single long-pulse experiment reached 3.16 GB. This is a new world record for fusion plasma experiments, far beyond the previous record of about 1.5 GB/shot at JET [1]. The rapid growth of the diagnostic data volume demands a larger expansion of storage capacity every year than in the year before. The total size of the LHD diagnostic data accumulated over the six years of experiments is about 21.6 TB, and it continues to grow at an increasing rate. The data storage system must therefore be flexible and easily expandable enough to maintain the integrity of the whole data set.

The LHD diagnostic data storage, i.e. the LABCOM system, therefore has a completely distributed architecture based on a fast network. It also provides data redundancy, fail-safe operation, and load balancing by running every storage server as a replicated pair. All servers are equally accessible through the network, and their data contents are indexed in a "facilitator" PostgreSQL relational database management system (RDBMS), which informs data retrieval clients of the data locations on demand. The facilitator now contains about 6.2 million entries in total, of which the primary part, about 3.4 million, holds the data location information.

As for the storage equipment and database systems, the LABCOM system has three storage layers. The first is the local disk array of each data acquisition computer, where newly acquired raw data are stored into a virtual volume provided by an object-oriented DBMS (OODBMS); the OODBMS was adopted for the seamless mapping between volatile data objects in the C++ applications and their persistent instances in the OODB space [2]. The second layer consists of several large redundant disk array (RAID) servers that provide fast data retrieval to clients. The third layer comprises a few so-called mass storage systems (MSS): three 1.2 TB magneto-optical (MO) disk jukeboxes were used for the first four campaigns, and DVD-R changers with capacities of 1.8 TB or 3.3 TB each have been used since then. About 40, 3, and 4 data servers are running in the respective layers. Servers in the latter two layers do not run the OODBMS; by means of the data migration mechanism between the layers, however, they can be regarded virtually as an extension area of the OODB.

Since raw plasma diagnostic data usually consist of many channels of long time-series signals, the storage volume they occupy is out of all proportion to the number of database entries. A typical plasma data set, containing tens of binary large objects, cannot be stored in the RDBMS itself, even though the millions of index entries require its fast index search engine. This is quite different from usual RDBMS applications in other fields, and the data retrieval speed to clients becomes especially important. For structural reasons, an OODB client and server share so much information that their link often demands excessive network bandwidth for use over the Internet. By compressing all binary data with "zlib" and applying a three-tier application model to OODB data transfer and retrieval, an optimized OODB read-out rate of 1.7 MB/s and an effective client access speed of 3∼25 MB/s have been achieved. As a result, the LABCOM data system has succeeded in combining an RDBMS, an OODBMS, RAID, and MSS to provide a virtual, always expandable storage volume together with rapid data access.
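As an illustration of the facilitator's role, the following C++ sketch shows how a retrieval client might ask the PostgreSQL index which storage servers hold a given diagnostic and shot number. The table and column names (data_location, diag_name, shot_no, host, path), the connection string, and the example shot are hypothetical placeholders, not the actual LABCOM schema.

```cpp
// Hypothetical facilitator lookup via libpq: list every replica that holds
// one diagnostic/shot pair. Schema and connection parameters are assumptions.
#include <libpq-fe.h>
#include <cstdio>

int main() {
    PGconn* conn = PQconnectdb("host=facilitator dbname=labcom user=reader");
    if (PQstatus(conn) != CONNECTION_OK) {
        std::fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    // Both parameters are passed as text; the shot number is cast in SQL.
    const char* params[2] = {"Bolometer", "50000"};
    PGresult* res = PQexecParams(conn,
        "SELECT host, path FROM data_location "
        "WHERE diag_name = $1 AND shot_no = $2::int",
        2, nullptr, params, nullptr, nullptr, 0);

    if (PQresultStatus(res) == PGRES_TUPLES_OK) {
        // Each row is one replica; the client may pick any of them,
        // which is how replication also yields load balancing.
        for (int i = 0; i < PQntuples(res); ++i)
            std::printf("replica %d: %s:%s\n", i,
                        PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}
```

Because every storage server runs as a replicated pair, such a query would normally return more than one location, and the client is free to use whichever replica responds fastest.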
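The compression step can be sketched similarly. The fragment below, assuming the standard zlib compress2() call, packs one channel of a synthetic 16-bit waveform and reports the size reduction; the sample count, waveform shape, and compression level are arbitrary illustrative choices rather than LABCOM parameters.

```cpp
// Minimal zlib sketch: compress one channel of a digitized time-series signal
// before it is transferred to or stored on a server. All sizes are illustrative.
#include <zlib.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Fake waveform: ~2 MB of 16-bit samples forming a smooth, compressible ramp.
    std::vector<int16_t> samples(1 << 20);
    for (std::size_t i = 0; i < samples.size(); ++i)
        samples[i] = static_cast<int16_t>(i % 4096);

    const Bytef* src = reinterpret_cast<const Bytef*>(samples.data());
    uLong srcLen = static_cast<uLong>(samples.size() * sizeof(int16_t));

    // compressBound() gives the worst-case compressed size for this buffer.
    uLongf dstLen = compressBound(srcLen);
    std::vector<Bytef> dst(dstLen);

    // Level 6 is zlib's default speed/ratio trade-off.
    if (compress2(dst.data(), &dstLen, src, srcLen, 6) != Z_OK) {
        std::fprintf(stderr, "compression failed\n");
        return 1;
    }
    std::printf("raw %lu bytes -> compressed %lu bytes (%.1f%% of original)\n",
                srcLen, dstLen, 100.0 * dstLen / srcLen);
    return 0;
}
```

Keeping the objects compressed while they cross the network is one reason a modest OODB read-out rate can still yield a much higher effective access speed at the client, which would be consistent with the 1.7 MB/s versus 3∼25 MB/s figures quoted above.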

References

[1] J.W. Farthing, Proc. 4th IAEA TM on Control, Data Acquisition and Remote Participation for Fusion Research, San Diego, 21-23 July 2003.
[2] H. Nakanishi et al., Fusion Eng. Design 48, 135-142 (2000).