Automated development of plasma process database: a Linux system for on-line data mining and data update

M. Suzuki, L. Pichl, T. Kato1), I. Murakami1), A. Sasaki2)

University of Aizu, Tsuruga, Ikki, Aizu-Wakamatsu, 965-8580 Japan
1) National Institute for Fusion Science, Oroshi-cho, Toki, Gifu 509-5292, Japan
2) Japan Atomic Energy Research Institute, Higashi-Shiokoji-cho, Shimogyo-ku, Kyoto, 600-8216 Japan

As part of the joint research project reported in parallel at the conference, we have been developing a database system similar to those in common use at the world data centers, such as DPC at NIFS. Rather than outsourcing the software engineering work, we focus on developing a free, open-source solution. In particular, we report on our on-line bibliographic database for electron-molecule scattering, built on the Linux operating system, the MySQL relational database management system, a PHP logic layer, the Apache server, and an HTML presentation layer [1]. We also report on our development of a cross-section database with on-line generation of data plots, using a test data set [2]. The major issue discussed in this work is the sequence of automated data collection, data mining, and database update. Automated scripts that browse the abstract databases of several publishers have been developed and tested. At semi-regular intervals, new data are searched for, downloaded, extracted, and uploaded to the database automatically. The present system employs a simple keyword search to estimate the relevance of an abstract; it is designed so that more sophisticated methods, such as the Learning Vector Quantization algorithm (reported in parallel by other authors), can be incorporated in the future.
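
A keyword-based relevance estimate of this kind can be illustrated by the minimal sketch below. It is not the code of the system itself: the keyword list, the weights, the threshold, and the function names are all assumptions chosen for illustration, and Python is used here for brevity rather than the PHP logic layer of the system.

```python
import re

# Hypothetical keywords and weights, chosen only for illustration;
# the keywords used by the actual system are not given in the abstract.
KEYWORDS = {
    "electron": 1.0,
    "molecule": 1.0,
    "scattering": 2.0,
    "cross section": 2.0,
}


def relevance_score(abstract, keywords=KEYWORDS):
    """Sum keyword weights over all whole-word (or whole-phrase) matches."""
    text = abstract.lower()
    score = 0.0
    for phrase, weight in keywords.items():
        # \b anchors restrict matches to whole words, so "electrons"
        # does not count as a match for "electron" in this simple sketch.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        score += weight * len(re.findall(pattern, text))
    return score


def is_relevant(abstract, threshold=3.0):
    """Flag an abstract for upload when its score reaches the (assumed) threshold."""
    return relevance_score(abstract) >= threshold
```

In such a scheme, an abstract would be passed on to the database update step only when `is_relevant` returns `True`. A more sophisticated classifier could later replace `relevance_score` without changing the rest of the sequence.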

References

[1] http://qpc3.u-aizu.ac.jp/~suzuki/bib/top.php
[2] L. Pichl, S. Zou, M. Kimura, I. Murakami, and T. Kato, J. Phys. Chem. Ref. Data 33, 2004, in print.


This work has been supported in part by a Grant-in-Aid from the Japan Society for the Promotion of Science and by the National Institute for Fusion Science.