Design and implementation of an evolutional data collecting system for the atomic and molecular databases

Akira Sasaki, Kazuki Joe (1), Hiroe Kashiwagi (1), Chiemi Watanabe (1), Manabu Suzuki (2), Lukas Pichl (2), Masatoshi Ohishi (3), Daiji Kato (4), Masatoshi Kato (4), Takako Kato (4)

Advanced Photon Research Center, Japan Atomic Energy Research Institute
(1) Nara Women’s University
(2) The University of Aizu
(3) National Astronomical Observatory of Japan
(4) National Institute for Fusion Science

Demand for atomic and molecular data is increasing in basic science as well as in industrial applications. Atomic and molecular data, whether theoretically calculated or experimentally measured, are published in scientific journals. Staff scientists at the data centers [1] search for such papers, evaluate the data, and enter them into the databases manually. Papers in major journals are now published electronically and are available online. It should therefore be possible to improve the productivity of the databases so that they cover a wide variety of atoms and molecules, together with the corresponding excitation and ionization processes through radiation and collisions. We study a prototype system that helps find scientific papers containing atomic data and extract the data from them. We report on our recent development of bibliographic and cross-section databases using open source software [2]. To find a paper, we customize search engines to download the abstract of the paper. After converting the downloaded HTML data into an appropriate format for further manipulation, we calculate the similarity of the abstract to reference abstracts of papers known to contain atomic data of the same kind, taken from the AMDIS bibliographic database of NIFS [3]. We show preliminary experimental results of a text classification method based on LVQ (Learning Vector Quantization [4]), which finds the abstracts of papers containing atomic data. The feature vector for classification is generated by applying the TF/IDF method to each abstract.
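
As an illustration only, the classification step described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the system's actual implementation: the toy abstracts, the class labels "atomic" and "other", the smoothed IDF formula, the use of cosine similarity, the LVQ1 variant of LVQ, and all function names and parameters are hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf, vocab):
    """Unit-length TF/IDF feature vector; terms outside vocab are ignored."""
    tf = Counter(tokenize(text))
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def nearest(x, protos):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda i: sum((a - p) ** 2 for a, p in zip(x, protos[i])))


def train_lvq1(samples, labels, protos, proto_labels, lr=0.3, epochs=20):
    """LVQ1: pull the winning prototype toward a sample of the same class,
    push it away from a sample of a different class."""
    protos = [list(p) for p in protos]
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x, y in zip(samples, labels):
            k = nearest(x, protos)
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] = [p + sign * alpha * (a - p) for a, p in zip(x, protos[k])]
    return protos


def classify(x, protos, proto_labels):
    """Label of the prototype nearest to x."""
    return proto_labels[nearest(x, protos)]


# Hypothetical toy abstracts (invented for this sketch, not taken from AMDIS).
docs = [
    "electron impact ionization cross sections of helium atoms",
    "excitation cross sections for electron collisions with neon ions",
    "stock market volatility and investor behavior",
    "investor sentiment and stock market returns",
]
labels = ["atomic", "atomic", "other", "other"]

idf = build_idf(docs)
vocab = sorted(idf)
samples = [tfidf_vector(d, idf, vocab) for d in docs]

# One prototype per class, initialized from the first sample of each class.
proto_labels = ["atomic", "other"]
protos = train_lvq1(samples, labels, [samples[0], samples[2]], proto_labels)

query = tfidf_vector("electron collisions and ionization cross sections of neon",
                     idf, vocab)
print(classify(query, protos, proto_labels))
print(cosine(query, samples[0]), cosine(query, samples[2]))
```

In this sketch a new abstract is compared with the reference abstracts by cosine similarity and assigned the label of its nearest LVQ prototype. A real system would presumably need many reference abstracts, several prototypes per class, and stop-word handling.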

References

[1] For example, see http://physics.nist.gov/cgi-bin/AtData/main_asd
[2] http://qpc3.u-aizu.ac.jp/~suzuki/bib/top.php
[3] http://dpc.nifs.ac.jp/amdrc/index.html
[4] T. Kohonen, Self-Organizing Maps, 3rd edition, Springer, 2001