Mercurial > hg > Papers > 2016 > masa-master

\begin{abstract}
Cerium は当研究室で開発している並列プログラミングフレームワークである。

従来はファイル読み込みを mmap で実装していたが、本論文では並列処理向け I/O の Blocked Read を実装を行った。
Blocked Read とは、ファイルを一度に読み込まずに、あるサイズに分割して読み込む手法である。

Cerium にはファイルを読み込んで文字列処理を行う例題があり、Word Count 、Boyer-Moore String Search 、正規表現を実装し測定した。
それぞれの例題によって結果の整合性を取る必要があるが、どのように整合性を取るかは問題によって考慮する必要がある。

本研究で実装した正規表現は、正規表現から正規表現木を生成し、その正規表現木に状態を割り振りながらNFA を生成する。
NFA を  Subset Construction で DFA に変換する。
そして、ファイルとマッチングさせる。

マッチングはマッチの Range と次の状態の組の配列(tState)を生成し、tState を状態遷移させておこなう。
並列処理はブロック単位で行うが、ブロックにまたがる Pattern はあとでやり直すというシンプルな方法を採用した。
それぞれの例題と Mac に built-in されている wc、egrep と比較し測定を行なった。
\end{abstract}

\begin{abstract_eng}
Cerium is a parallel programming framework, which developed by our laboratory.
We implemented parallel file processing using both mmap and multi threaded read.
The multi-threaded read uses Blocked Read. The parallel proccesing is performed for each block as a map task.
The results of map tasks are summed in a reduce task.

Examples of parallel string proccesing in Cerium are including Word Count, Boyer-Moore Search and Regular Expression matching.
Regular expression matching uses DFA generation (Subset Construction).
The matching is performed by array of tState, which are pair of matching range of characters and next tState.
The parallel proccesing of the matching is per-block based.
In case of matching patterns which acrosses blocks, the matching processes are executed in the reduce task.

We compare examples of Cerium with UNIX wc and egrep.

\end{abstract_eng}
author	Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
date	Sat, 20 Feb 2016 18:32:23 +0900
parents	94ec38d9bdc6
children