annotate paper/abstract.tex @ 108:199561d48b97

add poster
author Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
date Sat, 20 Feb 2016 18:32:23 +0900
parents 94ec38d9bdc6
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 \begin{abstract}
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 Cerium は当研究室で開発している並列プログラミングフレームワークである。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
3
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
4 従来はファイル読み込みを mmap で実装していたが、本論文では並列処理向け I/O の Blocked Read を実装を行った。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 Blocked Read とは、ファイルを一度に読み込まずに、あるサイズに分割して読み込む手法である。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
6
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 Cerium にはファイルを読み込んで文字列処理を行う例題があり、Word Count 、Boyer-Moore String Search 、正規表現を実装し測定した。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 それぞれの例題によって結果の整合性を取る必要があるが、どのように整合性を取るかは問題によって考慮する必要がある。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
9
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 本研究で実装した正規表現は、正規表現から正規表現木を生成し、その正規表現木に状態を割り振りながらNFA を生成する。
100
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
11 NFA を Subset Construction で DFA に変換する。
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
12 そして、ファイルとマッチングさせる。
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
13
100
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
14 マッチングはマッチの Range と次の状態の組の配列(tState)を生成し、tState を状態遷移させておこなう。
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
15 並列処理はブロック単位で行うが、ブロックにまたがる Pattern はあとでやり直すというシンプルな方法を採用した。
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 それぞれの例題と Mac に built-in されている wc、egrep と比較し測定を行なった。
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 \end{abstract}
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
18
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 \begin{abstract_eng}
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 Cerium is a parallel programming framework, which developed by our laboratory.
100
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
21 We implemented parallel file processing using both mmap and multi threaded read.
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
22 The multi-threaded read uses Blocked Read. The parallel proccesing is performed for each block as a map task.
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
23 The results of map tasks are summed in a reduce task.
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
24
100
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
25 Examples of parallel string proccesing in Cerium are including Word Count, Boyer-Moore Search and Regular Expression matching.
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
26 Regular expression matching uses DFA generation (Subset Construction).
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
27 The matching is performed by array of tState, which are pair of matching range of characters and next tState.
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
28 The parallel proccesing of the matching is per-block based.
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
29 In case of matching patterns which acrosses blocks, the matching processes are executed in the reduce task.
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
30
100
94ec38d9bdc6 add abstruct eng
masa
parents: 97
diff changeset
31 We compare examples of Cerium with UNIX wc and egrep.
97
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
32
c1738511433c add tSearch Source code
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 \end{abstract_eng}