annotate paper/chapter3.tex @ 73:17c93faef65b

fix
author Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
date Wed, 26 Feb 2014 02:56:44 +0900
parents 286a8f57becf
children e13727d01f7a
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
1 \chapter{例題}
0
9bf2694ed231 add some files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 \label{chap:poordirection}
9bf2694ed231 add some files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
4 \section{ファイルの読み込みに関する例題}
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
5 テキストファイルをある一定のサイズに分割して読み込むプログラムである。このプログラムでは、pread という関数で実装した。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
6 pread 関数は UNIX 標準に関するヘッダファイル、unistd.h に含まれている関数である。(表\ref{table:pread})
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
7 読み込んだテキストファイルはバッファに格納されるが、その格納先は TaskManager の API でメモリを確保する。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
8 \begin{tiny}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
9 \begin{table}[ht]
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
10 \begin{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
11 \label{table:pread}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
12 \small
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
13 ssize\_t pread(int fd, void *buf, size\_t nbyte, off\_t offset);
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
14 \begin{tabular}[t]{c|l}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
15 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
16 int fd & 読み込むファイルディスクリプタ \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
17 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
18 void *buf & 予め用意したバッファへの書き込み \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
19 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
20 size\_t nbyte & 読み込むサイズ \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
21 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
22 off\_t offset& ファイルの先頭からのオフセット \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
23 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
24 \end{tabular}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
25 \caption{pread 関数の概要}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
26 \end{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
27 \end{table}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
28 \end{tiny}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
29
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
30 この例題の Task 生成部分を以下に示す。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
31 \\
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
32 \begin{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
33 \begin{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
34 HTaskPtr read = manager->create_task(Read_task);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
35 read->set_cpu(SPE_ANY);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
36
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
37 read->set_param(0,(long)task_number);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
38 read->set_param(1,(long)division_size);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
39 if(read_left_size <= division_size){
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
40 read->set_param(2,(long)left_size);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
41 }else{
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
42 read->set_param(2,(long)division_size);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
43 }
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
44 read->set_param(3,(long)fd);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
45
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
46 read->set_outData(0,read_text + task_number*division_size,
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
47 division_size);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
48 read->spawn();
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
49
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
50 read_left_size -= division_size;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
51 task_number++;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
52 \end{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
53 \end{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
54
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
55 read という Task を宣言し、Task に CPU Type、パラメータとして生成した Task 番号の task\_number
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
56 1つの Task の読み込み量 division\_size、読み込むファイルのファイルディスクリプタ fd を設定する。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
57 読み込んだデータの格納先を set\_outData にて設定を行い、そして spawn にて Task を生成する。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
58
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
59 Task が生成されると、division\_size分の読み込みを行ったということで、残りの読み込み量 read\_left\_size から division\_size を引き、そして task\_numberを増加させる。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
60 この Task 生成部分は ファイルサイズ を division\_size で割った数だけ生成され、もし余りがあれば更に1加えた数になる。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
61 その数が Read Task の数となり、その数だけループ処理を行う。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
62 中盤にある if 文は、最後の Read Task かどうかで実際に読み込む量が決定される。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
63
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
64 read Task の記述を以下に示す。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
65 \\
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
66 \begin{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
67 \begin{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
68 static int
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
69 read_task(SchedTask *s, void *rbuf, void *wbuf)
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
70 {
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
71 long task_number = (long)s->get_param(0);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
72 long division_size = (long)s->get_param(1);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
73 long read_size = (long)s->get_param(2);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
74 long fd = (long)s->get_param(3);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
75
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
76 char *read_text = (char*)s->get_output(wbuf,0);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
77
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
78 pread(fd, read_text, (long)read_size , division_size*task_number);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
79 return 0;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
80 }
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
81 \end{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
82 \end{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
83 生成時に設定したデータ群を受け取り、それらのデータを pread の引数に渡す。読み込まれたファイルは read\_text に格納される。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
84
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
85 ハードディスクに保存されている 10GB のテキストファイルを分割して読み込み終わるまでの時間を表\ref{table:preaddata}に示す。
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
86 分割サイズとは、1回の読み込み量である。
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
87 \begin{tiny}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
88 \begin{table}[ht]
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
89 \begin{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
90 \label{table:preaddata}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
91 \small
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
92 \begin{tabular}[t]{c|l}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
93 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
94 分割サイズ & 読み込み速度(s)\\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
95 \hline
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
96 16KB & 391.7 \\
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
97 \hline
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
98 16MB & 123.6 \\
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
99 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
100 \end{tabular}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
101 \caption{file read の実行結果}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
102 \end{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
103 \end{table}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
104 \end{tiny}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
105
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
106 分割サイズを大きくすると、pread の呼ばれる回数が少なくなるので読み込むことが速くなる。
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
107
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
108 \newpage
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
109
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
110 \section{ファイルに対して処理を行う例題}
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
111 読み込んだテキストファイルに対して文字列検索を行う例題として、力任せ法と Boyer-Moore String Search を実装した。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
112 Boyer-Moore String Search は 1977 年に Robert S. Boyer と J Strother Moore が開発した効率的なアルゴリズムである。
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
113 なお、テキストファイルに含まれている文字列を text、検索する文字列を pattern と定義する。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
114
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
115 \subsection{力任せ法}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
116 力任せ法(総当り法とも呼ばれる)は、text と pattern を先頭から比較していき、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
117 pattern と一致しなければ pattern を1文字分だけ後ろにずらして再度比較をしていくアルゴリズムである。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
118 text の先頭から pattern の先頭を比較していき、文字の不一致が起きた場合は、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
119 pattern を右に 1 つだけずらして、再度 text と pattern の先頭を比較していく。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
120 (図\ref{fig:bruteforth})
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
121
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
122 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
123 \begin{center}
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
124 \includegraphics[width=0.7\textwidth]{fig/bruteforth.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
125 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
126 \caption{力まかせ法}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
127 \label{fig:bruteforth}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
128 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
129
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
130 \newpage
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
131 このアルゴリズムは実装が簡単であり、pattern が text に含まれていれば必ず探しだすことができる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
132 しかし、text と pattern の文字数が大きくなるにつれて、比較回数も膨大になる恐れがある。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
133 text の文字数を $n$、pattern の文字数を $m$とすると、力任せ法の最悪計算時間は $O(nm)$ となる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
134
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
135 \subsection{Boyer-Moore String Search Algorithm}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
136 力任せ法の比較回数を改善したアルゴリズムが Boyer-Moore String Search である。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
137 力任せ法との大きな違いとして、text と pattern を先頭から比較するのではなく、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
138 pattern の末尾から比較していくことである。そして不一致が起こった場合は、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
139 その不一致が起こった text の文字で再度比較する場所が決まる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
140
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
141 まず始めに、比較する場所を着目点とおく。図\ref{fig:bmsearchthink}の場合、
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
142 最初に比較する patternの末尾 と、それに対応する text を着目点とする。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
143 (a)ではその着目点で不一致が起こっているので、それ以上比較しなくてもよいことがわかる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
144 不一致が起こった場合は (b) のように着目点をずらしていく。着目点を1つ右にずらして再度 pattern の末尾から比較していく。これを繰り返しをして、(d) のときに初めて一致することがわかる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
145
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
146 (a)のときに不一致を起こした text の文字に注目する。その文字が pattern に含まれていない文字であるならば、着目点を1つずらしても、2つずらしても一致することはない。pattern に含まれていない文字で不一致になった場合は、pattern の文字数分だけ着目点をずらすことができる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
147
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
148 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
149 \begin{center}
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
150 \includegraphics[width=0.5\textwidth]{fig/bmsearchthink.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
151 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
152 \caption{pattern に含まれていない文字で不一致になった場合}
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
153 \label{fig:bmsearchthink}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
154 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
155
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
156 \newpage
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
157
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
158 次に、pattern に含まれている文字で不一致になった場合を紹介する。
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
159 図\ref{fig:bmsearchthink}と同様に、文字を比較していく。
46
b72aa70d301d write BMsearch
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
160 図\ref{fig:bmsearchinclude}(a)のときに不一致が起こる。
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
161 その時の text の文字列は pattern に含まれている。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
162 この場合は着目点を右に2つずらすと text と pattern が一致する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
163 もし、pattern に含まれている文字で不一致になった場合は、その text の文字に注目する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
164 その文字を pattern 内から探し、その文字が pattern の末尾から
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
165 どれだけ離れているかで着目点を右にずらす量が決定される。
46
b72aa70d301d write BMsearch
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
166 図\ref{fig:bmsearchinclude}の場合であれば、不一致時の文字が a であれば右に2つ、
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
167 b であれば右に1つずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
168
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
169 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
170 \begin{center}
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
171 \includegraphics[width=0.7\textwidth]{fig/bmsearchinlucde.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
172 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
173 \caption{pattern に含まれている文字で不一致になった場合}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
174 \label{fig:bmsearchinclude}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
175 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
176
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
177 \newpage
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
178
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
179 pattern に同じ文字が複数入り、その文字で不一致になった場合は図\ref{fig:bmsearchsame}のようになる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
180 この場合 a という文字が pattern の末尾から 1つ離れている箇所と 3 つ離れている箇所が存在する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
181 (a) のように、a で不一致が起こった場合は、着目点を右に1つか3つ移動できる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
182 しかし、着目点を右に3つずらしてしまうと、(b-1)のように text の途中にある "abac" という文字列を見逃してしまう。着目点を右に1つずらせば、(b-2)のように検索漏れを起こすことはなくなる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
183
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
184 このように、pattern に同じ文字が複数入っている場合は、末尾から一番近いほうを適用する。よって、図\ref{fig:bmsearchinclude}では、不一致時の文字が a であれば右に1つ、b であれば右に2つ着目点をずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
185
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
186 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
187 \begin{center}
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
188 \includegraphics[width=0.7\textwidth]{fig/bmsearchsame.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
189 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
190 \caption{pattern に同じ文字が複数入り、その文字で不一致になった場合}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
191 \label{fig:bmsearchsame}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
192 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
193
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
194
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
195 \newpage
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
196 pattern と text と不一致時の処理をまとめると、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
197
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
198 \begin{itemize}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
199 \item pattern に含まれていない文字の場合は、 pattern の長さだけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
200 \item pattern に含まれている文字の場合は、 その文字が pattern の末尾から離れている分だけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
201 \item pattern に含まれている文字かつ、その文字が pattern に複数含まれている場合は、 pattern の末尾から一番近い分だけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
202 \end{itemize}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
203
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
204 となる。 図\ref{fig:bmsearchbasic}の例であれば、不一致字の text の文字が a であれば着目点を 2つ、 b であれば 1つ、それ以外の文字列は3つずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
205
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
206 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
207 \begin{center}
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
208 \includegraphics[width=0.7\textwidth]{fig/bmsearchbasic.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
209 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
210 \caption{Boyer-Moore Search String}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
211 \label{fig:bmsearchbasic}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
212 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
213
68
9c5f2ffbeb4e fix preliminary
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 64
diff changeset
214 \newpage
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
215 このように、Boyer-Moore Search String は、不一致が起こった時の text の文字によって着目点をずらすということが起こるので、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
216 文字列検索を行う前に、着目点をずらす量を参照するための BMsearch skip table を作成する。
46
b72aa70d301d write BMsearch
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 45
diff changeset
217 "doing" という pattern であれば、そのテーブルは図\ref{fig:bmskiptable1}となる。
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
218
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
219 \begin{figure}[htbp]
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
220 \begin{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
221 \includegraphics[width=0.8\textwidth]{fig/bmskiptable1.pdf}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
222 \end{center}
73
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
223 \caption{BMsearch skip table のまとめ}
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
224 \label{fig:bmskiptable1}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
225 \end{figure}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
226
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
227 このようなテーブルを、文字列検索を行う前に生成する必要がある。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
228 そのテーブル生成プログラムは以下のようになる。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
229 \\
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
230 \begin{breakbox}
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
231 \begin{verbatim}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
232 static int*
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
233 create_BMskiptable(unsigned char *pattern,
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
234 int pattern_len,int *skip_table)
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
235 {
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
236 for(int i = 0; i < 128; ++i){
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
237 skip_table[i] = pattern_len;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
238 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
239
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
240 for(int j = 0; j < pattern_len - 1; ++j){
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
241 skip_table[pattern[j]] = pattern_len - j - 1;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
242 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
243
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
244 return skip_table;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
245 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
246 \end{verbatim}
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
247 \end{breakbox}
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
248 このソースでの 128 とは ASCII コード表における最大値である。
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
249 それぞれの文字に対して pattern の長さであるpattern\_lenを格納する。pattern が "doing" という文字列だと仮定すると、
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
250 pattern\_len = 5 となる。次に pattern の先頭から文字を調べる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
251 先頭の文字は d であり、d に対応する table に着目点をずらすための移動量を格納する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
252 移動量を格納したら、pattern の次の文字を調べ、そして移動量を格納していく。(図\ref{fig:bmskiptable})
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
253
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
254 \begin{figure}[htbp]
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
255 \begin{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
256 \includegraphics[width=0.8\textwidth]{fig/bmskiptable.pdf}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
257 \end{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
258 \caption{skip table の 生成時の様子}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
259 \label{fig:bmskiptable}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
260 \end{figure}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
261
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
262 \newpage
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
263 生成したテーブルを利用して文字列検索を行うアルゴリズムを以下に示す。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
264 \\
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
265 \begin{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
266 \begin{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
267 int i = pattern_len - 1;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
268 int match_counter = 0;
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
269
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
270 while ( i < text_len){
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
271 int j = pattern_len - 1;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
272 while (text[i] == pattern[j]){
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
273 if (j == 0){
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
274 match_counter++;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
275 }
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
276 --i;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
277 --j;
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
278 }
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
279 i = i + max(skip_table[text[i]],pattern_len - j);
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
280 }
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
281 \end{verbatim}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
282 \end{breakbox}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
283 読み込まれたテキストファイル text[i] と、検索文字列 pattern[j]を末尾から比較していく。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
284 一致していれば参照する場所を先頭方向にずらしていき、先頭まで完全一致した場合は、
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
285 検索文字列を発見したということで、ここではマッチした数 match\_counter を増やしている。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
286
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
287 もし途中で不一致が起こった場合は、その不一致が起こった時の text[i] の文字列か pattern の長さから j 引いたものの最大値の分だけ text の参照先をずらす。
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
288
73
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
289 \newpage
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
290 \subsection{ファイル分割時の処理の注意点}
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
291 この例題ではファイルを読み込んで一定の大きさでファイルを分割する。分割したものにそれぞれ文字列検索を行う。
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
292 それぞれの結果は pattern が含まれている個数が返ってくるので、最後に集計をして個数を表示する。このような一つ一つの処理を Task と呼ぶ。
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
293 図\ref{fig:includeiotask}では、ファイルの読み込みが File Read、分割したものに文字列検索を行うことが Run Tasks、
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
294 返した結果が Output Data、それぞれの結果の集計が Run resultTask に相当する。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
295
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
296 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
297 \begin{center}
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
298 \includegraphics[width=1.0\textwidth]{fig/includeiotask.pdf}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
299 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
300 \caption{IO を含む Task}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
301 \label{fig:includeiotask}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
302 \end{figure}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
303
69
3988365f6f03 add eclbkbox.sty
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 68
diff changeset
304 \newpage
3988365f6f03 add eclbkbox.sty
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 68
diff changeset
305
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
306 ファイルを分割したときに、分割される部分で pattern が含まれる場合が存在する。
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
307 その場合は、本来の読み込み部分の text の長さ $L$ に加えて、pattern の長さ $s$ だけ多く読みこむように設計することでこの問題は解決できる。
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
308 しかし、1 つの Search Task の text の長さが $L+s$ の場合だと、pattern が Search Task 1 に含まれ、Search Task 2 にも含まれてしまう。
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
309 (図\ref{fig:includeiotask})
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
310
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
311 \begin{figure}[htbp]
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
312 \begin{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
313 \includegraphics[width=1.0\textwidth]{fig/iodivfail.pdf}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
314 \end{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
315 \caption{分割周りの処理・失敗時 (例:doing の検索)}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
316 \label{fig:includeiotask}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
317 \end{figure}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
318
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
319 それを解決するために、1つの Search task の text の長さに pattern の長さを加えてから 1 引いた数だけ読み込むようにすればそのような問題は起こらない。よって、読み込むデータ量は $ L + s -1 $となる。
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
320
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
321 \begin{figure}[htbp]
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
322 \begin{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
323 \includegraphics[width=1.0\textwidth]{fig/iodivsuc.pdf}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
324 \end{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
325 \caption{分割周りの処理・成功時 (例:doing の検索)}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
326 \label{fig:includeiotask}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
327 \end{figure}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
328
73
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
329 \newpage
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
330
64
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
331 力任せ法と Boyer-Moore String Search を比較してみた。以下に実験環境と結果を示す。(表\ref{table:search})
73
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
332
64
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
333 \begin{itemize}
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
334 \item Mac OS X 10.9.1
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
335 \item 2*2.66 GHz 6-Core Intel Xeon
70
286a8f57becf add source code chapter 2,3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 69
diff changeset
336 \item File Size 10GB
64
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
337 \end{itemize}
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
338
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
339 \begin{tiny}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
340 \begin{table}[ht]
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
341 \begin{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
342 \label{table:search}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
343 \small
64
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
344 \begin{tabular}[t]{c|r}
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
345 \hline
64
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 48
diff changeset
346 文字列検索アルゴリズム & 処理速度(s)\\
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
347 \hline
48
masa
parents: 47
diff changeset
348 力任せ法 & 11.792 \\
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
349 \hline
48
masa
parents: 47
diff changeset
350 Boyer-Moore String Search & 6.508 \\
47
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
351 \hline
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
352 \end{tabular}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
353 \caption{文字列検索アルゴリズムの比較}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
354 \end{center}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
355 \end{table}
6cb2ab3726bf chapter3 ok
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
356 \end{tiny}
48
masa
parents: 47
diff changeset
357
masa
parents: 47
diff changeset
358 Boyer-Moore String Search によって 44\% 改善した。