annotate paper/chapter3.tex @ 45:c041b9b3bf9d

add image
author Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
date Tue, 18 Feb 2014 19:15:02 +0900
parents cd95400bcaec
children b72aa70d301d
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
1 \chapter{例題}
0
9bf2694ed231 add some files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2 \label{chap:poordirection}
9bf2694ed231 add some files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3
11
5e67750b1c4f write chapter label
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 6
diff changeset
4 \section{File Read}
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
5 テキストファイルをある一定のサイズに分割して読み込むプログラムである。このプログラムでは、pread という関数で実装した。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
6 pread 関数は UNIX 標準に関するヘッダファイル、unistd.h に含まれている関数である。(表\ref{table:pread})
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
7 読み込んだテキストファイルはバッファに格納されるが、その格納先は TaskManager の API でメモリを確保する。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
8 \begin{tiny}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
9 \begin{table}[ht]
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
10 \begin{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
11 \label{table:pread}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
12 \small
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
13 ssize\_t pread(int fd, void *buf, size\_t nbyte, off\_t offset);
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
14 \begin{tabular}[t]{c|l}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
15 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
16 int fd & 読み込むファイルディスクリプタ \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
17 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
18 void *buf & 予め用意したバッファへの書き込み \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
19 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
20 size\_t nbyte & 読み込むサイズ \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
21 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
22 off\_t offset& ファイルの先頭からのオフセット \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
23 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
24 \end{tabular}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
25 \caption{pread 関数の概要}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
26 \end{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
27 \end{table}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
28 \end{tiny}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
29
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
30 1GB のテキストファイルを分割して読み込み終わるまでの時間を表\ref{table:preaddata}に示す。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
31 \begin{tiny}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
32 \begin{table}[ht]
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
33 \begin{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
34 \label{table:preaddata}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
35 \small
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
36 \begin{tabular}[t]{c|l}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
37 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
38 分割サイズ & 読み込み速度(s)\\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
39 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
40 16KB & XX.XXX \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
41 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
42 16MB & XX.XXX \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
43 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
44 256MB & XX.XXX \\
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
45 \hline
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
46 \end{tabular}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
47 \caption{file read の実行結果}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
48 \end{center}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
49 \end{table}
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
50 \end{tiny}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
51
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
52 分割サイズを大きくすると、pread の呼ばれる回数が少なくなるので読み込むことが速くなる。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
53 しかし、ある一定以上の大きさになると I/O ネックが勝ってしまい、読み込み速度が変わらない。
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
54
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
55 \newpage
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
56
36
03c8790e34b5 add OUTLINE chapter3
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 11
diff changeset
57 \section{Boyer-Moore String Search Algorithm}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
58 読み込んだテキストファイルに対して文字列検索を行う例題で、Boyer-Moore String Search を実装した。
43
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
59 このアルゴリズムは 1977 年に Robert S. Boyer と J Strother Moore が開発した効率的なアルゴリズムである。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
60
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
61 Boyer-Moore String Search を紹介する前に、文字列検索で比較的単純なアルゴリズムである力任せ法を紹介する。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
62 なお、テキストファイルに含まれている文字列を text、検索する文字列を pattern と定義する。
c153591bb3f7 add some image files
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 42
diff changeset
63
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
64 力任せ法(総当り法とも呼ばれる)は、text と pattern を先頭から比較していき、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
65 pattern と一致しなければ pattern を1文字分だけ後ろにずらして再度比較をしていくアルゴリズムである。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
66 text の先頭から pattern の先頭を比較していき、文字の不一致が起きた場合は、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
67 pattern を右に 1 つだけずらして、再度 text と pattern の先頭を比較していく。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
68 (図\ref{fig:bruteforth})
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
69
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
70 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
71 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
72 \includegraphics[width=0.5\textwidth]{fig/bruteforth.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
73 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
74 \caption{力まかせ法}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
75 \label{fig:bruteforth}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
76 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
77
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
78 このアルゴリズムは実装が簡単であり、pattern が text に含まれていれば必ず探しだすことができる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
79 しかし、text と pattern の文字数が大きくなるにつれて、比較回数も膨大になる恐れがある。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
80 text の文字数を $n$、pattern の文字数を $m$とすると、力任せ法の最悪計算時間は $O(nm)$ となる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
81
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
82 この比較回数を改善したアルゴリズムが Boyer-Moore String Search である。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
83 力任せ法との大きな違いとして、text と pattern を先頭から比較するのではなく、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
84 pattern の末尾から比較していくことである。そして不一致が起こった場合は、
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
85 その不一致が起こった text の文字で再度比較する場所が決まる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
86
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
87
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
88 \newpage
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
89 まず始めに、比較する場所を着目点とおく。図\ref{fig:bmsearchthink}の場合、
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
90 最初に比較する patternの末尾 と、それに対応する text を着目点とする。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
91 (a)ではその着目点で不一致が起こっているので、それ以上比較しなくてもよいことがわかる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
92 不一致が起こった場合は (b) のように着目点をずらしていく。着目点を1つ右にずらして再度 pattern の末尾から比較していく。これを繰り返しをして、(d) のときに初めて一致することがわかる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
93
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
94 (a)のときに不一致を起こした text の文字に注目する。その文字が pattern に含まれていない文字であるならば、着目点を1つずらしても、2つずらしても一致することはない。pattern に含まれていない文字で不一致になった場合は、pattern の文字数分だけ着目点をずらすことができる。
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
95
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
96 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
97 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
98 \includegraphics[width=0.5\textwidth]{fig/bmsearchthink.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
99 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
100 \caption{pattern に含まれていない文字で不一致になった場合}
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
101 \label{fig:bmsearchthink}
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
102 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
103
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
104 次に、pattern に含まれている文字で不一致になった場合を紹介する。
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
105 図\ref{fig:bmsearchthink}と同様に、文字を比較していく。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
106 図\ref{bmsearchinclude}(a)のときに不一致が起こる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
107 その時の text の文字列は pattern に含まれている。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
108 この場合は着目点を右に2つずらすと text と pattern が一致する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
109 もし、pattern に含まれている文字で不一致になった場合は、その text の文字に注目する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
110 その文字を pattern 内から探し、その文字が pattern の末尾から
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
111 どれだけ離れているかで着目点を右にずらす量が決定される。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
112 図\ref{bmsearchinclude}の場合であれば、不一致時の文字が a であれば右に2つ、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
113 b であれば右に1つずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
114
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
115 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
116 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
117 \includegraphics[width=0.5\textwidth]{fig/bmsearchinlucde.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
118 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
119 \caption{pattern に含まれている文字で不一致になった場合}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
120 \label{fig:bmsearchinclude}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
121 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
122
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
123 \newpage
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
124
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
125 pattern に同じ文字が複数入り、その文字で不一致になった場合は図\ref{fig:bmsearchsame}のようになる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
126 この場合 a という文字が pattern の末尾から 1つ離れている箇所と 3 つ離れている箇所が存在する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
127 (a) のように、a で不一致が起こった場合は、着目点を右に1つか3つ移動できる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
128 しかし、着目点を右に3つずらしてしまうと、(b-1)のように text の途中にある "abac" という文字列を見逃してしまう。着目点を右に1つずらせば、(b-2)のように検索漏れを起こすことはなくなる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
129
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
130 このように、pattern に同じ文字が複数入っている場合は、末尾から一番近いほうを適用する。よって、図\ref{fig:bmsearchinclude}では、不一致時の文字が a であれば右に1つ、b であれば右に2つ着目点をずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
131
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
132 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
133 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
134 \includegraphics[width=0.5\textwidth]{fig/bmsearchsame.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
135 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
136 \caption{pattern に同じ文字が複数入り、その文字で不一致になった場合}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
137 \label{fig:bmsearchsame}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
138 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
139
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
140
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
141 \newpage
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
142 pattern と text と不一致時の処理をまとめると、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
143
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
144 \begin{itemize}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
145 \item pattern に含まれていない文字の場合は、 pattern の長さだけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
146 \item pattern に含まれている文字の場合は、 その文字が pattern の末尾から離れている分だけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
147 \item pattern に含まれている文字かつ、その文字が pattern に複数含まれている場合は、 pattern の末尾から一番近い分だけ着目点を右にずらす
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
148 \end{itemize}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
149
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
150 となる。 図\ref{fig:bmsearchbasic}の例であれば、不一致字の text の文字が a であれば着目点を 2つ、 b であれば 1つ、それ以外の文字列は3つずらすことができる。
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
151
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
152 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
153 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
154 \includegraphics[width=0.5\textwidth]{fig/bmsearchbasic.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
155 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
156 \caption{Boyer-Moore Search String}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
157 \label{fig:bmsearchbasic}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
158 \end{figure}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
159
45
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
160 このように、Boyer-Moore Search String は、不一致が起こった時の text の文字によって着目点をずらすということが起こるので、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
161 文字列検索を行う前に、着目点をずらす量を参照するための BMsearch skip table を作成する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
162 "doing" という pattern であれば、そのテーブルは図\ref{bmskiptable1}となる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
163
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
164 \begin{figure}[htbp]
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
165 \begin{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
166 \includegraphics[width=0.8\textwidth]{fig/bmskiptable1.pdf}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
167 \end{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
168 \caption{BMsearch skip table}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
169 \label{fig:bmskiptable1}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
170 \end{figure}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
171
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
172 \if0
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
173 文字列検索を行う前に、不一致になったときの文字列によって着目点をいくつ右にずらすかというテーブルを作成しなければならない。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
174
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
175 そのテーブルの作成は次のプログラムで作成できる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
176
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
177 \newpage
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
178
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
179 \begin{verbatim}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
180 static int*
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
181 create_BMskiptable(unsigned char *pattern,
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
182 int pattern_len,int *skip_table)
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
183 {
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
184 for(int i = 0; i < 128; ++i){
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
185 skip_table[i] = pattern_len;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
186 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
187
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
188 for(int j = 0; j < pattern_len - 1; ++j){
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
189 skip_table[pattern[j]] = pattern_len - j - 1;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
190 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
191
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
192 return skip_table;
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
193 }
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
194 \end{verbatim}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
195 このソースでの 128 とは ASCII コード表における最大値である。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
196 それぞれの文字に対して pattern の長さ (pattern\_len) を格納する。pattern が "doing" という文字列だと仮定すると、
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
197 pattern\_len = 5 となる。次に pattern の先頭から文字を調べる。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
198 先頭の文字は d であり、d に対応する table に着目点をずらすための移動量を格納する。
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
199 移動量を格納したら、pattern の次の文字を調べ、そして移動量を格納していく。(図\ref{fig:bmskiptable})
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
200
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
201 \begin{figure}[htbp]
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
202 \begin{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
203 \includegraphics[width=0.8\textwidth]{fig/bmskiptable.pdf}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
204 \end{center}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
205 \caption{skip table の 生成時の様子}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
206 \label{fig:bmskiptable}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
207 \end{figure}
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
208
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
209 \fi
c041b9b3bf9d add image
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 44
diff changeset
210
44
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
211
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
212 図\ref{fig:includeiotask}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
213
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
214 \begin{figure}[htbp]
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
215 \begin{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
216 \includegraphics[width=0.5\textwidth]{fig/includeiotask.pdf}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
217 \end{center}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
218 \caption{IO を含む Task}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
219 \label{fig:includeiotask}
cd95400bcaec write BM search (not finish)
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 43
diff changeset
220 \end{figure}
37
ce985cabf699 add OUTLINE chapter4
Masataka Kohagura <e085726@ie.u-ryukyu.ac.jp>
parents: 36
diff changeset
221