annotate a06/lecture.ind @ 238:3aad251b8d8b

...
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Mon, 28 Jun 2021 09:30:41 +0900
parents b3f05cd08d24
children a3fb231feeb9
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 -title: 正規表現とオートマトン
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 --ε遷移とオートマトンの組合せ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
4
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 文字の入力がなくても状態遷移が可能であるとすると、オートマトンの組合せが楽になる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
6
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 文字を消費しない遷移を ε遷移と言う。ε遷移可能な状態をまとめて一つの状態と見れば、ε遷移のないNFAに変換できる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
8
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
9 <a href="../agda/epautomaton.agda"> ε automatocn </a>
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
10
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
11 --正規表現とオートマトンの組合せ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 <a href="../agda/regex.agda"> regex.agda </a>
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 --微分法
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16
141
b3f05cd08d24 clean up
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
17 <center><img src="fig/derivation.svg"> </center>
b3f05cd08d24 clean up
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
18
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
19 --オートマトンから正規表現を生成する
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
21 状態遷移の条件を正規表現した一般化オートマトンを考える。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 普通のオートマトンから始めて、状態を組み合わせて遷移条件を正規表現にしていく。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
25 状態が一つになったら正規表現が得られる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
26
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
27 --実際の正規表現エンジン
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
28
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
29 grep のソースコード
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
30
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
31 --Boyer-Moore Search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
32
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
33 固定文字列を検索するなら、正規表現よりも高速な手法がある。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
34
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
35 例えば、engine を検索するとする。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
36
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
37   6文字目がeでなければ、先頭の文字列からengineであることはない
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
38
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
39 なので、6文字見なくてもだめなことはわかる。しかし、7文字目からengineを探すと、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
40
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
41   xxxxxn
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
42   the engine
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
43
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
44 を見落とす可能性がある。つまり、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
45
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
46   6文字目が検索文字列に含まれる文字列だと途中からマッチする可能性がある
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
47
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 何文字目からマッチする可能性があるかは、あらかじめ調べることができる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
49
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
50   e 6文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
51   n 2文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
52   g 3文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
53   i 4文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
54   ? 7文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
55
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
56 これを繰り返せば良い。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
57
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
58   .*engine
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
59
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
60 をDFAに変換して探す場合とどれくらい速度が違うか調べてみよう。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
61
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
62 --複数文字列に対する Boyer-Moore Search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
63
46
964e4bd0272a add coinduction
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 37
diff changeset
64 ---問題6.1 正規表現の決定性オートマトンへの変換
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
65
46
964e4bd0272a add coinduction
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 37
diff changeset
66 <a href="../exercise/005.html"> 例題 </a> <!--- Exercise 6.1 --->
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
67