annotate a06/lecture.ind @ 406:a60132983557

...
author Shinji KONO <kono@ie.u-ryukyu.ac.jp>
date Wed, 08 Nov 2023 21:35:54 +0900
parents a3fb231feeb9
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 -title: 正規表現とオートマトン
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
2
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
3 正規表現は、Automaton と同じ言語であることが期待される。つまり、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
4
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
5  正規表現から、それと同じ language を受け付ける Automaton
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
6
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
7 を作ればよい。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
8
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
9 --ε遷移と非決定性オートマトンの合成による方法
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
10
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
11 教科書にはこの方法が記述されている。
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
12
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 文字の入力がなくても状態遷移が可能であるとすると、オートマトンの組合せが楽になる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
14
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 文字を消費しない遷移を ε遷移と言う。ε遷移可能な状態をまとめて一つの状態と見れば、ε遷移のないNFAに変換できる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
16
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
17 そして、それを subset construction で DFA にすれば良い。
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
18
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
19 実際に、M-Concat などで合成を実行することができる。
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
20
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
21 しかし、もう少し直接的な方法も存在する。
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
22
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
23 --微分法
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
24
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
25 ある正規表現を考えた時に、それを
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
26
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
27   空列を受け付けるかどうか
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
28   個々の文字列を一文字受け付けたらどうなるか
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
29
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
30 と考える。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
31
141
b3f05cd08d24 clean up
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
32 <center><img src="fig/derivation.svg"> </center>
b3f05cd08d24 clean up
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 46
diff changeset
33
326
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
34 まず空列を受け付けるかどうか判定する。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
35
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
36 empty? : Regex Σ → Bool
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
37 empty? ε = true
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
38 empty? φ = false
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
39 empty? (x *) = true
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
40 empty? (x & y) = empty? x /\ empty? y
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
41 empty? (x || y) = empty? x \/ empty? y
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
42 empty? < x > = false
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
43
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
44 正規表現の変形を実行する
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
45
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
46 derivative : Regex Σ → Σ → Regex Σ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
47 derivative ε s = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
48 derivative φ s = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
49 derivative (x *) s with derivative x s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
50 ... | ε = x *
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
51 ... | φ = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
52 ... | t = t & (x *)
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
53 derivative (x & y) s with empty? x
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
54 ... | true with derivative x s | derivative y s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
55 ... | ε | φ = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
56 ... | ε | t = y || t
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
57 ... | φ | t = t
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
58 ... | x1 | φ = x1 & y
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
59 ... | x1 | y1 = (x1 & y) || y1
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
60 derivative (x & y) s | false with derivative x s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
61 ... | ε = y
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
62 ... | φ = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
63 ... | t = t & y
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
64 derivative (x || y) s with derivative x s | derivative y s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
65 ... | φ | y1 = y1
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
66 ... | x1 | φ = x1
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
67 ... | x1 | y1 = x1 || y1
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
68 derivative < x > s with eq? x s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
69 ... | yes _ = ε
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
70 ... | no _ = φ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
71
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
72 ここで、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
73
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
74 eq? : (x y : Σ) → Dec (x ≡ y)
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
75
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
76 があることを使っている。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
77
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
78 これの繰り返しで選れる正規表現の集合を data を使って表現できる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
79
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
80 data regex-states (x : Regex Σ ) : Regex Σ → Set where
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
81 unit : regex-states x x
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
82 derive : { y : Regex Σ } → regex-states x y → (s : Σ) → regex-states x ( derivative y s )
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
83
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
84 record Derivative (x : Regex Σ ) : Set where
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
85 field
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
86 state : Regex Σ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
87 is-derived : regex-states x state
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
88
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
89 これを状態にして Automaton を構成できる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
90
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
91 regex→automaton : (r : Regex Σ) → Automaton (Derivative r) Σ
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
92 regex→automaton r = record { δ = λ d s → record { state = derivative (state d) s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
93 ; is-derived = derive-step d s} ; aend = λ d → empty? (state d) } where
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
94 derive-step : (d0 : Derivative r) → (s : Σ) → regex-states r (derivative (state d0) s)
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
95 derive-step d0 s = derive (is-derived d0) s
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
96
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
97 実際に match を実行するには以下のようにする。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
98
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
99 regex-match : (r : Regex Σ) → (List Σ) → Bool
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
100 regex-match ex is = accept ( regex→automaton ex ) record { state = ex ; is-derived = unit } is
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
101
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
102 ここではいくつかの問題が残っている。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
103
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
104   生成される状態は有限か? (つまり微分法は停止するのか)
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
105
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
106 停止するかどうかに関係なく、regex-match は具体的な有限文字列に対して実行可能である。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
107
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
108 ただし、Agda では具体的ではない文字列も用意できる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
109
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
110 微分法で定義した Automaton は、正規表現でしていされた言語に一致するのか?
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
111
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
112 これも証明する必要がある。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 141
diff changeset
113
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
114 --オートマトンから正規表現を生成する
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
115
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
116 状態遷移の条件を正規表現した一般化オートマトンを考える。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
117
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
118 普通のオートマトンから始めて、状態を組み合わせて遷移条件を正規表現にしていく。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
119
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
120 状態が一つになったら正規表現が得られる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
121
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
122 --実際の正規表現エンジン
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
123
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
124 grep のソースコード
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
125
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
126 --Boyer-Moore Search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
127
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
128 固定文字列を検索するなら、正規表現よりも高速な手法がある。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
129
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
130 例えば、engine を検索するとする。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
131
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
132   6文字目がeでなければ、先頭の文字列からengineであることはない
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
133
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
134 なので、6文字見なくてもだめなことはわかる。しかし、7文字目からengineを探すと、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
135
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
136   xxxxxn
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
137   the engine
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
138
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
139 を見落とす可能性がある。つまり、
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
140
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
141   6文字目が検索文字列に含まれる文字列だと途中からマッチする可能性がある
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
142
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
143 何文字目からマッチする可能性があるかは、あらかじめ調べることができる。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
144
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
145   e 6文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
146   n 2文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
147   g 3文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
148   i 4文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
149   ? 7文字目
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
150
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
151 これを繰り返せば良い。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
152
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
153   .*engine
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
154
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
155 をDFAに変換して探す場合とどれくらい速度が違うか調べてみよう。
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
156
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
157 --複数文字列に対する Boyer-Moore Search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
158
46
964e4bd0272a add coinduction
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 37
diff changeset
159 ---問題6.1 正規表現の決定性オートマトンへの変換
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
160
46
964e4bd0272a add coinduction
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents: 37
diff changeset
161 <a href="../exercise/005.html"> 例題 </a> <!--- Exercise 6.1 --->
37
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
diff changeset
162