annotate regexParser/TODO @ 324:879dc5d1cb6a default tip
fix
author | mir3636 |
---|---|
date | Fri, 27 May 2016 21:21:09 +0900 |
parents | c48a8671ce34 |
children |
rev | line source |
---|---|
304
c48a8671ce34
fix parallel search first match
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
302
diff
changeset
|
1 Mon Feb 8 12:13:08 JST 2016 |
2 |
3 Before handling word, would it be better to make CharClass an object? It would be less CbC-like, though. |
4 |
302
27414e6fb33c
retrying blocked search
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
298
diff
changeset
|
5 Sat Feb 6 19:50:04 JST 2016 |
6 |
7 It is a bit crude, but |
8 |
9 start each block from state 1 |
10 if the end state is not 1, redo only that part |
11 |
12 is the simple approach. In the worst case everything may have to be redone, though... |
13 |
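The retry idea in the Feb 6 note could look roughly like the sketch below. It is not the regexParser code. It assumes the `word` transition table from the Feb 3 entry, reads the state numbers as hexadecimal bit masks, and assumes that a mismatch falls back to state 1. The names `run_block` and `blocked_search` are hypothetical.

```python
# Transition table for "word", taken from the Feb 3 dump (state numbers read as hex).
WORD_DFA = {0x1: {"w": 0x4}, 0x4: {"o": 0x8}, 0x8: {"r": 0x10}, 0x10: {"d": 0x2}}
START, ACCEPT = 0x1, 0x2


def run_block(block, state=START):
    """Scan one block starting in `state`; return (match_count, end_state)."""
    matches = 0
    for ch in block:
        nxt = WORD_DFA.get(state, {}).get(ch)
        if nxt is None:
            # Assumed fallback: restart, letting this character begin a new attempt.
            nxt = WORD_DFA[START].get(ch, START)
        if nxt == ACCEPT:
            matches += 1
            nxt = START
        state = nxt
    return matches, state


def blocked_search(text, block_size):
    """Return (total_matches, number_of_blocks_redone)."""
    blocks = [text[i:i + block_size] for i in range(0, len(text), block_size)]
    # Pass 1: every block starts from state 1, so blocks are independent of each other.
    results = [run_block(b) for b in blocks]
    # Pass 2: redo only the blocks whose predecessor did not end in state 1.
    prev_end, redone = START, 0
    for i, block in enumerate(blocks):
        if prev_end != START:
            results[i] = run_block(block, prev_end)
            redone += 1
        prev_end = results[i][1]
    return sum(m for m, _ in results), redone
```

With `blocked_search("a word and a sword", 4)` both occurrences straddle a block boundary, so two of the five blocks are rescanned. In the worst case every block ends outside state 1 and everything is redone, as the note says.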
14 Wed Feb 3 21:15:49 JST 2016 |
15 |
16 With blockedSearch, blocks need to overlap by one. |
17 |
18 (aaa|aaabb) |
19 state : 1 [a-a] (14) |
20 state : 2* |
21 state : 4 [a-a] (8) |
22 state : 8 [a-a] (2) |
23 state : 10 [a-a] (20) |
24 state : 20 [a-a] (40) |
25 state : 40 [b-b] (80) |
26 state : 80 [b-b] (2) |
27 state : 14 [a-a] (28) |
28 state : 28 [a-a] (42) |
29 state : 42* [b-b] (80) |
30 |
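To make the dump for (aaa|aaabb) easier to follow, here is a small sketch that walks the table. It assumes the state numbers are hexadecimal bit masks over the NFA states (so 14 is 4|10 and 42 is 2|40), and it copies only the states reachable from state 1. `trace` and `members` are hypothetical helper names.

```python
# Reachable part of the transition table above; keys and targets are hex bit masks.
TABLE = {
    0x1: {"a": 0x14},
    0x14: {"a": 0x28},
    0x28: {"a": 0x42},
    0x42: {"b": 0x80},
    0x80: {"b": 0x2},
    0x2: {},
}
ACCEPTING = {0x2, 0x42}  # the states marked with * in the dump


def trace(text, state=0x1):
    """Return the list of states visited while consuming `text`."""
    visited = []
    for ch in text:
        state = TABLE[state].get(ch)
        if state is None:
            break  # no transition for this character
        visited.append(state)
    return visited


def members(state):
    """Split a merged state into the single-bit NFA states it contains."""
    return [1 << i for i in range(state.bit_length()) if (state >> i) & 1]
```

On the input `aaabb` the walk passes through an accept state twice: once after `aaa` (state 42) and once after `aaabb` (state 2). That is why the choice between stopping early and extending the match matters here.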
31 a | a | a bbb |
32 prev 14 28 |
33 current 7F ... .. |
34 |
35 a a | a | a bbb |
36 prev 14 28 |
37 current 7F ... .. |
38 |
39 There are false positives → re-check |
40 Maximum match causes some matches to be missed (though that is inherent to it anyway...) |
41 Eliminating them is somewhat hard (every possible result would have to be carried through the transitions) |
42 Without internal nondeterminism, this problem does not arise |
43 |
44 |
298 | 45 Wed Feb 3 08:20:06 JST 2016 |
46 | |
47 state : 1 [w-w] (4) | |
48 state : 4 [o-o] (8) | |
49 state : 8 [r-r] (10) | |
50 node : a 10 -> 2 [d-d] (2) | |
51 | |
52 w | o r d | |
53 4 8 10 2 | |
54 | |
55 x | w o r d | |
56 1 4 8 10 2 | |
57 | |
58 Tue Feb 2 11:21:14 JST 2016 kono | |
295 | 59 |
60 All that remains is the word handling. | |
61 charClassMerge needs to be fixed. | |
62 Have merge produce a list of strings. | |
63 Split long strings. | |
64 Should substrings be decomposed? | |
65 | |
296 | 66 On the Cerium side, the first match is not displayed.
67 | |
298 | 68 Tue Feb 2 09:55:40 JST 2016 kono |
293
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
69 |
70 % ./regexParser -subst -regex '(a|b)*a(a|b)(a|b)' |
71 ---Print Node---- |
72 a(1)->(1) |
73 | |
74 b(1)->(1) |
75 * |
76 + |
77 a(4)->(4) |
78 + |
79 a(4)->(8) |
80 | |
81 b(4)->(8) |
82 + |
83 a(8)->(2) |
84 | |
85 b(8)->(2) |
86 ----------------- |
87 state : 1 |
88 node : + 1 -> 1 |
89 [a-a] (5) |
90 [b-b] (1) |
91 |
92 state : 2* |
93 node : e 2 -> 1 |
94 |
95 state : 4 |
96 node : | 4 -> 1 |
97 [a-a] (8) |
98 [b-b] (8) |
99 |
100 state : 8 |
101 node : | 8 -> 1 |
102 [a-a] (2) |
103 [b-b] (2) |
104 |
105 state : 5 |
106 [a-a] (1) |
107 [b-b] (9) |
108 |
109 state : 9 |
110 [a-a] (1) <---- wrong; it should have been merged with 2... |
111 [b-b] (3) |
112 |
113 state : 3* |
114 [a-a] (5) |
115 [b-b] (1) |
116 |
117 As suspected, it was a bug in charClassMerge. |
118 |
119 It would be good if createCharClassRange did not create a new one when an identical one already exists |
120 In other words, charClassMerge can sometimes return an identical one |
295 | 121 That only happens for the same range with the same state, so it is probably rare. |
293
122 |
123 |
289 | 124 Mon Feb 1 01:51:10 JST 2016 kono |
125 | |
126 Maximum match misbehaves when there is nondeterminism. | |
127 How do we implement the termination condition that the match cannot be extended any further? | |
128 | |
129 ./regexParser -ts -subset -regex '(a|b)*a' -file ahoaho.txt | |
130 | |
131 Here, once no more a follows a b, the match up to just before that b is accepted. | |
132 | |
291 | 133 The policy is to leave subset construction untouched.
134 | |
135 | |
136 state : 1 | |
137 node : + 1 -> 1 | |
138 [a-a] (3) | |
139 [b-b] (1) | |
140 | |
141 state : 2* | |
142 node : e 2 -> 1 | |
143 | |
144 state : 3* | |
145 [a-a] (3) | |
146 [b-b] (1) | |
147 | |
148 * marks an accept state. | |
149 | |
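150 Calling stateMatch on [a-a] (3) would be fine, but with maximum match we do not call stateMatch while the match is still continuing. | |
151 Currently, stateMatch is called in a state marked with * when the input does not match any condition. | |
152 With that, a b in state 3 moves to state 1, and the search fails when no a follows the b. stateMatch should be called in state 3, before moving on the b. | |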
153 | |
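154 Once no further match is possible, we need to report the match on the preceding part. | |
155 * While not matching, update the match top. | |
156 * While matching, update the most recent match. | |
157 * On a match failure, return the most recent match if there is one. | |
158 Something like that? | |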
159 | |
160 Minimum match would be: | |
161 * While not matching, update the match top. | |
162 * On a match, return the most recent match if there is one. | |
163 Perhaps? | |
164 | |
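The two bullet lists above can be read as the following sketch. It uses the DFA for '(a|b)*a' from the dump in this entry (state 3 is the accept state) and treats a missing transition as the match failure. This is one possible reading under those assumptions, not the stateMatch code itself, and `find_match` is a hypothetical name.

```python
# DFA for (a|b)*a, copied from the state dump above; state 3 is the accept state.
DFA = {1: {"a": 3, "b": 1}, 3: {"a": 3, "b": 1}}
START, ACCEPT = 1, {3}


def find_match(text, maximum=True):
    """Return (start, end) of the first match in `text`, or None."""
    top = 0
    while top < len(text):
        state, last_end, pos = START, None, top
        while pos < len(text):
            nxt = DFA[state].get(text[pos])
            if nxt is None:
                break                      # match fail: no transition for this character
            state, pos = nxt, pos + 1
            if state in ACCEPT:
                last_end = pos             # while matching, update the most recent match
                if not maximum:
                    break                  # minimum match: report as soon as we match
        if last_end is not None:
            return top, last_end           # return the most recent match, if any
        top += 1                           # not matching: advance the match top
    return None
```

For the text `xxabbaabyy`, the maximum variant returns the span covering `abbaa`, that is, everything up to just before the last b that is not followed by an a. The minimum variant stops at the first `a`.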
165 Make source generation support CbC. (Apparently it does not work otherwise.) | |
289 | 166 |
167 | |
284 | 168 Sun Jan 31 20:37:49 JST 2016 masa |
289 | 169 Bug during parallel processing: Ok |
170 subset construction error for (mili|have): Ok | |
171 segv in tSearch: Ok | |
284 | 172 |
289 | 173 '(main|int) ' .. Ok |
174 '(main|int)\(' .. Ok | |
287 | 175 |
176 Patterns such as these do not work. | |
177 | |
291 | 178 If the accept flag is set on the start state, it matches '' (the empty string). Generate that case separately.
179 | |
221 | 180 Sat Jan 2 15:29:16 JST 2016 kono |
181 | |
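182 State transitions outnumber states, so doing a CharClassWalk in subset construction is not a good idea. | |
183 If new states are linked onto the state list when mergeTransition runs, CharClassWalk is unnecessary. | |
184 At that point, do not put them into stateArray, since stateArray holds already-processed states. | |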
185 | |
186 The EOF state has no cc, so it needs special handling. | |
187 | |
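The state-list idea above is essentially a worklist. A rough sketch follows, using the NFA for '(a|b)*a(a|b)(a|b)' from the Feb 2 dump with states as bit masks. It simplifies character classes to single characters, keeps newly merged states on a pending list, and keeps processed states in a separate table. `subset_construction` and `pending` are hypothetical names, not the regexParser identifiers.

```python
from collections import deque

# NFA from the Feb 2 dump: state bit -> {character: bit mask of target states}.
# State 2 is the accept state and has no outgoing transitions here.
NFA = {1: {"a": 1 | 4, "b": 1}, 4: {"a": 8, "b": 8}, 8: {"a": 2, "b": 2}, 2: {}}


def subset_construction(nfa, start=1):
    """Return {dfa_state: {character: dfa_state}} with states as bit masks."""
    dfa = {}                    # processed states only
    pending = deque([start])    # newly created states wait here until processed
    while pending:
        state = pending.popleft()
        if state in dfa:
            continue            # already processed via another path
        merged = {}
        bit = 1
        while bit <= state:
            if state & bit:
                # Merge the transitions of every NFA state contained in `state`.
                for ch, target in nfa.get(bit, {}).items():
                    merged[ch] = merged.get(ch, 0) | target
            bit <<= 1
        dfa[state] = merged
        for target in merged.values():
            if target not in dfa:
                pending.append(target)
    return dfa
```

Each merged state is visited once when it is taken off `pending`, so no separate walk over all transitions is needed to discover new states. For this NFA the result has eight states, and state 3 has the same transitions as state 1, which agrees with the `state : 3*` entry in the dump.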
188 Tue Dec 29 17:55:17 JST 2015 kono | |
215 | 189 |
190 New Todo entries are added at the top. | |
191 | |
192 abc*d        + | |
193             / \ | |
194            +   d | |
195           / \ | |
196          +   * | |
197         / \  | | |
198        a   b c | |
199 | |
200 The Parser could also be rewritten to produce | |
201 | |
202 abc*d   + | |
203        / \ | |
204       a   + | |
205          / \ | |
206         b   + | |
207            / \ | |
208           *   d | |
209           | | |
210           c | |
211 | |
212 instead. That is probably the better form. However, | |
213 ((ab)(c*))d | |
214 is also a valid way to write it, and it is equivalent to abc*d, so this does not actually solve the problem. | |
215 | |
216 A sub tree needs to return its first state. Otherwise, patterns such as | |
217 (ab*|bc*) | |
218 do not work properly. | |
219 | |
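220 When an expression ends in *, it has to be overlapped with the next expression. So | |
221 carrying the trailing * along, if there is one, | |
222 seems like the right approach. | |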
223 | |
224 If stateAllocate and generateTransition are combined into one pass, stateArray has to grow incrementally. | |
225 At the very least, using a single loop would probably be less error-prone. | |
226 | |
210
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
227 |
228 Sun Dec 27 19:31:03 JST 2015 |
229 Example: count the number of accesses from a specific IP |
230 concordance |
231 conditional concordance using a regex |
232 conditional wordcount using a regex |
233 Compare against a perl script that does the same |
215 | 234 |
235 Sat Dec 26 18:07:00 JST 2015 | |
236 TODO write a routine test for CharClassWalker | |
237 TODO write a routine test for CharClassMerge | |
238 TODO write a routine test for searchBit | |
239 TODO write a routine test for subsetConstraction | |
240 |