annotate regexParser/TODO @ 293:948428caf616
NFA maximum match worked
author | Shinji KONO <kono@ie.u-ryukyu.ac.jp> |
---|---|
date | Tue, 02 Feb 2016 10:38:45 +0900 |
parents | 1b75546ff65f |
children | 0c663f46954d |
rev | line source |
---|---|
293
948428caf616
NFA maximum match worked
Shinji KONO <kono@ie.u-ryukyu.ac.jp>
parents:
291
diff
changeset
|
1 Tue Feb 2 09:55:40 JST 2016 |
2 |
3 % ./regexParser -subst -regex '(a|b)*a(a|b)(a|b)' |
4 ---Print Node---- |
5 a(1)->(1) |
6 | |
7 b(1)->(1) |
8 * |
9 + |
10 a(4)->(4) |
11 + |
12 a(4)->(8) |
13 | |
14 b(4)->(8) |
15 + |
16 a(8)->(2) |
17 | |
18 b(8)->(2) |
19 ----------------- |
20 state : 1 |
21 node : + 1 -> 1 |
22 [a-a] (5) |
23 [b-b] (1) |
24 |
25 state : 2* |
26 node : e 2 -> 1 |
27 |
28 state : 4 |
29 node : | 4 -> 1 |
30 [a-a] (8) |
31 [b-b] (8) |
32 |
33 state : 8 |
34 node : | 8 -> 1 |
35 [a-a] (2) |
36 [b-b] (2) |
37 |
38 state : 5 |
39 [a-a] (1) |
40 [b-b] (9) |
41 |
42 state : 9 |
43 [a-a] (1) <---- wrong: this should have been merged with 2... |
44 [b-b] (3) |
45 |
46 state : 3* |
47 [a-a] (5) |
48 [b-b] (1) |
49 |
50 As suspected, it was a bug in charClassMerge. |
51 |
52 It would be good if createCharClassRange did not create a new range when an identical one already exists. |
53 In other words, charClassMerge can sometimes return the same object. |
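
A minimal sketch of the merge behaviour wished for above, written in Python purely for illustration. It assumes a character class is a sorted list of disjoint inclusive ranges, each tagged with a bitmask of target states. The function `merge_range` and the tuple layout are hypothetical and are not the project's `charClassMerge` or `createCharClassRange`. Overlapping parts receive the OR of both state sets, and when the merge changes nothing the original list object is returned instead of a new one.

```python
def merge_range(ranges, begin, end, state):
    """Merge [begin, end] -> state into a sorted list of disjoint ranges.

    ranges: list of (begin, end, state_bits) tuples, inclusive bounds (assumed layout).
    Returns the same list object when the merge changes nothing.
    """
    # Every position where the set of covering ranges can change.
    points = sorted({begin, end + 1}
                    | {b for b, _, _ in ranges}
                    | {e + 1 for _, e, _ in ranges})
    merged = []
    for lo, hi in zip(points, points[1:]):
        bits = 0
        for b, e, s in ranges:
            if b <= lo and hi - 1 <= e:
                bits |= s
        if begin <= lo and hi - 1 <= end:
            bits |= state
        if bits == 0:
            continue  # gap not covered by any range
        if merged and merged[-1][1] == lo - 1 and merged[-1][2] == bits:
            # Adjacent piece with the same targets: extend it.
            merged[-1] = (merged[-1][0], hi - 1, bits)
        else:
            merged.append((lo, hi - 1, bits))
    return ranges if merged == ranges else merged


a = ord("a")
print(merge_range([(a, a, 1)], a, a, 2))  # [a-a] now targets states 1 and 2
```

Returning the unchanged list lets a caller detect "nothing new" with an identity check, which is one way to avoid allocating duplicate ranges.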
54 |
55 |
289 | 56 Mon Feb 1 01:51:10 JST 2016 kono |
57 | |
58 Maximum match does not behave well when there is nondeterminism | |
59 How do we implement the termination condition "cannot be extended any further"? | |
60 | |
61 ./regexParser -ts -subset -regex '(a|b)*a' -file ahoaho.txt | |
62 | |
63 then, once no a follows a b, the part up to just before that b is accepted | |
64 | |
291 | 65 The policy is to leave subset construction untouched. |
66 | |
67 | |
68 state : 1 | |
69 node : + 1 -> 1 | |
70 [a-a] (3) | |
71 [b-b] (1) | |
72 | |
73 state : 2* | |
74 node : e 2 -> 1 | |
75 | |
76 state : 3* | |
77 [a-a] (3) | |
78 [b-b] (1) | |
79 | |
80 * marks an accept state. | |
81 | |
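82 Calling stateMatch at [a-a] (3) would be fine, but with maximum match stateMatch is not called while the match is still continuing. | |
83 Currently, stateMatch is called in a state marked with * when the input does not match any condition. | |
84 With this, b in state 3 moves to state 1, and the match fails when no a follows the b. stateMatch should be called in state 3, before moving on b. | |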
85 | |
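86 Once there is no longer any possibility of matching, the match has to be reported for the earlier part. | |
87 * if not matching, update match top | |
88 * while matching, update the previous match | |
89 * when the match fails, return the previous match if there is one | |
90 Something like that? | |
91 | |
92 For minimum match: | |
93 * if not matching, update match top | |
94 * as soon as a match occurs, return the previous match if there is one | |
95 perhaps? | |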
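
The two rule lists above can be sketched as follows. This is a Python illustration only, not the project's stateMatch code. The transition table is read off the state 1 / state 3* dump earlier in this entry, and the names `DFA`, `ACCEPT`, `START` and `search` are hypothetical. Restarting the scan from every position is a simplification chosen to keep the sketch short.

```python
# DFA for '(a|b)*a', taken from the dump above: state 3 is the accept state.
DFA = {1: {"a": 3, "b": 1}, 3: {"a": 3, "b": 1}}
ACCEPT = {3}
START = 1


def search(text, maximum=True):
    """Return (begin, end) of the first match in text, or None."""
    top = 0                          # "match top": where the candidate match begins
    while top < len(text):
        state = START
        last_end = None              # end of the most recent accepted prefix
        i = top
        while i < len(text):
            nxt = DFA[state].get(text[i])
            if nxt is None:          # match fail: no transition on this character
                break
            state = nxt
            i += 1
            if state in ACCEPT:
                last_end = i         # while matching, update the previous match
                if not maximum:      # minimum match returns at the first accept
                    return (top, last_end)
        if last_end is not None:     # fail or end of input: return previous match
            return (top, last_end)
        top += 1                     # not matching: update match top
    return None


print(search("baab"))                 # (0, 3): stops before the trailing b
print(search("baab", maximum=False))  # (0, 2)
```

The point of `last_end` is that an accept state seen earlier is remembered, so moving on to state 1 on a trailing b no longer loses the match.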
96 | |
97 Make source generation support CbC. (Apparently it does not work otherwise.) | |
289 | 98 |
99 | |
284 | 100 Sun Jan 31 20:37:49 JST 2016 masa |
289 | 101 Bug during parallel processing: Ok |
102 Subset construction error for (mili|have): Ok | |
103 segv in tSearch: Ok | |
284 | 104 |
289 | 105 '(main|int) ' .. Ok |
106 '(main|int)\(' .. Ok | |
287 | 107 |
108 and the like do not work. | |
109 | |
291 | 110 If the accept flag is set on the start state, it matches ''. Generate that case separately. |
111 | |
221 | 112 Sat Jan 2 15:29:16 JST 2016 kono |
113 | |
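114 There are more state transitions than states, so doing CharClassWalk during subset construction is not a good idea. | |
115 If new states are linked onto the state list when mergeTransition runs, CharClassWalk is unnecessary. | |
116 At that point, do not put them into stateArray, because stateArray holds states that are already processed. | |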
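
A minimal sketch of that idea: newly discovered states are appended to a pending list while only finished states live in the processed table. This is Python for illustration only. It assumes state numbers are bit flags (1, 2, 4, 8 for single NFA states, other numbers for their unions), which is inferred from the dumps and not stated in these notes. The names `NFA`, `pending` and `subset_construction` are hypothetical, and the result is not meant to reproduce the dumps in this file.

```python
from collections import deque

# NFA for '(a|b)*a(a|b)(a|b)', state ids assumed to be bit flags; 2 is the accept state.
NFA = {
    1: {"a": 1 | 4, "b": 1},
    4: {"a": 8, "b": 8},
    8: {"a": 2, "b": 2},
    2: {},
}


def subset_construction(start):
    """Build DFA transitions without rescanning finished states."""
    done = {}                   # processed states (plays the role of stateArray)
    pending = deque([start])    # state list: discovered but not yet processed
    while pending:
        state = pending.popleft()
        if state in done:
            continue            # queued more than once, already handled
        row = {}
        for bit, edges in NFA.items():
            if state & bit:     # this NFA state is a member of the set
                for ch, target in edges.items():
                    row[ch] = row.get(ch, 0) | target
        done[state] = row
        for target in row.values():
            if target not in done:
                pending.append(target)   # link the new state, do not walk again
    return done


dfa = subset_construction(1)
for state in sorted(dfa):
    print(state, "*" if state & 2 else " ", dfa[state])
```

Each state set is expanded exactly once, so no separate walk over all character classes is needed to find unprocessed targets.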
117 | |
118 The EOF state has no cc, so it needs special handling. | |
119 | |
120 Tue Dec 29 17:55:17 JST 2015 kono | |
215 | 121 |
122 New Todo entries are added at the top. | |
123 | |
124 abc*d       + | |
125            / \ | |
126           +   d | |
127          / \ | |
128         +   * | |
129        / \  | | |
130       a   b c | |
131 | |
132 The Parser could be rewritten so that | |
133 | |
134 abc*d   + | |
135        / \ | |
136       a   + | |
137          / \ | |
138         b   + | |
139            / \ | |
140           *   d | |
141           | | |
142           c | |
143 | |
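144 is produced instead. This is probably the better form. However, | |
145 ((ab)(c*))d | |
146 should also be a valid way to write it, and it comes out the same as abc*d, so this does not solve the problem. | |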
147 | |
148 A sub tree needs to return its initial state. Otherwise, | |
149 (ab*|bc*) | |
150 and the like will not work properly. | |
151 | |
152 When an expression ends with *, it has to be overlapped with the next expression. So I think the | |
153 "if there is a trailing *, carry it along" | |
154 approach is best. | |
155 | |
156 If stateAllocate and generateTransition are combined into 1 pass, stateArray has to be grown incrementally. | |
157 At the very least, using a single loop should lead to fewer mistakes. | |
158 | |
210
e8aa8a1ea749
add benchmark TODO
Masataka Kohagura <kohagura@cr.ie.u-ryukyu.ac.jp>
parents:
204
diff
changeset
|
159 |
160 Sun Dec 27 19:31:03 JST 2015 |
161 Examples: count the number of accesses from a specific IP |
162 concordance |
163 conditional concordance using regex |
164 conditional wordcount using regex |
165 compare against a perl script that does the same |
215 | 166 |
167 Sat Dec 26 18:07:00 JST 2015 | |
168 TODO write a routine test for CharClassWalker | |
169 TODO write a routine test for CharClassMerge | |
170 TODO write a routine test for searchBit | |
171 TODO write a routine test for subsetConstraction | |
172 |