Mercurial > hg > Members > shinya > pyrect
annotate pyrect/regexp/analyzer.py @ 106:8102bf4bbec6
modify range stmt.
author | Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp> |
---|---|
date | Tue, 14 Dec 2010 15:02:25 +0900 |
parents | a6a0504dea7b |
children |
rev | line source |
---|---|
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
1 #!/usr/bin/env python |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
2 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
3 """ |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
4 Extract Keywords from AST. Keywords, |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
5 which are necessary words to be accepted with Regular-Expression. |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
6 and which are used to Fixed-String-Filtering (ex: Boyer-Moore). |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
7 kwset is also used in GNU-GREP. |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
8 """ |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
9 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
10 from pyrect.regexp.parser import Parser |
67
b02b321d0e06
implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
11 from pyrect.regexp.ast import ASTWalker, Plus |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
12 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
13 class Analyzer(ASTWalker): |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
14 """ Extract with Visitor-Pattern. |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
15 AST (ast), is represented by Node-Tree. |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
16 >>> prs = Parser() |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
17 >>> an = Analyzer() |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
18 >>> an.analyze(prs.parse('fixed-string')) |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
19 (12, 12, ['fixed-string']) |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
20 >>> an.analyze(prs.parse('(build|fndecl|gcc)')) |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
21 (6, 3, []) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
22 >>> an.analyze(prs.parse('123(AB|CD)*456')) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
23 (inf, 6, ['123', '456']) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
24 >>> an.analyze(prs.parse('((12)+|3)|456')) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
25 (inf, 1, []) |
73
a6a0504dea7b
modify bm-filter's implimentation. table-lookup -> switch. it's more simple and beautiful.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
70
diff
changeset
|
26 >>> an.analyze(prs.parse('^(plus)?(qmark)?')) |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
27 (9, 0, []) |
67
b02b321d0e06
implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
28 >>> an.analyze(prs.parse('\*+ \[\[')) |
70
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
67
diff
changeset
|
29 (inf, 4, ['* [[']) |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
30 """ |
67
b02b321d0e06
implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
31 |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
32 def __init__(self, ast=None): |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
33 if ast: |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
34 self.analyze(ast) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
35 else: |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
36 self.max_len = 0 |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
37 self.min_len = 0 |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
38 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
39 def analyze(self, ast=None): |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
40 if ast: |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
41 self.max_len, self.min_len, self.must_words = ast.accept(self) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
42 self.must_words = [x for x in self.must_words if x != ""] |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
43 return self.max_len, self.min_len, self.must_words |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
44 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
45 def visit(self, ast): |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
46 """Following Classes contain no-Keywords. |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
47 Union, Star |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
48 """ |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
49 return 1, 1, [str(ast)] |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
50 |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
51 def visit_Character(self, ast): |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
52 return 1, 1, [chr(ast.char)] |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
53 |
70
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
67
diff
changeset
|
54 def concat(self, (max1, min1, key1), (max2, min2, key2)): |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
67
diff
changeset
|
55 return max1 + max2, min1 + min2, key1[0:-1] \ |
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
67
diff
changeset
|
56 + ([key1[-1] + key2[0]]) + key2[1:] |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
57 |
70
74f4e50c4f11
add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
67
diff
changeset
|
58 def union(self, (max1, min1, _), (max2, min2, __)): |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
59 return max(max1, max2), min(min1, min2), ["", ""] |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
60 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
61 def visit_Star(self, star): |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
62 return float("inf"), 0, ["", ""] |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
63 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
64 def visit_Plus(self, plus): |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
65 (_, m, k) = plus.op.accept(self) |
67
b02b321d0e06
implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
57
diff
changeset
|
66 return float("inf"), m, k + [""] + k |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
67 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
68 def visit_Qmark(self, qmark): |
57
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
69 (m, _, _) = qmark.op.accept(self) |
81b44ae1cd73
add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
55
diff
changeset
|
70 return m, 0, ["", ""] |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
71 |
106
8102bf4bbec6
modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
73
diff
changeset
|
72 def visit_CharClass(self, cclass): |
8102bf4bbec6
modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
73
diff
changeset
|
73 return 1, 1, ["", ""] |
8102bf4bbec6
modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
73
diff
changeset
|
74 |
55
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
75 def test(): |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
76 import doctest |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
77 doctest.testmod() |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
78 |
4ae288b37591
ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff
changeset
|
79 if __name__ == "__main__": test() |