annotate pyrect/regexp/analyzer.py @ 106:8102bf4bbec6

modify range stmt.
author Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
date Tue, 14 Dec 2010 15:02:25 +0900
parents a6a0504dea7b
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
1 #!/usr/bin/env python
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
2
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
3 """
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
4 Extract Keywords from AST. Keywords,
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
5 which are necessary words to be accepted with Regular-Expression.
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
6 and which are used to Fixed-String-Filtering (ex: Boyer-Moore).
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
7 kwset is also used in GNU-GREP.
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
8 """
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
9
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
10 from pyrect.regexp.parser import Parser
67
b02b321d0e06 implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
11 from pyrect.regexp.ast import ASTWalker, Plus
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
12
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
13 class Analyzer(ASTWalker):
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
14 """ Extract with Visitor-Pattern.
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
15 AST (ast), is represented by Node-Tree.
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
16 >>> prs = Parser()
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
17 >>> an = Analyzer()
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
18 >>> an.analyze(prs.parse('fixed-string'))
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
19 (12, 12, ['fixed-string'])
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
20 >>> an.analyze(prs.parse('(build|fndecl|gcc)'))
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
21 (6, 3, [])
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
22 >>> an.analyze(prs.parse('123(AB|CD)*456'))
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
23 (inf, 6, ['123', '456'])
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
24 >>> an.analyze(prs.parse('((12)+|3)|456'))
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
25 (inf, 1, [])
73
a6a0504dea7b modify bm-filter's implimentation. table-lookup -> switch. it's more simple and beautiful.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 70
diff changeset
26 >>> an.analyze(prs.parse('^(plus)?(qmark)?'))
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
27 (9, 0, [])
67
b02b321d0e06 implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
28 >>> an.analyze(prs.parse('\*+ \[\['))
70
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 67
diff changeset
29 (inf, 4, ['* [['])
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
30 """
67
b02b321d0e06 implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
31
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
32 def __init__(self, ast=None):
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
33 if ast:
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
34 self.analyze(ast)
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
35 else:
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
36 self.max_len = 0
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
37 self.min_len = 0
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
38
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
39 def analyze(self, ast=None):
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
40 if ast:
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
41 self.max_len, self.min_len, self.must_words = ast.accept(self)
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
42 self.must_words = [x for x in self.must_words if x != ""]
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
43 return self.max_len, self.min_len, self.must_words
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
44
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
45 def visit(self, ast):
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
46 """Following Classes contain no-Keywords.
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
47 Union, Star
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
48 """
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
49 return 1, 1, [str(ast)]
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
50
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
51 def visit_Character(self, ast):
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
52 return 1, 1, [chr(ast.char)]
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
53
70
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 67
diff changeset
54 def concat(self, (max1, min1, key1), (max2, min2, key2)):
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 67
diff changeset
55 return max1 + max2, min1 + min2, key1[0:-1] \
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 67
diff changeset
56 + ([key1[-1] + key2[0]]) + key2[1:]
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
57
70
74f4e50c4f11 add boost algorithm. but it's buggy, not work.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 67
diff changeset
58 def union(self, (max1, min1, _), (max2, min2, __)):
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
59 return max(max1, max2), min(min1, min2), ["", ""]
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
60
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
61 def visit_Star(self, star):
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
62 return float("inf"), 0, ["", ""]
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
63
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
64 def visit_Plus(self, plus):
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
65 (_, m, k) = plus.op.accept(self)
67
b02b321d0e06 implement bm_filter on mmap. but it's slower than dfa. ?;-(
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 57
diff changeset
66 return float("inf"), m, k + [""] + k
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
67
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
68 def visit_Qmark(self, qmark):
57
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
69 (m, _, _) = qmark.op.accept(self)
81b44ae1cd73 add fixed-string filter(Boyer-Moore), and add option '--disable-filter'.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 55
diff changeset
70 return m, 0, ["", ""]
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
71
106
8102bf4bbec6 modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 73
diff changeset
72 def visit_CharClass(self, cclass):
8102bf4bbec6 modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 73
diff changeset
73 return 1, 1, ["", ""]
8102bf4bbec6 modify range stmt.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents: 73
diff changeset
74
55
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
75 def test():
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
76 import doctest
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
77 doctest.testmod()
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
78
4ae288b37591 ddd analyzer. analyzer can analyzing to regexp max-length.
Ryoma SHINYA <shinya@firefly.cr.ie.u-ryukyu.ac.jp>
parents:
diff changeset
79 if __name__ == "__main__": test()