Mercurial > hg > CbC > CbC_llvm
comparison docs/re_format.7 @ 0:95c75e76d11b LLVM3.4
LLVM 3.4
author | Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp> |
---|---|
date | Thu, 12 Dec 2013 13:56:28 +0900 |
parents | |
children |
comparison
equal
deleted
inserted
replaced
-1:000000000000 | 0:95c75e76d11b |
---|---|
1 .\" $OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $ | |
2 .\" | |
3 .\" Copyright (c) 1997, Phillip F Knaack. All rights reserved. | |
4 .\" | |
5 .\" Copyright (c) 1992, 1993, 1994 Henry Spencer. | |
6 .\" Copyright (c) 1992, 1993, 1994 | |
7 .\" The Regents of the University of California. All rights reserved. | |
8 .\" | |
9 .\" This code is derived from software contributed to Berkeley by | |
10 .\" Henry Spencer. | |
11 .\" | |
12 .\" Redistribution and use in source and binary forms, with or without | |
13 .\" modification, are permitted provided that the following conditions | |
14 .\" are met: | |
15 .\" 1. Redistributions of source code must retain the above copyright | |
16 .\" notice, this list of conditions and the following disclaimer. | |
17 .\" 2. Redistributions in binary form must reproduce the above copyright | |
18 .\" notice, this list of conditions and the following disclaimer in the | |
19 .\" documentation and/or other materials provided with the distribution. | |
20 .\" 3. Neither the name of the University nor the names of its contributors | |
21 .\" may be used to endorse or promote products derived from this software | |
22 .\" without specific prior written permission. | |
23 .\" | |
24 .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND | |
25 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
26 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
27 .\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE | |
28 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
29 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS | |
30 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) | |
31 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT | |
32 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY | |
33 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF | |
34 .\" SUCH DAMAGE. | |
35 .\" | |
36 .\" @(#)re_format.7 8.3 (Berkeley) 3/20/94 | |
37 .\" | |
38 .Dd $Mdocdate: May 31 2007 $ | |
39 .Dt RE_FORMAT 7 | |
40 .Os | |
41 .Sh NAME | |
42 .Nm re_format | |
43 .Nd POSIX regular expressions | |
44 .Sh DESCRIPTION | |
45 Regular expressions (REs), | |
46 as defined in | |
47 .St -p1003.1-2004 , | |
48 come in two forms: | |
49 basic regular expressions | |
50 (BREs) | |
51 and extended regular expressions | |
52 (EREs). | |
53 Both forms of regular expressions are supported | |
54 by the interfaces described in | |
55 .Xr regex 3 . | |
56 Applications dealing with regular expressions | |
57 may use one or the other form | |
58 (or indeed both). | |
59 For example, | |
60 .Xr ed 1 | |
61 uses BREs, | |
62 whilst | |
63 .Xr egrep 1 | |
64 talks EREs. | |
65 Consult the manual page for the specific application to find out which | |
66 it uses. | |
67 .Pp | |
68 POSIX leaves some aspects of RE syntax and semantics open; | |
69 .Sq ** | |
70 marks decisions on these aspects that | |
71 may not be fully portable to other POSIX implementations. | |
72 .Pp | |
73 This manual page first describes regular expressions in general, | |
74 specifically extended regular expressions, | |
75 and then discusses differences between them and basic regular expressions. | |
76 .Sh EXTENDED REGULAR EXPRESSIONS | |
77 An ERE is one** or more non-empty** | |
78 .Em branches , | |
79 separated by | |
80 .Sq \*(Ba . | |
81 It matches anything that matches one of the branches. | |
82 .Pp | |
83 A branch is one** or more | |
84 .Em pieces , | |
85 concatenated. | |
86 It matches a match for the first, followed by a match for the second, etc. | |
87 .Pp | |
88 A piece is an | |
89 .Em atom | |
90 possibly followed by a single** | |
91 .Sq * , | |
92 .Sq + , | |
93 .Sq ?\& , | |
94 or | |
95 .Em bound . | |
96 An atom followed by | |
97 .Sq * | |
98 matches a sequence of 0 or more matches of the atom. | |
99 An atom followed by | |
100 .Sq + | |
101 matches a sequence of 1 or more matches of the atom. | |
102 An atom followed by | |
103 .Sq ?\& | |
104 matches a sequence of 0 or 1 matches of the atom. | |
105 .Pp | |
106 A bound is | |
107 .Sq { | |
108 followed by an unsigned decimal integer, | |
109 possibly followed by | |
110 .Sq ,\& | |
111 possibly followed by another unsigned decimal integer, | |
112 always followed by | |
113 .Sq } . | |
114 The integers must lie between 0 and | |
115 .Dv RE_DUP_MAX | |
116 (255**) inclusive, | |
117 and if there are two of them, the first may not exceed the second. | |
118 An atom followed by a bound containing one integer | |
119 .Ar i | |
120 and no comma matches | |
121 a sequence of exactly | |
122 .Ar i | |
123 matches of the atom. | |
124 An atom followed by a bound | |
125 containing one integer | |
126 .Ar i | |
127 and a comma matches | |
128 a sequence of | |
129 .Ar i | |
130 or more matches of the atom. | |
131 An atom followed by a bound | |
132 containing two integers | |
133 .Ar i | |
134 and | |
135 .Ar j | |
136 matches a sequence of | |
137 .Ar i | |
138 through | |
139 .Ar j | |
140 (inclusive) matches of the atom. | |
141 .Pp | |
142 An atom is a regular expression enclosed in | |
143 .Sq () | |
144 (matching a part of the regular expression), | |
145 an empty set of | |
146 .Sq () | |
147 (matching the null string)**, | |
148 a | |
149 .Em bracket expression | |
150 (see below), | |
151 .Sq .\& | |
152 (matching any single character), | |
153 .Sq ^ | |
154 (matching the null string at the beginning of a line), | |
155 .Sq $ | |
156 (matching the null string at the end of a line), | |
157 a | |
158 .Sq \e | |
159 followed by one of the characters | |
160 .Sq ^.[$()|*+?{\e | |
161 (matching that character taken as an ordinary character), | |
162 a | |
163 .Sq \e | |
164 followed by any other character** | |
165 (matching that character taken as an ordinary character, | |
166 as if the | |
167 .Sq \e | |
168 had not been present**), | |
169 or a single character with no other significance (matching that character). | |
170 A | |
171 .Sq { | |
172 followed by a character other than a digit is an ordinary character, | |
173 not the beginning of a bound**. | |
174 It is illegal to end an RE with | |
175 .Sq \e . | |
176 .Pp | |
177 A bracket expression is a list of characters enclosed in | |
178 .Sq [] . | |
179 It normally matches any single character from the list (but see below). | |
180 If the list begins with | |
181 .Sq ^ , | |
182 it matches any single character | |
183 .Em not | |
184 from the rest of the list | |
185 (but see below). | |
186 If two characters in the list are separated by | |
187 .Sq - , | |
188 this is shorthand for the full | |
189 .Em range | |
190 of characters between those two (inclusive) in the | |
191 collating sequence, e.g.\& | |
192 .Sq [0-9] | |
193 in ASCII matches any decimal digit. | |
194 It is illegal** for two ranges to share an endpoint, e.g.\& | |
195 .Sq a-c-e . | |
196 Ranges are very collating-sequence-dependent, | |
197 and portable programs should avoid relying on them. | |
198 .Pp | |
199 To include a literal | |
200 .Sq ]\& | |
201 in the list, make it the first character | |
202 (following a possible | |
203 .Sq ^ ) . | |
204 To include a literal | |
205 .Sq - , | |
206 make it the first or last character, | |
207 or the second endpoint of a range. | |
208 To use a literal | |
209 .Sq - | |
210 as the first endpoint of a range, | |
211 enclose it in | |
212 .Sq [. | |
213 and | |
214 .Sq .] | |
215 to make it a collating element (see below). | |
216 With the exception of these and some combinations using | |
217 .Sq [ | |
218 (see next paragraphs), | |
219 all other special characters, including | |
220 .Sq \e , | |
221 lose their special significance within a bracket expression. | |
222 .Pp | |
223 Within a bracket expression, a collating element | |
224 (a character, | |
225 a multi-character sequence that collates as if it were a single character, | |
226 or a collating-sequence name for either) | |
227 enclosed in | |
228 .Sq [. | |
229 and | |
230 .Sq .] | |
231 stands for the sequence of characters of that collating element. | |
232 The sequence is a single element of the bracket expression's list. | |
233 A bracket expression containing a multi-character collating element | |
234 can thus match more than one character, | |
235 e.g. if the collating sequence includes a | |
236 .Sq ch | |
237 collating element, | |
238 then the RE | |
239 .Sq [[.ch.]]*c | |
240 matches the first five characters of | |
241 .Sq chchcc . | |
242 .Pp | |
243 Within a bracket expression, a collating element enclosed in | |
244 .Sq [= | |
245 and | |
246 .Sq =] | |
247 is an equivalence class, standing for the sequences of characters | |
248 of all collating elements equivalent to that one, including itself. | |
249 (If there are no other equivalent collating elements, | |
250 the treatment is as if the enclosing delimiters were | |
251 .Sq [. | |
252 and | |
253 .Sq .] . ) | |
254 For example, if | |
255 .Sq x | |
256 and | |
257 .Sq y | |
258 are the members of an equivalence class, | |
259 then | |
260 .Sq [[=x=]] , | |
261 .Sq [[=y=]] , | |
262 and | |
263 .Sq [xy] | |
264 are all synonymous. | |
265 An equivalence class may not** be an endpoint of a range. | |
266 .Pp | |
267 Within a bracket expression, the name of a | |
268 .Em character class | |
269 enclosed | |
270 in | |
271 .Sq [: | |
272 and | |
273 .Sq :] | |
274 stands for the list of all characters belonging to that class. | |
275 Standard character class names are: | |
276 .Bd -literal -offset indent | |
277 alnum digit punct | |
278 alpha graph space | |
279 blank lower upper | |
280 cntrl print xdigit | |
281 .Ed | |
282 .Pp | |
283 These stand for the character classes defined in | |
284 .Xr ctype 3 . | |
285 A locale may provide others. | |
286 A character class may not be used as an endpoint of a range. | |
287 .Pp | |
288 There are two special cases** of bracket expressions: | |
289 the bracket expressions | |
290 .Sq [[:<:]] | |
291 and | |
292 .Sq [[:>:]] | |
293 match the null string at the beginning and end of a word, respectively. | |
294 A word is defined as a sequence of | |
295 characters starting and ending with a word character | |
296 which is neither preceded nor followed by | |
297 word characters. | |
298 A word character is an | |
299 .Em alnum | |
300 character (as defined by | |
301 .Xr ctype 3 ) | |
302 or an underscore. | |
303 This is an extension, | |
304 compatible with but not specified by POSIX, | |
305 and should be used with | |
306 caution in software intended to be portable to other systems. | |
307 .Pp | |
308 In the event that an RE could match more than one substring of a given | |
309 string, | |
310 the RE matches the one starting earliest in the string. | |
311 If the RE could match more than one substring starting at that point, | |
312 it matches the longest. | |
313 Subexpressions also match the longest possible substrings, subject to | |
314 the constraint that the whole match be as long as possible, | |
315 with subexpressions starting earlier in the RE taking priority over | |
316 ones starting later. | |
317 Note that higher-level subexpressions thus take priority over | |
318 their lower-level component subexpressions. | |
319 .Pp | |
320 Match lengths are measured in characters, not collating elements. | |
321 A null string is considered longer than no match at all. | |
322 For example, | |
323 .Sq bb* | |
324 matches the three middle characters of | |
325 .Sq abbbc ; | |
326 .Sq (wee|week)(knights|nights) | |
327 matches all ten characters of | |
328 .Sq weeknights ; | |
329 when | |
330 .Sq (.*).* | |
331 is matched against | |
332 .Sq abc , | |
333 the parenthesized subexpression matches all three characters; | |
334 and when | |
335 .Sq (a*)* | |
336 is matched against | |
337 .Sq bc , | |
338 both the whole RE and the parenthesized subexpression match the null string. | |
339 .Pp | |
340 If case-independent matching is specified, | |
341 the effect is much as if all case distinctions had vanished from the | |
342 alphabet. | |
343 When an alphabetic that exists in multiple cases appears as an | |
344 ordinary character outside a bracket expression, it is effectively | |
345 transformed into a bracket expression containing both cases, | |
346 e.g.\& | |
347 .Sq x | |
348 becomes | |
349 .Sq [xX] . | |
350 When it appears inside a bracket expression, | |
351 all case counterparts of it are added to the bracket expression, | |
352 so that, for example, | |
353 .Sq [x] | |
354 becomes | |
355 .Sq [xX] | |
356 and | |
357 .Sq [^x] | |
358 becomes | |
359 .Sq [^xX] . | |
360 .Pp | |
361 No particular limit is imposed on the length of REs**. | |
362 Programs intended to be portable should not employ REs longer | |
363 than 256 bytes, | |
364 as an implementation can refuse to accept such REs and remain | |
365 POSIX-compliant. | |
366 .Pp | |
367 The following is a list of extended regular expressions: | |
368 .Bl -tag -width Ds | |
369 .It Ar c | |
370 Any character | |
371 .Ar c | |
372 not listed below matches itself. | |
373 .It \e Ns Ar c | |
374 Any backslash-escaped character | |
375 .Ar c | |
376 matches itself. | |
377 .It \&. | |
378 Matches any single character that is not a newline | |
379 .Pq Sq \en . | |
380 .It Bq Ar char-class | |
381 Matches any single character in | |
382 .Ar char-class . | |
383 To include a | |
384 .Ql \&] | |
385 in | |
386 .Ar char-class , | |
387 it must be the first character. | |
388 A range of characters may be specified by separating the end characters | |
389 of the range with a | |
390 .Ql - ; | |
391 e.g.\& | |
392 .Ar a-z | |
393 specifies the lower case characters. | |
394 The following literal expressions can also be used in | |
395 .Ar char-class | |
396 to specify sets of characters: | |
397 .Bd -unfilled -offset indent | |
398 [:alnum:] [:cntrl:] [:lower:] [:space:] | |
399 [:alpha:] [:digit:] [:print:] [:upper:] | |
400 [:blank:] [:graph:] [:punct:] [:xdigit:] | |
401 .Ed | |
402 .Pp | |
403 If | |
404 .Ql - | |
405 appears as the first or last character of | |
406 .Ar char-class , | |
407 then it matches itself. | |
408 All other characters in | |
409 .Ar char-class | |
410 match themselves. | |
411 .Pp | |
412 Patterns in | |
413 .Ar char-class | |
414 of the form | |
415 .Eo [. | |
416 .Ar col-elm | |
417 .Ec .]\& | |
418 or | |
419 .Eo [= | |
420 .Ar col-elm | |
421 .Ec =]\& , | |
422 where | |
423 .Ar col-elm | |
424 is a collating element, are interpreted according to | |
425 .Xr setlocale 3 | |
426 .Pq not currently supported . | |
427 .It Bq ^ Ns Ar char-class | |
428 Matches any single character, other than newline, not in | |
429 .Ar char-class . | |
430 .Ar char-class | |
431 is defined as above. | |
432 .It ^ | |
433 If | |
434 .Sq ^ | |
435 is the first character of a regular expression, then it | |
436 anchors the regular expression to the beginning of a line. | |
437 Otherwise, it matches itself. | |
438 .It $ | |
439 If | |
440 .Sq $ | |
441 is the last character of a regular expression, | |
442 it anchors the regular expression to the end of a line. | |
443 Otherwise, it matches itself. | |
444 .It [[:<:]] | |
445 Anchors the single character regular expression or subexpression | |
446 immediately following it to the beginning of a word. | |
447 .It [[:>:]] | |
448 Anchors the single character regular expression or subexpression | |
449 immediately following it to the end of a word. | |
450 .It Pq Ar re | |
451 Defines a subexpression | |
452 .Ar re . | |
453 Any set of characters enclosed in parentheses | |
454 matches whatever the set of characters without parentheses matches | |
455 (that is a long-winded way of saying the constructs | |
456 .Sq (re) | |
457 and | |
458 .Sq re | |
459 match identically). | |
460 .It * | |
461 Matches the single character regular expression or subexpression | |
462 immediately preceding it zero or more times. | |
463 If | |
464 .Sq * | |
465 is the first character of a regular expression or subexpression, | |
466 then it matches itself. | |
467 The | |
468 .Sq * | |
469 operator sometimes yields unexpected results. | |
470 For example, the regular expression | |
471 .Ar b* | |
472 matches the beginning of the string | |
473 .Qq abbb | |
474 (as opposed to the substring | |
475 .Qq bbb ) , | |
476 since a null match is the only leftmost match. | |
477 .It + | |
478 Matches the singular character regular expression | |
479 or subexpression immediately preceding it | |
480 one or more times. | |
481 .It ? | |
482 Matches the singular character regular expression | |
483 or subexpression immediately preceding it | |
484 0 or 1 times. | |
485 .Sm off | |
486 .It Xo | |
487 .Pf { Ar n , m No }\ \& | |
488 .Pf { Ar n , No }\ \& | |
489 .Pf { Ar n No } | |
490 .Xc | |
491 .Sm on | |
492 Matches the single character regular expression or subexpression | |
493 immediately preceding it at least | |
494 .Ar n | |
495 and at most | |
496 .Ar m | |
497 times. | |
498 If | |
499 .Ar m | |
500 is omitted, then it matches at least | |
501 .Ar n | |
502 times. | |
503 If the comma is also omitted, then it matches exactly | |
504 .Ar n | |
505 times. | |
506 .It \*(Ba | |
507 Used to separate patterns. | |
508 For example, | |
509 the pattern | |
510 .Sq cat\*(Badog | |
511 matches either | |
512 .Sq cat | |
513 or | |
514 .Sq dog . | |
515 .El | |
516 .Sh BASIC REGULAR EXPRESSIONS | |
517 Basic regular expressions differ in several respects: | |
518 .Bl -bullet -offset 3n | |
519 .It | |
520 .Sq \*(Ba , | |
521 .Sq + , | |
522 and | |
523 .Sq ?\& | |
524 are ordinary characters and there is no equivalent | |
525 for their functionality. | |
526 .It | |
527 The delimiters for bounds are | |
528 .Sq \e{ | |
529 and | |
530 .Sq \e} , | |
531 with | |
532 .Sq { | |
533 and | |
534 .Sq } | |
535 by themselves ordinary characters. | |
536 .It | |
537 The parentheses for nested subexpressions are | |
538 .Sq \e( | |
539 and | |
540 .Sq \e) , | |
541 with | |
542 .Sq ( | |
543 and | |
544 .Sq )\& | |
545 by themselves ordinary characters. | |
546 .It | |
547 .Sq ^ | |
548 is an ordinary character except at the beginning of the | |
549 RE or** the beginning of a parenthesized subexpression. | |
550 .It | |
551 .Sq $ | |
552 is an ordinary character except at the end of the | |
553 RE or** the end of a parenthesized subexpression. | |
554 .It | |
555 .Sq * | |
556 is an ordinary character if it appears at the beginning of the | |
557 RE or the beginning of a parenthesized subexpression | |
558 (after a possible leading | |
559 .Sq ^ ) . | |
560 .It | |
561 Finally, there is one new type of atom, a | |
562 .Em back-reference : | |
563 .Sq \e | |
564 followed by a non-zero decimal digit | |
565 .Ar d | |
566 matches the same sequence of characters matched by the | |
567 .Ar d Ns th | |
568 parenthesized subexpression | |
569 (numbering subexpressions by the positions of their opening parentheses, | |
570 left to right), | |
571 so that, for example, | |
572 .Sq \e([bc]\e)\e1 | |
573 matches | |
574 .Sq bb\& | |
575 or | |
576 .Sq cc | |
577 but not | |
578 .Sq bc . | |
579 .El | |
580 .Pp | |
581 The following is a list of basic regular expressions: | |
582 .Bl -tag -width Ds | |
583 .It Ar c | |
584 Any character | |
585 .Ar c | |
586 not listed below matches itself. | |
587 .It \e Ns Ar c | |
588 Any backslash-escaped character | |
589 .Ar c , | |
590 except for | |
591 .Sq { , | |
592 .Sq } , | |
593 .Sq \&( , | |
594 and | |
595 .Sq \&) , | |
596 matches itself. | |
597 .It \&. | |
598 Matches any single character that is not a newline | |
599 .Pq Sq \en . | |
600 .It Bq Ar char-class | |
601 Matches any single character in | |
602 .Ar char-class . | |
603 To include a | |
604 .Ql \&] | |
605 in | |
606 .Ar char-class , | |
607 it must be the first character. | |
608 A range of characters may be specified by separating the end characters | |
609 of the range with a | |
610 .Ql - ; | |
611 e.g.\& | |
612 .Ar a-z | |
613 specifies the lower case characters. | |
614 The following literal expressions can also be used in | |
615 .Ar char-class | |
616 to specify sets of characters: | |
617 .Bd -unfilled -offset indent | |
618 [:alnum:] [:cntrl:] [:lower:] [:space:] | |
619 [:alpha:] [:digit:] [:print:] [:upper:] | |
620 [:blank:] [:graph:] [:punct:] [:xdigit:] | |
621 .Ed | |
622 .Pp | |
623 If | |
624 .Ql - | |
625 appears as the first or last character of | |
626 .Ar char-class , | |
627 then it matches itself. | |
628 All other characters in | |
629 .Ar char-class | |
630 match themselves. | |
631 .Pp | |
632 Patterns in | |
633 .Ar char-class | |
634 of the form | |
635 .Eo [. | |
636 .Ar col-elm | |
637 .Ec .]\& | |
638 or | |
639 .Eo [= | |
640 .Ar col-elm | |
641 .Ec =]\& , | |
642 where | |
643 .Ar col-elm | |
644 is a collating element, are interpreted according to | |
645 .Xr setlocale 3 | |
646 .Pq not currently supported . | |
647 .It Bq ^ Ns Ar char-class | |
648 Matches any single character, other than newline, not in | |
649 .Ar char-class . | |
650 .Ar char-class | |
651 is defined as above. | |
652 .It ^ | |
653 If | |
654 .Sq ^ | |
655 is the first character of a regular expression, then it | |
656 anchors the regular expression to the beginning of a line. | |
657 Otherwise, it matches itself. | |
658 .It $ | |
659 If | |
660 .Sq $ | |
661 is the last character of a regular expression, | |
662 it anchors the regular expression to the end of a line. | |
663 Otherwise, it matches itself. | |
664 .It [[:<:]] | |
665 Anchors the single character regular expression or subexpression | |
666 immediately following it to the beginning of a word. | |
667 .It [[:>:]] | |
668 Anchors the single character regular expression or subexpression | |
669 immediately following it to the end of a word. | |
670 .It \e( Ns Ar re Ns \e) | |
671 Defines a subexpression | |
672 .Ar re . | |
673 Subexpressions may be nested. | |
674 A subsequent backreference of the form | |
675 .Pf \e Ns Ar n , | |
676 where | |
677 .Ar n | |
678 is a number in the range [1,9], expands to the text matched by the | |
679 .Ar n Ns th | |
680 subexpression. | |
681 For example, the regular expression | |
682 .Ar \e(.*\e)\e1 | |
683 matches any string consisting of identical adjacent substrings. | |
684 Subexpressions are ordered relative to their left delimiter. | |
685 .It * | |
686 Matches the single character regular expression or subexpression | |
687 immediately preceding it zero or more times. | |
688 If | |
689 .Sq * | |
690 is the first character of a regular expression or subexpression, | |
691 then it matches itself. | |
692 The | |
693 .Sq * | |
694 operator sometimes yields unexpected results. | |
695 For example, the regular expression | |
696 .Ar b* | |
697 matches the beginning of the string | |
698 .Qq abbb | |
699 (as opposed to the substring | |
700 .Qq bbb ) , | |
701 since a null match is the only leftmost match. | |
702 .Sm off | |
703 .It Xo | |
704 .Pf \e{ Ar n , m No \e}\ \& | |
705 .Pf \e{ Ar n , No \e}\ \& | |
706 .Pf \e{ Ar n No \e} | |
707 .Xc | |
708 .Sm on | |
709 Matches the single character regular expression or subexpression | |
710 immediately preceding it at least | |
711 .Ar n | |
712 and at most | |
713 .Ar m | |
714 times. | |
715 If | |
716 .Ar m | |
717 is omitted, then it matches at least | |
718 .Ar n | |
719 times. | |
720 If the comma is also omitted, then it matches exactly | |
721 .Ar n | |
722 times. | |
723 .El | |
724 .Sh SEE ALSO | |
725 .Xr ctype 3 , | |
726 .Xr regex 3 | |
727 .Sh STANDARDS | |
728 .St -p1003.1-2004 : | |
729 Base Definitions, Chapter 9 (Regular Expressions). | |
730 .Sh BUGS | |
731 Having two kinds of REs is a botch. | |
732 .Pp | |
733 The current POSIX spec says that | |
734 .Sq )\& | |
735 is an ordinary character in the absence of an unmatched | |
736 .Sq ( ; | |
737 this was an unintentional result of a wording error, | |
738 and change is likely. | |
739 Avoid relying on it. | |
740 .Pp | |
741 Back-references are a dreadful botch, | |
742 posing major problems for efficient implementations. | |
743 They are also somewhat vaguely defined | |
744 (does | |
745 .Sq a\e(\e(b\e)*\e2\e)*d | |
746 match | |
747 .Sq abbbd ? ) . | |
748 Avoid using them. | |
749 .Pp | |
750 POSIX's specification of case-independent matching is vague. | |
751 The | |
752 .Dq one case implies all cases | |
753 definition given above | |
754 is the current consensus among implementors as to the right interpretation. | |
755 .Pp | |
756 The syntax for word boundaries is incredibly ugly. |