RegExpr_Parse

int RegExpr_Parse (const char *regularExpressionText, int caseSensitive, int *regularExpressionHandle);

Purpose

This function parses a regular expression string.

If the string is a valid regular expression, a handle is stored in the regularExpressionHandle parameter. The handle can then be passed to the following functions:

RegExpr_GetFirstCharVec
RegExpr_MatchText
RegExpr_Destroy

If the string is not a valid regular expression, a negative error number is returned. You can pass this error number to the RegExpr_GetErrorString function to obtain a description. In some cases there is more error information than the error number can encode; call RegExpr_GetErrorElaboration to get more detailed information about the result of the last call to this function.

Parameters

Input
Name Type Description
regularExpressionText const char * A NUL-terminated string containing a regular expression.

A regular expression consists of the following tokens:

Token Description Examples
. (period) Matches any single character. Example: a.t matches act and apt but not abort or at.
* (asterisk) Matches 0 or more occurrences of the preceding character or {expression}. Examples: 0*1 matches 1, 01, 001, etc.; a.* matches act, apt, abort, and at.
+ (plus sign) Matches 1 or more occurrences of the preceding character or {expression}. Examples: 0+1 matches 01, 001, 0001, etc.; {ab}+c matches abc and ababc but not c.
? (question mark) Matches 0 or 1 occurrences of the preceding character or {expression}. Example: 0?1 matches 1 and 01 but not 001.
| (pipe) Matches either the preceding or the following character or {expression}. Example: a3|4b matches a3b or a4b.
^ (caret) Matches the beginning of a line. Example: ^int matches any line that begins with int.
$ (dollar sign) Matches the end of a line. Example: done$ matches any line that ends with done.
{} (curly braces) Groups characters or expressions. Example: {a3}|{4b} matches a3 or 4b.
[] (brackets) Matches any one character or range listed within the brackets. Examples: [a-z] matches every occurrence of a lowercase letter; [abc] matches every occurrence of a, b, or c.
~ (tilde) When it appears immediately after the left bracket, negates the contents of the set. Examples: [~a-z] matches anything except lowercase letters; [a-z~A-Z] matches all letters and the '~' character.
\t (backslash t) Matches a tab character. Example: \t3 matches every occurrence of a tab followed by a 3.
\x (backslash x) Matches the character specified in hex. Example: \x2a matches every occurrence of the '*' character.
\ (backslash) Used when any of the above characters is itself to be included in the search. Example: \-\?\\ matches every occurrence of '-' followed by '?' and '\'.
caseSensitive integer Specifies whether characters are matched on a case-sensitive or case-insensitive basis.

A non-zero value specifies that characters are matched on a case-sensitive basis. For example, "chr" would match only "chr" and not "CHR".

A zero value specifies that characters are matched on a case-insensitive basis. For example, "chr" would match "chr", "CHR", and "Chr".

This parameter also applies to ranges. For example, if this parameter is non-zero, "[a-z]" in the regular expression string matches any lowercase letter. If this parameter is zero, "[a-z]" matches any letter.
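
Because the tokens listed for regularExpressionText are special, text that should be matched literally has to be escaped before it is placed in the pattern. The sketch below shows one way to do that. EscapeRegExprLiteral is a hypothetical helper, not part of the RegExpr library, and the set of characters it escapes is an assumption drawn from the token list above ('-' is included only because the documented example escapes it).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a library function): copies src into dst and puts
   a backslash in front of every character the token list treats as special.
   Returns 0 on success, -1 if an argument is invalid or dst is too small. */
static int EscapeRegExprLiteral(const char *src, char *dst, size_t dstSize)
{
    /* Assumed set of special characters, taken from the token list. */
    static const char specials[] = ".*+?|^${}[]~\\-";
    size_t out = 0;

    if (src == NULL || dst == NULL || dstSize == 0)
        return -1;

    for (; *src != '\0'; ++src) {
        size_t needed = (strchr(specials, *src) != NULL) ? 2 : 1;

        if (out + needed + 1 > dstSize)   /* +1 keeps room for the terminator */
            return -1;
        if (needed == 2)
            dst[out++] = '\\';
        dst[out++] = *src;
    }

    assert(out < dstSize);                /* guaranteed by the check above */
    dst[out] = '\0';
    return 0;
}
```

The escaped buffer can then be passed as regularExpressionText, or concatenated with other pattern fragments first. A destination buffer of twice the source length plus one is always large enough.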
Output
Name Type Description
regularExpressionHandle integer * A handle that represents the parsed regular expression.

It can be passed to the following functions:

RegExpr_GetFirstCharVec
RegExpr_MatchText
RegExpr_Destroy

When you are done with the regular expression, call RegExpr_Destroy on the handle. Otherwise, the memory allocated for the parsed expression is leaked.
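
The parse, use, destroy sequence described above might be sketched as follows. So that the example compiles and runs on its own, it contains stand-in definitions of RegExpr_Parse and RegExpr_Destroy that implement only a few of the documented checks; in a real program these come from the library and the stand-ins should be deleted. The RegExpr_Destroy signature (an int handle in, an int status out), the g_liveHandles counter, and the ParseAndRelease helper are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ---- STAND-INS for illustration only; the real functions live in the library ---- */
static int g_liveHandles = 0;   /* hypothetical counter used to spot leaks */

static int RegExpr_Parse(const char *regularExpressionText, int caseSensitive,
                         int *regularExpressionHandle)
{
    size_t len, trailing = 0;

    (void)caseSensitive;                           /* ignored by the stand-in */
    if (regularExpressionText == NULL) return -7911;
    len = strlen(regularExpressionText);
    if (len == 0) return -7910;
    while (trailing < len && regularExpressionText[len - 1 - trailing] == '\\')
        ++trailing;
    if (trailing % 2 == 1) return -7902;           /* unescaped trailing backslash */
    *regularExpressionHandle = ++g_liveHandles;
    return 0;
}

static int RegExpr_Destroy(int regularExpressionHandle)  /* assumed signature */
{
    (void)regularExpressionHandle;
    --g_liveHandles;
    return 0;
}
/* ---- end of stand-ins ---- */

/* Parses a pattern, reports a failure, and releases the handle on success.
   Returns the status from RegExpr_Parse. */
static int ParseAndRelease(const char *pattern, int caseSensitive)
{
    int handle = 0;
    int status = RegExpr_Parse(pattern, caseSensitive, &handle);

    if (status < 0) {
        /* No handle was produced, so there is nothing to destroy. */
        fprintf(stderr, "RegExpr_Parse failed with status %d\n", status);
        return status;
    }

    /* ... pass handle to RegExpr_MatchText here ... */

    RegExpr_Destroy(handle);                       /* release the parsed expression */
    return status;
}
```

Checking the status before touching the handle matters because the handle is only documented to be set when parsing succeeds.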

Return Value

Name Type Description
parseStatus integer Indicates if the regular expression was parsed successfully.

Zero indicates success. If the string is not a valid regular expression, a negative error number is returned. You can pass this error number to the RegExpr_GetErrorString function to obtain a description. In some cases there is more error information than the error number can encode; call RegExpr_GetErrorElaboration to get more detailed information about the result of the last call to this function.

The error numbers are:

0      success
-12    out of memory
-7900  unmatched character
-7901  invalid character in range
-7902  regular expression ends with backslash
-7903  invalid hex character (after \x)
-7904  operator applied to an empty pattern
-7905  empty left side of '|'
-7906  empty right side of '|'
-7907  empty group
-7908  invalid range
-7909  empty set ([])
-7910  empty input string
-7911  NULL input string
-7912  multibyte characters not allowed in range
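
RegExpr_GetErrorString is the documented way to turn a status into text. For logging or unit tests where the library is not linked in, a local table that mirrors the list above can serve the same purpose. The following is a minimal sketch; ParseStatusEntry, kParseStatusTable, and ParseStatusText are hypothetical names, and the strings are copied from the list above rather than from the library.

```c
#include <stddef.h>

/* Hypothetical local mirror of the documented error numbers. */
typedef struct {
    int         code;
    const char *text;
} ParseStatusEntry;

static const ParseStatusEntry kParseStatusTable[] = {
    {     0, "success" },
    {   -12, "out of memory" },
    { -7900, "unmatched character" },
    { -7901, "invalid character in range" },
    { -7902, "regular expression ends with backslash" },
    { -7903, "invalid hex character (after \\x)" },
    { -7904, "operator applied to an empty pattern" },
    { -7905, "empty left side of '|'" },
    { -7906, "empty right side of '|'" },
    { -7907, "empty group" },
    { -7908, "invalid range" },
    { -7909, "empty set ([])" },
    { -7910, "empty input string" },
    { -7911, "NULL input string" },
    { -7912, "multibyte characters not allowed in range" }
};

/* Returns the documented description for a status, or a fallback string
   for values that are not in the list. */
static const char *ParseStatusText(int status)
{
    size_t i;
    size_t count = sizeof kParseStatusTable / sizeof kParseStatusTable[0];

    for (i = 0; i < count; ++i) {
        if (kParseStatusTable[i].code == status)
            return kParseStatusTable[i].text;
    }
    return "unknown status";
}
```

A table like this cannot supply the extra detail that RegExpr_GetErrorElaboration provides, so it complements the library calls rather than replacing them.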