Team LiB

RegExp Object

Now that we’ve covered how to form regular expressions, it is time to look at how to use them. We do so by discussing the properties and methods of the RegExp and String objects that can be used to test and parse strings. Recall that regular expressions created with the literal syntax in the previous section are in fact RegExp objects. In this section, we favor the object syntax so the reader will be familiar with both.

test()

The simplest RegExp method, which we have already seen in this chapter numerous times, is test(). This method returns a Boolean value indicating whether the given string argument matches the regular expression. Here we construct a regular expression and then use it to test against two strings:

var pattern = new RegExp("a*bbbc", "i"); // case-insensitive matching
alert(pattern.test("1a12c")); // displays false
alert(pattern.test("aaabBbcded")); // displays true
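As a brief sketch of the object syntax in practice, the following compares a literal with its constructor equivalent. The phone-number pattern and the sample strings are assumptions chosen for illustration. The point to notice is that each backslash must be doubled inside the string passed to RegExp(). Results are collected in a variable rather than shown with alert() so the fragment can also run outside a browser.

```javascript
// Two ways to build the same pattern: NNN-NNNN, where N is a digit.
var fromLiteral = /^\d{3}-\d{4}$/;
var fromConstructor = new RegExp("^\\d{3}-\\d{4}$"); // "\\d" in a string yields \d in the pattern

// Hypothetical sample inputs.
var samples = ["555-1212", "5551212", "555-12345"];

// Record what each regexp reports for each sample.
var outcomes = [];
for (var i = 0; i < samples.length; i++) {
  outcomes.push({
    text: samples[i],
    literal: fromLiteral.test(samples[i]),
    constructed: fromConstructor.test(samples[i])
  });
}
```

Both objects report the same result for every sample, since they hold the same pattern.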

Subexpressions

The RegExp object provides an easy way to extract pieces of a string that match parts of your patterns. This is accomplished by grouping (placing parentheses around) the portions of the pattern you wish to extract. For example, suppose you wished to extract first names and phone numbers from strings that look like this,

Firstname Lastname NNN-NNNN

where N’s are the digits of a phone number.

You could use the following regular expression, grouping the part that is intended to match the first name as well as the part intended to match the phone number:

var pattern = /(\w+) \w+ ([\d-]{8})/;

This pattern is read as one or more word characters, followed by a space and another sequence of one or more word characters, followed by another space and then followed by an eight-character string composed of digits and dashes.

When this pattern is applied to a string, the parentheses induce subexpressions. When a match is successful, these parenthesized subexpressions can be referred to individually by using static properties $1 to $9 of the RegExp class object. To continue our example:

var customer = "Alan Turing 555-1212";
var pattern = /(\w+) \w+ ([\d-]{8})/;
pattern.test(customer);

Since the pattern contained parentheses that created two subexpressions, \w+ and [\d-]{8}, we can reference the two substrings they match, “Alan” and “555-1212,” individually. Substrings accessed in this manner are numbered from left to right, beginning with $1 and ending typically with $9. For example,

var customer = "Alan Turing 555-1212";
var pattern = /(\w+) \w+ ([\d-]{8})/;
if (pattern.test(customer))
  alert("RegExp.$1 = " + RegExp.$1 + "\nRegExp.$2 = " + RegExp.$2);

displays an alert containing two lines: RegExp.$1 = Alan and RegExp.$2 = 555-1212.

Notice the use of the RegExp class object to access the subexpression components, not the RegExp instance or pattern we created.

Note 

The ECMA specification allows more than nine subexpressions to be referenced in the replacement string passed to String.replace(), where up to 99 are permitted using identifiers like $10, $11, and so on. The static properties of the RegExp class object, however, stop at $9 in common browsers.
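Because $1 through $9 belong to the RegExp class object rather than to a particular pattern, every successful match overwrites them. The sketch below copies the values immediately after the test. The function name parseCustomer and the second customer string are hypothetical, and the code assumes an engine that still provides these static properties.

```javascript
var pattern = /(\w+) \w+ ([\d-]{8})/;

// Hypothetical helper: returns the captured pieces, or null if there is no match.
function parseCustomer(text) {
  if (!pattern.test(text)) {
    return null;
  }
  // Copy the static properties right away; the next match replaces them.
  return { firstName: RegExp.$1, phone: RegExp.$2 };
}

var alan = parseCustomer("Alan Turing 555-1212");
var grace = parseCustomer("Grace Hopper 555-3434");

// RegExp.$1 now reflects the most recent match only.
var currentFirstName = RegExp.$1;
```

After both calls, alan still holds "Alan" and "555-1212" because the values were copied, while RegExp.$1 itself holds "Grace".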

compile()

A rather infrequently used method is compile(), which replaces an existing regular expression with a new one. This method takes the same arguments as the RegExp() constructor (a string containing the pattern and an optional string containing the flags) and can be used to create a new expression by discarding an old one:

var pattern = new RegExp("http:.* ", "i");
// do something with your regexp
pattern.compile("https:.* ", "i"); // replaces the regexp in pattern with the new pattern

Another use of this function was efficiency. In early implementations, regular expressions declared with the RegExp constructor were “compiled” (turned into string matching routines by the interpreter) each time they were used, and this could be a time-consuming process, particularly if the pattern was complicated. Explicitly calling compile() saved the recompilation overhead at each use by compiling a regexp once, ahead of time. The method is now deprecated and retained only for compatibility, so new code should simply create a new RegExp object.
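The same "build once, use many times" idea can be sketched without compile() by constructing the RegExp object a single time and reusing it. The names makeMatcher and isSecureUrl and the sample URLs are hypothetical; this is an illustration of the idea, not a measured optimization.

```javascript
// Hypothetical helper: builds the RegExp once and returns a function that reuses it.
function makeMatcher(source, flags) {
  var compiled = new RegExp(source, flags); // constructed a single time
  return function (text) {
    return compiled.test(text); // no g flag, so no lastIndex state carries over
  };
}

var isSecureUrl = makeMatcher("^https:", "i");

var urls = ["https://www.w3c.org/", "http://www.w3c.org/", "HTTPS://example.com/"];
var secureCount = 0;
for (var i = 0; i < urls.length; i++) {
  if (isSecureUrl(urls[i])) {
    secureCount++;
  }
}
```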

exec()

The RegExp object also provides a method called exec(). This method is used when you’d like to test whether a given string matches a pattern and would additionally like more information about the match, for example, the offset in the string at which the pattern first appears. You can also repeatedly apply this method to a string in order to step through the portions of the string that match, one by one.

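The exec() method accepts a string to match against. Some older browsers also allowed a shorthand in which the regexp itself is invoked as a function, but this form is nonstandard and throws a TypeError in current engines, so exec() should be called explicitly. In browsers that support the shorthand, the two invocations in the following example are equivalent:

var pattern = /http:.*/;
pattern.exec("http://www.w3c.org/");
pattern("http://www.w3c.org/"); // nonstandard shorthand; avoid in new code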

The exec() method returns an array with a variety of properties. Included are the length of the array; input, which holds the original input string; and index, which holds the character index at which the matching portion of the string begins. Some older browsers also set lastIndex on the array, pointing to the character after the match; in standard ECMAScript, lastIndex is a property of the regular expression itself and is updated only when the global flag is set. The script here illustrates the exec() method and its returned values:

var pattern = /cat/;
var result = pattern.exec("He is a big cat, a fat black cat named Rufus.");

document.writeln("result = " + result + "<br />");
document.writeln("result.length = " + result.length + "<br />");
document.writeln("result.index = " + result.index + "<br />");
// lastIndex is set on the array only by some older browsers; elsewhere it is undefined
document.writeln("result.lastIndex = " + result.lastIndex + "<br />");
document.writeln("result.input = " + result.input + "<br />");

In this example, result is cat, result.length is 1, result.index is 12, and result.input is the original sentence.

The array returned may have more than one element if subexpressions are used. For example, the following script has a set of three parenthesized subexpressions that are parsed out in the array separately:

var pattern = /(cat) (and) (dog) /;
var result = pattern.exec("My cat and dog are black.");

document.writeln("result = "+result);
document.writeln("result.length = "+result.length);
document.writeln("result.index = "+result.index);
document.writeln("result.lastIndex = "+result.lastIndex);
document.writeln("result.input = "+result.input);

As the result shows, the exec() method places the entire matched string in element zero of the array and any substrings that match parenthesized subexpressions in subsequent elements.
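The layout of the returned array can be sketched with the customer pattern used earlier in this section. The surrounding sentence and the variable names are assumptions made for illustration.

```javascript
var pattern = /(\w+) \w+ ([\d-]{8})/;
var text = "Call Alan Turing 555-1212 today"; // hypothetical input
var result = pattern.exec(text);

var summary = null;
if (result !== null) {
  summary = {
    whole: result[0],      // the entire matched substring
    firstName: result[1],  // first parenthesized subexpression
    phone: result[2],      // second parenthesized subexpression
    count: result.length,  // whole match plus one element per subexpression
    index: result.index,   // offset of the match within the input
    input: result.input    // the string that was searched
  };
}
```

Reading the pieces from the array ties them to this particular call, unlike the static $1 to $9 properties, which are shared by every regexp.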

exec() and the Global Flag

Sometimes you might wish to extract not just the first occurrence of a pattern in a string, but each occurrence of it. Adding the global flag (g) to a regular expression indicates the intent to search for every occurrence (i.e., globally) instead of just the first.

The way the global flag is interpreted by RegExp and by String is a bit subtle. In RegExp, it’s used to perform a global search incrementally, that is, by parsing out each successive occurrence of the pattern one at a time. In String, it’s used to perform a global search all at once, that is, by parsing out all occurrences of the pattern in one single function call. We’ll cover using the global flag with String methods in the following section.

To demonstrate the difference between a regexp with the global flag set and one without, consider the following simple example:

var lucky = "The lucky numbers are 3, 14, and 27";
var pattern = /\d+/;
document.writeln("Without global we get:");
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
pattern = /\d+/g;
document.writeln("With global we get:");
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));
document.writeln(pattern.exec(lucky));

As you can see in Figure 8-2, when the global flag is set, exec() starts searching where the previous match ended. Without the global flag, exec() always returns the first matching portion of the string.

Figure 8-2: The global flag starts searching where the previous match left off.

How does global matching work? Recall that exec() sets the lastIndex property of the regular expression (and, in some older browsers, of the returned array and the RegExp class object as well) to point to the character immediately following the substring that was most recently matched. Subsequent calls to the exec() method begin their search from the offset lastIndex in the string. If no match is found, lastIndex is set to zero.
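The bookkeeping described here can be observed by recording the regexp's lastIndex after each call. This is a minimal sketch that reuses the lucky-numbers string from the example above; the array names are hypothetical.

```javascript
var lucky = "The lucky numbers are 3, 14, and 27";
var pattern = /\d+/g;

var found = [];      // matched text for each call (null when nothing is left)
var positions = [];  // pattern.lastIndex after each call

for (var i = 0; i < 4; i++) {
  var match = pattern.exec(lucky);
  found.push(match === null ? null : match[0]);
  positions.push(pattern.lastIndex);
}
// found     is ["3", "14", "27", null]
// positions is [23, 27, 35, 0]
```

The fourth call finds nothing, returns null, and resets lastIndex to zero, so a fifth call would start over at the beginning of the string.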

A common use of exec() is to loop through each substring matching a regular expression, obtaining complete information about each match. This use is illustrated in the following example, which matches words in the given string. The result (when used within a <pre> tag) is shown in Figure 8-3. Notice how lastIndex is set appropriately, as we discussed.

Figure 8-3: Parsing out words in a string using exec() on a regexp with the global flag set
var sentence = "A very interesting sentence.";
var pattern = /\b\w+\b/g;        // recognizes words; global
var token = pattern.exec(sentence);   // get the first match
while (token != null)
{
 // if we have a match, print information about it
 document.writeln("Matched " + token[0] + " ");
 document.writeln("\ttoken.input = " + token.input);
 document.writeln("\ttoken.index = " + token.index);
 document.writeln("\tpattern.lastIndex = " + pattern.lastIndex + "\n "); // lastIndex lives on the regexp
 token = pattern.exec(sentence);    // get the next match
}
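The same loop can be wrapped in a reusable function that collects the matches instead of writing them to the page. The name findAll and the shape of the returned records are assumptions for illustration. The guard against empty matches is an addition not discussed in the text: without it, a pattern that can match zero characters would never advance.

```javascript
// Hypothetical helper: returns one record per match of a global pattern.
function findAll(pattern, text) {
  if (!pattern.global) {
    // Without the g flag, exec() would return the first match forever.
    throw new Error("findAll expects a pattern with the g flag");
  }
  var matches = [];
  var token;
  pattern.lastIndex = 0; // always start from the beginning of the string
  while ((token = pattern.exec(text)) !== null) {
    matches.push({ text: token[0], index: token.index, next: pattern.lastIndex });
    if (token[0].length === 0) {
      pattern.lastIndex++; // step past an empty match so the loop terminates
    }
  }
  return matches;
}

var words = findAll(/\b\w+\b/g, "A very interesting sentence.");
```

For this sentence, words holds four records: "A" at index 0, "very" at 2, "interesting" at 7, and "sentence" at 19.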

One caveat when using the exec() method: If you stop a search before finding the last match, you need to manually set the lastIndex property of the regular expression to zero. If you do not, the next time you use that regexp, it will automatically start matching at offset lastIndex rather than at the beginning of the string.

Note 

The test() method obeys lastIndex as well, so it can be used to incrementally search a string in the same manner as exec(). Think of test() as a simplified, Boolean version of exec().
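The caveat about a leftover lastIndex can be sketched as follows. The strings and variable names are hypothetical; the behavior shown applies only to regexps with the global flag set.

```javascript
var pattern = /cat/g;

pattern.exec("cat and cat");            // first match found; search abandoned early
var leftover = pattern.lastIndex;       // 3, the offset just past the first "cat"

// Reusing the regexp on a new string starts at offset 3, so "cat" is missed.
var staleResult = pattern.test("cat");  // false (the failed search also resets lastIndex to 0)

pattern.exec("cat and cat");            // abandon another search part-way through
pattern.lastIndex = 0;                  // manual reset before reusing the regexp
var freshResult = pattern.test("cat");  // true
```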

RegExp Properties

Examining the internals of regular expression instance objects as well as the static (class) properties of the RegExp object can be helpful when performing complex matching tasks and during debugging. The instance properties of RegExp objects are listed in Table 8-6 and, with a few exceptions, should be familiar to the reader by this point.

Table 8-6: Instance Properties of RegExp Objects

Property

Value

Example

global

Boolean indicating whether the global flag (g) was set. This property is ReadOnly.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(pattern.global);
// prints true

ignoreCase

Boolean indicating whether the case-insensitive flag (i) was set. This property is ReadOnly.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(pattern.ignoreCase);
// prints false

lastIndex

Integer specifying the position in the string at which to start the next match. You may set this value.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(pattern.lastIndex);
// prints 17

multiline

Boolean indicating whether the multiline flag (m) was set. This property is ReadOnly.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(pattern.multiline);
// prints false

source

The string form of the regular expression. This property is ReadOnly.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(pattern.source);
// prints (cat) (dog)
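When debugging, it can help to capture all of the instance properties in Table 8-6 at once. The sketch below does so with a hypothetical helper named describeRegExp, taking one snapshot before a match and one after, so the change in lastIndex is visible.

```javascript
// Hypothetical debugging helper: snapshot of a regexp's instance properties.
function describeRegExp(pattern) {
  return {
    source: pattern.source,
    global: pattern.global,
    ignoreCase: pattern.ignoreCase,
    multiline: pattern.multiline,
    lastIndex: pattern.lastIndex
  };
}

var pattern = /(cat) (dog)/g;
var before = describeRegExp(pattern);            // lastIndex is 0 before any match
pattern.test("this is a cat dog and cat dog");
var after = describeRegExp(pattern);             // lastIndex is 17 after the first match
```

Only lastIndex differs between the two snapshots; the flags and source are fixed when the regexp is created.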

The RegExp class object also has static properties that can be very useful. These properties are listed in Table 8-7 and come in two forms. The alternate form uses a dollar sign and a special character and may be recognized by those who are already intimately familiar with regexps. A downside to the alternate form is that it has to be accessed in an associative array fashion. Note that using this form will probably confuse those readers unfamiliar with languages like Perl, so it is definitely best to just stay away from it.

Table 8-7: Static Properties of the RegExp Class Object

Property

Alternate Form

Value

Example

$1, $2, …, $9

None

Strings holding the text of the first nine parenthesized subexpressions of the most recent match.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln("$1=" + RegExp.$1);
document.writeln("$2=" + RegExp.$2);
// prints $1=cat and $2=dog

index

None

Holds the string index value of the first character in the most recent pattern match. This property is not part of the ECMA standard, though it is supported widely. Therefore, it may be better to calculate this value by subtracting the length of the matched text from the lastIndex property.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.index);
// prints 10

input

$_

String containing the default string to match against the pattern.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.input);
// prints "this is a cat dog and cat dog"
document.writeln(RegExp['$_']);
// prints "this is a cat dog and cat dog"

lastIndex

None

Integer specifying the position in the string at which to start the next match. Same as the instance property, which should be used instead.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.lastIndex);
// prints 17

lastMatch

$&

String containing the most recently matched text.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.lastMatch);
// prints "cat dog"
document.writeln(RegExp['$&']);
// prints "cat dog"

lastParen

$+

String containing the text of the last parenthesized subexpression of the most recent match.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.lastParen);
// prints "dog"
document.writeln(RegExp['$+']);
// prints "dog"

leftContext

$`

String containing the text to the left of the most recent match.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.leftContext);
// prints "this is a " (note the trailing space)
document.writeln(RegExp['$`']);
// prints "this is a "

rightContext

$'

String containing the text to the right of the most recent match.

var pattern = /(cat) (dog)/g;
pattern.test("this is a cat dog and cat dog");
document.writeln(RegExp.rightContext);
// prints " and cat dog" (note the leading space)
document.writeln(RegExp['$\'']);
// prints " and cat dog"
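The values reported by lastMatch, leftContext, and rightContext can also be derived from the array that exec() returns, which avoids depending on the shared static properties. The following is a sketch under that assumption; the function name matchContext is hypothetical, and a pattern without the global flag is used so that no lastIndex state is involved.

```javascript
// Hypothetical helper: rebuilds the match context from exec()'s result.
function matchContext(pattern, text) {
  var m = pattern.exec(text);
  if (m === null) {
    return null;
  }
  var end = m.index + m[0].length; // offset just past the matched text
  return {
    index: m.index,
    lastMatch: m[0],
    leftContext: text.slice(0, m.index),
    rightContext: text.slice(end)
  };
}

var context = matchContext(/(cat) (dog)/, "this is a cat dog and cat dog");
```

Here context.leftContext is "this is a " and context.rightContext is " and cat dog", matching the values shown in the table for the static properties.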

One interesting aspect of the static RegExp class properties is that they are global and therefore change every time you use a regular expression, whether with String or RegExp methods. For this reason, they are the exception to the rule that JavaScript is statically scoped. These properties are dynamically scoped—that is, changes are reflected in the RegExp object in the context of the calling function, rather than in the enclosing context of the source code that is invoked. For example, JavaScript in a frame that calls a function using regular expressions in a different frame will update the static RegExp properties in the calling frame, not the frame in which the called function is found. This rarely poses a problem, but it is something you should keep in mind if you are relying upon static properties in a framed environment.

