
String Methods for Regular Expressions

The String object provides four methods that use regular expressions. They perform tasks similar to, and in some cases more powerful than, those of the RegExp object itself. Whereas the RegExp methods are geared toward matching and extracting substrings, the String methods also use regular expressions to modify strings or chop them up.


The simplest regexp-related String method is search(), which takes a regular expression argument and returns the index of the character at which the first matching substring begins. If no substring matching the pattern is found, –1 is returned. Consider the following two examples:

"JavaScript regular expressions are powerful!".search(/pow.*/i);
"JavaScript regular expressions are powerful!".search(/\d/);

The first statement returns 35, the character index at which the matching substring “powerful!” begins. The second statement searches for a digit and returns –1 because no numeric character is found.
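The return convention of search() can be sketched with two small helpers. The function names firstDigitIndex and containsDigit are hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: index of the first digit in text, or -1 if there is none.
function firstDigitIndex(text) {
  return text.search(/\d/);
}

var found = firstDigitIndex("Room 42");      // 5: the "4" is the sixth character
var missing = firstDigitIndex("No digits");  // -1: nothing matched

// Compare against -1 explicitly. A match at index 0 is a valid result,
// so treating the return value as a boolean would misreport it as a failure.
function containsDigit(text) {
  return text.search(/\d/) !== -1;
}

console.log(found, missing, containsDigit("7up"));
```

The explicit comparison with -1 matters because a match at the very start of the string returns 0, which is falsy in JavaScript.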


The second method provided by String is also fairly simple. The split() method splits (for lack of a better word) a string up into substrings and returns them in an array. It accepts a string or regular expression argument containing the delimiter at which the string will be broken. For example,

var stringwithdelimits = "10 / 3 / / 4 / 7 / 9";
var splitExp = /[ \/]+/; // one or more spaces and slashes
myArray = stringwithdelimits.split(splitExp);

places 10, 3, 4, 7, and 9 into the first five indices of the array called myArray. Of course you could do this much more tersely:

var myArray = "10 / 3 / / 4 / 7 / 9".split(/[ \/]+/);

Using split() with a regular expression argument (rather than a string argument) gives you the flexibility to ignore runs of whitespace or delimiter characters. Because regular expressions are greedy (see the section, “Advanced Regular Expressions”), the regular expression “eats up” as many delimiter characters as it can. Had the string " /" been used as the delimiter instead of a regular expression, we would have ended up with an empty element and stray leading spaces in our array.
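The difference between the two kinds of delimiter can be seen by splitting the same string both ways. This is a minimal sketch; the variable names are illustrative only:

```javascript
var input = "10 / 3 / / 4 / 7 / 9";

// Regular expression delimiter: each run of spaces and slashes counts as one separator.
var byRegex = input.split(/[ \/]+/);

// String delimiter: every literal " /" is a separator, so the doubled
// delimiter produces an empty element and the numbers keep a leading space.
var byString = input.split(" /");

console.log(byRegex);   // the five numbers as strings
console.log(byString);  // six elements, one of them empty
```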


The replace() method returns the string that results when you replace text matching its first argument (a regular expression) with the text of the second argument (a string). If the g (global) flag is not set in the regular expression declaration, this method replaces only the first occurrence of the pattern. For example,

var s = "Hello. Regexps are fun.";
s = s.replace(/\./, "!"); // replace first period with an exclamation point
alert(s);

produces the string “Hello! Regexps are fun.” Including the g flag will cause the interpreter to perform a global replace, finding and replacing every matching substring. For example,

var s = "Hello. Regexps are fun.";
s = s.replace(/\./g, "!"); // replace all periods with exclamation points
alert(s);

yields this result: “Hello! Regexps are fun!”
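One detail worth seeing side by side: replace() returns a new string and leaves the string it was called on unchanged, which is why the examples assign the result back to s. A short sketch with illustrative variable names:

```javascript
var original = "Hello. Regexps are fun.";

var firstOnly = original.replace(/\./, "!");   // no g flag: first period only
var everyOne = original.replace(/\./g, "!");   // g flag: every period

console.log(firstOnly);  // Hello! Regexps are fun.
console.log(everyOne);   // Hello! Regexps are fun!
console.log(original);   // Hello. Regexps are fun.  (unchanged)
```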

replace() with Subexpressions

Recall that parenthesized subexpressions can be referred to by number using the RegExp class object (e.g., RegExp.$1). You can use this capability in replace() to reference certain portions of a string. The substrings matched by parenthesized subexpressions are referred to in the replacement string with a dollar sign ($) followed by the number of the desired subexpression. For example, the following inserts dashes into a hypothetical social security number:

var pattern = /(\d{3})(\d{2})(\d{4})/;
var ssn = "123456789";
ssn = ssn.replace(pattern, "$1-$2-$3");

The result “123-45-6789” is placed in ssn.
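The numbered references do not have to appear in their original order, so the same mechanism can rearrange the pieces of a string. The following sketch assumes a hypothetical helper, isoToUsDate, and a year-month-day input format chosen purely for illustration:

```javascript
// Hypothetical helper: rewrites YYYY-MM-DD as MM/DD/YYYY by reordering
// the three parenthesized subexpressions in the replacement string.
function isoToUsDate(isoDate) {
  return isoDate.replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
}

var usDate = isoToUsDate("2004-07-15");  // "07/15/2004"
var untouched = isoToUsDate("July 15");  // no match, so the input comes back as is

console.log(usDate, untouched);
```

When the pattern does not match, replace() returns the input unchanged, so the helper is safe to call on strings that are not in the expected format.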

This technique is called backreferencing and is very useful for formatting data according to your needs. How many times have you entered a phone number into a Web site and been told that you need to include dashes (or not include them)? Since it’s just as easy to fix the problem using regular expressions and backreferencing as it is to detect it, consider using this technique in order to accommodate users who deviate slightly from expected patterns. For example, the following script does some basic normalization on phone numbers:

function normalizePhone(phone) {
  var p1 = /(\d{3})(\d{3})(\d{4})/;               // e.g., 4155551212
  var p2 = /\((\d{3})\)\s*(\d{3})[^\d]+(\d{4})/;  // e.g., (415)555-1212 or (415) 555-1212
  phone = phone.replace(p1, "$1-$2-$3");
  phone = phone.replace(p2, "$1-$2-$3");
  return phone;
}
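A different approach to the same problem, not the one used in the script above, is to strip every non-digit character first and then format whatever remains. The sketch below assumes ten-digit numbers and uses a hypothetical function name, normalizePhoneDigits:

```javascript
// Hypothetical alternative: discard all non-digits, then insert dashes.
// Assumes a ten-digit number; anything else is returned unchanged.
function normalizePhoneDigits(phone) {
  var digits = phone.replace(/\D/g, "");  // \D matches any non-digit character
  if (digits.length !== 10) {
    return phone;
  }
  return digits.replace(/(\d{3})(\d{3})(\d{4})/, "$1-$2-$3");
}

console.log(normalizePhoneDigits("(415) 555-1212"));  // 415-555-1212
console.log(normalizePhoneDigits("415.555.1212"));    // 415-555-1212
console.log(normalizePhoneDigits("555-1212"));        // 555-1212 (too short, left alone)
```

This variant accepts more input styles with a single pattern, at the cost of also accepting strings with arbitrary junk between the digits.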


The final method provided by String objects is match(). This method takes a regular expression as an argument and returns an array containing the results of the match. If the given regexp has the global (g) flag, the array returned contains the results of each substring matched. For example,

var pattern = /\d{2}/g;
var lottoNumbers = "22, 48, 13, 17, 26";
var result = lottoNumbers.match(pattern);

places 22 in result[0], 48 in result[1], and so on up to 26 in result[4]. Using match() with the global flag is a great way to quickly parse strings of a known format.
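The elements that match() returns are strings, so a typical next step is to convert them before doing arithmetic. The sketch below also guards against the null that match() returns when nothing matches; the variable names are illustrative only:

```javascript
var lottoNumbers = "22, 48, 13, 17, 26";

// match() returns null when there is no match, so fall back to an empty array.
var picks = lottoNumbers.match(/\d{2}/g) || [];

var total = 0;
for (var i = 0; i < picks.length; i++) {
  total += parseInt(picks[i], 10);  // each element is a string such as "22"
}

var none = "no numbers here".match(/\d{2}/g);  // null, not an empty array

console.log(picks.length, total, none);
```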

When the expression does not have the global flag, match() behaves almost identically to RegExp.exec(). It places the character position at which the first match begins in an instance property, index, of the array that is returned. An instance property called input is also added and contains the entire original string. The entire matching substring is placed in the first element (index zero) of the array. The remaining elements are filled in with the substrings matched by the parenthesized subexpressions, with index n holding the value of $n. For example,

var url = "The URL is";
var pattern = /(\w+):\/\/([\w\.]+)\/([\w\/]+)/; // three subexpressions
var results = url.match(pattern);
document.writeln("results.input =\t" + results.input);
document.writeln("<br />");
document.writeln("results.index =\t" + results.index);
document.writeln("<br />");
for (var i = 0; i < results.length; i++) {
  document.writeln("results[" + i + "] =\t" + results[i]);
  document.writeln("<br />");
}

produces the result shown in Figure 8-4. As you can see, all three subexpressions were matched and placed in the array. The entire match was placed in the first element, and the instance properties index and input hold the offset at which the match begins and the original string, respectively (remember, string offsets are enumerated beginning with zero, just like arrays). Note that if match() does not find a match, it returns null.

Figure 8-4: Results of regular expression matching without the global flag
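Because match() returns null when nothing matches, code that reads index, input, or the array elements should check the result first. The following sketch wraps the same pattern in a hypothetical function, parseUrl, and runs it against a made-up example.com address; both the function name and the URL are assumptions for illustration:

```javascript
// Hypothetical wrapper around a non-global match(); returns null when no URL is found.
function parseUrl(text) {
  var pattern = /(\w+):\/\/([\w\.]+)\/([\w\/]+)/;  // three subexpressions
  var results = text.match(pattern);
  if (results === null) {
    return null;
  }
  return {
    index: results.index,   // offset at which the match begins
    input: results.input,   // the entire original string
    whole: results[0],      // the entire matching substring
    protocol: results[1],   // $1
    host: results[2],       // $2
    path: results[3]        // $3
  };
}

var parsed = parseUrl("The URL is http://www.example.com/docs/index");
var nothing = parseUrl("no address in this string");

console.log(parsed, nothing);
```

Returning a small object with named fields is a design choice made here for readability; the raw array returned by match() carries the same information.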
