Suite101

JavaScript Class Using Substring

How To Parse Strings with the JavaScript String Methods

© Guy Lecky-Thompson

JavaScript tutorial class notes on string parsing using the standard string processing methods such as substring and split.

There are two principal ways to tokenize a string. Tokenizing is the action of breaking a string apart into substrings based on some form of delimiter, in a similar fashion to the C/C++ strtok function from the string.h library (see String Tokenizing in C Programming for more details).

The aim of tokenizing is to take a formatted string and break it into substrings. However, there are times when the programmer needs to split a string arbitrarily, based on other criteria. The classic example is to consider two kinds of file:

  • Comma Separated Value i.e. records like "Name, Address, Email"
  • Fixed Field Width i.e. records like "Mr. N. E. Body 1 Main Street, City me@myhost.com "

Clearly the first of the two problems can be solved using the string.split function, but the second requires an arbitrary split, as simply splitting on space characters is not going to work terribly well.

What Does string.split Do?

To understand the advantage of string.substring it is useful to appreciate what code such as the following actually does:

var myString = new String("Name, Address, Email");
var myStringList = myString.split(', '); // split on a comma followed by a space

The above code builds a new array from the contents of myString, where:

  • myStringList[0] is "Name"
  • myStringList[1] is "Address"
  • myStringList[2] is "Email"
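
Real records do not always space their delimiters consistently, so fields can come back with stray white space attached. The following is a minimal sketch of one way to handle that, assuming a hypothetical helper named splitRecord (it is not part of JavaScript itself) that splits on the delimiter and then trims each field with the standard string.trim method:

```javascript
// Hypothetical helper (not a built-in): split a delimited record and
// trim leading/trailing white space from every field.
function splitRecord(record, delimiter) {
  var fields = record.split(delimiter);
  for (var i = 0; i < fields.length; i++) {
    fields[i] = fields[i].trim();
  }
  return fields;
}

var fields = splitRecord("Name, Address, Email", ",");
console.log(fields); // ["Name", "Address", "Email"]
```

Trimming after the split means the same call copes with "Name,Address" and "Name , Address" alike.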

For further clarification on string.split, the JavaScript Split List or String article gives a more complete discussion.

What Does string.substring Do?

In contrast to string.split, the string.substring method allows the programmer to specify precisely where the string should be split, rather than relying on the presence of delimiters. Instead of returning an array of strings, the substring function returns one substring at a time, so it is necessary to make several calls in order to parse a string.

So, to parse a string that is based on a fixed field width format, code similar to the following could be used:

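var myString = new String("Name      Address   Email     ");
var myStringList = new Array (
  myString.substring ( 0, 10 ),
  myString.substring ( 10, 20 ),
  myString.substring ( 20, 30 ) );

Notice how the array has been built up to mimic the string.split functionality from the previous example, although each field still carries its padding spaces. Note also that the second argument is an end position, not a length: the character at that position is not included in the result. (The older string.substr method is the one that takes a start position and a length.) The generic form of the string.substring function is therefore:

  • substring ( startCharPos, endCharPos );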
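
Since string.substring takes a start index and an exclusive end index, a fixed field width record can be walked by keeping a running start position and adding each field's width to get the end. The sketch below assumes a hypothetical helper named parseFixedWidth and a made-up list of widths; the real widths would come from the file format being parsed:

```javascript
// Hypothetical helper (not a built-in): cut a fixed-width record into
// fields, given the width of each field in characters.
function parseFixedWidth(record, widths) {
  var fields = [];
  var start = 0;
  for (var i = 0; i < widths.length; i++) {
    var end = start + widths[i];          // end index is exclusive
    fields.push(record.substring(start, end).trim()); // drop padding
    start = end;                          // next field begins here
  }
  return fields;
}

// Assumed layout for illustration: two fields of 3 and 4 characters.
console.log(parseFixedWidth("a  bc  ", [3, 4])); // ["a", "bc"]
```

Keeping the widths in an array means a change to the record layout only touches the data, not the parsing loop.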

It would also be convenient to be able to tokenize in the same way as the string.split function, and luckily there is an easy way to do this.

How to Use string.substring to Tokenize

Tokenizing a string using substring has some advantages. Firstly, it is possible to change the delimiter halfway through the process, which can be useful because some strings (like document.cookie, explained in the Using JavaScript Cookies with HTML article) are broken up by multiple delimiter types.

This approach requires that the programmer:

  • Finds the start position
  • Calculates the end position
  • Extracts a single string
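
The three steps above can be sketched with string.indexOf and string.substring, switching delimiter between tokens. This is only an illustration: the helper names nextToken and tokenizePairs are hypothetical, and the input is a simplified cookie-like string rather than a real document.cookie value:

```javascript
// Hypothetical helper: extract the token that starts at `start` and ends
// just before the next `delimiter` (or at the end of the string).
function nextToken(text, start, delimiter) {
  var end = text.indexOf(delimiter, start);  // find the end position
  if (end === -1) {
    end = text.length;                       // no delimiter left
  }
  return {
    token: text.substring(start, end),       // extract a single string
    next: end + delimiter.length             // start of the following token
  };
}

// Hypothetical helper: read name=value pairs separated by ';'.
function tokenizePairs(text) {
  var pairs = [];
  var pos = 0;
  while (pos < text.length) {
    var name = nextToken(text, pos, "=");        // first delimiter
    var value = nextToken(text, name.next, ";"); // then a different one
    pairs.push([name.token, value.token]);
    pos = value.next;
  }
  return pairs;
}

console.log(tokenizePairs("user=alice;theme=dark"));
// [["user", "alice"], ["theme", "dark"]]
```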

Assuming that a parser for styles is being created, the following code can extract property names and values from a correctly formed string:

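var myString = new String ( "property:value;property2:value2");
var myPropertyList = myString.split ( ';' ); // one "name:value" pair per entry
for (var i = 0; i < myPropertyList.length; i++) {
var myProperty = myPropertyList[i];
var startPos = myProperty.indexOf(':'); // position of the name/value separator
var myName = myProperty.substring ( 0, startPos ); // everything before the colon
var myValue = myProperty.substring ( startPos + 1 ); // everything after the colon
// do something useful with myName and myValue
}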

One extra feature that needs to be added to the above code is error trapping, to remove white space and to deal with incorrectly formed strings. This has been left out to preserve the clarity of the technique.
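
As a rough idea of what that extra handling might look like, the sketch below trims white space and skips entries that have no colon or no property name. The helper name parseDeclarations and the decision to skip bad entries silently (rather than report them) are assumptions made for illustration:

```javascript
// Hypothetical helper: parse "name:value;name2:value2" into an object,
// trimming white space and ignoring malformed entries.
function parseDeclarations(text) {
  var result = {};
  var declarations = text.split(";");
  for (var i = 0; i < declarations.length; i++) {
    var declaration = declarations[i];
    var colonPos = declaration.indexOf(":");
    if (colonPos === -1) {
      continue; // empty or malformed: no name/value separator
    }
    var name = declaration.substring(0, colonPos).trim();
    var value = declaration.substring(colonPos + 1).trim();
    if (name === "") {
      continue; // a value with no property name is not usable
    }
    result[name] = value;
  }
  return result;
}

console.log(parseDeclarations(" color : red ; ; broken ; margin:0 "));
// { color: "red", margin: "0" }
```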


The copyright of the article JavaScript Class Using Substring in Javascript/Java Programming is owned by Guy Lecky-Thompson. Permission to republish JavaScript Class Using Substring in print or online must be granted by the author in writing.




