Skip to main content

Section 7.10 From the Java Library: java.util.StringTokenizer

One of the most widespread string-processing tasks is that of breaking up a string into its components, or tokens.

For example, when processing a sentence, you may need to break the sentence into its constituent words, which are considered the sentence tokens. When processing a name-password string, such as “boyd:14irXp”, you may need to break it into a name and a password.

Tokens are separated from each other by one or more characters which are known as delimiters. Thus, for a sentence, white space, including blank spaces, tabs, and line feeds, serve as the delimiters. For the password example, the colon character serves as a delimiter.

Figure 7.10.1. The StringTokenizer class.

Java's java.util.StringTokenizer class is specially designed for breaking strings into their tokens (Figure 7.10.1). When instantiated with a String parameter, a StringTokenizer breaks the string into tokens, using white space as delimiters. For example, if we instantiated a StringTokenizer as in the code

StringTokenizer sTokenizer
  = new StringTokenizer("This is an English sentence.");

it would break the string into the following tokens, which would be stored internally in the StringTokenizer in the order shown:

This
is
an
English
sentence.

Note that the period is part of the last token (“sentence.”). This is because punctuation marks are not considered delimiters by default.

If you wanted to include punctuation symbols as delimiters, you could use the second StringTokenizer() constructor, which takes a second String parameter (Figure 7.10.1). The second parameter specifies a string of those characters that should be used as delimiters. For example, in the instantiation,

StringTokenizer sTokenizer
    = new StringTokenizer("This is an English sentence.",
                          "\b\t\n,;.!");

various punctuation symbols (periods, commas, and so on) are included among the delimiters. Note that escape sequences (\b\t\n) are used to specify blanks, tabs, and newlines.

The hasMoreTokens() and nextToken() methods can be used to process a delimited string, one token at a time. The first method returns true as long as more tokens remain; the second gets the next token in the list. For example, here's a code segment that will break a standard URL string into its constituent parts:

String url = "http://java.trincoll.edu/~jjj/index.html";
StringTokenizer sTokenizer = new StringTokenizer(url,":/");
while (sTokenizer.hasMoreTokens()) {
    System.out.println(sTokenizer.nextToken());}

This code segment will produce the following output:

http
java.trincoll.edu
~jjj
index.html

The only delimiters used in this case were the “:” and “/” symbols. And note that nextToken() does not return the empty string between “:” and “/” as a token.

You have attempted of activities on this page.