FunctX XSLT Functions

fn:tokenize

Splits a string based on a regular expression

Description

The fn:tokenize function splits a string based on a regular expression. The regular expression syntax is the one defined by XML Schema, with a few modifications and additions in XQuery/XPath/XSLT 2.0. The $pattern argument is a regular expression that matches the separator. The simplest patterns are a single space or a string that contains the separator character, such as a comma (,). However, certain characters must be escaped with a backslash when they are meant literally in a regular expression, namely . \ ? * + | ^ $ { } ( ) [ ]
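
As a small illustration of the escaping rule, the expression below splits on a literal dot. The input string is made up for this sketch; under the rules described above the result should be ('192', '168', '0', '1').

```xpath
(: An unescaped '.' matches any character, so it is escaped here
   to split only on literal dots :)
tokenize('192.168.0.1', '\.')
```

Without the backslash, every character of the input would count as a separator and the result would consist only of zero-length strings.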

The separators are not included in the result strings. If two adjacent separators appear, a zero-length string is included in the result sequence. If the string starts with the separator, a zero-length string is the first value returned. Likewise, if the string ends with the separator, a zero-length string is the last value in the result sequence.
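
A minimal sketch of these rules, using a made-up input: tokenize(',a,,b,', ',') on its own would return ('', 'a', '', 'b', ''), one zero-length string each for the leading, doubled, and trailing comma. If those empty tokens are unwanted, a predicate is one way to drop them, as shown below, where the result should be ('a', 'b').

```xpath
(: Leading, adjacent, and trailing separators each yield a zero-length string;
   the predicate keeps only the non-empty tokens :)
tokenize(',a,,b,', ',')[. != '']
```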

The optional $flags parameter allows for additional options in the interpretation of the regular expression. Flags and regular expressions are covered in detail in chapter 18 of the book XQuery.
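
As a hedged example of the three-argument form, the expression below passes the i flag (case-insensitive matching). The input is invented for illustration; the result should be ('a', 'b', 'c').

```xpath
(: With the 'i' flag the pattern 'x' matches both 'X' and 'x',
   so both letters act as separators :)
tokenize('aXbxc', 'x', 'i')
```

Without the flag, only the lowercase x would be a separator and the result would be ('aXb', 'c').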

If a particular point in the string could match more than one alternative, the first alternative is chosen. This is exhibited in the last example below, where the function considers the comma alone to be the separator, even though the second alternative, "comma followed by x", also matches at the same position.
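
To see that the order of alternatives matters, the sketch below reuses the input of the ',|,x' example but swaps the two alternatives. Because ',x' is now listed first, the x should be consumed as part of the separator, giving ('a', 'b', 'c') instead of ('a', 'xb', 'xc').

```xpath
(: Same input as the ',|,x' example, with the alternatives reversed:
   ',x' is tried first, so the 'x' becomes part of the separator :)
tokenize('a,xb,xc', ',x|,')
```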

For more examples of XQuery/XPath/XSLT/XML Schema regular expressions, see this page.

This description is © Copyright 2007, Priscilla Walmsley. It is excerpted from the book XQuery by Priscilla Walmsley, O'Reilly, 2007. For a complete explanation of this function, please refer to Appendix A of the book.

Arguments and Return Type

Name          Type        Description
$input        xs:string?  the string to tokenize
$pattern      xs:string   regular expression to match the delimiters
$flags        xs:string   flags that control multiline mode, case insensitivity, etc.
return value  xs:string*
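
Because the return type is xs:string*, the tokens are strings even when they look like numbers. The sketch below, with a made-up input, converts each token with number() before summing; it should evaluate to 60.

```xpath
(: tokenize returns a sequence of strings, so each token is converted
   with number() before the values are summed :)
sum(for $t in tokenize('10,20,30', ',') return number($t))
```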

Examples

XPath Example                                Results
tokenize('a b c', '\s')                      ('a', 'b', 'c')
tokenize('a   b c', '\s')                    ('a', '', '', 'b', 'c')
tokenize('a   b c', '\s+')                   ('a', 'b', 'c')
tokenize(' b c', '\s')                       ('', 'b', 'c')
tokenize('a,b,c', ',')                       ('a', 'b', 'c')
tokenize('a,b,,c', ',')                      ('a', 'b', '', 'c')
tokenize('a, b, c', '[,\s]+')                ('a', 'b', 'c')
tokenize('2006-12-25T12:15:00', '[\-T:]')    ('2006', '12', '25', '12', '15', '00')
tokenize('Hello, there.', '\W+')             ('Hello', 'there', '')
tokenize((), '\s+')                          ()
tokenize('abc', '\s')                        'abc'
tokenize('abcd', 'b?')                       Error FORX0003
tokenize('a,xb,xc', ',|,x')                  ('a', 'xb', 'xc')

See Also

functx:chars    Converts a string to a sequence of characters
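
As the 'b?' example shows, tokenize raises FORX0003 when the pattern can match a zero-length string, so it cannot be used with an empty separator to split a string into characters. One possible alternative is to go through code points, as sketched below; this is an illustration only and not necessarily how functx:chars is implemented. The result should be ('a', 'b', 'c', 'd').

```xpath
(: Split 'abcd' into single-character strings via code points, since
   tokenize rejects patterns that match a zero-length string :)
for $c in string-to-codepoints('abcd') return codepoints-to-string($c)
```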

History

Published On  Last Updated  Contributor(s)
2006-06-27    2007-02-26    W3C, XQuery 1.0 and XPath 2.0 Functions and Operators, http://www.w3.org/TR/xpath-functions/