Scan String for Tokens (G Dataflow)

Last Modified: January 9, 2017

Searches a string for the next token, where a token is defined as either the next set of characters that appears before a specified delimiter or one of a specified set of operators.


allow empty tokens?

A Boolean value that determines whether an empty token exists between consecutive delimiters.

True: The node returns an empty string for token string when it encounters a pair of consecutive delimiters.
False: The node considers consecutive delimiters to be a single delimiter and therefore never returns an empty string as a token.

Default: False
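
The difference between the two settings can be illustrated with a rough Python analogy. This is not the node itself: it assumes a single comma delimiter and uses Python's `str.split`, and the variable names are made up for the illustration.

```python
# Analogy only: a comma stands in for the delimiter.
line = "a,,b"

# Like allow empty tokens? = True: consecutive delimiters yield an empty token.
with_empty = line.split(",")

# Like allow empty tokens? = False: consecutive delimiters act as one delimiter.
without_empty = [token for token in line.split(",") if token]

print(with_empty)     # ['a', '', 'b']
print(without_empty)  # ['a', 'b']
```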


input string

The string to scan for tokens.


offset

The number of bytes into the input string at which this node begins its operation.

The offset of the first byte in the input string is 0. If offset is beyond the end of the input string, this node returns an empty string.

Note  

Strings are encoded in UTF-8. For strings that contain only ASCII characters (U+0000 through U+007F), the number of bytes equals the number of characters. For strings that contain any character at U+0080 or above, the number of bytes is greater than the number of characters.

Default: 0
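
The gap between byte offsets and character positions can be checked in Python, shown here only as an illustration of UTF-8 encoding and not of the node. The sample word is an arbitrary choice.

```python
# "naïve" contains U+00EF, which UTF-8 encodes as two bytes.
text = "na\u00efve"
encoded = text.encode("utf-8")

print(len(text))     # 5 characters
print(len(encoded))  # 6 bytes

# A byte offset of 4 skips "na" (2 bytes) plus "ï" (2 bytes).
print(encoded[4:].decode("utf-8"))  # ve
```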


operators

An array of strings that this node identifies as tokens when they appear in input string, even if they are not surrounded by delimiters.

Available Format Specifiers

In addition to specifying literal strings as tokens, you can use certain format specifiers to interpret a series of digits as a token.

%d: match decimal integer
%o: match octal integer
%x: match hexadecimal integer
%b: match binary integer
%e, %f, %g: match floating-point or scientific real number
%%: match a single % character

delimiters

An array of strings that act as separators between tokens. The node does not return these strings as tokens but instead uses these strings to determine where tokens begin and end.

Default: White space characters — space, tab, linefeed, and carriage return


use cached delimiter/operator data?

A Boolean value that determines whether the node uses saved values for delimiters and operators, thereby improving string parsing performance.

Set this input to True only if delimiters and operators have not changed since the last time this node executed.

True: The node uses the delimiters and operators from the most recent time this node executed.
False: The node uses the values wired to delimiters and operators.

Default: False


string out

The same string as input string, unchanged.


offset past token

The index in input string of the first byte following the token and any trailing delimiters.

If offset is less than 0 or greater than the number of bytes in input string, or if the node reaches the end of the string, this output is -1.

To continue searching for more tokens in input string, use this value as offset the next time you call this node.

Note  

Strings are encoded in UTF-8. For strings that contain only ASCII characters (U+0000 through U+007F), the number of bytes equals the number of characters. For strings that contain any character at U+0080 or above, the number of bytes is greater than the number of characters.


token string

The first token in input string following offset. This output is either all text that appears between two delimiters or one of the strings specified by operators.


token index

The index in operators of token string if token string matches one of the elements in operators.

If token string is any other string, token index returns -1. If the node reaches the end of input string without finding any valid operator, token index returns -2.

Token Definition

Tokens are text segments that typically represent individual keywords, numeric values, or operators found when parsing a configuration file or other text-based data format. You define what counts as a token through the data you pass to the delimiters and operators inputs. For example, because the space character is a delimiter by default, each word of "This is a string" is a token, and you can parse the sentence into its component words.

Scanning Multiple Tokens

You can use this node in a While Loop to scan multiple tokens. Refer to Parsing a String into Smaller Pieces for more information.

Behavior When Matching Multiple Operators

If a portion of input string matches more than one defined operator, the node chooses the longest match as the token. For example, if >, =, and >= are defined operators and offset is 1, the input string 4>=0 produces >= as the next token string rather than >.
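
The longest-match rule can be sketched in a few lines of Python. This is an illustration of the rule only, not the node's implementation, and `match_operator` is a hypothetical helper name.

```python
def match_operator(text, pos, operators):
    """Return the longest operator that starts at text[pos], or None."""
    candidates = [op for op in operators if op and text.startswith(op, pos)]
    return max(candidates, key=len) if candidates else None

operators = [">", "=", ">="]

# Both ">" and ">=" start at position 1; the longer one wins.
print(match_operator("4>=0", 1, operators))  # >=
```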

Interpreting Numbers as Tokens

To interpret a series of digits as a token that represents a number, include a format specifier in the list of values for the operators input. For example, including %b as one of the elements in operators causes the node to interpret a string of 1's and 0's as a binary number and return it as a token after encountering any character that is not a 1 or 0.

If you include a format specifier in operators along with the strings + or -, the node does not recognize leading, or unary, + and - signs. The node always returns them as separate tokens. For example, if input string contains -5 and operators includes [%d, -], token string returns [-, 5] instead of [-5]. This is an exception to the "longest match" rule.
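
A rough way to picture this exception is with a Python regular expression. The sketch assumes that %d behaves like an unsigned run of decimal digits when - is also an operator. `split_sign_and_digits` is a hypothetical name, and the regular expression is a stand-in, not the node's actual matching logic.

```python
import re

def split_sign_and_digits(text):
    # "-" models the literal operator; \d+ models %d as unsigned digits
    # (an assumption made for this illustration).
    return re.findall(r"-|\d+", text)

print(split_sign_and_digits("-5"))  # ['-', '5']
```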

If you place the node in a While Loop, the node returns the following values.

input string: 4>=0
operators: [>, =, >=]
delimiters: \s, \t, \r, \n (default)
token string: [4, >=, 0]
Comments: If a portion of the input string matches more than one defined operator, Scan String for Tokens chooses the longest match as a token.

input string: a==b, followed on the next line by c!=d
operators: [==, !=]
delimiters: \s, \t, \r, \n (default)
token string: [a, ==, b, c, !=, d]

input string: G2 X0.5Y1.0 i0.5j0 z-0.05
operators: [X, Y, Z, i, j, z]
delimiters: \s, \t, \r, \n (default)
token string: [G2, X, 0.5, Y, 1.0, i, 0.5, j, 0, z, -0.05]
Comments: This is an example of a string of G-code, a language commonly used for machine control. This string describes a circle.

input string: C1_1.11C2_2.22C3_3.33
operators: None
delimiters: C, _ (add to delimiters array), plus \s, \t, \r, \n (default)
token string: [1, 1.11, 2, 2.22, 3, 3.33]
Comments: This is an example of a string from a DAQ log with three channels.
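
For readers more familiar with text-based languages, the loop that produces these values can be approximated in Python. This is a simplified sketch, not the node's implementation. The names `scan_token` and `scan_all` are hypothetical. It indexes Python characters instead of UTF-8 bytes, and it omits format specifiers, allow empty tokens?, and delimiter caching. It also assumes that offset past token becomes -1 once nothing remains after the token.

```python
DEFAULT_DELIMITERS = (" ", "\t", "\r", "\n")

def longest_match(text, pos, items):
    """Index of the longest non-empty item that starts at text[pos], else None."""
    best = None
    for i, item in enumerate(items):
        if item and text.startswith(item, pos):
            if best is None or len(item) > len(items[best]):
                best = i
    return best

def skip_delimiters(text, pos, delimiters):
    """Advance pos past any run of delimiters."""
    while pos < len(text):
        hit = longest_match(text, pos, delimiters)
        if hit is None:
            break
        pos += len(delimiters[hit])
    return pos

def scan_token(text, offset=0, operators=(), delimiters=DEFAULT_DELIMITERS):
    """Return (offset_past_token, token_string, token_index) for one call."""
    if offset < 0 or offset > len(text):
        return -1, "", -2
    start = skip_delimiters(text, offset, delimiters)
    if start >= len(text):
        return -1, "", -2  # nothing left to scan
    hit = longest_match(text, start, operators)
    if hit is not None:
        token, index = operators[hit], hit
        end = start + len(token)
    else:
        # Plain text runs until the next delimiter or operator.
        end = start
        while (end < len(text)
               and longest_match(text, end, delimiters) is None
               and longest_match(text, end, operators) is None):
            end += 1
        token, index = text[start:end], -1
    end = skip_delimiters(text, end, delimiters)
    return (end if end < len(text) else -1), token, index

def scan_all(text, operators=(), delimiters=DEFAULT_DELIMITERS):
    """Call scan_token in a loop, feeding offset past token back in as offset."""
    tokens, offset = [], 0
    while offset != -1:
        offset, token, index = scan_token(text, offset, operators, delimiters)
        if index == -2:
            break
        tokens.append(token)
    return tokens

print(scan_all("4>=0", operators=[">", "=", ">="]))
# ['4', '>=', '0']
print(scan_all("a==b\nc!=d", operators=["==", "!="]))
# ['a', '==', 'b', 'c', '!=', 'd']
print(scan_all("C1_1.11C2_2.22C3_3.33",
               delimiters=list(DEFAULT_DELIMITERS) + ["C", "_"]))
# ['1', '1.11', '2', '2.22', '3', '3.33']
```

Feeding each call's offset past token into the next call's offset mirrors the wiring that a shift register would carry around a While Loop.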

Where This Node Can Run:

Desktop OS: Windows

FPGA: All devices (only within an Optimized FPGA VI)

