Parslets
Last updated
Last updated
After populating a named buffer with data from an external source such as an HTTP request or a file, it is often necessary to extract fields from it for uses such as creating subsequent HTTP requests or rendering output csv files.
This is accomplished using parslets. There are two types of parslet, static and dynamic. In both cases, when a parslet is used in a script it is expanded such that it is replaced with the value it is referencing, just like a variable is.
Static parslets refer to a fixed location in XML or JSON data
Dynamic parslets are used in conjunction with foreach loops to retrieve values when iterating over arrays in XML or JSON data
Parslets can be used to query JSON or XML data. Although JSON is used for illustrative purposes, some additional notes specific to XML can be found further down in this article.
Consider the example JSON shown below:
The object containing all the data (known as the root node) contains the following children:
Child
Type
title
string
heading
object
items
array
Objects and arrays can be nested to any depth in JSON. The children of nested objects and arays are not considered as children of the object containing those objects and arrays, i.e. the children of the heading
object are not considered as children of the root object.
Every individual 'thing' in JSON data, regardless of its type is termed a node.
Although different system return JSON in different forms, the JSON standard dictates that the basic principles apply universally to all of them. Thus, any possible valid JSON may contain arrays, objects, strings, boolean values (true or false values), numbers and null children.
It is often the case that the number of elements in arrays is not known in advance, therefore a means of iterating over all the elements in an array is required to extract arbitrary data from JSON. This principle also applies to objects, in that an object may contain any number of children of any valid type. Valid types are:
Type
Description
object
A node encompassing zero or more child nodes (termed children) of any type
array
A list of children, which may be of any type (but all children in any given array must be of the same type)
string
Textual data
number
Numeric data, may be integer or floating point
boolean
A true or false value
null
A null value
Some systems return JSON in a fixed and predictable format, whereas others may return objects and arrays of varying length and content. The documentation for any given API should indicate which fields are always going to be present and which may or may not be so.
Parslets are the means by which USE locates and extracts fields of interest in any valid JSON data, regardless of the structure. For full details of the JSON data format, please refer to http://json.org
Static parslets act like variables in that the parslet itself is expanded such that the extracted data replaces it. Static parslets extract a single field from the data and require that the location of that field is known in advance.
In the example JSON above, let us assume that the data is held in a named buffer called example
and that the title
and heading
children are guaranteed to be present. Further, the heading
object always has the children category
and finalised
. Note that for all of these guaranteed fields, the value associated with them is indeterminate.
The values associated with these fields can be extracted using a static parslet which is specified using the following syntax:
$JSON{buffer_name}.[node_path]
Static parslets always specify a named buffer in curly braces immediately after the $JSON
prefix
The buffer_name is the name of the buffer containing the JSON data, which must have previously been populated using the buffer statement.
The node_path describes the location and name of the node containing the value we wish to extract. Starting at the root node, the name of each node leading to the required value is specified in square brackets. Each set of square brackets is separated by a dot.
The nodepaths for the fixed nodes described above are therefore as follows:
Nodepath
Referenced value
.[title]
Example JSON data
.[heading].[category]
Documentation
.[heading].[finalised]
true
Putting all the above together, the parslet for locating the category in the heading is therefore:
$JSON{example}.[heading].[category]
When this parslet is used in a USE script, the value associated with the parslet is extracted and the parslet is replaced with this extracted value. For example:
print $JSON{example}.[heading].[category]
will result in the word Documentation being output by the statement, and:
var category = $JSON{example}.[heading].[category]
will create a variable called category with a value of Documentation.
Currently, a parslet must be followed by whitespace in order to be correctly expanded. If you want to embed the value into a longer string, create a variable from a parslet and use that instead:
When using JSON parslets that reference values that may contain whitespace it is sometimes necessary to enclose them in double quotes to prevent the extracted value being treated as multiple words by the script
It may be required to extract values from a JSON array which contains values that do not have names as shown below:
Extraction of values that do not have names can be accomplished via the use of nested foreach loops in conjunction with an empty nodepath ([]
) as follows:
The result of executing the above against the sample data is:
If the anonymous arrays have a known fixed length then it is also possible to simply stream the values out to the CSV without bothering to assign them to variables. Thus assuming that the elements in the metrics
array always had two values, the following would also work:
Which method is used will depend on the nature of the input data. Note that the special variable ${loopname.COUNT}
(where loopname is the label of the enclosing foreach
loop) is useful in many contexts for applying selective processing to each element in an array or object as it will be automatically incremented every time the loop iterates. See foreach for more information.
Dynamic parslets are used in to extract data from locations in the data that are not known in advance, such as when an array of unknown length is traversed in order to retrieve a value from each element in the array.
A dynamic parslet must be used in conjunction with a foreach loop and takes the following form:
Note the following differences between a static parslet and a dynamic parslet:
A dynamic parslet does not reference a named buffer directly, rather it references the name of a foreach loop
Parentheses are used to surround the name of the foreach loop (as opposed to curly braces)
The nodepath following a dynamic parslet is relative to the target of the foreach loop
The following script fragment will render the elements in the items array (in the example JSON above) to disk as a CSV file.
In the example above, the first foreach
loop iterates over the elements in the 'items' array, and each of the dynamic parslets extract values from the current element in that loop. The dynamic parslets use the current element, this_item as the root for their node paths.
If a parslet references a non-existent location in the XML or JSON data then it will resolve to the value EXIVITY_NOT_FOUND
XML parslets work in exactly the same way that JSON parslets do, apart from the following minor differences:
XML parslets are prefixed $XML
When extracting data from XML, the foreach
statement only supports iterating over XML arrays (whereas JSON supports iterating over objects and arrays)
An XML parslet may access an XML attribute
To access an XML attribute, the node_path should end with [@atrribute_name]
where attribute_name is the name of the attribute to extract. For example given the following data in a buffer called xmlbuf
:
The following script:
will produce the following output: