# Parslets

## Overview

After populating a [named buffer](https://github.com/exivity/docs/tree/60a265079e19e329e990b94f7836ea2024c5f214/extract/language/basics/README.md#named_buffers.md) with data from an external source such as an HTTP request or a file, it is often necessary to extract fields from it for uses such as creating subsequent HTTP requests or rendering output [csv](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/csv) files.

This is accomplished using *parslets*. There are two types of parslet, *static* and *dynamic*. In both cases, when a parslet is used in a script it is expanded such that it is replaced with the value it is referencing, just like a variable is.

* *Static* parslets refer to a fixed location in XML or JSON data
* *Dynamic* parslets are used in conjunction with [foreach](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/foreach) loops to retrieve values when iterating over arrays in XML or JSON data

{% hint style="info" %}
Parslets can be used to query JSON or XML data. Although JSON is used for illustrative purposes, some additional notes specific to XML can be found further down in this article.
{% endhint %}

## A quick JSON primer

Consider the example JSON shown below:

![](https://2831153169-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LHEKskLK6aXinV75Knl%2F-LJnG7K_wAi_bWMVk08U%2F-LJnH5C8iQEvZvTihJoz%2Fjson_structure.png?alt=media\&token=c11ca23e-d6c8-484d-ad44-5429d58721c5)

The object containing all the data (known as the *root node*) contains the following children:

| Child   | Type   |
| ------- | ------ |
| title   | string |
| heading | object |
| items   | array  |

Objects and arrays can be nested to any depth in JSON. The children of nested objects and arrays are not considered children of the node that contains those objects and arrays. For example, the children of the `heading` object are not considered children of the root object.

Every individual 'thing' in JSON data, regardless of its type, is termed a *node*.

Although different systems return JSON in different forms, the JSON standard ensures that the same basic principles apply to all of them. Thus, any valid JSON may contain arrays, objects, strings, boolean values (true or false), numbers and null values.

It is often the case that the number of elements in arrays is not known in advance, therefore a means of iterating over all the elements in an array is required to extract arbitrary data from JSON. This principle also applies to objects, in that an object may contain any number of children of any valid type. Valid types are:

| Type      | Description                                                                                                 |
| --------- | ----------------------------------------------------------------------------------------------------------- |
| `object`  | A node encompassing zero or more child nodes (termed *children*) of any type                                |
| `array`   | An ordered list of children of any type; types may be mixed within a single array                           |
| `string`  | Textual data                                                                                                |
| `number`  | Numeric data, may be integer or floating point                                                              |
| `boolean` | A true or false value                                                                                       |
| `null`    | A null value                                                                                                |

Some systems return JSON in a fixed and predictable format, whereas others may return objects and arrays of varying length and content. The documentation for any given API should indicate which fields are always going to be present and which may or may not be so.

Parslets are the means by which USE locates and extracts fields of interest in any valid JSON data, regardless of the structure. For full details of the JSON data format, please refer to [http://json.org](http://json.org/)
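The node types listed above map directly onto the types that most JSON libraries expose. As a rough illustration outside of USE, the following Python sketch classifies the children of a root node. The sample document is an assumption modelled on the example in this article (only `title`, `heading` and `items` are taken from it; the contents of `items` are invented for illustration), and `node_type` is a hypothetical helper, not part of USE.

```python
import json

# Hypothetical sample modelled on the example in this article.
# The contents of "items" are invented for illustration.
SAMPLE = """
{
  "title": "Example JSON data",
  "heading": {"category": "Documentation", "finalised": true},
  "items": [{"id": "01", "name": "Item one"}]
}
"""

def node_type(node):
    """Return the JSON type name of a decoded node."""
    if node is None:
        return "null"
    # bool must be tested before int/float: in Python, bool is a subclass of int
    if isinstance(node, bool):
        return "boolean"
    if isinstance(node, (int, float)):
        return "number"
    if isinstance(node, str):
        return "string"
    if isinstance(node, list):
        return "array"
    if isinstance(node, dict):
        return "object"
    raise TypeError(f"not a JSON node: {node!r}")

root = json.loads(SAMPLE)
# Only the direct children of the root are listed; nested children are not
child_types = {name: node_type(child) for name, child in root.items()}
for name, type_name in child_types.items():
    print(f"{name}: {type_name}")
```

Note that only the direct children of the root node are reported, which mirrors the parent and child relationship described earlier in this section.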

## Static parslets

Static parslets act like variables in that the parslet itself is expanded such that the extracted data replaces it. Static parslets extract a single field from the data and require that the location of that field is known in advance.

In the example JSON above, let us assume that the data is held in a [named buffer](https://github.com/exivity/docs/tree/60a265079e19e329e990b94f7836ea2024c5f214/extract/language/basics/README.md#named_buffers.md) called `example` and that the `title` and `heading` children are guaranteed to be present. Further, the `heading` object always has the children `category` and `finalised`. Note that although these fields are guaranteed to be present, the values associated with them are not known in advance.

The values associated with these fields can be extracted using a static parslet which is specified using the following syntax:

`$JSON{buffer_name}.[node_path]`

{% hint style="info" %}
Static parslets *always* specify a named buffer in curly braces immediately after the `$JSON` prefix
{% endhint %}

The *buffer\_name* is the name of the buffer containing the JSON data, which must have previously been populated using the [buffer](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/buffer) statement.

The *node\_path* describes the location and name of the node containing the value we wish to extract. Starting at the root node, the name of each node leading to the required value is specified in square brackets. Each set of square brackets is separated by a dot.

The nodepaths for the fixed nodes described above are therefore as follows:

| Nodepath                 | Referenced value  |
| ------------------------ | ----------------- |
| `.[title]`               | Example JSON data |
| `.[heading].[category]`  | Documentation     |
| `.[heading].[finalised]` | true              |

Putting all the above together, the parslet for locating the category in the heading is therefore:

`$JSON{example}.[heading].[category]`

When this parslet is used in a USE script, the value associated with the parslet is extracted and the parslet is replaced with this extracted value. For example:

`print $JSON{example}.[heading].[category]`

will result in the word *Documentation* being output by the statement, and:

`var category = $JSON{example}.[heading].[category]`

will create a variable called *category* with a value of *Documentation*.
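To make the node path idea concrete, the following Python sketch walks a decoded JSON document one bracketed name at a time. It is only a model of the behaviour described above, not how USE is implemented: `split_node_path` and `resolve` are hypothetical helpers, and the sample data is assumed from the values in the nodepath table.

```python
import json
import re

# Assumed sample data, based on the values in the nodepath table above
example = json.loads("""
{
  "title": "Example JSON data",
  "heading": {"category": "Documentation", "finalised": true}
}
""")

def split_node_path(node_path):
    """Split '.[heading].[category]' into ['heading', 'category']."""
    names = re.findall(r"\.\[([^\]]*)\]", node_path)
    # Reject paths that contain anything other than .[name] segments
    if "".join(f".[{n}]" for n in names) != node_path:
        raise ValueError(f"malformed node path: {node_path}")
    return names

def resolve(root, node_path):
    """Walk from the root node, one named child per path segment."""
    node = root
    for name in split_node_path(node_path):
        node = node[name]
    return node

print(resolve(example, ".[title]"))
print(resolve(example, ".[heading].[category]"))
print(resolve(example, ".[heading].[finalised]"))
```

Each segment narrows the search to a single child of the current node, which is why the location of the field must be known in advance for a static parslet.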

Currently, a parslet must be followed by whitespace in order to be correctly expanded. If you want to embed the value into a longer string, create a variable from a parslet and use that instead:

```
var category = $JSON{example}.[heading].[category]
var filename = JSON_${category}_${dataDate}
```

{% hint style="info" %}
When using JSON parslets that reference values that may contain whitespace, it is sometimes necessary to enclose them in double quotes to prevent the extracted value being treated as multiple words by the script.
{% endhint %}

### Anonymous JSON arrays

It may be required to extract values from a JSON array which contains values that do not have names as shown below:

```
{
  "data": {
    "result": [
      {
        "account": {
          "name": "account_one"
        },
        "metrics": [
          [
            34567,
            "partner"
          ],
          [
            98765,
            "reseller"
          ]
        ]
      },
      {
        "account": {
          "name": "account_two"
        },
        "metrics": [
          [
            24680,
            "internal"
          ],
          [
            13579,
            "partner"
          ]
        ]
      }
    ]
  }
}
```

Extraction of values that do not have names can be accomplished via the use of nested [foreach](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/foreach) loops in conjunction with an empty nodepath (`[]`) as follows:

```
buffer json_data = FILE system/extracted/json.json

csv OUTFILE = system/extracted/result.csv
csv add_headers OUTFILE account related_id type
csv fix_headers OUTFILE

foreach $JSON{json_data}.[data].[result] as this_result {

	# Extract the account name from each element in the 'result' array
	var account_name = $JSON(this_result).[account].[name]

	print Processing namespace: ${account_name}

	# Iterate over the metrics array within the result element
	foreach $JSON(this_result).[metrics] as this_metric {

		# As the metrics array contains anonymous arrays we need to iterate
		# further over each element. Note the use of an empty nodepath.

		foreach $JSON(this_metric).[] as this_sub_metric {
			if (${this_sub_metric.COUNT} == 1) {
				# Assign the value on the first loop iteration to 'related_id'
				var related_id = $JSON(this_sub_metric).[]
			}
			if (${this_sub_metric.COUNT} == 2) {
				# Assign the value on the second loop iteration to 'type'
				var type = $JSON(this_sub_metric).[]
			}
		}
		
		csv write_fields OUTFILE ${account_name} ${related_id} ${type}
	}	
}
csv close OUTFILE

```

The result of executing the above against the sample data is:

```
"account","related_id","type"
"account_one","34567","partner"
"account_one","98765","reseller"
"account_two","24680","internal"
"account_two","13579","partner"

```
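For readers more familiar with general-purpose languages, the same traversal can be sketched in Python. This is an illustrative equivalent rather than anything USE does internally: `enumerate(..., start=1)` stands in for the `${this_sub_metric.COUNT}` variable, and the variable name `metric_type` is chosen here only to avoid shadowing a Python built-in.

```python
import csv
import io
import json

# The sample data from this section
json_data = json.loads("""
{"data": {"result": [
  {"account": {"name": "account_one"},
   "metrics": [[34567, "partner"], [98765, "reseller"]]},
  {"account": {"name": "account_two"},
   "metrics": [[24680, "internal"], [13579, "partner"]]}
]}}
""")

rows = []
for this_result in json_data["data"]["result"]:
    account_name = this_result["account"]["name"]
    for this_metric in this_result["metrics"]:
        related_id = None
        metric_type = None
        # The values have no names, so they are told apart by position
        for count, this_sub_metric in enumerate(this_metric, start=1):
            if count == 1:
                related_id = this_sub_metric
            if count == 2:
                metric_type = this_sub_metric
        rows.append([account_name, str(related_id), metric_type])

# Write the rows with every field quoted, as in the output above
out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["account", "related_id", "type"])
writer.writerows(rows)
print(out.getvalue(), end="")
```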

If the anonymous arrays have a known fixed length then it is also possible to simply stream the values out to the CSV without bothering to assign them to variables. Thus assuming that the elements in the `metrics` array always had two values, the following would also work:

```
buffer json_data = FILE system/extracted/json.json

csv OUTFILE = system/extracted/result.csv
csv add_headers OUTFILE account related_id type
csv fix_headers OUTFILE

foreach $JSON{json_data}.[data].[result] as this_result {

	# Extract the account name from each element in the 'result' array
	var account_name = $JSON(this_result).[account].[name]

	print Processing namespace: ${account_name}

	# Iterate over the metrics array within the result element
	foreach $JSON(this_result).[metrics] as this_metric {

		# As the metrics array contains anonymous arrays we need to iterate
		# further over each element. Note the use of an empty nodepath.

		csv write_field OUTFILE ${account_name}
		
		foreach $JSON(this_metric).[] as this_sub_metric {
			csv write_field OUTFILE $JSON(this_sub_metric).[]
		}		
	}	
}
csv close OUTFILE
```

Which method is used will depend on the nature of the input data. Note that the special variable `${loopname.COUNT}` (where *loopname* is the label of the enclosing `foreach` loop) is useful in many contexts for applying selective processing to each element in an array or object as it will be automatically incremented every time the loop iterates. See [foreach](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/foreach) for more information.

## Dynamic parslets

Dynamic parslets are used to extract data from locations that are not known in advance, such as when an array of unknown length is traversed in order to retrieve a value from each element.

A dynamic parslet must be used in conjunction with a [foreach](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/foreach) loop and takes the following form:

```
$JSON(loopName).[node_path]
```

Note the following differences between a static parslet and a dynamic parslet:

1. A dynamic parslet does not reference a named buffer directly, rather it references the name of a [foreach](https://olddocs.exivity.io/2.3.1/diving-deeper/extract/language/foreach) loop
2. Parentheses are used to surround the name of the foreach loop (as opposed to curly braces)
3. The nodepath following a dynamic parslet is relative to the target of the foreach loop

The following script fragment will render the elements in the *items* array (in the example JSON above) to disk as a CSV file.

```
# For illustrative purposes assume that the JSON
# is contained in a named buffer called 'myJSON'

# Create an export file
csv "items" = "system/extracted/items.csv"
csv add_headers "items" id name category subcategory
csv add_headers "items" subvalue1 subvalue2 subvalue3 subvalue4
csv fix_headers "items"

foreach $JSON{myJSON}.[items] as this_item
{
    # Define the fields to export to match the headers
    csv write_field items $JSON(this_item).[id]
    csv write_field items $JSON(this_item).[name]
    csv write_field items $JSON(this_item).[category]
    csv write_field items $JSON(this_item).[subcategory]

    # For every child of the 'subvalues' array in the current item
    foreach $JSON(this_item).[subvalues] as this_subvalue
    {
        csv write_field items $JSON(this_subvalue).[0]
        csv write_field items $JSON(this_subvalue).[10]
        csv write_field items $JSON(this_subvalue).[100]
        csv write_field items $JSON(this_subvalue).[1000]
    }
}
csv close "items"
```

In the example above, the first `foreach` loop iterates over the elements in the `items` array, and each dynamic parslet extracts a value from the current element of the loop it names. A dynamic parslet that names *this\_item*, for example, uses the current element of that loop as the root for its node path.

{% hint style="info" %}
If a parslet references a non-existent location in the XML or JSON data then it will resolve to the value `EXIVITY_NOT_FOUND`
{% endhint %}
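The relative lookup performed by a dynamic parslet, together with the fallback value mentioned in the hint above, can be modelled in Python as follows. This is a sketch under stated assumptions: the structure of `items` is invented for illustration (the second item deliberately lacks a `category`), and `lookup` is a hypothetical helper that imitates the documented behaviour rather than reproducing USE internals.

```python
import json

NOT_FOUND = "EXIVITY_NOT_FOUND"

# Hypothetical data: the structure of 'items' is assumed for illustration
my_json = json.loads("""
{"items": [
  {"id": "01", "name": "Item one", "category": "A"},
  {"id": "02", "name": "Item two"}
]}
""")

def lookup(node, *names):
    """Follow child names from node; return NOT_FOUND if any is missing."""
    for name in names:
        if not isinstance(node, dict) or name not in node:
            return NOT_FOUND
        node = node[name]
    return node

rows = []
# Static part: the loop target is addressed from the root of the buffer
for this_item in lookup(my_json, "items"):
    # Dynamic part: each lookup is relative to the current element
    rows.append([
        lookup(this_item, "id"),
        lookup(this_item, "name"),
        lookup(this_item, "category"),
    ])

for row in rows:
    print(",".join(row))
```

Because a missing node resolves to a fixed string instead of raising an error, a script can compare the extracted value against `EXIVITY_NOT_FOUND` to detect optional fields.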

## XML parslets

XML parslets work in exactly the same way that JSON parslets do, apart from the following minor differences:

1. XML parslets are prefixed `$XML`
2. When extracting data from XML, the `foreach` statement only supports iterating over XML arrays (whereas JSON supports iterating over objects and arrays)
3. An XML parslet may access an XML attribute

To access an XML attribute, the node\_path should end with `[@attribute_name]` where *attribute\_name* is the name of the attribute to extract. For example, given the following data in a buffer called `xmlbuf`:

```markup
<note>
<to>Tove</to>
<from>
    <name comment="test_attribute">Jani</name>
</from>
<test_array>
    <test_child>
        <name attr="test">Child 1</name>
        <age>01</age>
    </test_child>
    <test_child>
        <name attr="two">Child 2</name>
        <age>02</age>
    </test_child>
    <test_child>
        <name attr="trois">Child 3</name>
        <age>03</age>
    </test_child>
    <test_child>
        <name attr="quad">Child 4</name>
        <age>04</age>
    </test_child>
</test_array>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
```

The following script:

```
foreach $XML{xmlbuf}.[test_array] as this_child {
    print Child name ${this_child.COUNT} is $XML(this_child).[name] and age is $XML(this_child).[age] - attribute $XML(this_child).[name].[@attr]
}
```

will produce the following output:

```
Child name 1 is Child 1 and age is 01 - attribute test
Child name 2 is Child 2 and age is 02 - attribute two
Child name 3 is Child 3 and age is 03 - attribute trois
Child name 4 is Child 4 and age is 04 - attribute quad
```
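As a cross-check of the element and attribute lookups, the same extraction can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustrative equivalent only: the sample is a shortened copy of the data above (two children instead of four), and `enumerate(..., start=1)` stands in for `${this_child.COUNT}`.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample data above (two children instead of four)
xmlbuf = """
<note>
<test_array>
    <test_child>
        <name attr="test">Child 1</name>
        <age>01</age>
    </test_child>
    <test_child>
        <name attr="two">Child 2</name>
        <age>02</age>
    </test_child>
</test_array>
</note>
"""

root = ET.fromstring(xmlbuf.strip())
lines = []
# Iterating over an element yields its direct children in document order
for count, this_child in enumerate(root.find("test_array"), start=1):
    name_node = this_child.find("name")
    age = this_child.find("age").text
    # Attributes are read from the element itself, like [name].[@attr]
    attr = name_node.get("attr")
    lines.append(
        f"Child name {count} is {name_node.text} and age is {age} - attribute {attr}"
    )

print("\n".join(lines))
```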
