# split

## Overview

The `split` statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.

## Syntax

**`split`***`ColName`***`using`***`sep`*

\
\&#xNAN;**`split`***`ColName`***`using`***`sep`***`retaining first`***`[column_count]`*<br>

**`split`***`ColName`***`using`***`sep`***`retaining last`***`[column_count]`*

\
\&#xNAN;**`split`***`ColName`***`using`***`sep`***`retaining`***`first_column_index`***`[to`***`last_column_index]`*<br>

{% hint style="info" %}
The keyword following *ColName* may be `using`, `separator` or `delimiter`. All three work in exactly the same way.
{% endhint %}

## Details

The *ColName* argument is the name of an existing column whose values are to be split and the *sep* argument (which must only be a single character in length) is the delimiter by which to split them.

For example a value `one:two:three:four`, if divided by a *sep* character of `:`, would result in the fields `one`, `two`, `three` and `four`.

{% hint style="info" %}
To specify a *sep* value of a space, use `" "` or`\<space>`\
&#x20;(where `<space>`is a literal space character)
{% endhint %}

New columns are created if necessary to contain the values resulting from the split. These additional columns are named *`ColName_split1`* through *`ColName_splitN`* where *N* is the number of values resulting from the process.

If there is an existing column with a name conflicting with any of the columns that `split` creates then:

* If the [overwrite option](https://olddocs.exivity.io/2.3.1/diving-deeper/transform/language/option) is set then all values in that pre-existing column will be overwritten
* If the [overwrite option](https://olddocs.exivity.io/2.3.1/diving-deeper/transform/language/option) is not set:
  * If there are blank values in the existing column they will be updated
  * If there are no blank values in the existing column then no changes will be made to it

### Keeping only specific fields

If the `retaining` keyword is specified, `split` will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:

| Specification         | Result                                                            |
| --------------------- | ----------------------------------------------------------------- |
| `first` or `1`        | Discard all but the first column                                  |
| `first N` or `1 to N` | Discard all but the first *N* columns                             |
| `last`                | Discard all but the last (rightmost) column                       |
| `last N`              | Discard all but the last *N* columns                              |
| `N`                   | Discard all but the *N*th column                                  |
| `N` to `M`            | Discard all but the columns from the *N*th to the *M*th inclusive |

{% hint style="info" %}
In the table above, *N* refers to a result column's number where the first column created by `split` has a number of `1`
{% endhint %}

## Example

Given an input dataset of the form:

```
Name,ID
VM-One,sales:2293365:37
VM-Two,marketing:18839:division:89AB745
VM-Three,development:34345:engineering:345345:Jake Smith
VM-Four,sales::38
VM-Five,marketing:234234234:testMachine
VM-Six,development:xxxx:test
VM-Seven,1234:5678
VM-Eight,test:::
VM-Nine,field::5
VM-Ten,test::3425:
```

The statement ...

&#x20;`split ID using :`&#x20;

... will result in the dataset:

```
Name,ID,ID_split1,ID_split2,ID_split3,ID_split4,ID_split5
VM-One,sales:2293365:37,sales,2293365,37,,
VM-Two,marketing:18839:division:89AB745,marketing,18839,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,development,34345,engineering,345345,Jake Smith
VM-Four,sales::38,sales,,38,,
VM-Five,marketing:234234234:testMachine,marketing,234234234,testMachine,,
VM-Six,development:xxxx:test,development,xxxx,test,,
VM-Seven,1234:5678,1234,5678,,,
VM-Eight,test:::,test,,,,
VM-Nine,field::5,field,,5,,
VM-Ten,test::3425:,test,,3425,,
```

Using the same original dataset, the statement ...

&#x20;`split ID using : retaining 3 to 5`&#x20;

... will result in the dataset:

```
Name,ID,ID_split1,ID_split2,ID_split3
VM-One,sales:2293365:37,37,,
VM-Two,marketing:18839:division:89AB745,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,engineering,345345,Jake Smith
VM-Four,sales::38,38,,
VM-Five,marketing:234234234:testMachine,testMachine,,
VM-Six,development:xxxx:test,test,,
VM-Seven,1234:5678,,,
VM-Eight,test:::,,,
VM-Nine,field::5,5,,
VM-Ten,test::3425:,3425,,
```
