The split
statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.
split
ColName
using
sep
split
ColName
using
sep
retaining first
[column_count]
split
ColName
using
sep
retaining last
[column_count]
split
ColName
using
sep
retaining
first_column_index
[to
last_column_index]
The keyword following ColName may be using
, separator
or delimiter
. All three work in exactly the same way.
The ColName argument is the name of an existing column whose values are to be split and the sep argument (which must only be a single character in length) is the delimiter by which to split them.
For example a value one:two:three:four
, if divided by a sep character of :
, would result in the fields one
, two
, three
and four
.
To specify a sep value of a space, use " "
or\<space>
(where <space>
is a literal space character)
New columns are created if necessary to contain the values resulting from the split. These additional columns are named ColName_split1
through ColName_splitN
where N is the number of values resulting from the process.
If there is an existing column with a name conflicting with any of the columns that split
creates then:
If the overwrite option is set then all values in that pre-existing column will be overwritten
If the overwrite option is not set:
If there are blank values in the existing column they will be updated
If there are no blank values in the existing column then no changes will be made to it
If the retaining
keyword is specified, split
will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:
Specification
Result
first
or 1
Discard all but the first column
first N
or 1 to N
Discard all but the first N columns
last
Discard all but the last (rightmost) column
last N
Discard all but the last N columns
N
Discard all but the _N_th column
N
to M
Discard all but the columns from the _N_th to the _M_th inclusive
In the table above, N refers to a result column's number where the first column created by split
has a number of 1
Given an input dataset of the form:
The statement ...
split ID using :
... will result in the dataset:
Using the same original dataset, the statement ...
split ID using : retaining 3 to 5
... will result in the dataset: