# create

## Overview

The `create` statement is used to add one more more new columns to an existing DSET.

## Syntax

**`create column`***`NewColumnName`***`[value`***`Value]`*

**`create columns from`***`ColumnName`***`[using`***`ValueColumnName]`*

**`create mergedcolumn`***`NewColumn`***`[separator`***`sep]`***`from [string`***`literal] Column [/regex/] [Column [/regex/]`*`|`**`string`***`literal]`*

## Details

### Explicit single column creation

#### Syntax

**`create column`***`NewColumnName`***`[value`***`Value]`*

#### Details

This statement is used to create a new column called *NewColumnName*. The `NewColumnName` argument may be a [fully qualified](https://olddocs.exivity.io/3.4.3/advanced/digging-deeper/dataset-lifecycle) column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no [default](https://olddocs.exivity.io/3.4.3/data-pipelines/transform/language/default) DSET has been explicitly defined using the [default dset](https://olddocs.exivity.io/3.4.3/data-pipelines/transform/language/default) statement then the DSET created by the first [use](https://olddocs.exivity.io/3.4.3/data-pipelines/transform/language/import) or [import](https://olddocs.exivity.io/3.4.3/data-pipelines/transform/language/import) statement in the Transcript task is automatically set as the default DSET.

A column called *NewColumnName* must not already exist in the DSET. If `NewColumnName` contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional `value *Value*` portion of the statement is present, in which case all the cells in the new column will be set to *Value*.

#### Examples

Create a new empty column called *Cost* in the default DSET:\
`create column Cost`

Create a new column called *Cost* with a value of *1.0* in every row of the default DSET:\
`create column Cost value 1.0`

Create a new column called *Cost* with a value of *1.0* in every row of the DSET *custom.charges*:\
`create column custom.charges.Cost value 1.0`

### Automated single/multiple column creation

#### Syntax

**`create columns from`***`ColumnName`***`[using`***`ValueColumnName]`*

#### Details

This statement is used to create multiple columns in a single operation. As is the case for `create columns` above, if the `using ValueColumnName` portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

![](https://3540922554-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LHEKskLK6aXinV75Knl%2F-LJZZCkXsjem1OpTWADm%2F-LJZZFSBlYqq4Tl-8Nba%2Fservicedata.png?generation=1533916812298224\&alt=media)

The statement `create columns from ServiceName using Count` will create the result shown below:

![](https://3540922554-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LHEKskLK6aXinV75Knl%2F-LJZZCkXsjem1OpTWADm%2F-LJZZFSDlllB7LtHHBTx%2Fcreate_columns.png?generation=1533916812540785\&alt=media)

The names of the new columns to create are derived from the contents of the cells in the column called *ColumnName*, and the values (if opted for) are derived from the contents of the cells in the column called *ValueColumnName*. Duplicates are ignored. If all the cells in *ColumnName* have the same contents, then only a single new column will be created. To illustrate this, consider the following:

```
SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99
```

When applied to the data above, the statement `create columns from ServiceName` will produce the following result (note that only a single column called *Small\_VM* is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

```
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,
```

If opting to set the values in the new columns, then for each row the value in *ValueColumnName* will be copied into the column whose name matches *ColumnName*. When applied to the same original data, the statement `create columns from ServiceName using Quantity` will produce the following result:

```
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,
```

When using `create columns` the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for *ColumnName*. If the default DSET is `Azure.usage`, then the statement `create columns from custom.data.Services` will derive the names of the new columns from the cell contents in the *Services* column in the *custom.data* DSET.

This is only possible in the absence of the `using ValueColumnName` option. When values are to be set, both the *ColumnName* and *ValueColumnName* arguments must belong to the default DSET.

#### Example

The following transcript task will import the datasets *Azure.usage* and *system/extracted/Services.csv*, and create new (empty) columns in *Azure.usage* whose names are taken from the values in the column *ServiceDefinitions* in *Services.csv*.

```
import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions
```

### Merging column values to create a new column

#### Syntax

**`create mergedcolumn`***`NewColumn`***`[separator`***`sep]`***`from [string`***`literal] Column [/regex/] [ ... Column [/regex/]`*`|`**`string`***`literal]`*

{% hint style="info" %}
If preferred, the wor&#x64;**`using`**&#x6D;ay be used instead of the wor&#x64;**`from`**(both work in an identical fashion)
{% endhint %}

#### Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed *source columns*). The parameters are as follows:

| Parameter            | Required           | Meaning                                                                                                                                             |
| -------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| *NewColumn*          | Yes                | The name of the new column to create                                                                                                                |
| *sep*                | No                 | If specified after the **separator** keyword, *sep* is a string to insert between values extracted from the source columns                          |
| *Column*             | Yes (at least one) | The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)                                              |
| */regex/*            | No                 | If specified, the expression enclosed by the `/` characters is applied to the values in source column specified by the preceding *ColumnN* argument |
| **string** *literal* | No                 | If specified, *literal* will be add to column value. Relative order of *Column* and *literal* bits is observed                                      |

{% hint style="info" %}
The separator may be more than one character in length (up to 31 characters may be specified)
{% endhint %}

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

{% hint style="warning" %}
The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.
{% endhint %}

If a regex is not specified, then the entire value in the source column will be used.

#### Options

By default the value extracted from a source column will be blank in the following two cases:

* There is a blank value in a source column
* No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the [option](https://olddocs.exivity.io/3.4.3/data-pipelines/transform/language/option) statement. The options associated with the `create mergedcolumn` statement are as follows:

`option merge_blank = some_text_here`

This option will use the string *some\_text\_here* in place of any blank source column value.

`option merge_nomatch = some_text_here`

This option will use the string *some\_text\_here* if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string `<blank>` as the *merge\_blank* or *merge\_nomatch* value will reset the option such that the default behaviour is re-activated.

#### Examples

Given the following dataset:

```
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
```

The following examples illustrate some uses of the `create mergedcolumn` statement:

**Example 1**

```
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```

**Example 2**

If no regular expression is specified then the values in the source column will be used in their entirety:

```
# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix

create mergedcolumn key separator : from string prefix department user_id

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
```

**Example 3**

Let us add a new row to the sample dataset which has a non-compliant value for the user\_id:

```
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
```

By default a non-matching value will result in a blank component of the merged result:

```
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```

In this case, the resulting key for `John` has no separator characters in it. We can force a default value for the missing user\_id portion as follows:

```
option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```
