# create

## Overview

The `create` statement is used to add one more more new columns to an existing DSET.

## Syntax

**`create column`***`NewColumnName`***`[value`***`Value]`*

**`create columns from`***`ColumnName`***`[using`***`ValueColumnName]`*

**`create mergedcolumn`***`NewColumn`***`[separator`***`sep]`***`from [string`***`literal] Column [/regex/] [Column [/regex/]`*`|`**`string`***`literal]`*

## Details

### Explicit single column creation

#### Syntax

**`create column`***`NewColumnName`***`[value`***`Value]`*

#### Details

This statement is used to create a new column called *NewColumnName*. The `NewColumnName` argument may be a [fully qualified](/2.10.2/advanced/digging-deeper/dataset-lifecycle.md) column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no [default](/2.10.2/data-pipelines/transform/language/default.md) DSET has been explicitly defined using the [default dset](/2.10.2/data-pipelines/transform/language/default.md) statement then the DSET created by the first [use](/2.10.2/data-pipelines/transform/language/import.md) or [import](/2.10.2/data-pipelines/transform/language/import.md) statement in the Transcript task is automatically set as the default DSET.

A column called *NewColumnName* must not already exist in the DSET. If `NewColumnName` contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional `value *Value*` portion of the statement is present, in which case all the cells in the new column will be set to *Value*.

#### Examples

Create a new empty column called *Cost* in the default DSET:\
`create column Cost`

Create a new column called *Cost* with a value of *1.0* in every row of the default DSET:\
`create column Cost value 1.0`

Create a new column called *Cost* with a value of *1.0* in every row of the DSET *custom.charges*:\
`create column custom.charges.Cost value 1.0`

### Automated single/multiple column creation

#### Syntax

**`create columns from`***`ColumnName`***`[using`***`ValueColumnName]`*

#### Details

This statement is used to create multiple columns in a single operation. As is the case for `create columns` above, if the `using ValueColumnName` portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

![](/files/-LJZZFSBlYqq4Tl-8Nba)

The statement `create columns from ServiceName using Count` will create the result shown below:

![](/files/-LJZZFSDlllB7LtHHBTx)

The names of the new columns to create are derived from the contents of the cells in the column called *ColumnName*, and the values (if opted for) are derived from the contents of the cells in the column called *ValueColumnName*. Duplicates are ignored. If all the cells in *ColumnName* have the same contents, then only a single new column will be created. To illustrate this, consider the following:

```
SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99
```

When applied to the data above, the statement `create columns from ServiceName` will produce the following result (note that only a single column called *Small\_VM* is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

```
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,
```

If opting to set the values in the new columns, then for each row the value in *ValueColumnName* will be copied into the column whose name matches *ColumnName*. When applied to the same original data, the statement `create columns from ServiceName using Quantity` will produce the following result:

```
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,
```

When using `create columns` the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for *ColumnName*. If the default DSET is `Azure.usage`, then the statement `create columns from custom.data.Services` will derive the names of the new columns from the cell contents in the *Services* column in the *custom.data* DSET.

This is only possible in the absence of the `using ValueColumnName` option. When values are to be set, both the *ColumnName* and *ValueColumnName* arguments must belong to the default DSET.

#### Example

The following transcript task will import the datasets *Azure.usage* and *system/extracted/Services.csv*, and create new (empty) columns in *Azure.usage* whose names are taken from the values in the column *ServiceDefinitions* in *Services.csv*.

```
import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions
```

### Merging column values to create a new column

#### Syntax

**`create mergedcolumn`***`NewColumn`***`[separator`***`sep]`***`from [string`***`literal] Column [/regex/] [ ... Column [/regex/]`*`|`**`string`***`literal]`*

{% hint style="info" %}
If preferred, the wor&#x64;**`using`**&#x6D;ay be used instead of the wor&#x64;**`from`**(both work in an identical fashion)
{% endhint %}

#### Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed *source columns*). The parameters are as follows:

| Parameter            | Required           | Meaning                                                                                                                                             |
| -------------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| *NewColumn*          | Yes                | The name of the new column to create                                                                                                                |
| *sep*                | No                 | If specified after the **separator** keyword, *sep* is a string to insert between values extracted from the source columns                          |
| *Column*             | Yes (at least one) | The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)                                              |
| */regex/*            | No                 | If specified, the expression enclosed by the `/` characters is applied to the values in source column specified by the preceding *ColumnN* argument |
| **string** *literal* | No                 | If specified, *literal* will be add to column value. Relative order of *Column* and *literal* bits is observed                                      |

{% hint style="info" %}
The separator may be more than one character in length (up to 31 characters may be specified)
{% endhint %}

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

{% hint style="warning" %}
The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.
{% endhint %}

If a regex is not specified, then the entire value in the source column will be used.

#### Options

By default the value extracted from a source column will be blank in the following two cases:

* There is a blank value in a source column
* No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the [option](/2.10.2/data-pipelines/transform/language/option.md) statement. The options associated with the `create mergedcolumn` statement are as follows:

`option merge_blank = some_text_here`

This option will use the string *some\_text\_here* in place of any blank source column value.

`option merge_nomatch = some_text_here`

This option will use the string *some\_text\_here* if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string `<blank>` as the *merge\_blank* or *merge\_nomatch* value will reset the option such that the default behaviour is re-activated.

#### Examples

Given the following dataset:

```
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
```

The following examples illustrate some uses of the `create mergedcolumn` statement:

**Example 1**

```
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```

**Example 2**

If no regular expression is specified then the values in the source column will be used in their entirety:

```
# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix

create mergedcolumn key separator : from string prefix department user_id

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
```

**Example 3**

Let us add a new row to the sample dataset which has a non-compliant value for the user\_id:

```
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
```

By default a non-matching value will result in a blank component of the merged result:

```
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```

In this case, the resulting key for `John` has no separator characters in it. We can force a default value for the missing user\_id portion as follows:

```
option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://olddocs.exivity.io/2.10.2/data-pipelines/transform/language/create.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
