create
Overview
The create
statement is used to add one or more new columns to an existing DSET.
Syntax
create column
NewColumnName
[value
Value]
create columns from
ColumnName
[using
ValueColumnName]
create mergedcolumn
NewColumn
[separator
sep]
from [string
literal] Column [/regex/] [Column [/regex/]
|
string
literal]
Details
Explicit single column creation
Syntax
create column
NewColumnName
[value
Value]
Details
This statement is used to create a new column called NewColumnName. The NewColumnName
argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.
Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.
A column called NewColumnName must not already exist in the DSET. If NewColumnName
contains dots then they will be converted into underscores.
The new column will be created with no values in any cells, unless the optional value *Value*
portion of the statement is present, in which case all the cells in the new column will be set to Value.
Examples
Create a new empty column called Cost in the default DSET:
create column Cost
Create a new column called Cost with a value of 1.0 in every row of the default DSET:
create column Cost value 1.0
Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges:
create column custom.charges.Cost value 1.0
Automated single/multiple column creation
Syntax
create columns from
ColumnName
[using
ValueColumnName]
Details
This statement is used to create multiple columns in a single operation. As is the case for create columns
above, if the using ValueColumnName
portion of the statement is not present, then all newly created columns will have no values in any cells.
Given this example dataset:
The statement create columns from ServiceName using Count
will create the result shown below:
The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:
When applied to the data above, the statement create columns from ServiceName
will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):
If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity
will produce the following result:
When using create columns
the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage
, then the statement create columns from custom.data.Services
will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.
This is only possible in the absence of the using ValueColumnName
option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.
Example
The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.
Merging column values to create a new column
Syntax
create mergedcolumn
NewColumn
[separator
sep]
from [string
literal] Column [/regex/] [ ... Column [/regex/]
|
string
literal]
If preferred, the wordusing
may be used instead of the wordfrom
(both work in an identical fashion)
Details
This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:
The separator may be more than one character in length (up to 31 characters may be specified)
If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.
The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.
If a regex is not specified, then the entire value in the source column will be used.
Options
By default the value extracted from a source column will be blank in the following two cases:
There is a blank value in a source column
No match for a regular expression is found in the value of a source column
In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.
This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn
statement are as follows:
option merge_blank = some_text_here
This option will use the string some_text_here in place of any blank source column value.
option merge_nomatch = some_text_here
This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.
Specifying the literal string <blank>
as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
Examples
Given the following dataset:
The following examples illustrate some uses of the create mergedcolumn
statement:
Example 1
Example 2
If no regular expression is specified then the values in the source column will be used in their entirety:
Example 3
Let us add a new row to the sample dataset which has a non-compliant value for the user_id:
By default a non-matching value will result in a blank component of the merged result:
In this case, the resulting key for John
has no separator characters in it. We can force a default value for the missing user_id portion as follows:
Last updated