
Transform

Transcript executes user-definable scripts (termed tasks) in order to produce one or more Reporting Database Files (RDFs) from one or more input Dataset files in CSV format. These RDFs are later used by the reporting engine to generate results.

Overview

Transcript tasks are located in system/config/transcript/ and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported.

Statements

Each statement in a Transcript task must be contained on a single line. A statement consists of a keyword indicating the action to perform, followed by the zero or more parameters required by that statement, separated by white-space. Documentation for all the possible statements can be found in the Transcript language reference.

Quotes and escapes

By default a space, tab or newline will mark the end of a word in a Transcript task. To include white-space in a parameter (for example, to reference a column name containing a space), either enclose the parameter in double quotes or escape each white-space character by preceding it with \.

Examples:

create columns from "Meter Name" using Quantity
create columns from Meter\ Name using Quantity

The following table summarises the behaviour of quotes and escapes:

Comments

Comments in a Transcript task start with a # character that is either of:

the first character of a line in the Transcript task
the first character in a word

Comments always end at the end of the line they are started on.
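The word-splitting and comment rules above can be illustrated with a minimal Python model of a tokeniser. It is a sketch, not Exivity's parser, and it makes three assumptions: the surrounding double quotes are removed from the resulting word, a backslash inside double quotes is taken literally (as in the quoted Windows path used in the Variables section), and a # inside quotes does not start a comment.

```python
def tokenize(line: str) -> list:
    """Split one Transcript statement into words (illustrative model only)."""
    words = []
    current = []
    in_word = False     # True once the current word has started
    in_quotes = False   # True between a pair of double quotes
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                in_quotes = False          # closing quote ends the quoted run
            else:
                current.append(ch)         # assumption: everything inside quotes is literal
        elif ch == '\\' and i + 1 < len(line):
            current.append(line[i + 1])    # escaped character, e.g. an escaped space
            in_word = True
            i += 1
        elif ch == '"':
            in_quotes = True
            in_word = True
        elif ch in ' \t\r\n':
            if in_word:                    # white-space ends the current word
                words.append(''.join(current))
                current, in_word = [], False
        elif ch == '#' and not in_word:
            break                          # '#' at the start of a word: rest of line is a comment
        else:
            current.append(ch)             # includes '#' in the middle of a word
            in_word = True
        i += 1
    if in_word:
        words.append(''.join(current))
    return words
```

Both forms of the Meter Name example split into the same six words, while a # in the middle of a word such as a#b is kept as part of the word.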

Variables

Transcript statements may contain variables. Variables have a name and a value. When a variable name is encountered during execution of the task, the name is replaced with the value of the variable with that name.

Variable names ...

may be used multiple times in a single statement
are case sensitive - ${dataDate} is different to ${datadate}
may not be nested
may be embedded within surrounding text - xxx${dataDate}yyy
may be used within quotes: import "${baseDir}\to_import\AzureJuly${dataDate}.ccr" source AzureJuly
may appear as words of their own in a transcript statement - create column Date value ${dataDate}
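A minimal Python sketch of the substitution rules above follows. The function name expand and the decision to raise KeyError for an undefined variable are assumptions made for illustration; the pattern deliberately skips the ${/.../} form, which denotes a regular expression variable.

```python
import re

# ${name}: the name may not contain '}' and may not start with '/',
# since ${/.../} denotes a regular expression variable instead
VARIABLE = re.compile(r"\$\{([^/}][^}]*)\}")


def expand(statement: str, variables: dict) -> str:
    """Replace every ${name} in a statement with its value (illustrative model)."""
    def value_of(match: re.Match) -> str:
        name = match.group(1)  # case sensitive: no case folding is applied
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")  # assumption
        return variables[name]
    # a single pass, so a value that itself contains ${...} is not expanded again
    return VARIABLE.sub(value_of, statement)
```

Because the substitution is a single left-to-right pass, a variable may appear several times in one statement, inside quotes, or embedded in surrounding text.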

Regular Expression variables

A regular expression variable is a special type of variable used to match the name of a column in a DSET. It is enclosed by ${/ and /} and the text within this enclosure can take either of the following two forms:

${/expression/}
${/dset.id/expression/}
If the text preceding the first / character is not a valid DSET ID, then the entire text of the variable between the ${/ and /} enclosure is treated as a regular expression and is applied to the default DSET

Once the DSET ID and the expression have been established by the above, the expression is tested against each column name in the DSET and the first matching column name is returned. If no match is found, then an error is logged and the transcript task will fail.

The regular expression may contain a subgroup, which is enclosed within parentheses - ( and ). If no subgroup is present, and a match is made, then the entire column name will be returned. If a subgroup is present and a match is made, then only the characters matching the portion of the expression within the parentheses are returned. For example:

The expression does not have to match the entire column name. Assuming no subgroup is specified, as long as a match is made then the variable will be expanded to the whole column name.
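The matching rules above can be modelled in Python as follows. This sketch uses Python's re module, whose regular expression dialect may differ from the one Transcript uses; the DSETs are represented by a hypothetical dictionary mapping each DSET ID to its list of column names, and a failed match raises LookupError where Transcript would log an error and fail the task.

```python
import re


def resolve_regex_variable(text: str, dsets: dict, default_dset: str) -> str:
    """Expand the text between ${/ and /} to a column name (illustrative model)."""
    dset_id, slash, expression = text.partition("/")
    if not (slash and dset_id in dsets):
        # no valid DSET ID before the first '/': the whole text is the expression
        dset_id, expression = default_dset, text
    pattern = re.compile(expression)
    for column in dsets[dset_id]:
        match = pattern.search(column)  # the expression need not match the whole name
        if match:
            # with a subgroup, return only the subgroup; otherwise the whole column name
            return match.group(1) if pattern.groups else column
    raise LookupError(f"no column in {dset_id} matches {expression!r}")
```

For example, with columns MeterName and MeterCategory, the expression Meter expands to MeterName (the first match), whereas Meter(Cat) expands to just Cat.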

Examples:

Importing Data

When a Dataset is imported, the following actions are performed:

The Dataset (in CSV format) is read from disk
An index is constructed, which facilitates high-speed manipulation of the data in the DSET
The DSET is added to the list of DSETs available for use by subsequent statements in the Transcript task

Once these actions have been completed, a DSET can be identified through the unique combination of source.alias. This permits Transcript statements to specify which DSET to operate on.

Exporting Data

Data can be exported in one of two ways during the execution of a Transcript task:

Export on demand

Many Transcript statements change the data in the DSET in some way. Columns may be created, renamed or deleted and rows may be added and removed for example.

Finishing

Transform Preview

When developing your Transformer, you can view intermediate results using the preview functionality in the Transformer editor. To use the Transformer Preview, place your cursor on the line up to which the Transformer should execute. In the example below, the cursor is placed at line number 27:

Make sure to select a Preview Date for which data is available in the system. Then execute the preview by clicking the Update and Preview button.

By default, the preview loads the first 1000 records of each DSET being imported. To increase or decrease this number, adjust the value in the Output Limit field:

Additionally, the preview returns the contents of the default DSET by default. To preview a specific DSET instead, select it from the drop-down selector:

When executing a Transformer in Preview mode, it will not write any daily RDF files to disk, nor will it populate any services.

Configuration

The 'Data Pipelines' menu allows an admin of the Exivity solution to manage Transcript 'Transformer' scripts. Transcript has its own language reference, which is fully covered in a separate chapter of this documentation.

As described in the Overview, you are free to use your editor of choice to create and modify Transformers. However, the GUI also comes with a built-in Transformer editor.

Creating Transformers

To create a new Transformer for Transcript, follow these steps:

From the menu on the left, select "Data Pipelines" > 'Transformers'
To create a new Transformer to normalise and enrich USE Extractor consumption and lookup data, click the 'Create Transformer' button
When your Exivity instance has access to the Internet, it will pull in the latest set of Transformer Templates from our account. These templates are then presented to you, and you can pick one from the list to start transforming. If you don't have access to the Internet, you can download them directly from . You are also free to start creating your own Transformer from scratch.
Provide a meaningful name for your Transformer. When we create a Transformer for a consolidated bill of various IT resources we would, for example, name it: 'IT Services Consumption'.
When you're done creating your Transformer, click the 'Insert' button at the bottom of the screen.

The Transformer editor has syntax highlighting and auto completion to simplify the development of your scripts.

Edit and Delete Transformers

When you want to change or delete an existing Transformer, first select it from the list of Transformers:

When you've selected your Transformer from the "Data Pipelines" > 'Transformers' list, you can change the Transformer script in the editor
In this example, we're adding a 'services' statement using auto completion, to simplify the creation of services
To save your changes, click the 'Save' button at the bottom of the 'Editor' screen. To delete this Transformer, click the 'Remove' button, after which you'll receive a confirmation pop-up where you'll have to click 'OK'.

Run and Schedule Transformers

To test your Transformer, you can execute or schedule it directly from the Glass interface:

After you have selected the Transformer that you would like to run, click the 'Run' tab next to the 'Editor' tab
Manual execution of a Transformer can only be done for a single day. Provide the date you want to run this Transformer for in dd-MM-yyyy format. You can also use the date picker by clicking the downward-facing arrow on the right side of the date field
When you've provided the required date, click 'Run Now' to execute the Transformer. After the Transformer has completed running, you will receive a success or failure message, after which you might need to make additional changes to your Transformer. For further investigation or troubleshooting, consult the "Log Viewer" found under the administration drop-down menu at the top right of the screen.
Once you're happy with your output, you can schedule the Transformer via the 'Schedule' tab, which is located next to the 'Run' tab at the top of the screen
A Transformer can be scheduled to run once a day at a specific time. You should also provide the date to process, which is specified as an offset value. For example, if you want to execute this Transformer against yesterday's date with every scheduled run, provide an offset of -1
When you're done with the schedule configuration, click the 'Schedule' button. To change or remove this schedule afterwards, click the 'Unschedule' button.

As of version 1.6, it is recommended to use the Workflow function instead of the 'Schedule' tab to schedule Transformers.
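The date processed by a scheduled run can be derived from the day the schedule fires and the offset. The Python sketch below assumes that the offset is a whole number of days relative to the day of the run, which is consistent with the -1 for yesterday example. It renders the result in the dd-MM-yyyy form used on the 'Run' tab. The function name is hypothetical.

```python
from datetime import date, timedelta


def scheduled_data_date(run_day: date, offset: int) -> str:
    """Date processed by a scheduled run, as dd-MM-yyyy (assumes offset is in days)."""
    return (run_day + timedelta(days=offset)).strftime("%d-%m-%Y")
```

A run on 4 November 2017 with an offset of -1 therefore processes 03-11-2017, and month and year boundaries are handled by the date arithmetic.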

Language

Syntax

Within the individual reference articles for each statement, the syntax is described using the following conventions:

bold for keywords
italics for arguments
Square brackets for optional keywords and arguments [like this]
Vertical pipe for alternative keyword options just|exactly as shown
Ellipses for a variable length list of arguments: Column1 ... ColumnN

Refer to the core concepts page for more information regarding datasets, fully qualified column names and related information.

Reference

The following statements (in alphabetical order) are supported by Transcript:

aggregate

Overview

The aggregate statement is used to reduce the number of rows in a DSET while preserving required information within them

Syntax

aggregate [counter_column colname] [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]

Details

The aggregate statement is a powerful tool for reducing the number of rows in a DSET. Aggregation is based on the concept of matching rows. Any two rows that match may be merged into a single row which selectively retains information from both of the original rows. Any further rows that match may also be merged into the same result row.

A quick introduction

A match is determined by comparing all the columns which have a function of match associated with them (further information regarding this can be found below). If all the column values match, then the rows are merged.

Merging involves examining all the columns in the data that were not used in the matching process. For each of those columns, it applies a function to the values in the two rows and updates the result row with the computed result of that function. For a full list of functions, please refer to the table further down in this article.

To illustrate this consider the following two row dataset:

id,colour,location,quantity
1234,blue,europe,4.5
1234,green,europe,5.5

If we don't care about the colour value in the above records, we can combine them together. We do care about the quantity however, so we'll add the two values together to get the final result.

The statement to do this is:

aggregate notime id match location match quantity sum

id match means that the values in the id columns must be the same
location match means that the values in the location columns must be the same
quantity sum means that the resulting value should be the sum of the two existing values
by default, a function of first is applied to all other columns (here colour), so the value from the first of the merged rows is retained

Applying these rules to the above example we get the following single result record:

id,colour,location,quantity,EXIVITY_AGGR_COUNT
1234,blue,europe,10,2
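The following Python sketch models non time-sensitive aggregation (the notime case) so that the example above can be checked. It is an illustration rather than Exivity's implementation. Rows are dictionaries of strings, only the match, first, sum, min and max functions are modelled, and blank or non-numeric values are assumed to count as 0 in numeric functions.

```python
def _num(value: str) -> float:
    """Numeric value of a cell; blank or non-numeric cells count as 0 (assumption)."""
    try:
        return float(value)
    except ValueError:
        return 0.0


def _fmt(value: float) -> str:
    """Render whole numbers without a trailing .0, e.g. 10.0 -> '10'."""
    return str(int(value)) if value.is_integer() else repr(value)


def aggregate_notime(rows, header, functions, default_function="first",
                     counter_column="EXIVITY_AGGR_COUNT"):
    """Merge rows whose 'match' columns are equal (illustrative model only)."""
    if counter_column in functions:
        raise ValueError("the counter column may not have an aggregation function")
    columns = [c for c in header if c != counter_column]
    funcs = {c: functions.get(c, default_function) for c in columns}
    combine = {"sum": lambda a, b: a + b, "max": max, "min": min}
    allowed = {"match", "first", *combine}
    if any(f not in allowed for f in funcs.values()):
        raise ValueError("function not modelled in this sketch")
    key_columns = [c for c in columns if funcs[c] == "match"]

    groups = {}  # key -> merged row, kept in order of first appearance
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        merged = groups.get(key)
        if merged is None:
            groups[key] = {**{c: row[c] for c in columns}, counter_column: 1}
            continue
        merged[counter_column] += 1
        for c in columns:
            if funcs[c] in combine:  # 'match' and 'first' keep the existing value
                merged[c] = _fmt(combine[funcs[c]](_num(merged[c]), _num(row[c])))
    return list(groups.values())
```

Passing default_function="match" makes every column part of the key, which corresponds to the de-duplication use described later in this article.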

Parameters

The aggregate statement accepts a range of parameters as summarised in the table below:

If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:

The counter column

Each of the rows in a DSET that has been aggregated may be the result of merging multiple rows that existed prior to aggregation. The aggregate statement therefore creates a new column (or overwrites an existing column if told to) which for each row contains a count of the number of rows that were merged to create that row.

By default this column is called EXIVITY_AGGR_COUNT but an alternative name may be specified using the optional counter_column parameter.

Any of counter_column, counter_col, count_column or count_col may be used as the parameter name. They all work in the same way.

For example the following statement will perform the aggregation and place the counts for each row into a column called merged_row_count

aggregate counter_column merged_row_count test.data category match quantity sum

If the column specified by counter_column does not exist then it will be created.

If the column does exist then the contents will be overwritten.

Note that the column named by counter_column may not be assigned an aggregation function. If it is, then an error will be logged and the transform will fail for the current data date.

Non time-sensitive aggregation

When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.

In this case when comparing two rows to see if they can be merged, the aggregation function simply checks to see if all the columns with a function of match are the same, and if they are the two rows are merged into one by applying the appropriate function to each column in turn.

De-duplication

The following illustrates the aggregate statement being used to remove duplicate rows from a DSET:

# The first column in the DSET being aggregated
# is called subscription_id

aggregate notime default_function match subscription_id match

The analysis of the statement above is as follows:

notime - we are not interested in timestamps
default_function match - by default every column has to match before records can be aggregated
subscription_id match - this is effectively redundant as the default_function is match but needs to be present because at least one pair of colname function parameters is required by the aggregate statement

The resulting DSET will have no duplicate data rows, as each group of rows whose column values were the same were collapsed into a single record.

Row reduction while preserving data

The example shown at the top of this article used the sum function to add up the two quantity values, resulting in the same total at the expense of being able to say which source record contributed which value to that total.

The sum function can therefore accurately reflect the values in a number of source rows, albeit with the above limitation. By using a function of sum, max or min, various columns can be processed by aggregate in a meaningful manner, depending on the specific use case.

Time-sensitive aggregation

When aggregating, columns containing start time and end time values in UNIX epoch format can be specified. Each record in the DSET then has start and end time markers defining the period of time that the usage in the record represents. As well as taking these start and end times into account, time-sensitive aggregation can perform additional manipulations on them.

A quick example

Consider the following CSV file called aggregate_test.csv:

startUsageTime,endUsageTime,id,subscription_id,service,quantity
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2

It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:

import system/extracted/aggregate_test.csv source aggr alias test

var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using startUsageTime template ${template}
timestamp END_TIME using endUsageTime template ${template}
timecolumns START_TIME END_TIME
delete columns startUsageTime endUsageTime

aggregate aggr.test daily nudge default_function first id match subscription_id match service match quantity sum

timerender START_TIME as FRIENDLY_START
timerender END_TIME as FRIENDLY_END

Resulting in:

id,subscription_id,service,quantity,START_TIME,END_TIME,EXIVITY_AGGR_COUNT,FRIENDLY_START,FRIENDLY_END
ID_1234,SUB_abcd,Large VM,12,1509667200,1509692399,6,20171103 00:00:00,20171103 06:59:59
ID_3456,SUB_efgh,Medium VM,6,1509667200,1509688799,3,20171103 00:00:00,20171103 05:59:59
ID_3456,SUB_efgh,Large VM,8,1509667200,1509692399,4,20171103 00:00:00,20171103 06:59:59

As can be seen, for each unique combination of the values in the id, subscription_id and service columns, the start and end times have been adjusted as described in the notes below, and the quantity column contains the sum of all the values in the original rows.

When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.

Further notes

The daily parameter to aggregate means that the START_TIME and END_TIME columns are now recognised as containing timestamps. When aggregating with the daily option, timestamps within the current dataDate are combined to result in an output record which has the earliest start time and the latest end time seen within the day.

Optionally, following daily an offset may be specified as follows:

aggregate aggr.test daily offset 2 id match subscription_id match quantity sum

In this case the start and end timestamps are adjusted by the number of hours specified after the word offset before aggregation is performed. This permits processing of data which has timestamps with timezone information in them, and which may start at 22:00:00 of the first day and end at 21:59:59 of the second day, as an offset can be applied to realign the records with the appropriate number of hours to compensate.

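The nudge parameter shaves 1 second off end times before aggregating, in order to avoid conflicts where hourly records end at exactly the same second as the next record starts (for example, a record ending at 01:00:00 would otherwise overlap with one starting at 01:00:00). In the example above, an end time of 07:00:00 therefore becomes 06:59:59.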
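Time-sensitive aggregation with the daily, offset and nudge parameters can be sketched in Python as shown below. The sketch assumes the following order of operations: the offset (in hours) is added to both timestamps first, nudge is applied afterwards, and the out-of-date check is made on the adjusted values. Other columns keep the value from the first row of each group. Timestamps are UNIX epoch seconds, and day_start is the epoch value of 00:00:00 on the data date.

```python
SECONDS_PER_DAY = 86400


def aggregate_daily(rows, match_columns, sum_columns, day_start,
                    offset_hours=0, nudge=False):
    """Time-sensitive 'daily' aggregation for one data date (illustrative model)."""
    day_end = day_start + SECONDS_PER_DAY
    shift = offset_hours * 3600
    groups = {}  # key -> merged row, in order of first appearance
    for row in rows:
        start = row["START_TIME"] + shift
        end = row["END_TIME"] + shift - (1 if nudge else 0)
        # records starting or ending outside the data date are discarded
        if not (day_start <= start < day_end and day_start <= end < day_end):
            continue
        key = tuple(row[c] for c in match_columns)
        merged = groups.get(key)
        if merged is None:
            merged = dict(row, START_TIME=start, END_TIME=end, EXIVITY_AGGR_COUNT=0)
            for c in sum_columns:
                merged[c] = 0
            groups[key] = merged
        merged["START_TIME"] = min(merged["START_TIME"], start)  # earliest start
        merged["END_TIME"] = max(merged["END_TIME"], end)        # latest end
        for c in sum_columns:
            merged[c] += row[c]
        merged["EXIVITY_AGGR_COUNT"] += 1
    return list(groups.values())
```

The sketch can be run on the thirteen rows of aggregate_test.csv, with day_start set to 1509667200 and nudge enabled. It then produces the same quantities, timestamps and counts as the three result rows shown earlier.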

append

Overview

The append statement is used to append one DSET to the end of another.

Syntax

append source_dset.id to destination_dset.id

Details

If the source DSET has any column names not present in the destination DSET then additional columns are automatically created in the destination DSET. These additional columns will contain blank values by default.

If one or more column names are present in both DSETs then the columns copied from the source DSET may be re-ordered into the same order as that used by the destination DSET.

At the end of the operation, the destination DSET will contain all the data from both DSETs, and the source DSET is unchanged.

Both DSETs must exist and both should contain data. To verify that a DSET exists or to check whether a DSET is empty, use one of the following functions:

Additionally, it is not possible to append a DSET to itself.
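A Python sketch of the column handling described above follows. It uses a hypothetical Dset class that holds a header and rows of strings. Two choices in it are assumptions. First, new columns are added at the right-hand end of the destination DSET. Second, destination columns that the source DSET lacks are left blank in the appended rows.

```python
from dataclasses import dataclass, field


@dataclass
class Dset:
    """Hypothetical in-memory DSET: column names plus rows of string values."""
    header: list
    rows: list = field(default_factory=list)


def append(source: Dset, destination: Dset) -> None:
    """Append all rows of source to destination; source is left unchanged."""
    if source is destination:
        raise ValueError("a DSET cannot be appended to itself")
    # columns only present in the source are added (assumed: at the right-hand end)
    for name in source.header:
        if name not in destination.header:
            destination.header.append(name)
            for row in destination.rows:
                row.append("")  # existing rows get blank values in the new column
    # copy each source row, re-ordering its values into the destination's column order
    for src_row in source.rows:
        values = dict(zip(source.header, src_row))
        destination.rows.append([values.get(name, "") for name in destination.header])
```

Because values are matched by column name, a source DSET whose shared columns are in a different order still lines up with the destination.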

Example

Given the following DSETs:

DSET ID: example.data

DSET ID: example2.data

The statement append example2.data to example.data will result in the following destination DSET (example.data):

calculate

Overview

The calculate statement is used to perform arithmetic operations using literal and column values.

Syntax

calculate column ResultCol as source operation source

where source is either of column colName or value literal_value

and operation is one of the characters + - * / % for addition, subtraction, multiplication, division and modulo respectively.

There must be white-space on each side of the operation character.

Examples:

calculate column ResultCol as column Amount * value 1.2
calculate column Net as column total - column cogs
calculate column constant_7 as value 3.5 + value 3.5

Details

The ResultCol parameter is the name of the column that will hold the results. This column may or may not exist (if necessary it will be created automatically).

Both of the two source parameters can specify a literal value, or the name of a column containing the value to use when performing the calculation.

A literal value is specified using value N where N is the literal number required
A column name is specified using column colName where colName is the name of the column containing the values required

The ResultCol may be the same as a column specified by one of the source parameters in which case any existing values in it will be updated with the result of the calculation.

Additional notes:

Any blank or non-numeric values in a source column will be treated as 0
An attempt to divide by zero will result in 0
When performing a modulo operation, the two source values are rounded to the nearest integer first
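The notes above can be captured in a small Python function that evaluates one calculation for a single row. Some cases are not specified above and are assumptions in this sketch: the handling of a zero modulo divisor, the rounding of .5 values upwards, and the sign of a modulo result when an operand is negative.

```python
import math


def _operand(value: str) -> float:
    """Blank or non-numeric source values are treated as 0."""
    try:
        number = float(value)
    except ValueError:
        return 0.0
    return number if math.isfinite(number) else 0.0  # assumption: 'inf'/'nan' count as non-numeric


def _nearest_int(x: float) -> int:
    return math.floor(x + 0.5)  # assumption: halves round upwards


def calculate(left: str, operation: str, right: str) -> float:
    """Evaluate one 'calculate' expression for a single row (illustrative model)."""
    a, b = _operand(left), _operand(right)
    if operation == "+":
        return a + b
    if operation == "-":
        return a - b
    if operation == "*":
        return a * b
    if operation == "/":
        return a / b if b != 0 else 0.0  # division by zero yields 0
    if operation == "%":
        a_int, b_int = _nearest_int(a), _nearest_int(b)
        # assumption: modulo by zero also yields 0; result takes the sign of the dividend
        return math.fmod(a_int, b_int) if b_int != 0 else 0.0
    raise ValueError(f"unsupported operation: {operation!r}")
```

Applying the function to every row gives the column-level behaviour; the Charge example corresponds to calculate(row["Rate"], "*", row["Quantity"]) for each row.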

Examples

Add 1.5 to the values in the Rate column:

calculate column Rate as column Rate + value 1.5

Multiply the values in the Rate column by those in the Quantity column
Store the result in a new column called Charge

calculate column Charge as column Rate * column Quantity

convert

Overview

The convert statement is used to convert values in a column from base-10 to base-16 or vice-versa.

Syntax

convert colName from decimal|hex to decimal|hex

The keywords dec and decimal are equivalent, as are the keywords hex and hexadecimal.

Details

When converting values in a column, the following considerations apply:

Values in the column are replaced with the converted values
The colName argument must reference an existing column, and may optionally be fully qualified (else the column is assumed to be in the default DSET)
If any values in the column are not valid numbers, they will be treated as 0
Blank values are ignored
The convert statement may be used in the body of a where statement
If a value in colName contains a partially correct value such as 123xyz then it will be treated as a number up to the first invalid character, in this case resulting in a value of 123.
The hex digits in the original value can be either upper or lower case
The hex digits from A-F will be rendered in upper case in the converted output
The convert statement only supports integer values (floating-point values are floored to an integer)
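The conversion rules above can be sketched in Python as follows. The parser reads digits up to the first invalid character, so 123xyz becomes 123 and 12.7 becomes 12. A leading minus sign or a 0x prefix is not handled, because the behaviour for those inputs is not described above. The names BASES and convert are hypothetical.

```python
BASES = {"dec": 10, "decimal": 10, "hex": 16, "hexadecimal": 16}


def _leading_integer(text: str, base: int) -> int:
    """Parse digits up to the first invalid character ('123xyz' -> 123)."""
    digits = "0123456789abcdef"[:base]
    number = 0
    for ch in text.strip().lower():  # hex digits may be upper or lower case
        value = digits.find(ch)
        if value < 0:
            break
        number = number * base + value
    return number


def convert(value: str, from_base: str, to_base: str) -> str:
    """Convert one cell between decimal and hexadecimal (illustrative model)."""
    if not value.strip():
        return value  # blank values are left alone
    number = _leading_integer(value, BASES[from_base])
    # hex output uses upper-case A-F
    return format(number, "X") if BASES[to_base] == 16 else str(number)
```

A value with no valid leading digit, such as xyz, converts to 0, matching the rule that invalid numbers are treated as 0.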

Example

convert decimal_count from decimal to hex
convert unique_id from hexadecimal to dec

copy

This article covers both the copy and move statements. They both work in the same way apart from the fact that move deletes the source row after copying it.

Overview

The copy statement is used to copy rows from one DSET to another

Syntax

copy rows to dset.id

move rows to dset.id

Details

Both copy and move must be used within the body of a where statement. Only rows that match the expression will be copied (or moved).

The DSET from which rows will be copied or moved is automatically determined from the expression used by the where statement.
The DSET to which rows will be copied or moved is determined by the dset.id parameter

The source and destination DSETs must be different (it is not possible to copy or move a row within the same DSET).

The destination DSET may or may not exist. If it does not exist then it will be created. If it does exist then the following logic is applied:

If the destination DSET has more columns than the source DSET then the new rows in the destination DSET will have blank values in the rightmost columns
If the destination DSET has fewer columns than the source DSET then the destination DSET will be extended with enough new columns to accommodate the new rows. In this case, existing rows in the destination DSET will have blank values in the rightmost columns

If the destination DSET is extended to accommodate the source rows then the new (rightmost) columns will have the same names as the equivalent columns in the source DSET. If this would cause a naming conflict with existing columns in the destination DSET, one or more new columns in the destination DSET will have a suffix added to their name to ensure uniqueness. This suffix takes the form _N where N is a number starting at 2.

To illustrate this, if the source DSET has columns called subscription,name,address,hostname and the destination DSET has a single column called name then the resulting extended destination DSET would have columns called name,subscription,name_2,address,hostname.
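The suffixing rule can be sketched in Python. The hypothetical function below takes the destination column names and the names of the columns being added, and returns the extended list; it reproduces the example above.

```python
def extend_columns(destination: list, added: list) -> list:
    """Append column names, adding _2, _3, ... where a name is already taken."""
    result = list(destination)
    for name in added:
        candidate, n = name, 2
        while candidate in result:  # try name_2, name_3, ... until unique
            candidate = f"{name}_{n}"
            n += 1
        result.append(candidate)
    return result
```

If a name_2 column already existed, the next conflicting name column would become name_3.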

Example

# Move all rows in usage.data where hostname is "test server"
# to a new DSET called test.servers

where ([usage.data.hostname] == "test server") {
    move rows to test.servers
}

correlate

Overview

The correlate statement is used to enrich the default DSET by adding new columns to it, and/or updating existing columns with useful values. The new column names are derived from other DSETs and the values in those columns are set using a lookup function based on the value in a key column shared between the DSETs.

Syntax

correlate ColName1 [... ColNameN] using KeyColumn [assuming assumeDSET] [default DefaultValue]

Details

The ColName1 ... ColNameN arguments are column names that will be copied from their original DSETs and merged into the default DSET.

Column names must be fully qualified, unless the assuming parameter is used, in which case any column names that are not fully-qualified will be assumed to belong to the DSET specified by assumeDSET.

Source and Destination columns

Source columns are those from which a cell is to be copied when the KeyColumn matches. Destination columns are columns in the default DSET into which a cell will be copied. Destination column names are derived from the names of the source columns as follows:

The source column is the argument in its original form, for example: Azure.usage.MeterName
The destination column is the same argument, but with the DSET ID replaced with that of the default DSET. For example if the default DSET is Custom.Services then the destination column for the above would be Custom.Services.MeterName.

If a destination column name doesn't exist in the default DSET then a new column with that name will automatically be created.

The Key Column

The KeyColumn argument is a column name which must not be fully qualified and which must exist in the default DSET and all of the DSETs referenced by the ColNameN arguments.

Default values

The DefaultValue argument, if present, specifies the value to write into the destination column if there is no match for the KeyColumn. If the DefaultValue argument is not specified then any rows where there is no match will result in a blank cell in the destination column.

For each row in the default DSET, the source DSET is searched for a matching KeyColumn value, and if a match is found then the value in the source column is used to update the default DSET. The row of the first match found in the source DSET will be used.

Overwriting

When matching the KeyColumn values, the logic in the following table is evaluated against every row in the destination DSET.

✘ means no or disabled, ✔ means yes or enabled

Examples

Given two Datasets as follows, where the default DSET is MyData.Owners:

Dataset 'MyData.Owners'

owner,id
John,100
Tim,110
Fokke,120
Joost,130
Jon,140

Dataset 'Custom.Services'

service,description,id
Small_VM,Web_Server,130
Medium_VM,App_Server,100
Large_VM,DB_Server,110
Medium_VM,Test_Server,120

The statement: correlate service description using id assuming Custom.Services

Will enrich the MyData.Owners Dataset such that it contains:

owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,,

The statement: correlate service description using id assuming Custom.Services default unknown

Will produce:

owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,unknown,unknown
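The lookup described above can be expressed as a minimal Python model, with both DSETs represented as lists of dictionaries. As stated above, the first matching row in the source DSET is used. The sketch always writes the destination cell, so the rules in the Overwriting table are not modelled. The function name is hypothetical.

```python
def correlate(default_rows, source_rows, columns, key, default=None):
    """Copy 'columns' from source_rows into default_rows where 'key' matches (illustrative)."""
    first_match = {}
    for row in source_rows:
        first_match.setdefault(row[key], row)  # the first matching row wins
    for row in default_rows:
        match = first_match.get(row[key])
        for column in columns:
            if match is not None:
                row[column] = match[column]
            else:
                # no match: the DefaultValue if given, otherwise a blank cell
                row[column] = "" if default is None else default
```

Run against the MyData.Owners and Custom.Services data above, it produces the same service and description values as both example outputs.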

default

Overview

The default statement is used to explicitly define the default DSET to use when specifying a column name as an argument in subsequent statements.

Syntax

default dset source.alias

Details

Given that multiple DSETs can be loaded at once, it is necessary to specify which DSET to operate on when performing actions such as creating a new column with create. A column name in a Transcript statement is assumed to belong to the default DSET unless it is a fully qualified column name.

If there is no default statement in the Transcript, then the first CSV file imported via the import statement will automatically be designated as the default DSET.

The default statement can be used multiple times throughout a Transcript, in which case the default DSET will be whichever was specified by the last default statement executed.

Lastly, when executing a finish statement, unless otherwise specified, the default DSET will be used to populate the reporting database created as a result.

When changing the default DSET, any service definitions that referenced the default DSET at the time they were created will be updated with the new default DSET.

Examples

Set custom.datafile as the default DSET:

default dset custom.datafile

environment

The environment statement specifies the name of the environment to use for resolving global variables.

Syntax

environment name

Details

The environment statement selects the predefined environment to use for lookup. It is an error to specify an environment which is not defined in the global database.

If no environment is specified, the default environment (the one designated as the default in the global database) is assumed.

The environment can be changed any number of times. A change affects only those global variables that are referenced for the first time after it; global variables that have already been resolved (copied to local variables) retain their values.
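The resolution rule for global variables can be modelled in Python as shown below. The class, its methods and the in-memory dictionary standing in for the global database are hypothetical. The sketch only illustrates the rule itself: a global variable is copied on its first reference, and it keeps that value after the environment changes.

```python
class GlobalVariables:
    """Hypothetical model of global-variable lookup via environments."""

    def __init__(self, environments: dict, default_environment: str):
        self.environments = environments  # stands in for the global database
        self.current = default_environment
        self.resolved = {}  # globals already copied to local variables

    def set_environment(self, name: str) -> None:
        if name not in self.environments:
            raise ValueError(f"environment {name!r} is not defined")
        self.current = name

    def get(self, name: str) -> str:
        if name not in self.resolved:  # first reference: copy from the current environment
            self.resolved[name] = self.environments[self.current][name]
        return self.resolved[name]
```

Switching environments mid-script therefore changes only the values of globals that have not yet been used.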

event_to_usage

Overview

The event_to_usage statement generates a new CSV file containing usage records derived from START/STOP/UPDATE events in a source DSET.

Syntax

event_to_usage from source.alias to file { options }

See Details section for options description

Details

This statement produces usage records from events in the source DSET. Three types of events are supported:

START event - marks the start of consumption
STOP event - marks the end of consumption
UPDATE event - marks the change of consumption attributes, such as quantity

There are several situations when the usage record is created:

from START to first matching STOP event
from START to first matching UPDATE event
from UPDATE to first matching STOP event
from UPDATE to first matching UPDATE event
from the beginning of the day to STOP event (if consumption started during previous days)
from the beginning of the day to UPDATE event (if consumption started during previous days)
from START event to the end of the day (if no matching STOP/UPDATE events found)
from UPDATE event to the end of the day (if no matching STOP/UPDATE events found)
for the whole day (if consumption started during previous days, and there was no STOP or UPDATE event during processing day)

Event B is considered to match event A if B happened after A and has matching key fields.
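The situations listed above reduce to a simple per-key state machine:

START opens a usage period.
UPDATE closes the current period and opens a new one.
STOP closes the current period.
Any period still open at the end of the day is closed at the day boundary.

A Python sketch follows. It ignores attributes such as quantity. It also assumes that a second START for a key that is already running simply restarts the period, because that case is not described above. The function name is hypothetical.

```python
def usage_intervals(events, day_start, day_end, running_at_start=()):
    """Turn one day's events into (key, start, end) usage records (illustrative model).

    events: iterable of (time, kind, key) tuples, kind being "START", "STOP" or "UPDATE";
    running_at_start: keys whose consumption started on a previous day.
    """
    open_since = {key: day_start for key in running_at_start}  # key -> period start
    records = []
    for time, kind, key in sorted(events, key=lambda event: event[0]):
        if kind in ("STOP", "UPDATE") and key in open_since:
            records.append((key, open_since.pop(key), time))  # close the current period
        if kind in ("START", "UPDATE"):
            open_since[key] = time  # a new period begins (restarts if already running)
    # anything still running is closed at the end of the day
    records.extend((key, since, day_end) for key, since in open_since.items())
    return records
```

A key that was running at the start of the day and has no events produces a single record covering the whole day.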

Options

Several options control the behaviour of this statement:

Conditions for events are SQL conditions which are copied verbatim into the WHERE clause of a query, so they can be any legal SQL expression, such as "state = 'started' AND (prev_state = 'stopped' OR prev_state IS NULL)". Remember to use SQL-standard single quotes for string literals.

Initial data load

If the epoch_date option is specified, it is possible to perform an initial data load, that is, to load the consumptions that are already running on a specific date. Transcript performs the following checks:

no events have yet been loaded for the specified DSET
the processing date matches the specified epoch_date

If any of these checks fail, Transcript stops with error.

Data integrity checks

Because events must be loaded in the correct order, Transcript performs the following checks:

the data for the same day for the specified DSET cannot be loaded twice
there cannot be gaps in processed dates (except for Initial data load)

If any of these checks fail, Transcript stops with error.

To re-process data for a specific day, the event-related state in the database must first be rolled back to the preceding date, after which the data for that day and all following days must be processed again in the correct order.
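The initial-load and integrity checks can be summarised as a small Python sketch. The function name, the LoadError exception and the idea of tracking a single "last loaded date" per DSET are illustrative assumptions; so is passing epoch_date only when an initial load is being performed, and allowing a first ordinary load when nothing has been loaded yet.

```python
from datetime import date, timedelta


class LoadError(Exception):
    """Raised where Transcript would stop with an error."""


def check_event_load(processing_date, last_loaded, epoch_date=None):
    """Check whether events for processing_date may be loaded.

    last_loaded: most recent date already processed for the DSET, or None.
    epoch_date: passed only when performing an initial data load (assumption).
    """
    if epoch_date is not None:
        # Initial data load checks
        if last_loaded is not None:
            raise LoadError("events have already been loaded for this DSET")
        if processing_date != epoch_date:
            raise LoadError("processing date does not match epoch_date")
        return
    if last_loaded is None:
        return  # assumption: nothing to compare against yet
    if processing_date <= last_loaded:
        raise LoadError("data for this day has already been loaded")
    if processing_date != last_loaded + timedelta(days=1):
        raise LoadError("gap in processed dates")
```

In these terms, rolling the event state back to the preceding date corresponds to setting last_loaded to the day before the one being re-processed, after which each following day passes the checks again in turn.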

Examples

Only START and STOP events, single-column key:

START, STOP and UPDATE events, complex key:

finish

Overview

The finish statement creates a Reporting Database File (RDF) from a DSET. The RDF can subsequently be used by the reporting engine.

Syntax

finish [dset.id]

Details

The finish statement is used to create an RDF from a DSET. Only a single DSET can be used to create an RDF, but multiple finish statements may be used within the same task file. If there is no dset.id parameter then the default DSET will be used.

The RDF created by finish will be saved as <BaseDir>\system\report\<yyyy>\<MM>\<dd>_source.alias.rdf

where:

<yyyy> is the 4-digit year
<MM> is the 2-digit month
<dd> is the 2-digit day
source.alias are the tags which form the DSET ID

Any existing RDF with the same name will be overwritten.
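As an illustration of the naming scheme, the following Python sketch builds the RDF path for a data date and DSET ID. The function name and the base directory used in the example call are hypothetical; only the path layout comes from the description above.

```python
from datetime import date
from pathlib import PureWindowsPath


def rdf_path(base_dir: str, data_date: date, dset_id: str) -> PureWindowsPath:
    """Build <BaseDir>\\system\\report\\<yyyy>\\<MM>\\<dd>_source.alias.rdf."""
    source, sep, alias = dset_id.partition(".")
    if not sep or not source or not alias or "." in alias:
        raise ValueError("a DSET ID has the form source.alias")
    return PureWindowsPath(
        base_dir, "system", "report",
        f"{data_date:%Y}", f"{data_date:%m}",   # 4-digit year, 2-digit month
        f"{data_date:%d}_{source}.{alias}.rdf",  # 2-digit day + DSET ID
    )


# Hypothetical base directory, for illustration only:
print(rdf_path(r"C:\exivity", date(2017, 11, 3), "Azure.usage"))
```

Because the file name depends only on the date and the DSET ID, running finish again for the same day and DSET produces the same path, which is why an existing RDF is overwritten.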

Examples

Create a Reporting Database file for the default DSET: finish

Create a Reporting Database file for the DSET Azure.usage: finish Azure.usage

normalise

Overview

The normalise statement is used to update the values in a numerical column such that they are all positive, negative or inverted.

In this documentation the spelling normalise is used but normalize may also be used. The functionality is identical in either case.

Syntax

normalise column colName as positive

normalise column colName as negative

normalise column colName as invert

normalise column colName as standard

Details

The normalise statement processes each value in the column called colName and applies logic determined by the last argument shown above, as follows:

In order to be considered a number, a value in the colName column must start with any of the characters +, -, . or 0 to 9 and may contain a single . character which is interpreted as a decimal point.

A blank value in colName is always left intact. Unless standard is used, a non-numeric value is also left intact.

When using standard, all non-blank values are assumed to be numeric, and as such any non-numeric values will be changed to a numeric zero.

Additionally:

Any numerical value in colName which starts with a +, a . or a digit is considered positive
Any numerical value in colName which starts with a - character is considered negative
When using standard the resulting conventional number will be accurate up to 14 decimal places

The normalise statement ignores the option overwrite setting, as its sole purpose is to modify existing values.
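The per-value logic can be sketched in Python as follows. This is an approximation under stated assumptions: the numeric test (an optional sign followed only by digits and at most one '.') is a stricter reading of the rule above, and standard is modelled as Python float parsing followed by fixed-point formatting to 14 decimal places with trailing zeros removed, which may differ from Transcript's own conversion.

```python
DIGITS = "0123456789"


def is_number(value: str) -> bool:
    """Optional leading + or -, then digits with at most one '.' (assumed rule)."""
    body = value[1:] if value[:1] in ("+", "-") else value
    return (body.count(".") <= 1
            and any(ch in DIGITS for ch in body)
            and all(ch in DIGITS or ch == "." for ch in body))


def normalise(value: str, mode: str) -> str:
    if value == "":
        return value  # blank values are always left intact
    if mode == "standard":
        try:
            number = float(value)
        except ValueError:
            return "0"  # non-numeric values become zero
        # Up to 14 decimal places, without trailing zeros or a bare '.'
        return f"{number:.14f}".rstrip("0").rstrip(".")
    if not is_number(value):
        return value  # non-numeric values are left intact
    negative = value.startswith("-")
    magnitude = value.lstrip("+-")  # is_number guarantees at most one sign
    if mode == "positive":
        return magnitude
    if mode == "negative":
        return "-" + magnitude
    if mode == "invert":
        return magnitude if negative else "-" + magnitude
    raise ValueError(f"unknown mode: {mode}")
```

Working on the text rather than converting to a float keeps the sign modes from altering the digits themselves; only standard rewrites the number.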

Example

import "system/extracted/csp_usage.csv" source test alias data

# Invert all numerical values in column 'quantity'
normalise column quantity as invert

option

Overview

The option statement is used to set global parameters during the execution of a Transcript task.

Syntax

optionoption = setting

option noquote

Details

The option statement can be used multiple times within the same task script and always takes immediate effect. It is therefore possible (for example) to import a CSV file delimited with commas and quoted with single quotes, change the options and then export it with semicolon delimiters and double quotes.

The supported options are as follows:

When using options, there must be whitespace on each side of the = sign

Additional notes

Continue

option continue = yes|enabled

option continue = no|disabled

When executing a task file repeatedly against each day in a date range, by default Transcript will abort the whole run if a task failure occurs. In cases where this is undesirable, setting the continue option to enabled or yes (both work in exactly the same way) will change the behaviour such that if a task failure occurs then execution will resume with the next day in the range.

When combining the continue option with option mode = permissive it is possible to process a range of dates for which usage or other data is not available, because the mode option will prevent a failed import statement from being treated as a fatal error.
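The effect of the continue option on a date-range run can be pictured with a short Python sketch. The function names, the TaskFailure exception and the per-day outcome dictionary are hypothetical and only model the abort-versus-resume behaviour described above.

```python
from datetime import date, timedelta


class TaskFailure(Exception):
    pass


def run_date_range(start, end, run_task, continue_on_failure=False):
    """Run run_task(day) for each day from start to end inclusive."""
    outcome = {}
    day = start
    while day <= end:
        try:
            run_task(day)
            outcome[day] = "ok"
        except TaskFailure:
            outcome[day] = "failed"
            if not continue_on_failure:
                break  # default behaviour: abort the whole run
            # continue = yes|enabled: resume with the next day in the range
        day += timedelta(days=1)
    return outcome
```

With continue_on_failure left at its default, a failure on the second day of a three-day range ends the run; with it enabled, the third day is still processed.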

Delimiter / Separator

When specifying a quote or tab as the separator it is necessary to escape it in order to prevent Transcript from interpreting it as a meaningful character during the parsing of the task script. For example:

option delimiter = \t # Specify a literal TAB character
option delimiter = \" # Specify a literal quote

Services

The services option may be set to readonly, overwrite or update, and influences the behaviour of the service and services statements as follows:

Any services that are overwritten as a result of the overwrite setting will lose any custom rates associated with them, and a single default rate will be created instead.

In all cases, rate revisions may be created depending on the data being processed, but existing global rates will not be replaced with new ones if the effective date of the updated service matches that of an existing global revision.

Execution mode

option mode = strict
option mode = permissive

Transcript supports two modes of execution for tasks:

In strict mode, if an error is encountered at any point in the task, the error will be logged and execution will terminate
In permissive mode, many errors that would otherwise have caused the task to fail will be logged, the statement that caused the error will be skipped and execution will continue from the next statement in the task.

The mode option can be used multiple times and changed at any point during task execution. This means that specific sections of the task can be more error tolerant.

Errors that can be handled in permissive mode are mainly syntax errors or those involving invalid parameters to Transcript statements. Some error conditions that arise during the execution of a statement will cause the task to fail even in permissive mode.
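The difference between the two modes can be modelled in Python as below. The two exception classes stand in for "recoverable" errors (such as syntax errors or invalid parameters) and errors that are fatal in any mode; they, the execute function and the ("mode", value) tuple used to switch modes are all hypothetical.

```python
class RecoverableError(Exception):
    """Errors permissive mode can skip, e.g. syntax or invalid parameters."""


class FatalError(Exception):
    """Errors that fail the task even in permissive mode."""


def execute(statements, log):
    """Run statements in order; a ("mode", value) tuple switches mode."""
    mode = "strict"
    for stmt in statements:
        if isinstance(stmt, tuple) and stmt[0] == "mode":
            mode = stmt[1]  # takes effect from the next statement onwards
            continue
        try:
            stmt()
        except RecoverableError as exc:
            log.append(f"error: {exc}")
            if mode == "strict":
                return False  # strict: log the error and stop
            # permissive: skip this statement and carry on
        except FatalError as exc:
            log.append(f"fatal: {exc}")
            return False
    return True
```

Because the mode is just state that each statement consults, switching it back and forth around a risky section gives that section alone the more tolerant behaviour.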

Log levels

Transcript generates a considerable amount of logging information during the execution of a task. The loglevel option can be used to increase or decrease the level of detail written to the logfile. All logging levels must be specified in UPPER CASE. The following levels can be set:

The order of the logging levels in the table above is significant, in that for any given level, all levels above it in the table are also in effect. Therefore a logging level of WARN will result in log entries for ERROR, FATAL, and INTERNAL level events, as well as warnings.

The loglevel option can appear multiple times within a transcript task and will take immediate effect whenever it is used. This means that within a task, the loglevel can be increased for certain statements and reduced for others.

Regardless of the logging level, some events will always create a logfile entry, for example the success or failure of a transcript task at the end of its execution.

Log mode

In order to minimise the effect on performance when logging, Transcript opens the logfile when it is first run and then holds it open until the completion of the task being executed. The logfile can be accessed in one of two modes:

The default is SAFE. It is not recommended that this be changed.

Examples

The following Transcript task imports a CSV file quoted with double quotes and delimited with commas, and then exports a copy delimited with semicolons and quoted with single quotes. It also increases the logging level for the import statement:

option quote = \"
option loglevel = DEBUGX
import usage from Azure
option loglevel = INFO
option separator = ;
option quote = '
export Azure.usage as c:\transcript\exported\azure_modified.csv

rename

Overview

The rename statement is used to change the name of an existing column in a DSET, or to change the source and/or alias of a DSET.

Syntax

rename column OldName to NewName

rename dset OldSource.OldAlias to NewSource.NewAlias

Details

Renaming a column

When renaming a column, OldName may be a fully qualified column name. If it is not fully qualified then the column OldName in the default DSET will be renamed.

The NewName argument must not be fully qualified.

Any dots (.) in the new column name will automatically be replaced by underscores (_), as a dot is a reserved character used to implement DSET namespaces.

Renaming a DSET

When renaming a DSET, NewSource.NewAlias must contain exactly one dot and must not be the name of an existing DSET. The rename takes immediate effect, thus any subsequent reference to the renamed DSET in the transcript task must use its new name.

Any pending definitions referencing the renamed DSET will automatically be updated with the new name.

Examples

Renaming columns:

Renaming a DSET:

aggregate

Overview

The aggregate statement is used to reduce the number of rows in a DSET while preserving the required information within them.

Syntax

aggregate [counter_column colname] [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]

Details

A quick introduction

To illustrate this, consider the following two-row dataset:

id,colour,location,quantity
1234,blue,europe,4.5
1234,green,europe,5.5

If we don't care about the colour value in the above records, we can combine them together. We do care about the quantity however, so we'll add the two values together to get the final result.

The statement to do this is:

aggregate notime id match location match quantity sum

id match means that the values in the id columns must be the same
location match means that the values in the location columns must be the same
quantity sum means that the resulting value should be the sum of the two existing values
by default, the first function is applied to all other columns, so that the values from the first row are retained

Applying these rules to the above example we get the following single result record:

id,colour,location,quantity,EXIVITY_AGGR_COUNT
1234,blue,europe,10,2
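The notime behaviour in this example can be reproduced with a small Python sketch. It is a model, not Transcript's implementation: rows are lists of dictionaries, only the match, sum and first functions are modelled (the full list of functions is given in the table above), and summed values are returned as floats.

```python
SUPPORTED = {"match", "sum", "first"}


def aggregate_notime(rows, functions, default_function="first",
                     counter_column="EXIVITY_AGGR_COUNT"):
    """Merge rows whose "match" columns are equal (rows: list of dicts)."""
    if counter_column in functions:
        raise ValueError("the counter column cannot be given a function")
    if not rows:
        return []
    columns = [c for c in rows[0] if c != counter_column]
    func = {c: functions.get(c, default_function) for c in columns}
    if not set(func.values()) <= SUPPORTED:
        raise ValueError("only match, sum and first are modelled here")
    match_cols = [c for c in columns if func[c] == "match"]
    groups = {}  # tuple of match-column values -> merged row
    for row in rows:
        key = tuple(row[c] for c in match_cols)
        merged = groups.get(key)
        if merged is None:
            # The first row of each group supplies the initial values.
            merged = {c: float(row[c]) if func[c] == "sum" else row[c]
                      for c in columns}
            merged[counter_column] = 1
            groups[key] = merged
            continue
        for c in columns:
            if func[c] == "sum":
                merged[c] += float(row[c])
            # "match" and "first" columns keep the first row's value
        merged[counter_column] += 1
    return list(groups.values())
```

Calling it with functions {"id": "match", "location": "match", "quantity": "sum"} on the two rows above yields a single row with colour blue, quantity 10.0 and a count of 2. Passing default_function="match" makes every column part of the grouping key, which is the de-duplication pattern described further down.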

Parameters

The aggregate statement accepts a range of parameters as summarised in the table below:

If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:

The counter column

By default this column is called EXIVITY_AGGR_COUNT but an alternative name may be specified using the optional counter_column parameter.

Any of counter_column, counter_col, count_column or count_col may be used as the parameter name. They all work in the same way.

For example the following statement will perform the aggregation and place the counts for each row into a column called merged_row_count

aggregate counter_column merged_row_count test.data category match quantity sum

If the column specified by counter_column does not exist then it will be created.

If the column does exist then the contents will be overwritten.

Note that the column named by counter_column may not be assigned an aggregation function. If it is, then an error will be logged and the transform will fail for the current data date.

Non time-sensitive aggregation

When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.

De-duplication

The following illustrates the aggregate statement being used to remove duplicate rows from a DSET:

# The first column in the DSET being aggregated
# is called subscription_id

aggregate notime default_function match subscription_id match

The analysis of the statement above is as follows:

notime - we are not interested in timestamps
default_function match - by default every column has to match before records can be aggregated
subscription_id match - this is effectively redundant as the default_function is match but needs to be present because at least one pair of colname function parameters is required by the aggregate statement

The resulting DSET will have no duplicate data rows, as each group of rows with identical column values was collapsed into a single record.

Row reduction while preserving data

Time-sensitive aggregation

A quick example

Consider the following CSV file called aggregate_test.csv:

startUsageTime,endUsageTime,id,subscription_id,service,quantity
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2

It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:

import system/extracted/aggregate_test.csv source aggr alias test

var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using startUsageTime template ${template}
timestamp END_TIME using endUsageTime template ${template}
timecolumns START_TIME END_TIME
delete columns startUsageTime endUsageTime

aggregate aggr.test daily nudge default_function first id match subscription_id match service match quantity sum

timerender START_TIME as FRIENDLY_START
timerender END_TIME as FRIENDLY_END

Resulting in:

id,subscription_id,service,quantity,START_TIME,END_TIME,EXIVITY_AGGR_COUNT,FRIENDLY_START,FRIENDLY_END
ID_1234,SUB_abcd,Large VM,12,1509667200,1509692399,6,20171103 00:00:00,20171103 06:59:59
ID_3456,SUB_efgh,Medium VM,6,1509667200,1509688799,3,20171103 00:00:00,20171103 05:59:59
ID_3456,SUB_efgh,Large VM,8,1509667200,1509692399,4,20171103 00:00:00,20171103 06:59:59

When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.
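The example output can be reproduced with the following Python sketch of daily aggregation. Timestamps are UNIX seconds, as in the START_TIME and END_TIME columns above. The effect of nudge is inferred from the example output only: it is modelled here as moving each end time back by one second, and it is applied before the out-of-day check. Both choices are assumptions, as is keeping first-row values for columns without a match or sum rule.

```python
DAY = 86400


def aggregate_daily(rows, match_cols, sum_cols, day_start, nudge=False,
                    counter_column="EXIVITY_AGGR_COUNT"):
    day_end = day_start + DAY  # first second of the following day
    groups = {}
    for row in rows:
        start, end = row["START_TIME"], row["END_TIME"]
        if nudge:
            end -= 1  # assumed: nudge moves end times back by one second
        # Records starting or ending outside the data date are discarded.
        if not (day_start <= start < day_end and day_start <= end < day_end):
            continue
        key = tuple(row[c] for c in match_cols)
        merged = groups.get(key)
        if merged is None:
            # Columns without a match/sum rule keep their first value.
            merged = dict(row, START_TIME=start, END_TIME=end)
            merged[counter_column] = 1
            groups[key] = merged
            continue
        for c in sum_cols:
            merged[c] += row[c]
        merged["START_TIME"] = min(merged["START_TIME"], start)  # earliest start
        merged["END_TIME"] = max(merged["END_TIME"], end)        # latest end
        merged[counter_column] += 1
    return list(groups.values())
```

With day_start = 1509667200 (2017-11-03 00:00:00 UTC), the 13 rows of aggregate_test.csv collapse into the three records shown above, with end times of 1509692399 and 1509688799.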

Further notes

Optionally, an offset may be specified after the daily parameter, as follows:

aggregate aggr.test daily offset 2 id match subscription_id match quantity sum

create

Overview

The create statement is used to add one or more new columns to an existing DSET.

Syntax

create column NewColumnName [value Value]

create columns from ColumnName [using ValueColumnName]

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [Column [/regex/] | string literal]

Details

Explicit single column creation

Syntax

create column NewColumnName [value Value]

Details

This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.

A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional value Value portion of the statement is present, in which case all the cells in the new column will be set to Value.

Examples

Create a new empty column called Cost in the default DSET: create column Cost

Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0

Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

Automated single/multiple column creation

Syntax

create columns from ColumnName [using ValueColumnName]

Details

This statement is used to create multiple columns in a single operation. As is the case for create column above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

The statement create columns from ServiceName using Count will create the result shown below:

The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:

SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99

When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,

If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:

SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,
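The behaviour shown in the two results above can be sketched in Python, with each row held as a dictionary. The function name is hypothetical, and skipping blank values in ColumnName is an assumption the text does not address.

```python
def create_columns_from(rows, name_col, value_col=None):
    """Add one column per distinct value in name_col (rows: list of dicts)."""
    new_columns = []
    for row in rows:
        name = row[name_col]
        # Duplicates are ignored; blank names are skipped (assumption).
        if name and name not in new_columns:
            new_columns.append(name)
    for row in rows:
        for col in new_columns:
            row.setdefault(col, "")  # new cells start out empty
        if value_col is not None and row[name_col]:
            # Copy the value into the column named by this row's name_col
            row[row[name_col]] = row[value_col]
    return new_columns
```

Applied to the five-row example with name_col set to ServiceName and value_col set to Quantity, it returns the four new column names in order of first appearance and fills one cell per row, matching the second result above.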

When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.

This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

Example

The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.

import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions

Merging column values to create a new column

Syntax

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [ ... Column [/regex/] | string literal]

If preferred, the word using may be used instead of the word from (both work in an identical fashion).

Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:

The separator may be more than one character in length (up to 31 characters may be specified).

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.

If a regex is not specified, then the entire value in the source column will be used.

Options

By default the value extracted from a source column will be blank in the following two cases:

There is a blank value in a source column
No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:

option merge_blank = some_text_here

This option will use the string some_text_here in place of any blank source column value.

option merge_nomatch = some_text_here

This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
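The per-row merge logic, including the merge_blank and merge_nomatch options, can be sketched in Python. The representation of the statement's arguments as a list of tuples is hypothetical, and the sketch uses Python's re.search (first match anywhere in the value), which may differ from the regular expression engine Transcript uses.

```python
import re


def merged_value(row, parts, separator="", merge_blank="", merge_nomatch=""):
    """parts: ("string", text) or ("column", name, regex_or_None) tuples.

    An empty merge_blank / merge_nomatch reproduces the default behaviour,
    in which the contribution from that source column is simply omitted.
    """
    pieces = []
    for part in parts:
        if part[0] == "string":
            pieces.append(part[1])  # literal text is used as-is
            continue
        _, column, regex = part
        value = row[column]
        if value == "":
            value = merge_blank
        elif regex is not None:
            match = re.search(regex, value)
            # Only the text captured by the first subgroup is used.
            value = match.group(1) if match else merge_nomatch
        if value:
            pieces.append(value)
    return separator.join(pieces)
```

With parts for department and user_id /[0-9]{3}-([0-9]{3})/ and a separator of ":", Eddy's row gives Development:456 and John's non-compliant row gives Pending, or Pending:[none] when merge_nomatch is set to [none], as in the examples below.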

Examples

Given the following dataset:

name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing

The following examples illustrate some uses of the create mergedcolumn statement:

Example 1

# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

Example 2

If no regular expression is specified then the values in the source column will be used in their entirety:

# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix

create mergedcolumn key separator : from string prefix department user_id

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928

Example 3

Let us add a new row to the sample dataset which has a non-compliant value for the user_id:

name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing

By default a non-matching value will result in a blank component of the merged result:

# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:

option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

services

This article assumes knowledge of services, their rates and related concepts, as documented elsewhere and in the article on the service statement.

Overview

The services statement is used to create or modify multiple services based on the data in a DSET.

Syntax

services { param1 = value [ ... paramN = value ] }

Example:

Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space.

Details

Summary

How column names are used

For many of the parameters to the services statement there are two ways of using a column name:

The values in the column are extracted from the usage data and those values are embedded as literals into the service definition.
The column name itself is used in the service definition such that the reporting engine dynamically determines the values to use for any given day when generating report data

Using the second method requires only a single rate revision which identifies by name the column(s) containing rate and/or COGS information. When a report is run, the charge engine then obtains the correct rate information for any given day from the data in those named columns.

Parameter table

The parameters supported by the services statement are summarised in the following table. The Type column in the table indicates the way the column name is used as described in How column names are used above. Additional information about each parameter can be found below the summary table itself.

Parameter details

usages_col

The usages_col parameter is the name of a column containing service keys. A service will be created for each distinct value in this column, and these values will be used as the service keys.

service_type

In order to calculate the charges associated with a service it is necessary to know the number of units of that service that were consumed. Exivity supports two methods of retrieving the units of consumption from usage data and the service_type determines which of these is applied.

Any given service may use one or other of these methods, which are as follows:

Manual services: service_type = MANUAL
Automatic services: service_type = AUTOMATIC

Please note that in the Exivity API and in the GUI, these parameters have different names.

Manual services

Manual services require that the units of consumption for each service named in the usages_col column are stored in separate columns whose names correlate to the service keys themselves.

To illustrate this, consider the following fragment of usage data:

In this case, the service_name column contains a list of service keys and for each of those service keys there is a corresponding column containing the units of consumption for that service. Thus in the above example we can see that there are two services, "Small VM" and "Large VM" and that the units of consumption for each of these services are in the columns of the same name.

The more manual services that are represented in the data, the more columns are required.

Automatic services

Automatic services require that the units of consumption for each service named in the usages_col column are stored in the column named by the consumption_col parameter. To represent the same information as that shown in the example above, the following would be used:

It can be seen that any number of automatic services, along with their consumption figures, can be represented using only two columns of data.

consumption_col

The consumption_col parameter is only required when creating automatic services and determines the column containing the units of consumption for each service as described above.

instance_col

It is not enough to know only the units of consumption for each service, as this merely provides the total consumption for each service across the entire usage. In the examples above, the "Large VM" service has 10 units of consumption, but using that information alone there is no way to know if this represents one instance of a VM used 10 times, 10 instances of VMs used once each, or something in between.

The instance_col parameter is therefore required to tell the difference. Typically this will be a unique identifier which groups the units of consumption into 'buckets'. In the case of a VM this may be a VM ID which remains constant throughout the life of a VM in the cloud.

To illustrate this, we can supplement the example usage fragment used previously with additional information to use as the instance_col as follows:

By specifying instance_col = vmid we can now see that the usage represents:

5 units of consumption for a single Small VM with an ID of 444
6 units of consumption for a Large VM with an ID of 555
4 units of consumption for a Large VM with an ID of 666
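The grouping that instance_col provides can be illustrated with a short Python sketch for automatic services. The rows in the example call are hypothetical, constructed only to match the totals listed above (the original usage fragment is not reproduced here), and the quantity column name is an assumption.

```python
from collections import defaultdict


def units_per_instance(rows, usages_col, consumption_col, instance_col):
    """Total units of consumption per (service key, instance)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[usages_col], row[instance_col])] += float(row[consumption_col])
    return dict(totals)


# Hypothetical rows chosen to match the totals described above
rows = [
    {"service_name": "Small VM", "vmid": "444", "quantity": "2"},
    {"service_name": "Small VM", "vmid": "444", "quantity": "3"},
    {"service_name": "Large VM", "vmid": "555", "quantity": "6"},
    {"service_name": "Large VM", "vmid": "666", "quantity": "4"},
]
print(units_per_instance(rows, "service_name", "quantity", "vmid"))
```

Summing by service key alone would give the Large VM total of 10 and lose the split between instances 555 and 666.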

description_col

If specified, the description_col denotes a column containing a friendly description for reports to identify the service with.

Typically in cloud usage data, services are identified using unique IDs (referred to as keys in Exivity) which are often non-meaningful to human eyes, so Exivity supports a 'friendly' description for each service for display purposes when generating a report.

For example description_col = description may be used in conjunction with the following data to map the service_id to a friendly name:

It is not mandatory to provide a description_col parameter. If one is not supplied then the description will be set to a duplicate of the service key (as derived via the usages_col parameter).

In the example above, it can be seen that there are multiple rows in the data for the same service key (vmid). When using description_col, the first row for each distinct value in the usages_col will be used to set the description.

category_col

Usage data normally contains information about a range of services of different types such as Virtual Machines, Storage, Networking and so on. By referencing a column in the usage data which identifies the correct category for each service, multiple categories will be created and each service assigned to the correct category by the services statement.

To illustrate this, let us extend the sample data as follows:

By specifying category_col = category each service will now be associated with the correct category.

interval

The interval parameter is used to specify a literal interval for all the services created by the services statement.

The interval parameter may be any of:

individually
daily
monthly

If the interval parameter is not specified, then a default interval of monthly will be used.

interval_col

In the event that different services in the usages_col require different charge intervals, a column name containing the interval to use may be specified using the interval_col parameter as follows:

By specifying interval_col = interval each service in the above usage data will be assigned the correct charge interval.

model

The model parameter is used to enable proration for monthly services. Either of unprorated or prorated may be specified.

If no model is specified, then a value of unprorated will be used by default.

model_col

In the event that different services in the usages_col require different proration settings, the model_col parameter can be used to specify which column contains the proration setting for each service.

By specifying model_col = model, each service in the above usage data will be assigned the correct proration model.

charge_model

If specified as peak then the charge for any monthly services created will be calculated based on the day with the highest charge in the month.

If specified as average then the charge for any monthly services created will be calculated as the average unit price (for days that have usage only) multiplied by the average quantity (days with no usage will be treated as if they had a quantity of 0).

If specified as last_day then the charge for any monthly services created will be calculated based on the last day of the month.

If specified as day_xxx (where xxx is a number in the range 1 to 28) then the charge for any monthly services created will be calculated based on the specified day of the month.
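The following minimal Python sketch shows how a monthly charge could be derived from daily usage under each of the four charge models. It assumes that a day's charge is its quantity multiplied by its unit price, and that a day without usage contributes a charge of 0 under last_day and day_xxx. The function monthly_charge and its input layout are hypothetical and are not part of Transcript.

```python
from typing import Dict, Tuple

# Hypothetical input: day of month -> (quantity, unit_price), for days with usage only.
Usage = Dict[int, Tuple[float, float]]


def monthly_charge(usage: Usage, days_in_month: int, model: str) -> float:
    """Sketch of the documented charge_model behaviour (assumes daily charge = qty * price)."""
    daily = {day: qty * price for day, (qty, price) in usage.items()}
    if model == "peak":
        # Charge of the day with the highest charge in the month.
        return max(daily.values(), default=0.0)
    if model == "average":
        if not usage:
            return 0.0
        # Average unit price over the days that have usage only.
        avg_price = sum(price for _, price in usage.values()) / len(usage)
        # Average quantity over the whole month; days without usage count as 0.
        avg_qty = sum(qty for qty, _ in usage.values()) / days_in_month
        return avg_price * avg_qty
    if model == "last_day":
        return daily.get(days_in_month, 0.0)
    if model.startswith("day_"):
        day = int(model[len("day_"):])
        if not 1 <= day <= 28:
            raise ValueError("day_xxx must be in the range 1 to 28")
        return daily.get(day, 0.0)
    raise ValueError(f"unknown charge model: {model}")
```

For example, with usage on days 1, 15 and 30 of a 30-day month, peak picks the largest single daily charge, while average multiplies the mean price of those three days by the total quantity divided by 30.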

charge_model_col

In the event that different services in the consumptions_col require different charge models, the charge_model_col parameter can be used to specify which column contains the charge_model setting for each service. For the example data below charge_model_col would be set to chargetype:

unit_label

The unit_label parameter is used by reports to provide a meaningful description of the units of consumption associated with a service. A virtual machine may have a unit label of Virtual Machines, but storage-related services may have a unit label of Gb for example.

If the unit_label parameter is not specified then a default label of Units will be used.

The unit label may be up to 63 characters in length. Longer values will be truncated.

unit_label_col

In cases where the services contained in the usages_col column collectively require more than one unit label, the unit_label_col parameter can be used to identify a column in the usage data which contains an appropriate label for each service.

For example unit_label_col = label can be used to associate an appropriate label using the data below:

The parameters rate_col, set_rate_using, cogs_col and set_cogs_using, (all of which are detailed below) collectively determine the types of charge that will be associated with the service definitions created by the services statement.

rate_col

The rate_col parameter is used to determine the column in the usage data which contains the unit rates for the service definitions created by the services statement.

As each service definition is created, an initial rate revision is also created which contains the column named by the rate_col parameter. When a report is run, for each day in the reporting range the unit rate for that day will be determined by whatever value is in the column named by the rate_col parameter in the usage data.

This means that only a single rate revision is required, even if the actual value in the rate_col column is different from day to day.

set_rate_using

The set_rate_using parameter is also used to determine the unit rate for each service. This differs from the rate_col parameter in that the values in the column named by set_rate_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

This means that the unit cost is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details).

Either of rate_col or set_rate_using (but not both) may be used in a single services statement.
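The difference between rate_col and set_rate_using can be sketched in Python. The data, the function names and the assumption that the service is created on 20180801 are all hypothetical. This models the documented behaviour and is not Exivity code.

```python
from typing import Dict

# Hypothetical usage data: data date (yyyyMMdd) -> value in the rate column that day.
rate_column_by_day: Dict[str, float] = {
    "20180801": 0.10,
    "20180802": 0.12,
    "20180803": 0.15,
}


def rate_col_price(day: str) -> float:
    """rate_col: the revision refers to the column, so the rate is looked up per day."""
    return rate_column_by_day[day]


# set_rate_using: the column is read once, when the service is created (assumed
# here to be 20180801), and the value is stored as a literal in the revision.
FROZEN_RATE = rate_column_by_day["20180801"]


def set_rate_using_price(day: str) -> float:
    """set_rate_using: `day` is ignored because the rate was fixed at creation time."""
    return FROZEN_RATE
```

With rate_col the rate follows the usage data from day to day, whereas with set_rate_using it stays at the creation-time value until a new rate revision takes effect.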

cogs_col

The cogs_col parameter is used to determine the column in the usage data which contains the COGS rate associated with the service definitions created by the services statement.

As each service definition is created, an initial rate revision is also created which contains the column named by the cogs_col parameter. When a report is run, for each day in the reporting range the COGS rate for that day will be determined by whatever value is in the column named by the cogs_col parameter in the usage data.

If a monthly service has different COGS rates for different days in the month, then whichever results in the highest charge will be used.

This means that only a single rate revision is required, even if the actual value in the cogs_col column is different from day to day.

set_cogs_using

The set_cogs_using parameter is also used to determine the COGS rate for each service. This differs from the cogs_col parameter in that the values in the column named by set_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

This means that the COGS rate is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details).

Either of cogs_col or set_cogs_using (but not both) may be used in a single services statement.

set_min_commit_using

The set_min_commit_using parameter is used to set the minimum commit value in the initial rate revision for each service.

The values in the column identified by set_min_commit_using are extracted from the usage data and used as numeric literals in the revision.

effective_date

When creating the initial rate revision for a service, the value specified by the effective_date parameter is interpreted as a yyyyMMdd value to determine the date from which the revision should be applied.

When using effective_date, the value will be used to set the initial rate revision date for all the service definitions created by the services statement. If different services require different effective dates then the effective_date_col parameter may be used to determine the effective date for each service from a column in the usage data.

effective_date_col

If there is a column in the usage data containing yyyyMMdd values representing the desired effective date for the initial revision of each service, the effective_date_col parameter may be used to extract the values from this column and set the effective date for each service accordingly.

Either of effective_date or effective_date_col may be specified in a single services statement, but not both.

Examples

if

Overview

The if statement is used to conditionally execute one or more statements.

Syntax

if (conditional expression) {
    <statements ...>
} [else {
    <statements ...>
}]

Conditional Expressions

A conditional expression (hereafter referred to simply as an expression) is evaluated to provide a TRUE or FALSE result, which in turn determines whether or not one or more statements are executed. The following are examples of valid expressions:

(${dataDate} == 20180801)

((${dataDate} >= 20180801) && ([hostname] == "templateVM"))

An expression used by the if statement may contain:

Numeric and string literals
Regular expressions
Variables
Operators
Functions

Numeric and string literals

A literal is a specified value, such as 4.5 or "hostname". Literals may be numbers or strings (text).

If a literal is non-quoted then it will be treated as a number if it represents a valid decimal integer or floating point number (in either regular or scientific notation), else it will be treated as a string.

If a literal is quoted then it is always treated as a string, thus 3.1415926 is a number and "3.1415926" is a string.
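The literal rules above can be modelled in a few lines of Python. The regular expression is an assumed reading of "a valid decimal integer or floating point number (in either regular or scientific notation)". Whether a leading sign belongs to the literal or acts as a unary operator is also an assumption. classify_literal is a hypothetical name.

```python
import re
from typing import Union

# Assumed grammar: optional sign, digits with optional fraction, optional exponent.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def classify_literal(token: str) -> Union[float, str]:
    """Return a float for numeric literals, otherwise the literal as a string."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        return token[1:-1]  # quoted literals are always strings
    if _NUMBER.fullmatch(token):
        return float(token)
    return token  # non-numeric, non-quoted literals are strings
```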

Regular expressions

Regular expressions must be enclosed within forward slashes (/), and are assumed to be in ECMAScript format.

If present, a regular expression must be used on the right-hand side of either an !~ or an =~ operator, and when evaluated it will be applied to the value on the left-hand side of that operator, for example:

if (${dataDate} =~ /[0-9]{6}01/) {
    var first_day_of_month = yes
} else {
    var first_day_of_month = no
}

As the forward slash is used as a delimiter for the expression, any literal forward slashes required by the expression should be escaped with a back-slash: \/

Variables

Variables can be used within expressions, in which case they are replaced with their values. Once expanded, these values are treated as literals.

Operators

Operators are evaluated according to the operator precedence rules in the table below (where the highest precedence is evaluated first), unless parentheses are used to override them. Operators with the same precedence are evaluated from left to right.

Although expressions are evaluated based on the precedence of each operator as listed in the table above, it is recommended that parentheses be used within the expression in order to remove any ambiguity on the part of a future reader.

Functions

A function is used to evaluate one or more arguments and return a result which is then taken into consideration when evaluating the overall truth of the expression.

Function calls start with the character @, followed by the function name and a parenthesised, comma-separated list of arguments, for example @MIN(1, 2, 3).

Function names must be specified in UPPER CASE as shown in the examples below.

The following functions are supported by the if statement:

Numeric functions

MIN

@MIN(number, number [, number ...])

Return the smallest number from the specified list (requires at least 2 arguments).

Examples:

@MIN(1,2) returns 1
@MIN(1,2,-3) returns -3
@MIN(1,2,"-1") returns -1 - string "-1" is converted to number -1
@MIN(1,2,3/6) returns 0.5
@MIN(1,2,"3/6") returns 1 - string "3/6" is converted to number 3, up to the first invalid character
@MIN(1,2,"zzz") returns 0 - string "zzz" is converted to number 0

MAX

@MAX(number, number [, number ...])

Return the largest number from the specified list (requires at least 2 arguments).

Examples:

@MAX(1,2) returns 2
@MAX(-1,-2,-3) returns -1
@MAX(1,2,100/10) returns 10
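The MIN and MAX examples suggest that string arguments are converted in the style of C's strtod. Digits are read up to the first invalid character, and the result is 0 when no number can be read. The Python sketch below models that assumption. at_min, at_max and to_number are hypothetical names, and the sketch is not the conversion routine Transcript actually uses.

```python
import re
from typing import Union

# Assumed strtod-like numeric prefix: sign, digits, optional fraction and exponent.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

Arg = Union[int, float, str]


def to_number(value: Arg) -> float:
    """Convert an argument to a number, reading a string up to its first invalid character."""
    if isinstance(value, (int, float)):
        return float(value)
    match = _NUMERIC_PREFIX.match(value)
    return float(match.group()) if match else 0.0


def at_min(*args: Arg) -> float:
    if len(args) < 2:
        raise ValueError("@MIN requires at least 2 arguments")
    return min(to_number(a) for a in args)


def at_max(*args: Arg) -> float:
    if len(args) < 2:
        raise ValueError("@MAX requires at least 2 arguments")
    return max(to_number(a) for a in args)
```

As in the Transcript examples, an unquoted 3/6 is evaluated as arithmetic before the call and therefore arrives as 0.5. A quoted "3/6" arrives as a string and is read as 3.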

ROUND

@ROUND(number [, digits])

Returns number rounded to digits decimal places. If the digits argument is not specified then the function will round to the nearest integer.

This function rounds half away from zero: for example, 0.5 is rounded to 1 and -0.5 is rounded to -1.

Examples:

@ROUND(3.1415,3) returns 3.142
@ROUND(3.1415,2) returns 3.14
@ROUND(3.1415926536,6) returns 3.141593
@ROUND(3.1415) returns 3
@ROUND(2.71828) returns 3
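Python's built-in round() rounds ties to even, so round(0.5) returns 0. A sketch that matches the documented half-away-from-zero behaviour can use the decimal module instead. at_round is a hypothetical name. Converting through str() is a design choice of this sketch, not documented Transcript behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP


def at_round(number: float, digits: int = 0) -> float:
    """Round half away from zero to `digits` decimal places.

    decimal.ROUND_HALF_UP rounds ties away from zero. Going through str()
    avoids binary artefacts such as 2.675 being stored as 2.67499999...
    """
    quantum = Decimal(1).scaleb(-digits)  # e.g. digits=3 -> Decimal('1E-3')
    return float(Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP))
```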

String functions

CONCAT

@CONCAT(string1, string2 [, stringN ...])

This function will treat all its arguments as strings, concatenate them and return the result.

Examples:

@CONCAT("the answer ", "is") returns the answer is
@CONCAT("the answer ", "is", " 42") returns the answer is 42
@CONCAT("the answer ", "is", " ", 42) returns the answer is 42

SUBSTR

@SUBSTR(string, start [, length])

Return a sub-string of string, starting from the character at position start (the first character is at position 1) and continuing for length characters or until the end of the string, whichever is shorter.

If length is omitted, then the portion of the string starting at position start and ending at the end of the string is returned.

Examples:

@SUBSTR("abcdef", 1) returns abcdef
@SUBSTR("abcdef", 3) returns cdef
@SUBSTR("abcdef", 3, 2) returns cd
@SUBSTR("abcdef", 3, 64) returns cdef
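The examples above imply that positions are 1-based. Here is a minimal Python sketch of that behaviour. at_substr is a hypothetical name. Rejecting a start below 1 is an assumption, because the behaviour for such values is not described.

```python
from typing import Optional


def at_substr(string: str, start: int, length: Optional[int] = None) -> str:
    """Return up to `length` characters of `string`, starting at 1-based position `start`."""
    if start < 1:
        raise ValueError("start is assumed to be 1 or greater")
    begin = start - 1  # convert the 1-based position to a Python index
    if length is None:
        return string[begin:]
    # Slicing past the end is safe, so length=64 simply returns the rest.
    return string[begin:begin + length]
```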

STRLEN

@STRLEN(string)

Returns the length of its argument in bytes.

Examples:

@STRLEN("foo") returns 3
@STRLEN(@CONCAT("ab", "cd")) returns 4
@STRLEN(1000000) returns 7 (the number 1000000 is treated as a string)

PAD

@PAD(width, value [, pad_char])

This function returns value, left-padded with pad_char (0 by default) up to the specified width. If width is less than or equal to the width of value, no padding occurs.

Examples:

@PAD(5, 123) returns 00123
@PAD(5, 12345) returns 12345
@PAD(1, 12345) returns 12345
@PAD(5, top, Z) returns ZZtop
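In Python, the documented behaviour of @PAD corresponds to str.rjust. The sketch below assumes pad_char is a single character. at_pad is a hypothetical name.

```python
def at_pad(width: int, value: object, pad_char: str = "0") -> str:
    """Left-pad str(value) with pad_char up to width; longer values are returned unchanged."""
    return str(value).rjust(width, pad_char)
```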

EXTRACT_BEFORE

@EXTRACT_BEFORE(string, pattern)

This function returns the substring of string that precedes pattern. If pattern cannot be found in string, or if either string or pattern is empty, the function returns an empty string.

Examples:

@EXTRACT_BEFORE("abcdef", "d") returns ab
@EXTRACT_BEFORE("abcbc", "bc") returns a
@EXTRACT_BEFORE("abcdef", "x") returns empty string

EXTRACT_AFTER

@EXTRACT_AFTER(string, pattern)

This function returns the substring of string that follows pattern. If pattern cannot be found in string, or if either string or pattern is empty, the function returns an empty string.

Examples:

@EXTRACT_AFTER("abcdef", "cd") returns ef
@EXTRACT_AFTER("abcabc", "ab") returns cabc
@EXTRACT_AFTER("abcdef", "abb") returns empty string

EXTRACT_XXX functions can be combined to extract the middle part of the string, for example @EXTRACT_AFTER(@EXTRACT_BEFORE("abcdef", "ef"), "ab") returns cd.

Date functions

All date functions operate on dates in yyyyMMdd format.

CURDATE

@CURDATE([format])

Returns the current (actual) date in the timezone of the Exivity server. The format may be any valid combination of strftime specifiers. The default format is %Y%m%d which returns a date in yyyyMMdd format.

Examples (assuming run date is 1 July 2019, at 12:34:56):

@CURDATE() returns 20190701
@CURDATE("%d-%b-%y") returns 01-Jul-19
@CURDATE("%H:%M:%S") returns 12:34:56
@CURDATE("%u") returns 1 (weekday - Monday)
@CURDATE("%j") returns 182 (day of the year)
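The format specifiers are the standard C strftime ones, so you can preview their effect with Python's datetime.strftime, which delegates to the platform's C library. This sketch pins the time to the example run date instead of reading the clock. It is not how Transcript obtains the current date. %u is omitted because not every platform's strftime supports it.

```python
from datetime import datetime

# Fixed "current" time matching the examples above: 1 July 2019, 12:34:56.
run_time = datetime(2019, 7, 1, 12, 34, 56)

print(run_time.strftime("%Y%m%d"))    # 20190701 (the default @CURDATE() format)
print(run_time.strftime("%d-%b-%y"))  # 01-Jul-19
print(run_time.strftime("%H:%M:%S"))  # 12:34:56
print(run_time.strftime("%j"))        # 182 (day of the year)
```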

DATEADD

@DATEADD(date, days)

Adds a specified number of days to the given date, returning the result as a yyyyMMdd date.

Invalid dates are normalised, where possible (see example below):

Examples:

@DATEADD(20180101, 31) returns 20180201
@DATEADD(20180101, 1) returns 20180102
@DATEADD(20171232, 1) returns 20180102 (the invalid date 20171232 is normalised to 20180101)
@DATEADD(20180101, 365) returns 20190101

DATEDIFF

@DATEDIFF(end_date, start_date)

Returns the difference in days between two yyyyMMdd dates. A positive result means that end_date is later than start_date, a negative result means that start_date is later than end_date, and a result of 0 means that the two dates are the same.

Invalid dates are normalised, when possible (see example below):

Examples:

@DATEDIFF(20190101, 20180101) returns 365
@DATEDIFF(20180201, 20180101) returns 31
@DATEDIFF(20180102, 20180101) returns 1
@DATEDIFF(20180101, 20180102) returns -1
@DATEDIFF(20180101, 20180101) returns 0
@DATEDIFF(20171232, 20180101) returns 0 (the invalid date 20171232 is normalised to 20180101)
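Here is a Python sketch of both functions. The normalisation rule carries surplus days and months forward in the way C's mktime does. That rule is an assumption inferred from the 20171232 example. parse_yyyymmdd, date_add and date_diff are hypothetical names.

```python
from datetime import date, timedelta


def parse_yyyymmdd(value: int) -> date:
    """Parse a yyyyMMdd value, carrying out-of-range months and days forward.

    Assumed normalisation (similar to C's mktime): 20171232 -> 2018-01-01.
    """
    year, month, day = value // 10000, value // 100 % 100, value % 100
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)


def to_yyyymmdd(d: date) -> int:
    return d.year * 10000 + d.month * 100 + d.day


def date_add(value: int, days: int) -> int:
    """Model of @DATEADD: add days to a (possibly invalid) yyyyMMdd date."""
    return to_yyyymmdd(parse_yyyymmdd(value) + timedelta(days=days))


def date_diff(end_date: int, start_date: int) -> int:
    """Model of @DATEDIFF: positive when end_date is later than start_date."""
    return (parse_yyyymmdd(end_date) - parse_yyyymmdd(start_date)).days
```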

DTADD

@DTADD(datetime, count [, unit])

This function adds count units (DAYS by default) to the specified datetime value and returns the normalised result as a datetime value in YYYYMMDDhhmmss format.

Datetime can be in any of the following formats:

YYYYMMDD
YYYYMMDDhh
YYYYMMDDhhmm
YYYYMMDDhhmmss

Any missing components of the datetime value are assumed to be zero.

Supported units are as follows (both singular and plural spellings are supported):

YEAR
MONTH
DAY (default)
HOUR
MINUTE
SECOND

Examples:

@DTADD(20190701, 2) returns 20190703000000
@DTADD(20190701, 2, HOURS) returns 20190701020000
@DTADD(2019070112, 50, DAYS) returns 20190820120000
@DTADD(20190701123456, 10, MONTH) returns 20200501123456
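The following Python sketch models @DTADD under an assumed normalisation rule. Missing components are zero-filled. YEAR and MONTH are added to the calendar fields before the date is normalised, and the fixed-length units are added as time spans. dt_add is a hypothetical name. How Transcript normalises a day that does not exist in the target month (for example 31 January plus one month) is not documented, so the mktime-style result here is an assumption.

```python
from datetime import datetime, timedelta

# Fixed-length units mapped to timedelta keyword arguments.
_FIXED_UNITS = {"DAY": "days", "HOUR": "hours", "MINUTE": "minutes", "SECOND": "seconds"}


def dt_add(value: str, count: int, unit: str = "DAYS") -> str:
    """Add count units to a YYYYMMDD[hh[mm[ss]]] value and return YYYYMMDDhhmmss."""
    padded = value.ljust(14, "0")  # missing components are assumed to be zero
    year, month, day = int(padded[0:4]), int(padded[4:6]), int(padded[6:8])
    hour, minute, second = int(padded[8:10]), int(padded[10:12]), int(padded[12:14])

    unit = unit.upper().rstrip("S")  # accept singular and plural spellings
    if unit == "YEAR":
        year += count
    elif unit == "MONTH":
        month += count
    elif unit not in _FIXED_UNITS:
        raise ValueError(f"unsupported unit: {unit}")

    # Carry excess months into the year, then normalise the day the way mktime
    # would (e.g. 31 January + 1 month -> 3 March in a non-leap year).
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    result = datetime(year, month, 1, hour, minute, second) + timedelta(days=day - 1)

    if unit in _FIXED_UNITS:
        result += timedelta(**{_FIXED_UNITS[unit]: count})
    return result.strftime("%Y%m%d%H%M%S")
```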

Transcript-specific functions

Transcript-specific functions may be preceded by an exclamation mark in order to negate their output. For example:

if (!@COLUMN_EXISTS("colName")) {
    # The column colName does NOT exist
}

FILE_EXISTS

@FILE_EXISTS(filename)

Returns 1 if the file filename exists, else returns 0.

FILE_EMPTY

@FILE_EMPTY(filename)

In strict mode, this function returns 1 if the file filename exists and is empty. If the file does not exist, then this is considered an error.

In permissive mode, a non-existent file is considered equivalent to an existing empty file.

In either case, if the file exists and is not empty, the function returns 0.

The FILE_EXISTS and FILE_EMPTY functions will only check files within the Exivity home directory and its sub-directories, so filename must contain a pathname relative to the Exivity home directory.
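The combined rules for @FILE_EMPTY can be sketched in Python as follows. These cover strict versus permissive mode and the home-directory restriction. file_empty and its mode argument are hypothetical. In Transcript the mode comes from option mode rather than from a function argument. Raising an error for a path that escapes the home directory is also an assumption of this sketch.

```python
from pathlib import Path


def file_empty(home: Path, filename: str, mode: str = "strict") -> int:
    """Sketch of @FILE_EMPTY: 1 if the file is empty (or missing in permissive mode), else 0."""
    home = home.resolve()
    path = (home / filename).resolve()
    # Only files inside the home directory (or its sub-directories) are checked.
    if path != home and home not in path.parents:
        raise ValueError(f"{filename} is outside the home directory")
    if not path.is_file():
        if mode == "strict":
            raise FileNotFoundError(filename)  # strict: a missing file is an error
        return 1  # permissive: a missing file counts as an existing empty file
    return 1 if path.stat().st_size == 0 else 0
```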

DSET_EXISTS

@DSET_EXISTS(dset.id)

Returns 1 if the specified DSET exists, else 0.

DSET_EMPTY

@DSET_EMPTY(dset.id)

In strict mode (option mode = strict), this function returns 1 if the specified DSET exists and is empty. If the DSET does not exist, then this is considered an error.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET.

In either case, if the DSET exists and is not empty, the function returns 0.

COLUMN_EXISTS

@COLUMN_EXISTS(column_name)

This function returns 1 if the specified column exists, else 0. The column name may be fully-qualified, but if it is not, then it is assumed to be in the default DSET.

DSET_ROWCOUNT

@DSET_ROWCOUNT(dset)

This function returns the number of rows in the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.

DSET_COLCOUNT

@DSET_COLCOUNT(dset)

This function returns the number of columns in the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.

import

Overview

The import statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.

To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.

Syntax

Import from CSV files

import filename from source [alias alias] [options { ... }]

import filename source custom_source [alias alias] [options { ... }]

Import from CCR files

import filename.ccr source custom_source [alias alias]

Import from database tables

import ACCOUNT source custom_source [alias alias]

import USAGE for date from dset source custom_source [alias alias]

Details

If using the Windows path delimiter - \ - it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character.

Options

Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.

The import filters are configured via the options parameter which, if present, is immediately followed by one or more name = value pairs enclosed within braces, for example:

import system/extracted/example.csv source test alias data options {
    skip = 1
}

The = sign is optional, but if present it must be surrounded by whitespace.

Multiple options may be specified, and each option must be placed on a separate line, for example:

import system/extracted/example.csv source test alias data options { 
    skip = 1
    omit = 4
}

The options supported by import are as follows:

If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows.

The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:

import system/extracted/example.csv source test alias data options { 
    # Only import rows where the vmType column is not the value "Template"
    filter ([vmType] != Template)
}

To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:

import system/extracted/example.csv source test alias data options { 
    filter (["service name"] =~ /.*D/)
}

File name / pattern

If a pattern is specified in the import options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.

Only filenames may contain a regular expression; directory names are always treated literally.

If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).

If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.
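Here is a Python sketch of pattern-based import. It makes three assumptions that the text above does not specify. The pattern must match the whole filename (Python's re.fullmatch). Matching files are appended in sorted name order. The first line of each file is its header. Python's re syntax differs from ECMAScript in places, and the csv module's quoting rules are not identical to Transcript's. import_matching is a hypothetical name.

```python
import csv
import re
from pathlib import Path
from typing import List, Optional


def import_matching(directory: Path, pattern: str) -> List[List[str]]:
    """Append all files in `directory` whose name matches `pattern` into one table.

    The directory is used literally; only the filename is a regular expression.
    Every file must have the same header as the first matching file.
    """
    regex = re.compile(pattern)
    header: Optional[List[str]] = None
    rows: List[List[str]] = []
    for path in sorted(directory.iterdir()):
        if not path.is_file() or not regex.fullmatch(path.name):
            continue
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            file_header = next(reader, [])
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name}: columns differ from the first matching file")
            rows.extend(reader)
    return [header] + rows if header is not None else []
```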

Database tables

Account data

To import all accounts from Exivity the following statement is used:

import ACCOUNT source custom_source [alias alias]

This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.

Usage data

To import usage data previously written to an RDF using finish the following statement is used:

import USAGE for date from dset source custom_source [alias alias]

This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.

The date must be in yyyyMMdd format. The RDF from which the usage will be imported is:

<basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf.

To illustrate this in practice, the statement ...

import USAGE for 20180501 from azure.usage source test alias data

... will load the usage data from ...

<basedir>/system/report/2018/05/01_azure.usage_usage.rdf

... into a DSET called test.data.
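The RDF location is simple string assembly from the date and the DSET name. Here is a Python sketch. usage_rdf_path is a hypothetical helper, not a Transcript function, and basedir is a placeholder.

```python
def usage_rdf_path(basedir: str, date: str, dset: str) -> str:
    """Build <basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be in yyyyMMdd format")
    year, month, day = date[:4], date[4:6], date[6:]
    return f"{basedir}/system/report/{year}/{month}/{day}_{dset}_usage.rdf"
```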

Source and Alias tags

A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.

There are two main variations of the import statement which are as follows:

Automatic source tagging

import filename from source [alias alias]

By convention, data retrieved by the USE extractor from external sources is located in a file called

<basedir>/system/extracted/<source>/<yyyy>/<MM>/<dd>_<filename>.csv

where <yyyy> is the year, <MM> is the month and <dd> is the day of the current data date.

Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.

By default the alias tag will be set to the filename, minus the <dd>_ portion of that filename and the .csv extension, but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import usage from Azure

will create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv.

The statement:

import usage from Azure alias custom

will create a DSET called Azure.custom from the file

<basedir>/system/extracted/Azure/2017/02/23_usage.csv.
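Here is a Python sketch of the automatic tagging convention. It reconstructs both the file path and the DSET name. automatic_import_target is a hypothetical helper, and basedir is a placeholder.

```python
from typing import Optional, Tuple


def automatic_import_target(basedir: str, filename: str, source: str, data_date: str,
                            alias: Optional[str] = None) -> Tuple[str, str]:
    """Return (path, DSET name) for `import filename from source [alias alias]`."""
    yyyy, mm, dd = data_date[:4], data_date[4:6], data_date[6:8]
    path = f"{basedir}/system/extracted/{source}/{yyyy}/{mm}/{dd}_{filename}.csv"
    # By default the alias is the file name minus the "<dd>_" prefix and ".csv"
    # extension, which is the filename parameter itself.
    return path, f"{source}.{alias or filename}"
```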

Manual source tagging

import filename source custom_source [alias alias]

This form of the statement will import filename and the source tag will be set to the value of the custom_source parameter.

By default the alias tag will be set to the filename minus the .csv extension but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

will create a DSET called Azure.usage from the file

<basedir>/system/extracted/Azure/20170223.csv

Importing CCR files

Options will be ignored when importing CCR files.

When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr filename extension as follows:

import system/extracted/myccrfile.ccr source MyData

Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.

In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.

When importing a CCR file, the quote and separator characters are always a double quote - " - and a comma - , - respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.

Quote and Separator options

The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct options are set before importing a dataset in order to avoid unwanted side effects.

The options relating to import are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.

If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.

If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.
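Here is a minimal Python sketch of the quote rule, applied to fields that have already been split on the separator. strip_quotes is a hypothetical name. The sketch does not model the splitting itself, and it does not model the conversion of separators in headings to underscores.

```python
from typing import List, Optional


def strip_quotes(fields: List[str], quote: Optional[str] = '"') -> List[str]:
    """Remove the outer quote character from fields that begin and end with it.

    Inner quote characters are preserved. With quote=None the fields are
    returned unchanged, i.e. quote characters are ignored.
    """
    if not quote:
        return list(fields)
    return [
        f[1:-1] if len(f) >= 2 and f.startswith(quote) and f.endswith(quote) else f
        for f in fields
    ]
```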

Embeds

If the embed option is set, then a curly bracket in the data - { - will cause all characters (including newlines) up until the closing bracket - } - to be imported. Nested brackets are supported, although if there is an uneven number of brackets before the end of the line, an error will be generated in the logfile and the task will fail.
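Nested bracket tracking of this kind can be done with a depth counter. The Python sketch below joins physical lines into records while an embed is open. read_records is a hypothetical name. It reports an imbalance when the input runs out, which may not be exactly when Transcript reports it.

```python
from typing import Iterable, Iterator, List


def read_records(lines: Iterable[str]) -> Iterator[str]:
    """Join physical lines into records, letting {...} embeds span newlines.

    Nested brackets are tracked with a depth counter; input that ends inside
    an embed, or that closes a bracket that was never opened, is an error.
    """
    depth = 0
    pending: List[str] = []
    for line in lines:
        for char in line:
            if char == "{":
                depth += 1
            elif char == "}":
                depth -= 1
                if depth < 0:
                    raise ValueError("closing bracket without a matching opening bracket")
        pending.append(line)
        if depth == 0:
            # The embed (if any) is closed, so this record is complete.
            yield "\n".join(pending)
            pending = []
    if depth != 0:
        raise ValueError("unbalanced brackets at end of input")
```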

Embeds are not supported in CCR files.

Examples

Create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv:

import usage from Azure

Create a DSET called Azure.custom from the file /system/extracted/Azure/2017/02/23_usage.csv:

import usage from Azure alias custom

Create a DSET called Azure.usage from the file /system/extracted/Azure/20170223.csv:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage