default

Overview

The default statement is used to explicitly define the default DSET to use when specifying a column name as an argument in subsequent statements.

Syntax

default dset source.alias

Details

Given that multiple DSETs can be loaded at once, it is necessary to specify which DSET to operate on when performing actions such as creating a new column with the create statement. A column name in a Transcript statement is assumed to belong to the default DSET unless it is a fully qualified column name.

If there is no default statement in the Transcript, then the first CSV file imported via the import statement will automatically be designated as the default DSET.

The default statement can be used multiple times throughout a Transcript, in which case the default DSET will be whichever was specified by the last default statement executed.

Lastly, when executing a finish statement, unless otherwise specified, the default DSET will be used to populate the reporting database created as a result.

When changing the default DSET, any service definitions that referenced the default DSET at the time they were created will be updated with the new default DSET.

Examples

Set custom.datafile as the default DSET:

default dset custom.datafile
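Because the default DSET can be changed part-way through a task, statements that take unqualified column names switch scope along with it. A short sketch (the DSET IDs and column names here are hypothetical):

default dset Azure.usage
capitalise heading in column region        # operates on Azure.usage.region

default dset custom.datafile
capitalise heading in column region        # now operates on custom.datafile.region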

capitalise

Overview

The capitalise (the spelling capitalize is also supported) statement is used to modify the name of a column and/or the values in a column such that the first character is a capital letter and the remaining characters are lower case.

Syntax

capitalise values|heading [and values|heading] in column|columns ColName1 [... ColNameN]

After the keyword in, either of the keywords column or columns may be used

Details

The heading and values keywords refer to the name of an existing column and the values in each row for that column respectively.

Only the first character in the column name or value is converted to upper case. If this character is not a letter, then the statement will have no effect on it. For example, applying the statement capitalise heading in column _vmname to a column already named _vmname will have no effect on the column name, as the first character is an underscore. However, applying the same statement to a column called _VMName would result in a new name of _vmname: after attempting to make the first character a capital (which in the case of the underscore has no effect), the remaining characters are converted to lower case.

Any number of column names may be specified, and any of these may or may not be fully qualified. When applied to values in a column, blank values are ignored.

Examples
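A brief illustration (the column names are hypothetical; further capitalise examples appear later in this article):

capitalise heading in column servicename               # the column is renamed to 'Servicename'
capitalise values in column region                     # 'north europe' becomes 'North europe'
capitalise heading and values in columns region servicename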

finish

Overview

The finish statement creates a Reporting Database File (RDF) from a DSET. The RDF can subsequently be used by the reporting engine.

Syntax

finish [dset.id]

Details

The finish statement is used to create an RDF from a DSET. Only a single DSET can be used to create an RDF, but multiple finish statements may be used within the same task file. If there is no dset.id parameter then the default DSET will be used.

The RDF created by finish will be saved as <BaseDir>\system\report\<yyyy>\<MM>\<dd>_source.alias.rdf

where:

  • <yyyy> is the 4-digit year

  • <MM> is the 2-digit month

  • <dd> is the 2-digit day

  • source.alias are the source and alias tags which form the DSET ID

Any existing RDF with the same name will be overwritten.

Examples

Create a Reporting Database file for the default DSET: finish

Create a Reporting Database file for the DSET Azure.usage: finish Azure.usage

rename

Overview

The rename statement is used to change the name of an existing column in a DSET, or to change the source and/or alias of a DSET.

set

Overview

The set statement is used to write a specified value into all the cells in any given column, or to copy values from one column to another.

calculate

Overview

The calculate statement is used to perform arithmetic operations using literal and column values.

convert

Overview

The convert statement is used to convert values in a column from base-10 to base-16 or vice-versa.


Syntax

calculate column ResultCol as source operation source

where source is either column ColName or value literal_value

and operation is one of the characters + - * / % for addition, subtraction, multiplication, division and modulo respectively.

There must be whitespace on each side of the operation character

Examples:

calculate column ResultCol as column Amount * value 1.2
calculate column Net as column total - column cogs
calculate column constant_7 as value 3.5 + value 3.5

Details

The ResultCol parameter is the name of the column that will hold the results. This column may or may not exist (if necessary it will be created automatically).

Both of the two source parameters can specify a literal value, or the name of a column containing the value to use when performing the calculation.

  • A literal value is specified using value N where N is the literal number required

  • A column name is specified using column ColName where ColName is the name of the column containing the values required

The ResultCol may be the same as a column specified by one of the source parameters in which case any existing values in it will be updated with the result of the calculation.

Additional notes:

  • Any blank or non-numeric values in a source column will be treated as 0

  • An attempt to divide by zero will result in 0

  • When performing a modulo operation, the two source values are rounded to the nearest integer first

  • If the result column already exists and option overwrite is set to no, then only blank cells in the result column will be updated.

Examples

  • Add 1.5 to the values in the Rate column:

calculate column Rate as column Rate + value 1.5

  • Multiply the values in the Rate column by those in the Quantity column

  • Store the result in a new column called Charge

calculate column Charge as column Rate * column Quantity
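As noted above, modulo operands are rounded to the nearest integer before the operation is applied. A hypothetical illustration (the column names are placeholders):

# Hours that do not make up a whole day
calculate column PartialHours as column Hours % value 24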

Syntax

rename column OldName to NewName

rename dset OldSource.OldAlias to NewSource.NewAlias

Details

Renaming a column

When renaming a column, OldName may be a fully qualified column name. If it is not fully qualified then the column OldName in the default DSET will be renamed.

The NewName argument must not be fully qualified.

Any dots (.) in the new column name will automatically be replaced by underscores (_), as a dot is a reserved character used to implement DSET namespaces.

Renaming a DSET

When renaming a DSET, NewSource.NewAlias must contain exactly one dot and must not be the name of an existing DSET. The rename takes immediate effect, thus any subsequent reference to the renamed DSET in the transcript task must use its new name.

Any pending service definitions referencing the renamed DSET will automatically be updated with the new name.

Examples

Renaming columns:

Renaming a DSET:

Syntax

set ColName to Value

set ColName as SrcColName

set ColName = Expression

Details

If the keyword to is used then column ColName will be updated using the constant value specified in Value.

If the word as is used then the values in the column SrcColName will be copied into column ColName.

If = is used then the expression is evaluated and the column values are set to the result. For a complete list of functions that can be used in an expression please refer to the list of functions in the article about the if statement

The ColName argument must identify an existing column. The modified column will be in the default DSET unless it is a fully qualified column name.

If the overwrite option is disabled, then only cells with blank values will be updated with the new value.

Examples

Set the column called Flag in the default DSET to a value of 0:

Set the column called Flag to be the value in the column status:

Set the column called DiskTier to the 4th character in the existing value of that column

set DiskTier = @SUBSTR([DiskTier],4,1)

Set the column called Flag in the DSET custom.dataset to a value of Pending:

Set the column Flag to free if the rate column contains a value of 0:

Fill any blank values in the column username in the DSET Azure.usage with the value Unknown:

Syntax

convert colName to decimal|hex from decimal|hex

The keywords dec and decimal, and the keywords hex and hexadecimal, are equivalent.

Details

When converting values in a column, the following considerations apply:

  • Values in the column are replaced with the converted values

  • The colName argument must reference an existing column, and may optionally be fully qualified (else the column is assumed to be in the default DSET)

  • If any values in the column are not valid numbers, they will be treated as 0

  • Blank values are ignored

  • The convert statement may be used in the body of a where statement

  • If a value in colName contains a partially correct value such as 123xyz then it will be treated as a number up to the first invalid character, in this case resulting in a value of 123.

  • The hex digits in the original value can be either upper or lower case

  • The hex digits from A-F will be rendered in upper case in the converted output

  • The convert statement only supports integer values (floating point values will be floored to an integer)

Example

capitalise heading in column _vmname
capitalise heading in column servicegroup

capitalize heading and values in columns servicegroup Azure.usage.resourcepool chargeinterval

capitalise values in column Region
# Rename the column 'UnitCost' in the default DSET to 'Cost'
rename column UnitCost to Cost

# Rename the column 'UnitCost' in the DSET 'custom.prices' to 'CustomCost'
rename column custom.prices.UnitCost to CustomCost
# Usage file has yyyyMMdd in filename
import "system/extracted/AzureJuly/${dataDate}.ccr" source AzureJuly

# The resulting DSET will have an ID of 'AzureJuly.yyyyMMdd'
# where yyyyMMdd is the current dataDate

#
# Any additional processing would be done here
#

# Remove the yyyyMMdd from the DSET ID
rename dset AzureJuly.${dataDate} to Azure.July

# Create 'Azure.July.rdf'
finish
set Flag to 0
set Flag as status
set custom.dataset.Flag to Pending
where ([rate] == 0) {
    set Flag to free
}
option overwrite = no
set Azure.usage.username to Unknown
convert decimal_count from decimal to hex
convert unique_id from hexadecimal to dec

copy

This article covers both the copy and move statements. They both work in the same way apart from the fact that move deletes the source row after copying it.

Overview

The copy statement is used to copy rows from one DSET to another.

Syntax

copy rows to dset.id

move rows to dset.id

Details

Both copy and move must be used within the body of a where statement. Only rows that match the expression will be copied (or moved).

  • The DSET from which rows will be copied or moved is automatically determined from the expression used by the where statement.

  • The DSET to which rows will be copied or moved is determined by the dset.id parameter

The source and destination DSETs must be different (it is not possible to copy or move a row within the same DSET).

The destination DSET may or may not exist. If it does not exist then it will be created. If it does exist then the following logic is applied:

  • If the destination DSET has more columns than the source DSET then the new rows in the destination DSET will have blank values in the rightmost columns

  • If the destination DSET has fewer columns than the source DSET then the destination DSET will be extended with enough new columns to accommodate the new rows. In this case, existing rows in the destination DSET will have blank values in the rightmost columns

If the destination DSET is extended to accommodate the source rows then the new (rightmost) columns will have the same names as the equivalent columns in the source DSET. In the event that this would cause a naming conflict with existing columns in the destination DSET, one or more new columns in the destination DSET will have a suffix added to their name to ensure uniqueness. This suffix takes the form _N where N is a number starting at 2.

To illustrate this, if the source DSET has columns called subscription,name,address,hostname and the destination DSET has a single column called name then the resulting extended destination DSET would have columns called name,subscription,name_2,address,hostname.

Example
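The worked example later in this article uses move; a copy works in the same way but leaves the source rows in place. A sketch with hypothetical DSET and column names:

# Copy forecast rows into a separate DSET, keeping them in Azure.usage as well
where ([Azure.usage.cost_type] == "forecast") {
    copy rows to Azure.forecast
}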

normalise

Overview

The normalise statement is used to update the values in a numerical column such that they are all positive, negative or inverted.

In this documentation the spelling normalise is used, but normalize may also be used. The functionality is identical in either case.

append

Overview

The append statement is used to append one DSET to the end of another.

export

Overview

The export statement is used to snapshot the data in a DSET and write it to disk as a Dataset.

lowercase

Overview

The lowercase statement is used to modify the name of, and/or the values in, a column such that any upper case characters are replaced with their lower case equivalent.


Syntax

normalise column colName as positive

normalise column colName as negative

normalise column colName as invert

normalise column colName as standard

Details

The normalise statement processes each value in the column called colName and applies the following logic based on the last argument shown above as follows:

Argument | Result
positive | All negative numbers are replaced with their positive equivalent. Non-negative numbers are left unmodified.
negative | All positive numbers are replaced with their negative equivalent. Negative numbers are left unmodified.
invert | All positive numbers are replaced with their negative equivalent, and all negative numbers are replaced with their positive equivalent.
standard | All non-blank values are assumed to be a decimal number and are replaced with that value in conventional notation. This functionality is intended to provide a means to convert numbers in scientific notation such as 2.1E-5 to conventional notation such as 0.000021.

In order to be considered a number, a value in the colName column must start with any of the characters +, -, . or 0 to 9 and may contain a single . character which is interpreted as a decimal point.

If a value in colName is non-numeric or blank it is left intact

When using standard, all non-blank values are assumed to be numeric, and as such any non-numeric values will be changed to a numeric zero.

Additionally:

  • Any numerical value in colName which starts with a +, . or numeric (0 to 9) character is considered positive

  • Any numerical value in colName which starts with a - character is considered negative

  • When using standard the resulting conventional number will be accurate up to 14 decimal places

The normalise statement ignores the option overwrite setting, as its sole purpose is to modify existing values.

Example
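In addition to the invert example shown later in this article, the standard argument can be used to rewrite scientific notation; a sketch with a hypothetical column name:

# '2.1E-5' becomes '0.000021'
normalise column quantity as standard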

Syntax

append source_dset.id to destination_dset.id

Details

If the source DSET has any column names not present in the destination DSET then additional columns are automatically created in the destination DSET. These additional columns will contain blank values by default.

If one or more column names are present in both DSETs then the columns copied from the source DSET may be re-ordered into the same order as that used by the destination DSET.

At the end of the operation, the destination DSET will contain all the data from both DSETs, and the source DSET is unchanged.

Both DSETs must exist and both should have data. To verify that a DSET exists, or to check whether a DSET is empty, use one of the following functions:

  • @DSET_EMPTY

  • @DSET_EXISTS

Additionally, it is not possible to append a DSET to itself.
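For illustration, an append can be guarded with the functions listed above. This sketch uses the DSET IDs from the example below and assumes that @DSET_EXISTS takes a DSET ID as its argument:

if (@DSET_EXISTS(example2.data)) {
    append example2.data to example.data
}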

Example

Given the following DSETs:

DSET ID: example.data

DSET ID: example2.data

The statement append example2.data to example.data will result in the following destination DSET (example.data):

Syntax

export source.alias as filename

Details

The exported file will be created under <base_dir>/exported. The filename parameter may include a path as well as the filename to export, so long as it does not contain the substring ".." and is not an absolute path. If the path contains one or more directories that do not exist then they will be created.
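A minimal sketch (the DSET ID and path are hypothetical): the following would write a snapshot of the DSET Azure.usage to <base_dir>/exported/azure/2018/usage_snapshot.csv, creating the directories if needed:

export Azure.usage as "azure/2018/usage_snapshot.csv"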

Exporting usage

The source.alias argument is the DSET ID to export (see Core concepts for more information on DSET IDs).

The export statement can be used at any point during this process to save a CSV (conforming to Dataset format) which snapshots the data in the DSET at that moment in time. This can be used for a number of purposes including:

  • Examining the state of data part-way through processing for debugging purposes

  • Creating custom exports for subsequent import into 3rd party systems

  • Producing output CSV files (which may potentially contain merged data from multiple sources) for subsequent processing with another Transcript

Examples

When specifying a Windows path it is advisable to use UNIX-style forward slashes as the path delimiters, or to put the path+filename in double quotes, to avoid occurrences of \t being interpreted as a TAB character

The following Transcript will import a quoted CSV file, add a ProcessedFlag column to it with a value of 1 in each row, and save it back out again without quotes:

Syntax

lowercase heading|values [and heading|values] in column ColName1 [... ColNameN]

Although the syntax shown uses the keyword column, either column or columns may be specified. Both work in an identical manner.

Details

The heading and values keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.

Only upper case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.

Examples

lowercase heading in column servicegroup

lowercase heading and values in columns servicegroup VMWare.usage.resourcepool chargeinterval

lowercase values in column Region
# Move all rows in usage.data where hostname is "test server"
# to a new DSET called test.servers

where ([usage.data.hostname] == "test server") {
    move rows to test.servers
}
import "system/extracted/csp_usage.csv" source test alias data

# Invert all numerical values in column 'quantity'
normalise column quantity as invert
VMSize,RAM,CPU
small,2,2
medium,4,2
large,8,4
huge,16,8
VMSize,storage
small,50
medium,100
large,500
huge,1000
VMSize,RAM,CPU,storage
small,2,2,
medium,4,2,
large,8,4,
huge,16,8,
small,,,50
medium,,,100
large,,,500
huge,,,1000
option quote = \"
use usage from azure
create column ProcessedFlag value 1
option noquote

# Will be exported as <basedir>\exported\azure\flagged.csv
export azure.usage as "azure\flagged.csv"

aggregate

Overview

The aggregate statement is used to reduce the number of rows in a DSET while preserving required information within them.

Syntax

aggregate [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]

Details

The aggregate statement is a powerful tool for reducing the number of rows in a DSET. Aggregation is based on the concept of matching rows. Any two rows that match may be merged into a single row which selectively retains information from both of the original rows. Any further rows that match may also be merged into the same result row.

A quick introduction

A match is determined by comparing all the columns which have a function of match associated with them (further information regarding this can be found below). If all the column values match, then the rows are merged.

Merging involves examining all the columns in the data that were not used in the matching process. For each of those columns, it applies a function to the values in the two rows and updates the result row with the computed result of that function. For a full list of functions, please refer to the table further down in this article.

To illustrate this consider the following two row dataset:

If we don't care about the colour value in the above records, we can combine them together. We do care about the quantity however, so we'll add the two values together to get the final result.

The statement to do this is:

aggregate notime id match location match quantity sum

  • id match means that the values in the id columns must be the same

  • location match means that the values in the location columns must be the same

  • quantity sum means that the resulting value should be the sum of the two existing values

Applying these rules to the above example we get the following single result record:

A column called EXIVITY_AGGR_COUNT is automatically created by the aggregate statement and for each row in the output it will contain the number of source rows that were merged together to create that result row

Parameters

The aggregate statement accepts a range of parameters as summarised in the table below:

Parameter | Notes
dset.id | If not specified then the default DSET will be used
notime | (Either notime or daily is required) If used, timestamps in records are not taken into consideration when aggregating
daily | (Either notime or daily is required) If used, specifies that timestamps in the records will be considered when aggregating
offset | (May only be used if daily is present) The number of hours to shift timestamps by prior to aggregation
nudge | (May only be used if daily is present) If present, the times in the timestamp column marked as the end time column will have 1 second shaved off them prior to aggregation
default_function | Specifies the default logic to apply to a column when merging records together. If not specified then the default is first (see table below): by default, a function of first is applied to the columns, such that the original row retains its value
colname function | One or more pairs of column + function parameters. For each pair, the specified function will be used for the specified column name when merging records together during the aggregation process. For any columns not explicitly named, the default function will be applied.

If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:

Function | Logic
match | The value in both records must be the same
first | The existing value in the first ever result record will be used
last | The value in the last record merged will be used
sum | The values will be treated as numbers and summed
max | The values will be treated as numbers and the greatest will be used
min | The values will be treated as numbers and the smallest will be used
longest | Whichever value has the most characters in it will be used
shortest | Whichever value has the least characters in it will be used
blank | The value in the resulting merged record will be blank
avg | The values will be treated as numbers and the average will be used

Non time-sensitive aggregation

When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.

In this case when comparing two rows to see if they can be merged, the aggregation function simply checks to see if all the columns with a function of match are the same, and if they are the two rows are merged into one by applying the appropriate function to each column in turn.

De-duplication

The following illustrates the aggregate statement being used to remove duplicate rows from a DSET:

The analysis of the statement above is as follows:

  • notime - we are not interested in timestamps

  • default_function match - by default every column has to match before records can be aggregated

  • subscription_id match - this is effectively redundant as the default_function is match but needs to be present because at least one pair of colname function parameters is required by the aggregate statement

The resulting DSET will have no duplicate data rows, as each group of rows whose column values were the same were collapsed into a single record.

Row reduction while preserving data

The example shown at the top of this article used the sum function to add up the two quantity values, resulting in the same total at the expense of being able to say which source record contributed which value to that total.

The sum function can therefore accurately reflect the values in a number of source rows, albeit with the above limitation. By using a function of sum, max or min, various columns can be processed by aggregate in a meaningful manner, depending on the specific use case.

Time-sensitive aggregation

When aggregating, columns containing start time and end time values in UNIX epoch format can be specified. Each record in the DSET therefore has start and end time markers defining the period of time that the usage in the record represents. As well as taking the start times and end times into account, time-sensitive aggregation can perform additional manipulations on these start and end times.

A quick example

Consider the following CSV file called aggregate_test.csv:

It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:

Resulting in:

As can be seen, for each unique combination of the values in the id,subscription-id and service columns, the start and end times have been adjusted as described above and the quantity column contains the sum of all the values in the original rows.

When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.

Further notes

The daily parameter to aggregate means that the START_TIME and END_TIME columns are now recognised as containing timestamps. When aggregating with the daily option, timestamps within the current dataDate are combined to result in an output record which has the earliest start time and the latest end time seen within the day.

Optionally, following daily an offset may be specified as follows:

aggregate aggr.test daily offset 2 id match subscription_id match quantity sum

In this case the start and end timestamps are adjusted by the number of hours specified after the word offset before aggregation is performed. This permits processing of data which has timestamps with timezone information in them, and which may start at 22:00:00 of the first day and end at 21:59:59 of the second day, as an offset can be applied to realign the records with the appropriate number of hours to compensate.

The nudge parameter shaves 1 second off end times before aggregating in order to avoid conflicts where hourly records start and end on 00:00:00 (the last second of the current hour is the same as the first second of the next hour).

round

Overview

The round statement is used to ensure that numeric values in a column are whole multiples of a specified number.

Syntax

round colName [direction] [to nearest value]

Details

The round statement is used to round numbers to the nearest multiple of any integer or floating point number.

The parameters supported by round are as follows:

Parameter | Default | Purpose
colName | n/a | The name of the column containing the values to round
direction | up | Whether to round to the next highest (up) or lowest (down) multiple of value
value | 1 | A value determining the granularity of the rounding

The simplest form of the statement is round colName. This will use the defaults shown in the table above such that values are rounded to the next highest integer. Alternatively, the statement round colName down will round down to the nearest integer.

If the value argument is provided then the numbers in colName will be rounded up or down to the nearest multiple of value. For example the statement ...

round Quantity up to nearest 5

... will round the numbers in the column Quantity up to the next multiple of 5. Thus any number in Quantity higher than 10 and less than 15 will be rounded to 15.

When specifying the value argument, floating point values are supported.

Additional notes

The round statement observes the following rules:

  • Non-numeric values in colName are ignored

  • Blank values, or a value of 0 in colName are ignored

  • Numbers in colName that are already a whole multiple of value will not be modified

The round statement may be used in the body of a where statement to perform conditional rounding.

Examples
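A brief illustration (the column name is hypothetical); a fuller set of examples appears later in this article:

round Price up to nearest 0.05     # eg: 1.23 becomes 1.25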

Language

Syntax

Within the individual reference articles for each statement, the syntax is described using the following conventions:

  • bold for keywords

  • italics for arguments

  • Square brackets for optional keywords and arguments [like this]

  • Vertical pipe for alternative keyword options just|exactly as shown

  • Ellipses for a variable length list of arguments: Column1 ... ColumnN
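For instance, the syntax description for the round statement, round colName [direction] [to nearest value], permits statements such as:

round Quantity
round Quantity down
round Quantity up to nearest 5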

Refer to the Core concepts page for more information regarding datasets, fully qualified column names and related information.

Reference

The following statements (in alphabetical order) are supported by Transcript:

Statement | Description
aggregate | Reduce the number of rows in a DSET while preserving information
append | Append one DSET to the end of another
calculate | Perform arithmetic on column values
capitalise / capitalize | Capitalise column name and/or values
convert | Convert between decimal and hex values
copy | Copy rows from one DSET to another
correlate | Merge DSETs using a key
create | Create one or more columns
default | Specify the default DSET
delete | Delete columns, rows or DSETs
export | Snapshot a DSET to disk
finish | Create a Reporting Database File
if | Conditionally execute statements
import | Import a Dataset or CCR file
include | Execute one task from within another
lowercase | Convert column name and/or values to lower case
move | Move rows from one DSET to another
normalise / normalize | Normalise strings
option | Set global parameters
rename | Rename an existing column or DSET
replace | Search and replace values in a column
round | Round numeric values in a column
service | Create a chargeable service
services | Create multiple chargeable services
set | Set cell values in a column
split | Split column values
timecolumns | Set the start time and end time columns
timerender | Render a UNIX timestamp in human-readable form
timestamp | Create a timestamp column
update_service | Modify one or more existing service descriptions and/or unit labels
uppercase | Convert column name and/or values to upper case
use | See import
var | Define a variable
where | Define a local filter

split

Overview

The split statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.

include

Overview

The include statement is used to combine Transcript task files together.

Syntax

delete

Overview

The delete statement is used to delete one or more columns or rows from one or more DSETs.

uppercase

Overview

The uppercase statement is used to modify the name of, and/or the values in, a column such that any lower case characters are replaced with their upper case equivalent.

terminate

The terminate statement will exit the transcript task immediately.

Syntax

terminate


include taskfile

Details

The include statement is used to combine two or more task files into a single script prior to commencement of execution.

If using the Windows path delimiter - \ - it is advisable to put the path+filename in double quotes to avoid the backslash being interpreted as an escape character

Before executing a task file, Transcript analyses the script and processes all the include statements. Wherever an include statement is encountered, the specified taskfile is read from disk and inserted into the script in place of the include statement itself.

Command placement

As with most statements, include must occupy a single line of its own in the script.

Additional notes

The taskfile argument is treated as relative to <basedir>/system/config/transcript/.

Multiple files may be included in a task file and any included file may include other files but a check is done to ensure that no infinite include loops are generated.

Example

Given the following two files in /system/config/transcript/ ...

Prior to execution the pre-processor will combine these into the single memory-resident Transcript task:

Syntax

delete columns [except] ColName1 [... ColNameN]

delete blankcolumns

delete rows

delete dset dset_id

Details

Deleting columns

The list of column names following the initial delete columns statement may be used in one of two ways:

  1. The columns listed will be deleted

  2. All columns except those listed will be deleted

Which method to use is determined by the presence (or otherwise) of the except keyword:

When using the except keyword, all column names following it must belong to the same DSET

It is not possible to delete a column from a DSET which only contains a single column. When deleting a column, the memory used to index it, along with the memory used to store the contents of its cells is released. This may be useful in situations where memory is limited.

Deleting unwanted columns early in a Transcript task will also increase the performance of many subsequent operations performed by the task.

The second keyword may be column instead of columns. Either way, the statement behaves identically.

Deleting blank columns

The delete blankcolumns statement will delete columns that only contain blank values from the default DSET.

It is not possible to completely empty a DSET using delete blankcolumns. In the event that the last remaining column in a DSET is blank then it will not be deleted.

Deleting rows

In order to delete rows from a DSET a local filter must be in effect. A local filter is created using the where statement.

When used within the body of a where statement, delete rows will delete all rows in the DSET associated with the local filter where the condition is true. For more information on conditions, please refer to the where article.

Either delete rows or delete row may be used (both variations work in an identical manner)

Deleting a DSET

The delete dset statement can be used to remove a DSET from memory. This may be useful in cases where a DSET is no longer required, for example after it has been used to enrich another DSET via the correlate statement.

It is not possible to delete the default DSET.

Examples

Delete the column temp from the default DSET and the column Interim from the azure.Usage DSET: delete columns temp azure.Usage.Interim

Delete any columns in the default DSET that have no values in any rows: delete blankcolumns

Delete all rows where the VMID is 1234

Delete the DSET azure.rates delete dset azure.rates

Syntax

uppercase heading|values [and heading|values] in column ColName1 [... ColNameN]

Although the syntax shown includes the keyword column, either column or columns may be specified. Both work in an identical manner.

Details

The heading and values keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.

Only lower case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.

Examples

uppercase heading in column servicegroup

uppercase heading and values in columns servicegroup VMWare.usage.resourcepool chargeinterval

uppercase values in column Region
Details

Normally a transformation script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.

When the terminate statement is encountered, the script will finish at that point. No statements after the terminate statement will be executed.

Example

var file = "system/extracted/mydata.csv"

if (@FILE_EXISTS(${file})) {
    import ${file} source my alias data
} else {
    terminate
}
id,colour,location,quantity
1234,blue,europe,4.5
1234,green,europe,5.5
id,colour,location,quantity,EXIVITY_AGGR_COUNT
1234,blue,europe,10,2
# The first column in the DSET being aggregated
# is called subscription_id

aggregate notime default_function match subscription_id match
startUsageTime,endUsageTime,id,subscription_id,service,quantity
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
import system/extracted/aggregate_test.csv source aggr alias test

var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using startUsageTime template ${template}
timestamp END_TIME using endUsageTime template ${template}
timecolumns START_TIME END_TIME
delete columns startUsageTime endUsageTime

aggregate aggr.test daily nudge default_function first id match subscription_id match service match quantity sum

timerender START_TIME as FRIENDLY_START
timerender END_TIME as FRIENDLY_END
id,subscription_id,service,quantity,START_TIME,END_TIME,EXIVITY_AGGR_COUNT,FRIENDLY_START,FRIENDLY_END
ID_1234,SUB_abcd,Large VM,12,1509667200,1509692399,6,20171103 00:00:00,20171103 06:59:59
ID_3456,SUB_efgh,Medium VM,6,1509667200,1509688799,3,20171103 00:00:00,20171103 05:59:59
ID_3456,SUB_efgh,Large VM,8,1509667200,1509692399,4,20171103 00:00:00,20171103 06:59:59
round Quantity                       # Round up to nearest integer
round Quantity down                  # Round down to nearest integer
round Quantity up to nearest 4       # Round up to next multiple of 4
round Quantity down to nearest 2     # Round down to next lowest even number

round Quantity up to nearest 0.5     # ie: 2.25 -> 2.5
round Quantity down to nearest 0.1   # ie: 23.34567 -> 23.3
round Quantity down to nearest 0.01  # ie: 23.34567 -> 23.34

where ( ([quantity] > 5) && ([quantity] <= 100) ) {
    round quantity up to nearest 10       # Force consumption in blocks of 10
}

where ([quantity] > 100) {
       round quantity up to nearest 5     # Better deals for larger consumption
}
# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

include import_customers.trs

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...
# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2
# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...
delete columns One Two Three         # Delete the listed columns
delete columns except One Two Three  # Delete all but the listed columns
where ([VMID] == 1234) {
    delete rows
}


Syntax

split ColName using sep

split ColName using sep retaining first [column_count]

split ColName using sep retaining last [column_count]

split ColName using sep retaining first_column_index [to last_column_index]

The keyword following ColName may be using, separator or delimiter. All three work in exactly the same way.

Details

The ColName argument is the name of an existing column whose values are to be split and the sep argument (which must only be a single character in length) is the delimiter by which to split them.

For example a value one:two:three:four, if divided by a sep character of :, would result in the fields one, two, three and four.

To specify a sep value of a space, use " " or \<space> (where <space> is a literal space character)

New columns are created if necessary to contain the values resulting from the split. These additional columns are named ColName_split1 through ColName_splitN where N is the number of values resulting from the process.

If there is an existing column with a name conflicting with any of the columns that split creates then:

  • If the overwrite option is set then all values in that pre-existing column will be overwritten

  • If the overwrite option is not set:

    • If there are blank values in the existing column they will be updated

    • If there are no blank values in the existing column then no changes will be made to it

Keeping only specific fields

If the retaining keyword is specified, split will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:

Specification | Result
first or 1 | Discard all but the first column
first N or 1 to N | Discard all but the first N columns
last | Discard all but the last (rightmost) column
last N | Discard all but the last N columns
N | Discard all but the Nth column
N to M | Discard all but the columns from the Nth to the Mth inclusive

In the table above, N refers to a result column's number where the first column created by split has a number of 1
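As a further sketch of the retaining forms, keeping only the first two fields of the ID column used in the example below would look like:

split ID using : retaining first 2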

Example

Given an input dataset of the form:

The statement ...

split ID using :

... will result in the dataset:

Using the same original dataset, the statement ...

split ID using : retaining 3 to 5

... will result in the dataset:

correlate

Overview

The correlate statement is used to enrich the default DSET by adding new columns to it, and/or updating existing columns with useful values. The new column names are derived from other DSETs and the values in those columns are set using a lookup function based on the value in a key column shared between the DSETs.

Syntax

correlate ColName1 [... ColNameN] using KeyColumn [assuming assumeDSET] [default DefaultValue]

Details

The ColName1 ... ColNameN arguments are column names that will be copied from their original DSETs and merged into the default DSET.

Column names must be fully qualified, unless the assuming parameter is used, in which case any column names that are not fully qualified will be assumed to belong to the DSET specified by assumeDSET.

Source and Destination columns

Source columns are those from which a cell is to be copied when the KeyColumn matches. Destination columns are columns in the default DSET into which a cell will be copied. Destination column names are derived from the names of the source columns as follows:

  • The source column is the argument in its original form, for example: Azure.usage.MeterName

  • The destination column is the same argument, but with the DSET ID replaced with that of the default DSET. For example if the default DSET is Custom.Services then the destination column for the above would be Custom.Services.MeterName.

If a destination column name doesn't exist in the default DSET then a new column with that name will automatically be created.

The Key Column

The KeyColumn argument is a column name which must not be fully qualified and which must exist in the default DSET and all of the DSETs referenced by the ColNameN arguments.

Default values

The DefaultValue argument, if present, specifies the value to write into the destination column if there is no match for the KeyColumn. If the DefaultValue argument is not specified then any rows where there is no match will result in a blank cell in the destination column.

For each row in the default DSET, the source DSET is searched for a matching KeyColumn value, and if a match is found then the value in the source column is used to update the default DSET. The row of the first match found in the source DSET will be used.

Overwriting

When matching the KeyColumn values, the logic in the following table is evaluated against every row in the destination DSET.

✘ means no or disabled, ✔ means yes or enabled

Match Found | Overwrite | Default Value | Result
✘ | ✘ | ✘ | No values will be updated
✔ | ✘ | ✘ | Empty destination column cells will be updated
✘ | ✘ | ✔ | Empty destination column cells will be set to the default value
✔ | ✘ | ✔ | Empty destination column cells will be set to the matched source column value
✘ | ✔ | ✘ | No values will be updated
✔ | ✔ | ✘ | Destination column cells will be updated
✘ | ✔ | ✔ | Destination column cells will be set to the default value
✔ | ✔ | ✔ | Destination column cells will be set to the matched source column value

Examples

Given two Datasets as follows, where the default DSET is MyData.Owners:

Dataset 'MyData.Owners'

Dataset 'Custom.Services'

The statement: correlate service description using id assuming Custom.Services

Will enrich the MyData.Owners Dataset such that it contains:

The statement: correlate service description using id assuming Custom.Services default unknown

Will produce:

timerender

Overview

The timerender statement is used to create or update a column containing human-readable versions of UNIX timestamps.

Syntax

timerender timestampCol as readableCol

Details

When working with time-sensitive data, Exivity uses UNIX epoch timestamps (accurate to 1 second) which contain an integer representing the number of seconds that have elapsed since 00:00:00 on January 1st 1970.

These values can be generated from usage data using the timestamp statement and may be modified during aggregation, so in order to assist the development and debugging of Transcript tasks the timerender statement may be used to create human-readable versions of UNIX timestamps as follows:

  • The timestampCol parameter is the name of the column containing the UNIX timestamps to convert

  • The readableCol parameter is the name of the column to write the human-readable strings into

If the readableCol column does not exist, it will be created. If it does exist then the values in it will be updated.

If readableCol exists and the 'overwrite' option is disabled via option overwrite then only blank values will be updated

The human readable timestamps contain fields for year month day hour minute second, eg: 20170518 17:00:00.

The timerender statement will always use the local timezone of the Exivity server when converting the timestamps. If other timezones are required then timestamp can be invoked with the offset option to adjust the UNIX timestamps as required.

If any values in timestampCol are blank, or do not contain a valid UNIX timestamp then the value in readableCol on the same row will default to a blank value.

Example

replace

Overview

The replace statement is used to search for, remove, and optionally replace with new values, substrings of the values in a column.

update_service

Overview

The update_service statement is used to update service descriptions and/or unit labels.

var

Overview

The var statement is used to create or update a variable.

A number of variables are automatically created before the commencement of a Transcript task. Please refer to the Variables section in the Introduction to this guide for further details.

Name,ID
VM-One,sales:2293365:37
VM-Two,marketing:18839:division:89AB745
VM-Three,development:34345:engineering:345345:Jake Smith
VM-Four,sales::38
VM-Five,marketing:234234234:testMachine
VM-Six,development:xxxx:test
VM-Seven,1234:5678
VM-Eight,test:::
VM-Nine,field::5
VM-Ten,test::3425:
Name,ID,ID_split1,ID_split2,ID_split3,ID_split4,ID_split5
VM-One,sales:2293365:37,sales,2293365,37,,
VM-Two,marketing:18839:division:89AB745,marketing,18839,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,development,34345,engineering,345345,Jake Smith
VM-Four,sales::38,sales,,38,,
VM-Five,marketing:234234234:testMachine,marketing,234234234,testMachine,,
VM-Six,development:xxxx:test,development,xxxx,test,,
VM-Seven,1234:5678,1234,5678,,,
VM-Eight,test:::,test,,,,
VM-Nine,field::5,field,,5,,
VM-Ten,test::3425:,test,,3425,,
Name,ID,ID_split1,ID_split2,ID_split3
VM-One,sales:2293365:37,37,,
VM-Two,marketing:18839:division:89AB745,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,engineering,345345,Jake Smith
VM-Four,sales::38,38,,
VM-Five,marketing:234234234:testMachine,testMachine,,
VM-Six,development:xxxx:test,test,,
VM-Seven,1234:5678,,,
VM-Eight,test:::,,,
VM-Nine,field::5,5,,
VM-Ten,test::3425:,3425,,


Syntax

replace substring in colName [with replacement]

Details

The replace statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:

Parameter | Notes
substring | A string to search the values in colName for
colName | The column to search for the substring
replacement | The string to replace occurrences of substring with

The colName argument may or may not be fully qualified, but must reference an existing column.

If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.

If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.

The replace statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information

Example

Given the following sample data:

The following script ...

... will produce the following output data.

Syntax

update_service service_key [description new_description] [label new_label]

update_service using file file key_col [description description_col] [label label_col]

update_service using dset key_col [description description_col] [label label_col]

Details

The update_service statement can:

  • update a single service using a specific key

  • update all services whose keys are present in a DSET or CSV lookup source

Update a single service

update_service service_key [description new_description] [label new_label]

The service with a key of service_key is updated with the literal description and label provided.

Update multiple services using a CSV lookup file

update_service using file file key_col [description description_col] [label label_col]

The file parameter may contain a path component and is considered as relative to the home directory.

All services keys that match values in the key_col column in the CSV called file have their descriptions and/or labels updated with the values in the corresponding description_col and label_col columns in file.

Update multiple services using a DSET

update_service using dset key_col [description description_col] [label label_col]

To update services using a DSET, it must have been previously imported in order to be used as the lookup source.

The DSET ID is derived from the key_col column as follows:

  • If fully qualified then the DSET is derived from key_col directly

  • If not fully qualified then the default DSET is assumed

In either case, whichever of description_col and label_col is used (or both) must be located in the same DSET as key_col
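As a sketch of the DSET-based form, assuming a previously imported lookup DSET with the ID services.lookup whose key, description and label columns are named as shown (all names here are hypothetical):

update_service using dset services.lookup.service_key description service_description label unit_label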

Examples


Syntax

var name = value

As the value may be derived from an expression (see Using an expression to set a variable below) it is recommended that variable values be quoted in order to avoid characters such as / being interpreted as mathematical operators.

Details

A variable is a named value. Once defined, the name can be used in place of the value for the rest of the transcript task. This can be useful in that it permits configuration of various parameters at the top of a transcript task, which makes maintenance easier.

Both the name and the value parameters must be present, and there must be a space on each side of the = character. To use a space or tab in a variable value, use an escape or double quote the value:

A variable is referenced by enclosing its name with ${ and }. For example a variable called outputFile can be referenced using ${outputFile}.

Variable names are case sensitive, therefore ${variableName} and ${VariableName} are separate variables.

If there is already a variable called name then the var statement will update the value.

The value of a variable may not exceed 1023 characters

Using an expression to set a variable

As well as a literal value, an expression may be specified, in which case the variable is set to the output of that expression. For example:

For a full list of available functions please refer to the functions section of the documentation for the if statement.

Example

owner,id
John,100
Tim,110
Fokke,120
Joost,130
Jon,140
service,description,id
Small_VM,Web_Server,130
Medium_VM,App_Server,100
Large_VM,DB_Server,110
Medium_VM,Test_Server,120
owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,,
owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,unknown,unknown
# Import raw usage data
import "system\extracted\csp_usage.csv" source CSP alias usage

# Create UNIX start and end time columns from it
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME_UNIX using usageStartTime template ${template}
timestamp END_TIME_UNIX using usageEndTime template ${template}

# Render human-readable columns
timerender START_TIME_UNIX as START_TIME_HUMAN
timerender END_TIME_UNIX as END_TIME_HUMAN
PayerAccountName,TaxationAddress,ProductCode,ProductName,ItemDescription
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from EU (Ireland)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AmazonS3,Amazon Simple Storage Service,$0.0245 per GB - first 50 TB / month of storage used
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US West (Northern California)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US East (Northern Virginia)
import "system\extracted\example.csv" source test alias data
replace "John Doe" in PayerAccountName with "Finance Dept"
replace "RH15 0HZ, " in TaxationAddress
replace "United Kingdom" in TaxationAddress with UK
replace "AWS" in ProductCode

where ([ProductCode] == "AmazonS3") {
    replace "Amazon" in ProductCode
}

replace "Amazon " in ProductName
replace "AWS " in ProductName
replace "transfer from" in ItemDescription with -
"PayerAccountName","TaxationAddress","ProductCode","ProductName","ItemDescription"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - EU (Ireland)"
"Finance Dept","123 Big St, Largetown, UK","S3","Simple Storage Service","$0.0245 per GB - first 50 TB / month of storage used"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US West (Northern California)"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US East (Northern Virginia)"
# Update all services from CSV file
update_service using file "system/lookup.csv" service_key description service_name label unit

# Update specific service
update_service "Small_VM1" description "Small VM 1" label "VM's"
var example = "Hello world"
var example = Hello\ world
var value1 = 5
var value2 = 10
var largest = @MAX(${value1}, ${value2})
# ----> Start Config <----
var tmp_folder = AzureTmp
# ----> End Config <----

import "system/extracted/${tmp_folder}/${dataDate}.csv" source Azure alias usage
import "system/extracted/${tmp_folder}/rates.csv" source source Azure alias rates

timecolumns

Overview

The timecolumns statement is used to set the start time and end time columns in a DSET

Syntax

timecolumns start_time_col end_time_col

timecolumns clear

Details

The usage data stored in a DSET may or may not be time sensitive. By default it is not, and every record is treated as representing usage for the entire day. In many cases however, the usage data contains start and end times for each record which define the exact time period within the day that the record is valid for.

If the usage data contains start and end times that are required for functions such as aggregation or reporting, the column(s) containing those times need to be marked such that they can be identified further on in the processing pipeline. This marking is done using the timecolumns statement.

The timecolumns statement does not perform any validation of the values in either of the columns it is flagging. This is by design, as it may be that the values in the columns will be updated by subsequent statements.

The values in the columns will be validated by the finish statement.

If the timecolumns statement is executed more than once, then only the columns named by the latest execution of the statement will be flagged. It is not possible to have more than one start time and one end time column.

Both the start_time_col and end_time_col parameters may be fully qualified column names, but they must both belong to the same DSET.

It is possible to use the same column as both the start and end times. In such cases the usage record is treated as spanning 1 second of time. To do this, simply reference it twice in the statement:

Clearing the flagged timestamp columns

To clear both the start and end time columns, thus restoring the default DSET to treating each record as spanning the entire day, the statement timecolumns clear may be used.

Currently the statement timecolumns clear will only clear the timestamp columns in the default DSET

This can be useful in the following use case:

  • The DSET is loaded and timestamp columns are created

  • finish is used to create a time-sensitive RDF

  • The timestamp columns are cleared

  • The DSET is renamed using the rename dset statement

  • Further processing is done on the DSET as required

  • finish is used to create a second RDF which is not time-sensitive

Example

where

Overview

The where statement is used to define a local filter, which restricts the scope of any statements contained within its body.

timecolumns timestamp_col timestamp_col
# Read data from file into a DSET called usage.data
import system/extracted/usage_data.csv source usage alias data

# Create two UNIX-format timestamp columns from the columns
# usageStartTime and usageEndTime, each of which records a time
# in the format '2017-05-17T17:00:00-07:00'
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using usageStartTime template ${template}
timestamp END_TIME using usageEndTime template ${template}

# Flag the two columns we just created as being the start and
# end time columns
timecolumns START_TIME END_TIME

# Create the time sensitive DSET
finish
finish
Syntax

Details

The expressions used by the where statement take the same form as those used in the if statement.

The information below covers aspects of expression use that are unique to the where statement

The where statement is used to define a local filter which restricts the effect of the statements located in the lines between the opening { and closing }.

These braces define the start and end of the body of the where statement.

  • The opening { must be preceded with white-space and may be positioned on the same line as the where statement or on a line of its own

  • The closing } must be located on a line of its own

Only statements that support a local filter may be located within the body. If a non-supported statement is included in the body then a log entry is generated and the task will fail.

The conditional expression associated with the where statement must reference at least one column name as detailed in the Column names section below

Only rows in the DSET associated with the expression for which the conditional expression is true will be affected by the statements in the body of the where statement.

Referencing column names

As the purpose of the where statement is to selectively process rows of data based on the expression, it is a requirement that the condition references one or more column names.

To reference a column in an expression, its name should be specified, enclosed in square brackets [like_this].

If the column name contains white-space then it should be quoted within the brackets ["like this"]. The reason that the quotes must be located within the brackets is that otherwise the brackets (along with their contents) would be treated as a literal string.

To illustrate this in use, consider a DSET called test.data which contains the following:

The following script will create a new column called "flag" and set the value of that new column to 1 for all rows where the user key is f.

A blank value in a column can be checked for using a pair of quotes with no content - "". The following will delete all rows in the DSET Azure.usage where the quantity column is empty:

It is possible for one or more statements (such as set) within the statement block enclosed by where { ... } to modify the contents of a column that is being used by the expression.

Doing this may lead to unexpected results. For each statement in the block the expression is re-evaluated for every row in the data being processed, so if set is used to update the values in a column referenced by the expression then the new values will be used for all subsequent evaluations, which in turn affects all following statements in the where block.

If updating a value in any column(s) referenced by the expression, it is recommended to make a copy of the column before the where statement, using

set copy_of_column as original_column

The values in the copy can then be freely modified, and the original column can be replaced with it after execution of the where statement has completed.
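A minimal sketch of this approach (the column name quantity is illustrative and not part of any example dataset in this article):

# Take a copy before the where block so the expression keeps
# evaluating against the original, unmodified values
set quantity_copy as quantity

where ([quantity] == "") {
    # Modify the copy rather than the column used in the expression
    set quantity_copy to 0
}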

Regular expressions

Regular expressions are often used to restrict processing to certain rows in a dataset.

For example:

As is the case with the if statement, the entire value in the column must match the expression in order for the row to be processed (there is no implicit .* at the end of an expression).

Supported statements

The following statements may be used in the body of a where statement:

  • calculate

  • convert

  • copy

  • create mergedcolumn

The following statements may also be used in the body of a where statement but are unaffected by the expression (and as a result work exactly the same as if they were used outside the where statement body)

  • default

  • export

  • import

  • option

  • terminate

Examples

option

Overview

The option statement is used to set global parameters during the execution of a Transcript task.

create

Overview

The create statement is used to add one or more new columns to an existing DSET.

Syntax

where (conditional expression) {
   <statements>
}
user key,value,name
d,4,dexter
f,6,freddy
g,7,geoff
a,1,andy
# Create a blank column called 'flag'
create column flag

# Set values in 'flag' to 1 where the user key is f
where (["user key"] == f) {
   set flag to 1
}
where ([Azure.usage.quantity] == "") {
    delete rows
}
# Delete all rows where the hostname contains the word 'test'
where ([hostname] =~ /.*test.*/) {
    delete row
}
# Delete all rows where the VMID is 1234

where ([VMID] == 1234) {
    delete rows
}
delete rows
move
replace
round
set
split
timerender
var
Syntax

option option = setting

option noquote

Details

The option statement can be used multiple times within the same task script and always takes immediate effect. It is therefore possible (for example) to import a CSV file delimited with commas and quoted with single quotes, change the options and then export it with semicolon delimiters and double quotes.

The supported options are as follows:

Option

Default

Notes

continue

disabled

Determines whether to proceed to the next day in a date range or bail out if an error occurs on any given day when processing a range of dates - see notes below.

delimiter separator

,

Specifies the delimiter character between each field in a CSV file. This character will be used when importing and exporting Datasets. The default value is a comma. If a dot character is used, then the option statement will be ignored and a warning generated in the logfile. Either delimiter or separator can be used - they act in exactly the same way.

services

readonly

May be set to readonly, overwrite or update. This determines whether service definitions can be updated and, if so, which fields to update. Please refer to the notes in the next section for more detail.

mode

When using options, there must be whitespace on each side of the = sign

Additional notes

Continue

option continue = yes|enabled

option continue = no|disabled

When executing a task file repeatedly against each day in a date range, by default Transcript will abort the whole run if a task failure occurs. In cases where this is undesirable, setting the continue option to enabled or yes (both work in exactly the same way) will change the behaviour such that if a task failure occurs then execution will resume with the next day in the range.

When combining the continue option with option mode = permissive it is possible to process a range of dates for which usage or other data is not available, because the mode option will prevent a failed import statement from being treated as a fatal error.
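A minimal sketch of this combination (the source name Azure is taken from examples elsewhere in this article):

option continue = enabled
option mode = permissive

# If the extracted file for a given day is missing, the failed import is
# logged and processing resumes with the next day in the range
import usage from Azure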

Delimiter / Separator

When specifying a quote or tab as the separator it is necessary to escape it in order to prevent Transcript from interpreting it as a meaningful character during the parsing of the task script. For example:

Services

This option may be specified as readonly, overwrite or update and influences the behaviour of the service and services statements as follows:

Value

Notes

readonly

Existing service definitions which have keys matching those in the data being processed will not be overwritten with new versions (though new services may be created).

overwrite

Existing services which have keys matching those in the data being processed will be overwritten.

update

Existing services which have keys matching those in the data being processed will have their Description and Unit Label fields updated, but no other changes will be made.

Any services that are overwritten through use of option services = overwrite will lose any custom rates associated with them and a single default rate will be created instead.

In all cases, rate revisions may be created depending on the data being processed, but existing global rates will not be replaced with new ones if the effective date of the updated service matches that of an existing global revision.

Execution mode

Transcript supports two modes of execution for tasks:

  • In strict mode, if an error is encountered at any point in the task, the error will be logged and execution will terminate

  • In permissive mode, many errors that would otherwise have caused the task to fail will be logged, the statement that caused the error will be skipped and execution will continue from the next statement in the task.

The mode option can be used multiple times and changed at any point during task execution. This means that specific sections of the task can be more error tolerant.

Errors that can be handled in permissive mode are mainly syntax errors or those involving invalid parameters to Transcript statements. Some error conditions that arise during the execution of a statement will cause the task to fail even in permissive mode.
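A minimal sketch of switching modes within a task (the column names are illustrative):

# Tolerate problems while removing columns that may not be present
# in every day's data
option mode = permissive
delete columns notes comments

# Handle the remainder of the task strictly
option mode = strict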

Log levels

Transcript generates a considerable amount of logging information during the execution of a task. The loglevel option can be used to increase or decrease the level of detail written to the logfile. All logging levels must be specified in UPPER CASE. The following levels can be set:

Level

Details

INTERNAL

Self-diagnostic messages indicating an internal error detected with Transcript itself

FATAL

Non-recoverable errors, usually as a result of an Operating System error such as no available memory

ERROR

A transcript task error (syntax or data related)

WARN

An unexpected event or a syntax error that can be recovered from

INFO

Informational messages about actions performed during the execution of a transcript task

The order of the logging levels in the table above is significant, in that for any given level, all levels above it in the table are also in effect. Therefore a logging level of WARN will result in log entries for ERROR, FATAL, and INTERNAL level events, as well as warnings.

The loglevel option can appear multiple times within a transcript task and will take immediate effect whenever it is used. This means that within a task, the loglevel can be increased for certain statements and reduced for others.

Regardless of the logging level, some events will always create a logfile entry, for example the success or failure of a transcript task at the end of its execution.

Log mode

In order to minimise the effect on performance when logging, Transcript opens the logfile when it is first run and then holds it open until the completion of the task being executed. The logfile can be accessed in one of two modes:

Mode

Details

SAFE

After every message, the logfile is flushed to disk

FAST

The logfile is not explicitly flushed to disk until the termination of the task file

The default is SAFE. It is not recommended that this be changed.

Examples

The following Transcript task will import a CSV file quoted with double quotes and delimited with commas and then export a copy with semicolons as delimiters and quoted with single quotes:

It also increases the logging level for the import statement.

create column NewColumnName [value Value]

create columns from ColumnName [using ValueColumnName]

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [Column [/regex/]|string literal]

Details

Explicit single column creation

Syntax

create column NewColumnName [value Value]

Details

This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.

A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional value Value portion of the statement is present, in which case all the cells in the new column will be set to Value.

Examples

Create a new empty column called Cost in the default DSET: create column Cost

Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0

Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

Automated single/multiple column creation

Syntax

create columns from ColumnName [using ValueColumnName]

Details

This statement is used to create multiple columns in a single operation. As is the case for create column above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

The statement create columns from ServiceName using Quantity will create the result shown below:

The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:

When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:

When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.

This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

Example

The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.

Merging column values to create a new column

Syntax

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [ ... Column [/regex/]|string literal]

If preferred, the word using may be used instead of the word from (both work in an identical fashion)

Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:

Parameter

Required

Meaning

NewColumn

Yes

The name of the new column to create

sep

No

If specified after the separator keyword, sep is a string to insert between values extracted from the source columns

Column

Yes (at least one)

The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)

/regex/

The separator may be more than one character in length (up to 31 characters may be specified)

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.

If a regex is not specified, then the entire value in the source column will be used.

Options

By default the value extracted from a source column will be blank in the following two cases:

  • There is a blank value in a source column

  • No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:

option merge_blank = some_text_here

This option will use the string some_text_here in place of any blank source column value.

option merge_nomatch = some_text_here

This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
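A minimal sketch (re-using the department and user_id columns from the examples below), substituting a placeholder for blank source values and then restoring the default behaviour:

# Use a placeholder wherever a source column is blank
option merge_blank = unknown
create mergedcolumn key separator : from department user_id

# Restore the default behaviour for subsequent statements
option merge_blank = <blank>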

Examples

Given the following dataset:

The following examples illustrate some uses of the create mergedcolumn statement:

Example 1

Example 2

If no regular expression is specified then the values in the source column will be used in their entirety:

Example 3

Let us add a new row to the sample dataset which has a non-compliant value for the user_id:

By default a non-matching value will result in a blank component of the merged result:

In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:

option delimiter = \t # Specify a literal TAB character
option delimiter = \" # Specify a literal quote
option mode = strict
option mode = permissive
option quote = \"
option loglevel = DEBUGX
import usage from Azure
option loglevel = INFO
option separator = ;
option quote = '
export azure.Usage as c:\transcript\exported\azure_modified.csv
SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,
import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix

create mergedcolumn key separator : from string prefix department user_id

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

strict

May be set to strict or permissive. This option specifies whether or not to terminate a transcript task if an error is encountered - see notes below

quote

"

Specifies the quote character used for quoting fields in a CSV file. This character is taken into consideration when importing and exporting Datasets. When importing a Dataset, any fields that begin and end with this character will have it removed during the import process. When exporting a Dataset, all non-blank fields will be quoted using this character. Note that when specifying a literal quote character, it must be escaped: \"

noquote

n/a

Specifies that no quoting is to be performed either when importing or exporting Datasets. A subsequent option quote statement will override this option.

loglevel

INFO

Sets the logging level - see notes below

logmode

SAFE

Sets the logging mode - see notes below

overwrite

yes

May be set to yes or 1, no or 0. If set to no then statements that update cell values will only affect blank cells. Refer to the documentation articles for any given statement for more information.

embed

no

May be set to yes or 1, no or 0. If enabled, then when importing a CSV file, any text between an opening curly brace - { - and its matching closing brace is not checked in any way. This permits separators and quotes to be included in the value. The curly brackets may be nested to any depth. If there is no matching } for the opening bracket then an error will be generated and the task will fail.

merge_blank

(none)

The default value for create mergedcolumn to use if a source column is blank

merge_nomatch

(none)

The default value for create mergedcolumn to use if no match is found for a regular expression in a source column value

DEBUG

Detailed logs of actions performed during the execution of a transcript task

DEBUGX

Extended debugging information (may cause very large logfiles and a minor reduction in performance)

No

If specified, the expression enclosed by the / characters is applied to the values in the source column specified by the preceding Column argument

string literal

No

If specified, the literal will be added to the merged column value. The relative order of Column and literal arguments is preserved.

import

Overview

The import statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.

To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.

Syntax

Import from CSV files

import filename from source [alias alias] [options { ... }]

import filename source custom_source [alias alias] [options { ... }]

Import from CCR files

import filename.ccr source custom_source [alias alias]

Import from database tables

import ACCOUNT source custom_source [alias alias]

import USAGE for date from db_name source custom_source [alias alias]

Details

If using the Windows path delimiter - \ - it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character

Options

Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.

The import filters are configured via the options parameter, which if present is immediately followed by one or more name = value pairs enclosed within braces, for example:

The = sign is optional, but if present it must be surrounded by whitespace.

Multiple options may be specified and each option must be placed on a separate line, for example:

The options supported by import are as follows:

Name

Type/possible values

Description

skip

numeric

Number of leading non-blank rows to skip

omit

numeric

Number of trailing non-blank rows to omit

heading_dupe_check header_dupe_check

enabled / on / true / yes

Skip rows that are a duplicate of the header row

select include

If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows

The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:

To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:

File name / pattern

If a pattern is specified in the import options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.

Only filenames may contain a regular expression; directory names are always treated literally.

If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).

If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.
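A minimal sketch (the path and pattern are illustrative), appending a month of daily extracts into a single DSET:

# The directory portion is literal; only the filename is a regular expression
import "system/extracted/Azure/2017/02/.*_usage.csv" source Azure alias usage options {
    pattern = enabled
}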

Database tables

Account data

To import all accounts from Exivity the following statement is used:

import ACCOUNT source custom_source [alias alias]

This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.

Usage data

To import usage data previously written to an RDF using finish the following statement is used:

import USAGE for date from dset source custom_source [alias alias]

This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.

The date must be in yyyyMMdd format. The RDF from which the usage will be imported is:

<basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf.

To illustrate this in practice, the statement ...

import USAGE for 20180501 from azure.usage source test alias data

... will load the usage data from ...

<basedir>/system/report/2018/05/01_azure.usage_usage.rdf

... into a DSET called test.data.

Source and Alias tags

A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.

There are two main variations of the import statement which are as follows:

Automatic source tagging

import filename from source [alias alias]

By convention, data retrieved by the USE extractor from external sources is located in a file called

<basedir>/system/extracted/<source>/<yyyy>/<MM>/<dd>_<filename>.csv

where <yyyy> is the year, <MM> is the month and <dd> is the day of the current data date.

Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.

By default the alias tag will be set to the filename, minus the <dd>_ portion of that filename and the .csv extension, but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import usage from Azure

will create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv.

The statement:

import usage from Azure alias custom

will create a DSET called Azure.custom from the file

<basedir>/system/extracted/Azure/2017/02/23_usage.csv.

Manual source tagging

import filename source custom_source [alias alias]

This form of the statement will import filename and the source tag will be set to the value of the custom_source parameter.

By default the alias tag will be set to the filename minus the .csv extension but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

will create a DSET called Azure.usage from the file

<basedir>/system/extracted/Azure/20170223.csv

Importing CCR files

Options will be ignored when importing CCR files

When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr filename extension as follows:

Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.

In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.

When importing a CCR file, the quote and separator characters are always a double quote - " - and a comma - , - respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.

Quote and Separator options

The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct options are set before importing a dataset in order to avoid unwanted side effects.

The options relating to import are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.

If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.

If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.

Embeds

If the embed option is set, then a curly bracket in the data - { - will cause all characters (including newlines) up until the closing bracket - } - to be imported. Nested brackets are supported, although if there are an uneven number of brackets before the end of the line an error will be generated in the logfile and the task will fail.
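A minimal sketch (the file name and its contents are illustrative):

# A field such as {Region: EU, West} in metadata.csv is imported as a
# single value; the comma inside the braces is not treated as a separator
option embed = yes
import "system/extracted/metadata.csv" source test alias meta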

Embeds are not supported in CCR files

Examples

Create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure

Create a DSET called Azure.custom from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure alias custom

Create a DSET called Azure.usage from the file /system/extracted/Azure/20170223.csv import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage



service

This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity

Overview

The service statement is used to create or modify a single service definition during the execution of a Transcript task. The service definition is associated with the data in an existing DSET, and the global database is updated if and when the Transcript task successfully completes.

It is recommended not to use the service statement for creating services, rather to use the services statement.

Syntax

service { param1 = value [... paramN = value] }

Example:

Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

Details

Summary

The service statement is used to create a new service definition. Once created, a service definition automatically becomes available to the reporting engine.

Parameter table

The service statement creates a new service using the following parameters:

Parameter details

key

The key parameter must be distinct from that in any other service unless it is used to identify an existing service definition to be overwritten with new values. By default, an attempt to create a service with a key that is a duplicate to an existing service in the global database will result in a warning in the logfile and no further action will be taken.

If the key matches that of a service that has been defined previously in the task file then one of two things can happen depending on the value of the current execution mode:

  1. If the mode is set to strict then an error is logged and the task will fail

  2. If the mode is set to permissive then a warning is logged and the newest service definition is ignored

To override the default protection against overwriting existing service definitions in the global database, the statement option services = overwrite should be invoked prior to the service statement.

The value of the key parameter may be up to 127 characters in length. Longer names will be truncated (and as a result may no longer be unique).

description

The description parameter is freely definable and does not have to be unique. When a report is generated, the description of a service will be used on that report so care should be taken to make the description meaningful.

By default, if description is not specified in a service definition then a copy of the key will be used as the description.

The value of the description parameter may be up to 255 characters in length. Longer descriptions will be truncated.

category / group

Either category or group may be used. The two terms are interchangeable in this context.

The category parameter is used to logically associate a service with other services. All services sharing the same category will be grouped together on reports. Any number of different categories may exist and if the category specified does not exist then it will be automatically created.

If no category is specified then the service will be placed into a category called Default.

The value of the category parameter may be up to 63 characters in length. Longer values will be truncated.

usage_col

In order to calculate a price for a service, the number of units of that service that were consumed needs to be known. The value of the usage_col parameter specifies the column in the usage data which contains this figure.

The usage_col argument is also used to derive the DSET associated with the service as follows:

  • If the value of usage_col is a fully qualified column name then the service will be associated with the DSET identified by that name

  • If the value of usage_col is not fully qualified then the service will be associated with the default DSET

If the column specified by the usage_col argument is determined not to exist using the above checks, then one of two things can happen depending on the value of the current execution mode:

  1. If the mode is set to strict then an error is logged and the task will fail

  2. If the mode is set to permissive then a warning is logged and the service definition is ignored

The value of the usage_col parameter may be up to 255 characters in length. Longer values will be truncated.

interval

Services may be charged in different ways. For example some services will invoke a charge whenever they are used whereas others are charged by a time interval, such as per month.

The value of the interval parameter determines how the service should be charged as per the following table:

model

Specifies whether or not to apply proration when calculating the charge for a monthly service.

Currently, only monthly services can be prorated, based on the number of days in the month the service was used.

unit_label

The unit_label is the label used for units of consumption on reports. For example storage-related services may be measured in Gb or Tb, Virtual Machines may be measured in Instances and software usage may be measured in licenses.

The specified unit_label value may be up to 63 characters in length. Longer values will be truncated.

If the unit_label parameter is not specified then a default value of Units will be used.

account_id

The account_id parameter is intended for internal use only and should not be used

The optional account_id references an entry in the account table in the global database. If specified, the service will be created with a rate revision specific to the account id.

The default rate for the service (which applies to all combinations of report column values not explicitly associated with their own rate) can be defined in any of the following ways:

  • By omitting the account_id parameter altogether

  • By specifying an account_id of 0

  • By specifying an account_id of *

rate

The rate parameter determines the cost per unit of consumption to use when calculating the charge. A rate of 0.0 may be used if there is no charge associated with the service.

This may be used in conjunction with a fixed_price and one of cogs or fixed_cogs.

Either or both of rate or fixed_price may be specified, but at least one of them is required.

fixed_price

The fixed_price is a charge applied to the service per charge interval regardless of the units of consumption.

This may be used in conjunction with a rate and one of cogs or fixed_cogs

Either or both of rate or fixed_price may be specified, but at least one of them is required.

cogs

The optional COGS charge is the price to the provider of the service. Usually, COGS-related charges are not included on reports, and special permissions are required to see the COGS charges.

For services with a defined cogs value it is possible to generate Profit and Loss reports, the profit/loss being the total charge calculated from the rate and/or fixed_rate values minus the price calculated from the cogs or fixed_cogs values.

fixed_cogs

The optional fixed_cogs value is a fixed price to be factored into COGS-related calculations regardless of the number of units consumed in the charging interval.

min_commit

The optional min_commit parameter specifies a minimum number of units of consumption to include when calculating the charge associated with a service. In cases where the actual consumption is greater than the min_commit value this will have no effect, but where the actual consumption is less than the minimum commit the price will be calculated as if the min_commit units had been consumed.

charge_model

If specified as peak then the charge for monthly service created will be calculated based on the day with the highest charge in the month.

If specified as average then the charge for monthly service created will be calculated as the average unit price (for days that have usage only) multiplied by the average quantity (days with no usage will be treated as if they had a quantity of 0).

If specified as last_day then the charge for monthly service created will be calculated based on the last day of the month.

If specified as day_xxx (where xxx is a number in 1-28 range) then the charge for monthly service created will be calculated based on the specified day of the month.
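A minimal sketch (the key, usage column, description and rate are illustrative), defining a monthly service charged on a fixed day of the month:

service {
    key = "Large_VM"
    usage_col = "Large_VM"
    description = "Large VM"
    interval = monthly
    charge_model = day_28
    rate = 1.5
}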

The Service Definition cache

Once all parameters have been evaluated by the service statement the resulting service definition is added to a cache in memory. This cache is committed to the global database at the successful conclusion of the Transcript task (or discarded in the case of error).

Once a service definition has been added to the cache, no subsequent service definitions with the same key may be added to the cache. In the event of conflict the first definition written to the cache is preserved and the attempt to add a duplicate will result in a warning or an error depending on the value of the current execution mode, as described in the section above.

Rate revisions

As described in Services in Exivity, a service may have multiple rate revisions associated with it. The service statement can only create a single rate revision per service per execution of a Transcript task.

A rate revision consists of the rate, fixed_rate, cogs, fixed_cogs and effective_date parameters. Each revision must have a different effective_date which indicates the date from which that service revision is to be used. A rate definition remains in force for all dates on or after the effective_date, or until such time as a rate revision with a later effective_date is defined (at which point that revision comes into effect).

To create multiple revisions, Transcript must be run multiple times using a service statement that has the same key and uses the same rate, fixed_rate, cogs and fixed_cogs values but has a different effective_date each time. For each of the effective_date parameters a new rate revision will be created for the service.
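A minimal sketch of one such run (the key, usage column and rate are illustrative, and the effective_date is assumed to be in yyyyMMdd format):

# First run: rate revision effective from 1st January 2018 (assumed yyyyMMdd)
service {
    key = "Small_VM"
    usage_col = "Small_VM"
    interval = monthly
    rate = 0.5
    effective_date = 20180101
}

# A later run with the same key and rate values but a different
# effective_date would add a further rate revision to the same service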

Updating the Global Database

At the successful conclusion of a Transcript task the global database is updated from the memory cache.

Examples

import system/extracted/example.csv source test alias data options {
    skip = 1
}
import system/extracted/example.csv source test alias data options { 
    skip = 1
    omit = 4
}
import system/extracted/example.csv source test alias data options { 
    # Only import rows where the vmType column is not the value "Template"
    filter ([vmType] != Template)
}
import system/extracted/example.csv source test alias data options { 
    filter (["service name"] =~ /.*D/)
}
import system/extracted/myccrfile.ccr source MyData

column list

A space-separated list of column names to include in the import

ignore exclude

column list

A space-separated list of column names to ignore when importing

filter

expression

If specified, only rows that match the expression will be imported

escaped escape

enabled / on / true / yes

Treat backslash (\) characters in the data in the CSV to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally.

pattern

enabled / on / true / yes

Permits the filename to be specified as a regular expression

encoding

encoding

Specifies the encoding of the file to be imported. Any encoding supported by libiconv may be used.

skip_corrupted_file

enabled / on / true / yes

If this option is enabled and pattern is set, any file that is not formatted correctly is skipped. If this option is not enabled then any file that is not formatted correctly will cause the script to terminate with an error.

skip_corrupted_record

enabled / on / true / yes

If this option is enabled, corrupted or malformed records (rows) are skipped

filename_column

enabled / on / true / yes

If enabled, a column called EXIVITY_FILE_NAME will be created which contains the filename from which each record was imported

The name of the column containing the number of units consumed

interval

No

The charging interval for the service

model

No

Whether the service charges should be prorated or not

unit_label

No

The label for the units of consumption; eg: GB for storage

account_id

No

Which account to associate with the rate revision

rate

No

The price per unit of consumption

fixed_price

No

The fixed price which will be charged regardless of non-zero usage in the charge interval

cogs

No

The COGS price per unit of consumption

fixed_cogs

No

The fixed COGS price which will be charged regardless of non-zero usage in the charge interval

min_commit

No

The minimum number of units of consumption that will be charged per interval

effective_date

No

The date from which the rate revision should be applied

charge_model

No

A specific charge model (may be peak, average, last_day or day_xxx, where xxx is a number in the range 1-28)

Parameter

Required

Notes

key

Yes

A unique identifier for the service

description

No

The name of the service as it will appear on a report

category or group

No

Services are grouped by category on a report

usage_col

Interval value

Meaning

individually

Every unit of consumption is charged individually. For example if a network-related service charges by the GB, then for every GB seen in the usage the charge is applied regardless of how many GB are used or over how long a period of time the consumption took place.

hourly

The charge is applied once per hour regardless of the number of times within the hour the service was consumed

daily

As for hourly but applied per-day

monthly

As for daily but applied per calendar month

Model value

Meaning

unprorated

No proration is to be applied

prorated

Proration will be applied

service definition
global database
services
execution mode
option services = overwrite
fully qualified
execution mode
proration
global database
execution mode
key
Services in Exivity
global database

Yes

service {
    key = "A1 VM"
    usage_col = "A1 VM - EU North"
    description = "A1 VM (EU North)"
    category = "Virtual Machines"
    interval = monthly
    model = unprorated
    unit_label = "Instances"
    rate = 0.8
    fixed_price = 10
    cogs = 45
    fixed_cogs = 16
    min_commit = 4
}
service {
    key = "Azure A1 VM"
    usage_col = "A1 VM - EU North"
    description = "Azure A1 VM (EU North)"
    category = "Virtual Machines"
    interval = monthly
    model = unprorated
    unit_label = "instances"
    min_commit = 2
    rate = 0.8
    fixed_price = 0.2
    cogs = 0.55
}

timestamp

Overview

The timestamp statement is used to create or update a column containing a timestamp.

The value of the timestamp is derived by applying a template to one or (optionally) two source columns. Timestamps generated by the timestamp statement can be UNIX epoch values or yyyyMMdd format strings.

Syntax

timestamp TimeCol [offset secs] using ColName [ColName2] template Template [format yyyymmdd]

Details

The timestamp statement populates the TimeCol column with either a UNIX timestamp (an integer representing a number of seconds since 00:00:00 on January 1st, 1970) or a yyyyMMdd format string.

If a column called TimeCol does not exist, then one will be created. If it does exist then the values within it will be overwritten.

The ColName argument must be the name of an existing column which contains date and/or time information to be extracted and used to derive the values in the TimeCol column. If the optional ColName2 argument is present then for each row of data the values in ColName and ColName2 are concatenated before the extraction of the data is done.

The TimeCol, ColName and ColName2 arguments may be fully qualified column names, but all three must be located in the same DSET. If TimeCol does not exist, then it will be created in the DSET that ColName is located in.

A UNIX timestamp will be generated unless format yyyymmdd is specified in which case the result will be a yyyyMMdd format timestamp.

If a secs parameter is provided and if the output is to be a UNIX timestamp, the specified number of seconds will be added to, or subtracted from, the result. As well as being useful for adjusting for timezones, this permits a time which falls on a midnight boundary to be 'nudged' back 1 second to fall on 23:59:59 of the previous day.

Currently, an offset can only be applied to UNIX format timestamps
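A minimal sketch (column names and template are taken from Example 1 below, and a negative offset is assumed to subtract seconds), nudging an end time that falls exactly on midnight back into the previous day:

# Shift the resulting UNIX timestamp back by one second
timestamp end_time offset -1 using end_date end_time template "YYYYMMDDhh.mm.ss"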

Data used by timestamps

In order to create a UNIX timestamp value, the following data is used:

Field

Values

Required

Default

Year

1971 or greater

Yes

n/a

Month

1 - 12

No

1

Day

In order to create a yyyyMMdd timestamp value, the following data is used:

Field

Values

Required

Default

Year

Any 4 digits

Yes

None

Month

1 - 12

No

01

Day

In both cases the fields are extracted from the ColName and (optionally) ColName2 columns using the Template argument as detailed below.

Templates

The Template argument is a string of characters defining which characters in the ColName column (and, if present, the ColName2 column) are to be extracted in order to obtain the field values shown in the tables above.

The template for a UNIX format timestamp consists of the following characters:

Character

Meaning

.

Any character

Y

A year digit

M

A month digit

D

A day digit

h

An hour digit

Upper-case Y, M and D characters are used for the date

Lower-case h, m and s characters are used for the time

The template for a yyyyMMdd format timestamp consists of the following characters:

Character

Meaning

.

Any character

Y

A year digit

M

A month digit

D

A day digit

Source values

For every row in the dataset, the template is applied to a source value which is constructed from the ColName and ColName2 columns as follows:

ColName

ColName2

Source value

Blank

Blank

No action is taken and the row is skipped

Not blank

Blank

The value in ColName

Blank

Not blank

The value in ColName2

Not blank

When applying the template to the source value the characters in the template are examined one at a time. A dot (.) causes the character in the same position in the source value to be ignored. Any of the other template characters will cause the character in the same position in the source value to be extracted and added to one of the fields used to create the timestamp.

Here are some sample template definitions:

Source value

Data to extract

Template

15:30:30

Hour, Minute and Second

hh.mm.ss

20160701

Year, Month and Day

YYYYMMDD

31-01-2016 17:35:59

Full date and time

DD.MM.YYYY.hh.mm.ss

2015-09-01T00:00:00Z

The length of the template may be shorter than the value that it is being applied to. In the last example shown above, the year, month and date values occur at the start of the string, and the template therefore is only as long as is required to extract them.

The template must always contain four Y characters to define the year, although they do not have to be consecutive. For all the other characters (apart from .) there may be 0 - 2 of them present in the template.

If none of any given character is present, then the value will default to the lowest possible value. For example the template YYYYMM.., when applied to the input value 20160224 will result in a year of 2016, a month of 02 and a day of 01 (being the lowest possible value of a day in any given month).
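A minimal sketch (the column names are illustrative), using this defaulting behaviour to build a first-of-month timestamp from a yyyyMMdd value:

# The day defaults to 01 because no 'D' characters appear in the template
timestamp PERIOD_START using usage_date template YYYYMM.. format yyyymmdd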

Timestamp generation

Once the above fields have been extracted, they are converted into a UNIX or yyyyMMdd timestamp value as follows:

  • If a UNIX timestamp, then it is adjusted for the local time of the Exivity server

  • If a yyyyMMdd timestamp then it is treated 'as is'

This value is then placed in the TimeCol column. The TimeCol column may be the same as ColName or ColName2.

Examples

Example 1 - UNIX format timestamp

Given a dataset of the form:

start_date,end_date,start_time,end_time,subscriptionId, ...
20160630,20160630,14:00:00,14:59:59,a9470811-83f2-474b-9523-0ece853d8c3c, ...
20160630,20160630,15:00:00,15:59:59,a9470811-83f2-474b-9523-0ece853d8c3c, ...

The statements:

timestamp start_time using start_date start_time template "YYYYMMDDhh.mm.ss"
timestamp end_time using end_date end_time template "YYYYMMDDhh.mm.ss"
delete columns start_date end_date

Will produce the following result:

start_time,end_time,subscriptionId, ...
1467291600,1467295199,a9470811-83f2-474b-9523-0ece853d8c3c, ...
1467295200,1467298799,a9470811-83f2-474b-9523-0ece853d8c3c, ...

For verification, the converted values translate back to the following times:

Timestamp     Date
1467291600    30/06/2016, 14:00:00 GMT+1:00 DST
1467295199    30/06/2016, 14:59:59 GMT+1:00 DST
1467295200    30/06/2016, 15:00:00 GMT+1:00 DST
1467298799    30/06/2016, 15:59:59 GMT+1:00 DST

Example 2 - yyyyMMdd format timestamp

Given a dataset of the form:

subscriptionId,effectiveDate ...
a9470811-83f2-474b-9523-0ece853d8c3c,2015-09-01T00:00:00Z
a9470811-83f2-474b-9523-0ece853d8c3c,2017-01-01T00:00:00Z

The statement:

timestamp effectiveDate using effectiveDate template YYYY.MM.DD format yyyymmdd

Will produce the following result:

subscriptionId,effectiveDate ...
a9470811-83f2-474b-9523-0ece853d8c3c,20150901
a9470811-83f2-474b-9523-0ece853d8c3c,20170101


if

Overview

The if statement is used to conditionally execute one or more statements

Syntax

if (conditional expression) {
    <statements ...>
} [else {
    <statements ...>
}]

Conditional Expressions

A conditional expression (hereafter referred to simply as an expression) is evaluated to provide a TRUE or FALSE result, which in turn determines whether one or more statements are to be executed or not. The following are examples of valid expressions:

(${dataDate} == 20180801)

((${dataDate} >= 20180801) && ([hostname] == "templateVM"))

An expression used by the if statement may contain:

  • Numeric and string literals

  • Regular expressions

  • Variables

  • Operators

Numeric and string literals

A literal is a specified value, such as 4.5 or "hostname". Literals may be numbers or strings (text).

If a literal is non-quoted then it will be treated as a number if it represents a valid decimal integer or floating point number (in either regular or scientific notation), else it will be treated as a string.

If a literal is quoted then it is always treated as a string, thus 3.1415926 is a number and "3.1415926" is a string.

Regular expressions

Regular expressions must be enclosed within forward slashes (/), and are assumed to be in ECMAScript format.

If present, a regular expression must be used on the right hand side of either an !~ or an =~ operator, and when evaluated it will be applied to the value on the left hand side of the operator, eg:

if (${dataDate} =~ /[0-9]{4}01/) {
    var first_day_of_month = yes
} else {
    var first_day_of_month = no
}

As the forward slash is used as a delimiter for the expression, any literal forward slashes required by the expression should be escaped with a back-slash: \/

Variables

Variables can be used within expressions, in which case they are replaced with their values. Once expanded, these values are treated as literals.
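As a brief sketch (the variable name dataDate and the cut-off value shown are assumptions), an expanded variable can be compared like any other literal:

if (${dataDate} < 20180101) {
    var is_historic = yes
} else {
    var is_historic = no
}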

Operators

Operators are evaluated according to the operator precedence rules in the table below (where the highest precedence is evaluated first), unless parentheses are used to override them. Operators with the same precedence are evaluated from left to right.

Precedence   Operator   Meaning
1            !          Unary negation
2            *          Multiplication
2            /          Division
2            %          Modulo
3            +          Addition
3            -          Subtraction
4            <          Less than
4            <=         Less than or equal to
4            >          Greater than
4            >=         Greater than or equal to
5            ==         Is equal to
5            !=         Is not equal to
5            =~         Matches regular expression
5            !~         Does not match regular expression
6            &&         Boolean AND
7            ||         Boolean OR

Although expressions are evaluated based on the precedence of each operator as listed in the above table, it is recommended that parentheses are used within the expression in order to remove any ambiguity on the part of a future reader.
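For example, the following sketch (the column name hostname and the values shown are assumptions) uses parentheses to make the intended grouping explicit rather than relying on precedence alone:

if ((${dataDate} >= 20180101) && (([hostname] == "web01") || ([hostname] == "web02"))) {
    var in_scope = yes
}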

Functions

A function is used to evaluate one or more arguments and return a result which is then taken into consideration when evaluating the overall truth of the expression.

Function calls start with the character @, which is followed by the function name and a comma-separated list of parenthesised parameters, for example @MIN(1, 2, 3).

Function names must be specified in UPPER CASE as shown in the examples below.

The following functions are supported by the if statement:

Numeric functions

MIN

@MIN(number, number [, number ...])

Return the smallest number from the specified list (requires at least 2 arguments)

Examples:

  • @MIN(1,2) returns 1

  • @MIN(1,2,-3) returns -3

  • @MIN(1,2,"-1") returns -1 - string "-1" is converted to number -1

  • @MIN(1,2,3/6) returns 0.5

  • @MIN(1,2,"3/6") returns 1 - string "3/6" is converted to number 3, up to first invalid character

  • @MIN(1,2,"zzz") returns 0 - string "zzz" is converted to number 0

MAX

@MAX(number, number [, number ...])

Return the largest number from the specified list (requires at least 2 arguments)

Examples:

  • @MAX(1,2) returns 2

  • @MAX(-1,-2,-3) returns -1

  • @MAX(1,2,100/10) returns 10

ROUND

@ROUND(number [, digits])

Returns number rounded to digits decimal places. If the digits argument is not specified then the function will round to the nearest integer.

This function rounds half away from zero, e.g. 0.5 is rounded to 1, and -0.5 is rounded to -1

Examples:

  • @ROUND(3.1415,3) returns 3.142

  • @ROUND(3.1415,2) returns 3.14

  • @ROUND(3.1415926536,6) returns 3.141593

  • @ROUND(3.1415) returns 3

  • @ROUND(2.71828) returns 3

String functions

CONCAT

@CONCAT(string1, string2 [, stringN ...])

This function will treat all its arguments as strings, concatenate them and return the result.

Examples:

  • @CONCAT("the answer ", "is") returns the answer is

  • @CONCAT("the answer ", "is", " 42") returns the answer is 42

  • @CONCAT("the answer ", "is", " ", 42) returns the answer is 42

SUBSTR

@SUBSTR(string, start [, length])

Return a sub-string of string, starting from the character at position start and continuing either until the end of the string or for length characters, whichever is shorter.

If length is omitted, then the portion of the string starting at position start and ending at the end of the string is returned.

Examples:

  • @SUBSTR("abcdef", 1) returns abcdef

  • @SUBSTR("abcdef", 3) returns cdef

  • @SUBSTR("abcdef", 3, 2) returns cd

  • @SUBSTR("abcdef", 3, 64) returns cdef

STRLEN

@STRLEN(string)

Returns the length of its argument in bytes.

Examples:

  • @STRLEN("foo") returns 3

  • @STRLEN(@CONCAT("ab", "cd")) returns 4

  • @STRLEN(1000000) returns 7 (the number 1000000 is treated as a string)

PAD

@PAD(width, value [, pad_char])

This function returns value, left-padded with pad_char (0 by default) up to the specified width. If width is less than or equal to the width of value, no padding occurs.

Examples:

  • @PAD(5, 123) returns 00123

  • @PAD(5, 12345) returns 12345

  • @PAD(1, 12345) returns 12345

  • @PAD(5, top, Z) returns ZZtop

EXTRACT_BEFORE

@EXTRACT_BEFORE(string, pattern)

This function returns the substring of string that precedes pattern. If pattern cannot be found in the string, or if either string or pattern is empty, the result of the function is an empty string.

Examples:

  • @EXTRACT_BEFORE("abcdef", "d") returns abc

  • @EXTRACT_BEFORE("abcbc", "bc") returns a

  • @EXTRACT_BEFORE("abcdef", "x") returns an empty string

EXTRACT_AFTER

@EXTRACT_AFTER(string, pattern)

This function returns the substring of string that follows pattern. If pattern cannot be found in the string, or if either string or pattern is empty, the result of the function is an empty string.

Examples:

  • @EXTRACT_AFTER("abcdef", "cd") returns ef

  • @EXTRACT_AFTER("abcabc", "ab") returns cabc

  • @EXTRACT_AFTER("abcdef", "abb") returns an empty string

EXTRACT_XXX functions can be combined to extract the middle part of the string, for example @EXTRACT_AFTER(@EXTRACT_BEFORE("abcdef", "ef"), "ab") returns cd.
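As an illustrative sketch (the variable resource_id and the "vm-" prefix convention are assumptions), a string function can also be used directly inside a condition:

if (@EXTRACT_BEFORE("${resource_id}", "-") == "vm") {
    var resource_type = virtual_machine
}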

Date functions

All date functions operate with dates in yyyyMMdd format

CURDATE

@CURDATE([format])

Returns the current (actual) date in the timezone of the Exivity server. The format may be any valid combination of strftime specifiers. The default format is %Y%m%d which returns a date in yyyyMMdd format.

Examples (assuming the run date is 1 July 2019, at 12:34:56):

  • @CURDATE() returns 20190701

  • @CURDATE("%d-%b-%y") returns 01-Jul-19

  • @CURDATE("%H:%M:%S") returns 12:34:56

  • @CURDATE("%u") returns 1 (weekday - Monday)

  • @CURDATE("%j") returns 182 (day of the year)

DATEADD

@DATEADD(date, days)

Adds a specified number of days to the given date, returning the result as a yyyyMMdd date.

Invalid dates are normalised, where possible (see example below):

Examples:

  • @DATEADD(20180101, 31) returns 20180201

  • @DATEADD(20180101, 1) returns 20180102

  • @DATEADD(20180101, 365) returns 20190101

  • @DATEADD(20171232, 1) returns 20180102 (the invalid date 20171232 is normalised to 20180101)

DATEDIFF

@DATEDIFF(end_date, start_date)

Returns the difference in days between two yyyyMMdd dates. A positive result means that end_date is later than start_date. A negative result means that start_date is later than end_date. A result of 0 means that the two dates are the same.

Invalid dates are normalised, when possible (see example below):

Examples:

  • @DATEDIFF(20190101, 20180101) returns 365

  • @DATEDIFF(20180201, 20180101) returns 31

  • @DATEDIFF(20180102, 20180101) returns 1

  • @DATEDIFF(20180101, 20180102) returns -1

  • @DATEDIFF(20180101, 20180101) returns 0

  • @DATEDIFF(20171232, 20180101) returns 0 (the invalid date 20171232 is normalised to 20180101)

DTADD

@DTADD(datetime, count [, unit])

This function adds count units (DAYS by default) to the specified datetime value and returns the normalised result as a datetime value in YYYYMMDDhhmmss format.

The datetime argument can be in any of the following formats:

  • YYYYMMDD

  • YYYYMMDDhh

  • YYYYMMDDhhmm

  • YYYYMMDDhhmmss

Any missing portions of the datetime value are assumed to be zero.

Supported units are (both singular and plural spellings are supported):

  • YEAR

  • MONTH

  • DAY (default)

  • HOUR

  • MINUTE

  • SECOND

Examples:

  • @DTADD(20190701, 2) returns 20190703000000

  • @DTADD(20190701, 2, HOURS) returns 20190701020000

  • @DTADD(2019070112, 50, DAYS) returns 20190820120000

  • @DTADD(20190701123456, 10, MONTH) returns 20200501123456
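The date functions can be combined inside a condition. The sketch below (the dataDate variable and the 90-day threshold are assumptions) flags usage data that is more than 90 days old:

if (@DATEDIFF(@CURDATE(), ${dataDate}) > 90) {
    var data_is_stale = yes
} else {
    var data_is_stale = no
}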

Transcript-specific functions

Transcript-specific functions may be preceded with an exclamation mark in order to negate their output. For example:

if (!@COLUMN_EXISTS("colName")) {
    # The column colName does NOT exist
}

FILE_EXISTS

@FILE_EXISTS(filename)

Returns 1 if the file filename exists, else returns 0.

FILE_EMPTY

@FILE_EMPTY(filename)

In strict mode, this function returns 1 if the file filename exists and is empty. If the file does not exist, then this is considered an error.

In permissive mode, a non-existent file is considered equivalent to an existing empty file.

In either case, if the file exists and is not empty, the function returns 0.

The FILE_EXISTS and FILE_EMPTY functions will only check files within the Exivity home directory and its sub-directories; filename must contain a pathname relative to the Exivity home directory.
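A minimal sketch of guarding an import with FILE_EXISTS; the path, source and alias names are assumptions, and the import form shown simply follows the source/alias convention used elsewhere in this guide:

if (@FILE_EXISTS("system/extracted/azure/${dataDate}.csv")) {
    import system/extracted/azure/${dataDate}.csv source azure alias usage
} else {
    var azure_file_missing = yes
}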

DSET_EXISTS

@DSET_EXISTS(dset.id)

Returns 1 if the specified DSET exists, else 0

DSET_EMPTY

In strict mode (option mode = strict), this function returns 1 if the specified DSET exists and is empty. If the DSET does not exist, then this is considered an error.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET.

In either case, if the DSET exists and is not empty, the function returns 0.

COLUMN_EXISTS

@COLUMN_EXISTS(column_name)

This function returns 1 if the specified column exists, else 0. The column name may be fully-qualified, but if it is not, then it is assumed to be in the default DSET.

DSET_ROWCOUNT

@DSET_ROWCOUNT(dset)

This function returns the number of rows within the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.

DSET_COLCOUNT

@DSET_COLCOUNT(dset)

This function returns the number of columns within the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.
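A short sketch (the DSET name azure.usage is an assumption) showing how these functions can be combined to check that an imported DSET is usable before further processing:

if (@DSET_EXISTS(azure.usage) && (@DSET_ROWCOUNT(azure.usage) > 0)) {
    var have_usage = yes
} else {
    var have_usage = no
}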


    services

    This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity and in the article on the service statement

    Overview

The services statement is used to create or modify multiple services based on the data in a daily RDF.

Syntax

services { param1 = value [ ... paramN = value] }

Example:

services {
    usages_col = ServiceName
    effective_date = 20160101
    set_cogs_using = cogs_prices
    #set_fixed_cogs_using = cogs_prices
    description_col = service_description
    unit_label_col = units_description
    set_min_commit_using = minimum_commit
    category_col = service_group
    interval_col = charging_interval
    model_col = proration
    charge_model_col = charge_model
    set_rate_using = rate
    set_fixed_price_using = fixed_prices
}

Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

    Details

    Summary

The services statement is used to create or modify multiple services from the data in a daily RDF. The parameters supplied to the statement map columns in the usage data to attributes of the created services.

    How column names are used

    For many of the parameters to the services statement there are two ways of using a column name:

    1. The values in the column are extracted from the usage data and those values are embedded as literals into the service definition.

    2. The column name itself is used in the service definition such that the reporting engine dynamically determines the values to use for any given day when generating report data

When creating or updating services using the first method, Transcript will create a new rate revision every time the rate information changes. For data sources such as Microsoft Azure where the rates can change daily this will result in a lot of rate revisions in the global database.

    Using the second method requires only a single rate revision which identifies by name the column(s) containing rate and/or COGS information. When a report is run, the charge engine then obtains the correct rate information for any given day from the data in those named columns.
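As an illustrative sketch (the column names ServiceName and rate are assumptions), the difference between the two methods comes down to which parameter names the rate column:

# Method 1 (hypothetical): rate values are copied into the rate revision as literals
services {
    usages_col     = ServiceName
    set_rate_using = rate
}

# Method 2 (hypothetical): the column name itself is stored, and the charge engine
# reads the rate for each day at report time
services {
    usages_col = ServiceName
    rate_col   = rate
}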

Parameter table

The parameters supported by the services statement are summarised in the following table. The Type column in the table indicates the way the column name is used as described in How column names are used above. Additional information about each parameter can be found below the summary table itself.

Parameter                   Type   Meaning
usages_col                  2      The name of the column from which units of consumption are derived
service_type                n/a    Determines how the usages_col and consumption_col values are used when interrogating the usage data to get the units of consumption
consumption_col             2      The name of the column containing units of consumption (AUTOMATIC services only - see below for more details)
instance_col                2      Values in this column are used to distinguish between service instances
description_col             1      Values in this column determine the service description for each service definition
category or group           n/a    A specific category string to use for all service definitions
category_col or group_col   1      Values in this column determine the service category for each service definition
interval                    n/a    A specific charging interval to use for all service definitions
interval_col                1      Values in this column determine the charging interval for each service definition
model                       n/a    A specific proration setting to use for all service definitions
model_col                   1      Values in this column determine the proration setting for each service definition
charge_model                n/a    A specific charge model to use for all service definitions (may be peak, average, last_day or day_xxx, where xxx is a number in the range 1-28)
charge_model_col            1      Values in this column determine the charge model for each service definition
unit_label                  n/a    A specific unit label to use for all service definitions
unit_label_col              1      Values in this column determine the unit label for each service definition
rate_col                    2      Sets the column name from which Edify will determine the rate per unit at report time
set_rate_using              1      Values in this column determine the rate per unit for each service definition
fixed_price_col             2      Sets the column name from which Edify will determine the fixed price per charging interval for each service definition
set_fixed_price_using       1      Values in this column determine the fixed price per charging interval for each service definition
cogs_col                    2      Sets the column name from which Edify will determine the COGS rate per unit for each service definition
set_cogs_using              1      Values in this column determine the COGS rate per unit for each service definition
fixed_cogs_col              2      Sets the column name from which Edify will determine the fixed COGS price per charging interval for each service definition
set_fixed_cogs_using        1      Values in this column determine the fixed COGS price per charging interval for each service definition
set_min_commit_using        1      Values in this column determine the minimum commit for each service definition
effective_date_col          1      Values in this column determine the effective date of the rate revision created for each service definition
effective_date              n/a    A specific effective date to use in the rate revision created for each service definition

    Parameter details

    usages_col

    The usages_col parameter is the name of a column containing service keys. A service will be created for each distinct value in this column, and these values will be used as the service keys.

    service_type

    In order to calculate the charges associated with a service it is necessary to know the number of units of that service that were consumed. Exivity supports two methods of retrieving the units of consumption from usage data and the service_type determines which of these is applied.

    Any given service may use one or other of these methods, which are as follows:

    • Manual services: service_type = MANUAL

    • Automatic services: service_type = AUTOMATIC

    Manual services

    Manual services require that the units of consumption for each service named in the usages_col column are stored in separate columns whose names correlate to the service keys themselves.

To illustrate this, consider the following fragment of usage data:

service_name,Small VM,Large VM
Small VM,1,0
Small VM,4,0
Large VM,0,6
Large VM,0,4

In this case, the service_name column contains a list of service keys and for each of those service keys there is a corresponding column containing the units of consumption for that service. Thus in the above example we can see that there are two services, "Small VM" and "Large VM", and that the units of consumption for each of these services are in the columns of the same name.

    The more manual services that are represented in the data, the more columns are required.
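A minimal sketch of defining the two services above as manual services; the category and the rate column shown are assumptions, since a service definition needs at least one charge type:

services {
    usages_col     = service_name
    service_type   = MANUAL
    category       = "Virtual Machines"
    # assumes a 'rate' column is present in the usage data
    set_rate_using = rate
}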

Automatic services

Automatic services require that the units of consumption for each service named in the usages_col column are stored in the column named by the consumption_col parameter. To represent the same information as that shown in the example above, the following would be used:

service_name,quantity
Small VM,1
Small VM,4
Large VM,6
Large VM,4

It can be seen that any number of automatic services, along with their consumption figures, can be represented using only two columns of data.
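The equivalent sketch for automatic services (again, the category and rate column are assumptions) names the consumption column explicitly:

services {
    usages_col      = service_name
    service_type    = AUTOMATIC
    consumption_col = quantity
    category        = "Virtual Machines"
    # assumes a 'rate' column is present in the usage data
    set_rate_using  = rate
}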

    consumption_col

    The consumption_col parameter is only required when creating automatic services and determines the column containing the units of consumption for each service as described above.

    instance_col

    It is not enough to know only the units of consumption for each service, as this merely provides the total consumption for each service across the entire usage. In the examples above, for example, the "Large VM" service has 10 units of consumption but using that information alone there is no way to know if this represents one instance of a VM used 10 times, 10 instances of VMs used once each, or something in between.

    The instance_col parameter is therefore required to tell the difference. Typically this will be a unique identifier which groups the units of consumption into 'buckets'. In the case of a VM this may be a VM ID which remains constant throughout the life of a VM in the cloud.

To illustrate this, we can supplement the example usage fragment used previously with additional information to use as the instance_col as follows:

vmid,service_name,quantity
444,Small VM,1
444,Small VM,4
555,Large VM,6
666,Large VM,4

    By specifying instance_col = vmid we can now see that the usage represents:

    • 5 instances of a single Small VM with an ID of 444

    • 6 instances of a Large VM with an ID of 555

    • 4 instances of a Large VM with an ID of 666

    description_col

    If specified, the description_col denotes a column containing a friendly description for reports to identify the service with.

    Typically in cloud usage data, services are identified using unique IDs (referred to as keys in Exivity) which are often non-meaningful to human eyes, so Exivity supports a 'friendly' description for each service for display purposes when generating a report.

For example description_col = description may be used in conjunction with the following data to map the service_id to a friendly name:

vmid,service_id,quantity,description
444,ABC123,1,Small VM
444,ABC123,4,Small VM
555,DEF789,6,Large VM
666,DEF789,4,Large VM

    It is not mandatory to provide a description_col parameter, but if one is not supplied then the description will be set to a duplicate of the service key (as derived via the usages_col parameter).

    In the example above, it can be seen that there are multiple rows in the data for the same service key (vmid). When using description_col, the first row for each distinct value in the usages_col will be used to set the description.

    category

By default Exivity will group services on reports according to their category. Using categories ensures that charges for services that fall within the same category will appear in a contiguous block.

    The category parameter specifies the category to which all services created will be assigned, thus specifying category = "Virtual Machines" might be appropriate for the example data used so far in this article.

    If no category is specified, the services created will be assigned to a category called Default.

    category_col

    Usage data normally contains information about a range of services of different types such as Virtual Machines, Storage, Networking and so on. By referencing a column in the usage data which identifies the correct category for each service, multiple categories will be created and each service assigned to the correct category by the services statement.

To illustrate this, let us extend the sample data as follows:

instance_id,service_name,quantity,description,category
444,Small VM,1,Bronze Computing Service,Virtual Machines
444,Small VM,4,Bronze Computing Service,Virtual Machines
555,Large VM,6,Gold Computing Service,Virtual Machines
666,Large VM,4,Gold Computing Service,Virtual Machines
999,SSD Storage,50,Fast Storage,Storage

    By specifying category_col = category each service will now be associated with the correct category.

    interval

    The interval parameter is used to specify a literal interval for all the services created by the services statement.

    The interval parameter may be any of:

    • individually

    • daily

    • monthly

    If the interval parameter is not specified, then a default interval of monthly will be used.

    interval_col

In the event that different services in the usages_col require different charge intervals, a column name containing the interval to use may be specified using the interval_col parameter as follows:

instance_id,service_name,quantity,description,category,interval
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly
999,SSD Storage,50,Fast Storage,Storage,daily

    By specifying interval_col = interval each service in the above usage data will be assigned the correct charge interval.

    model

    The model parameter is used to enable proration for monthly services. Either of unprorated or prorated may be specified.

    If no model is specified, then a value of unprorated will be used by default.

    model_col

In the event that different services in the usages_col require different proration settings, the model_col parameter can be used to specify which column contains the proration setting for each service:

instance_id,service_name,quantity,description,category,interval,model
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated

By specifying model_col = model, each service in the above usage data will be assigned the correct proration model.

    charge_model

    If specified as peak then the charge for any monthly services created will be calculated based on the day with the highest charge in the month.

    If specified as average then the charge for any monthly services created will be calculated as the average unit price (for days that have usage only) multiplied by the average quantity (days with no usage will be treated as if they had a quantity of 0).

    If specified as last_day then the charge for any monthly services created will be calculated based on the last day of the month.

    If specified as day_xxx (where xxx is a number in 1-28 range) then the charge for any monthly services created will be calculated based on the specified day of the month.
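For example, the following sketch (all parameter values shown are assumptions) applies the peak charge model to every monthly service created by the statement:

services {
    usages_col     = service_name
    interval       = monthly
    charge_model   = peak
    set_rate_using = rate
}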

    charge_model_col

In the event that different services in the usages_col require different charge models, the charge_model_col parameter can be used to specify which column contains the charge_model setting for each service. For the example data below, charge_model_col would be set to chargetype:

instance_id,service_name,quantity,description,category,interval,model,chargetype
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated,average
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated,average
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated,peak
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated,peak
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated,peak

    unit_label

    The unit_label parameter is used by reports to provide a meaningful description of the units of consumption associated with a service. A virtual machine may have a unit label of Virtual Machines, but storage-related services may have a unit label of Gb for example.

If the unit_label parameter is not specified then a default label of Units will be used.

    The unit label may be up to 63 characters in length. Longer values will be truncated.

    unit_label_col

    In cases where the services contained in the usages_col column collectively require more than one unit label, the unit_label_col parameter can be used to identify a column in the usage data which contains an appropriate label for each service.

For example unit_label_col = label can be used to associate an appropriate label using the data below:

instance_id,service_name,quantity,description,category,interval,model,label
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated,Gb

The parameters rate_col, set_rate_using, fixed_price_col, set_fixed_price_using, cogs_col, set_cogs_using, fixed_cogs_col and set_fixed_cogs_using (all of which are detailed below) collectively determine the types of charge that will be associated with the service definitions created by the services statement.

A service definition must have at least one charge type and may have up to three (as potentially a rate, a fixed rate and either of cogs or fixed cogs may be used)

    rate_col

    The rate_col parameter is used to determine the column in the usage data which contains the unit rates for the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the rate_col parameter. When a report is run, for each day in the reporting range the unit rate for that day will be determined by whatever value is in the column named by the rate_col parameter in the usage data.

    This means that only a single rate revision is required, even if the actual value in the rate_col column is different from day to day.

    set_rate_using

    The set_rate_using parameter is also used to determine the unit rate for each service. This differs from the rate_col parameter in that the values in the column named by set_rate_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the unit cost is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of rate_col or set_rate_using (but not both) may be used in a single services statement

    fixed_price_col

    The fixed_price_col parameter is used to determine the column in the usage data which contains the fixed price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_price_col parameter. When a report is run, for each day in the reporting range the fixed price for that day will be determined by whatever value is in the column named by the fixed_price_col parameter in the usage data.

    If a monthly service has different fixed prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_price_col column is different from day to day.

    set_fixed_price_using

    The set_fixed_price_using parameter is also used to determine the fixed price for each service. This differs from the fixed_price_col parameter in that the values in the column named by set_fixed_price_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_price_col or set_fixed_price_using (but not both) may be used in a single services statement

    cogs_col

    The cogs_col parameter is used to determine the column in the usage data which contains the COGS rate associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the cogs_col parameter. When a report is run, for each day in the reporting range the COGS rate for that day will be determined by whatever value is in the column named by the cogs_col parameter in the usage data.

    If a monthly service has different COGS rates for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the cogs_col column is different from day to day.

    set_cogs_using

    The set_cogs_using parameter is also used to determine the COGS rate for each service. This differs from the cogs_col parameter in that the values in the column named by set_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the COGS rate is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of cogs_col or set_cogs_using (but not both) may be used in a single services statement

    fixed_cogs_col

    The fixed_cogs_col parameter is used to determine the column in the usage data which contains the fixed COGS price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_cogs_col parameter. When a report is run, for each day in the reporting range the fixed COGS price for that day will be determined by whatever value is in the column named by the fixed_cogs_col parameter in the usage data.

    If a monthly service has different fixed COGS prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_cogs_col column is different from day to day.

    set_fixed_cogs_using

    The set_fixed_cogs_using parameter is also used to determine the fixed COGS price for each service. This differs from the fixed_cogs_col parameter in that the values in the column named by set_fixed_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed COGS price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_cogs_col or set_fixed_cogs_using (but not both) may be used in a single services statement

    set_min_commit_using

    The set_min_commit_using parameter is used to set the minimum commit value in the initial rate revision for each service.

    The values in the column identified by set_min_commit_using are extracted from the usage data and used as numeric literals in the revision.

    effective_date

    When creating the initial rate revision for a service, the value specified by the effective_date parameter is interpreted as a yyyyMMdd value to determine the date from which the revision should be applied.

If the effective_date parameter is omitted then the current data date will be used by default.

    When using effective_date, the value will be used to set the initial rate revision date for all the service definitions created by the services statement. If different services require different effective dates then the effective_date_col parameter may be used to determine the effective date for each service from a column in the usage data.

    effective_date_col

If there is a column in the usage data containing yyyyMMdd values representing the desired effective date for the initial revision of each service, the effective_date_col parameter may be used to extract the values from this column and set the effective date for each service accordingly.

    Either of effective_date or effective_date_col may be specified in a single services statement, but not both

    Examples

services {
    usages_col = ServiceName
    effective_date = 20180101
    set_cogs_using = cogs_prices
    description_col = service_description
    unit_label_col = units_description
    set_min_commit_using = minimum_commit
    category_col = service_group
    interval_col = charging_interval
    model_col = proration
    set_rate_using = rate
    set_fixed_price_using = fixed_prices
}