default

Overview

The default statement is used to explicitly define the default DSET to use when specifying a column name as an argument in subsequent statements.

Syntax

default dset source.alias

Details

Given that multiple DSETs can be loaded at once, it is necessary to specify which DSET to operate on when performing actions such as creating a new column with the create statement. A column name in a Transcript statement is assumed to belong to the default DSET unless it is a fully qualified column name.

If there is no default statement in the Transcript, then the first CSV file imported via the import statement will automatically be designated as the default DSET.

The default statement can be used multiple times throughout a Transcript, in which case the default DSET will be whichever was specified by the last default statement executed.

Lastly, when executing a finish statement, unless otherwise specified, the default DSET will be used to populate the reporting database created as a result.

When changing the default DSET, any service definitions that referenced the default DSET at the time they were created will be updated with the new default DSET.

Examples

Set custom.datafile as the default DSET:

default dset custom.datafile
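Because the default DSET can be changed part-way through a task, statements that take unqualified column names switch scope along with it. A short sketch (the DSET IDs and column names here are hypothetical):

default dset Azure.usage
capitalise heading in column region        # operates on Azure.usage.region

default dset custom.datafile
capitalise heading in column region        # now operates on custom.datafile.region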

capitalise

Overview

The capitalise (the spelling capitalize is also supported) statement is used to modify the name of a column and/or the values in a column such that the first character is a capital letter and the remaining characters are lower case.

Syntax

capitalise values|heading [and values|heading] in column|columns ColName1 [... ColNameN]

After the keyword in, either of the keywords column or columns may be used

Details

The heading and values keywords refer to the name of an existing column and the values in each row for that column respectively.

Only the first character in the column name or value is converted to upper case. If this character is not a letter, then the statement will have no effect on it. For example, applying the statement capitalise heading in column _vmname to a column already named _vmname will have no effect on the column name, as the first character is an underscore. However, applying the same statement to a column called _VMName would result in a new name of _vmname: after attempting to make the first character a capital (which in the case of the underscore has no effect), the remaining characters are converted to lower case.

Any number of column names may be specified, and any of these may or may not be fully qualified. When applied to values in a column, blank values are ignored.

Examples
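A brief illustration (the column names are hypothetical; further capitalise examples appear later in this article):

capitalise heading in column servicename               # the column is renamed to 'Servicename'
capitalise values in column region                     # 'north europe' becomes 'North europe'
capitalise heading and values in columns region servicename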

finish

Overview

The finish statement creates a Reporting Database File (RDF) from a DSET. The RDF can subsequently be used by the reporting engine.

Syntax

finish [dset.id]

Details

The finish statement is used to create an RDF from a DSET. Only a single DSET can be used to create an RDF, but multiple finish statements may be used within the same task file. If there is no dset.id parameter then the default DSET will be used.

The RDF created by finish will be saved as <BaseDir>\system\report\<yyyy>\<MM>\<dd>_source.alias.rdf

where:

  • <yyyy> is the 4-digit year

  • <MM> is the 2-digit month

  • <dd> is the 2-digit day

  • source.alias are the source and alias tags which form the DSET ID

Any existing RDF with the same name will be overwritten.

Examples

Create a Reporting Database file for the default DSET: finish

Create a Reporting Database file for the DSET Azure.usage: finish Azure.usage

rename

Overview

The rename statement is used to change the name of an existing column in a DSET, or to change the source and/or alias of a DSET.

set

Overview

The set statement is used to write a specified value into all the cells in any given column, or to copy values from one column to another.

calculate

Overview

The calculate statement is used to perform arithmetic operations using literal and column values.

convert

Overview

The convert statement is used to convert values in a column from base-10 to base-16 or vice-versa.


Syntax

calculate column ResultCol as source operation source

where source is either column ColName or value literal_value

and operation is one of the characters + - * / % for addition, subtraction, multiplication, division and modulo respectively.

There must be whitespace on each side of the operation character

Examples:

calculate column ResultCol as column Amount * value 1.2
calculate column Net as column total - column cogs
calculate column constant_7 as value 3.5 + value 3.5

Details

The ResultCol parameter is the name of the column that will hold the results. This column may or may not exist (if necessary it will be created automatically).

Both of the two source parameters can specify a literal value, or the name of a column containing the value to use when performing the calculation.

  • A literal value is specified using value N where N is the literal number required

  • A column name is specified using column ColName where ColName is the name of the column containing the values required

The ResultCol may be the same as a column specified by one of the source parameters in which case any existing values in it will be updated with the result of the calculation.

Additional notes:

  • Any blank or non-numeric values in a source column will be treated as 0

  • An attempt to divide by zero will result in 0

  • When performing a modulo operation, the two source values are rounded to the nearest integer first

  • If the result column already exists and option overwrite is set to no, then only blank cells in the result column will be updated.

Examples

  • Add 1.5 to the values in the Rate column:

calculate column Rate as column Rate + value 1.5

  • Multiply the values in the Rate column by those in the Quantity column

  • Store the result in a new column called Charge

calculate column Charge as column Rate * column Quantity
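As noted above, modulo operands are rounded to the nearest integer before the operation is applied. A hypothetical illustration (the column names are placeholders):

# Hours that do not make up a whole day
calculate column PartialHours as column Hours % value 24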

Syntax

rename column OldName to NewName

rename dset OldSource.OldAlias to NewSource.NewAlias

Details

Renaming a column

When renaming a column, OldName may be a fully qualified column name. If it is not fully qualified then the column OldName in the default DSET will be renamed.

The NewName argument must not be fully qualified.

Any dots (.) in the new column name will automatically be replaced by underscores (_), as a dot is a reserved character used to implement DSET namespaces.

Renaming a DSET

When renaming a DSET, NewSource.NewAlias must contain exactly one dot and must not be the name of an existing DSET. The rename takes immediate effect, thus any subsequent reference to the renamed DSET in the transcript task must use its new name.

Any pending service definitions referencing the renamed DSET will automatically be updated with the new name.

Examples

Renaming columns:

Renaming a DSET:

Syntax

set ColName to Value

set ColName as SrcColName

set ColName = Expression

Details

If the keyword to is used then column ColName will be updated using the constant value specified in Value.

If the word as is used then the values in the column SrcColName will be copied into column ColName.

If = is used then the expression is evaluated and the column values are set to the result. For a complete list of functions that can be used in an expression please refer to the list of functions in the article about the if statement

The ColName argument must identify an existing column. The modified column will be in the default DSET unless it is a fully qualified column name.

If the overwrite option is disabled, then only cells with blank values will be updated with the new value.

Examples

Set the column called Flag in the default DSET to a value of 0:

Set the column called Flag to be the value in the column status:

Set the column called DiskTier to the 4th character in the existing value of that column

set DiskTier = @SUBSTR([DiskTier],4,1)

Set the column called Flag in the DSET custom.dataset to a value of Pending:

Set the column Flag to free if the rate column contains a value of 0:

Fill any blank values in the column username in the DSET Azure.usage with the value Unknown:

Syntax

convert colName to decimal|hex from decimal|hex

The keywords dec and decimal, and the keywords hex and hexadecimal, are equivalent.

Details

When converting values in a column, the following considerations apply:

  • Values in the column are replaced with the converted values

  • The colName argument must reference an existing column, and may optionally be fully qualified (else the column is assumed to be in the default DSET)

  • If any values in the column are not valid numbers, they will be treated as 0

  • Blank values are ignored

  • The convert statement may be used in the body of a where statement

  • If a value in colName contains a partially correct value such as 123xyz then it will be treated as a number up to the first invalid character, in this case resulting in a value of 123.

  • The hex digits in the original value can be either upper or lower case

  • The hex digits from A-F will be rendered in upper case in the converted output

  • The convert statement only supports integer values (floating point values will be floored to an integer)

Example

capitalise heading in column _vmname
capitalise heading in column servicegroup

capitalize heading and values in columns servicegroup Azure.usage.resourcepool chargeinterval

capitalise values in column Region
# Rename the column 'UnitCost' in the default DSET to 'Cost'
rename column UnitCost to Cost

# Rename the column 'UnitCost' in the DSET 'custom.prices' to 'CustomCost'
rename column custom.prices.UnitCost to CustomCost
# Usage file has yyyyMMdd in filename
import "system/extracted/AzureJuly/${dataDate}.ccr" source AzureJuly

# The resulting DSET will have an ID of 'AzureJuly.yyyyMMdd'
# where yyyyMMdd is the current dataDate

#
# Any additional processing would be done here
#

# Remove the yyyyMMdd from the DSET ID
rename dset AzureJuly.${dataDate} to Azure.July

# Create 'Azure.July.rdf'
finish
set Flag to 0
set Flag as status
set custom.dataset.Flag to Pending
where ([rate] == 0) {
    set Flag to free
}
option overwrite = no
set Azure.usage.username to Unknown
convert decimal_count from decimal to hex
convert unique_id from hexadecimal to dec

copy

This article covers both the copy and move statements. They both work in the same way apart from the fact that move deletes the source row after copying it.

Overview

The copy statement is used to copy rows from one DSET to another.

Syntax

copy rows to dset.id

move rows to dset.id

Details

Both copy and move must be used within the body of a where statement. Only rows that match the expression will be copied (or moved).

  • The DSET from which rows will be copied or moved is automatically determined from the expression used by the where statement.

  • The DSET to which rows will be copied or moved is determined by the dset.id parameter

The source and destination DSETs must be different (it is not possible to copy or move a row within the same DSET).

The destination DSET may or may not exist. If it does not exist then it will be created. If it does exist then the following logic is applied:

  • If the destination DSET has more columns than the source DSET then the new rows in the destination DSET will have blank values in the rightmost columns

  • If the destination DSET has fewer columns than the source DSET then the destination DSET will be extended with enough new columns to accommodate the new rows. In this case, existing rows in the destination DSET will have blank values in the rightmost columns

If the destination DSET is extended to accommodate the source rows then the new (rightmost) columns will have the same names as the equivalent columns in the source DSET. In the event that this would cause a naming conflict with existing columns in the destination DSET, one or more new columns in the destination DSET will have a suffix added to their name to ensure uniqueness. This suffix takes the form _N where N is a number starting at 2.

To illustrate this, if the source DSET has columns called subscription,name,address,hostname and the destination DSET has a single column called name then the resulting extended destination DSET would have columns called name,subscription,name_2,address,hostname.

Example
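The worked example later in this article uses move; a copy works in the same way but leaves the source rows in place. A sketch with hypothetical DSET and column names:

# Copy forecast rows into a separate DSET, keeping them in Azure.usage as well
where ([Azure.usage.cost_type] == "forecast") {
    copy rows to Azure.forecast
}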

normalise

Overview

The normalise statement is used to update the values in a numerical column such that they are all positive, negative or inverted.

In this documentation the spelling normalise is used, but normalize may also be used. The functionality is identical in either case.

append

Overview

The append statement is used to append one DSET to the end of another.

export

Overview

The export statement is used to snapshot the data in a DSET and write it to disk as a Dataset.

lowercase

Overview

The lowercase statement is used to modify the name of, and/or the values in, a column such that any upper case characters are replaced with their lower case equivalent.


Syntax

normalise column colName as positive

normalise column colName as negative

normalise column colName as invert

normalise column colName as standard

Details

The normalise statement processes each value in the column called colName and applies the following logic based on the last argument shown above as follows:

Argument | Result
positive | All negative numbers are replaced with their positive equivalent. Non-negative numbers are left unmodified.
negative | All positive numbers are replaced with their negative equivalent. Negative numbers are left unmodified.
invert | All positive numbers are replaced with their negative equivalent, and all negative numbers are replaced with their positive equivalent.
standard | All non-blank values are assumed to be a decimal number and are replaced with that value in conventional notation. This functionality is intended to provide a means to convert numbers in scientific notation such as 2.1E-5 to conventional notation such as 0.000021.

In order to be considered a number, a value in the colName column must start with any of the characters +, -, . or 0 to 9 and may contain a single . character which is interpreted as a decimal point.

If a value in colName is non-numeric or blank it is left intact

When using standard, all non-blank values are assumed to be numeric, and as such any non-numeric values will be changed to a numeric zero.

Additionally:

  • Any numerical value in colName which starts with a +, . or numeric (0 to 9) character is considered positive

  • Any numerical value in colName which starts with a - character is considered negative

  • When using standard the resulting conventional number will be accurate up to 14 decimal places

The normalise statement ignores the option overwrite setting, as its sole purpose is to modify existing values.

Example
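In addition to the invert example shown later in this article, the standard argument can be used to rewrite scientific notation; a sketch with a hypothetical column name:

# '2.1E-5' becomes '0.000021'
normalise column quantity as standard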

Syntax

append source_dset.id to destination_dset.id

Details

If the source DSET has any column names not present in the destination DSET then additional columns are automatically created in the destination DSET. These additional columns will contain blank values by default.

If one or more column names are present in both DSETs then the columns copied from the source DSET may be re-ordered into the same order as that used by the destination DSET.

At the end of the operation, the destination DSET will contain all the data from both DSETs, and the source DSET is unchanged.

Both DSETs must exist and both should have data. To verify that a DSET exists, or to check whether a DSET is empty, use one of the following functions:

  • @DSET_EMPTY

  • @DSET_EXISTS

Additionally, it is not possible to append a DSET to itself.
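For illustration, an append can be guarded with the functions listed above. This sketch uses the DSET IDs from the example below and assumes that @DSET_EXISTS takes a DSET ID as its argument:

if (@DSET_EXISTS(example2.data)) {
    append example2.data to example.data
}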

Example

Given the following DSETs:

DSET ID: example.data

DSET ID: example2.data

The statement append example2.data to example.data will result in the following destination DSET (example.data):

Syntax

export source.alias as filename

Details

The exported file will be created under <base_dir>/exported. The filename parameter may include a path as well as the filename to export, so long as it does not contain the substring ".." and is not an absolute path. If the path contains one or more directories that do not exist then they will be created.
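A minimal sketch (the DSET ID and path are hypothetical): the following would write a snapshot of the DSET Azure.usage to <base_dir>/exported/azure/2018/usage_snapshot.csv, creating the directories if needed:

export Azure.usage as "azure/2018/usage_snapshot.csv"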

Exporting usage

The source.alias argument is the DSET ID to export (see Core concepts for more information on DSET IDs).

The export statement can be used at any point during this process to save a CSV (conforming to Dataset format) which snapshots the data in the DSET at that moment in time. This can be used for a number of purposes including:

  • Examining the state of data part-way through processing for debugging purposes

  • Creating custom exports for subsequent import into 3rd party systems

  • Producing output CSV files (which may potentially contain merged data from multiple sources) for subsequent processing with another Transcript

Examples

When specifying a Windows path it is advisable to use UNIX-style forward slashes as the path delimiters, or to put the path+filename in double quotes, to avoid occurrences of \t being interpreted as a TAB character

The following Transcript will import a quoted CSV file, add a ProcessedFlag column to it with a value of 1 in each row, and save it back out again without quotes:

Syntax

lowercase heading|values [and heading|values] in column ColName1 [... ColNameN]

Although the syntax shown uses the keyword column, either column or columns may be specified. Both work in an identical manner.

Details

The heading and values keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.

Only upper case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.

Examples

lowercase heading in column servicegroup

lowercase heading and values in columns servicegroup VMWare.usage.resourcepool chargeinterval

lowercase values in column Region
# Move all rows in usage.data where hostname is "test server"
# to a new DSET called test.servers

where ([usage.data.hostname] == "test server") {
    move rows to test.servers
}
import "system/extracted/csp_usage.csv" source test alias data

# Invert all numerical values in column 'quantity'
normalise column quantity as invert
VMSize,RAM,CPU
small,2,2
medium,4,2
large,8,4
huge,16,8
VMSize,storage
small,50
medium,100
large,500
huge,1000
VMSize,RAM,CPU,storage
small,2,2,
medium,4,2,
large,8,4,
huge,16,8,
small,,,50
medium,,,100
large,,,500
huge,,,1000
option quote = \"
use usage from azure
create column ProcessedFlag value 1
option noquote

# Will be exported as <basedir>\exported\azure\flagged.csv
export azure.usage as "azure\flagged.csv"

aggregate

Overview

The aggregate statement is used to reduce the number of rows in a DSET while preserving required information within them.

Syntax

aggregate [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]

Details

The aggregate statement is a powerful tool for reducing the number of rows in a DSET. Aggregation is based on the concept of matching rows. Any two rows that match may be merged into a single row which selectively retains information from both of the original rows. Any further rows that match may also be merged into the same result row.

A quick introduction

A match is determined by comparing all the columns which have a function of match associated with them (further information regarding this can be found below). If all the column values match, then the rows are merged.

Merging involves examining all the columns in the data that were not used in the matching process. For each of those columns, it applies a function to the values in the two rows and updates the result row with the computed result of that function. For a full list of functions, please refer to the table further down in this article.

To illustrate this consider the following two row dataset:

If we don't care about the colour value in the above records, we can combine them together. We do care about the quantity however, so we'll add the two values together to get the final result.

The statement to do this is:

aggregate notime id match location match quantity sum

  • id match means that the values in the id columns must be the same

  • location match means that the values in the location columns must be the same

  • quantity sum means that the resulting value should be the sum of the two existing values

Applying these rules to the above example we get the following single result record:

A column called EXIVITY_AGGR_COUNT is automatically created by the aggregate statement and for each row in the output it will contain the number of source rows that were merged together to create that result row

Parameters

The aggregate statement accepts a range of parameters as summarised in the table below:

Parameter | Notes
dset.id | If not specified then the default DSET will be used
notime | (Either notime or daily is required) If used, timestamps in records are not taken into consideration when aggregating
daily | (Either notime or daily is required) If used, specifies that timestamps in the records will be considered when aggregating
offset | (May only be used if daily is present) The number of hours to shift timestamps by prior to aggregation
nudge | (May only be used if daily is present) If present, the times in the timestamp column marked as the end time column will have 1 second shaved off them prior to aggregation
default_function | Specifies the default logic to apply to a column when merging records together. If not specified then the default is first (see table below): by default, a function of first is applied to the columns, such that the original row retains its value
colname function | One or more pairs of column + function parameters. For each pair, the specified function will be used for the specified column name when merging records together during the aggregation process. For any columns not explicitly named, the default function will be applied.

If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:

Function | Logic
match | The value in both records must be the same
first | The existing value in the first ever result record will be used
last | The value in the last record merged will be used
sum | The values will be treated as numbers and summed
max | The values will be treated as numbers and the greatest will be used
min | The values will be treated as numbers and the smallest will be used
longest | Whichever value has the most characters in it will be used
shortest | Whichever value has the least characters in it will be used
blank | The value in the resulting merged record will be blank
avg | The values will be treated as numbers and the average will be used

Non time-sensitive aggregation

When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.

In this case when comparing two rows to see if they can be merged, the aggregation function simply checks to see if all the columns with a function of match are the same, and if they are the two rows are merged into one by applying the appropriate function to each column in turn.

De-duplication

The following illustrates the aggregate statement being used to remove duplicate rows from a DSET:

The analysis of the statement above is as follows:

  • notime - we are not interested in timestamps

  • default_function match - by default every column has to match before records can be aggregated

  • subscription_id match - this is effectively redundant as the default_function is match but needs to be present because at least one pair of colname function parameters is required by the aggregate statement

The resulting DSET will have no duplicate data rows, as each group of rows whose column values were the same were collapsed into a single record.

Row reduction while preserving data

The example shown at the top of this article used the sum function to add up the two quantity values, resulting in the same total at the expense of being able to say which source record contributed which value to that total.

The sum function can therefore accurately reflect the values in a number of source rows, albeit with the above limitation. By using a function of sum, max or min, various columns can be processed by aggregate in a meaningful manner, depending on the specific use case.

Time-sensitive aggregation

When aggregating, columns containing start time and end time values in UNIX epoch format can be specified. Each record in the DSET therefore has start and end time markers defining the period of time that the usage in the record represents. As well as taking the start times and end times into account, time-sensitive aggregation can perform additional manipulations on these start and end times.

A quick example

Consider the following CSV file called aggregate_test.csv:

It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:

Resulting in:

As can be seen, for each unique combination of the values in the id,subscription-id and service columns, the start and end times have been adjusted as described above and the quantity column contains the sum of all the values in the original rows.

When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.

Further notes

The daily parameter to aggregate means that the START_TIME and END_TIME columns are now recognised as containing timestamps. When aggregating with the daily option, timestamps within the current dataDate are combined to result in an output record which has the earliest start time and the latest end time seen within the day.

Optionally, following daily an offset may be specified as follows:

aggregate aggr.test daily offset 2 id match subscription_id match quantity sum

In this case the start and end timestamps are adjusted by the number of hours specified after the word offset before aggregation is performed. This permits processing of data which has timestamps with timezone information in them, and which may start at 22:00:00 of the first day and end at 21:59:59 of the second day, as an offset can be applied to realign the records with the appropriate number of hours to compensate.

The nudge parameter shaves 1 second off end times before aggregating in order to avoid conflicts where hourly records start and end on 00:00:00 (the last second of the current hour is the same as the first second of the next hour).

round

Overview

The round statement is used to ensure that numeric values in a column are whole multiples of a specified number.

Syntax

round colName [direction] [to nearest value]

Details

The round statement is used to round numbers to the nearest multiple of any integer or floating point number.

The parameters supported by round are as follows:

Parameter | Default | Purpose
colName | n/a | The name of the column containing the values to round
direction | up | Whether to round to the next highest (up) or lowest (down) multiple of value
value | 1 | A value determining the granularity of the rounding

The simplest form of the statement is round colName. This will use the defaults shown in the table above such that values are rounded to the next highest integer. Alternatively, the statement round colName down will round down to the nearest integer.

If the value argument is provided then the numbers in colName will be rounded up or down to the nearest multiple of value. For example the statement ...

round Quantity up to nearest 5

... will round the numbers in the column Quantity up to the next multiple of 5. Thus any number in Quantity higher than 10 and less than 15 will be rounded to 15.

When specifying the value argument, floating point values are supported.

Additional notes

The round statement observes the following rules:

  • Non-numeric values in colName are ignored

  • Blank values, or a value of 0 in colName are ignored

  • Numbers in colName that are already a whole multiple of value will not be modified

The round statement may be used in the body of a where statement to perform conditional rounding.

Examples
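A brief illustration (the column name is hypothetical); a fuller set of examples appears later in this article:

round Price up to nearest 0.05     # eg: 1.23 becomes 1.25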

Language

Syntax

Within the individual reference articles for each statement, the syntax is described using the following conventions:

  • bold for keywords

  • italics for arguments

  • Square brackets for optional keywords and arguments [like this]

  • Vertical pipe for alternative keyword options just|exactly as shown

  • Ellipses for a variable length list of arguments: Column1 ... ColumnN
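For instance, the syntax description for the round statement, round colName [direction] [to nearest value], permits statements such as:

round Quantity
round Quantity down
round Quantity up to nearest 5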

Refer to the Core concepts page for more information regarding datasets, fully qualified column names and related information.

Reference

The following statements (in alphabetical order) are supported by Transcript:

Statement | Description
aggregate | Reduce the number of rows in a DSET while preserving information
append | Append one DSET to the end of another
calculate | Perform arithmetic on column values
capitalise / capitalize | Capitalise column name and/or values
convert | Convert between decimal and hex values
copy | Copy rows from one DSET to another
correlate | Merge DSETs using a key
create | Create one or more columns
default | Specify the default DSET
delete | Delete columns, rows or DSETs
export | Snapshot a DSET to disk
finish | Create a Reporting Database File
if | Conditionally execute statements
import | Import a Dataset or CCR file
include | Execute one task from within another
lowercase | Convert column name and/or values to lower case
move | Move rows from one DSET to another
normalise / normalize | Normalise strings
option | Set global parameters
rename | Rename an existing column or DSET
replace | Search and replace values in a column
round | Round numeric values in a column
service | Create a chargeable service
services | Create multiple chargeable services
set | Set cell values in a column
split | Split column values
timecolumns | Set the start time and end time columns
timerender | Render a UNIX timestamp in human-readable form
timestamp | Create a timestamp column
update_service | Modify one or more existing service descriptions and/or unit labels
uppercase | Convert column name and/or values to upper case
use | See import
var | Define a variable
where | Define a local filter

split

Overview

The split statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.

include

Overview

The include statement is used to combine Transcript task files together.

Syntax

delete

Overview

The delete statement is used to delete one or more columns or rows from one or more DSETs.

uppercase

Overview

The uppercase statement is used to modify the name of, and/or the values in, a column such that any lower case characters are replaced with their upper case equivalent.

terminate

The terminate statement will exit the transcript task immediately.

Syntax

terminate


include taskfile

Details

The include statement is used to combine two or more task files into a single script prior to commencement of execution.

If using the Windows path delimiter - \ - it is advisable to put the path+filename in double quotes to avoid the backslash being interpreted as an escape character

Before executing a task file, Transcript analyses the script and processes all the include statements. Wherever an include statement is encountered, the specified taskfile is read from disk and inserted into the script in place of the include statement itself.

Command placement

As with most statements, include must occupy a single line of its own in the script.

Additional notes

The taskfile argument is treated as relative to <basedir>/system/config/transcript/.

Multiple files may be included in a task file and any included file may include other files but a check is done to ensure that no infinite include loops are generated.

Example

Given the following two files in /system/config/transcript/ ...

Prior to execution the pre-processor will combine these into the single memory-resident Transcript task:

Syntax

delete columns [except] ColName1 [... ColNameN]

delete blankcolumns

delete rows

delete dset dset_id

Details

Deleting columns

The list of column names following the initial delete columns statement may be used in one of two ways:

  1. The columns listed will be deleted

  2. All columns except those listed will be deleted

Which method to use is determined by the presence (or otherwise) of the except keyword:

When using the except keyword, all column names following it must belong to the same DSET

It is not possible to delete a column from a DSET which only contains a single column. When deleting a column, the memory used to index it, along with the memory used to store the contents of its cells is released. This may be useful in situations where memory is limited.

Deleting unwanted columns early in a Transcript task will also increase the performance of many subsequent operations performed by the task.

The second keyword may be column instead of columns. Either way, the statement behaves identically.

Deleting blank columns

The delete blankcolumns statement will delete columns that only contain blank values from the default DSET.

It is not possible to completely empty a DSET using delete blankcolumns. In the event that the last remaining column in a DSET is blank then it will not be deleted.

Deleting rows

In order to delete rows from a DSET a local filter must be in effect. A local filter is created using the where statement.

When used within the body of a where statement, delete rows will delete all rows in the DSET associated with the local filter where the condition is true. For more information on conditions, please refer to the where article.

Either delete rows or delete row may be used (both variations work in an identical manner)

Deleting a DSET

The delete dset statement can be used to remove a DSET from memory. This may be useful in cases where a DSET is no longer required, for example after it has been used to enrich another DSET via the correlate statement.

It is not possible to delete the default DSET.

Examples

Delete the column temp from the default DSET and the column Interim from the azure.Usage DSET: delete columns temp azure.Usage.Interim

Delete any columns in the default DSET that have no values in any rows: delete blankcolumns

Delete all rows where the VMID is 1234

Delete the DSET azure.rates delete dset azure.rates

Syntax

uppercase heading|values [and heading|values] in column ColName1 [... ColNameN]

Although the syntax shown includes the keyword column, either column or columns may be specified. Both work in an identical manner.

Details

The heading and values keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.

Only lower case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.

Examples

uppercase heading in column servicegroup

uppercase heading and values in columns servicegroup VMWare.usage.resourcepool chargeinterval

uppercase values in column Region
Details

Normally a transformation script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.

When the terminate statement is encountered, the script will finish at that point. No statements after the terminate statement will be executed.

Example

var file = "system/extracted/mydata.csv"

if (@FILE_EXISTS(${file})) {
    import ${file} source my alias data
} else {
    terminate
}
id,colour,location,quantity
1234,blue,europe,4.5
1234,green,europe,5.5
id,colour,location,quantity,EXIVITY_AGGR_COUNT
1234,blue,europe,10,2
# The first column in the DSET being aggregated
# is called subscription_id

aggregate notime default_function match subscription_id match
startUsageTime,endUsageTime,id,subscription_id,service,quantity
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_1234,SUB_abcd,Large VM,2
2017-11-03:00.00.00,2017-11-03:02.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:03.00.00,ID_3456,SUB_efgh,Medium VM,2
2017-11-03:00.00.00,2017-11-03:04.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:05.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:07.00.00,ID_3456,SUB_efgh,Large VM,2
2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
import system/extracted/aggregate_test.csv source aggr alias test

var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using startUsageTime template ${template}
timestamp END_TIME using endUsageTime template ${template}
timecolumns START_TIME END_TIME
delete columns startUsageTime endUsageTime

aggregate aggr.test daily nudge default_function first id match subscription_id match service match quantity sum

timerender START_TIME as FRIENDLY_START
timerender END_TIME as FRIENDLY_END
id,subscription_id,service,quantity,START_TIME,END_TIME,EXIVITY_AGGR_COUNT,FRIENDLY_START,FRIENDLY_END
ID_1234,SUB_abcd,Large VM,12,1509667200,1509692399,6,20171103 00:00:00,20171103 06:59:59
ID_3456,SUB_efgh,Medium VM,6,1509667200,1509688799,3,20171103 00:00:00,20171103 05:59:59
ID_3456,SUB_efgh,Large VM,8,1509667200,1509692399,4,20171103 00:00:00,20171103 06:59:59
round Quantity                       # Round up to nearest integer
round Quantity down                  # Round down to nearest integer
round Quantity up to nearest 4       # Round up to next multiple of 4
round Quantity down to nearest 2     # Round down to next lowest even number

round Quantity up to nearest 0.5     # ie: 2.25 -> 2.5
round Quantity down to nearest 0.1   # ie: 23.34567 -> 23.3
round Quantity down to nearest 0.01  # ie: 23.34567 -> 23.34

where ( ([quantity] > 5) && ([quantity] <= 100) ) {
    round quantity up to nearest 10       # Force consumption in blocks of 10
}

where ([quantity] > 100) {
       round quantity up to nearest 5     # Better deals for larger consumption
}
# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

include import_customers.trs

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...
# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2
# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...
delete columns One Two Three         # Delete the listed columns
delete columns except One Two Three  # Delete all but the listed columns
where ([VMID] == 1234) {
    delete rows
}


Syntax

split ColName using sep

split ColName using sep retaining first [column_count]

split ColName using sep retaining last [column_count]

split ColName using sep retaining first_column_index [to last_column_index]

The keyword following ColName may be using, separator or delimiter. All three work in exactly the same way.

Details

The ColName argument is the name of an existing column whose values are to be split and the sep argument (which must only be a single character in length) is the delimiter by which to split them.

For example a value one:two:three:four, if divided by a sep character of :, would result in the fields one, two, three and four.

To specify a sep value of a space, use " " or \<space> (where <space> is a literal space character)

New columns are created if necessary to contain the values resulting from the split. These additional columns are named ColName_split1 through ColName_splitN where N is the number of values resulting from the process.

If there is an existing column with a name conflicting with any of the columns that split creates then:

  • If the overwrite option is set then all values in that pre-existing column will be overwritten

  • If the overwrite option is not set:

    • If there are blank values in the existing column they will be updated

    • If there are no blank values in the existing column then no changes will be made to it

Keeping only specific fields

If the retaining keyword is specified, split will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:

Specification | Result
first or 1 | Discard all but the first column
first N or 1 to N | Discard all but the first N columns
last | Discard all but the last (rightmost) column
last N | Discard all but the last N columns
N | Discard all but the Nth column
N to M | Discard all but the columns from the Nth to the Mth inclusive

In the table above, N refers to a result column's number where the first column created by split has a number of 1
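As a further sketch of the retaining forms, keeping only the first two fields of the ID column used in the example below would look like:

split ID using : retaining first 2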

Example

Given an input dataset of the form:

The statement ...

split ID using :

... will result in the dataset:

Using the same original dataset, the statement ...

split ID using : retaining 3 to 5

... will result in the dataset:

correlate

Overview

The correlate statement is used to enrich the default DSET by adding new columns to it, and/or updating existing columns with useful values. The new column names are derived from other DSETs and the values in those columns are set using a lookup function based on the value in a key column shared between the DSETs.

Syntax

correlate ColName1 [... ColNameN] using KeyColumn [assuming assumeDSET] [default DefaultValue]

Details

The ColName1 ... ColNameN arguments are column names that will be copied from their original DSETs and merged into the default DSET.

Column names must be fully qualified, unless the assuming parameter is used, in which case any column names that are not fully qualified will be assumed to belong to the DSET specified by assumeDSET.

Source and Destination columns

Source columns are those from which a cell is to be copied when the KeyColumn matches. Destination columns are columns in the default DSET into which a cell will be copied. Destination column names are derived from the names of the source columns as follows:

  • The source column is the argument in its original form, for example: Azure.usage.MeterName

  • The destination column is the same argument, but with the DSET ID replaced with that of the default DSET. For example if the default DSET is Custom.Services then the destination column for the above would be Custom.Services.MeterName.

If a destination column name doesn't exist in the default DSET then a new column with that name will automatically be created.

The Key Column

The KeyColumn argument is a column name which must not be fully qualified and which must exist in the default DSET and all of the DSETs referenced by the ColNameN arguments.

Default values

The DefaultValue argument, if present, specifies the value to write into the destination column if there is no match for the KeyColumn. If the DefaultValue argument is not specified then any rows where there is no match will result in a blank cell in the destination column.

For each row in the default DSET, the source DSET is searched for a matching KeyColumn value, and if a match is found then the value in the source column is used to update the default DSET. The row of the first match found in the source DSET will be used.

Overwriting

When matching the KeyColumn values, the logic in the following table is evaluated against every row in the destination DSET.

✘ means no or disabled, ✔ means yes or enabled

Match Found | Overwrite | Default Value | Result
✘ | ✘ | ✘ | No values will be updated
✔ | ✘ | ✘ | Empty destination column cells will be updated
✘ | ✘ | ✔ | Empty destination column cells will be set to the default value
✔ | ✘ | ✔ | Empty destination column cells will be set to the matched source column value
✘ | ✔ | ✘ | No values will be updated
✔ | ✔ | ✘ | Destination column cells will be updated
✘ | ✔ | ✔ | Destination column cells will be set to the default value
✔ | ✔ | ✔ | Destination column cells will be set to the matched source column value

Examples

Given two Datasets as follows, where the default DSET is MyData.Owners:

Dataset 'MyData.Owners'

Dataset 'Custom.Services'

The statement: correlate service description using id assuming Custom.Services

Will enrich the MyData.Owners Dataset such that it contains:

The statement: correlate service description using id assuming Custom.Services default unknown

Will produce:

timerender

Overview

The timerender statement is used to create or update a column containing human-readable versions of UNIX timestamps.

Syntax

timerender timestampCol as readableCol

Details

When working with time-sensitive data, Exivity uses UNIX epoch timestamps (accurate to 1 second) which contain an integer representing the number of seconds that have elapsed since 00:00:00 on January 1st 1970.

These values can be generated from usage data using the timestamp statement and may be modified during aggregation, so in order to assist the development and debugging of Transcript tasks the timerender statement may be used to create human-readable versions of UNIX timestamps as follows:

  • The timestampCol parameter is the name of the column containing the UNIX timestamps to convert

  • The readableCol parameter is the name of the column to write the human-readable strings into

If the readableCol column does not exist, it will be created. If it does exist then the values in it will be updated.

If readableCol exists and the 'overwrite' option is disabled via option overwrite then only blank values will be updated

The human readable timestamps contain fields for year month day hour minute second, eg: 20170518 17:00:00.

The timerender statement will always use the local timezone of the Exivity server when converting the timestamps. If other timezones are required then timestamp can be invoked with the offset option to adjust the UNIX timestamps as required.

If any values in timestampCol are blank, or do not contain a valid UNIX timestamp then the value in readableCol on the same row will default to a blank value.

Example

replace

Overview

The replace statement is used to search for, remove, and optionally replace with new values, substrings of the values in a column.

update_service

Overview

The update_service statement is used to update service descriptions and/or unit labels.

var

Overview

The var statement is used to create or update a variable.

A number of variables are automatically created before the commencement of a Transcript task. Please refer to the Variables section in the Introduction to this guide for further details.

Name,ID
VM-One,sales:2293365:37
VM-Two,marketing:18839:division:89AB745
VM-Three,development:34345:engineering:345345:Jake Smith
VM-Four,sales::38
VM-Five,marketing:234234234:testMachine
VM-Six,development:xxxx:test
VM-Seven,1234:5678
VM-Eight,test:::
VM-Nine,field::5
VM-Ten,test::3425:
Name,ID,ID_split1,ID_split2,ID_split3,ID_split4,ID_split5
VM-One,sales:2293365:37,sales,2293365,37,,
VM-Two,marketing:18839:division:89AB745,marketing,18839,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,development,34345,engineering,345345,Jake Smith
VM-Four,sales::38,sales,,38,,
VM-Five,marketing:234234234:testMachine,marketing,234234234,testMachine,,
VM-Six,development:xxxx:test,development,xxxx,test,,
VM-Seven,1234:5678,1234,5678,,,
VM-Eight,test:::,test,,,,
VM-Nine,field::5,field,,5,,
VM-Ten,test::3425:,test,,3425,,
Name,ID,ID_split1,ID_split2,ID_split3
VM-One,sales:2293365:37,37,,
VM-Two,marketing:18839:division:89AB745,division,89AB745,
VM-Three,development:34345:engineering:345345:Jake Smith,engineering,345345,Jake Smith
VM-Four,sales::38,38,,
VM-Five,marketing:234234234:testMachine,testMachine,,
VM-Six,development:xxxx:test,test,,
VM-Seven,1234:5678,,,
VM-Eight,test:::,,,
VM-Nine,field::5,5,,
VM-Ten,test::3425:,3425,,


Syntax

replace substring in colName [with replacement]

Details

The replace statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:

Parameter | Notes
substring | A string to search the values in colName for
colName | The column to search for the substring
replacement | The string to replace occurrences of substring with

The colName argument may or may not be fully qualified, but must reference an existing column.

If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.

If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.

The replace statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information

Example

Given the following sample data:

The following script ...

... will produce the following output data.

Syntax

update_service service_key [description new_description] [label new_label]

update_service using file file key_col [description description_col] [label label_col]

update_service using dset key_col [description description_col] [label label_col]

Details

The update_service statement can:

  • update a single service using a specific key

  • update all services whose keys are present in a DSET or CSV lookup source

Update a single service

update_service service_key [description new_description] [label new_label]

The service with a key of service_key is updated with the literal description and label provided.

Update multiple services using a CSV lookup file

update_service using file file key_col [description description_col] [label label_col]

The file parameter may contain a path component and is considered as relative to the home directory.

All services keys that match values in the key_col column in the CSV called file have their descriptions and/or labels updated with the values in the corresponding description_col and label_col columns in file.

Update multiple services using a DSET

update_service using dset key_col [description description_col] [label label_col]

To update services using a DSET, it must have been previously imported in order to be used as the lookup source.

The DSET ID is derived from the key_col column as follows:

  • If fully qualified then the DSET is derived from key_col directly

  • If not fully qualified then the default DSET is assumed

In either case, whichever of description_col and label_col is used (or both) must be located in the same DSET as key_col
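As a sketch of the DSET-based form, assuming a previously imported lookup DSET with the ID services.lookup whose key, description and label columns are named as shown (all names here are hypothetical):

update_service using dset services.lookup.service_key description service_description label unit_label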

Examples


Syntax

var name = value

As the value may be derived from an expression (see Using an expression to set a variable below) it is recommended that variable values be quoted in order to avoid characters such as / being interpreted as mathematical operators.

Details

A variable is a named value. Once defined, the name can be used in place of the value for the rest of the transcript task. This can be useful in that it permits configuration of various parameters at the top of a transcript task, which makes maintenance easier.

Both the name and the value parameters must be present, and there must be a space on each side of the = character. To use a space or tab in a variable value, use an escape or double quote the value:

A variable is referenced by enclosing its name with ${ and }. For example a variable called outputFile can be referenced using ${outputFile}.

Variable names are case sensitive, therefore ${variableName} and ${VariableName} are separate variables.

If there is already a variable called name then the var statement will update the value.

The value of a variable may not exceed 1023 characters

Using an expression to set a variable

As well as a literal value, an expression may be specified, in which case the variable is set to the output of that expression. For example:

For a full list of available functions please refer to the functions section of the documentation for the if statement.

Example

owner,id
John,100
Tim,110
Fokke,120
Joost,130
Jon,140
service,description,id
Small_VM,Web_Server,130
Medium_VM,App_Server,100
Large_VM,DB_Server,110
Medium_VM,Test_Server,120
owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,,
owner,id,service,description
John,100,Medium_VM,App_Server
Tim,110,Large_VM,DB_Server
Fokke,120,Medium_VM,Test_Server
Joost,130,Small_VM,Web_Server
Jon,140,unknown,unknown
# Import raw usage data
import "system\extracted\csp_usage.csv" source CSP alias usage

# Create UNIX start and end time columns from it
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME_UNIX using usageStartTime template ${template}
timestamp END_TIME_UNIX using usageEndTime template ${template}

# Render human-readable columns
timerender START_TIME_UNIX as START_TIME_HUMAN
timerender END_TIME_UNIX as END_TIME_HUMAN
PayerAccountName,TaxationAddress,ProductCode,ProductName,ItemDescription
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from EU (Ireland)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AmazonS3,Amazon Simple Storage Service,$0.0245 per GB - first 50 TB / month of storage used
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US West (Northern California)
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier
John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US East (Northern Virginia)
import "system\extracted\example.csv" source test alias data
replace "John Doe" in PayerAccountName with "Finance Dept"
replace "RH15 0HZ, " in TaxationAddress
replace "United Kingdom" in TaxationAddress with UK
replace "AWS" in ProductCode

where ([ProductCode] == "AmazonS3") {
    replace "Amazon" in ProductCode
}

replace "Amazon " in ProductName
replace "AWS " in ProductName
replace "transfer from" in ItemDescription with -
"PayerAccountName","TaxationAddress","ProductCode","ProductName","ItemDescription"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - EU (Ireland)"
"Finance Dept","123 Big St, Largetown, UK","S3","Simple Storage Service","$0.0245 per GB - first 50 TB / month of storage used"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US West (Northern California)"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier"
"Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US East (Northern Virginia)"
# Update all services from CSV file
update_service using file "system/lookup.csv" service_key description service_name label unit

# Update specific service
update_service "Small_VM1" description "Small VM 1" label "VM's"
var example = "Hello world"
var example = Hello\ world
var value1 = 5
var value2 = 10
var largest = @MAX(${value1}, ${value2})
# ----> Start Config <----
var tmp_folder = AzureTmp
# ----> End Config <----

import "system/extracted/${tmp_folder}/${dataDate}.csv" source Azure alias usage
import "system/extracted/${tmp_folder}/rates.csv" source source Azure alias rates

timecolumns

Overview

The timecolumns statement is used to set the start time and end time columns in a DSET

Syntax

timecolumns start_time_col end_time_col

timecolumns clear

Details

The usage data stored in a DSET may or may not be time sensitive. By default it is not, and every record is treated as representing usage for the entire day. In many cases however, the usage data contains start and end times for each record which define the exact time period within the day that the record is valid for.

If the usage data contains start and end times that are required for functions such as aggregation or reporting, the column(s) containing those times need to be marked such that they can be identified further on in the processing pipeline. This marking is done using the timecolumns statement.

The timecolumns statement does not perform any validation of the values in either of the columns it is flagging. This is by design, as it may be that the values in the columns will be updated by subsequent statements.

The values in the columns will be validated by the finish statement.

If the timecolumns statement is executed more than once, then only the columns named by the latest execution of the statement will be flagged. It is not possible to have more than one start time and one end time column.

Both the start_time_col and end_time_col parameters may be fully qualified column names, but they must both belong to the same DSET.

It is possible to use the same column as both the start and end times. In such cases the usage record is treated as spanning 1 second of time. To do this, simply reference it twice in the statement:

Clearing the flagged timestamp columns

To clear both the start and end time columns, thus restoring the default DSET to treating each record as spanning the entire day, the statement timecolumns clear may be used.

Currently the statement timecolumns clear will only clear the timestamp columns in the default DSET

This can be useful in the following use case:

  • The DSET is loaded and timestamp columns are created

  • finish is used to create a time-sensitive RDF

  • The timestamp columns are cleared

  • The DSET is renamed using the rename dset statement

  • Further processing is done on the DSET as required

  • finish is used to create a second RDF which is not time-sensitive

Example

where

Overview

The where statement is used to define a local filter, which restricts the scope of any statements contained within its body.

timecolumns timestamp_col timestamp_col
# Read data from file into a DSET called usage.data
import system/extracted/usage_data.csv source usage alias data

# Create two UNIX-format timestamp columns from the columns
# usageStartTime and usageEndTime, each of which records a time
# in the format '2017-05-17T17:00:00-07:00'
var template = YYYY.MM.DD.hh.mm.ss
timestamp START_TIME using usageStartTime template ${template}
timestamp END_TIME using usageEndTime template ${template}

# Flag the two columns we just created as being the start and
# end time columns
timecolumns START_TIME END_TIME

# Create the time sensitive DSET
finish
finish
Syntax

Details

The expressions used by the where statement take the same form as those used in the if statement.

The information below covers aspects of expression use that are unique to the where statement

The where statement is used to define a local filter which restricts the effect of the statements located in the lines between the opening { and closing }.

These braces define the start and end of the body of the where statement.

  • The opening { must be preceded with white-space and may be positioned on the same line as the where statement or on a line of its own

  • The closing } must be located on a line of its own

Only statements that support a local filter may be located within the body. If a non-supported statement is included in the body then a log entry is generated and the task will fail.

The conditional expression associated with the where statement must reference at least one column name as detailed in the Column names section below

Only rows in the DSET associated with the expression for which the conditional expression is true will be affected by the statements in the body of the where statement.

Referencing column names

As the purpose of the where statement is to selectively process rows of data based on the expression, it is a requirement that the condition references one or more column names.

To reference a column in an expression, its name should be specified, enclosed in square brackets [like_this].

If the column name contains white-space then it should be quoted within the brackets ["like this"]. The reason that the quotes must be located within the brackets is that otherwise the brackets (along with their contents) would be treated as a literal string.

To illustrate this in use, consider a DSET called test.data which contains the following:

The following script will create a new column called "flag" and set the value of that new column to 1 for all rows where the user key is f.

A blank value in a column can be checked for using a pair of quotes with no content - "". The following will delete all rows in the DSET Azure.usage where the quantity column is empty:

It is possible for one or more statements (such as set) within the statement block enclosed by where { ... } to modify the contents of a column that is being used by the expression.

Doing this may lead to unexpected results. For each statement in the block the expression is re-evaluated for every row in the data being processed, so if set is used to update the values in a column referenced by the expression then the new values will be used for all subsequent evaluations, which in turn affects all following statements in the where block.

If updating a value in any column(s) referenced by the expression, it is recommended to make a copy of the column before the where statement, using

set copy_of_column as original_column

The values in the copy can then be freely modified, and the original column can be replaced with it after execution of the where statement has completed.
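A minimal sketch of this approach (the column name quantity is illustrative and not part of any example dataset in this article):

# Take a copy before the where block so the expression keeps
# evaluating against the original, unmodified values
set quantity_copy as quantity

where ([quantity] == "") {
    # Modify the copy rather than the column used in the expression
    set quantity_copy to 0
}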

Regular expressions

Regular expressions are often used to restrict processing to certain rows in a dataset.

For example:

As is the case with the if statement, the entire value in the column must match the expression in order for the row to be processed (there is no implicit .* at the end of an expression).

Supported statements

The following statements may be used in the body of a where statement:

  • calculate

  • convert

  • copy

  • create mergedcolumn

The following statements may also be used in the body of a where statement but are unaffected by the expression (and as a result work exactly the same as if they were used outside the where statement body)

  • default

  • export

  • import

  • option

  • terminate

Examples

option

Overview

The option statement is used to set global parameters during the execution of a Transcript task.

create

Overview

The create statement is used to add one or more new columns to an existing DSET.

Syntax

where (conditional expression) {
   <statements>
}
user key,value,name
d,4,dexter
f,6,freddy
g,7,geoff
a,1,andy
# Create a blank column called 'flag'
create column flag

# Set values in 'flag' to 1 where the user key is f
where (["user key"] == f) {
   set flag to 1
}
where ([Azure.usage.quantity] == "") {
    delete rows
}
# Delete all rows where the hostname contains the word 'test'
where ([hostname] =~ /.*test.*/) {
    delete row
}
# Delete all rows where the VMID is 1234

where ([VMID] == 1234) {
    delete rows
}
delete rows
move
replace
round
set
split
timerender
var
Syntax

option option = setting

option noquote

Details

The option statement can be used multiple times within the same task script and always takes immediate effect. It is therefore possible (for example) to import a CSV file delimited with commas and quoted with single quotes, change the options and then export it with semicolon delimiters and double quotes.

The supported options are as follows:

Option

Default

Notes

continue

disabled

Determines whether to proceed to the next day in a date range or bail out if an error occurs on any given day when processing a range of dates - see notes below.

delimiter separator

,

Specifies the delimiter character between each field in a CSV file. This character will be used when importing and exporting Datasets. The default value is a comma. If a dot character is used, then the option statement will be ignored and a warning generated in the logfile. Either delimiter or separator can be used - they act in exactly the same way.

services

readonly

May be set to readonly, overwrite or update. This determines whether service definitions can be updated and, if so, which fields to update. Please refer to the notes in the next section for more detail.

mode

When using options, there must be whitespace on each side of the = sign

Additional notes

Continue

option continue = yes|enabled

option continue = no|disabled

When executing a task file repeatedly against each day in a date range, by default Transcript will abort the whole run if a task failure occurs. In cases where this is undesirable, setting the continue option to enabled or yes (both work in exactly the same way) will change the behaviour such that if a task failure occurs then execution will resume with the next day in the range.

When combining the continue option with option mode = permissive it is possible to process a range of dates for which usage or other data is not available, because the mode option will prevent a failed import statement from being treated as a fatal error.
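A minimal sketch of this combination (the source name Azure is taken from examples elsewhere in this article):

option continue = enabled
option mode = permissive

# If the extracted file for a given day is missing, the failed import is
# logged and processing resumes with the next day in the range
import usage from Azure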

Delimiter / Separator

When specifying a quote or tab as the separator it is necessary to escape it in order to prevent Transcript from interpreting it as a meaningful character during the parsing of the task script. For example:

Services

This option may be specified as readonly, overwrite or update and influences the behaviour of the service and services statements as follows:

Value

Notes

readonly

Existing service definitions which have keys matching those in the data being processed will not be overwritten with new versions (though new services may be created).

overwrite

Existing services which have keys matching those in the data being processed will be overwritten.

update

Existing services which have keys matching those in the data being processed will have their Description and Unit Label fields updated, but no other changes will be made.

Any services that are overwritten through use of option services = overwrite will lose any custom rates associated with them and a single default rate will be created instead.

In all cases, rate revisions may be created depending on the data being processed, but existing global rates will not be replaced with new ones if the effective date of the updated service matches that of an existing global revision.

Execution mode

Transcript supports two modes of execution for tasks:

  • In strict mode, if an error is encountered at any point in the task, the error will be logged and execution will terminate

  • In permissive mode, many errors that would otherwise have caused the task to fail will be logged, the statement that caused the error will be skipped and execution will continue from the next statement in the task.

The mode option can be used multiple times and changed at any point during task execution. This means that specific sections of the task can be more error tolerant.

Errors that can be handled in permissive mode are mainly syntax errors or those involving invalid parameters to Transcript statements. Some error conditions that arise during the execution of a statement will cause the task to fail even in permissive mode.
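A minimal sketch of switching modes within a task (the column names are illustrative):

# Tolerate problems while removing columns that may not be present
# in every day's data
option mode = permissive
delete columns notes comments

# Handle the remainder of the task strictly
option mode = strict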

Log levels

Transcript generates a considerable amount of logging information during the execution of a task. The loglevel option can be used to increase or decrease the level of detail written to the logfile. All logging levels must be specified in UPPER CASE. The following levels can be set:

Level

Details

INTERNAL

Self-diagnostic messages indicating an internal error detected with Transcript itself

FATAL

Non-recoverable errors, usually as a result of an Operating System error such as no available memory

ERROR

A transcript task error (syntax or data related)

WARN

An unexpected event or a syntax error that can be recovered from

INFO

Informational messages about actions performed during the execution of a transcript task

The order of the logging levels in the table above is significant, in that for any given level, all levels above it in the table are also in effect. Therefore a logging level of WARN will result in log entries for ERROR, FATAL, and INTERNAL level events, as well as warnings.

The loglevel option can appear multiple times within a transcript task and will take immediate effect whenever it is used. This means that within a task, the loglevel can be increased for certain statements and reduced for others.

Regardless of the logging level, some events will always create a logfile entry, for example the success or failure of a transcript task at the end of its execution.

Log mode

In order to minimise the effect on performance when logging, Transcript opens the logfile when it is first run and then holds it open until the completion of the task being executed. The logfile can be accessed in one of two modes:

Mode

Details

SAFE

After every message, the logfile is flushed to disk

FAST

The logfile is not explicitly flushed to disk until the termination of the task file

The default is SAFE. It is not recommended that this be changed.

Examples

The following Transcript task will import a CSV file quoted with double quotes and delimited with commas and then export a copy with semicolons as delimiters and quoted with single quotes:

It also increases the logging level for the import statement.

create column NewColumnName [value Value]

create columns from ColumnName [using ValueColumnName]

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [Column [/regex/]|string literal]

Details

Explicit single column creation

Syntax

create column NewColumnName [value Value]

Details

This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no default DSET has been explicitly defined using the default dset statement then the DSET created by the first use or import statement in the Transcript task is automatically set as the default DSET.

A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional value Value portion of the statement is present, in which case all the cells in the new column will be set to Value.

Examples

Create a new empty column called Cost in the default DSET: create column Cost

Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0

Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

Automated single/multiple column creation

Syntax

create columns from ColumnName [using ValueColumnName]

Details

This statement is used to create multiple columns in a single operation. As is the case for create column above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.

Given this example dataset:

The statement create columns from ServiceName using Quantity will create the result shown below:

The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:

When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:

When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.

This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

Example

The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.

Merging column values to create a new column

Syntax

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [ ... Column [/regex/]|string literal]

If preferred, the word using may be used instead of the word from (both work in an identical fashion)

Details

This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:

Parameter

Required

Meaning

NewColumn

Yes

The name of the new column to create

sep

No

If specified after the separator keyword, sep is a string to insert between values extracted from the source columns

Column

Yes (at least one)

The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)

/regex/

The separator may be more than one character in length (up to 31 characters may be specified)

If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.

If a regex is not specified, then the entire value in the source column will be used.

Options

By default the value extracted from a source column will be blank in the following two cases:

  • There is a blank value in a source column

  • No match for a regular expression is found in the value of a source column

In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:

option merge_blank = some_text_here

This option will use the string some_text_here in place of any blank source column value.

option merge_nomatch = some_text_here

This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.

Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
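A minimal sketch (re-using the department and user_id columns from the examples below), substituting a placeholder for blank source values and then restoring the default behaviour:

# Use a placeholder wherever a source column is blank
option merge_blank = unknown
create mergedcolumn key separator : from department user_id

# Restore the default behaviour for subsequent statements
option merge_blank = <blank>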

Examples

Given the following dataset:

The following examples illustrate some uses of the create mergedcolumn statement:

Example 1

Example 2

If no regular expression is specified then the values in the source column will be used in their entirety:

Example 3

Let us add a new row to the sample dataset which has a non-compliant value for the user_id:

By default a non-matching value will result in a blank component of the merged result:

In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:

option delimiter = \t # Specify a literal TAB character
option delimiter = \" # Specify a literal quote
option mode = strict
option mode = permissive
option quote = \"
option loglevel = DEBUGX
import usage from Azure
option loglevel = INFO
option separator = ;
option quote = '
export azure.Usage as c:\transcript\exported\azure_modified.csv
SubscriptionID,ServiceName,Quantity
FE67,StorageGB,30
1377,Small_VM,2
EDED,Medium_VM,8
8E1B,Large_VM,1
99AA,Small_VM,99
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,,,,
1377,Small_VM,2,,,,
EDED,Medium_VM,8,,,,
8E1B,Large_VM,1,,,,
99AA,Small_VM,99,,,,
SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
FE67,StorageGB,30,30,,,
1377,Small_VM,2,,2,,
EDED,Medium_VM,8,,,8,
8E1B,Large_VM,1,,,,1
99AA,Small_VM,99,,99,,
import system/extracted/Services.csv source custom
import usage from Azure
default dset Azure.usage
create columns from custom.Services.ServiceDefinitions
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
# Create a new column called 'key' which combines the 'department'
# and 'user_id' columns separated by ":", with prefix

create mergedcolumn key separator : from string prefix department user_id

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,prefix:Development:123-456-123456
Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
Joram,555-222-999111,Development,prefix:Development:555-222-999111
Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
name,user_id,department
Eddy,123-456-123456,Development
Tim,654-321-654321,Project Management
John,xxx-xxx-xxxxxx,Pending
Joram,555-222-999111,Development
Joost,826-513-284928,Sales and Marketing
# Create a new column called 'key' which combines the 'department'
# with the middle three digits of the 'user_id', separated by :

create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
option merge_nomatch = [none]
create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/

# Result:
name,user_id,department,key
Eddy,123-456-123456,Development,Development:456
Tim,654-321-654321,Project Management,Project Management:321
John,xxx-xxx-xxxxxx,Pending,Pending:[none]
Joram,555-222-999111,Development,Development:222
Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513

strict

May be set to strict or permissive. This option specifies whether or not to terminate a transcript task if an error is encountered - see notes below

quote

"

Specifies the quote character used for quoting fields in a CSV file. This character is taken into consideration when importing and exporting Datasets. When importing a Dataset, any fields that begin and end with this character will have it removed during the import process. When exporting a Dataset, all non-blank fields will be quoted using this character. Note that when specifying a literal quote character, it must be escaped: \"

noquote

n/a

Specifies that no quoting is to be performed either when importing or exporting Datasets. A subsequent option quote statement will override this option.

loglevel

INFO

Sets the logging level - see notes below

logmode

SAFE

Sets the logging mode - see notes below

overwrite

yes

May be set to yes or 1, no or 0. If set to no then statements that update cell values will only affect blank cells. Refer to the documentation articles for any given statement for more information.

embed

no

May be set to yes or 1, no or 0. If enabled, then when importing a CSV file, any text between an opening curly brace - { - and its matching closing brace is not checked in any way. This permits separators and quotes to be included in the value. The curly brackets may be nested to any depth. If there is no matching } for the opening bracket then an error will be generated and the task will fail.

merge_blank

(none)

The default value for create mergedcolumn to use if a source column is blank

merge_nomatch

(none)

The default value for create mergedcolumn to use if no match is found for a regular expression in a source column value

DEBUG

Detailed logs of actions performed during the execution of a transcript task

DEBUGX

Extended debugging information (may cause very large logfiles and a minor reduction in performance)

No

If specified, the expression enclosed by the / characters is applied to the values in the source column specified by the preceding Column argument

string literal

No

If specified, the literal will be added to the merged column value. The relative order of Column and literal arguments is preserved.

import

Overview

The import statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.

To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.

Syntax

Import from CSV files

import filename from source [alias alias] [options { ... }]

import filename source custom_source [alias alias] [options { ... }]

Import from CCR files

import filename.ccr source custom_source [alias alias]

Import from database tables

import ACCOUNT source custom_source [alias alias]

import USAGE for date from db_name source custom_source [alias alias]

Details

If using the Windows path delimiter - \ - it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character

Options

Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.

The import filters are configured via the options parameter, which if present is immediately followed by one or more name = value pairs enclosed within braces, for example:

The = sign is optional, but if present it must be surrounded by whitespace.

Multiple options may be specified and each option must be placed on a separate line, for example:

The options supported by import are as follows:

Name

Type/possible values

Description

skip

numeric

Number of leading non-blank rows to skip

omit

numeric

Number of trailing non-blank rows to omit

heading_dupe_check header_dupe_check

enabled / on / true / yes

Skip rows that are a duplicate of the header row

select include

If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows

The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:

To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:

File name / pattern

If a pattern is specified in the import options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.

Only filenames may contain a regular expression; directory names are always treated literally.

If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).

If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.
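A minimal sketch (the path and pattern are illustrative), appending a month of daily extracts into a single DSET:

# The directory portion is literal; only the filename is a regular expression
import "system/extracted/Azure/2017/02/.*_usage.csv" source Azure alias usage options {
    pattern = enabled
}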

Database tables

Account data

To import all accounts from Exivity the following statement is used:

import ACCOUNT source custom_source [alias alias]

This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.

Usage data

To import usage data previously written to an RDF using finish the following statement is used:

import USAGE for date from dset source custom_source [alias alias]

This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.

The date must be in yyyyMMdd format. The RDF from which the usage will be imported is:

<basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf.

To illustrate this in practice, the statement ...

import USAGE for 20180501 from azure.usage source test alias data

... will load the usage data from ...

<basedir>/system/report/2018/05/01_azure.usage_usage.rdf

... into a DSET called test.data.

Source and Alias tags

A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.

There are two main variations of the import statement which are as follows:

Automatic source tagging

import filename from source [alias alias]

By convention, data retrieved by the USE extractor from external sources is located in a file called

<basedir>/system/extracted/<source>/<yyyy>/<MM>/<dd>_<filename>.csv

where <yyyy> is the year, <MM> is the month and <dd> is the day of the current data date.

Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.

By default the alias tag will be set to the filename, minus the <dd>_ portion of that filename and the .csv extension, but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import usage from Azure

will create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv.

The statement:

import usage from Azure alias custom

will create a DSET called Azure.custom from the file

<basedir>/system/extracted/Azure/2017/02/23_usage.csv.

Manual source tagging

import filename source custom_source [alias alias]

This form of the statement will import filename and the source tag will be set to the value of the custom_source parameter.

By default the alias tag will be set to the filename minus the .csv extension but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

will create a DSET called Azure.usage from the file

<basedir>/system/extracted/Azure/20170223.csv

Importing CCR files

Options will be ignored when importing CCR files

When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr filename extension as follows:

Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.

In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.

When importing a CCR file, the quote and separator characters are always a double quote - " - and a comma - , - respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.

Quote and Separator options

The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct options are set before importing a dataset in order to avoid unwanted side effects.

The options relating to import are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.

If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.

If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.

Embeds

If the embed option is set, then a curly bracket in the data - { - will cause all characters (including newlines) up until the closing bracket - } - to be imported. Nested brackets are supported, although if there are an uneven number of brackets before the end of the line an error will be generated in the logfile and the task will fail.
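A minimal sketch (the file name and its contents are illustrative):

# A field such as {Region: EU, West} in metadata.csv is imported as a
# single value; the comma inside the braces is not treated as a separator
option embed = yes
import "system/extracted/metadata.csv" source test alias meta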

Embeds are not supported in CCR files

Examples

Create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure

Create a DSET called Azure.custom from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure alias custom

Create a DSET called Azure.usage from the file /system/extracted/Azure/20170223.csv import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage



service

This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity

Overview

The service statement is used to create or modify a single service definition during the execution of a Transcript task. The service definition is associated with the data in an existing DSET, and the global database is updated if and when the Transcript task successfully completes.

It is recommended not to use the service statement for creating services, rather to use the services statement.

Syntax

service { param1 = value [... paramN = value] }

Example:

Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

Details

Summary

The service statement is used to create a new service definition. Once created, a service definition automatically becomes available to the reporting engine.

Parameter table

The service statement creates a new service using the following parameters:

Parameter details

key

The key parameter must be distinct from that in any other service unless it is used to identify an existing service definition to be overwritten with new values. By default, an attempt to create a service with a key that is a duplicate to an existing service in the global database will result in a warning in the logfile and no further action will be taken.

If the key matches that of a service that has been defined previously in the task file then one of two things can happen depending on the value of the current execution mode:

  1. If the mode is set to strict then an error is logged and the task will fail

  2. If the mode is set to permissive then a warning is logged and the newest service definition is ignored

To override the default protection against overwriting existing service definitions in the global database, the statement option services = overwrite should be invoked prior to the service statement.

The value of the key parameter may be up to 127 characters in length. Longer names will be truncated (and as a result may no longer be unique).

description

The description parameter is freely definable and does not have to be unique. When a report is generated, the description of a service will be used on that report so care should be taken to make the description meaningful.

By default, if description is not specified in a service definition then a copy of the key will be used as the description.

The value of the description parameter may be up to 255 characters in length. Longer descriptions will be truncated.

category / group

Either category or group may be used. The two terms are interchangeable in this context.

The category parameter is used to logically associate a service with other services. All services sharing the same category will be grouped together on reports. Any number of different categories may exist and if the category specified does not exist then it will be automatically created.

If no category is specified then the service will be placed into a category called Default.

The value of the category parameter may be up to 63 characters in length. Longer values will be truncated.

usage_col

In order to calculate a price for a service, the number of units of that service that were consumed needs to be known. The value of the usage_col parameter specifies the column in the usage data which contains this figure.

The usage_col argument is also used to derive the DSET associated with the service as follows:

  • If the value of usage_col is a fully qualified column name then the service will be associated with the DSET identified by that name

  • If the value of usage_col is not fully qualified then the service will be associated with the default DSET

If the column specified by the usage_col argument is determined not to exist using the above checks, then one of two things can happen depending on the value of the current execution mode:

  1. If the mode is set to strict then an error is logged and the task will fail

  2. If the mode is set to permissive then a warning is logged and the service definition is ignored

The value of the usage_col parameter may be up to 255 characters in length. Longer values will be truncated.

interval

Services may be charged in different ways. For example some services will invoke a charge whenever they are used whereas others are charged by a time interval, such as per month.

The value of the interval parameter determines how the service should be charged as per the following table:

model

Specifies whether or not to apply proration when calculating the charge for a monthly service.

Currently, only monthly services can be prorated, based on the number of days in the month the service was used.

unit_label

The unit_label is the label used for units of consumption on reports. For example storage-related services may be measured in Gb or Tb, Virtual Machines may be measured in Instances and software usage may be measured in licenses.

The specified unit_label value may be up to 63 characters in length. Longer values will be truncated.

If the unit_label parameter is not specified then a default value of Units will be used.

account_id

The account_id parameter is intended for internal use only and should not be used

The optional account_id references an entry in the account table in the global database. If specified, the service will be created with a rate revision specific to the account id.

The default rate for the service (which applies to all combinations of report column values not explicitly associated with their own rate) can be defined in any of the following ways:

  • By omitting the account_id parameter altogether

  • By specifying an account_id of 0

  • By specifying an account_id of *

rate

The rate parameter determines the cost per unit of consumption to use when calculating the charge. A rate of 0.0 may be used if there is no charge associated with the service.

This may be used in conjunction with a fixed_price and one of cogs or fixed_cogs.

Either or both of rate or fixed_price may be specified, but at least one of them is required.

fixed_price

The fixed_price is a charge applied to the service per charge interval regardless of the units of consumption.

This may be used in conjunction with a rate and one of cogs or fixed_cogs

Either or both of rate or fixed_price may be specified, but at least one of them is required.

cogs

The optional COGS charge is the price to the provider of the service. Usually, COGS-related charges are not included on reports, and special permissions are required to see the COGS charges.

For services with a defined cogs value it is possible to generate Profit and Loss reports, the profit/loss being the total charge calculated from the rate and/or fixed_rate values minus the price calculated from the cogs or fixed_cogs values.

fixed_cogs

The optional fixed_cogs value is a fixed price to be factored into COGS-related calculations regardless of the number of units consumed in the charging interval.

min_commit

The optional min_commit parameter specifies a minimum number of units of consumption to include when calculating the charge associated with a service. In cases where the actual consumption is greater than the min_commit value this will have no effect, but where the actual consumption is less than the minimum commit the price will be calculated as if the min_commit units had been consumed.

charge_model

If specified as peak then the charge for monthly service created will be calculated based on the day with the highest charge in the month.

If specified as average then the charge for monthly service created will be calculated as the average unit price (for days that have usage only) multiplied by the average quantity (days with no usage will be treated as if they had a quantity of 0).

If specified as last_day then the charge for monthly service created will be calculated based on the last day of the month.

If specified as day_xxx (where xxx is a number in 1-28 range) then the charge for monthly service created will be calculated based on the specified day of the month.
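A minimal sketch (the key, usage column, description and rate are illustrative), defining a monthly service charged on a fixed day of the month:

service {
    key = "Large_VM"
    usage_col = "Large_VM"
    description = "Large VM"
    interval = monthly
    charge_model = day_28
    rate = 1.5
}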

The Service Definition cache

Once all parameters have been evaluated by the service statement the resulting service definition is added to a cache in memory. This cache is committed to the global database at the successful conclusion of the Transcript task (or discarded in the case of error).

Once a service definition has been added to the cache, no subsequent service definitions with the same key may be added to the cache. In the event of conflict the first definition written to the cache is preserved and the attempt to add a duplicate will result in a warning or an error depending on the value of the current execution mode, as described in the section above.

Rate revisions

As described in Services in Exivity, a service may have multiple rate revisions associated with it. The service statement can only create a single rate revision per service per execution of a Transcript task.

A rate revision consists of the rate, fixed_rate, cogs, fixed_cogs and effective_date parameters. Each revision must have a different effective_date which indicates the date from which that service revision is to be used. A rate definition remains in force for all dates on or after the effective_date, or until such time as a rate revision with a later effective_date is defined (at which point that revision comes into effect).

To create multiple revisions, Transcript must be run multiple times using a service statement that has the same key and uses the same rate, fixed_rate, cogs and fixed_cogs values but has a different effective_date each time. For each of the effective_date parameters a new rate revision will be created for the service.
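A minimal sketch of one such run (the key, usage column and rate are illustrative, and the effective_date is assumed to be in yyyyMMdd format):

# First run: rate revision effective from 1st January 2018 (assumed yyyyMMdd)
service {
    key = "Small_VM"
    usage_col = "Small_VM"
    interval = monthly
    rate = 0.5
    effective_date = 20180101
}

# A later run with the same key and rate values but a different
# effective_date would add a further rate revision to the same service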

Updating the Global Database

At the successful conclusion of a Transcript task the global database is updated from the memory cache.

Examples

import system/extracted/example.csv source test alias data options {
    skip = 1
}
import system/extracted/example.csv source test alias data options { 
    skip = 1
    omit = 4
}
import system/extracted/example.csv source test alias data options { 
    # Only import rows where the vmType column is not the value "Template"
    filter ([vmType] != Template)
}
import system/extracted/example.csv source test alias data options { 
    filter (["service name"] =~ /.*D/)
}
import system/extracted/myccrfile.ccr source MyData

column list

A space-separated list of column names to include in the import

ignore exclude

column list

A space-separated list of column names to ignore when importing

filter

expression

If specified, only rows that match the expression will be imported

escaped escape

enabled / on / true / yes

Treat backslash (\) characters in the data in the CSV to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally.

pattern

enabled / on / true / yes

Permits the filename to be specified as a regular expression

encoding

encoding

Specifies the encoding of the file to be imported. Any encoding supported by libiconv may be used.

skip_corrupted_file

enabled / on / true / yes

If this option is enabled and pattern is set, any file that is not formatted correctly is skipped. If this option is not enabled then any file that is not formatted correctly will cause the script to terminate with an error.

skip_corrupted_record

enabled / on / true / yes

If this option is enabled, corrupted or malformed records (rows) are skipped

filename_column

enabled / on / true / yes

If enabled, a column called EXIVITY_FILE_NAME will be created which contains the filename from which each record was imported

The name of the column containing the number of units consumed

interval

No

The charging interval for the service

model

No

Whether the service charges should be prorated or not

unit_label

No

The label for the units of consumption; eg: GB for storage

account_id

No

Which account to associate with the rate revision

rate

No

The price per unit of consumption

fixed_price

No

The fixed price which will be charged regardless of non-zero usage in the charge interval

cogs

No

The COGS price per unit of consumption

fixed_cogs

No

The fixed COGS price which will be charged regardless of non-zero usage in the charge interval

min_commit

No

The minimum number of units of consumption that will be charged per interval

effective_date

No

The date from which the rate revision should be applied

charge_model

No

A specific charge model (may be peak, average, last_day or day_xxx, where xxx is a number in the range 1-28)

Parameter

Required

Notes

key

Yes

A unique identifier for the service

description

No

The name of the service as it will appear on a report

category or group

No

Services are grouped by category on a report

usage_col

Interval value

Meaning

individually

Every unit of consumption is charged individually. For example if a network-related service charges by the GB, then for every GB seen in the usage the charge is applied regardless of how many GB are used or over how long a period of time the consumption took place.

hourly

The charge is applied once per hour regardless of the number of times within the hour the service was consumed

daily

As for hourly but applied per-day

monthly

As for daily but applied per calendar month

Model value

Meaning

unprorated

No proration is to be applied

prorated

Proration will be applied

service definition
global database
services
execution mode
option services = overwrite
fully qualified
execution mode
proration
global database
execution mode
key
Services in Exivity
global database

Yes

service {
    key = "A1 VM"
    usage_col = "A1 VM - EU North"
    description = "A1 VM (EU North)"
    category = "Virtual Machines"
    interval = monthly
    model = unprorated
    unit_label = "Instances"
    rate = 0.8
    fixed_price = 10
    cogs = 45
    fixed_cogs = 16
    min_commit = 4
}
service {
    key = "Azure A1 VM"
    usage_col = "A1 VM - EU North"
    description = "Azure A1 VM (EU North)"
    category = "Virtual Machines"
    interval = monthly
    model = unprorated
    unit_label = "instances"
    min_commit = 2
    rate = 0.8
    fixed_price = 0.2
    cogs = 0.55
}

timestamp

Overview

The timestamp statement is used to create or update a column containing a timestamp.

The value of the timestamp is derived by applying a template to one or (optionally) two source columns. Timestamps generated by the timestamp statement can be UNIX epoch values or yyyyMMdd format strings.

Syntax

timestamp TimeCol [offset secs] using ColName [ColName2] template Template [format yyyymmdd]

Details

The timestamp statement populates the TimeCol column with either a UNIX timestamp (an integer representing a number of seconds since 00:00:00 on January 1st, 1970) or a yyyyMMdd format string.

If a column called TimeCol does not exist, then one will be created. If it does exist then the values within it will be overwritten.

The ColName argument must be the name of an existing column which contains date and/or time information to be extracted and used to derive the values in the TimeCol column. If the optional ColName2 argument is present then for each row of data the values in ColName and ColName2 are concatenated before the extraction of the data is done.

The TimeCol, ColName and ColName2 arguments may be fully qualified column names, but all three must be located in the same DSET. If TimeCol does not exist, then it will be created in the DSET that ColName is located in.

A UNIX timestamp will be generated unless format yyyymmdd is specified in which case the result will be a yyyyMMdd format timestamp.

If a secs parameter is provided and if the output is to be a UNIX timestamp, the specified number of seconds will be added to, or subtracted from, the result. As well as being useful for adjusting for timezones, this permits a time which falls on a midnight boundary to be 'nudged' back 1 second to fall on 23:59:59 of the previous day.

Currently, an offset can only be applied to UNIX format timestamps
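A minimal sketch (column names and template are taken from Example 1 below, and a negative offset is assumed to subtract seconds), nudging an end time that falls exactly on midnight back into the previous day:

# Shift the resulting UNIX timestamp back by one second
timestamp end_time offset -1 using end_date end_time template "YYYYMMDDhh.mm.ss"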

Data used by timestamps

In order to create a UNIX timestamp value, the following data is used:

Field

Values

Required

Default

Year

1971 or greater

Yes

n/a

Month

1 - 12

No

1

Day

In order to create a yyyyMMdd timestamp value, the following data is used:

Field

Values

Required

Default

Year

Any 4 digits

Yes

None

Month

1 - 12

No

01

Day

In both cases the fields are extracted from the ColName and (optionally) ColName2 columns using the Template argument as detailed below.

Templates

The Template argument is a string of characters defining which characters in the ColName column (and, if present, the ColName2 column) are to be extracted in order to obtain the field values shown in the tables above.

The template for a UNIX format timestamp consists of the following characters:

Character

Meaning

.

Any character

Y

A year digit

M

A month digit

D

A day digit

h

An hour digit

Upper-case Y, M and D characters are used for the date

Lower-case h, m and s characters are used for the time

The template for a yyyyMMdd format timestamp consists of the following characters:

Character

Meaning

.

Any character

Y

A year digit

M

A month digit

D

A day digit

Source values

For every row in the dataset, the template is applied to a source value which is constructed from the ColName and ColName2 columns as follows:

ColName

ColName2

Source value

Blank

Blank

No action is taken and the row is skipped

Not blank

Blank

The value in ColName

Blank

Not blank

The value in ColName2

Not blank

When applying the template to the source value the characters in the template are examined one at a time. A dot (.) causes the character in the same position in the source value to be ignored. Any of the other template characters will cause the character in the same position in the source value to be extracted and added to one of the fields used to create the timestamp.

Here are some sample template definitions:

Source value

Data to extract

Template

15:30:30

Hour, Minute and Second

hh.mm.ss

20160701

Year, Month and Day

YYYYMMDD

31-01-2016 17:35:59

Full date and time

DD.MM.YYYY.hh.mm.ss

2015-09-01T00:00:00Z

The length of the template may be shorter than the value that it is being applied to. In the last example shown above, the year, month and date values occur at the start of the string, and the template therefore is only as long as is required to extract them.

The template must always contain four Y characters to define the year, although they do not have to be consecutive. For all the other characters (apart from .) there may be 0 - 2 of them present in the template.

If none of any given character is present, then the value will default to the lowest possible value. For example the template YYYYMM.., when applied to the input value 20160224 will result in a year of 2016, a month of 02 and a day of 01 (being the lowest possible value of a day in any given month).
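A minimal sketch (the column names are illustrative), using this defaulting behaviour to build a first-of-month timestamp from a yyyyMMdd value:

# The day defaults to 01 because no 'D' characters appear in the template
timestamp PERIOD_START using usage_date template YYYYMM.. format yyyymmdd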

Timestamp generation

Once the above fields have been extracted, they are converted into a UNIX or yyyyMMdd timestamp value as follows:

  • If a UNIX timestamp, then it is adjusted for the local time of the Exivity server

  • If a yyyyMMdd timestamp then it is treated 'as is'

This value is then placed in the TimeCol column. The TimeCol column may be the same as ColName or ColName2.

Examples

Example 1 - UNIX format timestamp

Given a dataset of the form:

start_date,end_date,start_time,end_time,subscriptionId, ...
20160630,20160630,14:00:00,14:59:59,a9470811-83f2-474b-9523-0ece853d8c3c, ...
20160630,20160630,15:00:00,15:59:59,a9470811-83f2-474b-9523-0ece853d8c3c, ...

The statements:

timestamp start_time using start_date start_time template "YYYYMMDDhh.mm.ss"
timestamp end_time using end_date end_time template "YYYYMMDDhh.mm.ss"
delete columns start_date end_date

Will produce the following result:

start_time,end_time,subscriptionId, ...
1467291600,1467295199,a9470811-83f2-474b-9523-0ece853d8c3c, ...
1467295200,1467298799,a9470811-83f2-474b-9523-0ece853d8c3c, ...

For verification, the converted values translate back to the following times:

Timestamp     Date
1467291600    30/06/2016, 14:00:00 GMT+1:00 DST
1467295199    30/06/2016, 14:59:59 GMT+1:00 DST
1467295200    30/06/2016, 15:00:00 GMT+1:00 DST
1467298799    30/06/2016, 15:59:59 GMT+1:00 DST

Example 2 - yyyyMMdd format timestamp

Given a dataset of the form:

subscriptionId,effectiveDate ...
a9470811-83f2-474b-9523-0ece853d8c3c,2015-09-01T00:00:00Z
a9470811-83f2-474b-9523-0ece853d8c3c,2017-01-01T00:00:00Z

The statement:

timestamp effectiveDate using effectiveDate template YYYY.MM.DD format yyyymmdd

Will produce the following result:

subscriptionId,effectiveDate ...
a9470811-83f2-474b-9523-0ece853d8c3c,20150901
a9470811-83f2-474b-9523-0ece853d8c3c,20170101


if

Overview

The if statement is used to conditionally execute one or more statements

Syntax

if (conditional expression) {
    <statements ...>
} [else {
    <statements ...>
}]

Conditional Expressions

A conditional expression (hereafter referred to simply as an expression) is evaluated to provide a TRUE or FALSE result, which in turn determines whether one or more statements are to be executed or not. The following are examples of valid expressions:

(${dataDate} == 20180801)

((${dataDate} >= 20180801) && ([hostname] == "templateVM"))

An expression used by the if statement may contain:

  • Numeric and string literals

  • Regular expressions

  • Variables

  • Operators

Numeric and string literals

A literal is a specified value, such as 4.5 or "hostname". Literals may be numbers or strings (text).

If a literal is non-quoted then it will be treated as a number if it represents a valid decimal integer or floating point number (in either regular or scientific notation), else it will be treated as a string.

If a literal is quoted then it is always treated as a string, thus 3.1415926 is a number and "3.1415926" is a string.

Regular expressions

Regular expressions must be enclosed within forward slashes (/), and are assumed to be in ECMAScript format.

If present, a regular expression must be used on the right hand side of either an !~ or an =~ operator, and when evaluated it will be applied to the value on the left hand side of the operator, eg:

if (${dataDate} =~ /[0-9]{4}01/) {
    var first_day_of_month = yes
} else {
    var first_day_of_month = no
}

As the forward slash is used as a delimiter for the expression, any literal forward slashes required by the expression should be escaped with a back-slash: \/

Variables

Variables can be used within expressions, in which case they are replaced with their values. Once expanded, these values are treated as literals.
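As a brief sketch (the variable name dataDate and the cut-off value shown are assumptions), an expanded variable can be compared like any other literal:

if (${dataDate} < 20180101) {
    var is_historic = yes
} else {
    var is_historic = no
}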

Operators

Operators are evaluated according to the operator precedence rules in the table below (where the highest precedence is evaluated first), unless parentheses are used to override them. Operators with the same precedence are evaluated from left to right.

Precedence   Operator   Meaning
1            !          Unary negation
2            *          Multiplication
2            /          Division
2            %          Modulo
3            +          Addition
3            -          Subtraction
4            <          Less than
4            <=         Less than or equal to
4            >          Greater than
4            >=         Greater than or equal to
5            ==         Is equal to
5            !=         Is not equal to
5            =~         Matches regular expression
5            !~         Does not match regular expression
6            &&         Boolean AND
7            ||         Boolean OR

Although expressions are evaluated based on the precedence of each operator as listed in the above table, it is recommended that parentheses are used within the expression in order to remove any ambiguity on the part of a future reader.
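For example, the following sketch (the column name hostname and the values shown are assumptions) uses parentheses to make the intended grouping explicit rather than relying on precedence alone:

if ((${dataDate} >= 20180101) && (([hostname] == "web01") || ([hostname] == "web02"))) {
    var in_scope = yes
}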

Functions

A function is used to evaluate one or more arguments and return a result which is then taken into consideration when evaluating the overall truth of the expression.

Function calls start with the character @, which is followed by the function name and a comma-separated list of parenthesised parameters, for example @MIN(1, 2, 3).

Function names must be specified in UPPER CASE as shown in the examples below.

The following functions are supported by the if statement:

Numeric functions

MIN

@MIN(number, number [, number ...])

Return the smallest number from the specified list (requires at least 2 arguments)

Examples:

  • @MIN(1,2) returns 1

  • @MIN(1,2,-3) returns -3

  • @MIN(1,2,"-1") returns -1 - string "-1" is converted to number -1

  • @MIN(1,2,3/6) returns 0.5

  • @MIN(1,2,"3/6") returns 1 - string "3/6" is converted to number 3, up to first invalid character

  • @MIN(1,2,"zzz") returns 0 - string "zzz" is converted to number 0

MAX

@MAX(number, number [, number ...])

Return the largest number from the specified list (requires at least 2 arguments)

Examples:

  • @MAX(1,2) returns 2

  • @MAX(-1,-2,-3) returns -1

  • @MAX(1,2,100/10) returns 10

ROUND

@ROUND(number [, digits])

Returns number rounded to digits decimal places. If the digits argument is not specified then the function will round to the nearest integer.

This function rounds half away from zero, e.g. 0.5 is rounded to 1, and -0.5 is rounded to -1

Examples:

  • @ROUND(3.1415,3) returns 3.142

  • @ROUND(3.1415,2) returns 3.14

  • @ROUND(3.1415926536,6) returns 3.141593

  • @ROUND(3.1415) returns 3

  • @ROUND(2.71828) returns 3

String functions

CONCAT

@CONCAT(string1, string2 [, stringN ...])

This function will treat all its arguments as strings, concatenate them and return the result.

Examples:

  • @CONCAT("the answer ", "is") returns the answer is

  • @CONCAT("the answer ", "is", " 42") returns the answer is 42

  • @CONCAT("the answer ", "is", " ", 42) returns the answer is 42

SUBSTR

@SUBSTR(string, start [, length])

Return a sub-string of string, starting from the character at position start and continuing either until the end of the string or for length characters, whichever is shorter.

If length is omitted, then the portion of the string starting at position start and ending at the end of the string is returned.

Examples:

  • @SUBSTR("abcdef", 1) returns abcdef

  • @SUBSTR("abcdef", 3) returns cdef

  • @SUBSTR("abcdef", 3, 2) returns cd

  • @SUBSTR("abcdef", 3, 64) returns cdef

STRLEN

@STRLEN(string)

Returns the length of its argument in bytes.

Examples:

  • @STRLEN("foo") returns 3

  • @STRLEN(@CONCAT("ab", "cd")) returns 4

  • @STRLEN(1000000) returns 7 (the number 1000000 is treated as a string)

PAD

@PAD(width, value [, pad_char])

This function returns value, left-padded with pad_char (0 by default) up to the specified width. If width is less than or equal to the width of value, no padding occurs.

Examples:

  • @PAD(5, 123) returns 00123

  • @PAD(5, 12345) returns 12345

  • @PAD(1, 12345) returns 12345

  • @PAD(5, top, Z) returns ZZtop

EXTRACT_BEFORE

@EXTRACT_BEFORE(string, pattern)

This function returns the substring of string that precedes pattern. If pattern cannot be found in the string, or if either string or pattern is empty, the result of the function is an empty string.

Examples:

  • @EXTRACT_BEFORE("abcdef", "d") returns abc

  • @EXTRACT_BEFORE("abcbc", "bc") returns a

  • @EXTRACT_BEFORE("abcdef", "x") returns an empty string

EXTRACT_AFTER

@EXTRACT_AFTER(string, pattern)

This function returns the substring of string that follows pattern. If pattern cannot be found in the string, or if either string or pattern is empty, the result of the function is an empty string.

Examples:

  • @EXTRACT_AFTER("abcdef", "cd") returns ef

  • @EXTRACT_AFTER("abcabc", "ab") returns cabc

  • @EXTRACT_AFTER("abcdef", "abb") returns an empty string

EXTRACT_XXX functions can be combined to extract the middle part of the string, for example @EXTRACT_AFTER(@EXTRACT_BEFORE("abcdef", "ef"), "ab") returns cd.
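As an illustrative sketch (the variable resource_id and the "vm-" prefix convention are assumptions), a string function can also be used directly inside a condition:

if (@EXTRACT_BEFORE("${resource_id}", "-") == "vm") {
    var resource_type = virtual_machine
}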

Date functions

All date functions operate with dates in yyyyMMdd format

CURDATE

@CURDATE([format])

Returns the current (actual) date in the timezone of the Exivity server. The format may be any valid combination of strftime specifiers. The default format is %Y%m%d which returns a date in yyyyMMdd format.

Examples (assuming the run date is 1 July 2019, at 12:34:56):

  • @CURDATE() returns 20190701

  • @CURDATE("%d-%b-%y") returns 01-Jul-19

  • @CURDATE("%H:%M:%S") returns 12:34:56

  • @CURDATE("%u") returns 1 (weekday - Monday)

  • @CURDATE("%j") returns 182 (day of the year)

DATEADD

@DATEADD(date, days)

Adds a specified number of days to the given date, returning the result as a yyyyMMdd date.

Invalid dates are normalised, where possible (see example below):

Examples:

  • @DATEADD(20180101, 31) returns 20180201

  • @DATEADD(20180101, 1) returns 20180102

  • @DATEADD(20180101, 365) returns 20190101

  • @DATEADD(20171232, 1) returns 20180102 (the invalid date 20171232 is normalised to 20180101)

DATEDIFF

@DATEDIFF(end_date, start_date)

Returns the difference in days between two yyyyMMdd dates. A positive result means that end_date is later than start_date. A negative result means that start_date is later than end_date. A result of 0 means that the two dates are the same.

Invalid dates are normalised, when possible (see example below):

Examples:

  • @DATEDIFF(20190101, 20180101) returns 365

  • @DATEDIFF(20180201, 20180101) returns 31

  • @DATEDIFF(20180102, 20180101) returns 1

  • @DATEDIFF(20180101, 20180102) returns -1

  • @DATEDIFF(20180101, 20180101) returns 0

  • @DATEDIFF(20171232, 20180101) returns 0 (the invalid date 20171232 is normalised to 20180101)

DTADD

@DTADD(datetime, count [, unit])

This function adds count units (DAYS by default) to the specified datetime value and returns the normalised result as a datetime value in YYYYMMDDhhmmss format.

The datetime argument can be in any of the following formats:

  • YYYYMMDD

  • YYYYMMDDhh

  • YYYYMMDDhhmm

  • YYYYMMDDhhmmss

Any missing portions of the datetime value are assumed to be zero.

Supported units are (both singular and plural spellings are supported):

  • YEAR

  • MONTH

  • DAY (default)

  • HOUR

  • MINUTE

  • SECOND

Examples:

  • @DTADD(20190701, 2) returns 20190703000000

  • @DTADD(20190701, 2, HOURS) returns 20190701020000

  • @DTADD(2019070112, 50, DAYS) returns 20190820120000

  • @DTADD(20190701123456, 10, MONTH) returns 20200501123456
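The date functions can be combined inside a condition. The sketch below (the dataDate variable and the 90-day threshold are assumptions) flags usage data that is more than 90 days old:

if (@DATEDIFF(@CURDATE(), ${dataDate}) > 90) {
    var data_is_stale = yes
} else {
    var data_is_stale = no
}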

Transcript-specific functions

Transcript-specific functions may be preceded with an exclamation mark in order to negate their output. For example:

if (!@COLUMN_EXISTS("colName")) {
    # The column colName does NOT exist
}

FILE_EXISTS

@FILE_EXISTS(filename)

Returns 1 if the file filename exists, else returns 0.

FILE_EMPTY

@FILE_EMPTY(filename)

In strict mode, this function returns 1 if the file filename exists and is empty. If the file does not exist, then this is considered an error.

In permissive mode, a non-existent file is considered equivalent to an existing empty file.

In either case, if the file exists and is not empty, the function returns 0.

The FILE_EXISTS and FILE_EMPTY functions will only check files within the Exivity home directory and its sub-directories; filename must contain a pathname relative to the Exivity home directory.
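A minimal sketch of guarding an import with FILE_EXISTS; the path, source and alias names are assumptions, and the import form shown simply follows the source/alias convention used elsewhere in this guide:

if (@FILE_EXISTS("system/extracted/azure/${dataDate}.csv")) {
    import system/extracted/azure/${dataDate}.csv source azure alias usage
} else {
    var azure_file_missing = yes
}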

DSET_EXISTS

@DSET_EXISTS(dset.id)

Returns 1 if the specified DSET exists, else 0

DSET_EMPTY

In strict mode (option mode = strict), this function returns 1 if the specified DSET exists and is empty. If the DSET does not exist, then this is considered an error.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET.

In either case, if the DSET exists and is not empty, the function returns 0.

COLUMN_EXISTS

@COLUMN_EXISTS(column_name)

This function returns 1 if the specified column exists, else 0. The column name may be fully-qualified, but if it is not, then it is assumed to be in the default DSET.

DSET_ROWCOUNT

@DSET_ROWCOUNT(dset)

This function returns the number of rows within the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.

DSET_COLCOUNT

@DSET_COLCOUNT(dset)

This function returns the number of columns within the specified DSET.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET, and zero is returned.
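A short sketch (the DSET name azure.usage is an assumption) showing how these functions can be combined to check that an imported DSET is usable before further processing:

if (@DSET_EXISTS(azure.usage) && (@DSET_ROWCOUNT(azure.usage) > 0)) {
    var have_usage = yes
} else {
    var have_usage = no
}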


    services

    This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity and in the article on the service statement

    Overview

The services statement is used to create or modify multiple services based on the data in a daily RDF.

Syntax

services { param1 = value [ ... paramN = value] }

Example:

services {
    usages_col = ServiceName
    effective_date = 20160101
    set_cogs_using = cogs_prices
    #set_fixed_cogs_using = cogs_prices
    description_col = service_description
    unit_label_col = units_description
    set_min_commit_using = minimum_commit
    category_col = service_group
    interval_col = charging_interval
    model_col = proration
    charge_model_col = charge_model
    set_rate_using = rate
    set_fixed_price_using = fixed_prices
}

Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

    Details

    Summary

The services statement is used to create or modify multiple services from the data in a daily RDF. The parameters supplied to the statement map columns in the usage data to attributes of the created services.

    How column names are used

    For many of the parameters to the services statement there are two ways of using a column name:

    1. The values in the column are extracted from the usage data and those values are embedded as literals into the service definition.

    2. The column name itself is used in the service definition such that the reporting engine dynamically determines the values to use for any given day when generating report data

When creating or updating services using the first method, Transcript will create a new rate revision every time the rate information changes. For data sources such as Microsoft Azure where the rates can change daily this will result in a lot of rate revisions in the global database.

    Using the second method requires only a single rate revision which identifies by name the column(s) containing rate and/or COGS information. When a report is run, the charge engine then obtains the correct rate information for any given day from the data in those named columns.
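As an illustrative sketch (the column names ServiceName and rate are assumptions), the difference between the two methods comes down to which parameter names the rate column:

# Method 1 (hypothetical): rate values are copied into the rate revision as literals
services {
    usages_col     = ServiceName
    set_rate_using = rate
}

# Method 2 (hypothetical): the column name itself is stored, and the charge engine
# reads the rate for each day at report time
services {
    usages_col = ServiceName
    rate_col   = rate
}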

Parameter table

The parameters supported by the services statement are summarised in the following table. The Type column in the table indicates the way the column name is used as described in How column names are used above. Additional information about each parameter can be found below the summary table itself.

Parameter                   Type   Meaning
usages_col                  2      The name of the column from which units of consumption are derived
service_type                n/a    Determines how the usages_col and consumption_col values are used when interrogating the usage data to get the units of consumption
consumption_col             2      The name of the column containing units of consumption (AUTOMATIC services only - see below for more details)
instance_col                2      Values in this column are used to distinguish between service instances
description_col             1      Values in this column determine the service description for each service definition
category or group           n/a    A specific category string to use for all service definitions
category_col or group_col   1      Values in this column determine the service category for each service definition
interval                    n/a    A specific charging interval to use for all service definitions
interval_col                1      Values in this column determine the charging interval for each service definition
model                       n/a    A specific proration setting to use for all service definitions
model_col                   1      Values in this column determine the proration setting for each service definition
charge_model                n/a    A specific charge model to use for all service definitions (may be peak, average, last_day or day_xxx, where xxx is a number in the range 1-28)
charge_model_col            1      Values in this column determine the charge model for each service definition
unit_label                  n/a    A specific unit label to use for all service definitions
unit_label_col              1      Values in this column determine the unit label for each service definition
rate_col                    2      Sets the column name from which Edify will determine the rate per unit at report time
set_rate_using              1      Values in this column determine the rate per unit for each service definition
fixed_price_col             2      Sets the column name from which Edify will determine the fixed price per charging interval for each service definition
set_fixed_price_using       1      Values in this column determine the fixed price per charging interval for each service definition
cogs_col                    2      Sets the column name from which Edify will determine the COGS rate per unit for each service definition
set_cogs_using              1      Values in this column determine the COGS rate per unit for each service definition
fixed_cogs_col              2      Sets the column name from which Edify will determine the fixed COGS price per charging interval for each service definition
set_fixed_cogs_using        1      Values in this column determine the fixed COGS price per charging interval for each service definition
set_min_commit_using        1      Values in this column determine the minimum commit for each service definition
effective_date_col          1      Values in this column determine the effective date of the rate revision created for each service definition
effective_date              n/a    A specific effective date to use in the rate revision created for each service definition

    Parameter details

    usages_col

    The usages_col parameter is the name of a column containing service keys. A service will be created for each distinct value in this column, and these values will be used as the service keys.

    service_type

    In order to calculate the charges associated with a service it is necessary to know the number of units of that service that were consumed. Exivity supports two methods of retrieving the units of consumption from usage data and the service_type determines which of these is applied.

    Any given service may use one or other of these methods, which are as follows:

    • Manual services: service_type = MANUAL

    • Automatic services: service_type = AUTOMATIC

    Manual services

    Manual services require that the units of consumption for each service named in the usages_col column are stored in separate columns whose names correlate to the service keys themselves.

To illustrate this, consider the following fragment of usage data:

service_name,Small VM,Large VM
Small VM,1,0
Small VM,4,0
Large VM,0,6
Large VM,0,4

In this case, the service_name column contains a list of service keys and for each of those service keys there is a corresponding column containing the units of consumption for that service. Thus in the above example we can see that there are two services, "Small VM" and "Large VM", and that the units of consumption for each of these services are in the columns of the same name.

    The more manual services that are represented in the data, the more columns are required.
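A minimal sketch of defining the two services above as manual services; the category and the rate column shown are assumptions, since a service definition needs at least one charge type:

services {
    usages_col     = service_name
    service_type   = MANUAL
    category       = "Virtual Machines"
    # assumes a 'rate' column is present in the usage data
    set_rate_using = rate
}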

Automatic services

Automatic services require that the units of consumption for each service named in the usages_col column are stored in the column named by the consumption_col parameter. To represent the same information as that shown in the example above, the following would be used:

service_name,quantity
Small VM,1
Small VM,4
Large VM,6
Large VM,4

It can be seen that any number of automatic services, along with their consumption figures, can be represented using only two columns of data.
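The equivalent sketch for automatic services (again, the category and rate column are assumptions) names the consumption column explicitly:

services {
    usages_col      = service_name
    service_type    = AUTOMATIC
    consumption_col = quantity
    category        = "Virtual Machines"
    # assumes a 'rate' column is present in the usage data
    set_rate_using  = rate
}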

    consumption_col

    The consumption_col parameter is only required when creating automatic services and determines the column containing the units of consumption for each service as described above.

    instance_col

    It is not enough to know only the units of consumption for each service, as this merely provides the total consumption for each service across the entire usage. In the examples above, for example, the "Large VM" service has 10 units of consumption but using that information alone there is no way to know if this represents one instance of a VM used 10 times, 10 instances of VMs used once each, or something in between.

    The instance_col parameter is therefore required to tell the difference. Typically this will be a unique identifier which groups the units of consumption into 'buckets'. In the case of a VM this may be a VM ID which remains constant throughout the life of a VM in the cloud.

To illustrate this, we can supplement the example usage fragment used previously with additional information to use as the instance_col as follows:

vmid,service_name,quantity
444,Small VM,1
444,Small VM,4
555,Large VM,6
666,Large VM,4

    By specifying instance_col = vmid we can now see that the usage represents:

    • 5 instances of a single Small VM with an ID of 444

    • 6 instances of a Large VM with an ID of 555

    • 4 instances of a Large VM with an ID of 666

    description_col

    If specified, the description_col denotes a column containing a friendly description for reports to identify the service with.

    Typically in cloud usage data, services are identified using unique IDs (referred to as keys in Exivity) which are often non-meaningful to human eyes, so Exivity supports a 'friendly' description for each service for display purposes when generating a report.

For example description_col = description may be used in conjunction with the following data to map the service_id to a friendly name:

vmid,service_id,quantity,description
444,ABC123,1,Small VM
444,ABC123,4,Small VM
555,DEF789,6,Large VM
666,DEF789,4,Large VM

    It is not mandatory to provide a description_col parameter, but if one is not supplied then the description will be set to a duplicate of the service key (as derived via the usages_col parameter).

    In the example above, it can be seen that there are multiple rows in the data for the same service key (vmid). When using description_col, the first row for each distinct value in the usages_col will be used to set the description.

    category

By default Exivity will group services on reports according to their category. Using categories ensures that charges for services that fall within the same category will appear in a contiguous block.

    The category parameter specifies the category to which all services created will be assigned, thus specifying category = "Virtual Machines" might be appropriate for the example data used so far in this article.

    If no category is specified, the services created will be assigned to a category called Default.

    category_col

    Usage data normally contains information about a range of services of different types such as Virtual Machines, Storage, Networking and so on. By referencing a column in the usage data which identifies the correct category for each service, multiple categories will be created and each service assigned to the correct category by the services statement.

To illustrate this, let us extend the sample data as follows:

instance_id,service_name,quantity,description,category
444,Small VM,1,Bronze Computing Service,Virtual Machines
444,Small VM,4,Bronze Computing Service,Virtual Machines
555,Large VM,6,Gold Computing Service,Virtual Machines
666,Large VM,4,Gold Computing Service,Virtual Machines
999,SSD Storage,50,Fast Storage,Storage

    By specifying category_col = category each service will now be associated with the correct category.

    interval

    The interval parameter is used to specify a literal interval for all the services created by the services statement.

    The interval parameter may be any of:

    • individually

    • daily

    • monthly

    If the interval parameter is not specified, then a default interval of monthly will be used.

    interval_col

In the event that different services in the usages_col require different charge intervals, a column name containing the interval to use may be specified using the interval_col parameter as follows:

instance_id,service_name,quantity,description,category,interval
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly
999,SSD Storage,50,Fast Storage,Storage,daily

    By specifying interval_col = interval each service in the above usage data will be assigned the correct charge interval.

    model

    The model parameter is used to enable proration for monthly services. Either of unprorated or prorated may be specified.

    If no model is specified, then a value of unprorated will be used by default.

    model_col

In the event that different services in the usages_col require different proration settings, the model_col parameter can be used to specify which column contains the proration setting for each service:

instance_id,service_name,quantity,description,category,interval,model
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated

By specifying model_col = model, each service in the above usage data will be assigned the correct proration model.

    charge_model

    If specified as peak then the charge for any monthly services created will be calculated based on the day with the highest charge in the month.

    If specified as average then the charge for any monthly services created will be calculated as the average unit price (for days that have usage only) multiplied by the average quantity (days with no usage will be treated as if they had a quantity of 0).

    If specified as last_day then the charge for any monthly services created will be calculated based on the last day of the month.

    If specified as day_xxx (where xxx is a number in 1-28 range) then the charge for any monthly services created will be calculated based on the specified day of the month.
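For example, the following sketch (all parameter values shown are assumptions) applies the peak charge model to every monthly service created by the statement:

services {
    usages_col     = service_name
    interval       = monthly
    charge_model   = peak
    set_rate_using = rate
}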

    charge_model_col

In the event that different services in the usages_col require different charge models, the charge_model_col parameter can be used to specify which column contains the charge_model setting for each service. For the example data below, charge_model_col would be set to chargetype:

instance_id,service_name,quantity,description,category,interval,model,chargetype
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated,average
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated,average
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated,peak
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated,peak
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated,peak

    unit_label

    The unit_label parameter is used by reports to provide a meaningful description of the units of consumption associated with a service. A virtual machine may have a unit label of Virtual Machines, but storage-related services may have a unit label of Gb for example.

If the unit_label parameter is not specified then a default label of Units will be used.

    The unit label may be up to 63 characters in length. Longer values will be truncated.

    unit_label_col

    In cases where the services contained in the usages_col column collectively require more than one unit label, the unit_label_col parameter can be used to identify a column in the usage data which contains an appropriate label for each service.

For example unit_label_col = label can be used to associate an appropriate label using the data below:

instance_id,service_name,quantity,description,category,interval,model,label
444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
999,SSD Storage,50,Fast Storage,Storage,daily,unprorated,Gb

The parameters rate_col, set_rate_using, fixed_price_col, set_fixed_price_using, cogs_col, set_cogs_using, fixed_cogs_col and set_fixed_cogs_using (all of which are detailed below) collectively determine the types of charge that will be associated with the service definitions created by the services statement.

A service definition must have at least one charge type and may have up to three (as potentially a rate, a fixed rate and either of cogs or fixed cogs may be used)

    rate_col

    The rate_col parameter is used to determine the column in the usage data which contains the unit rates for the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the rate_col parameter. When a report is run, for each day in the reporting range the unit rate for that day will be determined by whatever value is in the column named by the rate_col parameter in the usage data.

    This means that only a single rate revision is required, even if the actual value in the rate_col column is different from day to day.

    set_rate_using

    The set_rate_using parameter is also used to determine the unit rate for each service. This differs from the rate_col parameter in that the values in the column named by set_rate_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the unit cost is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of rate_col or set_rate_using (but not both) may be used in a single services statement

    fixed_price_col

    The fixed_price_col parameter is used to determine the column in the usage data which contains the fixed price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_price_col parameter. When a report is run, for each day in the reporting range the fixed price for that day will be determined by whatever value is in the column named by the fixed_price_col parameter in the usage data.

    If a monthly service has different fixed prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_price_col column is different from day to day.

    set_fixed_price_using

    The set_fixed_price_using parameter is also used to determine the fixed price for each service. This differs from the fixed_price_col parameter in that the values in the column named by set_fixed_price_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_price_col or set_fixed_price_using (but not both) may be used in a single services statement

    cogs_col

    The cogs_col parameter is used to determine the column in the usage data which contains the COGS rate associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the cogs_col parameter. When a report is run, for each day in the reporting range the COGS rate for that day will be determined by whatever value is in the column named by the cogs_col parameter in the usage data.

    If a monthly service has different COGS rates for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the cogs_col column is different from day to day.

    set_cogs_using

    The set_cogs_using parameter is also used to determine the COGS rate for each service. This differs from the cogs_col parameter in that the values in the column named by set_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the COGS rate is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of cogs_col or set_cogs_using (but not both) may be used in a single services statement

    fixed_cogs_col

    The fixed_cogs_col parameter is used to determine the column in the usage data which contains the fixed COGS price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_cogs_col parameter. When a report is run, for each day in the reporting range the fixed COGS price for that day will be determined by whatever value is in the column named by the fixed_cogs_col parameter in the usage data.

    If a monthly service has different fixed COGS prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_cogs_col column is different from day to day.

    set_fixed_cogs_using

    The set_fixed_cogs_using parameter is also used to determine the fixed COGS price for each service. This differs from the fixed_cogs_col parameter in that the values in the column named by set_fixed_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed COGS price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_cogs_col or set_fixed_cogs_using (but not both) may be used in a single services statement

    set_min_commit_using

    The set_min_commit_using parameter is used to set the minimum commit value in the initial rate revision for each service.

    The values in the column identified by set_min_commit_using are extracted from the usage data and used as numeric literals in the revision.

    effective_date

    When creating the initial rate revision for a service, the value specified by the effective_date parameter is interpreted as a yyyyMMdd value to determine the date from which the revision should be applied.

If the effective_date parameter is omitted then the current data date will be used by default.

    When using effective_date, the value will be used to set the initial rate revision date for all the service definitions created by the services statement. If different services require different effective dates then the effective_date_col parameter may be used to determine the effective date for each service from a column in the usage data.

    effective_date_col

If there is a column in the usage data containing yyyyMMdd values representing the desired effective date for the initial revision of each service, the effective_date_col parameter may be used to extract the values from this column and set the effective date for each service accordingly.

    Either of effective_date or effective_date_col may be specified in a single services statement, but not both

    Examples

services {
    usages_col = ServiceName
    effective_date = 20180101
    set_cogs_using = cogs_prices
    description_col = service_description
    unit_label_col = units_description
    set_min_commit_using = minimum_commit
    category_col = service_group
    interval_col = charging_interval
    model_col = proration
    set_rate_using = rate
    set_fixed_price_using = fixed_prices
}