
Introduction

Exivity is a metering and billing software solution for public and private cloud environments that allows you to report on cloud consumption from any IT resource. Exivity enables you to apply your MSP/CSP business rules and makes any type of Pay-as-you-Go model work. It also facilitates internal charge-back and show-back requirements for Enterprise IT.

This is done by extracting IT consumption data from various endpoints and then mapping that data to meaningful customer-specific information such as services, customer IDs, names and contracts.

There are four main steps involved in a successful deployment:

  1. Extract

  2. Transform

  3. Report

  4. Integrate (optional)

Extract

The Extract step defines your data sources such as:

  • APIs that return usage data, service catalogue, rate card, customer/subscriber lists and similarly available records from public or private clouds

  • APIs or ODBC queries that return contracts, customer names, IDs and other contextual lookup data from CMDB / CRM systems

  • Flat files on disk in CSV, JSON or XML format

  • Other HTTP/S sources

Exivity provides a rich scripting interface via its Unified Scriptable Extractor (USE) component which facilitates integration with almost any data source. For most of the big cloud platforms we provide template extractor scripts as part of the product. Additionally you can also write your own USE scripts from scratch in order to integrate with custom data sources.
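As a rough sketch of what a USE Extractor can look like, the fragment below requests usage data over HTTPS and saves the response to disk, using statements covered in the USE reference later in this guide. The endpoint, token and file path are hypothetical placeholders:

set http_header "Accept: application/json"
set http_header "Authorization: Bearer <your_api_token>"
buffer usage = http GET "https://api.example-cloud.com/v1/usage"
save {usage} as system/extracted/example/usage.json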

Transform

The Transform step provides a powerful engine for processing extracted data. Using it you can merge consumption metrics, contract details, customer information, custom metadata, service definitions or any other imported information to produce an enriched and/or normalised result.

This is done using the Transcript component, which executes user-definable scripts (termed tasks) in order to produce a meaningful set of data suitable for reporting against. Often this data will feed a consolidated bill of IT based on the various different consumed services.

Transcript also allows you to define and populate services and rates, either of which may be passed through from cloud data, defined as custom offerings or a mixture of the two.
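To illustrate, a minimal Transformer might look like the sketch below, which uses statements from the Transcript reference later in this documentation. The source name Azure, the alias usage and the Rate and Quantity columns are illustrative assumptions:

import usage from Azure
default dset Azure.usage
calculate column Charge as column Rate * column Quantity
finish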

Report

Exivity provides a modern, responsive user interface that allows you to 'slice and dice' the processed data in any way you choose. Multiple Report Definitions can be created with ease, allowing you to display both cost and usage statistics graphically and textually.

Integrate

We think that Exivity should be part of your automation landscape, where it can provide (for example) line items that can be digested by your ERP and/or invoicing system. Therefore we consider Integrate as the logical final step for any deployment where it is useful.

To this end we offer an open and fully featured REST API. Our GUI uses this same API for all back-end processes, meaning that all textual data shown in the Exivity GUI is also obtainable via the API.


Data Sources

Accounts

This page is a work in progress

Installation

Server

Exivity can be installed on any Microsoft Windows 2012 R2 or higher server, either in your on-premises data center or in the cloud. Depending on the amount of data, Exivity recommends the following system configuration:

| Deployment | Data sources | CUPR   | CPU     | Memory | Storage |
| ---------- | ------------ | ------ | ------- | ------ | ------- |
| Tiny       | 2            | 2 000  | 1 core  | 4 GB   | 25 GB   |
| Small      | 2            | 10 000 | 2 cores | 8 GB   | 50 GB   |
| Medium     | 4            | 30 000 | 4 cores | 16 GB  | 100 GB  |
| Large      | 8            | 60 000 | 8 cores | 32 GB  | 200 GB  |

Both the Data sources and CUPR values in the above table are recommended limits. All systems should have a standard C: OS drive. The storage recommendation in the table above is for a D: drive, which preferably is an SSD.

Client

The Exivity front-end runs fine in these desktop browsers (with at least the specified version):

  • Google Chrome 63

  • Microsoft Edge 16

  • Opera 50

We don't support Apple Safari and Mozilla Firefox at the moment due to missing features in these browsers.

We aim to provide the fastest metering and billing solution available today, and this means we have to rely on modern (web) technologies. Part of our speed comes from pre-processing the raw data, and part comes from having almost all processed data available right in the browser, streaming the missing pieces on request. To efficiently and reliably achieve this, we use some very specific technologies not yet available in all browsers. Most notably, Safari and Firefox don't yet support all the features we need to build our next-generation platform. When they do catch up, we'll fully support those browsers; until that time, please choose from Edge, Chrome or Opera.

Home directory

During the execution of a task, or during the generation of a report, a number of files are accessed, both for reading and writing. The user Home Working Directory (referred to throughout this documentation as home_dir or base_dir) is the directory relative to which these files are located.

The home directory should preferably be located on a dedicated volume, e.g. D:\exivity\home, and it is recommended that this volume is an SSD.

Concepts

The following articles provide information regarding the basic concepts around which Exivity is built. These concepts are referenced repeatedly throughout the documentation and as such the below articles are recommended reading:

  • CSVs, Datasets and DSETs - The input files to Transcript and how their contents are accessed by Transcript tasks

  • Data Date - The date against which data is to be processed by Transcript

  • Base Working Directory - the directory within which Transcript and Edify operate

  • Services - Definitions of chargeable items created by Transcript and used by Edify to generate financial information on reports

  • Reporting Database Files - the files produced by Transcript and used by Edify to generate reports

discard

The discard statement is used to delete a named buffer.

Syntax

discard {buffer_name}



    Templates

    Exivity provides a catalogue of USE extraction scripts that can be used to integrate with almost any cloud provider, hypervisor or legacy IT end point. We've published some of our templates on GitHub for your convenience.

    This repository contains Extractors for VMware, Azure, Amazon and others. However, if you are currently missing an integration template and are unwilling or unable to create your own, feel free to drop us an e-mail at [email protected].

    Configuration

    Tutorials

    Details

    The discard statement will delete the named buffer and free the memory used to store its contents. The statement takes immediate effect and any attempt to reference the buffer afterwards (at least until such time as another buffer with the same name is created) will cause the USE script to log an error and fail.

    Example

    var server = "https://my_json_server.com"
    buffer response = http GET ${server}/generatetoken        
    
    # Create a variable called ${secret_token} from the 'access_token'
    # string in the JSON in the {response} buffer
    var secret_token = $JSON{response}.[access_token]
    
    # We no longer need the {response} buffer as the value extracted
    # from it is stored in a variable
    discard {response}

    clear

    The clear statement is used to delete all HTTP headers previously configured using the set http_header statement.

    Syntax

    clear http_headers

    Details

    The clear statement will remove all the headers currently defined, after which a new set of headers can be specified using set http_header.

    Example

    Upgrading to version 2

    Report Caches

    Version 2 introduces a breaking change to historical cached data from installations before version 2.0.0. Therefore, when you're upgrading an Exivity version 1 to version 2, all Reports will be automatically Unprepared during upgrade. After completing the upgrade, you will have to browse to your Report Definitions and Prepare each of them for the appropriate date period:

    Prepare report on Definitions page

    Changes to Transcript

    As part of version 2, outdated syntax in your Transformers will be automatically converted to comply with version 2 Transcript syntax. Your version 1 Transformers will be backed up during upgrade into the default backup folder %EXIVITY_HOME_PATH%/system/backup. To manually upgrade any of your version 1 Transformers, you may use the script provided in %EXIVITY_PROGRAM_PATH%/update-transcript-v2.0.0.bat (this script will upgrade any Transformer currently available in the default Transformer folder).

    Replacement of Exivity Eternity Service

    The service that was previously responsible for scheduling and execution of workflows (the "Exivity Eternity Service", which ran eternity.exe) has been replaced by the Aeon component, which runs as the "Exivity Scheduling Service". In case the "Exivity Eternity Service" was running under a non-system account, ensure that the newly registered "Exivity Scheduling Service" is updated with the corresponding service account credentials.

    gunzip

    The functionality described in this article is not yet available. This notice will be removed when the appropriate release is made.

    The gunzip statement is used to inflate a GZIP file

    Syntax

    gunzip filename as filename

    gunzip {bufferName} as filename

    Details

    The gunzip statement can be used to extract the contents of a GZIP archive containing a single file. The GZIP archive may be a file on disk or may be the contents of a named buffer.

    It is not possible to inflate GZIP data directly in memory, but the same effect can be achieved by extracting GZIP data in a named buffer to disk, and then loading the extracted data back into the named buffer as shown in the example below.

    All paths and filenames are treated as relative to the Exivity home directory

    Example

    Datadate

    When Transcript is executed one of the command line arguments it requires is a date in yyyyMMdd format which is termed the data date. The activities performed by Transcript are associated with this date in several ways, most notably:

    • to determine, when importing using automatic source tagging, which specific Dataset file to import from <basedir>\collected

    • to determine the concept of 'today' when processing files containing data with timestamps spanning multiple days

    • to determine the output directory into which files will be generated

    • to generate the filenames of the RDF files created

    The data date is made available to a Transcript task through the automatic creation of the ${dataDate} variable
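    For example, with a data date of 20160801 the following Transcript fragment imports <basedir>\collected\Azure\2016\08\01_usage.csv and writes its output to <basedir>\system\report\2016\08\01_Azure.usage.rdf (the Azure source and usage alias are illustrative):

    import usage from Azure
    finish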

    exit_loop

    The exit_loop statement will terminate the current loop.

    Either exit_loop or loop_exit may be used. Both variants work identically.

    Syntax

    exit_loop

    Details

    The exit_loop statement will immediately terminate the current loop and script execution will jump to the statement following the } at the end of the current loop.

    This can be done even if the exit_loop statement is within one or more constructs inside the loop.

    If no loop is in effect then an error will be logged and the script will terminate.

    terminate

    The terminate statement will exit the transcript task immediately.

    Syntax

    terminate

    Details

    Normally a transformation script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.

    When the terminate statement is encountered, the script will finish at that point. No statements after the terminate statement will be executed.

    Example

    basename

    The basename statement is used to extract the filename portion of a path + filename string

    Syntax

    basename varName

    basename string as varName

    Extract

    Introduction

    Extraction is the process by which USE (Unified Scriptable Extractor) retrieves data from external locations. The following types of data source are supported:

    loop

    The loop statement executes one or more statements multiple times.

    Syntax

    loop label [count] [timeout timelimit]

    default

    Overview

    The default statement is used to explicitly define the default DSET to use when specifying a column name as an argument in subsequent statements.

    terminate

    The terminate statement will exit the USE script immediately.

    Syntax

    terminate [with error]

    gosub

    The gosub keyword is used to run a named subroutine

    Syntax

    gosub subroutineName([argument1, ... argumentN])

    finish

    Overview

    The finish statement creates a Reporting Database File (RDF) from a DSET. The RDF can subsequently be used by the reporting engine.

    print

    The print statement is used to display text to standard output while a USE script is executing.

    Syntax

    print [-n] word|{buffer_name} [... word|{buffer_name}]

    pause

    The pause statement is used to suspend execution of a USE script for a specified time.

    Syntax

    pause delaytime

    Syntax

    default dset source.alias

    Details

    Given that multiple DSETS can be loaded at once, it is necessary to specify which DSET to operate on when performing actions such as creating a new column with create. A column name in a Transcript statement is assumed as belonging to the default DSET unless it is a fully qualified column name.

    If there is no default statement in the Transcript, then the first CSV file imported via the import statement will automatically be designated as the default DSET.

    The default statement can be used multiple times throughout a Transcript, in which case the default DSET will be whichever was specified by the last default statement executed.

    Lastly, when executing a finish statement, unless otherwise specified, the default DSET will be used to populate the reporting database created as a result.

    When changing the default DSET, any service definitions that referenced the default DSET at the time they were created will be updated with the new default DSET.

    Examples

    Set custom.datafile as the default DSET:

    default dset custom.datafile


    APIs

    Typically, usage data is retrieved from the API or APIs provided by the cloud (or clouds) for which reports need to be generated. This is usually a REST API accessed via HTTP/S.

    Files

    A file on the local file-system or on a shared volume. This is usually a CSV, JSON or XML file.

    Exivity

    In some cases it is useful to retrieve information from Exivity itself, such that accounts and usage data that were created historically can be incorporated into the daily processing.

    Database

    Arbitrary SQL queries can be executed against an SQL server, either via a direct connection string or via an ODBC DSN.

    Web

    Arbitrary HTTP queries can be invoked in order to retrieve information from any web page accessible from the Exivity server.

    USE script

    A USE script is required for USE to operate. Further information can be found via the links below:

    • Script basics - An introductory overview of the scripting language

    • Language - A reference guide for the USE scripting language

    • Parslets - How to parse XML and JSON data

    • Templates - Template scripts that can be used as starting points for common data sources



    The argument list may span multiple lines, so long as any given argument is contained on a single line and ends with a comma, eg:

    Details

    The subroutineName provided to the gosub statement must be that of a subroutine defined elsewhere in the script using the subroutine statement.

    If any argument contains white-space or a comma then it must be quoted:

    gosub getfile("directory with spaces/filename.txt")

    It is permitted to call a subroutine from within another subroutine, therefore gosub can be used within the body of a subroutine. This may be done up to 256 levels in depth.

    The opening bracket after subroutineName may or may not be preceded with a space:

    gosub getfile ("filename.txt")

    To call a subroutine with no parameters, use empty brackets:

    gosub dosomething()

    Example

    Please refer to the example in the documentation for the subroutine statement

    set http_header "Accept: application/json"
    set http_header "Authorization: FFDC-4567-AE53-1234"    
    set http_savefile "d:\exivity\customers.json"
    buffer customers = http GET "https://demo.server.com:4444/v1/customers"
    
    clear http_headers   # Clear headers in order to use a different Authorization: value
    set http_header "Accept: application/json"
    set http_header "Authorization: ABCD-EFGH-8888-1234"    
    set http_savefile "d:\exivity\addresses.json"
    buffer customers = http GET "https://demo.server.com:4444/v1/addresses"
    var file = "system/extracted/mydata.csv"
    
    if (@FILE_EXISTS(${file})) {
        import ${file} source my alias data
    } else {
        terminate
    }

    Details

    Given a string describing the full path of a file, such as /extracted/test/mydata.csv the basename statement is used to identify the filename (including the file extension, if any) portion of that string only. If there are no path delimiters in the string then the original string is returned.

    The basename statement supports both UNIX-style (forward slash) and Windows-style (backslash) delimiters.

    When invoked as basename varName, the varName parameter must be the name of the variable containing the string to analyse. The value of the variable will be updated with the result so care should be taken to copy the original value to a new variable beforehand if the full path may be required later in the script.

    As a convenience in cases where the full path needs to be retained, the result of the operation can be placed into a separate variable by using the form basename string as varName where string is the value containing the full path + filename and varName is the name of the variable to set as the result.

    When invoked using basename string as varName if a variable called varName does not exist then it will be created, else its value will be updated.

    Examples

    Example 1

    The following script ...

    ... will produce the following output:

    Example 2

    The following script ...

    ... will produce the following output:

    {

    # Statements

    }

    The opening { may be placed on a line of its own if preferred but the closing } must be on a line of its own

    Details

    The loop statement will loop indefinitely unless one of three exit conditions cause it to stop. These are as follows:

    1. The number of loops specified by the count parameter are completed

    2. At least as many milliseconds as are specified by the timelimit parameter elapse

    3. An exit_loop statement explicitly exits the loop

    In all three cases when the loop exits, execution of the script will continue from the first statement after the closing } marking the end of the loop.

    In the event that both count and timelimit parameters are specified, the loop will exit as soon as one or other of the limits have been reached, whichever comes first.

    Both the count and timeout parameters are optional. If omitted then the default for both of them will be infinite.

    The loop statement will automatically create and update a variable called loop_label.COUNT which can be referenced to determine how many times the loop has executed (as shown in the example below). This variable is not deleted when the loop exits which means that it is possible to know how many times any given loop executed, even after the loop has exited.

    Any specified timeout value is evaluated at the end of each execution of the loop and as such the actual time before the loop exits is likely to be a short time (typically a few milliseconds) greater than the specified value. In practice this should be of no consequence.

    Example

    The loop shown above will result in the following output:

    Details

    Normally a USE script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.

    When the terminate statement is encountered, the script will finish at that point. No statements after the terminate statement will be executed.

    By default, the script will exit with a success status, however it may be useful to exit deliberately when an error such as an invalid or unexpected response from an HTTP session is detected. Adding the keywords with error to the statement will cause it to exit with an error status.

    Example

    set http_savefile = extracted/serverdata.txt
    buffer serverdata = http GET "https://server.com/uri"
    if (${HTTP_STATUS_CODE} != 200) {
        print Got HTTP status ${HTTP_STATUS_CODE}, expected a status of 200
        print The server response was:
        print {serverdata} 
        terminate with error
    } else {
        print Received data from server successfully
    }
    
    Details

    The print statement enables user-defined output to be generated during the execution of a USE script. When retrieving data from external sources it may take some time for a lengthy series of operations to complete, so one use of the print statement is to provide periodic status updates during this time.

    The print statement will process as many arguments as it is given, but at least one argument is required. If the first argument is -n then no newline will be output after the last argument has been echoed to standard output, else a newline is output after the last argument.

    Arguments that are normal words will be sent to standard output followed by a space. Arguments referencing a named buffer will result in the contents of the buffer being displayed.

    Note that print will stop output of data from a named buffer as soon as a NUL (ASCII value 0) character is encountered

    Binary data

    Passing a buffer containing binary data to print is not recommended, as when echoed to a console this is likely to result in various control codes and other sequences being sent to the console, which may have undesired side effects.

    Example

    Details

    The delaytime parameter is the number of milliseconds to wait before continuing. A value of 0 is allowed, in which case no delay will occur.

    The pause statement may be useful in cases where an external data source imposes some form of rate limiting on the number of queries that can be serviced in a given time-frame, or to slow down execution at critical points when debugging a long or complex script.

    Example

    This example makes use of script parameters which are provided when USE is executed. For more information on script parameters please refer to the Extract introduction.

    gosub subroutineName (argument1,
          argument2,
          argument3,
          )
    # Download an archive and extract it into a named buffer
    buffer archivedata = http GET http://server/archived.csv.gz
    gunzip {archivedata} as system/extracted/extracted.csv
    buffer archivedata = FILE system/extracted/extracted.csv
    
    # Download an archive and extract it to disk, automatically deriving the
    # output filename from the input filename based on the .gz extension
    var save_path = system/extracted
    var archivefile = extracted.csv.gz
    set http_savefile ${save_path}/${archivefile}
    
    match csv_name "(.*)\.gz$" ${archivefile}
    if (${csv_name.STATUS} != MATCH) {
        print WARNING: Downloaded file does not end in .gz and will not be extracted
    } else {
    	gunzip "${save_path}/${archivefile}" as "${save_path}/${csv_name.RESULT}"
    	print Extracted file: "${save_path}/${csv_name.RESULT}"
    }
    
    var path = "extracted/test/testdata.csv"
    
    # Copy the path as we'll need it later
    var file = ${path}
    
    # Note: use the NAME of the variable not the value
    basename file
    
    # The variable called 'file' now contains the result
    print The basename of the path '${path}' is '${file}'
    
    var path = "testdata.csv"
    var file = ${path}
    basename file
    
    print The basename of the path '${path}' is '${file}'
    The basename of the path 'extracted/test/testdata.csv' is 'testdata.csv'
    The basename of the path 'testdata.csv' is 'testdata.csv'
    var path = "extracted/test/testdata.csv"
    basename ${path} as file
    print The basename of the path '${path}' is '${file}'
    The basename of the path 'extracted/test/testdata.csv' is 'testdata.csv'
    loop example 10 {
        print This is loop number ${example.COUNT}
    }
    This is loop number 1
    This is loop number 2
    This is loop number 3
    This is loop number 4
    This is loop number 5
    This is loop number 6
    This is loop number 7
    This is loop number 8
    This is loop number 9
    This is loop number 10
    var server = "https://my_json_server.com"
    print Obtaining token from server
    buffer response = http GET ${server}/generatetoken        
    print Token received:
    print {response}
    
    # Create a variable called ${secret_token} from the
    # 'access_token' string in the JSON in the {response} buffer
    var secret_token = $JSON{response}.[access_token]
    
    # We no longer need the {response} buffer as the value
    # extracted from it is stored in a variable
    discard {response}
    print Original server response now discarded
    var first = ${ARG_1}
    var last = ${ARG_2}
    var last += 1
    var x = ${first}
    
    # Retrieve a number of files from http://server.local/?.dat where ? is a number
    # Wait for 1 second between each file
    loop slurp {
        var url = http://server.local/datafiles/${x}.dat
        set http_savefile data/${x}.dat
        print Getting datafile ${x}
        http GET ${url}
        if (${HTTP_STATUS_CODE} == 200) {
            print 200 OK
        }
        if (${HTTP_STATUS_CODE} == 404) {
            print Data file ${x} missing on server
        }
        var x += 1
        if (${x} == ${last}) {
            exit_loop
        }
        pause 1000   # Wait for 1 second
    }
    print ${x} files were downloaded
    terminate
    Syntax

    finish [dset.id]

    Details

    The finish statement is used to create an RDF from a DSET. Only a single DSET can be used to create an RDF, but multiple finish statements may be used within the same task file. If there is no dset.id parameter then the default DSET will be used.

    The RDF created by finish will be saved as <BaseDir>\system\report\<yyyy>\<MM>\<dd>_source.alias.rdf

    where:

    • <yyyy> is the 4-digit year

    • <MM> is the 2-digit month

    • <dd> is the 2-digit day

    • source.alias are the tags which form the DSET ID

    Any existing RDF with the same name will be overwritten.

    Examples

    Create a Reporting Database file for the default DSET: finish

    Create a Reporting Database file for the DSET Azure.usage: finish Azure.usage


    Azure Market Place

    Introduction

    Apart from installing Exivity in any on-premises environment, Exivity can also be deployed from the Azure Marketplace (AMP). Deploying Exivity via AMP is straightforward and can be finished within a few minutes via your Azure Portal.

    Azure Marketplace Offering

    Login to your Azure Portal at https://portal.azure.com and then go to the Marketplace to search for the Exivity offer:

    Once you've selected the Exivity offering, you should be presented with the following screen:

    After clicking the Create button, you will be redirected to the VM deployment wizard

    Deployment Wizard

    1. Fill in a Windows user/pass and pick your deployment Resource Group:

    Make sure to write down this username and password, as you will need these when connecting to the Exivity Windows server using the Remote Desktop Protocol.

    2. Try to pick a recommended VM size that has enough CPUs and memory (see the Installation section for general system requirements). Smaller machines are possible, but will influence performance:

    3. You may select any additional options, but none are required for running Exivity successfully, so you may skip this page simply by clicking the OK button:

    4. Review the summary and click Create to deploy your Exivity VM:

    This may take a few minutes. You may review the status of the Virtual Machine in your VM list:

    Write down the Public IP address once it is available. Optionally you may configure a custom DNS name to have an easy way to connect.

    Connecting to your Exivity instance

    You can log on to your Exivity instance with RDP, but after deployment you should also be able to connect to the Exivity GUI using the public IP address or DNS name of your instance at the following default URL:

    • https://<Your_Public_IP>:8001

    The default admin username is admin with password exivity.

    By default no data is loaded into the system, so you'll have to create a new Extractor for obtaining consumption data and a Transformer to process that data. A Report Definition is then created to be able to report on your consumption metrics and costs.

    Next steps

    A couple of getting started guides are provided in this documentation, but feel free to drop us an e-mail or create a ticket in our support portal. We will then assist you in getting started with your specific use case.

    json

    The json statement is used to format JSON in a named buffer.

    Syntax

    json format {buffername}

    Details

    In many cases an API or other external source will return JSON in a densely packed format which is not easy for the human eye to read. The json statement is used to re-format JSON data that has been previously loaded into a named buffer (via the buffer statement) into a form that is friendlier to human eyes.

    After the JSON has been formatted, the buffer can be saved or printed for subsequent inspection.

    Example

    Given the following single packed line of JSON in a named buffer called myJSON:

    The following USE script fragment:

    will result in the following output:

    encode

    The encode statement is used to base16 or base64 encode the contents of a variable or a named buffer.

    Syntax

    encode base16|base64 varName|{buffer_name}

    Details

    The encode statement will encode the contents of an existing variable or named buffer, replacing those contents with the encoded version.

    The result of encoding the contents will increase their length. With base16 encoding the new length will be double the original. With base64 encoding the new length will be greater than the original but the exact size increase will depend on the contents being encoded.

    When encoding a variable, if the size of the result after encoding exceeds the maximum allowable length for a variable value (8095 characters) then the USE script will fail and an error will be returned.

    Encoding an empty variable or buffer will produce an empty result

    Example

    The following script ...

    ... produces the following output:

    loglevel

    While executing a USE script, various messages are written to a logfile. The loglevel option determines the amount of detail recorded in that logfile.

    Syntax

    loglevel loglevel

    Details

    The table below shows the valid values for the loglevel argument. Either the numeric level or the label can be specified. If the label is used then it must be specified in CAPITAL LETTERS.

    | Level | Label  | Meaning                         |
    | ----- | ------ | ------------------------------- |
    | 0     | DEBUGX | Extended debugging information  |
    | 1     | DEBUG  | Debugging information           |
    | 2     | INFO   | Standard informational messages |
    | 3     | WARN   | Warnings and non-fatal errors   |
    | 4     | ERROR  | Run-time errors                 |
    | 5     | FATAL  | Non-recoverable errors          |

    The log levels are cumulative, in that higher log-level values include lower level messages. For example a level of INFO will cause FATAL, ERROR, WARN and INFO level messages to be written to the log.

    The loglevel statement takes immediate effect and may be used multiple times within a USE script in order to increase or decrease the logging level at any time.
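    For example, to temporarily increase the amount of detail written to the logfile around a specific part of a script (a sketch; the labels are taken from the table above):

    loglevel DEBUG
    # ... statements to be logged in extra detail ...
    loglevel INFO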

    set

    Overview

    The set statement is used to write a specified value into all the cells in any given column, or to copy values from one column to another.

    Syntax

    set ColName to Value

    set ColName as SrcColName

    set ColName = Expression

    Details

    If the keyword to is used then column ColName will be updated using the constant value specified in Value.

    If the word as is used then the values in the column SrcColName will be copied into column ColName.

    If = is used then the expression is evaluated and the column values are set to the result. For a complete list of functions that can be used in an expression please refer to the list of functions in the article about the if statement.

    The ColName argument must identify an existing column. The modified column will be in the default DSET unless it is a fully qualified column name.

    If the overwrite option is disabled, then only cells with blank values will be updated with the new value.

    Examples

    Set the column called Flag in the default DSET to a value of 0:

    Set the column called Flag to be the value in the column status:

    Set the column called DiskTier to the 4th character in the existing value of that column

    set DiskTier = @SUBSTR([DiskTier],4,1)

    Set the column called Flag in the DSET custom.dataset to a value of Pending:

    Set the column Flag to free if the rate column contains a value of 0:

    Fill any blank values in the column username in the DSET Azure.usage to Unknown :

    unzip

    The unzip statement is used to unzip the data in a named buffer.

    Syntax

    unzip {buffer_name}

    Details

    The unzip statement will extract a single file from a zip archive stored in a named buffer. In order for this to succeed, the buffer must have been previously populated using the buffer statement, and the data within the buffer must be a valid ZIP file.

    Only ZIP files are supported. To extract GZIP files, use gunzip.

    A warning will be logged, the buffer left intact and the script will continue to execute if any of the following conditions arise:

    • The buffer is empty or does not contain a valid ZIP archive

    • The ZIP archive is damaged or otherwise corrupted

    • More than 1 file is present within the archive

    After the unzip statement completes, the buffer will contain the unzipped data (the original ZIP archive is discarded during this process).

    The filename of the unpacked file is also discarded, as the resulting data is stored in the buffer and can subsequently be saved using an explicit filename as shown in the example below.

    Example

    calculate

    Overview

    The calculate statement is used to perform arithmetic operations using literal and column values.

    Syntax

    calculate column ResultCol as source operation source

    where source is either column colName or value literal_value

    and operation is one of the characters + - * / % for addition, subtraction, multiplication, division and modulo respectively.

    There must be whitespace on each side of the operation character.

    Examples:

    calculate column ResultCol as column Amount * value 1.2
    calculate column Net as column total - column cogs
    calculate column constant_7 as value 3.5 + value 3.5

    Details

    The ResultCol parameter is the name of the column that will hold the results. This column may or may not exist (if necessary it will be created automatically).

    Both of the two source parameters can specify a literal value, or the name of a column containing the value to use when performing the calculation.

    • A literal value is specified using value N where N is the literal number required

    • A column name is specified using column ColName where ColName is the name of the column containing the values required

    The ResultCol may be the same as a column specified by one of the source parameters in which case any existing values in it will be updated with the result of the calculation.

    Additional notes:

    • Any blank or non-numeric values in a source column will be treated as 0

    • An attempt to divide by zero will result in 0

    • When performing a modulo operation, the two source values are rounded to the nearest integer first

    • If the result column already exists then, if option overwrite is set to no, only blank cells in the result column will be updated

    Examples

    • Add 1.5 to the values in the Rate column:

    calculate column Rate as column Rate + value 1.5

    • Multiply the values in the Rate column by those in the Quantity column

    • Store the result in a new column called Charge

    calculate column Charge as column Rate * column Quantity

    escape

    The escape statement is used to escape quotes in a variable value or the contents of a named buffer

    Syntax

    escape quotes in varName|{bufferName}

    Details

    If a variable value or named buffer contains quotes then it may be desirable to escape them, either for display purposes (to prevent USE from removing them before rendering the data as output) or in order to satisfy the requirements of an external API.

    The escape statement will precede all occurrences of the character " with a backslash as shown in the example below. This operation is not just temporary - it will update the actual contents of the variable or named buffer.

    The escape statement does not consider the context of any existing quotes in the data. Running it multiple times against the same data will add an additional escape character each time to each occurrence of a quote.

    Example

    Given an input file called 'escapeme.txt' containing the following data:

    The following script:

    will produce the following output:

    lowercase

    Overview

    The lowercase statement is used to modify the name of, and/or the values in, a column such that any upper case characters are replaced with their lower case equivalent.

    Syntax

    lowercase heading|values [and heading|values] in column ColName1 [... ColNameN]

    Although the syntax shown uses the keyword column, either column or columns may be specified. Both work in an identical manner.

    Details

    The heading and values keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.

    Only upper case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.

    Examples

    append

    Overview

    The append statement is used to append one DSET to the end of another.

    Syntax

    append source_dset.id to destination_dset.id

    Details

    If the source DSET has any column names not present in the destination DSET then additional columns are automatically created in the destination DSET. These additional columns will contain blank values by default.

    If one or more column names are present in both DSETs then the columns copied from the source DSET may be re-ordered into the same order as that used by the destination DSET.

    At the end of the operation, the destination DSET will contain all the data from both DSETs, and the source DSET is unchanged.

    Note: Both DSETs must exist, and it is not possible to append a DSET to itself.

    Example

    Given the following DSETs:

    DSET ID: example.data

    DSET ID: example2.data

    The statement append example2.data to example.data will result in the following destination DSET (example.data):

    capitalise

    Overview

    The capitalise (the spelling capitalize is also supported) statement is used to modify the name of a column and/or the values in a column such that the first character is a capital letter and the remaining characters are lower case.

    Syntax

    capitalise values|heading [and values|heading] in column|columns ColName1 [... ColNameN]

    After the keyword in, either of the keywords column or columns may be used

    Details

    The heading and values keywords refer to the name of an existing column and the values in each row for that column respectively.

    Only the first character in the column name or value is converted to upper case. If this character is not a letter, then the statement will have no effect on it. For example, applying the statement to a column called _vmname will have no effect on the column name, as the first character is an underscore. However, applying the same statement to a column called _VMName would result in a new name of _vmname: after attempting to make the first character a capital (which in the case of the underscore has no effect), the remaining characters are converted to lower case.

    Any number of column names may be specified, and any of these may or may not be fully qualified. When applied to values in a column, blank values are ignored.

    Examples
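    A sketch using a hypothetical column name: the statement below renames the column region to Region and converts a value such as WEST EUROPE to West europe.

    capitalise heading and values in column region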

    return

    The return statement is used to exit a subroutine at an arbitrary point and return to the calling location

    Syntax

    return

    Details

    A subroutine will automatically return to the location it was called from when the end of its body is reached. However, it may be desirable to explicitly exit the subroutine at some other point, in which case the return statement is used.

    The return statement cannot be used to return a value to the calling code (this should be done via the use of variables as described in the subroutine statement documentation)

    Example

    Adjustments

    Exivity enables you to create account specific rate adjustment policies. An adjustment policy allows you to apply a discount or a premium using one of these modifiers:

    1. a certain amount of money (e.g. $100)

    2. a certain quantity (e.g. 100 GB/hours)

    3. a percentage (e.g. 10%)

    This Adjustment can then be applied to a single service, multiple different services, or one or more service categories.

    Create an Adjustment Policy

    To create a new adjustment policy for an account, follow these steps:

    1. From the menu on the left, select 'Catalogue' > 'Adjustments'

    2. Then select the Account from the list of accounts for which you want to create an adjustment policy

    3. After selecting the account, click 'Add Policy' and provide a meaningful name for your policy in the right-hand pane where it says 'Adjustment name'

    Datasets

    CSV files

    CSV files are a way of storing data in a table format. A table consists of one or more rows, each row containing one or more columns. There is no formal specification for CSV files, although a proposed standard can be found at https://www.ietf.org/rfc/rfc4180.txt.

    Although the 'C' in 'CSV' stands for the word comma, other characters may be used as the separator. TAB and semicolons are common alternatives. Exivity can import CSVs using any separator character apart from a dot or an ASCII NUL, and uses the comma by default.

    get_last_day_of

    The get_last_day_of statement sets a variable to contain the number of days in the specified month

    Syntax

    get_last_day_of yyyyMM as varName

    Transformers

    The 'Data Sources' menu allows an admin of the Exivity solution to manage Transcript 'Transformer' scripts. Transcript has its own language reference, which is fully covered in a separate chapter of this documentation.

    As described in that chapter, you are free to use your editor of choice to create and modify Transformers. However, the GUI also comes with a built-in Transformer editor.

    Creating Transformers

    To create a new

    Services

    The services screen gives a user the ability to view and change the available services in the service catalogue of the Exivity deployment. When creating new services, it is required to use a Transformer with the service or services statement.

    Obtaining details for a Service

    To view the details of a service that has already been created, click on one of the services listed in the 'Catalogue' > 'Services' screen:

    Workflows

    The Workflows menu allows you to schedule various tasks and execute them at a specific date and time. This allows the execution of different Extractors and Transformers, so that they are tightly chained together.

    Creating a Workflow

    To create a new Workflow, go to 'Administration' > 'Workflows', then click the green button '+ Workflow':

    Azure EA

    Introduction

    When deploying the Azure EA Extractor for Exivity, some configuration is required within your Azure EA environment. The following process must be completed in order to report on Azure EA consumption:

    1. Create an Access Key and Secret in your Azure EA portal

    if

    The if statement is used to conditionally execute one or more statements. In conjunction with an optional else statement it can cause one or other of two blocks of statements to be executed depending on whether an expression is true or false.

    Syntax

    if (expression)

    save

    The save statement is used to write the contents of a named buffer to disk.

    Syntax

    save {buffer_name} as filename

    delete

    Overview

    The delete statement is used to delete one or more columns or rows from one or more DSETs.

    rename

    Overview

    The rename statement is used to change the name of an existing column in a DSET, or to change the source and/or alias of a DSET



    json format {myJSON}
    print {myJSON}
    Encoding a variable ...
    Encoded base16 result is: 5465787420746F20626520656E636F646564
    Encoded base64 result is: VGV4dCB0byBiZSBlbmNvZGVk
    Encoding a buffer ...
    Encoded base16 result is: 5465787420746F20626520656E636F646564
    Encoded base64 result is: VGV4dCB0byBiZSBlbmNvZGVk
    "this "is some text" with
    some "quotes" in it"
    lowercase heading in column servicegroup
    
    lowercase heading and values in columns servicegroup VMWare.usage.resourcepool chargeinterval
    
    lowercase values in column Region
    VMSize,RAM,CPU
    small,2,2
    medium,4,2
    large,8,4
    huge,16,8
    Datasets

    A Dataset is a CSV file, usually produced by USE, which can be imported for processing by a Transcript task. To qualify as a dataset a CSV file must meet the following requirements:

    1. The first line in the file must define the column names

    2. Every row in the file must contain the same number of fields

    3. A field with no value is represented as a single separator character

    4. The separator character must not be a dot (.)

    5. There may be no ASCII NUL characters in the file (a NUL character has an ASCII value of 0)

    When a Dataset is imported by a Transcript task any dots (.) contained in a column name will be replaced with underscores (_). This is due to a dot being a reserved character used by Fully Qualified Column Names

    When a DSET is exported during execution of a Transcript task, the exported CSV file will always be a Dataset, in that it can be imported by another Transcript task.

    Although datasets are generated by USE as part of the extraction phase, additional information may be provided in CSV format for the purposes of enriching the extracted data. This additional CSV data must conform to the requirements above.
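    For illustration, the following minimal CSV (with hypothetical columns and values) satisfies the requirements above; note the empty final field on the second data row:

    VMID,VMName,CPUCount
    101,vm-app-01,2
    102,vm-db-01,
    103,vm-web-01,4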

    DSETs

    A DSET is the data in a Dataset once it has been imported by a Transcript task. A DSET resides in RAM during the Transform phase and, if referenced by a finish statement, is then stored in a database file for long-term use.

    A Transcript task may import more than one dataset during execution, in which case multiple DSETs will be resident in RAM at the same time. It is therefore necessary for any subsequent Transcript statement which manipulates the data in a DSET to identify which DSET (or in some cases DSETs) to perform the operation on. This is achieved using a DSET ID which is the combination of a Source tag and an Alias tag.

    If multiple DSETs are present in memory, the first one that was created will be the default DSET. Column names that are not fully qualified are assumed to be located in the default DSET.

    Source/Alias tags and DSET IDs

    After a Dataset has been imported the resulting DSET is assigned a unique identifier such that it can be identified by subsequent statements. This unique identifier is termed the DSET ID and consists of two components, the Source and Alias tags.

    The default alias tag is defined automatically and is the filename of the imported Dataset, minus the file extension. For example a Dataset called usage.csv will have an alias of usage.

    The Transcript import statement can take one of two forms. Depending on which is used, the Source tag is determined automatically, or specified manually as follows:

    Automatic source tagging

    By convention Datasets produced by USE are located in sub-directories within the directory <basedir>\collected. The sub-directories are named according to the data source and datadate associated with that data. The naming convention is as follows:

    <basedir>\collected\<data_source>\<yyyy>\<MM>\<dd>_<alias>.csv

    where:

    • <data_source> is a descriptive name for the external source from which the data was collected

    • <yyyy> is the year of the datadate as a 4-digit number

    • <MM> is the month of the datadate as a 2 digit number

    • <dd> is the day of the month of the datadate as a 2 digit number

    When importing one of these datasets using automatic source tagging, the source tag will be the name of the directory containing that dataset. Thus, assuming a datadate of 20160801 the following statement:

    import usage from Azure

    will import the Dataset <basedir>\collected\Azure\2016\08\01_usage.csv, and assign it a Source tag of Azure.

    Manual source tagging

    When importing a Dataset from a specified path, the Source tag is specified as part of the import statement. For example the statement:

    import my_custom_data\mycosts.csv source costs

    will import the Dataset <basedir>\my_custom_data\mycosts.csv and assign it a Source tag of costs. Checks are done during the import process to ensure that every imported Dataset has a unique source.alias combination.

    Fully Qualified Column names

    When a column name is prefixed with a DSET ID in the manner described previously, it is said to be fully qualified. For example the fully qualified column name Azure.usage.MeterName explicitly refers to the column called MeterName in the DSET Azure.usage.

    https://www.ietf.org/rfc/rfc4180.txt

    Details

    The get_last_day_of statement will set the value of the variable called varName to contain the number of days in the month specified by yyyyMM, where yyyy is a four-digit year and MM is a 2-digit month.

    The statement will take leap years into account.

    Example
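    A minimal sketch (the variable name monthdays is illustrative):

    get_last_day_of 201602 as monthdays
    # ${monthdays} now contains 29, since 2016 was a leap year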

    {

    # Statements

    } [else {

    # Statements

    }]

    Details

    If the condition evaluates to true, then the first block of statements is executed, and the second block (if present) is skipped over. If the condition evaluates to false then the first block of statements is skipped and the second block (if present) is executed.

    The opening { character at the start of each block may be placed on a line of its own if preferred but the closing } must be on a line of its own.

    Multiple conditions can be used in a single expression and combined with the boolean operators && or || (for AND and OR respectively) so long as each condition is enclosed in braces. For example:

    Example

    Given the source JSON in a file called example.json, the following USE script:

    will produce the following output:

    if (($JSON{example}.[status] == "OK") || (${override} == "enabled")) { 
        # Execute if the status is "OK" or if we have set ${override} to "enabled"
    }

    Details

    The save statement will write the contents of a named buffer to filename. As well as providing a means of direct-to-disk downloading this can be useful for retrieving server responses and capturing them for later examination, whether it be for analysis, debugging or audit purposes.

    If the destination file already exists then it will be overwritten.

    If the filename argument contains a path component, then any directories not present in the path will be created. If the path or destination file cannot be created then an error will be logged and the USE script will fail.

    The save statement is similar in effect to the http_savefile option supported by set, in that data from a server is written to disk. There is one important distinction however:

    • When set http_savefile has been used to specify a file to save, the next HTTP request will stream data to the file as it is received from the server

    • When a buffer statement is used to capture the server response, and a subsequent save statement is used to write it to disk, all the buffered data will be written to the file immediately

    Example
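    A sketch of capturing a server response and writing it to disk; the URL and destination path are hypothetical:

    set http_header "Accept: application/json"
    buffer response = http GET "https://api.example.com/v1/usage"
    save {response} as system/extracted/example/response.json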

    Syntax

    delete columns [except] ColName1 [... ColNameN]

    delete blankcolumns

    delete rows

    delete dset dset_id

    Details

    Deleting columns

    The list of column names following the initial delete columns statement may be used in one of two ways:

    1. The columns listed will be deleted

    2. All columns except those listed will be deleted

    Which method to use is determined by the presence (or otherwise) of the except keyword:

    When using the except keyword, all column names following it must belong to the same DSET

    It is not possible to delete a column from a DSET which only contains a single column. When deleting a column, the memory used to index it, along with the memory used to store the contents of its cells is released. This may be useful in situations where memory is limited.

    Deleting unwanted columns early in a Transcript task will also increase the performance of many subsequent operations performed by the task.

    The second keyword may be column instead of columns. Either way, the statement behaves identically.

    Deleting blank columns

    The delete blankcolumns statement will delete columns that only contain blank values from the default DSET.

    It is not possible to completely empty a DSET using delete blankcolumns. In the event that the last remaining column in a DSET is blank then it will not be deleted.

    Deleting rows

    In order to delete rows from a DSET a local filter must be in effect. A local filter is created using the where statement.

    When used within the body of a where statement, delete rows will delete all rows in the DSET associated with the local filter where the condition is true. For more information on conditions, please refer to the where article.

    Either delete rows or delete row may be used (both variations work in an identical manner)

    Deleting a DSET

    The delete dset statement can be used to remove a DSET from memory. This may be useful in cases where a DSET is no longer required, for example after it has been used to enrich another DSET via the correlate statement.

    It is not possible to delete the default DSET.

    Examples

    Delete the column temp from the default DSET and the column Interim from the azure.Usage DSET: delete columns temp azure.Usage.Interim

    Delete any columns in the default DSET that have no values in any rows: delete blankcolumns

    Delete all rows where the VMID is 1234

Delete the DSET azure.rates: delete dset azure.rates

rename

Syntax

rename column OldName to NewName

rename dset OldSource.OldAlias to NewSource.NewAlias

    Details

    Renaming a column

    When renaming a column, OldName may be a fully qualified column name. If it is not fully qualified then the column OldName in the default DSET will be renamed.

    The NewName argument must not be fully qualified.

    Any dots (.) in the new column name will automatically be replaced by underscores (_), as a dot is a reserved character used to implement DSET namespaces.

    Renaming a DSET

    When renaming a DSET, NewSource.NewAlias must contain exactly one dot and must not be the name of an existing DSET. The rename takes immediate effect, thus any subsequent reference to the renamed DSET in the transcript task must use its new name.

    Any pending service definitions referencing the renamed DSET will automatically be updated with the new name.

    Examples

    Renaming columns:

    Renaming a DSET:

    {"title":"Example JSON data","heading":{"category":"Documentation","finalised":true},"items":[{"id":"01","name": "Item number one","subvalues":{"0":1,"10":42,"100":73,"1000":100},"category":"Example data","subcategory":"First array"},{"id":"02","name":"Item number two","subvalues":{"0":10,"10":442,"100":783,"1000":1009},"category":"Example data","subcategory":"First array"}]}
    {
      "title": "Example JSON data",
      "heading": {
        "category": "Documentation",
        "finalised": true
      },
      "items": [
        {
          "id": "01",
          "name": "Item number one",
          "subvalues": {
            "0": 1,
            "10": 42,
            "100": 73,
            "1000": 100
          },
          "category": "Example data",
          "subcategory": "First array"
        },
        {
          "id": "02",
          "name": "Item number two",
          "subvalues": {
            "0": 10,
            "10": 442,
            "100": 783,
            "1000": 1009
          },
          "category": "Example data",
          "subcategory": "First array"
        }
      ]
    }
    var testdata = "Text to be encoded"
    
    print Encoding a variable ...
    # Base16-encode a variable
    var encode_me = ${testdata}
    encode base16 encode_me
    print Encoded base16 result is: ${encode_me}
    
    # Base64-encode a variable
    var encode_me = ${testdata}
    encode base64 encode_me
    print Encoded base64 result is: ${encode_me}
    
    print Encoding a buffer ...
    # Base16-encode a buffer
    buffer encode_buf = data ${testdata}
    encode base16 {encode_buf}
    print Encoded base16 result is: {encode_buf}
    
    # Base64-encode a buffer
    buffer encode_buf = data ${testdata}
    encode base64 {encode_buf}
    print Encoded base64 result is: {encode_buf}
    set Flag to 0
    set Flag as status
    set custom.dataset.Flag to Pending
    where ([rate] == 0) {
        set Flag to free
    }
    option overwrite = no
    set Azure.usage.username to Unknown
    buffer zippedData = FILE system/extracted/my_source/${dataDate}_usage.zip
    unzip {zippedData}
    save {zippedData} as system/extracted/my_source/${dataDate}_usage.csv
    discard {zippedData}
    buffer test = FILE system/extracted/escapeme.txt
    escape quotes in {test}
    print {test}
    
    var testvar = "\"This is \"a test\" string\""
    escape quotes in testvar
    print ${testvar}
    \"this \"is some text\" with
    some \"quotes\" in it\"
    
    \"This is \"a test\" string\"
    VMSize,storage
    small,50
    medium,100
    large,500
    huge,1000
    VMSize,RAM,CPU,storage
    small,2,2,
    medium,4,2,
    large,8,4,
    huge,16,8,
    small,,,50
    medium,,,100
    large,,,500
    huge,,,1000
    capitalise heading in column _vmname
    capitalise heading in column servicegroup
    
    capitalize heading and values in columns servicegroup Azure.usage.resourcepool chargeinterval
    
    capitalise values in column Region
    #
    # Download two files into named buffers
    # using a subroutine to do so
    #
    gosub getfile(data1, "http://intranet/datadump1.json")
    gosub getfile(data2, "http://someotherserver/anotherfile.xml")
    
    # (Script to do something with the data goes here)
    
    #
    # Argument 1: the name of the buffer to store the data
    # Argument 2: the URL of the file to download
    #
    subroutine getfile {
        if (${SUBARG.COUNT} != 2) {
            print "Error: This subroutine requires two arguments
            return
        } 
    
        buffer ${SUBARG_1} = http GET "${SUBARG_2}"
        # There is an implicit 'return' here
    }
    <data_source>\<yyyy>\<MM>\<dd>_filename.csv
    #
    # Check a specific date to see if it is the last day of a month
    #
    var somedate = 20180228
    gosub detect_end_of_month(${somedate})
    
    if (${is_last_day} == TRUE) {
        print ${somedate} is the last day of a month
    } else {
        print ${somedate} is not the last day of a month
    }
    
    #
    # Check todays date to see if it is the last day of the month
    #
    gosub detect_end_of_month()
    if (${is_last_day} == TRUE) {
        print Today is the last day of the month
    } else {
        print Today is not the last day of the month
    }
    
    # This subroutine determines whether a date is the last
    # day of a month or not
    #
    # If no argument is provided it defaults to the current system
    # time, else it uses the supplied yyyyMMdd format argument
    #
    # It sets a variable called 'is_last_day' to TRUE or FALSE
    
    subroutine detect_end_of_month {
    
        if (${SUBARG.COUNT} == 0) {
            get_last_day_of ${YEAR}${MONTH} as last_day
    		
            if (${last_day} == ${DAY}) {
                var is_last_day = TRUE
            } else {
                var is_last_day = FALSE
            }
            return
        }
        
        # Verify argument format
        match date "^([0-9]{8})$" ${SUBARG_1}
        if (${date.STATUS} != MATCH) {
            print Error: the provided argument is not in yyyyMMdd format
            terminate with error
        }
    	
        # Get the day portion of the argument    
        match day "^[0-9]{6}([0-9]{2})$" ${SUBARG_1}
        var day_to_check = ${day.RESULT}
    
    	# Get the yyyyMM portion of the argument
        match yyyyMM "^([0-9]{6})" ${SUBARG_1}
        var month = ${yyyyMM.RESULT}
    	
        get_last_day_of ${month} as last_day
    	
        if (${last_day} == ${day_to_check}) {
            var is_last_day = TRUE
        } else {
            var is_last_day = FALSE
        }
    }
        
        
    var JSON_dir = "examples\json"
    buffer example = FILE "${JSON_dir}\doc.json"
    
var title = $JSON{example}.[title]
    
    # For every element in the 'items' array ...
    foreach $JSON{example}.[items] as this_item
    {
        # Extract the item name and id
        var item_name = $JSON(this_item).[name]
        var sub_id = $JSON(this_item).[id]
    
        if (${sub_id} == 02) {
            # For every child of the 'subvalues' object ...
            foreach $JSON(this_item).[subvalues] as this_subvalue
            {
                # Get the subvalue name and value
                var sub_name = ${this_subvalue.NAME}
                var sub_value = ${this_subvalue.VALUE}
    
                # Render an output line
            print ${title} (id: ${sub_id}) -> Item: ${item_name} -> Subvalue:${sub_name} = ${sub_value}
            }
        } else {
                print Skipping unwanted id: ${sub_id}
            }
    
    }
    discard {example}
    terminate
        Skipping unwanted id: 01
        Example JSON data (id: 02) -> Item: Item number two -> Subvalue:0 = 10
        Example JSON data (id: 02) -> Item: Item number two -> Subvalue:10 = 442
        Example JSON data (id: 02) -> Item: Item number two -> Subvalue:100 = 783
        Example JSON data (id: 02) -> Item: Item number two -> Subvalue:1000 = 1009
    var server = "https://my_json_server.com"
    buffer response = http GET ${server}/generatetoken        
    
    # Save a copy of the original server response for diagnostic purposes
    save {response} as "${baseDir}\diagnostics\token.json"
    
    # Create a variable called ${secret_token} from the 'access_token'
    # string in the JSON in the {response} buffer
    var secret_token = $JSON{response}.[access_token]
    
    # We no longer need the {response} buffer as the value extracted
    # from it is stored in a variable
    discard {response}
    delete columns One Two Three         # Delete the listed columns
    delete columns except One Two Three  # Delete all but the listed columns
    where ([VMID] == 1234) {
        delete rows
    }
    # Rename the column 'UnitCost' in the default DSET to 'Cost'
    rename column UnitCost to Cost
    
    # Rename the column 'UnitCost' in the DSET 'custom.prices' to 'CustomCost'
    rename column custom.prices.UnitCost to CustomCost
    # Usage file has yyyyMMdd in filename
    import "system/extracted/AzureJuly/${dataDate}.ccr" source AzureJuly
    
    # The resulting DSET will have an ID of 'AzureJuly.yyyyMMdd'
    # where yyyyMMdd is the current dataDate
    
    #
    # Any additional processing would be done here
    #
    
# Remove the yyyyMMdd from the DSET ID
    rename dset AzureJuly.${dataDate} to Azure.July
    
    # Create 'Azure.July.rdf'
    finish

  • Provide the Start date, by selecting the initial month when this adjustment policy is applied

  • Provide the End date, by selecting the month when this adjustment policy will be discontinued. This is optional, since an adjustment policy can be applied permanently.

  • Select which Service or Service Category this policy is applied to. You are able to select multiple using the check boxes that are provided.

  • Select a Type for this adjustment. This can be either a Discount or a Premium

  • Select the Target, meaning: is this Adjustment targeting the total Charge or the total Quantity of the selected service(s)?

  • Select the Difference setting, to indicate an Absolute value (i.e. 100 units, or 100 dollars) or a Relative value (such as 10%)

  • Lastly, provide the Adjustment value. In the example shown in the image above, a value of '10' is provided in the Amount field, which will adjust the total charge by -10% given the provided parameters.

  • When you're done, click the Add Policy button. Your changes are now applied to all charge related reports.

Adding an Adjustment Policy

Transformers

To create a new Transformer for Transcript, follow these steps:
    Creating Transformers
    1. From the menu on the left, select "Data Sources" > 'Transformer'

    2. To create a new Transformer to normalise and enrich USE Extractor consumption and lookup data, click the 'Add Transformer' button

3. When your Exivity instance has access to the Internet, it will pull in the latest set of Transformer Templates from our Github account. These templates are then presented to you, and you can pick one from the list to get started. If you don't have access to the internet, you can download them directly from Github. You are also free to start creating your own Transformer from scratch.

4. Provide a meaningful name for your Transformer. In the above example we're creating a Transformer for a consolidated bill of various IT resources. Therefore we call this Transformer: 'IT Services Consumption'

5. When you're done creating your Transformer, click the 'Insert' button at the bottom of the screen.

    !!! INFO The Transformer editor has syntax highlighting and auto completion, to simplify the development of your scripts

    Edit and Delete Transformers

When you want to change or delete an existing Transformer, first select the Transformer that you want to change from the list:

    Editing USE Extractors
    1. When you've selected your Transformer from the "Data Sources" > 'Transformers' list, you can change the Transformer script in the editor

    2. In this example, we're adding a 'services' statement using auto completion, to simplify the creation of services

3. In case you want to save your changes, click the 'Save' button at the bottom of the 'Editor' screen. To delete this Transformer, click the 'Remove' button, after which you'll receive a confirmation pop-up where you'll have to click 'OK'.

    Run and Schedule Transformers

    To test your Transformer, you can execute or schedule it directly from the Glass interface:

    Run USE Extractors
1. After you have selected the Transformer that you would like to run, click the 'Run' tab next to the 'Editor' tab

2. Manual execution of a Transformer can only be done for a single day. Provide the date you want to run this Transformer for in dd-MM-yyyy format. You can also use the date picker by clicking the down-facing arrow on the right side of the date field

3. When you've provided the required date, click 'Run Now' to execute the Transformer. After the Transformer has completed running, you will receive a success or failure message, after which you might need to make additional changes to your Transformer. For further investigation or troubleshooting, consult the "Administration" > "Log Viewer" screen

    4. Once you're happy with your output, you can schedule the Transformer via the 'Schedule' tab, which is located next to the 'Run' tab at the top of the screen

5. Transformers can be scheduled to run once a day at a specific time. You should also provide a date, which is specified using an offset value. For example, if you want to execute this Transformer against yesterday's date with every scheduled run, you should provide a value of -1

    6. When you're done with the schedule configuration, you may click the 'Schedule' button. In case you want to change or remove this schedule afterwards, click the 'Unschedule' button.

!!! warning As of version 1.6, it is recommended to use the Workflow function instead of the Transformer schedule

    Transcript Documentation
The numbered items in the above screenshot refer to the following list:
    1. The description or friendly name for this service

    2. The unique key value of this service (see service)

    3. The time stamp when the service was created

    4. The time stamp when the service was updated

5. The DataSet that this service relates to

    6. Where to obtain the service name from (in header or in data). The value will be used for the service description (see 1)

    7. The source column that has the consumed quantity

    8. The Instance column refers to the chargeable instance column value (i.e. VM ID) which is required for automatic Services

9. The interval that defines how often this service is charged: automatic (every occurrence/record/hour), daily or monthly

10. When using proration, this checkbox will be enabled. Proration takes into account whether to charge for a portion of a consumption interval. For example: 10 days of consumption for a monthly service with a configured rate of € 90 per unit and proration enabled will result in a line item of € 30 for that service's monthly charge

11. The Billing Type indicates whether this service has manual (using a manually provided, adjustable rate value) or automatic (using a rate column) rates configured

12. The COGS (Cost of Goods Sold) of a service has its own rate configuration, which can be either manual/automatic per unit or manual/automatic per interval

    13. Exivity will in a future release allow the user to change and create services through the web interface. At this time, this feature is disabled.

    Changing or Deleting a Service

In case you need to change the configuration of an already populated service, the GUI enables you to do so. To change an existing service, first make sure that you have selected the appropriate report from the Report Selector at the top left of the screen. To change the configuration of a service, or to delete it, follow these steps:

    Changing or deleting a service
1. Navigate to the Catalogue > Services menu and click the white Edit button at the top of the service list. The system will warn you that any changes made to existing services may require you to re-Prepare the currently selected Report Definition.

    2. If you have confirmed the warning message, you will be able to select one, multiple or all of the services within the currently selected Report Definition. You can then select the Delete button next to the Edit button, to delete all selected services.

    3. If you want to change the configuration of one of the services, you should first select the service which you'd like to change.

    4. When you have the service that you want to change selected, you can change any of the available parameters such as the Instance Column, Interval, etc. Once you are satisfied with your changes, you may press the Update button.

5. Ensure you re-Prepare your report in the Report Definitions screen in case you have made any changes.


  • Provide a meaningful name for the Workflow

  • Optionally you can provide a detailed Description

  • The Start date of the Workflow, when it will run the first time

  • The interval: Hourly, Daily or Monthly

  • At what time should the Workflow start

  • Provide an interval value, e.g. provide 2 to run every second hour / day / month, depending on your configured interval

  • Add a new step to your workflow

  • Set the Type of the step using the drop down menu

  • Provide the option that goes with the selected Type. This can be your Extractor, Transformer, Report or other name

  • Depending on the selected Step Type, you can provide an offset date. This value is used during execution of that step. Typically this would be used for a From date offset (i.e. -1 for yesterday)

  • A To date offset can be provided for some step types (i.e. 0 for today)

  • Additional arguments can be sent to the step. This is typically used for some Extractors and when executing a custom Command

  • You can delete a step using the red minus button

  • To view historical Workflow results, click the Status tab

Special Workflow Steps

    Apart from adding Extractor, Transformer and Report steps, there are two different Workflow Step types:

    • Core

    • Execute Command

    Core

The Core command allows you to run a few predefined API calls; currently these are the following:

    Core API calls using workflow step
    • Run garbage collector

      • Cleans up the server getPrices cache table and Redis cache

    • Purge cache

      • This will Unprepare any prepared reports. Use with caution

    • Refresh budgets

  • Evaluate all configured budgets

    • Send heartbeat

      • Send an API heartbeat request. For future use.

Purge cache should be used with caution, as it will unprepare all available reports. This means that none of your reports will return any data until you have prepared them again.

    Execute command

    The Execute Command step enables you to execute an external command, like a script:

    Call external commands or scripts

As an example: you could run a Powershell script to obtain some data from a special data source that Exivity Extractors are not able or allowed to connect to. This script could be executed in the following manner:

The above command calls the Powershell executable to run the special.ps1 script, with a dynamically generated parameter that is evaluated at run time. This particular example always provides yesterday's date in yyyyMMdd format as a parameter to the special.ps1 script. Many other variations and scripting languages are possible. Feel free to experiment.

  • Configure the Azure EA Blob Extractor

  • Configure your Azure EA Transformer

  • Create a Report definition

Creating an Access Key & Secret

In order for Exivity to authenticate with Azure EA, you will need to create an access key and secret in the Azure EA Portal. You will also need to find your enrollment number. To do this, log in to your Azure EA Portal at https://ea.azure.com and navigate to the Reports menu:

    Creating an API Access Key & Secret for your Azure EA environment

    Just under the Windows logo in the left menu section you will find your enrollment number which you will need to provide later when configuring the Extractor. To create the Access Key & Secret, click on Download Usage and then on API Access Key. This brings you to the menu where you can manage your Access Keys and corresponding secret which you will need in order to configure the data Extractor.

    Configure Extractor

    To create the Extractor, browse to Data Sources > Extractors in the Exivity GUI and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For Azure EA:

    • Pick Azure EA Blob from the list

    • Provide a name for the Extractor in the name field

    • Click the Create button.

Once you've created the Extractor, go to the first tab: Variables

    Azure EA Extraction Template

Fill in all variables shown in the above screenshot, and feel free to encrypt any sensitive data using the lock symbol on the right.

    Once you've filled in all details, go to the Run tab to execute the Extractor for a single day:

    Executing the Extractor manually for a single day

    The Extractor requires two parameters in yyyyMMdd format:

    • from_date is the date for which you wish to collect consumption data

    • to_date should be the date immediately following from_date

    These should be specified as shown in the screenshot above, separated with a space.

    When you click the Run Now button, you should get a successful result.

    Configure Transformer

    Once you have successfully run your Azure EA Blob Extractor, you can create a Transformer template via Data Sources > Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. Make any changes that you feel necessary and then select the run tab to execute it for a single day as a test.

    Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for and select the same day as for which you have extracted consumption data in the previous step.

    Create a Report

    Once you have run both your Extractor and Transformer successfully create a Report Definition via the menu option Reports > Definitions:

    Creating a Report Definition

    Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.

    Prepare your Report

    Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.


    Azure Stack

    Introduction

    When deploying the Azure Stack Extraction template for Exivity, some configuration is required within your Azure Stack environment and a lookup file needs to be created. The following process must be completed in order to report on Azure Stack consumption:

    1. Create an Exivity Enterprise Application in your Azure AD for authentication

    2. Configure a rate card lookup file

    3. Configure an Extractor

    4. Configure a Transformer

    5. Create your Report

    Creating an Enterprise Application

    In order for Exivity to authenticate with Azure Stack, you will need to create an application in the Azure AD where you have registered your Azure Stack management node:

    Make sure to write down the Application ID and its corresponding Secret, since you will need these when configuring the Extractor later.

    When you create this application in your Azure AD make sure it has (at least) the Reader Role in your Default Provider Subscription:

    Create a rate card

    As Microsoft does not provide rate card information via the Azure Stack Consumption API you will only obtain usage metrics from Azure Stack for all of the Meter IDs that are mentioned by Microsoft.

Exivity provides an example rate card file that you can use for creating your own rates. Please bear in mind that these rates are fictional, so you should update them with your preferred values. However, to get started you can use the file linked above by placing it as a csv file in your Exivity home folder at the following location:

    %EXIVITY_HOME_PATH%\system\extracted\AzureStack\rates\azure_stack_example_rates.csv

    Once loaded into the system using a Transformer you will be able to change the rates easily through the GUI. This will also enable you to test any draft rates.

    Configure Extractor

    To create the Extractor, browse to Data Sources > Extractors in the Exivity GUI and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For Azure Stack:

• Pick the Azure Stack template from the list

    • Provide a name for the Extractor in the name field

    • Click the Create button.

Once you've created the Extractor, go to the first tab: Variables

Fill in all required variables marked within the red box in the above screenshot. If you don't know some of the required GUIDs, most of these can be obtained by browsing to the Azure Stack management node URL:

• https://adminmanagement.<your.domain.com>/metadata/endpoints?api-version=2015-01-01

    Another way to obtain some of this information is using the Diagnostics button in your management portal:

    When you click the Show Diagnostics link, it should download a JSON file containing most of the parameters you'll need, such as Provider GUID, Audience GUID etc.

    Once you've filled in all details, go to the Run tab to execute the Extractor for a single day:

    The Extractor requires two parameters in yyyyMMdd format:

    • from_date is the date for which you wish to collect consumption data

    • to_date should be the date immediately following from_date

    These should be specified as shown in the screenshot above, separated with a space.

    When you click the Run Now button, you should get a successful result.

    Configure Transformer

    Once you have successfully run your Azure Stack Extractor, you can create a Transformer template via Data Sources > Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. Make any changes that you feel necessary and then select the run tab to execute it for a single day as a test.

    Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for and select the same day as for which you have extracted consumption data in the previous step.

    Create a Report

    Once you have run both your Extractor and Transformer successfully create a Report Definition via the menu option Reports > Definitions:

    Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.

    Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.

    Extractors

    The 'Data Sources' menu allows an admin of the Exivity solution to manage USE 'Extractors'. USE has its own language reference, which is fully covered in a separate chapter of this documentation.

    As described in the USE documentation, you are free to use your editor of choice to create and modify USE Extractors. However, the GUI also comes with a built-in USE Extractor-editor.

    Creating Extractors

    To create a new USE Extractor, follow these steps:

    1. From the menu on the left, select "Data Sources" > 'Extractors'

2. To create a new USE Extractor with which to pull usage or lookup data, click the 'Add Extractors' button

3. When your Exivity instance has access to the Internet, it will pull in the latest set of Extraction Templates from our Github account. These templates are then presented to you, and you can pick one from the list to start Extracting. If you don't have access to the internet, you can download them directly from Github. You are also free to start creating your own Extractor from scratch.

    Edit and Delete Extractors

When you want to change or delete an existing USE Extractor, first select the USE Extractor that you want to change from the list:

1. When you've selected your USE Extractor from the "Data Sources" > 'Extractors' list, you can change the USE Extractor variable values via the 'Variables' tab. Any variable value (except encrypted variables) from the USE Extractor script can be changed via this menu

    2. In this example there is a variable called User, which has a value of vcenter_admin. We can change this value simply by changing that value to something else: my_new_user

    Run and Schedule Extractors

    To test your USE Extractor, you can execute or schedule it directly from the Glass interface:

1. After you have selected the USE Extractor that you would like to run, click the 'Run' tab next to the 'Editor' tab

    2. Most Extractors require one or more parameters, usually in a date format such as 20171231. In this example, the USE Extractor requires two parameters: a from and to date

    3. When you've provided the required run parameters, click 'Run Now' to execute the USE Extractor

As of version 1.6, it is recommended to use the Workflow function instead of the Extractor schedule

    hash

    The hash statement is used to generate a base-16 or base-64 encoded hash of data stored in a variable or named buffer.

    Syntax

hash sha256 [HMAC key] target|{target} as result [b16|b64]

hash md5 target|{target} as result [b16|b64]

    Details

    The hash statement uses the contents of target as its input and places the final result into result. The SHA256 and MD5 hash algorithms are supported.

    If target is surrounded with curly braces like {this} then it is taken to be the name of a memory buffer and the contents of the buffer will be used as input. Otherwise, it is treated as the name of the variable, the value of which will be hashed.

    By default the resulting hash is base-16 encoded and the result placed into the variable specified by the result argument.

    result is the name of the variable to put the output into, and not a reference to the contents of that variable. This is why it is not ${result}

If the optional HMAC key arguments are provided when the hash type is sha256, then the secret in key will be used to generate an HMAC-SHA-256 result.

    If the optional b64 argument is used (base64 may also be specified), then the result will be encoded using base-64.

    The optional b16 (base16 may also be used) is provided for completeness, but need not be specified as this is the default encoding to use.
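As a minimal sketch of the syntax described above (the input values, key and variable names are illustrative only):

var payload = "some text to hash"

# SHA-256 hash of the variable contents, base-16 encoded by default
hash sha256 payload as payload_sha256
print SHA-256: ${payload_sha256}

# MD5 hash of the same variable
hash md5 payload as payload_md5
print MD5: ${payload_md5}

# HMAC-SHA-256 using a secret key, base-64 encoded
hash sha256 HMAC mysecretkey payload as signature b64
print Signature: ${signature}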

    Example

    Running the script:

    results in the following output:

    match

The match statement is used to search either a specified string or the contents of a named buffer using a regular expression.

    Syntax

match label expression target

    Details

    The three parameters serve the following purposes:

    Label

    The label associates a meaningful name to the search. Once the match has been attempted, two variables will be created or updated as follows:

    These variables can be checked after the match in order to determine the result status and access the results.

    Expression

The regular expression must contain one or more characters enclosed in brackets - ( ... ) - the contents of which are termed a subgroup. If a successful match is made then the portion of the target text that was matched by the subgroup will be returned in the label.RESULT variable.

    Target

    The target determines whether a supplied string or the contents of a named buffer are searched. By default the parameter will be treated as a string.

    If the string contains white-space then it must be enclosed in double quotes

    If the target argument is surrounded with curly braces - { ... } - then it is taken to be the name of a buffer and the expression will be applied to the contents of that buffer.

    Regular expressions are generally used for searching ASCII data. Searching binary data is possible but may be of limited usefulness.

    Examples

    Search the contents of a variable for the text following the word 'connection:' with or without a capital 'C':

    Search a text file previously retrieved from a HTTP request to locate the word 'Error' or 'error'
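The original example scripts are not reproduced here, but the following sketch illustrates both cases using the documented syntax (the variable contents and URL are placeholders):

# Search the contents of a variable for the text following 'connection:'
var header_line = "Connection: keep-alive"
match conn "[Cc]onnection: (.*)" "${header_line}"
if (${conn.STATUS} == MATCH) {
    print Connection type: ${conn.RESULT}
}

# Search a previously retrieved response buffer for 'Error' or 'error'
buffer page = http GET "https://api.example.com/status.txt"
match err "([Ee]rror)" {page}
if (${err.STATUS} == MATCH) {
    print The response contains: ${err.RESULT}
}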

    foreach

    The foreach statement defines a block of zero or more statements and associates this block with multiple values. The block of statements is executed repeatedly, once for each value.

    Syntax

foreach parslet as loop_label {

    # Zero or more USE script statements go here

    }

    The opening { may be placed on a line of its own if preferred

    Details

The foreach statement is used to iterate over the values in an array or object (identified by a parslet) within the data in a named buffer.

    The loop will execute for as many elements as there are in the array, or for as many members there are in the object. For the purposes of this documentation, the term child will be used to refer to a single array element or object member.

    If the array or object is empty, then the body of the loop will be skipped and execution will continue at the statement following the closing }.

    The loop_label can be any string, but must not be the same as any other loop_label values in the same scope (ie: when nesting foreach loops, each loop must have a unique label). This label is used to uniquely identify any given loop level when loops are nested.

The foreach statement will execute the statements in the body of the loop once for every child. foreach loops can be nested, and at each iteration the loop_label can be used to extract values from an array or object in the current child using a dynamic parslet. See the examples at the end of this article for a sample implementation showing this in action.

    As the foreach loop iterates over the children, a number of variables are automatically created or updated as follows:

    Examples

    Basic looping

    Consider the following JSON in a file called samples/json/array.json:

    To generate a list of IDs and names from the items array, the following would be used:
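The original listing is not shown here; as a sketch, assuming array.json contains an items array whose children each have id and name members, the loop would look as follows:

buffer array_data = FILE "samples/json/array.json"

foreach $JSON{array_data}.[items] as this_item {
    var id = $JSON(this_item).[id]
    var name = $JSON(this_item).[name]
    print Item ${id}: ${name}
}

discard {array_data}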

    Nested looping

    To extract values from an array using nested loops:

    Given the source JSON in a file called example.json, the following USE script:

    will produce the following output:

    AWS Market Place

    Introduction

    Exivity can be deployed from the AWS Market Place allowing you to have a functional Exivity solution in a matter of minutes. This tutorial will get you up and running.

    AWS Marketplace Offering

Log in to your AWS portal and access the Exivity offering.

    1. Click on Continue to Subscribe.

    2. Read our Terms and Conditions, and when ready, click on Continue to Configuration.

    3. Select the Region where you want to deploy Exivity and click on Continue to Launch.

    Deployment Wizard

1. In the first screen, try to pick a recommended VM size type that has enough CPUs and memory (see the system requirements documentation). When you are done with your selection click on Next: Configure Instance details.

2. In this section, you can select your VPC Configuration, or leave the default values. When you are done with your configuration click on Next: Add Storage.

3. In the Storage section, two drives are recommended; you can influence the Volume Type and the Size (GiB) parameters. When ready, click on Next: Add Tags.

    4. In the Tags section, include the tags that are meaningful to you, as a minimum, a Name tag is recommended. Click on Next: Configure Security Group.

    5. In the Security Group section, you can leave the default recommended security group or add more rules if needed. Click on Review and Launch.

    6. Review the details and click on Launch, select your preferred Key Pair to connect to the instance.

    In a few minutes your instance will be deployed, you can track the progress in your EC2 Dashboard:

    Write down the Public IP address / Public DNS and the Instance ID once they are available.

    Connecting to your Exivity instance

You can log on to your Exivity instance with RDP, but after deployment you should be able to connect directly using the public IP address or DNS name of your Exivity instance via the following default URL:

    • https://<Your_Public_DNS>:8001

The default admin username is admin and the password is the Instance ID.

By default no data is loaded into the system, so you'll have to create a new Extractor for obtaining consumption data and a Transformer to process that data. A Report Definition is then created to be able to report on your consumption metrics and costs.

    Next Steps

A couple of getting started guides are provided in this documentation, but feel free to drop us an e-mail or create a ticket in our support portal. We will then assist you in getting started with your specific use case

    uri

    The uri statement is used to encode the contents of a variable such that it does not contain any illegal or ambiguous characters when used in an HTTP request.

    Syntax

uri encode varname

uri component-encode varname

uri aws-object-encode varname

    As well as uri component-encode you can use uri encode-component (the two are identical in operation). Similarly, uri aws-object-encode and aws-encode-object are aliases for each other.

    Details

    When sending a request to an HTTP server it is necessary to encode certain characters such that the server can accurately determine their meaning in context. The encoding involves replacing those characters with a percent symbol - % - followed by two hexadecimal digits representing the ASCII value of that character.

    Note that the last parameter to the uri statement is a variable name, so to encode the contents of a variable called my_query the correct statement would be uri encode my_query and not uri encode ${my_query} (The latter would only be correct if the value of my_query was the name of the actual variable to encode)

    USE script provides the following methods for encoding the contents of a variable:

    encode

uri encode varname

    This method will encode all characters except for the following:

    This is typically used to encode a URI which contains spaces (spaces encode to %20) but doesn't contain any query parameters.

    encode-component

uri encode-component varname

    This method will encode all characters except for the following:

    This is typically used to encode query components of a URI, such as usernames and other parameters. Note that this method will encode the symbols =, & and ? and as such a URL of the form:

    server.com/resource?name=name_value&domain=domain_value

    is usually constructed from its various components using the values of the parameters as shown in the example below.
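As a sketch (the parameter values are placeholders), the URL above could be assembled from its encoded components like this:

var name_value = "Dave Smith"
var domain_value = "EMEA West"

# Encode each component so that spaces and reserved characters are safe
uri encode-component name_value
uri encode-component domain_value

var url = "server.com/resource?name=${name_value}&domain=${domain_value}"
print ${url}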

    aws-object-encode

uri aws-object-encode varname

This method is specifically implemented to support the encoding of object names when downloading from Amazon S3 buckets. Amazon S3 buckets appear much like shared directories, but they do not have a hierarchical filesystem.

    The 'files' in buckets are termed objects and to assist in organising the contents of a bucket, object prefixes may be used to logically group objects together.

    These prefixes may include the forward slash character, making the resulting object name appear identical to a conventional pathname (an example might be billing_data/20180116_usage.csv). When downloading an object from S3 the object name is provided as part of the HTTP query string.

    When referencing an S3 object name there is an explicit requirement not to encode any forward slashes in the object name. USE therefore provides the aws-object-encode method to ensure that any S3 object names are correctly encoded. This method will encode all characters except for the following:

More information may be found at https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-header-based-auth.html where it states:

    URI encode every byte. UriEncode() must enforce the following rules:

URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.

    The space character is a reserved character and must be encoded as "%20" (and not as "+").

    Each URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.

    Letters in the hexadecimal value must be uppercase, for example "%1A".

    Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.

The aws-object-encode method is compliant with the above requirements. For most trivial cases it should not be necessary to encode the AWS object name as it is relatively straightforward to do it by hand. However, using uri aws-object-encode to URI-encode the object name may be useful for object names that contain a number of characters not listed above, or for cases where the object name is provided as a parameter to the USE script.
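As a minimal sketch (the object name is illustrative):

# An S3 object name containing spaces; forward slashes must remain unencoded
var object_name = "billing_data/monthly reports/20180116_usage.csv"

uri aws-object-encode object_name
print Encoded object key: ${object_name}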

    Example

    The above script will output:

    replace

    Overview

    The replace statement is used to search for, remove, and optionally replace with new values, substrings of the values in a column.

    Syntax

replace substring in colName [with replacement]

    Details

    The replace statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:

The colName argument may or may not be fully qualified, but must reference an existing column.

    If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.

    If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.

    The replace statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information
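As a sketch of both forms (the column name and substrings are illustrative only):

# Remove a verbose prefix from every value in the ServiceName column
replace "Microsoft.Compute/" in ServiceName

# Substitute a shorter label for a recurring substring
replace "VirtualMachines" in ServiceName with "VM"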

    Example

    Given the following sample data:

    The following script ...

    ... will produce the following output data.

    encrypt

    This article assumes knowledge of variables.

    The encrypt statement is used to conceal the value of a variable, such that it does not appear in plain text in a USE script.

    Syntax

    encrypt varname = value_to_be_encrypted

    Details

The encrypt statement differs from other statements in that it takes effect before execution of a USE script begins. In this regard it is effectively a directive to the internal script pre-processor which prepares a script for execution.

    Comments, quotes and escapes in the value to be encrypted are treated as literal text up until the end of the line.

    White-space following the value to be encrypted will therefore be included in the encrypted result.

    White-space preceding the value to be encrypted will be ignored and will not be included in the encrypted result.

    Encrypting one or more variables

    Any variable prefixed with the word encrypt will be encrypted by the pre-processor and the script file itself will be modified as follows:

    • All text (including trailing white-space) from the word following the = character up to the end of the line is encrypted

    • The encrypted value is base64 encoded

• The original variable value in the USE script is substituted with the result

• The encrypt keyword for that variable is changed to encrypted

• The USE script is overwritten on disk in this new form

    This process is repeated for all variables preceded by the encrypt keyword.

    As a side effect of the encryption process, it is not currently possible to encrypt a value that begins with a space or a tab. This functionality will be implemented in due course.

    Using encrypted variables

Once encrypted, a variable can be used just like any other, the only requirement being that the encrypted keyword preceding its declaration is not removed or modified.

    To change the value of an encrypted variable simply replace the declaration altogether and precede the new declaration with encrypt. Upon first execution, the USE script will be updated with an encrypted version of the variable as described above.

    Encrypted values can only be used on the system that they were created on. If an encrypted value is moved or copied to a different installation of Exivity then any attempt to reference or decrypt it will result in something other than the original value.

    Example

    Firstly, create the script as usual, with encrypt preceding any variables that are to be encrypted:

    Secondly, run the script. Prior to execution the script will be automatically modified as shown below:
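The original listings are not reproduced here; as a sketch (the variable name and value are placeholders, and the encrypted value shown is symbolic since the real result is base64-encoded and system-specific):

# Before first execution
encrypt var api_password = MySecretValue

# After the pre-processor has run, the script on disk reads something like:
# encrypted var api_password = <base64-encoded encrypted value>

# The variable can then be referenced like any other, e.g. ${api_password}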

    round

    Overview

    The round statement is used to ensure that numeric values in a column are whole multiples of a specified number.

    Syntax

round colName [direction] [to nearest value]

    Details

    The round statement is used to round numbers to the nearest multiple of any integer or floating point number.

    The parameters supported by round are as follows:

    The simplest form of the statement is round colName. This will use the defaults shown in the table above such that values are rounded to the next highest integer. Alternatively, the statement round colName down will round down to the nearest integer.

    If the value argument is provided then the numbers in colName will be rounded up or down to the nearest multiple of value. For example the statement ...

    round Quantity up to nearest 5

    ... will round the numbers in the column Quantity up to the next multiple of 5. Thus any number in Quantity higher than 10 and less than 15 will be rounded to 15.

    When specifying the value argument, floating point values are supported.

    Additional notes

    The round statement observes the following rules:

    • Non-numeric values in colName are ignored

    • Blank values, or a value of 0 in colName are ignored

    • Numbers in colName that are already a whole multiple of value will not be modified

    The round statement may be used in the body of a statement to perform conditional rounding.

    Examples
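The original examples are not reproduced here; as a sketch (column names are illustrative):

# Round all values in the Quantity column up to the next whole number
round Quantity

# Round all values in the Quantity column down to the nearest whole number
round Quantity down

# Round values in Quantity up to the next multiple of 5
round Quantity up to nearest 5

# Conditional rounding inside a where block
where ([rate] != 0) {
    round Quantity up to nearest 0.25
}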

    normalise

    Overview

    The normalise statement is used to update the values in a numerical column such that they are all positive, negative or inverted.

    In this documentation the spelling normalise is used but normalize may also be used. The functionality is identical in either case.

    Syntax

normalise column colName as positive

normalise column colName as negative

normalise column colName as invert

normalise column colName as standard

    Details

    The normalise statement processes each value in the column called colName and applies the following logic based on the last argument shown above as follows:

    In order to be considered a number, a value in the colName column must start with any of the characters +, -, . or 0 to 9 and may contain a single . character which is interpreted as a decimal point.

    If a value in colName is non-numeric or blank it is left intact

When using standard, all non-blank values are assumed to be numeric, and as such any non-numeric values will be changed to a numeric zero.

    Additionally:

    • Any numerical value in colName which starts with a +, . or decimal character is considered positive

    • Any numerical value in colName which starts with a - character is considered negative

• When using standard, the resulting conventional number will be accurate up to 14 decimal places

The normalise statement ignores the option overwrite setting, as its sole purpose is to modify existing values.

    Example
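As a sketch (column names are illustrative):

# Make every numeric value in the Cost column positive
normalise column Cost as positive

# Flip the sign of every numeric value in the Adjustment column
normalise column Adjustment as invert

# Convert scientific notation such as 2.1E-5 to conventional notation
normalise column Quantity as standard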

    csv

The csv statement is used to create and populate CSV files. It is typically combined with foreach loops to write values extracted from an array in a JSON and/or XML document stored in a named buffer.

    Details

    CSV files are produced via the use of multiple csv statements which perform the following functions:

    buffer

    The buffer command is used to create and/or populate one of these named buffers with data.

    Syntax

buffer name = protocol protocol_parameter(s)

    Users Groups

The Groups and Users menus allow you to configure Role Based Access and Account level permissions. Groups are created to define custom restrictive roles that can be assigned to users who must have access to only certain parts of the Glass interface. Users are created to provide login credentials for other users of the Exivity system. A user can be associated with a single Group, and can have access to all accounts or to a subset of accounts.

    Creating Groups

To create a custom group with limited access to the Exivity Glass interface:

    copy

    This article covers both the copy and move statements. They both work in the same way apart from the fact that move deletes the source row after copying it.

powershell.exe "D:\script\special.ps1" ((Get-Date).AddDays(-1)).ToString("""yyyyMMdd""")


Parameter    Value
label        A unique name to associate with this match
expression   The regular expression to apply to the target
target       The data to search using the expression

Variable       Possible values               Notes
label.STATUS   MATCH, NOMATCH or ERROR       The result of applying the expression (ERROR infers an invalid expression)
label.RESULT   A string, or an empty value   The text matched by the subgroup in the expression, if any

Variable           Value
loop_label.COUNT   The number of times the loop has executed. If the object or array is empty then this variable will have a value of 0.
loop_label.NAME    The name of the current child
loop_label.VALUE   The value of the current child
loop_label.TYPE    The type of the current child


Parameter     Notes
substring     A string to search the values in colName for
colName       The column to search for the substring
replacement   The string to replace occurrences of substring with


Parameter   Default   Purpose
colName     n/a       The name of the column containing the values to round
direction   up        Whether to round to the next highest (up) or lowest (down) multiple of value
value       1         A value determining the granularity of the rounding


Argument   Result
positive   All negative numbers are replaced with their positive equivalent. Non-negative numbers are left unmodified.
negative   All positive numbers are replaced with their negative equivalent. Negative numbers are left unmodified.
invert     All positive numbers are replaced with their negative equivalent, and all negative numbers are replaced with their positive equivalent
standard   All non-blank values are assumed to be a decimal number and are replaced with that value in conventional notation. This functionality is intended to provide a means to convert numbers in scientific notation such as 2.1E-5 to conventional notation such as 0.000021.


  • Create a new empty CSV file

  • Define the headers

  • Finalise the headers

  • Write data to one or more rows of the file

  • Close the file

All CSV files created by the csv command use a comma - , - as the separator character and a double quote - " - as the quote character. Headers and data fields are automatically separated and quoted.

    Create a new CSV file

    The following is used to create a new, empty CSV file:

csv label = filename

The label must not be associated with any other open CSV file. Up to 16 CSV files may be open simultaneously and the label is used by subsequent csv statements to determine which of the open files the statement should operate on. Labels are case sensitive and may be from 1 to 15 characters in length.

    The specified filename is created immediately, and if it is the name of an existing file then it will be truncated to 0 bytes when opened.

    The filename argument may contain a path component but the csv statement does not create directories, so any path component in the filename must already exist. The path, if specified, will be local to the Exivity home directory.

    Example

    csv usage = "${exportdir}/azure_usage.csv"

    Define the headers

    This section refers to add_headers as the action, but either add_header or add_headers may be used. Both variants work in an identical fashion.

csv add_headers label header1 [header2 ... headerN]

    All CSV files created by USE script must start with a header row which names the columns in the file. The number of columns can vary from file to file, but in any given file every data row must have the same number of columns as there are headers.

    To create one or more columns in a newly created CSV file, the csv add_headers statement is used as shown above. The label must match the label previously associated with the file as described previously.

    One or more header names can be specified as arguments to csv add_headers. Multiple instances of the csv add_headers statement may reference the same CSV file, as each statement will append additional headers to any headers already defined for the file.

    No checks are done to ensure the uniqueness of the headers. It is therefore up to the script author to ensure that all the specified headers in any given file are unique.

    Example

    csv add_headers usage username user_id subscription_id

    Finalise the headers

    This section refers to fix_headers as the action, but either fix_header or fix_headers may be used. Both variants work in an identical fashion

csv fix_headers label

After csv add_headers has been used to define at least one header, the headers are finalised using the csv fix_headers statement. Once the headers have been fixed, no further headers can be added to the file, and until the headers have been fixed, no data can be written to the file.

    Example

    csv fix_headers usage

    Write data

    This section refers to write_fields as the action, but either write_field or write_fields may be used. Both variants work in an identical fashion

csv write_fields label value1 [value2 ... valueN]

    After the headers have been fixed, the csv write_fields statement is used to write one or more fields of data to the CSV file. Currently it is not possible to write a blank field using csv write_fields, however when extracting data from a buffer using a parslet, if the extracted value is blank then it will automatically be expanded to the string (no value).

    USE keeps track of the rows and columns as they are populated using one or more csv write_fields statements, and will automatically write the fields from left to right starting at the first column in the first data row and will advance to the next row when the rightmost column has been written to.

    It is the responsibility of the script author to ensure that the number of fields written to a CSV file is such that when the file is closed, the last row is complete, in order to avoid malformed files with one or more fields missing from the last row.
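As a minimal sketch, continuing the usage example above (the field values are placeholders):

# Writes one complete row of three fields, matching the three headers
# defined earlier (username, user_id and subscription_id)
csv write_fields usage jdoe 1001 sub-0001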

    Example

    Close the file

csv close label

    Once all fields have been written to a CSV file, it must be closed using the csv close statement. This will ensure that all data is properly flushed to disk, and will free the label for re-use.

    Example

    csv close usage

    Example

    Consider the file "\examples\json\customers.json" representing two customers:

    Using a combination of foreach loops and parslets, the information in the above JSON can be converted to CSV format as follows:

    The resulting CSV file is as follows:

buffer

    Details

    The first argument to the buffer statement is the name of the buffer to create. If a buffer with this name already exists then any data it contains will be overwritten.

    There must be whitespace on both sides of the 'equals' symbol following the buffer name.

    The following protocols are supported:

    file

buffer buffername = file filename

The file protocol imports a file directly into a buffer. This can be very useful when developing USE scripts, as the USE script for processing a JSON file (for example) can be implemented without requiring access to a server.

    If the specified buffer name already exists, then a warning will be logged and any data in it will be cleared before importing the file.
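For example, a minimal sketch (the filename shown is illustrative only):

buffer mydata = file "system/extracted/example.json"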

    data

buffer buffername = data string

    The data protocol populates the buffer with the literal text specified in string. This is useful when extracting embedded JSON. For example the JSON snippet below contains embedded JSON in the instanceData field:

    In this case the instanceData field can be extracted using a parslet, placed into a new buffer and re-parsed to extract the values within it. Assuming the snippet is in a file called my_data.json this would be done as follows:

    http

buffer buffername = http method url

Note: for full details on the HTTP protocol and its parameters please refer to the http article.

Once the HTTP request has been executed, any data it returned will be contained in the named buffer, even if the data is binary in format (e.g. images, audio files or any other non-human-readable content).

    If the HTTP request returned no data, one of the following will apply:

    • If the buffer does not already exist then the buffer will not be created

    • If the buffer already exists then it will be deleted altogether

    For details of how to access the data in a named buffer, please refer to the USE script basics article.

    odbc

buffer buffername = odbc dsn [username password] query

username and password are optional, but either both or neither must be specified

    where:

    • dsn is the ODBC Data Source Name (this should be configured at the OS level)

    • username and password are the credentials required by the DSN

    • query is an SQL query

    Once the query has been executed, the resulting data is located in the named buffer. It can subsequently be saved as a CSV file to disk using:

    save {buffername} as filename.csv

    The resulting CSV uses a comma (,) as the separator and double quotes (") as the quoting character. Any fields in the data which contain a comma will be quoted.

    odbc_direct

buffer buffername = odbc_direct query

    where query is an SQL query.

Executes the SQL query against the ODBC data source described by the odbc_connect parameter of the set statement.

    Once the query has been executed, the resulting data is located in the named buffer. It can subsequently be saved as a CSV file to disk using:

    save {buffername} as filename.csv

    The resulting CSV uses a comma (,) as the separator and double quotes (") as the quoting character. Any fields in the data which contain a comma will be quoted.
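As an illustration, the following is a minimal sketch of using odbc_direct; the connection string, table name and output filename are placeholders and should be replaced with values appropriate to your environment:

set odbc_connect "DRIVER=SQL Server;SERVER=dbhost;Database=usagedb;UID=username;PWD=password"
buffer direct_csv = odbc_direct "select * from usage"
save {direct_csv} as "odbc_direct.csv"
discard {direct_csv}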

    Examples

    The following examples retrieve data from ODBC and HTTP sources:

copy

Overview

    The copy statement is used to copy rows from one DSET to another

    Syntax

copy rows to dset.id

move rows to dset.id

    Details

    Both copy and move must be used within the body of a where statement. Only rows that match the expression will be copied (or moved).

    • The DSET from which rows will be copied or moved is automatically determined from the expression used by the where statement.

    • The DSET to which rows will be copied or moved is determined by the dset.id parameter

    The source and destination DSETs must be different (it is not possible to copy or move a row within the same DSET).

    The destination DSET may or may not exist. If it does not exist then it will be created. If it does exist then the following logic is applied:

    • If the destination DSET has more columns than the source DSET then the new rows in the destination DSET will have blank values in the rightmost columns

• If the destination DSET has fewer columns than the source DSET then the destination DSET will be extended with enough new columns to accommodate the new rows. In this case, existing rows in the destination DSET will have blank values in the rightmost columns

If the destination DSET is extended to accommodate the source rows then the new (rightmost) columns will have the same names as the equivalent columns in the source DSET. In the event that this would cause a naming conflict with existing columns in the destination DSET, one or more new columns in the destination DSET will have a suffix added to their name to ensure uniqueness. This suffix takes the form _N where N is a number starting at 2.

    To illustrate this, if the source DSET has columns called subscription,name,address,hostname and the destination DSET has a single column called name then the resulting extended destination DSET would have columns called name,subscription,name_2,address,hostname.

Example

# Move all rows in usage.data where hostname is "test server"
# to a new DSET called test.servers

where ([usage.data.hostname] == "test server") {
    move rows to test.servers
}

    var hash_me = "This is the data to hash"
    var my_secret = "This is my secret key"
    
    # SHA256
    hash sha256 hash_me as result
    print The SHA256 hash of '${hash_me}' in base-16 is:
    print ${result}${NEWLINE}
    
    hash sha256 hash_me as result b64
    print The SHA256 hash of '${hash_me}' in base-64 is:
    print ${result}${NEWLINE}
    
    # HMACSHA256
    hash sha256 hmac ${my_secret} hash_me as result
    print The HMACSHA256 hash of '${hash_me}' (using '${my_secret}') in base-16 is:
    print ${result}${NEWLINE}
    
    hash sha256 hmac ${my_secret} hash_me as result b64
    print The HMACSHA256 hash of '${hash_me}' (using '${my_secret}') in base-64 is:
    print ${result}${NEWLINE}
    The SHA256 hash of 'This is the data to hash' in base-16 is:
    1702c37675c14d0ea99b7c23ec29c36286d1769a9f65212218d4380534a53a7a
    
    The SHA256 hash of 'This is the data to hash' in base-64 is:
    FwLDdnXBTQ6pm3wj7CnDYobRdpqfZSEiGNQ4BTSlOno=
    
    The HMACSHA256 hash of 'This is the data to hash' (using 'This is my secret key') in base-16 is:
    cf854e99094ea5c2a88ee0901a305d5f25dfb5a0f0905eec703618080567b4b5
    
    The HMACSHA256 hash of 'This is the data to hash' (using 'This is my secret key') in base-64 is:
    z4VOmQlOpcKojuCQGjBdXyXftaDwkF7scDYYCAVntLU=
    match varsearch "[Cc]onnection: (.*)" ${variable}
    if (${varsearch.STATUS} = MATCH) {
        print Connection string is: ${varsearch.RESULT}
    } else {
        print No match found
    }
    match error_check "([Ee]rror)" {text_data}
    if (${error_check.STATUS} == MATCH) {
        print Found: ${error_check.RESULT}
    } else {
        print No error was found
    }
    {
        "totalCount" : 2,
        "items" : [
            {
                "name" : "Item number one",
                "id" : "12345678"
            },
            {
                "name" : "Item number two",
                "id" : "ABCDEFGH"
            }
        ]
    }
    buffer data = file "samples/json/array.json"
    
    foreach $JSON{data}.[items] as this_item
    {
        print Customer ${this_item.COUNT}: $JSON(this_item).[id] $JSON(this_item).[name]
    }
    
    discard {data}
    var JSON_dir = "examples\json"
    buffer example = FILE "${JSON_dir}\doc.json"
    
    var title = $JSON{example}.[title]
    
    # For every element in the 'items' array ...
    foreach $JSON{example}.[items] as this_item
    {
        var item_name = $JSON(this_item).[name]
    
        # For every child of the 'subvalues' object ...
        foreach $JSON(this_item).[subvalues] as this_subvalue
        {
            var sub_name = ${this_subvalue.NAME}
            var sub_value = ${this_subvalue.VALUE}
    
            # Render an output line
            print ${title} -> Item: ${item_name} -> Subvalue:${sub_name} = ${sub_value} 
        }
    }
    discard {example}
    Example JSON data -> Item: Item number one -> Subvalue:0 = 1
    Example JSON data -> Item: Item number one -> Subvalue:10 = 42
    Example JSON data -> Item: Item number one -> Subvalue:100 = 73
    Example JSON data -> Item: Item number one -> Subvalue:1000 = 100
    Example JSON data -> Item: Item number two -> Subvalue:0 = 10
    Example JSON data -> Item: Item number two -> Subvalue:10 = 442
    Example JSON data -> Item: Item number two -> Subvalue:100 = 783
    Example JSON data -> Item: Item number two -> Subvalue:1000 = 1009
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 - _ . ~
    : / ? # [ ] @ ! $ & ' ( ) * + , ; =
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 - _ . ~
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    a b c d e f g h i j k l m n o p q r s t u v w x y z
    0 1 2 3 4 5 6 7 8 9 - _ . ~ /
    var name = "example/name"
var domain = "example@domain.com"
    
    uri encode-component name
    uri encode-component domain
    
var URL = "http://server.com/resource?name=${name}&domain=${domain}"
    print URL is now: ${URL}
    URL is now: http://server.com/resource?name=example%2Fname&domain=example%40domain.com
    PayerAccountName,TaxationAddress,ProductCode,ProductName,ItemDescription
    John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from EU (Ireland)
    John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AmazonS3,Amazon Simple Storage Service,$0.0245 per GB - first 50 TB / month of storage used
    John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US West (Northern California)
    John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier
    John Doe,"123 Big St, Largetown, RH15 0HZ, United Kingdom",AWSDataTransfer,AWS Data Transfer,$0.00 per GB - EU (Germany) data transfer from US East (Northern Virginia)
    import "system\extracted\example.csv" source test alias data
    replace "John Doe" in PayerAccountName with "Finance Dept"
    replace "RH15 0HZ, " in TaxationAddress
    replace "United Kingdom" in TaxationAddress with UK
    replace "AWS" in ProductCode
    
    where ([ProductCode] == "AmazonS3") {
        replace "Amazon" in ProductCode
    }
    
    replace "Amazon " in ProductName
    replace "AWS " in ProductName
    replace "transfer from" in ItemDescription with -
    "PayerAccountName","TaxationAddress","ProductCode","ProductName","ItemDescription"
    "Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - EU (Ireland)"
    "Finance Dept","123 Big St, Largetown, UK","S3","Simple Storage Service","$0.0245 per GB - first 50 TB / month of storage used"
    "Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US West (Northern California)"
    "Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.090 per GB - first 10 TB / month data transfer out beyond the global free tier"
    "Finance Dept","123 Big St, Largetown, UK","DataTransfer","Data Transfer","$0.00 per GB - EU (Germany) data - US East (Northern Virginia)"
    # ---- Start Config ----
    encrypt var username = admin
    encrypt var password = topsecret
    var server = "http://localhost"
    var port = 8080
    var api_method = getdetails
    # ---- End Config ----
    
    set http_authtype basic
    set http_username ${username}
    set http_password ${password}
    
    buffer {response} = http GET ${server}:${port}/rest/v2/${api_method}
    # ---- Start Config ----
    encrypted var username = AGF5dU0KJaB+NyHWu2lkhw==
    encrypted var password = b0Sa29tyL+M8wix/+JokjMCdeMwiY9n5
    var server = "http://localhost"
    var port = 8080
    var api_method = getdetails
    # ---- End Config ----
    
    set http_authtype basic
    set http_username ${username}
    set http_password ${password}
    
    buffer {response} = http GET ${server}:${port}/rest/v2/${api_method}
    round Quantity                       # Round up to nearest integer
    round Quantity down                  # Round down to nearest integer
    round Quantity up to nearest 4       # Round up to next multiple of 4
    round Quantity down to nearest 2     # Round down to next lowest even number
    
    round Quantity up to nearest 0.5     # ie: 2.25 -> 2.5
    round Quantity down to nearest 0.1   # ie: 23.34567 -> 23.3
    round Quantity down to nearest 0.01  # ie: 23.34567 -> 23.34
    
    where ( ([quantity] > 5) && ([quantity] <= 100) ) {
        round quantity up to nearest 10       # Force consumption in blocks of 10
    }
    
    where ([quantity] > 100) {
           round quantity up to nearest 5     # Better deals for larger consumption
    }
    import "system/extracted/csp_usage.csv" source test alias data
    
    # Invert all numerical values in column 'quantity'
    normalise column quantity as invert
        {
          "totalCount": 2,
          "items": [
            {
              "id": "1234-4567",
              "companyProfile": {
                "tenantId": "xyz-abc",
                "domain": "example.domain.com",
                "companyName": "Example, Inc"
              }
            },
            {
              "id": "9876-6543",
              "companyProfile": {
                "tenantId": "stu-vwx",
                "domain": "another.domain.com",
                "companyName": "A Company, Inc"
              }
            }
          ]
        }
            # Load the file into a named buffer
            buffer customers = FILE "${baseDir}\examples\json\customers.json"
    
            # Create an export file
            csv "customers" = "${baseDir}\exported\customers.csv"
    
            # Initialise and fix the headers (using two 'add_headers' statements for illustration)
            csv add_headers "customers" id tenant_id 
            csv add_headers "customers" domain company_name
            csv fix_headers "customers"
    
            # Iterate over the 'items' array in the JSON
            foreach $JSON{customers}.[items] as this_item
            {
                csv write_field "customers" $JSON(this_item).[id]
                csv write_field "customers" $JSON(this_item).[companyProfile].[tenantId]
                csv write_field "customers" $JSON(this_item).[companyProfile].[domain]
                csv write_field "customers" $JSON(this_item).[companyProfile].[companyName]
            }
    
            # Tidy up
            csv close "customers"
            discard {customers}
        "id","tenant_id","domain","company_name"
        "1234-4567","xyz-abc","example.domain.com","Example, Inc"
        "9876-6543","stu-vwx","another.domain.com","A Company, Inc"
    "properties": {
    "subscriptionId":"sub1.1",
    "usageStartTime": "2015-03-03T00:00:00+00:00",
    "usageEndTime": "2015-03-04T00:00:00+00:00",
    "instanceData":"{\"Microsoft.Resources\":{\"resourceUri\":\"resourceUri1\",\"location\":\"Alaska\",\"tags\":null,\"additionalInfo\":null}}",
    "quantity":2.4000000000,
    "meterId":"meterID1"
    
    }
buffer my_data = file my_data.json
var instanceData = $JSON{my_data}.[properties].[instanceData]
    
    buffer embedded = data ${instanceData}
    print The embedded resourceUri is $JSON{embedded}.[Microsoft.Resources].[resourceUri]
    # Typical usage in USE script to retrieve all data from the usage table
    
    buffer odbc_csv = odbc ExivityDB admin secret "select * from usage"
    save {odbc_csv} as "odbc.csv"
    discard {odbc_csv}
    
    # Retrieve the service summary from a local CloudCruiser 4 server and place it in a buffer
    set http_username admin
    set http_password admin
    set http_authtype basic
    
    buffer services = http GET "http://localhost:8080/rest/v2/serviceCatalog/summaries"
    # The 'services' buffer now contains the HTTP response data

Provide a meaningful name for your USE Extractor. In the above example we're creating a USE Extractor for VMware vCenter 6.5 and higher. Therefore we call this USE Extractor 'vCenter 6.5'.

• When you're done creating your USE Extractor, click the 'Insert' button at the bottom of the screen

  • When clicking 'Save' at the bottom right of this screen, you can save your changes to this USE Extractor
  • If you want to make more fundamental changes to this USE Extractor, you may click on the 'Editor' tab just next to the 'Variables' tab.

• In the 'Editor' screen, you can make more advanced changes to your script, such as:

  • changing existing API calls

  • changing the CSV output format

  • providing usernames / passwords that require encryption

• In case you want to save your changes, click the 'Save' button at the bottom of the 'Editor' screen. To delete this USE Extractor, click the 'Remove' button, after which you'll receive a confirmation pop-up where you'll have to click 'OK'.

• After the USE Extractor has completed running, you will receive a success or failure message, after which you might need to make additional changes to your USE Extractor.
  • Once you're happy with your output, you can schedule the USE Extractor via the 'Schedule' tab, which is located next to the 'Run' tab at the top of the screen.

• USE Extractors can be scheduled to run once a day at a specific time. You should also provide a from date and (optionally) a to date, which are specified using an offset value. For example, if you want to use the day before yesterday as the from date, use the down-pointing arrows on the right to select a value of -2. If the to date should always correspond with yesterday's date, provide a value of -1 there.

• If your USE Extractor requires additional parameters, you may provide these as well in the 'Schedule with these arguments' text field.

  • When you're done with the schedule configuration, you may click the 'Schedule' button. In case you want to change or remove this schedule afterwards, click the 'Unschedule' button.


In Choose Action select Launch through EC2 and click on Launch to access the Deployment Wizard.

Log in to the Exivity user interface, then follow this procedure:
    Creating Groups
    1. Go to 'Administration' > 'Groups', then click 'Add Group'

    2. Provide a meaningful Name for this Group

    3. Select one or multiple permissions from the custom list of 'Role Names' below

    4. When you're done with your selection, click the 'Save' button to create your group. Now you can create Users and associate them to this group.


Role Name | Description
View Reports | Provides access to all elements of the Reports menu
View Cogs | Enabling this option provides access to the Cost of Goods rates and charges
View Logs | Provides access to the Administration > Log Viewer
View Audit | Provides access to the Administration > Audit Trail
Manage reports | Allows the creation and deletion of Report Definitions
Manage accounts | Provides access to the Accounts menu section
Manage catalogue | Enables read and write access to the entire Catalogue menu. This includes adding and changing of Rates and Adjustments
Manage data sources | Allows creating, editing and deleting Extractors and Transformers
Manage users | Allows creating, editing and deleting all internal users and Groups
Manage configuration | Allows editing of all elements in the Administration > Configuration menu
Manage system | Provides access to the Administration > System Info menu. This includes updating of the installed license file.
Upload files | Allows uploading usage and lookup data through the Exivity API (see api.exivity.com)

    Creating Users

    To create a User that will only have access to a certain account or a selection of several accounts, use the following procedure to create a user with special account level permissions:

    Creating Users
    1. Go to 'Administration' > 'Users', then click 'Add User'

    2. Provide a login name in the 'Username' field, a valid 'E-mail' address, and a 'Password' of at least 8 characters

3. From the 'Group' dropdown box, select the 'Group' you want to associate with this user

    4. Next, click the 'Account Access' tab, and select the 'Access Type' for this user:

      • Grant access to all accounts - to provide access to any of the available accounts in the system

      • Grant access to only specific accounts - to only provide access to usage and charge data of specific accounts

5. If you've selected Grant access to only specific accounts, then select from the Account Access list each Account Name that this user should have access to. You may select a top-level account, or an account further down in the hierarchy. Account inheritance applies, meaning lower-level accounts are automatically included when selecting a top-level account. If there are multiple Report Definitions, select the corresponding Definition from the drop-down list first, before selecting any accounts.

    6. When all fields have been filled in, you may create the user by clicking the 'Save' button.


    timecolumns

    Overview

    The timecolumns statement is used to set the start time and end time columns in a DSET

    Syntax

timecolumns start_time_col end_time_col

    timecolumns clear

    Details

    The usage data stored in a DSET may or may not be time sensitive. By default it is not, and every record is treated as representing usage for the entire day. In many cases however, the usage data contains start and end times for each record which define the exact time period within the day that the record is valid for.

    If the usage data contains start and end times that are required for functions such as aggregation or reporting, the column(s) containing those times need to be marked such that they can be identified further on in the processing pipeline. This marking is done using the timecolumns statement.

    The timecolumns statement does not perform any validation of the values in either of the columns it is flagging. This is by design, as it may be that the values in the columns will be updated by subsequent statements.

The values in the columns will be validated by the finish statement.

    If the timecolumns statement is executed more than once, then only the columns named by the latest execution of the statement will be flagged. It is not possible to have more than one start time and one end time column.

    Both the start_time_col and end_time_col parameters may be fully qualified column names, but they must both belong to the same DSET.

    It is possible to use the same column as both the start and end times. In such cases the usage record is treated as spanning 1 second of time. To do this, simply reference it twice in the statement:

    Clearing the flagged timestamp columns

    To clear both the start and end time columns, thus restoring the default DSET to treating each record as spanning the entire day, the statement timecolumns clear may be used.

Currently the statement timecolumns clear will only clear the timestamp columns in the default DSET.

    This can be useful in the following use case:

• The DSET is loaded and timestamp columns are created

• finish is used to create a time-sensitive RDF

• The timestamp columns are cleared

• The DSET is renamed using the rename dset statement

• Further processing is done on the DSET as required

• finish is used to create a second RDF which is not time-sensitive

    Example

    correlate

    Overview

    The correlate statement is used to enrich the default DSET by adding new columns to it, and/or updating existing columns with useful values. The new column names are derived from other DSETs and the values in those columns are set using a lookup function based on the value in a key column shared between the DSETs.

    Script basics

    USE scripts are stored in <basedir>/system/config/use, and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported but in certain circumstances they may be automatically converted to UNIX end-of-line format.

    Statements

Each statement in a USE script must be contained on a single line. Statements consist of a keyword followed by zero or more parameters separated by whitespace. The USE Reference Guide contains documentation for each statement.

    subroutine

    The subroutine keyword is used to define a named subroutine

    Syntax

    Details

    convert

    Overview

    The convert statement is used to convert values in a column from base-10 to base-16 or vice-versa.

    Program directory

    The main program directory as it should be installed by the Exivity installer:

    root
    ├─── bin                        Backend binaries
    |    ├─── exivityd.exe
    |    ├─── eternity.exe
    |    ├─── edify.exe
    |    ├─── transcript.exe
    |    └─── use.exe
    ├─── server                     Frontend dependencies
    |    ├─── nginx
    |    ├─── php
    |    └─── redis
    ├─── web                        Compiled frontend repositories
    |    ├─── glass
    |    └─── proximity
    ├─── *.bat
    └─── uninstall.exe
    Quotes and escapes

    By default, a space, tab or newline will mark the end of a word in a USE script. To include whitespace in a word (for example to create a variable with a space in it) then double quotes - " - or an escape - \ - must be used to prevent the parser from interpreting the space as an end of word marker. Unless within double quotes, to specify a literal tab or space character it must be escaped by preceding it with a backslash character - \.

    Examples:

    The following table summarises the behaviour:

Characters | Meaning
" ... " | Anything inside the quotes, except for a newline, is treated as literal text
\" | Whether within quotes or not, this is expanded to a double quote - " - character
\t | When used outside quotes, this is expanded to a TAB character
\ | When used outside quotes, a space following the \ is treated as a literal character
\\ | When used outside quotes, this is expanded to a backslash - \ - character

    Comments

    Comments in a USE script start with a # character that is either of

    • the first character of a line

    • the first character in a word

    Comments always end at the end of the line they were started on

    Currently, comments should not be used on the same line as the encrypt statement as it will consider the comment as part of the value to encrypt

    Variables

    Overview

    USE scripts often make use of variables. Variables have a name and a value. When a variable name is encountered on any given line during execution of the script, the name is replaced with the value before the line is executed.

    To reference a variable, the name should be preceded with ${ and followed by }. For example to access the value of a variable called username, it should be written as ${username}.

    The length (in characters) of a variable can be determined by appending .LENGTH to the variable name when referencing it. Thus if a variable called result has a value of success then ${result.LENGTH} will be replaced with 7.
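As a brief illustration (the variable name and value are chosen arbitrarily):

var result = "success"
print The variable 'result' contains ${result}
print The length of 'result' is ${result.LENGTH}

The second print statement would output a length of 7.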

    Creation

    Variables may be explicitly declared using the var statement, or may be automatically created as a consequence of actions performed in the script. Additionally, a number of variables are automatically created before a script is executed.

    For a list of variables created automatically please consult the article on the var statement

    Encryption

    It may be desirable to conceal the value of some variables (such as passwords) rather than have them represented as plain text in a USE script. This can be accomplished via the encrypt statement.

    Publishing to the user interface

    Variables may be exposed in the GUI by prefixing their declaration with the word public as follows:

    Any variable so marked may be edited using a form in the GUI before the script is executed. If a public variable is followed by a comment on the same line, then the GUI will display that comment for reference. If there is no comment on the same line, then the line before the variable declaration is checked, and if it starts with a comment then this is used. Both variants are shown in the example below:

    If a variable declaration has both kinds of comment associated with it then the comment on the same line as the variable declaration will be used

    Named buffers

    A named buffer (also termed a response buffer) contains data retrieved from an external source, such as an HTTP or ODBC request. Buffers are created with the buffer statement.

    Once created, a buffer can be referenced by enclosing its name in { and } as follows:

    • Buffer names may be up to 31 characters in length

    • Up to 128 buffers may exist simultaneously

• Up to 2 GB of data can be stored in any given buffer (memory permitting)

    Extracting data with Parslets

Parslets are used to extract data from the contents of a named buffer.

    Please refer to the full article on parslets for more information on parslets and their use.

    USE Reference Guide
    Overview

    A subroutine is a named section of code that can be executed multiple times on demand from anywhere in the script. When called (via the gosub statement), execution of the script jumps to the start of the specified subroutine. When the end of the code in the subroutine body is reached or a return statement is encountered (whichever comes first), execution resumes at the statement following the most recent gosub statement that was executed.

    The code in the body of a subroutine statement is never executed unless the subroutine is explicitly called using gosub. If a subroutine is encountered during normal linear execution of the script then the code in it will be ignored.

    Subroutines in USE do not return any values, but any variables that are set within the subroutine can be accessed from anywhere in the script and as such they should be used for returning values as needed.

    Subroutine Arguments

    When invoked via the gosub statement, arguments can be passed to the subroutine. These arguments are read-only but may be copied to normal variables if required.

    Arguments are accessed using the same syntax as is used for variables as follows:

    ${SUBARG.COUNT} contains the number of arguments that were passed to the subroutine

    ${SUBARG_N} is the value of any given argument, where N is the number of the argument starting at 1

    Every time a subroutine is called, any number of arguments may be passed to it. These arguments are local to the subroutine and will be destroyed when the subroutine returns. However, copying an argument to a standard variable will preserve the original value as follows:

    After the subroutine above has been executed the return_value variable will retain the value it was set to.

    It is not permitted to nest subroutine statements. If used within the body of a subroutine statement, a subroutine statement will cause the script to terminate with an error.

    Example

    The following demonstrates using a subroutine to detect when another subroutine has been provided with an incorrect number of arguments:

    Syntax

convert colName to decimal|hex from decimal|hex

The keywords dec and decimal, and the keywords hex and hexadecimal, are equivalent.

    Details

    When converting values in a column, the following considerations apply:

    • Values in the column are replaced with the converted values

    • The colName argument must reference an existing column, and may optionally be fully qualified (else the column is assumed to be in the default DSET)

    • If any values in the column are not valid numbers, they will be treated as 0

    • Blank values are ignored

    • The convert statement may be used in the body of a where statement

    • If a value in colName contains a partially correct value such as 123xyz then it will be treated as a number up to the first invalid character, in this case resulting in a value of 123.

    • The hex digits in the original value can be either upper or lower case

    • The hex digits from A-F will be rendered in upper case in the converted output

• The convert statement only supports integer values (floating point values will be rounded down to the nearest integer)

    Example

    timecolumns timestamp_col timestamp_col
    
    # Read data from file into a DSET called usage.data
    import system/extracted/usage_data.csv source usage alias data
    
    # Create two UNIX-format timestamp columns from the columns
    # usageStartTime and usageEndTime, each of which records a time
    # in the format '2017-05-17T17:00:00-07:00'
    var template = YYYY.MM.DD.hh.mm.ss
    timestamp START_TIME using usageStartTime template ${template}
    timestamp END_TIME using usageEndTime template ${template}
    
    # Flag the two columns we just created as being the start and
    # end time columns
    timecolumns START_TIME END_TIME
    
    # Create the time sensitive DSET
    finish
    "This quoted string is treated as a single word"  
    var myname = "Eddy Deegan"  
    This\ is\ treated\ as\ a\ single\ word
    "The character \" is used for quoting"
    # This is a comment
    set http_header "Content-Type: application/x-www-form-urlencoded"     # This is a comment
    var usage#1 = Usage1   # The '#' in 'usage#1' does not start a comment
    public var username = username
    public encrypt var password = something_secret
    public var username = login_user  # Set this to your username
    # Set this to your password
    public var password = "<please fill this in>"
    # Example of buffer creation
    buffer token = http POST "https://login.windows.net/acme/oauth2/token"
    
    # Examples of referencing a buffer
    save {token} as "extracted\token.data"
    discard {token}
    subroutine subroutine_name {
       # Statements
    }
    subroutine example {
        if (${SUBARG.COUNT} == 0) {
            var return_value = "NULL"
        } else {
            var return_value = ${SUBARG_1}
        }
    }
    if (${ARGC} == 0) {
        print This script requires a yyyyMMdd parameter
        terminate with error
    } 
    
    # Ensure the parameter is an 8 digit number
    gosub check_date(${ARG_1})
    #
    # (script to make use of the argument goes here)
    #
    terminate
    
    # ----
    #     This subroutine checks that its argument
    #     is an 8 digit decimal number
    # ----
    subroutine check_date {
        # Ensure this subroutine was called with one argument
        gosub check_subargs("check_date", ${SUBARG.COUNT}, 1)
    
        # Validate the format
        match date "^([0-9]{8})$" ${SUBARG_1}
        if (${date.STATUS} != MATCH) {
            print Error: the provided argument is not in yyyyMMdd format
            terminate with error
        }
    }
    
    # ----
    #     This subroutine generates an error message for
    #     other subroutines if they do not have the correct
    #     number of arguments
    #
    #     It is provided as a useful method for detecting internal
    #     script errors whereby a subroutine is called with the
    #     wrong number of arguments
    #
    #     Parameters:
    #        1: The name of the calling subroutine
    #        2: The number of arguments provided
    #        3: The minimum number of arguments permitted
    #        4: OPTIONAL: The maximum number of arguments permitted
    # ----
    subroutine check_subargs {
        # A check specific to this subroutine as it can't sanely call itself
        if ( (${SUBARG.COUNT} < 3) || (${SUBARG.COUNT} > 4) ) {
            print Error: check_subargs() requires 3 or 4 arguments but got ${SUBARG.COUNT}
            terminate with error
        }
    
        # A generic check
        var SCS_arg_count = ${SUBARG_2}
        var SCS_min_args = ${SUBARG_3}
        if (${SUBARG.COUNT} == 3) {
           var SCS_max_args = ${SUBARG_3}
        } else {
           var SCS_max_args = ${SUBARG_4}
        }
    
        if ( (${SCS_arg_count} < ${SCS_min_args}) || (${SCS_arg_count} > ${SCS_max_args}) ) {
            if (${SCS_min_args} == ${SCS_max_args}) {
                print Error in script: the ${SUBARG_1}() subroutine requires ${SCS_min_args} arguments but was given ${SCS_arg_count}
            } else {
                print Error in script: the ${SUBARG_1}() subroutine requires from ${SCS_min_args} to ${SCS_max_args} arguments but was given ${SCS_arg_count}
            }
            terminate with error
        }
    }
    
    convert decimal_count from decimal to hex
    convert unique_id from hexadecimal to dec
    Syntax

correlate ColName1 [ ... ColNameN] using KeyColumn [assuming assumeDSET] [default DefaultValue]

    Details

    The ColName1 ... ColNameN arguments are column names that will be copied from their original DSETs and merged into the default DSET.

    Column names must be fully qualified, unless the assuming parameter is used, in which case any column names that are not fully-qualified will be assumed to belong to the DSET specified by assumeDSET.

    Source and Destination columns

    Source columns are those from which a cell is to be copied when the KeyColumn matches. Destination columns are columns in the default DSET into which a cell will be copied. Destination column names are derived from the names of the source columns as follows:

    • The source column is the argument in its original form, for example: Azure.usage.MeterName

    • The destination column is the same argument, but with the DSET ID replaced with that of the default DSET. For example if the default DSET is Custom.Services then the destination column for the above would be Custom.Services.MeterName.

    If a destination column name doesn't exist in the default DSET then a new column with that name will automatically be created.

    The Key Column

    The KeyColumn argument is a column name which must not be fully qualified and which must exist in the default DSET and all of the DSETs referenced by the ColNameN arguments.

    Default values

    The DefaultValue argument, if present, specifies the value to write into the destination column if there is no match for the KeyColumn. If the DefaultValue argument is not specified then any rows where there is no match will result in a blank cell in the destination column.

    For each row in the default DSET, the source DSET is searched for a matching KeyColumn value, and if a match is found then the value in the source column is used to update the default DSET. The row of the first match found in the source DSET will be used.

    Overwriting

    When matching the KeyColumn values, the logic in the following table is evaluated against every row in the destination DSET.

    ✘ means no or disabled, ✔ means yes or enabled

Match Found | Overwrite | Default Value | Result
✘ | ✘ | ✘ | No values will be updated
✔ | ✘ | ✘ | Empty destination column cells will be updated
✘ | ✘ | ✔ | Empty destination column cells will be set to the default value
✔ | ✘ | ✔ | Empty destination column cells will be set to the matched source column value
✘ | ✔ | ✘ | No values will be updated
✔ | ✔ | ✘ | Destination column cells will be updated
✘ | ✔ | ✔ | Destination column cells will be set to the default value
✔ | ✔ | ✔ | Destination column cells will be set to the matched source column value

    Examples

    Given two Datasets as follows, where the default DSET is MyData.Owners:

    Dataset 'MyData.Owners'

    Dataset 'Custom.Services'

    The statement: correlate service description using id assuming Custom.Services

    Will enrich the MyData.Owners Dataset such that it contains:

    The statement: correlate service description using id assuming Custom.Services default unknown

    Will produce:


    split

    Overview

    The split statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.

    Syntax

split ColName using sep

split ColName using sep retaining first [column_count]

split ColName using sep retaining last [column_count]

split ColName using sep retaining first_column_index [to last_column_index]

    The keyword following ColName may be using, separator or delimiter. All three work in exactly the same way.

    Details

    The ColName argument is the name of an existing column whose values are to be split and the sep argument (which must only be a single character in length) is the delimiter by which to split them.

    For example a value one:two:three:four, if divided by a sep character of :, would result in the fields one, two, three and four.

To specify a sep value of a space, use " " or \<space> (where <space> is a literal space character)

    New columns are created if necessary to contain the values resulting from the split. These additional columns are named ColName_split1 through ColName_splitN where N is the number of values resulting from the process.

If there is an existing column with a name conflicting with any of the columns that split creates then:

• If the overwrite option is set then all values in that pre-existing column will be overwritten

• If the overwrite option is not set:

  • If there are blank values in the existing column they will be updated

  • If there are no blank values in the existing column then no changes will be made to it

    Keeping only specific fields

If the retaining keyword is specified, split will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:

Specification | Result
first or 1 | Discard all but the first column
first N or 1 to N | Discard all but the first N columns
last | Discard all but the last (rightmost) column
last N | Discard all but the last N columns
N | Discard all but the Nth column
N to M | Discard all but the columns from the Nth to the Mth inclusive

In the table above, N refers to a result column's number, where the first column created by split has a number of 1

    Example

    Given an input dataset of the form:

    The statement ...

    split ID using :

    ... will result in the dataset:

    Using the same original dataset, the statement ...

    split ID using : retaining 3 to 5

    ... will result in the dataset:

    set

The set statement is used to configure a setting for use by subsequent http or buffer statements.

    Syntax

set setting value

    Details

A protocol such as http offers a number of configuration options. Any given option is either persistent or transient:

Type | Meaning
Persistent | The setting remains active indefinitely and will be re-used over successive HTTP calls
Transient | The setting only applies to a single HTTP call, after which it is automatically reset

The following settings can be configured using set:

    http_progress

    set http_progress yes|no

Persistent. If set to yes then dots will be sent to standard output to indicate that data is downloading when an HTTP session is in progress. When downloading large files, a lengthy delay with no output may be undesirable; the dots indicate that the session is still active.
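For example, to show progress dots while a large download is in progress:

set http_progress yes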

    http_username

set http_username username

    Persistent. Specifies the username to be used to authenticate the session if the http_authtype setting is set to anything other than none. If the username contains any spaces then it should be enclosed in double quotes.

    http_password

set http_password password

    Persistent. Specifies the password to be used to authenticate the session if the http_authtype setting is set to anything other than none. If the password contains any spaces then it should be enclosed in double quotes.

    http_authtype

set http_authtype type

Persistent. Specifies the type of authentication required when initiating a new connection. The type parameter can be any of the following:

Value | Meaning
none (default) | no authentication is required or should be used
basic | use basic authentication
ntlm | use NTLM authentication
passport | use passport authentication
digest | use digest authentication
negotiate | automatically selects between NTLM and Kerberos authentication

    http_authtarget

set http_authtarget target

    Persistent. Specifies whether any authentication configured using the http_authtype setting should be performed against a proxy or the hostname specified in the URL.

    Valid values for target are:

    • server (default) - authenticate against a hostname directly

    • proxy - authenticate against the proxy configured at the Operating System level
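As an illustrative sketch, the following configures NTLM authentication against the OS-configured proxy; the credential variables are assumed to have been defined earlier in the script and the URL is a placeholder:

set http_authtype ntlm
set http_authtarget proxy
set http_username ${username}
set http_password ${password}
buffer response = http GET "http://internal.example.com/api/usage"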

    http_header

set http_header "name: value"

    Persistent. Used to specify a single HTTP header to be included in subsequent HTTP requests. If multiple headers are required, then multiple set http_header statements should be used.

    An HTTP header is a string of the form name: value.

    There must be a space between the colon at the end of the name and the value following it, so the header should be enclosed in quotes

    Example: set http_header "Accept: application/json"

Headers configured using set http_header will be used for all subsequent HTTP connections. If a different set of headers is required during the course of a USE script then the clear statement can be used to remove all the configured headers, after which set http_header can be used to set up the new values.

By default, no headers at all will be included with requests made by the http statement. For some cases this is acceptable, but often one or more headers need to be set in order for a request to be successful.

    Typically these will be an Accept: header for GET requests and an Accept: and a Content-Type: header for POST requests. However there is no hard and fast standard so the documentation for any API or other external endpoint that is being queried should be consulted in order to determine the correct headers to use in any specific scenario.
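For example, a script talking to a typical JSON-based API might configure both headers before making its requests (the exact headers required depend on the API being called):

set http_header "Accept: application/json"
set http_header "Content-Type: application/json"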

    Headers are not verified as sane until the next HTTP connection is made

    http_body

set http_body data string - use the specified string as the body of the request

set http_body file filename - send the specified file as the body of the request

set http_body {named_buffer} - send the contents of the named buffer as the body of the request

    Transient. By default no data other than the headers (if defined) is sent to the server when an HTTP request is made. The http_body setting is used to specify data that should be sent to the server in the body of the request.

    When using http_body a Content-Length: header will automatically be generated for the request. After the request this Content-Length: header is discarded (also automatically). This process does not affect any other defined HTTP headers.

    After the request has been made the http_body setting is re-initialised such that the next request will contain no body unless another set http_body statement is used.
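The following minimal sketch sends a literal JSON body with a POST request; the endpoint and body fields are illustrative only:

set http_header "Content-Type: application/json"
set http_body data "{ \"from\": \"20190101\", \"to\": \"20190131\" }"
buffer response = http POST "https://api.example.com/v1/usage"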

    http_savefile

set http_savefile filename

Transient. If set, any response returned by the server after the next HTTP request will be saved to the specified filename. This can be used in conjunction with the buffer statement, in which case the response will both be cached in the named buffer and saved to disk.

    If no response is received from the next request after using set http_savefile then the setting will be ignored and no file will be created.

    Regardless of whether the server sent a response or not after the HTTP request has completed, the http_savefile setting is re-initialised such that the next request will not cause the response to be saved unless another set http_savefile statement is used.

    No directories will be created automatically when saving a file, so if there is a pathname component in the specified filename, that path must exist.
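A minimal sketch, assuming the system/extracted/example directory already exists and using a placeholder URL:

set http_savefile "system/extracted/example/response.json"
buffer response = http GET "https://api.example.com/v1/status"
# The response is now cached in {response} and has also been written to the file above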

    http_savemode

set http_savemode mode

    Persistent.

    • If mode is overwrite (the default) then if the filename specified by the set http_savefile statement already exists it will be overwritten if the server returns any response data. If no response data is sent by the server, then the file will remain untouched.

    • If mode is append then if the filename specified by the set http_savefile statement already exists any data returned by the server will be appended to the end of the file.

    http_timeout

set http_timeout seconds

Persistent. After a connection has been made to a server it may take a while for a response to be received, especially on some older or slower APIs. By default, a timeout of 5 minutes (300 seconds) is applied before an error is generated.

    This timeout may be increased (or decreased) by specifying a new timeout limit in seconds, for example:

    The minimum allowable timeout is 1 second.

    odbc_connect

set odbc_connect connection_string

Persistent. Sets the ODBC connection string for use by the buffer statement's odbc_direct protocol. The connection string may reference an ODBC DSN or contain full connection details, in which case a DSN doesn't need to be created.

    A DSN connection string must contain a DSN attribute and optional UID and PWD attributes. A non-DSN connection string must contain a DRIVER attribute, followed by driver-specific attributes.

    Please refer to the documentation for the database to which you wish to connect to ensure that the connection string is well formed.

    An example connection string for Microsoft SQL Server is:

    export

    Overview

    The export statement is used to snapshot the data in a DSET and write it to disk as a Dataset.

    Syntax

export source.alias as filename

    Details

    The exported file will be created under <base_dir>/exported. The filename parameter may include a path as well as the filename to export, so long as it does not contain the substring ".." and is not an absolute path. If the path contains one or more directories that do not exist then they will be created.

    Exporting usage

The source.alias argument is the DSET ID to export (see the Core concepts article for more information on DSET IDs).

    The export statement can be used at any point during this process to save a CSV (conforming to Dataset format) which snapshots the data in the DSET at that moment in time. This can be used for a number of purposes including:

    • Examining the state of data part-way through processing for debugging purposes

• Creating custom exports for subsequent import into 3rd party systems

    • Producing output CSV files (which may potentially contain merged data from multiple sources) for subsequent processing with another Transcript

    Examples

When specifying a Windows path it is advisable to use UNIX-style forward slashes as the path delimiters, or to put the path+filename in double quotes to avoid occurrences of \t being interpreted as a TAB character

    The following Transcript will import a quoted CSV file, add a ProcessedFlag column to it with a value of 1 in each row, and save it back out again without quotes:

    Language

    This article links to detailed descriptions of all the statements supported by USE script.

These descriptions assume knowledge of the USE script basics.

    Statement reference

    Rates

The 'Rates' screen allows you to configure manual rates for services that do not have a rate provided with their data source. Before you can use this screen, it is required to create the necessary service(s) via the Transcript engine. When that requirement has been fulfilled, you may configure Global and customer-specific rate configurations. The following rate types are currently supported for automatic, daily and monthly services:

    • automatic per unit

    • automatic per interval

    owner,id
    John,100
    Tim,110
    Fokke,120
    Joost,130
    Jon,140
    service,description,id
    Small_VM,Webserver,130
    Medium_VM,App_Server,100
    Large_VM,DB_Server,110
    Medium_VM,Test_Server,120
    owner,id,service,description
    John,100,Medium_VM,App_Server
    Tim,110,Large_VM,DB_Server
    Fokke,120,Medium_VM,Test_Server
    Joost,130,Small_VM,Web_Server
    Jon,140,,
    owner,id,service,description
    John,100,Medium_VM,App_Server
    Tim,110,Large_VM,DB_Server
    Fokke,120,Medium_VM,Test_Server
    Joost,130,Small_VM,Web_Server
    Jon,140,unknown,unknown

    Name,ID
    VM-One,sales:2293365:37
    VM-Two,marketing:18839:division:89AB745
    VM-Three,development:34345:engineering:345345:Jake Smith
    VM-Four,sales::38
    VM-Five,marketing:234234234:testMachine
    VM-Six,development:xxxx:test
    VM-Seven,1234:5678
    VM-Eight,test:::
    VM-Nine,field::5
    VM-Ten,test::3425:
    Name,ID,ID_split1,ID_split2,ID_split3,ID_split4,ID_split5
    VM-One,sales:2293365:37,sales,2293365,37,,
    VM-Two,marketing:18839:division:89AB745,marketing,18839,division,89AB745,
    VM-Three,development:34345:engineering:345345:Jake Smith,development,34345,engineering,345345,Jake Smith
    VM-Four,sales::38,sales,,38,,
    VM-Five,marketing:234234234:testMachine,marketing,234234234,testMachine,,
    VM-Six,development:xxxx:test,development,xxxx,test,,
    VM-Seven,1234:5678,1234,5678,,,
    VM-Eight,test:::,test,,,,
    VM-Nine,field::5,field,,5,,
    VM-Ten,test::3425:,test,,3425,,
    Name,ID,ID_split1,ID_split2,ID_split3
    VM-One,sales:2293365:37,37,,
    VM-Two,marketing:18839:division:89AB745,division,89AB745,
    VM-Three,development:34345:engineering:345345:Jake Smith,engineering,345345,Jake Smith
    VM-Four,sales::38,38,,
    VM-Five,marketing:234234234:testMachine,testMachine,,
    VM-Six,development:xxxx:test,test,,
    VM-Seven,1234:5678,,,
    VM-Eight,test:::,,,
    VM-Nine,field::5,5,,
    VM-Ten,test::3425:,3425,,
    set http_timeout 60    # Set timeout to 1 minute
    set odbc_connect "DRIVER=SQL Server;SERVER=Hostname;Database=DatabaseName;TrustServerCertificate=No;Trusted_Connection=No;UID=username;PWD=password"
    option quote = \"
    use usage from azure
    create column ProcessedFlag value 1
    option noquote
    
    # Will be exported as <basedir>\exported\azure\flagged.csv
    export azure.usage as "azure\flagged.csv"

    Create an AWS4-HMAC-SHA256 signature value

    Extract the filename from path + filename string

    Create a named buffer

    Delete any defined headers

    Create a CSV file

Delete a named buffer

    Base16 or base64 encode data

Encrypt a variable

Escape quotes in a value or named buffer

    Break out of a loop

    Iterate over an array

    Set a variable to contain the number of the last day of a specified month

    Call a

    Inflate GZIP data

    Generate an SHA256 or HMACSHA256 hash

    Execute an HTTP request

    Conditionally execute statements

    Format JSON data

    Change the logging level

    Execute statements repeatedly

    Search using a regular expression

    Suspend script execution

    Echo text to standard output

    Explicitly return from a

    Save a named buffer to disk

    Specify a protocol parameter

    Define a subroutine

    End script execution

    Decompress ZIP data in a named

    URI (percent) encode a variable

    Create or update a variable

    Statement

    Description

    USE script basics

• automatic per unit & interval

• manual per unit

• manual per interval

• manual per unit & interval

Automatic services obtain the rate and/or interval value from a column you specify, whereas manual services allow the user to manually specify a rate and a fixed interval money value. For manual services, if a service definition has proration enabled, the charge on the cost reports is calculated based on the actual consumption (see services).

By default each service has a global rate configured, which will be applied to all accounts that consume this service. However, it is possible to use customer-specific rates by overriding a service rate for one or multiple accounts.

    Edit global rate for a manual service

A manual service can have up to 3 rate values that can be changed: the unit rate, the interval money value and the COGS rate. To change these values, go to the 'Catalogue' > 'Rates' screen and click on the service name for which you want to change the global rate value:

    change rate for manual service

    To change the rate values of this service, consider the following:

    1. Effective date is the date from when this rate is applied to the service. A service can have one or multiple revisions. You may add new rate revisions by using the Add Revision button. Existing rate revision dates can be changed using the Change Date option

2. The Per Unit rate is the amount charged per unit consumed, for every (portion of the) configured interval. In this example, if this were a daily service charged at 1 euro per Gigabyte of database usage, and a 100 GB database is consumed each day, a value of € 100 will be charged per day (and € 3100 if used for the entire month of December)

3. The Per Interval charge is applied once for every occurrence of the consumed service, regardless of the total consumed quantity of that service. In the example at 2, this would mean that every day a value of € 110 will be charged for a 100 GB database: € 100 because of the standard rate plus € 10 for the fixed interval charge. Considering the same consumption for the entire month of December, the total charge for that month will be 31 x € 110 = € 3410

    4. It is possible to configure a COGS rate for this service. This is applied the same way as the Per Unit rate

    5. To delete an invalid or wrong revision, use the Remove Revision button. Do bear in mind you cannot delete the last rate revision for a service

6. To save your changes, which will also initiate a re-preparation of the applicable Report Definition, click the Save Revision button (see defining reports to learn more about report preparation)

    7. If you are planning to make more changes to other services in the same report definition, use the Save Revision > Without Preparing option. This will avoid running the re-preparation several times, and allows you to start the re-preparation only after you've made all of the required rate changes.


    http

    The http statement initiates an HTTP session using any settings previously configured using the set statement. It can also be used for querying response headers.

    Syntax

http method url

http dump_headers

http get_header headerName as varName

    Details

    Executing an HTTP request

The http statement performs an HTTP request against the server and resource specified in the url parameter. Any HTTP-related settings previously configured using set will be applied to the request.

    The method argument determines the HTTP method to use for the request and must be one of GET, PUT, POST or DELETE.

    The url argument must start with either http: or https:. If https: is used then SSL will be used for the request.

    The url argument must also contain a valid IP address or hostname. Optionally, it may also contain a port number (preceded by a colon and appended to the IP address or hostname) and a resource.

The following defaults apply if no port or resource is specified:

Field       Default

port        80 if using http, 443 if using https

resource    /

The format of the http statement is identical when used in conjunction with the buffer statement.

    Querying response headers

To dump a list of all the response headers returned by the server in the most recent session, use the following statement:

    http dump_headers

    This will render a list of the headers to standard output, and is useful when implementing and debugging USE scripts. The intention of this statement is to provide a tool to assist in script development, and as such it would normally be removed or suppressed with a debug mode switch in production environments.

To retrieve the value of a specific header, use the following statement:

http get_header headerName as varName

    This will set the variable varName to be the value of the header headerName.

    If headerName was not found in the response, then a warning will be written to the log-file. In this case varName will not be created but if it already exists then its original value will be unmodified.

    Examples

Example 1

# A simple request using the default port and no SSL
set http_savefile "/extracted/http/customers.json"
http GET "http://localhost/v1/customers"

# A more complex request requiring setup and a custom port
clear http_headers
set http_header "Accept: application/json"
set http_header "Authorization: FFDC-4567-AE53-1234"
set http_savefile "extracted/http/customers.json"
buffer customers = http GET "https://demo.server.com:4444/v1/customers"

Example 2

The following shows the process of retrieving a header. The output of:

buffer temp = http GET https://www.google.com
http dump_headers
http get_header Date as responseDate
print The Date header from google.com was: ${responseDate}

Takes the following form:

Last response headers:
HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Date: Mon, 26 Mar 2018 13:50:39 GMT
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1
Expires: -1
Accept-Ranges: none
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
Set-Cookie: 1P_JAR=2018-03-26-13; expires=Wed, 25-Apr-2018 13:50:39 GMT; path=/; domain=.google.co.uk
Set-Cookie: [redacted]; expires=Tue, 25-Sep-2018 13:50:39 GMT; path=/; domain=.google.co.uk; HttpOnly
Vary: Accept-Encoding
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alt-Svc: hq=":443"; ma=2592000; quic=51303432; quic=51303431; quic=51303339; quic=51303335,quic=":443"; ma=2592000; v="42,41,39,35"

The Date header from google.com was: Mon, 26 Mar 2018 13:50:39 GMT

    include

    Overview

    The include statement is used to combine Transcript task files together

    Syntax

include taskfile

    Details

    The include statement is used to combine two or more task files into a single script prior to commencement of execution.

    If using the Windows path delimiter - \ - it is advisable to put the path+filename in double quotes to avoid the backslash being interpreted as an escape character

    Before executing a task file, Transcript analyses the script and processes all the include statements. Wherever an include statement is encountered, the specified taskfile is read from disk and inserted into the script in place of the include statement itself.

    Command placement

    As with most statements, include must occupy a single line of its own in the script.

    Additional notes

    The taskfile argument is treated as relative to <basedir>/system/config/transcript/.

    Multiple files may be included in a task file and any included file may include other files but a check is done to ensure that no infinite include loops are generated.

Example

Given the following two files in /system/config/transcript/ ...

# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

include import_customers.trs

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...
# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2

Prior to execution the pre-processor will combine these into the single memory-resident Transcript task:

# FILE 1: myscript.trs
option loglevel = DEBUGX

# import usage data from csp
import "system\extracted\AzureCSP\${dataDate}_csp_usage.csv" source CSP alias Usage
default dset csp.usage
rename column resource_id to meter_id

# FILE 2: import_customers.trs
import "system\extracted\AzureCSP\${dataDate}_csp_customers.csv" source CSP alias Customers
rename column CSP.Customers.ID to customer_id
# END FILE 2

option overwrite = no
set resource_subcategory to Generic
create column interval value individually
create mergedcolumn service_name separator " - " from resource_name resource_subcategory region
# etc ...

    RDFs

    RDF stands for Reporting Database File. Exivity uses these to store usage data and configuration information.

    Daily RDFs

    A daily RDF stores the usage data from which Edify produces report data. Any given RDF contains a single DSET, along with internally managed metadata such as column types and prepared report data.

    An RDF is created using the finish statement in a Transcript task. For any given DSET, there can be a single RDF per day, although there may be many RDF files per day in total (one RDF for each DSET).

RDFs are named according to the day of the datadate and the ID of the DSET they contain, and have a .rdf extension. For example a DSET with an ID of Azure.usage for the first day of the month will result in an RDF called 01_Azure.usage.rdf.

    An RDF containing usage data is located at <home dir>/report/<yyyy>/<mm>/<dd>_<dset>.rdf where:

    • <yyyy> is the year of the datadate

• <mm> is the month of the datadate

    • <dd> is the day of the datadate
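For example, assuming a datadate of 20180501 and a DSET with the ID Azure.usage, the corresponding daily RDF would be located at <home dir>/report/2018/05/01_Azure.usage.rdf (the date used here is purely illustrative).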

    The Global RDF

The global RDF (located at <home_dir>/system/global.rdf) contains system-wide configuration data including (but not necessarily limited to):

• Service definitions

• Service rate revisions

• Service rate adjustments

• Service categories

• User accounts

• Report definitions (and related metadata such as synchronisation timestamps)

• Metadata about RDF files created by Transcript

• Usergroups

• Security settings

• Account information

• Job Schedules

• datadate

    The Global RDF should never be manually modified unless this process is performed by, or under the guidance of, Exivity support staff. It is critical to make a backup copy of this file before any changes are made.

    The p_gp.rdf

The file p_gp.rdf (located in <basedir>/system/report) is used by the charge engine to store the output of report runs.

    <dset> is the DSET ID that was used to populate the usage data



    Transform

    Transcript executes user-definable scripts (termed tasks) in order to produce one or more Reporting Database Files (RDFs) from one or more input Dataset files in CSV format. These RDFs are later used by the reporting engine to generate results.

    Overview

    Transcript tasks are located in system/config/transcript/ and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported.

    Statements

Each statement in a Transcript task must be contained on a single line. Statements consist of a keyword indicating the action to perform, followed by zero or more parameters required by the statement, separated by white-space. Documentation for all the possible statements can be found in the Transcript language reference guide.

    Quotes and escapes

    By default a space, tab or newline will mark the end of a word in a Transcript task. To include white-space in a parameter (for example to reference a column name with a space in it) then this can be done by enclosing it in double quotes or escaping it by preceding it with \.

Examples:

create columns from "Meter Name" using Quantity
create columns from Meter\ Name using Quantity

The following table summarises the behaviour of quotes and escapes:

Characters    Meaning

" ... "       Anything inside the quotes, except for a newline or an escape character, is treated as literal text

\"            Whether within quotes or not, this is expanded to a double quote - " - character

\t            When used outside quotes, this is expanded to a TAB character

\             When used outside quotes, a space following the \ is treated as a literal character

\\            Whether within quotes or not, this is expanded to a backslash - \ - character

    Comments

    Comments in a Transcript task start with a # character that is either of:

    • the first character of a line in the Transcript task

    • the first character in a word

Comments always end at the end of the line they are started on.

# This is a comment
import usage from Azure # The text from '#' onwards is a comment
import usage#1 from Azure # The '#' in 'usage#1' does not start a comment

    Variables

    Transcript statements may contain variables. Variables have a name and a value. When a variable name is encountered during execution of the task, the name is replaced with the value of the variable with that name.

To separate them from normal statement words, variable names are always preceded with ${ and followed by }. Therefore the variable with the name dataDate is referenced as ${dataDate} in the transcript task. As well as user-defined variables (created using the var statement), the following default variables are supported by Exivity:

Variable             Meaning

${dataDate}          The datadate currently in effect, in yyyyMMdd format

${dataDay}           The day value in the dataDate variable, expressed as a 2 digit number padded with a leading zero if necessary

${dataMonth}         The month value in the dataDate variable, expressed as a 2 digit number padded with a leading zero if necessary

${dataMonthDays}     The number of days in the month in the dataMonth variable

${dataDateStart}     00:00:00 on the day in the dataDate variable, expressed as a UNIX timestamp

${dataDateEnd}       23:59:59 on the day in the dataDate variable, expressed as a UNIX timestamp

${dataYear}          The year value in the dataDate variable, expressed as a 4 digit number

${homeDir}           The base working directory currently in effect

${exportDir}         This is the equivalent of ${baseDir}\exported

Variable names ...

• may be used multiple times in a single statement

• are case sensitive - ${dataDate} is different to ${datadate}

• may not be nested

• may be embedded within surrounding text - xxx${dataDate}yyy

• may be used within quotes: import "${baseDir}\to_import\AzureJuly${dataDate}.ccr" source AzureJuly

• may appear as words of their own in a transcript statement - create column Date value ${dataDate}

    Regular Expression variables

    A regular expression variable is a special type of variable used to match the name of a column in a DSET. It is enclosed by ${/ and /} and the text within this enclosure can take either of the following two forms:

1. ${/expression/}

  • The regular expression described by expression will be applied to the column names in the default DSET

2. ${/dset.id/expression/}

  • If the text preceding the central / character is a valid DSET ID then the expression after that / will be applied to the column names in that DSET

  • If the text preceding the / character is not a valid DSET ID then the entire text of the variable between the ${/ and /} enclosure is treated as a regular expression and will be applied to the default DSET

    Once the DSET ID and the expression have been established by the above, the expression is tested against each column name in the DSET and the first matching column name is returned. If no match is found, then an error is logged and the transcript task will fail.

    The regular expression may contain a subgroup, which is enclosed within parentheses - ( and ). If no subgroup is present, and a match is made, then the entire column name will be returned. If a subgroup is present and a match is made, then only the characters matching the portion of the expression within the parentheses are returned. For example:
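As an illustrative sketch (the column name is hypothetical): if the default DSET contains a column named MeteredOperations, then ${/(.*)Operations/} expands to Metered because of the subgroup, whereas ${/.*Operations/} expands to the full column name MeteredOperations.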

    The expression does not have to match the entire column name. Assuming no subgroup is specified, as long as a match is made then the variable will be expanded to the whole column name.

Regular expression variables are powerful tools when combined with the rename statement, as they can be used to transform an uncertain column name into a known one.

Examples:

# Rename a column containing 'Transfer' or 'transfer' in
# its name, such that it is called 'Transfer':
rename column ${/.*[Tt]ransfer/} to Transfer

# As above, but specifically for the 'Azure.usage' DSET
rename column ${/Azure.usage/.*[Tt]ransfer/} to Transfer

# Rename a column with 'Operations' in its name such that its
# new name is whatever came before 'Operations' in the original name
var prefix = ${/(.*)Operations/}
rename column ${/.*Operations/} to ${prefix}

    Importing Data

A Transcript task cannot manipulate data on disk directly, so it is necessary to import one or more Datasets in CSV format at runtime in order to process the data within them. When a Dataset is imported the following sequence of actions takes place:

    1. The Dataset (in CSV format) is read from disk

2. A number of checks are done on the data to ensure it meets the requirements to qualify as a Dataset

3. The data is converted into an internal format called a DSET

4. The DSET is assigned two tags (source and alias) which when combined together form a unique ID to identify the DSET (see Core concepts for more information)

5. An index is constructed, which facilitates high speed manipulation of the data in the DSET

6. The DSET is added to the list of DSETs available for use by subsequent statements in the Transcript task

    Once these actions have been completed, a DSET can be identified through the unique combination of source.alias. This permits Transcript statements to specify which DSET to operate on.

In addition, a default DSET can be specified, which will be used if no alternative DSET is specified. Full details of these mechanisms are provided in the reference guide, specifically in the import and default articles.

    Exporting Data

    Data can be exported in one of two ways during the execution of a Transcript task:

    Export on demand

    Many Transcript statements change the data in the DSET in some way. Columns may be created, renamed or deleted and rows may be added and removed for example.

At any point in the Transcript process the current state of a DSET can be rendered to disk as an output CSV file. This is accomplished via use of the export statement. This permits snapshots of a DSET to be created for debugging or audit purposes, as well as the creation of partially processed CSV files for import into a later Transcript process.
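As a minimal sketch (the DSET and file names are illustrative), a snapshot can be written at any point like this:

# Write the current state of the Azure.usage DSET to
# <basedir>\exported\azure\snapshot.csv for inspection
export Azure.usage as "azure\snapshot.csv"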

    Finishing

The finish statement creates a Reporting Database File (RDF) containing the data in a DSET. This RDF can then be used by the reporting engine.
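Putting the pieces together, a minimal Transcript task might look like the sketch below (the column names and flag value are illustrative, and it is assumed that finish with no arguments operates on the default DSET):

# Create the DSET Azure.usage from today's extracted file
import usage from Azure

# Normalise a column name and add a constant column
rename column resource_id to meter_id
create column ProcessedFlag value 1

# Write the default DSET to its daily RDF for use by the reporting engine
finish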

    Services

    An introduction to Services

    In Exivity Services can be anything that corresponds to a SKU or sellable item from your Service Catalogue. It should relate to a consumption record (or multiple records) from your extracted data sources.

For example: with most public cloud providers, the provider defines the chargeable items that are shown on the end-of-month invoice. However, when working through a Managed Services Provider, a Cloud Services Provider or a System Integrator, additional services can be sold on top of those. Potentially, you may want to apply an uplift to the rate or charge a fixed amount of money every month for a certain service. Different scenarios are possible here; it all depends on your business logic.


    Core concepts

    Terminology

    A service is a named item with associated rates and/or costs used to calculate a charge that appears on a report, where rates represent revenue and costs represent overheads.

    When discussing services and their related charges a number of terms are required. Exivity uses the following terminology in this regard:

Term                   Synonym/Abbreviation    Meaning

service definition     service                 A template defining the manner in which service instances should be charged

service instance       instance                Consumption of a service, associated with a unique value such as a VM ID, a VM hostname, a resource ID or any other distinguishing field in the usage data

unit of consumption    unit                    The consumption of 1 quantity of a service instance

charge interval        interval                The period of time for which each payment for consumption of a service is valid

    Creating service definitions

Service definitions are created during the ETL process via the service and services statements in Transcript. During the execution of a Transcript task, service definitions created by these statements are cached in memory. Once the task has completed successfully, the cached services are written to the global database, where they remain indefinitely (or until such time as they are manually deleted).

    If the task does not complete successfully then the service definitions cached in memory are discarded, the expectation being that the task will be re-run after the error condition that caused it to fail has been rectified and the services will be written to the global database at that time.

    Types of charges

    There are different types of charge that can be associated with a service. Collectively these influence the total charge(s) shown on the report and Exivity supports the following charge types as described in the Terminology table above:

    • unit rate

    • fixed price

    • COGS rate

    • fixed COGS

    At least one of these charge types must be associated with a service definition. Multiple combinations of the charge types may also be used.

    Once the resulting charge has been calculated based on the charge types, it may be further modified through the application of adjustments, proration and minimum commit (all of which are detailed later in this article).

    Charge intervals

In order to calculate the charge(s) associated with usage of a service Exivity needs to know the period of time for which each payment is valid. For example a Virtual Machine may have a daily cost associated with it, in which case using it multiple times in a single day counts as a single unit of consumption, whereas Network Bandwidth may be chargeable per Gigabyte and each gigabyte transferred is charged as it occurs.

    The charge interval (also termed simply interval) for a service can be one of the following:

    • individually - the charge for a service is applied every time a unit of the service is consumed, with no regard for a charging interval

    • daily - the charge is applied once per day

    • monthly - the charge is applied once per calendar month

Although hourly charge intervals are not yet directly supported, it is possible to charge per hour by aggregating hourly records and using the EXIVITY_AGGR_COUNT column created during the aggregation process to determine the units of hourly consumption as a result.

    Minimum commit

    The minimum commit is the minimum number of units of consumption that are charged every interval, or (in the case of services with an interval of individually) every time the service is used. If fewer units than the minimum commit are actually consumed then the service will be charged as if the minimum commit number of units had been used.
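For example, assuming a daily service with a rate of € 2 per unit and a minimum commit of 10 units: a day on which only 4 units are consumed is still charged as 10 x € 2 = € 20, whereas a day on which 15 units are consumed is charged as 15 x € 2 = € 30.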

    Proration

    After the charge for usage of a monthly service has been determined, it may be prorated by modifying that charge based on the frequency of the usage.

    This process will reduce the charge based on the number of days within the month that the service was used. For example if consumption of a service with a monthly charge interval was only seen for 15 days within a 30 day calendar month then the final charge will be 1/2 of the monthly charge.

Note: Daily proration, whereby a daily service is prorated based on the hours seen, will be implemented in the second half of 2018 and this documentation will be updated at that time to reflect the additional feature.

    Service definitions

    A service definition comprises two categories of information:

    1. The service - Metadata describing fixed attributes of the service such as its name, description, group, interval, proration and charge type(s)

    2. The rate revision - Information detailing the charge type(s) associated with the service (the rate, fixed price, COGS rate and fixed COGS values) and additional information detailing the date(s) for which those values should be applied

A service definition is associated with a specific DSET, as the units of consumption are retrieved from a column (named in the service definition itself) in the usage data.

    The following tables summarise the data members that comprise each of these categories:

    Service attributes

Attribute            Purpose

key                  A unique key (as a textual string) used to identify the service

description          A user-defined description or label for the service

group or category    An arbitrary label used to group services together

unit label           A label for the units of measure, such as 'GB' for storage

RDF or DSET          The DSET ID of the usage data against which the service is reported

    RDF, rate_type and cogs_type are automatically derived from the parameters provided to the service and services statements

    Rate revision attributes

Field              Description

rate               The cost per unit of consumption

rate_col           The name of a column containing the cost per unit of consumption

fixed_price        A fixed charge associated with use of the service per charging interval, regardless of the amount of usage

fixed_price_col    The name of a column containing the fixed charges as described above

cogs               (Short for Cost Of Goods Sold) The cost per unit associated with delivery of the service

    The rate_col, fixed_price_col, cogs_col and fixed_cogs_col fields are used when the specific value to use is derived at report-time from the usage data, as opposed to explicitly being included in the rate revision itself.

    A service may have any number of associated rate revisions so long as they have different effective_date or minimum commit values. This means that a service can have different charges applied depending on the date that the report is to be generated for, or depending on the specific values in the columns used by a report.

    A service may use neither, either or both of rate and fixed_price, and neither or one of cogs and fixed_cogs. At least one of rate, fixed_price, cogs or fixed_cogs is required, but cogs and fixed_cogs may not both be used.

Any or all of rate, fixed_price, cogs and fixed_cogs may have a value of 0.0, in which case no charges will be levied against the service but the units of consumption will still be shown on reports.

    Basic Example Services

Configuration

Exivity allows you to configure a number of system variables easily, such as:

• custom logo, branding and color schema

• currency and decimal values

• date format

• fiscal year

• custom css

To change these and other settings, follow the instructions provided in this document.

    Branding

    To apply custom branding to the Glass interface, use the 'Administration' > 'Configuration' menu, and then select the 'Branding' tab:

    configuration_branding

    The following GUI adjustments can be done:

    1. Provide your custom product name

2. Change the logo shown at the top left part of the screen (the rectangular logo that says "Exivity" by default)

3. Change the square logo at the top left part of the screen (next to the brand logo that says "Exivity" by default)

    4. Change the favicon as shown in your browser favorites bar

    5. Change the GUI default color using the drop down list.

    6. If you want to change other colors, font types and other CSS specific parts of the GUI that are not listed in this menu, you can use the 'Show Advanced' button, to add custom CSS code

7. Click the 'Save' button to apply your changes.

    Formatting

To change country specific formatting or decimal rounding, use the 'Administration' > 'Configuration' menu, and select the 'Formatting' tab:

    change configuration formatting

    The following changes to the formatting can be done:

    1. Change the system wide Currency settings

    2. Configure comma or dot value for the decimal separator

    3. If you want to configure a thousands separator, you can choose between comma, dot and space

    4. Rate precision refers to the amount of decimal values shown on reports for the (average) rate value

    5. The Report precision can be changed to limit the number of decimal values shown on non-invoice reports

    6. The Invoice precision can be changed to limit the number of decimal values shown on invoice cost reports

    7. The quantity decimal values can be limited using Quantity precision

    8. To apply a system wide date format, choose your preferred date format from the Date format drop down list

    Reporting settings

    To change reporting settings, such as the address lines and logo on your invoice report, use the 'Administration' > 'Configuration' menu, and select the 'Reporting' tab:

    change configuration reporting

    The following changes to the reporting configuration can be done:

    1. Use the Address lines text field to provide a custom address which will be shown on the textual invoice cost reports

    2. To change the Logo that will be shown on the right hand side of your invoice cost report, use Select File to browse and select your custom logo

    3. A custom footer text can be added to your invoice report using the text field next to Extra text

    4. To adjust the maximum amount of series shown in the Exivity graphical report line graphs, change the value for Maximum graph series

    5. If you have a non-standard fiscal year start, you may change the Reporting start month using the provided drop down list

    6. Use the Save button to apply your changes

    Mail Server Settings

Exivity is able to send out e-mail messages for certain tasks, such as resetting user account passwords and notifying you of events such as failing workflows. To enable this functionality, an SMTP server must be configured. This can be done using the following procedure:

    Configuring an SMTP e-mail server
    • Navigate to: Administration > System > Environment

• Fill in your SMTP server details as shown in the above image

    • Click the Update button to save your changes

    • If you now logout and try to login again, you will notice a "Reset Password" link. This can be used by all of your users, as long as a valid e-mail address has been associated to each user account

    Make sure that your SMTP server allows relaying using the provided credentials (user/pass) and from the Exivity host's IP address. Consult your mail server administrator for additional information.

    Advanced

    config.json

In order to change advanced configuration settings for all users that can't currently be modified through the interface, edit the file config.json in the web/glass subdirectory of the Program directory.

    The config.json file

The default contents of this file should look like this:

config.json
{
  "whiteLabel": false,
  "apiHost": null
}

    Configure a default API domain

    When logging in, users have the option to change the API domain by clicking the Change domain link:

    Change domain when logging in

In order to specify a default for all users, edit the config.json file and add your default domain as the apiHost option:

config.json
{
  "whiteLabel": false,
-  "apiHost": null
+  "apiHost": "https://example.com:443"
}

    Full white-label

To never show any Exivity related branding in the interface, set the whiteLabel option to true:

config.json
{
-  "whiteLabel": false,
+  "whiteLabel": true,
  "apiHost": null
}

    When the whiteLabel option is enabled, certain functionality will be disabled in the GUI:

    • Manual clearing of caches on the About page.

    • Documentation links in the header will be turned off.

    Some other elements in the interface have Exivity branding by default. These defaults can be modified to match your brand:

    • The title and logo of the application (displayed in browser tabs, sidebar, and so on) can be changed on the Configuration page.

    • The sender e-mail address and name for system and notification e-mails on the System page.

    create

    Overview

The create statement is used to add one or more new columns to an existing DSET.

    Syntax

create column NewColumnName [value Value]

create columns from ColumnName [using ValueColumnName]

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [Column [/regex/] | string literal]

    Details

    Explicit single column creation

    Syntax

create column NewColumnName [value Value]

    Details

    This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a column name, in which case the new column will be created in the DSET specified as part of that name.

Note: If no default DSET has been explicitly defined using the default statement then the DSET created by the first import or use statement in the Transcript task is automatically set as the default DSET.

    A column called NewColumnName must not already exist in the DSET. If NewColumnName contains dots then they will be converted into underscores.

The new column will be created with no values in any cells, unless the optional value Value portion of the statement is present, in which case all the cells in the new column will be set to Value.

    Examples

    Create a new empty column called Cost in the default DSET: create column Cost

    Create a new column called Cost with a value of 1.0 in every row of the default DSET: create column Cost value 1.0

    Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges: create column custom.charges.Cost value 1.0

    Automated single/multiple column creation

    Syntax

create columns from ColumnName [using ValueColumnName]

    Details

    This statement is used to create multiple columns in a single operation. As is the case for create columns above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.

    Given this example dataset:

    The statement create columns from ServiceName using Count will create the result shown below:

    The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:

    When applied to the data above, the statement create columns from ServiceName will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):

    If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity will produce the following result:

    When using create columns the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage, then the statement create columns from custom.data.Services will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.

    This is only possible in the absence of the using ValueColumnName option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.

    Example

    The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.
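A sketch of such a task, assuming the Services.csv file is given the source tag custom and the alias data (both names are illustrative):

# The first import becomes the default DSET (Azure.usage)
import usage from Azure

# Import the list of service definitions under its own source/alias
import system/extracted/Services.csv source custom alias data

# Create empty columns in Azure.usage named after the values in
# the ServiceDefinitions column of the custom.data DSET
create columns from custom.data.ServiceDefinitions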

    Merging column values to create a new column

    Syntax

create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [ ... Column [/regex/] | string literal]

If preferred, the word using may be used instead of the word from (both work in an identical fashion)

    Details

    This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:

    The separator may be more than one character in length (up to 31 characters may be specified)

    If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.

    The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.

    If a regex is not specified, then the entire value in the source column will be used.

    Options

    By default the value extracted from a source column will be blank in the following two cases:

    • There is a blank value in a source column

    • No match for a regular expression is found in the value of a source column

    In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.

This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:

    option merge_blank = some_text_here

    This option will use the string some_text_here in place of any blank source column value.

    option merge_nomatch = some_text_here

    This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.

    Specifying the literal string <blank> as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
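A minimal sketch combining these options with a merged column (the column names and the UNKNOWN placeholder are illustrative):

# Substitute UNKNOWN for blank source values and for regex misses
option merge_blank = UNKNOWN
option merge_nomatch = UNKNOWN

# Build keys such as "Small_VM - eu-west" from two source columns
create mergedcolumn service_key separator " - " from ServiceName region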

    Examples

    Given the following dataset:

    The following examples illustrate some uses of the create mergedcolumn statement:

    Example 1

    Example 2

    If no regular expression is specified then the values in the source column will be used in their entirety:

    Example 3

    Let us add a new row to the sample dataset which has a non-compliant value for the user_id:

    By default a non-matching value will result in a blank component of the merged result:

    In this case, the resulting key for John has no separator characters in it. We can force a default value for the missing user_id portion as follows:

    var

    Overview

    The var statement is used to create or update a variable which can subsequently be referenced by name in the USE script.

    Syntax

[public] var name [ = value]

[public] var name operator number

[public] encrypt var name = value

For details on encrypted variables please refer to the article on encrypted variables

    Details

    Variables are created in one of two ways:

    1. Manually via the var command

    2. Automatically, as a consequence of other statements in the script

    If the word public precedes a variable declaration then the variable will be shown in, and its value can be updated from, the Exivity GUI. Only variables prefixed with the word public appear in the GUI (all others are only visible in the script itself). To make an automatic variable public, re-declare it with a value of itself as shown below:
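For example, a sketch using the automatic DAY_NAME_UTC variable mentioned later in this article:

# Re-declare the automatic variable with its own value to expose it in the GUI
public var DAY_NAME_UTC = ${DAY_NAME_UTC}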

    Manually defined variables

    A variable is a named value. Once defined, the name can be used in place of the value for the rest of the script. Amongst other things this permits configuration of various parameters at the top of a script, making configuration changes easier.

    The = value portion of the statement is optional, but if used there must be white-space on each side of the = character. To use spaces in a variable value it should be quoted with double quotes.

    Once a variable has been defined it can be referenced by prefixing its name with ${ and post-fixing it with a }. For example a variable called outputFile can be referenced using ${outputFile}. If no value is specified, then the variable will be empty, eg:

    will result in the output:

    Variable names are case sensitive, therefore ${variableName} and ${VariableName} are different variables.

    If there is already a variable called name then the var statement will update the value.

    There is no limit to the number of variables that can be created, but any given variable may not have a value longer than 8095 characters

    Arithmetic

    Variables that contain a numeric value can have the arithmetic operations performed on them. This is done using the following syntax:

    varname operator number

The operator must be surrounded by white-space and the following values are supported:

    For example the statement var x += 10 will add 10 to the value of x.

    When performing arithmetic operations on a variable, any leading zeros in the value of that variable will be respected:
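A small sketch of the behaviour described above (assuming the result keeps the original zero padding):

var counter = 007
var counter += 1
# ${counter} now expands to 008 rather than 8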

    Attempting to perform an arithmetic operation on a variable that does not contain a valid number will result in an error being logged, and the script will terminate.

    Currently, only integer arithmetic is supported.

    Automatic variables

    Automatic variables are referenced in exactly the same way as manually created ones; the only difference is in the manner of creation.

    The following variables are automatically created during the execution of a USE script:

To derive the short versions of the day and month names, use a match statement to extract the first 3 characters as follows:

    match day "(...)" ${DAY_NAME_UTC}

    var short_day = ${day.RESULT}

    The .LENGTH suffix

    On occasion it may be useful to determine the length (in characters) of the value of a variable. This can be done by appending the suffix .LENGTH to the variable name when referencing it. For example if a variable called result has a value of success then ${result.LENGTH} will be replaced with 7 (this being the number of characters in the word 'success').

    A variable with no value will have a length of 0, therefore using the .LENGTH suffix can also be used to check for empty variables as follows:
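A sketch of such a check, assuming the usual if (...) { ... } block form:

# Stop the script if myvar has no value
if (${myvar.LENGTH} == 0) {
    print myvar is empty - terminating
    terminate
}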

    myvar.LENGTH is not a variable in its own right. The .LENGTH suffix merely modifies the manner in which the myvar variable is used.

    Examples

    Basic variable creation and use

    Creating encrypted variables

    import

    Overview

    The import or use statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.

    The statements import and use are identical in operation. For clarity, the term import will be used for the remainder of this article, but use can be substituted in all cases.

To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.

    Syntax

    Import from CSV files

import filename from source [alias alias] [options { ... }]

import filename source custom_source [alias alias] [options { ... }]

Import from CCR files

import filename.ccr source custom_source [alias alias]

Import from database tables

import ACCOUNT source custom_source [alias alias]

import USAGE for date from db_name source custom_source [alias alias]

    Details

    If using the Windows path delimiter - \ - it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character

    Options

    Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.

The import filters are configured via the options parameter, which if present is immediately followed by one or more name = value pairs enclosed within braces, for example:

import system/extracted/example.csv source test alias data options {
    skip = 1
}

The = sign is optional, but if present it must be surrounded by whitespace.

Multiple options may be specified and each option must be placed on a separate line, for example:

import system/extracted/example.csv source test alias data options {
    skip = 1
    omit = 4
}

The options supported by import are as follows:

• skip (numeric): Number of leading non-blank rows to skip

• omit (numeric): Number of trailing non-blank rows to omit

• heading_dupe_check / header_dupe_check (enabled / on / true / yes): Skip rows that are a duplicate of the header row

• select / include (column list): A space-separated list of column names to include in the import

• ignore / exclude (column list): A space-separated list of column names to ignore when importing

• filter (expression): If specified, only rows that match the expression will be imported

• escaped / escape (enabled / on / true / yes): Treat backslash (\) characters in the data in the CSV to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally.

• pattern (enabled / on / true / yes): Permits the filename to be specified as a regular expression

    If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows

The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:

import system/extracted/example.csv source test alias data options {
    # Only import rows where the vmType column is not the value "Template"
    filter ([vmType] != Template)
}

To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:

import system/extracted/example.csv source test alias data options {
    filter (["service name"] =~ /.*D/)
}

    File name / pattern

If a pattern is specified in the options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.

    Only filenames may contain a regular expression, directory names are always treated literally.

    If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).

    If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.

    Database tables

    Account data

    To import all accounts from Exivity the following statement is used:

import ACCOUNT source custom_source [alias alias]

    This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.

    Usage data

To import usage data previously written to an RDF using the finish statement, the following statement is used:

import USAGE for date from dset source custom_source [alias alias]

    This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.

The date must be in yyyyMMdd format. The RDF from which the usage will be imported is:

    <basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf.

    To illustrate this in practice, the statement ...

    import USAGE for 20180501 from azure.usage source test alias data

    ... will load the usage data from ...

    <basedir>/system/report/2018/05/01_azure.usage_usage.rdf

... into a DSET called test.data.

    Source and Alias tags

    A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.

    There are two main variations of the import statement which are as follows:

    Automatic source tagging

import filename from source [alias alias]

By convention, data retrieved by the USE extractor from external sources is located in a file called

<basedir>/system/extracted/<source>/<yyyy>/<MM>/<dd>_<filename>.csv

where <yyyy> is the year, <MM> is the month and <dd> is the day of the current data date.

Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.

By default the alias tag will be set to the filename, minus the <dd>_ portion of that filename and the .csv extension, but an alias can be manually specified using the optional alias parameter.

    As an example, assuming the data date is 20170223, the statement:

    import usage from Azure

    will create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv.

    The statement:

    import usage from Azure alias custom

    will create a DSET called Azure.custom from the file

    <basedir>/system/extracted/Azure/2017/02/23_usage.csv.

    Manual source tagging

import filename source custom_source [alias alias]

    This form of the statement will import filename and the source tag will be set to the value of the custom_source parameter.

    By default the alias tag will be set to the filename minus the .csv extension but an alias can be manually specified using the optional alias parameter.

    As an example, assuming the data date is 20170223, the statement:

    import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

will create a DSET called Azure.usage from the file

    <basedir>/system/extracted/Azure/20170223.csv

    Importing CCR files

    Options will be ignored when importing CCR files

When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr filename extension as follows:

import system/extracted/myccrfile.ccr source MyData

    Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.

In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.

When importing a CCR file, the quote and separator characters are always a double quote - " - and a comma - , - respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.

    Quote and Separator options

The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct import options are set before importing a dataset in order to avoid unwanted side effects.

    The options relating to import are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.

    If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.

    If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.
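A minimal sketch of adjusting these options before an import (the semicolon-delimited file is illustrative, and it is assumed both options are set via the option statement):

# Fields in this file are quoted with " and separated by ;
option quote = \"
option separator = ;
import "system\extracted\Custom\${dataDate}_usage.csv" source Custom alias usage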

    Embeds

If the embed option is set, then a curly bracket in the data - { - will cause all characters (including newlines) up until the closing bracket - } - to be imported. Nested brackets are supported, although if there are an uneven number of brackets before the end of the line an error will be generated in the logfile and the task will fail.

    Embeds are not supported in CCR files
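To illustrate (hypothetical data), with the embed option enabled the second field in the row below is imported as a single value, including the nested braces:

name,details
vm1,{ "region": "eu-west", "tags": { "env": "prod" } }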

    Examples

Create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv:

import usage from Azure

Create a DSET called Azure.custom from the file /system/extracted/Azure/2017/02/23_usage.csv:

import usage from Azure alias custom

Create a DSET called Azure.usage from the file /system/extracted/Azure/20170223.csv:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

On-premises

Exivity can be installed in your on-premises data center using the provided installer. You can deploy it automatically using the silent installation command line options, or run it as an interactive installer.

Before you can install Exivity, you'll need the following:

1. A system that complies with the Exivity minimal system requirements

2. The Exivity software installation executable

aws_sign_string

The aws_sign_string statement is used to generate an AWS4-HMAC-SHA256 signature, which is used as the signature component of the Authorization HTTP header when calling the AWS API.

    Syntax

aws_sign_string varName using secret_key date region service


    # This is a comment
    import usage from Azure # The text from '#' onwards is a comment
    import usage#1 from Azure # The '#' in 'usage#1' does not start a comment
    # Rename a column with 'Operations' in its name such that its
    # new name is whatever came before 'Operations' in the original name
    var prefix = ${/(.*)Operations/}
    rename column ${/.*Operations/} to ${prefix}
    # Rename a column containing 'Transfer' or 'transfer' in
    # its name, such that it is called 'Transfer':
    rename column ${/.*[Tt]ransfer/} to Transfer
    
    # As above, but specifically for the 'Azure.usage' DSET
    rename column ${/Azure.usage/.*[Tt]ransfer/} to Transfer
    
    # Rename a column with 'Operations' in its name such that its
    # new name is whatever came before 'Operations' in the original name
    var prefix = ${/(.*)Operations/}
    rename column ${/.*Operations/} to ${prefix}
    config.json
    {
      "whiteLabel": false,
      "apiHost": null
    }
    
    config.json
    {
      "whiteLabel": false,
    -  "apiHost": null
    +  "apiHost": "https://example.com:443"
    }
    config.json
    {
    -  "whiteLabel": false,
    +  "whiteLabel": true,
      "apiHost": null
    }
    base working directory

The following options may be specified (the permitted type or values for each option are shown in parentheses):

    • select / include (column list): A space-separated list of column names to include in the import

    • ignore / exclude (column list): A space-separated list of column names to ignore when importing

    • filter (expression): If specified, only rows that match the expression will be imported

    • escaped / escape (enabled / on / true / yes): Treat backslash (\) characters in the CSV data to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally.

    • pattern (enabled / on / true / yes): Permits the filename to be specified as a regular expression

    • skip (numeric): Number of leading non-blank rows to skip

    • omit (numeric): Number of trailing non-blank rows to omit

    • heading_dupe_check / header_dupe_check (enabled / on / true / yes): Skip rows that are a duplicate of the header row


    Details

    The authentication method used by AWS requires the generation of an authorization signature which is derived from a secret key known to the client along with specific elements of the query being made to the API.

    This is a fairly involved process and a full step-by-step walkthrough is provided by Amazon on the following pages (these should be read in the order listed below):

    • https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html

    • https://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html

    • https://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html

    • https://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html

    The aws_sign_string statement is used to generate the final signature as detailed on the calculate signature page listed above.

    Note that in order to use this statement it is necessary to have the following strings available:

1. A string to sign, obtained by following the 'create a string to sign' process described in the pages above, containing meta-data about the request being made

    2. A secret_key, obtained from Amazon which is used by any client application authorising against their API

    3. The date associated with the API request, in YYYYMMDD format

    4. The AWS region associated with the API request (for example eu-central-1)

    5. The AWS service being accessed (for example s3)

    The aws_sign_string statement will use these inputs to generate the HMAC-SHA256 signature which is a component of the Authorization header when connecting to the API itself.

    The varName parameter is the name of a variable containing the string to sign. After executing aws_sign_string the contents of this same variable will have been updated to the base-16 encoded signature value.

If there are any errors in the string to sign, date, AWS region or AWS service strings used as input to aws_sign_string then a signature will still be generated, but the AWS API will reject the request. In this case it is necessary to review the process by which these strings were created as per the AWS guides listed above.

    Example

    The following is an example USE script that implements everything described above.

    import system/extracted/example.csv source test alias data options {
        skip = 1
    }
    import system/extracted/example.csv source test alias data options { 
        skip = 1
        omit = 4
    }
    import system/extracted/example.csv source test alias data options { 
        # Only import rows where the vmType column is not the value "Template"
        filter ([vmType] != Template)
    }
    import system/extracted/example.csv source test alias data options { 
        filter (["service name"] =~ /.*D/)
    }
    import system/extracted/myccrfile.ccr source MyData
    #################################################################
    # This USE script will download a file from an S3 bucket        #
    #                                                               #
    # It takes three parameters:                                    #
    # 1) The name of the bucket                                     #
    # 2) The name of the object to download                         #
    # 3) The name of the file to save the downloaded object as      #
    #                                                               #
    # Created: 13th Jan 2018                                        #
    # Author: Eddy Deegan                                           #
    # --------------------------------------------------------------#
    # NOTES:                                                        #
    # - This script hardcodes the Region as eu-central-1 but this   #
    #   can easily be changed or made a parameter as required       #
    #################################################################
    
    if (${ARGC} != 3) {
        print This script requires the following parameters:
        print bucketName objectName saveFilename
        terminate
    }
    
    # Set this to 1 to enable a debug trace output when the script is run
    var DEBUG = 0
    
    # This is the text that appears to the left and right of debug headings 
    var banner = ________
    
    ######################################################################
    # Customer specific values here (these can be encrypted if required) #
    #                                                                    #
    var bucket = "${ARG_1}"
    var s3_object = "${ARG_2}"
    var AWS_Region = "eu-central-1"
    var AWS_Service = "s3"
    encrypt var access_key = <YOUR ACCESS KEY>
    encrypt var secret_key = <YOUR SECRET KEY>
    #                                                                    #
    # End customer specific values                                       #
    ######################################################################
    
    # This is the SHA256 hash of an empty string (required if making a request with no body)
    var hashed_empty_string = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    
    #########################################################################################
    # SETUP                                                                                 #
    # Create a number of variables to represent the various components that the steps       #
    # below are going to use in order to construct a correct AWS request                    #
    #---------------------------------------------------------------------------------------#
    # This is the request syntax for retrieving an object from a bucket:                    #
    # GET /<ObjectName> HTTP/1.1                                                            #
    # Host: <BucketName>.s3.amazonaws.com                                                   #
    # Date: date                                                                            #
    # Authorization: authorization string                                                   #
    #########################################################################################
    
    var HTTP_Method = GET
    var URI = ${s3_object}
    var query_params                    # Must have an empty variable for 'no query parameters'
    var host = ${bucket}.s3-${AWS_Region}.amazonaws.com
    var date = ${OSI_TIME_UTC}
    
    # Initialise config variables specific to this script
    var save_path = "system/extracted"
    var save_file = ${ARG_3}
    
    #########################################################################################
    # STEP 1                                                                                #
    # Create a canonical request as documented at                                           #
    # at https://docs.aws.amazon.com/general/latest/gr/sigv4-create-canonical-request.html  #
    #########################################################################################
    
    # 1a) Canonical Headers string
    #     - This is part of the Canonical Request string which will be generated below.
    #     - The Canonical Headers are a list of all HTTP headers (including values but
    #       with the header names in lowercase) separated by newline characters and in
    #       alphabetical order
    
    var canonical_headers = "date:${date}${NEWLINE}host:${host}${NEWLINE}x-amz-content-sha256:${hashed_empty_string}${NEWLINE}"
    if (${DEBUG} == 1) {
        print ${NEWLINE}${banner} Canonical Headers ${banner}${NEWLINE}${canonical_headers}
    }
    
    # 1b) Signed Headers string
    #     - This is a list of the header names that were used to create the Canonical Headers,
    #       separated by a semicolon
    #     - This list MUST be in alphabetical order
    #     - NOTE: There is no trailing newline on this variable (we need to use it both with and without
    #             a newline later so we explicitly add a ${NEWLINE} when we need to)
    
    var signed_headers = "date;host;x-amz-content-sha256"
    if (${DEBUG} == 1) {
        print ${banner} Signed Headers ${banner}${NEWLINE}${signed_headers}${NEWLINE}
    }
    
    # 1c) Canonical Request
    #     - The above are now combined to form a Canonical Request, which is created as follows:
    #     - HTTPRequestMethod + '\n' + URI + '\n' + QueryString + '\n' + CanonicalHeaders + '\n' +
    #       SignedHeaders + '\n' + Base16 encoded SHA256 Hash of any body content
    #     - Note that the Canonical Headers are followed by an extra newline (they have one already)
    
    var canonical_request = "${HTTP_Method}${NEWLINE}/${URI}${NEWLINE}${query_params}${NEWLINE}${canonical_headers}${NEWLINE}${signed_headers}${NEWLINE}${hashed_empty_string}"
    if (${DEBUG} == 1) {
        print ${banner} Canonical Request ${banner}${NEWLINE}${canonical_request}${NEWLINE}
    }
    
    # 1d) Hash of the Canonical Request
    #     - This is an SHA256 hash of the Canonical Request string
    
    hash sha256 canonical_request as hashed_canonical_request
    
    ######################################################################################
    # STEP 2                                                                             #
    # Create a 'string to sign' as documented at                                         #
    # at https://docs.aws.amazon.com/general/latest/gr/sigv4-create-string-to-sign.html  #
    #------------------------------------------------------------------------------------#
    # In a nutshell this is the following components separated by newlines:              #
    # 2a) Hash algorithm designation                                                     #
    # 2b) UTC date in YYYYMMDD'T'HHMMSS'Z' format                                        #
    # 2c) credential scope (date/region/service/"aws4_request")                          #
    # 2d) base16-encoded hashed canonical request                                        #
    ######################################################################################
    
    # Extract the yyyyMMdd from the UTC time
    match yyyyMMdd "(.{8})" ${date}
    var yyyyMMdd = ${yyyyMMdd.RESULT}
    
    var string_to_sign = AWS4-HMAC-SHA256${NEWLINE}${date}${NEWLINE}${yyyyMMdd}/${AWS_Region}/${AWS_Service}/aws4_request${NEWLINE}${hashed_canonical_request}
    if (${DEBUG} == 1) {
        print ${banner} String to sign ${banner}${NEWLINE}${string_to_sign}${NEWLINE}
    }
    
    ######################################################################################
    # STEP 3                                                                             #
    # Calculate the signature for AWS Signature Version 4 as documented at:              #
    # at https://docs.aws.amazon.com/general/latest/gr/sigv4-calculate-signature.html    #
    #                                                                                    #
    ######################################################################################
    
    # 3a) Derive a signing key and apply it to the string to sign
    #     Use the secret access key to create the following hash-based auth codes:
    #     a) ksecret (our secret access key)
    #     b) kDate = HMAC("AWS4" + kSecret, Date) NOTE: yyyyMMdd only
    #     c) kRegion = HMAC(kDate, Region)
    #     d) kService = HMAC(kRegion, Service)
    #     e) kSigning = HMAC(kService, "aws4_request")
    #     f) HMAC the string_to_sign with the key derived using steps a - e
    
    var signature = ${string_to_sign}
    
    if (${DEBUG} == 1) {
        print ${banner}Deriving Signing Key using these parameters${banner}${NEWLINE}${secret_key} ${yyyyMMdd} ${AWS_Region} ${AWS_Service}${NEWLINE}${NEWLINE}
    }
    
    # The following statement takes care of all the details listed above
    # Notes: 
    #       - The word 'signature' in the statement below is the NAME of a variable and
    #         NOT a reference to its contents
    #       - The contents of this variable are the string to sign, and after the statement
    #         has completed these contents will have been modified to be the authorization
    #         signature for that string
    #
aws_sign_string signature using ${secret_key} ${yyyyMMdd} ${AWS_Region} ${AWS_Service}
    
    ######################################################################################
    # STEP 4                                                                             #
    # Add the signing information to the request as documented at:                       #
    # https://docs.aws.amazon.com/general/latest/gr/sigv4-add-signature-to-request.html  #
    #                                                                                    #
    ######################################################################################
    
    var credential_scope = "${yyyyMMdd}/${AWS_Region}/${AWS_Service}/aws4_request"
    if (${DEBUG} == 1) {
        print ${banner} Credential Scope ${banner}${NEWLINE}${credential_scope}${NEWLINE}${NEWLINE}
    }
    
    var auth_header = "Authorization: AWS4-HMAC-SHA256 Credential=${access_key}/${credential_scope}, SignedHeaders=${signed_headers}, Signature=${signature}"
    
    if (${DEBUG} == 1) {
        print ${banner} Authorization Header ${banner}${NEWLINE}${auth_header}${NEWLINE}
    }
    set http_header ${auth_header}
    
    #######################################################
    # STEP 5                                              #
    # Execute the query                                   #
    #-----------------------------------------------------#
    # Note that all the headers that were included in the #
    # signed_headers created in STEP 1 must be set before #
    # the request is executed                             #
    #######################################################
    
    set http_header "Date: ${date}"
    set http_header "x-amz-content-sha256: ${hashed_empty_string}"
    set http_savefile ${save_path}/${save_file}
    
    set http_progress yes
    print "Downloading ${host}/${URI}:"
    http GET https://${host}/${URI}
    print ${NEWLINE}Done

The parameters are as follows (whether each parameter is required is shown in parentheses):

    • NewColumn (Yes): The name of the new column to create

    • sep (No): If specified after the separator keyword, sep is a string to insert between values extracted from the source columns

    • Column (Yes, at least one): The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified)

    • /regex/ (No): If specified, the expression enclosed by the / characters is applied to the values in the source column specified by the preceding ColumnN argument

    • string literal (No): If specified, the literal will be added to the column value. The relative order of Column and literal arguments is preserved

    ${GET_TIME}

    The current local time in 'friendly' format, eg Tue Jan 16 14:04:32 2018

    ${loop_label.COUNT}

A loop creates this variable (where loop_label is the name of the loop). The value of the variable is updated every time the loop executes, with a value of 1 on the first iteration. If no iterations are performed, then the variable will have a value of 0

    ${loop_label.NAME} ${loop_label.VALUE}

When iterating over the children of a JSON object (not an array) using foreach, these variables are updated with the name and value respectively of the current child every time the loop is executed (either may be blank if the child has no name or value respectively)

    ${loop_label.TYPE}

When iterating over the children of a JSON object (not an array) using foreach, this variable is updated to reflect the type of the current child every time the loop is executed. The type will be one of boolean, number, string, array, object or null.

    ${HOUR}

    The hour of the current local time, padded to 2 digits if necessary

    ${HOUR_UTC}

    The hour of the current time in UTC, padded to 2 digits if necessary

    ${HTTP_STATUS_CODE}

    The HTTP status code returned by the server in response to the most recent request executed

    ${MINUTE}

    The minute of the current local time, padded to 2 digits if necessary

    ${MINUTE_UTC}

    The minute of the current time in UTC, padded to 2 digits if necessary

    ${MONTH}

    The month of the current local date, padded to 2 digits if necessary

    ${MONTH_NAME}

    The full English name of the current month of the year

    ${MONTH_UTC}

    The month of the current date in UTC, padded to 2 digits if necessary

    ${MONTH_NAME_UTC}

    The full English name of the current month of the year in UTC

    ${NEWLINE}

    A newline (0x0A) character. Example use: var twolines = "This string${NEWLINE}contains two lines of text"

    ${SECOND}

    The second of the current local time, padded to 2 digits if necessary

    ${SECOND_UTC}

    The second of the current time in UTC, padded to 2 digits if necessary

    ${MSEC}

    The milliseconds of the current local time, padded to 3 digits if necessary

    ${MSEC_UTC}

    The milliseconds of the current time in UTC, padded to 3 digits if necessary

    ${SCRIPTNAME}

    The filename of the script being executed

    ${OSI_TIME_UTC}

    The current UTC time in YYYYMMDD'T'HHMMSS'Z' format, eg: 20180116T140432Z

    ${YEAR}

    The year of the current local date as a 4 digit number

    ${YEAR_UTC}

    The year of the current date in UTC as a 4 digit number

    Operator

    Meaning

    +=

    Addition

    -=

    Subtraction

    *=

    Multiplication

    /=

    Division
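As a brief illustration (a sketch, following the var arithmetic syntax shown elsewhere in this documentation):

    var total = 10
    var total += 5    # addition
    var total -= 3    # subtraction
    var total *= 2    # multiplication
    var total /= 4    # division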

    Variable

    Details

    ${ARGC}

    The number of parameters passed to the script

    ${ARG_N}

    For each parameter passed to the script a variable called ${ARG_N}, where N is a number greater than or equal to 1, will be created whose value is the argument value associated with that parameter

    ${DAY}

    The day of the current local date, padded to 2 digits if necessary

    ${DAY_NAME}

    The full English name of the current day of the week

    ${DAY_UTC}

    The day of the current date in UTC, padded to 2 digits if necessary


${DAY_NAME_UTC}

    The full English name of the current day of the week in UTC

    3. A valid Exivity license key

    If you need help on fulfilling one or more of the above requirements, please get in contact with our support department.

    Interactive installation

    To install Exivity interactively, execute the provided setup executable. Then follow the instructions on screen:

    • Provide a valid license key

• Pick a folder in which to install the Exivity program files

    • Pick a folder that Exivity will use to save its configuration, extraction and reporting files. This is called the Exivity home directory

    • Provide the TCP port for the Exivity API (default: 8002)

    • Provide the TCP port for the Exivity GUI (default: 8001)

• Choose an admin username and password

• After the installer completes, you can choose to start Exivity automatically.

    Silent Installation

To execute a silent installation, use the following command line parameters:

    <setup>.exe /S /EXIVITY_PROGRAM_PATH=[path] /EXIVITY_HOME_PATH=[path] /ADMIN_USER=[user] /ADMIN_PASSWORD=[password]

    EXAMPLE:

    <setup>.exe /S /EXIVITY_PROGRAM_PATH="C:\Program Files\Exivity\program" /EXIVITY_HOME_PATH=D:\Exivity\home /ADMIN_USER=master /ADMIN_PASSWORD=P@ssword

    Upgrading

    Upgrading your installation of Exivity is very simple and fast:

Manual upgrade

    Execute the setup executable. It will detect the installed version of Exivity, and will automatically upgrade when you click Next

    Silent upgrade

    When executing <setup>.exe /S, your existing installation will be automatically upgraded.

    Installing a valid SSL certificate

Exivity ships with an insecure self-signed SSL certificate by default. It is therefore highly recommended to replace the default certificate with an official one, signed by your Certificate Authority. To install a signed certificate, follow this procedure:

    1. Download the 32 bit version of openssl.exe from https://slproweb.com/products/Win32OpenSSL.html, and install this tool on the Exivity server

    2. Use the openssl.exe executable to generate a valid key file on the Exivity server by executing the following command:

      • C:\TEMP>c:\location\to\openssl.exe genrsa -out exivity.key 4096

    3. Then run the following command to create a certificate signing request file:

      • C:\TEMP>c:\location\to\openssl.exe req -new -key exivity.key -out exivity.csr

    4. You will be asked to enter general information like company name, city, etc. However, it is important to include the FQDN of the Exivity server when asked for:

      • 'Common Name (e.g. server FQDN or YOUR name) []'

NOTE: when asked for a password, it is required to leave this field empty and press return; otherwise the Exivity application will not be able to use your certificate.

5. The generated CSR file should be sent to your Certificate Authority. After processing by your CA, you should receive back a .crt file. Rename this file to exivity.crt and copy it, together with your exivity.key file, into your %EXIVITY_PROGRAM_PATH%\server\nginx\conf directory. This should overwrite the existing key and crt files

6. Now restart the Exivity Web Service Windows service; your signed certificate should then be active.

    Configuring a separate web server portal

    In some cases it might be desirable to separate the webserver from the backend components. This can be achieved by installing two separate Exivity instances. One instance could be placed in a DMZ, and the second instance would then typically be deployed within a local network as shown in the following diagram:

    Separating web portal from backend components

To achieve this goal, first install Exivity on both nodes using the standard procedure described here. After installing the Exivity software on the system that should become the User Portal, create the following userportal.conf file (shown further below) in the directory %EXIVITY_PROGRAM_PATH%/server/nginx/conf/sites-enabled:

    Make sure to replace HOSTNAME_BACKEND_PORTAL with the actual hostname or IP address of the system that serves as your Exivity Backend Portal.

    The second item that requires configuration is the config.json in the %EXIVITY_PROGRAM_PATH%/web/glass directory on the User Portal:

Replace HOSTNAME_USER_PORTAL with the actual hostname or IP address of the system that serves as your Exivity User Portal. If the User Portal should be accessible from the internet, make sure to provide the fully qualified domain name of the User Portal.

Once you have applied these changes, restart the Exivity Web Service Windows service. You should now be able to access your Exivity User Portal.

    interval

    The period of time that a unit of consumption is charged over (additional units of the same service instance consumed within the charge interval do not increase the resulting charge)

    unit rate

    rate

    The charge associated with 1 unit of consumption of a service instance in the charge interval

    COGS rate

    cogs

    (Short for Cost Of Goods Sold) The cost (overhead) to the provider of a service for providing 1 unit of consumption of that service per charge interval

    fixed price

    fixed rate or interval-based rate

    A specific amount charged per service instance per interval for one or more units of consumption

    fixed COGS

    interval-based COGS

    A specific amount representing the overheads associated with providing one service instance of a service per charge interval

    charge

    A generic term to indicate some money payable by the consumer of service instances to the provider of those instances

    usage_col

    The name of the column in the usage data from which the number of units consumed can be derived

    interval

    The charging interval for the service, such as 'daily', 'monthly' etc.

    proration or model

    Whether the service is prorated or unprorated

    rate type

    Which (if any) of rate and fixed rate to apply

    cogs type

    Which (if any) of cogs and fixed cogs to apply

    cogs_col

    The name of a column containing the COGS cost per unit

    fixed_cogs

    As for fixed_price but for the cost of delivering the service

    fixed_cogs_col

    The name of a column containing the fixed_cogs prices

    effective_date

    A date in yyyyMMdd format (stored internally as an integer) from which the rate is valid

    minimum commit

    The minimum commit value for the service (if this is 0 then no minimum commit is applied)

    Azure CSP

    Introduction

When deploying the Azure CSP Extraction template for Exivity, some configuration is required within your Microsoft Cloud Solution Provider Portal. The following process must be completed in order to report on Azure CSP consumption:

    1. Create a Partner Center Web Application

    2. Configure Extractors for Azure CSP Usage, Billing & Ratecard

    3. Configure Transformers

    4. Create your Report

    5. Create your Workflows

It is necessary to create independent Extractors/Transformers for Usage and Billing. The Usage Extractor retrieves data on a daily basis, giving an estimate of your daily costs, while the Billing Extractor consolidates the rates based on the blended costs per service for the billing period.

    Create a Partner Center Web Application

Perform the following steps to create the Azure AD web application configured to access the Partner Center API:

• Browse to Partner Center at https://partnercenter.microsoft.com and log in using credentials that have admin agent and global admin privileges

• Navigate to Dashboard –> Account Settings –> App Management

• Click Add key to create a new Application Key for your App ID that can be used with Exivity.

    Make sure to write down the App ID and its corresponding Key, since you will need these when configuring the Extractor later.

    • Go to the Billing section.

    • Open the last month's invoice in pdf format.

    • Take note of your billing period

    Make sure to write down the billing period, since you will need it when configuring the Extractor later.

    Configure Extractors for Azure CSP Usage, Billing & Ratecard

    Go into the Exivity GUI and browse to Data Sources -> Extractors. Then click on Create Extractor and you should get a list of templates. Unfold Azure CSP and pick the usage template:

After selecting the template, click the green Create button on the bottom right. Now make sure to give the new Extractor a name in the field at the top:

    Now click again on the green Create button at the bottom right. Then click on the Variables menu item:

Now make sure to fill in your Client ID, Secret and your onmicrosoft.com domain. If required, you can encrypt security-sensitive fields using the lock button to the right of each field. Once you have filled in these details, click the Update button.

    Now test the Extractor by going into the Run tab and providing a from and to date like in this example:

    Now click the Run Now button and confirm it works as expected:

    Create a second extractor using the template Azure_CSP_Invoice_Extractor and give it a name.

This extractor uses the same variables in the Variables menu item as the previous extractor. You can now test the extractor by going to the Run tab. This script uses 3 arguments:

• Positive offset: 0 retrieves the most recent invoice, 1 retrieves the previous one, and so on.

    • Year of the report: Year of the report you want to retrieve.

• Starting day of the billing period: if your billing period runs from the 22nd to the 21st, the input will be 22.

After filling in the arguments you can test the extractor by clicking Run Now.

Finally, follow the same steps for the Azure Rate Card Extractor; this extractor does not need any arguments.

    Configure Transformers

Once you have successfully run your Azure CSP Usage, Billing & Rate Card Extractors, you can create the Transformer templates via Data Sources -> Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. You will need to create two separate Transformers using these two templates:

    The Azure_CSP_Daily-Usage Transformer will transform the daily usage data and the Azure_CSP_End-of-Month Transformer will consolidate the usage with the final blended rates.

Make any changes that you feel necessary and then select the Run tab to execute it for a single day as a test. Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for, and select the same day for which you extracted consumption data in the previous step.

    Create a Report

Once you have run both your Extractor and Transformer successfully, create a Report Definition via the menu option Reports > Definitions:

    Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.

    Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.

    Create your Workflows

You may want to automate the CSP ETL process; you can achieve this by leveraging Exivity's Workflow capabilities. You will create two Workflows: one will run on a daily basis, calculating the usage of your CSP subscriptions, and the other will run on a monthly basis to consolidate the service rates.

    Start by browsing to Administration -> Workflows in the Exivity GUI and click on the +Create button.

Fill in the Name and Description fields, and in the SCHEDULES section configure the workflow to run on a daily basis at a convenient time.

In the STEPS section you can create as many steps as needed by adding them with the + button. For the first, daily workflow, a minimum of 4 steps is required: two steps for the Usage and Ratecard Extractors, one for the Transformer and one for the Report. Make sure to input the right FROM and TO date offsets. Click on Update to finish the creation of your first Workflow.

Create a second Workflow, fill in the Name and Description fields and configure the SCHEDULES section to run on a monthly basis, preferably 2 to 3 days after your monthly billing period has finished. For the monthly workflow, a minimum of 3 steps is required: one step for the Billing Extractor, one for the Transformer and one for the Report. Make sure to input the right FROM and TO date offsets (to cover the entire billing period) and arguments. Click on Update to finish the creation of the second Workflow.

    service

This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity.

    Overview


    SubscriptionID,ServiceName,Quantity
    FE67,StorageGB,30
    1377,Small_VM,2
    EDED,Medium_VM,8
    8E1B,Large_VM,1
    99AA,Small_VM,99
    SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
    FE67,StorageGB,30,,,,
    1377,Small_VM,2,,,,
    EDED,Medium_VM,8,,,,
    8E1B,Large_VM,1,,,,
    99AA,Small_VM,99,,,,
    SubscriptionID,ServiceName,Quantity,StorageGB,Small_VM,Medium_VM,Large_VM
    FE67,StorageGB,30,30,,,
    1377,Small_VM,2,,2,,
    EDED,Medium_VM,8,,,8,
    8E1B,Large_VM,1,,,,1
    99AA,Small_VM,99,,99,,
    import system/extracted/Services.csv source custom
    import usage from Azure
    default dset Azure.usage
    create columns from custom.Services.ServiceDefinitions
    name,user_id,department
    Eddy,123-456-123456,Development
    Tim,654-321-654321,Project Management
    Joram,555-222-999111,Development
    Joost,826-513-284928,Sales and Marketing
    # Create a new column called 'key' which combines the 'department'
    # with the middle three digits of the 'user_id', separated by :
    
    create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
    
    # Result:
    name,user_id,department,key
    Eddy,123-456-123456,Development,Development:456
    Tim,654-321-654321,Project Management,Project Management:321
    Joram,555-222-999111,Development,Development:222
    Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
    # Create a new column called 'key' which combines the 'department'
    # and 'user_id' columns separated by ":", with prefix
    
    create mergedcolumn key separator : from string prefix department user_id
    
    # Result:
    name,user_id,department,key
    Eddy,123-456-123456,Development,prefix:Development:123-456-123456
    Tim,654-321-654321,Project Management,prefix:Project Management:654-321-654321
    Joram,555-222-999111,Development,prefix:Development:555-222-999111
    Joost,826-513-284928,Sales and Marketing,prefix:Sales and Marketing:826-513-284928
    name,user_id,department
    Eddy,123-456-123456,Development
    Tim,654-321-654321,Project Management
    John,xxx-xxx-xxxxxx,Pending
    Joram,555-222-999111,Development
    Joost,826-513-284928,Sales and Marketing
    # Create a new column called 'key' which combines the 'department'
    # with the middle three digits of the 'user_id', separated by :
    
    create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
    
    # Result:
    name,user_id,department,key
    Eddy,123-456-123456,Development,Development:456
    Tim,654-321-654321,Project Management,Project Management:321
    John,xxx-xxx-xxxxxx,Pending,Pending
    Joram,555-222-999111,Development,Development:222
    Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
    option merge_nomatch = [none]
    create mergedcolumn key separator : from department user_id /[0-9]{3}-([0-9]{3})/
    
    # Result:
    name,user_id,department,key
    Eddy,123-456-123456,Development,Development:456
    Tim,654-321-654321,Project Management,Project Management:321
    John,xxx-xxx-xxxxxx,Pending,Pending:[none]
    Joram,555-222-999111,Development,Development:222
    Joost,826-513-284928,Sales and Marketing,Sales and Marketing:513
    # Convert the automatic NEWLINE variable to be public
    public var NEWLINE = ${NEWLINE}
    var empty_var
    print Variable value: "${empty_var}"
    Variable value:
    var x = 000
    var x += 5    # Result is 005
    var x += 10   # Result is 015
    var x += 100  # Result is 115
    var x += 1000 # Result is 1115
        var myvar
        if (${myvar.LENGTH} == 0) {
            print The variable 'myvar' is empty
        } else {
            print The variable 'myvar' has a value of ${myvar}
        }
        # Declare a variable
        var name = value
    
        # If the value contains whitespace then it must be quoted or escaped
        var sentence = "This sentence is contained in a variable"
    
# Pathnames should be quoted to avoid any instances of '\t' being expanded to tabs
        var exportfile = "C:\exivity\collected\Azure\customers.csv"
        # ---- Start Config ----
        encrypt var username = admin
        encrypt var password = topsecret
        var server = "http://localhost"
        var port = 8080
        var api_method = getdetails
        # ---- End Config ----
    
        set http_authtype basic
        set http_username ${username}
        set http_password ${password}
    
        buffer {response} = http GET ${server}:${port}/rest/v2/${api_method}
    userportal.conf
    server {
        listen                  443 http2 ssl;
        server_name             localhost;
    
        ssl_certificate         exivity.crt;
        ssl_certificate_key     exivity.key;
    
        error_page              497 301 =307 https://$host:$server_port$request_uri;
    
        root                    web/glass;
        index                   index.html;
    
        charset                 utf-8;
    
        location ~ /v[0-9]+?/ {
            proxy_pass          https://HOSTNAME_BACKEND_PORTAL:8002;
            proxy_buffering     off;
        }
        
        location / {
            proxy_pass          https://127.0.0.1:8001;
            proxy_buffering     off;
        }
    
        access_log              logs/access-webproxy.log;
        error_log               logs/error-webproxy.log error;
    
        location = /favicon.ico {
            access_log          off;
            log_not_found       off;
        }
        location = /robots.txt  {
            access_log          off;
            log_not_found       off;
        }
    
        client_max_body_size    100m;
    }
    config.json
    {
      "whiteLabel": false,
      "apiHost": "https://HOSTNAME_USER_PORTAL"
    }
    
The service statement is used to create or modify a single service definition during the execution of a Transcript task. The service definition is associated with the data in an existing DSET, and the global database is updated if and when the Transcript task successfully completes.

    Syntax

service { param1 = value [... paramN = value] }

    Example:
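A minimal sketch (the key, description, category, usage column and rate values below are hypothetical; the parameters themselves are described in the table that follows):

    service {
        key = "small_vm"
        description = "Small Virtual Machine"
        category = "Virtual Machines"
        usage_col = "Small_VM"
        interval = monthly
        model = unprorated
        unit_label = "Instances"
        rate = 2.5
        effective_date = 20180101
    }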

    Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

    Details

    Summary

    The service statement is used to create a new service definition. Once created, a service definition automatically becomes available to the reporting engine.

    Parameter table

    The service statement creates a new service using the following parameters:

    Parameter

    Required

    Notes

    key

    Yes

    A unique identifier for the service

    description

    No

    The name of the service as it will appear on a report

    category or group

    No

    Services are grouped by category on a report

    usage_col

    Parameter details

    key

The key parameter must be distinct from the key of any other service unless it is used to identify an existing service definition to be overwritten with new values. By default, an attempt to create a service with a key that duplicates that of an existing service in the global database will result in a warning in the logfile and no further action will be taken.

    If the key matches that of a service that has been defined previously in the task file then one of two things can happen depending on the value of the current execution mode:

    1. If the mode is set to strict then an error is logged and the task will fail

    2. If the mode is set to permissive then a warning is logged and the newest service definition is ignored

    To override the default protection against overwriting existing service definitions in the global database, the statement option services = overwrite should be invoked prior to the service statement.

    The value of the key parameter may be up to 127 characters in length. Longer names will be truncated (and as a result may no longer be unique).

    description

    The description parameter is freely definable and does not have to be unique. When a report is generated, the description of a service will be used on that report so care should be taken to make the description meaningful.

    By default, if description is not specified in a service definition then a copy of the key will be used as the description.

    The value of the description parameter may be up to 255 characters in length. Longer descriptions will be truncated.

    category / group

    Either category or group may be used. The two terms are interchangeable in this context.

    The category parameter is used to logically associate a service with other services. All services sharing the same category will be grouped together on reports. Any number of different categories may exist and if the category specified does not exist then it will be automatically created.

    If no category is specified then the service will be placed into a category called Default.

    The value of the category parameter may be up to 63 characters in length. Longer values will be truncated.

    usage_col

    In order to calculate a price for a service, the number of units of that service that were consumed needs to be known. The value of the usage_col parameter specifies the column in the usage data which contains this figure.

    The usage_col argument is also used to derive the DSET associated with the service as follows:

    • If the value of usage_col is a fully qualified column name then the service will be associated with the DSET identified by that name

    • If the value of usage_col is not fully qualified then the service will be associated with the default DSET

    If the column specified by the usage_col argument is determined not to exist using the above checks, then one of two things can happen depending on the value of the current execution mode:

    1. If the mode is set to strict then an error is logged and the task will fail

    2. If the mode is set to permissive then a warning is logged and the service definition is ignored

    The value of the usage_col parameter may be up to 255 characters in length. Longer values will be truncated.

    interval

    Services may be charged in different ways. For example some services will invoke a charge whenever they are used whereas others are charged by a time interval, such as per month.

    The value of the interval parameter determines how the service should be charged as per the following table:

    Interval value

    Meaning

    individually

    Every unit of consumption is charged individually. For example if a network-related service charges by the GB, then for every GB seen in the usage the charge is applied regardless of how many GB are used or over how long a period of time the consumption took place.

    hourly

    The charge is applied once per hour regardless of the number of times within the hour the service was consumed

    daily

    As for hourly but applied per-day

    monthly

    As for daily but applied per calendar month

    model

    Specifies whether or not to apply proration when calculating the charge for a monthly service.

    Currently, only monthly services can be prorated, based on the number of days in the month the service was used.

    Model value

    Meaning

    unprorated

    No proration is to be applied

    prorated

    Proration will be applied
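For example (illustrative numbers): a prorated monthly service with a rate of 30.0 per unit, where one unit is used on 10 days of a 30-day month, results in a charge of 30.0 × 10/30 = 10.0, whereas an unprorated service would be charged the full 30.0 for that month.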

    unit_label

    The unit_label is the label used for units of consumption on reports. For example storage-related services may be measured in Gb or Tb, Virtual Machines may be measured in Instances and software usage may be measured in licenses.

    The specified unit_label value may be up to 63 characters in length. Longer values will be truncated.

    If the unit_label parameter is not specified then a default value of Units will be used.

    account_id

    The account_id parameter is intended for internal use only and should not be used

    The optional account_id references an entry in the account table in the global database. If specified, the service will be created with a rate revision specific to the account id.

    The default rate for the service (which applies to all combinations of report column values not explicitly associated with their own rate) can be defined in any of the following ways:

    • By omitting the account_id parameter altogether

    • By specifying an account_id of 0

    • By specifying an account_id of *

    rate

    The rate parameter determines the cost per unit of consumption to use when calculating the charge. A rate of 0.0 may be used if there is no charge associated with the service.

    This may be used in conjunction with a fixed_price and one of cogs or fixed_cogs.

    Either or both of rate or fixed_price may be specified, but at least one of them is required.

    fixed_price

    The fixed_price is a charge applied to the service per charge interval regardless of the units of consumption.

    This may be used in conjunction with a rate and one of cogs or fixed_cogs

    Either or both of rate or fixed_price may be specified, but at least one of them is required.

    cogs

The optional COGS charge is the price to the provider of the service. Usually, COGS-related charges are not included on reports; special permissions are required to see the COGS charges.

    For services with a defined cogs value it is possible to generate Profit and Loss reports, the profit/loss being the total charge calculated from the rate and/or fixed_rate values minus the price calculated from the cogs or fixed_cogs values.

    fixed_cogs

    The optional fixed_cogs value is a fixed price to be factored into COGS-related calculations regardless of the number of units consumed in the charging interval.

    min_commit

    The optional min_commit parameter specifies a minimum number of units of consumption to include when calculating the charge associated with a service. In cases where the actual consumption is greater than the min_commit value this will have no effect, but where the actual consumption is less than the minimum commit the price will be calculated as if the min_commit units had been consumed.
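For example (illustrative numbers): with min_commit = 10 and a rate of 2.0, a consumption of 6 units is charged as if 10 units were consumed (10 × 2.0 = 20.0), whereas a consumption of 15 units is charged as normal (15 × 2.0 = 30.0).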

    The Service Definition cache

    Once all parameters have been evaluated by the service statement the resulting service definition is added to a cache in memory. This cache is committed to the global database at the successful conclusion of the Transcript task (or discarded in the case of error).

Once a service definition has been added to the cache, no subsequent service definitions with the same key may be added to the cache. In the event of a conflict the first definition written to the cache is preserved and the attempt to add a duplicate will result in a warning or an error depending on the value of the current execution mode as described in the key section above.

    Rate revisions

    As described in Services in Exivity a service may have multiple rate revisions associated with it. The service statement can only create a single rate revision per service per execution of a Transcript task.

    A rate revision consists of the rate, fixed_rate, cogs, fixed_cogs and effective_date parameters. Each revision must have a different effective_date which indicates the date from which that service revision is to be used. A rate definition remains in force for all dates on or after the effective_date, or until such time as a rate revision with a later effective_date is defined (at which point that revision comes into effect).

    To create multiple revisions, Transcript must be run multiple times using a service statement that has the same key and uses the same rate, fixed_rate, cogs and fixed_cogs values but has a different effective_date each time. For each of the effective_date parameters a new rate revision will be created for the service.

    Updating the Global Database

    At the successful conclusion of a Transcript task the global database is updated from the memory cache.

    Examples


    if

    Overview

    The if statement is used to conditionally execute one or more statements

    Syntax

    Conditional Expressions

A conditional expression (hereafter referred to simply as an expression) is evaluated to provide a TRUE or FALSE result which in turn determines whether one or more statements are to be executed. The following are examples of a valid expression:
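These are illustrative sketches (${DEBUG} and ${hostname} are hypothetical user-defined variables; @STRLEN is one of the functions described below):

    (${DEBUG} == 1)
    ("${hostname}" =~ /^vm-[0-9]+/)
    (@STRLEN("${hostname}") != 0)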

    An expression used by the if statement may contain:

    • Numeric and string literals

    • Regular expressions

    • Variables

    • Operators

    Numeric and string literals

    A literal is a specified value, such as 4.5 or "hostname". Literals may be numbers or strings (text).

    If a literal is non-quoted then it will be treated as a number if it represents a valid decimal integer or floating point number (in either regular or scientific notation), else it will be treated as a string.

If a literal is quoted then it is always treated as a string, thus 3.1415926 is a number and "3.1415926" is a string.

    Regular expressions

    Regular expressions must be enclosed within forward slashes (/), and are assumed to be in .

    If present, a regular expression must be used on the right hand side of either an !~ or an =~ operator, and when evaluated it will be applied to the value on the left hand side of an operator, eg:
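A sketch (the servicename and tier variables are hypothetical):

    if ("${servicename}" =~ /^Standard_/) {
        var tier = "Standard"
    }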

    As the forward slash is used as a delimiter for the expression, any literal forward slashes required by the expression should be escaped with a back-slash: \/

    Variables

Variables can be used within expressions, in which case they are replaced with their values. Once expanded, these values are treated as literals.

    Operators

    Operators are evaluated according to the operator precedence rules in the table below (where the highest precedence is evaluated first), unless parentheses are used to override them. Operators with the same precedence are evaluated from left to right.

Although expressions are evaluated based on the precedence of each operator as listed in the above table, it is recommended that parentheses are used within the expression in order to remove any ambiguity on the part of a future reader.

    Functions

    A function is used to evaluate one or more arguments and return a result which is then taken into consideration when evaluating the overall truth of the expression.

Function calls start with the character @, which is followed by the function name and a comma-separated list of parenthesised parameters, for example @MIN(1, 2, 3).

    Function names must be specified in UPPER CASE as shown in the examples below.

    The following functions are supported by the if statement:

    Numeric functions

    MIN

    Return the smallest number from the specified list (requires at least 2 arguments)

    Examples:

    • @MIN(1,2) returns 1

    • @MIN(1,2,-3) returns -3

    • @MIN(1,2,"-1")

    MAX

    Return the largest number from the specified list (requires at least 2 arguments)

    Examples:

    • @MAX(1,2) returns 2

    • @MAX(-1,-2,-3) returns -1

    • @MAX(1,2,100/10)

    ROUND

    Returns number rounded to digits decimal places. If the digits argument is not specified then the function will round to the nearest integer.

    This function rounds half away from zero, e.g. 0.5 is rounded to 1, and -0.5 is rounded to -1

    Examples:

    • @ROUND(3.1415,3) returns 3.142

    • @ROUND(3.1415,2) returns 3.14

• @ROUND(3.1415926536,6) returns 3.141593

    String functions

    CONCAT

    This function will treat all its arguments as strings, concatenate them and return the result.

    Examples:

    • @CONCAT("the answer ", "is") returns the answer is

    • @CONCAT("the answer ", "is", " 42") returns the answer is 42

    • @CONCAT("the answer ", "is", " ", 42)

    SUBSTR

Return a sub-string of string, starting from the character at position start and continuing either until the end of the string or for length characters, whichever is shorter.

    If length is omitted, then the portion of the string starting at position start and ending at the end of the string is returned.

    Examples:

    • @SUBSTR("abcdef", 1) returns abcdef

    • @SUBSTR("abcdef", 3) returns cdef

    • @SUBSTR("abcdef", 3, 2)

    STRLEN

    Returns the length of its argument in bytes.

    Examples:

    • @STRLEN("foo") returns 3

    • @STRLEN(@CONCAT("ab", "cd")) returns 4

    • @STRLEN(1000000) returns 7 (the number 1000000 is treated as a string)

    Date functions

    All date functions operate with dates in yyyyMMdd format

    CURDATE

    Returns the current (actual) date in the timezone of the Exivity server.

    DATEADD

    Adds a specified number of days to the given date, returning the result as a YYYYMMDD date.

    Invalid dates are normalised, where possible (see example below).

    Examples:

    • @DATEADD(20180101, 31) returns 20180201

    • @DATEADD(20180101, 1) returns 20180102

• @DATEADD(20171232, 1) returns 20180102

    DATEDIFF

    Returns the difference in days between two yyyyMMdd dates. A positive result means that date1 is later than date2. A negative result means that date2 is later than date1. A result of 0 means that the two dates are the same.

    Invalid dates are normalised, when possible (see example below):

    Examples:

    • @DATEDIFF(20190101, 20180101) returns 365

    • @DATEDIFF(20180201, 20180101) returns 31

• @DATEDIFF(20180102, 20180101) returns 1

    Transcript-specific functions

    Transcript-specific functions may be preceded with an exclamation mark in order to negate their output. For example:
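A sketch (using the Azure.usage DSET from earlier examples):

    if (!@DSET_EXISTS(Azure.usage)) {
        import usage from Azure
    }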

    FILE_EXISTS

    Returns 1 if the file filename exists, else returns 0.

    The FILE_EXISTS function will only check for the presence of files within the directories system or exported (as well as any sub-directories they contain) in the Exivity home directory.

    FILE_EMPTY

In strict mode, this function returns 1 if the file filename exists and is empty. If the file does not exist, then this is considered an error.

In permissive mode, a non-existent file is considered equivalent to an existing empty file.

    In either case, if the file exists and is not empty, the function returns 0

    DSET_EXISTS

    Returns 1 if the specified DSET exists, else 0

    DSET_EMPTY

In strict mode (option mode = strict), this function returns 1 if the specified DSET exists and is empty. If the DSET does not exist, then this is considered an error.

In permissive mode (option mode = permissive), a non-existent DSET is considered equivalent to an existing empty DSET.

    In either case, if the DSET exists and is not empty, the function returns 0.

    COLUMN_EXISTS

This function returns 1 if the specified column exists, else 0. The column name may be fully qualified, but if it is not, then it is assumed to be in the default DSET.

    aggregate

    Overview

    The aggregate statement is used to reduce the number of rows in a DSET while preserving required information within them

    Syntax

aggregate [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]

    Details

    The aggregate statement is a powerful tool for reducing the number of rows in a DSET. Aggregation is based on the concept of matching rows. Any two rows that match may be merged into a single row which selectively retains information from both of the original rows. Any further rows that match may also be merged into the same result row.

    A quick introduction

    A match is determined by comparing all the columns which have a function of match associated with them (further information regarding this can be found below). If all the column values match, then the rows are merged.

    Merging involves examining all the columns in the data that were not used in the matching process. For each of those columns, it applies a function to the values in the two rows and updates the result row with the computed result of that function. For a full list of functions, please refer to the table further down in this article.

    To illustrate this consider the following two row dataset:
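As a sketch (the values are hypothetical), imagine two records that share the same id and location but differ in colour and quantity:

    id,location,colour,quantity
    1492,Amsterdam,red,3
    1492,Amsterdam,blue,2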

    If we don't care about the colour value in the above records, we can combine them together. We do care about the quantity however, so we'll add the two values together to get the final result.

    The statement to do this is:

    aggregate notime id match location match quantity sum

    • id match means that the values in the id columns must be the same

    • location match means that the values in the location columns must be the same

    • quantity sum means that the resulting value should be the sum of the two existing values

    Applying these rules to the above example we get the following single result record:
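    id,colour,location,quantity,EXIVITY_AGGR_COUNT
    1234,blue,europe,10,2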

    A column called EXIVITY_AGGR_COUNT is automatically created by the aggregate statement, and for each row in the output it will contain the number of source rows that were merged together to create that result row.

    Parameters

    The aggregate statement accepts a range of parameters as summarised in the table below:

    If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:

    Non time-sensitive aggregation

    When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.

    In this case when comparing two rows to see if they can be merged, the aggregation function simply checks to see if all the columns with a function of match are the same, and if they are the two rows are merged into one by applying the appropriate function to each column in turn.

    De-duplication

    The following illustrates the aggregate statement being used to remove duplicate rows from a DSET:
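    # The first column in the DSET being aggregated
    # is called subscription_id

    aggregate notime default_function match subscription_id match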

    The analysis of the statement above is as follows:

    • notime - we are not interested in timestamps

    • default_function match - by default every column has to match before records can be aggregated

    • subscription_id match - this is effectively redundant as the default_function is match but needs to be present because at least one pair of colname function parameters is required by the aggregate statement

    The resulting DSET will have no duplicate data rows, as each group of rows whose column values were the same has been collapsed into a single record.

    Row reduction while preserving data

    The example shown at the top of this article used the sum function to add up the two quantity values, resulting in the same total at the expense of being able to say which source record contributed which value to that total.

    The sum function can therefore accurately reflect the values in a number of source rows, albeit with the above limitation. By using a function of sum, max or min, various columns can be processed by aggregate in a meaningful manner, depending on the specific use case.
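    For example, the following statement (the column names here are purely illustrative) sums a quantity column while keeping the largest value seen in a peak_usage column:

    aggregate notime id match location match quantity sum peak_usage max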

    Time-sensitive aggregation

    When aggregating, columns containing start time and end time values in UNIX epoch format can be specified. Each record in the DSET therefore has start and end time markers defining the period of time that the usage in the record represents. As well as taking the start times and end times into account, time-sensitive aggregation can perform additional manipulations on these start and end times.

    A quick example

    Consider the following CSV file called aggregate_test.csv:

    It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:

    Resulting in:

    As can be seen, for each unique combination of the values in the id, subscription_id and service columns, the start and end times have been adjusted as described above and the quantity column contains the sum of all the values in the original rows.

    !!! warning When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.

    Further notes

    The daily parameter to aggregate means that the START_TIME and END_TIME columns are now recognised as containing timestamps. When aggregating with the daily option, timestamps within the current dataDate are combined to result in an output record which has the earliest start time and the latest end time seen within the day.

    Optionally, following daily an offset may be specified as follows:

    aggregate aggr.test daily offset 2 id match subscription_id match quantity sum

    In this case the start and end timestamps are adjusted by the number of hours specified after the word offset before aggregation is performed. This permits processing of data which has timestamps with timezone information in them, and which may start at 22:00:00 of the first day and end at 21:59:59 of the second day, as an offset can be applied to realign the records with the appropriate number of hours to compensate.

    The nudge parameter shaves 1 second off end times before aggregating in order to avoid conflicts where hourly records start and end on 00:00:00 (the last second of the current hour is the same as the first second of the next hour).

    Parslets

    Overview

    After populating a named buffer with data from an external source such as an HTTP request or a file, it is often necessary to extract fields from it for uses such as creating subsequent HTTP requests or rendering output files.

    This is accomplished using parslets. There are two types of parslet, static and dynamic. In both cases, when a parslet is used in a script it is expanded such that it is replaced with the value it is referencing, just like a variable is.

    Archive

    v1.8.1

    June 06, 2018

    New features

    • Minimum commit is now supported in the charge engine When generating a report, the results for any services that have a minimum commit value (and for which the usage does not meet that minimum commit quantity) will be adjusted to reflect that minimum commit value.

    Amazon AWS CUR

    Pre-requisites

    This tutorial assumes that you have CUR (Cost and Usage Report) set up in your AWS environment. In the event that this is not the case, please follow the steps in Turning on the AWS Cost and Usage Report before proceeding.

    Please note that in order to deploy this solution the S3 bucket to which CUR reports are written must reside in one of the following AWS regions:

    service {
        key = "A1 VM"
        usage_col = "A1 VM - EU North"
        description = "A1 VM (EU North)"
        category = "Virtual Machines"
        interval = monthly
        model = unprorated
        unit_label = "Instances"
        rate = 0.8
        fixed_price = 10
        cogs = 45
        fixed_cogs = 16
        min_commit = 4
    }
    service {
        key = "Azure A1 VM"
        usage_col = "A1 VM - EU North"
        description = "Azure A1 VM (EU North)"
        category = "Virtual Machines"
        interval = monthly
        model = unprorated
        unit_label = "instances"
        min_commit = 2
        rate = 0.8
        fixed_price = 0.2
        cogs = 0.55
    }
    if (conditional expression) {
        <statements ...>
    } [else {
        <statements ...>
    }]

    Required: Yes - The name of the column containing the number of units consumed

    interval - Required: No - The charging interval for the service

    model - Required: No - Whether the service charges should be prorated or not

    unit_label - Required: No - The label for the units of consumption; eg: GB for storage

    account_id - Required: No - Which account to associate with the rate revision

    rate - Required: No - The price per unit of consumption

    fixed_price - Required: No - The fixed price which will be charged regardless of non-zero usage in the charge interval

    cogs - Required: No - The COGS price per unit of consumption

    fixed_cogs - Required: No - The fixed COGS price which will be charged regardless of non-zero usage in the charge interval

    min_commit - Required: No - The minimum number of units of consumption that will be charged per interval

    effective_date - Required: No - The date from which the rate revision should be applied

    Functions

    • returns -1 - string "-1" is converted to number -1

    • @MIN(1,2,3/6) returns 0.5

    • @MIN(1,2,"3/6") returns 1 - string "3/6" is converted to number 3, up to first invalid character

    • @MIN(1,2,"zzz") returns 0 - string "zzz" is converted to number 0

    • returns 10

    • returns 3.141593

    • @ROUND(3.1415) returns 3

    • @ROUND(2.71828) returns 3

    • returns the answer is 42

    • returns cd

    • @SUBSTR("abcdef", 3, 64) returns cdef

    • returns 20180102 (the invalid date 20171232 is normalised to 20180101)

    • @DATEADD(20180101, 365) returns 20190101

    • @DATEDIFF(20180102, 20180101) returns 1

    • @DATEDIFF(20180101, 20180102) returns -1

    • @DATEDIFF(20180101, 20180101) returns 0

    • @DATEDIFF(20171232, 20180101) returns 0 (the invalid date 20171232 is normalised to 20180101)

    Precedence   Operator   Meaning
    1            !          Unary negation
    2            *          Multiplication
    2            /          Division
    2            %          Modulo
    3            +          Addition
    3            -          Subtraction
    4            <          Less than
    4            <=         Less than or equal to
    4            >          Greater than
    4            >=         Greater than or equal to
    5            ==         Is equal to
    5            !=         Is not equal to
    5            =~         Matches regular expression
    5            !~         Does not match regular expression
    6            &&         Boolean AND
    7            ||         Boolean OR

    Parameter - Notes

    dset.id - If not specified then the default DSET will be used

    notime - (Either notime or daily is required) If used, timestamps in records are not taken into consideration when aggregating

    daily - (Either notime or daily is required) If used, specifies that timestamps in the records will be considered when aggregating

    offset - (May only be used if daily is present) The number of hours to shift timestamps by prior to aggregation

    nudge - (May only be used if daily is present) If present, the times in the timestamp column marked as the end time column will have 1 second shaved off them prior to aggregation

    default_function - Specifies the default logic to apply to a column when merging records together. If not specified then the default is first (see table below)

    colname function - One or more pairs of column + function parameters. For each pair, the specified function will be used for the specified column name when merging records together during the aggregation process. For any columns not explicitly named, the default function will be applied (by default, a function of first is applied to the columns, such that the original row retains its value).

    Function - Logic

    match - The value in both records must be the same

    first - The existing value in the first ever result record will be used

    last - The value in the last record merged will be used

    sum - The values will be treated as numbers and summed

    max - The values will be treated as numbers and the greatest will be used

    min - The values will be treated as numbers and the smallest will be used

    longest - Whichever value has the most characters in it will be used

    shortest - Whichever value has the least characters in it will be used

    blank - The value in the resulting merged record will be blank

    avg - The values will be treated as numbers and the average will be used

    (${dataDate} == 20180801)
    
    ((${dataDate} >= 20180801) && ([hostname] == "templateVM"))
    if (${dataDate} =~ /[0-9]{4}01/) {
        var first_day_of_month = yes
    } else {
        var first_day_of_month = no
    }
    @MIN(number, number [, number ...])
    @MAX(number, number [, number ...])
    @ROUND(number [, digits])
    @CONCAT(string1, string2 [, stringN ...])
    @SUBSTR(string, start [, length])
    @STRLEN(string)
    @CURDATE()
    @DATEADD(date, days)
    if (!@COLUMN_EXISTS("colName")) {
       The column colName does NOT exist
    }
    @FILE_EXISTS(filename)
    @FILE_EMPTY(filename)
    @DSET_EXISTS(dset.id)
    @COLUMN_EXISTS(column_name)
    id,colour,location,quantity
    1234,blue,europe,4.5
    1234,green,europe,5.5
    id,colour,location,quantity,EXIVITY_AGGR_COUNT
    1234,blue,europe,10,2
    # The first column in the DSET being aggregated
    # is called subscription_id
    
    aggregate notime default_function match subscription_id match
    startUsageTime,endUsageTime,id,subscription_id,service,quantity
    2017-11-03:00.00.00,2017-11-03:02.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:03.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
    2017-11-03:00.00.00,2017-11-03:04.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:05.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:06.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:07.00.00,ID_1234,SUB_abcd,Large VM,2
    2017-11-03:00.00.00,2017-11-03:02.00.00,ID_3456,SUB_efgh,Large VM,2
    2017-11-03:00.00.00,2017-11-03:03.00.00,ID_3456,SUB_efgh,Medium VM,2
    2017-11-03:00.00.00,2017-11-03:04.00.00,ID_3456,SUB_efgh,Large VM,2
    2017-11-03:00.00.00,2017-11-03:05.00.00,ID_3456,SUB_efgh,Large VM,2
    2017-11-03:00.00.00,2017-11-03:07.00.00,ID_3456,SUB_efgh,Large VM,2
    2017-11-03:00.00.00,2017-11-03:06.00.00,ID_3456,SUB_efgh,Medium VM,2
    import system/extracted/aggregate_test.csv source aggr alias test
    
    var template = YYYY.MM.DD.hh.mm.ss
    timestamp START_TIME using startUsageTime template ${template}
    timestamp END_TIME using endUsageTime template ${template}
    timecolumns START_TIME END_TIME
    delete columns startUsageTime endUsageTime
    
    aggregate aggr.test daily nudge default_function first id match subscription_id match service match quantity sum
    
    timerender START_TIME as FRIENDLY_START
    timerender END_TIME as FRIENDLY_END
    id,subscription_id,service,quantity,START_TIME,END_TIME,EXIVITY_AGGR_COUNT,FRIENDLY_START,FRIENDLY_END
    ID_1234,SUB_abcd,Large VM,12,1509667200,1509692399,6,20171103 00:00:00,20171103 06:59:59
    ID_3456,SUB_efgh,Medium VM,6,1509667200,1509688799,3,20171103 00:00:00,20171103 05:59:59
    ID_3456,SUB_efgh,Large VM,8,1509667200,1509692399,4,20171103 00:00:00,20171103 06:59:59

    • Static parslets refer to a fixed location in XML or JSON data

    • Dynamic parslets are used in conjunction with foreach loops to retrieve values when iterating over arrays in XML or JSON data

    Parslets can be used to query JSON or XML data. Although JSON is used for illustrative purposes, some additional notes specific to XML can be found further down in this article.

    A quick JSON primer

    Consider the example JSON shown below:

    The object containing all the data (known as the root node) contains the following children:

    Child      Type
    title      string
    heading    object
    items      array

    Objects and arrays can be nested to any depth in JSON. The children of nested objects and arrays are not considered as children of the object containing those objects and arrays, i.e. the children of the heading object are not considered as children of the root object.

    Every individual 'thing' in JSON data, regardless of its type is termed a node.

    Although different systems return JSON in different forms, the JSON standard dictates that the basic principles apply universally to all of them. Thus, any valid JSON may contain arrays, objects, strings, boolean values (true or false values), numbers and null children.

    It is often the case that the number of elements in arrays is not known in advance, therefore a means of iterating over all the elements in an array is required to extract arbitrary data from JSON. This principle also applies to objects, in that an object may contain any number of children of any valid type. Valid types are:

    Type - Description

    object - A node encompassing zero or more child nodes (termed children) of any type

    array - A list of children, which may be of any type (but all children in any given array must be of the same type)

    string - Textual data

    number - Numeric data, may be integer or floating point

    boolean - A true or false value

    Some systems return JSON in a fixed and predictable format, whereas others may return objects and arrays of varying length and content. The documentation for any given API should indicate which fields are always going to be present and which may or may not be so.

    Parslets are the means by which USE locates and extracts fields of interest in any valid JSON data, regardless of the structure. For full details of the JSON data format, please refer to http://json.org

    Static parslets

    Static parslets act like variables in that the parslet itself is expanded such that the extracted data replaces it. Static parslets extract a single field from the data and require that the location of that field is known in advance.

    In the example JSON above, let us assume that the data is held in a named buffer called example and that the title and heading children are guaranteed to be present. Further, the heading object always has the children category and finalised. Note that for all of these guaranteed fields, the value associated with them is indeterminate.

    The values associated with these fields can be extracted using a static parslet which is specified using the following syntax:

    $JSON{buffer_name}.[node_path]

    Static parslets always specify a named buffer in curly braces immediately after the $JSON prefix

    The buffer_name is the name of the buffer containing the JSON data, which must have previously been populated using the buffer statement.

    The node_path describes the location and name of the node containing the value we wish to extract. Starting at the root node, the name of each node leading to the required value is specified in square brackets. Each set of square brackets is separated by a dot.

    The nodepaths for the fixed nodes described above are therefore as follows:

    Nodepath                  Referenced value
    .[title]                  Example JSON data
    .[heading].[category]     Documentation
    .[heading].[finalised]    true

    Putting all the above together, the parslet for locating the category in the heading is therefore:

    $JSON{example}.[heading].[category]

    When this parslet is used in a USE script, the value associated with the parslet is extracted and the parslet is replaced with this extracted value. For example:

    print $JSON{example}.[heading].[category]

    will result in the word Documentation being output by the statement, and:

    var category = $JSON{example}.[heading].[category]

    will create a variable called category with a value of Documentation.

    Currently, a parslet must be followed by whitespace in order to be correctly expanded. If you want to embed the value into a longer string, create a variable from a parslet and use that instead:
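    var category = $JSON{example}.[heading].[category]
    var filename = JSON_${category}_${dataDate}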

    When using JSON parslets that reference values that may contain whitespace it is sometimes necessary to enclose them in double quotes to prevent the extracted value being treated as multiple words by the script

    Anonymous JSON arrays

    It may be required to extract values from a JSON array which contains values that do not have names as shown below:

    Extraction of values that do not have names can be accomplished via the use of nested foreach loops in conjunction with an empty nodepath ([]) as follows:

    The result of executing the above against the sample data is:

    If the anonymous arrays have a known fixed length then it is also possible to simply stream the values out to the CSV without bothering to assign them to variables. Thus assuming that the elements in the metrics array always had two values, the following would also work:

    Which method is used will depend on the nature of the input data. Note that the special variable ${loopname.COUNT} (where loopname is the label of the enclosing foreach loop) is useful in many contexts for applying selective processing to each element in an array or object as it will be automatically incremented every time the loop iterates. See foreach for more information.

    Dynamic parslets

    Dynamic parslets are used in foreach loops to extract data from locations in the data that are not known in advance, such as when an array of unknown length is traversed in order to retrieve a value from each element in the array.

    A dynamic parslet must be used in conjunction with a foreach loop and takes the following form:
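    $JSON(loopName).[node_path]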

    Note the following differences between a static parslet and a dynamic parslet:

    1. A dynamic parslet does not reference a named buffer directly, rather it references the name of a foreach loop

    2. Parentheses are used to surround the name of the foreach loop (as opposed to curly braces)

    3. The nodepath following a dynamic parslet is relative to the target of the foreach loop

    The following script fragment will render the elements in the items array (in the example JSON above) to disk as a CSV file.

    In the example above, the first foreach loop iterates over the elements in the 'items' array, and each of the dynamic parslets extracts values from the current element in that loop. The dynamic parslets use the current element, this_item, as the root for their node paths.

    If a parslet references a non-existent location in the XML or JSON data then it will resolve to the value EXIVITY_NOT_FOUND

    XML parslets

    XML parslets work in exactly the same way that JSON parslets do, apart from the following minor differences:

    1. XML parslets are prefixed $XML

    2. When extracting data from XML, the foreach statement only supports iterating over XML arrays (whereas JSON supports iterating over objects and arrays)

    3. An XML parslet may access an XML attribute

    To access an XML attribute, the node_path should end with [@attribute_name] where attribute_name is the name of the attribute to extract. For example, given the following data in a buffer called xmlbuf:

    The following script:

    will produce the following output:

  • Updates to internal service and rate schema The rate attributes min_commit and threshold are now implemented and the API will return a slightly different schema for the /v1/services endpoint (and related /v1/dump models) - the rate_type attribute is now called charge_type. More information can be found at https://api.exivity.com/#0f8c2ac2-1f56-3c54-3000-210a5e94bd61

  • The charge engine now includes information about proration adjustments When applying proration to a monthly service, the charge engine will now include information in the raw report data which shows the amount that the unprorated charge was reduced by. This information will be used by the GUI in a future release.

  • Proration is now applied to monthly services where applicable Report results for monthly services that are flagged as being prorated will now reflect a percentage of the monthly charge, based on the number of days in the month that the service is used.

  • GUI preferences are now saved for each user For example, selected reports, date ranges and filters are now persisted for each user, so they can be restored after logging out and in again.

  • The charge engine can now execute a script passed to it via standard input The charge engine can now execute a reportfile passed to it via standard input. This internal change results in fewer temporary files on disk during normal use.

  • Error reporting can now be disabled in configuration

  • Transcript can now import usage data from existing RDFs The 'import' statement in Transcript can now retrieve the raw usage data from an existing RDF file.

  • Usernames are no longer case sensitive when logging in

  • Transformers now always run with loglevel = warn when triggered in workflows

  • Service and service category filters now only show items actually in the visible report

  • USE will now trap more HTTP errors When enacting some HTTP operations, if an error such as a timeout or invalid host is encountered, USE will now return an error in the HTTP_STATUS_CODE variable instead of automatically terminating the script.

  • Added daily usage information for monthly services in the charge engine When generating a report, the charge engine will now include information about the usage quantity for each day in the charge interval. This information will be used by the GUI in a future release.

  • Drilldown functionality is now available from the legend in reports

  • Reference account information in ETL Account information can now be imported directly during the data transformation step, such that existing account data can be used to enrich the data being processed.

  • Increased HTTP client timeout USE will now wait for three minutes by default before deciding that the connection has timed out if no data is received after the initial connection to a server has been made.

  • Improved the syntax for options to the 'import' statement in Transcript The options supported by the 'import' statement must now be formatted such that there is a single option per line of script. This removes the previous requirement to quote the list of column names when using 'select' and 'ignore', as well as the requirement to quote the expression used by the 'filter' option.

  • Added a system variable to return the last day of any given month A new system variable has been implemented which will return the last day in any given calendar month.

  • The 'correlate' transform now supports a default DSET for column names The 'correlate' statement always uses the default DSET as the destination for correlated columns but now supports an 'assume' parameter which determines the default DSET within which to locate non-fully-qualified source columns.

  • Added a button to the detailed widget in reports to toggle search field

  • Added an option to configuration to add a custom Google Analytics property

  • The charge engine can now be used to identify unused service definitions

    The charge engine now supports the ability to retrieve a list of services which are not used by any existing reports.

    Bug fixes

    • Fixed an issue where the depth filter wouldn't reload after preparing a report

    • Improved the print/PDF layout of consolidated invoices

    • Fixed a bug where the summary in Instance reports would sometimes remain empty

    • The charge engine now correctly deletes un-needed RDFs The charge engine now includes a mechanism to 'unload' historical data. This is an internal mechanism which will be used by the GUI in a future release.

    • Services for users with limited access to accounts are now filtered

    • When creating services, an instance_col parameter is now required Previously it was possible to create services with no instance_col specified, which would result in missing data in reports. Transcript now requires that an instance_col parameter is provided to the 'service' and 'services' statements.

    • Consolidated invoices can now be exported to PDF

    • Fixed an issue where in some circumstances the reports wouldn't load

    • It is now possible to view budget audit trails

    • The 'import' statement in Transcript now correctly imports usage data in all forms of the statement Fixed a bug whereby when using automatic source and alias tagging, the 'import' statement would not permit the importing of usage data from an existing RDF

    • Improved readability of text when a light background colour is chosen

    • USE will no longer reject some valid expressions In some cases, a valid expression in a script was rejected as having an unbalanced number of brackets. This has now been fixed.

    • The charge engine can now delete services associated with DSETs that are unused by any reports Fixed a bug where the charge engine would not correctly delete services if there were no RDF files for the DSET that the service is associated with.

    • The reset pins button has been moved to the top of the detailed widget in reports

    v1.7.0

    May 03, 2018

    New features

    • Implemented search field in report details table Ability to filter and pin a selection using a search query in the Accounts, Services and Instances report details table

    • Quantity adjustments can now be applied to a customer Adjustments can now also be set to affect quantities instead of charges. Both relative and absolute quantity adjustments are supported.

    • Ability to show consumed quantity in a report

    • Ability in transcript to convert number bases The following is now possible in a transcript: convert colName from dec|hex to dec|hex

    • It is now possible in the Invoice cost report to consolidate all child accounts on a single page

    • Added option to create workflow step which purges Proximity cache.

    • Beta version of budget manager & viewer is now available.

    Bug fixes

    • When encrypting a variable it could get corrupted

    • Transcript could previously crash when running for a large date range

    • Workflows status tab did not consistently show historical log files

    • Fix for Invoice report error "Depth can't be empty, 0 or greater than 5"

    v1.6.2

    April 13, 2018

    Bug fixes

  • Extractor arguments were not used correctly when running USE script interactively from GUI

  • Report timeline graph could previously show zero when there was consumption

      v1.6.1

    April 13, 2018

    New features

    • Add profile page where logged in users can change their own e-mail address and password.

    Bug fixes

    • Fixed issue where scheduling multiple steps could corrupt Workflow WARNING: as of this release it is required to re-create your Workflows from scratch, to avoid potential issues

    • Fix loading overlays to improve multitasking in GUI

    • Fixed an upgrade bug which caused creating report definitions to be broken

    • Ability to specify to and from dates for transformers in workflows.

      v1.6.0

    April 8, 2018

    Notable new features

    • [EXVT-971] - Implement day and month name variables in USE

    • [EXVT-973] - Add option to enable/disable client certificate support in USE

    • [EXVT-979] - Scheduler is now called workflows

    • [] - When using COLNAME_NOT_EXISTS in a filter, it always evaluates to 'TRUE'

    • [] - Internal Error when applying filters in a 'where' statement if import options are used

    • [] - Garbage collector

    • [] - Glass config file for default port/host

    • [] - Stacked bar chart option in accounts/services report, including optimized legend

    • [] - Extend `Run` tab in Transformer with `from` and `to` date

    • [] - Added Instance reports

    • [] - Make toggle so reports can go fullscreen

    • [] - Connect graphs + legend

    • [] - All lists in the front-end are now sorted alphabetically

    • [] - Ability to pin report items

    v1.5.0

    March 26, 2018

    Notable new features

    • [EXVT-370] - Add support for SAML Single Sign-On

    • [EXVT-849] - Add ability in transcript aggregate to average the values from a column

    • [EXVT-879] - Add ability to base new extractor on templates from GitHub repository

    • [] - Enhance the 'hash' statement in USE to support base-64 encoding of the result

    • [] - Fixed manually editing the value of an encrypted variable in USE can cause a crash

    • [] - Fixed Eternity hourly schedule does not consider start date

    • [] - Fixed OSI_TIME_UTC variable is missing a trailing Z

    • [] - Fixed some accounts show slight discrepancies when comparing to Excel calculation

    • [] - Fixed radio buttons don't update when changing adjustments in Glass

    v1.4.1

    March 19, 2018

    Notable new features

    • [EXVT-940] - Fixed duplicate headings not always eliminated in filtered import in Transcript

    • [EXVT-941] - Fixed a Transcript crash on 'move rows' or 'delete' after a 'replace'.

    v1.4.0

    March 16, 2018

    Notable new features

    • [EXVT-871] - Added 'include' statement to Transcript

    • [EXVT-932] - Added UTC versions of time-related variables in USE

    • [EXVT-105] - The API can now render an invoice report as a native PDF document

    • [] - Ability to change the service description via the GUI

    • [] - Made updating of Extractor variables more robust, and added support for encrypted variables in the GUI

    • [] - Added the ability to use wildcard in import statement in USE

    • [] - Added a daterange wrapper for Transcript

    v1.3.1

    February 23, 2018

    Notable new features

    • [EXVT-889] - Fixed a corner case where USE can stop working when executed from Glass

    v1.3.0

    February 23, 2018

    Notable new features

    • [EXVT-720] - Add option to choose custom currency symbol

    • [EXVT-741] - Move report selector to sidebar

    • [EXVT-847] - Improved syntax highlighting for USE and Transcript in the Glass script editor

    • [] - Implement rounding in Transcript

    • [] - Add additional checks to global conditions in Transcript

    • [] - Implement import filters in Transcript

    • [] - Add escaping option to import statement in Transcript

    • [] - Added scheduler interface

    • [] - Add report depth breadcrumbs to reports

    • [] - Extractor log is now shown when running on-demand through GUI

    • [] - Fixed an issue which caused small discrepancies when using different reporting definitions

    A full changelog is available upon request.

    v1.2.0

    February 09, 2018

    Notable new features

    • [EXVT-102] - Ability to extract data from databases using ODBC connection

    • [EXVT-693] - Scheduler endpoints in API

    • [EXVT-802] - Ability to schedule the preparation of report definitions through the GUI

    • [] - Enhanced conditional execution in Transcript with support for regex matching

    A full changelog is available upon request.

    v1.1.1

    February 03, 2018

    Notable new features

    • [EXVT-818] - Fix for Cannot read property 'relationships' of undefined error when logging in as a user with limited account permissions.

    A full changelog is available upon request.

    v1.1.0

    February 02, 2018

    Notable new features

    • [EXVT-253] - Syntax highlighting for USE

    • [EXVT-255] - Add support for XML data extraction in USE

    • [EXVT-600] - Enable parallel processing in Eternity

    • [] - Create USE script for reading AWS S3 bucket

    • [] - Extractor and Transformer execution must show last 25 lines of corresponding log file

    • [] - Perform cross-browser test and add warning in unsupported browsers.

    • [] - Improve orbit performance when syncing large amounts of records

    • [] - Select single days in datepicker

    • [] - Support in Eternity for hourly and monthly schedules

    A full changelog is available upon request.

    v1.0.0

    January 12, 2018

    Initial release.

    🥂

    • Northern Virginia
  • Ohio

  • Oregon

  • Mumbai

  • Seoul

  • Singapore

  • Sydney

  • Tokyo

  • Frankfurt

  • Ireland

  • London

    At this point in time, only the regions listed above have all the necessary services deployed.

    Introduction

    This tutorial shows how to build a serverless solution for querying the AWS CUR Report using Exivity. This solution makes use of AWS serverless services such as Lambda and Athena, as well as other commonly used services such as S3, CloudFormation, and API Gateway. The following topics will be covered:

    1. Solution Overview

    2. Launching the CloudFormation Template

    3. Creating the Lambda function and API Gateway

    4. Configuring an Extractor

    5. Configuring a Transformer

    6. Creating your Report

    Solution Overview

    The Billing and Cost Management service writes your AWS Cost and Usage report to the S3 bucket that you designated when setting up the service. These files can be written on either an hourly or daily basis.

    The CloudFormation template that accompanies this tutorial builds a Serverless environment containing a Lambda function which reads a CUR file, processes it and writes the resulting report to an output S3 bucket. The output data object has a prefix structure of "year=current-year" and "month=current-month". For example, if a file is written on 13/09/2018 then the Lambda function outputs an object called "bucket-name/year=2018/month=09/file_name".

    The next step in the template is to translate this processed report into Athena so that it can be queried. The following diagram shows the steps involved in the process:

    Afterwards, we will create a Lambda function to query the Athena database, returning a URL with the results of the query in CSV format. We will also create an API EndPoint with the AWS API Gateway service, which is used by Exivity to retrieve the data.

    Launching the CloudFormation template

    In order to deploy this solution successfully the following information is required:

    1. The name of your AWS Cost and Usage report.

    2. The name of the S3 bucket in which the reports are currently stored.

    Firstly, launch the CloudFormation template that builds all the serverless components that facilitate running queries against your billing data. When doing this, ensure that you choose the same AWS Region within which your CUR S3 bucket is located.

    Click the launch link for the region associated with the S3 bucket containing your CUR files (this tutorial uses Ireland (eu-west-1) for illustrative purposes, but all the supported regions work in the same way).

    • Ireland

    • Ohio

    • Oregon

    • Northern Virginia

    Now follow the instructions in the CloudFormation wizard, using the following options, and then choose Create.

    • For CostnUsageReport, type the name of your AWS Cost and Usage report.

    • For S3BucketName, type a unique name to be given to a new S3 bucket which will contain the processed reports.

    • For s3CURBucket, type the name of the bucket into which your current reports are written.

    While your stack is building, a page similar to the following is displayed.

    When the Status column shows CREATE_COMPLETE, you have successfully created four new Lambda functions and an S3 bucket into which your transformed bills will be stored.

    Once you have successfully built your CloudFormation stack, you can create a Lambda trigger that points to the new S3 bucket. This means that every time a new file is added to, or an existing file is modified in, the S3 bucket, the action will trigger the Lambda function.

    Create this trigger using the following steps:

    • Open the Lambda console.

    • Choose Functions, and select the aws-cost-n-usage-main-lambda-fn-A Lambda function (note: do not click the check box beside it).

    • There should be no existing triggers. Choose Trigger, Add trigger.

    • For Trigger type (the box with dotted lines), choose S3.

    • Select the S3 bucket within which your CUR reports are stored.

    • For Event type, choose Object Created (All) and check Enable trigger.

    • Click Submit.

    The database and table are not created until your function runs for the first time. Once this has been done, Athena will contain the database and table.

    Athena stores query results in S3 automatically. Each query that you run has a results file in CSV format and a metadata file (*.csv.metadata) that includes header information such as column type, etc.

    Testing (Optional)

    Once you have successfully added the trigger to the S3 bucket in which the Billing and Cost Management service writes your CUR reports, test the configuration using the following steps.

    • In the S3 path to which AWS writes your AWS Cost and Usage Billing reports, open the folder with your billing reports. There will be either a set of folders or a single folder with a date range naming format.

    • Open the folder with the date range for the current month. In this folder, there is a metadata file that can be found at the bottom of the folder. It has a JSON extension and holds the S3 key for the latest report.

    • Download the metadata file. Ensure that the name of the file on your machine is the same as the version stored on your S3 bucket.

    • Upload the metadata file to the same S3 path from which you downloaded it. This triggers the Lambda function aws-cost-n-usage-main-lambda-fn-A.

    • In the S3 bucket that you created to hold your processed files, choose the "year=" folder and then the "month=" folder that corresponds to the current month. You should see the transformed file there, with a time stamp indicating that it was just written.

    Creating the Lambda function and API Gateway

    To automate this process a CloudFormation template is provided. This template creates an IAM role and policy so that our API can invoke Lambda functions. It then creates a Lambda function capable of querying the previously created Athena serverless DB and saving the output to an S3 bucket in .csv format (this output will later be retrieved by Exivity). Finally, it deploys an API Gateway, giving us an endpoint for the Lambda function; this is the endpoint that the Exivity Extractor will consume. Make sure to launch this CloudFormation template in the same region as the previous one.

    Metamodel of the Implementation

    Let's start by downloading the CloudFormation template (you only need to choose one of the formats, both are supported by AWS):

    Then follow the next steps:

    • Go to the CloudFormation console.

    • Choose Create Stack.

    • Choose Upload a template to Amazon S3.

    • Select from your computer the template that you have downloaded.

    • Follow the CloudFormation wizard - Add a Name to the Stack and select I acknowledge that AWS CloudFormation might create IAM resources with custom names in the last step.

    • Once the stack is created you should see a CREATE_COMPLETE message.

    • Click on Output to take a note of your endpoint (you will need to input this in the Exivity extractor).

    Next, we will associate an API Gateway trigger to our Lambda function:

    • Go to the Lambda console.

    • Choose the QueryAthena2 function.

    • Under Add Triggers select API Gateway. You should see an image like the following:

    • Click on the API Gateway figure to configure it.

    • On API select QueryAthena2.

    • On Deployment Stage select v1.

    • On Security select Open.

    • Choose Add.

    • Choose Save.

    You should see a screen like this:

    Finally, we will deploy the API Gateway:

    • Go to the API Gateway console.

    • Choose QueryAthena2.

    • In the Resources section, click on the ANY method.

    • In Actions, choose Delete Method.

    • Click on Delete.

    • In the Resources section, choose Actions.

    • Click on Deploy API

    • In Deployment Stage select V1.

    • Add a Deployment Description.

    • Choose Deploy.

    Securing the API Gateway

    Initially, the created API endpoint is public and as such is vulnerable to the possibility of misuse or denial-of-service attacks. To prevent this, associate an API Key with the endpoint as per the following steps:

    • Inside the API Gateway dashboard, select the QueryAthena2 API

    • In Resources, select Method Request

    • In Settings, change API Key Required to True

    • Click on Actions and choose Deploy API to effect the change

    • In Deployment Stage, select v1 and click on Deploy

    • Go to the API Keys section

    • Click on Actions and select Create API Key

    • In Name write ExivityAPIKey

    • Click on Save

    • Copy the API Key, as this will be required by the Exivity configuration

    • Go to Usage Plan

    • Click on Create.

    • In Name write ExivityUsagePlan

    • In the Throttling Section, change Rate to 100 and Burst to 10

    • In the Quota Section, change it to 50000 requests per Month

    • Click on Next

    • Click on Add API Stage

    • In API, select QueryAthena2 and in Stage select v1

    • Confirm the changes and click on Next

    • Click on Add API Key to Usage Plan

    • Select ExivityAPIKey, confirm the changes

    • Click on Done

    The API Key is now required to access the API endpoint thus adding a layer of security to mitigate unauthorized access attempts.

    Configure Extractor

    To create the Extractor in Exivity, browse to Data Sources > Extractors and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For AWS, please click AWS_CUR_Extractor from the list. Provide a name for the Extractor in the name field, and click the Create button.

    Once you have created the Extractor, go to the first tab: Variables

    • In the Bucket variable specify the name of the S3 bucket where the .csv with the output of the query will be saved (The S3BucketName previously specified when launching the CloudFormation template).

    • In the Api endpoint variable specify the API endpoint previously created plus the route /QueryAthena.

    • In the DBname variable specify the name of your DB; you can find it in the Athena main Dashboard.

    • In the Tablename variable specify the name of the table inside your DB; you can find it in the Athena main Dashboard.

    • In the API_Key variable specify the API Key that we have created in the Securing API Gateway Section.

    Once you have filled in all details, go to the Run tab to execute the Extractor for a single day:

    The Extractor requires two parameters in yyyyMMdd format:

    • from_date is the date for which you wish to collect consumption data.

    • to_date should be the date immediately following from_date.

    These should be specified as shown in the screenshot above, separated with a space.
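    For example, to collect consumption data for 1 August 2018 (the dates shown are purely illustrative) you would enter:

    20180801 20180802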

    When you click the Run Now button, you should get a successful result.

    Configure Transformer

    Once you have successfully run your AWS CUR Extractor, you should be able to create a Transformer template via Data Sources > Transformers and click the Create Transformer button. Select the AWS CUR Transformer and run it for a single day as a test. Make sure that it is the same day as for which you extracted consumption data in the previous step.

    Create Report

    Once you have run both your Extractor and Transformer successfully, create a Report Definition via the menu Reports > Definitions:

    Select the column(s) by which you would like to break down the costs. Once you have created the report, click the Prepare Report button, after first making sure you have selected a valid date range in the date selector shown when preparing the report.

    Once this is done you should be able to run any of the Accounts, Instances, Services or Invoices report types located under the Reports menu for the date range you prepared the report for.

    Turning on the AWS Cost and Usage Report

    ExivityCURSolutionFinal.json (9KB)

    ExivityCURSolutionFinal.yaml (5KB)

    option

    Overview

    The option statement is used to set global parameters during the execution of a Transcript task.

    Syntax

    option option = setting

    option noquote

    Details

    The option statement can be used multiple times within the same task script and always takes immediate effect. It is therefore possible (for example) to import a CSV file delimited with commas and quoted with single quotes, change the options and then export it with semicolon delimiters and double quotes.

    The supported options are as follows:

    When using options, there must be whitespace on each side of the = sign

    Additional notes

    Continue

    option continue = yes|enabled

    option continue = no|disabled

    When executing a task file repeatedly against each day in a date range, by default Transcript will abort the whole run if a task failure occurs. In cases where this is undesirable, setting the continue option to enabled or yes (both work in exactly the same way) will change the behaviour such that if a task failure occurs then execution will resume with the next day in the range.

    When combining the continue option with option mode = permissive it is possible to process a range of dates for which usage or other data is not available, because the mode option will prevent a failed statement from being treated as a fatal error.
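    For example, placing the following near the top of a task makes a run over a date range tolerant of days with missing data:

    option continue = yes
    option mode = permissive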

    Delimiter / Separator

    When specifying a quote or tab as the separator it is necessary to escape it in order to prevent Transcript from interpreting it as a meaningful character during the parsing of the task script. For example:
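    A minimal sketch (the backslash escape forms shown here are assumed; check the options table for the exact escaping rules):

    option separator = \t
    option separator = \"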

    Execution mode

    Transcript supports two modes of execution for tasks:

    • In strict mode, if an error is encountered at any point in the task, the error will be logged and execution will terminate

    • In permissive mode, many errors that would otherwise have caused the task to fail will be logged, the statement that caused the error will be skipped and execution will continue from the next statement in the task.

    The mode option can be used multiple times and changed at any point during task execution. This means that specific sections of the task can be more error tolerant.

    Errors that can be handled in permissive mode are mainly syntax errors or those involving invalid parameters to Transcript statements. There are error conditions that can arise during the execution of a statement which will cause the task to fail, even in permissive mode.

    Log levels

    Transcript generates a considerable amount of logging information during the execution of a task. The loglevel option can be used to increase or decrease the level of detail written to the logfile. All logging levels must be specified in UPPER CASE. The following levels can be set:

    The order of the logging levels in the table above is significant, in that for any given level, all levels above it in the table are also in effect. Therefore a logging level of WARN will result in log entries for ERROR, FATAL, and INTERNAL level events, as well as warnings.

    The loglevel option can appear multiple times within a transcript task and will take immediate effect whenever it is used. This means that within a task, the loglevel can be increased for certain statements and reduced for others.

    Regardless of the logging level, some events will always create a logfile entry, for example the success or failure of a transcript task at the end of its execution.

    Log mode

    In order to minimise the effect on performance when logging, Transcript opens the logfile when it is first run and then holds it open until the completion of the task being executed. The logfile can be accessed in one of two modes:

    The default is SAFE. It is not recommended that this be changed.

    Examples

    The following Transcript task will import a CSV file quoted with double quotes and delimited with commas and then export a copy with semicolons as delimiters and quoted with single quotes:

    It also increases the logging level for the import statement
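    A minimal sketch of such a task follows. The file paths and DSET tags are illustrative, and the final line assumes the Transcript export statement; check the import and export references for the exact forms:

    # Import using the default options (comma delimiter, double quotes),
    # with extra logging detail for the import itself
    option loglevel = DEBUG
    import system/extracted/example.csv source demo alias data
    option loglevel = INFO

    # Switch to semicolon delimiters and single quotes before exporting a copy
    option delimiter = ;
    option quote = '
    export demo.data to exported/example_semicolons.csv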

    Language

    Syntax

    Within the individual reference articles for each statement, the syntax is described using the following conventions:

    • bold for keywords

    var category = $JSON{example}.[heading].[category]
    var filename = JSON_${category}_${dataDate}
    {
      "data": {
        "result": [
          {
            "account": {
              "name": "account_one"
            },
            "metrics": [
              [
                34567,
                "partner"
              ],
              [
                98765,
                "reseller"
              ]
            ]
          },
          {
            "account": {
              "name": "account_two"
            },
            "metrics": [
              [
                24680,
                "internal"
              ],
              [
                13579,
                "partner"
              ]
            ]
          }
        ]
      }
    }
    buffer json_data = FILE system/extracted/json.json
    
    csv OUTFILE = system/extracted/result.csv
    csv add_headers OUTFILE account related_id type
    csv fix_headers OUTFILE
    
    foreach $JSON{json_data}.[data].[result] as this_result {
    
    	# Extract the account name from each element in the 'result' array
    	var account_name = $JSON(this_result).[account].[name]
    
    	print Processing namespace: ${account_name}
    
    	# Iterate over the metrics array within the result element
    	foreach $JSON(this_result).[metrics] as this_metric {
    
    	# As the metrics array contains anonymous arrays we need to iterate
    	# further over each element. Note the use of an empty nodepath.
    
    		foreach $JSON(this_metric).[] as this_sub_metric {
    			if (${this_sub_metric.COUNT} == 1) {
    				# Assign the value on the first loop iteration to 'related_id'
    				var related_id = $JSON(this_sub_metric).[]
    			}
    			if (${this_sub_metric.COUNT} == 2) {
    				# Assign the value on the second loop iteration to 'type'
    				var type = $JSON(this_sub_metric).[]
    			}
    		}
    		
    		csv write_fields OUTFILE ${account_name} ${related_id} ${type}
    	}	
    }
    csv close OUTFILE
    
    "account","related_id","type"
    "account_one","34567","partner"
    "account_one","98765","reseller"
    "account_two","24680","internal"
    "account_two","13579","partner"
    
    buffer json_data = FILE system/extracted/json.json
    
    csv OUTFILE = system/extracted/result.csv
    csv add_headers OUTFILE account related_id type
    csv fix_headers OUTFILE
    
    foreach $JSON{json_data}.[data].[result] as this_result {
    
    	# Extract the account name from each element in the 'result' array
    	var account_name = $JSON(this_result).[account].[name]
    
    	print Processing namespace: ${account_name}
    
    	# Iterate over the metrics array within the result element
    	foreach $JSON(this_result).[metrics] as this_metric {
    
    	# As the metrics array contains anonymous arrays we need to iterate
    	# further over each element. Note the use of an empty nodepath.
    
    		csv write_field OUTFILE ${account_name}
    		
    		foreach $JSON(this_metric).[] as this_sub_metric {
    				csv write_field OUTFILE $JSON(this_sub_metric).[]
    		}		
    	}	
    }
    csv close OUTFILE
    $JSON(loopName).[node_path]
    # For illustrative purposes assume that the JSON
    # is contained in a named buffer called 'myJSON'
    
    # Create an export file
    csv "items" = "system/extracted/items.csv"
    csv add_headers id name category subcategory
    csv add_headers subvalue1 subvalue2 subvalue3 subvalue4
    csv fix_headers "items"
    
    foreach $JSON{myJSON}.[items] as this_item
    {
        # Define the fields to export to match the headers
        csv write_field items $JSON(this_item).[id]
        csv write_field items $JSON(this_item).[name]
        csv write_field items $JSON(this_item).[category]
        csv write_field items $JSON(this_item).[subcategory]
    
        # For every child of the 'subvalues' array in the current item
        foreach $JSON(this_item).[subvalues] as this_subvalue
        {
            csv write_field items $JSON(this_item).[0]
            csv write_field items $JSON(this_item).[10]
            csv write_field items $JSON(this_item).[100]
            csv write_field items $JSON(this_item).[1000]
        }
    }
    csv close "items"
    <note>
    <to>Tove</to>
    <from>
        <name comment="test_attribute">Jani</name>
    </from>
    <test_array>
        <test_child>
            <name attr="test">Child 1</name>
            <age>01</age>
        </test_child>
        <test_child>
            <name attr="two">Child 2</name>
            <age>02</age>
        </test_child>
        <test_child>
            <name attr="trois">Child 3</name>
            <age>03</age>
        </test_child>
        <test_child>
            <name attr="quad">Child 4</name>
            <age>04</age>
        </test_child>
    </test_array>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
    </note>
    foreach $XML{xmlbuf}.[test_array] as this_child {
        print Child name ${this_child.COUNT} is $XML(this_child).[name] and age is $XML(this_child).[age] - attribute $XML(this_child).[name].[@attr]
    }
    Child name 1 is Child 1 and age is 01 - attribute test
    Child name 2 is Child 2 and age is 02 - attribute two
    Child name 3 is Child 3 and age is 03 - attribute trois
    Child name 4 is Child 4 and age is 04 - attribute quad

    The option statement accepts the settings below. Each is shown with its default value.

    continue (default: disabled)
    Determines whether to proceed to the next day in a date range or bail out if an error occurs on any given day when processing a range of dates - see notes below.

    delimiter / separator (default: ,)
    Specifies the delimiter character between each field in a CSV file. This character will be used when importing and exporting Datasets. The default value is a comma. If a dot character is used, then the option statement will be ignored and a warning generated in the logfile. Either delimiter or separator can be used - they act in exactly the same way.

    embed (default: no)
    May be set to yes or 1, no or 0. If enabled, then when importing a CSV file, any text between an opening curly brace - { - and its matching closing brace is not checked in any way. This permits separators and quotes to be included in the value. The curly brackets may be nested to any depth. If there is no matching } for the opening bracket then an error will be generated and the task will fail.

    loglevel (default: INFO)
    Sets the logging level - see notes below.

    logmode (default: SAFE)
    Sets the logging mode - see notes below.

    merge_blank (default: none)
    The default value to use if a source column is blank.

    merge_nomatch (default: none)
    The default value to use if no match is found for a regular expression in a source column value.

    mode (default: strict)
    May be set to strict or permissive. This option specifies whether or not to terminate a transcript task if an error is encountered - see notes below.

    noquote (default: n/a)
    Specifies that no quoting is to be performed either when importing or exporting Datasets. A subsequent option quote statement will override this option.

    overwrite (default: yes)
    May be set to yes or 1, no or 0. If set to no then statements that update cell values will only affect blank cells. Refer to the documentation articles for any given statement for more information.

    quote (default: ")
    Specifies the quote character used for quoting fields in a CSV file. This character is taken into consideration when importing and exporting Datasets. When importing a Dataset, any fields that begin and end with this character will have it removed during the import process. When exporting a Dataset, all non-blank fields will be quoted using this character. Note that when specifying a literal quote character, it must be escaped: \"

    services (default: readonly)
    May be set to readonly or overwrite. This determines whether defining a service that already exists will update that service with the new configuration or not.

    The loglevel option may be set to any of the following levels:

    INTERNAL - Self-diagnostic messages indicating an internal error detected with Transcript itself
    FATAL - Non-recoverable errors, usually as a result of an Operating System error such as no available memory
    ERROR - A transcript task error (syntax or data related)
    WARN - An unexpected event or a syntax error that can be recovered from
    INFO - Informational messages about actions performed during the execution of a transcript task
    DEBUG - Detailed logs of actions performed during the execution of a transcript task
    DEBUGX - Extended debugging information (may cause very large logfiles and a minor reduction in performance)

    The logmode option may be set to either of the following modes:

    SAFE - After every message, the logfile is flushed to disk
    FAST - The logfile is not explicitly flushed to disk until the termination of the task file

  • Italics for arguments

  • Square brackets for optional keywords and arguments [likethis]

  • Vertical pipe for alternative keyword options just|exactly as shown

  • Ellipses for a variable length list of arguments: Column1 ... ColumnN

  • Refer to the core concepts page for more information regarding datasets, fully qualified column names and related information.

    Reference

    The following statements (in alphabetical order) are supported by Transcript:

    Statement

    Description

    aggregate

    Reduce the number of rows in a DSET while preserving information

    append

    Append one DSET to the end of another

    calculate

    Perform arithmetic on column values

    capitalise capitalize

    Capitalise column name and/or values

    convert

    Convert between decimal and hex values


    services

    This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity and in the article on the service statement.

    Overview

    The services statement is used to create or modify multiple services based on the data in a daily RDF.

    option delimiter = \t # Specify a literal TAB character
    option delimiter = \" # Specify a literal quote
    option mode = strict
    option mode = permissive
    option quote = \"
    option loglevel = DEBUGX
    import usage from Azure
    option loglevel = INFO
    option separator = ;
    option quote = '
    export azure.Usage as c:\transcript\exported\azure_modified.csv
    create mergedcolumn
    create mergedcolumn

    copy

    Copy rows from one DSET to another

    correlate

    Merge DSETs using a key

    create

    Create one or more columns

    default

    Specify the default DSET

    delete

    Delete columns, rows or DSETs

    export

    Snapshot a DSET to disk

    finish

    Create a Reporting Database File

    if

    Conditionally execute statements

    import use

    Import a Dataset or CCR file

    include

    Execute one task from within another

    lowercase

    Convert column name and/or values to lower case

    move

    Move rows from one DSET to another

    normalise normalize

    Normalise strings

    option

    Set global parameters

    rename

    Rename an existing column or DSET

    replace

    Search and replace values in a column

    round

    Round numeric values in a column

    service

    Create a chargeable service

    services

    Create multiple chargeable services

    set

    Set cell values in a column

    split

    Split column values

    timecolumns

    Set the start time and end time columns

    timerender

    Render a UNIX timestamp in human-readable form

    timestamp

    Create a timestamp column

    use

    See import

    update_service

    Modify one or more existing service descriptions and/or unit label

    uppercase

    Convert column name and/or values to upper case

    var

    Define a variable

    where

    Define a local filter


    Syntax

    services { param1 = value [ ... paramN = value] }

    Example:

    Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space

    Details

    Summary

    The services statement is used to create or modify multiple services from the data in a daily RDF. The parameters supplied to the statement map columns in the usage data to attributes of the created services.

    How column names are used

    For many of the parameters to the services statement there are two ways of using a column name:

    1. The values in the column are extracted from the usage data and those values are embedded as literals into the service definition.

    2. The column name itself is used in the service definition such that the reporting engine dynamically determines the values to use for any given day when generating report data

    When creating or updating services using the first method, Transcript will create a new rate revision every time the rate information changes. For data sources such as Microsoft Azure where the rates can change daily this will result in a lot of rate revisions in the global database.

    Using the second method requires only a single rate revision which identifies by name the column(s) containing rate and/or COGS information. When a report is run, the charge engine then obtains the correct rate information for any given day from the data in those named columns.

    Parameter table

    The parameters supported by the services statement are summarised in the following table. The Type column in the table indicates the way the column name is used as described in How column names are used above. Additional information about each parameter can be found below the summary table itself.

    Parameter

    Type

    Meaning

    usages_col

    2

    The name of the column from which units of consumption are derived

    service_type

    n/a

    Determines how the usages_col and consumption_col values are used when interrogating the usage data to get the units of consumption

    consumption_col

    2

    The name of the column containing units of consumption (AUTOMATIC services only - see below for more details)


    Parameter details

    usages_col

    The usages_col parameter is the name of a column containing service keys. A service will be created for each distinct value in this column, and these values will be used as the service keys.

    service_type

    In order to calculate the charges associated with a service it is necessary to know the number of units of that service that were consumed. Exivity supports two methods of retrieving the units of consumption from usage data and the service_type determines which of these is applied.

    Any given service may use one or other of these methods, which are as follows:

    • Manual services: service_type = MANUAL

    • Automatic services: service_type = AUTOMATIC

    Manual services

    Manual services require that the units of consumption for each service named in the usages_col column are stored in separate columns whose names correlate to the service keys themselves.

    To illustrate this, consider the following fragment of usage data:

    In this case, the service_name column contains a list of service keys and for each of those service keys there is a corresponding column containing the units of consumption for that service. Thus in the above example we can see that there are two services, "Small VM" and "Large VM" and that the units of consumption for each of these services are in the columns of the same name.

    The more manual services that are represented in the data, the more columns are required.
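
    As an illustration only (this sketch is not part of the original reference), a manual services definition for the fragment above might look like this, assuming the usage data also contains a hypothetical rate column holding the unit rate:

    services {
        usages_col     = service_name
        service_type   = MANUAL
        category       = "Virtual Machines"
        # 'rate' is a hypothetical column containing the unit rate
        set_rate_using = rate
    }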

    Automatic services

    Automatic services require that the units of consumption for each service named in the usages_col column are stored in the column named by the consumption_col parameter. To represent the same information as that shown in the example above, the following would be used:

    It can be seen that any number of automatic services, along with their consumption figures, can be represented using only two columns of data.

    consumption_col

    The consumption_col parameter is only required when creating automatic services and determines the column containing the units of consumption for each service as described above.
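
    A comparable sketch for automatic services (again illustrative only, and again assuming a hypothetical rate column) might be:

    services {
        usages_col      = service_name
        service_type    = AUTOMATIC
        consumption_col = quantity
        # 'rate' is a hypothetical column containing the unit rate
        set_rate_using  = rate
    }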

    instance_col

    It is not enough to know only the units of consumption for each service, as this merely provides the total consumption for each service across the entire usage. In the examples above, the "Large VM" service has 10 units of consumption, but using that information alone there is no way to know whether this represents one instance of a VM used 10 times, 10 instances of VMs used once each, or something in between.

    The instance_col parameter is therefore required to tell the difference. Typically this will be a unique identifier which groups the units of consumption into 'buckets'. In the case of a VM this may be a VM ID which remains constant throughout the life of a VM in the cloud.

    To illustrate this, we can supplement the example usage fragment used previously with additional information to use as the instance_col as follows:

    By specifying instance_col = vmid we can now see that the usage represents:

    • 5 instances of a single Small VM with an ID of 444

    • 6 instances of a Large VM with an ID of 555

    • 4 instances of a Large VM with an ID of 666
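
    Extending the automatic sketch above (illustrative only), the instance_col parameter would be added as follows:

    services {
        usages_col      = service_name
        service_type    = AUTOMATIC
        consumption_col = quantity
        instance_col    = vmid
        # 'rate' is a hypothetical column containing the unit rate
        set_rate_using  = rate
    }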

    description_col

    If specified, the description_col denotes a column containing a friendly description for reports to identify the service with.

    Typically in cloud usage data, services are identified using unique IDs (referred to as keys in Exivity) which are often non-meaningful to human eyes, so Exivity supports a 'friendly' description for each service for display purposes when generating a report.

    For example description_col = description may be used in conjunction with the following data to map the service_id to a friendly name:

    It is not mandatory to provide a description_col parameter, but if one is not supplied then the description will be set to a duplicate of the service key (as derived via the usages_col parameter).

    In the example above, it can be seen that there are multiple rows in the data for the same service key (vmid). When using description_col, the first row for each distinct value in the usages_col will be used to set the description.

    category

    By default Exivity will group services on reports according to their category. Using categories ensures that charges for services that fall within the same category will appear in a contiguous block.

    The category parameter specifies the category to which all services created will be assigned, thus specifying category = "Virtual Machines" might be appropriate for the example data used so far in this article.

    If no category is specified, the services created will be assigned to a category called Default.

    category_col

    Usage data normally contains information about a range of services of different types such as Virtual Machines, Storage, Networking and so on. By referencing a column in the usage data which identifies the correct category for each service, multiple categories will be created and each service assigned to the correct category by the services statement.

    To illustrate this, let us extend the sample data as follows:

    By specifying category_col = category each service will now be associated with the correct category.

    interval

    The interval parameter is used to specify a literal interval for all the services created by the services statement.

    The interval parameter may be any of:

    • individually

    • daily

    • monthly

    If the interval parameter is not specified, then a default interval of monthly will be used.

    interval_col

    In the event that different services in the usages_col require different charge intervals, the name of a column containing the interval to use may be specified using the interval_col parameter as follows:

    By specifying interval_col = interval each service in the above usage data will be assigned the correct charge interval.

    model

    The model parameter is used to enable proration for monthly services. Either of unprorated or prorated may be specified.

    If no model is specified, then a value of unprorated will be used by default.

    model_col

    In the event that different services in the usages_col require different proration settings, the model_col parameter can be used to specify which column contains the proration setting for each service.

    By specifying model_col = model, each service in the above usage data will be assigned the correct proration model.

    unit_label

    The unit_label parameter is used by reports to provide a meaningful description of the units of consumption associated with a service. A virtual machine may have a unit label of Virtual Machines, but storage-related services may have a unit label of Gb for example.

    If the unit_label parameter is not specified then a default label of Units will be used.

    The unit label may be up to 63 characters in length. Longer values will be truncated.

    unit_label_col

    In cases where the services contained in the usages_col column collectively require more than one unit label, the unit_label_col parameter can be used to identify a column in the usage data which contains an appropriate label for each service.

    For example unit_label_col = label can be used to associate an appropriate label using the data below:

    The parameters rate_col, set_rate_using, fixed_price_col, set_fixed_price_using, cogs_col, set_cogs_using, fixed_cogs_col and set_fixed_cogs_using (all of which are detailed below) collectively determine the types of charge that will be associated with the service definitions created by the services statement.

    rate_col

    The rate_col parameter is used to determine the column in the usage data which contains the unit rates for the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the rate_col parameter. When a report is run, for each day in the reporting range the unit rate for that day will be determined by whatever value is in the column named by the rate_col parameter in the usage data.

    This means that only a single rate revision is required, even if the actual value in the rate_col column is different from day to day.

    set_rate_using

    The set_rate_using parameter is also used to determine the unit rate for each service. This differs from the rate_col parameter in that the values in the column named by set_rate_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the unit cost is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of rate_col or set_rate_using (but not both) may be used in a single services statement
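
    To contrast the two approaches, the following sketch is illustrative only and not from the original article (the unit_rate column is hypothetical):

    # Pass-through rates: the rate revision stores the column name and the
    # charge engine reads the value for each day at report time
    services {
        usages_col = service_name
        rate_col   = unit_rate
    }

    # Literal rates: the values currently in the column are copied into the
    # initial rate revision when the service is created
    services {
        usages_col     = service_name
        set_rate_using = unit_rate
    }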

    fixed_price_col

    The fixed_price_col parameter is used to determine the column in the usage data which contains the fixed price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_price_col parameter. When a report is run, for each day in the reporting range the fixed price for that day will be determined by whatever value is in the column named by the fixed_price_col parameter in the usage data.

    If a monthly service has different fixed prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_price_col column is different from day to day.

    set_fixed_price_using

    The set_fixed_price_using parameter is also used to determine the fixed price for each service. This differs from the fixed_price_col parameter in that the values in the column named by set_fixed_price_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_price_col or set_fixed_price_using (but not both) may be used in a single services statement

    cogs_col

    The cogs_col parameter is used to determine the column in the usage data which contains the COGS rate associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the cogs_col parameter. When a report is run, for each day in the reporting range the COGS rate for that day will be determined by whatever value is in the column named by the cogs_col parameter in the usage data.

    If a monthly service has different COGS rates for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the cogs_col column is different from day to day.

    set_cogs_using

    The set_cogs_using parameter is also used to determine the COGS rate for each service. This differs from the cogs_col parameter in that the values in the column named by set_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the COGS rate is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of cogs_col or set_cogs_using (but not both) may be used in a single services statement

    fixed_cogs_col

    The fixed_cogs_col parameter is used to determine the column in the usage data which contains the fixed COGS price associated with the service definitions created by the services statement.

    As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_cogs_col parameter. When a report is run, for each day in the reporting range the fixed COGS price for that day will be determined by whatever value is in the column named by the fixed_cogs_col parameter in the usage data.

    If a monthly service has different fixed COGS prices for different days in the month, then whichever results in the highest charge will be used.

    This means that only a single rate revision is required, even if the actual value in the fixed_cogs_col column is different from day to day.

    set_fixed_cogs_using

    The set_fixed_cogs_using parameter is also used to determine the fixed COGS price for each service. This differs from the fixed_cogs_col parameter in that the values in the column named by set_fixed_cogs_using are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.

    This means that the fixed COGS price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date for more details)

    Either of fixed_cogs_col or set_fixed_cogs_using (but not both) may be used in a single services statement

    set_min_commit_using

    The set_min_commit_using parameter is used to set the minimum commit value in the initial rate revision for each service.

    The values in the column identified by set_min_commit_using are extracted from the usage data and used as numeric literals in the revision.

    effective_date

    When creating the initial rate revision for a service, the value specified by the effective_date parameter is interpreted as a yyyyMMdd value to determine the date from which the revision should be applied.

    If the effective_date parameter is omitted then the current data date will be used by default.

    When using effective_date, the value will be used to set the initial rate revision date for all the service definitions created by the services statement. If different services require different effective dates then the effective_date_col parameter may be used to determine the effective date for each service from a column in the usage data.

    effective_date_col

    If there is a column in the usage data containing yyyyMMdd values representing the desired effective date for the initial revision of each service, the effective_date_col parameter may be used to extract the values from this column and set the effective date for each service accordingly.

    Either of effective_date or effective_date_col may be specified in a single services statement, but not both

    Examples


    Releases

    Upgrading

    Upgrading to the latest release is a straightforward process.

    When upgrading to a new major version (e.g. from version 1.x.x to version 2.x.x), also take note of the manual upgrade steps described in Upgrading to version 2.

    services {
        usages_col = ServiceName
        effective_date = 20160101
        set_cogs_using = cogs_prices
        #set_fixed_cogs_using = cogs_prices
        description_col = service_description
        unit_label_col = units_description
        set_min_commit_using = minimum_commit
        category_col = service_group
        interval_col = charging_interval
        model_col = proration
        set_rate_using = rate
        set_fixed_price_using = fixed_prices
    }
    service_name,Small VM,Large VM
    Small VM,1,0
    Small VM,4,0
    Large VM,0,6
    Large VM,0,4
    service_name,quantity
    Small VM,1
    Small VM,4
    Large VM,6
    Large VM,4
    vmid,service_name,quantity
    444,Small VM,1
    444,Small VM,4
    555,Large VM,6
    666,Large VM,4
    vmid,service_id,quantity,description
    444,ABC123,1,Small VM
    444,ABC123,4,Small VM
    555,DEF789,6,Large VM
    666,DEF789,4,Large VM
    instance_id,service_name,quantity,description,category
    444,Small VM,1,Bronze Computing Service,Virtual Machines
    444,Small VM,4,Bronze Computing Service,Virtual Machines
    555,Large VM,6,Gold Computing Service,Virtual Machines
    666,Large VM,4,Gold Computing Service,Virtual Machines
    999,SSD Storage,50,Fast Storage,Storage
    instance_id,service_name,quantity,description,category,interval
    444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly
    444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly
    555,Large VM,6,Gold Computing Service,Virtual Machines,monthly
    666,Large VM,4,Gold Computing Service,Virtual Machines,monthly
    999,SSD Storage,50,Fast Storage,Storage,daily
    instance_id,service_name,quantity,description,category,interval,model
    444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated
    444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated
    555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated
    666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated
    999,SSD Storage,50,Fast Storage,Storage,daily,unprorated
    instance_id,service_name,quantity,description,category,interval,model,label
    444,Small VM,1,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
    444,Small VM,4,Bronze Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
    555,Large VM,6,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
    666,Large VM,4,Gold Computing Service,Virtual Machines,monthly,prorated,Virtual Machines
    999,SSD Storage,50,Fast Storage,Storage,daily,unprorated,Gb
    A service definition must have at least one charge type and may have up to three (as potentially a rate, a fixed price and either of COGS or fixed COGS may be used)
    services {
        usages_col = ServiceName
        effective_date = 20180101
        set_cogs_using = cogs_prices
        description_col = service_description
        unit_label_col = units_description
        set_min_commit_using = minimum_commit
        category_col = service_group
        interval_col = charging_interval
        model_col = proration
        set_rate_using = rate
        set_fixed_price_using = fixed_prices
    }

    instance_col

    2

    Values in this column are used to distinguish between service instances

    description_col

    1

    Values in this column determine the service description for each service definition

    category or group

    n/a

    A specific category string to use for all service definitions

    category_col or group_col

    1

    Values in this column determine the service category for each service definition

    interval

    n/a

    A specific charging interval to use for all service definitions

    interval_col

    1

    Values in this column determine the charging interval for each service definition

    model

    n/a

    A specific proration setting to use for all service definitions

    model_col

    1

    Values in this column determine the proration setting for each service definition

    unit_label

    n/a

    A specific unit label to use for all service definitions

    unit_label_col

    1

    Values in this column determine the unit label for each service definition

    rate_col

    2

    Set the column name from which Edify will determine rate per unit at report time

    set_rate_using

    1

    Values in this column determine the rate per unit for each service definition

    fixed_price_col

    2

    Set the column name from which Edify will determine the fixed price per charging interval for each service definition

    set_fixed_price_using

    1

    Values in this column determine the fixed price per charging interval for each service definition

    cogs_col

    2

    Set the column name from which Edify will determine the COGS rate per unit for each service definition

    set_cogs_using

    1

    Values in this column determine the COGS rate per unit for each service definition

    fixed_cogs_col

    2

    Set the column name from which Edify will determine the fixed COGS price per charging interval for each service definition

    set_fixed_cogs_using

    1

    Values in this column determine the fixed COGS price per charging interval for each service definition

    set_min_commit_using

    1

    Values in this column determine the minimum commit for each service definition

    effective_date_col

    1

    Values in this column determine the effective date of the rate revision created for each service definition

    effective_date

    n/a

    A specific effective date to use in the rate revision created for each service definition

    Changelog

    v2.3.1

    January 23, 2019

    New features

    • The 'export' statement will no longer generate an error if asked to process an empty DSET while option mode = strict is set

    • Implemented user-definable timeout setting when retrieving data from HTTP sources

      When retrieving data from HTTP sources, the number of seconds to wait for a server response before timing out can be defined using set http_timeout.
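
      As an illustrative sketch of the description above (the value shown is arbitrary), a USE script could set a five minute timeout with:

      set http_timeout 300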

    v2.3.0

    January 17, 2019

    New features

    • Added editable labels for report levels

    • Added the ability to mass delete services

    • Invoice report is now called Summary report

    • The 'finish' statement in Transcript can be made to cause the task to fail if the DSET it's given is empty

      Previously, when creating an RDF, the finish statement would perform no action if the DSET to create the RDF from was empty. This is still the case if 'option mode = permissive' is in force, but if 'option mode = strict' then the statement will now generate an error. The error will cause the task to fail for the current day, and if 'option continue' is not enabled, the task will be terminated, else the task will move on to the next date in the range of days being processed.

    • When importing multiple files using a pattern, any invalid files will be skipped. When multiple files matching a pattern are imported, any of those files that are malformed or otherwise non-importable will be skipped.

    • Added the ability to exit a subscript invoked via #include in a transform script. The 'return' statement, when used in a Transform script, will now cause script execution to resume from the statement following the #include statement in a parent script that referenced the script containing the 'return' statement.

    • Increased the performance of the correlate statement. Correlation should now be significantly faster than it was previously

    • Increased the performance of the aggregate statement

      Aggregation should now be significantly faster than it was previously

    Bug fixes

    • Fixed a bug where users couldn't update their own details (including their password)

    • Fixed a bug where only 10 datasets were displayed when creating a new report

    • Fixed a bug where the mail sender name was not persisted in the configuration

    • Fixed an issue when importing CSV files containing quotes

      When importing a CSV file with two successive quote characters at the end of a field, Transcript would reject the file as invalid. This has now been fixed.

    • Fixed an issue where deleting data led to GUI crashes on occasion. When deleting data (RDFs) associated with a report, it could be that if one or more days had previously been overwritten, a stale database entry would cause issues after the RDFs were deleted. This has now been fixed.

    • Fixed a bug whereby using terminate within the body of an if statement in a Transform script could cause an error. Invoking terminate in an if block when running a transform script against a range of dates could cause the error 'The maximum number of nested blocks (32) is already in use'. This has now been fixed.

    v2.2.1

    December 13, 2018

    Bug fixes

    • A bug was fixed which could lead to an error in the invoice report when using a rate with a minimum commit set

    • Fixed an issue with minimum commit. It was possible that when applying a minimum commit to a service, other services would be affected by that minimum commit. This has now been fixed.

    • Fixed an issue when retrieving NULL fields from an ODBC query. When using ODBC to collect data, the presence of NULL values in the results could cause USE to crash. This has been fixed.

    v2.2.0

    November 30, 2018

    New features

    • Added the ability to view instance level details on the invoice reports.

    • Added the ability to customize the report exports (CSV format only) field delimiter and decimal separator. These settings are system-wide and available to administrators by navigating to Administration > Configuration > Formatting.

    • Added the ability for users to reset their own passwords. This requires the email address of users to be set and a working server configuration for sending emails. This can be configured in Administration > System > Environment.

    • Logfiles generated from workflow tasks now include a timestamp. This prevents logfiles from consecutive runs of the same task from being overwritten.

    • Workflow status now automatically refreshes after a manual run.

    • Added a new Environment tab in Administration > System. In this tab, information about the system the Exivity instance is running on can be filled out. In the future this will be expanded to include more configuration options.

    • Invoice reports now include minimum commit uplifts as separate entries.

    • Carriage-returns and line-feeds in data extracted using ODBC are now replaced with spaces. When extracting data with USE, the presence of newlines in the data could cause corrupt CSV output. Carriage Return and Line Feed characters in data extracted from ODBC are therefore now replaced with spaces.

    • Enhanced expression support in the Extractor component. Conditional expressions have been enhanced in the Extractor component such that more complex conditions can be evaluated and additional operations can be performed. Additionally, it is now possible to set a variable value using an expression.

    • Services can now be manually deleted using the GUI.

    Bug fixes

    • The datasets selector visible when creating a new report definition is now alphabetically sorted.

    • Fixed a bug which caused the contents of the extractor editor to not update after updating variables. The contents of the extractor script itself were always saved after updating variables, but those changes were not visible in the editor.

    • Fixed a bug which caused the account depth selector to reset after performing an upgrade.

    • Fixed a bug which could cause the interface to become unresponsive after preparing a report.

    • Fixed an issue when running Transform scripts for days with 25 hours in them. When running a Transform script with a data-date representing a day where the clocks were adjusted such that the day had 25 hours in it, the script would be executed a second time automatically once the first had completed. This could lead to unexpected errors and log entries on occasion, and has now been fixed.

    • When writing to CSV files in USE, embedded CR/LF characters are converted to spaces. USE will now automatically strip out embedded carriage-return and line-feed characters when writing data to CSV files. Each unique occurrence of one or more sequential CR/LF characters will be replaced with a single space.

    • Fixed a bug whereby the body of an 'if' statement in the Transformer could terminate prematurely. In some cases, using an 'import' statement with an 'options' block within the body of an 'if' statement could cause statements following the 'import' to be skipped. This has now been fixed.

    v2.1.5

    November 22, 2018

    Bug fixes

    • Updated the documentation links in the header to point to our new documentation site.

    • Fixed grouping behaviour in the details table of the accounts report. In some cases, accounts could appear grouped under the wrong parent account in the 'Detailed' table in the accounts report.

    v2.1.4

    October 31, 2018

    Bug fixes

    • Fixed an issue with incorrect quantities sometimes showing on reports. Occasionally, when running a report for a range of dates, the quantities on one or more services differed from the quantity for that service shown when a report was run for a different date range (or just the day in question). This issue has now been fixed.

    v2.1.3

    October 26, 2018

    New features

    • The USE 'basename' statement can now write its results to a new variable. Previously, the 'basename' statement would always modify the value of the variable whose name was supplied as the argument. It can now also accept a literal string and create a new, or update an existing, variable to hold the result.

    • Archives in GZIP format can now be decompressed using USE. USE now supports the 'gunzip' statement which can be used to inflate GZIP'd data. Details of how to use this statement may be found at https://docs.exivity.com/diving-deeper/extract/language/gunzip

    • Fixed an issue whereby when running a Transform script the Audit database would be locked for the duration of the task. Transcript now only opens the Audit database when it needs to, reducing the likelihood of encountering errors pertaining to a locked audit database in the logfile.

    • A new system variable is now available containing the number of days in the current month. The existing dataMonth variable, which contains the yyyyMM of the current month is now supplemented with a new variable called dataMonthDays which contains the number of days in that month.

    • Changed default service type to 'automatic' in the 'services' statement in Transcript. When creating services, if no 'type' parameter is provided then the default service type will now be set to 'automatic'.

    Bug fixes

    • Fixed an issue whereby when creating a service, the audit indicated that the service creation failed. When a service definition is successfully created, Transcript will now correctly audit that event as opposed to indicating that the attempt failed.

    • Fixed an issue whereby over-writing services could result in database errors in the logfile. Sometimes when overwriting services, a constraint error would be logged in the logfile and the service would not have any rate associated with it. This has been fixed.

    v2.1.2

    October 18, 2018

    Bug fixes

    • Fixed an issue that could cause database corruption.

      Fixed an issue that could cause database corruption due to the Aeon database being held open for long periods of time.

    v2.1.1

    October 10, 2018

    New features

    • Added a live preview feature when working with transforms. A new feature has been added which can display a live preview of the transformer output. Note: this feature is currently in beta and will be further updated in the next release.

    • The code editor has been updated. The code editor for Extractor and Transformer scripts has been updated (it now uses the open source Monaco editor - https://microsoft.github.io/monaco-editor/) resulting in a significant improvement over our previous editor. This greatly enhances the user experience when editing scripts in the GUI. Note: This change also lays the foundation for more advanced features going forwards.

    • Charges for monthly services now take quantity into consideration as well as price. If two or more days in a month have the same highest price then the one with the highest quantity will be reported. Previously, the first seen was reported which could lead to discrepancies between the reported quantity and price on the report.

    • When running reports blank instance values are now displayed as a hyphen. When running reports against data with blank instance values in the usage data, the instance value will now be represented as a hyphen, which improves the aesthetics of the report.

    • Added hardware information to Transcript log-files. Log-files created by Transcript now contain information about the CPU and RAM at the top of the log.

    • Increased auditing information in Transcript. Events relating to service, rate and RDF changes are now audited

    Bug fixes

    • Removed COGS option for users without rights to view COGS information. In the services and instances report, users with no access to view COGS will no longer be able to select the COGS type in the details table. Note: This bug never allowed users without appropriate access rights to view the actual COGS data.

    • Fixed a bug where the list of datasets on the report definition page was only showing the first 10 results. This could result in an inability to create new reports using datasets that were not included in those results

    • A link to the instances report has been added to the search feature in the header.

    • The service interval column in the instances report now contains data. Previously this column was always blank

    • Fixed a bug where searching for units within a services report led to a GUI crash.

    • Fixed an issue whereby very rarely a charge would not be included in reports. On very rare occasions, information in a record in the prepared report caches was not included in the output when a report was run. This has now been fixed.

    • Fixed an issue that could cause Aeon database corruption. Fixed an issue that could cause database corruption (and workflows to fail) due to the Aeon database being held open for long periods of time.

    • Fixed an issue whereby re-using an existing named buffer in USE for ODBC purposes could lead to unexpected results. Fixed an issue in USE whereby if an existing named buffer was re-used to store data retrieved from ODBC then a new buffer could have been created with the same name as the existing buffer, and attempts to reference it would return the old data.

    • Fixed an issue when executing ODBC queries that return no data. Using the ODBC capability to execute a query that returns no data will no longer cause an extractor to return an error.

    v2.0.6

    September 05, 2018

    New features

    • Upgraded the underlying API framework. For more information, please refer to the Laravel release notes.

    Bug fixes

    • Fixed an issue whereby 'append' could crash if one or other DSET was empty. When executing the 'append' statement in a transformation script, if one or other of the DSETs involved in the operation was empty (having no data rows) then a crash could occur. This has now been fixed.

    • Fixed an issue where an expression that evaluated as FALSE could show the wrong line number in a log message. The DEBUG level logfile entry indicating an expression is true or false would contain a reference to the wrong line number if the expression evaluated to false. This has now been fixed.

    • Fixed an issue whereby some comparisons would evaluate incorrectly in expressions. In some cases where a value was quoted in an expression, the quotes would be considered part of the value itself. This has now been fixed.

    • Fixed a condition where reports were not showing for non-admins

    • Quantity metric is available again for the timeline chart on the services and instances reports

    • Reports in the navigation menu dropdown are now alphabetically ordered

    v2.0.5

    August 28, 2018

    New features

    • Ability to filter data in reports using a search query: The search bar in Accounts, Services and Instances reports now supports the use of operators (for example > and <) to filter your results based on column values or strings.

    • Add avg_unit_based_rate to the report/run API endpoint. Added the average per unit rate field to the report/run API endpoint and a placeholder for the average per interval rate which will be implemented later.

    Bug fixes

    • Fixed an issue where deleting services could lead to adjustments not displaying correctly

    • The Rate column in report details tables now uses the configured rate precision setting

    • Fixed an issue whereby scheduled tasks that output more than 4kb of data to the console could suspend execution and do nothing until they timed out

    v2.0.4

    August 22, 2018

    New features

    • Transcript can now normalise scientific decimal numbers to standard format: When processing data that contains numbers in scientific format (such as 2.1E-5) the normalise statement can now be used to convert these to standard decimal notation (0.000021 in the above case) using the form normalise column colName as standard, where colName is the column containing the values to convert. Any values already in decimal will be unchanged, except that any trailing zeros will be removed from them. Non-numeric values will be converted to 0.

    • Support group and group_col as service parameters in Transcript: In the service and services statements in Transcript, the parameters to define the service category are category and category_col. These parameters now have aliases of group and group_col respectively, for those who prefer to use that terminology.

    Bug fixes

    • The replace statement in Transcript will no longer behave unexpectedly when given an empty string as the target to replace: When using replace to update substrings within the values in a column, if the target string (the text to replace) is empty then Transcript will generate a meaningful log entry explaining that it cannot be used to replace empty strings, and will no longer overwrite non-blank column values with multiple copies of the replacement text.

    • The export statement in Transcript now supports backslashes as path delimiters: When specifying a relative path for the export statement, Transcript will automatically create any directories that do not exist in that path. Previously there was a bug whereby the auto-creation of those directories would only work if UNIX-style forward slashes were used as delimiters in the path. This has now been fixed and Windows or UNIX style delimiters may be used when specifying an export path.

    • Fixed a bug in the scheduler that could cause schedules to fail: In some cases schedules could fail for no obvious reason. This has now been fixed.

    v2.0.3

    August 17, 2018

    New features

    • USE scripts can now be forced to terminate with an error result. Previously, the 'terminate' statement could be used to cancel script execution, but its use would always indicate that the script ran successfully. This may not be appropriate in all cases (for example if an error is detected by the script itself but ultimately cannot be resolved satisfactorily). The 'terminate' statement will still cause a script to exit with a success result by default, but may now be invoked as 'terminate with error' such that an error status is returned instead.

    • Added more service attributes as optional columns in the reports details table. The following extra service attributes can now be enabled as columns in the report details table: interval, charge type, cogs type and proration.

    • Support 'group' and 'group_col' as service parameters in Transcript. In the 'service' and 'services' statements in Transcript, the parameters to define the service category are 'category' and 'category_col'. These parameters now have aliases of 'group' and 'group_col' respectively, for those who prefer to use that terminology.

    • Reduced the chance of a 'database is locked' warning when preparing reports. When preparing reports, on occasion it is possible for a warning to appear in the logfile pertaining to the global database being locked. When this warning happened, it could cause some days in the reporting period to remain unprepared. A known specific cause of this issue has been fixed, significantly reducing the likelihood of it happening.

    Bug fixes

    • Fixed an issue where an ODBC connection could cause a crash in USE. When executing an ODBC-based collection in USE, under certain circumstances an incorrect direct connection string could cause a crash. This has been fixed. Additionally, when an ODBC error occurs the error written to the logfile contains more detail than in previous releases.

    • The order of workflow steps in the status tab now corresponds to the order of workflow steps in the configuration tab.

    • An issue has been fixed where old user preferences could conflict with updates in the GUI, leading to errors when loading the service and instance reports.

    • An issue has been fixed where certain characters in a workflow status could lead to errors in the API. Sometimes, when running a scheduled task, the output written to the database contains non-printable characters. The API now re-encodes those characters, which means the GUI will now correctly show the status for those workflows.

    • When selecting a reporting period that spans multiple months, the charts will now only show a single label for each month.

    • Fixed a USE crash bug with certain combinations of conditional expressions. A crash could occur if an expression with more than 2 parameters was followed later in the script by an expression with fewer parameters than the first.

    • Fixed an issue where an extractor could crash when using a parslet after formatting some JSON. A bug has been fixed whereby if the 'json format' statement was used to prettify some JSON in a named buffer, use of a parslet to extract data from the JSON could cause a crash.

    • Fixed an issue where sometimes an XML parslet would cause an 'out of memory' error in USE. When using an XML parslet, it was possible that an 'out of memory' error would be returned in the logfile and the script would fail, even on small input files. This has now been fixed.

    v2.0.2

    August 03, 2018

    Bug fixes

    • The 'export' statement in Transcript now supports backslashes as path delimiters

      When specifying a relative path for the 'export' statement, Transcript will automatically create any directories that do not exist in that path. Previously there was a bug whereby the auto-creation of those directories would only work if UNIX-style forward slashes were used as delimiters in the path. This has now been fixed and Windows or UNIX style delimiters may be used when specifying an export path.

    • The 'replace' statement in Transcript will no longer behave unexpectedly when given an empty string as the target to replace

      When using 'replace' to update substrings within the values in a column, if the target string (the characters to replace) is empty then Transcript will generate a meaningful log entry explaining that it cannot be used to replace empty strings, and will no longer overwrite non-blank column values with multiple copies of the replacement text.

    v2.0.1

    July 25, 2018

    New features

    • Increased default timeout when retrieving data from HTTP servers

      Currently a USE script will fail if more than 3 minutes elapse without response when downloading data from an HTTP server. This has been increased to 5 minutes to cater for slow APIs.

    v2.0.0

    July 19, 2018

    New features

    • Transcript can now normalise scientific decimal numbers to standard format. When processing data that contains numbers in scientific format (such as 2.1E-5) the 'normalise' statement can now be used to convert these to standard decimal notation (0.000021 in the above case) using the form 'normalise column colName as standard' where 'colName' is the column containing the values to convert. Any values already in decimal will be unchanged, except that any trailing zeros will be removed from them. Non-numeric values will be converted to 0.

    • When accessing the GUI via http visitors will be redirected to https automatically

    • Progress indicator in the report/run endpoint can be disabled. To disable, set the progress parameter to 0. More information at our API documentation.

    • Filter selectors show which items are present in the current report. The service category selector in the services and instances report, and the service selector in the instances report, will show items not available in the current report grayed out.

    • On-demand workflow execution. Workflows can now be executed on demand. Also, the schedule for a Workflow can be disabled.

    • Single workflows can now have multiple schedules

    • Workflows can now be scheduled in a specific timezone

    • Added the average rate column to the reports details table

    • Added the ability to show various totals in the reports summary widget. The Summary widget now has the option to show all totals (previous behaviour), only the totals for the current search results, or only those for the current pinned items.

    • Added report shortcuts to the dashboard

    • COGS, fixed COGS and fixed prices are now evaluated per instance when preparing reports. Previously, if a service was created that used any of fixed_price_col, cogs_col or fixed_cogs_col to indicate that the rate in question should be obtained from the usage data for any given day, then the charge engine would use a single value from the specified column(s) and apply that to all instances of the services for the day. Now, each row of usage is individually consulted when preparing reports such that the specific value on that row is used (as is already the case when using 'rate_col' for pass-through rates).

    • When extracting XML or JSON values, parse errors no longer cause the USE script to terminate. Previously, when using a static parslet to extract XML or JSON values from the contents of a named buffer, if the buffer contained invalid JSON or XML then the USE script failed with an error in the log saying that the contents of the buffer could not be parsed. Now, if a named buffer contains data that is not valid JSON or XML, any attempt to extract a value from it using a static parslet will be expanded to the value EXIVITY_INVALID_XML or EXIVITY_INVALID_JSON.

    • Improved performance and lowered memory requirements when running a report. Previously, in some circumstances running a report could take longer than expected and consume large amounts of memory in the process. The performance and memory use of the report engine have both been improved.

    • Reduced memory and increased performance when preparing reports. Previously it was possible for some installations to use large amounts of memory and exhibit unreasonably slow performance when preparing reports. Preparing reports is not intended to be a realtime feature and will always incur some time overhead, but this time should now be significantly reduced in many cases, and the memory required to complete the process will be much less.

    • New output format for the /report/run endpoint in the API. Due to changes to the charge engine, the output format of the /report/run endpoint in the API has changed. An up-to-date overview of the attributes returned by this endpoint can be found at our API documentation.

    • Free formatted ODBC connect strings are now supported in USE. This exposes all ODBC driver options to the user, and avoids the requirement of manually creating a DSN at the operating system level.

    • The 'split' statement now supports discarding unwanted result columns. When using the 'split' statement it is now possible to discard all but a selected range of the resulting new columns.

    Bug fixes

    • Fixed a Transcript crash when deleting a DSET. Transcript will no longer crash in certain circumstances when deleting a DSET using the 'delete dset' statement.

    • The Transcript 'export' statement now creates a path automatically. When exporting data from Transcript, if a relative path is specified as part of the export filename, Transcript will automatically create the path if it does not exist. The path will be created relative to /exported.

    • Changed the behaviour of some columns in the report tables. The optional per unit charges and per interval charges columns on the report pages represent a fraction of the total charge and as such should be considered a subtotal rather than a rate.

    • Allow users to see anonymous roll-up accounts even if they have no access. When a user only has access to some children of a parent account, reports will now show the combined usage of those accounts grouped as an unknown account in the reports.

    • Fixed a rare bug where incorrect character encoding in the data source could lead to reports not loading

    • Currency symbol is no longer shown for quantity graphs

    • An issue has been fixed which could lead to empty reports when there actually was report data. In some cases, selecting certain combinations of filters could lead to reports showing No data while there actually was report data for the current set of filters. This behaviour was observed mainly on the instances report page.

    • Usernames are now allowed to contain special characters. As a side effect of changing usernames to be case-insensitive, using special characters was no longer permitted since v1.8.1. This restriction is now removed.

    • Changed the behaviour of clearing the charge engine caches. Clearing the charge engine (Edify) caches unprepares all reports. The button on the About page now reflects this.

    • It is now possible to use decimal values for adjustment amounts. Previously this was only possible through the API. The GUI has been updated to also support this.

    • Changing the date in the invoice report no longer resets the account selection. Previously, when changing the date range on the invoice report screen, the current account selection (dropdown inside the invoice page) would automatically select the first account in the list. This has now been fixed to remember the selection when changing the date.

    • Exivity now works correctly when installed in a directory containing spaces

    • Transcript variables were not properly expanded when used in an import filter

    • Export of consolidated invoice now contains data for all accounts. Previously, selecting the CSV or Excel export of a consolidated invoice would only export data for the first account on the invoice.

    • Fixed a crash bug in the 'services' statement. When creating services, Transcript will no longer crash if a blank interval or model value is encountered while building the service definitions.

    Older release notes can be found here.
