The `buffer` command is used to create and/or populate one of these named buffers with data.

`buffer name = protocol protocol_parameter(s)`

The first argument to the `buffer` statement is the name of the buffer to create. If a buffer with this name already exists then any data it contains will be overwritten.

There must be whitespace on both sides of the equals symbol following the buffer name.

The following protocols are supported:
`buffer buffername = file filename`

The `file` protocol imports a file directly into a buffer. This can be very useful when developing USE scripts, as the USE script for processing a JSON file (for example) can be implemented without requiring access to a server.

If the specified buffer name already exists, then a warning will be logged and any data in it will be cleared before importing the file.
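As a minimal sketch, assuming a locally saved export (the path below is illustrative), a file can be pulled into a buffer while developing a script offline:

```
# Load a previously saved JSON export into a buffer named 'response',
# so the rest of the USE script can be developed without server access
buffer response = file system\extracted\example\my_data.json
```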
`buffer buffername = data string`

The `data` protocol populates the buffer with the literal text specified in string. This is useful when extracting embedded JSON. For example, the JSON snippet below contains embedded JSON in the instanceData field:

In this case the instanceData field can be extracted using a parslet, placed into a new buffer and re-parsed to extract the values within it. Assuming the snippet is in a file called my_data.json, this would be done as follows:
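Since the original example is not reproduced here, the following sketch illustrates the general pattern; the field name is taken from the text above, but the parslet and variable syntax are assumptions and should be checked against the parslet documentation:

```
# Import the file containing the outer JSON document
buffer outer = file my_data.json

# Extract the embedded JSON from the instanceData field into a variable
# (the parslet syntax shown here is illustrative)
var embedded = $JSON{outer}.[instanceData]

# Place the extracted text into a new buffer and parse it in turn
buffer inner = data ${embedded}
```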
`buffer buffername = http method url`

!!! note For full details on the HTTP protocol and its parameters please refer to the http article.

Once the HTTP request has been executed, any data it returned will be contained in the named buffer, even if the data is binary in format (eg: images, audio files or anything else non-human readable).

If the HTTP request returned no data, one of the following will apply:

- If the buffer does not already exist then the buffer will not be created
- If the buffer already exists then it will be deleted altogether

For details of how to access the data in a named buffer, please refer to the USE script basics article.
`buffer buffername = odbc dsn [username password] query`

username and password are optional, but either both or neither must be specified.

where:

- dsn is the ODBC Data Source Name (this should be configured at the OS level)
- username and password are the credentials required by the DSN
- query is an SQL query

Once the query has been executed, the resulting data is located in the named buffer. It can subsequently be saved as a CSV file to disk using:

`save {buffername} as filename.csv`

The resulting CSV uses a comma (`,`) as the separator and double quotes (`"`) as the quoting character. Any fields in the data which contain a comma will be quoted.
`buffer buffername = odbc_direct query`

where query is an SQL query.

This executes the SQL query against the ODBC data source described by the odbc_connect parameter of the set statement.

Once the query has been executed, the resulting data is located in the named buffer. It can subsequently be saved as a CSV file to disk using:

`save {buffername} as filename.csv`

The resulting CSV uses a comma (`,`) as the separator and double quotes (`"`) as the quoting character. Any fields in the data which contain a comma will be quoted.
The following examples retrieve data from ODBC and HTTP sources:
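The original examples are not reproduced here; as an illustrative sketch (the DSN, credentials, query, URL and file names are placeholders, and quoting conventions should be checked against the protocol descriptions above):

```
# Retrieve data via ODBC and save it to disk as a CSV file
buffer dbusage = odbc MyDSN dbuser dbpass "SELECT * FROM usage_records"
save {dbusage} as usage_records.csv

# Retrieve data over HTTPS into a buffer
buffer webusage = http GET "https://api.example.com/v1/usage"
```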
Exivity can be deployed from the AWS Market Place allowing you to have a functional Exivity solution in a matter of minutes. This tutorial will get you up and running.
Login to your AWS portal and access the Exivity offering here.
Click on Continue to Subscribe.
Read our Terms and Conditions, and when ready, click on Continue to Configuration.
Select the Region where you want to deploy Exivity and click on Continue to Launch.
In Choose Action select Launch through EC2 and click on Launch to access the Deployment Wizard.
1. In the first screen, try to pick a recommended VM size type that has enough CPUs and memory (see this page for general system requirements). When you are done with your selection click on Next: Configure Instance details.
2. In this section, you can select your VPC configuration, or leave the default values. When you are done with your configuration click on Next: Add Storage.
3. In the Storage section, two drives are recommended; you can influence the Volume Type and the Size (GiB) parameters. When ready, click on Next: Add Tags.
4. In the Tags section, include the tags that are meaningful to you; as a minimum, a Name tag is recommended. Click on Next: Configure Security Group.
5. In the Security Group section, you can leave the default recommended security group or add more rules if needed. Click on Review and Launch.
6. Review the details and click on Launch, then select your preferred Key Pair to connect to the instance.
In a few minutes your instance will be deployed; you can track the progress in your EC2 Dashboard:
Write down the Public IP address / Public DNS and the Instance ID once they are available.
You can log on to your Exivity instance with RDP, but after deployment you should also be able to connect to it directly using the public IP address or DNS name of your Exivity instance at the following default URL:
https://<Your_Public_DNS>:8001
The default admin username is admin and the password is the Instance ID.
By default no data is loaded into the system, so you'll have to create a new Extractor for obtaining consumption data and a Transformer to process that data. A Report Definition is then created to be able to report on your consumption metrics and costs.
A couple of getting started guides are provided here, but feel free to drop us an e-mail or create a ticket in our support portal. We will then assist you in getting started with your specific use case.
Exivity can be installed in your on-premises data center using the provided installer. You can deploy it automatically using the silent installation command line options, or run it as an interactive installer.
Before you can install Exivity, you'll need the following:
A system that complies with the Exivity minimal system requirements
The Exivity software installation executable
A valid Exivity license key
If you need help on fulfilling one or more of the above requirements, please get in contact with our support department.
To install Exivity interactively, execute the provided setup executable. Then follow the instructions on screen:
Provide a valid license key
Pick a folder to install the Exivity program Files
Pick a folder that Exivity will use to save its configuration, extraction and reporting files. This is called the Exivity home directory
Provide the TCP port for the Exivity API (default: 8002)
Provide the TCP port for the Exivity GUI (default: 8001)
Choose an admin username and password
After the installer completes, you can have it start Exivity automatically.
To execute a silent installation, consider the following command line parameters:
<setup>.exe /S /EXIVITY_PROGRAM_PATH=[path] /EXIVITY_HOME_PATH=[path] /ADMIN_USER=[user] /ADMIN_PASSWORD=[password]
EXAMPLE:
<setup>.exe /S /EXIVITY_PROGRAM_PATH="C:\Program Files\Exivity\program" /EXIVITY_HOME_PATH=D:\Exivity\home /ADMIN_USER=master /ADMIN_PASSWORD=P@ssword
Upgrading your installation of Exivity is very simple and fast:
Execute the setup executable. It will detect the installed version of Exivity, and will automatically upgrade when you click Next
When executing `<setup>.exe /S`, your existing installation will be automatically upgraded.
Exivity ships with an insecure self-signed SSL certificate by default. It is therefore highly recommended to replace the default certificate with an official one, signed by your Certificate Authority. To install a signed certificate, follow this procedure:
Download the 32 bit version of openssl.exe from https://slproweb.com/products/Win32OpenSSL.html, and install this tool on the Exivity server
Use the openssl.exe executable to generate a valid key file on the Exivity server by executing the following command:
C:\TEMP>c:\location\to\openssl.exe genrsa -out exivity.key 4096
Then run the following command to create a certificate signing request file:
C:\TEMP>c:\location\to\openssl.exe req -new -key exivity.key -out exivity.csr
You will be asked to enter general information like company name, city, etc. However, it is important to include the FQDN of the Exivity server when asked for:
'Common Name (e.g. server FQDN or YOUR name) []'
NOTE: when asked for a password, do not provide one (leave this field empty and press Return), otherwise the Exivity application will not be able to use your certificate.
The generated CSR file should be sent to your Certificate Authority. After processing by your CA, you should receive back a .crt file. Rename this file to exivity.crt and copy it, together with your exivity.key file, into your %EXIVITY_PROGRAM_PATH%\server\nginx\conf directory. This should overwrite the existing key and crt files.
Now restart the Exivity Web Service Windows service, and your signed certificate should be active.
In some cases it might be desirable to separate the webserver from the backend components. This can be achieved by installing two separate Exivity instances. One instance could be placed in a DMZ, and the second instance would then typically be deployed within a local network as shown in the following diagram:
To achieve this goal, first install Exivity on both nodes using the standard procedure described here. After installing the Exivity software on the system that should become the User Portal, create the following file in the directory %EXIVITY_PROGRAM_PATH%/server/nginx/conf/sites-enabled:

Make sure to replace HOSTNAME_BACKEND_PORTAL with the actual hostname or IP address of the system that serves as your Exivity Backend Portal.
The second item that requires configuration is the config.json file in the %EXIVITY_PROGRAM_PATH%/web/glass directory on the User Portal:

Replace HOSTNAME_USER_PORTAL with the actual hostname or IP address of the system that serves as your Exivity User Portal. If the User Portal should be accessible from the internet, make sure to provide the fully qualified domain name of the User Portal.
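The exact file contents are not reproduced here; based on the apiHost option described in the config.json section later in this documentation, the relevant entry would look roughly as follows (the scheme and port are assumptions and may differ in your environment):

```json
{
  "apiHost": "https://HOSTNAME_USER_PORTAL:8002"
}
```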
Once you have applied these changes, restart the Exivity Web Service Windows service. You should now be able to access your Exivity User Portal.
The following articles provide information regarding the basic concepts around which Exivity is built. These concepts are referenced repeatedly throughout the documentation and as such the below articles are recommended reading:
- CSVs, Datasets and DSETs - the input files to Transcript and how their contents are accessed by Transcript tasks
- Data Date - the date against which data is to be processed by Transcript
- Base Working Directory - the directory within which Transcript and Edify operate
- Reporting Database Files - the files produced by Transcript and used by Edify to generate reports
When Transcript is executed, one of the command line arguments it requires is a date in yyyyMMdd format, which is termed the data date. The activities performed by Transcript are associated with this date in several ways, most notably:

- to determine which specific Dataset file to import from \collected when using automatic source tagging
- to determine the concept of 'today' when processing files containing data with timestamps spanning multiple days
- to determine the output directory into which RDF files will be generated
- to generate the filenames of the RDF files generated

The data date is made available to a Transcript task through the automatic creation of the ${dataDate} variable.
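As a rough illustration (the path and source tag below are assumptions; the import statement itself is described in the CSVs, Datasets and DSETs article), the variable can be referenced wherever a date-stamped name is needed:

```
# Use the automatically created ${dataDate} variable (yyyyMMdd) to import
# a file whose name includes the current data date (path is illustrative)
import my_custom_data\${dataDate}_costs.csv source costs
```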
Exivity is a metering and billing software solution for public and private cloud environments that allows you to report on cloud consumption from any IT resource. Exivity enables you to apply your MSP/CSP business rules and makes any type of Pay-as-you-Go model work. It also facilitates internal charge-back and show-back requirements for Enterprise IT.
These things are done by extracting IT consumption data from various endpoints and then mapping this data to meaningful customer-specific information such as services, customer IDs, names and contracts.
There are four main steps involved in a successful deployment:
Extract
Transform
Report
Integrate (optional)
The Extract step defines your data sources such as:
APIs that return usage data, service catalogue, rate card, customer/subscriber lists and similarly available records from public or private clouds
APIs or ODBC queries that return contracts, customer names, IDs and other contextual lookup data from CMDB / CRM systems
Flat files on disk in CSV, JSON or XML format
Other HTTP/S sources
Exivity provides a rich scripting interface via its Unified Scriptable Extractor (USE) component which facilitates integration with almost any data source. For most of the big cloud platforms we provide template extractor scripts as part of the product. You can also write your own USE scripts from scratch in order to integrate with custom data sources.
The Transform step provides a powerful ETL framework for processing extracted data. Using it you can merge consumption metrics, contract details, customer information, custom metadata, service definitions or any other imported information to produce an enriched and/or normalised result.
This is done using the Transcript component, which executes user-definable scripts (termed tasks) in order to produce a meaningful set of data suitable for reporting against. Often this data will feed a consolidated bill of IT based on the various different consumed services.
Transcript also allows you to define and populate services and rates, either of which may be passed through from cloud data, defined as custom offerings or a mixture of the two.
Exivity provides a modern responsive User Interface, that allows you to 'slice and dice' the processed data in any way you choose. Multiple Report Definitions can be created with ease which allow you to graphically and textually display both cost and usage statistics.
We think that Exivity should be part of your automation landscape, where it can provide (for example) line items that can be digested by your ERP and/or invoicing system. Therefore we consider Integrate as the logical final step for any deployment where it is useful.
To this end we offer an open and fully featured REST API. Our GUI uses our own API for all back-end processes meaning that all textual data shown in the Exivity GUI is also obtainable via our API.
Exivity can be installed on any Microsoft Windows Server 2012 R2 or higher server in your on-premises data center or in the cloud. Depending on the amount of data, Exivity recommends the following system configuration:
Both the Data sources and CUPR figures in the sizing table below are recommended limits. All systems should have a standard C: OS drive. The storage recommendation in the table is for a D: drive, which should preferably be an SSD.
The Exivity front-end runs fine in these desktop browsers (with at least the specified version):
Google Chrome 63
Microsoft Edge 16
Opera 50
We don't support Apple Safari and Mozilla Firefox at the moment due to missing features in these browsers.
We aim to provide the fastest metering and billing solution available today, and this means we have to rely on modern (web) technologies. Part of our speed comes from pre-processing the raw data, and part comes from having almost all processed data available right in the browser, and streaming the missing pieces on request. To efficiently and reliably achieve this, we use some very specific technologies not yet available in all browsers. Most notably, Safari and Firefox don't support all features we need to build our next-generation platform yet. When they do catch up, we'll fully support those browsers; until that time, please choose from Edge, Chrome or Opera.
Apart from installing Exivity in an on-premises environment, Exivity can also be deployed from the Azure Marketplace (AMP). Deploying Exivity via AMP is straightforward, and can be finished within a few minutes via your Azure Portal.
Login to your Azure Portal at https://portal.azure.com and then go to the Marketplace to search for the Exivity offer:
Once you've selected the Exivity offering, you should be presented with the following screen:
After clicking the Create button, you will be redirected to the VM deployment wizard.
1. Fill in a Windows user/pass and pick your deployment Resource Group:
Make sure to write down this username and password, as you will need these when connecting to the Exivity Windows server using the Remote Desktop Protocol.
2. Try to pick a recommended VM size type that has enough CPUs and memory (see the system requirements section for guidance). Smaller machines are possible, but will influence performance:
3. You may select any additional options, but none are required for running Exivity successfully, so you may skip this page simply by clicking the OK button:
4. Review the summary and click Create to deploy your Exivity VM:
This may take a few minutes. You may review the status of the Virtual Machine in your VM list:
Write down the Public IP address once it is available. Optionally you may configure a custom DNS name to have an easy way to connect.
You can log on to your Exivity instance with RDP, but after deployment you should also be able to connect to it directly using the public IP address or DNS name of your Exivity instance at the following default URL:
https://<Your_Public_IP>:8001
The default admin username is admin with password exivity.
By default no data is loaded into the system, so you'll have to create a new Extractor for obtaining consumption data and a Transformer to process that data. A Report Definition is then created to be able to report on your consumption metrics and costs.
A couple of getting started guides are provided here, but feel free to drop us an e-mail or create a ticket in our support portal. We will then assist you in getting started with your specific use case.
| Deployment | Data sources | CUPR | CPU | Memory | Storage |
| --- | --- | --- | --- | --- | --- |
| Tiny | 2 | 2 000 | 1 core | 4 GB | 25 GB |
| Small | 2 | 10 000 | 2 cores | 8 GB | 50 GB |
| Medium | 4 | 30 000 | 4 cores | 16 GB | 100 GB |
| Large | 8 | 60 000 | 8 cores | 32 GB | 200 GB |
The main program directory as it should be installed by the Exivity installer:
CSV files are a way of storing data in a table format. A table consists of one or more rows, each row containing one or more columns. There is no formal specification for CSV files although a proposed standard can be found at https://www.ietf.org/rfc/rfc4180.txt.
Although the 'C' in 'CSV' stands for the word comma, other characters may be used as the separator; TAB and semicolons are common alternatives. Exivity can import CSVs using any separator character apart from a dot or an ASCII NUL, and uses the comma by default.
A Dataset is a CSV file, usually produced by USE, which can be imported for processing by a Transcript task. To qualify as a dataset a CSV file must meet the following requirements:
- The first line in the file must define the column names
- Every row in the file must contain the same number of fields
- A field with no value is represented as a single separator character
- The separator character must not be a dot (`.`)
- There may be no ASCII NUL characters in the file (a NUL character has an ASCII value of 0)

When a Dataset is imported by a Transcript task, any dots (`.`) contained in a column name will be replaced with underscores (`_`). This is due to a dot being a reserved character used by Fully Qualified Column Names.
When a DSET is exported during execution of a Transcript task, the exported CSV file will always be a Dataset, in that it can be imported by another Transcript task.
Although datasets are generated by USE as part of the extraction phase, additional information may be provided in CSV format for the purposes of enriching the extracted data. This additional CSV data must conform to the requirements above.
A DSET is the data in a Dataset once it has been imported by a Transcript task. A DSET resides in RAM during the Transform phase and, if referenced by a finish statement, is then stored in a database file for long-term use.
A Transcript task may import more than one dataset during execution, in which case multiple DSETs will be resident in RAM at the same time. It is therefore necessary for any subsequent Transcript statement which manipulates the data in a DSET to identify which DSET (or in some cases DSETs) to perform the operation on. This is achieved using a DSET ID which is the combination of a Source tag and an Alias tag.
If multiple DSETs are present in memory, the first one that was created will be the default DSET. Column names that are not fully qualified are assumed to be located in the default DSET.
After a Dataset has been imported the resulting DSET is assigned a unique identifier such that it can be identified by subsequent statements. This unique identifier is termed the DSET ID and consists of two components, the Source and Alias tags.
The default alias tag is defined automatically and is the filename of the imported Dataset, minus the file extension. For example a Dataset called usage.csv will have an alias of usage.
The Transcript import statement can take one of two forms. Depending on which is used, the Source tag is determined automatically, or specified manually as follows:
By convention, Datasets produced by USE are located in sub-directories within the directory <basedir>\collected. The sub-directories are named according to the data source and datadate associated with that data. The naming convention is as follows: <basedir>\collected\<data_source>\<yyyy>\<MM>\<dd>_<filename>.csv

where:

- <data_source> is a descriptive name for the external source from which the data was collected
- <yyyy> is the year of the datadate as a 4-digit number
- <MM> is the month of the datadate as a 2-digit number
- <dd> is the day of the month of the datadate as a 2-digit number
When importing one of these datasets using automatic source tagging, the source tag will be the name of the directory containing that dataset. Thus, assuming a datadate of 20160801 the following statement:
import usage from Azure
will import the Dataset <basedir>\collected\Azure\2016\08\01_usage.csv and assign it a Source tag of Azure.
When importing a Dataset from a specified path, the Source tag is specified as part of the import statement. For example the statement:
import my_custom_data\mycosts.csv source costs
will import the Dataset <basedir>\my_custom_data\mycosts.csv and assign it a Source tag of costs. Checks are done during the import process to ensure that every imported Dataset has a unique source.alias combination.
When a column name is prefixed with a DSET ID in the manner described previously, it is said to be fully qualified. For example the fully qualified column name Azure.usage.MeterName explicitly refers to the column called MeterName in the DSET Azure.usage.
During the execution of a task, or during the generation of a report, a number of files are accessed, both for reading and writing. The user Home Working Directory (referred to throughout this documentation as home_dir or base_dir) is the directory relative to which these files are located.
The home directory should preferably be located on a dedicated volume, e.g. D:\exivity\home, and it is recommended that this volume be an SSD.
RDF stands for Reporting Database File. Exivity uses these to store usage data and configuration information.
A daily RDF stores the usage data from which Edify produces report data. Any given RDF contains a single DSET, along with internally managed metadata such as column types and prepared report data.
An RDF is created using the finish statement in a Transcript task. For any given DSET, there can be a single RDF per day, although there may be many RDF files per day in total (one RDF for each DSET).
RDFs are named according to the day of the datadate and the ID of the DSET they contain, and have a .rdf extension. For example a DSET with an ID of Azure.usage for the first day of the month will result in an RDF called 01_Azure.usage.rdf.
An RDF containing usage data is located at <home dir>/report/<yyyy>/<MM>/<dd>_<dset>.rdf, where:

- <yyyy> is the year of the datadate
- <MM> is the month of the datadate
- <dd> is the day of the datadate
- <dset> is the DSET ID that was used to populate the usage data
The global RDF (located at <home_dir>/system/global.rdf) contains system-wide configuration data including (but not necessarily limited to):
Service definitions
Service rate revisions
Service rate adjustments
Service categories
User accounts
Report definitions (and related metadata such as synchronisation timestamps)
Metadata about RDF files created by Transcript
Usergroups
Security settings
Account information
Job Schedules
The Global RDF should never be manually modified unless this process is performed by, or under the guidance of, Exivity support staff. It is critical to make a backup copy of this file before any changes are made.
The file p_gp.rdf (located in <basedir>/system/report) is used by the charge engine to store the output of report runs.
This page is a work in progress
In Exivity Services can be anything that corresponds to a SKU or sellable item from your Service Catalogue. It should relate to a consumption record (or multiple records) from your extracted data sources.
For example: with most public cloud providers, the provider defines the chargeable items that are shown on the end of month invoice. However, when working through a Managed Services Provider, a Cloud Services Provider or a System Integrator, additional services can be sold on top of those. Potentially, you may want to apply an uplift to the rate or charge a fixed amount of money every month for a certain service. Different scenarios are possible here; it all depends on your business logic.
A service is a named item with associated rates and/or costs used to calculate a charge that appears on a report, where rates represent revenue and costs represent overheads.
When discussing services and their related charges a number of terms are required. Exivity uses the following terminology in this regard:
When created during the ETL process, service definitions are created via the service and services statements in Transcript. During the execution of a Transcript task, service definitions created by these statements are cached in memory. Once the task has completed successfully, the cached services are written to the global database where they remain indefinitely (or until such time as they are manually deleted).
If the task does not complete successfully then the service definitions cached in memory are discarded, the expectation being that the task will be re-run after the error condition that caused it to fail has been rectified and the services will be written to the global database at that time.
There are different types of charge that can be associated with a service. Collectively these influence the total charge(s) shown on the report and Exivity supports the following charge types as described in the Terminology table above:
unit rate
fixed price
COGS rate
fixed COGS
At least one of these charge types must be associated with a service definition. Multiple combinations of the charge types may also be used.
Once the resulting charge has been calculated based on the charge types, it may be further modified through the application of adjustments, proration and minimum commit (all of which are detailed later in this article).
In order to calculate the charge(s) associated with usage of a service Exivity needs to know the period of time for which each payment is valid. For example a Virtual Machine may have a daily cost associated with it, in which case using it multiple times in a single day counts as a single unit of consumption, whereas Network Bandwidth may be chargeable per Gigabyte and each gigabyte transferred is charged as it occurs.
The charge interval (also termed simply interval) for a service can be one of the following:
individually - the charge for a service is applied every time a unit of the service is consumed, with no regard for a charging interval
daily - the charge is applied once per day
monthly - the charge is applied once per calendar month
Although hourly charge intervals are not yet directly supported, it is possible to charge per hour by aggregating hourly records and using the EXIVITY_AGGR_COUNT column created during that process to determine the units of hourly consumption.
The minimum commit is the minimum number of units of consumption that are charged every interval, or (in the case of services with an interval of individually) every time the service is used. If fewer units than the minimum commit are actually consumed then the service will be charged as if the minimum commit number of units had been used.
After the charge for usage of a monthly service has been determined, it may be prorated by modifying that charge based on the frequency of the usage.
This process will reduce the charge based on the number of days within the month that the service was used. For example if consumption of a service with a monthly charge interval was only seen for 15 days within a 30 day calendar month then the final charge will be 1/2 of the monthly charge.
!!! note Daily proration whereby a daily service is prorated based on the hours seen will be implemented in the second half of 2018 and this documentation will be updated at that time to reflect the additional feature
A service definition comprises two categories of information:
The service - Metadata describing fixed attributes of the service such as its name, description, group, interval, proration and charge type(s)
The rate revision - Information detailing the charge type(s) associated with the service (the rate, fixed price, COGS rate and fixed COGS values) and additional information detailing the date(s) for which those values should be applied
A service definition is associated with a specific DSET, as the units of consumption are retrieved from a column (named in the service definition itself) in the usage data.
The following tables summarise the data members that comprise each of these categories:
The rate_col, fixed_price_col, cogs_col and fixed_cogs_col fields are used when the specific value to use is derived at report-time from the usage data, as opposed to explicitly being included in the rate revision itself.
A service may have any number of associated rate revisions so long as they have different effective_date or minimum commit values. This means that a service can have different charges applied depending on the date that the report is to be generated for, or depending on the specific values in the columns used by a report.
A service may use neither, either or both of rate and fixed_price, and neither or one of cogs and fixed_cogs. At least one of rate, fixed_price, cogs or fixed_cogs is required, but cogs and fixed_cogs may not both be used.
Any or all of rate, fixed_price, cogs and fixed_cogs may have a value of 0.0, in which case no charges will be levied against the service but the units of consumption will still be shown on reports.
| Term | Synonym/Abbreviation | Meaning |
| --- | --- | --- |
| service definition | service | A template defining the manner in which service instances should be charged |
| service instance | instance | Consumption of a service, associated with a unique value such as a VM ID, a VM hostname, a resource ID or any other distinguishing field in the usage data |
| unit of consumption | unit | The consumption of 1 quantity of a service instance |
| charge interval | interval | The period of time that a unit of consumption is charged over (additional units of the same service instance consumed within the charge interval do not increase the resulting charge) |
| unit rate | rate | The charge associated with 1 unit of consumption of a service instance in the charge interval |
| COGS rate | cogs | (Short for Cost Of Goods Sold) The cost (overhead) to the provider of a service for providing 1 unit of consumption of that service per charge interval |
| fixed price | fixed rate or interval-based rate | A specific amount charged per service instance per interval for one or more units of consumption |
| fixed COGS | interval-based COGS | A specific amount representing the overheads associated with providing one service instance of a service per charge interval |
| charge | | A generic term to indicate some money payable by the consumer of service instances to the provider of those instances |
| Attribute | Purpose |
| --- | --- |
| key | A unique key (as a textual string) used to identify the service |
| description | A user-defined description or label for the service |
| group or category | An arbitrary label used to group services together |
| unit label | A label for the units of measure, such as 'GB' for storage |
| RDF or DSET | The DSET ID of the usage data against which the service is reported |
| usage_col | The name of the column in the usage data from which the number of units consumed can be derived |
| interval | The charging interval for the service, such as 'daily', 'monthly' etc. |
| proration or model | Whether the service is prorated or unprorated |
| rate type | Which (if any) of rate and fixed rate to apply |
| cogs type | Which (if any) of cogs and fixed cogs to apply |
| Field | Description |
| --- | --- |
| rate | The cost per unit of consumption |
| rate_col | The name of a column containing the cost per unit of consumption |
| fixed_price | A fixed charge associated with use of the service per charging interval, regardless of the amount of usage |
| fixed_price_col | The name of a column containing the fixed charges as described above |
| cogs | (Short for Cost Of Goods Sold) The cost per unit associated with delivery of the service |
| cogs_col | The name of a column containing the COGS cost per unit |
| fixed_cogs | As for fixed_price but for the cost of delivering the service |
| fixed_cogs_col | The name of a column containing the fixed_cogs prices |
| effective_date | A date in yyyyMMdd format (stored internally as an integer) from which the rate is valid |
| minimum commit | The minimum commit value for the service (if this is 0 then no minimum commit is applied) |
The 'Data Sources' menu allows an admin of the Exivity solution to manage USE 'Extractors'. USE has its own language reference, which is fully covered in a separate chapter of this documentation.
As described in the USE documentation, you are free to use your editor of choice to create and modify USE Extractors. However, the GUI also comes with a built-in USE Extractor-editor.
To create a new USE Extractor, follow these steps:
From the menu on the left, select "Data Sources" > 'Extractors'
To create a new USE Extractor for the data source you want to pull usage or lookup data from, click the 'Add Extractors' button
When your Exivity instance has access to the Internet, it will pull in the latest set of Extraction Templates from our Github account. These templates are then presented to you, and you can pick one from the list to start Extracting. If you don't have access to the internet, you can download them directly from Github. You are also free to start creating your own Extractor from scratch.
Provide a meaningful name for your USE Extractor. In the above example we're creating a USE Extractor for VMware vCenter 6.5 and higher; therefore we call this USE Extractor 'vCenter 6.5'
When you're done creating your USE Extractor, click the 'Insert' button at the bottom of the screen
When you want to change or delete an existing USE Extractor, first select the one that you want to change from the list of USE Extractors:
When you've selected your USE Extractor from the "Data Sources" > 'Extractors' list, you can change the USE Extractor variable values via the 'Variables' tab. Any variable value (except encrypted variables) from the USE Extractor script can be changed via this menu
In this example there is a variable called User, which has a value of vcenter_admin. We can change this value simply by changing that value to something else: my_new_user
When clicking 'Save' at the bottom right of this screen, you can save your changes to this USE Extractor
If you want to make more fundamental changes to this USE Extractor, you may click on the 'Editor' tab just next to the 'Variables' tab.
In the 'Editor' screen, you can make more advanced changes to your script. Such as:
changing existing API calls
changing the CSV output format
providing usernames/passwords that require encryption
In case you want to save your changes, click the 'Save' button at the bottom of the 'Editor' screen. To delete this USE Extractor, click the 'Remove' button, after which you'll receive a confirmation pop-up where you'll have to click 'OK'.
To test your USE Extractor, you can execute or schedule it directly from the Glass interface:
After you have selected the USE Extractor that you would like to run, click the 'Run' tab next to the 'Editor' tab
Most Extractors require one or more parameters, usually in a date format such as 20171231. In this example, the USE Extractor requires two parameters: a from and a to date
When you've provided the required run parameters, click 'Run Now' to execute the USE Extractor. After the USE Extractor has completed running, you will receive a success or failure message, after which you might need to make additional changes to your USE Extractor
Once you're happy with your output, you can schedule the USE Extractor via the 'Schedule' tab, which is located next to the 'Run' tab at the top of the screen.
USE Extractors can be scheduled to run once a day at a specific time. You should also provide a from and (optionally) a to date, which are provided using an offset value. For example, if you want to use the day before yesterday as the from date, use the down-pointing arrows on the right to select a value of -2. If the to date should always correspond with yesterday's date, provide a value there of -1.
If your Use Extractor requires additional parameters, you may provide these as well in the 'Schedule with these arguments' text field.
When you're done with the schedule configuration, you may click the 'Schedule' button. In case you want to change or remove this schedule afterwards, click the 'Unschedule' button.
As of version 1.6, it is recommended to use the Workflow function instead of the Extractor schedule
The 'Data Sources' menu allows an admin of the Exivity solution to manage Transcript 'Transformer' scripts. Transcript has its own language reference, which is fully covered in a separate chapter of this documentation.
As described in the Transcript Documentation, you are free to use your editor of choice to create and modify Transformers. However, the GUI also comes with a built-in Transformers-editor.
To create a new Transformer for Transcript, follow these steps:
From the menu on the left, select "Data Sources" > 'Transformer'
To create a new Transformer to normalise and enrich USE Extractor consumption and lookup data, click the 'Add Transformer' button
When your Exivity instance has access to the Internet, it will pull in the latest set of Transformer Templates from our Github account. These templates are then presented to you, and you can pick one from the list to start transforming. If you don't have access to the internet, you can download them directly from Github. You are also free to start creating your own Transformer from scratch.
Provide a meaningful name for your Transformer. In the above example we're creating a Transformer for a consolidated bill of various IT resources; therefore we call this Transformer 'IT Services Consumption'
When you're done creating your Transformer, click the 'Insert' button at the bottom of the screen.
!!! INFO The Transformer editor has syntax highlighting and auto completion, to simplify the development of your scripts
When you want to change or delete an existing Transformer, first select the one you want to change from the list of Transformers:
When you've selected your Transformer from the "Data Sources" > 'Transformers' list, you can change the Transformer script in the editor
In this example, we're adding a 'services' statement using auto completion, to simplify the creation of services
In case you want to save your changes, click the 'Save' button at the bottom of the 'Editor' screen. To delete this Transformer, click the 'Remove' button, after which you'll receive a confirmation pop-up where you'll have to click 'OK'.
To test your Transformer, you can execute or schedule it directly from the Glass interface:
After you have selected the Transformer that you would like to run, click the 'Run' tab next to the 'Editor' tab
Manual execution of a Transformer can only be done for a single day. Provide the date you want to run this Transformer for in dd-MM-yyyy format. You can also use the date picker, by clicking on the down-facing arrow on the right side of the date field
When you've provided the required date, click 'Run Now' to execute the Transformer. After the Transformer has completed running, you will receive a success or failure message, after which you might need to make additional changes to your Transformer. For further investigation or troubleshooting, consult the "Administration" > "Log Viewer" screen
Once you're happy with your output, you can schedule the Transformer via the 'Schedule' tab, which is located next to the 'Run' tab at the top of the screen
Transformers can be scheduled to run once a day at a specific time. You should also provide a date, which is provided using an offset value. For example, if you want to execute this Transformer against yesterday's date with every schedule run, you should provide a value there of -1
When you're done with the schedule configuration, you may click the 'Schedule' button. In case you want to change or remove this schedule afterwards, click the 'Unschedule' button.
!!! warning As of version 1.6, it is recommended to use the Workflow function instead of the Transformer schedule
Exivity enables you to create account specific rate adjustment policies. An adjustment policy allows you to apply a discount or a premium using one of these modifiers:
a certain amount of money (e.g. $100)
a certain quantity (e.g. 100 GB/hours)
a percentage (e.g. 10%)
This Adjustment can then be applied to a single service, multiple different services, or one or more service categories.
To create a new adjustment policy for an account, follow these steps:
From the menu on the left, select 'Catalogue' > 'Adjustments'
Then select the Account from the list of accounts for which you want to create an adjustment policy
After selecting the account, click 'Add Policy', and provide a meaningful name for your policy in the right-hand pane where it says 'Adjustment name'
Provide the Start date, by selecting the initial month when this adjustment policy is applied
Provide the End date, by selecting the month when this adjustment policy will be discontinued. This is optional, since an adjustment policy can be applied permanently.
Select which Service or Service Category this policy is applied to. You are able to select multiple using the check boxes that are provided.
Select a Type for this adjustment. This can be either a Discount or a Premium
Select the Target, meaning: is this Adjustment targeting the total Charge or the total Quantity of the selected service(s)?
Select the Difference setting, to indicate an Absolute value (e.g. 100 units, or 100 dollars) or a Relative value (such as 10%)
Lastly, provide the Adjustment value. In the example shown in the image above, a value of '10' is provided in the Amount field, which will adjust the total charge by -10% given the provided parameters.
When you're done, click the Add Policy button. Your changes are now applied to all charge related reports.
The 'Rates' screen allows you to configure manual rates for services that do not have a rate provided with their data source. Before you can use this screen, it is required to create the necessary service(s) via the Transcript engine. When that requirement has been fulfilled, you may configure Global and customer-specific rate configurations. The following rate types are currently supported for automatic, daily and monthly services:
automatic per unit
automatic per interval
automatic per unit & interval
manual per unit
manual per interval
manual per unit & interval
Automatic services obtain the rate and/or interval value from a column you specify, whereas manual services allow the user to manually specify a rate and a fixed interval money value. For manual services, if a service definition has proration enabled, the charge on the cost reports is calculated based on the actual consumption (see the description of proration earlier in this documentation).
By default each service has a global rate configured, which will be applied to all accounts that consume this service. However, it is possible to use customer-specific rates by overriding a service rate for one or more accounts.
A manual service can have up to 3 rate values that can be changed: the unit rate, the interval money value and the COGS rate. To change these values, go to the 'Catalogue' > 'Rates' screen and click on the service name for which you want to change the global rate value:
To change the rate values of this service, consider the following:
Effective date is the date from when this rate is applied to the service. A service can have one or multiple revisions. You may add new rate revisions by using the Add Revision button. Existing rate revision dates can be changed using the Change Date option
The Per Unit rate value is the amount charged per unit for every (portion of the) configured interval. In this example, if this were a daily service charged at 1 euro per gigabyte of database usage, and a 100 GB database is consumed each day, a value of € 100 will be charged per day (and € 3100 if used for the entire month of December)
The Per Interval charge is applied for every occurrence of the consumed service, regardless of the total consumed quantity of that service. In the previous example (item 2), this would mean that every day a value of 110 euro will be charged for a 100 GB database: 100 euro for the standard rate plus 10 euro for the fixed interval charge. Considering the same consumption for the entire month of December, the total charge for that month will be 31 x € 110 = € 3410
It is possible to configure a COGS rate for this service. This is applied the same way as the Per Unit rate
To delete an invalid or wrong revision, use the Remove Revision button. Do bear in mind you cannot delete the last rate revision for a service
If you are planning to make more changes to other services in the same report definition, use the Save Revision > Without Preparing option. This will avoid running the re-preparation several times, and allows you to start the re-preparation only after you've made all of the required rate changes.
The services screen gives a user the ability to view and change the available services in the service catalogue of the Exivity deployment. When creating new services, it is required to use a Transformer with the service or services statement.
To view the details of a service that has already been created, click on one of the services listed in the 'Catalogue' > 'Services' screen:
The numbered items from above screenshot refer to the following list:
The description or friendly name for this service
The time stamp when the service was created
The time stamp when the service was updated
The Dataset to which this service relates
Where to obtain the service name from (in header or in data). The value will be used for the service description (see 1)
The source column that has the consumed quantity
The Instance column refers to the chargeable instance column value (i.e. VM ID) which is required for automatic Services
The interval that defines how often this service is charged: automatic (every occurrence/record/hour), daily or monthly
When using proration, this checkbox will be enabled. Proration takes into account whether to charge for a portion of a consumption interval. For example: 10 days of consumption of a monthly service with a configured rate of € 90 per unit and proration enabled will result in a line item of € 30 for that service's monthly charge
The Billing Type provides information whether this is a service that has manual (using manually provided, adjustable rate value) or automatic (using rate column) rates configured
COGS (Cost of Goods) of a service will have its own rate configuration, which can be either manual/automatic per unit or manual/automatic per interval
Exivity will in a future release allow the user to change and create services through the web interface. At this time, this feature is disabled.
In case you need to change the configuration of an already populated service, the GUI enables you to do so. To change an existing service you will need to make sure that you have first selected the appropriate report from the Report Selector at the left top of the screen. To change the configuration of a service, or to delete it, follow these steps:
Navigate to the Catalogue > Services menu and click the white Edit button at the top of the service list. The system will warn you that any changes made to an existing service may require you to re-Prepare the currently selected Report Definition.
If you have confirmed the warning message, you will be able to select one, multiple or all of the services within the currently selected Report Definition. You can then select the Delete button next to the Edit button, to delete all selected services.
If you want to change the configuration of one of the services, you should first select the service which you'd like to change.
When you have the service that you want to change selected, you can change any of the available parameters such as the Instance Column, Interval, etc. Once you are satisfied with your changes, you may press the Update button.
Ensure to re-Prepare your report in the Report Definitions in case you have made any changes.
To save your changes, which will also initiate a re-preparation of the applicable Report Definition, click the Save Revision button (see the section on report preparation to learn more)
The unique key value of this service
Exivity allows you to configure a number of system variables easily, such as:
custom logo, branding and color schema
currency and decimal values
date format
fiscal year
custom css
To change these and other settings, follow the instructions provided in this document.
To apply custom branding to the Glass interface, use the 'Administration' > 'Configuration' menu, and then select the 'Branding' tab:
The following GUI adjustments can be done:
Provide your custom product name
Change the logo as shown at the left top part of the screen (the rectangular logo that says "Exivity" by default)
Change the square logo at the top left part of the screen, next to the brand logo that says "Exivity" by default
Change the favicon as shown in your browser favorites bar
Change the GUI default color using the drop down list.
If you want to change other colors, font types and other CSS specific parts of the GUI that are not listed in this menu, you can use the 'Show Advanced' button, to add custom CSS code
Click the 'Save' button to apply your changes.
To change country specific formatting or decimal rounding, use the 'Administration' > 'Configuration' menu, and select the 'Branding' tab:
The following changes to the formatting can be done:
Change the system wide Currency settings
Configure comma or dot value for the decimal separator
If you want to configure a thousands separator, you can choose between comma, dot and space
Rate precision refers to the amount of decimal values shown on reports for the (average) rate value
The Report precision can be changed to limit the number of decimal values shown on non-invoice reports
The Invoice precision can be changed to limit the number of decimal values shown on invoice cost reports
The quantity decimal values can be limited using Quantity precision
To apply a system wide date format, choose your preferred date format from the Date format drop down list
To change reporting settings, such as the address lines and logo on your invoice report, use the 'Administration' > 'Configuration' menu, and select the 'Reporting' tab:
The following changes to the reporting configuration can be done:
Use the Address lines text field to provide a custom address which will be shown on the textual invoice cost reports
To change the Logo that will be shown on the right hand side of your invoice cost report, use Select File to browse and select your custom logo
A custom footer text can be added to your invoice report using the text field next to Extra text
To adjust the maximum amount of series shown in the Exivity graphical report line graphs, change the value for Maximum graph series
If you have a non-standard fiscal year start, you may change the Reporting start month using the provided drop down list
Use the Save button to apply your changes
Exivity is able to send out e-mail messages for certain tasks. This can be used for resetting user account passwords and sending out events such as failing workflows. To enable this functionality it is required to configure an SMTP server. This can be done using the following procedure:
Navigate to: Administration > System > Environment
Ensure to fill in your SMTP server details as shown in the image above
Click the Update button to save your changes
If you now log out and try to log in again, you will notice a "Reset Password" link. This can be used by all of your users, as long as a valid e-mail address has been associated with each user account
Make sure that your SMTP server allows relaying using the provided credentials (user/pass) and from the Exivity host's IP address. Consult your mail server administrator for additional information.
config.json
In order to access advanced configuration for all users which can't currently be modified through the interface, edit the file config.json in the web/glass subdirectory of the Program directory.
The default contents of this file should look like this:
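The shipped defaults are not reproduced here; as an illustrative sketch based on the options documented below, the file is a small JSON object along these lines (the values are placeholders):

```json
{
  "apiHost": "https://localhost:8002",
  "whiteLabel": false
}
```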
When logging in, users have the option to change the API domain by clicking the Change domain link:
In order to specify a default for all users, edit the config.json file and add your default domain as the apiHost option:
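For example (the hostname and port below are placeholders):

```json
{
  "apiHost": "https://exivity.example.com:8002"
}
```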
To never show any Exivity related branding in the interface, set the whiteLabel option to true:
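A minimal sketch (any other options already present in the file are left unchanged):

```json
{
  "whiteLabel": true
}
```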
When the whiteLabel option is enabled, certain functionality will be disabled in the GUI:
Manual clearing of caches on the About page is disabled.
Documentation links in the header are turned off.
Some other elements in the interface have Exivity branding by default. These defaults can be modified to match your brand:
The title and logo of the application (displayed in browser tabs, sidebar, and so on) can be changed on the Configuration page.
The sender e-mail address and name for system and notification e-mails can be changed on the System page.
The Groups and Users menus allow you to configure Role Based Access and Account level permissions. Groups are created to define custom restrictive roles that can be associated with users who must have access to only certain parts of the Glass interface. Users are created to provide login credentials for other users of the Exivity system. A user can be associated with a single Group, and can have access to all accounts or to a subset of accounts.
To create a custom group with limited access to the Exivity Glass user interface, follow this procedure:
Go to 'Administration' > 'Groups', then click 'Add Group'
Provide a meaningful Name for this Group
Select one or multiple permissions from the custom list of 'Role Names' below
When you're done with your selection, click the 'Save' button to create your group. Now you can create Users and associate them to this group.
To create a User that will only have access to a certain account or a selection of several accounts, use the following procedure to create a user with special account level permissions:
Go to 'Administration' > 'Users', then click 'Add User'
Provide a login name in the 'Username' field, a valid 'E-mail' address, and a 'Password' of at least 8 characters
From the 'Group' dropdown box, select the 'Group' you want to associate with this user
Next, click the 'Account Access' tab, and select the 'Access Type' for this user:
Grant access to all accounts - to provide access to any of the available accounts in the system
Grant access to only specific accounts - to only provide access to usage and charge data of specific accounts
If you've selected Grant access to only specific accounts, then select from the Account Access list each Account Name that this user should have access to. You may select a top level account, or an account further down in the hierarchy. Account inheritance applies, meaning lower level accounts are automatically included when selecting a top level account. If there are multiple Report Definitions, select the corresponding Definition from the drop down list first, before selecting any accounts.
When all fields have been filled in, you may create the user by clicking the 'Save' button.
When deploying the Azure EA Extraction template for Exivity, some configuration is required within your Azure EA environment. The following process must be completed in order to report on Azure EA consumption:
Create an Access Key and Secret in your Azure EA portal
Configure the Azure EA Blob Extractor
Configure your Azure EA Transformer
Create a Report definition
In order for Exivity to authenticate with Azure EA, you will need to create an access key and secret in the Azure EA Portal. You will also need to find your enrollment number. To do this, log in to your Azure EA Portal at https://ea.azure.com and navigate to the Reports menu:
Just under the Windows logo in the left menu section you will find your enrollment number, which you will need to provide later when configuring the Extractor. To create the Access Key & Secret, click on Download Usage and then on API Access Key. This brings you to the menu where you can manage your Access Keys and the corresponding secret, which you will need in order to configure the data Extractor.
To create the Extractor, browse to Data Sources > Extractors in the Exivity GUI and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For Azure EA:
Pick Azure EA Blob from the list
Provide a name for the Extractor in the name field
Click the Create button.
Once you've created the Extractor, go to the first tab: Variables
Fill in all variables shown in the screenshot above; feel free to encrypt any sensitive data using the lock symbol on the right.
Once you've filled in all details, go to the Run tab to execute the Extractor for a single day:
The Extractor requires two parameters in yyyyMMdd format:
from_date is the date for which you wish to collect consumption data
to_date should be the date immediately following from_date
These should be specified as shown in the screenshot above, separated with a space.
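For example, to collect consumption data for 1 June 2018 (hypothetical dates, shown only to illustrate the format), the two parameters would be:

20180601 20180602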
When you click the Run Now button, you should get a successful result.
Once you have successfully run your Azure EA Blob Extractor, you can create a Transformer template via Data Sources > Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. Make any changes that you feel necessary and then select the run tab to execute it for a single day as a test.
Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for and select the same day as for which you have extracted consumption data in the previous step.
Once you have run both your Extractor and Transformer successfully create a Report Definition via the menu option Reports > Definitions:
Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.
Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.
The Workflows menu allows you to schedule various tasks and execute them at a specific date and time. This allows different Extractors and Transformers to be executed in sequence, so that they are tightly chained together.
To create a new Workflow, go to 'Administration' > 'Workflows', then click the green button '+ Workflow':
Provide a meaningful name for the Workflow
Optionally you can provide a detailed Description
The Start date of the Workflow, when it will run the first time
The interval: Hourly, Daily or Monthly
At what time should the Workflow start
Provide an interval value, e.g. 2 to run every second hour / day / month, depending on your configured interval
Add a new step in your workflow
Set the Type of the step using the drop down menu
Provide the option that goes with the selected Type. This can be your Extractor, Transformer, Report or other name
Depending on the selected Step Type, you can provide an offset date. This value is used during execution of that step. Typically this would be used for a From date offset (i.e. -1 for yesterday)
A To date offset can be provided for some step types (i.e. 0 for today)
Additional arguments can be sent to the step. This is typically used for some Extractors and when executing a custom Command
You can delete a step using the red minus button
To view historical Workflow results, click the Status tab
Apart from adding Extractor, Transformer and Report steps, there are two additional Workflow Step types:
Core
Execute Command
The Core command allows you to run a few predefined API calls; currently these are the following:
Run garbage collector
Cleans up the server getPrices cache table and Redis cache
Purge cache
This will Unprepare any prepared reports. Use with caution
Refresh budgets
Evaluate all configured budgets
Send heartbeat
Send an API heartbeat request. For future use.
Purge cache should be used with caution, as it will unprepare all available reports. This means that none of your reports will return any data until you have prepared them again.
The Execute Command step enables you to execute an external command, like a script:
As an example, you could run a PowerShell script to obtain data from a special data source that Exivity Extractors are not able or allowed to connect to. Such a script could be executed in the following manner:
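The command below is a sketch only; the script path is a hypothetical placeholder and should point to wherever special.ps1 lives on your Exivity server:

powershell.exe -Command "& 'C:\Exivity\scripts\special.ps1' (Get-Date).AddDays(-1).ToString('yyyyMMdd')"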
The above command calls the PowerShell executable to run the special.ps1 script, with a dynamically generated parameter that is evaluated at run time. This particular example always provides yesterday's date in yyyyMMdd format as a parameter to the special.ps1 script. Many other variations and scripting languages are possible. Feel free to experiment.
When deploying the Azure CSP solution for Exivity, some configuration is required within your Microsoft Cloud Solution Provider Portal. The following process must be completed in order to report on Azure CSP consumption:
Create a Partner Center Web Application
Configure Extractors for Azure CSP Usage, Billing & Ratecard
Configure Transformers
Create your Report
Create your Workflows
It is necessary to create independent Extractors and Transformers for Usage and Billing. The Usage Extractor retrieves data on a daily basis, giving an estimate of your daily costs. The Billing Extractor consolidates the rates based on the blended costs per service for the billing period.
Perform the following steps to create the Azure AD application that is configured to access the Partner Center API:
Browse to the Partner Center and log in using credentials that have admin agent and global admin privileges
Navigate to Dashboard > Account Settings > App Management
Click Add key to create a new Application Key for your App ID that can be used with Exivity.
Make sure to write down the App ID and its corresponding Key, since you will need these when configuring the Extractor later.
Go to the Billing section.
Open the last month's invoice in PDF format.
Take note of your billing period
Make sure to write down the billing period, since you will need it when configuring the Extractor later.
Go into the Exivity GUI and browse to Data Sources -> Extractors. Then click on Create Extractor and you should get a list of templates. Unfold Azure CSP and pick the usage template:
After selecting the template, click the green Create button at the bottom right. Now make sure to give the new Extractor a name in the field at the top:
Now click again on the green Create button at the bottom right. Then click on the Variables menu item:
Now make sure to fill in your Client ID, Secret and your onmicrosoft.com domain. If required, you can encrypt security-sensitive fields using the lock button on the right of each field. Once you have filled in these details, click the Update button.
Now test the Extractor by going into the Run tab and providing a from and to date like in this example:
Now click the Run Now button and confirm it works as expected:
Create a second extractor using the template Azure_CSP_Invoice_Extractor and give it a name.
This extractor uses the same variables in the Variables menu item as the previous extractor. You can now test the extractor by going to the Run tab; this script takes three arguments (an example follows the list below):
Positive offset: Starting at 0, it will retrieve the most recent invoice; a 1 will retrieve the previous one, and so on.
Year of the report: Year of the report you want to retrieve.
Starting day of the billing period: If your billing period goes from 22nd to 21st, the input will be 22.
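As a hypothetical example, assuming the three arguments are supplied in the order listed above and separated by spaces, retrieving the most recent invoice of 2019 for a billing period starting on the 22nd would look like:

0 2019 22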
After filling in the arguments you can test the extractor by clicking Run Now.
Finally, follow the same steps for the Azure Rate Card Extractor; this extractor does not need any arguments.
Once you have successfully run your Azure CSP Usage, Billing & Rate Card Extractors, you can create the Transformer templates via Data Sources -> Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. You will need to create two separate Transformers using these two templates:
The Azure_CSP_Daily-Usage Transformer will transform the daily usage data and the Azure_CSP_End-of-Month Transformer will consolidate the usage with the final blended rates.
Make any changes that you feel necessary and then select the Run tab to execute it for a single day as a test. Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for and select the same day for which you have extracted consumption data in the previous step.
Once you have run both your Extractor and Transformer successfully create a Report Definition via the menu option Reports > Definitions:
Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.
Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.
You may want to automate the CSP ETL process; you can achieve this by leveraging Exivity's Workflow capabilities. You will create two Workflows: one runs on a daily basis to calculate the usage of your CSP subscriptions, and the other runs on a monthly basis to consolidate the service rates.
Start by browsing to Administration -> Workflows in the Exivity GUI and click on the +Create button.
Fill in the Name and Description fields. In the SCHEDULES section, configure the workflow to run on a daily basis at a convenient time.
In the STEPS section you can create as many steps as needed by adding them with the + button. For the first, daily workflow, a minimum of 4 steps is required: two steps for the Usage and Ratecard extractors, one for the Transformer and one for the Report. Make sure to input the right FROM and TO date offsets. Click on Update to finish the creation of your first Workflow.
Create a second Workflow, fill in the Name and Description fields, and configure the SCHEDULES section to run on a monthly basis, preferably 2 to 3 days after your month's billing period has finished. For the monthly workflow, a minimum of 3 steps is required: one step for the Billing extractor, one for the Transformer and one for the Report. Make sure to input the right FROM and TO date offsets (to cover the entire billing period) and arguments. Click on Update to finish creating the second Workflow.
This tutorial assumes that you have CUR (Cost and Usage Report) set up in your AWS environment. In the event that this is not the case please follow the steps in before proceeding.
Please note that in order to deploy this solution the S3 bucket to which CUR reports are written must reside in one of the following AWS regions:
Northern Virginia
Ohio
Oregon
Mumbai
Seoul
Singapore
Sydney
Tokyo
Frankfurt
Ireland
London
At this point in time, only the regions listed above have all the necessary services deployed.
This tutorial shows how to build a serverless solution for querying the AWS CUR Report using Exivity. This solution makes use of AWS serverless services such as Lambda and Athena, as well as other commonly used services such as S3, CloudFormation, and API Gateway. The following topics will be covered:
Solution Overview
Launching the CloudFormation Template
Creating the Lambda function and API Gateway
Configuring an Extractor
Configuring a Transformer
Creating your Report
The Billing and Cost Management service writes your AWS Cost and Usage report to the S3 bucket that you designated when setting up the service. These files can be written on either an hourly or daily basis.
The CloudFormation template that accompanies this tutorial builds a Serverless environment containing a Lambda function which reads a CUR file, processes it and writes the resulting report to an output S3 bucket. The output data object has a prefix structure of "year=current-year" and "month=current-month". For example, if a file is written 13/09/2018 then the Lambda function outputs an object called "bucket-name/year=2018/month=09/file_name".
The next step in the template is to translate this processed report into Athena so that it can be queried. The following diagram shows the steps involved in the process:
Afterwards, we will create a Lambda function to query the Athena database, returning a URL with the results of the query in CSV format. We will also create an API EndPoint with the AWS API Gateway service, which is used by Exivity to retrieve the data.
In order to deploy this solution successfully the following information is required:
The name of your AWS Cost and Usage report.
The name of the S3 bucket in which the reports are currently stored.
Firstly, launch the CloudFormation template that builds all the serverless components that facilitate running queries against your billing data. When doing this, ensure that you choose the same AWS Region within which your CUR S3 bucket is located.
Now follow the instructions in the CloudFormation wizard, using the following options, and then choose Create.
For CostnUsageReport, type the name of your AWS Cost and Usage report.
For S3BucketName, type a unique name to be given to a new S3 bucket which will contain the processed reports.
For s3CURBucket, type the name of the bucket into which your current reports are written.
While your stack is building, a page similar to the following is displayed.
When the Status column shows CREATE_COMPLETE, you have successfully created four new Lambda functions and an S3 bucket into which your transformed bills will be stored.
Once you have successfully built your CloudFormation stack, you can create a Lambda trigger that points to the new S3 bucket. This means that every time a new file is added to, or an existing file is modified in, the S3 bucket, the action will trigger the Lambda function.
Create this trigger using the following steps:
Open the Lambda console.
Choose Functions, and select the aws-cost-n-usage-main-lambda-fn-A Lambda function (note: do not click the check box beside it).
There should be no existing triggers. Choose Trigger, Add trigger.
For Trigger type (the box with dotted lines), choose S3.
Select the S3 bucket within which your CUR reports are stored.
For Event type, choose Object Created (All) and check Enable trigger.
Click Submit.
The database and table are not created until your function runs for the first time. Once this has been done, Athena will contain the database and table.
Athena stores query results in S3 automatically. Each query that you run has a results file in CSV format and a metadata file (*.csv.metadata) that includes header information such as column type, etc.
Once you have successfully added the trigger to the S3 bucket in which the Billing and Cost Management services writes your CUR reports, test the configuration using the following steps.
In the S3 path to which AWS writes your AWS Cost and Usage Billing reports, open the folder with your billing reports. There will be either a set of folders or a single folder with a date range naming format.
Open the folder with the date range for the current month. In this folder, there is a metadata file that can be found at the bottom of the folder. It has a JSON extension and holds the S3 key for the latest report.
Download the metadata file. Ensure that the name of the file on your machine is the same as the version stored on your S3 bucket.
Upload the metadata file to the same S3 path from which you downloaded it. This triggers the Lambda function aws-cost-n-usage-main-lambda-fn-A.
In the S3 bucket that you created to hold your processed files, choose the "year=" folder and then the "month=" folder that corresponds to the current month. You should see the transformed file there, with a time stamp that indicates it was just written.
To automate this process, a CloudFormation template is provided. This template creates an IAM role and policy so that our API can invoke Lambda functions. It then creates a Lambda function capable of querying the previously created Athena serverless database and saving the output to an S3 bucket in .csv format (this output will later be retrieved by Exivity). Finally, it deploys an API Gateway, allowing us to create an endpoint for our Lambda function; this is the endpoint that the Exivity extractor will consume. Make sure to launch this CloudFormation template in the same region where you deployed the previous one.
Let's start by downloading the CloudFormation template (you only need to choose one of the formats, both are supported by AWS):
Then follow the next steps:
Go to the CloudFormation console.
Choose Create Stack.
Choose Upload a template to Amazon S3.
Select the template that you downloaded to your computer.
Follow the CloudFormation wizard - Add a Name to the Stack and select I acknowledge that AWS CloudFormation might create IAM resources with custom names in the last step.
Once the stack is created you should see a CREATE_COMPLETE message.
Click on Outputs and take note of your endpoint (you will need to input this in the Exivity extractor).
Next, we will associate an API Gateway trigger to our Lambda function:
Go to the Lambda console.
Choose the QueryAthena2 function.
Under Add Triggers, select API Gateway. You should see an image like the following:
Click on the API Gateway figure to configure it.
On API select QueryAthena2.
On Deployment Stage select v1.
On Security select Open.
Choose Add.
Choose Save.
You should see a screen like this:
Finally, we will deploy the API Gateway:
Go to the API Gateway console.
Choose QueryAthena2.
In the Resources section, click on the ANY method.
In Actions, choose Delete Method.
Click on Delete.
In the Resources section, choose Actions.
Click on Deploy API
In Deployment Stage select V1.
Add a Deployment Description.
Choose Deploy.
Initially, the created API endpoint is public and as such is vulnerable to the possibility of misuse or denial-of-service attacks. To prevent this, associate an API Key with the endpoint as per the following steps:
Inside the API Gateway dashboard, select the QueryAthena2 API
In Resources, select Method Request
In Settings, change API Key Required to True
Click on Actions and choose Deploy API to effect the change
In Deployment Stage, select v1 and click on Deploy
Go to the API Keys section
Click on Actions and select Create API Key
In Name write ExivityAPIKey
Click on Save
Copy the API Key, as this will be required by the Exivity configuration
Go to Usage Plan
Click on Create.
In Name write ExivityUsagePlan
In the Throttling Section, change Rate to 100 and Burst to 10
In the Quota Section, change it to 50000 requests per Month
Click on Next
Click on Add API Stage
In API, select QueryAthena2 and in Stage select v1
Confirm the changes and click on Next
Click on Add API Key to Usage Plan
Select ExivityAPIKey, confirm the changes
Click on Done
The API Key is now required to access the API endpoint thus adding a layer of security to mitigate unauthorized access attempts.
To create the Extractor in Exivity, browse to Data Sources > Extractors and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For AWS, select AWS_CUR_Extractor from the list, provide a name for the Extractor in the name field, and click the Create button.
Once you have created the Extractor, go to the first tab: Variables
In the Bucket variable, specify the name of the S3 bucket where the .csv file containing the query output will be saved (the S3BucketName previously specified when launching the CloudFormation template).
In the Api endpoint variable specify the API endpoint previously created plus the route /QueryAthena.
In the DBname variable specify the name of your DB, you can find it in the Athena main Dashboard.
In the Tablename variable specify the name of the table inside your DB, you can find it in the Athena main Dashboard.
In the API_Key variable specify the API Key that we have created in the Securing API Gateway Section.
Once you have filled in all details, go to the Run tab to execute the Extractor for a single day:
The Extractor requires two parameters in yyyyMMdd format:
from_date is the date for which you wish to collect consumption data.
to_date should be the date immediately following from_date.
These should be specified as shown in the screenshot above, separated with a space.
When you click the Run Now button, you should get a successful result.
Once you have successfully run your AWS CUR Extractor, you can create a Transformer via Data Sources > Transformers by clicking the Create Transformer button. Select the AWS CUR Transformer and run it for a single day as a test. Make sure that it is the same day for which you extracted consumption data in the previous step.
Once you have run both your Extractor and Transformer successfully, create a Report Definition via the menu option Reports > Definitions:
Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.
Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.
Click on the launch link for the region associated with the S3 bucket containing your CUR files (this tutorial uses Ireland (eu-west-1) for illustrative purposes, but all the supported regions work in the same way).
Role Name: Description
View Reports: Provides access to all elements of the Reports menu
View Cogs: Provides access to the Cost of Goods rates and charges
View Logs: Provides access to the Administration > Log Viewer
View Audit: Provides access to the Administration > Audit Trail
Manage reports: Allows the creation and deletion of Report Definitions
Manage accounts: Provides access to the Accounts menu section
Manage catalogue: Enables read and write access to the entire Catalogue menu, including adding and changing Rates and Adjustments
Manage data sources: Allows creating, editing and deleting Extractors and Transformers
Manage users: Allows creating, editing and deleting all internal users and Groups
Manage configuration: Allows editing of all elements in the Administration > Configuration menu
Manage system: Provides access to the Administration > System Info menu, including updating the installed license file
Upload files: Allows uploading usage and lookup data through the Exivity API (see api.exivity.com)
Exivity provides a catalogue of USE extraction scripts that can be used to integrate with almost any cloud provider, hypervisor or legacy IT end point. We've published some of our templates on GitHub for your convenience.
This repository contains Extractors for VMware, Azure, Amazon and others. However, if you are currently missing an integration template and are unwilling or unable to create your own, feel free to drop us an e-mail at support@exivity.com.
June 06, 2018
New features
Minimum commit is now supported in the charge engine When generating a report, the results for any services that have a minimum commit value (and for which the usage does not meet that minimum commit quantity) will be adjusted to reflect that minimum commit value.
Updates to internal service and rate schema
The rate attributes min_commit
and threshold
are now implemented and the API will return a slightly different schema for the /v1/services
endpoint (and related /v1/dump
models) - the rate_type
attribute is now called charge_type
. More information can be found at https://api.exivity.com/#0f8c2ac2-1f56-3c54-3000-210a5e94bd61
The charge engine now includes information about proration adjustments When applying proration to a monthly service, the charge engine will now include information in the raw report data which shows the amount that the unprorated charged was reduced by. This information will be used by the GUI in a future release.
Proration is now applied to monthly services where applicable Report results for monthly services that are flagged as being prorated will now reflect a percentage of the monthly charge, based on the number of days in the month that the service is used.
GUI preferences are now saved for each user For example, selected reports, date ranges and filters are now persisted for each user, so they can be restored after logging out and in again.
The charge engine can now execute a script passed to it via standard input The charge engine can now execute a reportfile passed to it via standard input. This internal change results in fewer temporary files on disk during normal use.
Error reporting can now be disabled in configuration
Transcript can now import usage data from existing RDFs The 'import' statement in Transcript can now retrieve the raw usage data from an existing RDF file.
Usernames are no longer case sensitive when logging in
Transformers now always run with loglevel = warn
when triggered in workflows
Service and service category filters now only show items actually in the visible report
USE will now trap more HTTP errors When enacting some HTTP operations, if an error such as a timeout or invalid host is encountered, USE will now return an error in the HTTP_STATUS_CODE variable instead of automatically terminating the script.
Added daily usage information for monthly services in the charge engine When generating a report, the charge engine will now include information about the usage quantity for each day in the charge interval. This information will be used by the GUI in a future release.
Drilldown functionality is now available from the legend in reports
Reference account information in ETL Account information can now be imported directly during the data transformation step, such that existing account data can be used to enrich the data being processed.
Increased HTTP client timeout USE will now wait for three minutes by default before deciding that the connection has timed out if no data is received after the initial connection to a server has been made.
Improved the syntax for options to the 'import' statement in Transcript The options supported by the 'import' statement must now be formatted such that there is a single option per line of script. This removes the previous requirement to quote the list of column names when using 'select' and 'ignore', as well as the requirement to quote the expression used by the 'filter' option.
Added a system variable to return the last day of any given month A new system variable has been implemented which will return the last day in any given calendar month.
The 'correlate' transform now supports a default DSET for column names The 'correlate' statement always uses the default DSET as the destination for correlated columns but now supports an 'assume' parameter which determines the default DSET within which to locate non-fully-qualified source columns.
Added a button to the detailed widget in reports to toggle search field
Added an option to configuration to add a custom Google Analytics property
The charge engine can now be used to identify unused service definitions
The charge engine now supports the ability to retrieve a list of services which are not used by any existing reports.
Bug fixes
Fixed an issue where the depth filter wouldn't reload after preparing a report
Improved the print/PDF layout of consolidated invoices
Fixed a bug where the summary in Instance reports would sometimes remain empty
The charge engine now correctly deletes un-needed RDFs The charge engine now includes a mechanism to 'unload' historical data. This is an internal mechanism which will be used by the GUI in a future release.
Services for users with limited access to accounts are now filtered
When creating services, an instance_col parameter is now required Previously it was possible to create services with no instance_col specified, which would result in missing data in reports. Transcript now requires that an instance_col parameter is provided to the 'service' and 'services' statements.
Consolidated invoices can now be exported to PDF
Fixed an issue where in some circumstances the reports wouldn't load
It is now possible to view budget audit trails
The 'import' statement in Transcript now correctly imports usage data in all forms of the statement Fixed a bug whereby when using automatic source and alias tagging, the 'import' statement would not permit the importing of usage data from an existing RDF
Improved readability of text when a light background colour is chosen
USE will no longer reject some valid expressions In some cases, a valid expression in a script was rejected as having an unbalanced number of brackets. This has now been fixed.
The charge engine can now delete services associated with DSETs that are unused by any reports Fixed a bug where the charge engine would not correctly delete services if there were no RDF files for the DSET that the service is associated with.
The reset pins button has been moved to the top of the detailed widget in reports
May 03, 2018
New features
Implemented search field in report details table Ability to filter and pin a selection using a search query in the Accounts, Services and Instances report details table
Quantity adjustments can now be applied to a customer Adjustments can now also be set to affect quantities instead of charges. Both relative and absolute quantity adjustments are supported.
Ability to show consumed quantity in a report
Ability in transcript to convert number bases The following is now possible in a transcript: convert colName from dec|hex to dec|hex
It is now possible in the Invoice cost report to consolidate all child accounts on a single page
Added option to create workflow step which purges Proximity cache.
Beta version of budget manager & viewer is now available.
Bug fixes
When encrypting a variable it could get corrupted
Transcript could previously crash when running for a large date range
Workflows status tab did not consistently show historical log files
Fix for Invoice report error "Depth can't be empty, 0 or greater than 5"
April 13, 2018
Bug fixes
Extractor arguments were not used correctly when running USE script interactively from GUI
Report timeline graph could previously show zero when there was consumption
v1.6.1
April 13, 2018
New features
Add profile page where logged in users can change their own e-mail address and password.
Bug fixes
Fixed issue where scheduling multiple steps could corrupt Workflow WARNING: as of this release it is required to re-create your Workflows from scratch, to avoid potential issues
Fix loading overlays to improve multitasking in GUI
Fixed an upgrade bug which caused creating report definitions to be broken
Ability to specify to and from dates for transformers in workflows.
v1.6.0
April 8, 2018
Notable new features
[EXVT-971] - Implement day and month name variables in USE
[EXVT-973] - Add option to enable/disable client certificate support in USE
[EXVT-979] - Scheduler is now called workflows
[EXVT-1021] - When using COLNAME_NOT_EXISTS in a filter, it always evaluates to 'TRUE'
[EXVT-1024] - Internal Error when applying filters in a 'where' statement if import options are used
[EXVT-396] - Garbage collector
[EXVT-744] - Glass config file for default port/host
[EXVT-749] - Stacked bar chart option in accounts/services report, including optimized legend
[EXVT-805] - Extend `Run` tab in Transformer with `from` and `to` date
[EXVT-966] - Added Instance reports
[EXVT-985] - Make toggle so reports can go fullscreen
[EXVT-986] - Connect graphs + legend
[EXVT-988] - All lists in the front-end are now sorted alphabetically
[EXVT-993] - Ability to pin report items
March 26, 2018
Notable new features
[EXVT-370] - Add support for SAML Single Sign-On
[EXVT-849] - Add ability in transcript aggregate to average the values from a column
[EXVT-879] - Add ability to base new extractor on templates from GitHub repository
[EXVT-958] - Enhance the 'hash' statement in USE to support base-64 encoding of the result
[EXVT-780] - Fixed manually editing the value of an encrypted variable in USE can cause a crash
[EXVT-924] - Fixed Eternity hourly schedule does not consider start date
[EXVT-946] - Fixed OSI_TIME_UTC variable is missing a trailing Z
[EXVT-947] - Fixed some accounts show slight discrepancies when comparing to Excel calculation
[EXVT-949] - Fixed radio buttons don't update when changing adjustments in Glass
March 19, 2018
Notable new features
[EXVT-940] - Fixed duplicate headings not always eliminated in filtered import in Transcript
[EXVT-941] - Fixed a Transcript crash on 'move rows' or 'delete' after a 'replace'.
March 16, 2018
Notable new features
[EXVT-871] - Added 'include' statement to Transcript
[EXVT-932] - Added UTC versions of time-related variables in USE
[EXVT-105] - The API can now render an invoice report as a native PDF document
[EXVT-691] - Ability to change the service description via the GUI
[EXVT-779] - Made updating of Extractor variables more robust, and added support for encrypted variables in the GUI
[EXVT-810] - Added the ability to use wildcard in import statement in USE
[EXVT-823] - Added a daterange wrapper for Transcript
February 23, 2018
Notable new features
[EXVT-889] - Fixed a corner case where USE can stop working when executed from Glass
February 23, 2018
Notable new features
[EXVT-720] - Add option to choose custom currency symbol
[EXVT-741] - Move report selector to sidebar
[EXVT-847] - Improved syntax highlighting for USE and Transcript in the Glass script editor
[EXVT-868] - Add additional checks to global conditions in Transcript
[EXVT-827] - Implement import filters in Transcript
[EXVT-881] - Add escaping option to import statement in Transcript
[EXVT-476] - Added scheduler interface
[EXVT-798] - Add report depth breadcrumbs to reports
[EXVT-832] - Extractor log is now shown when running on-demand through GUI
[EXVT-842] - Fixed an issue which caused small discrepancies when using different reporting definitions
A full changelog is available upon request.
February 09, 2018
Notable new features
[EXVT-102] - Ability to extract data from databases using ODBC connection
[EXVT-693] - Scheduler endpoints in API
[EXVT-802] - Ability to schedule the preparation of report definitions through the GUI
[EXVT-774] - Enhanced conditional execution in Transcript with support for regex matching
A full changelog is available upon request.
February 03, 2018
Notable new features
[EXVT-818] - Fix for Cannot read property 'relationships' of undefined
error when logging in as a user with limited account permissions.
A full changelog is available upon request.
February 02, 2018
Notable new features
[EXVT-253] - Syntax highlighting for USE
[EXVT-255] - Add support for XML data extraction in USE
[EXVT-600] - Enable parallel processing in Eternity
[EXVT-681] - Create USE script for reading AWS S3 bucket
[EXVT-717] - Extractor and Transformer execution must show last 25 lines of corresponding log file
[EXVT-739] - Perform cross-browser test and add warning in unsupported browsers.
[EXVT-770] - Improve orbit performance when syncing large amounts of records
[EXVT-777] - Select single days in datepicker
[EXVT-783] - Support in Eternity for hourly and monthly schedules
A full changelog is available upon request.
January 12, 2018
Initial release.
Upgrading to the latest release is a straight-forward process. More information.
When upgrading to new major version (e.g. from version 1.x.x to version 2.x.x), also take note of these manual upgrade steps:
January 23, 2019
New features
The 'export' statement will no longer generate an error if asked to process an empty DSET while option mode = strict is set
Implemented user-definable timeout setting when retrieving data from HTTP sources
When retrieving data from HTTP sources, the number of seconds to wait for a server response before timing out can be defined using set http_timeout.
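For example, the following sketch (assuming the value is given in seconds, as described above) would allow a server up to five minutes to respond:

set http_timeout 300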
January 17, 2019
New features
Added editable labels for report levels
Added the ability to mass delete services
Invoice report is now called Summary report
The 'finish' statement in Transcript can be made to cause the task to fail if the DSET it's given is empty
Previously, when creating an RDF, the finish statement would perform no action if the DSET to create the RDF from was empty. This is still the case if 'option mode = permissive' is in force, but if 'option mode = strict' then the finish statement will now generate an error. The error will cause the task to fail for the current day; if 'option continue' is not enabled, the task will be terminated, otherwise the task will move on to the next date in the range of days being processed.
Added the ability to exit a subscript invoked via #include in a transform script The 'return' statement, when used in a Transform script, will now cause script execution to resume from the statement following the #include statement in a parent script that referenced the script containing the 'return' statement.
Increased the performance of the correlate statement Correlation should now be significantly faster than it was previously
Increased the performance of the aggregate statement
Aggregation should now be significantly faster than it was previously
Bug fixes
Fixed a bug where users couldn't update their own details (including their password)
Fixed a bug where only 10 datasets were displayed when creating a new report
Fixed a bug where the mail sender name was not persisted in the configuration
Fixed an issue when importing CSV files containing quotes
When importing a CSV file with two successive quote characters at the end of a field, Transcript would reject the file as invalid. This has now been fixed.
Fixed an issue where deleting data led to GUI crashes on occasion When deleting data (RDFs) associated with a report, it could be that if one or more days had previously been overwritten, a stale database entry would cause issues after the RDFs were deleted. This has now been fixed.
Fixed a bug whereby using terminate within the body of an if statement in a Transform script could cause an error Invoking terminate in an if block when running a transform script against a range of dates could cause the error The maximum number of nested blocks (32) is already in use.
This has now been fixed.
December 13, 2018
Bug fixes
A bug was fixed which could lead to an error in the invoice report when using a rate with a minimum commit set
Fixed an issue with minimum commit It was possible that when applying minimum commit to a service, that other services would be affected by that minimum commit. This has now been fixed.
Fixed an issue when retrieving NULL fields from an ODBC query When using ODBC to collect data, the presence of NULL values in the results could cause USE to crash. This has been fixed.
November 30, 2018
New features
Added the ability to view instance level details on the invoice reports.
Added the ability to customize the report exports (CSV format only) field delimiter and decimal separator. These settings are system-wide and available to administrators by navigating to Administration > Configuration > Formatting.
Added the ability for users to reset their own passwords. This requires the email address of users to be set and a working server configuration for sending emails. This can be configured in Administration > System > Environment.
Logfiles generated from workflow tasks now include a timestamp. This prevents logfiles from consecutive runs of the same task from being overwritten.
Workflow status now automatically refreshes after a manual run.
Added a new Environment tab in Administration > System. In this tab, information about the system the Exivity instance is running on can be filled out. In the future this will be expanded to include more configuration options.
Invoice reports now include minimum commit uplifts as separate entries.
Carriage-returns and line-feeds in data extracted using ODBC are now replaced with spaces. When extracting data with USE, the presence of newlines in the data could cause corrupt CSV output. Carriage Return and Line Feed characters in data extracted from ODBC are therefore now replaced with spaces.
Enhanced expression support in the Extractor component. Conditional expressions have been enhanced in the Extractor component such that more complex conditions can be evaluated and additional operations can be performed. Additionally, it is now possible to set a variable value using an expression.
Services can now be manually deleted using the GUI.
Bug fixes
The datasets selector visible when creating a new report definition is now alphabetically sorted.
Fixed a bug which caused the contents of extractor editor to not update after updating variables. The contents of the extractor script itself was always saved after updating variables, only those changes were not visible in the editor.
Fixed a bug which caused the account depth selector to reset after performing an upgrade.
Fixed a bug which could cause the interface to become unresponsive after preparing a report.
Fixed an issue when running Transform scripts for days with 25 hours in them. When running a Transform script with a data-date representing a day where the clocks were adjusted such that the day had 25 hours in it, the script would be executed a second time automatically once the first had completed. This could lead to unexpected errors and log entries on occasion, and has now been fixed.
When writing to CSV files in USE, embedded CR/LF characters are converted to spaces. USE will now automatically strip out embedded carriage-return and line-feed characters when writing data to CSV files. Each unique occurrence of one or more sequential CR/LF characters will be replaced with a single space.
November 22, 2018
Bug fixes
Updated the documentation links in the header to point to our new documentation site.
Fixed grouping behaviour in the details table of the accounts report. In some cases, accounts could appear grouped under the wrong parent account in the 'Detailed' table in the accounts report.
October 31, 2018
Bug fixes
Fixed an issue with incorrect quantities sometimes showing on reports. Occasionally, when running a report for a range of dates, the quantities on one or more services differed from the quantity for that service shown when a report was run for a different date range (or just the day in question). This issue has now been fixed.
October 26, 2018
New features
The USE 'basename' statement can now write its results to a new variable. Previously, the 'basename' statement would always modify the value of the variable whose name was supplied as the argument. It can now also accept a literal string and create a new, or update an existing, variable to hold the result.
Archives in GZIP format can now be decompressed using USE. USE now supports the 'gunzip' statement which can be used to inflate GZIP'd data. Details of how to use this statement may be found at https://docs.exivity.com/diving-deeper/extract/language/gunzip
Fixed an issue whereby when running a Transform script the Audit database would be locked for the duration of the task. Transcript now only opens the Audit database when it needs to, reducing the likelihood of encountering errors pertaining to a locked audit database in the logfile.
A new system variable is now available containing the number of days in the current month. The existing dataMonth variable, which contains the yyyyMM of the current month is now supplemented with a new variable called dataMonthDays which contains the number of days in that month.
Changed default service type to 'automatic' in the 'services' statement in Transcript. When creating services, if no 'type' parameter is provided then the default service type will now be set to 'automatic'.
Bug fixes
Fixed an issue whereby when creating a service, the audit indicated that the service creation failed. When a service definition is successfully created, Transcript will now correctly audit that event as opposed to indicating that the attempt failed.
Fixed an issue whereby over-writing services could result in database errors in the logfile. Sometimes when overwriting services, a constraint error would be logged in the logfile and the service would not have any rate associated with it. This has been fixed.
October 18, 2018
Bug fixes
Fixed an issue that could cause database corruption.
Fixed an issue that could cause database corruption due to the Aeon database being held open for long periods of time.
October 10, 2018
Added a live preview feature when working with transforms. A new feature has been added which can display a live preview of the transformer output. Note: this feature is currently in beta and will be further updated in the next release.
The code editor has been updated. The code editor for Extractor and Transformer scripts has been updated (it now uses the open source Monaco editor - https://microsoft.github.io/monaco-editor/) resulting in a significant improvement over our previous editor. This greatly enhances the user experience when editing scripts in the GUI. Note: This change also lays the foundation for more advanced features going forwards.
Charges for monthly services now take quantity into consideration as well as price. If two or more days in a month have the same highest price then the one with the highest quantity will be reported. Previously, the first seen was reported which could lead to discrepancies between the reported quantity and price on the report.
When running reports blank instance values are now displayed as a hyphen. When running reports against data with blank instance values in the usage data, the instance value will now be represented as a hyphen, which improves the aesthetics of the report.
Added hardware information to Transcript log-files. Log-files created by Transcript now contain information about the CPU and RAM at the top of the log.
Increased auditing information in Transcript. Events relating to service, rate and RDF changes are now audited
Bug fixes
Removed COGS option for users without rights to view COGS information. In the services and instances report, users with no access to view COGS will no longer be able to select the COGS type in the details table. Note: This bug never allowed users without appropriate access rights to view the actual COGS data.
Fixed a bug where the list of datasets on the report definition page was only showing the first 10 results. This could result in an inability to create new reports using datasets that were not included in those results
A link to the instances report has been added to the search feature in the header.
The service interval column in the instances report now contains data. Previously this column was always blank
Fixed a bug where searching for units within a services reports leads to a GUI crash.
Fixed an issue whereby very rarely a charge would not be included in reports. On very rare occasions, information in a record in the prepared report caches was not included in the output when a report was run. This has now been fixed.
Fixed an issue that could cause Aeon database corruption. Fixed an issue that could cause database corruption (and workflows to fail) due to the Aeon database being held open for long periods of time.
Fixed an issue whereby re-using an existing named buffer in USE for ODBC purposes could lead to unexpected results. Fixed an issue in USE whereby if an existing named buffer was re-used to store data retrieved from ODBC then a new buffer could have been created with the same name as the existing buffer, and attempts to reference it would return the old data.
Fixed an issue when executing ODBC queries that return no data. Using the ODBC capability to execute a query that returns no data will no longer cause an extractor to return an error.
September 05, 2018
New features
Upgraded the underlying API framework For more information, please refer to the Laravel release notes.
Bug fixes
Fixed an issue whereby 'append' could crash if one or other DSET was empty When executing the 'append' statement in a transformation script, if one or other of the DSETs involved in the operation was empty (having no data rows) then a crash could occur. This has now been fixed.
Fixed an issue where an expression that evaluated as FALSE could show the wrong line number in a log message The DEBUG level logfile entry indicating an expression is true or false would contain a reference to the wrong line number if the expression evaluated to false. This has now been fixed.
Fixed an issue whereby some comparisons would evaluate incorrectly in expressions Fixed an issue whereby in some cases where a value was quoted in an expression, the quotes would be considered part of the value itself.
Fixed a condition where reports were not showing for non-admins
Quantity metric is available again for the timeline chart on the services and instances reports
Reports in the navigation menu dropdown are now alphabetically ordered
August 28, 2018
New features
Ability to filter data in report using search query: The search bar in Accounts, Services and Instances reports now supports the use of operators (for example >
and <
) to filter your results based on column values or strings.
Add avg_unit_based_rate
to report/run
API endpoint Added the average per unit rate field to the report/run
API endpoint and a placeholder for the average per interval rate which will be implemented later.
Bug fixes
Fixed an issue where deleting services could lead to adjustments not displaying correctly
Rate column in report details tables now use the configured rate precision setting
Fixed an issue whereby scheduled tasks that output more than 4kb of data to the console could suspend execution and do nothing until they timed out
August 22, 2018
New features
Transcript can now normalise scientific decimal numbers to standard format: When processing data that contains numbers in scientific format (such as 2.1E-5) the normalise
statement can now be used to convert these to standard decimal notation (0.000021 in the above case) using the form normalise column colName as standard
where colName
is the column containing the values to convert. Any values already in decimal will be unchanged, except that any trailing zeros will be removed from them. Non-numeric values will be converted to 0.
Support group
and group_col
as service parameters in Transcript: In the service
and services
statements in Transcript, the parameters to define the service category are category
and category_col.
These parameters now have aliases of group
and group_col
respectively, for those who prefer to use that terminology.
Bug fixes
The replace
statement in Transcript will no longer behave unexpectedly when given an empty string as the target to replace: When using replace
to update substrings within the values in a column, if the target string (the text to replace) is empty then Transcript will generate a meaningful log entry explaining that it cannot be used to replace empty strings, and will no longer overwrite non-blank column values with multiple copies of the replacement text.
The export
statement in Transcript now supports backslashes as path delimiters: When specifying a relative path for the export
statement, Transcript will automatically create any directories that do not exist in that path. Previously there was a bug whereby the auto-creation of those directories would only work if UNIX-style forward slashes were used as delimiters in the path. This has now been fixed and Windows or UNIX style delimiters may be used when specifying an export path.
Fixed a bug in the scheduler that could cause schedules to fail: In some cases schedules could fail for no obvious reason. This has now been fixed.
August 17, 2018
New features
USE scripts can now be forced to terminate with an error result Previously, the 'terminate' statement could be used to cancel script execution, but its use would always indicate that the script ran successfully. This may not be appropriate in all cases (for example if an error is detected by the script itself but ultimately cannot be resolved satisfactorily). The 'terminate' statement will still cause a script to exit with a success result by default, but may now be invoked as 'terminate with error' such that an error status is returned instead.
Added more service attributes as optional columns in the reports details table. The following extra service attributes can now be enabled as columns in the report details table: interval, charge type, cogs type and proration.
Support 'group' and 'groupcol' as service parameters in Transcript In the 'service' and 'services' statements in Transcript, the parameters to define the service category are 'category' and 'category_col'. These parameters now have aliases of 'group' and 'group_col' respectively, for those who prefer to use that terminology.
Reduced the chance of a 'database is locked' warning when preparing reports When preparing reports, on occasion it is possible for a warning to appear in the logfile pertaining to the global database being locked. When this warning happened, it could cause some days in the reporting period to remain unprepared. A known specific cause of this issue has been fixed, significantly reducing the likelihood of it happening.
Bug fixes
Fixed an issue where an ODBC connection could cause a crash in USE When executing an ODBC-based collection in USE, under certain circumstances an incorrect direct connection string could cause a crash. This has been fixed. Additionally, when an ODBC error occurs the error written to the logfile contains more detail than in previous releases.
The order of workflow steps in the status tab now corresponds to the order of workflow steps in the configuration tab.
An issue has been fixed where old user preferences could conflict with updates in the GUI, leading to errors when loading the service and instance reports.
An issue has been fixed where certain characters in a workflow status could lead to errors in the API. Sometimes, when running a scheduled task, the output written to the database contains non-printable characters. The API now re-encodes those characters, which means the GUI will now correctly show the status for those workflows.
When selecting a reporting period that spans multiple months, the charts will now only show a single label for each month.
Fixed a USE crash bug with certain combinations of conditional expressions Fixed an issue whereby if an expression with more than 2 parameters was followed later in the script by an expression with fewer parameters than the first, a crash would occur.
Fixed issue where an extractor could crash when using a parslet after formatting some JSON A bug has been fixed whereby if the 'json format' statement was used to prettify some JSON in a named buffer, use of a parslet to extract data from the JSON could cause a crash.
Fixed an issue where sometimes an XML parslet would cause an 'out of memory' error in USE When using an XML parslet, it was possible that an 'out of memory' error would be returned in the logfile and the script would fail, even on small input files. This has now been fixed.
August 03, 2018
Bug fixes
The 'export' statement in Transcript now supports backslashes as path delimiters
When specifying a relative path for the 'export' statement, Transcript will automatically create any directories that do not exist in that path. Previously there was a bug whereby the auto-creation of those directories would only work if UNIX-style forward slashes were used as delimiters in the path. This has now been fixed and Windows or UNIX style delimiters may be used when specifying an export path.
The 'replace' statement in Transcript will no longer behave unexpectedly when given an empty string as the target to replace
When using 'replace' to update substrings within the values in a column, if the target string (the characters to replace) is empty then Transcript will generate a meaningful log entry explaining that it cannot be used to replace empty strings, and will no longer overwrite non-blank column values with multiple copies of the replacement text.
July 25, 2018
New features
Increased default timeout when retrieving data from HTTP servers
Currently a USE script will fail if more than 3 minutes elapse without response when downloading data from an HTTP server. This has been increased to 5 minutes to cater for slow APIs.
July 19, 2018
New features
Transcript can now normalise scientific decimal numbers to standard format When processing data that contains numbers in scientific format (such as 2.1E-5) the 'normalise' statement can now be used to convert these to standard decimal notation (0.000021 in the above case) using the form 'normalise column colName as standard' where 'colName' is the column containing the values to convert. Any values already in decimal will be unchanged, except that any trailing zeros will be removed from them. Non-numeric values will be converted to 0.
When accessing the GUI via http visitors will be redirected to https automatically
Progress indicator in the report/run endpoint can be disabled To disable, set the progress parameter to 0. More information is available in our API documentation.
Filter selectors show which items are present in the current report The service category selector in the services and instances report, and service selector in the instances report will show items not available in the current report grayed out.
On-demand workflow execution Workflows can now be executed on demand. Also the schedule for a Workflow can be disabled.
Single workflows can now have multiple schedules
Workflows can now be scheduled in a specific timezone
Added the average rate column to the reports details table
Added the ability to show various totals in reports summary widget Summary widget now has the option to show all totals (previous behaviour) or only the totals for the current search results, or for the current pinned items.
Added report shortcuts to the dashboard
COGS, fixed COGS and fixed prices are now evaluated per instance when preparing reports Previously, if a service was created that used any of fixed_price_col, cogs_col or fixed_cogs_col to indicate that the rate in question should be obtained from the usage data for any given day, then the charge engine would use a single value from the specified column(s) and apply that to all instances of the services for the day. Now, each row of usage is individually consulted when preparing reports such that the specific value on that row is used (as is already the case when using 'rate_col' for pass-through rates)
When extracting XML or JSON values, parse errors no longer cause the USE script to terminate Previously, when using a static parslet to extract XML or JSON values from the contents of a named buffer, if the buffer contained invalid JSON or XML then the USE script failed with an error in the log saying that the contents of the buffer could not be parsed. Now, if a named buffer contains data that is not valid JSON or XML, any attempt to extract a value from it using a static parslet will cause the parslet to be expanded to the value EXIVITY_INVALID_XML or EXIVITY_INVALID_JSON as appropriate.
Improved performance and lowered memory requirements when running a report Previously, in some circumstances running a report could take longer than expected and consume large amounts of memory in the process. The performance and memory use of the report engine have both been improved.
Reduced memory and increased performance when preparing reports Previously it was possible for some installations to use large amounts of memory and exhibit unreasonably slow performance when preparing reports. Preparing reports is not intended to be a realtime feature and will always incur some time overhead, but this time should now be significantly reduced in many cases, and the memory required to complete the process will be much less.
New output format for the /report/run endpoint in the API Due to changes to the charge engine, the output format of the /report/run endpoint in the API has changed. An up-to-date overview of the attributes returned by this endpoint can be found in our API documentation.
Free formatted ODBC connect strings are now supported in USE This exposes all ODBC driver options to the user, and avoids the need to manually create a DSN at the operating system level.
The 'split' statement now supports discarding unwanted result columns When using the 'split' statement it is now possible to discard all but a selected range of the resulting new columns.
Bug fixes
Fixed a Transcript crash when deleting a DSET Transcript will no longer crash in certain circumstances when deleting a DSET using the 'delete dset' statement.
The Transcript 'export' statement now creates a path automatically When exporting data from Transcript, if a relative path is specified as part of the export filename, Transcript will automatically create the path if it does not exist. The path will be created relative to /exported
Changed the behaviour of some columns in the report tables The optional per unit charges and per interval charges columns on the report pages represent a fraction of the total charge and as such should be considered a subtotal rather than a rate.
Allow users to see anonymous roll-up accounts even if they have no access When a user only has access to some children of a parent account, reports will now show the combined usage of those accounts grouped as an unknown account in the reports.
Fixed a rare bug where incorrect character encoding in the data source could lead to reports not loading
Currency symbol is no longer shown for quantity graphs
An issue has been fixed which could lead to empty reports when there actually was report data In some cases, selecting certain combinations of filters could lead to reports showing No data while there actually was report data for the current set of filters. This behaviour was observed mainly on the instances report page.
Usernames are now allowed to contain special characters As a side effect of changing usernames to be case-insensitive, using special characters was no longer permitted since v1.8.1. This restriction is now removed.
Changed the behaviour of clearing the charge engine caches Clearing the charge engine (Edify) caches unprepares all reports. The button on the About page now reflects this.
It is now possible to use decimal values for adjustment amounts Previously this was only possible through the API. The GUI has been updated to also support this.
Changing the date in the invoice report no longer resets the account selection Previously, when changing the date range on the invoice report screen, the current account selection (dropdown inside the invoice page) would automatically select the first account in the list. This has now been fixed to remember the selection when changing the date.
Exivity now works correctly when installed in a directory containing spaces
Transcript variables were not properly expanded when used in an import filter
Export of consolidated invoice now contains data for all accounts Previously, selecting the CSV or Excel export of a consolidated invoice would only export data for the first account on the invoice.
Fixed crash bug in the 'services' statement When creating services, Transcript will no longer crash if a blank interval or model value is encountered while building the service definitions.
Older release notes can be found here.
Version 2 introduces a breaking change to historical cached data from installations before version 2.0.0. Therefore, when upgrading an Exivity version 1 installation to version 2, all Reports will be automatically Unprepared during the upgrade. After completing the upgrade, you will have to browse to your Report Definitions and Prepare each of them for the appropriate date period:
As part of version 2, outdated syntax in your Transformers will be automatically converted to comply with version 2 Transcript syntax. Your version 1 Transformers will be backed up during upgrade into the default backup folder %EXIVITY_HOME_PATH%/system/backup. To manually upgrade any of your version 1 Transformers, you may use the script provided in %EXIVITY_PROGRAM_PATH%/update-transcript-v2.0.0.bat (this script will upgrade any Transformer currently available in the default Transformer folder).
The service that was previously responsible for scheduling and execution of workflows (the "Exivity Eternity Service", which ran eternity.exe) has been replaced by the Aeon component, which runs as the "Exivity Scheduling Service". In case the "Exivity Eternity Service" was running under a non-system account, ensure that the newly registered "Exivity Scheduling Service" is updated with the corresponding service credentials.
Extraction is the process by which USE (Unified Scriptable Extractor) retrieves data from external locations. The following types of data source are supported:
A USE script is required for USE to operate. Further information can be found via the links below:
An introductory overview of the scripting language:
A reference guide for the USE scripting language:
How to parse XML and JSON data
Template scripts that can be used as starting points for common data sources:
When deploying the Azure Stack Extraction template for Exivity, some configuration is required within your Azure Stack environment and a lookup file needs to be created. The following process must be completed in order to report on Azure Stack consumption:
Create an Exivity Enterprise Application in your Azure AD for authentication
Configure a rate card lookup file
Configure an Extractor
Configure a Transformer
Create your Report
In order for Exivity to authenticate with Azure Stack, you will need to create an application in the Azure AD where you have registered your Azure Stack management node:
Make sure to write down the Application ID and its corresponding Secret, since you will need these when configuring the Extractor later.
When you create this application in your Azure AD make sure it has (at least) the Reader Role in your Default Provider Subscription:
As Microsoft does not provide rate card information via the Azure Stack Consumption API, you will only obtain usage metrics from Azure Stack for the Meter IDs documented by Microsoft.
Exivity provides a template rate card that you can use for creating your own rates. Please bear in mind that these rates are fictional, so you should update them with your preferred values. To get started, you can use the file linked above by placing it as a CSV file in your Exivity home folder at the following location:
%EXIVITY_HOME_PATH%\system\extracted\AzureStack\rates\azure_stack_example_rates.csv
Once loaded into the system using a Transformer you will be able to change the rates easily through the GUI. This will also enable you to test any draft rates.
To create the Extractor, browse to Data Sources > Extractors in the Exivity GUI and click the Create Extractor button. This will try to connect to the Exivity Github account to obtain a list of available templates. For Azure Stack:
Pick Azure_Stack_Extractor_(App+Secret) from the list
Provide a name for the Extractor in the name field
Click the Create button.
Once you've created the Extractor, go to the first tab: Variables
Fill in all required variables marked within the red box in the above screenshot. If you don't know some of the required GUIDs, most of these can be obtained by browsing to the Azure Stack management node URL:
https://adminmanagement.**<your.domain.com>**/metadata/endpoints?api-version=2015-01-01
Another way to obtain some of this information is using the Diagnostics button in your management portal:
When you click the Show Diagnostics link, it should download a JSON file containing most of the parameters you'll need, such as Provider GUID, Audience GUID etc.
Once you've filled in all details, go to the Run tab to execute the Extractor for a single day:
The Extractor requires two parameters in yyyyMMdd format:
from_date is the date for which you wish to collect consumption data
to_date should be the date immediately following from_date
These should be specified as shown in the screenshot above, separated with a space.
When you click the Run Now button, you should get a successful result.
Once you have successfully run your Azure Stack Extractor, you can create a Transformer template via Data Sources > Transformers in the Exivity GUI. Browse to this location and click the Create Transformer button. Make any changes that you feel necessary and then select the run tab to execute it for a single day as a test.
Make sure that when running the Transformer you select custom range in the drop-down menu labelled Run for and select the same day as for which you have extracted consumption data in the previous step.
Once you have run both your Extractor and Transformer successfully create a Report Definition via the menu option Reports > Definitions:
Select the column(s) by which you would like to break down the costs. Once you have created the report, you should then click the Prepare Report button after first making sure you have selected a valid date range from the date selector shown when preparing the report.
Once this is done you should be able to run any of Accounts, Instances, Services or Invoices report types located under the Report menu for the date range you prepared the report for.
The basename statement is used to extract the filename portion of a path + filename string.
basename varName
basename string as varName
Given a string describing the full path of a file, such as /extracted/test/mydata.csv, the basename statement is used to identify the filename (including the file extension, if any) portion of that string only. If there are no path delimiters in the string then the original string is returned.
The basename statement supports both UNIX-style (forward slash) and Windows-style (backslash) delimiters.
When invoked as basename varName, the varName parameter must be the name of the variable containing the string to analyse. The value of the variable will be updated with the result, so care should be taken to copy the original value to a new variable beforehand if the full path may be required later in the script.
As a convenience in cases where the full path needs to be retained, the result of the operation can be placed into a separate variable by using the form basename string as varName, where string is the value containing the full path + filename and varName is the name of the variable to set as the result.
When invoked using basename string as varName, if a variable called varName does not exist then it will be created, else its value will be updated.
The following script ...
... will produce the following output:
The following script ...
... will produce the following output:
USE scripts are stored in <basedir>/system/config/use
, and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported but in certain circumstances they may be automatically converted to UNIX end-of-line format.
Each statement in a USE script must be contained on a single line. Statements consist of a keyword followed by zero or more parameters separated by whitespace. The reference guide contains documentation for each statement.
By default, a space, tab or newline will mark the end of a word in a USE script. To include whitespace in a word (for example to create a variable with a space in it), double quotes - " - or an escape - \ - must be used to prevent the parser from interpreting the space as an end of word marker. Unless within double quotes, to specify a literal tab or space character it must be escaped by preceding it with a backslash character - \.
Examples:
The following table summarises the behaviour:
Comments in a USE script start with a # character that is either the first character of a line or the first character in a word. Comments always end at the end of the line they were started on.
USE scripts often make use of variables. Variables have a name and a value. When a variable name is encountered on any given line during execution of the script, the name is replaced with the value before the line is executed.
To reference a variable, the name should be preceded with ${ and followed by }. For example, to access the value of a variable called username, it should be written as ${username}.
The length (in characters) of a variable can be determined by appending .LENGTH to the variable name when referencing it. Thus if a variable called result has a value of success then ${result.LENGTH} will be replaced with 7.
Variables may be exposed in the GUI by prefixing their declaration with the word public as follows:
Any variable so marked may be edited using a form in the GUI before the script is executed. If a public variable is followed by a comment on the same line, then the GUI will display that comment for reference. If there is no comment on the same line, then the line before the variable declaration is checked, and if it starts with a comment then this is used. Both variants are shown in the example below:
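A minimal sketch of the two variants (the names, values and comments are purely illustrative):
public var username = admin   # The username to log in with
# The number of days of usage to collect
public var collect_days = 7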
If a variable declaration has both kinds of comment associated with it then the comment on the same line as the variable declaration will be used
A named buffer (also termed a response buffer) contains data retrieved from an external source, such as an HTTP or ODBC request. Buffers are created with the buffer statement.
Once created, a buffer can be referenced by enclosing its name in { and } as follows:
Buffer names may be up to 31 characters in length
Up to 128 buffers may exist simultaneously
Up to 2Gb of data can be stored in any given buffer (memory permitting)
Parslets are used to extract data from the contents of a named buffer.
The csv statement is used to create and populate CSV files. It is typically combined with loops to write values extracted from an array in a JSON and/or XML document stored in a named buffer.
CSV files are produced via the use of multiple csv statements which perform the following functions:
Create a new empty CSV file
Define the headers
Finalise the headers
Write data to one or more rows of the file
Close the file
All CSV files created by the csv command use a comma - , - as the separator character and a double quote - " - as the quote character. Headers and data fields are automatically separated and quoted.
The following is used to create a new, empty CSV file:
csv label = filename
The label must not be associated with any other open CSV file. Up to 16 CSV files may be open simultaneously and the label is used by subsequent csv statements to determine which of the open files the statement should operate on. Labels are case sensitive and may be from 1 to 15 characters in length.
The specified filename is created immediately, and if it is the name of an existing file then it will be truncated to 0 bytes when opened.
The filename argument may contain a path component but the csv statement does not create directories, so any path component in the filename must already exist. The path, if specified, will be local to the Exivity home directory.
csv usage = "${exportdir}/azure_usage.csv"
This section refers to add_headers as the action, but either add_header or add_headers may be used. Both variants work in an identical fashion.
csv add_headers label header1 [header2 ... headerN]
All CSV files created by a USE script must start with a header row which names the columns in the file. The number of columns can vary from file to file, but in any given file every data row must have the same number of columns as there are headers.
To create one or more columns in a newly created CSV file, the csv add_headers statement is used as shown above. The label must match the label previously associated with the file as described previously.
One or more header names can be specified as arguments to csv add_headers. Multiple instances of the csv add_headers statement may reference the same CSV file, as each statement will append additional headers to any headers already defined for the file.
No checks are done to ensure the uniqueness of the headers. It is therefore up to the script author to ensure that all the specified headers in any given file are unique.
csv add_headers usage username user_id subscription_id
This section refers to fix_headers as the action, but either fix_header or fix_headers may be used. Both variants work in an identical fashion.
csv fix_headers label
After csv add_headers has been used to define at least one header, the headers are finalised using the csv fix_headers statement. Once the headers have been fixed, no further headers can be added to the file and until the headers have been fixed, no data can be written to the file.
csv fix_headers usage
This section refers to write_fields as the action, but either write_field or write_fields may be used. Both variants work in an identical fashion.
csv write_fields label value1 [value2 ... valueN]
USE keeps track of the rows and columns as they are populated using one or more csv write_fields statements, and will automatically write the fields from left to right starting at the first column in the first data row and will advance to the next row when the rightmost column has been written to.
It is the responsibility of the script author to ensure that the number of fields written to a CSV file is such that when the file is closed, the last row is complete, in order to avoid malformed files with one or more fields missing from the last row.
csv close label
Once all fields have been written to a CSV file, it must be closed using the csv close statement. This will ensure that all data is properly flushed to disk, and will free the label for re-use.
csv close usage
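Putting these statements together, a minimal end-to-end sketch (the filename, headers and values are illustrative) might look like this:
csv usage = "system/extracted/example_usage.csv"
csv add_headers usage username user_id subscription_id
csv fix_headers usage
csv write_fields usage alice 1001 sub-0001
csv write_fields usage bob 1002 sub-0002
csv close usage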
Consider the file "\examples\json\customers.json" representing two customers:
The resulting CSV file is as follows:
The aws_sign_string statement is used to generate an AWS4-HMAC-SHA256 signature, used as the signature component of the Authorization HTTP header when calling the AWS API.
aws_sign_string varName using secret_key date region service
The authentication method used by AWS requires the generation of an authorization signature which is derived from a secret key known to the client along with specific elements of the query being made to the API.
This is a fairly involved process and a full step-by-step walkthrough is provided by Amazon on the following pages (these should be read in the order listed below):
The aws_sign_string statement is used to generate the final signature as detailed on the pages listed above.
Note that in order to use this statement it is necessary to have the following strings available:
A string to sign, obtained by following the process described by Amazon, containing meta-data about the request being made
A secret_key, obtained from Amazon, which is used by any client application authorising against their API
The date associated with the API request, in YYYYMMDD format
The AWS region associated with the API request (for example eu-central-1)
The AWS service being accessed (for example s3)
The aws_sign_string statement will use these inputs to generate the HMAC-SHA256 signature which is a component of the Authorization header when connecting to the API itself.
The varName parameter is the name of a variable containing the string to sign. After executing aws_sign_string, the contents of this same variable will have been updated to the base-16 encoded signature value.
If there are any errors in the string to sign, date, AWS region or AWS service strings used as input to aws_sign_string then a signature will still be generated, but the AWS API will reject the request. In this case it is necessary to review the process by which these strings were created as per the AWS guide provided above.
The following is an example USE script that implements everything described above.
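As a minimal sketch of the signing step itself (all values are placeholders, and the string to sign is assumed to have been constructed earlier as per the AWS walkthrough):
var string_to_sign = "AWS4-HMAC-SHA256 placeholder_string_to_sign"
var secret_key = "my_aws_secret_key"
var request_date = 20180719
var aws_region = eu-central-1
var aws_service = s3
aws_sign_string string_to_sign using ${secret_key} ${request_date} ${aws_region} ${aws_service}
# string_to_sign now contains the base-16 encoded signature
print ${string_to_sign}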
After populating a named buffer with data from an external source such as an HTTP request or a file, it is often necessary to extract fields from it for uses such as creating subsequent HTTP requests or rendering output files.
This is accomplished using parslets. There are two types of parslet, static and dynamic. In both cases, when a parslet is used in a script it is expanded such that it is replaced with the value it is referencing, just like a variable is.
Static parslets refer to a fixed location in XML or JSON data
Dynamic parslets are used in conjunction with loops to retrieve values when iterating over arrays in XML or JSON data
Parslets can be used to query JSON or XML data. Although JSON is used for illustrative purposes, some additional notes specific to XML can be found further down in this article.
Consider the example JSON shown below:
The object containing all the data (known as the root node) contains the following children:
Objects and arrays can be nested to any depth in JSON. The children of nested objects and arrays are not considered as children of the object containing those objects and arrays, i.e. the children of the heading object are not considered as children of the root object.
Every individual 'thing' in JSON data, regardless of its type is termed a node.
Although different systems return JSON in different forms, the JSON standard dictates that the basic principles apply universally to all of them. Thus, any valid JSON may contain arrays, objects, strings, boolean values (true or false values), numbers and null children.
It is often the case that the number of elements in arrays is not known in advance, therefore a means of iterating over all the elements in an array is required to extract arbitrary data from JSON. This principle also applies to objects, in that an object may contain any number of children of any valid type. Valid types are:
Some systems return JSON in a fixed and predictable format, whereas others may return objects and arrays of varying length and content. The documentation for any given API should indicate which fields are always going to be present and which may or may not be so.
Static parslets act like variables in that the parslet itself is expanded such that the extracted data replaces it. Static parslets extract a single field from the data and require that the location of that field is known in advance.
The values associated with these fields can be extracted using a static parslet which is specified using the following syntax:
$JSON{buffer_name}.[node_path]
Static parslets always specify a named buffer in curly braces immediately after the $JSON prefix.
The node_path describes the location and name of the node containing the value we wish to extract. Starting at the root node, the name of each node leading to the required value is specified in square brackets. Each set of square brackets is separated by a dot.
The nodepaths for the fixed nodes described above are therefore as follows:
Putting all the above together, the parslet for locating the category in the heading is therefore:
$JSON{example}.[heading].[category]
When this parslet is used in a USE script, the value associated with the parslet is extracted and the parslet is replaced with this extracted value. For example:
print $JSON{example}.[heading].[category]
will result in the word Documentation being output by the statement, and:
var category = $JSON{example}.[heading].[category]
will create a variable called category with a value of Documentation.
Currently, a parslet must be followed by whitespace in order to be correctly expanded. If you want to embed the value into a longer string, create a variable from a parslet and use that instead:
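For example, a minimal sketch using the example buffer described above:
var category = $JSON{example}.[heading].[category]
# Unlike the parslet, the variable can be embedded within a longer word
print Current_category:${category}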
When using JSON parslets that reference values that may contain whitespace it is sometimes necessary to enclose them in double quotes to prevent the extracted value being treated as multiple words by the script
It may be required to extract values from a JSON array which contains values that do not have names as shown below:
The result of executing the above against the sample data is:
If the anonymous arrays have a known fixed length then it is also possible to simply stream the values out to the CSV without bothering to assign them to variables. Thus assuming that the elements in the metrics array always had two values, the following would also work:
Dynamic parslets are used in loops to extract data from locations in the data that are not known in advance, such as when an array of unknown length is traversed in order to retrieve a value from each element in the array.
Note the following differences between a static parslet and a dynamic parslet:
Parentheses are used to surround the name of the foreach loop (as opposed to curly braces)
The nodepath following a dynamic parslet is relative to the target of the foreach loop
The following script fragment will render the elements in the items array (in the example JSON above) to disk as a CSV file.
In the example above, the first foreach loop iterates over the elements in the 'items' array, and each of the dynamic parslets extracts values from the current element in that loop. The dynamic parslets use the current element, this_item, as the root for their node paths.
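A minimal sketch of this pattern is shown below (the field names id and name are illustrative and may differ from those in the original example):
csv items_csv = "system/extracted/items.csv"
csv add_headers items_csv id name
csv fix_headers items_csv
foreach $JSON{example}.[items] as this_item {
    csv write_fields items_csv $JSON(this_item).[id] $JSON(this_item).[name]
}
csv close items_csv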
If a parslet references a non-existent location in the XML or JSON data then it will resolve to the value EXIVITY_NOT_FOUND
XML parslets work in exactly the same way that JSON parslets do, apart from the following minor differences:
XML parslets are prefixed $XML
When extracting data from XML, the foreach statement only supports iterating over XML arrays (whereas JSON supports iterating over objects and arrays)
An XML parslet may access an XML attribute
To access an XML attribute, the node_path should end with [@attribute_name], where attribute_name is the name of the attribute to extract. For example, given the following data in a buffer called xmlbuf:
The following script:
will produce the following output:
Currently, comments should not be used on the same line as the encrypt statement, as the comment will be considered part of the value to encrypt.
Variables may be explicitly declared using the var statement, or may be automatically created as a consequence of actions performed in the script. Additionally, a number of variables are automatically created before a script is executed.
For a list of variables created automatically please consult the article on the var statement.
It may be desirable to conceal the value of some variables (such as passwords) rather than have them represented as plain text in a USE script. This can be accomplished via the encrypt statement.
Please refer to the full article on parslets for more information on parslets and their use.
After the headers have been fixed, the csv write_fields statement is used to write one or more fields of data to the CSV file. Currently it is not possible to write a blank field using csv write_fields, however when extracting data from a buffer using a parslet, if the extracted value is blank then it will automatically be expanded to the string (no value).
Using a combination of loops and parslets, the information in the above JSON can be converted to CSV format as follows:
Parslets are the means by which USE locates and extracts fields of interest in any valid JSON data, regardless of the structure. For full details of the JSON data format, please refer to the official JSON specification.
In the example JSON above, let us assume that the data is held in a named buffer called example and that the title and heading children are guaranteed to be present. Further, the heading object always has the children category and finalised. Note that for all of these guaranteed fields, the value associated with them is indeterminate.
The buffer_name is the name of the buffer containing the JSON data, which must have previously been populated using the buffer statement.
Extraction of values that do not have names can be accomplished via the use of nested loops in conjunction with an empty nodepath ([]) as follows:
Which method is used will depend on the nature of the input data. Note that the special variable ${loopname.COUNT} (where loopname is the label of the enclosing foreach loop) is useful in many contexts for applying selective processing to each element in an array or object, as it will be automatically incremented every time the loop iterates. See the foreach article for more information.
A dynamic parslet must be used in conjunction with a loop and takes the following form:
A dynamic parslet does not reference a named buffer directly, rather it references the name of a loop
Type | Description |
APIs | Typically, usage data is retrieved from the API or APIs provided by the cloud (or clouds) for which reports need to be generated. This is usually a REST API accessed via HTTP/S. |
Files | A file on the local file-system or on a shared volume. This is usually a CSV, JSON or XML file. |
Exivity | In some cases it is useful to retrieve information from Exivity itself, such that accounts and usage data that were created historically can be incorporated into the daily processing. |
Database | Arbitrary SQL queries can be executed against an SQL server either via a direct connection string or via an ODBC DSN. |
Web | Arbitrary HTTP queries can be invoked in order to retrieve information from any web page accessible from the Exivity server. |
Child | Type |
title | string |
heading | object |
items | array |
Type | Description |
Object | A node encompassing zero or more child nodes (termed children) of any type |
Array | A list of children, which may be of any type (but all children in any given array must be of the same type) |
String | Textual data |
Number | Numeric data, may be integer or floating point |
Boolean | A true or false value |
Null | A null value |
Nodepath | Referenced value |
[title] | Example JSON data |
[heading].[category] | Documentation |
[heading].[finalised] | true |
Characters | Meaning |
| Anything inside the quotes, except for a newline, is treated as literal text |
| Whether within quotes or not, this is expanded to a double quote - |
| When used outside quotes, this is expanded to a TAB character |
| When used outside quotes, a space following the |
| When used outside quotes, this is expanded to a backslash - |
The discard statement is used to delete a named buffer.
discard {buffer_name}
The discard statement will delete the named buffer and free the memory used to store its contents. The statement takes immediate effect and any attempt to reference the buffer afterwards (at least until such time as another buffer with the same name is created) will cause the USE script to log an error and fail.
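For example, a minimal sketch (the filename is hypothetical):
buffer temp = file "system/extracted/example.json"
# The buffer is no longer needed, so free the memory it uses
discard {temp}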
The exit_loop statement will terminate the current loop.
Either exit_loop or loop_exit may be used. Both variants work identically.
exit_loop
The exit_loop statement will immediately terminate the current loop and script execution will jump to the statement following the } at the end of the current loop.
This can be done even if the exit_loop statement is within one or more if constructs inside the loop.
If no loop is in effect then an error will be logged and the script will terminate.
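A minimal sketch of its use inside a loop (assuming the == comparison operator; the loop limit is arbitrary):
loop attempts 10 {
    # Stop looping once the fifth iteration has completed
    if (${attempts.COUNT} == 5) {
        exit_loop
    }
}
print The loop ran ${attempts.COUNT} times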
The encode statement is used to base16 or base64 encode the contents of a variable or a named buffer.
encode base16|base64 varName|{buffer_name}
The encode statement will encode the contents of an existing variable or named buffer, replacing those contents with the encoded version.
Encoding the contents will increase their length. With base16 encoding the new length will be double the original. With base64 encoding the new length will be greater than the original, but the exact size increase will depend on the contents being encoded.
When encoding a variable, if the size of the result after encoding exceeds the maximum allowable length for a variable value (8095 characters) then the USE script will fail and an error will be returned.
Encoding an empty variable or buffer will produce an empty result
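For example, a minimal sketch (the value is illustrative):
var credentials = "apiuser:apipassword"
encode base64 credentials
# ${credentials} now contains the base-64 encoded form of the original value
print ${credentials}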
The following script ...
... produces the following output:
The clear statement is used to delete all HTTP headers previously configured using the set http_header statement.
clear http_headers
The clear statement will remove all the headers currently defined, after which a new set of headers can be specified using set http_header.
The escape statement is used to escape quotes in a variable value or the contents of a named buffer.
escape quotes in varName|{bufferName}
If a variable value or named buffer contains quotes then it may be desirable to escape them, either for display purposes (to prevent USE from removing them before rendering the data as output) or in order to satisfy the requirements of an external API.
The escape statement will precede all occurrences of the character " with a backslash as shown in the example below. This operation is not just temporary - it will update the actual contents of the variable or named buffer.
The escape statement does not consider the context of any existing quotes in the data. Running it multiple times against the same data will add an additional escape character each time to each occurrence of a quote.
Given an input file called 'escapeme.txt' containing the following data:
The following script:
will produce the following output:
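As an additional illustrative sketch (the value is hypothetical):
var message = "A so-called \"example\" value"
escape quotes in message
# Every double quote in the value is now preceded by a backslash
print ${message}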
This article assumes knowledge of variables.
The encrypt statement is used to conceal the value of a variable, such that it does not appear in plain text in a USE script.
encrypt var name = value_to_be_encrypted
The encrypt statement differs from other statements in that it takes effect before execution of a USE script begins. In this regard it is effectively a directive to the internal script pre-processor which prepares a script for execution.
Comments, quotes and escapes in the value to be encrypted are treated as literal text up until the end of the line.
White-space following the value to be encrypted will therefore be included in the encrypted result.
White-space preceding the value to be encrypted will be ignored and will not be included in the encrypted result.
Any variable prefixed with the word encrypt will be encrypted by the pre-processor and the script file itself will be modified as follows:
All text (including trailing white-space) from the word following the = character up to the end of the line is encrypted
The encrypted value is base64 encoded
The original variable value in the USE script is substituted with the result
The encrypt keyword for that variable is changed to encrypted
The USE script is overwritten on disk in this new form
This process is repeated for all variables preceded by the encrypt keyword.
As a side effect of the encryption process, it is not currently possible to encrypt a value that begins with a space or a tab. This functionality will be implemented in due course.
Once encrypted, a variable can be used just as any other, the only requirement being that the encrypted keyword preceding its declaration is not removed or modified.
To change the value of an encrypted variable simply replace the declaration altogether and precede the new declaration with encrypt. Upon first execution, the USE script will be updated with an encrypted version of the variable as described above.
Encrypted values can only be used on the system that they were created on. If an encrypted value is moved or copied to a different installation of Exivity then any attempt to reference or decrypt it will result in something other than the original value.
Firstly, create the script as usual, with encrypt preceding any variables that are to be encrypted:
Secondly, run the script. Prior to execution the script will be automatically modified as shown below:
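A minimal sketch of this transformation (the password and the resulting encrypted text are purely illustrative):
# Before the first run:
encrypt var api_password = MySecretPassword
# After the first run the pre-processor has rewritten the line, similar to:
encrypted var api_password = QmFzZTY0RW5jb2RlZFZhbHVlSGVyZQ==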
Statement | Description |
aws_sign_string | Create an AWS4-HMAC-SHA256 signature value |
basename | Extract the filename from a path + filename string |
buffer | Create a named buffer |
csv | Create a CSV file |
encode | Base16 or base64 encode data |
exit_loop | Break out of a loop |
foreach | Iterate over an array |
get_last_day_of | Set a variable to contain the number of the last day of a specified month |
gunzip | Inflate GZIP data |
hash | Generate an SHA256 or HMACSHA256 hash |
http | Execute an HTTP request |
if | Conditionally execute statements |
json | Format JSON data |
loglevel | Change the logging level |
loop | Execute statements repeatedly |
match | Search using a regular expression |
pause | Suspend script execution |
print | Echo text to standard output |
save | Save a named buffer to disk |
set | Specify a protocol parameter |
subroutine | Define a subroutine |
terminate | End script execution |
uri | URI (percent) encode a variable |
var | Create or update a variable |
clear | Delete any defined HTTP headers |
discard | Delete a named buffer |
encrypt | Encrypt a variable |
escape | Escape quotes in a value or named buffer |
gosub | Call a subroutine |
return | Explicitly return from a subroutine |
unzip | Decompress ZIP data in a named buffer |
The gosub keyword is used to run a named subroutine.
gosub subroutineName ([argument1, ... argumentN])
The argument list may span multiple lines, so long as any given argument is contained on a single line and ends with a comma, eg:
The subroutineName provided to the gosub statement must be that of a subroutine defined elsewhere in the script using the subroutine statement.
If any argument contains white-space or a comma then it must be quoted:
gosub getfile("directory with spaces/filename.txt")
It is permitted to call a subroutine from within another subroutine, therefore gosub can be used within the body of a subroutine. This may be done up to 256 levels in depth.
The opening bracket after subroutineName may or may not be preceded with a space:
gosub getfile ("filename.txt")
To call a subroutine with no parameters, use empty brackets:
gosub dosomething()
Please refer to the example in the documentation for the subroutine statement
The functionality described in this article is not yet available. This notice will be removed when the appropriate release is made.
The gunzip statement is used to inflate a GZIP file.
gunzip filename as filename
gunzip {bufferName} as filename
The gunzip statement can be used to extract the contents of a GZIP archive containing a single file. The GZIP archive may be a file on disk or may be the contents of a named buffer.
It is not possible to inflate GZIP data directly in memory, but the same effect can be achieved by extracting GZIP data in a named buffer to disk, and then loading the extracted data back into the named buffer as shown in the example below.
All paths and filenames are treated as relative to the Exivity home directory
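A minimal sketch of the buffer round-trip described above (the URL and filenames are illustrative):
buffer report = http GET "https://api.example.com/report.gz"
# Inflate the GZIP data in the buffer to a file on disk
gunzip {report} as "system/extracted/report.json"
# Load the inflated data back into the same buffer
buffer report = file "system/extracted/report.json"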
The get_last_day_of statement sets a variable to contain the number of days in the specified month.
get_last_day_of yyyyMM as varName
The get_last_day_of statement will set the value of the variable called varName to contain the number of days in the month specified by yyyyMM, where yyyy is a four-digit year and MM is a 2-digit month.
The statement will take leap years into account.
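For example, a minimal sketch:
get_last_day_of 201802 as last_day
# 2018 was not a leap year, so this prints 28
print ${last_day}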
The hash statement is used to generate a base-16 or base-64 encoded hash of data stored in a variable or named buffer.
hash sha256 [HMAC key] target|{target} as result [b16|b64]
hash md5 target|{target} as result [b16|b64]
The hash statement uses the contents of target as its input and places the final result into result. The SHA256 and MD5 hash algorithms are supported.
If target is surrounded with curly braces like {this} then it is taken to be the name of a memory buffer and the contents of the buffer will be used as input. Otherwise, it is treated as the name of the variable, the value of which will be hashed.
By default the resulting hash is base-16 encoded and the result placed into the variable specified by the result argument.
result is the name of the variable to put the output into, and not a reference to the contents of that variable. This is why it is not ${result}
If the optional HMAC key arguments are provided when the hash type is sha256, then the secret in key will be used to generate an HMAC-SHA-256 result.
If the optional b64 argument is used (base64 may also be specified), then the result will be encoded using base-64.
The optional b16 (base16 may also be used) is provided for completeness, but need not be specified as this is the default encoding to use.
Running the script:
results in the following output:
The foreach statement defines a block of zero or more statements and associates this block with multiple values. The block of statements is executed repeatedly, once for each value.
foreach parslet as loop_label {
# Zero or more USE script statements go here
}
The opening { may be placed on a line of its own if preferred
The foreach statement is used to iterate over the values in an array or object (identified by a parslet) within the data in a named buffer.
The loop will execute for as many elements as there are in the array, or for as many members as there are in the object. For the purposes of this documentation, the term child will be used to refer to a single array element or object member.
If the array or object is empty, then the body of the loop will be skipped and execution will continue at the statement following the closing }.
The loop_label can be any string, but must not be the same as any other loop_label values in the same scope (ie: when nesting foreach loops, each loop must have a unique label). This label is used to uniquely identify any given loop level when loops are nested.
The foreach statement will execute the statements in the body of the loop once for every child. foreach loops can be nested, and at each iteration the loop_label can be used to extract values from an array or object in the current child using a dynamic parslet. See the examples at the end of this article for a sample implementation showing this in action.
As the foreach loop iterates over the children, a number of variables are automatically created or updated as follows:
Consider the following JSON in a file called samples/json/array.json:
To generate a list of IDs and names from the items array, the following would be used:
To extract values from an array using nested loops:
Given the source JSON in a file called example.json, the following USE script:
will produce the following output:
The http statement initiates an HTTP session using any settings previously configured using the set statement. It can also be used for querying response headers.
http method url
http dump_headers
http get_header headerName as varName
The http statement performs an HTTP request against the server and resource specified in the url parameter. Any http-related settings previously configured using set will be applied to the request.
The method argument determines the HTTP method to use for the request and must be one of GET, PUT, POST or DELETE.
The url argument must start with either http: or https:. If https: is used then SSL will be used for the request.
The url argument must also contain a valid IP address or hostname. Optionally, it may also contain a port number (preceded by a colon and appended to the IP address or hostname) and a resource.
The following defaults apply if no port or resource is specified:
The format of the http statement is identical when used in conjunction with the buffer statement.
To dump a list of all the response headers returned by the server in the most recent session use the statement:
http dump_headers
This will render a list of the headers to standard output, and is useful when implementing and debugging USE scripts. The intention of this statement is to provide a tool to assist in script development, and as such it would normally be removed or suppressed with a debug mode switch in production environments.
To retrieve the value of a specific header, use the statement:
http get_header headerName as varName
This will set the variable varName to be the value of the header headerName.
If headerName was not found in the response, then a warning will be written to the log-file. In this case varName will not be created but if it already exists then its original value will be unmodified.
The following shows the process of retrieving a header. The output of:
Takes the following form:
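As an additional illustrative sketch (the URL and header name are hypothetical):
buffer response = http GET "https://api.example.com/status"
# List all response headers, which is useful while developing the script
http dump_headers
# Capture a specific header into a variable
http get_header Content-Type as content_type
print ${content_type}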
Variable | Value |
loop_label.COUNT | The number of times the loop has executed. If the object or array is empty then this variable will have a value of 0. |
loop_label.NAME | The name of the current child |
loop_label.VALUE | The value of the current child |
loop_label.TYPE | The type of the current child |
Field | Default |
port | 80 if using http, 443 if using https |
resource | / |
The loop statement executes one or more statements multiple times.
loop label [count] [timeout timelimit] {
# Statements
}
The opening { may be placed on a line of its own if preferred but the closing } must be on a line of its own
The loop statement will loop indefinitely unless one of three exit conditions causes it to stop. These are as follows:
The number of loops specified by the count parameter are completed
At least as many milliseconds as are specified by the timelimit parameter elapse
An exit_loop statement explicitly exits the loop
In all three cases when the loop exits, execution of the script will continue from the first statement after the closing } marking the end of the loop.
In the event that both count and timelimit parameters are specified, the loop will exit as soon as one or other of the limits has been reached, whichever comes first.
Both the count and timeout parameters are optional. If omitted then the default for both of them will be infinite.
The loop statement will automatically create and update a variable called loop_label.COUNT which can be referenced to determine how many times the loop has executed (as shown in the example below). This variable is not deleted when the loop exits which means that it is possible to know how many times any given loop executed, even after the loop has exited.
Any specified timeout value is evaluated at the end of each execution of the loop and as such the actual time before the loop exits is likely to be a short time (typically a few milliseconds) greater than the specified value. In practice this should be of no consequence.
The loop shown above will result in the following output:
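As an additional illustrative sketch of a bounded loop (the count and timeout values are arbitrary):
# Run at most 5 times, or for at most 2000 milliseconds, whichever comes first
loop retries 5 timeout 2000 {
    print Iteration ${retries.COUNT}
    pause 100
}
print The loop executed ${retries.COUNT} times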
The if statement is used to conditionally execute one or more statements. In conjunction with an optional else statement it can cause one or other of two blocks of statements to be executed depending on whether an expression is true or false.
if (expression) {
# Statements
} [else {
# Statements
}]
If the condition evaluates to true, then the first block of statements is executed, and the second block (if present) is skipped over. If the condition evaluates to false then the first block of statements is skipped and the second block (if present) is executed.
The opening { character at the start of each block may be placed on a line of its own if preferred but the closing } must be on a line of its own.
Multiple conditions can be used in a single expression and combined with the boolean operators && or || (for AND and OR respectively) so long as each condition is enclosed in parentheses. For example:
Given the source JSON in a file called example.json, the following USE script:
will produce the following output:
The match statement is used to search either a specified string or the contents of a named buffer using a regular expression.
match label expression target
The three parameters serve the following purposes:
The label associates a meaningful name to the search. Once the match has been attempted, two variables will be created or updated as follows:
These variables can be checked after the match in order to determine the result status and access the results.
The regular expression must contain one or more characters enclosed in brackets - ( ... ) - the contents of which are termed a subgroup. If a successful match is made then the portion of the target text that was matched by the subgroup will be returned in the label.RESULT variable.
The target determines whether a supplied string or the contents of a named buffer are searched. By default the parameter will be treated as a string.
If the string contains white-space then it must be enclosed in double quotes
If the target argument is surrounded with curly braces - { ... } - then it is taken to be the name of a buffer and the expression will be applied to the contents of that buffer.
Regular expressions are generally used for searching ASCII data. Searching binary data is possible but may be of limited usefulness.
Search the contents of a variable for the text following the word 'connection:' with or without a capital 'C':
Search a text file previously retrieved from a HTTP request to locate the word 'Error' or 'error'
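Minimal sketches of both forms (the variable, buffer and expressions are hypothetical):
# Search the contents of a variable
var header_line = "Connection: keep-alive"
match conn "[Cc]onnection: (.*)" "${header_line}"
print Status: ${conn.STATUS} Result: ${conn.RESULT}
# Search a named buffer previously populated via an HTTP request
match errcheck "([Ee]rror)" {response}
print Status: ${errcheck.STATUS} Result: ${errcheck.RESULT}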
While executing a USE script, various messages are written to a logfile. The loglevel option determines the amount of detail recorded in that logfile.
loglevel loglevel
The table below shows the valid values for the loglevel argument. Either the numeric level or the label can be specified. If the label is used then it must be specified in CAPITAL LETTERS.
The log levels are cumulative, in that higher log-level values include lower level messages. For example a level of INFO will cause FATAL, ERROR, WARN and INFO level messages to be written to the log.
The loglevel statement takes immediate effect and may be used multiple times within a USE script in order to increase or decrease the logging level at any time.
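For example, a minimal sketch:
# Record informational messages while the main extraction runs
loglevel INFO
# Reduce the detail again for the remainder of the script
loglevel WARN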
The json statement is used to format JSON in a named buffer.
json format {buffername}
In many cases an API or other external source will return JSON in a densely packed format which is not easy for the human eye to read. The json statement is used to re-format JSON data that has been previously loaded into a named buffer (via the buffer statement) into a form that is friendlier to human eyes.
After the JSON has been formatted, the buffer can be saved to disk or printed for subsequent inspection.
Given the following single packed line of JSON in a named buffer called myJSON:
The following USE script fragment:
will result in the following output:
Parameter | Value |
label | A unique name to associate with this match |
expression | The regular expression to apply to the target |
target | The data to search using the expression |
Variable | Possible values | Notes |
label.STATUS | MATCH, NOMATCH or ERROR | The result of applying the expression (ERROR indicates an invalid expression) |
label.RESULT | (A string) (Empty value) | The text matched by the subgroup in the expression, if any |
Level | Label | Meaning |
| DEBUGX | Extended debugging information |
| DEBUG | Debugging information |
| INFO | Standard informational messages |
| WARN | Warnings and non-fatal errors |
| ERROR | Run-time errors |
| FATAL | Non-recoverable errors |
The pause statement is used to suspend execution of a USE script for a specified time.
pause delaytime
The delaytime parameter is the number of milliseconds to wait before continuing. A value of 0 is allowed, in which case no delay will occur.
The pause statement may be useful in cases where an external data source imposes some form of rate limiting on the number of queries that can be serviced in a given time-frame, or to slow down execution at critical points when debugging a long or complex script.
This example makes use of script parameters which are provided when USE is executed. For more information on script parameters please refer to the Extract introduction.
The return statement is used to exit a subroutine at an arbitrary point and return to the calling location.
return
A subroutine will automatically return to the location it was called from when the end of its body is reached. However it may be desirable to explicitly exit the subroutine at some other point, in which case the return statement is used.
The return statement cannot be used to return a value to the calling code (this should be done via the use of variables as described in the subroutine statement documentation)
The print
statement is used to display text to standard output while a USE script is executing.
print [-n] word|{buffer_name} [... word|{buffer_name}]
The print
statement enables user-defined output to be generated during the execution of a USE script. When retrieving data from external sources it may take some time for a lengthy series of operations to complete, so one use of the print
statement is to provide periodic status updates during this time.
The print
statement will process as many arguments as it is given, but at least one argument is required. If the first argument is -n
then no newline will be output after the last argument has been echoed to standard output, else a newline is output after the last argument.
Arguments that are normal words will be sent to standard output followed by a space. Arguments referencing a named buffer will result in the contents of the buffer being displayed.
Note that print will stop output of data from a named buffer as soon as a NUL (ASCII value 0) character is encountered.
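A minimal sketch (the buffer and URL are illustrative):
print -n Retrieving usage data ...
buffer response = http GET https://api.example.com/v1/usage
print done
print {response}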
The save
statement is used to write the contents of a named buffer to disk.
save {buffer_name} as filename
The save
statement will write the contents of a named buffer to filename. As well as providing a means of direct-to-disk downloading this can be useful for retrieving server responses and capturing them for later examination, whether it be for analysis, debugging or audit purposes.
If the destination file already exists then it will be overwritten.
If the filename argument contains a path component, then any directories not present in the path will be created. If the destination path or file cannot be created then an error will be logged and the USE script will fail.
The save
statement is similar in effect to the http_savefile option supported by set, in that data from a server is written to disk. There is one important distinction however:
When set http_savefile has been used to specify a file to save, the next HTTP request will stream data to the file as it is received from the server
When a buffer statement is used to capture the server response, and a subsequent save
statement is used to write it to disk, all the buffered data will be written to the file immediately
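A short sketch that captures a server response and writes it to disk for later inspection (the URL and filename are illustrative):
buffer response = http GET https://api.example.com/v1/usage
save {response} as system/extracted/api/usage_response.json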
The subroutine
keyword is used to define a named subroutine
A subroutine is a named section of code that can be executed multiple times on demand from anywhere in the script. When called (via the gosub statement), execution of the script jumps to the start of the specified subroutine. When the end of the code in the subroutine body is reached or a return statement is encountered (whichever comes first), execution resumes at the statement following the most recent gosub statement that was executed.
The code in the body of a subroutine
statement is never executed unless the subroutine is explicitly called using gosub. If a subroutine is encountered during normal linear execution of the script then the code in it will be ignored.
Subroutines in USE do not return any values, but any variables that are set within the subroutine can be accessed from anywhere in the script and as such they should be used for returning values as needed.
When invoked via the gosub statement, arguments can be passed to the subroutine. These arguments are read-only but may be copied to normal variables if required.
Arguments are accessed using the same syntax as is used for variables as follows:
${SUBARG.COUNT}
contains the number of arguments that were passed to the subroutine
${SUBARG_N}
is the value of any given argument, where N
is the number of the argument starting at 1
Every time a subroutine is called, any number of arguments may be passed to it. These arguments are local to the subroutine and will be destroyed when the subroutine returns. However, copying an argument to a standard variable will preserve the original value as follows:
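A sketch of this pattern, assuming a brace-delimited subroutine body as described in the subroutine statement documentation:
subroutine get_status {
    var return_value = ${SUBARG_1}
}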
After the subroutine above has been executed the return_value
variable will retain the value it was set to.
It is not permitted to nest subroutine
statements. If used within the body of a subroutine statement, a subroutine
statement will cause the script to terminate with an error.
The following demonstrates using a subroutine to detect when another subroutine has been provided with an incorrect number of arguments:
The set
statement is used to configure a setting for use by subsequent http or buffer statements.
set setting value
A protocol such as http offers a number of configuration options. Any given option is either persistent or transient:
The following settings can be configured using set
:
set http_progress yes|no
Persistent. If set to yes then dots will be sent to standard output to indicate that data is downloading while an HTTP session is in progress. When downloading large files, the dots provide feedback that the session is still active during what would otherwise be a lengthy delay with no output.
set http_username
username
Persistent. Specifies the username to be used to authenticate the session if the http_authtype
setting is set to anything other than none
. If the username contains any spaces then it should be enclosed in double quotes.
set http_password
password
Persistent. Specifies the password to be used to authenticate the session if the http_authtype
setting is set to anything other than none
. If the password contains any spaces then it should be enclosed in double quotes.
set http_authtype
type
Persistent. Specifies the type of authentication required when initiating a new connection. The type parameter can be any of the following:
set http_authtarget
target
Persistent. Specifies whether any authentication configured using the http_authtype
setting should be performed against a proxy or the hostname specified in the http URL.
Valid values for target are:
server (default) - authenticate against a hostname directly
proxy - authenticate against the proxy configured at the Operating System level
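For example, to use basic authentication directly against the server (the credentials are illustrative):
set http_authtype basic
set http_authtarget server
set http_username apiuser
set http_password "s3cret value"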
set http_header
"name: value"
Persistent. Used to specify a single HTTP header to be included in subsequent HTTP requests. If multiple headers are required, then multiple set http_header
statements should be used.
An HTTP header is a string of the form name: value.
There must be a space between the colon at the end of the name and the value following it, so the header should be enclosed in quotes
Example: set http_header "Accept: application/json"
Headers configured using set http_header
will be used for all subsequent HTTP connections. If a different set of headers is required during the course of a USE script then the clear statement can be used to remove all the configured headers, after which set http_header
can be used to set up the new values.
By default, no headers at all will be included with requests made by the http statement. For some cases this is acceptable, but often one or more headers need to be set in order for a request to be successful.
Typically these will be an Accept:
header for GET requests and an Accept:
and a Content-Type:
header for POST requests. However there is no hard and fast standard so the documentation for any API or other external endpoint that is being queried should be consulted in order to determine the correct headers to use in any specific scenario.
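For example, a JSON API might be called as follows (a sketch; the endpoint is hypothetical):
set http_header "Accept: application/json"
set http_header "Content-Type: application/json"
buffer response = http POST https://api.example.com/v1/report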
Headers are not verified as sane until the next HTTP connection is made
set http_body data string - use the specified string as the body of the request
set http_body file filename - send the specified file as the body of the request
set http_body {named_buffer} - send the contents of the named buffer as the body of the request
Transient. By default no data other than the headers (if defined) is sent to the server when an HTTP request is made. The http_body
setting is used to specify data that should be sent to the server in the body of the request.
When using http_body
a Content-Length:
header will automatically be generated for the request. After the request this Content-Length:
header is discarded (also automatically). This process does not affect any other defined HTTP headers.
After the request has been made the http_body
setting is re-initialised such that the next request will contain no body unless another set http_body
statement is used.
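A sketch that sends the contents of a named buffer as the request body (the filename and endpoint are illustrative):
buffer payload = file "system/config/request_body.json"
set http_body {payload}
buffer response = http POST https://api.example.com/v1/jobs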
set http_savefile
filename
Transient. If set, any response returned by the server after the next HTTP request will be saved to the specified filename. This can be used in conjunction with the buffer statement, in which case the response will both be cached in the named buffer and saved to disk.
If no response is received from the next request after using set http_savefile
then the setting will be ignored and no file will be created.
Regardless of whether the server sent a response or not, after the HTTP request has completed the http_savefile
setting is re-initialised such that the next request will not cause the response to be saved unless another set http_savefile
statement is used.
No directories will be created automatically when saving a file, so if there is a pathname component in the specified filename, that path must exist.
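A sketch (the URL and filename are illustrative, and the system/extracted directory is assumed to exist):
set http_savefile "system/extracted/daily_usage.csv"
buffer usage = http GET https://api.example.com/v1/usage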
set http_savemode
mode
Persistent.
If mode is overwrite
(the default) then if the filename specified by the set http_savefile
statement already exists it will be overwritten if the server returns any response data. If no response data is sent by the server, then the file will remain untouched.
If mode is append
then if the filename specified by the set http_savefile
statement already exists any data returned by the server will be appended to the end of the file.
set http_timeout
seconds
Persistent. After a connection has been made to a server it may take a while for a response to be received, especially from older or slower APIs. By default, a timeout of 5 minutes (300 seconds) is applied before an error is generated.
This timeout may be increased (or decreased) by specifying a new timeout limit in seconds, for example:
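set http_timeout 600
The statement above (an illustrative value) would wait for up to 10 minutes before timing out.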
The minimum allowable timeout is 1 second.
set odbc_connect
connection_string
Persistent. Sets the ODBC connection string for use by the buffer statement's odbc_direct protocol. The connection string may reference an ODBC DSN or contain full connection details, in which case a DSN doesn't need to be created.
A DSN connection string must contain a DSN attribute and optional UID and PWD attributes. A non-DSN connection string must contain a DRIVER attribute, followed by driver-specific attributes.
Please refer to the documentation for the database to which you wish to connect in order to ensure that the connection string is well formed.
An example connection string for Microsoft SQL Server is:
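set odbc_connect "DRIVER={SQL Server};SERVER=dbserver01;DATABASE=usagedb;UID=exivity;PWD=secret"
The server, database and credential values shown above are illustrative.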
| Type | Meaning |
|---|---|
| Persistent | The setting remains active indefinitely and will be re-used over successive HTTP calls |
| Transient | The setting only applies to a single HTTP call, after which it is automatically reset |
| Value | Meaning |
|---|---|
| none (default) | no authentication is required or should be used |
| basic | use basic authentication |
| ntlm | use NTLM authentication |
| passport | use passport authentication |
| digest | use digest authentication |
| negotiate | automatically selects between NTLM and Kerberos authentication |
The unzip
statement is used to unzip the data in a named buffer.
unzip
{buffer_name}
The unzip
statement will extract a single file from a zip archive stored in a named buffer. In order for this to succeed, the buffer must have been previously populated using the buffer statement, and the data within the buffer must be a valid ZIP file.
Only ZIP files are supported. To extract GZIP files, use gunzip
A warning will be logged, the buffer left intact and the script will continue to execute if any of the following conditions arise:
The buffer is empty or does not contain a valid ZIP archive
The ZIP archive is damaged or otherwise corrupted
More than 1 file is present within the archive
After the unzip
statement completes, the buffer will contain the unzipped data (the original ZIP archive is discarded during this process).
The filename of the unpacked file is also discarded, as the resulting data is stored in the buffer and can subsequently be saved using an explicit filename as shown in the example below.
The terminate
statement will exit the USE script immediately.
terminate [with error]
Normally a USE script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.
When the terminate
statement is encountered, the script will finish at that point. No statements after the terminate
statement will be executed.
By default, the script will exit with a success status, however it may be useful to exit deliberately when an error such as an invalid or unexpected response from an HTTP session is detected. Adding the keywords with error
to the statement will cause it to exit with an error status.
Transcript executes user-definable scripts (termed tasks) in order to produce one or more Reporting Database Files (RDFs) from one or more input Dataset files in CSV format. These RDFs are later used by the reporting engine to generate results.
Transcript tasks are located in system/config/transcript/
and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported.
Each statement in a Transcript task must be contained on a single line. Statements consist of a keyword indicating the action to perform, followed by zero or more white-space separated parameters as required by the statement. Documentation for all the possible statements can be found in the Transcript language reference guide.
By default a space, tab or newline will mark the end of a word in a Transcript task. To include white-space in a parameter (for example to reference a column name with a space in it), enclose the parameter in double quotes or escape the white-space by preceding it with \.
Examples:
create columns from "Meter Name" using Quantity
create columns from Meter\ Name using Quantity
The following table summarises the behaviour of quotes and escapes:
Comments in a Transcript task start with a # character that is either:
the first character of a line in the Transcript task
the first character in a word
Comments always end at the end of the line they are started on.
Transcript statements may contain variables. Variables have a name and a value. When a variable name is encountered during execution of the task, the name is replaced with the value of the variable with that name.
To separate them from normal statement words, variable names are always preceded with ${
and followed by }
. Therefore the variable with the name dataDate
is referenced as ${dataDate}
in the transcript task. As well as user defined variables (created using the var statement), the following default variables are supported by Exivity:
Variable names ...
may be used multiple times in a single statement
are case sensitive - ${dataDate}
is different to ${datadate}
may not be nested
may be embedded within surrounding text - xxx${dataDate}yyy
may be used within quotes:
import "${baseDir}\to_import\AzureJuly${dataDate}.ccr" source AzureJuly
may appear as words of their own in a transcript statement - create column Date value ${dataDate}
A regular expression variable is a special type of variable used to match the name of a column in a DSET. It is enclosed by ${/
and /}
and the text within this enclosure can take either of the following two forms:
${/expression/}
The regular expression described by expression will be applied to the default DSET
${/dset.id/expression/}
If the text preceding the central /
character is a valid DSET ID then the expression after that /
will be applied to the column names in that DSET
If the text preceding the /
character is not a valid DSET ID then the entire text of the variable between the ${/
and /}
enclosure is treated as a regular expression and will be applied to the default DSET
Once the DSET ID and the expression have been established by the above, the expression is tested against each column name in the DSET and the first matching column name is returned. If no match is found, then an error is logged and the transcript task will fail.
The regular expression may contain a subgroup, which is enclosed within parentheses - (
and )
. If no subgroup is present, and a match is made, then the entire column name will be returned. If a subgroup is present and a match is made, then only the characters matching the portion of the expression within the parentheses are returned. For example:
The expression does not have to match the entire column name. Assuming no subgroup is specified, as long as a match is made then the variable will be expanded to the whole column name.
Regular expression variables are powerful tools when combined with the rename statement, as they can be used to transform an uncertain column name into a known one.
Examples:
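For instance, to normalise an uncertain column heading into a known one (the patterns and new names are illustrative, and the rename statement is assumed to take the form rename column oldname to newname):
rename column ${/[Mm]eter ?[Nn]ame/} to MeterName
rename column ${/Azure.usage/.*[Cc]ategory.*/} to Category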
A Transcript task cannot manipulate data on disk directly, so it is necessary to import one or more Datasets in CSV format at runtime in order to process the data within them. When a Dataset is imported the following sequence of actions takes place:
The Dataset (in CSV format) is read from disk
A number of checks are done on the data to ensure it meets the requirements to qualify as a Dataset
The data is converted into an internal format called a DSET
The DSET is assigned two tags (source
and alias
) which when combined together form a unique ID to identify the DSET (see Core concepts for more information)
An index is constructed, which facilitates high speed manipulation of the data in the DSET
The DSET is added to the list of DSETs available for use by subsequent statements in the Transcript task
Once these actions have been completed, a DSET can be identified through the unique combination of source.alias
. This permits Transcript statements to specify which DSET to operate on.
In addition, a default DSET can be specified, which will be used if no alternative DSET is specified. Full details of these mechanisms can be found in the reference guide, specifically in the import and default articles.
Data can be exported in one of two ways during the execution of a Transcript task:
Many Transcript statements change the data in the DSET in some way. Columns may be created, renamed or deleted and rows may be added and removed for example.
At any point in the Transcript process the current state of a DSET can be rendered to disk as an output CSV file. This is accomplished via use of the export statement. This permits snapshots of a DSET to be created for debugging or audit purposes, as well as the creation of partially processed CSV files for import into a later Transcript process.
The finish statement creates a Reporting Database File (RDF) containing the data in a DSET. This RDF can then be used by the reporting engine.
The uri
statement is used to encode the contents of a variable such that it does not contain any illegal or ambiguous characters when used in an HTTP request.
uri encode varname
uri component-encode varname
uri aws-object-encode varname
As well as uri component-encode
you can use uri encode-component
(the two are identical in operation). Similarly, uri aws-object-encode
and aws-encode-object
are aliases for each other.
When sending a request to an HTTP server it is necessary to encode certain characters such that the server can accurately determine their meaning in context. The encoding involves replacing those characters with a percent symbol - %
- followed by two hexadecimal digits representing the ASCII value of that character.
Note that the last parameter to the uri
statement is a variable name, so to encode the contents of a variable called my_query
the correct statement would be uri encode my_query
and not uri encode ${my_query}
(The latter would only be correct if the value of my_query
was the name of the actual variable to encode)
USE script provides the following methods for encoding the contents of a variable:
uri encode
varname
This method will encode all characters except for the following:
This is typically used to encode a URI which contains spaces (spaces encode to %20
) but doesn't contain any query parameters.
uri encode-component
varname
This method will encode all characters except for the following:
This is typically used to encode query components of a URI, such as usernames and other parameters. Note that this method will encode the symbols =
, &
and ?
and as such a URL of the form:
server.com/resource?name=
name_value
&domain=
domain_value
is usually constructed from its various components using the values of the parameters as shown in the example below.
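A sketch of this construction (the variable, URL and parameter values are illustrative):
var name_value = "John Smith"
uri component-encode name_value
buffer response = http GET "https://api.example.com/resource?name=${name_value}&domain=local"
After the uri statement, ${name_value} contains John%20Smith, so the resulting URL is unambiguous.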
uri aws-object-encode
varname
This method is specifically implemented to support the encoding of object names when downloading from Amazon S3 buckets. Amazon S3 buckets appear much like shared directories, but they do not have a hierarchical filesystem.
The 'files' in buckets are termed objects and to assist in organising the contents of a bucket, object prefixes may be used to logically group objects together.
These prefixes may include the forward slash character, making the resulting object name appear identical to a conventional pathname (an example might be billing_data/20180116_usage.csv
). When downloading an object from S3 the object name is provided as part of the HTTP query string.
When referencing an S3 object name there is an explicit requirement not to encode any forward slashes in the object name. USE therefore provides the aws-object-encode
method to ensure that any S3 object names are correctly encoded. This method will encode all characters except for the following:
URI encode every byte. UriEncode() must enforce the following rules:
URI encode every byte except the unreserved characters: 'A'-'Z', 'a'-'z', '0'-'9', '-', '.', '_', and '~'.
The space character is a reserved character and must be encoded as "%20" (and not as "+").
Each URI encoded byte is formed by a '%' and the two-digit hexadecimal value of the byte.
Letters in the hexadecimal value must be uppercase, for example "%1A".
Encode the forward slash character, '/', everywhere except in the object key name. For example, if the object key name is photos/Jan/sample.jpg, the forward slash in the key name is not encoded.
The aws-object-encode
method is compliant with the above requirements. For most trivial cases it should not be necessary to encode the AWS object name as it is relatively straightforward to do it by hand. However using uri aws-object-encode
to URI-encode the object name may be useful for object names that contain a number of characters not listed above or for cases where the object name is provided as a parameter to the USE script.
The above script will output:
The var
statement is used to create or update a variable which can subsequently be referenced by name in the USE script.
[public] var name [ = value]
[public] var name operator number
[public] encrypt var name = value
For details on encrypted variables please refer to the article
Variables are created in one of two ways:
Manually via the var
command
Automatically, as a consequence of other statements in the script
If the word public
precedes a variable declaration then the variable will be shown in, and its value can be updated from, the Exivity GUI. Only variables prefixed with the word public
appear in the GUI (all others are only visible in the script itself). To make an automatic variable public, re-declare it with a value of itself as shown below:
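For example, an automatic variable can be exposed to the GUI as follows (DAY_NAME_UTC is one of the automatic variables described later in this article):
public var DAY_NAME_UTC = ${DAY_NAME_UTC}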
A variable is a named value. Once defined, the name can be used in place of the value for the rest of the script. Amongst other things this permits configuration of various parameters at the top of a script, making configuration changes easier.
The = value portion of the statement is optional, but if used there must be white-space on each side of the =
character. To use spaces in a variable value it should be quoted with double quotes.
Once a variable has been defined it can be referenced by prefixing its name with ${
and post-fixing it with a }
. For example a variable called outputFile
can be referenced using ${outputFile}
. If no value is specified, then the variable will be empty, eg:
will result in the output:
Variable names are case sensitive, therefore ${variableName}
and ${VariableName}
are different variables.
If there is already a variable called name then the var
statement will update the value.
There is no limit to the number of variables that can be created, but any given variable may not have a value longer than 8095 characters
Variables that contain a numeric value can have arithmetic operations performed on them. This is done using the following syntax:
var
name operator number
The operator must be surrounded by white-space, and the following operators are supported:
For example the statement var x += 10
will add 10 to the value of x.
When performing arithmetic operations on a variable, any leading zeros in the value of that variable will be respected:
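var counter = 007
var counter += 1
After these two statements the counter variable contains 008 rather than 8, as the leading zeros are preserved.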
Attempting to perform an arithmetic operation on a variable that does not contain a valid number will result in an error being logged, and the script will terminate.
Currently, only integer arithmetic is supported.
Automatic variables are referenced in exactly the same way as manually created ones; the only difference is in the manner of creation.
The following variables are automatically created during the execution of a USE script:
match day "(...)" ${DAY_NAME_UTC}
var short_day = ${day.RESULT}
On occasion it may be useful to determine the length (in characters) of the value of a variable. This can be done by appending the suffix .LENGTH
to the variable name when referencing it. For example if a variable called result has a value of success then ${result.LENGTH}
will be replaced with 7
(this being the number of characters in the word 'success').
A variable with no value will have a length of 0, therefore using the .LENGTH
suffix can also be used to check for empty variables as follows:
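A sketch, assuming the conditional syntax provided by the USE if statement:
if (${myvar.LENGTH} == 0) {
    print The myvar variable is empty
    terminate with error
}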
myvar.LENGTH
is not a variable in its own right. The .LENGTH
suffix merely modifies the manner in which the myvar
variable is used.
Within the individual reference articles for each statement, the syntax is described using the following conventions:
bold
for keywords
italics
for arguments
Square brackets for optional keywords and arguments: [like this]
Vertical pipe for alternative keyword options: just|exactly as shown
Ellipses for a variable length list of arguments: Column1 ... ColumnN
Refer to the page for more information regarding datasets, fully qualified column names and related information.
The following statements (in alphabetical order) are supported by Transcript:
More information may be found in the AWS documentation, where it states:
To derive the short versions of the day and month names, use a match statement to extract the first 3 characters as follows:
| Characters | Meaning |
|---|---|
| " ... " | Anything inside the quotes, except for a newline or an escape character, is treated as literal text |
| \" | Whether within quotes or not, this is expanded to a double quote (") character |
| \t | When used outside quotes, this is expanded to a TAB character |
| \ | When used outside quotes, a space following the \ is treated as a literal character |
| \\ | Whether within quotes or not, this is expanded to a backslash (\) character |
| Variable | Meaning |
|---|---|
| ${dataDate} | The datadate currently in effect, in yyyyMMdd format |
| ${dataDay} | The day value in the dataDate variable, expressed as a 2 digit number padded with a leading zero if necessary |
| ${dataMonth} | The month value in the dataDate variable, expressed as a 2 digit number padded with a leading zero if necessary |
| ${dataMonthDays} | The number of days in the month in the dataMonth variable |
| ${dataDateStart} | 00:00:00 on the day in the dataDate variable, expressed as a UNIX timestamp |
| ${dataDateEnd} | 23:59:59 on the day in the dataDate variable, expressed as a UNIX timestamp |
| ${dataYear} | The year value in the dataDate variable, expressed as a 4 digit number |
| ${homeDir} | The base working directory currently in effect |
| ${exportDir} | This is the equivalent of ${baseDir}\exported |
| Operator | Meaning |
|---|---|
| += | Addition |
| -= | Subtraction |
| *= | Multiplication |
| /= | Division |
The append
statement is used to append one DSET to the end of another.
append source_dset.id to destination_dset.id
If the source DSET has any column names not present in the destination DSET then additional columns are automatically created in the destination DSET. These additional columns will contain blank values by default.
If one or more column names are present in both DSETs then the columns copied from the source DSET may be re-ordered into the same order as that used by the destination DSET.
At the end of the operation, the destination DSET will contain all the data from both DSETs, and the source DSET is unchanged.
!!! note Both DSETs must exist and it is not possible to append a DSET to itself
Given the following DSETs:
DSET ID: example.data
DSET ID: example2.data
The statement append example2.data to example.data
will result in the following destination DSET (example.data):
The calculate
statement is used to perform arithmetic operations using literal and column values.
calculate column ResultCol as source operation source
where source is either of column colName or value literal_value, and operation is one of the characters + - * / % for addition, subtraction, multiplication, division and modulo respectively.
There must be whitespace on each side of the operation character
Examples:
calculate column ResultCol as column Amount * value 1.2
calculate column Net as column total - column cogs
calculate column constant_7 as value 3.5 + value 3.5
The ResultCol parameter is the name of the column that will hold the results. This column may or may not exist (if necessary it will be created automatically).
Both of the two source parameters can specify a literal value, or the name of a column containing the value to use when performing the calculation.
A literal value is specified using value
N
where N is the literal number required
A column name is specified using column
colName
where ColName is the name of the column containing the values required
The ResultCol may be the same as a column specified by one of the source parameters in which case any existing values in it will be updated with the result of the calculation.
Any blank or non-numeric values in a source column will be treated as 0
An attempt to divide by zero will result in 0
When performing a modulo operation, the two source values are rounded to the nearest integer first
If the result column already exists and option overwrite is set to no, then only blank cells in the result column will be updated.
Add 1.5 to the values in the Rate column:
calculate column Rate as column Rate + value 1.5
Multiply the values in the Rate column by those in the Quantity column
Store the result in a new column called Charge
calculate column Charge as column Rate * column Quantity
Variable | Details |
| The number of parameters passed to the script |
| For each parameter passed to the script a variable called |
| The day of the current local date, padded to 2 digits if necessary |
| The full English name of the current day of the week |
| The day of the current date in UTC, padded to 2 digits if necessary |
| The full English name of the current day of the week in UTC |
| The current local time in 'friendly' format, eg |
|
|
|
| The hour of the current local time, padded to 2 digits if necessary |
| The hour of the current time in UTC, padded to 2 digits if necessary |
|
| The minute of the current local time, padded to 2 digits if necessary |
| The minute of the current time in UTC, padded to 2 digits if necessary |
| The month of the current local date, padded to 2 digits if necessary |
| The full English name of the current month of the year |
| The month of the current date in UTC, padded to 2 digits if necessary |
| The full English name of the current month of the year in UTC |
| A newline ( |
| The second of the current local time, padded to 2 digits if necessary |
| The second of the current time in UTC, padded to 2 digits if necessary |
| The milliseconds of the current local time, padded to 3 digits if necessary |
| The milliseconds of the current time in UTC, padded to 3 digits if necessary |
| The filename of the script being executed |
| The current UTC time in YYYYMMDD'T'HHMMSS'Z' format, eg: |
| The year of the current local date as a 4 digit number |
| The year of the current date in UTC as a 4 digit number |
| Statement | Description |
|---|---|
| aggregate | Reduce the number of rows in a DSET while preserving information |
| append | Append one DSET to the end of another |
| calculate | Perform arithmetic on column values |
| capitalise | Capitalise column name and/or values |
| convert | Convert between decimal and hex values |
| copy | Copy rows from one DSET to another |
| correlate | Merge DSETs using a key |
| create | Create one or more columns |
| default | Specify the default DSET |
| delete | Delete columns, rows or DSETs |
| export | Snapshot a DSET to disk |
| finish | Create a Reporting Database File |
| if | Conditionally execute statements |
| import | Import a Dataset or CCR file |
| include | Execute one task from within another |
| lowercase | Convert column name and/or values to lower case |
| move | Move rows from one DSET to another |
| normalise | Normalise strings |
| option | Set global parameters |
| rename | Rename an existing column or DSET |
| replace | Search and replace values in a column |
| round | Round numeric values in a column |
| service | Create a chargeable service |
| services | Create multiple chargeable services |
| set | Set cell values in a column |
| split | Split column values |
| timecolumns | Set the start time and end time columns |
| timerender | Render a UNIX timestamp in human-readable form |
| timestamp | Create a timestamp column |
| | See |
| uppercase | Convert column name and/or values to upper case |
| var | Define a variable |
| where | Define a local filter |
The aggregate
statement is used to reduce the number of rows in a DSET while preserving required information within them
aggregate [dset.id] [notime|daily] [offset offset] [nudge] [default_function function] colname function [... colname function]
The aggregate
statement is a powerful tool for reducing the number of rows in a DSET. Aggregation is based on the concept of matching rows. Any two rows that match may be merged into a single row which selectively retains information from both of the original rows. Any further rows that match may also be merged into the same result row.
A match is determined by comparing all the columns which have a function of match
associated with them (further information regarding this can be found below). If all the column values match, then the rows are merged.
Merging involves examining all the columns in the data that were not used in the matching process. For each of those columns, it applies a function to the values in the two rows and updates the result row with the computed result of that function. For a full list of functions, please refer to the table further down in this article.
To illustrate this consider the following two row dataset:
If we don't care about the colour
value in the above records, we can combine them together. We do care about the quantity
however, so we'll add the two values together to get the final result.
The statement to do this is:
aggregate notime id match location match quantity sum
id match means that the values in the id
columns must be the same
location match means that the values in the location
columns must be the same
quantity sum means that the resulting value should be the sum of the two existing values
by default, a function of first
is applied to the columns, such that the original row retains its value
Applying these rules to the above example we get the following single result record:
A column called EXIVITY_AGGR_COUNT is automatically created by the aggregate statement. For each row in the output it will contain the number of source rows that were merged together to create that result row.
The aggregate
statement accepts a range of parameters as summarised in the table below:
If two records are deemed suitable for merging then the function determines the resulting value in each column. The available functions are as follows:
When the notime parameter is specified, the aggregation process treats any columns flagged as start and end times in the data as data columns, not timestamp columns.
In this case when comparing two rows to see if they can be merged, the aggregation function simply checks to see if all the columns with a function of match
are the same, and if they are the two rows are merged into one by applying the appropriate function to each column in turn.
The following illustrates the aggregate
statement being used to remove duplicate rows from a DSET:
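# the DSET ID is omitted here, so the default DSET is aggregated
aggregate notime default_function match subscription_id match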
The analysis of the statement above is as follows:
notime - we are not interested in timestamps
default_function match - by default every column has to match before records can be aggregated
subscription_id match - this is effectively redundant as the default_function is match
but needs to be present because at least one pair of colname function parameters is required by the aggregate
statement
The resulting DSET will have no duplicate data rows, as each group of rows whose column values were the same were collapsed into a single record.
The example shown at the top of this article used the sum
function to add up the two quantity
values, resulting in the same total at the expense of being able to say which source record contributed which value to that total.
The sum
function can therefore accurately reflect the values in a number of source rows, albeit with the above limitation. By using a function of sum
, max
or min
, various columns can be processed by aggregate
in a meaningful manner, depending on the specific use case.
When aggregating, columns containing start time and end time values in UNIX epoch format can be specified. Each record in the DSET therefore has start and end time markers defining the period of time that the usage in the record represents. As well as taking the start times and end times into account, time-sensitive aggregation can perform additional manipulations on these start and end times.
Consider the following CSV file called aggregate_test.csv
:
It is possible to aggregate these into 3 output records with adjusted timestamps using the following Transcript task:
Resulting in:
As can be seen, for each unique combination of the values in the id
,subscription-id
and service
columns, the start and end times have been adjusted as described above and the quantity
column contains the sum of all the values in the original rows.
!!! warning When performing time-sensitive aggregation, any records with a start or end time falling outside the current data date will be discarded.
The daily parameter to aggregate
means that the START_TIME and END_TIME columns are now recognised as containing timestamps. When aggregating with the daily option, timestamps within the current dataDate are combined to result in an output record which has the earliest start time and the latest end time seen within the day.
Optionally, following daily
an offset may be specified as follows:
aggregate aggr.test daily offset 2 id match subscription_id match quantity sum
In this case the start and end timestamps are adjusted by the number of hours specified after the word offset before aggregation is performed. This permits processing of data which has timestamps with timezone information in them, and which may start at 22:00:00 of the first day and end at 21:59:59 of the second day, as an offset can be applied to realign the records with the appropriate number of hours to compensate.
The nudge parameter shaves 1 second off end times before aggregating in order to avoid conflicts where hourly records start and end on 00:00:00 (the last second of the current hour is the same as the first second of the next hour)
The convert
statement is used to convert values in a column from base-10 to base-16 or vice-versa.
convert colName to decimal|hex from decimal|hex
The keywords dec and decimal and the keywords hex and hexadecimal are equivalent.
When converting values in a column, the following considerations apply:
Values in the column are replaced with the converted values
The colName argument must reference an existing column, and may optionally be fully qualified (else the column is assumed to be in the default DSET)
If any values in the column are not valid numbers, they will be treated as 0
Blank values are ignored
The convert
statement may be used in the body of a where
statement
If a value in colName contains a partially correct value such as 123xyz
then it will be treated as a number up to the first invalid character, in this case resulting in a value of 123
.
The hex digits in the original value can be either upper or lower case
The hex digits from A-F will be rendered in upper case in the converted output
The convert
statement only supports integer values (floating point values will be rounded down to the nearest integer)
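For example, to convert hexadecimal values in a (hypothetical) meter_id column to decimal:
convert meter_id to decimal from hex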
The capitalise
(the spelling capitalize
is also supported) statement is used to modify the name of a column and/or the values in a column such that the first character is a capital letter and the remaining characters are lower case.
capitalise values|heading [and values|heading] in column|columns ColName1 [... ColNameN]
After the keyword in
, either of the keywords column
or columns
may be used
The heading
and values
keywords refer to the name of an existing column and the values in each row for that column respectively.
Only the first character in the column name or value is converted to upper case. If this character is not a letter, then the statement will have no effect on it. For example applying the statement:
will have no effect on the column name as the first character is an underscore. However, applying the same statement to a column called _VMName
would result in a new name of _vmname
as after attempting to make the first character a capital (which in the case of the underscore has no effect), the remaining characters are converted to lower case.
Any number of column names may be specified, and any of these may or may not be fully qualified. When applied to values in a column, blank values are ignored.
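For example (the column names are illustrative):
capitalise heading and values in columns servicename region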
This article covers both the copy
and move
statements. They both work in the same way apart from the fact that move
deletes the source row after copying it.
The copy
statement is used to copy rows from one DSET to another
copy rows to dset.id
move rows to dset.id
Both copy and move must be used within the body of a where statement. Only rows that match the where expression will be copied (or moved).
The DSET from which rows will be copied or moved is automatically determined from the expression used by the where
statement.
The DSET to which rows will be copied or moved is determined by the dset.id parameter
The source and destination DSETs must be different (it is not possible to copy or move a row within the same DSET).
The destination DSET may or may not exist. If it does not exist then it will be created. If it does exist then the following logic is applied:
If the destination DSET has more columns than the source DSET then the new rows in the destination DSET will have blank values in the rightmost columns
If the destination DSET has fewer columns than the source DSET then the destination DSET will be extended with enough new columns to accommodate the new rows. In this case, existing rows in the destination DSET will have blank values in the rightmost columns
If the destination DSET is extended to accommodate the source rows then the new (rightmost) columns will have the same names as the equivalent columns in the source DSET. In the event that this would cause a naming conflict with existing columns in the destination DSET, one or more new columns in the destination DSET will have a suffix added to their name to ensure uniqueness. This suffix takes the form _N
where N
is a number starting at 2
.
To illustrate this, if the source DSET has columns called subscription,name,address,hostname and the destination DSET has a single column called name then the resulting extended destination DSET would have columns called name,subscription,name_2,address,hostname.
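As an illustration, the sketch below moves rows with an empty hostname column into a separate DSET (the column and DSET names are hypothetical, and the filter expression follows the syntax documented for the where statement):
where ([hostname] == "") {
    move rows to custom.unmatched
}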
The default
statement is used to explicitly define the default DSET to use when specifying a column name as an argument in subsequent statements.
default dset
source.alias
Given that multiple DSETs can be loaded at once, it is necessary to specify which DSET to operate on when performing actions such as creating a new column with the create statement. A column name in a Transcript statement is assumed to belong to the default DSET unless it is a fully qualified column name.
If there is no default
statement in the Transcript, then the first CSV file imported via the import statement will automatically be designated as the default DSET.
The default
statement can be used multiple times throughout a Transcript, in which case the default DSET will be whichever was specified by the last default
statement executed.
Lastly, when executing a finish statement, unless otherwise specified, the default DSET will be used to populate the reporting database created as a result.
When changing the default DSET, any definitions that referenced the default DSET at the time they were created will be updated with the new default DSET.
Set custom.datafile as the default DSET:
default dset custom.datafile
The correlate
statement is used to enrich the default DSET by adding new columns to it, and/or updating existing columns with useful values. The new column names are derived from other DSETs and the values in those columns are set using a lookup function based on the value in a key column shared between the DSETs.
correlate ColName1 [ ... ColNameN] using KeyColumn [assuming assumeDSET] [default DefaultValue]
The ColName1 ... ColNameN arguments are column names that will be copied from their original DSETs and merged into the default DSET.
Column names must be fully qualified, unless the assuming parameter is used, in which case any column names that are not fully qualified will be assumed to belong to the DSET specified by assumeDSET.
Source columns are those from which a cell is to be copied when the KeyColumn matches. Destination columns are columns in the default DSET into which a cell will be copied. Destination column names are derived from the names of the source columns as follows:
The source column is the argument in its original form, for example: Azure.usage.MeterName
The destination column is the same argument, but with the DSET ID replaced with that of the default DSET. For example if the default DSET is Custom.Services then the destination column for the above would be Custom.Services.MeterName.
If a destination column name doesn't exist in the default DSET then a new column with that name will automatically be created.
The KeyColumn argument is a column name which must not be fully qualified and which must exist in the default DSET and all of the DSETs referenced by the ColNameN arguments.
The DefaultValue argument, if present, specifies the value to write into the destination column if there is no match for the KeyColumn. If the DefaultValue argument is not specified then any rows where there is no match will result in a blank cell in the destination column.
For each row in the default DSET, the source DSET is searched for a matching KeyColumn value, and if a match is found then the value in the source column is used to update the default DSET. The row of the first match found in the source DSET will be used.
When matching the KeyColumn values, the logic in the following table is evaluated against every row in the destination DSET.
✘
means no or disabled, ✔
means yes or enabled
Given two Datasets as follows, where the default DSET is MyData.Owners:
Dataset 'MyData.Owners'
Dataset 'Custom.Services'
The statement:
correlate service description using id assuming Custom.Services
Will enrich the MyData.Owners Dataset such that it contains:
The statement:
correlate service description using id assuming Custom.Services default unknown
Will produce:
The create
statement is used to add one or more new columns to an existing DSET.
create column NewColumnName [value Value]
create columns from ColumnName [using ValueColumnName]
create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [Column [/regex/] | string literal]
create column NewColumnName [value Value]
This statement is used to create a new column called NewColumnName. The NewColumnName argument may be a fully qualified column name, in which case the new column will be created in the DSET specified as part of that name.
Note: If no DSET has been explicitly defined using the default statement then the DSET created by the first import statement in the Transcript task is automatically set as the default DSET.
A column called NewColumnName must not already exist in the DSET. If NewColumnName
contains dots then they will be converted into underscores.
The new column will be created with no values in any cells, unless the optional value Value portion of the statement is present, in which case all the cells in the new column will be set to Value.
Create a new empty column called Cost in the default DSET:
create column Cost
Create a new column called Cost with a value of 1.0 in every row of the default DSET:
create column Cost value 1.0
Create a new column called Cost with a value of 1.0 in every row of the DSET custom.charges:
create column custom.charges.Cost value 1.0
create columns from ColumnName [using ValueColumnName]
This statement is used to create multiple columns in a single operation. As is the case for create column above, if the using ValueColumnName portion of the statement is not present, then all newly created columns will have no values in any cells.
Given this example dataset:
The statement create columns from ServiceName using Count
will create the result shown below:
The names of the new columns to create are derived from the contents of the cells in the column called ColumnName, and the values (if opted for) are derived from the contents of the cells in the column called ValueColumnName. Duplicates are ignored. If all the cells in ColumnName have the same contents, then only a single new column will be created. To illustrate this, consider the following:
When applied to the data above, the statement create columns from ServiceName
will produce the following result (note that only a single column called Small_VM is created, and that empty cells are represented with a separator character, which in the case of the below is a comma):
If opting to set the values in the new columns, then for each row the value in ValueColumnName will be copied into the column whose name matches ColumnName. When applied to the same original data, the statement create columns from ServiceName using Quantity
will produce the following result:
When using create columns
the new columns are always created in the default DSET. This means that when no values are being set, it is possible to specify a different DSET for ColumnName. If the default DSET is Azure.usage
, then the statement create columns from custom.data.Services
will derive the names of the new columns from the cell contents in the Services column in the custom.data DSET.
This is only possible in the absence of the using ValueColumnName
option. When values are to be set, both the ColumnName and ValueColumnName arguments must belong to the default DSET.
The following transcript task will import the datasets Azure.usage and system/extracted/Services.csv, and create new (empty) columns in Azure.usage whose names are taken from the values in the column ServiceDefinitions in Services.csv.
create mergedcolumn NewColumn [separator sep] from [string literal] Column [/regex/] [ ... Column [/regex/] | string literal]
If preferred, the word using may be used instead of the word from (both work in an identical fashion)
This form of the statement is used to generate a new column containing values derived from those in one or more existing columns (termed source columns). The parameters are as follows:
The separator may be more than one character in length (up to 31 characters may be specified)
If a regex is specified then it must contain a subgroup enclosed in parentheses. The portion of the text in the source column matched by this subgroup will be extracted and used in place of the full column value.
The '/' characters surrounding the regular expression in the statement are not considered to be part of the expression itself - they are merely there to differentiate an expression from another column name.
If a regex is not specified, then the entire value in the source column will be used.
By default the value extracted from a source column will be blank in the following two cases:
There is a blank value in a source column
No match for a regular expression is found in the value of a source column
In such cases the merged result will simply omit the contribution from the source column(s) in question. If all the source columns yield a blank result then the final merged result will also be blank.
option merge_blank = some_text_here
This option will use the string some_text_here in place of any blank source column value.
option merge_nomatch = some_text_here
This option will use the string some_text_here if the result of applying the regular expression to a column value returns no matches.
Specifying the literal string <blank>
as the merge_blank or merge_nomatch value will reset the option such that the default behaviour is re-activated.
Given the following dataset:
The following examples illustrate some uses of the create mergedcolumn
statement:
Example 1
Example 2
If no regular expression is specified then the values in the source column will be used in their entirety:
Example 3
Let us add a new row to the sample dataset which has a non-compliant value for the user_id:
By default a non-matching value will result in a blank component of the merged result:
In this case, the resulting key for John
has no separator characters in it. We can force a default value for the missing user_id portion as follows:
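option merge_nomatch = unknown
With this option in effect (the value unknown is an arbitrary placeholder), re-running the same create mergedcolumn statement places that text where the missing user_id portion would be, so the separator characters appear in the key for John as well.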
A loop creates this variable (where loop_name is the name of the loop). The value of the variable is updated every time the loop executes, with a value of 1
on the first loop. If no loops are performed, then the variable will have a value of 0
When iterating over the children of a JSON object (not an array) using , these variables are updated with the name and value respectively of the current child every time the loop is executed (either may be blank if the child has no name or value respectively)
When iterating over the children of a JSON object (not an array) using , this variable is updated to reflect the type of the current child every time the loop is executed. The type will be one of boolean
, number
, string
, array
, object
or null
.
The HTTP status code returned by the server in response to the most recent request executed
Modify one or more existing descriptions and/or unit label
This behaviour can be overridden through the use of the option statement. The options associated with the create mergedcolumn statement are as follows:
| Parameter | Notes |
|---|---|
| dset.id | If not specified then the default DSET will be used |
| notime | (Either notime or daily is required) If used, timestamps in records are not taken into consideration when aggregating |
| daily | (Either notime or daily is required) If used, specifies that timestamps in the records will be considered when aggregating |
| offset | (May only be used if daily is present) The number of hours to shift timestamps by prior to aggregation |
| nudge | (May only be used if daily is present) If present, the times in the timestamp column marked as the end time column will have 1 second shaved off them prior to aggregation |
| default_function | Specifies the default logic to apply to a column when merging records together. If not specified then the default is first (see table below) |
| colname function | One or more pairs of column + function parameters. For each pair, the specified function will be used for the specified column name when merging records together during the aggregation process. For any columns not explicitly named, the default function will be applied. |
| Function | Logic |
|---|---|
| match | The value in both records must be the same |
| first | The existing value in the first ever result record will be used |
| last | The value in the last record merged will be used |
| sum | The values will be treated as numbers and summed |
| max | The values will be treated as numbers and the greatest will be used |
| min | The values will be treated as numbers and the smallest will be used |
| longest | Whichever value has the most characters in it will be used |
| shortest | Whichever value has the least characters in it will be used |
| blank | The value in the resulting merged record will be blank |
| avg | The values will be treated as numbers and the average will be used |
Match Found | Overwrite | Default Value | Result |
✘ | ✘ | ✘ | No values will be updated |
✔ | ✘ | ✘ | Empty destination column cells will be updated |
✘ | ✘ | ✔ | Empty destination column cells will be set to the default value |
✔ | ✘ | ✔ | Empty destination column cells will be set to the matched source column value |
✘ | ✔ | ✘ | No values will be updated |
✔ | ✔ | ✘ | Destination column cells will be updated |
✘ | ✔ | ✔ | Destination column cells will be set to the default value |
✔ | ✔ | ✔ | Destination column cells will be set to the matched source column value |
Parameter | Required | Meaning |
NewColumn | Yes | The name of the new column to create |
sep | No | If specified after the separator keyword, sep is a string to insert between values extracted from the source columns |
Column | Yes (at least one) | The name of a source column. A minimum of 1 column must be specified (and at most, 8 may be specified) |
/regex/ | No | If specified, the regular expression enclosed by the forward slashes is applied to the value of the preceding source column, and the matching portion of that value is used in the merged result |
string literal | No | If specified, the literal string will be added to the merged value. The relative order of the Column and literal arguments is preserved |
The finish
statement creates a Reporting Database File (RDF) from a DSET. The RDF can subsequently be used by the reporting engine.
finish
[dset.id]
The finish
statement is used to create an RDF from a DSET. Only a single DSET can be used to create an RDF, but multiple finish
statements may be used within the same task file. If there is no dset.id parameter then the default DSET will be used.
The RDF created by finish
will be saved as <BaseDir>\system\report\<yyyy>\<MM>\<dd>_source.alias.rdf
where:
<yyyy>
is the 4-digit year
<MM>
is the 2-digit month
<dd>
is the 2-digit day
source.alias
are the source and alias tags which form the DSET ID
Any existing RDF with the same name will be overwritten.
Create a Reporting Database file for the default DSET:
finish
Create a Reporting Database file for the DSET Azure.usage
finish Azure.usage
The export
statement is used to snapshot the data in a DSET and write it to disk as a Dataset.
export
source.alias
as
filename
The exported file will be created under <base_dir>/exported
. The filename parameter may include a path as well as the filename to export, so long as it does not contain the substring ".." and is not an absolute path. If the path contains one or more directories that do not exist then they will be created.
The source.alias
argument is the DSET ID to export (see Core concepts for more information on DSET IDs).
The export
statement can be used at any point during this process to save a CSV (conforming to Dataset format) which snapshots the data in the DSET at that moment in time. This can be used for a number of purposes including:
Examining the state of data part-way through processing for debugging purposes
Creating custom exports for subsequent import into 3rd party systems
Producing output CSV files (which may potentially contain merged data from multiple sources) for subsequent processing with another Transcript
When specifying a Windows path it is advisable to use UNIX-style forward slashes as the path delimiters, or to put the path+filename in double quotes to avoid occurrences of \t
being interpreted as a TAB character
The following Transcript will import a quoted CSV file, add a ProcessedFlag column to it with a value of 1 in each row, and save it back out again without quotes:
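A minimal sketch of such a task is shown below (the create column syntax is an assumption, and the file and source names are illustrative):

import "system/extracted/MyData/${dataDate}.csv" source MyData alias usage
# the exact create column syntax is assumed here
create column ProcessedFlag value 1
option noquote
export MyData.usage as "processed/${dataDate}_flagged.csv"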
The include
statement is used to combine Transcript task files together
include
taskfile
The include
statement is used to combine two or more task files into a single script prior to commencement of execution.
If using the Windows path delimiter - \
- it is advisable to put the path+filename in double quotes to avoid the backslash being interpreted as an escape character
Before executing a task file, Transcript analyses the script and processes all the include
statements. Wherever an include
statement is encountered, the specified taskfile is read from disk and inserted into the script in place of the include
statement itself.
As with most statements, include
must occupy a single line of its own in the script.
The taskfile argument is treated as relative to <basedir>/system/config/transcript/
.
Multiple files may be included in a task file and any included file may include other files but a check is done to ensure that no infinite include loops are generated.
Given the following two files in /system/config/transcript/
...
Prior to execution the pre-processor will combine these into the single memory-resident Transcript task:
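As an illustration (the file names and contents are hypothetical, since the original example is not reproduced here): if main.trs contains ...

import usage from Azure
include enrich.trs
finish

... and enrich.trs contains ...

rename column VMSize to InstanceType

... then the pre-processor produces the equivalent of:

import usage from Azure
rename column VMSize to InstanceType
finish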
The delete
statement is used to delete one or more columns or rows from one or more DSETs.
delete columns [except]
ColName1 [... ColNameN]
delete blankcolumns
delete rows
delete dset
dset_id
The list of column names following the initial delete columns
statement may be used in one of two ways:
The columns listed will be deleted
All columns except those listed will be deleted
Which method to use is determined by the presence (or otherwise) of the except
keyword:
When using the except
keyword, all column names following it must belong to the same DSET
It is not possible to delete a column from a DSET which only contains a single column. When deleting a column, the memory used to index it, along with the memory used to store the contents of its cells is released. This may be useful in situations where memory is limited.
Deleting unwanted columns early in a Transcript task will also increase the performance of many subsequent operations performed by the task.
The second keyword may be column
instead of columns
. Either way, the statement behaves identically.
The delete blankcolumns
statement will delete columns that only contain blank values from the default DSET.
It is not possible to completely empty a DSET using delete blankcolumns
. In the event that the last remaining column in a DSET is blank then it will not be deleted.
In order to delete rows from a DSET a local filter must be in effect. A local filter is created using the where statement.
When used within the body of a where statement, delete rows
will delete all rows in the DSET associated with the local filter where the condition is true. For more information on conditions, please refer to the where article.
Either delete rows
or delete row
may be used (both variations work in an identical manner)
The delete dset
statement can be used to remove a DSET from memory. This may be useful in cases where a DSET is no longer required, for example after it has been used to enrich another DSET via the correlate statement.
It is not possible to delete the default DSET.
Delete the column temp from the default DSET and the column Interim from the azure.Usage DSET:
delete columns temp azure.Usage.Interim
Delete any columns in the default DSET that have no values in any rows:
delete blankcolumns
Delete all rows where the VMID is 1234
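A sketch using a local filter (assuming the where syntax described in the where article):

where ([VMID] == 1234) {
    delete rows
}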
Delete the DSET azure.rates
delete dset azure.rates
The import
or use
statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.
The statements import
and use
are identical in operation. For clarity, the term import
will be used for the remainder of this article, but use
can be substituted in all cases.
To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.
import
filename
from
source
[alias
alias]
[options { ... }]
import
filename
source
custom_source
[alias
alias]
[options { ... }]
import
filename.ccr
source
custom_source
[alias
alias]
import ACCOUNT source
custom_source
[alias
alias]
import USAGE for
date
from
db_name
source
custom_source
[alias
alias]
If using the Windows path delimiter - \
- it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character
Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.
The import filters are configured via the options parameter, which if present is immediately followed by one or more name = value pairs enclosed within braces, for example:
The =
sign is optional, but if present it must be surrounded by whitespace.
Multiple options may be specified and each option must be placed on a separate line, for example:
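For instance, skipping one leading row and importing only non-zero rows might look as follows (the column name is illustrative):

import usage from Azure options {
    skip = 1
    filter ([quantity] != 0)
}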
The options supported by import
are as follows:
If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows
The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:
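For example (the column name is illustrative):

filter ([quantity] > 0)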
To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:
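For example (the column name is illustrative):

filter (["Meter Category"] == "Storage")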
If a pattern is specified in the import options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.
Only filenames may contain a regular expression, directory names are always treated literally.
If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).
If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.
To import all accounts from Exivity the following statement is used:
import ACCOUNT source
custom_source
[alias
alias]
This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.
To import usage data previously written to an RDF using finish the following statement is used:
import USAGE for
date
from
dset
source
custom_source
[alias
alias]
This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.
The date must be in yyyyMMdd
format. The RDF from which the usage will be imported is:
<basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf
.
To illustrate this in practice, the statement ...
import USAGE for 20180501 from azure.usage source test alias data
... will load the usage data from ...
<basedir>/system/report/2018/05/01_azure.usage_usage.rdf
... into a DSET called test.data
.
A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.
There are two main variations of the import
statement which are as follows:
import
filename
from
source
[alias
alias]
By convention, data retrieved by the USE extractor from external sources is located in a file called
<basedir>/system/extracted/<source>/<yyyy>/<MM>/<dd>_<filename>.csv
where <yyyy>
is the year, <MM>
is the month and <dd>
is the day of the current data date.
Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.
By default the alias tag will be set to the filename, minus the leading <dd>_ portion of that filename and the .csv extension, but an alias can be manually specified using the optional alias parameter.
As an example, assuming the data date is 20170223
, the statement:
import usage from Azure
will create a DSET called Azure.usage
from the file /system/extracted/Azure/2017/02/23_usage.csv
.
The statement:
import usage from Azure alias custom
will create a DSET called Azure.custom
from the file
<basedir>/system/extracted/Azure/2017/02/23_usage.csv
.
import
filename
source
custom_source
[alias
alias]
This form of the statement will import filename and the source tag will be set to the value of the custom_source parameter.
By default the alias tag will be set to the filename minus the .csv
extension but an alias can be manually specified using the optional alias parameter.
As an example, assuming the data date is 20170223
, the statement:
import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage
will create a DSET called Azure.usage
from the file
<basedir>/system/extracted/Azure/20170223.csv
Options will be ignored when importing CCR files
When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr
filename extension as follows:
Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.
In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.
When importing a CCR file, the quote and separator characters are always a double quote - "
- and a comma - ,
- respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.
The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct options are set before importing a dataset in order to avoid unwanted side effects.
The options relating to import
are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.
If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.
If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.
If the embed
option is set, then a curly bracket in the data - {
- will cause all characters (including newlines) up until the closing bracket - }
- to be imported. Nested brackets are supported, although if there are an uneven number of brackets before the end of the line an error will be generated in the logfile and the task will fail.
Embeds are not supported in CCR files
Create a DSET called Azure.usage
from the file /system/extracted/Azure/2017/02/23_usage.csv
import usage from Azure
Create a DSET called Azure.custom
from the file /system/extracted/Azure/2017/02/23_usage.csv
import usage from Azure alias custom
Create a DSET called Azure.usage
from the file /system/extracted/Azure/20170223.csv
import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage
The if
statement is used to conditionally execute one or more statements
A conditional expression (hereafter referred to simply as an expression) is evaluated to provide a TRUE or FALSE result which in turn determines whether one or more statements are to be executed or not. The following are examples of a valid expression:
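For instance (the variable names are illustrative, and the ${ } expansion is shown as used in the import examples elsewhere in this document):

(${dataDate} >= 20180101)
("${environment}" != "production")
("${filename}" =~ /\.csv$/)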
An expression used by the if
statement may contain:
Numeric and string literals
Regular expressions
Variables
Operators
Functions
A literal is a specified value, such as 4.5
or "hostname"
. Literals may be numbers or strings (text).
If a literal is non-quoted then it will be treated as a number if it represents a valid decimal integer or floating point number (in either regular or scientific notation), else it will be treated as a string.
If a literal is quoted then it is always treated as a string, thus 3.1415926
is a number and "3.1415926"
is a string.
Regular expressions must be enclosed within forward slashes (/
), and are assumed to be in ECMAScript format.
If present, a regular expression must be used on the right hand side of either an !~
or an =~
operator, and when evaluated it will be applied to the value on the left hand side of an operator, eg:
As the forward slash is used as a delimiter for the expression, any literal forward slashes required by the expression should be escaped with a back-slash: \/
Variables can be used within expressions, in which case they are replaced with their values. Once expanded, these values are treated as literals.
Operators are evaluated according to the operator precedence rules in the table below (where the highest precedence is evaluated first), unless parentheses are used to override them. Operators with the same precedence are evaluated from left to right.
Although expressions are evaluated based on the precedence of each operator as listed in the above table, it is recommended that parenthesis are used within the expression in order to remove any ambiguity on the part of a future reader.
A function is used to evaluate one or more arguments and return a result which is then taken into consideration when evaluating the overall truth of the expression.
Function calls start with the character @
which is followed by the function name and a comma separated list of parenthesised parameters, for example @MIN(1, 2, 3)
.
Function names must be specified in UPPER CASE as shown in the examples below.
The following functions are supported by the if
statement:
Return the smallest number from the specified list (requires at least 2 arguments)
Examples:
@MIN(1,2)
returns 1
@MIN(1,2,-3)
returns -3
@MIN(1,2,"-1")
returns -1
- string "-1"
is converted to number -1
@MIN(1,2,3/6)
returns 0.5
@MIN(1,2,"3/6")
returns 1
- string "3/6"
is converted to number 3
, up to first invalid character
@MIN(1,2,"zzz")
returns 0
- string "zzz"
is converted to number 0
Return the largest number from the specified list (requires at least 2 arguments)
Examples:
@MAX(1,2)
returns 2
@MAX(-1,-2,-3)
returns -1
@MAX(1,2,100/10)
returns 10
Returns number rounded to digits decimal places. If the digits argument is not specified then the function will round to the nearest integer.
This function rounds half away from zero, e.g. 0.5
is rounded to 1
, and -0.5
is rounded to -1
Examples:
@ROUND(3.1415,3)
returns 3.142
@ROUND(3.1415,2)
returns 3.14
@ROUND(3.1415926536,6)
returns 3.141593
@ROUND(3.1415)
returns 3
@ROUND(2.71828)
returns 3
This function will treat all its arguments as strings, concatenate them and return the result.
Examples:
@CONCAT("the answer ", "is")
returns the answer is
@CONCAT("the answer ", "is", " 42")
returns the answer is 42
@CONCAT("the answer ", "is", " ", 42)
returns the answer is 42
Return a sub-string of string, starting from the character at position start
and continuing either until the end of the string or for length characters, whichever is shorter.
If length is omitted, then the portion of the string starting at position start and ending at the end of the string is returned.
Examples:
@SUBSTR("abcdef", 1)
returns abcdef
@SUBSTR("abcdef", 3)
returns cdef
@SUBSTR("abcdef", 3, 2)
returns cd
@SUBSTR("abcdef", 3, 64)
returns cdef
Returns the length of its argument in bytes.
Examples:
@STRLEN("foo")
returns 3
@STRLEN(@CONCAT("ab", "cd"))
returns 4
@STRLEN(1000000)
returns 7 (the number 1000000
is treated as a string)
All date functions operate with dates in yyyyMMdd format
Returns the current (actual) date in the timezone of the Exivity server.
Adds a specified number of days to the given date, returning the result as a yyyyMMdd
date.
Invalid dates are normalised, where possible (see example below).
Examples:
@DATEADD(20180101, 31)
returns 20180201
@DATEADD(20180101, 1)
returns 20180102
@DATEADD(20171232, 1)
returns 20180102
(the invalid date 20171232
is normalised to 20180101
)
@DATEADD(20180101, 365)
returns 20190101
Returns the difference in days between two yyyyMMdd dates. A positive result means that date1 is later than date2. A negative result means that date2 is later than date1. A result of 0 means that the two dates are the same.
Invalid dates are normalised, when possible (see example below):
Examples:
@DATEDIFF(20190101, 20180101)
returns 365
@DATEDIFF(20180201, 20180101)
returns 31
@DATEDIFF(20180102, 20180101)
returns 1
@DATEDIFF(20180101, 20180102)
returns -1
@DATEDIFF(20180101, 20180101)
returns 0
@DATEDIFF(20171232, 20180101)
returns 0
(the invalid date 20171232
is normalised to 20180101
)
Transcript-specific functions may be preceded with an exclamation mark in order to negate their output. For example:
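For example, the following expression is true only when the named file does not exist (the path is illustrative):

!@FILE_EXISTS("exported/summary.csv")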
Returns 1 if the file filename exists, else returns 0.
The FILE_EXISTS function will only check for the presence of files within the directories system
or exported
(as well as any sub-directories they contain) in the Exivity home directory.
In strict mode, this function returns 1 if the file filename exists and is empty. If the file does not exist, then this is considered an error.
In permissive mode, a non-existent file is considered equivalent to an existing empty file.
In either case, if the file exists and is not empty, the function returns 0.
Returns 1 if the specified DSET exists, else 0
In strict mode (option mode = strict
), this function returns 1 if the specified DSET exists and is empty. If the DSET does not exist, then this is considered an error.
In permissive mode (option mode = permissive
), a non-existent DSET is considered equivalent to an existing empty DSET.
In either case, if the DSET exists and is not empty, the function returns 0.
This function returns 1 if the specified column exists, else 0. The column name may be fully-qualified, but if it is not, then it is assumed to be in the default DSET.
The lowercase
statement is used to modify the name of, and/or the values in, a column such that any upper case characters are replaced with their lower case equivalent.
lowercase heading|values [and heading|values] in column
ColName1 [... ColNameN]
Although the syntax shown uses the keyword column
, either column
or columns
may be specified. Both work in an identical manner.
The heading
and values
keywords determine the scope of the processing to be applied. The columns named in ColName1 ... ColNameN may or may not be fully qualified. If a column name is not fully qualified then the default DSET will be assumed.
Only upper case characters in a column name or value are converted. Numbers, punctuation and other symbols are ignored. Any blank values in a column are ignored.
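For example, the following converts both the heading and the values of two columns to lower case (the column names are illustrative):

lowercase heading and values in columns Hostname ResourceGroup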
The rename
statement is used to change the name of an existing column in a DSET, or to change the source and/or alias of a DSET
rename column
OldName
to
NewName
rename dset
OldSource.OldAlias
to
NewSource.NewAlias
When renaming a column, OldName may be a column name. If it is not fully qualified then the column OldName in the DSET will be renamed.
The NewName argument must not be fully qualified.
Any dots (.
) in the new column name will automatically be replaced by underscores (_
), as a dot is a reserved character used to implement DSET namespaces.
When renaming a DSET, NewSource.NewAlias must contain exactly one dot and must not be the name of an existing DSET. The rename takes immediate effect, thus any subsequent reference to the renamed DSET in the transcript task must use its new name.
Any pending definitions referencing the renamed DSET will automatically be updated with the new name.
Renaming columns:
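For example (the column names are illustrative):

rename column UsageQty to Quantity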
Renaming a DSET:
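For example (the new DSET name is illustrative):

rename dset Azure.usage to Azure.archive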
The normalise
statement is used to update the values in a numerical column such that they are all positive, negative or inverted.
In this documentation the spelling normalise
is used but normalize
may also be used. The functionality is identical in either case.
normalise column
colName
as positive
normalise column
colName
as negative
normalise column
colName
as invert
normalise column
colName
as standard
The normalise
statement processes each value in the column called colName and applies the following logic based on the last argument shown above as follows:
In order to be considered a number, a value in the colName column must start with any of the characters +
, -
, .
or 0 to 9
and may contain a single .
character which is interpreted as a decimal point.
If a value in colName is non-numeric or blank it is left intact
When using standard
all non-blank values are assumed to be numeric, and as such any non-numeric values will be changed to a numeric zero.
Additionally:
Any numerical value in colName which starts with a +
, .
or decimal character is considered positive
Any numerical value in colName which starts with a -
character is considered negative
When using standard
the resulting conventional number will be accurate up to 14 decimal places
The replace
statement is used to search for, remove, and optionally replace with new values, substrings of the values in a column.
replace
substring
in
colName
[with
replacement]
The replace
statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:
The colName argument may or may not be fully qualified, but must reference an existing column.
If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.
If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.
The replace
statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information
Given the following sample data:
The following script ...
... will produce the following output data.
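A sketch of typical usage (the column names and substrings are illustrative):

replace "/subscriptions/" in ResourceId with "sub:"
replace "Microsoft.Compute/" in ResourceType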
The round
statement is used to ensure that numeric values in a column are whole multiples of a specified number.
round
colName [direction]
[to nearest
value]
The round
statement is used to round numbers to the nearest multiple of any integer or floating point number.
The parameters supported by round
are as follows:
The simplest form of the statement is round colName
. This will use the defaults shown in the table above such that values are rounded to the next highest integer. Alternatively, the statement round colName down
will round down to the nearest integer.
If the value argument is provided then the numbers in colName will be rounded up or down to the nearest multiple of value. For example the statement ...
round Quantity up to nearest 5
... will round the numbers in the column Quantity up to the next multiple of 5. Thus any number in Quantity higher than 10 and less than 15 will be rounded to 15.
When specifying the value argument, floating point values are supported.
The round
statement observes the following rules:
Non-numeric values in colName are ignored
Blank values, or a value of 0
in colName are ignored
Numbers in colName that are already a whole multiple of value will not be modified
The option
statement is used to set global parameters during the execution of a Transcript task.
option
option = setting
option noquote
The option
statement can be used multiple times within the same task script and always takes immediate effect. It is therefore possible (for example) to import a CSV file delimited with commas and quoted with single quotes, change the options and then export it with semicolon delimiters and double quotes.
The supported options are as follows:
When using options, there must be whitespace on each side of the =
sign
option continue = yes|enabled
option continue = no|disabled
When executing a task file repeatedly against each day in a date range, by default Transcript will abort the whole run if a task failure occurs. In cases where this is undesirable, setting the continue option to enabled or yes (both work in exactly the same way) will change the behaviour such that if a task failure occurs then execution will resume with the next day in the range.
When specifying a quote or tab as the separator it is necessary to escape it in order to prevent Transcript from interpreting it as a meaningful character during the parsing of the task script. For example:
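For instance (a sketch; the backslash escaping shown here follows the note on escaping the quote character in the options table below):

option separator = \t
option separator = \"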
Transcript supports two modes of execution for tasks:
In strict
mode, if an error is encountered at any point in the task, the error will be logged and execution will terminate
In permissive
mode, many errors that would otherwise have caused the task to fail will be logged, the statement that caused the error will be skipped and execution will continue from the next statement in the task.
The mode option can be used multiple times and changed at any point during task execution. This means that specific sections of the task can be more error tolerant.
Errors that can be handled in permissive
mode are mainly syntax errors or those involving invalid parameters to Transcript statements. There are error conditions that can arise during the execution of a statement which will cause the task to fail even in permissive
mode.
Transcript generates a considerable amount of logging information during the execution of a task. The loglevel
option can be used to increase or decrease the level of detail written to the logfile. All logging levels must be specified in UPPER CASE. The following levels can be set:
The order of the logging levels in the table above is significant, in that for any given level, all levels above it in the table are also in effect. Therefore a logging level of WARN
will result in log entries for ERROR
, FATAL
, and INTERNAL
level events, as well as warnings.
The loglevel option can appear multiple times within a transcript task and will take immediate effect whenever it is used. This means that within a task, the loglevel can be increased for certain statements and reduced for others.
Regardless of the logging level, some events will always create a logfile entry, for example the success or failure of a transcript task at the end of its execution.
In order to minimise the effect on performance when logging, Transcript opens the logfile when it is first run and then holds it open until the completion of the task being executed. The logfile can be accessed in one of two modes:
The default is SAFE
. It is not recommended that this be changed.
The following Transcript task will import a CSV file quoted with double quotes and delimited with commas and then export a copy with semicolons as delimiters and quoted with single quotes:
It also increases the logging level for the import statement
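A sketch of such a task (the source name, file names and the DEBUG/INFO level names are illustrative assumptions):

option loglevel = DEBUG
import mydata from MySource
option loglevel = INFO
option delimiter = ;
option quote = '
export MySource.mydata as mydata_semicolons.csv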
The normalise
statement ignores the setting, as its sole purpose is to modify existing values.
The round
statement may be used in the body of a statement to perform conditional rounding.
When combining the continue option with option mode = permissive
it is possible to process a range of dates for which usage or other data is not available, because the mode option will prevent a failed statement from being treated as a fatal error.
Name | Type/possible values | Description |
skip | numeric | Number of leading non-blank rows to skip |
omit | numeric | Number of trailing non-blank rows to omit |
heading_dupe_check header_dupe_check | enabled / on / true / yes | Skip rows that are a duplicate of the header row |
select include | column list | A space-separated list of column names to include in the import |
ignore exclude | column list | A space-separated list of column names to ignore when importing |
filter | expression | If specified, only rows that match the expression will be imported |
escaped escape | enabled / on / true / yes | Treat backslash (\) characters in the data in the CSV to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally. |
pattern | enabled / on / true / yes | Permits the filename to be specified as a regular expression |
Precedence | Operator | Meaning |
1 | ! | Unary negation |
2 | * | Multiplication |
2 | / | Division |
2 | % | Modulo |
3 | + | Addition |
3 | - | Subtraction |
4 | < | Less than |
4 | <= | Less than or equal to |
4 | > | Greater than |
4 | >= | Greater than or equal to |
5 | == | Is equal to |
5 | != | Is not equal to |
5 | =~ | Matches regular expression |
5 | !~ | Does not match regular expression |
6 | && | Boolean AND |
7 | \|\| | Boolean OR |
Argument | Result |
positive | All negative numbers are replaced with their positive equivalent. Non-negative numbers are left unmodified. |
negative | All positive numbers are replaced with their negative equivalent. Negative numbers are left unmodified. |
invert | All positive numbers are replaced with their negative equivalent, and all negative numbers are replaced with their positive equivalent |
standard | All non-blank values are assumed to be a decimal number and are replaced with that value in conventional notation. This functionality is intended to provide a means to convert numbers in scientific notation (for example 1.5e+5) into conventional notation. |
Parameter | Default | Purpose |
colName | n/a | The name of the column containing the values to round |
direction | up | Whether to round to the next highest (up) or next lowest (down) multiple of value |
value | 1 | A value determining the granularity of the rounding |
Level | Details |
INTERNAL | Self-diagnostic messages indicating an internal error detected with Transcript itself |
FATAL | Non-recoverable errors, usually as a result of an Operating System error such as no available memory |
ERROR | A transcript task error (syntax or data related) |
WARN | An unexpected event or a syntax error that can be recovered from |
INFO | Informational messages about actions performed during the execution of a transcript task |
DEBUG | Detailed logs of actions performed during the execution of a transcript task |
| Extended debugging information (may cause very large logfiles and a minor reduction in performance) |
Mode | Details |
SAFE | After every message, the logfile is flushed to disk |
| The logfile is not explicitly flushed to disk until the termination of the task file |
Parameter | Notes |
substring | A string to search the values in colName for |
colName | The column to search for the substring |
replacement | The string to replace occurrences of substring with |
The terminate
statement will exit the transcript task immediately.
terminate
Normally a transformation script will finish execution when an error is encountered or when the end of the script file is reached, whichever comes first.
When the terminate
statement is encountered, the script will finish at that point. No statements after the terminate
statement will be executed.
The set
statement is used to write a specified value into all the cells in any given column, or to copy values from one column to another.
set
ColName
to
Value
set
ColName
as
SrcColName
set
ColName
=
Expression
If the keyword to
is used then column ColName will be updated using the constant value specified in Value.
If the word as
is used then the values in the column SrcColName will be copied into column ColName.
If =
is used then the expression is evaluated and the column values are set to the result. For a complete list of functions that can be used in an expression please refer to the list of functions in the article about the if statement
The ColName argument must identify an existing column. The modified column will be in the default DSET unless it is a fully qualified column name.
If the overwrite
option is disabled, then only cells with blank values will be updated with the new value.
Set the column called Flag in the default DSET to a value of 0
:
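Based on the syntax above, this could be written as:

set Flag to 0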
Set the column called Flag to be the value in the column status:
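For example:

set Flag as status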
Set the column called DiskTier to the 4th character in the existing value of that column
set DiskTier = @SUBSTR([DiskTier],4,1)
Set the column called Flag in the DSET custom.dataset to a value of Pending
:
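Using a fully qualified column name:

set custom.dataset.Flag to Pending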
Set the column Flag to free
if the rate column contains a value of 0:
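A sketch using a local filter (assuming the where syntax described in the where article):

where ([rate] == 0) {
    set Flag to free
}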
Fill any blank values in the column username in the DSET Azure.usage to Unknown
:
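A sketch (the exact value used to disable the overwrite option is an assumption):

option overwrite = no
set Azure.usage.username to Unknown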
The timecolumns
statement is used to set the start time and end time columns in a DSET
timecolumns
start_time_col end_time_col
timecolumns clear
The usage data stored in a DSET may or may not be time sensitive. By default it is not, and every record is treated as representing usage for the entire day. In many cases however, the usage data contains start and end times for each record which define the exact time period within the day that the record is valid for.
If the usage data contains start and end times that are required for functions such as aggregation or reporting, the column(s) containing those times need to be marked such that they can be identified further on in the processing pipeline. This marking is done using the timecolumns
statement.
The timecolumns
statement does not perform any validation of the values in either of the columns it is flagging. This is by design, as it may be that the values in the columns will be updated by subsequent statements.
The values in the columns will be validated by the finish statement.
If the timecolumns
statement is executed more than once, then only the columns named by the latest execution of the statement will be flagged. It is not possible to have more than one start time and one end time column.
Both the start_time_col and end_time_col parameters may be fully qualified column names, but they must both belong to the same DSET.
It is possible to use the same column as both the start and end times. In such cases the usage record is treated as spanning 1 second of time. To do this, simply reference it twice in the statement:
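For example, if a single column named event_time (an illustrative name) holds the only timestamp, it can be marked as both the start and end time column:

timecolumns event_time event_time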
To clear both the start and end time columns, thus restoring the default DSET to treating each record as spanning the entire day, the statement timecolumns clear
may be used.
Currently the statement timecolumns clear
will only clear the timestamp columns in the default DSET
This can be useful in the following use case:
The DSET is loaded and timestamp columns are created
finish is used to create a time-sensitive RDF
The timestamp columns are cleared
The DSET is renamed using the rename dset statement
Further processing is done on the DSET as required
finish is used to create a second RDF which is not time-sensitive
Option | Default | Notes |
continue | disabled | Determines whether to proceed to the next day in a date range or bail out if an error occurs on any given day when processing a range of dates - see notes below. |
delimiter separator | , | Specifies the delimiter character between each field in a CSV file. This character will be used when importing and exporting Datasets. The default value is a comma. |
services | | May be set to overwrite to permit the service statement to update existing service definitions in the global database. |
mode | | May be set to strict or permissive - see notes below. |
quote | " | Specifies the quote character used for quoting fields in a CSV file. This character is taken into consideration when importing and exporting Datasets. When importing a Dataset, any fields that begin and end with this character will have it removed during the import process. When exporting a Dataset, all non-blank fields will be quoted using this character. Note that when specifying a literal quote character, it must be escaped: \" |
noquote | n/a | Specifies that no quoting is to be performed either when importing or exporting Datasets. A subsequent option quote statement will re-enable quoting. |
loglevel | | Sets the logging level - see notes below |
logmode | SAFE | Sets the logging mode - see notes below |
overwrite | | Determines whether statements such as set and split overwrite existing non-blank values, or only update blank cells. |
embed | | When enabled, values enclosed in curly brackets - { and } - (including newlines) are imported as a single field - see the import article. |
merge_blank | (none) |
merge_nomatch | (none) |
The split
statement is used to create and/or update columns by splitting the textual value of an existing column into multiple parts.
split
ColName
using
sep
split
ColName
using
sep
retaining first
[column_count]
split
ColName
using
sep
retaining last
[column_count]
split
ColName
using
sep
retaining
first_column_index
[to
last_column_index]
The keyword following ColName may be using
, separator
or delimiter
. All three work in exactly the same way.
The ColName argument is the name of an existing column whose values are to be split and the sep argument (which must only be a single character in length) is the delimiter by which to split them.
For example a value one:two:three:four
, if divided by a sep character of :
, would result in the fields one
, two
, three
and four
.
To specify a sep value of a space, use " "
or \<space>
(where <space>
is a literal space character)
New columns are created if necessary to contain the values resulting from the split. These additional columns are named ColName_split1
through ColName_splitN
where N is the number of values resulting from the process.
If there is an existing column with a name conflicting with any of the columns that split
creates then:
If the overwrite option is set then all values in that pre-existing column will be overwritten
If the overwrite option is not set:
If there are blank values in the existing column they will be updated
If there are no blank values in the existing column then no changes will be made to it
If the retaining
keyword is specified, split
will discard one or more of the resulting columns automatically based on the specification that follows the keyword. The following specifications are supported:
In the table above, N refers to a result column's number where the first column created by split
has a number of 1
Given an input dataset of the form:
The statement ...
split ID using :
... will result in the dataset:
Using the same original dataset, the statement ...
split ID using : retaining 3 to 5
... will result in the dataset:
This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity
The service
statement is used to create or modify a single service definition during the execution of a Transcript task. The service definition is associated with the data in an existing DSET, and the global database is updated if and when the Transcript task successfully completes.
service
{ param1 = value [... paramN = value] }
Example:
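A sketch of a service definition (the key, description, category, usage column and rates are illustrative values):

service {
    key = "SMALL_VM"
    description = "Small virtual machine"
    category = "Virtual Machines"
    usage_col = Azure.usage.quantity
    interval = monthly
    unit_label = "Instances"
    rate = 0.12
    cogs = 0.08
    effective_date = 20180101
}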
Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space
The service
statement is used to create a new service definition. Once created, a service definition automatically becomes available to the reporting engine.
The service
statement creates a new service using the following parameters:
The key parameter must be distinct from that in any other service unless it is used to identify an existing service definition to be overwritten with new values. By default, an attempt to create a service with a key that is a duplicate to an existing service in the global database will result in a warning in the logfile and no further action will be taken.
If the key matches that of a service that has been defined previously in the task file then one of two things can happen depending on the value of the current execution mode:
If the mode is set to strict
then an error is logged and the task will fail
If the mode is set to permissive
then a warning is logged and the newest service definition is ignored
To override the default protection against overwriting existing service definitions in the global database, the statement option services = overwrite should be invoked prior to the service
statement.
The value of the key parameter may be up to 127 characters in length. Longer names will be truncated (and as a result may no longer be unique).
The description parameter is freely definable and does not have to be unique. When a report is generated, the description of a service will be used on that report so care should be taken to make the description meaningful.
By default, if description is not specified in a service definition then a copy of the key will be used as the description.
The value of the description parameter may be up to 255 characters in length. Longer descriptions will be truncated.
Either category or group may be used. The two terms are interchangeable in this context.
The category parameter is used to logically associate a service with other services. All services sharing the same category will be grouped together on reports. Any number of different categories may exist and if the category specified does not exist then it will be automatically created.
If no category is specified then the service will be placed into a category called Default
.
The value of the category parameter may be up to 63 characters in length. Longer values will be truncated.
In order to calculate a price for a service, the number of units of that service that were consumed needs to be known. The value of the usage_col parameter specifies the column in the usage data which contains this figure.
The usage_col argument is also used to derive the DSET associated with the service as follows:
If the value of usage_col is a fully qualified column name then the service will be associated with the DSET identified by that name
If the value of usage_col is not fully qualified then the service will be associated with the default DSET
If the column specified by the usage_col argument is determined not to exist using the above checks, then one of two things can happen depending on the value of the current execution mode:
If the mode is set to strict
then an error is logged and the task will fail
If the mode is set to permissive
then a warning is logged and the service definition is ignored
The value of the usage_col parameter may be up to 255 characters in length. Longer values will be truncated.
Services may be charged in different ways. For example some services will invoke a charge whenever they are used whereas others are charged by a time interval, such as per month.
The value of the interval parameter determines how the service should be charged as per the following table:
Specifies whether or not to apply proration when calculating the charge for a monthly service.
Currently, only monthly services can be prorated, based on the number of days in the month the service was used.
The unit_label is the label used for units of consumption on reports. For example storage-related services may be measured in Gb
or Tb
, Virtual Machines may be measured in Instances
and software usage may be measured in licenses
.
The specified unit_label value may be up to 63 characters in length. Longer values will be truncated.
If the unit_label parameter is not specified then a default value of Units
will be used.
The account_id parameter is intended for internal use only and should not be used
The optional account_id references an entry in the account table in the global database. If specified, the service will be created with a rate revision specific to the account id.
The default rate for the service (which applies to all combinations of report column values not explicitly associated with their own rate) can be defined in any of the following ways:
By omitting the account_id parameter altogether
By specifying an account_id of 0
By specifying an account_id of *
The rate parameter determines the cost per unit of consumption to use when calculating the charge. A rate of 0.0
may be used if there is no charge associated with the service.
This may be used in conjunction with a fixed_price and one of cogs or fixed_cogs.
Either or both of rate or fixed_price may be specified, but at least one of them is required.
The fixed_price is a charge applied to the service per charge interval regardless of the units of consumption.
This may be used in conjunction with a rate and one of cogs or fixed_cogs
Either or both of rate or fixed_price may be specified, but at least one of them is required.
The optional COGS charge is the price to the provider of the service. Usually, COGS-related charges are not included on reports; special permissions are required to see the COGS charges.
For services with a defined cogs value it is possible to generate Profit and Loss reports, the profit/loss being the total charge calculated from the rate and/or fixed_rate values minus the price calculated from the cogs or fixed_cogs values.
The optional fixed_cogs value is a fixed price to be factored into COGS-related calculations regardless of the number of units consumed in the charging interval.
The optional min_commit parameter specifies a minimum number of units of consumption to include when calculating the charge associated with a service. In cases where the actual consumption is greater than the min_commit value this will have no effect, but where the actual consumption is less than the minimum commit the price will be calculated as if the min_commit units had been consumed.
Once all parameters have been evaluated by the service
statement the resulting service definition is added to a cache in memory. This cache is committed to the global database at the successful conclusion of the Transcript task (or discarded in the case of error).
Once a service definition has been added to the cache, no subsequent service definitions with the same key may be added to the cache. In the event of conflict the first definition written to the cache is preserved and the attempt to add a duplicate will result in a warning or an error depending on the value of the current execution mode as described in the key section above.
As described in Services in Exivity a service may have multiple rate revisions associated with it. The service
statement can only create a single rate revision per service per execution of a Transcript task.
A rate revision consists of the rate, fixed_rate, cogs, fixed_cogs and effective_date parameters. Each revision must have a different effective_date which indicates the date from which that service revision is to be used. A rate definition remains in force for all dates on or after the effective_date, or until such time as a rate revision with a later effective_date is defined (at which point that revision comes into effect).
To create multiple revisions, Transcript must be run multiple times using a service
statement that has the same key and uses the same rate, fixed_rate, cogs and fixed_cogs values but has a different effective_date each time. For each of the effective_date parameters a new rate revision will be created for the service.
At the successful conclusion of a Transcript task the global database is updated from the memory cache.
This article assumes a knowledge of services, their rates and related concepts as documented in Services in Exivity and in the article on the service statement
The services
statement is used to create or modify multiple services based on the data in a daily RDF.
services
{ param1 = value [ ... paramN = value] }
Example:
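A sketch using automatic services (the column names are illustrative and follow the examples later in this article):

services {
    usages_col = service_name
    service_type = AUTOMATIC
    consumption_col = quantity
    instance_col = vmid
    description_col = description
    category_col = category
    interval = daily
    unit_label_col = label
    rate_col = unit_rate
}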
Parameters may be specified in any order. The '=' between parameters and their values is optional and may be omitted, but if present it must be surrounded by white-space
The services
statement is used to create or modify multiple services from the data in a daily RDF. The parameters supplied to the statement map columns in the usage data to attributes of the created services.
For many of the parameters to the services
statement there are two ways of using a column name:
The values in the column are extracted from the usage data and those values are embedded as literals into the service definition.
The column name itself is used in the service definition such that the reporting engine dynamically determines the values to use for any given day when generating report data
When creating or updating services using the first method, Transcript will create a new rate revision every time the rate information changes. For data sources such as Microsoft Azure where the rates can change daily this will result in a lot of rate revisions in the global database.
Using the second method requires only a single rate revision which identifies by name the column(s) containing rate and/or COGS information. When a report is run, the charge engine then obtains the correct rate information for any given day from the data in those named columns.
The parameters supported by the services
statement are summarised in the following table. The Type
column in the table indicates the way the column name is used as described in How column names are used above. Additional information about each parameter can be found below the summary table itself.
The usages_col
parameter is the name of a column containing service keys. A service will be created for each distinct value in this column, and these values will be used as the service keys.
In order to calculate the charges associated with a service it is necessary to know the number of units of that service that were consumed. Exivity supports two methods of retrieving the units of consumption from usage data and the service_type determines which of these is applied.
Any given service may use one or other of these methods, which are as follows:
Manual services: service_type = MANUAL
Automatic services: service_type = AUTOMATIC
Manual services
Manual services require that the units of consumption for each service named in the usages_col
column are stored in separate columns whose names correlate to the service keys themselves.
To illustrate this, consider the following fragment of usage data:
In this case, the service_name
column contains a list of service keys and for each of those service keys there is a corresponding column containing the units of consumption for that service. Thus in the above example we can see that there are two services, "Small VM" and "Large VM" and that the units of consumption for each of these services are in the columns of the same name.
The more manual services that are represented in the data, the more columns are required.
Automatic services
Automatic services require that the units of consumption for each service named in the usages_col
column are stored in the column named by the consumption_col
parameter. To represent the same information as that shown in the example above, the following would be used:
It can be seen that any number of automatic services, along with their consumption figures, can be represented using only two columns of data.
The consumption_col
parameter is only required when creating automatic services and determines the column containing the units of consumption for each service as described above.
It is not enough to know only the units of consumption for each service, as this merely provides the total consumption for each service across the entire usage. In the examples above, for example, the "Large VM" service has 10 units of consumption but using that information alone there is no way to know if this represents one instance of a VM used 10 times, 10 instances of VMs used once each, or something in between.
The instance_col
parameter is therefore required to tell the difference. Typically this will be a unique identifier which groups the units of consumption into 'buckets'. In the case of a VM this may be a VM ID which remains constant throughout the life of a VM in the cloud.
To illustrate this, we can supplement the example usage fragment used previously with additional information to use as the instance_col
as follows:
By specifying instance_col = vmid
we can now see that the usage represents:
5 instances of a single Small VM with an ID of 444
6 instances of a Large VM with an ID of 555
4 instances of a Large VM with an ID of 666
If specified, the description_col
denotes a column containing a friendly description for reports to identify the service with.
Typically in cloud usage data, services are identified using unique IDs (referred to as keys in Exivity) which are often non-meaningful to human eyes, so Exivity supports a 'friendly' description for each service for display purposes when generating a report.
For example description_col = description
may be used in conjunction with the following data to map the service_id
to a friendly name:
It is not mandatory to provide a description_col
parameter, but if one is not supplied then the description will be set to a duplicate of the service key (as derived via the usages_col
parameter).
In the example above, it can be seen that there are multiple rows in the data for the same service key (vmid). When using description_col
, the first row for each distinct value in the usages_col
will be used to set the description.
By default Exivity will group services on reports according to their category. Using categories ensures that charges for services that fall within the same category will appear in a contiguous block.
The category
parameter specifies the category to which all services created will be assigned, thus specifying category = "Virtual Machines"
might be appropriate for the example data used so far in this article.
If no category is specified, the services created will be assigned to a category called Default.
Usage data normally contains information about a range of services of different types such as Virtual Machines, Storage, Networking and so on. By referencing a column in the usage data which identifies the correct category for each service, multiple categories will be created and each service assigned to the correct category by the services
statement.
To illustrate this, let us extend the sample data as follows:
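A hedged illustration of the extended data (the additional Storage row and its values are assumptions, included to show more than one category):

```
service_id,description,units,vmid,category
SRV-S1,Small VM,5,444,Virtual Machines
SRV-L1,Large VM,6,555,Virtual Machines
SRV-L1,Large VM,4,666,Virtual Machines
SRV-D1,Premium Storage,200,disk01,Storage
```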
By specifying category_col = category
each service will now be associated with the correct category.
The interval
parameter is used to specify a literal interval for all the services created by the services
statement.
The interval
parameter may be any of:
individually
daily
monthly
If the interval
parameter is not specified, then a default interval of monthly will be used.
In the event that different services in the usages_col
require different charge intervals, the name of a column containing the interval to use for each service may be specified using the interval_col
parameter as follows:
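A hedged illustration of such data (values are assumptions):

```
service_id,description,units,interval
SRV-S1,Small VM,5,monthly
SRV-L1,Large VM,10,monthly
SRV-D1,Premium Storage,200,daily
```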
By specifying interval_col = interval
each service in the above usage data will be assigned the correct charge interval.
The model
parameter is used to enable proration for monthly services. Either of unprorated or prorated may be specified.
If no model is specified, then a value of unprorated will be used by default.
In the event that different services in the usages_col
require different proration settings, the model_col
parameter can be used to specify which column contains the proration setting for each service.
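A hedged data fragment containing such a column might look like this (values are assumptions):

```
service_id,description,units,model
SRV-S1,Small VM,5,prorated
SRV-L1,Large VM,10,prorated
SRV-D1,Premium Storage,200,unprorated
```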
By specifying model_col = model
, each service in the above usage data will be assigned the correct proration model.
The unit_label
parameter is used by reports to provide a meaningful description of the units of consumption associated with a service. A virtual machine may have a unit label of Virtual Machines
, but storage-related services may have a unit label of Gb
for example.
If the unit_label
parameter is not specified then a default label of Units
will be used.
The unit label may be up to 63 characters in length. Longer values will be truncated.
In cases where the services contained in the usages_col
column collectively require more than one unit label, the unit_label_col
parameter can be used to identify a column in the usage data which contains an appropriate label for each service.
For example unit_label_col = label
can be used to associate an appropriate label using the data below:
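A hedged illustration of such data (values are assumptions):

```
service_id,description,units,label
SRV-S1,Small VM,5,Virtual Machines
SRV-L1,Large VM,10,Virtual Machines
SRV-D1,Premium Storage,200,Gb
```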
The parameters rate_col
, set_rate_using
, fixed_price_col
, set_fixed_price_using
, cogs_col
, set_cogs_using
, fixed_cogs_col
and set_fixed_cogs_using
(all of which are detailed below) collectively determine the types of charge that will be associated with the service definitions created by the services
statement.
The rate_col
parameter is used to determine the column in the usage data which contains the unit rates for the service definitions created by the services
statement.
As each service definition is created, an initial rate revision is also created which contains the column named by the rate_col
parameter. When a report is run, for each day in the reporting range the unit rate for that day will be determined by whatever value is in the column named by the rate_col
parameter in the usage data.
This means that only a single rate revision is required, even if the actual value in the rate_col
column is different from day to day.
The set_rate_using
parameter is also used to determine the unit rate for each service. This differs from the rate_col
parameter in that the values in the column named by set_rate_using
are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.
This means that the unit cost is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date
for more details).
Either of rate_col
or set_rate_using
(but not both) may be used in a single services
statement
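To make the distinction concrete, the following sketch contrasts the two approaches. It assumes the braced block form of the services statement, and the column name unit_rate is an illustrative assumption:

```
# Sketch: rate re-read from the usage data for each reported day
services {
    usages_col = service_id
    rate_col   = unit_rate      # daily rate taken from this column at report time
}

# Sketch: rate captured once into the initial rate revision
services {
    usages_col     = service_id
    set_rate_using = unit_rate  # literal value copied into the revision; applies until a new revision
}
```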
The fixed_price_col
parameter is used to determine the column in the usage data which contains the fixed price associated with the service definitions created by the services
statement.
As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_price_col
parameter. When a report is run, for each day in the reporting range the fixed price for that day will be determined by whatever value is in the column named by the fixed_price_col
parameter in the usage data.
If a monthly service has different fixed prices for different days in the month, then whichever results in the highest charge will be used.
This means that only a single rate revision is required, even if the actual value in the fixed_price_col
column is different from day to day.
The set_fixed_price_using
parameter is also used to determine the fixed price for each service. This differs from the fixed_price_col
parameter in that the values in the column named by set_fixed_price_using
are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.
This means that the fixed price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date
for more details).
Either of fixed_price_col
or set_fixed_price_using
(but not both) may be used in a single services
statement
The cogs_col
parameter is used to determine the column in the usage data which contains the COGS rate associated with the service definitions created by the services
statement.
As each service definition is created, an initial rate revision is also created which contains the column named by the cogs_col
parameter. When a report is run, for each day in the reporting range the COGS rate for that day will be determined by whatever value is in the column named by the cogs_col
parameter in the usage data.
If a monthly service has different COGS rates for different days in the month, then whichever results in the highest charge will be used.
This means that only a single rate revision is required, even if the actual value in the cogs_col
column is different from day to day.
The set_cogs_using
parameter is also used to determine the COGS rate for each service. This differs from the cogs_col
parameter in that the values in the column named by set_cogs_using
are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.
This means that the COGS rate is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date
for more details).
Either of cogs_col
or set_cogs_using
(but not both) may be used in a single services
statement
The fixed_cogs_col
parameter is used to determine the column in the usage data which contains the fixed COGS price associated with the service definitions created by the services
statement.
As each service definition is created, an initial rate revision is also created which contains the column named by the fixed_cogs_col
parameter. When a report is run, for each day in the reporting range the fixed COGS price for that day will be determined by whatever value is in the column named by the fixed_cogs_col
parameter in the usage data.
If a monthly service has different fixed COGS prices for different days in the month, then whichever results in the highest charge will be used.
This means that only a single rate revision is required, even if the actual value in the fixed_cogs_col
column is different from day to day.
The set_fixed_cogs_using
parameter is also used to determine the fixed COGS price for each service. This differs from the fixed_cogs_col
parameter in that the values in the column named by set_fixed_cogs_using
are consulted when the service is created, and the literal values in that column are used to populate the initial rate revision.
This means that the fixed COGS price is hard-coded into the rate revision and will apply indefinitely, or until such time as a new rate revision takes effect (see effective_date
for more details).
Either of fixed_cogs_col
or set_fixed_cogs_using
(but not both) may be used in a single services
statement
The set_min_commit_using
parameter is used to set the minimum commit value in the initial rate revision for each service.
The values in the column identified by set_min_commit_using
are extracted from the usage data and used as numeric literals in the revision.
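For example, assuming an illustrative column named min_units in the usage data, the minimum commit could be picked up as follows (a partial sketch showing only the relevant parameters):

```
services {
    usages_col           = service_id
    set_min_commit_using = min_units   # numeric minimum commit copied into the initial rate revision
}
```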
When creating the initial rate revision for a service, the value specified by the effective_date
parameter is interpreted as a yyyyMMdd
value to determine the date from which the revision should be applied.
If the effective_date
parameter is omitted then the current data date will be used by default.
When using effective_date
, the value will be used to set the initial rate revision date for all the service definitions created by the services
statement. If different services require different effective dates then the effective_date_col
parameter may be used to determine the effective date for each service from a column in the usage data.
If there is a column in the usage data containing yyyyMMdd values representing the desired effective date for the initial revision of each service, the effective_date_col
parameter may be used to extract the values from this column and set the effective date for each service accordingly.
Either of effective_date
or effective_date_col
may be specified in a single services
statement, but not both
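Bringing these parameters together, a hedged sketch of a complete services statement for data like the fragments above might look as follows (the braced block form, column names and values are illustrative assumptions):

```
services {
    usages_col      = service_id       # column containing the service key
    description_col = description      # friendly name shown on reports
    category_col    = category         # per-service report category
    instance_col    = vmid             # groups consumption into per-instance buckets
    consumption_col = units            # units of consumption (automatic services)
    interval        = monthly          # charge interval for all services created
    model           = unprorated       # proration model for all services created
    unit_label_col  = label            # per-service unit label
    set_rate_using  = unit_rate        # rate captured into the initial rate revision
    effective_date  = 20230101         # yyyyMMdd date from which the revision applies
}
```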
The default value to use if a source column is blank
The default value to use if no match is found for a regular expression in a source column value
| Specification | Result |
| --- | --- |
| first or 1 | Discard all but the first column |
| first N or 1 to N | Discard all but the first N columns |
| last | Discard all but the last (rightmost) column |
| last N | Discard all but the last N columns |
| N | Discard all but the Nth column |
| N to M | Discard all but the columns from the Nth to the Mth inclusive |
| Parameter | Required | Notes |
| --- | --- | --- |
| key | Yes | A unique identifier for the service |
| description | No | The name of the service as it will appear on a report |
| category or group | No | Services are grouped by category on a report |
| usage_col | Yes | The name of the column containing the number of units consumed |
| interval | No | The charging interval for the service |
| model | No | Whether the service charges should be prorated or not |
| unit_label | No | The label for the units of consumption; eg: GB for storage |
| account_id | No | Which account to associate with the rate revision |
| rate | No | The price per unit of consumption |
| fixed_price | No | The fixed price charged whenever there is non-zero usage in the charge interval, regardless of the amount consumed |
| cogs | No | The COGS price per unit of consumption |
| fixed_cogs | No | The fixed COGS price charged whenever there is non-zero usage in the charge interval, regardless of the amount consumed |
| min_commit | No | The minimum number of units of consumption that will be charged per interval |
| effective_date | No | The date from which the rate revision should be applied |
| Interval value | Meaning |
| --- | --- |
| individually | Every unit of consumption is charged individually. For example, if a network-related service charges by the GB, then for every GB seen in the usage the charge is applied regardless of how many GB are used or over how long a period of time the consumption took place. |
| hourly | The charge is applied once per hour regardless of the number of times within the hour the service was consumed |
| daily | As for hourly, but applied per day |
| monthly | As for daily, but applied per calendar month |
| Model value | Meaning |
| --- | --- |
| unprorated | No proration is to be applied |
| prorated | Proration will be applied |
| Parameter | Type | Meaning |
| --- | --- | --- |
| usages_col | 2 | The name of the column from which units of consumption are derived |
| service_type | n/a | Determines how the usages_col and consumption_col values are used when interrogating the usage data to get the units of consumption |
| consumption_col | 2 | The name of the column containing units of consumption (AUTOMATIC services only - see below for more details) |
| instance_col | 2 | Values in this column are used to distinguish between service instances |
| description_col | 1 | Values in this column determine the service description for each service definition |
| category or group | n/a | A specific category string to use for all service definitions |
| category_col or group_col | 1 | Values in this column determine the service category for each service definition |
| interval | n/a | A specific charging interval to use for all service definitions |
| interval_col | 1 | Values in this column determine the charging interval for each service definition |
| model | n/a | A specific proration setting to use for all service definitions |
| model_col | 1 | Values in this column determine the proration setting for each service definition |
| unit_label | n/a | A specific unit label to use for all service definitions |
| unit_label_col | 1 | Values in this column determine the unit label for each service definition |
| rate_col | 2 | Set the column name from which Edify will determine the rate per unit at report time |
| set_rate_using | 1 | Values in this column determine the rate per unit for each service definition |
| fixed_price_col | 2 | Set the column name from which Edify will determine the fixed price per charging interval for each service definition |
| set_fixed_price_using | 1 | Values in this column determine the fixed price per charging interval for each service definition |
| cogs_col | 2 | Set the column name from which Edify will determine the COGS rate per unit for each service definition |
| set_cogs_using | 1 | Values in this column determine the COGS rate per unit for each service definition |
| fixed_cogs_col | 2 | Set the column name from which Edify will determine the fixed COGS price per charging interval for each service definition |
| set_fixed_cogs_using | 1 | Values in this column determine the fixed COGS price per charging interval for each service definition |
| set_min_commit_using | 1 | Values in this column determine the minimum commit for each service definition |
| effective_date_col | 1 | Values in this column determine the effective date of the rate revision created for each service definition |
| effective_date | n/a | A specific effective date to use in the rate revision created for each service definition |