RDF stands for Reporting Database File. Exivity uses these to store usage data and configuration information.
A daily RDF stores the usage data from which Edify produces report data. Any given RDF contains a single DSET, along with internally managed metadata such as column types and prepared report data.
An RDF is created using the finish statement in a Transcript task. For any given DSET there can be only one RDF per day, although there may be many RDFs per day in total (one for each DSET).
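As a minimal sketch of this flow (assuming the usual form of the finish statement, where a bare finish writes the default DSET):

```
# Import the usage Dataset for the current data date, creating the
# DSET Azure.usage, then write that DSET to an RDF
import usage from Azure
finish
```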
RDFs are named according to the datadate and the ID of the DSET they contain, and have a .rdf extension. For example, a DSET with an ID of Azure.usage for the first day of the month will result in an RDF called 01_Azure.usage.rdf.
An RDF containing usage data is located at <home_dir>/report/<yyyy>/<MM>/<dd>_<dset>.rdf where:
<yyyy> is the year of the datadate
<MM> is the month of the datadate
<dd> is the day of the datadate
<dset> is the ID of the DSET that was used to populate the usage data
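For example, given a datadate of 20160801 and a DSET ID of Azure.usage, the resulting file would be:

```
<home_dir>/report/2016/08/01_Azure.usage.rdf
```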
The global RDF (located at <home_dir>/system/global.rdf) contains system-wide configuration data, including (but not necessarily limited to):
Service definitions
Service rate revisions
Service rate adjustments
Service categories
User accounts
Report definitions (and related metadata such as synchronisation timestamps)
Metadata about RDF files created by Transcript
Usergroups
Security settings
Account information
Job schedules
The global RDF should never be modified manually except by, or under the guidance of, Exivity support staff. It is critical to make a backup copy of this file before any changes are made.
The file p_gp.rdf (located in <basedir>/system/report) is used by the charge engine to store the output of report runs.
Figure: the main program directory as installed by the Exivity installer.
During the execution of a task, or during the generation of a report, a number of files are accessed for both reading and writing. The user home working directory (referred to throughout this documentation as home_dir or base_dir) is the directory relative to which these files are located.
The home directory should preferably be located on a dedicated volume, e.g. D:\exivity\home, and it is recommended that this volume be an SSD.
The following articles provide information regarding the basic concepts around which Exivity is built. These concepts are referenced repeatedly throughout the documentation, so the articles below are recommended reading:
CSVs, Datasets and DSETs - the input files to Transcript and how their contents are accessed by Transcript tasks
Data Date - the date against which data is to be processed by Transcript
Base Working Directory - the directory within which Transcript and Edify operate
Reporting Database Files - the files produced by Transcript and used by Edify to generate reports
When Transcript is executed, one of its required command-line arguments is a date in yyyyMMdd format, termed the data date. The activities performed by Transcript are associated with this date in several ways, most notably:
to determine which specific Dataset file to import from \collected when using automatic source tagging
to determine the concept of 'today' when processing files containing data with timestamps spanning multiple days
to determine the output directory into which files will be generated
to generate the filenames of the RDFs produced
The data date is made available to a Transcript task through the automatic creation of the ${dataDate} variable.
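As an illustrative sketch (the file name here is hypothetical, and it is assumed that the variable expands wherever a textual argument is accepted):

```
# ${dataDate} holds the date Transcript was invoked with, e.g. 20160801,
# so this imports a file such as my_custom_data\costs_20160801.csv
import my_custom_data\costs_${dataDate}.csv source costs
```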
CSV files are a way of storing data in a table format. A table consists of one or more rows, each row containing one or more columns. There is no formal specification for CSV files, although a proposed standard can be found at https://www.ietf.org/rfc/rfc4180.txt.
Although the 'C' in 'CSV' stands for the word comma, other characters may be used as the separator; TAB and the semicolon are common alternatives. Exivity can import CSVs using any separator character apart from a dot or an ASCII NUL, and uses the comma by default.
A Dataset is a CSV file, usually produced by USE, which can be imported for processing by a Transcript task. To qualify as a Dataset, a CSV file must meet the following requirements:
The first line in the file must define the column names
Every row in the file must contain the same number of fields
A field with no value is represented as a single separator character
The separator character must not be a dot (.)
There may be no ASCII NUL characters in the file (a NUL character has an ASCII value of 0)
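A minimal example of a valid Dataset (with hypothetical data) is shown below. The first line defines the column names, every row contains three fields, and the empty owner field in the last row is written as nothing following the final separator:

```
hostname,quantity,owner
vm01,4,alice
vm02,8,
```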
When a Dataset is imported by a Transcript task, any dots (.) contained in a column name will be replaced with underscores (_). This is because the dot is a reserved character used by Fully Qualified Column Names.
When a DSET is exported during execution of a Transcript task, the exported CSV file will always be a Dataset, in that it can be imported by another Transcript task.
Although Datasets are generated by USE as part of the extraction phase, additional information may be provided in CSV format for the purposes of enriching the extracted data. This additional CSV data must conform to the requirements above.
A DSET is the data in a Dataset once it has been imported by a Transcript task. A DSET resides in RAM during the transform phase and, if referenced by a finish statement, is then stored in a database file (RDF) for long-term use.
A Transcript task may import more than one dataset during execution, in which case multiple DSETs will be resident in RAM at the same time. It is therefore necessary for any subsequent Transcript statement which manipulates the data in a DSET to identify which DSET (or in some cases DSETs) to perform the operation on. This is achieved using a DSET ID which is the combination of a Source tag and an Alias tag.
If multiple DSETs are present in memory, the first one that was created will be the default DSET. Column names that are not fully qualified are assumed to be located in the default DSET.
After a Dataset has been imported the resulting DSET is assigned a unique identifier such that it can be identified by subsequent statements. This unique identifier is termed the DSET ID and consists of two components, the Source and Alias tags.
The default alias tag is defined automatically and is the filename of the imported Dataset, minus the file extension. For example, a Dataset called usage.csv will have an alias of usage.
The Transcript import statement can take one of two forms. Depending on which is used, the Source tag is determined automatically, or specified manually as follows:
By convention, Datasets produced by USE are located in sub-directories within the directory <basedir>\collected. The sub-directories are named according to the data source and datadate associated with that data. The naming convention is as follows:
<basedir>\collected\<data_source>\<yyyy>\<MM>\<dd>_<filename>.csv
where:
<data_source> is a descriptive name for the external source from which the data was collected
<yyyy> is the year of the datadate as a 4-digit number
<MM> is the month of the datadate as a 2-digit number
<dd> is the day of the month of the datadate as a 2-digit number
When importing one of these datasets using automatic source tagging, the source tag will be the name of the directory containing that dataset. Thus, assuming a datadate of 20160801, the following statement:
import usage from Azure
will import the Dataset <basedir>\collected\Azure\2016\08\01_usage.csv and assign it a Source tag of Azure.
When importing a Dataset from a specified path, the Source tag is specified as part of the import statement. For example, the statement:
import my_custom_data\mycosts.csv source costs
will import the Dataset <basedir>\my_custom_data\mycosts.csv and assign it a Source tag of costs. Checks are performed during the import process to ensure that every imported Dataset has a unique source.alias combination.
When a column name is prefixed with a DSET ID in the manner described previously, it is said to be fully qualified. For example, the fully qualified column name Azure.usage.MeterName explicitly refers to the column called MeterName in the DSET Azure.usage.
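The following sketch (using hypothetical DSET names) illustrates the difference between unqualified and fully qualified references:

```
# Creates the DSET Azure.usage, which becomes the default DSET
import usage from Azure
# Creates the DSET Amazon.usage
import usage from Amazon

# An unqualified column name such as MeterName is resolved against the
# default DSET, i.e. Azure.usage.MeterName; to address the same column
# in the second DSET the fully qualified name Amazon.usage.MeterName
# must be used
```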
This page is a work in progress
In Exivity, a Service can be anything that corresponds to a SKU or sellable item from your Service Catalogue. It should relate to one or more consumption records from your extracted data sources.
For example, with most public cloud providers, the provider defines the chargeable items that are shown on the end-of-month invoice. However, when working through a Managed Services Provider, a Cloud Services Provider or a System Integrator, additional services can be sold on top of those. Potentially, you may want to apply an uplift to the rate, or charge a fixed amount of money every month for a certain service. Different scenarios are possible here; it all depends on your business logic.
A service is a named item with associated rates and/or costs used to calculate a charge that appears on a report, where rates represent revenue and costs represent overheads.
When discussing services and their related charges a number of terms are required. Exivity uses the following terminology in this regard:
During the ETL process, service definitions are created via the service and services statements in Transcript. During the execution of a Transcript task, service definitions created by these statements are cached in memory. Once the task has completed successfully, the cached services are written to the global database, where they remain indefinitely (or until such time as they are manually deleted).
If the task does not complete successfully then the service definitions cached in memory are discarded, the expectation being that the task will be re-run after the error condition that caused it to fail has been rectified and the services will be written to the global database at that time.
There are different types of charge that can be associated with a service. Collectively these influence the total charge(s) shown on the report. Exivity supports the following charge types, as described in the terminology table below:
unit rate
fixed price
COGS rate
fixed COGS
At least one of these charge types must be associated with a service definition. Multiple combinations of the charge types may also be used.
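For instance, a service might combine a unit rate with a fixed price (figures hypothetical):

```
# unit rate   : 0.05 per unit of consumption
# fixed price : 10.00 per charge interval
# consumption : 100 units in the interval
#
# charge = (100 * 0.05) + 10.00 = 15.00
```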
Once the resulting charge has been calculated based on the charge types, it may be further modified through the application of adjustments, proration and minimum commit (all of which are detailed later in this article).
In order to calculate the charge(s) associated with usage of a service, Exivity needs to know the period of time for which each payment is valid. For example, a Virtual Machine may have a daily cost associated with it, in which case using it multiple times in a single day counts as a single unit of consumption, whereas Network Bandwidth may be chargeable per gigabyte, and each gigabyte transferred is charged as it occurs.
The charge interval (also termed simply interval) for a service can be one of the following:
individually - the charge for a service is applied every time a unit of the service is consumed, with no regard for a charging interval
daily - the charge is applied once per day
monthly - the charge is applied once per calendar month
Although hourly charge intervals are not yet directly supported, it is possible to charge per hour by aggregating hourly records and using the EXIVITY_AGGR_COUNT column created during that process to determine the units of hourly consumption.
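In outline, the approach is as follows (the exact aggregation syntax is not shown here):

```
# 1. Aggregate the hourly usage records so that each service instance
#    is reduced to a single record per day
# 2. The aggregation process adds an EXIVITY_AGGR_COUNT column holding
#    the number of records that were merged, i.e. the hours consumed
# 3. Use EXIVITY_AGGR_COUNT as the usage column in the service
#    definition so that each hour is charged as one unit
```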
The minimum commit is the minimum number of units of consumption that are charged every interval, or (in the case of services with an interval of individually) every time the service is used. If fewer units than the minimum commit are actually consumed then the service will be charged as if the minimum commit number of units had been used.
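A worked example with hypothetical values:

```
# charge interval : daily
# minimum commit  : 10 units
# actual usage    : 6 units on a given day
#
# charged usage = max(6, 10) = 10 units for that day
```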
After the charge for usage of a monthly service has been determined, it may be prorated by modifying that charge based on the frequency of the usage.
This process will reduce the charge based on the number of days within the month that the service was used. For example if consumption of a service with a monthly charge interval was only seen for 15 days within a 30 day calendar month then the final charge will be 1/2 of the monthly charge.
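A worked example with hypothetical figures:

```
# monthly charge for the service : 100.00
# days usage was seen in month   : 15 of 30
#
# prorated charge = 100.00 * (15 / 30) = 50.00
```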
!!! note Daily proration whereby a daily service is prorated based on the hours seen will be implemented in the second half of 2018 and this documentation will be updated at that time to reflect the additional feature
A service definition comprises two categories of information:
The service - Metadata describing fixed attributes of the service such as its name, description, group, interval, proration and charge type(s)
The rate revision - Information detailing the charge type(s) associated with the service (the rate, fixed price, COGS rate and fixed COGS values) and additional information detailing the date(s) for which those values should be applied
A service definition is associated with a specific DSET, as the units of consumption are retrieved from a column (named in the service definition itself) in the usage data.
The tables below summarise the data members that comprise each of these categories.
The rate_col, fixed_price_col, cogs_col and fixed_cogs_col fields are used when the specific value to use is derived at report-time from the usage data, as opposed to explicitly being included in the rate revision itself.
A service may have any number of associated rate revisions so long as they have different effective_date or minimum commit values. This means that a service can have different charges applied depending on the date that the report is to be generated for, or depending on the specific values in the columns used by a report.
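For example, given two hypothetical rate revisions for the same service:

```
# revision 1: effective_date = 20180101, rate = 0.10
# revision 2: effective_date = 20180601, rate = 0.12
#
# A report for any day from 20180101 to 20180531 charges 0.10 per unit;
# from 20180601 onwards revision 2 is in effect and 0.12 applies
```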
A service may use neither, either or both of rate and fixed_price, and neither or one of cogs and fixed_cogs. At least one of rate, fixed_price, cogs or fixed_cogs is required, but cogs and fixed_cogs may not both be used.
Any or all of rate, fixed_price, cogs and fixed_cogs may have a value of 0.0, in which case no charges will be levied against the service but the units of consumption will still be shown on reports.
Terminology:

| Term | Synonym/Abbreviation | Meaning |
| --- | --- | --- |
| service definition | service | A template defining the manner in which service instances should be charged |
| service instance | instance | Consumption of a service, associated with a unique value such as a VM ID, a VM hostname, a resource ID or any other distinguishing field in the usage data |
| unit of consumption | unit | The consumption of 1 quantity of a service instance |
| charge interval | interval | The period of time that a unit of consumption is charged over (additional units of the same service instance consumed within the charge interval do not increase the resulting charge) |
| unit rate | rate | The charge associated with 1 unit of consumption of a service instance in the charge interval |
| COGS rate | cogs | (Short for Cost Of Goods Sold) The cost (overhead) to the provider of a service for providing 1 unit of consumption of that service per charge interval |
| fixed price | fixed rate or interval-based rate | A specific amount charged per service instance per interval for one or more units of consumption |
| fixed COGS | interval-based COGS | A specific amount representing the overheads associated with providing one service instance of a service per charge interval |
| charge | | A generic term to indicate some money payable by the consumer of service instances to the provider of those instances |
Service attributes:

| Attribute | Purpose |
| --- | --- |
| key | A unique key (as a textual string) used to identify the service |
| description | A user-defined description or label for the service |
| group or category | An arbitrary label used to group services together |
| unit label | A label for the units of measure, such as 'GB' for storage |
| RDF or DSET | The DSET ID of the usage data against which the service is reported |
| usage_col | The name of the column in the usage data from which the number of units consumed can be derived |
| interval | The charging interval for the service, such as 'daily', 'monthly' etc. |
| proration or model | Whether the service is prorated or unprorated |
| rate type | Which (if any) of rate and fixed rate to apply |
| cogs type | Which (if any) of cogs and fixed cogs to apply |
Rate revision fields:

| Field | Description |
| --- | --- |
| rate | The cost per unit of consumption |
| rate_col | The name of a column containing the cost per unit of consumption |
| fixed_price | A fixed charge associated with use of the service per charging interval, regardless of the amount of usage |
| fixed_price_col | The name of a column containing the fixed charges as described above |
| cogs | (Short for Cost Of Goods Sold) The cost per unit associated with delivery of the service |
| cogs_col | The name of a column containing the COGS cost per unit |
| fixed_cogs | As for fixed_price, but for the cost of delivering the service |
| fixed_cogs_col | The name of a column containing the fixed_cogs prices |
| effective_date | A date in yyyyMMdd format (stored internally as an integer) from which the rate is valid |
| minimum commit | The minimum commit value for the service (if this is 0 then no minimum commit is applied) |