There are three different ways users of Exivity can be authenticated. The default method is local authentication which uses local users managed through Exivity (either via the GUI or API). More options are available by navigating to Settings > System > Single Sign-On:
Users logging in with local authentication use a combination of username and password.
Users logging in with our SAML SSO integration use an external SAML Identity Provider (IdP) to authenticate users. User attributes, usergroup and account access can be provisioned with user properties from the SAML SSO response.
Users logging in with our LDAP SSO integration use an external LDAP server to authenticate users with a combination of username and password. User attributes and usergroup can be provisioned with user properties from the LDAP response.
Exivity uses two main disk directories when it's installed: one to store the user data and one to store the software files. These directories are called the home and program directories, respectively.
On the system where Exivity is installed, the following environment variables contain an absolute path reference to these directories: EXIVITY_HOME_DIRECTORY and EXIVITY_PROGRAM_DIRECTORY.
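For example, on the Exivity server itself you can inspect where these point from a Windows command prompt (this is ordinary Windows shell usage, not an Exivity-specific command):
echo %EXIVITY_HOME_DIRECTORY%
echo %EXIVITY_PROGRAM_DIRECTORY%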
The home directory should preferably be located on a dedicated volume (e.g. D:\exivity\home) and it is recommended that it be located on an SSD drive.
The main program directory and its contents are created by the Exivity installer.
The following articles provide information regarding advanced concepts around which Exivity is built. These concepts are referenced repeatedly throughout the documentation, so the articles below are recommended reading.
The config.json is the primary configuration file that holds all settings required to run an Exivity instance or node successfully. The config.json is divided into several sections related to the individual Exivity software components (e.g. database, mq, etc.) and is typically created and modified by the installer. However, in some situations, such as multi-node setups, it is necessary to change this configuration file in order to support different workloads on different nodes. Likewise, when migrating to a different PostgreSQL database or RabbitMQ cluster, the database or mq related settings in the config.json configuration file need to be changed. An illustrative sketch of the overall file shape is shown after the section descriptions below.
The db section contains the settings for the PostgreSQL database engine where the Exivity database is hosted.
The mq section contains the settings for the RabbitMQ message engine.
The griffon section contains settings for the Exivity Job Manager.
The chronos section stores information related to the Exivity Scheduling Service.
The merlin section contains all parameters related to backend components.
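As an indication of the overall shape only, a config.json might therefore look something like the sketch below. The section names match those described above, but the keys and values inside each section are invented placeholders rather than the actual setting names written by the installer:
{
  "db":      { "host": "localhost", "port": 5432 },
  "mq":      { "host": "localhost", "port": 5672 },
  "griffon": { },
  "chronos": { },
  "merlin":  { }
}
In a multi-node setup, the db and mq sections on each node would point at the shared PostgreSQL database and RabbitMQ cluster rather than at localhost.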
RDF stands for Reporting Database File. Exivity uses these to store usage data and configuration information.
A daily RDF stores the usage data from which Edify produces report data. Any given RDF contains a single DSET, along with internally managed metadata such as column types and prepared report data.
An RDF is created using the finish statement in a Transformer task. For any given DSET, there can be a single RDF per day, although there may be many RDF files per day in total (one RDF for each DSET).
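As a minimal sketch of how this fits together (the DSET name is taken from the Azure example used elsewhere in this section, and the transform statements in between are omitted), a Transformer task that imports a Dataset and ends with a finish statement will write the resulting DSET to an RDF:
import usage from Azure
# ... transform statements would normally appear here ...
finish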
RDFs are named according to the datadate and the ID of the DSET they contain, and have a .rdf extension. For example a DSET with an ID of Azure.usage for the first day of the month will result in an RDF called 01_Azure.usage.rdf.
An RDF containing usage data is located at <home dir>/report/<yyyy>/<MM>/<dd>_<dset>.rdf where:
<yyyy> is the year of the datadate
<MM> is the month of the datadate
<dd> is the day of the datadate
<dset> is the DSET ID that was used to populate the usage data
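For example, with a datadate of 20160801 and a DSET ID of Azure.usage, the resulting RDF would be <home dir>/report/2016/08/01_Azure.usage.rdf.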
The global database uses PostgreSQL (user data is stored at <home_dir>/system/pgdata) and contains system-wide configuration data including (but not necessarily limited to):
Service definitions
Service rate revisions
Service rate adjustments
Service categories
User accounts
Report definitions (and related metadata such as synchronisation timestamps)
Metadata about RDF files created by Transcript
Usergroups
Security settings
Account information
Job Schedules
The global database should never be manually modified unless this is done by, or under the guidance of, Exivity support staff. It is critical to make a backup of the database before any changes are made.
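As an illustrative sketch only (the database name exivity and the user postgres are assumptions; the real connection details live in the db section of config.json), such a backup could be made with the standard PostgreSQL pg_dump utility:
pg_dump -U postgres -d exivity -f exivity_global_backup.sql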
When a transformer runs, transcript.exe is executed, and one of the command-line arguments it requires is a date in yyyyMMdd format, which is termed the data date. The activities performed by Transcript are associated with this date in several ways, most notably:
when importing using automatic source tagging, to determine which specific Dataset file to import from \collected
to determine the concept of 'today' when processing files containing data with timestamps spanning multiple days
to determine the output directory into which RDF files will be generated
to generate the filename of the RDF files generated
The data date is made available to a Transcript task through the automatic creation of the ${dataDate} variable.
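As an illustrative sketch (the directory my_custom_data and the source tag custom are invented for this example), the variable can be referenced in a Transcript task wherever a date-stamped name is needed, for instance to import a file whose name contains the data date:
import my_custom_data\${dataDate}_usage.csv source custom
With a data date of 20160801, this would resolve to my_custom_data\20160801_usage.csv.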
CSV files are a way of storing data in a table format. A table consists of one or more rows, each row containing one or more columns. There is no formal specification for CSV files, although a proposed standard can be found in RFC 4180.
Although the 'C' in 'CSV' stands for the word comma, other characters may be used as the separator; TAB and semicolons are common alternatives. Exivity can import CSVs using any separator character apart from a dot or an ASCII NUL, and uses the comma by default.
A Dataset is a CSV file, usually produced by USE, which can be imported for processing by a Transcript task. To qualify as a Dataset, a CSV file must meet the following requirements (illustrated by the example after this list):
The first line in the file must define the column names
Every row in the file must contain the same number of fields
A field with no value is represented as a single separator character
The separator character must not be a dot (.)
There may be no ASCII NUL characters in the file (a NUL character has an ASCII value of 0)
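For illustration, a minimal Dataset that satisfies these requirements might look as follows (the column names and values are invented); note the empty field, with nothing between the separators, in the last row:
subscription,servicename,quantity,unit_cost
SUB-001,Standard Storage,744,0.0334
SUB-002,Data Transfer,,0.05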
When a Dataset is imported by a Transcript task, any dots (.) contained in a column name will be replaced with underscores (_). This is because the dot is a reserved character used by Fully Qualified Column Names.
When a DSET is exported during execution of a Transcript task, the exported CSV file will always be a Dataset, in that it can be imported by another Transcript task.
Although datasets are generated by USE as part of the extraction phase, additional information may be provided in CSV format for the purposes of enriching the extracted data. This additional CSV data must conform to the requirements above.
After a Dataset has been imported the resulting DSET is assigned a unique identifier such that it can be identified by subsequent statements. This unique identifier is termed the DSET ID and consists of two components, the Source and Alias tags.
The default alias tag is defined automatically and is the filename of the imported Dataset, minus the file extension. For example a Dataset called usage.csv will have an alias of usage.
When a column name is prefixed with a DSET ID in the manner described previously, it is said to be fully qualified. For example the fully qualified column name Azure.usage.MeterName explicitly refers to the column called MeterName in the DSET Azure.usage.
A DSET is the data in a Dataset once it has been imported by a Transcript task. A DSET resides in RAM during the Transform phase and, if referenced by a finish statement, is then stored in a database file (RDF) for long-term use.
A Transcript task may import more than one dataset during execution, in which case multiple DSETs will be resident in RAM at the same time. It is therefore necessary for any subsequent Transcript statement which manipulates the data in a DSET to identify which DSET (or in some cases DSETs) to perform the operation on. This is achieved using a DSET ID, which is the combination of a Source tag and an Alias tag.
If multiple DSETs are present in memory, the first one that was created will be the default DSET. Column names that are not fully qualified are assumed to be located in the default DSET.
The Transcript import statement can take one of two forms. Depending on which is used, the Source tag is either determined automatically or specified manually, as follows:
By convention, Datasets produced by USE are located in sub-directories within the directory <basedir>\collected. The sub-directories are named according to the data source and the datadate associated with that data. The naming convention is as follows:
<basedir>\collected\<data_source>\<yyyy>\<MM>\<dd>_<filename>.csv
where:
<data_source> is a descriptive name for the external source from which the data was collected
<yyyy> is the year of the datadate as a 4-digit number
<MM> is the month of the datadate as a 2-digit number
<dd> is the day of the month of the datadate as a 2-digit number
When importing one of these datasets using automatic source tagging, the source tag will be the name of the directory containing that dataset. Thus, assuming a datadate of 20160801, the following statement:
import usage from Azure
will import the Dataset <basedir>\collected\Azure\2016\08\01_usage.csv and assign it a Source tag of Azure.
When importing a Dataset from a specified path, the Source tag is specified as part of the import statement. For example the statement:
import my_custom_data\mycosts.csv source costs
will import the Dataset <basedir>\my_custom_data\mycosts.csv and assign it a Source tag of costs. Checks are done during the import process to ensure that every imported Dataset has a unique source.alias combination.