Transform
Last updated
Last updated
Transcript executes user-definable scripts (termed tasks) in order to produce one or more Reporting Database Files (RDFs) from one or more input Dataset files in CSV format. These RDFs are later used by the reporting engine to generate results.
Transcript tasks are located in system/config/transcript/
and are ASCII files which can be created with any editor. Both UNIX and Windows end-of-line formats are supported.
Each statement in a Transcript task must be contained on a single line. Statements consist of a keyword indicating the action to perform, followed by zero or more parameters, separated by white-space, required by the statement. Documentation for all the possible statements can be found in the Transcript language reference guide.
By default a space, tab or newline will mark the end of a word in a Transcript task. To include white space in a parameter (for example to reference a column name with a space in it) then this can be done by enclosing it in double quotes or escaping it by preceding it with \
.
Examples:
create columns from "Meter Name" using Quantity
create columns from Meter\ Name using Quantity
The following table summarizes the behaviour of quotes and escapes:
Comments in a Transcript task start with a #
character that is either of:
the first character of a line in the Transcript task
the first character in a word
Comments always end at the end of the line they are started on.
Transcript statements may contain variables. Variables have a name and a value. When a variable name is encountered during the execution of the task, the name is replaced with the value of the variable with that name.
To separate them from normal statement words, variable names are always preceded with ${
and followed by }
. Therefore the variable with the name dataDate
is referenced as ${dataDate}
in the transcript task. As well as user-defined variables (created using the var statement), the following default variables are supported by Exivity:
Variable names ...
may be used multiple times in a single statement
are case sensitive - ${dataDate}
is different to ${datadate}
may not be nested
may be embedded within surrounding text - xxx${dataDate}yyy
may be used within quotes: import "${baseDir}\to_import\AzureJuly${dataDate}.ccr" source AzureJuly
may appear as words of their own in a transcript statement - create column Date value ${dataDate}
A regular expression variable is a special type of variable used to match the name of a column in a DSET. It is enclosed by ${/
and /}
and the text within this enclosure can take either of the following two forms:
${/
expression
/}
The regular expression described by expression will be applied to the default DSET
${/
dset.id/expression
/}
If the text preceding the central /
character is a valid DSET ID then the expression after that /
will be applied to the column names in that DSET
If the text preceding the /
character is not a valid DSET ID then the entire text of the variable between the ${/
and /}
enclosure is treated as a regular expression and will be applied to the default DSET
Once the DSET ID and the expression have been established by the above, the expression is tested against each column name in the DSET and the first matching column name is returned. If no match is found, then an error is logged and the transcript task will fail.
The regular expression may contain a subgroup, which is enclosed within parentheses - (
and )
. If no subgroup is present, and a match is made, then the entire column name will be returned. If a subgroup is present and a match is made, then only the characters matching the portion of the expression within the parentheses are returned. For example:
The expression does not have to match the entire column name. Assuming no subgroup is specified, as long as a match is made then the variable will be expanded to the whole column name.
Regular expression variables are powerful tools when combined with the rename statement, as they can be used to transform an uncertain column name into a known one.
Examples:
A Transcript task cannot manipulate data on disk directly, so it is necessary to import one or more Datasets in CSV format at runtime in order to process the data within them. When a Dataset is imported the following sequence of actions takes place:
The Dataset (in CSV format) is read from disk
A number of checks are done on the data to ensure it meets the requirements to qualify as a Dataset
The data is converted into an internal format called a DSET
The DSET is assigned two tags (source
and alias
) which when combined together form a unique ID to identify the DSET (see Core concepts for more information)
An index is constructed, which facilitates high speed manipulation of the data in the DSET
The DSET is added to the list of DSETs available for use by subsequent statements in the Transcript task
Once these actions have been completed, a DSET can be identified through the unique combination of source.alias
. This permits Transcript statements to specify which DSET to operate on.
In addition, a default DSET can be specified, which will be used if no alternative DSET is specified. Full details of these mechanisms are detailed in the reference guide, specifically in the import and default articles.
Data can be exported in one of two ways during the execution of a Transcript task:
Many Transcript statements change the data in the DSET in some way. Columns may be created, renamed or deleted and rows may be added and removed for example.
At any point in the Transcript process the current state of a DSET can be rendered to disk as an output CSV file. This is accomplished via use of the export statement. This permits snapshots of a DSET to be created for debugging or audit purposes, as well as the creation of partially processed CSV files for import into a later Transcript process.
The finish statement creates a Reporting Database File (RDF) containing the data in a DSET. This RDF can then be used by the reporting engine.
Characters
Meaning
" ... "
Anything inside the quotes, except for a newline or an escape character is treated as literal text
\"
Whether within quotes or not, this is expanded to a double quote - "
- character
\t
When used outside quotes, this is expanded to a TAB character
\
When used outside quotes, a space following the \
is treated as a literal character
\\
Whether within quotes or not, this is expanded to a backslash - \
- character
Variable
Meaning
${dataDate}
The datadate currently in effect, in yyyyMMdd format
${dataDay}
The day value in the dataDate
variable, expressed as a 2 digit number padded with a leading zero if necessary
${dataMonth}
The month value in the dataDate
variable, expressed as a 2 digit number padded with a leading zero if necessary
${dataMonthDays}
The number of days in the month in the dataMonth
variable
${dataDateStart}
00:00:00 on the day in the dataDate
variable, expressed as a UNIX timestamp
${dataDateEnd}
23:59:59 on the day in the dataDate
variable, expressed as a UNIX timestamp
${dataYear}
The year value in the dataDate
variable, expressed as a 4 digit number
${homeDir}
The base working directory currently in effect
${exportDir}
This is the equivalent of ${baseDir}\exported