CSV files are a way of storing data in a table format. A table consists of one or more rows, each row containing one or more columns. There is no formal specification for CSV files although a proposed standard can be found at https://www.ietf.org/rfc/rfc4180.txt.
Although the 'C' in 'CSV' stands for the word comma, other characters may be used as the separator. Tabs and semicolons are common alternatives. Exivity can import CSVs using any separator character apart from a dot or an ASCII NUL, and uses the comma by default.
A CSV file must meet the following requirements to be imported:
- 1. The first line in the file must define the column names
- 2. Every row in the file must contain the same number of fields
- 3. A field with no value is represented as a single separator character
- 4. The separator character must not be a dot (.)
- 5. There may be no ASCII NUL characters in the file (a NUL character has an ASCII value of 0)
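The requirements above can be checked before a file is handed to an import. The following is a minimal sketch in Python, not part of Exivity itself: the function name `check_csv` and its messages are hypothetical, and it assumes the whole file fits in memory as a string and that the separator is a single character.

```python
import csv
import io


def check_csv(text, separator=","):
    """Return a list of problems found; an empty list means the text passed."""
    # Requirement 4 (and the NUL restriction on separators).
    if separator in (".", "\0"):
        return ["separator must not be a dot or an ASCII NUL"]
    # Requirement 5: no NUL characters anywhere in the file.
    if "\0" in text:
        return ["file contains an ASCII NUL character"]

    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    # Requirement 1: the first line must define the column names.
    if not rows:
        return ["file has no header line"]

    problems = []
    expected = len(rows[0])  # the header defines the column count
    # Requirement 2: every row has the same number of fields as the header.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append(
                f"row {number}: expected {expected} fields, found {len(row)}"
            )
    return problems
```

A row such as `b,` passes the check because the empty field after the separator still counts as a field, which matches requirement 3.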
When a Dataset is imported by a Transcript task, any dots (.) contained in a column name will be replaced with underscores (_). This is because the dot is a reserved character used by Fully Qualified Column Names.
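The renaming rule can be illustrated with a short Python sketch. The function name `sanitise_column_names` is hypothetical, and the sketch only mirrors the dot-to-underscore replacement described here; how a clash between renamed columns would be handled is not covered by this document and is not modelled.

```python
def sanitise_column_names(names):
    """Replace every dot in each column name with an underscore."""
    return [name.replace(".", "_") for name in names]


print(sanitise_column_names(["meter.name", "quantity", "a.b.c"]))
# prints ['meter_name', 'quantity', 'a_b_c']
```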
Although datasets are generated by USE as part of the extraction phase, additional information may be provided in CSV format to enrich the extracted data. This additional CSV data must conform to the requirements above.
A Transcript task may import more than one dataset during execution, in which case multiple DSETs will be resident in RAM at the same time. It is, therefore, necessary for any subsequent Transcript statement which manipulates the data in a DSET to identify which DSET (or in some cases DSETs) to perform the operation on. This is achieved using a DSET ID which is the combination of a Source tag and an Alias tag.
After a Dataset has been imported the resulting DSET is assigned a unique identifier such that it can be identified by subsequent statements. This unique identifier is termed the DSET ID and consists of two components, the Source and Alias tags.
The default Alias tag is defined automatically and is the filename of the imported Dataset, minus the file extension. For example, a Dataset called usage.csv will have an alias of usage.
By convention, Datasets produced by USE are located in sub-directories within the directory <basedir>\collected. The sub-directories are named according to the data source and datadate associated with that data. The naming convention uses the following components:
- <data_source> is a descriptive name for the external source from which the data was collected
- <yyyy> is the year of the datadate as a 4-digit number
- <MM> is the month of the datadate as a 2-digit number
- <dd> is the day of the month of the datadate as a 2-digit number
When importing one of these datasets using automatic source tagging, the source tag will be the name of the directory containing that dataset. Thus, assuming a datadate of 20160801, the following statement:
import usage from Azure
will import the Dataset <basedir>\collected\Azure\2016\08\01_usage.csv and assign it a Source tag of Azure.
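The way the example statement resolves to a file and a DSET ID can be sketched in Python. This is an illustration only: the function name `resolve_import` and the base directory `C:\exivity` are hypothetical, and the path layout is inferred from the Azure example above rather than taken from a formal definition.

```python
from datetime import date
from pathlib import PureWindowsPath


def resolve_import(basedir, data_source, name, datadate):
    """Return (path, dset_id) for a USE-collected Dataset.

    Layout assumed from the example:
    <basedir>\\collected\\<data_source>\\<yyyy>\\<MM>\\<dd>_<name>.csv
    """
    path = PureWindowsPath(
        basedir,
        "collected",
        data_source,
        f"{datadate:%Y}",            # 4-digit year
        f"{datadate:%m}",            # 2-digit month
        f"{datadate:%d}_{name}.csv", # 2-digit day prefixes the filename
    )
    # With automatic source tagging, the Source tag is the data source
    # directory name and the Alias tag is the dataset name.
    return path, f"{data_source}.{name}"


path, dset_id = resolve_import(r"C:\exivity", "Azure", "usage", date(2016, 8, 1))
print(path)     # C:\exivity\collected\Azure\2016\08\01_usage.csv
print(dset_id)  # Azure.usage
```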
A Source tag can also be specified explicitly. The statement:
import my_custom_data\mycosts.csv source costs
will import the Dataset <basedir>\my_custom_data\mycosts.csv and assign it a Source tag of costs. Checks are done during the import process to ensure that every imported Dataset has a unique DSET ID.
When a column name is prefixed with a DSET ID in the manner described previously, it is said to be fully qualified. For example, the fully qualified column name Azure.usage.MeterName explicitly refers to the column called MeterName in the DSET Azure.usage.
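Because dots are removed from column names on import, a fully qualified column name can be split on the dot without ambiguity. The Python sketch below shows the idea; the names `QualifiedColumn` and `parse_fully_qualified` are hypothetical, and it assumes that Source and Alias tags themselves contain no dots.

```python
from typing import NamedTuple


class QualifiedColumn(NamedTuple):
    source: str
    alias: str
    column: str


def parse_fully_qualified(name):
    """Split 'source.alias.column' into its three components."""
    parts = name.split(".")
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified column name: {name!r}")
    return QualifiedColumn(*parts)


col = parse_fully_qualified("Azure.usage.MeterName")
print(col.source, col.alias, col.column)  # Azure usage MeterName
```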