import

Overview

The import statement is used to read a CSV file (which must conform to Dataset standards) from disk in order to create a DSET which can then be processed by subsequent Transcript statements.

To import CSV files that do not use a comma as the delimiter, please refer to the Quote and Separator Options section further down in this article.

Syntax

Import from CSV files

importfilenamefromsource[aliasalias][options { ... }]

importfilenamesourcecustom_source[aliasalias][options { ... }]

Import from CCR files

import filename.ccr source custom_source [aliasalias]

Import from database tables

import ACCOUNT sourcecustom_source[aliasalias]

import USAGE fordatefromdb_namesourcecustom_source[aliasalias]

Details

If using the Windows path delimiter - \ - it is advisable to put the path and filename in double quotes to avoid the backslash being interpreted as an escape character

Options

Data imported from a CSV file can be filtered as it is being read in order to reduce post-import processing time and memory overhead.

The import filters are configured via the options parameter, which if present is immediately followed by one or more name = value pairs enclosed within braces, for example:

import system/extracted/example.csv source test alias data options {
    skip = 1
}

The = sign is optional, but if present it must be surrounded by whitespace.

Multiple options may be specified and each option must be placed on a separate line, for example:

import system/extracted/example.csv source test alias data options { 
    skip = 1
    omit = 4
}

The options supported by import are as follows:

Name

Type/possible values

Description

skip

numeric

Number of leading non-blank rows to skip

omit

numeric

Number of trailing non-blank rows to omit

heading_dupe_check header_dupe_check

enabled / on / true / yes

Skip rows that are a duplicate of the header row

select include

column list

A space-separated list of column names to include in the import

ignore exclude

column list

A space-separated list of column names to ignore when importing

filter

expression

If specified, only rows that match the expression will be imported

escaped escape

enabled / on / true / yes

Treat backslash (\) characters in the data in the CSV to be imported as an escape character. This will escape the following delimiter or quote. For a literal backslash to be imported it needs to be escaped in the data (\\). Where a backslash is not followed by another backslash or a quote, it is treated literally.

pattern

enabled / on / true / yes

Permits the filename to be specified as a regular expression

encoding

Specifies the encoding of the file to be imported. Any encoding supported by libiconv may be used.

skip_corrupted_file

enabled / on / true / yes

If this option is enabled and pattern is set, any file that is not formatted correctly is skipped. If this option is not enabled then any file that is not formatted correctly will cause the script to terminate with an error.

skip_corrupted_record

enabled / on / true / yes

If this option is enabled, corrupted or malformed records (rows) are skipped

filename_column

enabled / on / true / yes

If enabled, a column calledEXIVITY_FILE_NAME will be created which contains the filename from which each record was imported

If the skip option is specified then the column headings in the imported data will be set to those of the first row following the skipped rows

The filter expression, if specified, is of identical format to that used by the where statement and it must reference at least one column name, enclosed within square brackets:

import system/extracted/example.csv source test alias data options { 
    # Only import rows where the vmType column is not the value "Template"
    filter ([vmType] != Template)
}

To specify a parameter or column name that contains spaces in an expression, use quotes within the square brackets as follows:

import system/extracted/example.csv source test alias data options { 
    filter (["service name"] =~ /.*D/)
}

File name / pattern

If a pattern is specified in the import options, then the filename parameter is treated as an ECMAScript-type regular expression, and all files matching that pattern are imported and appended to one another to result in a single DSET.

Only filenames may contain a regular expression, directory names are always treated literally.

If using a regular expression, the import options are applied to all files matching that expression. All files must have the same structure (after any select/ignore options have been applied).

If any file in the matching set being imported has different columns to the first file that matched then an error will be generated and the task will fail.

Database tables

Account data

To import all accounts from Exivity the following statement is used:

import ACCOUNT sourcecustom_source[aliasalias]

This functionality is intended for advanced use cases where it is necessary to correlate information about existing accounts into the new data being processed as part of the Transform step.

Usage data

To import usage data previously written to an RDF using finish the following statement is used:

import USAGE fordatefromdsetsourcecustom_source[aliasalias]

This functionality may be useful in the event that the original data retrieved from an API has been deleted or is otherwise unavailable.

The date must be inyyyyMMddformat. The RDF from which the usage will be imported is:

<basedir>/system/report/<year>/<month>/<day>_<dset>_usage.rdf.

To illustrate this in practice, the statement ...

import USAGE for 20180501 from azure.usage source test alias data

... will load the usage data from ...

<basedir>/system/report/2018/05/01_azure.usage_usage.rdf

... into a DSET calledtest.data.

Source and Alias tags

A file imported from disk contains one or more named columns which are used to create an index for subsequent processing of the data in that file. As multiple files may be imported it is necessary to use namespaces to distinguish between the DSETs created from the files (each imported file is converted into a memory-resident DSET before it can be processed further). This is accomplished through the use of source and alias tags. Each file imported is given a unique source and alias tag, meaning that any column in the data imported from that file can be uniquely identified using a combination of source.alias.column_name.

There are two main variations of the import statement which are as follows:

Automatic source tagging

importfilenamefromsource[aliasalias]

By convention, data retrieved by the USE extractor from external sources is located in a file called

<basedir>/system/extracted/<source>/<yyyy>/<MM>/dd>_.csv

where <yyyy> is the year, <MM> is the month and <dd> is the day of the current data date.

Typically, the <source> portion of that path will reflect the name or identity of the external system that the data was collected from.

By default, the alias tag will be set to the filename, minus the `_ portion of that filename and _.csv` extension but an alias can be manually specified using the optional _alias parameter.

As an example, assuming the data date is 20170223, the statement:

import usage from Azure

will create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv.

The statement:

import usage from Azure alias custom

will create a DSET called Azure.custom from the file

<basedir>/system/extracted/Azure/2017/02/23_usage.csv.

Manual source tagging

importfilenamesourcecustom_source[aliasalias]

This form of the statement will import the filename and the source tag will be set to the value of the custom_source parameter.

By default, the alias tag will be set to the filename minus the .csv extension but an alias can be manually specified using the optional alias parameter.

As an example, assuming the data date is 20170223, the statement:

import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

will create a DSET called Azure.usage from the file

<basedir>/system/extracted/Azure/20170223.csv

Importing CCR files

Options will be ignored when importing CCR files

When performing an import using manual tagging it is possible to import a Cloud Cruiser CCR file. This is done by specifying the .ccr filename extension as follows:

import system/extracted/myccrfile.ccr source MyData

Full information regarding CCR files is beyond the scope of this article. Cloud Cruiser documentation should be consulted if there is a requirement to know more about them.

In order to create a DSET from a CCR file, Transcript converts the CCR file to a Dataset in memory as an interim step. The resulting DSET will have blank values in any given row for any heading where no dimension or measure in the CCR file existed that matched that heading. No distinction is made between dimensions and measures in the CCR file, they are imported as columns regardless.

When importing a CCR file, the quote and separator characters are always a double quote - " - and a comma - , - respectively. Fields in the CCR file may or may not be quoted. The conversion process will handle this automatically and any quotes at the start and end of a field will be removed.

Quote and Separator options

The fields in a dataset file may or may not be quoted and the separator character used to delineate fields can be any character apart from an ASCII NUL (0) value. It is therefore important to ensure that the correct options are set before importing a dataset to avoid unwanted side effects.

The options relating to import are quote and separator (or delimiter). By default, these are set to a double quote and a comma respectively.

If defined the quote character will be stripped from any fields beginning and ending with it. Any additional quote characters inside the outer quotes are preserved. Fields that do not have quotes around them will be imported correctly, unless they contain a quote character at only one end of the field.

If no quote character is defined, then quote characters are ignored during import, but any separator characters in the column headings will be converted to underscores.

Embeds

If the embed option is set, then a curly bracket in the data - { - will cause all characters (including newlines) up until the closing bracket - } - to be imported. Nested brackets are supported, although if there are an uneven number of brackets before the end of the line an error will be generated in the log file and the task will fail.

Embeds are not supported in CCR files

Examples

Create a DSET called Azure.usage from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure

Create a DSET called Azure.custom from the file /system/extracted/Azure/2017/02/23_usage.csv import usage from Azure alias custom

Create a DSET called Azure.usage from the file /system/extracted/Azure/20170223.csv import "system/extracted/Azure/${dataDate}.csv" source Azure alias usage

Previousif Nextinclude

Last updated 3 years ago

Was this helpful?