where

Overview

The where statement is used to define a local filter, which restricts the scope of any statements contained within its body.

Syntax

where (conditional expression) {
   <statements>
}

Details

The expressions used by the where statement take the same form as those used in the if statement.

The information below covers aspects of expression use that are unique to the where statement

The where statement is used to define a local filter which restricts the effect of the statements located in the lines between the opening { and closing }.

These braces define the start and end of the body of the where statement.

  • The opening { must be preceded with white-space may be positioned on the same line as the where statement or may be on a line of its own

  • The closing } must be located on a line of its own

Only statements that support a local filter may be located with the body. If a non-supported statement is included in the body then a log entry is generated and the task will fail.

The conditional expression associated with the where statement must reference at least one column name as detailed in the Column names section below

Only rows in the DSET associated with the expression for which the conditional expression is true will be affected by the statements in the body of the where statement.

Referencing column names

As the purpose of the where statement is to selectively process rows of data based on the expression, it is a requirement that the condition references one or more column names.

To reference a column in an expression, its name should be specified, enclosed in square brackets [like_this].

If the column name contains white-space then it should be quoted within the brackets ["like this"]. The reason that the quotes must be located within the brackets is that otherwise the brackets (along with their contents) would be treated as a literal string.

To illustrate this in use, consider a DSET called test.data which contains the following:

user key,value,name
d,4,dexter
f,6,freddy
g,7,geoff
a,1,andy

The following script will create a new column called "flag" and set that new column value to 1 for all rows where the user key is f.

# Create a blank column called 'flag'
create column flag

# Set values in 'flag' to 1 where the user key is f
where (["user key"] == f) {
   set flag to 1
} 

A blank value in a column can be checked for using a pair of quotes with no content - "". The following will delete all rows in the DSET Azure.usage where the quantity column is empty:

where ([Azure.usage.quantity] == "") {
    delete rows
}

It is possible for one or more statements (such as set) within the statement block enclosed by where { ... } to modify the contents of a column that is being used by the expression.

Doing this may lead to unexpected results as for each statement in the block the expression is re-evaluated for every row in the data being processed, so if set is used to update the values in a column then the new values will be used for all subsequent evaluations of the expression, which will affect all following statements in the where block.

If updating a value in any column(s) referenced by the expression, it is recommended to make a copy of the column before the where statement, using

set copy_of_column as original_column

The values in the copy can then be freely modified, and the original column can be replaced with it after execution of the where statement has completed.

Regular expressions

Regular expressions are often used to restrict processing to certain rows in a dataset.

For example:

# Delete all rows where the hostname contains the word 'test'
where ([hostname] =~ /.*test.*/) {
    delete row
}

As is the case with the if statement, the entire value in the column must match the expression in order for the row to be processed (there is no implicit .* at the end of an expression).

Supported statements

The following statements may be used in the body of a where statement:

The following statements may also be used in the body of a where statement but are unaffected by the expression (and as a result work exactly the same as if they were used outside the where statement body)

Examples

# Delete all rows where the VMID is 1234

where ([VMID] == 1234) {
    delete rows
}

Last updated