replace

Overview

The replace statement searches the values in a column for a substring and either removes each occurrence or replaces it with a new value.

Syntax

replace substring in colName [with replacement]

Details

The replace statement will remove or replace all occurrences of substring in the values of a column. The parameters are as follows:

Parameter      Notes

substring      The string to search for in the values of colName

colName        The column whose values are searched for substring

replacement    The string that replaces each occurrence of substring (optional)

The colName argument may be given with or without full qualification, but it must reference an existing column.

If the optional replacement string is provided then any occurrences of substring will be substituted with the specified value.

If the replacement string is not provided then all occurrences of substring will be removed from the values in colName.
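
The two behaviours described above can be illustrated with a short Python sketch. This is not the implementation of the replace statement, only a model of its documented semantics. The function name replace_in_column, the sample rows, and the column name url are hypothetical, and the sketch assumes literal, case-sensitive matching, which this page does not specify.

```python
def replace_in_column(rows, col_name, substring, replacement=""):
    """Model of: replace substring in colName [with replacement].

    rows is a list of dicts mapping column names to string values.
    When no replacement is given, every occurrence of substring is removed.
    Assumption: matching is literal and case-sensitive.
    """
    result = []
    for row in rows:
        # The column must already exist, mirroring the rule for colName.
        if col_name not in row:
            raise KeyError(f"unknown column: {col_name}")
        updated = dict(row)
        # str.replace substitutes every occurrence, not just the first.
        updated[col_name] = row[col_name].replace(substring, replacement)
        result.append(updated)
    return result


# Hypothetical sample data, for illustration only.
rows = [
    {"id": "1", "url": "http://example.com/a"},
    {"id": "2", "url": "http://example.com/b"},
]

# Without a replacement, the substring is removed.
removed = replace_in_column(rows, "url", "http://example.com/")

# With a replacement, the substring is substituted.
substituted = replace_in_column(rows, "url", "http://example.com/", "ex:")
```

In this sketch the input rows are left unchanged and a new list is returned, so both calls can be made against the same sample data.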

The replace statement is useful for reducing verbosity in extracted data, resulting in smaller RDFs without losing required information.

Example

Given the following sample data:

The following script ...

... will produce the following output data.
