Dedup
Read the API documentation »
Filter Dedup
Overview
This filter removes duplicate records. A record is considered a duplicate, and is thus removed by this filter,
if another record with the same values has already been seen. The comparison is performed on a
user-provided list of fields (Fields
setting).
WARNING: to remove duplicates, this filter stores one key per unique record in memory, this means that the overall memory grows linearly with the number of unique records in your data set. Depending on your data set, this might lead to OOM (i.e. out of memory) errors.
Configuration
Keys available in the [filter.config]
section:
Name | Type | Default | Required | Description |
---|---|---|---|---|
Fields | array of strings | [] | true | fields to consider when comparing records |
KeySeparator | string | “\x1e” | false | character separator used to build a key from the fields |
Last modified December 12, 2022