Version: Next

Aerospike

Testing

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Platform Instance | | Enabled by default |
| Schema Metadata | | Enabled by default |
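
As a rough illustration of the Detect Deleted Entities capability, the recipe sketch below turns on stateful ingestion with stale metadata removal. The pipeline name, host, and sink server URL are placeholder assumptions, not values from this page; the option names come from the Config Details section.

```yaml
# Sketch only: pipeline_name, host, and server URL are hypothetical placeholders.
pipeline_name: aerospike_ingestion   # stateful ingestion needs a stable pipeline name
source:
  type: aerospike
  config:
    hosts:
      - - localhost
        - 3000
    stateful_ingestion:
      enabled: true
      # Soft-delete entities seen in the last successful run but missing now
      remove_stale_metadata: true

sink:
  type: datahub-rest
  config:
    server: http://localhost:8080   # assumed local DataHub instance
```

Keeping pipeline_name unchanged between runs matters here, since the comparison is made against the last successful run.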

This plugin extracts the following:

  • Namespaces and associated metadata
  • Sets in each namespace and schemas for each set (via schema inference)

By default, schema inference samples 1,000 records from each set. Setting schemaSamplingSize: null scans the entire set.

Note that schemaSamplingSize has no effect if enableSchemaInference: False is set.

Very large schemas are truncated to a maximum of 300 schema fields. This limit is configurable via the maxSchemaSize parameter.
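
The schema inference settings described above could be combined in a recipe along the following lines. This is a minimal sketch: the numeric values are arbitrary illustrations rather than recommendations, and the host is a placeholder.

```yaml
# Sketch only: values are illustrative, host is a placeholder.
source:
  type: aerospike
  config:
    hosts:
      - - localhost
        - 3000
    # Sample more records per set than the default of 1000;
    # use null instead to scan every record in the set
    schemaSamplingSize: 5000
    # Raise the cap on emitted schema fields from the default of 300
    maxSchemaSize: 500
    # -1 infers nested fields at all levels; 0 disables schema inference
    inferSchemaDepth: -1
```

Larger sampling sizes and unlimited depth will likely make ingestion slower on big sets, so it may be worth starting near the defaults.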

CLI based Ingestion

Install the Plugin

The aerospike source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: aerospike
  config:
    # Coordinates
    hosts:
      - - host1
        - 3000
      - - host2
        - 3000

    # Credentials
    username: user
    password: pass

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
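
For instance, the dotted names namespace_pattern.allow, namespace_pattern.deny, and namespace_pattern.ignoreCase listed below would be written as nested keys in a recipe. The regex patterns in this sketch are hypothetical examples.

```yaml
# Sketch only: the regex patterns are hypothetical.
source:
  type: aerospike
  config:
    namespace_pattern:        # "namespace_pattern" in the table
      allow:                  # "namespace_pattern.allow"
        - "^prod_.*"
      deny:                   # "namespace_pattern.deny"
        - ".*_tmp$"
      ignoreCase: true        # "namespace_pattern.ignoreCase"
```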

| Field | Description |
| --- | --- |
| auth_mode (Enum) | The authentication mode with the server. Default: 0 |
| ignore_empty_sets (boolean) | Ignore empty sets during schema inference. Default: False |
| include_xdr (boolean) | Include XDR information in the dataset properties. Default: False |
| inferSchemaDepth (integer) | The depth of nested fields to infer schema. If set to -1, infer schema at all levels. If set to 0, does not infer the schema. Default: 1 |
| login_timeout_ms (integer) | Login timeout in milliseconds. Default None, using the default value of the Aerospike Python client. |
| maxSchemaSize (integer) | Maximum number of fields to include in the schema. Default: 300 |
| password (string) | Aerospike password. |
| platform_instance (string) | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
| records_per_second (integer) | Number of records per second for Aerospike query. 0 means no limit. Default: 0 |
| schemaSamplingSize (integer) | Number of records to use when inferring schema. If set to null, all records will be scanned. Default: 1000 |
| tls_cafile (string) | Path to the CA certificate file. |
| tls_capath (string) | Path to a directory of CA certificates. |
| tls_enabled (boolean) | Whether to use TLS for the connection. Default: False |
| username (string) | Aerospike username. |
| env (string) | The environment that all assets produced by this connector belong to. Default: PROD |
| hosts (array) | Aerospike hosts list. Default: [['localhost', 3000]] |
| hosts.array (array) | |
| hosts.array.object (object) | |
| namespace_pattern (AllowDenyPattern) | Regex patterns for namespaces to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| namespace_pattern.ignoreCase (boolean) | Whether to ignore case sensitivity during pattern matching. Default: True |
| namespace_pattern.allow (array) | List of regex patterns to include in ingestion. Default: ['.*'] |
| namespace_pattern.allow.string (string) | |
| namespace_pattern.deny (array) | List of regex patterns to exclude from ingestion. Default: [] |
| namespace_pattern.deny.string (string) | |
| set_pattern (AllowDenyPattern) | Regex patterns for sets to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| set_pattern.ignoreCase (boolean) | Whether to ignore case sensitivity during pattern matching. Default: True |
| set_pattern.allow (array) | List of regex patterns to include in ingestion. Default: ['.*'] |
| set_pattern.allow.string (string) | |
| set_pattern.deny (array) | List of regex patterns to exclude from ingestion. Default: [] |
| set_pattern.deny.string (string) | |
| stateful_ingestion (StatefulStaleMetadataRemovalConfig) | Base specialized config for Stateful Ingestion with stale metadata removal capability. |
| stateful_ingestion.enabled (boolean) | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. Default: False |
| stateful_ingestion.remove_stale_metadata (boolean) | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
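
To show how several of the options above might fit together, here is a hedged sketch of a recipe for a TLS-enabled cluster with throttled reads and set filtering. The hostname, port, user name, certificate path, environment variable name, and patterns are all hypothetical; only the option names are taken from the table.

```yaml
# Sketch only: host, port, credentials, paths, and patterns are hypothetical.
source:
  type: aerospike
  config:
    hosts:
      - - aerospike.example.internal
        - 4333
    username: datahub_reader
    password: "${AEROSPIKE_PASSWORD}"   # assumed to be set in the environment

    # TLS settings
    tls_enabled: true
    tls_cafile: /etc/ssl/certs/aerospike-ca.pem

    # Fail fast on slow logins and cap read throughput (0 would mean no limit)
    login_timeout_ms: 5000
    records_per_second: 1000

    # Skip empty sets and anything that looks like a test set
    ignore_empty_sets: true
    set_pattern:
      deny:
        - "^test_.*"

sink:
  # sink configs
```

Throttling with records_per_second trades ingestion speed for a lighter load on the cluster, which may be preferable when sampling production namespaces.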

Code Coordinates

  • Class Name: datahub.ingestion.source.aerospike.AerospikeSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Aerospike, feel free to ping us on our Slack.