Secondary index queries

Record bin values can be optionally indexed with a secondary index to support faster queries than equivalent primary index queries. When multiple secondary indexes are available to a query it should filter by the most selective index (meaning the index with the lowest entries_per_bval).

Foreground queries

Similar to the SELECT statement in a relational database, a foreground secondary index query is read-only, and has the query features of being partitioned and partition-tolerant, supporting pagination and filtering with an expression.

When using the client, a secondary index query starts with assigning an index filter. The records matched by the index filter are read locally, as secondary indexes are colocated on the same node as the data partition, and can be filtered further with a filter expression. Together, the filters act as a WHERE clause.

Foreground secondary index queries have the ability to:

Filter records by set name.
Filter records with an index filter (required).
Filter records with filter expressions such as last_update() > X.
Return only record digests and metadata (generation and TTL).
Return specified bins (projection).
Return a specified number of records with the ability to continue the query from that point (pagination).
Query by data partition (partitioning).
Leverage the short query runtime optimization.

Limiting query speed

Each individual query can be capped to run at a specified records per-second limit.

SREs can enforce that the totality of a user’s commands, including queries, are limited to a record per-second rate quota.

Caveats

Secondary indexes are built independently on each cluster node, a structure for each data partition, which means that a secondary index query hits all the nodes. For better performance, secondary indexes should not be used when key-based commands are an alternative.

For example, secondary indexes should not be built on a “primary key”. The record already has an entry in the primary index for direct, low latency access with read, upsert and delete commands.

Similarly, secondary indexes are not the most efficient way to implement a unique index query (where the entries_per_bval index statistic is equal to 1). Instead, a lookup table provides significantly better performance and throughput. A distinct set can be used to insert all the unique identifiers as record keys with a digest stored in a record bin. A single read operation against the lookup table retrieves the digest of the actual record, which is then used to access that record directly in a subsequent read command. This two-phase lookup pattern is one or two orders of magnitude faster than a secondary index query to find zero or one record.

Index filters

Index filters define the predicate to be used directly against a secondary index. The following index filters are available:

Equals comparison against indexes on string, blob or integer values.
Range comparisons against indexes on integer values. Range query results are inclusive, that is both specified values are included in the results.
Point-In-Region or Region-Contain-Point comparisons against geospatial indexes.

Background queries

A client application can issue an asynchronous background query to modify records in place on the server. This is similar to an UPDATE statement in a relational database.

A secondary index background query starts with assigning an index filter. The records matched by the index filter can be filtered further with a filter expression. Together, the filters act as a WHERE clause.

Background queries apply either multiple native bin operations or a user-defined function (UDF) written in Lua to the records matched by the query. Using bin operations (AKA background ops) is often more efficient and higher performing than using a Lua UDF (AKA background UDF).

Background secondary index queries have the following features:

Filter records by set name.
Filter records with an index filter (required).
Filter records with filter expressions such as last_update() > X.
Clients can poll for progress and completion of a background query.

Background queries are not rebalancing-tolerant and might miss records when a partition is migrating.

Limitations

Aerospike supports up to 256 secondary indexes per namespace.
String and blob values bigger than 2KiB cannot be indexed.
Database 6.0-compatible clients are needed for partition-tolerant queries.