Create Vector Index
This chapter describes how to create a vector index, which is required for vector similarity search. Without a vector index, similarity search cannot be performed in Hippo. The following example creates an IVF_FLAT index with 10 cluster centers ("nlist" : 10) and uses Euclidean distance (L2) as the similarity metric.
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "field_name" : "book_intro", "index_name" : "ivf_flat_index", "metric_type" : "l2", "index_type": "IVF_FLAT", "params": { "nlist" : 10 } }';
Result:
{ "acknowledged" : true }
Parameter description:
Parameters | Description | Options |
---|---|---|
table | Table name, such as "book" created in this example | |
database_name | Database where the destination table is located | |
field_name | Vector column on which the vector index is to be created | |
index_name | Vector index name | |
metric_type | Vector similarity metric type used to measure similarity between vectors | L2 (Euclidean distance), IP (Inner product) |
index_type | Vector index type | FLAT, IVF_FLAT, IVF_SQ, IVF_PQ, IVF_PQ_FS, HNSW |
params | Vector index parameters, which depend on the vector index type | |
Some indexes, such as PQ and SQ, trade some search accuracy for a better compression ratio. For these indexes, Hippo provides an additional refine parameter, "index_slow_refine", in params when creating the vector index. It improves recall by increasing the number of records retrieved (topk). Here is an example of building such an index:
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_embedding_index?pretty' -H 'Content-Type: application/json' -d' { "field_name": "vec", "index_name": "ivf_sq_index", "metric_type": "L2", "index_type": "ivf_sq", "params": { "nlist": 512, "sq_type": "SQ8", "index_slow_refine": "true" } }';
Parameter description (Create Vector Index with refine parameter, RESTful API):
Parameters | Description | Options |
---|---|---|
index_slow_refine (params) | Improves recall rate, mainly used for PQ and SQ | Defaults to "false" |
Activate Vector Index
After a vector index is created, Hippo does not activate it automatically, because building the index is resource-intensive when the volume of data stored in Hippo is large. Users can activate the index at a suitable time. Once the index is activated, Hippo maintains it automatically and updates it synchronously whenever the corresponding data is added, deleted, or updated. This chapter describes how to activate a vector index.
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_activate_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "ivf_flat_index", "wait_for_completion" : true, "timeout" : "2m" }';
Result:
{ "job_id" : "2c7fcdd7cd23479bb8660a86703e5c2f", "job_status" : "SHIVA_JOB_SUCCESS", "embedding_number" : 100, "task_results" : [ { "id" : "cecfe25db0b840b899cb5b3806bdcfcc", "status" : "TASK_SUCCESS", "server" : "172.29.40.26:27861", "embedding_number" : 100, "execute_time" : 0.233 } ] }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the destination table is located | No, defaults to "default" database |
index_name | Vector index name | Yes |
wait_for_completion | Whether to wait until the job is done | Yes |
timeout | Operation timeout | Required if "wait_for_completion" is set to true |
When a vector index is activated, Hippo scans the stored data to build the index. As mentioned, this takes time when the data volume is large, so users can set "wait_for_completion" to false to activate the index asynchronously. In that case, Hippo returns the activation job ID, which can be used to check the job status. After the activation job completes, the number of vectors used to build the index is reported. The job does not negatively affect write operations.
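As a minimal sketch of asynchronous activation, the request below sets "wait_for_completion" to false and omits "timeout", on the assumption that the timeout is only needed when waiting, as the table above suggests. The helper names `activation_body` and `activate_async` are hypothetical; the table "book", the database "default", and the index "ivf_flat_index" are reused from the examples in this document.

```shell
# Hypothetical helper: print the JSON body for an asynchronous activation.
# $1 = index name
activation_body() {
  printf '{ "index_name" : "%s", "wait_for_completion" : false }' "$1"
}

# Hypothetical helper: send the activation request without waiting for the job.
# $1 = table, $2 = database, $3 = index name
activate_async() {
  curl -u shiva:shiva -XPOST \
    "localhost:8902/hippo/v1/$1/_activate_embedding_index?database_name=$2&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(activation_body "$3")"
}

# Usage (requires a running Hippo server):
#   activate_async book default ivf_flat_index
```

The response to such a request should carry the job ID described above; how the job status is queried afterwards is not covered in this chapter.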
Release Vector Index
The vector indexes that Hippo currently supports are in-memory indexes. Hippo supports releasing an index to reduce memory consumption. A released vector index no longer occupies memory, but it cannot be used for vector search.
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_release_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "ivf_flat_index", "wait_for_completion" : true, "timeout" : "2m" }';
Result:
{ "job_id" : "3d585076c8994ef881eab2f16daaf096", "job_status" : "SHIVA_JOB_SUCCESS", "embedding_number" : 100, "task_results" : [ { "id" : "0c4502bcefd4436a9efdddf78902c7a7", "status" : "TASK_SUCCESS", "server" : "172.29.40.26:27861", "embedding_number" : 100, "execute_time" : 0.014 } ] }
Parameter description:
The parameters for releasing an index are similar to those for activating an index. After the release completes, the number of vectors released is returned. The job does not negatively affect write operations.
Load Vector Index
Before a vector index can be used, make sure that it has been loaded into memory. This chapter describes how to load an index.
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_load_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "ivf_flat_index", "wait_for_completion" : true, "timeout" : "2m" }';
Result:
{ "job_id" : "b0dc39c6023041228a5e2888174f6396", "job_status" : "SHIVA_JOB_SUCCESS", "embedding_number" : 100, "task_results" : [ { "id" : "ee4db1d2c5ee455db85db072b90b5e4a", "status" : "TASK_SUCCESS", "server" : "172.29.40.26:27861", "embedding_number" : 100, "execute_time" : 0.014 } ] }
Parameter description:
The parameters for loading an index are similar to those for activating an index. After the load completes, the number of vectors loaded is returned. The job does not negatively affect write operations.
Check Vector Index
Users can check detailed information about a vector index. Hippo returns the segment information of the indexed vectors held by each shard.
curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_get_embedding_index?database_name={database_name}&pretty'
Result:
{ "default#book" : { "table_id" : "361165c970d1470886c435a86e3eae2f", "shards" : [ { "tablet_id" : 0, "address" : "172.29.40.26:27851", "indexes" : [ { "index_id" : 0, "state" : "PUBLIC", "flat_segments" : [ { "min_id" : -1, "max_id" : -1, "embedding_num" : 0, "deleted_embedding_num" : 0 }, { "min_id" : 0, "max_id" : 99, "embedding_num" : 100, "deleted_embedding_num" : 0 } ] } ] } ] } }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | |
database_name | Database where the table is located | No, defaults to "default" database |
Delete Vector Index
This chapter introduces how to delete a vector index in Hippo.
curl -u shiva:shiva -XDELETE 'localhost:8902/hippo/v1/{table}/_drop_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "book_intro_index" }';
Result:
{ "acknowledged" : true }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
index_name | Index to be deleted. To delete multiple indexes, list all names as: ["a","b","c"] | Yes |
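The following sketch shows one way to delete several vector indexes in a single request. It assumes that the list form ["a","b","c"] described in the table is passed as the value of "index_name". The helpers `json_name_list` and `drop_vector_indexes` are hypothetical, and the index names are taken from the earlier examples.

```shell
# Hypothetical helper: turn its arguments into a JSON array of strings.
# Assumes the names contain no quotes or backslashes.
json_name_list() {
  out=""
  for name in "$@"; do
    out="${out:+$out,}\"$name\""
  done
  printf '[%s]' "$out"
}

# Hypothetical helper: drop one or more vector indexes from a table.
# $1 = table, $2 = database, remaining arguments = index names
drop_vector_indexes() {
  table=$1; database=$2; shift 2
  curl -u shiva:shiva -XDELETE \
    "localhost:8902/hippo/v1/$table/_drop_embedding_index?database_name=$database&pretty" \
    -H 'Content-Type: application/json' \
    -d "{ \"index_name\" : $(json_name_list "$@") }"
}

# Usage (requires a running Hippo server):
#   drop_vector_indexes book default ivf_flat_index ivf_sq_index
```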
Enable/Disable Vector Index Auto-Compaction
By default, Hippo compacts index segments automatically. Users can enable or disable auto-compaction with the command below.
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_embedding_index_auto_compaction?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "pq_index", "enable_auto_compaction" : true, "wait_for_completion" : true, "timeout" : "2m" }';
Result:
{ "job_id" : "737e8f256f7141c69d4871bab51fadb6", "job_status" : "SHIVA_JOB_SUCCESS", "task_results" : [ { "id" : "a6ea4f51db754485b0c1bcba54f201ab", "status" : "TASK_SUCCESS", "server" : "tw-node48:8702", "execute_time" : 0.0 } ] }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
enable_auto_compaction | Enable/disable auto-compaction | Yes |
index_name | Vector index name | Yes |
wait_for_completion | Whether to wait until the job is done | Yes |
timeout | Operation timeout | Required if "wait_for_completion" is set to true |
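The example above enables auto-compaction. As a sketch of the opposite case, the helper below builds the same request body with "enable_auto_compaction" passed as an argument, so that false disables it. The function names are hypothetical; the index name "pq_index" and the "2m" timeout are reused from the example above.

```shell
# Hypothetical helper: print the request body.
# $1 = index name, $2 = true or false
auto_compaction_body() {
  printf '{ "index_name" : "%s", "enable_auto_compaction" : %s, "wait_for_completion" : true, "timeout" : "2m" }' "$1" "$2"
}

# Hypothetical helper: enable or disable auto-compaction for an index.
# $1 = table, $2 = database, $3 = index name, $4 = true or false
set_auto_compaction() {
  curl -u shiva:shiva -XPOST \
    "localhost:8902/hippo/v1/$1/_embedding_index_auto_compaction?database_name=$2&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(auto_compaction_body "$3" "$4")"
}

# Usage (requires a running Hippo server):
#   set_auto_compaction book default pq_index false
```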
Compact Vector Index Manually
curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_compact_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "pq_index", "wait_for_completion" : true, "timeout" : "2m" }';
Result:
{ "job_id" : "da4ce8d7f032415e87456c697f6592c3", "job_status" : "SHIVA_JOB_SUCCESS", "embedding_number" : 0, "task_results" : [ { "id" : "bef72e410cd44d56a2f50a444353bf2c", "status" : "TASK_SUCCESS", "server" : "tw-node48:8702", "embedding_number" : 0, "execute_time" : 0.0 } ] }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
index_name | Vector index name | Yes |
wait_for_completion | Whether to wait until the job is done | Yes |
timeout | Operation timeout | Required if "wait_for_completion" is set to true |
Create Scalar Index
Unlike vectors, which have both magnitude and direction, scalars have only magnitude. As in a traditional database, users can build indexes on scalar fields to speed up operations on those fields.
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "index", "field_names" : ["word_count"] }';
Result:
{ "acknowledged" : true }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
index_name | Scalar index name | Yes |
field_names | Fields on which the scalar index is built. Hippo currently supports single-column and composite indexes. | Yes |
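As a sketch of a composite index, the helper below builds a request body with more than one entry in "field_names". It assumes that a composite index is requested simply by listing several fields. The function names, the index name "composite_index", and the choice of the "book_id" and "word_count" columns are illustrative assumptions, not taken from the Hippo documentation.

```shell
# Hypothetical helper: print the body for a scalar index over one or more fields.
# $1 = index name, remaining arguments = field names
# (assumed to contain no quotes or backslashes)
scalar_index_body() {
  index=$1; shift
  fields=""
  for f in "$@"; do
    fields="${fields:+$fields, }\"$f\""
  done
  printf '{ "index_name" : "%s", "field_names" : [%s] }' "$index" "$fields"
}

# Hypothetical helper: create the scalar index.
# $1 = table, $2 = database, $3 = index name, remaining arguments = field names
create_scalar_index() {
  table=$1; database=$2; shift 2
  curl -u shiva:shiva -XPUT \
    "localhost:8902/hippo/v1/$table/_create_scalar_index?database_name=$database&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(scalar_index_body "$@")"
}

# Usage (requires a running Hippo server):
#   create_scalar_index book default composite_index book_id word_count
```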
Create Array Index
Hippo supports arrays that contain scalars only. Creating an index on an array column can improve the performance of hybrid search. This chapter introduces the array type and describes how to create an array index in Hippo.
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "settings": { "number_of_shards" : 1, "number_of_replicas" : 1 }, "schema": { "fields": [ { "name": "book_id", "is_primary_key": true, "data_type": "int64" }, { "name": "word_count", "is_primary_key": false, "data_type": "int64" }, { "name": "attributes", "is_primary_key": false, "data_type": "array", "type_params": { "element_data_type" : "string" } }, { "name": "book_intro", "data_type": "float_vector", "is_primary_key": false, "type_params": { "dimension" : 2 } } ] } }';
The example above creates a String array column called "attributes" for the table "book". The related parameter is listed below:
Parameters | Description | Options |
---|---|---|
element_data_type (type_params) | Data type of elements stored in array | Int8, Int16, Int32, Int64, String, Float, Double, Bool |
Hippo supports creating an index on an array column, and the API used for array columns is the same as the one used for other columns. However, users should pay attention to the following points:
- An array index must be a single-column index.
- Only array columns whose elements have an Int or String data type can be indexed.
- Currently, an array index only supports conditions with the "=" or "in" operators.
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book/_create_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "array_index", "field_names" : ["attributes"] }';
Result:
{ "acknowledged" : true }
The insert operation is similar to:
curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book/_bulk?database_name={database_name}&pretty' -H'Content-Type: application/json' -d'{ "fields_data": [ { "field_name": "book_id", "field": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100 ] }, { "field_name": "word_count", "field": [1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,14000,15000,16000,17000,18000,19000,20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000,31000,32000,33000,34000,35000,36000,37000,38000,39000,40000,41000,42000,43000,44000,45000,46000,47000,48000,49000,50000,51000,52000,53000,54000,55000,56000,57000,58000,59000,60000,61000,62000,63000,64000,65000,66000,67000,68000,69000,70000,71000,72000,73000,74000,75000,76000,77000,78000,79000,80000,81000,82000,83000,84000,85000,86000,87000,88000,89000,90000,91000,92000,93000,94000,95000,96000,97000,98000,99000,100000 ] }, { "field_name": "attributes", "field": [["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0","attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3","attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", 
"attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10","attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", 
"attr11"],["attr9", "attr10", "attr11"] ] }, { "field_name": "book_intro", "field": [[1,1],[2,1],[3,1],[4,1],[5,1],[6,1],[7,1],[8,1],[9,1],[10,1],[11,1],[12,1],[13,1],[14,1],[15,1],[16,1],[17,1],[18,1],[19,1],[20,1],[21,1],[22,1],[23,1],[24,1],[25,1],[26,1],[27,1],[28,1],[29,1],[30,1],[31,1],[32,1],[33,1],[34,1],[35,1],[36,1],[37,1],[38,1],[39,1],[40,1],[41,1],[42,1],[43,1],[44,1],[45,1],[46,1],[47,1],[48,1],[49,1],[50,1],[51,1],[52,1],[53,1],[54,1],[55,1],[56,1],[57,1],[58,1],[59,1],[60,1],[61,1],[62,1],[63,1],[64,1],[65,1],[66,1],[67,1],[68,1],[69,1],[70,1],[71,1],[72,1],[73,1],[74,1],[75,1],[76,1],[77,1],[78,1],[79,1],[80,1],[81,1],[82,1],[83,1],[84,1],[85,1],[86,1],[87,1],[88,1],[89,1],[90,1],[91,1],[92,1],[93,1],[94,1],[95,1],[96,1],[97,1],[98,1],[99,1],[100,1] ] } ], "num_rows": 100 }';
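Writing such a body by hand is tedious, so the sketch below generates a body of the same shape with awk. The helper names `bulk_body` and `insert_rows` are hypothetical. As a simplification, attributes are assigned in groups of 25 rows (attr0 to attr2, then attr3 to attr5, and so on), so the output matches the hand-written example above in structure but not row for row.

```shell
# Hypothetical generator for a bulk-insert body.
# $1 = number of rows
bulk_body() {
  awk -v n="$1" 'BEGIN {
    ids = ""; counts = ""; attrs = ""; vecs = ""
    for (i = 1; i <= n; i++) {
      sep = (i > 1) ? "," : ""
      g = int((i - 1) / 25) * 3        # first attribute number for this row
      ids    = ids    sep i
      counts = counts sep (i * 1000)
      attrs  = attrs  sep "[\"attr" g "\",\"attr" (g + 1) "\",\"attr" (g + 2) "\"]"
      vecs   = vecs   sep "[" i ",1]"
    }
    printf "{ \"fields_data\": [ "
    printf "{ \"field_name\": \"book_id\", \"field\": [%s] }, ", ids
    printf "{ \"field_name\": \"word_count\", \"field\": [%s] }, ", counts
    printf "{ \"field_name\": \"attributes\", \"field\": [%s] }, ", attrs
    printf "{ \"field_name\": \"book_intro\", \"field\": [%s] } ", vecs
    printf "], \"num_rows\": %d }\n", n
  }'
}

# Hypothetical helper: insert the generated rows into the "book" table.
# $1 = database, $2 = number of rows
insert_rows() {
  curl -u shiva:shiva -XPUT \
    "localhost:8902/hippo/v1/book/_bulk?database_name=$1&pretty" \
    -H 'Content-Type: application/json' \
    -d "$(bulk_body "$2")"
}

# Usage (requires a running Hippo server):
#   insert_rows default 100
```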
After creating the vector index, users can perform hybrid search on array and vector data. For more details, refer to the chapter "Hybrid Search Using Array and Vector Field".
Delete Scalar Index
This chapter describes how to delete a scalar index.
curl -u shiva:shiva -XDELETE 'localhost:8902/hippo/v1/{table}/_drop_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{ "index_name" : "index" }';
Result:
{ "acknowledged" : true }
Parameter description:
Parameters | Description | Required |
---|---|---|
table | Table name, such as "book" created in this example | Yes |
database_name | Database where the table is located | No, defaults to "default" database |
index_name | Index to be deleted. To delete multiple indexes, list all names as: ["a","b","c"] | Yes |