7.6 Index Operations
Updated: 12/16/2024, 3:28:36 PM

Create Vector Index

This section describes how to create a vector index, which is used for vector similarity search. Similarity search cannot be performed in Hippo until a vector index has been created. The following example creates an IVF_FLAT index with 10 cluster centers and uses Euclidean distance (L2) as the similarity metric.

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "field_name" : "book_intro",
  "index_name" : "ivf_flat_index",
  "metric_type" : "l2",
  "index_type": "IVF_FLAT",
  "params": {
    "nlist" : 10
  }
}';

Result:

{
  "acknowledged" : true
}

Parameter description:

Table 44. Create Vector Index (Restful API)
| Parameters | Description | Options |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | |
| database_name | Database where the destination table is located | |
| field_name | Vector column where the vector index is to be created | |
| index_name | Vector index name | |
| metric_type | Vector similarity metric type used to measure similarities among vectors | L2 (Euclidean distance), IP (Inner product) |
| index_type | Vector index type | FLAT, IVF_FLAT, IVF_SQ, IVF_PQ, IVF_PQ_FS, HNSW |
| params | Vector index parameter, related to vector index type | |
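The curl call above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_create_index_request`, the `http://` scheme, and the omission of the `pretty` flag are assumptions made for illustration and are not part of the Hippo documentation. The request is only built, not sent.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request


def build_create_index_request(table, body, database_name="default",
                               base_url="http://localhost:8902",
                               user="shiva", password="shiva"):
    """Build (without sending) the PUT request that creates a vector index.

    The helper name and the http:// scheme are assumptions for this sketch.
    """
    url = (f"{base_url}/hippo/v1/{quote(table, safe='')}/_create_embedding_index"
           f"?database_name={quote(database_name, safe='')}")
    # curl -u user:password is sent as an HTTP Basic Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Same body as the IVF_FLAT example above.
body = {
    "field_name": "book_intro",
    "index_name": "ivf_flat_index",
    "metric_type": "l2",
    "index_type": "IVF_FLAT",
    "params": {"nlist": 10},
}
request = build_create_index_request("book", body)
print(request.get_method(), request.full_url)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Keeping the body as a plain dictionary makes it easy to swap in another `index_type` and its matching `params` without touching the URL logic.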

Some index types, such as PQ and SQ, sacrifice some search quality in exchange for a higher compression ratio. For these indexes, Hippo provides an additional refine parameter, “index_slow_refine”, which can be set in params when the vector index is created. It improves the recall rate by increasing the number of returned records (topk). The following example builds such an index:

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_embedding_index?pretty' -H 'Content-Type: application/json' -d'
{
  "field_name": "vec",
  "index_name": "ivf_sq_index",
  "metric_type": "L2",
  "index_type": "ivf_sq",
  "params": {
    "nlist": 512,
    "sq_type": "SQ8",
    "index_slow_refine": "true"
  }
}';

Parameter description:

Create Vector Index with refine parameter (Restful API)
| Parameters | Description | Options |
| --- | --- | --- |
| index_slow_refine (params) | Improves recall rate, mainly used for PQ, SQ | Defaults to "false" |
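To make the idea of refining concrete, here is a small, generic sketch of two-stage search: candidates are first ranked with lossy (quantized) vectors, then an enlarged candidate set is re-ranked with exact distances. This illustrates the general technique only. How Hippo implements “index_slow_refine” internally is not described in this document, and the function names, the `refine_factor` parameter, and the toy quantizer are all assumptions.

```python
import math


def l2(a, b):
    """Exact Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantize(vec, lo, hi, levels):
    """Snap every component to one of `levels` evenly spaced values (lossy)."""
    step = (hi - lo) / (levels - 1)
    return [lo + round((x - lo) / step) * step for x in vec]


def search(query, vectors, topk, refine=False, refine_factor=4, levels=4):
    """Return the indices of the topk nearest vectors.

    Stage 1 ranks by distance to the quantized vectors. With refine=True,
    stage 1 keeps topk * refine_factor candidates (hypothetical parameter)
    and stage 2 re-ranks them using the original vectors.
    """
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    compressed = [quantize(v, lo, hi, levels) for v in vectors]
    n_candidates = topk * refine_factor if refine else topk
    ranked = sorted(range(len(vectors)), key=lambda i: l2(query, compressed[i]))
    candidates = ranked[:n_candidates]
    if refine:
        candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:topk]


vectors = [[0.0, 0.0], [0.9, 0.0], [1.1, 0.0], [2.0, 0.0],
           [3.0, 0.0], [4.0, 0.0], [5.0, 0.0], [6.0, 0.0]]
query = [1.04, 0.0]
exact = sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:2]
unrefined = search(query, vectors, topk=2)
refined = search(query, vectors, topk=2, refine=True)
print(unrefined, refined, exact)
```

In this toy data set the coarse quantizer makes the unrefined search miss one true neighbor, while the refined search recovers the exact top 2 at the cost of computing more exact distances.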

Activate Vector Index

Hippo does not activate a vector index automatically after it is created, because building the index is resource intensive when a large volume of data is stored in Hippo. Users can activate the index at a convenient time. Once the index is activated, Hippo maintains it automatically and updates it synchronously whenever the corresponding data is added, deleted, or updated. This section describes how to activate a vector index.

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_activate_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "ivf_flat_index",
  "wait_for_completion" : true,
  "timeout" : "2m"
}';

Result:

{
  "job_id" : "2c7fcdd7cd23479bb8660a86703e5c2f",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "embedding_number" : 100,
  "task_results" : [
    {
      "id" : "cecfe25db0b840b899cb5b3806bdcfcc",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.40.26:27861",
      "embedding_number" : 100,
      "execute_time" : 0.233
    }
  ]
}

Parameter description:

Table 45. Activate Vector Index (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the destination table is located | No, defaults to "default" database |
| index_name | Vector index name | Yes |
| wait_for_completion | Whether to wait until the job is done | Yes |
| timeout | Operation timeout | If "wait_for_completion" is set to true, timeout parameter is required |

When a vector index is activated, Hippo scans the stored data to build the index. As mentioned above, this takes time when the data volume is large, so users can set “wait_for_completion” to false to activate the index asynchronously. In that case, Hippo returns the activation job ID, which can be used to check the job status. When the activation job completes, it reports the number of vectors used to build the index. The job does not negatively affect write operations.
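The asynchronous flow can be sketched as follows. This is a minimal sketch, not Hippo client code: the endpoint for checking a job's status is not covered in this section, so the caller supplies a `fetch_status` function (hypothetical), and every status other than "SHIVA_JOB_SUCCESS" is assumed to mean "not finished yet".

```python
import time


def activation_body(index_name, wait_for_completion=True, timeout=None):
    """Build the JSON body for _activate_embedding_index."""
    if wait_for_completion and timeout is None:
        # Per the parameter table, timeout is required in synchronous mode.
        raise ValueError("timeout is required when wait_for_completion is true")
    body = {"index_name": index_name, "wait_for_completion": wait_for_completion}
    if timeout is not None:
        body["timeout"] = timeout
    return body


def wait_for_job(job_id, fetch_status, interval=1.0, max_polls=60,
                 sleep=time.sleep):
    """Poll fetch_status(job_id) until it reports SHIVA_JOB_SUCCESS.

    fetch_status is a caller-supplied callable (hypothetical); any other
    status is treated as still running, which is an assumption.
    Returns the number of polls that were needed.
    """
    for attempt in range(max_polls):
        if fetch_status(job_id) == "SHIVA_JOB_SUCCESS":
            return attempt + 1
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not succeed after {max_polls} polls")


# Usage with a stand-in status source, so no server is involved.
statuses = iter(["STILL_RUNNING_PLACEHOLDER", "SHIVA_JOB_SUCCESS"])
polls = wait_for_job("2c7fcdd7cd23479bb8660a86703e5c2f",
                     fetch_status=lambda job_id: next(statuses),
                     sleep=lambda seconds: None)
async_body = activation_body("ivf_flat_index", wait_for_completion=False)
print(async_body, polls)
```

Injecting `fetch_status` and `sleep` keeps the polling loop testable and avoids hard-coding an endpoint that this section does not document.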

Release Vector Index

The vector indexes that Hippo currently supports are all in-memory indexes. To reduce memory consumption, Hippo supports releasing an index. A released vector index no longer occupies memory, but it also cannot be used for vector search.

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_release_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "ivf_flat_index",
  "wait_for_completion" : true,
  "timeout" : "2m"
}';

Result:

{
  "job_id" : "3d585076c8994ef881eab2f16daaf096",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "embedding_number" : 100,
  "task_results" : [
    {
      "id" : "0c4502bcefd4436a9efdddf78902c7a7",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.40.26:27861",
      "embedding_number" : 100,
      "execute_time" : 0.014
    }
  ]
}

Parameter description:

The parameters used for index release are similar to those used for index activation. When the release completes, the number of vectors released is returned. The job does not negatively affect write operations.
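A job response such as the one above can be condensed into a short summary. The sketch below assumes the response has the shape shown in the example; the function name `summarize_job` and the rule that a job counts as succeeded only when every task reports "TASK_SUCCESS" are illustrative choices, not documented behavior.

```python
import json

# Release response copied from the example above.
RELEASE_RESPONSE = """
{
  "job_id" : "3d585076c8994ef881eab2f16daaf096",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "embedding_number" : 100,
  "task_results" : [
    {
      "id" : "0c4502bcefd4436a9efdddf78902c7a7",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.40.26:27861",
      "embedding_number" : 100,
      "execute_time" : 0.014
    }
  ]
}
"""


def summarize_job(response_text):
    """Reduce a job response to the fields most callers care about."""
    result = json.loads(response_text)
    tasks = result.get("task_results", [])
    return {
        "job_id": result["job_id"],
        # Assumption: success requires the job and all of its tasks to succeed.
        "succeeded": result["job_status"] == "SHIVA_JOB_SUCCESS"
        and all(task["status"] == "TASK_SUCCESS" for task in tasks),
        "embedding_number": result.get("embedding_number", 0),
        "servers": sorted({task["server"] for task in tasks}),
    }


summary = summarize_job(RELEASE_RESPONSE)
print(summary)
```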

Load Vector Index

Before using a vector index, make sure it has been loaded into memory. This section describes how to load an index.

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_load_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "ivf_flat_index",
  "wait_for_completion" : true,
  "timeout" : "2m"
}';

Result:

{
  "job_id" : "b0dc39c6023041228a5e2888174f6396",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "embedding_number" : 100,
  "task_results" : [
    {
      "id" : "ee4db1d2c5ee455db85db072b90b5e4a",
      "status" : "TASK_SUCCESS",
      "server" : "172.29.40.26:27861",
      "embedding_number" : 100,
      "execute_time" : 0.014
    }
  ]
}

Parameter description:

The parameters used for index loading are similar to those used for index activation. When loading completes, the number of vectors loaded is returned. The job does not negatively affect write operations.
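Activation, release, and loading share the same request shape and differ only in the endpoint name. The following sketch maps an action to its endpoint and returns the pieces of the HTTP call; the helper name `index_job_call` and the action keywords are assumptions, while the endpoint names are taken from the curl examples in this section.

```python
import json

# Endpoint names as they appear in the curl examples of this section.
LIFECYCLE_ENDPOINTS = {
    "activate": "_activate_embedding_index",
    "release": "_release_embedding_index",
    "load": "_load_embedding_index",
}


def index_job_call(action, table, index_name, database_name="default",
                   wait_for_completion=True, timeout="2m"):
    """Return (method, path, json_body) for one index lifecycle job."""
    if action not in LIFECYCLE_ENDPOINTS:
        raise ValueError(f"unknown action: {action!r}")
    path = (f"/hippo/v1/{table}/{LIFECYCLE_ENDPOINTS[action]}"
            f"?database_name={database_name}")
    body = {
        "index_name": index_name,
        "wait_for_completion": wait_for_completion,
        "timeout": timeout,
    }
    return "POST", path, json.dumps(body)


# Free memory now, bring the index back before the next search.
calls = [index_job_call(action, "book", "ivf_flat_index")
         for action in ("release", "load")]
for method, path, body in calls:
    print(method, path, body)
```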

Check Vector Index

Users can check the details of a table's vector indexes. Hippo returns the segment information of the indexed vectors on each shard.

curl -u shiva:shiva -XGET 'localhost:8902/hippo/v1/{table}/_get_embedding_index?database_name={database_name}&pretty'

Result:

{
  "default#book" : {
    "table_id" : "361165c970d1470886c435a86e3eae2f",
    "shards" : [
      {
        "tablet_id" : 0,
        "address" : "172.29.40.26:27851",
        "indexes" : [
          {
            "index_id" : 0,
            "state" : "PUBLIC",
            "flat_segments" : [
              {
                "min_id" : -1,
                "max_id" : -1,
                "embedding_num" : 0,
                "deleted_embedding_num" : 0
              },
              {
                "min_id" : 0,
                "max_id" : 99,
                "embedding_num" : 100,
                "deleted_embedding_num" : 0
              }
            ]
          }
        ]
      }
    ]
  }
}

Parameter description:

Table 46. Check Vector Index (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | |
| database_name | Database where the table is located | No, defaults to "default" database |
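The nested response can be reduced to per-table totals with a short traversal. This sketch assumes the response shape shown above and only reads the "flat_segments" list that appears in the example; other index types may report segments under different keys, which this document does not describe. The function name is hypothetical.

```python
import json

# Response copied from the example above.
CHECK_RESPONSE = """
{
  "default#book" : {
    "table_id" : "361165c970d1470886c435a86e3eae2f",
    "shards" : [
      {
        "tablet_id" : 0,
        "address" : "172.29.40.26:27851",
        "indexes" : [
          {
            "index_id" : 0,
            "state" : "PUBLIC",
            "flat_segments" : [
              {"min_id" : -1, "max_id" : -1, "embedding_num" : 0, "deleted_embedding_num" : 0},
              {"min_id" : 0, "max_id" : 99, "embedding_num" : 100, "deleted_embedding_num" : 0}
            ]
          }
        ]
      }
    ]
  }
}
"""


def count_embeddings(response_text):
    """Return {table_key: (embedding_num total, deleted_embedding_num total)}."""
    totals = {}
    for table_key, table in json.loads(response_text).items():
        embeddings = deleted = 0
        for shard in table["shards"]:
            for index in shard["indexes"]:
                # Only the segment key seen in the example is handled here.
                for segment in index.get("flat_segments", []):
                    embeddings += segment["embedding_num"]
                    deleted += segment["deleted_embedding_num"]
        totals[table_key] = (embeddings, deleted)
    return totals


totals = count_embeddings(CHECK_RESPONSE)
print(totals)
```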

Delete Vector Index

This section describes how to delete a vector index in Hippo.

curl -u shiva:shiva -XDELETE 'localhost:8902/hippo/v1/{table}/_drop_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "book_intro_index"
}';

Result:

{
  "acknowledged" : true
}

Parameter description:

Table 47. Delete Vector Index (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to "default" database |
| index_name | Index to be deleted. If users would like to delete multiple indexes, they should list all names as: ["a","b","c"] | Yes |
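Because "index_name" accepts either a single name or a list of names, a small helper can produce the right body shape in both cases. The helper name `drop_index_body` is hypothetical; the body layout follows the table above.

```python
import json


def drop_index_body(*index_names):
    """Build the body for _drop_embedding_index.

    One name is sent as a string, several names as a JSON list.
    """
    if not index_names:
        raise ValueError("at least one index name is required")
    if len(index_names) == 1:
        return {"index_name": index_names[0]}
    return {"index_name": list(index_names)}


single = json.dumps(drop_index_body("book_intro_index"))
several = json.dumps(drop_index_body("a", "b", "c"))
print(single)
print(several)
```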

Enable/Disable Vector Index Auto-Compaction

By default, Hippo compacts index segments automatically. Users can enable or disable auto-compaction with the following command.

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_embedding_index_auto_compaction?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "pq_index",
  "enable_auto_compaction" : true,
  "wait_for_completion" : true,
  "timeout" : "2m"
}';

Result:

{
  "job_id" : "737e8f256f7141c69d4871bab51fadb6",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "task_results" : [
    {
      "id" : "a6ea4f51db754485b0c1bcba54f201ab",
      "status" : "TASK_SUCCESS",
      "server" : "tw-node48:8702",
      "execute_time" : 0.0
    }
  ]
}

Parameter description:

Table 48. Enable/Disable Vector Index Auto-Compaction (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to "default" database |
| enable_auto_compaction | Enable/disable auto-compaction | Yes |
| index_name | Vector index name | Yes |
| wait_for_completion | Whether to wait until the job is done | Yes |
| timeout | Operation timeout | If "wait_for_completion" is set to true, timeout parameter is required |

Compact Vector Index Manually

curl -u shiva:shiva -XPOST 'localhost:8902/hippo/v1/{table}/_compact_embedding_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "pq_index",
  "wait_for_completion" : true,
  "timeout" : "2m"
}';

Result:

{
  "job_id" : "da4ce8d7f032415e87456c697f6592c3",
  "job_status" : "SHIVA_JOB_SUCCESS",
  "embedding_number" : 0,
  "task_results" : [
    {
      "id" : "bef72e410cd44d56a2f50a444353bf2c",
      "status" : "TASK_SUCCESS",
      "server" : "tw-node48:8702",
      "embedding_number" : 0,
      "execute_time" : 0.0
    }
  ]
}

Parameter description:

Table 49. Compact Vector Index Manually (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to "default" database |
| index_name | Vector index name | Yes |
| wait_for_completion | Whether to wait until the job is done | Yes |
| timeout | Operation timeout | If "wait_for_completion" is set to true, timeout parameter is required |

Create Scalar Index

Unlike vectors, which have both magnitude and direction, scalars have only magnitude. As in a traditional database, users can build indexes on scalar fields to speed up operations on them.

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/{table}/_create_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "index",
  "field_names" : ["word_count"]
}';

Result:

{
  "acknowledged" : true
}

Parameter description:

Table 50. Create Scalar Index (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to "default" database |
| index_name | Scalar index name | Yes |
| field_names | Fields on which the scalar index is built. Hippo currently supports single-column and composite indexes. | Yes |
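The difference between a single-column and a composite index is only the length of the "field_names" list. The sketch below normalizes both cases into one body; the helper name, the index name "id_count_index", and the composite pairing of "book_id" with "word_count" are made up for illustration.

```python
def scalar_index_body(index_name, field_names):
    """Build the body for _create_scalar_index.

    Accepts one field name or a sequence of names (composite index).
    """
    if isinstance(field_names, str):
        field_names = [field_names]
    field_names = list(field_names)
    if not field_names:
        raise ValueError("at least one field name is required")
    if len(set(field_names)) != len(field_names):
        raise ValueError("field names must not repeat")
    return {"index_name": index_name, "field_names": field_names}


single_column = scalar_index_body("index", "word_count")
# Hypothetical composite index over two scalar columns.
composite = scalar_index_body("id_count_index", ["book_id", "word_count"])
print(single_column)
print(composite)
```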

Create Array Index

Hippo supports only arrays whose elements are scalars. Creating an index on an array column can improve the performance of hybrid search. This section introduces the array type and describes how to create an array index in Hippo.

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "settings": {
    "number_of_shards" : 1,
    "number_of_replicas" : 1
  },
  "schema": {
    "fields": [
      {
        "name": "book_id",
        "is_primary_key": true,
        "data_type": "int64"
      },
      {
        "name": "word_count",
        "is_primary_key": false,
        "data_type": "int64"
      },
      {
        "name": "attributes",
        "is_primary_key": false,
        "data_type": "array",
        "type_params": {
          "element_data_type" : "string"
        }
      },
      {
        "name": "book_intro",
        "data_type": "float_vector",
        "is_primary_key": false,
        "type_params": {
          "dimension" : 2
        }
      }
    ]
  }
}';

The example shown above creates a String array column called “attributes” for table “book”, and the related parameters are listed below:

Table 51. Data Type of An Array in Hippo
| Parameters | Description | Options |
| --- | --- | --- |
| element_data_type (type) | Data type of elements stored in array | Int8, Int16, Int32, Int64, String, Float, Double, Bool |

Hippo supports creating an index on an array column, and the API is the same as the one used for other columns. However, users should note the following points:
- The array index must be a single-column index.
- Only array columns whose elements are of Int or String data type can be indexed.
- Currently, an array index only supports conditions that use the “=” or “in” operators.
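The three restrictions above can be checked on the client side before the request is sent. This is a sketch under assumptions: "Int" is taken to mean Int8 through Int64, type names are compared case-insensitively, and both function names are hypothetical. Hippo itself remains the authority on what it accepts.

```python
# Assumption: "Int" covers the four integer widths listed in Table 51.
INDEXABLE_ELEMENT_TYPES = {"int8", "int16", "int32", "int64", "string"}
SUPPORTED_OPERATORS = {"=", "in"}


def check_scalar_index_fields(schema_fields, field_names):
    """Raise ValueError if an array column is indexed against the rules above."""
    by_name = {field["name"]: field for field in schema_fields}
    for name in field_names:
        field = by_name[name]
        if field["data_type"] != "array":
            continue
        if len(field_names) != 1:
            raise ValueError("an array index must be a single-column index")
        element_type = field["type_params"]["element_data_type"].lower()
        if element_type not in INDEXABLE_ELEMENT_TYPES:
            raise ValueError(f"array of {element_type} cannot be indexed")
    return True


def array_index_supports(operator):
    """Tell whether a filter operator is listed as supported for array indexes."""
    return operator.lower() in SUPPORTED_OPERATORS


# Fields from the "book" table created above, plus one made-up float array.
schema = [
    {"name": "word_count", "data_type": "int64"},
    {"name": "attributes", "data_type": "array",
     "type_params": {"element_data_type": "string"}},
    {"name": "scores", "data_type": "array",
     "type_params": {"element_data_type": "float"}},  # hypothetical column
]
print(check_scalar_index_fields(schema, ["attributes"]))
print(array_index_supports("in"), array_index_supports(">"))
```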

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book/_create_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "array_index",
  "field_names" : ["attributes"]
}';

Result:

{
  "acknowledged" : true
}

The insert operation is similar to:

curl -u shiva:shiva -XPUT 'localhost:8902/hippo/v1/book/_bulk?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "fields_data": [
    {
      "field_name": "book_id",
      "field": [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
      ]
    },
    {
      "field_name": "word_count",
      "field": [1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,14000,15000,16000,17000,18000,19000,20000,21000,22000,23000,24000,25000,26000,27000,28000,29000,30000,31000,32000,33000,34000,35000,36000,37000,38000,39000,40000,41000,42000,43000,44000,45000,46000,47000,48000,49000,50000,51000,52000,53000,54000,55000,56000,57000,58000,59000,60000,61000,62000,63000,64000,65000,66000,67000,68000,69000,70000,71000,72000,73000,74000,75000,76000,77000,78000,79000,80000,81000,82000,83000,84000,85000,86000,87000,88000,89000,90000,91000,92000,93000,94000,95000,96000,97000,98000,99000,100000
      ]
    },
    {
      "field_name": "attributes",
      "field": [["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0","attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr0", "attr1", "attr2"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3","attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr3", "attr4", "attr5"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr6", "attr7", "attr8"],["attr9", "attr10", 
"attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10","attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"],["attr9", "attr10", "attr11"]
      ]
    },
    {
      "field_name": "book_intro",
      "field": [[1,1],[2,1],[3,1],[4,1],[5,1],[6,1],[7,1],[8,1],[9,1],[10,1],[11,1],[12,1],[13,1],[14,1],[15,1],[16,1],[17,1],[18,1],[19,1],[20,1],[21,1],[22,1],[23,1],[24,1],[25,1],[26,1],[27,1],[28,1],[29,1],[30,1],[31,1],[32,1],[33,1],[34,1],[35,1],[36,1],[37,1],[38,1],[39,1],[40,1],[41,1],[42,1],[43,1],[44,1],[45,1],[46,1],[47,1],[48,1],[49,1],[50,1],[51,1],[52,1],[53,1],[54,1],[55,1],[56,1],[57,1],[58,1],[59,1],[60,1],[61,1],[62,1],[63,1],[64,1],[65,1],[66,1],[67,1],[68,1],[69,1],[70,1],[71,1],[72,1],[73,1],[74,1],[75,1],[76,1],[77,1],[78,1],[79,1],[80,1],[81,1],[82,1],[83,1],[84,1],[85,1],[86,1],[87,1],[88,1],[89,1],[90,1],[91,1],[92,1],[93,1],[94,1],[95,1],[96,1],[97,1],[98,1],[99,1],[100,1]
      ]
    }
  ],
  "num_rows": 100
}';
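Typing one hundred rows by hand is error prone, so a payload of the same shape can be generated instead. The sketch below follows the column-oriented layout of the request above ("fields_data" plus "num_rows"). The function name and the `group_size` parameter are assumptions, and the way attribute triples are assigned to rows is illustrative: it does not reproduce the exact distribution used in the example.

```python
import json


def build_bulk_payload(num_rows=100, group_size=25):
    """Generate a column-oriented _bulk body for the "book" table."""
    book_ids = list(range(1, num_rows + 1))
    attributes = []
    for row in range(num_rows):
        # Rows in the same group share one triple: attr0-2, attr3-5, ...
        start = (row // group_size) * 3
        attributes.append([f"attr{start + offset}" for offset in range(3)])
    payload = {
        "fields_data": [
            {"field_name": "book_id", "field": book_ids},
            {"field_name": "word_count", "field": [i * 1000 for i in book_ids]},
            {"field_name": "attributes", "field": attributes},
            {"field_name": "book_intro", "field": [[i, 1] for i in book_ids]},
        ],
        "num_rows": num_rows,
    }
    # Every column must carry exactly num_rows values.
    for column in payload["fields_data"]:
        if len(column["field"]) != num_rows:
            raise ValueError(f"column {column['field_name']} has the wrong length")
    return payload


payload = build_bulk_payload()
columns = {column["field_name"]: column["field"] for column in payload["fields_data"]}
print(len(json.dumps(payload)), "bytes of JSON for", payload["num_rows"], "rows")
```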

After a vector index has been created, users can perform hybrid search on array and vector data. For more details, refer to the chapter Hybrid Search Using Array and Vector Field.

Delete Scalar Index

This section describes how to delete a scalar index.

curl -u shiva:shiva -XDELETE 'localhost:8902/hippo/v1/{table}/_drop_scalar_index?database_name={database_name}&pretty' -H 'Content-Type: application/json' -d'{
  "index_name" : "index"
}';

Result:

{
  "acknowledged" : true
}

Parameter description:

Table 52. Delete Scalar Index (Restful API)
| Parameters | Description | Required |
| --- | --- | --- |
| table | Table name, such as "book" created in this example | Yes |
| database_name | Database where the table is located | No, defaults to "default" database |
| index_name | Index to be deleted. If users would like to delete multiple indexes, they should list all names as: ["a","b","c"] | Yes |