Amazon OpenSearch Service is used for a broad set of use cases like real-time application monitoring, log analytics, and website search at scale. As your domain ages and you add more consumers, you need to reevaluate and change the domain’s configuration to handle additional storage and compute requirements. You want to minimize downtime and performance impact as you make these changes.
Customers have been asking for guidance on best practices and patterns for changing index settings without an index maintenance window or affecting the overall performance of the OpenSearch Service domain. This is part one of a two-part series, in which we show how to make settings changes to OpenSearch Service indexes with little to no downtime while supporting active producers and consumers of the data.
Indexes in OpenSearch Service
In OpenSearch Service, data must be indexed before it can be queried. Indexing is the method by which search engines organize data for fast retrieval. The resulting structure is called, fittingly, an index. All operations performed on an index are done via index APIs. Additionally, each index contains index mappings, which define the field names and data types in the index. Data producers can add new fields with data types to an index. However, the mapping of an existing field can’t change during the index lifecycle.
OpenSearch Service indexes have two types of settings that occasionally need adjustments as the profile of your workload changes:
- Dynamic: Settings that can be changed on the index at any time
- Static: Settings that can only be defined at index creation time and can’t be changed during the index lifecycle
Dynamic index settings can be changed at any time using the update settings API. While the OpenSearch Service domain is performing recommended operations on dynamic index settings, the index doesn’t require downtime. Changes to most dynamic index settings don’t trigger background tasks that affect the overall utilization of domain resources. However, some changes, such as increasing the number of replicas via index.number_of_replicas or index.auto_expand_replicas, can cause a temporary increase in resource utilization while the domain adds replicas, depending on the domain’s configuration. We recommend maintaining at least one replica for redundancy reasons, and multiple replicas for high query throughput use cases.
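As a minimal sketch of a dynamic settings change, the following Python helper builds (but does not send) an update settings request that changes the replica count. The helper name and the index name my-index are hypothetical and only serve the illustration.

```python
import json


def replica_update_request(index_name: str, replicas: int) -> tuple[str, str, dict]:
    """Build an update settings request that changes the replica count.

    Returns (HTTP method, path, JSON body); sending it is left to the caller.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", f"/{index_name}/_settings", body


# "my-index" is a hypothetical index name.
method, path, body = replica_update_request("my-index", 2)
print(method, path)
print(json.dumps(body))
```

The same request can be pasted into Dev Tools in OpenSearch Dashboards as the method and path followed by the JSON body.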
Static index settings such as mapping and shard count are defined at index creation time and can’t be changed during the index lifecycle. In this post, we focus on patterns and best practices for working with static index settings, such as changing the shard count and updating index mappings.
All operations and procedures that we cover in this post are issued directly to the OpenSearch REST API or via Dev Tools in OpenSearch Dashboards.
As with any use case, there is a spectrum of solutions and constraints to consider. We start with a few simple foundational patterns and build on them, accounting for use cases with more operational constraints and working with large datasets.
Solution overview
OpenSearch Service has a default sharding strategy of 5:1, where each index is divided into five primary shards. Within each index, each primary shard also has its own replica. OpenSearch Service automatically assigns primary shards and replica shards to separate data nodes.
It’s not possible to increase the number of primary shards of an existing index, meaning an index must be recreated if you want to increase the primary shard count.
The _reindex operation is ideal for creating destination indexes with updated shard and mapping settings. The _reindex operation is resource intensive. We recommend disabling replicas in your destination index by setting number_of_replicas to 0 and re-enabling replicas when the reindex process is complete. If you have your data in a second, durable store, the simplest thing to do is pause updates and reindex from the source. But that’s not always possible. In this post, we share several patterns that enable you to update even static index settings like shard count.
One of the major advantages of using the _reindex operation is that it doesn’t require placing the source index in read-only mode (data producers can continue to write data while reindexing is in progress). Additionally, the _reindex operation enables reprocessing, transforming, and reindexing a subset of documents, and even selectively combining documents from multiple indexes. With the _reindex operation, you can copy all documents, or a subset that you select through a query, to another index. In its most basic form, the _reindex operation requires you to specify a source index, a destination index, and configuration parameters.
The following are some of the use cases supported by the reindex API:
- Reindexing all documents
- Reindexing from a remote cluster when migrating data between clusters
- Reindexing a subset of documents that match a search query
- Combining multiple indexes
- Transforming documents during reindexing
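To make two of these use cases concrete, here is a rough sketch of a _reindex request body that combines several source indexes and transforms each document with a Painless script. The helper name, the index names, and the reindexed flag written by the script are all assumptions chosen for illustration.

```python
import json


def combine_and_transform_body(sources: list[str], dest: str) -> dict:
    """Build a _reindex body that merges several indexes into one destination.

    The inline script runs once per document; here it only sets a
    hypothetical marker field on the copied document.
    """
    if not sources:
        raise ValueError("at least one source index is required")
    return {
        "source": {"index": sources},
        "dest": {"index": dest},
        "script": {"lang": "painless", "source": "ctx._source.reindexed = true"},
    }


# Hypothetical index names.
combined = combine_and_transform_body(["logs-2022-01", "logs-2022-02"], "logs-2022-q1")
print(json.dumps(combined, indent=2))
```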
To increase the shard count, you create a new index, set number_of_shards to your desired primary shard count, set number_of_replicas to 0, update the new index mapping based on your requirements, and run the reindex API operation. After the _reindex operation is complete, we recommend updating number_of_replicas in the destination index settings to achieve your desired number of replica shards.
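The sequence described above can be sketched as an ordered list of REST requests. This is only an outline under stated assumptions: the helper name and index names are hypothetical, the requests are built rather than sent, and the final step should be issued only after the reindex task has finished.

```python
def shard_increase_plan(source: str, dest: str, shards: int, replicas: int) -> list[tuple[str, str, dict]]:
    """Return the requests, in order, for recreating an index with more primary shards."""
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas >= 0")
    return [
        # 1. Create the destination with the new shard count and no replicas.
        ("PUT", f"/{dest}", {"settings": {"index": {"number_of_shards": shards, "number_of_replicas": 0}}}),
        # 2. Copy the documents from the source index.
        ("POST", "/_reindex?wait_for_completion=false", {"source": {"index": source}, "dest": {"index": dest}}),
        # 3. Once the reindex has completed, bring the replicas back.
        ("PUT", f"/{dest}/_settings", {"index": {"number_of_replicas": replicas}}),
    ]


# Hypothetical index names and counts.
plan = shard_increase_plan("source_index_name", "destination_index_name", shards=10, replicas=1)
for method, path, body in plan:
    print(method, path, body)
```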
In the following sections, we provide a walkthrough of the reindex API operation. Note that the patterns and procedures presented in this post have been validated on Amazon OpenSearch Service version 1.3.
Prerequisites
The source of the documents must be stored in the index (the "_source" setting at the index mappings level must be set to "enabled": true, which is the default). The _reindex operation can’t be used without source documents.
Create the destination index with your desired mapping (field or data type). For demonstration purposes, our source index has a field ratings defined as long, and we want the same field to use the float data type in the destination index:
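A minimal sketch of such a create index request body follows. Only the ratings field and its float type come from the text; the helper name is hypothetical and any other fields of your index would need to be added to the properties.

```python
import json


def destination_mappings_body() -> dict:
    """Build a create index body that maps the ratings field as float."""
    return {
        "mappings": {
            "properties": {
                # The source index defines ratings as long; the destination uses float.
                "ratings": {"type": "float"},
            }
        }
    }


# Sent as: PUT /destination_index_name
mappings_body = destination_mappings_body()
print(json.dumps(mappings_body, indent=2))
```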
Ensure that you have enough disk space on each hot tier data node to house the new index’s primary shards and, depending on your configuration, replica shards. If disk space is insufficient, perform an update operation on the OpenSearch Service domain to add the required storage capacity. Depending on storage requirements, you might need to migrate the OpenSearch Service domain to a different instance type, because nodes have constraints on the EBS volume size that can be mounted to each instance type. Issue the following operation to validate available disk space:
The following screenshot shows the output.
Check the disk.avail metric for hot storage tier nodes to validate your available disk space.
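One way to automate this check is sketched below. It assumes the rows come from GET _cat/allocation?format=json&bytes=b, so that disk.avail is a byte count, and it does not distinguish hot tier nodes from other data nodes. The helper name, node names, and figures are hypothetical.

```python
def nodes_short_on_disk(allocation_rows: list[dict], required_bytes_per_node: int) -> list[str]:
    """Return the names of nodes whose disk.avail is below the required amount."""
    short = []
    for row in allocation_rows:
        avail = row.get("disk.avail")
        if avail is None:
            # Rows without disk figures (for example, unassigned shards) are skipped.
            continue
        if int(avail) < required_bytes_per_node:
            short.append(row["node"])
    return short


# Hypothetical sample of the parsed JSON response.
rows = [
    {"node": "node-a", "disk.avail": "80000000000"},
    {"node": "node-b", "disk.avail": "20000000000"},
    {"node": "UNASSIGNED", "disk.avail": None},
]
short_nodes = nodes_short_on_disk(rows, required_bytes_per_node=50_000_000_000)
print(short_nodes)
```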
Use the reindex API operation
The _ reindex
operation pictures the index at the start of its run and carries out processing on a photo to decrease effect on the source index. The source index can still be utilized for querying and processing the information. Although the _ reindex
operation can run both synchronously and asynchronously, we suggest utilizing an asynchronous run. You can keep an eye on the development of the _ reindex
operation, cancel its run, or throttle its run utilizing the _ job
, _ cancel
, and _ rethrottle
operations, respectively.
Because the _reindex operation doesn’t require the source index to be placed in read-only mode, query and index update operations are free to continue.
Use the reindex API with the following command:
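As a hedged sketch of an asynchronous run, the helper below builds a _reindex request with wait_for_completion=false, which makes the API return a task ID instead of waiting for the copy to finish. The helper name is hypothetical, and the index names are the placeholders used elsewhere in this post.

```python
import json


def async_reindex_request(source: str, dest: str) -> tuple[str, str, dict]:
    """Build an asynchronous _reindex request as (method, path, body)."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return "POST", "/_reindex?wait_for_completion=false", body


reindex_method, reindex_path, reindex_body = async_reindex_request(
    "source_index_name", "destination_index_name"
)
print(reindex_method, reindex_path)
print(json.dumps(reindex_body, indent=2))
```

The response to an asynchronous request carries the task ID, which the monitoring, throttling, and cancel operations use.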
The source indexes in the _reindex API operation can be supplemented with a query to reindex only a subset of documents and store them in the destination index. You can monitor the progress of the reindexing operation via the tasks API operation:
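The sketch below shows both ideas under stated assumptions: a _reindex body restricted by a query, and a small function that estimates progress from the status section of a GET _tasks/<task_id> response. The helper names, the range query on ratings, and the sample response values are hypothetical.

```python
def subset_reindex_body(source: str, dest: str, query: dict) -> dict:
    """Reindex only the documents in the source index that match the query."""
    return {"source": {"index": source, "query": query}, "dest": {"index": dest}}


def reindex_progress(task_response: dict) -> float:
    """Return the fraction of documents processed so far, between 0.0 and 1.0."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        return 0.0
    counters = ("created", "updated", "deleted", "noops", "version_conflicts")
    done = sum(status.get(key, 0) for key in counters)
    return min(done / total, 1.0)


subset_body = subset_reindex_body(
    "source_index_name", "destination_index_name", {"range": {"ratings": {"gte": 4}}}
)

# Hypothetical excerpt of a tasks API response for a running reindex.
sample_task = {"completed": False, "task": {"status": {"total": 2000, "created": 500, "updated": 0, "deleted": 0}}}
progress = reindex_progress(sample_task)
print(f"{progress:.0%} of documents processed")
```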
Note that the _reindex operation can be throttled via the _rethrottle API or settings passed as a parameter. You can cancel the task with the _cancel operation:
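A minimal sketch of the two requests follows. The helper names and the task ID are hypothetical; the task ID is whatever the asynchronous _reindex call returned.

```python
def rethrottle_request(task_id: str, requests_per_second: float) -> tuple[str, str]:
    """Build a request that changes the throttle of a running reindex task.

    A value of -1 removes the throttle.
    """
    return "POST", f"/_reindex/{task_id}/_rethrottle?requests_per_second={requests_per_second}"


def cancel_request(task_id: str) -> tuple[str, str]:
    """Build a request that cancels a running task."""
    return "POST", f"/_tasks/{task_id}/_cancel"


# Hypothetical task ID in the <node_id>:<task_number> form.
task_id = "oTUltX4IQMOUUVeiohTt8A:12345"
throttle = rethrottle_request(task_id, 500)
cancel = cancel_request(task_id)
print(*throttle)
print(*cancel)
```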
The following screenshot shows the output of the _reindex operation for reindexing from source_index_name to destination_index_name.
When the operation is complete, both consumers and producers of the source indexes or aliases need to re-point to the destination index, and the same _reindex operation needs to run again to catch up on any create, update, or delete operations performed on the source indexes while the initial _reindex operation was running. This step is required because the _reindex operation works on a snapshot of the index. This time, the _reindex operation needs to run with "op_type": "create" to reconcile missing and out-of-version documents. See the following API command:
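A rough sketch of the catch-up request body is shown below. With op_type set to create, documents that already exist in the destination are reported as version conflicts instead of being overwritten. The "conflicts": "proceed" setting is an assumption added here so that those conflicts don’t abort the run, and the helper name is hypothetical.

```python
import json


def catch_up_reindex_body(source: str, dest: str) -> dict:
    """Build a _reindex body that only creates documents missing from the destination."""
    return {
        # Assumption: keep going when a document already exists in the destination.
        "conflicts": "proceed",
        "source": {"index": source},
        "dest": {"index": dest, "op_type": "create"},
    }


catch_up_body = catch_up_reindex_body("source_index_name", "destination_index_name")
print(json.dumps(catch_up_body, indent=2))
```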
After the operation is complete and data integrity in the destination index is verified, you can delete the source index to reclaim disk space.
Increase index shard count using the split index API
The split index API and shrink index API cover a large array of use cases and come with low resource utilization in the domain. However, these APIs require closing the index for write operations and don’t address use cases that require changes to the mapping settings.
In OpenSearch Service, the number_of_shards index setting is immutable and defined at the time the index is created. However, although this setting is immutable, there are several patterns to increase or decrease the index shard count without needing to explicitly reindex the data. You can alternatively use the split index API to increase the index shard count in environments that can suspend write operations. The split index API provides a simplified way of creating a new index with a different shard setting without reindexing your data. The split index API operation creates a new index, based on a read-only index, with a desired number of primary shards.
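The split flow can be sketched as two requests: block writes on the source, then split it into a target with more primary shards. The sketch assumes the usual constraint that the target shard count is a multiple of the source shard count; it does not check any limit imposed by the index’s routing shard setting. The helper name and index names are hypothetical.

```python
def split_requests(source: str, target: str, source_shards: int, target_shards: int) -> list[tuple[str, str, dict]]:
    """Return the requests, in order, for splitting an index into more primary shards."""
    if target_shards <= source_shards or target_shards % source_shards != 0:
        raise ValueError("target shard count must be a larger multiple of the source shard count")
    return [
        # 1. Make the source index read-only for writes.
        ("PUT", f"/{source}/_settings", {"index.blocks.write": True}),
        # 2. Create the target index from the source with the new shard count.
        ("POST", f"/{source}/_split/{target}", {"settings": {"index": {"number_of_shards": target_shards}}}),
    ]


# Hypothetical example: split 5 primary shards into 10.
split_plan = split_requests("source_index_name", "destination_index_name", 5, 10)
for method, path, body in split_plan:
    print(method, path, body)
```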
In OpenSearch Service, an index alias is a virtual index name that can point to one or more indexes. Referencing indexes through aliases in your applications allows you to avoid index name changes. Index aliases are used to point consumers and producers to a new index after the split index API operation is complete.
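As an illustration of re-pointing an alias, the sketch below builds a POST /_aliases body that removes the alias from the old index and adds it to the new one in a single request. The helper name, alias name, and index names are hypothetical.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Build a POST /_aliases body that moves an alias from one index to another."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap_body = alias_swap_body("my-alias", "source_index_name", "destination_index_name")
print(json.dumps(swap_body, indent=2))
```

Because both actions travel in one request, clients that use the alias are switched to the new index in a single step.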
Although the majority of use cases focus on increasing the number of shards on an existing index due to data growth, there are also situations where you need to reduce the number of shards on an existing index. Such cases sometimes occur when the actual index size is less than what was anticipated when the index was created, and you want to align with a shard strategy that follows operational best practices for OpenSearch Service. In cases where you need to reduce the number of shards on an index, you can use the shrink index API to achieve this task.
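A hedged sketch of the shrink flow follows. It assumes the usual preconditions for shrinking: the target shard count is a factor of the source shard count, the source is blocked for writes, and a copy of every shard is first relocated to a single node. The helper name, index names, and node name are hypothetical, and settings inherited by the target index may need to be reset afterward.

```python
def shrink_requests(source: str, target: str, source_shards: int, target_shards: int, node_name: str) -> list[tuple[str, str, dict]]:
    """Return the requests, in order, for shrinking an index to fewer primary shards."""
    if not 0 < target_shards < source_shards or source_shards % target_shards != 0:
        raise ValueError("target shard count must be a smaller factor of the source shard count")
    return [
        # 1. Gather a copy of every shard on one node and block writes.
        (
            "PUT",
            f"/{source}/_settings",
            {"index.routing.allocation.require._name": node_name, "index.blocks.write": True},
        ),
        # 2. After relocation has finished, create the smaller target index.
        ("POST", f"/{source}/_shrink/{target}", {"settings": {"index.number_of_shards": target_shards}}),
    ]


# Hypothetical example: shrink 10 primary shards down to 5.
shrink_plan = shrink_requests("source_index_name", "destination_index_name", 10, 5, "node-a")
for method, path, body in shrink_plan:
    print(method, path, body)
```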
Conclusion
In this post, we reviewed best practices for reindexing data to make changes to OpenSearch Service static index settings and mappings with little or no index downtime. We also covered the use of the split index and shrink index APIs for changing the primary shard count in use cases where the index can be placed in a read-only state.
In our next post, we’ll explore patterns for remote indexing to ease load and resource utilization on the source OpenSearch Service domain.
About the Authors
Mikhail Vaynshteyn is a Solutions Architect with Amazon Web Services. Mikhail works with healthcare and life sciences customers to build solutions that help improve patients’ outcomes. Mikhail specializes in data analytics services.
Sukhomoy Basak is a Solutions Architect at Amazon Web Services, with a passion for data and analytics solutions. Sukhomoy works with enterprise customers to help them architect, build, and scale applications to achieve their business outcomes.