Author's reply: That's a great question. Here is a passage from the Elasticsearch documentation that I hope will help:

Why doesn't Elasticsearch support incremental resharding?

Going from N shards to N+1 shards, a.k.a. incremental resharding, is indeed a feature supported by many key-value stores. Adding a new shard and pushing new data only to that new shard is not an option: it would likely become an indexing bottleneck, and figuring out which shard a document belongs to given its _id, which is necessary for get, delete, and update requests, would become quite complex. This means that existing data has to be rebalanced using a different hashing scheme.

The most common way key-value stores do this efficiently is consistent hashing, which only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However, Elasticsearch's unit of storage, the shard, is a Lucene index. Because of its search-oriented data structures, taking a significant portion of a Lucene index, be it only 5% of documents, deleting those documents, and indexing them on another shard typically comes at a much higher cost than in a key-value store. This cost is kept reasonable when the number of shards grows by a multiplicative factor: Elasticsearch can then perform the split locally, at the index level, rather than reindexing the documents that need to move, and it can use hard links for efficient file copying.

In the case of append-only data, it is possible to get more flexibility by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming the old and new indices have M and N shards respectively, this has no overhead compared to searching a single index with M+N shards.
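To make the two options concrete, here is a minimal sketch using curl, assuming a hypothetical index my-index (and logs-000001/logs-000002 for the append-only case) on a local cluster at localhost:9200; the _split and _aliases endpoints are standard Elasticsearch APIs:

    # Option 1: multiplicative split. The source index must be write-blocked first.
    curl -X PUT "localhost:9200/my-index/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"index.blocks.write": true}'

    # Split the primaries into 4 shards (must be a multiple of the source count).
    curl -X POST "localhost:9200/my-index/_split/my-index-split" \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.number_of_shards": 4}}'

    # Option 2 (append-only data): push new writes to a new index, and read
    # through an alias that covers both the old and the new index.
    curl -X PUT "localhost:9200/logs-000002"
    curl -X POST "localhost:9200/_aliases" \
      -H 'Content-Type: application/json' \
      -d '{
        "actions": [
          { "add": { "index": "logs-000001", "alias": "logs-read" } },
          { "add": { "index": "logs-000002", "alias": "logs-read" } }
        ]
      }'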
Author's reply: In the video I sent the requests to port 9200, and I didn't configure any dedicated nodes in my development environment, so that node is a master node, a data node, and of course also a coordinating node. In production you can set up dedicated coordinating nodes and send queries to them. I don't recommend sending requests directly to the master nodes: it will work, but a large volume of requests hitting the master can cause performance problems.
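If you want to set that up, here is a minimal sketch of a coordinating-only node, assuming Elasticsearch 7.x before 7.9, where node roles are individual boolean settings (on 7.9+ you would set an empty node.roles list instead):

    # Append to that node's elasticsearch.yml: with all roles disabled, the node
    # only routes requests and merges results, i.e. it is coordinating-only.
    cat >> elasticsearch.yml <<'EOF'
    node.master: false
    node.data: false
    node.ingest: false
    EOF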
Author's reply: Your understanding is correct. Spelling that out in the video would have made the explanation more rigorous.
Author's reply: Right. That is why too many replicas slow down indexing: each write has to be replicated to every replica shard.
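A common tuning trick that follows from this, sketched against a hypothetical index my-index: drop the replica count to 0 during a bulk load, then restore it afterwards.

    # Disable replication while bulk loading, so only the primaries are written.
    curl -X PUT "localhost:9200/my-index/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"index.number_of_replicas": 0}'

    # ... run the bulk load ...

    # Restore replicas afterwards; the copies are then rebuilt in one pass.
    curl -X PUT "localhost:9200/my-index/_settings" \
      -H 'Content-Type: application/json' \
      -d '{"index.number_of_replicas": 1}'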
Author's reply: A query can be served by either the primary shard or a replica shard, so you cannot rule out inconsistent results when a replica has not fully caught up with the primary.
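If you need repeated searches to see consistent results, one option is the standard preference parameter with a custom string (the user-123 value here is just a hypothetical session id), which pins the search to the same shard copies every time:

    # The same preference string always routes to the same shard copies,
    # so a given user sees consistent results across repeated searches.
    curl -X GET "localhost:9200/my-index/_search?preference=user-123" \
      -H 'Content-Type: application/json' \
      -d '{"query": {"match_all": {}}}'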
Author's reply: You can start them one at a time: first check that the first node comes up, then start the second and check whether _cat/nodes shows the newly joined node.
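For example, against a local cluster:

    # List all cluster nodes with column headers; the second node should
    # appear here once it has successfully joined.
    curl -X GET "localhost:9200/_cat/nodes?v"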
Author's reply: Try docker-compose down -v and then bring it back up.
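Roughly like this (note that -v also removes the named volumes, so any Elasticsearch data stored in them is wiped):

    # Stop and remove the containers, networks, and named volumes.
    docker-compose down -v
    # Recreate everything from scratch in the background.
    docker-compose up -d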