


Now playing: 39 | Distributed Document Storage




Elasticsearch Core Technology and Practice (Elasticsearch核心技术与实战)



100 lectures · about 1,000 minutes

 16573





Featured comments (12)

艾文

2019-08-11

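Why doesn't Elasticsearch use a consistent hashing algorithm for its documents?

Author's reply: That is a very good question. Here is a passage from the Elasticsearch documentation that I hope will help.

Why doesn’t Elasticsearch support incremental resharding?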
Going from N shards to N+1 shards, aka. incremental resharding, is indeed a feature that is supported by many key-value stores. Adding a new shard and pushing new data to this new shard only is not an option: this would likely be an indexing bottleneck, and figuring out which shard a document belongs to given its _id, which is necessary for get, delete and update requests, would become quite complex. This means that we need to rebalance existing data using a different hashing scheme.

The most common way that key-value stores do this efficiently is by using consistent hashing. Consistent hashing only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However Elasticsearch’s unit of storage, shards, are Lucene indices. Because of their search-oriented data structure, taking a significant portion of a Lucene index, be it only 5% of documents, deleting them and indexing them on another shard typically comes with a much higher cost than with a key-value store. This cost is kept reasonable when growing the number of shards by a multiplicative factor as described in the above section: this allows Elasticsearch to perform the split locally, which in-turn allows to perform the split at the index level rather than reindexing documents that need to move, as well as using hard links for efficient file copying.

In the case of append-only data, it is possible to get more flexibility by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming that the old and new indices have respectively M and N shards, this has no overhead compared to searching an index that would have M+N shards.
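
To make the quoted trade-off concrete, here is a minimal sketch, not Elasticsearch's actual routing code. It assumes a toy `hash(_id) % N` scheme with MD5 as a stand-in hash (Elasticsearch itself uses Murmur3 together with a routing-shards setting), and the helper names `shard_for`, `moved_fraction`, and `split_targets` are hypothetical. It measures how many documents change shards when going from 5 to 6 shards, and records where the documents of each old shard land when doubling from 5 to 10.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Toy routing: a stable hash of the id, modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(doc_ids, old_n: int, new_n: int) -> float:
    """Fraction of documents whose shard changes when old_n becomes new_n."""
    moved = sum(1 for d in doc_ids if shard_for(d, old_n) != shard_for(d, new_n))
    return moved / len(doc_ids)


def split_targets(doc_ids, old_n: int, factor: int) -> dict:
    """For each old shard, the set of new shards its documents land on."""
    targets = {s: set() for s in range(old_n)}
    for d in doc_ids:
        targets[shard_for(d, old_n)].add(shard_for(d, old_n * factor))
    return targets


doc_ids = [f"doc-{i}" for i in range(10_000)]

# N -> N+1 with plain modulo routing: most documents would have to move.
incremental = moved_fraction(doc_ids, 5, 6)
print(f"5 -> 6 shards: {incremental:.1%} of documents change shard")

# N -> 2N: documents of old shard s can only land on new shard s or s + 5.
targets = split_targets(doc_ids, 5, 2)
for old_shard, new_shards in sorted(targets.items()):
    print(f"old shard {old_shard} -> new shards {sorted(new_shards)}")
```

With uniform hashing, about five in six documents move when a sixth shard is added. When the count grows by a factor of 2, each old shard feeds only its own pair of new shards, which is why the split can be carried out locally.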

 1

 1
Sunqc

2019-08-10

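If a delete succeeds on the primary shard but fails on a replica shard, does the request ultimately return a failure? Or can it never happen that only one side is deleted?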

 1

 1
低调光环

2019-08-08

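In the video, are update and delete requests sent to the master node first, or are they distributed to some node by a load balancer in front of the cluster?

Author's reply: In the video the requests go to port 9200, and I did not configure any dedicated nodes in the development environment. So that node is both a master and a data node, and of course it is also a coordinating node.

In production you can set up dedicated coordinating nodes and send queries to them. Sending requests directly to the master node is not recommended: it works, but a large volume of requests hitting the master can cause performance problems.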



 1
GaelYang

2019-08-06

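One question: when replicas exist, are documents indexed on the replicas at the same time as on the primary?

Author's reply: Yes. That is why too many replicas slow down indexing.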



 1
刘应明

2019-08-04

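My understanding is that an update should also be synced to the replica shards, just like a delete, right? The video does not show this.

Author's reply: Your understanding is correct. Showing it in the video would have made the explanation more rigorous.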

 1

 1
godtrue

2019-09-21

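A few questions:
1. When a document is added, where does its ID come from? Is the ID unique within the index, or across the whole cluster?
2. Query, update, and delete operations also need the document ID. Is the ID obtained through the inverted index in those cases? Where exactly is the inverted index stored? Every node that is allowed to operate on documents needs to obtain document IDs, in other words needs access to the inverted index, so how is it stored if it grows large?
3. I now have a fairly clear picture of how nodes, shards, and documents relate. Nodes broadly fall into master nodes, coordinating nodes, and data nodes, each with its own responsibilities. A node can hold N shards, and shards are what store the data. Shards are either primary shards or replica shards, and an index may have N primary shards. Which shard a document in the index lands on is computed dynamically: shard number = hash(document ID) % number of primary shards. Replica shards provide data redundancy and share the read load. Keep in mind that a primary shard and its replica are never placed on the same node.
I finally understand it, and it feels great. Thanks!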
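
The routing rule summarized in point 3 can be sketched as follows. This is an illustration under assumptions: `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses, and `route` is a hypothetical helper, not a client API. By default the routing value is the document `_id`; a custom routing value sends every document that shares it to the same shard.

```python
import zlib
from typing import Optional


def route(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Return the shard number for a document.

    The routing value defaults to the document id. crc32 is only a
    stand-in for the real hash function.
    """
    key = routing if routing is not None else doc_id
    return zlib.crc32(key.encode("utf-8")) % num_primary_shards


NUM_PRIMARY_SHARDS = 3  # assumed value for the example

# Default routing: documents are spread across shards by their id.
default_shards = [route(f"order-{i}", NUM_PRIMARY_SHARDS) for i in range(100)]
print("shards used with default routing:", sorted(set(default_shards)))

# Custom routing: everything routed by the same value lands on one shard.
user_shards = {route(f"order-{i}", NUM_PRIMARY_SHARDS, routing="user-42") for i in range(100)}
print("shards used with routing='user-42':", sorted(user_shards))
```

Because the shard number depends on the number of primary shards, changing that number after documents have been indexed would make existing documents unreachable by id, which is why it is fixed when the index is created.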





Coisini

2019-09-04

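When a search request comes in, can it be sent to a replica shard to fetch results?

Author's reply: It is served by either a primary shard or a replica shard, so a replica that has not finished syncing may return slightly inconsistent data.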




汤尼房

2019-08-28

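I have a question. While migrating data between hot and warm nodes, I want to monitor shard relocation, for example how much data has been relocated and at what rate. I could not find an answer in the official ES documentation; any pointers would be appreciated.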




标

2019-08-23

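A question: is this how the write path works? The request reaches the coordinating node, which is responsible for forwarding requests and assembling results. The coordinating node forwards the request to the master node; the master hashes the routing value to decide which shard to write to and updates its state, then responds to the coordinating node, which finally responds to the client.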
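
For comparison with the flow proposed above: according to the Elasticsearch documentation on reading and writing documents, the master node is not on the document write path. The coordinating node resolves the target shard from the routing value, forwards the operation to the primary shard copy, and the primary replicates it to the replica copies before the client gets a response. The following is a toy in-memory simulation of that sequence. The classes `ShardCopy`, `ReplicationGroup`, and `Coordinator` are hypothetical, `zlib.crc32` stands in for the real hash, and failure handling, in-sync tracking, and the translog are all left out.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ShardCopy:
    node: str  # name of the data node holding this copy
    docs: Dict[str, dict] = field(default_factory=dict)


@dataclass
class ReplicationGroup:
    primary: ShardCopy
    replicas: List[ShardCopy]


class Coordinator:
    """Toy coordinating node: route by id, write primary first, then replicas."""

    def __init__(self, groups: List[ReplicationGroup]) -> None:
        self.groups = groups
        self.trace: List[Tuple[str, int, str]] = []  # (role, shard, node)

    def _resolve(self, doc_id: str) -> Tuple[int, ReplicationGroup]:
        shard = zlib.crc32(doc_id.encode("utf-8")) % len(self.groups)
        return shard, self.groups[shard]

    def index(self, doc_id: str, source: dict) -> dict:
        shard, group = self._resolve(doc_id)
        existed = doc_id in group.primary.docs
        group.primary.docs[doc_id] = source
        self.trace.append(("primary", shard, group.primary.node))
        for replica in group.replicas:  # the primary forwards to its replicas
            replica.docs[doc_id] = source
            self.trace.append(("replica", shard, replica.node))
        return {"_id": doc_id, "_shard": shard, "result": "updated" if existed else "created"}

    def delete(self, doc_id: str) -> dict:
        shard, group = self._resolve(doc_id)
        existed = group.primary.docs.pop(doc_id, None) is not None
        self.trace.append(("primary", shard, group.primary.node))
        for replica in group.replicas:
            replica.docs.pop(doc_id, None)
            self.trace.append(("replica", shard, replica.node))
        return {"_id": doc_id, "_shard": shard, "result": "deleted" if existed else "not_found"}


# Two primary shards with one replica each; copies of a shard sit on different nodes.
groups = [
    ReplicationGroup(ShardCopy("node1"), [ShardCopy("node2")]),
    ReplicationGroup(ShardCopy("node2"), [ShardCopy("node3")]),
]
coordinator = Coordinator(groups)

created = coordinator.index("1", {"user": "mike"})
group = groups[created["_shard"]]
present_after_index = ["1" in copy.docs for copy in [group.primary, *group.replicas]]

deleted = coordinator.delete("1")
present_after_delete = ["1" in copy.docs for copy in [group.primary, *group.replicas]]

for step in coordinator.trace:
    print(step)
```

The trace lists only shard copies on data nodes. Updates and deletes follow the same primary-then-replica order as indexing, which matches the author's replies elsewhere in this thread.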




Sunqc

2019-08-21

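I'm on Windows and not using Docker. I started two instances with the same cluster name, but I can't see the cluster information. I saw it only that one time; a week went by, and now another week, and I still can't see it. I feel stuck here.

Author's reply: Start them one at a time. Check that the first one is up, then start the second and see whether the newly joined node shows up in _cat/nodes.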

 1


Sunqc

2019-08-16

Following your example, last week I started three nodes and could still see the cluster information. Today I wanted to review, so I pointed cerebro at 9200, but I only see node1 and no cluster information:
bin/elasticsearch -E node.name=node1 -E cluster.name=sunqc -E path.data=node1_data -E http.port=9200 -E transport.port=9300
bin/elasticsearch -E node.name=node2 -E cluster.name=sunqc -E path.data=node2_data -E http.port=9201 -E transport.port=9301
Here is the node1 startup log:
============================node1======[node2 log is the same]====
[2019-08-16T16:42:32,453][INFO ][o.e.t.TransportService ] [node1] publish_address {127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}, {[::1]:9300}
[2019-08-16T16:42:32,472][WARN ][o.e.b.BootstrapChecks ] [node1] the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
[2019-08-16T16:42:32,505][INFO ][o.e.c.c.ClusterBootstrapService] [node1] no discovery configuration found, will perform best-effort cluster bootstrapping after [3s] unless existing master is discovered
[2019-08-16T16:42:35,509][INFO ][o.e.c.c.Coordinator ] [node1] setting initial configuration to VotingConfiguration{MZVYW8hkTyGf3A_hXeIo3Q}
[2019-08-16T16:42:35,743][INFO ][o.e.c.s.MasterService ] [node1] elected-as-master ([1] nodes joined)[{node1}{MZVYW8hkTyGf3A_hXeIo3Q}{09DZrBifRra_dVenD6nW7Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=4172951552, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, reason: master node changed {previous [], current [{node1}{MZVYW8hkTyGf3A_hXeIo3Q}{09DZrBifRra_dVenD6nW7Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=4172951552, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-08-16T16:42:35,922][INFO ][o.e.c.s.ClusterApplierService] [node1] master node changed {previous [], current [{node1}{MZVYW8hkTyGf3A_hXeIo3Q}{09DZrBifRra_dVenD6nW7Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=4172951552, xpack.installed=true, ml.max_open_jobs=20}]}, term: 1, version: 1, reason: Publication

Author's reply: Try running docker-compose down -v and then starting again.




Erick

2019-08-07

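When deleting a document, does the request also have to be routed to the master node, which performs the delete and then syncs the delete command to the replicas? Or is it routed to the primary shard that holds the document, which performs the delete and then syncs the delete command to the replicas?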




