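Author reply: A dangling index arises like this: a user deletes index a, and at that moment NodeA happens to be shut down or has left the cluster for some other reason. Index a is deleted from the cluster. When NodeA later rejoins, it still holds shards of index a, and in this situation the cluster turns red. Since the user wanted index a deleted in the first place and the deletion simply did not complete cleanly, the data loss you describe does not occur.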

2019-09-30



4

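Teacher, about the case where the replica could not be allocated: you said to add a hot-type node. I tried it, setting up two hot nodes and one warm node with the same index settings as in your example, but the replica still could not be allocated. Why is that?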

Author reply: You can check the specific reason via _cat/allocation.
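As a complement to the reply above: _cat/allocation lists shard counts and disk usage per node, while the cluster allocation explain API (GET /_cluster/allocation/explain) reports, node by node, which allocation decider rejected a given shard copy. The following is a minimal sketch of building that request body and summarizing a response, not a definitive tool. The index name, node names, and explanation strings in the sample are made up for illustration and are not real server output.

```python
from typing import Any, Dict, List


def explain_request_body(index: str, shard: int, primary: bool) -> Dict[str, Any]:
    """Build the JSON body for GET /_cluster/allocation/explain for one shard copy."""
    return {"index": index, "shard": shard, "primary": primary}


def summarize_explain(response: Dict[str, Any]) -> List[str]:
    """Return one line per (node, decider) pair that answered NO."""
    lines = []
    for node in response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                lines.append(
                    f"{node.get('node_name')} / {decider.get('decider')}: "
                    f"{decider.get('explanation')}"
                )
    return lines


# Hypothetical response fragment (assumed values, for illustration only).
sample_response = {
    "index": "my-index",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "hot-1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "same_shard",
                    "decision": "NO",
                    "explanation": "(illustrative) a copy of this shard is already on this node",
                }
            ],
        },
        {
            "node_name": "warm-1",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "filter",
                    "decision": "NO",
                    "explanation": "(illustrative) node does not match the index allocation filter",
                }
            ],
        },
    ],
}

print(explain_request_body("my-index", 0, False))
for line in summarize_explain(sample_response):
    print(line)
```

Reading the deciders for each node usually shows whether a filter, awareness setting, or same-shard rule is blocking the replica.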

2020-03-15





小田黑阳

Teacher, what causes this problem? Could you explain it in terms of the underlying principles? { "index" : "containers-logs-2019.08.18", "shard" : 0, "primary" : false, "current_state" : "unassigned", "unassigned_info" : { "reason" : "ALLOCATION_FAILED", "at" : "2019-10-12T15:45:12.702Z", "failed_allocation_attempts" : 5, "details" : "failed shard on node [WxlYLXMWQwuEGRpVCYIPuQ]: failed recovery, failure RecoveryFailedException[[containers-logs-2019.08.18][0]: Recovery failed from {es7_01}{hd8FLM8UQkqikNTdEBoKbA}{-R1DhY8fQiu-TZ-V2f29pw}{172.18.0.5}{172.18.0.5:9300}{ml.machine_memory=6087548928, ml.max_open_jobs=20, xpack.installed=true} into {es7_02}{WxlYLXMWQwuEGRpVCYIPuQ}{t_R3pB5JQVmxh_LES8JSFA}{172.18.0.4}{172.18.0.4:9300}{ml.machine_memory=6087548928, xpack.installed=true, ml.max_open_jobs=20}]; nested: RemoteTransportException[[es7_01][172.18.0.5:9300][internal:index/shard/recovery/start_recovery]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [1017681884/970.5mb], which is larger than the limit of [986932838/941.2mb], real usage: [1017681608/970.5mb], new bytes reserved: [276/276b]]; ", "last_allocation_status" : "no_attempt" },

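Author reply: It looks like a heap limit is preventing data from being transferred between nodes. Check heap usage with the nodes stats API and increase the heap appropriately if needed.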
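The heap check suggested above can be sketched as follows, assuming the response of GET /_nodes/stats/jvm has already been parsed into a dictionary. The field `jvm.mem.heap_used_percent` is part of the nodes stats response; the 85% threshold and the sample heap values are assumptions chosen for illustration, and only the node names are taken from the question.

```python
from typing import Any, Dict, List, Tuple


def nodes_over_heap_threshold(
    stats: Dict[str, Any], threshold_percent: int = 85
) -> List[Tuple[str, int]]:
    """Return (node name, heap_used_percent) for nodes at or above the threshold,
    sorted from the fullest heap downwards."""
    busy = []
    for node in stats.get("nodes", {}).values():
        used = node["jvm"]["mem"]["heap_used_percent"]
        if used >= threshold_percent:
            busy.append((node["name"], used))
    return sorted(busy, key=lambda item: item[1], reverse=True)


# Made-up sample in the shape of a nodes stats response (values are assumptions).
sample_stats = {
    "nodes": {
        "hd8FLM8UQkqikNTdEBoKbA": {
            "name": "es7_01",
            "jvm": {"mem": {"heap_used_percent": 97}},
        },
        "WxlYLXMWQwuEGRpVCYIPuQ": {
            "name": "es7_02",
            "jvm": {"mem": {"heap_used_percent": 62}},
        },
    }
}

print(nodes_over_heap_threshold(sample_stats))
```

A node that stays near the top of this list is a candidate for a larger heap or for a lighter workload.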

2019-10-12





伟伟哦

"type":"circuit_breaking_exception", "reason":"[parent]"Data too large,data for [<http_request>] would be [4528630082/4.2gb],which is larger than the limit of [4488796569/4.1gb],real usage:[4528629696/4.2gb] new bytes reserved:[386/386b]" I changed the settings to the following, and I would like to ask where the rest of the space is being used: -Xms7g -Xms7g, indices.breaker.fielddata.limit 60%, indices.breaker.request.limit 40%, indices.breaker.total.limit 70%, indices.fielddata.cache.size 40%. How do I solve this problem?
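The numbers inside a parent circuit breaker message can be read directly, which helps show how little of the reported total comes from the request itself. The sketch below is a hedged illustration, not an official parser: the regular expressions are assumptions fitted to the message text quoted above, and the message is re-typed with normalized spacing.

```python
import re
from typing import Dict, Optional

# Patterns are assumptions based on the message format quoted in this thread.
_FIELDS = {
    "would_be": r"would be\s*\[(\d+)/",
    "limit": r"limit of\s*\[(\d+)/",
    "real_usage": r"real usage:\s*\[(\d+)/",
    "new_bytes": r"new bytes reserved:\s*\[(\d+)/",
}


def parse_breaker_message(message: str) -> Optional[Dict[str, int]]:
    """Extract the byte counts from a 'Data too large' message.

    Returns None if any expected field is missing.
    """
    values: Dict[str, int] = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match is None:
            return None
        values[name] = int(match.group(1))
    values["over_limit_by"] = values["would_be"] - values["limit"]
    return values


message = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[4528630082/4.2gb], which is larger than the limit of [4488796569/4.1gb], "
    "real usage: [4528629696/4.2gb], new bytes reserved: [386/386b]"
)

parsed = parse_breaker_message(message)
print(parsed)
```

In this message the real usage plus the newly reserved bytes equals the "would be" figure, and the real usage alone is already above the limit, so the 386-byte request is only the trigger, not the cause.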

2019-09-11

1

1

晨露

Teacher, a question about version 6.3.1: for some reason indices are being deleted automatically. [2020-08-24T01:22:00,001][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [hckjes3] Deleting expired data [2020-08-24T01:22:00,186][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [hckjes3] Completed deletion of expired data

2020-08-24





晨露

[2020-06-18T12:09:30,100][ERROR][o.e.x.m.c.i.IndexRecoveryCollector] [node3] collector [index_recovery] timed out when collecting data [2020-06-18T12:09:40,101][ERROR][o.e.x.m.c.i.IndexStatsCollector] [node3] collector [index-stats] timed out when collecting data Teacher, is there a good way to resolve this?

2020-06-18





窝窝头

Hello teacher, how should this kind of problem be resolved? The shard exists and is red. { "index": "logstash-2019.10.18", "shard": 1, "primary": false, "current_state": "unassigned", "unassigned_info": { "reason": "MANUAL_ALLOCATION", "at": "2019-10-21T03:56:38.094Z", "details": "failed shard on node [OBgTcksjRU-lzQU2jb7QdQ]: failed recovery, failure RecoveryFailedException[[logstash-2019.10.18][1]: Recovery failed from {elasticsearch-logging-1}{x6cTKNFBTS-MvF7favwBTg}{6yqlOf7oTEWpV7sAxzWfGw}{10.2.33.252}{10.2.33.252:9300} into {elasticsearch-logging-0}{OBgTcksjRU-lzQU2jb7QdQ}{06Mr39VaQSawv7ViHM3JIg}{10.2.3.223}{10.2.3.223:9300}]; nested: RemoteTransportException[[elasticsearch-logging-1][10.2.33.252:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[2] phase2 failed]; nested: IOException[No such device or address]; ", "last_allocation_status": "no_attempt" },

2019-10-22





Very useful, thumbs up for the teacher.

2019-09-25





钱

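Common causes of an ES cluster turning red or yellow, and how to locate, analyze, and resolve them: building up experience with common failures and using the cluster diagnostic APIs that ES provides. A great session, thank you.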

2019-09-22




