• Senior Two Times
    2020-09-17
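    The shuffle between operations pushes data from the upstream operation to the downstream one, whereas in MapReduce the downstream reduce task actively pulls data from upstream. What advantages does the push model have over the pull model?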

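    Author's reply: This model is better suited to stream processing. Data between operators can be pushed downstream for processing immediately, so Flink's latency is very low and it can process data record by record. An MR-based approach can only achieve micro-batching, because pulling just one record per request would be far too wasteful.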
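
    A toy model of the latency argument in the reply, not Flink code: time is counted in abstract ticks, one record arrives per tick, and the pull-side consumer is assumed to fetch on a fixed, hypothetical `pull_interval`. It ignores the fact that Flink itself serializes records into network buffers that are flushed when full or when a buffer timeout expires; it only shows why forwarding each record as it is produced keeps waiting time at zero, while batched pulling adds up to `pull_interval - 1` ticks of waiting.

```python
from typing import Iterable, List


def push_latencies(arrival_ticks: Iterable[int]) -> List[int]:
    """Push model: upstream forwards each record the moment it is produced,
    so in this toy model downstream handles it in the same tick (wait = 0)."""
    return [0 for _ in arrival_ticks]


def pull_latencies(arrival_ticks: Iterable[int], pull_interval: int) -> List[int]:
    """Pull model: downstream only fetches every `pull_interval` ticks,
    so each record waits until the first pull at or after its arrival."""
    if pull_interval <= 0:
        raise ValueError("pull_interval must be positive")
    waits = []
    for t in arrival_ticks:
        # Round t up to the next multiple of pull_interval (integer ceiling division).
        next_pull = -(-t // pull_interval) * pull_interval
        waits.append(next_pull - t)
    return waits


if __name__ == "__main__":
    ticks = range(10)  # one record arrives per tick
    print("push:", push_latencies(ticks))
    print("pull:", pull_latencies(ticks, pull_interval=5))
```

    With `pull_interval=5`, the pull waits come out as `[0, 4, 3, 2, 1, 0, 4, 3, 2, 1]`. Shrinking the interval lowers the wait but means more fetch requests per record, which is the waste the reply mentions.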

  • hehetown
    2022-03-10
    Following the installation video, I ran kubernetes-session.sh, and after startup it reported this error:
    INFO org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Starting the resource manager.
    INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Closing the slot manager.
    INFO org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager [] - Suspending the slot manager.
    ERROR org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - Fatal error occurred in ResourceManager.
    org.apache.flink.runtime.resourcemanager.exceptions.ResourceManagerException: Could not start the ResourceManager akka.tcp://flink@my-first-flink-cluster.default:6123/user/rpc/resourcemanager_1
        at org.apache.flink.runtime.resourcemanager.ResourceManager.onStart(ResourceManager.java:223) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStart(RpcEndpoint.java:181) ~[flink-dist_2.12-1.14.3.jar:1.14.3]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StoppedState.lambda$start$0(AkkaRpcActor.java:624) ~[flink-rpc-akka_7204fcdf-6732-4f67-8a25-1d3d4b67a0c2.jar:1.14.3]
        at org.apache.flink.runtime.concurrent.akk
    
    
  • 程序猿的小浣熊
    2021-12-28
    Question for the instructor: once the shuffle switches to active pushing, how does an upstream node know which nodes to push its data to?
    
  • Allan
    2021-02-22
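    The dataflow design focuses on how to handle data correctness, latency, out-of-order data, and large data volumes. It has three parts: the source, which connects to external systems; the operators, which hold the intermediate computation logic; and the sink, which writes out the results and interacts with external systems. Dataflow uses a process model in which data is sent from upstream to downstream.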
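
    A toy sketch of the source / operator / sink structure described above, written in plain Python rather than Flink's API; the classes `ListSink`, `MapOperator`, `FilterOperator` and the functions `run_source` and `build_and_run` are hypothetical names for illustration. Each node calls `receive` on its downstream neighbour, so records flow from upstream to downstream by being pushed one at a time.

```python
from typing import Callable, Iterable, List, Protocol


class Receiver(Protocol):
    """Anything that can accept a record pushed from upstream."""
    def receive(self, record: int) -> None: ...


class ListSink:
    """Sink: the last node, which 'writes' results to an external system (a list here)."""
    def __init__(self) -> None:
        self.output: List[int] = []

    def receive(self, record: int) -> None:
        self.output.append(record)


class MapOperator:
    """Intermediate operator: transforms a record and pushes the result downstream at once."""
    def __init__(self, fn: Callable[[int], int], downstream: Receiver) -> None:
        self.fn = fn
        self.downstream = downstream

    def receive(self, record: int) -> None:
        self.downstream.receive(self.fn(record))


class FilterOperator:
    """Intermediate operator: pushes downstream only the records that satisfy the predicate."""
    def __init__(self, pred: Callable[[int], bool], downstream: Receiver) -> None:
        self.pred = pred
        self.downstream = downstream

    def receive(self, record: int) -> None:
        if self.pred(record):
            self.downstream.receive(record)


def run_source(records: Iterable[int], downstream: Receiver) -> None:
    """Source: reads from an external system (an iterable here) and pushes each record."""
    for record in records:
        downstream.receive(record)


def build_and_run(records: Iterable[int]) -> List[int]:
    # Wire the graph from the sink backwards: source -> filter(odd) -> map(x * 10) -> sink.
    sink = ListSink()
    pipeline = FilterOperator(lambda x: x % 2 == 1, MapOperator(lambda x: x * 10, sink))
    run_source(records, pipeline)
    return sink.output


if __name__ == "__main__":
    print(build_and_run([1, 2, 3, 4, 5]))  # [10, 30, 50]
```

    The graph is wired from the sink backwards because each node needs a reference to its downstream node before any record is pushed; this reflects the push direction only, not any specific Flink class.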
    
    