当前位置: 动力学知识库 > 问答 > 编程问答 >

GroupByKey returns no elements in Google Cloud Dataflow

问题描述:

I'm new to Dataflow, so this is probably an easy question.

I want to try out the Sessions windowing strategy. According to the windowing documentation, windowing is not applied until we've done a GroupByKey, so I'm trying to do that.

However, when I look at my pipeline in Google Cloud Platform, I can see that MapElements returns elements, but no elements are returned by GroupByKey ("Elements Added: -"). What am I doing wrong when grouping by key?

Here's the most relevant part of the code:

events = events

.apply(Window.named("eventsSessionsWindowing")

.<MyEvent>into(Sessions.withGapDuration(Duration.standardSeconds(3)))

);

PCollection<KV<String, MyEvent>> eventsKV = events

.apply(MapElements

.via((MyEvent e) -> KV.of(ExtractKey(e), e))

.withOutputType(new TypeDescriptor<KV<String, MyEvent>>() {}));

PCollection<KV<String, Iterable<MyEvent>>> eventsGrouped = eventsKV.apply(GroupByKey.<String, MyEvent>create());

网友答案:

A GroupByKey fires according to a triggering strategy, which determines when the system thinks that all data for this key/window has been received and it's time to group it and pass to downstream transforms. The default strategy is:

The default trigger for a PCollection is event time-based, and emits the results of the window when the system's watermark (Dataflow's notion of when it "should" have all the data) passes the end of the window.

Please see Default Trigger for details. You were seeing a delay of a couple of minutes that corresponded to the progression of PubSub's watermark.

分享给朋友:
您可能感兴趣的文章:
随机阅读: