CheckpointListener/CheckpointNotificationListener
This interface provides hooks for the operator developer to align behavior with the engine's checkpointing mechanism (which will be covered in detail in Chapter 5, Fault Tolerance and Reliability). For now, it is sufficient to understand that during checkpointing the state of the operator is externalized and saved to durable storage. To optimize this and align any additional fault tolerance support that the operator requires, the interfaces provide:
- beforeCheckpoint: This is called before the engine extracts the state from the operator. This is an opportunity to update the state. For example, suppose the operator implementation needs to write state to a write-ahead log and needs to remember the file name(s) and offset range(s) so that, when there is a need to recover, it knows what to read. In this example, the operator would flush pending data to the file and then update its state with the offset information, so that this metadata is checkpointed and available during recovery.
- checkpointed: This is called immediately after a checkpoint is completed. This could be used to optimize resources or record information about a completed checkpoint. In most cases, however, the operator developer would implement beforeCheckpoint instead.
- committed: This is called with the latest known streaming window interval identifier that was fully processed in the entire DAG and therefore will never need to be replayed during recovery. This is an opportunity for operator implementations to finalize state and make it available to external consumers. For example, a file writer could close intermediate files and move/rename them to the final location. Alternatively, incremental state saving mechanisms can materialize that state and drop recovery logs.
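The three callbacks above can be illustrated with a minimal, self-contained sketch. It does not use the engine's own classes: `CheckpointHooks` is a hypothetical stand-in that borrows only the three method names from the text, and the `long windowId` parameter, the `WalOperatorSketch` class, and the in-memory list standing in for a write-ahead log file are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the engine's listener interface. Only the three
// callback names come from the text; the windowId parameter is an assumption.
interface CheckpointHooks {
  void beforeCheckpoint(long windowId);
  void checkpointed(long windowId);
  void committed(long windowId);
}

// Sketch of an operator that buffers records and writes them to a log.
class WalOperatorSketch implements CheckpointHooks {
  // State that would be checkpointed: number of records flushed to the log.
  long flushedOffset = 0;
  // Records received since the last flush (not yet in the log).
  final List<String> pending = new ArrayList<>();
  // In-memory stand-in for the write-ahead log file.
  final List<String> log = new ArrayList<>();
  // Log offset recorded for each completed checkpoint, keyed by window id.
  final TreeMap<Long, Long> offsetAtCheckpoint = new TreeMap<>();
  // Offset up to which the log content has been finalized.
  long finalizedOffset = 0;

  void process(String record) {
    pending.add(record);
  }

  @Override
  public void beforeCheckpoint(long windowId) {
    // Flush pending data, then update the state so that the offset is
    // part of what gets checkpointed.
    log.addAll(pending);
    pending.clear();
    flushedOffset = log.size();
  }

  @Override
  public void checkpointed(long windowId) {
    // Record information about the checkpoint that just completed.
    offsetAtCheckpoint.put(windowId, flushedOffset);
  }

  @Override
  public void committed(long windowId) {
    // Everything up to the latest checkpoint at or before the committed
    // window will never be replayed, so it can be finalized.
    Map.Entry<Long, Long> entry = offsetAtCheckpoint.floorEntry(windowId);
    if (entry != null) {
      finalizedOffset = entry.getValue();
      // Drop bookkeeping that is no longer needed for recovery.
      offsetAtCheckpoint.headMap(windowId, true).clear();
    }
  }
}
```

In a real operator, the finalization step in `committed` would be where intermediate files are closed and renamed to their final location; here it only advances a counter so that the ordering of the callbacks is easy to follow.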