DataFusion Comet (part 4) dataflow

#rust #comet #datafusion #spark

This note records the dataflow between the Java and Rust execution sides, as I learned it from reading the code directly.

Tips

All data is passed from the Java side to the Rust side in Arrow format. The key points are as follows:

  1. On the Comet Java side, the Arrow data read from the Parquet file is wrapped in a CometVector. Nothing in the data is changed; the CometVector is just a delegator.
  2. CometNativeExec's doExecuteColumnar is triggered by other vanilla Spark operators. It then fetches the input data and feeds it into the native operator chain.
  3. On the native Rust side, the input is wrapped by a fake native scan exec, which fetches the Arrow data through the JNI API. This is zero copy.
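
The three points above can be sketched as a small std-only Rust model. This is only an illustration of the shape of the flow, not Comet's actual code: `Column`, `Batch`, `DelegatingVector`, `FakeScan`, `sum_first_column` and `demo` are all hypothetical names, an `Arc<Vec<i64>>` stands in for an Arrow array, and a plain closure stands in for the JNI call that hands batches to the native side.

```rust
use std::sync::Arc;

/// Stand-in for an Arrow array: an immutable, reference-counted buffer.
/// (Assumption: real Arrow arrays are richer; this only models sharing.)
type Column = Arc<Vec<i64>>;

/// Stand-in for an Arrow record batch.
struct Batch {
    columns: Vec<Column>,
}

/// Hypothetical analogue of the delegator in point 1: it wraps a column
/// and forwards reads to it without transforming or copying the data.
struct DelegatingVector {
    inner: Column,
}

impl DelegatingVector {
    fn new(inner: Column) -> Self {
        Self { inner }
    }

    fn len(&self) -> usize {
        self.inner.len()
    }

    fn get(&self, i: usize) -> i64 {
        self.inner[i]
    }

    /// Hands out another handle to the same buffer (pointer copy only).
    fn column(&self) -> Column {
        Arc::clone(&self.inner)
    }
}

/// Hypothetical analogue of the fake scan in point 3: the leaf of the
/// native operator chain. `pull` stands in for the JNI call that asks
/// the JVM side for the next batch.
struct FakeScan<F: FnMut() -> Option<Batch>> {
    pull: F,
}

impl<F: FnMut() -> Option<Batch>> Iterator for FakeScan<F> {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        (self.pull)()
    }
}

/// A downstream "native operator" that consumes whatever the scan yields.
fn sum_first_column(scan: impl Iterator<Item = Batch>) -> i64 {
    scan.map(|b| b.columns[0].iter().sum::<i64>()).sum()
}

/// Wires the pieces together and reports the sum plus whether the scan
/// saw the very same buffer that the "JVM side" created.
fn demo() -> (i64, bool) {
    let source: Column = Arc::new(vec![1, 2, 3]);
    let vector = DelegatingVector::new(Arc::clone(&source));

    // Batches waiting on the "JVM side" (point 2: input for the native chain).
    let mut pending = vec![Batch { columns: vec![vector.column()] }];
    let scan = FakeScan { pull: move || pending.pop() };

    let mut same_buffer = true;
    let total = sum_first_column(scan.inspect(|b| {
        // Same allocation on both sides means no bytes were copied.
        same_buffer &= Arc::ptr_eq(&source, &b.columns[0]);
    }));
    (total, same_buffer)
}
```

Calling `demo()` runs one batch through the chain. The `Arc::ptr_eq` check is the point of the sketch: the operator reads the same allocation that the producer created, which is the sense in which the handoff is zero copy.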

Whiteboard

Pasted image 20240531161202.png