Performance issues when Target batch records are drained too often
When using the target SDK in batch mode, the default behavior is to drain batches any time a new state message arrives, regardless of the batch settings provided by the target. A specific case: tap-snowflake emits a currently_syncing state message after every 1000 records, but on the target side I may have a batch size greater than 1000. In this case my batch is drained more frequently than I'd like. There is a private property, _DRAIN_AFTER_STATE, that is set to true, but there's no way to override it without changing private variables.
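To make the cost concrete, here is a toy simulation of the behavior described above. It is not SDK code: `simulate_drains` and its parameters are names made up for illustration, and it assumes a state message arrives after every `state_interval` records and that each state message forces a drain of whatever is pending.

```python
def simulate_drains(total_records, batch_size, state_interval, drain_after_state):
    """Return the size of each drained batch for a simulated record stream.

    Hypothetical model: one state message follows every `state_interval`
    records; when `drain_after_state` is True, each state message drains
    the pending batch even if it is not full.
    """
    drained = []
    pending = 0
    for n in range(1, total_records + 1):
        pending += 1
        # Normal drain: the batch reached its configured size.
        if pending >= batch_size:
            drained.append(pending)
            pending = 0
        # Early drain: a state message arrived and draining on state is on.
        if n % state_interval == 0 and drain_after_state and pending:
            drained.append(pending)
            pending = 0
    # Flush whatever is left at end of stream.
    if pending:
        drained.append(pending)
    return drained


# State every 1000 records, batch size 10000, 10000 records in total.
with_state_drain = simulate_drains(10_000, 10_000, 1_000, drain_after_state=True)
without_state_drain = simulate_drains(10_000, 10_000, 1_000, drain_after_state=False)
print(len(with_state_drain), len(without_state_drain))
```

Under these assumptions the same 10000 records are written in ten batches of 1000 instead of one batch of 10000, which is where the overhead comes from for targets with an expensive per-batch load step.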
Proposed Changes:
- Make _DRAIN_AFTER_STATE a public property that can be overridden, like max_parallelism.
- Default behavior should be to check the new state message against the existing state and only drain if the state has changed.
- Log a warning or info message letting the user know that their batch was drained before it was full, and for what reason, especially if this ticket implements other optimizations. Something like:
  Batch draining prior to being full due to a state message change: batch 1000, max size 10000
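A rough sketch of how the proposed changes could fit together is below. Everything in it is hypothetical: `SketchSink`, `drain_after_state`, `process_state`, and `max_size` are stand-in names for illustration, not the SDK's actual classes or attributes, and the state payload is just an example dictionary.

```python
import logging

logger = logging.getLogger("batch_drain_sketch")


class SketchSink:
    """Toy stand-in for a batching target; not the SDK's real class."""

    # Hypothetical public counterpart of _DRAIN_AFTER_STATE.
    drain_after_state = True
    max_size = 10000

    def __init__(self):
        self.pending = []
        self.latest_state = None
        self.drain_count = 0

    def process_record(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_size:
            self.drain()

    def process_state(self, state):
        # Only treat the message as a drain trigger if the state changed.
        changed = state != self.latest_state
        self.latest_state = state
        if self.drain_after_state and changed and self.pending:
            self.drain(reason="state message change")

    def drain(self, reason=None):
        # Tell the user when a batch is drained early, and why.
        if reason is not None and len(self.pending) < self.max_size:
            logger.info(
                "Batch draining prior to being full due to a %s: "
                "batch %d, max size %d",
                reason, len(self.pending), self.max_size,
            )
        self.pending.clear()
        self.drain_count += 1


class NoStateDrainSink(SketchSink):
    """A target that opts out, the same way it might override max_parallelism."""

    drain_after_state = False
```

With this shape, a repeated state message that carries no new information leaves the pending batch alone, and a target that never wants state-triggered drains only has to override one class attribute.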
Thoughts? I'm happy to make these proposed changes if they're accepted.
Related to #135 (closed).
Edited by AJ Steers