ELT run fails when more than 64KiB of state is written
What is the current bug behavior?
meltano elt with target-jsonl throws an error after a long run, once more than 800 objects have been downloaded.
What is the expected correct behavior?
meltano elt with target-jsonl should complete successfully without any error.
Steps to reproduce
Run meltano elt tap-salesforce target-jsonl with all objects (.) selected. After more than 800 objects have been downloaded, Meltano throws an error.
Relevant logs and/or screenshots
ERROR exception calling callback for <Future at 0x7fb9db662110 state=finished returned NoneType>
Traceback (most recent call last):
  File "/usr/lib64/python3.7/concurrent/futures/_base.py", line 324, in _invoke_callbacks
    callback(self)
  File "/usr/lib64/python3.7/asyncio/futures.py", line 365, in _call_set_state
    dest_loop.call_soon_threadsafe(_set_state, destination, source)
  File "/usr/lib64/python3.7/asyncio/base_events.py", line 732, in call_soon_threadsafe
    self._check_closed()
  File "/usr/lib64/python3.7/asyncio/base_events.py", line 479, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Task exception was never retrieved
future: <Task finished coro=<SingerRunner.bookmark() done, defined at /home/ec2-user/.venv/meltano/lib64/python3.7/site-packages/meltano/core/runner/singer.py:133> exception=ValueError('Separator is found, but chunk is longer than limit')>
Traceback (most recent call last):
  File "/usr/lib64/python3.7/asyncio/streams.py", line 496, in readline
    line = await self.readuntil(sep)
  File "/usr/lib64/python3.7/asyncio/streams.py", line 592, in readuntil
    'Separator is found, but chunk is longer than limit', isep)
asyncio.streams.LimitOverrunError: Separator is found, but chunk is longer than limit
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/ec2-user/.venv/meltano/lib64/python3.7/site-packages/meltano/core/runner/singer.py", line 135, in bookmark
    last_state = await target_stream.readline()
  File "/usr/lib64/python3.7/asyncio/streams.py", line 505, in readline
    raise ValueError(e.args)
ValueError: Separator is found, but chunk is longer than limit
ELT could not complete, an error happened during the process: Subprocesses didn't exit cleanly: tap(1), target(0)
The error indicates that the tap is writing a state line (which the target forwards) longer than 64 KiB, the default buffer limit of an asyncio StreamReader, or that the total state emitted up to that point exceeds 64 KiB (see https://stackoverflow.com/a/55458913).
That's a lot of state, but not necessarily out of character for Salesforce.
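To make the failure mode concrete, here is a minimal sketch that reproduces the same ValueError outside Meltano. It feeds a single oversized line into a bare asyncio.StreamReader. The helper names read_one_line and try_read, and the 70,000-byte payload, are made up for illustration; this is not Meltano's code.

```python
import asyncio

# asyncio's default StreamReader buffer limit is 2 ** 16 bytes (64 KiB).
DEFAULT_LIMIT = 2 ** 16


async def read_one_line(payload: bytes, limit: int = DEFAULT_LIMIT) -> bytes:
    """Feed payload into a StreamReader and read back a single line."""
    # Create the reader inside the coroutine so it binds to the running loop.
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(payload)
    reader.feed_eof()
    return await reader.readline()


def try_read(payload: bytes, limit: int = DEFAULT_LIMIT) -> str:
    """Hypothetical helper: report whether readline() succeeded or raised."""
    try:
        line = asyncio.run(read_one_line(payload, limit))
        return f"ok: {len(line)} bytes"
    except ValueError as err:
        return f"ValueError: {err}"


# A made-up state line of roughly 70,000 bytes, terminated by a newline.
big_state_line = b'{"bookmarks": "' + b"x" * 70_000 + b'"}\n'

print(try_read(b'{"bookmarks": {}}\n'))          # short line, default limit
print(try_read(big_state_line))                  # long line, default limit
print(try_read(big_state_line, limit=2 ** 20))   # long line, 1 MiB limit
```

With the default limit the oversized line fails in readline(), while the same line is read fine once the limit is raised. If the runner starts the target through asyncio.create_subprocess_exec (an assumption here), that function accepts a limit argument that sets the buffer limit for the process's stdout and stderr readers, which may be one way to address this.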
It would also help if Meltano persisted state to the system database more frequently, perhaps after each object finishes downloading.
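As a rough illustration of that suggestion, the sketch below saves every valid state line as soon as it arrives instead of only at the end of the run. The job_state table, the persist_states function, and the use of sqlite3 are all assumptions made for this example; they do not reflect Meltano's actual schema or API.

```python
import json
import sqlite3


def persist_states(state_lines, conn, job_id):
    """Store each valid JSON state line right away; return how many were saved."""
    # Hypothetical table: one row per job holding the latest state payload.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state "
        "(job_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    saved = 0
    for raw in state_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            json.loads(raw)  # skip anything that is not valid JSON
        except json.JSONDecodeError:
            continue
        # Overwrite the previous state so a crash loses at most one object.
        conn.execute(
            "INSERT OR REPLACE INTO job_state (job_id, state) VALUES (?, ?)",
            (job_id, raw),
        )
        conn.commit()
        saved += 1
    return saved


# Example with made-up state lines, as a target might forward them.
example_lines = [
    '{"bookmarks": {"Account": "2020-01-01"}}\n',
    "not json\n",
    '{"bookmarks": {"Account": "2020-01-02", "Contact": "2020-01-02"}}\n',
]
example_conn = sqlite3.connect(":memory:")
saved_count = persist_states(example_lines, example_conn, "example-job")
latest = example_conn.execute(
    "SELECT state FROM job_state WHERE job_id = ?", ("example-job",)
).fetchone()[0]
print(saved_count, latest)
```

Committing after each line trades some write overhead for the guarantee that a failure late in a long run, like the one reported here, does not discard the bookmarks of objects that already finished.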
Further regression test
Ensure we automatically catch similar issues in the future
- Write additional adequate test cases and submit test results
- Test results should be reviewed by a person from the team
/label bug /label ~"To Do"