[fix for 132] Two GetAssetOutboundFee patches #check-lint-warning

added 1 commit

caaaf28b - fix test

changed title from Don't return error if no NetworkFee is set for chain, and use GetFee for older GetAssetOutboundFee versions to [fix for 132] Two GetAssetOutboundFee patches

changed the description

changed title from [fix for 132] Two GetAssetOutboundFee patches to [fix for 132] Two GetAssetOutboundFee patches #check-lint-warning

approved this merge request

LGTM but will defer approval to @akrokr since this is in response to an issue he brought up

assigned to @akrokr and @Multipartite

approved this merge request

added 3 commits

887e8101 - 1 commit from branch develop
bf508115 - gas manager patches
0d5a17c7 - fix test

Compare with previous version

enabled an automatic merge when the pipeline for 0d5a17c7 succeeds

merged

resolved all threads

Not related to the MR itself, but commenting here to @ursa9r on that the develop-branch simulation test failed
whereas a run here (and for a later develop-branch simulation test) succeeded.

Failed job:
https://gitlab.com/thorchain/thornode/-/jobs/6743421833

9:31PM INF actors/swap.go:242 > swap complete actor="Swap LTC.LTC => GAIA.ATOM" from=LTC.LTC maxExpected=100836120 minExpected=91232680 received=100937100 to=GAIA.ATOM user=goat
9:31PM ERR pkg/types/actor.go:164 > op failed error="received amount outside of tolerance" actor="Swap LTC.LTC => GAIA.ATOM" from=LTC.LTC to=GAIA.ATOM user=goat
9:31PM FTL pkg/dag/dag.go:105 > actor execution failed error="received amount outside of tolerance" actor="Swap LTC.LTC => GAIA.ATOM" from=LTC.LTC to=GAIA.ATOM user=goat

MR run success:
https://gitlab.com/thorchain/thornode/-/jobs/6743299346

9:02AM INF actors/swap.go:242 > swap complete actor="Swap LTC.LTC => GAIA.ATOM" from=LTC.LTC maxExpected=101419290 minExpected=91760310 received=97558800 to=GAIA.ATOM user=deer

Develop branch later commit success:
https://gitlab.com/thorchain/thornode/-/jobs/6743910135

10:42PM INF actors/swap.go:242 > swap complete actor="Swap LTC.LTC => GAIA.ATOM" from=LTC.LTC maxExpected=102920895 minExpected=93118905 received=99017500 to=GAIA.ATOM user=hawk

Is there sufficient information in the logs or/and artifacts to determine why the failed job failed?
I do note that in the failed job received=100937100 is greater than maxExpected=100836120.
|
[Edit: I now think so,
regarding the quote being calculated before any swaps, then the swap itself taking place after
one complete GAIA.ATM => LTC.LTC swap,
two other complete GAIA.ATOM inbound swaps,
one other complete LTC.LTC outbound swap,
no complete LTC.LTC inbound swaps (other than itself),
and no complete GAIA.ATOM outbound swaps (other than itself).]

After running the THORNode log through ansi2txt:

2024-04-30_failed_simulation_test__ansi2txt__docker-thornode-1.log

9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/handler_observed_txin.go:195 > handleMsgObservedTxIn request Tx:="CC924A3D1662BAAC160F8A56611FF7BA0F4EA3827135E70AF3DCD20610EA1640: rltc1qxkz88t875m7ms7wa6220estwwdlc86pqjsk5lg ==> rltc1qzf3gsk7edzwl9syyefvfhle37cjtql354jcs8g (Memo: =:GAIA.ATOM:cosmos1xkz88t875m7ms7wa6220estwwdlc86pqjgsgqp) 10000000 LTC.LTC (gas: [10000 LTC.LTC])"

9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/handler_swap.go:269 > receive MsgSwap request tx hash=CC924A3D1662BAAC160F8A56611FF7BA0F4EA3827135E70AF3DCD20610EA1640 signer=tthor1zf3gsk7edzwl9syyefvfhle37cjtql35h6k85m source asset=LTC.LTC target asset=GAIA.ATOM
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:194 > swapping coins={"amount":"10000000","asset":"LTC.LTC"} fee=1941323 from=rltc1qxkz88t875m7ms7wa6220estwwdlc86pqjsk5lg target=THOR.RUNE to=cosmos1xkz88t875m7ms7wa6220estwwdlc86pqjgsgqp
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:311 > pre swap asset=1980632232 lp units=20000000000 pool=LTC.LTC rune=20195583376 synth units=0
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:368 > post swap asset=1990632232 emit asset=100943458 lp units=20000000000 pool=LTC.LTC rune=20094639918 synth units=0
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:194 > swapping coins={"amount":"100943458","asset":"THOR.RUNE"} fee=1941323 from=rltc1qxkz88t875m7ms7wa6220estwwdlc86pqjsk5lg target=GAIA.ATOM to=cosmos1xkz88t875m7ms7wa6220estwwdlc86pqjgsgqp
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:311 > pre swap asset=20300000000 lp units=20000000000 pool=GAIA.ATOM rune=19704426393 synth units=0
9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/swap_current.go:368 > post swap asset=20197062900 emit asset=102937100 lp units=20000000000 pool=GAIA.ATOM rune=19805369851 synth units=0

9:31PM INF gitlab.com/thorchain/thornode/x/thorchain/handler_observed_txout.go:188 > handleMsgObservedTxOut request Tx:="C38B9D0078F363CAE321D49B4546F02012B51AD03F8EE8965C87B4E781D89323: cosmos1zf3gsk7edzwl9syyefvfhle37cjtql35427vcp ==> cosmos1xkz88t875m7ms7wa6220estwwdlc86pqjgsgqp (Memo: OUT:CC924A3D1662BAAC160F8A56611FF7BA0F4EA3827135E70AF3DCD20610EA1640) 100937100 GAIA.ATOM (gas: [1500000 GAIA.ATOM])"

By contrast, I can't download artifacts (THORNode logs) from any passed pipeline job.

Looking at the log positions, in the failed job the LTC.LTC => GAIA.ATOM was the very earliest "broadcasted swap tx", whereas in the succeed two it was much later.
https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/test/simulation/actors/swap.go#L139-152
Looking at the timing/input of getQuote.
https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/test/simulation/actors/swap.go#L45-55

	// lock an account with from balance
	a.Ops = append(a.Ops, a.acquireUser)

	// generate swap quote
	a.Ops = append(a.Ops, a.getQuote)

	// send swap inbound
	a.Ops = append(a.Ops, a.sendSwap)

	// verify the swap within expected range
	a.Ops = append(a.Ops, a.verifyOutbound)

Perhaps then the swap quote was calculated before any other swaps had taken place,
but the swap itself took place in the queue after other swaps (due to slippage)
and the three swap complete actor="Swap GAIA.ATOM swaps which took place first,
especially the swap complete actor="Swap GAIA.ATOM => LTC.LTC which both decreased the inbound depth and increased the outbound depth.

Running make test-simulation locally:
WARN[0000] /thornode/build/docker/docker-compose.yml: version is obsolete
9:31AM FTL cmd/mocknet.go:74 > parallelism exceeds number of user accounts
This startled me.
https://gitlab.com/thorchain/thornode/-/blob/develop/test/simulation/cmd/mocknet.go#L52-75
Necessary to commit to memory that (unlike regression tests) simulation test PARALLELISM= has a maximum of 12?

9:34AM INF cmd/main.go:66 > waiting for bifrost to be ready
9:34AM INF cmd/mocknet.go:76 > initializing mocknet simulation user accounts
9:34AM INF cmd/mocknet.go:99 > posting gaia network fee
9:34AM INF ../../bifrost/thorclient/broadcast.go:46 > account info account_number=6 module=thorchain_client sequence_number=0
9:34AM INF ../../bifrost/thorclient/broadcast.go:92 > failed tx bytes=220 code=32 messages=[{"block_height":1,"chain":"GAIA","signer":"tthor1zf3gsk7edzwl9syyefvfhle37cjtql35h6k85m","transaction_fee_rate":1000000,"transaction_size":1}] module=thorchain_client
9:34AM ERR cmd/mocknet.go:105 > failed to post network fee error="fail to broadcast to THORChain,code:32, log:account sequence mismatch, expected 1, got 0: incorrect account sequence"
9:34AM INF ../../bifrost/thorclient/broadcast.go:46 > account info account_number=6 module=thorchain_client sequence_number=1
9:34AM INF ../../bifrost/thorclient/broadcast.go:92 > failed tx bytes=222 code=32 messages=[{"block_height":1,"chain":"GAIA","signer":"tthor1zf3gsk7edzwl9syyefvfhle37cjtql35h6k85m","transaction_fee_rate":1000000,"transaction_size":1}] module=thorchain_client
9:34AM ERR cmd/mocknet.go:105 > failed to post network fee error="fail to broadcast to THORChain,code:32, log:account sequence mismatch, expected 2, got 1: incorrect account sequence"
9:35AM INF ../../bifrost/thorclient/broadcast.go:46 > account info account_number=6 module=thorchain_client sequence_number=2
9:35AM INF ../../bifrost/thorclient/broadcast.go:100 > Received a TxHash of 6299843588363FB6FFC7B7795D3B04E2734953AB01504187E69994F7AA7A2A8A from the thorchain module=thorchain_client

9:35AM INF cmd/mocknet.go:261 > account funded account=rabbit address=tthor1eyllaazzj8kmkqq6mzswwwk5ml7xlt703q37qu amount=1000.000000 chain=THOR txid=E7628875AB1D0D096E36C8DD9FCED5152037DBA85EA4EB16A4C07D874E2A99CE
9:35AM INF ../../bifrost/thorclient/broadcast.go:46 > account info account_number=9 module=thorchain_client sequence_number=11
9:35AM INF ../../bifrost/thorclient/broadcast.go:100 > Received a TxHash of 8B35858B16A27DF836237F006CAF28C771518292730ADF645011CBBC2A08843C from the thorchain module=thorchain_client
9:35AM INF cmd/mocknet.go:261 > account funded account=wolf address=tthor1f6z8lvkkj4u0f4g0cz6akzhydn6d9yv64t7ss7 amount=1000.000000 chain=THOR txid=8B35858B16A27DF836237F006CAF28C771518292730ADF645011CBBC2A08843C
9:35AM INF cmd/mocknet.go:219 > account funded account=hawk address=rltc1qpnmhtlt8umryv9h4uz60ylc07x5d6lkxqrehvc amount=100.000000 chain=LTC txid=0d4265abfa7c33b652c13a22fc55c0e1602c1fa7a396854ff8705c2e48ee16b6
9:35AM FTL cmd/mocknet.go:209 > failed to broadcast funding tx error="insufficient funds for gas * price + value: balance 0, tx cost 10000003893717544000, overshot 10000003893717544000" chain=ETH from=0xee4eaa642b992412f628ff4cec1c96cf2fd0ea4d
9:35AM INF cmd/mocknet.go:219 > account funded account=hawk address=bcrt1qpnmhtlt8umryv9h4uz60ylc07x5d6lkx7wr7mx amount=10.000000 chain=BTC txid=2f9d72162f6a92f9464693abb3176e04d189bf374acf7a1a8c912c254ef5512c
make: *** [Makefile:238: _test-simulation] Error 1
make: *** [Makefile:222: test-simulation] Error 2

Trying again with no PARALLELISM= set.

Same thing.

docker ps
The mocknet containers are still running, which is convenient for debugging and resource-inconvenient if forgetting to manually stop them afterwards.
make halt-mocknet

Looking again at the pipeline logs:
The "receive MsgSwap request" for "source asset=LTC.LTC target asset=GAIA.ATOM" was for committed state "height=28".
"source asset=GAIA.ATOM target asset=LTC.LTC" was for "height=27", the block before.
|
The "handleMsgObservedTxIn request" for LTC.LTC -> GAIA.ATOM was similarly block 28, same block as its swap;
the one for LTC.LTC -> GAIA.ATOM was block 27, same block as its swap.

Perhaps even if they had been in the same block,
if they each got a quote before either had taken place
then whichever came later could have a lower chance of passing the test?

I note again that in the failed job
swap complete actor="Swap LTC.LTC =>
first appeared for the swap to GAIA.ATOM,
in contrast with both succeeded jobs.

Looking into that account sequence mismatch both times:

https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/bifrost/thorclient/broadcast.go#L36-43

		b.accountNumber, seqNum, err = b.getAccountNumberAndSequenceNumber()
		if err != nil {
			return noTxID, fmt.Errorf("fail to get account number and sequence number from thorchain : %w", err)
		}
		b.blockHeight = blockHeight
		if seqNum > b.seqNumber {
			b.seqNumber = seqNum
		}

https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/bifrost/thorclient/broadcast.go#L83-96

	// Code will be the tendermint ABICode , it start at 1 , so if it is an error , code will not be zero
	if commit.Code > 0 {
		if commit.Code == 32 {
			// bad sequence number, fetch new one
			_, seqNum, _ := b.getAccountNumberAndSequenceNumber()
			if seqNum > 0 {
				b.seqNumber = seqNum
			}
		}
		b.logger.Info().Int("bytes", len(txBytes)).Uint32("code", commit.Code).Interface("messages", msgs).Msg("failed tx")
		// commit code 6 means `unknown request` , which means the tx can't be accepted by thorchain
		// if that's the case, let's just ignore it and move on
		if commit.Code != 6 {
			return txHash, fmt.Errorf("fail to broadcast to THORChain,code:%d, log:%s", commit.Code, commit.RawLog)

So if THORChain requires sequence number 0 because it's the first transaction,
it explicitly keeps b.seqNumber set to the wrong number?
[Edit: But there should be no way for it to get wrongly higher than THORChain zero unless THORChain has been rolled back;
here the issue appears to have been parallel broadcasts
(GAIA network fee from test/simulation/cmd/mocknet.go)
with the same sequence number]

https://gitlab.com/search?search=.seqnumber+%3D&nav_source=navbar&project_id=13422983&group_id=5545144&search_code=true&repository_ref=v1.132.0
No other .seqnumber = .
https://gitlab.com/search?search=seqNumber%3A&nav_source=navbar&project_id=13422983&group_id=5545144&search_code=true&repository_ref=v1.132.0
SeqNumber: acc.Sequence, only for BNB and GAIA, suggesting that always starting at 0 for thorchainBridge.
https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/bifrost/thorclient/thorchain.go#L71
seqNumber uint64

Though there's
https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/bifrost/thorclient/broadcast.go#L103
atomic.AddUint64(&b.seqNumber, 1)
that's only reached if it doesn't return first from the wrong sequence number.

What especially puzzles me is why both the 'expected' and 'got' were different (for the same signer).

2024-04-30_failed_simulation_test__ansi2txt__docker-bifrost-1.log

No "failed tx" log, so only happening locally?
docker ps -a
Ah. Though make halt-mocknet was used rather than make stop-mocknet, the containers (and thus logs) are gone.
make test-simulation
docker logs docker-bifrost-1 --timestamps -f
docker logs docker-thornode-1 --timestamps |& grep "incorrect account sequence"
Nothing. Looking again at the source--
https://gitlab.com/thorchain/thornode/-/blob/v1.132.0/test/simulation/cmd/mocknet.go#L88-105

	// validators
	for _, mnemonic := range mocknetValidatorMnemonics {
		wg.Add(1)
		sem <- struct{}{}
		go func(mnemonic string) {
			a := NewAccount(mnemonic, liteClientConstructors)
			mu.Lock()
			c.NodeAccounts = append(c.NodeAccounts, a)
			mu.Unlock()

			// send gaia network fee observation
			log.Info().Msg("posting gaia network fee")
			for {
				_, err := a.Thorchain.PostNetworkFee(1, common.GAIAChain, 1, 1_000_000)
				if err == nil {
					break
				}
				log.Error().Err(err).Msg("failed to post network fee")

Ah, so the simulation test Mocknet is trying to broadcast in parallel with other broadcast attempts;
that's why the THORChain needed sequence is going up (and why it's wrong in the first place),
in that one of the broadcasts for that sequence number is succeeding and the other (the simulation test Mocknet GAIA network fee broadcast) is failing
(until it doesn't and the for loop ends).

PARALLELISM in simulations is the number of parallel actors. More mnemonics can be added in time if needed.
Yes these are inherently a bit racy - that's why success on the pipeline stage is not required while we hammer out kinks. Over time the races will be reduced by additions like an arbitrage actor to maintain pool prices, adjustments to the expected range, perhaps additions like global semaphores or locks per pool.

2. I'm happy if there's awareness of the current state.

1. Could there be (is there any problem from)
"if parallelism is set higher than the maximum mnemonics, reduce to the maximum mnemonics and continue"?

Especially if I were leaving to run a full
PARALLELISM=[number] make test lint-ci test-regression test-simulation
I would welcome being able to use a higher number rather than have to do the test-regression in small batches
or needing to break it into a longer && PARALLELISM=[smaller number] command.

(Plus when new regression tests are added I can notice from logs the number is bigger and start using a bigger PARALLELISM,
but even if mnemonics/actors were increased I might not realise and keep using an unnecessarily-small PARALLELISM for it (taking longer than preferred).)

((Similarly, it's currently convenient to be able to set PARALLELISM= with room to spare for regression tests
(not failing if specifying higher PARALLELISM than there are tests)
, so that even when new tests are added I can keep using the same number each time until the number of tests gets close to it,
and only then switch to using a new bigger number for the indefinite future.))

Mmmm I see. Yea the parallelism is still effectively limited by the number of mnemonics and the intent was to just to fail so the user knows what they're attempting will not be achieved. I can change that to just a warning on my next change. In the meantime you can always just:

PARALLELISM=[number] make test lint-ci test-regression && PARALLELISM=[other-number] make test-simulation

added priority1 label

[fix for 132] Two GetAssetOutboundFee patches #check-lint-warning

Merged by Eridanus (9R) 10 months ago (Apr 29, 2024 8:57pm UTC) 10 months ago

Activity

[fix for 132] Two GetAssetOutboundFee patches #check-lint-warning

Merge request reports

Merged by Eridanus (9R) 10 months ago (Apr 29, 2024 8:57pm UTC) 10 months ago

Activity