`Parser::HTTP1#reset!` discards bytes between pipelined responses
Hi, I'll start by admitting that I don't fully understand the internals of an HTTP parser, so I'll start with my problem before moving on to my possibly AI-psychosis-led solution 😇

We use HTTPX (among other ways) via Faraday through the elasticsearch-transport gem. We recently attempted a switch from Puma to Falcon, and immediately started seeing timeouts on our requests to our Elasticsearch instance. We were able to isolate the problem to HTTPX, because switching the Faraday adapter to `:net_http` made everything work fine.

After having Opus 4.7 dig into it for about two hours, it emerged with a reproduction script and a fix. It also found that the issue is partly caused by a previous issue/fix of mine, #371. The issue seems to be that the `@buffer.clear` is dropping data for another request. I don't want to sound like I understand more than I do, so I'll stop the human writing here and hand over my LLM's report below, in case it is of any use.

---

## TL;DR

`Parser::HTTP1#reset!` (in `lib/httpx/parser/http1.rb`) calls `@buffer.clear` after every completed response. When two or more HTTP/1.1 responses arrive back-to-back in the same client-side TCP read (which happens easily under the `:persistent` plugin with multiple concurrent Async fibers), the parser parses response 1, fires `on_complete`, calls `reset!`, and the `clear` discards the bytes for response 2 that were already in the buffer. Those bytes are gone, the fiber that owns response 2 waits forever, and after `operation_timeout` the connection resets and in-flight requests fail.

The fix is one line: keep the `@buffer = @buffer.to_s` (the unwrap-from-Decoder that was the actual fix for #371 / #378) and drop the `@buffer.clear`.

## Symptoms

Under `:persistent` + multiple concurrent Async fibers hitting a server that sends chunked transfer-encoded responses (e.g. Elasticsearch):

- `HTTPX::OperationTimeoutError: timed out while waiting on select` after the configured `operation_timeout` (60 s default; 3 s in the bundled repro).
- Some requests succeed, some fail. The failure is intermittent because it depends on the timing of TCP reads, specifically on two responses landing in the same client-side read.
- Removing `:persistent` makes everything pass (every request gets its own single-response connection, so pipelining never happens).

## Where the bug is

`lib/httpx/parser/http1.rb:26-33`:

```ruby
def reset!
  @state = :idle
  @headers = {}
  @content_length = nil
  @_has_trailers = @upgrade = false
  @buffer = @buffer.to_s # ← good: unwraps Chunker::Decoder back to a String
  @buffer.clear          # ← BUG: that String still holds bytes for the next response
end
```

When parsing a chunked response, `prepare_data` wraps `@buffer` in a `Transcoder::Chunker::Decoder`. The decoder consumes bytes off its underlying string as it decodes chunks. After the terminating `0\r\n\r\n`, any *further* bytes that arrived in the same TCP read (i.e., the start of the next response) are still sitting in the decoder's underlying string. `@buffer = @buffer.to_s` correctly unwraps the decoder back to that string, but `@buffer.clear` then empties it, throwing the next response's bytes away.

The very next thing the parser does after `reset!` is:

```ruby
nextstate(:idle) unless @buffer.empty?
```

…which is dead code today (the buffer is always empty after `reset!`), but clearly indicates the intended design: if more bytes are in the buffer, go back to `:idle` and parse them as the next response.
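To make the leftover bytes concrete, here is a small plain-Ruby illustration. It is not httpx's `Transcoder::Chunker::Decoder`, just a hand-rolled chunk decoder working off a shared buffer string, but it shows what remains in the buffer after the terminating `0\r\n\r\n` when two responses arrive in one read:

```ruby
# Illustration only: not httpx's Transcoder::Chunker::Decoder, just a toy chunk
# decoder that, like the real one, consumes its framing off a shared buffer
# string and leaves any trailing bytes (the next response) behind.
buffer = "6\r\nHello!\r\n0\r\n\r\n" \
         "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\n\r\n6\r\nWorld!\r\n0\r\n\r\n"

body = ""
loop do
  line_end = buffer.index("\r\n")
  size = buffer[0...line_end].to_i(16) # chunk-size line, in hex
  buffer.slice!(0, line_end + 2)       # consume the size line
  break if size.zero?                  # terminating 0-chunk
  body << buffer.slice!(0, size)       # consume the chunk payload
  buffer.slice!(0, 2)                  # consume the CRLF after the payload
end
buffer.slice!(0, 2)                    # consume the CRLF ending the (empty) trailers

puts body.inspect   # => "Hello!"
puts buffer.inspect # => the entire next response, still sitting in the buffer
# reset!'s `@buffer.clear` throws those remaining bytes away; without the clear,
# `nextstate(:idle) unless @buffer.empty?` would pick them up as response 2.
```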
## How this happened

This is a follow-up to two earlier issues that already touched this exact spot:

- **#371** ("`Parser::HTTP1#reset!` doesn't reset `@buffer`, causing `undefined method 'index' for Chunker::Decoder` on connection reuse"). Fixed by commit `24c1df6` (Jan 2026), which added `@buffer.clear`.
- **#378** ("http1: reset body back to string on reset"). Fixed by commit `4fe3e64` (Apr 2026), which added `@buffer = @buffer.to_s` *before* the existing `.clear`. The commit message specifically mentions HEAD responses with `transfer-encoding: chunked`.

The `.to_s` from #378 was the actual fix for both crashes: once the buffer is unwrapped from the `Decoder`, subsequent string operations work again. The `.clear` from #371 was always overzealous, but until you have pipelined chunked responses landing in a single read, you never notice the bytes you were throwing away. This patch keeps the unwrap (so #371 / #378 stay fixed) and drops the clear.

## Standalone reproduction (parser only, no network)

`parser_repro.rb` in this repo:

```ruby
require "httpx"
require "httpx/parser/http1"

class Observer
  attr_accessor :parser
  attr_reader :events

  def initialize; @events = []; end
  def on_start; @events << :start; end
  def on_headers(h); @events << [:headers, h.dup]; end
  def on_data(d); @events << [:data, d.dup]; end
  def on_trailers(h); @events << [:trailers, h.dup]; end

  def on_complete
    @events << :complete
    @parser.reset! # mimics Connection::HTTP1#dispatch after on_complete
  end
end

def chunked(body)
  body.chars.each_slice(5).map(&:join).map { |c| "#{c.bytesize.to_s(16)}\r\n#{c}\r\n" }.join + "0\r\n\r\n"
end

def response(body)
  "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\n\r\n" + chunked(body)
end

obs = Observer.new
parser = HTTPX::Parser::HTTP1.new(obs)
obs.parser = parser

# Two pipelined responses arriving in a single feed (== a single TCP read).
parser << (response("Hello!") + response("World!"))

complete = obs.events.count { |e| e == :complete }
data = obs.events.select { |e| e.is_a?(Array) && e[0] == :data }.map { |e| e[1] }.join

puts "complete events: #{complete} (expected 2)"
puts "decoded body: #{data.inspect} (expected \"Hello!World!\")"
puts(complete == 2 && data == "Hello!World!" ? "PASS" : "FAIL")
```

Stock httpx 1.7.6:

```
$ bundle exec ruby parser_repro.rb
complete events: 1 (expected 2)
decoded body: "Hello!" (expected "Hello!World!")
FAIL
```

The bytes for the second response were silently dropped. After the patch:

```
$ bundle exec ruby parser_repro.rb
complete events: 2 (expected 2)
decoded body: "Hello!World!" (expected "Hello!World!")
PASS
```

## End-to-end reproduction

This repo also contains:

- `repro.rb`: fires concurrent fibers under Async + `:persistent` (a rough sketch of its shape follows below).
- `server.rb`: a small TCP server that pipelines chunked + gzipped responses the way Elasticsearch does, so you can reproduce without ES.
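For orientation, a client shaped roughly like the following should exercise the same code path. This is only a sketch, not the actual `repro.rb`; the URL, request count, and result reporting are illustrative:

```ruby
# Sketch only, not the exact repro.rb: many Async fibers sharing one persistent
# HTTPX session, so several responses can land back-to-back on one connection.
require "async"
require "httpx"

url    = ENV.fetch("URL", "http://127.0.0.1:9201/test")
client = HTTPX.plugin(:persistent)

Async do |task|
  results = Array.new(20) do
    task.async do
      response = client.get(url)
      # ErrorResponse wraps transport-level failures such as the operation timeout
      response.is_a?(HTTPX::ErrorResponse) ? response.error.class : response.status
    end
  end.map(&:wait)

  puts results.tally.inspect # e.g. {200=>16, HTTPX::OperationTimeoutError=>4}
end
```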
### Without Elasticsearch

```sh
bundle install
ruby server.rb &   # listens on 127.0.0.1:9201
URL=http://127.0.0.1:9201/test bundle exec ruby repro.rb
```

Stock httpx 1.7.6 fails most runs (run it a few times; the bug is intermittent):

```
=== final ===
success: 96 / 100
slow (>1s): 0
errors: 4
  [4x] HTTPX::OperationTimeoutError: timed out while waiting on select
```

### With Elasticsearch

If you have ES 8 on `localhost:9200`, point at any populated index that has the fields the repro's aggregations reference (or edit `repro.rb` to use your own query):

```sh
INDEX=your_index_name bundle exec ruby repro.rb
```

## The fix (`fix.patch`)

```diff
--- a/lib/httpx/parser/http1.rb
+++ b/lib/httpx/parser/http1.rb
@@ -28,8 +28,11 @@
         @headers = {}
         @content_length = nil
         @_has_trailers = @upgrade = false
+        # Unwrap the Chunker::Decoder back to its underlying string (the fix
+        # from #371/#378). Do NOT clear the string: when responses are
+        # pipelined, it may already hold the start of the next response,
+        # which `nextstate(:idle) unless @buffer.empty?` will then parse.
         @buffer = @buffer.to_s
-        @buffer.clear
       end
 
       def upgrade?
```

To apply locally and verify:

```sh
patch -p1 -d $(bundle show httpx) < fix.patch
bundle exec ruby parser_repro.rb                          # → PASS
ruby server.rb &
URL=http://127.0.0.1:9201/test bundle exec ruby repro.rb  # → 100/100, 0 errors
```

To revert: `patch -R -p1 -d $(bundle show httpx) < fix.patch`.

## Suggested test

Worth adding a regression test in `test/parser_test.rb` along the lines of "feeding two pipelined chunked responses in a single `<<` produces two `on_complete` events and concatenates their bodies correctly". The `parser_repro.rb` snippet above is essentially that test; a rough sketch is included at the end of this report. Happy to send a PR with it (or with this whole fix) if helpful.

## Notes on what I missed initially

I started by suspecting the `FiberConcurrency::HTTP1Methods#interests` filter: fibers really do go into `select_one(:r, 3)` and time out, and that's the visible `OperationTimeoutError`. But that's a *consequence* of the parser dropping bytes, not the cause. The connection waits for response bytes that the server already sent and the client already received but threw away.
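As promised under "Suggested test", here is a rough sketch of what that regression test could look like, assuming a plain Minitest setup. The class name and layout are guesses (httpx's `test/parser_test.rb` will have its own conventions); the observer mirrors the one in `parser_repro.rb`:

```ruby
# Rough sketch of the suggested regression test, assuming plain Minitest.
require "minitest/autorun"
require "httpx"
require "httpx/parser/http1"

class HTTP1PipelinedResponsesTest < Minitest::Test
  class Observer
    attr_accessor :parser
    attr_reader :completes, :body

    def initialize
      @completes = 0
      @body = ""
    end

    def on_start; end
    def on_headers(_headers); end
    def on_trailers(_headers); end

    def on_data(chunk)
      @body << chunk
    end

    def on_complete
      @completes += 1
      @parser.reset! # mirrors Connection::HTTP1#dispatch resetting after each response
    end
  end

  def test_two_pipelined_chunked_responses_in_one_read
    observer = Observer.new
    parser = HTTPX::Parser::HTTP1.new(observer)
    observer.parser = parser

    # Two chunked responses fed in a single `<<`, as if they arrived in one TCP read.
    parser << (chunked_response("Hello!") + chunked_response("World!"))

    assert_equal 2, observer.completes
    assert_equal "Hello!World!", observer.body
  end

  private

  def chunked_response(body)
    "HTTP/1.1 200 OK\r\nTransfer-Encoding: chunked\r\nConnection: keep-alive\r\n\r\n" \
      "#{body.bytesize.to_s(16)}\r\n#{body}\r\n0\r\n\r\n"
  end
end
```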