[chef] Re: Re: Bizarre EOFError on Google Compute Engine, behind NAT


  • From: Lamont Granquist
  • Cc: Jeff Goldschrafe
  • Date: Sat, 18 Oct 2014 14:45:54 -0700

On Sat Oct 18 14:06:36 2014, Jeff Goldschrafe wrote:
Thanks for the response! The MTU was my first gut instinct, but it
didn’t take me anywhere right away. I spent a few hours Wiresharking
(including force-disabling DHE ciphers so I could decrypt the SSL
payloads), then disabled gzip on the server and started getting
Content-Length mismatches instead of EOFErrors. That made me notice
something suspicious about the payload sizes of the TLS continuations,
and that led me to suspect a fragmentation issue with the GCE
networking layer. This took me to a recent issue on the GCE project:

https://code.google.com/p/google-compute-engine/issues/detail?id=118&colspec=ID%20Type%20Component%20Resource%20Service%20Status%20Stars%20Summary%20Log

So, problem solved, I guess. Very weird that it only visibly impacted
a handful of Chef API calls, and no other traffic.

On October 18, 2014 at 4:30:15 PM, Lamont Granquist wrote:

Is the NAT gateway changing the Path MTU and have you broken PMTU
discovery via blocking all ICMP with iptables or something similar?
Also look at jumbo frames, etc. You should be able to debug this with
large ping packets, large GET/POST requests with curl/wget, or force
the MTU on the interfaces on both ends of the connection lower until it
starts working.
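
A rough sketch of the large-ping approach suggested above, in Python.
It assumes IPv4 and a Linux iputils ping (where "-M do" sets Don't
Fragment and "-s" sets the ICMP payload size). The helper names, the
576..1500 search range, and the assumption that every size below the
real limit passes are illustrative only, not anything from this thread.

```python
import subprocess

IP_HEADER = 20    # IPv4 header without options (assumption: no IP options)
ICMP_HEADER = 8   # ICMP echo request header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of size mtu."""
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host, mtu):
    """Build a one-shot ping with Don't Fragment set (Linux iputils flags)."""
    size = str(icmp_payload_for_mtu(mtu))
    return ["ping", "-c", "1", "-W", "2", "-M", "do", "-s", size, host]


def probe_with_ping(host, mtu):
    """Return True if a single unfragmented ping of this size gets a reply."""
    result = subprocess.run(
        ping_command(host, mtu),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_working_mtu(probe, low=576, high=1500):
    """Binary-search the largest MTU in [low, high] for which probe(mtu) is True.

    Assumes probe is monotonic: everything at or below the path MTU passes.
    Returns None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

To try it against a real host, something like
largest_working_mtu(lambda m: probe_with_ping("chef.example.com", m))
would do, where the hostname is a placeholder. If ICMP is blocked along
the path, every probe fails and the function returns None, which is
itself a hint that PMTU discovery may be broken.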


Yeah, we ran into a similar problem with TCP offloading in Rackspace
Windows images that produced EOFErrors as well. Rackspace engineers
came up with some magic incantations for Windows to turn off
offloading, which made it go away:

https://github.com/opscode/chef/issues/1881#issuecomment-53668823

Generally, EOFErrors like this seem to be not-a-chef-bug and instead
come from some kind of busted networking.



