11/12/09

Hpricot – [BUG] Bus Error – Solution / Workaround

Problem:

You’re using Hpricot to parse web content, but it’s throwing an error like this that completely kills the process (probably crashing your app, or your background task, as the case may be):

/usr/local/lib/ruby/gems/1.8/gems/hpricot-0.8.2/lib/hpricot/parse.rb:33: [BUG] Bus Error
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin8.11.1]
Abort trap

This resource suggests that the problem occurs when the retrieved content is precisely 16384 bytes long; however, that was not the problem in my case.

My problem is replicated in this gist. Requesting the URL it was trying to retrieve with curl -i showed that the server was returning a 302 redirect:

HTTP/1.1 302 Found
Date: Thu, 12 Nov 2009 14:50:53 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
X-UA-Compatible: IE=EmulateIE7
Location: /
Set-Cookie: ASP.NET_SessionId=p2s0dljru11tiwer3e01jfq2; path=/; HttpOnly
Set-Cookie: Forum2backURL=/tm.aspx?m=1859288#1859354; path=/
Set-Cookie: Forum2preURL=; path=/
Cache-Control: private
Expires: Wed, 11 Nov 2009 13:50:53 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 120
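
The same check can be made from Ruby. This is only a rough sketch: Net::HTTP.get_response does not follow redirects on its own, so the 302 and its Location header are visible directly. The helper name redirect_summary and the example.com URL are placeholders of mine, not part of the original gist.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: report whether a response is a redirect and where it
# points, roughly what the first lines of `curl -i` reveal.
def redirect_summary(response)
  if response.is_a?(Net::HTTPRedirection)
    "#{response.code} redirect to #{response['location']}"
  else
    "#{response.code} (no redirect)"
  end
end

# Usage (performs a real request, so it is left commented out):
# uri = URI.parse('http://example.com/some/page')  # placeholder URL
# puts redirect_summary(Net::HTTP.get_response(uri))
```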

I am not sure why the open method from Ruby’s OpenURI was unable to follow this redirect. However, I determined that the file returned by open() had a size of zero bytes, and this was causing Hpricot to blow up.

My workaround is just to check the size of the file returned by open() and only try to parse it if it is greater than 0:

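require 'rubygems'
require 'open-uri'
require 'hpricot'

f = open(file_or_uri)
# Only hand the file to Hpricot if it actually contains data.
if f.size > 0
  doc = Hpricot(f)
else
  raise "Could not retrieve content due to zero-sized file, possibly due to site redirect."
end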
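
If you parse in more than one place, the check above could be wrapped in a small helper. The sketch below is one possible shape, not anything Hpricot or OpenURI provides: parse_if_nonempty is a name I made up, and it takes the parser as a block so the guard can be tried without Hpricot installed.

```ruby
require 'stringio'

# Hypothetical helper (not part of Hpricot or OpenURI): pass the IO to the
# given block only when it holds at least one byte; otherwise raise.
def parse_if_nonempty(io)
  if io.size > 0
    yield io
  else
    raise "Could not retrieve content due to zero-sized file, possibly due to site redirect."
  end
end

# Intended usage with the original snippet's names:
# doc = parse_if_nonempty(open(file_or_uri)) { |f| Hpricot(f) }
```

Because the zero-byte case raises an ordinary Ruby exception before Hpricot is ever called, it can be rescued normally, unlike the Bus Error.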

25 Responses to “Hpricot – [BUG] Bus Error – Solution / Workaround”

  1. Hey – thanks for the post. Been googling this very error for over an hour now. There seems no way to adequately rescue the Bus Error Hpricot is throwing.

    Cheers!

    Richard Luck
    HeyPublisher.com