The document type is not a node in the document itself, but part of its DTD, which Nokogiri exposes through internal_subset:
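
    require 'rubygems'
    require 'nokogiri'

    # Sample markup with a doctype carrying a name, a public ID and a system ID
    html = <<EOF
    <!DOCTYPE foo PUBLIC "bar" "qux">
    <html>
    </html>
    EOF

    doc = Nokogiri::HTML(html)

    # internal_subset returns the DTD node that holds the doctype declaration
    puts doc.internal_subset.name         # doctype name
    puts doc.internal_subset.external_id  # public identifier
    puts doc.internal_subset.system_id    # system identifier
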
akuhn