I am trying to reduce the size of a SQLite3 database that holds a lot of HTML by compressing it. I create the database with Python and I am trying to unpack it on Android.
I compress each HTML file with gzip and store it in the database as a BLOB. Here is the Python code I wrote to create the database:
    from sys import stdin, argv
    import sqlite3
    import gzip
    import cStringIO

    def compressBuf(buf):
        zbuf = cStringIO.StringIO()
        zfile = gzip.GzipFile(mode = 'wb', fileobj = zbuf, compresslevel = 9)
        zfile.write(buf)
        zfile.close()
        return zbuf.getvalue()

    conn = sqlite3.connect(argv[1])
    conn.text_factory = str
    c = conn.cursor()
    c.execute('''CREATE TABLE articles (
        id INTEGER NOT NULL PRIMARY KEY,
        name TEXT,
        category TEXT,
        html BLOB
    );''')
    c.execute(' CREATE INDEX name_index on articles (name); ')

    for line in stdin:
        line = line.strip().split('\t')
        line[-1] = sqlite3.Binary(compressBuf(line[-1]))
        c.execute('INSERT INTO articles VALUES (?, ?, ?, ?);', line)

    conn.commit()
    c.close()
    conn.close()
Here is a snippet of code for Android:
    Cursor cursor = db.rawQuery("SELECT html FROM articles WHERE id = " + id + " limit 1;", null);
    cursor.moveToFirst();
    byte[] zhtml = cursor.getBlob(0);
    ByteArrayInputStream is = new ByteArrayInputStream(zhtml);
    GZIPInputStream gis = new GZIPInputStream(is, zhtml.length);
I get the following exception complaining that the header is incorrect:
    java.io.IOException: unknown format (magic number 213c)
        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:84)
        at tw.cse.o0o.MyApp.WebServer$ArticleHandler$1.writeTo(WebServer.java:196)
        at org.apache.http.entity.EntityTemplate.writeTo(EntityTemplate.java:76)
        at org.apache.http.impl.entity.EntitySerializer.serialize(EntitySerializer.java:97)
        at org.apache.http.impl.AbstractHttpServerConnection.sendResponseEntity(AbstractHttpServerConnection.java:182)
        at org.apache.http.protocol.HttpService.handleRequest(HttpService.java:209)
        at tw.cse.o0o.MyApp.WebServer.run(SQLHelper.java:90)
Using the Python interpreter, I can confirm that compressBuf returns data starting with the correct gzip magic number, 0x1f8b:
    >>> compressBuf('test')
    '\x1f\x8b\x08\x00 \xba:O\x02\xff+I-.\x01\x00\x0c~\x7f\xd8\x04\x00\x00\x00'
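The interpreter check above only covers the return value of compressBuf, not what ends up in the database. A minimal sketch of checking the stored BLOB itself follows. It assumes Python 3 (so `io.BytesIO` stands in for `cStringIO`), and `compress_buf`, `stored_blob`, and the in-memory database are illustrative names rather than part of the original script.

```python
import gzip
import io
import sqlite3

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def compress_buf(buf: bytes) -> bytes:
    """Python 3 counterpart of compressBuf: gzip-compress bytes in memory."""
    zbuf = io.BytesIO()
    with gzip.GzipFile(mode="wb", fileobj=zbuf, compresslevel=9) as zfile:
        zfile.write(buf)
    return zbuf.getvalue()


def stored_blob(html: bytes) -> bytes:
    """Insert one compressed row into a throwaway database and read it back."""
    conn = sqlite3.connect(":memory:")  # hypothetical stand-in for argv[1]
    conn.execute(
        "CREATE TABLE articles ("
        "id INTEGER NOT NULL PRIMARY KEY, name TEXT, category TEXT, html BLOB)"
    )
    conn.execute(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        (1, "name", "category", sqlite3.Binary(compress_buf(html))),
    )
    (blob,) = conn.execute(
        "SELECT html FROM articles WHERE id = ?", (1,)
    ).fetchone()
    conn.close()
    return bytes(blob)
```

If `stored_blob(...)[:2]` equals `GZIP_MAGIC` and `gzip.decompress` restores the input, the database side is writing valid gzip data, which points the investigation at the reading side.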
[Update]
Ok, here is what I found out:
On the Nexus One, getBlob() automatically decompresses the binary data, whether it is zlib or gzip. The 213c in the error log corresponds to the first two characters of the original HTML. This does not happen on the first-generation Samsung Galaxy Tab, however, and I'm still trying to find a way to decompress the data on my Galaxy Tab.
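Since one device hands back raw HTML and the other hands back gzip data, one possible approach is to look at the first two bytes and decompress only when they match the gzip magic number. The sketch below shows that logic in Python. `magic_to_bytes` and `maybe_gunzip` are hypothetical helper names, and the decoding of 213c assumes the exception prints the magic as a little-endian 16-bit value, which is consistent with java.util.zip.GZIPInputStream.GZIP_MAGIC being 0x8b1f.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"  # gzip header bytes, in file order


def magic_to_bytes(magic: int) -> bytes:
    """Undo a little-endian 16-bit read to recover the two leading bytes."""
    return struct.pack("<H", magic)


def maybe_gunzip(blob: bytes) -> bytes:
    """Decompress only if the blob looks like gzip; otherwise pass it through."""
    if blob[:2] == GZIP_MAGIC:
        return gzip.decompress(blob)
    return blob  # assumed to be already-decompressed HTML
```

Under that assumption, `magic_to_bytes(0x213c)` gives the bytes `<!`, as in the start of a doctype or comment. The same two-byte test should carry over to the Android code by inspecting `zhtml[0]` and `zhtml[1]` before constructing the GZIPInputStream, although that has not been verified on either device here.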