I have a database (for tracking statistics by email) that has grown to hundreds of megabytes, and I have been looking for ways to reduce it.
It seems that the main reason for the large file size is that the same lines are repeated in thousands of lines. To avoid this problem, I plan to create another table for the row pool, for example:
CREATE TABLE AddressLookup ( ID INTEGER PRIMARY KEY AUTOINCREMENT, Address TEXT UNIQUE ); CREATE TABLE EmailInfo ( MessageID INTEGER PRIMARY KEY AUTOINCREMENT, ToAddrRef INTEGER REFERENCES AddressLookup(ID), FromAddrRef INTEGER REFERENCES AddressLookup(ID) );
And for convenience, a view for joining these tables:
CREATE VIEW EmailView AS SELECT MessageID, A1.Address AS ToAddr, A2.Address AS FromAddr FROM EmailInfo LEFT JOIN AddressLookup A1 ON (ToAddrRef = A1.ID) LEFT JOIN AddressLookup A2 ON (FromAddrRef = A2.ID);
To be able to use this view as if it were a regular table, I made several triggers:
CREATE TRIGGER trg_id_EmailView INSTEAD OF DELETE ON EmailView BEGIN DELETE FROM EmailInfo WHERE MessageID = OLD.MessageID; END; CREATE TRIGGER trg_ii_EmailView INSTEAD OF INSERT ON EmailView BEGIN INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.ToAddr); INSERT OR IGNORE INTO AddressLookup(Address) VALUES (NEW.FromAddr); INSERT INTO EmailInfo SELECT NEW.MessageID, A1.ID, A2.ID FROM AddressLookup A1, AddressLookup A2 WHERE A1.Address = NEW.ToAddr AND A2.Address = NEW.FromAddr; END; CREATE TRIGGER trg_iu_EmailView INSTEAD OF UPDATE ON EmailView BEGIN UPDATE EmailInfo SET MessageID = NEW.MessageID WHERE MessageID = OLD.MessageID; REPLACE INTO EmailView SELECT NEW.MessageID, NEW.ToAddr, NEW.FromAddr; END;
Problem
After:
INSERT OR REPLACE INTO EmailView VALUES (1, ' alice@example.com ', ' bob@example.com '); INSERT OR REPLACE INTO EmailView VALUES (2, ' alice@example.com ', ' chad@example.com ');
Updated lines contain:
MessageID ToAddr FromAddr --------- ------ -------- 1 NULL bob@example.com 2 alice@example.com chad@example.com
There is NULL, which should not be. The corresponding cell in the EmailInfo table contains the orphan value ToAddrRef .
If you do INSERT one at a time, you will see that the Alice ID in the AddressLookup table changes!
This behavior seems to be documented :
The ON CONFLICT clause can be specified as part of an UPDATE or INSERT action in the trigger body. However, if the ON CONFLICT clause is specified as part of the statement that triggers the trigger, the processing policy of the external expression of the external statement is used instead.
Thus, “REPLACE” in the top-level statement “INSERT OR REPLACE” overrides the critical “INSERT OR IGNORE” in the launcher.
Is there any way to make it work the way I wanted?