IMAP messages have UIDs, for which we all rejoice. However, I am trying to figure out how to create a unique identifier for a POP3 message and am running into problems (older systems like hotmail.com only allow POP3).
Available messages to the client are fixed when a POP session opens the maildrop, and are identified by a message number local to that session or, optionally, by a unique identifier assigned to the message by the POP server. This unique identifier is permanent and unique to the maildrop and allows a client to access the same message in different POP sessions. Mail is retrieved and marked for deletion by message number. When the client exits the session, the mail marked for deletion is removed from the maildrop. - Wikipedia
It seems that the main LIST command simply returns an array of temporary message numbers so that you can fetch the emails. These numbers are valid only within a single session and are by no means unique, although there is also an optional command called UIDL, which servers advertise through CAPA (the POP3 extension mechanism).
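To make the difference concrete, here is a minimal sketch using Python's standard `poplib`. The helper names `parse_listing` and `fetch_numbers_and_uids` are my own, and the host and credentials are placeholders; I am assuming the server supports UIDL and accepts plain USER/PASS over SSL.

```python
import poplib


def parse_listing(lines):
    """Turn POP3 response rows of the form b'<msgnum> <value>' into a dict."""
    result = {}
    for raw in lines:
        number, _, value = raw.decode("ascii").partition(" ")
        result[int(number)] = value
    return result


def fetch_numbers_and_uids(host, user, password):
    """Return ({msgnum: size}, {msgnum: uidl}) for a single POP3 session.

    Hypothetical helper: assumes the server supports the optional UIDL command.
    """
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        _, list_lines, _ = conn.list()   # rows like b'1 1205' (number, octets)
        _, uidl_lines, _ = conn.uidl()   # rows like b'1 <unique-id>'
        sizes = {n: int(s) for n, s in parse_listing(list_lines).items()}
        return sizes, parse_listing(uidl_lines)
    finally:
        conn.quit()
```

The message numbers in both dictionaries are only meaningful inside that one session; only the UIDL values are meant to survive across sessions.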
The POP3 specification says that a UIDL is unique only for as long as the message exists.
The unique-id of a message is an arbitrary server-determined string, consisting of one to 70 characters in the range 0x21 to 0x7E, which uniquely identifies a message within a maildrop and which persists across sessions. This persistence is required even if a session ends without entering the UPDATE state. The server should never reuse a unique-id in a given maildrop for as long as the entity using the unique-id exists.
Please note that messages marked as deleted are not listed.
While it is generally preferable for server implementations to store arbitrarily assigned unique-ids in the maildrop, this specification is intended to permit unique-ids to be calculated as a hash of the message. Clients should be able to handle a situation where two identical copies of a message in a maildrop have the same unique-id.
This makes me think that a year later (after the first message has been deleted) I could download a different message that has the same UIDL, and it would collide with the existing entry in my system.
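The two rules quoted above (the allowed character range, and the possibility of identical copies sharing a unique-id) can be checked on the client side. This is only a sketch; `is_valid_uidl` and `group_by_uidl` are names I made up, and the input is assumed to be a `{message number: unique-id}` mapping taken from a UIDL response.

```python
from collections import defaultdict


def is_valid_uidl(uid):
    """Check the quoted rule: 1 to 70 characters, each in 0x21 to 0x7E."""
    return 1 <= len(uid) <= 70 and all(0x21 <= ord(ch) <= 0x7E for ch in uid)


def group_by_uidl(uid_by_number):
    """Map each unique-id to the sorted message numbers that carry it."""
    groups = defaultdict(list)
    for number, uid in uid_by_number.items():
        groups[uid].append(number)
    return {uid: sorted(numbers) for uid, numbers in groups.items()}
```

Any unique-id that maps to more than one message number is the "two identical copies" case the specification warns about, so a client cannot treat the UIDL alone as a primary key without handling it.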
Should I just hash the entire text of the message and use it as an identifier?
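For reference, this is roughly what I mean by hashing the whole message. It is a sketch, not a recommendation: `message_digest` is a hypothetical helper, SHA-256 is an arbitrary choice, and I am assuming the lines come from `poplib`'s `retr()`, which returns the message as a list of byte strings without line endings.

```python
import hashlib


def message_digest(lines):
    """SHA-256 hex digest of a message given as a list of byte lines."""
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line)
        digest.update(b"\r\n")  # restore the line ending stripped by the client
    return digest.hexdigest()
```

With an already authenticated `poplib` connection `conn` (assumed here), usage would be `_, lines, _ = conn.retr(1)` followed by `message_digest(lines)`. The obvious cost is that every message has to be downloaded in full before it can be identified.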
Instead of retrieving the whole email just to hash it, perhaps I only need to use TOP [id] 1 and hash the headers (plus the first body line). That should never match an existing email, since the receiving server always adds some kind of information of its own, correct? An attacker could then never cause a collision, because the Received header or something similar would have to differ, right?
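A sketch of the TOP-based variant, under the same assumptions as before: `split_top` and `top_digest` are hypothetical helpers, and the input is the line list that `poplib`'s `top(which, howmuch)` returns (headers, a blank line, then the requested number of body lines).

```python
import hashlib


def split_top(lines):
    """Split a TOP response into (header_lines, body_lines) at the first blank line."""
    if b"" in lines:
        cut = lines.index(b"")
        return lines[:cut], lines[cut + 1:]
    return list(lines), []


def top_digest(lines):
    """SHA-256 hex digest over the headers plus the first body line only."""
    headers, body = split_top(lines)
    digest = hashlib.sha256()
    for line in headers + body[:1]:
        digest.update(line + b"\r\n")
    return digest.hexdigest()
```

With an authenticated connection `conn` (assumed), usage would be `_, lines, _ = conn.top(1, 1)` followed by `top_digest(lines)`. Whether this is actually collision-safe depends on the server adding a distinguishing header such as Received to every delivery, which is exactly the part I am unsure about.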
The MDaemon server seems to solve the problem with a partial hash of the headers:
MDaemon creates UIDL results using the message name, date stamp, size, and several other pieces of message information. As a result, if a message is changed on the server, it will show up as "new" to mail clients, even if it has not been renamed.
What is the correct way to create an identifier for POP3 email?
Note: Emails often contain a Message-ID header, but I cannot rely on it because it could be used as an attack vector to confuse my system. Some email clients also omit it.