I noticed a peculiarity when trying to store 2003 MS text documents in a SQL Server BLOB ( nvarchar(max) ) block, and then retrieve them using the Entity Framework (EF1), then convert them back to a file (which is then attached to the email).
Main code:
1) I insert the document into the database using openrowset:
INSERT INTO [dbo].[Attachment]([id],[Blob]) SELECT 1, (SELECT BulkColumn FROM OPENROWSET(Bulk 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS BLOB)
2) Then I retrieve the file from the database using EF1 (simplified for brevity - repo traversal, etc.):
var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault() var bytes = Encoding.Unicode.GetBytes(attachment.Blob); var stream = new MemoryStream(bytes); var fileName = "abc123.doc"; var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);
Problem:
This works, but I noticed some inconsistencies with the file after clicking the file through this process, some minor formatting problems, etc.
After digging deeper (using VBinDiff) - it looks like some of the Unicode characters are converted to FDFF
00DC > FDFF
original:
00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00
converted by:
00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00
another example:
BED9 > FDFF CFD9 > FDFF
which looks last in the range from here: http://www.unicode.org/charts/PDF/UFB50.pdf
Questions
1) Am I doing something stupid, or does EF do something funky when returning the attachment.Blob string object - before I then try to convert it back to an array of bytes?
2) Is there a better way to get exact bytes from the blob field when using entity structures? (or should I use a stored procedure or SqlDataReader instead, which I really don't want to do so as not to collapse the data access paths)