Encoding problem when extracting blob from SQL Server with Entity Framework

I noticed a peculiarity when storing Word 2003 documents in a SQL Server BLOB column ( nvarchar(max) ), retrieving them with Entity Framework (EF1), and then converting them back to a file (which is then attached to an email).

Main code:

1) I insert the document into the database using openrowset:

 INSERT INTO [dbo].[Attachment] ([id], [Blob])
 SELECT 1, (SELECT BulkColumn
            FROM OPENROWSET(BULK 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS Blob)

2) Then I retrieve the file from the database using EF1 (simplified for brevity; repository traversal etc. omitted):

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
 var stream = new MemoryStream(bytes);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Problem:

This works, but after round-tripping a file through this process I noticed some inconsistencies: minor formatting problems, etc.

After digging deeper (using VBinDiff), it looks like some of the Unicode code units are being converted to FDFF:

 00DC > FDFF 

original:

00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00

converted:

00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00

another example:

 BED9 > FDFF
 CFD9 > FDFF

which looks like the last code point in the range covered here: http://www.unicode.org/charts/PDF/UFB50.pdf
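Incidentally, the byte pair FD FF, read as little-endian UTF-16, is U+FFFD, the Unicode replacement character, and the original byte pairs (00 DC = U+DC00, BE D9 = U+D9BE) are lone surrogates, which are not valid UTF-16 on their own. A minimal sketch of that replacement happening on the .NET side:

```csharp
using System;
using System.Text;

class SurrogateLossDemo
{
    static void Main()
    {
        // 00 DC is the little-endian encoding of U+DC00, a lone low
        // surrogate; it is not valid UTF-16 by itself.
        var original = new byte[] { 0x00, 0xDC };

        // Decoding replaces the invalid code unit with U+FFFD...
        string s = Encoding.Unicode.GetString(original);

        // ...so encoding back yields FD FF instead of the original bytes.
        byte[] roundTripped = Encoding.Unicode.GetBytes(s);

        Console.WriteLine(BitConverter.ToString(roundTripped)); // FD-FF
    }
}
```

This matches the diffs above: any byte pair that does not form valid UTF-16 is silently swapped for FD FF somewhere in the string round trip.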

Questions

1) Am I doing something stupid, or does EF do something funky when materializing the attachment.Blob string, before I convert it back to a byte array?

2) Is there a better way to get the exact bytes out of the blob field when using Entity Framework? (Or should I use a stored procedure or SqlDataReader instead, which I would rather avoid so as not to split the data access into separate paths?)

2 answers

Mitch's answer helped pinpoint the flaw in my code. For some reason (habit, I think) I had made the Blob field nvarchar(max) when, as Mitch pointed out, SINGLE_BLOB imports the file as varbinary(max) , which was actually what I wanted anyway (see question 2).

Solution:

  • Convert the database field from nvarchar(max) to varbinary(max)
  • Update the Entity Framework model
  • Change the Blob field inside the EF model from string to Binary

And finally, change this:

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
 var stream = new MemoryStream(bytes);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

To this:

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var stream = new MemoryStream(attachment.Blob);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Importing with SINGLE_BLOB returns the contents of the file as a single-row, single-column rowset of type varbinary(max) .

Instead of SINGLE_BLOB , I would suggest using SINGLE_NCLOB , which reads the file as nvarchar(max) .

Unicode files should be read with the SINGLE_NCLOB option, as shown here:

 SELECT BulkColumn FROM OPENROWSET (BULK 'path_to_attachment\abc123.doc', SINGLE_NCLOB) AS BLOB 

Ref.: Using OPENROWSET to read large files in SQL Server

Update (in response to a comment): If the files are not Unicode (as you suspected), then when retrieving them you should not use the Unicode encoding to get the bytes:

 var bytes = Encoding.ASCII.GetBytes(attachment.Blob); 
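One caveat worth noting (my own assumption about the file contents, not part of the answer above): Encoding.ASCII is 7-bit, so any byte value above 0x7F is replaced with '?'. If the files use an 8-bit codepage, a single-byte encoding such as Latin-1 (ISO-8859-1) round-trips every byte value; a minimal sketch:

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        // Latin-1 maps each of the 256 byte values to a distinct
        // character, so string -> bytes is lossless for 8-bit data.
        var latin1 = Encoding.GetEncoding(28591);

        string text = "caf\u00E9";                       // 'é' = 0xE9 in Latin-1
        byte[] viaLatin1 = latin1.GetBytes(text);        // 0xE9 preserved
        byte[] viaAscii = Encoding.ASCII.GetBytes(text); // 'é' -> '?' (0x3F)

        Console.WriteLine(BitConverter.ToString(viaLatin1)); // 63-61-66-E9
        Console.WriteLine(BitConverter.ToString(viaAscii));  // 63-61-66-3F
    }
}
```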

Source: https://habr.com/ru/post/1341622/

