Encoding problem when extracting blob from SQL Server with Entity Framework

I noticed a peculiarity when storing Word 2003 documents in a SQL Server BLOB column ( nvarchar(max) ), retrieving them with Entity Framework (EF1), and then converting them back to a file (which is then attached to an email).

Main code:

1) I insert the document into the database using openrowset:

 INSERT INTO [dbo].[Attachment] ([id], [Blob])
 SELECT 1, (SELECT BulkColumn
            FROM OPENROWSET(BULK 'path_to_attachment\abc123.doc', SINGLE_BLOB) AS Blob)

2) Then I retrieve the file from the database using EF1 (simplified for brevity; repository traversal etc. omitted):

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
 var stream = new MemoryStream(bytes);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Problem:

This works, but after round-tripping a file through this process I noticed some inconsistencies: minor formatting problems, etc.

After digging deeper (using VBinDiff), it looks like some of the Unicode code units are being converted to FDFF:

 00DC > FDFF 

original:

00 00 00 00 00 00 00 00 00 DC 00 00 00 00 00 00

converted:

00 00 00 00 00 00 00 00 FD FF 00 00 00 00 00 00

another example:

 BED9 > FDFF
 CFD9 > FDFF

which looks like the last code point in the range covered here: http://www.unicode.org/charts/PDF/UFB50.pdf
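Incidentally, the byte pair FD FF, read as little-endian UTF-16, is U+FFFD, the Unicode replacement character, and the original byte pairs (00 DC = U+DC00, BE D9 = U+D9BE) are lone surrogates, which are not valid UTF-16 on their own. A minimal sketch of that replacement happening on the .NET side:

```csharp
using System;
using System.Text;

class SurrogateLossDemo
{
    static void Main()
    {
        // 00 DC is the little-endian encoding of U+DC00, a lone low
        // surrogate; it is not valid UTF-16 by itself.
        var original = new byte[] { 0x00, 0xDC };

        // Decoding replaces the invalid code unit with U+FFFD...
        string s = Encoding.Unicode.GetString(original);

        // ...so encoding back yields FD FF instead of the original bytes.
        byte[] roundTripped = Encoding.Unicode.GetBytes(s);

        Console.WriteLine(BitConverter.ToString(roundTripped)); // FD-FF
    }
}
```

This matches the diffs above: any byte pair that does not form valid UTF-16 is silently swapped for FD FF somewhere in the string round trip.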

Questions

1) Am I doing something stupid, or does EF do something funky when materializing the attachment.Blob string, before I convert it back to a byte array?

2) Is there a better way to get the exact bytes out of the blob field when using Entity Framework? (Or should I use a stored procedure or SqlDataReader instead, which I would rather avoid so as not to split the data access into separate paths?)

2 answers

Mitch's answer helped pinpoint the flaw in my code. For some reason (habit, I think) I had made the Blob field nvarchar(max) when, as Mitch pointed out, SINGLE_BLOB imports the file as varbinary(max) , which was actually what I wanted anyway (see question 2).

Solution:

  • Convert the database field from nvarchar(max) to varbinary(max)
  • Update the Entity Framework model
  • Change the Blob field inside the EF model from string to Binary

And finally, change this:

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var bytes = Encoding.Unicode.GetBytes(attachment.Blob);
 var stream = new MemoryStream(bytes);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

To this:

 var attachment = (from a in ctx.Attachment where a.id == 1 select a).FirstOrDefault();
 var stream = new MemoryStream(attachment.Blob);
 var fileName = "abc123.doc";
 var fileToAttach = new Attachment(stream, fileName, MediaTypeNames.Application.Octet);

Importing with SINGLE_BLOB returns the contents of the file as a single-row, single-column rowset of type varbinary(max) .

Instead of SINGLE_BLOB , I would suggest using SINGLE_NCLOB , which reads the file as nvarchar(max) .

Unicode files should be read with the SINGLE_NCLOB option, as shown here:

 SELECT BulkColumn FROM OPENROWSET (BULK 'path_to_attachment\abc123.doc', SINGLE_NCLOB) AS BLOB 

Ref.: Using OPENROWSET to read large files in SQL Server

Update (in response to a comment): If the files are not Unicode (as you suspected), then when retrieving them you should not use the Unicode encoding to get the bytes:

 var bytes = Encoding.ASCII.GetBytes(attachment.Blob); 
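One caveat worth noting (my own assumption about the file contents, not part of the answer above): Encoding.ASCII is 7-bit, so any byte value above 0x7F is replaced with '?'. If the files use an 8-bit codepage, a single-byte encoding such as Latin-1 (ISO-8859-1) round-trips every byte value; a minimal sketch:

```csharp
using System;
using System.Text;

class Latin1Demo
{
    static void Main()
    {
        // Latin-1 maps each of the 256 byte values to a distinct
        // character, so string -> bytes is lossless for 8-bit data.
        var latin1 = Encoding.GetEncoding(28591);

        string text = "caf\u00E9";                       // 'é' = 0xE9 in Latin-1
        byte[] viaLatin1 = latin1.GetBytes(text);        // 0xE9 preserved
        byte[] viaAscii = Encoding.ASCII.GetBytes(text); // 'é' -> '?' (0x3F)

        Console.WriteLine(BitConverter.ToString(viaLatin1)); // 63-61-66-E9
        Console.WriteLine(BitConverter.ToString(viaAscii));  // 63-61-66-3F
    }
}
```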

Source: https://habr.com/ru/post/1341622/

