Convert XML data to T-SQL records for any XML response

I don't know whether this has already been answered; I could not find it anywhere on Stack Overflow with my search methods. Please excuse the noise if it has.

We have a requirement to write an API parser that works for any API that provides XML output.

We will not know the XML structure before we get started.

The intended solution is to convert the XML file and store it in a generic T-SQL table, with the names of the XML elements/attributes as the first row.

So basically it is an XML deserializer for any API.

We cannot use a third-party DLL for our C# class.

I have no experience with C#, so I don't know whether this is possible. I was, however, able to write a generic XML -> row converter in T-SQL using OPENXML. The problem with the T-SQL solution is that we cannot successfully import a huge XML file into the database.

I can provide any details that are required. Please let me know in the comments / replies.

I don't want anyone to write the code for me; any suitable pointers would be enough.

Resources: JSON

[ { "id" : 21953, "mainReqIdentity" : "xxxx", "itemName" : "xxxx", "kanbanPhase" : "xxxx", "kanbanStatus" : "xxxx", "backlogItemType" : "xxxx", "identityDomain" : "xxxx", "fromDatetime" : "2016-08-05 17:52:34", "teams" : [], "releases" : [{ "id" : 1229, "release_name" : "xxxx", "release_connection_type" : "xxxx" } ], "fpReleases" : [], "sources" : [{ "sourceName" : "xxxx", "sourceRecordUrl" : "xxxx", "sourceRecordIdentity" : "xxxx" } ], "productNumbers" : [], "tags" : [], "productComponents" : [], "ranPlatforms" : [], "subReleases" : [], "requirementAreaId" : xxxx, "requirementArea" : "xxxx", "toBeHandledAtxxxx" : "xxxx" }, { "id" : 22014, "mainReqIdentity" : "xxxx", "itemName" : "xxxx", "kanbanPhase" : "xxxx", "kanbanStatus" : "xxxx", "backlogItemType" : "xxxx", "identityDomain" : "xxxx", "fromDatetime" : "2016-08-05 17:52:34", "teams" : [], "releases" : [{ "id" : xxxx, "release_name" : "xxxx", "release_connection_type" : "xxxx" } ], "fpReleases" : [], "sources" : [{ "sourceName" : "xxxx", "sourceRecordUrl" : "xxxx", "sourceRecordIdentity" : "xxxx" } ], "productNumbers" : [], "tags" : [], "productComponents" : [], "ranPlatforms" : [], "subReleases" : [], "requirementAreaId" : xxxx, "requirementArea" : "xxxx", "f0Date" : "2015-10-01", "f1Date" : "2015-10-01", "f2Date" : "2016-02-01", "f4Date" : "2016-03-31", "fgDate" : "2016-04-29", "toBeHandledAtxxxx" : "xxxx" } ] 

XML: 2 samples

Example 1

  <root type="array"> <id type="number">21286</id> <mainReqIdentity type="string">xxxxxx</mainReqIdentity> <itemName type="string">xxxxxx</itemName> <kanbanPhase type="string">xxxxxx</kanbanPhase> <kanbanStatus type="string">xxxxxx</kanbanStatus> <kanbanNote type="string">xxxxxx</kanbanNote> <backlogItemType type="string">xxxxxx</backlogItemType> <identityDomain type="string">xxxxxx</identityDomain> <fromDatetime type="string">2016-08-23 17:01:52</fromDatetime> <teams type="array"> <item type="object"> <team_name type="string">xxxxxx</team_name> <preliminary type="boolean">xxxxxx</preliminary> </item> </teams> <releases type="array"> <item type="object"> <id type="number">xxxxxx</id> <release_name type="string">xxxxxx</release_name> <release_connection_type type="string">xxxxxx</release_connection_type> </item> </releases> <fpReleases type="array"> </fpReleases> <sources type="array"> <item type="object"> <sourceName type="string">xxxxxx</sourceName> <sourceRecordUrl type="string">xxxxxx</sourceRecordUrl> </item> </sources> <productNumbers type="array"> </productNumbers> <tags type="array"> </tags> <productComponents type="array"> </productComponents> <ranPlatforms type="array"> </ranPlatforms> <subReleases type="array"> </subReleases> <requirementAreaId type="number">xxxxxx</requirementAreaId> <requirementArea type="string">xxxxxx</requirementArea> <itemContact type="string">xxxxxx</itemContact> <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca> </item> <item type="object"> <id type="number">xxxxxx</id> <mainReqIdentity type="string">xxxxxx</mainReqIdentity> <itemName type="string">xxxxxx</itemName> <kanbanPhase type="string">xxxxxx</kanbanPhase> <kanbanStatus type="string">xxxxxx</kanbanStatus> <kanbanNote type="string">xxxxxx</kanbanNote> <backlogItemType type="string">xxxxxx</backlogItemType> <identityDomain type="string">xxxxxx</identityDomain> <fromDatetime type="string">2016-08-23 17:01:52</fromDatetime> <teams type="array"> <item type="object"> 
<team_name type="string">xxxxxx</team_name> <preliminary type="boolean">xxxxxx</preliminary> </item> </teams> <releases type="array"> <item type="object"> <id type="number">xxxxxx</id> <release_name type="string">xxxxxx</release_name> <release_connection_type type="string">xxxxxx</release_connection_type> </item> </releases> <fpReleases type="array"> </fpReleases> <sources type="array"> <item type="object"> <sourceName type="string">xxxxxx</sourceName> <sourceRecordUrl type="string">xxxxxx</sourceRecordUrl> </item> </sources> <productNumbers type="array"> </productNumbers> <tags type="array"> </tags> <productComponents type="array"> </productComponents> <ranPlatforms type="array"> </ranPlatforms> <subReleases type="array"> </subReleases> <requirementAreaId type="number">xxxxxx</requirementAreaId> <requirementArea type="string">xxxxxx</requirementArea> <oaResultReference type="string">xxxxxx</oaResultReference> <itemContact type="string">xxxxxx</itemContact> <f0Date type="string">2014-10-17</f0Date> <f1Date type="string">2015-01-16</f1Date> <f2Date type="string">2015-02-13</f2Date> <f4Date type="string">2015-06-12</f4Date> <faDate type="string">2015-06-12</faDate> <fgDate type="string">2015-06-12</fgDate> <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca> </item> </root> 

Example 2

    <ROOT>
      <Customer CustomerID="VINET" ContactName="Paul Henriot">
        <Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00">
          <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
          <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
        </Order>
      </Customer>
      <Customer CustomerID="LILAS" ContactName="Carlos Gonzlez">
        <Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00">
          <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
        </Order>
      </Customer>
    </ROOT>

SQL

Generic table

 create table ZZZZZZZZZ ( api_id int, record_type char(1), record_id INT, last_run_time datetime, last_run_by varchar(500), col1 VARCHAR(500), col2 VARCHAR(500), col3 VARCHAR(500), col4 VARCHAR(500), col5 VARCHAR(500), col6 VARCHAR(500), col7 VARCHAR(500), col8 VARCHAR(500), col9 VARCHAR(500), col10 VARCHAR(500), col11 VARCHAR(500), col12 VARCHAR(500), col13 VARCHAR(500), col14 VARCHAR(500), col15 VARCHAR(500), col16 VARCHAR(500), col17 VARCHAR(500), col18 VARCHAR(500), col19 VARCHAR(500), col20 VARCHAR(500), col21 VARCHAR(500), col22 VARCHAR(500), col23 VARCHAR(500), col24 VARCHAR(500), col25 VARCHAR(500), col26 VARCHAR(500), col27 VARCHAR(500), col28 VARCHAR(500), col29 VARCHAR(500), col30 VARCHAR(500), col31 VARCHAR(500), col32 VARCHAR(500), col33 VARCHAR(500), col34 VARCHAR(500), col35 VARCHAR(500), col36 VARCHAR(500), col37 VARCHAR(500), col38 VARCHAR(500), col39 VARCHAR(500), col40 VARCHAR(500), col41 VARCHAR(500), col42 VARCHAR(500), col43 VARCHAR(500), col44 VARCHAR(500), col45 VARCHAR(500), col46 VARCHAR(500), col47 VARCHAR(500), col48 VARCHAR(500), col49 VARCHAR(500), col50 VARCHAR(500), col51 VARCHAR(500), col52 VARCHAR(500), col53 VARCHAR(500), col54 VARCHAR(500), col55 VARCHAR(500), col56 VARCHAR(500), col57 VARCHAR(500), col58 VARCHAR(500), col59 VARCHAR(500), col60 VARCHAR(500), col61 VARCHAR(500), col62 VARCHAR(500), col63 VARCHAR(500), col64 VARCHAR(500), col65 VARCHAR(500), col66 VARCHAR(500), col67 VARCHAR(500), col68 VARCHAR(500), col69 VARCHAR(500), col70 VARCHAR(500), col71 VARCHAR(500), col72 VARCHAR(500), col73 VARCHAR(500), col74 VARCHAR(500), col75 VARCHAR(500), col76 VARCHAR(500), col77 VARCHAR(500), col78 VARCHAR(500), col79 VARCHAR(500), col80 VARCHAR(500), col81 VARCHAR(500), col82 VARCHAR(500), col83 VARCHAR(500), col84 VARCHAR(500), col85 VARCHAR(500), col86 VARCHAR(500), col87 VARCHAR(500), col88 VARCHAR(500), col89 VARCHAR(500), col90 VARCHAR(500), col91 VARCHAR(500), col92 VARCHAR(500), col93 VARCHAR(500), col94 VARCHAR(500), 
col95 VARCHAR(500), col96 VARCHAR(500), col97 VARCHAR(500), col98 VARCHAR(500), col99 VARCHAR(500), col100 VARCHAR(500), col101 VARCHAR(500), col102 VARCHAR(500), col103 VARCHAR(500), col104 VARCHAR(500), col105 VARCHAR(500), col106 VARCHAR(500), col107 VARCHAR(500), col108 VARCHAR(500), col109 VARCHAR(500), col110 VARCHAR(500), col111 VARCHAR(500), col112 VARCHAR(500), col113 VARCHAR(500), col114 VARCHAR(500), col115 VARCHAR(500), col116 VARCHAR(500), col117 VARCHAR(500), col118 VARCHAR(500), col119 VARCHAR(500), col120 VARCHAR(500), col121 VARCHAR(500), col122 VARCHAR(500), col123 VARCHAR(500), col124 VARCHAR(500), col125 VARCHAR(500), col126 VARCHAR(500), col127 VARCHAR(500), col128 VARCHAR(500), col129 VARCHAR(500), col130 VARCHAR(500), col131 VARCHAR(500), col132 VARCHAR(500), col133 VARCHAR(500), col134 VARCHAR(500), col135 VARCHAR(500), col136 VARCHAR(500), col137 VARCHAR(500), col138 VARCHAR(500), col139 VARCHAR(500), col140 VARCHAR(500), col141 VARCHAR(500), col142 VARCHAR(500), col143 VARCHAR(500), col144 VARCHAR(500), col145 VARCHAR(500), col146 VARCHAR(500), col147 VARCHAR(500), col148 VARCHAR(500), col149 VARCHAR(500), col150 VARCHAR(500) ) 

Output result

[image: output result]

Generic XML parser written in T-SQL. There are several hacks in the code and some stray code that needs to be removed, but it works. The problem is passing the whole XML document as an input parameter from the C# code, either through a direct call or through a file.

 CREATE PROC ZZZZZZZ ( @in_api_id int, @in_xml_doc XML, @in_xml_root varchar(100), @in_tot_result_col int = 150, @in_need_colnm_result CHAR(1) = 'Y', @in_debug_flg CHAR(1) = 'N' ) AS BEGIN DECLARE @idoc int, @sqlstr nvarchar(max) = '', @param nvarchar(200) = '', @runtime datetime = getdate(), @runby varchar(30) = suser_name(), @cnt int, @pre_stg_col_nm varchar(max) = '', @max_lvl int, @max_node varchar(500)='', @max_node_wo_slash varchar(500)='', @xml_col nvarchar(max) = '', @unq_col nvarchar(max) = '', @unq_xml_col nvarchar(max)='' --Create an internal representation of the XML document. EXEC sp_xml_preparedocument @idoc OUTPUT, @in_xml_doc; -- Execute a SELECT statement that uses the OPENXML rowset provider. set @in_xml_root = concat('/',@in_xml_root) SELECT * into #tmp FROM OPENXML (@idoc, @in_xml_root,2) where id <> 0; --select * from #tmp_xml_nodes --select * from #tmp --select * from #tmp_pre_staging ;with xml_cte(id, parentid, nodetype, localname, prefix, namespaceuri, datatype, prev, text, lvl,node,parent_localname) AS ( select id, parentid, nodetype, localname, prefix, namespaceuri, datatype, prev, text, 1 as lvl, cast(CONCAT(@in_xml_root,'/',localname) as varchar(100)) node, cast('' as varchar(200)) from #tmp where parentid = 0 UNION all select t.id, t.parentid, t.nodetype, t.localname, t.prefix, t.namespaceuri, t.datatype, t.prev, t.text, iif(t.nodetype = 1,xc.lvl+1,xc.lvl), cast( CONCAT ( xc.node ,iif(t.nodetype = 1, CONCAT ( '/' ,t.localname ) ,'' ) ) AS VARCHAR(100) ), cast(xc.localname as varchar(200)) from #tmp t inner join xml_cte xc on xc.id = t.parentid ) select * into #xmlcte from xml_cte --select * from #xmlcte --v2 change select @max_lvl = max(lvl)--iif(max(lvl)>=4,1,0) -- the iif condition is just a hack, I dont know why it works from #xmlcte select @max_node = concat(max(node),'/'), @max_node_wo_slash = max(node) from #xmlcte where lvl = @max_lvl select *,concat(parent_localname,'_',localname,' varchar(500)') fnl_col_nm, case when 
lvl<@max_lvl then concat(replicate('../',@max_lvl-lvl+iif(nodetype=1,nodetype,0)),iif(nodetype=1,'','@'),localname) --v2 change when lvl>@max_lvl then concat(replace(node,@max_node,''),iif(nodetype=1,'','/@'),localname)--v2 change else concat('../',iif(nodetype=1,'',concat(parent_localname,'/@')),localname)--v2 change end col_Struct ,concat(parent_localname,'_',localname) col_unq_nm ,ROW_NUMBER() over (order by(select 100)) sno ,concat('xmlname.value(''/Names[1]/name[',ROW_NUMBER() over (order by(select 100)),']'',''varchar(500)'') AS ',concat(parent_localname,'_',localname)) col_splt_nm into #xml_col_struct from #xmlcte where nodetype <= 2--v2 change --select * from #xml_col_struct set @cnt = (select count(distinct col_unq_nm) from #xml_col_struct) select @pre_stg_col_nm = ( select concat(',',COLUMN_NAME) from INFORMATION_SCHEMA.COLUMNS where table_name = 'ZZZZZZ' and COLUMN_NAME like 'col%' and ORDINAL_POSITION <= @cnt+5 order by ORDINAL_POSITION for xml path('') ) set @sqlstr = concat( 'insert into ZZZZZ(api_id,record_type,record_id,last_run_time,last_run_by', @pre_stg_col_nm, ')' ) select @xml_col = ( select distinct concat(',',fnl_col_nm,' ''',col_Struct,'''',char(10)) from #xml_col_struct order by 1 for xml path('') ) set @xml_col = stuff(@xml_col,1,1,'') select @unq_col = ( select distinct concat(',',col_unq_nm ) from #xml_col_struct order by 1 for xml path('') ) set @unq_col = stuff(@unq_col,1,1,'') select @in_tot_result_col = @in_tot_result_col - count(distinct col_unq_nm) from #xml_col_struct select @unq_xml_col = ( select concat(',xmlname.value(''/Names[1]/name[',ROW_NUMBER() over (order by(select 100)),']'',''varchar(500)'') AS ',col_unq_nm,char(10)) from (select distinct col_unq_nm from #xml_col_struct) t for xml path('') ) set @unq_xml_col = stuff(@unq_xml_col,1,1,'') set @sqlstr = concat( iif(@in_need_colnm_result = 'Y', concat(' ;WITH Split_Names (xmlname) AS ( SELECT CONVERT(XML,''<Names><name>'' + REPLACE(''',@unq_col,''','','', ''</name><name>'') 
+ ''</name></Names>'') AS xmlname ) ' --,@sqlstr ,char(10), ' SELECT ',@in_api_id,',''H'',0,''',@runtime,''',''',@runby,''',',char(10) ,@unq_xml_col,replicate(',NULL',@in_tot_result_col)--v2 change ,char(10) ,'FROM Split_Names' ,char(10) ,'union all' ) ,'' ) --,iif(@in_need_colnm_result = 'Y','',@sqlstr) ,' SELECT ',@in_api_id,',''D'',ROW_NUMBER() over (order by(select 100)),''',@runtime,''',''',@runby,''',*' ,replicate(',NULL',@in_tot_result_col)--v2 change ,char(10) ,'FROM OPENXML (@idoc_inn, ''',@max_node_wo_slash,''',2)' ,char(10) ,'WITH (',@xml_col,')' ) if @in_debug_flg = 'Y' begin select @max_lvl+1,@max_lvl,@max_node_wo_slash,@xml_col,@unq_col,@sqlstr,@unq_xml_col select * from #xml_col_struct--v2 change end else begin set @param = '@idoc_inn int' exec sys.sp_executesql @sqlstr,@param,@idoc_inn = @idoc end EXEC sp_xml_removedocument @idoc END 

SQL code to read the XML file saved by the C# class. This also works, but the problem is that every line of the file ends up in a separate row, and concatenating the rows truncates the result after a certain point.

    create table #tmp(data_line nvarchar(max))

    bulk insert #tmp
    FROM '\\Server\\ZZZZ\\Downloads\\Data.xml'
    WITH
    (
        --firstrow = 1
        ROWTERMINATOR ='\n'
    );

    select * from #tmp

C# class

    Object httpConn = Dts.Connections["HTTP"].AcquireConnection(null);
    HttpClientConnection myConnection = new HttpClientConnection(httpConn);
    myConnection.ServerURL = string.Format(("http://xxxx.com/jjjj"),"userid","password");
    byte[] webdata = myConnection.DownloadData();
    String result_data = Convert.ToBase64String(webdata);
    XmlDocument xd = new XmlDocument();
    XmlDictionaryReader xr = JsonReaderWriterFactory.CreateJsonReader(webdata, XmlDictionaryReaderQuotas.Max);
    xr.Read();
    xd.LoadXml(xr.ReadOuterXml());
    xd.Save("\\Server\\ZZZZ\\Downloads\\Data.xml");
+6
source share
4 answers

If you receive well-formed XML, you can use a DataSet; in particular, DataSet.ReadXml(). This loads your XML into the DataSet object regardless of which tags it uses. You can then use ADO.NET, LINQ to SQL, EF or any other data-access method to put it into the database.

Since you save the file on the server, you can use the code below:

    DataSet ds = new DataSet();
    ds.ReadXml("\\Server\\ZZZZ\\Downloads\\Data.xml");

Then you can iterate over each table in the DataSet with a foreach loop. Attributes in the XML become columns in the DataTable.

So your final code would be something like this:

    using (DataSet ds = new DataSet())
    {
        ds.ReadXml("\\Server\\ZZZZ\\Downloads\\Data.xml");
        int nTableCounts = ds.Tables.Count;
        foreach (DataTable dt in ds.Tables)
        {
            using (dt)
            {
                // Put data in SQL table.
            }
        }
    }
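To see what ReadXml actually infers before wiring up any database code, a small experiment like the following may help. It is only a sketch: the class name DataSetDemo and its methods are made up for illustration, the XML is a shortened copy of sample 2 from the question, and it reads from a string instead of the file share. As far as I know, ReadXml turns nested elements into separate tables and adds generated key columns to link them, so expect more columns than the XML has attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

// Hypothetical helper, not part of the original answer.
public static class DataSetDemo
{
    // Shortened copy of sample 2 from the question (one customer only).
    public const string SampleXml = @"<ROOT>
  <Customer CustomerID=""VINET"" ContactName=""Paul Henriot"">
    <Order CustomerID=""VINET"" EmployeeID=""5"" OrderDate=""1996-07-04T00:00:00"">
      <OrderDetail OrderID=""10248"" ProductID=""11"" Quantity=""12""/>
      <OrderDetail OrderID=""10248"" ProductID=""42"" Quantity=""10""/>
    </Order>
  </Customer>
</ROOT>";

    // Loads XML text into a DataSet and lets ReadXml infer the schema.
    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        using (var reader = new StringReader(xml))
        {
            ds.ReadXml(reader);
        }
        return ds;
    }

    // Prints each inferred table with its row count and column names.
    public static void Describe(DataSet ds)
    {
        foreach (DataTable table in ds.Tables)
        {
            var columnNames = new List<string>();
            foreach (DataColumn column in table.Columns)
            {
                columnNames.Add(column.ColumnName);
            }
            Console.WriteLine($"{table.TableName} ({table.Rows.Count} rows): {string.Join(", ", columnNames)}");
        }
    }
}
```

Calling DataSetDemo.Describe(DataSetDemo.Load(DataSetDemo.SampleXml)) lists the tables that were inferred. Each DataTable could then be pushed to the server with whatever ADO.NET mechanism you prefer.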

Let me know if something is unclear.

+4
source

Thank you for posting so many details. Now, together with the bounty, people will rush to help.

This is actually not an answer, at least it is not a solution to your actual problem! Rather, some thoughts / hints:

To be honest...

This must be done in a completely different way!

Your current approach is terrible ...

  • The way you read the XML line by line
  • The way you shred it with FROM OPENXML
  • The huge amount of string-built code ...
  • The pointless storage of untyped string values
  • A table with hundreds of columns with meaningless column names ...

I doubt this concept will ever work with any XML response ...

Act

Your sample 1 is not even valid XML.

  • the opening <item> is missing
  • invalid closing tag: <toBeHandledAtxxx type="string">xxxxxx</toBeHandledAtLuca>
  • the datetime format is not ISO 8601 ... This format will work, but it is a sign that this XML was not created in the best way ...

If this is your actual response, no XML-based approach will ever work. Hopefully this only happened while shortening the sample ...

Hierarchy

Sample 1 can have any number of elements below <root>. There are 9 areas (!!!) of potential 1:n data (more than one team, more than one release ...).

In relational terms, you would have to spread this over several related tables.

Your sample 2 is pretty simple, but has a hierarchy with 3 levels.

How do you want to put this into your table of hundreds of columns? Side by side? That means tons of rows due to the Cartesian product.

Naming

Your sample 2 can easily be shredded into a simple structure. You can use local-name() to find out what is a Customer and what is an OrderDetail.

But in your sample 1, they are all called <item> ...
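If part of the work moves to the C# side, the same kind of name discovery can be done there with LINQ to XML. The following is only a sketch; NameSurvey and CountElementNames are made-up names. It counts how often each element local name occurs, which is roughly what local-name() gives you in XQuery.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class NameSurvey
{
    // Counts occurrences of each element name, using only the local name
    // (any namespace is ignored), for all elements including the root.
    public static SortedDictionary<string, int> CountElementNames(string xml)
    {
        var counts = new SortedDictionary<string, int>(StringComparer.Ordinal);
        foreach (XElement element in XDocument.Parse(xml).Descendants())
        {
            string name = element.Name.LocalName;
            counts[name] = counts.TryGetValue(name, out int seen) ? seen + 1 : 1;
        }
        return counts;
    }
}
```

For XML shaped like sample 2 the names tell you what each node is. For XML shaped like sample 1 most of the counts pile up under item, so the name alone is not enough and you would also have to look at the parent element.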

Structure

Your sample 2 is simple attribute-centric XML, while your sample 1 mixes attributes with element-centric data.

Import

At some point you will want to move the data from your staging table into proper structures. Reading from your wide multi-column table means you need to know everything about the source XML. You would have to work with complex mapping tables to know what the value in col107 means (semantics, type, format ...). For this process, you will need a very specific procedure for each individual XML type.

Reading XML file

In your question you show how you read an XML file with SQL Server.

You are following a line-by-line approach, which is why each line of the file ends up in its own row. Better try this; it reads the whole XML in one go:

    DECLARE @x XML =
    (
        SELECT x
        FROM OPENROWSET(BULK 'C:\YourPath\YourXMLFile.xml', SINGLE_BLOB) AS YourFile(x)
    );
    select @x;

My advice

Especially the last point (a very specific procedure for each individual kind of XML) screams for importing the XML file as-is. At least in my eyes, there is absolutely no value in your attempts to get any XML into one huge generic table. It is better to use a thin staging table with some meta columns and one column for the full XML (maybe as NVARCHAR(MAX), so that it is more tolerant of invalid XML files). There you store the XML exactly as you read it.

In any case, you will need a specific reading process for each type of XML. Build it directly against the XML.

Do not use FROM OPENXML. It is outdated ... In your case it may be workable, but if performance does not matter, it is better to use the function you find in John Cappelletti's answer. It walks down any XML recursively and returns the information as a table, without your having to dive into XPath and XQuery.

It would be best to define a specific reading process for each type of XML, based on the modern XML methods .nodes(), .query() and .value(). It will be clean, fast and convenient.

Come back with new questions if you need help!

Last reason to convince you

Imagine your XML sender adds one element somewhere in between. In your table, the same content moves from col108 to col109. Do you really want to deal with versions of your reading process? If you read the data directly from the XML, the added element is simply added to the reading process. Older XML without this value can still be read by the same process without any problems ...

+4
source

Consider the following

I use a helper function (shamelessly lifted from http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx ... with a couple of tweaks, i.e. range keys) to convert almost any XML to a hierarchy.

I should note that I chose temp tables instead of a series of CTEs for convenience only. I am sure that, if you want, you can easily switch to the CTE approach.

In terms of performance, we are looking at 90-110 ms for the provided sample files. I can't say how well this will perform with a large XML source.

SQL

 --Drop Table #TempBase;Drop Table #TempCols;Drop Table #TempHier;Drop Table #TempPivot -- Declare @Vars Declare @in_api_id int = 1 Declare @runby varchar(30)='Some User' Declare @XML xml ='<Root><Customer CustomerID="VINET" ContactName="Paul Henriot"><Order CustomerID="VINET" EmployeeID="5" OrderDate="1996-07-04T00:00:00"><OrderDetail OrderID="10248" ProductID="11" Quantity="12"/><OrderDetail OrderID="10248" ProductID="42" Quantity="10"/></Order></Customer><Customer CustomerID="LILAS" ContactName="Carlos Gonzlez"><Order CustomerID="LILAS" EmployeeID="3" OrderDate="1996-08-16T00:00:00"><OrderDetail OrderID="10283" ProductID="72" Quantity="3"/></Order></Customer></Root>' --Declare @XML xml='<catalog><product description="Cardigan Sweater" product_image="cardigan.jpg"><catalog_item gender="Men''s"><item_number>QWZ5671</item_number><price>39.95</price><size description="Medium"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size><size description="Large"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size></catalog_item><catalog_item gender="Women''s"><item_number>RRX9856</item_number><price>42.50</price><size description="Small"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch></size><size description="Medium"><color_swatch image="red_cardigan.jpg">Red</color_swatch><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="burgundy_cardigan.jpg">Burgundy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size><size description="Large"><color_swatch image="navy_cardigan.jpg">Navy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size><size description="Extra Large"><color_swatch 
image="burgundy_cardigan.jpg">Burgundy</color_swatch><color_swatch image="black_cardigan.jpg">Black</color_swatch></size></catalog_item></product></catalog>' --Select * From [dbo].[udf-XML-Hier](@XML) Order by R1 -- Generate Base Hier Data from XML Select * Into #TempBase From [dbo].[udf-XML-Hier](@XML) Order by R1 -- Generate Required Columns with Sequence Select *,ColSeq = Row_Number() over (Order by MinR1),ColName = concat('col',Row_Number() over (Order by MinR1) ),ColTitle = IIF(Attribute='','_','')+Element+IIF(Attribute='','','_'+Attribute) Into #TempCols From ( Select Element,Attribute,MinR1 = Min(R1) From #TempBase Where R1>1 Group By Element,Attribute ) A -- Extend Base Data with Col Seq Select A.*,C.ColSeq,RowSeq = 1+Sum(RowFlg) over (Order By R1) Into #TempHier From (Select *,RowFlg =IIF(Lag(Lvl,1) over (Order By R1)>Lvl,1,0) From #TempBase) A Join #TempCols C on (A.Element=C.Element and A.Attribute=C.Attribute) -- Generate Data to be Pivoted and Augment for Inheritance Select RowSeq=0,ColSeq,ColName,Value = cast(ColTitle as varchar(max)) Into #TempPivot From #TempCols Union All Select A.RowSeq,A.ColSeq,A.ColName,Value = IsNull(B.Value,'') From ( Select A.*,R1 = case when B.R1 is not null then B.R1 else (Select Max(R1) from #TempHier Where ColSeq=A.ColSeq and RowSeq<=A.RowSeq) end From ( Select A.*,B.* From (Select Distinct RowSeq From #TempHier) A Join (Select * From #TempCols) B on (1=1) ) A Left Join #TempHier B on (A.RowSeq=B.RowSeq and A.ColSeq=B.ColSeq ) ) A Join #TempHier B on (A.R1=B.R1) -- Build and Execute the Final Select Declare @SQL varchar(max) = '' Select @SQL = @SQL+concat(',',ColName,'=max(case when ColSeq=',ColSeq,' then Value else null end)') from #TempCols Order by ColSeq Select @SQL = ' Select api_id = '+cast(@in_api_id as varchar(25))+' ,record_type = max(case when RowSeq=0 then ''H'' else ''D'' end) ,record_id = RowSeq ,last_run_time = GetDate() ,last_run_by =''' +@runby +'''' +@SQL +' From #TempPivot Group By RowSeq Order By RowSeq 
' Exec(@SQL) 

Returns

[image: result set returned by the query]

Performance

I know this is a small sample, but the results come back between 80 and 160 ms.


Table-valued function (if needed)

 CREATE FUNCTION [dbo].[udf-XML-Hier](@XML xml) Returns Table As Return with cte0 as ( Select Lvl = 1 ,ID = Cast(1 as int) ,Pt = Cast(NULL as int) ,Element = x.value('local-name(.)','varchar(150)') ,Attribute = cast('' as varchar(150)) ,Value = x.value('text()[1]','varchar(max)') ,XPath = cast(concat(x.value('local-name(.)','varchar(max)'),'[' ,cast(Row_Number() Over(Order By (Select 1)) as int),']') as varchar(max)) ,Seq = cast(10000001 as varchar(max)) ,AttData = x.query('.') ,XMLData = x.query('*') From @XML.nodes('/*') a(x) Union All Select Lvl = p.Lvl + 1 ,ID = Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10 ,Pt = p.ID ,Element = c.value('local-name(.)','varchar(150)') ,Attribute = cast('' as varchar(150)) ,Value = cast( c.value('text()[1]','varchar(max)') as varchar(max) ) ,XPath = cast(concat(p.XPath,'/',c.value('local-name(.)','varchar(max)'),'[',cast(Row_Number() Over(PARTITION BY c.value('local-name(.)','varchar(max)') Order By (Select 1)) as int),']') as varchar(max) ) ,Seq = cast(concat(p.Seq,' ',10000000+Cast( (Lvl + 1) * 1024 + (Row_Number() Over(Order By (Select 1)) * 2) as int ) * 10) as varchar(max)) ,AttData = c.query('.') ,XMLData = c.query('*') From cte0 p Cross Apply p.XMLData.nodes('*') b(c) ) , cte1 as ( Select R1 = Row_Number() over (Order By Seq),A.* From ( Select Lvl,ID,Pt,Element,Attribute,Value,XPath,Seq From cte0 Union All Select Lvl = p.Lvl+1 ,ID = p.ID + Row_Number() over (Order By (Select NULL)) ,Pt = p.ID ,Element = p.Element ,Attribute = x.value('local-name(.)','varchar(150)') ,Value = x.value('.','varchar(max)') ,XPath = p.XPath + '/@' + x.value('local-name(.)','varchar(max)') ,Seq = cast(concat(p.Seq,' ',10000000+p.ID + Row_Number() over (Order By (Select NULL)) ) as varchar(max)) From cte0 p Cross Apply AttData.nodes('/*/@*') a(x) ) A ) Select A.R1 ,R2 = IsNull((Select max(R1) From cte1 Where Seq Like A.Seq+'%'),A.R1) ,A.Lvl ,A.ID ,A.Pt ,A.Element ,A.Attribute ,A.XPath ,Title = 
Replicate('|---',Lvl-1)+Element+IIF(Attribute='','','@'+Attribute) ,A.Value From cte1 A /* Source: http://beyondrelational.com/modules/2/blogs/28/posts/10495/xquery-lab-58-select-from-xml.aspx Declare @XML xml='<person><firstname preferred="Annie" nickname="BeBe">Annabelle</firstname><lastname>Smith</lastname></person>' Select * from [dbo].[udf-XML-Hier](@XML) Order by R1 */ 

Edit: Just for fun, I grabbed the XML file from http://www.service-architecture.com/articles/object-oriented-databases/xml_file_for_complex_data.html , and the results are as follows:

[image: results for the complex-data XML file]

+3
source

Why use SQL Server to deserialize a file? This is best left to C#.

I have not done this in C#, but you can deserialize the XML into a list of strings and iterate through the list to build the records to insert into the database. Here's a general deserialization method that might help.

I would think that it would be faster, and it would avoid a quarrel with your DBA over doing heavy string manipulation on the server.
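To make the idea above a bit more concrete, here is one possible shape for such a generic walk using only the framework's own LINQ to XML (no third-party DLL). This is a sketch and not the method referred to above; XmlFlattener, Flatten and Walk are made-up names. It turns any well-formed XML into a flat list of (path, value) pairs, one per attribute and one per element that carries its own text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only.
public static class XmlFlattener
{
    // Returns one (path, value) pair per attribute and per element with own text,
    // in document order.
    public static List<KeyValuePair<string, string>> Flatten(string xml)
    {
        var rows = new List<KeyValuePair<string, string>>();
        Walk(XDocument.Parse(xml).Root, "", rows);
        return rows;
    }

    private static void Walk(XElement element, string parentPath,
                             List<KeyValuePair<string, string>> rows)
    {
        string path = parentPath + "/" + element.Name.LocalName;

        // Attributes first, skipping xmlns declarations.
        foreach (XAttribute attribute in element.Attributes())
        {
            if (attribute.IsNamespaceDeclaration) continue;
            rows.Add(new KeyValuePair<string, string>(
                path + "/@" + attribute.Name.LocalName, attribute.Value));
        }

        // Only the text that belongs directly to this element, not to its children.
        string ownText = string.Concat(
            element.Nodes().OfType<XText>().Select(t => t.Value)).Trim();
        if (ownText.Length > 0)
        {
            rows.Add(new KeyValuePair<string, string>(path, ownText));
        }

        foreach (XElement child in element.Elements())
        {
            Walk(child, path, rows);
        }
    }
}
```

Note that the paths carry no position index, so repeated elements (two Customer nodes, for example) produce the same path; you would need to add a counter per sibling if the rows must be grouped back into records. Also, XDocument.Parse loads the whole document into memory, so for very large files a streaming XmlReader would probably be the safer choice.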

+2
source

Source: https://habr.com/ru/post/1011516/

