Problem: "XML declaration allowed only at the beginning of the document"

xml: 19558: parser error: XML declaration allowed only at the beginning of the document

Any solutions? I am using PHP's XMLReader to parse a large XML file, but I get this error. I know the file is not well-formed, but it is too large to open and delete these extra declarations by hand. Any ideas? Please help.
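The error is easy to reproduce. This minimal sketch glues two small, made-up documents into one string, the way a concatenated file would look, and collects libxml's messages with libxml_use_internal_errors() instead of printing them as warnings:

```php
<?php
// Collect libxml errors instead of printing them as PHP warnings
libxml_use_internal_errors(true);

// Two complete documents joined together, like a concatenated file.
// The second XML declaration is not at the start, so parsing fails there.
$concatenated = '<?xml version="1.0"?><xml><record>1</record></xml>' . "\n"
    . '<?xml version="1.0"?><xml><record>2</record></xml>';

$reader = new XMLReader();
$reader->xml($concatenated);
while ($reader->read()) {
    // keep reading until the parser stops
}
$reader->close();

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
```

The exact wording of the message depends on the libxml2 version, but it names the misplaced XML declaration.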

3 answers

Make sure there is no whitespace before the first tag. Try the following:

    <?php
    // Declarations
    $file = "data.txt"; // The file to read from.

    // Read the whole file into memory
    $fp = fopen($file, "r") or die("Couldn't open $file");
    $data = ""; // will hold the file content
    while (!feof($fp)) {
        $data .= fgets($fp, 1024); // append the next line (at most 1023 bytes)
    }
    fclose($fp);

    // Split after every closing </xml> tag, so each document gets its own string
    $split = preg_split('/(?<=<\/xml>)(?!$)/', $data);

    foreach ($split as $sxml) {
        // Whitespace before an XML declaration causes the same parser error,
        // so drop the newline left over between documents
        $sxml = trim($sxml);
        if ($sxml === "") {
            continue;
        }
        $reader = new XMLReader();
        $reader->xml($sxml) or die("Could not load XML string");
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
                echo $reader->readInnerXml(); // contents of the <record> tag
            }
        }
        $reader->close();
    }
    ?>

Set $file to the file you want to read. Note: this reads the whole file into memory, so I don't know how well it works for a 4 GB file. Let me know if it doesn't.

EDIT: Here is another solution. It should work better with a large file, because it parses each document as it reads the file.

    <?php
    set_time_limit(0); // processing a multi-gigabyte file can take a long time

    // Declarations
    $file = "data.txt"; // The file to read from.
    $fp = fopen($file, "r") or die("Couldn't open $file");

    $window = "";    // the last few characters read, used to spot tags
    $curXML = "";    // the <xml>...</xml> document being collected
    $inside = false; // true between "<xml" and "</xml>"

    // Read one character at a time, so only one document is in memory
    while (($ch = fgetc($fp)) !== false) {
        $window = substr($window . $ch, -6); // keep the last 6 characters
        if (!$inside) {
            // "<xml" starts a document; "<?xml" declarations never match it
            if (substr($window, -4) === "<xml") {
                $inside = true;
                $curXML = "<xml";
            }
        } else {
            $curXML .= $ch;
            if ($window === "</xml>") { // finished reading one document
                ParseXML($curXML);
                $inside = false;
                $curXML = "";
            }
        }
    }
    fclose($fp);

    function ParseXML($xml)
    {
        $reader = new XMLReader();
        $reader->xml($xml) or die("Could not load XML string");
        while ($reader->read()) {
            if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'record') {
                echo $reader->readInnerXml(); // contents of the <record> tag
            }
        }
        $reader->close();
    }
    ?>
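The streaming approach above keeps memory low but reads one byte per call, which is slow on a multi-gigabyte file. A possible variant, sketched here as a hypothetical helper (the name xmlDocuments and the 8192-byte chunk size are my own choices), reads larger chunks and uses strpos() to cut out each <xml>...</xml> document, still assuming the root element is named xml as in the answer:

```php
<?php
// Hypothetical helper: yields each <xml>...</xml> document from an open stream,
// reading fixed-size chunks instead of one byte at a time.
function xmlDocuments($fp, int $chunkSize = 8192): Generator
{
    $buffer = "";
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $buffer .= $chunk;
        // Emit every complete document currently in the buffer.
        // "<?xml" declarations do not contain "<xml", so they are skipped.
        while (($start = strpos($buffer, "<xml")) !== false
            && ($end = strpos($buffer, "</xml>", $start)) !== false) {
            $end += strlen("</xml>");
            yield substr($buffer, $start, $end - $start);
            $buffer = substr($buffer, $end);
        }
        // No document started yet: keep only a possible partial "<xm" tail
        if (strpos($buffer, "<xml") === false) {
            $buffer = substr($buffer, -3);
        }
    }
    // An unfinished document at end of file is silently dropped
}
```

With the answer's ParseXML() function, the main loop would then become: foreach (xmlDocuments($fp) as $doc) { ParseXML($doc); }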

If you have multiple XML declarations, you most likely have several XML files concatenated together, which also means more than one root element. It is not clear how you intend to parse that as a single document.

First, try to get whoever produces the XML to give you real, well-formed XML. If that doesn't work, see whether you can preprocess the file to fix the XML before parsing it.
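One way such preprocessing could look, as a sketch under several assumptions: every XML declaration fits on a single line, lines are reasonably short, all documents are UTF-8, and wrapping them in a new root element (called documents here, an arbitrary name) is acceptable. The hypothetical function fixConcatenatedXml streams the input line by line, removes each declaration together with any UTF-8 BOM directly in front of it, and writes one well-formed document that XMLReader::open() can then read:

```php
<?php
// Hypothetical preprocessing step: copy $in to $out line by line, dropping every
// XML declaration and wrapping everything in a single <documents> root element.
// Assumes each declaration fits on one line and all input is UTF-8.
function fixConcatenatedXml(string $in, string $out): void
{
    $src = fopen($in, "r");
    $dst = fopen($out, "w");
    if ($src === false || $dst === false) {
        throw new RuntimeException("Couldn't open input or output file");
    }
    fwrite($dst, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<documents>\n");
    while (($line = fgets($src)) !== false) {
        // Remove any declaration, plus a UTF-8 BOM right before it
        fwrite($dst, preg_replace('/(?:\xEF\xBB\xBF)?<\?xml\s[^>]*\?>/', '', $line));
    }
    fwrite($dst, "</documents>\n");
    fclose($src);
    fclose($dst);
}
```

The regular expression requires whitespace after "xml", so processing instructions such as xml-stylesheet are left alone.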


Another possible cause of this problem is the Unicode byte order mark (BOM) at the start of the file. A UTF-8 file saved with a BOM begins with the three bytes EF BB BF. These bytes may not be interpreted correctly if you convert a byte array to a string, which leaves extra characters in front of the XML declaration. The solution is to write the byte array directly to the file instead of converting it to a string with getString first.
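If you cannot change how the file is written, a defensive option is to strip a leading UTF-8 BOM from the raw bytes yourself, before any string conversion and before handing the data to XMLReader. A minimal sketch (stripUtf8Bom is a hypothetical helper name):

```php
<?php
// Remove a leading UTF-8 byte order mark (EF BB BF), if present, so that
// the XML declaration is the very first thing the parser sees.
function stripUtf8Bom(string $data): string
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0 ? substr($data, 3) : $data;
}

// Example: raw bytes as they might come from file_get_contents()
$raw = "\xEF\xBB\xBF" . '<?xml version="1.0" encoding="UTF-8"?><xml><record>1</record></xml>';
$reader = new XMLReader();
$reader->xml(stripUtf8Bom($raw));
```

This only handles a BOM at the very beginning; a BOM in the middle of a concatenated file needs the preprocessing approach from the previous answer.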

Byte order marks by encoding:
- ASCII: none
- Unicode (UTF-16 LE): FF FE
- UTF-8: EF BB BF
- UTF-32 (LE): FF FE 00 00
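Those byte sequences can also be checked in code. A small sketch, with detectBom as a hypothetical name; note that the UTF-32 LE mark must be tested before the UTF-16 LE one, because FF FE is a prefix of FF FE 00 00:

```php
<?php
// Guess the encoding from a byte order mark at the start of $bytes.
// Order matters: "\xFF\xFE" is a prefix of the UTF-32 LE mark.
function detectBom(string $bytes): string
{
    $marks = [
        "UTF-32LE" => "\xFF\xFE\x00\x00",
        "UTF-8"    => "\xEF\xBB\xBF",
        "UTF-16LE" => "\xFF\xFE",
    ];
    foreach ($marks as $encoding => $mark) {
        if (strncmp($bytes, $mark, strlen($mark)) === 0) {
            return $encoding;
        }
    }
    return "none"; // e.g. plain ASCII, which has no BOM
}
```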

Open the file in a hex editor such as UltraEdit and you will see these bytes.
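Without a hex editor, PHP itself can show the first bytes of the file. A minimal sketch (firstBytesHex is a hypothetical helper) that formats them the way a hex editor displays them:

```php
<?php
// Return the first $count bytes of a file as upper-case hex pairs, e.g. "EF BB BF 3C".
function firstBytesHex(string $file, int $count = 4): string
{
    $fp = fopen($file, "rb");
    if ($fp === false) {
        throw new RuntimeException("Couldn't open $file");
    }
    $bytes = fread($fp, $count);
    fclose($fp);
    // bin2hex gives "efbbbf3c"; split it into pairs and upper-case it
    return strtoupper(implode(" ", str_split(bin2hex($bytes), 2)));
}
```

For example, echo firstBytesHex("data.txt"); prints "EF BB BF 3C" for a UTF-8 file with a BOM followed by "<".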


Source: https://habr.com/ru/post/1345793/

