Since people tend to throw regular expressions at everything, even at things that cannot be parsed with regular expressions (i.e., non-regular languages), I wrote a parser for this data format:
$input = '{ "idArray" = ( "99516", "99518", "97344", "97345", "98425" ); "frame" = { "size" = { "width" = "8"; "height" = "8"; }; "origin" = { "x" = "244"; "y" = "345"; }; }; }';

echo json_encode(parse($input));

function parse($input) {
    $tokens = tokenize($input);
    $index = 0;
    $result = parse_value($tokens, $index);
    if ($result[1] !== count($tokens)) {
        throw new Exception("parsing stopped at token " . $result[1] . " but there is more input");
    }
    return $result[0][1];
}

function tokenize($input) {
    $tokens = array();
    $length = strlen($input);
    $pos = 0;
    while ($pos < $length) {
        list($token, $pos) = find_token($input, $pos);
        $tokens[] = $token;
    }
    return $tokens;
}

function find_token($input, $pos) {
    $static_tokens = array("=", "{", "}", "(", ")", ";", ",");
    while (preg_match("/\s/mis", substr($input, $pos, 1))) { // eat whitespace
        $pos += 1;
    }
    foreach ($static_tokens as $static_token) {
        if (substr($input, $pos, strlen($static_token)) === $static_token) {
            return array($static_token, $pos + strlen($static_token));
        }
    }
    if (substr($input, $pos, 1) === '"') {
        $length = strlen($input);
        $token_length = 1;
        while ($pos + $token_length < $length) {
            if (substr($input, $pos + $token_length, 1) === '"') {
                return array(array("value", substr($input, $pos + 1, $token_length - 1)), $pos + $token_length + 1);
            }
            $token_length += 1;
        }
    }
    throw new Exception("invalid input at " . $pos . ": `" . substr($input, $pos - 10, 20) . "`");
}

// value is either an object {}, an array (), or a literal ""
function parse_value($tokens, $index) {
    if ($tokens[$index] === "{") { // object: a list of key-value pairs, glued together by ";"
        $return_value = array();
        $index += 1;
        while ($tokens[$index] !== "}") {
            list($key, $value, $index) = parse_key_value($tokens, $index);
            $return_value[$key] = $value[1];
            if ($tokens[$index] !== ";") {
                throw new Exception("Unexpected: " . print_r($tokens[$index], true));
            }
            $index += 1;
        }
        return array(array("object", $return_value), $index + 1);
    }
    if ($tokens[$index] === "(") { // array: a list of values, glued together by ",", the last "," is optional
        $return_value = array();
        $index += 1;
        while ($tokens[$index] !== ")") {
            list($value, $index) = parse_value($tokens, $index);
            $return_value[] = $value[1];
            if ($tokens[$index] === ",") { // last , is optional
                $index += 1;
            } else {
                if ($tokens[$index] !== ")") {
                    throw new Exception("Unexpected: " . print_r($tokens[$index], true));
                }
                return array(array("array", $return_value), $index + 1);
            }
        }
        return array(array("array", $return_value), $index + 1);
    }
    if ($tokens[$index][0] === "value") {
        return array(array("string", $tokens[$index][1]), $index + 1);
    }
    throw new Exception("Unexpected: " . print_r($tokens[$index], true));
}

// find a key (string) followed by '=' followed by a value (any value)
function parse_key_value($tokens, $index) {
    list($key, $index) = parse_value($tokens, $index);
    if ($key[0] !== "string") { // key must be a string
        throw new Exception("Unexpected: " . print_r($key, true));
    }
    if ($tokens[$index] !== "=") {
        throw new Exception("'=' expected");
    }
    $index += 1;
    list($value, $index) = parse_value($tokens, $index);
    return array($key[1], $value, $index);
}
Output:
{"idArray":["99516","99518","97344","97345","98425"],"frame":{"size":{"width":"8","height":"8"},"origin":{"x":"244","y":"345"}}}
Notes
The source input has a trailing ",". I removed that character. The parser throws an error (more input) if you put it back.
This parser is naive in the sense that it tokenizes all input before parsing begins. That is not good for large inputs.
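One way around the up-front tokenizing would be to produce tokens on demand. The following is only a sketch in JavaScript, under the assumption that a generator is acceptable. `lazyTokens` is a hypothetical name, not part of the parser above, and it emits the same token shapes as `find_token` (plain strings for punctuation, `["value", text]` for literals).

```javascript
// Hypothetical lazy tokenizer: yields one token at a time instead of
// building the complete token array before parsing starts.
function* lazyTokens(input) {
  const punctuation = "={}();,";
  let pos = 0;
  while (pos < input.length) {
    const ch = input[pos];
    if (/\s/.test(ch)) {
      pos += 1; // eat whitespace
    } else if (punctuation.includes(ch)) {
      yield ch; // static token
      pos += 1;
    } else if (ch === '"') {
      // No escape handling here, same as the PHP tokenizer.
      const end = input.indexOf('"', pos + 1);
      if (end === -1) {
        throw new Error("unterminated string at " + pos);
      }
      yield ["value", input.slice(pos + 1, end)];
      pos = end + 1;
    } else {
      throw new Error("invalid input at " + pos);
    }
  }
}

const it = lazyTokens('{ "x" = "244"; }');
console.log(it.next().value); // {
console.log(it.next().value); // [ 'value', 'x' ]
```

This only makes the token list lazy. The input string itself is still held in memory, and the parse functions would have to pull from the iterator instead of indexing into an array.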
I did not add escape detection for strings in the tokenizer. For example: "foo\"bar".
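Escape detection could look roughly like the sketch below, written in JavaScript. `readQuotedString` is a hypothetical helper, and the escape rule is an assumption: a backslash makes the next character literal. The real format may define other escapes (such as `\n`), which this does not handle.

```javascript
// Hypothetical helper: reads a double-quoted string that starts at `pos`
// and returns [decodedText, positionAfterClosingQuote].
// Assumption: a backslash simply makes the following character literal.
function readQuotedString(input, pos) {
  if (input[pos] !== '"') {
    throw new Error("expected '\"' at " + pos);
  }
  let text = "";
  let i = pos + 1;
  while (i < input.length) {
    const ch = input[i];
    if (ch === "\\") {
      if (i + 1 >= input.length) {
        break; // dangling backslash at end of input
      }
      text += input[i + 1]; // keep the escaped character as-is
      i += 2;
    } else if (ch === '"') {
      return [text, i + 1]; // closing quote found
    } else {
      text += ch;
      i += 1;
    }
  }
  throw new Error("unterminated string starting at " + pos);
}

const [text, next] = readQuotedString('"foo\\"bar" = "x"', 0);
console.log(text); // foo"bar
console.log(next); // 10
```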
It was fun. If you have any questions, let me know.
Edit: I see this is a JavaScript question. Porting the PHP to JavaScript should not be too complicated. The statement list($foo, $bar) = func() is equivalent to:
var res = func(); var foo = res[0]; var bar = res[1];
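In engines that support ES2015, destructuring assignment is a closer match to `list(...)`. Here is a small sketch of the pattern. `parseString` is a hypothetical stand-in for one of the parser functions and only handles literal tokens.

```javascript
// Hypothetical stand-in for a parser helper: returns the pair
// [value, nextIndex], like the PHP functions return array($value, $index).
function parseString(tokens, index) {
  const token = tokens[index];
  if (!Array.isArray(token) || token[0] !== "value") {
    throw new Error("Unexpected: " + JSON.stringify(token));
  }
  return [["string", token[1]], index + 1];
}

const tokens = [["value", "width"], "=", ["value", "8"]];
let index = 0;
let key, value;
[key, index] = parseString(tokens, index); // like list($key, $index) = ...
index += 1; // skip "="
[value, index] = parseString(tokens, index);
console.log(key[1], value[1], index); // width 8 3
```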