I am parsing JSON files in Go that are not formatted uniformly. For example, I might have the following:
{"email": "\"blah.blah@blah.com\""}
{"email": "robert@gmail.com"}
{"name": "m\303\203ead"}
The escape sequences in the last record are the problem. Decoding this input with json.Decoder:
{"name": "m\303\203ead"}
I get an error: invalid character '3' in string escape code
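For reference, the failure can be reproduced in isolation. JSON string escapes are limited to \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so a C-style octal escape such as \303 is rejected by the decoder. A minimal sketch (the helper name decodeName is only for illustration and is not part of my real code):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeName is a hypothetical helper: it decodes one JSON object and
// returns its "name" field together with any decoding error.
func decodeName(raw string) (string, error) {
	var v struct {
		Name string `json:"name"`
	}
	err := json.Unmarshal([]byte(raw), &v)
	return v.Name, err
}

func main() {
	// The raw string literal keeps the backslashes exactly as they appear in the file.
	_, err := decodeName(`{"name": "m\303\203ead"}`)
	fmt.Println(err)
}
```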
I tried several approaches to normalizing my data, for example passing it in as a string array (this works, but there are too many edge cases) or filtering out the escape characters.
Finally, I came across this article: http://blog.golang.org/normalization. The solution it proposes seemed very interesting.
I tried the following:
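// Needs: encoding/json, io, unicode,
// golang.org/x/text/transform, golang.org/x/text/unicode/norm
isMn := func(r rune) bool {
	return unicode.Is(unicode.Mn, r) // Mn: nonspacing marks
}
// Decompose, drop the combining marks, then recompose (the order used in the article).
t := transform.Chain(norm.NFD, transform.RemoveFunc(isMn), norm.NFC)
fileReader, err := bucket.GetReader(filename)
if err != nil {
	// handle the error
}
transformReader := transform.NewReader(fileReader, t)
decoder := json.NewDecoder(transformReader)
for {
	var dataModel Model
	if err := decoder.Decode(&dataModel); err == io.EOF {
		break
	} else if err != nil {
		// handle the decoding error
		break
	}
	// DO SOMETHING with dataModel
}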
The Model struct is:
type Model struct {
	Name  string `json:"name" bson:"name"`
	Email string `json:"email" bson:"email"`
}
I tried several variations of this, but couldn't get it to work.
Is there a way to normalize or clean up this JSON before decoding it, so that the decoder receives valid JSON?