Golang Decoding / Unmarshaling Invalid Unicode in JSON

I am extracting JSON files in Go that are not uniformly formatted. For example, I might have the following:

{"email": "\"blah.blah@blah.com\""}
{"email": "robert@gmail.com"}
{"name": "m\303\203ead"}

We can see that the escape sequences will be a problem. Using json.Decoder.Decode with

{"name": "m\303\203ead"}

I get the error: invalid character '3' in string escape code
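
A minimal reproduction might look like the sketch below, assuming the file really contains literal backslash characters followed by digits. tryDecode is a hypothetical helper name used only for this illustration; the raw string literals stand in for lines read from the file.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tryDecode (hypothetical helper) reports the error, if any, from decoding
// raw as a JSON object with string values.
func tryDecode(raw string) error {
	var v map[string]string
	return json.Unmarshal([]byte(raw), &v)
}

func main() {
	// Raw string literals keep the backslashes exactly as they appear in the file.
	// \3 is not a valid JSON escape, so this line is rejected.
	fmt.Println(tryDecode(`{"name": "m\303\203ead"}`))
	// \" is a valid JSON escape, so this line decodes without error.
	fmt.Println(tryDecode(`{"email": "\"blah.blah@blah.com\""}`))
}
```

JSON only allows \", \\, \/, \b, \f, \n, \r, \t and \uXXXX after a backslash, which is why the decoder stops at the '3'.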

I tried several approaches to normalizing my data, for example by passing it through a string array (it works, but there are too many edge cases), or even by filtering out the escape characters.

Finally, I came across this article: http://blog.golang.org/normalization. The solution it proposes seemed very interesting.

I tried the following:

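// isMn reports whether r is a nonspacing mark (Unicode category Mn).
isMn := func(r rune) bool {
    return unicode.Is(unicode.Mn, r)
}

// Drop nonspacing marks between two normalization passes.
t := transform.Chain(norm.NFC, transform.RemoveFunc(isMn), norm.NFD)

fileReader, err := bucket.GetReader(filename)

// Wrap the file reader so the decoder reads the transformed stream.
transformReader := transform.NewReader(fileReader, t)

decoder := json.NewDecoder(transformReader)

for {
    var dataModel Model
    if err := decoder.Decode(&dataModel); err == io.EOF {
        break
    } else {
        // DO SOMETHING
    }
}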

The Model struct is:

type Model struct {
    Name  string `json:"name" bson:"name"`
    Email string `json:"email" bson:"email"` 
}

I tried several variations of this, but couldn't get it to work.

Has anyone had to handle a similar problem when decoding / unmarshaling JSON in Go?


You can use json.RawMessage instead of string; that way json.Decode will not try to decode the invalid characters.

Playground: http://play.golang.org/p/fB-38KGAO0

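package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// Model keeps the name as raw, undecoded JSON bytes.
type Model struct {
    N json.RawMessage `json:"name" bson:"name"`
}

// Name returns the raw JSON value, including its surrounding quotes.
func (m *Model) Name() string {
    return string(m.N)
}

func main() {
    // In this interpreted literal Go reads \303\203 as octal byte escapes,
    // so s holds the bytes 0xC3 0x83 rather than backslash sequences.
    s := "{\"name\": \"m\303\203ead\"}"
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{}

    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}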
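
One caveat worth checking before relying on this: the input in the snippet is an interpreted Go string literal, so the compiler itself turns \303\203 into two bytes before the decoder ever sees them. As far as I can tell, encoding/json validates the syntax of the whole value even when the target is a json.RawMessage, so literal backslashes read from a file would still be rejected. The sketch below compares the two cases; rawModel and decodeRaw are hypothetical names used only here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawModel (hypothetical) keeps the name field as undecoded JSON bytes.
type rawModel struct {
	N json.RawMessage `json:"name"`
}

// decodeRaw (hypothetical helper) decodes input into a rawModel and returns
// the captured raw bytes as a string, together with any decoding error.
func decodeRaw(input string) (string, error) {
	var m rawModel
	err := json.Unmarshal([]byte(input), &m)
	return string(m.N), err
}

func main() {
	// Interpreted literal: the Go compiler turns \303\203 into bytes 0xC3 0x83,
	// which happen to form valid UTF-8, so the JSON itself is well-formed.
	fmt.Println(decodeRaw("{\"name\": \"m\303\203ead\"}"))

	// Raw literal: the backslashes reach the decoder unchanged, as they would
	// when the data is read from a file.
	fmt.Println(decodeRaw(`{"name": "m\303\203ead"}`))
}
```

If the second call reports an error on your Go version too, the escapes have to be repaired before decoding rather than deferred with RawMessage.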

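Edit: if you would rather replace the invalid escapes with valid Unicode escapes, you can do something like this (http://play.golang.org/p/VYJKTKmiYm):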

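// octalEscape matches a backslash followed by three digits, e.g. \303.
// The leading \b means it only matches right after a word character.
var octalEscape = regexp.MustCompile(`\b(\\\d\d\d)`)

// cleanUp rewrites each \ddd into \u0ddd so the JSON becomes syntactically valid.
// The digits are reused as-is (not converted from octal), so the resulting
// code points differ from the original bytes.
func cleanUp(s string) string {
    return octalEscape.ReplaceAllStringFunc(s, func(s string) string {
        return `\u0` + s[1:]
    })
}

func main() {
    // Raw string literal, so the backslashes reach cleanUp as they would from a file.
    s := `{"name": "m\303\203ead"}`
    s = cleanUp(s)
    r := strings.NewReader(s)
    d := json.NewDecoder(r)
    m := Model{} // Model as defined in the previous snippet
    fmt.Println(d.Decode(&m))
    fmt.Println(m.Name())
}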

Source: https://habr.com/ru/post/1545130/

