Extract multiple columns from a column containing JSON text in KDB

I have a table containing a column with JSON text. I want to parse this column and extract various attributes for multiple columns.

c:([] date:2?.zd ; client:( "{ \"firstName\": \"John\", \"lastName\": \"Smith\", \"age\": 27 }" ; "{ \"firstName\": \"Scott\", \"lastName\": \"Tiger\", \"age\":29 }" ) ) 
+4
source share
5 answers

Although JSON analysis has already been integrated into the KDB '.j' namespace (3.3 and above). It can be parsed using .jk .

 select date, firstName: client@ \:`firstName , lastName: client@ \:`lastName from update .jk each client from c 

You can use '.jj' to serialize a kdb object for a string representing JSON.

+4
source

Assuming you want to delete the client column after data extraction, you can use the following function, which has 3 parameters - x table, y json columns and z columns for extraction:

 q)f:{((),y)_x,'((),z)#.j.k'[xy]} q)f[c;`client;`firstName] date firstName -------------------- 2008.02.04 "John" 2015.01.02 "Scott" 

This can be broken down as follows: .j.k'[xy] will parse and json data into a table using .jk , ((),z)# uses accept the # operator to extract only the columns you need, which then join the original table with x,' . The final step is to remove the _ json column using ((),y)_ .

The use of (), required for both accept and drop operations, since they are awaiting a list transfer, and this ensures that it is.

+3
source

If you run .jk on every json in your table, you will have a list of dictionaries similar to the table;

 q)exec .jk each client from c firstName lastName age ---------------------- "John" "Smith" 27 "Scott" "Tiger" 29 

Then you can attach this to your source table and extract all the data you need;

 q)foo:(,'/)(c; exec .jk each client from c) q)select date, firstName, lastName, age from foo date firstName lastName age --------------------------------- 2008.02.04 "John" "Smith" 27 2015.01.02 "Scott" "Tiger" 29 
+3
source

Here is an alternative method to create a single json array from an entire json column.

Running .jk on a single line will be more efficient than running .jk on many small arrays / dictionaries / lines.

 // test table q)c:([] date:2?.zd ; client:( "{ \"firstName\": \"John\", \"lastName\": \"Smith\", \"age\": 27 }" ; "{ \"firstName\": \"Scott\", \"lastName\": \"Tiger\", \"age\":29 }" ) ); // combine each string with "," and encompass in "[]" // join each parsed dictionary with each row from c, keep client column for keep sake q)c,'exec .jk {"[",x,"]"}"," sv client from c 

It often happens with json capture that data and messages are schematic / unstructured. If, in this case, each json line does not have to have the same tags as the others, then you can create a static map of your required output and connect this map with each parsed json message. Then this will always affect the permission to the table. You can also enter a map to make sure the resulting table is entered correctly.

 // insert row with missing age tag and new location tag q)`c insert (.zd;"{\"firstName\": \"Sean\", \"lastName\": \"O'Hagan\", \"location\" : \"Dub\"}"); // name and locations are strings, age is float q)map:{x,'first each x$\:()}[`firstName`lastName`age`location!10 10 9 11h]; // manually edit symbol nulls to be varchars, so when casting to symbol the RHS param is always a varchar q).[`map;where map[;1]~\:`;:;(-11h;" ")]; // join map to each parsed dictionary, and delete client column as we have extracted useful data q)delete client from c,'exec flip map[;0]$flip (map[;1],/:.jk {"[",x,"]"}"," sv client) from c date firstName lastName age location ------------------------------------------- 2004.02.01 "John" "Smith" 27 2005.06.06 "Scott" "Tiger" 29 2018.03.13 "Sean" "O'Hagan" Dub q)meta delete client from c,'exec flip map[;0]$flip (map[;1],/:.jk {"[",x,"]"}"," sv client) from c c | tfa ---------| ----- date | d firstName| C lastName | C age | f location | s 

HTH, Sean

+3
source

From what I understand, you are not interested in keeping the client column as it is. Here's how I do it:

 ((),`client) _ c,'{.jk x}'[exec client from c] 

and you will get a table like this:

 date firstName lastName age --------------------------------- 2003.07.05 "John" "Smith" 27 2005.02.25 "Scott" "Tiger" 29 
+1
source

Source: https://habr.com/ru/post/1275848/


All Articles