Split multi-valued cells in multiple columns into rows (OpenRefine)

I am cleaning up a table in OpenRefine. Right now it looks like this:

REF                 Handle      Size        Price
2002, 2003          t-shirt1    M, L        23
3001, 3002, 3003    t-shirt2    S, M, L     24

I need to split these multi-valued cells in REF and Size to get:

REF                 Handle      Size        Price
2002                t-shirt1    M           23
2003                t-shirt1    L           23  
3001                t-shirt2    S           24  
3002                t-shirt2    M           24
3003                t-shirt2    L           24

Can this be done in OpenRefine? The "Split multi-valued cells..." command only works on one column at a time. Thanks, Ana Rita

1 answer

Yes, it is possible:

  • Split the 1st column using ", " (comma followed by a space) as the separator.
  • Move column 2 to the first position, so that it becomes the key column for records.
  • Display your project as records (not rows).
  • Split column 3 using ", " as the separator.
  • Fill down columns 4 and 2.
  • Reorder the columns.

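Here is my recipe as OpenRefine operation history (JSON, not GREL), which can be applied from the Undo / Redo tab. Note that its first operation removes starred rows, which is not one of the steps listed above: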

[
  {
    "op": "core/row-removal",
    "description": "Remove rows",
    "engineConfig": {
      "facets": [
        {
          "invert": false,
          "expression": "row.starred",
          "selectError": false,
          "omitError": false,
          "selectBlank": false,
          "name": "Starred Rows",
          "omitBlank": false,
          "columnName": "",
          "type": "list",
          "selection": [
            {
              "v": {
                "v": true,
                "l": "true"
              }
            }
          ]
        }
      ],
      "mode": "row-based"
    }
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 1",
    "columnName": "Column 1",
    "keyColumnName": "Column 1",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/column-move",
    "description": "Move column Column 2 to position 0",
    "columnName": "Column 2",
    "index": 0
  },
  {
    "op": "core/multivalued-cell-split",
    "description": "Split multi-valued cells in column Column 3",
    "columnName": "Column 3",
    "keyColumnName": "Column 2",
    "separator": ", ",
    "mode": "plain"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 4",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 4"
  },
  {
    "op": "core/fill-down",
    "description": "Fill down cells in column Column 2",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Column 2"
  },
  {
    "op": "core/column-reorder",
    "description": "Reorder columns",
    "columnNames": [
      "Column 1",
      "Column 2",
      "Column 3",
      "Column 4"
    ]
  }
]

Herve
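
To check the expected result outside OpenRefine, here is a minimal Python sketch of the same transformation. It is not part of the recipe above and does not use any OpenRefine API. The function name split_rows is hypothetical, and it assumes that every multi-valued cell in a row holds the same number of values, paired by position (2002 with M, 2003 with L, and so on).

```python
def split_rows(rows, multi_cols, sep=","):
    """Expand each row into one row per value, pairing values by position.

    rows       -- list of dicts, one per table row
    multi_cols -- names of the columns that hold separator-delimited lists
    """
    out = []
    for row in rows:
        # Split every multi-valued cell and trim the surrounding whitespace.
        parts = {c: [v.strip() for v in row[c].split(sep)] for c in multi_cols}
        lengths = {len(values) for values in parts.values()}
        if len(lengths) != 1:
            # Assumption: the lists must line up, otherwise pairing is ambiguous.
            raise ValueError(f"unequal number of values in row: {row}")
        for i in range(lengths.pop()):
            new_row = dict(row)  # single-valued cells are copied (the "fill down")
            for c in multi_cols:
                new_row[c] = parts[c][i]
            out.append(new_row)
    return out


# Sample data taken from the question.
rows = [
    {"REF": "2002, 2003", "Handle": "t-shirt1", "Size": "M, L", "Price": "23"},
    {"REF": "3001, 3002, 3003", "Handle": "t-shirt2", "Size": "S, M, L", "Price": "24"},
]

result = split_rows(rows, ["REF", "Size"])
for r in result:
    print(r["REF"], r["Handle"], r["Size"], r["Price"])
```

Copying the single-valued cells into every new row plays the role of the two fill-down steps in the recipe, so Handle and Price end up repeated on each output row.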


Source: https://habr.com/ru/post/1609374/
