This gist contains a list of tricks, commands and formula I collected while working with Open Refine.
See this blogpost: Merging Datasets with Common Columns in Google Refine
Open Refine supports expressions when cleaning update using its' "transform" tools. However, the documentation is a bit lacking on how to create 'complex' expressions. These are expressions that consist of multiple statements that need to be evaluated.
A classic example would be evaluating two separate cells in a row, in order to determine the value of a third row.
Then you will need boolean operations:
if (and(exp1, exp2), "ValueA", "ValueB")
Another example is mimicking an if(exp1) { valueA } elseif(exp2) { valueB } else { valueC } structure. This can be accomplished by nesting if structures:
if(exp1, (if(exp2), "valueA", "valueB"), "valueC")
Real life example given a dataset with 3 columns "ID", URL" and "Type".
if (isNotNull(cells["URL"]), (if(isNotNull(cells["URL"].value.match(/(.*)jpg/)), "representation", "data")), "")
Translates to:
- If URL == empty Then type = NULL
- elseIF (URL ends on .jpg) Then type = "representation"
- else type = "data"
Note the use of a regular expression with the "match" function.