Monday, 30 May 2016

Data Scrubbing with Google Sheets formulas

While scrubbing data in Google Sheets for a hobby project, I discovered the following formulas that made the job easy -

1) If you have a raw string like "Blah|2|4|5|www.google.com", the below formula can split the values separated by the (pipe) delimiter & place them in a new column in a row

In cell B2 paste the following:
=ARRAYFORMULA(IFERROR(SPLIT(A1:A;"|")))
Then paste your data starting with cell A2. As soon as you enter the data it will automatically split the combined values to a value-per-column using the '|' as a delimiter.

2) To remove duplicate values from rows in a column and then show them in sorted order, I used this forumula: =SORT(UNIQUE(A1:A151))

3) REGEXEXTRACT xtracts matching substrings according to a regular expression.

This formula will pick a group of numbers from the value at cell A2
=REGEXEXTRACT(A2, "[\d]+")

4) There is no in-built formula to parse JSON output but it is possible to create custom formulas with Google App Script

Example:
=ImportJSON("https://api.stackexchange.com/2.2/tags/" & J11 & 
"/wikis?&site=webapps&filter=default",
"/items/excerpt","noInherit,noTruncate,noHeaders")

No comments:

Post a Comment