# Data Engineering Recombinator Challenge

*If you stumbled on this challenge and enjoyed trying it, feel
free to send a note over to __robk@avant.com__ and talk directly to the director
of data science. We're constantly chatting with awesome developers. Even
if you're purely into open source... if you think playing with data is cool we think you're cool.*

## Context
Being able to easily manipulate data (aka munging) is an important part of our day-to-day here at Avant. Please write a small [CLI tool](https://en.wikipedia.org/wiki/Command-line_interface) that works as described below. You may use the language of your choice.

## Input
Your program should expect valid JSON representing a two dimensional matrix. It will be in one of the following two formats:

**A**) **a list of lists**

The expected format is `[ [variable names], [first row], [second row], ... ]`

    recombinator '[ ["a","b","c"], [1,2,null], [null,3,4], [5,null,6] ]'

**B**) **a list of objects**

JSON objects are simply unordered collections of key:value pairs.

Each row is represented by an object containing the variables set for that row.

Accordingly, this form of input is convenient for sparse data sets.

    recombinator '[ { "a":1, "b":2 }, { "b":3, "c":4 }, { "c":6, "a":5 } ]'

## Output
Your program should transform the input into a single JSON object mapping variable names to lists of values.

For input type (B), any variables that are missing in a row should be [imputed](https://en.wikipedia.org/wiki/Imputation_(statistics)) with null.

The order of the values in each variable's list should be the same as the order of the rows.

    '{ "a": [1,null,5], "b": [2,3,null], "c": [null,4,6] }'
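
As an illustration of the transformation described above, here is a minimal Python sketch (the challenge allows any language, so Python is just one choice). The function name `recombine` is hypothetical. Two behaviors are assumptions rather than requirements from the challenge text: for input type (B), variables are emitted in order of first appearance, and for input type (A), rows shorter than the header are padded with null.

```python
import json
import sys


def recombine(rows):
    """Turn a row-oriented matrix into a dict mapping names to value lists."""
    if not rows:
        return {}

    if isinstance(rows[0], list):
        # Format A: the first inner list holds the variable names.
        names, data = rows[0], rows[1:]
        # Assumption: a row shorter than the header is padded with None.
        return {
            name: [row[i] if i < len(row) else None for row in data]
            for i, name in enumerate(names)
        }

    # Format B: collect variable names in order of first appearance
    # (an assumption; JSON objects themselves are unordered).
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    # dict.get returns None for a missing key, which imputes null.
    return {name: [row.get(name) for row in rows] for name in names}


# Only acts as a CLI when exactly one JSON argument is supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(json.dumps(recombine(json.loads(sys.argv[1]))))
```

Saved as an executable named `recombinator` (a hypothetical packaging step), this can be called with either example input above. Both should produce the same mapping, because `json.dumps` writes Python's `None` as `null`.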