├── src
│   └── README.md
├── insight_testsuite
│   ├── README.md
│   └── test_1
│       ├── output
│       │   └── report.csv
│       └── input
│           └── complaints.csv
├── input
│   └── README.md
├── output
│   └── README.md
├── run.sh
└── README.md

/src/README.md:
--------------------------------------------------------------------------------
This is the directory where your source code would reside

--------------------------------------------------------------------------------
/insight_testsuite/README.md:
--------------------------------------------------------------------------------
This is the directory where you'd place your test cases. One test, test_1, has been provided for your use.

--------------------------------------------------------------------------------
/input/README.md:
--------------------------------------------------------------------------------
This is the directory where your program would find test input files (e.g., when we review your submission, this is the directory where we'll place our test input files)

--------------------------------------------------------------------------------
/output/README.md:
--------------------------------------------------------------------------------
This is the directory where your program would write out output files.
(e.g., when we run your code, this is where we expect your program to write the expected output file)

--------------------------------------------------------------------------------
/insight_testsuite/test_1/output/report.csv:
--------------------------------------------------------------------------------
"credit reporting, credit repair services, or other personal consumer reports",2019,3,2,67
"credit reporting, credit repair services, or other personal consumer reports",2020,1,1,100
debt collection,2019,1,1,100

--------------------------------------------------------------------------------
/run.sh:
--------------------------------------------------------------------------------
#!/bin/bash
#
# Use this shell script to compile (if necessary) your code and then execute it. Below is an example of what might be found in this file if your program was written in Python 3.7
# python3.7 ./src/consumer_complaints.py ./input/complaints.csv ./output/report.csv

--------------------------------------------------------------------------------
/insight_testsuite/test_1/input/complaints.csv:
--------------------------------------------------------------------------------
Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID
2019-09-24,Debt collection,I do not know,Attempts to collect debt not owed,Debt is not yours,"transworld systems inc. is trying to collect a debt that is not mine, not owed and is inaccurate.",,TRANSWORLD SYSTEMS INC,FL,335XX,,Consent provided,Web,2019-09-24,Closed with explanation,Yes,N/A,3384392
2019-09-19,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,Experian Information Solutions Inc.,PA,15206,,Consent not provided,Web,2019-09-20,Closed with non-monetary relief,Yes,N/A,3379500
2020-01-06,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,,Experian Information Solutions Inc.,CA,92532,,N/A,Email,2020-01-06,In progress,Yes,N/A,3486776
2019-10-24,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",CA,925XX,,Other,Web,2019-10-24,Closed with explanation,Yes,N/A,3416481
2019-11-20,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Account information incorrect,I would like the credit bureau to correct my XXXX XXXX XXXX XXXX balance. My correct balance is XXXX,Company has responded to the consumer and the CFPB and chooses not to provide a public response,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",TX,77004,,Consent provided,Web,2019-11-20,Closed with explanation,Yes,N/A,3444592

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Consumer Complaints

## Table of Contents
1. [Problem](README.md#problem)
1. [Steps to submit your solution](README.md#steps-to-submit-your-solution)
1. [Input Dataset](README.md#input-dataset)
1. [Expected output](README.md#expected-output)
1. [Instructions](README.md#instructions)
1. [Tips on getting an interview](README.md#tips-on-getting-an-interview)
1. [Repo directory structure](README.md#repo-directory-structure)
1. [Testing your code](README.md#testing-your-code)
1. [Questions?](README.md#questions?)

## Problem
The federal government provides a way for consumers to file complaints against companies regarding different financial products, such as payment problems with a credit card or debt collection tactics. This challenge will be about identifying the number of complaints filed and how they're spread across different companies.

**For this challenge, we want to know for each financial product and year, the total number of complaints, number of companies receiving a complaint, and the highest percentage of complaints directed at a single company.**

## Steps to submit your solution

* To submit your entry, use the link you received in your coding challenge invite email
* Do NOT attach a file - we will not accept solutions with attached files
* Do NOT send your solution over an email - We are unable to accept coding challenges that way
* To see whether your code will pass at least one key test on our system (do this prior to submission), use this page: https://insight-cc-submission.com/test-my-repo-link and choose 'Consumer Complaints' in the challenge dropdown

### Creating private repositories
To avoid plagiarism and wrongdoing, we request you submit a private repository of your code, and then invite us to collaborate prior to submitting your solution. Both GitHub and Bitbucket offer free private repositories at no extra cost.
* Create a private repository on GitHub or Bitbucket with the directory structure detailed [below](README.md#repo-directory-structure)
* Add "insight-cc-bot" (or cc@insightdataengineering.com on Bitbucket) as a collaborator in your project
  * [How to add collaborators on GitHub?](https://help.github.com/articles/inviting-collaborators-to-a-personal-repository/)
  * [How to add users and groups as collaborators in Bitbucket?](https://confluence.atlassian.com/bitbucket/grant-repository-access-to-users-and-groups-221449716.html)
* **We will NOT be grading submissions we do not have access to.**

### Submitting a link to your repository
* Provide a link to the specific repo for this project, not your general profile
* Exactly follow the directory structure [detailed](README.md#repo-directory-structure) in this Readme, especially providing a 'run.sh' shell script that executes your code
* Put any comments in the README file of your project repo

## Input dataset
For this challenge, when we grade your submission, an input file, `complaints.csv`, will be moved to the top-most `input`
directory of your repository. Your code must read that input file, process it and write the results to an output file, `report.csv`, which your code must place in the top-most `output` directory of your repository.

Below are the contents of an example `complaints.csv` file:
```
Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID
2019-09-24,Debt collection,I do not know,Attempts to collect debt not owed,Debt is not yours,"transworld systems inc. is trying to collect a debt that is not mine, not owed and is inaccurate.",,TRANSWORLD SYSTEMS INC,FL,335XX,,Consent provided,Web,2019-09-24,Closed with explanation,Yes,N/A,3384392
2019-09-19,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,Experian Information Solutions Inc.,PA,15206,,Consent not provided,Web,2019-09-20,Closed with non-monetary relief,Yes,N/A,3379500
2020-01-06,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,,Experian Information Solutions Inc.,CA,92532,,N/A,Email,2020-01-06,In progress,Yes,N/A,3486776
2019-10-24,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Information belongs to someone else,,Company has responded to the consumer and the CFPB and chooses not to provide a public response,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",CA,925XX,,Other,Web,2019-10-24,Closed with explanation,Yes,N/A,3416481
2019-11-20,"Credit reporting, credit repair services, or other personal consumer reports",Credit reporting,Incorrect information on your report,Account information incorrect,I would like the credit bureau to correct my XXXX XXXX XXXX XXXX balance. My correct balance is XXXX,Company has responded to the consumer and the CFPB and chooses not to provide a public response,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",TX,77004,,Consent provided,Web,2019-11-20,Closed with explanation,Yes,N/A,3444592
```
Each line of the input file, except for the first-line header, represents one complaint. Consult the [Consumer Financial Protection Bureau's technical documentation](https://cfpb.github.io/api/ccdb/fields.html) for a description of each field.

* Notice that the complaints are not listed in chronological order
* In 2019, there was a complaint against `TRANSWORLD SYSTEMS INC` for `Debt collection`
* Also in 2019, `Experian Information Solutions Inc.` received one complaint for `Credit reporting, credit repair services, or other personal consumer reports` while `TRANSUNION INTERMEDIATE HOLDINGS, INC.` received two
* In 2020, `Experian Information Solutions Inc.` received a complaint for `Credit reporting, credit repair services, or other personal consumer reports`

In summary, that means:
* In 2019, there was one complaint for `Debt collection`, and 100% of it went to one company
* Also in 2019, three complaints against two companies were received for `Credit reporting, credit repair services, or other personal consumer reports` and two-thirds of them (or 67% if we round the percentage to the nearest whole number) were against one company (TRANSUNION INTERMEDIATE HOLDINGS, INC.)
* In 2020, only one complaint was received for `Credit reporting, credit repair services, or other personal consumer reports`, and so the highest percentage received by one company would be 100%

For this challenge, we want, for each product and year in which complaints were received, the total number of complaints, the number of companies receiving a complaint, and the highest percentage of complaints directed at a single company.

For the purposes of this challenge, all names, including company and product, should be treated as case insensitive. For example, "Acme", "ACME", and "acme" would represent the same company.

## Expected output

After reading and processing the input file, your code should create an output file, `report.csv`, with as many lines as unique pairs of product and year (of `Date received`) in the input file.

Each line in the output file should list the following fields in the following order:
* product (name should be written in all lowercase)
* year
* total number of complaints received for that product and year
* total number of companies receiving at least one complaint for that product and year
* highest percentage (rounded to the nearest whole number) of total complaints filed against one company for that product and year.
  Use standard rounding conventions (i.e., any percentage between 0.5% and 1%, inclusive, should round to 1% and anything less than 0.5% should round to 0%)

The lines in the output file should be sorted by product (alphabetically) and year (ascending)

Given the above `complaints.csv` input file, we'd expect an output file, `report.csv`, in the following format
```
"credit reporting, credit repair services, or other personal consumer reports",2019,3,2,67
"credit reporting, credit repair services, or other personal consumer reports",2020,1,1,100
debt collection,2019,1,1,100
```
Notice that because `debt collection` was only listed for 2019 and not 2020, the output file only has a single entry for debt collection. Also, notice that when a product has a comma (`,`) in the name, the name should be enclosed by double quotation marks (`"`). Finally, notice that percentages are listed as numbers and do not have `%` in them.

## Instructions
We designed this coding challenge to assess your coding skills, your understanding of computer science fundamentals and your ability to program in a Linux environment. These are all prerequisites for becoming a data engineer. To solve this challenge you may pick a programming language of your choice (preferably Python, Scala, Java, or C/C++ because they are commonly used and will help us better assess you), but you are only allowed to use the default data structures that come with that programming language (you may use I/O libraries). For example, you can code in Python, but you should not use Pandas or any other external libraries (i.e., don't use Python modules that must be installed using 'pip').

The objective here is to see if you can implement the solution using basic data structure building blocks and software engineering best practices (by writing clean, modular, and well-tested code).
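
As a rough illustration of the rules listed under Expected output (case-insensitive names, one line per product and year, half-up rounding, sorted output), here is a minimal sketch that uses only the Python standard library. The function names `round_half_up`, `summarize` and `format_report` are illustrative and not part of the challenge, and the sketch assumes every `Date received` value begins with a four-digit year.

```python
import csv
import io
from collections import defaultdict


def round_half_up(numerator, denominator):
    """Return 100 * numerator / denominator rounded half up, in integer math.

    Python's built-in round() sends exact halves to the nearest even
    number, so it is deliberately not used here.
    """
    return (200 * numerator + denominator) // (2 * denominator)


def summarize(rows):
    """Return sorted (product, year, total, companies, top_percent) tuples.

    `rows` is an iterable of dicts keyed by the CSV header names,
    for example the rows produced by csv.DictReader.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        product = row["Product"].lower()
        year = int(row["Date received"][:4])  # assumption: YYYY-MM-DD dates
        company = row["Company"].lower()
        counts[(product, year)][company] += 1

    report = []
    # Tuple keys sort by product first, then by year.
    for (product, year), by_company in sorted(counts.items()):
        total = sum(by_company.values())
        top = max(by_company.values())
        report.append(
            (product, year, total, len(by_company), round_half_up(top, total))
        )
    return report


def format_report(report):
    """Render the report as CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(report)
    return buffer.getvalue()
```

Feeding `summarize` the rows of the sample `complaints.csv` through `csv.DictReader` and passing the result to `format_report` should reproduce the three lines of the sample `report.csv`. Nested dictionaries keep the sketch within the default data structures the challenge allows.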

## Tips on getting an interview
As a data engineer, it’s important that you write clean, well-documented code that scales for a large amount of data. For this reason, it’s important to ensure that your solution works well for a large number of records, rather than just the above example.

[Here](http://files.consumerfinance.gov/ccdb/complaints.csv.zip) you can find a zipped, modest-sized dataset to test your code (see [here](https://cfpb.github.io/api/ccdb/fields.html) for more information on the data dictionary).

Note that we will use this data to test the full functionality of your code, along with other test cases.

It's important to use software engineering best practices like unit tests, especially because data is not always clean and predictable.

Before submitting your solution you should summarize your approach and run instructions (if any) in your README.

You may write your solution in any mainstream programming language, such as C, C++, Go, Java, Python, Ruby, or Scala. Once completed, submit a link to your GitHub or Bitbucket repo with your source code.

In addition to the source code, the top-most directory of your repo must include the input and output directories, and a shell script named run.sh that compiles and runs the program(s) that implement(s) the required features.

See the figure below for the required structure of the top-most directory in your repo, or simply clone this repo.

## Repo directory structure
The top-level directory structure for your repo should look like the following: (So that we can grade your submission, replicate this directory structure at the top-most level of your project repository.
Do not place the structure in a subdirectory)

    ├── README.md
    ├── run.sh
    ├── src
    │   └── consumer_complaints.py
    ├── input
    │   └── complaints.csv
    ├── output
    │   └── report.csv
    └── insight_testsuite
        └── tests
            ├── test_1
            │   ├── input
            │   │   └── complaints.csv
            │   └── output
            │       └── report.csv
            └── your-own-test_1
                ├── input
                │   └── complaints.csv
                └── output
                    └── report.csv

**Don't fork this repo** and don't use this `README` instead of your own. The content of `src` does not need to be a single file called `consumer_complaints.py`, which is only an example. Instead, you should include your own source files and give them expressive names.

## Testing your code
As an engineer, you'll want to make sure you are thoroughly testing your code. Use the `insight_testsuite` directory to showcase the tests you conducted on your code. Under that directory, create a separate folder for each test. Each test directory should also have a separate `input` subdirectory containing the `complaints.csv` input file you want to test, and an `output` subdirectory containing the expected `report.csv` output for that test.

We've included one test (`test_1`), which contains the sample input and output files detailed in this Readme. To test your code, you can manually move each input test file into the top-level input directory, then run your program and compare the output with the expected output. Or you can write a script to do this automatically, but note we are not requiring you to write a test script.

We do ask that you test your code using the web page mentioned earlier to ensure your code can run in the Linux environment in which we will review your code. The test page will check to see if your code passes `test_1`. If there are errors or if the results don't match what is expected, you should debug your code's behavior by yourself.
If you receive system errors that you do not believe are due to your code, you can email cc@insightdataengineering.com for help.

If your code must be compiled to run (e.g., javac, make), that compilation (as well as the execution) of your code must be specified in the `run.sh` script of your code repository.

Python programmers can use Python 2 or Python 3. If you use the former, specify `python` in your `run.sh` script; if you use the latter, specify `python3`, which defaults to Python 3.5.2. Other options that could be used are `python3.7` or `python3.8`.

## Questions?
Email us at cc@insightdataengineering.com

--------------------------------------------------------------------------------
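
The manual comparison described under Testing your code could be automated along the following lines. This is only a sketch: the helper name `find_mismatches` is hypothetical, and it assumes a test counts as passing when the produced and expected reports match line for line once trailing whitespace is ignored.

```python
from itertools import zip_longest


def find_mismatches(actual_text, expected_text):
    """Return (line_number, actual, expected) for every differing line.

    A line missing from one side is reported as None on that side.
    """
    actual = [line.rstrip() for line in actual_text.splitlines()]
    expected = [line.rstrip() for line in expected_text.splitlines()]
    return [
        (number, got, want)
        for number, (got, want) in enumerate(zip_longest(actual, expected), start=1)
        if got != want
    ]
```

A small driver could read the `report.csv` your program wrote and the expected `report.csv` of a test, pass both contents to the helper, and treat an empty result as a pass.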