├── README.md
├── TEST.md
├── assets
│   └── vendas-combustiveis-m3.xls
└── images
    ├── pivot.png
    └── raizen.png


/README.md:
--------------------------------------------------------------------------------
Data Engineering Test
=====================

This test is part of the hiring process for Data Engineers @ Raízen Analytics.

Our team is responsible for all advanced analytics, machine learning and data science initiatives in every business at Raízen.

We always have room for talented and skilled data engineering professionals. If you are interested in joining our team, complete this test and send the repository's link to `rudison dot lacerda at raizen dot com` or apply for an opening in our [portal](https://vagas.raizen.com.br).

**Access the [ANP Fuel Sales ETL Test](https://github.com/raizen-analytics/data-engineering-test/blob/master/TEST.md).**

## About Raízen

![Raízen Logo](./images/raizen.png)

**Raízen is an integrated energy company.**

We are among the largest private business groups in Brazil, and our team is our greatest differentiator: around 29k employees and 15k business partners throughout the country.

Originating in Brazil, we also operate in Argentina, and we have the ambition to win the world. We propel the people and the country with energy solutions. Energy is our business, and it is in our name.

We integrate all the stages of the production chain: from the cultivation of sugarcane to the production and sale of sugar and ethanol, the generation of bio-energy, and fuel distribution through Shell's brand license.

[Find out more about Raízen](https://raizen.com.br/en)

[Who we are](https://raizen.com.br/en/about-raizen/who-we-are)

--------------------------------------------------------------------------------
/TEST.md:
--------------------------------------------------------------------------------
ANP Fuel Sales ETL Test
=======================

This test consists of developing an ETL pipeline to extract internal pivot caches from consolidated reports [made available](http://www.anp.gov.br/dados-estatisticos) by the Brazilian government's regulatory agency for oil/fuels, *ANP (Agência Nacional do Petróleo, Gás Natural e Biocombustíveis)*.

**The raw file can be found [here](https://github.com/raizen-analytics/data-engineering-test/raw/master/assets/vendas-combustiveis-m3.xls).**

## Goal

This `xls` file has some pivot tables like this one:

![Pivot Table](./images/pivot.png)

The developed pipeline is meant to extract and structure the underlying data of two of these tables:
- Sales of oil derivative fuels by UF and product
- Sales of diesel by UF and type

The totals of the extracted data must be equal to the totals of the pivot tables.

## Schema

Data should be stored in the following format:

| Column       | Type        |
| ------------ | ----------- |
| `year_month` | `date`      |
| `uf`         | `string`    |
| `product`    | `string`    |
| `unit`       | `string`    |
| `volume`     | `double`    |
| `created_at` | `timestamp` |

Remember to define a convenient partitioning or indexing schema.

## Closing Remarks

Use the tools and technologies of your choice - preferably in Python - and store the code in GitHub. Seize this opportunity to demonstrate your skills in some data pipeline orchestration framework and containerization technology!
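
As a rough illustration of the schema and the totals requirement above, here is a minimal Python sketch using only the standard library. It assumes the pivot cache has already been extracted into one record per UF, product and year, with one column per month and a yearly total. The column names (`jan` … `dec`, `year`, `total`), the helper names `to_long_records` and `check_totals`, and the sample values are all hypothetical; the real file may be laid out differently.

```python
from datetime import date, datetime, timezone
import math

# Hypothetical month column names; the real pivot cache may label them differently.
MONTH_COLUMNS = ["jan", "feb", "mar", "apr", "may", "jun",
                 "jul", "aug", "sep", "oct", "nov", "dec"]


def to_long_records(wide_rows, created_at=None):
    """Turn one-row-per-year records into one-row-per-month records in the target schema."""
    created_at = created_at or datetime.now(timezone.utc)
    records = []
    for row in wide_rows:
        for month_number, column in enumerate(MONTH_COLUMNS, start=1):
            volume = row.get(column)
            if volume is None:  # month not reported: skip rather than invent a value
                continue
            records.append({
                "year_month": date(row["year"], month_number, 1),
                "uf": row["uf"],
                "product": row["product"],
                "unit": row["unit"],
                "volume": float(volume),
                "created_at": created_at,
            })
    return records


def check_totals(wide_rows, records, rel_tol=1e-9):
    """Return (key, reported_total, extracted_total) for every total that does not match.

    Assumes each (uf, product, year) combination appears at most once in wide_rows.
    """
    extracted = {}
    for rec in records:
        key = (rec["uf"], rec["product"], rec["year_month"].year)
        extracted[key] = extracted.get(key, 0.0) + rec["volume"]
    mismatches = []
    for row in wide_rows:
        key = (row["uf"], row["product"], row["year"])
        got = extracted.get(key, 0.0)
        if not math.isclose(got, row["total"], rel_tol=rel_tol, abs_tol=1e-6):
            mismatches.append((key, row["total"], got))
    return mismatches


# Illustrative values only, not taken from the ANP file.
sample = [{"uf": "SP", "product": "GASOLINA C", "unit": "m3", "year": 2020,
           "jan": 10.0, "feb": 12.5, "total": 22.5}]
records = to_long_records(sample)
mismatches = check_totals(sample, records)
```

An empty `mismatches` list means the extracted volumes add up to the reported totals. A pipeline step could fail the run whenever the list is non-empty. Storing `year_month` as the first day of each month also makes it a natural candidate for a partition or index key, though that choice is left to the candidate.
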
Also, remember to add steps to your pipeline to check whether the extracted data is consistent with the consolidated values in the raw tables.

**Good luck!**

--------------------------------------------------------------------------------
/assets/vendas-combustiveis-m3.xls:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raizen-analytics/data-engineering-test/026f1fcc512cee064ea7ecba082f7799cc1b9b04/assets/vendas-combustiveis-m3.xls

--------------------------------------------------------------------------------
/images/pivot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raizen-analytics/data-engineering-test/026f1fcc512cee064ea7ecba082f7799cc1b9b04/images/pivot.png

--------------------------------------------------------------------------------
/images/raizen.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/raizen-analytics/data-engineering-test/026f1fcc512cee064ea7ecba082f7799cc1b9b04/images/raizen.png

--------------------------------------------------------------------------------