├── README.md
├── api
│   ├── Dockerfile
│   ├── Gemfile
│   ├── Gemfile.lock
│   ├── api.rb
│   ├── config.ru
│   ├── data_processor
│   │   └── file_grabber.rb
│   ├── deploy
│   │   └── start.sh
│   └── resource_presenter.rb
└── docker-compose.yml

/README.md:
--------------------------------------------------------------------------------
# API-in-a-Box

API-in-a-Box is exactly what it sounds like. Say you have a handful of CSV files that you need a searchable API for. Put those files in a GitHub repository, spin up this API-in-a-Box, and there you go! A REST hypermedia API backed by Elasticsearch's powerful full-text search.

## Setup

Before you get started, you must have Docker and Docker Compose installed:

https://www.docker.com
https://docs.docker.com/compose/install/

Make sure your Docker daemon is running by running `docker info`. You should see a long list of output, not an error about not being able to connect to the Docker daemon.

Next, clone this repo.

Then set the environment variable `ORIGIN_REPO` in the `docker-compose.yml` file to point to the GitHub repo (format: [username]/[repo]) you would like to pull data from. You may use the current repo "switzersc/atlanta-food-data" as an example.

Then, from the root of this directory, build the containers:

    docker-compose build

And start them up:

    docker-compose up

If you would like to run them in the background, add `-d` to the `up` command.

Now you can curl to get your data:

    curl http://localhost:4567/resources

If you're running on boot2docker, you will need to replace `localhost` with your boot2docker IP.

You may need to specify a size to get data if you just see the metadata from the `/resources` request.
Try:

    curl "http://localhost:4567/resources?size=50"


## Using this API

You have three endpoints to work with:

* `GET /resources`: Lists all resources, paginated by 50 by default. You can change the size of each response by passing a `size` query param, and you can offset to the next "page" by passing a `from` query param to identify which result you'd like to start at.
* `GET /resources/:id`: Returns a specific resource.
* `GET /resources/search`: Searches all resources. You can use `size` and `from` in the same way as above, and you also have all the queries given in the `queries` object of the `/resources` response. These queries match the field names (mappings) of each document. For example, if your CSV has a column called "STREET", each document now has a field named "STREET", and you can pass `STREET` as a query param in a request to this endpoint; you will get results whose STREET value includes any word in the value of this query. If you want results matching the whole phrase rather than any word in the phrase, pass `match_phrase=true`. You can also search by `FILE_SOURCE`, which is added to every document and contains the name of the file the document was originally a row in.

All responses are returned in the [Collection+JSON format](https://github.com/collection-json/spec), with a MIME type of `application/vnd.collection+json`.


## How it works

When you run `docker-compose up`, the Elasticsearch container starts, and then the API container starts with the `api/deploy/start.sh` script. This script sleeps for 8 seconds (to give the Elasticsearch container time to start), then runs the `api/data_processor/file_grabber.rb` Ruby script, which makes a request to GitHub's API to download the raw files of whatever is in the repository you specified with the environment variable `ORIGIN_REPO`.
It also creates an Elasticsearch index named `api`, then adds the rows from the downloaded files as documents to this index and gives them the type `resource`.

Then the start script spins up a Sinatra server that provides the three endpoints described above and returns responses formatted according to the [Collection+JSON spec](https://github.com/collection-json/spec).

NB: Currently, if one of the source files is formatted incorrectly, parsing it will probably raise an exception and the FileGrabber class will skip that file. Watch the Docker logs for the API container to see if any files were skipped.


## To Do

These are ideas for further development:

* add validations and specs
* add more cool Elasticsearch features
* add examples of requests and responses to the README
* add the ability to use any remote source, not just a GitHub repo
* allow for private GitHub repos
* add the ability to use the file structure in a repo to define different types of documents for different types of resources

--------------------------------------------------------------------------------
/api/Dockerfile:
--------------------------------------------------------------------------------
FROM ruby:2.2.5

MAINTAINER Shelby Switzer

# Run updates
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev

RUN mkdir /api
WORKDIR /api

ADD /Gemfile /api/Gemfile
ADD /Gemfile.lock /api/Gemfile.lock
RUN bundle install

EXPOSE 4567

COPY ./data_processor/file_grabber.rb /tmp/file_grabber.rb


CMD bash ./deploy/start.sh

--------------------------------------------------------------------------------
/api/Gemfile:
--------------------------------------------------------------------------------
source "http://rubygems.org"

gem 'sinatra'
gem 'tux'
gem 'pry'
gem 'rest-client'
gem 'httparty'
gem 'faraday', '0.8.9' # requirement for stretcher
gem 'stretcher'
gem 'activesupport', '=4.0.1' # remote_table currently does not support 4.2.0
gem 'remote_table'
gem 'collection-json'
gem 'rack-cors'
gem 'rerun'

--------------------------------------------------------------------------------
/api/Gemfile.lock:
--------------------------------------------------------------------------------
GEM
  remote: http://rubygems.org/
  specs:
    activesupport (4.0.1)
      i18n (~> 0.6, >= 0.6.4)
      minitest (~> 4.2)
      multi_json (~> 1.3)
      thread_safe (~> 0.1)
      tzinfo (~> 0.3.37)
    bond (0.5.1)
    coderay (1.1.0)
    collection-json (0.1.7)
    excon (0.45.3)
    faraday (0.8.9)
      multipart-post (~> 1.2.0)
    faraday_middleware (0.9.2)
      faraday (>= 0.7.4, < 0.10)
    faraday_middleware-multi_json (0.0.6)
      faraday_middleware
      multi_json
    fastercsv (1.5.5)
    ffi (1.9.18)
    fixed_width-multibyte (0.2.3)
      activesupport
    hash_digest (1.1.3)
      murmurhash3
    hashie (3.4.2)
    httparty (0.13.5)
      json (~> 1.8)
      multi_xml (>= 0.5.2)
    i18n (0.7.0)
    json (1.8.3)
    listen (3.1.5)
      rb-fsevent (~> 0.9, >= 0.9.4)
      rb-inotify (~> 0.9, >= 0.9.7)
      ruby_dep (~> 1.2)
    method_source (0.8.2)
    mime-types (2.4.3)
    mini_portile (0.6.2)
    minitest (4.7.5)
    multi_json (1.11.2)
    multi_xml (0.5.5)
    multipart-post (1.2.0)
    murmurhash3 (0.1.6)
    netrc (0.10.2)
    nokogiri (1.6.6.2)
      mini_portile (~> 0.6.0)
    pry (0.10.1)
      coderay (~> 1.1.0)
      method_source (~> 0.8.1)
      slop (~> 3.4)
    rack (1.6.0)
    rack-cors (0.4.0)
    rack-protection (1.5.3)
      rack
    rack-test (0.6.3)
      rack (>= 1.0)
    rb-fsevent (0.10.2)
    rb-inotify (0.9.10)
      ffi (>= 0.5.0, < 2)
    remote_table (3.3.0)
      activesupport (>= 2.3.4)
      fastercsv (>= 1.5.0)
      fixed_width-multibyte (>= 0.2.3)
      hash_digest (>= 1.1.3)
      i18n
      roo (>= 1.11)
      unix_utils (>= 0.0.8)
    rerun (0.11.0)
      listen (~> 3.0)
    rest-client (1.7.2)
      mime-types (>= 1.16, < 3.0)
      netrc (~> 0.7)
    ripl (0.7.1)
      bond (~> 0.5.1)
    ripl-multi_line (0.3.1)
      ripl (>= 0.3.6)
    ripl-rack (0.2.1)
      rack (>= 1.0)
      rack-test (~> 0.6.2)
      ripl (>= 0.7.0)
    roo (2.0.1)
      nokogiri (~> 1)
      rubyzip (~> 1.1, < 2.0.0)
    ruby_dep (1.5.0)
    rubyzip (1.1.7)
    sinatra (1.4.5)
      rack (~> 1.4)
      rack-protection (~> 1.4)
      tilt (~> 1.3, >= 1.3.4)
    slop (3.6.0)
    stretcher (1.21.1)
      excon (>= 0.16)
      faraday (~> 0.8)
      faraday_middleware (~> 0.9.0)
      faraday_middleware-multi_json (~> 0.0.5)
      hashie (>= 1.2.0)
      multi_json (>= 1.0)
    thread_safe (0.3.5)
    tilt (1.4.1)
    tux (0.3.0)
      ripl (>= 0.3.5)
      ripl-multi_line (>= 0.2.4)
      ripl-rack (>= 0.2.0)
      sinatra (>= 1.2.1)
    tzinfo (0.3.44)
    unix_utils (0.0.15)

PLATFORMS
  ruby

DEPENDENCIES
  activesupport (= 4.0.1)
  collection-json
  faraday (= 0.8.9)
  httparty
  pry
  rack-cors
  remote_table
  rerun
  rest-client
  sinatra
  stretcher
  tux

BUNDLED WITH
   1.13.6

--------------------------------------------------------------------------------
/api/api.rb:
--------------------------------------------------------------------------------
gem 'faraday', '0.8.9' # requirement for stretcher
require "sinatra/base"
require 'stretcher'
require './resource_presenter'

class ApiInABox < Sinatra::Base

  get "/resources" do
    size = params["size"] || "50"
    from = params["from"].to_i
    search = server.index(:api).search(size: size, from: from)
    docs = search.documents
    mappings = server.index(:api).get_mapping["api"]["mappings"]["resource"]["properties"]
    resources = ResourcePresenter.new({resources: docs,
      queries: mappings}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  get "/resources/search" do
    if params["match_phrase"] == "true"
      params.delete("match_phrase")
      size = params.delete("size") || 50
      from = params.delete("from")
      search = server.index(:api).search(size: size, from: from, query: {match_phrase: params})
    elsif params.present?
      present_params = params.select { |k, v| v.present? }
      search = server.index(:api).search(query: {match: present_params})
    else
      search = server.index(:api).search(size: 50)
    end

    docs = search.documents
    mappings = server.index(:api).get_mapping["api"]["mappings"]["resource"]["properties"]
    resources = ResourcePresenter.new({resources: docs, queries: mappings}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  get "/resources/:id" do
    doc = server.index(:api).type(:resource).get(params[:id])
    resources = ResourcePresenter.new({resources: [doc]}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  def host
    ENV["ELASTIC_HOST"] || "http://localhost:9200"
  end

  def server
    @server ||= Stretcher::Server.new(host)
  end
end

--------------------------------------------------------------------------------
/api/config.ru:
--------------------------------------------------------------------------------
require 'rubygems'
require 'bundler'

Bundler.require

require './api'

use Rack::Cors do
  allow do
    origins '*'
    resource '*', :headers => :any, :methods => :get
  end
end

run ApiInABox

--------------------------------------------------------------------------------
/api/data_processor/file_grabber.rb:
--------------------------------------------------------------------------------
gem 'faraday', '0.8.9'
gem 'activesupport', '=4.0.1'
require 'active_support'
require 'httparty'
require 'stretcher' # Elasticsearch client
require 'pry'
require 'remote_table'

class FileGrabber

  def initialize(options = {})
    @repo = options[:repo] || ENV["ORIGIN_REPO"] || "switzersc/atlanta-food-data"
    @index = options[:index] || :api
    @host = ENV["ELASTIC_HOST"] || "http://localhost:9200"
    @server = Stretcher::Server.new(@host)
  end

  def run
    create_elastic_index
    p "Getting contents from repo"
    @contents = get_contents
    p "Got contents"
    @contents.each do |file|
      process_file(file)
    end
  end

  private

  def create_elastic_index
    unless (index = @server.index(@index)).exists?
      index.create
    end
  end

  # TODO validate URI and GitHub repo
  def get_contents
    base = "https://api.github.com/repos/"
    url = base + @repo + "/contents"
    HTTParty.get(url)
  end

  # TODO add default mappings to ES so everything is a string
  # TODO don't add these rows if they already exist in database
  # TODO fix invalid csv files
  def process_file(file)
    # create remote_table object for file contents
    name = file["name"]
    p "Processing file #{name}"
    table = build_table(file)

    docs = table.rows.map do |row|
      row.merge({"FILE_SOURCE" => name, "_type" => "resource"})
    end
    # add json docs in bulk to elasticsearch
    @server.index(@index).bulk_index(docs)
  rescue => e
    p "Encountered error parsing file #{file}: #{e}"
    p "Skipping file #{file}"
  end

  # TODO add validations, exception handling
  def build_table(file)
    url = file["download_url"]
    name = file["name"]
    RemoteTable.new url, file_name: name
  end
end

FileGrabber.new.run
--------------------------------------------------------------------------------
/api/deploy/start.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

echo "sleeping because compose doesn't wait to make sure elastic is ready"
sleep 8

echo "running the file grabber"
ruby /tmp/file_grabber.rb

echo "starting the API server with live reload"
rerun "bundle exec rackup -p 4567 -o 0.0.0.0"

--------------------------------------------------------------------------------
/api/resource_presenter.rb:
--------------------------------------------------------------------------------
require 'collection-json'
class ResourcePresenter

  def initialize(options = {})
    @resources = options[:resources] # required
    @queries = options[:queries] # optional
  end

  def as_json
    CollectionJSON.generate_for('/resources/') do |builder|
      @resources.each do |resource|
        builder.add_item("/resources/#{resource._id}") do |item|
          resource.each do |k, v|
            item.add_data k, value: v
          end
        end
      end

      if @queries
        builder.add_query("/resources/search", "search", prompt: "Search") do |query|
          # queries for each column name from CSVs
          @queries.keys.each do |query_name|
            query.add_data query_name
          end
          # other useful queries
          query.add_data "match_phrase" # pass true to match the exact phrase given in another param
          query.add_data "size" # how many results you want back at once; 50 is the default
          query.add_data "from" # where you would like to begin in the results array.
          # default is 0 (the first result), so by default you get results 1-50
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
elastic:
  image: elasticsearch
  ports:
    - "9200:9200"
    - "9300:9300"
  volumes:
    - elasticsearch_data:/usr/share/elasticsearch/data
api:
  build: ./api
  command: bash ./deploy/start.sh
  volumes:
    - ./api:/api
  ports:
    - "4567:4567"
  links:
    - elastic
  environment:
    - ORIGIN_REPO=switzersc/atlanta-food-data
    - ELASTIC_HOST=http://elastic:9200

--------------------------------------------------------------------------------
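The row-to-document step in `FileGrabber#process_file` can be sketched standalone. This is a minimal illustration, not part of the app: the CSV contents and the file name `sample.csv` are made-up stand-ins for a file fetched from `ORIGIN_REPO`, and the Elasticsearch `bulk_index` call is left out.

```ruby
require "csv"

# Stand-in for a raw CSV file downloaded from the GitHub contents API
# (hypothetical data for illustration only).
csv_data = "NAME,STREET\n" \
           "Ponce City Market,675 Ponce De Leon Ave NE\n" \
           "Municipal Market,209 Edgewood Ave SE\n"
file_name = "sample.csv" # hypothetical FILE_SOURCE value

# Mirrors FileGrabber#process_file: one document per CSV row, tagged with
# the originating file name and the Elasticsearch document type.
docs = CSV.parse(csv_data, headers: true).map do |row|
  row.to_h.merge("FILE_SOURCE" => file_name, "_type" => "resource")
end

docs.each { |doc| puts doc.inspect }
```

Because every CSV column becomes a field on the indexed document, searching by `STREET` or `FILE_SOURCE` on `/resources/search` works exactly as the `queries` object in the `/resources` response advertises.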