├── README.md
├── api
│   ├── Dockerfile
│   ├── Gemfile
│   ├── Gemfile.lock
│   ├── api.rb
│   ├── config.ru
│   ├── data_processor
│   │   └── file_grabber.rb
│   ├── deploy
│   │   └── start.sh
│   └── resource_presenter.rb
└── docker-compose.yml

/README.md:
--------------------------------------------------------------------------------
# API-in-a-Box

API-in-a-Box is exactly what it sounds like. Say you have a handful of CSV files that you need a searchable API for. Put those files in a GitHub repository, spin up this API-in-a-Box, and there you go! A REST hypermedia API backed by Elasticsearch's powerful full-text search.

## Setup

Before you get started, you must have Docker and Docker Compose installed:

https://www.docker.com
https://docs.docker.com/compose/install/

Make sure your Docker daemon is running by running `docker info`. You should see a long list of output, not an error about not being able to connect to the Docker daemon.

Next, clone this repo.

Then set the environment variable `ORIGIN_REPO` in the `docker-compose.yml` file to point to the GitHub repo (format: [username]/[repo]) you would like to pull data from. You may use the current repo "switzersc/atlanta-food-data" as an example.

Then, from the root of this directory, build the containers:

    docker-compose build

And start them up:

    docker-compose up

If you would like to run them in the background, add `-d` to the `up` command.

Now you can curl to get your data:

    curl http://localhost:4567/resources

If you're running on boot2docker, you will need to replace `localhost` with your boot2docker IP.

You may need to specify a size to get data if you just see the metadata from the `/resources` request.
Try:

    curl "http://localhost:4567/resources?size=50"


## Using this API

You have three endpoints to work with:

* `GET /resources`: Lists all resources, paginated by 50 by default. You can change the size of each response by passing a `size` query param, and you can offset to the next "page" by passing a `from` query param to identify which result you'd like to start at.
* `GET /resources/:id`: Returns a specific resource.
* `GET /resources/search`: Searches all resources. You can use `size` and `from` in the same way as above, and you also have all the queries given in the `queries` object of the `/resources` response. These queries match the field names (mappings) of each document. For example, if your CSV has a column called "STREET", each document now has a field named "STREET", and you can pass `STREET` as a query param in a request to this endpoint; you will get results whose STREET value includes any word in the value of this query. If you want results matching the whole phrase rather than any word in the phrase, pass `match_phrase=true`. You can also search by `FILE_SOURCE`, which is added to every document and contains the name of the file the document was originally a row in.

All responses are returned in the [Collection+JSON format](https://github.com/collection-json/spec), with a MIME type of `application/vnd.collection+json`.


## How it works

When you run `docker-compose up`, the Elasticsearch container starts, and then the API container starts with the `api/deploy/start.sh` script. This script sleeps for 8 seconds (to give the Elasticsearch container time to start), then runs the `api/data_processor/file_grabber.rb` Ruby script, which makes a request to GitHub's API to download the raw files of whatever is in the repository you specified with the environment variable `ORIGIN_REPO`.
It also creates an Elasticsearch index named `api`, then adds the rows from the downloaded files as documents to this index and gives them the type `resource`.

Then the start script spins up a Sinatra server that provides the three endpoints described above and returns responses formatted according to the [Collection+JSON spec](https://github.com/collection-json/spec).

NB: Currently, if one of the source files is formatted incorrectly, parsing it will probably raise an exception and the FileGrabber class will skip that file. Watch the Docker logs for the API container to see if any files were skipped.


## To Do

These are ideas for further development:

* add validations and specs
* add more cool Elasticsearch features
* add examples of requests and responses to the README
* add the ability to use any remote source, not just a GitHub repo
* allow for private GitHub repos
* add the ability to use the file structure in a repo to define different types of documents for different types of resources

--------------------------------------------------------------------------------
/api/Dockerfile:
--------------------------------------------------------------------------------
FROM ruby:2.2.5

MAINTAINER Shelby Switzer

# Run updates
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev

RUN mkdir /api
WORKDIR /api

ADD /Gemfile /api/Gemfile
ADD /Gemfile.lock /api/Gemfile.lock
RUN bundle install

EXPOSE 4567

COPY ./data_processor/file_grabber.rb /tmp/file_grabber.rb


CMD bash ./deploy/start.sh

--------------------------------------------------------------------------------
/api/Gemfile:
--------------------------------------------------------------------------------
source "http://rubygems.org"

gem 'sinatra'
gem 'tux'
gem 'pry'
gem 'rest-client'
gem 'httparty'
gem 'faraday', '0.8.9' # requirement for stretcher
gem 'stretcher'
gem 'activesupport', '=4.0.1' # remote_table currently does not support 4.2.0
gem 'remote_table'
gem 'collection-json'
gem 'rack-cors'
gem 'rerun'

--------------------------------------------------------------------------------
/api/Gemfile.lock:
--------------------------------------------------------------------------------
GEM
  remote: http://rubygems.org/
  specs:
    activesupport (4.0.1)
      i18n (~> 0.6, >= 0.6.4)
      minitest (~> 4.2)
      multi_json (~> 1.3)
      thread_safe (~> 0.1)
      tzinfo (~> 0.3.37)
    bond (0.5.1)
    coderay (1.1.0)
    collection-json (0.1.7)
    excon (0.45.3)
    faraday (0.8.9)
      multipart-post (~> 1.2.0)
    faraday_middleware (0.9.2)
      faraday (>= 0.7.4, < 0.10)
    faraday_middleware-multi_json (0.0.6)
      faraday_middleware
      multi_json
    fastercsv (1.5.5)
    ffi (1.9.18)
    fixed_width-multibyte (0.2.3)
      activesupport
    hash_digest (1.1.3)
      murmurhash3
    hashie (3.4.2)
    httparty (0.13.5)
      json (~> 1.8)
      multi_xml (>= 0.5.2)
    i18n (0.7.0)
    json (1.8.3)
    listen (3.1.5)
      rb-fsevent (~> 0.9, >= 0.9.4)
      rb-inotify (~> 0.9, >= 0.9.7)
      ruby_dep (~> 1.2)
    method_source (0.8.2)
    mime-types (2.4.3)
    mini_portile (0.6.2)
    minitest (4.7.5)
    multi_json (1.11.2)
    multi_xml (0.5.5)
    multipart-post (1.2.0)
    murmurhash3 (0.1.6)
    netrc (0.10.2)
    nokogiri (1.6.6.2)
      mini_portile (~> 0.6.0)
    pry (0.10.1)
      coderay (~> 1.1.0)
      method_source (~> 0.8.1)
      slop (~> 3.4)
    rack (1.6.0)
    rack-cors (0.4.0)
    rack-protection (1.5.3)
      rack
    rack-test (0.6.3)
      rack (>= 1.0)
    rb-fsevent (0.10.2)
    rb-inotify (0.9.10)
      ffi (>= 0.5.0, < 2)
    remote_table (3.3.0)
      activesupport (>= 2.3.4)
      fastercsv (>= 1.5.0)
      fixed_width-multibyte (>= 0.2.3)
      hash_digest (>= 1.1.3)
      i18n
      roo (>= 1.11)
      unix_utils (>= 0.0.8)
    rerun (0.11.0)
      listen (~> 3.0)
    rest-client (1.7.2)
      mime-types (>= 1.16, < 3.0)
      netrc (~> 0.7)
    ripl (0.7.1)
      bond (~> 0.5.1)
    ripl-multi_line (0.3.1)
      ripl (>= 0.3.6)
    ripl-rack (0.2.1)
      rack (>= 1.0)
      rack-test (~> 0.6.2)
      ripl (>= 0.7.0)
    roo (2.0.1)
      nokogiri (~> 1)
      rubyzip (~> 1.1, < 2.0.0)
    ruby_dep (1.5.0)
    rubyzip (1.1.7)
    sinatra (1.4.5)
      rack (~> 1.4)
      rack-protection (~> 1.4)
      tilt (~> 1.3, >= 1.3.4)
    slop (3.6.0)
    stretcher (1.21.1)
      excon (>= 0.16)
      faraday (~> 0.8)
      faraday_middleware (~> 0.9.0)
      faraday_middleware-multi_json (~> 0.0.5)
      hashie (>= 1.2.0)
      multi_json (>= 1.0)
    thread_safe (0.3.5)
    tilt (1.4.1)
    tux (0.3.0)
      ripl (>= 0.3.5)
      ripl-multi_line (>= 0.2.4)
      ripl-rack (>= 0.2.0)
      sinatra (>= 1.2.1)
    tzinfo (0.3.44)
    unix_utils (0.0.15)

PLATFORMS
  ruby

DEPENDENCIES
  activesupport (= 4.0.1)
  collection-json
  faraday (= 0.8.9)
  httparty
  pry
  rack-cors
  remote_table
  rerun
  rest-client
  sinatra
  stretcher
  tux

BUNDLED WITH
   1.13.6

--------------------------------------------------------------------------------
/api/api.rb:
--------------------------------------------------------------------------------
gem 'faraday', '0.8.9' # requirement for stretcher
require "sinatra/base"
require 'stretcher'
require './resource_presenter'

class ApiInABox < Sinatra::Base

  get "/resources" do
    size = params["size"] || "50"
    from = params["from"].to_i
    search = server.index(:api).search(size: size, from: from)
    docs = search.documents
    mappings = server.index(:api).get_mapping["api"]["mappings"]["resource"]["properties"]
    resources = ResourcePresenter.new({resources: docs,
      queries: mappings}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  get "/resources/search" do
    if params["match_phrase"] == "true"
      params.delete("match_phrase")
      size = params.delete("size") || 50
      from = params.delete("from")
      search = server.index(:api).search(size: size, from: from, query: {match_phrase: params})
    elsif params.present?
      present_params = params.select { |k, v| v.present? }
      search = server.index(:api).search(query: {match: present_params})
    else
      search = server.index(:api).search(size: 50)
    end

    docs = search.documents
    mappings = server.index(:api).get_mapping["api"]["mappings"]["resource"]["properties"]
    resources = ResourcePresenter.new({resources: docs, queries: mappings}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  get "/resources/:id" do
    doc = server.index(:api).type(:resource).get(params[:id])
    resources = ResourcePresenter.new({resources: [doc]}).as_json
    content_type 'application/vnd.collection+json'
    resources.to_json
  end

  def host
    ENV["ELASTIC_HOST"] || "http://localhost:9200"
  end

  def server
    @server ||= Stretcher::Server.new(host)
  end
end

--------------------------------------------------------------------------------
/api/config.ru:
--------------------------------------------------------------------------------
require 'rubygems'
require 'bundler'

Bundler.require

require './api'

use Rack::Cors do
  allow do
    origins '*'
    resource '*', :headers => :any, :methods => :get
  end
end

run ApiInABox

--------------------------------------------------------------------------------
/api/data_processor/file_grabber.rb:
--------------------------------------------------------------------------------
gem 'faraday', '0.8.9'
gem 'activesupport', '=4.0.1'
require 'active_support'
require 'httparty'
require 'stretcher' # Elasticsearch client
require 'pry'
require 'remote_table'

class FileGrabber

  def initialize(options = {})
    @repo = options[:repo] || ENV["ORIGIN_REPO"] || "switzersc/atlanta-food-data"
    @index = options[:index] || :api
    @host = ENV["ELASTIC_HOST"] || "http://localhost:9200"
    @server = Stretcher::Server.new(@host)
  end

  def run
    create_elastic_index
    p "Getting contents from repo"
    @contents = get_contents
    p "Got contents"
    @contents.each do |file|
      process_file(file)
    end
  end

  private

  def create_elastic_index
    unless (index = @server.index(@index)).exists?
      index.create
    end
  end

  # TODO validate URI and GitHub repo
  def get_contents
    base = "https://api.github.com/repos/"
    url = base + @repo + "/contents"
    HTTParty.get(url)
  end

  # TODO add default mappings to ES so everything is a string
  # TODO don't add these rows if they already exist in database
  # TODO fix invalid csv files
  def process_file(file)
    # create remote_table object for file contents
    name = file["name"]
    p "Processing file #{name}"
    table = build_table(file)

    docs = table.rows.map do |row|
      row.merge({"FILE_SOURCE" => name, "_type" => "resource"})
    end
    # add json docs in bulk to elasticsearch
    @server.index(@index).bulk_index(docs)
  rescue => e
    p "Encountered error parsing file #{file}: #{e}"
    p "Skipping file #{file}"
  end

  # TODO add validations, exception handling
  def build_table(file)
    url = file["download_url"]
    name = file["name"]
    RemoteTable.new url, file_name: name
  end
end

FileGrabber.new.run
--------------------------------------------------------------------------------
/api/deploy/start.sh:
--------------------------------------------------------------------------------
#!/usr/bin/env bash

echo "sleeping because compose doesn't wait to make sure elastic is ready"
sleep 8

echo "running the file grabber"
ruby /tmp/file_grabber.rb

echo "starting the API server with live reload"
rerun "bundle exec rackup -p 4567 -o 0.0.0.0"

--------------------------------------------------------------------------------
/api/resource_presenter.rb:
--------------------------------------------------------------------------------
require 'collection-json'
class ResourcePresenter

  def initialize(options = {})
    @resources = options[:resources] # required
    @queries = options[:queries] # optional
  end

  def as_json
    CollectionJSON.generate_for('/resources/') do |builder|
      @resources.each do |resource|
        builder.add_item("/resources/#{resource._id}") do |item|
          resource.each do |k, v|
            item.add_data k, value: v
          end
        end
      end

      if @queries
        builder.add_query("/resources/search", "search", prompt: "Search") do |query|
          # queries for each column name from CSVs
          @queries.keys.each do |query_name|
            query.add_data query_name
          end
          # other useful queries
          query.add_data "match_phrase" # pass true to match the exact phrase given in another param
          query.add_data "size" # how many results you want back at once; 50 is the default
          query.add_data "from" # where you would like to begin in the results array.
          # default is 0 (the first result), so by default you get results 1-50
        end
      end
    end
  end
end

--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
elastic:
  image: elasticsearch
  ports:
    - "9200:9200"
    - "9300:9300"
  volumes:
    - elasticsearch_data:/usr/share/elasticsearch/data
api:
  build: ./api
  command: bash ./deploy/start.sh
  volumes:
    - ./api:/api
  ports:
    - "4567:4567"
  links:
    - elastic
  environment:
    - ORIGIN_REPO=switzersc/atlanta-food-data
    - ELASTIC_HOST=http://elastic:9200

--------------------------------------------------------------------------------
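The row-to-document step in `FileGrabber#process_file` can be sketched standalone. This is a minimal illustration, not part of the app: the CSV contents and the file name `sample.csv` are made-up stand-ins for a file fetched from `ORIGIN_REPO`, and the Elasticsearch `bulk_index` call is left out.

```ruby
require "csv"

# Stand-in for a raw CSV file downloaded from the GitHub contents API
# (hypothetical data for illustration only).
csv_data = "NAME,STREET\n" \
           "Ponce City Market,675 Ponce De Leon Ave NE\n" \
           "Municipal Market,209 Edgewood Ave SE\n"
file_name = "sample.csv" # hypothetical FILE_SOURCE value

# Mirrors FileGrabber#process_file: one document per CSV row, tagged with
# the originating file name and the Elasticsearch document type.
docs = CSV.parse(csv_data, headers: true).map do |row|
  row.to_h.merge("FILE_SOURCE" => file_name, "_type" => "resource")
end

docs.each { |doc| puts doc.inspect }
```

Because every CSV column becomes a field on the indexed document, searching by `STREET` or `FILE_SOURCE` on `/resources/search` works exactly as the `queries` object in the `/resources` response advertises.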