├── .gitignore
├── LICENSE
├── README.md
├── go.mod
├── go.sum
├── img
│   ├── logo-dark.png
│   └── logo-light.png
├── main.go
└── testdata
    ├── sentences.txt
    └── tlds.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | raink
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 Bishop Fox
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 |
7 | Use LLMs for document ranking.
8 |
9 |
10 | ## Description
11 |
12 | There's power in AI in that you can "throw a problem at it" and get some result, without even fully defining the problem. For example, give it a bunch of code diffs and a security advisory, and ask, "Which of these diffs seems most likely to fix the security bug?" However, it's not always that easy:
13 | - nondeterminism: doesn't always respond with the same result
14 | - context window: can't pass in all the data at once, need to break it up
15 | - output constraints: sometimes doesn't return all the data you asked it to review
16 | - subjectivity in scoring: has a really hard time assigning a numeric score to an individual item
17 |
18 | We built raink to circumvent those issues and solve general ranking problems that are otherwise difficult for LLMs to process. See our blog post [raink: Use LLMs for Document Ranking](https://bishopfox.com/blog/raink-llms-document-ranking) for more background on this technique, and our talk [Patch Perfect: Harmonizing with LLMs to Find Security Vulns](https://www.youtube.com/watch?v=IBuL1zY69tY) to see how we've applied raink to offensive security problems.
19 |
20 | ## Getting started
21 |
22 | ### Install
23 |
24 | ```
25 | git clone https://github.com/noperator/raink
26 | cd raink
27 | go install
28 | ```
29 |
30 | ### Configure
31 |
32 | Set your `OPENAI_API_KEY` environment variable.
33 |
34 | ### Usage
35 |
36 | ```
37 | raink -h
38 | Usage of raink:
39 | -dry-run
40 | Enable dry run mode (log API calls without making them)
41 | -encoding string
42 | Tokenizer encoding (default "o200k_base")
43 | -f string
44 | Input file
45 | -json
46 | Force JSON parsing regardless of file extension
47 | -o string
48 | JSON output file
49 | -ollama-model string
50 | Ollama model name (if not set, OpenAI will be used)
51 | -ollama-url string
52 | Ollama API URL (default "http://localhost:11434/api/chat")
53 | -openai-model string
54 | OpenAI model name (default "gpt-4o-mini")
55 | -p string
56 | Initial prompt (prefix with @ to use a file)
57 | -r int
58 | Number of runs (default 10)
59 | -ratio float
60 | Refinement ratio as a decimal (e.g., 0.5 for 50%) (default 0.5)
61 | -s int
62 | Number of items per batch (default 10)
63 | -t int
64 | Max tokens per batch (default 128000)
65 | -template string
66 | Template for each object in the input file (prefix with @ to use a file) (default "{{.Data}}")
67 | ```
68 |
69 | This example ranks 100 [sentences](https://github.com/noperator/raink/blob/main/testdata/sentences.txt) in under 2 minutes:
70 |
71 | ```
72 | raink \
73 | -f testdata/sentences.txt \
74 | -r 10 \
75 | -s 10 \
76 | -p 'Rank each of these items according to their relevancy to the concept of "time".' |
77 | jq -r '.[:10] | map(.value)[]' |
78 | nl
79 |
80 | 1 The train arrived exactly on time.
81 | 2 The old clock chimed twelve times.
82 | 3 The clock ticked steadily on the wall.
83 | 4 The bell rang, signaling the end of class.
84 | 5 The rooster crowed at the break of dawn.
85 | 6 She climbed to the top of the hill to watch the sunset.
86 | 7 He watched as the leaves fell one by one.
87 | 8 The stars twinkled brightly in the clear night sky.
88 | 9 He spotted a shooting star while stargazing.
89 | 10 She opened the curtains to let in the morning light.
90 | ```
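
To get a feel for how `-r`, `-s`, and `-ratio` interact, the sketch below estimates how many batch requests a ranking job submits. It follows the round structure of `Rank` in `main.go` (each round submits `runs * (items / batchSize)` batches, then recurses on the top `ratio` fraction), but the function name `estimateBatchCalls` is hypothetical, and it assumes a positive batch size, no retries, and no batch-size reduction from the token limit. Treat the result as a rough lower bound, not a guarantee.

```go
package main

import "fmt"

// estimateBatchCalls is a hypothetical helper (not part of raink) that counts
// batch submissions across refinement rounds. It assumes batchSize > 0.
func estimateBatchCalls(numItems, batchSize, numRuns int, ratio float64) int {
	total := 0
	for numItems > 1 {
		// A batch cannot hold more items than are left to rank.
		if batchSize > numItems {
			batchSize = numItems
		}
		// Integer division: remainder items are not batched in a given run.
		total += numRuns * (numItems / batchSize)
		if ratio == 0 {
			break // No refinement requested.
		}
		next := int(float64(numItems) * ratio)
		if next == 0 || next == numItems {
			break // Nothing left to refine, or no reduction possible.
		}
		numItems = next
	}
	return total
}

func main() {
	// Same settings as the sentence example: 100 items, -s 10, -r 10, -ratio 0.5.
	fmt.Println(estimateBatchCalls(100, 10, 10, 0.5))
}
```

With the settings from the example above, the rounds shrink from 100 items to 50, 25, 12, 6, and 3, so most of the requests are spent in the first two rounds.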
91 |
92 | #### JSON Support
93 |
94 | If the input file is a JSON document, it will be read as an array of objects and each object will be used for ranking.
95 |
96 | For instance, two objects would be loaded and ranked from this document:
97 |
98 | ```json
99 | [
100 | {
101 | "path": "/foo",
102 | "code": "bar"
103 | },
104 | {
105 | "path": "/baz",
106 | "code": "nope"
107 | }
108 | ]
109 | ```
110 |
111 | #### Templates
112 |
113 | It is possible to include each element from the input file in a template using the [Go template syntax](https://pkg.go.dev/text/template) via the `-template "template string"` (or `-template @file.tpl`) argument.
114 |
115 | For text input files, each line can be referenced in the template with the `Data` variable:
116 |
117 | ```
118 | Anything you want with {{ .Data }}
119 | ```
120 |
121 | For JSON input files, each object in the array can be referenced directly. For instance, elements of the previous JSON example can be referenced in the template code like so:
122 |
123 | ```
124 | # {{ .path }}
125 |
126 | {{ .code }}
127 | ```
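
As a rough illustration of what happens to each JSON object, the sketch below decodes an array and renders every element through a Go template, much like `loadObjectsFromFile` in `main.go` does. The function name `renderItems` and the inline sample document are assumptions made for this example; only `encoding/json` and `text/template` from the standard library are used.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// renderItems is a hypothetical helper: it decodes a JSON array and renders
// each element through tmplText, returning one string per element.
func renderItems(jsonDoc, tmplText string) ([]string, error) {
	tmpl, err := template.New("item").Parse(tmplText)
	if err != nil {
		return nil, err
	}
	var items []interface{}
	if err := json.Unmarshal([]byte(jsonDoc), &items); err != nil {
		return nil, err
	}
	var out []string
	for _, item := range items {
		var buf bytes.Buffer
		// Each decoded object is a map, so {{ .path }} looks up the "path" key.
		if err := tmpl.Execute(&buf, item); err != nil {
			return nil, err
		}
		out = append(out, buf.String())
	}
	return out, nil
}

func main() {
	doc := `[{"path": "/foo", "code": "bar"}, {"path": "/baz", "code": "nope"}]`
	rendered, err := renderItems(doc, "# {{ .path }}\n\n{{ .code }}")
	if err != nil {
		panic(err)
	}
	for _, r := range rendered {
		fmt.Printf("%q\n", r)
	}
}
```

Note that `encoding/json` rejects trailing commas, so the input must be strictly valid JSON.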
128 |
129 | Note in the following example that the resulting `value` key contains the actual value being presented for ranking (as described by the template), while the `object` key contains the entire original object from the input file for easy reference.
130 |
131 | ```
132 | # Create some test JSON data.
133 | seq 9 |
134 | paste -d @ - - - |
135 | parallel 'echo {} | tr @ "\n" | jo -a | jo nums=:/dev/stdin' |
136 | jo -a |
137 | tee input.json
138 |
139 | [{"nums":[1,2,3]},{"nums":[4,5,6]},{"nums":[7,8,9]}]
140 |
141 | # Use template to extract the first element of the nums array in each input object.
142 | raink \
143 | -f input.json \
144 | -template '{{ index .nums 0 }}' \
145 | -p 'Which is biggest?' \
146 | -r 1 |
147 | jq -c '.[]'
148 |
149 | {"key":"eQJpm-Qs","value":"7","object":{"nums":[7,8,9]},"score":0,"exposure":1,"rank":1}
150 | {"key":"SyJ3d9Td","value":"4","object":{"nums":[4,5,6]},"score":2,"exposure":1,"rank":2}
151 | {"key":"a4ayc_80","value":"1","object":{"nums":[1,2,3]},"score":3,"exposure":1,"rank":3}
152 | ```
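
If you would rather consume the results from Go than from `jq`, a minimal sketch might look like the following. The `result` struct mirrors the keys visible in the sample output above; the type and the `topValues` helper are hypothetical names for this example, and the code assumes the array is already ordered by rank, as it is in that output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// result mirrors the keys shown in the sample output (hypothetical type name).
type result struct {
	Key      string      `json:"key"`
	Value    string      `json:"value"`
	Object   interface{} `json:"object"`
	Score    float64     `json:"score"`
	Exposure int         `json:"exposure"`
	Rank     int         `json:"rank"`
}

// topValues returns the value field of the first n results.
func topValues(data []byte, n int) ([]string, error) {
	var results []result
	if err := json.Unmarshal(data, &results); err != nil {
		return nil, err
	}
	if n < 0 {
		n = 0
	}
	if n > len(results) {
		n = len(results)
	}
	values := make([]string, 0, n)
	for _, r := range results[:n] {
		values = append(values, r.Value)
	}
	return values, nil
}

func main() {
	data := []byte(`[
		{"key":"eQJpm-Qs","value":"7","object":{"nums":[7,8,9]},"score":0,"exposure":1,"rank":1},
		{"key":"SyJ3d9Td","value":"4","object":{"nums":[4,5,6]},"score":2,"exposure":1,"rank":2},
		{"key":"a4ayc_80","value":"1","object":{"nums":[1,2,3]},"score":3,"exposure":1,"rank":3}
	]`)
	values, err := topValues(data, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(values)
}
```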
153 |
154 | ## Back matter
155 |
156 | ### See also
157 |
158 | - [Hard problems that reduce to document ranking](https://noperator.dev/posts/document-ranking-for-complex-problems/)
159 | - [Commentary: Critical Thinking - Bug Bounty Podcast](https://youtu.be/qd08UBNpu7k?si=pMVEYtmKnyuJkL9B&t=1511)
160 | - [Discussion: Hacker News](https://news.ycombinator.com/item?id=43174910)
161 | - [Raink: Use LLMs for Document Ranking](https://bishopfox.com/blog/raink-llms-document-ranking)
162 | - [Patch Perfect: Harmonizing with LLMs to Find Security Vulns](https://www.youtube.com/watch?v=IBuL1zY69tY)
163 | - [Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting](https://arxiv.org/html/2306.17563v2)
164 | - [Introducing Rerank 3.5: Precise AI Search](https://cohere.com/blog/rerank-3pt5)
165 |
166 | ### To-do
167 |
168 | - [x] parallelize openai calls for each run
169 | - [x] save time by using shorter hash ids
170 | - [x] make sure that each randomized run is evenly split into groups so each one gets included/exposed
171 | - [ ] allow specifying an input _directory_ (where each file is a distinct object)
172 | - [x] alert if the incoming context window is super large
173 | - [x] some batches near the end of a run (9?) are small for some reason
174 | - [ ] run openai batch mode
175 | - [x] automatically calculate optimal batch size?
176 | - [x] explore "tournament" sort vs complete exposure each time
177 | - [x] add parameter for refinement ratio
178 | - [x] add blog link
179 | - [x] support non-OpenAI models
180 | - [ ] add ~boolean~ refinement ratio flag
181 | - [ ] separate package and cli tool
182 | - [ ] add python bindings?
183 | - [ ] clarify when prompt included in token estimate
184 | - [ ] remove token limit threshold? potentially confusing/unnecessary
185 |
186 | ### License
187 |
188 | This project is licensed under the [MIT License](LICENSE).
189 |
--------------------------------------------------------------------------------
/go.mod:
--------------------------------------------------------------------------------
1 | module github.com/bishopfox/raink
2 |
3 | go 1.23.4
4 |
5 | require (
6 | github.com/invopop/jsonschema v0.12.0
7 | github.com/openai/openai-go v0.1.0-alpha.38
8 | github.com/pkoukk/tiktoken-go v0.1.7
9 | )
10 |
11 | require (
12 | github.com/bahlo/generic-list-go v0.2.0 // indirect
13 | github.com/buger/jsonparser v1.1.1 // indirect
14 | github.com/dlclark/regexp2 v1.10.0 // indirect
15 | github.com/google/uuid v1.6.0 // indirect
16 | github.com/mailru/easyjson v0.7.7 // indirect
17 | github.com/tidwall/gjson v1.14.4 // indirect
18 | github.com/tidwall/match v1.1.1 // indirect
19 | github.com/tidwall/pretty v1.2.1 // indirect
20 | github.com/tidwall/sjson v1.2.5 // indirect
21 | github.com/wk8/go-ordered-map/v2 v2.1.8 // indirect
22 | gopkg.in/yaml.v3 v3.0.1 // indirect
23 | )
24 |
--------------------------------------------------------------------------------
/go.sum:
--------------------------------------------------------------------------------
1 | github.com/bahlo/generic-list-go v0.2.0 h1:5sz/EEAK+ls5wF+NeqDpk5+iNdMDXrh3z3nPnH1Wvgk=
2 | github.com/bahlo/generic-list-go v0.2.0/go.mod h1:2KvAjgMlE5NNynlg/5iLrrCCZ2+5xWbdbCW3pNTGyYg=
3 | github.com/buger/jsonparser v1.1.1 h1:2PnMjfWD7wBILjqQbt530v576A/cAbQvEW9gGIpYMUs=
4 | github.com/buger/jsonparser v1.1.1/go.mod h1:6RYKKt7H4d4+iWqouImQ9R2FZql3VbhNgx27UK13J/0=
5 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
6 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
7 | github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0=
8 | github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8=
9 | github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
10 | github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
11 | github.com/invopop/jsonschema v0.12.0 h1:6ovsNSuvn9wEQVOyc72aycBMVQFKz7cPdMJn10CvzRI=
12 | github.com/invopop/jsonschema v0.12.0/go.mod h1:ffZ5Km5SWWRAIN6wbDXItl95euhFz2uON45H2qjYt+0=
13 | github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y=
14 | github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0=
15 | github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc=
16 | github.com/openai/openai-go v0.1.0-alpha.38 h1:j/rL0aEIHWnWaPgA8/AXYKCI79ZoW44NTIpn7qfMEXQ=
17 | github.com/openai/openai-go v0.1.0-alpha.38/go.mod h1:3SdE6BffOX9HPEQv8IL/fi3LYZ5TUpRYaqGQZbyk11A=
18 | github.com/pkoukk/tiktoken-go v0.1.7 h1:qOBHXX4PHtvIvmOtyg1EeKlwFRiMKAcoMp4Q+bLQDmw=
19 | github.com/pkoukk/tiktoken-go v0.1.7/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg=
20 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
21 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
22 | github.com/stretchr/testify v1.8.2 h1:+h33VjcLVPDHtOdpUCuF+7gSuG3yGIftsP1YvFihtJ8=
23 | github.com/stretchr/testify v1.8.2/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4=
24 | github.com/tidwall/gjson v1.14.2/go.mod h1:/wbyibRr2FHMks5tjHJ5F8dMZh3AcwJEMf5vlfC0lxk=
25 | github.com/tidwall/gjson v1.14.4 h1:uo0p8EbA09J7RQaflQ1aBRffTR7xedD2bcIVSYxLnkM=
26 | github.com/tidwall/gjson v1.14.4/go.mod h1:/wbyibRr2FHMks5tjHJ5F8dMZh3AcwJEMf5vlfC0lxk=
27 | github.com/tidwall/match v1.1.1 h1:+Ho715JplO36QYgwN9PGYNhgZvoUSc9X2c80KVTi+GA=
28 | github.com/tidwall/match v1.1.1/go.mod h1:eRSPERbgtNPcGhD8UCthc6PmLEQXEWd3PRB5JTxsfmM=
29 | github.com/tidwall/pretty v1.2.0/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
30 | github.com/tidwall/pretty v1.2.1 h1:qjsOFOWWQl+N3RsoF5/ssm1pHmJJwhjlSbZ51I6wMl4=
31 | github.com/tidwall/pretty v1.2.1/go.mod h1:ITEVvHYasfjBbM0u2Pg8T2nJnzm8xPwvNhhsoaGGjNU=
32 | github.com/tidwall/sjson v1.2.5 h1:kLy8mja+1c9jlljvWTlSazM7cKDRfJuR/bOJhcY5NcY=
33 | github.com/tidwall/sjson v1.2.5/go.mod h1:Fvgq9kS/6ociJEDnK0Fk1cpYF4FIW6ZF7LAe+6jwd28=
34 | github.com/wk8/go-ordered-map/v2 v2.1.8 h1:5h/BUHu93oj4gIdvHHHGsScSTMijfx5PeYkE/fJgbpc=
35 | github.com/wk8/go-ordered-map/v2 v2.1.8/go.mod h1:5nJHM5DyteebpVlHnWMV0rPz6Zp7+xBAnxjb1X5vnTw=
36 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
37 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
38 | gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
39 | gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
40 |
--------------------------------------------------------------------------------
/img/logo-dark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/noperator/raink/3a5ac848b3eb9cf59589391ac853c7db333a4766/img/logo-dark.png
--------------------------------------------------------------------------------
/img/logo-light.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/noperator/raink/3a5ac848b3eb9cf59589391ac853c7db333a4766/img/logo-light.png
--------------------------------------------------------------------------------
/main.go:
--------------------------------------------------------------------------------
1 | package main
2 |
3 | import (
4 | "bufio"
5 | "bytes"
6 | "context"
7 | "crypto/sha256"
8 | "encoding/base64"
9 | "encoding/json"
10 | "flag"
11 | "fmt"
12 | "io"
13 | "log"
14 | "math/rand"
15 | "net/http"
16 | "os"
17 | "path/filepath"
18 | "sort"
19 | "strconv"
20 | "strings"
21 | "text/template"
22 | "time"
23 |
24 | "github.com/invopop/jsonschema"
25 | "github.com/openai/openai-go"
26 | "github.com/openai/openai-go/option"
27 | "github.com/pkoukk/tiktoken-go"
28 | )
29 |
30 | const (
31 | idLen = 8
32 | minBatchSize = 2
33 | )
34 |
35 | /*
36 | When deciding whether a value belongs in Config or Ranker structs, consider the following:
37 | - Does this value change during operation? → Ranker if yes, Config if no
38 | - Should users be able to configure this directly? → Config if yes, Ranker if no
39 | - Is this derived from other configuration? → Usually Ranker
40 | - Does this require initialization or cleanup? → Usually Ranker
41 | - Is this part of the public API? → Config if yes, Ranker if no
42 | */
43 |
44 | type Config struct {
45 | InitialPrompt string `json:"initial_prompt"`
46 | BatchSize int `json:"batch_size"`
47 | NumRuns int `json:"num_runs"`
48 | OllamaModel string `json:"ollama_model"`
49 | OpenAIModel openai.ChatModel `json:"openai_model"`
50 | TokenLimit int `json:"token_limit"`
51 | RefinementRatio float64 `json:"refinement_ratio"`
52 | OpenAIKey string `json:"-"`
53 | OpenAIAPIURL string `json:"-"`
54 | OllamaAPIURL string `json:"-"`
55 | Encoding string `json:"encoding"`
56 | BatchTokens int `json:"batch_tokens"`
57 | DryRun bool `json:"-"`
58 | }
59 |
60 | // TODO: Move all CLI flag validation into this func instead.
61 | func (c *Config) Validate() error {
62 | if c.InitialPrompt == "" {
63 | return fmt.Errorf("initial prompt cannot be empty")
64 | }
65 | if c.BatchSize <= 0 {
66 | return fmt.Errorf("batch size must be greater than 0")
67 | }
68 | if c.NumRuns <= 0 {
69 | return fmt.Errorf("number of runs must be greater than 0")
70 | }
71 | if c.TokenLimit <= 0 {
72 | return fmt.Errorf("token limit must be greater than 0")
73 | }
74 | if c.OllamaModel == "" && c.OpenAIAPIURL == "" && c.OpenAIKey == "" {
75 | return fmt.Errorf("openai key cannot be empty")
76 | }
77 | if c.BatchSize < minBatchSize {
78 | return fmt.Errorf("batch size must be at least %d", minBatchSize)
79 | }
80 | return nil
81 | }
82 |
83 | type Ranker struct {
84 | cfg *Config
85 | encoding *tiktoken.Tiktoken
86 | rng *rand.Rand
87 | numBatches int
88 | round int
89 | }
90 |
91 | func NewRanker(config *Config) (*Ranker, error) {
92 | if err := config.Validate(); err != nil {
93 | return nil, err
94 | }
95 |
96 | encoding, err := tiktoken.GetEncoding(config.Encoding)
97 | if err != nil {
98 | return nil, fmt.Errorf("failed to get tiktoken encoding: %w", err)
99 | }
100 |
101 | return &Ranker{
102 | cfg: config,
103 | encoding: encoding,
104 | rng: rand.New(rand.NewSource(time.Now().UnixNano())),
105 | }, nil
106 | }
107 |
108 | func (ranker *Ranker) AdjustBatchSize(objects []Object, samples int) {
109 | // Dynamically adjust batch size upfront.
110 | for {
111 | valid := true
112 | var estTotalTokens int
113 | var numBatches int
114 |
115 | for i := 0; i < samples; i++ {
116 | ranker.rng.Shuffle(len(objects), func(i, j int) {
117 | objects[i], objects[j] = objects[j], objects[i]
118 | })
119 | numBatches = max(1, len(objects)/ranker.cfg.BatchSize) // Need at least one batch.
120 | for j := 0; j < numBatches; j++ {
121 | batch := objects[j*ranker.cfg.BatchSize : (j+1)*min(len(objects), ranker.cfg.BatchSize)] // Don't index more objects than we have.
122 | estBatchTokens := ranker.estimateTokens(batch, true)
123 | estTotalTokens += estBatchTokens
124 | if estBatchTokens > ranker.cfg.TokenLimit {
125 | log.Printf("Sample %d: estimated tokens %d > max token threshold %d", i, estBatchTokens, ranker.cfg.TokenLimit)
126 | ranker.logTokenSizes(batch)
127 | valid = false
128 | break
129 | }
130 | }
131 | if !valid {
132 | break
133 | }
134 | }
135 |
136 | if valid {
137 | avgEstTokens := estTotalTokens / (samples * numBatches)
138 | avgEstPct := float64(avgEstTokens) / float64(ranker.cfg.TokenLimit) * 100
139 | log.Printf("Average estimated tokens: %d (%.2f%% of max %d tokens)", avgEstTokens, avgEstPct, ranker.cfg.TokenLimit)
140 | break
141 | }
142 | if ranker.cfg.BatchSize <= minBatchSize {
143 | log.Fatal("Cannot create a valid batch within the token limit")
144 | }
145 | ranker.cfg.BatchSize--
146 | log.Printf("Decreasing batch size to %d", ranker.cfg.BatchSize)
147 | }
148 | }
149 |
150 | type Object struct {
151 | // unique identifier used to identify the object in the final results
152 | ID string `json:"id"`
153 | // string value to be ranked
154 | Value string `json:"value"`
155 | // the original structured object if we're loading a json file
156 | Object interface{} `json:"object"`
157 | }
158 |
159 | type RankedObject struct {
160 | Object Object
161 | Score float64
162 | }
163 |
164 | type RankedObjectResponse struct {
165 | Objects []string `json:"objects" jsonschema_description:"List of ranked object IDs"`
166 | }
167 |
168 | type FinalResult struct {
169 | Key string `json:"key"`
170 | Value string `json:"value"`
171 | // the original structured object if we're loading a json file
172 | Object interface{} `json:"object"`
173 | Score float64 `json:"score"`
174 | Exposure int `json:"exposure"`
175 | Rank int `json:"rank"`
176 | }
177 |
178 | func GenerateSchema[T any]() interface{} {
179 | reflector := jsonschema.Reflector{
180 | AllowAdditionalProperties: false,
181 | DoNotReference: true,
182 | }
183 | var v T
184 | schema := reflector.Reflect(v)
185 | return schema
186 | }
187 |
188 | var RankedObjectResponseSchema = GenerateSchema[RankedObjectResponse]()
189 |
190 | func ShortDeterministicID(input string, length int) string {
191 | // Keep only A-Za-z0-9 from Base64-encoded SHA-256 hash.
192 | hash := sha256.Sum256([]byte(input))
193 | base64Encoded := base64.URLEncoding.EncodeToString(hash[:])
194 | var result strings.Builder
195 | for _, char := range base64Encoded {
196 | if (char >= '0' && char <= '9') || (char >= 'a' && char <= 'z') || (char >= 'A' && char <= 'Z') {
197 | result.WriteRune(char)
198 | }
199 | }
200 | filtered := result.String()
201 | if length > len(filtered) {
202 | length = len(filtered)
203 | }
204 | return filtered[:length]
205 | }
206 |
207 | func loadObjectsFromFile(filePath string, templateData string, forceJSON bool) (objects []Object, err error) {
208 | var tmpl *template.Template
209 | if templateData != "" {
210 | if templateData[0] == '@' {
211 | content, err := os.ReadFile(templateData[1:])
212 | if err != nil {
213 | return nil, err
214 | }
215 | templateData = string(content)
216 | }
217 | if tmpl, err = template.New("raink-item-template").Parse(templateData); err != nil {
218 | return nil, err
219 | }
220 | }
221 |
222 | file, err := os.Open(filePath)
223 | if err != nil {
224 | return nil, err
225 | }
226 | defer file.Close()
227 |
228 | ext := strings.ToLower(filepath.Ext(filePath))
229 | if ext == ".json" || forceJSON {
230 | // parse the file into an opaque array
231 | var data []interface{}
232 | if err := json.NewDecoder(file).Decode(&data); err != nil {
233 | return nil, err
234 | }
235 |
236 | // iterate over the array and create objects
237 | for _, value := range data {
238 | var valueStr string
239 | if tmpl != nil {
240 | var tmplData bytes.Buffer
241 | if err := tmpl.Execute(&tmplData, value); err != nil {
242 | return nil, err
243 | }
244 | valueStr = tmplData.String()
245 | } else {
246 | log.Printf("WARNING: using json input without a template, using JSON object as it is\n")
247 | jsonValue, err := json.Marshal(value)
248 | if err != nil {
249 | return nil, err
250 | }
251 | valueStr = string(jsonValue)
252 | }
253 |
254 | id := ShortDeterministicID(valueStr, idLen)
255 | objects = append(objects, Object{ID: id, Object: value, Value: valueStr})
256 | }
257 | } else {
258 | // read and interpolate the file line by line
259 | reader := bufio.NewReader(file)
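260 | for {
261 | line, err := reader.ReadString('\n')
262 | if err != nil && err != io.EOF {
263 | return nil, err
264 | }
265 | // A final line without a trailing newline arrives together with io.EOF; keep it.
266 | if err == io.EOF && line == "" {
267 | break
268 | }
269 | line = strings.TrimSpace(line)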
270 | if tmpl != nil {
271 | var tmplData bytes.Buffer
272 | if err := tmpl.Execute(&tmplData, map[string]string{"Data": line}); err != nil {
273 | return nil, err
274 | }
275 | line = tmplData.String()
276 | }
277 |
278 | id := ShortDeterministicID(line, idLen)
279 | objects = append(objects, Object{ID: id, Object: nil, Value: line})
280 | }
281 | }
282 |
283 | return objects, nil
284 | }
285 |
286 | // TODO: Move all of this CLI-related code to a separate package.
287 | func main() {
288 | log.SetOutput(os.Stderr)
289 |
290 | inputFile := flag.String("f", "", "Input file")
291 | forceJSON := flag.Bool("json", false, "Force JSON parsing regardless of file extension")
292 | inputTemplate := flag.String("template", "{{.Data}}", "Template for each object in the input file (prefix with @ to use a file)")
293 | batchSize := flag.Int("s", 10, "Number of items per batch")
294 | numRuns := flag.Int("r", 10, "Number of runs")
295 | batchTokens := flag.Int("t", 128000, "Max tokens per batch")
296 | initialPrompt := flag.String("p", "", "Initial prompt (prefix with @ to use a file)")
297 | outputFile := flag.String("o", "", "JSON output file")
298 |
299 | ollamaURL := flag.String("ollama-url", "http://localhost:11434/api/chat", "Ollama API URL")
300 | ollamaModel := flag.String("ollama-model", "", "Ollama model name (if not set, OpenAI will be used)")
301 | oaiModel := flag.String("openai-model", openai.ChatModelGPT4oMini, "OpenAI model name")
302 | oaiURL := flag.String("openai-url", "", "OpenAI API base URL (e.g., for OpenAI-compatible API like vLLM)")
303 | encoding := flag.String("encoding", "o200k_base", "Tokenizer encoding")
304 |
305 | dryRun := flag.Bool("dry-run", false, "Enable dry run mode (log API calls without making them)")
306 | refinementRatio := flag.Float64("ratio", 0.5, "Refinement ratio as a decimal (e.g., 0.5 for 50%)")
307 | flag.Parse()
308 |
309 | // TODO: This should be a more resilient check. We're assuming that if the
310 | // batchTokens is 128000, then a user didn't pass that value via CLI (i.e.,
311 | // that it's the default value).
312 | if *ollamaModel != "" && *batchTokens == 128000 {
313 | *batchTokens = 4096
314 | }
315 |
316 | // This "threshold" is a way to add some padding to our estimation of
317 | // average token usage per batch. We're effectively leaving 5% of
318 | // wiggle room.
319 | var tokenLimitThreshold = int(0.95 * float64(*batchTokens))
320 |
321 | if *inputFile == "" {
322 | log.Println("Usage: raink -f [-s ] [-r ] [-p ] [-t ] [-ollama-model ] [-openai-model ] [-openai-url ] [-ratio ]")
323 | return
324 | }
325 |
326 | if *refinementRatio < 0 || *refinementRatio >= 1 {
327 | fmt.Println("Error: Refinement ratio must be >= 0 and < 1")
328 | os.Exit(1)
329 | }
330 |
331 | userPrompt := *initialPrompt
332 | if strings.HasPrefix(userPrompt, "@") {
333 | filePath := strings.TrimPrefix(userPrompt, "@")
334 | content, err := os.ReadFile(filePath)
335 | if err != nil {
336 | log.Fatalf("Error reading initial prompt file: %v", err)
337 | }
338 | userPrompt = string(content)
339 | }
340 |
341 | config := &Config{
342 | InitialPrompt: userPrompt,
343 | BatchSize: *batchSize,
344 | NumRuns: *numRuns,
345 | OllamaModel: *ollamaModel,
346 | OpenAIModel: *oaiModel,
347 | TokenLimit: tokenLimitThreshold,
348 | RefinementRatio: *refinementRatio,
349 | OpenAIKey: os.Getenv("OPENAI_API_KEY"),
350 | OpenAIAPIURL: *oaiURL,
351 | OllamaAPIURL: *ollamaURL,
352 | Encoding: *encoding,
353 | BatchTokens: *batchTokens,
354 | DryRun: *dryRun,
355 | }
356 |
357 | ranker, err := NewRanker(config)
358 | if err != nil {
359 | log.Fatal(err)
360 | }
361 |
362 | objects, err := loadObjectsFromFile(*inputFile, *inputTemplate, *forceJSON)
363 | if err != nil {
364 | log.Fatal(err)
365 | }
366 |
367 | // check that no object is too large
368 | for _, obj := range objects {
369 | tokens := ranker.estimateTokens([]Object{obj}, true)
370 | if tokens > *batchTokens {
371 | log.Fatalf("Object is too large with %d tokens:\n%s", tokens, obj.Value)
372 | }
373 | }
374 |
375 | // Dynamically adjust batch size upfront.
376 | ranker.AdjustBatchSize(objects, 10)
377 |
378 | // Recursive processing
379 | finalResults := ranker.Rank(objects, 1)
380 |
381 | // Add the rank key to each final result based on its position in the list
382 | for i := range finalResults {
383 | finalResults[i].Rank = i + 1
384 | }
385 |
386 | jsonResults, err := json.MarshalIndent(finalResults, "", " ")
387 | if err != nil {
388 | panic(err)
389 | }
390 |
391 | if !config.DryRun {
392 | fmt.Println(string(jsonResults))
393 | }
394 |
395 | if *outputFile != "" {
396 | os.WriteFile(*outputFile, jsonResults, 0644)
397 | log.Printf("Results written to %s\n", *outputFile)
398 | }
399 | }
400 |
401 | // TODO: The final exposure value should be the sum of all exposures from all
402 | // refinement rounds (not just the last one). This isn't crucial since exposure
403 | // is just a helpful metric to show that objects were compared to a sufficiently
404 | // large number of other objects.
405 |
406 | func (r *Ranker) Rank(objects []Object, round int) []FinalResult {
407 | r.round = round
408 |
409 | log.Printf("Round %d: Ranking %d objects\n", r.round, len(objects))
410 |
411 | // If we've narrowed down to a single object, we're done.
412 | if len(objects) == 1 {
413 | return []FinalResult{
414 | {
415 | Key: objects[0].ID,
416 | Value: objects[0].Value,
417 | Object: objects[0].Object,
418 | Score: 0, // 0 is guaranteed to be the "highest" score.
419 | Exposure: 1,
420 | },
421 | }
422 | }
423 |
424 | // Downstream ranking gets unhappy if we try to rank more objects than we
425 | // have.
426 | if r.cfg.BatchSize > len(objects) {
427 | r.cfg.BatchSize = len(objects)
428 | }
429 |
430 | r.numBatches = len(objects) / r.cfg.BatchSize
431 |
432 | // Process the objects and get the sorted results.
433 | results := r.shuffleBatchRank(objects)
434 |
435 | // If the refinement ratio is 0, that effectively means we're refining
436 | // _none_ of the top objects, so we're done.
437 | if r.cfg.RefinementRatio == 0 {
438 | return results
439 | }
440 |
441 | // Calculate the mid index based on the refinement ratio.
442 | mid := int(float64(len(results)) * r.cfg.RefinementRatio)
443 | topPortion := results[:mid]
444 | bottomPortion := results[mid:]
445 |
446 | // If we haven't reduced the number of objects (as may eventually happen
447 | // for a ratio above 0.5), we're done.
448 | if len(topPortion) == len(objects) {
449 | return results
450 | }
451 |
452 | log.Println("Top items being sent back into recursion:")
453 | for i, obj := range topPortion {
454 | log.Printf("Rank %d: ID=%s, Score=%.2f, Value=%s", i+1, obj.Key, obj.Score, obj.Value)
455 | }
456 |
457 | var topPortionObjects []Object
458 | for _, result := range topPortion {
459 | topPortionObjects = append(topPortionObjects, Object{ID: result.Key, Value: result.Value, Object: result.Object})
460 | }
461 |
462 | refinedTopPortion := r.Rank(topPortionObjects, round+1)
463 |
464 | // Adjust scores by recursion depth; this serves as an inverted weight so
465 | // that later rounds are guaranteed to sit higher in the final list.
466 | for i := range refinedTopPortion {
467 | refinedTopPortion[i].Score /= float64(2 * round)
468 | }
469 |
470 | // Combine the refined top portion with the unrefined bottom portion.
471 | finalResults := append(refinedTopPortion, bottomPortion...)
472 |
473 | return finalResults
474 | }
475 |
476 | // TODO: Also log the request/retry attempt number.
477 | func (r *Ranker) logFromApiCall(runNum, batchNum int, message string, args ...interface{}) {
478 | formattedMessage := fmt.Sprintf("Round %d, Run %*d/%d, Batch %*d/%d: "+message, r.round, len(strconv.Itoa(r.cfg.NumRuns)), runNum, r.cfg.NumRuns, len(strconv.Itoa(r.numBatches)), batchNum, r.numBatches)
479 | log.Printf(formattedMessage, args...)
480 | }
481 |
482 | func (r *Ranker) shuffleBatchRank(objects []Object) []FinalResult {
483 | scores := make(map[string][]float64)
484 |
485 | exposureCounts := make(map[string]int)
486 |
487 | resultsChan := make(chan []RankedObject, r.numBatches)
488 |
489 | var firstRunRemainderItems []Object
490 |
491 | for i := 0; i < r.cfg.NumRuns; i++ {
492 | r.rng.Shuffle(len(objects), func(i, j int) {
493 | objects[i], objects[j] = objects[j], objects[i]
494 | })
495 |
496 | // Ensure remainder items from the first run are not in the remainder
497 | // range in the second run
498 | if i == 1 && len(firstRunRemainderItems) > 0 {
499 | for {
500 | remainderStart := r.numBatches * r.cfg.BatchSize
501 | remainderItems := objects[remainderStart:]
502 | conflictFound := false
503 | for _, item := range remainderItems {
504 | for _, firstRunItem := range firstRunRemainderItems {
505 | if item.ID == firstRunItem.ID {
506 | log.Printf("Conflicting remainder item found: %v, %v\n", item, firstRunItem)
507 | conflictFound = true
508 | break
509 | }
510 | }
511 | if conflictFound {
512 | break
513 | }
514 | }
515 | if !conflictFound {
516 | break
517 | }
518 | r.rng.Shuffle(len(objects), func(i, j int) {
519 | objects[i], objects[j] = objects[j], objects[i]
520 | })
521 | }
522 | }
523 |
524 | // Split into groups of batchSize and process them concurrently
525 | log.Printf("Round %d, Run %*d/%d: Submitting batches to API\n", r.round, len(strconv.Itoa(r.cfg.NumRuns)), i+1, r.cfg.NumRuns)
526 | for j := 0; j < r.numBatches; j++ {
527 | batch := objects[j*r.cfg.BatchSize : (j+1)*r.cfg.BatchSize]
528 | go func(runNumber, batchNumber int, batch []Object) {
529 | rankedBatch := r.rankObjects(batch, runNumber, batchNumber)
530 | resultsChan <- rankedBatch
531 | }(i+1, j+1, batch)
532 | }
533 |
534 | // Collect results from all batches
535 | for j := 0; j < r.numBatches; j++ {
536 | rankedBatch := <-resultsChan
537 | for _, rankedObject := range rankedBatch {
538 | scores[rankedObject.Object.ID] = append(scores[rankedObject.Object.ID], rankedObject.Score)
539 | exposureCounts[rankedObject.Object.ID]++ // Update exposure count
540 | }
541 | }
542 |
543 | // Save remainder items from the first run
544 | if i == 0 {
545 | remainderStart := r.numBatches * r.cfg.BatchSize
546 | if remainderStart < len(objects) {
547 | firstRunRemainderItems = make([]Object, len(objects[remainderStart:]))
548 | copy(firstRunRemainderItems, objects[remainderStart:])
549 | log.Printf("First run remainder items: %v\n", firstRunRemainderItems)
550 | }
551 | }
552 | }
553 |
554 | // Calculate average scores
555 | finalScores := make(map[string]float64)
556 | for id, scoreList := range scores {
557 | var sum float64
558 | for _, score := range scoreList {
559 | sum += score
560 | }
561 | finalScores[id] = sum / float64(len(scoreList))
562 | }
563 |
564 | var results []FinalResult
565 | for id, score := range finalScores {
566 | for _, obj := range objects {
567 | if obj.ID == id {
568 | results = append(results, FinalResult{
569 | Key: id,
570 | Value: obj.Value,
571 | Object: obj.Object,
572 | Score: score,
573 | Exposure: exposureCounts[id], // Include exposure count
574 | })
575 | break
576 | }
577 | }
578 | }
579 |
580 | sort.Slice(results, func(i, j int) bool {
581 | return results[i].Score < results[j].Score
582 | })
583 |
584 | return results
585 | }
586 |
587 | func (r *Ranker) logTokenSizes(group []Object) {
588 | log.Println("Logging token sizes for each object in the batch:")
589 | for _, obj := range group {
590 | tokenSize := r.estimateTokens([]Object{obj}, false)
591 | valuePreview := obj.Value
592 | if len(valuePreview) > 100 {
593 | valuePreview = valuePreview[:100]
594 | }
595 | log.Printf("Object ID: %s, Token Size: %d, Value Preview: %s", obj.ID, tokenSize, valuePreview)
596 | }
597 | }
598 |
599 | const promptFmt = "id: `%s`\nvalue:\n```\n%s\n```\n\n"
600 |
601 | // TODO: Merge these and clean them up.
602 |
603 | var promptDisclaimer = fmt.Sprintf(
604 | "\n\nREMEMBER to:\n"+
605 | "- ALWAYS respond with the short %d-character ID of each item found above the value "+
606 | "(i.e., I'll provide you with `id: ` above the value, and you should respond with that same ID in your response)\n"+
607 | "- NEVER respond with the actual value!\n"+
608 | "- NEVER include backticks around IDs in your response!\n"+
609 | "- NEVER include scores or a written reason/justification in your response!\n"+
610 | "- Respond in RANKED DESCENDING order, where the FIRST item in your response is the MOST RELEVANT\n"+
611 | "- Respond in JSON format, with the following schema:\n {\"objects\": [\"\", \"\", ...]}\n\n"+
612 | "Here are the objects to be ranked:\n\n",
613 | idLen,
614 | )
615 |
616 | const missingIDsStr = "Your last response was missing the following IDs: [%s]. " +
617 | "Try again, and make ABSOLUTELY SURE to remember to:\n" +
618 | "- ALWAYS return the IDs and NOT THE VALUES!\n" +
619 | "- ALWAYS respond in JSON format as specified!\n" +
620 | "- ALWAYS return ALL of the IDs in the list!\n" +
621 | "- NEVER include backticks around IDs in your response!\n" +
622 | "- NEVER include scores or a written reason/justification in your response!"
623 |
624 | const invalidJSONStr = "Your last response was not valid JSON. Try again!"
625 |
626 | func (r *Ranker) estimateTokens(group []Object, includePrompt bool) int {
627 | text := ""
628 | if includePrompt {
629 | text += r.cfg.InitialPrompt + promptDisclaimer
630 | }
631 | for _, obj := range group {
632 | text += fmt.Sprintf(promptFmt, obj.ID, obj.Value)
633 | }
634 |
635 | if r.cfg.OllamaModel != "" {
636 | // TODO: Update to use Ollama tokenize API when this PR is merged:
637 | // https://github.com/ollama/ollama/pull/6586
638 | return len(text) / 4
639 | } else {
640 | return len(r.encoding.Encode(text, nil, nil))
641 | }
642 | }
643 |
644 | func (r *Ranker) rankObjects(group []Object, runNumber int, batchNumber int) []RankedObject {
645 | prompt := r.cfg.InitialPrompt + promptDisclaimer
646 | for _, obj := range group {
647 | prompt += fmt.Sprintf(promptFmt, obj.ID, obj.Value)
648 | }
649 |
650 | if r.cfg.DryRun {
651 | log.Printf("Dry run API call")
652 | // Simulate a ranked response for dry run
653 | var rankedObjects []RankedObject
654 | for i, obj := range group {
655 | rankedObjects = append(rankedObjects, RankedObject{
656 | Object: obj,
657 | Score: float64(i + 1), // Simulate scores based on position
658 | })
659 | }
660 | return rankedObjects
661 | }
662 |
663 | var rankedResponse RankedObjectResponse
664 | inputIDs := make(map[string]bool)
665 | for _, obj := range group {
666 | inputIDs[obj.ID] = true
667 | }
668 | if r.cfg.OllamaModel != "" {
669 | rankedResponse = r.callOllama(prompt, runNumber, batchNumber, inputIDs)
670 | } else {
671 | rankedResponse = r.callOpenAI(prompt, runNumber, batchNumber, inputIDs)
672 | }
673 |
674 | // Assign scores based on position in the ranked list
675 | var rankedObjects []RankedObject
676 | for i, id := range rankedResponse.Objects {
677 | for _, obj := range group {
678 | if obj.ID == id {
679 | rankedObjects = append(rankedObjects, RankedObject{
680 | Object: obj,
681 | Score: float64(i + 1), // Score based on position (1 for first, 2 for second, etc.)
682 | })
683 | break
684 | }
685 | }
686 | }
687 |
688 | return rankedObjects
689 | }
690 |
691 | type CustomTransport struct {
692 | Transport http.RoundTripper
693 | Headers http.Header
694 | StatusCode int
695 | Body []byte
696 | }
697 |
698 | func (t *CustomTransport) RoundTrip(req *http.Request) (*http.Response, error) {
699 | resp, err := t.Transport.RoundTrip(req)
700 | if err != nil {
701 | return nil, err
702 | }
703 |
704 | t.Headers = resp.Header
705 | t.StatusCode = resp.StatusCode
706 |
707 | t.Body, err = io.ReadAll(resp.Body)
708 | if err != nil {
709 | return nil, err
710 | }
711 |
712 | resp.Body = io.NopCloser(bytes.NewBuffer(t.Body))
713 |
714 | return resp, nil
715 | }
716 |
717 | // Updates the rankedResponse in place to fix case-insensitive ID mismatches.
718 | // If any IDs are missing, returns the missing IDs along with an error.
719 | // TODO: Also error on IDs in rankedResponse that are not in inputIDs. For example:
720 | // Run 1/10, Batch 8/10: Missing IDs: [VkCMOyV9]
721 | // Ollama API response: {"objects": ["5reULTRv", "KTJsPKHz", "eBFIaWo7", "AhqhnGsE", "Ug_hOxYp", "bWfMDUnE", "4sSg4Ojz", "VkJMOyV9", "UJ1-iMmW", "v6Puwf8K"]}
722 |
723 | func validateIDs(rankedResponse *RankedObjectResponse, inputIDs map[string]bool) ([]string, error) {
724 | // Create a map for case-insensitive ID matching
725 | inputIDsLower := make(map[string]string)
726 | for id := range inputIDs {
727 | inputIDsLower[strings.ToLower(id)] = id
728 | }
729 |
730 | missingIDs := make(map[string]bool)
731 | for id := range inputIDs {
732 | missingIDs[id] = true
733 | }
734 |
735 | for i, id := range rankedResponse.Objects {
736 | id = strings.ReplaceAll(id, "`", "")
737 | lowerID := strings.ToLower(id)
738 | if correctID, found := inputIDsLower[lowerID]; found {
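739 | // Always write back the canonical ID. The raw entry may differ by case,
740 | // or only by the backticks stripped above, in which case id == correctID
741 | // and a conditional write would leave the backticks in place.
742 | rankedResponse.Objects[i] = correctID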
743 | delete(missingIDs, correctID)
744 | }
745 | }
746 |
747 | if len(missingIDs) == 0 {
748 | return nil, nil
749 | } else {
750 | missingIDsKeys := make([]string, 0, len(missingIDs))
751 | for id := range missingIDs {
752 | missingIDsKeys = append(missingIDsKeys, id)
753 | }
754 | return missingIDsKeys, fmt.Errorf("missing IDs: %s", strings.Join(missingIDsKeys, ", "))
755 | }
756 | }
757 |
758 | func (r *Ranker) callOpenAI(prompt string, runNum int, batchNum int, inputIDs map[string]bool) RankedObjectResponse {
759 |
760 | customTransport := &CustomTransport{Transport: http.DefaultTransport}
761 | customClient := &http.Client{Transport: customTransport}
762 |
763 | clientOptions := []option.RequestOption{
764 | option.WithAPIKey(r.cfg.OpenAIKey),
765 | option.WithHTTPClient(customClient),
766 | option.WithMaxRetries(5),
767 | }
768 |
769 | // Add base URL option if specified
770 | if r.cfg.OpenAIAPIURL != "" {
771 | // Ensure the URL ends with a trailing slash
772 | baseURL := r.cfg.OpenAIAPIURL
773 | if !strings.HasSuffix(baseURL, "/") {
774 | baseURL += "/"
775 | }
776 | clientOptions = append(clientOptions, option.WithBaseURL(baseURL))
777 | }
778 |
779 | client := openai.NewClient(clientOptions...)
780 |
781 | backoff := time.Second
782 |
783 | conversationHistory := []openai.ChatCompletionMessageParamUnion{
784 | openai.UserMessage(prompt),
785 | }
786 |
787 | var rankedResponse RankedObjectResponse
788 | for {
789 | ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
790 |
791 | completion, err := client.Chat.Completions.New(ctx, openai.ChatCompletionNewParams{
792 | Messages: openai.F(conversationHistory),
793 | ResponseFormat: openai.F[openai.ChatCompletionNewParamsResponseFormatUnion](
794 | openai.ResponseFormatJSONSchemaParam{
795 | Type: openai.F(openai.ResponseFormatJSONSchemaTypeJSONSchema),
796 | JSONSchema: openai.F(openai.ResponseFormatJSONSchemaJSONSchemaParam{
797 | Name: openai.F("ranked_object_response"),
798 | Description: openai.F("List of ranked object IDs"),
799 | Schema: openai.F(RankedObjectResponseSchema),
800 | Strict: openai.Bool(true),
801 | }),
802 | },
803 | ),
804 | Model: openai.F(r.cfg.OpenAIModel),
805 | })
806 | cancel() // Release the context now; a defer inside this retry loop would pile up until the function returns.
807 | if err == nil {
808 |
809 | conversationHistory = append(conversationHistory,
810 | openai.AssistantMessage(completion.Choices[0].Message.Content),
811 | )
812 |
813 | err = json.Unmarshal([]byte(completion.Choices[0].Message.Content), &rankedResponse)
814 | if err != nil {
815 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Error unmarshalling response: %v\n", err))
816 | conversationHistory = append(conversationHistory,
817 | openai.UserMessage(invalidJSONStr),
818 | )
819 | trimmedContent := strings.TrimSpace(completion.Choices[0].Message.Content)
820 | log.Printf("OpenAI API response: %s", trimmedContent)
821 | continue
822 | }
823 |
824 | missingIDs, err := validateIDs(&rankedResponse, inputIDs)
825 | if err != nil {
826 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Missing IDs: [%s]", strings.Join(missingIDs, ", ")))
827 | conversationHistory = append(conversationHistory,
828 | openai.UserMessage(fmt.Sprintf(missingIDsStr, strings.Join(missingIDs, ", "))),
829 | )
830 | trimmedContent := strings.TrimSpace(completion.Choices[0].Message.Content)
831 | log.Printf("OpenAI API response: %s", trimmedContent)
832 | continue
833 | }
834 |
835 | return rankedResponse
836 | }
837 |
838 | if err == context.DeadlineExceeded {
839 | r.logFromApiCall(runNum, batchNum, "Context deadline exceeded, retrying...")
840 | time.Sleep(backoff)
841 | backoff *= 2
842 | continue
843 | }
844 |
845 | if customTransport.StatusCode == http.StatusTooManyRequests {
846 | for key, values := range customTransport.Headers {
847 | if strings.HasPrefix(key, "X-Ratelimit") {
848 | for _, value := range values {
849 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Rate limit header: %s: %s", key, value))
850 | }
851 | }
852 | }
853 |
854 | respBody := customTransport.Body
855 | if respBody == nil {
856 | r.logFromApiCall(runNum, batchNum, "Error reading response body: %v", "response body is nil")
857 | } else {
858 | r.logFromApiCall(runNum, batchNum, "Response body: %s", string(respBody))
859 | }
860 |
861 | remainingTokensStr := customTransport.Headers.Get("X-Ratelimit-Remaining-Tokens")
862 | resetTokensStr := customTransport.Headers.Get("X-Ratelimit-Reset-Tokens")
863 |
864 | remainingTokens, _ := strconv.Atoi(remainingTokensStr)
865 | resetDuration, _ := time.ParseDuration(resetTokensStr)
866 |
867 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Rate limit exceeded. Suggested wait time: %v. Remaining tokens: %d", resetDuration, remainingTokens))
868 |
869 | if resetDuration > 0 {
870 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Waiting for %v before retrying...", resetDuration))
871 | time.Sleep(resetDuration)
872 | } else {
873 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Waiting for %v before retrying...", backoff))
874 | time.Sleep(backoff)
875 | backoff *= 2
876 | }
877 | } else {
878 | log.Fatalf("Run %*d/%d, Batch %*d/%d: Unexpected error: %v", len(strconv.Itoa(r.cfg.NumRuns)), runNum, r.cfg.NumRuns, len(strconv.Itoa(r.numBatches)), batchNum, r.numBatches, err)
879 | }
880 | }
881 | }
882 |
883 | func (r *Ranker) callOllama(prompt string, runNum int, batchNum int, inputIDs map[string]bool) RankedObjectResponse {
884 |
885 | var rankedResponse RankedObjectResponse
886 |
887 | // Initialize the conversation history with the initial prompt
888 | conversationHistory := []map[string]interface{}{
889 | {"role": "user", "content": prompt},
890 | }
891 |
892 | for {
893 |
894 | requestBody, err := json.Marshal(map[string]interface{}{
895 | "model": r.cfg.OllamaModel,
896 | "stream": false,
897 | "format": "json",
898 | "options": map[string]interface{}{"num_ctx": r.cfg.BatchTokens},
899 | "messages": conversationHistory,
900 | })
901 | if err != nil {
902 | log.Fatalf("Error creating Ollama API request body: %v", err)
903 | }
904 |
905 | req, err := http.NewRequest("POST", r.cfg.OllamaAPIURL, bytes.NewReader(requestBody))
906 | if err != nil {
907 | log.Fatalf("Error creating Ollama API request: %v", err)
908 | }
909 | req.Header.Set("Content-Type", "application/json")
910 |
911 | client := &http.Client{}
912 |
913 | resp, err := client.Do(req)
914 | if err != nil {
915 | log.Fatalf("Error making request to Ollama API: %v", err)
916 | }
917 |
918 | if resp.StatusCode != http.StatusOK {
919 | body, _ := io.ReadAll(resp.Body)
920 | log.Fatalf("Ollama API returned an error: %v, body: %s", resp.StatusCode, body)
921 | }
922 |
923 | responseBody, err := io.ReadAll(resp.Body)
924 | resp.Body.Close() // Close per iteration; a defer inside this retry loop would hold every body open until return.
925 | if err != nil {
926 | log.Fatalf("Error reading Ollama API response body: %v", err)
927 | }
928 |
929 | var ollamaResponse struct {
930 | Message struct {
931 | Content string `json:"content"`
932 | } `json:"message"`
933 | }
934 |
935 | err = json.Unmarshal(responseBody, &ollamaResponse)
936 | if err != nil {
937 | log.Fatalf("Error parsing Ollama API response: %v", err)
938 | }
939 |
940 | conversationHistory = append(
941 | conversationHistory,
942 | map[string]interface{}{
943 | "role": "assistant",
944 | "content": ollamaResponse.Message.Content,
945 | },
946 | )
947 |
948 | err = json.Unmarshal([]byte(ollamaResponse.Message.Content), &rankedResponse)
949 | if err != nil {
950 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Error unmarshalling response: %v\n", err))
951 | conversationHistory = append(conversationHistory,
952 | map[string]interface{}{
953 | "role": "user",
954 | "content": invalidJSONStr,
955 | },
956 | )
957 | trimmedContent := strings.TrimSpace(ollamaResponse.Message.Content)
958 | log.Printf("Ollama API response: %s", trimmedContent)
959 | continue
960 | }
961 |
962 | missingIDs, err := validateIDs(&rankedResponse, inputIDs)
963 | if err != nil {
964 | r.logFromApiCall(runNum, batchNum, fmt.Sprintf("Missing IDs: [%s]", strings.Join(missingIDs, ", ")))
965 | conversationHistory = append(conversationHistory,
966 | map[string]interface{}{
967 | "role": "user",
968 | "content": fmt.Sprintf(missingIDsStr, strings.Join(missingIDs, ", ")),
969 | },
970 | )
971 | trimmedContent := strings.TrimSpace(ollamaResponse.Message.Content)
972 | log.Printf("Ollama API response: %s", trimmedContent)
973 | continue
974 | }
975 |
976 | return rankedResponse
977 | }
978 | }
979 |
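Note on the scoring used by shuffleBatchRank and rankObjects above: each batch response assigns a 1-based position score, and an object's final score is the mean of its positions across every batch it appeared in, with lower scores ranking higher. The following is a minimal standalone sketch of that averaging step only, under simplifying assumptions. The helpers averageRanks and sortedIDs are hypothetical names that do not exist in main.go, plain string IDs stand in for Object values, and the tie-break by ID is added here for determinism and is not something main.go does.

```go
package main

import (
	"fmt"
	"sort"
)

// averageRanks is a hypothetical helper (not part of main.go). Each inner
// slice holds one batch's IDs in ranked order, best first. An ID's score is
// the mean of its 1-based positions across every batch it appeared in.
func averageRanks(rankings [][]string) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, ranking := range rankings {
		for pos, id := range ranking {
			sums[id] += float64(pos + 1)
			counts[id]++
		}
	}
	avg := make(map[string]float64, len(sums))
	for id, sum := range sums {
		avg[id] = sum / float64(counts[id])
	}
	return avg
}

// sortedIDs returns the IDs ordered by ascending average score, so the most
// relevant item comes first. Ties are broken by ID (an assumption made here
// so the output is deterministic).
func sortedIDs(avg map[string]float64) []string {
	ids := make([]string, 0, len(avg))
	for id := range avg {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if avg[ids[i]] != avg[ids[j]] {
			return avg[ids[i]] < avg[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	// Three simulated runs over the same three items, each in a different order.
	rankings := [][]string{
		{"a", "b", "c"},
		{"b", "a", "c"},
		{"a", "c", "b"},
	}
	avg := averageRanks(rankings)
	for _, id := range sortedIDs(avg) {
		fmt.Printf("%s %.2f\n", id, avg[id])
	}
}
```

Averaging over several shuffled runs is what lets a single noisy batch ranking be outvoted: an item that lands first in most runs keeps a low mean even if one run places it poorly.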
--------------------------------------------------------------------------------
/testdata/sentences.txt:
--------------------------------------------------------------------------------
1 | A group of hikers trekked through the dense forest.
2 | He read the letter aloud with a trembling voice.
3 | A small child laughed and splashed in the puddle.
4 | The festival was filled with music and laughter.
5 | The rain tapped gently on the roof.
6 | The candlelight created a warm glow in the room.
7 | A sudden storm darkened the sky.
8 | He stacked the logs neatly by the fireplace.
9 | The chef drizzled sauce over the plated dish.
10 | He wrote a letter he would never send.
11 | The snow crunched underfoot as they walked.
12 | The library was quiet except for the rustle of pages.
13 | The fox dashed into the woods at the first sound.
14 | A rainbow appeared after the heavy rain.
15 | He found an old coin buried in the sand.
16 | The clock ticked steadily on the wall.
17 | A spider spun a web between the branches.
18 | The little boy held a balloon tightly in his hand.
19 | He opened the door and gasped in surprise.
20 | A squirrel darted across the park path.
21 | A child waved enthusiastically from the swings.
22 | A butterfly landed on her outstretched hand.
23 | He poured tea into delicate porcelain cups.
24 | The city lights sparkled from the rooftop view.
25 | The waves crashed against the rocky shore.
26 | The train whistle echoed through the valley.
27 | She knitted a scarf while sitting by the window.
28 | The farmer harvested ripe apples from the orchard.
29 | A pair of swans glided gracefully across the lake.
30 | The stage was set for the grand performance.
31 | A thunderclap startled the sleeping dog.
32 | He built a small wooden boat by hand.
33 | The book had a surprising twist at the end.
34 | The aroma of fresh bread filled the air.
35 | A lizard basked in the warm sunlight.
36 | The magician performed a trick that amazed everyone.
37 | The kite soared high in the clear blue sky.
38 | The sound of waves soothed her mind.
39 | The violinist played a hauntingly beautiful melody.
40 | A small fish swam near the surface of the pond.
41 | The painting was breathtakingly beautiful.
42 | The chocolate melted in the summer heat.
43 | She wore a bracelet made of colorful beads.
44 | A paper airplane sailed across the classroom.
45 | The old bridge creaked as they crossed it.
46 | A curious fox appeared at the edge of the forest.
47 | She discovered an old diary in the attic.
48 | He watched as the leaves fell one by one.
49 | A cool breeze swept through the meadow.
50 | The crowd erupted in cheers at the winning goal.
51 | The mountain trail was steep and challenging.
52 | The music played softly in the background.
53 | The ancient ruins stood silently in the desert.
54 | The mirror reflected an unfamiliar face.
55 | She planted flowers in her grandmother's garden.
56 | She opened the curtains to let in the morning light.
57 | The detective inspected the room carefully.
58 | A mysterious letter arrived in the mail.
59 | He whispered a secret into her ear.
60 | A small group of stars formed a familiar constellation.
61 | The bell rang, signaling the end of class.
62 | A stray dog found shelter under the porch.
63 | She placed the last puzzle piece into its spot.
64 | The chef prepared a dish with perfect precision.
65 | He walked into the room with a smile.
66 | He sketched a portrait of his best friend.
67 | The new store had an impressive display of goods.
68 | She discovered a hidden passage behind the bookshelf.
69 | He wore a red scarf on the chilly day.
70 | The ship's horn sounded as it departed the dock.
71 | The stars twinkled brightly in the clear night sky.
72 | The fire crackled softly in the hearth.
73 | The rooster crowed at the break of dawn.
74 | She balanced a tray full of dishes with ease.
75 | The candle flickered as the wind blew gently.
76 | She couldn't find her keys anywhere.
77 | She wore a dress that shimmered in the light.
78 | She decided to bake a cake from scratch.
79 | A bird chirped happily outside the window.
80 | The coffee shop was packed with customers.
81 | A stray cat followed him down the street.
82 | The cat jumped onto the windowsill.
83 | She climbed to the top of the hill to watch the sunset.
84 | The wind carried the scent of the ocean.
85 | He doodled in the margins of his notebook.
86 | She tied a ribbon around the gift box.
87 | He finished the marathon despite the pain.
88 | The lighthouse stood tall against the stormy sky.
89 | A horse galloped across the open field.
90 | The train arrived exactly on time.
91 | He spotted a shooting star while stargazing.
92 | The professor spoke passionately about the subject.
93 | The old clock chimed twelve times.
94 | Her laughter echoed through the hall.
95 | The smell of fresh paint lingered in the air.
96 | The wind howled through the abandoned house.
97 | A group of friends gathered around the campfire.
98 | She wore a hat decorated with colorful feathers.
99 | The puppy wagged its tail excitedly.
100 | A stranger handed her a flower as she walked by.
101 |
--------------------------------------------------------------------------------
/testdata/tlds.txt:
--------------------------------------------------------------------------------
1 | aaa
2 | aarp
3 | abb
4 | abbott
5 | abbvie
6 | abc
7 | able
8 | abogado
9 | abudhabi
10 | ac
11 | academy
12 | accenture
13 | accountant
14 | accountants
15 | aco
16 | actor
17 | ad
18 | ads
19 | adult
20 | ae
21 | aeg
22 | aero
23 | aetna
24 | af
25 | afl
26 | africa
27 | ag
28 | agakhan
29 | agency
30 | ai
31 | aig
32 | airbus
33 | airforce
34 | airtel
35 | akdn
36 | al
37 | alibaba
38 | alipay
39 | allfinanz
40 | allstate
41 | ally
42 | alsace
43 | alstom
44 | am
45 | amazon
46 | americanexpress
47 | americanfamily
48 | amex
49 | amfam
50 | amica
51 | amsterdam
52 | analytics
53 | android
54 | anquan
55 | anz
56 | ao
57 | aol
58 | apartments
59 | app
60 | apple
61 | aq
62 | aquarelle
63 | ar
64 | arab
65 | aramco
66 | archi
67 | army
68 | arpa
69 | art
70 | arte
71 | as
72 | asda
73 | asia
74 | associates
75 | at
76 | athleta
77 | attorney
78 | au
79 | auction
80 | audi
81 | audible
82 | audio
83 | auspost
84 | author
85 | auto
86 | autos
87 | aw
88 | aws
89 | ax
90 | axa
91 | az
92 | azure
93 | ba
94 | baby
95 | baidu
96 | banamex
97 | band
98 | bank
99 | bar
100 | barcelona
101 | barclaycard
102 | barclays
103 | barefoot
104 | bargains
105 | baseball
106 | basketball
107 | bauhaus
108 | bayern
109 | bb
110 | bbc
111 | bbt
112 | bbva
113 | bcg
114 | bcn
115 | bd
116 | be
117 | beats
118 | beauty
119 | beer
120 | bentley
121 | berlin
122 | best
123 | bestbuy
124 | bet
125 | bf
126 | bg
127 | bh
128 | bharti
129 | bi
130 | bible
131 | bid
132 | bike
133 | bing
134 | bingo
135 | bio
136 | biz
137 | bj
138 | black
139 | blackfriday
140 | blockbuster
141 | blog
142 | bloomberg
143 | blue
144 | bm
145 | bms
146 | bmw
147 | bn
148 | bnpparibas
149 | bo
150 | boats
151 | boehringer
152 | bofa
153 | bom
154 | bond
155 | boo
156 | book
157 | booking
158 | bosch
159 | bostik
160 | boston
161 | bot
162 | boutique
163 | box
164 | br
165 | bradesco
166 | bridgestone
167 | broadway
168 | broker
169 | brother
170 | brussels
171 | bs
172 | bt
173 | build
174 | builders
175 | business
176 | buy
177 | buzz
178 | bv
179 | bw
180 | by
181 | bz
182 | bzh
183 | ca
184 | cab
185 | cafe
186 | cal
187 | call
188 | calvinklein
189 | cam
190 | camera
191 | camp
192 | canon
193 | capetown
194 | capital
195 | capitalone
196 | car
197 | caravan
198 | cards
199 | care
200 | career
201 | careers
202 | cars
203 | casa
204 | case
205 | cash
206 | casino
207 | cat
208 | catering
209 | catholic
210 | cba
211 | cbn
212 | cbre
213 | cc
214 | cd
215 | center
216 | ceo
217 | cern
218 | cf
219 | cfa
220 | cfd
221 | cg
222 | ch
223 | chanel
224 | channel
225 | charity
226 | chase
227 | chat
228 | cheap
229 | chintai
230 | christmas
231 | chrome
232 | church
233 | ci
234 | cipriani
235 | circle
236 | cisco
237 | citadel
238 | citi
239 | citic
240 | city
241 | ck
242 | cl
243 | claims
244 | cleaning
245 | click
246 | clinic
247 | clinique
248 | clothing
249 | cloud
250 | club
251 | clubmed
252 | cm
253 | cn
254 | co
255 | coach
256 | codes
257 | coffee
258 | college
259 | cologne
260 | com
261 | commbank
262 | community
263 | company
264 | compare
265 | computer
266 | comsec
267 | condos
268 | construction
269 | consulting
270 | contact
271 | contractors
272 | cooking
273 | cool
274 | coop
275 | corsica
276 | country
277 | coupon
278 | coupons
279 | courses
280 | cpa
281 | cr
282 | credit
283 | creditcard
284 | creditunion
285 | cricket
286 | crown
287 | crs
288 | cruise
289 | cruises
290 | cu
291 | cuisinella
292 | cv
293 | cw
294 | cx
295 | cy
296 | cymru
297 | cyou
298 | cz
299 | dad
300 | dance
301 | data
302 | date
303 | dating
304 | datsun
305 | day
306 | dclk
307 | dds
308 | de
309 | deal
310 | dealer
311 | deals
312 | degree
313 | delivery
314 | dell
315 | deloitte
316 | delta
317 | democrat
318 | dental
319 | dentist
320 | desi
321 | design
322 | dev
323 | dhl
324 | diamonds
325 | diet
326 | digital
327 | direct
328 | directory
329 | discount
330 | discover
331 | dish
332 | diy
333 | dj
334 | dk
335 | dm
336 | dnp
337 | do
338 | docs
339 | doctor
340 | dog
341 | domains
342 | dot
343 | download
344 | drive
345 | dtv
346 | dubai
347 | dunlop
348 | dupont
349 | durban
350 | dvag
351 | dvr
352 | dz
353 | earth
354 | eat
355 | ec
356 | eco
357 | edeka
358 | edu
359 | education
360 | ee
361 | eg
362 | email
363 | emerck
364 | energy
365 | engineer
366 | engineering
367 | enterprises
368 | epson
369 | equipment
370 | er
371 | ericsson
372 | erni
373 | es
374 | esq
375 | estate
376 | et
377 | eu
378 | eurovision
379 | eus
380 | events
381 | exchange
382 | expert
383 | exposed
384 | express
385 | extraspace
386 | fage
387 | fail
388 | fairwinds
389 | faith
390 | family
391 | fan
392 | fans
393 | farm
394 | farmers
395 | fashion
396 | fast
397 | fedex
398 | feedback
399 | ferrari
400 | ferrero
401 | fi
402 | fidelity
403 | fido
404 | film
405 | final
406 | finance
407 | financial
408 | fire
409 | firestone
410 | firmdale
411 | fish
412 | fishing
413 | fit
414 | fitness
415 | fj
416 | fk
417 | flickr
418 | flights
419 | flir
420 | florist
421 | flowers
422 | fly
423 | fm
424 | fo
425 | foo
426 | food
427 | football
428 | ford
429 | forex
430 | forsale
431 | forum
432 | foundation
433 | fox
434 | fr
435 | free
436 | fresenius
437 | frl
438 | frogans
439 | frontier
440 | ftr
441 | fujitsu
442 | fun
443 | fund
444 | furniture
445 | futbol
446 | fyi
447 | ga
448 | gal
449 | gallery
450 | gallo
451 | gallup
452 | game
453 | games
454 | gap
455 | garden
456 | gay
457 | gb
458 | gbiz
459 | gd
460 | gdn
461 | ge
462 | gea
463 | gent
464 | genting
465 | george
466 | gf
467 | gg
468 | ggee
469 | gh
470 | gi
471 | gift
472 | gifts
473 | gives
474 | giving
475 | gl
476 | glass
477 | gle
478 | global
479 | globo
480 | gm
481 | gmail
482 | gmbh
483 | gmo
484 | gmx
485 | gn
486 | godaddy
487 | gold
488 | goldpoint
489 | golf
490 | goo
491 | goodyear
492 | goog
493 | google
494 | gop
495 | got
496 | gov
497 | gp
498 | gq
499 | gr
500 | grainger
501 | graphics
502 | gratis
503 | green
504 | gripe
505 | grocery
506 | group
507 | gs
508 | gt
509 | gu
510 | gucci
511 | guge
512 | guide
513 | guitars
514 | guru
515 | gw
516 | gy
517 | hair
518 | hamburg
519 | hangout
520 | haus
521 | hbo
522 | hdfc
523 | hdfcbank
524 | health
525 | healthcare
526 | help
527 | helsinki
528 | here
529 | hermes
530 | hiphop
531 | hisamitsu
532 | hitachi
533 | hiv
534 | hk
535 | hkt
536 | hm
537 | hn
538 | hockey
539 | holdings
540 | holiday
541 | homedepot
542 | homegoods
543 | homes
544 | homesense
545 | honda
546 | horse
547 | hospital
548 | host
549 | hosting
550 | hot
551 | hotels
552 | hotmail
553 | house
554 | how
555 | hr
556 | hsbc
557 | ht
558 | hu
559 | hughes
560 | hyatt
561 | hyundai
562 | ibm
563 | icbc
564 | ice
565 | icu
566 | id
567 | ie
568 | ieee
569 | ifm
570 | ikano
571 | il
572 | im
573 | imamat
574 | imdb
575 | immo
576 | immobilien
577 | in
578 | inc
579 | industries
580 | infiniti
581 | info
582 | ing
583 | ink
584 | institute
585 | insurance
586 | insure
587 | int
588 | international
589 | intuit
590 | investments
591 | io
592 | ipiranga
593 | iq
594 | ir
595 | irish
596 | is
597 | ismaili
598 | ist
599 | istanbul
600 | it
601 | itau
602 | itv
603 | jaguar
604 | java
605 | jcb
606 | je
607 | jeep
608 | jetzt
609 | jewelry
610 | jio
611 | jll
612 | jm
613 | jmp
614 | jnj
615 | jo
616 | jobs
617 | joburg
618 | jot
619 | joy
620 | jp
621 | jpmorgan
622 | jprs
623 | juegos
624 | juniper
625 | kaufen
626 | kddi
627 | ke
628 | kerryhotels
629 | kerrylogistics
630 | kerryproperties
631 | kfh
632 | kg
633 | kh
634 | ki
635 | kia
636 | kids
637 | kim
638 | kindle
639 | kitchen
640 | kiwi
641 | km
642 | kn
643 | koeln
644 | komatsu
645 | kosher
646 | kp
647 | kpmg
648 | kpn
649 | kr
650 | krd
651 | kred
652 | kuokgroup
653 | kw
654 | ky
655 | kyoto
656 | kz
657 | la
658 | lacaixa
659 | lamborghini
660 | lamer
661 | lancaster
662 | land
663 | landrover
664 | lanxess
665 | lasalle
666 | lat
667 | latino
668 | latrobe
669 | law
670 | lawyer
671 | lb
672 | lc
673 | lds
674 | lease
675 | leclerc
676 | lefrak
677 | legal
678 | lego
679 | lexus
680 | lgbt
681 | li
682 | lidl
683 | life
684 | lifeinsurance
685 | lifestyle
686 | lighting
687 | like
688 | lilly
689 | limited
690 | limo
691 | lincoln
692 | link
693 | lipsy
694 | live
695 | living
696 | lk
697 | llc
698 | llp
699 | loan
700 | loans
701 | locker
702 | locus
703 | lol
704 | london
705 | lotte
706 | lotto
707 | love
708 | lpl
709 | lplfinancial
710 | lr
711 | ls
712 | lt
713 | ltd
714 | ltda
715 | lu
716 | lundbeck
717 | luxe
718 | luxury
719 | lv
720 | ly
721 | ma
722 | madrid
723 | maif
724 | maison
725 | makeup
726 | man
727 | management
728 | mango
729 | map
730 | market
731 | marketing
732 | markets
733 | marriott
734 | marshalls
735 | mattel
736 | mba
737 | mc
738 | mckinsey
739 | md
740 | me
741 | med
742 | media
743 | meet
744 | melbourne
745 | meme
746 | memorial
747 | men
748 | menu
749 | merckmsd
750 | mg
751 | mh
752 | miami
753 | microsoft
754 | mil
755 | mini
756 | mint
757 | mit
758 | mitsubishi
759 | mk
760 | ml
761 | mlb
762 | mls
763 | mm
764 | mma
765 | mn
766 | mo
767 | mobi
768 | mobile
769 | moda
770 | moe
771 | moi
772 | mom
773 | monash
774 | money
775 | monster
776 | mormon
777 | mortgage
778 | moscow
779 | moto
780 | motorcycles
781 | mov
782 | movie
783 | mp
784 | mq
785 | mr
786 | ms
787 | msd
788 | mt
789 | mtn
790 | mtr
791 | mu
792 | museum
793 | music
794 | mv
795 | mw
796 | mx
797 | my
798 | mz
799 | na
800 | nab
801 | nagoya
802 | name
803 | navy
804 | nba
805 | nc
806 | ne
807 | nec
808 | net
809 | netbank
810 | netflix
811 | network
812 | neustar
813 | new
814 | news
815 | next
816 | nextdirect
817 | nexus
818 | nf
819 | nfl
820 | ng
821 | ngo
822 | nhk
823 | ni
824 | nico
825 | nike
826 | nikon
827 | ninja
828 | nissan
829 | nissay
830 | nl
831 | no
832 | nokia
833 | norton
834 | now
835 | nowruz
836 | nowtv
837 | np
838 | nr
839 | nra
840 | nrw
841 | ntt
842 | nu
843 | nyc
844 | nz
845 | obi
846 | observer
847 | office
848 | okinawa
849 | olayan
850 | olayangroup
851 | ollo
852 | om
853 | omega
854 | one
855 | ong
856 | onl
857 | online
858 | ooo
859 | open
860 | oracle
861 | orange
862 | org
863 | organic
864 | origins
865 | osaka
866 | otsuka
867 | ott
868 | ovh
869 | pa
870 | page
871 | panasonic
872 | paris
873 | pars
874 | partners
875 | parts
876 | party
877 | pay
878 | pccw
879 | pe
880 | pet
881 | pf
882 | pfizer
883 | pg
884 | ph
885 | pharmacy
886 | phd
887 | philips
888 | phone
889 | photo
890 | photography
891 | photos
892 | physio
893 | pics
894 | pictet
895 | pictures
896 | pid
897 | pin
898 | ping
899 | pink
900 | pioneer
901 | pizza
902 | pk
903 | pl
904 | place
905 | play
906 | playstation
907 | plumbing
908 | plus
909 | pm
910 | pn
911 | pnc
912 | pohl
913 | poker
914 | politie
915 | porn
916 | post
917 | pr
918 | pramerica
919 | praxi
920 | press
921 | prime
922 | pro
923 | prod
924 | productions
925 | prof
926 | progressive
927 | promo
928 | properties
929 | property
930 | protection
931 | pru
932 | prudential
933 | ps
934 | pt
935 | pub
936 | pw
937 | pwc
938 | py
939 | qa
940 | qpon
941 | quebec
942 | quest
943 | racing
944 | radio
945 | re
946 | read
947 | realestate
948 | realtor
949 | realty
950 | recipes
951 | red
952 | redstone
953 | redumbrella
954 | rehab
955 | reise
956 | reisen
957 | reit
958 | reliance
959 | ren
960 | rent
961 | rentals
962 | repair
963 | report
964 | republican
965 | rest
966 | restaurant
967 | review
968 | reviews
969 | rexroth
970 | rich
971 | richardli
972 | ricoh
973 | ril
974 | rio
975 | rip
976 | ro
977 | rocks
978 | rodeo
979 | rogers
980 | room
981 | rs
982 | rsvp
983 | ru
984 | rugby
985 | ruhr
986 | run
987 | rw
988 | rwe
989 | ryukyu
990 | sa
991 | saarland
992 | safe
993 | safety
994 | sakura
995 | sale
996 | salon
997 | samsclub
998 | samsung
999 | sandvik
1000 | sandvikcoromant
1001 | sanofi
1002 | sap
1003 | sarl
1004 | sas
1005 | save
1006 | saxo
1007 | sb
1008 | sbi
1009 | sbs
1010 | sc
1011 | scb
1012 | schaeffler
1013 | schmidt
1014 | scholarships
1015 | school
1016 | schule
1017 | schwarz
1018 | science
1019 | scot
1020 | sd
1021 | se
1022 | search
1023 | seat
1024 | secure
1025 | security
1026 | seek
1027 | select
1028 | sener
1029 | services
1030 | seven
1031 | sew
1032 | sex
1033 | sexy
1034 | sfr
1035 | sg
1036 | sh
1037 | shangrila
1038 | sharp
1039 | shell
1040 | shia
1041 | shiksha
1042 | shoes
1043 | shop
1044 | shopping
1045 | shouji
1046 | show
1047 | si
1048 | silk
1049 | sina
1050 | singles
1051 | site
1052 | sj
1053 | sk
1054 | ski
1055 | skin
1056 | sky
1057 | skype
1058 | sl
1059 | sling
1060 | sm
1061 | smart
1062 | smile
1063 | sn
1064 | sncf
1065 | so
1066 | soccer
1067 | social
1068 | softbank
1069 | software
1070 | sohu
1071 | solar
1072 | solutions
1073 | song
1074 | sony
1075 | soy
1076 | spa
1077 | space
1078 | sport
1079 | spot
1080 | sr
1081 | srl
1082 | ss
1083 | st
1084 | stada
1085 | staples
1086 | star
1087 | statebank
1088 | statefarm
1089 | stc
1090 | stcgroup
1091 | stockholm
1092 | storage
1093 | store
1094 | stream
1095 | studio
1096 | study
1097 | style
1098 | su
1099 | sucks
1100 | supplies
1101 | supply
1102 | support
1103 | surf
1104 | surgery
1105 | suzuki
1106 | sv
1107 | swatch
1108 | swiss
1109 | sx
1110 | sy
1111 | sydney
1112 | systems
1113 | sz
1114 | tab
1115 | taipei
1116 | talk
1117 | taobao
1118 | target
1119 | tatamotors
1120 | tatar
1121 | tattoo
1122 | tax
1123 | taxi
1124 | tc
1125 | tci
1126 | td
1127 | tdk
1128 | team
1129 | tech
1130 | technology
1131 | tel
1132 | temasek
1133 | tennis
1134 | teva
1135 | tf
1136 | tg
1137 | th
1138 | thd
1139 | theater
1140 | theatre
1141 | tiaa
1142 | tickets
1143 | tienda
1144 | tips
1145 | tires
1146 | tirol
1147 | tj
1148 | tjmaxx
1149 | tjx
1150 | tk
1151 | tkmaxx
1152 | tl
1153 | tm
1154 | tmall
1155 | tn
1156 | to
1157 | today
1158 | tokyo
1159 | tools
1160 | top
1161 | toray
1162 | toshiba
1163 | total
1164 | tours
1165 | town
1166 | toyota
1167 | toys
1168 | tr
1169 | trade
1170 | trading
1171 | training
1172 | travel
1173 | travelers
1174 | travelersinsurance
1175 | trust
1176 | trv
1177 | tt
1178 | tube
1179 | tui
1180 | tunes
1181 | tushu
1182 | tv
1183 | tvs
1184 | tw
1185 | tz
1186 | ua
1187 | ubank
1188 | ubs
1189 | ug
1190 | uk
1191 | unicom
1192 | university
1193 | uno
1194 | uol
1195 | ups
1196 | us
1197 | uy
1198 | uz
1199 | va
1200 | vacations
1201 | vana
1202 | vanguard
1203 | vc
1204 | ve
1205 | vegas
1206 | ventures
1207 | verisign
1208 | versicherung
1209 | vet
1210 | vg
1211 | vi
1212 | viajes
1213 | video
1214 | vig
1215 | viking
1216 | villas
1217 | vin
1218 | vip
1219 | virgin
1220 | visa
1221 | vision
1222 | viva
1223 | vivo
1224 | vlaanderen
1225 | vn
1226 | vodka
1227 | volvo
1228 | vote
1229 | voting
1230 | voto
1231 | voyage
1232 | vu
1233 | wales
1234 | walmart
1235 | walter
1236 | wang
1237 | wanggou
1238 | watch
1239 | watches
1240 | weather
1241 | weatherchannel
1242 | webcam
1243 | weber
1244 | website
1245 | wed
1246 | wedding
1247 | weibo
1248 | weir
1249 | wf
1250 | whoswho
1251 | wien
1252 | wiki
1253 | williamhill
1254 | win
1255 | windows
1256 | wine
1257 | winners
1258 | wme
1259 | wolterskluwer
1260 | woodside
1261 | work
1262 | works
1263 | world
1264 | wow
1265 | ws
1266 | wtc
1267 | wtf
1268 | xbox
1269 | xerox
1270 | xihuan
1271 | xin
1272 | xn--11b4c3d
1273 | xn--1ck2e1b
1274 | xn--1qqw23a
1275 | xn--2scrj9c
1276 | xn--30rr7y
1277 | xn--3bst00m
1278 | xn--3ds443g
1279 | xn--3e0b707e
1280 | xn--3hcrj9c
1281 | xn--3pxu8k
1282 | xn--42c2d9a
1283 | xn--45br5cyl
1284 | xn--45brj9c
1285 | xn--45q11c
1286 | xn--4dbrk0ce
1287 | xn--4gbrim
1288 | xn--54b7fta0cc
1289 | xn--55qw42g
1290 | xn--55qx5d
1291 | xn--5su34j936bgsg
1292 | xn--5tzm5g
1293 | xn--6frz82g
1294 | xn--6qq986b3xl
1295 | xn--80adxhks
1296 | xn--80ao21a
1297 | xn--80aqecdr1a
1298 | xn--80asehdb
1299 | xn--80aswg
1300 | xn--8y0a063a
1301 | xn--90a3ac
1302 | xn--90ae
1303 | xn--90ais
1304 | xn--9dbq2a
1305 | xn--9et52u
1306 | xn--9krt00a
1307 | xn--b4w605ferd
1308 | xn--bck1b9a5dre4c
1309 | xn--c1avg
1310 | xn--c2br7g
1311 | xn--cck2b3b
1312 | xn--cckwcxetd
1313 | xn--cg4bki
1314 | xn--clchc0ea0b2g2a9gcd
1315 | xn--czr694b
1316 | xn--czrs0t
1317 | xn--czru2d
1318 | xn--d1acj3b
1319 | xn--d1alf
1320 | xn--e1a4c
1321 | xn--eckvdtc9d
1322 | xn--efvy88h
1323 | xn--fct429k
1324 | xn--fhbei
1325 | xn--fiq228c5hs
1326 | xn--fiq64b
1327 | xn--fiqs8s
1328 | xn--fiqz9s
1329 | xn--fjq720a
1330 | xn--flw351e
1331 | xn--fpcrj9c3d
1332 | xn--fzc2c9e2c
1333 | xn--fzys8d69uvgm
1334 | xn--g2xx48c
1335 | xn--gckr3f0f
1336 | xn--gecrj9c
1337 | xn--gk3at1e
1338 | xn--h2breg3eve
1339 | xn--h2brj9c
1340 | xn--h2brj9c8c
1341 | xn--hxt814e
1342 | xn--i1b6b1a6a2e
1343 | xn--imr513n
1344 | xn--io0a7i
1345 | xn--j1aef
1346 | xn--j1amh
1347 | xn--j6w193g
1348 | xn--jlq480n2rg
1349 | xn--jvr189m
1350 | xn--kcrx77d1x4a
1351 | xn--kprw13d
1352 | xn--kpry57d
1353 | xn--kput3i
1354 | xn--l1acc
1355 | xn--lgbbat1ad8j
1356 | xn--mgb9awbf
1357 | xn--mgba3a3ejt
1358 | xn--mgba3a4f16a
1359 | xn--mgba7c0bbn0a
1360 | xn--mgbaam7a8h
1361 | xn--mgbab2bd
1362 | xn--mgbah1a3hjkrd
1363 | xn--mgbai9azgqp6j
1364 | xn--mgbayh7gpa
1365 | xn--mgbbh1a
1366 | xn--mgbbh1a71e
1367 | xn--mgbc0a9azcg
1368 | xn--mgbca7dzdo
1369 | xn--mgbcpq6gpa1a
1370 | xn--mgberp4a5d4ar
1371 | xn--mgbgu82a
1372 | xn--mgbi4ecexp
1373 | xn--mgbpl2fh
1374 | xn--mgbt3dhd
1375 | xn--mgbtx2b
1376 | xn--mgbx4cd0ab
1377 | xn--mix891f
1378 | xn--mk1bu44c
1379 | xn--mxtq1m
1380 | xn--ngbc5azd
1381 | xn--ngbe9e0a
1382 | xn--ngbrx
1383 | xn--node
1384 | xn--nqv7f
1385 | xn--nqv7fs00ema
1386 | xn--nyqy26a
1387 | xn--o3cw4h
1388 | xn--ogbpf8fl
1389 | xn--otu796d
1390 | xn--p1acf
1391 | xn--p1ai
1392 | xn--pgbs0dh
1393 | xn--pssy2u
1394 | xn--q7ce6a
1395 | xn--q9jyb4c
1396 | xn--qcka1pmc
1397 | xn--qxa6a
1398 | xn--qxam
1399 | xn--rhqv96g
1400 | xn--rovu88b
1401 | xn--rvc1e0am3e
1402 | xn--s9brj9c
1403 | xn--ses554g
1404 | xn--t60b56a
1405 | xn--tckwe
1406 | xn--tiq49xqyj
1407 | xn--unup4y
1408 | xn--vermgensberater-ctb
1409 | xn--vermgensberatung-pwb
1410 | xn--vhquv
1411 | xn--vuq861b
1412 | xn--w4r85el8fhu5dnra
1413 | xn--w4rs40l
1414 | xn--wgbh1c
1415 | xn--wgbl6a
1416 | xn--xhq521b
1417 | xn--xkc2al3hye2a
1418 | xn--xkc2dl3a5ee0h
1419 | xn--y9a3aq
1420 | xn--yfro4i67o
1421 | xn--ygbi2ammx
1422 | xn--zfr164b
1423 | xxx
1424 | xyz
1425 | yachts
1426 | yahoo
1427 | yamaxun
1428 | yandex
1429 | ye
1430 | yodobashi
1431 | yoga
1432 | yokohama
1433 | you
1434 | youtube
1435 | yt
1436 | yun
1437 | za
1438 | zappos
1439 | zara
1440 | zero
1441 | zip
1442 | zm
1443 | zone
1444 | zuerich
1445 | zw
1446 |
--------------------------------------------------------------------------------