# [CrofAI](https://ai.nahcrof.com/)
API documentation for CrofAI.
# API/SDK
CrofAI supports the OpenAI SDK for LLM inference. Python examples are shown below.
## Python (No Streaming)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.nahcrof.com/v2",
    api_key="api-key"  # your CrofAI API key
)
response = client.chat.completions.create(
    model="MODEL-FROM-LIST",  # any model name from the list below
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```
## Python (With Streaming)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.nahcrof.com/v2",
    api_key="api-key"  # your CrofAI API key
)

response = client.chat.completions.create(
    model="MODEL-FROM-LIST",  # any model name from the list below
    messages=[
        {"role": "user", "content": "Howdy there! How are you?"}
    ],
    stream=True  # enable streaming
)

# Print tokens as they arrive
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
## Python (Embedding Model)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.nahcrof.com/v2",
    api_key="api-key"  # your CrofAI API key
)

response = client.embeddings.create(
    input="The quick brown fox jumps over the lazy dog",
    model="multilingual-e5-large-instruct",
)

print("Embedding:", response.data[0].embedding[:5])  # print the first 5 values of the vector
print("Total tokens:", response.usage.total_tokens)
```

# AI models / API MODEL-NAME
Use one of the following names as the `model` parameter:
```
llama3.1-8b
llama3.3-70b
llama3.1-405b
llama3.1-tulu3-405b
deepseek-r1
deepseek-r1-turbo
deepseek-r1-0528
deepseek-v3
deepseek-v3-0324
deepseek-v3-0324-turbo
deepseek-r1-distill-llama-70b
deepseek-r1-distill-qwen-32b
qwen-qwq-32b
gemma-3-27b-it
gemma-3n-e4b-it
llama-4-scout
qwen3-235b-a22b
multilingual-e5-large-instruct
```
# LLM average speeds (tokens/second)

| Model | Average speed (tokens/second) |
| --- | --- |
| llama3.1-8b | ~50 |
| llama3.3-70b | ~30 |
| llama3.1-405b | ~30 |
| llama3.1-tulu3-405b | ~30 |
| deepseek-r1-distill-llama-70b | ~30 |
| deepseek-r1-distill-qwen-32b | ~50 |
| deepseek-r1 | ~25 |
| deepseek-r1-turbo | ~230 |
| deepseek-r1-0528 | ~30 |
| deepseek-v3 | ~35 |
| deepseek-v3-0324 | ~35 |
| deepseek-v3-0324-turbo | ~325 |
| qwen-qwq-32b | ~25 |
| gemma-3-27b-it | ~80 |
| llama-4-scout | ~65 |
| qwen3-235b-a22b | ~60 |
| multilingual-e5-large-instruct | ~75 |
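# Measuring speed yourself
Actual throughput varies with load and prompt length. Below is a minimal sketch for checking the numbers above yourself, built only on the streaming example already shown. It approximates the token count by the number of streamed chunks, assuming each chunk carries roughly one token-sized delta; that is an approximation, not an exact count.
```python
import time

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.nahcrof.com/v2",
    api_key="api-key"  # your CrofAI API key
)

start = time.monotonic()
chunks = 0
response = client.chat.completions.create(
    model="MODEL-FROM-LIST",  # any chat model from the list above
    messages=[
        {"role": "user", "content": "Write a short paragraph about llamas."}
    ],
    stream=True
)
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # each content-bearing chunk counted as ~1 token
elapsed = time.monotonic() - start

print(f"~{chunks / elapsed:.1f} tokens/second across {chunks} chunks")
```
Longer completions give a more stable estimate, since connection setup time is amortized over more tokens.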