├── readme.md
├── directory_helper.py
├── rag.py
└── data
    └── investments.txt


/readme.md:
--------------------------------------------------------------------------------
# RAG in 80 lines of code

It's simple. It's fast. It's easy. It's 80% of the way there for RAG.

For your application, RAG at 100% might still only solve 20% of your problem.

It might be worth investing energy into the part that takes things to the next level.
But it's a good start. This is just one ingredient in building effective custom LLMs.

Specifically, RAG is [step 3](https://github.com/lamini-ai/lamini-sdk/tree/main/03_RAG) in the [Lamini SDK](https://github.com/lamini-ai/lamini-sdk/tree/main). Step through best practices for building custom LLMs on open-source there.

[Get your API key (free)](https://app.lamini.ai/). Easy steps to get started and install. [Let us know](https://www.lamini.ai/contact) if you want to chat about enterprise usage.


--------------------------------------------------------------------------------
/directory_helper.py:
--------------------------------------------------------------------------------
import os


class DefaultChunker:
    def __init__(self, chunk_size=512, step_size=256):
        self.chunk_size = chunk_size
        self.step_size = step_size

    def get_chunks(self, data):
        # Slide a window of chunk_size characters forward by step_size,
        # so consecutive chunks overlap when step_size < chunk_size.
        for text in data:
            for i in range(0, len(text), self.step_size):
                max_size = min(self.chunk_size, len(text) - i)
                yield text[i:i+max_size]


class DirectoryLoader:
    def __init__(self, directory, batch_size=512, chunker=None):
        self.directory = directory
        # Build the default chunker per instance instead of in the signature
        self.chunker = chunker if chunker is not None else DefaultChunker()
        self.batch_size = batch_size

    def load(self):
        # Yield the full text of every file under the directory, recursively
        for root, dirs, files in os.walk(self.directory):
            for file in files:
                path = os.path.join(root, file)
                with open(path, 'r') as f:
                    print("Loading file: %s" % path)
                    yield f.read()

    def get_chunks(self):
        return self.chunker.get_chunks(self.load())

    def get_chunk_batches(self):
        # Group chunks into lists of batch_size; the last batch may be smaller
        chunks = []
        for chunk in self.get_chunks():
            chunks.append(chunk)
            if len(chunks) == self.batch_size:
                yield chunks
                chunks = []

        if len(chunks) > 0:
            yield chunks

    def __iter__(self):
        return self.get_chunk_batches()


--------------------------------------------------------------------------------
/rag.py:
--------------------------------------------------------------------------------
import faiss
import time
import numpy as np
from tqdm import tqdm
from lamini.api.embedding import Embedding
from lamini import Lamini

from directory_helper import DirectoryLoader


class LaminiIndex:
    def __init__(self, loader):
        self.loader = loader
        self.build_index()

    def build_index(self):
        self.content_chunks = []
        self.index = None
        for chunk_batch in tqdm(self.loader):
            embeddings = self.get_embeddings(chunk_batch)
            if self.index is None:
                # Create the index lazily, once the embedding dimension is known
                self.index = faiss.IndexFlatL2(len(embeddings[0]))
            self.index.add(embeddings)
            self.content_chunks.extend(chunk_batch)

    def get_embeddings(self, examples):
        ebd = Embedding()
        embeddings = ebd.generate(examples)
        embedding_list = [embedding[0] for embedding in embeddings]
        # FAISS operates on float32 arrays
        return np.array(embedding_list, dtype=np.float32)

    def query(self, query, k=5):
        embedding = self.get_embeddings([query])[0]
        embedding_array = np.array([embedding])
        _, indices = self.index.search(embedding_array, k)
        # FAISS pads results with -1 when fewer than k vectors are indexed
        return [self.content_chunks[i] for i in indices[0] if i != -1]


class QueryEngine:
    def __init__(self, index, k=5):
        self.index = index
        self.k = k
        self.model = Lamini(model_name="mistralai/Mistral-7B-Instruct-v0.1")

    def answer_question(self, question):
        most_similar = self.index.query(question, k=self.k)
        # Reverse so the closest chunk sits nearest the question in the prompt
        prompt = "\n".join(reversed(most_similar)) + "\n\n" + question
        print("------------------------------ Prompt ------------------------------\n" + prompt + "\n----------------------------- End Prompt -----------------------------")
        return self.model.generate("[INST]" + prompt + "[/INST]")


class RetrievalAugmentedRunner:
    def __init__(self, dir, k=5):
        self.k = k
        self.loader = DirectoryLoader(dir)

    def train(self):
        # "Training" here only builds the embedding index over the directory
        self.index = LaminiIndex(self.loader)

    def __call__(self, query):
        query_engine = QueryEngine(self.index, k=self.k)
        return query_engine.answer_question(query)


def main():
    model = RetrievalAugmentedRunner(dir="data")
    start = time.time()
    model.train()
    print("Time taken to build index: ", time.time() - start)
    while True:
        prompt = input("\n\nEnter another investment question (e.g. Have we invested in any generative AI companies in 2023?): ")
        start = time.time()
        print(model(prompt))
        print("\nTime taken: ", time.time() - start)


# Run the interactive loop only when executed as a script, not on import
if __name__ == "__main__":
    main()


--------------------------------------------------------------------------------
/data/investments.txt:
--------------------------------------------------------------------------------
In 2022, the company only invested in one AI project.

Nelson B., a former employee, invested in Hooli XYZ, a subsidiary of Hooli. The deal was finalized on Nov 1, 2022. Hooli XYZ uses generative AI to create bizarre potato cannons. The investment totaled $50,000 and a 2% equity share.

In 2023, the company invested in diverse projects.

Erlich C., a seasoned investor with a passion for environmental innovation, took a key leadership role in spearheading the seed round investment for AquaTech Dynamics. The investment negotiations, which concluded on August 8, 2022, showcased Erlich's commitment to supporting groundbreaking initiatives. The seed round proved to be a substantial boost for AquaTech Dynamics, injecting $5,000,000 into their visionary project.
In return, Erlich secured a notable 20% equity share, solidifying his belief in the company's potential. AquaTech Dynamics, headquartered in the vibrant city of Seattle, has become a hub for cutting-edge water purification technologies. Leveraging state-of-the-art research and development, the company stands at the forefront of advancements in water treatment. Their commitment to innovation directly contributes to the global movement for clean and accessible water resources, addressing critical environmental challenges and promoting sustainable practices. Erlich's strategic involvement positions AquaTech Dynamics as a frontrunner in revolutionizing water purification for a more sustainable and water-secure future.

Russe H. took the lead in orchestrating the Series A investment for Super Piped Piper in Palo Alto. The deal was finalized on March 2, 2023, securing a significant investment of $1,000,000 along with a 10% equity stake. Super Piped Piper distinguishes itself by prioritizing the responsible deployment of generative AI models, contributing to the advancement of ethical AI practices.
Beyond financial commitments, the partnership forged during this investment positions Super Piped Piper as a key player in the evolution of AI technologies, contributing not only to their own growth but also to the broader discourse on responsible and ethical AI deployment.

On a parallel trajectory, Erlich B. played a pivotal role in guiding the seed round investment for SeeFood, also situated in Palo Alto. The transaction concluded on October 1, 2023, with a substantial investment totaling $10,000,000 and a 25% equity share. SeeFood stands out for its innovative use of AI, creating engaging octopus cooking videos that can be experienced seamlessly through Oculus headsets. SeeFood was founded on the concept that having eight octopus recipes is more intriguing than having just one recipe for octopus.

Elon M.
took charge of the Series A investment for Solar Spectrum Solutions in San Francisco, reaching its conclusion on May 15, 2023. The funding totaled $2,500,000, securing a 15% equity stake. Solar Spectrum Solutions specializes in innovative solar energy solutions, emphasizing sustainability and environmental responsibility.

Peter G. led a strategic investment in Urban Mobility Innovations headquartered in Los Angeles. The investment, concluding on November 20, 2023, amounted to $8,000,000, securing a 30% equity stake. Urban Mobility Innovations is at the forefront of revolutionizing urban transportation, leveraging innovative solutions for sustainable and efficient mobility.

--------------------------------------------------------------------------------
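
The two mechanisms that `directory_helper.py` and `rag.py` rely on, overlapping character windows and exact nearest-neighbour search by L2 distance, can be tried without FAISS or an API key. The sketch below is a dependency-free illustration only, not part of the repository: `sliding_chunks`, `squared_l2`, and `top_k` are hypothetical names, and the toy 2-D vectors stand in for real embeddings.

```python
def sliding_chunks(text, chunk_size=512, step_size=256):
    """Yield overlapping windows of text, as DefaultChunker.get_chunks does."""
    for start in range(0, len(text), step_size):
        # Slicing past the end is safe in Python, so the tail chunk is shorter
        yield text[start:start + chunk_size]


def squared_l2(a, b):
    """Squared Euclidean distance, the metric a flat L2 index ranks by."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vec, vectors, k=5):
    """Brute-force search: positions of the k vectors closest to query_vec."""
    order = sorted(range(len(vectors)),
                   key=lambda i: squared_l2(query_vec, vectors[i]))
    return order[:k]


# Small sizes make the overlap visible: each chunk shares 2 characters
# with the previous one.
chunks = list(sliding_chunks("abcdefghij", chunk_size=4, step_size=2))
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']

# Toy "embeddings" (assumption: real ones come from the embedding model)
vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = top_k([0.9, 1.2], vectors, k=2)
print(nearest)  # [1, 0]
```

With the defaults (`chunk_size=512`, `step_size=256`), each chunk overlaps the previous one by half, so a sentence cut at one chunk boundary usually appears whole in the neighbouring chunk, at the cost of indexing roughly twice as much text.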