├── .gitignore
├── README.md
├── data
    ├── data1.txt
    ├── data2.txt
    └── data3.txt
├── dynamic_RAPTOR.py
├── raptor_notebook.ipynb
└── requirements.txt


/.gitignore:
--------------------------------------------------------------------------------
app
.env


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
"# RAPTOR---Advanced-Retrieval"


--------------------------------------------------------------------------------
/data/data1.txt:
--------------------------------------------------------------------------------
Step into our Tuscan-inspired sanctuary, where the golden hues of sun-kissed landscapes and the earthy aromas of olive groves and vineyards set the stage for a memorable dining experience. Tuscany, renowned for its breathtaking countryside and rich culinary heritage, offers a bounty of flavors and traditions to explore.

Begin your journey with an antipasto platter adorned with cured meats, aged cheeses, and marinated vegetables, reminiscent of leisurely gatherings under the Tuscan sun. Each bite tells a story of generations past, of farmers and artisans who have perfected their craft over centuries.

Indulge in classics like Ribollita, a hearty vegetable soup enriched with cannellini beans and Tuscan kale. This comforting dish, born out of frugality and resourcefulness, warms both body and soul, offering a taste of Tuscan tradition with every spoonful.

Alternatively, savor the simplicity of Cacciucco, a rustic seafood stew brimming with the bounty of the Mediterranean. With its roots firmly planted in coastal communities, this dish celebrates the region's abundant seafood while paying homage to the fishermen who have sustained generations with their daily catch.

As you dine, let the flavors of Tuscany transport you to a simpler time, where meals were savored slowly and each ingredient was cherished for its role in creating culinary magic. From the fields to the table, every aspect of Tuscan cuisine speaks to a deep connection to the land and a reverence for the ingredients that define it.

Soak in the essence of Tuscany as you sip on a glass of Chianti, the region's iconic red wine, and let the spirit of la dolce vita wash over you. At Bella Vista, our Tuscan-inspired menu invites you to experience the true essence of Italian hospitality, where every dish is a celebration of tradition, flavor, and the timeless beauty of Tuscany.

Chapter 2: Flavors of Sicily

Transport yourself to the vibrant island of Sicily, where the mingling of diverse cultures has birthed a culinary landscape as bold and colorful as its landscapes. From the bustling markets of Palermo to the sun-drenched shores of Taormina, Sicily offers a feast for the senses unlike any other.

Dive into a sea of flavors with Arancini, golden orbs of saffron-infused risotto stuffed with savory ragù and molten cheese. These indulgent delights, often enjoyed as street food or appetizers, showcase Sicily's love affair with rice and its penchant for bold, robust flavors.

Alternatively, sample the delights of Caponata, a sweet and sour eggplant relish bursting with tomatoes, olives, and capers. This versatile dish, with its roots in Arab cuisine, reflects Sicily's rich history of culinary conquests, where each bite tells a story of cultural exchange and innovation.

For mains, indulge in Pasta alla Norma, a Sicilian classic marrying silky eggplant, rich tomato sauce, and salty ricotta salata. Named after Bellini's famous opera, this dish is a tribute to Sicily's artistic soul, where food and culture intertwine in a symphony of flavors.

Alternatively, savor the robust flavors of Swordfish Involtini, tender fish rolls filled with breadcrumbs, pine nuts, and raisins, bathed in a fragrant tomato sauce. This dish, with its blend of sweet and savory elements, captures the essence of Sicily's coastal cuisine, where seafood reigns supreme.

As you dine, let the flavors of Sicily transport you to a land of ancient ruins and vibrant cities, where the past and present collide in a kaleidoscope of flavor and tradition. At Bella Vista, our Sicilian-inspired menu invites you to experience the bold, colorful cuisine of this enchanting island, where every dish is a celebration of Sicilian spirit and hospitality.

Chapter 3: Venetian Elegance

Embrace the romance and charm of Venice with our menu inspired by the city's culinary heritage and maritime traditions. From the labyrinthine streets of the historic center to the tranquil waters of the Grand Canal, Venice offers a feast for the senses unlike any other.

Start your culinary voyage with Cicchetti, bite-sized Venetian snacks bursting with flavor, from creamy bacalà mantecato to delicate baccalà fritters. These small but flavorful bites, often enjoyed with a glass of local wine, showcase Venice's love affair with seafood and its talent for transforming simple ingredients into culinary masterpieces.

For a taste of the sea, delve into Risotto al Nero di Seppia, a luxurious black squid ink risotto crowned with plump shrimp and tender calamari. This iconic Venetian dish, with its dramatic color and rich, briny flavor, captures the essence of the city's maritime heritage, where seafood has always played a central role in the local cuisine.

Alternatively, indulge in Risi e Bisi, a comforting risotto enriched with sweet peas and savory pancetta. This simple yet satisfying dish, often enjoyed during the spring months, celebrates the bounty of the Venetian lagoon and the region's rich agricultural heritage.

Complete your Venetian feast with a decadent slice of Tiramisu, the quintessential Italian dessert that translates to "pick me up." This heavenly confection, with its layers of espresso-soaked ladyfingers, creamy mascarpone, and dusting of cocoa powder, is a fitting finale to a meal that lifts the spirits and satisfies the soul.

As you dine, let the flavors of Venice transport you to a city of timeless beauty and unparalleled elegance, where each dish is a testament to the ingenuity and creativity of Venetian cuisine. At Bella Vista, our Venetian-inspired menu invites you to experience the magic of Venice, where every bite is a journey through the city's rich culinary heritage and maritime traditions.

Chapter 4: Roman Revival

In the eternal city of Rome, ancient traditions blend seamlessly with modern flair, resulting in a culinary landscape that's as timeless as it is tantalizing. From the majestic ruins of the Colosseum to the bustling streets of Trastevere, Rome offers a feast for the senses unlike any other.

Begin your Roman odyssey with Supplì al Telefono, golden rice croquettes oozing with molten mozzarella. These irresistible snacks, often enjoyed as street food or appetizers, showcase Rome's love affair with rice and its talent for transforming humble ingredients into culinary delights.

Alternatively, savor the simplicity of Carciofi alla Romana, tender artichokes simmered in fragrant herbs and olive oil. This classic Roman dish, with its delicate flavors and melt-in-your-mouth texture, is a testament to the city's rich culinary heritage and its love affair with fresh, seasonal produce.

For a taste of la dolce vita, feast on Cacio e Pepe, a decadent pasta dish featuring al dente spaghetti coated in a velvety sauce of Pecorino Romano and black pepper. This iconic Roman dish, with its bold flavors and comforting simplicity, captures the essence of Roman cuisine, where the beauty lies in the purity of the ingredients and the skill of the chef.

Alternatively, indulge in Saltimbocca alla Romana, tender veal cutlets wrapped in prosciutto and sage, sautéed to perfection in a white wine sauce. This iconic Roman dish, with its bold flavors and elegant presentation, is a testament to the city's rich culinary heritage and its love affair with the art of cooking.

As you dine, let the flavors of Rome transport you to a city of ancient wonders and modern marvels, where each dish is a celebration of tradition, flavor, and the eternal spirit of la dolce vita. At Bella Vista, our Roman-inspired menu invites you to experience the magic of Rome, where every bite is a journey through the city's rich culinary heritage and timeless traditions.

Chapter 5: Neapolitan Soul

Journey to the sun-drenched shores of Naples, where the air is filled with the intoxicating scent of wood-fired pizzas and the joyful strains of Neapolitan melodies. From the bustling streets of Spaccanapoli to the tranquil waters of the Bay of Naples, Naples offers a feast for the senses unlike any other.

Begin your culinary escapade with a selection of Antipasti Napoletani, featuring delights like fried calamari, stuffed peppers, and marinated anchovies. These vibrant and flavorful starters, often enjoyed with a glass of local wine, showcase Naples' love affair with seafood and its talent for transforming simple ingredients into culinary masterpieces.

For mains, delight your palate with Pizza Margherita, the iconic Neapolitan pie adorned with vibrant tomatoes, creamy mozzarella, and fragrant basil. This classic Neapolitan dish, with its crispy crust and melty cheese, captures the essence of Naples' culinary heritage and its love affair with pizza.

Alternatively, indulge in Spaghetti alla Puttanesca, a fiery pasta dish bursting with the bold flavors of olives, capers, and anchovies. This iconic Neapolitan dish, with its robust flavors and vibrant colors, is a testament to the city's rich culinary heritage and its love affair with bold, Mediterranean flavors.

End your meal on a sweet note with a slice of Pastiera Napoletana, a heavenly ricotta and wheat pie perfumed with orange blossom water and adorned with candied citrus. This iconic Neapolitan dessert, with its delicate flavors and rich textures, is a fitting finale to a meal that celebrates the soulful flavors of Naples.

As you dine, let the flavors of Naples transport you to a city of vibrant colors and rich flavors, where each dish is a celebration of tradition, flavor, and the soulful spirit of Neapolitan cuisine. At Bella Vista, our Neapolitan-inspired menu invites you to experience the magic of Naples, where every bite is a journey through the city's rich culinary heritage and joyful traditions.


--------------------------------------------------------------------------------
/data/data2.txt:
--------------------------------------------------------------------------------
In the bustling streets of Naples, where the aroma of freshly baked pizza mingles with the salty breeze of the Mediterranean, Giovanni Di Napoli was born into a world steeped in culinary tradition. From an early age, Giovanni's passion for food was ignited by his Nonna Rosa, whose rustic kitchen served as the heart of their family gatherings, filled with laughter, love, and the irresistible scent of simmering sauces.

Giovanni's upbringing was deeply rooted in the rich tapestry of Neapolitan culture, where food wasn't merely sustenance but a sacred ritual—a celebration of life itself. His Nonna's culinary prowess, coupled with the vibrant markets of Naples, instilled in him a profound appreciation for fresh, locally sourced ingredients and the time-honored techniques that transformed them into culinary masterpieces.

Chapter 1: Roots of Inspiration

Born into a family of artisans and craftsmen, Giovanni's childhood was a symphony of sights, sounds, and flavors—a vibrant mosaic of Neapolitan culture that shaped his culinary journey from the very beginning. From helping his Nonna knead dough for homemade pasta to accompanying his Nonno to the bustling markets of Naples, Giovanni's early years were steeped in the traditions and rituals that would later define his approach to cooking.

Guided by his Nonna's gentle hand and unwavering passion, Giovanni learned the art of cooking not from textbooks or formal training but from the heart—a lesson in flavor, technique, and above all, love. In his Nonna's kitchen, he discovered the magic of simple, honest food made with care and attention to detail—a philosophy that would become the cornerstone of his culinary identity.

Chapter 2: A Journey Begins

At the tender age of fifteen, Giovanni embarked on a culinary odyssey that would take him from the cobblestone streets of Naples to the bustling kitchens of Rome, Florence, and beyond. Driven by a relentless thirst for knowledge and a burning passion for his craft, Giovanni immersed himself in the rich tapestry of Italian cuisine, absorbing the secrets of pasta-making, sauce-simmering, and dough-kneading with unwavering dedication.

Under the tutelage of master chefs and seasoned artisans, Giovanni honed his skills, refining his palate and expanding his culinary repertoire with each new experience. From the art of crafting perfect risotto to the delicate balance of flavors in a traditional ragù, Giovanni embraced the diversity and complexity of Italian cuisine, weaving together a tapestry of flavors and techniques that would later define his signature style.

Chapter 3: The Birth of Bella Vista

Armed with a vision and a burning passion to share the flavors of his homeland with the world, Giovanni set out to create Bella Vista—a culinary sanctuary where the soulful flavors of Italy could be experienced in all their glory. Nestled in the heart of a bustling city, Bella Vista quickly became a beacon of authenticity, drawing food lovers from far and wide with its tantalizing aromas, warm hospitality, and unapologetically Italian ambiance.

For Giovanni, Bella Vista wasn't just a restaurant—it was a labor of love, a tribute to his roots, and a testament to the power of food to bring people together. From the carefully curated menu to the warm, inviting atmosphere, every aspect of Bella Vista reflected Giovanni's unwavering commitment to excellence and his deep reverence for the culinary traditions that had shaped him.

Chapter 4: The Art of Hospitality

For Giovanni, hospitality isn't just about serving food; it's about creating an experience that nourishes the body, delights the senses, and touches the soul. From the moment guests step through the doors of Bella Vista, they are enveloped in a warm embrace of Italian hospitality, welcomed as family and invited to savor the simple pleasures of good food, great wine, and lively conversation.

At Bella Vista, hospitality is more than just a job—it's a way of life, a philosophy that permeates every aspect of the dining experience. From the friendly smiles of the staff to the meticulous attention to detail in every dish, Giovanni and his team strive to create an atmosphere of warmth, comfort, and joy, where guests can relax, unwind, and savor the moment.

Chapter 5: The Culinary Visionary

As the driving force behind Bella Vista, Giovanni is not content to simply rest on his laurels; he is constantly pushing the boundaries of Italian cuisine, exploring new flavors, techniques, and ingredients to keep his menu fresh, exciting, and relevant. From seasonal specials inspired by the bounty of the earth to innovative reinterpretations of classic dishes, Giovanni's culinary creations are a testament to his creativity, passion, and unwavering commitment to excellence.

For Giovanni, cooking is more than just a job—it's an art form, a creative expression of his love for food and his deep connection to his heritage. Whether he's experimenting with new flavor combinations or putting his own spin on traditional recipes, Giovanni approaches each dish with a sense of curiosity, creativity, and reverence for the ingredients that inspire him.

Chapter 6: A Community of Food Lovers

Beyond the walls of Bella Vista, Giovanni is deeply committed to giving back to the community that has embraced him with open arms. Whether through charitable events, culinary workshops, or partnerships with local farmers and artisans, Giovanni is dedicated to supporting and nurturing the vibrant tapestry of food culture that thrives in his city, ensuring that future generations will continue to be inspired by the flavors, traditions, and spirit of Italy.

For Giovanni, food isn't just about sustenance—it's about connection, community, and shared experiences. Whether he's hosting a cooking class for aspiring chefs or organizing a fundraiser for a local charity, Giovanni believes in the power of food to bring people together, bridge divides, and create lasting memories that transcend language, culture, and background.

Chapter 7: Legacy of Love

As Giovanni looks back on his journey from the streets of Naples to the helm of Bella Vista, he is filled with gratitude for the experiences, the mentors, and the moments that have shaped him into the chef and the man he is today. But above all, Giovanni's greatest source of pride is the legacy he is building for his own family, passing down the traditions, the recipes, and the values that have been cherished for generations, ensuring that the flame of Italian cuisine will continue to burn bright for years to come.

For Giovanni, Bella Vista isn't just a restaurant—it's a legacy, a testament to his love for food, family, and the rich tapestry of Italian culture. Whether he's sharing a meal with loved ones or serving guests at his restaurant, Giovanni's passion for cooking is a reflection of his deep-seated belief in the power of food to nourish the body, uplift the spirit, and bring people together in celebration of life's simple pleasures.


--------------------------------------------------------------------------------
/data/data3.txt:
--------------------------------------------------------------------------------
Introduction:
Welcome to Bella Vista, where every dish is a love letter to the vibrant flavors, fresh ingredients, and time-honored traditions of Italian cuisine. At the heart of our culinary philosophy lies a deep reverence for the simple yet sublime beauty of the Mediterranean diet, which celebrates the natural bounty of the land and sea, and the art of transforming humble ingredients into extraordinary culinary creations. Join us as we embark on a gastronomic journey through the diverse regions of Italy, from the sun-drenched shores of Sicily to the verdant hills of Tuscany, to discover the soul-stirring flavors that have captivated palates for centuries.

Chapter 1: The Essence of Italian Cuisine
Italian cuisine is a celebration of simplicity, freshness, and flavor, where every dish is a reflection of the land, the culture, and the people who create it. At Bella Vista, we embrace this philosophy wholeheartedly, sourcing the finest seasonal ingredients from local farmers, fishermen, and artisans, and allowing their natural flavors to shine through in every bite. From ripe tomatoes bursting with sun-ripened sweetness to fragrant basil leaves picked at the peak of freshness, our kitchen is a symphony of colors, aromas, and textures that pays homage to the rich culinary heritage of Italy.

Chapter 2: The Heart of the Mediterranean
At the heart of Italian cuisine lies the Mediterranean diet, a way of eating that emphasizes whole grains, fresh fruits and vegetables, lean proteins, and healthy fats, all of which are abundant in the sun-drenched regions of southern Europe. Olive oil, the cornerstone of Mediterranean cooking, infuses our dishes with its golden hue and fruity flavor, while herbs like oregano, rosemary, and thyme add depth and complexity to our sauces and marinades. Seafood, another staple of the Mediterranean diet, features prominently on our menu, from delicate crudo plates adorned with citrus and herbs to hearty seafood stews simmered in rich tomato broth.

Chapter 3: The Art of Pasta Making
No discussion of Italian cuisine would be complete without paying homage to pasta, the beloved staple that has been nourishing Italians for centuries. At Bella Vista, we take pride in our pasta-making traditions, crafting each batch of dough with care and precision to achieve the perfect texture and flavor. From silky strands of spaghetti to delicate ravioli parcels, our pasta dishes are a testament to the artistry and craftsmanship of our chefs, who draw inspiration from the diverse regions of Italy to create dishes that are both timeless and innovative.

Chapter 4: From Field to Fork
At Bella Vista, we believe that the quality of our ingredients is paramount to the success of our dishes, which is why we go to great lengths to source the freshest, most flavorful produce available. Our tomatoes are handpicked at the peak of ripeness from local farms, our cheeses are crafted with care by artisanal cheesemakers, and our meats are sourced from trusted suppliers who share our commitment to quality and sustainability. By supporting local farmers and producers, we not only ensure the integrity of our ingredients but also contribute to the vibrant food culture that thrives in our community.

Chapter 5: A Symphony of Flavors
Italian cuisine is renowned for its bold and vibrant flavors, which are achieved through the careful balance of sweet, sour, salty, and savory elements. At Bella Vista, our chefs draw on a rich palette of ingredients and seasonings to create dishes that are as satisfying as they are memorable. From the zesty tang of citrus and the peppery bite of arugula to the umami richness of aged Parmesan and the subtle sweetness of roasted garlic, each component plays a vital role in creating dishes that are layered, complex, and utterly irresistible.

Chapter 6: The Power of Tradition
While we take pride in our culinary creativity and innovation, we also recognize the importance of honoring the time-honored traditions that have been passed down through generations of Italian cooks. At Bella Vista, we pay homage to these traditions by preserving classic recipes and techniques, from slow-simmered ragùs and hand-rolled gnocchi to wood-fired pizzas and artisanal gelato. By staying true to our roots and embracing the wisdom of the past, we ensure that every dish we serve is a tribute to the rich culinary heritage of Italy.

Chapter 7: A Feast for the Senses
Italian cuisine is more than just food; it's a feast for the senses that engages sight, smell, taste, and touch in equal measure. At Bella Vista, we believe that dining should be a multisensory experience that stimulates the senses and nourishes the soul. From the sizzle of garlic hitting hot olive oil to the aroma of freshly baked bread wafting from the oven, every moment in our kitchen is infused with passion, creativity, and a deep appreciation for the art of cooking. Join us as we embark on a culinary journey that delights the senses and transports you to the sun-drenched shores of Italy.

Chapter 8: The Joy of Sharing
At Bella Vista, we believe that food has the power to bring people together, to forge connections, and to create lasting memories. Whether you're sharing a leisurely meal with loved ones, savoring a romantic dinner for two, or simply enjoying a quiet moment of solitude, our dishes are meant to be shared, savored, and celebrated. Join us at Bella Vista, where every meal is an opportunity to come together, to break bread, and to revel in the simple pleasures of good food, good company, and the joy of shared experiences.

--------------------------------------------------------------------------------
/dynamic_RAPTOR.py:
--------------------------------------------------------------------------------
import numpy as np
import umap
import pandas as pd
from sklearn.mixture import GaussianMixture
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
from dotenv import find_dotenv, load_dotenv
import tiktoken

# Load environment variables (e.g. the OpenAI API key) from a .env file.
load_dotenv(find_dotenv())


class TextClusterSummarizer:
    """Repeatedly embed, cluster, and summarize text chunks until one cluster remains."""

    def __init__(
        self,
        token_limit,
        data_directory,
        glob_pattern="**/*.txt",
    ):
        print("Initializing TextClusterSummarizer...")
        self.token_limit = token_limit
        self.loader = DirectoryLoader(data_directory, glob=glob_pattern)
        self.text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=200,
            chunk_overlap=20,
            length_function=len,
            is_separator_regex=False,
        )
        self.embedding_model = OpenAIEmbeddings()
        self.chat_model = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
        self.iteration_summaries = []

    def load_and_split_documents(self):
        print("Loading and splitting documents...")
        docs = self.loader.load()
        return self.text_splitter.split_documents(docs)

    def embed_texts(self, texts):
        # One embedding request per text.
        print("Embedding texts...")
        return [self.embedding_model.embed_query(txt) for txt in texts]

    def reduce_dimensions(self, embeddings, dim, n_neighbors=None, metric="cosine"):
        print(f"Reducing dimensions to {dim}...")
        if n_neighbors is None:
            n_neighbors = int((len(embeddings) - 1) ** 0.5)
        return umap.UMAP(
            n_neighbors=n_neighbors, n_components=dim, metric=metric
        ).fit_transform(embeddings)

    def num_tokens_from_string(self, string: str) -> int:
        """Returns the number of tokens in a text string."""
        encoding = tiktoken.get_encoding("cl100k_base")
        num_tokens = len(encoding.encode(string))
        return num_tokens

    def cluster_embeddings(self, embeddings, threshold, random_state=0):
        # Soft clustering: each point gets every cluster whose posterior
        # probability exceeds the threshold (possibly none).
        print("Clustering embeddings...")
        n_clusters = self.get_optimal_clusters(embeddings)
        gm = GaussianMixture(n_components=n_clusters, random_state=random_state).fit(
            embeddings
        )
        probs = gm.predict_proba(embeddings)
        return [np.where(prob > threshold)[0] for prob in probs], n_clusters

    def get_optimal_clusters(self, embeddings, max_clusters=50, random_state=1234):
        # Pick the component count with the lowest BIC.
        print("Calculating optimal number of clusters...")
        max_clusters = min(max_clusters, len(embeddings))
        bics = [
            GaussianMixture(n_components=n, random_state=random_state)
            .fit(embeddings)
            .bic(embeddings)
            for n in range(1, max_clusters)
        ]
        # bics[i] belongs to n = i + 1 components.
        optimal = int(np.argmin(bics)) + 1
        print(f"Optimal number of clusters: {optimal}")
        return optimal

    def format_cluster_texts(self, df):
        print("Formatting cluster texts...")
        clustered_texts = {}
        for cluster in df["Cluster"].unique():
            cluster_texts = df[df["Cluster"] == cluster]["Text"].tolist()
            clustered_texts[cluster] = " --- ".join(cluster_texts)
        return clustered_texts

    def generate_summaries(self, texts):
        print("Generating summaries...")
        template = """You are an assistant to create a detailed summary of the text input provided.
        Text:
        {text}
        """
        prompt = ChatPromptTemplate.from_template(template)
        chain = prompt | self.chat_model | StrOutputParser()

        summaries = {}
        for cluster, text in texts.items():
            token_count = self.num_tokens_from_string(text)

            if token_count > self.token_limit:
                raise ValueError(
                    f"Token limit exceeded for cluster {cluster} with {token_count} tokens. Unable to generate summary."
                )

            summary = chain.invoke({"text": text})
            summaries[cluster] = summary
        return summaries

    def run(self):
        print("Running TextClusterSummarizer...")
        docs = self.load_and_split_documents()
        texts = [doc.page_content for doc in docs]
        all_summaries = texts

        iteration = 1

        # Iteration 0 records the raw chunks before any summarization.
        self.iteration_summaries.append(
            {"iteration": 0, "texts": texts, "summaries": []}
        )

        while True:
            print(f"Iteration {iteration}")
            embeddings = self.embed_texts(all_summaries)

            # Need enough neighbours for UMAP
            n_neighbors = min(int((len(embeddings) - 1) ** 0.5), len(embeddings) - 1)
            if n_neighbors < 2:
                print("Not enough data points for UMAP reduction. Stopping iterations.")
                break

            embeddings_reduced = self.reduce_dimensions(
                embeddings, dim=2, n_neighbors=n_neighbors
            )
            labels, num_clusters = self.cluster_embeddings(
                embeddings_reduced, threshold=0.5
            )

            if num_clusters == 1:
                print("Reduced to a single cluster. Stopping iterations.")
                break

            # Keep the first qualifying cluster per point; -1 marks points
            # with no cluster above the threshold.
            simple_labels = [label[0] if len(label) > 0 else -1 for label in labels]
            df = pd.DataFrame(
                {
                    "Text": all_summaries,
                    "Embedding": list(embeddings_reduced),
                    "Cluster": simple_labels,
                }
            )

            clustered_texts = self.format_cluster_texts(df)
            summaries = self.generate_summaries(clustered_texts)

            # The summaries become the input texts of the next iteration.
            all_summaries = list(summaries.values())
            self.iteration_summaries.append(
                {
                    "iteration": iteration,
                    "texts": all_summaries,
                    "summaries": list(summaries.values()),
                }
            )
            iteration += 1

        final_summary = all_summaries[0] if all_summaries else ""
        return {
            "initial_texts": texts,
            "iteration_summaries": self.iteration_summaries,
            "final_summary": final_summary,
        }


### Run code
if __name__ == "__main__":
    summarizer = TextClusterSummarizer(token_limit=200, data_directory="data")
    final_output = summarizer.run()


--------------------------------------------------------------------------------
/raptor_notebook.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import DirectoryLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DirectoryLoader('data', glob=\"**/*.txt\")\n",
    "docs = loader.load()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_text_splitters import RecursiveCharacterTextSplitter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "text_splitter = RecursiveCharacterTextSplitter(\n",
    "    chunk_size=200,\n",
    "    chunk_overlap=20,\n",
    "    length_function=len,\n",
    "    is_separator_regex=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = text_splitter.split_documents(docs)\n",
    "texts = [doc.page_content for doc in docs]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tiktoken\n",
    "\n",
    "def num_tokens_from_string(string: str) -> int:\n",
    "    \"\"\"Returns the number of tokens in a text string.\"\"\"\n",
    "    encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "    num_tokens = len(encoding.encode(string))\n",
    "    return num_tokens"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "counts = [num_tokens_from_string(t) for t in texts]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.hist(counts, bins=30, color=\"blue\", edgecolor=\"black\", alpha=0.7)\n",
    "plt.title(\"Histogram of Token Counts\")\n",
    "plt.xlabel(\"Token Count\")\n",
    "plt.ylabel(\"Frequency\")\n",
    "plt.grid(axis=\"y\", alpha=0.75)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d_sorted = sorted(docs, key=lambda x: x.metadata[\"source\"])\n",
    "d_reversed = list(reversed(d_sorted))\n",
    "concatenated_content = \"\\n\\n\\n --- \\n\\n\\n\".join(\n",
    "    [doc.page_content for doc in d_reversed]\n",
    ")\n",
    "print(\n",
    "    \"Num tokens in all context: %s\"\n",
    "    % num_tokens_from_string(concatenated_content)\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_openai import OpenAIEmbeddings, ChatOpenAI\n",
    "\n",
    "embedding_model = OpenAIEmbeddings()\n",
    "model = ChatOpenAI(temperature=0, model=\"gpt-3.5-turbo\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "global_embeddings = [embedding_model.embed_query(txt) for txt in texts]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "global_embeddings[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from typing import Optional\n",
    "import numpy as np\n",
    "import umap\n",
    "\n",
    "def reduce_cluster_embeddings(\n",
    "    embeddings: np.ndarray,\n",
    "    dim: int,\n",
    "    n_neighbors: Optional[int] = None,\n",
    "    metric: str = \"cosine\",\n",
    ") -> np.ndarray:\n",
    "    if n_neighbors is None:\n",
    "        n_neighbors = int((len(embeddings) - 1) ** 0.5)\n",
    "    return umap.UMAP(\n",
    "        n_neighbors=n_neighbors, n_components=dim, metric=metric\n",
    "    ).fit_transform(embeddings)\n",
    "\n",
    "\n",
    "dim = 2\n",
    "global_embeddings_reduced = reduce_cluster_embeddings(global_embeddings, dim)\n",
    "global_embeddings_reduced[0]\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.figure(figsize=(10, 8))\n",
    "plt.scatter(global_embeddings_reduced[:, 0], global_embeddings_reduced[:, 1], alpha=0.5)\n",
    "plt.title(\"Global Embeddings\")\n",
    "plt.xlabel(\"Dimension 1\")\n",
    "plt.ylabel(\"Dimension 2\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "from sklearn.mixture import GaussianMixture\n",
    "\n",
    "def get_optimal_clusters(embeddings: np.ndarray, max_clusters: int = 50, random_state: int = 1234):\n",
    "    max_clusters = min(max_clusters, len(embeddings))\n",
    "    bics = [GaussianMixture(n_components=n, random_state=random_state).fit(embeddings).bic(embeddings)\n",
    "            for n in range(1, max_clusters)]\n",
    "    return np.argmin(bics) + 1\n",
    "\n",
    "def gmm_clustering(embeddings: np.ndarray, threshold: float, random_state: int = 0):\n",
    "    n_clusters = get_optimal_clusters(embeddings)\n",
    "    gm = GaussianMixture(n_components=n_clusters, random_state=random_state).fit(embeddings)\n",
    "    probs = gm.predict_proba(embeddings)\n",
    "    labels = [np.where(prob > threshold)[0] for prob in probs]\n",
    "    return labels, n_clusters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "labels, _ = gmm_clustering(global_embeddings_reduced, threshold=0.5)\n",
    "\n",
    "plot_labels = np.array([label[0] if len(label) > 0 else -1 for label in labels])\n",
    "plt.figure(figsize=(10, 8))\n",
    "\n",
    "unique_labels = np.unique(plot_labels)\n",
    "colors = plt.cm.rainbow(np.linspace(0, 1, len(unique_labels)))\n",
    "\n",
    "for label, color in zip(unique_labels, colors):\n",
    "    mask = plot_labels == label\n",
    "    plt.scatter(global_embeddings_reduced[mask, 0], global_embeddings_reduced[mask, 1], color=color, label=f'Cluster {label}', alpha=0.5)\n",
"\n", 229 | "plt.title(\"Cluster Visualization of Global Embeddings\")\n", 230 | "plt.xlabel(\"Dimension 1\")\n", 231 | "plt.ylabel(\"Dimension 2\")\n", 232 | "plt.legend()\n", 233 | "plt.show()" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "import pandas as pd\n", 243 | "\n", 244 | "simple_labels = [label[0] if len(label) > 0 else -1 for label in labels]\n", 245 | "\n", 246 | "df = pd.DataFrame({\n", 247 | " 'Text': texts,\n", 248 | " 'Embedding': list(global_embeddings_reduced),\n", 249 | " 'Cluster': simple_labels\n", 250 | "})\n", 251 | "print(df.head(3))" 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "def format_cluster_texts(df):\n", 261 | " clustered_texts = {}\n", 262 | " for cluster in df['Cluster'].unique():\n", 263 | " cluster_texts = df[df['Cluster'] == cluster]['Text'].tolist()\n", 264 | " clustered_texts[cluster] = \" --- \".join(cluster_texts)\n", 265 | " return clustered_texts" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [ 274 | "clustered_texts = format_cluster_texts(df)" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": null, 280 | "metadata": {}, 281 | "outputs": [], 282 | "source": [ 283 | "clustered_texts" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [ 292 | "from langchain.prompts import ChatPromptTemplate\n", 293 | "from langchain_core.output_parsers import StrOutputParser\n", 294 | "\n", 295 | "template = \"\"\"You are an assistant to create a detailed summary of the text input provided.\n", 296 | "Text:\n", 297 | "{text}\n", 298 | "\"\"\"\n", 299 | "prompt = ChatPromptTemplate.from_template(template)\n", 300 | "chain 
= prompt | model | StrOutputParser()" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": null, 306 | "metadata": {}, 307 | "outputs": [], 308 | "source": [ 309 | "summaries = {}\n", 310 | "for cluster, text in clustered_texts.items():\n", 311 | " summary = chain.invoke({\"text\": text})\n", 312 | " summaries[cluster] = summary\n" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "summaries" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": null, 327 | "metadata": {}, 328 | "outputs": [], 329 | "source": [ 330 | "embedded_summaries = [embedding_model.embed_query(summary) for summary in summaries.values()]\n", 331 | "\n", 332 | "embedded_summaries_np = np.array(embedded_summaries)\n", 333 | "\n", 334 | "labels, _ = gmm_clustering(embedded_summaries_np, threshold=0.5)\n", 335 | "\n", 336 | "simple_labels = [label[0] if len(label) > 0 else -1 for label in labels]" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "clustered_summaries = {}\n", 346 | "for i, label in enumerate(simple_labels):\n", 347 | " if label not in clustered_summaries:\n", 348 | " clustered_summaries[label] = []\n", 349 | " clustered_summaries[label].append(list(summaries.values())[i])" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "clustered_summaries" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": null, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "final_summaries = {}\n", 368 | "for cluster, texts in clustered_summaries.items():\n", 369 | " combined_text = ' '.join(texts)\n", 370 | " summary = chain.invoke({\"text\": combined_text})\n", 371 | " final_summaries[cluster] = summary" 372 | ] 373 | 
}, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [ 380 | "final_summaries" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": {}, 387 | "outputs": [], 388 | "source": [ 389 | "texts_from_df = df['Text'].tolist()\n", 390 | "texts_from_clustered_texts = list(clustered_texts.values())\n", 391 | "texts_from_final_summaries = list(final_summaries.values())\n", 392 | "\n", 393 | "combined_texts = texts_from_df + texts_from_clustered_texts + texts_from_final_summaries" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": null, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "from langchain_community.vectorstores import Chroma\n", 403 | "\n", 404 | "# Now, use combined_texts to build the vectorstore with Chroma\n", 405 | "vectorstore = Chroma.from_texts(texts=combined_texts, embedding=embedding_model)\n" 406 | ] 407 | }, 408 | { 409 | "cell_type": "code", 410 | "execution_count": null, 411 | "metadata": {}, 412 | "outputs": [], 413 | "source": [ 414 | "def adjust_final_number(string: str, max_threshold: int, initial_number: int) -> int:\n", 415 | " final_number = initial_number\n", 416 | " while final_number < max_threshold:\n", 417 | " retriever = vectorstore.as_retriever(search_kwargs={\"k\": final_number})\n", 418 | " docs = retriever.get_relevant_documents(string)\n", 419 | " text = \"\".join([doc.page_content for doc in docs])\n", 420 | " if num_tokens_from_string(text) < max_threshold:\n", 421 | " final_number += 1\n", 422 | " else:\n", 423 | " break\n", 424 | " return final_number\n", 425 | "\n", 426 | "final_number = adjust_final_number(\"Who is the owner of the restaurant\", 10000, 4)\n" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": null, 432 | "metadata": {}, 433 | "outputs": [], 434 | "source": [ 435 | "final_number" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 
440 | "execution_count": null, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "retriever = vectorstore.as_retriever(search_kwargs={\"k\": final_number})" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": null, 450 | "metadata": {}, 451 | "outputs": [], 452 | "source": [ 453 | "from langchain_core.runnables import RunnablePassthrough\n", 454 | "\n", 455 | "template = \"\"\"\n", 456 | "Answer the question based only on the following context:\n", 457 | "{context}\n", 458 | "\n", 459 | "Answer the following question:\n", 460 | "Question: {question}\n", 461 | "\"\"\"\n", 462 | "prompt = ChatPromptTemplate.from_template(template)\n", 463 | "\n", 464 | "\n", 465 | "def format_docs(docs):\n", 466 | " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", 467 | "\n", 468 | "rag_chain = (\n", 469 | " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", 470 | " | prompt\n", 471 | " | model\n", 472 | " | StrOutputParser()\n", 473 | ")" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": null, 479 | "metadata": {}, 480 | "outputs": [], 481 | "source": [ 482 | "# Question\n", 483 | "rag_chain.invoke(\"Who is the owner of the restaurant\")" 484 | ] 485 | } 486 | ], 487 | "metadata": { 488 | "kernelspec": { 489 | "display_name": "app", 490 | "language": "python", 491 | "name": "python3" 492 | }, 493 | "language_info": { 494 | "codemirror_mode": { 495 | "name": "ipython", 496 | "version": 3 497 | }, 498 | "file_extension": ".py", 499 | "mimetype": "text/x-python", 500 | "name": "python", 501 | "nbconvert_exporter": "python", 502 | "pygments_lexer": "ipython3", 503 | "version": "3.11.0" 504 | } 505 | }, 506 | "nbformat": 4, 507 | "nbformat_minor": 2 508 | } 509 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.9.3 2 | aiosignal==1.3.1 
3 | annotated-types==0.6.0 4 | anyio==4.3.0 5 | asgiref==3.7.2 6 | asttokens==2.4.1 7 | attrs==23.2.0 8 | backoff==2.2.1 9 | bcrypt==4.1.2 10 | beautifulsoup4==4.12.3 11 | build==1.1.1 12 | cachetools==5.3.3 13 | certifi==2024.2.2 14 | chardet==5.2.0 15 | charset-normalizer==3.3.2 16 | chroma-hnswlib==0.7.3 17 | chromadb==0.4.24 18 | click==8.1.7 19 | colorama==0.4.6 20 | coloredlogs==15.0.1 21 | comm==0.2.1 22 | contourpy==1.2.0 23 | cycler==0.12.1 24 | dataclasses-json==0.6.4 25 | dataclasses-json-speakeasy==0.5.11 26 | debugpy==1.8.1 27 | decorator==5.1.1 28 | Deprecated==1.2.14 29 | distro==1.9.0 30 | emoji==2.10.1 31 | executing==2.0.1 32 | fastapi==0.110.0 33 | filelock==3.13.1 34 | filetype==1.2.0 35 | flatbuffers==24.3.7 36 | fonttools==4.49.0 37 | frozenlist==1.4.1 38 | fsspec==2024.2.0 39 | google-auth==2.28.2 40 | googleapis-common-protos==1.62.0 41 | greenlet==3.0.3 42 | grpcio==1.62.1 43 | h11==0.14.0 44 | httpcore==1.0.4 45 | httptools==0.6.1 46 | httpx==0.27.0 47 | huggingface-hub==0.21.4 48 | humanfriendly==10.0 49 | idna==3.6 50 | importlib-metadata==6.11.0 51 | importlib_resources==6.1.3 52 | ipykernel==6.29.3 53 | ipython==8.22.2 54 | jedi==0.19.1 55 | joblib==1.3.2 56 | jsonpatch==1.33 57 | jsonpath-python==1.0.6 58 | jsonpointer==2.4 59 | jupyter_client==8.6.0 60 | jupyter_core==5.7.1 61 | kiwisolver==1.4.5 62 | kubernetes==29.0.0 63 | langchain==0.1.11 64 | langchain-community==0.0.27 65 | langchain-core==0.1.30 66 | langchain-openai==0.0.8 67 | langchain-text-splitters==0.0.1 68 | langchainhub==0.1.15 69 | langdetect==1.0.9 70 | langsmith==0.1.23 71 | llvmlite==0.42.0 72 | lxml==5.1.0 73 | marshmallow==3.20.2 74 | matplotlib==3.8.3 75 | matplotlib-inline==0.1.6 76 | mmh3==4.1.0 77 | monotonic==1.6 78 | mpmath==1.3.0 79 | multidict==6.0.5 80 | mypy-extensions==1.0.0 81 | nest-asyncio==1.6.0 82 | nltk==3.8.1 83 | numba==0.59.0 84 | numpy==1.26.4 85 | oauthlib==3.2.2 86 | onnxruntime==1.17.1 87 | openai==1.13.3 88 | opentelemetry-api==1.23.0 89 
| opentelemetry-exporter-otlp-proto-common==1.23.0 90 | opentelemetry-exporter-otlp-proto-grpc==1.23.0 91 | opentelemetry-instrumentation==0.44b0 92 | opentelemetry-instrumentation-asgi==0.44b0 93 | opentelemetry-instrumentation-fastapi==0.44b0 94 | opentelemetry-proto==1.23.0 95 | opentelemetry-sdk==1.23.0 96 | opentelemetry-semantic-conventions==0.44b0 97 | opentelemetry-util-http==0.44b0 98 | orjson==3.9.15 99 | overrides==7.7.0 100 | packaging==23.2 101 | pandas==2.2.1 102 | parso==0.8.3 103 | pillow==10.2.0 104 | platformdirs==4.2.0 105 | posthog==3.5.0 106 | prompt-toolkit==3.0.43 107 | protobuf==4.25.3 108 | psutil==5.9.8 109 | pulsar-client==3.4.0 110 | pure-eval==0.2.2 111 | pyasn1==0.5.1 112 | pyasn1-modules==0.3.0 113 | pydantic==2.6.3 114 | pydantic_core==2.16.3 115 | Pygments==2.17.2 116 | pynndescent==0.5.11 117 | pyparsing==3.1.2 118 | PyPika==0.48.9 119 | pyproject_hooks==1.0.0 120 | pyreadline3==3.4.1 121 | python-dateutil==2.8.2 122 | python-dotenv==1.0.1 123 | python-iso639==2024.2.7 124 | python-magic==0.4.27 125 | pytz==2024.1 126 | pywin32==306 127 | PyYAML==6.0.1 128 | pyzmq==25.1.2 129 | rapidfuzz==3.6.1 130 | regex==2023.12.25 131 | requests==2.31.0 132 | requests-oauthlib==1.3.1 133 | rsa==4.9 134 | scikit-learn==1.4.1.post1 135 | scipy==1.12.0 136 | six==1.16.0 137 | sniffio==1.3.1 138 | soupsieve==2.5 139 | SQLAlchemy==2.0.28 140 | stack-data==0.6.3 141 | starlette==0.36.3 142 | sympy==1.12 143 | tabulate==0.9.0 144 | tenacity==8.2.3 145 | threadpoolctl==3.3.0 146 | tiktoken==0.6.0 147 | tokenizers==0.15.2 148 | tornado==6.4 149 | tqdm==4.66.2 150 | traitlets==5.14.1 151 | typer==0.9.0 152 | types-requests==2.31.0.20240310 153 | typing-inspect==0.9.0 154 | typing_extensions==4.9.0 155 | tzdata==2024.1 156 | umap-learn==0.5.5 157 | unstructured==0.12.6 158 | unstructured-client==0.18.0 159 | urllib3==1.26.18 160 | uvicorn==0.28.0 161 | watchfiles==0.21.0 162 | wcwidth==0.2.13 163 | websocket-client==1.7.0 164 | websockets==12.0 165 | 
wrapt==1.16.0 166 | yarl==1.9.4 167 | zipp==3.17.0 168 | --------------------------------------------------------------------------------
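Note on the notebook's `adjust_final_number` cell: it raises the retriever's `k` until the concatenated hits reach a token budget. The loop can be sketched without any dependencies as below. This is a minimal illustration, not the notebook's code: the `retrieve` callable and the whitespace-based `count_tokens` are hypothetical stand-ins for the Chroma retriever and the `tiktoken` counter, and the explicit `max_k` cap is an added assumption so the loop stops once the store is exhausted.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Rough stand-in for tiktoken: counts whitespace-separated words."""
    return len(text.split())


def adjust_k(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    max_tokens: int,
    initial_k: int,
    max_k: int,
) -> int:
    """Grow k while the concatenated top-k hits stay under max_tokens."""
    k = initial_k
    while k < max_k:
        text = "".join(retrieve(query, k))
        if count_tokens(text) < max_tokens:
            k += 1  # budget not reached yet, try one more document
        else:
            break  # the top-k context already meets the budget
    return k


# Hypothetical corpus; a real retriever would rank by similarity to the query.
corpus = ["alpha beta gamma ", "delta epsilon ", "zeta eta theta iota ", "kappa "]


def fake_retrieve(query: str, k: int) -> List[str]:
    """Ignore the query and return the first k documents."""
    return corpus[:k]


k = adjust_k("any question", fake_retrieve, max_tokens=6, initial_k=1, max_k=len(corpus))
print(k)
```

Like the notebook cell, this returns the first `k` whose context meets or exceeds the budget, so the final context can overshoot slightly; subtract one from the result if a hard limit matters.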