├── .gitignore
├── README.md
├── datasets
│   ├── .gitignore
│   ├── data-still-to-label.jsonl
│   ├── docs.zip
│   ├── eval-data-v1-full.jsonl
│   ├── eval-dataset-v1.jsonl
│   ├── golden-responses.json
│   └── synthetic-eval-dataset.jsonl
├── docker-compose.yaml
├── images
│   ├── length-distribution.png
│   └── retrieval-eval.png
├── notebooks
│   ├── .gitignore
│   ├── 01_rag.ipynb
│   ├── 02_evaluation.ipynb
│   ├── 03_optimize.ipynb
│   ├── data.py
│   ├── eval-scores.json
│   ├── eval.py
│   └── utils.py
├── presentation.pdf
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | rag/
2 | postgres_data/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Building, Evaluating, and Optimizing your RAG App for Production
2 | 
3 | Large Language Models (LLMs) are revolutionizing how users search for, interact with, and generate new content. A number of stacks and toolkits for Retrieval-Augmented Generation (RAG) have recently emerged, enabling users to build applications such as chatbots that use LLMs over their private data. However, while setting up a naive RAG stack is straightforward, getting it to meet a production quality bar is hard. As an AI engineer, you need principled development practices for evaluating and optimizing your RAG app - from data parameters to retrieval algorithms to fine-tuning.
4 | 
5 | This workshop will guide you through this development process. You'll start with the basic RAG stack, create an initial evaluation suite, and then experiment with different advanced techniques to improve RAG performance.
6 | 
7 | ## Environment Setup
8 | Set up the Python environment
9 | 1. Create and activate a Python virtual environment
10 | ```
11 | python3 -m venv rag
12 | source rag/bin/activate
13 | ```
14 | 2. Install dependencies
15 | ```
16 | pip install -r requirements.txt
17 | ```
18 | 
19 | Set up Postgres
20 | 1. Install Docker: follow the OS-specific instructions at https://docs.docker.com/engine/install/
21 | 2. Launch Postgres with Docker Compose (from the project directory)
22 | ```
23 | docker-compose up -d
24 | ```
25 | 
26 | Prepare OpenAI credentials
27 | 1. Create an OpenAI API key at https://platform.openai.com/account/api-keys if you don't have one
28 | 
29 | ## Get Started
30 | We will be going through 3 notebooks. To follow along, launch Jupyter Lab:
31 | ```
32 | jupyter lab
33 | ```
34 | 
35 | 
36 | ## Core Dependencies
37 | ```
38 | llama-index
39 | ray[data]
40 | 
41 | # for notebooks
42 | jupyter
43 | 
44 | # for postgres
45 | sqlalchemy[asyncio]
46 | pgvector
47 | psycopg2-binary
48 | asyncpg
49 | ```
50 | 
--------------------------------------------------------------------------------
/datasets/.gitignore:
--------------------------------------------------------------------------------
1 | docs.ray.io/
2 | sql_dumps/
--------------------------------------------------------------------------------
/datasets/data-still-to-label.jsonl:
--------------------------------------------------------------------------------
1 | {'question': 'What is the rest api for getting the head node id?', 'source': 'https://docs.ray.io/en/latest/index.html'}
2 | {'question': 'how to rerun a canceled ray task', 'source': 'https://docs.ray.io/en/latest/ray-core/api/doc/ray.cancel.html#ray.cancel'}
3 | {'question': 'how to print ray version in notebook', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments-api-ref'}
4 | {'question': 'How do I set the max parallel concurrent scheduled tasks in map_batches?', 'source': 'https://docs.ray.io/en/latest/ray-core/examples/batch_prediction.html'}
5 | {'question': 'How do I get the number of cpus from ray cluster?', 'source': 'https://docs.ray.io/en/latest/ray-air/examples/huggingface_text_classification.html'}
6 | {'question': 'How to use the exclude option to the runtime_env', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#api-reference'}
7 | {'question': 'show a map batch example with batch_format', 'source': 'https://docs.ray.io/en/latest/data/transforming-data.html'}
8 | {'question': 'how to find local ray address', 'source': 'https://docs.ray.io/en/latest/ray-core/examples/gentle_walkthrough.html'}
9 | {'question': 'Why don’t I see any deprecation warnings from `warnings.warn` when running with Ray Tune?', 'source': 'https://docs.ray.io/en/latest/tune/tutorials/tune-output.html'}
10 | {'question': 'how can I set *num_heartbeats_timeout in `ray start --head`* command ?', 'source': 'https://docs.ray.io/en/latest/cluster/cli.html'}
11 | {'question': "ray crashing with AttributeError: module 'pydantic.fields' has no attribute 'ModelField", 'source': 'https://discuss.ray.io/'}
12 | {'question': 'How to start ray cluster on multiple node via CLI?', 'source': 'https://docs.ray.io/en/latest/cluster/vms/user-guides/launching-clusters/aws.html'}
13 | {'question': 'my ray tuner shows "running" but CPU usage is almost 0%. 
why ?', 'source': 'https://docs.ray.io/en/latest/tune/faq.html'} 14 | {'question': 'should the Ray head node and all workers have the same object store memory size allocated?', 'source': 'https://docs.ray.io/en/latest/ray-observability/user-guides/debug-apps/debug-memory.html'} 15 | {'question': 'I want to set up gcs health checks via REST API, what is the endpoint that I can hit to check health for gcs?', 'source': 'https://docs.ray.io'} 16 | {'question': 'In Ray Serve, how to specify whether to set up an httpproxy on each node, or just the head node?', 'source': 'https://docs.ray.io/en/latest/serve/architecture.html'} 17 | {'question': 'Want to embed Grafana into the Ray Dashboard, given that I am using KubeRay\n\nGiven the context that Prometheus and Grafana are not running on my Head node, and that I am using KubeRay, how should I be setting the following variables?\n• `RAY_GRAFANA_HOST`\n• `RAY_PROMETHEUS_HOST`\nAnd is there a way to set them more intelligently, given that head node IP is changing every time we reconfigure our cluster?', 'source': 'https://docs.ray.io/en/latest/cluster/configure-manage-dashboard.html'} 18 | {'question': 'How the GCS determines which Kubernetes pod to kill when using KubeRay autoscaling?', 'source': 'https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/configuring-autoscaling.html'} 19 | {'question': 'How can I set the `request_timeout_s` in `http_options` section of a Ray Serve YAML config file?', 'source': 'https://docs.ray.io/en/latest/serve/index.html'} 20 | {'question': 'How do I make the GPU available on my M1 laptop to ray?', 'source': 'https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh'} 21 | {'question': 'How can I add a timeout for the Ray job?', 'source': 'https://docs.ray.io/en/latest/serve/performance.html'} 22 | {'question': 'how do I set custom /tmp directory for remote cluster?', 'source': 'https://discuss.ray.io/t/8862'} 23 | {'question': 'if I set --temp-dir to a different directory than /tmp, will ray object spill to the custom directory ?', 'source': 'https://docs.ray.io/en/latest/ray-core/objects/object-spilling.html'} 24 | {'question': 'can you give me an example for *`--runtime-env-json`*', 'source': 'https://docs.ray.io/en/latest/serve/dev-workflow.html'} 25 | {'question': 'What is a default value for memory for rayActorOptions?', 'source': 'https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.RayActorOptionsSchema.html'} 26 | {'question': 'What should be the value of `maxConcurrentReplicas` if autoscaling configuration is specified?', 'source': 'https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.DeploymentSchema.html#ray.serve.schema.DeploymentSchema.num_replicas_and_autoscaling_config_mutually_exclusive'} 27 | {'question': 'Yes what should be the value of `max_concurrent_queries` when `target_num_ongoing_requests_per_replica` is specified?', 'source': 'https://docs.ray.io/en/latest/serve/performance.html'} 28 | {'question': 'what is a `smoothing_factor`', 'source': 'https://docs.ray.io/en/latest/serve/scaling-and-resource-allocation.html'} 29 | {'question': 'Why do we need to configure ray serve application such that it can run on one node?', 'source': 'https://www.anyscale.com/blog/simplify-your-mlops-with-ray-and-ray-serve'} 30 | {'question': 'What is the reason actors change their state to unhealthy?', 'source': 'https://docs.ray.io/en/latest/ray-core/fault_tolerance/actors.html'} 31 | {'question': 'How can I add `max_restarts` to serve deployment?', 
'source': 'https://docs.ray.io/en/latest/serve/index.html'} 32 | {'question': 'How do I access logs for a dead node?', 'source': 'https://docs.ray.io/en/latest/ray-observability/user-guides/cli-sdk.html'} 33 | {'question': 'What are the reasons for a node to change it’s status to dead?', 'source': 'https://docs.ray.io/en/latest/ray-core/fault_tolerance/nodes.html'} 34 | {'question': 'What are the reasons for spikes in node CPU utilization', 'source': 'https://www.anyscale.com/blog/autoscaling-clusters-with-ray'} 35 | {'question': 'What AWS machine type is recommended to deploy a RayService on EKS?', 'source': 'https://docs.ray.io/en/latest/'} 36 | {'question': 'Can you write a function that runs exactly once on each node of a ray cluster?', 'source': 'https://docs.ray.io/en/latest/ray-air/examples/gptj_deepspeed_fine_tuning.html'} 37 | {'question': 'can you drain a node for maintenance?', 'source': 'https://docs.ray.io/en/latest/cluster/cli.html'} 38 | {'question': 'what env variable should I set to disable the heartbeat message displayed every 5 sec? I would like to turn it to every 1 minute for instance.', 'source': 'https://docs.ray.io/en/latest/'} 39 | {'question': 'Is there a way to configure the session name generated by ray?', 'source': 'https://docs.ray.io/en/latest/ray-core/configure.html'} 40 | {'question': 'How can I choose which worker group to use when submitting a ray job?', 'source': 'https://discuss.ray.io/t/9824'} 41 | {'question': 'can I use the Python SDK to get a link to Ray dashboard for a given job?', 'source': 'https://docs.ray.io/en/latest/ray-observability/getting-started.html'} 42 | {'question': 'I’d like to use the Ray Jobs Python SDK to get a link to a specific Job view in the dashboard', 'source': 'https://docs.ray.io/en/latest/cluster/running-applications/job-submission/sdk.html'} 43 | {'question': 'I am building a product on top of ray and would like to use ray name & logo for it :slightly_smiling_face: where can I find ray name usage guidelines?', 'source': 'https://forms.gle/9TSdDYUgxYs8SA9e8'} 44 | {'question': 'What may possible cause the node where this task was running crashed unexpectedly. This can happen if: (1) the instance where the node was running failed, (2) raylet crashes unexpectedly (OOM, preempted node, etc).', 'source': 'https://www.anyscale.com/blog/automatic-and-optimistic-memory-scheduling-for-ml-workloads-in-ray'} 45 | {'question': 'Do you know how to resolve (gcs_server) : Health check failed for node? 
I observed that the node is still up and running.', 'source': 'https://docs.ray.io/en/latest/ray-observability/user-guides/cli-sdk.html'} 46 | {'question': 'How to extend the health check threshold?', 'source': 'https://docs.ray.io/en/latest/serve/api/doc/ray.serve.schema.DeploymentSchema.html'} 47 | {'question': 'How to extend the GCS health check threshold for for a Ray job use case?', 'source': 'https://docs.ray.io/en/latest/ray-core/fault_tolerance/gcs.html'} 48 | {'question': 'What is the working of `PowerOfTwoChoicesReplicaScheduler` ?', 'source': 'https://github.com/ray-project/ray/pull/36501'} 49 | {'question': 'Do you need the DAGDriver to deploy a serve application using RayServe?', 'source': 'https://docs.ray.io/en/latest/serve/key-concepts.html'} 50 | {'question': 'What’s the import path that I need to provide to a simple RayServe deployment?', 'source': 'https://maxpumperla.com/learning_ray'} 51 | {'question': 'what’s the latest version of ray', 'source': 'https://github.com/ray-project/ray/releases/tag/ray-1.11.0'} 52 | {'question': 'do you know ray have been updated to version 2.6?', 'source': 'https://github.com/ray-project/ray'} 53 | {'question': 'do you have any documents / examples showing the usage of RayJob in Kuberay?', 'source': 'https://ray-project.github.io/kuberay/guidance/rayjob/'} 54 | {'question': 'Do you have any document/guide which shows how to setup the local development environment for kuberay on a arm64 processor based machine?', 'source': 'https://docs.ray.io/en/latest/ray-contribute/development.html#building-ray'} 55 | {'question': 'How can I configure min and max worker number of nodes when I’m using Ray on Databricks?', 'source': 'https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html'} 56 | {'question': 'Does Ray metrics have to be exported via an actor?', 'source': 'https://docs.ray.io/en/latest/ray-core/ray-metrics.html'} 57 | {'question': 'How is object store memory calculated?', 'source': 'https://docs.ray.io/en/latest/ray-core/scheduling/memory-management.html'} 58 | {'question': 'how can I avoid objects not getting spilled?', 'source': 'https://docs.ray.io/en/latest/data/data-internals.html'} 59 | {'question': 'what’s ray core', 'source': 'https://docs.ray.io/en/latest/ray-core/tasks.html#ray-remote-functions'} 60 | {'question': 'Does ray support cron job', 'source': 'https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes'} 61 | {'question': 'can you give me the dependencies list for api read_images?', 'source': 'https://pillow.readthedocs.io/en/stable/handbook/concepts.html#modes'} 62 | {'question': 'how do I kill a specific serve replica', 'source': 'https://docs.ray.io/en/latest/serve/production-guide/fault-tolerance.html'} 63 | {'question': 'What exactly is rayjob? How is it handled in kuberay? 
Can you give an example of what a Rayjob will look like?', 'source': 'https://ray-project.github.io/kuberay/guidance/rayjob/'} 64 | {'question': 'do you have access to the CRD yaml file of RayJob for KubeRay?', 'source': 'https://github.com/ray-project/kuberay'} 65 | 66 | {'question': 'how do I adjust the episodes per iteration in Ray Tune?', 'source': 'https://docs.ray.io/en/latest/tune/index.html'} 67 | {'question': 'in Ray Tune, can you explain what episodes are?', 'source': 'https://docs.ray.io/en/latest/ray-references/glossary.html'} 68 | {'question': 'how do I know how many agents a Tune episode is spanning?', 'source': 'https://docs.ray.io/en/latest/index.html'} 69 | {'question': 'how can I limit the number of jobs in the history stored in the ray GCS?', 'source': 'https://docs.ray.io/en/latest/index.html'} 70 | {'question': 'I have a large csv file on S3. How do I use Ray to create another csv file with one column removed?', 'source': 'https://docs.ray.io/en/master/data/api/doc/ray.data.read_csv.html#ray-data-read-csv'} 71 | {'question': 'How to discover what node was used to run a given task', 'source': 'https://docs.ray.io/en/latest/ray-core/ray-dashboard.html#ray-dashboard'} 72 | {'question': 'it is possible to discover what node was used to execute a given task using its return future, object reference ?', 'source': 'https://docs.ray.io/en/latest/ray-core/walkthrough.html#running-a-task'} 73 | {'question': 'how to efficiently broadcast a large nested dictionary from a single actor to thousands of tasks', 'source': 'https://discuss.ray.io/t/6521'} 74 | {'question': 'How to mock remote calls of an Actor for Testcases?', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments'} 75 | {'question': 'How to use pytest mock to create a Actor', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments'} 76 | {'question': 'Can I initiate an Actor directly without remote()', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments'} 77 | {'question': 'Is there a timeout or retry setting for long a worker will wait / retry to make an initial connection to the head node?', 'source': 'https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#runtime-environments'} 78 | {'question': 'im getting this error of ValueError: The base resource usage of this topology ExecutionResources but my worker and head node are both GPU nodes...oh is it expecting 2 GPUs on a single worker node is that why?', 'source': 'https://docs.ray.io/en/latest/train/faq.html'} 79 | {'question': 'how can I move airflow variables in ray task ?', 'source': 'https://docs.ray.io/en/latest/ray-observability/monitoring-debugging/gotchas.html#environment-variables-are-not-passed-from-the-driver-to-workers'} 80 | {'question': 'How to recompile Ray docker image using Ubuntu 22.04LTS as the base docker image?', 'source': 'https://github.com/ray-project/ray.git'} 81 | {'question': 'I am using TuneSearchCV with an XGBoost regressor. To test it out, I have set the n_trials to 3 and left the n_jobs at its default of -1 to use all available processors. From what I have observed, only one trial runs per CPU since 3 trials only uses 3 CPUs which is pretty time consuming. 
Is there a way to run a single trial across multiple CPUs to speed things up?', 'source': 'https://docs.ray.io/en/latest/ray-core/actors/async_api.html'} 82 | {'question': 'how do I make rolling mean column in ray dataset?', 'source': 'https://docs.ray.io/en/latest/ray-core/actors/terminating-actors.html#manual-termination-within-the-actor'} 83 | {'question': "Where is the execution limit coming from? I'm not sure where I set it", 'source': 'https://docs.ray.io/en/latest/data/dataset-internals.html#configuring-resources-and-locality'} 84 | {'question': 'The ray cluster spins up the workers, but then immediately kills them when it starts to process the data - is this expected behavior? If not, what could the issue be?', 'source': 'https://docs.ray.io/en/latest/data/examples/nyc_taxi_basic_processing.html'} 85 | {'question': 'Does Ray support numpy 1.24.2?', 'source': 'https://docs.ray.io/en/latest/index.html'} 86 | {'question': 'Can I have a super class of Actor?', 'source': 'https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html#client-arguments'} 87 | {'question': 'can I specify working directory in ray.client(base_url).namespace(namespsce).connect()', 'source': 'https://docs.ray.io/en/latest/cluster/running-applications/job-submission/ray-client.html#client-arguments'} 88 | {'question': 'can I monkey patch a ray function?', 'source': 'https://docs.ray.io/en/latest/ray-observability/monitoring-debugging/gotchas.html#outdated-function-definitions'} 89 | {'question': 'I get the following error using Ray Tune with Ray version 2.4.0 after a successful training epoch: “TypeError: can’t convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.” According to the stack trace, the error seems to come from the __report_progress_ method. I’m using one GPU to train a pretrained ResNet18 model. Do you know what is causing this issue?', 'source': 'https://docs.ray.io/en/latest/index.html'} 90 | {'question': 'how to use ray.init to launch a multi-node cluster', 'source': 'https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html'} 91 | {'question': 'why detauched Actor pointing to old working directory ?', 'source': 'https://docs.ray.io/en/latest/ray-core/actors/named-actors.html#actor-lifetimes'} 92 | {'question': 'If I spawn a process in a Ray Task, what happens to that process when the Ray Task completes?', 'source': 'https://docs.ray.io/en/latest/ray-core/tasks/using-ray-with-gpus.html'} 93 | {'question': 'how can I use torch.distributed.launch with Ray jobs?', 'source': 'https://www.anyscale.com/blog/large-scale-distributed-training-with-torchx-and-ray'} 94 | {'question': 'how to fix this issue: "WARNING sample.py:469 -- sample_from functions that take a spec dict are deprecated. Please update your function to work with the config dict directly."', 'source': 'https://docs.ray.io/en/latest/tune/api/doc/ray.tune.sample_from.html'} 95 | {'question': 'How does one define the number of timesteps and episodes when training a PPO algorithm with Rllib?', 'source': 'https://docs.ray.io/en/latest/rllib/rllib-algorithms.html#part-2'} 96 | {'question': "my serve endpoint doesn't seem to run my code when deployed onto our remote cluster. 
Only the endpoints that are using DAGDrivers are running into issues", 'source': 'https://docs.ray.io/en/latest/serve/production-guide/deploy-vm.html#adding-a-runtime-environment'} 97 | {'question': 'How to specify different preprocessors for train and evaluation ray datasets?', 'source': 'https://docs.ray.io/en/latest/'} 98 | {'question': 'Can I set the ray.init() in the worker code for ray serve?', 'source': 'https://docs.ray.io/en/latest/serve/api/index.html'} 99 | {'question': 'Can I use a ubuntu 22.04 image to install Ray as a python package and use it for Kubernetes cluster?', 'source': 'https://docs.ray.io/en/latest/ray-overview/installation.html#installation'} 100 | -------------------------------------------------------------------------------- /datasets/docs.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/run-llama/ai-engineer-workshop/918b7efd79ec631978f484e1d1ad9704fae64306/datasets/docs.zip -------------------------------------------------------------------------------- /datasets/eval-data-v1-full.jsonl: -------------------------------------------------------------------------------- 1 | {"question": "I’m struggling a bit with Ray Data type conversions when I do map_batches. Any advice?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format"} 2 | {"question": "How does autoscaling work in a Ray Serve application?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling"} 3 | {"question": "can i create my own ray image with custom python version", "source": ""} 4 | {"question": "how do I get the address of a ray node", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information"} 5 | {"question": "are you based on GPT-4?", "source": ""} 6 | {"question": "why it takes 10 mins for you to answer my question?", "source": ""} 7 | {"question": "Does Ray support NCCL?", "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html"} 8 | {"question": "could you give me an example of using this library for data-parallel training of CNNs on Ray?", "source": "https://docs.ray.io/en/master/ray-air/computer-vision.html#training-vision-models"} 9 | {"question": "Is Ray integrated with DeepSpeed?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html#fine-tuning-the-model-with-ray-air-a-name-train-a"} 10 | {"question": "what will happen if I use AsyncIO's await to wait for a Ray future like `await x.remote()`", "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#objectrefs-as-asyncio-futures"} 11 | {"question": "How would you compare Spark, Ray, Dask?", "source": "https://docs.ray.io/en/master/data/overview.html#how-does-ray-data-compare-to-x-for-offline-inference"} 12 | {"question": "why would ray overload a node w/ more task that the resources allow ?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#physical-resources-and-logical-resources"} 13 | {"question": "when should I use Ray Client?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#when-to-use-ray-client"} 14 | {"question": "how to scatter actors across the cluster?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/index.html#spread"} 15 | {"question": "how can i go about fine tuning a LLM with Ray?", "source": 
"https://raw.githubusercontent.com/ray-project/ray/master/doc/source/templates/04_finetuning_llms_with_deepspeed/README.md"} 16 | {"question": "can you create a tweet thread from chapter 8, \"Online Inference with Ray Serve\" of the book \"Learning Ray\"?", "source": ""} 17 | {"question": "On remote ray cluster, when I do `ray debug` I'm getting connection refused error. Why ?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/ray-debugging.html#running-on-a-cluster"} 18 | {"question": "How does Ray AIR set up the model to communicate gradient updates across machines?", "source": "https://docs.ray.io/en/master/train/train.html#intro-to-ray-train"} 19 | {"question": "Why would I use Ray Serve instead of Modal or Seldon? Why can't I just do it via containers?", "source": "https://docs.ray.io/en/master/serve/index.html"} 20 | {"question": "How do I deploy an LLM workload on top of Ray Serve?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html"} 21 | {"question": "what size of memory should I need for this if I am setting set the `model_id` to “EleutherAI/gpt-j-6B”?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html"} 22 | {"question": "How do I log the results from multiple distributed workers into a single tensorboard?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-output.html#how-to-log-your-tune-runs-to-tensorboard"} 23 | {"question": "how do you config SyncConfig for a Ray AIR job?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-storage.html#on-a-multi-node-cluster-deprecated"} 24 | {"question": "how can I quickly narrow down the root case of a failed ray job, assuming I have access to all the logs", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-files-in-logging-directory"} 25 | {"question": "How do I specify how many GPUs a serve deployment needs?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#resource-management-cpus-gpus"} 26 | {"question": "One of my worker nodes keeps dying on using TensorflowTrainer with around 1500 workers, I observe SIGTERM has been received to the died node's raylet. How can I debug this?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-files-in-logging-directory"} 27 | {"question": "what are the possible reasons for nodes dying in a cluster?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html"} 28 | {"question": "how do I programatically get ray remote cluster to a target size immediately without scaling up through autoscaler ?", "source": "https://docs.ray.io/en/master/cluster/running-applications/autoscaling/reference.html#ray-autoscaler-sdk-request-resources"} 29 | {"question": "how do you disable async iter_batches with Ray Dataset?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.iter_batches.html#ray-data-dataset-iter-batches"} 30 | {"question": "what is the different between a batch and a block, for ray datasets?", "source": "https://docs.ray.io/en/master/data/data-internals.html#datasets-and-blocks"} 31 | {"question": "what might be the reason for \"ray up\" not staring worker nodes (after ray up a cluster configuration). The connection between nodes works well, I can ssh from head to workers. 
The ray config has correct ssh key listed.", "source": "https://discuss.ray.io/t/6677"} 32 | {"question": "How to setup the development environments for ray project?", "source": "https://docs.ray.io/en/master/ray-contribute/development.html"} 33 | {"question": "how do I debug why ray rollout workers are deadlocking when using the sample API in `ray/rllib/evaluation/rollout_worker.py`", "source": "https://docs.ray.io/en/master/rllib/rllib-dev.html#troubleshooting"} 34 | {"question": "how do I join two ray datasets?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.zip.html"} 35 | {"question": "Is there a way to retrieve an object ref from its id?", "source": "https://docs.ray.io/en/master/ray-core/objects.html"} 36 | {"question": "how to create model Checkpoint from the model in memory?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.torch.TorchCheckpoint.from_model.html#ray-train-torch-torchcheckpoint-from-model"} 37 | {"question": "what is Deployment in Ray Serve?", "source": "https://docs.ray.io/en/master/serve/key-concepts.html#deployment"} 38 | {"question": "What is user config in Ray Serve? how do I use it?", "source": "https://docs.ray.io/en/master/serve/configure-serve-deployment.html#configure-ray-serve-deployments"} 39 | {"question": "What is the difference between PACK and SPREAD strategy?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/placement-group.html#placement-strategy"} 40 | {"question": "What’s the best way to run ray across multiple machines?", "source": "https://docs.ray.io/en/master/ray-core/cluster/index.html"} 41 | {"question": "how do I specify ScalingConfig for a Tuner run?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Tuner.html"} 42 | {"question": "how to utilize ‘zero-copy’ feature ray provide for numpy?", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#numpy-arrays"} 43 | {"question": "if there are O(millions) of keys that all have state, is it ok to spin up 1=1 actors? Or would it be advised to create ‘key pools’ where an actor can hold 1=many keys?", "source": "https://docs.ray.io/en/master/ray-core/patterns/too-fine-grained-tasks.html"} 44 | {"question": "How to get the best AIR checkpoint after training without a Result object?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.air.Result.html#ray-air-result"} 45 | {"question": "How to find the best checkpoint from the trial directory?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ExperimentAnalysis.html"} 46 | {"question": "what are the advantage and disadvantage of using singleton Actor ?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html"} 47 | {"question": "what are the advantages of using a named actor?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html"} 48 | {"question": "How do I read a text file stored on S3 using Ray Data?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_text.html"} 49 | {"question": "how do I get the IP of the head node for my Ray cluster?", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information"} 50 | {"question": "How to write a map function that returns a list of object for `map_batches`?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 51 | {"question": "Can you tell me more about the strict_mode in Ray Data? 
Why it is introduced and what code changes do we need?", "source": "https://raw.githubusercontent.com/ray-project/enhancements/main/reps/2023-04-27-data-strict-mode.md"} 52 | {"question": "How do I set a maximum episode length when training with Rllib?", "source": "https://docs.ray.io/en/master/rllib/key-concepts.html"} 53 | {"question": "how do I make a Ray Tune trial retry on failures?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-fault-tolerance.html"} 54 | {"question": "For the supervised actor pattern, can we keep the Worker Actor up if the Supervisor passes a reference to the Actor to another Actor, to allow the worker actor to remain even on Supervisor / Driver failure?", "source": "https://docs.ray.io/en/master/ray-core/patterns/tree-of-actors.html"} 55 | {"question": "How do I read a large text file in S3 with Ray?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_text.html"} 56 | {"question": "how do I get a ray dataset from pandas", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.from_pandas.html"} 57 | {"question": "can you give me an example of using `ray.data.map` ?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map.html"} 58 | {"question": "can you give me an example of using `ray.data.map` , with a callable class as input?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map.html"} 59 | {"question": "How to set memory limit for each trial in Ray Tuner?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-resources.html"} 60 | {"question": "how do I get the actor id of an actor", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_context.get_runtime_context.html"} 61 | {"question": "can ray.init() can check if ray is all-ready initiated ?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.init.html"} 62 | {"question": "What does the `compute=actor` argument do within `ray.data.map_batches` ?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html"} 63 | {"question": "how do I use wandb logger with accelerateTrainer?", "source": "https://docs.ray.io/en/master/ray-air/api/doc/ray.air.integrations.wandb.WandbLoggerCallback.html"} 64 | {"question": "What will be implicitly put into object store?", "source": "https://docs.ray.io/en/master/ray-core/objects.html#objects"} 65 | {"question": "How do I kill or cancel a ray task that I already started?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#cancelling-misbehaving-tasks"} 66 | {"question": "how to send extra arguments in dataset.map_batches function?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 67 | {"question": "where does ray GCS store the history of jobs run on a kuberay cluster? 
What type of database and format does it use for this?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html#external-redis-integration-for-fault-tolerance"} 68 | {"question": "How to resolve ValueError: The actor ImplicitFunc is too large?", "source": "https://docs.ray.io/en/master/ray-core/patterns/closure-capture-large-objects.html"} 69 | {"question": "How do I use ray to distribute training for my custom neural net written using Keras in Databricks?", "source": "https://docs.ray.io/en/master/train/examples/tf/tensorflow_mnist_example.html"} 70 | {"question": "how to use ray.put and ray,get?", "source": "https://docs.ray.io/en/master/ray-core/objects.html#fetching-object-data"} 71 | {"question": "how do I use Ray Data to pre process many files?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#transforming-batches-with-tasks"} 72 | {"question": "can’t pickle SSLContext objects", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#customized-serialization"} 73 | {"question": "How do I install CRDs in Kuberay?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/getting-started.html#deploying-the-kuberay-operator"} 74 | {"question": "Why the function for Ray data batch inference has to be named as _`__call__()`_ ?", "source": "https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html#parallel-batch-inference"} 75 | {"question": "How to disconnnect ray client?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#connect-to-multiple-ray-clusters-experimental"} 76 | {"question": "how to submit job with python with local files?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/quickstart.html#submitting-a-job"} 77 | {"question": "How do I do inference from a model trained by Ray tune.fit()?", "source": "https://docs.ray.io/en/master/data/batch_inference.html#using-models-from-ray-train"} 78 | {"question": "is there a way to load and run inference without using pytorch or tensorflow directly?", "source": "https://docs.ray.io/en/master/serve/index.html"} 79 | {"question": "what does ray do", "source": "https://docs.ray.io/en/master/ray-overview/index.html#overview"} 80 | {"question": "If I specify a fractional GPU in the resource spec, what happens if I use more than that?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#fractional-gpus"} 81 | {"question": "how to pickle a variable defined in actor’s init method", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#customized-serialization"} 82 | {"question": "how do I do an all_reduce operation among a list of actors", "source": "https://docs.ray.io/en/master/ray-core/examples/map_reduce.html#shuffling-and-reducing-data"} 83 | {"question": "What will happen if we specify a bundle with `{\"CPU\":0}` in the PlacementGroup?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/placement-group.html#bundles"} 84 | {"question": "How to cancel job from UI?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/cli.html#ray-job-stop"} 85 | {"question": "how do I get my project files on the cluster when using Ray Serve? 
My workflow is to call `serve deploy config.yaml --address `", "source": "https://docs.ray.io/en/master/serve/advanced-guides/dev-workflow.html#testing-on-a-remote-cluster"} 86 | {"question": "how do i install ray nightly wheel", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies"} 87 | {"question": "how do i install the latest ray nightly wheel?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies"} 88 | {"question": "how can I write unit tests for Ray code?", "source": "https://docs.ray.io/en/master/ray-core/examples/testing-tips.html#tip-2-sharing-the-ray-cluster-across-tests-if-possible"} 89 | {"question": "How I stop Ray from spamming lots of Info updates on stdout?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#disable-logging-to-the-driver"} 90 | {"question": "how to deploy stable diffusion 2.1 with Ray Serve?", "source": "https://docs.ray.io/en/master/serve/tutorials/stable-diffusion.html#serving-a-stable-diffusion-model"} 91 | {"question": "what is actor_handle?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#passing-around-actor-handles"} 92 | {"question": "how to kill a r detached actors?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html#actor-lifetimes"} 93 | {"question": "How to force upgrade the pip package in the runtime environment if an old version exists?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 94 | {"question": "How do I do global shuffle with Ray?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#shuffling-rows"} 95 | {"question": "How to find namespace of an Actor?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_actors.html#ray-util-state-list-actors"} 96 | {"question": "How does Ray work with async.io ?", "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#asyncio-for-actors"} 97 | {"question": "How do I debug a hanging `ray.get()` call? I have it reproduced locally.", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-hangs.html"} 98 | {"question": "can you show me an example of ray.actor.exit_actor()", "source": "https://docs.ray.io/en/master/ray-core/actors/terminating-actors.html#manual-termination-within-the-actor"} 99 | {"question": "how to add log inside actor?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 100 | {"question": "can you write a script to do batch inference with GPT-2 on text data from an S3 bucket?", "source": "https://docs.ray.io/en/master/data/working-with-text.html#performing-inference-on-text"} 101 | {"question": "How do I enable Ray debug logs?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#using-rays-logger"} 102 | {"question": "How do I list the current Ray actors from python?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_actors.html#ray-util-state-list-actors"} 103 | {"question": "I want to kill the replica actor from Python. 
how do I do it?", "source": "https://docs.ray.io/en/master/ray-core/actors/terminating-actors.html#manual-termination-via-an-actor-handle"} 104 | {"question": "how do I specify in my remote function declaration that I want the task to run on a V100 GPU type?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#accelerator-types"} 105 | {"question": "How do I get started?", "source": "https://docs.ray.io/en/master/ray-overview/getting-started.html#getting-started"} 106 | {"question": "How to specify python version in runtime_env?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 107 | {"question": "how to create a Actor in a namespace?", "source": "https://docs.ray.io/en/master/ray-core/namespaces.html#using-namespaces"} 108 | {"question": "Can I specify multiple working directories?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 109 | {"question": "what if I set num_cpus=0 for tasks", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#fractional-resource-requirements"} 110 | {"question": "is it possible to have ray on k8s without using kuberay? especially with the case that autoscaler is enabled.", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html"} 111 | {"question": "how to manually configure and manage Ray cluster on Kubernetes", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html#deploying-a-static-ray-cluster"} 112 | {"question": "If I shutdown a raylet, will the tasks and workers on that node also get killed?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/nodes.html#node-fault-tolerance"} 113 | {"question": "If I’d like to debug out of memory, how do I Do that, and which documentation should I look?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-memory.html#detecting-out-of-memory-errors"} 114 | {"question": "How to use callback in Trainer?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Callback.html#ray-tune-callback"} 115 | {"question": "How to provide current working directory to ray?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#remote-uris"} 116 | {"question": "how to create an actor instance with parameter?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#actors"} 117 | {"question": "how to push a custom module to ray which is using by Actor ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#library-development"} 118 | {"question": "how to print ray working directory?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments"} 119 | {"question": "why I can not see log.info in ray log?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 120 | {"question": "when you use ray dataset to read a file, can you make sure the order of the data is preserved?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 121 | {"question": "Can you explain what \"Ray will *not* retry tasks upon exceptions thrown by application code\" means ?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#retrying-failed-tasks"} 122 | {"question": "how do I specify the log directory when starting 
Ray?", "source": "https://docs.ray.io/en/master/ray-core/configure.html#logging-and-debugging"} 123 | {"question": "how to launch a ray cluster with 10 nodes, without setting the min worker as 10", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/on-premises.html#start-worker-nodes"} 124 | {"question": "how to use ray api to scale up a cluster", "source": "https://docs.ray.io/en/master/cluster/running-applications/autoscaling/reference.html#ray-autoscaler-sdk-request-resources"} 125 | {"question": "we plan to use Ray cloud launcher to start a cluster in AWS. How can we specify a subnet in the deployment file?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html#full-configuration"} 126 | {"question": "where I can find HTTP server error code log for Ray serve", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 127 | {"question": "I am running ray cluster on amazon and I have troubles displaying the dashboard. When a I tunnel the dashboard port from the headnode to my machine, the dashboard opens, and then it disappears (internal refresh fails). Is it a known problem? What am I doing wrong?", "source": "https://docs.ray.io/en/master/cluster/configure-manage-dashboard.html#viewing-ray-dashboard-in-browsers"} 128 | {"question": "In the Ray cluster launcher YAML, does `max_workers` include the head node, or only worker nodes?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html"} 129 | {"question": "How to update files in working directory ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 130 | {"question": "How I can update working directory file when ray allready initiated ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 131 | {"question": "how can I force ray head node to use custom pem file to ssh worker node?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html#full-configuration"} 132 | {"question": "what doess the GCS server do, and why is my GCS server taking up so much memory on the head node?", "source": "https://docs.ray.io/en/master/ray-references/glossary.html"} 133 | {"question": "when starting cluster with ray up, there are few nodes \"pending\" for a long time. 
how can I debug this?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#ray-status"} 134 | {"question": "how to install Ray 2.5.1 from github or wheel?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#from-wheels"} 135 | {"question": "How do I use `worker_setup_hook` in a runtime env to set do some setup on worker node creation?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 136 | {"question": "how to use Ray dataset on aws", "source": "https://docs.ray.io/en/master/data/key-concepts.html"} 137 | {"question": "How do I avoid my dataset shuffling during a ray.data.map_batches?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 138 | {"question": "Is the order of the input data preserved after a map_batches operation?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 139 | {"question": "ray serve returns generic internal service error when there is an internal failure, how do I get it to emit more detailed errors or logs?", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 140 | {"question": "how do i track an uncaught exception in ray serve", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 141 | {"question": "where do I view logs using python logger emitted by my ray serve endpoint in the ray cluster", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 142 | {"question": "where can I see logs for a failed ray serve deployment", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#serve-view"} 143 | {"question": "How to take a subset of a Ray Dataset?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.limit.html#ray-data-dataset-limit"} 144 | {"question": "How do I load all checkpoints from trials of a Tune experiment launched with `tune.run`? I ran my initial experiment with cloud checkpointing, so I’d need to download all the checkpoints to analyze them.", "source": "https://docs.ray.io/en/master/tune/tutorials/tune_get_data_in_and_out.html#how-do-i-access-tune-results-after-i-am-finished"} 145 | {"question": "How can I kill a \"detached\" Actor ?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html#actor-lifetimes"} 146 | {"question": "How do I set env variables in ray init? 
Let’ say it’s export foo=“foo”", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html#ray-runtime-env-runtimeenv"} 147 | {"question": "What is the rest api for getting the head node id?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_nodes.html#ray-util-state-list-nodes"} 148 | {"question": "how to rerun a canceled ray task", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.cancel.html#ray-cancel"} 149 | {"question": "How do I set the max parallel concurrent scheduled tasks in map_batches?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 150 | {"question": "How do I get the number of cpus from ray cluster?", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#resource-information"} 151 | {"question": "How to use the exclude option to the runtime_env", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 152 | {"question": "show a map batch example with batch_format", "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format"} 153 | {"question": "how to find local ray address", "source": "https://docs.ray.io/en/master/ray-core/examples/gentle_walkthrough.html#ray-core"} 154 | {"question": "ray crashing with AttributeError: module 'pydantic.fields' has no attribute 'ModelField", "source": "https://github.com/ray-project/ray/issues/36990"} 155 | {"question": "How to start ray cluster on multiple node via CLI?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/on-premises.html#manually-set-up-a-ray-cluster"} 156 | {"question": "my ray tuner shows \"running\" but CPU usage is almost 0%. 
why ?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/optimize-performance.html#no-speedup"} 157 | {"question": "should the Ray head node and all workers have the same object store memory size allocated?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/large-cluster-best-practices.html#configuring-the-head-node"} 158 | {"question": "In Ray Serve, how to specify whether to set up an httpproxy on each node, or just the head node?", "source": "https://docs.ray.io/en/master/serve/architecture.html#how-does-serve-ensure-horizontal-scalability-and-availability"} 159 | {"question": "Want to embed Grafana into the Ray Dashboard, given that I am using KubeRay\n\nGiven the context that Prometheus and Grafana are not running on my Head node, and that I am using KubeRay, how should I be setting the following variables?\n• `RAY_GRAFANA_HOST`\n• `RAY_PROMETHEUS_HOST`\nAnd is there a way to set them more intelligently, given that head node IP is changing every time we reconfigure our cluster?", "source": "https://docs.ray.io/en/master/cluster/metrics.html"} 160 | {"question": "How the GCS determines which Kubernetes pod to kill when using KubeRay autoscaling?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/configuring-autoscaling.html"} 161 | {"question": "How can I set the `request_timeout_s` in `http_options` section of a Ray Serve YAML config file?", "source": "https://docs.ray.io/en/master/serve/production-guide/config.html#serve-config-files-serve-build"} 162 | {"question": "How do I make the GPU available on my M1 laptop to ray?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#m1-mac-apple-silicon-support"} 163 | {"question": "How can I add a timeout for the Ray job?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/quickstart.html#interacting-with-long-running-jobs"} 164 | {"question": "how do I set custom /tmp directory for remote cluster?", "source": "https://docs.ray.io/en/master/cluster/cli.html#ray-start"} 165 | {"question": "if I set --temp-dir to a different directory than /tmp, will ray object spill to the custom directory ?", "source": "https://docs.ray.io/en/master/ray-core/objects/object-spilling.html"} 166 | {"question": "can you give me an example for *`--runtime-env-json`*", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#specifying-a-runtime-environment-per-job"} 167 | {"question": "What should be the value of `maxConcurrentReplicas` if autoscaling configuration is specified?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling-config-parameters"} 168 | {"question": "Yes what should be the value of `max_concurrent_queries` when `target_num_ongoing_requests_per_replica` is specified?", "source": "https://docs.ray.io/en/master/serve/architecture.html#ray-serve-autoscaling"} 169 | {"question": "what is a `smoothing_factor`", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling-config-parameters"} 170 | {"question": "What is the reason actors change their state to unhealthy?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.common.ActorState.html#ray-util-state-common-actorstate"} 171 | {"question": "How do I access logs for a dead node?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/logging.html#log-persistence"} 172 | {"question": "What are the reasons for a node 
to change it’s status to dead?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/nodes.html"} 173 | {"question": "What are the reasons for spikes in node CPU utilization", "source": "https://docs.ray.io/en/master/ray-core/patterns/limit-running-tasks.html#pattern-using-resources-to-limit-the-number-of-concurrently-running-tasks"} 174 | {"question": "What AWS machine type is recommended to deploy a RayService on EKS?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/large-cluster-best-practices.html#configuring-the-head-node"} 175 | {"question": "Is there a way to configure the session name generated by ray?", "source": "https://docs.ray.io/en/master/ray-core/configure.html#logging-and-debugging"} 176 | {"question": "can I use the Python SDK to get a link to Ray dashboard for a given job?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#set-up-dashboard"} 177 | {"question": "What may possible cause the node where this task was running crashed unexpectedly. This can happen if: (1) the instance where the node was running failed, (2) raylet crashes unexpectedly (OOM, preempted node, etc).", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-memory.html#debugging-memory-issues"} 178 | {"question": "Do you know how to resolve (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node? I observed that the node is still up and running.", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#system-component-logs"} 179 | {"question": "What is the working of `PowerOfTwoChoicesReplicaScheduler` ?", "source": "https://github.com/ray-project/ray/pull/36501"} 180 | {"question": "Do you need the DAGDriver to deploy a serve application using RayServe?", "source": "https://docs.ray.io/en/master/serve/key-concepts.html#deployment"} 181 | {"question": "What’s the import path that I need to provide to a simple RayServe deployment?", "source": "https://docs.ray.io/en/master/serve/production-guide/config.html#serve-config-files-serve-build"} 182 | {"question": "do you have any documents / examples showing the usage of RayJob in Kuberay?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/experimental.html#rayjobs"} 183 | {"question": "Does Ray metrics have to be exported via an actor?", "source": "https://docs.ray.io/en/master/cluster/metrics.html#processing-and-exporting-metrics"} 184 | {"question": "how can I avoid objects not getting spilled?", "source": "https://docs.ray.io/en/master/ray-core/objects/object-spilling.html#single-node"} 185 | {"question": "what’s ray core", "source": "https://docs.ray.io/en/master/ray-core/walkthrough.html#what-is-ray-core"} 186 | {"question": "Does ray support cron job", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/index.html#ray-jobs-api"} 187 | {"question": "can you give me the dependencies list for api read_images?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_images.html#ray-data-read-images"} 188 | {"question": "how do I kill a specific serve replica", "source": "https://docs.ray.io/en/master/serve/production-guide/fault-tolerance.html"} 189 | {"question": "What exactly is rayjob? How is it handled in kuberay? 
Can you give an example of what a Rayjob will look like?", "source": "https://ray-project.github.io/kuberay/guidance/rayjob/"} 190 | -------------------------------------------------------------------------------- /datasets/eval-dataset-v1.jsonl: -------------------------------------------------------------------------------- 1 | {"question": "I’m struggling a bit with Ray Data type conversions when I do map_batches. Any advice?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format"} 2 | {"question": "How does autoscaling work in a Ray Serve application?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling"} 3 | {"question": "how do I get the address of a ray node", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information"} 4 | {"question": "Does Ray support NCCL?", "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html"} 5 | {"question": "Is Ray integrated with DeepSpeed?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html#fine-tuning-the-model-with-ray-air-a-name-train-a"} 6 | {"question": "what will happen if I use AsyncIO's await to wait for a Ray future like `await x.remote()`", "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#objectrefs-as-asyncio-futures"} 7 | {"question": "How would you compare Spark, Ray, Dask?", "source": "https://docs.ray.io/en/master/data/overview.html#how-does-ray-data-compare-to-x-for-offline-inference"} 8 | {"question": "why would ray overload a node w/ more task that the resources allow ?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#physical-resources-and-logical-resources"} 9 | {"question": "when should I use Ray Client?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#when-to-use-ray-client"} 10 | {"question": "how to scatter actors across the cluster?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/index.html#spread"} 11 | {"question": "On remote ray cluster, when I do `ray debug` I'm getting connection refused error. Why ?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/ray-debugging.html#running-on-a-cluster"} 12 | {"question": "How does Ray AIR set up the model to communicate gradient updates across machines?", "source": "https://docs.ray.io/en/master/train/train.html#intro-to-ray-train"} 13 | {"question": "Why would I use Ray Serve instead of Modal or Seldon? 
Why can't I just do it via containers?", "source": "https://docs.ray.io/en/master/serve/index.html"} 14 | {"question": "How do I deploy an LLM workload on top of Ray Serve?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html"} 15 | {"question": "what size of memory should I need for this if I am setting set the `model_id` to “EleutherAI/gpt-j-6B”?", "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_serving.html"} 16 | {"question": "How do I log the results from multiple distributed workers into a single tensorboard?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-output.html#how-to-log-your-tune-runs-to-tensorboard"} 17 | {"question": "how do you config SyncConfig for a Ray AIR job?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-storage.html#on-a-multi-node-cluster-deprecated"} 18 | {"question": "how can I quickly narrow down the root case of a failed ray job, assuming I have access to all the logs", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-files-in-logging-directory"} 19 | {"question": "How do I specify how many GPUs a serve deployment needs?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#resource-management-cpus-gpus"} 20 | {"question": "One of my worker nodes keeps dying on using TensorflowTrainer with around 1500 workers, I observe SIGTERM has been received to the died node's raylet. How can I debug this?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-files-in-logging-directory"} 21 | {"question": "what are the possible reasons for nodes dying in a cluster?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html"} 22 | {"question": "how do I programatically get ray remote cluster to a target size immediately without scaling up through autoscaler ?", "source": "https://docs.ray.io/en/master/cluster/running-applications/autoscaling/reference.html#ray-autoscaler-sdk-request-resources"} 23 | {"question": "how do you disable async iter_batches with Ray Dataset?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.iter_batches.html#ray-data-dataset-iter-batches"} 24 | {"question": "what is the different between a batch and a block, for ray datasets?", "source": "https://docs.ray.io/en/master/data/data-internals.html#datasets-and-blocks"} 25 | {"question": "How to setup the development environments for ray project?", "source": "https://docs.ray.io/en/master/ray-contribute/development.html"} 26 | {"question": "how do I debug why ray rollout workers are deadlocking when using the sample API in `ray/rllib/evaluation/rollout_worker.py`", "source": "https://docs.ray.io/en/master/rllib/rllib-dev.html#troubleshooting"} 27 | {"question": "how do I join two ray datasets?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.zip.html"} 28 | {"question": "Is there a way to retrieve an object ref from its id?", "source": "https://docs.ray.io/en/master/ray-core/objects.html"} 29 | {"question": "how to create model Checkpoint from the model in memory?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.torch.TorchCheckpoint.from_model.html#ray-train-torch-torchcheckpoint-from-model"} 30 | {"question": "what is Deployment in Ray Serve?", "source": "https://docs.ray.io/en/master/serve/key-concepts.html#deployment"} 31 | {"question": "What is user config in Ray Serve? 
how do I use it?", "source": "https://docs.ray.io/en/master/serve/configure-serve-deployment.html#configure-ray-serve-deployments"} 32 | {"question": "What is the difference between PACK and SPREAD strategy?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/placement-group.html#placement-strategy"} 33 | {"question": "What’s the best way to run ray across multiple machines?", "source": "https://docs.ray.io/en/master/ray-core/cluster/index.html"} 34 | {"question": "how do I specify ScalingConfig for a Tuner run?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Tuner.html"} 35 | {"question": "how to utilize ‘zero-copy’ feature ray provide for numpy?", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#numpy-arrays"} 36 | {"question": "if there are O(millions) of keys that all have state, is it ok to spin up 1=1 actors? Or would it be advised to create ‘key pools’ where an actor can hold 1=many keys?", "source": "https://docs.ray.io/en/master/ray-core/patterns/too-fine-grained-tasks.html"} 37 | {"question": "How to find the best checkpoint from the trial directory?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ExperimentAnalysis.html"} 38 | {"question": "what are the advantage and disadvantage of using singleton Actor ?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html"} 39 | {"question": "what are the advantages of using a named actor?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html"} 40 | {"question": "How do I read a text file stored on S3 using Ray Data?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_text.html"} 41 | {"question": "how do I get the IP of the head node for my Ray cluster?", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information"} 42 | {"question": "How to write a map function that returns a list of object for `map_batches`?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 43 | {"question": "How do I set a maximum episode length when training with Rllib?", "source": "https://docs.ray.io/en/master/rllib/key-concepts.html"} 44 | {"question": "how do I make a Ray Tune trial retry on failures?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-fault-tolerance.html"} 45 | {"question": "For the supervised actor pattern, can we keep the Worker Actor up if the Supervisor passes a reference to the Actor to another Actor, to allow the worker actor to remain even on Supervisor / Driver failure?", "source": "https://docs.ray.io/en/master/ray-core/patterns/tree-of-actors.html"} 46 | {"question": "How do I read a large text file in S3 with Ray?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_text.html"} 47 | {"question": "how do I get a ray dataset from pandas", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.from_pandas.html"} 48 | {"question": "can you give me an example of using `ray.data.map` ?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map.html"} 49 | {"question": "can you give me an example of using `ray.data.map` , with a callable class as input?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map.html"} 50 | {"question": "How to set memory limit for each trial in Ray Tuner?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-resources.html"} 51 | {"question": "how do I get the actor id of an actor", "source": 
"https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_context.get_runtime_context.html"} 52 | {"question": "can ray.init() can check if ray is all-ready initiated ?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.init.html"} 53 | {"question": "What does the `compute=actor` argument do within `ray.data.map_batches` ?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html"} 54 | {"question": "how do I use wandb logger with accelerateTrainer?", "source": "https://docs.ray.io/en/master/tune/examples/tune-wandb.html"} 55 | {"question": "What will be implicitly put into object store?", "source": "https://docs.ray.io/en/master/ray-core/objects.html#objects"} 56 | {"question": "How do I kill or cancel a ray task that I already started?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#cancelling-misbehaving-tasks"} 57 | {"question": "how to send extra arguments in dataset.map_batches function?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 58 | {"question": "where does ray GCS store the history of jobs run on a kuberay cluster? What type of database and format does it use for this?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html#external-redis-integration-for-fault-tolerance"} 59 | {"question": "How to resolve ValueError: The actor ImplicitFunc is too large?", "source": "https://docs.ray.io/en/master/ray-core/patterns/closure-capture-large-objects.html"} 60 | {"question": "How do I use ray to distribute training for my custom neural net written using Keras in Databricks?", "source": "https://docs.ray.io/en/master/train/examples/tf/tensorflow_mnist_example.html"} 61 | {"question": "how to use ray.put and ray,get?", "source": "https://docs.ray.io/en/master/ray-core/objects.html#fetching-object-data"} 62 | {"question": "how do I use Ray Data to pre process many files?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#transforming-batches-with-tasks"} 63 | {"question": "can’t pickle SSLContext objects", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#customized-serialization"} 64 | {"question": "How do I install CRDs in Kuberay?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/getting-started.html#deploying-the-kuberay-operator"} 65 | {"question": "Why the function for Ray data batch inference has to be named as _`__call__()`_ ?", "source": "https://docs.ray.io/en/master/data/examples/nyc_taxi_basic_processing.html#parallel-batch-inference"} 66 | {"question": "How to disconnnect ray client?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#connect-to-multiple-ray-clusters-experimental"} 67 | {"question": "how to submit job with python with local files?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/quickstart.html#submitting-a-job"} 68 | {"question": "How do I do inference from a model trained by Ray tune.fit()?", "source": "https://docs.ray.io/en/master/data/batch_inference.html#using-models-from-ray-train"} 69 | {"question": "is there a way to load and run inference without using pytorch or tensorflow directly?", "source": "https://docs.ray.io/en/master/serve/index.html"} 70 | {"question": "what does ray do", "source": "https://docs.ray.io/en/master/ray-overview/index.html#overview"} 71 | {"question": "If I specify a 
fractional GPU in the resource spec, what happens if I use more than that?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#fractional-gpus"} 72 | {"question": "how to pickle a variable defined in actor’s init method", "source": "https://docs.ray.io/en/master/ray-core/objects/serialization.html#customized-serialization"} 73 | {"question": "how do I do an all_reduce operation among a list of actors", "source": "https://docs.ray.io/en/master/ray-core/examples/map_reduce.html#shuffling-and-reducing-data"} 74 | {"question": "What will happen if we specify a bundle with `{\"CPU\":0}` in the PlacementGroup?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/placement-group.html#bundles"} 75 | {"question": "How to cancel job from UI?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/cli.html#ray-job-stop"} 76 | {"question": "how do I get my project files on the cluster when using Ray Serve? My workflow is to call `serve deploy config.yaml --address `", "source": "https://docs.ray.io/en/master/serve/advanced-guides/dev-workflow.html#testing-on-a-remote-cluster"} 77 | {"question": "how do i install ray nightly wheel", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies"} 78 | {"question": "how do i install the latest ray nightly wheel?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#daily-releases-nightlies"} 79 | {"question": "how can I write unit tests for Ray code?", "source": "https://docs.ray.io/en/master/ray-core/examples/testing-tips.html#tip-2-sharing-the-ray-cluster-across-tests-if-possible"} 80 | {"question": "How I stop Ray from spamming lots of Info updates on stdout?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#disable-logging-to-the-driver"} 81 | {"question": "how to deploy stable diffusion 2.1 with Ray Serve?", "source": "https://docs.ray.io/en/master/serve/tutorials/stable-diffusion.html#serving-a-stable-diffusion-model"} 82 | {"question": "what is actor_handle?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#passing-around-actor-handles"} 83 | {"question": "how to kill a r detached actors?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html#actor-lifetimes"} 84 | {"question": "How to force upgrade the pip package in the runtime environment if an old version exists?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 85 | {"question": "How do I do global shuffle with Ray?", "source": "https://docs.ray.io/en/master/data/transforming-data.html#shuffling-rows"} 86 | {"question": "How to find namespace of an Actor?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_actors.html#ray-util-state-list-actors"} 87 | {"question": "How does Ray work with async.io ?", "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#asyncio-for-actors"} 88 | {"question": "How do I debug a hanging `ray.get()` call? 
I have it reproduced locally.", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-hangs.html"} 89 | {"question": "can you show me an example of ray.actor.exit_actor()", "source": "https://docs.ray.io/en/master/ray-core/actors/terminating-actors.html#manual-termination-within-the-actor"} 90 | {"question": "how to add log inside actor?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 91 | {"question": "can you write a script to do batch inference with GPT-2 on text data from an S3 bucket?", "source": "https://docs.ray.io/en/master/data/working-with-text.html#performing-inference-on-text"} 92 | {"question": "How do I enable Ray debug logs?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#using-rays-logger"} 93 | {"question": "How do I list the current Ray actors from python?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_actors.html#ray-util-state-list-actors"} 94 | {"question": "I want to kill the replica actor from Python. how do I do it?", "source": "https://docs.ray.io/en/master/ray-core/actors/terminating-actors.html#manual-termination-via-an-actor-handle"} 95 | {"question": "how do I specify in my remote function declaration that I want the task to run on a V100 GPU type?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#accelerator-types"} 96 | {"question": "How do I get started?", "source": "https://docs.ray.io/en/master/ray-overview/getting-started.html#getting-started"} 97 | {"question": "How to specify python version in runtime_env?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 98 | {"question": "how to create a Actor in a namespace?", "source": "https://docs.ray.io/en/master/ray-core/namespaces.html#using-namespaces"} 99 | {"question": "Can I specify multiple working directories?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 100 | {"question": "what if I set num_cpus=0 for tasks", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#fractional-resource-requirements"} 101 | {"question": "is it possible to have ray on k8s without using kuberay? 
especially with the case that autoscaler is enabled.", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html"} 102 | {"question": "how to manually configure and manage Ray cluster on Kubernetes", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/static-ray-cluster-without-kuberay.html#deploying-a-static-ray-cluster"} 103 | {"question": "If I shutdown a raylet, will the tasks and workers on that node also get killed?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/nodes.html#node-fault-tolerance"} 104 | {"question": "If I’d like to debug out of memory, how do I Do that, and which documentation should I look?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-memory.html#detecting-out-of-memory-errors"} 105 | {"question": "How to use callback in Trainer?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Callback.html#ray-tune-callback"} 106 | {"question": "How to provide current working directory to ray?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#remote-uris"} 107 | {"question": "how to create an actor instance with parameter?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#actors"} 108 | {"question": "how to push a custom module to ray which is using by Actor ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#library-development"} 109 | {"question": "how to print ray working directory?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#runtime-environments"} 110 | {"question": "why I can not see log.info in ray log?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 111 | {"question": "when you use ray dataset to read a file, can you make sure the order of the data is preserved?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 112 | {"question": "Can you explain what \"Ray will *not* retry tasks upon exceptions thrown by application code\" means ?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#retrying-failed-tasks"} 113 | {"question": "how do I specify the log directory when starting Ray?", "source": "https://docs.ray.io/en/master/ray-core/configure.html#logging-and-debugging"} 114 | {"question": "how to launch a ray cluster with 10 nodes, without setting the min worker as 10", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/on-premises.html#start-worker-nodes"} 115 | {"question": "how to use ray api to scale up a cluster", "source": "https://docs.ray.io/en/master/cluster/running-applications/autoscaling/reference.html#ray-autoscaler-sdk-request-resources"} 116 | {"question": "we plan to use Ray cloud launcher to start a cluster in AWS. How can we specify a subnet in the deployment file?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html#full-configuration"} 117 | {"question": "where I can find HTTP server error code log for Ray serve", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 118 | {"question": "I am running ray cluster on amazon and I have troubles displaying the dashboard. When a I tunnel the dashboard port from the headnode to my machine, the dashboard opens, and then it disappears (internal refresh fails). Is it a known problem? 
What am I doing wrong?", "source": "https://docs.ray.io/en/master/cluster/configure-manage-dashboard.html#viewing-ray-dashboard-in-browsers"} 119 | {"question": "In the Ray cluster launcher YAML, does `max_workers` include the head node, or only worker nodes?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html"} 120 | {"question": "How to update files in working directory ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 121 | {"question": "How I can update working directory file when ray allready initiated ?", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#using-local-files"} 122 | {"question": "how can I force ray head node to use custom pem file to ssh worker node?", "source": "https://docs.ray.io/en/master/cluster/vms/references/ray-cluster-configuration.html#full-configuration"} 123 | {"question": "what doess the GCS server do, and why is my GCS server taking up so much memory on the head node?", "source": "https://docs.ray.io/en/master/ray-references/glossary.html"} 124 | {"question": "when starting cluster with ray up, there are few nodes \"pending\" for a long time. how can I debug this?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#ray-status"} 125 | {"question": "how to install Ray 2.5.1 from github or wheel?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#from-wheels"} 126 | {"question": "How do I use `worker_setup_hook` in a runtime env to set do some setup on worker node creation?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#customizing-worker-process-loggers"} 127 | {"question": "how to use Ray dataset on aws", "source": "https://docs.ray.io/en/master/data/key-concepts.html"} 128 | {"question": "How do I avoid my dataset shuffling during a ray.data.map_batches?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 129 | {"question": "Is the order of the input data preserved after a map_batches operation?", "source": "https://docs.ray.io/en/master/data/performance-tips.html#deterministic-execution"} 130 | {"question": "ray serve returns generic internal service error when there is an internal failure, how do I get it to emit more detailed errors or logs?", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 131 | {"question": "how do i track an uncaught exception in ray serve", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 132 | {"question": "where do I view logs using python logger emitted by my ray serve endpoint in the ray cluster", "source": "https://docs.ray.io/en/master/serve/monitoring.html#ray-logging"} 133 | {"question": "where can I see logs for a failed ray serve deployment", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#serve-view"} 134 | {"question": "How to take a subset of a Ray Dataset?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.limit.html#ray-data-dataset-limit"} 135 | {"question": "How do I load all checkpoints from trials of a Tune experiment launched with `tune.run`? 
I ran my initial experiment with cloud checkpointing, so I’d need to download all the checkpoints to analyze them.", "source": "https://docs.ray.io/en/master/tune/tutorials/tune_get_data_in_and_out.html#how-do-i-access-tune-results-after-i-am-finished"} 136 | {"question": "How can I kill a \"detached\" Actor ?", "source": "https://docs.ray.io/en/master/ray-core/actors/named-actors.html#actor-lifetimes"} 137 | {"question": "How do I set env variables in ray init? Let’ say it’s export foo=“foo”", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_env.RuntimeEnv.html#ray-runtime-env-runtimeenv"} 138 | {"question": "What is the rest api for getting the head node id?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.list_nodes.html#ray-util-state-list-nodes"} 139 | {"question": "how to rerun a canceled ray task", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.cancel.html#ray-cancel"} 140 | {"question": "How do I set the max parallel concurrent scheduled tasks in map_batches?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.map_batches.html#ray-data-dataset-map-batches"} 141 | {"question": "How do I get the number of cpus from ray cluster?", "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#resource-information"} 142 | {"question": "How to use the exclude option to the runtime_env", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#api-reference"} 143 | {"question": "show a map batch example with batch_format", "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format"} 144 | {"question": "how to find local ray address", "source": "https://docs.ray.io/en/master/ray-core/examples/gentle_walkthrough.html#ray-core"} 145 | {"question": "How to start ray cluster on multiple node via CLI?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/on-premises.html#manually-set-up-a-ray-cluster"} 146 | {"question": "my ray tuner shows \"running\" but CPU usage is almost 0%. 
why ?", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/optimize-performance.html#no-speedup"} 147 | {"question": "should the Ray head node and all workers have the same object store memory size allocated?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/large-cluster-best-practices.html#configuring-the-head-node"} 148 | {"question": "In Ray Serve, how to specify whether to set up an httpproxy on each node, or just the head node?", "source": "https://docs.ray.io/en/master/serve/architecture.html#how-does-serve-ensure-horizontal-scalability-and-availability"} 149 | {"question": "Want to embed Grafana into the Ray Dashboard, given that I am using KubeRay\n\nGiven the context that Prometheus and Grafana are not running on my Head node, and that I am using KubeRay, how should I be setting the following variables?\n• `RAY_GRAFANA_HOST`\n• `RAY_PROMETHEUS_HOST`\nAnd is there a way to set them more intelligently, given that head node IP is changing every time we reconfigure our cluster?", "source": "https://docs.ray.io/en/master/cluster/metrics.html"} 150 | {"question": "How the GCS determines which Kubernetes pod to kill when using KubeRay autoscaling?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/configuring-autoscaling.html"} 151 | {"question": "How can I set the `request_timeout_s` in `http_options` section of a Ray Serve YAML config file?", "source": "https://docs.ray.io/en/master/serve/production-guide/config.html#serve-config-files-serve-build"} 152 | {"question": "How do I make the GPU available on my M1 laptop to ray?", "source": "https://docs.ray.io/en/master/ray-overview/installation.html#m1-mac-apple-silicon-support"} 153 | {"question": "How can I add a timeout for the Ray job?", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/quickstart.html#interacting-with-long-running-jobs"} 154 | {"question": "how do I set custom /tmp directory for remote cluster?", "source": "https://docs.ray.io/en/master/cluster/cli.html#ray-start"} 155 | {"question": "if I set --temp-dir to a different directory than /tmp, will ray object spill to the custom directory ?", "source": "https://docs.ray.io/en/master/ray-core/objects/object-spilling.html"} 156 | {"question": "can you give me an example for *`--runtime-env-json`*", "source": "https://docs.ray.io/en/master/ray-core/handling-dependencies.html#specifying-a-runtime-environment-per-job"} 157 | {"question": "What should be the value of `maxConcurrentReplicas` if autoscaling configuration is specified?", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling-config-parameters"} 158 | {"question": "Yes what should be the value of `max_concurrent_queries` when `target_num_ongoing_requests_per_replica` is specified?", "source": "https://docs.ray.io/en/master/serve/architecture.html#ray-serve-autoscaling"} 159 | {"question": "what is a `smoothing_factor`", "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling-config-parameters"} 160 | {"question": "What is the reason actors change their state to unhealthy?", "source": "https://docs.ray.io/en/master/ray-observability/reference/doc/ray.util.state.common.ActorState.html#ray-util-state-common-actorstate"} 161 | {"question": "How do I access logs for a dead node?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/logging.html#log-persistence"} 162 | {"question": "What are the reasons for a node 
to change it’s status to dead?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/nodes.html"} 163 | {"question": "What are the reasons for spikes in node CPU utilization", "source": "https://docs.ray.io/en/master/ray-core/patterns/limit-running-tasks.html#pattern-using-resources-to-limit-the-number-of-concurrently-running-tasks"} 164 | {"question": "What AWS machine type is recommended to deploy a RayService on EKS?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/large-cluster-best-practices.html#configuring-the-head-node"} 165 | {"question": "Is there a way to configure the session name generated by ray?", "source": "https://docs.ray.io/en/master/ray-core/configure.html#logging-and-debugging"} 166 | {"question": "can I use the Python SDK to get a link to Ray dashboard for a given job?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#set-up-dashboard"} 167 | {"question": "What may possible cause the node where this task was running crashed unexpectedly. This can happen if: (1) the instance where the node was running failed, (2) raylet crashes unexpectedly (OOM, preempted node, etc).", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/debug-apps/debug-memory.html#debugging-memory-issues"} 168 | {"question": "Do you know how to resolve (gcs_server) gcs_health_check_manager.cc:108: Health check failed for node? I observed that the node is still up and running.", "source": "https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#system-component-logs"} 169 | {"question": "Do you need the DAGDriver to deploy a serve application using RayServe?", "source": "https://docs.ray.io/en/master/serve/key-concepts.html#deployment"} 170 | {"question": "What’s the import path that I need to provide to a simple RayServe deployment?", "source": "https://docs.ray.io/en/master/serve/production-guide/config.html#serve-config-files-serve-build"} 171 | {"question": "do you have any documents / examples showing the usage of RayJob in Kuberay?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/user-guides/experimental.html#rayjobs"} 172 | {"question": "Does Ray metrics have to be exported via an actor?", "source": "https://docs.ray.io/en/master/cluster/metrics.html#processing-and-exporting-metrics"} 173 | {"question": "how can I avoid objects not getting spilled?", "source": "https://docs.ray.io/en/master/ray-core/objects/object-spilling.html#single-node"} 174 | {"question": "what’s ray core", "source": "https://docs.ray.io/en/master/ray-core/walkthrough.html#what-is-ray-core"} 175 | {"question": "Does ray support cron job", "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/index.html#ray-jobs-api"} 176 | {"question": "can you give me the dependencies list for api read_images?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.read_images.html#ray-data-read-images"} 177 | {"question": "how do I kill a specific serve replica", "source": "https://docs.ray.io/en/master/serve/production-guide/fault-tolerance.html"} 178 | -------------------------------------------------------------------------------- /datasets/golden-responses.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "question": "I\u2019m struggling a bit with Ray Data type conversions when I do map_batches. 
Any advice?", 4 | "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format", 5 | "response": "Sure, when using the `map_batches()` function in Ray Data, you can specify the batch format by using the `batch_format` parameter. If you want to work with NumPy ndarrays, you can set `batch_format=\"numpy\"`. For example, if you have a function that increases the brightness of an image, you can use it like this:\n\n```python\nfrom typing import Dict\nimport numpy as np\nimport ray\n\ndef increase_brightness(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:\n batch[\"image\"] = np.clip(batch[\"image\"] + 4, 0, 255)\n return batch\n\nds = (\n ray.data.read_images(\"s3://[email\u00a0protected]/image-datasets/simple\")\n .map_batches(increase_brightness, batch_format=\"numpy\")\n)\n```\n\nOn the other hand, if you prefer to work with pandas DataFrames, you can set `batch_format=\"pandas\"`. For instance, if you have a function that drops NA values from a DataFrame, you can use it like this:\n\n```python\nimport pandas as pd\nimport ray\n\ndef drop_nas(batch: pd.DataFrame) -> pd.DataFrame:\n return batch.dropna()\n\nds = (\n ray.data.read_csv(\"s3://[email\u00a0protected]/iris.csv\")\n .map_batches(drop_nas, batch_format=\"pandas\")\n)\n```\n\nRemember, by default, Ray Data represents batches as dicts of NumPy ndarrays." 6 | }, 7 | { 8 | "question": "How does autoscaling work in a Ray Serve application?", 9 | "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling", 10 | "response": "In a Ray Serve application, autoscaling works by adjusting to traffic spikes through observing queue sizes and making scaling decisions to add or remove replicas. This is achieved by setting the autoscaling_config field in deployment options. The configuration includes parameters like min_replicas, max_replicas, initial_replicas, and target_num_ongoing_requests_per_replica. \n\nThe min_replicas and max_replicas fields define the range of replicas from which the Serve autoscaler can choose. Deployments start with initial_replicas, which is optional and defaults to min_replicas if not specified. The target_num_ongoing_requests_per_replica configuration determines how aggressively the autoscaler reacts to traffic. \n\nThe Ray Serve Autoscaler operates at the application level and sits on top of the Ray Autoscaler. It requests Ray to start a number of replica actors based on the request demand. If the Ray Autoscaler determines there aren\u2019t enough available CPUs to place these actors, it requests more Ray nodes from the underlying cloud provider. \n\nWhen Ray Serve scales down and terminates some replica actors, it tries to do so in a way that results in the most nodes having no Ray actors or tasks running on them, at which point the Ray autoscaler will remove those nodes. \n\nThere are also parameters like downscale_delay_s and upscale_delay_s that control the frequency of autoscaling work, and smoothing_factor that speeds up or slows down each autoscaling step. The metrics_interval_s parameter controls how often each replica sends metrics to the autoscaler." 11 | }, 12 | { 13 | "question": "how do I get the address of a ray node", 14 | "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information", 15 | "response": "You can get the address of a Ray node by accessing the 'NodeManagerAddress' field of the node's information. This field contains the PrivateIP of the node that the raylet is on." 
16 | }, 17 | { 18 | "question": "Does Ray support NCCL?", 19 | "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html", 20 | "response": "Yes, Ray does support NCCL. It uses NCCL as one of the optional high-performance communication backends in its collective communication library." 21 | }, 22 | { 23 | "question": "Is Ray integrated with DeepSpeed?", 24 | "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html#fine-tuning-the-model-with-ray-air-a-name-train-a", 25 | "response": "Yes, Ray is integrated with DeepSpeed. This integration provides optimization of the training process, including offloading and partitioning of optimizer and parameter states, which can decrease GPU memory usage. This is especially beneficial when dealing with large models that may not fit on smaller GPU types." 26 | }, 27 | { 28 | "question": "what will happen if I use AsyncIO's await to wait for a Ray future like `await x.remote()`", 29 | "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#objectrefs-as-asyncio-futures", 30 | "response": "If you use AsyncIO's await to wait for a Ray future like `await x.remote()`, it will allow you to wait for the completion of the remote task. This is possible because ObjectRefs can be translated to asyncio.Futures. This feature enables you to integrate Ray futures into existing concurrent applications." 31 | }, 32 | { 33 | "question": "How would you compare Spark, Ray, Dask?", 34 | "source": "https://docs.ray.io/en/master/data/overview.html#how-does-ray-data-compare-to-x-for-offline-inference", 35 | "response": "The context only provides information on the comparison between Ray Data and Apache Spark. According to the provided information, both Ray Data and Apache Spark handle batch processing workloads. However, Ray Data uses a streaming paradigm that is more suitable for GPU workloads for deep learning inference. The context does not provide any information on Dask, so a comparison involving Dask cannot be made." 36 | }, 37 | { 38 | "question": "why would ray overload a node w/ more task that the resources allow ?", 39 | "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#physical-resources-and-logical-resources", 40 | "response": "Ray allows for overloading a node with more tasks than the resources allow because it operates on logical resources, not physical ones. This means that the resource requirements of tasks or actors do not impose limits on actual physical resource usage. It's the user's responsibility to ensure that tasks or actors do not use more resources than specified. Ray doesn't provide CPU isolation for tasks or actors, and it lets the operating system schedule and run the task. However, it does provide GPU isolation by setting the CUDA_VISIBLE_DEVICES environment variable." 41 | }, 42 | { 43 | "question": "when should I use Ray Client?", 44 | "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#when-to-use-ray-client", 45 | "response": "Ray Client should be used when you want to connect an interactive Python shell to a remote cluster. It is particularly useful for developing interactively in a local Python shell. However, it requires a stable connection to the remote cluster and will terminate the workload if the connection is lost for more than 30 seconds. If you have a long running workload that you want to run on your cluster, it is recommended to use Ray Jobs instead. 
Please note that Ray Client may not work as expected for ML workloads like Ray Tune or Ray Train." 46 | }, 47 | { 48 | "question": "how to scatter actors across the cluster?", 49 | "source": "https://docs.ray.io/en/master/ray-core/scheduling/index.html#spread", 50 | "response": "To scatter actors across the cluster, you can use the \"SPREAD\" scheduling strategy. This can be done by creating instances of the actor with the \"SPREAD\" option. Here is an example:\n\n```python\nactors = [SpreadActor.options(scheduling_strategy=\"SPREAD\").remote() for _ in range(10)]\n```\n\nIn this example, 10 instances of the SpreadActor are created and spread across the cluster." 51 | } 52 | ] -------------------------------------------------------------------------------- /datasets/synthetic-eval-dataset.jsonl: -------------------------------------------------------------------------------- 1 | {"question": "What is the purpose of the \"num_gpus\" parameter in the AlgorithmConfig.resources function?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.resources.html#ray-rllib-algorithms-algorithm-config-algorithmconfig-resources"} 2 | {"question": "How can the \"num_learner_workers\" parameter be used to enable multi-GPU training in the algorithm?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.algorithms.algorithm_config.AlgorithmConfig.resources.html#ray-rllib-algorithms-algorithm-config-algorithmconfig-resources"} 3 | {"question": "What is the purpose of the `sum` method in the `ray.data.Dataset` class?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.sum.html#ray-data-dataset-sum"} 4 | {"question": "How does the `ignore_nulls` parameter affect the computation of the sum in the `sum` method?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.sum.html#ray-data-dataset-sum"} 5 | {"question": "What is the purpose of the TorchVisionPreprocessor.transform_stats() method?", "source": "https://docs.ray.io/en/master/ray-air/api/doc/ray.data.preprocessors.TorchVisionPreprocessor.transform_stats.html#ray-data-preprocessors-torchvisionpreprocessor-transform-stats"} 6 | {"question": "Why is the TorchVisionPreprocessor.transform_stats() method deprecated and what does this mean for future Ray releases?", "source": "https://docs.ray.io/en/master/ray-air/api/doc/ray.data.preprocessors.TorchVisionPreprocessor.transform_stats.html#ray-data-preprocessors-torchvisionpreprocessor-transform-stats"} 7 | {"question": "What are trial checkpoints and why are they important in the Tune framework?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-trial-checkpoints.html#how-to-save-and-load-trial-checkpoints"} 8 | {"question": "How can trial-level checkpoints be saved and loaded in Tune's Trainable API?", "source": "https://docs.ray.io/en/master/tune/tutorials/tune-trial-checkpoints.html#how-to-save-and-load-trial-checkpoints"} 9 | {"question": "How does the ActorPool class in the ray.util module operate on a fixed pool of actors? Provide an example.", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.util.ActorPool.html#ray-util-actorpool"} 10 | {"question": "What is the purpose of the get_next_unordered() method in the ActorPool class? How does it differ from the get_next() method?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.util.ActorPool.html#ray-util-actorpool"} 11 | {"question": "What is the purpose of the HyperBandScheduler class in Ray Tune? 
How does it contribute to the early stopping of trials in hyperparameter optimization?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.schedulers.HyperBandScheduler.html#ray-tune-schedulers-hyperbandscheduler"} 12 | {"question": "Can you explain the concept of brackets in the HyperBand algorithm? How are low-performing trials identified and early stopped within each bracket?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.schedulers.HyperBandScheduler.html#ray-tune-schedulers-hyperbandscheduler"} 13 | {"question": "What is the purpose of the `ProgressReporter` class in the Ray Tune library?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ProgressReporter.html#ray-tune-progressreporter"} 14 | {"question": "How does the `should_report()` function in the `ProgressReporter` class determine whether progress should be reported?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ProgressReporter.html#ray-tune-progressreporter"} 15 | {"question": "What is the purpose of the variable \"SampleBatch.PREV_REWARDS\" in the Ray RLlib policy sample batch?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.sample_batch.SampleBatch.PREV_REWARDS.html#ray-rllib-policy-sample-batch-samplebatch-prev-rewards"} 16 | {"question": "How does the \"prev_rewards\" field in the Ray RLlib policy sample batch contribute to the overall functionality of the RLlib library?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.sample_batch.SampleBatch.PREV_REWARDS.html#ray-rllib-policy-sample-batch-samplebatch-prev-rewards"} 17 | {"question": "What is the purpose of creating a regularization image set in the context of the Stable Diffusion model?", "source": "https://docs.ray.io/en/master/ray-air/examples/dreambooth_finetuning.html#step-3-create-the-regularization-images"} 18 | {"question": "How does the use of Ray Data help in generating more images in parallel during batch inference?", "source": "https://docs.ray.io/en/master/ray-air/examples/dreambooth_finetuning.html#step-3-create-the-regularization-images"} 19 | {"question": "How does the BayesOptSearch algorithm integrate the given analysis into the gaussian process?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.bayesopt.BayesOptSearch.register_analysis.html#ray-tune-search-bayesopt-bayesoptsearch-register-analysis"} 20 | {"question": "What is the purpose of the analysis parameter in the BayesOptSearch.register_analysis() method?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.bayesopt.BayesOptSearch.register_analysis.html#ray-tune-search-bayesopt-bayesoptsearch-register-analysis"} 21 | {"question": "How can TFRecord files be created from tf.train.Example messages?", "source": "https://docs.ray.io/en/master/data/api/input_output.html#tfrecords"} 22 | {"question": "What is the purpose of using TFRecord files in the context of creating a Dataset?", "source": "https://docs.ray.io/en/master/data/api/input_output.html#tfrecords"} 23 | {"question": "What are the key concepts of RLlib mentioned in the document? Provide examples for each concept.", "source": "https://docs.ray.io/en/master/rllib/index.html#feature-overview"} 24 | {"question": "How does RLlib support highly distributed learning? 
Explain the role of the \"num_workers\" config parameter in parallelizing and speeding up learning.", "source": "https://docs.ray.io/en/master/rllib/index.html#feature-overview"} 25 | {"question": "What is the purpose of logs in troubleshooting Ray applications and Clusters? How can system logs be helpful in identifying issues with a node termination?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/logging.html#log-persistence"} 26 | {"question": "Since Ray does not provide a native storage solution for log data, what is the responsibility of users in managing the lifecycle of logs? Can you explain the steps involved in collecting logs from Ray Clusters running on VMs?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/logging.html#log-persistence"} 27 | {"question": "What is the purpose of the is_time_major() method in the TorchModelV2 class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.models.torch.torch_modelv2.TorchModelV2.is_time_major.html#ray-rllib-models-torch-torch-modelv2-torchmodelv2-is-time-major"} 28 | {"question": "How does the is_time_major() method determine whether the data format should be time-major or not?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.models.torch.torch_modelv2.TorchModelV2.is_time_major.html#ray-rllib-models-torch-torch-modelv2-torchmodelv2-is-time-major"} 29 | {"question": "What is the purpose of the SyncSampler API in the given context?", "source": "https://docs.ray.io/en/master/rllib/package_ref/evaluation.html#synchronous-sampler-api"} 30 | {"question": "How does the SyncSampler differ from other types of samplers in terms of data collection?", "source": "https://docs.ray.io/en/master/rllib/package_ref/evaluation.html#synchronous-sampler-api"} 31 | {"question": "What are the possible values for the \"status\" metric at the workflow level?", "source": "https://docs.ray.io/en/master/workflows/metadata.html#available-metrics"} 32 | {"question": "How can users provide custom metadata for a workflow or task?", "source": "https://docs.ray.io/en/master/workflows/metadata.html#available-metrics"} 33 | {"question": "How does the Repeater algorithm in the ray.tune.search module work?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.Repeater.html#ray-tune-search-repeater"} 34 | {"question": "What is the purpose of the Repeater algorithm in the context of hyperparameter optimization?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.Repeater.html#ray-tune-search-repeater"} 35 | {"question": "How can you obtain the desired nodes from the LSF scheduler using bsub directives?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/community/lsf.html#deploying-on-lsf"} 36 | {"question": "What steps are involved in starting a Ray cluster on LSF and running DL workloads through it?", "source": "https://docs.ray.io/en/master/cluster/vms/user-guides/community/lsf.html#deploying-on-lsf"} 37 | {"question": "How can you cancel a misbehaving task in Ray? What happens when you call ray.cancel on an ObjectRef?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#cancelling-misbehaving-tasks"} 38 | {"question": "How can you prevent memory leaks caused by repeated task executions in Ray? 
What option can you set in a task's @ray.remote decorator to automatically exit a worker after a certain number of invocations?", "source": "https://docs.ray.io/en/master/ray-core/fault_tolerance/tasks.html#cancelling-misbehaving-tasks"} 39 | {"question": "How does NevergradSearch handle the optimization process when interacting with Nevergrad Optimizers?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.nevergrad.NevergradSearch.on_trial_complete.html#ray-tune-search-nevergrad-nevergradsearch-on-trial-complete"} 40 | {"question": "In the context of NevergradSearch, why is the result internally negated before being passed to Nevergrad Optimizers?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.nevergrad.NevergradSearch.on_trial_complete.html#ray-tune-search-nevergrad-nevergradsearch-on-trial-complete"} 41 | {"question": "Based on the context information provided, what is the activation function used for the objective_bb3b2702 experiment?", "source": "https://docs.ray.io/en/master/tune/examples/optuna_example.html#providing-an-initial-set-of-hyperparameters"} 42 | {"question": "What is the mean loss value for the objective_b9acff32 experiment?", "source": "https://docs.ray.io/en/master/tune/examples/optuna_example.html#providing-an-initial-set-of-hyperparameters"} 43 | {"question": "What is the purpose of the function `set_get_interceptor` in the `SampleBatch` class of the `ray.rllib.policy.sample_batch` module?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.sample_batch.SampleBatch.set_get_interceptor.html#ray-rllib-policy-sample-batch-samplebatch-set-get-interceptor"} 44 | {"question": "How does the `set_get_interceptor` function work in the `SampleBatch` class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.sample_batch.SampleBatch.set_get_interceptor.html#ray-rllib-policy-sample-batch-samplebatch-set-get-interceptor"} 45 | {"question": "What command should be executed to delete a local kind cluster?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/getting-started.html#deleting-a-local-kind-cluster"} 46 | {"question": "As a teacher, explain the purpose of running the command \"kind delete cluster\" in the context of setting up a local kind cluster.", "source": "https://docs.ray.io/en/master/cluster/kubernetes/getting-started.html#deleting-a-local-kind-cluster"} 47 | {"question": "What is the average duration of each epoch in the training process?", "source": "https://docs.ray.io/en/master/ray-air/examples/dolly_lightning_fsdp_finetuning.html#fine-tune-with-lightningtrainer"} 48 | {"question": "How does the training loss change over the course of the epochs?", "source": "https://docs.ray.io/en/master/ray-air/examples/dolly_lightning_fsdp_finetuning.html#fine-tune-with-lightningtrainer"} 49 | {"question": "What is the purpose of the `get_current()` method in the `DataContext` class?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.DataContext.get_current.html#ray-data-datacontext-get-current"} 50 | {"question": "How does the `get_current()` method in the `DataContext` class handle the initialization of the context if it has not yet been created in the process?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.DataContext.get_current.html#ray-data-datacontext-get-current"} 51 | {"question": "Explain the concept of parallel execution in the given C++ code. 
How does it relate to the creation and incrementing of the Counter actors?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#calling-the-actor"} 52 | {"question": "Describe the difference between executing tasks in parallel and executing tasks serially in the context of the given C++ code. How does the code demonstrate both types of execution?", "source": "https://docs.ray.io/en/master/ray-core/actors.html#calling-the-actor"} 53 | {"question": "What is the purpose of the `TuneReportCallback.order` parameter in the `ray.tune.integration.lightgbm` module?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.integration.lightgbm.TuneReportCallback.order.html#ray-tune-integration-lightgbm-tunereportcallback-order"} 54 | {"question": "How does the `TuneReportCallback` class contribute to the integration of Ray Tune with LightGBM?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.integration.lightgbm.TuneReportCallback.order.html#ray-tune-integration-lightgbm-tunereportcallback-order"} 55 | {"question": "What is the purpose of the ray.util.collective.collective.allgather_multigpu() function? Explain its parameters and return value.", "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html#module-ray.util.collective.collective"} 56 | {"question": "How does the ray.util.collective.collective.reducescatter_multigpu() function work? Describe its input parameters and the purpose of the function.", "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html#module-ray.util.collective.collective"} 57 | {"question": "What is the purpose of the TorchTrainer.fit() method in the given context?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.torch.TorchTrainer.fit.html#ray-train-torch-torchtrainer-fit"} 58 | {"question": "What exception is raised if there are any failures during the execution of the self.as_trainable() method or during the Tune execution loop?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.torch.TorchTrainer.fit.html#ray-train-torch-torchtrainer-fit"} 59 | {"question": "How does the sleep function simulate the cost of accessing and processing data from the database in the given code?", "source": "https://docs.ray.io/en/master/ray-core/examples/gentle_walkthrough.html#your-first-ray-api-example"} 60 | {"question": "Why is the measured runtime of the function 2.8 seconds considered the worst case scenario?", "source": "https://docs.ray.io/en/master/ray-core/examples/gentle_walkthrough.html#your-first-ray-api-example"} 61 | {"question": "What is the purpose of the `tune_mnist` function in the given code?", "source": "https://docs.ray.io/en/master/tune/examples/includes/mlflow_ptl_example.html#mlflow-pytorch-lightning-example"} 62 | {"question": "How does the `tuner` object in the code determine the best hyperparameters for the model?", "source": "https://docs.ray.io/en/master/tune/examples/includes/mlflow_ptl_example.html#mlflow-pytorch-lightning-example"} 63 | {"question": "How can fractional resource requirements be used in Ray? Provide an example.", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#fractional-resource-requirements"} 64 | {"question": "What is the precision of the fractional resource requirement in Ray? 
How should it be specified?", "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#fractional-resource-requirements"} 65 | {"question": "What is the purpose of the `Histogram.info` property in the `ray.util.metrics` module?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.util.metrics.Histogram.info.html#ray-util-metrics-histogram-info"} 66 | {"question": "How can the `Histogram.info` property be used to obtain information about a histogram metric in Ray?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.util.metrics.Histogram.info.html#ray-util-metrics-histogram-info"} 67 | {"question": "What is the purpose of the `update_ops` method in the `TFModelV2` class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.models.tf.tf_modelv2.TFModelV2.update_ops.html#ray-rllib-models-tf-tf-modelv2-tfmodelv2-update-ops"} 68 | {"question": "Can you provide an example of an update operation that would be included in the list returned by the `update_ops` method?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.models.tf.tf_modelv2.TFModelV2.update_ops.html#ray-rllib-models-tf-tf-modelv2-tfmodelv2-update-ops"} 69 | {"question": "How can you divide an application's steps into independent deployments using ServeHandles?", "source": "https://docs.ray.io/en/master/serve/model_composition.html#composing-deployments-using-servehandles"} 70 | {"question": "What is the purpose of using the ServeHandle in composing deployments?", "source": "https://docs.ray.io/en/master/serve/model_composition.html#composing-deployments-using-servehandles"} 71 | {"question": "What is the purpose of the \"Annotations\" subdirectory in the AnimalDetection dataset?", "source": "https://docs.ray.io/en/master/ray-air/examples/torch_detection.html#create-a-dataset"} 72 | {"question": "How many classes are included in the full Pascal VOC dataset, and what are their corresponding labels?", "source": "https://docs.ray.io/en/master/ray-air/examples/torch_detection.html#create-a-dataset"} 73 | {"question": "What is the purpose of increasing the value of the `num_workers` argument in the `DataLoader` initialization for the `train_dataloader` and `val_dataloader`?", "source": "https://docs.ray.io/en/master/train/examples/lightning/lightning_cola_advanced.html#fine-tune-the-model-with-lightningtrainer"} 74 | {"question": "How can increasing the number of workers in the `DataLoader` initialization improve performance in the context of the given document?", "source": "https://docs.ray.io/en/master/train/examples/lightning/lightning_cola_advanced.html#fine-tune-the-model-with-lightningtrainer"} 75 | {"question": "How can the progress of the benchmark be observed? What tools should be used for this purpose?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#observe-progress"} 76 | {"question": "What is the maximum duration for the benchmark to run? 
How long should one wait before considering it complete?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#observe-progress"} 77 | {"question": "How can adjusting the prefetch_batches argument for DataIterator.iter_batches help improve performance in a network bottleneck situation?", "source": "https://docs.ray.io/en/master/train/distributed-pytorch/data-loading-preprocessing.html#other-performance-tips"} 78 | {"question": "What is the purpose of using print(ds.stats()) or print(iterator.stats()) in the context of Ray Data performance?", "source": "https://docs.ray.io/en/master/train/distributed-pytorch/data-loading-preprocessing.html#other-performance-tips"} 79 | {"question": "What is the purpose of the \"accelerator_type\" option in Ray?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#accelerator-types"} 80 | {"question": "How does the \"accelerator_type\" option in Ray help in managing resource allocation and autoscaling?", "source": "https://docs.ray.io/en/master/ray-core/tasks/using-ray-with-gpus.html#accelerator-types"} 81 | {"question": "What are the two ways to achieve logging results and uploading models to Weights & Biases?", "source": "https://docs.ray.io/en/master/ray-air/examples/upload_to_wandb.html#logging-results-and-uploading-models-to-weights-biases"} 82 | {"question": "How can you interact with Weights and Biases when logging training results?", "source": "https://docs.ray.io/en/master/ray-air/examples/upload_to_wandb.html#logging-results-and-uploading-models-to-weights-biases"} 83 | {"question": "What is the purpose of using the torch.utils.data.Dataset in the code?", "source": "https://docs.ray.io/en/master/ray-air/examples/convert_existing_pytorch_code_to_ray_air.html#unmodified"} 84 | {"question": "How does the code determine whether to use the CPU or GPU for training the neural network?", "source": "https://docs.ray.io/en/master/ray-air/examples/convert_existing_pytorch_code_to_ray_air.html#unmodified"} 85 | {"question": "What is the purpose of the \"unwrapped\" method in the RLModule class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.core.rl_module.rl_module.RLModule.unwrapped.html#ray-rllib-core-rl-module-rl-module-rlmodule-unwrapped"} 86 | {"question": "Can you provide an example of a wrapped module and explain how it is different from the underlying module?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.core.rl_module.rl_module.RLModule.unwrapped.html#ray-rllib-core-rl-module-rl-module-rlmodule-unwrapped"} 87 | {"question": "How can hyperparameter tuning impact the performance of a machine learning model?", "source": "https://docs.ray.io/en/master/tune/examples/tune-xgboost.html#conclusion"} 88 | {"question": "In what scenarios can intelligent hyperparameter tuning make a significant difference in model performance?", "source": "https://docs.ray.io/en/master/tune/examples/tune-xgboost.html#conclusion"} 89 | {"question": "What is the purpose of the `make_rl_module` method in the `TorchPolicyV2` class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.torch_policy_v2.TorchPolicyV2.make_rl_module.html#ray-rllib-policy-torch-policy-v2-torchpolicyv2-make-rl-module"} 90 | {"question": "Under what condition will RLlib error out if the `make_rl_module` method is not implemented?", "source": 
"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.torch_policy_v2.TorchPolicyV2.make_rl_module.html#ray-rllib-policy-torch-policy-v2-torchpolicyv2-make-rl-module"} 91 | {"question": "Explain the concept of Prioritized Experience Replay and its significance in reinforcement learning. How does it differ from other replay buffer algorithms?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.utils.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBuffer.html#ray-rllib-utils-replay-buffers-prioritized-replay-buffer-prioritizedreplaybuffer"} 92 | {"question": "What is the main idea behind the PrioritizedReplayBuffer class in the ray.rllib.utils.replay_buffers.prioritized_replay_buffer module? How does it implement the Prioritized Experience Replay algorithm?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.utils.replay_buffers.prioritized_replay_buffer.PrioritizedReplayBuffer.html#ray-rllib-utils-replay-buffers-prioritized-replay-buffer-prioritizedreplaybuffer"} 93 | {"question": "What information does the Overview Metrics page provide in the Ray Cluster Overview view? How can it be useful for monitoring the cluster's performance?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#overview-view"} 94 | {"question": "Explain the purpose of the Events view in the Ray Cluster interface. What types of events can be found in this view and how can they be accessed using CLI commands?", "source": "https://docs.ray.io/en/master/ray-observability/getting-started.html#overview-view"} 95 | {"question": "What is the purpose of deploying the KubeRay operator in a Kubernetes cluster?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#deploy-the-kuberay-operator"} 96 | {"question": "Where can you find instructions on deploying the KubeRay operator in a Kubernetes cluster?", "source": "https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#deploy-the-kuberay-operator"} 97 | {"question": "How does Ray ensure fault tolerance in the event of object data loss? Explain the concept of lineage reconstruction and its role in recovering from data loss.", "source": "https://docs.ray.io/en/master/ray-core/objects.html#fault-tolerance"} 98 | {"question": "What is the limitation of Ray's fault tolerance mechanism in terms of owner failure? 
Discuss the implications of this limitation and any potential solutions that could be implemented.", "source": "https://docs.ray.io/en/master/ray-core/objects.html#fault-tolerance"} 99 | {"question": "What is the purpose of the Dataset constructor in the ray.data module?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.__init__.html#ray-data-dataset-init"} 100 | {"question": "How can a Dataset be constructed using the ray.data module?", "source": "https://docs.ray.io/en/master/data/api/doc/ray.data.Dataset.__init__.html#ray-data-dataset-init"} 101 | {"question": "How does the \"rollout\" function in the given code sample contribute to the evaluation of the environment and model?", "source": "https://docs.ray.io/en/master/ray-core/examples/plot_pong_example.html#helper-functions"} 102 | {"question": "Can you explain the purpose of the \"dlogps\" list in the \"rollout\" function and how it is computed?", "source": "https://docs.ray.io/en/master/ray-core/examples/plot_pong_example.html#helper-functions"} 103 | {"question": "What is the purpose of the LightGBMTrainer.fit() method in the given context?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.lightgbm.LightGBMTrainer.fit.html#ray-train-lightgbm-lightgbmtrainer-fit"} 104 | {"question": "What exception is raised if there are any failures during the execution of the LightGBMTrainer.fit() method?", "source": "https://docs.ray.io/en/master/train/api/doc/ray.train.lightgbm.LightGBMTrainer.fit.html#ray-train-lightgbm-lightgbmtrainer-fit"} 105 | {"question": "What does the `popitem()` method in the `RuntimeEnvConfig` class do?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.popitem.html#ray-runtime-env-runtimeenvconfig-popitem"} 106 | {"question": "How are the (key, value) pairs returned by the `popitem()` method in the `RuntimeEnvConfig` class ordered?", "source": "https://docs.ray.io/en/master/ray-core/api/doc/ray.runtime_env.RuntimeEnvConfig.popitem.html#ray-runtime-env-runtimeenvconfig-popitem"} 107 | {"question": "What is the purpose of the `TuneReportCallback.EvalsLog` class in the Ray Tune integration for XGBoost?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.integration.xgboost.TuneReportCallback.EvalsLog.html#ray-tune-integration-xgboost-tunereportcallback-evalslog"} 108 | {"question": "Can you explain the structure and data types of the `TuneReportCallback.EvalsLog` dictionary?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.integration.xgboost.TuneReportCallback.EvalsLog.html#ray-tune-integration-xgboost-tunereportcallback-evalslog"} 109 | {"question": "What is the purpose of the `logger_creator` parameter in the `Algorithm.__init__` method?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.algorithms.algorithm.Algorithm.__init__.html#ray-rllib-algorithms-algorithm-algorithm-init"} 110 | {"question": "How does the `Algorithm` class handle the initialization of an instance?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.algorithms.algorithm.Algorithm.__init__.html#ray-rllib-algorithms-algorithm-algorithm-init"} 111 | {"question": "What is the purpose of calling the Dataset.repartition(n) function before applying custom data transform functions in Ray Data?", "source": "https://docs.ray.io/en/master/data/examples/batch_training.html#transforming-a-dataset-in-parallel-using-custom-functions-a-class-anchor-id-transform-ds-a"} 112 | {"question": "Explain the steps 
involved in the custom data transform function \"transform_df\" and how it is applied to the dataset in parallel.", "source": "https://docs.ray.io/en/master/data/examples/batch_training.html#transforming-a-dataset-in-parallel-using-custom-functions-a-class-anchor-id-transform-ds-a"} 113 | {"question": "How can the price of a mango be updated in the FruitStand example using Serve config?", "source": "https://docs.ray.io/en/master/serve/production-guide/kubernetes.html#example-serve-config-update"} 114 | {"question": "What is the expected result when querying the application after updating the price of a mango to 4?", "source": "https://docs.ray.io/en/master/serve/production-guide/kubernetes.html#example-serve-config-update"} 115 | {"question": "How does the concept of operations management contribute to the overall success of a business?", "source": "https://docs.ray.io/en/master/data/data-internals.html#operations"} 116 | {"question": "Explain the importance of effective supply chain management in operations and its impact on a company's profitability.", "source": "https://docs.ray.io/en/master/data/data-internals.html#operations"} 117 | {"question": "Explain the purpose and functionality of the PiecewiseSchedule class in the ray.rllib.utils.schedules.piecewise_schedule module. How does it differ from other schedule classes in the module?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.utils.schedules.piecewise_schedule.PiecewiseSchedule.html#ray-rllib-utils-schedules-piecewise-schedule-piecewiseschedule"} 118 | {"question": "Describe the parameters and their significance in the initialization of a PiecewiseSchedule instance. How does the interpolation parameter affect the behavior of the schedule?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.utils.schedules.piecewise_schedule.PiecewiseSchedule.html#ray-rllib-utils-schedules-piecewise-schedule-piecewiseschedule"} 119 | {"question": "What is the purpose of the `get_internal_representation` method in the `Checkpoint` class?", "source": "https://docs.ray.io/en/master/ray-air/api/doc/ray.air.checkpoint.Checkpoint.get_internal_representation.html#ray-air-checkpoint-checkpoint-get-internal-representation"} 120 | {"question": "How does the `get_internal_representation` method return the internal representation of a checkpoint object?", "source": "https://docs.ray.io/en/master/ray-air/api/doc/ray.air.checkpoint.Checkpoint.get_internal_representation.html#ray-air-checkpoint-checkpoint-get-internal-representation"} 121 | {"question": "What are some factors that can affect the performance improvement of neural network architectures during inference?", "source": "https://docs.ray.io/en/master/rllib/rllib-torch2x.html#some-meta-level-comments"} 122 | {"question": "Which backends are recommended for CPU inference in order to optimize performance?", "source": "https://docs.ray.io/en/master/rllib/rllib-torch2x.html#some-meta-level-comments"} 123 | {"question": "How can advanced Python APIs be beneficial for developers?", "source": "https://docs.ray.io/en/master/rllib/rllib-advanced-api.html#advanced-python-apis"} 124 | {"question": "What are some examples of advanced Python APIs and how are they used in practice?", "source": "https://docs.ray.io/en/master/rllib/rllib-advanced-api.html#advanced-python-apis"} 125 | {"question": "What is the purpose of the `restore_from_dir` function in the `Repeater` class?", "source": 
"https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.Repeater.restore_from_dir.html#ray-tune-search-repeater-restore-from-dir"} 126 | {"question": "How can you restore the state of a searcher using the `restore_from_dir` function?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.Repeater.restore_from_dir.html#ray-tune-search-repeater-restore-from-dir"} 127 | {"question": "True or False: According to the context information, the BasicVariantGenerator.is_finished() method returns True only after all trials have finished executing.", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.basic_variant.BasicVariantGenerator.is_finished.html#ray-tune-search-basic-variant-basicvariantgenerator-is-finished"} 128 | {"question": "What does the BasicVariantGenerator.is_finished() method return if there are no trials left to be queued into TrialRunner?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.search.basic_variant.BasicVariantGenerator.is_finished.html#ray-tune-search-basic-variant-basicvariantgenerator-is-finished"} 129 | {"question": "What is the purpose of the ProgressReporter.report function in the given context?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ProgressReporter.report.html#ray-tune-progressreporter-report"} 130 | {"question": "How does the ProgressReporter.report function handle the \"done\" parameter?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.ProgressReporter.report.html#ray-tune-progressreporter-report"} 131 | {"question": "What is the purpose of the `on_trial_restore` callback in Ray Tune?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Callback.on_trial_restore.html#ray-tune-callback-on-trial-restore"} 132 | {"question": "How does the `on_trial_restore` callback function differ from other callbacks in Ray Tune?", "source": "https://docs.ray.io/en/master/tune/api/doc/ray.tune.Callback.on_trial_restore.html#ray-tune-callback-on-trial-restore"} 133 | {"question": "What is the purpose of the offline batch inference in the given context?", "source": "https://docs.ray.io/en/master/ray-air/examples/huggingface_text_classification.html#next-steps"} 134 | {"question": "As a teacher, how would you explain the concept of end-to-end offline batch inference to your students?", "source": "https://docs.ray.io/en/master/ray-air/examples/huggingface_text_classification.html#next-steps"} 135 | {"question": "What is the purpose of the `load_batch_into_buffer` function in the `Policy` class?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.Policy.load_batch_into_buffer.html#ray-rllib-policy-policy-load-batch-into-buffer"} 136 | {"question": "How does the `load_batch_into_buffer` function distribute the data across the devices' memories?", "source": "https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.Policy.load_batch_into_buffer.html#ray-rllib-policy-policy-load-batch-into-buffer"} 137 | -------------------------------------------------------------------------------- /docker-compose.yaml: -------------------------------------------------------------------------------- 1 | version: "3.9" 2 | 3 | services: 4 | postgres: 5 | image: ankane/pgvector 6 | ports: 7 | - 5432:5432 8 | volumes: 9 | - ./postgres_data:/var/lib/postgresql/data 10 | environment: 11 | - POSTGRES_PASSWORD=postgres 12 | - POSTGRES_USER=postgres 13 | - POSTGRES_DB=postgres 14 | -------------------------------------------------------------------------------- 
/images/length-distribution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/run-llama/ai-engineer-workshop/918b7efd79ec631978f484e1d1ad9704fae64306/images/length-distribution.png -------------------------------------------------------------------------------- /images/retrieval-eval.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/run-llama/ai-engineer-workshop/918b7efd79ec631978f484e1d1ad9704fae64306/images/retrieval-eval.png -------------------------------------------------------------------------------- /notebooks/.gitignore: -------------------------------------------------------------------------------- 1 | __pycache__/ 2 | .ipynb_checkpoints/ 3 | -------------------------------------------------------------------------------- /notebooks/01_rag.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "685a91ef-626a-4d76-8f7f-b89bfa6d1d6f", 6 | "metadata": {}, 7 | "source": [ 8 | "# Part 1: Developing the RAG application" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "a2ab54b8-5341-42fa-8790-93e71bbc43b5", 14 | "metadata": {}, 15 | "source": [ 16 | "- GitHub repository: https://github.com/Disiok/ai-engineer-workshop/\n", 17 | "- Ray documentation: https://docs.ray.io/\n", 18 | "- LlamaIndex documentation: https://gpt-index.readthedocs.io/en/stable/" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "id": "536f1270-5328-416e-90c5-9a8e087ae354", 24 | "metadata": {}, 25 | "source": [ 26 | "We will start by building our example RAG application: a Q&A app that given a question about Ray, can answer it using the Ray documentation.\n", 27 | "\n", 28 | "In this notebook we will learn how to:\n", 29 | "1. 💻 Develop a retrieval augmented generation (RAG) based LLM application.\n", 30 | "2. 🚀 Scale the major components (embed, index, etc.) in our application.\n", 31 | "\n", 32 | "We will use both [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/) and [Ray](https://docs.ray.io/) for developing our LLM application. \n", 33 | "\n", 34 | "" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "id": "7aa52945-492f-47ae-aabc-18ad43430f6d", 40 | "metadata": {}, 41 | "source": [ 42 | "## Setup" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "id": "b1f4fa1b-e1a6-402e-8f8a-462b3d02c87d", 48 | "metadata": {}, 49 | "source": [ 50 | "Let's setup our credentials for Open AI" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 46, 56 | "id": "e991060f-c95d-46f0-8bb9-7a310fc17ed3", 57 | "metadata": { 58 | "tags": [] 59 | }, 60 | "outputs": [], 61 | "source": [ 62 | "import os\n", 63 | "\n", 64 | "# os.environ[\"OPENAI_API_KEY\"] = ..." 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "id": "55bd9f4f-ba08-4178-a077-a31751ae91b9", 70 | "metadata": { 71 | "tags": [] 72 | }, 73 | "source": [ 74 | "## Step 1: Loading and parsing the Data" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "id": "f8e0d3e2-f390-4023-b24b-6904bdd361c4", 80 | "metadata": {}, 81 | "source": [ 82 | "To build our RAG application, we first need to load, parse, and embed the data that we want to use for answering our questions. \n", 83 | "\n", 84 | "This data processing pipeline has 3 steps:\n", 85 | "1. First, we will load the latest documentation for Ray\n", 86 | "2. 
Then we will parse the documentation to extract out chunks of text\n", 87 | "3. Finally, we will **embed** each chunk. This creates a vector representation of the provided text snippet. This vector representation allows us to easily determine the similarity between two different text snippets.\n", 88 | "\n", 89 | "" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "id": "d8ca95a6-d8c4-47c8-b960-c14094967e28", 95 | "metadata": {}, 96 | "source": [ 97 | "LlamaIndex provides utlities for loading our data, and also the abstractions for how we represent our data and their relationships.\n", 98 | "\n", 99 | "Ray, and in particular the Ray Data library, is used to scale out our data processing pipeline, allowing us to process data in parallel, leveraging the cores and GPUs in our Ray cluster. " 100 | ] 101 | }, 102 | { 103 | "cell_type": "markdown", 104 | "id": "b78c1823-ac58-4bc3-a0b7-b94e8a7bac52", 105 | "metadata": {}, 106 | "source": [ 107 | "### Load data" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "id": "d310aaa2-9cfc-4a90-bf2b-76ac06f7f68b", 113 | "metadata": {}, 114 | "source": [ 115 | "The Ray documentation has already been downloaded and is stored in shared storage directory in our Anyscale workspace. We parse the html files in the downloaded documentation, and create a Ray Dataset out of the doc paths." 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "id": "20d5d0b9-be8b-491c-8879-09532c70dee6", 122 | "metadata": { 123 | "scrolled": true, 124 | "tags": [] 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "%cd ../datasets\n", 129 | "!unzip -o docs.zip" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "id": "140a2be2-aa55-4223-8c1b-20cc4d5a1f27", 136 | "metadata": { 137 | "tags": [] 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "from pathlib import Path\n", 142 | "\n", 143 | "RAY_DOCS_DIRECTORY = \"../datasets/docs.ray.io/en/master/\"" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": null, 149 | "id": "fd4ae53d-f922-4240-af8b-985d943151fe", 150 | "metadata": { 151 | "tags": [] 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "import ray\n", 156 | "\n", 157 | "docs_path = Path(RAY_DOCS_DIRECTORY)\n", 158 | "ds = ray.data.from_items([{\"path\": path} for path in docs_path.rglob(\"*.html\") if not path.is_dir()])\n", 159 | "print(f\"{ds.count()} documents\")" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "id": "f4a94e5f-aa03-4483-b3a7-0a4509769671", 165 | "metadata": { 166 | "tags": [] 167 | }, 168 | "source": [ 169 | "Now that we have a dataset of all the paths to the html files, we now need to extract text from these HTML files. We want to do this in a generalized manner so that we can perform this extraction across all of our docs pages. \n", 170 | "\n", 171 | "Therefore, we use LlamaIndex's HTMLTagReader to identify the sections in our HTML page and then extract the text in between them. For each section of text, we create a LlamaIndex Document, and also store the source url for that section as part of the metadata for the Document. 
After extracting all the text, we return a list of LlamaIndex documents.\n", 172 | "\n", 173 | "" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "id": "a2ed7959-77c6-473a-a9c5-22bd4e4875f1", 179 | "metadata": {}, 180 | "source": [ 181 | "### Parse data" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "id": "6c872e26-615b-4d91-96f5-603d3828c177", 188 | "metadata": { 189 | "tags": [] 190 | }, 191 | "outputs": [], 192 | "source": [ 193 | "from llama_index.readers import HTMLTagReader" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "id": "ae9a63f7-e9da-4b33-b613-e1bf902d493d", 200 | "metadata": { 201 | "tags": [] 202 | }, 203 | "outputs": [], 204 | "source": [ 205 | "def path_to_uri(path, scheme=\"https://\", domain=\"docs.ray.io\"):\n", 206 | " # Converts the file path of a Ray documentation page to the original URL for the documentation.\n", 207 | " # Example: /efs/shared_storage/goku/docs.ray.io/en/master/rllib-env.html -> https://docs.ray.io/en/master/rllib/rllib-env.html#environments\n", 208 | " return scheme + domain + str(path).split(domain)[-1]\n", 209 | "\n", 210 | "def extract_sections(record):\n", 211 | " # Given a HTML file path, extract out text from the section tags, and return a LlamaIndex document from each one. \n", 212 | " html_file_path = record[\"path\"]\n", 213 | " reader = HTMLTagReader(tag=\"section\")\n", 214 | " documents = reader.load_data(html_file_path)\n", 215 | " \n", 216 | " # For each document, store the source URL as part of the metadata.\n", 217 | " for document in documents:\n", 218 | " document.metadata[\"source\"] = f\"{path_to_uri(document.metadata['file_path'])}#{document.metadata['tag_id']}\"\n", 219 | " return [{\"document\": document} for document in documents]" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "id": "c5b096df-94e3-48bb-8f1b-a464b8e9ef0d", 225 | "metadata": {}, 226 | "source": [ 227 | "Let's try this out on a single example HTML file" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": null, 233 | "id": "ef89386e-0203-446e-97dd-243197393eea", 234 | "metadata": { 235 | "tags": [] 236 | }, 237 | "outputs": [], 238 | "source": [ 239 | "example_path = Path(RAY_DOCS_DIRECTORY, \"rllib/rllib-env.html\")\n", 240 | "document = extract_sections({\"path\": example_path})[0][\"document\"]\n", 241 | "print(document)\n", 242 | "print(\"\\n\")\n", 243 | "print(\"Document source: \", document.metadata[\"source\"])" 244 | ] 245 | }, 246 | { 247 | "cell_type": "markdown", 248 | "id": "ea63b9b8-1873-4595-a5e4-fb7debc78f32", 249 | "metadata": {}, 250 | "source": [ 251 | "Now, let's use Ray Data to parallelize this across all of the HTML files. We can stitch together operations on our Ray dataset to map a function over each document. \n", 252 | "\n", 253 | "Ray Data is lazy by default, so can first stitch together our entire pipeline, and then trigger execution. This allows Ray Data to fully optimize resource usage for our pipeline." 
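To make the "lazy by default" point concrete, here is a tiny standalone sketch (illustrative only, not part of the notebook): the transformation call returns immediately and only builds an execution plan, and the work actually runs when results are consumed.
```python
# Illustrative sketch of Ray Data's lazy execution (not part of the notebook).
import ray

items = ray.data.from_items([{"x": i} for i in range(8)])
doubled = items.map(lambda row: {"x": row["x"] * 2})  # builds the plan; no work runs yet
print(doubled.take(3))  # consuming results triggers execution of the pipeline
```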
254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "id": "0a3dcf6e-a12f-423d-8141-e602798f6c42", 260 | "metadata": { 261 | "tags": [] 262 | }, 263 | "outputs": [], 264 | "source": [ 265 | "sections_ds = ds.flat_map(extract_sections)\n", 266 | "sections_ds.schema()" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "id": "5322a66f-4209-4e29-bba3-269052329ec8", 272 | "metadata": {}, 273 | "source": [ 274 | "### Chunk data" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "id": "b77115cd-734a-4228-b49e-5546739b8694", 280 | "metadata": {}, 281 | "source": [ 282 | "We now have a list of Documents (with text and source of each section) but we shouldn't directly use this as context to our RAG application just yet. The text lengths of each section are all varied and many are quite large chunks. If we were to use these large sections, then we'd be inserting a lot of noisy/unwanted context, and because all LLMs have a maximum context length, we wouldn't be able to fit too many relevant contexts. Therefore, we're going to split the text within each section into smaller chunks. Intuitively, smaller chunks will encapsulate single/few concepts and will be less noisy compared to larger chunks. We're going to choose some typical text splitting values (e.g. `chunk_size=512`) to create our chunks for now, but we'll experiment with a range of values later.\n", 283 | "\n", 284 | "\"Section" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "id": "8f58b811-153d-4cc0-9f04-a1df48ce82a9", 290 | "metadata": {}, 291 | "source": [ 292 | "Once again, we will use LlamaIndex's abstractions to chunk each Document into a **Node** with the provided chunk size. And we will use Ray Data to parallelize the chunking computation." 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": null, 298 | "id": "e7b8eecd-b74a-4d8b-b6af-942917755925", 299 | "metadata": { 300 | "tags": [] 301 | }, 302 | "outputs": [], 303 | "source": [ 304 | "from llama_index.node_parser import SimpleNodeParser" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "id": "52d8bb61-ada8-4f9f-a838-64b4d86614b8", 311 | "metadata": { 312 | "tags": [] 313 | }, 314 | "outputs": [], 315 | "source": [ 316 | "chunk_size = 512\n", 317 | "chunk_overlap = 50\n", 318 | "\n", 319 | "def chunk_document(document):\n", 320 | " node_parser = SimpleNodeParser.from_defaults(\n", 321 | " chunk_size=chunk_size,\n", 322 | " chunk_overlap=chunk_overlap\n", 323 | " )\n", 324 | " nodes = node_parser.get_nodes_from_documents([document[\"document\"]])\n", 325 | " return [{\"node\": node} for node in nodes]" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "id": "4d2dfc56-d6c4-47ff-8f3a-5a5420ebe9d8", 331 | "metadata": {}, 332 | "source": [ 333 | "Let's run an example over a single document. The document will be chunked and will result in 2 nodes, each representing 1 chunk."
334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": null, 339 | "id": "ddbbc34b-8cb5-4ee4-9d53-cd8d873afb0c", 340 | "metadata": { 341 | "tags": [] 342 | }, 343 | "outputs": [], 344 | "source": [ 345 | "sample_document = sections_ds.take(1)[0]\n", 346 | "\n", 347 | "# Nodes\n", 348 | "nodes = chunk_document(sample_document)\n", 349 | "\n", 350 | "print(\"Num chunks: \", len(nodes))\n", 351 | "print(f\"Example text: {nodes[0]['node'].text}\\n\")\n", 352 | "print(f\"Example metadata: {nodes[0]['node'].metadata}\\n\")" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "id": "0491bab5-bea6-4fbc-9347-0645e0df2e33", 358 | "metadata": {}, 359 | "source": [ 360 | "Now let's chunk all of our documents, stitching this operation into our Ray Dataset pipeline." 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": null, 366 | "id": "018a79ac-d28e-474c-bbe3-da4cf8e0adaf", 367 | "metadata": { 368 | "tags": [] 369 | }, 370 | "outputs": [], 371 | "source": [ 372 | "from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy\n", 373 | "\n", 374 | "chunks_ds = sections_ds.flat_map(chunk_document, scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=ray.get_runtime_context().get_node_id(), soft=False))\n", 375 | "chunks_ds.schema()" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "id": "d0321c4d-6244-4aa7-834f-f3d117e78f76", 381 | "metadata": { 382 | "tags": [] 383 | }, 384 | "source": [ 385 | "### Embed data" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "id": "9e5790a1-d153-4aaa-b28f-17f2c501810e", 391 | "metadata": {}, 392 | "source": [ 393 | "Now that we've created small chunks from our dataset, we need a way to identify the most relevant ones to a given query. A very effective and quick method is to embed our data using a pretrained model and use the same model to embed the query. We can then compute the distance between all of the chunk embeddings and our query embedding to determine the top k chunks. There are many different pretrained models to choose from to embed our data but the most popular ones can be discovered through [HuggingFace's Massive Text Embedding Benchmark (MTEB)](https://huggingface.co/spaces/mteb/leaderboard) leadboard. These models were pretrained on very large text corpus through tasks such as next/masked token prediction that allows them to learn to represent subtokens in N dimensions and capture semantic relationships. We can leverage this to represent our data and make decisions such as the most relevant contexts to use to answer a given query. We're using Langchain's Embedding wrappers ([HuggingFaceEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.huggingface.HuggingFaceEmbeddings.html) and [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.openai.OpenAIEmbeddings.html)) to easily load the models and embed our document chunks.\n", 394 | "\n", 395 | "**Note**: embeddings aren't the only way to determine the more relevant chunks. We could also use an LLM to decide! However, because LLMs are much larger than these embedding models and have maximum context lengths, it's better to use embeddings to retrieve the top k chunks. 
And then we could use LLMs on the fewer k chunks to determine the " 555 | ] 556 | }, 557 | { 558 | "cell_type": "markdown", 559 | "id": "587030d3-4b28-4cf3-82c4-08bfcc7fa3c9", 560 | "metadata": {}, 561 | "source": [ 562 | "As the final step in our data pipeline, we will store the embeddings into our Postgres database" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "id": "e3f5b67c-adfd-4f97-a3f8-3d2891674b56", 568 | "metadata": { 569 | "tags": [] 570 | }, 571 | "source": [ 572 | "#### Postgres Vector Store" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "id": "1d26ef0f-14a5-423c-a429-c6d71dfe6e03", 578 | "metadata": {}, 579 | "source": [ 580 | "Let's set up our Postgres database. The following assumes you have Docker installed and have launched Postgres in a local container, i.e. via `docker-compose up -d`" 581 | ] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": null, 586 | "id": "1235c463-29bc-431d-b609-ad3ed06ef61c", 587 | "metadata": { 588 | "tags": [] 589 | }, 590 | "outputs": [], 591 | "source": [ 592 | "%%bash\n", 593 | "# Drop existing table if it exists\n", 594 | "docker exec -u postgres ai-engineer-workshop-postgres-1 psql -d postgres -c \"DROP TABLE IF EXISTS data_document;\"" 595 | ] 596 | }, 597 | { 598 | "cell_type": "code", 599 | "execution_count": null, 600 | "id": "8d2cd4e1-5d5b-4ccb-bb4e-5ef6bbc4f70c", 601 | "metadata": { 602 | "tags": [] 603 | }, 604 | "outputs": [], 605 | "source": [ 606 | "from llama_index.vector_stores import PGVectorStore\n", 607 | "\n", 608 | "# First create the table.\n", 609 | "def get_postgres_store():\n", 610 | " return PGVectorStore.from_params(\n", 611 | " database=\"postgres\", \n", 612 | " user=\"postgres\", \n", 613 | " password=\"postgres\", \n", 614 | " host=\"localhost\", \n", 615 | " table_name=\"document\",\n", 616 | " port=\"5432\",\n", 617 | " embed_dim=1536,\n", 618 | " )\n", 619 | "\n", 620 | "store = get_postgres_store()\n", 621 | "del store" 622 | ] 623 | }, 624 | { 625 | "cell_type": "code", 626 | "execution_count": null, 627 | "id": "139ff2a5-fc04-456d-8f6a-f2b408ab80ba", 628 | "metadata": { 629 | "tags": [] 630 | }, 631 | "outputs": [], 632 | "source": [ 633 | "class StoreResults:\n", 634 | " def __init__(self):\n", 635 | " self.vector_store = get_postgres_store()\n", 636 | " \n", 637 | " def __call__(self, batch):\n", 638 | " embedded_nodes = batch[\"embedded_nodes\"]\n", 639 | " self.vector_store.add(list(embedded_nodes))\n", 640 | " return {}" 641 | ] 642 | }, 643 | { 644 | "cell_type": "code", 645 | "execution_count": null, 646 | "id": "e3054ffa-50c3-4e71-9df7-11ad5905bfe0", 647 | "metadata": { 648 | "tags": [] 649 | }, 650 | "outputs": [], 651 | "source": [ 652 | "# Store all the embeddings in Postgres, and trigger execution of the Ray Data pipeline.\n", 653 | "from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy\n", 654 | "\n", 655 | "embedded_chunks.map_batches(\n", 656 | " StoreResults,\n", 657 | " batch_size=128,\n", 658 | " num_cpus=1,\n", 659 | " compute=ActorPoolStrategy(size=8),\n", 660 | " # Since our database is only created on the head node, we need to force the Ray tasks to execute only on the head node.\n", 661 | " scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=ray.get_runtime_context().get_node_id(), soft=False)\n", 662 | " \n", 663 | ").count()" 664 | ] 665 | }, 666 | { 667 | "cell_type": "markdown", 668 | "id": "ac01c84b-019b-4005-9bb8-4d4366003ef2", 669 | "metadata": {}, 670 | "source": [ 671 | "Let's check our table to see how
many chunks that we have stored." 672 | ] 673 | }, 674 | { 675 | "cell_type": "code", 676 | "execution_count": null, 677 | "id": "24e45b87-f44a-4cbc-8a75-78a8db2fe20e", 678 | "metadata": { 679 | "tags": [] 680 | }, 681 | "outputs": [], 682 | "source": [ 683 | "%%bash\n", 684 | "docker exec -u postgres ai-engineer-workshop-postgres-1 psql -c \"SELECT count(*) FROM data_document;\"" 685 | ] 686 | }, 687 | { 688 | "cell_type": "markdown", 689 | "id": "19b0a6c5-2963-43c4-b002-7b2c48c69842", 690 | "metadata": {}, 691 | "source": [ 692 | "## Step 2: Retrieval" 693 | ] 694 | }, 695 | { 696 | "cell_type": "markdown", 697 | "id": "16d7b49e-5cda-4542-8286-1e004c59db9f", 698 | "metadata": {}, 699 | "source": [ 700 | "Now that we have processed, embedded, and stored all of our chunks from the Ray documentation, we can test out the retrieval portion of the application.\n", 701 | "\n", 702 | "In the retrieval portion, we want to pull the relevant context for a given query. We do this by embedding the query using the same embedding model we used to embed the chunks, and then check for similarity between the embedded query and all the embedded chunks to pull the most relevant context.\n", 703 | "\n", 704 | "" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "id": "6749178b-f02a-4c2a-8c62-3b0b8dcce1f6", 711 | "metadata": { 712 | "tags": [] 713 | }, 714 | "outputs": [], 715 | "source": [ 716 | "from llama_index import VectorStoreIndex, ServiceContext" 717 | ] 718 | }, 719 | { 720 | "cell_type": "code", 721 | "execution_count": null, 722 | "id": "1a3d0823-721f-4813-8619-b5bc7eb15333", 723 | "metadata": { 724 | "tags": [] 725 | }, 726 | "outputs": [], 727 | "source": [ 728 | "# Create a connection to our Postgres vector store\n", 729 | "vector_store = get_postgres_store()" 730 | ] 731 | }, 732 | { 733 | "cell_type": "code", 734 | "execution_count": null, 735 | "id": "1d54980d-2749-44f4-a19d-c6dedf05638a", 736 | "metadata": { 737 | "tags": [] 738 | }, 739 | "outputs": [], 740 | "source": [ 741 | "# Use the same embedding model that we used to embed our documents.\n", 742 | "embedding_model = get_embedding_model(embedding_model_name)" 743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "id": "3d2f3071-c7b5-407f-a344-093449235a14", 749 | "metadata": { 750 | "tags": [] 751 | }, 752 | "outputs": [], 753 | "source": [ 754 | "# Create our retriever.\n", 755 | "service_context = ServiceContext.from_defaults(embed_model=embedding_model, llm=None)\n", 756 | "index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)\n", 757 | "\n", 758 | "# Fetch the top 5 most relevant chunks.\n", 759 | "retriever = index.as_retriever(similarity_top_k=5)" 760 | ] 761 | }, 762 | { 763 | "cell_type": "markdown", 764 | "id": "a3dceebd-9c40-4315-b940-3026d3190533", 765 | "metadata": {}, 766 | "source": [ 767 | "Now, let's try a sample query and pull the most relevant context. Looks like the retrieval is working great! From the eye-test, it looks like the chunks are all relevant to the query." 
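Before running the query below, it may help to recall what "similarity" means here: the retriever ranks chunks by the similarity (typically cosine similarity) between the query embedding and each stored chunk embedding. Here is a minimal, self-contained illustration using random stand-in vectors (not part of the notebook; in the real application the embeddings come from the embedding model and the comparison is done by the vector store):
```python
# Minimal illustration of similarity-based ranking (random stand-ins for real embeddings).
import numpy as np

rng = np.random.default_rng(0)
chunk_embeddings = rng.normal(size=(5, 1536))  # pretend these are 5 embedded chunks
query_embedding = rng.normal(size=1536)        # pretend this is the embedded query

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine_similarity(query_embedding, chunk) for chunk in chunk_embeddings]
top_k = np.argsort(scores)[::-1][:3]  # indices of the 3 most similar chunks
print(top_k, [round(scores[i], 3) for i in top_k])
```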
768 | ] 769 | }, 770 | { 771 | "cell_type": "code", 772 | "execution_count": null, 773 | "id": "1ba1ce48-ba88-456c-8f42-262bb0a3ce40", 774 | "metadata": { 775 | "tags": [] 776 | }, 777 | "outputs": [], 778 | "source": [ 779 | "query = \"What is the default batch size for map_batches?\"\n", 780 | "nodes = retriever.retrieve(query)\n", 781 | "\n", 782 | "for node in nodes:\n", 783 | " print(node)\n", 784 | " print(\"Source: \", node.metadata[\"source\"])" 785 | ] 786 | }, 787 | { 788 | "cell_type": "markdown", 789 | "id": "b06ee9e3-9482-41c3-9edb-852d2376a276", 790 | "metadata": {}, 791 | "source": [ 792 | "## Step 3: Response generation" 793 | ] 794 | }, 795 | { 796 | "cell_type": "markdown", 797 | "id": "f87f064e-50bd-46f4-b8ed-7a367b5dcb8b", 798 | "metadata": {}, 799 | "source": [ 800 | "With our retrieval working, we can now build the next portion of our LLM application, which is the actual response generation.\n", 801 | "\n", 802 | "In this step, we pass in both the query and the relevant context to an LLM. The LLM synthesizes a response to the query given the context. Without this relevant context that we retrieved, the LLM may not have been able to accurately answer our question. And as our data grows, we can just as easily embed and index any new data and be able to retrieve it to answer questions.\n", 803 | "\n", 804 | "" 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "id": "db024edc-1922-4ec4-b90d-31def67a5f5e", 810 | "metadata": {}, 811 | "source": [ 812 | "Creating an end-to-end query engine becomes very easy with LlamaIndex and Anyscale Endpoints. With Anyscale Endpoints, we can use open source LLMs, like Llama2 models, just as easily as OpenAI, but more cost-effectively." 813 | ] 814 | }, 815 | { 816 | "cell_type": "code", 817 | "execution_count": null, 818 | "id": "b4539eb1-b429-4a63-b96c-f8867953e518", 819 | "metadata": { 820 | "tags": [] 821 | }, 822 | "outputs": [], 823 | "source": [ 824 | "from llama_index.llms import OpenAI" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "id": "97e1041f-ba18-48ac-b4e6-4527973f0159", 831 | "metadata": { 832 | "tags": [] 833 | }, 834 | "outputs": [], 835 | "source": [ 836 | "# Use OpenAI as the LLM for LlamaIndex.\n", 837 | "llm = OpenAI(model=\"gpt-3.5-turbo\", temperature=0.1)\n", 838 | "\n", 839 | "# Use the same embedding model that we used to embed our documents.\n", 840 | "embedding_model = get_embedding_model(embedding_model_name)\n", 841 | "\n", 842 | "service_context = ServiceContext.from_defaults(embed_model=embedding_model, llm=llm)" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": null, 848 | "id": "e117dab2-b483-417f-af63-d65e4fe51231", 849 | "metadata": { 850 | "tags": [] 851 | }, 852 | "outputs": [], 853 | "source": [ 854 | "# Create our query engine.\n", 855 | "vector_store = get_postgres_store()\n", 856 | "index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)\n", 857 | "query_engine = index.as_query_engine(similarity_top_k=5)" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": null, 863 | "id": "f923c14d-acb3-4598-84c8-c715a36d131d", 864 | "metadata": { 865 | "tags": [] 866 | }, 867 | "outputs": [], 868 | "source": [ 869 | "# Get a response to our query.\n", 870 | "\n", 871 | "query = \"What is the default batch size for map_batches?\"\n", 872 | "response = query_engine.query(query)" 873 | ] 874 | }, 875 | { 876 | "cell_type": "markdown", 877 | "id":
"0cfa881e-a6ad-4d03-97a6-59a70827e4cf", 878 | "metadata": {}, 879 | "source": [ 880 | "Let's see the response to our query, as well as the retrieved context that we passed to the LLM." 881 | ] 882 | }, 883 | { 884 | "cell_type": "code", 885 | "execution_count": null, 886 | "id": "f07b97eb-ef73-4fd2-9d80-49802048fe84", 887 | "metadata": { 888 | "tags": [] 889 | }, 890 | "outputs": [], 891 | "source": [ 892 | "print(\"Response: \", response.response)\n", 893 | "print(\"\\n\")\n", 894 | "source_nodes = response.source_nodes\n", 895 | "\n", 896 | "for node in source_nodes:\n", 897 | " print(\"Text: \", node.node.text)\n", 898 | " print(\"Score: \", node.score)\n", 899 | " print(\"Source: \", node.node.metadata[\"source\"])\n", 900 | " print(\"\\n\")" 901 | ] 902 | } 903 | ], 904 | "metadata": { 905 | "kernelspec": { 906 | "display_name": "Python 3 (ipykernel)", 907 | "language": "python", 908 | "name": "python3" 909 | }, 910 | "language_info": { 911 | "codemirror_mode": { 912 | "name": "ipython", 913 | "version": 3 914 | }, 915 | "file_extension": ".py", 916 | "mimetype": "text/x-python", 917 | "name": "python", 918 | "nbconvert_exporter": "python", 919 | "pygments_lexer": "ipython3", 920 | "version": "3.11.4" 921 | } 922 | }, 923 | "nbformat": 4, 924 | "nbformat_minor": 5 925 | } 926 | -------------------------------------------------------------------------------- /notebooks/02_evaluation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "c044803a-3fe2-4297-ad71-c93ae2e078f5", 6 | "metadata": {}, 7 | "source": [ 8 | "# Part 2: Evaluating our LLM application" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "id": "afc13cfb-5e8a-401d-bb04-24a5793e69be", 14 | "metadata": {}, 15 | "source": [ 16 | "So far, we've chosen typical/arbitrary values for the various parts of our RAG application. But if we were to change something, such as our chunking logic, embedding model, LLM, etc. how can we know that we have a better configuration than before. A generative task like this is very difficult to quantitatively assess and so we need to develop creative ways to do so. \n", 17 | "\n", 18 | "Because we have many moving parts in our application, we need to perform unit/component and end-to-end evaluation. Component-wise evaluation can involve evaluating our retrieval in isolation (is the best source in our set of retrieved chunks) and evaluating our LLMs response (given the best source, is the LLM able to produce a quality answer). As for end-to-end evaluation, we can assess the quality of the entire system (given all data, what is the quality of the response).\n", 19 | "\n", 20 | "" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "5b67c46c-7650-484b-a792-7eaf62c0e82e", 26 | "metadata": {}, 27 | "source": [ 28 | "## Setup" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": null, 34 | "id": "119575d6-b0cc-49f3-a118-46bc3adf8189", 35 | "metadata": { 36 | "tags": [] 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "import os\n", 41 | "\n", 42 | "# os.environ[\"OPENAI_API_KEY\"] = ..." 
43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "id": "ce9df539-d629-4e8b-9d93-2a2da5cd1b2c", 49 | "metadata": { 50 | "tags": [] 51 | }, 52 | "outputs": [], 53 | "source": [ 54 | "import nest_asyncio\n", 55 | "nest_asyncio.apply()" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "id": "32cedb0a-a7b3-4194-88a9-355122cd8a00", 61 | "metadata": {}, 62 | "source": [ 63 | "## Golden Context Dataset" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "id": "31583b8a-bd08-4054-95f6-2ae5256e6a21", 69 | "metadata": {}, 70 | "source": [ 71 | "In an ideal world, we would have a golden validation dataset: given a set of queries, we would have the correct sources that answer those queries, and optionally the correct answer that should be returned by the LLM.\n", 72 | "\n", 73 | "For this example, we have manually collected 177 representative user queries and identified the correct source in the documentation that answers those user queries." 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "id": "8495831b-c4f9-4e84-a211-bc37ecf369e2", 80 | "metadata": { 81 | "tags": [] 82 | }, 83 | "outputs": [], 84 | "source": [ 85 | "from pathlib import Path\n", 86 | "import json\n", 87 | "\n", 88 | "golden_dataset_path = Path(\"../datasets/eval-dataset-v1.jsonl\")\n", 89 | "\n", 90 | "with open(golden_dataset_path, \"r\") as f:\n", 91 | " data = [json.loads(item) for item in list(f)]\n", 92 | " \n", 93 | "len(data)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "id": "b187c2d5-5c4d-4b9a-9e5b-5f37c9e92b36", 99 | "metadata": {}, 100 | "source": [ 101 | "Our dataset contains 'question' and 'source' pairs. If we have a golden context dataset, it is the best option for evaluation." 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": null, 107 | "id": "2300e0fc-3e65-4f43-98b6-3564bc1ccb3e", 108 | "metadata": { 109 | "tags": [] 110 | }, 111 | "outputs": [], 112 | "source": [ 113 | "data[:5]" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "id": "c6739202-90fa-46e8-bc7d-3c0ed33c8673", 119 | "metadata": {}, 120 | "source": [ 121 | "## Cold Start" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "id": "20636b3e-5211-41c9-a390-0cfa23654c44", 127 | "metadata": { 128 | "tags": [] 129 | }, 130 | "source": [ 131 | "We may not always have a prepared dataset of questions and the best source to answer that question readily available. To address this cold start problem, we could use an LLM to look at our documents and generate questions that the specific chunk would answer. This provides us with quality questions and the exact source the answer is in. However, this dataset generation method could be a bit noisy. The generated questions may not always resemble what your users may ask, and the specific chunk we say is the best source may also have that exact information in other chunks. Nonetheless, this is a great way to start our development process while we collect + manually label a high-quality dataset.\n", 132 | "\n", 133 | "" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "id": "2f0192cc-40f7-4fe6-8175-25cedd51d9a2", 139 | "metadata": { 140 | "tags": [] 141 | }, 142 | "source": [ 143 | "We need to define a few parameters first. \n", 144 | "- Notably, the chunk size determines the size of the text chunk shown to the LLM when generating hypothetical question & answer pairs.
This must be set below the context window limitation of the chosen LLM.\n", 145 | "- We choose a subsample ratio since we just want to construct a small representative subset for the purpose of evaluation and iteration. (We choose an even smaller subset for the purpose of the demonstration here).\n", 146 | "- We use `gpt-3.5-turbo` since it's fast and cheap. " 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": null, 152 | "id": "ff6951de-bf96-4e1a-a666-893e62f3eab8", 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "from pathlib import Path\n", 157 | "\n", 158 | "RAY_DOCS_DIRECTORY = Path(\"docs.ray.io/en/master/\")" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "id": "79cd3470-8542-4323-a70d-c3cc9fdb04a8", 164 | "metadata": {}, 165 | "source": [ 166 | "First, we load in the documents and chunk them to the appropriate sizes, creating LlamaIndex nodes. We already did the data processing in part 1, and have packaged the logic as a utility." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "id": "790da50a-24c8-4d5f-8395-fb225f1c49fe", 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "from data import create_nodes\n", 177 | "\n", 178 | "# needs to be smaller than context window\n", 179 | "CHUNK_SIZE = 1024\n", 180 | "\n", 181 | "nodes = create_nodes(RAY_DOCS_DIRECTORY, chunk_size=CHUNK_SIZE, chunk_overlap=20).take_all()" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "id": "5dcc6135-03b1-43b3-8c0a-155b8eff3037", 188 | "metadata": {}, 189 | "outputs": [], 190 | "source": [ 191 | "nodes = [node_dict[\"node\"] for node_dict in nodes]\n", 192 | "id_to_node = {node.node_id: node for node in nodes}" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "id": "4a9f820b-bc94-40a4-8e66-c78d5c2120b9", 198 | "metadata": {}, 199 | "source": [ 200 | "Now, we subsample the nodes to obtain a representative subset (here we use a very small subset for a fast demonstration)" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "id": "cada634d-fc1e-4b1a-b1d5-d2266a518090", 207 | "metadata": {}, 208 | "outputs": [], 209 | "source": [ 210 | "from utils import subsample\n", 211 | "\n", 212 | "SUBSAMPLE_RATIO = 0.01\n", 213 | "\n", 214 | "subsampled_nodes = subsample(nodes, SUBSAMPLE_RATIO)\n", 215 | "print('Subsampled {} nodes into {} nodes'.format(len(nodes), len(subsampled_nodes)))" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "id": "d5856201-06db-4292-990c-21b1010e12c0", 221 | "metadata": {}, 222 | "source": [ 223 | "Now, we use LlamaIndex's built in utility `generate_qa_embedding_pairs` to create synthetic query/context pairs.\n", 224 | "\n", 225 | "(We can also use this utility for fine-tuning embeddings, hence the naming. 
More on this in part 3!)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "id": "d718383c-74a2-4518-a51b-69a7857c5cde", 232 | "metadata": { 233 | "tags": [] 234 | }, 235 | "outputs": [], 236 | "source": [ 237 | "from llama_index.finetuning import generate_qa_embedding_pairs\n", 238 | "from llama_index.llms import OpenAI\n", 239 | "\n", 240 | "llm = OpenAI(model='gpt-3.5-turbo')\n", 241 | "synthetic_dataset = generate_qa_embedding_pairs(subsampled_nodes, llm=llm, num_questions_per_chunk=2)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "id": "87212c89-9356-45c9-8608-552b350b7e96", 247 | "metadata": {}, 248 | "source": [ 249 | "Now we will transform the shape of the data a bit to match the format of our labeled data." 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "id": "bc5d468a-6a6e-468d-ae9d-95297dd2c860", 256 | "metadata": { 257 | "tags": [] 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "synthetic_data = []\n", 262 | "for query_id, context_ids in synthetic_dataset.relevant_docs.items():\n", 263 | " query = synthetic_dataset.queries[query_id]\n", 264 | " golden_context = id_to_node[context_ids[0]].metadata['source']\n", 265 | " entry = {\n", 266 | " 'question': query,\n", 267 | " 'source': golden_context,\n", 268 | " }\n", 269 | " synthetic_data.append(entry)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "id": "486593e8-f24a-4278-8704-9845a90adbb5", 276 | "metadata": { 277 | "tags": [] 278 | }, 279 | "outputs": [], 280 | "source": [ 281 | "synthetic_data[:5]" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": null, 287 | "id": "1bd61e90-0b15-4cca-92ec-61b76804bedd", 288 | "metadata": { 289 | "tags": [] 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "from utils import write_jsonl\n", 294 | "\n", 295 | "write_jsonl(\"../datasets/synthetic-eval-dataset.jsonl\", synthetic_data)" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "id": "64373866-d0f7-4704-b03b-5c466fc59afb", 301 | "metadata": {}, 302 | "source": [ 303 | "Since we already have a dataset with representative user queries and ground truth labels, we will use that for evaluation instead of a synthetically generated dataset." 304 | ] 305 | }, 306 | { 307 | "cell_type": "markdown", 308 | "id": "330a7e10-18fc-495e-ab4a-da2f3ca00a97", 309 | "metadata": {}, 310 | "source": [ 311 | "## Evaluating Retrieval" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "id": "eb6e7e79-5a2a-4247-bc3a-d093d0416761", 317 | "metadata": {}, 318 | "source": [ 319 | "The first component to evaluate in our RAG application is retrieval. Given a query, is our retriever pulling in the correct context to answer that query? Regardless of how good our LLM is, if it does not have the right context to answer the question, it cannot provide the right answer.\n", 320 | "\n", 321 | "We can use our golden context dataset to evaluate retrieval. The simplest approach is that for each query in our dataset, we can test to see if the correct source is included in any of the chunks that are retrieved by our retriever. This measures \"hit rate\".\n", 322 | "\n", 323 | "However, simply checking for existence can be misleading if we increase the number of chunks that we retrieve. Therefore, we also want to check the score that our retriever gives for the correct source. A higher score means our retriever is accurately determining the correct context. 
\n", 324 | "\n", 325 | "To summarize, for each query in our evaluation dataset, we will measure the following:\n", 326 | "1. Is the correct source included in any of the retrived chunks?\n", 327 | "2. What is the score our retriever gives to the correct source?\n", 328 | "\n", 329 | "" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "id": "60a5958c-74d8-4a11-b46c-a12a8fd5ad99", 335 | "metadata": {}, 336 | "source": [ 337 | "First, let's a get a retriever over the vector database. We have packaged this as a utility. It is the same as we did in notebook 1." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": null, 343 | "id": "b88e0d5f-a0ed-47e9-afcf-0ac882201e57", 344 | "metadata": { 345 | "tags": [] 346 | }, 347 | "outputs": [], 348 | "source": [ 349 | "from utils import get_retriever" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "id": "dad0afdc-2f44-4ca7-b5c0-2c6499f4f35e", 356 | "metadata": { 357 | "tags": [] 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "retriever = get_retriever(similarity_top_k=5, embedding_model_name='text-embedding-ada-002')" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "id": "4f3b5438-9930-4857-a9a5-165cbd15c79b", 367 | "metadata": {}, 368 | "source": [ 369 | "Now let's evaluate our retriever. " 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "id": "bb4cd2ae-17ff-4e75-904f-a577499bc8ab", 376 | "metadata": { 377 | "tags": [] 378 | }, 379 | "outputs": [], 380 | "source": [ 381 | "from tqdm import tqdm\n", 382 | "results = []\n", 383 | "\n", 384 | "for entry in tqdm(data):\n", 385 | " query = entry[\"question\"]\n", 386 | " expected_source = entry['source']\n", 387 | " \n", 388 | " retrieved_nodes = retriever.retrieve(query)\n", 389 | " retrieved_sources = [node.metadata['source'] for node in retrieved_nodes]\n", 390 | " \n", 391 | " # If our label does not include a section, then any sections on the page should be considered a hit.\n", 392 | " if \"#\" not in expected_source:\n", 393 | " retrieved_sources = [source.split(\"#\")[0] for source in retrieved_sources]\n", 394 | " \n", 395 | " if expected_source in retrieved_sources:\n", 396 | " is_hit = True\n", 397 | " score = retrieved_nodes[retrieved_sources.index(expected_source)].score\n", 398 | " else:\n", 399 | " is_hit = False\n", 400 | " score = 0.0\n", 401 | " \n", 402 | " result = {\n", 403 | " \"is_hit\": is_hit,\n", 404 | " \"score\": score,\n", 405 | " \"retrieved\": retrieved_sources,\n", 406 | " \"expected\": expected_source,\n", 407 | " \"query\": query,\n", 408 | " }\n", 409 | " results.append(result)" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "id": "a12ecc64-efae-4a67-86de-3f3fbb36820b", 416 | "metadata": { 417 | "tags": [] 418 | }, 419 | "outputs": [], 420 | "source": [ 421 | "results[:2]" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "id": "3d1ddd39-f141-4585-93a4-593daa146f60", 427 | "metadata": {}, 428 | "source": [ 429 | "Let's see how well our retriever does. It's not great right now, but we now have a solid metric to evaluate our retriever for future optimizations." 
430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": null, 435 | "id": "df7d03f6-76bf-414d-a072-2862bdc2c1a8", 436 | "metadata": { 437 | "tags": [] 438 | }, 439 | "outputs": [], 440 | "source": [ 441 | "total_hits = sum(result[\"is_hit\"] for result in results)\n", 442 | "hit_percentage = total_hits / len(results)\n", 443 | "hit_percentage" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": null, 449 | "id": "069c754b-30d9-432a-98f0-1bcec9d0fe99", 450 | "metadata": { 451 | "tags": [] 452 | }, 453 | "outputs": [], 454 | "source": [ 455 | "average_score = sum(result[\"score\"] for result in results) / len(results)\n", 456 | "average_score" 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "id": "e852dd79-247f-47d3-8ffa-97f79781a68a", 462 | "metadata": {}, 463 | "source": [ 464 | "## End-to-end evaluation" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "id": "6193ec2a-3a68-412b-bbb9-997e57b27edf", 470 | "metadata": {}, 471 | "source": [ 472 | "While we can evaluate our retriever in isolation, ultimately we want to evaluate our RAG application end-to-end, which includes the final response generated from our LLM.\n", 473 | "\n", 474 | "To effectively evaluate our generated responses, we need \"ground truth\" responses. These ground truth responses can be generated by feeding the correct context to a \"golden\" LLM. Then, we can use an LLM to evaluate our generated responses compared to the ground truth responses.\n", 475 | "\n", 476 | "" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "id": "da62c16b-605f-41cd-8894-b7f2e6ae460c", 482 | "metadata": {}, 483 | "source": [ 484 | "### Choosing a Golden LLM" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "id": "ac0266f2-8a34-411e-a661-4c2fc8cbb5ef", 490 | "metadata": {}, 491 | "source": [ 492 | "To generate ground truth responses, and then to evaluate the generated responses vs. the ground truth, we need a \"golden\" LLM. But which LLM should we use? We now run into a problem: we need to determine the quality of different LLMs to choose as a \"golden\" LLM, but doing so requires a \"golden\" LLM. 
\n", 493 | "\n", 494 | "Leaderboards on general benchmarks provide a rough indication on which LLMs perform better, we will go with OpenAI's GPT-4 here since it's been shown to be [well aligned with human preferences](https://arxiv.org/pdf/2306.05685.pdf)" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": null, 500 | "id": "75423140-d93e-495a-8a3a-f1cb3cbb72db", 501 | "metadata": { 502 | "tags": [] 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "from bs4 import BeautifulSoup\n", 507 | "\n", 508 | "def fetch_text_from_source(source: str):\n", 509 | " url, anchor = source.split(\"#\") if \"#\" in source else (source, None)\n", 510 | " file_path = Path(\"./\", url.split(\"https://\")[-1])\n", 511 | " with open(file_path, \"r\", encoding=\"utf-8\") as file:\n", 512 | " html_content = file.read()\n", 513 | " soup = BeautifulSoup(html_content, \"html.parser\")\n", 514 | " if anchor:\n", 515 | " target_element = soup.find(id=anchor)\n", 516 | " if target_element:\n", 517 | " text = target_element.get_text()\n", 518 | " else:\n", 519 | " return fetch_text_from_source(source=url)\n", 520 | " else:\n", 521 | " text = soup.get_text()\n", 522 | " return text" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": null, 528 | "id": "e578a402-99d5-4dbd-804e-e17c24518744", 529 | "metadata": { 530 | "tags": [] 531 | }, 532 | "outputs": [], 533 | "source": [ 534 | "example_source = data[0][\"source\"]\n", 535 | "print(example_source)\n", 536 | "\n", 537 | "text = fetch_text_from_source(example_source)\n", 538 | "text" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": null, 544 | "id": "eed04e2f-ffcd-446d-9c16-4cb9c3f226ca", 545 | "metadata": { 546 | "tags": [] 547 | }, 548 | "outputs": [], 549 | "source": [ 550 | "from tqdm import tqdm\n", 551 | "from llama_index import ServiceContext\n", 552 | "from llama_index.llms import OpenAI\n", 553 | "from llama_index.response_synthesizers import get_response_synthesizer\n", 554 | "from llama_index.schema import TextNode, NodeWithScore\n", 555 | "\n", 556 | "def generate_responses(entries, llm):\n", 557 | " context_window = llm.metadata.context_window - 500\n", 558 | " service_context = ServiceContext.from_defaults(llm=llm, context_window=context_window)\n", 559 | " rs = get_response_synthesizer(service_context=service_context)\n", 560 | "\n", 561 | " responses = []\n", 562 | " for entry in tqdm(entries):\n", 563 | " query = entry[\"question\"]\n", 564 | " source = entry[\"source\"]\n", 565 | "\n", 566 | " context = fetch_text_from_source(source)\n", 567 | " nodes = [NodeWithScore(node=TextNode(text=context))]\n", 568 | "\n", 569 | " response = rs.synthesize(query, nodes=nodes)\n", 570 | " responses.append(response.response)\n", 571 | " return responses" 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "id": "08f5c2a6-a5ac-4b9c-ab45-61d444b97b9b", 577 | "metadata": {}, 578 | "source": [ 579 | "We can now generate our reference responses. Let's generate 10 reference responses and save them to a file." 
580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": null, 585 | "id": "5981d038-4b6c-4a97-8e05-ccf9121ade7e", 586 | "metadata": { 587 | "tags": [] 588 | }, 589 | "outputs": [], 590 | "source": [ 591 | "llm = OpenAI(model='gpt-4', temperature=0.0)\n", 592 | "ten_samples = data[:10]\n", 593 | "golden_responses = generate_responses(ten_samples, llm)" 594 | ] 595 | }, 596 | { 597 | "cell_type": "code", 598 | "execution_count": null, 599 | "id": "5fc8506c-0deb-43ae-9936-cd323635cc44", 600 | "metadata": { 601 | "tags": [] 602 | }, 603 | "outputs": [], 604 | "source": [ 605 | "reference_dataset = [{\"question\": entry[\"question\"], \"source\": entry[\"source\"], \"response\": response} for entry, response in zip(ten_samples, golden_responses)]" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": null, 611 | "id": "23632aab-2be3-49cb-905b-1b26d7bde46a", 612 | "metadata": { 613 | "tags": [] 614 | }, 615 | "outputs": [], 616 | "source": [ 617 | "with open(\"../datasets/golden-responses.json\", \"w\") as file:\n", 618 | " json.dump(reference_dataset, file, indent=4)" 619 | ] 620 | }, 621 | { 622 | "cell_type": "markdown", 623 | "id": "23dbf6c6-74ea-4802-be16-f96f2982a642", 624 | "metadata": {}, 625 | "source": [ 626 | "## Evaluating our Query Engine" 627 | ] 628 | }, 629 | { 630 | "cell_type": "markdown", 631 | "id": "3ef0af94-f983-464f-8a4b-a91ff72d7f87", 632 | "metadata": {}, 633 | "source": [ 634 | "Once we have reference responses, we can get our generated responses from our query engine. Then pass both responses to our golden LLM to evaluate the responses from our application." 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": null, 640 | "id": "cd15459b-a679-4f1b-832c-bb38cd467d0f", 641 | "metadata": { 642 | "tags": [] 643 | }, 644 | "outputs": [], 645 | "source": [ 646 | "with open(\"../datasets/golden-responses.json\", \"r\") as file:\n", 647 | " golden_responses = json.load(file)" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": null, 653 | "id": "3f2a555f-20c0-4716-b4b2-1d9921edfa79", 654 | "metadata": { 655 | "tags": [] 656 | }, 657 | "outputs": [], 658 | "source": [ 659 | "golden_responses[0]" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": null, 665 | "id": "b8a53891-cd44-4efb-b9f9-9e3cc7de3a6b", 666 | "metadata": { 667 | "tags": [] 668 | }, 669 | "outputs": [], 670 | "source": [ 671 | "from utils import get_query_engine" 672 | ] 673 | }, 674 | { 675 | "cell_type": "code", 676 | "execution_count": null, 677 | "id": "3adebada-bce2-4600-ac2f-4ba541ce32f1", 678 | "metadata": { 679 | "tags": [] 680 | }, 681 | "outputs": [], 682 | "source": [ 683 | "query_engine = get_query_engine(similarity_top_k=5, llm_model_name='gpt-3.5-turbo', embedding_model_name='text-embedding-ada-002')\n", 684 | "\n", 685 | "# Store both the original response object and the response string.\n", 686 | "rag_responses = []\n", 687 | "rag_response_str = []\n", 688 | "\n", 689 | "for entry in tqdm(golden_responses):\n", 690 | " query = entry[\"question\"]\n", 691 | " response = query_engine.query(query)\n", 692 | " rag_responses.append(response)\n", 693 | " rag_response_str.append(response.response)" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": null, 699 | "id": "a17e76f6-7333-4ea7-9357-3bc125c59cf8", 700 | "metadata": { 701 | "tags": [] 702 | }, 703 | "outputs": [], 704 | "source": [ 705 | "rag_response_str[0]" 706 | ] 707 | }, 708 | { 709 | "cell_type": 
"code", 710 | "execution_count": null, 711 | "id": "a9a8456b-8a78-4e87-8110-3fd163f10864", 712 | "metadata": { 713 | "tags": [] 714 | }, 715 | "outputs": [], 716 | "source": [ 717 | "from llama_index.evaluation import CorrectnessEvaluator" 718 | ] 719 | }, 720 | { 721 | "cell_type": "code", 722 | "execution_count": null, 723 | "id": "ef83fc36-e93d-4341-b7a6-d0a24a598872", 724 | "metadata": { 725 | "tags": [] 726 | }, 727 | "outputs": [], 728 | "source": [ 729 | "eval_llm = OpenAI(model='gpt-4', temperature=0.0)\n", 730 | "service_context = ServiceContext.from_defaults(llm=eval_llm)\n", 731 | "evaluator = CorrectnessEvaluator(service_context=service_context)" 732 | ] 733 | }, 734 | { 735 | "cell_type": "code", 736 | "execution_count": null, 737 | "id": "90bb5e09-08e1-40e8-9129-0f441bbce72f", 738 | "metadata": { 739 | "tags": [] 740 | }, 741 | "outputs": [], 742 | "source": [ 743 | "eval_results = []\n", 744 | "for rag_response, golden_response in tqdm(list(zip(rag_response_str, golden_responses))):\n", 745 | " query = golden_response[\"question\"]\n", 746 | " golden_answer = golden_response[\"response\"]\n", 747 | " generated_answer = rag_response\n", 748 | " \n", 749 | " eval_result = evaluator.evaluate(query=query, reference=golden_answer, response=generated_answer)\n", 750 | " eval_results.append(eval_result)" 751 | ] 752 | }, 753 | { 754 | "cell_type": "code", 755 | "execution_count": null, 756 | "id": "91e27497-0ec6-4ae3-bc0a-1c342afb3047", 757 | "metadata": { 758 | "tags": [] 759 | }, 760 | "outputs": [], 761 | "source": [ 762 | "[r.score for r in eval_results]" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "id": "3190240d-7bb1-4079-8bfa-0d6d0ed9166d", 768 | "metadata": {}, 769 | "source": [ 770 | "Let's save the query, both responses, and the score to a JSON file" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": null, 776 | "id": "8fc98a6d-a27a-4a42-a5c5-4574a336e679", 777 | "metadata": { 778 | "tags": [] 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "scores = [\n", 783 | " {\"question\": golden_response[\"question\"],\n", 784 | " \"golden_response\": golden_response[\"response\"],\n", 785 | " \"generated_response\": eval_result.response,\n", 786 | " \"score\": eval_result.score,\n", 787 | " \"reasoning\": eval_result.feedback,\n", 788 | " }\n", 789 | " for eval_result, golden_response in zip(eval_results, golden_responses)\n", 790 | "]" 791 | ] 792 | }, 793 | { 794 | "cell_type": "code", 795 | "execution_count": null, 796 | "id": "e00e6239-5c44-4c1a-89d5-7549fb0ad98a", 797 | "metadata": { 798 | "tags": [] 799 | }, 800 | "outputs": [], 801 | "source": [ 802 | "with open(\"eval-scores.json\", \"w\") as file:\n", 803 | " json.dump(scores, file, indent=4)" 804 | ] 805 | }, 806 | { 807 | "cell_type": "markdown", 808 | "id": "095ef29b-b968-4449-9a50-4234a5b294ab", 809 | "metadata": {}, 810 | "source": [ 811 | "We can also calculate the average scores" 812 | ] 813 | }, 814 | { 815 | "cell_type": "code", 816 | "execution_count": null, 817 | "id": "d6b7d3a9-d2c5-4381-b2d4-9c1aa1d21df1", 818 | "metadata": { 819 | "tags": [] 820 | }, 821 | "outputs": [], 822 | "source": [ 823 | "average_scores = sum(score[\"score\"] for score in scores) / len(scores)\n", 824 | "average_scores" 825 | ] 826 | }, 827 | { 828 | "cell_type": "markdown", 829 | "id": "4f0229cb-2256-4762-a772-55740e617951", 830 | "metadata": {}, 831 | "source": [ 832 | "## Evaluation without Golden Responses" 833 | ] 834 | }, 835 | { 836 | "cell_type": "markdown", 837 | "id": 
"d43b1970-ded0-4ee4-b49e-49447fad1533", 838 | "metadata": {}, 839 | "source": [ 840 | "Generating reference responses and then using them for evaluation can give us a more accurate assesment on how our query engine is performing. However, this approach can be expensive. We have to make an initial pass through GPT4 to generate the reference response, and then we have to make another pass through GPT4 to evaluate our application's responses against the reference response.\n", 841 | "\n", 842 | "We can explore other evaluation metrics to get a better sense on how our query engine is performing, without needing to make multiple passes to GPT4." 843 | ] 844 | }, 845 | { 846 | "cell_type": "markdown", 847 | "id": "1905688d-9a0d-4e2a-81ea-40242f50905a", 848 | "metadata": {}, 849 | "source": [ 850 | "### Evaluating for faithfulness/relevancy" 851 | ] 852 | }, 853 | { 854 | "cell_type": "markdown", 855 | "id": "6662a23a-8388-42cc-b986-0062afee3b48", 856 | "metadata": {}, 857 | "source": [ 858 | "One metric we can test is relevancy, which does not require generating reference responses. With this approach, we check to see if the generated response is relevant to at least one of the retrieved sources and to the query. This ensures that our LLM is not making up a response, but rather that it is relevant to the question that is being asked, and also that is relevant to at least one of the retrieved context.\n", 859 | "\n", 860 | "This does NOT check whether the response is a correct response.\n", 861 | "\n", 862 | "This capability is built into LlamaIndex, via the various `Evaluator` modules. We use gpt-4 as the evaluator." 863 | ] 864 | }, 865 | { 866 | "cell_type": "code", 867 | "execution_count": null, 868 | "id": "c767913f-fc0a-4437-932a-4a5106a5768c", 869 | "metadata": { 870 | "tags": [] 871 | }, 872 | "outputs": [], 873 | "source": [ 874 | "from llama_index.evaluation import FaithfulnessEvaluator, RelevancyEvaluator\n", 875 | "from llama_index.llms import OpenAI\n", 876 | "from llama_index import ServiceContext\n", 877 | "\n", 878 | "def evaluate(queries: list, responses: list, metric: str):\n", 879 | " llm = OpenAI(model=\"gpt-4\", temperature=0.0)\n", 880 | " service_context = ServiceContext.from_defaults(llm=llm)\n", 881 | " \n", 882 | " \n", 883 | " if metric == 'faithfulness':\n", 884 | " evaluator = FaithfulnessEvaluator(service_context=service_context)\n", 885 | " elif metric == 'relevancy':\n", 886 | " evaluator = RelevancyEvaluator(service_context=service_context)\n", 887 | " else:\n", 888 | " raise ValueError(\"Unknown metric: \", metrc)\n", 889 | "\n", 890 | " evals = []\n", 891 | " for query, response in tqdm(list(zip(queries, responses))):\n", 892 | " eval_result = evaluator.evaluate_response(query=query, response=response)\n", 893 | " evals.append(eval_result)\n", 894 | " \n", 895 | " return evals\n", 896 | "\n", 897 | "def get_pass_rate(evals):\n", 898 | " return len([val.passing for val in evals]) / len(evals)" 899 | ] 900 | }, 901 | { 902 | "cell_type": "code", 903 | "execution_count": null, 904 | "id": "6ecd11db-ced6-433b-844d-03c6d0adc30a", 905 | "metadata": { 906 | "tags": [] 907 | }, 908 | "outputs": [], 909 | "source": [ 910 | "faithfulness_results = evaluate(queries=[sample[\"question\"] for sample in ten_samples], responses=rag_responses, metric='faithfulness')" 911 | ] 912 | }, 913 | { 914 | "cell_type": "code", 915 | "execution_count": null, 916 | "id": "ed4555fa-5901-49df-8f66-afbe00d27079", 917 | "metadata": { 918 | "tags": [] 919 | }, 920 | "outputs": [], 921 | 
"source": [ 922 | "faithfulness_score = get_pass_rate(faithfulness_results)\n", 923 | "faithfulness_score" 924 | ] 925 | }, 926 | { 927 | "cell_type": "code", 928 | "execution_count": null, 929 | "id": "84e8a04f-62f6-46cb-80e1-adde0195c1cc", 930 | "metadata": { 931 | "tags": [] 932 | }, 933 | "outputs": [], 934 | "source": [ 935 | "relevancy_results = evaluate(queries=[sample[\"question\"] for sample in ten_samples], responses=rag_responses, metric='relevancy')" 936 | ] 937 | }, 938 | { 939 | "cell_type": "code", 940 | "execution_count": null, 941 | "id": "61b35f60-3989-459c-a26e-12ed35456e8f", 942 | "metadata": { 943 | "tags": [] 944 | }, 945 | "outputs": [], 946 | "source": [ 947 | "relevancy_score = get_pass_rate(relevancy_results)\n", 948 | "relevancy_score" 949 | ] 950 | }, 951 | { 952 | "cell_type": "code", 953 | "execution_count": null, 954 | "id": "a535e536-8829-46af-96ad-c3ba832eead4", 955 | "metadata": {}, 956 | "outputs": [], 957 | "source": [] 958 | } 959 | ], 960 | "metadata": { 961 | "kernelspec": { 962 | "display_name": "Python 3 (ipykernel)", 963 | "language": "python", 964 | "name": "python3" 965 | }, 966 | "language_info": { 967 | "codemirror_mode": { 968 | "name": "ipython", 969 | "version": 3 970 | }, 971 | "file_extension": ".py", 972 | "mimetype": "text/x-python", 973 | "name": "python", 974 | "nbconvert_exporter": "python", 975 | "pygments_lexer": "ipython3", 976 | "version": "3.11.4" 977 | } 978 | }, 979 | "nbformat": 4, 980 | "nbformat_minor": 5 981 | } 982 | -------------------------------------------------------------------------------- /notebooks/data.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | import os 4 | import ray 5 | from ray.data import ActorPoolStrategy 6 | from ray.util.scheduling_strategies import NodeAffinitySchedulingStrategy 7 | 8 | from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding 9 | from llama_index.readers import HTMLTagReader 10 | from llama_index.vector_stores import PGVectorStore 11 | from llama_index.node_parser import SimpleNodeParser 12 | 13 | EMBEDDING_DIMENSIONS = { 14 | 'thenlper/gte-base': 768, 15 | 'BAAI/bge-large-en': 1024, 16 | 'text-embedding-ada-002': 1536 17 | } 18 | 19 | def path_to_uri(path, scheme="https://", domain="docs.ray.io"): 20 | # Converts the file path of a Ray documentation page to the original URL for the documentation. 21 | # Example: /efs/shared_storage/goku/docs.ray.io/en/master/rllib-env.html -> https://docs.ray.io/en/master/rllib/rllib-env.html#environments 22 | return scheme + domain + str(path).split(domain)[-1] 23 | 24 | 25 | def extract_sections(record): 26 | # Given a HTML file path, extract out text from the section tags, and return a LlamaIndex document from each one. 27 | html_file_path = record["path"] 28 | reader = HTMLTagReader(tag="section") 29 | documents = reader.load_data(html_file_path) 30 | 31 | # For each document, store the source URL as part of the metadata. 
32 |     for document in documents: 33 |         document.metadata["source"] = f"{path_to_uri(document.metadata['file_path'])}#{document.metadata['tag_id']}" 34 |     return [{"document": document} for document in documents] 35 | 36 | 37 | def get_embedding_model(model_name, embed_batch_size=100): 38 |     if model_name == "text-embedding-ada-002": 39 |         return OpenAIEmbedding( 40 |             model=model_name, 41 |             embed_batch_size=embed_batch_size, 42 |             api_key=os.environ["OPENAI_API_KEY"]) 43 |     else: 44 |         return HuggingFaceEmbedding( 45 |             model_name=model_name, 46 |             embed_batch_size=embed_batch_size 47 |         ) 48 | 49 | 50 | class EmbedChunks: 51 |     def __init__(self, model_name): 52 |         self.embedding_model = get_embedding_model(model_name) 53 | 54 |     def __call__(self, node_batch): 55 |         # Get the batch of text that we want to embed. 56 |         nodes = node_batch["node"] 57 |         text = [node.text for node in nodes] 58 | 59 |         # Embed the batch of text. 60 |         embeddings = self.embedding_model.get_text_embedding_batch(text) 61 |         assert len(nodes) == len(embeddings) 62 | 63 |         # Store the embedding in the LlamaIndex node. 64 |         for node, embedding in zip(nodes, embeddings): 65 |             node.embedding = embedding 66 |         return {"embedded_nodes": nodes} 67 | 68 | 69 | def get_postgres_store(embed_dim=768): 70 |     return PGVectorStore.from_params( 71 |         database="postgres", 72 |         user="postgres", 73 |         password="postgres", 74 |         host="localhost", 75 |         table_name="document", 76 |         port="5432", 77 |         embed_dim=embed_dim, 78 |     ) 79 | 80 | 81 | class StoreResults: 82 |     def __init__(self, embed_dim=768): 83 |         self.vector_store = get_postgres_store(embed_dim) 84 | 85 |     def __call__(self, batch): 86 |         embedded_nodes = batch["embedded_nodes"] 87 |         self.vector_store.add(list(embedded_nodes)) 88 |         return {} 89 | 90 | def create_nodes(docs_path, chunk_size, chunk_overlap): 91 |     ds = ray.data.from_items( 92 |         [{"path": path} for path in docs_path.rglob("*.html") if not path.is_dir()] 93 |     ) 94 |     sections_ds = ds.flat_map(extract_sections) 95 | 96 |     def chunk_document(document): 97 |         node_parser = SimpleNodeParser.from_defaults( 98 |             chunk_size=chunk_size, 99 |             chunk_overlap=chunk_overlap 100 |         ) 101 |         nodes = node_parser.get_nodes_from_documents([document["document"]]) 102 |         return [{"node": node} for node in nodes] 103 | 104 |     chunks_ds = sections_ds.flat_map(chunk_document, scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=ray.get_runtime_context().get_node_id(), soft=False)) 105 | 106 |     return chunks_ds 107 | 108 | def build_index(docs_path, embedding_model_name, chunk_size, chunk_overlap): 109 | 110 |     chunks_ds = create_nodes(docs_path, chunk_size, chunk_overlap) 111 | 112 |     embedded_chunks = chunks_ds.map_batches( 113 |         EmbedChunks, 114 |         fn_constructor_kwargs={"model_name": embedding_model_name}, 115 |         batch_size=100, 116 |         num_gpus=0 if embedding_model_name!="text-embedding-ada-002" else 0, 117 |         compute=ActorPoolStrategy(size=2)) 118 | 119 |     # Index data 120 |     embed_dim=EMBEDDING_DIMENSIONS[embedding_model_name] 121 |     embedded_chunks.map_batches( 122 |         StoreResults, 123 |         fn_constructor_kwargs={"embed_dim": embed_dim}, 124 |         batch_size=128, 125 |         num_cpus=1, 126 |         compute=ActorPoolStrategy(size=8), 127 |         # Since our database is only created on the head node, we need to force the Ray tasks to execute only on the head node. 
128 | scheduling_strategy=NodeAffinitySchedulingStrategy(node_id=ray.get_runtime_context().get_node_id(), soft=False) 129 | 130 | ).count() -------------------------------------------------------------------------------- /notebooks/eval-scores.json: -------------------------------------------------------------------------------- 1 | [ 2 | { 3 | "question": "I\u2019m struggling a bit with Ray Data type conversions when I do map_batches. Any advice?", 4 | "golden_response": { 5 | "question": "I\u2019m struggling a bit with Ray Data type conversions when I do map_batches. Any advice?", 6 | "source": "https://docs.ray.io/en/master/data/transforming-data.html#configuring-batch-format", 7 | "response": "Sure, when using the `map_batches()` function in Ray Data, you can specify the batch format by using the `batch_format` parameter. If you want to work with NumPy ndarrays, you can set `batch_format=\"numpy\"`. For example, if you have a function that increases the brightness of an image, you can use this format. \n\nHere's a sample code:\n\n```python\nfrom typing import Dict\nimport numpy as np\nimport ray\n\ndef increase_brightness(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:\n batch[\"image\"] = np.clip(batch[\"image\"] + 4, 0, 255)\n return batch\n\nds = (\n ray.data.read_images(\"s3://[email\u00a0protected]/image-datasets/simple\")\n .map_batches(increase_brightness, batch_format=\"numpy\")\n)\n```\n\nOn the other hand, if you prefer to work with pandas DataFrames, you can set `batch_format=\"pandas\"`. For instance, if you have a function that drops NA values from a DataFrame, you can use this format.\n\nHere's a sample code:\n\n```python\nimport pandas as pd\nimport ray\n\ndef drop_nas(batch: pd.DataFrame) -> pd.DataFrame:\n return batch.dropna()\n\nds = (\n ray.data.read_csv(\"s3://[email\u00a0protected]/iris.csv\")\n .map_batches(drop_nas, batch_format=\"pandas\")\n)\n```\n\nRemember, by default, Ray Data represents batches as dicts of NumPy ndarrays." 8 | }, 9 | "generated_response": "Consider specifying the batch format in the `map_batches()` function when working with Ray Data type conversions. By default, Ray Data represents batches as dictionaries of NumPy ndarrays. However, you can configure the batch type by setting the `batch_format` parameter in `map_batches()`. This allows you to return batches in either NumPy ndarrays or pandas DataFrames format. By specifying the desired batch format, you can ensure that the type conversions are handled correctly during the `map_batches()` operation.", 10 | "score": 4.5, 11 | "reasoning": "The generated answer is relevant and mostly correct. It provides the user with advice on how to handle type conversions in Ray Data when using the `map_batches()` function, which is what the user asked for. However, it lacks the specific examples provided in the reference answer, which would have made the explanation clearer and more practical." 12 | }, 13 | { 14 | "question": "How does autoscaling work in a Ray Serve application?", 15 | "golden_response": { 16 | "question": "How does autoscaling work in a Ray Serve application?", 17 | "source": "https://docs.ray.io/en/master/serve/scaling-and-resource-allocation.html#autoscaling", 18 | "response": "In a Ray Serve application, autoscaling works by adjusting to traffic spikes through observing queue sizes and making scaling decisions to add or remove replicas. This is achieved by setting the autoscaling_config field in deployment options. 
The configuration includes parameters like min_replicas, max_replicas, initial_replicas, and target_num_ongoing_requests_per_replica. \n\nThe min_replicas and max_replicas fields define the range of replicas that the autoscaler can choose from. The initial_replicas field sets the starting number of replicas, and if it's not specified, it defaults to min_replicas. The target_num_ongoing_requests_per_replica parameter determines how aggressively the autoscaler should react to traffic. \n\nThe Ray Serve Autoscaler operates at the application level and sits on top of the Ray Autoscaler. It requests Ray to start a number of replica actors based on the request demand. If the Ray Autoscaler determines there aren\u2019t enough available CPUs to place these actors, it requests more Ray nodes from the underlying cloud provider. \n\nThere are also parameters like downscale_delay_s and upscale_delay_s that control the frequency of autoscaling work, and smoothing_factor to speed up or slow down each autoscaling step. The metrics_interval_s parameter controls how often each replica sends metrics to the autoscaler." 19 | }, 20 | "generated_response": "The autoscaling feature in a Ray Serve application automatically adjusts the number of replicas based on the load. The Ray Serve Autoscaler, which runs in the Serve Controller actor, receives metrics from each ServeHandle and replica. It periodically checks the ServeHandle queues and in-flight queries on replicas to determine whether to scale the number of replicas. When new replicas are discovered, the ServeHandle sends any buffered or new queries to the replica until the maximum number of concurrent queries is reached. Autoscaling in Ray Serve considers logical resource requests for scaling, such as those specified in @ray.remote and displayed in ray status, rather than physical machine utilization. If there are insufficient resources, the request will be queued, and the autoscaler will add nodes to satisfy the resource demands in the queue. Idle nodes, which have no active tasks, actors, or objects, are removed by the autoscaler after a certain period of time.", 21 | "score": 4.5, 22 | "reasoning": "The generated answer is highly relevant to the user query and provides a detailed explanation of how autoscaling works in a Ray Serve application. It correctly describes the process of adjusting the number of replicas based on load, the role of the Serve Controller actor, and the consideration of logical resource requests for scaling. However, it does not mention specific configuration parameters like min_replicas, max_replicas, initial_replicas, and target_num_ongoing_requests_per_replica, which are included in the reference answer." 23 | }, 24 | { 25 | "question": "how do I get the address of a ray node", 26 | "golden_response": { 27 | "question": "how do I get the address of a ray node", 28 | "source": "https://docs.ray.io/en/master/ray-core/miscellaneous.html#node-information", 29 | "response": "You can get the address of a Ray node by accessing the 'NodeManagerAddress' field of the node's information. This field provides the PrivateIP of the node that the raylet is on." 30 | }, 31 | "generated_response": "You can use the `ray.nodes` API to fetch all nodes and map the node ID to the corresponding IP address.", 32 | "score": 3.5, 33 | "reasoning": "The generated answer is relevant to the user query and provides a method to get the address of a Ray node. 
However, it does not mention the 'NodeManagerAddress' field which is part of the correct procedure according to the reference answer." 34 | }, 35 | { 36 | "question": "Does Ray support NCCL?", 37 | "golden_response": { 38 | "question": "Does Ray support NCCL?", 39 | "source": "https://docs.ray.io/en/master/ray-more-libs/ray-collective.html", 40 | "response": "Yes, Ray does support NCCL. It uses NCCL as one of the optional high-performance communication backends in its collective communication library." 41 | }, 42 | "generated_response": "Yes, Ray supports NCCL as one of the optional high-performance communication backends for its collective communication library.", 43 | "score": 5.0, 44 | "reasoning": "The generated answer is completely relevant and correct, providing the same information as the reference answer. It is also concise and clear." 45 | }, 46 | { 47 | "question": "Is Ray integrated with DeepSpeed?", 48 | "golden_response": { 49 | "question": "Is Ray integrated with DeepSpeed?", 50 | "source": "https://docs.ray.io/en/master/ray-air/examples/gptj_deepspeed_fine_tuning.html#fine-tuning-the-model-with-ray-air-a-name-train-a", 51 | "response": "Yes, Ray is integrated with DeepSpeed. This integration allows for the optimization of the training process, including the ability to offload and partition optimizer and parameter states, which can reduce GPU memory usage. This is particularly useful when working with large models that may not fit on smaller GPU types." 52 | }, 53 | "generated_response": "Yes, Ray is integrated with DeepSpeed.", 54 | "score": 4.0, 55 | "reasoning": "The generated answer is correct and relevant to the user query. However, it lacks the detailed explanation provided in the reference answer." 56 | }, 57 | { 58 | "question": "what will happen if I use AsyncIO's await to wait for a Ray future like `await x.remote()`", 59 | "golden_response": { 60 | "question": "what will happen if I use AsyncIO's await to wait for a Ray future like `await x.remote()`", 61 | "source": "https://docs.ray.io/en/master/ray-core/actors/async_api.html#objectrefs-as-asyncio-futures", 62 | "response": "If you use AsyncIO's await to wait for a Ray future like `await x.remote()`, it will allow you to wait for the completion of the remote task. This feature is useful for integrating Ray futures into existing concurrent applications." 63 | }, 64 | "generated_response": "Using AsyncIO's `await` to wait for a Ray future like `await x.remote()` will allow the program to continue executing other tasks while waiting for the Ray future to complete. This enables concurrency and parallelism in the program, as other tasks can be executed while waiting for the result of the Ray future. This is in contrast to using `ray.get()` which is a blocking call and would prevent the program from executing other tasks until the result is available.", 65 | "score": 4.5, 66 | "reasoning": "The generated answer is relevant and mostly correct. It accurately explains the benefits of using AsyncIO's `await` with Ray futures, such as enabling concurrency and parallelism. However, it does not explicitly mention that it allows you to wait for the completion of the remote task, which is mentioned in the reference answer." 
67 | }, 68 | { 69 | "question": "How would you compare Spark, Ray, Dask?", 70 | "golden_response": { 71 | "question": "How would you compare Spark, Ray, Dask?", 72 | "source": "https://docs.ray.io/en/master/data/overview.html#how-does-ray-data-compare-to-x-for-offline-inference", 73 | "response": "The context does not provide information on how to compare Spark, Ray, and Dask." 74 | }, 75 | "generated_response": "Spark, Ray, and Dask are all frameworks that are used for distributed computing and data processing. Spark is a popular framework that is known for its ability to handle large-scale batch processing workloads. It provides a streaming paradigm and is commonly used for big data analytics and machine learning tasks. Ray, on the other hand, is a framework that is designed for distributed computing and is particularly well-suited for GPU workloads, such as deep learning inference. It provides features like launching cloud clusters and shared-memory store. Dask is a Python library that is focused on parallel computing and is geared towards scaling analytics and scientific computing workloads. It provides big data collections that mimic the APIs of NumPy and Pandas, allowing for operations on larger-than-memory data and execution on multi-machine clusters. Overall, while Spark is widely used for batch processing, Ray is better suited for GPU workloads, and Dask is focused on parallel computing and scaling analytics workloads.", 76 | "score": 5.0, 77 | "reasoning": "The generated answer is highly relevant to the user query and provides a detailed comparison of Spark, Ray, and Dask. It not only explains what each of these frameworks is used for, but also highlights their unique features and use cases. The answer is fully correct and provides more information than the reference answer." 78 | }, 79 | { 80 | "question": "why would ray overload a node w/ more task that the resources allow ?", 81 | "golden_response": { 82 | "question": "why would ray overload a node w/ more task that the resources allow ?", 83 | "source": "https://docs.ray.io/en/master/ray-core/scheduling/resources.html#physical-resources-and-logical-resources", 84 | "response": "Ray allows for overloading a node with more tasks than the resources allow because it treats resources as logical rather than physical. This means that the resource requirements of tasks or actors do not impose limits on actual physical resource usage. It's the user's responsibility to ensure that tasks or actors do not use more resources than specified. Ray doesn't provide CPU isolation for tasks or actors, and it lets the operating system schedule and run the task. However, it does provide GPU isolation by setting the CUDA_VISIBLE_DEVICES environment variable." 85 | }, 86 | "generated_response": "Ray may overload a node with more tasks than the resources allow if the tasks or actors use more than their proportionate share of memory. This can lead to issues like out-of-memory (OOM) errors. To prevent this, it is recommended to reduce the number of concurrently running tasks or actors on each node by increasing the amount of resources requested by them. By increasing the amount of resources, Ray ensures that the sum of the resource requirements of all concurrently running tasks and actors on a given node does not exceed the node's total resources.", 87 | "score": 2.5, 88 | "reasoning": "The generated answer is relevant to the user query but it contains mistakes. 
It incorrectly suggests that Ray overloads a node due to tasks or actors using more than their share of memory and recommends increasing the amount of resources requested by them. The reference answer correctly explains that Ray treats resources as logical rather than physical, and it's the user's responsibility to ensure tasks or actors do not use more resources than specified." 89 | }, 90 | { 91 | "question": "when should I use Ray Client?", 92 | "golden_response": { 93 | "question": "when should I use Ray Client?", 94 | "source": "https://docs.ray.io/en/master/cluster/running-applications/job-submission/ray-client.html#when-to-use-ray-client", 95 | "response": "Ray Client should be used when you want to connect an interactive Python shell to a remote cluster. It is particularly useful for developing interactively in a local Python shell. However, it requires a stable connection to the remote cluster and will terminate the workload if the connection is lost for more than 30 seconds. If you have a long running workload that you want to run on your cluster, it is recommended to use Ray Jobs instead. Please note that Ray Client may not work as expected for ML workloads, such as Ray Tune or Ray Train." 96 | }, 97 | "generated_response": "Ray Client should be used when you want to connect an interactive Python shell to a remote cluster. It is useful for developing interactively in a local Python shell while leveraging a remote Ray cluster. However, it has architectural limitations and may not work as expected when using Ray for ML workloads. If you have a long-running workload that you want to run on your cluster, it is recommended to use Ray Jobs instead.", 98 | "score": 4.5, 99 | "reasoning": "The generated answer is highly relevant and mostly correct. It accurately explains when to use Ray Client and mentions its limitations. However, it omits the detail about the requirement of a stable connection and the termination of the workload if the connection is lost for more than 30 seconds, which was present in the reference answer." 100 | }, 101 | { 102 | "question": "how to scatter actors across the cluster?", 103 | "golden_response": { 104 | "question": "how to scatter actors across the cluster?", 105 | "source": "https://docs.ray.io/en/master/ray-core/scheduling/index.html#spread", 106 | "response": "To scatter actors across the cluster, you can use the \"SPREAD\" scheduling strategy. This can be done by creating instances of the actor with the \"SPREAD\" option. Here is an example:\n\n```python\nactors = [SpreadActor.options(scheduling_strategy=\"SPREAD\").remote() for _ in range(10)]\n```\n\nIn this example, 10 instances of the SpreadActor class are created and spread across the cluster." 107 | }, 108 | "generated_response": "To scatter actors across the cluster, you can use the \"SPREAD\" scheduling strategy. By specifying the scheduling_strategy parameter as \"SPREAD\" when creating the actors, Ray will try to spread the tasks or actors among available nodes in the cluster. This ensures that the actors are distributed across different nodes rather than being concentrated on a single node.", 109 | "score": 4.5, 110 | "reasoning": "The generated answer is relevant and correct. It explains the use of the \"SPREAD\" scheduling strategy to scatter actors across the cluster, similar to the reference answer. However, it lacks the specific code example provided in the reference answer." 
111 | } 112 | ] -------------------------------------------------------------------------------- /notebooks/eval.py: -------------------------------------------------------------------------------- 1 | import openai 2 | import re 3 | import numpy as np 4 | from tqdm import tqdm 5 | 6 | from llama_index.evaluation import CorrectnessEvaluator 7 | from llama_index.llms import OpenAI 8 | from llama_index import ServiceContext 9 | 10 | 11 | def evaluate_retrieval( 12 | llama_index_retriever, 13 | queries, 14 | golden_sources 15 | ): 16 | results = [] 17 | 18 | for query, expected_source in tqdm(list(zip(queries, golden_sources))): 19 | retrieved_nodes = llama_index_retriever.retrieve(query) 20 | retrieved_sources = [node.metadata['source'] for node in retrieved_nodes] 21 | 22 | # If our label does not include a section, then any sections on the page should be considered a hit. 23 | if "#" not in expected_source: 24 | retrieved_sources = [source.split("#")[0] for source in retrieved_sources] 25 | 26 | if expected_source in retrieved_sources: 27 | is_hit = True 28 | score = retrieved_nodes[retrieved_sources.index(expected_source)].score 29 | else: 30 | is_hit = False 31 | score = 0.0 32 | 33 | result = { 34 | "is_hit": is_hit, 35 | "score": score, 36 | "retrieved": retrieved_sources, 37 | "expected": expected_source, 38 | "query": query, 39 | } 40 | results.append(result) 41 | return results 42 | 43 | 44 | def get_hit_rate(results): 45 | return np.mean([r["is_hit"] for r in results]) 46 | 47 | 48 | def evaluate_e2e( 49 | llama_index_query_engine, 50 | queries, 51 | golden_responses, 52 | llm=None, 53 | verbose=False, 54 | ): 55 | # run inference 56 | if verbose: 57 | print('Running inference') 58 | 59 | generated_responses_str = [] 60 | for query in tqdm(queries): 61 | response = llama_index_query_engine.query(query) 62 | generated_responses_str.append(response.response) 63 | 64 | # setup evaluator 65 | eval_llm = llm or OpenAI(model='gpt-4', temperature=0.0) 66 | service_context = ServiceContext.from_defaults(llm=eval_llm) 67 | evaluator = CorrectnessEvaluator(service_context=service_context) 68 | 69 | # run evaluation 70 | if verbose: 71 | print('Running eval') 72 | 73 | eval_results = [] 74 | for query, rag_response, golden_response in tqdm(list(zip(queries, generated_responses_str, golden_responses))): 75 | eval_result = evaluator.evaluate( 76 | query=query, 77 | reference=golden_response, 78 | response=rag_response) 79 | eval_results.append(eval_result) 80 | 81 | return eval_results 82 | 83 | 84 | def get_mean_score(results): 85 | return np.mean([r.score for r in results]) -------------------------------------------------------------------------------- /notebooks/utils.py: -------------------------------------------------------------------------------- 1 | import json 2 | import random 3 | from llama_index import VectorStoreIndex, ServiceContext 4 | from llama_index.llms import Anyscale, OpenAI 5 | 6 | from data import get_embedding_model, get_postgres_store, EMBEDDING_DIMENSIONS 7 | 8 | 9 | def _get_vector_store_index( 10 | service_context, 11 | embedding_model_name, 12 | ): 13 | 14 | embed_dim = EMBEDDING_DIMENSIONS[embedding_model_name] 15 | vector_store = get_postgres_store(embed_dim) 16 | index = VectorStoreIndex.from_vector_store( 17 | vector_store, 18 | service_context=service_context 19 | ) 20 | return index 21 | 22 | 23 | def get_query_engine( 24 | llm_model_name: str = "meta-llama/Llama-2-70b-chat-hf", 25 | temperature: float = 0.1, 26 | embedding_model_name = 
"text-embedding-ada-002", 27 | similarity_top_k=2 28 | ): 29 | embed_model = get_embedding_model(embedding_model_name) 30 | 31 | if "llama" in llm_model_name: 32 | llm = Anyscale(model=llm_model_name, temperature=temperature) 33 | else: 34 | llm = OpenAI(model=llm_model_name, temperature=temperature) 35 | 36 | service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm) 37 | 38 | index = _get_vector_store_index(service_context, embedding_model_name) 39 | return index.as_query_engine(similarity_top_k=similarity_top_k) 40 | 41 | 42 | def get_retriever( 43 | embedding_model_name = "text-embedding-ada-002", 44 | similarity_top_k=2 45 | ): 46 | 47 | embed_model = get_embedding_model(embedding_model_name) 48 | service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=None) 49 | 50 | index = _get_vector_store_index(service_context, embedding_model_name) 51 | return index.as_query_engine(similarity_top_k=similarity_top_k) 52 | 53 | 54 | def train_test_split(data, split_ratio=0.8): 55 | """ 56 | Split a list of items into training and testing sets. 57 | 58 | Args: 59 | data (list): The list of items to be split. 60 | split_ratio (float): The ratio of items to include in the training set (default is 0.8). 61 | 62 | Returns: 63 | tuple: A tuple containing two lists - the training set and the testing set. 64 | """ 65 | if not 0 <= split_ratio <= 1: 66 | raise ValueError("Split ratio must be between 0 and 1") 67 | 68 | # Shuffle the data to ensure randomness in the split 69 | random.shuffle(data) 70 | 71 | # Calculate the split indices 72 | split_index = int(len(data) * split_ratio) 73 | 74 | # Split the data into training and testing sets 75 | train_set = data[:split_index] 76 | test_set = data[split_index:] 77 | 78 | return train_set, test_set 79 | 80 | 81 | def subsample(data, ratio): 82 | """ 83 | Subsample a list to a given ratio. 84 | 85 | Args: 86 | data (list): The list of items to be subsampled. 87 | ratio (float): The ratio of items to retain in the subsample. 88 | 89 | Returns: 90 | list: A subsampled list containing the specified ratio of items. 91 | """ 92 | if not 0 <= ratio <= 1: 93 | raise ValueError("Ratio must be between 0 and 1") 94 | 95 | # Calculate the number of items to retain in the subsample 96 | num_items_to_retain = int(len(data) * ratio) 97 | 98 | # Randomly select items to retain 99 | subsampled_data = random.sample(data, num_items_to_retain) 100 | 101 | return subsampled_data 102 | 103 | 104 | def write_jsonl(filename, data): 105 | """ 106 | Write a list of dictionaries to a JSON Lines (JSONL) file. 107 | 108 | Args: 109 | filename (str): The name of the JSONL file to write to. 110 | data (list): A list of dictionaries to write as JSONL objects. 
111 | """ 112 | with open(filename, 'w', encoding='utf-8') as file: 113 | for item in data: 114 | json.dump(item, file, ensure_ascii=False) 115 | file.write('\n') -------------------------------------------------------------------------------- /presentation.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/run-llama/ai-engineer-workshop/918b7efd79ec631978f484e1d1ad9704fae64306/presentation.pdf -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | aiohttp==3.8.6 2 | aiosignal==1.3.1 3 | annotated-types==0.6.0 4 | anyio==3.7.1 5 | appnope==0.1.3 6 | argon2-cffi==23.1.0 7 | argon2-cffi-bindings==21.2.0 8 | arrow==1.3.0 9 | asttokens==2.4.0 10 | async-lru==2.0.4 11 | async-timeout==4.0.3 12 | asyncpg==0.28.0 13 | attrs==23.1.0 14 | Babel==2.13.0 15 | backcall==0.2.0 16 | beautifulsoup4==4.12.2 17 | bleach==6.1.0 18 | certifi==2023.7.22 19 | cffi==1.16.0 20 | charset-normalizer==3.3.0 21 | click==8.1.7 22 | comm==0.1.4 23 | dataclasses-json==0.6.1 24 | debugpy==1.8.0 25 | decorator==5.1.1 26 | defusedxml==0.7.1 27 | executing==2.0.0 28 | fastjsonschema==2.18.1 29 | filelock==3.12.4 30 | fqdn==1.5.1 31 | frozenlist==1.4.0 32 | fsspec==2023.9.2 33 | greenlet==3.0.0 34 | idna==3.4 35 | ipykernel==6.25.2 36 | ipython==8.16.1 37 | ipython-genutils==0.2.0 38 | ipywidgets==8.1.1 39 | isoduration==20.11.0 40 | jedi==0.19.1 41 | Jinja2==3.1.2 42 | joblib==1.3.2 43 | json5==0.9.14 44 | jsonpatch==1.33 45 | jsonpointer==2.4 46 | jsonschema==4.19.1 47 | jsonschema-specifications==2023.7.1 48 | jupyter==1.0.0 49 | jupyter-console==6.6.3 50 | jupyter-events==0.7.0 51 | jupyter-lsp==2.2.0 52 | jupyter_client==8.3.1 53 | jupyter_core==5.3.2 54 | jupyter_server==2.7.3 55 | jupyter_server_terminals==0.4.4 56 | jupyterlab==4.0.6 57 | jupyterlab-pygments==0.2.2 58 | jupyterlab-widgets==3.0.9 59 | jupyterlab_server==2.25.0 60 | langchain==0.0.310 61 | langsmith==0.0.43 62 | llama-index==0.8.41 63 | MarkupSafe==2.1.3 64 | marshmallow==3.20.1 65 | matplotlib-inline==0.1.6 66 | mistune==3.0.2 67 | msgpack==1.0.7 68 | multidict==6.0.4 69 | mypy-extensions==1.0.0 70 | nbclient==0.8.0 71 | nbconvert==7.9.2 72 | nbformat==5.9.2 73 | nest-asyncio==1.5.8 74 | nltk==3.8.1 75 | notebook==7.0.4 76 | notebook_shim==0.2.3 77 | numpy==1.26.0 78 | openai==0.28.1 79 | overrides==7.4.0 80 | packaging==23.2 81 | pandas==2.1.1 82 | pandocfilters==1.5.0 83 | parso==0.8.3 84 | pexpect==4.8.0 85 | pgvector==0.2.3 86 | pickleshare==0.7.5 87 | platformdirs==3.11.0 88 | prometheus-client==0.17.1 89 | prompt-toolkit==3.0.39 90 | protobuf==4.24.4 91 | psutil==5.9.5 92 | psycopg2-binary==2.9.9 93 | ptyprocess==0.7.0 94 | pure-eval==0.2.2 95 | pyarrow==13.0.0 96 | pycparser==2.21 97 | pydantic==2.4.2 98 | pydantic_core==2.10.1 99 | Pygments==2.16.1 100 | python-dateutil==2.8.2 101 | python-json-logger==2.0.7 102 | pytz==2023.3.post1 103 | PyYAML==6.0.1 104 | pyzmq==25.1.1 105 | qtconsole==5.4.4 106 | QtPy==2.4.0 107 | ray==2.7.0 108 | referencing==0.30.2 109 | regex==2023.10.3 110 | requests==2.31.0 111 | rfc3339-validator==0.1.4 112 | rfc3986-validator==0.1.1 113 | rpds-py==0.10.4 114 | Send2Trash==1.8.2 115 | six==1.16.0 116 | sniffio==1.3.0 117 | soupsieve==2.5 118 | SQLAlchemy==2.0.21 119 | stack-data==0.6.3 120 | tenacity==8.2.3 121 | terminado==0.17.1 122 | tiktoken==0.5.1 123 | tinycss2==1.2.1 124 | tornado==6.3.3 125 | tqdm==4.66.1 126 | 
traitlets==5.11.2 127 | types-python-dateutil==2.8.19.14 128 | typing-inspect==0.9.0 129 | typing_extensions==4.8.0 130 | tzdata==2023.3 131 | uri-template==1.3.0 132 | urllib3==1.26.17 133 | wcwidth==0.2.8 134 | webcolors==1.13 135 | webencodings==0.5.1 136 | websocket-client==1.6.4 137 | widgetsnbextension==4.0.9 138 | yarl==1.9.2 139 | --------------------------------------------------------------------------------