├── reps ├── README.md ├── user-story.png ├── 2025-11-24-ray-token-auth │ ├── ray_auth_flow.png │ └── ray_auth_with_k8s_architecture.png ├── 2023-12-04-accelerated-dag-figures │ ├── image1.png │ ├── image2.png │ ├── image3.png │ ├── image4.png │ └── image5.png ├── 2024-10-18-train-tune-api-revamp │ ├── train_architecture.png │ ├── train_tune_decoupled.png │ ├── train_tune_dependency.png │ ├── train_tune_interop_after.png │ └── train_tune_interop_before.png ├── 2025-11-21-ray-history-server │ ├── events_file_structure.png │ ├── history_server_architecture.png │ └── 2025-11-21-ray-history-server.md ├── 2024-05-21-ray-kubectl-plugin.md ├── 2023-04-27-data-strict-mode.md ├── 2022-03-09-shuffle.md ├── 2024-06-16-support-apache-yunikorn-scheduler.md ├── 2022-03-08-serve_pipeline.md ├── 2024-05-21-kuberay-authentication.md ├── 2023-07-08-air-surface-syntax.md ├── 2023-03-15-train-api.md ├── 2022-12-08-ray-for-federated-learning-and-privacy-preserving-computing.md ├── 2022-10-11-serve-java-http-ingress.md ├── 2023-04-28-remove-algorithms-from-rllib.md ├── 2023-03-20-air-storage-path.md ├── 2022-09-19-ray-on-spark.md ├── 2023-08-18-serve-java-dag-api.md ├── 2023-8-18-ray-on-spark-autoscaling.md ├── 2023-06-06-simplify-sync.md ├── 2022-04-21-state-observability-apis.md ├── 2025-03-18-label-based-scheduling.md └── 2023-10-13-accelerator-support.md ├── README.md ├── .gitignore └── LICENSE /reps/README.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /reps/user-story.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/user-story.png -------------------------------------------------------------------------------- /reps/2025-11-24-ray-token-auth/ray_auth_flow.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2025-11-24-ray-token-auth/ray_auth_flow.png -------------------------------------------------------------------------------- /reps/2023-12-04-accelerated-dag-figures/image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2023-12-04-accelerated-dag-figures/image1.png -------------------------------------------------------------------------------- /reps/2023-12-04-accelerated-dag-figures/image2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2023-12-04-accelerated-dag-figures/image2.png -------------------------------------------------------------------------------- /reps/2023-12-04-accelerated-dag-figures/image3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2023-12-04-accelerated-dag-figures/image3.png -------------------------------------------------------------------------------- /reps/2023-12-04-accelerated-dag-figures/image4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2023-12-04-accelerated-dag-figures/image4.png -------------------------------------------------------------------------------- /reps/2023-12-04-accelerated-dag-figures/image5.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2023-12-04-accelerated-dag-figures/image5.png -------------------------------------------------------------------------------- /reps/2024-10-18-train-tune-api-revamp/train_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2024-10-18-train-tune-api-revamp/train_architecture.png -------------------------------------------------------------------------------- /reps/2025-11-21-ray-history-server/events_file_structure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2025-11-21-ray-history-server/events_file_structure.png -------------------------------------------------------------------------------- /reps/2024-10-18-train-tune-api-revamp/train_tune_decoupled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2024-10-18-train-tune-api-revamp/train_tune_decoupled.png -------------------------------------------------------------------------------- /reps/2024-10-18-train-tune-api-revamp/train_tune_dependency.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2024-10-18-train-tune-api-revamp/train_tune_dependency.png -------------------------------------------------------------------------------- /reps/2024-10-18-train-tune-api-revamp/train_tune_interop_after.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2024-10-18-train-tune-api-revamp/train_tune_interop_after.png -------------------------------------------------------------------------------- /reps/2025-11-21-ray-history-server/history_server_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2025-11-21-ray-history-server/history_server_architecture.png -------------------------------------------------------------------------------- /reps/2025-11-24-ray-token-auth/ray_auth_with_k8s_architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2025-11-24-ray-token-auth/ray_auth_with_k8s_architecture.png -------------------------------------------------------------------------------- /reps/2024-10-18-train-tune-api-revamp/train_tune_interop_before.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ray-project/enhancements/HEAD/reps/2024-10-18-train-tune-api-revamp/train_tune_interop_before.png -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Ray Enhancement Proposals 2 | This repo tracks Ray Enhancement Proposals (REPs). The REP process is the main way to propose, discuss, and decide on features and other major changes to the Ray project. We'll start with a simple decision-making process (and evolve it over time):
3 | - First, a draft PR is created against the repo with a draft REP. A senior Ray committer should be designated as the shepherd in the Stewardship section and assigned to the PR. 4 | - The shepherd will review the PR and get it into a polished state for further review by Ray committers. 5 | - Once the PR is reviewable, we will hold a vote on the ``ray-committers`` mailing list. In most cases this should reach consensus; if the result is not unanimous, Eric Liang (@ericl) and Philipp Moritz (@pcmoritz) will be the final deciders on whether to accept the change. 6 | - Based on the results of the vote and possible final decision, the PR will either be merged (REP approved) or closed (REP rejected) with a short summary of the decision. 7 | 8 | You can find a list of PRs for REPs here (both open and merged PRs are available for comment): https://github.com/ray-project/enhancements/pulls?q=is%3Apr 9 | 10 | Each REP should include the following information: 11 | ## Summary 12 | ### General Motivation 13 | What use cases is this proposal supposed to enhance? If possible, please include details like the environment and scale. 14 | ### Should this change be within `ray` or outside? 15 | From a software layering perspective, should this change be part of the main `ray` project, part of an ecosystem project under `ray-project`, or a new ecosystem project? 16 | 17 | When reviewing the REP, the reviewers and the shepherd should apply the following judgements: 18 | - If an author proposes a change to be within the `ray` repo, the reviewers and the shepherd should assess whether the change can be layered on top of `ray` instead. 19 | If so, we should try to make the change in a separate repo. 20 | - For a change proposed as an ecosystem project under `ray-project`: the reviewers and the shepherd should make sure that the technical quality 21 | meets the bar of (at least) a good "experimental" or "alpha" feature -- we should be comfortable welcoming Ray users with similar use cases to try this project. 22 | - For a change proposed as a new ecosystem project (outside of `ray-project`): this REP is just serving as a "request for comments". 23 | We don't need to go through the voting process, since it's not Ray committers' decision to approve the change. 24 | 25 | ## Stewardship 26 | ### Required Reviewers 27 | The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers. 28 | ### Shepherd of the Proposal (should be a senior committer) 29 | To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review. 30 | 31 | ## Design and Architecture 32 | The proposal should include sufficient technical details for reviewers to determine the anticipated benefits and risks. 33 | 34 | ## Compatibility, Deprecation, and Migration Plan 35 | An important part of the proposal is to explicitly point out any compatibility implications of the proposed change. If there are any, we should thoroughly discuss a plan to deprecate existing APIs and migrate to the new one(s). 36 | 37 | ## Test Plan and Acceptance Criteria 38 | The proposal should discuss how the change will be tested **before** it can be merged or enabled.
It should also include other acceptance criteria including documentation and examples. 39 | 40 | ## (Optional) Follow-on Work 41 | Optionally, the proposal should discuss necessary follow-on work after the change is accepted. 42 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | ## Core latex/pdflatex auxiliary files: 2 | *.aux 3 | *.lof 4 | *.log 5 | *.lot 6 | *.fls 7 | *.out 8 | *.toc 9 | *.fmt 10 | *.fot 11 | *.cb 12 | *.cb2 13 | .*.lb 14 | 15 | ## Intermediate documents: 16 | *.dvi 17 | *.xdv 18 | *-converted-to.* 19 | # these rules might exclude image files for figures etc. 20 | # *.ps 21 | # *.eps 22 | # *.pdf 23 | 24 | ## Generated if empty string is given at "Please type another file name for output:" 25 | .pdf 26 | 27 | ## Bibliography auxiliary files (bibtex/biblatex/biber): 28 | *.bbl 29 | *.bcf 30 | *.blg 31 | *-blx.aux 32 | *-blx.bib 33 | *.run.xml 34 | 35 | ## Build tool auxiliary files: 36 | *.fdb_latexmk 37 | *.synctex 38 | *.synctex(busy) 39 | *.synctex.gz 40 | *.synctex.gz(busy) 41 | *.pdfsync 42 | 43 | ## Build tool directories for auxiliary files 44 | # latexrun 45 | latex.out/ 46 | 47 | ## Auxiliary and intermediate files from other packages: 48 | # algorithms 49 | *.alg 50 | *.loa 51 | 52 | # achemso 53 | acs-*.bib 54 | 55 | # amsthm 56 | *.thm 57 | 58 | # beamer 59 | *.nav 60 | *.pre 61 | *.snm 62 | *.vrb 63 | 64 | # changes 65 | *.soc 66 | 67 | # comment 68 | *.cut 69 | 70 | # cprotect 71 | *.cpt 72 | 73 | # elsarticle (documentclass of Elsevier journals) 74 | *.spl 75 | 76 | # endnotes 77 | *.ent 78 | 79 | # fixme 80 | *.lox 81 | 82 | # feynmf/feynmp 83 | *.mf 84 | *.mp 85 | *.t[1-9] 86 | *.t[1-9][0-9] 87 | *.tfm 88 | 89 | #(r)(e)ledmac/(r)(e)ledpar 90 | *.end 91 | *.?end 92 | *.[1-9] 93 | *.[1-9][0-9] 94 | *.[1-9][0-9][0-9] 95 | *.[1-9]R 96 | *.[1-9][0-9]R 97 | *.[1-9][0-9][0-9]R 98 | *.eledsec[1-9] 99 | *.eledsec[1-9]R 100 | *.eledsec[1-9][0-9] 101 | *.eledsec[1-9][0-9]R 102 | *.eledsec[1-9][0-9][0-9] 103 | *.eledsec[1-9][0-9][0-9]R 104 | 105 | # glossaries 106 | *.acn 107 | *.acr 108 | *.glg 109 | *.glo 110 | *.gls 111 | *.glsdefs 112 | *.lzo 113 | *.lzs 114 | 115 | # uncomment this for glossaries-extra (will ignore makeindex's style files!) 
116 | # *.ist 117 | 118 | # gnuplottex 119 | *-gnuplottex-* 120 | 121 | # gregoriotex 122 | *.gaux 123 | *.gtex 124 | 125 | # htlatex 126 | *.4ct 127 | *.4tc 128 | *.idv 129 | *.lg 130 | *.trc 131 | *.xref 132 | 133 | # hyperref 134 | *.brf 135 | 136 | # knitr 137 | *-concordance.tex 138 | # TODO Comment the next line if you want to keep your tikz graphics files 139 | *.tikz 140 | *-tikzDictionary 141 | 142 | # listings 143 | *.lol 144 | 145 | # luatexja-ruby 146 | *.ltjruby 147 | 148 | # makeidx 149 | *.idx 150 | *.ilg 151 | *.ind 152 | 153 | # minitoc 154 | *.maf 155 | *.mlf 156 | *.mlt 157 | *.mtc[0-9]* 158 | *.slf[0-9]* 159 | *.slt[0-9]* 160 | *.stc[0-9]* 161 | 162 | # minted 163 | _minted* 164 | *.pyg 165 | 166 | # morewrites 167 | *.mw 168 | 169 | # nomencl 170 | *.nlg 171 | *.nlo 172 | *.nls 173 | 174 | # pax 175 | *.pax 176 | 177 | # pdfpcnotes 178 | *.pdfpc 179 | 180 | # sagetex 181 | *.sagetex.sage 182 | *.sagetex.py 183 | *.sagetex.scmd 184 | 185 | # scrwfile 186 | *.wrt 187 | 188 | # sympy 189 | *.sout 190 | *.sympy 191 | sympy-plots-for-*.tex/ 192 | 193 | # pdfcomment 194 | *.upa 195 | *.upb 196 | 197 | # pythontex 198 | *.pytxcode 199 | pythontex-files-*/ 200 | 201 | # tcolorbox 202 | *.listing 203 | 204 | # thmtools 205 | *.loe 206 | 207 | # TikZ & PGF 208 | *.dpth 209 | *.md5 210 | *.auxlock 211 | 212 | # todonotes 213 | *.tdo 214 | 215 | # vhistory 216 | *.hst 217 | *.ver 218 | 219 | # easy-todo 220 | *.lod 221 | 222 | # xcolor 223 | *.xcp 224 | 225 | # xmpincl 226 | *.xmpi 227 | 228 | # xindy 229 | *.xdy 230 | 231 | # xypic precompiled matrices and outlines 232 | *.xyc 233 | *.xyd 234 | 235 | # endfloat 236 | *.ttt 237 | *.fff 238 | 239 | # Latexian 240 | TSWLatexianTemp* 241 | 242 | ## Editors: 243 | # WinEdt 244 | *.bak 245 | *.sav 246 | 247 | # Texpad 248 | .texpadtmp 249 | 250 | # LyX 251 | *.lyx~ 252 | 253 | # Kile 254 | *.backup 255 | 256 | # gummi 257 | .*.swp 258 | 259 | # KBibTeX 260 | *~[0-9]* 261 | 262 | # TeXnicCenter 263 | *.tps 264 | 265 | # auto folder when using emacs and auctex 266 | ./auto/* 267 | *.el 268 | 269 | # expex forward references with \gathertags 270 | *-tags.tex 271 | 272 | # standalone packages 273 | *.sta 274 | 275 | # Makeindex log files 276 | *.lpz 277 | 278 | # IDE 279 | .idea/* 280 | -------------------------------------------------------------------------------- /reps/2024-05-21-ray-kubectl-plugin.md: -------------------------------------------------------------------------------- 1 | # Ray Kubectl Plugin 2 | 3 | This document introduces a kubectl plugin designed to enhance the interaction with KubeRay resources, simplifying the experience of utilizing Ray on Kubernetes. 4 | The primary objective of this plugin is to provide a user-friendly interface, enabling users to leverage the benefits of Ray on Kubernetes without needing extensive knowledge of Kubernetes concepts and tools. 5 | 6 | ## Motivation 7 | 8 | Today it is incredibly challenging for data scientists and AI researchers unfamiliar with Kubernetes to start using Ray on Kubernetes. 9 | However, running Ray on Kubernetes is advantageous or necessary in certain environments. These users struggle with KubeRay for a variety of reasons: 10 | 1. **Unfamiliarity with Kubernetes API and Best Practices**: Users new to Kubernetes may struggle to operate Ray clusters using Kubernetes concepts like Pods and Services due to unfamiliarity with the Kubernetes API and best practices. 11 | 2. **Complex Kubernetes Networking**: Kubernetes networking can be daunting for beginners. 
Understanding how to use kubectl port-forward or externally exposed Services to connect to Ray clusters can be challenging without prior experience. 12 | 3. **Construction of Kubernetes YAML Manifests**: Creating YAML manifests for RayCluster, RayJob, and RayService can be challenging for those not experienced with using Kubernetes custom resources. 13 | 14 | For novice Kubernetes users, an intuitive CLI experience can drastically enhance their onboarding journey. A kubectl plugin is proposed since it can seamlessly integrate with the rest of the existing kubectl surface as needed. 15 | The primary goal of this plugin should be user-friendliness and a smooth progression from zero to “something good enough”. For more advanced scenarios and scaling requirements, users can opt to manage KubeRay custom resources 16 | independently as many users do today. 17 | 18 | More simply, the Ray Kubectl Plugin simplifies the management of Ray clusters by eliminating the need for users to handle complex YAML configurations. 19 | 20 | Why a kubectl plugin instead of the existing KubeRay CLI? 21 | * **Convenience / ease of use**: Kubectl plugins are designed to work within the existing kubectl framework. They inherit the user's current authentication, context, and cluster metadata, eliminating the need for additional setup. 22 | * **Seamless kubectl integration**: Users can effortlessly switch between basic kubectl subcommands and the Ray plugin using a single command-line tool. 23 | * **Accessibility**: The KubeRay CLI needs the KubeRay API server to function. However, the majority of clusters using KubeRay don't run the KubeRay API server, which limits the accessibility of the CLI. 24 | 25 | ## User Journey 26 | 27 | ### Create / Manage a Ray cluster 28 | 29 | ``` 30 | $ kubectl ray cluster create my-cluster --ray-version=2.9.3 31 | --worker-replicas=10 --worker-cpus=8 --worker-memory=32GB --worker-gpus=2 32 | ``` 33 | 34 | ``` 35 | kubectl ray cluster scale default-worker-group --cluster my-cluster --replicas=10 36 | ``` 37 | 38 | ``` 39 | kubectl ray cluster worker-group create cpu-pool --cluster my-cluster --worker-replicas=10 40 | --worker-cpus=8 --worker-memory=32GB 41 | 42 | ``` 43 | 44 | ``` 45 | kubectl ray cluster delete my-cluster 46 | ``` 47 | 48 | ### Ray Job Submissions (using RayJob) 49 | 50 | ``` 51 | $ kubectl ray job submit --cluster my-cluster --image=image:tag -- python job.py 52 | ``` 53 | 54 | ### Ray Session - Interactive Client / Dashboard 55 | 56 | ``` 57 | $ kubectl ray cluster session my-cluster 58 | Connecting... 59 | Ray Interactive Client Session started on port 10001... 60 | Ray Dashboard Session started on port 8265... 61 | ``` 62 | 63 | ### Ray Dashboard 64 | 65 | ``` 66 | $ kubectl ray cluster dashboard my-cluster 67 | Opening browser session to Ray dashboard ... 68 | ``` 69 | 70 | ### Ray Logs 71 | 72 | ``` 73 | $ kubectl ray cluster logs my-cluster --out-dir ./log-dir 74 | Downloading Ray logs to ./log-dir ... 75 | ``` 76 | 77 | ## Implementation Details 78 | 79 | The kubectl plugin will be developed in the `cli` directory, replacing the current KubeRay CLI. While the kubectl plugin will overlap with the existing KubeRay CLI in some ways (especially in managing KubeRay clusters), it will go further by enhancing day-to-day operations with Ray clusters. These enhancements include authenticating to clusters, establishing local sessions, submitting jobs, and scaling clusters.
In addition, the kubectl plugin will not depend on the KubeRay API server, making it viable for a larger audience of KubeRay users. 80 | 81 | The CLI will extend kubectl using kubectl’s plugin mechanism. See [Extend kubectl with plugins](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/) for more details. 82 | 83 | MVP Scope: 84 | * `kubectl ray cluster get|list` 85 | * `kubectl ray cluster scale` 86 | * `kubectl ray cluster session` 87 | * `kubectl ray cluster dashboard` 88 | * `kubectl ray cluster logs` 89 | 90 | Future Scope: 91 | * `kubectl ray cluster create|update|delete` 92 | * `kubectl ray cluster create --provider=gke|eks|aks|etc` 93 | * Support for adding provider-specific YAML like GCSFuse/EFS mounts, load balancers, etc. 94 | * `kubectl ray job get|list` 95 | * `kubectl ray job submit` 96 | -------------------------------------------------------------------------------- /reps/2023-04-27-data-strict-mode.md: -------------------------------------------------------------------------------- 1 | # Roll out "strict mode" for Ray Data 2 | 3 | ## Summary 4 | 5 | Make a (breaking) API change to always require data schemas in Ray Data, dropping support for standalone Python objects. In addition to unification and simplicity benefits, this aligns the Ray Data API closer to industry-standard distributed data APIs like Apache Spark and also emerging standards for machine learning datasets like HuggingFace. 6 | 7 | ### General Motivation 8 | 9 | This REP proposes rolling out a breaking API change to Ray Data, termed "strict mode". In strict mode, support for standalone Python objects is dropped. This means that instead of directly storing, e.g., a Python `Tuple[str, int]` instance in Ray Data, users will have to either give each field a name (i.e., `{foo: str, bar: int}`), or use a named object-type field (i.e., `{foo: object}`). In addition, strict mode removes the "default" batch format, making "numpy" the default. This means that most users just need to be aware of `Dict[str, Any]` (non-batched data records) and `Dict[str, np.ndarray]` (batched data) types when working with Ray Data. 10 | 11 | The motivation for this change is to cut down on the number of alternative representations users have to be aware of in Ray Data, which complicate the docs and examples and add to new-user confusion. 12 | For reference, this is the main PR originally introducing strict mode: https://github.com/ray-project/ray/pull/34336 13 | 14 | **Full list of changes** 15 | - All read APIs return structured data, never standalone Python objects. 16 | - Standalone Python objects are prohibited from being returned from map / map batches. 17 | - Standalone Numpy arrays are prohibited from being returned from map / map batches. 18 | - There is no more special interpretation of a single-column schema containing just `__value__` as a column. 19 | - The default batch format is "numpy" instead of "default" (pandas). 20 | - schema() returns a unified Schema class instead of Union[pyarrow.lib.Schema, type]. 21 | 22 | **Datasource behavior changes** 23 | - `range_tensor`: create "data" col instead of `__value__` 24 | - `from_numpy`/`from_numpy_refs`: create "data" col instead of using `__value__` 25 | - `from_items`: create "item" col instead of using Python objects 26 | - `range`: create "id" column instead of using Python objects 27 | 28 | The change itself has been well received in user testing, so the remainder of this REP will focus on the rollout strategy.
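To make the datasource behavior changes above concrete, here is a small sketch (outputs are shown as comments and the exact reprs may differ slightly):

```python
import numpy as np
import ray

# Tensor datasources now produce a named "data" column instead of the
# special `__value__` column.
ds = ray.data.from_numpy(np.ones((5, 3)))
print(ds.take(1))  # -> [{"data": array([1., 1., 1.])}]

# Standalone items are wrapped in an "item" column, and `range` produces
# an "id" column, so every record is a dict of named fields.
print(ray.data.from_items([1, 2, 3]).take(1))  # -> [{"item": 1}]
print(ray.data.range(3).take(1))               # -> [{"id": 0}]
```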
29 | 30 | ### Should this change be within `ray` or outside? 31 | main `ray` project. Changes are made to Ray Data. 32 | 33 | ## Stewardship 34 | ### Required Reviewers 35 | The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers. 36 | 37 | @amogkam, @c21 38 | 39 | ### Shepherd of the Proposal (should be a senior committer) 40 | To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review. 41 | 42 | @pcmoritz 43 | 44 | ## Rollout Plan 45 | 46 | ### Impact of Changes 47 | 48 | The proposed change mainly impacts users who are working with in-memory data objects and image datasets. These users will get an error when trying to load data without a schema (e.g., ``StrictModeError: Error validating : standalone Python objects are not allowed in strict mode. Please wrap the item in a dictionary like `{data: }`. For more details and how to disable strict mode, visit DOC_URL_HERE.``). 49 | 50 | ### Notification 51 | 52 | The main method of notification will be the ``StrictModeError`` exception raised when the user tries to create disallowed data types. The exception will link to documentation on how to upgrade / disable strict mode. 53 | 54 | We will also add a warning banner (for a couple releases) on the first import of Ray Data that notifies users of this change. 55 | 56 | ### Timeline 57 | 58 | - Ray 2.5: Enable strict mode by default, with the above notification plan. 59 | - Ray 2.6: No changes. 60 | - Ray 2.7 or after: Enforce strict mode always, and remove code for supporting the legacy code paths. 61 | 62 | ## Examples: 63 | 64 | ### Before 65 | ```python 66 | ds = ray.data.range(5) 67 | # -> Datastream(num_blocks=1, schema=) 68 | 69 | ds.take()[0] 70 | # -> 0 71 | 72 | ds.take_batch() 73 | # -> [0, 1, 2, 3, 4] 74 | 75 | ds.map_batches(lambda b: b * 2).take_batch() # b is coerced to pd.DataFrame 76 | # -> pd.DataFrame({"id": [0, 2, 4, 6, 8]}) 77 | ``` 78 | 79 | ### After 80 | ```python 81 | ds = ray.data.range(5) 82 | # -> Datastream(num_blocks=1, schema={id: int64}) 83 | 84 | ds.take()[0] 85 | # -> {"id": 0} 86 | 87 | ds.take_batch() 88 | # -> {"id": np.array([0, 1, 2, 3, 4])} 89 | 90 | ds.map_batches(lambda b: {"id": b["id"] * 2}).take_batch() # b is Dict[str, np.ndarray] 91 | # -> {"id": np.array([0, 2, 4, 6, 8])} 92 | ``` 93 | 94 | Note that in the "after" code, the datastream always has a fixed schema, and the batch type is consistently a dict of numpy arrays. 95 | 96 | ## Test Plan and Acceptance Criteria 97 | 98 | The master branch will have strict mode on by default. There will be a suite that tests basic functionality with strict mode off, to avoid regressions. 99 | -------------------------------------------------------------------------------- /reps/2022-03-09-shuffle.md: -------------------------------------------------------------------------------- 1 | ## Summary 2 | ### General Motivation 3 | 4 | Shuffle is an important primitive for data processing (e.g., sort) and ML ingest workloads (e.g., shuffling training data).
5 | Currently, Ray offers shuffle through the Datasets library, but the implementation has known data scalability and performance limitations past 10TB data scales. 6 | The goal of this proposal is to improve Datasets shuffle stability and scalability, through both Datasets and Ray core improvements. 7 | By the end of this work, we hope to achieve petabyte-scale shuffle operations with Datasets. 8 | 9 | ### Should this change be within `ray` or outside? 10 | 11 | These changes would lie within Ray Datasets, to improve the shuffle algorithm, and within Ray core, to improve the shuffle execution. 12 | 13 | ## Stewardship 14 | ### Required Reviewers 15 | 16 | @stephanie-wang, @ericl, @scv119 17 | 18 | ### Shepherd of the Proposal (should be a senior committer) 19 | 20 | @ericl 21 | 22 | ## Design and Architecture 23 | 24 | Currently Datasets shuffle is implemented as a map and reduce stage of Ray tasks. For example: 25 | 26 | ```python 27 | @ray.remote 28 | def map(partition, split_fn): 29 | return [block for block in split_fn(partition)] 30 | 31 | @ray.remote 32 | def reduce(*partitions): 33 | return merge(partitions) 34 | 35 | map_outputs = [map.options(num_returns=num_reducers).remote(partition, split_fn) for partition in partitions] 36 | reduce_outputs = [reduce.remote(*[map_output[i] for map_output in map_outputs]) for i in range(num_reducers)] 37 | ray.get(reduce_outputs) 38 | ``` 39 | 40 | This forms a task graph that looks something like this: 41 | 42 | ![MapReduce](https://miro.medium.com/max/680/1*nJYIs2ktVkqVsgSUCzfjaA.gif) 43 | 44 | The number of intermediate map outputs increases quadratically with the number of partitions and therefore the dataset size. This has two primary problems: 45 | 1. I/O efficiency worsens as the data size gets larger and the size of each intermediate map output gets smaller. 46 | 2. The system overhead of each map output becomes a scalability bottleneck. 47 | 48 | We propose addressing (1) through improvements in the Datasets library and (2) through Ray core improvements. 49 | 50 | ### Ray Datasets 51 | 52 | To improve I/O efficiency, we will incorporate some of the work done in the [Exoshuffle paper](https://arxiv.org/abs/2203.05072) on improving shuffle performance on Ray. 53 | Exoshuffle implements a "push-based shuffle" using Ray tasks, which pushes map outputs directly to their reducer while the map stage is still executing. 54 | More details on push-based shuffle can be found in the [Magnet paper](https://dl.acm.org/doi/10.14778/3415478.3415558), which implemented the algorithm as a part of Spark. 55 | 56 | In the current Datasets shuffle, the metadata overhead for `ObjectRefs` in Ray becomes a bottleneck at around 10TB or more. 57 | The Exoshuffle work reduces the total number of `ObjectRefs` and showed that it is possible to run sort at 100TB or more data scales on Ray. 58 | To incorporate this work, we will: 59 | 1. Unify Datasets shuffle-based primitives on the same internal shuffle implementation. 60 | 2. Benchmark Datasets shuffle-based primitives. 61 | 3. Implement a push-based shuffle in Datasets. 62 | 63 | ### Ray core 64 | 65 | Although push-based shuffle reduces the amount of system metadata, it is not enough to scale to petabyte-size data. 66 | Thus, we also propose a number of Ray core improvements to reduce the total amount of metadata needed during shuffle operations. 67 | These improvements center around reducing the per-`ObjectRef` metadata needed. 
68 | Currently, each task requires about 5KB of metadata and each object requires about 2KB of metadata at the driver. 69 | We plan to reduce these by: 70 | 71 | 1. "Collapsing" `ObjectRef` metadata for objects returned by the same task. All shuffle-on-Ray implementations rely on tasks with multiple return values. Currently, the metadata for these objects is stored separately, but we can combine these to amortize the metadata cost. 72 | 2. "Collapsing" task metadata for tasks submitted in the same stage. This is analogous to the above, but for tasks that are "similar". 73 | 3. Optimizing per-task and per-object metadata. Many of the metadata fields are not actually set or often have the same values. We can potentially save memory by not including these fields. 74 | 75 | #### Fault tolerance and scheduling support 76 | 77 | The push-based shuffle implemented in Exoshuffle currently requires precise placement of each task to reduce data movement across the cluster. 78 | Currently, this is implemented using node-specific resource requirements. 79 | However, this will hang the job if one of those nodes fails. 80 | 81 | To ensure fault tolerance, we plan to implement soft scheduling constraints to allow tasks to execute even if their original node fails. 82 | In the future, we can improve this further by providing a higher-level API to express scheduling constraints for groups of tasks. 83 | 84 | ## Compatibility, Deprecation, and Migration Plan 85 | 86 | This proposal will not include any API changes other than a way to choose the shuffle implementation in Datasets. 87 | 88 | ## Test Plan and Acceptance Criteria 89 | The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples. 90 | 91 | We plan to benchmark Datasets using the following workloads: 92 | 1. Shuffling data loader for ML ingest. 93 | 2. Sorting. 94 | 3. Groupby, using the [h2oai benchmark](https://h2oai.github.io/db-benchmark/). 95 | 96 | We will also use chaos testing (random node failures) to check fault tolerance. 97 | 98 | Initially, we will test the Datasets sort and compare performance both to Exoshuffle and the theoretical best (based on disk bandwidth specs). 99 | We will also test scalability to confirm the current shuffle bottleneck in Datasets and determine the amount of driver memory needed to support a petabyte-scale sort. 100 | 101 | Acceptance criteria: 102 | * Shuffling data loader for ML ingest can scale to 100TB or more (requires push-based shuffle in Datasets). 103 | * Ray Datasets sort can scale to petabyte-size data (requires Ray core optimizations). 104 | * Ray includes a shuffle workload with millions of partitions in the nightly CI tests. 
105 | 106 | ## (Optional) Follow-on Work 107 | 108 | Some of the follow-up work includes: 109 | * Further optimizations for groupby and other end-to-end Datasets benchmarks 110 | * Improving scalability to larger clusters (100s or 1000s of nodes) 111 | * Providing a high-level API to express scheduling constraints between groups of tasks 112 | * Further optimizations to reduce object overhead in Ray core (e.g., multi-part objects) 113 | -------------------------------------------------------------------------------- /reps/2024-06-16-support-apache-yunikorn-scheduler.md: -------------------------------------------------------------------------------- 1 | # Support new batch scheduler option: Apache YuniKorn 2 | 3 | ## Motivation 4 | 5 | The [Support batch scheduling and queueing](https://github.com/ray-project/kuberay/issues/213) proposal allows Ray to integrate 6 | easily with non-default Kubernetes schedulers, such as [Volcano](https://volcano.sh/). This proposal aims to add support for 7 | another popular Kubernetes scheduler: [Apache YuniKorn](https://yunikorn.apache.org/). This provides Ray users with 8 | another option to leverage the [scheduling features](https://yunikorn.apache.org/docs/next/get_started/core_features) 9 | YuniKorn offers. 10 | 11 | ## Existing state 12 | 13 | ### Current Batch Scheduler Option 14 | 15 | Currently, the batch scheduler feature can be enabled using the `--enable-batch-scheduler` boolean option. 16 | When set to `true`, the operator is started with the following Helm chart value override: 17 | 18 | ```shell 19 | --set batchScheduler.enabled=true 20 | ``` 21 | 22 | The scheduler manager in the KubeRay operator will initialize the scheduler plugins. The framework provides hooks for 23 | each scheduler implementation to inject custom resources and modify pod metadata accordingly. When a Ray cluster is 24 | created, the framework calls the appropriate scheduler plugin functions based on the scheduler name provided 25 | by `ray.io/scheduler-name`. 26 | 27 | ### Limitations 28 | 29 | The framework is designed to be scheduler-agnostic and provides general hooks for supporting different scheduler options. 30 | However, once the `--enable-batch-scheduler` option is set to `true`, the scheduler manager will attempt to load all 31 | scheduler schemas by calling the `AddToScheme(scheme *runtime.Scheme)` function implemented by each scheduler plugin. 32 | This loads all the schemas, including CRDs defined in the implementation. Since only the `VolcanoBatchScheduler` 33 | is currently implemented, it always loads Volcano's `PodGroup` CRD in the controller runtime, 34 | requiring the installation of Volcano CRDs even when enabling other scheduler options. 35 | 36 | ## Proposed changes 37 | 38 | ### Targeted scheduler option 39 | 40 | To support Apache YuniKorn and, more importantly, other schedulers in Ray, 41 | we propose adding an option to explicitly set the scheduler name, i.e., `--batch-scheduler`. 42 | Option syntax: `--batch-scheduler [SupportedSchedulerName]`, for example: 43 | 44 | 45 | 46 | ```shell 47 | # use volcano 48 | --batch-scheduler=volcano 49 | 50 | # use yunikorn 51 | --batch-scheduler=yunikorn 52 | ``` 53 | 54 | When a scheduler name is specified, the scheduler manager will only load the configured scheduler plugin, 55 | ensuring only necessary resources are loaded in the controller runtime.
56 | 57 | The option `--batch-scheduler` accepts a single scheduler name as the value, and the value must be `volcano` 58 | or `yunikorn` (until other scheduler plugins are introduced). If an unrecognized scheduler name is provided, 59 | the controller will fail to start with an error indicating that the scheduler plugin is not found. 60 | 61 | ### Deprecate "ray.io/scheduler-name" 62 | 63 | With the scheduler name specified in the operator startup options, there is no need to set `ray.io/scheduler-name` 64 | in `RayJob` or `RayCluster` CRs. This label should be marked as deprecated and eventually removed. 65 | 66 | ### Compatibility 67 | 68 | To maintain backwards compatibility, the `--enable-batch-scheduler` option will remain supported for a few more 69 | releases. However, it will be marked as deprecated, and users are encouraged to switch to the new option. 70 | Below are the scenarios for how these options can be used: 71 | 72 | ```shell 73 | # case 1: 74 | # use volcano 75 | --enable-batch-scheduler=true 76 | 77 | # case 2: 78 | # use volcano 79 | --batch-scheduler=volcano 80 | 81 | # case 3: 82 | # use yunikorn 83 | --batch-scheduler=yunikorn 84 | 85 | # case 4: 86 | # invalid options, error message: do not use two options together 87 | # for simplicity, only one of these 2 options should be used 88 | --enable-batch-scheduler=true 89 | --batch-scheduler=volcano|yunikorn 90 | ``` 91 | 92 | ### YuniKorn scheduler plugin behavior 93 | 94 | The YuniKorn scheduler plugin will support both `RayCluster` and `RayJob` resources. The integration will be lightweight, 95 | as YuniKorn does not require new CRDs. The plugin will add labels to the Ray pods, and the YuniKorn scheduler will 96 | schedule Ray pods based on these labels. 97 | 98 | To enable the YuniKorn scheduler, set the following options when starting the KubeRay operator: 99 | 100 | ```shell 101 | --batch-scheduler=yunikorn 102 | ``` 103 | 104 | If submitting a `RayCluster`, add `yunikorn.apache.org/queue-name` and `yunikorn.apache.org/application-id` to the labels. 105 | 106 | ```yaml 107 | apiVersion: ray.io/v1 108 | kind: RayCluster 109 | metadata: 110 | labels: 111 | yunikorn.apache.org/queue-name: root.abc 112 | yunikorn.apache.org/application-id: rayjob-sample-ltpjh 113 | ``` 114 | 115 | The `RayCluster` will be submitted to the `root.abc` queue and scheduled by the YuniKorn scheduler. The `RayCluster` will be 116 | recognized as an "application" with ID "rayjob-sample-ltpjh".
117 | 118 | If submitting a `RayJob`, provide only the queue name: 119 | 120 | ```yaml 121 | apiVersion: ray.io/v1 122 | kind: RayJob 123 | metadata: 124 | name: rayjob-example 125 | namespace: my-namespace 126 | labels: 127 | yunikorn.apache.org/queue-name: root.abc 128 | ``` 129 | 130 | When the Ray job is submitted to the cluster, the KubeRay operator will create the following `RayCluster` CR: 131 | 132 | ```yaml 133 | apiVersion: ray.io/v1 134 | kind: RayCluster 135 | metadata: 136 | labels: 137 | # RayCluster inherits the labels from RayJob 138 | yunikorn.apache.org/queue-name: root.abc 139 | # the same job ID defined in the RayJob spec, or generated by the controller 140 | yunikorn.apache.org/application-id: rayjob-sample-ltpjh 141 | ``` 142 | 143 | ### YuniKorn scheduler plugin details 144 | 145 | The YuniKorn scheduler plugin looks for relevant labels in the RayCluster and propagates the following labels 146 | to all the pods created by the RayCluster CR: 147 | 148 | ```yaml 149 | apiVersion: v1 150 | kind: Pod 151 | metadata: 152 | labels: 153 | app.kubernetes.io/created-by: kuberay-operator 154 | app.kubernetes.io/name: kuberay 155 | # value is taken from RayCluster CR label: "yunikorn.apache.org/application-id" 156 | applicationId: rayjob-sample-ltpjh 157 | # value is taken from RayCluster CR label: "yunikorn.apache.org/queue-name" 158 | queue: root.abc 159 | spec: 160 | schedulerName: yunikorn 161 | ``` 162 | Details about the meaning of these labels can be found in this 163 | [doc](https://yunikorn.apache.org/docs/user_guide/labels_and_annotations_in_yunikorn#labels-and-annotations-in-yunikorn). 164 | YuniKorn will recognize all these pods as part of the same application "rayjob-sample-ltpjh" and apply the 165 | scheduling features accordingly. 166 | 167 | 168 | -------------------------------------------------------------------------------- /reps/2022-03-08-serve_pipeline.md: -------------------------------------------------------------------------------- 1 | 2 | ## Summary - Serve Pipeline 3 | ### General Motivation 4 | Production machine learning serving pipelines are getting longer and wider. They often consist of multiple, or even tens of, models collectively making a final prediction, such as image / video content classification and tagging, fraud detection pipelines with multiple policies and models, multi-stage ranking and recommendation, etc. 5 | 6 | Meanwhile, model sizes are also growing beyond the memory limit of a single machine due to exponentially growing parameter counts (e.g., GPT-3, sparse feature embeddings in recsys models), so the ability to do disaggregated and distributed inference is desirable and future-proof. 7 | 8 | We want to leverage the programmable and general-purpose distributed computing ability of Ray, double down on its unique strengths (scheduling, communication and shared memory) to facilitate authoring, orchestrating, scaling and deployment of complex serving pipelines under one set of DAG APIs, so a user can program and test multiple models or multiple shards of a single large model dynamically, deploy to production at scale, and upgrade individually. 9 | #### Key requirements: 10 | - Provide the ability to author a DAG of serve nodes to form a complex inference graph. 11 | - The pipeline authoring experience should be fully Python programmable with support for dynamic selection, control flows, user business logic, etc.
12 | - The DAG can be instantiated and locally executed using the tasks and actors API 13 | - The DAG can be deployed via a declarative and idempotent API; individual nodes can be reconfigured and scaled independently. 14 | 15 | ### Should this change be within `ray` or outside? 16 | main `ray` project. Changes are made to Ray Core and Ray Serve level. 17 | 18 | ## Stewardship 19 | ### Required Reviewers 20 | The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers. 21 | 22 | @ericl, @edoakes, @simon-mo, @jiaodong 23 | 24 | ### Shepherd of the Proposal (should be a senior committer) 25 | To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review. 26 | 27 | @ericl 28 | 29 | ## Design and Architecture 30 | 31 | ### Example - Diagram 32 | 33 | We want to author a simple diamond-shaped DAG where user-provided input is sent to two models (m1, m2) that each access part of (or the identical) input, and part of the original input is also forwarded to the final ensemble stage to compute the final output. 34 | 35 | m1.forward(dag_input[0]) 36 | / \ 37 | dag_input ----- dag_input[2] ------ ensemble -> dag_output 38 | \ / 39 | m2.forward(dag_input[1]) 40 | 41 | 42 | ### Example - Code 43 | 44 | Classes or functions decorated by Ray can be used directly in Ray DAG building. 45 | ```python 46 | @ray.remote 47 | class Model: 48 | def __init__(self, val): 49 | self.val = val 50 | def forward(self, input): 51 | return self.val * input 52 | 53 | @ray.remote 54 | def ensemble(a, b, c): 55 | return a + b + c 56 | 57 | async def request_to_data_int(request: starlette.requests.Request): 58 | data = await request.body() 59 | return int(data) 60 | 61 | # Args binding, DAG building and input preprocessor definition 62 | with ServeInputNode(preprocessor=request_to_data_int) as dag_input: 63 | m1 = Model.bind(1) 64 | m2 = Model.bind(2) 65 | m1_output = m1.forward.bind(dag_input[0]) 66 | m2_output = m2.forward.bind(dag_input[1]) 67 | ray_dag = ensemble.bind(m1_output, m2_output, dag_input[2]) 68 | ``` 69 | 70 | A DAG authored with the Ray DAG API should be locally executable by the Ray Core runtime alone. 71 | 72 | ```python 73 | # 1*1 + 2*2 + 3 74 | assert ray.get(ray_dag.execute(1, 2, 3)) == 8 75 | ``` 76 | 77 | A Ray DAG can be built into a `serve application` that contains all nodes needed. 78 | ```python 79 | # Build, configure and deploy 80 | app = serve.pipeline.build(ray_dag) 81 | ``` 82 | 83 | Configure individual deployments in the app, using the same variable names as in `ray_dag`. 84 | ```python 85 | app.m1.set_options(num_replicas=3) 86 | app.m2.set_options(num_replicas=5) 87 | ``` 88 | 89 | We reserve the name and generate a serve `ingress` deployment that takes care of HTTP / gRPC, input schema validation, adaptation, etc. It's our Python interface to configure pipeline ingress.
```python 91 | app.ingress.set_options(num_replicas=10) 92 | 93 | # Translates to group_deploy behind the scenes 94 | app_handle = app.deploy() 95 | 96 | # Serve App is locally executable 97 | assert ray.get(app_handle.remote(1, 2, 3)) == 8 98 | ``` 99 | 100 | A serve pipeline application can be built into a YAML file for structured deployment, and is configurable by the Ops team by directly mutating configurable fields without deep knowledge or involvement of model code in the pipeline. 101 | ```python 102 | deployment.yaml = app.to_yaml() 103 | 104 | # Structured deployment CLI 105 | serve deploy deployment.yaml 106 | ``` 107 | 108 | ## Compatibility, Deprecation, and Migration Plan 109 | An important part of the proposal is to explicitly point out any compatibility implications of the proposed change. If there are any, we should thoroughly discuss a plan to deprecate existing APIs and migrate to the new one(s). 110 | 111 | - Ray Core 112 | - Serve Pipeline is co-designed with Ray Unified DAG API, where each DAG is always authored using Ray DAG API first. 113 | - The only new API introduced is the `.bind()` method on Ray-decorated functions or classes. 114 | - Ray Serve 115 | - Serve Pipeline DAG is transformed from Ray DAG where classes used are replaced with serve `Deployment` and class instances with deployment's `RayServeHandle` for better compatibility, deprecation as well as migration. 116 | 117 | - Breaking Changes: Ray Serve 118 | - All args and kwargs passed into a class or function in Serve Pipeline need to be JSON serializable, enforced upon the `build()` call. 119 | - We need to introduce and abstract out an `Ingress` component for serve pipeline. 120 | 121 | - Deprecation 122 | - Existing Serve Pipeline Alpha API will be deprecated in favor of Ray Unified DAG API as well as Serve Pipeline Beta. 123 | 124 | - Migration Plan: Ray Serve 125 | - New concepts and API introduced will be applied to Serve Pipeline Beta launch first to minimize compatibility risks. We can expect existing deployment implementations will migrate to `Ingress` and `Serve App` APIs later on. 126 | - Existing multi-model pipelines using the Alpha API or raw deployment handles are expected to be migrated to the Pipeline Beta API over time. 127 | 128 | 129 | ## Test Plan and Acceptance Criteria 130 | The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples. 131 | 132 | - Unit and integration tests for core components 133 | - Benchmarks on common multi-model inference workloads 134 | - Documentation with representative workloads, covered by CI. 135 | 136 | ## (Optional) Follow-on Work 137 | - Performance optimizations for multi-model inference, such as communication, multiplexing, scale-to-zero, etc. 138 | - UX and UI improvements for better user experience 139 | - Exploration of large model Distributed Inference on Serve Pipeline where each node represents a shard of a large model. 140 | -------------------------------------------------------------------------------- /reps/2024-05-21-kuberay-authentication.md: -------------------------------------------------------------------------------- 1 | # Kubernetes Native Ray Authentication 2 | 3 | Ray, in its default configuration, lacks robust security and authentication measures. 4 | Its deployment on Kubernetes provides some level of protection by leveraging RBAC for access control of RayCluster resources.
5 | However, once provisioned, a RayCluster remains vulnerable to unauthorized access from anyone with network connectivity to the Ray head node. 6 | 7 | This proposal introduces a Kubernetes aware sidecar proxy for authenticating external access to the Ray head. 8 | It will leverage the existing Kubernetes authentication system to grant tokens used to securely access Ray head endpoints. 9 | This simplifies Ray authentication by centralizing management of tokens and users within Kubernetes. 10 | 11 | ## Authentication Scope 12 | 13 | While there is a large surface area of a Ray cluster that could benefit from authenticated access, this proposal focuses specifically on securing Ray head endpoints that are frequently accessed externally. 14 | This includes enforcing authentication for both the dashboard server and the interactive client server. Securing access from internal clients like the Raylet to the GCS server is not addressed in this proposal 15 | as network policies are typically sufficient to protect this communication. 16 | 17 | ## Sidecar Authentication Proxy 18 | 19 | To enforce authenticated access to specific Ray head ports, the KubeRay operator will deploy a sidecar container alongside the Ray head pod. 20 | This sidecar will function as a reverse proxy, validating authentication tokens for requests to the dashboard port (8265) and the interactive client port (10001). 21 | Traffic to other ports will pass through the sidecar proxy without requiring authentication. 22 | 23 | The KubeRay operator will be changed in the following ways: 24 | 1. The Ray head container will bind to localhost only 25 | 2. A reverse proxy sidecar will run alongside the Ray head container 26 | 3. A ServiceAccount is created per RayCluster resource 27 | 28 | The sidecar ensures only requests with the correct authorization tokens can access the Ray head. More details on how tokens are authenticated below. 29 | For the MVP, the sidecar will be implemented using Envoy configured with an [External Authorizer](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_authz_filter). 30 | 31 | ## Access Control with TokenReview API 32 | 33 | The sidecar authenticator will use the TokenReview API to authenticate tokens passed into the authorization headers of each request. 34 | Tokens can be created from Kubernetes ServiceAccounts or an external identity provider as long as the Kubernetes API Server is configured with an external authorizer. 35 | Tokens from Kubernetes Service Accounts are required to specify a unique audience for the cluster. For now the placeholder audience is "ray.io/cluster/". 36 | Each RayCluster will be provisioned a default ServiceAccount that KubeRay Operator will use when authenticating to the Ray head (specifically for RayJob and RayService). 37 | Users can use the default ServiceAccount in the absence of external identity providers, but using external identity providers is strongly recommended. 
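For illustration only, the token check the sidecar performs boils down to something like the following sketch, written here with the official Kubernetes Python client (the actual sidecar is the Envoy external authorizer described above, and the function name is hypothetical):

```python
from kubernetes import client, config

def authenticate_request_token(token: str, cluster_audience: str) -> bool:
    """Validate a bearer token against the Kubernetes TokenReview API."""
    config.load_incluster_config()  # the proxy runs inside the Ray head pod
    review = client.V1TokenReview(
        spec=client.V1TokenReviewSpec(token=token, audiences=[cluster_audience])
    )
    result = client.AuthenticationV1Api().create_token_review(review)
    status = result.status
    # Accept only tokens that authenticated and were minted for this
    # cluster's audience (the "ray.io/cluster/..." placeholder above).
    return bool(status.authenticated) and cluster_audience in (status.audiences or [])
```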
38 | 39 | Example of a TokenReview request that contains a token: 40 | ``` 41 | apiVersion: authentication.k8s.io/v1 42 | kind: TokenReview 43 | spec: 44 | token: 45 | audiences: 46 | - "ray.io/cluster/" 47 | ``` 48 | 49 | Example of a TokenReview response with user information about the token: 50 | ``` 51 | apiVersion: authentication.k8s.io/v1 52 | kind: TokenReview 53 | status: 54 | authenticated: true 55 | user: 56 | username: system:serviceaccount:default:ray-cluster 57 | uid: 58 | groups: ["system:serviceaccounts", "system:serviceaccounts:default"] 59 | audiences: 60 | - "ray.io/cluster/" 61 | ``` 62 | 63 | ## User to Ray Head Authentication 64 | 65 | When submitting jobs using the Ray job CLI: 66 | ``` 67 | export RAY_JOB_HEADERS="Authorization: Bearer $(kubectl create token ray-cluster)" 68 | OR 69 | export RAY_JOB_HEADERS="Authorization: Bearer $(gcloud auth print-access-token)" # example external identity provider 70 | 71 | ray job submit... 72 | ``` 73 | 74 | When using the Ray interactive client: 75 | ``` 76 | export RAY_CLUSTER_TOKEN=$(kubectl create token ray-cluster) 77 | OR 78 | export RAY_CLUSTER_TOKEN=$(gcloud auth print-access-token) # example external identity provider 79 | ------------------------------------------------------------ 80 | 81 | import os 82 | import ray 83 | 84 | def get_metadata(): 85 | """ 86 | Get authentication header metadata for ray.util.connect 87 | """ 88 | headers = {"Authorization": f"Bearer {os.environ['RAY_CLUSTER_TOKEN']}"} 89 | return [(key.lower(), value) for key, value in headers.items()] 90 | 91 | 92 | ray.init( 93 | "ray-cluster:443", 94 | metadata=get_metadata(), 95 | secure=True, 96 | ) 97 | ``` 98 | 99 | ## RayCluster API Changes 100 | 101 | Enabling authentication should be optional per RayCluster. Two new fields `enableAuthentication` and `authenticationOptions` will be added to the RayCluster spec to enable this capability and configure allowed principals. 102 | If no allowed principals are specified, it will default to a ServiceAccount with the same namespace and name as the RayCluster. 103 | 104 | ``` 105 | apiVersion: ray.io/v1 106 | kind: RayCluster 107 | metadata: 108 | ... 109 | spec: 110 | enableAuthentication: true 111 | authenticationOptions: 112 | allowedPrincipals: 113 | - my-user@example.com # example user 114 | - system:serviceaccount:default:ray-cluster # example service account 115 | - system:authenticated # example group 116 | ``` 117 | 118 | ## Dynamic Access Control with Kubernetes RBAC and the SubjectAccessReview API 119 | 120 | Beyond token authentication via the TokenReview API, Kubernetes Role-Based Access Control (RBAC) provides a dynamic mechanism to manage which principals (users or service accounts) 121 | have access to Ray clusters. This can be achieved by introducing a custom verb and leveraging the SubjectAccessReview API. 122 | 123 | A custom verb `admin` can be used in Roles that reference the `rayclusters` resource: 124 | ``` 125 | apiVersion: rbac.authorization.k8s.io/v1 126 | kind: Role 127 | metadata: 128 | name: ray-admins 129 | namespace: my-team 130 | rules: 131 | - apiGroups: ["ray.io"] 132 | resources: ["rayclusters"] 133 | verbs: ["admin"] 134 | ``` 135 | 136 | Role bindings to this role can grant users access to the Ray cluster. The auth proxy can use the `SubjectAccessReview` API to verify if 137 | the authenticated user also has admin access to the targeted Ray cluster.
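A sketch of that check in the same style (Kubernetes Python client used for illustration; the function name is hypothetical):

```python
from kubernetes import client

def is_ray_cluster_admin(username: str, cluster_name: str, namespace: str) -> bool:
    """Ask the API server whether `username` has the custom `admin` verb
    on the given RayCluster resource."""
    review = client.V1SubjectAccessReview(
        spec=client.V1SubjectAccessReviewSpec(
            user=username,
            resource_attributes=client.V1ResourceAttributes(
                verb="admin",
                group="ray.io",
                resource="rayclusters",
                name=cluster_name,
                namespace=namespace,
            ),
        )
    )
    result = client.AuthorizationV1Api().create_subject_access_review(review)
    return bool(result.status.allowed)
```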
138 | 139 | Below is an example `SubjectAccessReview` request that would be sent from the auth proxy: 140 | ``` 141 | apiVersion: authorization.k8s.io/v1 142 | kind: SubjectAccessReview 143 | spec: 144 | user: userA 145 | resourceAttributes: 146 | verb: admin 147 | group: ray.io 148 | resource: rayclusters 149 | name: ray-cluster 150 | namespace: my-team 151 | ``` 152 | 153 | If the authenticated user has access to the `admin` verb for the Ray cluster resource, the `SubjectAccessReview` request returns an `allowed` status 154 | and the auth proxy grants access to the authenticated user. 155 | 156 | ## Risks 157 | 158 | Enabling the authentication proxy puts the Kubernetes control plane in the critical path for accessing the Ray head. This is a trade-off users should consider when enabling this capability. 159 | To mitigate this, we may consider implementing an in-memory cache of access control rules in the sidecar proxy so that not every request to the sidecar requires a TokenReview or SubjectAccessReview call. 160 | -------------------------------------------------------------------------------- /reps/2023-07-08-air-surface-syntax.md: -------------------------------------------------------------------------------- 1 | # Refining the Ray AIR Surface API 2 | 3 | ## Summary 4 | 5 | Disband the `ray.air` namespace and get rid of Ray AIR sessions. 6 | 7 | ### General Motivation 8 | 9 | Ray AIR has made it significantly easier to use Ray's scalable machine learning 10 | libraries 11 | - Ray Data for batch inference and last mile data processing and ingestion, 12 | - Ray Train for machine learning training and 13 | - Ray Serve for model and application serving 14 | 15 | together. 16 | 17 | One piece of feedback we have frequently received from users is that they are confused about how Ray AIR 18 | relates to the individual libraries. In particular: 19 | - When should I use AIR's abstractions (e.g. should I use `BatchPredictor` or use Ray Data's map functionality, 20 | should I use `PredictorDeployment` or deploy my model with Ray Serve directly?) and 21 | - How does the `ray.air` namespace relate to `ray.data`, `ray.train` and `ray.serve`? 22 | 23 | The fact that the `ray.air` namespace contains both low-level common utilities and higher-level 24 | abstractions adds to this confusion. We have also learned that the higher-level abstractions we 25 | originally introduced for Ray AIR have become unnecessary and the same functionality can nicely be achieved 26 | with the libraries themselves by making the libraries a little more interoperable. 27 | 28 | We have already implemented this strategy by replacing `BatchPredictor` with Ray Data native functionality 29 | (see https://github.com/ray-project/enhancements/blob/main/reps/2023-03-10-batch-inference.md and 30 | https://docs.ray.io/en/master/data/batch_inference.html) and by 31 | improving Ray Train's ingestion APIs 32 | (https://github.com/ray-project/enhancements/blob/main/reps/2023-03-15-train-api.md and 33 | https://docs.ray.io/en/master/ray-air/check-ingest.html). 34 | 35 | As a result of these changes, the `ray.air` namespace has become less and less relevant, and in this 36 | REP we propose to go all the way and remove it altogether in line with the Zen of Python 37 | ``` 38 | There should be one -- and preferably only one -- obvious way to do it.
39 | ```
40 | This resolves the confusion mentioned above and makes the Ray AIR APIs more coherent and focused around
41 | the critical workloads (`ray.data` for batch inference, `ray.train` for training and `ray.serve` for serving).
42 | 
43 | ### Should this change be within `ray` or outside?
44 | 
45 | main `ray` project. Changes are made to Ray Train, Tune and AIR.
46 | 
47 | ## Stewardship
48 | 
49 | ### Required Reviewers
50 | The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.
51 | 
52 | @matthewdeng, @krfricke
53 | 
54 | ### Shepherd of the Proposal (should be a senior committer)
55 | To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
56 | 
57 | @ericl
58 | 
59 | ## Details of the API changes
60 | 
61 | Concretely, we replace the Ray AIR session with a training context to
62 | 1. avoid the user confusion about what a `session` is (and avoid having to explain it in the documentation) and
63 | 2. bring the API in line with other Ray APIs like `get_runtime_context` as well as Ray Data's `DataContext`.
64 | 
65 | The API changes are
66 | ```
67 | from ray import air, train
68 | 
69 | # Ray Train methods and classes:
70 | 
71 | air.session.report -> train.report
72 | air.session.get_dataset_shard -> train.get_dataset_shard
73 | air.session.get_checkpoint -> train.get_checkpoint
74 | air.Checkpoint -> train.Checkpoint
75 | air.Result -> train.Result
76 | 
77 | # Ray Train configurations:
78 | 
79 | air.config.CheckpointConfig -> train.CheckpointConfig
80 | air.config.FailureConfig -> train.FailureConfig
81 | air.config.RunConfig -> train.RunConfig
82 | air.config.ScalingConfig -> train.ScalingConfig
83 | 
84 | # Ray TrainContext methods:
85 | 
86 | air.session.get_experiment_name -> train.get_context().get_experiment_name
87 | air.session.get_trial_name -> train.get_context().get_trial_name
88 | air.session.get_trial_id -> train.get_context().get_trial_id
89 | air.session.get_trial_resources -> train.get_context().get_trial_resources
90 | air.session.get_trial_dir -> train.get_context().get_trial_dir
91 | air.session.get_world_size -> train.get_context().get_world_size
92 | air.session.get_world_rank -> train.get_context().get_world_rank
93 | air.session.get_local_rank -> train.get_context().get_local_rank
94 | air.session.get_local_world_size -> train.get_context().get_local_world_size
95 | air.session.get_node_rank -> train.get_context().get_node_rank
96 | 
97 | del air
98 | ```
99 | 
100 | These changes are ready to try out with https://github.com/ray-project/ray/pull/36706 and we encourage user feedback on the changes.
101 | 
102 | ## Open Questions
103 | 
104 | We are likely going to remove `PredictorWrapper` and `PredictorDeployment` and migrate the examples to use Ray Serve deployments
105 | directly, and we are also likely going to move `air.integrations` to `train.integrations`.
106 | 
107 | For the `PredictorDeployment` removal, the user code will change from
108 | ```python
109 | from ray import serve
110 | from ray.serve import PredictorDeployment
111 | from ray.serve.http_adapters import pandas_read_json
112 | from ray.train.xgboost import XGBoostPredictor
113 | 
114 | # checkpoint = ...
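# (e.g., a checkpoint previously produced by an XGBoost training run, such as trainer.fit().checkpoint)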
115 | 
116 | serve.run(
117 |     PredictorDeployment.options(name="XGBoostService").bind(
118 |         XGBoostPredictor, checkpoint, http_adapter=pandas_read_json
119 |     )
120 | )
121 | ```
122 | to
123 | ```python
124 | import pandas as pd
125 | from starlette.requests import Request
126 | from ray import serve
127 | from ray.train.xgboost import XGBoostTrainer
128 | 
129 | # checkpoint = ...
130 | 
131 | @serve.deployment
132 | class XGBoostService:
133 |     def __init__(self, checkpoint):
134 |         self.model = XGBoostTrainer.get_model(checkpoint)
135 | 
136 |     async def __call__(self, http_request: Request):
137 |         body = await http_request.body()
138 |         data = pd.read_json(body.decode(), **http_request.query_params)
139 |         return self.model.predict(data)
140 | 
141 | serve.run(XGBoostService.bind(checkpoint))
142 | ```
143 | 
144 | This is almost as simple but a lot more explicit: it removes the magic and can
145 | be easily adapted to different settings. Furthermore, it is more consistent with
146 | the Ray Serve documentation and the way Ray Serve is typically used.
147 | 
148 | ## Internal changes
149 | 
150 | As part of this effort, we also recommend completely
151 | removing the `air` namespace for internal use (just to make things clearer
152 | for developers). This work does not need to be tied to a specific release,
153 | and here is an idea of where things could go:
154 | 
155 | - `air.examples` -- don't have the examples in the source tree, instead put
156 |   them into the `ray/doc` folder
157 | - `air.execution` -- due to the layering of Tune depending on Train but not
158 |   the other way around, most likely `train._internal` is the right place for these.
159 | - `air.util` -- the tensor extension functionality should go into `ray.data`,
160 |   the torch related functions into `ray.train.torch`.
161 | 
162 | If there are any other common internal utilities that are unaccounted for,
163 | most likely `train._internal` is a good place to put them.
164 | 
165 | ## Migration Plan
166 | 
167 | We acknowledge that these kinds of API changes are very taxing on our users, and we have paid special attention to ensuring that the migration can be done
168 | easily as a simple text substitution, without large changes to existing code bases. To enable a smooth migration, both APIs will
169 | work for the Ray 2.7 release.
170 | 
171 | Examples and documentation will be fully converted by Ray 2.7, and the old versions of the APIs will print deprecation warnings together
172 | with instructions on how the user code needs to be upgraded.
173 | 
--------------------------------------------------------------------------------
/reps/2023-03-15-train-api.md:
--------------------------------------------------------------------------------
1 | # Simplifying Ray Train Ingest APIs
2 | 
3 | ## Summary
4 | Deprecate the preprocessor and object store memory arguments for `ray.train.Trainer`.
5 | 
6 | ### General Motivation
7 | 
8 | This doc proposes to simplify Train's DatasetConfig as we move to the new streaming backend by default for Datasets. As noted in https://github.com/ray-project/enhancements/pull/25, Ray Datasets will have both lazy and streaming execution by default in Ray 2.4. Furthermore, `DatasetPipeline` will be deprecated in the future to consolidate functionality on the Dataset class.
9 | 
10 | With these changes, a few possibilities for simplification open up in the Train API:
11 | 1.
Decoupling Preprocessors from Trainers, so that data preprocessing is performed on the Dataset explicitly by the user prior to passing the Dataset into the Trainer.
12 | 2. Using the same resource limiting API in Train as in Datasets (i.e., `ExecutionResources`), instead of having a separate `max_object_store_memory_fraction` config.
13 | 
14 | Simplification is greatly desirable here, since users commonly find Dataset\<\>Trainer interactions difficult to understand and debug.
15 | 
16 | ### Should this change be within `ray` or outside?
17 | main `ray` project. Changes are made at the Ray Data and Ray AIR level.
18 | 
19 | ## Stewardship
20 | ### Required Reviewers
21 | The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.
22 | 
23 | @amogkam, @c21, @clarkzinzow, @jianoiax
24 | 
25 | ### Shepherd of the Proposal (should be a senior committer)
26 | To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
27 | 
28 | @stephanie-wang
29 | 
30 | ## Design and Architecture
31 | 
32 | ## API Changes
33 | 
34 | 1. Introduce a `resource_limits: ExecutionResources(object_store_memory=2 * GiB)` arg to `ray.air.DatasetConfig`. This enables streaming by default, with a limit of 2 GiB, and deprecates the previous `max_object_store_memory_fraction` argument.
35 | 
36 | 2. Introduce `Dataset.get_logical_plan()` (DeveloperAPI), which returns the logical plan that can be used to extract the lineage of preprocessors applied to this Dataset. If multiple preprocessors are applied, Train can return a `Chain` of the preprocessors. Non-preprocessor operations on the Dataset are ignored, and we can also allow ignoring preprocessors such as per-epoch preprocessors. This function will be used by Train to persist fitted preprocessors with checkpoints.
37 | 
38 | 3. Deprecate the following additional Trainer configs when streaming is enabled: `global_shuffle` and `randomize_block_order` (user to use native Dataset shuffle ops), and the preprocessing args `fit`, `transform`, `preprocessor`, and `per_epoch_preprocessor` (user to set up preprocessing explicitly prior to creating the Trainer).
39 | 
40 | ## Example:
41 | 
42 | ### Before
43 | ```python
44 | 
45 | train_ds = ray.data.read_parquet("s3://bucket/etl_output")
46 | fact_table = ray.data.read_csv("s3://bucket/my.csv")
47 | 
48 | # Create the preprocessor.
49 | prep = StandardScaler(["f1", "f2"])
50 | 
51 | # Create the per-epoch preprocessor.
52 | per_epoch_prep = RandomNoisePreprocessor()
53 | 
54 | # Trainer applies preprocessing internally via config.
55 | trainer = TorchTrainer(
56 |     model,
57 |     datasets={
58 |         "train_ds": train_ds,
59 |         "fact_table": fact_table,
60 |     },
61 |     scaling_config=ScalingConfig(num_workers=4),
62 |     preprocessor=prep,
63 |     dataset_config={
64 |         "train_ds": {
65 |             "max_object_store_memory_fraction": 0.2,  # Enable streaming.
66 |             "randomize_block_order": True,
67 |             "per_epoch_preprocessor": per_epoch_prep,
68 |         },
69 |     },
70 | )
71 | 
72 | # Checkpoint includes fitted preprocessor.
73 | best_checkpoint = trainer.fit().checkpoint
74 | assert best_checkpoint.get_preprocessor() == prep
75 | ```
76 | 
77 | ### After
78 | ```python
79 | 
80 | base = ray.data.read_parquet("s3://bucket/etl_output")
81 | fact_table = ray.data.read_csv("s3://bucket/my.csv")
82 | 
83 | # Fit the preprocessor.
84 | prep = StandardScaler(["f1", "f2"])
85 | prep.fit(base)
86 | 
87 | # Apply base preprocessing.
88 | train_ds = base.map_batches(prep)
89 | train_ds.cache()  # Optional: cache the base data in memory.
90 | 
91 | # Per-epoch preprocessing.
92 | per_epoch_prep = RandomNoisePreprocessor()
93 | per_epoch_prep.ignore_for_inference = True
94 | train_ds = train_ds \
95 |     .randomize_block_order() \
96 |     .map_batches(per_epoch_prep)
97 | 
98 | # Trainer doesn't know about preprocessing at all.
99 | trainer = TorchTrainer(
100 |     model,
101 |     datasets={
102 |         "train_ds": train_ds,
103 |         "fact_table": fact_table,
104 |     },
105 |     scaling_config=ScalingConfig(num_workers=4),
106 |     dataset_config={
107 |         "train_ds": {
108 |             "resource_limits": ExecutionResources(
109 |                 object_store_memory=20e9,  # Customized streaming memory limit.
110 |             ),
111 |         },
112 |     },
113 | )
114 | 
115 | # Checkpoint includes fitted preprocessor.
116 | best_checkpoint = trainer.fit().checkpoint
117 | assert best_checkpoint.get_preprocessor() == prep
118 | ```
119 | 
120 | While the "after" code is longer, note that all the data processing code is now cleanly separated from the Trainer, which is both a conceptual and practical simplification. In addition, having the fitted preprocessor computed early enables the user code to reference it (e.g., to get computed categories, etc.).
121 | 
122 | ## FAQ
123 | 
124 | - Q: What if I wanted to change per-trial datasets / prep with Tune?
125 | - A: You could prepare multiple datasets lazily on the driver.
126 | 
127 | - Q: Are we deprecating the preprocessor arg for Train entirely?
128 | - A: Yes.
129 | 
130 | - Q: Will we still save the preprocessor in the Checkpoint?
131 | - A: Yes, this doesn't change.
132 | 
133 | - Q: Should we have both `Preprocessor.transform` and `Dataset.map_batches`?
134 | - A: We will deprecate the former.
135 | 
136 | - Q: What happens if you apply multiple preprocessors to a Dataset?
137 | - A: The checkpoint will have the full chain, including per-epoch ones. Preprocessors can be tagged as used for training only / ignored during inference by setting an `ignore_for_inference` (constructor) attribute.
138 | 
139 | - Q: What happens if you apply ordinary functions to the Dataset?
140 | - A: You'll get a warning that these functions are not captured in the preprocessing chain, and to use BatchMapper if you want that.
141 | 
142 | - Q: Why not require the user to do all Data operations outside of Train, including the split?
143 | - A: This would break tuning, as Train needs to create a separate Data stream per trial. This is not possible post-split, as calling split is a consumption operation.
144 | 
145 | ## Compatibility, Deprecation, and Migration Plan
146 | An important part of the proposal is to explicitly point out any compatibility implications of the proposed change. If there are any, we should thoroughly discuss a plan to deprecate existing APIs and migrate to the new one(s).
147 | 
148 | Ray 2.4: Lay the groundwork for these new APIs
149 | - Streaming on by default in Datasets only (not Train).
150 | - API changes from the related inference REP https://github.com/ray-project/enhancements/pull/25
151 | 
152 | Ray 2.5: Onboard new users onto new APIs
153 | - Introduce the API changes proposed above, and enable streaming by default in Train.
154 | - Deprecated APIs will be inaccessible in streaming mode for Train.
155 | - Legacy APIs will be fully supported in non-streaming mode for Train.
156 | - Rewrite docs and examples to use new APIs.
157 | 
158 | Ray 2.6/2.7: Deprecate old APIs
159 | - Full feature parity with global / windowed shuffles using new streaming data APIs.
160 | - Fully deprecate DatasetPipeline / legacy Train APIs.
161 | 
162 | 
163 | ## Test Plan and Acceptance Criteria
164 | The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria, including documentation and examples.
165 | 
166 | - Unit and integration tests for new APIs.
167 | - Documentation and examples on the new APIs.
168 | 
--------------------------------------------------------------------------------
/reps/2022-12-08-ray-for-federated-learning-and-privacy-preserving-computing.md:
--------------------------------------------------------------------------------
1 | 
2 | ## Summary - Ray for federated learning and privacy-preserving computing
3 | ### General Motivation
4 | Federated machine learning and privacy-preserving computing are becoming more widespread. They are inherently distributed systems because they always span multiple parties (companies or organizations). To extend the Ray ecosystem to the federated learning and privacy-preserving computing domain, we should make it easy for users to build their own federated learning and privacy-preserving computing applications on Ray, without any concerns about data privacy, task attacks, illegal intrusion, etc.
5 | 
6 | PS: I also found another related issue: https://github.com/ray-project/ray/issues/25846
7 | 
8 | In this proposal, we'd like to build a connector layer on Ray that lets users easily build this kind of system, with the security issues fully addressed.
9 | 
10 | ### Key requirements:
11 | - Provide the ability for users to easily build this kind of application on Ray.
12 | - Set up different Ray clusters for the different parties, to avoid uncontrollable, complex Ray protocols.
13 |   - We have tried setting up a single cross-party Ray cluster but failed for security reasons. It is difficult for parties to detect and prevent attacks, e.g. a malicious attacker (it could even be one of the parties) running destructive code. The complex Ray protocols make it very difficult to enhance security within a single Ray cluster.
14 | - Have a unified, globally-viewed programming model across the different parties.
15 | - Data transmission across parties should be in push mode instead of pull mode. It is hard to prevent a malicious attacker from stealing data in pull mode. Push mode is much better, since it is the sending party's responsibility to decide whether to send data to others.
16 | - Tasks should be driven in multi-controller mode. That means tasks should be driven from within their own party rather than by other parties.
17 | 
18 | ### Should this change be within ray or outside?
19 | No, there is no need to change the ray repo. The steps should be:
20 | 1. Add a new repo `RayFed` under `ray-project`: `ray-project/RayFed`
21 | 2. The package name is `rayfed`: `pip install rayfed`
22 | 3.
The import statement is `import fed`
23 | 
24 | ## Stewardship
25 | ### Required Reviewers
26 | @jovany-wang
27 | ### Shepherd of the Proposal (should be a senior committer)
28 | @jovany-wang
29 | 
30 | ## Design and Architecture
31 | The key words in this design are `multi-controller mode` and `restricted data perimeters`.
32 | 
33 | ### User Examples
34 | #### A Simple Example
35 | ```python
36 | import numpy as np
37 | import fed
38 | 
39 | @fed.remote
40 | class My:
41 |     def __init__(self):
42 |         self._val = 0
43 |     def incr(self, val):
44 |         self._val += val
45 |         return self._val
46 | 
47 | @fed.remote(party="BOB")
48 | def mean(val1, val2):
49 |     return np.mean([val1, val2])
50 | 
51 | fed.init()
52 | 
53 | my1 = My.party("ALICE").remote()
54 | my2 = My.party("BOB").remote()
55 | val1 = my1.incr.remote(10)
56 | val2 = my2.incr.remote(20)
57 | 
58 | result = mean.remote(val1, val2)
59 | fed.get(result)
60 | ```
61 | fed.shutdown()
62 | In this very simple example, we create one actor for ALICE and one for BOB, and then do an aggregation in BOB. Note that the two actors are not in the same Ray cluster.
63 | 
64 | #### A Federated Learning Example - Diagram
65 | ![image](https://user-images.githubusercontent.com/19473861/206469265-e104888e-50c8-4313-aa18-f2caf0928ef1.png)
66 | 
67 | ALICE and BOB each run their local training loop and synchronize the weights every once in a while. With this proposal, it is easy to author this code on top of Ray. 👇
68 | 
69 | #### A Federated Learning Example - Code
70 | ```python
71 | import numpy as np
72 | import tensorflow as tf
73 | from tensorflow import keras
74 | import fed
75 | 
76 | 
77 | # Two parties run local training and periodically average their weights.
78 | @fed.remote
79 | class My:
80 |     def __init__(self):
81 |         self._model = keras.Sequential()
82 |         self._model.compile(optimizer=keras.optimizers.Adam(), loss="categorical_crossentropy")
83 | 
84 |     def load_data(self, batch_size: int, epochs: int):
85 |         self._dataset = _load_data_from_local()  # user-defined batch iterator
86 | 
87 |     def train(self):
88 |         x, y = next(self._dataset)
89 |         with tf.GradientTape() as tape:
90 |             y_pred = self._model(x, training=True)
91 |             loss = self._model.compiled_loss(y, y_pred)
92 |         self._model.optimizer.apply_gradients(zip(
93 |             tape.gradient(loss, self._model.trainable_variables),
94 |             self._model.trainable_variables))
95 | 
96 |     def get_weights(self):
97 |         return self._model.get_weights()
98 | 
99 |     def update_weights(self, weights):
100 |         self._model.set_weights(weights)
101 |         return True
102 | 
103 | 
104 | @fed.remote
105 | def mean(x, y):
106 |     return np.mean([x, y], axis=0)
107 | 
108 | 
109 | def main():
110 |     fed.init()
111 |     epochs, batch_size = 100, 4
112 | 
113 |     my_alice = My.party("alice").remote()
114 |     my_bob = My.party("bob").remote()
115 | 
116 |     my_alice.load_data.remote(batch_size, epochs)
117 |     my_bob.load_data.remote(batch_size, epochs)
118 | 
119 |     num_batches = int(150 / batch_size)
120 |     for epoch in range(epochs):
121 |         # Local training loop inside each party.
122 |         for step in range(num_batches):
123 |             my_alice.train.remote()
124 |             my_bob.train.remote()
125 | 
126 |         # Do weights aggregation and updating.
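        # `w_b` below is produced in BOB's cluster but consumed by a task
        # pinned to ALICE; as described in the mechanism section below, a
        # send_op / recv_op barrier pair is inserted to push it across.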
127 |         w_a = my_alice.get_weights.remote()
128 |         w_b = my_bob.get_weights.remote()
129 |         w_mean = mean.party('alice').remote(w_a, w_b)
130 |         result = fed.get(w_mean)
131 |         print(f'Epoch {epoch} is finished, mean is {result}')
132 |         my_alice.update_weights.remote(w_mean)
133 |         my_bob.update_weights.remote(w_mean)
134 | 
135 |     fed.shutdown()
136 | ```
137 | 
138 | ### Understanding the mechanism
139 | In this section, we take a deep dive into the federated learning example above, to help understand the mechanism of this connector layer.
140 | 
141 | First, the ALICE and BOB companies need to create their own Ray clusters and expose one communication port to each other.
142 | Then, both ALICE and BOB need to run the example code in their own Ray clusters (driving the DAG in multi-controller mode). Note that the peer port and party info should be passed via `fed.init()`.
143 | ![image](https://user-images.githubusercontent.com/19473861/206469580-468ea258-407e-40cf-b059-06a63a25c168.png)
144 | 
145 | Every Ray cluster creates only the Ray tasks assigned to its own party; meanwhile, it sends output data to the peer if a downstream Ray task is assigned to another party. For example, `mean.party('alice').remote(w_a, w_b)` creates a Ray task in the ALICE cluster, but not in the BOB cluster. Because `w_b` is output data from the BOB cluster, a `recv_op` barrier is inserted before the `mean` task in the ALICE cluster, and a `send_op` barrier is inserted in the BOB cluster as a downstream task of the `w_b = my_bob.get_weights.remote()` task.
146 | 
147 | ### Benefits
148 | Compared with running the DAG in one Ray cluster, we get the following significant benefits:
149 | - It is very close to the native Ray programming pattern, but the DAG can run across Ray clusters.
150 | - Very restricted and clear data perimeters: we only transmit the data that is needed by others, and none of the data can be fetched in any way we don't allow.
151 | - If we ran the DAG in one Ray cluster, data transmission would be PULL-BASED. For example, if BOB invokes `ray.get(f.options("PARTY_ALICE").remote())`, the return object is pulled by BOB; as a result, ALICE has no knowledge of that transfer. In this proposal, transmission is PUSH-BASED: ALICE knows that a data object will be sent to BOB. This is a significant advantage of multi-controller mode.
152 | - All the disadvantages are addressed in this proposal: **code distribution**, **data privacy**, **task attacks**, **illegal intrusion**, **deserialization vulnerabilities**, etc.
153 | - It brings all of Ray's advantages to the local Ray cluster, like high-performance RPC, fault tolerance, task scheduling/resource management, and other Ray ecosystem libraries (Ray Datasets, Ray Train and Ray Serve) for local training.
154 | 
155 | 
156 | 
157 | ## Compatibility, Deprecation, and Migration Plan
158 | None.
159 | 
160 | ## Test Plan and Acceptance Criteria
161 | - Tests should include unit tests and integration tests (release tests).
162 | - Tests should be enabled in CI.
163 | - Benchmark cross-party data transmission for typical federated learning cases.
164 | - Document pages and necessary code comments.
165 | 
166 | ## (Optional) Follow-on Work
167 | - UX improvements for cross-party exception passing.
168 | - Provide a single-controller mode for quickly building applications and debugging within a single party.
169 | - Performance optimization for other workloads.
170 | - A unified gateway for ETH communication.
171 | - Support TEE (Trusted Execution Environment) devices.
--------------------------------------------------------------------------------
/reps/2025-11-21-ray-history-server/2025-11-21-ray-history-server.md:
--------------------------------------------------------------------------------
1 | ## Ray History Server
2 | 
3 | ### General Motivation
4 | 
5 | It is becoming increasingly common for Ray users to treat Ray clusters as ephemeral units of compute
6 | that only run for the duration of a single (or multiple) Ray jobs. This pattern results in significant improvements in
7 | cost efficiency and resource sharing, especially in capacity-constrained environments where hardware accelerators are scarce.
8 | 
9 | However, a fundamental trade-off of this approach, compared to long-lived interactive Ray clusters, is that users
10 | lose access to the Ray Dashboard, which is often treated as the entry point for most observability signals
11 | in a Ray cluster. While it’s possible to export the relevant signals to external sources, users prefer the experience
12 | of using the Ray Dashboard as a single source of truth to debug job failures.
13 | 
14 | This enhancement proposal introduces the concept of a Ray History Server, which can orchestrate the reconstruction of the
15 | Ray Dashboard even for terminated Ray clusters. This will be accomplished by leveraging Ray’s Event Export API to persist
16 | task/actor/job state, and native components in KubeRay for pushing logs and events to a blob store (GCS, S3, etc.).
17 | 
18 | ### Should this change be within `ray` or outside?
19 | 
20 | Some components/libraries, such as the event exporter, will be in Ray. Everything else will be hosted in the KubeRay project.
21 | 
22 | ## Stewardship
23 | 
24 | ### Owners
25 | 
26 | - @KunWuLuan (Alibaba)
27 | - @MengjinYan, @Future-Outlier, @rueian (Anyscale)
28 | - @andrewsykim, @Chia-ya (Google)
29 | 
30 | ### Shepherd of the Proposal (should be a senior committer)
31 | 
32 | @edoakes
33 | 
34 | ## Design and Architecture
35 | 
36 | ### Components and Libraries
37 | 
38 | The Ray History Server project will introduce the following components and libraries across Ray and KubeRay:
39 | 
40 | * Ray:
41 |   * Updated Ray Dashboard frontend that can dynamically adjust request paths to fetch task/actor/job state from a history server.
42 |   * Ray Event Export API, available starting in Ray 2.49, which supports task/actor/job/node events.
43 |     The events can be used to generate the state of the Ray cluster at any given timestamp.
44 | * KubeRay:
45 |   * An events collector sidecar that receives push events from Ray (via `RAY_enable_core_worker_ray_event_to_aggregator`)
46 |     and persists events to the blob store.
47 |   * A logging collector sidecar that uploads Ray logs to the blob store.
48 |   * A history server (standalone Deployment) that can process events for historical Ray clusters and
49 |     serve Ray API endpoints requested by Ray Dashboards.
50 |   * A storage reader/writer library providing a pluggable interface for different storage implementations.
51 | 
52 | ![Ray History Server Architecture](history_server_architecture.png)
53 | 
54 | #### Events Collector
55 | 
56 | The Events Collector is a sidecar container deployed alongside every Ray node. It operates as an
57 | HTTP server ingesting events from Ray’s Event Export API (enabled via `RAY_enable_core_worker_ray_event_to_aggregator`).
58 | The server exposes a single `POST /events` endpoint, which receives event data as JSON objects.
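As a rough illustration of this contract, a minimal collector endpoint could look like the sketch below. This is not the actual KubeRay implementation; the port, file path, and JSON-lines layout are assumptions, with a local file standing in for the blob store writer.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

EVENTS_FILE = "/tmp/collected_events.jsonl"  # assumption: stand-in for the blob store


class EventsHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/events":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        json.loads(payload)  # validate that the body is well-formed JSON
        # Persist the raw events as-is: no pre-processing, no deduplication.
        with open(EVENTS_FILE, "ab") as f:
            f.write(payload + b"\n")
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("localhost", 8080), EventsHandler).serve_forever()
```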
59 | Ray is configured to push these events to a localhost endpoint (configured with `RAY_DASHBOARD_AGGREGATOR_AGENT_EVENTS_EXPORT_ADDR`).
60 | The Events Collector is strictly responsible for persisting raw events to blob storage; it does not perform any pre-processing or deduplication.
61 | 
62 | #### Logging Collector
63 | 
64 | The Logging Collector is responsible for persisting the logs in `/tmp/ray/session_latest/logs` to the blob store.
65 | While the Ray cluster is active, the Logging Collector will periodically upload snapshots of the logs to storage.
66 | Upon receiving a termination signal, it will attempt to upload a final snapshot of the logs before exiting.
67 | 
68 | #### History Server
69 | 
70 | The History Server component is a stateless Kubernetes Deployment that serves API endpoints that are compatible with the Ray Dashboard.
71 | To serve endpoints like `/api/v0/tasks`, the History Server will be responsible for server-side processing of the events
72 | in the blob store that were uploaded by the Events Collector. In the alpha / beta milestones, the history server will store
73 | final task / actor / job states in memory. For GA, we may reconsider this approach if we identify scalability limitations.
74 | More details on event processing below.
75 | 
76 | #### Event Processor
77 | 
78 | The Event Processor runs as a process within the History Server container. It is responsible for downloading the
79 | complete event history of terminated clusters and aggregating that data into final states. These processed states
80 | are then used to serve API requests from Ray Dashboard clients.
81 | 
82 | ### File structure for persisted events & logs
83 | 
84 | Users rarely filter events by node name; instead, they typically filter by job ID and time range.
85 | Therefore, building an index based on job ID and timestamps is critical. Unlike the Spark History Server,
86 | Ray events are emitted by an aggregation agent residing on each node; therefore, the collector on each specific node
87 | is responsible for grouping the events.
88 | 
89 | All events will initially be partitioned by Job ID. Specifically, task events associated with the same Job ID will be stored in the same directory.
90 | * Node-level events will be stored in: cluster_name_cluster_uid/session_id/node_events/-