├── StructuredConcurrency_structured.jpeg ├── StructuredConcurrency_not_structured.jpeg ├── README.md ├── LICENSE.md ├── RTLIB.md ├── Logging.md ├── Find.md ├── StructuredConcurrency.md ├── GcExtensions.md └── Pkg3.md /StructuredConcurrency_structured.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JuliaLang/Juleps/HEAD/StructuredConcurrency_structured.jpeg -------------------------------------------------------------------------------- /StructuredConcurrency_not_structured.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JuliaLang/Juleps/HEAD/StructuredConcurrency_not_structured.jpeg -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Juleps: Julia Enhancement Proposals 2 | 3 | This repository contains proposals to enhance the Julia language and ecosystem. 4 | It contains the following "Juleps" (Julia Enhancement Proposals): 5 | 6 | - [Pkg3](Pkg3.md) – the next generation of Julia package management 7 | - [RTLIB](RTLIB.md) – a runtime-library for Julia. 
8 | - [Find](Find.md) - Reorganize search and find API 9 | - [Logging](Logging.md) – A general logging interface 10 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Julia enhancement proposals are licensed under the MIT License: 2 | 3 | > Copyright (c) 2016: [contributors](https://github.com/JuliaLang/Juleps/contributors) 4 | > 5 | > Permission is hereby granted, free of charge, to any person obtaining 6 | > a copy of this software and associated documentation files (the 7 | > "Software"), to deal in the Software without restriction, including 8 | > without limitation the rights to use, copy, modify, merge, publish, 9 | > distribute, sublicense, and/or sell copies of the Software, and to 10 | > permit persons to whom the Software is furnished to do so, subject to 11 | > the following conditions: 12 | > 13 | > The above copyright notice and this permission notice shall be 14 | > included in all copies or substantial portions of the Software. 15 | > 16 | > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | > EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | > MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 19 | > NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE 20 | > LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 21 | > OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION 22 | > WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
23 | -------------------------------------------------------------------------------- /RTLIB.md: -------------------------------------------------------------------------------- 1 | # JULEP RTLIB 2 | 3 | - **Title:** A runtime library for Julia 4 | - **Authors:** Valentin Churavy <> 5 | - **Created:** November 11, 2016 6 | - **Status:** work in progress 7 | 8 | ## Introduction 9 | 10 | Currently there are two implementations of intrinsics supported in Julia. One of the 11 | implementations is defined in `runtime_intrinsics.c` and the second one is defined in 12 | `intrinsics.cpp` on top of [LLVM intrinsics](http://llvm.org/docs/LangRef.html#id1190) 13 | and [LLVM instructions](http://llvm.org/docs/LangRef.html#instruction-reference). 14 | 15 | The first implementation specifies the semantics and behavior of the intrinsics and is 16 | used as a fallback. The LLVM based implementation is best understood as a pure performance 17 | optimization. 18 | 19 | When an LLVM intrinsic is used and the compiler can't generate hardware instructions 20 | for it, a library call to the runtime library (compiler-rt or libgcc) is emitted. 21 | As a result Julia code using LLVM as a codegen backend needs to link against a 22 | runtime library. In the exploratory work (see https://github.com/JuliaLang/julia/pull/18734) 23 | the LLVM compiler-rt library was chosen, due to its MIT license. 24 | 25 | A drawback of compiler-rt is that it is not fully portable. As an example 26 | `Float128` support is missing on 32bit platforms. 27 | 28 | ## Motivating problem 29 | 30 | How do we support `Float16` and `Float128` in a portable and performant manner? 31 | The current implementation of `Float16` support in Julia is eagerly resolving to 32 | promotion to `Float32` in order to implement most operations. This precludes optimization 33 | on platforms that natively support `Float16` (most prominently GPUs). 
The second 34 | problem is how are we going to support `Float128` across all platforms in a portable 35 | and still performant way. On 32bit systems we cannot rely on platform implementations. 36 | 37 | ## Goals 38 | 39 | Implement a runtime-library in a mix of C and Julia that contains Julia intrinsics 40 | and compiler-rt. This would allow us to use optimized implementations from compiler-rt, 41 | while having the flexibility of Julia to implement non-essential intrinsics. 42 | 43 | When Julia uses LLVM as a compiler backend it should use a lazy libcall scheme. 44 | When the compiler can't emit optimized code paths (e.g. LLVM instructions and intrinsics), 45 | it will emit calls to the intrinsics. Currently this is done already for 46 | better error reporting. The interpreter can eagerly resolve to the implementations 47 | in the runtime library, which will require `ccall` for the `C` based part of the 48 | runtime library. 49 | 50 | The runtime library will consist of two stages. Stage-1 is implemented in C and 51 | contains compiler-rt, while Stage-2 is implemented in Julia. Stage-1 is required 52 | for bootstrapping a minimal Julia, on which Stage-2 can be implemented. 53 | Stage-1 will also contain optimized (and platform dependent) implementations of 54 | the intrinsics, while Stage-2 will contain portable and general implementations. 55 | 56 | Another goal is to define what is the minimal set of intrinsics that Julia 57 | requires (Stage-1) and what is the extended set (Stage-2). A well defined set of 58 | intrinsics would also be beneficial for alternative compilers. 59 | 60 | ### Stage 1 61 | 62 | C-based implementation of the essential Julia intrinsics + compiler-rt. 
63 | - `jl_reinterpret` 64 | - `jl_pointerset` 65 | - `jl_pointerref` 66 | - Operations on Integers 67 | - Arithmetic 68 | - Comparisons 69 | - Conversion between integers 70 | - Operations on Floating Point (hardware based) 71 | - Arithmetic 72 | - Comparisons 73 | - Conversion to and from integers 74 | - `Float32` and `Float64` support 75 | 76 | ### Stage 2 77 | 78 | Julia based implementation for non-essential Julia intrinsics and implementations 79 | to supplement compiler-rt. This implementation will be based on a reduced base library 80 | (early stages of the sysimage) that is only allowed to use Stage-1 functionality. 81 | The proper sysimage will be based upon Stage-1 and Stage-2. 82 | The basic idea is https://github.com/JuliaLang/julia/pull/18927, 83 | which contains the initial port of compiler-rt to Julia. 84 | - Operations on Floating Point (software based) 85 | - Arithmetic 86 | - Comparisons 87 | - Conversion to and from integers 88 | - Necessary for `Float16` and `Float128` support 89 | - Scalar implementations for vectorized instructions 90 | 91 | #### Building Stage-2 92 | 1. Build Stage-1 and create a shared object file `rtlib-stage1.so` 93 | 2. Build inference.ji 94 | 3. Build Stage-2 with `--rtlib=rtlib-stage1.so` and `--sysimage inference.ji` 95 | 4. Take the object files from Stage-1 and Stage-2 and create `rtlib.so` containing 96 | both Stage-1 and Stage-2. 97 | 5. Build `sys.so` with `--rtlib=rtlib.so` and `--sysimage inference.ji` 98 | 99 | ## Testing and Benchmarking 100 | 101 | All implementations, but especially the runtime versions should be thoroughly 102 | tested for correctness and performance. As part of this Julep the testsuite needs 103 | to be extended to cover the current and future runtime intrinsics. 104 | 105 | ## Non-Goals 106 | 107 | - No support for atomics at this time. Julia will continue to use `libatomic`. 
108 | -------------------------------------------------------------------------------- /Logging.md: -------------------------------------------------------------------------------- 1 | # JULEP Logging 2 | 3 | - **Title:** A unified logging interface 4 | - **Author:** Chris Foster 5 | - **Created:** February 2017 6 | - **Status:** work in progress 7 | 8 | ## Abstract 9 | 10 | *Logging* is a tool for understanding program execution by recording the order and 11 | timing of a sequence of events. A *logging library* provides tools to define 12 | these events in the source code and capture the event stream when the program runs. 13 | The information captured from each event makes its way through the system as a 14 | *log record*. The ideal logging library should give developers and users insight 15 | into the running of their software by providing tools to filter, save and 16 | visualize these records. 17 | 18 | Julia has included simple logging in `Base` since version 0.1, but the tools to 19 | generate and capture events are still immature as of version 0.6. For example, 20 | log messages are unstructured, there's no systematic capture of log metadata, no 21 | debug logging, inflexible dispatch and filtering, and the role of the code at 22 | the log site isn't completely clear. Because of this, Julia 0.6 packages use 23 | any of several incompatible logging libraries, and there's no systematic way to 24 | generate and capture log messages. 25 | 26 | This julep aims to improve the situation by proposing: 27 | 28 | * A simple, unified interface to generate log events in `Base` 29 | * Conventions for the structure and semantics of the resulting log records 30 | * A minimum of dispatch machinery to capture, route and filter log records 31 | * A default backend for displaying, filtering and interacting with the log stream. 32 | 33 | A non-goal is to create a complete set of logging backends - these can be 34 | supplied by packages. 
35 | 36 | ## The design problem 37 | 38 | There are two broad classes of users for a logging library - library authors and 39 | application authors - each with rather different needs. 40 | 41 | ### The library author 42 | 43 | Ideally logging should be a high value tool for library development, making 44 | library authors' lives easier, and giving users insight. 45 | 46 | For the library author, the logging tools should make log events *easy to generate*: 47 | 48 | * Logging should require a minimum of syntax - ideally just a logger verb and 49 | the message object in many cases. Context information for log messages (file 50 | name, line number, module, stack trace, etc.) should be automatically gathered 51 | without a syntax burden. 52 | * Log generation should be free from prescriptive log message formatting. Simple 53 | string interpolation, `@sprintf` and `fmt()`, etc. should all be fine. When 54 | log messages aren't strings, a sensible conversion should be applied by 55 | default. 56 | * Flexible user definable structure for log records should make it easy to 57 | record snapshots of program state in the form of variable names and values. 58 | This would generalize `@show` using log records as a transport mechanism. 59 | 60 | The default configuration for log message reporting should involve *zero 61 | setup* and should produce *readable output*: 62 | 63 | * No mention of log dispatch should be necessary at the message creation site. 64 | * The default console log handler should integrate somehow with the display 65 | system, to show log records in a way which is highly readable. 66 | * Basic filtering of log messages should be easy to configure. 67 | 68 | The default configuration for log message reporting will generally define what 69 | library authors see during development, so will end up defining the conventions 70 | authors use when including logging in their library. To this extent, it's 71 | important to do a good job displaying metadata! 
72 | 73 | ### The application author 74 | 75 | Application authors bring together many disparate libraries into a larger 76 | system; they need consistency and flexibility in collecting log records. 77 | 78 | Log events are generally tagged with useful context information which is 79 | available both lexically (eg, module, file name, line number) and dynamically 80 | (eg, time, stack trace, thread id). Log records should have *consistent, 81 | flexible metadata* which represents and preserves this structured information in 82 | a way that can be collected systematically. 83 | 84 | * Each logging location should have a unique identifier, `id`, passed as part of 85 | the log record metadata. This greatly simplifies tasks such as limiting the rate 86 | of logging for a given line of code. 87 | * Users should be able to add structured information to log records, to be 88 | preserved along with data extracted from the logging context. For example, a 89 | list of `key=value` pairs offers a decent combination of simplicity and power. 90 | * Clear guidelines should be given about the meaning and appropriate use of 91 | standard log levels so libraries can be consistent. 92 | 93 | Log *collection* should be unified: 94 | 95 | * For all libraries using the standard logging API, it should be simple to 96 | intercept, and dispatch logs in a unified way which is under the control of 97 | the application author. For example, to write json log records across the 98 | network to a log server. 99 | * It should be possible to naturally control log dispatch from concurrent tasks. 100 | For example, if the application uses a library to handle simultaneous HTTP 101 | connections for both an important task and a noncritical background job, we 102 | may wish to handle the messages generated by these two `Task`s differently. 
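
One way to picture task-scoped log dispatch: the `Logging` standard library that later shipped with Julia 1.0 provides `with_logger`, which installs a logger for the current task and for tasks spawned inside the block. The sketch below assumes that API rather than anything specified in this draft; `handle_connection` is a hypothetical stand-in for library code that logs internally.

```julia
using Logging

# Hypothetical library function: it logs, but knows nothing about dispatch.
function handle_connection(name)
    @info "handling connection" name
end

important_io  = IOBuffer()   # destination for the important task
background_io = IOBuffer()   # destination for the background job

@sync begin
    # The important task keeps Info-level records.
    @async with_logger(SimpleLogger(important_io, Logging.Info)) do
        handle_connection("important")
    end
    # The background job only keeps warnings and above, so the
    # Info record emitted by the library is filtered out here.
    @async with_logger(SimpleLogger(background_io, Logging.Warn)) do
        handle_connection("background")
    end
end

important_log  = String(take!(important_io))
background_log = String(take!(background_io))
```

The same library call produces a record in one task and nothing in the other, with no mention of dispatch at the log site.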
103 | 104 | The design should allow for an *efficient implementation*, to encourage 105 | the availability of logging in production systems; logs you don't see should be 106 | almost free, and logs you do see should be cheap to produce. The runtime cost 107 | comes in a few flavours: 108 | 109 | * Cost in the logging frontend, to determine whether to filter a message. 110 | * Cost in the logging frontend, in collecting context information. 111 | * Cost in user code, to construct quantities which will only be used in a 112 | log message. 113 | * Cost in the logging backend, in filtering and displaying messages. 114 | 115 | 116 | ## Proposed design 117 | 118 | A prototype implementation is available at https://github.com/c42f/MicroLogging.jl 119 | 120 | ### Quickstart Example 121 | 122 | #### Frontend 123 | ```julia 124 | # using Base.Log 125 | 126 | # Logging macros 127 | @debug "A message for debugging (filtered out by default)" 128 | @info "Information about normal program operation" 129 | @warn "A potential problem was detected" 130 | @error "Something definitely went wrong, but we recovered enough to continue" 131 | @logmsg Logging.Info "Explicitly defined info log level" 132 | 133 | # Free form message formatting 134 | x = 10.50 135 | @info "$x" 136 | @info @sprintf("%.3f", x) 137 | @info begin 138 | A = ones(4,4) 139 | "sum(A) = $(sum(A))" 140 | end 141 | 142 | # Progress reporting 143 | for i=1:10 144 | @info "Some algorithm" progress=i/10 145 | end 146 | 147 | # User defined key value pairs 148 | foo_val = 10.0 149 | @info "test" foo=foo_val bar=42 150 | ``` 151 | 152 | #### Backend 153 | 154 | ### What is a log record? 155 | 156 | Logging statements are used to understand algorithm flow - the order and timing 157 | in which logging events happen - and the program state at each event. Each 158 | logging event is preserved in a *log record*. 
The information in a record 159 | needs to be gathered efficiently, but should be rich enough to give insight into 160 | program execution. 161 | 162 | A log record includes information explicitly given at the call site, and any 163 | relevant metadata which can be harvested from the lexical and dynamic 164 | environment. Most logging libraries allow for two key pieces of information 165 | to be supplied explicitly: 166 | 167 | * The *log message* - a user-defined string containing key pieces of program 168 | state, chosen by the developer. 169 | * The *log level* - a category for the message, usually ordered from verbose 170 | to severe. The log level is generally used as an initial filter to remove 171 | verbose messages. 172 | 173 | Some logging libraries (for example 174 | [glib](https://developer.gnome.org/glib/stable/glib-Message-Logging.html) 175 | structured logging) allow users to supply extra log record information in the 176 | form of key value pairs. Others like 177 | [log4j2](https://logging.apache.org/log4j/2.x/manual/messages.html) require extra information to be 178 | explicitly wrapped in a log record type. In julia, supporting key value pairs 179 | in logging statements gives a good mixture of usability and flexibility: 180 | Information can be communicated to the logging backend as simple keyword 181 | function arguments, and the keywords provide syntactic hints for early filtering 182 | in the logging macro frontend. 183 | 184 | In addition to the explicitly provided information, some useful metadata can be 185 | automatically extracted and stored with each log record. Some of this is 186 | extracted from the lexical environment or generated by the logging frontend 187 | macro, including code location (module, file, line number) and a unique message 188 | identifier. The rest is dynamic state which can be generated on demand by the 189 | backend, including system time, stack trace, current task id. 
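
To make the shape of a log record concrete, here is a minimal sketch of a backend that simply stores every record it receives. It assumes the `AbstractLogger` interface of the `Logging` standard library that later shipped with Julia 1.0 (`handle_message`, `shouldlog`, `min_enabled_level`), which may differ from what this draft ends up specifying; `RecordCollector` is a hypothetical name.

```julia
using Logging

# Hypothetical backend that keeps each log record as a NamedTuple.
struct RecordCollector <: AbstractLogger
    records::Vector{Any}
end

# Accept everything; a real backend would filter here.
Logging.min_enabled_level(::RecordCollector) = Logging.Debug
Logging.shouldlog(::RecordCollector, level, _module, group, id) = true
Logging.catch_exceptions(::RecordCollector) = false

function Logging.handle_message(c::RecordCollector, level, message,
                                _module, group, id, file, line; kwargs...)
    push!(c.records, (
        level   = level,         # explicit: log level
        message = message,       # explicit: log message
        kwargs  = Dict(kwargs),  # explicit: user key value pairs
        _module = _module,       # lexical metadata from the frontend macro
        file    = file,
        line    = line,
        id      = id,            # unique identifier of the log site
        time    = time(),        # dynamic state added by the backend
    ))
    return nothing
end

collector = RecordCollector([])
with_logger(collector) do
    foo_val = 10.0
    @info "test" foo=foo_val bar=42
end
record = collector.records[1]
```

The record separates the three sources described above: what the call site supplied, what the macro harvested lexically, and what the backend chose to add on demand.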
190 | 191 | ### The logging frontend 192 | 193 | TODO 194 | 195 | ### Logging middle layer 196 | 197 | TODO 198 | 199 | ### Early filtering 200 | 201 | TODO 202 | 203 | ### Default backend 204 | 205 | TODO 206 | 207 | ## Concrete use cases 208 | 209 | ### Base 210 | 211 | In Base, there are three somewhat disparate mechanisms for controlling logging. 212 | An improved logging interface should unify these in a way which is convenient 213 | both in the code and for user control. 214 | 215 | * The 0.6 logging system's `logging()` function with redirection based on module 216 | and function. 217 | * The `DEBUG_LOADING` mechanism in loading.jl and `JULIA_DEBUG_LOADING` 218 | environment variable. 219 | * The depwarn system, and `--depwarn` command line flag 220 | 221 | 222 | ## Inspiration 223 | 224 | This Julep draws inspiration from many previous logging frameworks, and helpful 225 | discussions with many people online and at JuliaCon 2017. 226 | 227 | The Java logging framework [log4j2](https://logging.apache.org/log4j/2.x/) was a 228 | great source of use cases, as it contains the lessons from at least twenty years 229 | of large production systems. While containing a fairly large amount of 230 | complexity, the design is generally very well motivated in the documentation, 231 | giving a rich set of use cases. The julia logging libraries - Base in julia 0.6, 232 | Logging.jl, MiniLogging.jl, LumberJack.jl, and particularly 233 | [Memento.jl](https://github.com/invenia/Memento.jl) - provided helpful 234 | context for the needs of the julia community. 235 | 236 | Structured logging as available in 237 | [glib](https://developer.gnome.org/glib/stable/glib-Message-Logging.html) 238 | and [RFC5424](https://datatracker.ietf.org/doc/rfc5424/?include_text=1) (The 239 | Syslog protocol) provide context for the usefulness of log records as key value 240 | pairs. 
241 | 242 | For the most part, existing julia libraries seem to follow the design tradition 243 | of the standard [python logging library](https://docs.python.org/3/library/logging.html), 244 | which has a lineage further described in [PEP-282](https://www.python.org/dev/peps/pep-0282/). 245 | The python logging system provided a starting point for this Julep, though the 246 | design eventually diverged from the typical hierarchical setup. 247 | 248 | TODO: Re-survey the following? 249 | * a-cl-logger (Common lisp) - https://github.com/AccelerationNet/a-cl-logger 250 | * Lager (Erlang) - https://github.com/erlang-lager/lager 251 | 252 | 253 | -------------------------------------------------------------------------------- /Find.md: -------------------------------------------------------------------------------- 1 | # JULEP find 2 | 3 | - **Title:** Reorganize Search and Find API 4 | - **Authors:** Milan Bouchet-Valat <> 5 | - **Created:** December 10, 2016 6 | - **Status:** work in progress 7 | 8 | ## Abstract 9 | 10 | The current `find` and `search` families of functions are not very consistent with regard to 11 | naming and supported features. This proposal aims to make the API more systematic. It is based 12 | on ideas discussed in particular in [issue #10593](https://github.com/JuliaLang/julia/issues/10593) 13 | and [issue #5664](https://github.com/JuliaLang/julia/issues/5664). 14 | 15 | ## Current Functions 16 | 17 | Currently there are (at least) five families of search and find functions: 18 | - `find` `findn` `findin` `findnz`, `findfirst` `findlast` `findprev` `findnext` 19 | - `[r]search` `[r]searchindex` `searchsorted` `searchsortedlast` `searchsortedfirst` 20 | - `match` `matchall` `eachmatch` 21 | - `indmin` `indmax` `findmin` `findmax` 22 | - `indexin` 23 | 24 | In the `find` family, `find` and `findn` return indices of non-zero or `true` values. 25 | `findfirst`, `findlast`, `findprev` and `findnext` are very similar to `find`, but 26 | iterative. 
`findin` allows looking for all elements of a collection inside another one. 27 | Finally, `findnz` is even more different as it only works on matrices and returns a tuple 28 | of vectors `(I,J,V)` for the row- and column-index and value. 29 | 30 | In the `search` family, `[r]search` and `[r]searchindex` look for strings/chars/regex in a 31 | string (though they also support bytes), the former returning a range, the latter the first 32 | index. `searchsorted`, `searchsortedlast` and `searchsortedfirst` look for values equal to 33 | or lower than an argument, and return a range for the first, and an index for the two others. 34 | 35 | The `match`, `matchall` and `eachmatch` functions deal with regular expressions. `match` 36 | returns a special `RegexMatch` object with offsets and matches. `matchall` returns all 37 | matching substrings. `eachmatch` returns an iterator over matches. 38 | 39 | The `indmin` and `indmax` functions are quite different, as they return the index of the 40 | minimum/maximum value. `findmin` and `findmax` return an `(index, value)` tuple of these 41 | elements. 42 | 43 | Finally, `indexin` is the same as `findin` (i.e. returns index of elements in a collection), 44 | but it returns `0` for elements that were not found, instead of a shorter vector. 
45 | 46 | ## Dimensions of Variation 47 | 48 | This diversity can be organized along several dimensions, which are not always combined 49 | systematically in the existing API: 50 | 51 | - **Mode of operation**: 52 | - all matches at once (`find`, `findin`, `indexin`) 53 | - iteratively forward (`findnext`, `search`) 54 | - iteratively backwards (`findprev`, `rsearch`) 55 | - the first match (`findfirst`, `searchsortedfirst`) 56 | - the last match (`findlast`, `searchsortedlast`) 57 | 58 | - **Look for**: 59 | - non-zeros or `true` entries (`find(A)`) 60 | - predicate-test-true (`find(pred, A)`) 61 | - elements present in a collection (`findin`, `indexin`) 62 | - elements equal to a value (`findfirst(A, v)`, `findlast(A, v)`, `findnext(A, v)`, 63 | `findprev(A, v)`) 64 | - extrema (`findmin`, `findmax`) 65 | - range of elements matching a sequence (`search*`, mostly for strings) 66 | 67 | - **Return**: 68 | - linear indices (most `find*` functions) 69 | - cartesian indices (`findn`) 70 | - cartesian indices and values (`findnz`) 71 | - range of linear indices (`search*`) 72 | 73 | - **Return when not found**: 74 | - shorter vector for all-at-once functions (`find`, `findin`, 75 | `findn`, `findnz`) 76 | - except `indexin` which includes a `0` entry 77 | - `0` for functions returning a single index 78 | 79 | ## Summary of Current Status 80 | 81 | The following table reorganizes existing methods which return linear indices based on the 82 | first two dimensions described above: 83 | - Whether to return all matches, only the next one, or only the previous one. 84 | - What values to look for. 
85 | 86 | | | nonzeros | test predicate `pred` | in collection `c` | equal to value `v` | sequence or regex `s` | extrema | 87 | | --- | --- | --- | --- | --- | --- | --- | 88 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `searchsorted(A,v)` | | `indmin(A)`/`indmax(A)` | 89 | | Next match | `findnext(A,1)` | `findnext(pred,A,1)` | | `findnext(A,v,1)` | `search(A,s,1)` | | 90 | | Previous match | `findprev(A,endof(A))` | `findprev(pred,A,endof(A))` | | `findprev(A,v,endof(A))` | `rsearch(A,s,endof(A))` | | 91 | 92 | Some functions do not fit into this table: 93 | - `findfirst` and `findlast` are special cases of `findnext` and `findprev`. 94 | - `searchsortedfirst` and `searchsortedlast` give each a part of the result of `searchsorted`. 95 | - `[r]searchindex` give part of the result from `[r]search`. 96 | - `findn` and `findnz` do not return linear indices. 97 | - `indexin` is similar to `findin` but returns `0` for entries with no match. 98 | - `match`, `matchall` and `eachmatch` return `RegexMatch` objects or strings rather than 99 | indices. 100 | - `findmin` and `findmax` return both the index and value of extrema. 101 | 102 | ## Open Design Issues 103 | 104 | - **How to switch between forward and backward search**: 105 | - Separate functions (e.g. starting with `r` for "reverse"): not great for 106 | documentation, harder to find, not very Julian. 107 | - `rev=false` positional argument: not very explicit. 108 | - `rev=false` keyword argument: clearer and consistent with `sort`, but maybe too slow 109 | (especially for single-element functions). 110 | - special object like `Order.Forward`/`Order.Backward`: clearer, but these objects do 111 | not have this meaning in Base, and introducing separate objects just for this may not 112 | be worth it. 113 | 114 | - **Whether to keep the `find`/`search` distinction**: 115 | - Obviously requires clearly distinct meanings for each family of functions. 
116 | - Advantage: less complex signatures (due to many methods) for users, and limits 117 | dispatch ambiguities. 118 | - Drawback: using two different names for related functions makes it harder to find 119 | one variant when you know the other one; in particular, auto-completion does not help. 120 | 121 | - **How to search iteratively**: 122 | - Functions returning the next/previous match after a given index: simple, but require 123 | manual handling of indices. 124 | - Functions returning an iterator over matches: more user-friendly, though overkill when 125 | you only want the next match. 126 | - The first approach can fit all needs (even if it can be cumbersome), but the second 127 | one can only replace the first one if it supports creating an iterator starting from 128 | a given index (on which you can call `first` to get the first match). 129 | 130 | 131 | ## General Proposal 1 132 | 133 | The first proposal uses `find` for all-at-once variants, and `search` for iterative 134 | variants. The variants returning iterators (last two rows) do not correspond to existing 135 | functions: they could be added later, or never, without breaking the consistency of the API. 136 | In this proposal, it is not possible to have both methods returning the next/previous match 137 | and methods returning an iterator starting from a given index: the signatures would be the 138 | same. 
139 | 140 | | | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` | extrema | 141 | | --- | --- | --- | --- | --- | --- | --- | 142 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `findeq(A,v)` | `findseq(A,s)` | `findmin(A)`/`findmax(A)` | 143 | | Next match | `search(A,1)` | `search(pred,A,1)` | * | `searcheq(A,v,1)` | `searchseq(A,s,1)` | | 144 | | Previous match | `search(A,endof(A),true)` | `search(pred,A,endof(A),true)` | * | `searcheq(A,v,endof(A),true)` | `searchseq(A,s,endof(A),true)` | | 145 | | Forward iterator | `search(A)` | `search(pred,A)` | * | `searcheq(A,v)` | `searchseq(A,s)` | | 146 | | Backward iterator | `search(A,true)` | `search(pred,A,true)` | * | `searcheq(A,v,true)` | `searchseq(A,s,true)` | | 147 | 148 | \* These combinations are not needed as they correspond to `searchseq`. Indeed they do not 149 | exist in the current API. 150 | 151 | ## General Proposal 2 152 | 153 | The second proposal uses `find` for functions returning one or several indices (either 154 | all-at-once or iterative), and `search` for functions returning iterators (which 155 | currently do not exist). Contrary to the first proposal, it therefore allows for 156 | iterators starting at a specific index. If those variants were not added in the end, 157 | only `find` would exist. Conversely, methods returning the next/previous match could be 158 | dropped in favor of iterators. 
159 | 160 | | | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` | extrema | 161 | | --- | --- | --- | --- | --- | --- | --- | 162 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `findeq(A,v)` | `findseq(A,s)` | `findmin(A)`/`findmax(A)` | 163 | | Next match | `find(A,1)` | `find(pred,A,1)` | * | `findeq(A,v,1)` | `findseq(A,s,1)` | | 164 | | Previous match | `find(A,endof(A),true)` | `find(pred,A,endof(A),true)` | * | `findeq(A,v,endof(A),true)` | `findseq(A,s,endof(A),true)` | | 165 | | Forward iterator | `search(A)` | `search(pred,A)` | * | `searcheq(A,v)` | `searchseq(A,s)` | | 166 | | Backward iterator | `search(A,true)` | `search(pred,A,true)` | * | `searcheq(A,v,true)` | `searchseq(A,s,true)` | | 167 | 168 | \* These combinations are not needed as they correspond to `searchseq`. Indeed they do not 169 | exist in the current API. 170 | 171 | ## Proposal 3 172 | 173 | This proposal adds `findeach(pred, A[, rev])`, which returns an iterator and can be used to 174 | implement most of the other functions in one line. 175 | Predicates are always used instead of separate functions for different kinds of searches when possible. 176 | This potentially allows using the same function for sequence searching, since a subsequence to look for 177 | is unlikely to be confused with a predicate. 
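
As a rough illustration of the idea behind Proposal 3, `findeach` can be sketched as a lazy generator over indices. Only the name and the optional `rev` argument come from the proposal text; the implementation below is an assumption, and `==(v)` is used where the proposal writes `equalto(v)`.

```julia
# Hypothetical sketch: lazily yield every index of `A` whose element
# satisfies `pred`, scanning backwards when `rev` is true.
findeach(pred, A, rev::Bool=false) =
    (i for i in (rev ? Iterators.reverse(eachindex(A)) : eachindex(A)) if pred(A[i]))

A = [0, 3, 0, 7, 3]

# Several of the existing functions then become one-liners:
all_matches = collect(findeach(!iszero, A))     # role of `find(A)`
first_match = first(findeach(==(3), A))         # role of `findfirst(A, 3)`
last_match  = first(findeach(==(3), A, true))   # role of `findlast(A, 3)`
```

Because the generator is lazy, taking `first` stops at the first match instead of scanning the whole collection.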
178 | 179 | | | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` | 180 | | --- | --- | --- | --- | --- | --- | 181 | | All at once | `find(A)` | `find(pred, A)` | `find(occursin(c), A)` | `find(equalto(v), A)` | `find(s, A)` | 182 | | Next match | `findeach(!iszero,A)` * | `findeach(pred,A)` * | `findeach(occursin(c),A)` * | `findeach(equalto(v),A)` * | `findeach(s,A)` * | 183 | | Previous match | `findeach(!iszero,A,true)` * | `findeach(pred,A,true)` * | `findeach(occursin(c),A,true)` * | `findeach(equalto(v),A,true)` * | `findeach(s,A,true)` * | 184 | | Forward iterator | `findeach(!iszero,A)` | `findeach(pred,A)` | `findeach(occursin(c),A)` | `findeach(equalto(v),A)` | `findeach(s,A)` | 185 | | Backward iterator | `findeach(!iszero,A,true)` | `findeach(pred,A,true)` | `findeach(occursin(c),A,true)` | `findeach(equalto(v),A,true)` | `findeach(s,A,true)` | 186 | 187 | \* Getting the next and previous matches is handled by the iteration protocol. 188 | If necessary, you can pass `findeach(pred, rest(itr, st))` to start at a particular state. 189 | We can keep `findnext` and `findprev`, since they operate on array indices while 190 | the general iterator needs to operate on state objects. 191 | 192 | We can also keep `findfirst` and `findnext`, since they are especially convenient. 193 | Ideally we will keep only `findfirst(pred, A)` and deprecate other methods. 194 | 195 | If we want, `find` can be deprecated to `collect(findeach(...))`. 196 | 197 | The following functions can also be deprecated to `findeach` calls: `findin`, `search`, `rsearch`, `match`, `eachmatch`. 198 | 199 | This proposal does not touch `findmin`, `findmax`, etc. 200 | 201 | ## Particular Cases 202 | 203 | Other issues are more localized and can be fixed one by one, depending on the chosen general 204 | plan. 
205 | 206 | - **`findmin` and `findmax`**: `findmin` and `findmax` are inconsistent 207 | with both proposals, since they return an `(index, value)` tuple instead of an index. They 208 | should be changed to return an index (as in both proposals above). A new name needs to be 209 | found if we want to keep `(index, value)` variants, which are slightly more efficient. 210 | 211 | - **`searchsorted*` functions**: These functions should be replaced with 212 | standard search/find functions called on a special `SortedArray` wrapper. `findeq` would 213 | replace `searchsorted` and -- like that function -- return a range (instead of a `Vector`) 214 | of indices, which is possible when input is sorted. 215 | 216 | - **`indexin`**: It is not clear whether this function really belongs to the 217 | find/search family. It could be kept as-is. 218 | 219 | - **`*match*` functions**: These functions (`match`, `matchall` and `eachmatch`) 220 | return `RegexMatch` objects or strings (rather than indices). They can be left outside the 221 | scope of this Julep. On the other hand `findseq`/`searchseq` functions should support regexes 222 | for consistency, only returning ranges of indices (as does `search` currently). 223 | 224 | ## Deprecation strategy 225 | 226 | Depending on the choices made, the migration to the new API will be possible in a single 227 | release (if no ambiguity exists with the old one), or it will have to be done in two 228 | releases (to allow removing old conflicting methods first). 229 | 230 | ## Issues Beyond the Scope of This Julep 231 | 232 | These are important to resolve but are not covered by the above proposals. 233 | 234 | - **Whether to return a `Nullable` instead of `0` when there is no match** 235 | ([PR#15755](https://github.com/JuliaLang/julia/pull/15755)): This is blocked by progress 236 | with regard to `Nullable`, in particular whether they are stack-allocated in all cases 237 | and whether they can be represented as a `Union` type. 
It is therefore out of this Julep's 238 | scope. 239 | 240 | - **Whether to return linear or cartesian indices** 241 | ([PR#14086](https://github.com/JuliaLang/julia/pull/14086)): Both could be needed depending 242 | on the context. Passing `CartesianIndex` as the first argument to all functions would work 243 | and would allow replacing `findn(A)` with `find(CartesianIndex, A)`. On the other hand, 244 | computing the linear index is slow for `LinearSlow` arrays, which means that returning 245 | the same index type as `eachindex(A)` could be a better default; it also makes more sense 246 | for multidimensional arrays. Then one would write `find(LinearIndex, A)` or `find(Int, A)` 247 | to always get a linear index. 248 | 249 | - **Sentinel values in a world where array indices do not necessarily start with 1**: 250 | - `findfirst(x, v)` returns 0 if no value matching `v` is found; 251 | however, if `x` allows 0 as an index, the meaning of 0 is 252 | ambiguous. One could return `typemin(Int)` or 253 | `minimum(linearindices(x))-1`, but what if `x` starts indexing 254 | at `typemin(Int)`? 255 | - No matter which sentinel value gets returned, the deprecation 256 | strategy here is delicate. There may be a lot of code that 257 | checks the return value and compares it to 0. 258 | -------------------------------------------------------------------------------- /StructuredConcurrency.md: -------------------------------------------------------------------------------- 1 | # Structured Concurrency 2 | 3 | * Title: Exploring Structured Concurrency 4 | * Editor: Chris Foster 5 | * Created: 2019-09-12 6 | * Status: Work in progress 7 | * Discussion: [JuliaLang/julia#33248](https://github.com/JuliaLang/julia/issues/33248) 8 | 9 | Here are some notes surveying structured concurrency as it can be applied to 10 | Julia.
11 | 12 | Julia has supported non-parallel concurrency since very early on and a 13 | restricted form of parallel programming with the `@threads` macro since version 14 | 0.5. 15 | 16 | In julia 1.3 a threadsafe runtime for truly parallel tasks [has 17 | arrived](https://julialang.org/blog/2019/07/multithreading) which will greatly 18 | increase their appeal in Julia's numerical and technical computing community. 19 | It's time to think about APIs where users can express concurrent computation in 20 | a safe and composable way. 21 | 22 | ### Background terminology 23 | 24 | For clarity, here's a few items of terminology: 25 | 26 | * A Julia [**task**](https://docs.julialang.org/en/v1/manual/control-flow/index.html#man-tasks-1) 27 | stores the computational state needed to continue execution of a nested set 28 | of function calls. In the standard runtime this includes any native stack 29 | frames, CPU registers and julia runtime state needed to suspend and resume 30 | execution. 31 | * A program is **concurrent** when there are multiple tasks which have started 32 | but not yet completed at a given time. 33 | * A program is **parallel** when two or more tasks are executing at a given 34 | time. 35 | 36 | With these definitions, parallelism implies concurrency but a concurrent 37 | program can be non-parallel if the runtime serially interleaves task execution. 38 | See, for example, 39 | [section 2.1.2](https://books.google.com.au/books?redir_esc=y&id=J5-ckoCgc3IC&q=paralleism+versus+concurrency#v=snippet&q=paralleism%20versus%20concurrency&f=false) 40 | of "Introduction to Concurrency in Programming Languages". 41 | 42 | ### What is structured concurrency? 43 | 44 | To quote the [`libdill` documentation](http://libdill.org/structured-concurrency.html), 45 | 46 | > Structured concurrency means that lifetimes of concurrent functions are 47 | > cleanly nested. If coroutine `foo` launches coroutine `bar`, then `bar` must 48 | > finish before `foo` finishes. 
49 | > 50 | > This is not structured concurrency: 51 | > 52 | > ![unstructured concurrency](StructuredConcurrency_not_structured.jpeg) 53 | > 54 | > This is structured concurrency: 55 | > 56 | > ![structured concurrency](StructuredConcurrency_structured.jpeg) 57 | 58 | It's all about composability. Structured concurrency is good because it 59 | reasserts the function call as the natural unit of program composition, where 60 | the lifetime of a computation is delimited in the *structure of the source 61 | code*. This is sometimes called the 62 | [*black box rule*](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/#what-happened-to-goto). 63 | Without this, 64 | 65 | * Task failures can go unhandled because there's nowhere to propagate the error. 66 | * Task lifetime is not defined by the source code. When a task starts and 67 | whether it runs to completion is an implementation detail of the runtime. 68 | * Computation cannot be cancelled systematically because there's no natural 69 | tree of child tasks. 70 | * Scope-based resource cleanup (eg, with `open(...) do io`) is broken because 71 | task local context can leak from parents into long running children. 72 | 73 | For a colourful view on the downsides of unstructured concurrency, `@njsmith` 74 | has expressed it this way in his blog post [Notes on structured concurrency or, 75 | "go statement considered harmful"](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/#conclusion): 76 | 77 | > The popular concurrency primitives — go statements, thread spawning 78 | > functions, callbacks, futures, promises, ... they're all variants on `goto`, 79 | > in theory and in practice. And not even the modern domesticated `goto`, but the 80 | > old-testament fire-and-brimstone `goto`, that could leap across function 81 | > boundaries. 
These primitives are dangerous even if we don't use them 82 | > directly, because they undermine our ability to reason about control flow and 83 | > compose complex systems out of abstract modular parts, and they interfere 84 | > with useful language features like automatic resource cleanup and error 85 | > propagation. Therefore, like goto, they have no place in a modern high-level 86 | > language. 87 | 88 | 89 | ### Structured concurrency in Julia 1.0? 90 | 91 | Julia 1.0 supports a limited kind of structured concurrency via the `@sync` 92 | block which waits for lexically contained child tasks (scheduled using 93 | `@async`) to complete. However, like Go, there's no requirement that concurrent 94 | work is actually scoped this way; that's completely up to the user and they may 95 | use `@async` anywhere. At first sight, it may seem just as natural to choose an 96 | unstructured 97 | [communicating sequential processes](https://en.wikipedia.org/wiki/Communicating_sequential_processes) 98 | (CSP) style in current Julia. 99 | 100 | Even if the user chooses structured concurrency with `@sync`, they are still 101 | faced with implementing robust cancellation machinery by hand using `Channel`s. 102 | This is the big missing piece required for the natural use of structured 103 | concurrency in Julia. 104 | 105 | 106 | ## Cancellation and preemption 107 | 108 | A robust task cancellation system is required to express structured 109 | concurrency. Without it, child tasks cannot be systematically managed in 110 | response to events such as a timeout from the parent or the failure of a 111 | sibling. For a great discussion of cancellation and a survey of cancellation 112 | APIs see the blog post ["Timeouts and cancellation for 113 | humans"](https://vorpus.org/blog/timeouts-and-cancellation-for-humans). 114 | 115 | **Big challenge**: how do we handle cancellation safely but in a timely way? 
116 | What are the valid cancellation points and can we have cancellation which is 117 | timely, safe and efficient in a wide variety of situations? Ideally we'd 118 | like tight numerical loops to be cancellable as well as IO. And we want all 119 | this without the performance penalty of inserting extra checks or safe points 120 | into loop code. 121 | 122 | ### The challenge of preemptive cancellation 123 | 124 | At first sight one might hope to treat preemptive cancellation somewhat like 125 | `InterruptException`: wake the task, deliver a signal to its thread to generate 126 | a `CanceledException` which then unwinds the stack, running regular user 127 | cleanup code. 128 | 129 | The key difficulty here is that arbitrary preemptive cancellation can occur in 130 | any location with no syntactic hint in the source. Others [have 131 | claimed](https://github.com/golang/go/issues/29011#issuecomment-443441031) that 132 | this makes arbitrary cancellation an impossible problem for user code. The 133 | standard compromise is to make only a core set of operations (including IO) 134 | cancellable. This is the solution offered in 135 | [Python Trio checkpoints](https://trio.readthedocs.io/en/stable/reference-core.html#checkpoints), 136 | libdill's family of IO functions and in pthreads (see [pthread\_cancel](http://man7.org/linux/man-pages/man3/pthread_cancel.3.html) 137 | and [pthreads cancellation points](http://man7.org/linux/man-pages/man7/pthreads.7.html)). 138 | In contrast, consider the failed preemptive cancellation APIs 139 | [Java `Thread.stop`](https://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html) 140 | and the Windows API 141 | [`TerminateThread`](https://devblogs.microsoft.com/oldnewthing/?p=91811), 142 | both of which were found to be fundamentally non-robust. 143 | 144 | Let's consider the ways in which current julia code can be non-robust in the 145 | face of `InterruptException`.
A particular difficulty occurs in resource 146 | acquisition. Consider this snippet from task.jl: 147 | 148 | ```julia 149 | lock(t.donenotify) 150 | # < What if InterruptException is thrown here? 151 | try 152 | while !istaskdone(t) 153 | wait(t.donenotify) 154 | end 155 | finally 156 | unlock(t.donenotify) 157 | end 158 | ``` 159 | 160 | In Julia we have the escape hatch `disable_sigint` (`jl_sigatomic_begin` in the 161 | runtime) for deferring `InterruptException`, but most code doesn't consider or 162 | use this which makes user resource handling broken by default. 163 | 164 | So it's fairly clear that arbitrary cancellation without cleanup is a 165 | non-starter and that arbitrary cancellation with cleanup is difficult. But that 166 | leaves us in a difficult situation: how do we allow for cancellation of 167 | expensive numerical operations? Are there options for cancellation of numerical 168 | loops with a semantic which can be understood by users? The Go people seem to 169 | consider that arbitrary *preemption* is workable, but can arbitrary 170 | cancellation be made to work with the right language and library features? 171 | 172 | #### Runtime technicalities for preemption 173 | 174 | On a technical level, our runtime situation in julia-1.3 is very similar to 175 | Go where preemption is cooperative and a rogue goroutine can sometimes wedge 176 | the entire system. There has been a large amount of work in the Go community to 177 | address this, leading to the proposal 178 | ["Non-cooperative goroutine preemption"](https://github.com/golang/proposal/blob/master/design/24543-non-cooperative-preemption.md). 179 | In the process, several interesting alternatives 180 | [were assessed](https://github.com/golang/go/issues/24543) including cooperative 181 | preemption of loops (by the insertion of safe points) and more complex 182 | mechanisms such as returning from a signal to out-of-line code which leads 183 | quickly to a safe point.
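To make the cooperative option concrete, here is a minimal sketch of a numerical loop with hand-inserted safe points. Everything in it is an assumption for illustration: `CancelToken`, `cancel!`, `iscancelled` and `cancellable_sum` are hypothetical names, not an existing or proposed Julia API, and a real design would also need to decide how cancellation is reported to the caller.

```julia
using Base.Threads: Atomic

# Hypothetical cancellation token: a flag another task can set atomically.
struct CancelToken
    cancelled::Atomic{Bool}
end
CancelToken() = CancelToken(Atomic{Bool}(false))

cancel!(tok::CancelToken) = (tok.cancelled[] = true; nothing)
iscancelled(tok::CancelToken) = tok.cancelled[]

# Sum `xs`, polling the token every `check_every` iterations.
# Returns `nothing` if cancellation was observed, otherwise the sum.
function cancellable_sum(xs, tok::CancelToken; check_every::Int=1024)
    s = zero(eltype(xs))
    for (n, x) in enumerate(xs)
        # Explicit safe point: the only place this loop can be cancelled.
        if n % check_every == 0 && iscancelled(tok)
            return nothing
        end
        s += x
    end
    return s
end

tok = CancelToken()
completed = cancellable_sum(1:10, tok; check_every=2)
cancel!(tok)
interrupted = cancellable_sum(1:10, tok; check_every=2)
```

The `check_every` knob exposes the trade-off discussed above: polling rarely keeps the loop fast but delays cancellation, while polling often is timely but adds a branch and an atomic load to the hot path.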
184 | 185 | ## Syntax 186 | 187 | When comparing to solutions in other languages it's important to mention that 188 | many have introduced special syntax to mark concurrent code. 189 | 190 | * C# introduced `async`/`await`; many followed (Python, Rust, ...). This makes 191 | potential suspension points syntactic. 192 | * `await` in Python marks preemption points. `async` is required to go with it, 193 | forming a chain of custody around "potentially suspending" functions. 194 | * Kotlin has `suspend` to introduce a special calling convention which passes 195 | along the coroutine context. 196 | * Go doesn't have `async` or `await` but is deeply concurrent and is the best 197 | analogy to Julia. 198 | 199 | The problem with `async`/`suspend` is that it splits the world of functions in 200 | two, as nicely expressed in Bob Nystrom's blog post 201 | ["What color is your function?"](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/). 202 | This is a barrier to composability because higher order functions have to know 203 | about the color of the function they're being passed. Bob argues that Go 204 | handles this in the nicest way by having first class support for continuations 205 | in the language. The Julia runtime does this in the same way. 206 | 207 | On the other hand, a syntax such as `async`/`await` is arguably a useful visual 208 | marker for possible cancellation points (`await`) and for which functions are 209 | cancellable (`async`). Note that this doesn't have to be implemented at the 210 | language level; for example, Go's context and errgroup also allow the reader to 211 | recognize where the cancellation can happen (listening to the Done channel) and 212 | which functions can be cancelled (those that accept Context as an argument). 213 | 214 | ## Prototypical use cases 215 | 216 | * The "happy eyeballs" algorithm is becoming a standard example of structured 217 | concurrency thanks to `@njsmith`'s tutorial at PyCon 2018. 
218 | - [General discussion](https://trio.discourse.group/t/happy-eyeballs-structured-concurrencys-hello-world/57) 219 | - [Trio implementation](https://github.com/python-trio/trio/blob/master/trio/_highlevel_open_tcp_stream.py) 220 | - [Libdill implementation](https://github.com/sustrik/libdill/blob/master/happyeyeballs.c) and [discussion](http://250bpm.com/blog:139) 221 | * The Go concurrency tutorial — in his talk `@elizarov` suggested that 222 | implementing all the examples there was a great inspiration. 223 | 224 | ## Related julia issues and prototypes 225 | 226 | * [Tapir parallel IR](https://github.com/JuliaLang/julia/pull/31086) 227 | 228 | * [API Request : Interrupt and terminate a task](https://github.com/JuliaLang/julia/issues/6283) 229 | * [Error handling in tasks](https://github.com/JuliaLang/julia/issues/32677) 230 | * [Uncaught exceptions from tasks](https://github.com/JuliaLang/julia/issues/32034) 231 | * [silent errors in Tasks](https://github.com/JuliaLang/julia/issues/10405) 232 | * [asyncmap: Include original backtrace in rethrown exception](https://github.com/JuliaLang/julia/pull/32749) 233 | 234 | TODO: We should organize these, and more, with a tag. 235 | 236 | * [Awaits.jl](https://github.com/tkf/Awaits.jl) 237 | 238 | 239 | ## Resources 240 | 241 | A lot has been written on structured concurrency quite recently. Relevant 242 | implementations are available in C, Kotlin and Python, with Go also having to 243 | deal with many of the same issues. The Trio forum has a section dedicated to 244 | the [language-independent discussion of structured 245 | concurrency](https://trio.discourse.group/c/structured-concurrency). 
246 | 247 | #### Links 248 | 249 | * [Structured concurrency resources - Structured concurrency - Trio forum](https://trio.discourse.group/t/structured-concurrency-resources/21) 250 | * [Reading list · Python-Trio/Trio Wiki](https://github.com/python-trio/trio/wiki/Reading-list) 251 | 252 | #### People in the wider community 253 | 254 | * Bob Nystrom ([`@munificent`](http://journal.stuffwithstuff.com)) works on the 255 | Dart language at Google. Regarding async/await, he wrote a very on-topic post 256 | - [What color is your function?](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/). 257 | * Martin Sústrik ([`@sustrik`](https://github.com/sustrik)) is the author of 258 | the C library libdill, and has an interesting [blog](http://250bpm.com/) in 259 | which the term "structured concurrency" appears to have first 260 | appeared: 261 | - [Structured Concurrency](http://250bpm.com/blog:71) 262 | - [Update on Structured Concurrency](http://250bpm.com/blog:137) 263 | - [Two approaches to structured concurrency](http://250bpm.com/blog:139) 264 | * Nathaniel Smith ([`@njsmith`](https://github.com/njsmith)) is the author of 265 | the Python Trio library and a key advocate of structured concurrency. His 266 | [blog](https://vorpus.org/blog/archives.html) has several very interesting 267 | posts on the topic. 268 | - [Timeouts and cancellation for humans](https://vorpus.org/blog/timeouts-and-cancellation-for-humans) 269 | - [Notes on structured concurrency, or: go statement considered harmful](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/). 270 | 271 | See also his PyCon 2018 talk: 272 | - [Nathaniel J. Smith - Trio: Async concurrency for mere mortals - PyCon 2018 - YouTube](https://www.youtube.com/watch?v=oLkfnc_UMcE) 273 | * Roman Elizarov ([`@elizarov`](https://github.com/elizarov)) is the team lead 274 | for Kotlin libraries at JetBrains. Here's his [blog](https://medium.com/@elizarov).
275 | - [Structured concurrency](https://youtu.be/hW4vjgtPCAY?t=25960) for Kotlin talk at Hydraconf ([talk abstract](https://hydraconf.com/2019/talks/68l5ztovlf0xm9aindouzr)) 276 | - [Kotlin structured concurrency blog post](https://medium.com/@elizarov/structured-concurrency-722d765aa952) 277 | 278 | #### Structured concurrency libraries 279 | 280 | * [libdill (C)](http://libdill.org/structured-concurrency.html) 281 | * [Trio (Python)](https://trio.readthedocs.io/en/stable) 282 | * [Kotlin coroutines](https://kotlinlang.org/docs/reference/coroutines/basics.html#structured-concurrency) 283 | 284 | #### Cancellation 285 | 286 | * Python 287 | - [Timeouts and cancellation for humans](https://vorpus.org/blog/timeouts-and-cancellation-for-humans) 288 | * Go 289 | - [errgroup](https://godoc.org/golang.org/x/sync/errgroup) 290 | - [context](https://golang.org/pkg/context) 291 | - [Discussion of using the Trio approach for Go](https://github.com/golang/go/issues/29011) 292 | -------------------------------------------------------------------------------- /GcExtensions.md: -------------------------------------------------------------------------------- 1 | # Garbage Collector Extensions 2 | 3 | - **Title:** Garbage collector extensions for better foreign language support 4 | - **Author:** Reimer Behrends 5 | - **Created:** May 2018 6 | - **Status:** work in progress 7 | 8 | ## Introduction 9 | 10 | The support for modules written entirely or partly in foreign languages 11 | to interface with the Julia GC or to use the GC for allocations that do 12 | not neatly fit Julia's type system or use low-level approaches not 13 | available in Julia (such as irregular data structure layouts) is 14 | currently somewhat limited. 15 | 16 | This proposal aims at fleshing out the API for allowing more complex 17 | interaction of foreign code with the GC, especially the use of long-lived 18 | foreign objects that are inextricably interwoven with Julia 19 | objects. 
20 | 21 | Specific use cases that we are trying to address are: 22 | 23 | 1. *Allowing the Julia GC to manage foreign objects with arbitrary 24 | layouts.* Not all objects -- especially those from preexisting 25 | libraries -- fit Julia's type system, for example, specialized 26 | container types written in C/C++. Such objects can comprise multiple 27 | memory blocks that require a custom marking mechanism and may also 28 | require low-level finalizer behavior written in C. 29 | 2. *Providing additional roots to the GC.* Currently, to have additional 30 | roots, they must be stored in a location that is visible to Julia. 31 | This can be expensive if such roots are updated frequently or are 32 | contained in data structures that would have to be laboriously 33 | translated into a format usable by Julia. Instead, we want to allow 34 | for roots to be discoverable at the beginning of a garbage collection. 35 | 3. *Conservative scanning of stack frames and objects.* Currently, 36 | scanning does have to be precise. If we desire to use the GC for 37 | foreign code that requires conservative scanning (especially for 38 | foreign stack frames), then it is necessary to have functionality 39 | that determines whether a machine word is a pointer to an object, 40 | including to its interior. 41 | 42 | To demonstrate the applicability and viability of these mechanisms, we 43 | have fully integrated Julia with the GAP computer algebra system, to the 44 | point that GAP's regular garbage collector is completely replaced with 45 | Julia's and that the lifetime of all GAP objects is entirely managed by 46 | Julia. We also implemented a self-contained test program using these 47 | mechanisms, and integrated it into the Julia test suite. 
48 | 49 | The proposed implementation should not incur measurable overhead for 50 | Julia itself, as it only exposes additional functionality that is unused 51 | by Julia code, plus functionality hooks that are designed to only incur 52 | a few clock cycles of overhead per garbage collection. More specific 53 | discussion of overhead can be found accompanying the descriptions of 54 | these hooks. 55 | 56 | An implementation of this proposal can be found on GitHub under 57 | (branch `rb/gc-extensions`). See 58 | the `test/gcext` subdirectory for an example of using 59 | this API. 60 | 61 | An implementation of GAP that uses the Julia GC in lieu of its native 62 | GC can likewise be found on GitHub at 63 | (branch `alt-gc`). That version of GAP can be built with: 64 | 65 | ./autogen.sh 66 | ./configure --with-gc=julia --with-julia=/path/to/julia/usr 67 | make 68 | 69 | ## Callbacks 70 | 71 | In order to allow foreign code to have access to necessary functionality 72 | in the garbage collector, we allow foreign code to register callbacks for 73 | certain GC events. We provide for six types of callbacks: 74 | 75 | 1. Beginning of garbage collection (`pre_gc`) 76 | 2. End of garbage collection (`post_gc`) 77 | 3. When scanning GC roots (`root_scanner`) 78 | 4. When scanning Julia tasks (`task_scanner`) 79 | 5. When an external object is allocated (`notify_external_alloc`) 80 | 6. When an external object is deallocated (`notify_external_free`) 81 | 82 | These callbacks are *not* per se thread-safe. It is up to the callback 83 | implementation to ensure that no violations of thread-safety occur. 84 | 85 | In particular, each of these can be called from any thread. All except the 86 | first two can be called concurrently. In the current Julia GC implementation, 87 | the `post_gc` callback may also not be called before the next `pre_gc`.
88 | 89 | With external objects, we refer to what in the current Julia implementation are 90 | called `bigval_t` objects. These are allocated using the system's memory 91 | allocator rather than using Julia's internal pool allocator. In order to not 92 | expose this implementation detail, we talk about "internal" and "external" 93 | objects rather than objects that are allocated as part of Julia's object 94 | pool or through system routines, respectively. 95 | 96 | For each type of callback, there is a corresponding function pointer type. 97 | Registering and deregistering callbacks occurs via corresponding setter 98 | functions. 99 | 100 | ``` 101 | typedef void (*jl_gc_cb_pre_gc_t)(int full); 102 | typedef void (*jl_gc_cb_post_gc_t)(int full); 103 | typedef void (*jl_gc_cb_root_scanner_t)(int full); 104 | typedef void (*jl_gc_cb_task_scanner_t)(jl_task_t *task, int full); 105 | typedef void (*jl_gc_cb_notify_external_alloc_t)(void *addr, size_t size); 106 | typedef void (*jl_gc_cb_notify_external_free_t)(void *addr); 107 | 108 | void jl_gc_set_cb_root_scanner(jl_gc_cb_root_scanner_t cb, int enable); 109 | void jl_gc_set_cb_task_scanner(jl_gc_cb_task_scanner_t cb, int enable); 110 | void jl_gc_set_cb_pre_gc(jl_gc_cb_pre_gc_t cb, int enable); 111 | void jl_gc_set_cb_post_gc(jl_gc_cb_post_gc_t cb, int enable); 112 | void jl_gc_set_cb_notify_external_alloc(jl_gc_cb_notify_external_alloc_t cb, int enable); 113 | void jl_gc_set_cb_notify_external_free(jl_gc_cb_notify_external_free_t cb, int enable); 114 | ``` 115 | 116 | For each setter function, a callback function is supplied, along with a flag 117 | (`1` for enabling the callback, `0` for removing it again). Attempting to 118 | register a callback multiple times will only register it once. 119 | 120 | *Performance impact:* The callback implementation is designed to incur 121 | negligible overhead if no callbacks are used and no more overhead than 122 | necessary to invoke the callbacks.
The callbacks are all kept in linked 123 | lists; if no callbacks are registered, all that is done is to test a 124 | static variable for null and branch if it is. As branch 125 | behavior should always be the same, only a few clock cycles are used, as 126 | long as the variable is in the cache and the branch target in the BTB. 127 | 128 | ## Additional GC roots and hooking into the GC process 129 | 130 | We provide three callbacks that are called at the beginning of a GC 131 | (`pre_gc`), the beginning of the mark phase (`root_scanner`), and the end of 132 | the GC (`post_gc`). As these callbacks are tested and called only once per 133 | collection, overhead should be negligible. The `full` argument passed 134 | to these callbacks indicates whether this is a full or partial garbage 135 | collection. 136 | 137 | In addition, we also provide a `task_scanner` hook, which functions like 138 | the `root_scanner` hook, except that it is called for each task and with 139 | a pointer to the task object as its first argument. 140 | 141 | Additional roots can be marked from the `root_scanner` and 142 | `task_scanner` callbacks by calling the `jl_gc_mark_queue_obj()` 143 | function, which takes a pointer to the current thread's thread-local 144 | storage and a pointer to the object as its parameters. 145 | 146 | ``` 147 | int jl_gc_mark_queue_obj(jl_ptls_t ptls, jl_value_t *obj); 148 | ``` 149 | 150 | The `ptls` parameter can be filled in from the return value of 151 | the `jl_get_ptls_states()` function, which returns a pointer to 152 | the thread-local storage of the current thread. 153 | 154 | The return value of `jl_gc_mark_queue_obj()` can be ignored for marking 155 | roots, but will be relevant for marking foreign objects (see below). 156 | 157 | When processing large objects, calling `jl_gc_mark_queue_obj()` can be 158 | inefficient, as each object will be pushed on the mark stack separately.
159 | 160 | If possible, it is therefore recommended that programmers use the 161 | following function, designed for arrays of references, which handles 162 | this use case more efficiently: 163 | 164 | ``` 165 | void jl_gc_mark_queue_objarray(jl_ptls_t ptls, jl_value_t *parent, 166 | jl_value_t **objs, size_t nobjs); 167 | ``` 168 | 169 | Here, `parent` is a reference to the current object, `objs` is a pointer 170 | to the start of an array of object references, and `nobjs` is the number 171 | of object references contained in that array. That array must be part of 172 | the object; it must not be allocated in static memory or on the stack. 173 | 174 | Unlike `jl_gc_mark_queue_obj()`, this function does not have a return 175 | value, as it does the requisite tracking itself. 176 | 177 | Calling this function will only require one slot on the mark stack, as 178 | opposed to the `nobjs` slots that individual calls to 179 | `jl_gc_mark_queue_obj()` would require, making it considerably more 180 | memory efficient. 181 | 182 | ## Managing foreign objects with custom layouts 183 | 184 | Foreign objects with custom layouts can define their own datatype through 185 | the `jl_new_foreign_type()` function: 186 | 187 | ``` 188 | typedef uintptr_t (*jl_markfunc_t)(jl_ptls_t ptls, jl_value_t *obj); 189 | typedef void (*jl_sweepfunc_t)(jl_value_t *obj); 190 | 191 | jl_datatype_t *jl_new_foreign_type( 192 | jl_sym_t *name, 193 | jl_module_t *module, 194 | jl_datatype_t *super, 195 | jl_markfunc_t markfunc, 196 | jl_sweepfunc_t sweepfunc, 197 | int haspointers, 198 | int large 199 | ); 200 | ``` 201 | 202 | The first three parameters of `jl_new_foreign_type` are the same as for 203 | regular data types; following are a pointer to a mark function 204 | (`markfunc`) and a pointer to a sweep function (`sweepfunc`), the latter 205 | of which can be null.
206 | 207 | The `haspointers` parameter should be non-zero if instances of the new 208 | datatype may contain references to Julia objects; the `large` parameter 209 | should be non-zero if the size of instances of the new datatype will be 210 | greater than the value returned by `jl_gc_max_internal_obj_size()` and 211 | zero otherwise. If instances can fall on either side of that limit, then two 212 | distinct foreign types need to be created, one for the case where the 213 | size is less than or equal and one for the case where it is larger than 214 | the value of `jl_gc_max_internal_obj_size()`. 215 | 216 | ``` 217 | size_t jl_gc_max_internal_obj_size(void); 218 | ``` 219 | 220 | *Performance impact:* Custom mark functions need to be called during the 221 | performance-critical mark loop of the garbage collector. In order to 222 | avoid overhead for the other cases, the code is engineered to consider 223 | such objects as the last possible option in the existing if-else chains. 224 | To accomplish that, such foreign types use the existing 225 | `jl_datatype_layout_t` structure, with `fielddesc_type` set to `3`, 226 | which is looked at after the other data types and the other alternatives 227 | for `fielddesc_type`. 228 | 229 | ### Mark functions for foreign objects 230 | 231 | The mark function `markfunc` gets passed a pointer to thread-local 232 | storage (`ptls`) 233 | and the object to be marked (which will be of the type defined through 234 | `jl_new_foreign_type()`). The `ptls` argument is an optimization so that 235 | `jl_get_ptls_states()` does not need to be called unnecessarily during 236 | the mark loop. 237 | 238 | The mark function implementation also uses `jl_gc_mark_queue_obj()` to 239 | mark objects, as with the `root_scanner` callback; however, in contrast to marking 240 | roots, the return value cannot be ignored.
Per object, the mark function 241 | should count how often `jl_gc_mark_queue_obj()` returns a non-zero value 242 | for its subobjects and return that number. If an object has no subobjects, 243 | the mark function should return zero. 244 | 245 | This information is relevant for the generational part of garbage 246 | collection. The return value of `jl_gc_mark_queue_obj()` is non-zero 247 | if a young generation object has been marked. When the mark function 248 | has been called for an old object and the mark function returns a 249 | non-zero value (thus showing how many young objects have been marked 250 | from the old one), the GC knows to update its internal data 251 | structures accordingly. 252 | 253 | For an example of this, see the `gcext` test in the Julia repository, 254 | which defines a couple of such custom mark functions. 255 | 256 | ### Sweep functions for foreign objects 257 | 258 | Sweep functions for foreign objects are similar to, but more limited 259 | than, finalizers, as they are not intended to replace finalizer 260 | functionality. Rather, they are meant to clean up complex memory 261 | structures allocated with raw malloc calls or operating system 262 | resources. They will be called during the sweep phase and must not have 263 | side effects that are visible to Julia. 264 | 265 | To enable sweep functions for a foreign object, the function 266 | `jl_gc_schedule_foreign_sweepfunc()` has to be called on the object, 267 | which has to be of a foreign type, and that foreign type has to be 268 | defined with a non-null sweep function `sweepfunc`. Without that call, 269 | the sweep function will not be called on this particular object. This is 270 | to avoid unnecessary overhead if not all objects of that type require 271 | extra sweep phase semantics. This function should be called at most once 272 | per object; if called multiple times, the sweep function may be invoked 273 | more than once on the given object.
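Returning briefly to the return-value convention for mark functions described above, the following is a minimal, self-contained sketch of the counting logic. It does not use the real Julia headers: `jl_value_t`, `jl_ptls_t`, and `jl_gc_mark_queue_obj()` are replaced here by mock stand-ins so that the code compiles on its own, and `pair_t` and `pair_markfunc` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* MOCK stand-ins (assumptions): the real types come from the Julia
 * headers and look different. They exist only to make the sketch
 * compile without julia.h. */
typedef struct mock_value { int is_young; int marked; } jl_value_t;
typedef void *jl_ptls_t;

/* MOCK of jl_gc_mark_queue_obj(): marks the object and returns
 * non-zero if and only if it belongs to the young generation. */
int jl_gc_mark_queue_obj(jl_ptls_t ptls, jl_value_t *obj)
{
    (void)ptls;
    obj->marked = 1;
    return obj->is_young;
}

/* Hypothetical foreign object holding two references to Julia objects. */
typedef struct { jl_value_t *left; jl_value_t *right; } pair_t;

/* Mark function: queue every subobject and return how many of those
 * calls reported a young object; zero if there are no subobjects. */
uintptr_t pair_markfunc(jl_ptls_t ptls, jl_value_t *obj)
{
    pair_t *p = (pair_t *)obj;
    uintptr_t young = 0;
    if (p->left != NULL && jl_gc_mark_queue_obj(ptls, p->left))
        young++;
    if (p->right != NULL && jl_gc_mark_queue_obj(ptls, p->right))
        young++;
    return young;
}
```

With the real API, a function shaped like `pair_markfunc` would be passed as the `markfunc` argument of `jl_new_foreign_type()`; the `gcext` test remains the authoritative example.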
274 | 275 | ``` 276 | JL_DLLEXPORT void jl_gc_schedule_foreign_sweepfunc(jl_ptls_t ptls, 277 | jl_value_t *obj); 278 | ``` 279 | 280 | ### Allocating foreign objects 281 | 282 | On the C side, such objects can be allocated using the call 283 | `jl_gc_alloc_typed()`; the function takes a pointer to the thread's 284 | thread-local storage, the desired size, and the foreign datatype as its 285 | arguments. 286 | 287 | ``` 288 | JL_DLLEXPORT void * jl_gc_alloc_typed(jl_ptls_t ptls, size_t sz, 289 | void *ty); 290 | ``` 291 | 292 | ## Conservative scanning 293 | 294 | Some external modules may require conservative scanning, especially 295 | of the stack. This was the case, for example, with our application 296 | involving the GAP computer algebra system. 297 | 298 | We note that conservative scanning should be avoided if at all possible; 299 | it is not intended as a way to avoid tracking Julia references (for which 300 | the `root_scanner` callback and custom marking functions offer efficient 301 | options if other approaches fail), but as a feature of last resort if 302 | integrating an existing codebase through other means is not viable. 303 | 304 | Conservative scanning must be enabled through a call to the following 305 | function: 306 | 307 | ``` 308 | void jl_gc_enable_conservative_scanning(void); 309 | ``` 310 | 311 | This function can be called from C code both before and after `jl_init()` 312 | and is thread-safe. Enabling this introduces a very small, but non-zero 313 | overhead, which is why it is not enabled by default. 314 | 315 | In order to handle conservative scanning, we need to expose the fact 316 | that Julia distinguishes between objects it manages itself (which we 317 | call "internal objects" in this document) and objects that it manages 318 | via "malloc()" or similar calls (these we call "external objects"). 
319 | 320 | The proposed functionality relies on calls to Julia to determine 321 | if a pointer is a reference to an internal object, but leaves it up 322 | to the author of the foreign code to determine this for external 323 | objects; to this end, we provide callbacks to notify foreign code 324 | of the allocation or deallocation of such objects. 325 | 326 | The accompanying function pointer types are: 327 | 328 | ``` 329 | typedef void (*jl_gc_cb_notify_external_alloc_t)(void *addr, size_t size); 330 | typedef void (*jl_gc_cb_notify_external_free_t)(void *addr); 331 | ``` 332 | 333 | The allocation callback is invoked with the address and size of the 334 | new object, the deallocation callback is invoked with the address of 335 | the object about to be freed. Allocation and deallocation is still 336 | managed by Julia. The intent here is that foreign code can track 337 | allocations and deallocations in a data structure of its own if 338 | needed. An example of this can be seen in the `gcext` test, where 339 | we use a balanced tree to track allocations. 340 | 341 | Note that registering such callbacks will only track allocations that 342 | occur *after* the callbacks have been set. We assume here that the client 343 | is only interested in tracking its own objects that may be stored in 344 | opaque stack frames, but not other Julia objects that may be passed in 345 | from Julia calls. If the client needs to track *all* allocations, then 346 | the callbacks *must* be registered before calling `jl_init()`. 347 | 348 | *Performance impact:* The overhead for the callbacks should be minimal, 349 | especially since the cost of allocating large objects through the system 350 | allocator and initializing them will dominate the allocation process. 351 | 352 | Note that some of these objects may not have a valid type field and 353 | especially in the context of conservative scanning, pointers to 354 | objects with invalid type fields may inadvertently be generated. 
In 355 | such a case, the validity of the type field should also be checked, 356 | e.g. with: `jl_gc_internal_obj_base_ptr(jl_typeof(obj)) != NULL` (see 357 | below for the semantics of this function). 358 | 359 | To determine whether a pointer points to an internal object, the 360 | following functions may be used: 361 | 362 | ``` 363 | jl_value_t *jl_gc_internal_obj_base_ptr(void *p); 364 | int jl_gc_is_internal_obj_alloc(jl_value_t *p); 365 | ``` 366 | 367 | The `jl_gc_internal_obj_base_ptr()` function returns `NULL` if the 368 | argument does not point to the beginning, the interior, or the end of an 369 | internal object. Otherwise, it returns a pointer to the beginning of the 370 | object it points to. The `jl_gc_is_internal_obj_alloc()` function is an 371 | optimized fast path version; it returns a non-zero value if and only if the 372 | argument is a valid internal object or if it points to memory reserved 373 | for the allocation of such objects. In the latter case, it is guaranteed 374 | that the type field of such an object does not contain a valid datatype. 375 | 376 | ## Performance evaluation 377 | 378 | In order to evaluate the changes for performance, we ran the system 379 | with no callbacks or foreign types installed against the Julia base 380 | benchmarks (namely, the "array", "collection", "micro", "shootout", 381 | "sparse", "string", and "tuple" suites) for both our changes and 382 | a recent version of the master branch. 383 | 384 | We did not observe any performance regressions in those benchmarks. 385 | While, due to the noisiness of our test system, spurious regressions 386 | cropped up occasionally (about a handful per run), none of them 387 | persisted for more than one run of the suite. 388 | 389 | Also, when testing for improvements, similar and similarly common 390 | performance changes occurred in the other direction, including spurious 391 | "regressions" of the master branch compared to our changed version.
392 | 393 | Finally, we ran a couple of specialized microbenchmarks (included below) 394 | designed to stress-test the garbage collector several times and observed 395 | the performance over several runs for both the master branch version and 396 | our changes; we did not observe significant differences in the 397 | distribution of `@btime` results. 398 | 399 | using BenchmarkTools 400 | 401 | function bencharr(n, m, m2, x, y) 402 | global t = [ (x, y) for i in 1:(n * n * m2) ] 403 | local a = [ [ (x, y) for j in 1:n ] for i in 1:n ] 404 | for i in 1:m 405 | a = map(outer -> map(inner -> (inner[2], inner[1]), outer), a) 406 | end 407 | end 408 | 409 | function fac(n) 410 | local result = BigInt(1) 411 | for i in 2:n 412 | result *= i 413 | end 414 | return result 415 | end 416 | 417 | function benchfac(n) 418 | local total = BigInt(0) 419 | for i in 1:n 420 | total += fac(i) 421 | end 422 | return total 423 | end 424 | 425 | print("Array allocations benchmark: ") 426 | @btime bencharr(200, 1000, 100, "x", "y") 427 | print("BigInt allocations benchmark: ") 428 | @btime benchfac(2000) 429 | 430 | -------------------------------------------------------------------------------- /Pkg3.md: -------------------------------------------------------------------------------- 1 | # JULEP 3 2 | 3 | - **Title:** Pkg3 4 | - **Authors:** Stefan Karpinski <>, Art Diky <> 5 | - **Created:** October 21, 2016 6 | - **Status:** work in progress 7 | 8 | ## Abstract 9 | 10 | Pkg3 is the working name for a next-generation replacement for Julia's built-in package manager, the current version of which is unofficially known as Pkg2 (introduced in Julia 0.2 to replace the original Pkg1). 
11 | 12 | ### Table of Contents 13 | - [JULEP 3](#julep-3) 14 | - [Abstract](#abstract) 15 | - [Rationale](#rationale) 16 | - [Depots](#depots) 17 | - [Immutability](#immutability) 18 | - [Environments](#environments) 19 | - [Using Environments](#using-environments) 20 | - [Project Environments](#project-environments) 21 | - [Packages](#packages) 22 | - [Registries](#registries) 23 | - [Versions & Compatibility](#versions--compatibility) 24 | - [Configuration](#configuration) 25 | - [Configuration Fragments](#configuration-fragments) 26 | - [Package metadata](#package-metadata) 27 | - [Version metadata](#version-metadata) 28 | - [Compatibility](#compatibility) 29 | - [Runtime Configuration](#runtime-configuration) 30 | - [Manifest](#manifest) 31 | - [Source Package File](#source-package-file) 32 | - [Registry Package File](#registry-package-file) 33 | - [Operations](#operations) 34 | - [Adding packages](#adding-packages) 35 | - [Synopsis](#synopsis) 36 | - [Example](#example) 37 | - [Pseudo-code](#pseudo-code) 38 | - [Dependency fixing](#dependency-fixing) 39 | - [Questions](#questions) 40 | - [Removing packages](#removing-packages) 41 | - [Synopsis](#synopsis) 42 | - [Example](#example) 43 | - [Pseudo-code](#pseudo-code) 44 | - [Updating & upgrading packages](#updating--upgrading-packages) 45 | - [Synopsis](#synopsis) 46 | - [Examples](#examples) 47 | - [Pseudo-code](#pseudo-code) 48 | 49 | ## Rationale 50 | 51 | There are a number of issues with the design of Pkg2, which necessitate a redesign and replacement: 52 | 53 | - Pkg2's METADATA repository format uses many small files to represent data, which leads to awful performance on many filesystems, especially on Windows. 54 | - Pkg2 uses a variety of ad hoc configuration formats which are simple but not particularly consistent. 55 | - Pkg2 identifies versions of packages by git SHA1 commit hashes. 
This forces the package manager to use git to acquire package versions and makes package installation and verification impossible without including the entire git history of a package – which can be impractical. 56 | - Some Julia packages have large objects in their git history, which users are forced to download even when they are installing more recent versions that no longer include these large objects. 57 | - Pkg2 makes replacing a package with another package of the same name with disjoint git history a nightmare. This happened when `Stats` was renamed to `StatsBase` and a new `Stats` package was created. The only practical way to resolve this situation was to delete all packages and start over. Moreover, versions of `StatsBase` from before the rename became uninstallable afterwards. 58 | - Pkg2 was designed to allow package development in the same location as package installation for usage. This design forces Pkg2 to use complex and subtle heuristics to try to determine when it is safe to update or modify installed packages. A large amount of code complexity stems from this design. 59 | - Pkg2's package version resolution is designed to depend only on requirements and version information in METADATA, *not* on the current set of installed package versions. This implies that any update potentially updates all packages to the latest available version. This is typically undesirable: one often wants to do much more conservative, targeted updates of a subset of installed packages. Pkg2's update behavior effectively assumes that the user has carefully and accurately curated their exact requirement of packages, and that package developers never break things – neither of which is typically true. 60 | - In Pkg2 *any* operation on packages invokes a full version resolution, not just explicit updates: adding or removing a new package updates all packages. This is unfortunate behavior for a package manager. 
It should be possible to add a new package with zero or minimal changes to pre-installed packages. It should always be possible to remove a package by simply removing it and its dependents. 61 | - Pkg2 provides little support for projects tracking the precise versions of libraries and packages that they have used. This makes reproducibility more challenging than it should be. 62 | - The `JULIA_PKGDIR` environment variable allows some amount of simulation of virtualenv-like "environments" – i.e. different sets of packages and language versions. This could be much better supported, however, and environment contents should ideally be easily committable and sharable between different projects and systems, at various levels of granularity. 63 | 64 | ## Depots 65 | 66 | A **depot** is a file system location where we keep infrastructure related to Julia package management: registries, libraries, packages, and environments. There are typically at least three of these: 67 | 68 | - **Standard depot:** default packages and libraries that ship with a specific version of Julia. This depot is strictly read-only. These versions of libraries and packages serve as a fallback when no other depots are available. If you delete or disable this depot as well, standard packages will be unavailable. Example: `/usr/local/share/julia/standard`. 69 | 70 | - **System depot:** package versions and libraries installed here are available to everyone on the system. They are typically only writable by administrators. If users want to add or upgrade packages, they will do so in their individual user depots. Example: `/usr/local/share/julia/system`. 71 | 72 | - **User depot:** package versions and libraries installed by a user. Example: `~/.julia/`. 73 | 74 | Note the lack of Julia versions in this scheme: a depot is expected to be shared between different Julia versions.
This should work because of the principle of immutability (see below): since we don't update versions of libraries or packages in place, installed copies can be shared between different versions of Julia without issues. Different sets of library and package versions are handled at the environment level. 75 | 76 | Each package depot contains the following directories: 77 | 78 | - **`registries`:** named registries describe sets of packages, versions and compatibility between them. 79 | - **`libraries`:** installed versions of libraries (e.g. `libcairo`, `libpango`). 80 | - **`packages`:** installed versions of Julia packages (e.g. `Cairo`, `DataFrames`, `JuMP`). 81 | - **`environments`:** named sets of versions of libraries and packages and global configuration. 82 | 83 | Some environment and/or Julia variable – `DEPOT_PATH` maybe? – will control the set of depots visible to a Julia process. The registries, libraries, packages, and environments visible to Julia are the union across all depots in the depot path. 84 | 85 | The set of registered packages visible to a Julia process is the union of all packages specified across all registries, merging specifications of the same package occurring in multiple registries by the following rules: 86 | 87 | - The set of known packages is the union across all registries. 88 | - The set of available versions of a package is the union across all registries. 89 | - If the same version of a package appears in multiple registries, all versions must match. 90 | - The registry with the largest registered version of a package determines its metadata; 91 | - If two different registries "tie" then the package metadata must match. 92 | 93 | The set of installed library versions is the union across depots. If the same library version occurs multiple times in the depot path, the first occurrence is used – different instances of the same library version may be different depending on how they are configured and installed.
The set of installed package versions is the union across depots. If the same package version occurs multiple times in the depot path, the first occurrence is used. If installed correctly, different installations of the same package should be identical. 94 | 95 | Each named environment specifies a set of specific library and package versions. These libraries and packages do not need to be installed in the same depot where the environment appears. They can be provided by another package depot, allowing preinstalled libraries and packages to be "inherited" from a system depot, for example. The default environment name is `v$(VERSION.major).$(VERSION.minor)`. This allows different versions of Julia to have different default environments. 96 | 97 | ## Immutability 98 | 99 | Installed libraries and packages are immutable: instead of updating libraries or packages in-place, once they are successfully installed, Pkg3 leaves them as-is until they are no longer needed. This requires a "cleanup" mechanism that does garbage collection of old, unused versions of libraries and packages. To that end, `Pkg3` will maintain a sorted `~/.julia_env.log` file tracking the paths of environment files they have used. During cleanup, if a path no longer points to a valid environment file, the entry is removed from `~/.julia_env.log`; if a path does point to a valid environment file, it is retained, and library and package versions referred to by it are considered to be in use. Any library or package versions that are not marked as in use are removed. When cleaning up a system depot, all user environment logs are scanned; when cleaning up a user depot, only that user's environment log is considered. 100 | 101 | ## Environments 102 | 103 | An **environment** captures a specific set of package and library versions and their global configuration. Pkg2 has some limited support for changing environments using the `JULIA_PKGDIR` environment variable.
Pkg3 makes named environments and project-local environments a primary part of its design, making the invocation of Julia with different sets of libraries and packages far more convenient. It also standardizes how to record the names and versions of libraries and packages that are used, improving reproducibility. 104 | 105 | In Pkg2, package operations like `Pkg.add`, `Pkg.rm`, and `Pkg.update` are somewhat inconsistent about whether they operate on the currently running Julia process or not. This is because different actions have different feasibility with respect to the current session: it's possible to install or update a package before it is loaded, but it is impossible to remove or update an already-loaded package. Thus, performing operations on the set of available packages *in general* requires a restart of the process before it can take effect, but installing and then loading a new package without restarting the current process is common and useful. 106 | 107 | In Pkg3, general operations on environments are not done in the Julia process using an environment. Instead, they are done through a standalone process, which (although it is implemented in Julia) does not operate within the environment that it manipulates. The most common operation, however – installing and loading a new package – will typically be done implicitly and automatically in an interactive Julia session. In other words, when the user does `using XYZ` in the REPL, if `XYZ` is not installed, the REPL will prompt the user if they want to install `XYZ` and its dependencies, and if they agree, it will install and then load it. Since this is the most common operation and it can be done without restarting the current Julia process, it makes sense that it be handled specially.
When the user wants to remove packages from or update packages in an environment, they will instead invoke an external package management mode (`julia --pkg`?), which makes it clear that changes will not affect any currently running Julia sessions. The impact on usability is a strict improvement: 108 | 109 | - Adding packages and loading them is easier since one simply does `using XYZ` and answers interactive prompts. 110 | - Removing and upgrading packages is no more difficult since it previously required restarting the current Julia process anyway, and is less confusing since the requirement to restart is explicit: running a separate process clearly doesn't affect the current one. 111 | 112 | ### Using Environments 113 | 114 | When Julia starts, it is given an environment: by default, by name, or by path: 115 | 116 | - `julia`: use the default named environment – `v$(VERSION.major).$(VERSION.minor)`. 117 | - `julia --env=abc`: use the environment named "abc", searched for in the depot path. 118 | - `julia --env=.`: use the local project environment (see below). 119 | - `julia --env=./proj`: use the project environment of the directory `./proj`. 120 | - `julia --env=./env.toml`: use environment described by the file `./env.toml`. 121 | 122 | An environment spec with no slash is taken to be a named environment – except for the special name `.` which indicates using the current project environment. An environment spec with a slash is taken to be a path (relative or absolute): if the path is a directory, it is interpreted as a project and the project environment is used; if the path is a file, it is loaded as an environment specification (in TOML format, see "Configuration" below). 123 | 124 | An environment spells out exactly what version of each of a set of packages and libraries to use (version, hash, path, etc.).
A Julia process can be "open" or "closed" with respect to its environment: 125 | 126 | - **Open:** packages that are not in the environment can be loaded. They will be resolved greedily in the order they are loaded, choosing the highest installed version that satisfies the requirements of the environment and all loaded packages. If no satisfactory version is installed, but some registered version exists that would satisfy all requirements, the user is prompted to install and use it. 127 | - **Closed:** packages that are not in the environment cannot be loaded. 128 | 129 | By default, Julia runs in open mode. When testing or deploying, however, Julia should default to closed mode to help ensure that a project hasn't inadvertently used packages that aren't recorded as dependencies. Since the project configuration also records which packages are direct dependencies, closed mode could enforce that project code only uses direct dependencies and indirect dependencies are only loaded indirectly. Note that this also helps address the problem that different packages may refer to different packages by the same top-level name. 130 | 131 | ### Project Environments 132 | 133 | The environment specification of a project is split into three files: `Config.toml`, `Manifest.toml`, and `Local.toml`. (Each file name may also be prefixed with `Julia`, in which case the non-prefixed file, if it exists, is ignored.) The purpose of these files is to separate the environment into three parts: 134 | 135 | - `Config.toml`: manual configuration, checked into version control (input) 136 | - `Manifest.toml`: generated information, checked into version control (output) 137 | - `Local.toml`: generated information, not checked into version control (by-product) 138 | 139 | Accordingly, `.gitignore` for Julia projects should include entries for `/Local.toml` and `/JuliaLocal.toml` so that those files are ignored by version control.
The `Config.toml` file controls what subset of environment information goes into `Manifest.toml` versus what goes into `Local.toml` – everything ends up in one or the other. Examples of different scenarios with various choices of manifest subsets: 140 | 141 | - A project meant to run on a single system (or homogeneous systems) may choose to save everything in the manifest, including exact versions of packages and libraries, paths to them, even hashes of them, so that a complete record is checked into the project repository. 142 | - A project meant to run on different systems, on the other hand, may choose to check specific project versions and hashes into version control, but not library information, using libraries available on each system. 143 | - Published packages will generally not check specific dependency versions into version control since these will differ among developers and users. They will, however, check in general dependency version requirements (e.g. `XYZ = "1.2-1.9"`). During early development, however, it may be desirable to check in more detail so that different developers can stay in sync more easily. 144 | 145 | When using the current project environment, specified by starting Julia with the `--env=.` flag, the project directory is searched for by looking in the current directory and each parent directory for `JuliaConfig.toml` or `Config.toml`. If a directory is found containing a file by this name, it is considered to be the project root and the config, manifest and local files are loaded from there. 146 | 147 | ## Packages 148 | 149 | Packages continue to work much as they have previously with a few exceptions: 150 | 151 | 1. Each package has `Config.toml` and `Manifest.toml` files. 152 | 2. `Config.toml` contains an entry giving the package a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier). 153 | 3. Package versions are identified by a hash of a source tree instead of a git commit. 154 | 4.
Eventually packages will not need to be git repositories. 155 | 156 | UUIDs for registered packages will be assigned and when new packages are generated, a UUID will be created (this should happen even for private, unregistered packages). UUIDs will generally not be user-facing, but they are used internally to identify packages in registries and environment files. The purpose of UUIDs is to allow renaming of packages and moving of packages between different registries. A couple of scenarios to consider before arguing against using UUIDs: 157 | 158 | - The `Stats` / `StatsBase` situation: `Stats` was renamed to `StatsBase` and a new package also called `Stats` was created. This broke many people's package installations and caused a great deal of grief. With packages identified by UUID, this kind of rename is completely unproblematic. 159 | - Two different packages may be created in different private registries with the same name. If these are both later made public, they may need to be renamed, but some way is still needed of knowing which one an old environment using one of them was referring to. Version hashes should be unique, but environments can record unregistered states of packages: unless every tree hash that could ever have been recorded in an environment using a package is known, it's impossible to figure out which package was used. If packages have UUIDs and these are recorded in environments, then it will always be possible to know which package was meant. 160 | 161 | Identifying package versions by hashes of *source trees* rather than git commit hashes allows us to acquire and verify package versions without necessarily using git, and even with git it makes it easier to support shallow cloning and history rewriting, as long as the source tree of a published version doesn't change. The git style SHA1 tree hash is one means of identifying a source tree, but we may want to support other hashes since SHA1 is no longer considered secure.
We could, for example, also publish SHA2-512 hashes for the source trees of package versions, alongside SHA1 hashes, allowing smooth transitioning to a more secure hash. With multiple coexisting ways of acquiring package versions, we can also smoothly transition away from using git alone for delivery of package code. 162 | 163 | ## Registries 164 | 165 | A **registry** is a Pkg3 replacement for the METADATA repository. Crucially, Pkg3 supports using multiple registries, and there will be "cathedral" and "bazaar" style public registries, and private registries will be supported. Private registries allow organizations to internally register private packages and versions which can refer to and depend on public packages. Registries provide four kinds of information: 166 | 167 | 1. Bidirectional many-to-many mapping between package names and UUIDs. 168 | 2. A list of versions for each package, identified by their source tree hash. 169 | 3. Version dependency and compatibility information. 170 | 4. Where to get each package version. 171 | 172 | The latest UUID associated with a name is the current one; other UUIDs were previous packages associated with that name. A UUID may have multiple names associated with it over time, but the latest one is current. If the same name occurs in different registries, referring to different UUIDs, then there is a name conflict which must be resolved interactively as needed. For example, if a user asks to add `XYZ` but the name refers to different packages in different registries, then the user should be prompted for which one they want. 173 | 174 | Each version is associated with a specific source tree, unlike Pkg2 where each version is associated with a git commit. This allows us to acquire and verify package versions without necessarily using git, and even with git it makes it easier to support shallow cloning or history rewriting, as long as the source trees of published versions don't change.
The git style SHA1 tree hash is one means of identifying a source tree, but we may want to support other means since SHA1 is no longer considered secure. We could, for example, also publish SHA2-512 hashes for the source trees of package versions, thereby allowing them to be securely verified even though SHA1 is no longer secure. 175 | 176 | ## Versions & Compatibility 177 | 178 | Expressing compatibility between various versions of packages is complicated by the fact that compatibility claims for a particular version can either be: 179 | 180 | - mistakenly incorrect when published, or 181 | - correct when published but so broad that they later become incorrect. 182 | 183 | Pkg2 allows and even encourages very loose dependency declarations and deals with both of the above situations by allowing compatibility claims to be adjusted after the fact. Dependencies can and are expected to be changed in METADATA to adjust for mistakes and invalidation. This causes significant complexity and confusion, however: the dependencies of a package version according to its own immutable source may not match the current dependencies registered for it in METADATA – which are still potentially evolving. Because of this, Pkg2 contains tricky logic about which compatibility claims take precedence – those in the source tree or those in METADATA. These rules are especially complicated since Pkg2 supports development of packages where they are installed, further muddying what the definitive record of compatibility is. 184 | 185 | In Pkg3, a package version's compatibility claims are immutable. While compatibility claims may still be incorrect, they cannot be changed, only superseded by a newer version. Overly broad compatibility claims cannot, by design, be expressed in the first place. In this design, any invalidation of claimed compatibility can only stem from another package's failure to follow [semantic versioning](http://semver.org/) correctly. 
186 | 187 | However, since this will certainly occur in practice, there will need to be a mechanism to remedy it: 188 | 189 | * If the compatibility claims were too restrictive, a new patch with wider version compatibility ranges can be published. Pkg3's version resolution will favor the most recent patch very strongly: unless you explicitly ask for an earlier patch specifically, a freshly installed or updated package will always be the latest patch in its major-minor series. Package developers should follow semantic versioning strictly and *only* include bug fixes in patch releases: patches should neither break existing features nor introduce new features. 190 | 191 | * However, if the compatibility claims were too broad, tagging a new version may not necessarily remedy the problem as the dependency resolver may decide to use the older (broken) version, in order to obtain compatibility with another package. In this case, the invalid compatibility claims will need to be revoked by the registry. 192 | 193 | Compatibility claims in Pkg3 are expressed at *exactly* minor version granularity. This may be easiest to explain starting with the textual form. In configuration files, sets of compatible versions are expressed using arrays of string literals (in TOML format), each string being of one of the following forms: 194 | 195 | - **minor version:** `"a.b"` includes versions with `major == a && minor == b`; 196 | - **version range:** `"a.b-a.c"` includes versions with `major == a && b ≤ minor ≤ c`; 197 | - **negated patch:** `"!a.b.c"` excludes versions with `major == a && minor == b && patch == c`. 198 | 199 | A list of terms expresses a set of package versions: the union of versions included in minor version strings and version range strings, minus the specific versions excluded by negated patch strings. 
In other words, the version list `["1.2-1.4", "!1.2.5", "2.0"]` includes any version such that 200 | 201 | ```julia 202 | (major == 1 && (2 ≤ minor ≤ 4) && !(minor == 2 && patch == 5)) || (major == 2 && minor == 0) 203 | ``` 204 | 205 | Compatibility lists should be normalized according to the following rules: 206 | 207 | - versions and ranges should be mutually disjoint; 208 | - versions and ranges should appear in sorted order by major and minor version; 209 | - versions and ranges which can be coalesced should be combined into a single range; 210 | - negated patches should follow the version or range in which they are contained, separated from it only by smaller negated patches (i.e. negated patches are sorted by major, minor and patch numbers). 211 | 212 | Following these rules, each possible set of compatible versions can be expressed in exactly one way. Here are some examples of normalized version sets: 213 | 214 | ```toml 215 | ["1.2"] 216 | ["1.2", "!1.2.5"] 217 | ["1.2-1.3", "!1.2.5"] 218 | ["1.2-1.4", "!1.2.5", "2.0"] 219 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0"] 220 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0-2.1"] 221 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0-2.5", "3.0"] 222 | ``` 223 | 224 | Compatibility sets include an unbounded number of potential future patches, but include a finite number of minor versions. A package should not declare compatibility with a minor version series unless some version in that series has actually been published – this guarantees that compatibility can (and should) be tested. If a new compatible major or minor version of a package is released, this should be reflected by publishing a new patch that expands the compatibility claims. If a new patch of an otherwise compatible major/minor version series contains a bug that breaks compatibility, a new patch of each package should be released: a patch of the buggy package, fixing the bug, and a patch of the other package, excluding the buggy version from its compatibility claims.
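
The three term forms and the "union minus negated patches" rule described above can be sketched as a membership test. This is only an illustration under stated assumptions: the function name `in_version_set` is hypothetical (it is not part of Pkg3), terms are assumed to be well formed, and versions are represented with Julia's built-in `VersionNumber`.

```julia
# Hypothetical helper: is version `v` in the set described by `terms`?
# Terms are "a.b", "a.b-a.c" or "!a.b.c", as described in the text.
function in_version_set(v::VersionNumber, terms::AbstractVector{<:AbstractString})
    included = false
    for term in terms
        if startswith(term, "!")
            # negated patch "!a.b.c": excludes exactly that major/minor/patch
            n = VersionNumber(term[2:end])
            (v.major, v.minor, v.patch) == (n.major, n.minor, n.patch) && return false
        elseif occursin('-', term)
            # version range "a.b-a.c": same major, minor between b and c
            lo, hi = VersionNumber.(split(term, '-'))
            included |= v.major == lo.major == hi.major && lo.minor ≤ v.minor ≤ hi.minor
        else
            # minor version "a.b": same major and minor, any patch
            m = VersionNumber(term)
            included |= v.major == m.major && v.minor == m.minor
        end
    end
    return included
end
```

With the list `["1.2-1.4", "!1.2.5", "2.0"]`, this sketch accepts `v"1.3.7"` and `v"2.0.1"` and rejects `v"1.2.5"` and `v"2.1.0"`. It does not check that a list is normalized.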
225 | 226 | ## Configuration 227 | 228 | Pkg3 uses [TOML](https://github.com/toml-lang/toml) for configuration files. Several other projects have adopted this format: see [Cargo](http://doc.crates.io/manifest.html) and [PEP 518](https://www.python.org/dev/peps/pep-0518/) for example. This [format comparison](https://github.com/toml-lang/toml#comparison-with-other-formats) has some thoughts and justifications for using this format over other common configuration formats. The basic justification is: 229 | 230 | - compared to **JSON** it is more human readable and writeable 231 | - compared to **YAML** it is far simpler to parse and understand 232 | - compared to **INI** it is very similar but standardized 233 | - compared to **XML** it is… hah, no. 234 | 235 | All said, TOML seems to be the most reasonable format for simple, human-readable configuration files. An implementation of TOML parsing and printing in Julia can be found [here](https://github.com/wildart/TOML.jl). There are a few other implementations floating around, and this version need not be the one we adopt, but it has been used for experimentation during the design process so it should handle the formats discussed in what follows. 236 | 237 | ### Configuration Fragments 238 | 239 | We'll begin by describing certain types of configuration fragments. Environments and registries use these fragments in similar ways. TOML headers are absolute, not relative, which makes describing fragments a bit awkward. To address this, consider sections to be implicitly relative: if a fragment has a header `[header]`, consider it relative to wherever it occurs, so if that fragment were used in a section called `[section]` then the header would actually be `[section.header]`. 240 | 241 | #### Package metadata 242 | 243 | High-level description of a package: its UUID, name, license, authorship, where to get it, etc. This will appear in a package's configuration file and be copied into any registries that the package appears in.
244 | 245 | ```toml 246 | name = "Example" 247 | uuid = "86d33384-d511-4271-be88-8c3e434c707e" 248 | license = "MIT" 249 | authors = [ 250 | "Jane Q. Programmer ", 251 | "Jack X. Developer ", 252 | ] 253 | description = "Example package." 254 | keywords = ["example", "fake", "unreal"] 255 | documentation = "https://docs.github.io/Example.jl" 256 | homepage = "https://example.com/Example.jl" 257 | repository = "https://github.com/ExampleOrg/Example.jl" 258 | ``` 259 | 260 | #### Version metadata 261 | 262 | This describes a particular version of a package. 263 | 264 | ```toml 265 | version = "1.2.3" 266 | SHA1 = "739ea886f7ae45ef27f7c0a2ea2bc25d59d40fd2" 267 | SHA2-512 = """ 268 | 45d8153f80a301a890d5da67592ddf42fb96c4cd3945998386d0293dcf80b44d 269 | c9c8499c6e1ba4068381ac5bb243561de3e9c25e8989e949d56e8438085a9a22 270 | """ 271 | ``` 272 | 273 | Note that the string for a SHA2-512 hash value is allowed to contain extra whitespace, including newlines. This improves the readability of files containing long hash values by avoiding overly long lines. The hash value is a hash of the source tree, computed the way trees are hashed in git, but possibly with a different hash function. Thus, the SHA1 tree hash is the same as the tree name in git, allowing us to retrieve the source version. 274 | 275 | #### Compatibility 276 | 277 | The compatibility section expresses which libraries and packages a project directly interacts with, either as requirements or "optional dependencies" – i.e. packages that this package has some special code for, only to be loaded if that other package is also loaded. Only direct dependencies and optional packages are specified in the compatibility section. Any indirect dependencies are strictly the concern of the packages that depend on them. Thus, if `Required` depends on `Indirect`, we cannot constrain the version of `Indirect` here, although `Required` can.
Likewise, if a new version of `Required` comes out that doesn't use `Indirect` anymore, and we upgrade to that, the package manager is free to get rid of `Indirect`. 278 | 279 | ```toml 280 | [library.libXYZ] 281 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41" 282 | versions = "2.3-2.5" 283 | 284 | [package.Required] 285 | uuid = "85241492-0f92-400a-8719-bdc0424991f7" 286 | versions = ["1.2-1.3", "!1.2.5"] 287 | 288 | [package.Optional] 289 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d" 290 | versions = "3.7" 291 | optional = true 292 | ``` 293 | 294 | The last component of the header is the library or package name, while the `uuid` field gives its UUID – this unambiguously identifies the package. The name is what the local project will refer to and load the package or library as – this should probably match what it's published as, although we may want to allow publishing under multiple names simultaneously. The `versions` field is either a string or an array of strings which specifies a set of compatible versions, as described in "Versions & Compatibility" above. 295 | 296 | #### Runtime Configuration 297 | 298 | Runtime configuration sections allow projects to set global configuration flags to be passed to libraries and packages. This section only makes sense at a project level since there can only be one source of configuration for a given library or package – i.e. libraries and packages cannot configure other libraries or packages. 299 | 300 | ```toml 301 | [library.libXYZ] 302 | backend = "abc" 303 | knob = 1.5 304 | 305 | [package.Required] 306 | numbers = [4, 8, 15, 16, 23, 42] 307 | 308 | [package.Indirect] 309 | fiddle = true 310 | ``` 311 | 312 | A parsed dictionary representation of a package's configuration will be passed to the package's `__init__` method when it is loaded, allowing a project to control the global runtime configuration of packages. It remains to be determined how runtime configuration data will be passed to libraries.
Packages may not provide runtime configuration of other packages since packages (by definition) are projects that are intended to be reusable by other projects and are thus not the primary project. Runtime configuration may be provided for non-top-level dependencies (e.g. `Indirect` in the above fragment). 313 | 314 | #### Manifest 315 | 316 | The manifest fragment records all the details of which libraries and packages are included in a Julia environment. The information should be kept by the running Julia process so that we can save it to a manifest file. Not all of the data will be appropriate to commit for all kinds of projects, so these data may be split between different files – some to be checked into version control and some strictly local. 317 | 318 | ```toml 319 | [library.libXYZ] 320 | version = "2.3.4" 321 | path = "/home/user/.julia/libraries/libXYZ/2.3.4" 322 | mtime = 2016-10-20T18:28:56.299 323 | CRC32C = "3ba18fe1" 324 | SHA1 = "d2672146a1aca6023073074d765a32d7eb298baf" 325 | SHA2-512 = """ 326 | 981702a057faa649b7fa24337a67e0d6e8af258f81d0ed8ce90775cdfe0942c6 327 | d18ce0b5747e5fb1123cceb65b1074a9ba20f788e7cbacc7e824bac043f80208 328 | """ 329 | 330 | [package.Required] 331 | version = "1.2.8" 332 | path = "/usr/julia/system/packages/Required/1.2.8" 333 | mtime = 2016-10-20T18:29:55.605 334 | CRC32C = "d1a6296e" 335 | SHA1 = "982d4e4e0f728e7e0416472ffb394250c7afd1aa" 336 | SHA2-512 = """ 337 | f991d247834effca8ce7114b7100d191d259abf36bbe6a1cf03382a8e1a51171 338 | 0c107a0a7b5a5dd21cfd304e7e5525fc2287cc255de15f1c7d4f33ac86990e85 339 | """ 340 | 341 | [package.Optional] 342 | version = "3.7.2" 343 | path = "/home/user/.julia/packages/Optional/3.7.2" 344 | mtime = 2016-10-20T18:35:29.124 345 | CRC32C = "595b180b" 346 | SHA1 = "ff1ca382d0f905ce9e75fc829cfa4419123c0491" 347 | SHA2-512 = """ 348 | 904b16f8cea76f8feb04526983a42a4b11194a840223976497f85e59c0948c3c 349 | 3a4ad1c0c5f1b7f61734f4f8cfee74869693fe6be56e56ca9e54398e3ea06765 350 | """ 351 | 352 |
[package.Indirect] 353 | version = "1.5.3" 354 | path = "/usr/julia/system/packages/Indirect/1.5.3" 355 | mtime = 2016-10-21T10:42:25.366 356 | CRC32C = "2ffefb96" 357 | SHA1 = "8182d2ea3d4427eccc7e968923cb1bf6affb74c8" 358 | SHA2-512 = """ 359 | 7cc5a55bf2f55f4ce95d4d63594bb5d2c468a41c552eb6c5d29a9ffcb8a8b40f 360 | 665b09748acc0cf3af9eeef81f55805269b86e9f26e32ede03c11d2043bf3f2d 361 | """ 362 | ``` 363 | 364 | ### Source Package File 365 | 366 | Package configuration includes package metadata and compatibility sections for libraries and packages: 367 | 368 | ```toml 369 | name = "Example" 370 | uuid = "86d33384-d511-4271-be88-8c3e434c707e" 371 | license = "MIT" 372 | authors = [ 373 | "Jane Q. Programmer ", 374 | "Jack X. Developer ", 375 | ] 376 | description = "Example package." 377 | keywords = ["example", "fake", "unreal"] 378 | documentation = "https://docs.github.io/Example.jl" 379 | homepage = "https://example.com/Example.jl" 380 | repository = "https://github.com/ExampleOrg/Example.jl.git" 381 | 382 | [library.libXYZ] 383 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41" 384 | versions = "2.3-2.5" 385 | 386 | [package.Required] 387 | uuid = "85241492-0f92-400a-8719-bdc0424991f7" 388 | versions = ["1.2-1.3", "!1.2.5"] 389 | 390 | [package.Optional] 391 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d" 392 | versions = "3.7" 393 | optional = true 394 | ``` 395 | 396 | ### Registry Package File 397 | 398 | Each registered package has its own file (name TBD, but probably `Example.toml`), describing the package, all its registered versions, and their compatibility and requirements on other libraries and packages. 399 | 400 | ```toml 401 | name = "Example" 402 | uuid = "86d33384-d511-4271-be88-8c3e434c707e" 403 | license = "MIT" 404 | authors = [ 405 | "Jane Q. Programmer ", 406 | "Jack X. Developer ", 407 | ] 408 | description = "Example package." 
409 | keywords = ["example", "fake", "unreal"] 410 | documentation = "https://docs.github.io/Example.jl" 411 | homepage = "https://example.com/Example.jl" 412 | repository = "https://github.com/ExampleOrg/Example.jl.git" 413 | 414 | [[version]] 415 | version = "1.2.3" 416 | SHA1 = "739ea886f7ae45ef27f7c0a2ea2bc25d59d40fd2" 417 | SHA2-512 = """ 418 | 45d8153f80a301a890d5da67592ddf42fb96c4cd3945998386d0293dcf80b44d 419 | c9c8499c6e1ba4068381ac5bb243561de3e9c25e8989e949d56e8438085a9a22 420 | """ 421 | 422 | [version.library.libXYZ] 423 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41" 424 | versions = "2.3-2.5" 425 | 426 | [version.package.Required] 427 | uuid = "85241492-0f92-400a-8719-bdc0424991f7" 428 | versions = ["1.2-1.3", "!1.2.5"] 429 | 430 | [version.package.Optional] 431 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d" 432 | versions = "3.7" 433 | optional = true 434 | 435 | [[version]] 436 | version = "1.2.4" 437 | SHA1 = "e92729c0e7c23d9f83fadba3e197ab9b5ddd9791" 438 | SHA2-512 = """ 439 | fd22289bb2440e9d6c112ff4b33e36183a792edafb2cd96eb688ef931faddf9c 440 | 81d4a7a544921bc3c5d79aa74db0a163fa8f75f57c6fb603810dd3d51e17ba2e 441 | """ 442 | 443 | [version.library.libXYZ] 444 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41" 445 | versions = "2.3-2.6" 446 | 447 | [version.package.Required] 448 | uuid = "85241492-0f92-400a-8719-bdc0424991f7" 449 | versions = ["1.2-1.4", "!1.2.5", "2.0"] 450 | 451 | [version.package.Optional] 452 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d" 453 | versions = ["3.7", "!3.7.3"] 454 | optional = true 455 | ``` 456 | 457 | This format is pretty verbose. We could design a custom compression scheme for this format, aggregating information across multiple versions of the same package, or simply use general purpose compression. General purpose compression would be easier, certainly, but would still require parsing of a potentially very large number of version sections once they're uncompressed. 
A custom compression scheme could support faster parsing of logically compressed data, allowing the package manager to query the compressed data as-is. 458 | 459 | ## Operations 460 | 461 | In this section, we go through various operations on the set of packages in an environment. This supposes a `pkg>` REPL mode that has command-like syntax. For some operations, we'll provide pseudo-code, which is not intended to actually work or even to use real operation names, but to suggest the general approach. We distinguish top-level dependencies of a project – i.e. packages that appear in `Config.toml` with name, UUID, and compatible versions – from indirect dependencies which do not appear in `Config.toml` but do appear in `Manifest.toml` because they are recursively depended on by top-level dependencies. Each pseudo-code snippet has an implicit preamble like this: 462 | 463 | ```julia 464 | cfg₀ = load("Config.toml") 465 | env₀ = merge(load("Manifest.toml"), load("Local.toml")) 466 | ``` 467 | 468 | There is a similar postamble saving cfg₁ back to `Config.toml` and env₁ to `Manifest.toml` and `Local.toml`, as determined by the configuration splitting those files (TBD = to be designed). 469 | 470 | ### Adding packages 471 | 472 | #### Synopsis 473 | 474 | ``` 475 | pkg> add p₁ [=v₁] p₂ [=v₂] … 476 | ``` 477 | 478 | Add packages p₁, p₂, … as top-level dependencies of the current environment, adding version constraints as indicated. 479 | 480 | #### Example 481 | 482 | ``` 483 | pkg> add Foo Bar=1 Baz=2.3 Qux=4.5.6 484 | ``` 485 | 486 | This command installs `Foo` at any version, `Bar` at major version 1, `Baz` at major/minor version 2.3, and `Qux` at exactly version 4.5.6. Corresponding constraints on these packages are added to `Config.toml`.
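
How each argument in the example maps to a constraint can be sketched as follows. `parse_add_arg` is a hypothetical helper, not a Pkg3 function; it assumes the `name=version` spelling used in the example and represents a constraint as a tuple of the version components that were given, so the tuple length records the granularity (none, major, major/minor, or exact).

```julia
# Hypothetical parser for one `pkg> add` argument, e.g. "Foo", "Bar=1", "Qux=4.5.6".
# Returns `name => components`, where `components` is a tuple of integers.
function parse_add_arg(arg::AbstractString)
    parts = split(arg, '=')
    name = String(parts[1])
    # No "=version" part: any version is acceptable.
    length(parts) == 1 && return name => ()
    nums = parse.(Int, split(parts[2], '.'))
    return name => Tuple(nums)
end

# The arguments from the example above.
constraints = map(parse_add_arg, ["Foo", "Bar=1", "Baz=2.3", "Qux=4.5.6"])
```

A resolver could then treat an empty tuple as "any version", one component as a major-version constraint, two as a major/minor constraint, and three as an exact version, matching the prose above.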
487 | 488 | #### Pseudo-code 489 | 490 | ```julia 491 | cfg₁ = add(cfg₀, p₁ => v₁, p₂ => v₂, …) 492 | env₁ = resolve(cfg₁, env₀, fix = [:all|:top|:none]) 493 | ``` 494 | 495 | #### Dependency fixing 496 | 497 | There are three available strategies for keeping dependencies fixed when adding top-level packages: 498 | 499 | 1. **Fix all:** Only extend env₀ – i.e. env₁ ⊇ env₀. No versions in the manifest are changed, only new packages are added to it. 500 | 2. **Fix top:** Only allow changing indirect dependencies, not top-level dependencies. I.e. don’t change the versions of any packages that appear in cfg₀ – the installed versions of packages that aren’t directly used by the project are fair game to change (and such packages may be added to or removed from the environment). 501 | 3. **Fix none:** add, remove, update any packages to satisfy cfg₁, but only change what you have to. 502 | 503 | It is important to note that unlike Pkg2, with all strategies, even `fix = :none`, package versions are never changed unnecessarily. If you *also* want to upgrade packages to newer versions, you can do an upgrade operation before or after doing the add operation. 504 | 505 | #### Questions 506 | 507 | Do we really need multiple strategies, or can we just pick one of them? 508 | 509 | If the operation fails, what state should `Config.toml` and `Manifest.toml`, etc. be left in? 510 | 511 | ### Removing packages 512 | 513 | #### Synopsis 514 | 515 | ``` 516 | pkg> rm p₁ p₂ … 517 | ``` 518 | 519 | Remove top-level packages p₁, p₂, … from the current environment. 520 | 521 | #### Example 522 | 523 | ``` 524 | pkg> rm Foo Qux 525 | ``` 526 | 527 | Remove the packages `Foo` and `Qux` and any indirect dependencies that are only installed because of them. If any top-level packages recursively depend on them (this can be direct or indirect via indirect dependencies, even though that's a somewhat strange situation), we could prompt the user if they want to remove those as well.
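
Finding the indirect dependencies that are "only installed because of" the removed packages amounts to a reachability computation over the dependency graph. A rough sketch follows; it assumes a hypothetical `deps` dictionary mapping each installed package name to the names of its direct dependencies (the proposal does not define such a structure), and the function name `still_needed` is likewise made up for illustration.

```julia
# Hypothetical sketch: which installed packages are still needed, given the
# remaining top-level packages `tops` and a map `deps` of direct dependencies?
function still_needed(tops, deps::Dict{String,Vector{String}})
    keep = Set{String}()
    stack = collect(String, tops)
    while !isempty(stack)
        p = pop!(stack)
        p in keep && continue          # already visited
        push!(keep, p)
        append!(stack, get(deps, p, String[]))
    end
    return keep
end

# Example graph: `Foo` and `Bar` are top-level; `Shared` is used by both.
deps = Dict(
    "Foo" => ["Shared", "OnlyFoo"],
    "Bar" => ["Shared"],
    "Shared" => String[],
    "OnlyFoo" => String[],
)
# After `rm Foo`, only `Bar` remains top-level.
remaining = still_needed(["Bar"], deps)
```

Everything installed but not in `remaining` (here `Foo` and `OnlyFoo`) can be dropped, while `Shared` stays because `Bar` still depends on it.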
528 | 529 | #### Pseudo-code 530 | 531 | ```julia 532 | cfg₁ = rm(cfg₀, p₁, p₂, …) 533 | env₁ = resolve(cfg₁, env₀, fix = :all) 534 | ``` 535 | 536 | For package removal, it’s always possible to leave all remaining packages at the same version. Just remove p₁, p₂, …, and any indirect dependencies that aren’t necessary anymore. What remains is always a coherent set of packages. 537 | 538 | ### Updating & upgrading packages 539 | 540 | I'm proposing that we distinguish between "updating" and "upgrading" packages: an update is a bug-fix (patch) version bump while an upgrade is a more significant change in version. The intuition is that when we update packages, there are essentially two things we might want: 541 | 542 | - **Update:** "Give me any bug fixes you've got but don't break my code." 543 | - **Upgrade:** "Install the latest version and if it breaks some stuff, I'll fix it." 544 | 545 | #### Synopsis 546 | 547 | ``` 548 | pkg> [update|upgrade] p₁ p₂ … 549 | ``` 550 | 551 | Update or upgrade the packages p₁ p₂ … or all packages if none are specified. Update bumps listed packages and all of their recursive dependencies to the latest patch release of the major/minor version they're currently at; if indirect dependencies must be upgraded, they may be, but only if needed to get a bug fix release of something else. Upgrade bumps all listed packages and their recursive dependencies to the latest version compatible with `Config.toml`. 552 | 553 | #### Examples 554 | 555 | ``` 556 | pkg> update 557 | ``` 558 | 559 | Update all packages to the latest bugfixes. 560 | 561 | ``` 562 | pkg> update Bar Baz 563 | ``` 564 | 565 | Update `Bar` and `Baz` and all their dependencies to the latest bugfix releases. 566 | 567 | ``` 568 | pkg> upgrade 569 | ``` 570 | 571 | Upgrade all packages to their latest versions. 572 | 573 | ``` 574 | pkg> upgrade Bar Baz 575 | ``` 576 | 577 | Upgrade `Bar` and `Baz` and all their dependencies to their latest versions.
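
The difference between the two operations can be illustrated for a single package in isolation. This is a deliberately simplified sketch: `update_target` and `upgrade_target` are hypothetical names, the list of available versions is made up, and the sketch ignores both the compatibility constraints in `Config.toml` and the interaction between packages that a real resolver would have to handle.

```julia
# "Update": latest patch within the major/minor series of `current`.
# Assumes `available` contains at least one version in that series.
update_target(current::VersionNumber, available) =
    maximum(v for v in available if (v.major, v.minor) == (current.major, current.minor))

# "Upgrade": latest version overall (compatibility constraints are ignored here).
upgrade_target(available) = maximum(available)

# Made-up registry contents for one package, currently installed at 1.2.3.
available = [v"1.2.3", v"1.2.8", v"1.3.0", v"2.0.1"]
updated = update_target(v"1.2.3", available)
upgraded = upgrade_target(available)
```

Here the update stays within the 1.2 series and selects `v"1.2.8"`, while the upgrade selects `v"2.0.1"`.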
578 | 579 | #### Pseudo-code 580 | 581 | ```julia 582 | cfg₁ = cfg₀ 583 | env₁ = [update|upgrade](cfg₀, env₀, p₁, p₂, …) 584 | ``` 585 | 586 | Whether the function `update` or `upgrade` is called depends on the operation. 587 | --------------------------------------------------------------------------------