├── StructuredConcurrency_structured.jpeg
├── StructuredConcurrency_not_structured.jpeg
├── README.md
├── LICENSE.md
├── RTLIB.md
├── Logging.md
├── Find.md
├── StructuredConcurrency.md
├── GcExtensions.md
└── Pkg3.md


/StructuredConcurrency_structured.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JuliaLang/Juleps/HEAD/StructuredConcurrency_structured.jpeg


--------------------------------------------------------------------------------
/StructuredConcurrency_not_structured.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JuliaLang/Juleps/HEAD/StructuredConcurrency_not_structured.jpeg


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Juleps: Julia Enhancement Proposals
 2 | 
 3 | This repository contains proposals to enhance the Julia language and ecosystem.
 4 | It contains the following "Juleps" (Julia Enhancement Proposals):
 5 | 
 6 | - [Pkg3](Pkg3.md) – the next generation of Julia package management
 7 | - [RTLIB](RTLIB.md) – a runtime library for Julia
 8 | - [Find](Find.md) – reorganize the search and find API
 9 | - [Logging](Logging.md) – A general logging interface
10 | 


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 | Julia enhancement proposals are licensed under the MIT License:
 2 | 
 3 | > Copyright (c) 2016: [contributors](https://github.com/JuliaLang/Juleps/contributors)
 4 | >
 5 | > Permission is hereby granted, free of charge, to any person obtaining
 6 | > a copy of this software and associated documentation files (the
 7 | > "Software"), to deal in the Software without restriction, including
 8 | > without limitation the rights to use, copy, modify, merge, publish,
 9 | > distribute, sublicense, and/or sell copies of the Software, and to
10 | > permit persons to whom the Software is furnished to do so, subject to
11 | > the following conditions:
12 | >
13 | > The above copyright notice and this permission notice shall be
14 | > included in all copies or substantial portions of the Software.
15 | >
16 | > THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17 | > EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18 | > MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19 | > NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20 | > LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21 | > OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22 | > WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
23 | 


--------------------------------------------------------------------------------
/RTLIB.md:
--------------------------------------------------------------------------------
  1 | # JULEP RTLIB
  2 | 
  3 | - **Title:** A runtime library for Julia
  4 | - **Authors:** Valentin Churavy <<v.churavy@gmail.com>>
  5 | - **Created:** November 11, 2016
  6 | - **Status:** work in progress
  7 | 
  8 | ## Introduction
  9 | 
 10 | Currently there are two implementations of intrinsics supported in Julia. One of the
 11 | implementations is defined in `runtime_intrinsics.c` and the second one is defined in
 12 | `intrinsics.cpp` on top of [LLVM intrinsics](http://llvm.org/docs/LangRef.html#id1190)
 13 | and [LLVM instructions](http://llvm.org/docs/LangRef.html#instruction-reference).
 14 | 
 15 | The first implementation specifies the semantics and behavior of the intrinsics and is
 16 | used as a fallback. The LLVM based implementation is best understood as a pure performance
 17 | optimization.
 18 | 
 19 | When an LLVM intrinsic is used and the compiler can't generate hardware instructions
 20 | for it, a library call to the runtime library (compiler-rt or libgcc) is emitted.
 21 | As a result Julia code using LLVM as a codegen backend needs to link against a
 22 | runtime library. In the exploratory work (see https://github.com/JuliaLang/julia/pull/18734)
 23 | the LLVM compiler-rt library was chosen, due to its MIT license.
 24 | 
 25 | A drawback of compiler-rt is that it is not fully portable. For example,
 26 | `Float128` support is missing on 32-bit platforms.
 27 | 
 28 | ## Motivating problem
 29 | 
 30 | How do we support `Float16` and `Float128` in a portable and performant manner?
 31 | The current implementation of `Float16` support in Julia eagerly promotes to
 32 | `Float32` in order to implement most operations. This precludes optimization
 33 | on platforms that natively support `Float16` (most prominently GPUs). The second
 34 | problem is how to support `Float128` across all platforms in a portable
 35 | and still performant way. On 32-bit systems we cannot rely on platform implementations.
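The eager promotion strategy described above can be sketched in a few lines of Julia. This is an illustration only, not the code in Base; `add_via_float32` is a hypothetical name introduced here.

```julia
# Hypothetical helper (not part of Base): add two Float16 values by
# promoting to Float32, adding there, and rounding back to Float16.
add_via_float32(a::Float16, b::Float16) = Float16(Float32(a) + Float32(b))

a, b = Float16(1.5), Float16(2.25)
result = add_via_float32(a, b)

println(result)            # the sum 3.75 is exactly representable in Float16
println(result == a + b)   # compare against the built-in `+` for Float16
```

On hardware with native `Float16` arithmetic, the round trip through `Float32` is the part a lazy scheme would hope to avoid.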
 36 | 
 37 | ## Goals
 38 | 
 39 | Implement a runtime-library in a mix of C and Julia that contains Julia intrinsics
 40 | and compiler-rt. This would allow us to use optimized implementations from compiler-rt,
 41 | while having the flexibility of Julia to implement non-essential intrinsics.
 42 | 
 43 | When Julia uses LLVM as a compiler backend it should use a lazy libcall scheme.
 44 | When the compiler can't emit optimized code paths (e.g. LLVM instructions and intrinsics),
 45 | it will emit calls to the intrinsics. Currently this is already done for
 46 | better error reporting. The interpreter can eagerly resolve to the implementations
 47 | in the runtime library, which will require `ccall` for the `C` based part of the
 48 | runtime library.
 49 | 
 50 | The runtime library will consist of two stages. Stage-1 is implemented in C and
 51 | contains compiler-rt, while Stage-2 is implemented in Julia. Stage-1 is required
 52 | for bootstrapping a minimal Julia, on which Stage-2 can be implemented.
 53 | Stage-1 will also contain optimized (and platform dependent) implementations of
 54 | the intrinsics, while Stage-2 will contain portable and general implementations.
 55 | 
 56 | Another goal is to define the minimal set of intrinsics that Julia
 57 | requires (Stage-1) and what is the extended set (Stage-2). A well defined set of
 58 | intrinsics would also be beneficial for alternative compilers.
 59 | 
 60 | ### Stage 1
 61 | 
 62 | C-based implementation of the essential Julia intrinsics + compiler-rt.
 63 | - `jl_reinterpret`
 64 | - `jl_pointerset`
 65 | - `jl_pointerref`
 66 | - Operations on Integers
 67 |   - Arithmetic
 68 |   - Comparisons
 69 |   - Conversion between integers
 70 | - Operations on Floating Point (hardware based)
 71 |   - Arithmetic
 72 |   - Comparisons
 73 |   - Conversion to and from integers
 74 |   - `Float32` and `Float64` support
 75 | 
 76 | ### Stage 2
 77 | 
 78 | Julia based implementation for non-essential Julia intrinsics and implementations
 79 | to supplement compiler-rt. This implementation will be based on a reduced base library
 80 | (early stages of the sysimage) that is only allowed to use Stage-1 functionality.
 81 | The proper sysimage will be based upon Stage-1 and Stage-2.
 82 | The basic idea is https://github.com/JuliaLang/julia/pull/18927,
 83 | which contains the initial port of compiler-rt to Julia.
 84 | - Operations on Floating Point (software based)
 85 |   - Arithmetic
 86 |   - Comparisons
 87 |   - Conversion to and from integers
 88 |   - Necessary for `Float16` and `Float128` support
 89 | - Scalar implementations for vectorized instructions
 90 | 
 91 | #### Building Stage-2
 92 | 1. Build Stage-1 and create a shared object file `rtlib-stage1.so`
 93 | 2. Build inference.ji
 94 | 3. Build Stage-2 with `--rtlib=rtlib-stage1.so` and `--sysimage inference.ji`
 95 | 4. Take the object files from Stage-1 and Stage-2 and create `rtlib.so` containing
 96 |    both Stage-1 and Stage-2.
 97 | 5. Build `sys.so` with `--rtlib=rtlib.so` and `--sysimage inference.ji`
 98 | 
 99 | ## Testing and Benchmarking
100 | 
101 | All implementations, but especially the runtime versions, should be thoroughly
102 | tested for correctness and performance. As part of this Julep the test suite needs
103 | to be extended to cover the current and future runtime intrinsics.
104 | 
105 | ## Non-Goals
106 | 
107 | - No support for atomics at this time. Julia will continue to use `libatomic`.
108 | 


--------------------------------------------------------------------------------
/Logging.md:
--------------------------------------------------------------------------------
  1 | # JULEP Logging
  2 | 
  3 | - **Title:** A unified logging interface
  4 | - **Author:** Chris Foster <chris42f@gmail.com>
  5 | - **Created:** February 2017
  6 | - **Status:** work in progress
  7 | 
  8 | ## Abstract
  9 | 
 10 | *Logging* is a tool for understanding program execution by recording the order and
 11 | timing of a sequence of events.  A *logging library* provides tools to define
 12 | these events in the source code and capture the event stream when the program runs.
 13 | The information captured from each event makes its way through the system as a
 14 | *log record*. The ideal logging library should give developers and users insight
 15 | into the running of their software by providing tools to filter, save and
 16 | visualize these records.
 17 | 
 18 | Julia has included simple logging in `Base` since version 0.1, but the tools to
 19 | generate and capture events are still immature as of version 0.6. For example,
 20 | log messages are unstructured, there's no systematic capture of log metadata, no
 21 | debug logging, inflexible dispatch and filtering, and the role of the code at
 22 | the log site isn't completely clear.  Because of this, Julia 0.6 packages use
 23 | any of several incompatible logging libraries, and there's no systematic way to
 24 | generate and capture log messages.
 25 | 
 26 | This julep aims to improve the situation by proposing:
 27 | 
 28 | * A simple, unified interface to generate log events in `Base`
 29 | * Conventions for the structure and semantics of the resulting log records
 30 | * A minimum of dispatch machinery to capture, route and filter log records
 31 | * A default backend for displaying, filtering and interacting with the log stream.
 32 | 
 33 | A non-goal is to create a complete set of logging backends - these can be
 34 | supplied by packages.
 35 | 
 36 | ## The design problem
 37 | 
 38 | There are two broad classes of users for a logging library - library authors and
 39 | application authors - each with rather different needs.
 40 | 
 41 | ### The library author
 42 | 
 43 | Ideally logging should be a high value tool for library development, making
 44 | library authors' lives easier, and giving users insight.
 45 | 
 46 | For the library author, the logging tools should make log events *easy to generate*:
 47 | 
 48 | * Logging should require a minimum of syntax - ideally just a logger verb and
 49 |   the message object in many cases.  Context information for log messages (file
 50 |   name, line number, module, stack trace, etc.) should be automatically gathered
 51 |   without a syntax burden.
 52 | * Log generation should be free from prescriptive log message formatting. Simple
 53 |   string interpolation, `@sprintf` and `fmt()`, etc. should all be fine.  When
 54 |   log messages aren't strings, a sensible conversion should be applied by
 55 |   default.
 56 | * Flexible user definable structure for log records should make it easy to
 57 |   record snapshots of program state in the form of variable names and values.
 58 |   This would generalize `@show` using log records as a transport mechanism.
 59 | 
 60 | The default configuration for log message reporting should involve *zero
 61 | setup* and should produce *readable output*:
 62 | 
 63 | * No mention of log dispatch should be necessary at the message creation site.
 64 | * The default console log handler should integrate somehow with the display
 65 |   system, to show log records in a way which is highly readable.
 66 | * Basic filtering of log messages should be easy to configure.
 67 | 
 68 | The default configuration for log message reporting will generally define what
 69 | library authors see during development, so will end up defining the conventions
 70 | authors use when including logging in their library.  To this extent, it's
 71 | important to do a good job displaying metadata!
 72 | 
 73 | ### The application author
 74 | 
 75 | Application authors bring together many disparate libraries into a larger
 76 | system; they need consistency and flexibility in collecting log records.
 77 | 
 78 | Log events are generally tagged with useful context information which is
 79 | available both lexically (eg, module, file name, line number) and dynamically
 80 | (eg, time, stack trace, thread id).  Log records should have *consistent,
 81 | flexible metadata* which represents and preserves this structured information in
 82 | a way that can be collected systematically.
 83 | 
 84 | * Each logging location should have a unique identifier, `id`, passed as part of
 85 |   the log record metadata.  This greatly simplifies tasks such as limiting the rate
 86 |   of logging for a given line of code.
 87 | * Users should be able to add structured information to log records, to be
 88 |   preserved along with data extracted from the logging context. For example, a
 89 |   list of `key=value` pairs offers a decent combination of simplicity and power.
 90 | * Clear guidelines should be given about the meaning and appropriate use of
 91 |   standard log levels so libraries can be consistent.
 92 | 
 93 | Log *collection* should be unified:
 94 | 
 95 | * For all libraries using the standard logging API, it should be simple to
 96 |   intercept and dispatch logs in a unified way which is under the control of
 97 |   the application author.  For example, to write JSON log records across the
 98 |   network to a log server.
 99 | * It should be possible to naturally control log dispatch from concurrent tasks.
100 |   For example, if the application uses a library to handle simultaneous HTTP
101 |   connections for both an important task and a noncritical background job, we
102 |   may wish to handle the messages generated by these two `Task`s differently.
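As a sketch of per-task dispatch, the example below uses the `Logging` standard library that ships with Julia 1.x, where `with_logger` installs a logger for the code it runs. It shows one way the requirement can be met and is not taken from this Julep; `handle_request` and the job names are hypothetical.

```julia
using Logging

# Hypothetical library function that logs; it knows nothing about dispatch.
handle_request(name) = @info "handling request" name=name

important_io = IOBuffer()    # stands in for a durable log sink
background_io = IOBuffer()   # stands in for a low-priority log sink

# Each task installs its own logger, so records are routed per task.
t1 = @async with_logger(SimpleLogger(important_io)) do
    handle_request("critical-upload")
end
t2 = @async with_logger(SimpleLogger(background_io)) do
    handle_request("nightly-cleanup")
end
wait(t1)
wait(t2)

important_text = String(take!(important_io))
background_text = String(take!(background_io))
```

The logging call site stays identical in both tasks; only the surrounding `with_logger` decides where the records end up.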
103 | 
104 | The design should allow for an *efficient implementation*, to encourage
105 | the availability of logging in production systems; logs you don't see should be
106 | almost free, and logs you do see should be cheap to produce. The runtime cost
107 | comes in a few flavours:
108 | 
109 | * Cost in the logging frontend, to determine whether to filter a message.
110 | * Cost in the logging frontend, in collecting context information.
111 | * Cost in user code, to construct quantities which will only be used in a
112 |   log message.
113 | * Cost in the logging backend, in filtering and displaying messages.
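The third cost, work done in user code only to build a message, can be avoided when the frontend is a macro that evaluates the message lazily. The sketch below uses the `Logging` standard library from Julia 1.x to show the effect; it assumes the `JULIA_DEBUG` environment variable is unset, and `expensive_summary` is a hypothetical stand-in for costly user code.

```julia
using Logging

# Count how often the (hypothetical) expensive computation actually runs.
const calls = Ref(0)
expensive_summary() = (calls[] += 1; "summary")

io = IOBuffer()
with_logger(SimpleLogger(io, Logging.Info)) do
    # Below the logger's minimum level: the message expression is never evaluated.
    @debug "state: $(expensive_summary())"
    # At or above the minimum level: evaluated once and written to `io`.
    @info "state: $(expensive_summary())"
end

text = String(take!(io))
```

Because the level check happens before the message is constructed, the filtered `@debug` line costs little more than a comparison.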
114 | 
115 | 
116 | ## Proposed design
117 | 
118 | A prototype implementation is available at https://github.com/c42f/MicroLogging.jl
119 | 
120 | ### Quickstart Example
121 | 
122 | #### Frontend
123 | ```julia
124 | # using Base.Log
125 | 
126 | # Logging macros
127 | @debug "A message for debugging (filtered out by default)"
128 | @info "Information about normal program operation"
129 | @warn "A potential problem was detected"
130 | @error "Something definitely went wrong, but we recovered enough to continue"
131 | @logmsg Logging.Info "Explicitly defined info log level"
132 | 
133 | # Free form message formatting
134 | x = 10.50
135 | @info "$x"
136 | @info @sprintf("%.3f", x)
137 | @info begin
138 |     A = ones(4,4)
139 |     "sum(A) = $(sum(A))"
140 | end
141 | 
142 | # Progress reporting
143 | for i=1:10
144 |     @info "Some algorithm" progress=i/10
145 | end
146 | 
147 | # User defined key value pairs
148 | foo_val = 10.0
149 | @info "test" foo=foo_val bar=42
150 | ```
151 | 
152 | #### Backend
153 | 
154 | ### What is a log record?
155 | 
156 | Logging statements are used to understand algorithm flow - the order and timing
157 | in which logging events happen - and the program state at each event.  Each
158 | logging event is preserved in a *log record*.  The information in a record
159 | needs to be gathered efficiently, but should be rich enough to give insight into
160 | program execution.
161 | 
162 | A log record includes information explicitly given at the call site, and any
163 | relevant metadata which can be harvested from the lexical and dynamic
164 | environment.  Most logging libraries allow for two key pieces of information
165 | to be supplied explicitly:
166 | 
167 | * The *log message* - a user-defined string containing key pieces of program
168 |   state, chosen by the developer.
169 | * The *log level* - a category for the message, usually ordered from verbose
170 |   to severe.  The log level is generally used as an initial filter to remove
171 |   verbose messages.
172 | 
173 | Some logging libraries (for example
174 | [glib](https://developer.gnome.org/glib/stable/glib-Message-Logging.html)
175 | structured logging) allow users to supply extra log record information in the
176 | form of key value pairs.  Others like
177 | [log4j2](https://logging.apache.org/log4j/2.x/manual/messages.html) require extra information to be
178 | explicitly wrapped in a log record type.  In Julia, supporting key value pairs
179 | in logging statements gives a good mixture of usability and flexibility:
180 | Information can be communicated to the logging backend as simple keyword
181 | function arguments, and the keywords provide syntactic hints for early filtering
182 | in the logging macro frontend.
183 | 
184 | In addition to the explicitly provided information, some useful metadata can be
185 | automatically extracted and stored with each log record.  Some of this is
186 | extracted from the lexical environment or generated by the logging frontend
187 | macro, including code location (module, file, line number) and a unique message
188 | identifier.  The rest is dynamic state which can be generated on demand by the
189 | backend, including system time, stack trace, current task id.
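To make the contents of a log record concrete, here is a minimal sketch of a backend that stores each record it receives. It is written against the `Logging` standard library of Julia 1.x rather than the prototype linked above, and `CollectingLogger` is a hypothetical type introduced only for this example.

```julia
using Logging

# Hypothetical backend: keeps every record in memory instead of printing it.
struct CollectingLogger <: AbstractLogger
    records::Vector{Any}
end
CollectingLogger() = CollectingLogger(Any[])

# Accept everything from Debug upwards and never filter by location.
Logging.min_enabled_level(::CollectingLogger) = Logging.Debug
Logging.shouldlog(::CollectingLogger, level, _module, group, id) = true

# Called once per log event with explicit data (level, message, key value
# pairs) and the metadata gathered by the frontend macro.
function Logging.handle_message(logger::CollectingLogger, level, message, _module,
                                group, id, file, line; kwargs...)
    push!(logger.records, (level=level, message=message, _module=_module,
                           id=id, file=file, line=line, kwargs=Dict(kwargs)))
    return nothing
end

logger = CollectingLogger()
with_logger(logger) do
    foo_val = 10.0
    @info "test" foo=foo_val bar=42
    @debug "details" step=1
end

first_record = logger.records[1]
```

Each stored record carries the explicit message, level and key value pairs next to the lexical metadata (module, file, line, id); dynamic state such as the time could be added inside `handle_message`.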
190 | 
191 | ### The logging frontend
192 | 
193 | TODO
194 | 
195 | ### Logging middle layer
196 | 
197 | TODO
198 | 
199 | ### Early filtering
200 | 
201 | TODO
202 | 
203 | ### Default backend
204 | 
205 | TODO
206 | 
207 | ## Concrete use cases
208 | 
209 | ### Base
210 | 
211 | In Base, there are three somewhat disparate mechanisms for controlling logging.
212 | An improved logging interface should unify these in a way which is convenient
213 | both in the code and for user control.
214 | 
215 | * The 0.6 logging system's `logging()` function with redirection based on module
216 |   and function.
217 | * The `DEBUG_LOADING` mechanism in loading.jl and `JULIA_DEBUG_LOADING`
218 |   environment variable.
219 | * The depwarn system, and `--depwarn` command line flag.
220 | 
221 | 
222 | ## Inspiration
223 | 
224 | This Julep draws inspiration from many previous logging frameworks, and helpful
225 | discussions with many people online and at JuliaCon 2017.
226 | 
227 | The Java logging framework [log4j2](https://logging.apache.org/log4j/2.x/) was a
228 | great source of use cases, as it contains the lessons from at least twenty years
229 | of large production systems.  While containing a fairly large amount of
230 | complexity, the design is generally very well motivated in the documentation,
231 | giving a rich set of use cases.  The Julia logging libraries - Base in Julia 0.6,
232 | Logging.jl, MiniLogging.jl, LumberJack.jl, and particularly
233 | [Memento.jl](https://github.com/invenia/Memento.jl) - provided helpful
234 | context for the needs of the Julia community.
235 | 
236 | Structured logging as available in
237 | [glib](https://developer.gnome.org/glib/stable/glib-Message-Logging.html)
238 | and [RFC5424](https://datatracker.ietf.org/doc/rfc5424/?include_text=1) (The
239 | Syslog protocol) provides context for the usefulness of log records as key value
240 | pairs.
241 | 
242 | For the most part, existing Julia libraries seem to follow the design tradition
243 | of the standard [Python logging library](https://docs.python.org/3/library/logging.html),
244 | which has a lineage further described in [PEP-282](https://www.python.org/dev/peps/pep-0282/).
245 | The Python logging system provided a starting point for this Julep, though the
246 | design eventually diverged from the typical hierarchical setup.
247 | 
248 | TODO: Re-survey the following?
249 | * a-cl-logger (Common lisp) - https://github.com/AccelerationNet/a-cl-logger
250 | * Lager (Erlang) - https://github.com/erlang-lager/lager
251 | 
252 | 
253 | 


--------------------------------------------------------------------------------
/Find.md:
--------------------------------------------------------------------------------
  1 | # JULEP find
  2 | 
  3 | - **Title:** Reorganize Search and Find API
  4 | - **Authors:** Milan Bouchet-Valat <<nalimilan@club.fr>>
  5 | - **Created:** December 10, 2016
  6 | - **Status:** work in progress
  7 | 
  8 | ## Abstract
  9 | 
 10 | The current `find` and `search` families of functions are not very consistent with regard to
 11 | naming and supported features. This proposal aims to make the API more systematic. It is based
 12 | on ideas discussed in particular in [issue #10593](https://github.com/JuliaLang/julia/issues/10593)
 13 | and [issue #5664](https://github.com/JuliaLang/julia/issues/5664).
 14 | 
 15 | ## Current Functions
 16 | 
 17 | Currently there are (at least) five families of search and find functions:
 18 | - `find` `findn` `findin` `findnz`, `findfirst` `findlast` `findprev` `findnext`
 19 | - `[r]search` `[r]searchindex` `searchsorted` `searchsortedlast` `searchsortedfirst`
 20 | - `match` `matchall` `eachmatch`
 21 | - `indmin` `indmax` `findmin` `findmax`
 22 | - `indexin`
 23 | 
 24 | In the `find` family, `find` and `findn` return indices of non-zero or `true` values.
 25 | `findfirst`, `findlast`, `findprev` and `findnext` are very similar to `find`, but
 26 | iterative. `findin` allows looking for all elements of a collection inside another one.
 27 | Finally, `findnz` is even more different as it only works on matrices and returns a tuple
 28 | of vectors `(I,J,V)` for the row- and column-index and value.
 29 | 
 30 | In the `search` family, `[r]search` and `[r]searchindex` look for strings/chars/regex in a
 31 | string (though they also support bytes), the former returning a range, the latter the first
 32 | index. `searchsorted`, `searchsortedlast` and `searchsortedfirst` look for values equal to
 33 | or lower than an argument, and return a range for the first, and an index for the two others.
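A short example may help to keep the three `searchsorted*` return conventions apart. These functions exist under the same names in Julia 1.x, which is what the sketch assumes.

```julia
A = [1, 2, 2, 2, 5]   # must already be sorted

r_present = searchsorted(A, 2)        # range covering all entries equal to 2
i_first   = searchsortedfirst(A, 2)   # first index whose value is >= 2
i_last    = searchsortedlast(A, 2)    # last index whose value is <= 2

# For a missing value the range is empty and marks the insertion point.
r_missing = searchsorted(A, 3)
```

Here `r_present` is `2:4`, matching `i_first:i_last`, while `r_missing` is the empty range `5:4`.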
 34 | 
 35 | The `match`, `matchall` and `eachmatch` functions deal with regular expressions. `match`
 36 | returns a special `RegexMatch` object with offsets and matches. `matchall` returns all
 37 | matching substrings. `eachmatch` returns an iterator over matches.
 38 | 
 39 | The `indmin` and `indmax` functions are quite different, as they return the index of the
 40 | minimum/maximum value. `findmin` and `findmax` return a `(value, index)` tuple of these
 41 | elements.
 42 | 
 43 | Finally, `indexin` is the same as `findin` (i.e. returns index of elements in a collection),
 44 | but it returns `0` for elements that were not found, instead of a shorter vector.
 45 | 
 46 | ## Dimensions of Variation
 47 | 
 48 | This diversity can be organized along several dimensions, which are not always combined
 49 | systematically in the existing API:
 50 | 
 51 | - **Mode of operation**:
 52 |   - all matches at once (`find`, `findin`, `indexin`)
 53 |   - iteratively forward (`findnext`, `search`)
 54 |   - iteratively backwards (`findprev`, `rsearch`)
 55 |   - the first match (`findfirst`, `searchsortedfirst`)
 56 |   - the last match (`findlast`, `searchsortedlast`)
 57 | 
 58 | - **Look for**:
 59 |   - non-zeros or `true` entries (`find(A)`)
 60 |   - predicate-test-true (`find(pred, A)`)
 61 |   - elements present in a collection (`findin`, `indexin`)
 62 |   - elements equal to a value (`findfirst(A, v)`, `findlast(A, v)`, `findnext(A, v)`,
 63 | `findprev(A, v)`)
 64 |   - extrema (`findmin`, `findmax`)
 65 |   - range of elements matching a sequence (`search*`, mostly for strings)
 66 | 
 67 | - **Return**:
 68 |   - linear indices (most `find*` functions)
 69 |   - cartesian indices (`findn`)
 70 |   - cartesian indices and values (`findnz`)
 71 |   - range of linear indices (`search*`)
 72 | 
 73 | - **Return when not found**:
 74 |   - shorter vector for all-at-once functions (`find`, `findin`,
 75 | `findn`, `findnz`)
 76 |   - except `indexin` which includes a `0` entry
 77 |   - `0` for functions returning a single index
 78 | 
 79 | ## Summary of Current Status
 80 | 
 81 | The following table reorganizes existing methods which return linear indices based on the
 82 | first two dimensions described above:
 83 | - Whether to return all matches, only the next one, or only the previous one.
 84 | - What values to look for.
 85 | 
 86 | |  | nonzeros | test predicate `pred` | in collection `c` | equal to value `v` | sequence or regex `s` | extrema |
 87 | | --- | --- | --- | --- | --- | --- | --- |
 88 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `searchsorted(A,v)` | | `indmin(A)`/`indmax(A)` |
 89 | | Next match | `findnext(A,1)` | `findnext(pred,A,1)` | | `findnext(A,v,1)` | `search(A,s,1)` | |
 90 | | Previous match | `findprev(A,endof(A))` | `findprev(pred,A,endof(A))` | | `findprev(A,v,endof(A))` | `rsearch(A,s,endof(A))` | |
 91 | 
 92 | Some functions do not fit into this table:
 93 | - `findfirst` and `findlast` are special cases of `findnext` and `findprev`.
 94 | - `searchsortedfirst` and `searchsortedlast` give each a part of the result of `searchsorted`.
 95 | - `[r]searchindex` give part of the result from `[r]search`.
 96 | - `findn` and `findnz` do not return linear indices.
 97 | - `indexin` is similar to `findin` but returns `0` for entries with no match.
 98 | - `match`, `matchall` and `eachmatch` return `RegexMatch` objects or strings rather than
 99 | indices.
100 | - `findmin` and `findmax` return both the index and value of extrema.
101 | 
102 | ## Open Design Issues
103 | 
104 | - **How to switch between forward and backward search**:
105 |     - Separate functions (e.g. starting with `r` for "reverse"): not great for
106 |     documentation, harder to find, not very Julian.
107 |     - `rev=false` positional argument: not very explicit.
108 |     - `rev=false` keyword argument: clearer and consistent with `sort`, but maybe too slow
109 |     (especially for single-element functions).
110 |     - special object like `Order.Forward`/`Order.Backward`: clearer, but these objects do
111 |     not have this meaning in Base, and introducing separate objects just for this may not
112 |     be worth it.
113 | 
114 | - **Whether to keep the `find`/`search` distinction**:
115 |     - Obviously requires clearly distinct meanings for each family of functions.
116 |     - Advantage: less complex signatures (due to many methods) for users, and limits
117 |     dispatch ambiguities.
118 |     - Drawback: using two different names for related functions makes it harder to find
119 |     one variant when you know the other one; in particular, auto-completion does not help.
120 | 
121 | - **How to search iteratively**:
122 |     - Functions returning the next/previous match after a given index: simple, but require
123 |     manual handling of indices.
124 |     - Functions returning an iterator over matches: more user-friendly, though overkill when
125 |     you only want the next match.
126 |     - The first approach can fit all needs (even if it can be cumbersome), but the second
127 |     one can only replace the first one if it supports creating an iterator starting from
128 |     a given index (on which you can call `first` to get the first match).
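The two iteration styles can be compared side by side. The sketch below targets Julia 1.x, where `findnext` returns `nothing` when no further match exists (the API discussed in this Julep returns `0` instead); `matches_by_index` and `match_iterator` are hypothetical helpers written for this comparison.

```julia
# Style 1: repeatedly ask for the next match after a given index.
# The caller has to advance the index manually.
function matches_by_index(pred, A::AbstractVector)
    found = Int[]
    i = findnext(pred, A, firstindex(A))
    while i !== nothing
        push!(found, i)
        i = findnext(pred, A, i + 1)
    end
    return found
end

# Style 2: a lazy iterator over matching indices, optionally starting
# from a given index so that `first` yields "the next match".
match_iterator(pred, A::AbstractVector, start=firstindex(A)) =
    Iterators.filter(i -> pred(A[i]), start:lastindex(A))

A = [1, 4, 6, 7, 10]
by_index = matches_by_index(iseven, A)
by_iterator = collect(match_iterator(iseven, A))
next_from_3 = first(match_iterator(iseven, A, 3))
```

Both styles find the same indices; the iterator only replaces the index-based function because it accepts a starting index, which is the condition stated in the last bullet.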
129 | 
130 | 
131 | ## General Proposal 1
132 | 
133 | The first proposal uses `find` for all-at-once variants, and `search` for iterative
134 | variants. The variants returning iterators (last two rows) do not correspond to existing
135 | functions: they could be added later, or never, without breaking the consistency of the API.
136 | In this proposal, it is not possible to have both methods returning the next/previous match
137 | and methods returning an iterator starting from a given index: the signatures would be the
138 | same.
139 | 
140 | |  | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` | extrema |
141 | | --- | --- | --- | --- | --- | --- | --- |
142 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `findeq(A,v)` | `findseq(A,s)` | `findmin(A)`/`findmax(A)` |
143 | | Next match | `search(A,1)` | `search(pred,A,1)` | * | `searcheq(A,v,1)` | `searchseq(A,s,1)` | |
144 | | Previous match | `search(A,endof(A),true)` | `search(pred,A,endof(A),true)` | * | `searcheq(A,v,endof(A),true)` | `searchseq(A,s,endof(A),true)` | |
145 | | Forward iterator | `search(A)` | `search(pred,A)` | * | `searcheq(A,v)` | `searchseq(A,s)` | |
146 | | Backward iterator | `search(A,true)` | `search(pred,A,true)` | * | `searcheq(A,v,true)` | `searchseq(A,s,true)` | |
147 | 
148 | \* These combinations are not needed as they correspond to `searchseq`. Indeed they do not
149 | exist in the current API.
150 | 
151 | ## General Proposal 2
152 | 
153 | The second proposal uses `find` for functions returning one or several indices (either
154 | all-at-once or iterative), and `search` for functions returning iterators (which
155 | currently do not exist). Contrary to the first proposal, it therefore allows for
156 | iterators starting at a specific index. If those variants were not added in the end,
157 | only `find` would exist. Conversely, methods returning the next/previous match could be
158 | dropped in favor of iterators.
159 | 
160 | |  | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` | extrema |
161 | | --- | --- | --- | --- | --- | --- | --- |
162 | | All at once | `find(A)` | `find(pred,A)` | `findin(A,c)` | `findeq(A,v)` | `findseq(A,s)` | `findmin(A)`/`findmax(A)` |
163 | | Next match | `find(A,1)` | `find(pred,A,1)` | * | `findeq(A,v,1)` | `findseq(A,s,1)` | |
164 | | Previous match | `find(A,endof(A),true)` | `find(pred,A,endof(A),true)` | * | `findeq(A,v,endof(A),true)` | `findseq(A,s,endof(A),true)` | |
165 | | Forward iterator | `search(A)` | `search(pred,A)` | * | `searcheq(A,v)` | `searchseq(A,s)` | |
166 | | Backward iterator | `search(A,true)` | `search(pred,A,true)` | * | `searcheq(A,v,true)` | `searchseq(A,s,true)` | |
167 | 
168 | \* These combinations are not needed as they correspond to `searchseq`. Indeed they do not
169 | exist in the current API.
170 | 
171 | ## Proposal 3
172 | 
173 | This proposal adds `findeach(pred, A[, rev])`, which returns an iterator and can be used to
174 | implement most of the other functions in one line.
175 | Predicates are always used instead of separate functions for different kinds of searches when possible.
176 | This potentially allows using the same function for sequence searching, since a subsequence to look for
177 | is unlikely to be confused with a predicate.
178 | 
179 | |  | nonzeros | predicate test | in collection `c` | equal to `v` | sequence or regex `s` |
180 | | --- | --- | --- | --- | --- | --- |
181 | | All at once | `find(A)` | `find(pred, A)` | `find(occursin(c), A)` | `find(equalto(v), A)` | `find(s, A)` |
182 | | Next match | `findeach(!iszero,A)` * | `findeach(pred,A)` * | `findeach(occursin(c),A)` * | `findeach(equalto(v),A)` * | `findeach(s,A)` * |
183 | | Previous match | `findeach(!iszero,A,true)` * | `findeach(pred,A,true)` * | `findeach(occursin(c),A,true)` * | `findeach(equalto(v),A,true)` * | `findeach(s,A,true)` * |
184 | | Forward iterator | `findeach(!iszero,A)` | `findeach(pred,A)` | `findeach(occursin(c),A)` | `findeach(equalto(v),A)` | `findeach(s,A)` |
185 | | Backward iterator | `findeach(!iszero,A,true)` | `findeach(pred,A,true)` | `findeach(occursin(c),A,true)` | `findeach(equalto(v),A,true)` | `findeach(s,A,true)` |
186 | 
187 | \* Getting the next and previous matches is handled by the iteration protocol.
188 | If necessary, you can pass `findeach(pred, rest(itr, st))` to start at a particular state.
189 | We can keep `findnext` and `findprev`, since they operate on array indices while
190 | the general iterator needs to operate on state objects.
191 | 
192 | We can also keep `findfirst` and `findnext`, since they are especially convenient.
193 | Ideally we will keep only `findfirst(pred, A)` and deprecate other methods.
194 | 
195 | If we want, `find` can be deprecated to `collect(findeach(...))`.
196 | 
197 | The following functions can also be deprecated to `findeach` calls: `findin`, `search`, `rsearch`, `match`, `eachmatch`.
198 | 
199 | This proposal does not touch `findmin`, `findmax`, etc.
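
As a rough illustration of the one-line implementations this proposal has in mind, the sketch below defines a hypothetical `findeach` as a lazy generator over indices. This is only an assumption about how such a function might look, not an existing `Base` function. The curried forms `isequal(v)` and `in(c)` stand in for the `equalto(v)` and `occursin(c)` predicates named in the table.

```julia
# Hypothetical sketch: lazily yield each index of `A` whose element satisfies `pred`.
# With `rev = true` the indices are visited from last to first.
function findeach(pred, A, rev::Bool=false)
    inds = rev ? Iterators.reverse(keys(A)) : keys(A)
    return (i for i in inds if pred(A[i]))
end

A = [0, 3, 0, 7, 3]

all_nonzero = collect(findeach(!iszero, A))         # "all at once" is just `collect`
first_three = first(findeach(isequal(3), A))        # first match
last_three  = first(findeach(isequal(3), A, true))  # last match, via reverse iteration
in_set      = collect(findeach(in((7, 9)), A))      # membership predicate
```

Under this sketch, `find(pred, A)` would reduce to `collect(findeach(pred, A))`, matching the deprecation suggested above.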
200 | 
201 | ## Particular Cases
202 | 
203 | Other issues are more localized and can be fixed one by one, depending on the chosen general
204 | plan.
205 | 
206 | - **`findmin` and `findmax`**: `findmin` and `findmax` are inconsistent
207 | with both proposals, since they return an `(index, value)` tuple instead of an index. They
208 | should be changed to return an index (as in both proposals above). A new name needs to be
209 | found if we want to keep `(index, value)` variants, which are slightly more efficient.
210 | 
211 | - **`searchsorted*` functions**: These functions should be replaced with
212 | standard search/find functions called on a special `SortedArray` wrapper. `findeq` would
213 | replace `searchsorted` and -- like that function -- return a range (instead of a `Vector`)
214 | of indices, which is possible when input is sorted.
215 | 
216 | - **`indexin`**: It is not clear whether this function really belongs to the
217 | find/search family. It could be kept as-is.
218 | 
219 | - **`*match*` functions**: These functions (`match`, `matchall` and `eachmatch`)
220 | return `RegexMatch` objects or strings (rather than indices). They can be left outside the
221 | scope of this Julep. On the other hand `findseq`/`searchseq` functions should support regexes
222 | for consistency, only returning ranges of indices (as does `search` currently).
223 | 
224 | ## Deprecation strategy
225 | 
226 | Depending on the choices made, the migration to the new API will be possible in a single
227 | release (if no ambiguity exists with the old one), or it will have to be done in two
228 | releases (to allow removing old conflicting methods first).
229 | 
230 | ## Issues Beyond the Scope of This Julep
231 | 
232 | These are important to resolve but are not covered by the above proposals.
233 | 
234 | - **Whether to return a `Nullable` instead of `0` when there is no match**
235 | ([PR#15755](https://github.com/JuliaLang/julia/pull/15755)): This is blocked by progress
236 | with regard to `Nullable`, in particular whether they are stack-allocated in all cases
237 | and whether they can be represented as a `Union` type. It is therefore out of this Julep's
238 | scope.
239 | 
240 | - **Whether to return linear or cartesian indices**
241 | ([PR#14086](https://github.com/JuliaLang/julia/pull/14086)): Both could be needed depending
242 | on the context. Passing `CartesianIndex` as the first argument to all functions would work
243 | and would allow replacing `findn(A)` with `find(CartesianIndex, A)`. On the other hand,
244 | computing the linear index is slow for `LinearSlow` arrays, which means that returning
245 | the same index type as `eachindex(A)` could be a better default; it also makes more sense
246 | for multidimensional arrays. Then one would write `find(LinearIndex, A)` or  `find(Int, A)`
247 | to always get a linear index.
248 | 
249 | - **Sentinel values in a world where array indices do not necessarily start with 1**:
250 |     - `findfirst(x, v)` returns 0 if no value matching `v` is found;
251 |       however, if `x` allows 0 as an index, the meaning of 0 is
252 |       ambiguous. One could return `typemin(Int)` or
253 |       `minimum(linearindices(x))-1`, but what if `x` starts indexing
254 |       at `typemin(Int)`?
255 |     - No matter which sentinel value gets returned, the deprecation
256 |       strategy here is delicate. There may be a lot of code that
257 |       checks the return value and compares it to 0.
258 | 


--------------------------------------------------------------------------------
/StructuredConcurrency.md:
--------------------------------------------------------------------------------
  1 | # Structured Concurrency
  2 | 
  3 | * Title: Exploring Structured Concurrency
  4 | * Editor: Chris Foster <chris42f@gmail.com>
  5 | * Created: 2019-09-12
  6 | * Status: Work in progress
  7 | * Discussion: [JuliaLang/julia#33248](https://github.com/JuliaLang/julia/issues/33248)
  8 | 
  9 | Here are some notes surveying structured concurrency as it can be applied to
 10 | Julia.
 11 | 
 12 | Julia has supported non-parallel concurrency since very early on and a
 13 | restricted form of parallel programming with the `@threads` macro since version
 14 | 0.5.
 15 | 
 16 | In Julia 1.3 a threadsafe runtime for truly parallel tasks [has
 17 | arrived](https://julialang.org/blog/2019/07/multithreading) which will greatly
 18 | increase their appeal in Julia's numerical and technical computing community.
 19 | It's time to think about APIs where users can express concurrent computation in
 20 | a safe and composable way.
 21 | 
 22 | ### Background terminology
 23 | 
 24 | For clarity, here are a few items of terminology:
 25 | 
 26 | * A Julia [**task**](https://docs.julialang.org/en/v1/manual/control-flow/index.html#man-tasks-1)
 27 |   stores the computational state needed to continue execution of a nested set
 28 |   of function calls. In the standard runtime this includes any native stack
 29 |   frames, CPU registers and Julia runtime state needed to suspend and resume
 30 |   execution.
 31 | * A program is **concurrent** when there are multiple tasks which have started
 32 |   but not yet completed at a given time.
 33 | * A program is **parallel** when two or more tasks are executing at a given
 34 |   time.
 35 | 
 36 | With these definitions, parallelism implies concurrency but a concurrent
 37 | program can be non-parallel if the runtime serially interleaves task execution.
 38 | See, for example,
 39 | [section 2.1.2](https://books.google.com.au/books?redir_esc=y&id=J5-ckoCgc3IC&q=paralleism+versus+concurrency#v=snippet&q=paralleism%20versus%20concurrency&f=false)
 40 | of "Introduction to Concurrency in Programming Languages".
 41 | 
 42 | ### What is structured concurrency?
 43 | 
 44 | To quote the [`libdill` documentation](http://libdill.org/structured-concurrency.html),
 45 | 
 46 | > Structured concurrency means that lifetimes of concurrent functions are
 47 | > cleanly nested. If coroutine `foo` launches coroutine `bar`, then `bar` must
 48 | > finish before `foo` finishes.
 49 | >
 50 | > This is not structured concurrency:
 51 | >
 52 | > ![unstructured concurrency](StructuredConcurrency_not_structured.jpeg)
 53 | >
 54 | > This is structured concurrency:
 55 | >
 56 | > ![structured concurrency](StructuredConcurrency_structured.jpeg)
 57 | 
 58 | It's all about composability. Structured concurrency is good because it
 59 | reasserts the function call as the natural unit of program composition, where
 60 | the lifetime of a computation is delimited in the *structure of the source
 61 | code*. This is sometimes called the
 62 | [*black box rule*](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/#what-happened-to-goto).
 63 | Without this,
 64 | 
 65 | * Task failures can go unhandled because there's nowhere to propagate the error.
 66 | * Task lifetime is not defined by the source code. When a task starts and
 67 |   whether it runs to completion is an implementation detail of the runtime.
 68 | * Computation cannot be cancelled systematically because there's no natural
 69 |   tree of child tasks.
 70 | * Scope-based resource cleanup (e.g., with `open(...) do io`) is broken because
 71 |   task local context can leak from parents into long running children.
 72 | 
 73 | For a colourful view on the downsides of unstructured concurrency, `@njsmith`
 74 | has expressed it this way in his blog post [Notes on structured concurrency or,
 75 | "go statement considered harmful"](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/#conclusion):
 76 | 
 77 | > The popular concurrency primitives — go statements, thread spawning
 78 | > functions, callbacks, futures, promises, ... they're all variants on `goto`,
 79 | > in theory and in practice. And not even the modern domesticated `goto`, but the
 80 | > old-testament fire-and-brimstone `goto`, that could leap across function
 81 | > boundaries. These primitives are dangerous even if we don't use them
 82 | > directly, because they undermine our ability to reason about control flow and
 83 | > compose complex systems out of abstract modular parts, and they interfere
 84 | > with useful language features like automatic resource cleanup and error
 85 | > propagation. Therefore, like goto, they have no place in a modern high-level
 86 | > language.
 87 | 
 88 | 
 89 | ### Structured concurrency in Julia 1.0?
 90 | 
 91 | Julia 1.0 supports a limited kind of structured concurrency via the `@sync`
 92 | block which waits for lexically contained child tasks (scheduled using
 93 | `@async`) to complete. However, like Go, there's no requirement that concurrent
 94 | work is actually scoped this way; that's completely up to the user and they may
 95 | use `@async` anywhere. At first sight, it may seem just as natural to choose an
 96 | unstructured
 97 | [communicating sequential processes](https://en.wikipedia.org/wiki/Communicating_sequential_processes)
 98 | (CSP) style in current Julia.
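
For illustration, here is a minimal sketch of the structured style that `@sync` makes possible. The variable names and the sleep durations are assumptions chosen for the example.

```julia
# Each child task is started with @async lexically inside the @sync block,
# so the block does not return until all three children have finished.
results = zeros(Int, 3)
@sync for i in 1:3
    @async begin
        sleep(0.01 * (4 - i))  # children finish in reverse order
        results[i] = i^2
    end
end
# All children are guaranteed to be done here.
```

Nothing stops a user from writing the same `@async` call outside any `@sync` block, in which case the child's lifetime is no longer tied to the surrounding code.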
 99 | 
100 | Even if the user chooses structured concurrency with `@sync`, they are still
101 | faced with implementing robust cancellation machinery by hand using `Channel`s.
102 | This is the big missing piece required for the natural use of structured
103 | concurrency in Julia.
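
To make the hand-written machinery concrete, here is a minimal sketch of cooperative cancellation through a `Channel`. The function name `count_until_cancelled` and the use of `sleep` as a stand-in for real work are assumptions for the example.

```julia
# Hypothetical worker: performs units of work until a cancellation request arrives.
function count_until_cancelled(cancel::Channel{Nothing})
    n = 0
    while !isready(cancel)   # hand-written cancellation point
        n += 1
        sleep(0.01)          # stand-in for one unit of real work
    end
    return n
end

cancel = Channel{Nothing}(1)                    # buffered, so put! does not block
worker = @async count_until_cancelled(cancel)
sleep(0.05)                                     # let the worker make some progress
put!(cancel, nothing)                           # request cancellation
completed = fetch(worker)                       # wait for the worker to notice and exit
```

Every function on the worker's call path has to cooperate by checking the channel, which is exactly the burden a built-in cancellation system would remove.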
104 | 
105 | 
106 | ## Cancellation and preemption
107 | 
108 | A robust task cancellation system is required to express structured
109 | concurrency. Without it, child tasks cannot be systematically managed in
110 | response to events such as a timeout from the parent or the failure of a
111 | sibling. For a great discussion of cancellation and a survey of cancellation
112 | APIs see the blog post ["Timeouts and cancellation for
113 | humans"](https://vorpus.org/blog/timeouts-and-cancellation-for-humans).
114 | 
115 | **Big challenge**: how do we handle cancellation safely but in a timely way?
116 | What are the valid cancellation points and can we have cancellation which is
117 | timely, safe and efficient in a wide variety of situations? Ideally we'd
118 | like tight numerical loops to be cancellable as well as IO. And we want all
119 | this without the performance penalty of inserting extra checks or safe points
120 | into loop code.
121 | 
122 | ### The challenge of preemptive cancellation
123 | 
124 | At first sight one might hope to treat preemptive cancellation somewhat like
125 | `InterruptException`: wake the task, deliver a signal to its thread to generate
126 | a `CanceledException` which then unwinds the stack, running regular user
127 | cleanup code.
128 | 
129 | The key difficulty here is that arbitrary preemptive cancellation can occur in
130 | any location with no syntactic hint in the source. Others [have
131 | claimed](https://github.com/golang/go/issues/29011#issuecomment-443441031) that
132 | this makes arbitrary cancellation an impossible problem for user code. The
133 | standard compromise is to make only a core set of operations (including IO)
134 | cancellable. This is the solution offered in
135 | [Python Trio checkpoints](https://trio.readthedocs.io/en/stable/reference-core.html#checkpoints),
136 | libdill's family of IO functions and in pthreads (see [pthread\_cancel](http://man7.org/linux/man-pages/man3/pthread_cancel.3.html)
137 | and [pthreads cancellation points](http://man7.org/linux/man-pages/man7/pthreads.7.html)).
138 | In contrast, consider the failed preemptive cancellation APIs
139 | [Java `Thread.stop`](https://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html)
140 | and the Windows API
141 | [`TerminateThread`](https://devblogs.microsoft.com/oldnewthing/?p=91811),
142 | both of which were found to be fundamentally non-robust.
143 | 
144 | Let's consider the ways in which current Julia code can be non-robust in the
145 | face of `InterruptException`. A particular difficulty occurs in resource
146 | acquisition. Consider this snippet from task.jl:
147 | 
148 | ```julia
149 | lock(t.donenotify)
150 | # < What if InterruptException is thrown here?
151 | try
152 |     while !istaskdone(t)
153 |         wait(t.donenotify)
154 |     end
155 | finally
156 |     unlock(t.donenotify)
157 | end
158 | ```
159 | 
160 | In Julia we have the escape hatch `disable_sigint` (`jl_sigatomic_begin` in the
161 | runtime) for deferring `InterruptException`, but most code doesn't consider or
162 | use this, which makes user resource handling broken by default.
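
As a sketch of what using this escape hatch might look like, the following defers `InterruptException` around lock acquisition and re-enables it only inside the `try` block. It uses `Base.disable_sigint` and `Base.reenable_sigint`. The lock and counter are assumptions for the example, and the pattern only guards against SIGINT, not other asynchronous failures.

```julia
l = ReentrantLock()
counter = Ref(0)

disable_sigint() do
    lock(l)                     # no InterruptException can be delivered here ...
    try
        reenable_sigint() do    # ... but the protected work stays interruptible
            counter[] += 1
        end
    finally
        unlock(l)               # runs with interrupts deferred again
    end
end
```

The nesting makes the cost visible: every resource acquisition needs this extra ceremony to be robust.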
163 | 
164 | So it's fairly clear that arbitrary cancellation without cleanup is a
165 | non-starter and that arbitrary cancellation with cleanup is difficult. But that
166 | leaves us in a difficult situation: how do we allow for cancellation of
167 | expensive numerical operations? Are there options for cancellation of numerical
168 | loops with a semantic which can be understood by users? The Go people seem to
169 | consider that arbitrary *preemption* is workable, but can arbitrary
170 | cancellation be made to work with the right language and library features?
171 | 
172 | #### Runtime technicalities for preemption
173 | 
174 | On a technical level, our runtime situation in Julia 1.3 is very similar to
175 | Go, where preemption is cooperative and a rogue goroutine can sometimes wedge
176 | the entire system. There has been a large amount of work in the Go community to
177 | address this, leading to the proposal
178 | ["Non-cooperative goroutine preemption"](https://github.com/golang/proposal/blob/master/design/24543-non-cooperative-preemption.md).
179 | In the process, several interesting alternatives
180 | [were assessed](https://github.com/golang/go/issues/24543) including cooperative
181 | preemption of loops (by the insertion of safe points) and more complex
182 | mechanisms such as returning from a signal to out-of-line code which leads
183 | quickly to a safe point.
184 | 
185 | ## Syntax
186 | 
187 | When comparing to solutions in other languages it's important to mention that
188 | many have introduced special syntax to mark concurrent code.
189 | 
190 | * C# introduced `async`/`await`; many followed (Python, Rust, ...). This makes
191 |   potential suspension points syntactic.
192 | * `await` in Python marks preemption points. `async` is required to go with it,
193 |   forming a chain of custody around "potentially suspending" functions.
194 | * Kotlin has `suspend` to introduce a special calling convention which passes
195 |   along the coroutine context.
196 | * Go doesn't have `async` or `await` but is deeply concurrent and is the best
197 |   analogy to Julia.
198 | 
199 | The problem with `async`/`suspend` is that it splits the world of functions in
200 | two, as nicely expressed in Bob Nystrom's blog post
201 | ["What color is your function?"](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/).
202 | This is a barrier to composability because higher order functions have to know
203 | about the color of the function they're being passed. Bob argues that Go
204 | handles this in the nicest way by having first class support for continuations
205 | in the language. The Julia runtime does this in the same way.
206 | 
207 | On the other hand, a syntax such as `async`/`await` is arguably a useful visual
208 | marker for possible cancellation points (`await`) and for which functions are
209 | cancellable (`async`). Note that this doesn't have to be implemented at the
210 | language level; for example, Go's context and errgroup also allow the reader to
211 | recognize where the cancellation can happen (listening to the Done channel) and
212 | which functions can be cancelled (those that accept Context as an argument).
213 | 
214 | ## Prototypical use cases
215 | 
216 | * The "happy eyeballs" algorithm is becoming a standard example of structured
217 |   concurrency thanks to `@njsmith`'s tutorial at PyCon 2018.
218 |   - [General discussion](https://trio.discourse.group/t/happy-eyeballs-structured-concurrencys-hello-world/57)
219 |   - [Trio implementation](https://github.com/python-trio/trio/blob/master/trio/_highlevel_open_tcp_stream.py)
220 |   - [Libdill implementation](https://github.com/sustrik/libdill/blob/master/happyeyeballs.c) and [discussion](http://250bpm.com/blog:139)
221 | * The Go concurrency tutorial — in his talk `@elizarov` suggested that
222 |   implementing all the examples there was a great inspiration.
223 | 
224 | ## Related julia issues and prototypes
225 | 
226 | * [Tapir parallel IR](https://github.com/JuliaLang/julia/pull/31086)
227 | 
228 | * [API Request : Interrupt and terminate a task](https://github.com/JuliaLang/julia/issues/6283)
229 | * [Error handling in tasks](https://github.com/JuliaLang/julia/issues/32677)
230 | * [Uncaught exceptions from tasks](https://github.com/JuliaLang/julia/issues/32034)
231 | * [silent errors in Tasks](https://github.com/JuliaLang/julia/issues/10405)
232 | * [asyncmap: Include original backtrace in rethrown exception](https://github.com/JuliaLang/julia/pull/32749)
233 | 
234 | TODO: We should organize these, and more, with a tag.
235 | 
236 | * [Awaits.jl](https://github.com/tkf/Awaits.jl)
237 | 
238 | 
239 | ## Resources
240 | 
241 | A lot has been written on structured concurrency quite recently. Relevant
242 | implementations are available in C, Kotlin and Python, with Go also having to
243 | deal with many of the same issues. The Trio forum has a section dedicated to
244 | the [language-independent discussion of structured
245 | concurrency](https://trio.discourse.group/c/structured-concurrency).
246 | 
247 | #### Links
248 | 
249 | * [Structured concurrency resources - Structured concurrency - Trio forum](https://trio.discourse.group/t/structured-concurrency-resources/21)
250 | * [Reading list · Python-Trio/Trio Wiki](https://github.com/python-trio/trio/wiki/Reading-list)
251 | 
252 | #### People in the wider community
253 | 
254 | * Bob Nystrom ([`@munificent`](http://journal.stuffwithstuff.com)) works on the
255 |   Dart language at Google. Regarding async/await, he wrote a very on-topic post
256 |   - [What color is your function?](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/).
257 | * Martin Sústrik ([`@sustrik`](https://github.com/sustrik)) is the author of
258 |   the C library libdill, and has an interesting [blog](http://250bpm.com/) in
259 |   which the term "structured concurrency" appears to have first been
260 |   used:
261 |   - [Structured Concurrency](http://250bpm.com/blog:71)
262 |   - [Update on Structured Concurrency](http://250bpm.com/blog:137)
263 |   - [Two approaches to structured concurrency](http://250bpm.com/blog:139)
264 | * Nathaniel Smith ([`@njsmith`](https://github.com/njsmith)) is the author of
265 |   the Python Trio library and a key advocate of structured concurrency. His
266 |   [blog](https://vorpus.org/blog/archives.html) has several very interesting
267 |   posts on the topic.
268 |   - [Timeouts and cancellation for humans](https://vorpus.org/blog/timeouts-and-cancellation-for-humans)
269 |   - [Notes on structured concurrency, or: go statement considered harmful](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/).
270 | 
271 |   See also his PyCon 2018 talk:
272 |   - [Nathaniel J. Smith - Trio: Async concurrency for mere mortals - PyCon 2018 - YouTube](https://www.youtube.com/watch?v=oLkfnc_UMcE)
273 | * Roman Elizarov ([`@elizarov`](https://github.com/elizarov)) is the team lead
274 |   for Kotlin libraries at JetBrains. Here's his [blog](https://medium.com/@elizarov).
275 |   - [Structured concurrency](https://youtu.be/hW4vjgtPCAY?t=25960) for Kotlin talk at Hydraconf ([talk abstract](https://hydraconf.com/2019/talks/68l5ztovlf0xm9aindouzr))
276 |   - [Kotlin structured concurrency blog post](https://medium.com/@elizarov/structured-concurrency-722d765aa952)
277 | 
278 | #### Structured concurrency libraries
279 | 
280 | * [libdill (C)](http://libdill.org/structured-concurrency.html)
281 | * [Trio (Python)](https://trio.readthedocs.io/en/stable)
282 | * [Kotlin coroutines](https://kotlinlang.org/docs/reference/coroutines/basics.html#structured-concurrency)
283 | 
284 | #### Cancellation
285 | 
286 | * Python
287 |   - [Timeouts and cancellation for humans](https://vorpus.org/blog/timeouts-and-cancellation-for-humans)
288 | * Go
289 |   - [errgroup](https://godoc.org/golang.org/x/sync/errgroup)
290 |   - [context](https://golang.org/pkg/context)
291 |   - [Discussion of using the Trio approach for Go](https://github.com/golang/go/issues/29011)
292 | 


--------------------------------------------------------------------------------
/GcExtensions.md:
--------------------------------------------------------------------------------
  1 | # Garbage Collector Extensions
  2 | 
  3 | - **Title:** Garbage collector extensions for better foreign language support
  4 | - **Author:** Reimer Behrends <behrends@gmail.com>
  5 | - **Created:** May 2018
  6 | - **Status:** work in progress
  7 | 
  8 | ## Introduction
  9 | 
 10 | Support is currently somewhat limited for modules written entirely or
 11 | partly in foreign languages that need to interface with the Julia GC, or
 12 | that use the GC for allocations which do not neatly fit Julia's type
 13 | system or rely on low-level approaches not available in Julia (such as
 14 | irregular data structure layouts).
 15 | 
 16 | This proposal aims at fleshing out the API for allowing more complex
 17 | interaction of foreign code with the GC, especially the use of long-lived
 18 | foreign objects that are inextricably interwoven with Julia
 19 | objects.
 20 | 
 21 | Specific use cases that we are trying to address are:
 22 | 
 23 | 1. *Allowing the Julia GC to manage foreign objects with arbitrary
 24 |   layouts.* Not all objects -- especially those from preexisting
 25 |   libraries -- fit Julia's type system, for example, specialized
 26 |   container types written in C/C++. Such objects can comprise multiple
 27 |   memory blocks that require a custom marking mechanism and may also
 28 |   require low-level finalizer behavior written in C.
 29 | 2. *Providing additional roots to the GC.* Currently, to have additional
 30 |   roots, they must be stored in a location that is visible to Julia.
 31 |   This can be expensive if such roots are updated frequently or are
 32 |   contained in data structures that would have to be laboriously
 33 |   translated into a format usable by Julia. Instead, we want to allow
 34 |   for roots to be discoverable at the beginning of a garbage collection.
 35 | 3. *Conservative scanning of stack frames and objects.* Currently,
 36 |   scanning has to be precise. If we desire to use the GC for
 37 |   foreign code that requires conservative scanning (especially for
 38 |   foreign stack frames), then it is necessary to have functionality
 39 |   that determines whether a machine word is a pointer to an object,
 40 |   including to its interior.
 41 | 
 42 | To demonstrate the applicability and viability of these mechanisms, we
 43 | have fully integrated Julia with the GAP computer algebra system, to the
 44 | point that GAP's regular garbage collector is completely replaced with
 45 | Julia's and that the lifetime of all GAP objects is entirely managed by
 46 | Julia. We also implemented a self-contained test program using these
 47 | mechanisms, and integrated it into the Julia test suite.
 48 | 
 49 | The proposed implementation should not incur measurable overhead for
 50 | Julia itself, as it only exposes additional functionality that is unused
 51 | by Julia code, plus functionality hooks that are designed to only incur
 52 | a few clock cycles of overhead per garbage collection. More specific
 53 | discussion of overhead can be found accompanying the descriptions of
 54 | these hooks.
 55 | 
 56 | An implementation of this proposal can be found on GitHub under
 57 | <https://github.com/rbehrends/julia> (branch `rb/gc-extensions`). See
 58 | the example in the `test/gcext` subdirectory for an example of using
 59 | this API.
 60 | 
 61 | An implementation of GAP that uses the Julia GC in lieu of its native
 62 | GC can likewise be found on GitHub at <https://github.com/rbehrends/gap>
 63 | (branch `alt-gc`). That version of GAP can be built with:
 64 | 
 65 |     ./autogen.sh
 66 |     ./configure --with-gc=julia --with-julia=/path/to/julia/usr
 67 |     make
 68 | 
 69 | ## Callbacks
 70 | 
 71 | In order to allow foreign code to have access to necessary functionality
 72 | in the garbage collector, we allow foreign code to register callbacks for
 73 | certain GC events. We provide for six types of callbacks:
 74 | 
 75 | 1. Beginning of garbage collection (`pre_gc`)
 76 | 2. End of garbage collection (`post_gc`)
 77 | 3. When scanning GC roots (`root_scanner`)
 78 | 4. When scanning Julia tasks (`task_scanner`)
 79 | 5. When an external object is allocated (`notify_external_alloc`).
 80 | 6. When an external object is deallocated (`notify_external_free`).
 81 | 
 82 | These callbacks are *not* per se thread-safe. It is up to the callback
 83 | implementation to ensure that no violations of thread-safety occur.
 84 | 
 85 | In particular, each of these can be called from any thread. All except the
 86 | first two can be called concurrently. In the current Julia GC implementation,
 87 | the `post_gc` callback may also not be called before the next `pre_gc`.
 88 | 
 89 | By external objects, we mean what in the current Julia implementation are
 90 | called `bigval_t` objects. These are allocated using the system's memory
 91 | allocator rather than using Julia's internal pool allocator. In order to not
 92 | expose this implementation detail, we talk about "internal" and "external"
 93 | objects rather than objects that are allocated as part of Julia's object
 94 | pool or through system routines, respectively.
 95 | 
 96 | For each type of callback, there is a corresponding function pointer type.
 97 | Registering and deregistering callbacks occurs via corresponding setter
 98 | functions.
 99 | 
100 | ```
101 | typedef void (*jl_gc_cb_pre_gc_t)(int full);
102 | typedef void (*jl_gc_cb_post_gc_t)(int full);
103 | typedef void (*jl_gc_cb_root_scanner_t)(int full);
104 | typedef void (*jl_gc_cb_task_scanner_t)(jl_task_t *task, int full);
105 | typedef void (*jl_gc_cb_notify_external_alloc_t)(void *addr, size_t size);
106 | typedef void (*jl_gc_cb_notify_external_free_t)(void *addr);
107 | 
108 | void jl_gc_set_cb_root_scanner(jl_gc_cb_root_scanner_t cb, int enable);
109 | void jl_gc_set_cb_task_scanner(jl_gc_cb_task_scanner_t cb, int enable);
110 | void jl_gc_set_cb_pre_gc(jl_gc_cb_pre_gc_t cb, int enable);
111 | void jl_gc_set_cb_post_gc(jl_gc_cb_post_gc_t cb, int enable);
112 | void jl_gc_set_cb_notify_external_alloc(jl_gc_cb_notify_external_alloc_t cb, int enable);
113 | void jl_gc_set_cb_notify_external_free(jl_gc_cb_notify_external_free_t cb, int enable);
114 | ```
115 | 
116 | For each setter function, a callback function is supplied, along with a flag
117 | (`1` for enabling the callback, `0` for removing it again). Attempting to
118 | register a callback multiple times will only register it once.
119 | 
120 | *Performance impact:* The callback implementation is designed to incur
121 | negligible overhead if no callbacks are used and no more overhead than
122 | necessary to invoke the callbacks. The callbacks are all kept in linked
123 | lists; if no callbacks are registered, all that is done is testing a
124 | static variable for null and branching if it is. As branch
125 | behavior should always be the same, only a few clock cycles are used, as
126 | long as the variable is in the cache and the branch target in the BTB.
127 | 
128 | ## Additional GC roots and hooking into the GC process
129 | 
130 | We provide three callbacks that are called at the beginning of a GC
131 | (`pre_gc`), the beginning of the mark phase (`root_scanner`), and the end of
132 | the GC (`post_gc`). As these callbacks are tested and called only once per
133 | collection, overhead should be negligible. The `full` argument passed
134 | to these callbacks indicates whether this is a full or partial garbage
135 | collection.
136 | 
137 | In addition, we also provide a `task_scanner` hook, which functions like
138 | the `root_scanner` hook, except that it is called for each task and with
139 | a pointer to the task object as its first argument.
140 | 
141 | Additional roots can be marked from the `root_scanner` and
142 | `task_scanner` callbacks by calling the `jl_gc_mark_queue_obj()`
143 | function, which takes a pointer to the current thread's thread-local
144 | storage and a pointer to the object as its parameters.
145 | 
146 | ```
147 | int jl_gc_mark_queue_obj(jl_ptls_t ptls, jl_value_t *obj);
148 | ```
149 | 
150 | The `ptls` parameter can be filled in from the return value of
151 | the `jl_get_ptls_states()` function, which returns a pointer to
152 | the thread-local storage of the current thread.
153 | 
154 | The return value of `jl_gc_mark_queue_obj()` can be ignored for marking
155 | roots, but will be relevant for marking foreign objects (see below).
156 | 
157 | When processing large objects, calling `jl_gc_mark_queue_obj()` can be
158 | inefficient, as each object will be pushed on the mark stack separately.
159 | 
160 | If possible, it is therefore recommended that programmers use the
161 | following function, designed for arrays of references, which handles
162 | this use case more efficiently:
163 | 
164 | ```
165 | void jl_gc_mark_queue_objarray(jl_ptls_t ptls, jl_value_t *parent,
166 |     jl_value_t **objs, size_t nobjs);
167 | ```
168 | 
169 | Here, `parent` is a reference to the current object, `objs` is a pointer
170 | to the start of an array of object references, and `nobjs` is the number
171 | of object references contained in that array. That array must be part of
172 | the object; it must not be allocated in static memory or on the stack.
173 | 
174 | Unlike `jl_gc_mark_queue_obj()`, this function does not have a return
175 | value, as it does the requisite tracking itself.
176 | 
177 | Calling this function will only require one slot on the mark stack, as
178 | opposed to the `nobjs` slots that individual calls to
179 | `jl_gc_mark_queue_obj()` would require, making it considerably more
180 | memory efficient.
181 | 
182 | ## Managing foreign objects with custom layouts
183 | 
184 | Foreign objects with custom layouts can define their own datatype through
185 | the `jl_new_foreign_type()` function:
186 | 
187 | ```
188 | typedef uintptr_t (*jl_markfunc_t)(jl_ptls_t ptls, jl_value_t *obj);
189 | typedef void (*jl_sweepfunc_t)(jl_value_t *obj);
190 | 
191 | jl_datatype_t *jl_new_foreign_type(
192 |   jl_sym_t *name,
193 |   jl_module_t *module,
194 |   jl_datatype_t *super,
195 |   jl_markfunc_t markfunc,
196 |   jl_sweepfunc_t sweepfunc,
197 |   int haspointers,
198 |   int large
199 | );
200 | ```
201 | 
202 | The first three parameters of `jl_new_foreign_type` are the same as for
203 | regular data types; they are followed by a pointer to a mark function
204 | (`markfunc`) and a pointer to a sweep function (`sweepfunc`), the latter
205 | of which can be null.
206 | 
207 | The `haspointers` parameter should be non-zero if instances of the new
208 | datatype may contain references to Julia objects; the `large` parameter
209 | should be non-zero if the size of instances of the new datatype will be
210 | greater than the value returned by `jl_gc_max_internal_obj_size()` and
211 | zero otherwise. If instances can be either larger or not, then two
212 | distinct foreign types need to be created, one for the case where the
213 | size is less than or equal and one for the case where it is larger than
214 | the value of `jl_gc_max_internal_obj_size()`.
215 | 
216 | ```
217 | size_t jl_gc_max_internal_obj_size(void);
218 | ```
219 | 
220 | *Performance impact:* Custom mark functions need to be called during the
221 | performance-critical mark loop of the garbage collector. In order to
222 | avoid overhead for the other cases, the code is engineered to consider
223 | such objects as the last possible option in the existing if-else chains.
224 | To accomplish that, such foreign types use the existing
225 | `jl_datatype_layout_t` structure, with `fielddesc_type` set to `3`,
226 | which is looked at after the other data types and the other alternatives
227 | for `fielddesc_type`.
228 | 
229 | ### Mark functions for foreign objects
230 | 
231 | The mark function `markfunc` gets passed a pointer to thread-local
232 | storage (`ptls`)
233 | and the object to be marked (which will be of the type defined through
234 | `jl_new_foreign_type()`). The `ptls` argument is an optimization so that
235 | `jl_get_ptls_states()` does not need to be called unnecessarily during
236 | the mark loop.
237 | 
238 | The mark function implementation also uses `jl_gc_mark_queue_obj()` to
239 | mark objects, as with the `root_scanner` callback; however, in contrast to marking
240 | roots, the return value cannot be ignored. Per object, the mark function
241 | should count how often calls to `jl_gc_mark_queue_obj()` for subobjects return
242 | non-zero values and return that number. If an object has no subobjects,
243 | the mark function should return zero.
244 | 
245 | This information is relevant for the generational part of garbage
246 | collection. The return value of `jl_gc_mark_queue_obj()` is non-zero
247 | if a young generation object has been marked. When the mark function
248 | has been called for an old object and the mark function returns a
249 | non-zero value (thus showing how many young objects have been marked
250 | from the old one), the GC knows to update its internal data
251 | structures accordingly.
252 | 
253 | For an example of this, see the `gcext` test in the Julia repository,
254 | which defines a couple of such custom mark functions.
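The counting rule described above can be sketched as follows. This is a minimal illustration only, not the code from the `gcext` test: so that it compiles without Julia's headers, `jl_value_t`, `jl_ptls_t`, and `jl_gc_mark_queue_obj()` are replaced by small stand-ins, and `pair_t` and `pair_markfunc` are hypothetical names. Real code would include Julia's headers instead of defining the mocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions for this sketch only); real code uses Julia's headers. */
typedef struct jl_value_t { int young; int marked; } jl_value_t;
typedef void *jl_ptls_t;

/* Mock of jl_gc_mark_queue_obj(): marks obj and returns non-zero
 * if the object belongs to the young generation. */
static int jl_gc_mark_queue_obj(jl_ptls_t ptls, jl_value_t *obj)
{
    (void)ptls;
    obj->marked = 1;
    return obj->young;
}

/* Hypothetical foreign object holding up to two Julia references. */
typedef struct {
    jl_value_t *left;
    jl_value_t *right;
} pair_t;

/* Mark function with the jl_markfunc_t signature: queues each subobject
 * and returns how many of those calls reported a young object. */
static uintptr_t pair_markfunc(jl_ptls_t ptls, jl_value_t *obj)
{
    assert(obj != NULL);
    pair_t *p = (pair_t *)obj;
    uintptr_t young_marked = 0;
    if (p->left != NULL && jl_gc_mark_queue_obj(ptls, p->left) != 0)
        young_marked++;
    if (p->right != NULL && jl_gc_mark_queue_obj(ptls, p->right) != 0)
        young_marked++;
    return young_marked; /* zero when there are no subobjects */
}
```

An object without subobjects simply returns zero, which matches the rule stated above.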
255 | 
256 | ### Sweep functions for foreign objects
257 | 
258 | Sweep functions for foreign objects are similar to, but more limited
259 | than, finalizers, as they are not intended to replace finalizer
260 | functionality. Rather, they are meant to clean up complex memory
261 | structures allocated with raw malloc calls or operating system
262 | resources. They will be called during the sweep phase and must not have
263 | side effects that are visible to Julia.
264 | 
265 | To enable sweep functions for a foreign object, the function
266 | `jl_gc_schedule_foreign_sweepfunc()` has to be called on the object,
267 | which has to be of a foreign type, and that foreign type has to be
268 | defined with a non-null sweep function `sweepfunc`. Without that call,
269 | the sweep function will not be called on this particular object. This is
270 | to avoid unnecessary overhead if not all objects of that type require
271 | extra sweep phase semantics. This function should be called at most once
272 | per object; if called multiple times, the sweep function may be invoked
273 | more than once on the given object.
274 | 
275 | ```
276 | JL_DLLEXPORT void jl_gc_schedule_foreign_sweepfunc(jl_ptls_t ptls,
277 |         jl_value_t *obj);
278 | ```
279 | 
280 | ### Allocating foreign objects
281 | 
282 | On the C side, such objects can be allocated using the call
283 | `jl_gc_alloc_typed()`; the function takes a pointer to the thread's
284 | thread-local storage, the desired size, and the foreign datatype as its
285 | arguments.
286 | 
287 | ```
288 | JL_DLLEXPORT void * jl_gc_alloc_typed(jl_ptls_t ptls, size_t sz,
289 |   void *ty);
290 | ```
291 | 
292 | ## Conservative scanning
293 | 
294 | Some external modules may require conservative scanning, especially
295 | of the stack. This was the case, for example, with our application
296 | involving the GAP computer algebra system.
297 | 
298 | We note that conservative scanning should be avoided if at all possible;
299 | it is not intended as a way to avoid tracking Julia references (for which
300 | the `root_scanner` callback and custom marking functions offer efficient
301 | options if other approaches fail), but as a feature of last resort if
302 | integrating an existing codebase through other means is not viable.
303 | 
304 | Conservative scanning must be enabled through a call to the following
305 | function:
306 | 
307 | ```
308 | void jl_gc_enable_conservative_scanning(void);
309 | ```
310 | 
311 | This function can be called from C code both before and after `jl_init()` 
312 | and is thread-safe. Enabling this introduces a very small, but non-zero
313 | overhead, which is why it is not enabled by default.
314 | 
315 | In order to handle conservative scanning, we need to expose the fact
316 | that Julia distinguishes between objects it manages itself (which we
317 | call "internal objects" in this document) and objects that it manages
318 | via `malloc()` or similar calls (these we call "external objects").
319 | 
320 | The proposed functionality relies on calls to Julia to determine
321 | if a pointer is a reference to an internal object, but leaves it up
322 | to the author of the foreign code to determine this for external
323 | objects; to this end, we provide callbacks to notify foreign code
324 | of the allocation or deallocation of such objects.
325 | 
326 | The accompanying function pointer types are:
327 | 
328 | ```
329 | typedef void (*jl_gc_cb_notify_external_alloc_t)(void *addr, size_t size);
330 | typedef void (*jl_gc_cb_notify_external_free_t)(void *addr);
331 | ```
332 | 
333 | The allocation callback is invoked with the address and size of the
334 | new object; the deallocation callback is invoked with the address of
335 | the object about to be freed. Allocation and deallocation are still
336 | managed by Julia. The intent here is that foreign code can track
337 | allocations and deallocations in a data structure of its own if
338 | needed. An example of this can be seen in the `gcext` test, where
339 | we use a balanced tree to track allocations.
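As a rough illustration of such a tracking structure, the sketch below records external allocations in a fixed-size table and answers whether a pointer falls inside a tracked object. It is an assumption-laden simplification: the `gcext` test uses a balanced tree, whereas this uses a linear scan; the names `on_external_alloc`, `on_external_free`, `external_obj_base`, and `MAX_TRACKED` are hypothetical; and no locking is done, so it is only suitable where the callbacks are not invoked concurrently. Registering the callbacks with Julia is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 1024 /* hypothetical capacity, chosen for the sketch */

typedef struct {
    uintptr_t addr; /* start address of the external object */
    size_t size;    /* size in bytes */
} extent_t;

static extent_t tracked[MAX_TRACKED];
static size_t ntracked = 0;

/* Same signature as jl_gc_cb_notify_external_alloc_t. */
static void on_external_alloc(void *addr, size_t size)
{
    assert(ntracked < MAX_TRACKED);
    tracked[ntracked].addr = (uintptr_t)addr;
    tracked[ntracked].size = size;
    ntracked++;
}

/* Same signature as jl_gc_cb_notify_external_free_t. */
static void on_external_free(void *addr)
{
    for (size_t i = 0; i < ntracked; i++) {
        if (tracked[i].addr == (uintptr_t)addr) {
            tracked[i] = tracked[ntracked - 1]; /* order is irrelevant */
            ntracked--;
            return;
        }
    }
}

/* Returns the start of the tracked external object containing p,
 * or NULL if p does not point into any tracked object. */
static void *external_obj_base(void *p)
{
    uintptr_t u = (uintptr_t)p;
    for (size_t i = 0; i < ntracked; i++) {
        if (u >= tracked[i].addr && u - tracked[i].addr < tracked[i].size)
            return (void *)tracked[i].addr;
    }
    return NULL;
}
```

A conservative stack scanner could call `external_obj_base()` on each candidate word once the check for internal objects has failed.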
340 | 
341 | Note that registering such callbacks will only track allocations that
342 | occur *after* the callbacks have been set. We assume here that the client
343 | is only interested in tracking its own objects that may be stored in
344 | opaque stack frames, but not other Julia objects that may be passed in
345 | from Julia calls. If the client needs to track *all* allocations, then
346 | the callbacks *must* be registered before calling `jl_init()`.
347 | 
348 | *Performance impact:* The overhead for the callbacks should be minimal,
349 | especially since the cost of allocating large objects through the system
350 | allocator and initializing them will dominate the allocation process.
351 | 
352 | Note that some of these objects may not have a valid type field and
353 | especially in the context of conservative scanning, pointers to
354 | objects with invalid type fields may inadvertently be generated. In
355 | such a case, the validity of the type field should also be checked,
356 | e.g. with: `jl_gc_internal_obj_base_ptr(jl_typeof(obj)) != NULL` (see
357 | below for the semantics of this function).
358 | 
359 | To determine whether a pointer points to an internal object, the
360 | following functions may be used:
361 | 
362 | ```
363 | jl_value_t *jl_gc_internal_obj_base_ptr(void *p);
364 | int jl_gc_is_internal_obj_alloc(jl_value_t *p);
365 | ```
366 | 
367 | The `jl_gc_internal_obj_base_ptr()` function returns `NULL` if the
368 | argument does not point to the beginning, the interior, or the end of an
369 | internal object. Otherwise, it returns a pointer to the beginning of the
370 | object it points to. The `jl_gc_is_internal_obj_alloc()` function is an
371 | optimized fast path version; it returns a non-zero value if and only if the
372 | argument is a valid internal object or if it points to memory reserved
373 | for the allocation of such objects. In the latter case, it is guaranteed
374 | that the type field of such an object does not contain a valid datatype.
375 | 
376 | ## Performance evaluation
377 | 
378 | In order to evaluate the changes for performance, we ran the system
379 | with no callbacks or foreign types installed against the Julia base
380 | benchmarks (namely, the "array", "collection", "micro", "shootout",
381 | "sparse", "string", and "tuple" suites) for both our changes and
382 | a recent version of the master branch.
383 | 
384 | We did not observe any performance regressions in those benchmarks.
385 | While, due to the noisiness of our test system, spurious regressions
386 | cropped up occasionally (about a handful per run), none of them
387 | persisted for more than one run of the suite.
388 | 
389 | Also, when testing for improvements, similar and similarly common
390 | performance changes occurred in the other direction, including spurious
391 | "regressions" of the master branch compared to our changed version.
392 | 
393 | Finally, we ran a couple of specialized microbenchmarks (included below)
394 | designed to stress-test the garbage collector several times and observed
395 | the performance over several runs for both the master branch version and
396 | our changes; we did not observe significant differences in the
397 | distribution of `@btime` results.
398 | 
399 |     using BenchmarkTools
400 | 
401 |     function bencharr(n, m, m2, x, y)
402 |       global t = [ (x, y) for i in 1:(n * n * m2) ]
403 |       local a = [ [ (x, y) for j in 1:n ] for i in 1:n ]
404 |       for i in 1:m
405 |         a = map(outer -> map(inner -> (inner[2], inner[1]), outer), a)
406 |       end
407 |     end
408 | 
409 |     function fac(n)
410 |       local result = BigInt(1)
411 |       for i in 2:n
412 |         result *= i
413 |       end
414 |       return result
415 |     end
416 | 
417 |     function benchfac(n)
418 |       local total = BigInt(0)
419 |       for i in 1:n
420 |         total += fac(i)
421 |       end
422 |       return total
423 |     end
424 | 
425 |     print("Array allocations benchmark:  ")
426 |     @btime bencharr(200, 1000, 100, "x", "y")
427 |     print("BigInt allocations benchmark: ")
428 |     @btime benchfac(2000)
429 | 
430 | 


--------------------------------------------------------------------------------
/Pkg3.md:
--------------------------------------------------------------------------------
  1 | # JULEP 3
  2 | 
  3 | - **Title:** Pkg3
  4 | - **Authors:** Stefan Karpinski <<stefan@karpinski.org>>, Art Diky <<wildart@gmail.com>>
  5 | - **Created:** October 21, 2016
  6 | - **Status:** work in progress
  7 | 
  8 | ## Abstract
  9 | 
 10 | Pkg3 is the working name for a next-generation replacement for Julia's built-in package manager, the current version of which is unofficially known as Pkg2 (introduced in Julia 0.2 to replace the original Pkg1).
 11 | 
 12 | ### Table of Contents
 13 | - [JULEP 3](#julep-3)
 14 |   - [Abstract](#abstract)
 15 |   - [Rationale](#rationale)
 16 |   - [Depots](#depots)
 17 |   - [Immutability](#immutability)
 18 |   - [Environments](#environments)
 19 |     - [Using Environments](#using-environments)
 20 |     - [Project Environments](#project-environments)
 21 |   - [Packages](#packages)
 22 |   - [Registries](#registries)
 23 |   - [Versions & Compatibility](#versions--compatibility)
 24 |   - [Configuration](#configuration)
 25 |     - [Configuration Fragments](#configuration-fragments)
 26 |       - [Package metadata](#package-metadata)
 27 |       - [Version metadata](#version-metadata)
 28 |       - [Compatibility](#compatibility)
 29 |       - [Runtime Configuration](#runtime-configuration)
 30 |       - [Manifest](#manifest)
 31 |     - [Source Package File](#source-package-file)
 32 |     - [Registry Package File](#registry-package-file)
 33 |   - [Operations](#operations)
 34 |     - [Adding packages](#adding-packages)
 35 |       - [Synopsis](#synopsis)
 36 |       - [Example](#example)
 37 |       - [Pseudo-code](#pseudo-code)
 38 |       - [Dependency fixing](#dependency-fixing)
 39 |       - [Questions](#questions)
 40 |     - [Removing packages](#removing-packages)
 41 |       - [Synopsis](#synopsis)
 42 |       - [Example](#example)
 43 |       - [Pseudo-code](#pseudo-code)
 44 |     - [Updating & upgrading packages](#updating--upgrading-packages)
 45 |       - [Synopsis](#synopsis)
 46 |       - [Examples](#examples)
 47 |       - [Pseudo-code](#pseudo-code)
 48 | 
 49 | ## Rationale
 50 | 
 51 | There are a number of issues with the design of Pkg2, which necessitate a redesign and replacement:
 52 | 
 53 | - Pkg2's METADATA repository format uses many small files to represent data, which leads to awful performance on many filesystems, especially on Windows.
 54 | - Pkg2 uses a variety of ad hoc configuration formats which are simple but not particularly consistent.
 55 | - Pkg2 identifies versions of packages by git SHA1 commit hashes. This forces the package manager to use git to acquire package versions and makes package installation and verification impossible without including the entire git history of a package – which can be impractical.
 56 | - Some Julia packages have large objects in their git history, which users are forced to download even when they are installing more recent versions that no longer include these large objects.
 57 | - Pkg2 makes replacing a package with another package of the same name with disjoint git history a nightmare. This happened when `Stats` was renamed to `StatsBase` and a new `Stats` package was created. The only practical way to resolve this situation was to delete all packages and start over. Moreover, versions of `StatsBase` from before the rename became uninstallable afterwards.
 58 | - Pkg2 was designed to allow package development in the same location as package installation for usage. This design forces Pkg2 to use complex and subtle heuristics to try to determine when it is safe to update or modify installed packages. A large amount of code complexity stems from this design.
 59 | - Pkg2's package version resolution is designed to depend only on requirements and version information in METADATA, *not* on the current set of installed package versions. This implies that any update potentially updates all packages to the latest available version. This is typically undesirable: one often wants to do much more conservative, targeted updates of a subset of installed packages. Pkg2's update behavior effectively assumes that the user has carefully and accurately curated their exact requirement of packages, and that package developers never break things – neither of which is typically true.
 60 | - In Pkg2 *any* operation on packages invokes a full version resolution, not just explicit updates: adding or removing a new package updates all packages. This is unfortunate behavior for a package manager. It should be possible to add a new package with zero or minimal changes to pre-installed packages. It should always be possible to remove a package by simply removing it and its dependents.
 61 | - Pkg2 provides little support for projects tracking the precise versions of libraries and packages that they have used. This makes reproducibility more challenging than it should be.
 62 | - The `JULIA_PKGDIR` environment variable allows some amount of simulation of virtualenv-like "environments" – i.e. different sets of packages and language versions. This could be much better supported, however, and environment contents should ideally be easily committable and sharable between different projects and systems, at various levels of granularity.
 63 | 
 64 | ## Depots
 65 | 
 66 | A **depot** is a file system location where we keep infrastructure related to Julia package management: registries, libraries, packages, and environments. There are typically at least three of these:
 67 | 
 68 | - **Standard depot:** default packages and libraries that ship with a specific version of Julia. This depot is strictly read-only. These versions of libraries and packages serve as a fallback when no other depots are available. If you delete or disable this depot as well, standard packages will be unavailable. Example: `/usr/local/share/julia/standard`.
 69 | 
 70 | - **System depot:** package versions and libraries installed here are available to everyone on the system. They are typically only writable by administrators. If users want to add or upgrade packages, they will do so in their individual user depots. Example: `/usr/local/share/julia/system`.
 71 | 
 72 | - **User depot:** package versions and libraries installed by a user. Example: `~/.julia/`.
 73 | 
 74 | Note the lack of Julia versions in this scheme: a depot is expected to be shared between different Julia versions. This should work because of the principle of immutability (see below): since we don't update versions of libraries or packages in place, installed copies can be shared between different versions of Julia without issues. Different sets of library and package versions are handled at the environment level.
 75 | 
 76 | Each package depot contains the following directories:
 77 | 
 78 | - **`registries`:** named registries describe sets of packages, versions and compatibility between them.
 79 | - **`libraries`:** installed versions of libraries (e.g. `libcairo`,  `libpango`).
 80 | - **`packages`:** installed versions of Julia packages (e.g. `Cairo`,  `DataFrames`, `JuMP`).
 81 | - **`environments`:** named sets of versions of libraries and packages and global configuration.
 82 | 
 83 | Some environment and/or Julia variable – `DEPOT_PATH` maybe? – will control the set of depots visible to a Julia process. The registries, libraries, packages, and environments visible to Julia are the union across all depots in the depot path.
 84 | 
 85 | The set of registered packages visible to a Julia process is the union of all packages specified across all registries, merging specifications of the same package occurring in multiple registries by the following rules:
 86 | 
 87 | - The set of known packages is the union across all registries.
 88 | - The set of available versions of a package is the union across all registries.
 89 | - If the same version of a package appears in multiple registries, all versions must match.
 90 | - The registry with the largest registered version of a package determines its metadata;
 91 |   - If two different registries "tie" then the package metadata must match.
 92 | 
 93 | The set of installed library versions is the union across depots. If the same library version occurs multiple times in the depot path, the first occurrence is used – different instances of the same library version may be different depending on how they are configured and installed. The set of installed package versions is the union across depots. If the same package version occurs multiple times in the depot path, the first occurrence is used. If installed correctly, different installations of the same package should be identical.
 94 | 
 95 | Each named environment specifies a set of specific library and package versions. These libraries and packages do not need to be installed in the same depot where the environment appears. They can be provided by another package depot, allowing preinstalled libraries and packages to be "inherited" from a system depot, for example. The default environment name is `v$(VERSION.major).$(VERSION.minor)`. This allows different versions of Julia to have different default environments.
 96 | 
 97 | ## Immutability
 98 | 
 99 | Installed libraries and packages are immutable: instead of updating libraries or packages in-place, once they are successfully installed, Pkg3 leaves them as-is until they are no longer needed. This requires a "cleanup" mechanism that does garbage collection of old, unused versions of libraries and packages. To that end, `Pkg3` will maintain a sorted `~/.julia_env.log` file tracking the paths of environment files they have used. During cleanup, if a path no longer points to a valid environment file, the entry is removed from `~/.julia_env.log`; if a path does point to a valid environment file, it is retained, and library and package versions referred to by it are considered to be in use. Any library or package versions that are not marked as in use are removed. When cleaning up a system depot, all user environment logs are scanned; when cleaning up a user depot, only that user's environment log is considered.
100 | 
101 | ## Environments
102 | 
103 | An **environment** captures a specific set of package and library versions and their global configuration. Pkg2 has some limited support for changing environments using the `JULIA_PKGDIR` environment variable. Pkg3 makes named environments and project-local environments a primary part of its design, making the invocation of Julia with different sets of libraries and packages far more convenient. It also standardizes how to record the names and versions of libraries and packages that are used, improving reproducibility.
104 | 
105 | In Pkg2, package operations like `Pkg.add`, `Pkg.rm`, and `Pkg.update` are somewhat inconsistent about whether they operate on the current running Julia process or not. This is because different actions have different feasibility with respect to the current session: it's possible to install or update a package before it is loaded, but it is impossible to remove or update an already-loaded package. Thus, performing operations on the set of available packages *in general* requires a restart of the process before it can take effect, but installing and then loading a new package without restarting the current process is common and useful.
106 | 
107 | In Pkg3, general operations on environments are not done in the Julia process using an environment. Instead, they are done through a standalone process, which (although it is implemented in Julia) does not operate within the environment that it manipulates. The most common operation, however – installing and loading a new package – will typically be done implicitly and automatically in an interactive Julia session. In other words, when the user does `using XYZ` in the REPL, if `XYZ` is not installed, the REPL will prompt the user if they want to install `XYZ` and its dependencies, and if they agree, it will install and then load it. Since this is the most common operation and it can be done without restarting the current Julia process, it makes sense that it be handled specially. When the user wants to remove packages from or update packages in an environment, they will instead invoke an external package management mode (`julia --pkg`?), which makes it clear that changes will not affect any currently running Julia sessions. The impact on usability is a strict improvement:
108 | 
109 | - Adding packages and loading them is easier since one simply does `using XYZ` and answers interactive prompts.
110 | - Removing and upgrading packages is no more difficult, since it previously required restarting the current Julia process anyway, and is less confusing, since the requirement to restart is explicit: running a separate process clearly doesn't affect the current one.
111 | 
112 | ### Using Environments
113 | 
114 | When starting Julia, it is given an environment by default, by name or by path:
115 | 
116 | - `julia`: use the default named environment – `v$(VERSION.major).$(VERSION.minor)`.
117 | - `julia --env=abc`: use the environment named "abc", searched for in the depot path.
118 | - `julia --env=.`: use the local project environment (see below).
119 | - `julia --env=./proj`: use the project environment of the directory `./proj`.
120 | - `julia --env=./env.toml`: use environment described by the file `./env.toml`.
121 | 
122 | An environment spec with no slash is taken to be a named environment – except for the special name `.` which indicates using the current project environment. An environment spec with a slash is taken to be a path (relative or absolute): if the path is a directory, it is interpreted as a project and the project environment is used; if the path is a file, it is loaded as an environment specification (in TOML format, see "Configuration" below).
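The rule for interpreting an environment spec can be sketched as a small classifier. This is only an illustration of the rule as stated above, not part of any Pkg3 implementation; `env_kind_t` and `classify_env_spec` are hypothetical names, and the final step of a path spec (checking whether the path is a directory or a file) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of the argument given to --env. */
typedef enum {
    ENV_NAMED,           /* no slash: looked up by name in the depot path */
    ENV_CURRENT_PROJECT, /* the special name "." */
    ENV_PATH             /* contains a slash: a project directory or a TOML file */
} env_kind_t;

static env_kind_t classify_env_spec(const char *spec)
{
    assert(spec != NULL);
    if (strchr(spec, '/') != NULL)
        return ENV_PATH; /* caller decides: directory => project, file => env spec */
    if (strcmp(spec, ".") == 0)
        return ENV_CURRENT_PROJECT;
    return ENV_NAMED;
}
```

With this rule, `abc` is a named environment, `.` selects the current project, and both `./proj` and `./env.toml` are treated as paths.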
123 | 
124 | An environment spells out exactly what version of each of a set of packages and libraries to use (version, hash, path, etc.). A Julia process can be "open" or "closed" with respect to its environment:
125 | 
126 | - **Open:** packages that are not in the environment can be loaded. They will be resolved greedily in the order they are loaded, choosing the highest installed version that satisfies the requirements of the environment and all loaded packages. If no satisfactory version is installed, but some registered version exists that would satisfy all requirements, the user is prompted to install and use it.
127 | - **Closed:** packages that are not in the environment cannot be loaded.
128 | 
129 | By default, Julia runs in open mode. When testing or deploying, however, Julia should default to closed mode to help ensure that a project hasn't inadvertently used packages that aren't recorded as dependencies. Since the project configuration also records which packages are direct dependencies, closed mode could enforce that project code only uses direct dependencies and indirect dependencies are only loaded indirectly. Note that this also helps address the problem that different packages may refer to different packages by the same top-level name.
130 | 
131 | ### Project Environments
132 | 
133 | The environment specification of a project is split into three files: `Config.toml`, `Manifest.toml`, and `Local.toml`. (Each file name may also be prefixed with `Julia`, in which case the non-prefixed file, if it exists, is ignored.) The purpose of these files is to separate the environment into three parts:
134 | 
135 | - `Config.toml`: manual configuration, checked into version control (input)
136 | - `Manifest.toml`: generated information, checked into version control (output)
137 | - `Local.toml`: generated information, not checked into version control (by-product)
138 | 
139 | Accordingly, `.gitignore` for Julia projects should include entries for `/Local.toml` and `/JuliaLocal.toml` so that those files are ignored by version control. The `Config.toml` file controls what subset of environment information goes into `Manifest.toml` versus what goes into `Local.toml` – everything ends up in one or the other. Examples of different scenarios with various choices of manifest subsets:
140 | 
141 | - A project meant to run on a single system (or homogenous systems) may choose to save everything in the manifest, including exact versions of packages and libraries, paths to them, even hashes of them, so that a complete record is checked into the project repository.
142 | - A project meant to run on different systems, on the other hand, may choose to check specific project versions and hashes into version control, but not library information, using libraries available on each system.
143 | - Published packages will generally not check specific dependency versions into version control since these will differ among developers and users. They will, however, check in general dependency version requirements (e.g. `XYZ = "1.2-1.9"`). During early development, however, it may be desirable to check in more detail so that different developers can stay in sync more easily.
144 | 
145 | When using the current project environment, specified by starting Julia with the `--env=.` flag, the project directory is searched for by looking in the current directory and each parent directory for `JuliaConfig.toml` or `Config.toml`. If a directory is found containing a file by this name, it is considered to be the project root and the config, manifest and local files are loaded from there.
146 | 
147 | ## Packages
148 | 
149 | Packages continue to work much as they have previously with a few exceptions:
150 | 
151 | 1. Each package has `Config.toml` and `Manifest.toml` files.
152 | 2. `Config.toml` contains an entry giving the package a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier).
153 | 3. Package versions are identified by a hash of a source tree instead of a git commit.
154 | 4. Eventually packages will not need to be git repositories.
155 | 
156 | UUIDs for registered packages will be assigned and when new packages are generated, a UUID will be created (this should happen even for private, unregistered packages). UUIDs will generally not be user-facing, but they are used internally to identify packages in registries and environment files. The purpose of UUIDs is to allow renaming of packages and moving of packages between different registries. A couple of scenarios to consider before arguing against using UUIDs:
157 | 
158 | - The `Stats` / `StatsBase` situation: `Stats` was renamed to `StatsBase` and a new package also called `Stats` was created. This broke many people's package installations and caused a great deal of grief. With packages identified by UUID, this kind of rename is completely unproblematic.
159 | - Two different packages may be created in different private registries with the same name. If these are both later made public, they may need to be renamed, but some way is still needed of knowing which one an old environment using one of them was referring to. Version hashes should be unique, but environments can record unregistered states of packages: unless every tree hash that could ever have been recorded in an environment using a package is known, it's impossible to figure out which package was used. If packages have UUIDs and these are recorded in environments, then it will always be possible to know which package was meant.
160 | 
161 | Identifying package versions by hashes of *source trees* rather than git commit hashes allows us to acquire and verify package versions without necessarily using git, and even with git it makes it easier to support shallow cloning and history rewriting, as long as the source tree of a published version doesn't change. The git style SHA1 tree hash is one means of identifying a source tree, but we may want to support other hashes since SHA1 is no longer considered secure. We could, for example, also publish SHA2-512 hashes for the source trees of package versions, alongside SHA1 hashes, allowing smooth transitioning to a more secure hash. With multiple coexisting ways of acquiring package versions, we can also smoothly transition away from using git alone for delivery of package code.
162 | 
163 | ## Registries
164 | 
165 | A **registry** is a Pkg3 replacement for the METADATA repository. Crucially, Pkg3 supports using multiple registries, and there will be "cathedral" and "bazaar" style public registries, and private registries will be supported. Private registries allow organizations to internally register private packages and versions which can refer to and depend on public packages. Registries provide four kinds of information:
166 | 
167 | 1. Bidirectional many-to-many mapping between package names and UUIDs.
168 | 2. A list of versions for each package, identified by their source tree hash.
169 | 3. Version dependency and compatibility information.
170 | 4. Where to get each package version.
171 | 
172 | The latest UUID associated with a name is the current one; other UUIDs were previous packages associated with that name. A UUID may have multiple names associated with it over time, but the latest one is current. If the same name occurs in different registries, referring to different UUIDs, then there is a name conflict which must be resolved interactively as needed. For example, if a user asks to add `XYZ` but the name refers to different packages in different registries, then the user should be prompted for which one they want.
173 | 
174 | Each version is associated with a specific source tree, unlike Pkg2 where each version is associated with a git commit. This allows us to acquire and verify package versions without necessarily using git, and even with git it makes it easier to support shallow cloning or history rewriting, as long as the source trees of published versions don't change. The git style SHA1 tree hash is one means of identifying a source tree, but we may want to support other means since SHA1 is no longer considered secure. We could, for example, also publish SHA2-512 hashes for the source trees of package versions, thereby allowing them to be securely verified even though SHA1 is no longer secure.
175 | 
176 | ## Versions & Compatibility
177 | 
178 | Expressing compatibility between various versions of packages is complicated by the fact that compatibility claims for a particular version can either be:
179 | 
180 | - mistakenly incorrect when published, or
181 | - correct when published but so broad that they later become incorrect.
182 | 
183 | Pkg2 allows and even encourages very loose dependency declarations and deals with both of the above situations by allowing compatibility claims to be adjusted after the fact. Dependencies can and are expected to be changed in METADATA to adjust for mistakes and invalidation. This causes significant complexity and confusion, however: the dependencies of a package version according to its own immutable source may not match the current dependencies registered for it in METADATA – which are still potentially evolving. Because of this, Pkg2 contains tricky logic about which compatibility claims take precedence – those in the source tree or those in METADATA. These rules are especially complicated since Pkg2 supports development of packages where they are installed, further muddying what the definitive record of compatibility is.
184 | 
185 | In Pkg3, a package version's compatibility claims are immutable. While compatibility claims may still be incorrect, they cannot be changed, only superseded by a newer version. Overly broad compatibility claims cannot, by design, be expressed in the first place. In this design, any invalidation of claimed compatibility can only stem from another package's failure to follow [semantic versioning](http://semver.org/) correctly. 
186 | 
187 | However, since this will certainly occur in practice, there will need to be a mechanism to remedy it:
188 | 
189 |  * If the compatibility claims were too restrictive, a new patch with wider version compatibility ranges can be published. Pkg3's version resolution will favor the most recent patch very strongly: unless you explicitly ask for an earlier patch specifically, a freshly installed or updated package will always be the latest patch in its major-minor series. Package developers should follow semantic versioning strictly and *only* include bug fixes in patch releases: patches should neither break existing features nor introduce new features.
190 |  
191 |  * However, if the compatibility claims were too broad, tagging a new version may not necessarily remedy the problem as the dependency resolver may decide to use the older (broken) version, in order to obtain compatibility with another package. In this case, the invalid compatibility claims will need to be revoked by the registry.
192 | 
193 | Compatibility claims in Pkg3 are expressed at *exactly* minor version granularity. This may be easiest to explain starting with the textual form. In configuration files, sets of compatible versions are expressed using arrays of string literals (in TOML format), each string being of one of the following forms:
194 | 
195 | - **minor version:** `"a.b"` includes versions with `major == a && minor == b`;
196 | - **version range:** `"a.b-a.c"` includes versions with `major == a && b ≤ minor ≤ c`;
197 | - **negated patch:** `"!a.b.c"` excludes versions with `major == a && minor == b && patch == c`.
198 | 
199 | A list of terms expresses a set of package versions: the union of versions included in minor version strings and version range strings, minus the specific versions excluded by negated patch strings. In other words, the version list `["1.2-1.4", "!1.2.5", "2.0"]`  includes any version such that
200 | 
201 | ```julia
202 | (major == 1 && (2 ≤ minor ≤ 4) && !(minor == 2 && patch == 5)) || (major == 2 && minor == 0)
203 | ```
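
As a rough illustration of these semantics, a membership test over such a list of terms might be sketched as follows. The function name `in_version_set` is hypothetical and not part of any Pkg3 API; the parsing is deliberately naive and assumes well-formed terms.

```julia
# Hypothetical helper: does version `v` belong to the set described by `terms`?
function in_version_set(v::VersionNumber, terms::Vector{String})
    included = false
    for term in terms
        if startswith(term, "!")
            # negated patch "!a.b.c": excludes exactly that version
            v == VersionNumber(term[2:end]) && return false
        elseif occursin('-', term)
            # version range "a.b-a.c": same major, minor within bounds
            lo, hi = VersionNumber.(split(term, '-'))
            included |= v.major == lo.major == hi.major && lo.minor ≤ v.minor ≤ hi.minor
        else
            # minor version "a.b": any patch in that series
            m = VersionNumber(term)
            included |= v.major == m.major && v.minor == m.minor
        end
    end
    return included
end
```

With the list `["1.2-1.4", "!1.2.5", "2.0"]`, this sketch accepts `v"1.3.7"` and `v"2.0.9"` and rejects `v"1.2.5"` and `v"2.1.0"`.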
204 | 
205 | Compatibility lists should be normalized according to the following rules:
206 | 
207 | - versions and ranges should be mutually disjoint;
208 | - versions and ranges should appear in sorted order by major and minor version;
209 | - versions and ranges which can be coalesced should be combined into a single range;
210 | - negated patches should follow the version or range in which they are contained, separated from it only by smaller negated patches (i.e. negated patches are sorted by major, minor and patch numbers).
211 | 
212 | Following these rules, each possible set of compatible versions can be expressed in exactly one way. Here are some examples of normalized version sets:
213 | 
214 | ```toml
215 | ["1.2"]
216 | ["1.2", "!1.2.5"]
217 | ["1.2-1.3", "!1.2.5"]
218 | ["1.2-1.4", "!1.2.5", "2.0"]
219 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0"]
220 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0-2.1"]
221 | ["1.2-1.4", "!1.2.5", "!1.4.0", "2.0-2.5", "3.0"]
222 | ```
223 | 
224 | Compatibility sets include an unbounded number of potential future patches, but include a finite number of minor versions. A package should not declare compatibility with a minor version series unless some version in that series has actually been published – this guarantees that compatibility can (and should) be tested. If a new compatible major or minor version of a package is released, this should be reflected by publishing a new patch that expands the compatibility claims. If a new patch of an otherwise compatible major/minor version series contains a bug that breaks compatibility, a new patch of each package should be released: a patch of the buggy package, fixing the bug, and a patch of the other package, excluding the buggy version from its compatibility claims.
225 | 
226 | ## Configuration
227 | 
228 | Pkg3 uses [TOML](https://github.com/toml-lang/toml) for configuration files. Several other projects have adopted this format: see [Cargo](http://doc.crates.io/manifest.html) and [PEP 518](https://www.python.org/dev/peps/pep-0518/) for example. This [format comparison](https://github.com/toml-lang/toml#comparison-with-other-formats) has some thoughts and justifications for using this format over other common configuration formats. The basic justification is:
229 | 
230 | - compared to **JSON** it is more human readable and writeable
231 | - compared to **YAML** it is far simpler to parse and understand
232 | - compared to **INI** it is very similar but standardized
233 | - compared to **XML** it is… hah, no.
234 | 
235 | All said, TOML seems to be the most reasonable format for simple, human-readable configuration files. An implementation of TOML parsing and printing in Julia can be found [here](https://github.com/wildart/TOML.jl). There are a few other implementations floating around, and this version need not be the one we adopt, but it has been used for experimentation during the design process so it should handle formats discussed in what follows.
236 | 
237 | ### Configuration Fragments
238 | 
239 | We'll begin by describing certain types of configuration fragments. Environments and registries use these fragments in similar ways. TOML headers are absolute, not relative, which makes describing fragments a bit awkward. To address this, consider sections to be implicitly relative: if a fragment has a header `[header]`, consider it relative to wherever it occurs, so if that fragment were used in a section called `[section]` then the header would actually be `[section.header]`.
240 | 
241 | #### Package metadata
242 | 
243 | High-level description of a package: its UUID, name, license, authorship, where to get it, etc. This will appear in a package's configuration file and be copied into any registries that the package appears in.
244 | 
245 | ```toml
246 | name = "Example"
247 | uuid = "86d33384-d511-4271-be88-8c3e434c707e"
248 | license = "MIT"
249 | authors = [
250 |     "Jane Q. Programmer <jane@example.com>",
251 |     "Jack X. Developer <jack@example.com>",
252 | ]
253 | description = "Example package."
254 | keywords = ["example", "fake", "unreal"]
255 | documentation = "https://docs.github.io/Example.jl"
256 | homepage = "https://example.com/Example.jl"
257 | repository = "https://github.com/ExampleOrg/Example.jl"
258 | ```
259 | 
260 | #### Version metadata
261 | 
262 | This describes a particular version of a package.
263 | 
264 | ```toml
265 | version = "1.2.3"
266 | SHA1 = "739ea886f7ae45ef27f7c0a2ea2bc25d59d40fd2"
267 | SHA2-512 = """
268 | 45d8153f80a301a890d5da67592ddf42fb96c4cd3945998386d0293dcf80b44d
269 | c9c8499c6e1ba4068381ac5bb243561de3e9c25e8989e949d56e8438085a9a22
270 | """
271 | ```
272 | 
273 | Note that the string for a SHA2-512 hash value is allowed to contain extra whitespace, including newlines. This improves the readability of files containing long hash values by avoiding overly long lines. The hash value is a hash of the source tree, computed the same way trees are hashed in git, but using different hashing functions. Thus, the SHA1 tree hash is the same as the tree name in git, allowing us to retrieve the source version.
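
A reader of such a file therefore has to strip whitespace before comparing hash values. A minimal sketch of that step, assuming a hypothetical helper `normalize_hash` that takes the raw string and the expected hash width in bits, could look like this:

```julia
# Hypothetical helper: strip all whitespace (including newlines) from a hash
# string as read from a TOML multi-line string, then validate it as hex.
function normalize_hash(s::AbstractString, nbits::Integer)
    hex = lowercase(replace(s, r"\s+" => ""))
    ndigits = nbits ÷ 4   # four bits per hex digit
    length(hex) == ndigits || error("expected $ndigits hex digits, got $(length(hex))")
    all(c -> c in "0123456789abcdef", hex) || error("non-hex character in hash")
    return hex
end
```

For example, a SHA1 value would be checked with `nbits = 160` and a SHA2-512 value with `nbits = 512`.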
274 | 
275 | #### Compatibility
276 | 
277 | The compatibility section expresses which libraries and packages a project directly interacts with, either as requirements or "optional dependencies" – i.e. packages that this package has some special code for, only to be loaded if that other package is also loaded. Only direct dependencies and optional packages are specified in the compatibility section. Any indirect dependencies are strictly the concern of the packages that depend on them. Thus, if `Required` depends on `Indirect`, we cannot constrain the version of `Indirect` here, although `Required` can. Likewise, if a new version of `Required` comes out that doesn't use `Indirect` anymore, and we upgrade to that, the package manager is free to get rid of `Indirect`.
278 | 
279 | ```toml
280 | [library.libXYZ]
281 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
282 | versions = "2.3-2.5"
283 | 
284 | [package.Required]
285 | uuid = "85241492-0f92-400a-8719-bdc0424991f7"
286 | versions = ["1.2-1.3", "!1.2.5"]
287 | 
288 | [package.Optional]
289 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d"
290 | versions = "3.7"
291 | optional = true
292 | ```
293 | 
294 | The last component of the header is the library or package name, while the `uuid` field gives its UUID – this unambiguously identifies the package. The name is what the local project will refer to and load the package or library as – this should probably match what it's published as, although we may want to allow publishing under multiple names simultaneously. The `versions` field is either a string or an array of strings which specifies a set of compatible versions, as described in "Versions & Compatibility" above.
295 | 
296 | #### Runtime Configuration
297 | 
298 | Runtime configuration sections allow projects to set global configuration flags to be passed to libraries. This section only makes sense at a project level since there can only be one source of configuration for a given library or package – i.e. libraries and packages cannot configure other libraries or packages.
299 | 
300 | ```toml
301 | [library.libXYZ]
302 | backend = "abc"
303 | knob = 1.5
304 | 
305 | [package.Required]
306 | numbers = [4, 8, 15, 16, 23, 42]
307 | 
308 | [package.Indirect]
309 | fiddle = true
310 | ```
311 | 
312 | A parsed dictionary representation of a package's configuration will be passed to the package's `__init__` method when it is loaded, allowing a project to control the global runtime configuration of packages. It remains to be determined how runtime configuration data will be passed to libraries. Packages may not provide runtime configuration of other packages since packages (by definition) are projects that are intended to be reusable by other projects and are thus not the primary project. Runtime configuration may be provided for non-top-level dependencies (e.g. `Indirect` in the above fragment).
313 | 
314 | #### Manifest
315 | 
316 | The manifest fragment records all the details of which libraries and packages are included in a Julia environment. The information should be kept by the running Julia process so that we can save it to a manifest file. Not all of the data will be appropriate to commit for all kinds of projects, so these data may be split between different files – some to be checked into version control and some strictly local.
317 | 
318 | ```toml
319 | [library.libXYZ]
320 | version = "2.3.4"
321 | path = "/home/user/.julia/libraries/libXYZ/2.3.4"
322 | mtime = 2016-10-20T18:28:56.299
323 | CRC32C = "3ba18fe1"
324 | SHA1 = "d2672146a1aca6023073074d765a32d7eb298baf"
325 | SHA2-512 = """
326 | 981702a057faa649b7fa24337a67e0d6e8af258f81d0ed8ce90775cdfe0942c6
327 | d18ce0b5747e5fb1123cceb65b1074a9ba20f788e7cbacc7e824bac043f80208
328 | """
329 | 
330 | [package.Required]
331 | version = "1.2.8"
332 | path = "/usr/julia/system/packages/Required/1.2.8"
333 | mtime = 2016-10-20T18:29:55.605
334 | CRC32C = "d1a6296e"
335 | SHA1 = "982d4e4e0f728e7e0416472ffb394250c7afd1aa"
336 | SHA2-512 = """
337 | f991d247834effca8ce7114b7100d191d259abf36bbe6a1cf03382a8e1a51171
338 | 0c107a0a7b5a5dd21cfd304e7e5525fc2287cc255de15f1c7d4f33ac86990e85
339 | """
340 | 
341 | [package.Optional]
342 | version = "3.7.2"
343 | path = "/home/user/.julia/packages/Optional/3.7.2"
344 | mtime = 2016-10-20T18:35:29.124
345 | CRC32C = "595b180b"
346 | SHA1 = "ff1ca382d0f905ce9e75fc829cfa4419123c0491"
347 | SHA2-512 = """
348 | 904b16f8cea76f8feb04526983a42a4b11194a840223976497f85e59c0948c3c
349 | 3a4ad1c0c5f1b7f61734f4f8cfee74869693fe6be56e56ca9e54398e3ea06765
350 | """
351 | 
352 | [package.Indirect]
353 | version = "1.5.3"
354 | path = "/usr/julia/system/packages/Indirect/1.5.3"
355 | mtime = 2016-10-21T10:42:25.366
356 | CRC32C = "2ffefb96"
357 | SHA1 = "8182d2ea3d4427eccc7e968923cb1bf6affb74c8"
358 | SHA2-512 = """
359 | 7cc5a55bf2f55f4ce95d4d63594bb5d2c468a41c552eb6c5d29a9ffcb8a8b40f
360 | 665b09748acc0cf3af9eeef81f55805269b86e9f26e32ede03c11d2043bf3f2d
361 | """
362 | ```
363 | 
364 | ### Source Package File
365 | 
366 | Package configuration includes package metadata and compatibility sections for libraries and packages:
367 | 
368 | ```toml
369 | name = "Example"
370 | uuid = "86d33384-d511-4271-be88-8c3e434c707e"
371 | license = "MIT"
372 | authors = [
373 |     "Jane Q. Programmer <jane@example.com>",
374 |     "Jack X. Developer <jack@example.com>",
375 | ]
376 | description = "Example package."
377 | keywords = ["example", "fake", "unreal"]
378 | documentation = "https://docs.github.io/Example.jl"
379 | homepage = "https://example.com/Example.jl"
380 | repository = "https://github.com/ExampleOrg/Example.jl.git"
381 | 
382 | [library.libXYZ]
383 | uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
384 | versions = "2.3-2.5"
385 | 
386 | [package.Required]
387 | uuid = "85241492-0f92-400a-8719-bdc0424991f7"
388 | versions = ["1.2-1.3", "!1.2.5"]
389 | 
390 | [package.Optional]
391 | uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d"
392 | versions = "3.7"
393 | optional = true
394 | ```
395 | 
396 | ### Registry Package File
397 | 
398 | Each registered package has its own file (name TBD, but probably `Example.toml`), describing the package, all its registered versions, and their compatibility and requirements on other libraries and packages.
399 | 
400 | ```toml
401 | name = "Example"
402 | uuid = "86d33384-d511-4271-be88-8c3e434c707e"
403 | license = "MIT"
404 | authors = [
405 |     "Jane Q. Programmer <jane@example.com>",
406 |     "Jack X. Developer <jack@example.com>",
407 | ]
408 | description = "Example package."
409 | keywords = ["example", "fake", "unreal"]
410 | documentation = "https://docs.github.io/Example.jl"
411 | homepage = "https://example.com/Example.jl"
412 | repository = "https://github.com/ExampleOrg/Example.jl.git"
413 | 
414 |   [[version]]
415 |   version = "1.2.3"
416 |   SHA1 = "739ea886f7ae45ef27f7c0a2ea2bc25d59d40fd2"
417 |   SHA2-512 = """
418 |   45d8153f80a301a890d5da67592ddf42fb96c4cd3945998386d0293dcf80b44d
419 |   c9c8499c6e1ba4068381ac5bb243561de3e9c25e8989e949d56e8438085a9a22
420 |   """
421 | 
422 |     [version.library.libXYZ]
423 |     uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
424 |     versions = "2.3-2.5"
425 | 
426 |     [version.package.Required]
427 |     uuid = "85241492-0f92-400a-8719-bdc0424991f7"
428 |     versions = ["1.2-1.3", "!1.2.5"]
429 | 
430 |     [version.package.Optional]
431 |     uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d"
432 |     versions = "3.7"
433 |     optional = true
434 | 
435 |   [[version]]
436 |   version = "1.2.4"
437 |   SHA1 = "e92729c0e7c23d9f83fadba3e197ab9b5ddd9791"
438 |   SHA2-512 = """
439 |   fd22289bb2440e9d6c112ff4b33e36183a792edafb2cd96eb688ef931faddf9c
440 |   81d4a7a544921bc3c5d79aa74db0a163fa8f75f57c6fb603810dd3d51e17ba2e
441 |   """
442 | 
443 |     [version.library.libXYZ]
444 |     uuid = "994d35e9-862f-42c9-aa51-d40fef54ab41"
445 |     versions = "2.3-2.6"
446 | 
447 |     [version.package.Required]
448 |     uuid = "85241492-0f92-400a-8719-bdc0424991f7"
449 |     versions = ["1.2-1.4", "!1.2.5", "2.0"]
450 | 
451 |     [version.package.Optional]
452 |     uuid = "f7faa14e-633f-4b87-8f63-428f7e99170d"
453 |     versions = ["3.7", "!3.7.3"]
454 |     optional = true
455 | ```
456 | 
457 | This format is pretty verbose. We could design a custom compression scheme for this format, aggregating information across multiple versions of the same package, or simply use general purpose compression. General purpose compression would be easier, certainly, but would still require parsing of a potentially very large number of version sections once they're uncompressed. A custom compression scheme could support faster parsing of logically compressed data, allowing the package manager to query the compressed data as-is.
458 | 
459 | ## Operations
460 | 
461 | In this section, we go through various operations on the set of packages in an environment. This supposes a `pkg>` REPL mode that has command-like syntax. For some operations, we'll provide pseudo-code, which is not intended to actually work or even use real operation names, but to suggest the general operation. We distinguish top-level dependencies of a project – i.e. packages that appear in `Config.toml` with name, UUID, and compatible versions – from indirect dependencies which do not appear in `Config.toml` but do appear in `Manifest.toml` because they are recursively depended on by top-level dependencies. Each pseudo-code snippet has an implicit preamble like this:
462 | 
463 | ```julia
464 | cfg₀ = load("Config.toml")
465 | env₀ = merge(load("Manifest.toml"), load("Local.toml"))
466 | ```
467 | 
468 | There is a similar postamble saving cfg₁ back to `Config.toml` and env₁ back to `Manifest.toml` and `Local.toml`, as determined by the configuration for splitting data between those files (TBD = to be designed).
469 | 
470 | ### Adding packages
471 | 
472 | #### Synopsis
473 | 
474 | ```
475 | pkg> add p₁ [=v₁] p₂ [=v₂] …
476 | ```
477 | 
478 | Add packages p₁, p₂, … as top-level dependencies of the current environment, adding version constraints as indicated.
479 | 
480 | #### Example
481 | 
482 | ```
483 | pkg> add Foo Bar=1 Baz=2.3 Qux=4.5.6
484 | ```
485 | 
486 | This command installs `Foo` at any version, `Bar` at major version 1, `Baz` at major/minor version 2.3, and `Qux` at exactly version 4.5.6. Corresponding constraints on these packages are added to `Config.toml`.
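
To make the meaning of the `=v` suffix concrete, here is a small sketch of how one such argument might be interpreted. The names `parse_add_spec` and `satisfies` are hypothetical and chosen only for illustration; the real command syntax and constraint representation are not specified here.

```julia
# Hypothetical parser for one `add` argument such as "Baz=2.3".
# Returns the package name and a tuple of the version components that were
# pinned: () means any version, (1,) major 1, (2, 3) major/minor 2.3, etc.
function parse_add_spec(spec::AbstractString)
    parts = split(spec, '=')
    length(parts) ≤ 2 || error("malformed package spec: $spec")
    name = String(parts[1])
    if length(parts) == 1
        return (name, ())
    end
    pinned = Tuple(parse.(Int, split(parts[2], '.')))
    1 ≤ length(pinned) ≤ 3 || error("malformed version in: $spec")
    return (name, pinned)
end

# Does version `v` agree with every pinned component?
satisfies(v::VersionNumber, pinned::Tuple) =
    (v.major, v.minor, v.patch)[1:length(pinned)] == pinned
```

Under this reading, `Baz=2.3` is satisfied by `v"2.3.9"` but not by `v"2.4.0"`, while `Qux=4.5.6` is satisfied only by `v"4.5.6"`.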
487 | 
488 | #### Pseudo-code
489 | 
490 | ```julia
491 | cfg₁ = add(cfg₀, p₁ => v₁, p₂ => v₂, …)
492 | env₁ = resolve(cfg₁, env₀, fix = [:all|:top|:none])
493 | ```
494 | 
495 | #### Dependency fixing
496 | 
497 | There are three available strategies for keeping dependencies fixed when adding top-level packages:
498 | 
499 | 1. **Fix all:** Only extend env₀ – i.e. env₁ ⊇ env₀. No versions in the manifest are changed, only new packages are added to it.
500 | 2. **Fix top:** Only allow changing indirect dependencies, not top-level dependencies. I.e. don’t change the versions of any packages that appear in cfg₀ – packages that aren’t directly used by the project are fair game: their installed versions may change, and they may be added to or removed from the environment.
501 | 3. **Fix none:** add, remove, update any packages to satisfy cfg₁, but only change what you have to.
502 | 
503 | It is important to note that unlike Pkg2, with all strategies, even `fix = :none`, package versions are never changed unnecessarily. If you *also* want to upgrade packages to newer versions, you can do an upgrade operation before or after doing the add operation.
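
The three strategies can be stated as a check on a proposed result. The sketch below models environments as plain dictionaries from package name to version and uses a hypothetical function `respects_fix`; it only verifies that a candidate env₁ honors the chosen strategy and does not perform any resolution itself.

```julia
# Environments are modeled as Dict(name => version); `top` is the set of
# top-level package names from cfg₀. All names here are hypothetical.
function respects_fix(env₀::Dict, env₁::Dict, top::Set, fix::Symbol)
    # a package is unchanged if it is still present at the same version
    unchanged(p) = haskey(env₁, p) && env₁[p] == env₀[p]
    if fix == :all
        return all(unchanged, keys(env₀))                  # env₁ ⊇ env₀
    elseif fix == :top
        return all(unchanged, intersect(keys(env₀), top))  # only top-level pinned
    elseif fix == :none
        return true                                        # anything may change
    else
        throw(ArgumentError("unknown fix strategy: $fix"))
    end
end
```

For instance, a result that bumps an indirect dependency while leaving all top-level packages alone passes under `:top` and `:none` but fails under `:all`.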
504 | 
505 | #### Questions
506 | 
507 | Do we really need multiple strategies, or can we just pick one of them?
508 | 
509 | If the operation fails, what state should `Config.toml` and `Manifest.toml`, etc. be left in?
510 | 
511 | ### Removing packages
512 | 
513 | #### Synopsis
514 | 
515 | ```
516 | pkg> rm p₁ p₂ …
517 | ```
518 | 
519 | Remove top-level packages p₁, p₂, … from the current environment.
520 | 
521 | #### Example
522 | 
523 | ```
524 | pkg> rm Foo Qux
525 | ```
526 | 
527 | Remove the packages `Foo` and `Qux` and any indirect dependencies that are only installed because of them. If any other top-level packages recursively depend on them (directly, or indirectly via other dependencies, even though that's a somewhat strange situation), we could ask the user whether they want to remove those as well.
528 | 
529 | #### Pseudo-code
530 | 
531 | ```julia
532 | cfg₁ = rm(cfg₀, p₁, p₂, …)
533 | env₁ = resolve(cfg₁, env₀, fix = :all)
534 | ```
535 | 
536 | For package removal, it’s always possible to leave all remaining packages at the same version. Just remove p₁, p₂, …, and any indirect dependencies that aren’t necessary anymore. What remains is always a coherent set of packages.
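
One way to picture this is as a reachability computation over the dependency graph: whatever can still be reached from the remaining top-level packages stays, and everything else can go. The sketch below assumes a hypothetical `deps` dictionary of direct dependencies and a hypothetical function name `still_needed`.

```julia
# `deps` maps each package name to the names it directly depends on;
# `top` holds the top-level packages that remain after the removal.
# Returns every package that must stay installed. Names are hypothetical.
function still_needed(deps::Dict{String,Vector{String}}, top)
    keep = Set{String}()
    stack = collect(String, top)
    while !isempty(stack)
        p = pop!(stack)
        p in keep && continue          # already visited
        push!(keep, p)
        append!(stack, get(deps, p, String[]))
    end
    return keep
end
```

Any package in the old environment that is not in the returned set was only present because of the removed packages.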
537 | 
538 | ### Updating & upgrading packages
539 | 
540 | I'm proposing that we distinguish between "updating" and "upgrading" packages: an update is a patch-level version bump while an upgrade is a more significant change in version. The intuition is that when we update packages, there are essentially two different things we might want:
541 | 
542 | - **Update:** "Give me any bug fixes you've got but don't break my code."
543 | - **Upgrade:** "Install the latest version and if it breaks some stuff, I'll fix it."
544 | 
545 | #### Synopsis
546 | 
547 | ```
548 | pkg> [update|upgrade] p₁ p₂ …
549 | ```
550 | 
551 | Update or upgrade the packages p₁ p₂ … or all packages if none are specified. Update bumps the listed packages and all of their recursive dependencies to the latest patch release of the major/minor version they're currently at; indirect dependencies may be upgraded, but only if that is needed to get a bug fix release of something else. Upgrade bumps all listed packages and their recursive dependencies to the latest version compatible with `Config.toml`.
552 | 
553 | #### Examples
554 | 
555 | ```
556 | pkg> update
557 | ```
558 | 
559 | Update all packages to the latest bugfixes.
560 | 
561 | ```
562 | pkg> update Bar Baz
563 | ```
564 | 
565 | Update `Bar` and `Baz` and all their dependencies to the latest bugfix releases.
566 | 
567 | ```
568 | pkg> upgrade
569 | ```
570 | 
571 | Upgrade all packages to their latest versions.
572 | 
573 | ```
574 | pkg> upgrade Bar Baz
575 | ```
576 | 
577 | Upgrade `Bar` and `Baz` and all their dependencies to their latest versions.
578 | 
579 | #### Pseudo-code
580 | 
581 | ```julia
582 | cfg₁ = cfg₀
583 | env₁ = [update|upgrade](cfg₀, env₀, p₁, p₂, …)
584 | ```
585 | 
586 | Whether the function `update` or `upgrade` is called depends on the operation.
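
For a single package considered in isolation, the difference between the two operations can be sketched as a choice of target version. This ignores interactions between packages, which a real resolver would have to handle; the function name `target_version` and the `compatible` predicate (standing in for the constraints in `Config.toml`) are assumptions made for illustration.

```julia
# `available` lists the registered versions of one package, `current` is the
# installed one, and `compatible` is a predicate standing in for the
# constraints in Config.toml. All names are hypothetical.
function target_version(op::Symbol, current::VersionNumber, available,
                        compatible = v -> true)
    candidates = filter(compatible, available)
    if op == :update
        # stay within the current major/minor series, take the latest patch
        candidates = filter(v -> (v.major, v.minor) == (current.major, current.minor),
                            candidates)
    elseif op != :upgrade
        throw(ArgumentError("unknown operation: $op"))
    end
    # never move backwards; keep the current version if nothing newer qualifies
    return isempty(candidates) ? current : max(current, maximum(candidates))
end
```

Given registered versions 1.2.3, 1.2.8, 1.3.0 and 2.0.1 with 1.2.3 installed, an update in this sketch selects 1.2.8, while an upgrade selects 2.0.1, or 1.3.0 if the configuration only allows major version 1.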
587 | 


--------------------------------------------------------------------------------