├── rfcs ├── 20180821-differentiable-functional-while │ ├── while_v1.png │ ├── while_v2.png │ ├── while_body.png │ └── while_cond.png ├── README.md ├── yyyymmdd-rfc-template.md ├── 20181225-tf-raw-ops.md ├── 20181026-tf-nightly.md ├── 20181025-tf-integration-testing.md ├── 20180604-dynamic-kernels.md ├── 20180507-cond-v2.md ├── 20180731-dockerfile-assembler.md ├── 20180821-differentiable-functional-while.md ├── 20180726-tf-data-windowing-reducers.md ├── 20180817-variables-20.md ├── 20181016-optimizer-unification.md ├── 20181214-move-to-addons.md └── 20181217-tf2-random-numbers.md ├── sigs ├── testing │ └── README.md ├── rust │ └── CHARTER.md ├── build │ ├── CHARTER.md │ ├── community-builds.md │ └── tensorflow-testing.md ├── tensorboard │ └── CHARTER.md ├── networking │ └── CHARTER.md ├── io │ ├── CHARTER.md │ └── RELEASE.md └── addons │ └── CHARTER.md ├── CODEOWNERS ├── README.md ├── MEETINGS.md ├── governance ├── SIG-charter-template.md ├── SIG-request-template.md ├── code-and-collaboration.md ├── tensorflow-testing.md ├── SIGS.md └── TF-RFCs.md ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md └── LICENSE /rfcs/20180821-differentiable-functional-while/while_v1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/0101011/community/master/rfcs/20180821-differentiable-functional-while/while_v1.png -------------------------------------------------------------------------------- /rfcs/20180821-differentiable-functional-while/while_v2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/0101011/community/master/rfcs/20180821-differentiable-functional-while/while_v2.png -------------------------------------------------------------------------------- /rfcs/20180821-differentiable-functional-while/while_body.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/0101011/community/master/rfcs/20180821-differentiable-functional-while/while_body.png -------------------------------------------------------------------------------- /rfcs/20180821-differentiable-functional-while/while_cond.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/0101011/community/master/rfcs/20180821-differentiable-functional-while/while_cond.png -------------------------------------------------------------------------------- /sigs/testing/README.md: -------------------------------------------------------------------------------- 1 | # TensorFlow 2.0 Testing 2 | 3 | Welcome to TF 2.0 testing! This repository will house testing plans, friction logs, and a guide for installing TensorFlow 2.0. 4 | 5 | Thanks for your interest! 6 | 7 | ## Installation Instructions 8 | 9 | ## Community Logistics 10 | 11 | ## Testing Team 12 | -------------------------------------------------------------------------------- /rfcs/README.md: -------------------------------------------------------------------------------- 1 | # TensorFlow RFCs 2 | 3 | This directory stores approved RFCs. 4 | 5 | ## Process 6 | 7 | Please read carefully the [TensorFlow RFC 8 | process](https://github.com/tensorflow/community/blob/master/governance/TF-RFCs.md). 9 | 10 | ## Template 11 | 12 | Use [this template](yyyymmdd-rfc-template.md) 13 | to draft an RFC. 
14 | -------------------------------------------------------------------------------- /CODEOWNERS: -------------------------------------------------------------------------------- 1 | # Owners for community repo 2 | # For syntax, see https://help.github.com/articles/about-codeowners/ 3 | 4 | # This file controls who is asked for a review when PRs are submitted 5 | 6 | # SIGS 7 | 8 | sigs/rust/ @adamcrume 9 | sigs/tensorboard/ @ewilderj @manivaradarajan @martinwicke 10 | sigs/build/ @martinwicke @ewilderj @angersson @perfinion 11 | sigs/addons/ @martinwicke @ewilderj @karmel @seanpmorgan @armando-fandango 12 | sigs/networking/ @martinwicke @ewilderj @byronyi @jbedorf @poxvoculi 13 | sigs/io/ @martinwicke @ewilderj @mrry @yongtang @dmitrievanthony 14 | 15 | # RFCs 16 | 17 | rfcs/ @ewilderj @martinwicke @goldiegadde 18 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Welcome to the TensorFlow Developer Community 2 | 3 | ## This Repository 4 | 5 | The `community` repository stores documents used by the developer community. 6 | 7 | * `rfcs` - design documents used by the design review process 8 | * `sigs` - documentation for each TensorFlow Special Interest group (SIG) 9 | * `governance` - operating processes for the TensorFlow project 10 | 11 | ## Contact 12 | 13 | For questions about this repository, please file an issue or reach out 14 | to Edd Wilder-James: ewj@google.com. 15 | 16 | ## Further Community Resources 17 | 18 | For a complete overview of the TensorFlow community resources, 19 | please visit [tensorflow.org/community](https://tensorflow.org/community). 20 | -------------------------------------------------------------------------------- /MEETINGS.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Community Meetings 2 | 3 | [TensorFlow SIGs](https://github.com/tensorflow/community/tree/master/sigs) 4 | and other groups hold regular meetings via videoconference. Usually 5 | you will find these added to your calendar after you join the Google Group relevant 6 | to the SIG or community group. 7 | 8 | There is also a master calendar listing all community meetings: 9 | [TensorFlow Community Calendar](https://calendar.google.com/calendar/embed?src=tensorflow.org_14t769n89qhsps949c3l0nhd9c%40group.calendar.google.com). 10 | 11 | Google Calendar users can add the calendar to theirs using the button on the bottom of the master calendar page. If you want 12 | to add the Community Calendar to your own calendar application, use [this iCal link](https://calendar.google.com/calendar/ical/tensorflow.org_14t769n89qhsps949c3l0nhd9c%40group.calendar.google.com/public/basic.ics). 13 | 14 | -------------------------------------------------------------------------------- /sigs/rust/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG-Rust 2 | ## Objective 3 | 4 | For users and contributors to collaborate on the TensorFlow Rust bindings 5 | project. 6 | 7 | ## Membership 8 | 9 | Everyone involved in using or developing the TensorFlow Rust bindings is welcome 10 | to join the group. To participate, join the mailing list. 11 | 12 | Archives of the mailing list will be publicly accessible. 
13 | 14 | ## Resources 15 | 16 | * [sig-rust mailing list](https://groups.google.com/a/tensorflow.org/forum/#!forum/sig-rust) 17 | * Github [tensorflow/rust](https://github.com/tensorflow/rust) 18 | 19 | 20 | ## Contacts 21 | * Project lead: Adam Crume [@adamcrume](https://github.com/adamcrume) - acrume 22 | at google 23 | * For administrative questions, contact Edd Wilder-James 24 | [@ewilderj](https://github.com/ewilderj) - ewj at google 25 | 26 | ## Code of Conduct 27 | 28 | As with all forums and spaces related to TensorFlow, SIG-Rust is subject 29 | to the [TensorFlow Code of 30 | Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 31 | -------------------------------------------------------------------------------- /sigs/build/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG-Build - TensorFlow Distributors & Packagers Group 2 | 3 | ## Objective 4 | 5 | For discussion and collaboration around the building, testing, packaging, and 6 | distribution of TensorFlow. 7 | 8 | ## Membership 9 | 10 | Everyone involved in the building, testing, packaging, distributing or embedding 11 | of TensorFlow is welcome to join the group. To participate, request an invitation 12 | to join the mailing list. 13 | 14 | Archives of the mailing list will be publicly accessible. 15 | 16 | ## Resources 17 | 18 | * [sig-build mailing list](https://groups.google.com/a/tensorflow.org/forum/#!forum/build) 19 | 20 | ## Contacts 21 | 22 | * Project leads: Jason Zaman [@perfinion](https://github.com/perfinion), Austin Anderson [@angersson](https://github.com/angersson) 23 | * For administrative questions, contact Edd Wilder-James @ewilderj - ewj at 24 | google 25 | 26 | ## Code of Conduct 27 | 28 | As with all forums and spaces related to TensorFlow, SIG-Build is subject to the [TensorFlow Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 29 | -------------------------------------------------------------------------------- /governance/SIG-charter-template.md: -------------------------------------------------------------------------------- 1 | # Proposed name: SIG-?????? 2 | 3 | ## Objective 4 | 5 | One or two sentences describing the group's purpose. 6 | 7 | ## Membership 8 | 9 | *Who can join? How can they join? Who can read the group's activity?* 10 | 11 | Example: 12 | 13 | > Everyone involved in the packaging, distributing or embedding of TensorFlow is 14 | > welcome to join the group. To participate, request an invitation to join the 15 | > mailing list. Archives of the mailing list will be publicly accessible. 16 | 17 | ## Resources 18 | 19 | *Links to essential resources: proposed mailing list, Github repo, key documents, etc.* 20 | 21 | ## Contacts 22 | 23 | *At a minimum, highlight a group leader and somebody to reach out to for 24 | administrative purposes* 25 | 26 | * *Project lead: A N Other [@githubhandle](https://github.com/githubhandle) - 27 | another at companyname* 28 | * For administrative questions, contact Edd Wilder-James 29 | [@ewilderj](https://github.com/ewilderj) - ewj at google 30 | 31 | ## Code of Conduct 32 | 33 | As with all forums and spaces related to TensorFlow, SIG-?????? is subject to 34 | the [TensorFlow Code of 35 | Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 
36 | -------------------------------------------------------------------------------- /sigs/tensorboard/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG TensorBoard 2 | 3 | ## Objective 4 | 5 | For discussion and collaboration around TensorBoard, the visualization tool for TensorFlow. 6 | 7 | ## Membership 8 | 9 | Everyone interested in developing TensorBoard is welcome to join the group. To participate, request an invitation to join the mailing list. 10 | 11 | Archives of the mailing list will be publicly accessible. 12 | 13 | ## Resources 14 | 15 | * [SIG TensorBoard mailing list](https://groups.google.com/a/tensorflow.org/forum/#!forum/sig-tensorboard) 16 | * GitHub [tensorflow/tensorboard](https://github.com/tensorflow/tensorboard) 17 | 18 | * [Writing a TensorBoard plugin](https://github.com/tensorflow/tensorboard-plugin-example/blob/master/README.md) 19 | 20 | ## Contacts 21 | * Project leads: 22 | * Mani Varadarajan [@maniv](https://github.com/manivaradarajan) - maniv at google 23 | * Gal Oshri [@GalOshri](https://github.com/GalOshri) - goshri at google 24 | * For administrative questions, contact Edd Wilder-James [@ewilderj](https://github.com/ewilderj) - ewj at google 25 | 26 | ## Code of Conduct 27 | 28 | As with all forums and spaces related to TensorFlow, SIG-TensorBoard is subject 29 | to the [TensorFlow Code of 30 | Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 31 | -------------------------------------------------------------------------------- /governance/SIG-request-template.md: -------------------------------------------------------------------------------- 1 | # Request for SIG 2 | 3 | ## What is this group for? 4 | 5 | Describe the need the group fills. Who is the audience? 6 | Provide evidence that work is already ongoing in this area. 7 | 8 | ## Who will be part of it? 9 | 10 | Describe: 11 | 12 | * group leader 13 | * a second for the leader 14 | * one or more interested parties who will also be in the group -- provide 15 | evidence of the sustainability of the group 16 | 17 | What will your membership policy be? 18 | 19 | ## What initial problems will the group tackle? 20 | 21 | *List potential goals for the group* 22 | 23 | ## What modes of communication do you intend to use? 24 | 25 | *A mailing list is a minimum. We recommend regularly scheduled VC calls to focus 26 | on agenda items. Slack or other chat channels are optional.* 27 | 28 | ## Launch plan 29 | 30 | *Describe how the group will be launched. Example follows* 31 | 32 | ``` 33 | 1. `VC call with initial interested parties to finalize charter and initial group goals` 34 | 1. `SIG set up with initial group members` 35 | 1. `SIG added to community pages on tensorflow.org` 36 | 1. `Write blog post about SIG and its goals` 37 | 1. `Leader starts off mailing list discussion about initial work items` 38 | ``` 39 | 40 | # Charter 41 | 42 | Please draft the SIG's charter using the [SIG Charter Template](SIG-charter-template.md). 43 | 44 | 45 | -------------------------------------------------------------------------------- /sigs/networking/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG Networking Charter 2 | 3 | ## Objective 4 | 5 | TensorFlow has built-in support for communicating intermediate results across the network using gRPC. SIG Networking aims to add support for different network fabrics and protocols. 
6 | 7 | The group evaluates proposals and designs in this area and maintains code in the `tensorflow/networking` repository. 8 | 9 | ## Membership 10 | 11 | Everybody with an interest in making TensorFlow work (better) on different types of networks or underlying drivers and libraries is welcome to join the SIG. To participate, request an invitation to join the mailing list. Archives of the mailing list are publicly accessible. 12 | 13 | ## Resources 14 | 15 | * SIG Networking mailing list: [networking@tensorflow.org](https://groups.google.com/a/tensorflow.org/forum/#!forum/networking) 16 | * Repository maintained by SIG Networking: [github.com/tensorflow/networking](https://github.com/tensorflow/networking) 17 | 18 | ## Contacts 19 | 20 | * SIG leads: Bairen Yi [@byronyi](https://github.com/byronyi) - byronyi@clustar.ai, Jeroen Bédorf [@jbedorf](https://github.com/jbedorf) - jeroen@minds.ai 21 | * TensorFlow technical contact: Paul Tucker [@poxvoculi](https://github.com/poxvoculi) - tucker@google.com 22 | * For administrative questions, contact Edd Wilder-James [@ewilderj](https://github.com/ewilderj) - ewj at google 23 | 24 | ## Code of Conduct 25 | 26 | As with all forums and spaces related to TensorFlow, SIG Networking is subject to the [TensorFlow Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 27 | -------------------------------------------------------------------------------- /sigs/io/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG IO - TensorFlow data formats and file systems group 2 | 3 | ## Objective 4 | 5 | TensorFlow has built-in support for accessing a small set of file systems and 6 | data formats. This SIG aims to add support for more file systems and file 7 | formats. 8 | 9 | The group evaluates proposals and designs in this area and maintains code in the 10 | `tensorflow/io` repository. The repository should contain only subclasses of 11 | `tf.data.Dataset` and TensorFlow filesystems, as well as supporting code. 12 | 13 | ## Membership 14 | 15 | Everybody with an interest in improving TensorFlow interoperability is welcome 16 | to join the SIG. To participate, request an invitation to join the mailing list. 17 | Archives of the mailing list are publicly accessible. 18 | 19 | ## Resources 20 | 21 | * SIG IO mailing list: [io@tensorflow.org](https://groups.google.com/a/tensorflow.org/forum/#!forum/io) 22 | * Gitter room: [tensorflow/sig-io](https://gitter.im/tensorflow/sig-io) 23 | * Github repository: [github.com/tensorflow/io](https://github.com/tensorflow/io) 24 | * Python package repository: [tensorflow-io](https://pypi.org/project/tensorflow-io) 25 | * R package repository: [tfio](https://cran.r-project.org/package=tfio) 26 | 27 | ## Releases 28 | 29 | Information about SIG IO releases and the release team can be found in [RELEASE.md](RELEASE.md). 
30 | 31 | ## Contacts 32 | 33 | * Project leads: 34 | - Yong Tang [@yongtang](https://github.com/yongtang) - yong.tang.github@outlook.com 35 | - Anthony Dmitriev [@dmitrievanthony](https://github.com/dmitrievanthony) - dmitrievanthony@gmail.com 36 | * TensorFlow technical contact [@mrry](https://github.com/mrry) - mrry@google.com 37 | * For administrative questions, contact Edd Wilder-James 38 | [@ewilderj](https://github.com/ewilderj) - ewj at google 39 | 40 | ## Code of Conduct 41 | 42 | As with all forums and spaces related to TensorFlow, SIG-I/O is subject to the [TensorFlow Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 43 | -------------------------------------------------------------------------------- /rfcs/yyyymmdd-rfc-template.md: -------------------------------------------------------------------------------- 1 | # Title of RFC 2 | 3 | | Status | (Proposed / Accepted / Implemented / Obsolete) | 4 | | :-------------- | :---------------------------------------------------- | 5 | | **Author(s)** | My Name (me@example.org), AN Other (you@example.org) | 6 | | **Sponsor** | A N Expert (whomever@tensorflow.org) | 7 | | **Updated** | YYYY-MM-DD | 8 | | **Obsoletes** | TF-RFC it replaces, else remove this header | 9 | 10 | ## Objective 11 | 12 | What are we doing and why? What problem will this solve? What are the goals and 13 | non-goals? This is your executive summary; keep it short, elaborate below. 14 | 15 | ## Motivation 16 | 17 | Why is this a valuable problem to solve? What background information is needed 18 | to show how this design addresses the problem? 19 | 20 | Which users are affected by the problem? Why is it a problem? What data supports 21 | this? What related work exists? 22 | 23 | ## Design Proposal 24 | 25 | This is the meat of the document, where you explain your proposal. If you have 26 | multiple alternatives, be sure to use sub-sections for better separation of the 27 | idea, and list pros/cons of each approach. If there are alternatives that you 28 | have eliminated, you should also list those here, and explain why you believe 29 | your chosen approach is superior. 30 | 31 | Factors to consider include: 32 | 33 | * performance implications 34 | * dependencies 35 | * maintenance 36 | * platforms and environments impacted (e.g. hardware, cloud, other software 37 | ecosystems) 38 | * [compatibility](https://www.tensorflow.org/programmers_guide/version_compat) 39 | * how will this change impact users, and how will that be managed? 40 | 41 | ## Detailed Design 42 | 43 | This section is optional. Elaborate on details if they’re important to 44 | understanding the design, but would make it hard to read the proposal section 45 | above. 46 | 47 | ## Questions and Discussion Topics 48 | 49 | Seed this with open questions you require feedback on from the RFC process. 50 | -------------------------------------------------------------------------------- /rfcs/20181225-tf-raw-ops.md: -------------------------------------------------------------------------------- 1 | # `tf.raw_ops` 2 | 3 | | Status | Accepted | 4 | | :------------ | :------------------------------------------------------ | 5 | | **Author** | apassos@google.com | 6 | | **Sponsor** | wicke@google.com | 7 | | **Updated** | 2018-12-21 | 8 | 9 | ## Objective 10 | 11 | Expose a `tf.raw_ops` namespace containing all raw operations in TensorFlow. 
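To make the calling convention concrete, here is a rough sketch of the intended usage (illustrative only: `Add` stands in for any generated op, and the exact generated signature is an assumption based on the design below):

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])

# All arguments to a raw op binding are keyword-only, including the
# always-last `name` argument added by the op binding generator.
z = tf.raw_ops.Add(x=x, y=y, name="raw_add")

# Positional calls are rejected, so new attrs can be added to an op
# without breaking existing callers:
# tf.raw_ops.Add(x, y)  # would raise TypeError
```

Because every argument must be named, the bindings stay stable even if an op later grows new attributes.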
12 | 13 | ## Motivation 14 | 15 | Some parts of the TensorFlow Python API, such as variables, optimizers, and 16 | control flow, are currently not implementable by third parties. Moreover, with 17 | the tf.contrib deprecation, there is now no valid Python endpoint from which to 18 | use many TF operations. 19 | 20 | ## Design Proposal 21 | 22 | We'll add a `tf.raw_ops` namespace to TensorFlow, with Python bindings for all 23 | non-deprecated TensorFlow ops, which can be used in a backwards-compatible 24 | way. This is designed to be consumed by downstream library writers and not end 25 | users. 26 | 27 | ## Detailed Design 28 | 29 | The namespace will be automatically populated with generated bindings for every 30 | operation in TensorFlow. These generated bindings will be similar to the ones 31 | currently used for the python API, with the following differences: 32 | 33 | * All arguments are keyword arguments. 34 | - This allows us to add new attributes to existing ops without breaking users 35 | who call by positional arguments (given that there is an always-last `name` 36 | argument added by the tf op binding generator). 37 | - This also prevents users from assuming that calling conventions from the 38 | existing python bindings apply to the raw versions (we often do argument 39 | reordering in our python bindings, for example). 40 | * Any op marked as deprecated will be in the namespace but will raise an 41 | exception when used. 42 | - This includes ops which take or produce ref tensors. 43 | - This allows us to deprecate ops eventually and to be less strict with the API 44 | here than with the main API. 45 | - This is mostly OK since only library writers are supposed to use these 46 | symbols, and the deprecation messages should include upgrading instructions. 47 | 48 | 49 | ## Questions and Discussion Topics 50 | 51 | * Naming: tf.raw_ops is the name 52 | * Backward compatibility policy: we'll document on tf.org 53 | * Flat namespace vs nested? flat 54 | * Will not include protocol buffers 55 | -------------------------------------------------------------------------------- /sigs/addons/CHARTER.md: -------------------------------------------------------------------------------- 1 | # SIG Addons 2 | 3 | ## Objective 4 | 5 | TensorFlow natively supports a large number of operators, layers, metrics, losses, and optimizers. However, in a fast-moving field like ML, there are many interesting new developments that cannot be integrated into core TensorFlow (because they are experimental, or their significance is not yet clear). 6 | 7 | This special interest group maintains a repository of bleeding edge contributions that conform to well-established API patterns, but implement new functionality not available in core TensorFlow. 8 | 9 | ## Scope 10 | 11 | This group maintains the [tensorflow/addons](https://github.com/tensorflow/addons) repository. It contains additional functionality which fits the following criteria: 12 | 13 | * The functionality is not otherwise available in TensorFlow 14 | * The functionality conforms to an established API pattern in TensorFlow. For instance, it could be an additional subclass of an existing interface (new Layer, Metric, or Optimizer subclasses), or an additional Op or OpKernel implementation. 15 | * Addons have to be compatible with TensorFlow 2.x. 16 | * The addon conforms to the code and documentation standards defined by the group. 
These policies are detailed in the project's [README](https://github.com/tensorflow/addons/blob/master/README.md) 17 | * The addon is useful for a large number of users (e.g., an implementation used in a widely cited paper, or a utility with broad applicability) 18 | 19 | The group is responsible for reviewing new additions to the repository, including evaluating designs and implementations. 20 | 21 | ## Membership 22 | 23 | Everybody with an interest in helping extend TensorFlow with new types of Ops, Layers, etc. is welcome to join the SIG. To participate, request an invitation to join the mailing list. Maintainer status for the repository will be conferred by consensus of the existing members. Archives of the mailing list are publicly accessible. 24 | 25 | ## Resources 26 | 27 | * SIG Addons mailing list: [addons@tensorflow.org](https://groups.google.com/a/tensorflow.org/forum/#!forum/addons) 28 | * Repository maintained by SIG Addons: [github.com/tensorflow/addons](https://github.com/tensorflow/addons) 29 | 30 | ## Contacts 31 | 32 | * Project leads: Sean Morgan [@seanpmorgan](https://github.com/seanpmorgan) - seanmorgan@outlook.com, 33 | Armando Fandango [@armando-fandango](https://github.com/armando-fandango) - armando@neurasights.com 34 | * TensorFlow technical contact [@karmel](https://github.com/karmel) - karmel@google.com 35 | * For administrative questions, contact Edd Wilder-James [@ewilderj](https://github.com/ewilderj) - ewj at google 36 | 37 | ## Code of Conduct 38 | 39 | As with all forums and spaces related to TensorFlow, SIG Addons is subject to the [TensorFlow Code of Conduct](https://github.com/tensorflow/tensorflow/blob/master/CODE_OF_CONDUCT.md). 40 | -------------------------------------------------------------------------------- /rfcs/20181026-tf-nightly.md: -------------------------------------------------------------------------------- 1 | # `tf-nightly` and `tf-nightly-gpu` renovations 2 | 3 | | Status | Implemented | 4 | | :------------ | :------------------------------------------------------ | 5 | | **Author** | amitpatankar@google.com | 6 | | **Sponsor** | gunan@google.com | 7 | | **Updated** | 2018-10-26 | 8 | 9 | ## Objective 10 | 11 | Plan a new process and protocol for how we build and distribute nightlies. `tf-nightly` is now widely used to test and evaluate performance as official releases are spaced further apart. If we are following `tf-estimator`'s process for individual module testing, we need to make `tf-nightly` more reliable and robust. 12 | 13 | ## Motivation 14 | 15 | Previously, we built from HEAD of the master branch every night for all operating systems. We took the builds for Ubuntu and used them to create our Docker containers so that the git hashes matched. Breakages were quite common, and most of our nightly builds fell far behind on all platforms. For example, there was a three-month stretch where Windows was not updated. 16 | 17 | ## Design Proposal 18 | 19 | We will take the latest postsubmit build that has passed for each platform and get the commit hash. Based on that hash we will create nightly binaries. If the last green postsubmit is more than 24 hours old, we will not publish binaries for that platform for that day. 20 | 21 | Absolutely no tests will be run on the binaries. If it builds, it ships. Refer to the diagram below for different cases. 
22 | 23 | ![](https://storage.googleapis.com/amitpatankar/tf-nightly-postsubmit.png) 24 | 25 | 26 | ## Detailed Design 27 | 28 | ### Support 29 | * We will still continue to offer packages for: 30 | 31 | |Platform/OS: |CPU |GPU |Package Type | 32 | |---------------|------|------|-------------------------| 33 | |Mac |Yes |No |Pip `Python 2.7-3.6` | 34 | |Ubuntu |Yes |Yes |Pip `Python 2.7-3.6` | 35 | |Windows |Yes |Yes |Pip `Python 2.7-3.6` | 36 | |Docker-dev |Yes |Yes |Container `Python 2&3` | 37 | |Docker-nondev |Yes |Yes |Container `Python 2&3` | 38 | * Please file bugs on [GitHub](https://github.com/tensorflow/tensorflow/issues) if a nightly build for a certain platform has not been pushed for a week. We will do our best to push builds every night, but please wait 7 days before notifying us. 39 | * We will also be much less active on Windows builds, especially GPU. We often find that those are difficult to fix, and most `tf-nightly` users are on Ubuntu and Docker. The grace period before you can notify us about Windows GPU builds will be two weeks. 40 | 41 | 42 | ### Versioning 43 | ![](https://storage.googleapis.com/amitpatankar/tf-rename-release-diagram.png) 44 | 45 | ## Questions and Discussion Topics 46 | 47 | * Although the package names for `tf-nightly` and `tensorflow` differ, installing one after the other will overwrite some files in site-packages. 48 | * Hashes may be mismatched. The binary for a certain day on Windows can be built from a different hash than the corresponding binary on Ubuntu. 49 | * We cannot name them anything better due to [PEP440](https://www.python.org/dev/peps/pep-0440/) compliance. -------------------------------------------------------------------------------- /sigs/io/RELEASE.md: -------------------------------------------------------------------------------- 1 | # SIG IO Releases 2 | 3 | At the moment SIG IO releases consist of three parts: 4 | - Release of source code with versioning in GitHub 5 | - Release of Python package in PyPI 6 | - Release of R package to CRAN 7 | 8 | ## GitHub Source Code Release 9 | 10 | To perform a release in GitHub, the following steps are needed: 11 | - Create a PR to update the RELEASE.md in 12 | [github.com/tensorflow/io](https://github.com/tensorflow/io) 13 | * Add updates for new features, enhancements, bug fixes 14 | * Add contributors using `git shortlog <last-version>..HEAD -s` 15 | - Merge the PR for RELEASE.md update 16 | - Create a new version through GitHub 17 | 18 | ## PyPI Python Package Release 19 | 20 | To perform a release in PyPI, first complete the above GitHub release, then 21 | build pip packages locally with Docker using the following command 22 | ``` 23 | $ docker run -it -e BAZEL_VERSION=0.20.0 --rm -v ${PWD}:/working_dir \ 24 | -w /working_dir tensorflow/tensorflow:custom-op \ 25 | bash -x /working_dir/.travis/python.release.sh <2.7|3.4|3.5|3.6> 26 | ``` 27 | Note that the above command has to be run four times, once each with 2.7, 3.4, 3.5, and 3.6, 28 | to generate pip packages for all of the different Python versions. 
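For convenience, the four invocations can be scripted in a single pass (a sketch reusing the exact command above; it assumes the same working directory and Docker image):

```bash
#!/bin/bash
# Build the pip package for each supported Python version in turn.
for version in 2.7 3.4 3.5 3.6; do
  docker run -it -e BAZEL_VERSION=0.20.0 --rm -v ${PWD}:/working_dir \
    -w /working_dir tensorflow/tensorflow:custom-op \
    bash -x /working_dir/.travis/python.release.sh "$version"
done
```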
29 | 30 | Then upload `artifacts/*.whl` files with: 31 | ``` 32 | twine upload artifacts/* 33 | ``` 34 | 35 | ## CRAN R Package Release 36 | 37 | Before submitting the R package to CRAN, manually perform and check the following items: 38 | * Make sure the documentation in `README.md` and `vignettes` is up-to-date 39 | * Update `Version` field in `DESCRIPTION` file 40 | * Update `NEWS.md` to include items for this new release 41 | * Run `devtools::check()` and fix all the notable issues, especially warnings and errors 42 | * Update `cran-comments.md` to include any unsolvable issues from `devtools::check()` and 43 | other comments/responses to CRAN maintainers 44 | * Run checks on R-hub via `devtools::check_rhub()` and on win-builder via `devtools::check_win_devel()`. This is 45 | optional since Python is not installed on CRAN test machines and we skip the tests on 46 | CRAN. 47 | 48 | To submit the package to CRAN for review, do the following: 49 | * Run `devtools::release()` to submit for review. Here's what it looks like if the submission is successful: 50 | ``` 51 | Submitting file: /var/folders/zp/k98_wphd0h9c5b3zyk5xhnhm0000gn/T//RtmpHh9Wdo/tfio_0.1.0.tar.gz 52 | File size: 483.4 Kb 53 | Uploading package & comments 54 | Confirming submission 55 | Package submission successful. 56 | Check your email for confirmation link. 57 | ``` 58 | * Check your email for the confirmation link and confirm the submission 59 | * CRAN maintainers will review the submission and email you the result of the submission. 60 | If there are any additional issues or comments that need to be addressed, address them and re-submit. 61 | 62 | ## SIG IO Release Team 63 | 64 | Everybody with an interest in helping with SIG IO releases is welcome 65 | to join the Release Team. To participate, create a PR to update 66 | the doc or send an email to the SIG IO mailing list 67 | [io@tensorflow.org](https://groups.google.com/a/tensorflow.org/forum/#!forum/io). 68 | Please provide both your GitHub and PyPI handles to join the release team. 69 | 70 | Current Release Team: 71 | - Yong Tang - GitHub: [@yongtang](https://github.com/yongtang) - PyPI: [yongtang](https://pypi.org/user/yongtang) 72 | - Anthony Dmitriev - GitHub: [@dmitrievanthony](https://github.com/dmitrievanthony) - PyPI: [dmitrievanthony](https://pypi.org/user/dmitrievanthony) 73 | - Yuan (Terry) Tang - GitHub: [@terrytangyuan](https://github.com/terrytangyuan) - PyPI: [terrytangyuan](https://pypi.org/user/terrytangyuan) 74 | - Bryan Cutler - GitHub: [@BryanCutler](https://github.com/BryanCutler) - PyPI: [cutlerb](https://pypi.org/user/cutlerb) 75 | -------------------------------------------------------------------------------- /rfcs/20181025-tf-integration-testing.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Integration Testing 2 | 3 | | Status | Accepted | 4 | | :------------ | :------------------------------------------------------ | 5 | | **Author** | amitpatankar@google.com | 6 | | **Sponsor** | gunan@google.com | 7 | | **Updated** | 2018-10-24 | 8 | 9 | ## Objective 10 | 11 | This document proposes the official way to test projects and repositories downstream from TensorFlow. With TensorFlow becoming more and more modularized, libraries that sit on top of core TensorFlow need to be tested. 
Unfortunately we cannot wait for any adjustments made to core TensorFlow to propagate through to a formal release, and we need a reliable way of getting the latest stable TensorFlow to test any new changes to the external repositories. A great example is the estimator library, which is moving out of TensorFlow but is still heavily dependent on core TensorFlow changes. 12 | 13 | ## Motivation 14 | 15 | There are three potential approaches to testing TensorFlow-dependent libraries: 16 | 17 | * Test with the latest official release. 18 | * Test by building TensorFlow from source at HEAD on the master branch. 19 | * Test with the old `tf-nightly`. 20 | 21 | |Approach: |TF-Release|TF-Head |Old `tf-nightly`| 22 | |------------------------------|----------|---------|----------------| 23 | |TensorFlow update latency |Poor |Excellent|Average | 24 | |Test setup overhead |Excellent |Poor |Excellent | 25 | |Stability |Excellent |Poor |Poor | 26 | |Test dependencies immediately |Poor |Excellent|Poor | 27 | 28 | None of these solutions are ideal for testing projects downstream from TensorFlow. 29 | 30 | ## Design Proposal 31 | 32 | ### New Testing Approach 33 | 34 | The [renovated `tf-nightly` approach](https://github.com/tensorflow/community/blob/master/rfcs/20181026-tf-nightly.md) will combat the two issues that plague option 3 for testing TensorFlow-dependent packages. 35 | 36 | |Approach: |New `tf-nightly` | 37 | |-----------------------------|------------------| 38 | |TensorFlow update latency |Excellent | 39 | |Test setup overhead |Excellent | 40 | |Stability |Excellent | 41 | |Test dependencies immediately|Excellent | 42 | 43 | #### Stability 44 | Sometimes the `tf-nightly` packages were created but failed immediately when attempting `import tensorflow`. 45 | 46 | #### Test dependencies immediately 47 | Sometimes `tf-nightly` packages are behind because there are infrastructure issues or the hash they build off of at midnight does not build. With the guaranteed latest green postsubmit, your test is guaranteed to be run against the latest stable TensorFlow code, possibly from the previous day. 48 | 49 | 50 | ### Example testing strategy 51 | Here is a quick example that shows how TensorFlow can work with TensorBoard. This example uses a virtualenv with Python 3 to run a simple test that theoretically depends on the latest code from TensorFlow. 52 | 53 | ##### Create the virtual environment 54 | 55 | ```bash 56 | $ virtualenv -p python3 tf 57 | $ source tf/bin/activate 58 | (tf)$ pip install --upgrade pip 59 | ``` 60 | 61 | ##### Install and check `tf-nightly` or `tf-nightly-gpu` 62 | 63 | ```bash 64 | (tf)$ pip install --upgrade tf-nightly 65 | Successfully installed tf-nightly-1.13.0.dev20181023 66 | (tf)$ python -c 'import tensorflow as tf; print(tf.__version__)' 67 | 1.13.0-dev20181023 68 | ``` 69 | 70 | ##### Clone and test the dependent project 71 | 72 | ```bash 73 | (tf)$ git clone git@github.com:tensorflow/tensorboard.git 74 | Cloning into 'tensorboard'... 75 | remote: Counting objects: 20684, done. 76 | remote: Total 20684 (delta 0), reused 0 (delta 0), pack-reused 20683 77 | Receiving objects: 100% (20684/20684), 12.17 MiB | 8.89 MiB/s, done. 78 | Resolving deltas: 100% (15053/15053), done. 
79 | (tf)$ cd tensorboard 80 | (tf)$ bazel run //tensorboard/plugins/scalar:scalars_demo 81 | ``` 82 | 83 | 84 | -------------------------------------------------------------------------------- /sigs/build/community-builds.md: -------------------------------------------------------------------------------- 1 | # Community Supported TensorFlow Builds and Releases 2 | 3 | 4 | ## Overview 5 | 6 | TensorFlow is used in many more environments and configurations than is practical for the core team to regularly test and support: so we need a way to include federated third party testing and builds. 7 | 8 | This document describes a process for creating third party builds of TensorFlow, federating tests and builds, and making the build artifacts available to users. Examples of such builds include those optimized for particular hardware configurations, operating system environments, or other specific applications. 9 | 10 | There are three major phases of the process: 11 | 12 | 13 | 14 | 1. Engagement — connect with the TensorFlow core team and work on a plan for integration, tests, documentation and support 15 | 1. Testing — set up continuous integration and connect to GitHub webhooks 16 | 1. Building — once tests exist and pass, and builds are available, they will be linked as community supported builds from the official TensorFlow site 17 | 18 | 19 | ## Phase 1: Engagement 20 | 21 | You should first join the [SIG Build interest group](https://groups.google.com/a/tensorflow.org/forum/#!forum/build): this community is the main way coordination happens around building, testing and releasing TensorFlow. 22 | 23 | To start the process, reach out with a description of your intent to build a particular flavor or release of TensorFlow to the SIG Build community: include a tracking bug filed in GitHub. 24 | 25 | A TensorFlow team member will reply and start the planning process with you. Together, we will create a plan to get the work to "community supported" status. We discuss how to integrate your code, what the TensorFlow team needs from you, and set expectations for both sides. 26 | 27 | In particular we will need to ensure there is: 28 | 29 | 30 | 31 | * A testing plan, to make sure the build is periodically tested by you, with our help. The TensorFlow team won't run these tests. We also will not add tests that will block merging code to the central TensorFlow repository. 32 | * Documentation and examples. You should plan to provide sufficient documentation to let people install, setup, and use the artifacts you have created. 33 | * A support plan. Before we link the build artifacts from the web site, you will need to provide a contact for support and maintenance of the packages. 34 | 35 | The TensorFlow team will periodically review community supported efforts, and highlight them in collaboration with you through various promotional channels on a case-by-case basis: for example, through blog posts or conference presentations. 36 | 37 | 38 | ## Phase 2: Testing 39 | 40 | In this phase, we agree what configurations should be tested based on what the community needs and what you are willing to contribute. Usually, this should be a discussion conducted within SIG Build. 41 | 42 | The TensorFlow team will work with you to set up continuous testing of your build: 43 | 44 | 45 | 46 | * There is no mandated CI system: you can choose what CI system you would like to use (e.g. 
Jenkins, Travis, custom) 47 | * We recommend running as many unit tests as possible 48 | * Continuous testing of the master branch is required 49 | * Testing release branches at least once after each branch cut is highly recommended 50 | 51 | The TensorFlow team will create "webhooks" in our GitHub repository to enable automated triggering of tests in your CI. 52 | 53 | Once the tests are up and running, we will link to the CI build status under community supported builds on GitHub, as is the case for the IBM CI links [here](https://github.com/tensorflow/tensorflow/blob/master/README.md)! 54 | 55 | 56 | ## Phase 3: Building 57 | 58 | At this stage, we must be sure that the continuous integration is configured, all tests pass, and that the CI setup proves stable. 59 | 60 | You will set up a destination download and documentation site, and the TensorFlow web site will add a link to it, highlighting that this is a community supported build, with credit to you and your organization. 61 | 62 | To be listed as a build, you must also provide: 63 | 64 | 65 | 66 | * One or more GitHub users to assign issues to 67 | * Support details for users to report bugs to you 68 | * Documentation as discussed in Phase 1 69 | 70 | If the build remains broken for an extended period of time, the TensorFlow team may remove it from the community builds list until the requirements for phase 3 are once again met. 71 | 72 | 73 | ## Comments and questions 74 | 75 | Please feel free to ask further about this process on the [build@tensorflow.org](mailto:build@tensorflow.org) mailing list. 76 | 77 | -------------------------------------------------------------------------------- /governance/code-and-collaboration.md: -------------------------------------------------------------------------------- 1 | 2 | # TensorFlow Governance: Code and Collaboration 3 | 4 | ## Projects 5 | 6 | A **project** is the primary unit of collaboration. It can either have its own 7 | repo, or be a part of another repo (e.g. a directory in _tensorflow/models_). 8 | 9 | 10 | ## Contributors 11 | 12 | Anyone can submit a PR contribution to any project, as long as they have signed 13 | the CLA and follow the guidelines in 14 | [CONTRIBUTING.md](https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md). 15 | Their code must be reviewed by a maintainer, and must pass all applicable tests. 16 | 17 | Code reviews check for code quality and style, including documentation, and 18 | enforce API compatibility guarantees and other policies. Contributions may be 19 | rejected for strategic reasons unrelated to the code in question, for instance 20 | because a feature may be too costly to maintain, or because it would duplicate 21 | APIs. 22 | 23 | ## Maintainers 24 | 25 | A project has one or more **maintainers**. 26 | 27 | Maintainers have write access to the repo containing their project. That means 28 | they can review PRs—an approving review will allow PRs to be merged. They can 29 | also change labels (which means they can trigger tests), add assignees and 30 | reviewers, and they can be assigned to issues and PRs. 31 | 32 | Note that for some repos, being a maintainer may not allow direct commit access, 33 | which is reserved for administrators or bots. *tensorflow/tensorflow* is a case 34 | in point: due to the complexity around releasing, only a small group of release 35 | engineers are administrators. In this case maintainers have approval rights. 
36 | 37 | If a repo is shared between many projects, we use GitHub's CODEOWNERS to 38 | identify owners and route PRs to them for review. Because of the way CODEOWNERS 39 | works, it is not possible to use the GitHub routing mechanism everywhere -- for 40 | example, the TensorFlow CODEOWNERS file is mainly for informational purposes, as 41 | to not impede merges. 42 | 43 | Once there are more than a couple of maintainers for a project, we will create a 44 | GitHub team for the project maintainers. This allows for easier maintenance, and 45 | opens up some [GitHub 46 | tooling](https://help.github.com/articles/about-team-discussions/) for 47 | communication. Larger projects can facilitate coordination and contribution 48 | through establishing a 49 | [TensorFlow SIG](SIGS.md). 50 | 51 | 52 | ### Repositories requiring synchronization 53 | 54 | For some projects initiated by Google (including the _tensorflow/tensorflow_ 55 | repo), the infrastructure which synchronizes and merges internal and external 56 | changes requires that all merges are performed by a Google employee. In such 57 | cases, Google sets up an on-call rotation which merges PRs once they pass tests 58 | (and a specific label is applied to the PR in order to notify the rotation to 59 | merge it). This does not preclude non-Google contributors from becoming 60 | maintainers. In this case, the maintainers of the project decide on what should 61 | be merged, then the actual merging is performed as a service. In some cases, 62 | Google-internal tests may fail and may have to be fixed: the Google employee 63 | will work with the submitter to achieve this. 64 | 65 | 66 | ### Achieving maintainer status 67 | 68 | Maintainers may elevate a contributor to maintainer status, on evidence of 69 | previous contributions and established trust. 70 | 71 | ## Collaboration 72 | 73 | Maintainers are free to agree on their preferred form of collaboration and 74 | decision making, with the requirement that regular communication about decisions 75 | must be made publicly accessible—this can happen after the fact, for example in 76 | the form of publishing meeting minutes, reviews, or announcements. Communication 77 | about topics such as admitting other maintainers, or as of yet undisclosed 78 | security issues, can be kept confidential. 79 | 80 | If significant engagement from multiple parties is encountered, the group may 81 | request the formation of a SIG to formalize collaboration and cooperation. The 82 | threshold for SIG formation includes: 83 | 84 | * A clearly stated purpose 85 | * Two or more non-maintainers willing to contribute code, and evidence of 86 | existing demand for the group 87 | * Project maintainers willing to be in the group and shepherd contributions 88 | 89 | For further details on SIGs, read [TensorFlow SIGs](SIGS.md). 90 | 91 | As with most structures, a project doesn't need a SIG to get started, but should 92 | find a home in one if it has proven itself as an ongoing concern, as SIGs are 93 | the primary organizational vehicle for the contributor community. 
94 | -------------------------------------------------------------------------------- /governance/tensorflow-testing.md: -------------------------------------------------------------------------------- 1 | # Testing TensorFlow and Reporting Issues 2 | 3 | ## 📢 How to Report Issues 4 | 5 | Over the last few years, and with the extremely productive involvement of our community (_thank you!_), the TensorFlow development team has [reviewed RFCs](https://github.com/tensorflow/community/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3A2.0+), added [many new features](https://www.tensorflow.org/resources/), and implemented most of what will be [TensorFlow 2.0](https://www.tensorflow.org/community/roadmap#tensorflow_20_is_coming) - a significant milestone for the framework, with a focus on ease of use. 6 | 7 | TensorFlow is truly a community effort, and **we would love to have your feedback on how we've been doing so far**, as well as your suggestions for ways that we can improve! 8 | 9 | --------------------------------- 10 | 11 | ## 📝 What is a Good Issue? 12 | 13 | ### 🐞 Report a Bug 14 | 15 | Please submit all bugs, errors, and peculiarities on GitHub. Differences between documentation and implementation, lack of 16 | documentation, performance issues, or compatibility problems are all fair game. Please be specific and include all information 17 | that would be helpful to debug the issue using our issue templates: 18 | 19 | * **[Bug / Performance Issue](https://github.com/tensorflow/tensorflow/issues/new?template=00-bug-performance-issue.md)** 20 | * **[Build / Installation Issue](https://github.com/tensorflow/tensorflow/issues/new?template=10-build-installation-issue.md)** 21 | * **[Documentation Issue](https://github.com/tensorflow/tensorflow/issues/new?template=20-documentation-issue.md)** 22 | * **[Other Issue - Not Listed](https://github.com/tensorflow/tensorflow/issues/new?template=50-other-issues.md)** 23 | 24 | If you have a general question, you can [submit it to StackOverflow](https://stackoverflow.com/questions/tagged/tensorflow) with the tag `tensorflow`, or to our [discuss@](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss) mailing group. Our engineering team tries to answer as many of these questions as possible, but we appreciate help from end users! 25 | 26 | ### ✨ Submit a Feature Request 27 | 28 | As members of the TensorFlow community, your recommendations and suggestions are highly valued, and we are honored to have them. Please submit all feature requests as an issue on GitHub: 29 | 30 | * **[Feature Request](https://github.com/tensorflow/tensorflow/issues/new?template=30-feature-request.md)** 31 | * **[TensorFlow Lite Op Request](https://github.com/tensorflow/tensorflow/issues/new?template=40-tflite-op-request.md)** 32 | 33 | 34 | ### 🤔 Send an Experience Report 35 | 36 | If you would like to submit general feedback about TensorFlow (and in particular, about TensorFlow 2.0), consider submitting a friction log! 37 | 38 | **Friction logs** are documents that describe the frustrations and delights of a product, focused around a specific use case (for example, creating an LSTM model for text classification). They're also intended to be brutally honest - feel free to vent or to praise! 😊 39 | 40 | A template and example of a TensorFlow friction log can be found [here](https://docs.google.com/document/d/1_-0Zzn0hqS4ltLwqWAHm41-MgE60_9zlKyPHr5c-HCs/edit?usp=sharing). 
41 | 42 | Once you have completed such a document, please email it to our [testing team](mailto:testing@tensorflow.org). 43 | 44 | --------------------------------- 45 | 46 | ## 🛠 How to Get Involved 47 | 48 | Between now and the preview launch for TensorFlow 2.0, we will be actively maintaining a discussion group for any questions, comments, suggestions, or issues that arise. **We will be holding a weekly stand-up for TF 2.0 testing via Hangouts** that will be announced through the TensorFlow Testing Discussion Group. 49 | 50 | _Please subscribe to [testing@tensorflow.org](http://groups.google.com/a/tensorflow.org/forum/#!forum/testing) to stay up-to-date._ 51 | 52 | ### Special Interest Groups (SIGs) 53 | 54 | TensorFlow's [Special Interest Groups (SIGs)](https://github.com/tensorflow/community/tree/master/sigs) support community collaboration on particular projects. Members of these groups work together to build and support specific parts of TensorFlow or TensorFlow-related projects. 55 | 56 | _To join the discussion on a specific topic, subscribe to one of our SIG mailing lists:_ 57 | 58 | * **[TensorBoard](https://groups.google.com/a/tensorflow.org/d/forum/sig-tensorboard)**: Plug-in development, discussion, and contribution to TensorFlow visualization tooling. 59 | * **[Networking](https://groups.google.com/a/tensorflow.org/d/forum/networking)**: Adding network protocols other than gRPC. 60 | * **[I/O](https://groups.google.com/a/tensorflow.org/d/forum/io)**: Support for file systems and formats not available in core TensorFlow. 61 | * **[Add-ons](https://groups.google.com/a/tensorflow.org/d/forum/addons)**: Extensions to TensorFlow that conform to the stable API. 62 | * **[Build](https://groups.google.com/a/tensorflow.org/d/forum/build)**: Discussion on TensorFlow distribution and packaging. 63 | -------------------------------------------------------------------------------- /sigs/build/tensorflow-testing.md: -------------------------------------------------------------------------------- 1 | # Testing TensorFlow and Reporting Issues 2 | 3 | ## 📢 How to Report Issues 4 | 5 | Over the last few years, and with the extremely productive involvement of our community (_thank you!_), the TensorFlow development team has [reviewed RFCs](https://github.com/tensorflow/community/pulls?utf8=%E2%9C%93&q=is%3Apr+label%3A2.0+), added [many new features](https://www.tensorflow.org/resources/), and implemented most of what will be [TensorFlow 2.0](https://www.tensorflow.org/community/roadmap#tensorflow_20_is_coming) - a significant milestone for the framework, with a focus on ease of use. 6 | 7 | TensorFlow is truly a community effort, and **we would love to have your feedback on how we've been doing so far**, as well as your suggestions for ways that we can improve! 8 | 9 | --------------------------------- 10 | 11 | ## 📝 What is a Good Issue? 12 | 13 | ### 🐞 Report a Bug 14 | 15 | Please submit all bugs, errors, and peculiarities on GitHub. Differences between documentation and implementation, lack of 16 | documentation, performance issues, or compatibility problems are all fair game. 
Please be specific and include all information 17 | that would be helpful to debug the issue using our issue templates: 18 | 19 | * **[Bug / Performance Issue](https://github.com/tensorflow/tensorflow/issues/new?template=00-bug-performance-issue.md)** 20 | * **[Build / Installation Issue](https://github.com/tensorflow/tensorflow/issues/new?template=10-build-installation-issue.md)** 21 | * **[Documentation Issue](https://github.com/tensorflow/tensorflow/issues/new?template=20-documentation-issue.md)** 22 | * **[Other Issue - Not Listed](https://github.com/tensorflow/tensorflow/issues/new?template=50-other-issues.md)** 23 | 24 | If you have a general question, you can [submit it to StackOverflow](https://stackoverflow.com/questions/tagged/tensorflow) with the tag `tensorflow`, or to our [discuss@](https://groups.google.com/a/tensorflow.org/forum/#!forum/discuss) mailing group. Our engineering team tries to answer as many of these questions as possible, but we appreciate help from end users! 25 | 26 | ### ✨ Submit a Feature Request 27 | 28 | As members of the TensorFlow community, your recommendations and suggestions are highly valued, and we are honored to have them. Please submit all feature requests as an issue on GitHub: 29 | 30 | * **[Feature Request](https://github.com/tensorflow/tensorflow/issues/new?template=30-feature-request.md)** 31 | * **[TensorFlow Lite Op Request](https://github.com/tensorflow/tensorflow/issues/new?template=40-tflite-op-request.md)** 32 | 33 | 34 | ### 🤔 Send an Experience Report 35 | 36 | If you would like to submit general feedback about TensorFlow (and in particular, about TensorFlow 2.0), consider submitting a friction log! 37 | 38 | **Friction logs** are documents that describe the frustrations and delights of a product, focused around a specific use case (for example, creating an LSTM model for text classification). They're also intended to be brutally honest - feel free to vent or to praise! 😊 39 | 40 | A template and example of a TensorFlow friction log can be found [here](https://docs.google.com/document/d/1HVG3t-mgGZKU4iMeguTWGejbnQ54qUTXwdCFkA5xHG0/edit?usp=sharing). 41 | 42 | Once you have completed such a document, please email it to our [testing team](mailto:testing@tensorflow.org). 43 | 44 | --------------------------------- 45 | 46 | ## 🛠 How to Get Involved 47 | 48 | Between now and the preview launch for TensorFlow 2.0, we will be actively maintaining a discussion group for any questions, comments, suggestions, or issues that arise. **We will be holding a weekly stand-up for TF 2.0 testing via Hangouts** that will be announced through the TensorFlow Testing Discussion Group. 49 | 50 | _Please subscribe to [testing@tensorflow.org](http://groups.google.com/a/tensorflow.org/forum/#!forum/testing) to stay up-to-date._ 51 | 52 | ### Special Interest Groups (SIGs) 53 | 54 | TensorFlow's [Special Interest Groups (SIGs)](https://github.com/tensorflow/community/tree/master/sigs) support community collaboration on particular projects. Members of these groups work together to build and support specific parts of TensorFlow or TensorFlow-related projects. 55 | 56 | _To join the discussion on a specific topic, subscribe to one of our SIG mailing lists:_ 57 | 58 | * **[TensorBoard](https://groups.google.com/a/tensorflow.org/d/forum/sig-tensorboard)**: Plug-in development, discussion, and contribution to TensorFlow visualization tooling. 
59 | * **[Networking](https://groups.google.com/a/tensorflow.org/d/forum/networking)**: Adding network protocols other than gRPC. 60 | * **[I/O](https://groups.google.com/a/tensorflow.org/d/forum/io)**: Support for file systems and formats not available in core TensorFlow. 61 | * **[Add-ons](https://groups.google.com/a/tensorflow.org/d/forum/addons)**: Extensions to TensorFlow that conform to the stable API. 62 | * **[Build](https://groups.google.com/a/tensorflow.org/d/forum/build)**: Discussion on TensorFlow distribution and packaging. 63 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Code of Conduct 2 | 3 | In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, religion, or sexual identity and orientation. 4 | 5 | 6 | ## Our Standards 7 | 8 | Examples of behavior that contributes to creating a positive environment include: 9 | 10 | * Using welcoming and inclusive language 11 | * Being respectful of differing viewpoints and experiences 12 | * Gracefully accepting constructive criticism 13 | * Focusing on what is best for the community 14 | * Showing empathy towards other community members 15 | 16 | Examples of unacceptable behavior by participants include: 17 | 18 | * The use of sexualized language or imagery and unwelcome sexual attention or advances 19 | * Trolling, insulting/derogatory comments, and personal or political attacks 20 | * Public or private harassment 21 | * Publishing others' private information, such as a physical or electronic address, without explicit permission 22 | * Conduct which could reasonably be considered inappropriate for the forum in which it occurs. 23 | 24 | All TensorFlow forums and spaces are meant for professional interactions, and any behavior which could reasonably be considered inappropriate in a professional setting is unacceptable. 25 | 26 | 27 | ## Our Responsibilities 28 | 29 | Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. 30 | 31 | Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. 32 | 33 | 34 | ## Scope 35 | 36 | This Code of Conduct applies to all content on tensorflow.org, TensorFlow’s GitHub organization, or any other official TensorFlow web presence allowing for community interactions, as well as at all official TensorFlow events, whether offline or online. 37 | 38 | The Code of Conduct also applies within project spaces and in public spaces whenever an individual is representing TensorFlow or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed or de facto representative at an online or offline event. 
39 |  40 |  41 | ## Conflict Resolution 42 | 43 | Conflicts in an open source project can take many forms, from someone having a bad day and using harsh and hurtful language in the issue queue, to more serious instances such as sexist/racist statements or threats of violence, and everything in between. 44 | 45 | If the behavior is threatening or harassing, or for other reasons requires immediate escalation, please see below. 46 | 47 | However, for the vast majority of issues, we aim to empower individuals to first resolve conflicts themselves, asking for help when needed, and only after that fails to escalate further. This approach gives people more control over the outcome of their dispute. 48 | 49 | If you are experiencing or witnessing conflict, we ask you to use the following escalation strategy to address the conflict: 50 | 51 | 1. Address the perceived conflict directly with those involved, preferably in a real-time medium. 52 | 2. If this fails, get a third party (e.g. a mutual friend, and/or someone with background on the issue, but not involved in the conflict) to intercede. 53 | 3. If you are still unable to resolve the conflict, and you believe it rises to harassment or another code of conduct violation, report it. 54 | 55 | 56 | ## Reporting Violations 57 | 58 | Violations of the Code of Conduct can be reported to TensorFlow’s Project Stewards, Edd Wilder-James (ewj@google.com) and Sarah Novotny (sarahnovotny@google.com). The Project Steward will determine whether the Code of Conduct was violated, and will issue an appropriate sanction, possibly including a written warning or expulsion from the project, project sponsored spaces, or project forums. We ask that you make a good-faith effort to resolve your conflict via the conflict resolution policy before submitting a report. 59 | 60 | Violations of the Code of Conduct can occur in any setting, even those unrelated to the project. We will only consider complaints about conduct that has occurred within one year of the report. 61 | 62 | 63 | ## Enforcement 64 | 65 | If the Project Stewards receive a report alleging a violation of the Code of Conduct, the Project Stewards will notify the accused of the report, and provide them an opportunity to discuss the report before a sanction is issued. The Project Stewards will do their utmost to keep the reporter anonymous. If the act is ongoing (such as someone engaging in harassment), or involves a threat to anyone's safety (e.g. threats of violence), the Project Stewards may issue sanctions without notice. 66 | 67 | 68 | ## Attribution 69 | 70 | This Code of Conduct is adapted from the Contributor Covenant, version 1.4, available at https://contributor-covenant.org/version/1/4, and includes some aspects of the Geek Feminism Code of Conduct and the Drupal Code of Conduct. 71 | -------------------------------------------------------------------------------- /governance/SIGS.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Special Interest Groups (SIGs) 2 | 3 | ## What makes a good SIG? 4 | 5 | The ideal scope for a SIG addresses a well-defined domain, where the majority 6 | of participation comes from the community. Additionally, there should be 7 | sufficient evidence that there are community members willing to engage and 8 | contribute should the interest group be established. 9 | 10 | Not all SIGs will have the same level of energy, breadth of scope, or governance 11 | models, so we should expect some variability.
12 | 13 | ## Non-goals: What a SIG is not 14 | 15 | The intent of a SIG is to facilitate collaboration on shared work. A SIG is 16 | therefore: 17 | 18 | * **Not a support forum**: a mailing list and a SIG are not the same thing 19 | * **Not immediately required**: early on in a project's life, you may not know if you have shared work or collaborators 20 | * **Not free labor**: energy is required to grow and coordinate the work collaboratively. 21 | 22 | Our approach to SIG creation will be conservative: thanks to the ease of 23 | starting projects on GitHub, there are many avenues where collaboration can 24 | happen without the need for a SIG. 25 | 26 | ## SIG playbook 27 | 28 | ### Research and consultation 29 | 30 | Proposers of groups should gather evidence for approval, as specified below. 31 | Some possible avenues to consider are: 32 | 33 | * A well-defined problem or set of problems the group would solve 34 | * Consultation with community members who would benefit, assessing both the 35 | benefit and their willingness to commit 36 | * For existing projects, evidence from issues and PRs that contributors care 37 | about the topic 38 | * Potential goals for the group to achieve 39 | * Resource requirements of running the group 40 | 41 | Even if the need for a SIG seems self-evident, the research and consultation are 42 | still important to the success of the group. 43 | 44 | ### Creating the new group 45 | 46 | The new group should follow the process below for chartering. In particular, it 47 | must demonstrate: 48 | 49 | * A clear purpose and benefit to TensorFlow (either around a sub-project or 50 | application area) 51 | * Two or more contributors willing to act as group leads, existence of other 52 | contributors, and evidence of demand for the group 53 | * Resources it will initially require (usually, a mailing list and a regular VC 54 | call). 55 | 56 | Approval for the group will be given by a decision of the TF Community Team, 57 | defined as the maintainers of the _tensorflow/community_ project. The team 58 | will consult other stakeholders as necessary. 59 | 60 | Before entering the formal parts of the process, it is advisable to consult with 61 | the TensorFlow community team, *community-team@tensorflow.org*. It is highly 62 | likely that conversation and iteration will be required before the SIG request 63 | is ready. 64 | 65 | The formal request for the new group is done by submitting a charter as a PR to 66 | _tensorflow/community_, and including the request in the comments on the PR (see 67 | template below). On approval, the PR for the group will be merged and the 68 | required resources created. 69 | 70 | ### Template Request for New SIG 71 | 72 | This template will be available in the community repo: [SIG-request-template.md](SIG-request-template.md). 73 | 74 | ### Chartering 75 | 76 | Each group will be established with a charter, and be governed by the TensorFlow 77 | code of conduct. Archives of the group will be public. Membership may either be 78 | open to all without approval, or available on request, pending approval of the 79 | group administrator. 80 | 81 | The charter must nominate an administrator. As well as an administrator, the 82 | group must include at least one person as lead (these may be the same person), 83 | who will serve as point of contact for coordination as required with the TF 84 | community team. 85 | 86 | This charter will be posted initially to the group mailing list.
The _community_ 87 | repository in the TensorFlow GitHub organization will archive such documents and 88 | policies ([example from Kubernetes](https://github.com/kubernetes/community)). 89 | As any group evolves its practices and conventions, we expect it to document 90 | these within the relevant part of the community repository. 91 | 92 | ### Collaboration and inclusion 93 | 94 | While it is not mandated, the group should make use of scheduled conference 95 | calls or chat channels to conduct meetings. Any such 96 | meetings should be advertised on the mailing list, and notes posted to the 97 | mailing list afterwards. Regular meetings help drive accountability and progress 98 | in a SIG. 99 | 100 | TensorFlow community team members will proactively monitor the group and 101 | encourage discussion and action as appropriate. 102 | 103 | ### Launching 104 | 105 | Required activities: 106 | 107 | * Notifying TensorFlow general mailing lists (discuss@, developers ML) of the new group 108 | * Adding the SIG to the community pages on the TensorFlow website 109 | 110 | Optional activities: 111 | 112 | * Creating a blog post for the TensorFlow Medium.com blog community 113 | 114 | ### Health and termination of SIGs 115 | 116 | The TF community team will make a best effort to ensure the health of SIGs. From 117 | time to time it will request that the SIG lead provide a report of the SIG's work, 118 | which will be used to inform the broader TensorFlow community of the activity of 119 | the group. 120 | 121 | If a SIG no longer has a useful purpose or interested community, it may be 122 | archived and cease operation. The TF community team reserves the right to 123 | archive such inactive SIGs, in order to maintain the health of the project at 124 | large, though this is a less preferable outcome. A SIG may also opt to disband if 125 | it recognizes it has reached the end of its useful life. 126 | -------------------------------------------------------------------------------- /governance/TF-RFCs.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Request for Comments (TF-RFC) 2 | 3 | The purpose of a TensorFlow RFC is to engage the TensorFlow community in 4 | development, by getting feedback from stakeholders and experts, and 5 | communicating design changes broadly. 6 | 7 | ## Who is involved? 8 | 9 | Any **community member** may help by providing feedback on whether the RFC will 10 | meet their needs. 11 | 12 | An **RFC author** is a community member (or group of members) who writes an RFC and is 13 | committed to championing it through the process. 14 | 15 | An **RFC sponsor** is any maintainer who sponsors the RFC and will shepherd it 16 | through the RFC review process. 17 | 18 | A **review committee** is a group of maintainers who have the responsibility of 19 | recommending the adoption of the RFC. 20 | 21 | ## What is a TensorFlow RFC? 22 | 23 | An RFC is a document that describes a requirement and the proposed changes that 24 | will solve it. Specifically, the RFC will: 25 | 26 | * be formatted according to the RFC template 27 | * be submitted as a pull request to the 28 | [community/rfcs](https://github.com/tensorflow/community/tree/master/rfcs) directory 29 | * be subject to discussion and a review meeting prior to acceptance 30 | 31 | ## RFC process 32 | 33 | Before submitting an RFC, it is a good idea to discuss your aims with project 34 | contributors and maintainers and get early feedback.
Use the developer mailing 35 | list for the project concerned (developers@tensorflow.org, or the list for the 36 | relevant SIG). After writing the RFC draft, get feedback from these 37 | experts before submitting it. 38 | 39 | 1. Recruit a sponsor from the maintainers of the project that your RFC concerns. 40 | 41 | Identify them in the RFC, before posting the PR in step 2. 42 | If no sponsor is found, you may still post the RFC, but if 43 | within a month of posting the PR there is still no sponsor, 44 | it will be closed. 45 | 46 | 2. Submit your RFC as a pull request to community/rfcs. 47 | 48 | Name your RFC file using the [template](https://github.com/tensorflow/community/blob/master/rfcs/yyyymmdd-rfc-template.md) `YYYYMMDD-descriptive-name.md`, where 49 | YYYYMMDD is the date of submission, and ‘descriptive-name’ relates to the 50 | title of your RFC. For instance, if your RFC is titled “Parallel Widgets API”, 51 | you might use the filename `20180531-parallel-widgets.md`. If you have images 52 | or other auxiliary files, create a directory of the form `YYYYMMDD-descriptive-name` 53 | in which to store those files. 54 | 55 | Include the header table and the contents of the **Objective** section 56 | in the comment of your pull request, using Markdown. For an example, 57 | please see [this example 58 | RFC](https://github.com/tensorflow/community/pull/5). Mention 59 | the GitHub handles of any co-authors, reviewers, and sponsors. 60 | 61 | At the top of the PR identify how long the comment period will be. This 62 | should be a minimum of two weeks from posting the PR. 63 | 64 | 3. Email the developer mailing list with a brief description, a link to the 65 | PR, and a request for review. Follow the example of previous mailings, 66 | as you can see in [this 67 | example](https://groups.google.com/a/tensorflow.org/forum/#!topic/developers/PIChGLLnpTE). 68 | 69 | 4. The sponsor will request a review committee meeting, no sooner than two weeks 70 | after the RFC PR is posted. If discussion is lively, wait until it has 71 | settled before going to review. The goal of the review meeting is to resolve 72 | minor issues; consensus should be reached on major issues beforehand. 73 | 74 | 5. The meeting may approve the RFC, reject it, or require changes before it 75 | can be considered again. Approved RFCs will be merged into community/rfcs, and 76 | rejected RFCs will have their PRs closed. 77 | 78 | 6. Implementations of a successful RFC should reference it in their 79 | documentation, and work with the sponsor to successfully land the code. 80 | 81 | While implementation code is not necessary to start the RFC process, its 82 | existence in full or in part may help the design discussion. 83 | 84 | If in any doubt about this process, feel free to ask on the 85 | developers mailing list or file an issue in tensorflow/community. 86 | 87 | ## Community members 88 | 89 | As the purpose of RFCs is to ensure the community is well represented and served 90 | by new changes to TensorFlow, it is the responsibility of community members to 91 | participate in reviewing RFCs where they have an interest in the outcome.
92 | 93 | Community members should: 94 | 95 | * provide feedback as soon as possible to allow adequate time for consideration 96 | * read RFCs thoroughly before providing feedback 97 | * be civil and constructive 98 | 99 | ## Review committees 100 | 101 | The constitution of a review committee may change according to the particular 102 | governance style and leadership of each project. For core TensorFlow, the 103 | committee will consist of contributors to the TensorFlow project, who have 104 | expertise in the domain area concerned. 105 | 106 | Review committees must: 107 | 108 | * ensure that substantive items of public feedback have been accounted for 109 | * add their meeting notes as comments to the PR 110 | * provide reasons for their decisions 111 | 112 | If a review committee requires changes before acceptance, it is the 113 | responsibility of the sponsor to ensure these are made and seek subsequent 114 | approval from the committee members. 115 | 116 | ## RFC sponsors 117 | 118 | A sponsor is a project maintainer responsible for ensuring the best possible 119 | outcome of the RFC process. In particular this includes: 120 | 121 | * advocating for the proposed design 122 | * guiding the RFC to adhere to existing design and style conventions 123 | * guiding the review committee to come to a productive consensus 124 | * if the RFC moves to implementation: 125 | * ensuring the proposed implementation adheres to the design 126 | * liaising with appropriate parties to successfully land the implementation 127 | 128 | ## Keeping the bar high 129 | 130 | While we encourage and celebrate every contributor, the bar for RFC acceptance 131 | should be kept intentionally high. A design may be rejected or need significant 132 | revision at any one of these stages: 133 | 134 | * initial design conversations on the relevant mailing list 135 | * failure to recruit a sponsor 136 | * critical objections during the feedback phase 137 | * failure to achieve consensus during the design review 138 | * concerns raised during implementation (e.g., inability to achieve backwards 139 | compatibility, concerns about maintenance appearing once a partial implementation 140 | is available) 141 | 142 | If this process is functioning well, RFCs are expected to fail in the earlier, 143 | rather than later, stages. 144 | 145 | An approved RFC is no guarantee of a commitment to implement, and acceptance of 146 | a proposed RFC implementation is still subject to the usual code review 147 | process. 148 | 149 | ## RFC Template 150 | 151 | Use the template [from 152 | GitHub](https://github.com/tensorflow/community/blob/master/rfcs/yyyymmdd-rfc-template.md), 153 | being sure to follow the naming conventions described above. 154 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing guidelines 2 | 3 | ## How to become a contributor and submit your own code 4 | 5 | ### Contributor License Agreements 6 | 7 | We'd love to accept your patches! Before we can take them, we have to jump a couple of legal hurdles. 8 | 9 | Please fill out either the individual or corporate Contributor License Agreement (CLA). 10 | 11 | * If you are an individual writing original source code and you're sure you own the intellectual property, then you'll need to sign an [individual CLA](https://code.google.com/legal/individual-cla-v1.0.html).
12 | * If you work for a company that wants to allow you to contribute your work, then you'll need to sign a [corporate CLA](https://code.google.com/legal/corporate-cla-v1.0.html). 13 | 14 | Follow either of the two links above to access the appropriate CLA and instructions for how to sign and return it. Once we receive it, we'll be able to accept your pull requests. 15 | 16 | ***NOTE***: Only original source code from you and other people who have signed the CLA can be accepted into the main repository. 17 | 18 | ### Contributing code 19 | 20 | If you have improvements to TensorFlow, send us your pull requests! For those 21 | just getting started, GitHub has a [howto](https://help.github.com/articles/using-pull-requests/). 22 | 23 | TensorFlow team members will be assigned to review your pull requests. Once the pull requests are approved and pass continuous integration checks, we will merge the pull requests. 24 | For some pull requests, we will apply the patch for each pull request to our internal version control system first, and export the change out as a new commit later, at which point the original pull request will be closed. The commits in the pull request will be squashed into a single commit with the pull request creator as the author. These pull requests will be labeled as pending merge internally. 25 | 26 | If you want to contribute but you're not sure where to start, take a look at the 27 | [issues with the "contributions welcome" label](https://github.com/tensorflow/tensorflow/labels/stat%3Acontributions%20welcome). 28 | These are issues that we believe are particularly well suited for outside 29 | contributions, often because we probably won't get to them right now. If you 30 | decide to start on an issue, leave a comment so that other people know that 31 | you're working on it. If you want to help but would rather not work alone, use the issue 32 | comment thread to coordinate. 33 | 34 | ### Contribution guidelines and standards 35 | 36 | Before sending your pull request for 37 | [review](https://github.com/tensorflow/tensorflow/pulls), 38 | make sure your changes are consistent with the guidelines and follow the 39 | TensorFlow coding style. 40 | 41 | #### General guidelines and philosophy for contribution 42 | 43 | * Include unit tests when you contribute new features, as they help to 44 | a) prove that your code works correctly, and b) guard against future breaking 45 | changes to lower the maintenance cost. 46 | * Bug fixes also generally require unit tests, because the presence of bugs 47 | usually indicates insufficient test coverage. 48 | * Keep API compatibility in mind when you change code in core TensorFlow, 49 | e.g., code in [tensorflow/core](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core) and [tensorflow/python](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/python). 50 | TensorFlow has reached version 1 and hence cannot make 51 | non-backward-compatible API changes without a major release. Reviewers of your 52 | pull request will comment on any API compatibility issues. 53 | * When you contribute a new feature to TensorFlow, the maintenance burden is (by 54 | default) transferred to the TensorFlow team. This means that the benefit of the 55 | contribution must be compared against the cost of maintaining the feature.
56 | * Full new features (e.g., a new op implementing a cutting-edge algorithm) 57 | typically will live in 58 | [tensorflow/contrib](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib) 59 | to get some airtime before a decision is made regarding whether they are to be 60 | migrated to the core. 61 | 62 | #### License 63 | 64 | Include a license at the top of new files. 65 | 66 | * [C/C++ license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op.cc#L1) 67 | * [Python license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/nn.py#L1) 68 | * [Java license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/Graph.java#L1) 69 | * [Go license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/go/operation.go#L1) 70 | * [Bash license example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/ci_build/ci_sanity.sh#L2) 71 | * [HTML license example](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/components/tf_backend/tf-backend.html#L2) 72 | * [JavaScript/TypeScript license example](https://github.com/tensorflow/tensorboard/blob/master/tensorboard/components/tf_backend/backend.ts#L1) 73 | 74 | Bazel BUILD files also need to include a license section, e.g., 75 | [BUILD example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/BUILD#L61). 76 | 77 | #### C++ coding style 78 | 79 | Changes to TensorFlow C++ code should conform to the 80 | [Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html). 81 | 82 | Use `clang-tidy` to check your C/C++ changes. To install clang-tidy on ubuntu:16.04, do: 83 | 84 | ```bash 85 | apt-get install -y clang-tidy 86 | ``` 87 | 88 | You can check a C/C++ file by doing: 89 | 90 | 91 | ```bash 92 | clang-format --style=google <my_cc_file> > /tmp/my_cc_file.cc 93 | diff <my_cc_file> /tmp/my_cc_file.cc 94 | ``` 95 | 96 | #### Python coding style 97 | 98 | Changes to TensorFlow Python code should conform to the 99 | [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html). 100 | 101 | Use `pylint` to check your Python changes. To install `pylint` and 102 | retrieve TensorFlow's custom style definition: 103 | 104 | ```bash 105 | pip install pylint 106 | wget -O /tmp/pylintrc https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/tools/ci_build/pylintrc 107 | ``` 108 | 109 | To check a file with `pylint`: 110 | 111 | ```bash 112 | pylint --rcfile=/tmp/pylintrc myfile.py 113 | ``` 114 | 115 | #### Coding style for other languages 116 | 117 | * [Google Java Style Guide](https://google.github.io/styleguide/javaguide.html) 118 | * [Google JavaScript Style Guide](https://google.github.io/styleguide/jsguide.html) 119 | * [Google Shell Style Guide](https://google.github.io/styleguide/shell.xml) 120 | * [Google Objective-C Style Guide](https://google.github.io/styleguide/objcguide.html) 121 | 122 | #### Running sanity check 123 | 124 | If you have Docker installed on your system, you can perform a sanity check on 125 | your changes by running the command: 126 | 127 | ```bash 128 | tensorflow/tools/ci_build/ci_build.sh CPU tensorflow/tools/ci_build/ci_sanity.sh 129 | ``` 130 | 131 | This will catch most license, Python coding style, and BUILD file issues that 132 | may exist in your changes. 133 | 134 | #### Running unit tests 135 | 136 | There are two ways to run TensorFlow unit tests. 137 | 138 | 1.
Using tools and libraries installed directly on your system. 139 | 140 | Refer to the 141 | [CPU-only developer Dockerfile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel) and 142 | [GPU developer Dockerfile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-gpu) 143 | for the required packages. Alternatively, use the aforementioned 144 | [Docker images](https://hub.docker.com/r/tensorflow/tensorflow/tags/), e.g., 145 | `tensorflow/tensorflow:nightly-devel` and `tensorflow/tensorflow:nightly-devel-gpu` 146 | for development to avoid installing the packages directly on your system. 147 | 148 | Once you have the packages installed, you can run a specific unit test with 149 | bazel as follows. 150 | 151 | If the tests are to be run on GPU, add CUDA paths to LD_LIBRARY_PATH and add 152 | the `cuda` option flag: 153 | 154 | ```bash 155 | export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64" 156 | 157 | export flags="--config=opt --config=cuda -k" 158 | ``` 159 | 160 | For example, to run all tests under tensorflow/python, do: 161 | 162 | ```bash 163 | bazel test ${flags} //tensorflow/python/... 164 | ``` 165 | 166 | 2. Using [Docker](https://www.docker.com) and TensorFlow's CI scripts. 167 | 168 | ```bash 169 | # Install Docker first, then this will build and run cpu tests 170 | tensorflow/tools/ci_build/ci_build.sh CPU bazel test //tensorflow/... 171 | ``` 172 | 173 | See 174 | [TensorFlow Builds](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/ci_build) for details. 175 | 176 | -------------------------------------------------------------------------------- /rfcs/20180604-dynamic-kernels.md: -------------------------------------------------------------------------------- 1 | # Dynamic Loading of Kernels in TensorFlow 2 | | Status | Accepted | 3 | :-------------- | :-------------------------------------------------| 4 | | **Author(s)** | Gunhan Gulsoy (Google) | 5 | | **Sponsor** | Martin Wicke (Google) | 6 | | **Updated** | 2018-06-04 | 7 | 8 | ## Objective 9 | This document describes a new way to create and deploy new kernels for 10 | TensorFlow. We propose deploying kernels in separate shared libraries (dso, 11 | dylib or dll) and loading these at runtime. While at the moment the scope of 12 | this document only covers the **TensorFlow Python distribution**, we aim to 13 | generalize this approach for all TF distributions. With this mechanism, we 14 | would like to create the following capabilities: 15 | * Loading kernels dynamically at runtime from shared libraries. 16 | * Being able to load multiple kernels for the same op/device pair, and pick the 17 | best one in terms of hardware compatibility and performance. 18 | * Checking the hardware and loading the compatible kernels. 19 | * Checking the compiler options used and loading the compatible kernels. 20 | 21 | ## Overview 22 | For an Op, we need three pieces: 23 | * Python bindings, to make them accessible in the Python API 24 | * C++ op implementation 25 | * C++ Kernel implementation(s) 26 | 27 | This document proposes a new way to deploy and load **kernels**. 28 | 29 | In the current mechanism, the only constraint is that Python bindings have to be 30 | executed/loaded after the C++ op implementation is loaded. Kernels can be loaded at 31 | any time. This makes our task easier. When a kernel is loaded, it registers 32 | itself in the global registry with a string key.
The string key is constructed 33 | as follows: `op_name:device_name:(optional)label` 34 | 35 | To start this project off, what we propose is the following: 36 | * Create a new API, `tf.load_kernel_library` 37 | * Use the new API to load kernels from a different shared object. 38 | 39 | Then, we will start to build checks to be more picky about the kernels we load: 40 | * Build handling for loading multiple kernels for the same op and device pair. 41 | * Enhance the global kernel registry to allow cleanup of registered kernels when a 42 | library is unloaded. 43 | * Build the library compatibility checking mechanism, and unload libraries when 44 | they are found to be incompatible. 45 | 46 | Finally, we will add the following advanced checks: 47 | * Keep track of which libraries provide which kernels 48 | * Garbage collection of unqualified kernels, and their libraries. 49 | 50 | ## Detailed Current State 51 | While this document proposes a new way to **load kernels**, there are a lot of 52 | ideas we would like to adopt from the way ops are loaded. Therefore, the current 53 | op loading mechanism is also described in this section. 54 | 55 | ### Op loading 56 | Currently, we can load op libraries from shared objects. When loading custom or 57 | contrib ops, we also load their kernels. The following pseudocode describes how 58 | the current custom/contrib op loading mechanism works: 59 | * Custom contrib op Python bindings are not loaded until they are accessed. 60 | * At the first access, the `__init__` file of the custom op module calls `tf.load_op_library` 61 | * `load_op_library` loads the shared object using `TF_LoadLibrary` in the C API 62 | * Once the shared object is loaded, `load_op_library` now executes and loads the rest of the Python code in the op library. 63 | 64 | Now, diving deep into `TF_LoadLibrary`: 65 | * `TF_LoadLibrary` is called. This is just a thin wrapper and status checker around `tensorflow::LoadLibrary` 66 | * `tensorflow::LoadLibrary` checks first if this shared object is already loaded 67 | * In a serial way, making sure only one library is processed at a time: 68 | * It starts a watcher for `OpRegistry`, to get a list of ops included in the library 69 | * Try loading the library using `Environment::LoadLibrary` 70 | * Which just calls `tensorflow::internal::LoadLibrary` 71 | * Which is essentially just `dlopen`. 72 | 73 | ### Kernel loading 74 | Currently, the kernel loading mechanism is simpler than the op loading mechanism, at least at loading time. The mechanism can be summarized as follows: 75 | * Kernels use the `REGISTER_KERNEL_BUILDER` macro to create a static initializer 76 | * The static initializer is just an object of type `OpKernelRegistrar` 77 | * Which calls `OpKernelRegistrar::InitInternal` 78 | * Which saves the kernel in the `GlobalKernelRegistry`, with a factory method. 79 | * The kernel is read from the registry and instantiated when the op is executed. 80 | 81 | ## Design 82 | Here we will describe the details of the work we plan to perform. The work will be divided into four milestones: 83 | 84 | ### Milestone 1: Load kernels from shared objects 85 | This phase will just be a simple proof of concept, to show that loading kernels 86 | from shared objects will work. The deliverables of this phase are: 87 | 1. The `tf.load_kernel_library` API. This new method on our API will be responsible 88 | for loading kernels from given shared objects, or folders containing shared 89 | objects.
It will: 90 | * Load the given shared object, if it is an `.so` file 91 | * If a folder is given, load all `libtfkernel-*` shared object files in the folder 92 | 2. Split one or more kernels into a different shared object. This will involve: 93 | * Resolving the `BUILD` dependency mess to be able to create a reasonably small 94 | shared object for a kernel (size will be optimized later). 95 | * Resolving all symbol collisions stemming from the different shared objects, 96 | with both potentially depending on the core TF framework. 97 | * Finally, on the Python side of the op whose kernel is being split out, adding 98 | the directive: `tf.load_kernel_library("libtfkernel_kernel_name.so")` 99 | 3. Get a Bazel test to pass with a split kernel library 100 | 4. Get a working Python wheel file with a split kernel library, and run the 101 | kernel from the shared object. 102 | To simplify the proof of concept, at this stage we will only do this on Linux. 103 | 104 | ### Milestone 2: Enable kernel compatibility checks 105 | Once the proof of concept is ready, we need to start building the fancier 106 | features of the proposal. These will be: 107 | 1. Create a mechanism to save the compiler options from the Bazel side, and make 108 | them available to read in the C++ runtime. 109 | 2. Create a mechanism in addition to `KernelDef` to be stored in the 110 | `GlobalKernelRegistry` to help decide which kernels should be loaded. The 111 | following is the data structure we propose for this information: 112 | ```c 113 | /* Shared-library dependency (name and version). */ 114 | typedef struct TF_DsoDef { 115 | const char* name; 116 | const char* version; 117 | } TF_DsoDef; 118 | 119 | /* Hardware the kernel was built for. */ 120 | typedef struct TF_HardwareDef { 121 | const char** SIMD_ISA; // Or enum 122 | int SIMD_ISA_length; 123 | char* cpu_arch; 124 | const char** accelerator; 125 | int accelerator_length; 126 | } TF_HardwareDef; 127 | 128 | /* Compiler and options the kernel was built with. */ 129 | typedef struct TF_CompilerDef { 130 | const char* compiler; 131 | const char* compiler_version; 132 | const char** compiler_options; 133 | int compiler_options_length; 134 | int memory_alignment; 135 | } TF_CompilerDef; 136 | 137 | /* Source revision the kernel was built from. */ 138 | typedef struct TF_SourceDef { 139 | const char* git_hash; 140 | } TF_SourceDef; 141 | 142 | typedef struct TF_KernelBuildInfo { 143 | TF_DsoDef* dependencies; 144 | int dependencies_list_size; 145 | 146 | TF_SourceDef source_version; 147 | TF_HardwareDef hardware_def; 148 | TF_CompilerDef compiler_def; 149 | } TF_KernelBuildInfo; 150 | ``` 151 | 3. Create methods to extract all the above information from the core runtime, 152 | to check for compatibility with any given kernel library. 153 | 4. During kernel registration, implement checks for the following: 154 | * Is this kernel compatible with the given hardware 155 | * Is this kernel compatible with the software available on the system 156 | * Is this kernel ABI compatible with the core runtime 157 | * Is this kernel faster than any other kernels that are loaded. In this context, faster means one of the following: 158 | * Better optimized for the hardware 159 | * Uses a special acceleration library such as MKL 160 | 5. Provide a means to override some of the above checks for loading experimental kernels 161 | 6. Expand the global kernel registry to be functionally similar to the op registry. The op registry can unregister ops if there are any problems during object loading; the kernel registry should be able to do the same. 162 | 163 | ### Milestone 3: Make it work on different OSs 164 | While the above will be done on Linux, we will have to get things to work on all operating systems we support. For macOS, the issues are mainly around Bazel bugs.
For Windows, we will have to be more careful about symbol collisions, and a partial lockdown of symbol exports may be required to get things working. 165 | 166 | ### Milestone 4: Memory and performance optimizations 167 | When we load multiple shared objects, we can easily have some bloat in memory 168 | usage, or performance hits. The simplest things we can foresee are: 169 | 1. Multiple kernel registry entries that are retained when multiple kernels for 170 | the same op and device pair are loaded. 171 | 2. Some shared objects may only include slow kernels, and they may just be 172 | included in the distribution for compatibility. We can unload shared objects 173 | from memory if none of the kernels in them are useful. 174 | 3. Minimize the total size of the shared libraries created. Currently, the TF 175 | framework is one big monolithic build rule that everyone ends up depending on. 176 | Try to slim down the kernels, and get them to a size that makes sense to be 177 | included in TF Lite packages. 178 | 4. Make sure there are only kernels in the given shared object. Error out if 179 | someone sneaks ops into kernel libraries. 180 | 181 | ## Alternatives considered 182 | A number of alternatives have been considered before deciding on this route: 183 | 1. Create and distribute the whole package with different compiler options. 184 | While this is the path of least resistance, the monolithic package that needs 185 | to be tested fully on different hardware and compiler options is becoming 186 | unmanageable. The simplest example: we have a lot of code that needs to be 187 | tested with GPU compilers only once, but we end up having to run similar tests 188 | with 5+ different compiler options. Such issues drive up our testing costs in 189 | terms of both resources and developer time. 190 | 2. Splitting kernels into different binaries rather than different shared 191 | objects. While this will protect us from symbol collisions, ODR violations, or 192 | other classical headaches that plague shared objects, this will make things 193 | slower. Also, we would need to implement shared memory pages to share data 194 | across different processes, which will incur a similar engineering cost to the 195 | proposed approach. Therefore, we decided on using shared libraries instead. 196 | -------------------------------------------------------------------------------- /rfcs/20180507-cond-v2.md: -------------------------------------------------------------------------------- 1 | # **"Functional"** **cond design doc** 2 | 3 | | Status | Approved | 4 | :-------------- |:---------------------------------------------------- | 5 | | **Author(s)** | Skye Wanderman-Milne (skyewm@gmail.com) | 6 | | **Created** | 2018-05-07 | 7 | | **Updated** | 2018-08-22 | 8 | 9 | ## Objective 10 | 11 | **Switch tf.cond to emit a single If op.** 12 | 13 | We can do tf.while_loop next. 14 | 15 | This would make mapping to XLA's control flow constructs easier/possible. In particular, just switching to the If op would be a big win (more work is needed to get cond working with XLA than for while_loop, which has already had a lot of work done), and easier than while_loop. It will also make debugging and analysis of cond constructs much simpler, e.g. to implement higher-order derivatives. 16 | 17 | Note that cond will still support side-effecting ops (e.g. variable updates).
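For reference, here is a minimal sketch of the user-facing API, which this proposal leaves unchanged (the values are illustrative only). Today this call emits a `Switch`/`Merge` subgraph; under this proposal it would emit a single `If` op:

```python
import tensorflow as tf

x = tf.constant(2.0)
y = tf.constant(5.0)

# Both branches are argument-free callables that close over their inputs
# and must return the same number and type of tensors.
z = tf.cond(tf.less(x, y),
            lambda: tf.multiply(x, 17.0),  # taken when x < y
            lambda: tf.add(y, 23.0))       # taken otherwise
```

The user-facing signature stays the same; only the emitted graph representation differs.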
18 |  19 |  20 | ## Background material 21 | 22 | tf.cond API: https://www.tensorflow.org/api_docs/python/tf/cond 23 | 24 | If op: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/functional_ops.cc#L104 25 | 26 | Overview of current control flow implementation: [Implementation of Control Flow in TensorFlow](http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf) 27 | 28 | 29 | ## Design overview 30 | 31 | 32 | ### Functional tf.cond 33 | 34 | The signature of `tf.cond` will stay the same: a boolean predicate Tensor and Python callables for the two branches. The two callables each take no arguments (they instead close over any input tensors), and are required to return the same number and type of tensors. 35 | 36 | We need to convert this to the If op signature, which is a boolean predicate and FunctionDefs for the two branches. The FunctionDefs are required to have the same number and type of inputs and outputs. Luckily, tfe.defun already gives us the machinery to convert the Python callables into FunctionDefs, including converting closures to inputs and adding extra inputs to make the branch signatures match. This is done via an overloaded Graph subclass, [FuncGraph](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/eager/function.py#L191), which gives us the full flexibility of graphs while creating the branch functions. 37 | 38 | This conversion results in a single If op representing the `tf.cond`. 39 | 40 | 41 | ### Gradients 42 | 43 | The gradient of an If op is another If op. The predicate is the same as the forward op's, and each branch function is the gradient function of the corresponding forward branch. 44 | 45 | This requires the gradient branch functions to access intermediate tensors of the forward branch functions. Internal tensors in a function can't be directly accessed, so we need to add the necessary intermediates as outputs to the forward If op (how to do this is discussed in the "Implementation challenges" section). 46 | 47 | 48 | ### Execution 49 | 50 | There are two choices for running the resulting If ops: 51 | 52 | 53 | 54 | 1. Use the `IfOp` kernel as-is, which runs the functions using `FunctionLibraryRuntime`. 55 | 1. "Lower" the If ops to the current `tf.cond` implementation (i.e. `Switch` and `Merge` nodes). 56 | 57 | (1) is simpler at a high level, but (2) will avoid some of the implementation challenges below. 58 | 59 | The lowering can be implemented as an early (pre-placement) optimization pass, in order for the lowered control flow to be placed, pruned, partitioned, etc. as usual. There are already a few examples of similar passes: ParallelConcatRemovePass and AccumulateNV2RemovePass. 60 | 61 | **Update**: this is done: [LowerIfOpPass](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/lower_if_op.h) 62 | 63 | We don't want to lower If ops that will eventually be consumed by the [XLA encapsulation pass](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/jit/jit_compilation_pass_registration.cc#L35), so the TF-XLA bridge can take advantage of the easy-to-convert functional representation. This can be achieved by setting an attribute on the If op indicating whether it should be lowered, determined by, e.g., whether the If op is in an `XLAContext`. This may prove useful for other future use cases as well, such as transitioning to using the functional representation in the main TF runtime.
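To make the gradient construction described above concrete, here is a minimal sketch (the variable names are illustrative). Under this proposal, the backward pass for `y` would itself be a single `If` op with the same predicate, whose branches are the gradient functions of the forward branches:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32)

y = tf.cond(tf.less(x, 0.0),
            lambda: tf.square(x),  # dy/dx = 2x on this branch
            lambda: tf.exp(x))     # dy/dx = exp(x) on this branch

# The backward pass is itself a conditional with the same predicate; its
# branches need intermediates of the taken forward branch (here x and exp(x)),
# which motivates the first implementation challenge below.
grad = tf.gradients(y, x)
```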
64 |  65 |  66 | ## Implementation challenges 67 | 68 | 69 | ### Exposing intermediate tensors to gradient functions 70 | 71 | See the "Gradients" section. We somehow need to add intermediate tensors as outputs to the already-created forward-pass If op and its branch functions. Options: 72 | 73 | 74 | 75 | 1. Create a new If op with the required outputs. To prevent running both the original and new ops, we need to rewire the outputs of the original op to use the new op (and ideally modify any existing Tensor objects as well). 76 | 1. Modify the existing If op in-place. This involves either modifying or replacing the branch functions, and changing the outputs of the op (tricky, but probably doable). 77 | 78 | Note that both of these options require mutating existing graph elements. If the graph has already been run, **this will invalidate any existing Sessions!** Other options: 79 | 80 | 81 | 82 | 1. Use placeholders for intermediates during construction, then use a C++ rewrite (Grappler or GraphOptimizationPass) to rewire the graph. 83 | 1. Output every possible intermediate. 84 | 1. It might already work as-is. 85 | 1. Except for ExtendGraph -- a solution could be to make the C API and Session share the same Graph* 86 | 87 | **Update**: we went with (2), outputting every possible intermediate 88 | 89 | 90 | ### Making branch function outputs match 91 | 92 | After adding the intermediate outputs to the forward If op's branch functions, it's likely the two functions don't have the same output signature anymore. For each new output of each branch, we need to add an extra output tensor to the other branch to mirror it (since the If op requires that the two output signatures match). 93 | 94 | Note that the "mirror" tensors never need to be read. The original output is only consumed by the corresponding gradient function, which is only executed if the original output's branch is taken. Thus, if the mirror tensor is produced, no consumer of it will be run. However, without pruning and/or non-strict execution, the If op must still produce some value for the mirror tensor. 95 | 96 | _Solution:_ 97 | 98 | Introduce a special op to output mirror tensors. This op's shape inference function will claim to output the same shape and type as the mirrored output, but since the tensor isn't actually needed the kernel will produce some small value to avoid producing large unnecessary values. If/when the op doesn't need to produce a value (e.g. via lowering + pruning), the kernel can CHECK or similar. 99 | 100 | 101 | ### Taking the gradient of deserialized If ops 102 | 103 | We need a graph representing the branch function of an If op in order to take its gradient. We already have a graph as part of creating the function, but if the graph was loaded from a GraphDef, we no longer have this graph. Options: 104 | 105 | 106 | 107 | 1. FunctionDef → Graph method 108 | 109 | 110 | ### Variable initialization 111 | 112 | Variables created in the `cond` input callables must be created in the main graph, not in the temporary `FuncGraphs`. Luckily this is already handled by [init_scope](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/ops.py#L5230), which should already be used as necessary to handle creating variables in Defuns, etc. 113 | 114 | 115 | ### Collections 116 | 117 | We must support reading and writing to collections in the `cond` input callables.
118 | 119 | Reading from collections in eager-mode defuns already works by copying the collections into the `FuncGraphs`, which should presumably work here as well. 120 | 121 | For writing, we'll have to forward or copy the values back to the original collections. This is tricky and poorly-defined for Tensor and Operation values, and possibly intractable for data structures containing graph elements (e.g. `WhileContext`). Options: 126 | 127 | 128 | 129 | 1. Collections are supposed to go away in TF 2.0 130 | 1. Somehow turn Tensors into function outputs 131 | 1. Can some tensors/operations be pulled out of the function? 132 | 1. Expose "legacy cond" in contrib, eventually deprecate. 133 | 134 | **Writing to collections requires more investigation.** 135 | 136 | For example, how are people using collections within `cond` branches? How do they avoid dead Tensors? 137 | 138 | 139 | ### Name/device/colocation scope 140 | 141 | Similar to reading collections, any graph-wide stacks and other state can be copied into the `FuncGraphs`. New scopes can then be added within the FuncGraph, and the semantics prevent any added state from persisting beyond the input callable. 144 | 145 | For colocation, we can possibly use external tensor names as-is, since they'll either be lowered into the main graph or compiled by XLA. 147 | 148 | 149 | ### Control dependencies 150 | 151 | If the `tf.cond` call occurs inside a control_dependencies block, the control inputs will be added directly to the resulting If op. 153 | 154 | If the `cond` input callables contain control_dependencies blocks referring to external tensors, we can create Identity nodes of the external tensors inside the function definition, and then create internal control edges (functions only have data inputs). 157 | 158 | _The following concerns are avoided by lowering If ops before execution (see "Execution" section):_ 160 | 161 | 162 | ### Devices 163 | 164 | Akshay is working on allowing functions to run across multiple devices. My understanding is that it's mostly working, with a few limitations (e.g. all arguments to the function must go through the caller device, colocation with external tensors doesn't work). 168 | 169 | 170 | ### Partial evaluation 171 | 172 | TF graphs are pruned before execution, meaning only the subgraph needed to compute the requested output tensors is run (this doesn't work completely for ops in a conditional branch, but some pruning still occurs). This is not currently possible with TF functions; the entire function is run regardless of which outputs are needed. This would need to be supported for parity with the current `cond` implementation. 179 | 180 | 181 | ### Non-strict execution 182 | 183 | The current `cond` implementation allows each op in the taken branch to be run as soon as its inputs are ready, even if other ops in the branch aren't ready yet ("non-strict" execution). However, each TF op kernel will only begin running once its inputs are all ready ("strict" execution), with `Merge` nodes being the only exception. If we replace the current `cond` construct with a single op, this will switch `cond` to strict execution. We would need to support non-strict execution of If ops and their branch functions. 190 | 191 | 192 | ## Future work 193 | 194 | **tf.while_loop**. This effort will solve most of the problems with switching to a functional While representation (or a recursive function representation?). The remaining challenges are inserting stacks for the gradients, and supporting parallel iterations.
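As a point of reference for the scale of that work, here is a minimal `tf.while_loop` (the values are illustrative only). A functional `While` representation would carry `cond` and `body` as functions, and its gradient would need stacks to preserve each iteration's intermediates:

```python
import tensorflow as tf

# cond and body are callables over the loop variables, mirroring the
# branch callables of tf.cond; a functional While op would carry them
# as FunctionDefs.
_, acc = tf.while_loop(
    cond=lambda i, acc: tf.less(i, 10),
    body=lambda i, acc: (i + 1, acc * 2.0),
    loop_vars=(tf.constant(0), tf.constant(1.0)))
```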
169 | 170 | **C API support.** Ideally other language bindings support conditional execution as well. The C API already includes the primitives for other bindings to implement something similar to `tf.cond` that produces an `If` op, but the C API `TF_AddGradients` method would need to support `If` ops in order for other bindings to (easily) allow autodiff of conditionals. 171 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright 2018 The TensorFlow Authors. All rights reserved. 2 | 3 | Apache License 4 | Version 2.0, January 2004 5 | http://www.apache.org/licenses/ 6 | 7 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 8 | 9 | 1. Definitions. 10 | 11 | "License" shall mean the terms and conditions for use, reproduction, 12 | and distribution as defined by Sections 1 through 9 of this document. 13 | 14 | "Licensor" shall mean the copyright owner or entity authorized by 15 | the copyright owner that is granting the License. 16 | 17 | "Legal Entity" shall mean the union of the acting entity and all 18 | other entities that control, are controlled by, or are under common 19 | control with that entity. For the purposes of this definition, 20 | "control" means (i) the power, direct or indirect, to cause the 21 | direction or management of such entity, whether by contract or 22 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 23 | outstanding shares, or (iii) beneficial ownership of such entity. 24 | 25 | "You" (or "Your") shall mean an individual or Legal Entity 26 | exercising permissions granted by this License. 27 | 28 | "Source" form shall mean the preferred form for making modifications, 29 | including but not limited to software source code, documentation 30 | source, and configuration files. 31 | 32 | "Object" form shall mean any form resulting from mechanical 33 | transformation or translation of a Source form, including but 34 | not limited to compiled object code, generated documentation, 35 | and conversions to other media types. 36 | 37 | "Work" shall mean the work of authorship, whether in Source or 38 | Object form, made available under the License, as indicated by a 39 | copyright notice that is included in or attached to the work 40 | (an example is provided in the Appendix below). 41 | 42 | "Derivative Works" shall mean any work, whether in Source or Object 43 | form, that is based on (or derived from) the Work and for which the 44 | editorial revisions, annotations, elaborations, or other modifications 45 | represent, as a whole, an original work of authorship. For the purposes 46 | of this License, Derivative Works shall not include works that remain 47 | separable from, or merely link (or bind by name) to the interfaces of, 48 | the Work and Derivative Works thereof. 49 | 50 | "Contribution" shall mean any work of authorship, including 51 | the original version of the Work and any modifications or additions 52 | to that Work or Derivative Works thereof, that is intentionally 53 | submitted to Licensor for inclusion in the Work by the copyright owner 54 | or by an individual or Legal Entity authorized to submit on behalf of 55 | the copyright owner. 
For the purposes of this definition, "submitted" 56 | means any form of electronic, verbal, or written communication sent 57 | to the Licensor or its representatives, including but not limited to 58 | communication on electronic mailing lists, source code control systems, 59 | and issue tracking systems that are managed by, or on behalf of, the 60 | Licensor for the purpose of discussing and improving the Work, but 61 | excluding communication that is conspicuously marked or otherwise 62 | designated in writing by the copyright owner as "Not a Contribution." 63 | 64 | "Contributor" shall mean Licensor and any individual or Legal Entity 65 | on behalf of whom a Contribution has been received by Licensor and 66 | subsequently incorporated within the Work. 67 | 68 | 2. Grant of Copyright License. Subject to the terms and conditions of 69 | this License, each Contributor hereby grants to You a perpetual, 70 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 71 | copyright license to reproduce, prepare Derivative Works of, 72 | publicly display, publicly perform, sublicense, and distribute the 73 | Work and such Derivative Works in Source or Object form. 74 | 75 | 3. Grant of Patent License. Subject to the terms and conditions of 76 | this License, each Contributor hereby grants to You a perpetual, 77 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 78 | (except as stated in this section) patent license to make, have made, 79 | use, offer to sell, sell, import, and otherwise transfer the Work, 80 | where such license applies only to those patent claims licensable 81 | by such Contributor that are necessarily infringed by their 82 | Contribution(s) alone or by combination of their Contribution(s) 83 | with the Work to which such Contribution(s) was submitted. If You 84 | institute patent litigation against any entity (including a 85 | cross-claim or counterclaim in a lawsuit) alleging that the Work 86 | or a Contribution incorporated within the Work constitutes direct 87 | or contributory patent infringement, then any patent licenses 88 | granted to You under this License for that Work shall terminate 89 | as of the date such litigation is filed. 90 | 91 | 4. Redistribution. 
You may reproduce and distribute copies of the 92 | Work or Derivative Works thereof in any medium, with or without 93 | modifications, and in Source or Object form, provided that You 94 | meet the following conditions: 95 | 96 | (a) You must give any other recipients of the Work or 97 | Derivative Works a copy of this License; and 98 | 99 | (b) You must cause any modified files to carry prominent notices 100 | stating that You changed the files; and 101 | 102 | (c) You must retain, in the Source form of any Derivative Works 103 | that You distribute, all copyright, patent, trademark, and 104 | attribution notices from the Source form of the Work, 105 | excluding those notices that do not pertain to any part of 106 | the Derivative Works; and 107 | 108 | (d) If the Work includes a "NOTICE" text file as part of its 109 | distribution, then any Derivative Works that You distribute must 110 | include a readable copy of the attribution notices contained 111 | within such NOTICE file, excluding those notices that do not 112 | pertain to any part of the Derivative Works, in at least one 113 | of the following places: within a NOTICE text file distributed 114 | as part of the Derivative Works; within the Source form or 115 | documentation, if provided along with the Derivative Works; or, 116 | within a display generated by the Derivative Works, if and 117 | wherever such third-party notices normally appear. The contents 118 | of the NOTICE file are for informational purposes only and 119 | do not modify the License. You may add Your own attribution 120 | notices within Derivative Works that You distribute, alongside 121 | or as an addendum to the NOTICE text from the Work, provided 122 | that such additional attribution notices cannot be construed 123 | as modifying the License. 124 | 125 | You may add Your own copyright statement to Your modifications and 126 | may provide additional or different license terms and conditions 127 | for use, reproduction, or distribution of Your modifications, or 128 | for any such Derivative Works as a whole, provided Your use, 129 | reproduction, and distribution of the Work otherwise complies with 130 | the conditions stated in this License. 131 | 132 | 5. Submission of Contributions. Unless You explicitly state otherwise, 133 | any Contribution intentionally submitted for inclusion in the Work 134 | by You to the Licensor shall be under the terms and conditions of 135 | this License, without any additional terms or conditions. 136 | Notwithstanding the above, nothing herein shall supersede or modify 137 | the terms of any separate license agreement you may have executed 138 | with Licensor regarding such Contributions. 139 | 140 | 6. Trademarks. This License does not grant permission to use the trade 141 | names, trademarks, service marks, or product names of the Licensor, 142 | except as required for reasonable and customary use in describing the 143 | origin of the Work and reproducing the content of the NOTICE file. 144 | 145 | 7. Disclaimer of Warranty. Unless required by applicable law or 146 | agreed to in writing, Licensor provides the Work (and each 147 | Contributor provides its Contributions) on an "AS IS" BASIS, 148 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 149 | implied, including, without limitation, any warranties or conditions 150 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 151 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 152 | appropriateness of using or redistributing the Work and assume any 153 | risks associated with Your exercise of permissions under this License. 154 | 155 | 8. Limitation of Liability. In no event and under no legal theory, 156 | whether in tort (including negligence), contract, or otherwise, 157 | unless required by applicable law (such as deliberate and grossly 158 | negligent acts) or agreed to in writing, shall any Contributor be 159 | liable to You for damages, including any direct, indirect, special, 160 | incidental, or consequential damages of any character arising as a 161 | result of this License or out of the use or inability to use the 162 | Work (including but not limited to damages for loss of goodwill, 163 | work stoppage, computer failure or malfunction, or any and all 164 | other commercial damages or losses), even if such Contributor 165 | has been advised of the possibility of such damages. 166 | 167 | 9. Accepting Warranty or Additional Liability. While redistributing 168 | the Work or Derivative Works thereof, You may choose to offer, 169 | and charge a fee for, acceptance of support, warranty, indemnity, 170 | or other liability obligations and/or rights consistent with this 171 | License. However, in accepting such obligations, You may act only 172 | on Your own behalf and on Your sole responsibility, not on behalf 173 | of any other Contributor, and only if You agree to indemnify, 174 | defend, and hold each Contributor harmless for any liability 175 | incurred by, or claims asserted against, such Contributor by reason 176 | of your accepting any such warranty or additional liability. 177 | 178 | END OF TERMS AND CONDITIONS 179 | 180 | APPENDIX: How to apply the Apache License to your work. 181 | 182 | To apply the Apache License to your work, attach the following 183 | boilerplate notice, with the fields enclosed by brackets "[]" 184 | replaced with your own identifying information. (Don't include 185 | the brackets!) The text should be enclosed in the appropriate 186 | comment syntax for the file format. We also recommend that a 187 | file or class name and description of purpose be included on the 188 | same "printed page" as the copyright notice for easier 189 | identification within third-party archives. 190 | 191 | Copyright 2017, The TensorFlow Authors. 192 | 193 | Licensed under the Apache License, Version 2.0 (the "License"); 194 | you may not use this file except in compliance with the License. 195 | You may obtain a copy of the License at 196 | 197 | http://www.apache.org/licenses/LICENSE-2.0 198 | 199 | Unless required by applicable law or agreed to in writing, software 200 | distributed under the License is distributed on an "AS IS" BASIS, 201 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 202 | See the License for the specific language governing permissions and 203 | limitations under the License. 
204 | 
--------------------------------------------------------------------------------
/rfcs/20180731-dockerfile-assembler.md:
--------------------------------------------------------------------------------
 1 | # TensorFlow Dockerfile Assembler
 2 | 
 3 | | Status        | Accepted                                              |
 4 | :-------------- |:---------------------------------------------------- |
 5 | | **Author(s)** | Austin Anderson (angerson@google.com)                 |
 6 | | **Sponsor**   | Gunhan Gulsoy (gunan@google.com)                      |
 7 | | **Updated**   | 2018-08-23                                            |
 8 | 
 9 | 
10 | # Summary
11 | 
12 | This document describes a new way to manage TensorFlow's dockerfiles. Instead
13 | of handling complexity via an on-demand build script, Dockerfile maintainers
14 | manage re-usable chunks called partials which are assembled into documented,
15 | standard, committed-to-repo Dockerfiles that don't need extra scripts to build.
16 | It is also decoupled from the system that builds and uploads the Docker images,
17 | which can be safely handled by separate CI scripts.
18 | 
19 | **Important:** This document is slim. The real meat of the design has already
20 | been implemented in [this PR to
21 | tensorflow/tensorflow](https://github.com/tensorflow/tensorflow/pull/21291).
22 | 
23 | **Also Important:** This design is not currently attempting to revise the images for speed or size: the design sets out a process that makes optimizing the images much easier to do on a larger scale.
24 | 
25 | # Background
26 | 
27 | TensorFlow's Docker offerings have lots of problems that affect both users and
28 | developers. [Our images](https://hub.docker.com/r/tensorflow/tensorflow/) are
29 | not particularly well defined or documented, and [our
30 | Dockerfiles](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker)
31 | are complicated and frightening.
32 | 
33 | ## Existing Images are Hard to Optimize
34 | 
35 | TensorFlow's current set of Dockerfiles are difficult to optimize. Developers
36 | dislike pulling enormous Docker containers, and many of our tags could be
37 | considered clunky (tag sizes yanked from Dockerhub, see also @flx42's comment
38 | on this doc's PR for on-disk sizes):
39 | 
40 | | Image Tag            | Size   |
41 | |:---------------------|-------:|
42 | | latest-devel-gpu-py3 | 1 GB   |
43 | | latest-devel-py3     | 773 MB |
44 | | latest-gpu-py3       | 1 GB   |
45 | | latest-py3           | 438 MB |
46 | | latest-devel-gpu     | 1 GB   |
47 | | latest-devel         | 727 MB |
48 | | latest-gpu           | 1 GB   |
49 | | latest               | 431 MB |
50 | 
51 | Including an extra dependency like Jupyter and convenience packages can add
52 | a few hundred megabytes of extra storage. Since some developers want to have
53 | Jupyter in the images and it's too much trouble for us to maintain many similar
54 | Dockerfiles, we've ended up with a limited set of non-optimized images. I'm not
55 | sure if this is truly a critical problem, but it's a little annoying (one of my
56 | personal computers only has 32 GB of SSD space on the root drive, and I
57 | regularly need to wipe my docker cache of large images).
58 | 
59 | ## TF Docker Images need Complexity
60 | 
61 | Our Docker images support two primary use cases: development _with_ TensorFlow,
62 | and development _on_ TensorFlow. We want a matrix of options available for both
63 | of these types of users, the most critical being GPU development support
64 | (currently nvidia-only) and pre-installed Jupyter support. With only those
65 | options considered, we target eight very similar Docker images; sixteen with
66 | Python versioning.
67 | 
68 | Our current images come from a script called `parameterized_docker_build.sh`,
69 | which live-edits a templated Dockerfile with `sed` to insert new Dockerfile
70 | commands. The script has a poor reputation because it can be finicky and is
71 | much harder to follow than vanilla Dockerfiles. Some
72 | Dockerfiles are duplicated, some are unused, and some users have made their own
73 | instead. None of the Dockerfiles use the ARG directive.
74 | 
75 | Furthermore, `parameterized_docker_build.sh` is tightly coupled with the
76 | deploy-to-image-hub process we use, which is confusing because users who build
77 | the images locally don't need that information at all.
78 | 
79 | This document proposes a new way for the TF team to maintain this complex set
80 | of similar Dockerfiles.
81 | 
82 | # Design
83 | 
84 | We use a generator to assemble multiple partial Dockerfiles into concrete
85 | Dockerfiles that get committed into source control. These Dockerfiles are fully
86 | documented and support argument customization. Unlike the parameterized image
87 | builder script, this system excludes the image deployment steps, which should
88 | be handled by a totally different system anyway.
89 | 
90 | This section lightly describes the design, which is fully implemented in [this
91 | pull request to the main TensorFlow
92 | repo](https://github.com/tensorflow/tensorflow/pull/21291).
93 | 
94 | Partial files ("partials") are syntactically valid but incomplete fragments of
95 | Dockerfile syntax.
96 | 
97 | Assembly is controlled by a specification file, defined in yaml. The spec
98 | defines the partials, the ARGs they use, the list of Dockerfiles to generate
99 | based on ordered lists of partials, and documentation for those values.
100 | 
101 | The assembler is a python script that accepts a spec and generates a bunch of
102 | Dockerfiles to be committed. The spec includes documentation and descriptions,
103 | and the output Dockerfiles are fully documented and can be built manually.
104 | 
105 | **Important**: This design in its current implementation does **not** attempt
106 | to address the limitations of our current set of images. Instead, it replicates
107 | the current set of tags with a few easy improvements, the most notable being a
108 | separate set of Dockerfiles that add Jupyter -- identical in every way to the
109 | non-Jupyter images without needing any extra maintenance. This design makes it
110 | much easier to craft TensorFlow's Docker offering in a way that satisfies
111 | everyone with minimal extra work from the Dockerfile maintainers.
112 | 
113 | # Impact
114 | 
115 | This approach has many convenient benefits:
116 | 
117 | * The result is concrete, buildable, documented Dockerfiles. Users who wish
118 | to build their own images locally do not need to also understand the build
119 | system. Furthermore, basing our images on clean Dockerfiles that live in the repository feels right -- as a user, I (personally) like to be able to see how an image works. It removes the mystery and magic from the process.
120 | * This implementation is agnostic to what images we would like to make
121 | available online (i.e. our Docker story). It's very easy to add new dockerfile
122 | outputs.
123 | * The build-test-and-deploy-images process is decoupled from the Dockerfile
124 | generation process.
125 | * Control of the set of dockerfiles is centralized to the spec file, instead
126 | of being spread across each Dockerfile.
127 | * The spec can be extended to add more conveniences. My implementation, for
128 | example, already includes de-duplication of many similar Dockerfile
129 | specifications.
130 | * All dockerfiles are consistently documented.
131 | * Common pieces of code, like a slick shell environment or a Jupyter
132 | interface, can be updated in batch by updating a single partial file.
133 | * The spec can also be used in the image building process, e.g. to read all
134 | available args.
135 | 
136 | # Caveats and Rejected Alternatives
137 | 
138 | I considered two alternatives while working on this.
139 | 
140 | ## Hacky Multi-Stage Dockerfile
141 | 
142 | "Multi-stage Building" is a powerful new Dockerfile feature that supports
143 | multiple FROM statements in one Dockerfile. Multi-stage builds let you build
144 | and run an artifact (like a compiled version of a binary) in any number of
145 | separate stages designated by FROM directives; the resulting image is only as
146 | large as the final stage, without the build-only dependencies from previous
147 | stages.
148 | 
149 | However, Docker's ARG parameter expansion can be used in these extra FROM
150 | directives to conditionally set base images for each build stage:
151 | 
152 | 
153 | ```dockerfile
154 | # If --build-arg FROM_FOO is set, build FROM foo; else build FROM bar.
155 | ARG FROM_FOO
156 | ARG _HELPER=${FROM_FOO:+foo}
157 | ARG BASE_IMAGE=${_HELPER:-bar}
158 | FROM ${BASE_IMAGE}
159 | …
160 | ```
161 | 
162 | This means that it's possible to use multi-stage builds and ARGs to create
163 | stages that are conditionally based on previous stages in the Dockerfile.
164 | [This sample
165 | Dockerfile](https://gist.github.com/angersson/3d2b5ae6a01de4064b1c3fe7a56e3821),
166 | which I've included only as a demonstration of a bad idea (and may or may not
167 | currently work), is very powerful but not extensible and not easy to understand. It is
168 | heavily coupled to our current environment, which may change immensely, e.g. if
169 | AMD releases Docker images similar to Nvidia's or if someone would like to add
170 | MKL support.
171 | 
172 | ## Multiple Normal Dockerfiles Aggregated into Multiple Stages
173 | 
174 | In a [comment on this doc's PR](https://github.com/tensorflow/community/pull/8#issuecomment-410080344), @flx42 suggested a much-improved version of the
175 | previous section. Another way of using ARG interpolation in FROM lines would be
176 | to write multiple isolated Dockerfiles that can be layered together during the `docker build` process:
177 | 
178 | 
179 | ```dockerfile
180 | ARG from
181 | FROM ${from}
182 | 
183 | ARG pip
184 | RUN ${pip} install jupyter
185 | ```
186 | 
187 | And then:
188 | 
189 | ```shell
190 | $ docker build -t nvidia-devel -f Dockerfile.nvidia-devel .
191 | $ docker build -t nvidia-devel-jupyter-py3 --build-arg from=nvidia-devel --build-arg pip=pip3 -f Dockerfile.jupyter .
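# (Hypothetical extension, not in the original comment: the same
# Dockerfile.jupyter layer could be stacked onto any other base image
# built the same way, e.g. a CPU one.)
$ docker build -t cpu-devel-jupyter --build-arg from=cpu-devel --build-arg pip=pip -f Dockerfile.jupyter .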
192 | ```
193 | 
194 | This shares the advantage of the current design by working from many reusable parts, but carries some notable tradeoffs:
195 | 
196 | ### Advantages over Current Design
197 | 
198 | I can see a variety of minor improvements:
199 | 
200 | - No need for assembler script or spec file
201 | - Possibly faster build times due to concretely isolated image stages
202 | - Image stages (akin to partials) may be more reusable due to slot-like usage of `--build-args`
203 | - Because there are no concrete Dockerfiles, there's only one place that defines the Dockerhub tags and what components describe them (in the current design, the spec file describes the Dockerfiles, and then a little more logic elsewhere in our CI would configure those Dockerfiles with the tags)
204 | 
205 | ### Downsides compared to Current Design
206 | 
207 | ...but some downsides that I think are fairly heavy:
208 | 
209 | - Spec + Assembler have some very nice advantages (validation, re-use, etc.)
210 | - No concrete Dockerfiles for OSS devs to use / refer to
211 | - Advanced usage requires some unintuitive file/directory layout + build ordering
212 | - Image-building complexity offloaded to OSS developers and to the CI scripts, which would need scripts / logic to define sets of images to build
213 | - Updating requires familiarity with multi-stage behavior
214 | 
215 | ### Conclusion
216 | 
217 | This is an interesting approach that I like a lot, but I don't think it offers
218 | enough benefits over the current design (which has another advantage in that it
219 | is already mostly finished) to implement.
220 | 
221 | It's worth noting that using multiple FROM stages is a powerful tool that could
222 | possibly be leveraged in the partials for the current design.
223 | 
224 | ## Manually Maintained Dockerfiles with Script References
225 | 
226 | Another pattern that supports complicated Dockerfiles is to manually maintain
227 | many Dockerfiles that each call out to a common set of build scripts:
228 | 
229 | ```dockerfile
230 | FROM ubuntu
231 | COPY install_scripts/ /bin
232 | RUN /bin/install_nvidia_dev.sh
233 | RUN /bin/install_python_dev.sh
234 | RUN /bin/install_bazel.sh
235 | ...
236 | ```
237 | 
238 | This is better than our current approach, but has many small drawbacks that add
239 | up:
240 | 
241 | * Argument passing becomes slightly more complex, because ARGs must be passed
242 | and read as either ENV variables or as build arguments.
243 | * Each dockerfile has to be properly documented manually, if at all.
244 | * Developers have to leave the Dockerfile to read the shell scripts, which
245 | gets annoying.
246 | * Maintenance is spread across the dockerfiles and the scripts, and can grow
247 | into even more work (like some Dockerfiles having extra non-script directives,
248 | etc.).
249 | * Extra overhead in the scripts can be kind of wasteful.
250 | 
251 | # Work Estimates
252 | 
253 | I have already completed a PR that will introduce these Dockerfiles without
254 | affecting our current builds. These would probably take a week or two to
255 | migrate.
256 | 
257 | ## Questions and Discussion Topics
258 | 
259 | Seed this with open questions you require feedback on from the RFC process.
260 | 
--------------------------------------------------------------------------------
/rfcs/20180821-differentiable-functional-while.md:
--------------------------------------------------------------------------------
 1 | # Functional while_loop
 2 | | Status        | Accepted                                              |
 3 | :---------------|:-----------------------------------------------------|
 4 | | **Author**    | Saurabh Saxena (Google)                               |
 5 | | **Sponsor**   | Skye Wanderman-Milne (Google)                         |
 6 | | **Updated**   | 2018-08-23                                            |
 7 | 
 8 | 
 9 | ## Objective
10 | 
11 | This proposal describes an implementation of [while_loop](https://www.tensorflow.org/api_docs/python/tf/while_loop) which adds a single While op to the GraphDef, as opposed to the current implementation that uses [lower level primitives](https://arxiv.org/abs/1805.01772). The goal is to simplify debugging and other analysis and to make it easier for compiler backends like XLA to [recognize](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tf2xla/functionalize_while.cc) the while loop in the GraphDef. At runtime, a C++ optimization pass will lower this op to the primitive dataflow ops for feature parity with the current implementation, similar to what we do for the [If op](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/lower_if_op.cc).
12 | 
13 | 
14 | ## Motivation
15 | 
16 | TensorFlow provides two flavours of control flow constructs, which differ widely in the way they manifest themselves in the GraphDef:
17 | 
18 | 
19 | 
20 | 1. Functional ops which create a single node in the Graph: [If](https://github.com/tensorflow/tensorflow/blob/fc4504edb1ab419ae59b0ebb9ff8d943beb61117/tensorflow/core/ops/functional_ops.cc#L104), [While](https://github.com/tensorflow/tensorflow/blob/fc4504edb1ab419ae59b0ebb9ff8d943beb61117/tensorflow/core/ops/functional_ops.cc#L147).
21 | 1. Non-functional ops which make use of primitive control flow constructs, namely Enter, Exit, Switch, Merge and NextIteration: [tf.cond](https://www.tensorflow.org/api_docs/python/tf/cond), [tf.while_loop](https://www.tensorflow.org/api_docs/python/tf/while_loop).
22 | 
23 | Both approaches have their merits and demerits. The functional representation emits a single node in the GraphDef, thus making it easy to recognize such ops in processing pipelines that operate on the GraphDef, which is not the case when control flow is represented using lower level primitives. The functional representation is however not easily differentiable: it requires using the [SymbolicGradient](https://github.com/tensorflow/tensorflow/blob/a0e76ce73c5f095fc61e06c19ff8e653cfd2965c/tensorflow/core/ops/functional_ops.cc#L24) op, which recomputes the forward pass (slow) and needs symbolic gradients defined for all ops in the function body, which can be complicated to implement. Also, since we force a strict execution of functions, i.e., a function can start executing only after its inputs are all ready, the functional ops may not be that performant. The current representation solved these problems at the cost of a slightly complicated GraphDef. In this proposal, we try to achieve the best of both worlds.
24 | 
25 | We recently added a differentiable version of the [functional If/cond op](https://github.com/tensorflow/community/blob/master/rfcs/20180507-cond-v2.md). As with functional cond, a key challenge here is to figure out gradient computation.
For cond, we could expose the [intermediate tensors](https://github.com/tensorflow/tensorflow/blob/51100a8de57ef53e36a8a9f5a9829cbd33fbed04/tensorflow/python/ops/cond_v2_impl.py#L114) as op outputs so that they could be used for computing gradients. We cannot directly do the same for while loops since we would need the intermediate values _for all iterations_ and not just the values after the last iteration. Hence, some sort of accumulator is required. We use TensorLists for accumulating the loop body intermediates. Since while loops may run for a large number of iterations, e.g. long RNNs, we need to be mindful of the memory used by accumulators.
26 | 
27 | 
28 | ## Design Proposal
29 | 
30 | 
31 | ### Accumulating intermediates
32 | 
33 | 
34 | #### Stack vs TensorArray vs TensorList
35 | 
36 | The current implementation uses [Stacks](https://github.com/tensorflow/tensorflow/blob/51100a8de57ef53e36a8a9f5a9829cbd33fbed04/tensorflow/python/ops/control_flow_ops.py#L1002) for accumulating intermediate values from the forward pass that may be needed for gradient computation. This implementation will use [TensorLists](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/list_ops.cc) (TL) instead, which, unlike Stack and TensorArray, do not have mutable internal state, making them easy to differentiate.
37 | 
38 | 
39 | #### Algorithm
40 | 
41 | For each intermediate tensor of the while loop function body that may be needed for gradient computation, we create an empty TensorList and add it to the list of loop_vars. We then push the intermediate values to the TL using the [TensorListPushBack](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/list_ops.cc#L40) op. Note that this way we may be accumulating more tensors than are actually needed for gradient computation. It is even possible that the graph is just used for inference and hence we do not need the accumulators at all! We rely on the [C++ optimization pass](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/model_pruner.cc) that happens after the While op is lowered to remove all such superfluous accumulators. So adding extra accumulators will not have any performance or memory overhead at runtime.
42 | 
43 | To facilitate use-cases where lowering is not desired, we can perform a few optimizations to the functional form of the While op:
44 | 
45 | * Expose only those intermediate values that are required by the backward pass by building the gradient graph in the forward pass.
46 |   * This will increase graph building time.
47 | * Do not accumulate Const nodes. We can lift these outside the while loop.
48 | * Do not accumulate loop vars that are passed-through unchanged.
49 | * Rewrite the forward pass to add accumulators when gradients are requested.
50 |   * This will require creating a new While op and new FunctionDefs for the loop condition and body.
51 |   * Since we cannot remove nodes from the Graph, there will be unused functions and the dangling While op in the GraphDef. These will however be pruned out at runtime and hence will not affect performance or correctness.
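For illustration, here is a rough sketch of this accumulation pattern written out by hand against the raw TensorList ops. This is a sketch only: the `tf.raw_ops` names and signatures used here are best-effort assumptions, and in the proposal the accumulators are wired in automatically by the new while_loop rather than by the user:

```python
import tensorflow as tf

x = tf.constant(2.)

# One empty TensorList per intermediate value we need to remember.
acc = tf.raw_ops.EmptyTensorList(
    element_shape=[], max_num_elements=-1, element_dtype=tf.float32)

def cond(i, v, acc):
  return v < 8.

def body(i, v, acc):
  # Push the value of `v` at the start of this iteration.
  acc = tf.raw_ops.TensorListPushBack(input_handle=acc, tensor=v)
  return i + 1, v * v, acc

# `i` plays the role of the loop counter described in the next section;
# `acc` ends up holding the value of `v` from every iteration.
n, result, acc = tf.while_loop(cond, body, [0, x, acc])
```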
52 | 
53 | 
54 | ### Computing gradients
55 | 
56 | Excerpt from white paper on [Control Flow in TensorFlow](http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf):
57 | 
58 | > Intuitively, the gradient of `while_loop(pred, body)` is just a while loop of the form:
59 | >
60 | >
61 | > ```
62 | > def pred(i, _): return i < N
63 | > while_loop(pred, g_body, [0] + g_vars)
64 | > ```
65 | >
66 | > Where `N` is the number of iterations that the forward while loop runs, `g_body` is the gradient of the forward loop body, and `g_vars` is the initial values for the loop variables. As we will see later, `g_vars` includes the initial gradients for the loop variables of the forward while loop.
67 | 
68 | We use the same logic here as well. To get a count of the number of forward iterations, we add an integer counter which is initialized to 0 and is incremented in the loop body. Note that we just need the total number of iterations for the gradient pass, so we do not need to accumulate the intermediate values of the counter. This counter is always the first output of the While op.
69 | 
70 | To compute *g_body* we use the [gradients_impl._GradientsHelper](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/python/ops/gradients_impl.py#L599) function, which supports computing the gradient of a given [src_graph](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/python/ops/gradients_impl.py#L607) in another graph, which in this case is a [_FuncGraph](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/python/framework/function.py#L621). This gradient graph captures references to the intermediate values of the forward graph (the src_graph). We replace these references with popped values from the accumulators of the intermediate tensors. Note that these accumulators were already added to the list of loop_vars of the While op and hence were in the list of outputs of the forward While op.
71 | 
72 | We will register a custom python [gradient function](https://github.com/tensorflow/tensorflow/blob/0440ccfc199cbffc10aae19fde07f0100c823ed9/tensorflow/python/framework/ops.py#L2352) to compute the gradient of a functional While op. This will allow taking the gradient of any functional While op (not only the ones generated by the new while_loop function) which satisfies the following conditions:
73 | 
74 | 
75 | 
76 | 1. The first loop output must be the number of loop iterations.
77 | 1. Each intermediate tensor of the While body which may be needed during gradient computation must be accumulated in a TensorList. We will check to make sure that the TensorList is indeed unique to the intermediate value.
78 | 1. The position of the accumulator in the list of inputs and outputs must be the same.
79 | 
80 | The While op generated by the gradient function satisfies the above constraints and hence can be differentiated again to generate the 2nd order derivative and so on.
81 | 
82 | In the case of nested while loops, we will accumulate the intermediate values of inner while loops in nested TensorLists.
83 | 
84 | 
85 | ### Memory management
86 | 
87 | tf.while_loop swaps the tensors from GPU to CPU when the [swap_memory](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/python/ops/control_flow_ops.py#L3046) flag is set.
Section 5.3 of the control flow [paper](https://arxiv.org/abs/1805.01772) mentions that with memory swapping they were able to handle an RNN with 2x the unrolled length (1000 vs 500) with little overhead. The heuristics for memory swapping are implemented in the [StackPush](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/core/kernels/stack_ops.cc#L289) and [StackPop](https://github.com/tensorflow/tensorflow/blob/600caf99897e82cd0db8665acca5e7630ec1a292/tensorflow/core/kernels/stack_ops.cc#L411) ops. We will need to support similar functionality for the TensorListPushBack and TensorListPopBack ops.
88 | 
89 | 
90 | ### Lowering pass
91 | 
92 | In order to get feature parity with the current implementation, we will lower the While op to the current while loop graph representation as a grappler pass, similar to the one for [if_op](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/lower_if_op.cc). This gets us around some of the issues with the current functional op:
93 | 
94 | 
95 | 
96 | 1. We can perform parallel iterations, which the functional op cannot, due to the strict-mode execution of functions, which requires that all inputs to the function be ready before the function can start executing. We will need to add a `parallel_iterations` attr to the While op.
97 | 1. The FunctionLibraryRuntime currently does not allow running multi-device functions.
98 | 1. We can perform global grappler optimizations without needing to cross function boundaries. E.g. we can remove accumulators for intermediate values which are not consumed downstream.
99 | 
100 | 
101 | ### Example
102 | 
103 | ```python
104 | 
105 | x = tf.constant(2.)
106 | 
107 | ret = while_loop(lambda v: v < 8., lambda v: v * v, [x])
108 | 
109 | grad = tf.gradients(ret, [x])
110 | 
111 | ```
112 | 
113 | **Current implementation**
114 | 
115 | 
116 | 
117 | ![alt_text](20180821-differentiable-functional-while/while_v1.png "image_tooltip")
118 | 
119 | 
120 | **New implementation**
121 | 
122 | 
123 | 
124 | ![alt_text](20180821-differentiable-functional-while/while_v2.png "image_tooltip")
125 | 
126 | 
127 | The forward functional while op is highlighted in red. Note that it takes 2 `Const` nodes as inputs. One of the `Const` nodes is `x` with value 2. The other `Const` node is the initial value of the loop counter which is set to 0. There are also 2 `EmptyTensorList` nodes which are used for accumulating intermediate values.
128 | 
129 | *while_cond*
130 | 
131 | The loop condition function is fairly trivial. It expects the extra args for the loop counter and accumulators but doesn't actually use them.
132 | 
133 | 
134 | 
135 | ![alt_text](20180821-differentiable-functional-while/while_cond.png "image_tooltip")
136 | 
137 | 
138 | *while_body*
139 | 
140 | The loop body contains the extra nodes for updating the counter and accumulating intermediates.
141 | 
142 | 
143 | 
144 | ![alt_text](20180821-differentiable-functional-while/while_body.png "image_tooltip")
145 | 
146 | 
147 | `arg0` is the loop counter which gets initialized to 0. This is always the first argument.
148 | 
149 | `arg1` is the value of x at the start of each iteration.
150 | 
151 | `add_0` is the counter update node and `y` is the increment `Const` node with value 1.
152 | 
153 | `mul_0` performs `x * x`.
154 | 
155 | 
156 | Accumulators:
157 | 
158 | `tensorlist0` <- `arg1`, the value of `x` at the start of the loop.
159 | 
160 | `tensorlist1` <- Output of `mul_0`.
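To make the gradient construction concrete, here is a hand-written sketch of the backward loop for this example, reusing the hypothetical names (`n`, `acc`) from the sketch in the Accumulating intermediates section. It is illustrative only; the real graph is emitted by the registered gradient function:

```python
def grad_cond(j, g, acc):
  return j < n  # n = number of forward iterations (first While output)

def grad_body(j, g, acc):
  # Pop the value pushed during the matching forward iteration
  # (values come back in reverse order).
  acc, v = tf.raw_ops.TensorListPopBack(
      input_handle=acc, element_shape=[], element_dtype=tf.float32)
  return j + 1, g * 2. * v, acc  # d(v * v)/dv = 2 * v

_, grad_x, _ = tf.while_loop(grad_cond, grad_body, [0, tf.ones_like(x), acc])
# Here the forward loop ran twice (2 -> 4 -> 16), so grad_x = 4 * x**3 = 32.
```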
161 | 
162 | ## Discussion notes
163 | 
164 | Please see notes in [tensorflow/community#13](https://github.com/tensorflow/community/pull/13#issuecomment-422591773).
--------------------------------------------------------------------------------
/rfcs/20180726-tf-data-windowing-reducers.md:
--------------------------------------------------------------------------------
 1 | # Generalizing tf.data batching using windowing and reducers
 2 | 
 3 | | Status        | Accepted                                              |
 4 | :---------------|:-----------------------------------------------------|
 5 | | **Author(s)** | Jiri Simsa (Google)                                   |
 6 | | **Sponsor**   | Derek Murray (Google)                                 |
 7 | | **Updated**   | 2018-09-19                                            |
 8 | 
 9 | ## Objective
10 | 
11 | This proposal addresses the known limitations of the current tf.data batching API:
12 | 
13 | * it provides a mechanism for padded batching of sparse tensors
14 | * it facilitates customization of batching logic (users can now express batching logic as a pure Python function)
15 | * it enables application of different batching logic on different components
16 | 
17 | ## Motivation
18 | 
19 | The tf.data API is the de facto standard for creating TensorFlow input pipelines, whose purpose is to extract data from a storage system, transform it, and load it onto an accelerator.
20 | 
21 | A common transformation performed by TensorFlow input pipelines is batching -- combining multiple tensors into a single tensor of higher dimension, most often to make a minibatch for training. Currently, the core tf.data API for batching consists of [batch](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#batch) and [padded_batch](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#padded_batch). The former assumes the inputs have the same shape and supports both dense and sparse inputs. The latter supports dynamically shaped inputs, such as you might find in sequential data: it assumes the inputs have the same rank but not necessarily the same shape and can pad differently shaped inputs to a common shape; only dense inputs are supported by padded_batch.
22 | 
23 | The tf.data batching API has several limitations that have surfaced in various user requests:
24 | 
25 | * As already mentioned, the padded_batch transformation does not support sparse tensor inputs ([issue](https://github.com/tensorflow/tensorflow/issues/18302)).
26 | * The current API is not flexible enough to accept user-provided batching logic (e.g. [issue](https://github.com/tensorflow/tensorflow/issues/20391)).
27 | * The same batching logic needs to be applied to all components of the input dataset, which is not always desirable (e.g. [issue](https://github.com/tensorflow/tensorflow/issues/20391)). Users can work around this limitation by creating separate datasets to which different batching transformations are applied and then zipping the datasets; however, this can be inefficient, unergonomic, and error prone.
28 | 
29 | 
30 | ## Proposal
31 | 
32 | This document proposes leveraging the recently introduced support for _nested_ datasets as inputs to tf.data transformations to perform generalized batching as follows:
33 | 
34 | 
35 | 
36 | 1. A __window__ transformation is used to combine consecutive elements of the input into a nested dataset (as opposed to a higher dimensional tensor).
37 | 1. A map transformation is used to, on a per-component basis, apply a suitable __reducer__ which transforms the nested dataset to a batched tensor.
38 | 
39 | The underlined transformations do not exist and are the proposed extensions to the tf.data API.
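As a rough sketch of how the two steps compose in the single-component case (`batch_size` and `my_batching_reducer` are placeholders for the pieces introduced below):

```python
batches = dataset.window(batch_size, shift=batch_size).map(
    lambda window: window.reduce(my_batching_reducer))
```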
40 | 
41 | 
42 | ### Windowing
43 | 
44 | Windowing combines elements of a dataset into finite datasets referred to as windows. This is similar to batching, with the main difference being that batching combines elements of a dataset into a higher dimensional element, while windowing combines the elements into a dataset.
45 | 
46 | 
47 | ```python
48 | def window(size, shift=1, stride=1, drop_remainder=True):
49 |   """Combines input elements into a dataset of windows.
50 | 
51 |   Each window is a dataset itself and contains `size` elements (or
52 |   possibly fewer if there are not enough input elements to fill the window
53 |   and `drop_remainder` evaluates to false).
54 | 
55 |   The `stride` argument determines the stride of the input elements,
56 |   and the `shift` argument determines the shift of the window.
57 | 
58 |   For example:
59 |   - tf.data.Dataset.range(5).window(3) produces {{0, 1, 2}, {1, 2, 3}, {2, 3, 4}}
60 |   - tf.data.Dataset.range(5).window(3, 3, 1, False) produces {{0, 1, 2}, {3, 4}}
61 |   - tf.data.Dataset.range(6).window(3, 1, 2) produces {{0, 2, 4}, {1, 3, 5}}
62 | 
63 |   Args:
64 |     size: A `tf.int64` scalar `tf.Tensor`, representing the number
65 |       of elements of the input dataset to combine into a window.
66 |     shift: A `tf.int64` scalar `tf.Tensor`, representing the forward
67 |       shift of the sliding window in each iteration.
68 |     stride: A `tf.int64` scalar `tf.Tensor`, representing the stride
69 |       of the input elements in the sliding window.
70 |     drop_remainder: A `tf.bool` scalar `tf.Tensor`, representing whether
71 |       a window should be dropped in case its size is smaller than
72 |       `size`.
73 | 
74 |   Returns:
75 |     Dataset: A `Dataset` whose elements are a `Dataset`.
76 |   """
77 | ```
78 | 
79 | 
80 | ### Reducers
81 | 
82 | 
83 | #### Example 0: Count Elements
84 | 
85 | To introduce the concept of tf.data reducers to readers unfamiliar with it, we illustrate how a reducer can be used to count the elements of a dataset:
86 | 
87 | 
88 | ```python
89 | def count(dataset):
90 |   """Counts the elements of a dataset."""
91 | 
92 |   def init_fn(_):
93 |     return 0
94 | 
95 |   def reduce_fn(state, value):
96 |     return state + 1
97 | 
98 |   def finalize_fn(state):
99 |     return state
100 | 
101 |   count_reducer = tf.data.Reducer(init_fn, reduce_fn, finalize_fn)
102 |   return dataset.reduce(count_reducer)
103 | 
104 | value = count(tf.data.Dataset.range(10))
105 | with tf.Session() as sess:
106 |   print(sess.run(value))  # produces 10
107 | ```
108 | 
109 | 
110 | As you can see, a tf.data reducer consists of three functions: 1) an _init()_ function that sets up the initial state, which can be an arbitrary nest of tensor-like objects, 2) a _reduce()_ function that defines how to update the intermediate state given the value of the next element, and 3) a _finalize()_ function that defines how to transform the final state into the output value.
111 | 
112 | The reducer inputs an entire dataset and reduces it to a single value. This single value is the result of taking the output of init(), calling reduce() successively on every element of the dataset until the dataset is exhausted, and then calling finalize() on the result.
113 | 
114 | 
115 | #### Example 1: Batch of Dense Tensors
116 | 
117 | Next, we illustrate how tf.data reducers can be used to create a batch from a dataset of dense tensors.
118 | 
119 | ```python
import tensorflow as tf
from tensorflow.python.ops import gen_array_ops  # internal module; provides the uninitialized `empty` op
120 | def batch_dense(dataset):
121 |   """Batches a dataset of dense tensors."""
122 | 
123 |   if dataset.output_shapes.is_fully_defined():
124 |     shape = dataset.output_shapes
125 |   else:
126 |     first_element = tf.contrib.data.get_single_element(dataset.take(1))
127 |     shape = tf.shape(first_element)
128 | 
129 |   def batch_init_fn(_):
130 |     """Return an empty Tensor of the correct shape and type."""
131 |     batch_shape = tf.concat([[0], shape], 0)
132 |     return gen_array_ops.empty(batch_shape, dtype=dataset.output_types)
133 | 
134 |   def batch_reduce_fn(state, value):
135 |     """Append this value to what we have of the batch so far."""
136 |     return tf.concat([state, [value]], 0)
137 | 
138 |   def batch_finalize_fn(state):
139 |     """Return the batch tensor as constructed so far."""
140 |     return state
141 | 
142 |   batch_reducer = tf.data.Reducer(batch_init_fn, batch_reduce_fn,
143 |                                   batch_finalize_fn)
144 |   return dataset.reduce(batch_reducer)
145 | 
146 | batch = batch_dense(tf.data.Dataset.range(5))
147 | with tf.Session() as sess:
148 |   print(sess.run(batch))  # produces [0 1 2 3 4]
149 | 
150 | ```
151 | 
152 | 
153 | 
154 | #### Example 2: Padded Batch of Dense Tensors
155 | 
156 | Our next tf.data reducer example illustrates how to use a reducer to create a padded batch from a dataset of dense tensors.
157 | 
158 | ```python
from tensorflow.python.data.util import convert  # internal shape-conversion helper
159 | def padded_batch_dense(dataset, padded_shape, padding_value):
160 |   """Batches a dataset of dense tensors with padding."""
161 | 
162 |   padded_shape = tf.cast(
163 |       convert.partial_shape_to_tensor(padded_shape), tf.int32)
164 | 
165 |   def init_fn(_):
166 |     return 0, padded_shape
167 | 
168 |   def reduce_fn(state, value):
169 |     count, shape = state
170 |     return count + 1, tf.maximum(shape, tf.shape(value))
171 | 
172 |   def finalize_fn(state):
173 |     return state
174 | 
175 |   # Compute the padded shape and count elements.
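  # (Note: this takes two passes over `dataset`: the reduce below computes
  # the element count and the component-wise maximum shape, and the
  # map/batch afterwards pads each element to that shape, so the input
  # dataset must be re-iterable.)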
176 |   reducer = tf.data.Reducer(init_fn, reduce_fn, finalize_fn)
177 |   count, padded_shape = dataset.reduce(reducer)
178 | 
179 |   def pad_fn(value):
180 |     shape = tf.shape(value)
181 |     left = tf.zeros_like(shape)
182 |     right = padded_shape - shape
183 |     return tf.pad(value, tf.stack([left, right], 1),
184 |                   constant_values=padding_value)
185 | 
186 |   return dataset.map(pad_fn).batch(count)
187 | 
188 | padded_batch = padded_batch_dense(
189 |     tf.data.Dataset.from_tensor_slices([[1], [2]]), [2],
190 |     0).make_one_shot_iterator().get_next()
191 | with tf.Session() as sess:
192 |   print(sess.run(padded_batch))  # produces [[1 0] [2 0]]
193 | ```
194 | 
195 | 
196 | 
197 | ### End-to-end Example
198 | 
199 | Finally, we illustrate how to use the window transformation to perform generalized tf.data batching:
200 | 
201 | ```python
202 | import tensorflow as tf
203 | 
204 | def gen():
205 |   yield ('a', [1])
206 |   yield ('b', [2])
207 |   yield ('c', [3])
208 |   yield ('d', [4, 4])
209 | 
210 | def map_fn(a, b):
211 |   return tf.data.Dataset.zip((a.batch(2), b.padded_batch(2, [2])))
212 | 
213 | dataset = tf.data.Dataset.from_generator(gen, (tf.string, tf.int32))
214 | dataset = dataset.window(2, 2).flat_map(map_fn)
215 | get_next = dataset.make_one_shot_iterator().get_next()
216 | 
217 | with tf.Session() as sess:
218 |   print(sess.run(get_next))  # produces (['a', 'b'], [[1, 0], [2, 0]])
219 |   print(sess.run(get_next))  # produces (['c', 'd'], [[3, 0], [4, 4]])
220 | ```
221 | 
222 | 
223 | 
224 | ## API Changes
225 | 
226 | This design document proposes the following changes to the tf.data API:
227 | 
228 | * Adding a `tf.data.Dataset.window` method, which provides the windowing functionality described in this proposal.
229 | * Promoting the `tf.contrib.data.reduce_dataset()` method to `tf.data.Dataset.reduce()` and the `tf.contrib.data.Reducer` class to `tf.data.Reducer`.
230 | * Allowing nested datasets as inputs of `map` and `filter`.
231 | * Adding canned reducers for padded batching of dense and sparse tensors to `tf.contrib.data`, changing the implementation of `tf.data.Dataset.padded_batch()` to use these, and marking it as deprecated.
232 | 
233 | ## Summary
234 | 
235 | This proposal addresses known limitations of the current tf.data batching API:
236 | 
237 | * it provides a mechanism for padded batching of sparse tensors
238 | * it facilitates customization of batching logic (users can now express batching logic as a pure Python function)
239 | * it enables application of different batching logic on different components
240 | 
241 | 
242 | ## Discussion Notes
243 | 
244 | See also notes from [public review](https://github.com/tensorflow/community/pull/5). The following notes were taken in the review committee.
245 | 
246 | Q: What is the value added by the new examples?
247 | 
248 | A: The previous examples were inefficient versions of things that already exist.
249 | 
250 | Q: The obvious use of the API led to an inefficient implementation (of batching, using tf.concat()). It might be hard to write batching in this API without it being inefficient.
251 | 
252 | A: This API is not meant to be used to implement something that already exists.
253 | 
254 | Q: Is this not a good API for implementing batching? The structure encourages inefficient implementations.
255 | 
256 | A: The point was not to illustrate how we do batching efficiently. It's already done.
257 | 
258 | Q: I thought the point was to show many different ways to do batching.
259 | 260 | A: The base case is still an efficient implementation of batch, but we can add other logic around it (e.g. to do different forms of padding, etc.). 261 | 262 | Q: What were the biggest questions? 263 | 264 | A: Batching efficiency was the biggest one. Some questions about the signature of the newly introduced transformation. One reader commented that the meaning of "window" in other communities (video processing) typically includes some notion of slide/stride. Conclusion was that we will support shift and stride as we already do in `sliding_window_batch()`. Stride = number of elements you skip (i.e. for non-consecutive elements in a window), shift = how much the window shifts between windows. 265 | 266 | Q: Is there any significant overhead from elements being datasets (e.g. from extra work in Python)? 267 | 268 | A: The amount of computation that you have to do to compute the batch should be the same. There is no additional work in Python. 269 | 270 | Q: How do you compile the reduce function to run it in C++? 271 | 272 | A: It's a TF function, similar to existing map functions, etc. 273 | 274 | Q: Concern about how many times count() is invoked. 275 | 276 | A: The example shows how to use it in a filter(), where the count is evaluated in a function context. 277 | 278 | Q: Re: runtime efficiency, in the higher dimensional case, would we always make a copy to concatenate? 279 | 280 | A: That's what the Dataset.batch() transformation does. The nested dataset elements aren't intended for direct consumption, but to serve as input to other transformations, which e.g. build padded batches, sparse tensors, etc. This proposal lets you mix and match how you treat the different components, as illustrated in the end-to-end example. The goal of the new API isn't to improve efficiency of the existing implementations, but to add support for new kinds of transformation. 281 | 282 | Q: What about the parallel proposal for random access datasets? Will count() be an exposed primitive or would you use the efficient random-access count? 283 | 284 | A: We would add efficient random-access count for the nested datasets produced by window(). 285 | 286 | -------------------------------------------------------------------------------- /rfcs/20180817-variables-20.md: -------------------------------------------------------------------------------- 1 | # Variables in TensorFlow 2.0 2 | 3 | | Status | Accepted | 4 | :-------------- |:---------------------------------------------------- | 5 | | **Author(s)** | apassos@google.com | 6 | | **Sponsor** | wicke@google.com, joshl@google.com, ashankar@google.com | 7 | | **Updated** | 2018-08-17 | 8 | 9 | 10 | ## Objective 11 | 12 | The API for TensorFlow variables has many drawbacks: impossible-to-reason-about semantics, reliance on global scopes, and reliance on global collections. As the TensorFlow API moves to become more pythonic and object oriented, with the Keras layers and models and the object-based serialization, we no longer have a need for much of this global infrastructure around variables. 
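To make the shift concrete, here is a minimal sketch (illustrative only: the shapes and inputs are made up, the 1.x form assumes graph mode, and the 1.x APIs move to `tf.compat.v1`):

```python
import tensorflow as tf

# TF 1.x style: sharing is mediated by global scopes and get_variable.
with tf.compat.v1.variable_scope("dense", reuse=tf.compat.v1.AUTO_REUSE):
  w = tf.compat.v1.get_variable("w", shape=[10, 4])

# TF 2.0 style: sharing means reusing the same Python object.
x1 = tf.random.normal([2, 10])
x2 = tf.random.normal([2, 10])
w = tf.Variable(tf.random.normal([10, 4]), name="w")
y1 = tf.matmul(x1, w)
y2 = tf.matmul(x2, w)  # "reuse" is just using `w` again
```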
13 | 
14 | 
15 | ## Main changes
16 | 
17 | The API for Variables will then change in the following ways for TF 2.0:
18 | 
19 | 
20 | 
21 | * tf.Variable will become an abstract base class with a well-defined interface and a scoped factory to construct instances
22 | * users will be able to implement their own variable-like objects by subclassing tf.Variable and adding a scoped factory function to use those variables
23 | * variable_scope and get_variable will be removed
24 | * the tf 1.0 version of variable_scope and get_variable will be left in tf.compat.v1
25 | * to control variable naming users can use tf.name_scope + tf.Variable
26 | * whether a variable is shared across sessions / processes will be controlled by a constructor argument to tf.Variable; no other type of scope reuse will be done in the framework
27 | * scoped partitioning will be implemented as a factory function at first
28 | * libraries and users are encouraged to reuse variables by reusing their objects, like Keras layers do
29 | * custom_getters will have the following API: [variable_creator_scope](https://github.com/tensorflow/tensorflow/blob/567189980f7a1c2aa09a5170bd8d01a6ec37d303/tensorflow/python/ops/variable_scope.py#L2402)
30 | * the default implementation of the tf.Variable interface will be ResourceVariable
31 | * RefVariable will be kept in tf.compat.v1 and will be the default implementation for tf.compat.v1.Variable
32 | * tf.compat.v1.Variable will have a use_resource argument to control whether a resource variable or a ref variable will be created
33 | * symbols like tf.assign* will be removed in favor of methods in tf.Variable
34 | * in tf.compat.v1 these symbols will be marked as deprecated and will call the corresponding methods in the Variable object instead
35 | 
36 | 
37 | ## Detailed changes
38 | 
39 | 
40 | ### tf.Variable class
41 | 
42 | The tf.Variable class will be an abstract base class which defines a tf.Variable interface. Initially this interface will have enough abstract methods such that the user-visible API of tf.Variable does not change.
43 | 
44 | There will be two main implementations of this interface: RefVariable, with the legacy ref edges, available only in tf.compat.v1, and ResourceVariable, which is the default for the v2 API. PartitionedVariable, MirroredVariable, _UnreadVariable, CastVariable, etc., are other implementations which are part of the core library. None of these implementations will be publicly visible, only tf.Variable will be.
45 | 
46 | Constructing variables is done by calling tf.Variable(*args, **kwargs). Under the hood this will call a hierarchy of scoped constructor functions, similar to what is now done in variable_scope.variable. Each such constructor function can do some combination of:
47 | 
48 | 
49 | 
50 | * calling a base constructor to actually create a variable
51 | * returning preexisting variables
52 | * changing some arguments to the base constructor, and maybe calling it multiple times
53 | 
54 | This is implemented by having a custom metaclass for tf.Variable which, when asked to construct a tf.Variable directly, will call the factory functions, but, when asked to construct a subclass of tf.Variable, will bypass the factories and construct the child class directly.
55 | 
56 | The tf.Variable interface will make no reference to graph collections, and tf.Variable will not add the variable to any collections by default. tf.compat.v1.Variable, on the other hand, will have the collections argument and respect the existing semantics for it.
Things which currently rely on collections (saving / loading, Optimizer.minimize, etc.) will instead be expected to be passed either a list of variables or a CheckpointableBase-inheriting object.
57 | 
58 | 
59 | ### Variable sharing
60 | 
61 | Sharing within a model will not be a part of the public API for tf.Variable. Users are strongly encouraged to share variables by sharing a reference to their objects.
62 | 
63 | That said, the tf.compat.v1.variable_scope library can be made self-contained if we replace the per-graph variable scope stack with a module-global weak key dictionary from graphs to scope objects, and we call the protected methods to access graph collections. This will remain available for users who are not willing to port their libraries to have object-based sharing, as the support burden of maintaining that file in tf.compat.v1 is negligible and the volume of code written against it is large.
64 | 
65 | 
66 | ### Checkpointing
67 | 
68 | Checkpointing will be done in tf 2.0 via the [object-oriented checkpointing API](https://www.tensorflow.org/api_docs/python/tf/contrib/checkpoint/Checkpointable).
69 | 
70 | 
71 | ### Optimizers
72 | 
73 | The Optimizer.minimize method will no longer work if it's passed a Tensor and no list of variables. Users are expected to pass the list of variables to minimize with respect to, or to pass an object which implements the CheckpointableBase interface to let the optimizer find the variables. The behavior of tf.compat.v1.Optimizer will not change.
74 | 
75 | 
76 | ### Assignment operations
77 | 
78 | Instead of having free functions which access internal state of variables, reading from and writing to variables will be done via methods. Current tf.assign*(variable, ...) will become variable.assign*(...). tf.compat.v1 will keep the old aliases, but they will call the new methods instead.
79 | 
80 | This is an easy LSC to make (once the current operations are modified to return a RefVariable object instead of a Ref tensor) and will make the code more homogeneous and pythonic.
81 | 
82 | 
83 | ### Ref edges versus resources
84 | 
85 | TensorFlow graphs need to represent state (information which survives calls to session.run, or generally information produced by an op which depends on something other than the content of its input tensors) so that most nontrivial programs can be useful. Examples of state are input pipelines, model parameters, queues, mutexes, and random number generators.
86 | 
87 | There are a number of ways of representing state in TensorFlow directly in the graph, but the most robust and flexible is using resource handles. A **resource handle** is a regular immutable Tensor which represents a name for a shared out-of-graph resource (any C++ class inheriting from ResourceBase can be used as a resource). The resource handle itself doesn't change during the program execution. The resource pointed to by a handle lives on a specific device (so while it's possible to serialize resource handle tensors, it's usually not a good idea), and can be accessed by any op which runs on that device and has access to the resource handle tensor. These ops can do things such as reading from the resource, modifying the resource, initializing the resource, and deleting it.
88 | 
89 | A resource handle is a scalar tensor of dtype DT_RESOURCE (or dtypes.resource in Python), and can be manipulated as any other Tensor: you can concatenate resources, they can go through conditionals, you can slice into them, etc.
This means that while it's often possible to determine statically whether two operations can access the same resource, some graphs might be structured in ways which make this difficult.
90 | 
91 | When you can determine statically that two ops touch the same resource, you can make inferences about the state of the resource when one op is executing solely by looking at the graph. For example, if there is a path formed of control or data edges connecting a resource-using op O to a resource-using op O', you know that O' is guaranteed to see the effects of O on the resource and, conversely, that O is guaranteed to not see the effects of O' on the resource. If, on the other hand, there is no path in the graph connecting ops O and O' which use the same resource, then whether one sees the effects of the other is undefined, and might vary from one execution to another.
92 | 
93 | Resource variables were the motivating case for introducing the explicit notion of resources to TensorFlow graphs. This was done to avoid complicated issues related to the lack of a memory model for the deprecated ref-edge-based variables and to allow compilation of TensorFlow graphs containing mutable state.
94 | 
95 | A resource-based variable is the simplest type of resource. What's stored in the device's resource manager is a pair of a Tensor and a mutex. The main operation to read the value of a variable is read_variable_op, and it simply outputs a Tensor which has the same value as the Tensor in the resource handle state. There are many ops which write to the resource (assign_variable_op, assign_add_variable_op, resource_apply_gradient_descent, etc.), and the basic properties of the resource edges ensure that it's possible to order reading and writing ops to avoid undefined behavior.
96 | 
97 | These ops are currently implemented using copy-on-write, but they could also be implemented using copy-on-read or other, more complex, mechanisms, as long as the read-before-write and write-before-read semantics are respected and as long as no mutation is done to the Tensor returned by a read_variable_op after it's been read. Here are two examples of why mutating a Tensor returned by a read_variable_op might be dangerous:
98 | 
99 | 
100 | 
101 | * tf.cond predicates: a tf.cond takes a boolean tensor as a predicate and conditionally executes ops in the true or false branch of the conditional based on the value of the predicate. The way this is implemented in TensorFlow, to allow for graph pruning and non-strict execution, is that there are many "switch" ops in the graph, each of which looks at the value of the predicate and decides which operations downstream from it can execute. If the predicate is a variable and one branch modifies the value of this variable, we would like to ensure that, because the "read" operation happened before the switch ops, only one branch of the conditional will execute. If, instead, writing to a variable could mutate the value of the tensor returned by "read", then a subset of both branches could execute, leading to hard-to-debug errors.
102 | * gating gradients: when computing the backward pass while training a deep neural network, there is by default no in-graph order between the operation that updates the parameters of a layer based on its gradients and the operations that use the value of those parameters to compute the gradient with respect to the previous layer.
If the value of a variable was allowed to change after it was read, it would be possible for the value after the update to be used in the backward pass, leading to incorrect gradients for the layers closer to the input of the network.
103 | 
104 | These are just two examples of how it's much harder to reason about TensorFlow programs when the value of a variable can change after it was read.
105 | 
106 | Before resource handles, TensorFlow variables were represented using a "ref" edge. A ref edge is a pair of pointers, one to a Tensor and one to a mutex, owned by something other than the tf runtime. When an op expects a ref tensor, its input has to be a ref tensor; but when an op expects a non-ref tensor and its input is a ref tensor, the pointer is silently dereferenced. This means that normal tensor objects in the graph can silently alias a mutable tensor, and hence two ops with the same input can see it having different values. Which value will be seen can depend on execution-specific details such as whether the variables are on a local or remote device, and in general it's not easy to ensure that a read happens before or after a specific write.
107 | 
108 | 
109 | ### Internal resource variable ops
110 | 
111 | We will expose the internal ops used to implement ResourceVariable as tf.experimental.variable_operations (name TBD). This way users and libraries can, if they need to, modify the behavior of variables at will.
112 | 
113 | 
114 | ## Migration plan
115 | 
116 | The migration plan is roughly as follows. TODO(apassos): flesh out this section with cost estimates.
117 | 
118 | 
119 | 
120 | 1. Implement the abstract base class and factory function scope under the hood
121 | 1. Expose the factory function scope as tf.variable_creator_scope
122 | 1. LSC to change tf.variable_scope / tf.get_variable to tf.compat.v1.*
123 | 1. Removal of tf.variable_scope and tf.get_variable from the tf 2 namespace
124 | 1. Implement the subclass to be returned from tf.assign*
125 | 1. LSC to change tf.assign*(v, …) to v.assign*(...)
126 | 1. Change the implementation of tf.compat.v1.variable_scope to not rely on a per-graph variable scope stack
127 | 1. Remove the get_variable_scope and related public methods from tf.Graph (leaving them on tf.compat.v1.Graph)
128 | 1. Implement PartitionedVariable as a subclass of the tf.Variable interface
129 | 1. Add a partitioner scope to the tf 2.0 API
130 | 1. Add a deprecation warning to the tf.compat.v1 partitioned variable scope with a migration warning
131 | 1. [questionable] Implement a variable creator factory function which calls get_variable under the hood
132 | 1. Make this function active in all tf.compat.v1 endpoints which currently call get_variable (with a decorator, probably)
133 | 1. Change the behavior in tf2 to call tf.Variable (which will redirect to tf.get_variable in tf.compat.v1, keeping the existing behavior but cleaning the codebase)
134 | 1. [WARNING: checkpoint-breaking change] drop calls to variable_scope in parts of our API which use it. Right now they are: feature_column, rnn, canned estimators, optimizer slots, TPU estimator. Most can be replaced with judicious use of name= arguments
135 | 1. [optional] Implement tf v2 make_template which does not rely on variable_scope internally and uses a factory creator function to track and reuse variables
136 | 
137 | 
138 | ## Questions and Discussion Topics
139 | 
140 | 1. How should we deal with the deprecation of model building APIs?
141 | -------------------------------------------------------------------------------- /rfcs/20181016-optimizer-unification.md: -------------------------------------------------------------------------------- 1 | # TensorFlow 2.0: Optimizer unification 2 | 3 | | Status | Proposed | 4 | :-------------- |:---------------------------------------------------- | 5 | | **Author(s)** | Francois Chollet (fchollet@google.com) | 6 | | **Sponsor** | Martin Wicke (wicke@google.com) | 7 | | **Updated** | 2018-10-16 | 8 | 9 | --- 10 | 11 | ## Context 12 | 13 | Keras has its own set of optimizers, living in `tf.keras.optimizers` (e.g. `tf.keras.optimizers.Adam`, `tf.keras.optimizers.Adadelta`). TensorFlow also has its own set of optimizers, living in `tf.train` (internally named `tf.training`), e.g. `tf.train.AdamOptimizer`, `tf.train.AdadeltaOptimizer`. 14 | 15 | TensorFlow optimizers are now the recommended way to train tf.keras models, because: 16 | 1. they are required to support eager execution. 17 | 2. they are required to support Distribution Strategies. 18 | 19 | However, there are a number of key Keras features that are broken when using TensorFlow optimizers, due to limitations of the current TensorFlow optimizer API: 20 | 21 | 1) `model.save()`, for a model compiled with a TF optimizer, will include neither the optimizer configuration nor the optimizer state, which prevents users from restarting training from a saved model. This is due to: 22 | - TF Optimizer instances cannot be serialized (cloned). 23 | - TF Optimizer instances do not implement the Layer/Model API for weight loading/setting. 24 | 25 | 2) The callbacks `LearningRateScheduler` and `ReduceLROnPlateau` (dynamic adaptation of the optimizer's learning rate during training) will not work for a model compiled with a TF optimizer. This is because there is no way to dynamically adjust the hyperparameters of a TF Optimizer after instantiating it. 26 | 27 | 3) By forcing TF Optimizers for Keras training, we are asking users to take on additional complexity to use Keras. It's not enough to learn ML and NN and Keras and datasets and eager mode; now users also need to know the library of TF optimizers and how to configure them. This also breaks the marketing pitch of "You can run tf.keras just like the normal keras library, with only an import change". 28 | 29 | In addition, it is fairly confusing for users to have 2 distinct sets of optimizers with different feature sets. 30 | 31 | Thus we should seek to unify the Keras optimizer API and the TensorFlow optimizer API, by 1) extending the TensorFlow optimizer API, and 2) replacing the tf.keras optimizers with the upgraded TF optimizers. 32 | 33 | --- 34 | 35 | ## Objective 36 | 37 | - Unify the `tf.train` and `tf.keras.optimizers` APIs: 38 | - Make all TensorFlow optimizers JSON-serializable, and make it possible to save/restore their state. 39 | - Make it possible to dynamically modify the values of the hyperparameters of all TensorFlow optimizers, in particular the learning rate. 40 | - The current ways to achieve dynamic learning rates are 1) using an LR tensor with built-in decay, or 2) using a callable. Both of these approaches are limited (they do not support fully dynamic rates, e.g. adapting the rate based on the current loss decrease) and neither is intuitive. Doing `optimizer.lr = 0.2` at arbitrary points during training is eager-first and more user-friendly.
41 | - Have a single set of optimizers (same signatures, same objects, no wrappers), introduced as a new set of classes with an updated API, importable from `tf.keras.optimizers`. These optimizers would be based on the existing `tf.contrib.optimizer_v2` optimizers (which themselves are based on the `tf.train` optimizers). 42 | 43 | 44 | The old optimizers will exist in tf.compat.v1 as-is. 45 | 46 | The known breaking changes are: 47 | - Due to name changes, old checkpoints would not be loadable with the new optimizers. This is opt-in: your checkpoint won't break until you start using the new optimizers in your code (you can always import the old optimizers from tf.compat.v1). 48 | - Some arguments are getting renamed. 49 | - The `use_locking` argument is removed. 50 | 51 | --- 52 | 53 | ## Design Proposal 54 | 55 | - Add a `get_config` method on every optimizer, as well as a `from_config` class method, to serialize / deserialize an optimizer (this does not include the weight values, i.e. the state, but only the hyperparameter values, i.e. the arguments that can be passed to the constructor). 56 | - Add `get_weights` and `set_weights` methods, to retrieve (or set) the optimizer’s state as a list of numpy arrays -- this is necessary for compatibility with the Keras API. 57 | - Add the ability to set the values of optimizer hyperparameters (i.e. the arguments that can be passed to the constructor) at any point in the lifetime of the optimizer, without having to reinstantiate it. In particular this includes the ability to change the value of the learning rate. 58 | - Add support for gradient clipping by norm and by value. 59 | - Disable reusing a single optimizer instance across multiple graphs. 60 | - Move the optimizer classes to `tf.keras.optimizers`, with revised signatures (see details below). 61 | 62 | 63 | --- 64 | 65 | ## Detailed Design 66 | 67 | ### I - Add a get_config method on every optimizer: 68 | 69 | ```python 70 | optimizer.get_config() 71 | ``` 72 | 73 | This method is already present on the Model class and every Layer class. 74 | 75 | **Returns:** 76 | - A JSON-serializable dictionary (it does not contain any non-serializable data such as tensors) containing the configuration of the optimizer, i.e. its constructor arguments. For instance, for Adadelta, this would look like `{'learning_rate': 0.1, 'rho': 0.95, 'epsilon': 1e-8, 'name': 'my_optimizer'}` 77 | 78 | 79 | ### II - Add a from_config class method on every optimizer (only needs a single implementation on the base class): 80 | 81 | ```python 82 | optimizer = Adadelta.from_config(config) 83 | ``` 84 | 85 | This method is already present on the Model class and every Layer class. This method is required for Keras compatibility. 86 | 87 | **Args:** 88 | - config: A dictionary, containing the same keys as what gets returned by `get_config`. 89 | 90 | **Returns:** 91 | - An optimizer instance with the desired configuration, effectively a clone of the original optimizer (minus its state, i.e. its weight values). 92 | 93 | 94 | ### III - Add a get_weights method on every optimizer (only needs a single implementation on the base class): 95 | 96 | ```python 97 | optimizer.get_weights() 98 | ``` 99 | 100 | This method is already present on the Model class and every Layer class. 101 | 102 | **Returns:** 103 | - A flat list of Numpy arrays, in deterministic order, where each array represents the value of an internal weight of the optimizer (such as the momentum of a model weight).
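As a quick illustration of how sections I and II compose, here is a hypothetical round trip; it is a sketch that assumes the proposed `Adadelta` class exists as described in this document:

```python
import json

optimizer = Adadelta(learning_rate=0.2)
config = optimizer.get_config()      # e.g. {'learning_rate': 0.2, 'rho': 0.95, ...}
payload = json.dumps(config)         # possible because the config is JSON-serializable
clone = Adadelta.from_config(json.loads(payload))  # same hyperparameters, fresh state
```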
104 | 105 | 106 | ### IV - Add a set_weights method on every optimizer (only needs a single implementation on the base class): 107 | 108 | ```python 109 | optimizer.set_weights(weights) 110 | ``` 111 | 112 | This method is already present on the Model class and every Layer class. This method is required for Keras compatibility. 113 | 114 | **Args:** 115 | - weights: A flat list of Numpy arrays, in deterministic order, the same as returned by get_weights. Note that since the optimizer creates its internal weights to match the set of weights it is trying to optimize, set_weights would only match get_weights when the set of weights being optimized is equivalent. E.g.: 116 | 117 | ```python 118 | optimizer = Adadelta() 119 | _ = optimizer.get_weights() # returns an empty list since the optimizer has no weights at that point 120 | model.compile(optimizer=optimizer, loss=loss) # Weights are created here 121 | weights = optimizer.get_weights() # Returns a list of numpy arrays 122 | optimizer.set_weights(weights) # This works! 123 | 124 | # This will not work since this optimizer would have a different set of weights 125 | different_model.optimizer.set_weights(weights) 126 | ``` 127 | 128 | Note: if the optimizer has been called on more than a single set of weights, we should disable `get_weights` and `set_weights` since their meaning would be ambiguous. 129 | 130 | 131 | ### V - Make all optimizer hyperparameters accessible via attributes (they currently aren’t retrievable): 132 | 133 | ```python 134 | optimizer = Adadelta(learning_rate=0.2) 135 | optimizer.learning_rate # returns the learning rate tensor 136 | ``` 137 | 138 | This should generally work for any numerical parameter that can be passed to the constructor. 139 | 140 | 141 | ### VI - Make the following work on every optimizer, in both eager and graph modes: 142 | 143 | ```python 144 | optimizer = Adadelta(learning_rate=0.2) 145 | optimizer.learning_rate = 0.1 146 | ``` 147 | 148 | This should generally work for any numerical parameter that can be passed to the constructor. 149 | 150 | In graph mode, this would require 1) creating TF variables for these parameters in the constructor, and 2) overriding `__setattr__` to do an assign on the target parameter using the default session. 151 | 152 | In eager mode, there are no issues. 153 | 154 | 155 | ### VII - Add support for gradient clipping by norm or by value 156 | 157 | The following arguments should be supported on all optimizers (this only requires a single shared implementation in the base class): 158 | 159 | ```python 160 | Adadelta(clip_norm=0.) 161 | Adadelta(clip_value=0.) 162 | ``` 163 | 164 | 165 | ### VIII - Unify optimizer signatures across Keras and tf.train. 166 | 167 | Optimizers would live in `tf.keras.optimizers`. The old optimizers would remain in `tf.compat.v1`. 168 | 169 | The set of new optimizers would be: 170 | 171 | - SGD (aliased to GradientDescent; corresponds to both GradientDescentOptimizer and MomentumOptimizer) 172 | - Adadelta 173 | - Adagrad 174 | - Adam 175 | - FTRL (not yet in Keras) 176 | - RMSProp 177 | - Adamax (not yet in TF) 178 | - Nadam (not yet in TF) 179 | 180 | We will remove `ProximalGradientDescent` and `ProximalAdagrad` (they will stay in `tf.compat.v1`). They do not appear to be used by a critical mass of users. 181 | 182 | The implementation of these optimizers would be essentially the same as that of the current TF optimizers, with slight occasional changes to support new functionality (rare).
However, the signature of these optimizers would change significantly, as described below. There would also be changes in the core Keras API. These changes would be made fully backwards compatible via API conversion decorators (similar to what we did when we changed the Keras API from 1.0 to 2.0) and would be replicated in both tf.keras and external Keras. 183 | 184 | Signature details below. 185 | 186 | 187 | ### SGD 188 | 189 | Current TF signatures: 190 | 191 | ```Python 192 | GradientDescentOptimizer(learning_rate, use_locking=False, name="GradientDescent") 193 | MomentumOptimizer(learning_rate, momentum, use_locking=False, name="Momentum", use_nesterov=False) 194 | ``` 195 | 196 | Current Keras signature: 197 | 198 | ```Python 199 | keras.optimizers.SGD(lr=0.01, momentum=0.0, decay=0.0, nesterov=False) 200 | ``` 201 | 202 | Proposed signature: 203 | 204 | ```Python 205 | SGD(learning_rate=0.001, 206 | momentum=0.0, 207 | decay=0.0, 208 | nesterov=False, 209 | name='SGD') 210 | ``` 211 | 212 | **Notes:** 213 | - Optimizers should not require positional arguments, especially if some do and some don’t (like now), and especially if the set of required positional arguments changes from optimizer to optimizer. For the best UX, all arguments should have a reasonable default value. 214 | - The implementation of SGD with/without momentum is not sufficiently different to justify two distinct classes. A single SGD class provides a better UX. 215 | - Public API arguments should not be about internal implementation details that cannot be readily understood by users (e.g. `use_locking`). 216 | 217 | ### Adadelta 218 | 219 | Current TF signature: 220 | 221 | ```Python 222 | AdadeltaOptimizer(learning_rate=0.001, rho=0.95, epsilon=1e-8, 223 | use_locking=False, name="Adadelta") 224 | ``` 225 | 226 | Current Keras signature: 227 | 228 | ```Python 229 | Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0) 230 | ``` 231 | 232 | Proposed signature: 233 | 234 | ```Python 235 | Adadelta(learning_rate=0.001, 236 | rho=0.95, 237 | epsilon=1e-8, 238 | decay=0.0, 239 | name="Adadelta") 240 | ``` 241 | 242 | **Notes:** 243 | - `epsilon=None` in Keras means “use the global default value for the epsilon fuzz factor” (typically `1e-7`). Should we also keep this behavior in the new API or should we have explicit values in the signatures? This applies to all optimizers. 
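As an aside before the remaining signatures, here is a rough sketch of the variable-backed hyperparameter mechanism described in sections V and VI. This is hypothetical illustration code, not the final implementation; in particular, `OptimizerBase` and `_init_hyperparameter` are invented names:

```python
import tensorflow as tf

class OptimizerBase:
    """Sketch: hyperparameters backed by variables (sections V and VI)."""

    def _init_hyperparameter(self, name, value):
        # Backing each hyperparameter with a variable lets it be reassigned
        # later without reinstantiating the optimizer.
        object.__setattr__(self, '_hyper_' + name, tf.Variable(value, name=name))

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. `optimizer.learning_rate`.
        hyper = self.__dict__.get('_hyper_' + name)
        if hyper is not None:
            return hyper
        raise AttributeError(name)

    def __setattr__(self, name, value):
        hyper = self.__dict__.get('_hyper_' + name)
        if hyper is not None:
            # In graph mode, the resulting assign op would be run in the
            # default session, per section VI.
            hyper.assign(value)
        else:
            object.__setattr__(self, name, value)

class Adadelta(OptimizerBase):
    def __init__(self, learning_rate=0.001, rho=0.95, epsilon=1e-8,
                 decay=0.0, name="Adadelta"):
        self.name = name
        for key, value in [('learning_rate', learning_rate), ('rho', rho),
                           ('epsilon', epsilon), ('decay', decay)]:
            self._init_hyperparameter(key, value)

opt = Adadelta(learning_rate=0.2)
print(float(opt.learning_rate))  # 0.2, read from the backing variable
opt.learning_rate = 0.1          # assigns in place; no re-instantiation needed
```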
244 | 245 | 246 | ### Adagrad 247 | 248 | Current TF signature: 249 | 250 | ```Python 251 | AdagradOptimizer(learning_rate, initial_accumulator_value=0.1, use_locking=False, name="Adagrad") 252 | ``` 253 | 254 | Current Keras signature: 255 | 256 | ```Python 257 | Adagrad(lr=0.01, epsilon=None, decay=0.0) 258 | ``` 259 | 260 | Proposed signature: 261 | 262 | ```Python 263 | Adagrad(learning_rate=0.001, 264 | epsilon=1e-8, 265 | decay=0.0, 266 | initial_accumulator_value=0.1, 267 | name="Adagrad") 268 | ``` 269 | 270 | ### Adam 271 | 272 | Current TF signature: 273 | 274 | ```Python 275 | AdamOptimizer(learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8, use_locking=False, name="Adam") 276 | ``` 277 | 278 | Current Keras signature: 279 | 280 | ```Python 281 | Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False) 282 | ``` 283 | 284 | Proposed signature: 285 | 286 | ```Python 287 | Adam(learning_rate=0.001, 288 | beta_1=0.9, 289 | beta_2=0.999, 290 | epsilon=1e-8, 291 | decay=0.0, 292 | amsgrad=False, 293 | name="Adam") 294 | ``` 295 | 296 | ### FTRL 297 | 298 | Current TF signature: 299 | 300 | ```Python 301 | FtrlOptimizer(learning_rate, 302 | learning_rate_power=-0.5, 303 | initial_accumulator_value=0.1, 304 | l1_regularization_strength=0.0, 305 | l2_regularization_strength=0.0, 306 | use_locking=False, 307 | name="Ftrl", 308 | accum_name=None, 309 | linear_name=None, 310 | l2_shrinkage_regularization_strength=0.0) 311 | ``` 312 | 313 | Proposed signature: 314 | 315 | ```Python 316 | FTRL(learning_rate, 317 | learning_rate_power=-0.5, 318 | initial_accumulator_value=0.1, 319 | l1_regularization_strength=0.0, 320 | l2_regularization_strength=0.0, 321 | name="FTRL", 322 | l2_shrinkage_regularization_strength=0.0) 323 | ``` 324 | 325 | 326 | ### RMSProp 327 | 328 | Current TF signature: 329 | 330 | ```Python 331 | RMSPropOptimizer(learning_rate, 332 | decay=0.9, 333 | momentum=0.0, 334 | epsilon=1e-10, 335 | use_locking=False, 336 | centered=False, 337 | name="RMSProp") 338 | ``` 339 | 340 | Current Keras signature: 341 | 342 | ```Python 343 | RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0) 344 | ``` 345 | 346 | Proposed signature: 347 | 348 | ```Python 349 | RMSProp(learning_rate=0.001, 350 | rho=0.9, 351 | epsilon=1e-8, 352 | decay=0.0, 353 | centered=False, 354 | name="RMSProp") 355 | ``` 356 | 357 | **Notes:** 358 | - The `rho` argument was named `decay` in TF. The `decay` argument is a standard argument on all adaptive learning-rate optimizers. 359 | 360 | 361 | ### Adamax 362 | 363 | Current Keras signature: 364 | 365 | ```Python 366 | Adamax(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0) 367 | ``` 368 | 369 | Proposed signature: 370 | 371 | ```Python 372 | Adamax(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0, name="Adamax") 373 | ``` 374 | 375 | 376 | ### Nadam 377 | 378 | Current Keras signature: 379 | 380 | ```Python 381 | Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004) 382 | ``` 383 | 384 | Proposed signature: 385 | 386 | ```Python 387 | Nadam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0, name="Nadam") 388 | ``` 389 | 390 | 391 | 392 | --- 393 | 394 | ## Questions and Discussion Topics 395 | 396 | - Do you have a use case where you need to reuse an optimizer across different sets of weights? (note: this will still be doable with this proposal) Describe your use case. 397 | - Do you use the `centered` or `initial_accumulator_value` arguments? 
398 | - Do you use the `use_locking` argument? Describe your use case. 399 | -------------------------------------------------------------------------------- /rfcs/20181214-move-to-addons.md: -------------------------------------------------------------------------------- 1 | # Move from tf.contrib to tensorflow/addons 2 | 3 | | Status | Accepted | 4 | :-------------- |:---------------------------------------------------- | 5 | | **Author(s)** | Sean Morgan (seanmorgan@outlook.com), Armando Fandango (armando@neurasights.com) | 6 | | **Sponsor** | Karmel Allison (karmel@google.com) | 7 | | **Updated** | 2018-12-16 | 8 | 9 | ## Objective 10 | 11 | With the upcoming removal of tf.contrib in TF 2.0, we are in the process 12 | of deciding what existing functionality will be moved to and maintained in 13 | the [tensorflow/addons](https://github.com/tensorflow/addons) 14 | repository. 15 | 16 | This document details what functionality the SIG plans to move and 17 | invites discussion around those decisions. 18 | 19 | 20 | ## Motivation 21 | 22 | In this RFC, we are soliciting discussion regarding what tf.contrib code 23 | should be moved to tensorflow/addons. This RFC discussion will help us 24 | determine the value of the code being moved and its 25 | maintainability. 26 | 27 | ## Design Proposal 28 | 29 | ### Criteria for moving 30 | 1) The functionality is not otherwise available in TensorFlow 31 | 1) There is sufficient interest in the community to maintain the code being moved 32 | 1) The code conforms to an established API pattern (some pieces can be refactored if needed) 33 | 34 | It is worth noting that just because some functionality isn't part of 35 | the initial move does not mean it won't eventually be part of addons 36 | if there is value. We will begin reviewing pull requests to the 37 | repository after the directory structure is shaped during the initial move.
38 | 39 | ### Code to be moved from tf.contrib to addons 40 | 41 | | Module (tf.contrib) | Class/Function | Rationale | 42 | |:----------------------- |:----------- |:------------------------------------ | 43 | | opt.external_optimizer | ExternalOptimizerInterface | Base class for external optimizers used in OSS projects | 44 | | opt.external_optimizer | ScipyOptimizerInterface | Significant usage in OSS projects | 45 | | opt.lazy_adam_optimizer | LazyAdamOptimizer | Significant usage in OSS projects / discussions | 46 | | opt.moving_average_optimizer | MovingAverageOptimizer | Significant usage in OSS projects | 47 | | layers.layers | dense_to_sparse | Useful functionality and discussion around it | 48 | | layers.layers | layer_norm | Heavily used in OSS projects / From impactful paper | 49 | | layers.layers | maxout | From impactful paper | 50 | | layers.layers | poincare_normalize | Functionality not available / Useful for hyperbolic embeddings | 51 | | layers.normalization | instance_norm | Heavily used in OSS projects / Used for style xfer | 52 | | layers.normalization | group_norm | Will be moved as a generalized case of layer_norm and instance_norm | 53 | | losses.metric_loss_ops | pairwise_distance | Useful functionality not otherwise available | 54 | | losses.metric_loss_ops | contrastive_loss | Useful functionality not otherwise available | 55 | | losses.metric_loss_ops | masked_maximum | Useful functionality not otherwise available | 56 | | losses.metric_loss_ops | masked_minimum | Useful functionality not otherwise available | 57 | | losses.metric_loss_ops | triplet_semihard_loss | Useful functionality not otherwise available / From impactful paper | 58 | | losses.metric_loss_ops | npairs_loss | Useful functionality not otherwise available | 59 | | losses.metric_loss_ops | npairs_loss_multilabel | Useful functionality not otherwise available | 60 | | losses.metric_loss_ops | lifted_struct_loss | Useful functionality not otherwise available | 61 | | sparsemax.sparsemax | ALL | Useful functionality not otherwise available / Volunteers to maintain | 62 | | image.dense_image_warp | dense_image_warp | Useful functionality not otherwise available | 63 | | image.distort_image_ops | random_hsv_in_yiq | Useful functionality not otherwise available | 64 | | image.distort_image_ops | adjust_hsv_in_yiq | Useful functionality not otherwise available | 65 | | image.image_ops | rotate | Useful functionality not otherwise available / Several uses in OSS found | 66 | | image.image_ops | translate | Useful functionality not otherwise available | 67 | | image.image_ops | angles_to_projective_transforms | Useful functionality not otherwise available / Several uses in OSS found | 68 | | image.image_ops | translations_to_projective_transforms | Useful functionality not otherwise available | 69 | | image.image_ops | transform | Useful functionality not otherwise available / Several uses in OSS found | 70 | | image.image_ops | compose_transforms | Useful functionality not otherwise available / Several uses in OSS found | 71 | | image.image_ops | flat_transforms_to_matrices | Helper util used a few times in module | 72 | | image.image_ops | matrices_to_flat_transforms | Helper util used a few times in module | 73 | | image.image_ops | connected_components | Useful functionality not otherwise available | 74 | | text.skip_gram_ops | ALL | Useful functionality not otherwise available | 75 | | crf.crf | ALL | Heavily used by the NLP community | 76 | | opt.weight_decay_optimizers |
DecoupledWeightDecayExtension | ~SOTA convergence speeds / Needs refactoring as Wrapper subclass | 77 | | opt.weight_decay_optimizers | AdamWOptimizer | ~SOTA convergence speeds / Needs refactoring as wrapper + keras Adam | 78 | | opt.weight_decay_optimizers | MomentumWOptimizer | ~SOTA convergence speeds / Needs refactoring as wrapper + keras SGD | 79 | 80 | ### Code that will not be moved from tf.contrib pending objections 81 | 82 | | Module (tf.contrib) | Class/Function | Rationale | 83 | |:----------------------- |:----------- |:------------------------------------ | 84 | | opt.addsign | AddSignOptimizer | No OSS uses found / Needs refactoring as OptimizerV2 subclass | 85 | | opt.agn_optimizer | AGNOptimizer | No OSS uses found / Needs refactoring as OptimizerV2 subclass | 86 | | opt.drop_stale_gradient_optimizer | DropStaleGradientOptimizer | No OSS uses found / Needs refactoring as Wrapper subclass | 87 | | opt.elastic_average_optimizer | ElasticAverageOptimizer | No OSS uses found / Needs refactoring as Wrapper subclass | 88 | | opt.ggt | GGTOptimizer | No OSS uses found | 89 | | opt.lars_optimizer | LARSOptimizer | No OSS uses found / Needs refactoring as OptimizerV2 subclass | 90 | | opt.shampoo | ShampooOptimizer | No OSS uses found / Needs refactoring as OptimizerV2 subclass | 91 | | opt.matrix_functions | matrix_inverse_pth_root | Used in opt.shampoo | 92 | | opt.model_average_optimizer | ModelAverageOptimizer | No OSS uses found / Needs refactoring as Wrapper subclass | 93 | | opt.multitask_optimizer_wrapper | MultitaskOptimizerWrapper | No OSS uses found / Needs refactoring as Wrapper subclass | 94 | | opt.multitask_optimizer_wrapper | clip_gradients_by_global_norm | No OSS uses found / Specific to MultitaskOptimizers / At least partly covered in Keras optimizer | 95 | | opt.powersign | PowerSignOptimizer | No OSS uses found / Needs refactoring as OptimizerV2 subclass | 96 | | opt.sign_decay | get_linear_decay_fn | No OSS usage / Used in AddSign & PowerSign | 97 | | opt.sign_decay | get_cosine_decay_fn | No OSS usage / Not an optimizer | 98 | | opt.sign_decay | get_restart_decay_fn | No OSS usage / Not an optimizer | 99 | | opt.reg_adagrad_optimizer | RegAdagradOptimizer | No OSS uses found / Needs refactoring as keras Adagrad subclass | 100 | | opt.variable_clipping_optimizer | VariableClippingOptimizer | No OSS uses found / Needs refactoring as Wrapper subclass / partially covered by keras norm clip | 101 | | opt.weight_decay_optimizers | ShampooWOptimizer | No OSS uses found | 102 | | opt.weight_decay_optimizers | extend_with_decoupled_weight_decay | No OSS uses found / Functional paradigm - factory function | 103 | | layers.embedding_ops | scattered_embedding_lookup_sparse | No OSS uses found | 104 | | layers.embedding_ops | embedding_lookup_unique | No OSS uses found | 105 | | layers.encoders | bow_encoder | Creates variables, but does not subclass Layer | 106 | | layers.encoders | embed_sequence | Creates variables, but does not subclass Layer | 107 | | layers.layers | convolution2d_in_plane | No OSS uses found | 108 | | layers.layers | GDN | No OSS uses found | 109 | | layers.layers | scale_gradient | No OSS uses found | 110 | | layers.layers | sequence_to_images | No OSS uses found | 111 | | layers.layers | spatial_softmax | One OSS project found / Needs refactoring as base Layer subclass / Uses get_variable_collections | 112 | | layers.optimizers | optimize_loss | Convenience wrapper to build a training op / Would need a refactor to stick to TF 2.0 APIs | 113 | | layers.optimizers |
adaptive_clipping_fn | No OSS uses found | 114 | | layers.rev_block_lib | RevBlock | No OSS uses found | 115 | | layers.rev_block_lib | recompute_grad | No OSS uses found | 116 | | layers.summaries | summarize_tensor | One OSS project found / Very simple wrapper | 117 | | layers.utils | constant_value | Simple wrapper... need a good reason to support | 118 | | nn.alpha_dropout | alpha_dropout | No OSS uses found / Needs refactoring as base Layer subclass | 119 | | nn.fwd_gradients | fwd_gradients | No OSS uses found | 120 | | nn.sampling_ops | rank_sampled_softmax_loss | One OSS use found / Needs to utilize sampled_softmax_loss_v2 | 121 | | nn.sampling_ops | sampled_sparse_softmax_loss | No OSS uses found / Needs to utilize sampled_softmax_loss_v2 | 122 | | nn.scaled_softplus | scaled_softplus | No OSS uses found | 123 | | losses.metric_loss_ops | update_1d_tensor | No OSS uses found / Large amount of code related to cluster_loss | 124 | | losses.metric_loss_ops | get_cluster_assignment | No OSS uses found / Large amount of code related to cluster_loss | 125 | | losses.metric_loss_ops | compute_facility_energy | No OSS uses found / Large amount of code related to cluster_loss | 126 | | losses.metric_loss_ops | compute_clustering_score | No OSS uses found / Large amount of code related to cluster_loss | 127 | | losses.metric_loss_ops | compute_augmented_facility_locations | No OSS uses found / Large amount of code related to cluster_loss | 128 | | losses.metric_loss_ops | update_medoid_per_cluster | No OSS uses found / Large amount of code related to cluster_loss | 129 | | losses.metric_loss_ops | update_all_medoids | No OSS uses found / Large amount of code related to cluster_loss | 130 | | losses.metric_loss_ops | compute_augmented_facility_locations_pam | No OSS uses found / Large amount of code related to cluster_loss | 131 | | losses.metric_loss_ops | compute_gt_cluster_score | No OSS uses found / Large amount of code related to cluster_loss | 132 | | losses.metric_loss_ops | cluster_loss | No OSS uses found / Large amount of code related to cluster_loss | 133 | | image.image_ops | bipartite_match | No OSS uses found / Should live in linalg or somewhere else? | 134 | | image.interpolate_spline | interpolate_spline | One OSS use found / Should live in tf.signal?
| 135 | | image.single_image_random_dot_stereograms | single_image_random_dot_stereograms | No OSS uses found | 136 | | image.sparse_image_warp | sparse_image_warp | No OSS uses found | 137 | | resampler.resampler_ops | ALL | Pending community interest | 138 | | solvers | ALL | Pending community interest to maintain | 139 | | integrate | ALL | Pending community interest to maintain | 140 | 141 | ### Code that will not be copied from tf.contrib to addons and hence would not be available in either tf.contrib or addons 142 | 143 | | Module (tf.contrib) | Class/Function | Rationale | 144 | |:----------------------- |:----------- |:------------------------------------ | 145 | | opt.adamax | AdaMaxOptimizer | Available in tf.keras.optimizers | 146 | | opt.matrix_functions | matrix_square_root | Available as linalg_ops.matrix_square_root | 147 | | opt.nadam_optimizer | NadamOptimizer | Available in tf.keras.optimizers | 148 | | layers.embedding_ops | safe_embedding_lookup_sparse | Exists as tf.nn.safe_embedding_lookup_sparse | 149 | | layers.embedding_ops | embedding_lookup_sparse_with_distributed_aggregation | Replaced by embedding_lookup_sparse_v2 | 150 | | layers.feature_column | ALL | Better version available in tf.feature_column | 151 | | layers.initializers | xavier_initializer | tf.keras has a glorot_normal and glorot_uniform | 152 | | layers.initializers | variance_scaling_initializer | Exists in tf.keras.initializers | 153 | | layers.layers | avg_pool2d | Exists in tf.keras.layers | 154 | | layers.layers | avg_pool3d | Exists in tf.keras.layers | 155 | | layers.layers | batch_norm | Exists in tf.keras.layers | 156 | | layers.layers | bias_add | Exists in tf.keras.layers | 157 | | layers.layers | conv1d | Exists in tf.keras.layers | 158 | | layers.layers | conv2d | Exists in tf.keras.layers | 159 | | layers.layers | conv3d | Exists in tf.keras.layers | 160 | | layers.layers | conv2d_in_plane | Functional Alias | 161 | | layers.layers | conv2d_transpose | Exists in tf.keras.layers | 162 | | layers.layers | conv3d_transpose | Exists in tf.keras.layers | 163 | | layers.layers | convolution | Exists in tf.keras.layers | 164 | | layers.layers | convolution1d | Exists in tf.keras.layers | 165 | | layers.layers | convolution2d | Exists in tf.keras.layers | 166 | | layers.layers | convolution2d_transpose | Exists in tf.keras.layers | 167 | | layers.layers | convolution3d | Exists in tf.keras.layers | 168 | | layers.layers | convolution3d_transpose | Exists in tf.keras.layers | 169 | | layers.layers | dropout | Exists in tf.keras.layers | 170 | | layers.layers | elu | Exists in tf.keras.layers | 171 | | layers.layers | flatten | Exists in tf.keras.layers | 172 | | layers.layers | fully_connected | Exists in tf.keras.layers | 173 | | layers.layers | gdn | Functional interface of GDN | 174 | | layers.layers | images_to_sequence | No OSS uses found / Functional paradigm | 175 | | layers.layers | linear | Exists in tf.keras.layers | 176 | | layers.layers | pool | Exists in tf.keras.layers | 177 | | layers.layers | max_pool2d | Exists in tf.keras.layers | 178 | | layers.layers | max_pool3d | Exists in tf.keras.layers | 179 | | layers.layers | one_hot_encoding | Exists in tf.keras / Uses collections | 180 | | layers.layers | relu | Exists in tf.keras.layers | 181 | | layers.layers | relu6 | Exists in tf.keras.layers | 182 | | layers.layers | repeat | Exists as sequential model | 183 | | layers.layers | separable_conv2d | Exists in tf.keras.layers | 184 | | layers.layers | separable_convolution2d |
Exists in tf.keras.layers | 185 | | layers.layers | softmax | Exists in tf.keras.layers | 186 | | layers.layers | stack | Exists as sequential model / Uses variable scoping | 187 | | layers.layers | unit_norm | Exists in linalg | 188 | | layers.layers | legacy_fully_connected | Legacy layer | 189 | | layers.layers | legacy_linear | Legacy layer | 190 | | layers.layers | legacy_relu | Legacy layer | 191 | | layers.regularizers | l1_regularizer | Available in tf.keras.regularizers | 192 | | layers.regularizers | l2_regularizer | Available in tf.keras.regularizers | 193 | | layers.regularizers | l1_l2_regularizer | Available in tf.keras.regularizers | 194 | | layers.regularizers | sum_regularizer | Trivial convenience wrapper | 195 | | layers.regularizers | apply_regularization | Uses collections | 196 | | layers.rev_block_lib | rev_block | Functional paradigm for RevBlock | 197 | | layers.summaries | summarize_tensors | Trivial list comprehension | 198 | | layers.summaries | summarize_collection | Uses collections | 199 | | layers.summaries | summarize_activations | Uses collections | 200 | | layers.target_column | ALL | Deprecated since Estimators | 201 | | layers.utils | collect_named_output | Unsupported tensor alias API | 202 | | layers.utils | append_tensor_alias | Unsupported tensor alias API | 203 | | layers.utils | gather_tensors_aliases | Unsupported tensor alias API | 204 | | layers.utils | get_tensor_aliases | Unsupported tensor alias API | 205 | | layers.utils | convert_collection_to_dict | Uses collections | 206 | | layers.utils | static_cond | Simple wrapper / No OSS use | 207 | | layers.utils | smart_cond | Simple wrapper / Little OSS use | 208 | | layers.utils | get_variable_collections | Uses collections | 209 | | layers.utils | channel_dimension | Simple wrapper / No OSS use | 210 | | layers.utils | last_dimension | Simple wrapper / No OSS use | 211 | | layers.utils | two_element_tuple | No OSS use | 212 | | layers.utils | n_positive_integers | No OSS use | 213 | | nn.cross_entropy | ALL | Deprecated Losses | 214 | | losses.loss_ops | ALL | Available in core tf.losses | 215 | 216 | 217 | 218 | **Notes:** 219 | * More details of our code review can be found in [this spreadsheet](https://docs.google.com/spreadsheets/d/1hYJchHp1y1t2U6htq5UXxMxWlGxxtOyyNHDF8_qhtQQ/edit#gid=185512613) 220 | * We used [this analysis tool](https://tf-contrib-analyzer.herokuapp.com/) to detect OSS usage. 221 | 222 | ## Questions and Discussion Topics 223 | 224 | * Are there any modules being excluded from the move that you feel have substantial value to the community? 225 | * Are there any new modules that you feel should be added to addons from somewhere other than tf.contrib? 226 | * We're actively collecting volunteers to help move, refactor, and/or maintain code in Addons (please reach out to our [mailing list](https://groups.google.com/a/tensorflow.org/forum/#!forum/addons) 227 | or [gitter channel](https://gitter.im/tensorflow/sig-addons) if you have interest in helping our community).
228 | 229 | ## After Request Notes 230 | * Now that the review period has ended, please post all suggested 231 | additions/removals directly to the tensorflow/addons [issues page](https://github.com/tensorflow/addons/issues) -------------------------------------------------------------------------------- /rfcs/20181217-tf2-random-numbers.md: -------------------------------------------------------------------------------- 1 | # Random numbers in TensorFlow 2.0 2 | 3 | | Status | Accepted | 4 | :-------------- |:---------------------------------------------------- | 5 | | **Author(s)** | Peng Wang (wangpeng@google.com), Josh Levenberg (joshl@google.com), Alexandre Passos (apassos@google.com), Asim Shankar (ashankar@google.com) | 6 | | **Sponsor** | Josh Levenberg (joshl@google.com), Alexandre Passos (apassos@google.com) | 7 | | **Updated** | 2019-01-30 | 8 | 9 | ## Objective 10 | 11 | We'd like to revamp the random number facilities in TensorFlow 2.0. 12 | 13 | * Replace the current stateful random ops, which keep state in the C++ OpKernel instance. For 2.0, all state should be moved into Resources, where it can be checkpointed, used to sequence access to the same state, managed when executing eagerly, etc. 14 | * Use stateless random ops where possible, to improve reproducibility and simplicity. For example, variable initializers should switch to stateless random ops, so that saving the initialization graph allows you to reproduce the same initial values. 15 | * Improve reproducibility 16 | * Random state is checkpointed by default. 17 | * Seeding isn't as sensitive to how many ops have been created in the graph so far. 18 | * Code written with eager execution should produce the same sequence when you switch to graph execution using `tf.function`. 19 | * Options for regenerating random tensors from a small amount of state. For example, dropout needs the large mask tensor used in the forward pass available in the backward pass, but we'd prefer not to hold on to it, tying up memory, in between. 20 | * We should switch to using the same RNG algorithm across devices, where possible. 21 | * We should reset the op seed any time we reset the global seed, to address [GitHub issue 9171](https://github.com/tensorflow/tensorflow/issues/9171). 22 | * Give the user greater control over the RNG algorithm used, to be able to select some combination of: 23 | * the same sequence across many different accelerator types 24 | * a fast implementation for a specific kind of accelerator 25 | * RNG strength (lack of observable regularities in the output) 26 | 27 | ## Motivation 28 | 29 | Switching how we do random numbers is going to break a lot of tests. We should do this once. Some of the changes are likely going to be API changes that can only happen at a major version transition, and we'd prefer to get them into 2.0 instead of waiting for 3.0. The current solution relies on using the current graph's op count, which is less unique when creating a new graph for each `tf.function`. 30 | 31 | ## Background 32 | 33 | We currently have: 34 | 35 | * `tf.set_random_seed()` to set a "graph" seed. This is currently global to a graph. This will become a "global" seed in 2.0 due to the migration away from graphs. 36 | * A bunch of stateful random ops (like `tf.random_uniform()`) that explicitly take an optional "op" seed and implicitly take the graph/global seed as attrs. If both of these seeds are zero, the kernel generates seeds nondeterministically.
The Python layer ensures that both seeds are zero only when neither the graph/global seed nor the op seed is specified (both are `None`). State is kept in the C++ kernel instance, so repeated executions of the kernel return different results. 37 | * A set of stateless random ops that have recently been moved from contrib to core. These take two seeds as input (as tensors, not attrs), but always produce the same output given the same input. 38 | 39 | The contract and implementation for the stateful ops are: 40 | 41 | * If you specify either the global or op seed (i.e. at least one is not `None`), then you get deterministic/reproducible behavior. 42 | * If you specify the global seed but not the op seed, different ops get different seeds, but everything is still deterministic/reproducible. [Currently](https://github.com/tensorflow/tensorflow/blob/00d91e7bc3111b00c2e679627362ec21dab64833/tensorflow/python/framework/random_seed.py#L39) this is generated using the count of ops in the current graph in graph-construction mode, and a pseudo-random sequence when executing eagerly. This pseudo-random sequence depends on a seed and the number of random ops executed (not all ops), see [Context.internal_operation_seed](https://github.com/tensorflow/tensorflow/blob/a3d634438e9cc70073faa796018b6173212e2f85/tensorflow/python/eager/context.py#L279). 43 | * If you specify neither the global nor the op seed (both are `None`), you get different random sequences every time, including different results if you restart the program without changing anything. Currently this is implemented by passing zero to both seed attrs to the kernel, which the kernel treats as a special case. If you set either the global or op seed, we make sure never to pass 0, 0 to the kernel, even if you say `tf.set_random_seed(0)`. 44 | * If you specify just the op seed, we use [DEFAULT_GLOBAL_SEED](https://github.com/tensorflow/tensorflow/blob/3eb7616b5459aec3dabaa4152a00de14a1fa0914/tensorflow/python/framework/random_seed.py#L29) for the global seed so you get deterministic behavior. 45 | 46 | ## Design Proposal 47 | 48 | The following represents the desired end-state, and doesn't go into detail about transitioning from our current stateful ops: 49 | 50 | ```python 51 | # random.py 52 | 53 | # A seed for random ops (stateful and stateless) will always be 1024 54 | # bits, all of which will be sent to the C++ code. The actual C++ 55 | # implementation of some algorithms may only use a lower part of the bits. 56 | # *QUESTION*: Is 1024 a good number? 57 | # *DECISION*: Yes. 58 | 59 | @tf_export("random.non_deterministic_seed") 60 | def non_deterministic_seed(): # returns an integer 61 | # *QUESTION*: Is this pure Python or an op? 62 | # *DECISION*: Op. 63 | 64 | # *QUESTION*: Should this be public? 65 | # *DECISION*: Yes. 66 | # *QUESTION*: Should this function be usable inside tf.function? 67 | # *DECISION*: Yes. 68 | @tf_export("random.create_rng_state") 69 | def create_rng_state(seed, algorithm): 70 | # seed must be an integer or stateless seed, never None. 71 | # Returns a 1-D tensor whose size depends on the algorithm. 72 | 73 | @tf_export("random.Generator") 74 | class Generator(Checkpointable): 75 | 76 | # *QUESTION*: Should this function be usable inside tf.function? 77 | # *DECISION*: Yes. 78 | def __init__(self, copy_from=None, seed=None, algorithm=None): 79 | if copy_from is None: 80 | if seed is None: 81 | seed = non_deterministic_seed() 82 | if algorithm is None: 83 | algorithm = ...
# auto-select 84 | self._state_var = tf.Variable(create_rng_state(seed, algorithm)) 85 | self._alg_var = tf.Variable(algorithm) 86 | else: 87 | assert seed is None 88 | self._state_var = tf.Variable(copy_from.state) 89 | self._alg_var = tf.Variable(copy_from.algorithm) 90 | 91 | # *QUESTION*: Should this function be usable inside tf.function? 92 | # *DECISION*: Yes. 93 | def reset(self, seed): 94 | # Will be able to also change algorithm in the future 95 | state = create_rng_state(seed, self.algorithm) 96 | self._state_var.assign(state) 97 | 98 | @property 99 | def state(self): 100 | return self._state_var 101 | 102 | @property 103 | def algorithm(self): 104 | return self._alg_var 105 | 106 | # The following functions return a tensor and as a side effect update 107 | # self._state_var. 108 | def uniform(self, shape, minval=0, maxval=None, dtype=tf.float32, name=None): 109 | def normal(self, shape, mean=0.0, stddev=1.0, dtype=tf.float32, name=None): 110 | def make_seeds(self, shape=()): # generates seeds for stateless random ops 111 | def make_generators(self, count=1, name=None): 112 | # Returns a list of `count` independent `Generator` objects 113 | # ... 114 | # How to use `Generator` with distribution strategies: 115 | # - If the generator is created outside of the distributed portion, no 116 | # special treatment is needed. 117 | # - If the generator is created within the distributed portion, its 118 | # variables always get mirrored. 119 | # - If you want per-replica unsynced generators, you need to explicitly 120 | # create the generators (where len(generators)==len(replicas)) and send 121 | # them to the replicas via the `args` argument of 122 | # `DistributionStrategyExtended.call_for_each_replica`. 123 | 124 | global_generator = Generator() 125 | 126 | # This function discards the old Generator object (and the variables within), 127 | # which may be problematic with tf.function because the old object may be 128 | # captured by a 'tf.function'ed function and still be used by it. 129 | # A 'tf.function'ed function only keeps weak references to variables, 130 | # so deleting a variable and then calling that function again may raise an 131 | # error. 132 | @tf_export("random.set_global_generator") 133 | def set_global_generator(generator): 134 | global global_generator 135 | global_generator = generator 136 | 137 | @tf_export("random.get_global_generator") 138 | def get_global_generator(): 139 | return global_generator 140 | 141 | @tf_export("random.default_algorithm") 142 | def default_algorithm(): 143 | 144 | @tf_export("random.algorithms_for_device") 145 | def algorithms_for_device(device_type): 146 | """Returns a sequence of (algorithm, speed, strength) tuples.""" 147 | # Maybe run an op on that device to ask it 148 | 149 | @tf_export("random.algorithms_supported_on_all_devices") 150 | def algorithms_supported_on_all_devices(): 151 | # Pick some algorithms that we can then require all devices implement 152 | 153 | def make_seed_if_none(op_seed): 154 | global global_generator 155 | if op_seed is None: 156 | return global_generator.make_seeds() 157 | return op_seed 158 | 159 | @tf_export("initializer.random_uniform") 160 | class RandomUniform(Initializer): 161 | """Initializer that generates tensors with a uniform distribution...""" 162 | 163 | def __init__(self, minval=0, maxval=None, seed=None, dtype=dtypes.float32, 164 | algorithm=None): 165 | ...
# unchanged, except for the addition of `algorithm`: 166 | if algorithm is None: 167 | algorithm = default_algorithm() 168 | self.algorithm = algorithm 169 | 170 | def __call__(self, shape, dtype=None, partition_info=None): 171 | if dtype is None: 172 | dtype = self.dtype 173 | return stateless_random_ops.stateless_random_uniform( 174 | shape, make_seed_if_none(self.seed), self.minval, self.maxval, dtype, 175 | self.algorithm) 176 | ``` 177 | 178 | We would also remove the stateful random ops from the public 2.0 API, replacing them with the stateless versions or the `tf.random.Generator` above. 179 | 180 | This pretty well achieves our objectives: 181 | 182 | * `tf.random.Generator` keeps its state in resource variables: 183 | * the Python object owns the state 184 | * can be checkpointed, etc. 185 | * Uses stateless random ops in the random initializers. The stateless seed will be a constant if the `seed` argument to the initializer is set to a non-`None` value. Otherwise it will depend on the value produced by the global op RNG. 186 | * `tf.random.Generator`, whether used for op seed generation or used directly, should work the same in graph and eager execution. 187 | * Seeding of individual ops without an op seed is dependent on the number of calls to `tf.random.make_seed_if_none()`, not the number of ops in the graph. 188 | * `tf.random.Generator`'s state may be copied to another `Generator`. 189 | * Calling `tf.random.set_seed()` reinitializes the sequence of op seeds, addressing [GitHub issue 9171](https://github.com/tensorflow/tensorflow/issues/9171). 190 | * Switching to new RNG APIs is an opportunity to switch to a different RNG algorithm that can be efficiently implemented on both TPUs and GPUs. We include a number identifying the algorithm being used in the RNG state so we can be sure that different devices agree on which algorithm to use, or raise an error. 191 | * Symbols moved to the `tf.random` namespace. 192 | * Additional features, like batch seeds for the stateless random ops, to address DeepMind use cases. 193 | 194 | ## Questions and Discussion Topics 195 | 196 | * There is another design where there is a global variable called `global_seed`. Initializers will use it together with the op seed to determine the seed sent to the stateless random ops. The affected change is: 197 | ```python 198 | global_seed = None 199 | global_generator = Generator(seed=global_seed) 200 | DEFAULT_GLOBAL_SEED = 87654321 201 | 202 | @tf_export("random.set_seed") 203 | def set_seed(seed, algorithm=None): 204 | # reset the global seed and the global generator 205 | global global_seed, global_generator 206 | global_seed = seed 207 | if algorithm is None: 208 | algorithm = global_generator.algorithm 209 | global_generator = Generator(seed=seed, algorithm=algorithm) 210 | 211 | def _combine_seeds(global_seed, op_seed): 212 | # combines global_seed and op_seed into a seed for stateless random ops 213 | return tf.stack([global_seed, op_seed]) 214 | 215 | @tf_export("random.make_seed_if_none") 216 | def make_seed_if_none(op_seed): 217 | global global_seed, global_generator 218 | if op_seed is None: 219 | return global_generator.make_seeds() 220 | if global_seed is None: 221 | return _combine_seeds(DEFAULT_GLOBAL_SEED, op_seed) 222 | return _combine_seeds(global_seed, op_seed) 223 | ``` 224 | The motivation is to preserve the design in TensorFlow 1.x, which uses a global seed and an op seed. Do we want `global_seed`? 225 | * Decision: No need for `global_seed`.
226 | * The `RandomUniform` implementation shown above has the behavior that when `seed` is not `None`, multiple `__call__` invocations return the same result. This has the advantage that it makes it easy to initialize two layers the same way when you want to, and the downside that it makes it easy to accidentally initialize two layers the same way. An alternative implementation is that when `seed` is not `None`, `RandomUniform` creates a `Generator` instance from `seed`, stores it as a member, and draws samples from it. In this way, multiple `__call__` invocations return different results, but we can use `seed` to get determinism. Which of the two semantics do we want? 227 | * Decision: The first semantics (always return the same sequence when seeded). 228 | 229 | ## Design Review Notes 230 | 231 | 2019-01-17 232 | 233 | * Question: Differences between the CL with the implementation and the GitHub RFC? 234 | * The CL matches the RFC 235 | * All 4 Questions asked by the RFC now have answers (counting all `tf.function`-related questions as 1). 236 | * Minor questions (e.g. naming) newly raised as a result of the RFC have been responded to. 237 | * New big question: device placement. 238 | * Seed size: No one objects to 1024; the TF Probability team wants >= 256. 239 | * No provision for ever raising the limit, but we don't see a fundamental reason we can't use larger tensors later. 240 | * State size is separate, algorithm specific, not fixed at 1024 bits. 241 | * Question: Algorithm + state bundled together? 242 | * Makes it easier to have a single thing for reproducing a sequence. 243 | * Question: But what about changing the algorithm? 244 | * Should be supported by `Generator.reset`. Currently we have bugs related to changing the size of the variable (to match an algorithm's state size). 245 | * Decision: Using ops where there is a question, which means being compatible with `tf.function`. 246 | * Decision: If you use an initializer with a specified seed, you should get the same model if you reinitialize; if you leave the seed unspecified, you get a different initialization each time. 247 | * Note: we have replaced the old global seed with a global generator. 248 | * New big question: we used to assume that the global generator is on one device. How do we handle models on multiple devices? 249 | * We could allow communication to the single device to get random numbers, but it's slow and has high latency. 250 | * There are a couple of different ways of having one variable per device: either having multiple variables per generator (lazily adding them as you access the generator from new devices), or having multiple generators, one per device (one variable each) (here we are treating `_state_var` and `_alg_var` as one variable). 251 | * Question: Regarding determinism of splitting, can we say something about the sequence you get from a seed? 252 | * Decision: require explicit splitting (i.e. `Generator.make_generators`) until we have need for an automatic solution. 253 | * Question: Should the input pipeline use these random numbers? 254 | * Ex. `tf.data.Dataset.list_files` is not currently affected by this proposal. 255 | * A problem right now with the 1.x RNG ops: they behave differently for dynamic RNN vs. unrolling 256 | * Question: Interaction with `tf.distribute.Strategy`; you will get a mirrored variable if you use `MirroredStrategy`. 257 | * Probably bad for GANs. 258 | * Question: Checkpointing the mirrored state? 259 | * Checkpointing/reviving synced mirrored state is easy.
Checkpointing/reviving unsynced per-replica states is hard. 260 | * Suggestion: Require an explicit split if you are going to use random numbers in the training step, where you explicitly specify whether the generators you are using on each device should be in sync. Have an API for things like the dropout layer: "Give me a generator and it should be (synced/unsynced) across replicas." 261 | * Expectation is that users like `tf.probability` are fine being explicit and generally want the control. 262 | * Hopefully the decision to be explicit will make checkpointing straightforward; the harder case is unsynced across replicas -- what to do if the list of devices changes? 263 | * Word from [Allen](https://github.com/allenlavoie): we will at least get an error if the set of variables changes. 264 | --------------------------------------------------------------------------------