├── .gitignore ├── LICENSE.adoc ├── README.adoc ├── charter.adoc ├── meetings ├── 2020 │ ├── 2020-10-26-minutes.adoc │ ├── 2020-11-16-agenda.adoc │ ├── 2020-11-16-minutes.adoc │ ├── 2020-12-15-agenda.adoc │ └── 2020-12-15-minutes.adoc ├── 2021-01-04-agenda.adoc ├── 2021-01-04-minutes.adoc └── README.adoc └── projects ├── candidate-projects.adoc ├── gcc-files ├── extra-array-add-bin.txt ├── extra-array-add-gcc.txt ├── extra-array-add-patch.txt ├── extra-array-add-testcase.txt ├── large-frame-hack.txt ├── opt-si-di-sext.txt ├── opt-sign-extend.txt ├── slt-opt.testcase.c └── slt-opt.txt ├── gcc-optimizations.adoc ├── infrastructure-for-perf-tracking.adoc ├── linker-files ├── .gitignore ├── globvars.c └── prog.c ├── linker-optimizations.adoc └── prd-outline.adoc /.gitignore: -------------------------------------------------------------------------------- 1 | # Generated docs 2 | *.pdf 3 | *.html 4 | # Editor backup 5 | *~ 6 | -------------------------------------------------------------------------------- /LICENSE.adoc: -------------------------------------------------------------------------------- 1 | = Creative Commons Attribution 4.0 International License 2 | 3 | //// 4 | Document conventions: 5 | - one line per paragraph (don't fill lines - this makes changes clearer) 6 | - do not alter this standard text 7 | //// 8 | 9 | 10 | Creative Commons Corporation (“Creative Commons”) is not a law firm and does not provide legal services or legal advice. Distribution of Creative Commons public licenses does not create a lawyer-client or other relationship. Creative Commons makes its licenses and related information available on an “as-is” basis. Creative Commons gives no warranties regarding its licenses, any material licensed under their terms and conditions, or any related information. Creative Commons disclaims all liability for damages resulting from their use to the fullest extent possible. 
11 | 12 | == Using Creative Commons Public Licenses 13 | 14 | Creative Commons public licenses provide a standard set of terms and conditions that creators and other rights holders may use to share original works of authorship and other material subject to copyright and certain other rights specified in the public license below. The following considerations are for informational purposes only, are not exhaustive, and do not form part of our licenses. 15 | 16 | [horizontal] 17 | *Considerations for licensors*:: Our public licenses are intended for use by those authorized to give the public permission to use material in ways otherwise restricted by copyright and certain other rights. Our licenses are irrevocable. Licensors should read and understand the terms and conditions of the license they choose before applying it. Licensors should also secure all rights necessary before applying our licenses so that the public can reuse the material as expected. Licensors should clearly mark any material not subject to the license. This includes other CC-licensed material, or material used under an exception or limitation to copyright. https://wiki.creativecommons.org/wiki/Considerations_for_licensors_and_licensees#Considerations_for_licensors[More considerations for licensors]. 18 | 19 | *Considerations for the public*:: By using one of our public licenses, a licensor grants the public permission to use the licensed material under specified terms and conditions. If the licensor's permission is not necessary for any reason–for example, because of any applicable exception or limitation to copyright–then that use is not regulated by the license. Our licenses grant only permissions under copyright and certain other rights that a licensor has authority to grant. Use of the licensed material may still be restricted for other reasons, including because others have copyright or other rights in the material. 
A licensor may make special requests, such as asking that all changes be marked or described. Although not required by our licenses, you are encouraged to respect those requests where reasonable. https://wiki.creativecommons.org/Considerations_for_licensors_and_licensees#Considerations_for_licensees[More considerations for the public]. 20 | 21 | [[app_cc_by_4.0]] 22 | == Creative Commons Attribution 4.0 International Public License 23 | 24 | By exercising the Licensed Rights (defined below), You accept and agree to be bound by the terms and conditions of this Creative Commons Attribution 4.0 International Public License ("Public License"). To the extent this Public License may be interpreted as a contract, You are granted the Licensed Rights in consideration of Your acceptance of these terms and conditions, and the Licensor grants You such rights in consideration of benefits the Licensor receives from making the Licensed Material available under these terms and conditions. 25 | 26 | :numbered!: 27 | === Section 1--Definitions. 28 | 29 | a. *Adapted Material* means material subject to Copyright and Similar Rights that is derived from or based upon the Licensed Material and in which the Licensed Material is translated, altered, arranged, transformed, or otherwise modified in a manner requiring permission under the Copyright and Similar Rights held by the Licensor. For purposes of this Public License, where the Licensed Material is a musical work, performance, or sound recording, Adapted Material is always produced where the Licensed Material is synched in timed relation with a moving image. 30 | 31 | b. *Adapter's License* means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License. 32 | 33 | c. 
*Copyright and Similar Rights* means copyright and/or similar rights closely related to copyright including, without limitation, performance, broadcast, sound recording, and Sui Generis Database Rights, without regard to how the rights are labeled or categorized. For purposes of this Public License, the rights specified in Section 2(b)(1)-(2) are not Copyright and Similar Rights. 34 | 35 | d. *Effective Technological Measures* means those measures that, in the absence of proper authority, may not be circumvented under laws fulfilling obligations under Article 11 of the WIPO Copyright Treaty adopted on December 20, 1996, and/or similar international agreements. 36 | 37 | e. *Exceptions and Limitations* means fair use, fair dealing, and/or any other exception or limitation to Copyright and Similar Rights that applies to Your use of the Licensed Material. 38 | 39 | f. *Licensed Material* means the artistic or literary work, database, or other material to which the Licensor applied this Public License. 40 | 41 | g. *Licensed Rights* means the rights granted to You subject to the terms and conditions of this Public License, which are limited to all Copyright and Similar Rights that apply to Your use of the Licensed Material and that the Licensor has authority to license. 42 | 43 | h. *Licensor* means the individual(s) or entity(ies) granting rights under this Public License. 44 | 45 | i. *Share* means to provide material to the public by any means or process that requires permission under the Licensed Rights, such as reproduction, public display, public performance, distribution, dissemination, communication, or importation, and to make material available to the public including in ways that members of the public may access the material from a place and at a time individually chosen by them. 46 | 47 | j. 
*Sui Generis Database Rights* means rights other than copyright resulting from Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases, as amended and/or succeeded, as well as other essentially equivalent rights anywhere in the world. 48 | 49 | k. *You* means the individual or entity exercising the Licensed Rights under this Public License. Your has a corresponding meaning. 50 | 51 | === Section 2 – Scope 52 | 53 | a. *License grant*. 54 | 1. Subject to the terms and conditions of this Public License, the Licensor hereby grants You a worldwide, royalty-free, non-sublicensable, non-exclusive, irrevocable license to exercise the Licensed Rights in the Licensed Material to: 55 | A. reproduce and Share the Licensed Material, in whole or in part; and 56 | B. produce, reproduce, and Share Adapted Material. 57 | 2. _Exceptions and Limitations_. For the avoidance of doubt, where Exceptions and Limitations apply to Your use, this Public License does not apply, and You do not need to comply with its terms and conditions. 58 | 3. _Term_. The term of this Public License is specified in Section 6(a). 59 | 4. _Media and formats; technical modifications allowed_. The Licensor authorizes You to exercise the Licensed Rights in all media and formats whether now known or hereafter created, and to make technical modifications necessary to do so. The Licensor waives and/or agrees not to assert any right or authority to forbid You from making technical modifications necessary to exercise the Licensed Rights, including technical modifications necessary to circumvent Effective Technological Measures. For purposes of this Public License, simply making modifications authorized by this Section 2(a)(4) never produces Adapted Material. 60 | 5. _Downstream recipients_. 61 | A. _Offer from the Licensor – Licensed Material_. 
Every recipient of the Licensed Material automatically receives an offer from the Licensor to exercise the Licensed Rights under the terms and conditions of this Public License. 62 | B. _No downstream restrictions_. You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, the Licensed Material if doing so restricts exercise of the Licensed Rights by any recipient of the Licensed Material. 63 | 6. _No endorsement_. Nothing in this Public License constitutes or may be construed as permission to assert or imply that You are, or that Your use of the Licensed Material is, connected with, or sponsored, endorsed, or granted official status by, the Licensor or others designated to receive attribution as provided in Section 3(a)(1)(A)(i). 64 | 65 | b. *Other rights*. 66 | 1. Moral rights, such as the right of integrity, are not licensed under this Public License, nor are publicity, privacy, and/or other similar personality rights; however, to the extent possible, the Licensor waives and/or agrees not to assert any such rights held by the Licensor to the limited extent necessary to allow You to exercise the Licensed Rights, but not otherwise. 67 | 2. Patent and trademark rights are not licensed under this Public License. 68 | 3. To the extent possible, the Licensor waives any right to collect royalties from You for the exercise of the Licensed Rights, whether directly or through a collecting society under any voluntary or waivable statutory or compulsory licensing scheme. In all other cases the Licensor expressly reserves any right to collect such royalties. 69 | 70 | === Section 3 -- License Conditions. 71 | 72 | Your exercise of the Licensed Rights is expressly made subject to the 73 | following conditions. 74 | 75 | a. Attribution. 76 | 1. If You Share the Licensed Material (including in modified form), You must: 77 | A. 
retain the following if it is supplied by the Licensor with the Licensed Material: 78 | i) identification of the creator(s) of the Licensed Material and any others designated to receive attribution, in any reasonable manner requested by the Licensor (including by pseudonym if designated); 79 | ii) a copyright notice; 80 | iii) notice that refers to this Public License; 81 | iv) a notice that refers to the disclaimer of warranties; 82 | v) a URI or hyperlink to the Licensed Material to the extent reasonably practicable; 83 | B. indicate if You modified the Licensed Material and retain an indication of any previous modifications; and 84 | C. indicate the Licensed Material is licensed under this Public License, and include the text of, or the URI or hyperlink to, this Public License. 85 | 2. You may satisfy the conditions in Section 3(a)(1) in any reasonable manner based on the medium, means, and context in which You Share the Licensed Material. For example, it may be reasonable to satisfy the conditions by providing a URI or hyperlink to a resource that includes the required information. 86 | 3. If requested by the Licensor, You must remove any of the information required by Section 3(a)(1)(A) to the extent reasonably practicable. 87 | 4. If You Share Adapted Material You produce, the Adapter's License You apply must not prevent recipients of the Adapted Material from complying with this Public License. 88 | 89 | === Section 4 -- Sui Generis Database Rights. 90 | 91 | Where the Licensed Rights include Sui Generis Database Rights that apply to Your use of the Licensed Material: 92 | 93 | a. for the avoidance of doubt, Section 2(a)(1) grants You the right to extract, reuse, reproduce, and Share all or a substantial portion of the contents of the database; 94 | 95 | b. 
if You include all or a substantial portion of the database contents in a database in which You have Sui Generis Database Rights, then the database in which You have Sui Generis Database Rights (but not its individual contents) is Adapted Material; and 96 | 97 | c. You must comply with the conditions in Section 3(a) if You Share all or a substantial portion of the contents of the database. 98 | For the avoidance of doubt, this Section 4 supplements and does not replace Your obligations under this Public License where the Licensed Rights include other Copyright and Similar Rights. 99 | 100 | === Section 5 -- Disclaimer of Warranties and Limitation of Liability. 101 | 102 | a. *Unless otherwise separately undertaken by the Licensor, to the extent possible, the Licensor offers the Licensed Material as-is and as-available, and makes no representations or warranties of any kind concerning the Licensed Material, whether express, implied, statutory, or other. This includes, without limitation, warranties of title, merchantability, fitness for a particular purpose, non-infringement, absence of latent or other defects, accuracy, or the presence or absence of errors, whether or not known or discoverable. Where disclaimers of warranties are not allowed in full or in part, this disclaimer may not apply to You.* 103 | 104 | b. *To the extent possible, in no event will the Licensor be liable to You on any legal theory (including, without limitation, negligence) or otherwise for any direct, special, indirect, incidental, consequential, punitive, exemplary, or other losses, costs, expenses, or damages arising out of this Public License or use of the Licensed Material, even if the Licensor has been advised of the possibility of such losses, costs, expenses, or damages. Where a limitation of liability is not allowed in full or in part, this limitation may not apply to You.* 105 | 106 | c. 
The disclaimer of warranties and limitation of liability provided above shall be interpreted in a manner that, to the extent possible, most closely approximates an absolute disclaimer and waiver of all liability. 107 | 108 | === Section 6 -- Term and Termination. 109 | 110 | a. This Public License applies for the term of the Copyright and Similar Rights licensed here. However, if You fail to comply with this Public License, then Your rights under this Public License terminate automatically. 111 | 112 | b. Where Your right to use the Licensed Material has terminated under Section 6(a), it reinstates: 113 | 1. automatically as of the date the violation is cured, provided it is cured within 30 days of Your discovery of the violation; or 114 | 2. upon express reinstatement by the Licensor. 115 | 116 | c. For the avoidance of doubt, this Section 6(b) does not affect any right the Licensor may have to seek remedies for Your violations of this Public License. 117 | 118 | d. For the avoidance of doubt, the Licensor may also offer the Licensed Material under separate terms or conditions or stop distributing the Licensed Material at any time; however, doing so will not terminate this Public License. 119 | Sections 1, 5, 6, 7, and 8 survive termination of this Public License. 120 | 121 | === Section 7 -- Other Terms and Conditions. 122 | 123 | a. The Licensor shall not be bound by any additional or different terms or conditions communicated by You unless expressly agreed. 124 | 125 | b. Any arrangements, understandings, or agreements regarding the Licensed Material not stated herein are separate from and independent of the terms and conditions of this Public License. 126 | 127 | === Section 8 -- Interpretation. 128 | 129 | a. For the avoidance of doubt, this Public License does not, and shall not be interpreted to, reduce, limit, restrict, or impose conditions on any use of the Licensed Material that could lawfully be made without permission under this Public License. 
130 | 131 | b. To the extent possible, if any provision of this Public License is deemed unenforceable, it shall be automatically reformed to the minimum extent necessary to make it enforceable. If the provision cannot be reformed, it shall be severed from this Public License without affecting the enforceability of the remaining terms and conditions. 132 | 133 | c. No term or condition of this Public License will be waived and no failure to comply consented to unless expressly agreed to by the Licensor. 134 | 135 | d. Nothing in this Public License constitutes or may be interpreted as a limitation upon, or waiver of, any privileges and immunities that apply to the Licensor or You, including from the legal processes of any jurisdiction or authority. 136 | 137 | == Supplementary 138 | 139 | Creative Commons is not a party to its public licenses. Notwithstanding, Creative Commons may elect to apply one of its public licenses to material it publishes and in those instances will be considered the “Licensor.” The text of the Creative Commons public licenses is dedicated to the public domain under the https://creativecommons.org/publicdomain/zero/1.0/legalcode[CC0 Public Domain Dedication]. Except for the limited purpose of indicating that material is shared under a Creative Commons public license or as otherwise permitted by the Creative Commons policies published at https://creativecommons.org/policies[creativecommons.org/policies], Creative Commons does not authorize the use of the trademark “Creative Commons” or any other trademark or logo of Creative Commons without its prior written consent including, without limitation, in connection with any unauthorized modifications to any of its public licenses or any other arrangements, understandings, or agreements concerning use of licensed material. For the avoidance of doubt, this paragraph does not form part of the public licenses. 
140 | 141 | Creative Commons may be contacted at https://creativecommons.org/[creativecommons.org]. 142 | -------------------------------------------------------------------------------- /README.adoc: -------------------------------------------------------------------------------- 1 | = RISC-V Code Speed Optimization SIG = 2 | 3 | //// 4 | SPDX-License-Identifier: CC-BY-4.0 5 | 6 | Document conventions: 7 | - one line per paragraph (don't fill lines - this makes changes clearer) 8 | - Wikipedia heading conventions (First word only capitalized) 9 | - US spelling throughout. 10 | //// 11 | 12 | This is the home of the RISC-V Code Speed Optimization SIG. At present this has the status of a proposed SIG, with a candidate chair and co-chair, and a link:./charter.adoc[charter] in review. 13 | 14 | The SIG is developing a list of link:projects/candidate-projects.adoc[candidate projects] to pursue. All are invited to submit pull requests to extend the list of candidate projects. 15 | 16 | * Candidate Chair: Jeremy Bennett 17 | * Candidate Co-Chair: Wei Wu 18 | -------------------------------------------------------------------------------- /charter.adoc: -------------------------------------------------------------------------------- 1 | = Charter = 2 | RISC-V International Code Speed Optimization SIG 3 | :toc: 4 | :icons: font 5 | :numbered: 6 | :source-highlighter: rouge 7 | 8 | //// 9 | SPDX-License-Identifier: CC-BY-4.0 10 | 11 | Document conventions: 12 | - one line per paragraph (don't fill lines - this makes changes clearer) 13 | - Wikipedia heading conventions (First word only capitalized) 14 | - US spelling throughout. 15 | //// 16 | 17 | == Overview 18 | 19 | This document has been approved by the SIG membership at its meeting on 16 November 2020. It awaits ratification by the Technical Steering Committee. 
20 | 21 | == Scope 22 | 23 | The scope of the SIG is: 24 | 25 | * how to improve speed through ISA extensions 26 | * how to improve speed using the compilers and interpreters 27 | * how to improve speed using libraries 28 | 29 | The remit explicitly excludes consideration of hardware implementation strategies to keep the scope of the SIG to a reasonable size. Hardware implementation strategy is a big enough subject to deserve its own SIG. However this SIG may help develop best practice for those developing hardware. 30 | 31 | As a SIG, our role is to bring forward ideas and proposals. We will prepare these for our parent body, the Toolchain and Runtimes sub-committee. Any proposal that is accepted would then lead to a task group being created to bring the project to completion, and we would provide oversight of the task group's work. 32 | 33 | == Goals 34 | 35 | These provide the overarching framework within which we operate. 36 | 37 | 1. To increase the execution speed of programs running on RISC-V hardware from the smallest microcontroller to the largest HPC systems. 38 | 39 | 2. To support the development of a thriving commercial ecosystem for RISC-V technology which enhances code speed, both open source and proprietary. 40 | 41 | 3. To see any open source code speed technologies from this group incorporated and maintained in their upstream projects 42 | 43 | The second of these is particularly important. The task is too big to be done just by RISC-V International, so we must ensure commercial players find it worth their while to be committed. 44 | 45 | These goals then lead to some metrics 46 | 47 | 1. Benchmark speeds. We will need to define what benchmarks are appropriate, and since we are looking at systems of all sizes, we shall certainly need many benchmarks. 48 | 49 | 2. Number of companies offering RISC-V products and services within the remit of this SIG 50 | 51 | 3. Number of projects originating from this SIG which have been incorporated upstream. 
52 | 53 | == Commentary 54 | 55 | The types of proposals we might create are. 56 | 57 | 1. Areas where research is needed. For example we might want to explore whether an optimizing linker (beyond conventional relaxation) would be beneficial for RISC-V. We would scope the research and specify its goals and outcomes. The outcomes could then lead to us specifying a new development task. 58 | 59 | 2. Areas where development is needed. For example creation of hand-optimized emulation, standard C and standard math libraries. In this case we would provide an outline Request for Quotation (RFQ) specifying the deliverables. This may be a follow-on from earlier commissioned research. 60 | 61 | 3. Areas where processes need creating or improving. I have given an example of a process to establish vendor specific relocations, in order that we can have vendor specific tool chains that are consistent. 62 | 63 | For things that are of modest scope and generally of value the work might be commissioned by RISC-V International. However for larger projects we might have an additional role, which is to help formation of industry consortia to commission the work. An example of this might be delivery of optimizing Fortran compiler support in GCC and/or LLVM, something that is critical to the small number of RISC-V International members who are in the HPC space, and which represents tens of engineer years of work over an extended period. Any such consortium formation will be carried out in a completely neutral fashion, following all anti-trust rules. 64 | 65 | Some of these areas overlap with other groups, notably the Code Size Reduction SIG (compiler and library issues), the Managed Runtimes SIG (interpreted and JITable languages) and J-extension TG (pointer masking and sandboxing may be overlap areas). We shall communicate regularly with these groups, and agree who will lead on any overlapping issues. 
66 | 67 | == Document history 68 | [cols="<1,<2,<3,<4",options="header,pagewidth",] 69 | |================================================================================ 70 | | _Revision_ | _Date_ | _Author_ | _Modification_ 71 | | 0.01 | 19 October 2020 | 72 | 73 | Jeremy Bennett, 74 | Wei Wu | 75 | 76 | Initial version for discussion. 77 | 78 | | 0.10 | 26 October 2020 | 79 | 80 | Jeremy Bennett | 81 | 82 | Draft for review, incorporating comments from initial meeting and pending finalization on 16 November 2020. 83 | 84 | | 0.90 | 16 November 2020 | 85 | 86 | Jeremy Bennett | 87 | 88 | Final version approved by the meeting of 16 November to submit for ratification by TSC. 89 | 90 | |================================================================================ 91 | -------------------------------------------------------------------------------- /meetings/2020/2020-10-26-minutes.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Minutes = 3 | Monday 26 October 2020, 07:00 Pacific Time 4 | 5 | //// 6 | SPDX-License-Identifier: CC-BY-4.0 7 | 8 | Document conventions: 9 | - one line per paragraph (don't fill lines - this makes changes clearer) 10 | - Wikipedia heading conventions (First word only capitalized) 11 | - US spelling throughout. 12 | //// 13 | 14 | == Summary of actions 15 | 16 | - *Mark Himelstein* to advise how to access the free advertising for the commercial ecosystem 17 | - *Jeremy Bennett* to collect comments on the charter and prepare an updated version for the next meeting. 18 | - *Jeremy Bennett* to create the candidate project list in GitHub, so it can be updated by pull request and issue submission. 19 | - *ACTION:* Jim Wilson to provide his list of GCC improvements 20 | 21 | == Review of actions 22 | 23 | First meeting, so none! 24 | 25 | == Welcome 26 | 27 | Mark Himelstein, CTO RISC-V International introduced the context for the group. 
28 | 29 | Wei Wu: Candidate Co-chair, Head of PLCT, Chinese Academy of Sciences Institute of Software. Contributing to compilers, emulators and virtual machines. 30 | 31 | Jeremy Bennett: Candidate Chair, Chief Executive of Embecosm. Developer of open source compiler tool chains, processor models, operating systems and AI/ML. 32 | 33 | == Review the charter 34 | 35 | Mark Himelstein encouraged the group to develop best practice for hardware, even though hardware implementation itself is excluded from the scope. 36 | 37 | Mark Himelstein is establishing cross-group forums. He also noted that RISC-V has free advertising for the ecosystem. Reach out to ISVs. 38 | 39 | *ACTION:* Mark Himelstein to advise how to access the free advertising. 40 | 41 | Paolo Savini - metrics may overlap with other groups. Benchmarks for code size may also be useful for code speed. 42 | 43 | *ACTION:* Jeremy Bennett to collect comments on the charter and prepare an updated version for the next meeting. 44 | 45 | == Project development process 46 | 47 | Mark Himelstein noted that the definition of done is a community effort and subject to continuous improvement. Please suggest improvements from our group. 48 | 49 | == List of candidate projects 50 | 51 | Some of these may be in other groups. 52 | 53 | *ACTION:* Jeremy Bennett to create the candidate project list in GitHub, so it can be updated by pull request and issue submission. 
54 | 55 | === Research 56 | 57 | - extension of optimization in the linker 58 | - machine learning in the compiler 59 | 60 | === Development 61 | 62 | - compiler (GCC/LLVM/IAR) optimization for upcoming extensions (B, V, P, J, etc) 63 | - machine learning outside the compiler 64 | - superoptimization 65 | 66 | *ACTION:* Jim Wilson to provide his list of GCC improvements 67 | 68 | === Process 69 | 70 | - FSF copyright assignment in RISC-V mirror repositories 71 | - allocation of vendor specific linker relocations 72 | - `-menable-experimental-extensions` option for GCC 73 | - ongoing benchmarking (and regressions in benchmarking), including competitive analysis 74 | - documentation, for example to facilitate writing a scheduler, what can be standardized/parameterized, what are the "hints" in RISC-V documentation 75 | - buildbot infrastructure for open source CI and measurement 76 | 77 | == Dates of future meetings 78 | 79 | The group meets at 07:00 Pacific Time 80 | 81 | - Monday 16 November 2020 82 | - Monday 7 December 2020 83 | - Thereafter first Monday of the month throughout 2021 84 | 85 | Mark Himelstein noted we can have more meetings if we need to. 86 | 87 | == AOB 88 | 89 | None. 90 | 91 | Jeremy Bennett, Candidate Chair + 92 | Wei Wu, Candidate Co-chair 93 | -------------------------------------------------------------------------------- /meetings/2020/2020-11-16-agenda.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Agenda = 3 | Monday 16 November 2020, 07:00 Pacific Time 4 | 5 | //// 6 | SPDX-License-Identifier: CC-BY-4.0 7 | 8 | Document conventions: 9 | - one line per paragraph (don't fill lines - this makes changes clearer) 10 | - Wikipedia heading conventions (First word only capitalized) 11 | - US spelling throughout. 
12 | //// 13 | 14 | == Welcome 15 | 16 | == Review of actions 17 | 18 | * *Mark Himelstein* to advise how to access the free advertising for the commercial ecosystem 19 | 20 | ** information circulated via the mailing list. 21 | 22 | * *Jeremy Bennett* to collect comments on the charter and prepare an updated version for the next meeting. 23 | 24 | ** complete 25 | 26 | * *Jeremy Bennett* to create the candidate project list in GitHub, so it can be updated by pull request and issue submission. 27 | 28 | ** complete. 29 | 30 | * *Jim Wilson* to provide his list of GCC improvements. 31 | 32 | ** complete. 33 | 34 | == Finalize the charter 35 | 36 | This meeting will agree the final charter to be submitted to TSC for approval. link:https://github.com/riscv/riscv-code-speed-optimization/blob/main/charter.adoc[Current text] on GitHub. 37 | 38 | == List of candidate projects 39 | 40 | The meeting will review the current list of projects and suggest an initial prioritization. To be prioritized, a project must have an owner to drive it to completion. This is the link:https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/candidate-projects.adoc[current list]. 41 | 42 | The goal is that our next meeting will work up the two highest-priority projects in more detail. 43 | 44 | == Dates of future meetings 45 | 46 | The group meets at 07:00 Pacific Time 47 | 48 | * Monday 7 December 2020 49 | * Thereafter first Monday of the month throughout 2021 50 | 51 | Mark Himelstein noted we can have more meetings if we need to. 52 | 53 | == AOB 54 | 55 | None. 
56 | 57 | Jeremy Bennett, Candidate Chair + 58 | Wei Wu, Candidate Co-chair 59 | -------------------------------------------------------------------------------- /meetings/2020/2020-11-16-minutes.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Minutes = 3 | 4 | Monday 16 November 2020, 07:00 Pacific Time 5 | 6 | //// 7 | SPDX-License-Identifier: CC-BY-4.0 8 | 9 | Document conventions: 10 | - one line per paragraph (don't fill lines - this makes changes clearer) 11 | - Wikipedia heading conventions (First word only capitalized) 12 | - US spelling throughout. 13 | //// 14 | 15 | == Summary of actions 16 | 17 | * **Jeremy Bennett.** Clarify wording around facilitating consortia. 18 | * **Jeremy Bennett.** Submit final charter to TSC for ratification. 19 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC they need to own the list of software maintained by RISC-V International. 20 | * **Jeremy Bennett.** Update list of potential projects. 21 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC that they need to track libraries of interest to RISC-V International. 22 | 23 | == Review of actions 24 | 25 | * *Mark Himelstein* to advise how to access the free advertising for the commercial ecosystem 26 | 27 | ** complete - information circulated via the mailing list. 28 | 29 | * *Jeremy Bennett* to collect comments on the charter and prepare an updated version for the next meeting. 30 | 31 | ** complete 32 | 33 | * *Jeremy Bennett* to create the candidate project list in GitHub, so it can be updated by pull request and issue submission. 34 | 35 | ** complete. 36 | 37 | * *Jim Wilson* to provide his list of GCC improvements. 38 | 39 | ** complete. 40 | 41 | == Welcome 42 | 43 | Attendees introduced themselves. 44 | 45 | == Review the charter 46 | 47 | Question of whether we can facilitate as a consortium. Probably need to do by escalation. 
Forming consortia would need to be outside RISC-V International. 48 | 49 | **ACTION:** Jeremy Bennett. Clarify wording around facilitating consortia. 50 | 51 | Subject to clarification of the point above, approved _nem. con._ 52 | 53 | **ACTION:** Jeremy Bennett. Submit final charter to TSC for ratification. 54 | 55 | == List of candidate projects 56 | 57 | Add the following. 58 | 59 | * lots of language support. Set up baseline of code speed for each language 60 | 61 | ** see RISC-V software list maintained by RISC-V International 62 | ** This should be owned by Toolchain & Runtimes SC 63 | ** **ACTION:** Jeremy Bennett. Advise Toolchain & Runtimes SC they need to own the list of software maintained by RISC-V International 64 | ** Chinese Academy of Sciences PLCT Lab have infrastructure 65 | 66 | * need list of benchmarks (see existing proposed project). 67 | 68 | * loop level profiling, compiler support for profiling 69 | 70 | * how does code density affect code speed, including cache impacts 71 | 72 | * LLVM optimizations 73 | 74 | * Other compiler optimizations 75 | 76 | **ACTION:** Jeremy Bennett. Update list of potential projects.
77 | 78 | === Prioritization 79 | 80 | Each person present was asked to choose their top two priorities for projects to work on 81 | 82 | [cols="<4,>1",options="header,pagewidth",] 83 | |============================================================================= 84 | | _Project_ | _Count_ 85 | | Continuous integration and test, benchmarking and tracing | 8 86 | | Compiler optimizations for upcoming extensions | 7 87 | | Allocation of vendor specific relocations | 3 88 | | GCC optimizations | 2 89 | | Linker related optimization | 2 90 | | Documentation | 1 91 | | Making best use of existing optimizations | 1 92 | |============================================================================= 93 | 94 | At our next meeting we will tackle the first two of these 95 | 96 | - **Wei Wu** (co-chair) will lead the discussion on continuous integration and test, benchmarking and tracing 97 | - **Jeremy Bennett** (chair) will lead the discussion on compiler optimizations for upcoming extensions 98 | 99 | == Dates of future meetings 100 | 101 | The group meets at 07:00 Pacific Time 102 | 103 | - Monday 7 December 2020 104 | - Thereafter first Monday of the month throughout 2021 105 | 106 | Mark Himelstein noted we can have more meetings if we need to. 107 | 108 | == AOB 109 | 110 | News from Chinese Academy of Sciences PLCT Lab 111 | 112 | - Now have OpenJDK with basic RV64G porting. 50KLOC. 113 | - Working on optimizing OpenCV. 114 | 115 | The SIG's role is to optimize libraries for speed; however, there is a wider question of tracking which libraries matter to RISC-V International. 116 | 117 | **ACTION:** Jeremy Bennett. Advise Toolchain & Runtimes SC that they need to track libraries of interest to RISC-V.
118 | 119 | Jeremy Bennett, Candidate Chair + 120 | Wei Wu, Candidate Co-chair 121 | -------------------------------------------------------------------------------- /meetings/2020/2020-12-15-agenda.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Agenda = 3 | Tuesday 15 December 2020, 08:00 Pacific Time 4 | 5 | //// 6 | SPDX-License-Identifier: CC-BY-4.0 7 | 8 | Document conventions: 9 | - one line per paragraph (don't fill lines - this makes changes clearer) 10 | - Wikipedia heading conventions (First word only capitalized) 11 | - US spelling throughout. 12 | //// 13 | 14 | == Welcome 15 | 16 | Attendees are invited to introduce themselves. 17 | 18 | == Review of actions 19 | 20 | * **Jeremy Bennett.** Clarify wording around facilitating consortia. 21 | 22 | ** Complete. 23 | 24 | * **Jeremy Bennett.** Submit final charter to TWG for ratification. 25 | 26 | ** Complete. Note that it is the Software Steering Committee which now ratifies. 27 | 28 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC they need to own the list of software maintained by RISC-V International. 29 | 30 | ** Complete. 31 | 32 | * **Jeremy Bennett.** Update list of potential projects. 33 | 34 | ** Complete. 35 | 36 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC that they need to track 37 | libraries of interest to RISC-V International. 38 | 39 | ** Complete. 40 | 41 | 42 | == Process 43 | 44 | The chair and co-chair will outline the process for developing proposals. This will involve developing formal requirements and a briefing note. 45 | 46 | == Project #1: Continuous integration, testing, trace and benchmarking 47 | 48 | The co-chair will lead a discussion of this project, with a view to developing an initial outline proposal for consideration by Toolchain & Runtimes Technical Subcommittee.
49 | 50 | == Project #2: Compiler optimizations for upcoming extensions 51 | 52 | The chair will lead a discussion of this project, with a view to developing an initial outline proposal for consideration by Toolchain & Runtimes Technical Subcommittee. 53 | 54 | == List of candidate projects 55 | 56 | Members are invited to consider additional projects to be added to the priority list. This is the link:https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/candidate-projects.adoc[current list]. The outstanding projects will be prioritized by polling members attending the meeting. 57 | 58 | Future meetings will progress current projects and typically consider initiating the next project on the list. 59 | 60 | == Dates of future meetings 61 | 62 | The group meets at 07:00 Pacific Time 63 | 64 | * Monday 4 January 2021 65 | * Monday 1 February 2021 66 | * Monday 1 March 2021 67 | * Monday 5 April 2021 68 | * Monday 3 May 2021 69 | * Monday 7 June 2021 70 | * Monday 5 July 2021 71 | * Monday 2 August 2021 72 | * Monday 6 September 2021 73 | * Monday 4 October 2021 74 | * Monday 1 November 2021 75 | * Monday 6 December 2021 76 | 77 | Subgroups developing project proposals are expected to hold additional meetings open to all between monthly meetings. 78 | 79 | == AOB 80 | 81 | Members are invited to raise any other items of business. 
82 | 83 | Jeremy Bennett, Candidate Chair + 84 | Wei Wu, Candidate Co-chair 85 | -------------------------------------------------------------------------------- /meetings/2020/2020-12-15-minutes.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Minutes = 3 | 4 | Monday 7 December 2020, 07:00 Pacific Time 5 | 6 | //// 7 | SPDX-License-Identifier: CC-BY-4.0 8 | 9 | Document conventions: 10 | - one line per paragraph (don't fill lines - this makes changes clearer) 11 | - Wikipedia heading conventions (First word only capitalized) 12 | - US spelling throughout. 13 | //// 14 | 15 | == Summary of actions 16 | 17 | * **Jeremy Bennett** to merge template PRD document into GitHub repository 18 | * **Wei Wu** to commit his draft PRD to the projects directory of the repository and solicit comments from this group and the wider RISC-V tech group 19 | * **Wei Wu** to lead a review at 4 January meeting, with a view to submitting to Toolchain & Runtimes committee on 14 January 20 | * **Jeremy Bennett** to provide an outline PRD for B extension speed optimization for review at the next meeting 21 | * **Jeremy Bennett** to add individual projects for each extension to the list of candidate projects 22 | * **Jeremy Bennett** to update list of potential projects 23 | 24 | == Welcome 25 | 26 | The anti-trust and code of conduct notice were presented. 27 | 28 | Attendees introduced themselves. 29 | 30 | == Review of actions 31 | 32 | * **Jeremy Bennett.** Clarify wording around facilitating consortia. 33 | 34 | ** Complete. 35 | 36 | * **Jeremy Bennett.** Submit final charter to TWG for ratification. 37 | 38 | ** Complete. Note that it is the Software Steering Committee which now ratifies. 39 | 40 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC they need to own the list of software maintained by RISC-V International. 41 | 42 | ** Complete.
43 | 44 | * **Jeremy Bennett.** Update list of potential projects. 45 | 46 | ** Complete. 47 | 48 | * **Jeremy Bennett.** Advise Toolchain & Runtimes SC that they need to track 49 | libraries of interest to RISC-V International. 50 | 51 | ** Complete. 52 | 53 | == Process 54 | 55 | Jeremy Bennett and Wei Wu presented the proposed PRD document to capture the details of a project. 56 | 57 | General agreement this is a useful approach. In many ways similar to a grant proposal. Noted that we could always change if the proposal does not meet needs. 58 | 59 | **ACTION:** Jeremy Bennett to merge template PRD document into GitHub repository. 60 | 61 | == Project #1: Continuous integration, testing, trace and benchmarking 62 | 63 | Wei Wu led a discussion of the project proposal. Key points: 64 | 65 | * Inspired by https://arewefastyet.com/[arewefastyet.com] website. 66 | * Phase 1 - set up infrastructure with one compiler, one board and 3 JS benchmarks 67 | * Phase 2 - add more boards, tool chains and benchmarks 68 | * Phase 3 - yet more boards, tool chains and libraries, allow RVI members to access and upload their projects 69 | * Timeline by August 2021, phases are milestones 70 | 71 | Questions and comments: 72 | 73 | * Currently uses bash scripts, but an OS-neutral technology, for example Python, would be better. 74 | 75 | **ACTION:** Wei Wu to commit his draft PRD to the projects directory of the repository and solicit comments from this group and the wider RISC-V tech group. 76 | 77 | **ACTION:** Wei Wu to lead a review at 4 January meeting, with a view to submitting to Toolchain & Runtimes committee on 14 January. 78 | 79 | == Project #2: Compiler optimizations for upcoming extensions 80 | 81 | Jeremy Bennett led a discussion of the project proposal. This needs to be a series of projects, one for each extension under development.
82 | 83 | Status of current extensions, by parent TG: 84 | 85 | * Alternative floating point formats TG 86 | 87 | ** `Zfh` is supported in GCC/binutils in RISC-V repository 88 | ** Posits no support yet 89 | 90 | * Bit manipulation TG 91 | 92 | ** Clang/LLVM experimental support upstream 93 | ** GCC/binutils ports in RISC-V repositories and out-of-tree from various suppliers 94 | 95 | * JIT TG 96 | 97 | ** Support from V8 by PLCT 98 | ** OpenJDK supported 99 | ** Contributions from Huawei, which have been open sourced. 100 | 101 | * DSP TG 102 | 103 | ** Andes have GCC support 104 | ** IAR support at assembler/intrinsic level 105 | ** Question over status of register pairs 106 | ** Note that documentation has many errors in assembler examples 107 | 108 | * Vector TG 109 | 110 | ** Support in RISC-V GCC/binutils repositories 111 | ** Experimental target support upstream in Clang/LLVM, now the primary focus. 112 | ** Two proposals for intrinsics 113 | 114 | * `Zfinx` TG 115 | 116 | Binutils/GCC/GDB in development by PLCT at Chinese Academy of Sciences 117 | 118 | It was agreed that there was little point in putting effort into optimization until the basic tool chain is stable. This leaves as candidates 119 | 120 | * `Zfh` extension 121 | * B extension 122 | * J extension 123 | * V extension 124 | 125 | In discussion it was agreed that there was limited opportunity for speedup of the `Zfh` extension, and that the J extension was of itself a speedup technology, rather than something needing to be sped up. 126 | 127 | The SIG decided that it made more sense to start with optimization of the B extension, since the V extension would be a huge project, and we should establish our credentials with a more tractable project. 128 | 129 | **ACTION:** Jeremy Bennett to provide an outline PRD for B extension speed optimization for review at the next meeting. 130 | 131 | **ACTION:** Jeremy Bennett to add individual projects for each extension to the list of candidate projects.
132 | 133 | == List of candidate projects 134 | 135 | It was requested to add the following projects 136 | 137 | * optimization for specific applications, e.g. OpenCV, crypto 138 | * a version of `-msave-restore` optimized for speed 139 | 140 | **ACTION:** Jeremy Bennett to update list of potential projects. 141 | 142 | === Reprioritization 143 | 144 | The meeting ran out of time to reprioritize its list of projects, so this will be considered at the next meeting. 145 | 146 | == Dates of future meetings 147 | 148 | The group meets at 07:00 Pacific Time 149 | 150 | * Monday 4 January 2021 151 | * Monday 1 February 2021 152 | * Monday 1 March 2021 153 | * Monday 5 April 2021 154 | * Monday 3 May 2021 155 | * Monday 7 June 2021 156 | * Monday 5 July 2021 157 | * Monday 2 August 2021 158 | * Monday 6 September 2021 159 | * Monday 4 October 2021 160 | * Monday 1 November 2021 161 | * Monday 6 December 2021 162 | 163 | Subgroups developing project proposals are expected to hold additional meetings open to all between monthly meetings. 164 | 165 | == AOB 166 | 167 | Jeremy Bennett, Candidate Chair + 168 | Wei Wu, Candidate Co-chair 169 | -------------------------------------------------------------------------------- /meetings/2021-01-04-agenda.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Agenda = 3 | Monday 4 January 2021, 07:00 Pacific Time 4 | 5 | //// 6 | SPDX-License-Identifier: CC-BY-4.0 7 | 8 | Document conventions: 9 | - one line per paragraph (don't fill lines - this makes changes clearer) 10 | - Wikipedia heading conventions (First word only capitalized) 11 | - US spelling throughout. 12 | //// 13 | 14 | == Welcome 15 | 16 | Attendees are invited to introduce themselves. 17 | 18 | == Review of actions 19 | 20 | * **Jeremy Bennett** to merge template PRD document into GitHub repository 21 | 22 | ** Complete.
23 | 24 | * **Wei Wu** to commit his draft PRD to the projects directory of the repository and solicit comments from this group and the wider RISC-V tech group 25 | 26 | ** Wei Wu to report. 27 | 28 | * **Wei Wu** to lead a review at 4 January meeting, with a view to submitting to Toolchain & Runtimes committee on 14 January 29 | 30 | ** See agenda item <<project-1>>. 31 | 32 | * **Jeremy Bennett** to provide an outline PRD for B extension speed optimization for review at the next meeting 33 | 34 | ** See agenda item <<project-2>>. 35 | 36 | * **Jeremy Bennett** to add individual projects for each extension to the list of candidate projects 37 | 38 | ** Complete. 39 | 40 | * **Jeremy Bennett** to update list of potential projects 41 | 42 | ** Complete. 43 | 44 | 45 | == Report back from Toolchain & Runtime SC 46 | 47 | The chair will report back on the Toolchain & Runtime Subcommittee, which has asked for changes to this group's charter. 48 | 49 | [[project-1]] 50 | == Project #1: Continuous integration, testing, trace and benchmarking 51 | 52 | The co-chair presented the draft PRD at our previous meeting. This session will review that proposal for consideration by Toolchain & Runtimes Technical Subcommittee on 14 January 2021. 53 | 54 | [[project-2]] 55 | == Project #2: Compiler optimizations for the bit manipulation extension 56 | 57 | The chair will present a draft PRD for consideration. A project lead to develop this proposal fully is sought. 58 | 59 | == List of candidate projects 60 | 61 | Members are invited to consider additional projects to be added to the priority list. This is the link:https://github.com/riscv/riscv-code-speed-optimization/blob/main/projects/candidate-projects.adoc[current list]. The outstanding projects will be prioritized by polling members attending the meeting. 62 | 63 | Future meetings will progress current projects and typically consider initiating the next project on the list.
64 | 65 | == Dates of future meetings 66 | 67 | The group meets at 07:00 Pacific Time 68 | 69 | * Monday 1 February 2021 70 | * Monday 1 March 2021 71 | * Monday 5 April 2021 72 | * Monday 3 May 2021 73 | * Monday 7 June 2021 74 | * Monday 5 July 2021 75 | * Monday 2 August 2021 76 | * Monday 6 September 2021 77 | * Monday 4 October 2021 78 | * Monday 1 November 2021 79 | * Monday 6 December 2021 80 | 81 | Subgroups developing project proposals are expected to hold additional meetings open to all between monthly meetings. 82 | 83 | == AOB 84 | 85 | Members are invited to raise any other items of business. 86 | 87 | Jeremy Bennett, Candidate Chair + 88 | Wei Wu, Candidate Co-chair 89 | -------------------------------------------------------------------------------- /meetings/2021-01-04-minutes.adoc: -------------------------------------------------------------------------------- 1 | :leveloffset: 1 2 | = RISC-V Code Speed Optimization SIG Meeting Minutes = 3 | 4 | Monday 4 January 2021, 07:00 Pacific Time 5 | 6 | //// 7 | SPDX-License-Identifier: CC-BY-4.0 8 | 9 | Document conventions: 10 | - one line per paragraph (don't fill lines - this makes changes clearer) 11 | - Wikipedia heading conventions (First word only capitalized) 12 | - US spelling throughout. 13 | //// 14 | 15 | == Summary of actions 16 | 17 | * **All** to propose wording to indicate this SIG's welcome for contributions from academia for consideration at the next SIG meeting. 18 | 19 | * **Wei Wu** to commit the final version of the Performance Tracking System PRD to the projects directory of the repository and submit to the next meeting of Toolchain & Runtimes committee on 14 January, so that a task group can be created to execute the project. 20 | 21 | * **Jeremy Bennett** to add Trace back to the list of candidate projects. 22 | 23 | * **Jeremy Bennett** to update the list of candidate projects with new projects and priorities. 
24 | 25 | == Welcome 26 | 27 | The anti-trust and code of conduct notice were presented. 28 | 29 | Attendees introduced themselves and agreed nem. con. that the minutes should record the list of attendees. 30 | 31 | * Erin Olson, Seagate, wants to learn about optimization 32 | * Evandro Menezes, Co-chair T&R 33 | * Jim Wilson, GCC guy at SiFive 34 | * Allen Baum, observing, Chair of arch test group 35 | * Lu Yahan, PLCT lab, Chinese Academy of Sciences 36 | * Mark Seligman, HPC compiler developer, individual method 37 | * Max Ma, compiler engineer SiFive, focus on LLVM 38 | * Michael Wong, CodePlay, ISO C++ committee, ex compiler optimization person 39 | * Mehmet Oguz Derin, undergraduate in mathematics, interested in computer graphics 40 | * Ren Guo, Kernel developer, Alibaba T-Head 41 | * Wei Wu, PLCT lab, Chinese Academy of Sciences, Co-chair 42 | * Jeremy Bennett, Embecosm, open source tool chain, operating system and modeling, chair 43 | 44 | == Review of actions 45 | 46 | * **Jeremy Bennett** to merge template PRD document into GitHub repository 47 | 48 | ** Complete. 49 | 50 | * **Wei Wu** to commit his draft PRD to the projects directory of the repository and solicit comments from this group and the wider RISC-V tech group 51 | 52 | ** Complete. 53 | 54 | * **Wei Wu** to lead a review at 4 January meeting, with a view to submitting to Toolchain & Runtimes committee on 14 January 55 | 56 | ** See agenda item <<project-1>>. 57 | 58 | * **Jeremy Bennett** to provide an outline PRD for B extension speed optimization for review at the next meeting 59 | 60 | ** See agenda item <<project-2>>. 61 | 62 | * **Jeremy Bennett** to add individual projects for each extension to the list of candidate projects 63 | 64 | ** Complete. 65 | 66 | * **Jeremy Bennett** to update list of potential projects 67 | 68 | ** Complete. 69 | 70 | == Report back from Toolchain & Runtime SC 71 | 72 | Jeremy Bennett attended the meeting on 17 December 2020.
73 | 74 | The SIG was asked to extend its charter to indicate that contributions from the academic community were welcome. 75 | 76 | **ACTION:** All to propose wording to indicate this SIG's welcome for contributions from academia for consideration at the next SIG meeting. 77 | 78 | 79 | [[project-1]] 80 | == Project #1: Performance Tracking System 81 | 82 | Wei Wu led a discussion of the final draft project proposal pending submission to Toolchain and Runtimes Subcommittee to create a TG to run this project. Key points: 83 | 84 | * The scope should include functional languages such as Haskell and OCaml which are important to the academic community. 85 | 86 | * This project only covers benchmarking. The original scope included continuous integration, trace and functional testing to go into a separate project. 87 | 88 | ** Allen Baum's new group is likely to have responsibility for functional testing - out of scope for this group. 89 | 90 | ** There is a separate group looking at continuous integration - out of scope for this group. 91 | 92 | * The importance of testing many different boards was emphasized. 93 | 94 | * Question was raised of whether to use docker images to improve reproducibility. Something for the task group to sort out. 95 | 96 | * The project needs more benchmarks than just Embench, such as SPEC CPU 2006 (licensing challenges). What about SPEC 2000 (quicker to run) - still relevant, but same licensing issues. Embench IoT suite is appropriate for small microcontrollers. What about Linpack and HPCG for the HPC community? Needs to be a generic framework, so it can include new benchmarks as they become available. 97 | 98 | * What about OS specific benchmarking? Need to include benchmarks of this sort in the performance tracking system. 99 | 100 | * Note that scope of original project included continuous integration, trace and functional test.
Continuous integration and functional test are the responsibility of other groups, but we should add trace back in to the list of projects. 101 | 102 | The group agreed to support submission to Toolchain and Runtimes SC nem. con. 103 | 104 | **ACTION:** Wei Wu to commit the final version of the Performance Tracking System PRD to the projects directory of the repository and submit to the next meeting of Toolchain & Runtimes committee on 14 January, so that a task group can be created to execute the project. 105 | 106 | **ACTION:** Jeremy Bennett to add Trace back to the list of candidate projects. 107 | 108 | [[project-2]] 109 | == Project #2: Compiler optimizations for the bit manipulation extension 110 | 111 | This item was deferred to a subsequent meeting due to lack of time. 112 | 113 | == List of candidate projects 114 | 115 | Add the following: 116 | 117 | * optimization for different run-time environments, e.g. V8, OpenJDK, SpiderMonkey 118 | 119 | * Glibc optimization 120 | 121 | * Characterizing what optimizations are RISC-V specific 122 | 123 | === Reprioritization 124 | 125 | The top two priorities are established: 126 | 127 | 1. Performance Tracking System 128 | 2. Compiler optimization for the B extension 129 | 130 | A poll of those present established the following priority for future projects (up to 2 votes each).
131 | 132 | [cols="<4,>1",options="header,pagewidth",] 133 | |============================================================================= 134 | | _Project_ | _Count_ 135 | | Compiler optimization for the V extension | 3 136 | | Generic GCC optimization | 3 137 | | Profiling | 3 138 | | Superoptimization | 3 139 | | Application specific optimizations (esp OpenCV) | 1 140 | | Generic LLVM optimization | 1 141 | | Glibc optimization | 1 142 | | Linker optimization | 1 143 | | Machine learning optimizations | 1 144 | |============================================================================= 145 | 146 | **ACTION:** Jeremy Bennett to update the list of candidate projects with new projects and priorities. 147 | 148 | == Dates of future meetings 149 | 150 | The group meets at 07:00 Pacific Time 151 | 152 | * Monday 1 February 2021 153 | * Monday 1 March 2021 154 | * Monday 5 April 2021 155 | * Monday 3 May 2021 156 | * Monday 7 June 2021 157 | * Monday 5 July 2021 158 | * Monday 2 August 2021 159 | * Monday 6 September 2021 160 | * Monday 4 October 2021 161 | * Monday 1 November 2021 162 | * Monday 6 December 2021 163 | 164 | Subgroups developing project proposals are expected to hold additional meetings open to all between monthly meetings. 165 | 166 | == AOB 167 | 168 | Jeremy Bennett, Candidate Chair + 169 | Wei Wu, Candidate Co-chair 170 | -------------------------------------------------------------------------------- /meetings/README.adoc: -------------------------------------------------------------------------------- 1 | = RISC-V Code Speed Optimization SIG Meetings = 2 | 3 | //// 4 | SPDX-License-Identifier: CC-BY-4.0 5 | 6 | Document conventions: 7 | - one line per paragraph (don't fill lines - this makes changes clearer) 8 | - Wikipedia heading conventions (First word only capitalized) 9 | - US spelling throughout. 10 | //// 11 | 12 | This directory holds the Asciidoc source of meeting minutes, agendas and other material.
However, the definitive location for all meeting minutes is the PDF held on the https://drive.google.com/drive/folders/1iSOCGbMp5bWYVvgs5GkeHmmHFS9y0jp1[Code Speed Optimization SIG Google Drive]. 13 | -------------------------------------------------------------------------------- /projects/candidate-projects.adoc: -------------------------------------------------------------------------------- 1 | = RISC-V Code Speed Optimization Candidate Projects 2 | 3 | //// 4 | SPDX-License-Identifier: CC-BY-4.0 5 | 6 | Document conventions: 7 | * one line per paragraph (don't fill lines - this makes changes clearer) 8 | * Wikipedia heading conventions (First word only capitalized) 9 | * US spelling throughout. 10 | //// 11 | 12 | Some of these may be handled by other groups. Please submit pull requests or issues to update this list. 13 | 14 | == Priorities 15 | 16 | The following projects are currently in progress 17 | 18 | 1. Performance Tracking System 19 | 2. Compiler optimization for the B extension 20 | 21 | The following priority list for the remaining projects was most recently reviewed at the SIG meeting of 4 January 2021, where each person present was asked to choose their top two priorities for projects to work on 22 | 23 | [cols="<4,>1",options="header,pagewidth",] 24 | |============================================================================= 25 | | _Project_ | _Count_ 26 | | Compiler optimization for the V extension | 3 27 | | Generic GCC optimization | 3 28 | | Profiling | 3 29 | | Superoptimization | 3 30 | | Application specific optimizations (esp OpenCV) | 1 31 | | Generic LLVM optimization | 1 32 | | Glibc optimization | 1 33 | | Linker optimization | 1 34 | | Machine learning optimizations | 1 35 | |============================================================================= 36 | 37 | == Candidate projects 38 | 39 | === Research 40 | 41 | * link:linker-optimizations.adoc[extension of optimization in the linker] 42 | * machine learning in the compiler 43
| * characterizing what optimizations are RISC-V specific 44 | 45 | === Development 46 | 47 | * a collection of general RISC-V link:gcc-optimizations.adoc[GCC optimizations] 48 | * compiler/library (GCC/LLVM/IAR) optimization for `Zfh` extension 49 | * compiler/library (GCC/LLVM/IAR) optimization for `B` extension 50 | * compiler/library (GCC/LLVM/IAR) optimization for `J` extension 51 | * compiler/library (GCC/LLVM/IAR) optimization for `P` extension 52 | * compiler/library (GCC/LLVM/IAR) optimization for `V` extension 53 | * compiler/library (GCC/LLVM/IAR) optimization for `Zfinx` extension 54 | * machine learning outside the compiler 55 | * superoptimization 56 | * compiler support for profiling, including loop level profiling 57 | * general LLVM optimizations 58 | * general optimizations for other compilers 59 | * `-menable-experimental-extensions` option for GCC 60 | * version of `-msave-restore` optimized for speed 61 | * compiler/library optimization for the V8 run-time environment 62 | * compiler/library optimization for the OpenJDK run-time environment 63 | * compiler/library optimization for the SpiderMonkey run-time environment 64 | * optimization for specific applications, e.g.
OpenCV, crypto 65 | * trace for optimization 66 | * Glibc optimization 67 | 68 | === Process 69 | 70 | * FSF copyright assignment in RISC-V mirror repositories 71 | * allocation of vendor specific linker relocations 72 | * ongoing benchmarking (and regressions in benchmarking), including competitive analysis, identifying suitable benchmarks to use and support for tracing 73 | * determination of baseline code speed benchmarks for each language supported by RISC-V 74 | * documentation, for example to facilitate writing a scheduler, what can be standardized/parameterized, what are the "hints" in RISC-V documentation 75 | * buildbot infrastructure for open source CI and measurement (tied into benchmarking and tracing) 76 | * determine the impact of code density on code speed, including cache impacts 77 | -------------------------------------------------------------------------------- /projects/gcc-files/extra-array-add-bin.txt: -------------------------------------------------------------------------------- 1 | diff --git a/bfd/bfd-in2.h b/bfd/bfd-in2.h 2 | index c64eee1..273f5b3 100644 3 | --- a/bfd/bfd-in2.h 4 | +++ b/bfd/bfd-in2.h 5 | @@ -4779,6 +4779,7 @@ number for the SBIC, SBIS, SBI and CBI instructions */ 6 | BFD_RELOC_RISCV_SET16, 7 | BFD_RELOC_RISCV_SET32, 8 | BFD_RELOC_RISCV_32_PCREL, 9 | + BFD_RELOC_RISCV_GPREL_ADD, 10 | 11 | /* Renesas RL78 Relocations. */ 12 | BFD_RELOC_RL78_NEG8, 13 | diff --git a/bfd/elfnn-riscv.c b/bfd/elfnn-riscv.c 14 | index b82e655..7b1acf1 100644 15 | --- a/bfd/elfnn-riscv.c 16 | +++ b/bfd/elfnn-riscv.c 17 | @@ -1799,6 +1799,7 @@ riscv_elf_relocate_section (bfd *output_bfd, 18 | case R_RISCV_COPY: 19 | case R_RISCV_JUMP_SLOT: 20 | case R_RISCV_RELATIVE: 21 | + case R_RISCV_GPREL_ADD: 22 | /* These require nothing of us at all.
*/ 23 | continue; 24 | 25 | @@ -3035,6 +3036,20 @@ _bfd_riscv_relax_lui (bfd *abfd, 26 | rel->r_info = ELFNN_R_INFO (sym, R_RISCV_GPREL_S); 27 | return TRUE; 28 | 29 | + case R_RISCV_GPREL_ADD: 30 | + { 31 | + bfd_vma add_insn = bfd_get_32 (abfd, contents + rel->r_offset); 32 | + /* ??? Verify that this is a 32-bit instruction. */ 33 | + if ((add_insn & 0x3) != 0x3) 34 | + abort (); 35 | + /* The HI20 part is always in RS2. */ 36 | + add_insn &= ~(OP_MASK_RS2 << OP_SH_RS2); 37 | + add_insn |= 0x3 << OP_SH_RS2; 38 | + bfd_put_32 (abfd, add_insn, contents + rel->r_offset); 39 | + rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); 40 | + } 41 | + break; 42 | + 43 | case R_RISCV_HI20: 44 | /* We can delete the unnecessary LUI and reloc. */ 45 | rel->r_info = ELFNN_R_INFO (0, R_RISCV_NONE); 46 | @@ -3370,6 +3385,7 @@ _bfd_riscv_relax_section (bfd *abfd, asection *sec, 47 | if (type == R_RISCV_CALL || type == R_RISCV_CALL_PLT) 48 | relax_func = _bfd_riscv_relax_call; 49 | else if (type == R_RISCV_HI20 50 | + || type == R_RISCV_GPREL_ADD 51 | || type == R_RISCV_LO12_I 52 | || type == R_RISCV_LO12_S) 53 | relax_func = _bfd_riscv_relax_lui; 54 | diff --git a/bfd/elfxx-riscv.c b/bfd/elfxx-riscv.c 55 | index 7d4f59f..043cfa3 100644 56 | --- a/bfd/elfxx-riscv.c 57 | +++ b/bfd/elfxx-riscv.c 58 | @@ -855,6 +855,21 @@ static reloc_howto_type howto_table[] = 59 | 0, /* src_mask */ 60 | MINUS_ONE, /* dst_mask */ 61 | FALSE), /* pcrel_offset */ 62 | + 63 | + /* For relaxing absolute addressing to gprel. 
*/ 64 | + HOWTO (R_RISCV_GPREL_ADD, /* type */ 65 | + 0, /* rightshift */ 66 | + 2, /* size */ 67 | + 32, /* bitsize */ 68 | + FALSE, /* pc_relative */ 69 | + 0, /* bitpos */ 70 | + complain_overflow_dont, /* complain_on_overflow */ 71 | + bfd_elf_generic_reloc, /* special_function */ 72 | + "R_RISCV_GPREL_ADD", /* name */ 73 | + TRUE, /* partial_inplace */ 74 | + 0, /* src_mask */ 75 | + 0, /* dst_mask */ 76 | + FALSE), /* pcrel_offset */ 77 | }; 78 | 79 | /* A mapping from BFD reloc types to RISC-V ELF reloc types. */ 80 | @@ -917,6 +932,7 @@ static const struct elf_reloc_map riscv_reloc_map[] = 81 | { BFD_RELOC_RISCV_SET16, R_RISCV_SET16 }, 82 | { BFD_RELOC_RISCV_SET32, R_RISCV_SET32 }, 83 | { BFD_RELOC_RISCV_32_PCREL, R_RISCV_32_PCREL }, 84 | + { BFD_RELOC_RISCV_GPREL_ADD, R_RISCV_GPREL_ADD }, 85 | }; 86 | 87 | /* Given a BFD reloc type, return a howto structure. */ 88 | diff --git a/bfd/libbfd.h b/bfd/libbfd.h 89 | index b810c40..34ba040 100644 90 | --- a/bfd/libbfd.h 91 | +++ b/bfd/libbfd.h 92 | @@ -2262,6 +2262,7 @@ static const char *const bfd_reloc_code_real_names[] = { "@@uninitialized@@", 93 | "BFD_RELOC_RISCV_SET16", 94 | "BFD_RELOC_RISCV_SET32", 95 | "BFD_RELOC_RISCV_32_PCREL", 96 | + "BFD_RELOC_RISCV_GPREL_ADD", 97 | "BFD_RELOC_RL78_NEG8", 98 | "BFD_RELOC_RL78_NEG16", 99 | "BFD_RELOC_RL78_NEG24", 100 | diff --git a/bfd/reloc.c b/bfd/reloc.c 101 | index 411f998..6459789 100644 102 | --- a/bfd/reloc.c 103 | +++ b/bfd/reloc.c 104 | @@ -5218,6 +5218,8 @@ ENUMX 105 | BFD_RELOC_RISCV_SET32 106 | ENUMX 107 | BFD_RELOC_RISCV_32_PCREL 108 | +ENUMX 109 | + BFD_RELOC_RISCV_GPREL_ADD 110 | ENUMDOC 111 | RISC-V relocations. 
112 | 113 | diff --git a/gas/config/tc-riscv.c b/gas/config/tc-riscv.c 114 | index 43ae21f..d54d831 100644 115 | --- a/gas/config/tc-riscv.c 116 | +++ b/gas/config/tc-riscv.c 117 | @@ -1296,6 +1296,7 @@ static const struct percent_op_match percent_op_stype[] = 118 | static const struct percent_op_match percent_op_rtype[] = 119 | { 120 | {"%tprel_add", BFD_RELOC_RISCV_TPREL_ADD}, 121 | + {"%gprel_add", BFD_RELOC_RISCV_GPREL_ADD}, 122 | {0, 0} 123 | }; 124 | 125 | @@ -2543,6 +2544,7 @@ md_apply_fix (fixS *fixP, valueT *valP, segT seg ATTRIBUTE_UNUSED) 126 | 127 | case BFD_RELOC_RISCV_CALL: 128 | case BFD_RELOC_RISCV_CALL_PLT: 129 | + case BFD_RELOC_RISCV_GPREL_ADD: 130 | relaxable = TRUE; 131 | break; 132 | 133 | diff --git a/include/elf/riscv.h b/include/elf/riscv.h 134 | index d036e83..2387eaf 100644 135 | --- a/include/elf/riscv.h 136 | +++ b/include/elf/riscv.h 137 | @@ -88,6 +88,7 @@ START_RELOC_NUMBERS (elf_riscv_reloc_type) 138 | RELOC_NUMBER (R_RISCV_SET16, 55) 139 | RELOC_NUMBER (R_RISCV_SET32, 56) 140 | RELOC_NUMBER (R_RISCV_32_PCREL, 57) 141 | + RELOC_NUMBER (R_RISCV_GPREL_ADD, 58) 142 | END_RELOC_NUMBERS (R_RISCV_max) 143 | 144 | /* Processor specific flags for the ELF header e_flags field. */ 145 | -------------------------------------------------------------------------------- /projects/gcc-files/extra-array-add-gcc.txt: -------------------------------------------------------------------------------- 1 | updated patch version that tries to add relocs for relaxation 2 | 3 | diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c 4 | index 9a9d9e1..b6e7d3b 100644 5 | --- a/gcc/config/riscv/riscv.c 6 | +++ b/gcc/config/riscv/riscv.c 7 | @@ -1035,6 +1035,18 @@ static rtx riscv_tls_add_tp_le (rtx dest, rtx base, rtx sym) 8 | return gen_tls_add_tp_lesi (dest, base, tp, sym); 9 | } 10 | 11 | +/* Add a register to a hi/lo pair. 
*/ 12 | + 13 | +static rtx riscv_gprel_add (rtx dest, rtx addend, rtx lo_sum_op) 14 | +{ 15 | + if (Pmode == DImode) 16 | + return gen_gprel_adddi (dest, addend, XEXP (lo_sum_op, 0), 17 | + XEXP (lo_sum_op, 1)); 18 | + else 19 | + return gen_gprel_addsi (dest, addend, XEXP (lo_sum_op, 0), 20 | + XEXP (lo_sum_op, 1)); 21 | +} 22 | + 23 | /* If MODE is MAX_MACHINE_MODE, ADDR appears as a move operand, otherwise 24 | it appears in a MEM of that mode. Return true if ADDR is a legitimate 25 | constant in that context and can be split into high and low parts. 26 | @@ -1245,6 +1257,55 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, 27 | return riscv_force_address (addr, mode); 28 | } 29 | 30 | + if (GET_CODE (x) == PLUS) 31 | + { 32 | + rtx op0 = XEXP (x, 0); 33 | + rtx op1 = XEXP (x, 1); 34 | + rtx addr; 35 | + rtx lo_sum_op = NULL_RTX; 36 | + rtx other_op = NULL_RTX; 37 | + 38 | + /* Catch an expression like (plus (reg) (label_ref:SI 0)) which would 39 | + normally be expanded to 40 | + lui a5,%hi(.L4) 41 | + addi a5,a5,%lo(.L4) 42 | + add a0,a0,a5 43 | + lw a5,0(a0) 44 | + We instead split out the low part and reorder to get 45 | + lui a5,%hi(.L4) 46 | + add a0,a0,a5 47 | + lw a5,%lo(.L4)(a0) */ 48 | + 49 | + if (GET_CODE (op0) == LO_SUM) 50 | + { 51 | + lo_sum_op = op0; 52 | + other_op = op1; 53 | + } 54 | + else if (riscv_split_symbol (NULL, op0, Pmode, &addr)) 55 | + { 56 | + lo_sum_op = addr; 57 | + other_op = op1; 58 | + } 59 | + else if (GET_CODE (op1) == LO_SUM) 60 | + { 61 | + lo_sum_op = op1; 62 | + other_op = op0; 63 | + } 64 | + else if (riscv_split_symbol (NULL, op1, Pmode, &addr)) 65 | + { 66 | + lo_sum_op = addr; 67 | + other_op = op0; 68 | + } 69 | + 70 | + if (lo_sum_op != NULL_RTX) 71 | + { 72 | + rtx dest = gen_reg_rtx (Pmode); 73 | + emit_insn (riscv_gprel_add (dest, force_reg (Pmode, other_op), 74 | + lo_sum_op)); 75 | + x = gen_rtx_LO_SUM (Pmode, dest, XEXP (lo_sum_op, 1)); 76 | + } 77 | + } 78 | + 79 | return x; 80 | } 81 | 82 | 
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md 83 | index 56fe516..4226b32 100644 84 | --- a/gcc/config/riscv/riscv.md 85 | +++ b/gcc/config/riscv/riscv.md 86 | @@ -36,6 +36,9 @@ 87 | ;; High part of PC-relative address. 88 | UNSPEC_AUIPC 89 | 90 | + ;; For relaxing hi/lo add pairs. 91 | + UNSPEC_GPREL_ADD 92 | + 93 | ;; Floating-point unspecs. 94 | UNSPEC_FLT_QUIET 95 | UNSPEC_FLE_QUIET 96 | @@ -1195,6 +1198,18 @@ 97 | [(set_attr "type" "arith") 98 | (set_attr "cannot_copy" "yes")]) 99 | 100 | +(define_insn "gprel_add" 101 | + [(set (match_operand:P 0 "register_operand" "=r") 102 | + (unspec:P 103 | + [(match_operand:P 1 "register_operand" "r") 104 | + (match_operand:P 2 "register_operand" "r") 105 | + (match_operand:P 3 "symbolic_operand" "")] 106 | + UNSPEC_GPREL_ADD))] 107 | + "" 108 | + "add\t%0,%1,%2,%%gprel_add(%3)" 109 | + [(set_attr "type" "arith") 110 | + (set_attr "mode" "")]) 111 | + 112 | ;; Instructions for adding the low 12 bits of an address to a register. 113 | ;; Operand 2 is the address: riscv_print_operand works out which relocation 114 | ;; should be applied. 
115 | -------------------------------------------------------------------------------- /projects/gcc-files/extra-array-add-patch.txt: -------------------------------------------------------------------------------- 1 | converts 2 | lui a5,%hi(array) 3 | addi a5,a5,%lo(array) 4 | add a0,a5,a0 5 | lbu a0,0(a0) 6 | into 7 | lui a5,%hi(array) 8 | add a0,a5,a0 9 | lbu a0,%lo(array)(a0) 10 | fails with linker relaxation, which gives 11 | 101ac: 00a78533 add a0,a5,a0 12 | 101b0: 9981c503 lbu a0,-1640(gp) # 11ba8 13 | needs a solution like the tls 4-operand add relocation/relaxation 14 | 15 | diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c 16 | index 2a8f87d..61e0cf7 100644 17 | --- a/gcc/config/riscv/riscv.c 18 | +++ b/gcc/config/riscv/riscv.c 19 | @@ -1245,6 +1245,55 @@ riscv_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED, 20 | return riscv_force_address (addr, mode); 21 | } 22 | 23 | + if (GET_CODE (x) == PLUS) 24 | + { 25 | + rtx op0 = XEXP (x, 0); 26 | + rtx op1 = XEXP (x, 1); 27 | + rtx addr; 28 | + rtx lo_sum_op = NULL_RTX; 29 | + rtx other_op = NULL_RTX; 30 | + 31 | + /* Catch an expression like (plus (reg) (label_ref:SI 0)) which would 32 | + normally be expanded to 33 | + lui a5,%hi(.L4) 34 | + addi a5,a5,%lo(.L4) 35 | + add a0,a0,a5 36 | + lw a5,0(a0) 37 | + We instead split out the low part and reorder to get 38 | + lui a5,%hi(.L4) 39 | + add a0,a0,a5 40 | + lw a5,%lo(.L4)(a0) */ 41 | + 42 | + if (GET_CODE (op0) == LO_SUM) 43 | + { 44 | + lo_sum_op = op0; 45 | + other_op = op1; 46 | + } 47 | + else if (riscv_split_symbol (NULL, op0, Pmode, &addr)) 48 | + { 49 | + lo_sum_op = addr; 50 | + other_op = op1; 51 | + } 52 | + else if (GET_CODE (op1) == LO_SUM) 53 | + { 54 | + lo_sum_op = op1; 55 | + other_op = op0; 56 | + } 57 | + else if (riscv_split_symbol (NULL, op1, Pmode, &addr)) 58 | + { 59 | + lo_sum_op = addr; 60 | + other_op = op0; 61 | + } 62 | + 63 | + if (lo_sum_op != NULL_RTX) 64 | + { 65 | + x = force_operand (gen_rtx_PLUS (Pmode, 
other_op, 66 | + XEXP (lo_sum_op, 0)), 67 | + NULL_RTX); 68 | + x = gen_rtx_LO_SUM (Pmode, x, XEXP (lo_sum_op, 1)); 69 | + } 70 | + } 71 | + 72 | return x; 73 | } 74 | 75 | -------------------------------------------------------------------------------- /projects/gcc-files/extra-array-add-testcase.txt: -------------------------------------------------------------------------------- 1 | /* Compile with -O to reproduce problem. */ 2 | char array[8] __attribute__ ((section(".sdata"))); 3 | int filler[100] __attribute__ ((used, section(".sdata"))); 4 | 5 | int __attribute__ ((noinline)) 6 | sub (int i) 7 | { 8 | return array[i]; 9 | } 10 | 11 | int 12 | main (void) 13 | { 14 | return sub (5); 15 | } 16 | -------------------------------------------------------------------------------- /projects/gcc-files/large-frame-hack.txt: -------------------------------------------------------------------------------- 1 | Accept add %hi for eliminable registers, so that when they are eliminated we 2 | can simplify the arithmetic. 3 | 4 | diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md 5 | index ae93788..c7aa350 100644 6 | --- a/gcc/config/riscv/constraints.md 7 | +++ b/gcc/config/riscv/constraints.md 8 | @@ -49,6 +49,11 @@ 9 | (and (match_code "const_int") 10 | (match_test "IN_RANGE (ival, 0, 31)"))) 11 | 12 | +(define_constraint "L" 13 | + "A U-type 20-bit signed immediate." 14 | + (and (match_code "const_int") 15 | + (match_test "(ival & 0xFFF) == 0"))) 16 | + 17 | ;; Floating-point constant +0.0, used for FCVT-based moves when FMV is 18 | ;; not available in RV32. 
19 | (define_constraint "G" 20 | diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md 21 | index 854af14..8559c23 100644 22 | --- a/gcc/config/riscv/predicates.md 23 | +++ b/gcc/config/riscv/predicates.md 24 | @@ -27,6 +27,14 @@ 25 | (ior (match_operand 0 "const_arith_operand") 26 | (match_operand 0 "register_operand"))) 27 | 28 | +(define_predicate "const_lui_operand" 29 | + (and (match_code "const_int") 30 | + (match_test "(INTVAL (op) & 0xFFF) == 0"))) 31 | + 32 | +(define_predicate "add_operand" 33 | + (ior (match_operand 0 "arith_operand") 34 | + (match_operand 0 "const_lui_operand"))) 35 | + 36 | (define_predicate "const_csr_operand" 37 | (and (match_code "const_int") 38 | (match_test "IN_RANGE (INTVAL (op), 0, 31)"))) 39 | @@ -51,6 +59,11 @@ 40 | (ior (match_operand 0 "const_0_operand") 41 | (match_operand 0 "register_operand"))) 42 | 43 | +;; For use in adds, when adding to an eliminable register. 44 | +(define_predicate "reg_or_const_int_operand" 45 | + (ior (match_code "const_int") 46 | + (match_operand 0 "register_operand"))) 47 | + 48 | ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate. 
49 | (define_predicate "branch_on_bit_operand" 50 | (and (match_code "const_int") 51 | diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h 52 | index 9718683..1e381fa 100644 53 | --- a/gcc/config/riscv/riscv-protos.h 54 | +++ b/gcc/config/riscv/riscv-protos.h 55 | @@ -63,6 +63,7 @@ extern void riscv_expand_conditional_branch (rtx, enum rtx_code, rtx, rtx); 56 | extern rtx riscv_legitimize_call_address (rtx); 57 | extern void riscv_set_return_address (rtx, rtx); 58 | extern bool riscv_expand_block_move (rtx, rtx, rtx); 59 | +extern bool riscv_eliminable_reg (rtx); 60 | extern rtx riscv_return_addr (int, rtx); 61 | extern HOST_WIDE_INT riscv_initial_elimination_offset (int, int); 62 | extern void riscv_expand_prologue (void); 63 | diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c 64 | index cb7dbdf..7fea35a 100644 65 | --- a/gcc/config/riscv/riscv.c 66 | +++ b/gcc/config/riscv/riscv.c 67 | @@ -3304,6 +3304,16 @@ riscv_initial_elimination_offset (int from, int to) 68 | return src - dest; 69 | } 70 | 71 | +/* Return true if X is a register that will be eliminated later on. */ 72 | +bool 73 | +riscv_eliminable_reg (rtx x) 74 | +{ 75 | + return REG_P (x) && (REGNO (x) == FRAME_POINTER_REGNUM 76 | + || REGNO (x) == ARG_POINTER_REGNUM 77 | + || (REGNO (x) >= FIRST_VIRTUAL_REGISTER 78 | + && REGNO (x) <= LAST_VIRTUAL_REGISTER)); 79 | +} 80 | + 81 | /* Implement RETURN_ADDR_RTX. We do not support moving back to a 82 | previous frame. 
*/ 83 | 84 | diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md 85 | index 7d63d81..1e8953a 100644 86 | --- a/gcc/config/riscv/riscv.md 87 | +++ b/gcc/config/riscv/riscv.md 88 | @@ -409,12 +409,56 @@ 89 | [(set_attr "type" "fadd") 90 | (set_attr "mode" "")]) 91 | 92 | -(define_insn "addsi3" 93 | - [(set (match_operand:SI 0 "register_operand" "=r,r") 94 | - (plus:SI (match_operand:SI 1 "register_operand" " r,r") 95 | - (match_operand:SI 2 "arith_operand" " r,I")))] 96 | +(define_expand "addsi3" 97 | + [(set (match_operand:SI 0 "register_operand" "") 98 | + (plus:SI (match_operand:SI 1 "register_operand" "") 99 | + (match_operand:SI 2 "reg_or_const_int_operand" "")))] 100 | "" 101 | - { return TARGET_64BIT ? "add%i2w\t%0,%1,%2" : "add%i2\t%0,%1,%2"; } 102 | +{ 103 | + if (! riscv_eliminable_reg (operands[1])) 104 | + { 105 | + if (! const_arith_operand (operands[2], SImode)) 106 | + operands[2] = force_reg (SImode, operands[2]); 107 | + } 108 | + else 109 | + { 110 | + if (splittable_const_int_operand (operands[2], SImode)) 111 | + { 112 | + /* The idea here is that we emit 113 | + add op0, op1, %hi(op2) 114 | + addi op0, op0, %lo(op2) 115 | + Then when op1, the eliminable reg, gets replaced with sp+offset, 116 | + we can simplify the constants. */ 117 | + HOST_WIDE_INT high_part = CONST_HIGH_PART (INTVAL (operands[2])); 118 | + emit_insn (gen_addsi3_internal (operands[0], operands[1], 119 | + GEN_INT (high_part))); 120 | + operands[1] = operands[0]; 121 | + operands[2] = GEN_INT (INTVAL (operands[2]) - high_part); 122 | + } 123 | + else if (! 
const_arith_operand (operands[2], SImode)) 124 | + operands[2] = force_reg (SImode, operands[2]); 125 | + } 126 | +}) 127 | + 128 | +(define_insn_and_split "addsi3_internal" 129 | + [(set (match_operand:SI 0 "register_operand" "=r,r,&r") 130 | + (plus:SI (match_operand:SI 1 "register_operand" " r,r,r") 131 | + (match_operand:SI 2 "add_operand" " r,I,L")))] 132 | + "" 133 | +{ 134 | + if (which_alternative == 2) 135 | + return "#"; 136 | + return TARGET_64BIT ? "add%i2w\t%0,%1,%2" : "add%i2\t%0,%1,%2"; 137 | +} 138 | + "&& reload_completed && const_lui_operand (operands[2], SImode)" 139 | + [(const_int 0)] 140 | +{ 141 | + if (REGNO (operands[0]) == REGNO (operands[1])) 142 | + abort (); 143 | + emit_insn (gen_movsi (operands[0], operands[2])); 144 | + emit_insn (gen_addsi3_internal (operands[0], operands[0], operands[1])); 145 | + DONE; 146 | +} 147 | [(set_attr "type" "arith") 148 | (set_attr "mode" "SI")]) 149 | 150 | diff --git a/gcc/postreload.c b/gcc/postreload.c 151 | index e721f2f..e2e69f2 100644 152 | --- a/gcc/postreload.c 153 | +++ b/gcc/postreload.c 154 | @@ -69,6 +69,10 @@ reload_cse_regs (rtx_insn *first ATTRIBUTE_UNUSED) 155 | if (moves_converted) 156 | reload_combine (); 157 | reload_cse_regs_1 (); 158 | + /* The previous reload_cse_regs_1 call deletes no-op moves, which allows 159 | + another reload_combine pass to succeed in cases where previous ones 160 | + did not. 
*/ 161 | + reload_combine (); 162 | } 163 | } 164 | 165 | -------------------------------------------------------------------------------- /projects/gcc-files/opt-si-di-sext.txt: -------------------------------------------------------------------------------- 1 | diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md 2 | index 8b21c19..e5a2961 100644 3 | --- a/gcc/config/riscv/riscv.md 4 | +++ b/gcc/config/riscv/riscv.md 5 | @@ -474,6 +474,32 @@ 6 | [(set_attr "type" "arith") 7 | (set_attr "mode" "SI")]) 8 | 9 | +;; This matches the combiner result for the insn pair 10 | +;; (set (reg:SI x) (plus:SI (reg:SI) (reg:SI))) 11 | +;; (set (reg:DI y) (sign_extend:DI (reg:SI x))) 12 | +;; and converts it into 13 | +;; (set (reg:DI y) (sign_extend:DI (plus:SI (reg:SI) (reg:SI)))) 14 | +;; (set (reg:SI x) (subreg:SI (reg:DI y))) 15 | +;; The subreg can then be substituted into a following instruction, or perhaps 16 | +;; eliminated by register allocation. 17 | + 18 | +(define_insn_and_split "*addsi3_extended3" 19 | + [(set (match_operand:DI 0 "register_operand" "=r,r") 20 | + (sign_extend:DI 21 | + (plus:SI (match_operand:SI 1 "register_operand" " r,r") 22 | + (match_operand:SI 2 "arith_operand" " r,I")))) 23 | + (set (match_operand:SI 3 "register_operand" "=r,r") 24 | + (plus:SI (match_dup 1) 25 | + (match_dup 2)))] 26 | + "TARGET_64BIT" 27 | + "#" 28 | + "&& 1" 29 | + [(set (match_dup 0) (sign_extend:DI (plus:SI (match_dup 1) (match_dup 2)))) 30 | + (set (match_dup 3) (match_dup 4))] 31 | + "operands[4] = gen_lowpart (SImode, operands[0]);" 32 | + [(set_attr "type" "arith") 33 | + (set_attr "mode" "SI")]) 34 | + 35 | ;; 36 | ;; .................... 
37 | ;; 38 | @@ -530,6 +556,23 @@ 39 | [(set_attr "type" "arith") 40 | (set_attr "mode" "SI")]) 41 | 42 | +(define_insn_and_split "*subsi3_extended3" 43 | + [(set (match_operand:DI 0 "register_operand" "= r") 44 | + (sign_extend:DI 45 | + (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ") 46 | + (match_operand:SI 2 "register_operand" " r")))) 47 | + (set (match_operand:SI 3 "register_operand" "= r") 48 | + (minus:SI (match_dup 1) 49 | + (match_dup 2)))] 50 | + "TARGET_64BIT" 51 | + "#" 52 | + "&& 1" 53 | + [(set (match_dup 0) (sign_extend:DI (minus:SI (match_dup 1) (match_dup 2)))) 54 | + (set (match_dup 3) (match_dup 4))] 55 | + "operands[4] = gen_lowpart (SImode, operands[0]);" 56 | + [(set_attr "type" "arith") 57 | + (set_attr "mode" "SI")]) 58 | + 59 | (define_insn "negdi2" 60 | [(set (match_operand:DI 0 "register_operand" "=r") 61 | (neg:DI (match_operand:DI 1 "register_operand" " r")))] 62 | @@ -565,6 +608,21 @@ 63 | [(set_attr "type" "arith") 64 | (set_attr "mode" "SI")]) 65 | 66 | +(define_insn_and_split "*negsi2_extended3" 67 | + [(set (match_operand:DI 0 "register_operand" "=r") 68 | + (sign_extend:DI 69 | + (neg:SI (match_operand:SI 1 "register_operand" " r")))) 70 | + (set (match_operand:SI 2 "register_operand" "=r") 71 | + (neg:SI (match_dup 1)))] 72 | + "TARGET_64BIT" 73 | + "#" 74 | + "&& 1" 75 | + [(set (match_dup 0) (sign_extend:DI (neg:SI (match_dup 1)))) 76 | + (set (match_dup 2) (match_dup 3))] 77 | + "operands[3] = gen_lowpart (SImode, operands[0]);" 78 | + [(set_attr "type" "arith") 79 | + (set_attr "mode" "SI")]) 80 | + 81 | ;; 82 | ;; .................... 
83 | ;; 84 | @@ -621,6 +679,23 @@ 85 | [(set_attr "type" "imul") 86 | (set_attr "mode" "SI")]) 87 | 88 | +(define_insn_and_split "*mulsi3_extended3" 89 | + [(set (match_operand:DI 0 "register_operand" "=r") 90 | + (sign_extend:DI 91 | + (mult:SI (match_operand:SI 1 "register_operand" " r") 92 | + (match_operand:SI 2 "register_operand" " r")))) 93 | + (set (match_operand:SI 3 "register_operand" "=r") 94 | + (mult:SI (match_dup 1) 95 | + (match_dup 2)))] 96 | + "TARGET_64BIT" 97 | + "#" 98 | + "&& 1" 99 | + [(set (match_dup 0) (sign_extend:DI (mult:SI (match_dup 1) (match_dup 2)))) 100 | + (set (match_dup 3) (match_dup 4))] 101 | + "operands[4] = gen_lowpart (SImode, operands[0]);" 102 | + [(set_attr "type" "imul") 103 | + (set_attr "mode" "SI")]) 104 | + 105 | ;; 106 | ;; ........................ 107 | ;; 108 | @@ -789,6 +864,24 @@ 109 | [(set_attr "type" "idiv") 110 | (set_attr "mode" "DI")]) 111 | 112 | +(define_insn_and_split "*si3_extended3" 113 | + [(set (match_operand:DI 0 "register_operand" "=r") 114 | + (sign_extend:DI 115 | + (any_div:SI (match_operand:SI 1 "register_operand" " r") 116 | + (match_operand:SI 2 "register_operand" " r")))) 117 | + (set (match_operand:SI 3 "register_operand" "=r") 118 | + (any_div:SI (match_dup 1) 119 | + (match_dup 2)))] 120 | + "TARGET_64BIT" 121 | + "#" 122 | + "&& 1" 123 | + [(set (match_dup 0) (sign_extend:DI 124 | + (any_div:SI (match_dup 1) (match_dup 2)))) 125 | + (set (match_dup 3) (match_dup 4))] 126 | + "operands[4] = gen_lowpart (SImode, operands[0]);" 127 | + [(set_attr "type" "idiv") 128 | + (set_attr "mode" "SI")]) 129 | + 130 | (define_insn "div3" 131 | [(set (match_operand:ANYF 0 "register_operand" "=f") 132 | (div:ANYF (match_operand:ANYF 1 "register_operand" " f") 133 | -------------------------------------------------------------------------------- /projects/gcc-files/opt-sign-extend.txt: -------------------------------------------------------------------------------- 1 | works for small testcase, but not 
for big ones, needs more work 2 | 3 | diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c 4 | index 2a8f87d..195b7cc 100644 5 | --- a/gcc/config/riscv/riscv.c 6 | +++ b/gcc/config/riscv/riscv.c 7 | @@ -3571,6 +3571,10 @@ riscv_adjust_libcall_cfi_prologue () 8 | /* Debug info for adjust sp. */ 9 | adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx, 10 | stack_pointer_rtx, GEN_INT (-saved_size)); 11 | + /* This might be a pattern or an insn. Extract the pattern if it is an 12 | + insn. */ 13 | + if (GET_CODE (adjust_sp_rtx) == INSN) 14 | + adjust_sp_rtx = PATTERN (adjust_sp_rtx); 15 | dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx, 16 | dwarf); 17 | return dwarf; 18 | @@ -3679,6 +3683,10 @@ riscv_adjust_libcall_cfi_epilogue () 19 | /* Debug info for adjust sp. */ 20 | adjust_sp_rtx = gen_add3_insn (stack_pointer_rtx, 21 | stack_pointer_rtx, GEN_INT (saved_size)); 22 | + /* This might be a pattern or an insn. Extract the pattern if it is an 23 | + insn. */ 24 | + if (GET_CODE (adjust_sp_rtx) == INSN) 25 | + adjust_sp_rtx = PATTERN (adjust_sp_rtx); 26 | dwarf = alloc_reg_note (REG_CFA_ADJUST_CFA, adjust_sp_rtx, 27 | dwarf); 28 | 29 | diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md 30 | index 9d22273..3a58d1e 100644 31 | --- a/gcc/config/riscv/riscv.md 32 | +++ b/gcc/config/riscv/riscv.md 33 | @@ -409,7 +409,22 @@ 34 | [(set_attr "type" "fadd") 35 | (set_attr "mode" "")]) 36 | 37 | -(define_insn "addsi3" 38 | +(define_expand "addsi3" 39 | + [(set (match_operand:SI 0 "register_operand") 40 | + (plus:SI (match_operand:SI 1 "register_operand") 41 | + (match_operand:SI 2 "arith_operand")))] 42 | + "" 43 | +{ 44 | + if (TARGET_64BIT) 45 | + { 46 | + rtx tmp = gen_reg_rtx (DImode); 47 | + emit_insn (gen_addsi3_extended (tmp, operands[1], operands[2])); 48 | + emit_move_insn (operands[0], gen_lowpart (SImode, tmp)); 49 | + DONE; 50 | + } 51 | +}) 52 | + 53 | +(define_insn "*addsi3" 54 | [(set (match_operand:SI 0 "register_operand" 
"=r,r") 55 | (plus:SI (match_operand:SI 1 "register_operand" " r,r") 56 | (match_operand:SI 2 "arith_operand" " r,I")))] 57 | @@ -427,7 +442,7 @@ 58 | [(set_attr "type" "arith") 59 | (set_attr "mode" "DI")]) 60 | 61 | -(define_insn "*addsi3_extended" 62 | +(define_insn "addsi3_extended" 63 | [(set (match_operand:DI 0 "register_operand" "=r,r") 64 | (sign_extend:DI 65 | (plus:SI (match_operand:SI 1 "register_operand" " r,r") 66 | @@ -474,7 +489,22 @@ 67 | [(set_attr "type" "arith") 68 | (set_attr "mode" "DI")]) 69 | 70 | -(define_insn "subsi3" 71 | +(define_expand "subsi3" 72 | + [(set (match_operand:SI 0 "register_operand") 73 | + (minus:SI (match_operand:SI 1 "reg_or_0_operand") 74 | + (match_operand:SI 2 "register_operand")))] 75 | + "" 76 | +{ 77 | + if (TARGET_64BIT) 78 | + { 79 | + rtx tmp = gen_reg_rtx (DImode); 80 | + emit_insn (gen_subsi3_extended (tmp, operands[1], operands[2])); 81 | + emit_move_insn (operands[0], gen_lowpart (SImode, tmp)); 82 | + DONE; 83 | + } 84 | +}) 85 | + 86 | +(define_insn "*subsi3_internal" 87 | [(set (match_operand:SI 0 "register_operand" "= r") 88 | (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ") 89 | (match_operand:SI 2 "register_operand" " r")))] 90 | @@ -483,7 +513,7 @@ 91 | [(set_attr "type" "arith") 92 | (set_attr "mode" "SI")]) 93 | 94 | -(define_insn "*subsi3_extended" 95 | +(define_insn "subsi3_extended" 96 | [(set (match_operand:DI 0 "register_operand" "= r") 97 | (sign_extend:DI 98 | (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ") 99 | @@ -521,7 +551,22 @@ 100 | [(set_attr "type" "fmul") 101 | (set_attr "mode" "")]) 102 | 103 | -(define_insn "mulsi3" 104 | +(define_expand "mulsi3" 105 | + [(set (match_operand:SI 0 "register_operand") 106 | + (mult:SI (match_operand:SI 1 "register_operand") 107 | + (match_operand:SI 2 "register_operand")))] 108 | + "TARGET_MUL" 109 | +{ 110 | + if (TARGET_64BIT) 111 | + { 112 | + rtx tmp = gen_reg_rtx (DImode); 113 | + emit_insn (gen_mulsi3_extended (tmp, operands[1], 
operands[2])); 114 | + emit_move_insn (operands[0], gen_lowpart (SImode, tmp)); 115 | + DONE; 116 | + } 117 | +}) 118 | + 119 | +(define_insn "*mulsi3" 120 | [(set (match_operand:SI 0 "register_operand" "=r") 121 | (mult:SI (match_operand:SI 1 "register_operand" " r") 122 | (match_operand:SI 2 "register_operand" " r")))] 123 | @@ -539,7 +584,7 @@ 124 | [(set_attr "type" "imul") 125 | (set_attr "mode" "DI")]) 126 | 127 | -(define_insn "*mulsi3_extended" 128 | +(define_insn "mulsi3_extended" 129 | [(set (match_operand:DI 0 "register_operand" "=r") 130 | (sign_extend:DI 131 | (mult:SI (match_operand:SI 1 "register_operand" " r") 132 | @@ -700,7 +745,22 @@ 133 | ;; .................... 134 | ;; 135 | 136 | -(define_insn "si3" 137 | +(define_expand "si3" 138 | + [(set (match_operand:SI 0 "register_operand") 139 | + (any_div:SI (match_operand:SI 1 "register_operand") 140 | + (match_operand:SI 2 "register_operand")))] 141 | + "TARGET_DIV" 142 | +{ 143 | + if (TARGET_64BIT) 144 | + { 145 | + rtx tmp = gen_reg_rtx (DImode); 146 | + emit_insn (gen_si3_extended (tmp, operands[1], operands[2])); 147 | + emit_move_insn (operands[0], gen_lowpart (SImode, tmp)); 148 | + DONE; 149 | + } 150 | +}) 151 | + 152 | +(define_insn "*si3" 153 | [(set (match_operand:SI 0 "register_operand" "=r") 154 | (any_div:SI (match_operand:SI 1 "register_operand" " r") 155 | (match_operand:SI 2 "register_operand" " r")))] 156 | @@ -718,7 +778,7 @@ 157 | [(set_attr "type" "idiv") 158 | (set_attr "mode" "DI")]) 159 | 160 | -(define_insn "*si3_extended" 161 | +(define_insn "si3_extended" 162 | [(set (match_operand:DI 0 "register_operand" "=r") 163 | (sign_extend:DI 164 | (any_div:SI (match_operand:SI 1 "register_operand" " r") 165 | @@ -1488,7 +1548,23 @@ 166 | ;; expand_shift_1 can do this automatically when SHIFT_COUNT_TRUNCATED is 167 | ;; defined, but use of that is discouraged. 
168 | 169 | -(define_insn "si3" 170 | +(define_expand "si3" 171 | + [(set (match_operand:SI 0 "register_operand") 172 | + (any_shift:SI 173 | + (match_operand:SI 1 "register_operand") 174 | + (match_operand:QI 2 "arith_operand")))] 175 | + "" 176 | +{ 177 | + if (TARGET_64BIT) 178 | + { 179 | + rtx tmp = gen_reg_rtx (DImode); 180 | + emit_insn (gen_si3_extend (tmp, operands[1], operands[2])); 181 | + emit_move_insn (operands[0], gen_lowpart (SImode, tmp)); 182 | + DONE; 183 | + } 184 | +}) 185 | + 186 | +(define_insn "*si3_internal" 187 | [(set (match_operand:SI 0 "register_operand" "= r") 188 | (any_shift:SI 189 | (match_operand:SI 1 "register_operand" " r") 190 | @@ -1599,7 +1675,7 @@ 191 | [(set_attr "type" "shift") 192 | (set_attr "mode" "DI")]) 193 | 194 | -(define_insn "*si3_extend" 195 | +(define_insn "si3_extend" 196 | [(set (match_operand:DI 0 "register_operand" "= r") 197 | (sign_extend:DI 198 | (any_shift:SI (match_operand:SI 1 "register_operand" " r") 199 | @@ -1701,6 +1777,25 @@ 200 | [(set_attr "type" "shift") 201 | (set_attr "mode" "SI")]) 202 | 203 | +;; Created by combine when a zero_extract is followed by a sign_extend. 204 | +;; Should simplify sign_extend to a move, but meanwhile we can do this 205 | +;; manually with a combiner pattern. 
206 | +(define_insn_and_split "*lshrsi3_zero_extend_4" 207 | + [(set (match_operand:DI 0 "register_operand" "=r") 208 | + (zero_extract:DI (match_operand:DI 1 "register_operand" " r") 209 | + (match_operand 2 "const_int_operand") 210 | + (match_operand 3 "const_int_operand"))) 211 | + (set (match_operand:DI 4 "register_operand" "=r") 212 | + (zero_extract:DI (match_dup 1) (match_dup 2) (match_dup 3)))] 213 | + "(TARGET_64BIT && (INTVAL (operands[3]) > 0) 214 | + && (INTVAL (operands[2]) + INTVAL (operands[3]) == 32))" 215 | + "#" 216 | + "" 217 | + [(set (match_dup 0) 218 | + (zero_extract:DI (match_dup 1) (match_dup 2) (match_dup 3))) 219 | + (set (match_dup 4) (match_dup 0))] 220 | + "") 221 | + 222 | ;; 223 | ;; .................... 224 | ;; 225 | -------------------------------------------------------------------------------- /projects/gcc-files/slt-opt.testcase.c: -------------------------------------------------------------------------------- 1 | int sublt (int i) { return i < 10; } 2 | int suble (int i) { return i <= 10; } 3 | int subgt (int i) { return i > 10; } 4 | int subge (int i) { return i >= 10; } 5 | 6 | int sublt_z (int i) { return i < 0; } 7 | int suble_z (int i) { return i <= 0; } 8 | int subgt_z (int i) { return i > 0; } 9 | int subge_z (int i) { return i >= 0; } 10 | 11 | int sublt_z_edge1 (int i) { return i < -1; } 12 | int suble_z_edge1 (int i) { return i <= -1; } 13 | int subgt_z_edge1 (int i) { return i > -1; } 14 | int subge_z_edge1 (int i) { return i >= -1; } 15 | 16 | int sublt_z_edge2 (int i) { return i < 1; } 17 | int suble_z_edge2 (int i) { return i <= 1; } 18 | int subgt_z_edge2 (int i) { return i > 1; } 19 | int subge_z_edge2 (int i) { return i >= 1; } 20 | 21 | int sublt_li (int i) { return i < 0xf00f; } 22 | int suble_li (int i) { return i <= 0xf00f; } 23 | int subgt_li (int i) { return i > 0xf00f; } 24 | int subge_li (int i) { return i >= 0xf00f; } 25 | 26 | int sublt_lui (int i) { return i < 0xffff0000; } 27 | int suble_lui (int 
i) { return i <= 0xffff0000; } 28 | int subgt_lui (int i) { return i > 0xffff0000; } 29 | int subge_lui (int i) { return i >= 0xffff0000; } 30 | 31 | int sublt_li_edge1 (int i) { return i < 0xfffff7ff; } 32 | int suble_li_edge1 (int i) { return i <= 0xfffff7ff; } 33 | int subgt_li_edge1 (int i) { return i > 0xfffff7ff; } 34 | int subge_li_edge1 (int i) { return i >= 0xfffff7ff; } 35 | 36 | int sublt_li_edge2 (int i) { return i < 0x000007ff; } 37 | int suble_li_edge2 (int i) { return i <= 0x000007ff; } 38 | int subgt_li_edge2 (int i) { return i > 0x000007ff; } 39 | int subge_li_edge2 (int i) { return i >= 0x000007ff; } 40 | 41 | int sublt_lui_edge (int i) { return i < 0xfffffeff; } 42 | int suble_lui_edge (int i) { return i <= 0xfffffeff; } 43 | int subgt_lui_edge (int i) { return i > 0xfffffeff; } 44 | int subge_lui_edge (int i) { return i >= 0xfffffeff; } 45 | 46 | int u_sublt (unsigned int i) { return i < 10; } 47 | int u_suble (unsigned int i) { return i <= 10; } 48 | int u_subgt (unsigned int i) { return i > 10; } 49 | int u_subge (unsigned int i) { return i >= 10; } 50 | 51 | int u_sublt_z (unsigned int i) { return i < 0; } 52 | int u_suble_z (unsigned int i) { return i <= 0; } 53 | int u_subgt_z (unsigned int i) { return i > 0; } 54 | int u_subge_z (unsigned int i) { return i >= 0; } 55 | 56 | int u_sublt_z_edge1 (unsigned int i) { return i < -1; } 57 | int u_suble_z_edge1 (unsigned int i) { return i <= -1; } 58 | int u_subgt_z_edge1 (unsigned int i) { return i > -1; } 59 | int u_subge_z_edge1 (unsigned int i) { return i >= -1; } 60 | 61 | int u_sublt_z_edge2 (unsigned int i) { return i < 1; } 62 | int u_suble_z_edge2 (unsigned int i) { return i <= 1; } 63 | int u_subgt_z_edge2 (unsigned int i) { return i > 1; } 64 | int u_subge_z_edge2 (unsigned int i) { return i >= 1; } 65 | 66 | int u_sublt_li (unsigned int i) { return i < 0xf00f; } 67 | int u_suble_li (unsigned int i) { return i <= 0xf00f; } 68 | int u_subgt_li (unsigned int i) { return i > 0xf00f; } 69 | 
int u_subge_li (unsigned int i) { return i >= 0xf00f; } 70 | 71 | int u_sublt_lui (unsigned int i) { return i < 0xffff0000; } 72 | int u_suble_lui (unsigned int i) { return i <= 0xffff0000; } 73 | int u_subgt_lui (unsigned int i) { return i > 0xffff0000; } 74 | int u_subge_lui (unsigned int i) { return i >= 0xffff0000; } 75 | 76 | int u_sublt_li_edge1 (unsigned int i) { return i < 0xfffff7ff; } 77 | int u_suble_li_edge1 (unsigned int i) { return i <= 0xfffff7ff; } 78 | int u_subgt_li_edge1 (unsigned int i) { return i > 0xfffff7ff; } 79 | int u_subge_li_edge1 (unsigned int i) { return i >= 0xfffff7ff; } 80 | 81 | int u_sublt_li_edge2 (unsigned int i) { return i < 0x000007ff; } 82 | int u_suble_li_edge2 (unsigned int i) { return i <= 0x000007ff; } 83 | int u_subgt_li_edge2 (unsigned int i) { return i > 0x000007ff; } 84 | int u_subge_li_edge2 (unsigned int i) { return i >= 0x000007ff; } 85 | 86 | int u_sublt_lui_edge (unsigned int i) { return i < 0xfffffeff; } 87 | int u_suble_lui_edge (unsigned int i) { return i <= 0xfffffeff; } 88 | int u_subgt_lui_edge (unsigned int i) { return i > 0xfffffeff; } 89 | int u_subge_lui_edge (unsigned int i) { return i >= 0xfffffeff; } 90 | -------------------------------------------------------------------------------- /projects/gcc-files/slt-opt.txt: -------------------------------------------------------------------------------- 1 | diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md 2 | index f722881..bce4818 100644 3 | --- a/gcc/config/riscv/predicates.md 4 | +++ b/gcc/config/riscv/predicates.md 5 | @@ -51,6 +51,10 @@ 6 | (and (match_operand 0 "sle_operand") 7 | (match_test "INTVAL (op) + 1 != 0"))) 8 | 9 | +(define_predicate "sge_operand" 10 | + (and (match_code "const_int") 11 | + (match_test "SMALL_OPERAND (INTVAL (op) - 1)"))) 12 | + 13 | (define_predicate "const_0_operand" 14 | (and (match_code "const_int,const_wide_int,const_double,const_vector") 15 | (match_test "op == CONST0_RTX (GET_MODE (op))"))) 
16 | diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c 17 | index d45b19d..3984a60 100644 18 | --- a/gcc/config/riscv/riscv.c 19 | +++ b/gcc/config/riscv/riscv.c 20 | @@ -2041,14 +2041,18 @@ riscv_int_order_operand_ok_p (enum rtx_code code, rtx cmp1) 21 | 22 | static bool 23 | riscv_canonicalize_int_order_test (enum rtx_code *code, rtx *cmp1, 24 | - machine_mode mode) 25 | + machine_mode mode, bool inverted_p) 26 | { 27 | HOST_WIDE_INT plus_one; 28 | 29 | if (riscv_int_order_operand_ok_p (*code, *cmp1)) 30 | return true; 31 | 32 | - if (CONST_INT_P (*cmp1)) 33 | + /* If the adjusted constant is not simple, then it is better to use swapped 34 | + operands than inverted condition with adjusted operand. A swapped operand 35 | + is lui/addi,slt whereas inverted is li,slt,xor which is 2 insn longer. */ 36 | + if (CONST_INT_P (*cmp1) 37 | + && (!inverted_p || riscv_integer_cost (UINTVAL (*cmp1) + 1) == 1)) 38 | switch (*code) 39 | { 40 | case LE: 41 | @@ -2074,6 +2078,7 @@ riscv_canonicalize_int_order_test (enum rtx_code *code, rtx *cmp1, 42 | default: 43 | break; 44 | } 45 | + 46 | return false; 47 | } 48 | 49 | @@ -2092,12 +2097,12 @@ riscv_emit_int_order_test (enum rtx_code code, bool *invert_ptr, 50 | If not, try doing the same for the inverse operation. If that also 51 | fails, force CMP1 into a register and try again. 
*/ 52 | mode = GET_MODE (cmp0); 53 | - if (riscv_canonicalize_int_order_test (&code, &cmp1, mode)) 54 | + if (riscv_canonicalize_int_order_test (&code, &cmp1, mode, false)) 55 | riscv_emit_binary (code, target, cmp0, cmp1); 56 | else 57 | { 58 | enum rtx_code inv_code = reverse_condition (code); 59 | - if (!riscv_canonicalize_int_order_test (&inv_code, &cmp1, mode)) 60 | + if (!riscv_canonicalize_int_order_test (&inv_code, &cmp1, mode, true)) 61 | { 62 | cmp1 = force_reg (mode, cmp1); 63 | riscv_emit_int_order_test (code, invert_ptr, target, cmp0, cmp1); 64 | @@ -2158,8 +2163,13 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1) 65 | } 66 | else 67 | { 68 | + machine_mode mode = GET_MODE (*op0); 69 | + 70 | *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0); 71 | - if (*op1 != const0_rtx) 72 | + if (CONST_INT_P (*op1)) 73 | + *op1 = GEN_INT (trunc_int_for_mode (INTVAL (*op1), 74 | + GET_MODE (*op0))); 75 | + else 76 | *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1); 77 | } 78 | } 79 | diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md 80 | index 434e064..bed26dc 100644 81 | --- a/gcc/config/riscv/riscv.md 82 | +++ b/gcc/config/riscv/riscv.md 83 | @@ -377,6 +377,13 @@ 84 | (lt "") (ltu "u") 85 | (le "") (leu "u")]) 86 | 87 | +;; Same as above except capital U. 88 | +(define_code_attr U [(sign_extend "") (zero_extend "U") 89 | + (gt "") (gtu "U") 90 | + (ge "") (geu "U") 91 | + (lt "") (ltu "U") 92 | + (le "") (leu "U")]) 93 | + 94 | ;; is like , but the signed form expands to "s" rather than "". 95 | (define_code_attr su [(sign_extend "s") (zero_extend "u")]) 96 | 97 | @@ -2090,6 +2097,78 @@ 98 | [(set_attr "type" "slt") 99 | (set_attr "mode" "")]) 100 | 101 | +;; Allow combine to match slt/xor/sext and change to slt/xor. 102 | +;; But don't accept 0, because that is a single instruction. 103 | +;; ??? We only need one of these patterns. 
104 | +(define_split 105 | + [(set (match_operand:DI 0 "register_operand") 106 | + (any_gt:DI (match_operand:DI 1 "register_operand") 107 | + (match_operand 2 "sle_operand")))] 108 | + "TARGET_64BIT && operands[2] != const0_rtx" 109 | + [(set (match_dup 0) (match_op_dup 3 [(match_dup 1) (match_dup 2)])) 110 | + (set (match_dup 0) (xor:DI (match_dup 0) (const_int 1)))] 111 | +{ 112 | + operands[2] = GEN_INT (INTVAL (operands[2]) + 1); 113 | + operands[3] = gen_rtx_LT (DImode, operands[1], operands[2]); 114 | +}) 115 | + 116 | +;; Allow combine to match slt/xor/sext and change to li/slt. 117 | +;; But don't accept 0, because that is a single instruction. 118 | +;; ??? We only need one of these patterns. 119 | +(define_split 120 | + [(set (match_operand:DI 0 "register_operand") 121 | + (any_gt:DI (match_operand:DI 1 "register_operand") 122 | + (match_operand 2 "const_arith_operand"))) 123 | + (clobber (match_operand:DI 3 "register_operand"))] 124 | + "TARGET_64BIT && operands[2] != const0_rtx" 125 | + [(set (match_dup 3) (match_dup 2)) 126 | + (set (match_dup 0) (match_op_dup 4 [(match_dup 3) (match_dup 1)]))] 127 | +{ 128 | + operands[4] = gen_rtx_LT (DImode, operands[3], operands[1]); 129 | +}) 130 | + 131 | +;; Allow combine to match slt/xor/sext and change to slt/xor. 132 | +;; But don't accept 0, because that is a single instruction. 133 | +;; ??? We only need one of these patterns. 134 | +(define_split 135 | + [(set (match_operand:DI 0 "register_operand") 136 | + (any_ge:DI (match_operand:DI 1 "register_operand") 137 | + (match_operand 2 "const_arith_operand")))] 138 | + "TARGET_64BIT && operands[2] != const0_rtx" 139 | + [(set (match_dup 0) (match_op_dup 3 [(match_dup 1) (match_dup 2)])) 140 | + (set (match_dup 0) (xor:DI (match_dup 0) (const_int 1)))] 141 | +{ 142 | + operands[3] = gen_rtx_LT (DImode, operands[1], operands[2]); 143 | +}) 144 | + 145 | +;; Allow combine to match slt/xor/sext and change to li/slt. 
146 | +;; But don't accept 0, because that is a single instruction. 147 | +;; ??? We only need one of these patterns. 148 | +(define_split 149 | + [(set (match_operand:DI 0 "register_operand") 150 | + (any_ge:DI (match_operand:DI 1 "register_operand") 151 | + (match_operand 2 "sge_operand"))) 152 | + (clobber (match_operand:DI 3 "register_operand"))] 153 | + "TARGET_64BIT && operands[2] != const0_rtx" 154 | + [(set (match_dup 3) (match_dup 2)) 155 | + (set (match_dup 0) (match_op_dup 4 [(match_dup 3) (match_dup 1)]))] 156 | +{ 157 | + operands[2] = GEN_INT (INTVAL (operands[2]) - 1); 158 | + operands[4] = gen_rtx_LT (DImode, operands[3], operands[1]); 159 | +}) 160 | + 161 | +;; Allow combine to match slt/xor/sext and change to slt/xor. 162 | +(define_split 163 | + [(set (match_operand:DI 0 "register_operand") 164 | + (any_ge:DI (match_operand:DI 1 "register_operand") 165 | + (match_operand:DI 2 "register_operand")))] 166 | + "TARGET_64BIT" 167 | + [(set (match_dup 0) (match_op_dup 3 [(match_dup 1) (match_dup 2)])) 168 | + (set (match_dup 0) (xor:DI (match_dup 0) (const_int 1)))] 169 | +{ 170 | + operands[3] = gen_rtx_LT (DImode, operands[1], operands[2]); 171 | +}) 172 | + 173 | ;; 174 | ;; .................... 175 | ;; 176 | -------------------------------------------------------------------------------- /projects/gcc-optimizations.adoc: -------------------------------------------------------------------------------- 1 | = Project proposal: GCC Optimization 2 | RISC-V International Code Speed Optimization SIG 3 | 4 | //// 5 | SPDX-License-Identifier: CC-BY-4.0 6 | 7 | Document conventions: 8 | - one line per paragraph (don't fill lines - this makes changes clearer) 9 | - Wikipedia heading conventions (First word only capitalized) 10 | - US spelling throughout. 11 | //// 12 | 13 | == Introduction 14 | 15 | A set of 6 GCC RISC-V optimizations for speed proposed by Jim Wilson. 
Comments and feedback are welcome via issues, pull requests (for corrections/clarifications) or the mailing list (for discussions). 16 | 17 | === Exposing SImode operations to the RTL expanders for `any_bitwise` on RV64 18 | 19 | Problem noted by Philipp Tomsich 20 | 21 | [quote] 22 | ---- 23 | All attempts to model this, failed due to the bitwise operations being always expanded into DImode (usually with a paradoxical `(subreg:DI (reg:SI))`). 24 | 25 | Turns out that the (judging by the comment in the `.md`-file: intentional) absence of SImode RTL expansion is the root cause—and we need to allow SImode expansion for RV64, if we want to absorb the `sext.w` later during combine (or avoid it altogether in the first place). 26 | ---- 27 | 28 | Comments from Jim Wilson 29 | 30 | I think part of our trouble with sign extensions is due to the fact that the `riscv.md` file is lying about what instructions are available in rv64. For instance, we have an `addsi3` pattern that claims that we can do `(set (reg:SI) (plus:SI (reg:SI) (reg:SI)))` but there is in fact no such instruction. We only have `(set (reg:DI) (sign_extend:DI (plus:SI (reg:SI) (reg:SI))))`. As an experiment once, I tried modifying the `addsi3` pattern to emit `addsi3_extended` instead, putting the result in a temp reg, and then doing a subreg copy into the SImode target reg. The idea here is that as part of the regular optimizations, the subreg should get folded into a following instruction, and in theory we end up with smaller, faster code. The same thing can be done with `sub`, `mul`, and `div`. As part of this same patch, I changed the logical operations to work the same way. When testing the patch, I sometimes got better code and sometimes got worse code. I was hoping some day to take a look at this and try to improve it, but that hasn't happened yet. I think I might be missing some combiner patterns necessary to avoid the performance/code size regressions. 
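As a rough illustration of the difference between the two RTL shapes mentioned above, here is a minimal C model of what the rv64 hardware add actually computes for 32-bit operands. It is only a sketch: the name `addw_model` is made up for this example and is not part of GCC or of the attached patches.

```c
#include <stdint.h>

/* Models (set (reg:DI) (sign_extend:DI (plus:SI (reg:SI) (reg:SI)))):
   add the low 32 bits with wraparound, then sign-extend the 32-bit
   result to 64 bits.  There is no rv64 instruction that leaves the
   upper 32 bits unspecified, which is what a plain plus:SI implies.  */
int64_t
addw_model (int64_t a, int64_t b)
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;   /* plus:SI, wraps modulo 2^32 */

  /* sign_extend:DI, written without relying on implementation-defined
     narrowing conversions.  */
  if (sum >= 0x80000000u)
    return (int64_t) sum - 0x100000000LL;
  return (int64_t) sum;
}
```

For example, `addw_model (0x7fffffff, 1)` yields `-2147483648`, so the 64-bit register always holds a properly sign-extended value after the add.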
31 | 32 | Also, once as an experiment, I tried writing a patch that added combiner patterns to match an add followed by a sign extend, and split it into a sign-extended add followed by a subreg. Same theory here that further optimization should eliminate the subreg copy. This one also sometimes helped and sometimes hurt. I didn't do the logicals in this patch, just the arithmetics. I never had time to finish this one either. 33 | 34 | I think the first patch is a better solution. But it needs a lot more testing than I ever had time to do, and perhaps a few more combiner patterns to avoid regressions. 35 | 36 | I attached my patches so you can see what I was doing. These are old patches so may not apply without changes. 37 | 38 | I have a bunch of other unfinished patches, but there was never anyone available to help me with them, or anyone I could talk to about them. Maybe this can be brought up in the new code optimization task group. 39 | 40 | === Files 41 | 42 | Patches: 43 | 44 | - link:gcc-files/opt-sign-extend.txt[] 45 | - link:gcc-files/opt-si-di-sext.txt[] 46 | 47 | == Optimizing `slt` expressions 48 | 49 | I think this is mostly done, except I haven't proven that all of the transformations are correct, and that they are optimal for rv32 and rv64. 50 | 51 | The testcase is trying to capture every possible edge condition with `slt` expressions. The constant is zero. The constant can fit in an slt immediate. The constant can fit in a `lui`. Etc. And sometimes we need to add/subtract one to change the condition, so we need constants that are one off from each edge where the constant representation changes. And we have signed and unsigned `slt` expressions. We should also test int and long for rv64. I don't think I have that test yet. 52 | 53 | The patch is trying to ensure the shortest possible code sequence for each `slt` expression. 
As an example, consider this testcase 54 | 55 | [source,c] 56 | ---- 57 | int subgt_li (int i) { return i > 0xf00f; } 58 | ---- 59 | 60 | This currently generates 61 | [source,gas] 62 | ---- 63 | li a5,61440 64 | addi a5,a5,16 65 | slt a0,a0,a5 66 | xori a0,a0,1 67 | ---- 68 | 69 | It should instead generate 70 | [source,gas] 71 | ---- 72 | li a5,61440 73 | addi a5,a5,15 74 | sgt a0,a0,a5 75 | ---- 76 | 77 | which is one instruction shorter. The compiler is inverting the comparison and adjusting the constant when it shouldn't. 78 | 79 | Here is another one 80 | [source,c] 81 | ---- 82 | int subgt_lui (int i) { return i > 0xffff0000; } 83 | ---- 84 | 85 | where the compiler emits 86 | 87 | [source,gas] 88 | ---- 89 | li a5,-65536 90 | addi a5,a5,1 91 | sltu a0,a0,a5 92 | xori a0,a0,1 93 | ---- 94 | 95 | and it should just be 96 | [source,gas] 97 | ---- 98 | li a5,-65536 99 | sgtu a0,a0,a5 100 | ---- 101 | 102 | === Files 103 | 104 | Patch: 105 | 106 | - link:gcc-files/slt-opt.txt[] 107 | 108 | Test case: 109 | 110 | - link:gcc-files/slt-opt.testcase.c[] 111 | 112 | == Eliminate extra array adds 113 | 114 | This one was originally reported as RISC-V GNU tool chain repository https://github.com/riscv/riscv-gnu-toolchain/issues/126[issue #126]. 115 | 116 | The idea here is that we want to fold a non-constant offset into an `lui`/`addi`/`lw` sequence to make it 3 instructions instead of 4. So this converts 117 | [source,gas] 118 | ---- 119 | lui a5,%hi(array) 120 | addi a5,a5,%lo(array) 121 | add a0,a5,a0 122 | lbu a0,0(a0) 123 | ---- 124 | 125 | into 126 | [source,gas] 127 | ---- 128 | lui a5,%hi(array) 129 | add a0,a5,a0 130 | lbu a0,%lo(array)(a0) 131 | ---- 132 | 133 | where `a0` has a value to be added to the array address. 134 | 135 | I have a gcc patch, but it fails with linker relaxation. 
If `array` is in range of `gp`, then the `lui` gets deleted and we are left with 136 | [source,gas] 137 | ---- 138 | add a0,a5,a0 139 | lbu a0,-1640(gp) 140 | ---- 141 | 142 | which is wrong. This should instead be 143 | [source,gas] 144 | ---- 145 | add a0,gp,a0 146 | lbu a0,-1640(a0) 147 | ---- 148 | 149 | So we need a new relocation, and we need binutils relaxation support to handle the new reloc and generate the desired code. 150 | 151 | `extra-array-add-patch.txt` is the original patch, and the bin/gcc patches are my incomplete attempt at fixing it. 152 | 153 | === Files 154 | 155 | Patches 156 | 157 | - link:gcc-files/extra-array-add-bin.txt[] 158 | - link:gcc-files/extra-array-add-gcc.txt[] 159 | - link:gcc-files/extra-array-add-patch.txt[] 160 | 161 | Test case 162 | 163 | - link:gcc-files/extra-array-add-testcase.txt[] 164 | 165 | == Large stack frame optimization problem 166 | 167 | This one was also mentioned in the code size task group, but I don't think it is useful there, as people who care about code size generally don't write code this way. 168 | 169 | If you have a stack frame larger than 4K (2K?) we get poor code generation for stack slot references. The compiler generates FP+large constant, which requires lui/addi to load. Then later we do frame pointer elimination which replaces FP with SP-large constant. In theory the constants should cancel. Unfortunately, CSE and other optimizations in between try to optimize the constant loads, sharing similar constants when there are multiple stack slot references, and then when we do FP elimination the code is so confused that we can't do the constant cancellation and we get an ugly mess. 170 | 171 | This isn't a RISC-V specific problem. We just hit it sooner than other targets as pretty much everyone else has 16-bit constants, and hence needs a 64K (32K?) stack frame before they have a problem. 
In theory, there is no problem if you have a 48-bit instruction to load a constant, such as Huawei has proposed in the code size task group, because that eliminates the constant CSE that gets in the way. 172 | 173 | There is a good testcase for this in MiBench, which is the file `susan.c`. There is also a bug report in the RISC-V GCC repository, https://github.com/riscv/riscv-gcc/issues/193[issue #193]. 174 | 175 | I have a prototype patch. But I needed a change to a target independent optimization pass to make it work without regressions, and I don't have a good argument to justify that. I also haven't tested it much. 176 | 177 | === Files 178 | 179 | Patch 180 | 181 | - link:gcc-files/large-frame-hack.txt[] 182 | 183 | == `target` attribute and pragma 184 | 185 | The `target` attribute and `target` pragma are supported by the most popular targets, e.g. x86, arm, aarch64, powerpc, so should be supported by RISC-V also. This is a quality of implementation issue. 186 | 187 | This allows one to specify target dependent options on a per function basis, e.g. you can compile one copy of a function with the B extension and one without, and then at run-time call the appropriate one depending on whether B extension support exists. 188 | 189 | This was requested somewhere, but I don't remember exactly where. Probably either sw-dev or an issue in the riscv GitHub tree somewhere. 190 | 191 | == Explicit relocations and `medany` 192 | 193 | This combination is off by default as it can result in bad code. 194 | 195 | Consider this testcase. 
 196 | [source,c] 197 | ---- 198 | int array[995] = { [10] = 10, [99] = 99 }; 199 | long long ll = 100; 200 | 201 | long long 202 | sub (void) 203 | { 204 | return ll; 205 | } 206 | 207 | int 208 | main (void) 209 | { 210 | return sub (); 211 | } 212 | ---- 213 | 214 | If I use `riscv-gnu-toolchain`, configured for `rv32i` newlib, and compile it with `-O -mcmodel=medany -mexplicit-relocs`, in the assembly output I see 215 | 216 | [source,gas] 217 | ---- 218 | sub: 219 | .LA0: auipc a5,%pcrel_hi(ll) 220 | lw a0,%pcrel_lo(.LA0)(a5) 221 | lw a1,%pcrel_lo(.LA0+4)(a5) 222 | ret 223 | ---- 224 | 225 | which looks reasonable. Though maybe that should be `%pcrel_lo(.LA0)+4` instead, because the +4 is added to the address of `ll` not the address of `.LA0`. However, when I disassemble the `a.out` file, I see 226 | 227 | [source,gas] 228 | ---- 229 | 000101ac <sub>: 230 | 101ac: 00002797 auipc a5,0x2 231 | 101b0: 7fc7a503 lw a0,2044(a5) # 129a8 <ll> 232 | 101b4: 8007a583 lw a1,-2048(a5) 233 | 101b8: 00008067 ret 234 | ---- 235 | 236 | and note that the +4 offset overflowed, giving silent bad code. 237 | 238 | I carefully chose the array size to force the error. If you have a slightly different version or configuration of the tools, you might need a different array size to see the error. 239 | 240 | The problem here is that while the variable `ll` is 8-byte aligned, the `auipc` is not aligned, and `medany` is using the offset between the `auipc` and `ll`, so this offset is not a multiple of 8. The `auipc` is only guaranteed to have 4-byte alignment without the C extension, and 2-byte alignment with the C extension. GCC is assuming that any offset smaller than the alignment of the variable is safe, which is not true in this case. 241 | 242 | The same problem can happen for both rv32 and rv64 when using `long double` and `int128_t`, which require 16-byte alignment. We don't have anything that requires more than 16-byte alignment though, so the problem ends here. 
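The overflow can be reproduced with simple arithmetic. The sketch below is an illustrative C model, not toolchain code: `sext12` and `load_address` are hypothetical helper names, and the addresses are the ones from the disassembly above. It mimics how the 12-bit immediate of a load is sign-extended, so that an offset of 2044+4 wraps around to -2048.

```c
#include <stdint.h>

/* Sign-extend the low 12 bits of V, as the hardware does with the
   immediate field of a load instruction.  */
static int32_t
sext12 (int32_t v)
{
  return (int32_t) (((uint32_t) v & 0xfffu) ^ 0x800u) - 0x800;
}

/* Address actually accessed by "lw rd,LO_OFFSET(rs1)" when rs1 holds
   AUIPC_RESULT and LO_OFFSET has been squeezed into 12 bits.  */
uint32_t
load_address (uint32_t auipc_result, int32_t lo_offset)
{
  return auipc_result + (uint32_t) sext12 (lo_offset);
}
```

With the `auipc` at `0x101ac` and an upper immediate of `0x2`, `a5` holds `0x121ac`. Then `load_address (0x121ac, 2044)` gives `0x129a8`, the address of `ll`, while `load_address (0x121ac, 2044 + 4)` gives `0x119ac` instead of the intended `0x129ac`, which is 4096 bytes too low.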
 243 | 244 | Unfortunately, I don't see an obvious, easy, and good solution for this. 245 | 246 | We could disallow offsets with `pcrel_lo`, but that means `medany` code won't be as efficient as `medlow` because it will need extra address generation instructions. 247 | 248 | We could force alignment of `auipc`, but that means potentially emitting multiple nops before `auipc`, which again hurts `medany` code size and performance. 249 | 250 | We could maybe change the code sequence to something like 251 | [source,gas] 252 | ---- 253 | auipc %pcrel_hi 254 | addi %pcrel_addi 255 | lw %pcrel_lo_with_addi 256 | lw %pcrel_lo_with_addi+4 257 | ---- 258 | 259 | and then the new `pcrel_addi` reloc adds a value if necessary to avoid overflow, and the `pcrel_lo_with_addi` subtracts the same value. The `addi` can then be deleted via relaxation if it is unnecessary. However, cleanly specifying and implementing these relocs could be a problem because of the complex interactions between them. 260 | 261 | Other solutions might involve defining a new code model, a new ABI, or adding new instructions to the ISA, all of which I'm hoping to avoid. 262 | 263 | While testing this support, I've also managed to find two binutils bugs that can result in link time errors when `pcrel_lo` is used with an offset. Though exactly how those should be fixed depends on how exactly we decide to fix the GCC problem. There is also the (third) linker problem of silently creating bad code when `pcrel_lo+offset` overflows. I can add an error for that, but if someone hits it, there isn't anything they can do to fix it, other than to recompile without `-mexplicit-relocs`. 264 | 265 | Meanwhile, with the GNU toolchain, use of `-mcmodel=medany` is safe, but use of both `-mcmodel=medany` and `-mexplicit-relocs` together is not safe. 266 | 267 | More discussion on the https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/KnziiZtEJNo/m/M8Vfbw9UCgAJ[sw-dev mailing list]. 
268 | 269 | I added a linker check a while back, so you should get a linker error now instead of a silent error. But we still can't enable it by default. I don't know of any solution for the problem other than an ISA change or an ABI change, both of which are outside the scope of a gcc patch. A number of solutions have been suggested, for instance emitting a 3 instruction sequence to align the auipc result, and then relax away the extra instruction when unnecessary, but we haven't figured out how to make any of them work. 270 | 271 | This is probably not a good project. 272 | 273 | == Subtracted shift count optimizations 274 | 275 | Consider this testcase 276 | [source,c] 277 | ---- 278 | unsigned foo(unsigned i0H, unsigned x0, unsigned q0) { 279 | return (x0 << (64-q0)) | (i0H >> (q0-32)); 280 | } 281 | ---- 282 | 283 | compiled with `-O2 -S` we get 284 | [source,gas] 285 | ---- 286 | li a5,64 287 | sub a5,a5,a2 288 | addi a2,a2,-32 289 | sll a1,a1,a5 290 | srl a0,a0,a2 291 | or a0,a1,a0 292 | ret 293 | ---- 294 | 295 | Ideally we should get something like this 296 | [source,gas] 297 | ---- 298 | foo: 299 | neg a5,a2 300 | sll a1,a1,a5 301 | srl a0,a0,a2 302 | or a0,a1,a0 303 | ret 304 | ---- 305 | 306 | which takes advantage of the fact that shift counts are truncated. 307 | 308 | I haven't tried to fix this problem yet. 309 | 310 | We can get the good result if we define `SHIFT_COUNT_TRUNCATED`, but that is discouraged as it can cause problems. With `SHIFT_COUNT_TRUNCATED`, `combine` will assume that all shift counts are truncated, even for instructions that you may not consider to be a shift like bitfield insert and extract, and vector instructions. If any of these behave differently than a regular integer shift you can get bad code. 311 | 312 | There is a newer `TARGET_SHIFT_TRUNCATION_MASK` that might work better, as it allows you to specify a mode. Hence it won't accidentally trigger for vector operations, but may still trigger for bitfield instructions. 
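To see why the shorter sequence is valid, here is a small C model of the two instruction sequences for `foo`. It is a sketch under one assumption stated in the code: 32-bit RISC-V shifts only look at the low 5 bits of the count register. The helper names (`rv32_sll`, `rv32_srl`, `foo_li_sub`, `foo_neg`, `sequences_agree`) are made up for this illustration.

```c
#include <stdint.h>

/* Assumption: a 32-bit RISC-V shift uses only the low 5 bits of the
   count register, so the count is effectively taken modulo 32.  */
static uint32_t rv32_sll (uint32_t x, uint32_t count) { return x << (count & 31u); }
static uint32_t rv32_srl (uint32_t x, uint32_t count) { return x >> (count & 31u); }

/* Current code: li/sub computes 64-q0, addi computes q0-32.  */
uint32_t
foo_li_sub (uint32_t i0H, uint32_t x0, uint32_t q0)
{
  return rv32_sll (x0, 64u - q0) | rv32_srl (i0H, q0 - 32u);
}

/* Desired code: neg computes -q0, and q0 is used directly.  */
uint32_t
foo_neg (uint32_t i0H, uint32_t x0, uint32_t q0)
{
  return rv32_sll (x0, 0u - q0) | rv32_srl (i0H, q0);
}

/* Returns 1 if both sequences agree for a range of counts and a few
   sample operands, 0 otherwise.  */
int
sequences_agree (void)
{
  const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu };
  const unsigned n = sizeof samples / sizeof samples[0];

  for (uint32_t q0 = 0; q0 < 256; q0++)
    for (unsigned a = 0; a < n; a++)
      for (unsigned b = 0; b < n; b++)
        if (foo_li_sub (samples[a], samples[b], q0)
            != foo_neg (samples[a], samples[b], q0))
          return 0;
  return 1;
}
```

Because 64 and 32 are both multiples of 32, `64-q0` and `-q0` have the same low 5 bits, as do `q0-32` and `q0`, so the two functions return the same value for every count.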
313 | 314 | There is incidentally a discussion about the B extension where `sbextiw` works differently than `slliw` which could be a problem. 315 | 316 | Another solution is to add combiner patterns to try to match these constructs. We already have combiner patterns that match a shift with the shift count anded against a mask. We could add similar patterns to accept a shift with a shift count that has a constant added or subtracted from it when the constant is a multiple of the word size. For a reverse subtract immediate, e.g. 32 - count, we emit a `neg` instruction for the shift count. Though since this is trading 2 insns for 2 insns, I'm not sure if there is any benefit to this. 317 | 318 | == Document history 319 | 320 | [cols="<1,<2,<3,<4",options="header,pagewidth",] 321 | |================================================================================ 322 | | _Revision_ | _Date_ | _Author_ | _Modification_ 323 | | 0.01 | 26 October 2020 | 324 | 325 | Jim Wilson | 326 | 327 | Initial set of optimizations 328 | 329 | |================================================================================ 330 | -------------------------------------------------------------------------------- /projects/infrastructure-for-perf-tracking.adoc: -------------------------------------------------------------------------------- 1 | = The Performance Tracking System (PTS), and a Task Group for it 2 | //// 3 | SPDX-License-Identifier: CC-BY-4.0 4 | 5 | Document conventions: 6 | - one line per paragraph (don't fill lines - this makes changes clearer) 7 | - Wikipedia heading conventions (First word only capitalized) 8 | - US spelling throughout. 9 | //// 10 | 11 | == Overview 12 | 13 | === Scope 14 | 15 | The proposal aims to build a performance tracking system (PTS) for V8, OpenJDK, and other performance critical open source software in the RISC-V ecosystem. 
 16 | The system includes a set of scripts for building the software we care about and running benchmarks, a farm of machines (hardware), and a website to show the performance data. 17 | The idea and implementation are based on the open-source arewefastyet project, which originated at Mozilla. 18 | 19 | After the system is set up, we may need a task group to determine which hardware, software and benchmarks should be added to the system. 20 | The task group may share its meeting with the Code Speed Optimization SIG, since the people attending the two groups are almost the same. 21 | 22 | The main difference between the PTS and a general/upstream CI/CD system is that the PTS focuses on getting real performance data. It should therefore always run on physical hardware, because different hardware has different performance characteristics. 23 | 24 | We aim to track the V8 and OpenJDK projects first, and to include GCC, Clang/LLVM, SpiderMonkey, Rust, Golang, Haskell, OCaml, Wasmtime, LuaJIT, etc. in the near future. 25 | 26 | We are also targeting OpenBLAS and other important HPC/AI libraries in the future. 27 | 28 | === RISC-V market justification 29 | 30 | RISC-V is expanding the boundary of its ecosystem. It has become popular in the embedded area in recent years, and is now targeting server, mobile, AI, and other areas. 31 | Code speed (performance) is holding back the adoption of RISC-V in the desktop, HPC and mobile fields. 32 | Without a JIT, Java on RISC-V was ~100x slower than on x86/Arm. 33 | V8 and other runtimes with JITs are under-optimized for RISC-V. 34 | GCC and Clang still have huge potential for improvement. 35 | It is important to know where we are and how far we can go. 36 | We need a platform to track the opportunities and measure our progress. 37 | It is a kind of public good, a lighthouse for toolchains, runtimes and libraries. 38 | So it is better to put it under the umbrella of RISC-V International. 
 39 | 40 | === Document history 41 | 42 | [cols="<2,<2,<3,<7",options="header,pagewidth",] 43 | |================================================================================ 44 | | _Revision_ | _Date_ | _Author_ | _Modification_ 45 | | 0.2 | 2021-01-03 | 46 | 47 | Wei Wu | 48 | 49 | Update the proposal after meeting 2020-12-17. 50 | 51 | | 0.1 | 2020-12-15 | 52 | 53 | Wei Wu | 54 | 55 | Initial version for discussion. 56 | 57 | |================================================================================ 58 | 59 | == Requirements 60 | 61 | === Context 62 | 63 | The performance tracking system has these types of objects: 64 | 65 | * the repos we care about. Currently GCC, LLVM, V8, OpenJDK and OpenBLAS. 66 | * the hardware/machines we care about. Currently the HiFive Unleashed. 67 | * the benchmarks we want to measure. Currently SunSpider/Octane/Kraken for JavaScript, SPECjvm98 for OpenJDK, and Embench for GCC/Clang. 68 | * the combinations of build configs and run configs for the compilers/libs we want to track. 69 | 70 | === Phasing 71 | 72 | * Phase 1: set up the infrastructure 73 |   - one compiler (V8), one board (Unleashed), three (JS) benchmarks 74 |   - One engineer, one month. Run the process end to end and bring the website online. 75 | 76 | * Phase 2: more toolchains & runtimes, all available open-source benchmarks. 77 |   - Include upstream GCC, RISC-V GCC, upstream Clang/LLVM 78 |   - Include OpenJDK/HotSpot, OpenJDK/OpenJ9 79 |   - Include Rust, Golang, and Wasmtime 80 |   - Include Embench and all available open-source benchmarks 81 | 82 | * Phase 3: more compilers, runtimes, physical boards and libraries 83 |   - Add the HiFive Unmatched and other physical boards, e.g. from Alibaba. 84 |   - Add more compilers & runtimes for academic and domain specific languages, like Haskell, OCaml, Julia, R, etc. 85 |   - Set up APIs or processes to let RVI members run the scripts on their own devices and upload the results to the platform/website. 
(The idea is much like the submission process of spec.org.) 86 |   - Include OpenBLAS and other compute libraries. 87 | 88 | === Outline plan and timescale 89 | 90 | * Phase 1: 1 engineer, 1 month. Plan: 2021-2-1 ~ 2021-3-1 91 |   - Leave one month for volunteers, though work can start today. 92 | * Phase 2: 2 engineers, 3 months. Plan: 2021-3-1 ~ 2021-6-1 93 | * Phase 3: 2 engineers, 2 months[1]. Plan: 2021-6-1 ~ 2021-8-1 94 |   - [1] This depends on the status of the hardware. 95 | * Phase 4: Governed by a Perf Tracking TG. 96 | 97 | Once the infrastructure has been built, a Task Group is needed to determine which repos should be added, which benchmarks should be imported, and how to analyze and report the performance data we get. 98 | The new task group may share its meeting time with the Code Speed Optimization SIG. 99 | 100 | === Prerequisites 101 | 102 | Contributors taking on this project need: 103 | 104 | * Basic knowledge of performance testing and tuning. 105 | * Basic knowledge of how to set up a website and tune JS and PHP code. 106 | * A physical RISC-V board (not necessary if you just want to test). 107 | * Basic knowledge of how to write bash and Python scripts. 108 | 109 | 110 | === Deliverables 111 | 112 | * An open-source toolset for building compilers & running benchmarks & showing perf results on the web. 113 | * Potential optimization opportunities and regressions uncovered by the PTS. 114 | * Annual progress report of all the compilers, runtimes, libraries under the tracking of the PTS. 115 | 116 | A monthly report to the SIG is a routine deliverable for all projects, and will cover 117 | - key progress in the past month 118 | - table showing tasks to be done, in progress, complete and verified 119 | - test/benchmark results 120 | - updates to the risk register 121 | - plans for the next month 122 | 123 | === Milestones 124 | 125 | It is a small project to begin with, so the milestones are the phases themselves. 
 126 | 127 | === Costs 128 | 129 | * equipment costs 130 |   - Plan to get more physical boards donated by the member companies that make them. 131 |   - A VPS/IaaS for hosting the website. PLCT has an existing VPS and can donate it. 132 | * license costs 133 |   - Use free and open benchmarks first. 134 |   - Non-free benchmarks are under investigation. How could they be donated? Is it even possible (given the license of each benchmark)? 135 | * personnel costs 136 |   - In the first phase a professional engineer might need 80 hours to get everything done. 137 |   - After the website and hardware farm are established, roughly one engineer for one day (8 hours) per week. 138 | 139 | == Risk register 140 | 141 | TBD. The PLCT Lab is willing to contribute engineers. 142 | 143 | Risks are assessed by the Impact (I) they have on the project, from 1 (minor) to 3 (project killer), and by the Likelihood (L) of the risk occurring, from 1 (10% chance) through 10 (100% chance). The two are multiplied to give an overall Risk Factor (R). Mitigation must be provided for any risk with I = 3 or R >= 10. 144 | 145 | [cols="<4,1,1,1,<4",options="header,pagewidth",] 146 | |============================================================================= 147 | | _Risk_ | _I_ | _L_ | _R_ | _Mitigation_ 148 | | No silicon available for testing | 3 | 2 | 6 | Use cycle accurate models. Persuade more companies to donate. 149 | | Too few engineers committed by members | 3 | 1 | 6 | RISC-V International to fund contract engineers to do the work. 150 | | Too slow progress made by the projects under tracking | 1 | 8 | 8 | Raise this status with the software SC or TSC/CTO. 151 | |============================================================================= 152 | 153 | The risk register will be maintained on an ongoing basis. 154 | 155 | == Support 156 | 157 | * Physical boards are needed. 158 |   - Boards that can run Linux are especially welcome. 
 159 | * Members can run scripts on their own boards and upload the data. 160 | * Need commercial toolchains & runtimes to run the scripts and send back the performance data to the tracking platform. 161 | 162 | 163 | Table of member organizations and commitments 164 | 165 | [cols="<4,<4,1,1,1",options="header,pagewidth",] 166 | |============================================================================= 167 | | _Organization_ | _Commitment_ | _Past_ | _2021_ | _2022_ 168 | | PLCT Lab. | 12 engineer months compiler expertise; VM for website; hardware | X | X | X 169 | | StarFive | Hardware: HiFive Unleashed; Unmatched | X | X | TBD 170 | | ICT, CAS | Open Source RISC-V Core & FPGA emulation platform | N/A | X | X 171 | | Nuclei System Technology | Nuclei DDR 200T FPGA Evaluation Board, Supporting Nuclei 200/300/600/900 | N/A | X | X 172 | |============================================================================= 173 | -------------------------------------------------------------------------------- /projects/linker-files/.gitignore: -------------------------------------------------------------------------------- 1 | # Ignore files if someone follows the steps to generate the example 2 | demo 3 | globvars.i 4 | globvars.o 5 | globvars.s 6 | prog.i 7 | prog.o 8 | prog.s 9 | -------------------------------------------------------------------------------- /projects/linker-files/globvars.c: -------------------------------------------------------------------------------- 1 | __attribute__((section(".data"))) int globvar1 = 42; 2 | __attribute__((section(".data"))) int globvar2 = 561; 3 | -------------------------------------------------------------------------------- /projects/linker-files/prog.c: -------------------------------------------------------------------------------- 1 | extern int globvar1; 2 | extern int globvar2; 3 | 4 | int 5 | main () 6 | { 7 | int z = globvar1 + globvar2; 8 | return z; 9 | } 10 | 
-------------------------------------------------------------------------------- /projects/linker-optimizations.adoc: -------------------------------------------------------------------------------- 1 | = Project proposal: Linker Optimization 2 | RISC-V International Code Speed Optimization SIG 3 | 4 | //// 5 | SPDX-License-Identifier: CC-BY-4.0 6 | 7 | Document conventions: 8 | - one line per paragraph (don't fill lines - this makes changes clearer) 9 | - Wikipedia heading conventions (First word only capitalized) 10 | - US spelling throughout. 11 | //// 12 | 13 | == Introduction 14 | 15 | For a number of scenarios, optimization of code is only possible once addresses have been resolved after linker relaxation and relocation. 16 | 17 | Consider a program in two source files. First link:linker-files/prog.c[prog.c] 18 | [source,c] 19 | ---- 20 | extern int globvar1; 21 | extern int globvar2; 22 | 23 | int 24 | main () 25 | { 26 | int z = globvar1 + globvar2; 27 | return z; 28 | } 29 | ---- 30 | 31 | and a second file, link:linker-files/globvars.c[globvars.c], providing a definition of the two global variables: 32 | [source,c] 33 | ---- 34 | __attribute__((section(".data"))) int globvar1 = 42; 35 | __attribute__((section(".data"))) int globvar2 = 561; 36 | ---- 37 | 38 | For this example, we use the `section` attribute to prevent the variables ending up in small data accessible via the `gp` register (we want to treat their addresses as potentially large). 
39 | 40 | We compile them with optimization to object files, saving their intermediate assembly files for inspection, and then link them: 41 | [source,bash] 42 | ---- 43 | riscv32-unknown-elf-gcc -O2 -save-temps -c prog.c 44 | riscv32-unknown-elf-gcc -O2 -save-temps -c globvars.c 45 | riscv32-unknown-elf-gcc -o demo prog.o globvars.o 46 | ---- 47 | 48 | If we disassemble the generated executable, we see an apparent missed optimization: 49 | [source,objdump] 50 | ---- 51 | riscv32-unknown-elf-objdump --disassemble=main demo 52 | 53 | demo: file format elf32-littleriscv 54 | 55 | 56 | Disassembly of section .text: 57 | 58 | 00010074 <main>: 59 | 10074: 67c5 lui a5,0x11 60 | 10076: 3847a503 lw a0,900(a5) # 11384 <globvar1> 61 | 1007a: 67c5 lui a5,0x11 62 | 1007c: 3807a783 lw a5,896(a5) # 11380 <globvar2> 63 | 10080: 953e add a0,a0,a5 64 | 10082: 8082 ret 65 | ---- 66 | 67 | Why is `a5` reloaded? Surely copy propagation followed by dead code elimination should have eliminated the second `lui a5,0x11`. We can see why by looking at the assembly code generated from `prog.c`, saved in the file `prog.s`: 68 | [source,gas] 69 | ---- 70 | main: 71 | lui a5,%hi(globvar1) 72 | lw a0,%lo(globvar1)(a5) 73 | lui a5,%hi(globvar2) 74 | lw a5,%lo(globvar2)(a5) 75 | add a0,a0,a5 76 | ret 77 | ---- 78 | 79 | The compiler cannot know that `%hi(globvar1)` and `%hi(globvar2)` will yield the same value, which happens here because `globvar1` and `globvar2` end up in the same 4KiB page. This only becomes apparent once the final executable is linked. 80 | 81 | == Proposed solution 82 | 83 | The linker already provides some optimizations. The most obvious is relaxation, but in the case of RISC-V it also generates compressed instructions and makes some simplifying peephole-like optimizations. 84 | 85 | To solve the problem described here, we need to extend the linker with some mainstream compiler optimizations that run in parallel with linker relaxation and relocation. The suggested initial set is: 86 | - copy propagation 87 | - dead code elimination (DCE) 88 | 89 | The linker will need to know where to find basic blocks, but this information should be readily available, since it is created to support options like `-fprofile-arcs`. 90 | 91 | The major difference from running these optimizations in a compiler is that carrying out any optimization step potentially triggers a relaxation opportunity, causing addresses to change. This means that optimizations that have already been carried out may become invalid. Imagine, for instance, that in our example relaxation caused `globvar1` to slip into a different 4KiB page. Thus the optimizations must allow backtracking.
92 | 93 | Backtracking in compilers is avoided where possible for efficiency, but is not unknown: for example, in SNOBOL4 pattern matching or Prolog resolution. Thus this optimization will need to address the challenge of efficiency. 94 | 95 | There is one more challenge: the optimization must guarantee convergence. It must not be able to get into a cycle of optimization and backtracking that does not terminate. 96 | 97 | == Category of project 98 | 99 | This is a _research_ project. Academic exploration is needed to understand whether a practical solution is feasible: 100 | 101 | - can backtracking versions of these optimizations be created? 102 | - what are the implications of these optimizations for linker performance? 103 | - can convergence be guaranteed? 104 | 105 | == Document history 106 | 107 | [cols="<1,<2,<3,<4",options="header,pagewidth",] 108 | |================================================================================ 109 | | _Revision_ | _Date_ | _Author_ | _Modification_ 110 | | 0.01 | 4 November 2020 | 111 | 112 | Jeremy Bennett | 113 | 114 | Problem description and outline proposal. 115 | 116 | |================================================================================ 117 | -------------------------------------------------------------------------------- /projects/prd-outline.adoc: -------------------------------------------------------------------------------- 1 | = RISC-V Code Speed Optimization Project Requirement Definition Template 2 | 3 | //// 4 | SPDX-License-Identifier: CC-BY-4.0 5 | 6 | Document conventions: 7 | - one line per paragraph (don't fill lines - this makes changes clearer) 8 | - Wikipedia heading conventions (First word only capitalized) 9 | - US spelling throughout. 
10 | //// 11 | 12 | This is the outline template for developing project proposals. 13 | 14 | == Overview 15 | 16 | === Scope 17 | 18 | Summarize the project and what it is to achieve. 19 | 20 | === RISC-V market justification 21 | 22 | Why should RISC-V International be driving this project? 23 | 24 | === Document history 25 | 26 | [cols="<2,<2,<3,<7",options="header,pagewidth",] 27 | |================================================================================ 28 | | _Revision_ | _Date_ | _Author_ | _Modification_ 29 | | 0.01 | 2020-12-04 | 30 | 31 | Jeremy Bennett, 32 | Wei Wu | 33 | 34 | Initial version for discussion. 35 | 36 | |================================================================================ 37 | 38 | == Requirements 39 | 40 | === Context 41 | 42 | Explain the project in sufficient detail for the following sections to make sense. 43 | 44 | === Phasing 45 | 46 | Some projects may represent tens of engineer years of effort over a number of years. In these cases, the project should be broken into a number of phases, each lasting 3-6 months. Each phase should be logically coherent and have meaningful deliverables, such that just completing a single phase is a useful body of work. 47 | 48 | Note that it is quite usual for the PRD to describe a multi-phase project, but only provide detailed planning for the first phase, with planning the next phase being one of the deliverables of each phase. 49 | 50 | === Outline plan and timescale 51 | 52 | The problem must be sufficiently well analyzed that it can be planned in sufficient detail to provide meaningful estimates of effort required and likely timescales. 53 | 54 | === Prerequisites 55 | 56 | * Equipment needed, and where this can be obtained 57 | * Skills required by engineers 58 | * Other prerequisites - for example, those working on FSF tools will need an FSF copyright agreement in place 59 | 60 | === Deliverables 61 | 62 | The outputs of the project. 
For projects of any length it is reasonable to have interim deliverables. The success of the project is measured through its deliverables. 63 | 64 | A monthly report to the SIG is a routine deliverable for all projects, which will cover: 65 | - key progress in the past month 66 | - table showing tasks to be done, in progress, complete and verified 67 | - test/benchmark results 68 | - updates to the risk register 69 | - plans for the next month 70 | 71 | === Milestones 72 | 73 | In order to be program managed, the project must have quantifiable milestones, typically 1-2 per month. It is important that the milestones are objectively measurable. 74 | 75 | === Costs 76 | 77 | * equipment costs 78 | * license costs 79 | * personnel costs 80 | 81 | The costs should not be in cash terms, but in terms of equipment required, or engineer months/years of effort required (and type of engineer required). 82 | 83 | == Risk register 84 | 85 | Risks are assessed by the Impact (I) they have on the project, from 1 (minor) to 3 (project killer), and by the Likelihood (L) of the risk occurring, from 1 (10% chance) through 10 (100% chance). The two are multiplied to give an overall Risk Factor (R). Mitigation must be provided for any risk with I = 3 or R >= 10. 86 | 87 | [cols="<4,1,1,1,<4",options="header,pagewidth",] 88 | |============================================================================= 89 | | _Risk_ | _I_ | _L_ | _R_ | _Mitigation_ 90 | | No silicon available for testing | 2 | 6 | 12 | Use cycle accurate models 91 | | Too few engineers committed by members | 92 | 93 | 3 | 2 | 6 | 94 | 95 | RISC-V International to fund contract engineers to do the work 96 | |============================================================================= 97 | 98 | The risk register will be maintained on an ongoing basis. 99 | 100 | == Support 101 | 102 | A project is only useful if members are willing to back it with real resources, which can be in kind or in cash. 
Generally a project should only go ahead under the aegis of RISC-V if multiple members are backing it. 103 | 104 | Table of member organizations and commitments 105 | 106 | [cols="<4,<4,1,1,1",options="header,pagewidth",] 107 | |============================================================================= 108 | | _Organization_ | _Commitment_ | _Past_ | _2021_ | _2022_ 109 | | Acme Inc. | 3 engineer months compiler expertise | | X | 110 | | Nadir AB | €50k funding | | | X 111 | | ... | ... | ... | ... | ... 112 | |============================================================================= 113 | --------------------------------------------------------------------------------