├── LICENSE ├── README.md ├── __init__.py ├── amr_io.py ├── dev1_coref.fof ├── dev2_coref.fof ├── docAMR.jpg ├── docAMR_from_gold.sh ├── docAMR_from_pairwise.sh ├── docAMR_from_unmerged.sh ├── docSmatch ├── __init__.py ├── amr.py └── smatch.py ├── doc_amr.py ├── doc_amr_baseline ├── README.md ├── __init__.py ├── amr_constituents.py ├── baseline_io.py ├── example │ ├── doc_sen.amr │ └── docamr_docAMR.out ├── get_allen_coref.py ├── make_doc_amr.py ├── requirements.txt ├── run_doc_amr_baseline.sh └── tests │ └── baseline_allennlp_test.sh ├── requirements.txt ├── setup.py ├── test_coref.fof └── train_coref.fof /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright 2022 International Business Machines 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Setup 2 | 3 | Make an environment with Python 3.7 4 | 5 | Activate the environment, then install the dependencies: 6 | ``` 7 | pip install -r requirements.txt 8 | ``` 9 | 10 | # Create docAMR representation 11 | 12 | Link to the NAACL 2022 paper DOCAMR: Multi-Sentence AMR Representation and Evaluation 13 | 14 | https://aclanthology.org/2022.naacl-main.256.pdf 15 | 16 | 17 | 18 | To create the docAMR representation from gold AMR3.0 data and the coref annotation in xml format: 19 | ``` 20 | python doc_amr.py 21 | --amr3-path <amr3 path> 22 | --coref-fof <coref fof> 23 | --out-amr <output file> 24 | --rep <representation> 25 | ``` 26 | ```--amr3-path``` should point to the uncompressed LDC data directory for LDC2020T02 with its original directory structure. 27 | 28 | ```--coref-fof``` is one of the ```_coref.fof``` files included in this repository. 29 | 30 | The default value for ```--rep``` is ```'docAMR'```. Other values can be: ```'no-merge'```, ```'merge-names'```, ```'merge-all'```. Use ```--help``` to read the descriptions of these representations. 31 | 32 | ------- 33 | 34 | To create the docAMR representation from document AMRs with no nodes merged: 35 | ``` 36 | python doc_amr.py 37 | --in-doc-amr-unmerged <input file> 38 | --rep <representation> 39 | --out-amr <output file> 40 | ``` 41 | 42 | ------- 43 | 44 | To create the docAMR representation from document AMRs with pairwise edges between a representative node in the chain and the rest of the nodes in the chain: 45 | ``` 46 | python doc_amr.py 47 | --in-doc-amr-pairwise <input file> 48 | --pairwise-coref-rel <relation label> 49 | --rep <representation> 50 | --out-amr <output file> 51 | ``` 52 | 53 | The default value for ```--pairwise-coref-rel``` is ```same-as```. 54 | 55 | # Evaluate docAMR (docSmatch) 56 | 57 | Use docSmatch the same way as the standard Smatch. 58 | 59 | ``` 60 | python docSmatch/smatch.py -f <parsed file> <gold file> 61 | ``` 62 | 63 | It assumes that ```:snt``` relations connect sentences to the root. Moreover, it assumes that the numeric suffix of ```:snt``` is the sentence number and that the matching sentence numbers in the two AMRs are aligned. 64 | 65 | You can also get a detailed score breakdown for the accuracy of coreference prediction: 66 | ``` 67 | python docSmatch/smatch.py -f <parsed file> <gold file> --coref-subscore 68 | ``` 69 | This will output the normal Smatch score as 'Overall Score', as well as a 'Coref Score' indicating the quality of cross-sentential edges and nodes.
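For example, an end-to-end sketch using the commands above (the output file names here are illustrative): build the gold docAMR reference for the dev1 split, convert a parser's unmerged document AMRs to the same representation, and score one against the other:
```
python doc_amr.py --amr3-path <amr3 path> --coref-fof dev1_coref.fof --rep docAMR --out-amr dev1.gold.docamr
python doc_amr.py --in-doc-amr-unmerged <parsed unmerged file> --rep docAMR --out-amr dev1.parsed.docamr
python docSmatch/smatch.py -f dev1.parsed.docamr dev1.gold.docamr --coref-subscore
```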
70 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/__init__.py -------------------------------------------------------------------------------- /dev1_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/train/msamr_dfa_004.xml 2 | data/multisentence/ms-amr-split/train/msamr_dfa_018.xml 3 | data/multisentence/ms-amr-split/train/msamr_dfa_022.xml 4 | data/multisentence/ms-amr-split/train/msamr_dfa_040.xml 5 | data/multisentence/ms-amr-split/train/msamr_dfa_044.xml 6 | data/multisentence/ms-amr-split/train/msamr_dfa_056.xml 7 | data/multisentence/ms-amr-split/train/msamr_dfa_064.xml 8 | data/multisentence/ms-amr-split/train/msamr_dfa_066.xml 9 | data/multisentence/ms-amr-split/train/msamr_dfa_073.xml 10 | data/multisentence/ms-amr-split/train/msamr_dfa_086.xml 11 | data/multisentence/ms-amr-split/train/msamr_dfa_092.xml 12 | data/multisentence/ms-amr-split/train/msamr_dfa_096.xml 13 | data/multisentence/ms-amr-split/train/msamr_dfa_097.xml 14 | data/multisentence/ms-amr-split/train/msamr_dfa_100.xml 15 | data/multisentence/ms-amr-split/train/msamr_dfa_104.xml 16 | data/multisentence/ms-amr-split/train/msamr_dfa_106.xml 17 | data/multisentence/ms-amr-split/train/msamr_dfa_108.xml 18 | data/multisentence/ms-amr-split/train/msamr_dfa_109.xml 19 | data/multisentence/ms-amr-split/train/msamr_dfa_138.xml 20 | data/multisentence/ms-amr-split/train/msamr_dfa_147.xml 21 | data/multisentence/ms-amr-split/train/msamr_dfb_002.xml 22 | data/multisentence/ms-amr-split/train/msamr_dfb_003.xml 23 | data/multisentence/ms-amr-split/train/msamr_dfb_006.xml 24 | data/multisentence/ms-amr-split/train/msamr_dfb_010.xml 25 | data/multisentence/ms-amr-split/train/msamr_dfb_013.xml 26 | data/multisentence/ms-amr-split/train/msamr_dfb_015.xml 27 | data/multisentence/ms-amr-split/train/msamr_dfb_019.xml 28 | data/multisentence/ms-amr-split/train/msamr_dfb_021.xml 29 | data/multisentence/ms-amr-split/train/msamr_dfb_022.xml 30 | data/multisentence/ms-amr-split/train/msamr_dfb_026.xml 31 | data/multisentence/ms-amr-split/train/msamr_dfb_028.xml 32 | data/multisentence/ms-amr-split/train/msamr_dfb_032.xml 33 | data/multisentence/ms-amr-split/train/msamr_dfb_038.xml 34 | data/multisentence/ms-amr-split/train/msamr_dfb_041.xml 35 | data/multisentence/ms-amr-split/train/msamr_dfb_054.xml 36 | data/multisentence/ms-amr-split/train/msamr_dfb_055.xml 37 | data/multisentence/ms-amr-split/train/msamr_dfb_057.xml 38 | data/multisentence/ms-amr-split/train/msamr_dfb_061.xml 39 | data/multisentence/ms-amr-split/train/msamr_dfb_066.xml 40 | data/multisentence/ms-amr-split/train/msamr_dfb_121.xml 41 | data/multisentence/ms-amr-split/train/msamr_dfb_133.xml 42 | data/multisentence/ms-amr-split/train/msamr_dfb_134.xml 43 | -------------------------------------------------------------------------------- /dev2_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-double-annotations/msamr_dfa_004.alternative.xml 2 | data/multisentence/ms-amr-double-annotations/msamr_dfa_018.alternative.xml 3 | data/multisentence/ms-amr-double-annotations/msamr_dfa_022.alternative.xml 4 | data/multisentence/ms-amr-double-annotations/msamr_dfa_040.alternative.xml 5 | 
data/multisentence/ms-amr-double-annotations/msamr_dfa_044.alternative.xml 6 | data/multisentence/ms-amr-double-annotations/msamr_dfa_056.alternative.xml 7 | data/multisentence/ms-amr-double-annotations/msamr_dfa_064.alternative.xml 8 | data/multisentence/ms-amr-double-annotations/msamr_dfa_066.alternative.xml 9 | data/multisentence/ms-amr-double-annotations/msamr_dfa_073.alternative.xml 10 | data/multisentence/ms-amr-double-annotations/msamr_dfa_086.alternative.xml 11 | data/multisentence/ms-amr-double-annotations/msamr_dfa_092.alternative.xml 12 | data/multisentence/ms-amr-double-annotations/msamr_dfa_096.alternative.xml 13 | data/multisentence/ms-amr-double-annotations/msamr_dfa_097.alternative.xml 14 | data/multisentence/ms-amr-double-annotations/msamr_dfa_100.alternative.xml 15 | data/multisentence/ms-amr-double-annotations/msamr_dfa_104.alternative.xml 16 | data/multisentence/ms-amr-double-annotations/msamr_dfa_106.alternative.xml 17 | data/multisentence/ms-amr-double-annotations/msamr_dfa_108.alternative.xml 18 | data/multisentence/ms-amr-double-annotations/msamr_dfa_109.alternative.xml 19 | data/multisentence/ms-amr-double-annotations/msamr_dfa_138.alternative.xml 20 | data/multisentence/ms-amr-double-annotations/msamr_dfa_147.alternative.xml 21 | data/multisentence/ms-amr-double-annotations/msamr_dfb_002.alternative.xml 22 | data/multisentence/ms-amr-double-annotations/msamr_dfb_003.alternative.xml 23 | data/multisentence/ms-amr-double-annotations/msamr_dfb_006.alternative.xml 24 | data/multisentence/ms-amr-double-annotations/msamr_dfb_010.alternative.xml 25 | data/multisentence/ms-amr-double-annotations/msamr_dfb_013.alternative.xml 26 | data/multisentence/ms-amr-double-annotations/msamr_dfb_015.alternative.xml 27 | data/multisentence/ms-amr-double-annotations/msamr_dfb_019.alternative.xml 28 | data/multisentence/ms-amr-double-annotations/msamr_dfb_021.alternative.xml 29 | data/multisentence/ms-amr-double-annotations/msamr_dfb_022.alternative.xml 30 | data/multisentence/ms-amr-double-annotations/msamr_dfb_026.alternative.xml 31 | data/multisentence/ms-amr-double-annotations/msamr_dfb_028.alternative.xml 32 | data/multisentence/ms-amr-double-annotations/msamr_dfb_032.alternative.xml 33 | data/multisentence/ms-amr-double-annotations/msamr_dfb_038.alternative.xml 34 | data/multisentence/ms-amr-double-annotations/msamr_dfb_041.alternative.xml 35 | data/multisentence/ms-amr-double-annotations/msamr_dfb_054.alternative.xml 36 | data/multisentence/ms-amr-double-annotations/msamr_dfb_055.alternative.xml 37 | data/multisentence/ms-amr-double-annotations/msamr_dfb_057.alternative.xml 38 | data/multisentence/ms-amr-double-annotations/msamr_dfb_061.alternative.xml 39 | data/multisentence/ms-amr-double-annotations/msamr_dfb_066.alternative.xml 40 | data/multisentence/ms-amr-double-annotations/msamr_dfb_121.alternative.xml 41 | data/multisentence/ms-amr-double-annotations/msamr_dfb_133.alternative.xml 42 | data/multisentence/ms-amr-double-annotations/msamr_dfb_134.alternative.xml 43 | -------------------------------------------------------------------------------- /docAMR.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/docAMR.jpg -------------------------------------------------------------------------------- /docAMR_from_gold.sh: -------------------------------------------------------------------------------- 1 | 2 | 3 | 
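# Usage sketch (inferred from the variables below, not an official interface):
#   bash docAMR_from_gold.sh <split> <rep>
# e.g. `bash docAMR_from_gold.sh dev1 docAMR` reads dev1_coref.fof and writes outputs/dev1.gold.docAMR.out.
# Note: amr3path below is a site-specific default; point it at your own uncompressed LDC2020T02 directory.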
amr3path="/dccstor/ykt-parse/SHARED/CORPORA/AMR/amr_annotation_3.0" 4 | split=$1 5 | rep=$2 6 | output_dir="outputs" 7 | 8 | mkdir -p $output_dir 9 | 10 | python doc_amr.py \ 11 | --amr3-path $amr3path \ 12 | --coref-fof ${split}_coref.fof \ 13 | --rep $rep \ 14 | --out-amr $output_dir/${split}.gold.$rep.out 15 | 16 | -------------------------------------------------------------------------------- /docAMR_from_pairwise.sh: -------------------------------------------------------------------------------- 1 | 2 | amr=$1 3 | rep=$2 4 | rel="same-as" 5 | 6 | python doc_amr.py \ 7 | --in-doc-amr-pairwise $amr \ 8 | --pairwise-coref-rel $rel \ 9 | --rep $rep \ 10 | --out-amr $amr.$rep.out 11 | -------------------------------------------------------------------------------- /docAMR_from_unmerged.sh: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | amr=$1 5 | rep=$2 6 | 7 | python doc_amr.py \ 8 | --in-doc-amr-unmerged $amr \ 9 | --rep $rep \ 10 | --out-amr $amr.$rep.out 11 | 12 | -------------------------------------------------------------------------------- /docSmatch/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/docSmatch/__init__.py -------------------------------------------------------------------------------- /docSmatch/amr.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ 5 | This code is taken from https://github.com/snowblink14/smatch 6 | 7 | and changed for DocAMR to: 8 | 1. constrain smatch node mapping based on sentence alignements to make it faster 9 | 2. compute coref subscore for DocAMR 10 | 11 | ----- 12 | 13 | AMR (Abstract Meaning Representation) structure 14 | For detailed description of AMR, see http://www.isi.edu/natural-language/amr/a.pdf 15 | 16 | """ 17 | 18 | from __future__ import print_function 19 | from collections import defaultdict 20 | import sys 21 | 22 | # change this if needed 23 | ERROR_LOG = sys.stderr 24 | 25 | # change this if needed 26 | DEBUG_LOG = sys.stderr 27 | 28 | 29 | class AMR(object): 30 | """ 31 | AMR is a rooted, labeled graph to represent semantics. 32 | This class has the following members: 33 | nodes: list of node in the graph. Its ith element is the name of the ith node. For example, a node name 34 | could be "a1", "b", "g2", .etc 35 | node_values: list of node labels (values) of the graph. Its ith element is the value associated with node i in 36 | nodes list. In AMR, such value is usually a semantic concept (e.g. "boy", "want-01") 37 | root: root node name 38 | relations: list of edges connecting two nodes in the graph. Each entry is a link between two nodes, i.e. a triple 39 | . In AMR, such link denotes the relation between two semantic 40 | concepts. For example, "arg0" means that one of the concepts is the 0th argument of the other. 41 | attributes: list of edges connecting a node to an attribute name and its value. For example, if the polarity of 42 | some node is negative, there should be an edge connecting this node and "-". A triple < attribute name, 43 | node name, attribute value> is used to represent such attribute. It can also be viewed as a relation. 44 | 45 | """ 46 | def __init__(self, node_list=None, node_value_list=None, relation_list=None, attribute_list=None, nodes_by_sentence=None): 47 | """ 48 | node_list: names of nodes in AMR graph, e.g. 
"a11", "n" 49 | node_value_list: values of nodes in AMR graph, e.g. "group" for a node named "g" 50 | relation_list: list of relations between two nodes 51 | attribute_list: list of attributes (links between one node and one constant value) 52 | 53 | """ 54 | # initialize AMR graph nodes using list of nodes name 55 | # root, by default, is the first in var_list 56 | 57 | if node_list is None: 58 | self.nodes = [] 59 | self.root = None 60 | else: 61 | self.nodes = node_list[:] 62 | if len(node_list) != 0: 63 | self.root = node_list[0] 64 | else: 65 | self.root = None 66 | if node_value_list is None: 67 | self.node_values = [] 68 | else: 69 | self.node_values = node_value_list[:] 70 | if relation_list is None: 71 | self.relations = [] 72 | else: 73 | self.relations = relation_list[:] 74 | if attribute_list is None: 75 | self.attributes = [] 76 | else: 77 | self.attributes = attribute_list[:] 78 | 79 | self._descendants = None 80 | if nodes_by_sentence is None: 81 | self.categorize_nodes_by_sentence()#_uninverted() 82 | else: 83 | self.nodes_by_sentence = nodes_by_sentence[:] 84 | self.coref_nodes = [] 85 | self.coref_edges = [] 86 | self.named_entities = [] 87 | self.find_coref() 88 | 89 | def find_coref(self): 90 | self.coref_nodes = [] 91 | self.coref_edges = [] 92 | self.named_entities = [] 93 | sentences_by_node = {n:set() for n in self.nodes} 94 | for i,nodes_in_sentence in enumerate(self.nodes_by_sentence): 95 | for n in nodes_in_sentence: 96 | sentences_by_node[n].add(i) 97 | candidates = [n for n in sentences_by_node if len(sentences_by_node[n])>1] 98 | relations_to_node = {n:[] for n in self.nodes} 99 | for i,s in enumerate(self.nodes): 100 | for rel in self.relations[i]: 101 | r, t, is_inverted = rel 102 | if r not in ['coref','part','subset']: 103 | if is_inverted: 104 | relations_to_node[s].append((r+'-of', t)) 105 | else: 106 | relations_to_node[t].append((r, s)) 107 | # coref nodes 108 | special = ['implicit-role', 'coref-entity'] 109 | for n, concept in zip(self.nodes, self.node_values): 110 | if concept in special: 111 | self.coref_nodes.append(n) 112 | _coref_nodes = set() 113 | for n in self.coref_nodes: 114 | _coref_nodes.add(n) 115 | for n in candidates[:]: 116 | if len(relations_to_node[n]) < 2: continue 117 | rels = [rel for rel in relations_to_node[n] if sentences_by_node[rel[1]]!=sentences_by_node[n]] 118 | if len(rels) < 2: continue 119 | parent1 = rels[0][1] 120 | if any(sentences_by_node[parent2]!=sentences_by_node[parent1] for _,parent2 in rels): 121 | _coref_nodes.add(n) 122 | # coref edges 123 | for i,s in enumerate(self.nodes): 124 | for rel in self.relations[i]: 125 | r, t, _ = rel 126 | if t in _coref_nodes: 127 | self.coref_edges.append((r, s, t)) 128 | elif s in _coref_nodes and r in ['coref']: 129 | self.coref_edges.append((r, s, t)) 130 | elif r in ['part','subset'] and sentences_by_node[s]!=sentences_by_node[t]: 131 | self.coref_edges.append((r, s, t)) 132 | # named entities 133 | _named_entities = set() 134 | for i, s in enumerate(self.nodes): 135 | for rel in self.relations[i]: 136 | r, t, inv = rel 137 | if r == 'name': 138 | _named_entities.add(s) 139 | self.named_entities = [n for n in _named_entities] 140 | 141 | def categorize_nodes_by_sentence(self): 142 | 143 | # this method sorts nodes into sentence buckets 144 | # one node can go into multiple sentences' buckets 145 | 146 | # this is usefull for faster smatch for document AMR when sentence alignments are known 147 | # following constraint makes the smatch faster: 148 | # A node 'n' in one AMR 
can map to a node in the other AMR only if one of its sentence buckets aligns with a sentence bucket of the other node. 149 | # this code assumes that :sntx in one AMR is aligned to :snty in the other if x == y 150 | 151 | sen_roots = {} 152 | for rel in self.relations[self.nodes.index(self.root)]: 153 | if rel[0].startswith('snt'): 154 | idx = int(rel[0][3:]) 155 | sen_roots[idx] = rel[1] 156 | 157 | if len(sen_roots) == 0: 158 | self.nodes_by_sentence = [self.nodes[:]] 159 | return 160 | 161 | #find descendents of every node 162 | #so that descendents of each sentence root get populated 163 | descendents = {n: {n} for n in self.nodes} 164 | coref_rels = [] 165 | for (i, rels) in enumerate(self.relations): 166 | x = self.nodes[i] 167 | for rel in rels: 168 | r, y, is_inverted = rel 169 | 170 | if r != 'coref' and (r != 'domain' or not is_inverted): 171 | descendents[x].update(descendents[y]) 172 | for n in descendents: 173 | if x in descendents[n]: 174 | descendents[n].update(descendents[x]) 175 | 176 | if r == 'coref': 177 | coref_rels.append((x,r,y)) 178 | 179 | xx = x 180 | yy = y 181 | 182 | if is_inverted or r in ['part','subset','coref']: 183 | 184 | #if the edge was originally inverted 185 | #add descendents in the reverse direction too 186 | 187 | yy = x 188 | xx = y 189 | 190 | descendents[xx].update(descendents[yy]) 191 | for n in descendents: 192 | if xx in descendents[n]: 193 | descendents[n].update(descendents[xx]) 194 | 195 | for (x,r,y) in coref_rels: 196 | if y not in descendents[self.root]: 197 | if x in descendents[self.root]: 198 | descendents[x].update(descendents[y]) 199 | for n in descendents: 200 | if x in descendents[n]: 201 | descendents[n].update(descendents[x]) 202 | 203 | for node in self.nodes: 204 | if node not in descendents[self.root]: 205 | print(node + " is still not assigned to a sentence!!!") 206 | 207 | self.nodes_by_sentence = [[self.root]] 208 | for i in range(len(sen_roots)): 209 | nodes_in_sentence = list(descendents[sen_roots[i+1]]) 210 | self.nodes_by_sentence.append(nodes_in_sentence) 211 | 212 | def rename_node(self, prefix): 213 | """ 214 | Rename AMR graph nodes to prefix + node_index to avoid nodes with the same name in two different AMRs. 215 | 216 | """ 217 | self.new2old_map = {} 218 | node_map_dict = {} 219 | # map each node to its new name (e.g. "a1") 220 | for i in range(0, len(self.nodes)): 221 | node_map_dict[self.nodes[i]] = prefix + str(i) 222 | self.new2old_map[prefix + str(i)] = self.nodes[i] 223 | # update node name 224 | for i, v in enumerate(self.nodes): 225 | self.nodes[i] = node_map_dict[v] 226 | # update names for list of per-sentence node lists 227 | new_nodes_by_sentence = [] 228 | for i in range(len(self.nodes_by_sentence)): 229 | new_nodes_by_sentence.append([]) 230 | for j in range(len(self.nodes_by_sentence[i])): 231 | if node_map_dict[self.nodes_by_sentence[i][j]] not in new_nodes_by_sentence[i]: 232 | new_nodes_by_sentence[i].append(node_map_dict[self.nodes_by_sentence[i][j]]) 233 | self.nodes_by_sentence = new_nodes_by_sentence 234 | # update node name in relations 235 | for node_relations in self.relations: 236 | for i, l in enumerate(node_relations): 237 | node_relations[i][1] = node_map_dict[l[1]] 238 | self.coref_nodes = [node_map_dict[n] for n in self.coref_nodes] 239 | self.coref_edges = [(r, node_map_dict[s], node_map_dict[t]) for r,s,t in self.coref_edges] 240 | self.named_entities = [node_map_dict[n] for n in self.named_entities] 241 | 242 | def get_triples(self): 243 | """ 244 | Get the triples in three lists. 
245 | instance_triple: a triple representing an instance. E.g. instance(w, want-01) 246 | attribute triple: relation of attributes, e.g. polarity(w, - ) 247 | and relation triple, e.g. arg0 (w, b) 248 | 249 | """ 250 | instance_triple = [] 251 | relation_triple = [] 252 | attribute_triple = [] 253 | for i in range(len(self.nodes)): 254 | instance_triple.append(("instance", self.nodes[i], self.node_values[i])) 255 | # l[0] is relation name 256 | # l[1] is the other node this node has relation with 257 | for l in self.relations[i]: 258 | relation_triple.append((l[0], self.nodes[i], l[1])) 259 | # l[0] is the attribute name 260 | # l[1] is the attribute value 261 | for l in self.attributes[i]: 262 | attribute_triple.append((l[0], self.nodes[i], l[1])) 263 | return instance_triple, attribute_triple, relation_triple 264 | 265 | 266 | def get_triples2(self): 267 | """ 268 | Get the triples in two lists: 269 | instance_triple: a triple representing an instance. E.g. instance(w, want-01) 270 | relation_triple: a triple representing all relations. E.g arg0 (w, b) or E.g. polarity(w, - ) 271 | Note that we do not differentiate between attribute triple and relation triple. Both are considered as relation 272 | triples. 273 | All triples are represented by (triple_type, argument 1 of the triple, argument 2 of the triple) 274 | 275 | """ 276 | instance_triple = [] 277 | relation_triple = [] 278 | for i in range(len(self.nodes)): 279 | # an instance triple is instance(node name, node value). 280 | # For example, instance(b, boy). 281 | instance_triple.append(("instance", self.nodes[i], self.node_values[i])) 282 | # l[0] is relation name 283 | # l[1] is the other node this node has relation with 284 | for l in self.relations[i]: 285 | relation_triple.append((l[0], self.nodes[i], l[1])) 286 | # l[0] is the attribute name 287 | # l[1] is the attribute value 288 | for l in self.attributes[i]: 289 | relation_triple.append((l[0], self.nodes[i], l[1])) 290 | return instance_triple, relation_triple 291 | 292 | 293 | def __str__(self): 294 | """ 295 | Generate AMR string for better readability 296 | 297 | """ 298 | lines = [] 299 | for i in range(len(self.nodes)): 300 | lines.append("Node "+ str(i) + " " + self.nodes[i]) 301 | lines.append("Value: " + self.node_values[i]) 302 | lines.append("Relations:") 303 | for relation in self.relations[i]: 304 | lines.append("Node " + relation[1] + " via " + relation[0]) 305 | for attribute in self.attributes[i]: 306 | lines.append("Attribute: " + attribute[0] + " value " + attribute[1]) 307 | return "\n".join(lines) 308 | 309 | def __repr__(self): 310 | return self.__str__() 311 | 312 | def output_amr(self): 313 | """ 314 | Output AMR string 315 | 316 | """ 317 | print(self.__str__(), file=DEBUG_LOG) 318 | 319 | @staticmethod 320 | def get_amr_line(input_f): 321 | """ 322 | Read the file containing AMRs. AMRs are separated by a blank line. 323 | Each call of get_amr_line() returns the next available AMR (in one-line form). 
324 | Note: this function does not verify if the AMR is valid 325 | 326 | """ 327 | cur_amr = [] 328 | has_content = False 329 | for line in input_f: 330 | line = line.strip() 331 | if line == "": 332 | if not has_content: 333 | # empty lines before current AMR 334 | continue 335 | else: 336 | # end of current AMR 337 | break 338 | if line.strip().startswith("#"): 339 | #if "::id" in line: 340 | # print(line) 341 | # ignore the comment line (starting with "#") in the AMR file 342 | continue 343 | else: 344 | has_content = True 345 | cur_amr.append(line.strip()) 346 | return "".join(cur_amr) 347 | 348 | @staticmethod 349 | def parse_AMR_line(line): 350 | """ 351 | Parse an AMR from its line representation into an AMR object. 352 | This parsing algorithm scans the line once and processes each character, in a shift-reduce style. 353 | 354 | """ 355 | # Current state. It denotes the last significant symbol encountered. 1 for (, 2 for :, 3 for /, 356 | # and 0 for start state or ')' 357 | # Last significant symbol is ( --- start processing node name 358 | # Last significant symbol is : --- start processing relation name 359 | # Last significant symbol is / --- start processing node value (concept name) 360 | # Last significant symbol is ) --- current node processing is complete 361 | # Note that if these symbols are inside parentheses, they are not significant symbols. 362 | 363 | exceptions = set(["prep-on-behalf-of", "prep-out-of", "consist-of"]) 364 | def update_triple(node_relation_dict, u, r, v): 365 | # we detect a relation (r) between u and v, with direction u to v. 366 | # in most cases, if the relation name ends with "-of", e.g. "arg0-of", 367 | # it is the reverse of some relation. For example, if a is "arg0-of" b, 368 | # we can also say b is "arg0" a. 369 | # If the relation name ends with "-of", we store the reverse relation. 
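# for instance (illustrative note, derived from the branches below): update_triple(d, u, "arg0-of", v)
# appends ("arg0", u, True) to d[v]; the True flag (is_inverted) records that the surface
# relation used the "-of" form.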
370 | # but note some exceptions like "prep-on-behalf-of" and "prep-out-of" 371 | # also note relation "mod" is the reverse of "domain" 372 | if r.endswith("-of") and not r in exceptions: 373 | #if u.split(".")[0] != v.split(".")[0]: 374 | # print(u+" -" + r + "-> "+v) 375 | node_relation_dict[v].append((r[:-3], u, True)) 376 | elif r=="mod": 377 | node_relation_dict[v].append(("domain", u, True)) 378 | else: 379 | node_relation_dict[u].append((r, v, False)) 380 | 381 | state = 0 382 | # node stack for parsing 383 | stack = [] 384 | # current not-yet-reduced character sequence 385 | cur_charseq = [] 386 | # key: node name value: node value 387 | node_dict = {} 388 | # node name list (order: occurrence of the node) 389 | node_name_list = [] 390 | # key: node name: value: list of (relation name, the other node name) 391 | node_relation_dict1 = defaultdict(list) 392 | # key: node name, value: list of (attribute name, const value) or (relation name, unseen node name) 393 | node_relation_dict2 = defaultdict(list) 394 | # current relation name 395 | cur_relation_name = "" 396 | # having unmatched quote string 397 | in_quote = False 398 | for i, c in enumerate(line.strip()): 399 | if c == " ": 400 | # allow space in relation name 401 | if state == 2: 402 | cur_charseq.append(c) 403 | continue 404 | if c == "\"": 405 | # flip in_quote value when a quote symbol is encountered 406 | # insert placeholder if in_quote from last symbol 407 | if in_quote: 408 | cur_charseq.append('_') 409 | in_quote = not in_quote 410 | elif c == "(": 411 | # not significant symbol if inside quote 412 | if in_quote: 413 | cur_charseq.append(c) 414 | continue 415 | # get the attribute name 416 | # e.g :arg0 (x ... 417 | # at this point we get "arg0" 418 | if state == 2: 419 | # in this state, current relation name should be empty 420 | if cur_relation_name != "": 421 | print("Format error when processing ", line[0:i + 1], file=ERROR_LOG) 422 | return None 423 | # update current relation name for future use 424 | cur_relation_name = "".join(cur_charseq).strip() 425 | cur_charseq[:] = [] 426 | state = 1 427 | elif c == ":": 428 | # not significant symbol if inside quote 429 | if in_quote: 430 | cur_charseq.append(c) 431 | continue 432 | # Last significant symbol is "/". Now we encounter ":" 433 | # Example: 434 | # :OR (o2 / *OR* 435 | # :mod (o3 / official) 436 | # gets node value "*OR*" at this point 437 | if state == 3: 438 | node_value = "".join(cur_charseq) 439 | # clear current char sequence 440 | cur_charseq[:] = [] 441 | # pop node name ("o2" in the above example) 442 | cur_node_name = stack[-1] 443 | # update node name/value map 444 | node_dict[cur_node_name] = node_value 445 | # Last significant symbol is ":". 
Now we encounter ":" 446 | # Example: 447 | # :op1 w :quant 30 448 | # or :day 14 :month 3 449 | # the problem is that we cannot decide if node value is attribute value (constant) 450 | # or node value (variable) at this moment 451 | elif state == 2: 452 | temp_attr_value = "".join(cur_charseq) 453 | cur_charseq[:] = [] 454 | parts = temp_attr_value.split() 455 | if len(parts) < 2: 456 | import ipdb; ipdb.set_trace() 457 | print("Error in processing; part len < 2", line[0:i + 1], file=ERROR_LOG) 458 | return None 459 | # For the above example, node name is "op1", and node value is "w" 460 | # Note that this node name might not be encountered before 461 | relation_name = parts[0].strip() 462 | relation_value = parts[1].strip() 463 | # We need to link upper level node to the current 464 | # top of stack is upper level node 465 | if len(stack) == 0: 466 | print("Error in processing", line[:i], relation_name, relation_value, file=ERROR_LOG) 467 | return None 468 | # if we have not seen this node name before 469 | if relation_value not in node_dict: 470 | update_triple(node_relation_dict2, stack[-1], relation_name, relation_value) 471 | else: 472 | update_triple(node_relation_dict1, stack[-1], relation_name, relation_value) 473 | state = 2 474 | elif c == "/": 475 | if in_quote: 476 | cur_charseq.append(c) 477 | continue 478 | # Last significant symbol is "(". Now we encounter "/" 479 | # Example: 480 | # (d / default-01 481 | # get "d" here 482 | if state == 1: 483 | node_name = "".join(cur_charseq) 484 | cur_charseq[:] = [] 485 | # if this node name is already in node_dict, it is duplicate 486 | if node_name in node_dict: 487 | print("Duplicate node name ", node_name, " in parsing AMR", file=ERROR_LOG) 488 | return None 489 | # push the node name to stack 490 | stack.append(node_name) 491 | # add it to node name list 492 | node_name_list.append(node_name) 493 | # if this node is part of the relation 494 | # Example: 495 | # :arg1 (n / nation) 496 | # cur_relation_name is arg1 497 | # node name is n 498 | # we have a relation arg1(upper level node, n) 499 | if cur_relation_name != "": 500 | update_triple(node_relation_dict1, stack[-2], cur_relation_name, node_name) 501 | cur_relation_name = "" 502 | else: 503 | # error if in other state 504 | print("Error in parsing AMR", line[0:i + 1], file=ERROR_LOG) 505 | return None 506 | state = 3 507 | elif c == ")": 508 | if in_quote: 509 | cur_charseq.append(c) 510 | continue 511 | # stack should be non-empty to find upper level node 512 | if len(stack) == 0: 513 | print("Unmatched parenthesis at position", i, "in processing", line[0:i + 1], file=ERROR_LOG) 514 | return None 515 | # Last significant symbol is ":". 
Now we encounter ")" 516 | # Example: 517 | # :op2 "Brown") or :op2 w) 518 | # get \"Brown\" or w here 519 | if state == 2: 520 | temp_attr_value = "".join(cur_charseq) 521 | cur_charseq[:] = [] 522 | parts = temp_attr_value.split() 523 | if len(parts) < 2: 524 | print("Error processing", line[:i + 1], temp_attr_value, file=ERROR_LOG) 525 | return None 526 | relation_name = parts[0].strip() 527 | relation_value = parts[1].strip() 528 | # attribute value not seen before 529 | # Note that it might be a constant attribute value, or an unseen node 530 | # process this after we have seen all the node names 531 | if relation_value not in node_dict: 532 | update_triple(node_relation_dict2, stack[-1], relation_name, relation_value) 533 | else: 534 | update_triple(node_relation_dict1, stack[-1], relation_name, relation_value) 535 | # Last significant symbol is "/". Now we encounter ")" 536 | # Example: 537 | # :arg1 (n / nation) 538 | # we get "nation" here 539 | elif state == 3: 540 | node_value = "".join(cur_charseq) 541 | cur_charseq[:] = [] 542 | cur_node_name = stack[-1] 543 | # map node name to its value 544 | node_dict[cur_node_name] = node_value 545 | # pop from stack, as the current node has been processed 546 | stack.pop() 547 | cur_relation_name = "" 548 | state = 0 549 | else: 550 | # not significant symbols, so we just shift. 551 | cur_charseq.append(c) 552 | #create data structures to initialize an AMR 553 | node_value_list = [] 554 | relation_list = [] 555 | attribute_list = [] 556 | for v in node_name_list: 557 | if v not in node_dict: 558 | print("Error: Node name not found", v, file=ERROR_LOG) 559 | return None 560 | else: 561 | node_value_list.append(node_dict[v]) 562 | # build relation list and attribute list for this node 563 | node_rel_list = [] 564 | node_attr_list = [] 565 | if v in node_relation_dict1: 566 | for v1 in node_relation_dict1[v]: 567 | node_rel_list.append([v1[0], v1[1], v1[2]]) 568 | if v in node_relation_dict2: 569 | for v2 in node_relation_dict2[v]: 570 | # if value is in quote, it is a constant value 571 | # strip the quote and put it in attribute map 572 | if v2[1][0] == "\"" and v2[1][-1] == "\"": 573 | node_attr_list.append([[v2[0]], v2[1][1:-1], v2[2]]) 574 | # if value is a node name 575 | elif v2[1] in node_dict: 576 | node_rel_list.append([v2[0], v2[1], v2[2]]) 577 | else: 578 | node_attr_list.append([v2[0], v2[1], v2[2]]) 579 | # each node has a relation list and attribute list 580 | relation_list.append(node_rel_list) 581 | attribute_list.append(node_attr_list) 582 | # add TOP as an attribute. The attribute value just needs to be constant 583 | attribute_list[0].append(["TOP", 'top']) 584 | result_amr = AMR(node_name_list, node_value_list, relation_list, attribute_list) 585 | return result_amr 586 | 587 | # test AMR parsing 588 | # run by amr.py [file containing AMR] 589 | # a unittest can also be used. 
590 | if __name__ == "__main__": 591 | if len(sys.argv) < 2: 592 | print("No file given", file=ERROR_LOG) 593 | exit(1) 594 | amr_count = 1 595 | for line in open(sys.argv[1]): 596 | cur_line = line.strip() 597 | if cur_line == "" or cur_line.startswith("#"): 598 | continue 599 | print("AMR", amr_count, file=DEBUG_LOG) 600 | current = AMR.parse_AMR_line(cur_line) 601 | current.output_amr() 602 | amr_count += 1 603 | -------------------------------------------------------------------------------- /docSmatch/smatch.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | """ 6 | This code is taken from https://github.com/snowblink14/smatch 7 | 8 | and changed for DocAMR to: 9 | 1. constrain smatch node mapping based on sentence alignments to make it faster 10 | 2. compute coref subscore for DocAMR 11 | """ 12 | 13 | 14 | """ 15 | This script computes smatch score between two AMRs. 16 | For a detailed description of smatch, see http://www.isi.edu/natural-language/amr/smatch-13.pdf 17 | 18 | """ 19 | 20 | import random 21 | 22 | from . import amr 23 | import sys 24 | import time 25 | from tqdm import tqdm 26 | 27 | # total number of iterations in smatch computation 28 | iteration_num = 5 29 | allowed_mappings = {} 30 | # verbose output switch. 31 | # Default false (no verbose output) 32 | verbose = False 33 | veryVerbose = False 34 | 35 | # single score output switch. 36 | # Default true (compute a single score for all AMRs in two files) 37 | single_score = True 38 | 39 | # precision and recall output switch. 40 | # Default false (do not output precision and recall, just output F score) 41 | pr_flag = False 42 | 43 | # Error log location 44 | ERROR_LOG = sys.stderr 45 | 46 | # Debug log location 47 | DEBUG_LOG = sys.stderr 48 | 49 | # dictionary to save pre-computed node mapping and its resulting triple match count 50 | # key: tuples of node mapping 51 | # value: the matching triple count 52 | match_triple_dict = {} 53 | 54 | 55 | class SMATCH_Alignment(): 56 | 57 | def __init__(self, mapping, pred_amr, gold_amr): 58 | (pred_instance, pred_attributes, pred_relation) = pred_amr.get_triples() 59 | (gold_instance, gold_attributes, gold_relation) = gold_amr.get_triples() 60 | 61 | self.pred_concepts = {} 62 | self.gold_concepts = {} 63 | self.pred_edges = {} 64 | self.gold_edges = {} 65 | self.pred_attr = {} 66 | self.gold_attr = {} 67 | for instance in pred_instance: 68 | r, s, t = instance 69 | self.pred_concepts[s] = t 70 | for instance in gold_instance: 71 | r, s, t = instance 72 | self.gold_concepts[s] = t 73 | for rel in pred_relation: 74 | r, s, t = rel 75 | if not (s,t) in self.pred_edges: 76 | self.pred_edges[(s, t)] = [] 77 | self.pred_edges[(s, t)].append(r) 78 | for rel in gold_relation: 79 | r, s, t = rel 80 | if not (s, t) in self.gold_edges: 81 | self.gold_edges[(s, t)] = [] 82 | self.gold_edges[(s, t)].append(r) 83 | for rel in pred_attributes: 84 | r, s, t = rel 85 | if s not in self.pred_attr: 86 | self.pred_attr[s] = [] 87 | self.pred_attr[s].append((r, normalize(t))) 88 | for rel in gold_attributes: 89 | r, s, t = rel 90 | if s not in self.gold_attr: 91 | self.gold_attr[s] = [] 92 | self.gold_attr[s].append((r, normalize(t))) 93 | 94 | self.node_align = {} 95 | self.node_align_inv = {} 96 | self.edge_align = {} 97 | self.edge_align_inv = {} 98 | self.attr_align = {} 99 | 100 | self._build_node_alignment(mapping, pred_instance, gold_instance) 101 | self._build_edge_alignment() 102 | 
self._build_attr_alignment() 103 | 104 | 105 | def _build_node_alignment(self, mapping, pred_instance, gold_instance): 106 | for instance2 in gold_instance: 107 | r, s, t = instance2 108 | self.node_align_inv[('instance',s,t)] = set() 109 | for instance1, m in zip(pred_instance, mapping): 110 | r,s,t = instance1 111 | if m>-1: 112 | r2,s2,t2 = gold_instance[m] 113 | self.node_align[('instance',s,t)] = ('instance',s2,t2) 114 | self.node_align_inv[('instance',s2,t2)].add(('instance',s,t)) 115 | else: 116 | self.node_align[('instance',s,t)] = None 117 | 118 | 119 | def _build_edge_alignment(self): 120 | for s, t in self.pred_edges: 121 | rs = self.pred_edges[(s, t)] 122 | s2 = self.pred_to_gold(node=s) 123 | t2 = self.pred_to_gold(node=t) 124 | if not s2 or not t2: 125 | for r in rs: 126 | self.edge_align[(r, s, t)] = None 127 | continue 128 | if (s2, t2) in self.gold_edges: 129 | rs2 = self.gold_edges[(s2, t2)] 130 | for r in rs: 131 | if r in rs2: 132 | self.edge_align[(r, s, t)] = (r, s2, t2) 133 | else: 134 | self.edge_align[(r, s, t)] = (rs2[0], s2, t2) 135 | else: 136 | for r in rs: 137 | self.edge_align[(r, s, t)] = None 138 | self.edge_align_inv = {} 139 | for s,t in self.gold_edges: 140 | rs = self.gold_edges[(s, t)] 141 | for r in rs: 142 | self.edge_align_inv[(r,s,t)] = set() 143 | for rel in self.edge_align: 144 | rel2 = self.edge_align[rel] 145 | if rel2: 146 | self.edge_align_inv[rel2].add(rel) 147 | 148 | 149 | def _build_attr_alignment(self): 150 | for n in self.pred_attr: 151 | for r, t in self.pred_attr[n]: 152 | n2 = self.pred_to_gold(node=n) 153 | if not n2: 154 | self.attr_align[(r, n, t)] = None 155 | continue 156 | if n2 in self.gold_attr and (r, t) in self.gold_attr[n2]: 157 | self.attr_align[(r, n, t)] = (r, n2, t) 158 | else: 159 | self.attr_align[(r, n, t)] = None 160 | 161 | def gold_to_pred(self, node): 162 | if node: 163 | x = self.node_align_inv[('instance', node, self.gold_concepts[node])] 164 | return [n for _, n, c in x] 165 | 166 | def pred_to_gold(self, node): 167 | if node: 168 | x = self.node_align[('instance', node, self.pred_concepts[node])] 169 | if not x: 170 | return None 171 | _, node2, concept = x 172 | return node2 173 | 174 | def iterate_errors(self): 175 | for rel in self.node_align: 176 | rel2 = self.node_align[rel] 177 | if not rel2: 178 | yield 'instance', rel, rel2 179 | continue 180 | r,s,t = rel 181 | r2,s2,t2 = rel2 182 | if t!=t2: 183 | yield 'instance', rel, rel2 184 | for rel in self.edge_align: 185 | rel2 = self.edge_align[rel] 186 | if not rel2: 187 | yield 'relation', rel, rel2 188 | continue 189 | r, s, t = rel 190 | r2, s2, t2 = rel2 191 | if r!=r2: 192 | yield 'relation', rel, rel2 193 | for rel in self.attr_align: 194 | rel2 = self.attr_align[rel] 195 | if not rel2: 196 | yield 'attribute', rel, rel2 197 | continue 198 | r, s, t = rel 199 | r2, s2, t2 = rel2 200 | if r != r2 or t!=t2: 201 | yield 'attribute', rel, rel2 202 | 203 | class Scores: 204 | def __init__(self): 205 | self.num = 0 206 | self.gold_total = 0 207 | self.pred_total = 0 208 | 209 | def get(self): 210 | return self.num, self.pred_total, self.gold_total 211 | 212 | def set(self, num=None, pred=None, gold=None): 213 | if num is not None: 214 | self.num = num 215 | if gold is not None: 216 | self.gold_total = gold 217 | if pred is not None: 218 | self.pred_total = pred 219 | 220 | def increment(self, num=None, pred=None, gold=None): 221 | if num is not None: 222 | self.num += num 223 | if gold is not None: 224 | self.gold_total += gold 225 | if pred is not None: 
226 | self.pred_total += pred 227 | 228 | def update(self, scores): 229 | self.num += scores.num 230 | self.gold_total += scores.gold_total 231 | self.pred_total += scores.pred_total 232 | 233 | 234 | 235 | def get_best_match(instance1, attribute1, relation1, 236 | instance2, attribute2, relation2, 237 | prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True, nodes_by_sentence1=None, nodes_by_sentence2=None): 238 | """ 239 | Get the highest triple match number between two sets of triples via hill-climbing. 240 | Arguments: 241 | instance1: instance triples of AMR 1 ("instance", node name, node value) 242 | attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value) 243 | relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name) 244 | instance2: instance triples of AMR 2 ("instance", node name, node value) 245 | attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value) 246 | relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name) 247 | prefix1: prefix label for AMR 1 248 | prefix2: prefix label for AMR 2 249 | Returns: 250 | best_match: the node mapping that results in the highest triple matching number 251 | best_match_num: the highest triple matching number 252 | 253 | """ 254 | # Compute candidate pool - all possible node match candidates. 255 | # In the hill-climbing, we only consider candidate in this pool to save computing time. 256 | # weight_dict is a dictionary that maps a pair of node 257 | (candidate_mappings, weight_dict) = compute_pool(instance1, attribute1, relation1, 258 | instance2, attribute2, relation2, 259 | prefix1, prefix2, doinstance=doinstance, doattribute=doattribute, 260 | dorelation=dorelation, 261 | lol1=nodes_by_sentence1, lol2=nodes_by_sentence2) 262 | if veryVerbose: 263 | print("Candidate mappings:", file=DEBUG_LOG) 264 | print(candidate_mappings, file=DEBUG_LOG) 265 | print("Weight dictionary", file=DEBUG_LOG) 266 | print(weight_dict, file=DEBUG_LOG) 267 | 268 | lol1 = [] 269 | lol2 = [] 270 | for n in range(len(nodes_by_sentence1)): 271 | lol1.append([]) 272 | for node in nodes_by_sentence1[n]: 273 | node_index = int(node[len(prefix1):]) 274 | lol1[-1].append(node_index) 275 | for n in range(len(nodes_by_sentence2)): 276 | lol2.append([]) 277 | for node in nodes_by_sentence2[n]: 278 | node_index = int(node[len(prefix2):]) 279 | lol2[-1].append(node_index) 280 | 281 | best_match_num = 0 282 | # initialize best match mapping 283 | # the ith entry is the node index in AMR 2 which maps to the ith node in AMR 1 284 | best_mapping = [-1] * len(instance1) 285 | for i in range(iteration_num): 286 | if veryVerbose: 287 | print("Iteration", i, file=DEBUG_LOG) 288 | if i == 0: 289 | # smart initialization used for the first round 290 | cur_mapping = smart_init_mapping(candidate_mappings, instance1, instance2) 291 | else: 292 | # random initialization for the other round 293 | cur_mapping = random_init_mapping(candidate_mappings) 294 | # compute current triple match number 295 | match_num = compute_match(cur_mapping, weight_dict) 296 | if veryVerbose: 297 | print("Node mapping at start", cur_mapping, file=DEBUG_LOG) 298 | print("Triple match number at start:", match_num, file=DEBUG_LOG) 299 | while True: 300 | # get best gain 301 | (gain, new_mapping) = get_best_gain(cur_mapping, candidate_mappings, weight_dict, 302 | len(instance2), match_num, lol1=lol1) 303 | if veryVerbose: 304 | print("Gain after the hill-climbing", gain, file=DEBUG_LOG) 305 | # hill-climbing 
until there will be no gain for new node mapping 306 | if gain <= 0: 307 | break 308 | # otherwise update match_num and mapping 309 | match_num += gain 310 | cur_mapping = new_mapping[:] 311 | if veryVerbose: 312 | print("Update triple match number to:", match_num, file=DEBUG_LOG) 313 | print("Current mapping:", cur_mapping, file=DEBUG_LOG) 314 | if match_num > best_match_num: 315 | best_mapping = cur_mapping[:] 316 | best_match_num = match_num 317 | return best_mapping, best_match_num, weight_dict 318 | 319 | 320 | def normalize(item): 321 | """ 322 | lowercase and remove quote signifiers from items that are about to be compared 323 | """ 324 | return item.lower().rstrip('_') 325 | 326 | 327 | def compute_pool(instance1, attribute1, relation1, 328 | instance2, attribute2, relation2, 329 | prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True, lol1=None, lol2=None): 330 | """ 331 | compute all possible node mapping candidates and their weights (the triple matching number gain resulting from 332 | mapping one node in AMR 1 to another node in AMR2) 333 | 334 | Arguments: 335 | instance1: instance triples of AMR 1 336 | attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value) 337 | relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name) 338 | instance2: instance triples of AMR 2 339 | attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value) 340 | relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name 341 | prefix1: prefix label for AMR 1 342 | prefix2: prefix label for AMR 2 343 | Returns: 344 | candidate_mapping: a list of candidate nodes. 345 | The ith element contains the node indices (in AMR 2) the ith node (in AMR 1) can map to. 346 | (resulting in non-zero triple match) 347 | weight_dict: a dictionary which contains the matching triple number for every pair of node mapping. The key 348 | is a node pair. The value is another dictionary. key {-1} is triple match resulting from this node 349 | pair alone (instance triples and attribute triples), and other keys are node pairs that can result 350 | in relation triple match together with the first node pair. 
351 | 352 | 353 | """ 354 | #allowed_mappings = {} 355 | allowed_mappings.clear() 356 | if lol1 is not None and lol2 is not None: 357 | for n in range(len(lol1)): 358 | for node1 in lol1[n]: 359 | node1_index = int(node1[len(prefix1):]) 360 | if node1_index not in allowed_mappings: 361 | allowed_mappings[node1_index] = set() 362 | if len(lol2) <= n : 363 | lol2.append([]) 364 | for node2 in lol2[n]: 365 | node2_index = int(node2[len(prefix2):]) 366 | if node2_index not in allowed_mappings[node1_index]: 367 | allowed_mappings[node1_index].add(node2_index) 368 | #allowed_mappings = None 369 | candidate_mapping = [] 370 | weight_dict = {} 371 | for instance1_item in instance1: 372 | # each candidate mapping is a set of node indices 373 | candidate_mapping.append(set()) 374 | if doinstance: 375 | for instance2_item in instance2: 376 | # if both triples are instance triples and have the same value 377 | if normalize(instance1_item[0]) == normalize(instance2_item[0]) and \ 378 | normalize(instance1_item[2]) == normalize(instance2_item[2]): 379 | # get node index by stripping the prefix 380 | node1_index = int(instance1_item[1][len(prefix1):]) 381 | node2_index = int(instance2_item[1][len(prefix2):]) 382 | if node1_index not in allowed_mappings: 383 | import ipdb; ipdb.set_trace() 384 | if allowed_mappings is None or node2_index in allowed_mappings[node1_index]: 385 | candidate_mapping[node1_index].add(node2_index) 386 | node_pair = (node1_index, node2_index) 387 | # use -1 as key in weight_dict for instance triples and attribute triples 388 | if node_pair in weight_dict: 389 | weight_dict[node_pair][-1] += 1 390 | else: 391 | weight_dict[node_pair] = {} 392 | weight_dict[node_pair][-1] = 1 393 | if doattribute: 394 | for attribute1_item in attribute1: 395 | for attribute2_item in attribute2: 396 | # if both attribute relation triple have the same relation name and value 397 | if normalize(attribute1_item[0]) == normalize(attribute2_item[0]) \ 398 | and normalize(attribute1_item[2]) == normalize(attribute2_item[2]): 399 | node1_index = int(attribute1_item[1][len(prefix1):]) 400 | node2_index = int(attribute2_item[1][len(prefix2):]) 401 | if allowed_mappings is None or node2_index in allowed_mappings[node1_index]: 402 | candidate_mapping[node1_index].add(node2_index) 403 | node_pair = (node1_index, node2_index) 404 | # use -1 as key in weight_dict for instance triples and attribute triples 405 | if node_pair in weight_dict: 406 | weight_dict[node_pair][-1] += 1 407 | else: 408 | weight_dict[node_pair] = {} 409 | weight_dict[node_pair][-1] = 1 410 | if dorelation: 411 | for relation1_item in relation1: 412 | for relation2_item in relation2: 413 | # if both relation share the same name 414 | if normalize(relation1_item[0]) == normalize(relation2_item[0]): 415 | node1_index_amr1 = int(relation1_item[1][len(prefix1):]) 416 | node1_index_amr2 = int(relation2_item[1][len(prefix2):]) 417 | node2_index_amr1 = int(relation1_item[2][len(prefix1):]) 418 | node2_index_amr2 = int(relation2_item[2][len(prefix2):]) 419 | # add mapping between two nodes 420 | if allowed_mappings is not None and (node1_index_amr2 not in allowed_mappings[node1_index_amr1] or node2_index_amr2 not in allowed_mappings[node2_index_amr1]): 421 | continue 422 | node_pair1 = (node1_index_amr1, node1_index_amr2) 423 | node_pair2 = (node2_index_amr1, node2_index_amr2) 424 | candidate_mapping[node1_index_amr1].add(node1_index_amr2) 425 | candidate_mapping[node2_index_amr1].add(node2_index_amr2) 426 | if node_pair2 != node_pair1: 427 | # 
update weight_dict weight. Note that we need to update both entries for future search 428 | # i.e weight_dict[node_pair1][node_pair2] 429 | # weight_dict[node_pair2][node_pair1] 430 | if node1_index_amr1 > node2_index_amr1: 431 | # swap node_pair1 and node_pair2 432 | node_pair1 = (node2_index_amr1, node2_index_amr2) 433 | node_pair2 = (node1_index_amr1, node1_index_amr2) 434 | if node_pair1 in weight_dict: 435 | if node_pair2 in weight_dict[node_pair1]: 436 | weight_dict[node_pair1][node_pair2] += 1 437 | else: 438 | weight_dict[node_pair1][node_pair2] = 1 439 | else: 440 | weight_dict[node_pair1] = {-1: 0, node_pair2: 1} 441 | if node_pair2 in weight_dict: 442 | if node_pair1 in weight_dict[node_pair2]: 443 | weight_dict[node_pair2][node_pair1] += 1 444 | else: 445 | weight_dict[node_pair2][node_pair1] = 1 446 | else: 447 | weight_dict[node_pair2] = {-1: 0, node_pair1: 1} 448 | else: 449 | if node_pair1 in weight_dict: 450 | weight_dict[node_pair1][-1] += 1 451 | else: 452 | weight_dict[node_pair1] = {-1: 1} 453 | 454 | return candidate_mapping, weight_dict 455 | 456 | 457 | def smart_init_mapping(candidate_mapping, instance1, instance2): 458 | """ 459 | Initialize mapping based on the concept mapping (smart initialization) 460 | Arguments: 461 | candidate_mapping: candidate node match list 462 | instance1: instance triples of AMR 1 463 | instance2: instance triples of AMR 2 464 | Returns: 465 | initialized node mapping between two AMRs 466 | 467 | """ 468 | random.seed() 469 | matched_dict = {} 470 | result = [] 471 | # list to store node indices that have no concept match 472 | no_word_match = [] 473 | for i, candidates in enumerate(candidate_mapping): 474 | if not candidates: 475 | # no possible mapping 476 | result.append(-1) 477 | continue 478 | # node value in instance triples of AMR 1 479 | value1 = instance1[i][2] 480 | for node_index in candidates: 481 | value2 = instance2[node_index][2] 482 | # find the first instance triple match in the candidates 483 | # instance triple match is having the same concept value 484 | if value1 == value2: 485 | if node_index not in matched_dict: 486 | result.append(node_index) 487 | matched_dict[node_index] = 1 488 | break 489 | if len(result) == i: 490 | no_word_match.append(i) 491 | result.append(-1) 492 | # if no concept match, generate a random mapping 493 | for i in no_word_match: 494 | candidates = list(candidate_mapping[i]) 495 | while candidates: 496 | # get a random node index from candidates 497 | rid = random.randint(0, len(candidates) - 1) 498 | candidate = candidates[rid] 499 | if candidate in matched_dict: 500 | candidates.pop(rid) 501 | else: 502 | matched_dict[candidate] = 1 503 | result[i] = candidate 504 | break 505 | return result 506 | 507 | 508 | def random_init_mapping(candidate_mapping): 509 | """ 510 | Generate a random node mapping. 
511 | Args: 512 | candidate_mapping: candidate_mapping: candidate node match list 513 | Returns: 514 | randomly-generated node mapping between two AMRs 515 | 516 | """ 517 | # if needed, a fixed seed could be passed here to generate same random (to help debugging) 518 | random.seed() 519 | matched_dict = {} 520 | result = [] 521 | for c in candidate_mapping: 522 | candidates = list(c) 523 | if not candidates: 524 | # -1 indicates no possible mapping 525 | result.append(-1) 526 | continue 527 | found = False 528 | while candidates: 529 | # randomly generate an index in [0, length of candidates) 530 | rid = random.randint(0, len(candidates) - 1) 531 | candidate = candidates[rid] 532 | # check if it has already been matched 533 | if candidate in matched_dict: 534 | candidates.pop(rid) 535 | else: 536 | matched_dict[candidate] = 1 537 | result.append(candidate) 538 | found = True 539 | break 540 | if not found: 541 | result.append(-1) 542 | return result 543 | 544 | 545 | def compute_match(mapping, weight_dict): 546 | """ 547 | Given a node mapping, compute match number based on weight_dict. 548 | Args: 549 | mappings: a list of node index in AMR 2. The ith element (value j) means node i in AMR 1 maps to node j in AMR 2. 550 | Returns: 551 | matching triple number 552 | Complexity: O(m*n) , m is the node number of AMR 1, n is the node number of AMR 2 553 | 554 | """ 555 | # If this mapping has been investigated before, retrieve the value instead of re-computing. 556 | if veryVerbose: 557 | print("Computing match for mapping", file=DEBUG_LOG) 558 | print(mapping, file=DEBUG_LOG) 559 | if tuple(mapping) in match_triple_dict: 560 | if veryVerbose: 561 | print("saved value", match_triple_dict[tuple(mapping)], file=DEBUG_LOG) 562 | return match_triple_dict[tuple(mapping)] 563 | match_num = 0 564 | # i is node index in AMR 1, m is node index in AMR 2 565 | for i, m in enumerate(mapping): 566 | if m == -1: 567 | # no node maps to this node 568 | continue 569 | # node i in AMR 1 maps to node m in AMR 2 570 | current_node_pair = (i, m) 571 | if current_node_pair not in weight_dict: 572 | continue 573 | if veryVerbose: 574 | print("node_pair", current_node_pair, file=DEBUG_LOG) 575 | for key in weight_dict[current_node_pair]: 576 | if key == -1: 577 | # matching triple resulting from instance/attribute triples 578 | match_num += weight_dict[current_node_pair][key] 579 | if veryVerbose: 580 | print("instance/attribute match", weight_dict[current_node_pair][key], file=DEBUG_LOG) 581 | # only consider node index larger than i to avoid duplicates 582 | # as we store both weight_dict[node_pair1][node_pair2] and 583 | # weight_dict[node_pair2][node_pair1] for a relation 584 | elif key[0] < i: 585 | continue 586 | elif mapping[key[0]] == key[1]: 587 | match_num += weight_dict[current_node_pair][key] 588 | if veryVerbose: 589 | print("relation match with", key, weight_dict[current_node_pair][key], file=DEBUG_LOG) 590 | if veryVerbose: 591 | print("match computing complete, result:", match_num, file=DEBUG_LOG) 592 | # update match_triple_dict 593 | match_triple_dict[tuple(mapping)] = match_num 594 | return match_num 595 | 596 | 597 | def move_gain(mapping, node_id, old_id, new_id, weight_dict, match_num): 598 | """ 599 | Compute the triple match number gain from the move operation 600 | Arguments: 601 | mapping: current node mapping 602 | node_id: remapped node in AMR 1 603 | old_id: original node id in AMR 2 to which node_id is mapped 604 | new_id: new node in to which node_id is mapped 605 | weight_dict: 
weight dictionary 606 | match_num: the original triple matching number 607 | Returns: 608 | the triple match gain number (might be negative) 609 | 610 | """ 611 | # new node mapping after moving 612 | new_mapping = (node_id, new_id) 613 | # node mapping before moving 614 | old_mapping = (node_id, old_id) 615 | # new nodes mapping list (all node pairs) 616 | new_mapping_list = mapping[:] 617 | new_mapping_list[node_id] = new_id 618 | # if this mapping is already been investigated, use saved one to avoid duplicate computing 619 | if tuple(new_mapping_list) in match_triple_dict: 620 | return match_triple_dict[tuple(new_mapping_list)] - match_num 621 | gain = 0 622 | # add the triple match incurred by new_mapping to gain 623 | if new_mapping in weight_dict: 624 | for key in weight_dict[new_mapping]: 625 | if key == -1: 626 | # instance/attribute triple match 627 | gain += weight_dict[new_mapping][-1] 628 | elif new_mapping_list[key[0]] == key[1]: 629 | # relation gain incurred by new_mapping and another node pair in new_mapping_list 630 | gain += weight_dict[new_mapping][key] 631 | # deduct the triple match incurred by old_mapping from gain 632 | if old_mapping in weight_dict: 633 | for k in weight_dict[old_mapping]: 634 | if k == -1: 635 | gain -= weight_dict[old_mapping][-1] 636 | elif mapping[k[0]] == k[1]: 637 | gain -= weight_dict[old_mapping][k] 638 | # update match number dictionary 639 | match_triple_dict[tuple(new_mapping_list)] = match_num + gain 640 | return gain 641 | 642 | 643 | def swap_gain(mapping, node_id1, mapping_id1, node_id2, mapping_id2, weight_dict, match_num): 644 | """ 645 | Compute the triple match number gain from the swapping 646 | Arguments: 647 | mapping: current node mapping list 648 | node_id1: node 1 index in AMR 1 649 | mapping_id1: the node index in AMR 2 node 1 maps to (in the current mapping) 650 | node_id2: node 2 index in AMR 1 651 | mapping_id2: the node index in AMR 2 node 2 maps to (in the current mapping) 652 | weight_dict: weight dictionary 653 | match_num: the original matching triple number 654 | Returns: 655 | the gain number (might be negative) 656 | 657 | """ 658 | new_mapping_list = mapping[:] 659 | # Before swapping, node_id1 maps to mapping_id1, and node_id2 maps to mapping_id2 660 | # After swapping, node_id1 maps to mapping_id2 and node_id2 maps to mapping_id1 661 | new_mapping_list[node_id1] = mapping_id2 662 | new_mapping_list[node_id2] = mapping_id1 663 | if tuple(new_mapping_list) in match_triple_dict: 664 | return match_triple_dict[tuple(new_mapping_list)] - match_num 665 | gain = 0 666 | new_mapping1 = (node_id1, mapping_id2) 667 | new_mapping2 = (node_id2, mapping_id1) 668 | old_mapping1 = (node_id1, mapping_id1) 669 | old_mapping2 = (node_id2, mapping_id2) 670 | if node_id1 > node_id2: 671 | new_mapping2 = (node_id1, mapping_id2) 672 | new_mapping1 = (node_id2, mapping_id1) 673 | old_mapping1 = (node_id2, mapping_id2) 674 | old_mapping2 = (node_id1, mapping_id1) 675 | if new_mapping1 in weight_dict: 676 | for key in weight_dict[new_mapping1]: 677 | if key == -1: 678 | gain += weight_dict[new_mapping1][-1] 679 | elif new_mapping_list[key[0]] == key[1]: 680 | gain += weight_dict[new_mapping1][key] 681 | if new_mapping2 in weight_dict: 682 | for key in weight_dict[new_mapping2]: 683 | if key == -1: 684 | gain += weight_dict[new_mapping2][-1] 685 | # to avoid duplicate 686 | elif key[0] == node_id1: 687 | continue 688 | elif new_mapping_list[key[0]] == key[1]: 689 | gain += weight_dict[new_mapping2][key] 690 | if old_mapping1 in 
weight_dict: 691 | for key in weight_dict[old_mapping1]: 692 | if key == -1: 693 | gain -= weight_dict[old_mapping1][-1] 694 | elif mapping[key[0]] == key[1]: 695 | gain -= weight_dict[old_mapping1][key] 696 | if old_mapping2 in weight_dict: 697 | for key in weight_dict[old_mapping2]: 698 | if key == -1: 699 | gain -= weight_dict[old_mapping2][-1] 700 | # to avoid duplicate 701 | elif key[0] == node_id1: 702 | continue 703 | elif mapping[key[0]] == key[1]: 704 | gain -= weight_dict[old_mapping2][key] 705 | match_triple_dict[tuple(new_mapping_list)] = match_num + gain 706 | return gain 707 | 708 | 709 | def get_best_gain(mapping, candidate_mappings, weight_dict, instance_len, cur_match_num, lol1=None): 710 | """ 711 | Hill-climbing method to return the best gain swap/move can get 712 | Arguments: 713 | mapping: current node mapping 714 | candidate_mappings: the candidates mapping list 715 | weight_dict: the weight dictionary 716 | instance_len: the number of the nodes in AMR 2 717 | cur_match_num: current triple match number 718 | Returns: 719 | the best gain we can get via swap/move operation 720 | 721 | """ 722 | largest_gain = 0 723 | # True: using swap; False: using move 724 | use_swap = True 725 | # the node to be moved/swapped 726 | node1 = None 727 | # store the other node affected. In swap, this other node is the node swapping with node1. In move, this other 728 | # node is the node node1 will move to. 729 | node2 = None 730 | # unmatched nodes in AMR 2 731 | unmatched = set(range(instance_len)) 732 | # exclude nodes in current mapping 733 | # get unmatched nodes 734 | for nid in mapping: 735 | if nid in unmatched: 736 | unmatched.remove(nid) 737 | for i, nid in enumerate(mapping): 738 | # current node i in AMR 1 maps to node nid in AMR 2 739 | for nm in unmatched: 740 | if nm in candidate_mappings[i]: 741 | # remap i to another unmatched node (move) 742 | # (i, m) -> (i, nm) 743 | if veryVerbose: 744 | print("Remap node", i, "from ", nid, "to", nm, file=DEBUG_LOG) 745 | mv_gain = move_gain(mapping, i, nid, nm, weight_dict, cur_match_num) 746 | if veryVerbose: 747 | print("Move gain:", mv_gain, file=DEBUG_LOG) 748 | new_mapping = mapping[:] 749 | new_mapping[i] = nm 750 | new_match_num = compute_match(new_mapping, weight_dict) 751 | if new_match_num != cur_match_num + mv_gain: 752 | print(mapping, new_mapping, file=ERROR_LOG) 753 | print("Inconsistency in computing: move gain", cur_match_num, mv_gain, new_match_num, 754 | file=ERROR_LOG) 755 | if mv_gain > largest_gain: 756 | largest_gain = mv_gain 757 | node1 = i 758 | node2 = nm 759 | use_swap = False 760 | 761 | # compute swap gain 762 | 763 | if True: 764 | for i, m in enumerate(mapping): 765 | for j in range(i + 1, len(mapping)): 766 | m2 = mapping[j] 767 | if (m2 not in candidate_mappings[i]) and (m not in candidate_mappings[j]): 768 | continue 769 | # swap operation (i, m) (j, m2) -> (i, m2) (j, m) 770 | # j starts from i+1, to avoid duplicate swap 771 | if veryVerbose: 772 | print("Swap node", i, "and", j, file=DEBUG_LOG) 773 | print("Before swapping:", i, "-", m, ",", j, "-", m2, file=DEBUG_LOG) 774 | print(mapping, file=DEBUG_LOG) 775 | print("After swapping:", i, "-", m2, ",", j, "-", m, file=DEBUG_LOG) 776 | sw_gain = swap_gain(mapping, i, m, j, m2, weight_dict, cur_match_num) 777 | if veryVerbose: 778 | print("Swap gain:", sw_gain, file=DEBUG_LOG) 779 | new_mapping = mapping[:] 780 | new_mapping[i] = m2 781 | new_mapping[j] = m 782 | print(new_mapping, file=DEBUG_LOG) 783 | new_match_num = compute_match(new_mapping, 
weight_dict) 784 | if new_match_num != cur_match_num + sw_gain: 785 | print(mapping, new_mapping, file=ERROR_LOG) 786 | print("Inconsistency in computing: swap gain", cur_match_num, sw_gain, new_match_num, 787 | file=ERROR_LOG) 788 | if sw_gain > largest_gain: 789 | largest_gain = sw_gain 790 | node1 = i 791 | node2 = j 792 | use_swap = True 793 | 794 | # generate a new mapping based on swap/move 795 | cur_mapping = mapping[:] 796 | if node1 is not None: 797 | if use_swap: 798 | if veryVerbose: 799 | print("Use swap gain", file=DEBUG_LOG) 800 | temp = cur_mapping[node1] 801 | cur_mapping[node1] = cur_mapping[node2] 802 | cur_mapping[node2] = temp 803 | else: 804 | if veryVerbose: 805 | print("Use move gain", file=DEBUG_LOG) 806 | cur_mapping[node1] = node2 807 | else: 808 | if veryVerbose: 809 | print("no move/swap gain found", file=DEBUG_LOG) 810 | if veryVerbose: 811 | print("Original mapping", mapping, file=DEBUG_LOG) 812 | print("Current mapping", cur_mapping, file=DEBUG_LOG) 813 | 814 | return largest_gain, cur_mapping 815 | 816 | 817 | def print_alignment(mapping, instance1, instance2, new2old1=None, new2old2=None): 818 | """ 819 | print the alignment based on a node mapping 820 | Args: 821 | mapping: current node mapping list 822 | instance1: nodes of AMR 1 823 | instance2: nodes of AMR 2 824 | 825 | """ 826 | result = [] 827 | for instance1_item, m in zip(instance1, mapping): 828 | if new2old1 is None: 829 | r = instance1_item[1] + "(" + instance1_item[2] + ")" 830 | else: 831 | r = new2old1[instance1_item[1]] + "(" + instance1_item[2] + ")" 832 | if m == -1: 833 | r += "-Null" 834 | else: 835 | instance2_item = instance2[m] 836 | if new2old2 is None: 837 | r += "-" + instance2_item[1] + "(" + instance2_item[2] + ")" 838 | else: 839 | r += "-" + new2old2[instance2_item[1]] + "(" + instance2_item[2] + ")" 840 | result.append(r) 841 | return " ".join(result) 842 | 843 | 844 | def compute_f(match_num, test_num, gold_num): 845 | """ 846 | Compute the f-score based on the matching triple number, 847 | triple number of AMR set 1, 848 | triple number of AMR set 2 849 | Args: 850 | match_num: matching triple number 851 | test_num: triple number of AMR 1 (test file) 852 | gold_num: triple number of AMR 2 (gold file) 853 | Returns: 854 | precision: match_num/test_num 855 | recall: match_num/gold_num 856 | f_score: 2*precision*recall/(precision+recall) 857 | """ 858 | if test_num == 0 or gold_num == 0: 859 | return 0.00, 0.00, 0.00 860 | precision = float(match_num) / float(test_num) 861 | recall = float(match_num) / float(gold_num) 862 | if (precision + recall) != 0: 863 | f_score = 2 * precision * recall / (precision + recall) 864 | if veryVerbose: 865 | print("F-score:", f_score, file=DEBUG_LOG) 866 | return precision, recall, f_score 867 | else: 868 | if veryVerbose: 869 | print("F-score:", "0.0", file=DEBUG_LOG) 870 | return precision, recall, 0.00 871 | 872 | 873 | def generate_amr_lines(f1, f2): 874 | """ 875 | Read one AMR line at a time from each file handle 876 | :param f1: file handle (or any iterable of strings) to read AMR 1 lines from 877 | :param f2: file handle (or any iterable of strings) to read AMR 2 lines from 878 | :return: generator of cur_amr1, cur_amr2 pairs: one-line AMR strings 879 | """ 880 | while True: 881 | cur_amr1 = amr.AMR.get_amr_line(f1) 882 | cur_amr2 = amr.AMR.get_amr_line(f2) 883 | if not cur_amr1 and not cur_amr2: 884 | pass 885 | elif not cur_amr1: 886 | print("Error: File 1 has less AMRs than file 2", file=ERROR_LOG) 887 | print("Ignoring remaining 
AMRs", file=ERROR_LOG) 888 | elif not cur_amr2: 889 | print("Error: File 2 has less AMRs than file 1", file=ERROR_LOG) 890 | print("Ignoring remaining AMRs", file=ERROR_LOG) 891 | else: 892 | yield cur_amr1, cur_amr2 893 | continue 894 | break 895 | 896 | 897 | def get_amr_match(cur_amr1, cur_amr2, sent_num=1, justinstance=False, justattribute=False, justrelation=False, coref=False): 898 | amr_pair = [] 899 | for i, cur_amr in (1, cur_amr1), (2, cur_amr2): 900 | try: 901 | amr_pair.append(amr.AMR.parse_AMR_line(cur_amr)) 902 | except Exception as e: 903 | print("Error in parsing amr %d: %s" % (i, cur_amr), file=ERROR_LOG) 904 | print("Please check if the AMR is ill-formatted. Ignoring remaining AMRs", file=ERROR_LOG) 905 | print("Error message: %s" % e, file=ERROR_LOG) 906 | subscores = {} 907 | if len(amr_pair) != 2: 908 | return (0,0,0), subscores 909 | amr1, amr2 = amr_pair 910 | 911 | if False: 912 | 913 | #code to check if all correct mapping are still allowed 914 | #with sentence alignment contraints for faster doc smatch 915 | #----- 916 | #tested this on gold docAMR and corresponding no coref version 917 | 918 | sentences_by_node_1 = {n:set() for n in amr1.nodes} 919 | for i,nodes_in_sentence in enumerate(amr1.nodes_by_sentence): 920 | for n in nodes_in_sentence: 921 | sentences_by_node_1[n].add(i) 922 | sentences_by_node_2 = {n:set() for n in amr2.nodes} 923 | for i,nodes_in_sentence in enumerate(amr2.nodes_by_sentence): 924 | for n in nodes_in_sentence: 925 | sentences_by_node_2[n].add(i) 926 | 927 | for n1 in sentences_by_node_1: 928 | if n1 in sentences_by_node_2: 929 | if len( sentences_by_node_1[n1].intersection(sentences_by_node_2[n1]) ) == 0: 930 | if True: 931 | print(n1) 932 | print(sentences_by_node_1[n1]) 933 | print(sentences_by_node_2[n1]) 934 | print("====") 935 | 936 | return (0,0,0), subscores 937 | 938 | if amr1 is None or amr2 is None: 939 | return (0,0,0), subscores 940 | prefix1 = "a" 941 | prefix2 = "b" 942 | # Rename node to "a1", "a2", .etc 943 | amr1.rename_node(prefix1) 944 | # Renaming node to "b1", "b2", .etc 945 | amr2.rename_node(prefix2) 946 | (instance1, attributes1, relation1) = amr1.get_triples() 947 | (instance2, attributes2, relation2) = amr2.get_triples() 948 | if verbose: 949 | print("AMR pair", sent_num, file=DEBUG_LOG) 950 | print("============================================", file=DEBUG_LOG) 951 | print("AMR 1 (one-line):", cur_amr1, file=DEBUG_LOG) 952 | print("AMR 2 (one-line):", cur_amr2, file=DEBUG_LOG) 953 | print("Instance triples of AMR 1:", len(instance1), file=DEBUG_LOG) 954 | print(instance1, file=DEBUG_LOG) 955 | print("Attribute triples of AMR 1:", len(attributes1), file=DEBUG_LOG) 956 | print(attributes1, file=DEBUG_LOG) 957 | print("Relation triples of AMR 1:", len(relation1), file=DEBUG_LOG) 958 | print(relation1, file=DEBUG_LOG) 959 | print("Instance triples of AMR 2:", len(instance2), file=DEBUG_LOG) 960 | print(instance2, file=DEBUG_LOG) 961 | print("Attribute triples of AMR 2:", len(attributes2), file=DEBUG_LOG) 962 | print(attributes2, file=DEBUG_LOG) 963 | print("Relation triples of AMR 2:", len(relation2), file=DEBUG_LOG) 964 | print(relation2, file=DEBUG_LOG) 965 | # optionally turn off some of the node comparison 966 | doinstance = doattribute = dorelation = True 967 | if justinstance: 968 | doattribute = dorelation = False 969 | if justattribute: 970 | doinstance = dorelation = False 971 | if justrelation: 972 | doinstance = doattribute = False 973 | (best_mapping, best_match_num, weight_dict) = 
get_best_match(instance1, attributes1, relation1, 974 | instance2, attributes2, relation2, 975 | prefix1, prefix2, doinstance=doinstance, 976 | doattribute=doattribute, dorelation=dorelation, 977 | nodes_by_sentence1=amr1.nodes_by_sentence, nodes_by_sentence2=amr2.nodes_by_sentence) 978 | if verbose: 979 | print("best match number", best_match_num, file=DEBUG_LOG) 980 | print("best node mapping", best_mapping, file=DEBUG_LOG) 981 | print("Best node mapping alignment:", print_alignment(best_mapping, instance1, instance2, new2old1=amr1.new2old_map, new2old2=amr2.new2old_map), file=DEBUG_LOG) 982 | if justinstance: 983 | test_triple_num = len(instance1) 984 | gold_triple_num = len(instance2) 985 | elif justattribute: 986 | test_triple_num = len(attributes1) 987 | gold_triple_num = len(attributes2) 988 | elif justrelation: 989 | test_triple_num = len(relation1) 990 | gold_triple_num = len(relation2) 991 | else: 992 | test_triple_num = len(instance1) + len(attributes1) + len(relation1) 993 | gold_triple_num = len(instance2) + len(attributes2) + len(relation2) 994 | 995 | total_nums = (best_match_num, test_triple_num, gold_triple_num) 996 | if coref: 997 | amr1.find_coref() 998 | amr2.find_coref() 999 | alignment = SMATCH_Alignment(best_mapping, pred_amr=amr1, gold_amr=amr2) 1000 | #for type,rel,rel2 in alignment.iterate_errors(): 1001 | # print(type,rel,rel2) 1002 | # print() 1003 | # named entities 1004 | # bridging relations 1005 | # other 1006 | ne_scores = Scores() 1007 | bridging_scores = Scores() 1008 | other_scores = Scores() 1009 | for n in amr2.coref_nodes: 1010 | if n in amr2.named_entities: 1011 | ne_scores.increment(gold=1) 1012 | else: 1013 | other_scores.increment(gold=1) 1014 | ns = alignment.gold_to_pred(node=n) 1015 | for n2 in ns: 1016 | if n2 not in amr1.coref_nodes: 1017 | continue 1018 | #if n in amr2.named_entities: 1019 | # ne_scores.increment(pred=1) 1020 | #else: 1021 | # other_scores.increment(pred=1) 1022 | if alignment.gold_concepts[n]==alignment.pred_concepts[n2]: 1023 | if n in amr2.named_entities: 1024 | ne_scores.increment(num=1) 1025 | else: 1026 | other_scores.increment(num=1) 1027 | for n in amr1.coref_nodes: 1028 | if n in amr1.named_entities: 1029 | ne_scores.increment(pred=1) 1030 | else: 1031 | other_scores.increment(pred=1) 1032 | 1033 | #ne_scores = Scores() 1034 | #bridging_scores = Scores() 1035 | #other_scores = Scores() 1036 | for r,s,t in amr2.coref_edges: 1037 | if r in ['part','subset']: 1038 | bridging_scores.increment(gold=1) 1039 | else: 1040 | other_scores.increment(gold=1) 1041 | rels = alignment.edge_align_inv[(r,s,t)] 1042 | for rel in rels: 1043 | if rel not in amr1.coref_edges: 1044 | continue 1045 | #if r in ['part', 'subset']: 1046 | # bridging_scores.increment(pred=1) 1047 | #else: 1048 | # other_scores.increment(pred=1) 1049 | r2, s2, t2 = rel 1050 | if r == r2: 1051 | if r in ['part', 'subset']: 1052 | bridging_scores.increment(num=1) 1053 | else: 1054 | other_scores.increment(num=1) 1055 | for r,s,t in amr1.coref_edges: 1056 | if r in ['part','subset']: 1057 | bridging_scores.increment(pred=1) 1058 | else: 1059 | other_scores.increment(pred=1) 1060 | 1061 | subscores['Named Entity Coref'] = ne_scores 1062 | subscores['Bridging Relations'] = bridging_scores 1063 | subscores['Other Coref'] = other_scores 1064 | coref_scores = Scores() 1065 | for scores in [ne_scores, bridging_scores, other_scores]: 1066 | coref_scores.update(scores) 1067 | subscores['Total Coref'] = coref_scores 1068 | noncoref_scores = Scores() 1069 | 
noncoref_scores.set(num=best_match_num-coref_scores.num, 1070 | gold=gold_triple_num-coref_scores.gold_total, 1071 | pred=test_triple_num-coref_scores.pred_total) 1072 | subscores['Non-Coref'] = noncoref_scores 1073 | return total_nums, subscores 1074 | 1075 | # long_sents = []#2, 19, 35, 40, 41] 1076 | 1077 | def score_amr_pairs(f1, f2, justinstance=False, justattribute=False, justrelation=False, coref=False): 1078 | """ 1079 | Score one pair of AMR lines at a time from each file handle 1080 | :param f1: file handle (or any iterable of strings) to read AMR 1 lines from 1081 | :param f2: file handle (or any iterable of strings) to read AMR 2 lines from 1082 | :param justinstance: just pay attention to matching instances 1083 | :param justattribute: just pay attention to matching attributes 1084 | :param justrelation: just pay attention to matching relations 1085 | :return: generator of cur_amr1, cur_amr2 pairs: one-line AMR strings 1086 | """ 1087 | # matching triple number, triple number in test file, triple number in gold file 1088 | total_scores = Scores() 1089 | subscores = {} 1090 | # Read amr pairs from two files 1091 | for sent_num, (cur_amr1, cur_amr2) in tqdm(enumerate(generate_amr_lines(f1, f2), start=1), desc='Smatch'): 1092 | #for sent_num, (cur_amr1, cur_amr2) in enumerate(generate_amr_lines(f1, f2), start=1): 1093 | nums, ss = get_amr_match(cur_amr1, cur_amr2, 1094 | sent_num=sent_num, # sentence number 1095 | justinstance=justinstance, 1096 | justattribute=justattribute, 1097 | justrelation=justrelation, 1098 | coref=coref) 1099 | best_match_num, test_triple_num, gold_triple_num = nums 1100 | total_scores.increment(num=best_match_num, 1101 | pred=test_triple_num, 1102 | gold=gold_triple_num) 1103 | for label in ss: 1104 | if label in subscores: 1105 | subscores[label].update(ss[label]) 1106 | else: 1107 | subscores[label] = ss[label] 1108 | # clear the matching triple dictionary for the next AMR pair 1109 | match_triple_dict.clear() 1110 | if not single_score: # if each AMR pair should have a score, compute and output it here 1111 | yield compute_f(best_match_num, test_triple_num, gold_triple_num) 1112 | todo = [] 1113 | todo.append(('Overall Score:', total_scores)) 1114 | if coref: 1115 | todo.append(('Coref Score', subscores['Total Coref'])) 1116 | #for label in subscores: 1117 | # todo.append((label, subscores[label])) 1118 | 1119 | for label, scores in todo: 1120 | print(label) 1121 | total_match_num, total_test_num, total_gold_num = scores.get() 1122 | if verbose: 1123 | print("Total match number, total triple number in AMR 1, and total triple number in AMR 2:", file=DEBUG_LOG) 1124 | print(total_match_num, total_test_num, total_gold_num, file=DEBUG_LOG) 1125 | print("---------------------------------------------------------------------------------", file=DEBUG_LOG) 1126 | if single_score: # output document-level smatch score (a single f-score for all AMR pairs in two files) 1127 | yield compute_f(total_match_num, total_test_num, total_gold_num) 1128 | 1129 | 1130 | def main(): 1131 | """ 1132 | Main function of smatch score calculation 1133 | """ 1134 | global verbose 1135 | global veryVerbose 1136 | global iteration_num 1137 | global single_score 1138 | global pr_flag 1139 | global match_triple_dict 1140 | 1141 | import argparse 1142 | 1143 | parser = argparse.ArgumentParser(description="Smatch calculator") 1144 | parser.add_argument( 1145 | '-f', 1146 | nargs=2, 1147 | required=True, 1148 | type=argparse.FileType('r'), 1149 | help=('Two files containing AMR pairs. 
' 1150 | 'AMRs in each file are separated by a single blank line')) 1151 | parser.add_argument( 1152 | '-r', 1153 | type=int, 1154 | default=4, 1155 | help='Restart number (Default:4)') 1156 | parser.add_argument( 1157 | '--significant', 1158 | type=int, 1159 | default=2, 1160 | help='significant digits to output (default: 2)') 1161 | parser.add_argument( 1162 | '-v', 1163 | action='store_true', 1164 | help='Verbose output (Default:false)') 1165 | parser.add_argument( 1166 | '--vv', 1167 | action='store_true', 1168 | help='Very Verbose output (Default:false)') 1169 | parser.add_argument( 1170 | '--ms', 1171 | action='store_true', 1172 | default=False, 1173 | help=('Output multiple scores (one AMR pair a score) ' 1174 | 'instead of a single document-level smatch score ' 1175 | '(Default: false)')) 1176 | parser.add_argument( 1177 | '--pr', 1178 | action='store_true', 1179 | default=False, 1180 | help=('Output precision and recall as well as the f-score. ' 1181 | 'Default: false')) 1182 | parser.add_argument( 1183 | '--justinstance', 1184 | action='store_true', 1185 | default=False, 1186 | help="just pay attention to matching instances") 1187 | parser.add_argument( 1188 | '--justattribute', 1189 | action='store_true', 1190 | default=False, 1191 | help="just pay attention to matching attributes") 1192 | parser.add_argument( 1193 | '--justrelation', 1194 | action='store_true', 1195 | default=False, 1196 | help="just pay attention to matching relations") 1197 | parser.add_argument( 1198 | '--coref-subscore', 1199 | action='store_true', 1200 | default=False, 1201 | help="include subscores for coreference") 1202 | 1203 | arguments = parser.parse_args() 1204 | 1205 | # set the iteration number 1206 | # total iteration number = restart number + 1 1207 | iteration_num = arguments.r + 1 1208 | if arguments.ms: 1209 | single_score = False 1210 | if arguments.v: 1211 | verbose = True 1212 | if arguments.vv: 1213 | veryVerbose = True 1214 | if arguments.pr: 1215 | pr_flag = True 1216 | # significant digits to print out 1217 | floatdisplay = "%%.%df" % arguments.significant 1218 | 1219 | start_time = time.time() 1220 | for (precision, recall, best_f_score) in score_amr_pairs(arguments.f[0], arguments.f[1], 1221 | justinstance=arguments.justinstance, 1222 | justattribute=arguments.justattribute, 1223 | justrelation=arguments.justrelation, 1224 | coref=arguments.coref_subscore): 1225 | # print("Sentence", sent_num) 1226 | if pr_flag: 1227 | print("Precision: " + floatdisplay % precision) 1228 | print("Recall: " + floatdisplay % recall) 1229 | print("F-score: " + floatdisplay % best_f_score) 1230 | end_time = time.time() 1231 | elapsed_time = end_time - start_time 1232 | print("Time(s): "+str(floatdisplay % elapsed_time)) 1233 | 1234 | arguments.f[0].close() 1235 | arguments.f[1].close() 1236 | 1237 | 1238 | if __name__ == "__main__": 1239 | main() 1240 | -------------------------------------------------------------------------------- /doc_amr.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import re 4 | import copy 5 | from tqdm import tqdm 6 | 7 | from amr_io import ( 8 | AMR, 9 | read_amr, 10 | process_corefs 11 | ) 12 | from ipdb import set_trace 13 | 14 | def make_doc_amrs(corefs, amrs, coref=True,chains=True): 15 | doc_amrs = {} 16 | 17 | desc = "making doc-level AMRs" 18 | if not coref: 19 | desc += " (without corefs)" 20 | for doc_id in tqdm(corefs, desc=desc): 21 | (doc_corefs,doc_sids,fname) = corefs[doc_id] 22 | if 
doc_sids[0] not in amrs: 23 | import ipdb; ipdb.set_trace() # debug hook: first sentence AMR of the document is missing 24 | doc_amr = copy.deepcopy(amrs[doc_sids[0]]) 25 | for sid in doc_sids[1:]: 26 | if sid not in amrs: 27 | import ipdb; ipdb.set_trace() # debug hook: sentence AMR missing 28 | if amrs[sid].root is None: 29 | continue 30 | doc_amr = doc_amr + amrs[sid] 31 | doc_amr.amr_id = doc_id 32 | doc_amr.doc_file = fname 33 | if coref: 34 | if chains: 35 | doc_amr.add_corefs(doc_corefs) 36 | else: 37 | doc_amr.add_edges(doc_corefs) 38 | 39 | # setting penman to None to avoid copying the sentence AMR's penman 40 | doc_amr.penman = None 41 | 42 | doc_amrs[doc_id] = doc_amr 43 | 44 | return doc_amrs 45 | 46 | def connect_sen_amrs(amr): 47 | 48 | if len(amr.roots) <= 1: 49 | return 50 | 51 | node_id = amr.add_node("document") 52 | amr.root = str(node_id) 53 | for (i,root) in enumerate(amr.roots): 54 | amr.edges.append((amr.root, ":snt"+str(i+1), root)) 55 | 56 | def argument_parser(): 57 | 58 | parser = argparse.ArgumentParser(description='Read AMRs and Corefs and put them together', 59 | formatter_class=argparse.RawTextHelpFormatter) 60 | parser.add_argument( 61 | "--amr3-path", 62 | help="path to AMR3 annotations", 63 | type=str 64 | ) 65 | parser.add_argument( 66 | "--coref-fof", 67 | help="File containing a list of XML files with coreference information", 68 | type=str 69 | ) 70 | parser.add_argument( 71 | "--out-amr", 72 | help="Output file containing AMR in penman format", 73 | type=str, 74 | ) 75 | parser.add_argument( 76 | "--in-doc-amr-unmerged", 77 | help="path to a doc AMR file in 'no-merge' format", 78 | type=str 79 | ) 80 | parser.add_argument( 81 | "--in-doc-amr-pairwise", 82 | help="path to a doc AMR file with coref chains as pairwise edges", 83 | type=str 84 | ) 85 | parser.add_argument( 86 | "--pairwise-coref-rel", 87 | default='same-as', 88 | help="edge label representing pairwise coref edges", 89 | type=str 90 | ) 91 | parser.add_argument( 92 | '--rep', 93 | default='docAMR', 94 | help='''Which representation to use, options: 95 | "no-merge" -- No node merging, only chain-nodes 96 | "merge-names" -- Merge only names 97 | "docAMR" -- Merge names and drop pronouns 98 | "merge-all" -- Merge all nodes''', 99 | type=str 100 | ) 101 | parser.add_argument( 102 | '--flipped', 103 | help='whether or not to use the flipped representation, i.e. 
parent->coref-entity->child', 104 | action='store_true' 105 | ) 106 | args = parser.parse_args() 107 | return args 108 | 109 | 110 | def main(): 111 | 112 | args = argument_parser() 113 | assert args.out_amr 114 | 115 | if args.amr3_path and args.coref_fof: 116 | 117 | # read cross-sentential corefs from the document AMR annotations 118 | coref_files = [args.amr3_path+"/"+line.strip() for line in open(args.coref_fof)] 119 | corefs = process_corefs(coref_files) 120 | 121 | # Read AMR 122 | directory = args.amr3_path + r'/data/amrs/unsplit/' 123 | amrs = {} 124 | for filename in tqdm(os.listdir(directory), desc="Reading sentence-level AMRs"): 125 | amrs.update(read_amr(directory+filename)) 126 | 127 | # write documents without corefs 128 | plain_doc_amrs = make_doc_amrs(corefs,amrs,coref=False).values() 129 | with open(args.out_amr+".nocoref", 'w') as fid: 130 | for amr in plain_doc_amrs: 131 | damr = copy.deepcopy(amr) 132 | connect_sen_amrs(damr) 133 | fid.write(damr.__str__()) 134 | # add corefs into document-level AMRs 135 | amrs = make_doc_amrs(corefs,amrs).values() 136 | with open(args.out_amr, 'w') as fid: 137 | for amr in amrs: 138 | damr = copy.deepcopy(amr) 139 | connect_sen_amrs(damr) 140 | print("\nnormalizing "+damr.doc_file.split("/")[-1]) 141 | print("normalizing "+damr.amr_id) 142 | damr.normalize(rep=args.rep, flip=args.flipped) 143 | fid.write(damr.__str__()) 144 | 145 | if args.in_doc_amr_unmerged: 146 | amrs = read_amr(args.in_doc_amr_unmerged).values() 147 | with open(args.out_amr, 'w') as fid: 148 | for amr in amrs: 149 | damr = copy.deepcopy(amr) 150 | print("\nnormalizing "+damr.amr_id) 151 | damr.normalize(rep=args.rep, flip=args.flipped) 152 | fid.write(damr.__str__()) 153 | 154 | if args.in_doc_amr_pairwise: 155 | amrs = read_amr(args.in_doc_amr_pairwise).values() 156 | with open(args.out_amr, 'w') as fid: 157 | for amr in amrs: 158 | damr = copy.deepcopy(amr) 159 | print("\nnormalizing "+damr.amr_id) 160 | damr.make_chains_from_pairs(args.pairwise_coref_rel) 161 | damr.normalize(rep=args.rep, flip=args.flipped) 162 | fid.write(damr.__str__()) 163 | 164 | 165 | 166 | if __name__ == '__main__': 167 | main() 168 | -------------------------------------------------------------------------------- /doc_amr_baseline/README.md: -------------------------------------------------------------------------------- 1 | ## Run DocAMR Baseline 2 | To set up the environment we use a file called set_environment.sh 3 | 4 | ```bash 5 | touch set_environment.sh 6 | ``` 7 | 8 | The activation of the conda/virtual environment can be added inside this file. 9 | 10 | Packages to install are in requirements.txt. Python 3.7 works best. To install the required packages, run the following inside the conda/virtual environment. 11 | 12 | ```bash 13 | pip install -r doc_amr_baseline/requirements.txt 14 | ``` 15 | Clone a dependent repository for CoNLL coref conversion 16 | ```bash 17 | git clone https://github.com/boberle/corefconversion.git 18 | ``` 19 | To get a document AMR, given the sentence AMRs, run 20 | 21 | ```bash 22 | bash doc_amr_baseline/run_doc_amr_baseline.sh <amr folder> <out folder> <representation> [<coref pickle>] 23 | 24 | ``` 25 | `<amr folder>` is a folder containing a file of sentence AMRs for each document. Each file in the folder should have the extension '.amr' and contain sentence AMRs for all sentences in the document, separated by a blank line. See **Format of AMR files** for further details. 
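Before running the script end to end, it can help to verify that every document file in `<amr folder>` parses; the remaining arguments are described below. A minimal sketch, assuming the repository root is the working directory; the `sentence_amrs/` folder name is illustrative, and `read_amr2` is the reader from `doc_amr_baseline/baseline_io.py` shown further below:

```python
# Sanity-check a folder of sentence-AMR files before running the baseline.
# 'sentence_amrs/' is an illustrative folder name.
import glob

from doc_amr_baseline.baseline_io import read_amr2

for path in sorted(glob.glob('sentence_amrs/*.amr')):
    # ibm_format=True parses the ::node/::edge/::root metadata lines that
    # the baseline needs for alignments and node ids
    amrs = read_amr2(path, ibm_format=True)
    print(path, '->', len(amrs), 'sentence AMRs')
```

Note that `read_amr2` only flushes a graph when it reaches a blank line, so each '.amr' file must end with a blank line for the last AMR to be counted.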
26 | 27 | `<out folder>` is the folder where the doc AMR for each document is to be output. 28 | 29 | 30 | `<representation>` is which representation to use, options: 31 | "no-merge" -- No node merging, only chain-nodes 32 | "merge-names" -- Merge only names 33 | "docAMR" -- Merge names and drop pronouns 34 | "merge-all" -- Merge all nodes 35 | 36 | The recommended representation based on the [paper](https://aclanthology.org/2022.naacl-main.256.pdf) is **"docAMR"**. 37 | 38 | `<coref pickle>` is the path to AllenNLP coref output in pickled format and is optional (i.e. previously generated coref can be reused here). If not provided, the script will use the sentences in the AMRs to get SpanBERT coref from "https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz" 39 | 40 | ## Format of AMR files expected for DocAMR baseline 41 | Each file inside the folder should 42 | 1. End with the extension '.amr' 43 | 2. Contain sentence AMRs for all sentences in the document, separated by a blank line 44 | 3. Contain metadata information about the AMR parse, such as alignments and node ids. 45 | 46 | See the example folder for a sample .amr file 47 | 48 | To get sentence AMRs, please check out this parser: 49 | https://github.com/IBM/transition-amr-parser 50 | 51 | Note: To get an AMR in the same format as the example in the folder, use --jamr and --no-isi as arguments to the amr-parse command while using this parser. 52 | 53 | 54 | ## Run DocAMR Baseline test 55 | 56 | To run a test of the baseline, given the gold docAMR and sentence AMRs, run 57 | 58 | ```bash 59 | bash doc_amr_baseline/tests/baseline_allennlp_test.sh <gold docamr> <amr folder> <representation> 60 | 61 | ``` 62 | 63 | `<gold docamr>` is a file containing the gold docAMR obtained using the command mentioned in the main README, with the same representation as `<representation>`: 64 | 65 | ```bash 66 | python doc_amr.py 67 | --amr3-path <amr3 path> 68 | --coref-fof <coref fof> 69 | --out-amr <gold docamr> 70 | --rep <representation> 71 | 72 | ``` 73 | 74 | `<amr folder>` is a folder containing a file of sentence AMRs for each document. Each file in the folder should have the extension '.amr' and contain sentence AMRs for all sentences in the document, separated by a blank line. See **Format of AMR files** for further details. 75 | 76 | 77 | `<representation>` options: 78 | "no-merge" -- No node merging, only chain-nodes 79 | "merge-names" -- Merge only names 80 | "docAMR" -- Merge names and drop pronouns 81 | "merge-all" -- Merge all nodes 82 | 83 | The recommended representation based on the [paper](https://aclanthology.org/2022.naacl-main.256.pdf) is **"docAMR"**. 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /doc_amr_baseline/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/doc_amr_baseline/__init__.py -------------------------------------------------------------------------------- /doc_amr_baseline/amr_constituents.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | from ipdb import set_trace 3 | import re 4 | import copy 5 | 6 | 7 | def get_subgraph_by_id(amr, no_reentrancies=True, no_reverse_edges=False): 8 | ''' 9 | Given an AMR class, provide for each node a list of all nodes "below" it 10 | in the graph. Ignore re-entrancies and reverse edges if solicited.
Not 11 | ignoring re-entrancies can lead to infinite loops 12 | ''' 13 | 14 | # get re-entrant edges 15 | if no_reentrancies: 16 | reentrancy_edges = get_reentrancy_edges(amr) 17 | else: 18 | reentrancy_edges = [] 19 | 20 | # Gather constituents bottom-up 21 | # find leaf nodes 22 | leaf_nodes = [] 23 | for nid, nname in amr.nodes.items(): 24 | child_edges = [(nid, label, tgt) for tgt, label in amr.children(nid)] 25 | # If no nodes, or nodes are re-entrant 26 | if set(child_edges) <= set(reentrancy_edges): 27 | leaf_nodes.append(nid) 28 | # start from leaf nodes and go upwards ignoring re-entrant edges 29 | # store subgraph for every node as all the nodes "below" it on the tree 30 | subgraph_by_id = defaultdict(set) 31 | candidates = leaf_nodes 32 | new_nodes = True 33 | count = 0 34 | while new_nodes: 35 | new_candidates = set() 36 | new_nodes = False 37 | for nid in candidates: 38 | # ignore re-entrant nodes 39 | unique_parents = [] 40 | for (src, label) in amr.parents(nid): 41 | if ( 42 | (src, label, nid) not in reentrancy_edges 43 | and not (no_reverse_edges and label.endswith('-of')) 44 | ): 45 | unique_parents.append(src) 46 | if len(unique_parents) == 0: 47 | continue 48 | elif len(unique_parents) > 1: 49 | set_trace(context=30) 50 | # colect subgraph for this node 51 | src = unique_parents[0] 52 | subgraph_by_id[src] |= set([nid]) 53 | subgraph_by_id[src] |= set(subgraph_by_id[nid]) 54 | new_candidates |= set([src]) 55 | new_nodes = True 56 | 57 | candidates = new_candidates 58 | 59 | count += 1 60 | if count > 1000: 61 | set_trace(context=30) 62 | print() 63 | 64 | return subgraph_by_id 65 | 66 | 67 | def get_constituents_from_subgraph(amr): 68 | '''Get spans associated to each subgraph''' 69 | 70 | # get the subgraph below each node 71 | subgraph_by_id = get_subgraph_by_id(amr) 72 | 73 | def get_constituent(nid): 74 | '''Given nid and subgraph extract span aligned to it''' 75 | # Token aligned to node 76 | indices = copy.deepcopy(amr.alignments[nid]) 77 | sids = subgraph_by_id[nid] 78 | if sids is not None: 79 | # Tokens aligned to all nodes below it 80 | for id in sids: 81 | if amr.alignments[id] is None: 82 | continue 83 | for idx in amr.alignments[id]: 84 | if indices is None: 85 | print('Alignment for node ',nid, 'not found but constituents are added') 86 | indices = [] 87 | indices.append(idx) 88 | if indices: 89 | return min(indices), max(indices) 90 | else: 91 | return None, None 92 | 93 | # gather constituents associated to each node 94 | candidates = [amr.root] 95 | depth = 0 96 | depths = [depth] 97 | constituent_spans = [] 98 | count = 0 99 | while candidates and count < 1000: 100 | nid = candidates.pop() 101 | ndepth = depths.pop() 102 | start, end = get_constituent(nid) 103 | if start is None: 104 | count += 1 105 | continue 106 | # Add constituent to list 107 | constituent_spans.append(dict( 108 | depth=ndepth, 109 | indices=(start, end+1), 110 | head=amr.nodes[nid], 111 | head_position=amr.alignments[nid], 112 | nid=nid 113 | )) 114 | reentrancy_edges = get_reentrancy_edges(amr) 115 | # update candidates, ignore re-entrant nodes 116 | candidates.extend([ 117 | tgt for tgt, label in amr.children(nid) 118 | if (nid, label, tgt) not in reentrancy_edges 119 | ]) 120 | depth += 1 121 | depths.extend([depth for _ in range(len(amr.children(nid)))]) 122 | count += 1 123 | 124 | if count == 1000: 125 | # We got trapped in a loop 126 | set_trace(context=30) 127 | pass 128 | 129 | return {'tokens': amr.tokens, 'constituents': constituent_spans} 130 | 131 | 132 | def 
get_reentrancy_edges(amr): 133 | 134 | # Get re-entrant edges i.e. extra parents. We keep the edge closest to 135 | # root 136 | # annotate depth at which edeg occurs 137 | candidates = [amr.root] 138 | depths = [0] 139 | depth_by_edge = dict() 140 | while candidates: 141 | for (tgt, label) in amr.children(candidates[0]): 142 | edge = (candidates[0], label, tgt) 143 | if edge in depth_by_edge: 144 | continue 145 | depth_by_edge[edge] = depths[0] 146 | candidates.append(tgt) 147 | depths.append(depths[0] + 1) 148 | candidates.pop(0) 149 | depths.pop(0) 150 | 151 | # in case of multiple parents keep the one closest to the root 152 | reentrancy_edges = [] 153 | for nid, nname in amr.nodes.items(): 154 | parents = [(src, label, nid) for src, label in amr.parents(nid)] 155 | if nid == amr.root: 156 | # Root can not have parents 157 | reentrancy_edges.extend(parents) 158 | elif len(parents) > 1: 159 | # Keep only highest edge from re-entrant ones 160 | # FIXME: Unclear why depth is missing sometimes 161 | reentrancy_edges.extend( 162 | sorted(parents, key=lambda e: depth_by_edge.get(e, 1000))[1:] 163 | ) 164 | return reentrancy_edges 165 | 166 | def get_predicates(amr): 167 | pred_regex = re.compile('.+-[0-9]+$') 168 | 169 | 170 | num_preds = 0 171 | pred_ret = [] 172 | 173 | predicates = {n:v for n,v in amr.nodes.items() if pred_regex.match(v) and not v.endswith('91') and v not in ['have-half-life.01']} 174 | num_preds += len(predicates) 175 | for pred in predicates: 176 | if amr.alignments[pred] is not None: 177 | 178 | args = { 179 | trip[1][1:].replace('-of', ''):(amr.nodes[trip[2]],amr.tokens[min(amr.alignments[trip[2]])],amr.alignments[trip[2]]) 180 | for trip in amr.edges 181 | if trip[0] == pred #and trip[1].startswith(':ARG') 182 | } 183 | 184 | 185 | pred_ret.append({'pred':predicates[pred],'text':amr.tokens[min(amr.alignments[pred])],'args':args,'beg':min(amr.alignments[pred]),'end':max(amr.alignments[pred])+1}) 186 | 187 | return pred_ret 188 | 189 | -------------------------------------------------------------------------------- /doc_amr_baseline/baseline_io.py: -------------------------------------------------------------------------------- 1 | from tqdm import tqdm 2 | import penman 3 | import os, sys 4 | sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))) 5 | from amr_io import AMR 6 | import re 7 | from collections import defaultdict 8 | 9 | alignment_regex = re.compile('(-?[0-9]+)-(-?[0-9]+)') 10 | class AMR2(AMR): 11 | 12 | @classmethod 13 | def get_all_vars(cls, penman_str): 14 | in_quotes = False 15 | all_vars = [] 16 | for (i,ch) in enumerate(penman_str): 17 | if ch == '"': 18 | if in_quotes: 19 | in_quotes = False 20 | else: 21 | in_quotes = True 22 | if in_quotes: 23 | continue 24 | if ch == '(': 25 | var = '' 26 | j = i+1 27 | while j < len(penman_str) and penman_str[j] not in [' ','\n']: 28 | var += penman_str[j] 29 | j += 1 30 | all_vars.append(var) 31 | return all_vars 32 | 33 | @classmethod 34 | def get_node_var(cls, penman_str, node_id): 35 | """ 36 | find node variable based on ids like 0.0.1 37 | """ 38 | nid = '99990.0.0.0.0.0.0' 39 | cidx = [] 40 | lvls = [] 41 | all_vars = AMR2.get_all_vars(penman_str) 42 | in_quotes = False 43 | for (i,ch) in enumerate(penman_str): 44 | if ch == '"': 45 | if in_quotes: 46 | in_quotes = False 47 | else: 48 | in_quotes = True 49 | if in_quotes: 50 | continue 51 | 52 | if ch == ":": 53 | idx = i 54 | while idx < len(penman_str) and penman_str[idx] != ' ': 55 | idx += 1 56 | if idx+1 < len(penman_str) and 
penman_str[idx+1] != '(': 57 | var = '' 58 | j = idx+1 59 | while j < len(penman_str) and penman_str[j] not in [' ','\n']: 60 | var += penman_str[j] 61 | j += 1 62 | if var not in all_vars: 63 | lnum = len(lvls) 64 | if lnum >= len(cidx): 65 | cidx.append(1) 66 | else: 67 | cidx[lnum] += 1 68 | if ch == '(': 69 | lnum = len(lvls) 70 | if lnum >= len(cidx): 71 | cidx.append(0) 72 | lvls.append(str(cidx[lnum])) 73 | 74 | if ch == ')': 75 | lnum = len(lvls) 76 | if lnum < len(cidx): 77 | cidx.pop() 78 | cidx[lnum-1] += 1 79 | lvls.pop() 80 | 81 | if ".".join(lvls) == node_id: 82 | j = i+1 83 | while penman_str[j] == ' ': 84 | j += 1 85 | var = "" 86 | while penman_str[j] != ' ': 87 | var += penman_str[j] 88 | j += 1 89 | return var 90 | 91 | return None 92 | 93 | @classmethod 94 | def from_metadata(cls, penman_text, tokenize=False): 95 | """Read AMR from metadata (IBM style)""" 96 | 97 | # Read metadata from penman 98 | field_key = re.compile(f'::[A-Za-z]+') 99 | metadata = defaultdict(list) 100 | separator = None 101 | penman_str = "" 102 | for line in penman_text: 103 | if line.startswith('#'): 104 | line = line[2:].strip() 105 | start = 0 106 | for point in field_key.finditer(line): 107 | end = point.start() 108 | value = line[start:end] 109 | if value: 110 | metadata[separator].append(value) 111 | separator = line[end:point.end()][2:] 112 | start = point.end() 113 | value = line[start:] 114 | if value: 115 | metadata[separator].append(value) 116 | else: 117 | penman_str += line.strip() + ' ' 118 | 119 | # assert 'tok' in metadata, "AMR must contain field ::tok" 120 | if tokenize: 121 | assert 'snt' in metadata, "AMR must contain field ::snt" 122 | tokens, _ = protected_tokenizer(metadata['snt'][0]) 123 | else: 124 | assert 'tok' in metadata, "AMR must contain field ::tok" 125 | assert len(metadata['tok']) == 1 126 | tokens = metadata['tok'][0].split() 127 | 128 | #print(penman_str) 129 | 130 | sid="000" 131 | nodes = {} 132 | nvars = {} 133 | alignments = {} 134 | edges = [] 135 | root = None 136 | sentence = None 137 | 138 | if 'short' in metadata: 139 | short_str = metadata["short"][0].split('\t')[1] 140 | short = eval(short_str) 141 | short = {str(k):v for k,v in short.items()} 142 | all_vars = list(short.values()) 143 | else: 144 | short = None 145 | all_vars = AMR2.get_all_vars(penman_str) 146 | 147 | 148 | for key, value in metadata.items(): 149 | if key == 'edge': 150 | for items in value: 151 | items = items.split('\t') 152 | if len(items) == 6: 153 | _, _, label, _, src, tgt = items 154 | edges.append((src, f':{label}', tgt)) 155 | elif key == 'node': 156 | for items in value: 157 | items = items.split('\t') 158 | if len(items) > 3: 159 | _, node_id, node_name, alignment = items 160 | start, end = alignment_regex.match(alignment).groups() 161 | indices = list(range(int(start), int(end))) 162 | alignments[node_id] = indices 163 | else: 164 | _, node_id, node_name = items 165 | alignments[node_id] = None 166 | nodes[node_id] = node_name 167 | if short is not None: 168 | var = short[node_id] 169 | else: 170 | var = node_id 171 | if var is not None and var+" / " not in penman_str: 172 | nvars[node_id] = None 173 | else: 174 | nvars[node_id] = var 175 | all_vars.remove(var) 176 | elif key == 'root': 177 | root = value[0].split('\t')[1] 178 | elif key == 'id': 179 | sid = value[0].strip() 180 | if len(all_vars): 181 | print("varaible not linked to nodes:") 182 | print(all_vars) 183 | print(penman_str) 184 | return cls(tokens, nodes, edges, root, penman=None, 185 | 
alignments=alignments, nvars=nvars, sid=sid) 186 | 187 | def read_amr2(file_path, ibm_format=False, tokenize=False): 188 | with open(file_path) as fid: 189 | raw_amr = [] 190 | raw_amrs = [] 191 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 192 | if line.strip() == '': 193 | if ibm_format: 194 | # From ::node, ::edge etc 195 | raw_amrs.append( 196 | AMR2.from_metadata(raw_amr, tokenize=tokenize) 197 | ) 198 | else: 199 | # From penman 200 | raw_amrs.append( 201 | AMR.from_penman(raw_amr, tokenize=tokenize) 202 | ) 203 | raw_amr = [] 204 | else: 205 | raw_amr.append(line) 206 | return raw_amrs 207 | 208 | def read_amr3(file_path, ibm_format=False, tokenize=False): 209 | with open(file_path) as fid: 210 | raw_amr = [] 211 | raw_amrs = {} 212 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 213 | if line.strip() == '': 214 | if ibm_format: 215 | # From ::node, ::edge etc 216 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 217 | else: 218 | # From penman 219 | 220 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 221 | 222 | raw_amrs[amr.sid] = amr 223 | raw_amr = [] 224 | else: 225 | raw_amr.append(line) 226 | return raw_amrs 227 | 228 | def read_amr3_docid(file_path, ibm_format=False, tokenize=False): 229 | doc_id = None 230 | with open(file_path) as fid: 231 | raw_amr = [] 232 | raw_amrs = {} 233 | 234 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 235 | if line.strip() == '': 236 | if ibm_format: 237 | # From ::node, ::edge etc 238 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 239 | else: 240 | # From penman 241 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 242 | raw_amrs[amr.sid] = amr 243 | if doc_id is None: 244 | doc_id = amr.sid.rsplit('.',1)[0] 245 | 246 | 247 | raw_amr = [] 248 | else: 249 | raw_amr.append(line) 250 | return raw_amrs,doc_id 251 | 252 | #store by sen 253 | def read_amr_by_snt(file_path, tokenize=False): 254 | with open(file_path) as fid: 255 | raw_amr = [] 256 | raw_amrs = {} 257 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 258 | if line.strip() == '': 259 | 260 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 261 | if tokenize: 262 | tok_sen = " ".join(amr.tokens) 263 | raw_amrs[tok_sen] = raw_amr 264 | else: 265 | raw_amrs[amr.penman.metadata['tok']] = raw_amr 266 | raw_amr = [] 267 | else: 268 | raw_amr.append(line) 269 | return raw_amrs 270 | 271 | def read_amr_raw(file_path, tokenize=False): 272 | with open(file_path) as fid: 273 | raw_amr = [] 274 | raw_amrs = [] 275 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 276 | if line.strip() == '': 277 | 278 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 279 | raw_amrs.append(raw_amr) 280 | raw_amr = [] 281 | else: 282 | raw_amr.append(line) 283 | return raw_amrs 284 | 285 | def read_amr_as_raw_str(file_path): 286 | with open(file_path) as fid: 287 | raw_amrs = [] 288 | raw_amr = '' 289 | for line in fid.readlines(): 290 | if line.strip(): 291 | raw_amr+=line.strip()+'\n' 292 | else: 293 | raw_amrs.append(raw_amr) 294 | raw_amr = '' 295 | return raw_amrs 296 | 297 | #for amrs without an ::id 298 | def read_amr_add_sen_id(file_path,doc_id,remove_id=False,tokenize=False,ibm_format=True): 299 | with open(file_path) as fid: 300 | raw_amr = [] 301 | raw_amrs = {} 302 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 303 | if line.strip() == '': 304 | if ibm_format: 305 | # From ::node, ::edge etc 306 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 307 | else: 308 | # From penman 309 | amr = AMR.from_penman(raw_amr, 
tokenize=tokenize) 310 | raw_amrs[amr.sid] = amr 311 | raw_amr = [] 312 | else: 313 | if remove_id and '::id' in line: 314 | continue 315 | raw_amr.append(line) 316 | if tokenize: 317 | if '::snt' in line and remove_id: 318 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 319 | else: 320 | if '::tok' in line and remove_id: 321 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 322 | elif '::snt' in line and remove_id: 323 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 324 | 325 | 326 | return raw_amrs 327 | 328 | def read_amr_str_add_sen_id(amr_strs,doc_id,tokenize=False): 329 | raw_amrs = {} 330 | for idx,amr_str in enumerate(amr_strs): 331 | 332 | # From ::node, ::edge etc 333 | amr_list = amr_str.splitlines(True) 334 | amr_list.insert(1,'# ::id '+doc_id+'.'+str(idx+1)+'\n') 335 | # amr_list = [line+'\n# ::id '+doc_id+'.'+str(idx+1) if '::tok' in line else line for line in amr_str.split('\n')] 336 | amr = AMR2.from_metadata(amr_list, tokenize=tokenize) 337 | raw_amrs[amr.sid] = amr 338 | 339 | return raw_amrs 340 | 341 | -------------------------------------------------------------------------------- /doc_amr_baseline/example/doc_sen.amr: -------------------------------------------------------------------------------- 1 | # ::tok Hailey is going to London tomorrow . 2 | # ::node p person 0-1 3 | # ::node n name 0-1 4 | # ::node 0 Hailey 0-1 5 | # ::node g go-02 2-3 6 | # ::node c city 4-5 7 | # ::node n2 name 4-5 8 | # ::node 1 London 4-5 9 | # ::node t tomorrow 5-6 10 | # ::root g go-02 11 | # ::edge person name name p n 12 | # ::edge name op1 Hailey n 0 13 | # ::edge go-02 ARG0 person g p 14 | # ::edge go-02 ARG4 city g c 15 | # ::edge city name name c n2 16 | # ::edge name op1 London n2 1 17 | # ::edge go-02 time tomorrow g t 18 | (g / go-02 19 | :ARG0 (p / person 20 | :name (n / name 21 | :op1 "Hailey")) 22 | :ARG4 (c / city 23 | :name (n2 / name 24 | :op1 "London")) 25 | :time (t / tomorrow)) 26 | 27 | # ::tok She is planning to go to Italy after London . 28 | # ::node s she 0-1 29 | # ::node p plan-01 2-3 30 | # ::node g go-02 4-5 31 | # ::node c2 country 6-7 32 | # ::node n name 6-7 33 | # ::node 0 Italy 6-7 34 | # ::node a after 7-8 35 | # ::node c city 8-9 36 | # ::node n2 name 8-9 37 | # ::node 1 London 8-9 38 | # ::root p plan-01 39 | # ::edge plan-01 ARG0 she p s 40 | # ::edge plan-01 ARG1 go-02 p g 41 | # ::edge go-02 ARG0 she g s 42 | # ::edge go-02 ARG4 country g c2 43 | # ::edge country name name c2 n 44 | # ::edge name op1 Italy n 0 45 | # ::edge go-02 time after g a 46 | # ::edge after op1 city a c 47 | # ::edge city name name c n2 48 | # ::edge name op1 London n2 1 49 | (p / plan-01 50 | :ARG0 (s / she) 51 | :ARG1 (g / go-02 52 | :ARG0 s 53 | :ARG4 (c2 / country 54 | :name (n / name 55 | :op1 "Italy")) 56 | :time (a / after 57 | :op1 (c / city 58 | :name (n2 / name 59 | :op1 "London"))))) 60 | 61 | # ::tok She is going to see the Big Ben . 62 | # ::node s2 she 0-1 63 | # ::node s see-01 4-5 64 | # ::node b building 6-7 65 | # ::node n name 6-7 66 | # ::node 1 Big 6-7 67 | # ::node 0 Ben 7-8 68 | # ::root s see-01 69 | # ::edge see-01 ARG0 she s s2 70 | # ::edge see-01 ARG1 building s b 71 | # ::edge building name name b n 72 | # ::edge name op1 Big n 1 73 | # ::edge name op2 Ben n 0 74 | (s / see-01 75 | :ARG0 (s2 / she) 76 | :ARG1 (b / building 77 | :name (n / name 78 | :op1 "Big" 79 | :op2 "Ben"))) 80 | 81 | # ::tok Her friend Phil is meeting her in London . 
82 | # ::node s she 0-1 83 | # ::node h have-rel-role-91 1-2 84 | # ::node f friend 1-2 85 | # ::node p person 2-3 86 | # ::node n name 2-3 87 | # ::node 1 Phil 2-3 88 | # ::node m meet-03 4-5 89 | # ::node c city 7-8 90 | # ::node n2 name 7-8 91 | # ::node 0 London 7-8 92 | # ::root m meet-03 93 | # ::edge have-rel-role-91 ARG1 she h s 94 | # ::edge have-rel-role-91 ARG2 friend h f 95 | # ::edge person ARG0-of have-rel-role-91 p h 96 | # ::edge person name name p n 97 | # ::edge name op1 Phil n 1 98 | # ::edge meet-03 ARG0 person m p 99 | # ::edge meet-03 ARG1 she m s 100 | # ::edge meet-03 location city m c 101 | # ::edge city name name c n2 102 | # ::edge name op1 London n2 0 103 | (m / meet-03 104 | :ARG0 (p / person 105 | :name (n / name 106 | :op1 "Phil") 107 | :ARG0-of (h / have-rel-role-91 108 | :ARG1 (s / she) 109 | :ARG2 (f / friend))) 110 | :ARG1 s 111 | :location (c / city 112 | :name (n2 / name 113 | :op1 "London"))) 114 | 115 | -------------------------------------------------------------------------------- /doc_amr_baseline/example/docamr_docAMR.out: -------------------------------------------------------------------------------- 1 | # ::id sentence_test 2 | # ::doc_file sentence_test 3 | # ::tok Hailey is going to London tomorrow . She is planning to go to Italy after London . She is going to see the Big Ben . Her friend Phil is meeting her in London . 4 | (d / document 5 | :snt1 (s1.g / go-02 6 | :ARG0 (s1.p / person 7 | :name (s1.n / name 8 | :op1 "Hailey")) 9 | :ARG4 (s2.c / city 10 | :name (s2.n2 / name 11 | :op1 "London")) 12 | :time (s1.t / tomorrow)) 13 | :snt2 (s2.p / plan-01 14 | :ARG0 (pro / she) 15 | :ARG1 (s2.g / go-02 16 | :ARG0 pro 17 | :ARG4 (s2.c2 / country 18 | :name (s2.n / name 19 | :op1 "Italy")) 20 | :time (s2.a / after 21 | :op1 s2.c))) 22 | :snt3 (s3.s / see-01 23 | :ARG0 pro 24 | :ARG1 (s3.b / building 25 | :name (s3.n / name 26 | :op1 "Big" 27 | :op2 "Ben"))) 28 | :snt4 (s4.m / meet-03 29 | :ARG0 (s4.p / person 30 | :name (s4.n / name 31 | :op1 "Phil") 32 | :ARG0-of (s4.h / have-rel-role-91 33 | :ARG1 pro 34 | :ARG2 (s4.f / friend))) 35 | :ARG1 pro 36 | :location s2.c)) 37 | 38 | -------------------------------------------------------------------------------- /doc_amr_baseline/get_allen_coref.py: -------------------------------------------------------------------------------- 1 | #conda activate allen_nlp 2 | import allennlp 3 | from allennlp.predictors.predictor import Predictor 4 | from itertools import accumulate 5 | # import allennlp_models.tagging 6 | import glob 7 | import pickle 8 | from tqdm import tqdm 9 | import argparse 10 | 11 | predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz") 12 | 13 | def get_allen_coref(filepath,from_amr=False): 14 | 15 | f1 = open(filepath,'r').read() 16 | sen_list = f1.splitlines() 17 | if from_amr: 18 | sen_tok_list = [s.split('::tok ')[-1].split() for s in sen_list if '::tok' in s] 19 | else: 20 | sen_tok_list = [s.split() for s in sen_list] 21 | sen_tok_len = [len(l) for l in sen_tok_list] 22 | sen_tok_len = list(accumulate(sen_tok_len)) 23 | sen_tok = [item for sublist in sen_tok_list for item in sublist] 24 | 25 | 26 | pred = predictor.predict_tokenized(tokenized_document=sen_tok) 27 | clusters = pred['clusters'] 28 | document = pred['document'] 29 | new_cluster = [] 30 | 31 | for c in clusters: 32 | new_c = [] 33 | for m in c: 34 | for idx,l in enumerate(sen_tok_len): 35 | if m[0]< l: 36 | prev_len = sen_tok_len[idx-1] 37 | break 38 
| if idx!=0: 39 | new_c.append([idx,m[0]-prev_len,m[1]-prev_len]) 40 | else: 41 | new_c.append([idx,m[0],m[1]]) 42 | new_cluster.append(new_c) 43 | return new_cluster 44 | 45 | if __name__ == "__main__": 46 | parser = argparse.ArgumentParser() 47 | parser.add_argument('--path_to_sen',type=str) 48 | parser.add_argument('--path_to_out',type=str,help='path to output',default=None) 49 | parser.add_argument('--from_amr',action='store_true') 50 | parser.add_argument('--from_json',action='store_true') 51 | 52 | args = parser.parse_args() 53 | doc_clusters = {} 54 | args.path_to_sen+='/' 55 | 56 | 57 | i = 0 58 | if args.from_amr: 59 | ext = '.amr' 60 | else: 61 | ext = '.txt' 62 | 63 | # if from_json: 64 | # json_dict = json.load(open(path_to_sen)) 65 | # for doc_id,doc_val in json_dict.items(): 66 | # amr_strs = [s['sentence'] for s in doc_val['sentences'].values()] 67 | # path_fill = path_to_sen+'doc*'+ext 68 | path_fill = args.path_to_sen+'*'+ext 69 | 70 | for filepath in tqdm(glob.iglob(path_fill)): 71 | doc_id = filepath.split('/')[-1].split('.')[0] 72 | clusters = get_allen_coref(filepath,from_amr=args.from_amr) 73 | doc_clusters[doc_id] = clusters 74 | i+=1 75 | 76 | if args.path_to_out is None: 77 | out_path = args.path_to_sen+'/allen_spanbert_large-2021.03.10.coref' 78 | else: 79 | out_path = args.path_to_out+'/allen_spanbert_large-2021.03.10.coref' 80 | with open(out_path,'wb') as f2: 81 | pickle.dump(doc_clusters,f2) 82 | -------------------------------------------------------------------------------- /doc_amr_baseline/make_doc_amr.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import glob 4 | from itertools import groupby 5 | from operator import itemgetter 6 | import copy 7 | import collections 8 | import pickle 9 | import tqdm 10 | 11 | import argparse 12 | from .baseline_io import ( 13 | 14 | 15 | read_amr_add_sen_id, 16 | read_amr_str_add_sen_id, 17 | read_amr3_docid 18 | 19 | ) 20 | import os, sys 21 | sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))) 22 | 23 | from doc_amr import make_doc_amrs,connect_sen_amrs 24 | 25 | from .amr_constituents import get_subgraph_by_id,get_constituents_from_subgraph 26 | 27 | 28 | 29 | def get_node_from_subgraph(subgraph,beg,end,doc_amr=None,verbose=False): 30 | candidate_nodes = [] 31 | secondary_candidates =[] 32 | for head in subgraph['constituents']: 33 | if head['indices'][0]==beg and head['indices'][1]==end+1: 34 | name = head['head'] 35 | return head['nid'] 36 | elif head['indices'][0] in range(beg,end+2) and head['indices'][1] in range(beg,end+2): 37 | candidate_nodes.append(head) 38 | elif head['head_position'] is not None: 39 | if max(head['head_position'])==end: #or end == head['head_position'][0]: This made other nodes wrong 40 | secondary_candidates.append(head) 41 | 42 | # al_node = get_node_from_alignment(doc_amr, beg, end) 43 | 44 | if len(candidate_nodes) == 1: 45 | if verbose: 46 | print('approx alignment node') 47 | name = candidate_nodes[0]['head'] 48 | return candidate_nodes[0]['nid'] 49 | elif len(candidate_nodes)>1: 50 | if verbose: 51 | print('subset alignment mindepth node') 52 | mindepth = min(candidate_nodes, key=lambda x:x['depth']) 53 | name = mindepth['head'] 54 | return mindepth['nid'] 55 | elif len(secondary_candidates)>0: 56 | if verbose: 57 | print('end alignment mindepth node') 58 | mindepth = min(secondary_candidates, key=lambda x:x['depth']) 59 | name = mindepth['head'] 60 | return mindepth['nid'] 61 | 62 | 63 | 
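# no constituent matched the mention span exactly, approximately, or by head
# position; callers (see process_coref_conll below) treat None as "node not
# found" and skip the mention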
return None 64 | 65 | 66 | def construct_triples(doc_amrs,from_sen_id,from_node_id,sen_node_pairs,relation,verbose=False): 67 | triples = [] 68 | for (full_sen_id,full_node_id) in sen_node_pairs: 69 | from_node = doc_amrs[full_sen_id].nvars[full_node_id] 70 | to_node = doc_amrs[from_sen_id].nvars[from_node_id] 71 | if from_node is not None and to_node is not None: 72 | trip = (full_sen_id+'.'+from_node,relation,from_sen_id+'.'+to_node) 73 | triples.append(trip) 74 | else: 75 | if from_node is None and verbose: 76 | 77 | print(full_node_id, 'node id is not recognized in sentence', full_sen_id) 78 | elif to_node is None and verbose: 79 | 80 | print(from_node_id, 'node id is not recognized in sentence', from_sen_id) 81 | return triples 82 | 83 | 84 | def process_coref_conll(amrs,coref_chains,add_coref=True,verbose=False,save_triples=False,out=None,relation='same-as',coref_type='allennlp'): 85 | corefs = {} 86 | for doc_id,doc_amrs in tqdm.tqdm(amrs.items()): 87 | doc_triples = [] 88 | doc_sids = list(doc_amrs.keys()) 89 | sid_done = [] 90 | if add_coref and doc_id in coref_chains: 91 | # get subgraph information for each AMR 92 | subgraphs = {f_id: get_constituents_from_subgraph(doc_amrs[f_id]) for f_id in doc_amrs if doc_amrs[f_id].root is not None and len(doc_amrs[f_id].alignments)>0 } 93 | for ent in coref_chains[doc_id]: 94 | min_id = (None,None) 95 | sen_node_pairs = [] 96 | for mention in ent: 97 | if coref_type=='conll': 98 | sid = mention[0] 99 | elif coref_type=='allennlp': 100 | sid = mention[0]+1 101 | beg = mention[1] 102 | end = mention[2] 103 | sen_id = doc_id+'.'+str(sid) 104 | if sen_id in subgraphs: 105 | node_id = get_node_from_subgraph(subgraphs[sen_id], beg, end,doc_amrs[sen_id],verbose=verbose) 106 | else: 107 | node_id = None 108 | if node_id is None: 109 | if sen_id in sid_done: 110 | if verbose: 111 | print('maybe inter-AMR coref, node not found') 112 | else: 113 | if verbose: 114 | print('node not found') 115 | print(mention) 116 | continue 117 | else: 118 | sid_done.append(sen_id) 119 | 120 | if min_id[0] is None: 121 | min_id = (sid,node_id) 122 | elif sid < min_id[0]: 123 | min_full_id = doc_id+'.'+str(min_id[0]) 124 | sen_node_pairs.append((min_full_id,min_id[1])) 125 | min_id = (sid,node_id) 126 | else: 127 | sen_node_pairs.append((sen_id,node_id)) 128 | 129 | min_full_id = doc_id+'.'+str(min_id[0]) 130 | triples = construct_triples(doc_amrs,min_full_id,min_id[1],sen_node_pairs,relation) 131 | doc_triples.extend(triples) 132 | 133 | corefs[doc_id] = (doc_triples,doc_sids,doc_id) 134 | if save_triples: 135 | with open(out.replace('.amr','.triples'),'wb') as f1: 136 | pickle.dump(corefs,f1) 137 | 138 | return corefs 139 | 140 | 141 | def main(): 142 | 143 | parser = argparse.ArgumentParser() 144 | parser.add_argument('--path_to_coref',type=str,required=True) 145 | parser.add_argument('--path_to_amr',type=str,help='path to folder containing list of amr files per document',required=True) 146 | parser.add_argument('--out_amr',type=str,help='path to output',required=True) 147 | parser.add_argument('--add_id',action='store_true',help='add id to amr') 148 | parser.add_argument('--add_coref',action='store_true',default=False,help='add coref to doc amr') 149 | parser.add_argument('--allennlp',action='store_true',help='coref format is pickled coref chains from AllenNLP') 150 | parser.add_argument('--conll',action='store_true',help='coref format is conll') 151 | parser.add_argument('--event',action='store_true',help='perform event coref') 152 | 
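# the --norm_rep choice below is what damr.normalize() receives in the
# writing loop at the end of main()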
parser.add_argument('--norm_rep',type=str,default='docAMR',help='''normalization format 153 | "no-merge" -- No node merging, only chain-nodes 154 | "merge-names" -- Merge only names 155 | "docAMR" -- Merge names and drop pronouns 156 | "merge-all" -- Merge all nodes''') 157 | parser.add_argument('--verbose',action='store_true',help='Print types of nodes found') 158 | parser.add_argument('--save_triples',action='store_true',help='save triples as pickle') 159 | parser.add_argument('--tokenize',action='store_true',help='::tok not available in the parse') 160 | parser.add_argument('--sort_alpha',action='store_true',help='sort doc names alphabetically; default is False, i.e. sorting numerically') 161 | parser.add_argument('--use_penman',action='store_true',default=True,help='use penman graph to construct doc amr (in the case of wiki)') 162 | parser.add_argument('--path_to_penman',type=str,default=None,help='optional path to folder to get penman amr') 163 | 164 | 165 | args = parser.parse_args() 166 | pat = '*' 167 | amrs = {} 168 | coref = {} 169 | mentions = {} 170 | entities = {} 171 | amrs_dict = {} 172 | event_clusters = {} 173 | events = {} 174 | sort_alpha = False 175 | amrs_penman = {} 176 | amrs_penman_dict = {} 177 | 178 | args.path_to_amr += '/' 179 | assert args.norm_rep in ['docAMR','no-merge','merge-names','merge-all'],'Norm representation should be one of: docAMR, no-merge, merge-names, merge-all' 180 | 181 | if not glob.glob(args.path_to_amr+pat+'.amr'): 182 | if not glob.glob(args.path_to_amr+pat+'.parse'): 183 | raise Exception("--path_to_amr folder does not contain .amr files or .parse files") 184 | else: 185 | sort_alpha = True 186 | filepaths = glob.iglob(args.path_to_amr+pat+'.parse') 187 | else: 188 | filepaths = glob.iglob(args.path_to_amr+pat+'.amr') 189 | 190 | filepaths = list(filepaths) 191 | 192 | # FIXME: sorting of sentence amrs is based on filename; change to a universal sorting method 193 | if 'msamr_df' in filepaths[0].split('/')[-1]: 194 | sort_alpha = True 195 | if sort_alpha or args.sort_alpha: 196 | sorted_filepaths = sorted(filepaths,key=lambda t: t.split('/')[-1].split('.')[0]) 197 | else: 198 | # sort doc_* numerically 199 | sorted_filepaths = sorted(filepaths,key=lambda t: int(t.split('/')[-1].split('.')[0].split('_')[-1])) 200 | 201 | 202 | 203 | for filepath in sorted_filepaths: 204 | doc_id = filepath.split('/')[-1].split('.')[0] 205 | if args.add_id: 206 | amrs[doc_id] = read_amr_add_sen_id(filepath, doc_id,remove_id=args.add_id,tokenize=args.tokenize) 207 | if args.path_to_penman is not None: 208 | amrs_penman[doc_id] = read_amr_add_sen_id(args.path_to_penman+filepath.split('/')[-1], doc_id,remove_id=args.add_id,tokenize=args.tokenize,ibm_format=False) 209 | else: 210 | amrs_penman[doc_id] = read_amr_add_sen_id(filepath, doc_id,remove_id=args.add_id,tokenize=args.tokenize,ibm_format=False) 211 | else: 212 | d_amrs,doc_id = read_amr3_docid(filepath,ibm_format=True) 213 | amrs[doc_id] = d_amrs 214 | if args.path_to_penman is not None: 215 | amrs_penman[doc_id],doc_id = read_amr3_docid(args.path_to_penman+filepath.split('/')[-1],ibm_format=False) 216 | else: 217 | amrs_penman[doc_id],doc_id = read_amr3_docid(filepath,ibm_format=False) 218 | 219 | 
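# flatten the per-document maps into single {sentence-id: AMR} dicts; these
# flat dicts are what make_doc_amrs consumes further down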
amrs_penman_dict.update(amrs_penman[doc_id]) 220 | 221 | amrs_dict.update(amrs[doc_id]) 222 | 223 | 224 | if args.allennlp: 225 | # get coref chains from the AllenNLP SpanBERT coref model 226 | coref_chains = {} 227 | out = pickle.load(open(args.path_to_coref,'rb')) 228 | for i,(doc_id,val) in enumerate(out.items()): 229 | 230 | coref_chains[doc_id] = val 231 | assert len(coref_chains)>0,"Coref file is empty" 232 | corefs = process_coref_conll(amrs,coref_chains,args.add_coref,verbose=args.verbose,save_triples=args.save_triples,out=args.out_amr,coref_type='allennlp') 233 | elif args.conll: 234 | from corefconversion.conll_transform import read_file as conll_read_file 235 | from corefconversion.conll_transform import compute_chains as conll_compute_chains 236 | 237 | coref_chains = {} 238 | out = conll_read_file(args.path_to_coref) 239 | for n,(i,val) in enumerate(out.items()): 240 | 241 | docid_spl = i.split('); part ') 242 | doc_id = docid_spl[0].split('/')[-1]+'_'+str(int(docid_spl[1])) 243 | coref_chains[doc_id] = conll_compute_chains(val) 244 | assert len(coref_chains)>0,"Coref file is empty" 245 | corefs = process_coref_conll(amrs,coref_chains,save_triples=args.save_triples,out=args.out_amr,coref_type='conll') 246 | 247 | 248 | 249 | 250 | 251 | # FIXME: sorting of sentence amrs is based on filename; change to a universal sorting method 252 | if args.add_id and not sort_alpha and not args.sort_alpha: 253 | corefs = collections.OrderedDict(sorted(corefs.items(),key=lambda t: int(t[0].split('.')[0].split('_')[-1]))) 254 | else: 255 | corefs = collections.OrderedDict(sorted(corefs.items(),key=lambda t: t[0].split('.')[0])) 256 | 257 | # use_penman is True by default; the penman format is used to construct the final doc-amr 258 | if args.use_penman: 259 | out_doc_amrs = make_doc_amrs(corefs=corefs,amrs=amrs_penman_dict,chains=False) 260 | else: 261 | out_doc_amrs = make_doc_amrs(corefs=corefs,amrs=amrs_dict,chains=False) 262 | 263 | out_dir = args.out_amr.rsplit('/',1)[0] 264 | if not os.path.isdir(out_dir): 265 | os.makedirs(out_dir) 266 | 267 | #with open(args.out_amr+'/'+doc_id+'_docamr_'+args.norm_rep+'.out', 'w') as fid: 268 | with open(args.out_amr+'/'+args.path_to_amr.split('/')[-1]+'docamr_'+args.norm_rep+'.out', 'w') as fid: 269 | 270 | for doc_id,amr in tqdm.tqdm(out_doc_amrs.items(),desc='writing doc-amrs'): 271 | 272 | damr = copy.deepcopy(amr) 273 | connect_sen_amrs(damr) 274 | damr.make_chains_from_pairs() 275 | damr.normalize(args.norm_rep) 276 | damr_str = str(damr) 277 | 278 | 279 | fid.write(damr_str) 280 | 281 | 282 | if __name__ == "__main__": 283 | main() 284 | -------------------------------------------------------------------------------- /doc_amr_baseline/requirements.txt: -------------------------------------------------------------------------------- 1 | allennlp==2.10.0 2 | allennlp-models==2.10.0 3 | ipdb==0.13.9 4 | penman==1.2.1 5 | tqdm==4.64.0 6 | importlib_metadata==6.6.0 -------------------------------------------------------------------------------- /doc_amr_baseline/run_doc_amr_baseline.sh: -------------------------------------------------------------------------------- 1 | set -o pipefail 2 | set -o errexit 3 | . 
set_environment.sh 4 | HELP="$0 <path_to_sentence_amr> <out_amr> [<norm_rep>] [<path_to_coref>]" 5 | [ -z "$1" ] && echo "$HELP" && exit 1 6 | [ -z "$2" ] && echo "$HELP" && exit 1 7 | 8 | path_to_sentence_amr=$1 9 | out_amr=$2 10 | rep=$3 11 | path_to_coref=$4 12 | set -o nounset 13 | 14 | if [ -z "$rep" ]; then 15 | rep='docAMR' 16 | fi 17 | 18 | if [ -z "$path_to_coref" ]; then 19 | echo "Getting coref for sentences" 20 | coref_filename='allen_spanbert_large-2021.03.10.coref' 21 | python doc_amr_baseline/get_allen_coref.py \ 22 | --path_to_sen $path_to_sentence_amr \ 23 | --from_amr 24 | path_to_coref=$path_to_sentence_amr/$coref_filename 25 | fi 26 | 27 | echo "Doc Level AMRs:" 28 | python doc_amr_baseline/make_doc_amr.py \ 29 | --path_to_coref $path_to_coref \ 30 | --path_to_amr $path_to_sentence_amr \ 31 | --out_amr $out_amr \ 32 | --add_coref \ 33 | --add_id \ 34 | --allennlp \ 35 | --norm_rep $rep \ 36 | --sort_alpha 37 | -------------------------------------------------------------------------------- /doc_amr_baseline/tests/baseline_allennlp_test.sh: -------------------------------------------------------------------------------- 1 | set -o errexit 2 | set -o pipefail 3 | set -o nounset 4 | 5 | [ -z "${1:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 6 | [ -z "${2:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 7 | [ -z "${3:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 8 | 9 | gold_amr=$1 10 | sentence_amr=$2 11 | rep=$3 12 | 13 | # run the baseline on the AMR3 document AMRs test split 14 | bash run_doc_amr_baseline.sh $sentence_amr $sentence_amr $rep 15 | 16 | echo "Computing Smatch" 17 | python docSmatch/smatch.py -r 10 --significant 4 -f $gold_amr ${sentence_amr}/docamr_${rep}.out 18 | 19 | printf "[\033[92m OK \033[0m] $0\n" -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | Penman==1.0.0 2 | ipdb 3 | tqdm -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | VERSION = '0.1.1' 4 | 5 | # this is what usually goes in requirements.txt 6 | install_requires = [ 7 | 'tqdm', 8 | # for scoring 9 | 'penman', 10 | # for debugging 11 | 'ipdb', 12 | ] 13 | 14 | setup( 15 | name='docAMR', 16 | version=VERSION, 17 | description="document AMR representation and evaluation", 18 | py_modules=['doc_amr','amr_io','docSmatch','doc_amr_baseline'], 19 | entry_points={ 20 | 'console_scripts': [ 21 | 'doc-smatch = docSmatch.smatch:main', 22 | 'doc-baseline = doc_amr_baseline.make_doc_amr:main', 23 | 'doc-amr = doc_amr:main' 24 | ] 25 | }, 26 | packages=find_packages(), 27 | install_requires=install_requires, 28 | ) 29 | -------------------------------------------------------------------------------- /test_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/test/msamr_dfa_007.xml 2 | data/multisentence/ms-amr-split/test/msamr_dfa_027.xml 3 | data/multisentence/ms-amr-split/test/msamr_dfa_041.xml 4 | data/multisentence/ms-amr-split/test/msamr_dfa_063.xml 5 | data/multisentence/ms-amr-split/test/msamr_dfa_077.xml 6 | data/multisentence/ms-amr-split/test/msamr_dfa_081.xml 7 | data/multisentence/ms-amr-split/test/msamr_dfa_093.xml 8 | data/multisentence/ms-amr-split/test/msamr_dfa_095.xml 9 | data/multisentence/ms-amr-split/test/msamr_dfa_134.xml 10 | --------------------------------------------------------------------------------
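Note: the *_coref.fof files above and below are plain "file of files" lists, one MS-AMR XML annotation path per line, naming the coreference annotation splits. A minimal reading sketch (the helper name read_fof is ours for illustration, not part of this repo):

def read_fof(fof_path):
    # one annotation path per line; skip blank lines
    with open(fof_path) as fid:
        return [line.strip() for line in fid if line.strip()]

# e.g. read_fof('test_coref.fof')[0] == 'data/multisentence/ms-amr-split/test/msamr_dfa_007.xml'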
/train_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/train/msamr_dfa_001.xml 2 | data/multisentence/ms-amr-split/train/msamr_dfa_002.xml 3 | data/multisentence/ms-amr-split/train/msamr_dfa_003.xml 4 | data/multisentence/ms-amr-split/train/msamr_dfa_005.xml 5 | data/multisentence/ms-amr-split/train/msamr_dfa_006.xml 6 | data/multisentence/ms-amr-split/train/msamr_dfa_008.xml 7 | data/multisentence/ms-amr-split/train/msamr_dfa_009.xml 8 | data/multisentence/ms-amr-split/train/msamr_dfa_010.xml 9 | data/multisentence/ms-amr-split/train/msamr_dfa_011.xml 10 | data/multisentence/ms-amr-split/train/msamr_dfa_012.xml 11 | data/multisentence/ms-amr-split/train/msamr_dfa_013.xml 12 | data/multisentence/ms-amr-split/train/msamr_dfa_014.xml 13 | data/multisentence/ms-amr-split/train/msamr_dfa_015.xml 14 | data/multisentence/ms-amr-split/train/msamr_dfa_016.xml 15 | data/multisentence/ms-amr-split/train/msamr_dfa_017.xml 16 | data/multisentence/ms-amr-split/train/msamr_dfa_019.xml 17 | data/multisentence/ms-amr-split/train/msamr_dfa_020.xml 18 | data/multisentence/ms-amr-split/train/msamr_dfa_021.xml 19 | data/multisentence/ms-amr-split/train/msamr_dfa_023.xml 20 | data/multisentence/ms-amr-split/train/msamr_dfa_024.xml 21 | data/multisentence/ms-amr-split/train/msamr_dfa_025.xml 22 | data/multisentence/ms-amr-split/train/msamr_dfa_026.xml 23 | data/multisentence/ms-amr-split/train/msamr_dfa_028.xml 24 | data/multisentence/ms-amr-split/train/msamr_dfa_029.xml 25 | data/multisentence/ms-amr-split/train/msamr_dfa_030.xml 26 | data/multisentence/ms-amr-split/train/msamr_dfa_031.xml 27 | data/multisentence/ms-amr-split/train/msamr_dfa_032.xml 28 | data/multisentence/ms-amr-split/train/msamr_dfa_033.xml 29 | data/multisentence/ms-amr-split/train/msamr_dfa_034.xml 30 | data/multisentence/ms-amr-split/train/msamr_dfa_035.xml 31 | data/multisentence/ms-amr-split/train/msamr_dfa_036.xml 32 | data/multisentence/ms-amr-split/train/msamr_dfa_037.xml 33 | data/multisentence/ms-amr-split/train/msamr_dfa_038.xml 34 | data/multisentence/ms-amr-split/train/msamr_dfa_039.xml 35 | data/multisentence/ms-amr-split/train/msamr_dfa_042.xml 36 | data/multisentence/ms-amr-split/train/msamr_dfa_043.xml 37 | data/multisentence/ms-amr-split/train/msamr_dfa_045.xml 38 | data/multisentence/ms-amr-split/train/msamr_dfa_046.xml 39 | data/multisentence/ms-amr-split/train/msamr_dfa_047.xml 40 | data/multisentence/ms-amr-split/train/msamr_dfa_048.xml 41 | data/multisentence/ms-amr-split/train/msamr_dfa_049.xml 42 | data/multisentence/ms-amr-split/train/msamr_dfa_050.xml 43 | data/multisentence/ms-amr-split/train/msamr_dfa_051.xml 44 | data/multisentence/ms-amr-split/train/msamr_dfa_052.xml 45 | data/multisentence/ms-amr-split/train/msamr_dfa_053.xml 46 | data/multisentence/ms-amr-split/train/msamr_dfa_054.xml 47 | data/multisentence/ms-amr-split/train/msamr_dfa_055.xml 48 | data/multisentence/ms-amr-split/train/msamr_dfa_057.xml 49 | data/multisentence/ms-amr-split/train/msamr_dfa_058.xml 50 | data/multisentence/ms-amr-split/train/msamr_dfa_059.xml 51 | data/multisentence/ms-amr-split/train/msamr_dfa_060.xml 52 | data/multisentence/ms-amr-split/train/msamr_dfa_061.xml 53 | data/multisentence/ms-amr-split/train/msamr_dfa_062.xml 54 | data/multisentence/ms-amr-split/train/msamr_dfa_065.xml 55 | data/multisentence/ms-amr-split/train/msamr_dfa_067.xml 56 | data/multisentence/ms-amr-split/train/msamr_dfa_068.xml 57 | 
data/multisentence/ms-amr-split/train/msamr_dfa_069.xml 58 | data/multisentence/ms-amr-split/train/msamr_dfa_070.xml 59 | data/multisentence/ms-amr-split/train/msamr_dfa_071.xml 60 | data/multisentence/ms-amr-split/train/msamr_dfa_072.xml 61 | data/multisentence/ms-amr-split/train/msamr_dfa_074.xml 62 | data/multisentence/ms-amr-split/train/msamr_dfa_075.xml 63 | data/multisentence/ms-amr-split/train/msamr_dfa_076.xml 64 | data/multisentence/ms-amr-split/train/msamr_dfa_078.xml 65 | data/multisentence/ms-amr-split/train/msamr_dfa_079.xml 66 | data/multisentence/ms-amr-split/train/msamr_dfa_080.xml 67 | data/multisentence/ms-amr-split/train/msamr_dfa_082.xml 68 | data/multisentence/ms-amr-split/train/msamr_dfa_083.xml 69 | data/multisentence/ms-amr-split/train/msamr_dfa_084.xml 70 | data/multisentence/ms-amr-split/train/msamr_dfa_085.xml 71 | data/multisentence/ms-amr-split/train/msamr_dfa_087.xml 72 | data/multisentence/ms-amr-split/train/msamr_dfa_088.xml 73 | data/multisentence/ms-amr-split/train/msamr_dfa_089.xml 74 | data/multisentence/ms-amr-split/train/msamr_dfa_090.xml 75 | data/multisentence/ms-amr-split/train/msamr_dfa_091.xml 76 | data/multisentence/ms-amr-split/train/msamr_dfa_094.xml 77 | data/multisentence/ms-amr-split/train/msamr_dfa_098.xml 78 | data/multisentence/ms-amr-split/train/msamr_dfa_099.xml 79 | data/multisentence/ms-amr-split/train/msamr_dfa_101.xml 80 | data/multisentence/ms-amr-split/train/msamr_dfa_102.xml 81 | data/multisentence/ms-amr-split/train/msamr_dfa_103.xml 82 | data/multisentence/ms-amr-split/train/msamr_dfa_105.xml 83 | data/multisentence/ms-amr-split/train/msamr_dfa_107.xml 84 | data/multisentence/ms-amr-split/train/msamr_dfa_110.xml 85 | data/multisentence/ms-amr-split/train/msamr_dfa_111.xml 86 | data/multisentence/ms-amr-split/train/msamr_dfa_112.xml 87 | data/multisentence/ms-amr-split/train/msamr_dfa_113.xml 88 | data/multisentence/ms-amr-split/train/msamr_dfa_114.xml 89 | data/multisentence/ms-amr-split/train/msamr_dfa_115.xml 90 | data/multisentence/ms-amr-split/train/msamr_dfa_116.xml 91 | data/multisentence/ms-amr-split/train/msamr_dfa_117.xml 92 | data/multisentence/ms-amr-split/train/msamr_dfa_118.xml 93 | data/multisentence/ms-amr-split/train/msamr_dfa_119.xml 94 | data/multisentence/ms-amr-split/train/msamr_dfa_120.xml 95 | data/multisentence/ms-amr-split/train/msamr_dfa_121.xml 96 | data/multisentence/ms-amr-split/train/msamr_dfa_122.xml 97 | data/multisentence/ms-amr-split/train/msamr_dfa_123.xml 98 | data/multisentence/ms-amr-split/train/msamr_dfa_124.xml 99 | data/multisentence/ms-amr-split/train/msamr_dfa_125.xml 100 | data/multisentence/ms-amr-split/train/msamr_dfa_126.xml 101 | data/multisentence/ms-amr-split/train/msamr_dfa_127.xml 102 | data/multisentence/ms-amr-split/train/msamr_dfa_128.xml 103 | data/multisentence/ms-amr-split/train/msamr_dfa_129.xml 104 | data/multisentence/ms-amr-split/train/msamr_dfa_130.xml 105 | data/multisentence/ms-amr-split/train/msamr_dfa_131.xml 106 | data/multisentence/ms-amr-split/train/msamr_dfa_132.xml 107 | data/multisentence/ms-amr-split/train/msamr_dfa_133.xml 108 | data/multisentence/ms-amr-split/train/msamr_dfa_135.xml 109 | data/multisentence/ms-amr-split/train/msamr_dfa_136.xml 110 | data/multisentence/ms-amr-split/train/msamr_dfa_137.xml 111 | data/multisentence/ms-amr-split/train/msamr_dfa_139.xml 112 | data/multisentence/ms-amr-split/train/msamr_dfa_140.xml 113 | data/multisentence/ms-amr-split/train/msamr_dfa_141.xml 114 | data/multisentence/ms-amr-split/train/msamr_dfa_142.xml 115 | 
data/multisentence/ms-amr-split/train/msamr_dfa_143.xml 116 | data/multisentence/ms-amr-split/train/msamr_dfa_144.xml 117 | data/multisentence/ms-amr-split/train/msamr_dfa_145.xml 118 | data/multisentence/ms-amr-split/train/msamr_dfa_146.xml 119 | data/multisentence/ms-amr-split/train/msamr_dfb_001.xml 120 | data/multisentence/ms-amr-split/train/msamr_dfb_004.xml 121 | data/multisentence/ms-amr-split/train/msamr_dfb_005.xml 122 | data/multisentence/ms-amr-split/train/msamr_dfb_007.xml 123 | data/multisentence/ms-amr-split/train/msamr_dfb_008.xml 124 | data/multisentence/ms-amr-split/train/msamr_dfb_009.xml 125 | data/multisentence/ms-amr-split/train/msamr_dfb_011.xml 126 | data/multisentence/ms-amr-split/train/msamr_dfb_012.xml 127 | data/multisentence/ms-amr-split/train/msamr_dfb_014.xml 128 | data/multisentence/ms-amr-split/train/msamr_dfb_016.xml 129 | data/multisentence/ms-amr-split/train/msamr_dfb_017.xml 130 | data/multisentence/ms-amr-split/train/msamr_dfb_018.xml 131 | data/multisentence/ms-amr-split/train/msamr_dfb_020.xml 132 | data/multisentence/ms-amr-split/train/msamr_dfb_023.xml 133 | data/multisentence/ms-amr-split/train/msamr_dfb_024.xml 134 | data/multisentence/ms-amr-split/train/msamr_dfb_025.xml 135 | data/multisentence/ms-amr-split/train/msamr_dfb_027.xml 136 | data/multisentence/ms-amr-split/train/msamr_dfb_029.xml 137 | data/multisentence/ms-amr-split/train/msamr_dfb_030.xml 138 | data/multisentence/ms-amr-split/train/msamr_dfb_031.xml 139 | data/multisentence/ms-amr-split/train/msamr_dfb_033.xml 140 | data/multisentence/ms-amr-split/train/msamr_dfb_034.xml 141 | data/multisentence/ms-amr-split/train/msamr_dfb_035.xml 142 | data/multisentence/ms-amr-split/train/msamr_dfb_036.xml 143 | data/multisentence/ms-amr-split/train/msamr_dfb_037.xml 144 | data/multisentence/ms-amr-split/train/msamr_dfb_039.xml 145 | data/multisentence/ms-amr-split/train/msamr_dfb_040.xml 146 | data/multisentence/ms-amr-split/train/msamr_dfb_042.xml 147 | data/multisentence/ms-amr-split/train/msamr_dfb_043.xml 148 | data/multisentence/ms-amr-split/train/msamr_dfb_044.xml 149 | data/multisentence/ms-amr-split/train/msamr_dfb_045.xml 150 | data/multisentence/ms-amr-split/train/msamr_dfb_046.xml 151 | data/multisentence/ms-amr-split/train/msamr_dfb_047.xml 152 | data/multisentence/ms-amr-split/train/msamr_dfb_048.xml 153 | data/multisentence/ms-amr-split/train/msamr_dfb_049.xml 154 | data/multisentence/ms-amr-split/train/msamr_dfb_050.xml 155 | data/multisentence/ms-amr-split/train/msamr_dfb_051.xml 156 | data/multisentence/ms-amr-split/train/msamr_dfb_052.xml 157 | data/multisentence/ms-amr-split/train/msamr_dfb_053.xml 158 | data/multisentence/ms-amr-split/train/msamr_dfb_056.xml 159 | data/multisentence/ms-amr-split/train/msamr_dfb_058.xml 160 | data/multisentence/ms-amr-split/train/msamr_dfb_059.xml 161 | data/multisentence/ms-amr-split/train/msamr_dfb_060.xml 162 | data/multisentence/ms-amr-split/train/msamr_dfb_062.xml 163 | data/multisentence/ms-amr-split/train/msamr_dfb_063.xml 164 | data/multisentence/ms-amr-split/train/msamr_dfb_064.xml 165 | data/multisentence/ms-amr-split/train/msamr_dfb_065.xml 166 | data/multisentence/ms-amr-split/train/msamr_dfb_067.xml 167 | data/multisentence/ms-amr-split/train/msamr_dfb_068.xml 168 | data/multisentence/ms-amr-split/train/msamr_dfb_069.xml 169 | data/multisentence/ms-amr-split/train/msamr_dfb_070.xml 170 | data/multisentence/ms-amr-split/train/msamr_dfb_071.xml 171 | data/multisentence/ms-amr-split/train/msamr_dfb_072.xml 172 | 
data/multisentence/ms-amr-split/train/msamr_dfb_073.xml 173 | data/multisentence/ms-amr-split/train/msamr_dfb_074.xml 174 | data/multisentence/ms-amr-split/train/msamr_dfb_075.xml 175 | data/multisentence/ms-amr-split/train/msamr_dfb_076.xml 176 | data/multisentence/ms-amr-split/train/msamr_dfb_077.xml 177 | data/multisentence/ms-amr-split/train/msamr_dfb_078.xml 178 | data/multisentence/ms-amr-split/train/msamr_dfb_079.xml 179 | data/multisentence/ms-amr-split/train/msamr_dfb_080.xml 180 | data/multisentence/ms-amr-split/train/msamr_dfb_081.xml 181 | data/multisentence/ms-amr-split/train/msamr_dfb_082.xml 182 | data/multisentence/ms-amr-split/train/msamr_dfb_083.xml 183 | data/multisentence/ms-amr-split/train/msamr_dfb_084.xml 184 | data/multisentence/ms-amr-split/train/msamr_dfb_085.xml 185 | data/multisentence/ms-amr-split/train/msamr_dfb_086.xml 186 | data/multisentence/ms-amr-split/train/msamr_dfb_087.xml 187 | data/multisentence/ms-amr-split/train/msamr_dfb_088.xml 188 | data/multisentence/ms-amr-split/train/msamr_dfb_089.xml 189 | data/multisentence/ms-amr-split/train/msamr_dfb_090.xml 190 | data/multisentence/ms-amr-split/train/msamr_dfb_091.xml 191 | data/multisentence/ms-amr-split/train/msamr_dfb_092.xml 192 | data/multisentence/ms-amr-split/train/msamr_dfb_093.xml 193 | data/multisentence/ms-amr-split/train/msamr_dfb_094.xml 194 | data/multisentence/ms-amr-split/train/msamr_dfb_095.xml 195 | data/multisentence/ms-amr-split/train/msamr_dfb_096.xml 196 | data/multisentence/ms-amr-split/train/msamr_dfb_097.xml 197 | data/multisentence/ms-amr-split/train/msamr_dfb_098.xml 198 | data/multisentence/ms-amr-split/train/msamr_dfb_099.xml 199 | data/multisentence/ms-amr-split/train/msamr_dfb_100.xml 200 | data/multisentence/ms-amr-split/train/msamr_dfb_101.xml 201 | data/multisentence/ms-amr-split/train/msamr_dfb_102.xml 202 | data/multisentence/ms-amr-split/train/msamr_dfb_103.xml 203 | data/multisentence/ms-amr-split/train/msamr_dfb_104.xml 204 | data/multisentence/ms-amr-split/train/msamr_dfb_105.xml 205 | data/multisentence/ms-amr-split/train/msamr_dfb_106.xml 206 | data/multisentence/ms-amr-split/train/msamr_dfb_107.xml 207 | data/multisentence/ms-amr-split/train/msamr_dfb_108.xml 208 | data/multisentence/ms-amr-split/train/msamr_dfb_109.xml 209 | data/multisentence/ms-amr-split/train/msamr_dfb_110.xml 210 | data/multisentence/ms-amr-split/train/msamr_dfb_111.xml 211 | data/multisentence/ms-amr-split/train/msamr_dfb_112.xml 212 | data/multisentence/ms-amr-split/train/msamr_dfb_113.xml 213 | data/multisentence/ms-amr-split/train/msamr_dfb_114.xml 214 | data/multisentence/ms-amr-split/train/msamr_dfb_115.xml 215 | data/multisentence/ms-amr-split/train/msamr_dfb_116.xml 216 | data/multisentence/ms-amr-split/train/msamr_dfb_117.xml 217 | data/multisentence/ms-amr-split/train/msamr_dfb_118.xml 218 | data/multisentence/ms-amr-split/train/msamr_dfb_119.xml 219 | data/multisentence/ms-amr-split/train/msamr_dfb_120.xml 220 | data/multisentence/ms-amr-split/train/msamr_dfb_122.xml 221 | data/multisentence/ms-amr-split/train/msamr_dfb_123.xml 222 | data/multisentence/ms-amr-split/train/msamr_dfb_124.xml 223 | data/multisentence/ms-amr-split/train/msamr_dfb_125.xml 224 | data/multisentence/ms-amr-split/train/msamr_dfb_126.xml 225 | data/multisentence/ms-amr-split/train/msamr_dfb_127.xml 226 | data/multisentence/ms-amr-split/train/msamr_dfb_128.xml 227 | data/multisentence/ms-amr-split/train/msamr_dfb_129.xml 228 | data/multisentence/ms-amr-split/train/msamr_dfb_130.xml 229 | 
data/multisentence/ms-amr-split/train/msamr_dfb_131.xml 230 | data/multisentence/ms-amr-split/train/msamr_dfb_132.xml 231 | data/multisentence/ms-amr-split/train/msamr_dfb_135.xml 232 | data/multisentence/ms-amr-split/train/msamr_wb_001.xml 233 | data/multisentence/ms-amr-split/train/msamr_wb_002.xml 234 | data/multisentence/ms-amr-split/train/msamr_wb_003.xml 235 | data/multisentence/ms-amr-split/train/msamr_wb_004.xml 236 | data/multisentence/ms-amr-split/train/msamr_wb_005.xml 237 | data/multisentence/ms-amr-split/train/msamr_wb_006.xml 238 | data/multisentence/ms-amr-split/train/msamr_wb_007.xml 239 | data/multisentence/ms-amr-split/train/msamr_wb_008.xml 240 | data/multisentence/ms-amr-split/train/msamr_wb_009.xml 241 | data/multisentence/ms-amr-split/train/msamr_wb_010.xml 242 | data/multisentence/ms-amr-split/train/msamr_wb_011.xml 243 | --------------------------------------------------------------------------------
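For orientation, here is a minimal end-to-end sketch of the baseline pipeline driven from Python instead of run_doc_amr_baseline.sh. The input path and document name are assumptions for illustration; the functions are the ones defined above (read_amr3_docid in doc_amr_baseline/baseline_io.py, process_coref_conll in doc_amr_baseline/make_doc_amr.py, make_doc_amrs and connect_sen_amrs imported from doc_amr), and the coref pickle is the file written by get_allen_coref.py:

import copy
import pickle

from doc_amr_baseline.baseline_io import read_amr3_docid
from doc_amr_baseline.make_doc_amr import process_coref_conll
from doc_amr import make_doc_amrs, connect_sen_amrs

# sentence AMRs for one document, read twice: with the IBM ::node/::edge
# metadata for coref attachment, and as plain penman for the final doc-AMR
amrs, doc_id = read_amr3_docid('sentences/doc_1.amr', ibm_format=True)   # assumed path
amrs_penman, _ = read_amr3_docid('sentences/doc_1.amr', ibm_format=False)

# coref chains as pickled by get_allen_coref.py:
# {doc_id: [[[sent_idx, beg_tok, end_tok], ...], ...]}
with open('sentences/allen_spanbert_large-2021.03.10.coref', 'rb') as fid:
    coref_chains = pickle.load(fid)

# map each coref mention to an AMR node and build cross-sentence triples
corefs = process_coref_conll({doc_id: amrs}, coref_chains, coref_type='allennlp')

# merge each document's sentence AMRs under one root and normalize to docAMR
for _, amr in make_doc_amrs(corefs=corefs, amrs=amrs_penman, chains=False).items():
    damr = copy.deepcopy(amr)
    connect_sen_amrs(damr)
    damr.make_chains_from_pairs()
    damr.normalize('docAMR')
    print(str(damr))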