├── LICENSE ├── README.md ├── __init__.py ├── amr_io.py ├── dev1_coref.fof ├── dev2_coref.fof ├── docAMR.jpg ├── docAMR_from_gold.sh ├── docAMR_from_pairwise.sh ├── docAMR_from_unmerged.sh ├── docSmatch ├── __init__.py ├── amr.py └── smatch.py ├── doc_amr.py ├── doc_amr_baseline ├── README.md ├── __init__.py ├── amr_constituents.py ├── baseline_io.py ├── example │ ├── doc_sen.amr │ └── docamr_docAMR.out ├── get_allen_coref.py ├── make_doc_amr.py ├── requirements.txt ├── run_doc_amr_baseline.sh └── tests │ └── baseline_allennlp_test.sh ├── requirements.txt ├── setup.py ├── test_coref.fof └── train_coref.fof /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright 2022 International Business Machines 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Setup 2 | 3 | Make an environment with Python 3.7 4 | 5 | Activate the environment, then install the dependencies: 6 | ``` 7 | pip install -r requirements.txt 8 | ``` 9 | 10 | # Create docAMR representation 11 | 12 | Link to the NAACL 2022 paper DOCAMR: Multi-Sentence AMR Representation and Evaluation 13 | 14 | https://aclanthology.org/2022.naacl-main.256.pdf 15 | 16 | 17 | 18 | To create the docAMR representation from gold AMR3.0 data and the coref annotation in xml format: 19 | ``` 20 | python doc_amr.py 21 | --amr3-path <amr3 path> 22 | --coref-fof <coref fof> 23 | --out-amr <output file> 24 | --rep <representation> 25 | ``` 26 | ```--amr3-path``` should point to the uncompressed LDC data directory for LDC2020T02 with its original directory structure. 27 | 28 | ```--coref-fof``` is one of the ```_coref.fof``` files included in this repository. 29 | 30 | The default value for ```--rep``` is ```'docAMR'```. Other values can be: ```'no-merge'```, ```'merge-names'```, ```'merge-all'```. Use ```--help``` to read the descriptions of these representations. 31 | 32 | ------- 33 | 34 | To create the docAMR representation from document AMRs with no nodes merged: 35 | ``` 36 | python doc_amr.py 37 | --in-doc-amr-unmerged <input file> 38 | --rep <representation> 39 | --out-amr <output file> 40 | ``` 41 | 42 | ------- 43 | 44 | To create the docAMR representation from document AMRs with pairwise edges between a representative node in the chain and the rest of the nodes in the chain: 45 | ``` 46 | python doc_amr.py 47 | --in-doc-amr-pairwise <input file> 48 | --pairwise-coref-rel <relation label> 49 | --rep <representation> 50 | --out-amr <output file> 51 | ``` 52 | 53 | The default value for ```--pairwise-coref-rel``` is ```same-as```. 54 | 55 | # Evaluate docAMR (docSmatch) 56 | 57 | Use docSmatch the same way as the standard Smatch. 58 | 59 | ``` 60 | python docSmatch/smatch.py -f <parsed file> <gold file> 61 | ``` 62 | 63 | It assumes that ```:snt``` relations connect sentences to the root. Moreover, it assumes that the numeric suffix of ```:snt``` is the sentence number and that the matching sentence numbers in the two AMRs are aligned. 64 | 65 | You can also get a detailed score breakdown for the accuracy of coreference prediction: 66 | ``` 67 | python docSmatch/smatch.py -f <parsed file> <gold file> --coref-subscore 68 | ``` 69 | This will output the normal Smatch score as 'Overall Score', as well as a 'Coref Score' indicating the quality of cross-sentential edges and nodes.
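For example, an end-to-end sketch using the commands above (the output file names here are illustrative): build the gold docAMR reference for the dev1 split, convert a parser's unmerged document AMRs to the same representation, and score one against the other:
```
python doc_amr.py --amr3-path <amr3 path> --coref-fof dev1_coref.fof --rep docAMR --out-amr dev1.gold.docamr
python doc_amr.py --in-doc-amr-unmerged <parsed unmerged file> --rep docAMR --out-amr dev1.parsed.docamr
python docSmatch/smatch.py -f dev1.parsed.docamr dev1.gold.docamr --coref-subscore
```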
70 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/__init__.py -------------------------------------------------------------------------------- /dev1_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/train/msamr_dfa_004.xml 2 | data/multisentence/ms-amr-split/train/msamr_dfa_018.xml 3 | data/multisentence/ms-amr-split/train/msamr_dfa_022.xml 4 | data/multisentence/ms-amr-split/train/msamr_dfa_040.xml 5 | data/multisentence/ms-amr-split/train/msamr_dfa_044.xml 6 | data/multisentence/ms-amr-split/train/msamr_dfa_056.xml 7 | data/multisentence/ms-amr-split/train/msamr_dfa_064.xml 8 | data/multisentence/ms-amr-split/train/msamr_dfa_066.xml 9 | data/multisentence/ms-amr-split/train/msamr_dfa_073.xml 10 | data/multisentence/ms-amr-split/train/msamr_dfa_086.xml 11 | data/multisentence/ms-amr-split/train/msamr_dfa_092.xml 12 | data/multisentence/ms-amr-split/train/msamr_dfa_096.xml 13 | data/multisentence/ms-amr-split/train/msamr_dfa_097.xml 14 | data/multisentence/ms-amr-split/train/msamr_dfa_100.xml 15 | data/multisentence/ms-amr-split/train/msamr_dfa_104.xml 16 | data/multisentence/ms-amr-split/train/msamr_dfa_106.xml 17 | data/multisentence/ms-amr-split/train/msamr_dfa_108.xml 18 | data/multisentence/ms-amr-split/train/msamr_dfa_109.xml 19 | data/multisentence/ms-amr-split/train/msamr_dfa_138.xml 20 | data/multisentence/ms-amr-split/train/msamr_dfa_147.xml 21 | data/multisentence/ms-amr-split/train/msamr_dfb_002.xml 22 | data/multisentence/ms-amr-split/train/msamr_dfb_003.xml 23 | data/multisentence/ms-amr-split/train/msamr_dfb_006.xml 24 | data/multisentence/ms-amr-split/train/msamr_dfb_010.xml 25 | data/multisentence/ms-amr-split/train/msamr_dfb_013.xml 26 | data/multisentence/ms-amr-split/train/msamr_dfb_015.xml 27 | data/multisentence/ms-amr-split/train/msamr_dfb_019.xml 28 | data/multisentence/ms-amr-split/train/msamr_dfb_021.xml 29 | data/multisentence/ms-amr-split/train/msamr_dfb_022.xml 30 | data/multisentence/ms-amr-split/train/msamr_dfb_026.xml 31 | data/multisentence/ms-amr-split/train/msamr_dfb_028.xml 32 | data/multisentence/ms-amr-split/train/msamr_dfb_032.xml 33 | data/multisentence/ms-amr-split/train/msamr_dfb_038.xml 34 | data/multisentence/ms-amr-split/train/msamr_dfb_041.xml 35 | data/multisentence/ms-amr-split/train/msamr_dfb_054.xml 36 | data/multisentence/ms-amr-split/train/msamr_dfb_055.xml 37 | data/multisentence/ms-amr-split/train/msamr_dfb_057.xml 38 | data/multisentence/ms-amr-split/train/msamr_dfb_061.xml 39 | data/multisentence/ms-amr-split/train/msamr_dfb_066.xml 40 | data/multisentence/ms-amr-split/train/msamr_dfb_121.xml 41 | data/multisentence/ms-amr-split/train/msamr_dfb_133.xml 42 | data/multisentence/ms-amr-split/train/msamr_dfb_134.xml 43 | -------------------------------------------------------------------------------- /dev2_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-double-annotations/msamr_dfa_004.alternative.xml 2 | data/multisentence/ms-amr-double-annotations/msamr_dfa_018.alternative.xml 3 | data/multisentence/ms-amr-double-annotations/msamr_dfa_022.alternative.xml 4 | data/multisentence/ms-amr-double-annotations/msamr_dfa_040.alternative.xml 5 | 
data/multisentence/ms-amr-double-annotations/msamr_dfa_044.alternative.xml 6 | data/multisentence/ms-amr-double-annotations/msamr_dfa_056.alternative.xml 7 | data/multisentence/ms-amr-double-annotations/msamr_dfa_064.alternative.xml 8 | data/multisentence/ms-amr-double-annotations/msamr_dfa_066.alternative.xml 9 | data/multisentence/ms-amr-double-annotations/msamr_dfa_073.alternative.xml 10 | data/multisentence/ms-amr-double-annotations/msamr_dfa_086.alternative.xml 11 | data/multisentence/ms-amr-double-annotations/msamr_dfa_092.alternative.xml 12 | data/multisentence/ms-amr-double-annotations/msamr_dfa_096.alternative.xml 13 | data/multisentence/ms-amr-double-annotations/msamr_dfa_097.alternative.xml 14 | data/multisentence/ms-amr-double-annotations/msamr_dfa_100.alternative.xml 15 | data/multisentence/ms-amr-double-annotations/msamr_dfa_104.alternative.xml 16 | data/multisentence/ms-amr-double-annotations/msamr_dfa_106.alternative.xml 17 | data/multisentence/ms-amr-double-annotations/msamr_dfa_108.alternative.xml 18 | data/multisentence/ms-amr-double-annotations/msamr_dfa_109.alternative.xml 19 | data/multisentence/ms-amr-double-annotations/msamr_dfa_138.alternative.xml 20 | data/multisentence/ms-amr-double-annotations/msamr_dfa_147.alternative.xml 21 | data/multisentence/ms-amr-double-annotations/msamr_dfb_002.alternative.xml 22 | data/multisentence/ms-amr-double-annotations/msamr_dfb_003.alternative.xml 23 | data/multisentence/ms-amr-double-annotations/msamr_dfb_006.alternative.xml 24 | data/multisentence/ms-amr-double-annotations/msamr_dfb_010.alternative.xml 25 | data/multisentence/ms-amr-double-annotations/msamr_dfb_013.alternative.xml 26 | data/multisentence/ms-amr-double-annotations/msamr_dfb_015.alternative.xml 27 | data/multisentence/ms-amr-double-annotations/msamr_dfb_019.alternative.xml 28 | data/multisentence/ms-amr-double-annotations/msamr_dfb_021.alternative.xml 29 | data/multisentence/ms-amr-double-annotations/msamr_dfb_022.alternative.xml 30 | data/multisentence/ms-amr-double-annotations/msamr_dfb_026.alternative.xml 31 | data/multisentence/ms-amr-double-annotations/msamr_dfb_028.alternative.xml 32 | data/multisentence/ms-amr-double-annotations/msamr_dfb_032.alternative.xml 33 | data/multisentence/ms-amr-double-annotations/msamr_dfb_038.alternative.xml 34 | data/multisentence/ms-amr-double-annotations/msamr_dfb_041.alternative.xml 35 | data/multisentence/ms-amr-double-annotations/msamr_dfb_054.alternative.xml 36 | data/multisentence/ms-amr-double-annotations/msamr_dfb_055.alternative.xml 37 | data/multisentence/ms-amr-double-annotations/msamr_dfb_057.alternative.xml 38 | data/multisentence/ms-amr-double-annotations/msamr_dfb_061.alternative.xml 39 | data/multisentence/ms-amr-double-annotations/msamr_dfb_066.alternative.xml 40 | data/multisentence/ms-amr-double-annotations/msamr_dfb_121.alternative.xml 41 | data/multisentence/ms-amr-double-annotations/msamr_dfb_133.alternative.xml 42 | data/multisentence/ms-amr-double-annotations/msamr_dfb_134.alternative.xml 43 | -------------------------------------------------------------------------------- /docAMR.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/docAMR.jpg -------------------------------------------------------------------------------- /docAMR_from_gold.sh: -------------------------------------------------------------------------------- 1 | 2 | 3 | 
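# Usage sketch (inferred from the variables below, not an official interface):
#   bash docAMR_from_gold.sh <split> <rep>
# e.g. `bash docAMR_from_gold.sh dev1 docAMR` reads dev1_coref.fof and writes outputs/dev1.gold.docAMR.out.
# Note: amr3path below is a site-specific default; point it at your own uncompressed LDC2020T02 directory.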
amr3path="/dccstor/ykt-parse/SHARED/CORPORA/AMR/amr_annotation_3.0" 4 | split=$1 5 | rep=$2 6 | output_dir="outputs" 7 | 8 | mkdir -p $output_dir 9 | 10 | python doc_amr.py \ 11 | --amr3-path $amr3path \ 12 | --coref-fof ${split}_coref.fof \ 13 | --rep $rep \ 14 | --out-amr $output_dir/${split}.gold.$rep.out 15 | 16 | -------------------------------------------------------------------------------- /docAMR_from_pairwise.sh: -------------------------------------------------------------------------------- 1 | 2 | amr=$1 3 | rep=$2 4 | rel="same-as" 5 | 6 | python doc_amr.py \ 7 | --in-doc-amr-pairwise $amr \ 8 | --pairwise-coref-rel $rel \ 9 | --rep $rep \ 10 | --out-amr $amr.$rep.out 11 | -------------------------------------------------------------------------------- /docAMR_from_unmerged.sh: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | amr=$1 5 | rep=$2 6 | 7 | python doc_amr.py \ 8 | --in-doc-amr-unmerged $amr \ 9 | --rep $rep \ 10 | --out-amr $amr.$rep.out 11 | 12 | -------------------------------------------------------------------------------- /docSmatch/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/docSmatch/__init__.py -------------------------------------------------------------------------------- /docSmatch/amr.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | """ 5 | This code is taken from https://github.com/snowblink14/smatch 6 | 7 | and changed for DocAMR to: 8 | 1. constrain smatch node mapping based on sentence alignements to make it faster 9 | 2. compute coref subscore for DocAMR 10 | 11 | ----- 12 | 13 | AMR (Abstract Meaning Representation) structure 14 | For detailed description of AMR, see http://www.isi.edu/natural-language/amr/a.pdf 15 | 16 | """ 17 | 18 | from __future__ import print_function 19 | from collections import defaultdict 20 | import sys 21 | 22 | # change this if needed 23 | ERROR_LOG = sys.stderr 24 | 25 | # change this if needed 26 | DEBUG_LOG = sys.stderr 27 | 28 | 29 | class AMR(object): 30 | """ 31 | AMR is a rooted, labeled graph to represent semantics. 32 | This class has the following members: 33 | nodes: list of node in the graph. Its ith element is the name of the ith node. For example, a node name 34 | could be "a1", "b", "g2", .etc 35 | node_values: list of node labels (values) of the graph. Its ith element is the value associated with node i in 36 | nodes list. In AMR, such value is usually a semantic concept (e.g. "boy", "want-01") 37 | root: root node name 38 | relations: list of edges connecting two nodes in the graph. Each entry is a link between two nodes, i.e. a triple 39 | . In AMR, such link denotes the relation between two semantic 40 | concepts. For example, "arg0" means that one of the concepts is the 0th argument of the other. 41 | attributes: list of edges connecting a node to an attribute name and its value. For example, if the polarity of 42 | some node is negative, there should be an edge connecting this node and "-". A triple < attribute name, 43 | node name, attribute value> is used to represent such attribute. It can also be viewed as a relation. 44 | 45 | """ 46 | def __init__(self, node_list=None, node_value_list=None, relation_list=None, attribute_list=None, nodes_by_sentence=None): 47 | """ 48 | node_list: names of nodes in AMR graph, e.g. 
"a11", "n" 49 | node_value_list: values of nodes in AMR graph, e.g. "group" for a node named "g" 50 | relation_list: list of relations between two nodes 51 | attribute_list: list of attributes (links between one node and one constant value) 52 | 53 | """ 54 | # initialize AMR graph nodes using list of nodes name 55 | # root, by default, is the first in var_list 56 | 57 | if node_list is None: 58 | self.nodes = [] 59 | self.root = None 60 | else: 61 | self.nodes = node_list[:] 62 | if len(node_list) != 0: 63 | self.root = node_list[0] 64 | else: 65 | self.root = None 66 | if node_value_list is None: 67 | self.node_values = [] 68 | else: 69 | self.node_values = node_value_list[:] 70 | if relation_list is None: 71 | self.relations = [] 72 | else: 73 | self.relations = relation_list[:] 74 | if attribute_list is None: 75 | self.attributes = [] 76 | else: 77 | self.attributes = attribute_list[:] 78 | 79 | self._descendants = None 80 | if nodes_by_sentence is None: 81 | self.categorize_nodes_by_sentence()#_uninverted() 82 | else: 83 | self.nodes_by_sentence = nodes_by_sentence[:] 84 | self.coref_nodes = [] 85 | self.coref_edges = [] 86 | self.named_entities = [] 87 | self.find_coref() 88 | 89 | def find_coref(self): 90 | self.coref_nodes = [] 91 | self.coref_edges = [] 92 | self.named_entities = [] 93 | sentences_by_node = {n:set() for n in self.nodes} 94 | for i,nodes_in_sentence in enumerate(self.nodes_by_sentence): 95 | for n in nodes_in_sentence: 96 | sentences_by_node[n].add(i) 97 | candidates = [n for n in sentences_by_node if len(sentences_by_node[n])>1] 98 | relations_to_node = {n:[] for n in self.nodes} 99 | for i,s in enumerate(self.nodes): 100 | for rel in self.relations[i]: 101 | r, t, is_inverted = rel 102 | if r not in ['coref','part','subset']: 103 | if is_inverted: 104 | relations_to_node[s].append((r+'-of', t)) 105 | else: 106 | relations_to_node[t].append((r, s)) 107 | # coref nodes 108 | special = ['implicit-role', 'coref-entity'] 109 | for n, concept in zip(self.nodes, self.node_values): 110 | if concept in special: 111 | self.coref_nodes.append(n) 112 | _coref_nodes = set() 113 | for n in self.coref_nodes: 114 | _coref_nodes.add(n) 115 | for n in candidates[:]: 116 | if len(relations_to_node[n]) < 2: continue 117 | rels = [rel for rel in relations_to_node[n] if sentences_by_node[rel[1]]!=sentences_by_node[n]] 118 | if len(rels) < 2: continue 119 | parent1 = rels[0][1] 120 | if any(sentences_by_node[parent2]!=sentences_by_node[parent1] for _,parent2 in rels): 121 | _coref_nodes.add(n) 122 | # coref edges 123 | for i,s in enumerate(self.nodes): 124 | for rel in self.relations[i]: 125 | r, t, _ = rel 126 | if t in _coref_nodes: 127 | self.coref_edges.append((r, s, t)) 128 | elif s in _coref_nodes and r in ['coref']: 129 | self.coref_edges.append((r, s, t)) 130 | elif r in ['part','subset'] and sentences_by_node[s]!=sentences_by_node[t]: 131 | self.coref_edges.append((r, s, t)) 132 | # named entities 133 | _named_entities = set() 134 | for i, s in enumerate(self.nodes): 135 | for rel in self.relations[i]: 136 | r, t, inv = rel 137 | if r == 'name': 138 | _named_entities.add(s) 139 | self.named_entities = [n for n in _named_entities] 140 | 141 | def categorize_nodes_by_sentence(self): 142 | 143 | # this method sorts nodes into sentence buckets 144 | # one node can go into multiple sentences' buckets 145 | 146 | # this is usefull for faster smatch for document AMR when sentence alignments are known 147 | # following constraint makes the smatch faster: 148 | # A node 'n' in one AMR 
can map to a node in the other AMR only if one of its sentence buckets aligns with a sentence bucket of the other node. 149 | # this code assumes that :sntx in one AMR is aligned to :snty in the other if x == y 150 | 151 | sen_roots = {} 152 | for rel in self.relations[self.nodes.index(self.root)]: 153 | if rel[0].startswith('snt'): 154 | idx = int(rel[0][3:]) 155 | sen_roots[idx] = rel[1] 156 | 157 | if len(sen_roots) == 0: 158 | self.nodes_by_sentence = [self.nodes[:]] 159 | return 160 | 161 | #find descendents of every node 162 | #so that descendents of each sentence root get populated 163 | descendents = {n: {n} for n in self.nodes} 164 | coref_rels = [] 165 | for (i, rels) in enumerate(self.relations): 166 | x = self.nodes[i] 167 | for rel in rels: 168 | r, y, is_inverted = rel 169 | 170 | if r != 'coref' and (r != 'domain' or not is_inverted): 171 | descendents[x].update(descendents[y]) 172 | for n in descendents: 173 | if x in descendents[n]: 174 | descendents[n].update(descendents[x]) 175 | 176 | if r == 'coref': 177 | coref_rels.append((x,r,y)) 178 | 179 | xx = x 180 | yy = y 181 | 182 | if is_inverted or r in ['part','subset','coref']: 183 | 184 | #if the edge was originally inverted 185 | #add descendents in the reverse direction too 186 | 187 | yy = x 188 | xx = y 189 | 190 | descendents[xx].update(descendents[yy]) 191 | for n in descendents: 192 | if xx in descendents[n]: 193 | descendents[n].update(descendents[xx]) 194 | 195 | for (x,r,y) in coref_rels: 196 | if y not in descendents[self.root]: 197 | if x in descendents[self.root]: 198 | descendents[x].update(descendents[y]) 199 | for n in descendents: 200 | if x in descendents[n]: 201 | descendents[n].update(descendents[x]) 202 | 203 | for node in self.nodes: 204 | if node not in descendents[self.root]: 205 | print(node + " is still not assigned to a sentence!!!") 206 | 207 | self.nodes_by_sentence = [[self.root]] 208 | for i in range(len(sen_roots)): 209 | nodes_in_sentence = list(descendents[sen_roots[i+1]]) 210 | self.nodes_by_sentence.append(nodes_in_sentence) 211 | 212 | def rename_node(self, prefix): 213 | """ 214 | Rename AMR graph nodes to prefix + node_index to avoid nodes with the same name in two different AMRs. 215 | 216 | """ 217 | self.new2old_map = {} 218 | node_map_dict = {} 219 | # map each node to its new name (e.g. "a1") 220 | for i in range(0, len(self.nodes)): 221 | node_map_dict[self.nodes[i]] = prefix + str(i) 222 | self.new2old_map[prefix + str(i)] = self.nodes[i] 223 | # update node name 224 | for i, v in enumerate(self.nodes): 225 | self.nodes[i] = node_map_dict[v] 226 | # update names for list of per-sentence node lists 227 | new_nodes_by_sentence = [] 228 | for i in range(len(self.nodes_by_sentence)): 229 | new_nodes_by_sentence.append([]) 230 | for j in range(len(self.nodes_by_sentence[i])): 231 | if node_map_dict[self.nodes_by_sentence[i][j]] not in new_nodes_by_sentence[i]: 232 | new_nodes_by_sentence[i].append(node_map_dict[self.nodes_by_sentence[i][j]]) 233 | self.nodes_by_sentence = new_nodes_by_sentence 234 | # update node name in relations 235 | for node_relations in self.relations: 236 | for i, l in enumerate(node_relations): 237 | node_relations[i][1] = node_map_dict[l[1]] 238 | self.coref_nodes = [node_map_dict[n] for n in self.coref_nodes] 239 | self.coref_edges = [(r, node_map_dict[s], node_map_dict[t]) for r,s,t in self.coref_edges] 240 | self.named_entities = [node_map_dict[n] for n in self.named_entities] 241 | 242 | def get_triples(self): 243 | """ 244 | Get the triples in three lists. 
245 | instance_triple: a triple representing an instance. E.g. instance(w, want-01) 246 | attribute triple: relation of attributes, e.g. polarity(w, - ) 247 | and relation triple, e.g. arg0 (w, b) 248 | 249 | """ 250 | instance_triple = [] 251 | relation_triple = [] 252 | attribute_triple = [] 253 | for i in range(len(self.nodes)): 254 | instance_triple.append(("instance", self.nodes[i], self.node_values[i])) 255 | # l[0] is relation name 256 | # l[1] is the other node this node has relation with 257 | for l in self.relations[i]: 258 | relation_triple.append((l[0], self.nodes[i], l[1])) 259 | # l[0] is the attribute name 260 | # l[1] is the attribute value 261 | for l in self.attributes[i]: 262 | attribute_triple.append((l[0], self.nodes[i], l[1])) 263 | return instance_triple, attribute_triple, relation_triple 264 | 265 | 266 | def get_triples2(self): 267 | """ 268 | Get the triples in two lists: 269 | instance_triple: a triple representing an instance. E.g. instance(w, want-01) 270 | relation_triple: a triple representing all relations. E.g arg0 (w, b) or E.g. polarity(w, - ) 271 | Note that we do not differentiate between attribute triple and relation triple. Both are considered as relation 272 | triples. 273 | All triples are represented by (triple_type, argument 1 of the triple, argument 2 of the triple) 274 | 275 | """ 276 | instance_triple = [] 277 | relation_triple = [] 278 | for i in range(len(self.nodes)): 279 | # an instance triple is instance(node name, node value). 280 | # For example, instance(b, boy). 281 | instance_triple.append(("instance", self.nodes[i], self.node_values[i])) 282 | # l[0] is relation name 283 | # l[1] is the other node this node has relation with 284 | for l in self.relations[i]: 285 | relation_triple.append((l[0], self.nodes[i], l[1])) 286 | # l[0] is the attribute name 287 | # l[1] is the attribute value 288 | for l in self.attributes[i]: 289 | relation_triple.append((l[0], self.nodes[i], l[1])) 290 | return instance_triple, relation_triple 291 | 292 | 293 | def __str__(self): 294 | """ 295 | Generate AMR string for better readability 296 | 297 | """ 298 | lines = [] 299 | for i in range(len(self.nodes)): 300 | lines.append("Node "+ str(i) + " " + self.nodes[i]) 301 | lines.append("Value: " + self.node_values[i]) 302 | lines.append("Relations:") 303 | for relation in self.relations[i]: 304 | lines.append("Node " + relation[1] + " via " + relation[0]) 305 | for attribute in self.attributes[i]: 306 | lines.append("Attribute: " + attribute[0] + " value " + attribute[1]) 307 | return "\n".join(lines) 308 | 309 | def __repr__(self): 310 | return self.__str__() 311 | 312 | def output_amr(self): 313 | """ 314 | Output AMR string 315 | 316 | """ 317 | print(self.__str__(), file=DEBUG_LOG) 318 | 319 | @staticmethod 320 | def get_amr_line(input_f): 321 | """ 322 | Read the file containing AMRs. AMRs are separated by a blank line. 323 | Each call of get_amr_line() returns the next available AMR (in one-line form). 
324 | Note: this function does not verify if the AMR is valid 325 | 326 | """ 327 | cur_amr = [] 328 | has_content = False 329 | for line in input_f: 330 | line = line.strip() 331 | if line == "": 332 | if not has_content: 333 | # empty lines before current AMR 334 | continue 335 | else: 336 | # end of current AMR 337 | break 338 | if line.strip().startswith("#"): 339 | #if "::id" in line: 340 | # print(line) 341 | # ignore the comment line (starting with "#") in the AMR file 342 | continue 343 | else: 344 | has_content = True 345 | cur_amr.append(line.strip()) 346 | return "".join(cur_amr) 347 | 348 | @staticmethod 349 | def parse_AMR_line(line): 350 | """ 351 | Parse an AMR from its line representation into an AMR object. 352 | This parsing algorithm scans the line once and processes each character, in a shift-reduce style. 353 | 354 | """ 355 | # Current state. It denotes the last significant symbol encountered. 1 for (, 2 for :, 3 for /, 356 | # and 0 for start state or ')' 357 | # Last significant symbol is ( --- start processing node name 358 | # Last significant symbol is : --- start processing relation name 359 | # Last significant symbol is / --- start processing node value (concept name) 360 | # Last significant symbol is ) --- current node processing is complete 361 | # Note that if these symbols are inside parentheses, they are not significant symbols. 362 | 363 | exceptions = set(["prep-on-behalf-of", "prep-out-of", "consist-of"]) 364 | def update_triple(node_relation_dict, u, r, v): 365 | # we detect a relation (r) between u and v, with direction u to v. 366 | # in most cases, if the relation name ends with "-of", e.g. "arg0-of", 367 | # it is the reverse of some relation. For example, if a is "arg0-of" b, 368 | # we can also say b is "arg0" a. 369 | # If the relation name ends with "-of", we store the reverse relation. 
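# for instance (illustrative note, derived from the branches below): update_triple(d, u, "arg0-of", v)
# appends ("arg0", u, True) to d[v]; the True flag (is_inverted) records that the surface
# relation used the "-of" form.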
370 | # but note some exceptions like "prep-on-behalf-of" and "prep-out-of" 371 | # also note relation "mod" is the reverse of "domain" 372 | if r.endswith("-of") and not r in exceptions: 373 | #if u.split(".")[0] != v.split(".")[0]: 374 | # print(u+" -" + r + "-> "+v) 375 | node_relation_dict[v].append((r[:-3], u, True)) 376 | elif r=="mod": 377 | node_relation_dict[v].append(("domain", u, True)) 378 | else: 379 | node_relation_dict[u].append((r, v, False)) 380 | 381 | state = 0 382 | # node stack for parsing 383 | stack = [] 384 | # current not-yet-reduced character sequence 385 | cur_charseq = [] 386 | # key: node name value: node value 387 | node_dict = {} 388 | # node name list (order: occurrence of the node) 389 | node_name_list = [] 390 | # key: node name: value: list of (relation name, the other node name) 391 | node_relation_dict1 = defaultdict(list) 392 | # key: node name, value: list of (attribute name, const value) or (relation name, unseen node name) 393 | node_relation_dict2 = defaultdict(list) 394 | # current relation name 395 | cur_relation_name = "" 396 | # having unmatched quote string 397 | in_quote = False 398 | for i, c in enumerate(line.strip()): 399 | if c == " ": 400 | # allow space in relation name 401 | if state == 2: 402 | cur_charseq.append(c) 403 | continue 404 | if c == "\"": 405 | # flip in_quote value when a quote symbol is encountered 406 | # insert placeholder if in_quote from last symbol 407 | if in_quote: 408 | cur_charseq.append('_') 409 | in_quote = not in_quote 410 | elif c == "(": 411 | # not significant symbol if inside quote 412 | if in_quote: 413 | cur_charseq.append(c) 414 | continue 415 | # get the attribute name 416 | # e.g :arg0 (x ... 417 | # at this point we get "arg0" 418 | if state == 2: 419 | # in this state, current relation name should be empty 420 | if cur_relation_name != "": 421 | print("Format error when processing ", line[0:i + 1], file=ERROR_LOG) 422 | return None 423 | # update current relation name for future use 424 | cur_relation_name = "".join(cur_charseq).strip() 425 | cur_charseq[:] = [] 426 | state = 1 427 | elif c == ":": 428 | # not significant symbol if inside quote 429 | if in_quote: 430 | cur_charseq.append(c) 431 | continue 432 | # Last significant symbol is "/". Now we encounter ":" 433 | # Example: 434 | # :OR (o2 / *OR* 435 | # :mod (o3 / official) 436 | # gets node value "*OR*" at this point 437 | if state == 3: 438 | node_value = "".join(cur_charseq) 439 | # clear current char sequence 440 | cur_charseq[:] = [] 441 | # pop node name ("o2" in the above example) 442 | cur_node_name = stack[-1] 443 | # update node name/value map 444 | node_dict[cur_node_name] = node_value 445 | # Last significant symbol is ":". 
Now we encounter ":" 446 | # Example: 447 | # :op1 w :quant 30 448 | # or :day 14 :month 3 449 | # the problem is that we cannot decide if node value is attribute value (constant) 450 | # or node value (variable) at this moment 451 | elif state == 2: 452 | temp_attr_value = "".join(cur_charseq) 453 | cur_charseq[:] = [] 454 | parts = temp_attr_value.split() 455 | if len(parts) < 2: 456 | import ipdb; ipdb.set_trace() 457 | print("Error in processing; part len < 2", line[0:i + 1], file=ERROR_LOG) 458 | return None 459 | # For the above example, node name is "op1", and node value is "w" 460 | # Note that this node name might not be encountered before 461 | relation_name = parts[0].strip() 462 | relation_value = parts[1].strip() 463 | # We need to link upper level node to the current 464 | # top of stack is upper level node 465 | if len(stack) == 0: 466 | print("Error in processing", line[:i], relation_name, relation_value, file=ERROR_LOG) 467 | return None 468 | # if we have not seen this node name before 469 | if relation_value not in node_dict: 470 | update_triple(node_relation_dict2, stack[-1], relation_name, relation_value) 471 | else: 472 | update_triple(node_relation_dict1, stack[-1], relation_name, relation_value) 473 | state = 2 474 | elif c == "/": 475 | if in_quote: 476 | cur_charseq.append(c) 477 | continue 478 | # Last significant symbol is "(". Now we encounter "/" 479 | # Example: 480 | # (d / default-01 481 | # get "d" here 482 | if state == 1: 483 | node_name = "".join(cur_charseq) 484 | cur_charseq[:] = [] 485 | # if this node name is already in node_dict, it is duplicate 486 | if node_name in node_dict: 487 | print("Duplicate node name ", node_name, " in parsing AMR", file=ERROR_LOG) 488 | return None 489 | # push the node name to stack 490 | stack.append(node_name) 491 | # add it to node name list 492 | node_name_list.append(node_name) 493 | # if this node is part of the relation 494 | # Example: 495 | # :arg1 (n / nation) 496 | # cur_relation_name is arg1 497 | # node name is n 498 | # we have a relation arg1(upper level node, n) 499 | if cur_relation_name != "": 500 | update_triple(node_relation_dict1, stack[-2], cur_relation_name, node_name) 501 | cur_relation_name = "" 502 | else: 503 | # error if in other state 504 | print("Error in parsing AMR", line[0:i + 1], file=ERROR_LOG) 505 | return None 506 | state = 3 507 | elif c == ")": 508 | if in_quote: 509 | cur_charseq.append(c) 510 | continue 511 | # stack should be non-empty to find upper level node 512 | if len(stack) == 0: 513 | print("Unmatched parenthesis at position", i, "in processing", line[0:i + 1], file=ERROR_LOG) 514 | return None 515 | # Last significant symbol is ":". 
Now we encounter ")" 516 | # Example: 517 | # :op2 "Brown") or :op2 w) 518 | # get \"Brown\" or w here 519 | if state == 2: 520 | temp_attr_value = "".join(cur_charseq) 521 | cur_charseq[:] = [] 522 | parts = temp_attr_value.split() 523 | if len(parts) < 2: 524 | print("Error processing", line[:i + 1], temp_attr_value, file=ERROR_LOG) 525 | return None 526 | relation_name = parts[0].strip() 527 | relation_value = parts[1].strip() 528 | # attribute value not seen before 529 | # Note that it might be a constant attribute value, or an unseen node 530 | # process this after we have seen all the node names 531 | if relation_value not in node_dict: 532 | update_triple(node_relation_dict2, stack[-1], relation_name, relation_value) 533 | else: 534 | update_triple(node_relation_dict1, stack[-1], relation_name, relation_value) 535 | # Last significant symbol is "/". Now we encounter ")" 536 | # Example: 537 | # :arg1 (n / nation) 538 | # we get "nation" here 539 | elif state == 3: 540 | node_value = "".join(cur_charseq) 541 | cur_charseq[:] = [] 542 | cur_node_name = stack[-1] 543 | # map node name to its value 544 | node_dict[cur_node_name] = node_value 545 | # pop from stack, as the current node has been processed 546 | stack.pop() 547 | cur_relation_name = "" 548 | state = 0 549 | else: 550 | # not significant symbols, so we just shift. 551 | cur_charseq.append(c) 552 | #create data structures to initialize an AMR 553 | node_value_list = [] 554 | relation_list = [] 555 | attribute_list = [] 556 | for v in node_name_list: 557 | if v not in node_dict: 558 | print("Error: Node name not found", v, file=ERROR_LOG) 559 | return None 560 | else: 561 | node_value_list.append(node_dict[v]) 562 | # build relation list and attribute list for this node 563 | node_rel_list = [] 564 | node_attr_list = [] 565 | if v in node_relation_dict1: 566 | for v1 in node_relation_dict1[v]: 567 | node_rel_list.append([v1[0], v1[1], v1[2]]) 568 | if v in node_relation_dict2: 569 | for v2 in node_relation_dict2[v]: 570 | # if value is in quote, it is a constant value 571 | # strip the quote and put it in attribute map 572 | if v2[1][0] == "\"" and v2[1][-1] == "\"": 573 | node_attr_list.append([[v2[0]], v2[1][1:-1], v2[2]]) 574 | # if value is a node name 575 | elif v2[1] in node_dict: 576 | node_rel_list.append([v2[0], v2[1], v2[2]]) 577 | else: 578 | node_attr_list.append([v2[0], v2[1], v2[2]]) 579 | # each node has a relation list and attribute list 580 | relation_list.append(node_rel_list) 581 | attribute_list.append(node_attr_list) 582 | # add TOP as an attribute. The attribute value just needs to be constant 583 | attribute_list[0].append(["TOP", 'top']) 584 | result_amr = AMR(node_name_list, node_value_list, relation_list, attribute_list) 585 | return result_amr 586 | 587 | # test AMR parsing 588 | # run by amr.py [file containing AMR] 589 | # a unittest can also be used. 
590 | if __name__ == "__main__": 591 | if len(sys.argv) < 2: 592 | print("No file given", file=ERROR_LOG) 593 | exit(1) 594 | amr_count = 1 595 | for line in open(sys.argv[1]): 596 | cur_line = line.strip() 597 | if cur_line == "" or cur_line.startswith("#"): 598 | continue 599 | print("AMR", amr_count, file=DEBUG_LOG) 600 | current = AMR.parse_AMR_line(cur_line) 601 | current.output_amr() 602 | amr_count += 1 603 | -------------------------------------------------------------------------------- /docSmatch/smatch.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # -*- coding: utf-8 -*- 3 | 4 | 5 | """ 6 | This code is taken from https://github.com/snowblink14/smatch 7 | 8 | and changed for DocAMR to: 9 | 1. constrain smatch node mapping based on sentence alignments to make it faster 10 | 2. compute coref subscore for DocAMR 11 | """ 12 | 13 | 14 | """ 15 | This script computes smatch score between two AMRs. 16 | For a detailed description of smatch, see http://www.isi.edu/natural-language/amr/smatch-13.pdf 17 | 18 | """ 19 | 20 | import random 21 | 22 | from . import amr 23 | import sys 24 | import time 25 | from tqdm import tqdm 26 | 27 | # total number of iterations in smatch computation 28 | iteration_num = 5 29 | allowed_mappings = {} 30 | # verbose output switch. 31 | # Default false (no verbose output) 32 | verbose = False 33 | veryVerbose = False 34 | 35 | # single score output switch. 36 | # Default true (compute a single score for all AMRs in two files) 37 | single_score = True 38 | 39 | # precision and recall output switch. 40 | # Default false (do not output precision and recall, just output F score) 41 | pr_flag = False 42 | 43 | # Error log location 44 | ERROR_LOG = sys.stderr 45 | 46 | # Debug log location 47 | DEBUG_LOG = sys.stderr 48 | 49 | # dictionary to save pre-computed node mapping and its resulting triple match count 50 | # key: tuples of node mapping 51 | # value: the matching triple count 52 | match_triple_dict = {} 53 | 54 | 55 | class SMATCH_Alignment(): 56 | 57 | def __init__(self, mapping, pred_amr, gold_amr): 58 | (pred_instance, pred_attributes, pred_relation) = pred_amr.get_triples() 59 | (gold_instance, gold_attributes, gold_relation) = gold_amr.get_triples() 60 | 61 | self.pred_concepts = {} 62 | self.gold_concepts = {} 63 | self.pred_edges = {} 64 | self.gold_edges = {} 65 | self.pred_attr = {} 66 | self.gold_attr = {} 67 | for instance in pred_instance: 68 | r, s, t = instance 69 | self.pred_concepts[s] = t 70 | for instance in gold_instance: 71 | r, s, t = instance 72 | self.gold_concepts[s] = t 73 | for rel in pred_relation: 74 | r, s, t = rel 75 | if not (s,t) in self.pred_edges: 76 | self.pred_edges[(s, t)] = [] 77 | self.pred_edges[(s, t)].append(r) 78 | for rel in gold_relation: 79 | r, s, t = rel 80 | if not (s, t) in self.gold_edges: 81 | self.gold_edges[(s, t)] = [] 82 | self.gold_edges[(s, t)].append(r) 83 | for rel in pred_attributes: 84 | r, s, t = rel 85 | if s not in self.pred_attr: 86 | self.pred_attr[s] = [] 87 | self.pred_attr[s].append((r, normalize(t))) 88 | for rel in gold_attributes: 89 | r, s, t = rel 90 | if s not in self.gold_attr: 91 | self.gold_attr[s] = [] 92 | self.gold_attr[s].append((r, normalize(t))) 93 | 94 | self.node_align = {} 95 | self.node_align_inv = {} 96 | self.edge_align = {} 97 | self.edge_align_inv = {} 98 | self.attr_align = {} 99 | 100 | self._build_node_alignment(mapping, pred_instance, gold_instance) 101 | self._build_edge_alignment() 102 | 
self._build_attr_alignment() 103 | 104 | 105 | def _build_node_alignment(self, mapping, pred_instance, gold_instance): 106 | for instance2 in gold_instance: 107 | r, s, t = instance2 108 | self.node_align_inv[('instance',s,t)] = set() 109 | for instance1, m in zip(pred_instance, mapping): 110 | r,s,t = instance1 111 | if m>-1: 112 | r2,s2,t2 = gold_instance[m] 113 | self.node_align[('instance',s,t)] = ('instance',s2,t2) 114 | self.node_align_inv[('instance',s2,t2)].add(('instance',s,t)) 115 | else: 116 | self.node_align[('instance',s,t)] = None 117 | 118 | 119 | def _build_edge_alignment(self): 120 | for s, t in self.pred_edges: 121 | rs = self.pred_edges[(s, t)] 122 | s2 = self.pred_to_gold(node=s) 123 | t2 = self.pred_to_gold(node=t) 124 | if not s2 or not t2: 125 | for r in rs: 126 | self.edge_align[(r, s, t)] = None 127 | continue 128 | if (s2, t2) in self.gold_edges: 129 | rs2 = self.gold_edges[(s2, t2)] 130 | for r in rs: 131 | if r in rs2: 132 | self.edge_align[(r, s, t)] = (r, s2, t2) 133 | else: 134 | self.edge_align[(r, s, t)] = (rs2[0], s2, t2) 135 | else: 136 | for r in rs: 137 | self.edge_align[(r, s, t)] = None 138 | self.edge_align_inv = {} 139 | for s,t in self.gold_edges: 140 | rs = self.gold_edges[(s, t)] 141 | for r in rs: 142 | self.edge_align_inv[(r,s,t)] = set() 143 | for rel in self.edge_align: 144 | rel2 = self.edge_align[rel] 145 | if rel2: 146 | self.edge_align_inv[rel2].add(rel) 147 | 148 | 149 | def _build_attr_alignment(self): 150 | for n in self.pred_attr: 151 | for r, t in self.pred_attr[n]: 152 | n2 = self.pred_to_gold(node=n) 153 | if not n2: 154 | self.attr_align[(r, n, t)] = None 155 | continue 156 | if n2 in self.gold_attr and (r, t) in self.gold_attr[n2]: 157 | self.attr_align[(r, n, t)] = (r, n2, t) 158 | else: 159 | self.attr_align[(r, n, t)] = None 160 | 161 | def gold_to_pred(self, node): 162 | if node: 163 | x = self.node_align_inv[('instance', node, self.gold_concepts[node])] 164 | return [n for _, n, c in x] 165 | 166 | def pred_to_gold(self, node): 167 | if node: 168 | x = self.node_align[('instance', node, self.pred_concepts[node])] 169 | if not x: 170 | return None 171 | _, node2, concept = x 172 | return node2 173 | 174 | def iterate_errors(self): 175 | for rel in self.node_align: 176 | rel2 = self.node_align[rel] 177 | if not rel2: 178 | yield 'instance', rel, rel2 179 | continue 180 | r,s,t = rel 181 | r2,s2,t2 = rel2 182 | if t!=t2: 183 | yield 'instance', rel, rel2 184 | for rel in self.edge_align: 185 | rel2 = self.edge_align[rel] 186 | if not rel2: 187 | yield 'relation', rel, rel2 188 | continue 189 | r, s, t = rel 190 | r2, s2, t2 = rel2 191 | if r!=r2: 192 | yield 'relation', rel, rel2 193 | for rel in self.attr_align: 194 | rel2 = self.attr_align[rel] 195 | if not rel2: 196 | yield 'attribute', rel, rel2 197 | continue 198 | r, s, t = rel 199 | r2, s2, t2 = rel2 200 | if r != r2 or t!=t2: 201 | yield 'attribute', rel, rel2 202 | 203 | class Scores: 204 | def __init__(self): 205 | self.num = 0 206 | self.gold_total = 0 207 | self.pred_total = 0 208 | 209 | def get(self): 210 | return self.num, self.pred_total, self.gold_total 211 | 212 | def set(self, num=None, pred=None, gold=None): 213 | if num is not None: 214 | self.num = num 215 | if gold is not None: 216 | self.gold_total = gold 217 | if pred is not None: 218 | self.pred_total = pred 219 | 220 | def increment(self, num=None, pred=None, gold=None): 221 | if num is not None: 222 | self.num += num 223 | if gold is not None: 224 | self.gold_total += gold 225 | if pred is not None: 
226 | self.pred_total += pred 227 | 228 | def update(self, scores): 229 | self.num += scores.num 230 | self.gold_total += scores.gold_total 231 | self.pred_total += scores.pred_total 232 | 233 | 234 | 235 | def get_best_match(instance1, attribute1, relation1, 236 | instance2, attribute2, relation2, 237 | prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True, nodes_by_sentence1=None, nodes_by_sentence2=None): 238 | """ 239 | Get the highest triple match number between two sets of triples via hill-climbing. 240 | Arguments: 241 | instance1: instance triples of AMR 1 ("instance", node name, node value) 242 | attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value) 243 | relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name) 244 | instance2: instance triples of AMR 2 ("instance", node name, node value) 245 | attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value) 246 | relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name) 247 | prefix1: prefix label for AMR 1 248 | prefix2: prefix label for AMR 2 249 | Returns: 250 | best_match: the node mapping that results in the highest triple matching number 251 | best_match_num: the highest triple matching number 252 | 253 | """ 254 | # Compute candidate pool - all possible node match candidates. 255 | # In the hill-climbing, we only consider candidate in this pool to save computing time. 256 | # weight_dict is a dictionary that maps a pair of node 257 | (candidate_mappings, weight_dict) = compute_pool(instance1, attribute1, relation1, 258 | instance2, attribute2, relation2, 259 | prefix1, prefix2, doinstance=doinstance, doattribute=doattribute, 260 | dorelation=dorelation, 261 | lol1=nodes_by_sentence1, lol2=nodes_by_sentence2) 262 | if veryVerbose: 263 | print("Candidate mappings:", file=DEBUG_LOG) 264 | print(candidate_mappings, file=DEBUG_LOG) 265 | print("Weight dictionary", file=DEBUG_LOG) 266 | print(weight_dict, file=DEBUG_LOG) 267 | 268 | lol1 = [] 269 | lol2 = [] 270 | for n in range(len(nodes_by_sentence1)): 271 | lol1.append([]) 272 | for node in nodes_by_sentence1[n]: 273 | node_index = int(node[len(prefix1):]) 274 | lol1[-1].append(node_index) 275 | for n in range(len(nodes_by_sentence2)): 276 | lol2.append([]) 277 | for node in nodes_by_sentence2[n]: 278 | node_index = int(node[len(prefix2):]) 279 | lol2[-1].append(node_index) 280 | 281 | best_match_num = 0 282 | # initialize best match mapping 283 | # the ith entry is the node index in AMR 2 which maps to the ith node in AMR 1 284 | best_mapping = [-1] * len(instance1) 285 | for i in range(iteration_num): 286 | if veryVerbose: 287 | print("Iteration", i, file=DEBUG_LOG) 288 | if i == 0: 289 | # smart initialization used for the first round 290 | cur_mapping = smart_init_mapping(candidate_mappings, instance1, instance2) 291 | else: 292 | # random initialization for the other round 293 | cur_mapping = random_init_mapping(candidate_mappings) 294 | # compute current triple match number 295 | match_num = compute_match(cur_mapping, weight_dict) 296 | if veryVerbose: 297 | print("Node mapping at start", cur_mapping, file=DEBUG_LOG) 298 | print("Triple match number at start:", match_num, file=DEBUG_LOG) 299 | while True: 300 | # get best gain 301 | (gain, new_mapping) = get_best_gain(cur_mapping, candidate_mappings, weight_dict, 302 | len(instance2), match_num, lol1=lol1) 303 | if veryVerbose: 304 | print("Gain after the hill-climbing", gain, file=DEBUG_LOG) 305 | # hill-climbing 
until there will be no gain for new node mapping 306 | if gain <= 0: 307 | break 308 | # otherwise update match_num and mapping 309 | match_num += gain 310 | cur_mapping = new_mapping[:] 311 | if veryVerbose: 312 | print("Update triple match number to:", match_num, file=DEBUG_LOG) 313 | print("Current mapping:", cur_mapping, file=DEBUG_LOG) 314 | if match_num > best_match_num: 315 | best_mapping = cur_mapping[:] 316 | best_match_num = match_num 317 | return best_mapping, best_match_num, weight_dict 318 | 319 | 320 | def normalize(item): 321 | """ 322 | lowercase and remove quote signifiers from items that are about to be compared 323 | """ 324 | return item.lower().rstrip('_') 325 | 326 | 327 | def compute_pool(instance1, attribute1, relation1, 328 | instance2, attribute2, relation2, 329 | prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True, lol1=None, lol2=None): 330 | """ 331 | compute all possible node mapping candidates and their weights (the triple matching number gain resulting from 332 | mapping one node in AMR 1 to another node in AMR2) 333 | 334 | Arguments: 335 | instance1: instance triples of AMR 1 336 | attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value) 337 | relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name) 338 | instance2: instance triples of AMR 2 339 | attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value) 340 | relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name 341 | prefix1: prefix label for AMR 1 342 | prefix2: prefix label for AMR 2 343 | Returns: 344 | candidate_mapping: a list of candidate nodes. 345 | The ith element contains the node indices (in AMR 2) the ith node (in AMR 1) can map to. 346 | (resulting in non-zero triple match) 347 | weight_dict: a dictionary which contains the matching triple number for every pair of node mapping. The key 348 | is a node pair. The value is another dictionary. key {-1} is triple match resulting from this node 349 | pair alone (instance triples and attribute triples), and other keys are node pairs that can result 350 | in relation triple match together with the first node pair. 
351 | 352 | 353 | """ 354 | #allowed_mappings = {} 355 | allowed_mappings.clear() 356 | if lol1 is not None and lol2 is not None: 357 | for n in range(len(lol1)): 358 | for node1 in lol1[n]: 359 | node1_index = int(node1[len(prefix1):]) 360 | if node1_index not in allowed_mappings: 361 | allowed_mappings[node1_index] = set() 362 | if len(lol2) <= n : 363 | lol2.append([]) 364 | for node2 in lol2[n]: 365 | node2_index = int(node2[len(prefix2):]) 366 | if node2_index not in allowed_mappings[node1_index]: 367 | allowed_mappings[node1_index].add(node2_index) 368 | #allowed_mappings = None 369 | candidate_mapping = [] 370 | weight_dict = {} 371 | for instance1_item in instance1: 372 | # each candidate mapping is a set of node indices 373 | candidate_mapping.append(set()) 374 | if doinstance: 375 | for instance2_item in instance2: 376 | # if both triples are instance triples and have the same value 377 | if normalize(instance1_item[0]) == normalize(instance2_item[0]) and \ 378 | normalize(instance1_item[2]) == normalize(instance2_item[2]): 379 | # get node index by stripping the prefix 380 | node1_index = int(instance1_item[1][len(prefix1):]) 381 | node2_index = int(instance2_item[1][len(prefix2):]) 382 | if node1_index not in allowed_mappings: 383 | import ipdb; ipdb.set_trace() 384 | if allowed_mappings is None or node2_index in allowed_mappings[node1_index]: 385 | candidate_mapping[node1_index].add(node2_index) 386 | node_pair = (node1_index, node2_index) 387 | # use -1 as key in weight_dict for instance triples and attribute triples 388 | if node_pair in weight_dict: 389 | weight_dict[node_pair][-1] += 1 390 | else: 391 | weight_dict[node_pair] = {} 392 | weight_dict[node_pair][-1] = 1 393 | if doattribute: 394 | for attribute1_item in attribute1: 395 | for attribute2_item in attribute2: 396 | # if both attribute relation triple have the same relation name and value 397 | if normalize(attribute1_item[0]) == normalize(attribute2_item[0]) \ 398 | and normalize(attribute1_item[2]) == normalize(attribute2_item[2]): 399 | node1_index = int(attribute1_item[1][len(prefix1):]) 400 | node2_index = int(attribute2_item[1][len(prefix2):]) 401 | if allowed_mappings is None or node2_index in allowed_mappings[node1_index]: 402 | candidate_mapping[node1_index].add(node2_index) 403 | node_pair = (node1_index, node2_index) 404 | # use -1 as key in weight_dict for instance triples and attribute triples 405 | if node_pair in weight_dict: 406 | weight_dict[node_pair][-1] += 1 407 | else: 408 | weight_dict[node_pair] = {} 409 | weight_dict[node_pair][-1] = 1 410 | if dorelation: 411 | for relation1_item in relation1: 412 | for relation2_item in relation2: 413 | # if both relation share the same name 414 | if normalize(relation1_item[0]) == normalize(relation2_item[0]): 415 | node1_index_amr1 = int(relation1_item[1][len(prefix1):]) 416 | node1_index_amr2 = int(relation2_item[1][len(prefix2):]) 417 | node2_index_amr1 = int(relation1_item[2][len(prefix1):]) 418 | node2_index_amr2 = int(relation2_item[2][len(prefix2):]) 419 | # add mapping between two nodes 420 | if allowed_mappings is not None and (node1_index_amr2 not in allowed_mappings[node1_index_amr1] or node2_index_amr2 not in allowed_mappings[node2_index_amr1]): 421 | continue 422 | node_pair1 = (node1_index_amr1, node1_index_amr2) 423 | node_pair2 = (node2_index_amr1, node2_index_amr2) 424 | candidate_mapping[node1_index_amr1].add(node1_index_amr2) 425 | candidate_mapping[node2_index_amr1].add(node2_index_amr2) 426 | if node_pair2 != node_pair1: 427 | # 
update weight_dict weight. Note that we need to update both entries for future search 428 | # i.e weight_dict[node_pair1][node_pair2] 429 | # weight_dict[node_pair2][node_pair1] 430 | if node1_index_amr1 > node2_index_amr1: 431 | # swap node_pair1 and node_pair2 432 | node_pair1 = (node2_index_amr1, node2_index_amr2) 433 | node_pair2 = (node1_index_amr1, node1_index_amr2) 434 | if node_pair1 in weight_dict: 435 | if node_pair2 in weight_dict[node_pair1]: 436 | weight_dict[node_pair1][node_pair2] += 1 437 | else: 438 | weight_dict[node_pair1][node_pair2] = 1 439 | else: 440 | weight_dict[node_pair1] = {-1: 0, node_pair2: 1} 441 | if node_pair2 in weight_dict: 442 | if node_pair1 in weight_dict[node_pair2]: 443 | weight_dict[node_pair2][node_pair1] += 1 444 | else: 445 | weight_dict[node_pair2][node_pair1] = 1 446 | else: 447 | weight_dict[node_pair2] = {-1: 0, node_pair1: 1} 448 | else: 449 | if node_pair1 in weight_dict: 450 | weight_dict[node_pair1][-1] += 1 451 | else: 452 | weight_dict[node_pair1] = {-1: 1} 453 | 454 | return candidate_mapping, weight_dict 455 | 456 | 457 | def smart_init_mapping(candidate_mapping, instance1, instance2): 458 | """ 459 | Initialize mapping based on the concept mapping (smart initialization) 460 | Arguments: 461 | candidate_mapping: candidate node match list 462 | instance1: instance triples of AMR 1 463 | instance2: instance triples of AMR 2 464 | Returns: 465 | initialized node mapping between two AMRs 466 | 467 | """ 468 | random.seed() 469 | matched_dict = {} 470 | result = [] 471 | # list to store node indices that have no concept match 472 | no_word_match = [] 473 | for i, candidates in enumerate(candidate_mapping): 474 | if not candidates: 475 | # no possible mapping 476 | result.append(-1) 477 | continue 478 | # node value in instance triples of AMR 1 479 | value1 = instance1[i][2] 480 | for node_index in candidates: 481 | value2 = instance2[node_index][2] 482 | # find the first instance triple match in the candidates 483 | # instance triple match is having the same concept value 484 | if value1 == value2: 485 | if node_index not in matched_dict: 486 | result.append(node_index) 487 | matched_dict[node_index] = 1 488 | break 489 | if len(result) == i: 490 | no_word_match.append(i) 491 | result.append(-1) 492 | # if no concept match, generate a random mapping 493 | for i in no_word_match: 494 | candidates = list(candidate_mapping[i]) 495 | while candidates: 496 | # get a random node index from candidates 497 | rid = random.randint(0, len(candidates) - 1) 498 | candidate = candidates[rid] 499 | if candidate in matched_dict: 500 | candidates.pop(rid) 501 | else: 502 | matched_dict[candidate] = 1 503 | result[i] = candidate 504 | break 505 | return result 506 | 507 | 508 | def random_init_mapping(candidate_mapping): 509 | """ 510 | Generate a random node mapping. 
511 | Args: 512 | candidate_mapping: candidate_mapping: candidate node match list 513 | Returns: 514 | randomly-generated node mapping between two AMRs 515 | 516 | """ 517 | # if needed, a fixed seed could be passed here to generate same random (to help debugging) 518 | random.seed() 519 | matched_dict = {} 520 | result = [] 521 | for c in candidate_mapping: 522 | candidates = list(c) 523 | if not candidates: 524 | # -1 indicates no possible mapping 525 | result.append(-1) 526 | continue 527 | found = False 528 | while candidates: 529 | # randomly generate an index in [0, length of candidates) 530 | rid = random.randint(0, len(candidates) - 1) 531 | candidate = candidates[rid] 532 | # check if it has already been matched 533 | if candidate in matched_dict: 534 | candidates.pop(rid) 535 | else: 536 | matched_dict[candidate] = 1 537 | result.append(candidate) 538 | found = True 539 | break 540 | if not found: 541 | result.append(-1) 542 | return result 543 | 544 | 545 | def compute_match(mapping, weight_dict): 546 | """ 547 | Given a node mapping, compute match number based on weight_dict. 548 | Args: 549 | mappings: a list of node index in AMR 2. The ith element (value j) means node i in AMR 1 maps to node j in AMR 2. 550 | Returns: 551 | matching triple number 552 | Complexity: O(m*n) , m is the node number of AMR 1, n is the node number of AMR 2 553 | 554 | """ 555 | # If this mapping has been investigated before, retrieve the value instead of re-computing. 556 | if veryVerbose: 557 | print("Computing match for mapping", file=DEBUG_LOG) 558 | print(mapping, file=DEBUG_LOG) 559 | if tuple(mapping) in match_triple_dict: 560 | if veryVerbose: 561 | print("saved value", match_triple_dict[tuple(mapping)], file=DEBUG_LOG) 562 | return match_triple_dict[tuple(mapping)] 563 | match_num = 0 564 | # i is node index in AMR 1, m is node index in AMR 2 565 | for i, m in enumerate(mapping): 566 | if m == -1: 567 | # no node maps to this node 568 | continue 569 | # node i in AMR 1 maps to node m in AMR 2 570 | current_node_pair = (i, m) 571 | if current_node_pair not in weight_dict: 572 | continue 573 | if veryVerbose: 574 | print("node_pair", current_node_pair, file=DEBUG_LOG) 575 | for key in weight_dict[current_node_pair]: 576 | if key == -1: 577 | # matching triple resulting from instance/attribute triples 578 | match_num += weight_dict[current_node_pair][key] 579 | if veryVerbose: 580 | print("instance/attribute match", weight_dict[current_node_pair][key], file=DEBUG_LOG) 581 | # only consider node index larger than i to avoid duplicates 582 | # as we store both weight_dict[node_pair1][node_pair2] and 583 | # weight_dict[node_pair2][node_pair1] for a relation 584 | elif key[0] < i: 585 | continue 586 | elif mapping[key[0]] == key[1]: 587 | match_num += weight_dict[current_node_pair][key] 588 | if veryVerbose: 589 | print("relation match with", key, weight_dict[current_node_pair][key], file=DEBUG_LOG) 590 | if veryVerbose: 591 | print("match computing complete, result:", match_num, file=DEBUG_LOG) 592 | # update match_triple_dict 593 | match_triple_dict[tuple(mapping)] = match_num 594 | return match_num 595 | 596 | 597 | def move_gain(mapping, node_id, old_id, new_id, weight_dict, match_num): 598 | """ 599 | Compute the triple match number gain from the move operation 600 | Arguments: 601 | mapping: current node mapping 602 | node_id: remapped node in AMR 1 603 | old_id: original node id in AMR 2 to which node_id is mapped 604 | new_id: new node in to which node_id is mapped 605 | weight_dict: 
weight dictionary 606 | match_num: the original triple matching number 607 | Returns: 608 | the triple match gain number (might be negative) 609 | 610 | """ 611 | # new node mapping after moving 612 | new_mapping = (node_id, new_id) 613 | # node mapping before moving 614 | old_mapping = (node_id, old_id) 615 | # new nodes mapping list (all node pairs) 616 | new_mapping_list = mapping[:] 617 | new_mapping_list[node_id] = new_id 618 | # if this mapping is already been investigated, use saved one to avoid duplicate computing 619 | if tuple(new_mapping_list) in match_triple_dict: 620 | return match_triple_dict[tuple(new_mapping_list)] - match_num 621 | gain = 0 622 | # add the triple match incurred by new_mapping to gain 623 | if new_mapping in weight_dict: 624 | for key in weight_dict[new_mapping]: 625 | if key == -1: 626 | # instance/attribute triple match 627 | gain += weight_dict[new_mapping][-1] 628 | elif new_mapping_list[key[0]] == key[1]: 629 | # relation gain incurred by new_mapping and another node pair in new_mapping_list 630 | gain += weight_dict[new_mapping][key] 631 | # deduct the triple match incurred by old_mapping from gain 632 | if old_mapping in weight_dict: 633 | for k in weight_dict[old_mapping]: 634 | if k == -1: 635 | gain -= weight_dict[old_mapping][-1] 636 | elif mapping[k[0]] == k[1]: 637 | gain -= weight_dict[old_mapping][k] 638 | # update match number dictionary 639 | match_triple_dict[tuple(new_mapping_list)] = match_num + gain 640 | return gain 641 | 642 | 643 | def swap_gain(mapping, node_id1, mapping_id1, node_id2, mapping_id2, weight_dict, match_num): 644 | """ 645 | Compute the triple match number gain from the swapping 646 | Arguments: 647 | mapping: current node mapping list 648 | node_id1: node 1 index in AMR 1 649 | mapping_id1: the node index in AMR 2 node 1 maps to (in the current mapping) 650 | node_id2: node 2 index in AMR 1 651 | mapping_id2: the node index in AMR 2 node 2 maps to (in the current mapping) 652 | weight_dict: weight dictionary 653 | match_num: the original matching triple number 654 | Returns: 655 | the gain number (might be negative) 656 | 657 | """ 658 | new_mapping_list = mapping[:] 659 | # Before swapping, node_id1 maps to mapping_id1, and node_id2 maps to mapping_id2 660 | # After swapping, node_id1 maps to mapping_id2 and node_id2 maps to mapping_id1 661 | new_mapping_list[node_id1] = mapping_id2 662 | new_mapping_list[node_id2] = mapping_id1 663 | if tuple(new_mapping_list) in match_triple_dict: 664 | return match_triple_dict[tuple(new_mapping_list)] - match_num 665 | gain = 0 666 | new_mapping1 = (node_id1, mapping_id2) 667 | new_mapping2 = (node_id2, mapping_id1) 668 | old_mapping1 = (node_id1, mapping_id1) 669 | old_mapping2 = (node_id2, mapping_id2) 670 | if node_id1 > node_id2: 671 | new_mapping2 = (node_id1, mapping_id2) 672 | new_mapping1 = (node_id2, mapping_id1) 673 | old_mapping1 = (node_id2, mapping_id2) 674 | old_mapping2 = (node_id1, mapping_id1) 675 | if new_mapping1 in weight_dict: 676 | for key in weight_dict[new_mapping1]: 677 | if key == -1: 678 | gain += weight_dict[new_mapping1][-1] 679 | elif new_mapping_list[key[0]] == key[1]: 680 | gain += weight_dict[new_mapping1][key] 681 | if new_mapping2 in weight_dict: 682 | for key in weight_dict[new_mapping2]: 683 | if key == -1: 684 | gain += weight_dict[new_mapping2][-1] 685 | # to avoid duplicate 686 | elif key[0] == node_id1: 687 | continue 688 | elif new_mapping_list[key[0]] == key[1]: 689 | gain += weight_dict[new_mapping2][key] 690 | if old_mapping1 in 
weight_dict: 691 | for key in weight_dict[old_mapping1]: 692 | if key == -1: 693 | gain -= weight_dict[old_mapping1][-1] 694 | elif mapping[key[0]] == key[1]: 695 | gain -= weight_dict[old_mapping1][key] 696 | if old_mapping2 in weight_dict: 697 | for key in weight_dict[old_mapping2]: 698 | if key == -1: 699 | gain -= weight_dict[old_mapping2][-1] 700 | # to avoid duplicate 701 | elif key[0] == node_id1: 702 | continue 703 | elif mapping[key[0]] == key[1]: 704 | gain -= weight_dict[old_mapping2][key] 705 | match_triple_dict[tuple(new_mapping_list)] = match_num + gain 706 | return gain 707 | 708 | 709 | def get_best_gain(mapping, candidate_mappings, weight_dict, instance_len, cur_match_num, lol1=None): 710 | """ 711 | Hill-climbing method to return the best gain swap/move can get 712 | Arguments: 713 | mapping: current node mapping 714 | candidate_mappings: the candidates mapping list 715 | weight_dict: the weight dictionary 716 | instance_len: the number of the nodes in AMR 2 717 | cur_match_num: current triple match number 718 | Returns: 719 | the best gain we can get via swap/move operation 720 | 721 | """ 722 | largest_gain = 0 723 | # True: using swap; False: using move 724 | use_swap = True 725 | # the node to be moved/swapped 726 | node1 = None 727 | # store the other node affected. In swap, this other node is the node swapping with node1. In move, this other 728 | # node is the node node1 will move to. 729 | node2 = None 730 | # unmatched nodes in AMR 2 731 | unmatched = set(range(instance_len)) 732 | # exclude nodes in current mapping 733 | # get unmatched nodes 734 | for nid in mapping: 735 | if nid in unmatched: 736 | unmatched.remove(nid) 737 | for i, nid in enumerate(mapping): 738 | # current node i in AMR 1 maps to node nid in AMR 2 739 | for nm in unmatched: 740 | if nm in candidate_mappings[i]: 741 | # remap i to another unmatched node (move) 742 | # (i, m) -> (i, nm) 743 | if veryVerbose: 744 | print("Remap node", i, "from ", nid, "to", nm, file=DEBUG_LOG) 745 | mv_gain = move_gain(mapping, i, nid, nm, weight_dict, cur_match_num) 746 | if veryVerbose: 747 | print("Move gain:", mv_gain, file=DEBUG_LOG) 748 | new_mapping = mapping[:] 749 | new_mapping[i] = nm 750 | new_match_num = compute_match(new_mapping, weight_dict) 751 | if new_match_num != cur_match_num + mv_gain: 752 | print(mapping, new_mapping, file=ERROR_LOG) 753 | print("Inconsistency in computing: move gain", cur_match_num, mv_gain, new_match_num, 754 | file=ERROR_LOG) 755 | if mv_gain > largest_gain: 756 | largest_gain = mv_gain 757 | node1 = i 758 | node2 = nm 759 | use_swap = False 760 | 761 | # compute swap gain 762 | 763 | if True: 764 | for i, m in enumerate(mapping): 765 | for j in range(i + 1, len(mapping)): 766 | m2 = mapping[j] 767 | if (m2 not in candidate_mappings[i]) and (m not in candidate_mappings[j]): 768 | continue 769 | # swap operation (i, m) (j, m2) -> (i, m2) (j, m) 770 | # j starts from i+1, to avoid duplicate swap 771 | if veryVerbose: 772 | print("Swap node", i, "and", j, file=DEBUG_LOG) 773 | print("Before swapping:", i, "-", m, ",", j, "-", m2, file=DEBUG_LOG) 774 | print(mapping, file=DEBUG_LOG) 775 | print("After swapping:", i, "-", m2, ",", j, "-", m, file=DEBUG_LOG) 776 | sw_gain = swap_gain(mapping, i, m, j, m2, weight_dict, cur_match_num) 777 | if veryVerbose: 778 | print("Swap gain:", sw_gain, file=DEBUG_LOG) 779 | new_mapping = mapping[:] 780 | new_mapping[i] = m2 781 | new_mapping[j] = m 782 | print(new_mapping, file=DEBUG_LOG) 783 | new_match_num = compute_match(new_mapping, 
weight_dict) 784 | if new_match_num != cur_match_num + sw_gain: 785 | print(mapping, new_mapping, file=ERROR_LOG) 786 | print("Inconsistency in computing: swap gain", cur_match_num, sw_gain, new_match_num, 787 | file=ERROR_LOG) 788 | if sw_gain > largest_gain: 789 | largest_gain = sw_gain 790 | node1 = i 791 | node2 = j 792 | use_swap = True 793 | 794 | # generate a new mapping based on swap/move 795 | cur_mapping = mapping[:] 796 | if node1 is not None: 797 | if use_swap: 798 | if veryVerbose: 799 | print("Use swap gain", file=DEBUG_LOG) 800 | temp = cur_mapping[node1] 801 | cur_mapping[node1] = cur_mapping[node2] 802 | cur_mapping[node2] = temp 803 | else: 804 | if veryVerbose: 805 | print("Use move gain", file=DEBUG_LOG) 806 | cur_mapping[node1] = node2 807 | else: 808 | if veryVerbose: 809 | print("no move/swap gain found", file=DEBUG_LOG) 810 | if veryVerbose: 811 | print("Original mapping", mapping, file=DEBUG_LOG) 812 | print("Current mapping", cur_mapping, file=DEBUG_LOG) 813 | 814 | return largest_gain, cur_mapping 815 | 816 | 817 | def print_alignment(mapping, instance1, instance2, new2old1=None, new2old2=None): 818 | """ 819 | print the alignment based on a node mapping 820 | Args: 821 | mapping: current node mapping list 822 | instance1: nodes of AMR 1 823 | instance2: nodes of AMR 2 824 | 825 | """ 826 | result = [] 827 | for instance1_item, m in zip(instance1, mapping): 828 | if new2old1 is None: 829 | r = instance1_item[1] + "(" + instance1_item[2] + ")" 830 | else: 831 | r = new2old1[instance1_item[1]] + "(" + instance1_item[2] + ")" 832 | if m == -1: 833 | r += "-Null" 834 | else: 835 | instance2_item = instance2[m] 836 | if new2old2 is None: 837 | r += "-" + instance2_item[1] + "(" + instance2_item[2] + ")" 838 | else: 839 | r += "-" + new2old2[instance2_item[1]] + "(" + instance2_item[2] + ")" 840 | result.append(r) 841 | return " ".join(result) 842 | 843 | 844 | def compute_f(match_num, test_num, gold_num): 845 | """ 846 | Compute the f-score based on the matching triple number, 847 | triple number of AMR set 1, 848 | triple number of AMR set 2 849 | Args: 850 | match_num: matching triple number 851 | test_num: triple number of AMR 1 (test file) 852 | gold_num: triple number of AMR 2 (gold file) 853 | Returns: 854 | precision: match_num/test_num 855 | recall: match_num/gold_num 856 | f_score: 2*precision*recall/(precision+recall) 857 | """ 858 | if test_num == 0 or gold_num == 0: 859 | return 0.00, 0.00, 0.00 860 | precision = float(match_num) / float(test_num) 861 | recall = float(match_num) / float(gold_num) 862 | if (precision + recall) != 0: 863 | f_score = 2 * precision * recall / (precision + recall) 864 | if veryVerbose: 865 | print("F-score:", f_score, file=DEBUG_LOG) 866 | return precision, recall, f_score 867 | else: 868 | if veryVerbose: 869 | print("F-score:", "0.0", file=DEBUG_LOG) 870 | return precision, recall, 0.00 871 | 872 | 873 | def generate_amr_lines(f1, f2): 874 | """ 875 | Read one AMR line at a time from each file handle 876 | :param f1: file handle (or any iterable of strings) to read AMR 1 lines from 877 | :param f2: file handle (or any iterable of strings) to read AMR 2 lines from 878 | :return: generator of cur_amr1, cur_amr2 pairs: one-line AMR strings 879 | """ 880 | while True: 881 | cur_amr1 = amr.AMR.get_amr_line(f1) 882 | cur_amr2 = amr.AMR.get_amr_line(f2) 883 | if not cur_amr1 and not cur_amr2: 884 | pass 885 | elif not cur_amr1: 886 | print("Error: File 1 has less AMRs than file 2", file=ERROR_LOG) 887 | print("Ignoring remaining 
AMRs", file=ERROR_LOG) 888 | elif not cur_amr2: 889 | print("Error: File 2 has less AMRs than file 1", file=ERROR_LOG) 890 | print("Ignoring remaining AMRs", file=ERROR_LOG) 891 | else: 892 | yield cur_amr1, cur_amr2 893 | continue 894 | break 895 | 896 | 897 | def get_amr_match(cur_amr1, cur_amr2, sent_num=1, justinstance=False, justattribute=False, justrelation=False, coref=False): 898 | amr_pair = [] 899 | for i, cur_amr in (1, cur_amr1), (2, cur_amr2): 900 | try: 901 | amr_pair.append(amr.AMR.parse_AMR_line(cur_amr)) 902 | except Exception as e: 903 | print("Error in parsing amr %d: %s" % (i, cur_amr), file=ERROR_LOG) 904 | print("Please check if the AMR is ill-formatted. Ignoring remaining AMRs", file=ERROR_LOG) 905 | print("Error message: %s" % e, file=ERROR_LOG) 906 | subscores = {} 907 | if len(amr_pair) != 2: 908 | return (0,0,0), subscores 909 | amr1, amr2 = amr_pair 910 | 911 | if False: 912 | 913 | #code to check if all correct mapping are still allowed 914 | #with sentence alignment contraints for faster doc smatch 915 | #----- 916 | #tested this on gold docAMR and corresponding no coref version 917 | 918 | sentences_by_node_1 = {n:set() for n in amr1.nodes} 919 | for i,nodes_in_sentence in enumerate(amr1.nodes_by_sentence): 920 | for n in nodes_in_sentence: 921 | sentences_by_node_1[n].add(i) 922 | sentences_by_node_2 = {n:set() for n in amr2.nodes} 923 | for i,nodes_in_sentence in enumerate(amr2.nodes_by_sentence): 924 | for n in nodes_in_sentence: 925 | sentences_by_node_2[n].add(i) 926 | 927 | for n1 in sentences_by_node_1: 928 | if n1 in sentences_by_node_2: 929 | if len( sentences_by_node_1[n1].intersection(sentences_by_node_2[n1]) ) == 0: 930 | if True: 931 | print(n1) 932 | print(sentences_by_node_1[n1]) 933 | print(sentences_by_node_2[n1]) 934 | print("====") 935 | 936 | return (0,0,0), subscores 937 | 938 | if amr1 is None or amr2 is None: 939 | return (0,0,0), subscores 940 | prefix1 = "a" 941 | prefix2 = "b" 942 | # Rename node to "a1", "a2", .etc 943 | amr1.rename_node(prefix1) 944 | # Renaming node to "b1", "b2", .etc 945 | amr2.rename_node(prefix2) 946 | (instance1, attributes1, relation1) = amr1.get_triples() 947 | (instance2, attributes2, relation2) = amr2.get_triples() 948 | if verbose: 949 | print("AMR pair", sent_num, file=DEBUG_LOG) 950 | print("============================================", file=DEBUG_LOG) 951 | print("AMR 1 (one-line):", cur_amr1, file=DEBUG_LOG) 952 | print("AMR 2 (one-line):", cur_amr2, file=DEBUG_LOG) 953 | print("Instance triples of AMR 1:", len(instance1), file=DEBUG_LOG) 954 | print(instance1, file=DEBUG_LOG) 955 | print("Attribute triples of AMR 1:", len(attributes1), file=DEBUG_LOG) 956 | print(attributes1, file=DEBUG_LOG) 957 | print("Relation triples of AMR 1:", len(relation1), file=DEBUG_LOG) 958 | print(relation1, file=DEBUG_LOG) 959 | print("Instance triples of AMR 2:", len(instance2), file=DEBUG_LOG) 960 | print(instance2, file=DEBUG_LOG) 961 | print("Attribute triples of AMR 2:", len(attributes2), file=DEBUG_LOG) 962 | print(attributes2, file=DEBUG_LOG) 963 | print("Relation triples of AMR 2:", len(relation2), file=DEBUG_LOG) 964 | print(relation2, file=DEBUG_LOG) 965 | # optionally turn off some of the node comparison 966 | doinstance = doattribute = dorelation = True 967 | if justinstance: 968 | doattribute = dorelation = False 969 | if justattribute: 970 | doinstance = dorelation = False 971 | if justrelation: 972 | doinstance = doattribute = False 973 | (best_mapping, best_match_num, weight_dict) = 
get_best_match(instance1, attributes1, relation1, 974 | instance2, attributes2, relation2, 975 | prefix1, prefix2, doinstance=doinstance, 976 | doattribute=doattribute, dorelation=dorelation, 977 | nodes_by_sentence1=amr1.nodes_by_sentence, nodes_by_sentence2=amr2.nodes_by_sentence) 978 | if verbose: 979 | print("best match number", best_match_num, file=DEBUG_LOG) 980 | print("best node mapping", best_mapping, file=DEBUG_LOG) 981 | print("Best node mapping alignment:", print_alignment(best_mapping, instance1, instance2, new2old1=amr1.new2old_map, new2old2=amr2.new2old_map), file=DEBUG_LOG) 982 | if justinstance: 983 | test_triple_num = len(instance1) 984 | gold_triple_num = len(instance2) 985 | elif justattribute: 986 | test_triple_num = len(attributes1) 987 | gold_triple_num = len(attributes2) 988 | elif justrelation: 989 | test_triple_num = len(relation1) 990 | gold_triple_num = len(relation2) 991 | else: 992 | test_triple_num = len(instance1) + len(attributes1) + len(relation1) 993 | gold_triple_num = len(instance2) + len(attributes2) + len(relation2) 994 | 995 | total_nums = (best_match_num, test_triple_num, gold_triple_num) 996 | if coref: 997 | amr1.find_coref() 998 | amr2.find_coref() 999 | alignment = SMATCH_Alignment(best_mapping, pred_amr=amr1, gold_amr=amr2) 1000 | #for type,rel,rel2 in alignment.iterate_errors(): 1001 | # print(type,rel,rel2) 1002 | # print() 1003 | # named entities 1004 | # bridging relations 1005 | # other 1006 | ne_scores = Scores() 1007 | bridging_scores = Scores() 1008 | other_scores = Scores() 1009 | for n in amr2.coref_nodes: 1010 | if n in amr2.named_entities: 1011 | ne_scores.increment(gold=1) 1012 | else: 1013 | other_scores.increment(gold=1) 1014 | ns = alignment.gold_to_pred(node=n) 1015 | for n2 in ns: 1016 | if n2 not in amr1.coref_nodes: 1017 | continue 1018 | #if n in amr2.named_entities: 1019 | # ne_scores.increment(pred=1) 1020 | #else: 1021 | # other_scores.increment(pred=1) 1022 | if alignment.gold_concepts[n]==alignment.pred_concepts[n2]: 1023 | if n in amr2.named_entities: 1024 | ne_scores.increment(num=1) 1025 | else: 1026 | other_scores.increment(num=1) 1027 | for n in amr1.coref_nodes: 1028 | if n in amr1.named_entities: 1029 | ne_scores.increment(pred=1) 1030 | else: 1031 | other_scores.increment(pred=1) 1032 | 1033 | #ne_scores = Scores() 1034 | #bridging_scores = Scores() 1035 | #other_scores = Scores() 1036 | for r,s,t in amr2.coref_edges: 1037 | if r in ['part','subset']: 1038 | bridging_scores.increment(gold=1) 1039 | else: 1040 | other_scores.increment(gold=1) 1041 | rels = alignment.edge_align_inv[(r,s,t)] 1042 | for rel in rels: 1043 | if rel not in amr1.coref_edges: 1044 | continue 1045 | #if r in ['part', 'subset']: 1046 | # bridging_scores.increment(pred=1) 1047 | #else: 1048 | # other_scores.increment(pred=1) 1049 | r2, s2, t2 = rel 1050 | if r == r2: 1051 | if r in ['part', 'subset']: 1052 | bridging_scores.increment(num=1) 1053 | else: 1054 | other_scores.increment(num=1) 1055 | for r,s,t in amr1.coref_edges: 1056 | if r in ['part','subset']: 1057 | bridging_scores.increment(pred=1) 1058 | else: 1059 | other_scores.increment(pred=1) 1060 | 1061 | subscores['Named Entity Coref'] = ne_scores 1062 | subscores['Bridging Relations'] = bridging_scores 1063 | subscores['Other Coref'] = other_scores 1064 | coref_scores = Scores() 1065 | for scores in [ne_scores, bridging_scores, other_scores]: 1066 | coref_scores.update(scores) 1067 | subscores['Total Coref'] = coref_scores 1068 | noncoref_scores = Scores() 1069 | 
noncoref_scores.set(num=best_match_num-coref_scores.num, 1070 | gold=gold_triple_num-coref_scores.gold_total, 1071 | pred=test_triple_num-coref_scores.pred_total) 1072 | subscores['Non-Coref'] = noncoref_scores 1073 | return total_nums, subscores 1074 | 1075 | # long_sents = []#2, 19, 35, 40, 41] 1076 | 1077 | def score_amr_pairs(f1, f2, justinstance=False, justattribute=False, justrelation=False, coref=False): 1078 | """ 1079 | Score one pair of AMR lines at a time from each file handle 1080 | :param f1: file handle (or any iterable of strings) to read AMR 1 lines from 1081 | :param f2: file handle (or any iterable of strings) to read AMR 2 lines from 1082 | :param justinstance: just pay attention to matching instances 1083 | :param justattribute: just pay attention to matching attributes 1084 | :param justrelation: just pay attention to matching relations 1085 | :return: generator of cur_amr1, cur_amr2 pairs: one-line AMR strings 1086 | """ 1087 | # matching triple number, triple number in test file, triple number in gold file 1088 | total_scores = Scores() 1089 | subscores = {} 1090 | # Read amr pairs from two files 1091 | for sent_num, (cur_amr1, cur_amr2) in tqdm(enumerate(generate_amr_lines(f1, f2), start=1), desc='Smatch'): 1092 | #for sent_num, (cur_amr1, cur_amr2) in enumerate(generate_amr_lines(f1, f2), start=1): 1093 | nums, ss = get_amr_match(cur_amr1, cur_amr2, 1094 | sent_num=sent_num, # sentence number 1095 | justinstance=justinstance, 1096 | justattribute=justattribute, 1097 | justrelation=justrelation, 1098 | coref=coref) 1099 | best_match_num, test_triple_num, gold_triple_num = nums 1100 | total_scores.increment(num=best_match_num, 1101 | pred=test_triple_num, 1102 | gold=gold_triple_num) 1103 | for label in ss: 1104 | if label in subscores: 1105 | subscores[label].update(ss[label]) 1106 | else: 1107 | subscores[label] = ss[label] 1108 | # clear the matching triple dictionary for the next AMR pair 1109 | match_triple_dict.clear() 1110 | if not single_score: # if each AMR pair should have a score, compute and output it here 1111 | yield compute_f(best_match_num, test_triple_num, gold_triple_num) 1112 | todo = [] 1113 | todo.append(('Overall Score:', total_scores)) 1114 | if coref: 1115 | todo.append(('Coref Score', subscores['Total Coref'])) 1116 | #for label in subscores: 1117 | # todo.append((label, subscores[label])) 1118 | 1119 | for label, scores in todo: 1120 | print(label) 1121 | total_match_num, total_test_num, total_gold_num = scores.get() 1122 | if verbose: 1123 | print("Total match number, total triple number in AMR 1, and total triple number in AMR 2:", file=DEBUG_LOG) 1124 | print(total_match_num, total_test_num, total_gold_num, file=DEBUG_LOG) 1125 | print("---------------------------------------------------------------------------------", file=DEBUG_LOG) 1126 | if single_score: # output document-level smatch score (a single f-score for all AMR pairs in two files) 1127 | yield compute_f(total_match_num, total_test_num, total_gold_num) 1128 | 1129 | 1130 | def main(): 1131 | """ 1132 | Main function of smatch score calculation 1133 | """ 1134 | global verbose 1135 | global veryVerbose 1136 | global iteration_num 1137 | global single_score 1138 | global pr_flag 1139 | global match_triple_dict 1140 | 1141 | import argparse 1142 | 1143 | parser = argparse.ArgumentParser(description="Smatch calculator") 1144 | parser.add_argument( 1145 | '-f', 1146 | nargs=2, 1147 | required=True, 1148 | type=argparse.FileType('r'), 1149 | help=('Two files containing AMR pairs. 
' 1150 | 'AMRs in each file are separated by a single blank line')) 1151 | parser.add_argument( 1152 | '-r', 1153 | type=int, 1154 | default=4, 1155 | help='Restart number (Default:4)') 1156 | parser.add_argument( 1157 | '--significant', 1158 | type=int, 1159 | default=2, 1160 | help='significant digits to output (default: 2)') 1161 | parser.add_argument( 1162 | '-v', 1163 | action='store_true', 1164 | help='Verbose output (Default:false)') 1165 | parser.add_argument( 1166 | '--vv', 1167 | action='store_true', 1168 | help='Very Verbose output (Default:false)') 1169 | parser.add_argument( 1170 | '--ms', 1171 | action='store_true', 1172 | default=False, 1173 | help=('Output multiple scores (one AMR pair a score) ' 1174 | 'instead of a single document-level smatch score ' 1175 | '(Default: false)')) 1176 | parser.add_argument( 1177 | '--pr', 1178 | action='store_true', 1179 | default=False, 1180 | help=('Output precision and recall as well as the f-score. ' 1181 | 'Default: false')) 1182 | parser.add_argument( 1183 | '--justinstance', 1184 | action='store_true', 1185 | default=False, 1186 | help="just pay attention to matching instances") 1187 | parser.add_argument( 1188 | '--justattribute', 1189 | action='store_true', 1190 | default=False, 1191 | help="just pay attention to matching attributes") 1192 | parser.add_argument( 1193 | '--justrelation', 1194 | action='store_true', 1195 | default=False, 1196 | help="just pay attention to matching relations") 1197 | parser.add_argument( 1198 | '--coref-subscore', 1199 | action='store_true', 1200 | default=False, 1201 | help="include subscores for coreference") 1202 | 1203 | arguments = parser.parse_args() 1204 | 1205 | # set the iteration number 1206 | # total iteration number = restart number + 1 1207 | iteration_num = arguments.r + 1 1208 | if arguments.ms: 1209 | single_score = False 1210 | if arguments.v: 1211 | verbose = True 1212 | if arguments.vv: 1213 | veryVerbose = True 1214 | if arguments.pr: 1215 | pr_flag = True 1216 | # significant digits to print out 1217 | floatdisplay = "%%.%df" % arguments.significant 1218 | 1219 | start_time = time.time() 1220 | for (precision, recall, best_f_score) in score_amr_pairs(arguments.f[0], arguments.f[1], 1221 | justinstance=arguments.justinstance, 1222 | justattribute=arguments.justattribute, 1223 | justrelation=arguments.justrelation, 1224 | coref=arguments.coref_subscore): 1225 | # print("Sentence", sent_num) 1226 | if pr_flag: 1227 | print("Precision: " + floatdisplay % precision) 1228 | print("Recall: " + floatdisplay % recall) 1229 | print("F-score: " + floatdisplay % best_f_score) 1230 | end_time = time.time() 1231 | elapsed_time = end_time - start_time 1232 | print("Time(s): "+str(floatdisplay % elapsed_time)) 1233 | 1234 | arguments.f[0].close() 1235 | arguments.f[1].close() 1236 | 1237 | 1238 | if __name__ == "__main__": 1239 | main() 1240 | -------------------------------------------------------------------------------- /doc_amr.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | import os 3 | import re 4 | import copy 5 | from tqdm import tqdm 6 | 7 | from amr_io import ( 8 | AMR, 9 | read_amr, 10 | process_corefs 11 | ) 12 | from ipdb import set_trace 13 | 14 | def make_doc_amrs(corefs, amrs, coref=True,chains=True): 15 | doc_amrs = {} 16 | 17 | desc = "making doc-level AMRs" 18 | if not coref: 19 | desc += " (without corefs)" 20 | for doc_id in tqdm(corefs, desc=desc): 21 | (doc_corefs,doc_sids,fname) = corefs[doc_id] 22 | if 
doc_sids[0] not in amrs: 23 | import ipdb; ipdb.set_trace() # debug hook: first sentence AMR of the document is missing 24 | doc_amr = copy.deepcopy(amrs[doc_sids[0]]) 25 | for sid in doc_sids[1:]: 26 | if sid not in amrs: 27 | import ipdb; ipdb.set_trace() # debug hook: sentence AMR missing 28 | if amrs[sid].root is None: 29 | continue 30 | doc_amr = doc_amr + amrs[sid] 31 | doc_amr.amr_id = doc_id 32 | doc_amr.doc_file = fname 33 | if coref: 34 | if chains: 35 | doc_amr.add_corefs(doc_corefs) 36 | else: 37 | doc_amr.add_edges(doc_corefs) 38 | 39 | # setting penman to None to avoid copying the sentence AMR's penman 40 | doc_amr.penman = None 41 | 42 | doc_amrs[doc_id] = doc_amr 43 | 44 | return doc_amrs 45 | 46 | def connect_sen_amrs(amr): 47 | 48 | if len(amr.roots) <= 1: 49 | return 50 | 51 | node_id = amr.add_node("document") 52 | amr.root = str(node_id) 53 | for (i,root) in enumerate(amr.roots): 54 | amr.edges.append((amr.root, ":snt"+str(i+1), root)) 55 | 56 | def argument_parser(): 57 | 58 | parser = argparse.ArgumentParser(description='Read AMRs and Corefs and put them together', 59 | formatter_class=argparse.RawTextHelpFormatter) 60 | parser.add_argument( 61 | "--amr3-path", 62 | help="path to AMR3 annotations", 63 | type=str 64 | ) 65 | parser.add_argument( 66 | "--coref-fof", 67 | help="File containing a list of XML files with coreference information", 68 | type=str 69 | ) 70 | parser.add_argument( 71 | "--out-amr", 72 | help="Output file containing AMR in penman format", 73 | type=str, 74 | ) 75 | parser.add_argument( 76 | "--in-doc-amr-unmerged", 77 | help="path to a doc AMR file in 'no-merge' format", 78 | type=str 79 | ) 80 | parser.add_argument( 81 | "--in-doc-amr-pairwise", 82 | help="path to a doc AMR file with coref chains as pairwise edges", 83 | type=str 84 | ) 85 | parser.add_argument( 86 | "--pairwise-coref-rel", 87 | default='same-as', 88 | help="edge label representing pairwise coref edges", 89 | type=str 90 | ) 91 | parser.add_argument( 92 | '--rep', 93 | default='docAMR', 94 | help='''Which representation to use, options: 95 | "no-merge" -- No node merging, only chain-nodes 96 | "merge-names" -- Merge only names 97 | "docAMR" -- Merge names and drop pronouns 98 | "merge-all" -- Merge all nodes''', 99 | type=str 100 | ) 101 | parser.add_argument( 102 | '--flipped', 103 | help='whether or not to use the flipped representation, i.e. 
parent->coref-entity->child', 104 | action='store_true' 105 | ) 106 | args = parser.parse_args() 107 | return args 108 | 109 | 110 | def main(): 111 | 112 | args = argument_parser() 113 | assert args.out_amr 114 | 115 | if args.amr3_path and args.coref_fof: 116 | 117 | # read cross-sentential corefs from the document AMR annotations 118 | coref_files = [args.amr3_path+"/"+line.strip() for line in open(args.coref_fof)] 119 | corefs = process_corefs(coref_files) 120 | 121 | # Read AMR 122 | directory = args.amr3_path + r'/data/amrs/unsplit/' 123 | amrs = {} 124 | for filename in tqdm(os.listdir(directory), desc="Reading sentence-level AMRs"): 125 | amrs.update(read_amr(directory+filename)) 126 | 127 | # write documents without corefs 128 | plain_doc_amrs = make_doc_amrs(corefs,amrs,coref=False).values() 129 | with open(args.out_amr+".nocoref", 'w') as fid: 130 | for amr in plain_doc_amrs: 131 | damr = copy.deepcopy(amr) 132 | connect_sen_amrs(damr) 133 | fid.write(damr.__str__()) 134 | # add corefs into document-level AMRs 135 | amrs = make_doc_amrs(corefs,amrs).values() 136 | with open(args.out_amr, 'w') as fid: 137 | for amr in amrs: 138 | damr = copy.deepcopy(amr) 139 | connect_sen_amrs(damr) 140 | print("\nnormalizing "+damr.doc_file.split("/")[-1]) 141 | print("normalizing "+damr.amr_id) 142 | damr.normalize(rep=args.rep, flip=args.flipped) 143 | fid.write(damr.__str__()) 144 | 145 | if args.in_doc_amr_unmerged: 146 | amrs = read_amr(args.in_doc_amr_unmerged).values() 147 | with open(args.out_amr, 'w') as fid: 148 | for amr in amrs: 149 | damr = copy.deepcopy(amr) 150 | print("\nnormalizing "+damr.amr_id) 151 | damr.normalize(rep=args.rep, flip=args.flipped) 152 | fid.write(damr.__str__()) 153 | 154 | if args.in_doc_amr_pairwise: 155 | amrs = read_amr(args.in_doc_amr_pairwise).values() 156 | with open(args.out_amr, 'w') as fid: 157 | for amr in amrs: 158 | damr = copy.deepcopy(amr) 159 | print("\nnormalizing "+damr.amr_id) 160 | damr.make_chains_from_pairs(args.pairwise_coref_rel) 161 | damr.normalize(rep=args.rep, flip=args.flipped) 162 | fid.write(damr.__str__()) 163 | 164 | 165 | 166 | if __name__ == '__main__': 167 | main() 168 | -------------------------------------------------------------------------------- /doc_amr_baseline/README.md: -------------------------------------------------------------------------------- 1 | ## Run DocAMR Baseline 2 | To set up the environment we use a file called set_environment.sh 3 | 4 | ```bash 5 | touch set_environment.sh 6 | ``` 7 | 8 | The activation of the conda/virtual environment can be added inside this file. 9 | 10 | Packages to install are in requirements.txt. Python 3.7 works best. To install the required packages, run the following inside the conda/virtual environment. 11 | 12 | ```bash 13 | pip install -r doc_amr_baseline/requirements.txt 14 | ``` 15 | Clone a dependent repository for CoNLL coref conversion 16 | ```bash 17 | git clone https://github.com/boberle/corefconversion.git 18 | ``` 19 | To get a document AMR, given the sentence AMRs, run 20 | 21 | ```bash 22 | bash doc_amr_baseline/run_doc_amr_baseline.sh <amr folder> <out folder> <representation> [<coref pickle>] 23 | 24 | ``` 25 | `<amr folder>` is a folder containing a file of sentence AMRs for each document. Each file in the folder should have the extension '.amr' and contain sentence AMRs for all sentences in the document, separated by a blank line. See **Format of AMR files** for further details. 
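Before running the script end to end, it can help to verify that every document file in `<amr folder>` parses; the remaining arguments are described below. A minimal sketch, assuming the repository root is the working directory; the `sentence_amrs/` folder name is illustrative, and `read_amr2` is the reader from `doc_amr_baseline/baseline_io.py` shown further below:

```python
# Sanity-check a folder of sentence-AMR files before running the baseline.
# 'sentence_amrs/' is an illustrative folder name.
import glob

from doc_amr_baseline.baseline_io import read_amr2

for path in sorted(glob.glob('sentence_amrs/*.amr')):
    # ibm_format=True parses the ::node/::edge/::root metadata lines that
    # the baseline needs for alignments and node ids
    amrs = read_amr2(path, ibm_format=True)
    print(path, '->', len(amrs), 'sentence AMRs')
```

Note that `read_amr2` only flushes a graph when it reaches a blank line, so each '.amr' file must end with a blank line for the last AMR to be counted.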
26 | 27 | `<out folder>` is the folder where the doc AMR for each document is to be output. 28 | 29 | 30 | `<representation>` is which representation to use, options: 31 | "no-merge" -- No node merging, only chain-nodes 32 | "merge-names" -- Merge only names 33 | "docAMR" -- Merge names and drop pronouns 34 | "merge-all" -- Merge all nodes 35 | 36 | The recommended representation based on the [paper](https://aclanthology.org/2022.naacl-main.256.pdf) is **"docAMR"**. 37 | 38 | `<coref pickle>` is the path to AllenNLP coref output in pickled format and is optional (i.e. previously generated coref can be reused here). If not provided, the script will use the sentences in the AMRs to get SpanBERT coref from "https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz" 39 | 40 | ## Format of AMR files expected for DocAMR baseline 41 | Each file inside the folder should 42 | 1. End with the extension '.amr' 43 | 2. Contain sentence AMRs for all sentences in the document, separated by a blank line 44 | 3. Contain metadata information about the AMR parse, such as alignments and node ids. 45 | 46 | See the example folder for a sample .amr file 47 | 48 | To get sentence AMRs, please check out this parser: 49 | https://github.com/IBM/transition-amr-parser 50 | 51 | Note: To get an AMR in the same format as the example in the folder, use --jamr and --no-isi as arguments to the amr-parse command while using this parser. 52 | 53 | 54 | ## Run DocAMR Baseline test 55 | 56 | To run a test of the baseline, given the gold docAMR and sentence AMRs, run 57 | 58 | ```bash 59 | bash doc_amr_baseline/tests/baseline_allennlp_test.sh <gold docamr> <amr folder> <representation> 60 | 61 | ``` 62 | 63 | `<gold docamr>` is a file containing the gold docAMR obtained using the command mentioned in the main README, with the same representation as `<representation>`: 64 | 65 | ```bash 66 | python doc_amr.py 67 | --amr3-path <amr3 path> 68 | --coref-fof <coref fof> 69 | --out-amr <gold docamr> 70 | --rep <representation> 71 | 72 | ``` 73 | 74 | `<amr folder>` is a folder containing a file of sentence AMRs for each document. Each file in the folder should have the extension '.amr' and contain sentence AMRs for all sentences in the document, separated by a blank line. See **Format of AMR files** for further details. 75 | 76 | 77 | `<representation>` options: 78 | "no-merge" -- No node merging, only chain-nodes 79 | "merge-names" -- Merge only names 80 | "docAMR" -- Merge names and drop pronouns 81 | "merge-all" -- Merge all nodes 82 | 83 | The recommended representation based on the [paper](https://aclanthology.org/2022.naacl-main.256.pdf) is **"docAMR"**. 84 | 85 | 86 | 87 | -------------------------------------------------------------------------------- /doc_amr_baseline/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/IBM/docAMR/e8937bad9aa3fa4077751f9bcedfbfcbfa37047e/doc_amr_baseline/__init__.py -------------------------------------------------------------------------------- /doc_amr_baseline/amr_constituents.py: -------------------------------------------------------------------------------- 1 | from collections import defaultdict 2 | from ipdb import set_trace 3 | import re 4 | import copy 5 | 6 | 7 | def get_subgraph_by_id(amr, no_reentrancies=True, no_reverse_edges=False): 8 | ''' 9 | Given an AMR class, provide for each node a list of all nodes "below" it 10 | in the graph. Ignore re-entrancies and reverse edges if solicited.
Not 11 | ignoring re-entrancies can lead to infinite loops 12 | ''' 13 | 14 | # get re-entrant edges 15 | if no_reentrancies: 16 | reentrancy_edges = get_reentrancy_edges(amr) 17 | else: 18 | reentrancy_edges = [] 19 | 20 | # Gather constituents bottom-up 21 | # find leaf nodes 22 | leaf_nodes = [] 23 | for nid, nname in amr.nodes.items(): 24 | child_edges = [(nid, label, tgt) for tgt, label in amr.children(nid)] 25 | # If no nodes, or nodes are re-entrant 26 | if set(child_edges) <= set(reentrancy_edges): 27 | leaf_nodes.append(nid) 28 | # start from leaf nodes and go upwards ignoring re-entrant edges 29 | # store subgraph for every node as all the nodes "below" it on the tree 30 | subgraph_by_id = defaultdict(set) 31 | candidates = leaf_nodes 32 | new_nodes = True 33 | count = 0 34 | while new_nodes: 35 | new_candidates = set() 36 | new_nodes = False 37 | for nid in candidates: 38 | # ignore re-entrant nodes 39 | unique_parents = [] 40 | for (src, label) in amr.parents(nid): 41 | if ( 42 | (src, label, nid) not in reentrancy_edges 43 | and not (no_reverse_edges and label.endswith('-of')) 44 | ): 45 | unique_parents.append(src) 46 | if len(unique_parents) == 0: 47 | continue 48 | elif len(unique_parents) > 1: 49 | set_trace(context=30) 50 | # colect subgraph for this node 51 | src = unique_parents[0] 52 | subgraph_by_id[src] |= set([nid]) 53 | subgraph_by_id[src] |= set(subgraph_by_id[nid]) 54 | new_candidates |= set([src]) 55 | new_nodes = True 56 | 57 | candidates = new_candidates 58 | 59 | count += 1 60 | if count > 1000: 61 | set_trace(context=30) 62 | print() 63 | 64 | return subgraph_by_id 65 | 66 | 67 | def get_constituents_from_subgraph(amr): 68 | '''Get spans associated to each subgraph''' 69 | 70 | # get the subgraph below each node 71 | subgraph_by_id = get_subgraph_by_id(amr) 72 | 73 | def get_constituent(nid): 74 | '''Given nid and subgraph extract span aligned to it''' 75 | # Token aligned to node 76 | indices = copy.deepcopy(amr.alignments[nid]) 77 | sids = subgraph_by_id[nid] 78 | if sids is not None: 79 | # Tokens aligned to all nodes below it 80 | for id in sids: 81 | if amr.alignments[id] is None: 82 | continue 83 | for idx in amr.alignments[id]: 84 | if indices is None: 85 | print('Alignment for node ',nid, 'not found but constituents are added') 86 | indices = [] 87 | indices.append(idx) 88 | if indices: 89 | return min(indices), max(indices) 90 | else: 91 | return None, None 92 | 93 | # gather constituents associated to each node 94 | candidates = [amr.root] 95 | depth = 0 96 | depths = [depth] 97 | constituent_spans = [] 98 | count = 0 99 | while candidates and count < 1000: 100 | nid = candidates.pop() 101 | ndepth = depths.pop() 102 | start, end = get_constituent(nid) 103 | if start is None: 104 | count += 1 105 | continue 106 | # Add constituent to list 107 | constituent_spans.append(dict( 108 | depth=ndepth, 109 | indices=(start, end+1), 110 | head=amr.nodes[nid], 111 | head_position=amr.alignments[nid], 112 | nid=nid 113 | )) 114 | reentrancy_edges = get_reentrancy_edges(amr) 115 | # update candidates, ignore re-entrant nodes 116 | candidates.extend([ 117 | tgt for tgt, label in amr.children(nid) 118 | if (nid, label, tgt) not in reentrancy_edges 119 | ]) 120 | depth += 1 121 | depths.extend([depth for _ in range(len(amr.children(nid)))]) 122 | count += 1 123 | 124 | if count == 1000: 125 | # We got trapped in a loop 126 | set_trace(context=30) 127 | pass 128 | 129 | return {'tokens': amr.tokens, 'constituents': constituent_spans} 130 | 131 | 132 | def 
get_reentrancy_edges(amr): 133 | 134 | # Get re-entrant edges i.e. extra parents. We keep the edge closest to 135 | # root 136 | # annotate depth at which edeg occurs 137 | candidates = [amr.root] 138 | depths = [0] 139 | depth_by_edge = dict() 140 | while candidates: 141 | for (tgt, label) in amr.children(candidates[0]): 142 | edge = (candidates[0], label, tgt) 143 | if edge in depth_by_edge: 144 | continue 145 | depth_by_edge[edge] = depths[0] 146 | candidates.append(tgt) 147 | depths.append(depths[0] + 1) 148 | candidates.pop(0) 149 | depths.pop(0) 150 | 151 | # in case of multiple parents keep the one closest to the root 152 | reentrancy_edges = [] 153 | for nid, nname in amr.nodes.items(): 154 | parents = [(src, label, nid) for src, label in amr.parents(nid)] 155 | if nid == amr.root: 156 | # Root can not have parents 157 | reentrancy_edges.extend(parents) 158 | elif len(parents) > 1: 159 | # Keep only highest edge from re-entrant ones 160 | # FIXME: Unclear why depth is missing sometimes 161 | reentrancy_edges.extend( 162 | sorted(parents, key=lambda e: depth_by_edge.get(e, 1000))[1:] 163 | ) 164 | return reentrancy_edges 165 | 166 | def get_predicates(amr): 167 | pred_regex = re.compile('.+-[0-9]+$') 168 | 169 | 170 | num_preds = 0 171 | pred_ret = [] 172 | 173 | predicates = {n:v for n,v in amr.nodes.items() if pred_regex.match(v) and not v.endswith('91') and v not in ['have-half-life.01']} 174 | num_preds += len(predicates) 175 | for pred in predicates: 176 | if amr.alignments[pred] is not None: 177 | 178 | args = { 179 | trip[1][1:].replace('-of', ''):(amr.nodes[trip[2]],amr.tokens[min(amr.alignments[trip[2]])],amr.alignments[trip[2]]) 180 | for trip in amr.edges 181 | if trip[0] == pred #and trip[1].startswith(':ARG') 182 | } 183 | 184 | 185 | pred_ret.append({'pred':predicates[pred],'text':amr.tokens[min(amr.alignments[pred])],'args':args,'beg':min(amr.alignments[pred]),'end':max(amr.alignments[pred])+1}) 186 | 187 | return pred_ret 188 | 189 | -------------------------------------------------------------------------------- /doc_amr_baseline/baseline_io.py: -------------------------------------------------------------------------------- 1 | from tqdm import tqdm 2 | import penman 3 | import os, sys 4 | sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))) 5 | from amr_io import AMR 6 | import re 7 | from collections import defaultdict 8 | 9 | alignment_regex = re.compile('(-?[0-9]+)-(-?[0-9]+)') 10 | class AMR2(AMR): 11 | 12 | @classmethod 13 | def get_all_vars(cls, penman_str): 14 | in_quotes = False 15 | all_vars = [] 16 | for (i,ch) in enumerate(penman_str): 17 | if ch == '"': 18 | if in_quotes: 19 | in_quotes = False 20 | else: 21 | in_quotes = True 22 | if in_quotes: 23 | continue 24 | if ch == '(': 25 | var = '' 26 | j = i+1 27 | while j < len(penman_str) and penman_str[j] not in [' ','\n']: 28 | var += penman_str[j] 29 | j += 1 30 | all_vars.append(var) 31 | return all_vars 32 | 33 | @classmethod 34 | def get_node_var(cls, penman_str, node_id): 35 | """ 36 | find node variable based on ids like 0.0.1 37 | """ 38 | nid = '99990.0.0.0.0.0.0' 39 | cidx = [] 40 | lvls = [] 41 | all_vars = AMR2.get_all_vars(penman_str) 42 | in_quotes = False 43 | for (i,ch) in enumerate(penman_str): 44 | if ch == '"': 45 | if in_quotes: 46 | in_quotes = False 47 | else: 48 | in_quotes = True 49 | if in_quotes: 50 | continue 51 | 52 | if ch == ":": 53 | idx = i 54 | while idx < len(penman_str) and penman_str[idx] != ' ': 55 | idx += 1 56 | if idx+1 < len(penman_str) and 
penman_str[idx+1] != '(': 57 | var = '' 58 | j = idx+1 59 | while j < len(penman_str) and penman_str[j] not in [' ','\n']: 60 | var += penman_str[j] 61 | j += 1 62 | if var not in all_vars: 63 | lnum = len(lvls) 64 | if lnum >= len(cidx): 65 | cidx.append(1) 66 | else: 67 | cidx[lnum] += 1 68 | if ch == '(': 69 | lnum = len(lvls) 70 | if lnum >= len(cidx): 71 | cidx.append(0) 72 | lvls.append(str(cidx[lnum])) 73 | 74 | if ch == ')': 75 | lnum = len(lvls) 76 | if lnum < len(cidx): 77 | cidx.pop() 78 | cidx[lnum-1] += 1 79 | lvls.pop() 80 | 81 | if ".".join(lvls) == node_id: 82 | j = i+1 83 | while penman_str[j] == ' ': 84 | j += 1 85 | var = "" 86 | while penman_str[j] != ' ': 87 | var += penman_str[j] 88 | j += 1 89 | return var 90 | 91 | return None 92 | 93 | @classmethod 94 | def from_metadata(cls, penman_text, tokenize=False): 95 | """Read AMR from metadata (IBM style)""" 96 | 97 | # Read metadata from penman 98 | field_key = re.compile(f'::[A-Za-z]+') 99 | metadata = defaultdict(list) 100 | separator = None 101 | penman_str = "" 102 | for line in penman_text: 103 | if line.startswith('#'): 104 | line = line[2:].strip() 105 | start = 0 106 | for point in field_key.finditer(line): 107 | end = point.start() 108 | value = line[start:end] 109 | if value: 110 | metadata[separator].append(value) 111 | separator = line[end:point.end()][2:] 112 | start = point.end() 113 | value = line[start:] 114 | if value: 115 | metadata[separator].append(value) 116 | else: 117 | penman_str += line.strip() + ' ' 118 | 119 | # assert 'tok' in metadata, "AMR must contain field ::tok" 120 | if tokenize: 121 | assert 'snt' in metadata, "AMR must contain field ::snt" 122 | tokens, _ = protected_tokenizer(metadata['snt'][0]) 123 | else: 124 | assert 'tok' in metadata, "AMR must contain field ::tok" 125 | assert len(metadata['tok']) == 1 126 | tokens = metadata['tok'][0].split() 127 | 128 | #print(penman_str) 129 | 130 | sid="000" 131 | nodes = {} 132 | nvars = {} 133 | alignments = {} 134 | edges = [] 135 | root = None 136 | sentence = None 137 | 138 | if 'short' in metadata: 139 | short_str = metadata["short"][0].split('\t')[1] 140 | short = eval(short_str) 141 | short = {str(k):v for k,v in short.items()} 142 | all_vars = list(short.values()) 143 | else: 144 | short = None 145 | all_vars = AMR2.get_all_vars(penman_str) 146 | 147 | 148 | for key, value in metadata.items(): 149 | if key == 'edge': 150 | for items in value: 151 | items = items.split('\t') 152 | if len(items) == 6: 153 | _, _, label, _, src, tgt = items 154 | edges.append((src, f':{label}', tgt)) 155 | elif key == 'node': 156 | for items in value: 157 | items = items.split('\t') 158 | if len(items) > 3: 159 | _, node_id, node_name, alignment = items 160 | start, end = alignment_regex.match(alignment).groups() 161 | indices = list(range(int(start), int(end))) 162 | alignments[node_id] = indices 163 | else: 164 | _, node_id, node_name = items 165 | alignments[node_id] = None 166 | nodes[node_id] = node_name 167 | if short is not None: 168 | var = short[node_id] 169 | else: 170 | var = node_id 171 | if var is not None and var+" / " not in penman_str: 172 | nvars[node_id] = None 173 | else: 174 | nvars[node_id] = var 175 | all_vars.remove(var) 176 | elif key == 'root': 177 | root = value[0].split('\t')[1] 178 | elif key == 'id': 179 | sid = value[0].strip() 180 | if len(all_vars): 181 | print("varaible not linked to nodes:") 182 | print(all_vars) 183 | print(penman_str) 184 | return cls(tokens, nodes, edges, root, penman=None, 185 | 
alignments=alignments, nvars=nvars, sid=sid) 186 | 187 | def read_amr2(file_path, ibm_format=False, tokenize=False): 188 | with open(file_path) as fid: 189 | raw_amr = [] 190 | raw_amrs = [] 191 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 192 | if line.strip() == '': 193 | if ibm_format: 194 | # From ::node, ::edge etc 195 | raw_amrs.append( 196 | AMR2.from_metadata(raw_amr, tokenize=tokenize) 197 | ) 198 | else: 199 | # From penman 200 | raw_amrs.append( 201 | AMR.from_penman(raw_amr, tokenize=tokenize) 202 | ) 203 | raw_amr = [] 204 | else: 205 | raw_amr.append(line) 206 | return raw_amrs 207 | 208 | def read_amr3(file_path, ibm_format=False, tokenize=False): 209 | with open(file_path) as fid: 210 | raw_amr = [] 211 | raw_amrs = {} 212 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 213 | if line.strip() == '': 214 | if ibm_format: 215 | # From ::node, ::edge etc 216 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 217 | else: 218 | # From penman 219 | 220 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 221 | 222 | raw_amrs[amr.sid] = amr 223 | raw_amr = [] 224 | else: 225 | raw_amr.append(line) 226 | return raw_amrs 227 | 228 | def read_amr3_docid(file_path, ibm_format=False, tokenize=False): 229 | doc_id = None 230 | with open(file_path) as fid: 231 | raw_amr = [] 232 | raw_amrs = {} 233 | 234 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 235 | if line.strip() == '': 236 | if ibm_format: 237 | # From ::node, ::edge etc 238 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 239 | else: 240 | # From penman 241 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 242 | raw_amrs[amr.sid] = amr 243 | if doc_id is None: 244 | doc_id = amr.sid.rsplit('.',1)[0] 245 | 246 | 247 | raw_amr = [] 248 | else: 249 | raw_amr.append(line) 250 | return raw_amrs,doc_id 251 | 252 | #store by sen 253 | def read_amr_by_snt(file_path, tokenize=False): 254 | with open(file_path) as fid: 255 | raw_amr = [] 256 | raw_amrs = {} 257 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 258 | if line.strip() == '': 259 | 260 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 261 | if tokenize: 262 | tok_sen = " ".join(amr.tokens) 263 | raw_amrs[tok_sen] = raw_amr 264 | else: 265 | raw_amrs[amr.penman.metadata['tok']] = raw_amr 266 | raw_amr = [] 267 | else: 268 | raw_amr.append(line) 269 | return raw_amrs 270 | 271 | def read_amr_raw(file_path, tokenize=False): 272 | with open(file_path) as fid: 273 | raw_amr = [] 274 | raw_amrs = [] 275 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 276 | if line.strip() == '': 277 | 278 | amr = AMR.from_penman(raw_amr, tokenize=tokenize) 279 | raw_amrs.append(raw_amr) 280 | raw_amr = [] 281 | else: 282 | raw_amr.append(line) 283 | return raw_amrs 284 | 285 | def read_amr_as_raw_str(file_path): 286 | with open(file_path) as fid: 287 | raw_amrs = [] 288 | raw_amr = '' 289 | for line in fid.readlines(): 290 | if line.strip(): 291 | raw_amr+=line.strip()+'\n' 292 | else: 293 | raw_amrs.append(raw_amr) 294 | raw_amr = '' 295 | return raw_amrs 296 | 297 | #for amrs without an ::id 298 | def read_amr_add_sen_id(file_path,doc_id,remove_id=False,tokenize=False,ibm_format=True): 299 | with open(file_path) as fid: 300 | raw_amr = [] 301 | raw_amrs = {} 302 | for line in tqdm(fid.readlines(), desc='Reading AMR'): 303 | if line.strip() == '': 304 | if ibm_format: 305 | # From ::node, ::edge etc 306 | amr = AMR2.from_metadata(raw_amr, tokenize=tokenize) 307 | else: 308 | # From penman 309 | amr = AMR.from_penman(raw_amr, 
tokenize=tokenize) 310 | raw_amrs[amr.sid] = amr 311 | raw_amr = [] 312 | else: 313 | if remove_id and '::id' in line: 314 | continue 315 | raw_amr.append(line) 316 | if tokenize: 317 | if '::snt' in line and remove_id: 318 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 319 | else: 320 | if '::tok' in line and remove_id: 321 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 322 | elif '::snt' in line and remove_id: 323 | raw_amr.append('# ::id '+doc_id+'.'+str(len(raw_amrs)+1)) 324 | 325 | 326 | return raw_amrs 327 | 328 | def read_amr_str_add_sen_id(amr_strs,doc_id,tokenize=False): 329 | raw_amrs = {} 330 | for idx,amr_str in enumerate(amr_strs): 331 | 332 | # From ::node, ::edge etc 333 | amr_list = amr_str.splitlines(True) 334 | amr_list.insert(1,'# ::id '+doc_id+'.'+str(idx+1)+'\n') 335 | # amr_list = [line+'\n# ::id '+doc_id+'.'+str(idx+1) if '::tok' in line else line for line in amr_str.split('\n')] 336 | amr = AMR2.from_metadata(amr_list, tokenize=tokenize) 337 | raw_amrs[amr.sid] = amr 338 | 339 | return raw_amrs 340 | 341 | -------------------------------------------------------------------------------- /doc_amr_baseline/example/doc_sen.amr: -------------------------------------------------------------------------------- 1 | # ::tok Hailey is going to London tomorrow . 2 | # ::node p person 0-1 3 | # ::node n name 0-1 4 | # ::node 0 Hailey 0-1 5 | # ::node g go-02 2-3 6 | # ::node c city 4-5 7 | # ::node n2 name 4-5 8 | # ::node 1 London 4-5 9 | # ::node t tomorrow 5-6 10 | # ::root g go-02 11 | # ::edge person name name p n 12 | # ::edge name op1 Hailey n 0 13 | # ::edge go-02 ARG0 person g p 14 | # ::edge go-02 ARG4 city g c 15 | # ::edge city name name c n2 16 | # ::edge name op1 London n2 1 17 | # ::edge go-02 time tomorrow g t 18 | (g / go-02 19 | :ARG0 (p / person 20 | :name (n / name 21 | :op1 "Hailey")) 22 | :ARG4 (c / city 23 | :name (n2 / name 24 | :op1 "London")) 25 | :time (t / tomorrow)) 26 | 27 | # ::tok She is planning to go to Italy after London . 28 | # ::node s she 0-1 29 | # ::node p plan-01 2-3 30 | # ::node g go-02 4-5 31 | # ::node c2 country 6-7 32 | # ::node n name 6-7 33 | # ::node 0 Italy 6-7 34 | # ::node a after 7-8 35 | # ::node c city 8-9 36 | # ::node n2 name 8-9 37 | # ::node 1 London 8-9 38 | # ::root p plan-01 39 | # ::edge plan-01 ARG0 she p s 40 | # ::edge plan-01 ARG1 go-02 p g 41 | # ::edge go-02 ARG0 she g s 42 | # ::edge go-02 ARG4 country g c2 43 | # ::edge country name name c2 n 44 | # ::edge name op1 Italy n 0 45 | # ::edge go-02 time after g a 46 | # ::edge after op1 city a c 47 | # ::edge city name name c n2 48 | # ::edge name op1 London n2 1 49 | (p / plan-01 50 | :ARG0 (s / she) 51 | :ARG1 (g / go-02 52 | :ARG0 s 53 | :ARG4 (c2 / country 54 | :name (n / name 55 | :op1 "Italy")) 56 | :time (a / after 57 | :op1 (c / city 58 | :name (n2 / name 59 | :op1 "London"))))) 60 | 61 | # ::tok She is going to see the Big Ben . 62 | # ::node s2 she 0-1 63 | # ::node s see-01 4-5 64 | # ::node b building 6-7 65 | # ::node n name 6-7 66 | # ::node 1 Big 6-7 67 | # ::node 0 Ben 7-8 68 | # ::root s see-01 69 | # ::edge see-01 ARG0 she s s2 70 | # ::edge see-01 ARG1 building s b 71 | # ::edge building name name b n 72 | # ::edge name op1 Big n 1 73 | # ::edge name op2 Ben n 0 74 | (s / see-01 75 | :ARG0 (s2 / she) 76 | :ARG1 (b / building 77 | :name (n / name 78 | :op1 "Big" 79 | :op2 "Ben"))) 80 | 81 | # ::tok Her friend Phil is meeting her in London . 
82 | # ::node s she 0-1 83 | # ::node h have-rel-role-91 1-2 84 | # ::node f friend 1-2 85 | # ::node p person 2-3 86 | # ::node n name 2-3 87 | # ::node 1 Phil 2-3 88 | # ::node m meet-03 4-5 89 | # ::node c city 7-8 90 | # ::node n2 name 7-8 91 | # ::node 0 London 7-8 92 | # ::root m meet-03 93 | # ::edge have-rel-role-91 ARG1 she h s 94 | # ::edge have-rel-role-91 ARG2 friend h f 95 | # ::edge person ARG0-of have-rel-role-91 p h 96 | # ::edge person name name p n 97 | # ::edge name op1 Phil n 1 98 | # ::edge meet-03 ARG0 person m p 99 | # ::edge meet-03 ARG1 she m s 100 | # ::edge meet-03 location city m c 101 | # ::edge city name name c n2 102 | # ::edge name op1 London n2 0 103 | (m / meet-03 104 | :ARG0 (p / person 105 | :name (n / name 106 | :op1 "Phil") 107 | :ARG0-of (h / have-rel-role-91 108 | :ARG1 (s / she) 109 | :ARG2 (f / friend))) 110 | :ARG1 s 111 | :location (c / city 112 | :name (n2 / name 113 | :op1 "London"))) 114 | 115 | -------------------------------------------------------------------------------- /doc_amr_baseline/example/docamr_docAMR.out: -------------------------------------------------------------------------------- 1 | # ::id sentence_test 2 | # ::doc_file sentence_test 3 | # ::tok Hailey is going to London tomorrow . She is planning to go to Italy after London . She is going to see the Big Ben . Her friend Phil is meeting her in London . 4 | (d / document 5 | :snt1 (s1.g / go-02 6 | :ARG0 (s1.p / person 7 | :name (s1.n / name 8 | :op1 "Hailey")) 9 | :ARG4 (s2.c / city 10 | :name (s2.n2 / name 11 | :op1 "London")) 12 | :time (s1.t / tomorrow)) 13 | :snt2 (s2.p / plan-01 14 | :ARG0 (pro / she) 15 | :ARG1 (s2.g / go-02 16 | :ARG0 pro 17 | :ARG4 (s2.c2 / country 18 | :name (s2.n / name 19 | :op1 "Italy")) 20 | :time (s2.a / after 21 | :op1 s2.c))) 22 | :snt3 (s3.s / see-01 23 | :ARG0 pro 24 | :ARG1 (s3.b / building 25 | :name (s3.n / name 26 | :op1 "Big" 27 | :op2 "Ben"))) 28 | :snt4 (s4.m / meet-03 29 | :ARG0 (s4.p / person 30 | :name (s4.n / name 31 | :op1 "Phil") 32 | :ARG0-of (s4.h / have-rel-role-91 33 | :ARG1 pro 34 | :ARG2 (s4.f / friend))) 35 | :ARG1 pro 36 | :location s2.c)) 37 | 38 | -------------------------------------------------------------------------------- /doc_amr_baseline/get_allen_coref.py: -------------------------------------------------------------------------------- 1 | #conda activate allen_nlp 2 | import allennlp 3 | from allennlp.predictors.predictor import Predictor 4 | from itertools import accumulate 5 | # import allennlp_models.tagging 6 | import glob 7 | import pickle 8 | from tqdm import tqdm 9 | import argparse 10 | 11 | predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2021.03.10.tar.gz") 12 | 13 | def get_allen_coref(filepath,from_amr=False): 14 | 15 | f1 = open(filepath,'r').read() 16 | sen_list = f1.splitlines() 17 | if from_amr: 18 | sen_tok_list = [s.split('::tok ')[-1].split() for s in sen_list if '::tok' in s] 19 | else: 20 | sen_tok_list = [s.split() for s in sen_list] 21 | sen_tok_len = [len(l) for l in sen_tok_list] 22 | sen_tok_len = list(accumulate(sen_tok_len)) 23 | sen_tok = [item for sublist in sen_tok_list for item in sublist] 24 | 25 | 26 | pred = predictor.predict_tokenized(tokenized_document=sen_tok) 27 | clusters = pred['clusters'] 28 | document = pred['document'] 29 | new_cluster = [] 30 | 31 | for c in clusters: 32 | new_c = [] 33 | for m in c: 34 | for idx,l in enumerate(sen_tok_len): 35 | if m[0]< l: 36 | prev_len = sen_tok_len[idx-1] 37 | break 38 
| if idx!=0: 39 | new_c.append([idx,m[0]-prev_len,m[1]-prev_len]) 40 | else: 41 | new_c.append([idx,m[0],m[1]]) 42 | new_cluster.append(new_c) 43 | return new_cluster 44 | 45 | if __name__ == "__main__": 46 | parser = argparse.ArgumentParser() 47 | parser.add_argument('--path_to_sen',type=str) 48 | parser.add_argument('--path_to_out',type=str,help='path to output',default=None) 49 | parser.add_argument('--from_amr',action='store_true') 50 | parser.add_argument('--from_json',action='store_true') 51 | 52 | args = parser.parse_args() 53 | doc_clusters = {} 54 | args.path_to_sen+='/' 55 | 56 | 57 | i = 0 58 | if args.from_amr: 59 | ext = '.amr' 60 | else: 61 | ext = '.txt' 62 | 63 | # if from_json: 64 | # json_dict = json.load(open(path_to_sen)) 65 | # for doc_id,doc_val in json_dict.items(): 66 | # amr_strs = [s['sentence'] for s in doc_val['sentences'].values()] 67 | # path_fill = path_to_sen+'doc*'+ext 68 | path_fill = args.path_to_sen+'*'+ext 69 | 70 | for filepath in tqdm(glob.iglob(path_fill)): 71 | doc_id = filepath.split('/')[-1].split('.')[0] 72 | clusters = get_allen_coref(filepath,from_amr=args.from_amr) 73 | doc_clusters[doc_id] = clusters 74 | i+=1 75 | 76 | if args.path_to_out is None: 77 | out_path = args.path_to_sen+'/allen_spanbert_large-2021.03.10.coref' 78 | else: 79 | out_path = args.path_to_out+'/allen_spanbert_large-2021.03.10.coref' 80 | with open(out_path,'wb') as f2: 81 | pickle.dump(doc_clusters,f2) 82 | -------------------------------------------------------------------------------- /doc_amr_baseline/make_doc_amr.py: -------------------------------------------------------------------------------- 1 | import json 2 | import os 3 | import glob 4 | from itertools import groupby 5 | from operator import itemgetter 6 | import copy 7 | import collections 8 | import pickle 9 | import tqdm 10 | 11 | import argparse 12 | from .baseline_io import ( 13 | 14 | 15 | read_amr_add_sen_id, 16 | read_amr_str_add_sen_id, 17 | read_amr3_docid 18 | 19 | ) 20 | import os, sys 21 | sys.path.append(os.path.dirname(os.path.dirname(os.path.realpath(__file__)))) 22 | 23 | from doc_amr import make_doc_amrs,connect_sen_amrs 24 | 25 | from .amr_constituents import get_subgraph_by_id,get_constituents_from_subgraph 26 | 27 | 28 | 29 | def get_node_from_subgraph(subgraph,beg,end,doc_amr=None,verbose=False): 30 | candidate_nodes = [] 31 | secondary_candidates =[] 32 | for head in subgraph['constituents']: 33 | if head['indices'][0]==beg and head['indices'][1]==end+1: 34 | name = head['head'] 35 | return head['nid'] 36 | elif head['indices'][0] in range(beg,end+2) and head['indices'][1] in range(beg,end+2): 37 | candidate_nodes.append(head) 38 | elif head['head_position'] is not None: 39 | if max(head['head_position'])==end: #or end == head['head_position'][0]: This made other nodes wrong 40 | secondary_candidates.append(head) 41 | 42 | # al_node = get_node_from_alignment(doc_amr, beg, end) 43 | 44 | if len(candidate_nodes) == 1: 45 | if verbose: 46 | print('approx alignment node') 47 | name = candidate_nodes[0]['head'] 48 | return candidate_nodes[0]['nid'] 49 | elif len(candidate_nodes)>1: 50 | if verbose: 51 | print('subset alignment mindepth node') 52 | mindepth = min(candidate_nodes, key=lambda x:x['depth']) 53 | name = mindepth['head'] 54 | return mindepth['nid'] 55 | elif len(secondary_candidates)>0: 56 | if verbose: 57 | print('end alignment mindepth node') 58 | mindepth = min(secondary_candidates, key=lambda x:x['depth']) 59 | name = mindepth['head'] 60 | return mindepth['nid'] 61 | 62 | 63 | 
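# no constituent matched the mention span exactly, approximately, or by head
# position; callers (see process_coref_conll below) treat None as "node not
# found" and skip the mention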
return None 64 | 65 | 66 | def construct_triples(doc_amrs,from_sen_id,from_node_id,sen_node_pairs,relation,verbose=False): 67 | triples = [] 68 | for (full_sen_id,full_node_id) in sen_node_pairs: 69 | from_node = doc_amrs[full_sen_id].nvars[full_node_id] 70 | to_node = doc_amrs[from_sen_id].nvars[from_node_id] 71 | if from_node is not None and to_node is not None: 72 | trip = (full_sen_id+'.'+from_node,relation,from_sen_id+'.'+to_node) 73 | triples.append(trip) 74 | else: 75 | if from_node is None and verbose: 76 | 77 | print(full_node_id, 'node id is not recognized in sentence', full_sen_id) 78 | elif to_node is None and verbose: 79 | 80 | print(from_node_id, 'node id is not recognized in sentence', from_sen_id) 81 | return triples 82 | 83 | 84 | def process_coref_conll(amrs,coref_chains,add_coref=True,verbose=False,save_triples=False,out=None,relation='same-as',coref_type='allennlp'): 85 | corefs = {} 86 | for doc_id,doc_amrs in tqdm.tqdm(amrs.items()): 87 | doc_triples = [] 88 | doc_sids = list(doc_amrs.keys()) 89 | sid_done = [] 90 | if add_coref and doc_id in coref_chains: 91 | # get subgraph information for each AMR 92 | subgraphs = {f_id: get_constituents_from_subgraph(doc_amrs[f_id]) for f_id in doc_amrs if doc_amrs[f_id].root is not None and len(doc_amrs[f_id].alignments)>0 } 93 | for ent in coref_chains[doc_id]: 94 | min_id = (None,None) 95 | sen_node_pairs = [] 96 | for mention in ent: 97 | if coref_type=='conll': 98 | sid = mention[0] 99 | elif coref_type=='allennlp': 100 | sid = mention[0]+1 101 | beg = mention[1] 102 | end = mention[2] 103 | sen_id = doc_id+'.'+str(sid) 104 | if sen_id in subgraphs: 105 | node_id = get_node_from_subgraph(subgraphs[sen_id], beg, end,doc_amrs[sen_id],verbose=verbose) 106 | else: 107 | node_id = None 108 | if node_id is None: 109 | if sen_id in sid_done: 110 | if verbose: 111 | print('maybe inter-AMR coref, node not found') 112 | else: 113 | if verbose: 114 | print('node not found') 115 | print(mention) 116 | continue 117 | else: 118 | sid_done.append(sen_id) 119 | 120 | if min_id[0] is None: 121 | min_id = (sid,node_id) 122 | elif sid < min_id[0]: 123 | min_full_id = doc_id+'.'+str(min_id[0]) 124 | sen_node_pairs.append((min_full_id,min_id[1])) 125 | min_id = (sid,node_id) 126 | else: 127 | sen_node_pairs.append((sen_id,node_id)) 128 | 129 | min_full_id = doc_id+'.'+str(min_id[0]) 130 | triples = construct_triples(doc_amrs,min_full_id,min_id[1],sen_node_pairs,relation) 131 | doc_triples.extend(triples) 132 | 133 | corefs[doc_id] = (doc_triples,doc_sids,doc_id) 134 | if save_triples: 135 | with open(out.replace('.amr','.triples'),'wb') as f1: 136 | pickle.dump(corefs,f1) 137 | 138 | return corefs 139 | 140 | 141 | def main(): 142 | 143 | parser = argparse.ArgumentParser() 144 | parser.add_argument('--path_to_coref',type=str,required=True) 145 | parser.add_argument('--path_to_amr',type=str,help='path to folder containing list of amr files per document',required=True) 146 | parser.add_argument('--out_amr',type=str,help='path to output',required=True) 147 | parser.add_argument('--add_id',action='store_true',help='add id to amr') 148 | parser.add_argument('--add_coref',action='store_true',default=False,help='add coref to doc amr') 149 | parser.add_argument('--allennlp',action='store_true',help='coref format is pickled coref chains from AllenNLP') 150 | parser.add_argument('--conll',action='store_true',help='coref format is conll') 151 | parser.add_argument('--event',action='store_true',help='perform event coref') 152 | 
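# the --norm_rep choice below is what damr.normalize() receives in the
# writing loop at the end of main()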
parser.add_argument('--norm_rep',type=str,default='docAMR',help='''normalization format 153 | "no-merge" -- No node merging, only chain-nodes 154 | "merge-names" -- Merge only names 155 | "docAMR" -- Merge names and drop pronouns 156 | "merge-all" -- Merge all nodes''') 157 | parser.add_argument('--verbose',action='store_true',help='Print types of nodes found') 158 | parser.add_argument('--save_triples',action='store_true',help='save triples as pickle') 159 | parser.add_argument('--tokenize',action='store_true',help='::tok not available in the parse') 160 | parser.add_argument('--sort_alpha',action='store_true',help='sort doc names alphabetically; default is False, i.e. sorting numerically') 161 | parser.add_argument('--use_penman',action='store_true',default=True,help='use penman graph to construct doc amr (in the case of wiki)') 162 | parser.add_argument('--path_to_penman',type=str,default=None,help='optional path to folder to get penman amr') 163 | 164 | 165 | args = parser.parse_args() 166 | pat = '*' 167 | amrs = {} 168 | coref = {} 169 | mentions = {} 170 | entities = {} 171 | amrs_dict = {} 172 | event_clusters = {} 173 | events = {} 174 | sort_alpha = False 175 | amrs_penman = {} 176 | amrs_penman_dict = {} 177 | 178 | args.path_to_amr += '/' 179 | assert args.norm_rep in ['docAMR','no-merge','merge-names','merge-all'],'Norm representation should be one of: docAMR, no-merge, merge-names, merge-all' 180 | 181 | if not glob.glob(args.path_to_amr+pat+'.amr'): 182 | if not glob.glob(args.path_to_amr+pat+'.parse'): 183 | raise Exception("--path_to_amr folder does not contain .amr files or .parse files") 184 | else: 185 | sort_alpha = True 186 | filepaths = glob.iglob(args.path_to_amr+pat+'.parse') 187 | else: 188 | filepaths = glob.iglob(args.path_to_amr+pat+'.amr') 189 | 190 | filepaths = list(filepaths) 191 | 192 | # FIXME: sorting of sentence amrs is based on filename; change to a universal sorting method 193 | if 'msamr_df' in filepaths[0].split('/')[-1]: 194 | sort_alpha = True 195 | if sort_alpha or args.sort_alpha: 196 | sorted_filepaths = sorted(filepaths,key=lambda t: t.split('/')[-1].split('.')[0]) 197 | else: 198 | # sort doc_* numerically 199 | sorted_filepaths = sorted(filepaths,key=lambda t: int(t.split('/')[-1].split('.')[0].split('_')[-1])) 200 | 201 | 202 | 203 | for filepath in sorted_filepaths: 204 | doc_id = filepath.split('/')[-1].split('.')[0] 205 | if args.add_id: 206 | amrs[doc_id] = read_amr_add_sen_id(filepath, doc_id,remove_id=args.add_id,tokenize=args.tokenize) 207 | if args.path_to_penman is not None: 208 | amrs_penman[doc_id] = read_amr_add_sen_id(args.path_to_penman+filepath.split('/')[-1], doc_id,remove_id=args.add_id,tokenize=args.tokenize,ibm_format=False) 209 | else: 210 | amrs_penman[doc_id] = read_amr_add_sen_id(filepath, doc_id,remove_id=args.add_id,tokenize=args.tokenize,ibm_format=False) 211 | else: 212 | d_amrs,doc_id = read_amr3_docid(filepath,ibm_format=True) 213 | amrs[doc_id] = d_amrs 214 | if args.path_to_penman is not None: 215 | amrs_penman[doc_id],doc_id = read_amr3_docid(args.path_to_penman+filepath.split('/')[-1],ibm_format=False) 216 | else: 217 | amrs_penman[doc_id],doc_id = read_amr3_docid(filepath,ibm_format=False) 218 | 219 | 
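# flatten the per-document maps into single {sentence-id: AMR} dicts; these
# flat dicts are what make_doc_amrs consumes further down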
amrs_penman_dict.update(amrs_penman[doc_id]) 220 | 221 | amrs_dict.update(amrs[doc_id]) 222 | 223 | 224 | if args.allennlp: 225 | # get coref chains from the AllenNLP SpanBERT coref model 226 | coref_chains = {} 227 | out = pickle.load(open(args.path_to_coref,'rb')) 228 | for i,(doc_id,val) in enumerate(out.items()): 229 | 230 | coref_chains[doc_id] = val 231 | assert len(coref_chains)>0,"Coref file is empty" 232 | corefs = process_coref_conll(amrs,coref_chains,args.add_coref,verbose=args.verbose,save_triples=args.save_triples,out=args.out_amr,coref_type='allennlp') 233 | elif args.conll: 234 | from corefconversion.conll_transform import read_file as conll_read_file 235 | from corefconversion.conll_transform import compute_chains as conll_compute_chains 236 | 237 | coref_chains = {} 238 | out = conll_read_file(args.path_to_coref) 239 | for n,(i,val) in enumerate(out.items()): 240 | 241 | docid_spl = i.split('); part ') 242 | doc_id = docid_spl[0].split('/')[-1]+'_'+str(int(docid_spl[1])) 243 | coref_chains[doc_id] = conll_compute_chains(val) 244 | assert len(coref_chains)>0,"Coref file is empty" 245 | corefs = process_coref_conll(amrs,coref_chains,save_triples=args.save_triples,out=args.out_amr,coref_type='conll') 246 | 247 | 248 | 249 | 250 | 251 | # FIXME: sorting of sentence amrs is based on filename; change to a universal sorting method 252 | if args.add_id and not sort_alpha and not args.sort_alpha: 253 | corefs = collections.OrderedDict(sorted(corefs.items(),key=lambda t: int(t[0].split('.')[0].split('_')[-1]))) 254 | else: 255 | corefs = collections.OrderedDict(sorted(corefs.items(),key=lambda t: t[0].split('.')[0])) 256 | 257 | # use_penman is True by default; the penman format is used to construct the final doc-amr 258 | if args.use_penman: 259 | out_doc_amrs = make_doc_amrs(corefs=corefs,amrs=amrs_penman_dict,chains=False) 260 | else: 261 | out_doc_amrs = make_doc_amrs(corefs=corefs,amrs=amrs_dict,chains=False) 262 | 263 | out_dir = args.out_amr.rsplit('/',1)[0] 264 | if not os.path.isdir(out_dir): 265 | os.makedirs(out_dir) 266 | 267 | #with open(args.out_amr+'/'+doc_id+'_docamr_'+args.norm_rep+'.out', 'w') as fid: 268 | with open(args.out_amr+'/'+args.path_to_amr.split('/')[-1]+'docamr_'+args.norm_rep+'.out', 'w') as fid: 269 | 270 | for doc_id,amr in tqdm.tqdm(out_doc_amrs.items(),desc='writing doc-amrs'): 271 | 272 | damr = copy.deepcopy(amr) 273 | connect_sen_amrs(damr) 274 | damr.make_chains_from_pairs() 275 | damr.normalize(args.norm_rep) 276 | damr_str = str(damr) 277 | 278 | 279 | fid.write(damr_str) 280 | 281 | 282 | if __name__ == "__main__": 283 | main() 284 | -------------------------------------------------------------------------------- /doc_amr_baseline/requirements.txt: -------------------------------------------------------------------------------- 1 | allennlp==2.10.0 2 | allennlp-models==2.10.0 3 | ipdb==0.13.9 4 | penman==1.2.1 5 | tqdm==4.64.0 6 | importlib_metadata==6.6.0 -------------------------------------------------------------------------------- /doc_amr_baseline/run_doc_amr_baseline.sh: -------------------------------------------------------------------------------- 1 | set -o pipefail 2 | set -o errexit 3 | . 
set_environment.sh 4 | HELP="$0 <path_to_sentence_amr> <out_amr> [<norm_rep>] [<path_to_coref>]" 5 | [ -z "$1" ] && echo "$HELP" && exit 1 6 | [ -z "$2" ] && echo "$HELP" && exit 1 7 | 8 | path_to_sentence_amr=$1 9 | out_amr=$2 10 | rep=$3 11 | path_to_coref=$4 12 | set -o nounset 13 | 14 | if [ -z "$rep" ]; then 15 | rep='docAMR' 16 | fi 17 | 18 | if [ -z "$path_to_coref" ]; then 19 | echo "Getting coref for sentences" 20 | coref_filename='allen_spanbert_large-2021.03.10.coref' 21 | python doc_amr_baseline/get_allen_coref.py \ 22 | --path_to_sen $path_to_sentence_amr \ 23 | --from_amr 24 | path_to_coref=$path_to_sentence_amr/$coref_filename 25 | fi 26 | 27 | echo "Doc Level AMRs:" 28 | python doc_amr_baseline/make_doc_amr.py \ 29 | --path_to_coref $path_to_coref \ 30 | --path_to_amr $path_to_sentence_amr \ 31 | --out_amr $out_amr \ 32 | --add_coref \ 33 | --add_id \ 34 | --allennlp \ 35 | --norm_rep $rep \ 36 | --sort_alpha 37 | -------------------------------------------------------------------------------- /doc_amr_baseline/tests/baseline_allennlp_test.sh: -------------------------------------------------------------------------------- 1 | set -o errexit 2 | set -o pipefail 3 | set -o nounset 4 | 5 | [ -z "${1:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 6 | [ -z "${2:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 7 | [ -z "${3:-}" ] && echo "$0 <gold_amr> <sentence_amr> <rep>" && exit 1 8 | 9 | gold_amr=$1 10 | sentence_amr=$2 11 | rep=$3 12 | 13 | # run the baseline on the AMR3 document AMRs test split 14 | bash run_doc_amr_baseline.sh $sentence_amr $sentence_amr $rep 15 | 16 | echo "Computing Smatch" 17 | python docSmatch/smatch.py -r 10 --significant 4 -f $gold_amr ${sentence_amr}/docamr_${rep}.out 18 | 19 | printf "[\033[92m OK \033[0m] $0\n" -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | Penman==1.0.0 2 | ipdb 3 | tqdm -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | VERSION = '0.1.1' 4 | 5 | # this is what usually goes in requirements.txt 6 | install_requires = [ 7 | 'tqdm', 8 | # for scoring 9 | 'penman', 10 | # for debugging 11 | 'ipdb', 12 | ] 13 | 14 | setup( 15 | name='docAMR', 16 | version=VERSION, 17 | description="document AMR representation and evaluation", 18 | py_modules=['doc_amr','amr_io','docSmatch','doc_amr_baseline'], 19 | entry_points={ 20 | 'console_scripts': [ 21 | 'doc-smatch = docSmatch.smatch:main', 22 | 'doc-baseline = doc_amr_baseline.make_doc_amr:main', 23 | 'doc-amr = doc_amr:main' 24 | ] 25 | }, 26 | packages=find_packages(), 27 | install_requires=install_requires, 28 | ) 29 | -------------------------------------------------------------------------------- /test_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/test/msamr_dfa_007.xml 2 | data/multisentence/ms-amr-split/test/msamr_dfa_027.xml 3 | data/multisentence/ms-amr-split/test/msamr_dfa_041.xml 4 | data/multisentence/ms-amr-split/test/msamr_dfa_063.xml 5 | data/multisentence/ms-amr-split/test/msamr_dfa_077.xml 6 | data/multisentence/ms-amr-split/test/msamr_dfa_081.xml 7 | data/multisentence/ms-amr-split/test/msamr_dfa_093.xml 8 | data/multisentence/ms-amr-split/test/msamr_dfa_095.xml 9 | data/multisentence/ms-amr-split/test/msamr_dfa_134.xml 10 | --------------------------------------------------------------------------------
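Note: the *_coref.fof files above and below are plain "file of files" lists, one MS-AMR XML annotation path per line, naming the coreference annotation splits. A minimal reading sketch (the helper name read_fof is ours for illustration, not part of this repo):

def read_fof(fof_path):
    # one annotation path per line; skip blank lines
    with open(fof_path) as fid:
        return [line.strip() for line in fid if line.strip()]

# e.g. read_fof('test_coref.fof')[0] == 'data/multisentence/ms-amr-split/test/msamr_dfa_007.xml'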
/train_coref.fof: -------------------------------------------------------------------------------- 1 | data/multisentence/ms-amr-split/train/msamr_dfa_001.xml 2 | data/multisentence/ms-amr-split/train/msamr_dfa_002.xml 3 | data/multisentence/ms-amr-split/train/msamr_dfa_003.xml 4 | data/multisentence/ms-amr-split/train/msamr_dfa_005.xml 5 | data/multisentence/ms-amr-split/train/msamr_dfa_006.xml 6 | data/multisentence/ms-amr-split/train/msamr_dfa_008.xml 7 | data/multisentence/ms-amr-split/train/msamr_dfa_009.xml 8 | data/multisentence/ms-amr-split/train/msamr_dfa_010.xml 9 | data/multisentence/ms-amr-split/train/msamr_dfa_011.xml 10 | data/multisentence/ms-amr-split/train/msamr_dfa_012.xml 11 | data/multisentence/ms-amr-split/train/msamr_dfa_013.xml 12 | data/multisentence/ms-amr-split/train/msamr_dfa_014.xml 13 | data/multisentence/ms-amr-split/train/msamr_dfa_015.xml 14 | data/multisentence/ms-amr-split/train/msamr_dfa_016.xml 15 | data/multisentence/ms-amr-split/train/msamr_dfa_017.xml 16 | data/multisentence/ms-amr-split/train/msamr_dfa_019.xml 17 | data/multisentence/ms-amr-split/train/msamr_dfa_020.xml 18 | data/multisentence/ms-amr-split/train/msamr_dfa_021.xml 19 | data/multisentence/ms-amr-split/train/msamr_dfa_023.xml 20 | data/multisentence/ms-amr-split/train/msamr_dfa_024.xml 21 | data/multisentence/ms-amr-split/train/msamr_dfa_025.xml 22 | data/multisentence/ms-amr-split/train/msamr_dfa_026.xml 23 | data/multisentence/ms-amr-split/train/msamr_dfa_028.xml 24 | data/multisentence/ms-amr-split/train/msamr_dfa_029.xml 25 | data/multisentence/ms-amr-split/train/msamr_dfa_030.xml 26 | data/multisentence/ms-amr-split/train/msamr_dfa_031.xml 27 | data/multisentence/ms-amr-split/train/msamr_dfa_032.xml 28 | data/multisentence/ms-amr-split/train/msamr_dfa_033.xml 29 | data/multisentence/ms-amr-split/train/msamr_dfa_034.xml 30 | data/multisentence/ms-amr-split/train/msamr_dfa_035.xml 31 | data/multisentence/ms-amr-split/train/msamr_dfa_036.xml 32 | data/multisentence/ms-amr-split/train/msamr_dfa_037.xml 33 | data/multisentence/ms-amr-split/train/msamr_dfa_038.xml 34 | data/multisentence/ms-amr-split/train/msamr_dfa_039.xml 35 | data/multisentence/ms-amr-split/train/msamr_dfa_042.xml 36 | data/multisentence/ms-amr-split/train/msamr_dfa_043.xml 37 | data/multisentence/ms-amr-split/train/msamr_dfa_045.xml 38 | data/multisentence/ms-amr-split/train/msamr_dfa_046.xml 39 | data/multisentence/ms-amr-split/train/msamr_dfa_047.xml 40 | data/multisentence/ms-amr-split/train/msamr_dfa_048.xml 41 | data/multisentence/ms-amr-split/train/msamr_dfa_049.xml 42 | data/multisentence/ms-amr-split/train/msamr_dfa_050.xml 43 | data/multisentence/ms-amr-split/train/msamr_dfa_051.xml 44 | data/multisentence/ms-amr-split/train/msamr_dfa_052.xml 45 | data/multisentence/ms-amr-split/train/msamr_dfa_053.xml 46 | data/multisentence/ms-amr-split/train/msamr_dfa_054.xml 47 | data/multisentence/ms-amr-split/train/msamr_dfa_055.xml 48 | data/multisentence/ms-amr-split/train/msamr_dfa_057.xml 49 | data/multisentence/ms-amr-split/train/msamr_dfa_058.xml 50 | data/multisentence/ms-amr-split/train/msamr_dfa_059.xml 51 | data/multisentence/ms-amr-split/train/msamr_dfa_060.xml 52 | data/multisentence/ms-amr-split/train/msamr_dfa_061.xml 53 | data/multisentence/ms-amr-split/train/msamr_dfa_062.xml 54 | data/multisentence/ms-amr-split/train/msamr_dfa_065.xml 55 | data/multisentence/ms-amr-split/train/msamr_dfa_067.xml 56 | data/multisentence/ms-amr-split/train/msamr_dfa_068.xml 57 | 
data/multisentence/ms-amr-split/train/msamr_dfa_069.xml 58 | data/multisentence/ms-amr-split/train/msamr_dfa_070.xml 59 | data/multisentence/ms-amr-split/train/msamr_dfa_071.xml 60 | data/multisentence/ms-amr-split/train/msamr_dfa_072.xml 61 | data/multisentence/ms-amr-split/train/msamr_dfa_074.xml 62 | data/multisentence/ms-amr-split/train/msamr_dfa_075.xml 63 | data/multisentence/ms-amr-split/train/msamr_dfa_076.xml 64 | data/multisentence/ms-amr-split/train/msamr_dfa_078.xml 65 | data/multisentence/ms-amr-split/train/msamr_dfa_079.xml 66 | data/multisentence/ms-amr-split/train/msamr_dfa_080.xml 67 | data/multisentence/ms-amr-split/train/msamr_dfa_082.xml 68 | data/multisentence/ms-amr-split/train/msamr_dfa_083.xml 69 | data/multisentence/ms-amr-split/train/msamr_dfa_084.xml 70 | data/multisentence/ms-amr-split/train/msamr_dfa_085.xml 71 | data/multisentence/ms-amr-split/train/msamr_dfa_087.xml 72 | data/multisentence/ms-amr-split/train/msamr_dfa_088.xml 73 | data/multisentence/ms-amr-split/train/msamr_dfa_089.xml 74 | data/multisentence/ms-amr-split/train/msamr_dfa_090.xml 75 | data/multisentence/ms-amr-split/train/msamr_dfa_091.xml 76 | data/multisentence/ms-amr-split/train/msamr_dfa_094.xml 77 | data/multisentence/ms-amr-split/train/msamr_dfa_098.xml 78 | data/multisentence/ms-amr-split/train/msamr_dfa_099.xml 79 | data/multisentence/ms-amr-split/train/msamr_dfa_101.xml 80 | data/multisentence/ms-amr-split/train/msamr_dfa_102.xml 81 | data/multisentence/ms-amr-split/train/msamr_dfa_103.xml 82 | data/multisentence/ms-amr-split/train/msamr_dfa_105.xml 83 | data/multisentence/ms-amr-split/train/msamr_dfa_107.xml 84 | data/multisentence/ms-amr-split/train/msamr_dfa_110.xml 85 | data/multisentence/ms-amr-split/train/msamr_dfa_111.xml 86 | data/multisentence/ms-amr-split/train/msamr_dfa_112.xml 87 | data/multisentence/ms-amr-split/train/msamr_dfa_113.xml 88 | data/multisentence/ms-amr-split/train/msamr_dfa_114.xml 89 | data/multisentence/ms-amr-split/train/msamr_dfa_115.xml 90 | data/multisentence/ms-amr-split/train/msamr_dfa_116.xml 91 | data/multisentence/ms-amr-split/train/msamr_dfa_117.xml 92 | data/multisentence/ms-amr-split/train/msamr_dfa_118.xml 93 | data/multisentence/ms-amr-split/train/msamr_dfa_119.xml 94 | data/multisentence/ms-amr-split/train/msamr_dfa_120.xml 95 | data/multisentence/ms-amr-split/train/msamr_dfa_121.xml 96 | data/multisentence/ms-amr-split/train/msamr_dfa_122.xml 97 | data/multisentence/ms-amr-split/train/msamr_dfa_123.xml 98 | data/multisentence/ms-amr-split/train/msamr_dfa_124.xml 99 | data/multisentence/ms-amr-split/train/msamr_dfa_125.xml 100 | data/multisentence/ms-amr-split/train/msamr_dfa_126.xml 101 | data/multisentence/ms-amr-split/train/msamr_dfa_127.xml 102 | data/multisentence/ms-amr-split/train/msamr_dfa_128.xml 103 | data/multisentence/ms-amr-split/train/msamr_dfa_129.xml 104 | data/multisentence/ms-amr-split/train/msamr_dfa_130.xml 105 | data/multisentence/ms-amr-split/train/msamr_dfa_131.xml 106 | data/multisentence/ms-amr-split/train/msamr_dfa_132.xml 107 | data/multisentence/ms-amr-split/train/msamr_dfa_133.xml 108 | data/multisentence/ms-amr-split/train/msamr_dfa_135.xml 109 | data/multisentence/ms-amr-split/train/msamr_dfa_136.xml 110 | data/multisentence/ms-amr-split/train/msamr_dfa_137.xml 111 | data/multisentence/ms-amr-split/train/msamr_dfa_139.xml 112 | data/multisentence/ms-amr-split/train/msamr_dfa_140.xml 113 | data/multisentence/ms-amr-split/train/msamr_dfa_141.xml 114 | data/multisentence/ms-amr-split/train/msamr_dfa_142.xml 115 | 
data/multisentence/ms-amr-split/train/msamr_dfa_143.xml 116 | data/multisentence/ms-amr-split/train/msamr_dfa_144.xml 117 | data/multisentence/ms-amr-split/train/msamr_dfa_145.xml 118 | data/multisentence/ms-amr-split/train/msamr_dfa_146.xml 119 | data/multisentence/ms-amr-split/train/msamr_dfb_001.xml 120 | data/multisentence/ms-amr-split/train/msamr_dfb_004.xml 121 | data/multisentence/ms-amr-split/train/msamr_dfb_005.xml 122 | data/multisentence/ms-amr-split/train/msamr_dfb_007.xml 123 | data/multisentence/ms-amr-split/train/msamr_dfb_008.xml 124 | data/multisentence/ms-amr-split/train/msamr_dfb_009.xml 125 | data/multisentence/ms-amr-split/train/msamr_dfb_011.xml 126 | data/multisentence/ms-amr-split/train/msamr_dfb_012.xml 127 | data/multisentence/ms-amr-split/train/msamr_dfb_014.xml 128 | data/multisentence/ms-amr-split/train/msamr_dfb_016.xml 129 | data/multisentence/ms-amr-split/train/msamr_dfb_017.xml 130 | data/multisentence/ms-amr-split/train/msamr_dfb_018.xml 131 | data/multisentence/ms-amr-split/train/msamr_dfb_020.xml 132 | data/multisentence/ms-amr-split/train/msamr_dfb_023.xml 133 | data/multisentence/ms-amr-split/train/msamr_dfb_024.xml 134 | data/multisentence/ms-amr-split/train/msamr_dfb_025.xml 135 | data/multisentence/ms-amr-split/train/msamr_dfb_027.xml 136 | data/multisentence/ms-amr-split/train/msamr_dfb_029.xml 137 | data/multisentence/ms-amr-split/train/msamr_dfb_030.xml 138 | data/multisentence/ms-amr-split/train/msamr_dfb_031.xml 139 | data/multisentence/ms-amr-split/train/msamr_dfb_033.xml 140 | data/multisentence/ms-amr-split/train/msamr_dfb_034.xml 141 | data/multisentence/ms-amr-split/train/msamr_dfb_035.xml 142 | data/multisentence/ms-amr-split/train/msamr_dfb_036.xml 143 | data/multisentence/ms-amr-split/train/msamr_dfb_037.xml 144 | data/multisentence/ms-amr-split/train/msamr_dfb_039.xml 145 | data/multisentence/ms-amr-split/train/msamr_dfb_040.xml 146 | data/multisentence/ms-amr-split/train/msamr_dfb_042.xml 147 | data/multisentence/ms-amr-split/train/msamr_dfb_043.xml 148 | data/multisentence/ms-amr-split/train/msamr_dfb_044.xml 149 | data/multisentence/ms-amr-split/train/msamr_dfb_045.xml 150 | data/multisentence/ms-amr-split/train/msamr_dfb_046.xml 151 | data/multisentence/ms-amr-split/train/msamr_dfb_047.xml 152 | data/multisentence/ms-amr-split/train/msamr_dfb_048.xml 153 | data/multisentence/ms-amr-split/train/msamr_dfb_049.xml 154 | data/multisentence/ms-amr-split/train/msamr_dfb_050.xml 155 | data/multisentence/ms-amr-split/train/msamr_dfb_051.xml 156 | data/multisentence/ms-amr-split/train/msamr_dfb_052.xml 157 | data/multisentence/ms-amr-split/train/msamr_dfb_053.xml 158 | data/multisentence/ms-amr-split/train/msamr_dfb_056.xml 159 | data/multisentence/ms-amr-split/train/msamr_dfb_058.xml 160 | data/multisentence/ms-amr-split/train/msamr_dfb_059.xml 161 | data/multisentence/ms-amr-split/train/msamr_dfb_060.xml 162 | data/multisentence/ms-amr-split/train/msamr_dfb_062.xml 163 | data/multisentence/ms-amr-split/train/msamr_dfb_063.xml 164 | data/multisentence/ms-amr-split/train/msamr_dfb_064.xml 165 | data/multisentence/ms-amr-split/train/msamr_dfb_065.xml 166 | data/multisentence/ms-amr-split/train/msamr_dfb_067.xml 167 | data/multisentence/ms-amr-split/train/msamr_dfb_068.xml 168 | data/multisentence/ms-amr-split/train/msamr_dfb_069.xml 169 | data/multisentence/ms-amr-split/train/msamr_dfb_070.xml 170 | data/multisentence/ms-amr-split/train/msamr_dfb_071.xml 171 | data/multisentence/ms-amr-split/train/msamr_dfb_072.xml 172 | 
data/multisentence/ms-amr-split/train/msamr_dfb_073.xml 173 | data/multisentence/ms-amr-split/train/msamr_dfb_074.xml 174 | data/multisentence/ms-amr-split/train/msamr_dfb_075.xml 175 | data/multisentence/ms-amr-split/train/msamr_dfb_076.xml 176 | data/multisentence/ms-amr-split/train/msamr_dfb_077.xml 177 | data/multisentence/ms-amr-split/train/msamr_dfb_078.xml 178 | data/multisentence/ms-amr-split/train/msamr_dfb_079.xml 179 | data/multisentence/ms-amr-split/train/msamr_dfb_080.xml 180 | data/multisentence/ms-amr-split/train/msamr_dfb_081.xml 181 | data/multisentence/ms-amr-split/train/msamr_dfb_082.xml 182 | data/multisentence/ms-amr-split/train/msamr_dfb_083.xml 183 | data/multisentence/ms-amr-split/train/msamr_dfb_084.xml 184 | data/multisentence/ms-amr-split/train/msamr_dfb_085.xml 185 | data/multisentence/ms-amr-split/train/msamr_dfb_086.xml 186 | data/multisentence/ms-amr-split/train/msamr_dfb_087.xml 187 | data/multisentence/ms-amr-split/train/msamr_dfb_088.xml 188 | data/multisentence/ms-amr-split/train/msamr_dfb_089.xml 189 | data/multisentence/ms-amr-split/train/msamr_dfb_090.xml 190 | data/multisentence/ms-amr-split/train/msamr_dfb_091.xml 191 | data/multisentence/ms-amr-split/train/msamr_dfb_092.xml 192 | data/multisentence/ms-amr-split/train/msamr_dfb_093.xml 193 | data/multisentence/ms-amr-split/train/msamr_dfb_094.xml 194 | data/multisentence/ms-amr-split/train/msamr_dfb_095.xml 195 | data/multisentence/ms-amr-split/train/msamr_dfb_096.xml 196 | data/multisentence/ms-amr-split/train/msamr_dfb_097.xml 197 | data/multisentence/ms-amr-split/train/msamr_dfb_098.xml 198 | data/multisentence/ms-amr-split/train/msamr_dfb_099.xml 199 | data/multisentence/ms-amr-split/train/msamr_dfb_100.xml 200 | data/multisentence/ms-amr-split/train/msamr_dfb_101.xml 201 | data/multisentence/ms-amr-split/train/msamr_dfb_102.xml 202 | data/multisentence/ms-amr-split/train/msamr_dfb_103.xml 203 | data/multisentence/ms-amr-split/train/msamr_dfb_104.xml 204 | data/multisentence/ms-amr-split/train/msamr_dfb_105.xml 205 | data/multisentence/ms-amr-split/train/msamr_dfb_106.xml 206 | data/multisentence/ms-amr-split/train/msamr_dfb_107.xml 207 | data/multisentence/ms-amr-split/train/msamr_dfb_108.xml 208 | data/multisentence/ms-amr-split/train/msamr_dfb_109.xml 209 | data/multisentence/ms-amr-split/train/msamr_dfb_110.xml 210 | data/multisentence/ms-amr-split/train/msamr_dfb_111.xml 211 | data/multisentence/ms-amr-split/train/msamr_dfb_112.xml 212 | data/multisentence/ms-amr-split/train/msamr_dfb_113.xml 213 | data/multisentence/ms-amr-split/train/msamr_dfb_114.xml 214 | data/multisentence/ms-amr-split/train/msamr_dfb_115.xml 215 | data/multisentence/ms-amr-split/train/msamr_dfb_116.xml 216 | data/multisentence/ms-amr-split/train/msamr_dfb_117.xml 217 | data/multisentence/ms-amr-split/train/msamr_dfb_118.xml 218 | data/multisentence/ms-amr-split/train/msamr_dfb_119.xml 219 | data/multisentence/ms-amr-split/train/msamr_dfb_120.xml 220 | data/multisentence/ms-amr-split/train/msamr_dfb_122.xml 221 | data/multisentence/ms-amr-split/train/msamr_dfb_123.xml 222 | data/multisentence/ms-amr-split/train/msamr_dfb_124.xml 223 | data/multisentence/ms-amr-split/train/msamr_dfb_125.xml 224 | data/multisentence/ms-amr-split/train/msamr_dfb_126.xml 225 | data/multisentence/ms-amr-split/train/msamr_dfb_127.xml 226 | data/multisentence/ms-amr-split/train/msamr_dfb_128.xml 227 | data/multisentence/ms-amr-split/train/msamr_dfb_129.xml 228 | data/multisentence/ms-amr-split/train/msamr_dfb_130.xml 229 | 
data/multisentence/ms-amr-split/train/msamr_dfb_131.xml 230 | data/multisentence/ms-amr-split/train/msamr_dfb_132.xml 231 | data/multisentence/ms-amr-split/train/msamr_dfb_135.xml 232 | data/multisentence/ms-amr-split/train/msamr_wb_001.xml 233 | data/multisentence/ms-amr-split/train/msamr_wb_002.xml 234 | data/multisentence/ms-amr-split/train/msamr_wb_003.xml 235 | data/multisentence/ms-amr-split/train/msamr_wb_004.xml 236 | data/multisentence/ms-amr-split/train/msamr_wb_005.xml 237 | data/multisentence/ms-amr-split/train/msamr_wb_006.xml 238 | data/multisentence/ms-amr-split/train/msamr_wb_007.xml 239 | data/multisentence/ms-amr-split/train/msamr_wb_008.xml 240 | data/multisentence/ms-amr-split/train/msamr_wb_009.xml 241 | data/multisentence/ms-amr-split/train/msamr_wb_010.xml 242 | data/multisentence/ms-amr-split/train/msamr_wb_011.xml 243 | --------------------------------------------------------------------------------
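For orientation, here is a minimal end-to-end sketch of the baseline pipeline driven from Python instead of run_doc_amr_baseline.sh. The input path and document name are assumptions for illustration; the functions are the ones defined above (read_amr3_docid in doc_amr_baseline/baseline_io.py, process_coref_conll in doc_amr_baseline/make_doc_amr.py, make_doc_amrs and connect_sen_amrs imported from doc_amr), and the coref pickle is the file written by get_allen_coref.py:

import copy
import pickle

from doc_amr_baseline.baseline_io import read_amr3_docid
from doc_amr_baseline.make_doc_amr import process_coref_conll
from doc_amr import make_doc_amrs, connect_sen_amrs

# sentence AMRs for one document, read twice: with the IBM ::node/::edge
# metadata for coref attachment, and as plain penman for the final doc-AMR
amrs, doc_id = read_amr3_docid('sentences/doc_1.amr', ibm_format=True)   # assumed path
amrs_penman, _ = read_amr3_docid('sentences/doc_1.amr', ibm_format=False)

# coref chains as pickled by get_allen_coref.py:
# {doc_id: [[[sent_idx, beg_tok, end_tok], ...], ...]}
with open('sentences/allen_spanbert_large-2021.03.10.coref', 'rb') as fid:
    coref_chains = pickle.load(fid)

# map each coref mention to an AMR node and build cross-sentence triples
corefs = process_coref_conll({doc_id: amrs}, coref_chains, coref_type='allennlp')

# merge each document's sentence AMRs under one root and normalize to docAMR
for _, amr in make_doc_amrs(corefs=corefs, amrs=amrs_penman, chains=False).items():
    damr = copy.deepcopy(amr)
    connect_sen_amrs(damr)
    damr.make_chains_from_pairs()
    damr.normalize('docAMR')
    print(str(damr))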