├── .appveyor.yml
├── .travis.yml
├── CHANGELOG.md
├── LICENSE.txt
├── README.md
├── __init__.py
├── amr.py
├── pyproject.toml
├── sample_file_list
├── setup.py
├── smatch-table.py
├── smatch.py
├── smatch_tool_guideline.txt
├── test_input1.txt
├── test_input2.txt
├── tests
    ├── README.md
    └── test_top.py
└── tox.ini


/.appveyor.yml:
--------------------------------------------------------------------------------
 1 | environment:
 2 |   matrix:
 3 |   - TOXENV: py35
 4 |   - TOXENV: py36
 5 |   - TOXENV: py37
 6 |   - TOXENV: py38
 7 | 
 8 | build: off
 9 | 
10 | install:
11 | - pip install tox
12 | 
13 | test_script:
14 | - tox
15 | 
16 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | language: python
 2 | python:
 3 | - '3.5'
 4 | - '3.6'
 5 | - '3.7'
 6 | - '3.8'
 7 | install: pip install tox-travis
 8 | script: tox
 9 | deploy:
10 |   provider: pypi
11 |   user: __token__
12 |   password:
13 |     secure: jXgeZ3fjIODIltATnt487VOBf6TIOp9bpd79r/wUG373pa6GVrWKEbX1LBAprmSW7dI+pYccQpeNF5OIsUJFFNjy59iLlKtfY+9N8ugD7rvuQL4pP+jwiVvuKB9SLyLwzJWrArsmVOju9JQUogtKXPJiflY1PE4fGV/Zjx+FNl6816HO8i2uHaLOBzDbNXtOCnAmioaM72cQ/qvtbKdkxF7u7xtAwqA8TEBeaf6PYR2b1mn8Vcgo7pG/uJGZtICPVINxv914jDpu0CDcWl3rmtVM1Q44XxKLV3c+6QUeonTy5s+35plDiWnI0F0HJiGfS51NYvUUaZZQKuNbIKg27dJcfhuzZDarD4C/P6fjN8NSdmB/B4P058pBkUTmmaeaLBDENBOlJAwRwHupRvsnWZ9xazw1edEVo2tnFPhAZJ+BBDv9lISiNg2kiQx6+5/BbnecjS4lyz3Kje+QljocDXHwshsSl8FxnmJ6TWI21yJJauKM4LGxHGyCrz0W1AsYw+S/cwppC1/BzS9ZLW5DdUeVzULGKzY1MqZJ7ABt70MLBMmrV5VjpUjtTS9dqVBMSvCxshR2hOM47x4wEzG9yQec4aDvHImEPOMXKHkzk6fVhm5qO9IxJTPldYrLcR/UFpcFdHKelIReqTQsjzaEI7CMCicux/DDpRt9msrCeQY=
14 |   on:
15 |     tags: true
16 |   skip_existing: true
17 |   skip_cleanup: true
18 | 


--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
  1 | 
  2 | # Changelog
  3 | 
  4 | ## Unreleased
  5 | 
  6 | (no unreleased changes yet)
  7 | 
  8 | ## [1.0.4]
  9 | 
 10 | **Release date: 2020-05-30**
 11 | 
 12 | * Python 2 support is removed ([#31])
 13 | * Python 3.8 support is added
 14 | 
 15 | ## [1.0.3]
 16 | 
 17 | **Release date: 2020-05-30**
 18 | 
 19 | > Note: due to an issue with deployment, this was released as v1.0.3.2
 20 | 
 21 | * Smatch version is taken from git tag (see [#22])
 22 | * Add CI/CD configuration (see PR [#23])
 23 | * Better handle deinversion of special roles ([#10])
 24 | * Get `smatch-table.py` working again (part of PR [#27])
 25 | * Add `tests/` subdirectory and `test_top.py` ([#25])
 26 | * Make TOP relation's value a constant string and not the top node's
 27 |   concept to avoid double-penalizing different top concepts ([#25])
 28 | 
 29 | ## [1.0.2]
 30 | 
 31 | **Release date: 2019-12-24**
 32 | 
 33 | * Windows support ([#19])
 34 | 
 35 | ## [1.0.1]
 36 | 
 37 | **Release date: 2018-08-21**
 38 | 
 39 | * Refactor to allow programmatic usage ([#14])
 40 | 
 41 | 
 42 | ## [1.0]
 43 | 
 44 | **Release date: 2018-08-12**
 45 | 
 46 | * Add Smatch to PyPI ([#5])
 47 | 
 48 | 
 49 | ## Pre-1.0
 50 | 
 51 | The following are taken from an old `update_log` file:
 52 | 
 53 | * Update: 01/08/2017
 54 | 
 55 |   Person involved: Shu Cai
 56 | 
 57 |   Some refactoring and cleanup for easier debugging and better code quality.
 58 |   This change does not affect the functionality of smatch.
 59 | 
 60 | * Update: 12/18/2016
 61 | 
 62 |   Person involved: Shu Cai
 63 | 
 64 |   Add an error message for AMR parsing error, and fix a bug introduced
 65 |   by a typo in the previous commit.
 66 | 
 67 | * Update: 12/14/2016
 68 | 
 69 |   Person involved: Shu Cai
 70 | 
 71 |   Fix a bug introduced on 11/06/2016: multiple relationships with the
 72 |   same name were not supported.
 73 | 
 74 |   Thanks Miguel Ballesteros (miguel.ballesteros@ibm.com) for bringing this up.
 75 | 
 76 | * Update: 11/14/2016
 77 | 
 78 |   Person involved: Jon May
 79 | 
 80 |   Fix a bug where quoted and unquoted strings matched (an old bug that was
 81 |   propagated to GitHub). Thanks William Dolan.
 82 | 
 83 | * Update: 11/06/2016
 84 | 
 85 |   Person involved: Shu Cai
 86 | 
 87 |   Fix a bug: multiple relationships between the same two nodes were not supported.
 88 | 
 89 |   Thanks Marco Damonte (s1333293@sms.ed.ac.uk) for finding this bug!
 90 | 
 91 | * Update: 1/9/2016
 92 | 
 93 |   Person involved: Guntis Barzdins and Didzis Gosko
 94 | 
 95 |   Fixed a small crash bug.
 96 | 
 97 | * Update: 12/21/2015
 98 | 
 99 |   Person involved: Jon May
100 | 
101 |   Fixed treatment of quoted strings so that special characters can
102 |   actually be part of the string.
103 | 
104 |   Empty double-quoted strings are also allowed.
105 | 
106 | * Update: 01/18/2015
107 | 
108 |   Person involved: Shu Cai
109 | 
110 |   Code cleanup and bug fix. Add detailed comment to the code.
111 | 
112 |   Thanks Yoav Artzi (yoav@cs.washington.edu) for finding a bug and
113 |   fixing it.
114 | 
115 | * Update: 04/04/2013
116 | 
117 |   Person involved: Shu Cai
118 | 
119 |   Add Software_architecture.pdf. Minor changes to smatch.py and
120 |   smatch-table.py (comments, and a new --pr option).
121 | 
122 |   Minor changes to the README.txt and smatch_guide.pdf
123 | 
124 | * Update: 03/20/2013
125 | 
126 |   Person involved: Shu Cai
127 | 
128 |   Minor changes to the documents: smatch_guide.txt and smatch_guide.pdf
129 | 
130 | * Update: 03/19/2013
131 | 
132 |   Person involved: Shu Cai
133 | 
134 |   Document update. The latest documents are smatch_guide.txt and
135 |   smatch_guide.pdf (same content)
136 | 
137 |   Add some sample files to the directory: sample_file_list,
138 |   test_input1, test_input2
139 | 
140 | * Update: 03/17/2013
141 | 
142 |   Person involved: Shu Cai
143 | 
144 |   Interface change of smatch.py and smatch-table.py. This version no
145 |   longer requires esem-format-check.pl (all versions before v0.5
146 |   required esem-format-check.pl to check the format of AMR);
147 |   instead, it needs amr.py.
148 | 
149 |   It now accepts one-AMR-per-line format as well as other formats of AMR.
150 | 
151 |   smatch.py is now equivalent to smatch-v0.5.py
152 |   smatch-table.py is now equivalent to smatch-table-v0.3.py
153 | 
154 | * Update: 09/14/2012
155 | 
156 |   Person involved: Shu Cai
157 | 
158 |   Bug fix of smatch.py and smatch-table.py. smatch-v0.1.py,
159 |   smatch-v0.2.py, smatch-v0.3.py, smatch-v0.4.py, smatch-table-v0.1.py, and
160 |   smatch-table-v0.2.py were created.
161 | 
162 |   smatch.py is now equivalent to smatch-v0.4.py
163 | 
164 |   smatch-table.py is now equivalent to smatch-table-v0.2.py
165 | 
166 |   smatch.py runs with a smart initialization, which matches words with
167 |   the same value first, then randomly selects other variable
168 |   mappings. Four restarts are applied.
169 | 
170 | * Update: 08/22/2012
171 | 
172 |   Person involved: Shu Cai
173 | 
174 |   Minor bug fix of smatch.py. smatch-v2.py was created.
175 | 
176 |   - smatch.py-> smatch-v1.py
177 |   - smatch-v2.py-> smatch.py
178 | 
179 |   No change of interface
180 | 
181 | 
182 | [1.0.4]: https://pypi.org/project/smatch/1.0.4/
183 | [1.0.3]: https://pypi.org/project/smatch/1.0.3.2/
184 | [1.0.2]: https://pypi.org/project/smatch/1.0.2/
185 | [1.0.1]: https://pypi.org/project/smatch/1.0.1/
186 | [1.0]: https://pypi.org/project/smatch/1.0.post2/
187 | 
188 | [#5]: https://github.com/snowblink14/smatch/issues/5
189 | [#7]: https://github.com/snowblink14/smatch/issues/7
190 | [#10]: https://github.com/snowblink14/smatch/issues/10
191 | [#14]: https://github.com/snowblink14/smatch/issues/14
192 | [#19]: https://github.com/snowblink14/smatch/issues/19
193 | [#22]: https://github.com/snowblink14/smatch/issues/22
194 | [#23]: https://github.com/snowblink14/smatch/pull/23
195 | [#25]: https://github.com/snowblink14/smatch/pull/25
196 | [#27]: https://github.com/snowblink14/smatch/pull/27
197 | [#31]: https://github.com/snowblink14/smatch/issues/31
198 | 
199 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
1 | Copyright (C) 2015 Shu Cai and Kevin Knight
2 | 
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4 | 
5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6 | 
7 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Smatch (semantic match) tool
  2 | 
  3 | [![Version on PyPI](https://img.shields.io/pypi/v/smatch)](https://pypi.org/project/smatch/)
  4 | ![Python Support](https://img.shields.io/pypi/pyversions/smatch)
  5 | 
  6 | [Smatch](http://amr.isi.edu/evaluation.html) is an evaluation tool for
  7 | [AMR](http://amr.isi.edu/) (Abstract Meaning Representation). It
  8 | computes the Smatch score (defined below) of two AMR graphs in terms
  9 | of their matching triples (edges) by finding a variable (node) mapping
 10 | that maximizes the number of matching triples. Given that mapping:
 11 | 
 12 | * `M` is the number of matching triples
 13 | * `T` is the total number of triples in the first AMR
 14 | * `G` is the total number of triples in the second AMR
 15 | * Precision is defined as `P = M/T`
 16 | * Recall is defined as `R = M/G`
 17 | * The Smatch score is the F-score: `F = 2 * (P*R)/(P+R)`
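The score definitions above reduce to a few lines of arithmetic. The Python sketch below is illustrative only, with these assumptions:

* `smatch_scores` is a hypothetical helper, not a function in this repository.
* It takes the counts `M`, `T`, and `G` as given and does not compute the variable mapping.
* Returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def smatch_scores(m, t, g):
    """Return (precision, recall, f_score) from triple counts.

    m: number of matching triples (M)
    t: number of triples in the first AMR (T)
    g: number of triples in the second AMR (G)
    Zero denominators yield 0.0 (an assumption of this sketch).
    """
    precision = m / t if t else 0.0
    recall = m / g if g else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts: 6 matching triples, 8 in the first AMR, 10 in the second.
p, r, f = smatch_scores(6, 8, 10)
print(round(p, 4), round(r, 4), round(f, 4))
# prints: 0.75 0.6 0.6667
```

When `M > 0`, the F-score simplifies to `2M/(T+G)` (here 12/18). This explains why swapping `T` and `G` swaps precision and recall but leaves the Smatch score unchanged.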
 18 | 
 19 | For more information, see [Cai and Knight,
 20 | 2013](https://amr.isi.edu/smatch-13.pdf).
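To make "a variable mapping that maximizes the number of matching triples" concrete, the Python sketch below tries every injective, possibly partial, mapping from the first AMR's variables to the second's.

Exhaustive search is swapped in here only for illustration and is feasible only for toy graphs. `smatch.py` instead uses a heuristic search with restarts (the hill-climbing described in Cai and Knight, 2013; compare the `-r` option), which is not guaranteed to find the optimum.

Other assumptions:

* The `(relation, source, target)` triple format loosely follows `AMR.get_triples2()` in `amr.py`.
* Both function names are hypothetical.
* The TOP relation is omitted.
* Constants are assumed never to coincide with the second AMR's variable names.

```python
from collections import Counter
from itertools import product


def count_matches(triples1, triples2, vars1, mapping):
    """Count triples of AMR 1 that also occur in AMR 2 under `mapping`.

    mapping: dict from each variable of AMR 1 to a variable of AMR 2, or None.
    """
    def translate(x):
        if x in vars1:
            target = mapping.get(x)
            # an unmapped variable becomes a token that can never match
            return target if target is not None else ("unmapped", x)
        return x  # constants (concepts, attribute values) are compared as-is

    mapped = Counter((rel, translate(a), translate(b)) for rel, a, b in triples1)
    # multiset intersection: each triple of AMR 2 can be matched at most once
    return sum((mapped & Counter(triples2)).values())


def best_mapping(triples1, vars1, triples2, vars2):
    """Try every injective partial mapping; only feasible for tiny graphs."""
    best_count, best_map = -1, None
    for choice in product(list(vars2) + [None], repeat=len(vars1)):
        used = [v for v in choice if v is not None]
        if len(used) != len(set(used)):
            continue  # two variables share a target: mapping is not injective
        mapping = dict(zip(vars1, choice))
        count = count_matches(triples1, triples2, set(vars1), mapping)
        if count > best_count:
            best_count, best_map = count, mapping
    return best_count, best_map


# (w / want-01 :ARG0 (b / boy)) versus (x / want-01 :ARG0 (y / girl))
amr1 = [("instance", "w", "want-01"), ("instance", "b", "boy"), ("ARG0", "w", "b")]
amr2 = [("instance", "x", "want-01"), ("instance", "y", "girl"), ("ARG0", "x", "y")]
print(best_mapping(amr1, ["w", "b"], amr2, ["x", "y"]))
# prints: (2, {'w': 'x', 'b': 'y'})
```

For this pair `M = 2` and `T = G = 3`, so precision, recall, and F-score are all 2/3.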
 21 | 
 22 | ## Requirements, Installation, and Usage
 23 | 
 24 | This Smatch implementation is tested with Python 3.5 or higher. It is
 25 | released [on PyPI](https://pypi.org/project/smatch/) so you can
 26 | install it with `pip`:
 27 | 
 28 | ``` console
 29 | $ pip install smatch
 30 | ```
 31 | 
 32 | You can also clone this repository and run the `smatch.py` script
 33 | directly, since it does not need to be installed.
 34 | 
 35 | To use the script, run it with at least the `-f` option, which takes
 36 | two filename arguments:
 37 | 
 38 | ``` console
 39 | $ smatch.py -f test.amr gold.amr
 40 | ```
 41 | 
 42 | Note that the order of these arguments does not matter for the Smatch
 43 | score as the F-score is symmetric, but swapping the arguments will
 44 | swap the precision and recall. The files contain AMRs separated by a
 45 | blank line, with comment lines starting with `#` (see
 46 | [`test_input1.txt`](test_input1.txt) for an example).
 47 | 
 48 | For other options, try `smatch.py --help`.
 49 | 
 50 | ## Citation
 51 | 
 52 | ```bibtex
 53 | @inproceedings{cai-knight-2013-smatch,
 54 |     title = "{S}match: an Evaluation Metric for Semantic Feature Structures",
 55 |     author = "Cai, Shu and Knight, Kevin",
 56 |     booktitle = "Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
 57 |     month = aug,
 58 |     year = "2013",
 59 |     address = "Sofia, Bulgaria",
 60 |     publisher = "Association for Computational Linguistics",
 61 |     url = "https://www.aclweb.org/anthology/P13-2131",
 62 |     pages = "748--752",
 63 | }
 64 | ```
 65 | 
 66 | ## Recommendations for Reproducible Research
 67 | 
 68 | You can help make your research reproducible by including the
 69 | following information in your writing:
 70 | 
 71 | * The software version (e.g., repository URL and version number)
 72 | * The number of restarts (`-r`) used, even if unchanged from the default
 73 | * The order of the arguments to `-f` (if reporting precision and recall)
 74 | * Any other options or preprocessing steps
 75 | 
 76 | 
 77 | ## History
 78 | 
 79 | The code was mostly developed during 2012 and 2013, and has undergone
 80 | many fixes and updates. Note that the versions distributed for
 81 | [SemEval-2016](http://alt.qcri.org/semeval2016/task8/index.php?id=data-and-tools)
 82 | were numbered 2.0&ndash;2.0.2, but these predate this repository and
 83 | the [1.0 series on
 84 | PyPI](https://pypi.org/project/smatch/#history). For more details, see
 85 | the [Changelog](CHANGELOG.md).
 86 | 
 87 | 
 88 | ## Related Projects
 89 | 
 90 | Here are some notable forks of Smatch:
 91 | 
 92 | * [didzis/pSMATCH](https://github.com/didzis/pSMATCH) adds
 93 |   parallelization for speed
 94 | * [isi-nlp/smatch](https://github.com/isi-nlp/smatch) adds an ILP
 95 |   solver for getting optimal variable mappings
 96 | * [cfmrp/mtool](https://github.com/cfmrp/mtool) packages the version
 97 |   of Smatch used for the [MRP](http://mrp.nlpl.eu/) shared task at [CoNLL
 98 |   2019](http://www.conll.org/2019)
 99 | 
100 | And here are other evaluation metrics for AMR:
101 | 
102 | * [mdtux89/amr-evaluation](https://github.com/mdtux89/amr-evaluation)
103 |   offers a set of metrics based on Smatch for fine-grained evaluation
104 | * [freesunshine0316/sembleu](https://github.com/freesunshine0316/sembleu)
105 |   is inspired by BLEU and puts more weight on "content" than
106 |   graph-structure similarity
107 | * [rafaelanchieta/sema](https://github.com/rafaelanchieta/sema/)
108 |   weights error types differently and does not consider which node is
109 |   the graph's top
110 | * [flipz357/amr-metric-suite](https://github.com/flipz357/amr-metric-suite)
111 |   provides Soft Smatch (S^2match), which computes the Smatch alignment and graph
112 |   similarity with graded concept matching via GloVe embeddings (e.g., enemy vs. foe)
113 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/snowblink14/smatch/c01a838f39f25e7a424a65337dbd5f7540cc20f8/__init__.py


--------------------------------------------------------------------------------
/amr.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # -*- coding: utf-8 -*-
  3 | 
  4 | """
  5 | AMR (Abstract Meaning Representation) structure
  6 | For a detailed description of AMR, see http://www.isi.edu/natural-language/amr/a.pdf
  7 | 
  8 | """
  9 | 
 10 | from __future__ import print_function
 11 | from collections import defaultdict
 12 | import sys
 13 | 
 14 | # change this if needed
 15 | ERROR_LOG = sys.stderr
 16 | 
 17 | # change this if needed
 18 | DEBUG_LOG = sys.stderr
 19 | 
 20 | 
 21 | class AMR(object):
 22 |     """
 23 |     AMR is a rooted, labeled graph to represent semantics.
 24 |     This class has the following members:
 25 |     nodes: list of nodes in the graph. Its ith element is the name of the ith node. For example, a node name
 26 |            could be "a1", "b", "g2", etc.
 27 |     node_values: list of node labels (values) of the graph. Its ith element is the value associated with node i in
 28 |                  the nodes list. In AMR, such a value is usually a semantic concept (e.g. "boy", "want-01")
 29 |     root: root node name
 30 |     relations: edges connecting two nodes in the graph. Its ith element is the list of (relation name, other node
 31 |                name) pairs for node i, each denoting the triple <relation name, node i name, other node name>. In AMR,
 32 |                such a link denotes the relation between two concepts, e.g. "arg0" means one is the 0th argument of the other.
 33 |     attributes: edges connecting a node to a constant value. Its ith element is the list of (attribute name,
 34 |                attribute value) pairs for node i. For example, if the polarity of some node is negative, the triple
 35 |                <polarity, node name, -> represents it. An attribute can also be viewed as a relation.
 36 | 
 37 |     """
 38 |     def __init__(self, node_list=None, node_value_list=None, relation_list=None, attribute_list=None):
 39 |         """
 40 |         node_list: names of nodes in AMR graph, e.g. "a11", "n"
 41 |         node_value_list: values of nodes in AMR graph, e.g. "group" for a node named "g"
 42 |         relation_list: list of relations between two nodes
 43 |         attribute_list: list of attributes (links between one node and one constant value)
 44 | 
 45 |         """
 46 |         # initialize AMR graph nodes using list of nodes name
 47 |         # root, by default, is the first in var_list
 48 | 
 49 |         if node_list is None:
 50 |             self.nodes = []
 51 |             self.root = None
 52 |         else:
 53 |             self.nodes = node_list[:]
 54 |             if len(node_list) != 0:
 55 |                 self.root = node_list[0]
 56 |             else:
 57 |                 self.root = None
 58 |         if node_value_list is None:
 59 |             self.node_values = []
 60 |         else:
 61 |             self.node_values = node_value_list[:]
 62 |         if relation_list is None:
 63 |             self.relations = []
 64 |         else:
 65 |             self.relations = relation_list[:]
 66 |         if attribute_list is None:
 67 |             self.attributes = []
 68 |         else:
 69 |             self.attributes = attribute_list[:]
 70 | 
 71 |     def rename_node(self, prefix):
 72 |         """
 73 |         Rename AMR graph nodes to prefix + node_index to avoid nodes with the same name in two different AMRs.
 74 | 
 75 |         """
 76 |         node_map_dict = {}
 77 |         # map each node to its new name (e.g. "a1")
 78 |         for i in range(0, len(self.nodes)):
 79 |             node_map_dict[self.nodes[i]] = prefix + str(i)
 80 |         # update node name
 81 |         for i, v in enumerate(self.nodes):
 82 |             self.nodes[i] = node_map_dict[v]
 83 |         # update node name in relations
 84 |         for node_relations in self.relations:
 85 |             for i, l in enumerate(node_relations):
 86 |                 node_relations[i][1] = node_map_dict[l[1]]
 87 | 
 88 |     def get_triples(self):
 89 |         """
 90 |         Get the triples in three lists.
 91 |         instance_triple: a triple representing an instance. E.g. instance(w, want-01)
 92 |         attribute triple: relation of attributes, e.g. polarity(w, - )
 93 |         and relation triple, e.g. arg0 (w, b)
 94 | 
 95 |         """
 96 |         instance_triple = []
 97 |         relation_triple = []
 98 |         attribute_triple = []
 99 |         for i in range(len(self.nodes)):
100 |             instance_triple.append(("instance", self.nodes[i], self.node_values[i]))
101 |             # l[0] is relation name
102 |             # l[1] is the other node this node has relation with
103 |             for l in self.relations[i]:
104 |                 relation_triple.append((l[0], self.nodes[i], l[1]))
105 |             # l[0] is the attribute name
106 |             # l[1] is the attribute value
107 |             for l in self.attributes[i]:
108 |                 attribute_triple.append((l[0], self.nodes[i], l[1]))
109 |         return instance_triple, attribute_triple, relation_triple
110 | 
111 | 
112 |     def get_triples2(self):
113 |         """
114 |         Get the triples in two lists:
115 |         instance_triple: a triple representing an instance. E.g. instance(w, want-01)
116 |         relation_triple: a triple representing all relations, e.g. arg0(w, b) or polarity(w, -)
117 |         Note that we do not differentiate between attribute triple and relation triple. Both are considered as relation
118 |         triples.
119 |         All triples are represented by (triple_type, argument 1 of the triple, argument 2 of the triple)
120 | 
121 |         """
122 |         instance_triple = []
123 |         relation_triple = []
124 |         for i in range(len(self.nodes)):
125 |             # an instance triple is instance(node name, node value).
126 |             # For example, instance(b, boy).
127 |             instance_triple.append(("instance", self.nodes[i], self.node_values[i]))
128 |             # l[0] is relation name
129 |             # l[1] is the other node this node has relation with
130 |             for l in self.relations[i]:
131 |                 relation_triple.append((l[0], self.nodes[i], l[1]))
132 |             # l[0] is the attribute name
133 |             # l[1] is the attribute value
134 |             for l in self.attributes[i]:
135 |                 relation_triple.append((l[0], self.nodes[i], l[1]))
136 |         return instance_triple, relation_triple
137 | 
138 | 
139 |     def __str__(self):
140 |         """
141 |         Generate AMR string for better readability
142 | 
143 |         """
144 |         lines = []
145 |         for i in range(len(self.nodes)):
146 |             lines.append("Node " + str(i) + " " + self.nodes[i])
147 |             lines.append("Value: " + self.node_values[i])
148 |             lines.append("Relations:")
149 |             for relation in self.relations[i]:
150 |                 lines.append("Node " + relation[1] + " via " + relation[0])
151 |             for attribute in self.attributes[i]:
152 |                 lines.append("Attribute: " + attribute[0] + " value " + attribute[1])
153 |         return "\n".join(lines)
154 | 
155 |     def __repr__(self):
156 |         return self.__str__()
157 | 
158 |     def output_amr(self):
159 |         """
160 |         Output AMR string
161 | 
162 |         """
163 |         print(self.__str__(), file=DEBUG_LOG)
164 | 
165 |     @staticmethod
166 |     def get_amr_line(input_f):
167 |         """
168 |         Read the file containing AMRs. AMRs are separated by a blank line.
169 |         Each call of get_amr_line() returns the next available AMR (in one-line form).
170 |         Note: this function does not verify if the AMR is valid
171 | 
172 |         """
173 |         cur_amr = []
174 |         has_content = False
175 |         for line in input_f:
176 |             line = line.strip()
177 |             if line == "":
178 |                 if not has_content:
179 |                     # empty lines before current AMR
180 |                     continue
181 |                 else:
182 |                     # end of current AMR
183 |                     break
184 |             if line.startswith("#"):
185 |                 # ignore the comment line (starting with "#") in the AMR file
186 |                 continue
187 |             else:
188 |                 has_content = True
189 |                 cur_amr.append(line)
190 |         return "".join(cur_amr)
191 | 
192 |     @staticmethod
193 |     def parse_AMR_line(line):
194 |         """
195 |         Parse a AMR from line representation to an AMR object.
196 |         This parsing algorithm scans the line once and process each character, in a shift-reduce style.
197 | 
198 |         """
199 |         # Current state. It denotes the last significant symbol encountered. 1 for (, 2 for :, 3 for /,
200 |         # and 0 for start state or ')'
201 |         # Last significant symbol is ( --- start processing node name
202 |         # Last significant symbol is : --- start processing relation name
203 |         # Last significant symbol is / --- start processing node value (concept name)
204 |         # Last significant symbol is ) --- current node processing is complete
205 |         # Note that if these symbols are inside a quoted string, they are not significant symbols.
206 | 
207 |         exceptions = {"prep-on-behalf-of", "prep-out-of", "consist-of"}
208 |         def update_triple(node_relation_dict, u, r, v):
209 |             # we detect a relation (r) between u and v, with direction u to v.
210 |             # in most cases, if the relation name ends with "-of", e.g. "arg0-of",
211 |             # it is the reverse of some relation. For example, if a is "arg0-of" b,
212 |             # we can also say b is "arg0" a.
213 |             # If the relation name ends with "-of", we store the reverse relation,
214 |             # but note some exceptions like "prep-on-behalf-of" and "prep-out-of".
215 |             # Also note that relation "mod" is the reverse of "domain".
216 |             if r.endswith("-of") and r not in exceptions:
217 |                 node_relation_dict[v].append((r[:-3], u))
218 |             elif r=="mod":
219 |                 node_relation_dict[v].append(("domain", u))
220 |             else:
221 |                 node_relation_dict[u].append((r, v))
222 | 
223 |         state = 0
224 |         # node stack for parsing
225 |         stack = []
226 |         # current not-yet-reduced character sequence
227 |         cur_charseq = []
228 |         # key: node name value: node value
229 |         node_dict = {}
230 |         # node name list (order: occurrence of the node)
231 |         node_name_list = []
232 |         # key: node name, value: list of (relation name, other node name)
233 |         node_relation_dict1 = defaultdict(list)
234 |         # key: node name, value: list of (attribute name, const value) or (relation name, unseen node name)
235 |         node_relation_dict2 = defaultdict(list)
236 |         # current relation name
237 |         cur_relation_name = ""
238 |         # True while inside an unmatched (open) quote
239 |         in_quote = False
240 |         for i, c in enumerate(line.strip()):
241 |             if c == " ":
242 |                 # allow space in relation name
243 |                 if state == 2:
244 |                     cur_charseq.append(c)
245 |                 continue
246 |             if c == "\"":
247 |                 # flip in_quote value when a quote symbol is encountered
248 |                 # on a closing quote (in_quote still True), append the placeholder '_'
249 |                 if in_quote:
250 |                     cur_charseq.append('_')
251 |                 in_quote = not in_quote
252 |             elif c == "(":
253 |                 # not significant symbol if inside quote
254 |                 if in_quote:
255 |                     cur_charseq.append(c)
256 |                     continue
257 |                 # get the relation name
258 |                 # e.g. :arg0 (x ...
259 |                 # at this point we get "arg0"
260 |                 if state == 2:
261 |                     # in this state, current relation name should be empty
262 |                     if cur_relation_name != "":
263 |                         print("Format error when processing ", line[0:i + 1], file=ERROR_LOG)
264 |                         return None
265 |                     # update current relation name for future use
266 |                     cur_relation_name = "".join(cur_charseq).strip()
267 |                     cur_charseq[:] = []
268 |                 state = 1
269 |             elif c == ":":
270 |                 # not significant symbol if inside quote
271 |                 if in_quote:
272 |                     cur_charseq.append(c)
273 |                     continue
274 |                 # Last significant symbol is "/". Now we encounter ":"
275 |                 # Example:
276 |                 # :OR (o2 / *OR*
277 |                 #    :mod (o3 / official)
278 |                 #  gets node value "*OR*" at this point
279 |                 if state == 3:
280 |                     node_value = "".join(cur_charseq)
281 |                     # clear current char sequence
282 |                     cur_charseq[:] = []
283 |                     # peek at the node name on top of the stack ("o2" in the above example); it is not popped
284 |                     cur_node_name = stack[-1]
285 |                     # update node name/value map
286 |                     node_dict[cur_node_name] = node_value
287 |                 # Last significant symbol is ":". Now we encounter ":"
288 |                 # Example:
289 |                 # :op1 w :quant 30
290 |                 # or :day 14 :month 3
291 |                 # the problem is that we cannot decide if node value is attribute value (constant)
292 |                 # or node value (variable) at this moment
293 |                 elif state == 2:
294 |                     temp_attr_value = "".join(cur_charseq)
295 |                     cur_charseq[:] = []
296 |                     parts = temp_attr_value.split()
297 |                     if len(parts) < 2:
298 |                         print("Error in processing; part len < 2", line[0:i + 1], file=ERROR_LOG)
299 |                         return None
300 |                     # For the above example, relation name is "op1", and relation value is "w"
301 |                     # Note that this node name might not be encountered before
302 |                     relation_name = parts[0].strip()
303 |                     relation_value = parts[1].strip()
304 |                     # We need to link the upper-level node to the relation value;
305 |                     # the top of the stack is the upper-level node
306 |                     if len(stack) == 0:
307 |                         print("Error in processing", line[:i], relation_name, relation_value, file=ERROR_LOG)
308 |                         return None
309 |                     # if we have not seen this node name before
310 |                     if relation_value not in node_dict:
311 |                         update_triple(node_relation_dict2, stack[-1], relation_name, relation_value)
312 |                     else:
313 |                         update_triple(node_relation_dict1, stack[-1], relation_name, relation_value)
314 |                 state = 2
315 |             elif c == "/":
316 |                 if in_quote:
317 |                     cur_charseq.append(c)
318 |                     continue
319 |                 # Last significant symbol is "(". Now we encounter "/"
320 |                 # Example:
321 |                 # (d / default-01
322 |                 # get "d" here
323 |                 if state == 1:
324 |                     node_name = "".join(cur_charseq)
325 |                     cur_charseq[:] = []
326 |                     # if this node name is already in node_dict, it is a duplicate
327 |                     if node_name in node_dict:
328 |                         print("Duplicate node name ", node_name, " in parsing AMR", file=ERROR_LOG)
329 |                         return None
330 |                     # push the node name to stack
331 |                     stack.append(node_name)
332 |                     # add it to node name list
333 |                     node_name_list.append(node_name)
334 |                     # if this node is the target of a pending relation
335 |                     # Example:
336 |                     # :arg1 (n / nation)
337 |                     # cur_relation_name is arg1
338 |                     # node name is n
339 |                     # we have a relation arg1(upper level node, n)
340 |                     if cur_relation_name != "":
341 |                         update_triple(node_relation_dict1, stack[-2], cur_relation_name, node_name)
342 |                         cur_relation_name = ""
343 |                 else:
344 |                     # error if in other state
345 |                     print("Error in parsing AMR", line[0:i + 1], file=ERROR_LOG)
346 |                     return None
347 |                 state = 3
348 |             elif c == ")":
349 |                 if in_quote:
350 |                     cur_charseq.append(c)
351 |                     continue
352 |                 # stack should be non-empty to find upper level node
353 |                 if len(stack) == 0:
354 |                     print("Unmatched parenthesis at position", i, "in processing", line[0:i + 1], file=ERROR_LOG)
355 |                     return None
356 |                 # Last significant symbol is ":". Now we encounter ")"
357 |                 # Example:
358 |                 # :op2 "Brown") or :op2 w)
359 |                 # get "Brown" or w here
360 |                 if state == 2:
361 |                     temp_attr_value = "".join(cur_charseq)
362 |                     cur_charseq[:] = []
363 |                     parts = temp_attr_value.split()
364 |                     if len(parts) < 2:
365 |                         print("Error processing", line[:i + 1], temp_attr_value, file=ERROR_LOG)
366 |                         return None
367 |                     relation_name = parts[0].strip()
368 |                     relation_value = parts[1].strip()
369 |                     # attribute value not seen before
370 |                     # Note that it might be a constant attribute value, or an unseen node
371 |                     # process this after we have seen all the node names
372 |                     if relation_value not in node_dict:
373 |                         update_triple(node_relation_dict2, stack[-1], relation_name, relation_value)
374 |                     else:
375 |                         update_triple(node_relation_dict1, stack[-1], relation_name, relation_value)
376 |                 # Last significant symbol is "/". Now we encounter ")"
377 |                 # Example:
378 |                 # :arg1 (n / nation)
379 |                 # we get "nation" here
380 |                 elif state == 3:
381 |                     node_value = "".join(cur_charseq)
382 |                     cur_charseq[:] = []
383 |                     cur_node_name = stack[-1]
384 |                     # map node name to its value
385 |                     node_dict[cur_node_name] = node_value
386 |                 # pop from stack, as the current node has been processed
387 |                 stack.pop()
388 |                 cur_relation_name = ""
389 |                 state = 0
390 |             else:
391 |                 # not significant symbols, so we just shift.
392 |                 cur_charseq.append(c)
393 |         # create data structures to initialize an AMR
394 |         node_value_list = []
395 |         relation_list = []
396 |         attribute_list = []
397 |         for v in node_name_list:
398 |             if v not in node_dict:
399 |                 print("Error: Node name not found", v, file=ERROR_LOG)
400 |                 return None
401 |             else:
402 |                 node_value_list.append(node_dict[v])
403 |             # build relation list and attribute list for this node
404 |             node_rel_list = []
405 |             node_attr_list = []
406 |             if v in node_relation_dict1:
407 |                 for v1 in node_relation_dict1[v]:
408 |                     node_rel_list.append([v1[0], v1[1]])
409 |             if v in node_relation_dict2:
410 |                 for v2 in node_relation_dict2[v]:
411 |                     # if value is in quote, it is a constant value
412 |                     # strip the quote and put it in attribute map
413 |                     if v2[1][0] == "\"" and v2[1][-1] == "\"":
414 |                         node_attr_list.append([v2[0], v2[1][1:-1]])
415 |                     # if value is a node name
416 |                     elif v2[1] in node_dict:
417 |                         node_rel_list.append([v2[0], v2[1]])
418 |                     else:
419 |                         node_attr_list.append([v2[0], v2[1]])
420 |             # each node has a relation list and attribute list
421 |             relation_list.append(node_rel_list)
422 |             attribute_list.append(node_attr_list)
423 |         # add TOP as an attribute. The attribute value just needs to be constant
424 |         attribute_list[0].append(["TOP", 'top'])
425 |         result_amr = AMR(node_name_list, node_value_list, relation_list, attribute_list)
426 |         return result_amr
427 | 
428 | # test AMR parsing
429 | # usage: python amr.py <file with one AMR per line>
430 | # a unittest can also be used.
431 | if __name__ == "__main__":
432 |     if len(sys.argv) < 2:
433 |         print("No file given", file=ERROR_LOG)
434 |         sys.exit(1)
435 |     amr_count = 1
436 |     with open(sys.argv[1]) as amr_file:
437 |         for line in amr_file:
438 |             cur_line = line.strip()
439 |             if cur_line == "" or cur_line.startswith("#"):
440 |                 continue
441 |             print("AMR", amr_count, file=DEBUG_LOG)
442 |             current = AMR.parse_AMR_line(cur_line)
443 |             # parse_AMR_line returns None on malformed input
444 |             if current is not None:
445 |                 current.output_amr()
446 |             amr_count += 1
447 | 

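The second pass at the end of `parse_AMR_line` revisits every `(relation, value)` pair stored in `node_relation_dict2`, i.e. values that were not yet known node names when they were read. Once all node names are known, each value is either a quoted constant, a reference to a node defined later in the string, or an unquoted constant. The function below is a minimal standalone sketch of that decision; `split_deferred` is a hypothetical name and is not part of amr.py, and it adds a length guard that the original `v2[1][0]` indexing lacks.

```python
from typing import Dict, List, Tuple


def split_deferred(pairs: List[Tuple[str, str]],
                   node_dict: Dict[str, str]) -> Tuple[List[List[str]], List[List[str]]]:
    """Split deferred (relation_name, value) pairs into relations and attributes.

    Hypothetical helper mirroring the post-pass in parse_AMR_line.
    node_dict maps every node name seen in the AMR to its concept.
    """
    relations: List[List[str]] = []
    attributes: List[List[str]] = []
    for name, value in pairs:
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            # quoted constant: strip the quotes and keep it as an attribute
            attributes.append([name, value[1:-1]])
        elif value in node_dict:
            # a node name that only appeared later in the string: a relation
            relations.append([name, value])
        else:
            # unquoted constant such as "-" or a number: an attribute
            attributes.append([name, value])
    return relations, attributes
```

For instance, with `node_dict = {"w": "want-01", "b": "boy"}`, the pairs `("ARG0", "b")`, `("polarity", "-")` and `("op1", '"Brown"')` yield one relation (`ARG0` to `b`) and two attributes (`polarity` with `-`, `op1` with `Brown`).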

--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
1 | [build-system]
2 | requires = ["setuptools>=42", "wheel", "setuptools_scm[toml]>=3.3"]
3 | [tool.setuptools_scm]
4 | 
5 | 


--------------------------------------------------------------------------------
/sample_file_list:
--------------------------------------------------------------------------------
1 | nw_wsj_0001_1 nw_wsj_0001_2 nw_wsj_0002_1 nw_wsj_0003_1
2 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | import os
 4 | 
 5 | from setuptools import setup
 6 | 
 7 | 
 8 | here = os.path.abspath(os.path.dirname(__file__))
 9 | with open(os.path.join(here, "README.md")) as f:
10 |     README = f.read()
11 | 
12 | 
13 | setup(name="smatch",
14 |       use_scm_version=True,
15 |       setup_requires=['setuptools_scm'],
16 |       description="Smatch (semantic match) tool",
17 |       long_description=README,
18 |       long_description_content_type='text/markdown',
19 |       author="Shu Cai",
20 |       author_email="shucai.work@gmail.com",
21 |       url="https://github.com/snowblink14/smatch",
22 |       license="MIT",
23 |       classifiers=[
24 |           'Development Status :: 5 - Production/Stable',
25 |           'Environment :: Console',
26 |           'Intended Audience :: Developers',
27 |           'Intended Audience :: Information Technology',
28 |           'Intended Audience :: Science/Research',
29 |           'License :: OSI Approved :: MIT License',
30 |           'Programming Language :: Python :: 3',
31 |           'Programming Language :: Python :: 3.5',
32 |           'Programming Language :: Python :: 3.6',
33 |           'Programming Language :: Python :: 3.7',
34 |           'Programming Language :: Python :: 3.8',
35 |           'Topic :: Scientific/Engineering :: Information Analysis',
36 |           'Topic :: Software Development :: Libraries :: Python Modules',
37 |           'Topic :: Text Processing :: Linguistic',
38 |           'Topic :: Utilities'
39 |       ],
40 |       keywords='nlp semantics amr evaluation',
41 |       py_modules=["smatch", "amr"],
42 |       scripts=["smatch.py"],
43 |       )
44 | 


--------------------------------------------------------------------------------
/smatch-table.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | 
  3 | import sys
  4 | import os
  5 | import time
  6 | 
  7 | import amr
  8 | import smatch
  9 | 
 10 | ERROR_LOG = sys.stderr
 11 | 
 12 | DEBUG_LOG = sys.stderr
 13 | 
 14 | verbose = False
 15 | 
 16 | # directory on isi machine
 17 | # change if needed
 18 | isi_dir_pre = "/nfs/web/isi.edu/cgi-bin/div3/mt/save-amr"
 19 | 
 20 | 
 21 | def get_names(file_dir, files):
 22 |     """
 23 |     Get the annotator name list based on a list of files
 24 |     Args:
 25 |     file_dir: AMR file folder
 26 |     files: a list of AMR names, e.g. nw_wsj_0001_1
 27 | 
 28 |     Returns:
 29 |     a list of user names who have annotated all the files
 30 |     """
 31 |     # for each user, check if they have files available
 32 |     # return user name list
 33 |     total_list = []
 34 |     name_list = []
 35 |     get_sub = False
 36 |     for path, subdir, dir_files in os.walk(file_dir):
 37 |         if not get_sub:
 38 |             total_list = subdir[:]
 39 |             get_sub = True
 40 |         else:
 41 |             break
 42 |     for user in total_list:
 43 |         has_file = True
 44 |         for f in files:
 45 |             file_path = file_dir + user + "/" + f + ".txt"
 46 |             if not os.path.exists(file_path):
 47 |                 has_file = False
 48 |                 break
 49 |         if has_file:
 50 |             name_list.append(user)
 51 |     if len(name_list) == 0:
 52 |         print("********Error: Cannot find any user who completes the files*************", file=ERROR_LOG)
 53 |     return name_list
 54 | 
 55 | 
 56 | def compute_files(user1, user2, file_list, dir_pre, start_num):
 57 |     """
 58 |     Compute the smatch scores for a file list between two users
 59 |     Args:
 60 |     user1: user 1 name
 61 |     user2: user 2 name
 62 |     file_list: file list
 63 |     dir_pre: the file location prefix
 64 |     start_num: the number of random restarts in smatch
 65 |     Returns:
 66 |     the smatch F-score as a "%.2f" string, or -1.00 if a file is missing
 67 |     """
 68 |     # total iterations = one smart initialization + start_num random restarts
 69 |     smatch.iteration_num = start_num + 1
 70 |     match_total = 0
 71 |     test_total = 0
 72 |     gold_total = 0
 73 |     for fi in file_list:
 74 |         file1 = dir_pre + user1 + "/" + fi + ".txt"
 75 |         file2 = dir_pre + user2 + "/" + fi + ".txt"
 76 |         if not os.path.exists(file1):
 77 |             print("*********Error: ", file1, "does not exist*********", file=ERROR_LOG)
 78 |             return -1.00
 79 |         if not os.path.exists(file2):
 80 |             print("*********Error: ", file2, "does not exist*********", file=ERROR_LOG)
 81 |             return -1.00
 82 |         try:
 83 |             file1_h = open(file1, "r")
 84 |             file2_h = open(file2, "r")
 85 |         except IOError:
 86 |             print("Cannot open the files", file1, file2, file=ERROR_LOG)
 87 |             return -1.00
 88 |         cur_amr1 = amr.AMR.get_amr_line(file1_h)
 89 |         cur_amr2 = amr.AMR.get_amr_line(file2_h)
 90 |         if cur_amr1 == "":
 91 |             print("AMR 1 is empty", file=ERROR_LOG)
 92 |             continue
 93 |         if cur_amr2 == "":
 94 |             print("AMR 2 is empty", file=ERROR_LOG)
 95 |             continue
 96 |         amr1 = amr.AMR.parse_AMR_line(cur_amr1)
 97 |         amr2 = amr.AMR.parse_AMR_line(cur_amr2)
 98 |         test_label = "a"
 99 |         gold_label = "b"
100 |         amr1.rename_node(test_label)
101 |         amr2.rename_node(gold_label)
102 |         (test_inst, test_rel1, test_rel2) = amr1.get_triples()
103 |         (gold_inst, gold_rel1, gold_rel2) = amr2.get_triples()
104 |         if verbose:
105 |             print("Instance triples of file 1:", len(test_inst), file=DEBUG_LOG)
106 |             print(test_inst, file=DEBUG_LOG)
107 |             print("Attribute triples of file 1:", len(test_rel1), file=DEBUG_LOG)
108 |             print(test_rel1, file=DEBUG_LOG)
109 |             print("Relation triples of file 1:", len(test_rel2), file=DEBUG_LOG)
110 |             print(test_rel2, file=DEBUG_LOG)
111 |             print("Instance triples of file 2:", len(gold_inst), file=DEBUG_LOG)
112 |             print(gold_inst, file=DEBUG_LOG)
113 |             print("Attribute triples of file 2:", len(gold_rel1), file=DEBUG_LOG)
114 |             print(gold_rel1, file=DEBUG_LOG)
115 |             print("Relation triples of file 2:", len(gold_rel2), file=DEBUG_LOG)
116 |             print(gold_rel2, file=DEBUG_LOG)
117 |         (best_match, best_match_num) = smatch.get_best_match(test_inst, test_rel1, test_rel2,
118 |                                                              gold_inst, gold_rel1, gold_rel2,
119 |                                                              test_label, gold_label)
120 |         if verbose:
121 |             print("best match number", best_match_num, file=DEBUG_LOG)
122 |             print("Best Match:", smatch.print_alignment(best_match, test_inst, gold_inst), file=DEBUG_LOG)
123 |         match_total += best_match_num
124 |         test_total += (len(test_inst) + len(test_rel1) + len(test_rel2))
125 |         gold_total += (len(gold_inst) + len(gold_rel1) + len(gold_rel2))
126 |         smatch.match_triple_dict.clear()
127 |     (precision, recall, f_score) = smatch.compute_f(match_total, test_total, gold_total)
128 |     return "%.2f" % f_score
129 | 
130 | 
131 | def get_max_width(table, index):
132 |     return max([len(str(row[index])) for row in table])
133 | 
134 | 
135 | def pprint_table(table):
136 |     """
137 |     Print a table in pretty format
138 | 
139 |     """
140 |     col_paddings = []
141 |     for i in range(len(table[0])):
142 |         col_paddings.append(get_max_width(table,i))
143 |     for row in table:
144 |         print(row[0].ljust(col_paddings[0] + 1), end="")
145 |         for i in range(1, len(row)):
146 |             col = str(row[i]).rjust(col_paddings[i]+2)
147 |             print(col, end='')
148 |         print("\n")
149 | 
150 | 
151 | def cb(option, value, parser):
152 |     """
153 |     Callback function to handle variable number of arguments in optparse
154 | 
155 |     """
156 |     arguments = [value]
157 |     for arg in parser.rargs:
158 |         if arg[0] != "-":
159 |             arguments.append(arg)
160 |         else:
161 |             del parser.rargs[:len(arguments)]
162 |             break
163 |     if getattr(parser.values, option.dest):
164 |         arguments.extend(getattr(parser.values, option.dest))
165 |     setattr(parser.values, option.dest, arguments)
166 | 
167 | 
168 | def check_args(args):
169 |     """
170 |     Parse arguments and check if the arguments are valid
171 | 
172 |     """
173 |     if not os.path.exists(args.fd):
174 |         print("Not a valid path", args.fd, file=ERROR_LOG)
175 |         return [], [], False
176 |     if args.fl is not None:
177 |         # we already ensure the file can be opened and opened the file
178 |         file_line = args.fl.readline()
179 |         amr_ids = file_line.strip().split()
180 |     elif args.f is None:
181 |         print("No AMR ID was given", file=ERROR_LOG)
182 |         return [], [], False
183 |     else:
184 |         amr_ids = args.f
185 |     names = []
186 |     check_name = True
187 |     if args.p is None:
188 |         names = get_names(args.fd, amr_ids)
189 |         # no need to check names
190 |         check_name = False
191 |         if len(names) == 0:
192 |             print("Cannot find any user who tagged these AMR", file=ERROR_LOG)
193 |             return [], [], False
194 |     else:
195 |         names = args.p
196 |     if len(names) == 0:
197 |         print("No user was given", file=ERROR_LOG)
198 |         return [], [], False
199 |     if len(names) == 1:
200 |         print("Only one user is given. Smatch calculation requires at least two users.", file=ERROR_LOG)
201 |         return [], [], False
202 |     if "consensus" in names:
203 |         con_index = names.index("consensus")
204 |         names.pop(con_index)
205 |         names.append("consensus")
206 |     # check if all the AMR_id and user combinations are valid
207 |     if check_name:
208 |         pop_name = []
209 |         for i, name in enumerate(names):
210 |             for amr_id in amr_ids:
211 |                 amr_path = args.fd + name + "/" + amr_id + ".txt"
212 |                 if not os.path.exists(amr_path):
213 |                     print("User", name, "has not tagged AMR", amr_id, file=ERROR_LOG)
214 |                     pop_name.append(i)
215 |                     break
216 |         if len(pop_name) != 0:
217 |             pop_num = 0
218 |             for p in pop_name:
219 |                 print("Deleting user", names[p - pop_num], "from the name list", file=ERROR_LOG)
220 |                 names.pop(p - pop_num)
221 |                 pop_num += 1
222 |         if len(names) < 2:
223 |             print("Not enough users to evaluate. Smatch requires at least two users who tagged all the AMRs", file=ERROR_LOG)
224 |             return [], [], False
225 |     return amr_ids, names, True
226 | 
227 | 
228 | def main(arguments):
229 |     global verbose
230 |     (ids, names, result) = check_args(arguments)
231 |     if arguments.v:
232 |         verbose = True
233 |     if not result:
234 |         return 0
235 |     acc_time = 0
236 |     len_name = len(names)
237 |     table = []
238 |     for i in range(0, len_name + 1):
239 |         table.append([])
240 |     table[0].append("")
241 |     for i in range(0, len_name):
242 |         table[0].append(names[i])
243 |     for i in range(0, len_name):
244 |         table[i+1].append(names[i])
245 |         for j in range(0, len_name):
246 |             if i != j:
247 |                 start = time.perf_counter()
248 |                 table[i+1].append(compute_files(names[i], names[j], ids, arguments.fd, arguments.r))
249 |                 end = time.perf_counter()
250 |                 if table[i+1][-1] != -1.0:
251 |                     acc_time += end-start
252 |             else:
253 |                 table[i+1].append("")
254 |     # check table
255 |     for i in range(0, len_name + 1):
256 |         for j in range(0, len_name + 1):
257 |             if i != j:
258 |                 if table[i][j] != table[j][i]:
259 |                     if table[i][j] > table[j][i]:
260 |                         table[j][i] = table[i][j]
261 |                     else:
262 |                         table[i][j] = table[j][i]
263 |     pprint_table(table)
264 |     return acc_time
265 | 
266 | 
267 | if __name__ == "__main__":
268 |     whole_start = time.perf_counter()
269 | 
270 |     import argparse
271 | 
272 |     parser = argparse.ArgumentParser(description="Smatch table calculator")
273 |     parser.add_argument(
274 |         "--fl",
275 |         type=argparse.FileType('r'),
276 |         help='AMR ID list file')
277 |     parser.add_argument(
278 |         '-f',
279 |         nargs='+',
280 |         help='AMR IDs (at least one)')
281 |     parser.add_argument(
282 |         "-p",
283 |         nargs='*',
284 |         help="User list (can be none)")
285 |     parser.add_argument(
286 |         "--fd",
287 |         default=isi_dir_pre,
288 |         help="AMR File directory. Default=location on isi machine")
289 |     parser.add_argument(
290 |         '-r',
291 |         type=int,
292 |         default=4,
293 |         help='Restart number (Default:4)')
294 |     parser.add_argument(
295 |         '-v',
296 |         action='store_true',
297 |         help='Verbose output (Default:False)')
298 | 
299 |     args = parser.parse_args()
300 | 
301 |     # Normalize fd: append a trailing "/" if it is missing
302 |     if args.fd[-1] != "/":
303 |         args.fd += "/"
304 | 
305 |     # acc_time is the smatch calculation time
306 |     acc_time = main(args)
307 |     whole_end = time.perf_counter()
308 |     # time of the whole running process
309 |     whole_time = whole_end - whole_start
310 | 
311 |     # print if needed
312 |     # print("Accumulated computation time", acc_time, file=ERROR_LOG)
313 |     # print("Total time", whole_time, file=ERROR_LOG)
314 |     # print("Percentage", float(acc_time)/float(whole_time), file=ERROR_LOG)
315 | 

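`compute_files` sums the matched, test, and gold triple counts over all files and calls `smatch.compute_f` once, so each table cell is a micro-averaged score over triples rather than a mean of per-file F-scores. A minimal sketch of that aggregation follows, assuming `compute_f` uses the standard precision/recall/F1 definitions (its body is not shown here); `micro_f` is a hypothetical helper name.

```python
from typing import Iterable, Tuple


def micro_f(counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1.

    counts: one (matched, test_total, gold_total) tuple per file, where the
    totals are the instance + attribute + relation triple counts of each AMR.
    """
    match_total = test_total = gold_total = 0
    for matched, test_n, gold_n in counts:
        match_total += matched
        test_total += test_n
        gold_total += gold_n
    if test_total == 0 or gold_total == 0:
        return 0.0, 0.0, 0.0
    precision = match_total / test_total
    recall = match_total / gold_total
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

With two files contributing `(3, 4, 5)` and `(1, 4, 3)`, the totals are 4 matched out of 8 test and 8 gold triples, giving 0.5 for all three scores; averaging the two per-file F-scores instead would give about 0.48.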

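smatch.py, which follows, scores node mappings with the `weight_dict` built by `compute_pool`: key `-1` holds the matches a node pair earns on its own, and each relation match is recorded under both participating node pairs. The sketch below shows how a mapping could be scored against that layout and improved greedily. It is an illustration under those assumptions, not smatch's own `compute_match` or `get_best_gain`: the names `score` and `hill_climb` are hypothetical, and the climber only tries single-node reassignments to unused AMR 2 nodes.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union

NodePair = Tuple[int, int]
# Layout assumed from compute_pool's docstring:
#   weight_dict[(i, j)][-1]     -> matches from instance/attribute triples of pair (i, j) alone
#   weight_dict[(i, j)][(k, l)] -> relation matches that need both (i, j) and (k, l);
#                                  the same count is also stored under weight_dict[(k, l)][(i, j)]
WeightDict = Dict[NodePair, Dict[Union[int, NodePair], int]]


def score(mapping: Sequence[int], weight_dict: WeightDict) -> float:
    """Matched triples, where mapping[i] is the AMR 2 node for AMR 1 node i (-1 = unmapped)."""
    total = 0.0
    for i, j in enumerate(mapping):
        if j == -1 or (i, j) not in weight_dict:
            continue
        for key, weight in weight_dict[(i, j)].items():
            if key == -1:
                total += weight
            elif mapping[key[0]] == key[1]:
                # each relation match is stored under both node pairs, so count half here
                total += weight / 2
    return total


def hill_climb(mapping: Sequence[int], candidates: Sequence[Set[int]],
               weight_dict: WeightDict) -> Tuple[List[int], float]:
    """Greedily reassign single nodes while the best move strictly increases the score."""
    current = list(mapping)
    current_score = score(current, weight_dict)
    while True:
        used = {j for j in current if j != -1}
        best_move: Optional[NodePair] = None
        best_gain = 0.0
        for i, cands in enumerate(candidates):
            for j in sorted(cands):
                # keep the mapping one-to-one: skip AMR 2 nodes that are already taken
                if j in used:
                    continue
                trial = list(current)
                trial[i] = j
                gain = score(trial, weight_dict) - current_score
                if gain > best_gain:
                    best_gain, best_move = gain, (i, j)
        if best_move is None:
            return current, current_score
        current[best_move[0]] = best_move[1]
        current_score += best_gain
```

For example, with `weight_dict = {(0, 1): {-1: 1, (1, 0): 1}, (1, 0): {-1: 1, (0, 1): 1}}` (two concept matches plus one shared relation) and candidates `[{1}, {0}]`, `hill_climb([-1, -1], [{1}, {0}], weight_dict)` returns `([1, 0], 3.0)`. Like smatch's own search, such a greedy climb can stop at a local optimum, which is why smatch runs several restarts.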
--------------------------------------------------------------------------------
/smatch.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | # -*- coding: utf-8 -*-
  3 | 
  4 | 
  5 | """
  6 | This script computes smatch score between two AMRs.
  7 | For detailed description of smatch, see http://www.isi.edu/natural-language/amr/smatch-13.pdf
  8 | 
  9 | """
 10 | 
 11 | import random
 12 | 
 13 | import amr
 14 | import sys
 15 | 
 16 | # total number of iterations in smatch computation (one smart initialization plus random restarts)
 17 | iteration_num = 5
 18 | 
 19 | # verbose output switch.
 20 | # Default false (no verbose output)
 21 | verbose = False
 22 | veryVerbose = False
 23 | 
 24 | # single score output switch.
 25 | # Default true (compute a single score for all AMRs in two files)
 26 | single_score = True
 27 | 
 28 | # precision and recall output switch.
 29 | # Default false (do not output precision and recall, just output F score)
 30 | pr_flag = False
 31 | 
 32 | # Error log location
 33 | ERROR_LOG = sys.stderr
 34 | 
 35 | # Debug log location
 36 | DEBUG_LOG = sys.stderr
 37 | 
 38 | # dictionary to save pre-computed node mapping and its resulting triple match count
 39 | # key: tuples of node mapping
 40 | # value: the matching triple count
 41 | match_triple_dict = {}
 42 | 
 43 | 
 44 | def get_best_match(instance1, attribute1, relation1,
 45 |                    instance2, attribute2, relation2,
 46 |                    prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True):
 47 |     """
 48 |     Get the highest triple match number between two sets of triples via hill-climbing.
 49 |     Arguments:
 50 |         instance1: instance triples of AMR 1 ("instance", node name, node value)
 51 |         attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value)
 52 |         relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name)
 53 |         instance2: instance triples of AMR 2 ("instance", node name, node value)
 54 |         attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value)
 55 |         relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name)
 56 |         prefix1: prefix label for AMR 1
 57 |         prefix2: prefix label for AMR 2
 58 |     Returns:
 59 |         best_match: the node mapping that results in the highest triple matching number
 60 |         best_match_num: the highest triple matching number
 61 | 
 62 |     """
 63 |     # Compute the candidate pool: all possible node match candidates.
 64 |     # Hill-climbing only considers candidates in this pool, which saves computing time.
 65 |     # weight_dict maps each candidate node pair to the triple matches it contributes
 66 |     (candidate_mappings, weight_dict) = compute_pool(instance1, attribute1, relation1,
 67 |                                                      instance2, attribute2, relation2,
 68 |                                                      prefix1, prefix2, doinstance=doinstance, doattribute=doattribute,
 69 |                                                      dorelation=dorelation)
 70 |     if veryVerbose:
 71 |         print("Candidate mappings:", file=DEBUG_LOG)
 72 |         print(candidate_mappings, file=DEBUG_LOG)
 73 |         print("Weight dictionary", file=DEBUG_LOG)
 74 |         print(weight_dict, file=DEBUG_LOG)
 75 | 
 76 |     best_match_num = 0
 77 |     # initialize best match mapping
 78 |     # the ith entry is the node index in AMR 2 which maps to the ith node in AMR 1
 79 |     best_mapping = [-1] * len(instance1)
 80 |     for i in range(iteration_num):
 81 |         if veryVerbose:
 82 |             print("Iteration", i, file=DEBUG_LOG)
 83 |         if i == 0:
 84 |             # smart initialization used for the first round
 85 |             cur_mapping = smart_init_mapping(candidate_mappings, instance1, instance2)
 86 |         else:
 87 |             # random initialization for the remaining rounds
 88 |             cur_mapping = random_init_mapping(candidate_mappings)
 89 |         # compute current triple match number
 90 |         match_num = compute_match(cur_mapping, weight_dict)
 91 |         if veryVerbose:
 92 |             print("Node mapping at start", cur_mapping, file=DEBUG_LOG)
 93 |             print("Triple match number at start:", match_num, file=DEBUG_LOG)
 94 |         while True:
 95 |             # get best gain
 96 |             (gain, new_mapping) = get_best_gain(cur_mapping, candidate_mappings, weight_dict,
 97 |                                                 len(instance2), match_num)
 98 |             if veryVerbose:
 99 |                 print("Gain after the hill-climbing", gain, file=DEBUG_LOG)
100 |             # stop hill-climbing once no new node mapping yields a positive gain
101 |             if gain <= 0:
102 |                 break
103 |             # otherwise update match_num and mapping
104 |             match_num += gain
105 |             cur_mapping = new_mapping[:]
106 |             if veryVerbose:
107 |                 print("Update triple match number to:", match_num, file=DEBUG_LOG)
108 |                 print("Current mapping:", cur_mapping, file=DEBUG_LOG)
109 |         if match_num > best_match_num:
110 |             best_mapping = cur_mapping[:]
111 |             best_match_num = match_num
112 |     return best_mapping, best_match_num
113 | 
114 | 
115 | def normalize(item):
116 |     """
117 |     Lowercase an item and strip trailing "_" quote signifiers before it is compared
118 |     """
119 |     return item.lower().rstrip('_')
120 | 
121 | 
122 | def compute_pool(instance1, attribute1, relation1,
123 |                  instance2, attribute2, relation2,
124 |                  prefix1, prefix2, doinstance=True, doattribute=True, dorelation=True):
125 |     """
126 |     Compute all possible node mapping candidates and their weights (the triple matching number gain resulting from
127 |     mapping one node in AMR 1 to another node in AMR 2)
128 | 
129 |     Arguments:
130 |         instance1: instance triples of AMR 1
131 |         attribute1: attribute triples of AMR 1 (attribute name, node name, attribute value)
132 |         relation1: relation triples of AMR 1 (relation name, node 1 name, node 2 name)
133 |         instance2: instance triples of AMR 2
134 |         attribute2: attribute triples of AMR 2 (attribute name, node name, attribute value)
135 |         relation2: relation triples of AMR 2 (relation name, node 1 name, node 2 name)
136 |         prefix1: prefix label for AMR 1
137 |         prefix2: prefix label for AMR 2
138 |     Returns:
139 |       candidate_mapping: a list of candidate nodes.
140 |                        The ith element contains the node indices (in AMR 2) the ith node (in AMR 1) can map to.
141 |                        (resulting in non-zero triple match)
142 |       weight_dict: a dictionary holding the triple match count for every candidate node pair. The key
143 |                    is a node pair; the value is another dictionary. Key -1 holds the triple matches produced by this
144 |                    node pair alone (instance and attribute triples); every other key is a node pair that yields
145 |                    relation triple matches together with the first node pair.
146 | 
147 | 
148 |     """
149 |     candidate_mapping = []
150 |     weight_dict = {}
151 |     for instance1_item in instance1:
152 |         # each candidate mapping is a set of node indices
153 |         candidate_mapping.append(set())
154 |         if doinstance:
155 |             for instance2_item in instance2:
156 |                 # if both triples are instance triples and have the same value
157 |                 if normalize(instance1_item[0]) == normalize(instance2_item[0]) and \
158 |                         normalize(instance1_item[2]) == normalize(instance2_item[2]):
159 |                     # get node index by stripping the prefix
160 |                     node1_index = int(instance1_item[1][len(prefix1):])
161 |                     node2_index = int(instance2_item[1][len(prefix2):])
162 |                     candidate_mapping[node1_index].add(node2_index)
163 |                     node_pair = (node1_index, node2_index)
164 |                     # use -1 as key in weight_dict for instance triples and attribute triples
165 |                     if node_pair in weight_dict:
166 |                         weight_dict[node_pair][-1] += 1
167 |                     else:
168 |                         weight_dict[node_pair] = {}
169 |                         weight_dict[node_pair][-1] = 1
170 |     if doattribute:
171 |         for attribute1_item in attribute1:
172 |             for attribute2_item in attribute2:
173 |                 # if both attribute triples have the same attribute name and value
174 |                 if normalize(attribute1_item[0]) == normalize(attribute2_item[0]) \
175 |                         and normalize(attribute1_item[2]) == normalize(attribute2_item[2]):
176 |                     node1_index = int(attribute1_item[1][len(prefix1):])
177 |                     node2_index = int(attribute2_item[1][len(prefix2):])
178 |                     candidate_mapping[node1_index].add(node2_index)
179 |                     node_pair = (node1_index, node2_index)
180 |                     # use -1 as key in weight_dict for instance triples and attribute triples
181 |                     if node_pair in weight_dict:
182 |                         weight_dict[node_pair][-1] += 1
183 |                     else:
184 |                         weight_dict[node_pair] = {}
185 |                         weight_dict[node_pair][-1] = 1
186 |     if dorelation:
187 |         for relation1_item in relation1:
188 |             for relation2_item in relation2:
189 |                 # if both relation triples share the same relation name
190 |                 if normalize(relation1_item[0]) == normalize(relation2_item[0]):
191 |                     node1_index_amr1 = int(relation1_item[1][len(prefix1):])
192 |                     node1_index_amr2 = int(relation2_item[1][len(prefix2):])
193 |                     node2_index_amr1 = int(relation1_item[2][len(prefix1):])
194 |                     node2_index_amr2 = int(relation2_item[2][len(prefix2):])
195 |                     # add mapping between two nodes
196 |                     candidate_mapping[node1_index_amr1].add(node1_index_amr2)
197 |                     candidate_mapping[node2_index_amr1].add(node2_index_amr2)
198 |                     node_pair1 = (node1_index_amr1, node1_index_amr2)
199 |                     node_pair2 = (node2_index_amr1, node2_index_amr2)
200 |                     if node_pair2 != node_pair1:
201 |                         # update weight_dict weight. Note that we need to update both entries for future search
202 |                         # i.e. weight_dict[node_pair1][node_pair2]
203 |                         #     weight_dict[node_pair2][node_pair1]
204 |                         if node1_index_amr1 > node2_index_amr1:
205 |                             # swap node_pair1 and node_pair2
206 |                             node_pair1 = (node2_index_amr1, node2_index_amr2)
207 |                             node_pair2 = (node1_index_amr1, node1_index_amr2)
208 |                         if node_pair1 in weight_dict:
209 |                             if node_pair2 in weight_dict[node_pair1]:
210 |                                 weight_dict[node_pair1][node_pair2] += 1
211 |                             else:
212 |                                 weight_dict[node_pair1][node_pair2] = 1
213 |                         else:
214 |                             weight_dict[node_pair1] = {-1: 0, node_pair2: 1}
215 |                         if node_pair2 in weight_dict:
216 |                             if node_pair1 in weight_dict[node_pair2]:
217 |                                 weight_dict[node_pair2][node_pair1] += 1
218 |                             else:
219 |                                 weight_dict[node_pair2][node_pair1] = 1
220 |                         else:
221 |                             weight_dict[node_pair2] = {-1: 0, node_pair1: 1}
222 |                     else:
223 |                         # two node pairs are the same. So we only update weight_dict once.
224 |                         # this generally should not happen.
225 |                         if node_pair1 in weight_dict:
226 |                             weight_dict[node_pair1][-1] += 1
227 |                         else:
228 |                             weight_dict[node_pair1] = {-1: 1}
229 |     return candidate_mapping, weight_dict
230 | 
231 | 
232 | def smart_init_mapping(candidate_mapping, instance1, instance2):
233 |     """
234 |     Initialize mapping based on the concept mapping (smart initialization)
235 |     Arguments:
236 |         candidate_mapping: candidate node match list
237 |         instance1: instance triples of AMR 1
238 |         instance2: instance triples of AMR 2
239 |     Returns:
240 |         initialized node mapping between two AMRs
241 | 
242 |     """
243 |     random.seed()
244 |     matched_dict = {}
245 |     result = []
246 |     # list to store node indices that have no concept match
247 |     no_word_match = []
248 |     for i, candidates in enumerate(candidate_mapping):
249 |         if not candidates:
250 |             # no possible mapping
251 |             result.append(-1)
252 |             continue
253 |         # node value in instance triples of AMR 1
254 |         value1 = instance1[i][2]
255 |         for node_index in candidates:
256 |             value2 = instance2[node_index][2]
257 |             # find the first instance triple match in the candidates
258 |             # instance triple match is having the same concept value
259 |             if value1 == value2:
260 |                 if node_index not in matched_dict:
261 |                     result.append(node_index)
262 |                     matched_dict[node_index] = 1
263 |                     break
264 |         if len(result) == i:
265 |             no_word_match.append(i)
266 |             result.append(-1)
267 |     # if no concept match, generate a random mapping
268 |     for i in no_word_match:
269 |         candidates = list(candidate_mapping[i])
270 |         while candidates:
271 |             # get a random node index from candidates
272 |             rid = random.randint(0, len(candidates) - 1)
273 |             candidate = candidates[rid]
274 |             if candidate in matched_dict:
275 |                 candidates.pop(rid)
276 |             else:
277 |                 matched_dict[candidate] = 1
278 |                 result[i] = candidate
279 |                 break
280 |     return result
281 | 
282 | 
283 | def random_init_mapping(candidate_mapping):
284 |     """
285 |     Generate a random node mapping.
286 |     Args:
287 |         candidate_mapping: candidate node match list
288 |     Returns:
289 |         randomly-generated node mapping between two AMRs
290 | 
291 |     """
292 |     # if needed, a fixed seed could be passed here to generate same random (to help debugging)
293 |     random.seed()
294 |     matched_dict = {}
295 |     result = []
296 |     for c in candidate_mapping:
297 |         candidates = list(c)
298 |         if not candidates:
299 |             # -1 indicates no possible mapping
300 |             result.append(-1)
301 |             continue
302 |         found = False
303 |         while candidates:
304 |             # randomly generate an index in [0, length of candidates)
305 |             rid = random.randint(0, len(candidates) - 1)
306 |             candidate = candidates[rid]
307 |             # check if it has already been matched
308 |             if candidate in matched_dict:
309 |                 candidates.pop(rid)
310 |             else:
311 |                 matched_dict[candidate] = 1
312 |                 result.append(candidate)
313 |                 found = True
314 |                 break
315 |         if not found:
316 |             result.append(-1)
317 |     return result
318 | 
319 | 
320 | def compute_match(mapping, weight_dict):
321 |     """
322 |     Given a node mapping, compute match number based on weight_dict.
323 |     Args:
324 |     mapping: a list of node indices in AMR 2. The ith element (value j) means node i in AMR 1 maps to node j in AMR 2 (-1 if unmapped).
325 |     Returns:
326 |     matching triple number
327 |     Complexity: O(m*n), where m is the number of nodes in AMR 1 and n is the number of nodes in AMR 2
328 | 
329 |     """
330 |     # If this mapping has been investigated before, retrieve the value instead of re-computing.
331 |     if veryVerbose:
332 |         print("Computing match for mapping", file=DEBUG_LOG)
333 |         print(mapping, file=DEBUG_LOG)
334 |     if tuple(mapping) in match_triple_dict:
335 |         if veryVerbose:
336 |             print("saved value", match_triple_dict[tuple(mapping)], file=DEBUG_LOG)
337 |         return match_triple_dict[tuple(mapping)]
338 |     match_num = 0
339 |     # i is node index in AMR 1, m is node index in AMR 2
340 |     for i, m in enumerate(mapping):
341 |         if m == -1:
342 |             # no node maps to this node
343 |             continue
344 |         # node i in AMR 1 maps to node m in AMR 2
345 |         current_node_pair = (i, m)
346 |         if current_node_pair not in weight_dict:
347 |             continue
348 |         if veryVerbose:
349 |             print("node_pair", current_node_pair, file=DEBUG_LOG)
350 |         for key in weight_dict[current_node_pair]:
351 |             if key == -1:
352 |                 # matching triple resulting from instance/attribute triples
353 |                 match_num += weight_dict[current_node_pair][key]
354 |                 if veryVerbose:
355 |                     print("instance/attribute match", weight_dict[current_node_pair][key], file=DEBUG_LOG)
356 |             # only consider node index larger than i to avoid duplicates
357 |             # as we store both weight_dict[node_pair1][node_pair2] and
358 |             #     weight_dict[node_pair2][node_pair1] for a relation
359 |             elif key[0] < i:
360 |                 continue
361 |             elif mapping[key[0]] == key[1]:
362 |                 match_num += weight_dict[current_node_pair][key]
363 |                 if veryVerbose:
364 |                     print("relation match with", key, weight_dict[current_node_pair][key], file=DEBUG_LOG)
365 |     if veryVerbose:
366 |         print("match computing complete, result:", match_num, file=DEBUG_LOG)
367 |     # update match_triple_dict
368 |     match_triple_dict[tuple(mapping)] = match_num
369 |     return match_num
370 | 
371 | 
372 | def move_gain(mapping, node_id, old_id, new_id, weight_dict, match_num):
373 |     """
374 |     Compute the triple match number gain from the move operation
375 |     Arguments:
376 |         mapping: current node mapping
377 |         node_id: remapped node in AMR 1
378 |         old_id: original node id in AMR 2 to which node_id is mapped
379 |         new_id: new node in AMR 2 to which node_id is mapped
380 |         weight_dict: weight dictionary
381 |         match_num: the original triple matching number
382 |     Returns:
383 |         the triple match gain number (might be negative)
384 | 
385 |     """
386 |     # new node mapping after moving
387 |     new_mapping = (node_id, new_id)
388 |     # node mapping before moving
389 |     old_mapping = (node_id, old_id)
390 |     # new nodes mapping list (all node pairs)
391 |     new_mapping_list = mapping[:]
392 |     new_mapping_list[node_id] = new_id
393 |     # if this mapping has already been investigated, reuse the saved value to avoid recomputation
394 |     if tuple(new_mapping_list) in match_triple_dict:
395 |         return match_triple_dict[tuple(new_mapping_list)] - match_num
396 |     gain = 0
397 |     # add the triple match incurred by new_mapping to gain
398 |     if new_mapping in weight_dict:
399 |         for key in weight_dict[new_mapping]:
400 |             if key == -1:
401 |                 # instance/attribute triple match
402 |                 gain += weight_dict[new_mapping][-1]
403 |             elif new_mapping_list[key[0]] == key[1]:
404 |                 # relation gain incurred by new_mapping and another node pair in new_mapping_list
405 |                 gain += weight_dict[new_mapping][key]
406 |     # deduct the triple match incurred by old_mapping from gain
407 |     if old_mapping in weight_dict:
408 |         for k in weight_dict[old_mapping]:
409 |             if k == -1:
410 |                 gain -= weight_dict[old_mapping][-1]
411 |             elif mapping[k[0]] == k[1]:
412 |                 gain -= weight_dict[old_mapping][k]
413 |     # update match number dictionary
414 |     match_triple_dict[tuple(new_mapping_list)] = match_num + gain
415 |     return gain
416 | 
417 | 
418 | def swap_gain(mapping, node_id1, mapping_id1, node_id2, mapping_id2, weight_dict, match_num):
419 |     """
420 |     Compute the triple match number gain from the swapping
421 |     Arguments:
422 |     mapping: current node mapping list
423 |     node_id1: node 1 index in AMR 1
424 |     mapping_id1: the node index in AMR 2 node 1 maps to (in the current mapping)
425 |     node_id2: node 2 index in AMR 1
426 |     mapping_id2: the node index in AMR 2 node 2 maps to (in the current mapping)
427 |     weight_dict: weight dictionary
428 |     match_num: the original matching triple number
429 |     Returns:
430 |     the gain number (might be negative)
431 | 
432 |     """
433 |     new_mapping_list = mapping[:]
434 |     # Before swapping, node_id1 maps to mapping_id1, and node_id2 maps to mapping_id2
435 |     # After swapping, node_id1 maps to mapping_id2 and node_id2 maps to mapping_id1
436 |     new_mapping_list[node_id1] = mapping_id2
437 |     new_mapping_list[node_id2] = mapping_id1
438 |     if tuple(new_mapping_list) in match_triple_dict:
439 |         return match_triple_dict[tuple(new_mapping_list)] - match_num
440 |     gain = 0
441 |     new_mapping1 = (node_id1, mapping_id2)
442 |     new_mapping2 = (node_id2, mapping_id1)
443 |     old_mapping1 = (node_id1, mapping_id1)
444 |     old_mapping2 = (node_id2, mapping_id2)
445 |     if node_id1 > node_id2:
446 |         new_mapping2 = (node_id1, mapping_id2)
447 |         new_mapping1 = (node_id2, mapping_id1)
448 |         old_mapping1 = (node_id2, mapping_id2)
449 |         old_mapping2 = (node_id1, mapping_id1)
450 |     if new_mapping1 in weight_dict:
451 |         for key in weight_dict[new_mapping1]:
452 |             if key == -1:
453 |                 gain += weight_dict[new_mapping1][-1]
454 |             elif new_mapping_list[key[0]] == key[1]:
455 |                 gain += weight_dict[new_mapping1][key]
456 |     if new_mapping2 in weight_dict:
457 |         for key in weight_dict[new_mapping2]:
458 |             if key == -1:
459 |                 gain += weight_dict[new_mapping2][-1]
460 |             # to avoid duplicate
461 |             elif key[0] == node_id1:
462 |                 continue
463 |             elif new_mapping_list[key[0]] == key[1]:
464 |                 gain += weight_dict[new_mapping2][key]
465 |     if old_mapping1 in weight_dict:
466 |         for key in weight_dict[old_mapping1]:
467 |             if key == -1:
468 |                 gain -= weight_dict[old_mapping1][-1]
469 |             elif mapping[key[0]] == key[1]:
470 |                 gain -= weight_dict[old_mapping1][key]
471 |     if old_mapping2 in weight_dict:
472 |         for key in weight_dict[old_mapping2]:
473 |             if key == -1:
474 |                 gain -= weight_dict[old_mapping2][-1]
475 |             # to avoid duplicate
476 |             elif key[0] == node_id1:
477 |                 continue
478 |             elif mapping[key[0]] == key[1]:
479 |                 gain -= weight_dict[old_mapping2][key]
480 |     match_triple_dict[tuple(new_mapping_list)] = match_num + gain
481 |     return gain
482 | 
483 | 
484 | def get_best_gain(mapping, candidate_mappings, weight_dict, instance_len, cur_match_num):
485 |     """
486 |     Hill-climbing method to return the best gain swap/move can get
487 |     Arguments:
488 |     mapping: current node mapping
489 |     candidate_mappings: the candidates mapping list
490 |     weight_dict: the weight dictionary
491 |     instance_len: the number of the nodes in AMR 2
492 |     cur_match_num: current triple match number
493 |     Returns:
494 |     the best gain we can get via swap/move operation
495 | 
496 |     """
497 |     largest_gain = 0
498 |     # True: using swap; False: using move
499 |     use_swap = True
500 |     # the node to be moved/swapped
501 |     node1 = None
502 |     # store the other node affected. In swap, this other node is the node swapping with node1. In move, this other
503 |     # node is the node node1 will move to.
504 |     node2 = None
505 |     # unmatched nodes in AMR 2
506 |     unmatched = set(range(instance_len))
507 |     # exclude nodes in current mapping
508 |     # get unmatched nodes
509 |     for nid in mapping:
510 |         if nid in unmatched:
511 |             unmatched.remove(nid)
512 |     for i, nid in enumerate(mapping):
513 |         # current node i in AMR 1 maps to node nid in AMR 2
514 |         for nm in unmatched:
515 |             if nm in candidate_mappings[i]:
516 |                 # remap i to another unmatched node (move)
517 |                 # (i, nid) -> (i, nm)
518 |                 if veryVerbose:
519 |                     print("Remap node", i, "from", nid, "to", nm, file=DEBUG_LOG)
520 |                 mv_gain = move_gain(mapping, i, nid, nm, weight_dict, cur_match_num)
521 |                 if veryVerbose:
522 |                     print("Move gain:", mv_gain, file=DEBUG_LOG)
523 |                     new_mapping = mapping[:]
524 |                     new_mapping[i] = nm
525 |                     new_match_num = compute_match(new_mapping, weight_dict)
526 |                     if new_match_num != cur_match_num + mv_gain:
527 |                         print(mapping, new_mapping, file=ERROR_LOG)
528 |                         print("Inconsistency in computing: move gain", cur_match_num, mv_gain, new_match_num,
529 |                               file=ERROR_LOG)
530 |                 if mv_gain > largest_gain:
531 |                     largest_gain = mv_gain
532 |                     node1 = i
533 |                     node2 = nm
534 |                     use_swap = False
535 |     # compute swap gain
536 |     for i, m in enumerate(mapping):
537 |         for j in range(i + 1, len(mapping)):
538 |             m2 = mapping[j]
539 |             # no need to compute swap gain if both (i, m2) (j, m) are not in candidate mappings
540 |             # such a swap cannot incur any gains
541 |             if (m2 not in candidate_mappings[i]) and (m not in candidate_mappings[j]):
542 |                 continue
543 |             # swap operation (i, m) (j, m2) -> (i, m2) (j, m)
544 |             # j starts from i+1, to avoid duplicate swap
545 |             if veryVerbose:
546 |                 print("Swap node", i, "and", j, file=DEBUG_LOG)
547 |                 print("Before swapping:", i, "-", m, ",", j, "-", m2, file=DEBUG_LOG)
548 |                 print(mapping, file=DEBUG_LOG)
549 |                 print("After swapping:", i, "-", m2, ",", j, "-", m, file=DEBUG_LOG)
550 |             sw_gain = swap_gain(mapping, i, m, j, m2, weight_dict, cur_match_num)
551 |             if veryVerbose:
552 |                 print("Swap gain:", sw_gain, file=DEBUG_LOG)
553 |                 new_mapping = mapping[:]
554 |                 new_mapping[i] = m2
555 |                 new_mapping[j] = m
556 |                 print(new_mapping, file=DEBUG_LOG)
557 |                 new_match_num = compute_match(new_mapping, weight_dict)
558 |                 if new_match_num != cur_match_num + sw_gain:
559 |                     print(mapping, new_mapping, file=ERROR_LOG)
560 |                     print("Inconsistency in computing: swap gain", cur_match_num, sw_gain, new_match_num,
561 |                           file=ERROR_LOG)
562 |             if sw_gain > largest_gain:
563 |                 largest_gain = sw_gain
564 |                 node1 = i
565 |                 node2 = j
566 |                 use_swap = True
567 |     # generate a new mapping based on swap/move
568 |     cur_mapping = mapping[:]
569 |     if node1 is not None:
570 |         if use_swap:
571 |             if veryVerbose:
572 |                 print("Use swap gain", file=DEBUG_LOG)
573 |             temp = cur_mapping[node1]
574 |             cur_mapping[node1] = cur_mapping[node2]
575 |             cur_mapping[node2] = temp
576 |         else:
577 |             if veryVerbose:
578 |                 print("Use move gain", file=DEBUG_LOG)
579 |             cur_mapping[node1] = node2
580 |     else:
581 |         if veryVerbose:
582 |             print("no move/swap gain found", file=DEBUG_LOG)
583 |     if veryVerbose:
584 |         print("Original mapping", mapping, file=DEBUG_LOG)
585 |         print("Current mapping", cur_mapping, file=DEBUG_LOG)
586 |     return largest_gain, cur_mapping
587 | 
588 | 
589 | def print_alignment(mapping, instance1, instance2):
590 |     """
591 |     Return the alignment given by a node mapping as a space-separated string of node(concept)-node(concept) pairs ("-Null" for unmapped nodes)
592 |     Args:
593 |         mapping: current node mapping list
594 |         instance1: nodes of AMR 1
595 |         instance2: nodes of AMR 2
596 | 
597 |     """
598 |     result = []
599 |     for instance1_item, m in zip(instance1, mapping):
600 |         r = instance1_item[1] + "(" + instance1_item[2] + ")"
601 |         if m == -1:
602 |             r += "-Null"
603 |         else:
604 |             instance2_item = instance2[m]
605 |             r += "-" + instance2_item[1] + "(" + instance2_item[2] + ")"
606 |         result.append(r)
607 |     return " ".join(result)
608 | 
609 | 
610 | def compute_f(match_num, test_num, gold_num):
611 |     """
612 |     Compute the f-score based on the matching triple number,
613 |                                  triple number of AMR set 1,
614 |                                  triple number of AMR set 2
615 |     Args:
616 |         match_num: matching triple number
617 |         test_num:  triple number of AMR 1 (test file)
618 |         gold_num:  triple number of AMR 2 (gold file)
619 |     Returns:
620 |         precision: match_num/test_num
621 |         recall: match_num/gold_num
622 |         f_score: 2*precision*recall/(precision+recall)
623 |     """
624 |     if test_num == 0 or gold_num == 0:
625 |         return 0.00, 0.00, 0.00
626 |     precision = float(match_num) / float(test_num)
627 |     recall = float(match_num) / float(gold_num)
628 |     if (precision + recall) != 0:
629 |         f_score = 2 * precision * recall / (precision + recall)
630 |         if veryVerbose:
631 |             print("F-score:", f_score, file=DEBUG_LOG)
632 |         return precision, recall, f_score
633 |     else:
634 |         if veryVerbose:
635 |             print("F-score:", "0.0", file=DEBUG_LOG)
636 |         return precision, recall, 0.00
637 | 
638 | 
639 | def generate_amr_lines(f1, f2):
640 |     """
641 |     Read one AMR line at a time from each file handle
642 |     :param f1: file handle (or any iterable of strings) to read AMR 1 lines from
643 |     :param f2: file handle (or any iterable of strings) to read AMR 2 lines from
644 |     :return: generator of cur_amr1, cur_amr2 pairs: one-line AMR strings
645 |     """
646 |     while True:
647 |         cur_amr1 = amr.AMR.get_amr_line(f1)
648 |         cur_amr2 = amr.AMR.get_amr_line(f2)
649 |         if not cur_amr1 and not cur_amr2:
650 |             pass
651 |         elif not cur_amr1:
652 |             print("Error: File 1 has fewer AMRs than file 2", file=ERROR_LOG)
653 |             print("Ignoring remaining AMRs", file=ERROR_LOG)
654 |         elif not cur_amr2:
655 |             print("Error: File 2 has fewer AMRs than file 1", file=ERROR_LOG)
656 |             print("Ignoring remaining AMRs", file=ERROR_LOG)
657 |         else:
658 |             yield cur_amr1, cur_amr2
659 |             continue
660 |         break
661 | 
662 | 
663 | def get_amr_match(cur_amr1, cur_amr2, sent_num=1, justinstance=False, justattribute=False, justrelation=False):
664 |     amr_pair = []
665 |     for i, cur_amr in (1, cur_amr1), (2, cur_amr2):
666 |         try:
667 |             amr_pair.append(amr.AMR.parse_AMR_line(cur_amr))
668 |         except Exception as e:
669 |             print("Error in parsing amr %d: %s" % (i, cur_amr), file=ERROR_LOG)
670 |             print("Please check if the AMR is ill-formatted. Ignoring remaining AMRs", file=ERROR_LOG)
671 |             print("Error message: %s" % e, file=ERROR_LOG)
672 |     amr1, amr2 = amr_pair
673 |     prefix1 = "a"
674 |     prefix2 = "b"
675 |     # Rename nodes to "a1", "a2", etc.
676 |     amr1.rename_node(prefix1)
677 |     # Rename nodes to "b1", "b2", etc.
678 |     amr2.rename_node(prefix2)
679 |     (instance1, attributes1, relation1) = amr1.get_triples()
680 |     (instance2, attributes2, relation2) = amr2.get_triples()
681 |     if verbose:
682 |         print("AMR pair", sent_num, file=DEBUG_LOG)
683 |         print("============================================", file=DEBUG_LOG)
684 |         print("AMR 1 (one-line):", cur_amr1, file=DEBUG_LOG)
685 |         print("AMR 2 (one-line):", cur_amr2, file=DEBUG_LOG)
686 |         print("Instance triples of AMR 1:", len(instance1), file=DEBUG_LOG)
687 |         print(instance1, file=DEBUG_LOG)
688 |         print("Attribute triples of AMR 1:", len(attributes1), file=DEBUG_LOG)
689 |         print(attributes1, file=DEBUG_LOG)
690 |         print("Relation triples of AMR 1:", len(relation1), file=DEBUG_LOG)
691 |         print(relation1, file=DEBUG_LOG)
692 |         print("Instance triples of AMR 2:", len(instance2), file=DEBUG_LOG)
693 |         print(instance2, file=DEBUG_LOG)
694 |         print("Attribute triples of AMR 2:", len(attributes2), file=DEBUG_LOG)
695 |         print(attributes2, file=DEBUG_LOG)
696 |         print("Relation triples of AMR 2:", len(relation2), file=DEBUG_LOG)
697 |         print(relation2, file=DEBUG_LOG)
698 |     # optionally turn off some of the triple comparisons
699 |     doinstance = doattribute = dorelation = True
700 |     if justinstance:
701 |         doattribute = dorelation = False
702 |     if justattribute:
703 |         doinstance = dorelation = False
704 |     if justrelation:
705 |         doinstance = doattribute = False
706 |     (best_mapping, best_match_num) = get_best_match(instance1, attributes1, relation1,
707 |                                                     instance2, attributes2, relation2,
708 |                                                     prefix1, prefix2, doinstance=doinstance,
709 |                                                     doattribute=doattribute, dorelation=dorelation)
710 |     if verbose:
711 |         print("best match number", best_match_num, file=DEBUG_LOG)
712 |         print("best node mapping", best_mapping, file=DEBUG_LOG)
713 |         print("Best node mapping alignment:", print_alignment(best_mapping, instance1, instance2), file=DEBUG_LOG)
714 |     if justinstance:
715 |         test_triple_num = len(instance1)
716 |         gold_triple_num = len(instance2)
717 |     elif justattribute:
718 |         test_triple_num = len(attributes1)
719 |         gold_triple_num = len(attributes2)
720 |     elif justrelation:
721 |         test_triple_num = len(relation1)
722 |         gold_triple_num = len(relation2)
723 |     else:
724 |         test_triple_num = len(instance1) + len(attributes1) + len(relation1)
725 |         gold_triple_num = len(instance2) + len(attributes2) + len(relation2)
726 |     return best_match_num, test_triple_num, gold_triple_num
727 | 
728 | 
729 | def score_amr_pairs(f1, f2, justinstance=False, justattribute=False, justrelation=False):
730 |     """
731 |     Score one pair of AMR lines at a time from each file handle
732 |     :param f1: file handle (or any iterable of strings) to read AMR 1 lines from
733 |     :param f2: file handle (or any iterable of strings) to read AMR 2 lines from
734 |     :param justinstance: just pay attention to matching instances
735 |     :param justattribute: just pay attention to matching attributes
736 |     :param justrelation: just pay attention to matching relations
737 |     :return: generator of (precision, recall, f_score) tuples: one per AMR pair if single_score is False, otherwise one document-level tuple
738 |     """
739 |     # matching triple number, triple number in test file, triple number in gold file
740 |     total_match_num = total_test_num = total_gold_num = 0
741 |     # Read amr pairs from two files
742 |     for sent_num, (cur_amr1, cur_amr2) in enumerate(generate_amr_lines(f1, f2), start=1):
743 |         best_match_num, test_triple_num, gold_triple_num = get_amr_match(cur_amr1, cur_amr2,
744 |                                                                          sent_num=sent_num,  # sentence number
745 |                                                                          justinstance=justinstance,
746 |                                                                          justattribute=justattribute,
747 |                                                                          justrelation=justrelation)
748 |         total_match_num += best_match_num
749 |         total_test_num += test_triple_num
750 |         total_gold_num += gold_triple_num
751 |         # clear the matching triple dictionary for the next AMR pair
752 |         match_triple_dict.clear()
753 |         if not single_score:  # if each AMR pair should have a score, compute and output it here
754 |             yield compute_f(best_match_num, test_triple_num, gold_triple_num)
755 |     if verbose:
756 |         print("Total match number, total triple number in AMR 1, and total triple number in AMR 2:", file=DEBUG_LOG)
757 |         print(total_match_num, total_test_num, total_gold_num, file=DEBUG_LOG)
758 |         print("---------------------------------------------------------------------------------", file=DEBUG_LOG)
759 |     if single_score:  # output document-level smatch score (a single f-score for all AMR pairs in two files)
760 |         yield compute_f(total_match_num, total_test_num, total_gold_num)
761 | 
762 | 
763 | def main(arguments):
764 |     """
765 |     Main function of smatch score calculation
766 |     """
767 |     global verbose
768 |     global veryVerbose
769 |     global iteration_num
770 |     global single_score
771 |     global pr_flag
772 |     global match_triple_dict
773 |     # set the iteration number
774 |     # total iteration number = restart number + 1
775 |     iteration_num = arguments.r + 1
776 |     if arguments.ms:
777 |         single_score = False
778 |     if arguments.v:
779 |         verbose = True
780 |     if arguments.vv:
781 |         veryVerbose = True
782 |     if arguments.pr:
783 |         pr_flag = True
784 |     # number of digits to print after the decimal point
785 |     floatdisplay = "%%.%df" % arguments.significant
786 |     for (precision, recall, best_f_score) in score_amr_pairs(arguments.f[0], arguments.f[1],
787 |                                                              justinstance=arguments.justinstance,
788 |                                                              justattribute=arguments.justattribute,
789 |                                                              justrelation=arguments.justrelation):
790 |         # print("Sentence", sent_num)
791 |         if pr_flag:
792 |             print("Precision: " + floatdisplay % precision)
793 |             print("Recall: " + floatdisplay % recall)
794 |         print("F-score: " + floatdisplay % best_f_score)
795 |     arguments.f[0].close()
796 |     arguments.f[1].close()
797 | 
798 | 
799 | if __name__ == "__main__":
800 |     import argparse
801 | 
802 |     parser = argparse.ArgumentParser(description="Smatch calculator")
803 |     parser.add_argument(
804 |         '-f',
805 |         nargs=2,
806 |         required=True,
807 |         type=argparse.FileType('r'),
808 |         help=('Two files containing AMR pairs. '
809 |               'AMRs in each file are separated by a single blank line'))
810 |     parser.add_argument(
811 |         '-r',
812 |         type=int,
813 |         default=4,
814 |         help='Restart number (Default:4)')
815 |     parser.add_argument(
816 |         '--significant',
817 |         type=int,
818 |         default=2,
819 |         help='number of digits to output after the decimal point (default: 2)')
820 |     parser.add_argument(
821 |         '-v',
822 |         action='store_true',
823 |         help='Verbose output (Default:false)')
824 |     parser.add_argument(
825 |         '--vv',
826 |         action='store_true',
827 |         help='Very Verbose output (Default:false)')
828 |     parser.add_argument(
829 |         '--ms',
830 |         action='store_true',
831 |         default=False,
832 |         help=('Output multiple scores (one score per AMR pair) '
833 |               'instead of a single document-level smatch score '
834 |               '(Default: false)'))
835 |     parser.add_argument(
836 |         '--pr',
837 |         action='store_true',
838 |         default=False,
839 |         help=('Output precision and recall as well as the f-score. '
840 |               'Default: false'))
841 |     parser.add_argument(
842 |         '--justinstance',
843 |         action='store_true',
844 |         default=False,
845 |         help="just pay attention to matching instances")
846 |     parser.add_argument(
847 |         '--justattribute',
848 |         action='store_true',
849 |         default=False,
850 |         help="just pay attention to matching attributes")
851 |     parser.add_argument(
852 |         '--justrelation',
853 |         action='store_true',
854 |         default=False,
855 |         help="just pay attention to matching relations")
856 | 
857 |     args = parser.parse_args()
858 |     main(args)
859 | 
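How compute_match turns a node mapping into a triple count can be illustrated with a minimal, self-contained sketch. It assumes a tiny weight_dict built by hand for two hypothetical toy AMRs; `WEIGHTS` and `count_matches` are illustrative names, not part of smatch.py, and a real weight dictionary is produced by the candidate-mapping code and would typically contain more entries (for example, attribute triples).

```python
# Hand-built, hypothetical weight_dict for two toy AMRs:
#   AMR 1: (a0 / want-01 :ARG0 (a1 / boy))
#   AMR 2: (b0 / want-01 :ARG0 (b1 / girl))
# Keys are (AMR 1 node index, AMR 2 node index) pairs. The -1 entry counts
# instance/attribute triples matched by that pair alone; a node-pair key counts
# relation triples matched when both node pairs are in the mapping.
WEIGHTS = {
    (0, 0): {-1: 1, (1, 1): 1},  # want-01 matches want-01; :ARG0 edge via (1, 1)
    (1, 1): {-1: 0, (0, 0): 1},  # boy != girl, but the :ARG0 edge still matches
}


def count_matches(mapping, weight_dict):
    """Count matching triples for a node mapping, following compute_match
    (without the match_triple_dict cache or the debug logging)."""
    match_num = 0
    for i, m in enumerate(mapping):
        if m == -1 or (i, m) not in weight_dict:
            continue  # node i is unmapped, or this node pair matches nothing
        for key, weight in weight_dict[(i, m)].items():
            if key == -1:
                match_num += weight  # instance/attribute triples
            elif key[0] >= i and mapping[key[0]] == key[1]:
                # relation weights are stored under both node pairs, so count
                # each relation only from its lower-indexed AMR 1 node
                match_num += weight
    return match_num


print(count_matches([0, 1], WEIGHTS))   # 2: want-01 instance + :ARG0 relation
print(count_matches([1, 0], WEIGHTS))   # 0: crossed alignment matches nothing
print(count_matches([0, -1], WEIGHTS))  # 1: a1 unmapped, so the edge is lost
```

Because every relation weight is stored twice (once under each endpoint's node pair), skipping keys whose AMR 1 index is lower than `i` is what keeps each relation from being counted twice.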


--------------------------------------------------------------------------------
/smatch_tool_guideline.txt:
--------------------------------------------------------------------------------
 1 | Smatch Tool Guideline
 2 | 
 3 | Smatch is a tool to evaluate the semantic overlap between AMRs (Abstract Meaning Representations). It can be used to compute inter-annotator agreement on AMRs, and the agreement between an automatically generated AMR and a gold AMR. For multiple AMR pairs, the smatch tool can provide an overall score for all the pairs.
 4 | 
 5 | I. Content and web demo pages
 6 | 
 7 | This directory contains the Smatch source code and documentation. 
 8 | 
 9 | Smatch Webpages
10 | 
11 | Smatch tool webpage: http://amr.isi.edu/eval/smatch/compare.html (A quick tutorial can be found on the page)
12 | - input: two AMRs. 
13 | - output: the smatch score and the matching/non-matching triples.
14 | 
15 | Smatch table tool webpage: http://amr.isi.edu/eval/smatch/table.html
16 | - input: AMR IDs and users. 
17 | - output: a table which consists of the smatch scores of every pair of users.
18 | 
19 | II. Installation
20 | 
21 | Python 3 is required to run the smatch tool (the CI configuration tests Python 3.5 to 3.8). No compilation is necessary.
22 | 
23 | III. Usage
24 | 
25 | The Smatch tool consists of three Python files.
26 | 
27 | 1. smatch.py: for computing the smatch score(s) for multiple AMRs in two files.
28 | 
29 | Input: two files which contain AMRs. Each file may contain multiple AMRs, and consecutive AMRs are separated by a blank line. An AMR can be written on one line or across multiple lines, as long as there is no blank line inside it.
30 | 
31 | Input file format: see test_input1.txt and test_input2.txt in the smatch tool folder. AMRs are separated by one or more blank lines, so no blank lines are allowed inside an AMR. Lines starting with a hash (#) are ignored.
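The splitting rule above can be sketched in a few lines of Python. This is a minimal illustration, not smatch's own reader, and the function name `read_amrs` is hypothetical. A blank line ends an AMR, lines starting with # are skipped, and the remaining lines of one AMR are joined with single spaces.

```python
def read_amrs(lines):
    """Yield one AMR string per blank-line-separated block, skipping '#' lines."""
    current = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A blank line ends the current AMR; runs of blank lines are harmless.
            if current:
                yield " ".join(current)
                current = []
        elif stripped.startswith("#"):
            continue  # comment/metadata lines such as "# ::id ..." are ignored
        else:
            current.append(stripped)
    if current:
        # The last AMR in a file need not be followed by a blank line.
        yield " ".join(current)
```

Usage: `with open("test_input1.txt") as f: amrs = list(read_amrs(f))` should give one string per AMR in the sample file.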
32 | 
33 | Output: the computed smatch score(s)
34 | 
35 | Usage: python smatch.py [-h] -f F F [-r R] [-v] [--ms] [--pr]
36 | 
37 | arguments:
38 | 
39 | -h: help
40 | 
41 | -f: two files, each containing one or more AMRs separated by blank lines. Required.
42 | 
43 | -r: number of restarts of the heuristic (hill-climbing) search, optional. Default value: 4. This argument must be a positive integer. A larger restart number reduces the chance of a search error but increases the running time; a smaller one reduces the running time but increases the chance of a search error. The default value is generally a good trade-off. Users can set a larger number if the AMRs are long (so the search space is large) and calculation speed is not a concern.
44 | 
45 | -v: verbose output, optional. Default value: false. The verbose output includes the triples of each AMR, the number of matching triples found in each iteration, and the best number of matching triples. It is useful for understanding how the program works; most users will not need it.
46 |  
47 | --ms: multiple scores, optional. With this option, smatch outputs a separate score for each AMR pair. Otherwise it outputs a single weighted score over all AMR pairs, in which each AMR is weighted by its number of triples.
48 | Default value: false
49 | 
50 | --pr: output precision and recall in addition to the F-score, optional. Default value: false
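The restart trade-off described under -r can be shown with a toy hill climber. Each restart begins from a random one-to-one mapping of AMR-1 variables to AMR-2 variables. It then greedily applies whichever single remap or swap most increases the number of matching triples, and stops at a local optimum. The best result over all restarts is kept.

This is a simplified sketch of hill climbing with random restarts in general, not the code in smatch.py. The function names and the triple format are assumptions: `('instance', var, concept)` for instances and `(role, var, var_or_constant)` for everything else.

```python
import itertools
import random


def count_matches(triples1, triples2, mapping):
    """Count triples of AMR 1 that also occur in AMR 2 once variables are renamed."""
    target = set(triples2)
    # Mapped variables are renamed; concepts and constants are compared unchanged.
    # A variable mapped to None never matches anything.
    return sum(
        (rel, mapping.get(src, src), mapping.get(tgt, tgt)) in target
        for rel, src, tgt in triples1
    )


def hill_climb(vars1, vars2, triples1, triples2, rng):
    """One restart: random one-to-one start, then greedy single moves until stuck."""
    pool = list(vars2) + [None] * max(0, len(vars1) - len(vars2))
    rng.shuffle(pool)
    mapping = dict(zip(vars1, pool))  # None means "left unmapped"
    best = count_matches(triples1, triples2, mapping)
    while True:
        used = set(mapping.values())
        # Remap move: point one AMR-1 variable at a currently unused AMR-2 variable.
        candidates = [
            {**mapping, v: w} for v in vars1 for w in vars2 if w not in used
        ]
        # Swap move: exchange the targets of two AMR-1 variables.
        candidates += [
            {**mapping, a: mapping[b], b: mapping[a]}
            for a, b in itertools.combinations(vars1, 2)
        ]
        if not candidates:
            return best
        score, new_mapping = max(
            ((count_matches(triples1, triples2, m), m) for m in candidates),
            key=lambda item: item[0],
        )
        if score <= best:
            return best  # local optimum: no single move improves the score
        best, mapping = score, new_mapping


def best_match(triples1, triples2, restarts=4, seed=None):
    """Best match count over `restarts` (>= 1) independent hill-climbing runs."""
    vars1 = sorted({src for rel, src, _ in triples1 if rel == "instance"})
    vars2 = sorted({src for rel, src, _ in triples2 if rel == "instance"})
    rng = random.Random(seed)
    return max(
        hill_climb(vars1, vars2, triples1, triples2, rng) for _ in range(restarts)
    )
```

More restarts mean more calls to `hill_climb`. That costs time linearly, but it also makes it less likely that every run gets stuck in the same poor local optimum. Passing a fixed `seed` makes this toy reproducible. smatch itself is non-deterministic, as tests/README.md notes.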
51 | 
52 | A typical (and most common) example of running smatch.py: 
53 | 
54 | python smatch.py -f test_input1.txt test_input2.txt
55 | 
56 | This folder includes the sample files test_input1.txt and test_input2.txt, so you should be able to run the above command as is. It should print the following line:
57 | Document F-score: 0.81
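The document-level score pools the counts of all AMR pairs before computing precision, recall and F-score. Because of this pooling, larger AMRs (more triples) weigh more, as described under --ms. The Python below sketches that arithmetic.

It assumes each pair contributes a (matching, test, gold) triple count. The function names `prf` and `document_prf` are hypothetical, not smatch's API.

```python
def prf(match_num, test_num, gold_num):
    """Precision, recall and F-score from triple counts (0.0 for empty denominators)."""
    precision = match_num / test_num if test_num else 0.0
    recall = match_num / gold_num if gold_num else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def document_prf(pair_counts):
    """Sum (match, test, gold) counts over all AMR pairs, then score once."""
    if not pair_counts:
        return 0.0, 0.0, 0.0
    match_total, test_total, gold_total = (sum(col) for col in zip(*pair_counts))
    return prf(match_total, test_total, gold_total)
```

Take the counts (4, 4, 4) and (3, 4, 4), the tuples asserted in tests/test_top.py for two identical AMRs and for AMRs differing in one concept. The per-pair F-scores are 1.0 and 0.75, while the pooled score is 7/8 = 0.875. Now take a one-triple pair that matches fully, (1, 1, 1), next to a ten-triple pair that half matches, (5, 10, 10). The pooled F-score is 6/11 (about 0.545), well below the unweighted mean of 0.75.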
58 | 
59 | 2. amr.py: defines a class that represents the AMR structure, including a function that parses lines of text into that structure. smatch.py uses it to parse AMRs.
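The sketch below is a simplified, standalone illustration of the kind of conversion the parser in amr.py performs. It turns one AMR string into instance, attribute and relation triples, the three triple kinds the --justinstance/--justattribute/--justrelation options refer to. It is not the amr.py code, and the function name `amr_to_triples` is hypothetical.

It makes several simplifying assumptions:
- It adds a `("TOP", root, "top")` triple, chosen so that each two-node AMR in tests/test_top.py yields the 4 triples counted there.
- It naively treats every role ending in `-of` as an inverse role.
- It does no normalization of quotes or case.

```python
import re

# Tokens: "(", ")", "/", quoted strings, and any other run of non-space characters.
TOKEN_RE = re.compile(r'\(|\)|/|"[^"]*"|[^\s()/"]+')


def amr_to_triples(text):
    """Return (instances, attributes, relations) triple lists for one AMR string."""
    tokens = TOKEN_RE.findall(text)
    pos = 0
    instances, edges = [], []

    def parse_node():
        nonlocal pos
        if tokens[pos] != "(" or tokens[pos + 2] != "/":
            raise ValueError("expected '(var / concept' at token %d" % pos)
        var, concept = tokens[pos + 1], tokens[pos + 3]
        pos += 4
        instances.append(("instance", var, concept))
        while tokens[pos] != ")":
            role = tokens[pos][1:]  # ":ARG0" -> "ARG0"
            pos += 1
            if tokens[pos] == "(":
                target = parse_node()  # nested node: recurse
            else:
                target = tokens[pos]  # variable reference or constant
                pos += 1
            edges.append((role, var, target))
        pos += 1  # consume the closing ")"
        return var

    top = parse_node()
    variables = {var for _, var, _ in instances}
    # Assumed TOP triple marking the root node.
    attributes, relations = [("TOP", top, "top")], []
    for role, source, target in edges:
        if target not in variables:
            attributes.append((role, source, target))  # constant, e.g. :op1 "William"
        elif role.endswith("-of"):
            relations.append((role[:-3], target, source))  # naive inverse-role handling
        else:
            relations.append((role, source, target))
    return instances, attributes, relations
```

Applied to the AMRs `a` and `d` from tests/test_top.py, this sketch yields the same instance and relation triples but different TOP triples. That is consistent with the 3-of-4 match the test expects for a different top node.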
60 | 
61 | 3. smatch-table.py: calls the smatch library to compute smatch scores for a group of users over multiple AMR IDs, and outputs a table showing the score between each pair of users.
62 | 
63 | Input: an AMR ID list and a user list. The AMR ID list can be stored in a file (--fl file) or given on the command line (-f AMR_ID1 AMR_ID2 ...). The user list is given on the command line (-p user1 user2 ...). If no users are given, the program searches for all users who annotated all of the requested AMRs. At least two users are required.
64 | 
65 | Input file format: AMR ID list (see sample_file_list in the smatch tool folder)
66 | 
67 | Output: A table which shows the overall AMR score between every pair of users. 
68 | 
69 | Usage: python smatch-table.py [-h] [--fl FL] [-f F [F ...]] [-p [P [P ...]]]
70 |                        [--fd FD] [-r R] [-v]
71 | 
72 | optional arguments:
73 | 
74 | -h, --help      show this help message and exit
75 | 
76 | --fl FL         AMR ID list file (a file containing one line of AMR IDs, separated by whitespace)
77 | 
78 | -f F [F ...]    AMR IDs (at least one). If a valid AMR ID list file is given (--fl), this option is ignored.
79 | 
80 | -p [P [P ...]]  User list (optional; if omitted, the program searches for all users who annotated all of the requested AMRs). Since smatch-table computes agreement between each pair of users, giving only one user is meaningless; at least two users are required.
81 | 
82 | --fd FD         AMR file directory. Default: a location on the ISI file system
83 | 
84 | -r R            Restart number (Default:4), same as the -r option in smatch.py
85 | 
86 | -v              Verbose output (Default:False), same as the -v option in smatch.py
87 | 
88 | 
89 | A typical example of running smatch-table.py: 
90 | 
91 | python smatch-table.py --fd $amr_root_dir --fl sample_file_list -p ulf knight
92 | 
93 | which compares the files
94 | $amr_root_dir/ulf/nw_wsj_0001_1.txt $amr_root_dir/knight/nw_wsj_0001_1.txt
95 | $amr_root_dir/ulf/nw_wsj_0001_2.txt $amr_root_dir/knight/nw_wsj_0001_2.txt
96 | etc.
97 | 
98 | Note: smatch-table.py computes smatch scores for every pair of users, so it can be slow when the number of users is large or when the -p option is not set (in that case, smatch scores are computed for all users who annotated the requested AMRs).
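The sketch below shows why the number of users drives the running time: it enumerates the file pairs smatch-table.py has to score. The `<fd>/<user>/<AMR ID>.txt` layout is inferred from the example above. The function names `load_id_list` and `file_pairs` are hypothetical, not part of smatch-table.py.

```python
import itertools
import os


def load_id_list(text):
    """AMR IDs from the contents of an --fl file: one line, whitespace-separated."""
    return text.split()


def file_pairs(amr_root_dir, users, amr_ids):
    """Yield (user1, user2, file1, file2) for every pair of users and every AMR ID."""
    for user1, user2 in itertools.combinations(users, 2):
        for amr_id in amr_ids:
            yield (
                user1,
                user2,
                os.path.join(amr_root_dir, user1, amr_id + ".txt"),
                os.path.join(amr_root_dir, user2, amr_id + ".txt"),
            )
```

With n users there are n(n-1)/2 user pairs, each scored over every AMR ID. For example, 2 users give 1 pair and 10 users give 45 pairs. This quadratic growth is why omitting -p over a large pool of annotators can be slow.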
99 | 


--------------------------------------------------------------------------------
/test_input1.txt:
--------------------------------------------------------------------------------
 1 | # ::id isi_0001.1 ::date 2012-05-14T21:45:29
 2 | # ::snt The boy wants the girl to believe him.
 3 | (w / want-01
 4 |       :ARG0 (b / boy)
 5 |       :ARG1 (b2 / believe-01
 6 |             :ARG0 (g / girl)
 7 |             :ARG1 b))
 8 | 
 9 | # ::id isi_0001.25 ::date 2012-05-14T21:59:17
10 | # ::snt The boy is a hard worker.
11 | (p / person
12 |       :domain (b / boy)
13 |       :ARG0-of (w / work-01
14 |             :manner (h / hard)))
15 | 
16 | # ::id isi_0002.209 ::date 2013-05-16T17:19:07
17 | # ::snt The poet William Shakespeare was born in Stratford-upon-Avon.
18 | (b / bear-02
19 |       :ARG1 (p / poet :name (n / name :op1 "William" :op2 "Shakespeare"))
20 |       :location (c / city :name (n2 / name :op1 "Stratford-upon-Avon")))
21 | 
22 | 


--------------------------------------------------------------------------------
/test_input2.txt:
--------------------------------------------------------------------------------
 1 | # ::id isi_0001.1 ::date 2012-05-14T21:45:29
 2 | # ::snt The boy wants the girl to believe him.
 3 | (w / want-01
 4 |       :ARG0 (b / boy)
 5 |       :ARG1 (b2 / believe-01
 6 |             :ARG0 (g / girl)
 7 |             :ARG1 (h / he)))
 8 | 
 9 | # ::id isi_0001.25 ::date 2012-05-14T21:59:17
10 | # ::snt The boy is a hard worker.
11 | (w / worker
12 |       :mod (h / hard)
13 |       :domain (b / boy))
14 | 
15 | # ::id isi_0002.209 ::date 2013-05-16T17:19:07
16 | # ::snt The poet William Shakespeare was born in Stratford-upon-Avon.
17 | (b / bear-02
18 |       :ARG1 (p / poet :name (n / name :op1 william :op2 "shakespeare"))
19 |       :location (c / city :name (n2 / name :op1 "Stratford-upon-Avon")))
20 | 
21 | 


--------------------------------------------------------------------------------
/tests/README.md:
--------------------------------------------------------------------------------
 1 | 
 2 | # smatch tests
 3 | 
 4 | This directory contains test cases to verify the correct behavior of
 5 | smatch. Run the tests with [pytest](https://pytest.org/). This will
 6 | require you to install both pytest and smatch:
 7 | 
 8 | ```console
 9 | $ pip install pytest
10 | $ pip install -e .  # current directory is smatch
11 | $ pytest
12 | ```
13 | 
14 | **Note:** As smatch is inherently non-deterministic due to its
15 | hill-climbing implementation, the tests can be "flaky" (i.e.,
16 | sometimes pass, sometimes fail). To mitigate the possibility of flaky
17 | tests, test cases should use *minimal* AMRs so it becomes trivial for
18 | smatch to get the optimal solution.
19 | 


--------------------------------------------------------------------------------
/tests/test_top.py:
--------------------------------------------------------------------------------
 1 | 
 2 | """
 3 | The top node in an AMR determines the focus of the encoded
 4 | meaning. For example, the following have the same dependencies but a
 5 | different node at the top:
 6 | 
 7 |     (w / white-03
 8 |        :ARG1 (m / marble))
 9 | 
10 | The above AMR means "the marble is white" or "the whiteness of the
11 | marble".
12 | 
13 |     (m / marble
14 |        :ARG1-of (w / white-03))
15 | 
16 | This AMR means "the marble that is white" or "the white marble".
17 | 
18 | For this reason, AMRs that differ only in which node is the top will
19 | get smatch scores less than 1.0.
20 | 
21 | For more information see:
22 | 
23 |     https://github.com/amrisi/amr-guidelines/blob/master/amr.md#focus
24 | """
25 | 
26 | import smatch
27 | 
28 | a = '(a / alpha :ARG0 (b / beta))'
29 | b = '(a / alternative :ARG0 (b / beta))'
30 | c = '(a / alpha :ARG0 (b / b-side))'
31 | d = '(b / beta :ARG0-of (a / alpha))'
32 | 
33 | 
34 | def get_amr_match(amr1, amr2):
35 |     vals = smatch.get_amr_match(amr1, amr2)
36 |     smatch.match_triple_dict.clear()
37 |     return vals
38 | 
39 | 
40 | def test_same():
41 |     assert get_amr_match(a, a) == (4, 4, 4)
42 |     smatch.match_triple_dict.clear()
43 | 
44 | 
45 | def test_same_top_different_top_concept():
46 |     assert get_amr_match(a, b) == (3, 4, 4)
47 |     smatch.match_triple_dict.clear()
48 | 
49 | 
50 | def test_same_top_different_dependent_concept():
51 |     assert get_amr_match(a, c) == (3, 4, 4)
52 |     smatch.match_triple_dict.clear()
53 | 
54 | 
55 | def test_same_different_top():
56 |     assert get_amr_match(a, d) == (3, 4, 4)
57 |     smatch.match_triple_dict.clear()
58 | 


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
1 | [tox]
2 | envlist = py{35,36,37,38}
3 | 
4 | [testenv]
5 | commands = python smatch.py -f test_input1.txt test_input2.txt
6 | 
7 | 


--------------------------------------------------------------------------------