├── README.md
├── automatic-concept-map.pdf
├── conceptmap.py
├── main2.py
├── open_ie_api.py
├── sample
│   ├── out.png
│   └── readme
├── samples.txt
├── test.txt
└── word_list.txt
/README.md:
--------------------------------------------------------------------------------
1 | # Concept-Map-Generator
2 | Term Project for Language Processing for E-Learning
3 |
4 | This work was done as part of a mini project for the course on Language Processing for E-Learning under the guidance of Professor Plaban Bhowmick.
5 |
6 | It makes use of the DBpedia Spotlight Web API to extract important concepts from a given text and then uses OpenIE to extract relations. Visualisation is done using GraphViz.
7 |
8 | This work uses the python wrapper for Stanford OpenIE written by [philipperemy](https://github.com/philipperemy/Stanford-OpenIE-Python).
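
`main2.py` runs OpenIE with `-format ollie` and, for each output line, takes the text between the first `(` and the first `)` and splits it on `;`. A minimal sketch of that parsing step follows. It assumes each line carries one parenthesised `subject; relation; object` triple; the function name `parse_ollie_line` and the sample line are illustrative and not part of the repository. Unlike the script, the sketch also strips surrounding whitespace and skips lines that hold no triple.

```python
def parse_ollie_line(line):
    """Return [subject, relation, object] from one ollie-style line, or None."""
    start = line.find("(")
    end = line.rfind(")")
    if start == -1 or end <= start:
        return None  # no parenthesised triple on this line
    parts = [part.strip() for part in line[start + 1:end].split(";")]
    return parts if len(parts) == 3 else None


if __name__ == "__main__":
    # Hypothetical sample line; the leading confidence value is made up.
    print(parse_ollie_line("1.000: (Barack Obama; was born in; Hawaii)"))
```

Stripping matters for the later lookup against `word_list.txt`, since an object such as `" Hawaii"` with a leading space would not match the entry `Hawaii`.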
9 |
10 | ## Usage
11 |
12 | First of all, make sure Java 1.8 is installed. Open a terminal and run this command to check:
13 |
14 | ```
15 | java -version
16 | ```
17 |
18 | If this is not the case and if your OS is Ubuntu, you can install it this way:
19 |
20 | ```
21 | sudo add-apt-repository ppa:webupd8team/java
22 | sudo apt-get update
23 | sudo apt-get install oracle-java8-installer
24 | ```
25 | Note: Make sure GraphViz is installed beforehand. Try to run the `dot` command to see if this is the case. If not, run `sudo apt-get install graphviz` if you're running on Ubuntu.
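
Both prerequisites can also be checked from Python before running the scripts. The sketch below is not part of the repository; the helper name `missing_tools` is made up for illustration, and it only checks that the executables are on `PATH`, not which Java version is installed.

```python
import shutil


def missing_tools(tools=("java", "dot")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("Both java and dot were found on PATH.")
```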
26 |
27 | How to use:
28 | Replace `Your Text` in the commands below with your input text:
29 | ```
30 | git clone https://github.com/philipperemy/Stanford-OpenIE-Python.git
31 | git clone https://github.com/PranavKhadpe/Concept-Map-Generator.git
32 | cp Concept-Map-Generator/{main2.py,conceptmap.py} Stanford-OpenIE-Python
33 | cd Stanford-OpenIE-Python
34 | echo Your Text > test.txt
35 | python conceptmap.py -i test.txt
36 | python main2.py -f test.txt -g
37 | ```
38 | This generates a [GraphViz DOT](http://www.graphviz.org/) graph and the corresponding PNG file in `/tmp/openie/`.
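
The graph-building step in `main2.py` keeps a triple when its subject or object appears in `word_list.txt` and its relation is longer than five characters, then writes one DOT edge per kept triple. A rough sketch of that filter as a pure function is shown here. The name `build_dot`, the `min_relation_len` parameter, and the sample triples are assumptions made for illustration; the script itself reads the concepts from a file and writes the result to `/tmp/openie/out.dot`.

```python
def build_dot(triples, concepts, min_relation_len=6):
    """Return DOT source with one labelled edge per kept triple."""
    lines = ["digraph {"]
    for subject, relation, obj in triples:
        touches_concept = subject in concepts or obj in concepts
        if touches_concept and len(relation) >= min_relation_len:
            lines.append('    "{}" -> "{}" [ label="{}" ];'.format(subject, obj, relation))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up triples and concept list, only to show the filtering.
    triples = [
        ("internet", "uses TCP/IP to transmit", "data"),
        ("internet", "is", "network"),
        ("data", "is transmitted via", "media"),
    ]
    print(build_dot(triples, {"internet"}))
```

Saving the returned text to a file and running `dot -Tpng out.dot -o out.png` renders it, which is the same command line `main2.py` builds.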
39 |
40 | ## Example
41 |
42 | Sample map generated for an elementary paragraph on photosynthesis.
43 |
--------------------------------------------------------------------------------
/automatic-concept-map.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/automatic-concept-map.pdf
--------------------------------------------------------------------------------
/conceptmap.py:
--------------------------------------------------------------------------------
1 | '''
2 | Automatic Concept Map Generator
3 | Author: Pranav Khadpe
4 | '''
5 | import sys
6 | import getopt
7 | import requests
8 | import urllib.parse
9 | import json
10 | import nltk
11 |
12 |
13 | '''
14 | Function to get the important concepts using the DBpedia Spotlight web API.
15 | query = text
16 | '''
17 | def GetImportantConcept(text):
18 |     query = text
19 |     params = urllib.parse.urlencode({'text': query})
20 |     url = "http://model.dbpedia-spotlight.org/en/spot?%s" % params
21 |     request = requests.get(url, headers={"accept": "application/json"})
22 |     pydict = json.loads(request.content)
23 |     entities = pydict["annotation"].get("surfaceForm", [])
24 |     if isinstance(entities, dict):  # a single match may come back as a bare object
25 |         entities = [entities]
26 |     # Collect the surface form (the matched text) of every spotted concept.
27 |     word_list = [entity['@name'] for entity in entities]
28 |     return word_list
29 | '''
30 | Accepts the text and returns a list of sentences that contain the important concepts.
31 | '''
32 | def GetSentences(word_list, text):
33 |     # Split the text into sentences (requires NLTK's sentence tokenizer data).
34 |     sent_list = nltk.sent_tokenize(text)
35 |     impsent_list = []
36 |     for sentence in sent_list:
37 |         if any(word in sentence for word in word_list):
38 |             impsent_list.append(sentence)
39 |     return impsent_list
40 |
41 | '''
42 | def ExtractRelations(word_list, sentences):
43 | '''
44 |
45 | def main(argv):
46 |     inputfile = ''
47 |     outputfile = ''  # parsed from -o but not used yet
48 |     try:
49 |         opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
50 |     except getopt.GetoptError:
51 |         print('conceptmap.py -i <inputfile> -o <outputfile>')
52 |         sys.exit(2)
53 |     for opt, arg in opts:
54 |         if opt == '-h':
55 |             print('conceptmap.py -i <inputfile> -o <outputfile>')
56 |             sys.exit()
57 |         elif opt in ("-i", "--ifile"):
58 |             inputfile = arg
59 |         elif opt in ("-o", "--ofile"):
60 |             outputfile = arg
61 |     print('Input file is ', inputfile)
62 |     print('Output file is ', outputfile)
63 |     with open(inputfile, 'r') as in_file:
64 |         text = in_file.read()
65 |     word_list = GetImportantConcept(text)
66 |     sentences = GetSentences(word_list, text)
67 |     print(word_list)
68 |     print(sentences)
69 |     # main2.py reads word_list.txt to decide which relations to draw.
70 |     with open('word_list.txt', 'w') as word_file:
71 |         for item in word_list:
72 |             word_file.write("%s\n" % item)
73 |
74 |
75 | if __name__ == "__main__":
76 |     print("Usage: conceptmap.py -i <inputfile> -o <outputfile>")
77 |     main(sys.argv[1:])
78 |
79 |
80 |
81 |
82 |
83 |
--------------------------------------------------------------------------------
/main2.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | '''
3 | Concept Map Generator.
4 | usage: python main2.py -f input_file -g
5 | Author: Pranav Khadpe
6 | Date: 06-04-2018
7 |
8 | Note: Parts of this code are lifted as is from those written by Philippe Remy.
9 | '''
10 |
11 | # Copyright (c) 2016, Philippe Remy
12 | #
13 | # Permission to use, copy, modify, and/or distribute this software for any
14 | # purpose with or without fee is hereby granted, provided that the above
15 | # copyright notice and this permission notice appear in all copies.
16 | #
17 | # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
18 | # WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
19 | # MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
20 | # ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
21 | # WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
22 | # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
23 | # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
24 |
25 | from __future__ import print_function
26 |
27 | import os
28 | from argparse import ArgumentParser
29 | from subprocess import Popen
30 | from sys import argv
31 | from sys import stderr
32 |
33 | JAVA_BIN_PATH = 'java'
34 | DOT_BIN_PATH = 'dot'
35 | STANFORD_IE_FOLDER = 'stanford-openie'
36 |
37 | tmp_folder = '/tmp/openie/'
38 | if not os.path.exists(tmp_folder):
39 |     os.makedirs(tmp_folder)
40 |
41 |
42 | def arg_parse():
43 |     arg_p = ArgumentParser('Stanford IE Python Wrapper')
44 |     arg_p.add_argument('-f', '--filename', type=str, default=None)
45 |     arg_p.add_argument('-v', '--verbose', action='store_true')
46 |     arg_p.add_argument('-g', '--generate_graph', action='store_true')
47 |     return arg_p
48 |
49 |
50 | def debug_print(log, verbose):
51 |     if verbose:
52 |         print(log)
53 |
54 |
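55 | def process_entity_relations(entity_relations_str, verbose=True):
56 |     # Format is ollie; each line holds one "(subject; relation; object)" triple.
57 |     entity_relations = list()
58 |     for s in entity_relations_str:
59 |         # Strip each part so that " Hawaii" can match "Hawaii" in word_list.txt.
60 |         entity_relations.append([p.strip() for p in s[s.find("(") + 1:s.find(")")].split(';')])
61 |     return entity_relations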
61 |
62 |
63 | def generate_graphviz_graph(entity_relations, verbose=True):
64 |     """digraph G {
65 |     # a -> b [ label="a to b" ];
66 |     # b -> c [ label="another label"];
67 |     }"""
68 |     with open('word_list.txt', 'r') as wordfile:
69 |         word_list = wordfile.read().splitlines()
70 |     print(word_list)
71 |     graph = list()
72 |     graph.append('digraph {')
73 |     for er in entity_relations:
74 |         # Keep well-formed triples that touch a known concept and whose relation is longer than 5 characters.
75 |         if len(er) == 3 and (er[0] in word_list or er[2] in word_list) and len(er[1]) > 5:
76 |             graph.append('"{}" -> "{}" [ label="{}" ];'.format(er[0], er[2], er[1]))
77 |     graph.append('}')
78 |
79 |     out_dot = tmp_folder + 'out.dot'
80 |     with open(out_dot, 'w') as output_file:
81 |         output_file.write('\n'.join(graph) + '\n')  # one DOT statement per line
82 |
83 |     out_png = tmp_folder + 'out.png'
84 |     command = '{} -Tpng {} -o {}'.format(DOT_BIN_PATH, out_dot, out_png)
85 |     debug_print('Executing command = {}'.format(command), verbose)
86 |     dot_process = Popen(command, stdout=stderr, shell=True)
87 |     dot_process.wait()
88 |     assert not dot_process.returncode, 'ERROR: Call to dot exited with a non-zero code status.'
89 |     print('Wrote graph to {} and {}'.format(out_dot, out_png))
90 |
91 |
92 | def stanford_ie(input_filename, verbose=True, generate_graphviz=False):
93 |     out = tmp_folder + 'out.txt'
94 |     input_filename = input_filename.replace(',', ' ')
95 |
96 |     new_filename = ''
97 |     for filename in input_filename.split():
98 |         if filename.startswith('/'):  # absolute path.
99 |             new_filename += '{} '.format(filename)
100 |         else:
101 |             new_filename += '../{} '.format(filename)
102 |
103 |     absolute_path_to_script = os.path.dirname(os.path.realpath(__file__)) + '/'
104 |     command = 'cd {};'.format(absolute_path_to_script)
105 |     command += 'cd {}; {} -mx4g -cp "stanford-openie.jar:stanford-openie-models.jar:lib/*" ' \
106 |                'edu.stanford.nlp.naturalli.OpenIE {} -format ollie > {}'. \
107 |         format(STANFORD_IE_FOLDER, JAVA_BIN_PATH, new_filename, out)
108 |
109 |     if verbose:
110 |         debug_print('Executing command = {}'.format(command), verbose)
111 |         java_process = Popen(command, stdout=stderr, shell=True)
112 |     else:
113 |         java_process = Popen(command, stdout=stderr, stderr=open(os.devnull, 'w'), shell=True)
114 |     java_process.wait()
115 |     assert not java_process.returncode, 'ERROR: Call to stanford_ie exited with a non-zero code status.'
116 |
117 |     with open(out, 'r') as output_file:
118 |         results_str = output_file.readlines()
119 |     os.remove(out)
120 |
121 |     results = process_entity_relations(results_str, verbose)
122 |     if generate_graphviz:
123 |         generate_graphviz_graph(results, verbose)
124 |
125 |     return results
126 |
127 |
128 | def main(args):
129 |     arg_p = arg_parse().parse_args(args[1:])
130 |     filename = arg_p.filename
131 |     verbose = arg_p.verbose
132 |     generate_graphviz = arg_p.generate_graph
133 |     print(arg_p)
134 |     if filename is None:
135 |         print('Please provide a text file containing your input. Program will exit.')
136 |         exit(1)
137 |     if verbose:
138 |         debug_print('filename = {}'.format(filename), verbose)
139 |     entities_relations = stanford_ie(filename, verbose, generate_graphviz)
140 |     print(entities_relations)
141 |
142 |
143 | if __name__ == '__main__':
144 |     exit(main(argv))
145 |
--------------------------------------------------------------------------------
/open_ie_api.py:
--------------------------------------------------------------------------------
1 | from __future__ import print_function
2 |
3 | import os
4 |
5 | from main import stanford_ie
6 |
7 | tmp_folder = '/tmp/openie/'
8 | if not os.path.exists(tmp_folder):
9 |     os.makedirs(tmp_folder)
10 |
11 |
12 | def texts_to_files(texts):
13 |     full_tmp_file_names = []
14 |     count = 0
15 |     for text in texts:
16 |         tmp_filename = str(count) + '.txt'
17 |         full_tmp_filename = '{}/{}'.format(tmp_folder, tmp_filename).replace('//', '/')
18 |         with open(full_tmp_filename, 'w') as f:
19 |             f.write(text)
20 |         full_tmp_file_names.append(full_tmp_filename)
21 |         count += 1
22 |     return full_tmp_file_names
23 |
24 |
25 | def call_api_many(texts, pagination_param=10000, verbose=False):
26 |     reduced_results = []
27 |     paginated_texts_list = [texts[i:i + pagination_param] for i in range(0, len(texts), pagination_param)]
28 |     for paginated_texts in paginated_texts_list:
29 |         tmp_file_names = texts_to_files(paginated_texts)
30 |         joint_filename = ','.join(tmp_file_names)
31 |         results = stanford_ie(joint_filename, verbose=verbose)
32 |         reduced_results.extend(results)
33 |     return reduced_results
34 |
35 |
36 | def call_api_single(text):
37 |     if os.path.isfile(text):
38 |         full_tmp_filename = text
39 |     else:
40 |         full_tmp_filename = texts_to_files([text])[0]
41 |     results = stanford_ie(full_tmp_filename, verbose=False)
42 |     return results
43 |
44 |
45 | if __name__ == '__main__':
46 |     print(len(call_api_many(['Barack Obama was born in Hawaii.'] * 30, pagination_param=100, verbose=True)))
47 |
--------------------------------------------------------------------------------
/sample/out.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/sample/out.png
--------------------------------------------------------------------------------
/sample/readme:
--------------------------------------------------------------------------------
1 | Sample Image of concept map
2 |
--------------------------------------------------------------------------------
/samples.txt:
--------------------------------------------------------------------------------
1 | Barack Obama was born in Hawaii.
2 |
--------------------------------------------------------------------------------
/test.txt:
--------------------------------------------------------------------------------
1 | The internet is a globally connected network system that uses TCP/IP to transmit data via various types of media. The internet is a network of global exchanges including private, public, business, academic and government networks connected by guided, wireless and fiber-optic technologies.
--------------------------------------------------------------------------------
/word_list.txt:
--------------------------------------------------------------------------------
1 | TCP/IP
2 | wireless
3 | fiber-optic
4 |
--------------------------------------------------------------------------------