├── README.md
├── automatic-concept-map.pdf
├── conceptmap.py
├── main2.py
├── open_ie_api.py
├── sample
│   ├── out.png
│   └── readme
├── samples.txt
├── test.txt
└── word_list.txt


/README.md:
--------------------------------------------------------------------------------
 1 | # Concept-Map-Generator
 2 | Term Project for Language Processing for E-Learning
 3 | 
 4 | This work was done as part of a mini project for the course on Language Processing for E-Learning under the guidance of Professor Plaban Bhowmick.
 5 | 
 6 | It makes use of the DBpedia Spotlight Web API to extract important concepts from a given text and then uses OpenIE to extract relations. Visualisation is done using GraphViz.
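
The pipeline described above can be pictured as a filter over relation triples: concepts come from one step, (subject, relation, object) triples from another, and only triples that touch a concept are kept. The following is a minimal sketch rather than the project's code. `filter_triples` and the sample data are hypothetical, and neither the web API nor the Java process is invoked.

```python
def filter_triples(triples, concepts):
    """Keep (subject, relation, object) triples whose subject or object is a known concept."""
    concept_set = set(concepts)
    return [t for t in triples if t[0] in concept_set or t[2] in concept_set]


# Hypothetical, hand-written inputs standing in for the concept list
# and the extracted relation triples.
concepts = ["TCP/IP", "wireless"]
triples = [
    ("internet", "uses", "TCP/IP"),
    ("internet", "is", "network"),
]

kept = filter_triples(triples, concepts)
print(kept)
```

Matching here is exact string equality, so a concept such as "TCP/IP" only matches a triple argument spelled the same way.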
 7 | 
 8 | This work uses the python wrapper for Stanford OpenIE written by [philipperemy](https://github.com/philipperemy/Stanford-OpenIE-Python).
 9 | 
10 | ## Usage
11 | 
12 | First of all, make sure Java 1.8 is installed. Open a terminal and run this command to check:
13 | 
14 | ```
15 | java -version
16 | ```
17 | 
18 | If Java is not installed and you are on Ubuntu, you can install it this way:
19 | 
20 | ```
21 | sudo add-apt-repository ppa:webupd8team/java
22 | sudo apt-get update
23 | sudo apt-get install oracle-java8-installer
24 | ```
25 | Note: GraphViz must also be installed. Run the `dot` command to check; if it is missing on Ubuntu, install it with `sudo apt-get install graphviz`.
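
Both prerequisites can also be checked from Python before running the scripts. The sketch below is an illustration only: `missing_tools` is a hypothetical helper, and it only confirms that the executables are on `PATH`, not that the Java version is 1.8.

```python
import shutil


def missing_tools(tools=("java", "dot")):
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("java and dot were both found on PATH.")
```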
26 | 
27 | How to use:
28 | Replace 'Your Text' in the commands below with your input text.
29 | ```
30 | git clone https://github.com/philipperemy/Stanford-OpenIE-Python.git
31 | git clone https://github.com/PranavKhadpe/Concept-Map-Generator.git
32 | cp Concept-Map-Generator/{main2.py,conceptmap.py} Stanford-OpenIE-Python
33 | cd Stanford-OpenIE-Python
34 | echo Your Text > test.txt
35 | python conceptmap.py -i test.txt
36 | python main2.py -f test.txt -g
37 | ```
38 | This generates a [GraphViz DOT](http://www.graphviz.org/) graph and its PNG rendering in `/tmp/openie/`.
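
To give an idea of what the DOT file contains, here is a minimal sketch that turns relation triples into DOT text using the same edge format as `main2.py`. `triples_to_dot` is a hypothetical helper written for this illustration. It assumes the names contain no double quotes, since no escaping is done.

```python
def triples_to_dot(triples):
    """Render (subject, relation, object) triples as GraphViz DOT text."""
    lines = ["digraph {"]
    for subject, relation, obj in triples:
        # One labelled edge per triple: subject -> object, labelled with the relation.
        lines.append('"{}" -> "{}" [ label="{}" ];'.format(subject, obj, relation))
    lines.append("}")
    return "\n".join(lines)


dot_text = triples_to_dot([("Barack Obama", "was born in", "Hawaii")])
print(dot_text)
```

Saving this text to a file and running `dot -Tpng <file> -o out.png` renders it as an image, which is what `main2.py` does with its `-g` flag.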
39 | 
40 | ## Example
41 | 
42 | Sample map generated for an elementary paragraph on photosynthesis:
43 | 
44 | <div align="center">
45 |   <img src="https://github.com/sudhanshusks/concept_map/blob/master/sample/out.png"><br><br>
46 | </div>
47 | 
48 | 
49 | 
50 | 


--------------------------------------------------------------------------------
/automatic-concept-map.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/automatic-concept-map.pdf


--------------------------------------------------------------------------------
/conceptmap.py:
--------------------------------------------------------------------------------
 1 | '''
 2 | Automatic Concept Map Generator
 3 | Author: Pranav Khadpe
 4 | '''
 5 | import sys
 6 | import getopt
 7 | import requests
 8 | import urllib.parse
 9 | import json
10 | import nltk
11 | 
12 | 
13 | '''
14 | Function to get the important concepts using the DBpedia Spotlight web API
15 | query = text
16 | '''
17 | def GetImportantConcept(text):
18 | 	query = text
19 | 	params = urllib.parse.urlencode({'text': query})
20 | 	url = "http://model.dbpedia-spotlight.org/en/spot?%s" % params
21 | 	request = requests.get(url, headers={"accept": "application/json"})
22 | 	pydict = json.loads(request.content)
23 | 	entities = pydict["annotation"].get("surfaceForm", [])
24 | 	# Assumption: a single match may come back as a dict rather than a list.
25 | 	if isinstance(entities, dict):
26 | 		entities = [entities]
27 | 	word_list = [entity['@name'] for entity in entities]
28 | 	return word_list
29 | '''
30 | Accepts the text and returns a list of sentences that contain the important concepts.
31 | '''
32 | def GetSentences(word_list, text):
33 | 	sent_list = nltk.sent_tokenize(text)
34 | 	impsent_list = []
35 | 	num_sentences = len(sent_list)
36 | 	for i in range(0,num_sentences):
37 | 		if any(word in sent_list[i] for word in word_list):
38 | 			impsent_list.append(sent_list[i])
39 | 	return impsent_list
40 | 
41 | '''
42 | def ExtractRelations(word_list, sentences):
43 | '''
44 | 
45 | def main(argv):
46 | 	inputfile = ''
47 | 	outputfile = ''
48 | 	try:
49 | 		opts, args = getopt.getopt(argv,"hi:o:",["ifile=","ofile="])
50 | 	except getopt.GetoptError:
51 | 		print ('conceptmap.py -i <inputfile> -o <outputfile>')
52 | 		sys.exit(2)
53 | 	for opt, arg in opts:
54 | 		if opt == '-h':
55 | 			print ('conceptmap.py -i <inputfile> -o <outputfile>')
56 | 			sys.exit()
57 | 		elif opt in ("-i", "--ifile"):
58 | 			inputfile = arg
59 | 		elif opt in ("-o", "--ofile"):
60 | 			outputfile = arg
61 | 	print ('Input file is ', inputfile)
62 | 	print ('Output file is ', outputfile)
63 | 	with open(inputfile, 'r') as in_file:
64 | 		text = in_file.read()
65 | 	word_list = GetImportantConcept(text)
66 | 	sentences = GetSentences(word_list, text)
67 | 	print(word_list)
68 | 	print(sentences)
69 | 	with open('word_list.txt', 'w') as word_file:
70 | 		for item in word_list:
71 | 			word_file.write("%s\n" % item)
72 | 
73 | 
74 | 
75 | if __name__ == "__main__":
76 | 	print("Usage: conceptmap.py -i <inputfile> -o <outputfile>")
77 | 	main(sys.argv[1:])
78 | 
79 | 
80 | 
81 | 
82 | 
83 | 


--------------------------------------------------------------------------------
/main2.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python
  2 | '''
  3 | Concept Map Generator.
  4 | usage: python main2.py -f input_file -g
  5 | Author: Pranav Khadpe
  6 | Date: 06-04-2018
  7 | 
  8 | Note: Parts of this code are taken as-is from code written by Philippe Remy.
  9 | '''
 10 | 
 11 | # Copyright (c) 2016, Philippe Remy <github: philipperemy>
 12 | #
 13 | # Permission to use, copy, modify, and/or distribute this software for any
 14 | # purpose with or without fee is hereby granted, provided that the above
 15 | # copyright notice and this permission notice appear in all copies.
 16 | #
 17 | # THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 18 | # WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 19 | # MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 20 | # ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 21 | # WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 22 | # ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 23 | # OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 24 | 
 25 | from __future__ import print_function
 26 | 
 27 | import os
 28 | from argparse import ArgumentParser
 29 | from subprocess import Popen
 30 | from sys import argv
 31 | from sys import stderr
 32 | 
 33 | JAVA_BIN_PATH = 'java'
 34 | DOT_BIN_PATH = 'dot'
 35 | STANFORD_IE_FOLDER = 'stanford-openie'
 36 | 
 37 | tmp_folder = '/tmp/openie/'
 38 | if not os.path.exists(tmp_folder):
 39 |     os.makedirs(tmp_folder)
 40 | 
 41 | 
 42 | def arg_parse():
 43 |     arg_p = ArgumentParser('Stanford IE Python Wrapper')
 44 |     arg_p.add_argument('-f', '--filename', type=str, default=None)
 45 |     arg_p.add_argument('-v', '--verbose', action='store_true')
 46 |     arg_p.add_argument('-g', '--generate_graph', action='store_true')
 47 |     return arg_p
 48 | 
 49 | 
 50 | def debug_print(log, verbose):
 51 |     if verbose:
 52 |         print(log)
 53 | 
 54 | 
 55 | def process_entity_relations(entity_relations_str, verbose=True):
 56 |     # Output is in Ollie format; each line contains "(subject; relation; object)".
 57 |     entity_relations = list()
 58 |     for s in entity_relations_str:
 59 |         entity_relations.append([part.strip() for part in s[s.find("(") + 1:s.find(")")].split(';')])
 60 |     return entity_relations
 61 | 
 62 | 
 63 | def generate_graphviz_graph(entity_relations, verbose=True):
 64 |     """digraph G {
 65 |     # a -> b [ label="a to b" ];
 66 |     # b -> c [ label="another label"];
 67 |     }"""
 68 |     with open('word_list.txt', 'r') as wordfile:
 69 |         word_list = wordfile.read().splitlines()
 70 |     print(word_list)
 71 |     graph = list()
 72 |     graph.append('digraph {')
 73 |     for er in entity_relations:
 74 |         if len(er) == 3 and (er[0] in word_list or er[2] in word_list):
 75 |             if len(er[1]) > 5:
 76 |                 graph.append('"{}" -> "{}" [ label="{}" ];'.format(er[0], er[2], er[1]))
 77 |     graph.append('}')
 78 | 
 79 |     out_dot = tmp_folder + 'out.dot'
 80 |     with open(out_dot, 'w') as output_file:
 81 |         output_file.write('\n'.join(graph) + '\n')
 82 | 
 83 |     out_png = tmp_folder + 'out.png'
 84 |     command = '{} -Tpng {} -o {}'.format(DOT_BIN_PATH, out_dot, out_png)
 85 |     debug_print('Executing command = {}'.format(command), verbose)
 86 |     dot_process = Popen(command, stdout=stderr, shell=True)
 87 |     dot_process.wait()
 88 |     assert not dot_process.returncode, 'ERROR: Call to dot exited with a non-zero code status.'
 89 |     print('Wrote graph to {} and {}'.format(out_dot, out_png))
 90 | 
 91 | 
 92 | def stanford_ie(input_filename, verbose=True, generate_graphviz=False):
 93 |     out = tmp_folder + 'out.txt'
 94 |     input_filename = input_filename.replace(',', ' ')
 95 | 
 96 |     new_filename = ''
 97 |     for filename in input_filename.split():
 98 |         if filename.startswith('/'):  # absolute path.
 99 |             new_filename += '{} '.format(filename)
100 |         else:
101 |             new_filename += '../{} '.format(filename)
102 | 
103 |     absolute_path_to_script = os.path.dirname(os.path.realpath(__file__)) + '/'
104 |     command = 'cd {};'.format(absolute_path_to_script)
105 |     command += 'cd {}; {} -mx4g -cp "stanford-openie.jar:stanford-openie-models.jar:lib/*" ' \
106 |                'edu.stanford.nlp.naturalli.OpenIE {} -format ollie > {}'. \
107 |         format(STANFORD_IE_FOLDER, JAVA_BIN_PATH, new_filename, out)
108 | 
109 |     if verbose:
110 |         debug_print('Executing command = {}'.format(command), verbose)
111 |         java_process = Popen(command, stdout=stderr, shell=True)
112 |     else:
113 |         java_process = Popen(command, stdout=stderr, stderr=open(os.devnull, 'w'), shell=True)
114 |     java_process.wait()
115 |     assert not java_process.returncode, 'ERROR: Call to stanford_ie exited with a non-zero code status.'
116 | 
117 |     with open(out, 'r') as output_file:
118 |         results_str = output_file.readlines()
119 |     os.remove(out)
120 | 
121 |     results = process_entity_relations(results_str, verbose)
122 |     if generate_graphviz:
123 |         generate_graphviz_graph(results, verbose)
124 | 
125 |     return results
126 | 
127 | 
128 | def main(args):
129 |     arg_p = arg_parse().parse_args(args[1:])
130 |     filename = arg_p.filename
131 |     verbose = arg_p.verbose
132 |     generate_graphviz = arg_p.generate_graph
133 |     print(arg_p)
134 |     if filename is None:
135 |         print('please provide a text file containing your input. Program will exit.')
136 |         exit(1)
137 |     if verbose:
138 |         debug_print('filename = {}'.format(filename), verbose)
139 |     entities_relations = stanford_ie(filename, verbose, generate_graphviz)
140 |     print(entities_relations)
141 | 
142 | 
143 | if __name__ == '__main__':
144 |     exit(main(argv))
145 | 


--------------------------------------------------------------------------------
/open_ie_api.py:
--------------------------------------------------------------------------------
 1 | from __future__ import print_function
 2 | 
 3 | import os
 4 | 
 5 | from main import stanford_ie
 6 | 
 7 | tmp_folder = '/tmp/openie/'
 8 | if not os.path.exists(tmp_folder):
 9 |     os.makedirs(tmp_folder)
10 | 
11 | 
12 | def texts_to_files(texts):
13 |     full_tmp_file_names = []
14 |     count = 0
15 |     for text in texts:
16 |         tmp_filename = str(count) + '.txt'
17 |         full_tmp_filename = '{}/{}'.format(tmp_folder, tmp_filename).replace('//', '/')
18 |         with open(full_tmp_filename, 'w') as f:
19 |             f.write(text)
20 |         full_tmp_file_names.append(full_tmp_filename)
21 |         count += 1
22 |     return full_tmp_file_names
23 | 
24 | 
25 | def call_api_many(texts, pagination_param=10000, verbose=False):
26 |     reduced_results = []
27 |     paginated_texts_list = [texts[i:i + pagination_param] for i in range(0, len(texts), pagination_param)]
28 |     for paginated_texts in paginated_texts_list:
29 |         tmp_file_names = texts_to_files(paginated_texts)
30 |         joint_filename = ','.join(tmp_file_names)
31 |         results = stanford_ie(joint_filename, verbose=verbose)
32 |         reduced_results.extend(results)
33 |     return reduced_results
34 | 
35 | 
36 | def call_api_single(text):
37 |     if os.path.isfile(text):
38 |         full_tmp_filename = text
39 |     else:
40 |         full_tmp_filename = texts_to_files([text])[0]
41 |     results = stanford_ie(full_tmp_filename, verbose=False)
42 |     return results
43 | 
44 | 
45 | if __name__ == '__main__':
46 |     print(len(call_api_many(['Barack Obama was born in Hawaii.'] * 30, pagination_param=100, verbose=True)))
47 | 


--------------------------------------------------------------------------------
/sample/out.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/sample/out.png


--------------------------------------------------------------------------------
/sample/readme:
--------------------------------------------------------------------------------
1 | Sample Image of concept map
2 | 


--------------------------------------------------------------------------------
/samples.txt:
--------------------------------------------------------------------------------
1 | Barack Obama was born in Hawaii.
2 | 


--------------------------------------------------------------------------------
/test.txt:
--------------------------------------------------------------------------------
1 | The internet is a globally connected network system that uses TCP/IP to transmit data via various types of media. The internet is a network of global exchanges including private, public, business, academic and government networks connected by guided, wireless and fiber-optic technologies.


--------------------------------------------------------------------------------
/word_list.txt:
--------------------------------------------------------------------------------
1 | TCP/IP
2 | wireless
3 | fiber-optic
4 | 


--------------------------------------------------------------------------------