├── README.md
├── automatic-concept-map.pdf
├── conceptmap.py
├── main2.py
├── open_ie_api.py
├── sample
    ├── out.png
    └── readme
├── samples.txt
├── test.txt
└── word_list.txt


/README.md:
--------------------------------------------------------------------------------
# Concept-Map-Generator
Term Project for Language Processing for E-Learning

This work was done as part of a mini project for the course on Language Processing for E-Learning under the guidance of Professor Plaban Bhowmick.

It uses the DBpedia Spotlight Web API to extract important concepts from a given text and then uses OpenIE to extract relations between them. Visualisation is done using GraphViz.

This work uses the Python wrapper for Stanford OpenIE written by [philipperemy](https://github.com/philipperemy/Stanford-OpenIE-Python).

## Usage

First, make sure Java 1.8 is installed. Open a terminal and run this command to check:

```
java -version
```

If Java is not installed and your OS is Ubuntu, you can install it this way:

```
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
```
The `webupd8team/java` PPA has since been discontinued, so on newer Ubuntu releases an OpenJDK 8 package (for example `sudo apt-get install openjdk-8-jdk`) may be needed instead.

Note: Make sure GraphViz is installed beforehand. Run the `dot` command to check. If it is missing, run `sudo apt-get install graphviz` on Ubuntu.
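
Both prerequisites can also be checked from Python before running the scripts. The following is a minimal sketch using only the standard library; the helper name `check_prerequisites` is hypothetical and not part of this repository.

```python
import shutil


def check_prerequisites(tools=("java", "dot")):
    """Map each tool name to its location on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        status = path if path else "NOT FOUND"
        print("{}: {}".format(tool, status))
```

A `NOT FOUND` entry means the corresponding install step above still needs to be done. The check only confirms that the binary is reachable, not that Java is version 1.8.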

How to use:
Paste your desired input text in place of 'Your Text'
```
git clone https://github.com/philipperemy/Stanford-OpenIE-Python.git
git clone https://github.com/PranavKhadpe/Concept-Map-Generator.git
cp Concept-Map-Generator/{main2.py,conceptmap.py} Stanford-OpenIE-Python
cd Stanford-OpenIE-Python
echo Your Text > test.txt
python conceptmap.py -i test.txt
python main2.py -f test.txt -g
```
`conceptmap.py` writes the extracted concepts to `word_list.txt`, which `main2.py` reads when `-g` is given. The second command generates a [GraphViz DOT](http://www.graphviz.org/) graph and its related PNG file in `/tmp/openie/`.

## Example

Sample map generated for an elementary paragraph on photosynthesis (a sample image is included at `sample/out.png`).

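
The way the two stages fit together can be sketched without Java or network access. The snippet below is an illustrative sketch, not the repository's code: it assumes OpenIE output lines in the Ollie-style form `confidence: (subject; relation; object)`, and the helper names `parse_triple` and `to_dot` are hypothetical. It keeps only triples whose subject or object is one of the extracted concepts and emits GraphViz DOT text.

```python
def parse_triple(line):
    """Parse 'confidence: (subject; relation; object)' into a 3-tuple, or None."""
    start, end = line.find("("), line.rfind(")")
    if start == -1 or end <= start:
        return None
    parts = [part.strip() for part in line[start + 1:end].split(";")]
    return tuple(parts) if len(parts) == 3 else None


def to_dot(triples, concepts):
    """Build DOT text from the triples that mention at least one concept."""
    lines = ["digraph {"]
    for subject, relation, obj in triples:
        if subject in concepts or obj in concepts:
            lines.append('  "{}" -> "{}" [ label="{}" ];'.format(subject, obj, relation))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    raw = [
        "1.000: (Barack Obama; was born in; Hawaii)",
        "0.500: (Someone; said; something)",
    ]
    triples = [t for t in map(parse_triple, raw) if t is not None]
    print(to_dot(triples, concepts={"Hawaii"}))
```

Running it prints a graph with a single edge from `Barack Obama` to `Hawaii`; the second triple is dropped because neither end is a concept. Stripping whitespace from each part matters here, since the text after each `;` starts with a space and would otherwise never match an entry in the concept list.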

--------------------------------------------------------------------------------
/automatic-concept-map.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/automatic-concept-map.pdf


--------------------------------------------------------------------------------
/conceptmap.py:
--------------------------------------------------------------------------------
'''
Automatic Concept Map Generator
Author: Pranav Khadpe
'''
import sys
import getopt
import requests
import urllib.parse
import json
import nltk

USAGE = 'conceptmap.py -i <inputfile> -o <outputfile>'


def GetImportantConcept(text):
    '''
    Get the important concepts using the DBpedia Spotlight web API.
    text = the passage to annotate
    '''
    params = urllib.parse.urlencode({'text': text})
    url = "http://model.dbpedia-spotlight.org/en/spot?%s" % params
    request = requests.get(url, headers={"accept": "application/json"})
    request.raise_for_status()
    pydict = json.loads(request.content)
    entities = pydict["annotation"].get("surfaceForm", [])
    # Defensive: a lone match may come back as a single object rather than a list.
    if isinstance(entities, dict):
        entities = [entities]
    return [entity['@name'] for entity in entities]


def GetSentences(word_list, text):
    '''
    Accepts the text and returns a list of sentences that contain the important concepts.
    '''
    # nltk.sent_tokenize needs the 'punkt' tokenizer data (nltk.download('punkt')).
    sent_list = nltk.sent_tokenize(text)
    return [sent for sent in sent_list if any(word in sent for word in word_list)]


'''
def ExtractRelations(word_list, sentences):
'''


def main(argv):
    inputfile = ''
    outputfile = ''
    try:
        opts, args = getopt.getopt(argv, "hi:o:", ["ifile=", "ofile="])
    except getopt.GetoptError:
        print(USAGE)
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print(USAGE)
            sys.exit()
        elif opt in ("-i", "--ifile"):
            inputfile = arg
        elif opt in ("-o", "--ofile"):
            outputfile = arg
    if not inputfile:
        # Without an input file there is nothing to read.
        print(USAGE)
        sys.exit(2)
    print('Input file is ', inputfile)
    # The output file option is parsed but not used; concepts always go to word_list.txt.
    print('Output file is ', outputfile)
    with open(inputfile, 'r') as in_file:
        text = in_file.read()
    word_list = GetImportantConcept(text)
    sentences = GetSentences(word_list, text)
    print(word_list)
    print(sentences)
    # main2.py reads word_list.txt to decide which relations to draw.
    with open('word_list.txt', 'w') as word_file:
        for item in word_list:
            word_file.write("%s\n" % item)


if __name__ == "__main__":
    print("Usage: " + USAGE)
    main(sys.argv[1:])


--------------------------------------------------------------------------------
/main2.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python
'''
Concept Map Generator.
usage: python main2.py -f input_file -g
Author: Pranav Khadpe
Date: 06-04-2018

Note: Parts of this code are taken as is from code written by Philippe Remy.
'''

# Copyright (c) 2016, Philippe Remy
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

from __future__ import print_function

import os
from argparse import ArgumentParser
from subprocess import Popen
from sys import argv
from sys import stderr

JAVA_BIN_PATH = 'java'
DOT_BIN_PATH = 'dot'
STANFORD_IE_FOLDER = 'stanford-openie'

tmp_folder = '/tmp/openie/'
if not os.path.exists(tmp_folder):
    os.makedirs(tmp_folder)


def arg_parse():
    arg_p = ArgumentParser('Stanford IE Python Wrapper')
    arg_p.add_argument('-f', '--filename', type=str, default=None)
    arg_p.add_argument('-v', '--verbose', action='store_true')
    arg_p.add_argument('-g', '--generate_graph', action='store_true')
    return arg_p


def debug_print(log, verbose):
    if verbose:
        print(log)


def process_entity_relations(entity_relations_str, verbose=True):
    # Format is ollie: each line looks like "confidence: (subject; relation; object)".
    entity_relations = list()
    for s in entity_relations_str:
        # Strip each part: the text after every ';' starts with a space, which
        # would otherwise stop it from matching entries in word_list.txt.
        parts = [part.strip() for part in s[s.find("(") + 1:s.find(")")].split(';')]
        # Skip blank or malformed lines that do not yield a full triple.
        if len(parts) == 3:
            entity_relations.append(parts)
    return entity_relations


def generate_graphviz_graph(entity_relations, verbose=True):
    """digraph G {
    # a -> b [ label="a to b" ];
    # b -> c [ label="another label"];
    }"""
    # word_list.txt is written by conceptmap.py, one concept per line.
    with open('word_list.txt', 'r') as wordfile:
        word_list = wordfile.read().splitlines()
    print(word_list)
    graph = list()
    graph.append('digraph {')
    for er in entity_relations:
        # Keep a relation only if it touches a concept and its label is
        # longer than 5 characters.
        if er[0] in word_list or er[2] in word_list:
            if len(er[1]) > 5:
                graph.append('"{}" -> "{}" [ label="{}" ];'.format(er[0], er[2], er[1]))
    graph.append('}')

    out_dot = tmp_folder + 'out.dot'
    with open(out_dot, 'w') as output_file:
        # One statement per line keeps the DOT file readable.
        output_file.write('\n'.join(graph) + '\n')

    out_png = tmp_folder + 'out.png'
    command = '{} -Tpng {} -o {}'.format(DOT_BIN_PATH, out_dot, out_png)
    debug_print('Executing command = {}'.format(command), verbose)
    dot_process = Popen(command, stdout=stderr, shell=True)
    dot_process.wait()
    assert not dot_process.returncode, 'ERROR: Call to dot exited with a non-zero code status.'
    print('Wrote graph to {} and {}'.format(out_dot, out_png))


def stanford_ie(input_filename, verbose=True, generate_graphviz=False):
    out = tmp_folder + 'out.txt'
    # Several input files may be passed as one comma-separated string.
    input_filename = input_filename.replace(',', ' ')

    new_filename = ''
    for filename in input_filename.split():
        if filename.startswith('/'):  # absolute path.
            new_filename += '{} '.format(filename)
        else:
            # Java runs from inside STANFORD_IE_FOLDER, so step back up one level.
            new_filename += '../{} '.format(filename)

    absolute_path_to_script = os.path.dirname(os.path.realpath(__file__)) + '/'
    command = 'cd {};'.format(absolute_path_to_script)
    command += 'cd {}; {} -mx4g -cp "stanford-openie.jar:stanford-openie-models.jar:lib/*" ' \
               'edu.stanford.nlp.naturalli.OpenIE {} -format ollie > {}'. \
        format(STANFORD_IE_FOLDER, JAVA_BIN_PATH, new_filename, out)

    if verbose:
        debug_print('Executing command = {}'.format(command), verbose)
        java_process = Popen(command, stdout=stderr, shell=True)
    else:
        java_process = Popen(command, stdout=stderr, stderr=open(os.devnull, 'w'), shell=True)
    java_process.wait()
    assert not java_process.returncode, 'ERROR: Call to stanford_ie exited with a non-zero code status.'

    with open(out, 'r') as output_file:
        results_str = output_file.readlines()
    os.remove(out)

    results = process_entity_relations(results_str, verbose)
    if generate_graphviz:
        generate_graphviz_graph(results, verbose)

    return results


def main(args):
    arg_p = arg_parse().parse_args(args[1:])
    filename = arg_p.filename
    verbose = arg_p.verbose
    generate_graphviz = arg_p.generate_graph
    print(arg_p)
    if filename is None:
        print('Please provide a text file containing your input. Program will exit.')
        exit(1)
    if verbose:
        debug_print('filename = {}'.format(filename), verbose)
    entities_relations = stanford_ie(filename, verbose, generate_graphviz)
    print(entities_relations)


if __name__ == '__main__':
    exit(main(argv))

--------------------------------------------------------------------------------
/open_ie_api.py:
--------------------------------------------------------------------------------
from __future__ import print_function

import os

# main.py is not part of this repository; it is expected to come from the
# Stanford-OpenIE-Python checkout that these files are copied into.
from main import stanford_ie

tmp_folder = '/tmp/openie/'
if not os.path.exists(tmp_folder):
    os.makedirs(tmp_folder)


def texts_to_files(texts):
    # Write each text to its own numbered file in tmp_folder and return the paths.
    full_tmp_file_names = []
    count = 0
    for text in texts:
        tmp_filename = str(count) + '.txt'
        full_tmp_filename = '{}/{}'.format(tmp_folder, tmp_filename).replace('//', '/')
        with open(full_tmp_filename, 'w') as f:
            f.write(text)
        full_tmp_file_names.append(full_tmp_filename)
        count += 1
    return full_tmp_file_names


def call_api_many(texts, pagination_param=10000, verbose=False):
    # Process the texts in batches of pagination_param files per OpenIE call.
    reduced_results = []
    paginated_texts_list = [texts[i:i + pagination_param] for i in range(0, len(texts), pagination_param)]
    for paginated_texts in paginated_texts_list:
        tmp_file_names = texts_to_files(paginated_texts)
        joint_filename = ','.join(tmp_file_names)
        results = stanford_ie(joint_filename, verbose=verbose)
        reduced_results.extend(results)
    return reduced_results


def call_api_single(text):
    # Accept either a path to an existing file or the raw text itself.
    if os.path.isfile(text):
        full_tmp_filename = text
    else:
        full_tmp_filename = texts_to_files([text])[0]
    results = stanford_ie(full_tmp_filename, verbose=False)
    return results


if __name__ == '__main__':
    print(len(call_api_many(['Barack Obama was born in Hawaii.'] * 30, pagination_param=100, verbose=True)))

--------------------------------------------------------------------------------
/sample/out.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/sudhanshusks/concept_map/207ac105dcaf4da480fae4b891e8f4fe472d6b90/sample/out.png


--------------------------------------------------------------------------------
/sample/readme:
--------------------------------------------------------------------------------
Sample Image of concept map

--------------------------------------------------------------------------------
/samples.txt:
--------------------------------------------------------------------------------
Barack Obama was born in Hawaii.

--------------------------------------------------------------------------------
/test.txt:
--------------------------------------------------------------------------------
The internet is a globally connected network system that uses TCP/IP to transmit data via various types of media. The internet is a network of global exchanges including private, public, business, academic and government networks connected by guided, wireless and fiber-optic technologies.
--------------------------------------------------------------------------------
/word_list.txt:
--------------------------------------------------------------------------------
TCP/IP
wireless
fiber-optic

--------------------------------------------------------------------------------