├── .gitmodules
├── LICENSE.md
├── README.md
├── dataset_preview
    ├── Development set
    │   ├── pg_sentence_id_list_dev_365_374_YAWN_parts.txt
    │   ├── pg_src_sequences_dev_365_374_YAWN_parts.txt
    │   ├── pg_src_tokens_with_positions_dev_365_374_YAWN_parts.json
    │   ├── src_sentences_BIO_dev_365_374_YAWN_parts.txt
    │   ├── src_sentences_dev_365_374_YAWN_parts.txt
    │   └── src_sentences_tokenized_dev_365_374_YAWN_parts.txt
    ├── Test set (all ground truth concepts)
    │   ├── pg_sentence_id_list_test_all_concepts_375_384_YAWN_parts.txt
    │   ├── pg_src_sequences_test_all_concepts_375_384_YAWN_parts.txt
    │   ├── pg_src_tokens_with_positions_test_all_concepts_375_384_YAWN_parts.json
    │   ├── src_sentences_BIO_test_all_concepts_375_384_YAWN_parts.txt
    │   ├── src_sentences_test_all_concepts_375_384_YAWN_parts.txt
    │   └── src_sentences_tokenized_test_all_concepts_375_384_YAWN_parts.txt
    ├── Test set (only sentences with WordNet-typed concepts)
    │   ├── pg_sentence_id_list_test_only_WN_typed_375_384_YAWN_parts.txt
    │   ├── pg_src_sequences_test_only_WN_typed_375_384_YAWN_parts.txt
    │   ├── pg_src_tokens_with_positions_test_only_WN_typed_375_384_YAWN_parts.json
    │   ├── src_sentences_BIO_test_only_WN_typed_375_384_YAWN_parts.txt
    │   ├── src_sentences_test_only_WN_typed_375_384_YAWN_parts.txt
    │   └── src_sentences_tokenized_test_only_WN_typed_375_384_YAWN_parts.txt
    ├── Training set
    │   ├── dsa_60_0
    │   │   ├── src_train_dsa_60_0_324_YAWN_parts.txt
    │   │   └── tgt_train_dsa_60_0_324_YAWN_parts.txt
    │   └── dsa_dict
    │       ├── src_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt
    │       └── tgt_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt
    └── Validation set
        ├── dsa_60_0
        │   ├── src_valid_dsa_60_0_325_364_YAWN_parts.txt
        │   └── tgt_valid_dsa_60_0_325_364_YAWN_parts.txt
        └── dsa_dict
            ├── src_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt
            └── tgt_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt
├── download_models.py
├── requirements.txt
├── run_concept_extraction.py
├── setup.py
├── src
    ├── __init__.py
    ├── concept_extractor
    │   ├── __init__.py
    │   ├── concept.py
    │   └── concept_extractor.py
    └── text_processor
        ├── __init__.py
        ├── text_tokenizer.py
        ├── token.py
        └── tokenized_text.py
├── static
    ├── architecture.png
    └── example_text.txt
└── translate.py

/.gitmodules:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/.gitmodules
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/LICENSE.md
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/README.md
--------------------------------------------------------------------------------
/dataset_preview/Development set/pg_sentence_id_list_dev_365_374_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/pg_sentence_id_list_dev_365_374_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Development set/pg_src_sequences_dev_365_374_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/pg_src_sequences_dev_365_374_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Development set/pg_src_tokens_with_positions_dev_365_374_YAWN_parts.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/pg_src_tokens_with_positions_dev_365_374_YAWN_parts.json
--------------------------------------------------------------------------------
/dataset_preview/Development set/src_sentences_BIO_dev_365_374_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/src_sentences_BIO_dev_365_374_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Development set/src_sentences_dev_365_374_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/src_sentences_dev_365_374_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Development set/src_sentences_tokenized_dev_365_374_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Development set/src_sentences_tokenized_dev_365_374_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/pg_sentence_id_list_test_all_concepts_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/pg_sentence_id_list_test_all_concepts_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/pg_src_sequences_test_all_concepts_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/pg_src_sequences_test_all_concepts_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/pg_src_tokens_with_positions_test_all_concepts_375_384_YAWN_parts.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/pg_src_tokens_with_positions_test_all_concepts_375_384_YAWN_parts.json
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/src_sentences_BIO_test_all_concepts_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/src_sentences_BIO_test_all_concepts_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/src_sentences_test_all_concepts_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/src_sentences_test_all_concepts_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (all ground truth concepts)/src_sentences_tokenized_test_all_concepts_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (all ground truth concepts)/src_sentences_tokenized_test_all_concepts_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_sentence_id_list_test_only_WN_typed_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_sentence_id_list_test_only_WN_typed_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_src_sequences_test_only_WN_typed_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_src_sequences_test_only_WN_typed_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_src_tokens_with_positions_test_only_WN_typed_375_384_YAWN_parts.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/pg_src_tokens_with_positions_test_only_WN_typed_375_384_YAWN_parts.json
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_BIO_test_only_WN_typed_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_BIO_test_only_WN_typed_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_test_only_WN_typed_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_test_only_WN_typed_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_tokenized_test_only_WN_typed_375_384_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Test set (only sentences with WordNet-typed concepts)/src_sentences_tokenized_test_only_WN_typed_375_384_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Training set/dsa_60_0/src_train_dsa_60_0_324_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Training set/dsa_60_0/src_train_dsa_60_0_324_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Training set/dsa_60_0/tgt_train_dsa_60_0_324_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Training set/dsa_60_0/tgt_train_dsa_60_0_324_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Training set/dsa_dict/src_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Training set/dsa_dict/src_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Training set/dsa_dict/tgt_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Training set/dsa_dict/tgt_train_dsa_dbpedia_spotlight10_324_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Validation set/dsa_60_0/src_valid_dsa_60_0_325_364_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Validation set/dsa_60_0/src_valid_dsa_60_0_325_364_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Validation set/dsa_60_0/tgt_valid_dsa_60_0_325_364_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Validation set/dsa_60_0/tgt_valid_dsa_60_0_325_364_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Validation set/dsa_dict/src_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Validation set/dsa_dict/src_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt
--------------------------------------------------------------------------------
/dataset_preview/Validation set/dsa_dict/tgt_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/dataset_preview/Validation set/dsa_dict/tgt_valid_dsa_dbpedia_spotlight10_325_364_YAWN_parts.txt
--------------------------------------------------------------------------------
/download_models.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/download_models.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/requirements.txt
--------------------------------------------------------------------------------
/run_concept_extraction.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/run_concept_extraction.py
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/setup.py
--------------------------------------------------------------------------------
/src/__init__.py:
--------------------------------------------------------------------------------
1 | 
--------------------------------------------------------------------------------
/src/concept_extractor/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/concept_extractor/__init__.py
--------------------------------------------------------------------------------
/src/concept_extractor/concept.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/concept_extractor/concept.py
--------------------------------------------------------------------------------
/src/concept_extractor/concept_extractor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/concept_extractor/concept_extractor.py
--------------------------------------------------------------------------------
/src/text_processor/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/text_processor/__init__.py
--------------------------------------------------------------------------------
/src/text_processor/text_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/text_processor/text_tokenizer.py
--------------------------------------------------------------------------------
/src/text_processor/token.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/text_processor/token.py
--------------------------------------------------------------------------------
/src/text_processor/tokenized_text.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/src/text_processor/tokenized_text.py
--------------------------------------------------------------------------------
/static/architecture.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/static/architecture.png
--------------------------------------------------------------------------------
/static/example_text.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/static/example_text.txt
--------------------------------------------------------------------------------
/translate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/TalnUPF/ConceptExtraction/HEAD/translate.py
--------------------------------------------------------------------------------