├── .gitignore
├── .ipynb_checkpoints
    ├── README-checkpoint.md
    ├── configurator-checkpoint.py
    ├── data_utils-checkpoint.py
    ├── get_metrics-checkpoint.ipynb
    ├── model-checkpoint.py
    ├── test-checkpoint.py
    └── train-checkpoint.py
├── README.md
├── __pycache__
    ├── data_utils.cpython-38.pyc
    └── model.cpython-38.pyc
├── configurator.py
├── data_utils.py
├── get_metrics.ipynb
├── model.py
├── out
    ├── log_20240102_len512_Alldata_balance.log
    ├── metrics
    │   ├── log_20240102_len512_Alldata_balance.png
    │   ├── result_fig_20240102_classification_GUE_core notata promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_core promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_core tata promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_proximal notata promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_proximal promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_proximal tata promoter_final.png
    │   ├── result_fig_20240102_classification_GUE_splice site_final.png
    │   ├── result_fig_20240102_classification_GUE_transcription factor binding site_final.png
    │   ├── result_fig_20240102_classification_GenomicBench_enhancer_cohn_final.png
    │   ├── result_fig_20240102_classification_GenomicBench_enhancer_ensembl_final.png
    │   ├── result_fig_20240102_classification_GenomicLLM_enhancer_final.png
    │   ├── result_fig_20240102_classification_GenomicLLM_gene biotype_final.png
    │   ├── result_fig_20240102_classification_GenomicLLM_splice site_final.png
    │   ├── result_fig_20240102_classification_Open Reading Frame_final.png
    │   ├── result_fig_20240102_classification_core notata promoter_final.png
    │   ├── result_fig_20240102_classification_core promoter_final.png
    │   ├── result_fig_20240102_classification_core tata promoter_final.png
    │   ├── result_fig_20240102_classification_enhancer_final.png
    │   ├── result_fig_20240102_classification_gene biotype_final.png
    │   ├── result_fig_20240102_classification_ocr_final.png
    │   ├── result_fig_20240102_classification_promoter_final.png
    │   ├── result_fig_20240102_classification_proximal notata promoter_final.png
    │   ├── result_fig_20240102_classification_proximal promoter_final.png
    │   ├── result_fig_20240102_classification_proximal tata promoter_final.png
    │   ├── result_fig_20240102_classification_regulatory_final.png
    │   ├── result_fig_20240102_classification_sequence direction_final.png
    │   ├── result_fig_20240102_classification_splice site_final.png
    │   ├── result_fig_20240102_classification_transcription factor binding site_final.png
    │   ├── result_fig_20240102_regression_GCcontent_final.png
    │   ├── result_table_20240102_generation_task_final.csv
    │   └── sample_table_20240118.csv
    ├── model_generated_text_for_GUE_dataset_20240102_len512_balance_best.csv
    ├── test_Hyena_enhancers_cohn_20240102_len512_balance_best.csv
    ├── test_Hyena_enhancers_ensembl_20240102_len512_balance_best.csv
    ├── test_ocr_20240102_len512_balance_best.csv
    ├── test_our_enhancers_20240102_len512_balance_best.csv
    └── test_our_ssp_20240102_len512_balance_best.csv
├── requirements.txt
├── test.py
└── train.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.gitignore
--------------------------------------------------------------------------------

/.ipynb_checkpoints/README-checkpoint.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/README-checkpoint.md
--------------------------------------------------------------------------------

/.ipynb_checkpoints/configurator-checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/configurator-checkpoint.py
--------------------------------------------------------------------------------

/.ipynb_checkpoints/data_utils-checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/data_utils-checkpoint.py
--------------------------------------------------------------------------------

/.ipynb_checkpoints/get_metrics-checkpoint.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/get_metrics-checkpoint.ipynb
--------------------------------------------------------------------------------

/.ipynb_checkpoints/model-checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/model-checkpoint.py
--------------------------------------------------------------------------------

/.ipynb_checkpoints/test-checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/test-checkpoint.py
--------------------------------------------------------------------------------

/.ipynb_checkpoints/train-checkpoint.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/.ipynb_checkpoints/train-checkpoint.py
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/README.md
--------------------------------------------------------------------------------

/__pycache__/data_utils.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/__pycache__/data_utils.cpython-38.pyc
--------------------------------------------------------------------------------

/__pycache__/model.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/__pycache__/model.cpython-38.pyc
--------------------------------------------------------------------------------

/configurator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/configurator.py
--------------------------------------------------------------------------------

/data_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/data_utils.py
--------------------------------------------------------------------------------

/get_metrics.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/get_metrics.ipynb
--------------------------------------------------------------------------------

/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/model.py
--------------------------------------------------------------------------------

/out/log_20240102_len512_Alldata_balance.log:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/log_20240102_len512_Alldata_balance.log
--------------------------------------------------------------------------------

/out/metrics/log_20240102_len512_Alldata_balance.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/log_20240102_len512_Alldata_balance.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_core notata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_core notata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_core promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_core promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_core tata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_core tata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_proximal notata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_proximal notata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_proximal promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_proximal promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_proximal tata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_proximal tata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_splice site_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_splice site_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GUE_transcription factor binding site_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GUE_transcription factor binding site_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GenomicBench_enhancer_cohn_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GenomicBench_enhancer_cohn_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GenomicBench_enhancer_ensembl_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GenomicBench_enhancer_ensembl_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GenomicLLM_enhancer_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GenomicLLM_enhancer_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GenomicLLM_gene biotype_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GenomicLLM_gene biotype_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_GenomicLLM_splice site_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_GenomicLLM_splice site_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_Open Reading Frame_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_Open Reading Frame_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_core notata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_core notata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_core promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_core promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_core tata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_core tata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_enhancer_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_enhancer_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_gene biotype_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_gene biotype_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_ocr_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_ocr_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_proximal notata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_proximal notata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_proximal promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_proximal promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_proximal tata promoter_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_proximal tata promoter_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_regulatory_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_regulatory_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_sequence direction_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_sequence direction_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_splice site_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_splice site_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_classification_transcription factor binding site_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_classification_transcription factor binding site_final.png
--------------------------------------------------------------------------------

/out/metrics/result_fig_20240102_regression_GCcontent_final.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_fig_20240102_regression_GCcontent_final.png
--------------------------------------------------------------------------------

/out/metrics/result_table_20240102_generation_task_final.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/result_table_20240102_generation_task_final.csv
--------------------------------------------------------------------------------

/out/metrics/sample_table_20240118.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/metrics/sample_table_20240118.csv
--------------------------------------------------------------------------------

/out/model_generated_text_for_GUE_dataset_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/model_generated_text_for_GUE_dataset_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/out/test_Hyena_enhancers_cohn_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/test_Hyena_enhancers_cohn_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/out/test_Hyena_enhancers_ensembl_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/test_Hyena_enhancers_ensembl_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/out/test_ocr_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/test_ocr_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/out/test_our_enhancers_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/test_our_enhancers_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/out/test_our_ssp_20240102_len512_balance_best.csv:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/out/test_our_ssp_20240102_len512_balance_best.csv
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/requirements.txt
--------------------------------------------------------------------------------

/test.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/test.py
--------------------------------------------------------------------------------

/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Huatsing-Lau/GenoLLM/HEAD/train.py
--------------------------------------------------------------------------------