├── .gitignore
└── README.md

/.gitignore:
--------------------------------------------------------------------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Model

The "WuDao" (悟道) project has released 7 open-source models to date; this repository provides links to the model parameter files for all of them.

### Text

* **[GLM](https://wudaoai.cn/model/detail/GLM系列)**

GLM (General Language Model) is a new pretraining framework that breaks through the bottlenecks of BERT and GPT. A single GLM model achieves state-of-the-art results on both language understanding and generation tasks, outperforming common pretrained models trained on the same amount of data (e.g., BERT, RoBERTa, and T5).

* **[Transformer-XL](https://wudaoai.cn/model/detail/Transformer-XL)**

Transformers have the potential to learn long-range dependencies but are limited by fixed-length contexts in the language-modeling setting. Transformer-XL is a new neural architecture that handles long-range dependencies in text well. Based on Transformer-XL, WuDao trained and released a 2.9-billion-parameter language model that excels at long-text generation.

* **[CPM](https://wudaoai.cn/model/detail/CPM系列)**

The CPM series launched in November 2020 with CPM-1 (Chinese Pretrained Models), then the largest Chinese pretrained language model at 2.6 billion parameters, supporting downstream tasks such as simple dialogue, article generation, and language understanding. In June 2021, CPM-2 (Cost-efficient Pre-trained language Models) was released at the BAAI Conference in Beijing, comprising an 11-billion-parameter Chinese-English bilingual model and a corresponding 198-billion-parameter MoE version.

* **[EVA](https://github.com/BAAI-WuDao/EVA)**

EVA is an open-domain Chinese dialogue pretrained model, currently the largest Chinese dialogue model at 2.8 billion parameters. It was pretrained on the WuDao Dialogue Corpus (WDC), which contains 1.4 billion Chinese dialogue samples spanning different domains.

* **[Lawformer](https://wudaoai.cn/model/detail/Lawformer)**

A pretrained language model for long legal documents, trained on tens of millions of legal texts. With 100 million parameters, it supports understanding tasks over long Chinese legal texts.

### Image-Text

* **[CogView](https://wudaoai.cn/model/detail/CogView)**

The world's largest Chinese multimodal generative model, with 4 billion parameters. Centered on text-to-image generation, it supports downstream tasks across multiple domains and is general-purpose at the application level; after fine-tuning it can generate images in styles such as traditional Chinese painting, oil painting, watercolor, and line drawing.

* **[BriVL](https://wudaoai.cn/model/detail/BriVL)**

BriVL (Bridging Vision and Language Model) is the first large-scale Chinese general-purpose image-text multimodal pretrained model. BriVL delivers excellent results on image-text retrieval, surpassing other contemporary multimodal pretrained models (e.g., UNITER, CLIP).

### Protein

* **[ProteinLM](https://wudaoai.cn/model/detail/ProteinLM)**

A pretrained model for protein sequences; versions with 200 million and 3 billion parameters have been open-sourced. It supports protein secondary-structure prediction, fluorescence prediction, contact prediction, fold-stability prediction, and remote-homology detection. Compared with the baseline TAPE (38 million parameters), our model improves on downstream tasks, most notably a 16% gain over the baseline on contact prediction for protein folding.
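### Quick Start (Illustrative)

As a quick-start sketch, the snippet below shows one way to run text generation with a CPM-1 checkpoint through the Hugging Face `transformers` library. This is an illustration under stated assumptions, not an official loader from this repository: the model ID `TsinghuaAI/CPM-Generate` and its compatibility with the standard causal-LM interface refer to the publicly hosted Hugging Face release; substitute the parameter files you obtain from the links above if you host the weights yourself.

```python
# Minimal sketch (assumption: the CPM-1 weights published on Hugging Face as
# "TsinghuaAI/CPM-Generate" work with the standard causal-LM interface).
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TsinghuaAI/CPM-Generate"  # assumed checkpoint ID; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a Chinese prompt and sample a continuation.
inputs = tokenizer("清华大学", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=50,   # total length: prompt + generated continuation
    do_sample=True,  # sample rather than greedy-decode
    top_p=0.9,       # nucleus sampling
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the image-text models (CogView, BriVL) and ProteinLM ship with their own loading code in their respective repositories, so this pattern applies only to the text-generation checkpoints.
--------------------------------------------------------------------------------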