├── .gitignore
├── LICENSE
├── README.md
├── ask.py
├── from_scrapbox
│   ├── qualia-san.json
│   └── tiny_sample.json
├── make_index.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pickle
2 | from_scrapbox/*.json
3 | 
4 | ## Python template
5 | # Byte-compiled / optimized / DLL files
6 | __pycache__/
7 | *.py[cod]
8 | *$py.class
9 | 
10 | # C extensions
11 | *.so
12 | 
13 | # Distribution / packaging
14 | .Python
15 | build/
16 | develop-eggs/
17 | dist/
18 | downloads/
19 | eggs/
20 | .eggs/
21 | lib/
22 | lib64/
23 | parts/
24 | sdist/
25 | var/
26 | wheels/
27 | pip-wheel-metadata/
28 | share/python-wheels/
29 | *.egg-info/
30 | .installed.cfg
31 | *.egg
32 | MANIFEST
33 | 
34 | # PyInstaller
35 | # Usually these files are written by a python script from a template
36 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
37 | *.manifest
38 | *.spec
39 | 
40 | # Installer logs
41 | pip-log.txt
42 | pip-delete-this-directory.txt
43 | 
44 | # Unit test / coverage reports
45 | htmlcov/
46 | .tox/
47 | .nox/
48 | .coverage
49 | .coverage.*
50 | .cache
51 | nosetests.xml
52 | coverage.xml
53 | *.cover
54 | *.py,cover
55 | .hypothesis/
56 | .pytest_cache/
57 | 
58 | # Translations
59 | *.mo
60 | *.pot
61 | 
62 | # Django stuff:
63 | *.log
64 | local_settings.py
65 | db.sqlite3
66 | db.sqlite3-journal
67 | 
68 | # Flask stuff:
69 | instance/
70 | .webassets-cache
71 | 
72 | # Scrapy stuff:
73 | .scrapy
74 | 
75 | # Sphinx documentation
76 | docs/_build/
77 | 
78 | # PyBuilder
79 | target/
80 | 
81 | # Jupyter Notebook
82 | .ipynb_checkpoints
83 | 
84 | # IPython
85 | profile_default/
86 | ipython_config.py
87 | 
88 | # pyenv
89 | .python-version
90 | 
91 | # pipenv
92 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
93 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
94 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
95 | # install all needed dependencies.
96 | #Pipfile.lock
97 | 
98 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
99 | __pypackages__/
100 | 
101 | # Celery stuff
102 | celerybeat-schedule
103 | celerybeat.pid
104 | 
105 | # SageMath parsed files
106 | *.sage.py
107 | 
108 | # Environments
109 | .env
110 | .venv
111 | env/
112 | venv/
113 | ENV/
114 | env.bak/
115 | venv.bak/
116 | 
117 | # Spyder project settings
118 | .spyderproject
119 | .spyproject
120 | 
121 | # Rope project settings
122 | .ropeproject
123 | 
124 | # mkdocs documentation
125 | /site
126 | 
127 | # mypy
128 | .mypy_cache/
129 | .dmypy.json
130 | dmypy.json
131 | 
132 | # Pyre type checker
133 | .pyre/
134 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2023 NISHIO Hirokazu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Scrapbox ChatGPT Connector
2 | 
3 | The Scrapbox ChatGPT Connector is a simple script for connecting Scrapbox and ChatGPT.
4 | 
5 | The script is designed so that developers can easily grasp the big picture and customize it to their own needs. The purpose of the project is to show a simple implementation, not to satisfy a wide variety of needs. I encourage everyone to understand the source code and customize it to their own needs.
6 | 
7 | ## For Japanese readers
8 | Visit https://scrapbox.io/villagepump/Scrapbox_ChatGPT_Connector
9 | 
10 | 
11 | ## How to install
12 | 
13 | Clone the GitHub repository.
14 | 
15 | Run the following command to install the required libraries.
16 | 
17 |     $ pip install -r requirements.txt
18 | 
19 | ## How to use
20 | Obtain an OpenAI API token and save it in a `.env` file.
21 | 
22 | ```
23 | OPENAI_API_KEY=sk-...
24 | ```
25 | 
26 | Build the index.
27 | 
28 |     $ python make_index.py
29 | 
30 | It outputs something like this:
31 | 
32 | ```
33 | % python make_index.py
34 | 97%|███████████████████████████████████████████████████████████████████████████████████████████████████▉ | 846/872 [07:06<00:10, 2.59it/s]The server is currently overloaded with other requests. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists.
35 | 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [07:45<00:00, 1.87it/s]
36 | ```
37 | Ask a question.
38 | 
39 |     $ python ask.py
40 | 
41 | It outputs something like this:
42 | 
43 | ```
44 | >>>> What is the most important question?
45 | > The most important question is to know ourselves.
46 | ```
47 | 
48 | ## License
49 | The Scrapbox ChatGPT Connector is distributed under the MIT License. See the LICENSE file for more information.
--------------------------------------------------------------------------------
/ask.py:
--------------------------------------------------------------------------------
1 | import openai
2 | from make_index import VectorStore, get_size
3 | 
4 | 
5 | PROMPT = """
6 | You are a virtual character. Read sample output of the character in the following Sample section. Then reply to the Input.
7 | ## Sample
8 | {text}
9 | ## Input
10 | {input}
11 | """.strip()
12 | 
13 | 
14 | MAX_PROMPT_SIZE = 4096
15 | RETURN_SIZE = 250
16 | 
17 | 
18 | def ask(input_str, index_file):
19 |     PROMPT_SIZE = get_size(PROMPT)
20 |     rest = MAX_PROMPT_SIZE - RETURN_SIZE - PROMPT_SIZE
21 |     input_size = get_size(input_str)
22 |     if rest < input_size:
23 |         raise RuntimeError("too large input!")
24 |     rest -= input_size
25 | 
26 |     vs = VectorStore(index_file)
27 |     samples = vs.get_sorted(input_str)
28 | 
29 |     to_use = []
30 |     used_title = []
31 |     for _sim, body, title in samples:
32 |         if title in used_title:
33 |             continue
34 |         size = get_size(body)
35 |         if rest < size:
36 |             break
37 |         to_use.append(body)
38 |         used_title.append(title)
39 |         rest -= size
40 | 
41 |     text = "\n\n".join(to_use)
42 |     prompt = PROMPT.format(input=input_str, text=text)
43 | 
44 |     print("\nTHINKING...")
45 |     response = openai.ChatCompletion.create(
46 |         model="gpt-3.5-turbo",
47 |         messages=[
48 |             {"role": "user", "content": prompt}
49 |         ],
50 |         max_tokens=RETURN_SIZE,
51 |         temperature=0.0,
52 |     )
53 | 
54 |     # show question and answer
55 |     content = response['choices'][0]['message']['content']
56 |     print("\nANSWER:")
57 |     print(f">>>> {input_str}")
58 |     print(">", content)
59 | 
60 | 
61 | if __name__ == "__main__":
62 | 
    ask("Scrapbox ChatGPT Connectorって何?", "tiny_sample.pickle")
63 |     ask("クオリアさん、日本語で自己紹介して", "tiny_sample.pickle")
64 | 
--------------------------------------------------------------------------------
/from_scrapbox/tiny_sample.json:
--------------------------------------------------------------------------------
1 | {"name":"tiny_sample","displayName":"tiny_sample","exported":1678252437,"pages":[
2 | {"title":"Qualia-san","created":1668362835,"updated":1676212019,"id":"6371325095071a001d61a1e6","lines":["Qualia-san","[https://scrapbox.io/files/6371326d83fa3100215e1652.png]","[Qualia-san.icon] I am \"Qualia-san\", the communication interface between my home planet and Earth. ","私は「クオリアさん」、私の母星と地球とのコミュニケーション・インターフェイスです。","","This is a forum to keep Qualia-san's tweets together. It is maintained by fans on Earth. You can join by clicking on the link below. Participants are considered to be fans of Qualia-san.","ここはクオリアさんのつぶやきをまとめておくためのフォーラムです。地球上のファンによって管理されています。下のリンクをクリックすると参加できます。参加者はクオリアさんのファンとみなさせていただきます。","","[***** [JOIN https://scrapbox.io/projects/qualia-san/invitations/8c8be80501a9cba7f954a035bd08ee35]]","[***** [Twitter https://twitter.com/Qualia_san] Follow me!]","","管理人が日本人なのでまずは日本語の情報を先にまとめます。いずれ英語もまとめます。","\tAs the admin is Japanese, I will first organize the information in Japanese. English information will be organized in due course.","","まずはこれを読みましょう。","\t[100フォロワー]","\tRead this first: [100 followers]","","Scrapbox上では機械翻訳が使えないので違う言語で読みたい人はTwitter上で読むのがおすすめです。","Machine translation is not available on Scrapbox, so if you want to read in a different language, it is recommended to read on Twitter.","","ルール"," クオリアさんのツイートが貼られている部分は、クオリアさんのお言葉のアーカイブなので書き換えてはいけません"," その他の部分はファンの交流のためのスペースなので好きに書いてかまいません","Rules"," The section where Qualia-san's tweets are pasted is an archive of Qualia-san's words and should not be modified"," The rest of the page is for fan interaction, so you can write whatever you want.","","[CC-BY]","\t投稿や顔画像はCC-BY 4.0ライセンスです。クオリアのTwitterアカウントにリンクしていただければ、転載、再投稿、翻訳、リミックスも可能です。","\tPosts and face images are CC-BY 4.0 licensed. They may be reprinted, reposted, translated, and remixed as long as you link back to Qualia's Twitter account.","","[- 質問箱 https://peing.net/ja/qualia_san]","\t[クオリアさん.icon]質問するホモサピエンスは良いホモサピエンス!","\t\tAny homo sapiens that asks questions is a good homo sapiens!","\tTwitterでのログインができていません","","[*** Season 1]","[* Japanese]","\t[1日目] / [2日目] / [3日目] / [4日目] / [5日目]","\t[6日目] / [7日目] / [8日目] / [9日目] / [10日目]","\t[11日目] / [12日目]","\tクオリアさん [100フォロワー]まとめ","\t[13日目] / [14日目] / [15日目]","\t[16日目] / [17日目] / [18日目] / [19日目] / [20日目]","\t[21日目] / [22日目] / [23日目] / [24日目] / [25日目]","\t[26日目] / [27日目] / [28日目] / [29日目] / [30日目]","\t[31日目] / [32日目] / [33日目] / [34日目] / [35日目]","\t[36日目] / [37日目] / [38日目] / [39日目] / [40日目]","\t[41日目] / [42日目] / [43日目] / [44日目] / [45日目]","\t[46日目] / [47日目] / [48日目] / [49日目] / [50日目]","\t[51日目] / [52日目] / [53日目] / [54日目] / [55日目]","\t[56日目] / [57日目] / [58日目] / [59日目] / [60日目]","\t[61日目] / [62日目] / [63日目] / [64日目]","[* English]","\t[Day 1] / [Day 2] / [Day 3] / [Day 4] / [Day 5]","\t[Day 6] / [Day 7] / [Day 8] / [Day 9] / [Day 10]","\t[Day 11] / [Day 12]","\t[100 followers]","\t[Day 13] / [Day 14] / [Day 15]","\t[Day 16] / [Day 17] / [Day 18] / [Day 19] / [Day 20]","\t[Day 21] / [Day 22] / [Day 23] / [Day 24] / [Day 25]","\t[Day 26] / [Day 27] / [Day 28] / [Day 29] / [Day 30]","\t[Day 31] / [Day 32] / [Day 33] / [Day 34] / [Day 35]","\t[Day 36] / [Day 37] / [Day 38] / [Day 39] / [Day 40]","\t[Day 41] / [Day 42] / [Day 43] / [Day 44] / [Day 45]","\t[Day 46] / [Day 47] / [Day 48] / [Day 49] / [Day 50]","\t[Day 51] / [Day 52] / [Day 53] / [Day 54] / [Day 55]","\t[Day 56] / [Day 57] / [Day 58] / [Day 59] / [Day 60]","\t[Day 61] / [Day 62] / [Day 63] / [Day 64]","","[* Special Event Posts]","\t[2022 Year End Picture]","\t[Happy New Year] / [Rabbit Year]","","2023(Season 2)","\t[65日目]","\t[66日目] / [67日目] / [68日目] / [69日目] / [70日目]","\t[71日目] / [72日目] / [73日目] / [74日目] / [75日目]","\t[76日目] / [77日目] / [78日目] / [79日目] / [80日目]","","","\t[Day 65]","\t[Day 66] / [Day 67] / [Day 68] / [Day 69] / [Day 70]","\t[Day 71] / [Day 72] / [Day 73] / [Day 74] / [Day 75]","\t[Day 76] / [Day 77] / [Day 78] / [Day 79] / [Day 80]","","","[雑談]","","[Get started]","Scrapbox ChatGPT Connectorは、ScrapboxとChatGPTを接続するためのシンプルなスクリプトです。"]}
3 | ]}
--------------------------------------------------------------------------------
/make_index.py:
--------------------------------------------------------------------------------
1 | import time
2 | import json
3 | import tiktoken
4 | import openai
5 | import pickle
6 | import numpy as np
7 | from tqdm import tqdm
8 | import dotenv
9 | import os
10 | 
11 | BLOCK_SIZE = 500
12 | EMBED_MAX_SIZE = 8150
13 | 
14 | dotenv.load_dotenv()
15 | openai.api_key = os.getenv("OPENAI_API_KEY")
16 | enc = tiktoken.get_encoding("cl100k_base")
17 | 
18 | 
19 | def get_size(text):
20 |     "take text, return number of tokens"
21 |     return len(enc.encode(text))
22 | 
23 | 
24 | def embed_text(text, sleep_after_success=1):
25 |     "take text, return embedding vector"
26 |     text = text.replace("\n", " ")
27 |     tokens = enc.encode(text)
28 |     if len(tokens) > EMBED_MAX_SIZE:
29 |         text = enc.decode(tokens[:EMBED_MAX_SIZE])
30 | 
31 |     while True:
32 |         try:
33 |             res = openai.Embedding.create(
34 |                 input=[text],
35 |                 model="text-embedding-ada-002")
36 |             time.sleep(sleep_after_success)
37 |         except Exception as e:
38 |             print(e)
39 |             time.sleep(1)
40 |             continue
41 |         break
42 | 
43 |     return res["data"][0]["embedding"]
44 | 
45 | 
46 | def update_from_scrapbox(json_file, out_index, in_index=None):
47 |     """
48 |     json_file: Input JSON file name (from Scrapbox)
49 |     out_index: Output index file name
50 |     in_index: Optional input index file name. It is not modified and is used as a cache to reduce API calls.
51 |     json_file: 入力JSONファイル名 (scrapboxからの)
52 |     out_index: 出力インデックスファイル名
53 |     in_index: オプショナルな入力インデックスファイル名。変更されず、APIコールを減らすためのキャッシュとして使用されます。
54 | 
55 |     # usage
56 |     ## create new index
57 |     update_from_scrapbox(
58 |         "from_scrapbox/nishio.json",
59 |         "nishio.pickle")
60 | 
61 |     ## update index
62 |     update_from_scrapbox(
63 |         "from_scrapbox/nishio-0314.json", "nishio-0314.pickle", "nishio-0310.pickle")
64 |     """
65 |     if in_index is not None:
66 |         cache = pickle.load(open(in_index, "rb"))
67 |     else:
68 |         cache = None
69 | 
70 |     vs = VectorStore(out_index)
71 |     data = json.load(open(json_file, encoding="utf8"))
72 | 
73 |     for p in tqdm(data["pages"]):
74 |         buf = []
75 |         title = p["title"]
76 |         for line in p["lines"]:
77 |             buf.append(line)
78 |             body = " ".join(buf)
79 |             if get_size(body) > BLOCK_SIZE:
80 |                 vs.add_record(body, title, cache)
81 |                 buf = buf[len(buf) // 2:]
82 |         body = " ".join(buf).strip()
83 |         if body:
84 |             vs.add_record(body, title, cache)
85 | 
86 |     vs.save()
87 | 
88 | 
89 | class VectorStore:
90 |     def __init__(self, name, create_if_not_exist=True):
91 |         self.name = name
92 |         try:
93 |             self.cache = pickle.load(open(self.name, "rb"))
94 |         except FileNotFoundError:
95 |             if create_if_not_exist:
96 |                 self.cache = {}
97 |             else:
98 |                 raise
99 | 
100 |     def add_record(self, body, title, cache=None):
101 |         if cache is None:
102 |             cache = self.cache
103 |         if body not in cache:
104 |             # not in cache: call the embedding API
105 |             self.cache[body] = (embed_text(body), title)
106 |         elif body not in self.cache:
107 |             # in cache but not in self.cache: reuse the cached item
108 |             self.cache[body] = cache[body]
109 | 
110 |         return self.cache[body]
111 | 
112 |     def get_sorted(self, query):
113 |         q = np.array(embed_text(query, sleep_after_success=0))
114 |         buf = []
115 |         for body, (v, title) in tqdm(self.cache.items()):
116 |             buf.append((q.dot(v), body, title))
117 |         buf.sort(reverse=True)
118 |         return buf
119 | 
120 |     def save(self):
121 |         pickle.dump(self.cache, open(self.name, "wb"))
122 | 
123 | 
124 | if __name__ == "__main__":
125 |     # Sample default arguments for update_from_scrapbox()
126 |     JSON_FILE = "from_scrapbox/tiny_sample.json"
127 |     INDEX_FILE = "tiny_sample.pickle"
128 | 
129 |     update_from_scrapbox(JSON_FILE, INDEX_FILE)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.8.4
2 | aiosignal==1.3.1
3 | async-timeout==4.0.2
4 | attrs==22.2.0
5 | autopep8==2.0.2
6 | blobfile==2.0.1
7 | certifi==2022.12.7
8 | charset-normalizer==3.1.0
9 | filelock==3.9.0
10 | frozenlist==1.3.3
11 | idna==3.4
12 | lxml==4.9.2
13 | multidict==6.0.4
14 | numpy==1.24.2
15 | openai==0.27.1
16 | pycodestyle==2.10.0
17 | pycryptodomex==3.17
18 | python-dotenv==1.0.0
19 | regex==2022.10.31
20 | requests==2.28.2
21 | tiktoken==0.3.0
22 | tqdm==4.65.0
23 | urllib3==1.26.14
24 | yarl==1.8.2
25 | 
--------------------------------------------------------------------------------
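A note on the retrieval step: `VectorStore.get_sorted` scores every cached chunk with a raw dot product against the query embedding. Because text-embedding-ada-002 returns (approximately) unit-length vectors, that dot product behaves like cosine similarity. Below is a minimal self-contained sketch of just that ranking step, with hard-coded 2-D vectors standing in for real API embeddings; the `rank` helper and the sample records are illustrative, not part of the repository.

```python
import numpy as np

def rank(query_vec, records):
    # records: iterable of (vector, body, title) triples, mirroring the
    # (embedding, title) values kept in VectorStore.cache keyed by body.
    q = np.asarray(query_vec, dtype=float)
    scored = [(float(q.dot(np.asarray(v, dtype=float))), body, title)
              for v, body, title in records]
    # Sort by score only, so tied scores never fall through to
    # comparing the body strings (as a plain tuple sort would).
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

# Unit-length stand-ins for ada-002 embeddings.
records = [
    (np.array([1.0, 0.0]), "page about cats", "Cats"),
    (np.array([0.6, 0.8]), "page about dogs", "Dogs"),
]
ranked = rank([1.0, 0.0], records)
# The first record scores 1.0 and the second 0.6, so "Cats" ranks first.
```

`ask.py` then walks this best-first list, skipping repeated titles and packing bodies into the prompt until the token budget runs out.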