├── .gitignore
├── LICENSE
├── README.md
├── ask.py
├── from_scrapbox
│   ├── qualia-san.json
│   └── tiny_sample.json
├── make_index.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
1 | *.pickle
2 | from_scrapbox/*.json
3 | 
4 | ## Python template
5 | # Byte-compiled / optimized / DLL files
6 | __pycache__/
7 | *.py[cod]
8 | *$py.class
9 | 
10 | # C extensions
11 | *.so
12 | 
13 | # Distribution / packaging
14 | .Python
15 | build/
16 | develop-eggs/
17 | dist/
18 | downloads/
19 | eggs/
20 | .eggs/
21 | lib/
22 | lib64/
23 | parts/
24 | sdist/
25 | var/
26 | wheels/
27 | pip-wheel-metadata/
28 | share/python-wheels/
29 | *.egg-info/
30 | .installed.cfg
31 | *.egg
32 | MANIFEST
33 | 
34 | # PyInstaller
35 | # Usually these files are written by a python script from a template
36 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
37 | *.manifest
38 | *.spec
39 | 
40 | # Installer logs
41 | pip-log.txt
42 | pip-delete-this-directory.txt
43 | 
44 | # Unit test / coverage reports
45 | htmlcov/
46 | .tox/
47 | .nox/
48 | .coverage
49 | .coverage.*
50 | .cache
51 | nosetests.xml
52 | coverage.xml
53 | *.cover
54 | *.py,cover
55 | .hypothesis/
56 | .pytest_cache/
57 | 
58 | # Translations
59 | *.mo
60 | *.pot
61 | 
62 | # Django stuff:
63 | *.log
64 | local_settings.py
65 | db.sqlite3
66 | db.sqlite3-journal
67 | 
68 | # Flask stuff:
69 | instance/
70 | .webassets-cache
71 | 
72 | # Scrapy stuff:
73 | .scrapy
74 | 
75 | # Sphinx documentation
76 | docs/_build/
77 | 
78 | # PyBuilder
79 | target/
80 | 
81 | # Jupyter Notebook
82 | .ipynb_checkpoints
83 | 
84 | # IPython
85 | profile_default/
86 | ipython_config.py
87 | 
88 | # pyenv
89 | .python-version
90 | 
91 | # pipenv
92 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
93 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
94 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
95 | # install all needed dependencies.
96 | #Pipfile.lock
97 | 
98 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow
99 | __pypackages__/
100 | 
101 | # Celery stuff
102 | celerybeat-schedule
103 | celerybeat.pid
104 | 
105 | # SageMath parsed files
106 | *.sage.py
107 | 
108 | # Environments
109 | .env
110 | .venv
111 | env/
112 | venv/
113 | ENV/
114 | env.bak/
115 | venv.bak/
116 | 
117 | # Spyder project settings
118 | .spyderproject
119 | .spyproject
120 | 
121 | # Rope project settings
122 | .ropeproject
123 | 
124 | # mkdocs documentation
125 | /site
126 | 
127 | # mypy
128 | .mypy_cache/
129 | .dmypy.json
130 | dmypy.json
131 | 
132 | # Pyre type checker
133 | .pyre/
134 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2023 NISHIO Hirokazu
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Scrapbox ChatGPT Connector
2 | 
3 | The Scrapbox ChatGPT Connector is a simple script for connecting Scrapbox and ChatGPT.
4 | 
5 | The script is designed so that developers can easily grasp the big picture and customize it to their own needs. The purpose of the project is to show a simple implementation, not to satisfy a wide variety of needs. I encourage everyone to understand the source code and customize it to their own needs.
6 | 
7 | ## For Japanese readers
8 | Visit https://scrapbox.io/villagepump/Scrapbox_ChatGPT_Connector
9 | 
10 | 
11 | ## How to install
12 | 
13 | Clone the GitHub repository.
14 | 
15 | Run the following command to install the required libraries.
16 | 
17 |     $ pip install -r requirements.txt
18 | 
19 | ## How to use
20 | Obtain an OpenAI API token and save it in a `.env` file.
21 | 
22 | ```
23 | OPENAI_API_KEY=sk-...
24 | ```
25 | 
26 | Build the index.
27 | 
28 |     $ python make_index.py
29 | 
30 | It outputs something like this:
31 | 
32 | ```
33 | % python make_index.py
34 | 97%|███████████████████████████████████████████████████████████████████████████████████████████████████▉ | 846/872 [07:06<00:10, 2.59it/s]The server is currently overloaded with other requests. Sorry about that! You can retry your request, or contact us through our help center at help.openai.com if the error persists.
35 | 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 872/872 [07:45<00:00, 1.87it/s]
36 | ```
37 | Ask a question.
38 | 
39 |     $ python ask.py
40 | 
41 | It outputs something like this:
42 | 
43 | ```
44 | >>>> What is the most important question?
45 | > The most important question is to know ourselves.
46 | ```
47 | 
48 | ## License
49 | The Scrapbox ChatGPT Connector is distributed under the MIT License. See the LICENSE file for more information.
--------------------------------------------------------------------------------
/ask.py:
--------------------------------------------------------------------------------
1 | import openai
2 | from make_index import VectorStore, get_size
3 | 
4 | 
5 | PROMPT = """
6 | You are a virtual character. Read sample output of the character in the following Sample section. Then reply to the Input.
7 | ## Sample
8 | {text}
9 | ## Input
10 | {input}
11 | """.strip()
12 | 
13 | 
14 | MAX_PROMPT_SIZE = 4096
15 | RETURN_SIZE = 250
16 | 
17 | 
18 | def ask(input_str, index_file):
19 |     PROMPT_SIZE = get_size(PROMPT)
20 |     rest = MAX_PROMPT_SIZE - RETURN_SIZE - PROMPT_SIZE
21 |     input_size = get_size(input_str)
22 |     if rest < input_size:
23 |         raise RuntimeError("too large input!")
24 |     rest -= input_size
25 | 
26 |     vs = VectorStore(index_file)
27 |     samples = vs.get_sorted(input_str)
28 | 
29 |     to_use = []
30 |     used_title = []
31 |     for _sim, body, title in samples:
32 |         if title in used_title:
33 |             continue
34 |         size = get_size(body)
35 |         if rest < size:
36 |             break
37 |         to_use.append(body)
38 |         used_title.append(title)
39 |         rest -= size
40 | 
41 |     text = "\n\n".join(to_use)
42 |     prompt = PROMPT.format(input=input_str, text=text)
43 | 
44 |     print("\nTHINKING...")
45 |     response = openai.ChatCompletion.create(
46 |         model="gpt-3.5-turbo",
47 |         messages=[
48 |             {"role": "user", "content": prompt}
49 |         ],
50 |         max_tokens=RETURN_SIZE,
51 |         temperature=0.0,
52 |     )
53 | 
54 |     # show question and answer
55 |     content = response['choices'][0]['message']['content']
56 |     print("\nANSWER:")
57 |     print(f">>>> {input_str}")
58 |     print(">", content)
59 | 
60 | 
61 | if __name__ == "__main__":
62 | 
    ask("Scrapbox ChatGPT Connectorって何?", "tiny_sample.pickle")
63 |     ask("クオリアさん、日本語で自己紹介して", "tiny_sample.pickle")
64 | 
--------------------------------------------------------------------------------
/from_scrapbox/tiny_sample.json:
--------------------------------------------------------------------------------
1 | {"name":"tiny_sample","displayName":"tiny_sample","exported":1678252437,"pages":[
2 | {"title":"Qualia-san","created":1668362835,"updated":1676212019,"id":"6371325095071a001d61a1e6","lines":["Qualia-san","[https://scrapbox.io/files/6371326d83fa3100215e1652.png]","[Qualia-san.icon] I am \"Qualia-san\", the communication interface between my home planet and Earth. ","私は「クオリアさん」、私の母星と地球とのコミュニケーション・インターフェイスです。","","This is a forum to keep Qualia-san's tweets together. It is maintained by fans on Earth. You can join by clicking on the link below. Participants are considered to be fans of Qualia-san.","ここはクオリアさんのつぶやきをまとめておくためのフォーラムです。地球上のファンによって管理されています。下のリンクをクリックすると参加できます。参加者はクオリアさんのファンとみなさせていただきます。","","[***** [JOIN https://scrapbox.io/projects/qualia-san/invitations/8c8be80501a9cba7f954a035bd08ee35]]","[***** [Twitter https://twitter.com/Qualia_san] Follow me!]","","管理人が日本人なのでまずは日本語の情報を先にまとめます。いずれ英語もまとめます。","\tAs the admin is Japanese, I will first organize the information in Japanese. English information will be organized in due course.","","まずはこれを読みましょう。","\t[100フォロワー]","\tRead this first: [100 followers]","","Scrapbox上では機械翻訳が使えないので違う言語で読みたい人はTwitter上で読むのがおすすめです。","Machine translation is not available on Scrapbox, so if you want to read in a different language, it is recommended to read on Twitter.","","ルール"," クオリアさんのツイートが貼られている部分は、クオリアさんのお言葉のアーカイブなので書き換えてはいけません"," その他の部分はファンの交流のためのスペースなので好きに書いてかまいません","Rules"," The section where Qualia-san's tweets are pasted is an archive of Qualia-san's words and should not be modified"," The rest of the page is for fan interaction, so you can write whatever you want.","","[CC-BY]","\t投稿や顔画像はCC-BY 4.0ライセンスです。クオリアのTwitterアカウントにリンクしていただければ、転載、再投稿、翻訳、リミックスも可能です。","\tPosts and face images are CC-BY 4.0 licensed. They may be reprinted, reposted, translated, and remixed as long as you link back to Qualia's Twitter account.","","[- 質問箱 https://peing.net/ja/qualia_san]","\t[クオリアさん.icon]質問するホモサピエンスは良いホモサピエンス!","\t\tAny homo sapiens that asks questions is a good homo sapiens!","\tTwitterでのログインができていません","","[*** Season 1]","[* Japanese]","\t[1日目] / [2日目] / [3日目] / [4日目] / [5日目]","\t[6日目] / [7日目] / [8日目] / [9日目] / [10日目]","\t[11日目] / [12日目]","\tクオリアさん [100フォロワー]まとめ","\t[13日目] / [14日目] / [15日目]","\t[16日目] / [17日目] / [18日目] / [19日目] / [20日目]","\t[21日目] / [22日目] / [23日目] / [24日目] / [25日目]","\t[26日目] / [27日目] / [28日目] / [29日目] / [30日目]","\t[31日目] / [32日目] / [33日目] / [34日目] / [35日目]","\t[36日目] / [37日目] / [38日目] / [39日目] / [40日目]","\t[41日目] / [42日目] / [43日目] / [44日目] / [45日目]","\t[46日目] / [47日目] / [48日目] / [49日目] / [50日目]","\t[51日目] / [52日目] / [53日目] / [54日目] / [55日目]","\t[56日目] / [57日目] / [58日目] / [59日目] / [60日目]","\t[61日目] / [62日目] / [63日目] / [64日目]","[* English]","\t[Day 1] / [Day 2] / [Day 3] / [Day 4] / [Day 5]","\t[Day 6] / [Day 7] / [Day 8] / [Day 9] / [Day 10]","\t[Day 11] / [Day 12]","\t[100 followers]","\t[Day 13] / [Day 14] / [Day 15]","\t[Day 16] / [Day 17] / [Day 18] / [Day 19] / [Day 20]","\t[Day 21] / [Day 22] / [Day 23] / [Day 24] / [Day 25]","\t[Day 26] / [Day 27] / [Day 28] / [Day 29] / [Day 30]","\t[Day 31] / [Day 32] / [Day 33] / [Day 34] / [Day 35]","\t[Day 36] / [Day 37] / [Day 38] / [Day 39] / [Day 40]","\t[Day 41] / [Day 42] / [Day 43] / [Day 44] / [Day 45]","\t[Day 46] / [Day 47] / [Day 48] / [Day 49] / [Day 50]","\t[Day 51] / [Day 52] / [Day 53] / [Day 54] / [Day 55]","\t[Day 56] / [Day 57] / [Day 58] / [Day 59] / [Day 60]","\t[Day 61] / [Day 62] / [Day 63] / [Day 64]","","[* Special Event Posts]","\t[2022 Year End Picture]","\t[Happy New Year] / [Rabbit Year]","","2023(Season 2)","\t[65日目]","\t[66日目] / [67日目] / [68日目] / [69日目] / [70日目]","\t[71日目] / [72日目] / [73日目] / [74日目] / [75日目]","\t[76日目] / [77日目] / [78日目] / [79日目] / [80日目]","","","\t[Day 65]","\t[Day 66] / [Day 67] / [Day 68] / [Day 69] / [Day 70]","\t[Day 71] / [Day 72] / [Day 73] / [Day 74] / [Day 75]","\t[Day 76] / [Day 77] / [Day 78] / [Day 79] / [Day 80]","","","[雑談]","","[Get started]","Scrapbox ChatGPT Connectorは、ScrapboxとChatGPTを接続するためのシンプルなスクリプトです。"]}
3 | ]}
--------------------------------------------------------------------------------
/make_index.py:
--------------------------------------------------------------------------------
1 | import time
2 | import json
3 | import tiktoken
4 | import openai
5 | import pickle
6 | import numpy as np
7 | from tqdm import tqdm
8 | import dotenv
9 | import os
10 | 
11 | BLOCK_SIZE = 500
12 | EMBED_MAX_SIZE = 8150
13 | 
14 | dotenv.load_dotenv()
15 | openai.api_key = os.getenv("OPENAI_API_KEY")
16 | enc = tiktoken.get_encoding("cl100k_base")
17 | 
18 | 
19 | def get_size(text):
20 |     "take text, return number of tokens"
21 |     return len(enc.encode(text))
22 | 
23 | 
24 | def embed_text(text, sleep_after_success=1):
25 |     "take text, return embedding vector"
26 |     text = text.replace("\n", " ")
27 |     tokens = enc.encode(text)
28 |     if len(tokens) > EMBED_MAX_SIZE:
29 |         text = enc.decode(tokens[:EMBED_MAX_SIZE])
30 | 
31 |     while True:
32 |         try:
33 |             res = openai.Embedding.create(
34 |                 input=[text],
35 |                 model="text-embedding-ada-002")
36 |             time.sleep(sleep_after_success)
37 |         except Exception as e:
38 |             print(e)
39 |             time.sleep(1)
40 |             continue
41 |         break
42 | 
43 |     return res["data"][0]["embedding"]
44 | 
45 | 
46 | def update_from_scrapbox(json_file, out_index, in_index=None):
47 |     """
48 |     json_file: Input JSON file name (from Scrapbox)
49 |     out_index: Output index file name
50 |     in_index: Optional input index file name. It is not modified and is used as a cache to reduce API calls.
51 |     json_file: 入力JSONファイル名 (scrapboxからの)
52 |     out_index: 出力インデックスファイル名
53 |     in_index: オプショナルな入力インデックスファイル名。変更されず、APIコールを減らすためのキャッシュとして使用されます。
54 | 
55 |     # usage
56 |     ## create new index
57 |     update_from_scrapbox(
58 |         "from_scrapbox/nishio.json",
59 |         "nishio.pickle")
60 | 
61 |     ## update index
62 |     update_from_scrapbox(
63 |         "from_scrapbox/nishio-0314.json", "nishio-0314.pickle", "nishio-0310.pickle")
64 |     """
65 |     if in_index is not None:
66 |         cache = pickle.load(open(in_index, "rb"))
67 |     else:
68 |         cache = None
69 | 
70 |     vs = VectorStore(out_index)
71 |     data = json.load(open(json_file, encoding="utf8"))
72 | 
73 |     for p in tqdm(data["pages"]):
74 |         buf = []
75 |         title = p["title"]
76 |         for line in p["lines"]:
77 |             buf.append(line)
78 |             body = " ".join(buf)
79 |             if get_size(body) > BLOCK_SIZE:
80 |                 vs.add_record(body, title, cache)
81 |                 buf = buf[len(buf) // 2:]
82 |         body = " ".join(buf).strip()
83 |         if body:
84 |             vs.add_record(body, title, cache)
85 | 
86 |     vs.save()
87 | 
88 | 
89 | class VectorStore:
90 |     def __init__(self, name, create_if_not_exist=True):
91 |         self.name = name
92 |         try:
93 |             self.cache = pickle.load(open(self.name, "rb"))
94 |         except FileNotFoundError:
95 |             if create_if_not_exist:
96 |                 self.cache = {}
97 |             else:
98 |                 raise
99 | 
100 |     def add_record(self, body, title, cache=None):
101 |         if cache is None:
102 |             cache = self.cache
103 |         if body not in cache:
104 |             # not in cache: call the embedding API
105 |             self.cache[body] = (embed_text(body), title)
106 |         elif body not in self.cache:
107 |             # in cache but not in self.cache: reuse the cached item
108 |             self.cache[body] = cache[body]
109 | 
110 |         return self.cache[body]
111 | 
112 |     def get_sorted(self, query):
113 |         q = np.array(embed_text(query, sleep_after_success=0))
114 |         buf = []
115 |         for body, (v, title) in tqdm(self.cache.items()):
116 |             buf.append((q.dot(v), body, title))
117 |         buf.sort(reverse=True)
118 |         return buf
119 | 
120 |     def save(self):
121 |         pickle.dump(self.cache, open(self.name, "wb"))
122 | 
123 | 
124 | if __name__ == "__main__":
125 |     # Sample default arguments for update_from_scrapbox()
126 |     JSON_FILE = "from_scrapbox/tiny_sample.json"
127 |     INDEX_FILE = "tiny_sample.pickle"
128 | 
129 |     update_from_scrapbox(JSON_FILE, INDEX_FILE)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | aiohttp==3.8.4
2 | aiosignal==1.3.1
3 | async-timeout==4.0.2
4 | attrs==22.2.0
5 | autopep8==2.0.2
6 | blobfile==2.0.1
7 | certifi==2022.12.7
8 | charset-normalizer==3.1.0
9 | filelock==3.9.0
10 | frozenlist==1.3.3
11 | idna==3.4
12 | lxml==4.9.2
13 | multidict==6.0.4
14 | numpy==1.24.2
15 | openai==0.27.1
16 | pycodestyle==2.10.0
17 | pycryptodomex==3.17
18 | python-dotenv==1.0.0
19 | regex==2022.10.31
20 | requests==2.28.2
21 | tiktoken==0.3.0
22 | tqdm==4.65.0
23 | urllib3==1.26.14
24 | yarl==1.8.2
25 | 
--------------------------------------------------------------------------------
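A note on the retrieval step: `VectorStore.get_sorted` scores every cached chunk with a raw dot product against the query embedding. Because text-embedding-ada-002 returns (approximately) unit-length vectors, that dot product behaves like cosine similarity. Below is a minimal self-contained sketch of just that ranking step, with hard-coded 2-D vectors standing in for real API embeddings; the `rank` helper and the sample records are illustrative, not part of the repository.

```python
import numpy as np

def rank(query_vec, records):
    # records: iterable of (vector, body, title) triples, mirroring the
    # (embedding, title) values kept in VectorStore.cache keyed by body.
    q = np.asarray(query_vec, dtype=float)
    scored = [(float(q.dot(np.asarray(v, dtype=float))), body, title)
              for v, body, title in records]
    # Sort by score only, so tied scores never fall through to
    # comparing the body strings (as a plain tuple sort would).
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

# Unit-length stand-ins for ada-002 embeddings.
records = [
    (np.array([1.0, 0.0]), "page about cats", "Cats"),
    (np.array([0.6, 0.8]), "page about dogs", "Dogs"),
]
ranked = rank([1.0, 0.0], records)
# The first record scores 1.0 and the second 0.6, so "Cats" ranks first.
```

`ask.py` then walks this best-first list, skipping repeated titles and packing bodies into the prompt until the token budget runs out.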