├── LICENSE.md
├── README.md
└── Readme-zh.md
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Munntein
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # PolyglotSiri Apple Shortcut
2 |
3 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/stargazers)
4 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/network/members)
5 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/blob/main/LICENSE.md)
6 |
7 | [中文介绍](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/blob/main/Readme-zh.md)
8 |
9 | PolyglotSiri is based on [ChatGPT-Siri](https://github.com/Yue-Yang/ChatGPT-Siri) and enhances its multilingual and speech capabilities.
10 |
11 | Now you can talk to Siri in any language, and Siri responds bilingually (English and Chinese, as I have set it) with a more natural voice. You can treat Siri as a foreign friend and use it for language practice.
12 |
13 | In addition to the OpenAI ChatGPT API, three other REST APIs are used:
14 |
15 | 1. OpenAI Whisper API: used as an STT service that can handle multiple languages. Native Siri can only understand one language at a time.
16 | 2. Azure Language Cognitive API: used to detect the language of the text, which then drives the Azure TTS language configuration.
17 | 3. Azure TTS: replaces the iOS native voices, resulting in more natural-sounding speech.
18 |
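Outside of Shortcuts, the Whisper transcription call can be sketched as follows. This is only an illustration of the API's role in the pipeline; the key, file name, and the commented-out `requests` send are placeholder assumptions, not part of the shortcut itself:

```python
OPENAI_KEY = "sk-..."  # placeholder; substitute your own OpenAI key

def build_whisper_request(audio_path: str) -> dict:
    """Assemble the pieces of a Whisper speech-to-text request.

    Whisper detects the spoken language automatically, which is what
    lets the shortcut accept voice input in any language.
    """
    return {
        "url": "https://api.openai.com/v1/audio/transcriptions",
        "headers": {"Authorization": f"Bearer {OPENAI_KEY}"},
        "data": {"model": "whisper-1"},
        "audio_path": audio_path,
    }

# Actually sending it (requires a real key, a recording, and e.g. `requests`):
# req = build_whisper_request("recording.m4a")
# with open(req["audio_path"], "rb") as f:
#     resp = requests.post(req["url"], headers=req["headers"],
#                          data=req["data"], files={"file": f})
# transcript = resp.json()["text"]
```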
19 |
20 |
21 | ## Shortcut Link
22 | https://www.icloud.com/shortcuts/61bbd941bfce433cb9193d990a8f4004
23 |
24 | (The previous shortcut link had an issue: the URL suffix "/text/analytics/v3.0/languages" after the Azure Language service endpoint was accidentally omitted.)
25 |
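For reference, the full detection URL is simply the Language resource endpoint plus that suffix. A minimal sketch (the resource name below is a placeholder, not a real endpoint):

```python
# Placeholder Azure Language resource endpoint; substitute your own.
AZURE_LANGUAGE_ENDPOINT = "https://my-language-resource.cognitiveservices.azure.com"

# Without this suffix the request fails, which was the bug in the old link.
LANGUAGE_DETECT_URL = AZURE_LANGUAGE_ENDPOINT + "/text/analytics/v3.0/languages"
```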
26 |
27 |
28 | ## Configuration and Customization
29 |
30 | 0. First, sign up for the OpenAI API and Azure cloud:
31 |    - Log in to OpenAI: https://platform.openai.com
32 |    - Log in to Azure: https://portal.azure.com/
33 |    - In Azure, two resources are needed: Speech and Language. Deploy both.
34 | 1. Fill in your API keys in this shortcut.
35 |    - You need two OpenAI keys: one for Whisper and the other for completion.
36 |    - Fill in the Azure TTS and Azure Language cognitive service keys.
37 | 2. Substitute the Azure service REST API URLs. There are three of them: one for text language detection and two for Azure TTS (aimed at bilingual voice support; you can customize more for additional languages).
38 | 3. Customize the Azure TTS parameters.
39 |    - Language and voice settings: in the TTS request body, customize your language and voice. Find all voices here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts#supported-languages
40 |    - In the TTS request header, you can specify the audio output quality via "X-Microsoft-OutputFormat". Find all audio output formats here: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming#audio-outputs
41 | 4. Specify the multi-language if statements.
42 |    - To make Azure TTS speak multiple languages, Azure language detection is used to recognize the language of the text, and if statements then route the reply to the corresponding TTS language engine when speaking.
43 |
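The "detect language, then branch to a voice" logic from steps 2 and 4 can be sketched as follows. This is a rough Python sketch under stated assumptions: the two voice names and the English fallback are placeholders you would replace with your own choices, and in the shortcut itself this branching is done with Shortcuts "If" actions rather than code:

```python
# Map a detected language code to an Azure TTS voice, mirroring the
# shortcut's if statements. Voices below are placeholders; pick your
# own from the Azure voice list linked above.
VOICE_BY_LANG = {
    "en": "en-US-JennyNeural",
    "zh": "zh-CN-XiaoxiaoNeural",
}

def pick_voice(lang_code: str) -> str:
    # Fall back to the English voice for any unconfigured language.
    return VOICE_BY_LANG.get(lang_code, VOICE_BY_LANG["en"])

def build_ssml(text: str, lang_code: str) -> str:
    """Build the SSML body POSTed to the Azure TTS endpoint."""
    voice = pick_voice(lang_code)
    return (
        f"<speak version='1.0' xml:lang='{lang_code}'>"
        f"<voice name='{voice}'>{text}</voice></speak>"
    )

# The actual TTS call (not executed here) POSTs this SSML to the
# regional endpoint, e.g. https://<region>.tts.speech.microsoft.com/cognitiveservices/v1,
# with "Ocp-Apim-Subscription-Key" and "X-Microsoft-OutputFormat"
# (e.g. "audio-24khz-48kbitrate-mono-mp3") request headers.
```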
44 |
45 |
46 | ## Usage
47 |
48 | 1. Download the [shortcut](https://www.icloud.com/shortcuts/61bbd941bfce433cb9193d990a8f4004).
49 | 2. Simply tap the shortcut and let the magic happen.
50 | 3. The audio recording will stop after 15 seconds by default. You can stop it either by tapping or waiting for the countdown to end.
51 | 4. Wait for the response.
52 |
53 |
54 |
55 | ## Issues
56 |
57 | This shortcut is somewhat useful for language learning, but it's not perfect. There are a few issues:
58 |
59 | 1. Why are two OpenAI keys necessary?
60 |
61 | - In my experience, when using the same key for both the Whisper and completion APIs, the token usage would hit the completion API limits within very few rounds, say 2 or 3. Separating the API keys solves the problem. However, I'm still not clear on the rules for why Whisper API requests count toward the completion token quota.
62 |
63 | 2. Since there are three APIs to be called, performance may not be particularly smooth - especially if you need a proxy to access those services.
64 |
65 | 3. It doesn't support streaming output like many other text-based ChatGPT projects do. This, coupled with the network issues above, results in slower response times, particularly when ChatGPT generates longer replies.
66 |
67 | 4. **The most critical issue**: this shortcut does not work well in voice wake-up mode. I tried my best to debug it but gave up in the end. I'm hoping someone can solve this.
68 |
69 | 5. Another imperfection is that the voice input cannot stop automatically; you have to tap manually or wait for the countdown to finish.
70 |
71 |
72 |
73 |
74 |
75 | ## Acknowledgements
76 |
77 | 1. [ChatGPT-Siri](https://github.com/Yue-Yang/ChatGPT-Siri)
78 |
79 |
80 |
--------------------------------------------------------------------------------
/Readme-zh.md:
--------------------------------------------------------------------------------
1 | # PolyglotSiri 苹果快捷方式
2 |
3 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/stargazers)
4 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/network/members)
5 | [](https://github.com/Munntein/PolyglotSiri-Apple-Shortcut/blob/main/LICENSE.md)
6 |
7 | PolyglotSiri 基于 [ChatGPT-Siri](https://github.com/Yue-Yang/ChatGPT-Siri) 并增强了其多语言和语音能力。
8 |
9 | 现在您可以用任何语言与 Siri 对话。而且 Siri 的响应是双语的(我设置为英文和中文),声音更加自然。现在您可以把 Siri 当作外国朋友,并与它进行语言训练。
10 |
11 | 除了 OpenAI chatGPT API,还使用了其他三个 REST API:
12 |
13 | 1. OpenAI Whisper API:用作 STT 服务,可处理多种语言。原生的 Siri 只能一次理解一种语言。
14 |
15 | 2. Azure Language Cognitive API:用于识别文本的语言以供后续 Azure TTS 语言配置使用。
16 |
17 | 3. Azure TTS:使用 Azure TTS 替换 iOS 原生 TTS,从而产生更自然的声音。
18 |
19 |
20 |
21 | ## 快捷方式链接
22 |
23 | https://www.icloud.com/shortcuts/61bbd941bfce433cb9193d990a8f4004
24 |
25 | (之前的快捷方式链接有问题:Azure Language 服务 endpoint 之后的 URL 后缀 "/text/analytics/v3.0/languages" 被误删了。)
26 |
27 |
28 |
29 | ## 配置和定制
30 |
31 | 0. 首先您需要申请 openai api 和 Azure cloud
32 |
33 | - 登录到 OpenAI, https://platform.openai.com
34 |
35 |
36 | - 登录到Azure, https://portal.azure.com/
37 |
38 |
39 | - 在Azure云中,需要两个资源,一个是speech一个是language。部署它们。
40 |
41 |
42 | 1. 在此快捷方式中填写您的API密钥。
43 |
44 | - 您需要两个 openai 密钥,一个用于 whisper,另一个用于 completion。
45 |
46 |
47 | - 填写 Azure tts 和 Azure language cognitive service Keys。
48 |
49 |
50 | 2. 替换 Azure 服务的 REST API URL。其中有三个:一个用于文本语言识别,另外两个用于 Azure TTS(旨在支持双语语音;您可以根据需要为更多语言进行自定义)
51 |
52 | 3. 自定义 Azure TTS 参数
53 |
54 | - 语言和声音设置,在TTS请求正文中自定义您的语言和声音设置。在此处查找所有声音 https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts#supported-languages
55 |
56 |
57 | - 在TTS请求头中,您可以指定“X-Microsoft-OutputFormat” 音频输出质量。在此处查找所有音频输出 https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-text-to-speech?tabs=streaming#audio-outputs
58 |
59 |
60 | 4. 指定多种语言 if 语句
61 |
62 | - 要使 Azure TTS 支持多语言发音,需要使用 Azure 语言识别来判断文本的语言,并通过 if 语句在朗读时调用对应语言的 TTS 引擎。
63 |
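上面第 2 步和第 4 步中"语言识别 + if 分支"的逻辑可以用下面的 Python 草图粗略示意(仅为假设性示例:两个 voice 名称和英文回退都是占位设置,请自行替换;在快捷指令本身中,这一分支是用"如果"操作实现的,并非代码):

```python
# 将检测到的语言代码映射到 Azure TTS 的 voice,对应快捷指令中的 if 分支。
# 下面两个 voice 仅为占位示例,请从上文链接的 Azure voice 列表中自行选择。
VOICE_BY_LANG = {
    "en": "en-US-JennyNeural",
    "zh": "zh-CN-XiaoxiaoNeural",
}

def pick_voice(lang_code: str) -> str:
    # 未配置的语言回退到英文 voice
    return VOICE_BY_LANG.get(lang_code, VOICE_BY_LANG["en"])

def build_ssml(text: str, lang_code: str) -> str:
    """构造 POST 给 Azure TTS 端点的 SSML 请求体。"""
    voice = pick_voice(lang_code)
    return (
        f"<speak version='1.0' xml:lang='{lang_code}'>"
        f"<voice name='{voice}'>{text}</voice></speak>"
    )

# 实际的 TTS 调用(此处未执行)会把该 SSML POST 到区域端点,
# 例如 https://<region>.tts.speech.microsoft.com/cognitiveservices/v1,
# 并带上 "Ocp-Apim-Subscription-Key" 和 "X-Microsoft-OutputFormat" 请求头。
```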
64 |
65 |
66 | ## 使用方法
67 |
68 | 1. 下载 [快捷方式](https://www.icloud.com/shortcuts/61bbd941bfce433cb9193d990a8f4004)。
69 |
70 | 2. 点击快捷方式并等待魔法发生。
71 |
72 | 3. 默认情况下15秒后录制将停止。您可以通过点击或等待倒计时结束来停止它。
73 |
74 | 4. 等待响应。
75 |
76 |
77 |
78 |
79 | ## 问题
80 | 这个快捷方式对语言学习有一定的帮助,但并不完美。存在以下问题:
81 |
82 | 1. 为什么需要两个OpenAI密钥?
83 |
84 | - 在我的实践中,使用相同的密钥请求whisper和completion API,在很少的几轮内(比如2或3轮),token就会达到完成API限制。分离API密钥可以解决这个问题。然而,我还不太清楚规则是什么,为什么whisper API请求计入了完成token大小。
85 |
86 | 2. 由于需要调用三个API,性能可能不是特别流畅——尤其是如果您需要代理访问这些服务。
87 |
88 | 3. 它不支持像许多其他文本chatgpt项目那样的流式输出。这与之前的网络问题结合在一起,导致ChatGPT生成较长回复时响应时间较慢。
89 |
90 | 4. **最关键的问题**是:此快捷方式无法在语音唤醒模式下正常工作。我尽力进行了调试,并最终放弃了它。期待有人能解决这个问题。
91 |
92 | 5. 另一个缺陷是:语音输入部分不能自动停止;必须手动点击或等待倒计时结束。
93 |
94 |
95 |
96 |
97 | ## 感谢
98 |
99 | 1. [ChatGPT-Siri](https://github.com/Yue-Yang/ChatGPT-Siri)
100 |
--------------------------------------------------------------------------------