├── 3d_001.md
├── README.md
├── README_TODO_001.md
├── README_TODO_002.md
├── README_asr_web_001.md
├── README_microphone.md
├── README_mobile.md
├── README_player.md
├── RaspiVoiceHAT_001.txt
├── algo_leetcode_001.md
├── android_001.md
├── asm_001.md
├── asr_000.md
├── asr_001.md
├── asr_002.md
├── asr_003.md
├── asr_004.md
├── asr_005.md
├── asr_006.md
├── asr_007.md
├── baidu_asr_rpi_python_001.txt
├── buildroot_001.md
├── car_001.md
├── cmsis_nn_001.txt
├── cocos2d-x_001.md
├── csky_001.txt
├── ctc_001.txt
├── deepspeech_001.txt
├── deepspeech_002.txt
├── ds-cnn_001.txt
├── editor_001.md
├── fpga_cpld_001.md
├── framebuffer_001.txt
├── game_001.md
├── game_server_001.md
├── gcc_001.txt
├── gui_001.md
├── j-link_mdk_keil_error_001.md
├── kws_build_001.txt
├── launching-speech-commands-dataset.txt
├── librosa_001.txt
├── linux_001.md
├── live_001.md
├── lstm_001.txt
├── lstm_002.txt
├── lstm_003.txt
├── mace_001.txt
├── mbed_001.txt
├── mcu_001.md
├── mosaic_001.txt
├── msys_001.md
├── numpy_001.md
├── numpy_002.txt
├── online_game_001.md
├── os_001.md
├── python_speech_001.txt
├── pytorch_speech_commands_001.txt
├── qt_001.md
├── risc-v_001.txt
├── risc-v_002.txt
├── risc-v_003.txt
├── risc-v_004.txt
├── rpa_001.txt
├── rt-smart_ffmpeg_demo.txt
├── script_001.md
├── speech_commands_001.txt
├── tensorflow_001.txt
├── tinymind_001.txt
├── unity_001.md
├── usb_hub_001.md
├── weibo_001.txt
└── ytk_001.txt


/3d_001.md:
--------------------------------------------------------------------------------
 1 | ## old list  
 2 | * https://github.com/weimingtom/BlenderStudy  
 3 | 
 4 | ## Android  
 5 | * https://github.com/FuKeKe/GameEngineer  
 6 | aka https://github.com/kekezbw/GameEngineer  
 7 | search baidupan, LoveGameEngine  
 8 | 
 9 | ## Java  
10 | * https://github.com/weimingtom/game-ide  
11 | 
12 | ## PV3D  
13 | * https://github.com/weimingtom/papervision3d_java  
14 | 
15 | ## 3d model download  
16 | * search baidupan, akane  
17 | 


--------------------------------------------------------------------------------
/README_TODO_001.md:
--------------------------------------------------------------------------------
 1 | 
 2 | ## Related Projects  
 3 | * https://github.com/weimingtom/wmt_speech_study  
 4 | * https://github.com/weimingtom/wmt_matlab_study  
 5 | * https://github.com/weimingtom/asr_rpi3b_hello  
 6 | * https://github.com/weimingtom/asr_android_hello  
 7 | * search baidupan, work_asr_js  
 8 | * search baidupan, work_pocketsphinx_asr  
 9 | * html5, search baidupan, speech_v1.rar  
10 | * maixduino, search baidupan, voice_control_led_en_v2_success.rar  
11 | * search baidupan, work_stm32_sound_record  
12 | * (TODO) stm32f103zet6, WM8960_Audio_Board_Code_v1.rar  
13 | * (TODO) asr_rpi3b_hello add py-webrtcvad support, see here:  
14 | https://github.com/LoveThinkinghard/Raspibot/blob/master/vadSound.py  
15 | * (TODO) rpi3, search baidupan, asr_rpi3b_hello_v2_vad.tar.gz  
16 | * (TODO) en.stsw-stm32068.zip  
17 | * (TODO) work_hmm  
18 | * (TODO) hhle88/GMM-HMM  
19 | * search baidupan, stm32, WM8960_Record_v3_success_inmp441.rar  
20 | * (TODO) backup pan: /1tb_new/ipan/work_hmm/speech-recognition_git.rar  
21 | * (TODO) ipan/_black_upan_backup_20200803_orange  
22 | * (TODO) ipan/_black_upan_backup_20200803_black  
23 | * (TODO) search baidupan deepspeech_readme.txt  
24 | * (TODO) search baidupan readme_pocketsphinx.txt  
25 | * (TODO, IMP) ***number_classifier_tflearn.py, not done***    
26 | * (TODO, IMP) ***speech_recognizer, Chapter07_v3.zip, not done***  
27 | * (TODO, IMP) ***example_vosk.txt***  
28 | * (TODO, IMP) ***kaldi_001.txt***  
29 | * (TODO, IMP) ***xunfei_aiui, android***  
 30 | * (IMP) search aiuidemo_v1.rar, Java project calling the iFLYTEK AIUI WebAPI, see below : Xunfei (iflytek) WebAPI v2    
31 | * (TODO) search baidupan, tflite_tensorflow_lite_adafruit  
32 | * (IMP) search baidupan, Blink_esp32_v6.rar, tflite, tensorflow lite, green standalone version, Arduino IDE compiling success  
33 | * (IMP) search baidupan, Blink_esp32_rpd2017_v2_success.tar.gz, tflite, tensorflow lite, run on ubuntu / linux / rpd  
34 | see https://github.com/bsatrom/tf-speech-particle, run example 10s voice with some problems    
 35 | * (TODO) ***Python机器学习, book, Python implementation code for Gaussian mixture models***  
36 | * (TODO) ***Python深度学习实战 基于TensorFlow和Keras的聊天机器人以及人脸 物体和语音识别, book, see below***  
37 | * (TODO) ***LSTM+CTC, search below***  
38 | * (TODO) ***TinyML epub, pdf, Tensorflow_AIOT2019***   
39 | * (TODO) ***SmartSpeaker, stm32***  
 40 | * (TODO) ***tencentasrdemo_v1.rar, see below Tencent Cloud speech recognition, todo, use less java code***  
41 | * (???) mlpack LSTM   
42 | * (???) esp32_kws  
 43 | * (TODO) ***see below, CTC tensorflow example, speech recognition (LSTM+CTC)***  
44 | * (TODO) build kaldi, search baidupan, kaldi_20200917_pre.tar.gz, work_kaldi  
45 | * kasiim/ESP-EYE-speaker-verification  
46 | * margosall/Digitization-Realization  
47 | * patrikstarck/hmm_speech_recognition  
48 | * huybk213/Mainboard/blob/master/Src/voice_command.c  
49 | * Arduino_TensorFlowLite  
50 | * (TODO, IMP) ***baidupan, micro_speech_ESP-EYE_v1_not_good.rar***  
51 | * (x) 《人工智能》清华大学版  
52 | * (x) M5StickC, tanakamasayuki/Arduino_TensorFlowLite_ESP32  
53 | * (TODO, IMP) https://github.com/kasiim/ESP-EYE-speaker-verification  
54 | * (TODO, IMP) ***voice_control_led_en_v2_success.rar, port to linux***  
55 | * 机器学习经典算法实践  
56 | * (???IMP) https://github.com/accraze/keyword-spotter  
57 | * ARM快速嵌入式系统原型设计, search baidupan  
58 | * (TODO, IMP???) https://github.com/KFUA/TensorflowCode/blob/master/9-23%20%20yuyinchall.py  
59 | * (TODO, IMP???) search baidupan, 17.speech_recognition.zip  
60 | * (TODO, IMP???) search baidupan, 20.speech_recognition_app.zip  
61 | * https://github.com/emlearn/emlearn    
62 | * https://github.com/jonnor/embeddedml    
63 | * (IMP) https://github.com/search?q=kaggle+speech  
64 | * (???IMP) https://github.com/PacktPublishing/Hands-On-Natural-Language-Processing-with-Python/blob/master/Chapter11/01_example.ipynb  
65 | * https://github.com/atomic14/diy-alexa  
66 | 
 67 | ## Speech recognition porting, simple offline speech recognition, TODO    
68 | * OC Volume  
69 | * PocketSphinx  
70 | * snowboy  
71 | * gk969/stm32-speech-recognition  
72 | * ASRFrame  
73 | * Maixduino->Maix_Speech_Recognition; MaixPy->audio, MIC_ARRAY  
74 | * Python-Machine-Learning-Cookbook, Chapter07/speech_recognizer.py, Python机器学习经典实例  
75 | * audier/my_python_play  
76 | * audier/DeepSpeechRecognition  
77 | * 数字语音处理及MATLAB仿真(search baidupan 数字语音处理及MATLAB仿真), matlab    
78 | * PacktPublishing/Python-Machine-Learning-Cookbook/blob/master/Chapter07/speech_recognizer.py, Python机器学习经典实例    
79 | * 实时语音处理实践指南  
80 | * (For xubuntu 20.04 64bit, Python 2.7 64bit and TensorFlow 1.5.0) TensorFlow/examples/speech_commands  
81 | * (For C++, tensorflow-2.1.1 or 2.3.0-rc0, 2.3.0-rc1) tensorflow/lite/micro/examples/micro_speech  
82 | * (For Matlab 7.0) 《语音信号处理实验教程》, 语音信号处理代码/第10章 语音识别/10.2 基于隐马尔可夫模型（HMM）的孤立字语音识别实验  
83 | https://github.com/veenveenveen/SpeechSignalProcessingCourse  
84 | * ONLY VAD: (for Python), https://github.com/wiseman/py-webrtcvad  
85 | * ARM-software/ML-KWS-for-MCU  
86 | arduino version see : Infineon/KWS-for-XMC  
87 | * 《Python+TensorFlow机器学习实战》  
 88 | search baidupan, 20190513103207287.zip, Chapter 9  
89 | search baidupan, Python_TensorFlow机器学习实战  
90 | * 《Tensorflow入门与实战》, thewintersun/tensorflowbook  
 91 | 《Tensorflow入门与实战》, Chapter 6 "Recurrent Neural Networks", 6.4 "Speech recognition with LSTM+CTC"  
 92 | CTC tensorflow example code walkthrough  
93 | igormq/ctc_tensorflow_example  
94 | 


--------------------------------------------------------------------------------
/README_TODO_002.md:
--------------------------------------------------------------------------------
  1 | 
  2 | ## ESP32, voice recording    
  3 | * https://github.com/MhageGH/esp32_SoundRecorder  
  4 | * https://github.com/MhageGH/esp32_CloudSpeech  
  5 | * https://github.com/lixy123/TTGO_T_Watch_Baidu_Rec  
  6 | * https://github.com/atomic14/esp32_audio  
  7 | 
  8 | ## tensorflow lite micro (tflite) esp32 port  
  9 | * ref: https://github.com/tanakamasayuki/Arduino_TensorFlowLite_ESP32/tree/master/examples/micro_speech_M5StickC  
 10 | * First NodeMCU-32S and INMP441 breadboard run success  
 11 | only use PIN2 (builtin LED) to test YES, no other LED used    
 12 | search baidupan, libraries_20201004.rar  
 13 | micro_speech_ESP-EYE_v4_success_yes_new_compiler_mianbaoban.rar  
 14 | for comparison, micro_speech_M5StickC_v1_compare.rar  
 15 | * arduino one file compile: Blink_esp32_v6.rar  
 16 | * linux build: Blink_esp32_rpd2017_v2_success.tar.gz  
 17 | 
 18 | ## tensorflow lite micro (tflite) stm32 / mbed / arm port / etc      
 19 | * (TODO) https://github.com/uTensor/tf_microspeech  
 20 | * (TODO, DOC) https://github.com/COTASPAR/K66F  
 21 | * (TODO) https://github.com/ARMmbed/TensorFlow_MIMXRT1064-EVK_Microspeech  
 22 | * (TODO) https://github.com/jasonwhwang/tensorflow_micro_speech_mbed/blob/master/micro_speech/audio_provider.cpp  
 23 | use A0, MAX9814, see 《Arduino+MAX9814制作简易录音机》  
 24 | * https://github.com/PhilippvK/stm32-tflm-micro-speech  
 25 | * https://github.com/openmv/tensorflow-lib  
 26 | * https://github.com/AmbiqMicro/TFLiteMicro_MicroSpeech_Keil_AP3BEVB  
 27 | * https://github.com/42io/tflite_kws  
 28 | * (DOC) https://github.com/ARM-software/armnn  
 29 | * (DOC) https://developer.arm.com/solutions/machine-learning-on-arm/developer-material/how-to-guides  
 30 | * (DOC) https://github.com/arduino/AIoT-Dev-Summit-2019  
 31 | 
 32 | ## TFLite work  
 33 | * blink_v2_micro_speech_success.tar.gz  
 34 | with esp-idf-v3.3.4  
 35 | * freertos_stm32f103rct6_v3_queue.rar  
 36 | STM32 CMSIS-FreeRTOS demo    
 37 | * freertos_stm32f103rct6_v4_ac6.rar  
 38 | STM32 CMSIS-FreeRTOS demo for ac6      
 39 | * inmp441_stm32f411re_v2_success_3bit4bit.rar  
 40 | stm32f411re inmp441 i2s  
 41 | * WM8960_Record_v3_success_inmp441.rar  
 42 | stm32f103ze inmp441 i2s  
 43 | * WM8960_Record_stm32f103ze_compare.rar  
 44 | stm32f103ze inmp441 i2s mod from WM8960_Audio_Board_Code, differences   
 45 | * microspeech_stm32f411re_v4_compile.rar  
 46 | final stm32 project micro_speech ac6  
 47 | * micro_speech_vs2013_success.rar  
 48 | win32, vs2013  
 49 | * microspeech_stm32h743vi_v4.rar  
 50 | stm32h743 project micro_speech ac6 not tested  
 51 | 
 52 | ## TFLite work TODO    
 53 | * (TODO) rpi, SDL2, PortAudio  
 54 | * (TODO) port to ESP32 ESP-IDF    
 55 | see https://docs.espressif.com/projects/esp-idf/zh_CN/latest/esp32/get-started/index.html  
 56 | * (TODO) ADC version (replace i2s, pdm)    
 57 | * (TODO) port to other board (STM32, GD32, K210, ...)  
 58 | * (TODO) old version of arduino esp32 core  
 59 | * (TODO) port to w600: https://docs.w600.fun/?p=product/arduino.md  
 60 | * (TODO) port to Wio Terminal  
 61 | 
 62 | ## ML-KWS, nucleo-f411re    
 63 | * ref: https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment  
 64 | * ref: https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment/Examples/simple_test  
 65 | * mbed-cli, search baidupan, kws_simple_test_v1.rar, build result: BUILD_v1.rar  
 66 | * for NUCLEO-F411RE, SRAM: 34KB, Flash: 204KB  
 67 | * compile method: https://github.com/weimingtom/wmt_ai_study/blob/master/kws_build_001.txt  
 68 | * (TODO) https://github.com/2524056672/kws-stm32f7disco-cmsisNN  
 69 | * (TODO) https://github.com/JeffyCN/ARM-KWS-demo  
 70 | * https://os.mbed.com/users/mbed_official/code/mbed/  
 71 | * https://os.mbed.com/users/mbed_official/code/mbed-sdk-tools  
 72 | * (TODO) **fix the problem with the portable ("green") version of mbed-cli**  
 73 | * (TODO) **test whether stm32f411ce is compatible**  
 74 | * (DONE) st-link virtual serial, only for NUCLEO-F411RE  
 75 | kws_simple_test_v1.rar, simple_test_v2_success.rar    
 76 | BUILD_v1.rar  
 77 | * (DONE) PA9 PA10 serial (USART1), compatible with stm32f411ce  
 78 | simple_test_v3_stm32f411ce_run_success.rar  
 79 | BUILD_v3_stm32f411ce_run_success.rar  
 80 | 
 81 | ```
 82 | //UART1_TX==PA_9==D8<->FT232.RXD  
 83 | //UART1_RX==PA_10==D2<->FT232.TXD  
 84 | serial_init(&stdio_uart, PA_9, PA_10);  //redirect to Serial1  
 85 | stdio_uart_inited = 1;   
 86 | printf("ready\r\n");  
 87 | ```  
 88 | * (TODO) search baidupan, init template project, blink_v1_stm32f411ce_init.rar  
 89 | 
 90 | ## speech_commands, tensorflow 1.5.0   
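 91 | **TODO: unresolved. tf 2.x now installs fine in a virtual machine on the i5 computer, but apparently not on the old computer**    
 92 | * For the command line, refer to this article:  
 93 | https://www.cnblogs.com/lijianming180/p/12258774.html    
 94 | I did not compile tensorflow myself. My method is to install python2 (i.e. 2.7) on xubuntu,  
 95 | together with python2-pip (pip needs a special installation method), and then install tensorflow 1.5 offline  
 96 | (its dependency packages are still installed online). In short, I use the old version to work around the CPU instruction set problem  
 97 | (installing the latest CPU version directly fails at runtime). Besides the installation problem, training the  
 98 | model takes a very long time: I have not yet run train.py to completion (I estimate it takes a day). I am  
 99 | wondering whether the training time can be shortened, or whether training can be resumed from a checkpoint  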
100 | * search baidupan, tensorflow-1.5.0-cp27-none-linux_x86_64.whl  
101 | * search baidupan, tensorflow-1.5.0.zip  
102 | * search baidupan, speech_commands_v0.01.tar.gz  
103 | * Install Python 2 and pip2:  
104 | see https://www.cnblogs.com/zhuangliu/archive/2016/11/20/6083063.html  
105 | (???) $ sudo apt-get install python2.7    
106 | $ wget https://bootstrap.pypa.io/get-pip.py  
107 | $ sudo python2 get-pip.py  
108 | $ sudo python2 -m pip install tensorflow-1.5.0-cp27-none-linux_x86_64.whl  
109 | $ python2  
110 | * other study project  
111 | (TODO) https://github.com/accraze/keyword-spotter  
112 | search here, keyword-spotter  
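
The CPU instruction set problem described in this section can be checked before picking a TensorFlow version. The following is a minimal sketch, not part of the original notes: it parses `/proc/cpuinfo`-style text and reports whether the `avx` flag is present, on the understanding that the official TensorFlow wheels from 1.6 onward are built with AVX while 1.5.0 is not. The helper names `cpu_flags` and `has_avx` are hypothetical.

```python
"""Check whether this Linux CPU advertises AVX (hypothetical helper)."""


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; some ARM kernels use "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the 'avx' flag is listed for at least one core."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    if has_avx(text):
        print("AVX found: a current CPU build of TensorFlow should import")
    else:
        print("No AVX flag found: an old wheel such as 1.5.0 may be needed")
```

If no AVX flag is reported, the offline install of `tensorflow-1.5.0-cp27-none-linux_x86_64.whl` listed above is the route these notes take.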
113 | 
114 | 
115 | ## (TODO, baidupan) stm32 sound record  
116 | * (TODO, in home computer) stm32f103zet6, Open103Z_I2S, WM8960_Record, WM8960_Audio_Board_Code_v1.rar  
117 | * STM32F4-Discovery_FW_V1.1.0, Audio_playback_and_record, en.stsw-stm32068.zip  
118 | * stm32F4_dsp_microphone_fft_rtos.rar  
119 | * (TODO) STM32F4 i2s/adc/pdm recording code  
120 | * ref https://os.mbed.com/code/  
121 | 
122 | ## (TODO, baidupan) stm32 (and other) sound process  
123 | * STM32F407VG, ASR_Project.rar  
124 | * STM32F407VG, Design_Project-Speech_Recognition_on_Embedded_System.rar  
125 | * k210, m5stickv-tensorflow-lite-micro.rar, maixcube-tensorflow-lite-micro.rar  
126 | * esp32, ML-KWS-for-ESP32.rar  
127 | * STM32F746NG, ml-kws-for-mcu_alxkbr.rar  
128 | * STM32F407VGT6_1, SmartPillow.rar  
129 | * STM32F429ZI, Speaker-Recognition-System-in-ARM.rar, SR_stm32f429zi.rar  
130 | * STM32F407VG, Voice-Recognition.rar  
131 | * mic_vad_streaming, DeepSpeech-examples.rar  
132 | * (TODO) STM32f103VE, stm32-speech-recognition_v2.rar  
133 | 
134 | ## TensorFlow example speech_commands   
135 | * http://t.rock-chips.com/forum.php?mod=viewthread&tid=456&extra=page%3D1  
136 | * https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/audio_recognition.md    
137 | * https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/speech_commands  
138 | 
139 | ## Wav analyzing Software  
140 | * GoldWave  
141 | http://www.goldwave.com  
142 | * Audacity  
143 | https://www.audacityteam.org/download/windows/  
144 | 
145 | ## ADC  
146 | https://github.com/sparkfun/MEMS_Mic_Breakout-ADMP401/blob/V_1.3/Firmware/SparkFun_ADMP401_Simple_Sketch/SparkFun_INMP401.ino  
147 | https://os.mbed.com/users/rayxke/notebook/sparkfun-mems-microphone-breakout---inmp401-admp40/  
148 | https://github.com/jenfoxbot/MEMSMicHookUpGuide/blob/master/ExampleCode.ino  
149 | https://learn.sparkfun.com/tutorials/mems-microphone-hookup-guide  
150 | Arduino+MAX9814制作简易录音机  
151 | https://www.cnblogs.com/Ray-liang/p/9786154.html  
152 | 


--------------------------------------------------------------------------------
/README_asr_web_001.md:
--------------------------------------------------------------------------------
1 | ## tencentasrdemo_v1.rar  
2 | Tencent Cloud speech recognition  
3 | 
4 | ## aiuidemo_v1.rar  
5 | iFLYTEK (Xunfei)  
6 | 


--------------------------------------------------------------------------------
/README_microphone.md:
--------------------------------------------------------------------------------
 1 | ## Ai-Thinker(安信可) NodeMCU-32S, ESP32  
 2 | * M5Stack PDM Unit + M5Stack Mini-Proto Unit, M5Stack (明栈)      
 3 | (demo) PDM_v1_nodemcu32s_success.rar    
 4 | (origin) https://docs.m5stack.com/#/zh_CN/unit/pdm  
 5 | (origin) https://github.com/m5stack/M5-ProductExampleCodes/tree/master/Unit/PDM  
 6 | PDM<->NodeMCU-32S  
 7 | CLK (white) <->GPIO22 (right top 3)  
 8 | DAT (yellow)<->GPIO21 (right top 6)  
 9 | 5V  (red)   <->3V3    (left top 1)  
10 | GND (black) <->GND    (right top 1)  
11 | 
12 | * ADMP401, 都会明武  
13 | (demo) ADMP401_v1_success_duhui.rar  
14 | (origin, from code for SPW2430) https://esp32.com/viewtopic.php?t=7077#p30450  
15 | (origin, from code for SPW2430) https://forums.adafruit.com/viewtopic.php?f=8&t=140676  
16 | **WARNING: DON'T PUT 5V to ADMP401 VCC PIN**  
17 | ADMP401<->NodeMCU-32S  
18 | VCC<->3.3V  (left top 1)  
19 | GND<->GND   (right top 1)  
20 | AUD<->GPIO4 (right bottom 7)  
21 | 
22 | * MAX9814, 育松电子    
23 | 40dB (Gain<->3V3, less noise), 50dB (Gain<->GND), 60dB (Gain<->Not Connect, more noise)    
24 | (demo) ADMP401_v1_success_duhui.rar  
25 | (origin, from code for SPW2430) https://esp32.com/viewtopic.php?t=7077#p30450  
26 | (origin, from code for SPW2430) https://forums.adafruit.com/viewtopic.php?f=8&t=140676  
27 | MAX9814<->NodeMCU-32S  
28 | GND<->GND   (right top 1)  
29 | Vdd<->3.3V  (left top 1)  
30 | Gain<->3.3V (Vdd)  
31 | Out<->GPIO4 (right bottom 7)  
32 | AR<->NC (not connect)    
33 | 
34 | ## Arduino Uno, ATMEGA328P  
35 | * Music Shield, VS1053B, waveshare (微雪)  
36 | (demo) ???  
37 | (origin) https://www.waveshare.net/wiki/Music_Shield  
38 | 
39 | ## ESP32-Audio-kit, AI-Thinker (安信可)    
40 | * (origin) https://docs.ai-thinker.com/esp32-audio-kit  
41 | (origin) https://github.com/RealCorebb/ESP32-A1s-Audio-Kit  
42 | (origin) https://github.com/Ai-Thinker-Open/ESP32-A1S-AudioKit  
43 | 


--------------------------------------------------------------------------------
/README_mobile.md:
--------------------------------------------------------------------------------
  1 | ## mobile-deep-learning
  2 | * mobile-deep-learning_mod_v2_success.rar  
  3 | (origin) 《移动深度学习》  
  4 | (origin) https://github.com/allonli/mobile-deep-learning  
  5 | Key code : if defined(MDL_V7)  
  6 | https://github.com/allonli/mobile-deep-learning/blob/master/src/math/gemm.cpp  
  7 | 
  8 | ## mace  
  9 | * mace_android_demo_v6_fast.rar  
 10 | (origin) https://github.com/XiaoMi/mace  
 11 | (origin) https://mace.readthedocs.io/en/latest/introduction.html  
 12 | * (TODO) https://github.com/zhy520xp/mace-makefile-project  
 13 | * (TODO) https://github.com/conansherry/easy_mace  
 14 | * (TODO) https://github.com/conansherry/convert_model  
 15 | * (TODO) https://github.com/huuuuusy/Xiaomi-MACE-Notes  
 16 | * (TODO) https://github.com/qinRight/macelibrary  
 17 | * (TODO) mcu version  
 18 | * English documentation  
 19 | https://blog.csdn.net/asonle/article/details/80869518  
 20 | https://mace.readthedocs.io/en/latest/index.html  
 21 | https://buildmedia.readthedocs.org/media/pdf/mace/latest/mace.pdf  
 22 | 
 23 | ## MNN  
 24 | * mnndemo_v2_no_jni_success.rar  
 25 | (origin) https://github.com/alibaba/MNN  
 26 | (origin) https://www.yuque.com/mnn/cn/build_android  
 27 | 
 28 | ## TNN  
 29 | * tnndemo_v1_slow.rar  
 30 | (origin) https://github.com/Tencent/TNN   
 31 | 
 32 | ## NCNN  
 33 | * squeezencnn_v1.rar  
 34 | (origin) https://github.com/Tencent/ncnn  
 35 | * NCNN usage summary  
 36 | https://blog.csdn.net/u011046017/article/details/92849082  
 37 | 
 38 | ## MindSpore Lite  
 39 | * himindsporedemo_v2_success.rar  
 40 | (origin) image_classification  
 41 | https://download.mindspore.cn/model_zoo/official/lite/mobilenetv2_openimage_lite/  
 42 | https://gitee.com/mindspore/mindspore/tree/r1.0/model_zoo/official/lite/image_classification  
 43 | https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/lite/android_aarch64/mindspore-lite-1.0.0-minddata-arm64-cpu.tar.gz  
 44 | https://ms-release.obs.cn-north-4.myhuaweicloud.com/1.0.0/lite/android_aarch32/mindspore-lite-1.0.0-minddata-arm32-cpu.tar.gz  
 45 | 
 46 | ## Paddle-Lite  
 47 | * paddlelitedemo_v1_success.rar  
 48 | (origin) Paddle-Lite-Demo  
 49 | https://github.com/PaddlePaddle/Paddle-Lite-Demo  
 50 | https://paddle-lite.readthedocs.io/zh/latest/demo_guides/android_app_demo.html  
 51 | (model) https://paddlelite-demo.bj.bcebos.com/models/ssd_mobilenet_v1_fp32_224_for_cpu_v2_6_0.tar.gz  
 52 | (so) https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html  
 53 | (so) https://paddlelite-data.bj.bcebos.com/Release/2.6.1/Android/inference_lite_lib.android.armv7.gcc.c++_shared.CV_OFF.tar.gz  
 54 | (demo) https://github.com/PaddlePaddle/Paddle-Lite-Demo/tree/b6c36ecfdc0ad6f500c72b40c8b0c67916421ab9/PaddleLite-android-demo/object_detection_demo  
 55 | 
 56 | ## Tengine-Lite  
 57 | * tenginelitedemo_v3_success_ssd.rar  
 58 | (origin) https://github.com/OAID/Tengine  
 59 | (model) search baidupan, Tengine-models/赏金任务模型/mobilenet_ssd.tmfile  
 60 | use LOCAL_WHOLE_STATIC_LIBRARIES   
 61 | * Official website downloads  
 62 | http://www.tengine.org.cn/info.php?class_id=105105  
 63 | http://www.eaidk.com/info.php?class_id=102&bz=1  
 64 | 
 65 | ## TensorFlow Lite (TFLite)  
 66 | * tflitedemo_v1_success.rar  
 67 | (origin) https://tensorflow.google.cn/lite/models/image_classification/overview?hl=zh_cn  
 68 | (origin) https://tensorflow.google.cn/lite/guide/hosted_models?hl=zh_cn  
 69 | 
 70 | ## PyTorch Mobile   
 71 | * pytorchmobiledemo_v1_success.rar  
 72 | (origin) https://github.com/pytorch/android-demo-app/tree/master/NativeApp  
 73 | (origin) https://pytorch.org/tutorials/recipes/android_native_app_with_custom_op.html  
 74 | (origin) https://pytorch.org/mobile/android/#api-docs  
 75 | 
 76 | ## ARMNN (for Android), ML-examples armnn-mobilenet-quant     
 77 | * (origin) https://github.com/ARM-software/armnn  
 78 | https://review.mlplatform.org/admin/repos/ml/armnn  
 79 | git clone "https://review.mlplatform.org/ml/armnn"  
 80 | * Arm NN Quantised Mobilenet  
 81 | https://github.com/ARM-software/ML-examples/blob/master/armnn-mobilenet-quant/README.md  
 82 | ML-examples  
 83 | https://github.com/ARM-software/ML-examples  
 84 | * Arm NN MNIST  
 85 | https://developer.arm.com/technologies/machine-learning-on-arm/developer-material/how-to-guides/  
 86 | https://github.com/ARM-software/ML-examples/blob/master/armnn-mnist/README.md   
 87 | 
 88 | ## QNNPACK  
 89 | https://github.com/pytorch/QNNPACK  
 90 | https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/native/quantized/cpu/qnnpack  
 91 | 
 92 | ## Prestissimo, 绝影    
 93 | https://github.com/in66-dev/In-Prestissimo  
 94 | 
 95 | ## FeatherCNN  
 96 | https://github.com/Tencent/FeatherCNN  
 97 | 
 98 | ## DNNLibrary  
 99 | https://github.com/JDAI-CV/DNNLibrary  
100 | https://github.com/daquexian/dnnlibrary-example  
101 | https://github.com/Oneflow-Inc/oneflow  
102 | Android 8.1 NNAPI review, and possibly the world's first NNAPI library  
103 | https://zhuanlan.zhihu.com/p/30926958  
104 | 
105 | ## vosk-api, kaldi android  
106 | https://github.com/alphacep/vosk-api  
107 | 


--------------------------------------------------------------------------------
/README_player.md:
--------------------------------------------------------------------------------
 1 | ## NUCLEO-F746ZG  
 2 | * WM8960 Audio Board, waveshare (微雪)    
 3 | (demo) WM8960_Play_Music_NECLEO-F746ZG_v1_success.rar  
 4 | (origin) https://www.waveshare.net/wiki/WM8960_Audio_Board  
 5 | WM8960<->NUCLEO-F746ZG  
 6 | 1.VCC<->3.3V (outer left top right 6)  
 7 | 1.GND<->GND (outer left top right 4)  
 8 | 1.SDA<->PB9 (inner right top right 2)  
 9 | 1.SCL<->PB8 (inner right top right 1)  
10 | 1.CLK<->PB13 (inner right top left 3)  
11 | 1.WS<->PB12 (inner right top left 4)  
12 | 1.RXSDA<->PB15 (inner right top left 2)  
13 | 1.RXMCLK<->PC6 (inner right top left 1)  
14 | 
15 | ## 32F411EDISCOVERY, STM32F411E-DISCO  
16 | * (origin) https://www.st.com/en/evaluation-tools/32f411ediscovery.html  
17 | (origin) https://github.com/STMicroelectronics/STM32CubeF4/tree/master/Projects/STM32F411E-Discovery/Applications/Audio/Audio_playback_and_record  
18 | 
19 | ## ESP32-Audio-kit, AI-Thinker (安信可)    
20 | * (origin) https://docs.ai-thinker.com/esp32-audio-kit  
21 | (origin) https://github.com/RealCorebb/ESP32-A1s-Audio-Kit  
22 | (origin) https://github.com/Ai-Thinker-Open/ESP32-A1S-AudioKit  
23 | 


--------------------------------------------------------------------------------
/RaspiVoiceHAT_001.txt:
--------------------------------------------------------------------------------
  1 | http://ukonline2000.com/?p=1207
  2 | 
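  3 | Raspi Voice HAT: AI smart speaker 2-mic microphone array for speech recognition, for Raspberry Pi 2/3/4B
  4 | Posted on 2020-09-18 by ukonline2000
  5 | Product overview
  6 | 
  7 | Introduction
  8 | Raspi Voice HAT is a dual-microphone expansion board for the Raspberry Pi, designed for AI and voice applications. This means you can build a more powerful and flexible voice product that integrates Amazon Alexa Voice Service, Google Assistant, Baidu AI, and so on.
  9 | 
 10 | The board is an audio module designed for the Raspberry Pi. It uses the WM8960 low-power stereo codec, controlled over I2C, with audio carried over I2S. Two microphones, one on each side of the board, capture sound. The board also provides 12 APA102 RGB LEDs and 1 on-board speaker, plus 1 user button and 1 I2C interface for extending applications.
 11 | 
 12 | In addition, audio can be output through either the on-board 3.5mm audio jack or the JST 2.0 speaker output: music can be played through external headphones, or through external speakers on the two-channel speaker connector. A high-quality MEMS silicon microphone on each of the left and right sides of the board allows stereo recording.
 13 | 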
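 14 | Features
 15 | Supply voltage: 5V
 16 | Logic voltage: 3.3V
 17 | Audio codec chip: WM8960
 18 | Control interface: I2C
 19 | Audio interface: I2S
 20 | Expansion interfaces: 1x I2C, 1x button
 21 | Power connector: 1x Type-C (5V)
 22 | LED interface: 12 APA102 programmable RGB LEDs, connected to the SPI interface
 23 | DAC SNR: 98dB
 24 | ADC SNR: 94dB
 25 | Headphone driver: 40mW (16Ω@3.3V)
 26 | Speaker driver: 1W per channel (8Ω BTL) (on-board mono speaker)
 27 | Hardware resources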
 28 | 
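 29 | LP and LN are the positive and negative terminals of the left speaker; RP and RN are the positive and negative terminals of the right speaker.
 30 | 
 31 | Function pin	Raspberry Pi pin (BCM)	Description
 32 | 5V	5V	Power positive (5V power input)
 33 | GND	GND	Power ground
 34 | SDA	P3/GPIO2	I2C data input
 35 | SCL	P5/GPIO3	I2C clock input
 36 | CLK	P12/GPIO18	I2S bit clock input
 37 | LRCLK	P35/GPIO19	I2S frame clock input
 38 | DAC	P40/GPIO21	I2S serial data output
 39 | ADC	P38/GPIO20	I2S serial data input
 40 | BUTTON	P29/GPIO5 or P31/GPIO6	User-defined button
 41 | Using with the Raspberry Pi
 42 | The examples for this product only work on the official Raspberry Pi system (Raspbian)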
 43 | 
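 44 | Automatic installation by script
 45 | Use an official system whose official Raspberry Pi kernel is version 5.4 or later (Kernel version: 5.4)
 46 | 
 47 | Run the following script to install the driver automatically (demo included):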
 48 | 
 49 | git clone https://github.com/u-geek/RaspiVoiceHAT
 50 | 
 51 | cd RaspiVoiceHAT
 52 | 
 53 | sudo ./setup.sh
 54 |              ┌───────┤ AOIDE RaspiVoiceHAT setup tools(5.4.51) ├────────┐
 55 |              │ RaspiVoiceHAT Config Tool.                               │
 56 |              │                                                          │
 57 |              │                     1 Install Driver                     │
 58 |              │                     2 Remove Driver                      │
 59 |              │                     3 Demo                               │
 60 |              │                     E Exit                               │
 61 |              │                                                          │
 62 |              │                                                          │
 63 |              │                                                          │
 64 |              │                                                          │
 65 |              │                                                          │
 66 |              │                                                          │
 67 |              │                                                          │
 68 |              │                                                          │
 69 |              │              <Ok>                  <Exit>                │
 70 |              │                                                          │
 71 |              └──────────────────────────────────────────────────────────┘
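 72 | Manual driver installation
 73 | If you are in mainland China, the official sources may be slow: downloads during installation take a long time and the update may fail, so you can switch to the Aliyun mirror:
 74 | 
 75 | sudo nano /etc/apt/sources.list
 76 | Comment out the official source lines with a leading # and add the Aliyun mirror: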
 77 | 
 78 | #deb-src http://archive.raspberrypi.org/debian/ stretch main
 79 | deb http://mirrors.aliyun.com/raspbian/raspbian/ buster main contrib non-free rpi
 80 | deb-src http://mirrors.aliyun.com/raspbian/raspbian/ buster main contrib non-free rpi
 81 | Update the package sources:
 82 | 
 83 | sudo apt-get update
 84 | sudo apt-get upgrade
 85 | Check the kernel version:
 86 | 
 87 | uname -a
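 88 | If the kernel version is lower than 5.0 (i.e. the Raspberry Pi system image predates 2020-05-27), download the following driver:
 89 | git clone -b rpi-4.9.y https://github.com/waveshare/WM8960-Audio-HAT.git
 90 | If the system is the latest one, download this instead:
 91 | git clone https://github.com/waveshare/WM8960-Audio-HAT
 92 | Do not run both of the above commands; pick one.
 93 | Install the WM8960 driver:
 94 | 
 95 | cd WM8960-Audio-HAT
 96 | #This takes a while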
 97 | sudo ./install.sh 
 98 | sudo reboot
 99 | After rebooting, run the following command to check whether the driver loaded successfully.
100 | 
101 | sudo dkms status
102 | 
103 | pi@raspberrypi:~ $ sudo dkms status 
104 | wm8960-soundcard, 1.0, 4.19.58-v7l+, armv7l: installed
105 | Detect the sound card
106 | Check playback: aplay -l
107 | pi@raspberrypi:~ $ aplay -l
    | **** List of PLAYBACK Hardware Devices ****
    | card 0: wm8960soundcard [wm8960-soundcard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 []
    |   Subdevices: 1/1
    |   Subdevice #0: subdevice #0
108 | Check recording: arecord -l
109 | pi@raspberrypi:~ $ arecord -l
    | **** List of CAPTURE Hardware Devices ****
    | card 0: wm8960soundcard [wm8960-soundcard], device 0: bcm2835-i2s-wm8960-hifi wm8960-hifi-0 []
    |   Subdevices: 1/1
    |   Subdevice #0: subdevice #0
110 | Recording and playback tests
111 | Record-and-play loopback test
112 | sudo arecord -f cd -Dhw:0 | aplay -Dhw:0
113 | While this runs, the sound captured by the microphones is heard through the headphones or speaker. Keep the speaker away from the microphones, otherwise acoustic feedback will cause howling.
114 | 
115 | Recording
116 | sudo arecord -D hw:0,0 -f S32_LE -r 16000 -c 2 test.wav
117 | test.wav is the name of the recorded output file.
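
The recording command above asks for `S32_LE` samples at 16000 Hz in 2 channels. As a rough sanity check on what those flags imply, the sketch below (an illustration, not part of the vendor article) computes the raw data rate and uses Python's standard `wave` module to write one second of silence in the same format to an in-memory buffer, then reads the header back. The helper names are hypothetical; on a real system `describe_wav("test.wav")` could be called on the recorded file instead.

```python
import io
import wave

RATE = 16000      # -r 16000
CHANNELS = 2      # -c 2
SAMPLE_BYTES = 4  # S32_LE is 32-bit signed little-endian: 4 bytes per sample


def bytes_per_second(rate=RATE, channels=CHANNELS, sample_bytes=SAMPLE_BYTES):
    """Raw PCM data rate for the given recording parameters."""
    return rate * channels * sample_bytes


def describe_wav(fileobj):
    """Read a WAV header and return (channels, sample_bytes, rate, seconds)."""
    with wave.open(fileobj, "rb") as w:
        seconds = w.getnframes() / w.getframerate()
        return w.getnchannels(), w.getsampwidth(), w.getframerate(), seconds


def one_second_of_silence():
    """Build an in-memory WAV with the same format as the arecord command."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_BYTES)
        w.setframerate(RATE)
        w.writeframes(b"\x00" * bytes_per_second())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(bytes_per_second(), "bytes of audio per second")
    print(describe_wav(one_second_of_silence()))
```

At 128000 bytes per second, one minute of recording with these settings takes about 7.7 MB.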
118 | 
119 | Playback
120 | sudo aplay -Dhw:0 test.wav
121 | Plays the audio that was just recorded
122 | 
123 | Adjusting the volume
124 | The default volume is rather low
125 | 
126 | sudo alsamixer
127 | 
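128 | If the WM8960 sound card is not set as the default sound card, press F6 to select the sound card device.
129 | 
130 | There are many more adjustable options further to the right.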
131 | 
132 | MPG123 player
133 | aplay plays WAV files but not MP3, and MP3 music is far more common. Install mpg123:
134 | 
135 | sudo apt-get install mpg123 
136 | sudo mpg123 music.mp3
137 | Note that music.mp3 here must be replaced with your own MP3 file.
138 | 
139 | Graphical player SMPlayer
140 | Skip this on a command-line-only system. On a desktop system, install:
141 | 
142 | sudo apt-get install smplayer
143 | 
144 | 
145 | 
146 | In the interface, right-click and select wm8960-soundcard as the default
147 | 
148 | Open smplayer from the menu, then open an audio file to play it. (smplayer can also play video)
149 | 
150 | 
151 | Program control
152 | We provide a simple Python control example.
153 | 
154 | Install the required libraries
155 | sudo apt-get install libasound2-dev
156 | git clone https://github.com/larsimmisch/pyalsaaudio
157 | cd pyalsaaudio
158 | sudo python setup.py build
159 | sudo python setup.py install
160 | Download the example code
161 | wget http://www.waveshare.net/w/upload/1/19/WM8960_Audio_HAT_Code.tar.gz
162 | tar zxvf WM8960_Audio_HAT_Code.tar.gz
163 | sudo chmod 777 -R WM8960_Audio_HAT_Code
164 | Playback
165 | sudo python playwav.py music.wav
166 | Recording
167 | sudo python recordwav.py out.wav
168 | 


--------------------------------------------------------------------------------
/algo_leetcode_001.md:
--------------------------------------------------------------------------------
1 | 
2 | * https://github.com/labuladong/fucking-algorithm  
3 | 


--------------------------------------------------------------------------------
/android_001.md:
--------------------------------------------------------------------------------
1 | * search baidupan, repogen_v1.rar  
2 | 


--------------------------------------------------------------------------------
/asm_001.md:
--------------------------------------------------------------------------------
 1 | ## microdigitaled  
 2 | * http://www.microdigitaled.com/ARM/ASM_ARM/Code/ARM_ASM_codes.htm  
 3 | search baidupan, arm_asm_code.zip  
 4 | * https://nicerland.com/raspberry-pi/  
 5 | search baidupan raspberry-pi_asm  
 6 | 
 7 | ## 1024 points radix-4 complex fft  
 8 | * https://github.com/gk969/stm32-speech-recognition/blob/master/Src/BSP/cr4_fft_1024_stm32.s  
 9 | 
10 | ## Cortex-M3 Devices Generic User Guide  
11 | https://developer.arm.com/documentation/dui0552/a/the-cortex-m3-instruction-set/branch-and-control-instructions/it  
12 | 
13 | ## SPIM: A MIPS32 Simulator  
14 | http://spimsimulator.sourceforge.net  
15 | 
16 | ## An 8-bit minicomputer with a fully custom architecture  
17 | https://github.com/jdah/jdh-8  
18 | 
19 | ## Open-source high-performance RISC-V processor  
20 | https://github.com/OpenXiangShan/XiangShan  
21 | 
22 | ## Write your Own Virtual Machine  
23 | https://justinmeiners.github.io/lc3-vm/  
24 | 
25 | 


--------------------------------------------------------------------------------
/asr_000.md:
--------------------------------------------------------------------------------
 1 | Include:  
  2 | * Speech recognition and speech synthesis: ASR / OCR / STT / TTS / VAD    
  3 | * Walkie-talkie, softphone: Codec / Audio Compression / speex / opus / VoIP  
 4 | 
 5 | ## Old notes  
 6 | * https://github.com/weimingtom/wmt_ai_study/blob/master/asr_002.md  
 7 | * https://github.com/weimingtom/wmt_ai_study/blob/master/asr_001.md  
 8 | * https://github.com/weimingtom/wmt_linux_study  
 9 | * https://github.com/weimingtom/wmt_ai_study/blob/master/live_001.md  
10 | 
11 | ## CSML-by-Clevy / csml-engine  
 12 | Chatbot  
13 | https://github.com/CSML-by-Clevy/csml-engine  
14 | 
 15 | ## Hardware for running deep learning AI frameworks
 16 | (1) Gaming PC (the kind built for PUBG-style games)  
 17 | (2) Graphics workstation, GPU server  
18 | 
19 | 
 20 | ## Voice robots / magic mirrors / chatbots, dialogue systems    
 21 | * dingdang-robot (叮当)   
 22 | * wukong-robot  
 23 | * Leon, getleon.ai    
 24 | * zimei, zimeimojing (自美 smart system)  
 25 | * corvin_zhang / ros_voice_system, the Jupiter (木星) Chinese voice dialogue system  
 26 | * Seeedstudio Respeaker  
 27 | * waveshare 13.3inch_Magic_Mirror, Waveshare (微雪) 13.3-inch smart magic mirror     
 28 | * Raspibot  
29 | 
30 | 
31 | ## AR9331, AR9341, openwrt    
32 | 
33 | 


--------------------------------------------------------------------------------
/asr_005.md:
--------------------------------------------------------------------------------
  1 | ## 20210322  
  2 | 
  3 | ## -  
  4 | 
  5 | ## kaldi, Perceptual Linear Prediction (PLP) coefficients  
  6 | Book: 《图解语音识别》 (Illustrated Speech Recognition)  
  7 | Python speech signal features: perceptual linear prediction (PLP) coefficients  
  8 | https://blog.csdn.net/weixin_42485817/article/details/107590846  
  9 | 
 10 | ## Linear Prediction (LPC)  
 11 | LPC-based speech recognition  
 12 | 
 13 | ## linear prediction cepstrum coefficient, LPCC  
 14 | Linear Predictive Cepstrum Coefficients, LPCC  
 15 | (Chinese term: 线性预测倒谱系数)  
 16 | 
 17 | ## LPCMCC  
 18 | LPC mel-cepstral coefficients  
 19 | 
 20 | ## DCT (discrete cosine transform)  
 21 | Used in image compression  
 22 | https://baike.baidu.com/item/离散余弦变换/7118270?fr=aladdin  
 23 | 
 24 | ## (IMP???) pytorch crnn  
 25 | https://github.com/isadrtdinov/kws-attention  
 26 | 
 27 | ## TensorFlowLite Micro: Embedded Machine Learning on TinyML Systems  
 28 | https://arxiv.org/pdf/2010.08678.pdf  
 29 | https://github.com/raspberrypi/pico-tflmicro  
 30 | 
 31 | ## CRNN network architecture explained  
 32 | https://www.jianshu.com/p/4ac876a4cd5c  
 33 | 
 34 | ## gpt-neo  
 35 | https://github.com/EleutherAI/gpt-neo  
 36 | 
 37 | ## As powerful as GPT-3 is, can 175 billion parameters still not handle Chinese?  
 38 | https://www.huxiu.com/article/375604.html  
 39 | 
 40 | ## Tatoeba-Challenge  
 41 | https://github.com/Helsinki-NLP/Tatoeba-Challenge  
 42 | 
 43 | ## PYTHON-AND-DATA-ANALYTICS-7-DAYS  
 44 | https://github.com/ShapeAI/PYTHON-AND-DATA-ANALYTICS-7-DAYS  
 45 | 
 46 | ## introduction-to-machine-learning  
 47 | https://github.com/globalaihub/introduction-to-machine-learning  
 48 | 
 49 | ## Liquid Warping GAN with Attention: A Unified Framework for Human Image Synthesis  
 50 | https://github.com/iPERDance/iPERCore  
 51 | 
 52 | ## (???) Deep Learning for NLP and Speech Recognition   
 53 | ???  
 54 | 
 55 | ## (IMP) GitHub-Chinese-Top-Charts  
 56 | search tensorflow or nn or pytorch  
 57 | https://github.com/kon9chunkit/GitHub-Chinese-Top-Charts/blob/master/README-Part2.md#C  
 58 | 
 59 | ## USound MEMS speaker development kit  
 60 | https://www.cirmall.com/bbs/thread-204315-1-1.html?eefocus  
 61 | 
 62 | ## (IMP) jetson nano, Waveshare (微雪), jetson-inference     
 63 | https://www.waveshare.net/wiki/Jetson_Nano_Developer_Kit_Package_D  
 64 | https://www.waveshare.net/study/article-892-1.html  
 65 | https://github.com/dusty-nv/jetson-inference  
 66 | https://www.waveshare.net/study/article-889-1.html  
 67 | https://www.waveshare.net/study/article-893-1.html  
 68 | https://www.pianshen.com/article/7547357666/  
 69 | search baidupan, MNIST_TEST.zip  
 70 | search baidupan, networks.zip  
 71 | https://wiki.seeedstudio.com/cn/Jetson_Nano_OutBoxing_Demo/  
 72 | 
 73 | ## jetson nano, TensorRT  
 74 | https://developer.nvidia.com/zh-cn/tensorrt  
 75 | https://github.com/NVIDIA/TensorRT  
 76 | cuDNN, cuda  
 77 | VPI  
 78 | https://developer.nvidia.com/embedded/vpi  
 79 | JetPack  
 80 | https://developer.nvidia.com/zh-cn/embedded/jetpack  
 81 | http://www.gpus.cn/gpus_list_page_techno_support_content?id=101  
 82 | 
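 83 | ## Academia | Paper collides with NVIDIA's, first author "crying in the bathroom"; NVIDIA: want to come intern?   
 84 | https://www.sohu.com/a/274428157_129720  
 85 | ```
 86 | My friend, I know the feeling. I collided with Google a few weeks ago, and with DeepMind a few months ago. I work on AI, I don't drive bumper cars.  
 87 | ```  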
 88 | 
 89 | ## Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.  
 90 | https://github.com/sebastianruder/NLP-progress  
 91 | 
 92 | ## NVIDIA ASR  
 93 | https://github.com/shuaaa/NVIDIA-DeepLearning  
 94 | https://github.com/NVIDIA/DeepLearningExamples/tree/master/Kaldi/SpeechRecognition  
 95 | https://github.com/NVIDIA/DeepLearningExamples/tree/master/PyTorch/SpeechRecognition/Jasper  
 96 | 
 97 | ## TensorFlow Lite Micro  
 98 | https://github.com/search?q=TF_LITE_REPORT_ERROR+main_functions&type=code  
 99 | https://github.com/mlubinsky/mlubinsky.github.com/blob/5991cad85b90b5f4d051014cee66b565fe22040b/sound/README.md  
100 | https://github.com/search?q=TF_LITE_REPORT_ERROR+main_functions+NOLINTNEXTLINE+spectrogram&type=Code  
101 | TfLiteConv3DParams  
102 | https://github.com/search?l=C&q=TfLiteConv3DParams&type=Code  
103 | 
104 | ## Wio Terminal TFLM  
105 | https://github.com/Seeed-Studio/Seeed_Arduino_Sketchbook/blob/d59d97a35d696e18d51a329fc7aacb8d04c680a8/examples/WioTerminal_TinyML_4_Weather_Prediction/tensorflow_lite/library.properties  
106 | https://github.com/Seeed-Studio/Seeed_Arduino_Sketchbook/tree/master/examples/WioTerminal_TinyML_2_Audio_Scene_Recognition  
107 | 
108 | ## tflite4zero_env  
109 | https://github.com/NewComer00/tflite4zero_env  
110 | 
111 | ## TensorFlow_MIMXRT1064-EVK_Microspeech  
112 | https://github.com/ARMmbed/TensorFlow_MIMXRT1064-EVK_Microspeech  
113 | 
114 | ## tensorflow-examples  
115 | https://github.com/antmicro/tensorflow-examples  
116 | 
117 | ## voice-commands-using-arduino-and-ml  
118 | https://github.com/Apress/voice-commands-using-arduino-and-ml  
119 | 
120 | ## stm32-tflm-micro-speech  
121 | https://github.com/tum-ei-eda/stm32-tflm-micro-speech  
122 | 
123 | ## k210  
124 | https://github.com/fjpolo/eML  
125 | 
126 | ## lib_audio_features, MFCC  
127 | https://github.com/xmos/lib_audio_features  
128 | 
129 | ## ML-Sound-Classification  
130 | https://github.com/villasen/ML-Sound-Classification  
131 | 
132 | ## (IMP???) same54 kws  
133 | https://microchipdeveloper.com/machine-learning:keywordspotting-with-edge-impulse  
134 | https://github.com/MicrochipTech/ml-same54-cult-wm8904-edgeimpulse-kws-demo  
135 | 
136 | ## lyra  
137 | https://github.com/google/lyra  
138 | 
139 | ## mlflow  
140 | https://github.com/mlflow/mlflow  
141 | 
142 | ## numba  
143 | https://github.com/numba/numba  
144 | 
145 | ## miniaudio  
146 | https://github.com/mackron/miniaudio  
147 | 
148 | ## axon  
149 | https://github.com/elixir-nx/axon  
150 | 
151 | ## (IMP) MAX9812  
152 | search baidupan, MAX9812  
153 | KY-038 microphone amplifier module  
154 | 
155 | ## (IMP) ESP32-8-Octave-Audio-Spectrum-Display, fft    
156 | https://github.com/G6EJD/ESP32-8-Octave-Audio-Spectrum-Display  
157 | https://github.com/kosme/arduinoFFT  
158 | 
159 | ## spleeter  
160 | https://github.com/deezer/spleeter  
161 | 
162 | ## mozilla/TTS  
163 | https://github.com/mozilla/TTS  
164 | 
165 | ## (IMP) ESP32 Audio Input - MAX4466, MAX9814, SPH0645LM4H, INMP441  
166 | https://blog.cmgresearch.com/2020/09/12/esp32-audio-input.html  
167 | ESP32 Audio Input - MAX4466, MAX9814, SPH0645LM4H, INMP441 (Chinese translation)  
168 | https://www.cnblogs.com/kerwincui/p/13751746.html  
169 | search baidupan, esp32-audio-input.docx  
170 | 
171 | ## ESP32_MP3_Decoder  
172 | https://github.com/MrBuddyCasino/ESP32_MP3_Decoder  
173 | 
174 | ## A C++ standalone library for machine learning  
175 | https://github.com/flashlight/flashlight  
176 | 
177 | ## 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code  
178 | https://github.com/ashishpatel26/500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code  
179 | 
180 | ## start-machine-learning-in-2020  
181 | https://github.com/louisfb01/start-machine-learning-in-2020  
182 | 
183 | ## 物联网创新项目开发与实践 (IoT Innovation Project Development and Practice)  
184 | search baidupan, 物联网创新项目开发与实践  
185 | 
186 | ## Machine_Learning_2_months  
187 | https://github.com/Minhluu2911/Machine_Learning_2_months  
188 | 
189 | ## onnx  
190 | https://github.com/onnx/onnx  
191 | 
192 | ## (IMP???) SpeechBrain, A PyTorch Powered Speech Toolkit  
193 | https://github.com/speechbrain/speechbrain  
194 | https://speechbrain.github.io  
195 | 
196 | ## tensorflow-pack  
197 | https://github.com/MDK-Packs/tensorflow-pack  
198 | 
199 | ## yolov5-face  
200 | https://github.com/deepcam-cn/yolov5-face  
201 | 
205 | ## tensorboard  
206 | https://github.com/tensorflow/tensorboard  
207 | 
208 | ## The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools  
209 | https://github.com/huggingface/datasets  
210 | 
214 | ## realsense-ros  
215 | https://github.com/IntelRealSense/realsense-ros  
216 | 
217 | ## (IMP???) Continual_Learning_for_KWS  
218 | https://github.com/jianvora/Continual_Learning_for_KWS  
219 | https://gitee.com/weimingtom2000/Continual_Learning_for_KWS/tree/main/Keyword%20Spotting/TC-Resnet  
220 | 
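221 | ## Matlab计算机视觉与深度学习实战 (MATLAB Computer Vision and Deep Learning in Practice)  
222 | https://github.com/decouples/Matlab_deep_learning  
223 | Chapter 19: Simulated traffic-light image control based on speech recognition  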
224 | 
225 | ## pytorch-tutorial  
226 | https://github.com/yunjey/pytorch-tutorial  
227 | 
228 | ## DouZero, an AI for DouDizhu (斗地主)  
229 | https://github.com/kwai/DouZero  
230 | 
231 | ## coqui-ai/TTS, a deep learning toolkit for Text-to-Speech, battle-tested in research and production    
232 | https://github.com/coqui-ai/TTS  
233 | 
234 | ## sounddevice  
235 | https://github.com/spatialaudio/python-sounddevice  
236 | https://gist.github.com/akey7/94ff0b4a4caf70b98f0135c1cd79aff3  
237 | ```
238 | # Use the sounddevice module
239 | # http://python-sounddevice.readthedocs.io/en/0.3.10/
240 | 
241 | import numpy as np
242 | import sounddevice as sd
243 | import time
244 | 
245 | # Samples per second
246 | sps = 44100
247 | 
248 | # Frequency / pitch
249 | freq_hz = 440.0
250 | 
251 | # Duration
252 | duration_s = 5.0
253 | 
254 | # Attenuation so the sound is reasonable
255 | atten = 0.3
256 | 
257 | # NumPy magic to calculate the waveform
258 | each_sample_number = np.arange(duration_s * sps)
259 | waveform = np.sin(2 * np.pi * each_sample_number * freq_hz / sps)
260 | waveform_quiet = waveform * atten
261 | 
262 | # Play the waveform out the speakers
263 | sd.play(waveform_quiet, sps)
264 | time.sleep(duration_s)
265 | sd.stop()
266 | ```
267 | 
268 | ```
269 | import pyaudio
270 | from scipy.io import wavfile
271 | 
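272 | sr, wdata = wavfile.read('house_lo.wav')  # sample rate and a NumPy array of samples
273 | # Take the sample width from the array dtype instead of hard-coding 8-bit; assumes a mono integer-PCM file
274 | p = pyaudio.PyAudio()
275 | stream = p.open(format=p.get_format_from_width(wdata.dtype.itemsize), channels=1, rate=sr, output=True)
276 | stream.write(wdata.tobytes())  # hand PyAudio raw bytes rather than the ndarray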
277 | stream.stop_stream()
278 | stream.close()
279 | p.terminate()
280 | ```
281 | 
282 | ## darts, A python library for easy manipulation and forecasting of time series.  
283 | https://github.com/unit8co/darts  
284 | 
285 | ## Listen, attend and spell Model and a Chinese Mandarin Pretrained model (Mandarin Chinese ASR model)  
286 | https://github.com/jackaduma/LAS_Mandarin_PyTorch  
287 | 


--------------------------------------------------------------------------------
/baidu_asr_rpi_python_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.passerma.com/article/54/
  2 | https://github.com/passerma/voiceAssistant
  3 | 
  4 | Building a voice-recognition assistant on the Raspberry Pi with snowboy and the Baidu speech API
  5 | 2020-04-10 13:01
  6 | Category: Raspberry Pi
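  9 | I. Wake-word module
 10 | 1. Install the required dependencies
 11 | sudo apt-get install python3-pyaudio
 12 | sudo apt-get install swig
 13 | sudo apt-get install libatlas-base-dev
 14 | 2. Install snowboy
 15 | git clone https://gitee.com/passerma/snowboy.git
 16 | cd snowboy/swig/Python3 && make
 17 | 3. Test that the installation succeeded
 18 | Note: in the
 19 | snowboy/examples/Python3
 20 | directory, open
 21 | snowboydecoder.py
 22 | and change line 5 from
 23 | from . import snowboydetect
 24 | to
 25 | import snowboydetect
 26 | Then run:
 27 | 
 28 | cd ~
 29 | cd snowboy/examples/Python3
 30 | python3 demo.py resources/models/snowboy.umdl
 31 | Here resources/models/snowboy.umdl is the wake-word model file; you can replace it later with whatever model you need.
 32 | Once it is running, the following appears:
 33 | 
 34 | Then say the wake word "snowboy". You will hear a beep and see the output INFO:snowboy:Keyword 1 detected at time: 2020-04-10 11:51:22
 35 | which means the environment is set up correctly.
 36 | 
 37 | 4. Change the wake word
 38 | Go to the snowboy website, log in, and choose Create Hotword.
 39 | 
 40 | Then record your voice. I recommend recording beforehand and uploading the recordings; only WAV is supported.
 41 | 
 42 | After testing it once more you can download the result, which is a .pmdl file.
 43 | 
 44 | Replace the earlier resources/models/snowboy.umdl file with it to change the wake word,
 45 | or pass your pmdl file directly at run time. For example, my file is ma.pmdl, so I can run:
 46 | 
 47 | cd ~
 48 | cd snowboy/examples/Python3
 49 | python3 demo.py resources/models/ma.pmdl
 50 | That completes the wake-word module. Next comes recording audio and running speech recognition.
 51 | 
 52 | II. Speech recognition module
 53 | 1. Apply for a Baidu AI Cloud account
 54 | Then open the console, choose the speech module under AI, and create an application. Baidu's free quota for speech recognition is currently only 50,000 calls, so use it sparingly.
 55 | Then click "Manage application" to get your API Key and Secret Key, and note them down for later.
 56 | 
 57 | However, newly created accounts no longer receive a free quota automatically and must enable billing, although a free quota can still be claimed manually.
 58 | Go to the console in the management center, choose speech recognition, and open the overview; the quota can be claimed from the table there.
 59 | 
 60 | 
 61 | 2. Record audio on the Raspberry Pi and upload it to Baidu
 62 | This step is done with a Python program, so straight to the code.
 63 | 
 64 | 1. First obtain a token, which grants access. The file is named fetchToken.py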
 65 | import sys
 66 | import json
 67 | 
 68 | # Stay compatible with both Python 2 and Python 3
 69 | IS_PY3 = sys.version_info.major == 3
 70 | if IS_PY3:
 71 |     from urllib.request import urlopen
 72 |     from urllib.request import Request
 73 |     from urllib.error import URLError
 74 |     from urllib.parse import urlencode
 75 |     from urllib.parse import quote_plus
 76 | else:
 77 |     import urllib2
 78 |     from urllib import quote_plus
 79 |     from urllib2 import urlopen
 80 |     from urllib2 import Request
 81 |     from urllib2 import URLError
 82 |     from urllib import urlencode
 83 | # Replace with your API_KEY
 84 | 
 85 | API_KEY = '***************'
 86 | 
 87 | # Replace with your SECRET_KEY
 88 | SECRET_KEY = '***************'
 89 | 
 90 | TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token'
 91 | 
 92 | def fetch_token():
 93 |     params = {'grant_type': 'client_credentials',
 94 |               'client_id': API_KEY,
 95 |               'client_secret': SECRET_KEY}
 96 |     post_data = urlencode(params)
 97 |     if (IS_PY3):
 98 |         post_data = post_data.encode('utf-8')
 99 |     req = Request(TOKEN_URL, post_data)
100 |     try:
101 |         f = urlopen(req, timeout=5)
102 |         result_str = f.read()
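103 |     except URLError as err:  # HTTPError (a subclass) carries .code and a body; a plain URLError does not
104 |         print('token request failed: ' + str(getattr(err, 'code', err.reason)))
105 |         result_str = err.read() if hasattr(err, 'read') else b'{}'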
106 |     if (IS_PY3):
107 |         result_str = result_str.decode()
108 | 
109 |     result = json.loads(result_str)
110 | 
111 |     if ('access_token' in result.keys() and 'scope' in result.keys()):
112 |         if not 'audio_tts_post' in result['scope'].split(' '):
113 |             print ('please ensure has check the tts ability')
114 |             return ''
115 |         return result['access_token']
116 |     else:
117 |         print ('please overwrite the correct API_KEY and SECRET_KEY')
118 |         return ''
119 | 2. The main program. The file is named snow.py
120 | import snowboydecoder
121 | import signal
122 | import wave
123 | import sys
124 | import json
125 | import requests
126 | import time
127 | import os
128 | import base64
129 | from pyaudio import PyAudio, paInt16
130 | import webbrowser
131 | from fetchToken import fetch_token
132 | import time
133 | 
134 | IS_PY3 = sys.version_info.major == 3
135 | if IS_PY3:
136 |     from urllib.request import urlopen
137 |     from urllib.request import Request
138 |     from urllib.error import URLError
139 |     from urllib.parse import urlencode
140 |     from urllib.parse import quote_plus
141 | else:
142 |     import urllib2
143 |     from urllib import quote_plus
144 |     from urllib2 import urlopen
145 |     from urllib2 import Request
146 |     from urllib2 import URLError
147 |     from urllib import urlencode
148 | 
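149 | interrupted = False # set to True when snowboy should stop listening for the wake word
150 | endSnow = False # program exit flag
151 | 
152 | framerate = 16000  # sample rate (Hz)
153 | num_samples = 2000  # samples per read
154 | channels = 1  # number of channels
155 | sampwidth = 2  # sample width: 2 bytes
156 | 
157 | FILEPATH = './audio/audio.wav' # where the finished recording is saved
158 | music_exit = './audio/exit.wav' # voice prompt played when the assistant exits
159 | music_open = './audio/open.wav' # voice prompt played when the assistant wakes up
160 | os.close(sys.stderr.fileno()) # silence warnings on stderr (note: this hides real errors too)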
161 | 
162 | def signal_handler(signal, frame):
163 |     """
164 |     Handle a keyboard interrupt
165 |     """
166 |     global interrupted
167 |     interrupted = True
168 | 
169 | def interrupt_callback():
170 |     """
171 |     Tell snowboy whether to stop listening
172 |     """
173 |     global interrupted
174 |     return interrupted
175 | 
176 | def detected():
177 |     """
178 |     Called when the wake word is detected
179 |     """
180 |     print('Wake word detected')
181 |     play(music_open)
182 |     global interrupted
183 |     interrupted = True
184 |     detector.terminate()
185 | 
186 | def play(filename):
187 |     """
188 |     Play a WAV file (blocks until playback finishes)
189 |     """
190 |     wf = wave.open(filename, 'rb')  # open the WAV file
191 |     p = PyAudio()                   # create the PyAudio instance
192 |     # open an output stream matching the file's format
193 |     stream = p.open(format=p.get_format_from_width(wf.getsampwidth()),
194 |                     channels=wf.getnchannels(),
195 |                     rate=wf.getframerate(),
196 |                     output=True)
197 |     data = wf.readframes(1024)
198 |     while data != b'':
199 |         stream.write(data)          # write the current chunk first so the first 1024 frames are not skipped
200 |         data = wf.readframes(1024)
201 |     # release the audio resources
202 |     stream.stop_stream()
203 |     stream.close()
204 |     p.terminate()
205 | 
206 | def save_wave_file(filepath, data):
207 |     """
208 |     Save the recorded chunks to a WAV file
209 |     """
210 |     wf = wave.open(filepath, 'wb')
211 |     wf.setnchannels(channels)
212 |     wf.setsampwidth(sampwidth)
213 |     wf.setframerate(framerate)
214 |     wf.writeframes(b''.join(data))
215 |     wf.close()
216 | 
217 | def my_record():
218 |     """
219 |     Record from the microphone
220 |     """
221 |     pa = PyAudio()
222 |     stream = pa.open(format=paInt16, channels=channels,
223 |                      rate=framerate, input=True, frames_per_buffer=num_samples)
224 |     my_buf = []
225 |     # count = 0
226 |     t = time.time()
227 |     print('Recording...')
228 |     while time.time() < t + 4:  # record for 4 seconds
229 |         string_audio_data = stream.read(num_samples)
230 |         my_buf.append(string_audio_data)
231 |     print('Recording finished!')
232 |     save_wave_file(FILEPATH, my_buf)
233 |     stream.close()
234 |     pa.terminate()  # release PortAudio, as play() does
235 | 
236 | def speech2text(speech_data, token, dev_pid=1537):
237 |     """
238 |     Convert audio to text
239 |     """
240 |     FORMAT = 'wav'
241 |     RATE = '16000'
242 |     CHANNEL = 1
243 |     CUID = 'baidu_workshop'
244 |     SPEECH = base64.b64encode(speech_data).decode('utf-8')
245 |     data = {
246 |         'format': FORMAT,
247 |         'rate': RATE,
248 |         'channel': CHANNEL,
249 |         'cuid': CUID,
250 |         'len': len(speech_data),
251 |         'speech': SPEECH,
252 |         'token': token,
253 |         'dev_pid': dev_pid
254 |     }
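255 |     # Speech-to-text endpoint. It may differ depending on which recognition product you use; this article uses the "fast" edition (语音识别极速版)
256 | 
257 |     url = 'https://vop.baidu.com/pro_api'
258 |     headers = {'Content-Type': 'application/json'} # request headers
259 |     print('Recognizing...')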
260 |     r = requests.post(url, json=data, headers=headers)
261 |     Result = r.json()
262 |     if 'result' in Result:
263 |         return Result['result'][0]
264 |     else:
265 |         return Result
266 | 
267 | def get_audio(file):
268 |     """
269 |     Read the audio file
270 |     """
271 |     with open(file, 'rb') as f:
272 |         data = f.read()
273 |     return data
274 | 
275 | def identifyComplete(text):
276 |     """
277 |     Act on the recognized text
278 |     """
279 |     print(text)
280 |     maps = {
281 |         '打开百度': ['打开百度。', '打开百度', '打开百度，', 'baidu']  # variants of "open Baidu"
282 |     }
283 |     if (text == '再见。' or text == '拜拜。'):  # "goodbye" / "bye-bye"
284 |         play(music_exit) # play the exit prompt before shutting down
285 |         exit()
286 |     if text in maps['打开百度']:
287 |         webbrowser.open_new_tab('https://www.baidu.com')
288 |         play('./audio/openbaidu.wav') # play the prompt for a recognized command
289 |     else:
290 |         play('./audio/none.wav') # play the prompt for an unmatched command
291 |     print('Done')
292 | 
293 | if __name__ == "__main__":
294 |     while endSnow == False:
295 |         interrupted = False
296 |         # create the snowboy detector; the first argument is the path to the wake-word model
297 |         detector = snowboydecoder.HotwordDetector('ma.pmdl', sensitivity=0.5)
298 |         print('Waiting for wake word')
299 |         # snowboy listening loop
300 |         detector.start(detected_callback=detected,
301 |                    interrupt_check=interrupt_callback,
302 |                    sleep_time=0.03)
303 |         my_record() # wake word detected: start recording
304 |         TOKEN = fetch_token() # get the token
305 |         speech = get_audio(FILEPATH)
306 |         result = speech2text(speech, TOKEN, 80001)
307 |         if isinstance(result, str):  # a dict here means the API returned an error
308 |             identifyComplete(result.strip('，'))
309 | That is the end of the code.
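
The main loop above calls fetch_token() after every wake-up. A possible refinement, sketched here under assumptions, is to cache the token and fetch a new one only when the cached one is old. The `TokenCache` class, its parameter names, and the 29-day default lifetime are all hypothetical and not part of the original post; the `expires_in` value returned with the token should be treated as the authoritative lifetime.

```python
import time


class TokenCache:
    """Cache a token and refetch it only after it is considered expired.

    Hypothetical helper: `fetch` is any zero-argument callable that returns
    a token string (for example the fetch_token function from fetchToken.py).
    """

    def __init__(self, fetch, lifetime_s=29 * 24 * 3600, clock=time.time):
        self._fetch = fetch            # callable that actually requests a token
        self._lifetime_s = lifetime_s  # assumed lifetime, not taken from the API
        self._clock = clock            # injectable clock, which makes testing easy
        self._token = ''
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        # An empty token (failed fetch) is never treated as cached.
        if not self._token or now >= self._expires_at:
            self._token = self._fetch()
            self._expires_at = now + self._lifetime_s
        return self._token
```

In snow.py this could be used by creating `token_cache = TokenCache(fetch_token)` once before the loop and calling `token_cache.get()` where the loop currently calls `fetch_token()`.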
310 | 
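311 | Next, copy the following files into your own project folder:
312 | 1. The compiled _snowboydetect.so and the snowboydetect.py file from the snowboy/swig/Python3 directory
313 | 2. The snowboydecoder.py file from the snowboy/examples/Python3 directory
314 | 3. The entire snowboy/resources folder
315 | 4. Your two program files, fetchToken.py and snow.py, plus the wake-word model ma.pmdl
316 | 5. The audio folder, which is optional; it holds the voice-feedback files, i.e. the open and exit prompts used by the main program
317 | Your folder should then contain the following files:
318 | 
319 | Then cd into your folder and run python3 snow.py
320 | You should then see the following result:
321 | 
322 | The code has been uploaded to GitHub.
323 | 
324 | That basically completes the Raspberry Pi voice assistant built on snowboy and the Baidu speech API. Further work simply extends this base into a complete voice assistant.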
325 | 
326 | 全部评论 36
327 | d
328 | dddd
329 | 2022-12-07 20:17  回复
330 | 谢谢楼主，按照楼主的方法已经实现了，一些模型要结合自己的实际环境做一些修改才能正常运行
331 | c
332 | clinch
333 | 2021-08-16 16:44  回复
334 | 博主你树莓派用的哪个麦克风来接受声音输出声音的啊
335 | 
336 | passerma
337 | 2021-08-16 21:27  回复
338 | 我用的绿联的USB声卡，然后连耳机当麦克风的
339 | s
340 | smallfive
341 | 2021-05-20 00:02  回复
342 | cd snowboy/swig/Python3 && make编译不成功，提示 /usr/bin/ld: ../..//lib/ubuntu64/libsnowboy-detect.a: error adding symbols: file in wrong format collect2: error: ld returned 1 exit status make: *** [Makefile:73: _snowboydetect.so] Error 1 是不是编译环境不一样？
343 | 
344 | passerma
345 | 2021-05-20 09:31  回复
346 | 应该是的
347 | 
348 | wlw
349 | 2022-03-11 02:00  回复
350 | @passerma 我也出现了这个问题，请问如何处理呢，着急 QQ：1584545874
351 | 
352 | 沐心
353 | 2021-04-20 16:14  回复
354 | 博主，使用百度AI语音合成文章后出现 http://tsn.baidu.com/text2audio?tex=识别错误，怎么办
355 | 
356 | passerma
357 | 2021-04-20 21:45  回复
358 | 必填的字段都填了吗
359 | 
360 | 沐心
361 | 2021-04-21 14:55  回复
362 | 嗯，填写了的，还有Snowboy训练语音模型的网站进不去了，如何训练自己的语音模型呢？
363 | 
364 | passerma
365 | 2021-04-21 20:50  回复
366 | @沐心 目前是没有什么好办法训练模型了
367 | 
368 | feihangfei
369 | 2021-03-27 21:47  回复
370 | 
371 | 为什么运行出来提示“已放弃”
372 | 
373 | passerma
374 | 2021-04-03 01:32  回复
375 | 这个不清楚哎，感觉是python运行时候的问题，我看代码里面没有写输出已放弃的地方
376 | 
377 | laaaity
378 | 2021-05-05 15:23  回复
379 | 语音模型不能使用了 试试换个模型
380 | d
381 | ddd
382 | 2022-12-07 20:16  回复
383 | 唤醒模型要改成你自己的
384 | 
385 | 奇迹盖茨
386 | 2021-03-12 10:38  回复
387 | 
388 | 只有这个提示“Aborted”
389 | 
390 | passerma
391 | 2021-03-12 18:53  回复
392 | 这个问题我也没遇到过，不太清楚，得具体定位
393 | 奇
394 | 奇迹盖茨
395 | 2021-03-11 11:22  回复
396 | 请问下我运行的时候提示“Aborted”
397 | 
398 | passerma
399 | 2021-03-11 21:17  回复
400 | 啥问题，能否具体点
401 | 
402 | fzm_rabbit
403 | 2021-04-17 16:38  回复
404 | 我也是这个问题 请问解决了吗
405 | 
406 | 白衣大鬼
407 | 2020-12-30 22:29  回复
408 | 
409 | 请问一下，TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token' 中的地址需要改成自己的请求地址吗？
410 | 
411 | passerma
412 | 2020-12-30 22:47  回复
413 | TOKEN_URL是用来请求token的，每个人都是一样的，没有token的话你那个语音识别的接口是调不通的
414 | 
415 | 白衣大鬼
416 | 2020-12-30 23:32  回复
417 | @passerma 好的，谢谢您。如果想找更多的.wav的指令反馈语音，是需要自己录制还是有其它渠道能找到？
418 | 
419 | passerma
420 | 2020-12-30 23:36  回复
421 | @白衣大鬼 自己录制就好了
422 | 
423 | 白衣大鬼
424 | 2020-12-30 23:39  回复
425 | @passerma 好的，感谢博主
426 | 
427 | zzf
428 | 2020-09-07 18:10  回复
429 | 这个 play函数播放wav 会和mplayer冲突吗？
430 | 
431 | passerma
432 | 2020-09-07 21:56  回复
433 | 不会，但是这个play函数播放会阻塞程序运行
434 | 
435 | zzf
436 | 2020-08-21 15:00  回复
437 | 后面改了一下 可以了 感谢
438 | 
439 | feihangfei
440 | 2021-03-27 21:33  回复
441 | 你后来改了什么呢？我也是跟你一样的问题，输出已放弃
442 | 
443 | zzf
444 | 2020-08-20 20:12  回复
445 | 运行后 输出以放弃是为什么呢 除了两个key 还有要改的地方吗？
446 | 
447 | passerma
448 | 2020-08-20 21:47  回复
449 | ？？？哪来的输出已放弃
450 | 
451 | rb
452 | 2020-07-21 16:03  回复
453 | 请问博主，我用你的代码出现了{'err_msg': 'request pv too much', 'err_no': 3305, 'sn': '807245906741595318568'}这个错误，我是刚刚创建的帐号为什么会有这个错误呢？求帮助！！
454 | 
455 | passerma
456 | 2020-07-21 22:05  回复
457 | 现在百度的语音识别已经不给免费额度了，得开通付费了
458 | 
459 | rb
460 | 2020-07-21 22:07  回复
461 | @passerma 好吧哈哈，谢谢博主！
462 | 
463 | passerma
464 | 2020-07-21 22:22  回复
465 | @rb 我看了下百度语音识别，现在的免费额度不再自动给了，得去管理中心的语音识别手动领取，就那个概览的表里面
466 | 
467 | rb
468 | 2020-07-21 22:29  回复
469 | @passerma 好的，我看见了，谢谢！！
470 | 
471 | 心静茹氵嫣
472 | 2020-05-30 10:32  回复
473 | 可以用
474 | 


--------------------------------------------------------------------------------
/buildroot_001.md:
--------------------------------------------------------------------------------
1 | * https://bootlin.com  
2 | * https://github.com/bootlin  
3 | 


--------------------------------------------------------------------------------
/car_001.md:
--------------------------------------------------------------------------------
1 | * 舵机转向  
2 | 


--------------------------------------------------------------------------------
/cmsis_nn_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.taterli.com/5376/
  2 | 
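  3 | CMSIS-NN is finally working~
  4 | By TaterLi, May 4, 2019
  5 | Download: CMSIS-NN
  6 | 
  7 | Because inference uses 8-bit fixed-point arithmetic, accuracy is somewhat lower, but it is fast. The images need to be stored on a TF card, or you can write your own driver. I used the STM32F769 DISCO.
  8 | 
  9 | Key code: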
 10 | 
 11 | int8_t NN_OpenReadFile(const char *BmpName)
 12 | {
 13 |   uint32_t size = 0;
 14 |   int32_t h_index = 0;
 15 |   int32_t w_index = 0;
 16 |   int32_t h_total = 0;
 17 |   int32_t w_total = 0;
 18 |   int32_t h_nn = 0;
 19 |   int32_t w_nn = 0;
 20 |   UINT BytesRead;
 21 |   FIL bmpfile;
 22 | 
 23 |   q7_t output_data[10]; /* Output scores for {"Plane", "Car", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck"} */
 24 | 
 25 |   int8_t max_ind = 0;
 26 |   int8_t max_val = -128;
 27 | 
 28 |   BmpHeader *pbmpheader = (BmpHeader *)aBuffer;
 29 | 
 30 |   /* Open the BMP file (the -3 error code is an addition, not in the original post) */
 31 |   if (f_open(&bmpfile, BmpName, FA_READ) != FR_OK)
 32 |   {
 33 |     return -3;
 34 |   }
 35 | 
 36 |   /* Seek to the start of the file */
 37 |   f_lseek(&bmpfile, 0);
 38 | 
 39 |   /* Read the header */
 40 |   f_read(&bmpfile, &aBuffer, BITMAP_HEADER_SIZE, &BytesRead);
 41 | 
 42 |   if (pbmpheader->bpp != 24)
 43 |   {
 44 |     f_close(&bmpfile);
 45 |     return -1; /* Save the image as 24-bit. */
 46 |   }
 47 | 
 48 |   /* Height and width of the source image */
 49 |   h_total = pbmpheader->h;
 50 |   w_total = pbmpheader->w;
 51 | 
 52 |   /* Downscaling step in each direction */
 53 |   h_nn = h_total / 32;
 54 |   w_nn = w_total / 32;
 55 |   if (h_nn == 0 || w_nn == 0)
 56 |   {
 57 |     f_close(&bmpfile);
 58 |     return -4; /* Smaller than 32*32: a zero step would divide by zero (added check). */
 59 |   }
 60 | 
 61 |   /* Bytes to read per row: BMP rows are padded up to a multiple of 4 bytes */
 62 |   size = ((uint32_t)w_total * 3u + 3u) & ~3u;
 63 |   if (size > BITMAP_BUFFER_SIZE)
 64 |   {
 65 |     f_close(&bmpfile);
 66 |     return -2; /* Image too large for the buffer. */
 67 |   }
 68 | 
 69 |   /* Seek to the pixel data, then sample it down into the 32*32 input.
 70 |      The "< 32" guards assume BmpBuffer is 32*32*3 and keep sizes that are
 71 |      not a multiple of 32 from writing past it. */
 72 |   f_lseek(&bmpfile, pbmpheader->offset);
 73 |   for (h_index = 0; h_index < h_total; h_index++)
 74 |   {
 75 |     f_read(&bmpfile, &aBuffer, size, &BytesRead);
 76 |     if (h_index % h_nn == 0 && h_index / h_nn < 32)
 77 |     {
 78 |       for (w_index = 0; w_index < w_total; w_index++)
 79 |       {
 80 |         if (w_index % w_nn == 0 && w_index / w_nn < 32)
 81 |         {
 82 |           /* BMP stores each pixel as B, G, R; the network input is R, G, B */
 83 |           BmpBuffer[h_index / h_nn][w_index / w_nn][2] = aBuffer[3 * w_index];
 84 |           BmpBuffer[h_index / h_nn][w_index / w_nn][1] = aBuffer[3 * w_index + 1];
 85 |           BmpBuffer[h_index / h_nn][w_index / w_nn][0] = aBuffer[3 * w_index + 2];
 86 |         }
 87 |       }
 88 |     }
 89 |   }
 90 | 
 91 |   /* Close the file */
 92 |   f_close(&bmpfile);
 93 | 
 94 |   /* Run the network */
 95 |   run_nn((q7_t *)BmpBuffer, output_data);
 96 |   arm_softmax_q7(output_data, IP1_OUT_DIM, output_data);
 97 | 
 98 |   /* Find the most confident class */
 99 |   for (int i = 0; i < 10; i++)
100 |   {
101 |     if (max_val < output_data[i])
102 |     {
103 |       max_val = output_data[i];
104 |       max_ind = i;
105 |     }
106 |   }
107 | 
108 |   /* Return the most confident class */
109 |   return max_ind;
110 | }
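184 | A non-negative return value is the predicted class: 0 to 9 correspond to "Plane", "Car", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck". A negative value indicates an error. I recommend saving the image as a 24-bit BMP with the Windows 10 Paint tool. The best resolution is 32*32; other sizes are scaled inside the program, but the result is not very good because I simply sample pixels at a fixed interval. A good scaling algorithm should take colour weighting into account.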
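
The two pieces of arithmetic behind the function above, the padded BMP row length and the fixed-interval sampling, can be sketched in Python as follows. This is only an illustration: the function names `bmp_row_stride` and `decimation_indices` are made up here and do not appear in the original project.

```python
def bmp_row_stride(width_px, bits_per_pixel=24):
    """Bytes per stored BMP row: pixel bytes rounded up to a multiple of 4."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return (row_bytes + 3) // 4 * 4


def decimation_indices(total, target=32):
    """Source indices kept when sampling `total` pixels down to `target`.

    Mirrors the fixed-interval sampling described above: with an integer
    step of total // target, keep indices 0, step, 2*step, ...
    """
    step = total // target
    if step == 0:
        raise ValueError("image dimension is smaller than the target size")
    return [i * step for i in range(target)]
```

For a 32-pixel-wide 24-bit image the row is 96 bytes with no padding, while a 33-pixel-wide image needs 99 pixel bytes padded to 100. A 64-pixel dimension keeps every second pixel.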
185 | 
186 | 
187 | 


--------------------------------------------------------------------------------
/cocos2d-x_001.md:
--------------------------------------------------------------------------------
1 | ## FantasyWarrior3D / 幻想战士  
2 | * https://github.com/chukong/SampleGame-FantasyWarrior3D  
3 | 


--------------------------------------------------------------------------------
/csky_001.txt:
--------------------------------------------------------------------------------
 1 | 3月17日 21:16 来自 微博 weibo.com 已编辑
 2 | c-sky（诛仙剑）的板buildroot跑通了，其实很简单：buildroot我用的是官方提供的1.0.1版：
 3 | https://github.com/c-sky/buildroot/releases/tag/v1.0.1
 4 | 。正常编译即可（按照buildroot的习惯做法，指定csky的defconf）；
 5 | 自带的串口转usb，接PC用putty，设置115200波特率，其他默认即可；如果putty串口没输出，按reset按钮；usb插u盘做u盘启动，不过我试过转接到tf卡也是可行的（效果一样）
 6 | 
 7 | (2020) 3月17日 17:14 来自 微博 weibo.com
 8 | 我在网上购买的诛仙剑 C-SKY Linux 开发板到手了，CPU是国产的核GX6605S（NationalChip GX6605S KR8659FBm），
 9 | 架构是CK610M，非ARM架构，只带3个IO口，c-sky官方的buildroot在这里：
10 | https://github.com/c-sky/buildroot
11 | 如果不嫌麻烦的话可以用buildroot官方的，配置文件叫csky_gx6605s_defconfig
12 | https://github.com/buildroot/buildroot/blob/master/configs/csky_gx6605s_defconfig
13 | 我试过用c-sky官方的1.1版，编译出来是一个usb.img镜像文件，估计可以烧录到u盘，实际还没测试过，等回家有时间再试
14 | 
15 | 


--------------------------------------------------------------------------------
/ctc_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.cnblogs.com/qcloud1001/p/9041218.html
  2 | 
  3 | Tencent Cloud+ Community (腾讯云+社区)
  5 | A basic explanation of the CTC algorithm in speech recognition
  7 | 
  8 | Author: 罗冬日 (Luo Dongri)
  9 | 
 10 | 目前主流的语音识别都大致分为特征提取，声学模型，语音模型几个部分。目前结合神经网络的端到端的声学模型训练方法主要CTC和基于Attention两种。
 11 | 
 12 | 本文主要介绍CTC算法的基本概念，可能应用的领域，以及在结合神经网络进行CTC算法的计算细节。
 13 | 
 14 | CTC算法概念
 15 | CTC算法全称叫：Connectionist temporal classification。从字面上理解它是用来解决时序类数据的分类问题。
 16 | 
 17 | 传统的语音识别的声学模型训练，对于每一帧的数据，需要知道对应的label才能进行有效的训练，在训练数据之前需要做语音对齐的预处理。而语音对齐的过程本身就需要进行反复多次的迭代，来确保对齐更准确，这本身就是一个比较耗时的工作。
 18 | 
 19 | 
 20 | 
 21 | 上图是“你好”这句话的声音的波形示意图， 每个红色的框代表一帧数据，传统的方法需要知道每一帧的数据是对应哪个发音音素。比如第1,2,3,4帧对应n的发音，第5,6,7帧对应i的音素，第8,9帧对应h的音素，第10,11帧对应a的音素，第12帧对应o的音素。（这里暂且将每个字母作为一个发音音素）
 22 | 
 23 | 与传统的声学模型训练相比，采用CTC作为损失函数的声学模型训练，是一种完全端到端的声学模型训练，不需要预先对数据做对齐，只需要一个输入序列和一个输出序列即可以训练。这样就不需要对数据对齐和一一标注，并且CTC直接输出序列预测的概率，不需要外部的后处理。
 24 | 
 25 | 既然CTC的方法是关心一个输入序列到一个输出序列的结果，那么它只会关心预测输出的序列是否和真实的序列是否接近（相同），而不会关心预测输出序列中每个结果在时间点上是否和输入的序列正好对齐。
 26 | 
 27 | 
 28 | CTC引入了blank（该帧没有预测值），每个预测的分类对应的一整段语音中的一个spike（尖峰），其他不是尖峰的位置认为是blank。对于一段语音，CTC最后的输出是spike（尖峰）的序列，并不关心每一个音素持续了多长时间。
 29 | 如图2所示，拿前面的nihao的发音为例，进过CTC预测的序列结果在时间上可能会稍微延迟于真实发音对应的时间点，其他时间点都会被标记会blank。
 30 | 这种神经网络+CTC的结构除了可以应用到语音识别的声学模型训练上以外，也可以用到任何一个输入序列到一个输出序列的训练上（要求：输入序列的长度大于输出序列）。
 31 | 比如，OCR识别也可以采用RNN+CTC的模型来做，将包含文字的图片每一列的数据作为一个序列输入给RNN+CTC模型，输出是对应的汉字，因为要好多列才组成一个汉字，所以输入的序列的长度远大于输出序列的长度。而且这种实现方式的OCR识别，也不需要事先准确的检测到文字的位置，只要这个序列中包含这些文字就好了。
 32 | 
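 33 | Training the RNN+CTC Model
 34 | This section describes in detail how an RNN+CTC model is trained for speech recognition, and how it manages to train on sequence data without prior alignment.
 35 | First, CTC is a loss function: it measures how far the network's output for an input sequence is from the true output.
 36 | 
 37 | Suppose the input is 200 frames of audio and the true output has length 5. After the neural network, the result is still a sequence of length 200. For example, two people both say nihao, so the true output for both is the five ordered phonemes n, i, h, a, o. But speakers differ, for instance in speed, so after the network the first speaker's result may be nnnniiiiii...hhhhhaaaaaooo (length 200) and the second's niiiiii...hhhhhaaaaaooo (length 200). Both are correct results, and clearly a great many length-200 sequences correspond to the phoneme order nihao. CTC is the method for computing the loss against the true sequence when many such sequences are possible.
 38 | 
 39 | In detail: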
 40 | 
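 41 | The training set is $S=\{(x^1,z^1),(x^2,z^2),\dots,(x^N,z^N)\}$, i.e. N training samples, where x is an input sample and z is its true output label. Both the input and the label are sequences, and the input sequence is longer than the label sequence.
 42 | 
 43 | For one sample $(x,z)$, $x=(x_1,x_2,x_3,\dots,x_T)$ is T frames of data, each frame a vector of dimension m, i.e. $x_i\in\mathbb{R}^m$. $x_i$ can be understood as the MFCC features of the i-th frame of an utterance that is split into 25 ms frames.
 44 | 
 45 | $z=(z_1,z_2,z_3,\dots,z_U)$ is the correct phoneme sequence for the utterance. For example, for a recording of “你好”, MFCC extraction gives the features x, the text is “你好”, and the phoneme sequence is $z=[n,i,h,a,o]$ (again treating each pinyin letter as one phoneme).
 46 | 
 47 | After the RNN and then a softmax layer, the features x give the phoneme posteriors y. $y^t_k\ (k=1,2,\dots,n;\ t=1,2,\dots,T)$ is the probability that phoneme k is being pronounced at time t, where n is the number of phoneme classes. Within one frame the probabilities of all phonemes sum to 1:
 48 | 
 49 | $\sum_{k=1}^{n} y^t_k = 1,\quad y^t_k \ge 0$
 50 | 
 51 | This process can be viewed as a transformation of the input features, $N_w:(\mathbb{R}^m)^T\to(\mathbb{R}^n)^T$, where $N_w$ is the RNN and w is its set of parameters.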
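
As a minimal sketch of the per-frame normalization described above, the following pure-Python snippet applies a softmax to made-up network scores and checks that each frame's posteriors sum to 1. The `logits` values and the three-phoneme alphabet are illustrative assumptions, not taken from the article.

```python
import math

def softmax(scores):
    """Turn one frame's raw scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs: T = 4 frames, n = 3 phonemes.
logits = [
    [2.0, 0.5, -1.0],
    [0.1, 0.2, 0.3],
    [-3.0, 4.0, 0.0],
    [1.0, 1.0, 1.0],
]

# y[t][k] plays the role of y^t_k: probability of phoneme k at frame t.
y = [softmax(frame) for frame in logits]

for t, frame in enumerate(y, start=1):
    print(t, [round(p, 3) for p in frame], "sum =", round(sum(frame), 6))
```

Each row of `y` here is one frame; the article's matrix stores frames as columns, which is only a difference in layout.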
 52 | 
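 53 | The process is shown in the figure in the original post.
 54 | 
 55 | 
 56 | Take an utterance of “你好” as an example. MFCC feature extraction produces 30 frames with 12 features each, i.e. $x\in\mathbb{R}^{30\times 12}$, and the posterior matrix is $y\in\mathbb{R}^{14\times 30}$ (14 phonemes are used here for illustration; in practice there are about 200), in which every column sums to 1. The CTC-loss training described below is computed from the posteriors y.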
 57 | 
 58 | Paths π and the B Transform
 59 | In practice the phoneme of each frame is unknown, which makes training difficult. Consider first a simpler case in which the per-frame phoneme labels $z'$ are known, so that the training sample is x and $z'$, where $z'$ is no longer the simple label [n,i,h,a,o] but:
 60 | 
 61 | $z'=[\underbrace{n,n,\dots,n}_{T_1},\underbrace{i,i,\dots,i}_{T_2},\underbrace{h,h,\dots,h}_{T_3},\underbrace{a,a,\dots,a}_{T_4},\underbrace{o,o,\dots,o}_{T_5}]$
 62 | 
 63 | $T_1+T_2+T_3+T_4+T_5=T$
 64 | In our example, $z'=[n,n,n,n,n,n,n,i,i,i,i,i,i,h,h,h,h,h,h,h,a,a,a,a,a,a,o,o,o,o,o,o,o]$, and $z'$ contains the label of every frame. In this case: $p(z'|x) = p(z'| y = N_w(x)) = y^1_{z'_1}y^2_{z'_2}y^3_{z'_3}\cdots y^T_{z'_T}$  (1)
 65 | 
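 66 | This value is the product of the entries circled in black in the posterior-probability figure. We want the product to be as large as possible, so the optimization problem can be written as:
 67 | 
 68 | $\min_w\ -\log\left(y^1_{z'_1}\cdot y^2_{z'_2}\cdot y^3_{z'_3}\cdots y^T_{z'_T}\right)$  (2)
 69 | 
 70 | subject to: $y=N_w(x)$  (3)
 71 | 
 72 | The partial derivative of the objective with respect to each element $y^t_k$ of the posterior matrix y is:
 73 | $\frac{\partial\left(-\log\left(y^1_{z'_1}\cdot y^2_{z'_2}\cdot y^3_{z'_3}\cdots y^T_{z'_T}\right)\right)}{\partial y^t_k}=\begin{cases} -\frac{y^1_{z'_1}\cdots y^{i-1}_{z'_{i-1}}\cdot y^{i+1}_{z'_{i+1}}\cdots y^T_{z'_T}}{y^1_{z'_1}\cdot y^2_{z'_2}\cdot y^3_{z'_3}\cdots y^T_{z'_T}} = -\frac{1}{y^i_{z'_i}}, & \text{if } k = z'_i \text{ and } t=i \\ 0, & \text{otherwise}\end{cases}$
 74 | 
 75 | That is, at each time t (one column of the matrix) the objective depends only on $y^t_{z'_t}$; in this example, only on the boxed elements.
 76 | 
 77 | Here $N_w$ can be viewed as the RNN model. If every frame of the training data were labeled with the correct phoneme, training would be straightforward, but such labeled data is very scarce while data without frame-level labels is plentiful. CTC makes it possible to train on data that is not labeled frame by frame.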
 78 | 
 79 | First define some notation:
 80 | $L=\{a,o,e,i,u,\check{u},b,p,m,\dots\}$
 81 | 
 82 | is the set of all phonemes.
 83 | 
 84 | $\pi=(\pi_1,\pi_2,\pi_3,\dots,\pi_T),\ \pi_i\in L$
 85 | 
 86 | is a path of length T made of elements of L. $z'$, for instance, is a path. Some example paths:
 87 | 
 88 | $\pi^1=(j,j,i,n,y,y,e,e,w,w,u,u,u,r,r,e,e,n,n,r,r,u,u,sh,sh,u,u,i,i)$
 89 | $\pi^2=(n,n,n,n,i,i,i,i,h,h,h,h,a,a,a,a,a,a,a,a,a,o,o,o,o,o,o,o,o,o)$
 90 | $\pi^3=(h,h,h,h,h,h,a,a,a,a,a,a,a,o,o,o,o,n,n,n,n,n,n,i,i,i,i,i,i,i)$
 91 | $\pi^4=(n,i,h,a,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o)$
 92 | $\pi^5=(n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,n,i,h,a,o)$
 93 | $\pi^6=(n,n,n,i,i,i,h,h,h,h,h,a,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o)$
 94 | 
 95 | Of these 6 paths, $\pi^1$ can be read as “今夜无人入睡” (jin ye wu ren ru shui), $\pi^2$ as “你好” (nihao), $\pi^3$ as “好你” (haoni), and $\pi^4$, $\pi^5$ and $\pi^6$ can all be read as “你好”.
 96 | 
 97 | Define the B transform as a simple compression that merges repeated symbols, for example:
 98 | $B(a,a,a,b,b,b,c,c,d)=(a,b,c,d)$
 99 | 
100 | For the 6 paths above:
101 | $B(\pi^1)=(j,i,n,y,e,w,u,r,e,n,r,u,sh,u,i)$
102 | $B(\pi^2)=(n,i,h,a,o)$
103 | $B(\pi^3)=(h,a,o,n,i)$
104 | $B(\pi^4)=(n,i,h,a,o)$
105 | $B(\pi^5)=(n,i,h,a,o)$
106 | $B(\pi^6)=(n,i,h,a,o)$
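
The compression B above can be sketched in a few lines of Python. This follows the simplified definition used in this article (merge runs of equal symbols only); full CTC additionally removes blank symbols after merging, which is not modeled here. The function name `collapse` is just an illustrative choice.

```python
from itertools import groupby

def collapse(path):
    """B transform as defined in this article: merge runs of equal symbols."""
    return [symbol for symbol, _run in groupby(path)]

# Two of the example paths, written out frame by frame.
pi3 = list("hhhhhh" + "aaaaaaa" + "oooo" + "nnnnnn" + "iiiiiii")
pi5 = ["n"] * 24 + ["i", "h", "a", "o"]

print(collapse(list("aaabbbccd")))   # ['a', 'b', 'c', 'd']
print(collapse(pi3))                 # ['h', 'a', 'o', 'n', 'i']
print(collapse(pi5))                 # ['n', 'i', 'h', 'a', 'o']
```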
107 | 
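108 | So if a path π satisfies $B(\pi)=(n,i,h,a,o)$, π can be considered an utterance of “你好”, even if, like the fourth example path, it contains many “o” frames and very few of the other phonemes. The probability of a path $\pi=(\pi_1,\pi_2,\dots,\pi_T)$ is the product of the elements of the matrix y that it passes through:
109 | 
110 | $p(\pi|x)=p(\pi|y=N_w(x))=p(\pi|y)=\prod_{t=1}^{T}y^t_{\pi_t}$
111 | 
112 | Therefore, without alignment, the objective should be the sum of the probabilities of all elements of $\{\pi \mid B(\pi)=z\}$, i.e.:
113 | $\max_w\ p(z|y=N_w(x))=p(z|x)=\sum_{B(\pi)=z}p(\pi|x)$  (4)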
114 | 
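115 | With T=30 and phonemes [n,i,h,a,o], there are $C_{29}^{4}=23751$ paths that compress to [n,i,h,a,o]: the 30 frames are cut into 5 non-empty runs by choosing 4 of the 29 gaps between frames. In general the number of paths is $C_{T-1}^{U-1}$ for U phonemes, which grows roughly like $(T-1)^{U-1}$. For a 30-second utterance containing 50 Chinese characters the number of possible paths easily exceeds $10^8$, far too many to compute directly. CTC therefore borrows the forward-backward algorithm from HMMs.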
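
The path count can be checked by brute force on a small case. The sketch below assumes the merge-only B transform used in this article (no blank symbol); under that assumption, a path that compresses to U distinct phonemes is fixed by choosing the U-1 run boundaries among the T-1 gaps between frames, which gives `comb(T - 1, U - 1)`. The function names are illustrative.

```python
from itertools import groupby, product
from math import comb

def count_paths_brute_force(labels, T):
    """Count length-T paths over the label alphabet that collapse to `labels`."""
    alphabet = sorted(set(labels))
    target = list(labels)
    return sum(
        1
        for path in product(alphabet, repeat=T)
        if [s for s, _ in groupby(path)] == target
    )

def count_paths_closed_form(U, T):
    """Choose U-1 run boundaries among the T-1 gaps between frames."""
    return comb(T - 1, U - 1)

# Small case that can still be enumerated: 3 labels, 8 frames (3**8 = 6561 paths).
small = count_paths_brute_force("nih", 8)
print(small, count_paths_closed_form(3, 8))   # 21 21

# The article's example: 5 phonemes over T = 30 frames.
print(count_paths_closed_form(5, 30))         # 23751
```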
116 | 
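117 | Training Procedure
118 | CTC training adjusts w via $\frac{\partial p(z|x)}{\partial w}$ so as to maximize the objective in (4). The computation proceeds as follows:
119 | 
120 | 
121 | So once $\frac{\partial p(z|x)}{\partial y^t_k}$ is known, backpropagation yields $\frac{\partial p(z|x)}{\partial w}$. The “你好” example below shows how to compute this value.
122 | 
123 | First, following the earlier example, find all paths that compress to $z=[n,i,h,a,o]$, denoted $\{\pi \mid B(\pi)=z\}$. Every such π has the form [n,n,...,n,i,...,i,h,...,h,a,...,a,o,...,o], so the objective depends only on the 5 rows of the posterior matrix y for n, i, h, a and o. For convenience we extract these 5 rows, as shown in the figure in the original post.
124 | 
125 | 
126 | At every point a path can only move right or down. Draw two paths, q and r, both passing through the point $y^{14}_h$, meaning that both are pronouncing “h” at frame 14. Some terms in the sum of objective (4) do not involve $y^{14}_h$ and can be dropped, leaving only the part that does, denoted $\{\pi \mid B(\pi)=z,\ \pi_{14}=h\}$.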
127 | 
128 | $\frac{\partial p(z|y)}{\partial y^{14}_h}$
129 | $= \frac{\partial \sum_{B(\pi)=z}p(\pi|y)}{\partial y^{14}_h}$
130 | $= \frac{\partial \sum_{B(\pi)=z}\prod_{t=1}^T y^t_{\pi_t}}{\partial y^{14}_h} =\frac{\partial\left(\overbrace{\sum_{B(\pi)=z,\pi_{14}=h}\prod_{t=1}^T y^t_{\pi_t}}^{\text{terms involving } y^{14}_h} + \overbrace{\sum_{B(\pi)=z,\pi_{14} \neq h}\prod_{t=1}^T y^t_{\pi_t}}^{\text{terms not involving } y^{14}_h}\right)}{\partial y^{14}_h}$
131 | 
132 | $= \frac{\partial \sum_{B(\pi)=z,\pi_{14}=h}\prod_{t=1}^T y^t_{\pi_t}}{\partial y^{14}_h}$
133 | 
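134 | Here q and r are two paths that involve $y^{14}_h$. Let $q_{1:13}$ and $q_{15:30}$ be the parts of q before and after $y^{14}_h$, and likewise $r_{1:13}$ and $r_{15:30}$ for r. Then $q_{1:13}+h+r_{15:30}$ and $r_{1:13}+h+q_{15:30}$ are also valid paths. The probabilities of the four paths $q_{1:13}+h+r_{15:30}$, $r_{1:13}+h+q_{15:30}$, q and r sum to:
135 | $\underbrace{y^1_{q_1}\cdots y^{13}_{q_{13}}\cdot y^{14}_h\cdot y^{15}_{q_{15}}\cdots y^{30}_{q_{30}}}_{\text{probability of path } q}$
136 | 
137 | $+\underbrace{y^1_{q_1}\cdots y^{13}_{q_{13}}\cdot y^{14}_h\cdot y^{15}_{r_{15}}\cdots y^{30}_{r_{30}}}_{\text{probability of path } q_{1:13}+h+r_{15:30}}$
138 | 
139 | $+\underbrace{y^1_{r_1}\cdots y^{13}_{r_{13}}\cdot y^{14}_h\cdot y^{15}_{q_{15}}\cdots y^{30}_{q_{30}}}_{\text{probability of path } r_{1:13}+h+q_{15:30}}$
140 | 
141 | $+\underbrace{y^1_{r_1}\cdots y^{13}_{r_{13}}\cdot y^{14}_h\cdot y^{15}_{r_{15}}\cdots y^{30}_{r_{30}}}_{\text{probability of path } r}$
142 | 
143 | $=(y^1_{q_1}\cdots y^{13}_{q_{13}}+y^1_{r_1}\cdots y^{13}_{r_{13}})\cdot y^{14}_h\cdot(y^{15}_{q_{15}}\cdots y^{30}_{q_{30}}+y^{15}_{r_{15}}\cdots y^{30}_{r_{30}})$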
144 | 
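145 | This value has the form (prefix term)·$y^{14}_h$·(suffix term). Hence, over all paths through $y^{14}_h$:
146 | 
147 | $\sum_{B(\pi)=z,\pi_{14}=h}\prod_{t=1}^{T}y^t_{\pi_t}=(\text{prefix term})\cdot y^{14}_h\cdot(\text{suffix term})$
148 | 
149 | Define:
150 | $\alpha_{14}(h)=(\text{prefix term})\cdot y^{14}_h = \sum_{B(\pi_{1:14})=[n,i,h]}\prod_{t'=1}^{14} y^{t'}_{\pi_{t'}}$
151 | 
152 | This value is the sum of the probabilities of all forward paths from the start up to $y^{14}_h$. Moreover, $\alpha_{14}(h)$ can be computed recursively from $\alpha_{13}(h)$ and $\alpha_{13}(i)$:
153 | $\alpha_{14}(h)=(\alpha_{13}(h)+\alpha_{13}(i))\,y^{14}_h$
154 | 
155 | The recursion says that “h” can be pronounced at t=14 only if the pronunciation at t=13 was “h” or “i”. So the total forward probability $\alpha_{14}(h)$ of pronouncing “h” at t=14 equals the forward probability $\alpha_{13}(h)$ of “h” at t=13 plus the forward probability $\alpha_{13}(i)$ of “i”, multiplied by the probability $y^{14}_h$ that the current frame is “h”. Hence every $\alpha_t(s)$ is obtained from the two values $\alpha_{t-1}(s)$ and $\alpha_{t-1}(s-1)$. The original post illustrates the α recursion with a figure.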
156 | 
157 | 
158 | 
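
The α recursion can be sketched as a small dynamic program and compared against direct enumeration. This is a minimal illustration under the article's simplified setting (no blank symbol, all phonemes in z distinct); the posteriors are random made-up numbers, and the names `forward` and `brute_force` are illustrative.

```python
from itertools import groupby, product
from math import prod
import random

def forward(y, z):
    """alpha[t][s]: total probability of all path prefixes of length t+1 that
    collapse to z[:s+1]; y[t] maps each phoneme to its probability at frame t."""
    T, U = len(y), len(z)
    alpha = [[0.0] * U for _ in range(T)]
    alpha[0][0] = y[0][z[0]]                 # a path must start on the first label
    for t in range(1, T):
        for s in range(U):
            stay = alpha[t - 1][s]
            advance = alpha[t - 1][s - 1] if s > 0 else 0.0
            alpha[t][s] = (stay + advance) * y[t][z[s]]
    return alpha

def brute_force(y, z):
    """Sum p(pi|y) over every path pi with B(pi) == z (only feasible for tiny T)."""
    symbols = sorted(set(z))
    return sum(
        prod(y[t][sym] for t, sym in enumerate(path))
        for path in product(symbols, repeat=len(y))
        if [s for s, _ in groupby(path)] == list(z)
    )

random.seed(0)
z = ["n", "i", "h"]
T = 7
# Hypothetical posteriors; each frame is normalized over the three phonemes.
y = []
for _ in range(T):
    raw = [random.random() for _ in z]
    y.append({k: r / sum(raw) for k, r in zip(z, raw)})

alpha = forward(y, z)
p_forward = alpha[-1][-1]        # valid paths must end on the last label
p_brute = brute_force(y, z)
print(p_forward, p_brute)        # the two values agree up to rounding
```

The dynamic program touches T·U table entries, while the enumeration visits every one of the 3^7 candidate paths, which is why only the former scales to real utterances.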
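159 | That is, each value is obtained from one or two values at the previous time step, and the total cost is about 2·T·(number of phonemes). Similarly, define $\beta_t(s)$, with the recursion:
160 | 
161 | $\beta_{14}(h)=(\beta_{15}(h)+\beta_{15}(a))\,y^{14}_h$
162 | 
163 | Therefore:
164 | 
165 | $\sum_{B(\pi)=z,\pi_{14}=h}\prod_{t=1}^{T}y^t_{\pi_t}=(\text{prefix term})\cdot y^{14}_h\cdot(\text{suffix term})$
166 | 
167 | $=\frac{\alpha_{14}(h)}{y^{14}_h}\cdot y^{14}_h\cdot\frac{\beta_{14}(h)}{y^{14}_h}$
168 | 
169 | $=\frac{\alpha_{14}(h)\,\beta_{14}(h)}{y^{14}_h}$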
170 | 
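171 | Then:
172 | 
173 | $\frac{\partial p(z|y)}{\partial y^{14}_h}$
174 | $= \frac{\partial \sum_{B(\pi)=z,\pi_{14}=h}\prod_{t=1}^{T}y^t_{\pi_t}}{\partial y^{14}_h}$
175 | 
176 | $= \frac{\partial\left(\frac{\alpha_{14}(h)}{y^{14}_h}\cdot y^{14}_h\cdot\frac{\beta_{14}(h)}{y^{14}_h}\right)}{\partial y^{14}_h}$
177 | 
178 | $=\frac{\alpha_{14}(h)\,\beta_{14}(h)}{(y^{14}_h)^2}$
179 | 
180 | With this value, training can proceed by backpropagation.
181 | The total cost is very small: computing α and β each takes about 2·T·(number of phonemes) operations (one addition and one multiplication per entry), and given α and β, the partial derivatives with respect to every $y^t_k$ take about 3·T·(number of phonemes). The total is therefore about 7·T·(number of phonemes), which is cheap to compute.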
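
The gradient formula $\alpha_t(s)\beta_t(s)/(y^t_k)^2$ can be checked numerically. The sketch below assumes the article's convention that both α and β include the factor $y^t_k$ of the current frame, uses the merge-only B transform with distinct phonemes, and compares the analytic value with a central finite difference. The scores are random made-up numbers and are deliberately left unnormalized, since each $y^t_k$ is treated as an independent variable when differentiating.

```python
import random

def forward_backward(y, z):
    """Return (alpha, beta); both include the factor y[t][z[s]] at their own frame."""
    T, U = len(y), len(z)
    alpha = [[0.0] * U for _ in range(T)]
    beta = [[0.0] * U for _ in range(T)]
    alpha[0][0] = y[0][z[0]]                      # start on the first label
    beta[T - 1][U - 1] = y[T - 1][z[U - 1]]       # end on the last label
    for t in range(1, T):
        for s in range(U):
            prev = alpha[t - 1][s] + (alpha[t - 1][s - 1] if s > 0 else 0.0)
            alpha[t][s] = prev * y[t][z[s]]
    for t in range(T - 2, -1, -1):
        for s in range(U):
            nxt = beta[t + 1][s] + (beta[t + 1][s + 1] if s < U - 1 else 0.0)
            beta[t][s] = nxt * y[t][z[s]]
    return alpha, beta

def likelihood(y, z):
    """p(z|y): forward probability of ending on the last label at the last frame."""
    alpha, _ = forward_backward(y, z)
    return alpha[-1][-1]

def gradient(y, z, t, s):
    """d p(z|y) / d y[t][z[s]] = alpha * beta / y**2 (labels in z assumed distinct)."""
    alpha, beta = forward_backward(y, z)
    return alpha[t][s] * beta[t][s] / y[t][z[s]] ** 2

random.seed(1)
z = ["n", "i", "h", "a", "o"]
T = 12
y = [{k: random.uniform(0.1, 1.0) for k in z} for _ in range(T)]

t, s = 5, 2                      # frame index 5, label "h"
analytic = gradient(y, z, t, s)

# Central finite difference on the single entry y[t]["h"].
eps = 1e-6
y_plus = [dict(frame) for frame in y]
y_minus = [dict(frame) for frame in y]
y_plus[t][z[s]] += eps
y_minus[t][z[s]] -= eps
numeric = (likelihood(y_plus, z) - likelihood(y_minus, z)) / (2 * eps)
print(analytic, numeric)         # the two values agree closely
```

A useful side check is that, for any fixed frame t, summing $\alpha_t(s)\beta_t(s)/y^t_{z_s}$ over s recovers p(z|y), because every valid path passes through exactly one label at that frame.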
182 | 
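183 | Deep learning algorithms are now used at scale in Tencent Cloud's speech recognition products. Tencent Cloud describes its speech recognition technology as industry-leading: built on a very large body of speech data, it has accumulated hundreds of thousands of hours of labeled speech and uses LSTM, CNN, LFMMI, CTC and other modeling techniques, combined with language models trained on very large corpora, to reach over 97% accuracy on standard Mandarin. Its speech technology covers speech recognition, speech synthesis, keyword search, silence detection, speaking-rate detection and emotion recognition. It also offers customized recognition for dozens of verticals such as gaming, entertainment and government services, for scenarios such as call-center quality inspection, dictation, real-time recognition and live-stream captioning. To try the product, see cloud.tencent.com/product/asr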
184 | 
191 | 
192 | Published on Tencent Cloud+ Community with the author's permission. Original link: https://cloud.tencent.com/developer/article/1122128?fromSource=waitui
193 | 
194 | 
196 | 
197 | 


--------------------------------------------------------------------------------
/deepspeech_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7
  2 | 
  3 | Offline Speech Recognition on Raspberry Pi 4 with Respeaker
  4 | Faster than real-time! Based on Mozilla's DeepSpeech Engine 0.7.*
  5 | 
  6 | Intermediate
  7 | Protip
  8 | 1 hour
 11 | Things used in this project
 12 | Hardware components
 13 | Seeed ReSpeaker USB Mic Array × 1
 15 | Raspberry Pi 4 Computer Model B 1GB × 1
 17 | NVIDIA Jetson Nano Developer Kit × 1
 19 | Story
 20 | 
 21 | 
 22 | 
 23 | Life is short, but system resources are limited. Hm...
 24 | UPDATE June 2020: Updated the commands for DeepSpeech 0.7.*. Screenshots, except for Raspberry Pi 4, stayed the same. The benchmark table also hasn't changed, since I didn't notice any inference speed gain. But there seems to be an accuracy improvement: the 2830-3980-0043.wav audio file, which was previously transcribed as "experience proofless", is now transcribed as "experience proves this", which makes much more sense. The archived version of the article is still available on steemit. I also updated mic_streaming.py, the mic transcription with hotword script. Enjoy!
 25 | 
 26 | In this article we’re going to run and benchmark Mozilla’s DeepSpeech ASR (automatic speech recognition) engine on different platforms, such as Raspberry Pi 4 (1 GB), Nvidia Jetson Nano, Windows PC and Linux PC.
 27 | 
 28 | 2019, last year, was the year when Edge AI became mainstream. Multiple companies released boards and chips for fast inference on the edge, and a plethora of optimization frameworks and models appeared. To date, my articles and videos have mostly focused on machine learning for computer vision, but I was always interested in running a deep learning based ASR project on an embedded device. The problem until recently was the lack of a simple, fast and accurate engine for that task. When I was researching this topic about a year ago, the few choices for running ASR (not just hot-word detection, but large vocabulary transcription) on, say, Raspberry Pi 3 were:
 29 | 
 30 | CMUSphinx
 31 | Kaldi
 32 | Jasper
 33 | Links:
 34 | 
 35 | Python 3 Artificial Intelligence: Offline STT and TTS
 36 | 
 37 | The Best Voice Recognition Software for Raspberry Pi
 38 | 
 39 | And a couple of other ones. None of them were easy to set up or particularly suitable for running in a resource-constrained environment. So, a few weeks ago, I started looking into this area again and after some searching stumbled upon Mozilla’s DeepSpeech engine. It has been around for a while, but only recently (December 2019) they released version 0.6.0 of their ASR engine, which comes with a .tflite model among other significant improvements. It reduced the size of the English model from 188 MB to 47 MB. “DeepSpeech v0.6 with TensorFlow Lite runs faster than real time on a single core of a Raspberry Pi 4.”, claimed Reuben Morais from Mozilla in the news announcement. So I decided to verify that claim myself, run some benchmarks on different hardware and make my own audio transcription application with hotword detection. Let’s see what the results are.
 40 | 
 41 | Hint: I wasn’t disappointed.
 42 | 
 43 | Installation
 44 | Raspberry Pi 4/3B
 45 | 
 46 | The pre-built wheel package for the armv7 architecture uses the .tflite model by default, and installing it is as easy as:
 47 | 
 48 | pip3 install deepspeech
 49 | That's really it! The package is self-contained, with no TensorFlow installation needed. The only external dependency is NumPy. You’ll need to download the model separately; we’ll cover that in the next section.
 50 | 
 51 | Nvidia Jetson Nano
 52 | 
 53 | The latest version of DeepSpeech, 0.7.3 has pre-built binaries for aarch64 architecture, which have model type by default set to .tflite. Unfortunately, the wheel is only available for python 3.7 and NVIDIA's latest Jetpack 4.4 still comes with Python 3.6.9 as default python3... I don't know why and neither do the maintainers of DeepSpeech. Which means we'll have to install Python3.7 first, then install a couple of more dependencies and only then install DeepSpeech.
 54 | 
 55 | sudo apt-get install python3.7 python3.7-dev
 56 | python3.7 -m pip install cython
 57 | wget https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-cp37-cp37m-linux_aarch64.whl
 58 | python3.7 -m pip install deepspeech-0.7.3-cp37-cp37m-linux_aarch64.whl
 59 | Windows 10/Linux
 60 | 
 61 | For Windows and Linux you’ll need to install the .tflite-enabled version of the pip package.
 62 | 
 63 | pip3 install deepspeech-tflite
 64 | If you’re using Python 3.8, you’re likely to encounter a DLL loading error on Windows. It can be corrected fairly simply with a small change to the DeepSpeech package code, but I suggest you just install the version for Python 3.7, which works flawlessly.
 65 | 
 66 | If you have an NVIDIA GPU and CUDA 10 installed, you can opt for the GPU-enabled version of DeepSpeech:
 67 | 
 68 | pip3 install deepspeech-gpu
 69 | Benchmarking
 70 | Let’s download the models, language model binary and some audio samples.
 71 | 
 72 | curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.1/deepspeech-0.7.1-models.tar.gz
 73 | tar xvf deepspeech-0.7.1-models.tar.gz
 74 | Download example audio files
 75 | 
 76 | curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.7.1/audio-0.7.1.tar.gz
 77 | tar xvf audio-0.7.1.tar.gz
 78 | Raspberry Pi 4 run:
 79 | 
 80 | deepspeech --model deepspeech-0.7.*-models.tflite --scorer deepspeech-0.7.*-models.scorer --audio audio/2830-3980-0043.wav
 81 | If successful you should see the following output
 82 | 
 83 | Not bad! 1.529 seconds for a 1.975-second sound file. It IS faster than real time.
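
"Faster than real time" can be made concrete as a real-time factor (processing time divided by audio length). A minimal sketch using the two figures quoted above; the function name is illustrative:

```python
def real_time_factor(inference_seconds, audio_seconds):
    """Processing time divided by audio length; below 1.0 means faster than real time."""
    return inference_seconds / audio_seconds

# Raspberry Pi 4 figures quoted above.
rtf = real_time_factor(1.529, 1.975)
print(f"RTF = {rtf:.3f}")                    # RTF = 0.774
print("faster than real time:", rtf < 1.0)   # faster than real time: True
```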
 84 | 
 85 | Nvidia Jetson Nano run:
 86 | 
 87 | deepspeech --model deepspeech-0.7.*-models.tflite --scorer deepspeech-0.7.*-models.scorer --audio audio/2830-3980-0043.wav
 88 | Hm.. a little bit slower than Raspberry Pi. That is expected, since the Nvidia Jetson Nano CPU is less powerful than the one in Raspberry Pi 4. There are no pre-built binaries for the arm64 architecture with GPU support as of this moment, so we cannot take advantage of the Jetson Nano’s GPU for inference acceleration. I don’t think this task is on the DeepSpeech team's roadmap, so in the near future I’ll do some research here myself and try to compile that binary to see what speed gains can be achieved from using the GPU. But that is still a pretty decent speed, and depending on your project you might want to run DeepSpeech on the CPU and keep the GPU for other deep learning tasks.
 89 | 
 90 | Windows 10/Linux
 91 | 
 92 | deepspeech --model deepspeech-0.7.*-models.tflite --scorer deepspeech-0.7.*-models.scorer --audio audio/2830-3980-0043.wav
 93 | Or if using GPU enabled version:
 94 | 
 95 | deepspeech --model deepspeech-0.7.*-models.pbmm --scorer deepspeech-0.7.*-models.scorer --audio audio/2830-3980-0043.wav
 96 | As you can see, the .tflite model runs faster than real time on modern CPU systems, which is great news for people creating offline ASR applications.
 97 | 
 98 | Here is the comparison table:
 99 | 
100 | Well, we did benchmarking with pre-recorded sound samples, but we really want to do some real time transcribing. Let’s do that!
101 | 
102 | Download DeepSpeech examples from https://github.com/mozilla/DeepSpeech-examples
103 | 
104 | Navigate to mic_vad_streaming and install the dependencies with
105 | 
106 | pip3 install -r requirements.txt
107 | sudo apt install portaudio19-dev
108 | Connect the microphone to your system (I am using Raspberry Pi 4 1 GB). Although you can use any microphone, including your laptop’s built-in one, the quality of the sound influences the results a lot. For this demo, I am using the ReSpeaker USB Mic Array from Seeed Studio. It supports far-field voice pick-up up to 5 m and a 360° pick-up pattern, with the following acoustic algorithms implemented: DOA (Direction of Arrival), AEC (Acoustic Echo Cancellation), AGC (Automatic Gain Control), NS (Noise Suppression).
109 | 
110 | python3 ../DeepSpeech-examples/mic_vad_streaming/mic_vad_streaming.py --model deepspeech-0.7.*-models.tflite --scorer deepspeech-0.7.*-models.scorer -v 3
111 | Execute this command from the folder with the models. The -v argument lets you tweak the VAD (voice activity detection) threshold. Here is the result of the demo.
112 | 
113 | 
114 | Okay, great! Can we improve on that? Yes. We really don’t want our device to be transcribing the conversations all the time. Talk about privacy nightmares and wasted electricity.
115 | 
116 | So we will want to implement so-called wake-up word detection. DeepSpeech is a general-purpose ASR engine, and for the wake-up word we need something more lightweight and more accurate on short voice commands. I tried two frameworks for hotword detection on Raspberry Pi: Snowboy and Porcupine. The first one ran successfully, but only supported Python 2… A closer look at the Snowboy GitHub repo shows that it is probably not under active development now. Porcupine worked great and it is free for non-commercial applications. So, I wrote a little script that runs wake-up word detection and, once the word is detected, starts transcribing speech with DeepSpeech ASR. It stops transcribing when the “stop transcribing” keyword is recognized in the transcription. After that it goes back to waiting for the wake-up word.
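
The control flow described above (wait for the wake-up word, transcribe, return to waiting on “stop transcribing”) can be sketched as a two-state loop. This is only an illustration of the logic, not the author's script: `detect_wake_word` and `transcribe` are hypothetical callables standing in for Porcupine and DeepSpeech, and the "audio" is faked with plain strings.

```python
def run_assistant(chunks, detect_wake_word, transcribe, stop_phrase="stop transcribing"):
    """Return the list of transcripts produced while the assistant was awake.

    chunks           -- iterable of audio chunks (any objects)
    detect_wake_word -- hypothetical callable: chunk -> bool
    transcribe       -- hypothetical callable: chunk -> str
    """
    listening = False          # False: waiting for wake word, True: transcribing
    transcripts = []
    for chunk in chunks:
        if not listening:
            # Only the cheap detector runs here; the ASR engine stays idle.
            listening = detect_wake_word(chunk)
            continue
        text = transcribe(chunk)
        if stop_phrase in text.lower():
            listening = False  # go back to waiting for the wake word
            continue
        transcripts.append(text)
    return transcripts

# Stand-ins for real audio: each "chunk" is just the text it would contain.
fake_audio = [
    "background chatter",
    "blueberry",
    "turn on the light",
    "what time is it",
    "stop transcribing",
    "private conversation",
]
result = run_assistant(
    fake_audio,
    detect_wake_word=lambda chunk: "blueberry" in chunk,
    transcribe=lambda chunk: chunk,
)
print(result)   # ['turn on the light', 'what time is it']
```

Keeping the transcriber out of the waiting state is the point of the design: nothing said before the wake-up word or after the stop phrase is ever passed to the ASR engine.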
117 | 
118 | Here is the result of the script - quite neat and completely offline.
119 | 
120 | 
121 | To reproduce the result yourself, clone my GitHub repository to Raspberry Pi and install the necessary dependencies with
122 | 
123 | chmod +x install.sh
124 | ./install.sh
125 | The script will also download the model and scorer - if you already have them on Raspberry Pi, just move them to the folder that contains mic_streaming.py file. After that run (replace blueberry with another keyword if you want):
126 | 
127 | python3 mic_streaming.py --keywords blueberry
128 | I hope you enjoyed this article and found it useful. In my opinion, 2020 will be the year reliable offline NLP and ASR come to edge devices, such as our phones, smart assistants and other embedded electronics. If you’d like to take part in that move, you’re welcome to have a look at Mozilla’s DeepSpeech GitHub and try training your own model, for a different language or a different vocabulary. What I really like about DeepSpeech, apart from being so easy to use, is that it is completely open-source and open to contributions.
129 | 
130 | The hardware for this article was kindly provided by Seeed studio. Check out Raspberry Pi 4, ReSpeaker USB Mic Array and other hardware for makers at Seeed studio store!
131 | 
132 | Add me on LinkedIn if you have any questions and subscribe to my YouTube channel to get notified about more interesting projects involving machine learning and robotics.
133 | 
134 | Stay tuned for more videos and articles!
135 | 
136 | 
137 | 


--------------------------------------------------------------------------------
/deepspeech_002.txt:
--------------------------------------------------------------------------------
  1 | https://www.seeedstudio.com/blog/2020/01/23/offline-speech-recognition-on-raspberry-pi-4-with-respeaker/
  2 | 
  3 | NEWS
  4 | Offline Speech Recognition on Raspberry Pi 4 with Respeaker
  5 | By Elaine Wu 6 months ago
  6 | Note: This article by Dmitry Maslov originally appeared on Hackster.io
  7 | 
  8 | In this article, we’re going to run and benchmark Mozilla’s DeepSpeech ASR (automatic speech recognition) engine on different platforms, such as Raspberry Pi 4(1 GB), Nvidia Jetson Nano, Windows PC, and Linux PC.
  9 | 
 10 | 2019, last year, was the year when Edge AI became mainstream. Multiple companies released boards and chips for fast inference on the edge, and a plethora of optimization frameworks and models appeared. To date, my articles and videos have mostly focused on machine learning for computer vision, but I was always interested in running deep learning-based ASR projects on an embedded device. The problem until recently was the lack of simple, fast and accurate engines for that task. When I was researching this topic about a year ago, the few choices for running ASR (not just hot-word detection, but large vocabulary transcription) on, say, Raspberry Pi 3 were:
 11 | 
 12 | CMUSphinx
 13 | Kaldi
 14 | Jasper
 15 | Links:
 16 | 
 17 | Python 3 Artificial Intelligence: Offline STT and TTS
 18 | 
 19 | The Best Voice Recognition Software for Raspberry Pi
 20 | 
 21 | And a couple of other ones. None of them were easy to set up or particularly suitable for running in a resource-constrained environment. So, a few weeks ago, I started looking into this area again and after some searching stumbled upon Mozilla’s DeepSpeech engine. It has been around for a while, but only recently (December 2019) they released version 0.6.0 of their ASR engine, which comes with a .tflite model among other significant improvements. It reduced the size of the English model from 188 MB to 47 MB. “DeepSpeech v0.6 with TensorFlow Lite runs faster than real-time on a single core of a Raspberry Pi 4.”, claimed Reuben Morais from Mozilla in the news announcement. So I decided to verify that claim myself, run some benchmarks on different hardware and make my own audio transcription application with hot word detection. Let’s see what the results are.
 22 | 
 23 | Hint: I wasn’t disappointed.
 24 | 
 25 | Actually I was as happy as this Firefox!
 27 | 
 28 | Installation
 29 | Raspberry Pi 4/3B
 30 | 
 31 | The pre-built wheel package for the armv7 architecture uses the .tflite model by default, and installing it is as easy as:
 32 | 
 33 | pip3 install deepspeech
 34 | That's really it! The package is self-contained, with no TensorFlow installation needed. The only external dependency is NumPy. You’ll need to download the model separately; we’ll cover that in the next section.
 35 | 
 36 | Nvidia Jetson Nano
 37 | 
 38 | As of the day of writing this article (1/22/2020), the pre-built wheel for the arm64 architecture uses the large .pbmm model by default. So, if you download it from the DeepSpeech releases on GitHub, you’ll have an unpleasant surprise. With the swap file expanded to 4 GB, Jetson Nano can run the full model, but it takes about 18 seconds for a 1.9-second file… There is a preview wheel with .tflite model support enabled, available for download at https://community-tc.services.mozilla.com/api/queue/v1/task/KZMAnYo2Qy2-icrTp5Ldqw/runs/0/artifacts/public/deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl
 39 | 
 40 | You can download it and install with
 41 | 
 42 | python3.7 -m pip install --user deepspeech-0.6.1-cp37-cp37m-linux_aarch64.whl
 43 | We need to have Python 3.7 installed! Nvidia Jetson comes with Python 3.6 by default.
 44 | 
 45 | Windows 10/Linux
 46 | 
 47 | For Windows and Linux you’ll need to install the .tflite-enabled version of the pip package.
 48 | 
 49 | pip3 install deepspeech-tflite
 50 | If you’re using Python 3.8, you’re likely to encounter a DLL loading error on Windows. It can be corrected fairly simply with a small change to the DeepSpeech package code, but I suggest you just install the version for Python 3.7, which works flawlessly.
 51 | 
 52 | If you have an NVIDIA GPU and CUDA 10 installed, you can opt for the GPU-enabled version of DeepSpeech:
 53 | 
 54 | pip3 install deepspeech-gpu
 55 | Benchmarking
 56 | Let’s download the models, language model binary and some audio samples.
 57 | 
 58 | curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/deepspeech-0.6.1-models.tar.gz
 59 | tar xvf deepspeech-0.6.1-models.tar.gz
 60 | 
 61 | Download example audio files
 62 | 
 63 | curl -LO https://github.com/mozilla/DeepSpeech/releases/download/v0.6.1/audio-0.6.1.tar.gz
 64 | tar xvf audio-0.6.1.tar.gz
 65 | Raspberry Pi 4 run:
 66 | 
 67 | deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
 68 | If successful you should see the following output
 69 | 
 70 | 
 71 | Not bad! 1.6 seconds for a 1.98-second sound file. It IS faster than real time.
 72 | 
 73 | Nvidia Jetson Nano run:
 74 | 
 75 | deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
 76 | 
 77 | Hm.. a little bit slower than Raspberry Pi. That is expected, since the Nvidia Jetson Nano CPU is less powerful than the one in Raspberry Pi 4. There are no pre-built binaries for the arm64 architecture with GPU support as of this moment, so we cannot take advantage of the Jetson Nano’s GPU for inference acceleration. I don’t think this task is on the DeepSpeech team's roadmap, so in the near future I’ll do some research here myself and try to compile that binary to see what speed gains can be achieved from using the GPU. But that is still a pretty decent speed, and depending on your project you might want to run DeepSpeech on the CPU and keep the GPU for other deep learning tasks.
 78 | 
 79 | Windows 10/Linux
 80 | 
 81 | deepspeech --model deepspeech-0.6.1-models/output_graph.tflite --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
 82 | 
 83 | 
 84 | Or if using GPU enabled version:
 85 | 
 86 | deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm --lm deepspeech-0.6.1-models/lm.binary --trie deepspeech-0.6.1-models/trie --audio audio/2830-3980-0043.wav
 87 | 
 88 | As you can see, the .tflite model runs faster than real time on modern CPU systems, which is great news for people creating offline ASR applications.
 89 | 
 90 | Here is the comparison table:
 91 | 
 92 | 
 93 | Well, we did benchmarking with pre-recorded sound samples, but we really want to do some real-time transcribing. Let’s do that!
 94 | 
 95 | Download DeepSpeech examples from https://github.com/mozilla/DeepSpeech-examples
 96 | 
 97 | Navigate to mic_vad_streaming and install the dependencies with
 98 | 
 99 | pip3 install -r requirements.txt
100 | Connect the microphone to your system (I am using Raspberry Pi 4 1 GB). Although you can use any microphone, including your laptop’s built-in one, the quality of the sound influences the results a lot. For this demo, I am using the ReSpeaker USB Mic Array from Seeed Studio. It supports far-field voice pick-up up to 5 m and a 360° pick-up pattern, with the following acoustic algorithms implemented: DOA (Direction of Arrival), AEC (Acoustic Echo Cancellation), AGC (Automatic Gain Control), NS (Noise Suppression).
101 | 
102 | 
103 | python3 ../DeepSpeech-examples/mic_vad_streaming/mic_vad_streaming.py -m ./output_graph.tflite -l lm.binary -t trie -v 3
104 | Execute this command from the folder with the models. The -v argument lets you tweak the VAD (voice activity detection) threshold. Here is the result of the demo.
105 | 
106 | 
107 | Okay, great! Can we improve on that? Yes. We really don’t want our device to be transcribing the conversations all the time. Talk about privacy nightmares and wasted electricity.
108 | 
109 | It/He/She? is listening... Or maybe not. If it's not Opensource you'd never know.
111 | 
112 | So we will want to implement so-called wake-up word detection. DeepSpeech is a general-purpose ASR engine, and for the wake-up word we need something more lightweight and more accurate on short voice commands. I tried two frameworks for hot word detection on Raspberry Pi: Snowboy and Porcupine. The first one ran successfully, but only supported Python 2… A closer look at the Snowboy GitHub repo shows that it is probably not under active development now. Porcupine worked great and it is free for non-commercial applications. So, I wrote a little script that runs wake-up word detection and, once the word is detected, starts transcribing speech with DeepSpeech ASR. It stops transcribing when the “stop transcribing” keyword is recognized in the transcription. After that, it goes back to waiting for the wake-up word.
113 | 
114 | Here is the result of the script – quite neat and completely offline.
115 | 
116 | 
117 | To reproduce the result yourself, download the files from Porcupine Github and make the folder with the following file structure (I cannot redistribute Porcupine libraries and code, so I just uploaded my own script to Github together with folder structure).
118 | 
119 | 
120 | You will also need to make a one-line change to resources/util/python/util.py:
121 | 
122 | elif 'rev 3' in model_info:    
123 | 	return 'cortex-a53'
124 | It is a hacky approach, but unfortunately Porcupine is not officially supported on Raspberry Pi 4, even though it has the same architecture as Raspberry Pi 3. If you don’t change “rev 5” to “rev 3”, it won’t start.
125 | 
126 | I hope you enjoyed this article and it was useful for you. In my opinion, 2020 will be the year reliable offline NLP and ASR will come to Edge devices, such as our phones, smart assistants and other embedded electronics. If you’d like to participate in that move, you’re welcome to have a look at Mozilla’s DeepSpeech Github and try training your own model, for different languages or different vocabulary. The thing I really like about DeepSpeech apart from being so easy to use, is that it is completely open-source and open to contributions.
127 | 
128 | The hardware for this article was kindly provided by Seeed Studio. Check out Raspberry Pi 4, ReSpeaker USB Mic Array and other hardware for makers at Seeed Studio store!
131 | 
132 | Stay tuned for more videos and articles!
133 | 
135 | 
136 | 


--------------------------------------------------------------------------------
/ds-cnn_001.txt:
--------------------------------------------------------------------------------
 1 | 
 2 | https://www.veryarm.com/112251.html
 3 | 
 4 | 
 5 | 
31 | 
32 | 
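33 | Speech Recognition on Edge Devices: ARM Open-Sources Pretrained TensorFlow Models
34 | ARM, posted 2019-01-14
35 | Source: QbitAI (量子位), an AI news outlet
36 | Keyword spotting (KWS) is a subfield of speech recognition that plays an important role when users interact with smart devices by voice. [Figure: keyword spotting pipeline]
37 | Recently, ARM and Stanford University jointly open-sourced pretrained TensorFlow models and their keyword spotting code, and published the results in the paper Hello Edge: Keyword Spotting on Microcontrollers.
38 | The open-source repository contains the TensorFlow models and the training scripts used in the paper.
39 | In the paper, the researchers also present several neural network architectures, including DNN, CNN, Basic LSTM, LSTM, GRU, CRNN and DS-CNN, and include them among the pretrained models.
40 | Pretrained models:
41 | https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Pretrained_models
42 | Paper summary: the researchers evaluate neural network architectures for running KWS on resource-constrained microcontrollers. They trained several architecture variants and compared their accuracy and memory/compute requirements. [Figure: accuracy of the neural network models]
43 | The researchers found that these architectures can be optimized to fit microcontrollers with limited memory and compute without losing accuracy.
44 | They then explored the DS-CNN architecture further and compared it with the other architectures.
45 | The results show that DS-CNN achieves the highest accuracy, 95.4%, about 10% higher than a DNN model with a similar number of parameters. [Figure: best neural networks from the hyperparameter search] Related resources:
46 | Paper:
47 | https://arxiv.org/pdf/1711.07128.pdf
48 | Project code:
49 | https://github.com/ARM-software/ML-KWS-for-MCU
50 | Author: 林鳞 (Lin Lin)
51 | Originally published: 2017-12-14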
52 | 
55 | 
56 | 
57 | 


--------------------------------------------------------------------------------
/editor_001.md:
--------------------------------------------------------------------------------
 1 | * https://github.com/weimingtom/twentylight  
 2 | * search baidupan, 2D游戏地图编辑器合集可用 (usable collection of 2D game map editors)  
 3 | * search weibo, 编辑器 (editor)  
 4 | * Java, UI Editor:    
 5 | * https://github.com/relu91/niftyeditor  
 6 | * Etc  
 7 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/game_editor.md  
 8 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/ide.md  
 9 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/editor.md  
10 | * csharp, wpf  
11 | * https://github.com/weimingtom/nianhao  
12 | * search baidupan, MyNotepad_v1.rar  
13 | * https://sourceforge.net/projects/d2dmapeditor/  
14 | 


--------------------------------------------------------------------------------
/framebuffer_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.iteye.com/blog/weimingtom-1113103
  2 | 
  3 | 
  4 | Ubuntu Framebuffer Study Notes
  5 | Blog category: outdated articles (kept for the record)
  6 | Linux, Framebuffer
  7 |  
  8 | 
 10 | 
 11 |  
 12 | 
 13 | I. Setting up the environment
 14 | 
 15 | 1. Running the framebuffer directly on Ubuntu
 16 | 
 17 | By default Ubuntu boots straight into X. To use the framebuffer,
 18 | 
 19 | change the kernel boot parameters:
 20 | 
 21 | $ sudo gedit /etc/default/grub
 22 | 
 23 | Find
 24 | 
 25 | GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
 26 | 
 27 | and change it to
 28 | 
 29 | GRUB_CMDLINE_LINUX_DEFAULT="quiet splash text vga=0x311"
 30 | 
 31 | Here text boots into text mode and vga=0x311 selects the framebuffer display driver;
 32 | 
 33 | 0x311 encodes the colour depth and resolution:
 34 | 
 35 | colours | 640x480  800x600  1024x768  1280x1024
 36 | 
 37 | --------+--------------------------------------
 38 | 
 39 | 256     | 0x301    0x303    0x305     0x307
 40 | 
 41 | 32k     | 0x310    0x313    0x316     0x319
 42 | 
 43 | 64k     | 0x311    0x314    0x317     0x31A
 44 | 
 45 | 16M     | 0x312    0x315    0x318     0x31B
 46 | 
 47 | With vga=0x311 you must also load the vesafb module described below and remove it from the blacklist;
 48 | 
 49 | otherwise the system will not boot, and you will need to boot from a CD and delete the vga parameter to recover.
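The mode table above can be turned into a small lookup. This is only a sketch: the helper names `vga_param` and `add_framebuffer_args` are hypothetical and are not part of GRUB or the kernel; the mode numbers are the ones listed in the table.

```python
# Hypothetical helpers that only encode the vga= mode table from these notes.
VESA_MODES = {
    # colours: codes for 640x480, 800x600, 1024x768, 1280x1024
    "256": (0x301, 0x303, 0x305, 0x307),
    "32k": (0x310, 0x313, 0x316, 0x319),
    "64k": (0x311, 0x314, 0x317, 0x31A),
    "16M": (0x312, 0x315, 0x318, 0x31B),
}
RESOLUTIONS = ("640x480", "800x600", "1024x768", "1280x1024")


def vga_param(resolution, colours):
    """Return the vga= boot parameter for a resolution and colour count."""
    column = RESOLUTIONS.index(resolution)
    return "vga=0x%X" % VESA_MODES[colours][column]


def add_framebuffer_args(cmdline, resolution="640x480", colours="64k"):
    """Append text mode and the vga= parameter to a kernel command line."""
    return "%s text %s" % (cmdline, vga_param(resolution, colours))


# Value to place inside GRUB_CMDLINE_LINUX_DEFAULT="..."
print(add_framebuffer_args("quiet splash"))
```

With the defaults this reproduces the `quiet splash text vga=0x311` line used above.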
 50 | 
 51 | $ sudo update-grub
 52 | 
 53 | This writes the change to /boot/grub/grub.cfg.
 54 | 
 55 | $ sudo gedit /etc/initramfs-tools/modules
 56 | 
 57 | Add this line: vesafb
 58 | 
 59 | $ sudo gedit /etc/modprobe.d/blacklist-framebuffer.conf
 60 | 
 61 | Comment out the following line with #:
 62 | 
 63 | # blacklist vesafb
 64 | 
 65 | $ sudo update-initramfs -u
 66 | 
 67 | (this generates a new initrd)
 68 | 
 69 | Reboot the machine and it comes up in the framebuffer console.
 70 | 
 71 | To switch back to X11, run:
 72 | 
 73 | $ startx
 74 | 
 75 | If wrong boot parameters in /boot/grub/grub.cfg make the system unbootable,
 76 | 
 77 | boot from a CD, mount the hard disk and edit /boot/grub/grub.cfg directly,
 78 | 
 79 | which skips the update-grub step. Then restore the original boot parameters to get back into X Window.
 80 | 
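 81 | 2. Running Linux in qemu
 82 | 
 83 | You need to build the Linux kernel and busybox.
 84 | 
 85 | You also need libncurses-dev and qemu.
 86 | 
 87 | Because qemu can load a kernel and initrd directly and pass boot parameters,
 88 | 
 89 | there is no need to change the grub configuration.
 90 | 
 91 | (1) Build the kernel and install qemu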
 92 | 
 93 | $ tar xjf linux-2.6.39.2.tar.bz2
 94 | 
 95 | $ cd linux-2.6.39.2/
 96 | 
 97 | $ make help
 98 | 
 99 | $ make i386_defconfig
100 | 
101 | $ sudo apt-get install libncurses-dev
102 | 
103 | $ make menuconfig
104 | 
105 | $ make
106 | 
107 | $ sudo apt-get install qemu
108 | 
109 | $ qemu --help
110 | 
111 | $ qemu -kernel arch/x86/boot/bzImage
112 | 
113 | $ qemu -kernel arch/x86/boot/bzImage -append "noapic"
114 | 
115 | The kernel sometimes crashes like this:
116 | 
117 | MP-BIOS BUG 8254 timer not connected
118 | 
119 | trying to set up timer as Virtual Wire IRQ
120 | 
121 | in which case add the -append "noapic" option.
122 | 
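123 | (2) Change the kernel configuration, then rebuild the kernel.
124 | 
125 | Note that the options differ between kernel versions.
126 | 
127 | I changed my kernel configuration as follows (press space to switch an option to *, not to M):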
128 | 
129 | $ make menuconfig
130 | 
131 | Device Drivers  --->  
132 | 
133 | Graphics support  --->   
134 | 
135 | -*- Support for frame buffer devices  --->  
136 | 
137 | [*]   VESA VGA graphics support 
138 | 
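139 | VESA is needed because it supports display at these colour depths.
140 | 
141 | It is not selected by default, which leaves only a black-and-white console.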
142 | 
143 | Input device support  ---> 
144 | 
145 | [*]     Provide legacy /dev/psaux device 
146 | 
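147 | Some libraries, such as SDL, look for /dev/input/mice and /dev/psaux when detecting a USB mouse.
148 | 
149 | The kernel I built did not have the former, so I used this option to create the /dev/psaux device.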
150 | 
151 | File systems  --->  
152 | 
153 | [*] Miscellaneous filesystems  --->
154 | 
155 | <*>   Compressed ROM file system support (cramfs) 
156 | 
157 | I like cramfs, but it is not required; this switch builds the cramfs driver,
158 | 
159 | which can be used to test whether initramfs works.
160 | 
161 | General setup  ---> 
162 | 
163 | [*]   Support initial ramdisks compressed using gzip 
164 | 
165 | [*] Embedded system
166 | 
167 | The default i386 kernel configuration does not support gzip-compressed cpio initrds, so enable this by hand.
168 | 
169 | Finally, rebuild the kernel:
170 | 
171 | $ make
172 | 
173 | (3) Build busybox
174 | 
175 | $ tar xjf busybox-1.18.5.tar.bz2
176 | 
177 | $ cd busybox-1.18.5/
178 | 
179 | $ make defconfig
180 | 
181 | $ make menuconfig
182 | 
183 | Change the settings as follows:
184 | 
185 | Busybox Settings  --->  
186 | 
187 | Build Options  --->
188 | 
189 | [*] Build BusyBox as a static binary (no shared libs)  
190 | 
191 | $ make 
192 | 
193 | $ make install
194 | 
195 | By default the files are installed into the _install directory under the current directory.
196 | 
197 | (4) Build a gzip-compressed cpio initrd
198 | 
199 | $ cd ../busybox-1.18.5/_install/
200 | 
201 | $ mkdir proc sys dev etc etc/init.d tmp root usr lib
202 | 
203 | $ gedit etc/init.d/rcS
204 | 
205 | #!/bin/sh
206 | 
207 | mount -t proc none /proc
208 | 
209 | mount -t sysfs none /sys
210 | 
211 | /sbin/mdev -s
212 | 
213 | $ chmod +x etc/init.d/rcS
214 | 
215 | $ cd ../../linux-2.6.39.2/
216 | 
217 | $ gedit prerun.sh
218 | 
219 | #!/bin/sh
220 | 
221 | cd ../busybox-1.18.5/_install
222 | 
223 | find . | cpio -o --format=newc > ../rootfs.img
224 | 
225 | cd .. 
226 | 
227 | gzip -c rootfs.img > rootfs.img.gz
228 | 
229 | cd ../linux-2.6.39.2/
230 | 
231 | $ . prerun.sh
232 | 
233 | $ gedit run.sh
234 | 
235 | #!/bin/sh
236 | 
237 | qemu -kernel ./arch/i386/boot/bzImage -initrd ../busybox-1.18.5/rootfs.img.gz  -append "root=/dev/ram rdinit=/sbin/init vga=0x312 noapic"
238 | 
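239 | Note the use of rdinit= here: the image is loaded as an initramfs, and rdinit= names the init program inside it. With init= instead, the kernel reports that it cannot find a suitable filesystem.
240 | 
241 | For the vga= values, see the table earlier (they set the colour depth and resolution).
242 | 
243 | $ . run.sh
244 | 
245 | Build your program, pack it into rootfs.img.gz with the prerun.sh above, then run run.sh to start qemu.
246 | 
247 | If the program is dynamically linked and needs particular shared libraries,
248 | 
249 | copy those libraries into _install/lib so that they are packed into rootfs.img.gz.
250 | 
251 | (5) This is what it looks like inside qemu (screenshot not preserved):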
252 | 
253 |  
254 | 
255 | 
256 | 
257 | 
258 |  
259 | 
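260 | II. Developing framebuffer applications
261 | 
262 | 1. Using the /dev/fb* devices and mmap
263 | 
264 | This approach is inflexible, and development with it is time-consuming.
265 | 
266 | There is sample code here:
267 | 
268 | http://www.kde.gr.jp/~ichi/qt/emb-framebuffer-howto.html
269 | 
270 | The code is as follows: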
271 | 
272 |  
273 | 
274 | #include <stdlib.h>
275 | #include <unistd.h>
276 | #include <stdio.h>
277 | #include <fcntl.h>
278 | #include <linux/fb.h>
279 | #include <sys/ioctl.h>
280 | #include <sys/mman.h>
281 | 
282 | int main(void)
283 | {
284 |     int fbfd = 0;
285 |     struct fb_var_screeninfo vinfo;
286 |     struct fb_fix_screeninfo finfo;
287 |     long int screensize = 0;
288 |     unsigned char *fbp = 0;
289 |     int x = 0, y = 0;
290 |     long int location = 0;
291 |     fbfd = open("/dev/fb0", O_RDWR);
292 |     if (fbfd < 0) {  /* open() returns -1 on failure, not 0 */
293 |         printf("Error: cannot open framebuffer device.\n");
294 |         exit(1);
295 |     }
296 |     printf("The framebuffer device was opened successfully.\n");
297 |     if (ioctl(fbfd, FBIOGET_FSCREENINFO, &finfo)) {
298 |         printf("Error reading fixed information.\n");
299 |         exit(2);
300 |     }
301 |     if (ioctl(fbfd, FBIOGET_VSCREENINFO, &vinfo)) {
302 |         printf("Error reading variable information.\n");
303 |         exit(3);
304 |     }
305 |     printf("%ux%u, %ubpp\n", vinfo.xres, vinfo.yres, vinfo.bits_per_pixel );
306 |     screensize = (long int)finfo.line_length * vinfo.yres;  /* rows may be padded past xres */
307 |     fbp = (unsigned char *)mmap(0, screensize, PROT_READ | PROT_WRITE, MAP_SHARED,
308 |                                 fbfd, 0);
309 |     if (fbp == MAP_FAILED) {
310 |         printf("Error: failed to map framebuffer device to memory.\n");
311 |         exit(4);
312 |     }
313 |     printf("The framebuffer device was mapped to memory successfully.\n");
314 |     /* Draw a 200x200 gradient square with its top-left corner at (100, 100). */
315 |     /* This assumes the screen is at least 300x300 pixels. */
316 |     for ( y = 100; y < 300; y++ )
317 |         for ( x = 100; x < 300; x++ ) {
318 |             location = (x+vinfo.xoffset) * (vinfo.bits_per_pixel/8) +
319 |                        (y+vinfo.yoffset) * finfo.line_length;
320 |             if ( vinfo.bits_per_pixel == 32 ) {
321 |                 *(fbp + location) = 100;                  /* blue */
322 |                 *(fbp + location + 1) = 15+(x-100)/2;     /* green */
323 |                 *(fbp + location + 2) = 200-(y-100)/5;    /* red */
324 |                 *(fbp + location + 3) = 0;                /* no transparency */
325 |             } else  {  /* otherwise assume 16bpp RGB565 */
326 |                 int b = 10;
327 |                 int g = (x-100)/6;
328 |                 int r = 31-(y-100)/16;
329 |                 unsigned short int t = r<<11 | g << 5 | b;
330 |                 *((unsigned short int*)(fbp + location)) = t;
331 |             }
332 |         }
333 |     munmap(fbp, screensize);
334 |     close(fbfd);
335 |     return 0;
336 | }
337 |  
338 | 
339 | (Tested on Ubuntu and in qemu.)
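The two calculations at the heart of the C example, the byte offset of a pixel and the 16-bit RGB565 packing, can be checked without a framebuffer device. The sketch below mirrors the expressions in the C code; the function names and the 640x480, 16bpp, 1280-bytes-per-row screen are assumptions chosen for illustration.

```python
def pixel_offset(x, y, xoffset, yoffset, bits_per_pixel, line_length):
    """Byte offset of pixel (x, y) inside the mapped framebuffer."""
    return (x + xoffset) * (bits_per_pixel // 8) + (y + yoffset) * line_length


def pack_rgb565(r, g, b):
    """Pack 5-bit red, 6-bit green and 5-bit blue into one 16-bit value."""
    if not (0 <= r < 32 and 0 <= g < 64 and 0 <= b < 32):
        raise ValueError("component out of range for RGB565")
    return (r << 11) | (g << 5) | b


# Assumed screen: 640x480 at 16bpp, 1280 bytes per row, no panning offset.
offset = pixel_offset(100, 100, 0, 0, 16, 1280)
# First pixel of the square in the C code: r = 31, g = 0, b = 10.
colour = pack_rgb565(31, 0, 10)
print(offset, hex(colour))
```

Using `line_length` rather than `xres * bytes_per_pixel` for the row stride matters when the driver pads each row.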
340 | 
341 |  
342 | 
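343 | 2. Using DirectFB
344 | 
345 | DirectFB seems to speed up drawing on the framebuffer.
346 | 
347 | The project provides 5 official example programs:
348 | 
349 | http://directfb.org/index.php?path=Development%2FTutorials
350 | 
351 | DirectFB supports font, image and input plugins,
352 | 
353 | but the corresponding development libraries must be installed before building.
354 | 
355 | (Tested on Ubuntu and in qemu.)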
356 | 
357 |  
358 | 
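359 | 3. Using SDL
360 | 
361 | SDL supports the framebuffer, here through its DirectFB backend.
362 | 
363 | The SDL shipped with Ubuntu works on the framebuffer. If you build SDL from source yourself,
364 | 
365 | build DirectFB first.
366 | 
367 | (Tested on Ubuntu and in qemu.)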
368 | 
369 |  
370 | 
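371 | 4. Using GTK+2
372 | 
373 | Project home page:
374 | 
375 | http://www.gtk.org/
376 | 
377 | GTK+2 supports the framebuffer through the DirectFB backend of the Cairo library.
378 | 
379 | The gtk library must also be configured with specific options when building:
380 | 
381 | ./configure --prefix=$PREFIX --with-gdktarget=directfb --without-x
382 | 
383 | Then build it.
384 | 
385 | However, GTK+2 is troublesome to build, and the GTK+2 code for directfb is not yet stable.
386 | 
387 | The test executable is /bin/gtk-demo
388 | 
389 | (Running it on Ubuntu seems problematic: I could not work out how to drag windows or quit. Not tested in qemu.)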
390 | 
391 |  
392 | 
393 | 5. Using Qt
394 | 
395 | Download from
396 | 
397 | ftp://ftp.qt.nokia.com/qt/source/
398 | 
399 | You usually have to build it yourself:
400 | 
401 | $ tar xzf qt-everywhere-opensource-src-4.8.0-tp.tar.gz
402 | 
403 | $ cd qt-everywhere-opensource-src-4.8.0-tp/
404 | 
405 | $ ./configure -shared -static -opensource -embedded generic
406 | 
407 | Although the -embedded option does not appear in ./configure --help,
408 | 
409 | it is valid here. There is no need to add a --prefix option.
410 | 
411 | $ sudo make -j 4
412 | 
413 | $ sudo make install
414 | 
415 | By default it installs under /usr/local/Trolltech.
416 | 
417 | The example code and the built binaries are under demos/embedded.
418 | 
419 | Switch to text mode on the framebuffer,
420 | 
421 | then run sudo ./xxx -qws
422 | 
423 | If you see this message at run time:
424 | 
425 | Qt/Embedded data directory is not owned by user 0:/tmp/qtembedded-0
426 | 
427 | you can run:
428 | 
429 | sudo chown root:root /tmp/qtembedded-0
430 | 
431 | To run in qemu, copy the fonts from the /lib/fonts directory.
432 | 
433 | (Works on Ubuntu; running in qemu seems to have some problems, still unsolved.)
434 | 
435 | 


--------------------------------------------------------------------------------
/game_001.md:
--------------------------------------------------------------------------------
 1 | ## dino  
 2 | * https://github.com/weimingtom/game_dino_vc6  
 3 | * https://github.com/makerdiary/python-games-on-microcontroller  
 4 | * https://github.com/makerdiary/nrf52840-m2-devkit/tree/master/examples/python/dino  
 5 | * (TODO) dino, port to c++    
 6 | * search baidupan, dino_v3_6.rar  
 7 | 
 8 | ## apple2  
 9 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/emulator.md  
10 | * https://gitee.com/wddark/apple2emulator  
11 | 
12 | ## nes  
13 | https://github.com/weimingtom/wmt_stm32_study#nofrendo-stm32f4  
14 | 
15 | ## ref  
16 | * https://gitee.com/weimingtom/wifiboymod  
17 | * https://gitee.com/weimingtom/arduboymod  
18 | * https://github.com/karaage0703/wio-terminal-flappy-bird  
19 | 
20 | ## makecode  
21 | * https://meowbit-doc.kittenbot.cn/#/makecode/社区趣味游戏分享  
22 | * https://makecode.com/_Txi3M7FrzXV4  
23 | * https://makecode.com/_9ecXcLeeAUF1  
24 | * https://makecode.com/_f9M6YvW077wd  
25 | 
26 | ## arduboy  
27 | * https://gitee.com/weimingtom/arduboymod  
28 | * search baidupan, arduboymod  
29 | * http://www.lingzhilab.com/lzbbs/resources.html?ecid=175  
30 | * https://asmcbain.net/projects/arduboy/docs/1.1/api/index.html  
31 | * search baidupan, 零知开源示例程序.7z  
32 | 零知开源示例程序.7z\lingzhi-examples\lz-standard\Arduboy(复古游戏制作)  
33 | 
34 | ## wifiboy  
35 | * https://wifiboy.org    
36 | * https://wifiboy.org/ProTutorials.html  
37 | * search baidupan, wifiboymod  
38 | 
39 | ## gamebuino  
40 | * search baidupan, gamebuino_nano_v1.rar  
41 | 
42 | ## chinese chess  
43 | * https://github.com/chenxiao07/TinyChess  
44 | 


--------------------------------------------------------------------------------
/game_server_001.md:
--------------------------------------------------------------------------------
1 | * https://github.com/NetEase/pomelo  
2 | 


--------------------------------------------------------------------------------
/gcc_001.txt:
--------------------------------------------------------------------------------
 1 | http://blog.sina.com.cn/s/blog_69b81dd10100z4f4.html  
 2 | 
 3 | What do the gcc flags -MMD -MP -MF"xx.d" -MT"xx.d" mean? (reposted 2012-06-05 05:51:41)
 4 | 
 5 | Question:
 6 | What do the gcc flags -MMD -MP -MF"xx.d" -MT"xx.d" mean?
 7 | What are they for?
 8 | Reply (flw, 2007-5-8 10:41): info gcc
 9 | Reply (rrrrrrrr8, 2007-5-8 10:48): I'm on Windows without Linux, so I can't run info.
10 | Could someone run info for me?
11 | Ideally with Chinese output :lol:
12 | Reply: then get the manual from the GNU website.
13 | Reply: then google "info gcc".
14 | 
15 | 
16 | `-MF FILE'
17 | When used with `-M' or `-MM', specifies a file to write the
18 | dependencies to. If no `-MF' switch is given the preprocessor
19 | sends the rules to the same place it would have sent preprocessed
20 | output.
21 | 
22 | When used with the driver options `-MD' or `-MMD', `-MF' overrides
23 | the default dependency output file.
24 | 
25 | `-MD'
26 | `-MD' is equivalent to `-M -MF FILE', except that `-E' is not
27 | implied. The driver determines FILE based on whether an `-o'
28 | option is given. If it is, the driver uses its argument but with
29 | a suffix of `.d', otherwise it takes the basename of the input file
30 | and applies a `.d' suffix.
31 | 
32 | If `-MD' is used in conjunction with `-E', any `-o' switch is
33 | understood to specify the dependency output file (but *note -MF:
34 | dashMF.), but if used without `-E', each `-o' is understood to
35 | specify a target object file.
36 | 
37 | Since `-E' is not implied, `-MD' can be used to generate a
38 | dependency output file as a side-effect of the
39 | compilation process.
40 | 
41 | 
42 | 
43 | `-MMD'
44 | Like `-MD' except mention only user header files, not system
45 | header files.
46 | 
47 | 
48 | 
49 | `-MP'
50 | This option instructs CPP to add a phony target for each dependency
51 | other than the main file, causing each to depend on nothing. These
52 | dummy rules work around errors `make' gives if you remove header
53 | files without updating the `Makefile' to match.
54 | 
55 | This is typical output:
56 | 
57 | test.o: test.c test.h
58 | 
59 | test.h:
60 | 
61 | `-MT TARGET'
62 | Change the target of the rule emitted by dependency generation. By
63 | default CPP takes the name of the main input file, including any
64 | path, deletes any file suf
65 | 
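To make the quoted manual text concrete: `-MMD` writes a dependency rule as a side effect of compiling, `-MF` names the file it is written to, `-MT` sets the rule's target, and `-MP` adds an empty rule for each header. The sketch below only imitates the shape of that output; `dependency_rules` is a hypothetical helper, not how gcc produces it.

```python
def dependency_rules(target, source, headers, phony=True):
    """Build make rules shaped like gcc's dependency output.

    target  -- rule target, as -MT would set it (for example "test.o")
    source  -- main input file
    headers -- header files the source includes
    phony   -- emit an empty rule per header, as -MP does
    """
    lines = ["%s: %s" % (target, " ".join([source] + list(headers)))]
    if phony:
        for header in headers:
            lines.append("")
            lines.append("%s:" % header)
    return "\n".join(lines) + "\n"


print(dependency_rules("test.o", "test.c", ["test.h"]), end="")
```

The empty `test.h:` rule is what stops make from failing when a header is deleted without the Makefile being updated.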
66 | 


--------------------------------------------------------------------------------
/gui_001.md:
--------------------------------------------------------------------------------
  1 | ## TODO  
  2 | * tgui, and UI editor    
  3 | search baidupan, TGUI-0.6-RC_v1.7z  
  4 | E:\work\work_git\pv3d\misaki-master\tgui  
  5 | https://github.com/weimingtom/misaki  
  6 | * sakura, PSS SDK UI editor  
  7 | https://github.com/weimingtom/Sakura  
  8 | https://gitee.com/weimingtom/sakura_ubuntu  
  9 | * dino, TileGrid  
 10 | https://github.com/weimingtom/game_dino_vc6  
 11 | * erica  
 12 | https://github.com/weimingtom/erica  
 13 | search baidupan, erica_v2.7z  
 14 | * ucgui  
 15 | search baidupan, UCGUI_touch_vc6_v2_success.rar    
 16 | * minigui  
 17 | search baidupan, MinGUI162.rar  
 18 | * qt  
 19 | search baidupan, qt-x11-free-3.0.3.tar.gz  
 20 | * LittlevGL, LVGL  
 21 | search baidupan, lvgldemo_v4_run_success.rar  
 22 | 
 23 | ## ucgui  
 24 | * search baidupan, UCGUI_touch_vc6_v1.rar  
 25 | * search baidupan, UCGUI_touch_vc6_v2_success.rar  
 26 | * https://github.com/topics/ucgui  
 27 | * https://github.com/WHJWNAVY/Micrium-uCOS  
 28 | * https://github.com/weimingtom/stm32-gui  
 29 | * https://github.com/qq516333132/ucGUI  
 30 | * https://github.com/piyushpandey013/ucGUI  
 31 | * search baidupan, UCGUI_touch_1.44LCD-STM32F103RC_v1.rar  
 32 | * search baidupan, spi2.4_UCGUI_touch_v4_stm32f103rc.rar  
 33 | * (TODO) search baidupan, UCGUI3.90_Source.zip  
 34 | 
 35 | ## minigui  
 36 | * search baidupan, MinGUI162.rar, Windows, VC6  
 37 | GUISim for MINIGUI162  
 38 | * search baidupan, minigui-ths-dev-2.0.4-win32.zip, windows  
 39 | Start wvfb.exe before running the console exe produced by a successful build; the same applies to the other Windows versions  
 40 | * search baidupan, MiniGUIforucos移植实验全部源码.rar, MiniGUI-STR, MagicARM2200 (LPC2200, ARM7)    
 41 | * search baidupan, miniGui.rar, miniGUI_2410IAR, S3C2410 (ARM9), MiniGUI for FreeRTOS    
 42 | * https://github.com/OpenNuvoton/NUC970_Linux_Applications/tree/master/minigui  
 43 | libminigui-gpl-3.0.12.tar.gz  
 44 | mg-samples-3.0.12.tar.gz  
 45 | minigui-res-be-3.0.12.tar.gz  
 46 | see https://github.com/pd2-linux/dl  
 47 | * http://www.minigui.com/download  
 48 | libminigui-3.0.12-linux.tar.gz (different from libminigui-gpl-3.0.12.tar.gz)    
 49 | * https://sourceforge.net/projects/minigui/files/minigui/GPL-V1.6.10/  
 50 | libminigui-1.6.10.tar.gz  
 51 | * https://github.com/lindenis-org/lindenis-v536-dl1  
 52 | http://wiki.lindeni.org    
 53 | libminigui-gpl-3.2.tar.gz  
 54 | 
 55 | ## css, windows 95 style    
 56 | * https://github.com/arturbien/React95  
 57 | 
 58 | ## win95  
 59 | * https://github.com/AlexBSoft/win95.css  
 60 | * https://github.com/litheory/bootstrap-theme-Win95  
 61 | 
 62 | ## Comparison of mainstream open-source embedded GUIs  
 63 | * https://blog.csdn.net/anyuliuxing/article/details/78431561  
 64 | 
 65 | ## xwing, x11 (linux)      
 66 | * https://github.com/weimingtom/xwing  
 67 | * https://github.com/weimingtom/xwing/blob/master/src/awt/NativeGraphics.cc  
 68 | 
 69 | ## ZLG GUI, zlggui, 周立功  
 70 | * search baidupan, ZLGUGI.rar    
 71 | * search baidupan, Gui实验.rar  
 72 | 
 73 | ## GuiLite  
 74 | * https://gitee.com/idea4good/GuiLite  
 75 | 
 76 | ## Introduction to TouchGFX  
 77 | * https://blog.csdn.net/k331922164/article/details/105343196  
 78 | 
 79 | ## LittlevGL, LVGL    
 80 | * search baidupan, lvgldemo_v4_run_success.rar  
 81 | * Light and Versatile Embedded Graphics Library  
 82 | * https://littlevgl.com  
 83 | * https://github.com/lvgl/lv_port_esp32  
 84 | * https://github.com/wireless-tag-cn/lv_port_esp32  
 85 | * Lichee Pi (荔枝派)  
 86 | * http://nano.lichee.pro/application/littlevgl.html  
 87 | * https://github.com/littlevgl/lvgl  
 88 | * https://github.com/littlevgl/lv_drivers  
 89 | * https://github.com/littlevgl/lv_examples  
 90 | * see maixduino examples     
 91 | * framebuffer  
 92 | * https://blog.lvgl.io/2018-01-03/linux_fb  
 93 | * https://github.com/lvgl/lv_port_linux_frame_buffer  
 94 | * https://blog.csdn.net/tq384998430/article/details/96841247  
 95 | * Omega2 Dash  
 96 | * https://onion.io/omega2-dash-guide/  
 97 | * https://github.com/OnionIoT/lv_micropython  
 98 | 
 99 | ## emxgui  
100 | * https://www.firebbs.cn/thread-23725-1-1.html  
101 | * https://emxgui-tutorial-doc.readthedocs.io/zh_CN/latest/index.html  
102 | 
103 | ## STemWin  
104 | 
105 | ## RTGUI, rt-thread  
106 | https://github.com/rqbh/RTGUI  
107 | 
108 | ## AWTK, FTK, 周立功    
109 | https://github.com/zlgopen/awtk  
110 | https://github.com/xianjimli/ftk  
111 | 
112 | ## GTK+  
113 | http://www.gtk.org/  
114 | gtk+ windows, All-in-one bundles  
115 | http://www.gtk.org/download/win32.php  
116 | gtkmm  
117 | http://www.gtkmm.org/en/download.html  
118 | tinygtk  
119 | https://code.google.com/p/tinygtk/  
120 | MiniGTK  
121 | http://xchat.org/files/binary/win32/mini-src/  
122 | gtk-win32  
123 | http://hexchat.github.io/gtk-win32/  
124 | Ubuntu Framebuffer Study Notes    
125 | https://github.com/weimingtom/wmt_ai_study/blob/master/framebuffer_001.txt  
126 | https://www.iteye.com/blog/weimingtom-1113103  
127 | search baidupan, gtk_v2_20110628_src.7z  
128 | search baidupan, gtk2  
129 | 
130 | ## Qt  
131 | http://qt-project.org/downloads  
132 | http://en.wikipedia.org/wiki/Qt_(framework)  
133 | qt 4.3.0 mingw  
134 | http://download.qt-project.org/archive/qt/4.3/  
135 | http://sourceforge.net/projects/mingw/files/OldFiles/MinGW-3.2.0-rc-3.exe/download  
136 | Notes on building Qt statically  
137 | https://blog.51cto.com/weimingtom/1546309  
138 | Ubuntu Framebuffer Study Notes    
139 | https://github.com/weimingtom/wmt_ai_study/blob/master/framebuffer_001.txt  
140 | https://www.iteye.com/blog/weimingtom-1113103  
141 | search baidupan, qt-everywhere-opensource-src-4.8.6.tar.gz  
142 | search baidupan, qt-linux  
143 | Lichee Pi Zero (荔枝派zero)  
144 | http://zero.lichee.pro/应用/QT_doc3.html#id6  
145 | 
146 | ## FLTK  
147 | http://www.fltk.org/  
148 | http://en.wikipedia.org/wiki/FLTK  
149 | http://sourceforge.net/projects/fltk/  
150 | 
151 | ## Nuklear  
152 | https://github.com/Immediate-Mode-UI/Nuklear  
153 | https://github.com/vurtun/nuklear  
154 | search baidupan, nuklear_codelite_v2.rar  
155 | 
156 | ## MyGUI  
157 | https://github.com/dayongxie/MyGUI  
158 | https://github.com/dayongxie/cocos2d-x  
159 | https://github.com/MyGUI/mygui  
160 | search baidupan, MyGUI_cocos2d-x.rar  
161 | 
162 | ## libRocket  
163 | https://github.com/libRocket/libRocket  
164 | search baidupan, librocket_build  
165 | 
166 | ## TGUI, SFML  
167 | https://github.com/texus/TGUI  
168 | http://tgui.net/index.html  
169 | search baidupan, TGUI-0.6-RC_v1.7z  
170 | misaki, Java  
171 | https://github.com/weimingtom/misaki  
172 | 
173 | ## nifty  
174 | https://github.com/nifty-gui/nifty-gui  
175 | https://github.com/relu91/niftyeditor  
176 | search baidupan, nifty_eclipse.rar    
177 | 
178 | ## guichan  
179 | https://github.com/weimingtom/guichan_gitorious_mainline  
180 | https://github.com/weimingtom/guichan_memory  
181 | https://github.com/weimingtom/guichan_cocos2dx  
182 | https://github.com/weimingtom/yuichan  
183 | 
184 | ## AsWing  
185 | https://github.com/dreamsxin/AsWing  
186 | 
187 | ## CrossApp  
188 | https://github.com/9miao/CrossApp  
189 | search baidupan, CrossApp_0.4.2_mini_v1.rar   
190 | 
191 | ## gui old list  
192 | https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/gui.md  
193 | 
194 | ## PSS SDK, C#    
195 | https://github.com/weimingtom/Sakura  
196 | 
197 | ## Framebuffer (or DirectFB)    
198 | search baidupan, fbtest.c  
199 | Ubuntu Framebuffer Study Notes  
200 | https://github.com/weimingtom/wmt_ai_study/blob/master/framebuffer_001.txt  
201 | DirectFB usage    
202 | https://github.com/weimingtom/SimpleScriptSystem/blob/master/sss/src/mainframe_dfb.c  
203 | 
204 | ## pxcore, framebuffer  
205 | search fbtest.c  
206 | search SDL-1.2.15_makefile_framebuffer_v0.zip  
207 | https://github.com/weimingtom/pxcore  
208 | Mandelbrot, fractal, 分形  
209 | x11 (linux)    
210 | https://github.com/weimingtom/pxcore/blob/master/src/x11/pxWindowNative.cpp  
211 | 
212 | ## c-sky, csky, 诛仙剑, 平头哥, Framebuffer     
213 | https://github.com/weimingtom/wmt_ai_study/blob/master/csky_001.txt    
214 | Development guide  
215 | https://c-sky.github.io/docs/gx6605s.html  
216 | T-Head (平头哥) Linux arch introduction  
217 | https://c-sky.github.io  
218 | Technical support forum  
219 | https://github.com/c-sky/forum/issues  
220 | 
221 | ## Lichee Pi Zero (荔枝派zero), fbtft  
222 | http://zero.lichee.pro/驱动/SPI_LCD.html#fbtft  
223 | 
224 | ## Turbo C graphics.h, BGI, DOS    
225 | https://github.com/weimingtom/old_books_code  
226 | https://github.com/weimingtom/TurboCGraphics  
227 | https://github.com/search?p=1&q=Turbo+C+graphic&type=Repositories  
228 | OpenBGI  
229 | https://sourceforge.net/projects/openbgi/  
230 | WinBGIm  
231 | http://winbgim.codecutter.org  
232 | easyx  
233 | http://www.easyx.cn  
234 | EGE  
235 | https://github.com/misakamm/xege  
236 | search baidupan, [整理]project.rar  
237 | FreeDOS, search baidupan, dos.rar  
238 | 
239 | ## erica, SFML.Net, csharp    
240 | https://github.com/weimingtom/erica  
241 | 
242 | ## dino  
243 | see https://github.com/weimingtom/wmt_ai_study/blob/master/game_001.md  
244 | https://github.com/makerdiary/python-games-on-microcontroller  
245 | https://github.com/makerdiary/nrf52840-m2-devkit/tree/master/examples/python/dino  
246 | (TODO) dino, port to c++  
247 | search baidupan, dino_v3.rar  
248 | 
249 | ## openwin, xview, libxview, xv_create      
250 | https://github.com/search?p=2&q=xview_window_DEFINED&type=Code  
251 | https://github.com/maximilianharr/code_snippets/tree/master/cpp/hanser_c_und_linux/3.Auflage/XView  
252 | https://github.com/jonathangray/freebsd-1.x-ports/tree/master/x11/xview  
253 | https://github.com/search?l=HTML&p=2&q=xv_create&type=Code  
254 | 
255 | ## Motif  
256 | https://sourceforge.net/projects/motif/  
257 | search baidupan, motif-2.3.4-src.tgz  
258 | 
259 | ## GUIX Studio  
260 | https://github.com/azure-rtos/guix  
261 | 
262 | ## qt  
263 | see https://github.com/weimingtom/wmt_ai_study/blob/master/qt_001.md  
264 | 
265 | ## nw.js  
266 | https://github.com/weimingtom/mang/tree/wmt  
267 | 
268 | ## (MCU) freedom-platform  
269 | https://sourceforge.net/projects/freedom-platform/  
270 | 
271 | ## stemwin  
272 | 
273 | ## QWS stands for Qt Window System  
274 | https://blog.51cto.com/countryfrog/843526  
275 | 
276 | ## dotnet GUI, C# GUI  
277 | https://gitee.com/yhuse/SunnyUI  
278 | https://dotnet9.com/4177.html  
279 | https://github.com/kwwwvagaa/NetWinformControl  
280 | dotnetbar, mui, DSOFRAMER, devexpress  
281 | search weibo, winform  
282 | 
283 | ## MGL2, for NEC PDA  
284 | http://www.at.sakura.ne.jp/~suz/MGL2/index.html  
285 | search mgl2-alpha-020.tar.gz  
286 | 
287 | ## qt, xorg, x11, gtk+2  
288 | * search baidupan, 嵌入式Linux应用开发完全手册, 光盘, 基于2440      
289 | 
290 | ## microwindows  
291 | * https://github.com/ghaerr/microwindows  
292 | 
293 | ## LVGL-8.2 demo, 2K0300先锋派    
294 | * lvgl-8.2-src.tar.gz
295 | * https://gitee.com/open-loongarch/docs-2k0300/blob/master/2K0300先锋派/quick_start.md  
296 | ```
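297 | Result of building and running the LVGL-8.2 demo on the framebuffer
298 | in VirtualBox with a minimal-profile Arch Linux (2025) install
299 | (the demo comes from the Loongson 2K0300 先锋派 materials). The mouse does not work;
300 | a framebuffer is required, and the archinstall profile must be the minimal one.
301 | I think I got this wrong earlier: I assumed the Fedora minimal install could use the framebuffer,
302 | but it cannot, apparently because there is no /dev/fb0 device,
303 | so it is better to use Arch instead of Fedora, although neither is really convenient.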
304 | ```
305 | 


--------------------------------------------------------------------------------
/j-link_mdk_keil_error_001.md:
--------------------------------------------------------------------------------
 1 | ## Fixing the J-Link debug error "JLink Error: Can not read register"  
 2 | * https://blog.csdn.net/langeldep/article/details/78016105  
 3 | * https://blog.csdn.net/Yin_w/article/details/130032965  
 4 | * Besides that cause, stm32cubemx may not have JTAG/SWD configured, or the SWD pins may be taken by another GPIO function  
 5 | * SYS->Debug->Serial Wire  
 6 | 
 7 | ## Fixing "RDDI-DAP Error" with DAP-Link in Keil, plus serial driver installation and downloading programs over serial  
 8 | * https://blog.csdn.net/SailingNorth/article/details/124899856  
 9 | 
10 | ## A daplink download failure may be caused by an expansion board plugged in backwards, which puts wrong levels on some pins  
11 | 
12 | ## Fixing "Invalid ROM Table" (nucleo??? gd32??? daplink???)  
13 | * https://blog.csdn.net/ninihaoyangde/article/details/126610783  
14 | * Cause: the chip clock configured in stm32cubemx is too high or too low, or the clock set in code (for example by Stm32_Clock_Init()) is too high or too low    
15 | * see clock configuration  
16 | ```
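17 | This afternoon, while debugging, I flashed a program written by someone else and forgot to change its clock frequency configuration, which locked up the STM32F407.
18 | Below is how I fixed it; it worked well for me, so I am sharing it for anyone who needs it.
19 | Cause
20 | The board's external crystal is 24 MHz, but the software assumed an 8 MHz input crystal, so the chip was overclocked and locked up and could no longer be connected to or programmed.
21 | Fix
22 | In Keil, click the magic wand (Options for Target).
23 | Set Connect to "under Reset".
24 | Under Flash Download, tick "Erase Full Chip" and click OK, then download the program again.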
25 | ```
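A quick calculation shows why a wrong crystal value overclocks the chip. On the STM32F4 the system clock derived from the main PLL is HSE / PLLM * PLLN / PLLP. The divider values below (PLLM=8, PLLN=336, PLLP=2) are an assumption, a common setup for an 8 MHz crystal that yields 168 MHz; they are not taken from the quoted post.

```python
def stm32f4_sysclk_hz(hse_hz, pllm, plln, pllp):
    """System clock from the main PLL: HSE / PLLM * PLLN / PLLP."""
    return hse_hz * plln // pllm // pllp


# Assumed PLL settings, tuned for an 8 MHz crystal.
PLLM, PLLN, PLLP = 8, 336, 2

expected = stm32f4_sysclk_hz(8_000_000, PLLM, PLLN, PLLP)
actual = stm32f4_sysclk_hz(24_000_000, PLLM, PLLN, PLLP)
print("firmware expects %d MHz" % (expected // 1_000_000))
print("24 MHz crystal gives %d MHz" % (actual // 1_000_000))
```

Under these assumptions the same firmware on a 24 MHz crystal asks for three times the intended clock, far above the STM32F407's 168 MHz limit.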
26 | 
27 | 


--------------------------------------------------------------------------------
/kws_build_001.txt:
--------------------------------------------------------------------------------
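 1 | Here is a summary of the pitfalls I hit when building ARM's ML-KWS. I hope it helps, although I may keep refining it later.
 2 | (1) The operating system for mbed-cli. I recommend Windows, though Linux also works (I explain later why the two are much the same). Make sure you have 10 GB of free disk space,
 3 | because mbed takes up a lot of disk. I recommend installing a fresh Python 3.7.
 4 | (2) Installing the mbed-cli Python package. If you want to debug mbed-cli errors (mbed commands fail quite often), install mbed-cli into your Python environment from source
 5 | (python setup.py install). If you want the easy route, pip3 install mbed-cli installs the latest version online.
 6 | (3) Set the ARM_GCC environment variable. This is covered elsewhere online, so I won't repeat it. If you skip this step, mbed compile may fail, but mbed-cli will point out the ARM_GCC problem, because the mbed-sdk-tools that mbed-cli calls
 7 | check whether the path ARM_GCC points to contains a toolchain gcc executable with the expected name.
 8 | (4) The mbed new command. The official recommendation is mbed new <project name> --mbedlib, but in practice these project-creation commands are likely to fail while downloading packages (there is too much to download).
 9 | This pitfall is a serious one. My workaround is to create the project with mbed new kws_simple_test --create-only, then separately download the repository code for mbed (actually mbed 2) and
10 | mbed-sdk-tools and unpack them into the mbed and tools subdirectories respectively (the mbed-cli Python code shows why),
11 | after which mbed deploy succeeds. Otherwise mbed deploy fails, and so does mbed compile.
12 | (5) Downloading the mbed (actually mbed 2) code. This is another pitfall. The zip file downloaded for mbed is missing files (for example mbed.h and the drivers directory):
13 | https://os.mbed.com/users/mbed_official/code/mbed/
14 | Someone has reported this in a GitHub issue. You may wonder why files are missing; the reason is simple: the repository is too large, and it does not
15 | support git clone (it seems to use hg for version control), so the only fix is to download or copy the missing files (such as mbed.h) one by one. There are not many of them,
16 | so it is not much work (some missing BSP files can be ignored, since they are not used).
17 | (6) Building with mbed compile. Once the earlier problems are solved, this step is simple. Watch out for one thing, though: never put CMSIS_5 or other dependency libraries
18 | inside the project directory (for example kws_simple_test), because mbed-cli recursively scans every subdirectory of the project directory, which causes serious build problems
19 | (every subdirectory becomes an include directory). The official repository already handles this by placing CMSIS_5 outside the example project directory rather than inside it.
20 | Also, during the build mbed-cli indirectly calls the Python scripts under the tools directory, so mbed-cli is really just a shell;
21 | the build behaviour depends on mbed-sdk-tools.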
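The effect described in point (6) can be illustrated with a plain recursive directory walk. This is a sketch of the idea only, not mbed-cli's actual scanning code; `scanned_dirs`, `make_tree` and the directory names under CMSIS_5 are illustrative assumptions.

```python
import os
import tempfile


def scanned_dirs(project_root):
    """List every directory under project_root, as a recursive scan would."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(project_root):
        dirnames.sort()  # keep the order deterministic
        found.append(os.path.relpath(dirpath, project_root))
    return found


def make_tree(root, relative_dirs):
    """Create the given slash-separated directories below root."""
    for rel in relative_dirs:
        os.makedirs(os.path.join(root, *rel.split("/")), exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    # Layout A: the dependency sits inside the project (the problematic case).
    inside = os.path.join(tmp, "a", "kws_simple_test")
    make_tree(inside, ["mbed", "tools",
                       "CMSIS_5/CMSIS/NN/Include", "CMSIS_5/CMSIS/DSP/Include"])
    # Layout B: the dependency sits next to the project.
    outside = os.path.join(tmp, "b", "kws_simple_test")
    make_tree(outside, ["mbed", "tools"])
    make_tree(os.path.join(tmp, "b"), ["CMSIS_5/CMSIS/NN/Include"])

    print("inside: ", scanned_dirs(inside))
    print("outside:", scanned_dirs(outside))
```

In layout A every CMSIS_5 subdirectory ends up in the scan, so any header or source there can collide with the project's own files; in layout B only the project's directories are seen.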
22 | 


--------------------------------------------------------------------------------
/launching-speech-commands-dataset.txt:
--------------------------------------------------------------------------------
 1 | http://www.ijiandao.com/2b/baijia/67523.html
 2 | https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
 3 | 
 4 | 
 5 | 
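 6 | Industry | Google releases the Speech Commands dataset to help beginners tackle audio recognition with deep learning
 7 | By: 机器之心, 2017-08-25 13:40:48
 8 | Source: Google Research
 9 | 
10 | Compiled by 机器之心
11 | 
12 | Contributor: 路雪
13 | 
14 | 
15 | 
16 | Google recently released the Speech Commands dataset together with a new audio recognition tutorial, aiming to help beginners use deep learning for speech recognition and other audio recognition problems.
17 | 
18 | 
19 | 
20 | Speech Commands dataset: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz
21 | 
22 | Audio recognition tutorial: https://www.tensorflow.org/versions/master/tutorials/audio_recognition
23 | 
24 | 
25 | 
26 | At Google we are often asked how to use deep learning for speech recognition and other audio recognition problems, such as detecting keywords or commands. There are already large open-source speech recognition systems such as Kaldi that can use neural networks as a component, but their complexity makes them hard to use as a guide for simple tasks. More importantly, there are few free, open datasets suited to beginners (some need preprocessing before a neural model can be built on them) or suited to simple keyword detection.
27 | 
28 | 
29 | 
30 | To address this, the TensorFlow and AIY teams created the Speech Commands dataset and used it to add training and inference sample code to TensorFlow. The dataset contains 65,000 one-second utterances of 30 short words, contributed by thousands of people through the AIY website. It is released under the Creative Commons BY 4.0 license, and new versions will follow as more audio is contributed. The dataset is meant for building basic but useful voice interfaces for applications, with common words such as "Yes" and "No", digits and directions. We have also open-sourced the infrastructure used to create it, in the hope that more people will use it to build their own datasets, especially for underserved languages and applications.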
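A first look at a dataset like this is usually a per-word clip count. The sketch below assumes the archive unpacks into one folder per word containing `.wav` files, with helper folders starting with an underscore; that layout is an assumption, not something stated in the article. The demo builds a tiny stand-in directory instead of downloading anything.

```python
import os
import tempfile


def count_clips(dataset_dir):
    """Return {word: number of .wav files}, assuming one folder per word."""
    counts = {}
    for entry in sorted(os.listdir(dataset_dir)):
        folder = os.path.join(dataset_dir, entry)
        # Skip plain files and helper folders such as "_background_noise_".
        if not os.path.isdir(folder) or entry.startswith("_"):
            continue
        counts[entry] = sum(
            1 for name in os.listdir(folder) if name.endswith(".wav")
        )
    return counts


with tempfile.TemporaryDirectory() as root:
    # Stand-in data: empty files in place of real recordings.
    for word, n in (("yes", 3), ("no", 2)):
        os.makedirs(os.path.join(root, word))
        for i in range(n):
            open(os.path.join(root, word, "clip_%d.wav" % i), "wb").close()
    print(count_clips(root))
```

On the real dataset, point `count_clips` at the directory where the archive was extracted.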
31 | 
32 | 
33 | 
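34 | To try it yourself, download the prebuilt set of TensorFlow Android demo applications (http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/tensorflow_demo.apk) and open "TF Speech". You need to grant the TF Speech app microphone access; you will then see a list of ten words, each of which lights up when you say it.
35 | 
36 | 
37 | 
38 | 
39 | 
40 | 
41 | 
42 | The results depend on whether your speech patterns are covered by the dataset, so it is not perfect; commercial speech recognition systems are far more complex than this teaching example. But we hope that as more accents and variations are added to the dataset and the community contributes improved models to TensorFlow, the dataset will keep improving and growing.
43 | 
44 | 
45 | 
46 | You can also learn how to train your own model with the new audio recognition tutorial on TensorFlow.org. With the latest development version of the framework (https://hub.docker.com/r/tensorflow/tensorflow/) and a modern desktop machine, you can download the dataset and train the model in a few hours. There are also many options for customizing the neural network to different problems, trading off latency, size and accuracy to suit different platforms.
47 | 
48 | 
49 | 
50 | We look forward to seeing the new applications people build with the help of this dataset and tutorial, and I hope you get a chance to use these resources to start working on audio recognition.
51 | 
52 | 
53 | 
54 | The architecture of the network is described in "Convolutional Neural Networks for Small-footprint Keyword Spotting" (http://www.isca-speech.org/archive/interspeech_2015/papers/i15_1478.pdf), presented at Interspeech 2015.
55 | 
56 | 
57 | 
58 | 
59 | Original article: https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
60 | 
61 | 
62 | 
63 | 
64 | 
65 | This article was compiled by 机器之心; please contact its official account for permission to repost.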
66 | 
76 | 


--------------------------------------------------------------------------------
/librosa_001.txt:
--------------------------------------------------------------------------------
 1 | https://blog.csdn.net/weixin_34191734/article/details/91964709
 2 | 
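 3 | Over the past few days I have been building a speech-to-text model on Tsinghua University's Chinese speech training set, and training turned out to be unbearably slow. With 8911 audio files, a little over 2 GB of data, 4 epochs on a GPU took 2 hours. At that rate 16 epochs would take 8 hours and 80 epochs 40 hours.
 4 | 
 5 | To shorten the training time, I went back over the program.
 6 | 
 7 | First, batch_size was only 16, which is too small. My GPU has 6 GB of memory and can easily take more, so I changed batch_size to 32.
 8 | 
 9 | Second, profiling the program with python -m cProfile showed that librosa.load(wav, mono=True) was very expensive. librosa, the audio processing library, was doing the file loading, and loading a single audio file took as long as 0.2 seconds. A search on librosa confirmed that others had complained about its performance. After comparing the two, I replaced librosa.load(wav, mono=True) with scipy.io.wavfile.read(wav), which reads and loads audio files much faster. (librosa performance discussion: https://github.com/librosa/librosa/issues/572)
10 | 
11 | # Slow version: wav, sr = librosa.load(wav_files[pointer], mono=True)
12 | sr, wav = scipy.io.wavfile.read(wav_files[pointer])  # returns (rate, int16 samples) for 16-bit PCM
13 | wav = wav.astype('float32') / 32767  # scale to roughly [-1, 1]
14 | mfcc = np.transpose(librosa.feature.mfcc(y=wav, sr=sr), [1, 0])  # keyword arguments work across librosa versions
15 | After changing these two places I restarted training, and it was indeed much faster: 16 epochs finished in 2 hours 14 minutes, a big improvement over the original 8 hours. :)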
16 | 
17 |  
18 | 
19 | 后记： 这次测试用的是Tensorflow0.12，总体感觉有些慢。此外，感觉用python处理大数据确实有点性能问题。以后凡是用纯python实现的处理大数据的方法都得留意一下。只要怀疑有问题，可以用profile工具跟踪看看。
20 | 
21 | -----------------------------------------------------------------------------------------------------
22 | 
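23 | Update, 2017-05-15
24 | 
25 | Another round of optimization:
26 | 
27 | 1) Changed the batch normalization coefficient from 1e-8 to (1e-5 + 1e-12), i.e. made it larger.
28 | 
29 | 2) Changed the activation function from "tanh" to "relu", since in theory relu converges faster and performs better.
30 | 
31 | Then I ran 16 epochs again. This time training finished in 2 hours 8 minutes, slightly faster than the 2 hours 14 minutes before this optimization. The loss values printed each epoch also converged noticeably faster; by the end of training most losses were below 200, much better than before.
32 | 
33 | In addition, testing the trained model on a speech clip now produced more text (still inaccurate), rather than the two characters it produced before.
34 | 
35 | -----------------------------------------------------------------------------------------------------
36 | 
37 | Update, 2017-05-16
38 | 
39 | I saved the audio feature vectors and label vectors extracted from the 8,911 files into one large NumPy file (over 500 MB) and load it into memory before training. Each epoch then reads directly from the in-memory NumPy objects, which cuts loading time substantially. With this change, 16 epochs take only about 1 hour 36 minutes.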
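
The caching step described above could look roughly like the following. This is only a sketch under stated assumptions: the function names, the `.npz` layout, the zero-padding of variable-length features, and the random stand-in data are all illustrative and are not taken from the original program, which stored label vectors rather than single integer labels.

```python
import os
import tempfile
import numpy as np

def save_feature_cache(path, features, labels):
    """Zero-pad variable-length (frames, n_mfcc) arrays and store them in one .npz file."""
    lengths = np.array([f.shape[0] for f in features], dtype=np.int32)
    n_mfcc = features[0].shape[1]
    padded = np.zeros((len(features), lengths.max(), n_mfcc), dtype=np.float32)
    for i, f in enumerate(features):
        padded[i, :f.shape[0]] = f
    np.savez(path, features=padded, lengths=lengths, labels=np.asarray(labels))

def load_feature_cache(path):
    """Load the whole cache into memory as plain NumPy arrays."""
    with np.load(path) as data:
        return data["features"], data["lengths"], data["labels"]

# Stand-in data: three fake "utterances" with different frame counts.
rng = np.random.default_rng(0)
fake_features = [rng.standard_normal((n, 20)).astype(np.float32) for n in (5, 8, 3)]
fake_labels = [0, 1, 2]

cache_path = os.path.join(tempfile.mkdtemp(), "train_cache.npz")
save_feature_cache(cache_path, fake_features, fake_labels)
features, lengths, labels = load_feature_cache(cache_path)
print(features.shape, lengths.tolist(), labels.tolist())
```

Keeping the true lengths next to the padded array lets the training loop ignore the padding frames.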
40 | 
41 | 
42 | Reposted from: https://my.oschina.net/qinhui99/blog/899014
43 | 


--------------------------------------------------------------------------------
/linux_001.md:
--------------------------------------------------------------------------------
 1 | ## ETC  
 2 | * search baidupan, GNU_Linux嵌入式快速编程  
 3 | * phoneme, https://github.com/weimingtom/Kuuko/blob/master/README2.md#jvmj2me--cldc  
 4 | 
 5 | ## linux_stm32f7  
 6 | * https://github.com/tnishinaga/linux_stm32f7  
 7 | 
 8 | ## STM32F429I-disco_Buildroot  
 9 | * https://github.com/fdu/STM32F429I-disco_Buildroot  
10 | 
11 | ## Mainline Linux 4.13 running successfully on an embedded stm32f429  
12 | * https://blog.csdn.net/farsight1/article/details/79377337  
13 | 
14 | ## linux-0.11-lab  
15 | https://gitee.com/tinylab/linux-0.11-lab  
16 | 
17 | ## Building an ARM execution environment from scratch with the qemu emulator  
18 | https://www.cnblogs.com/mfmdaoyou/p/6934098.html
19 | 
20 | ## picore, operating system image for the Raspberry Pi    
21 | http://distro.ibiblio.org/tinycorelinux/ports.html  
22 | 
23 | ## piCore (Tiny Core) Linux on Raspberry Pi  
24 | search 印象笔记  
25 | https://iotbyhvm.ooo/picore-tiny-core-linux-on-raspberry-pi/  
26 | How to expand the Raspberry Pi partition  
27 | 
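28 | ## I used to say that studying Linux boards (embedded Linux) is less worthwhile than studying microcontrollers,  
29 | for example, that the Raspberry Pi, v3s, or ARM9 are less worth studying than the rt1052 or stm32h750.  
30 | So what is the right way to study Linux boards? After thinking about it for a long time, I believe  
31 | the following points may be the key to cracking this problem:  
32 | (1) Build: achieve something like what buildroot does, without going overboard;  
33 | being able to run under qemu is enough  
34 | (2) GPIO: abstract the general patterns of driving GPIO  
35 | (3) Framebuffer: how to emulate a framebuffer with qemu, and from there  
36 | port an embedded GUI, then games, then audio/video applications  
37 | (4) Android: how to trim down, build, and emulate the code base of an early Android release,  
38 | and then add custom features  
39 | (5) Drivers: abstract the general patterns of writing drivers, up to driving  
40 | a TFT screen and audio input/output  
41 | (6) Networking and multimedia: hands-on audio/video encoding and decoding  
42 | (7) AI: run deep learning algorithms  
43 | under emulation in qemu  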
44 | 
45 | ## OpenWrt-Rpi  
46 | https://github.com/SuLingGG/OpenWrt-Rpi  
47 | 
48 | ## search, linux内核完全注释  
49 | 
50 | ## pxcore  
51 | * https://github.com/weimingtom/pxcore  
52 | 


--------------------------------------------------------------------------------
/live_001.md:
--------------------------------------------------------------------------------
 1 | ## TODO
 2 | * 印象笔记search 视频通话  
 3 | 
 4 | ## Live  
 5 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/master/live.md
 6 | * https://github.com/weimingtom/wmt_link_collections_in_Chinese/blob/64c3cbe4af31406f6407fcc81742453c3382c1a7/ffmpeg.md
 7 | * https://github.com/weimingtom/wmt_android_screenshot
 8 | * https://github.com/weimingtom/wmt_android_screenshot_adbshell
 9 | * https://github.com/weimingtom/CameraBroadcast
10 | * https://github.com/weimingtom/ffmpeg-wrapper-hls
11 | * 小萝贝
12 | * https://github.com/weimingtom/wmt_screen_study/blob/93f00428f09b4f5c343c208029974f47fe680f3b/README.md
13 | 
14 | ## Camera, Android    
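15 | * I previously studied how live streaming with ffmpeg works (my goal was Android screen-recording live streaming, which requires root, but I never finished it). I got halfway and collected the code and the notes on the test procedure in the CameraBroadcast repository  
16 | https://github.com/weimingtom/CameraBroadcast  
17 | * To keep the whole pipeline transparent, I used an RTMP server built by someone else: nginx-rtmp-win32  
18 | https://github.com/illuspas/nginx-rtmp-win32  
19 | * Another unfinished project of mine is an HLS downloader; I do not think I have solved the progress display problem yet. The repository is here:  
20 | https://github.com/weimingtom/ffmpeg-wrapper-hls  
21 | * Both problems are stuck on the same thing: how to fully understand the ffmpeg code and then move it to Android JNI  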
22 | 
23 | ## Sound Recorder  
24 | * xunfei / android recorder  
25 | * xunfei android client rip  
26 | * https://gitee.com/weimingtom/xunfei  
27 | * github_voice2\Android-SoundRecorder  
28 | * github_wmt\ffmpeg-wrapper-hls  
29 | * https://github.com/xiayouli0122/SoundRecorder  
30 | * github_voice2\SoundRecorder_xiayouli0122  
31 | 
32 | ## Linux SDK about multimedia    
33 | ### （1）Linux app / Linux examples / Linux使用示例  
34 | * https://github.com/c-sky/linux-sdk-examples    
35 | ### （2）c-sky  
36 | * https://github.com/c-sky/linux-sdk-examples  
37 | ### （3）nuc970    
38 | * https://github.com/OpenNuvoton/NUC970_Linux_Applications  
39 | ### （4）树莓派rpi, raspberry pi  
40 | * https://github.com/weimingtom/wmt_rpi_study  
41 | ### （5）萤火虫ROC-RK3308B-CC  
42 | * https://gitlab.com/TeeFirefly/rk3308-linux/-/blob/firefly/external  
43 | * https://github.com/rockchip-linux/buildroot/tree/rockchip/2018.02-rc3/package/rockchip  
44 | 
45 | ## Flash, AMF3  
46 | * https://github.com/weimingtom/ugame/tree/master/projects/amf3test  
47 | 
48 | ## search baidupan, easydarwin  
49 | * easydarwin_rtsp测试服务器_surface平板.rar  
50 | 
51 | ## librtsp  
52 | https://github.com/cijliu/librtsp  
53 | 
54 | ## srs  
55 | https://github.com/ossrs/srs  
56 | 
57 | ## rtmpdump-librtmp  
58 | https://gitee.com/weimingtom/rtmpdump-librtmp  
59 | https://github.com/kulv2012/rtmpdump-librtmp  
60 | 
61 | ## ZLMediaKit  
62 | https://github.com/xia-chu/ZLMediaKit  
63 | 
64 | ## rtmp  
65 | * https://github.com/weimingtom/ugame/tree/master/doc/rtmp
66 | * https://github.com/weimingtom/ugame/tree/master/projects/amf3test
67 | 
68 | ## live555   
69 | * http://live555.com  
70 | 


--------------------------------------------------------------------------------
/lstm_001.txt:
--------------------------------------------------------------------------------
  1 | https://blog.csdn.net/antkillerfarm/article/details/84232764
  2 | 
  3 | word2vec, LSTM Speech Recognition in Practice, Graph Databases
  4 | 
  5 | antkillerfarm, 2018-11-19 10:34:13
  7 | word2vec
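  8 | word2vec is a toolkit for learning word vectors that Google open-sourced in 2013. Its author is Tomas Mikolov.
  9 | 
 10 | GitHub:
 11 | 
 12 | https://github.com/tmikolov/word2vec
 13 | 
 14 | Note: Tomas Mikolov holds a PhD from Brno University of Technology in the Czech Republic and has worked as a researcher at Google and then Facebook.
 15 | 
 16 | The word2vec package also ships a program called word2phrase, which builds phrases from words using co-occurrence statistics. Given how Chinese characters combine into words, it can also be used for word segmentation without any prior data.
 17 | 
 18 | Note: in NLP, the best-known kind of prior data is a segmentation dictionary. Other kinds include the transition matrix of an HMM.
 19 | 
 20 | The general procedure is:
 21 | 
 22 | 1. Split the raw corpus into single characters separated by spaces, which amounts to treating every character as a word.
 23 | 
 24 | 2. Use word2phrase to combine characters into words.
 25 | 
 26 | time ./word2phrase -train 1.txt -output 2.txt -threshold 100 -debug 2
 27 | 
 28 | 3. word2phrase only considers 2-grams, so for words longer than two characters it has to be run iteratively.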
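
Step 1 of the procedure above can be sketched in a few lines of Python. The function name and the sample text are made up for illustration; the original post does not show how the corpus was split.

```python
def split_into_characters(text):
    """Return text with every non-whitespace character separated by a single space."""
    lines = []
    for line in text.splitlines():
        chars = [ch for ch in line if not ch.isspace()]
        lines.append(" ".join(chars))
    return "\n".join(lines)

# Made-up sample input; a real run would read the whole corpus file.
sample = "郭靖和黄蓉\n降龙十八掌"
print(split_into_characters(sample))
```

The output keeps one sentence per line, which is the layout word2phrase reads through its -train argument in step 2.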
 29 | 
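 30 | I tested this with Jin Yong's novels as the corpus. Judging from the results, the method segments proper nouns such as names of people, places, and martial arts moves fairly well, but does poorly on sentences with grammatical structure. For example, "那人" is really two single-character words, yet word2phrase treats it as one two-character word.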
 31 | 
 32 | ./word2vec -train resultbig.txt -output vectors.bin -cbow 0 -size 200 -window 5 -negative 0 -hs 1 -sample 1e-3 -threads 12 -binary 1
 33 | 
 34 | ./distance vectors.bin
 35 | 
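 36 | The result file produced by training stores the vector of every word. Set the binary option to 0 to view the results as plain text.
 37 | 
 38 | Plain text and binary data can be converted into each other with gensim; see: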
 39 | 
 40 | https://github.com/antkillerfarm/antkillerfarm_crazy/blob/master/python/ml/nlp/hello_gensim.py
 41 | 
 42 | 参考：
 43 | 
 44 | http://wei-li.cnblogs.com/p/word2vec.html
 45 | 
 46 | 文本深度表示模型Word2Vec
 47 | 
 48 | http://www.cnblogs.com/wowarsenal/p/3293586.html
 49 | 
 50 | 用中文把玩Google开源的Deep-Learning项目word2vec
 51 | 
 52 | http://www.jianshu.com/p/05800a28c5e4
 53 | 
 54 | 使用word2vec训练wiki中英文语料库
 55 | 
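 56 | LSTM Speech Recognition in Practice
 57 | Dataset
 58 | A search on GitHub first turned up the following projects:
 59 | 
 60 | https://github.com/zzw922cn/Automatic_Speech_Recognition
 61 | 
 62 | https://github.com/pandeydivesh15/AVSR-Deep-Speech
 63 | 
 64 | Unfortunately the TIMIT dataset they use is not free, so I had to give up on them.
 65 | 
 66 | In the end I found this project:
 67 | 
 68 | https://github.com/sdhayalk/TensorFlow_Speech_Recognition_Challenge
 69 | 
 70 | Reproducing the results
 71 | I only tried the simplest model. Unfortunately the code cannot be used as-is and needs some preprocessing:
 72 | 
 73 | https://github.com/antkillerfarm/antkillerfarm_crazy/tree/master/python/ml/tensorflow/TensorFlow_Speech_Recognition_Challenge
 74 | 
 75 | There is another pitfall: the project uses only 11 sound classes and lumps the other 19 into unknown. This gives unknown too much weight and inflates the test accuracy (blindly predicting unknown already scores over 60%), while the real results are poor. The class imbalance has to be handled in some way.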
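
One common way to handle this kind of imbalance is to weight each class inversely to its frequency when computing the loss. The sketch below shows only the weight computation; the function name and the label counts are invented for illustration, and the original post does not say which method it actually used.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count), so rare classes count more."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Made-up label distribution in which "unknown" dominates, as in the pitfall above.
labels = ["unknown"] * 60 + ["yes"] * 20 + ["no"] * 20
weights = inverse_frequency_weights(labels)
print(weights)
```

With these weights every class contributes the same total weight to the loss, so always predicting unknown no longer pays off. Downsampling the unknown class is an alternative with a similar effect.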
 76 | 
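 77 | In the end, the reproduced accuracy was roughly 75% to 80%, and training took about 16 hours.
 78 | 
 79 | Tuning, round 1
 80 | Extending the labels to all 30 classes gave slightly higher accuracy, but still only around 80%. That is a discouraging result for such a long training run, so I tuned the model with warpCTC as a reference.
 81 | 
 82 | 1. Reduce the LSTM from 3 layers to 1.
 83 | 
 84 | 2. Use CTC loss. (See 《深度学习（二十九）》.)
 85 | 
 86 | In the captcha recognition example, given two images that read 123 and 4567, the labels are:
 87 | 
 88 | [[1,2,3,0]
 89 |  [4,5,6,7]]
 92 | Although English is written phonetically, using individual letters directly as labels is clearly not very precise.
 93 | 
 94 | This is where the ARPABET table comes in; it can be seen as another notation for the International Phonetic Alphabet:
 95 | 
 96 | https://en.wikipedia.org/wiki/ARPABET
 97 | 
 98 | The following tool converts English words to their ARPABET representation:
 99 | 
100 | http://www.speech.cs.cmu.edu/tools/lextool.html
101 | 
102 | The dictionary used by this tool is at:
103 | 
104 | http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/
105 | 
106 | Accuracy rose sharply, to 90%.
107 | 
108 | Tuning, round 2
109 | 1. Replace the LSTM with a BiLSTM.
110 | 
111 | 2. Process the spectrum with a 1x1 convolution. Giving the feature maps different weights helps to strengthen useful sound and weaken noise.
112 | 
113 | 3. Use 3 FC layers. The FC layers act only on the frequency bins of a single time step and never across time steps.
114 | 
115 | For the principle, see Deep Speech 2 in 《深度学习（三十）》.
116 | 
117 | Accuracy rose again, to 96%. Without the change in step 1, accuracy is about 94%, but computation is much faster, at about 2 hours.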
118 | 
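119 | fftw
120 | fftw is an FFT library written in C by Matteo Frigo and Steven G. Johnson at MIT.
121 | 
122 | A simple FFT implementation takes only a few dozen lines of code, so this 3 MB+ giant is obviously doing more than that. It uses assembly, parallelism, and other acceleration techniques, and it also supports the DCT and DST.
123 | 
124 | Website:
125 | 
126 | http://fftw.org/
127 | 
128 | Code:
129 | 
130 | https://github.com/FFTW/fftw3
131 | 
132 | However, because fftw's code is generated automatically, this repository is really meant for specialists. Ordinary users should just download the source package from the website.
133 | 
134 | References:
135 | 
136 | https://blog.csdn.net/congwulong/article/details/7576012
137 | 
138 | FFTW reference (in Chinese)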
139 | 
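140 | aubio
141 | aubio is an audio analysis library written in C that can extract features such as fbank and MFCC.
142 | 
143 | Finding aubio took some detours. I recently needed to port MFCC extraction to an embedded platform, so the code had to be in C.
144 | 
145 | 1. Kaldi is written in C++, which does not meet the requirement.
146 | 
147 | 2. The core of scipy.fftpack is written in C and Fortran, and the most important part is in Fortran.
148 | 
149 | 3. For Java, jMIR is a good choice.
150 | 
151 | Code:
152 | 
153 | https://github.com/aubio/aubio
154 | 
155 | Installation:
156 | 
157 | sudo apt-get install python3-aubio python-aubio aubio-tools libaubio-dev
158 | 
159 | aubio stores its FFT results in polar form (magnitude and phase), whereas LibROSA stores them in rectangular form (real and imaginary parts).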
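
Converting between the two representations is a one-liner in NumPy. The sketch below does not call aubio or LibROSA; `np.fft.rfft` stands in for a library that returns complex (rectangular) values, and the function names are illustrative.

```python
import numpy as np

def polar_to_rect(norm, phase):
    """Combine magnitude and phase arrays into one complex spectrum."""
    return norm * np.exp(1j * phase)

def rect_to_polar(spectrum):
    """Split a complex spectrum into (magnitude, phase in radians)."""
    return np.abs(spectrum), np.angle(spectrum)

# Stand-in frame: four cycles of a sine wave over 64 samples.
frame = np.sin(2 * np.pi * 4 * np.arange(64) / 64)
spectrum = np.fft.rfft(frame)              # rectangular form
norm, phase = rect_to_polar(spectrum)      # polar form
roundtrip = polar_to_rect(norm, phase)
print(np.allclose(roundtrip, spectrum))
```

When comparing the output of the two libraries, convert both to the same form first; magnitudes can be compared directly, while phases only agree up to multiples of 2π.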
160 | 
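161 | Example 1: check that the environment is installed correctly, for both C and Python.
162 | 
163 | https://github.com/antkillerfarm/antkillerfarm_crazy/tree/master/helloworld/aubio/1
164 | 
165 | Example 2: Python: get the spectrum of a wav file. C: log redirection plus reading the contents of a wav file.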
166 | 
167 | https://github.com/antkillerfarm/antkillerfarm_crazy/tree/master/helloworld/aubio/2
168 | 
169 | 参考：
170 | 
171 | http://www.cnblogs.com/daleloogn/p/4510137.html
172 | 
173 | 音乐检索研究中使用的工具
174 | 
175 | Ooura
176 | Takuya Ooura, of the Research Institute for Mathematical Sciences at Kyoto University, wrote a numerical computing package known as Ooura, which includes an FFT implementation. It is also aubio's default FFT implementation.
177 | 
178 | Code:
179 | 
180 | http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html
181 | 
182 | The author's list of FFT libraries:
183 | 
184 | http://www.kurims.kyoto-u.ac.jp/~ooura/fftlinks.html
185 | 
186 | LibROSA
187 | LibROSA is a Python library for analyzing music and speech.
188 | 
189 | Website:
190 | 
191 | http://librosa.github.io/
192 | 
193 | Code:
194 | 
195 | https://github.com/librosa/librosa
196 | 
197 | Documentation:
198 | 
199 | http://librosa.github.io/librosa/
200 | 
201 | References:
202 | 
203 | http://www.cnblogs.com/xingshansi/p/6816308.html
204 | 
205 | Audio feature extraction: using the librosa toolkit
206 | 
207 | python_speech_features
208 | python_speech_features is another Python library for analyzing music and speech.
209 | 
210 | Code:
211 | 
212 | https://github.com/jameslyons/python_speech_features
213 | 
214 | Documentation:
215 | 
216 | https://python-speech-features.readthedocs.io/en/latest/
217 | 
218 | References
219 | Paper:
220 | 
221 | "Small-footprint Keyword Spotting Using Deep Neural Network and Connectionist Temporal Classifier"
222 | 
223 | This Keyword Spotting (KWS) paper from Ant Financial is a close match for the Speech Commands Dataset used in this exercise and is worth consulting.
224 | 
225 | http://mp.weixin.qq.com/s/-QQjz61VAOVcWE7j-EJPhg
226 | 
227 | On Ant Financial's voice wake-up system
228 | 
229 | Two more write-ups on model tuning:
230 | 
231 | https://zhuanlan.zhihu.com/p/28133530
232 | 
233 | A CTC-RNN hyperparameter tuning experience
234 | 
235 | http://www.tbluche.com/ctc_and_blank.html
236 | 
237 | The intriguing blank label in CTC
238 | 
239 | http://spandh.dcs.shef.ac.uk/chime_challenge/chime2016/
240 | 
241 | CHiME (Computational Hearing in Multisource Environments): an international multichannel speech separation and recognition challenge
242 | 
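243 | Graph Databases
244 | Neo4j
245 | Neo4j is probably the best-known graph database, with all the features of a mature, robust database. Compared with relational databases, graph databases are good at handling large volumes of complex, interconnected, loosely structured data that changes quickly and is queried frequently. In a relational database such queries require many table joins and therefore cause performance problems.
246 | 
247 | Website:
248 | 
249 | https://neo4j.com/
250 | 
251 | References:
252 | 
253 | http://blog.csdn.net/xingxiupaioxue/article/details/71747284
254 | 
255 | How to import large-scale data into Neo4j
256 | 
257 | https://mp.weixin.qq.com/s/_Zm88TyBcXAZ4LeQOuJCHA
258 | 
259 | Managing Neo4j users
260 | 
261 | https://mp.weixin.qq.com/s/dzPZTqUhWKIiKj2o7OkMbA
262 | 
263 | Neo4j-Driver, a Python library for Neo4j
264 | 
265 | https://mp.weixin.qq.com/s/mupuyM7m_41eOzQc7LGRRw
266 | 
267 | Neomodel, a Python library for Neo4j
268 | 
269 | https://mp.weixin.qq.com/s/YVo6KduIvckYKH53fjDogw
270 | 
271 | Graph algorithms in the Neo4j extension package APOC
272 | 
273 | neo4j-graph-algorithms
274 | The Neo4j Graph Algorithms extension is a jar package that bundles common graph algorithms such as community detection, path expansion, centrality computation, and PageRank.
275 | 
276 | Code:
277 | 
278 | https://github.com/neo4j-contrib/neo4j-graph-algorithms
279 | 
280 | openCypher
281 | openCypher is based on Cypher, Neo4j's query language for storing and retrieving data in a graph database. The graph database field does not yet have a common query language standard comparable to SQL for relational databases.
282 | 
283 | openCypher aims to promote graph processing and analysis by simplifying storage, analysis, and the tooling platforms used to access graph data models. Vendors can implement Cypher in their own tools and platforms.
284 | 
285 | Website:
286 | 
287 | http://www.opencypher.org/
288 | 
289 | RedisGraph
290 | RedisGraph is a Redis-based graph database released by Redis.
291 | 
292 | Website:
293 | 
294 | http://redisgraph.io/
295 | 
296 | References:
297 | 
298 | https://mp.weixin.qq.com/s/BzQBy6AoMXXpjsdGyXh1zA
299 | 
300 | Inside RedisGraph: a high-performance in-memory graph database embedded in Redis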
301 | 


--------------------------------------------------------------------------------
/lstm_002.txt:
--------------------------------------------------------------------------------
  1 | https://www.cnblogs.com/followees/p/10422809.html
  2 | 
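  3 | Speech Recognition (LSTM+CTC)
  4 | For the full version, follow the WeChat account "大数据技术宅".
  5 | 
  6 | Preface: speech recognition is an important research direction in artificial intelligence and has advanced rapidly in recent years, with RNNs making a particularly notable contribution. RNNs were designed so that neural networks can process sequential data. This article walks through the basics of speech recognition step by step.
  7 | 
  8 | Contents
  9 |  
 10 | 
 11 | Environment setup
 12 | 
 13 | Introduction to RNN and LSTM (RNN; LSTM) / Introduction to speech recognition (acoustic feature extraction; converting acoustic features into phonemes (the acoustic model); phonemes to text (language model + decoding)) / A simple speech recognition implementation (extracting features from the WAV file; converting the WAV file's transcript into phoneme classes; defining the bidirectional LSTM)
 14 | 
 15 | Model training and testing
 16 | 
 17 |  
 18 | 
 19 | Environment setup
 20 | 1. Windows 10
 21 | 2. Python 3.6.4
 22 | 3. pip3
 23 | 4. TensorFlow 1.12.0
 24 | (If a Python module is reported missing when you run the code, install it with pip3.)
 25 | 
 26 | Introduction to RNN and LSTM
 27 | A recurrent neural network (RNN) is a type of neural network in which some of the connections between neurons form directed cycles. These cycles give the RNN an internal state, or memory, and with it the ability to model dynamic sequences. The next two subsections describe the RNN and its variant, the Long Short-Term Memory (LSTM) network, in detail.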
 28 | 
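 29 | RNN
 30 | Figure 1 shows a simple recurrent neural network called an Elman network. It has three layers: input, hidden, and output. The context unit stores the previous value of the hidden layer and feeds it into the hidden layer together with the next input (in the figure, solid lines denote direct copies and dashed lines denote connections that must be learned). Elman networks and Jordan networks are the simplest forms of recurrent neural network; this article covers only the Elman network, and interested readers can look up Jordan networks on their own.
 31 | 
 32 | 
 33 | 
 34 | Figure 1: Elman network
 35 | 
 36 | The Elman network is expressed mathematically as follows.
 37 | Let:
 38 | 
 39 | x(t): the input vector at time t;
 40 | 
 41 | h(t): the hidden vector at time t;
 42 | 
 43 | y(t): the output vector at time t;
 44 | 
 45 | W, U, and b: the parameters (weight matrices and bias vectors);
 46 | 
 47 | sigma(h) and sigma(y): activation functions.
 48 | 
 49 | The hidden-layer and output-layer vectors can then be written as: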
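
The formula image is missing from this copy. A standard form of the Elman network that is consistent with the symbols defined above would be the following; the subscripts on W, U, and b are an assumed notation and are not taken from the original figure.

```latex
\begin{aligned}
h_t &= \sigma_h\left(W_h x_t + U_h h_{t-1} + b_h\right) \\
y_t &= \sigma_y\left(W_y h_t + b_y\right)
\end{aligned}
```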
 50 | 
 51 | 
 52 | 
 53 | 
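 54 | LSTM
 55 | Long Short-Term Memory (LSTM) is a kind of RNN first proposed by Hochreiter and Schmidhuber in 1997. The model overcomes some shortcomings of plain RNNs and is deliberately designed to avoid the long-term dependency problem. The translation and speech recognition technology of many large companies is now built mainly on LSTM. The structure of the LSTM is described in detail below; Figure 2 shows the structure of an LSTM unit.
 56 | 
 57 | 
 58 | 
 59 | Figure 2: LSTM unit structure
 60 | 
 61 | To avoid the vanishing and exploding gradient problems of RNNs, the LSTM unit differs considerably from a plain RNN unit. The core ideas are:
 62 | (1) A channel called the "cell state" runs through the whole time sequence, as shown in Figure 3 from C(t-1) to C(t); only multiplication and addition operations lie on this line.
 63 | (2) "Gate" structures remove information from or add information to the cell state. An LSTM has three gates: the forget gate, the input gate, and the output gate.
 64 | 
 65 | 
 66 | 
 67 | Figure 3: the cell state channel
 68 | 
 69 | The three gates of the LSTM are described in detail below.
 70 | (1) Forget gate
 71 | The bold red part of Figure 4 marks the position of the forget gate in the LSTM unit. The forget gate decides how much of the information in the previous state should be discarded. It outputs a number between 0 and 1 that represents the fraction of C(t-1) to keep. The forget gate is computed as: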
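
The formula image is missing from this copy. The standard forget gate, written with the parameter names W(f) and U(f) used in the text, would be as follows; the bias term b_f is an assumption, since the text does not mention it.

```latex
f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)
```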
 72 | 
 73 | 
 74 | 
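 75 | The inputs to the forget gate are x(t) and the previous hidden-layer output h(t-1); W(f) and U(f) are the forget gate's parameters and are learned during training.
 76 | 
 77 | 
 78 | 
 79 | Figure 4: forget gate
 80 | 
 81 | (2) Input gate
 82 | 
 83 | The bold red part of Figure 5 is the input gate, which decides what input information should be kept in the cell state C(t). It reads h(t-1) and x(t) and is computed as: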
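
The formula images are missing from this copy. The standard input gate and candidate state, written with the parameter names W(i), U(i), W(c), U(c) used in the text, would be as follows; the bias terms and the tilde notation for the candidate state are assumptions.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)
\end{aligned}
```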
 84 | 
 85 | 
 86 | 
 87 | 
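 88 | Here the inputs are h(t-1) and x(t), and W(i), U(i), W(c), U(c) are the parameters to be trained.
 89 | 
 90 | 
 91 | 
 92 | Figure 5: input gate
 93 | 
 94 | Next, consider how the cell state is updated. First the forget gate determines how much of the old cell state is discarded; then the result of the input gate is added to the cell state, which determines how much of the new input information enters it. The formula is: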
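
The formula image is missing from this copy. In the standard LSTM the update described here is the following, where f_t is the forget gate value, i_t is the input gate value, the tilde term is the candidate state, and ⊙ is element-wise multiplication (this notation is assumed, not taken from the original figure).

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```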
 95 | 
 96 | 
 97 | 
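 98 | The cell state update is shown in Figure 6:
 99 | 
100 | 
101 | 
102 | Figure 6: cell state update
103 | 
104 | (3) Output gate
105 | 
106 | After the cell state is updated, the output is computed from it. First, the inputs h(t-1) and x(t) pass through a sigmoid activation to give the value of the output gate. Then the cell state is passed through tanh and multiplied by the output gate value to give the output of the unit. The output gate formulas are: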
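
The formula images are missing from this copy. The standard output gate that matches this description would be as follows; the parameter names W_o, U_o, and b_o are assumptions, since the text does not name them.

```latex
\begin{aligned}
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```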
107 | 
108 | 
109 | 
110 | 
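111 | The output gate computation is the bold red part of Figure 7:
112 | 
113 | 
114 | 
115 | Figure 7: output gate
116 | 
117 | Introduction to speech recognition
118 | The main stages of speech recognition are: (1) extract acoustic features from the sound waveform; (2) convert the acoustic features into phonemes; (3) use decoding techniques such as a language model to turn them into readable text. The typical structure of a speech recognition system is shown in Figure 8:
119 | 
120 | 
121 | 
122 | Figure 8: speech recognition structure
123 | 
124 | Acoustic feature extraction
125 | Sound is a wave. The raw audio file is a WAV file, which apart from a file header stores the individual sample points of the waveform, as shown in Figure 9:
126 | 
127 | 
128 | 
129 | Figure 9: sound waveform
130 | 
131 | To analyze sound, first split it into frames: cut it into many short segments, with some overlap between consecutive frames. In Figure 10 each frame is 25 ms long and the frame shift is 10 ms, so adjacent frames overlap by 25-10=15 ms.
132 | 
133 | 
134 | 
135 | Figure 10: frame splitting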
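
The framing step with a 25 ms frame and a 10 ms shift can be sketched with NumPy as follows. The function name, the 16 kHz sampling rate, and the choice to drop trailing samples that do not fill a whole frame are assumptions made for this illustration; feature libraries often pad the last frame instead.

```python
import numpy as np

def frame_signal(signal, sample_rate, frame_ms=25, shift_ms=10):
    """Slice a 1-D signal into overlapping frames; leftover tail samples are dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    if len(signal) < frame_len:
        return np.empty((0, frame_len), dtype=signal.dtype)
    n_frames = 1 + (len(signal) - frame_len) // shift
    starts = shift * np.arange(n_frames)
    # Row i holds samples starts[i] .. starts[i] + frame_len - 1
    index = starts[:, None] + np.arange(frame_len)[None, :]
    return signal[index]

sample_rate = 16000                                   # assumed sampling rate
signal = np.arange(sample_rate, dtype=np.float32)     # one second of stand-in samples
frames = frame_signal(signal, sample_rate)
print(frames.shape)
```

At 16 kHz a frame is 400 samples and the shift is 160 samples, so consecutive rows share 240 samples, which is the 15 ms overlap mentioned above.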
136 | 
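137 | After framing, the audio data is a collection of short segments, and features are extracted from each one. Common feature extraction methods include Linear Predictive Coding (LPC) and Mel-frequency cepstral coefficients (MFCC). Turning one frame of waveform into a multidimensional vector is what acoustic feature extraction means.
138 | 
139 | Converting acoustic features into phonemes (the acoustic model)
140 | Phonemes are the basic units of human pronunciation. For English, a commonly used set has 39 phonemes. For Chinese, the phoneme set is essentially the initials and finals of Hanyu Pinyin. The LSTM+CTC network in this article's example handles this stage, converting acoustic features into phonemes; the model for this stage is called the acoustic model.
141 | 
142 | Phonemes to text (language model + decoding)
143 | Once the phoneme sequence is obtained, decoding techniques such as a language model can turn it into readable text. Decoding computes acoustic-model and language-model scores for the given phoneme sequence against a number of hypothesized word sequences and returns the sequence with the highest overall score as the recognition result (this part is fairly complex; interested readers can consult the literature).
144 | 
145 | A simple speech recognition implementation
146 | This article uses a simple example to show how to build end-to-end speech recognition with TensorFlow's LSTM+CTC. To keep things simple, the example trains on a single sentence, and the phoneme classes are simplified to the corresponding letters (the training procedure works the same way as with real phonemes). The computation is shown in the figure below:
147 | 
148 | 
149 | 
150 | Extracting features from the WAV file
151 | What is a WAV file? Briefly: WAV is a sound file format developed by Microsoft, also called a waveform audio file. It is one of the earliest digital audio formats, is widely supported by Windows and its applications, and typically stores audio data losslessly.
152 | 
153 | After reading the data from the WAV file, this article uses the python_speech_features package to compute the file's MFCC features. The code is as follows: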
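154 | import numpy as np
155 | import scipy.io.wavfile as wav
156 | from python_speech_features import mfcc
157 | 
158 | def get_audio_feature():
159 |   '''Return the MFCC features extracted from the wav file, plus the sequence length.'''
160 |   audio_filename = "audio.wav"
161 |   # Read the wav file: fs is the sampling rate, audio is the sample data
162 |   fs, audio = wav.read(audio_filename)
163 |   # Extract MFCC features, shape (num_frames, 13)
164 |   inputs = mfcc(audio, samplerate=fs)
165 |   # Add a batch dimension, then normalize: subtract the mean, divide by the standard deviation
166 |   feature_inputs = np.asarray(inputs[np.newaxis, :])
167 |   feature_inputs = (feature_inputs - np.mean(feature_inputs))/np.std(feature_inputs)
168 | 
169 |   # Sequence length of the feature data
170 |   feature_seq_len = [feature_inputs.shape[1]]
171 |   return feature_inputs, feature_seq_len
172 | The return value feature_seq_len is a one-element list holding the number of frames the audio was split into; each frame yields a 13-dimensional feature vector. The return value feature_inputs is a 3-D array of shape (1, number of frames, 13): a batch of one, with one row per frame and 13 columns.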
173 | 
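174 | Converting the WAV file's transcript into phoneme classes
175 | There are 28 phoneme classes in this article: the 26 English letters, the space, and one class for frames that belong to none of them. The transcript for the WAV file is "she had your dark suit in greasy wash water all year". Converting this sentence to a sequence of integers, with the space as 0 and a-z as 1-26, gives: [19 8 5 0 8 1 4 0 25 15 21 18 0 4 1 18 11 0 19 21 9 20 0 9 14 0 7 18 5 1 19 25 0 23 1 19 8 0 23 1 20 5 18 0 1 12 12 0 25 5 1 18]. Finally the whole sequence is converted into a sparse triple so that it can be fed directly to TensorFlow's tf.sparse_placeholder. The conversion code is as follows: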
176 | 
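177 | # SPACE_TOKEN, SPACE_INDEX and FIRST_INDEX are assumed to be '<space>', 0 and ord('a') - 1
178 | def get_audio_label(target_filename):  # name and parameter are assumed; the original omits the def line
179 |   with open(target_filename, 'r') as f:
180 |     # The original text is "she had your dark suit in greasy wash water all year"
181 |     line = f.readlines()[0].strip()
182 |   # Double each space so that split(' ') leaves an empty string at every word boundary
183 |   targets = line.replace(' ', '  ').split(' ')
184 |   targets = np.hstack([SPACE_TOKEN if x == '' else list(x) for x in targets])
185 |   targets = np.asarray([SPACE_INDEX if x == SPACE_TOKEN else ord(x) - FIRST_INDEX
186 |                         for x in targets])
187 |   # Convert the list into a sparse triple (sparse_tuple_from is a helper defined elsewhere in the original code)
188 |   return sparse_tuple_from([targets])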
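
As a cross-check on the integer sequence quoted above, here is a standalone sketch of the same mapping together with a minimal version of the sparse triple. The function names are illustrative, and `to_sparse_triple` is only an assumed stand-in for the `sparse_tuple_from` helper, whose source is not shown in the article.

```python
import numpy as np

SPACE_INDEX = 0               # the space maps to 0
FIRST_INDEX = ord('a') - 1    # so that 'a' maps to 1 and 'z' to 26

def encode_text(text):
    """Map lowercase letters to 1..26 and spaces to 0."""
    return np.array([SPACE_INDEX if ch == ' ' else ord(ch) - FIRST_INDEX for ch in text],
                    dtype=np.int32)

def to_sparse_triple(sequences):
    """Build (indices, values, dense_shape) for a batch of label sequences."""
    indices, values = [], []
    for row, seq in enumerate(sequences):
        indices.extend((row, col) for col in range(len(seq)))
        values.extend(seq)
    shape = (len(sequences), max(len(seq) for seq in sequences))
    return (np.asarray(indices, dtype=np.int64),
            np.asarray(values, dtype=np.int32),
            np.asarray(shape, dtype=np.int64))

sentence = "she had your dark suit in greasy wash water all year"
targets = encode_text(sentence)
indices, values, shape = to_sparse_triple([targets])
print(targets.tolist())
print(shape.tolist())
```

The sentence has 42 letters and 10 spaces, so the encoded sequence has 52 entries and the dense shape of the batch is [1, 52].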
189 | Defining the bidirectional LSTM
190 | The code that defines the bidirectional LSTM and the feature mapping after it is as follows:
191 | 
192 | def inference(inputs, seq_len):
193 |   # Build a separate LSTM cell per layer, each with num_hidden (40) hidden units
194 |   def make_cell():
195 |     return tf.contrib.rnn.LSTMCell(num_hidden,
196 |                         initializer=tf.random_normal_initializer(
197 |                                         mean=0.0, stddev=0.1),
198 |                         state_is_tuple=True)
199 | 
200 |   # Forward-direction cells, one per layer
201 |   cells_fw = [make_cell() for _ in range(num_layers)]
202 |   # Backward-direction cells, one per layer
203 |   cells_bw = [make_cell() for _ in range(num_layers)]
204 |   outputs, _, _ = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(cells_fw,
205 |                                                                  cells_bw,
206 |                                                                  inputs,
207 |                                                                  dtype=tf.float32,
208 |                                                                  sequence_length=seq_len)
209 | 
210 |   shape = tf.shape(inputs)
211 |   batch_s, max_timesteps = shape[0], shape[1]
212 |   # The bidirectional output concatenates both directions, so its width is 2 * num_hidden
213 |   outputs = tf.reshape(outputs, [-1, 2 * num_hidden])
214 | 
215 |   W = tf.Variable(tf.truncated_normal([2 * num_hidden,
216 |                                        num_classes],
217 |                                       stddev=0.1))
218 | 
219 |   b = tf.Variable(tf.constant(0., shape=[num_classes]))
220 |   # Fully connected linear projection
221 |   logits = tf.matmul(outputs, W) + b
222 |   # Reshape back to [batch_size, max_timesteps, num_classes];
223 |   # the width fed to CTC must be num_classes = 26 + 2
224 |   logits = tf.reshape(logits, [batch_s, -1, num_classes])
225 |   # Transpose to time-major: sequence length first, batch_size second,
226 |   # which is the input layout TensorFlow's CTC expects
227 |   logits = tf.transpose(logits, (1, 0, 2))
228 |   return logits
233 | Model training and testing
234 | Finally, data loading, the LSTM+CTC network, and the training procedure are combined; after 500 training iterations the model is tested and the result is printed. Part of the code is shown below (for the full code, follow the WeChat account "大数据技术宅" and send "语音识别demo"):
235 | 
236 | def main():
237 |   # Input features, shape [batch_size, sequence length, features per frame]
238 |   inputs = tf.placeholder(tf.float32, [None, None, num_features])
239 | 
240 |   # Labels for the input data; tf.sparse_placeholder produces a SparseTensor,
241 |   # which can be passed straight to CTC to compute the loss
242 |   targets = tf.sparse_placeholder(tf.int32)
243 | 
244 |   # Sequence lengths, shape [batch_size]:
245 |   # the number of valid frames in each sample of the batch
246 |   seq_len = tf.placeholder(tf.int32, [None])
247 | 
248 |   # Forward pass: define the network; the input is the features, the output feeds the CTC loss
249 |   logits = inference(inputs, seq_len)
250 | 
251 |   # CTC loss
252 |   # targets must be an int32 sparse tensor (tf.SparseTensor)
253 |   # logits is the output of the LSTM network above
254 |   # seq_len holds the sequence length of each sample in the batch
255 |   loss = tf.nn.ctc_loss(targets, logits, seq_len)
256 |   # Mean loss over the batch
257 |   cost = tf.reduce_mean(loss)
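
The excerpt stops at the loss, so the decoding side is not shown. As a rough illustration of what happens to the logits at test time, here is a NumPy sketch of greedy (best-path) CTC decoding for a single utterance. It is not the article's code: the function names and the hand-made one-hot "logits" are assumptions, and the blank is placed at index num_classes - 1 because that is the index TensorFlow 1.x's tf.nn.ctc_loss reserves for it.

```python
import numpy as np

def greedy_ctc_decode(logits, blank_index):
    """Best-path decoding: argmax per frame, merge repeated labels, drop blanks."""
    best_path = np.argmax(logits, axis=-1)
    decoded = []
    previous = None
    for label in best_path:
        if label != previous and label != blank_index:
            decoded.append(int(label))
        previous = label
    return decoded

def labels_to_text(labels):
    """Inverse of the article's mapping: 0 -> space, 1..26 -> 'a'..'z'."""
    return "".join(' ' if i == 0 else chr(i + ord('a') - 1) for i in labels)

num_classes = 28
blank = num_classes - 1
# Hand-made frame-wise winners: s s <blank> h e e <blank> e
path = [19, 19, blank, 8, 5, 5, blank, 5]
logits = np.eye(num_classes)[path]        # one-hot "logits", shape (8, 28)
decoded = greedy_ctc_decode(logits, blank)
text = labels_to_text(decoded)
print(text)
```

The blank between the two runs of "e" is what allows a doubled letter to survive the merging of repeats.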
258 | 
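259 | The training process and results are shown in the figure below:
260 | 
261 | 
262 | 
263 | The training results above show clearly that after 500 training iterations the speech file is recognized almost perfectly. This example only demonstrates a simple end-to-end LSTM+CTC training run; a real speech recognition system also needs a large number of training samples and a decoding stage that turns phonemes into text. Later articles will go deeper into speech recognition.
264 | Finally, at the start of 2019, best wishes to all keen learners for a prosperous Year of the Pig.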
265 |        
266 |        
267 | 


--------------------------------------------------------------------------------
/mace_001.txt:
--------------------------------------------------------------------------------
1 | https://github.com/XiaoMi/mace/issues/732
2 | https://github.com/XiaoMi/mace/issues/541
3 | https://github.com/XiaoMi/mace-models/blob/master/kaldi-models/nnet3/callhome.yml
4 | Xiaomi's mobile deep learning framework in practice, AICON2018
5 | https://github.com/XiaoMi/mace/issues/287
6 | 


--------------------------------------------------------------------------------
/mbed_001.txt:
--------------------------------------------------------------------------------
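 1 | I have tested that the mbed 2 test project from ML-KWS-for-MCU runs correctly on the NUCLEO_F411RE
 2 | https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment
 3 | If you are interested, you can buy a NUCLEO_F411RE development board and try it. It is not expensive, and its on-board st-link v2.1 supports drag-and-drop flashing through a USB drive (like micro:bit and daplink), which suits mbed development.
 4 | If you plan to buy this board, there are a few things to consider:
 5 | (1) I recommend Windows 7, because a separate driver has to be installed and XP may not work.
 6 | (2) st-link v2.1 can be flashed with STM32CubeProgrammer, but even after installing STM32CubeProgrammer you still need to install the Windows driver (the jumpers on the board do not need to be changed). If the LED of the on-board st-link is blinking, usually either the driver is not installed or the data cable is bad (or the cable only supplies power and carries no USB data).
 7 | (3) The USB cable needs the small squarish connector; an ordinary phone data cable will not work.
 8 | (4) If the driver is working (a reboot may be needed), connecting the on-board st-link exposes a USB drive and a debug serial port. The debug serial port defaults to 9600-8-1-none-none; with a different baud rate you may not see any debug output. In the program a plain printf prints to this virtual debug serial port, which is not one of the three serial ports of the stm32f411. So to check whether the ML-KWS-for-MCU code is running you need to modify the code yourself, adding LED-blinking code and adapting the printf code; search the web for mbed 2 blinky.
 9 | (5) It is best to understand the mbed-cli installation and build process before buying the board (as I have said before).
10 | Also, because printf only outputs to the virtual debug serial port of st-link v2.1, when I earlier flashed the ML-KWS-for-MCU firmware onto a small stm32f411ce board I could not possibly see any serial output (clone st-links lack this feature). The code has to be modified, and it is best to put the body of main in an infinite loop.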
16 | 
17 | 
18 | 
19 | 
20 | 
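21 | Here is a summary of the build pitfalls of ARM's ml-kws, in the hope that it helps; I may keep refining it later.
22 | (1) OS environment for mbed-cli. I recommend Windows, although Linux also works (I explain below why the two are much the same). Make sure you have 10 GB of disk space, because mbed takes up a lot of it. I suggest a fresh install of python 3.7.
23 | (2) Installing the mbed-cli Python package. If you want to debug mbed-cli errors (mbed commands fail often), install the mbed-cli package into your Python environment from source (python setup.py install). If you want the easy route, pip3 install mbed-cli installs the latest version online.
24 | (3) Setting the ARM_GCC environment variable. This is documented online, so I will not repeat it. If you skip this step, mbed compile may fail, but mbed-cli will point out the ARM_GCC problem, because the mbed-sdk-tools that mbed-cli calls check whether the gcc toolchain executables with the expected names exist under the path ARM_GCC points to.
25 | (4) The mbed new command. The official recommendation is mbed new <project name> --mbedlib, but in practice these project-creation commands are likely to fail while downloading packages (there is too much to download). This pitfall is fairly serious. My workaround is to create the project with mbed new kws_simple_test --create-only, then download the mbed (actually mbed 2) and mbed-sdk-tools repositories separately and unpack them into the mbed and tools subdirectories respectively (the reason is visible in mbed-cli's Python code). After that mbed deploy succeeds. Otherwise mbed deploy fails, and so does mbed compile.
26 | (5) Downloading the mbed (actually mbed 2) code. This is another pitfall. The zip file downloaded for mbed is missing files (for example mbed.h and the drivers directory):
27 | https://os.mbed.com/users/mbed_official/code/mbed/
28 | This has been reported in an issue on github. The reason is simple: the repository is too large, and it does not support git clone (it seems to use hg for version control). The only fix is to download or copy the missing files (such as mbed.h) one by one, but there are not many of them, so it is not much work (some missing BSP files can be ignored because they are not used).
29 | (6) Building with mbed compile. Once the problems above are solved, this step is easy. Watch out for one thing, though: never put CMSIS_5 or other dependency libraries inside the project directory, e.g. kws_simple_test, because mbed-cli recursively scans every subdirectory of the project directory, which causes serious build problems (every subdirectory becomes an include directory). The official example already handles this by placing CMSIS_5 outside the example project directory rather than inside it. Also, during the build mbed-cli just indirectly calls the Python scripts under the tools directory, so mbed-cli is only a shell; build behavior depends on mbed-sdk-tools
30 | https://os.mbed.com/users/mbed_official/code/mbed-sdk-tools/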
41 | 
42 | 


--------------------------------------------------------------------------------
/msys_001.md:
--------------------------------------------------------------------------------
1 | ## msys  
2 | * https://github.com/weimingtom/wmt_nrf_study  
3 | * https://github.com/weimingtom/wmt_xr872_study  
4 | * https://github.com/weimingtom/wmt_hi3861_study  
5 | * https://github.com/weimingtom/wmt_esp32_study  
6 | 
7 | ## msys2  
8 | * https://github.com/weimingtom/wmt_esp32_study  
9 | 


--------------------------------------------------------------------------------
/numpy_001.md:
--------------------------------------------------------------------------------
 1 | ## miniconda (without numpy)    
 2 | Miniconda3-latest-Windows-x86_64.exe  
 3 | https://docs.conda.io/en/latest/miniconda.html  
 4 | 
 5 | ## Thonny 3.3.5 (for Windows) (without numpy)      
 6 | https://thonny.org  
 7 | thonny-3.3.5.exe  
 8 | Scripts\pip.bat install numpy==1.14.6 matplotlib==2.2.3 pandas==0.23.4  
 9 | 
10 | ## 【安富莱——DSP教程】第28章 ST官方汇编FFT库应用  
11 | https://bbs.elecfans.com/jishu_496212_1_1.html  
12 | matlab:  
13 | ```
14 | Fs = 1000;                  % sampling rate
15 | N  = 1024;           % number of FFT points
16 | n  = 0:N-1;           % sample index
17 | t  = 0:1/Fs:1-1/Fs;     % time vector
18 | f = n * Fs / N;          % actual frequency of each bin
19 | 
20 | % The waveform is a DC component plus a 50 Hz sine wave and a 20 Hz sine wave
21 | x = 1024 + 1024*sin(2*pi*50*t) + 512*sin(2*pi*20*t)  ;
22 | y = fft(x, N);               % FFT of the original signal
23 | 
24 | subplot(2,1,1);
25 | Mag = abs(y)*2/N;         % magnitude of the FFT result
26 | plot(f, Mag);               % plot the magnitude-frequency response
27 | title('Matlab result');
28 | xlabel('Frequency');
29 | ylabel('Magnitude');
30 | 
31 | subplot(2,1,2);
32 | plot(f, sampledata);   % plot the magnitude-frequency response computed on the STM32
33 | title('STM32 result');
34 | xlabel('Frequency');
35 | ylabel('Magnitude');
36 | ```
37 | NumPy:  
38 | ```
39 | import numpy as np
40 | import matplotlib.pyplot as plt
41 | 
42 | Fs = 1000
43 | N = 1024
44 | n = np.r_[0 : N]
45 | t = np.r_[0 : 1 : 1. / Fs]
46 | f = n * Fs / N
47 | x = 1024 + 1024 * np.sin(2 * np.pi * 50 * t) + 512 * np.sin(2 * np.pi * 20 * t)
48 | y = np.fft.fft(x, N)
49 | Mag = np.abs(y) * 2 / N
50 | plt.plot(f, Mag)
51 | plt.show()
52 | ```
53 | 
54 | ## 【Python笔记】如何编译不依赖lapack和atlas库的NumPy包  
55 | https://blog.csdn.net/slvher/article/details/44833107  
56 | 


--------------------------------------------------------------------------------
/numpy_002.txt:
--------------------------------------------------------------------------------
  1 | http://www.elecfans.com/d/601620.html
  2 | 
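  3 | Spectrum analysis of digital signals with Python's numpy, explained
  4 | 2017-12-12 14:16
  5 | 
  6 | Python is a popular language right now, and I have long felt that mastering a programming language is essential for anyone in a technical job. Since my work involves some data processing and analysis, I decided to start from data analysis and try to learn some of Python's applications in data processing. What follows is an introduction to spectrum analysis of digital signals with Python's numpy.
  7 | 
  8 | 1. The Fourier transform
  9 | The Fourier transform is the bridge between the time domain and the frequency domain in signal processing, and some analyses are more convenient in the frequency domain. It is mainly suited to analyzing the frequency characteristics of stationary signals, loosely speaking signals with some periodicity. Because the transform works on a finite set of samples, it places requirements on the sample length and on the signal being sampled.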
 10 | 
 11 | 
 12 | 2. Spectrum analysis in Python
 13 | # -*- coding: utf-8 -*-
 14 | import numpy as np   # numerical processing
 15 | import pylab as pl   # plotting module that ships with matplotlib
 16 | 
 17 | sampling_rate = 8000  # sampling frequency (Hz)
 18 | fft_size = 512        # number of samples used for the FFT
 19 | 
 20 | # np.arange(start, stop, step) produces 1 s of sampling times
 21 | t = np.arange(0, 1.0, 1.0 / sampling_rate)
 22 | 
 23 | # Sum of two sine waves at 156.25 Hz and 234.375 Hz. As noted above, the FFT
 24 | # places requirements on the sampled signal: an N-point FFT gives an exact
 25 | # spectrum only when the N samples contain a whole number of periods.
 26 | # The frequencies that satisfy this are n*Fs/N (n * sampling rate / FFT length),
 27 | # so for 8 kHz and 512 points the smallest such frequency is 8000/512 = 15.625 Hz,
 28 | # which makes n = 10 for 156.25 Hz and n = 15 for 234.375 Hz.
 29 | x = np.sin(2 * np.pi * 156.25 * t) + 2 * np.sin(2 * np.pi * 234.375 * t)
 30 | 
 31 | xs = x[:fft_size]  # take fft_size samples from the waveform
 32 | 
 33 | # np.fft.rfft() is the FFT for real-valued signals; dividing by fft_size
 34 | # scales the result so that the amplitudes display correctly.
 35 | # rfft returns N/2+1 complex numbers covering 0 Hz to sampling_rate/2 Hz.
 36 | xf = np.fft.rfft(xs) / fft_size
 37 | 
 38 | # np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
 39 | # returns evenly spaced numbers over an interval; here it gives the true
 40 | # frequency of each index in xf (num must be an integer, hence //).
 41 | freqs = np.linspace(0, sampling_rate / 2, fft_size // 2 + 1)
 42 | 
 43 | # Amplitude of each component in dB via 20*np.log10(); np.clip puts a floor
 44 | # under the magnitudes so that zero-amplitude bins do not break log10.
 45 | xfp = 20 * np.log10(np.clip(np.abs(xf), 1e-20, None))
 46 | 
 47 | pl.figure(figsize=(8, 4))
 48 | pl.subplot(211)
 49 | pl.plot(t[:fft_size], xs)
 50 | pl.xlabel(u"Time (s)")
 51 | pl.title(u"The Wave and Spectrum: 156.25 Hz and 234.375 Hz")
 52 | pl.subplot(212)
 53 | pl.plot(freqs, xfp)
 54 | pl.xlabel(u"Hz")
 55 | pl.subplots_adjust(hspace=0.4)
 56 | pl.show()  # display the plots
 94 | 
 95 | 
 96 | 
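 97 | Now let's look at spectral leakage by changing the frequency of the first component:
 98 | 
 99 | x = np.sin(2 * np.pi * 100 * t) + 2 * np.sin(2 * np.pi * 234.375 * t)
100 | 
101 | The spectrum of the first component now clearly "leaks": its energy spreads into other frequencies, so the spectral characteristics of the signal can no longer be computed accurately.
102 | 
103 | Window functions
104 | 
105 | Leakage can be reduced by applying a window function, which tapers the samples at both ends of the FFT block so that the block behaves more like a whole number of periods.
106 | 
107 | import pylab as pl
108 | import scipy.signal as signal
109 | 
110 | pl.figure(figsize=(8, 3))
111 | pl.plot(signal.windows.hann(512))  # Hann window
112 | pl.show()
113 | 
114 | The leaking signal above is then multiplied by the window; the principles and effects of the various window functions will be covered later.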
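
The article stops before showing the windowed result. A minimal sketch of that step, assuming the same 8 kHz / 512-point setup and using NumPy's `np.hanning` instead of SciPy to keep the example dependency-free, might look like this. The 1000 Hz cut-off used to measure "far-off" leakage and the variable names are arbitrary choices for the illustration.

```python
import numpy as np

sampling_rate = 8000
fft_size = 512
t = np.arange(fft_size) / sampling_rate
# 100 Hz does not fall on an FFT bin (bins are multiples of 15.625 Hz), so it leaks.
xs = np.sin(2 * np.pi * 100 * t) + 2 * np.sin(2 * np.pi * 234.375 * t)

window = np.hanning(fft_size)             # symmetric Hann window
# Dividing by the window sum (instead of fft_size) compensates for the window's gain.
xf_rect = np.abs(np.fft.rfft(xs)) / fft_size
xf_hann = np.abs(np.fft.rfft(xs * window)) / window.sum()

freqs = np.fft.rfftfreq(fft_size, d=1.0 / sampling_rate)
far = freqs > 1000                        # bins well away from both tones
leak_rect_db = 20 * np.log10(xf_rect[far].max())
leak_hann_db = 20 * np.log10(xf_hann[far].max())
print(f"worst far-off bin: {leak_rect_db:.1f} dB without window, {leak_hann_db:.1f} dB with Hann")
```

The window trades a slightly wider main lobe around each tone for much lower leakage into distant bins.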
126 | 
127 | 


--------------------------------------------------------------------------------
/online_game_001.md:
--------------------------------------------------------------------------------
 1 | ## 多人在线游戏架构实战  
 2 | * http://www.hzbook.com/index.php/Book/search.html?k=多人在线游戏架构实战  
 3 | * search baidupan, 多人在线游戏架构实战  
 4 | * https://github.com/setuppf/GameBookServer  
 5 | * https://github.com/setuppf/GameBookClient  
 6 | 
 7 | ## Unity UI Layout, Unity布局问题  
 8 | https://github.com/weimingtom/GalGame-1/tree/master/GalGameUnity5/Assets/Scenes  
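 9 | Double-click to open the end scene (end.unity) and press the Play button at the top; you will see the simplest possible layout: a white background, a background image, and three buttons.  
10 | This demonstrates how to position the background image and the buttons.  
11 | (1) To load an image, you can use the Image class from Unity's built-in GUI (Hierarchy window -> right-click menu -> UI -> Image).  
12 | There may be other ways, which I will discuss later, but this is probably the easiest to understand.  
13 | (2) Background image layout: just set the size and alignment; here the size is hard-coded and the position is centered  
14 | (3) Button layout: similar to the background image, except that in the hierarchy the buttons are children of the background image CG rather than siblings of it. The size is again hard-coded,  
15 | but the position is not centered; it is a hard-coded coordinate (Pos X, Pos Y).  
16 | That completes the simplest game screen, one image and three buttons. But this layout has an odd problem: after running,  
17 | why does the displayed size of the background image CG layer differ from the hard-coded width and height?  
18 | I think this may be due to a quirk of the Image class that makes width and height adjust automatically at runtime.  
19 | I will try it hands-on next time to resolve this question    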
20 | 
21 | ## eq2  
22 | http://sourceforge.net/projects/eq2emulator/  
23 | search baidupan, eq2_Source_v1.7z  
24 | 
25 | ## sogou/workflow  
26 | https://github.com/sogou/workflow  
27 | 


--------------------------------------------------------------------------------
/python_speech_001.txt:
--------------------------------------------------------------------------------
  1 | https://cloud.tencent.com/developer/article/1109408?fromSource=waitui
  2 | 
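  3 | The Ultimate Guide to Speech Recognition with Python
  4 | 2018-04-26
  5 | 
  6 | Translator | 廉洁
  7 | 
  8 | Editor | 明明
  9 | 
 10 | [AI科技大本营 editor's note] The huge success of Amazon's Alexa has shown that, in the near future, some degree of voice support will be a basic requirement of everyday technology. Python programs with speech recognition built in offer a level of interactivity and accessibility that few other technologies can match. Best of all, adding speech recognition to a Python program is very simple, as this guide shows. You will learn:
 11 | 
 12 | •how speech recognition works;
 13 | 
 14 | •which packages are available on PyPI;
 15 | 
 16 | •how to install and use SpeechRecognition, a full-featured and easy-to-use Python speech recognition library.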
 17 | 
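 18 | ▌How speech recognition works: an overview
 19 | Speech recognition has its roots in research done at Bell Labs in the early 1950s. Early systems could recognize only a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have come a long way: they can recognize speech from multiple speakers and have enormous vocabularies in many languages.
 20 | 
 21 | The first component of speech recognition is, of course, speech. A microphone converts the physical sound into an electrical signal, and an analog-to-digital converter then turns that signal into digital data. Once digitized, several models can be used to transcribe the audio to text.
 22 | 
 23 | Most modern speech recognition systems rely on a hidden Markov model (HMM). It works on the assumption that a speech signal, viewed over a very short timescale (say, 10 milliseconds), can be reasonably approximated as a stationary process, that is, a process whose statistical properties do not change over time.
 24 | 
 25 | Many modern speech recognition systems use neural networks ahead of HMM recognition to simplify the speech signal through feature transformation and dimensionality reduction. A voice activity detector (VAD) can also be used to reduce the audio signal to only the portions likely to contain speech.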
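
The "very short timescale" idea above amounts to cutting the sample stream into short frames and analyzing each one separately. A minimal sketch, assuming 16 kHz audio and non-overlapping 10 ms frames (real front ends commonly use overlapping frames; the function name is hypothetical):

```python
def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut a list of samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# One second of dummy "audio" at an assumed 16 kHz sample rate.
one_second = list(range(16000))
frames = split_into_frames(one_second, sample_rate=16000)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

Each frame is short enough to be treated as statistically stationary, which is the property the HMM description relies on.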
 26 | 
 27 | Fortunately for Python users, a number of speech recognition services are available online through an API, and most of them also offer a Python SDK.
 28 | 
 29 | ▌Picking a Python speech recognition package
 30 | A handful of speech recognition packages exist on PyPI, including:
 31 | 
 32 | •apiai
 33 | 
 34 | •google-cloud-speech
 35 | 
 36 | •pocketsphinx
 37 | 
 38 | •SpeechRecognition
 39 | 
 40 | •watson-developer-cloud
 41 | 
 42 | •wit
 43 | 
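 44 | Some packages (such as wit and apiai) offer built-in features that go beyond basic speech recognition, such as natural language processing for identifying a speaker's intent. Others, like google-cloud-speech, focus solely on speech-to-text conversion.
 45 | 
 46 | Among them, SpeechRecognition stands out for its ease of use.
 47 | 
 48 | Recognizing speech requires audio input, and SpeechRecognition makes retrieving it easy: instead of writing scripts from scratch to access microphones and process audio files, you can be up and running in a few minutes.
 49 | 
 50 | The SpeechRecognition library wraps several popular speech APIs, so it is very flexible. One of them, the Google Web Speech API, supports a default API key that is hard-coded into the library, so it can be used without signing up. This flexibility and ease of use make SpeechRecognition an excellent choice for a Python project.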
 51 | 
 52 | 
 53 | 
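 54 | ▌Installing SpeechRecognition
 55 | SpeechRecognition is compatible with Python 2.6, 2.7, and 3.3+, but using it with Python 2 requires some additional installation steps. This tutorial assumes Python 3.3+ throughout.
 56 | 
 57 | You can install SpeechRecognition from a terminal with pip:
 58 | 
 59 | $ pip install SpeechRecognition
 60 | Once installed, verify the installation by opening an interpreter session and typing:
 61 | 
 62 | >>> import speech_recognition as sr
 63 | >>> sr.__version__
 64 | '3.8.1'
 65 | Note: keep this session open; you will use it in the next few steps.
 66 | 
 67 | If all you need is to work with existing audio files, SpeechRecognition works out of the box, although specific use cases have a few dependencies. In particular, the PyAudio package is needed to capture microphone input.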
 68 | 
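 69 | ▌The Recognizer class
 70 | The Recognizer class is at the heart of SpeechRecognition.
 71 | 
 72 | The main purpose of a Recognizer instance is to recognize speech. Each instance comes with a variety of settings and functionality for recognizing speech from an audio source, and offers seven methods that do so through different APIs: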
 73 | 
 74 | recognize_bing(): Microsoft Bing Speech
 75 | recognize_google(): Google Web Speech API
 76 | recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
 77 | recognize_houndify(): Houndify by SoundHound
 78 | recognize_ibm(): IBM Speech to Text
 79 | recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
 80 | recognize_wit(): Wit.ai
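 81 | Of the seven, only recognize_sphinx() works offline, using the CMU Sphinx engine; the other six require an internet connection.
 82 | 
 83 | SpeechRecognition ships with a default API key for the Google Web Speech API, so you can use it right away. The other six APIs all require authentication with an API key or a username/password combination, which is why this article uses the Web Speech API.
 84 | 
 85 | Now get hands-on: in the interpreter session, create a Recognizer instance (r = sr.Recognizer()) and call its recognize_google() method.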
 86 | 
 87 | >>> r.recognize_google()
 88 | You will see:
 89 | 
 90 | Traceback (most recent call last):
 91 |  File "<stdin>", line 1, in <module>
 92 | TypeError: recognize_google() missing 1 required positional argument: 'audio_data'
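 93 | You probably guessed this would happen: how could anything be recognized from nothing?
 94 | 
 95 | All seven recognize_*() methods of the Recognizer class require an audio_data argument, and in each case audio_data must be an instance of SpeechRecognition's AudioData class.
 96 | 
 97 | There are two ways to create an AudioData instance: from an audio file, or from audio recorded by a microphone. Audio files are a little easier to start with.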
 98 | 
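 99 | ▌Working with audio files
100 | 
101 | First download the audio files (https://github.com/realpython/python-speech-recognition/tree/master/audio_files) and save them to the directory in which your Python interpreter session is running.
102 | 
103 | The AudioFile class can be initialized with the path to an audio file and provides a context manager interface for reading and working with the file's contents.
104 | 
105 | Supported file types
106 | 
107 | SpeechRecognition currently supports the following file types:
108 | 
109 | WAV: must be in PCM/LPCM format
110 | AIFF
111 | AIFF-C
112 | FLAC: must be native FLAC format; OGG-FLAC is not supported
113 | On x-86 based Linux, macOS, or Windows, FLAC files should work without a problem. On other platforms, you need to install a FLAC encoder and make sure the flac command-line tool is accessible.
114 | 
115 | Using record() to capture data from a file
116 | 
117 | Type the following into the interpreter session to process the contents of the "harvard.wav" file: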
118 | 
119 | >>> harvard = sr.AudioFile('harvard.wav')
120 | >>> with harvard as source:
121 | ...   audio = r.record(source)
122 | ...
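123 | The context manager opens the file and reads its contents, storing the data in an AudioFile instance called source. The record() method then records the data from the entire file into an AudioData instance. You can confirm this by checking the type of audio:
124 | 
125 | >>> type(audio)
126 | <class 'speech_recognition.AudioData'>
127 | You can now call recognize_google() to try to recognize any speech in the audio.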
128 | 
129 | >>> r.recognize_google(audio)
130 | 'the stale smell of old beer lingers it takes heat
131 | to bring out the odor a cold dip restores health and
132 | zest a salt pickle taste fine with ham tacos al
133 | Pastore are my favorite a zestful food is the hot
134 | cross bun'
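135 | You have just transcribed your first audio file.
136 | 
137 | Capturing segments with offset and duration
138 | 
139 | What if you only want to capture part of the speech in a file? The record() method accepts a duration keyword argument that stops the recording after the specified number of seconds.
140 | 
141 | For example, the following captures only the speech in the first four seconds of the file: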
142 | 
143 | >>> with harvard as source:
144 | ...   audio = r.record(source, duration=4)
145 | ...
146 | >>> r.recognize_google(audio)
147 | 'the stale smell of old beer lingers'
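148 | When record() is called inside a with block, the file stream moves forward. So if you record for four seconds and then record for four seconds again, the second call returns the four seconds of audio that follow the first four.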
149 | 
150 | >>> with harvard as source:
151 | ...   audio1 = r.record(source, duration=4)
152 | ...   audio2 = r.record(source, duration=4)
153 | ...
154 | >>> r.recognize_google(audio1)
155 | 'the stale smell of old beer lingers'
156 | >>> r.recognize_google(audio2)
157 | 'it takes heat to bring out the odor a cold dip'
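158 | Besides a recording duration, you can give record() a starting point with the offset keyword argument, which is the number of seconds from the beginning of the file to skip before recording starts. For example, to capture only the second phrase in the file, use an offset of four seconds and a duration of three seconds.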
159 | 
160 | >>> with harvard as source:
161 | ...   audio = r.record(source, offset=4, duration=3)
162 | ...
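163 | >>> r.recognize_google(audio)
164 | 'it takes heat to bring out the odor'
165 | The offset and duration keyword arguments are useful for segmenting an audio file when you already know how the speech in it is structured. Used carelessly, however, they lead to poor transcriptions.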
166 | 
167 | >>> with harvard as source:
168 | ...   audio = r.record(source, offset=4.7, duration=2.8)
169 | ...
170 | >>> r.recognize_google(audio)
171 | 'Mesquite to bring out the odor Aiko'
172 | Because recording starts at 4.7 seconds, the "it t" at the beginning of the phrase "it takes heat to bring out the odor" is missed, so the API only receives "akes heat", which it matches to "Mesquite".
173 | 
174 | Likewise, at the end of the recording the API captures only "a co", the beginning of the phrase "a cold dip restores health and zest", and incorrectly matches it to "Aiko".
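
To see why small changes in offset and duration cut words apart, it helps to translate seconds into sample positions. The sketch below is only an illustration of that arithmetic, not SpeechRecognition's implementation; the 44100 Hz sample rate and the function name are assumptions.

```python
def segment_bounds(sample_rate, offset=0.0, duration=None, total_samples=None):
    """Return (start, end) sample indices for a clip described in seconds.

    end is None when neither duration nor total_samples is given,
    meaning "read to the end of the stream".
    """
    start = int(round(offset * sample_rate))
    end = total_samples if duration is None else start + int(round(duration * sample_rate))
    if total_samples is not None:
        start = min(start, total_samples)
        end = min(end, total_samples)
    return start, end


RATE = 44100  # assumed sample rate; check your own file
print(segment_bounds(RATE, offset=4, duration=3))      # the well-aligned clip
print(segment_bounds(RATE, offset=4.7, duration=2.8))  # the misaligned clip
```

The misaligned clip starts tens of thousands of samples later than the aligned one, which is enough to drop the first syllable of the phrase.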
175 | 
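176 | Noise is another major cause of inaccurate transcriptions. The examples above work well because the audio file is reasonably clean; in the real world you cannot expect noise-free audio unless you process the files beforehand.
177 | 
178 | The effect of noise on speech recognition
179 | 
180 | Noise is a fact of life. Every recording contains some degree of noise, and unhandled noise can wreck the accuracy of a speech recognition application.
181 | 
182 | To get a feel for how noise affects speech recognition, download the "jackhammer.wav" file (https://github.com/realpython/python-speech-recognition/tree/master/audio_files) and make sure it is saved to your interpreter session's working directory. In this file the phrase "the stale smell of old beer lingers" is spoken with a loud jackhammer in the background.
183 | 
184 | What happens when you try to transcribe this file?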
185 | 
186 | >>> jackhammer = sr.AudioFile('jackhammer.wav')
187 | >>> with jackhammer as source:
188 | ...   audio = r.record(source)
189 | ...
190 | >>> r.recognize_google(audio)
191 | 'the snail smell of old gear vendors'
192 | So how do you deal with this? One thing to try is the adjust_for_ambient_noise() method of the Recognizer class.
193 | 
194 | >>> with jackhammer as source:
195 | ...   r.adjust_for_ambient_noise(source)
196 | ...   audio = r.record(source)
197 | ...
198 | >>> r.recognize_google(audio)
199 | 'still smell of old beer vendors'
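200 | That is a lot closer to the actual phrase, but it is still not accurate, and the "the" at the beginning of the phrase is missing. Why?
201 | 
202 | By default, adjust_for_ambient_noise() reads the first second of the file stream and calibrates the recognizer to the noise level of that audio. That first second has therefore already been consumed by the time record() captures the data.
203 | 
204 | You can change how much audio adjust_for_ambient_noise() analyzes with the duration keyword argument, which is given in seconds and defaults to 1. Try lowering it to 0.5: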
205 | 
206 | >>> with jackhammer as source:
207 | ...   r.adjust_for_ambient_noise(source, duration=0.5)
208 | ...   audio = r.record(source)
209 | ...
210 | >>> r.recognize_google(audio)
211 | 'the snail smell like old Beer Mongers'
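212 | That gets you the "the" at the beginning of the phrase, but now there are new problems. Sometimes the signal is simply too noisy for the effect of the noise to be removed.
213 | 
214 | If you run into these problems often, you may need to preprocess the audio, either with audio editing software or with a Python package (such as SciPy) that can apply filters to the file. When working with noisy files it can also help to look at the actual API response. Most APIs return a JSON string containing many possible transcriptions, but recognize_google() always returns only the most likely transcription unless you force it to give you the full response.
215 | 
216 | To get the full response, set the show_all keyword argument of recognize_google() to True.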
217 | 
218 | >>> r.recognize_google(audio, show_all=True)
219 | {'alternative': [
220 |  {'transcript': 'the snail smell like old Beer Mongers'}, 
221 |  {'transcript': 'the still smell of old beer vendors'}, 
222 |  {'transcript': 'the snail smell like old beer vendors'},
223 |  {'transcript': 'the stale smell of old beer vendors'}, 
224 |  {'transcript': 'the snail smell like old beermongers'}, 
225 |  {'transcript': 'destihl smell of old beer vendors'}, 
226 |  {'transcript': 'the still smell like old beer vendors'}, 
227 |  {'transcript': 'bastille smell of old beer vendors'}, 
228 |  {'transcript': 'the still smell like old beermongers'}, 
229 |  {'transcript': 'the still smell of old beer venders'}, 
230 |  {'transcript': 'the still smelling old beer vendors'}, 
231 |  {'transcript': 'musty smell of old beer vendors'}, 
232 |  {'transcript': 'the still smell of old beer vendor'}
233 | ], 'final': True}
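234 | As you can see, recognize_google() returns a dictionary whose 'alternative' key holds a list of all possible transcripts. The structure of this response varies from API to API and is mainly useful for debugging.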
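
A response of this shape can be post-processed with ordinary dictionary and list operations. A minimal sketch, assuming the structure shown above (a dict with an 'alternative' list of {'transcript': ...} items); the helper name is hypothetical and the sample data is a shortened copy of the output above:

```python
def list_transcripts(response):
    """Collect every candidate transcript from a show_all-style response.

    Anything that is not a dict (for example an empty result) yields an empty list.
    """
    if not isinstance(response, dict):
        return []
    return [alt["transcript"] for alt in response.get("alternative", []) if "transcript" in alt]


sample = {
    "alternative": [
        {"transcript": "the snail smell like old Beer Mongers"},
        {"transcript": "the still smell of old beer vendors"},
        {"transcript": "the stale smell of old beer vendors"},
    ],
    "final": True,
}

candidates = list_transcripts(sample)
# With domain knowledge you can prefer a candidate over the top-ranked one,
# here any candidate that contains the expected word "stale".
preferred = next((c for c in candidates if "stale" in c), candidates[0])
print(preferred)
```

This kind of re-ranking only makes sense when you know something about the expected vocabulary; otherwise the first alternative is the API's own best guess.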
235 | 
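236 | ▌Working with microphones
237 | To access a microphone with SpeechRecognition you must install the PyAudio package. Close the current interpreter session and proceed as follows.
238 | 
239 | Installing PyAudio
240 | 
241 | The process for installing PyAudio varies by operating system.
242 | 
243 | Debian Linux
244 | 
245 | On Debian-based Linux (such as Ubuntu), you can install PyAudio with apt: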
246 | 
247 | $ sudo apt-get install python-pyaudio python3-pyaudio
248 | After installing, you may still need to run pip install pyaudio, especially if you are working in a virtual environment.
249 | 
250 | macOS
251 | 
252 | On macOS, first install PortAudio with Homebrew and then install PyAudio with pip.
253 | 
254 | $ brew install portaudio
255 | $ pip install pyaudio
256 | Windows
257 | 
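258 | On Windows, you can install PyAudio directly with pip.
259 | 
260 | $ pip install pyaudio
261 | Testing the installation
262 | 
263 | Once PyAudio is installed, you can test the installation from the console.
264 | 
265 | $ python -m speech_recognition
266 | Make sure your default microphone is on and unmuted. If the installation worked, you should see something like this: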
267 | 
268 | A moment of silence, please...
269 | Set minimum energy threshold to 600.4452854381937
270 | Say something!
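271 | Speak into the microphone and see how well SpeechRecognition transcribes your speech.
272 | 
273 | The Microphone class
274 | 
275 | Open another interpreter session and create an instance of the Recognizer class.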
276 | 
277 | >>> import speech_recognition as sr
278 | >>> r = sr.Recognizer()
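279 | This time you will use the default system microphone as the audio source instead of an audio file. You can access it by creating an instance of the Microphone class.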
280 | 
281 | >>> mic = sr.Microphone()
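282 | If your system has no default microphone (as on a Raspberry Pi), or you want to use a microphone other than the default, you need to specify which one by supplying a device index. You can get a list of microphone names by calling the list_microphone_names() static method of the Microphone class.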
283 | 
284 | >>> sr.Microphone.list_microphone_names()
285 | ['HDA Intel PCH: ALC272 Analog (hw:0,0)',
286 |  'HDA Intel PCH: HDMI 0 (hw:0,3)',
287 |  'sysdefault',
288 |  'front',
289 |  'surround40',
290 |  'surround51',
291 |  'surround71',
292 |  'hdmi',
293 |  'pulse',
294 |  'dmix', 
295 |  'default']
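296 | Note: your output may differ from the example above.
297 | 
298 | The device index of a microphone is the index of its name in the list returned by list_microphone_names(). In the output above, for example, the microphone named "front" has index 3 in the list, so you would create the Microphone instance like this: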
299 | 
300 | >>> # This is just an example; do not run
301 | >>> mic = sr.Microphone(device_index=3)
302 | For most projects, though, you will probably want to use the default system microphone.
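
Hard-coding an index such as 3 is fragile, because the list changes from machine to machine. A small sketch of looking the index up by name instead; the helper is hypothetical and the list is copied from the sample output above:

```python
def find_microphone_index(names, wanted):
    """Return the index of the first device whose name contains `wanted` (case-insensitive), or None."""
    wanted = wanted.lower()
    for index, name in enumerate(names):
        if wanted in name.lower():
            return index
    return None


names = [
    "HDA Intel PCH: ALC272 Analog (hw:0,0)",
    "HDA Intel PCH: HDMI 0 (hw:0,3)",
    "sysdefault",
    "front",
    "surround40",
    "surround51",
    "surround71",
    "hdmi",
    "pulse",
    "dmix",
    "default",
]
print(find_microphone_index(names, "front"))
```

In a real session the list would come from sr.Microphone.list_microphone_names(), and the result would be passed as device_index when it is not None.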
303 | 
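304 | Using listen() to capture microphone input
305 | 
306 | With a Microphone instance ready, you can capture some input.
307 | 
308 | Just like the AudioFile class, Microphone is a context manager. You capture microphone input with the listen() method of the Recognizer class inside a with block. The method takes an audio source as its first argument and records input from the source until silence is detected.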
309 | 
310 | >>> with mic as source:
311 | ...   audio = r.listen(source)
312 | ...
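313 | Once you execute the with block, try saying "hello" into your microphone. Wait for the interpreter prompt to appear again; once the ">>>" prompt returns, you are ready to recognize the speech.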
314 | 
315 | >>> r.recognize_google(audio)
316 | 'hello'
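317 | If the prompt never returns, your microphone is most likely picking up too much ambient noise. Interrupt the process with Ctrl + C to get the prompt back.
318 | 
319 | To handle ambient noise, use the adjust_for_ambient_noise() method of the Recognizer class, just as you did with the noisy audio file. Because microphone input is far less predictable than input from an audio file, it is a good idea to do this every time you listen for microphone input.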
320 | 
321 | >>> with mic as source:
322 | ...   r.adjust_for_ambient_noise(source)
323 | ...   audio = r.listen(source)
324 | ...
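325 | After running the code above, wait a moment and then try saying "hello" into the microphone. Again, you must wait for the interpreter prompt to return before trying to recognize the speech.
326 | 
327 | Recall that adjust_for_ambient_noise() analyzes the audio source for one second by default. If that is too long for your application, adjust it with the duration keyword argument.
328 | 
329 | The SpeechRecognition documentation recommends a duration of no less than 0.5 seconds. In some cases you may find that durations longer than the default one second give better results. The minimum value you need depends on the microphone's ambient environment, which is usually unknown during development. In my experience, the default duration of one second is adequate for most applications.
330 | 
331 | Handling unrecognizable speech
332 | 
333 | Try typing the previous code example into the interpreter and making some unintelligible noises into the microphone. You should get something like this in response: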
334 | 
335 | Traceback (most recent call last):
336 |  File "<stdin>", line 1, in <module>
337 |  File "/home/david/real_python/speech_recognition_primer/venv/lib/python3.5/site-packages/speech_recognition/__init__.py", line 858, in recognize_google
338 |   if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
339 | speech_recognition.UnknownValueError
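340 | Audio that the API cannot match to text raises an UnknownValueError exception, so you should always wrap API calls in try and except blocks to handle it. The API works very hard to transcribe any vocal sound; even a short grunt may be transcribed as a word such as "How". Coughing, hand claps, and tongue clicks, on the other hand, consistently raise the exception.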
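
The try/except pattern can be packaged into a small wrapper. The sketch below is deliberately library-independent so that it runs without a microphone or network: the wrapper name, the stand-in exception class, and the fake recognizer are all hypothetical.

```python
def transcribe_or_none(recognize, audio, recoverable=(Exception,)):
    """Call recognize(audio) and return its text, or None if a recoverable error is raised."""
    try:
        return recognize(audio)
    except recoverable:
        return None


class FakeUnknownValueError(Exception):
    """Stand-in for the library's 'could not understand audio' error."""


def fake_recognize(audio):
    # Pretend that only the clip b"hello" contains intelligible speech.
    if audio != b"hello":
        raise FakeUnknownValueError()
    return "hello"


print(transcribe_or_none(fake_recognize, b"hello", (FakeUnknownValueError,)))
print(transcribe_or_none(fake_recognize, b"*cough*", (FakeUnknownValueError,)))
```

With the real library you would pass r.recognize_google as recognize and (sr.UnknownValueError, sr.RequestError) as recoverable, so that both unintelligible audio and an unreachable API are handled.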
341 | 
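342 | Conclusion
343 | 
344 | Throughout this tutorial we have been recognizing speech in English, which is the default language for each recognize_*() method of the SpeechRecognition package. Recognizing speech in other languages is entirely possible and easy to do: set the language keyword argument of the recognize_*() method to a string corresponding to the desired language.
345 | 
346 | Author: David Amos
347 | Original article: https://realpython.com/python-speech-recognition/
348 | 
349 | This article was shared from the WeChat official account AI科技大本营 (rgznai100); author: 研究语音的
350 | 
351 | See the article itself for the original source and reprint information. In case of infringement, contact yunjia_community@tencent.com for removal.
352 | 
353 | Originally published: 2018-04-05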
354 | 
356 | 


--------------------------------------------------------------------------------
/pytorch_speech_commands_001.txt:
--------------------------------------------------------------------------------
  1 | 
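  2 | [Advanced Deep Learning: Hands-on Notes] Speech Recognition Basics
  3 | 
  4 | 什么都一般的咸鱼 2020-06-06 13:42:02
  5 | Category: Deep Learning; tags: deep learning
  7 | Speech Recognition Basics
  8 | 1. Applications of deep learning in the speech domain
  9 | (1) Speech recognition
 10 | (2) Voice wake-up
 11 | (3) Voice commands
 12 | (4) Speaker (voiceprint) recognition
 13 | (5) Speech generation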
 15 | 2. 音频领域基本概念
 16 | （1）采样率：每秒采集数据的次数。
 17 | 一般是8000Hz、16000Hz…
 18 | 采样率越高，音频损失越小。根据奈奎斯特采样定理：当采样率高于最高频率2倍以上，音频数据就不会失真。因此处理数据的采样率选择，一般只要高于最高频率2倍以上就行。
 19 | （2）采样精度：每次采样数据的位数。即保存数据的精度：一般为一字节（8位）、两字节（16位）…
 20 | （3）通道数：存在几路音频。（左声道/右声道）
 21 | （4）比特率：针对编码格式，表示压缩编码后每秒的音频数据量大小。（bps，bit/s）
 22 | （5）音频格式：pcm、wav、MP3…
 23 | （6）声波：声音数据是连续的波形。但是计算机只能处理离散的数据，因此我们需要对数据进行采样。（通过上面哪些参数进行采样）
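
The parameters above combine by simple arithmetic. A small sketch for uncompressed PCM audio (the function names are hypothetical):

```python
def pcm_bytes_per_second(sample_rate, bits_per_sample, channels):
    """Raw (uncompressed) PCM data rate in bytes per second."""
    return sample_rate * (bits_per_sample // 8) * channels


def highest_capturable_frequency(sample_rate):
    """Nyquist limit: frequencies must stay below half the sample rate."""
    return sample_rate / 2


# 16 kHz, 16-bit, mono: a common format for speech data.
print(pcm_bytes_per_second(16000, 16, 1), "bytes/s")
print(highest_capturable_frequency(8000), "Hz is the limit at 8 kHz sampling")
```

The bit rate in (4) is the same kind of figure, measured after compression instead of on the raw samples.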
 24 | 
 25 | 3. 语音识别
 26 | （1）首先需要用到torchaudio库。官方文档
 27 | ① window下下载的方式：cmd
 28 | 
 29 | conda install -c conda-forge librosa
 30 | conda install -c groakat sox
 31 | git clone https://github.com/pytorch/audio.git
 32 | 
 33 | python setup.py install
 34 | 1
 35 | 2
 36 | 3
 37 | 4
 38 | 5
 39 | ② 载入数据，并绘制波形图。
 40 | 载入的波形数据为（C，L）：通道数，幅值，以及一个采样率。
 41 | 
 42 | filename = "../_static/img/steam-train-whistle-daniel_simon-converted-from-mp3.wav"
 43 | waveform, sample_rate = torchaudio.load(filename)
 44 | 
 45 | print("Shape of waveform: {}".format(waveform.size()))
 46 | print("Sample rate of waveform: {}".format(sample_rate))
 47 | 
 48 | plt.figure()
 49 | plt.plot(waveform.t().numpy())
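 58 | (2) Problems that audio preprocessing has to solve:
 59 | ① Audio clips differ in length: pad, crop, or rescale.
 60 | Padding: append zeros to the shorter clips.
 61 | Cropping: cut a segment of the longer clip equal in length to the shorter one.
 62 | Rescaling: adaptive average pooling (data layout: NCL), which resizes every clip to one common length.
 63 | ② Audio clips are too long (for example 1 * 50408).
 64 | Option 1: apply 1-D convolutions to the raw data (large kernels, large strides) until the length is under about 25, then use an RNN as the output layer.
 65 | Option 2: transform the audio signal from the time domain to the frequency domain and process it there.
 66 | (* Option 1 works in theory but performs poorly: the model has no prior and learns slowly.)
 67 | ③ With clips of different lengths, each training batch can hold only one clip, which trains poorly.
 68 | After unifying the lengths with the solutions in ①, the clips can be put into a batch for training.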
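
The three length-unification strategies in ① can be sketched on plain Python lists. This is an illustration only: the helper names are hypothetical, and average_pool_to follows the bin rule that adaptive average pooling is generally described with (bin i covers indices floor(i*L/T) up to ceil((i+1)*L/T)), which is an assumption rather than a copy of any library's code.

```python
def pad_or_crop(samples, target_len, pad_value=0):
    """Crop clips that are too long; pad clips that are too short with pad_value."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [pad_value] * (target_len - len(samples))


def average_pool_to(samples, target_len):
    """Resize a non-empty clip to target_len values by averaging adjacent bins."""
    n = len(samples)
    pooled = []
    for i in range(target_len):
        start = (i * n) // target_len
        end = -(-(i + 1) * n // target_len)  # ceiling division
        window = samples[start:end]
        pooled.append(sum(window) / len(window))
    return pooled


print(pad_or_crop([1, 2, 3], 5))
print(pad_or_crop([1, 2, 3, 4], 2))
print(average_pool_to([1, 2, 3, 4, 5, 6], 3))
```

Padding and cropping keep the original time scale, while pooling keeps the whole clip but changes its time scale, which is why the notes treat them as separate options.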
 69 | 
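 70 | (3) MFCC: Mel-frequency cepstral coefficients
 71 | Following option 2 in (2) above, we convert the audio signal into a spectrogram and work on that. (Training on an image is comparatively easy for a neural network.)
 72 | Put simply, MFCC processes the audio signal according to how the human ear perceives sound and ends up with a spectrogram-like feature map, which effectively gives the model a prior based on human hearing. See this blog post for details.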
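
The "human ear prior" in MFCC comes from the mel scale, which spaces frequencies the way pitch is perceived rather than linearly. A minimal sketch using one common formula for the scale, mel = 2595 * log10(1 + f / 700); other variants exist, and the function names here are hypothetical:

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels using 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


for f in (100, 1000, 4000, 8000):
    print(f, "Hz ->", round(hz_to_mel(f), 1), "mel")
```

Equal steps in mels correspond to ever larger steps in Hz at high frequencies, so a mel filter bank spends most of its resolution on the low frequencies where speech carries most information.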
 73 | 
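 74 | (4) Hands-on project:
 75 | ① Data: Google's speech_commands dataset can be used for learning; I picked a small part of it.
 76 | First convert the loaded audio data (waveforms) into spectrogram data, then normalize it.
 77 | 
 78 | # You can download it from the link above, or via the code below.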
 79 | import torchaudio,torch
 80 | 
 81 | dataset = torchaudio.datasets.SPEECHCOMMANDS(r"F:\data", url='speech_commands_v0.02', folder_in_archive='SpeechCommands', download=False)
 82 | data_loader = torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=True)
 83 | 
 84 | for x in data_loader:
 85 |     print(x)
 94 | # The sample rate here should be the one read when the audio was loaded
 95 | tf = torchaudio.transforms.MFCC(sample_rate=8000)
 96 | 
 97 | # The converted spectrogram is not normalized, so normalize the data
 98 | def normalize(tensor):
 99 |     tensor_minusmean = tensor - tensor.mean()
100 |     return tensor_minusmean / tensor_minusmean.max()
108 | # Bring the data to a common length and pack it into one batch for training
109 | datas = []
110 | tags = []
111 | for data, _, tag in data_loader:
112 | 	tag = torch.stack(tag, dim=1).float()
113 | 	specgram = normalize(tf(data))
114 | 	datas.append(F.adaptive_avg_pool2d(specgram, (32, 256)))
115 | 	tags.append(tag)
116 | specgrams = torch.cat(datas, dim=0)
117 | tags = torch.cat(tags, dim=0)
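128 | ② Neural network:
129 | Because the spectrogram has a large aspect ratio, its two dimensions are handled separately.
130 | The kernel is (1, 3) with stride (1, 2) and padding (0, 1): along the sequence axis the convolution uses kernel size 3 and stride 2, while along the feature axis it uses kernel size 1 and stride 1. (Each row is a sequence; each column is a feature vector.)
131 | Spectrogram: 1×32×256; after these convolutions: N×4×32×32
132 | Then ordinary convolutions: N×8×8×8
133 | Finally the output is shaped to match the classification labels of the audio.
134 | (Alternatively, an RNN or a fully connected layer can be attached for the output.)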
135 | 
136 | class Net(torch.nn.Module):
137 | 
138 |     def __init__(self):
139 |         super().__init__()
140 |         self.seq = torch.nn.Sequential(
141 |             torch.nn.Conv2d(1, 4, (1, 3), (1, 2), (0, 1)),
142 |             torch.nn.BatchNorm2d(4),
143 |             torch.nn.ReLU(),
144 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
145 |             torch.nn.BatchNorm2d(4),
146 |             torch.nn.ReLU(),
147 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),# 4*32*32
148 |             torch.nn.BatchNorm2d(4),
149 |             torch.nn.ReLU(), 
150 |             torch.nn.Conv2d(4, 8, 3, 2, 1),
151 |             torch.nn.BatchNorm2d(8),
152 |             torch.nn.ReLU(),
153 |             torch.nn.Conv2d(8, 8, 3, 2, 1), # 8*8*8
154 |             torch.nn.BatchNorm2d(8),
155 |             torch.nn.ReLU(),
156 |             torch.nn.Conv2d(8, 1, (8, 1)),# each audio clip in this data carries 8 label values, so the output becomes 1*1*8
157 |         )
158 | 
159 |     def forward(self, x):
160 |         h = self.seq(x)
161 |         return h.reshape(-1, 8)
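
The shape comments in the network above can be verified with the standard convolution output-size formula, out = floor((in + 2*padding - kernel) / stride) + 1. The sketch below traces the 32×256 input through the layers using plain arithmetic; the helper name is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


h, w = 32, 256  # pooled spectrogram size used in the notes
# Three Conv2d layers with kernel (1, 3), stride (1, 2), padding (0, 1): only the width shrinks.
for _ in range(3):
    h, w = conv_out(h, 1, 1, 0), conv_out(w, 3, 2, 1)
after_wide = (h, w)
# Two Conv2d layers with kernel 3, stride 2, padding 1: both axes halve.
for _ in range(2):
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
after_square = (h, w)
# Final Conv2d with kernel (8, 1): collapses the height and leaves 8 values.
final = (conv_out(h, 8), conv_out(w, 1))
print(after_wide, after_square, final)
```

The final 1×8 map is what forward() flattens with reshape(-1, 8), one value per label.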
188 | ③ Loss: MSELoss is used directly, computing the MSE between the predictions and the labels.
189 | 
190 | Full code:
191 | 
192 | import torch
193 | import torchaudio
194 | from torch.nn import functional as F
195 | 
196 | 
197 | def normalize(tensor):
198 |     tensor_minusmean = tensor - tensor.mean()
199 |     return tensor_minusmean / tensor_minusmean.max()
200 | 
201 | 
202 | tf = torchaudio.transforms.MFCC(sample_rate=8000)
203 | 
204 | 
205 | class Net(torch.nn.Module):
206 | 
207 |     def __init__(self):
208 |         super().__init__()
209 |         self.seq = torch.nn.Sequential(
210 |             torch.nn.Conv2d(1, 4, (1, 3), (1, 2), (0, 1)),
211 |             torch.nn.BatchNorm2d(4),
212 |             torch.nn.ReLU(),
213 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
214 |             torch.nn.BatchNorm2d(4),
215 |             torch.nn.ReLU(),
216 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
217 |             torch.nn.BatchNorm2d(4),
218 |             torch.nn.ReLU(),
219 |             torch.nn.Conv2d(4, 8, 3, 2, 1),
220 |             torch.nn.BatchNorm2d(8),
221 |             torch.nn.ReLU(),
222 |             torch.nn.Conv2d(8, 8, 3, 2, 1),
223 |             torch.nn.BatchNorm2d(8),
224 |             torch.nn.ReLU(),
225 |             torch.nn.Conv2d(8, 1, (8, 1)),
226 |         )
227 | 
228 |     def forward(self, x):
229 |         h = self.seq(x)
230 |         return h.reshape(-1, 8)
231 | 
232 | 
233 | if __name__ == '__main__':
234 | 
235 |     data_loader = torch.utils.data.DataLoader(torchaudio.datasets.YESNO('.',download=True), batch_size=1, shuffle=True)
236 | 
237 |     net = Net()
238 |     opt = torch.optim.Adam(net.parameters())
239 | 
240 |     loss_fn = torch.nn.MSELoss()
241 | 
242 |     for epoch in range(100000):
243 |         datas = []
244 |         tags = []
245 |         for data, _, tag in data_loader:
246 |             tag = torch.stack(tag, dim=1).float()
247 |             specgram = normalize(tf(data))
248 |             datas.append(F.adaptive_avg_pool2d(specgram, (32, 256)))
249 |             tags.append(tag)
250 | 
251 |         specgrams = torch.cat(datas, dim=0)
252 |         tags = torch.cat(tags, dim=0)
253 |         y = net(specgrams)
254 |         loss = loss_fn(y, tags)
255 | 
256 |         opt.zero_grad()
257 |         loss.backward()
258 |         opt.step()
259 |         print(loss)
260 | 
261 | ————————————————
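262 | Copyright notice: this is an original article by CSDN blogger 「什么都一般的咸鱼」, licensed under CC 4.0 BY-SA; reproduction must include a link to the original source and this notice.
263 | Original link: https://blog.csdn.net/weixin_41809530/article/details/106585116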
264 | 
265 | 
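266 | [Advanced Deep Learning: Hands-on Notes] Speech Recognition on the speech_commands Dataset
267 | 
268 | 什么都一般的咸鱼 2020-06-10 16:58:19
269 | Category: Deep Learning; tags: natural language processing, neural networks
271 | Speech recognition
272 | Training process
273 | A few days ago I went over the basics of speech recognition (see the previous note) and came to understand how deep learning processes and recognizes speech data. So I tried running the network from that study (below) on the Speech-commands dataset, using ten of its classes. The results were not good.
274 | 
275 | # Convolutions only, trained on the spectrogram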
276 | class Net(torch.nn.Module):
277 | 
278 |     def __init__(self):
279 |         super().__init__()
280 |         self.seq = torch.nn.Sequential(
281 |             torch.nn.Conv2d(1, 4, (1, 3), (1, 2), (0, 1)),
282 |             torch.nn.BatchNorm2d(4),
283 |             torch.nn.ReLU(),
284 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
285 |             torch.nn.BatchNorm2d(4),
286 |             torch.nn.ReLU(),
287 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
288 |             torch.nn.BatchNorm2d(4),
289 |             torch.nn.ReLU(),
290 |             torch.nn.Conv2d(4, 8, 7, 2, 1),
291 |             torch.nn.BatchNorm2d(8),
292 |             torch.nn.ReLU(),
293 |             torch.nn.Conv2d(8, 16, 5, 1, 0),
294 |             torch.nn.BatchNorm2d(16),
295 |             torch.nn.ReLU(),
296 |             torch.nn.Conv2d(16, 1, (10, 1)),# 16 input channels from the previous layer; outputs ten classes
297 |             torch.nn.ReLU(),
298 |         )
299 | 
300 |     def forward(self, x):
301 |         h = self.seq(x)
302 |         return h.reshape(-1, 10)
303 | 
333 | Main problem: the loss falls very slowly and the accuracy does not improve. (After more than 2 hours of training, accuracy peaked at 59% and the loss hovered around 1.0.)
334 | 
335 | So I logged the weights of every layer during training (with tensorboard).
336 | The weight distribution of the last layer became flatter and flatter and updated more and more slowly, so I decided to start with the weights.
337 | 
338 | 
339 | Improvements
340 | As can be seen, the weights become increasingly smooth toward the end of training, so I tried weight initialization.
341 | 
342 | def weight_init(m):  # assumes `from torch import nn`
343 |     if (isinstance(m, nn.Conv2d)):
344 |         nn.init.kaiming_normal_(m.weight)
345 |         if m.bias is not None:
346 |             nn.init.zeros_(m.bias)
347 |     elif (isinstance(m, nn.Linear)):
348 |         nn.init.kaiming_normal_(m.weight)
349 |         if m.bias is not None:
350 |             nn.init.zeros_(m.bias)
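360 | The weights of the other layers now behaved well, but the last convolutional layer still had the same problem, and the loss still fell too slowly with no gain in accuracy.
361 | 
362 | So the problem must lie in the last layer. I decided to remove it and put an RNN in its place (an LSTM as the final recurrent layer, followed by a fully connected layer for the output).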
363 | 
364 | class Net(torch.nn.Module):
365 | 
366 |     def __init__(self):
367 |         super().__init__()
368 |         self.seq = torch.nn.Sequential(
369 |             torch.nn.Conv2d(1, 4, (1, 3), (1, 2), (0, 1)),
370 |             torch.nn.BatchNorm2d(4),
371 |             torch.nn.ReLU(),
372 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
373 |             torch.nn.BatchNorm2d(4),
374 |             torch.nn.ReLU(),
375 |             torch.nn.Conv2d(4, 4, (1, 3), (1, 2), (0, 1)),
376 |             torch.nn.BatchNorm2d(4),
377 |             torch.nn.ReLU(),
378 |             torch.nn.Conv2d(4, 8, 7, 2, 1),
379 |             torch.nn.BatchNorm2d(8),
380 |             torch.nn.ReLU(),
381 |             torch.nn.Conv2d(8, 16, 5, 1, 0),
382 |             torch.nn.BatchNorm2d(16),
383 |             torch.nn.ReLU(),
384 |             #torch.nn.Conv2d(8, 1, (10, 1)),
385 |             #torch.nn.ReLU(),
386 | 
387 |         )
388 | 
389 |         self.rnn = nn.LSTM(10 * 16, 128, 2, batch_first=True, bidirectional=False)
390 |         self.output_layer = nn.Linear(128, 10)
391 |         self.apply(weight_init)
392 | 
393 |     def forward(self, x):
394 |         h = self.seq(x)
395 |         #print(h.shape)
396 |         _n, _c, _h, _w = h.shape
397 |         _x = h.permute(0, 2, 3, 1)
398 |         #print(_x.shape)
399 |         _x = _x.reshape(_n,_h,_w*_c)
400 |         #print(_x.shape)
401 |         h0 = torch.zeros(2 * 1, _n, 128, device=x.device)  # initial hidden state: (num_layers * num_directions, batch, hidden_size)
402 |         c0 = torch.zeros(2 * 1, _n, 128, device=x.device)  # initial cell state, same shape
403 |         hsn, (hn, cn) = self.rnn(_x, (h0, c0))
404 |         out = self.output_layer(hsn[:, -1, :])
405 | 
406 |         return out#.reshape(-1, 10)
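
The LSTM input size of 10 * 16 follows from the shape of the convolutional output. The sketch below traces a 32×256 spectrogram through the layers above with the usual output-size formula and then applies the same permute/reshape bookkeeping as forward(); the helper name is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


h, w, c = 32, 256, 1
for _ in range(3):                                   # Conv2d(.., (1, 3), (1, 2), (0, 1)) three times
    h, w, c = conv_out(h, 1, 1, 0), conv_out(w, 3, 2, 1), 4
h, w, c = conv_out(h, 7, 2, 1), conv_out(w, 7, 2, 1), 8    # Conv2d(4, 8, 7, 2, 1)
h, w, c = conv_out(h, 5, 1, 0), conv_out(w, 5, 1, 0), 16   # Conv2d(8, 16, 5, 1, 0)

# forward() permutes (N, C, H, W) to (N, H, W, C) and reshapes to (N, H, W * C),
# so the LSTM sees H time steps with W * C features each.
seq_len, features = h, w * c
print("conv output:", (c, h, w), "-> LSTM input:", (seq_len, features))
```

Because the sequence length is only 10 steps, taking the last time step (hsn[:, -1, :]) as the clip summary is a reasonable choice for this network.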
450 | Sure enough, the change made a dramatic difference. On the ten classes, after only 50 epochs the loss was very low and the accuracy was close to 100%.
451 | 
452 | Training on all 35 classes, i.e. the full dataset, accuracy grew very slowly once it passed 90%. After 9 hours of training it finally reached 97%.
453 | 
454 | ————————————————
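455 | Copyright notice: this is an original article by CSDN blogger 「什么都一般的咸鱼」, licensed under CC 4.0 BY-SA; reproduction must include a link to the original source and this notice.
456 | Original link: https://blog.csdn.net/weixin_41809530/article/details/106669728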
457 | 
458 | 


--------------------------------------------------------------------------------
/qt_001.md:
--------------------------------------------------------------------------------
 1 | ## issues  
 2 | * Repost, "How to fix the qt platform plugin 'windows' problem when deploying a Qt program"  
 3 | https://www.cnblogs.com/ybqjymy/p/12193701.html  
 4 | 
 5 | * Repost, "Deployed Qt program cannot display icons and images"  
 6 | https://blog.csdn.net/u011430225/article/details/77505709  
 7 | 
 8 | ## Qt5MusicPlayer  
 9 | https://github.com/lygupup/Qt5MusicPlayer  
10 | 
11 | ## Qt English vocabulary trainer, dictionary  
12 | https://github.com/squarefk/EnglishHelper  
13 | 
14 | ## QtQQ, UI Layout  
15 | https://github.com/Blackmamba-xuan/QtQQ  
16 | 
17 | ## VideoPlayer  
18 | https://github.com/yundiantech/VideoPlayer  
19 | 
20 | ## StarlightMusic  
21 | https://github.com/mengps/StarlightMusic  
22 | 
23 | ## embeddedMusicPlayer  
24 | https://github.com/kjiawei/qtCode/tree/master/project/embeddedMusicPlayer  
25 | 


--------------------------------------------------------------------------------
/risc-v_001.txt:
--------------------------------------------------------------------------------
 1 | https://www.cnblogs.com/henjay724/p/14453926.html
 2 | 
 3 | 
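 4 | 痞子衡嵌入式: A roundup of Chinese RISC-V core MCU vendors (products released in 2019)
 5 | 　　Hi everyone, I am 痞子衡, a rogue who is serious about technology. Today I am introducing Chinese vendors of MCUs with RISC-V cores (2019).
 6 | 
 7 | 　　The RISC-V wave has been building for several years, but 2019 was the year it truly entered the mainstream market. Many chip companies have recently sprung up in China, and a good number of them hope to make their mark on the new RISC-V track; after all, the ARM core market has long been a red ocean, while RISC-V is still a blue ocean. Here is a roundup of vendors that have announced RISC-V MCU products (not necessarily in mass production yet):
 8 | 
 9 | Note 1: this article mainly covers vendors that released a RISC-V MCU in 2019.
10 | Note 2: this article will be updated continuously; please leave a comment to tell me about any vendor I missed.
11 | 1. 物奇科技 WQ5000/7000 series
12 | 　　The WQ5106 on-device speech recognition chip is a high-performance AI chip aimed mainly at automatic speech recognition. It integrates two high-performance 32-bit RISC CPUs at 200 MHz, supports floating-point and SIMD operations over a high-speed on-chip bus, and has built-in high-speed, large-capacity DDR DRAM plus up to 800 KB of on-chip SRAM, giving the system reliable, fast data storage. With optimized algorithms, the chip's AI system implements deep learning (DL) functions efficiently while greatly reducing system power consumption.
13 | 
14 | Release date: 2019.04
15 | Product page: http://www.wuqi-tech.com/news/10.html
16 | Product page: http://www.wuqi-tech.com/product/7.html
17 | 2. 核芯互联 璇玑CLE
18 | 　　The 璇玑CLE series is a general-purpose embedded MCU from 核芯互联 based on a 32-bit RISC-V core (夸克Q series). It mainly targets white goods, industrial control, IoT, and other applications with high demands on stability, power consumption, and computing capability.
19 | 
20 | 　　The 璇玑CLE series has a high-bandwidth, two-stage-pipeline RISC-V Harvard architecture and delivers up to 45 DMIPS at its maximum operating frequency of 32 MHz. It is designed for ultra-low power: full-function standby consumes 7.5 μA and active consumption is 51 μA/MHz, over an ultra-wide operating voltage range of 1.6 V to 5.5 V. It has large-capacity eFlash, a code cache and a data cache, a rich set of communication peripherals, and a built-in high-precision oscillator. Notably, the series offers 48 KB of RAM, enough for the control needs of a wide range of appliances and application scenarios.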
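
The headline figures above are easier to compare with other parts once they are normalized. A small sketch of that arithmetic, using only the numbers quoted in the paragraph (45 DMIPS at 32 MHz, 51 μA/MHz active); the function names are hypothetical:

```python
def dmips_per_mhz(dmips, clock_mhz):
    """Normalize a DMIPS score by the clock frequency it was measured at."""
    return dmips / clock_mhz


def active_current_ua(ua_per_mhz, clock_mhz):
    """Estimated active current in microamps at a given clock, from a per-MHz figure."""
    return ua_per_mhz * clock_mhz


print(dmips_per_mhz(45, 32), "DMIPS/MHz")
print(active_current_ua(51, 32), "uA at 32 MHz")
```

Per-MHz figures are the usual basis for comparing cores that run at different clock speeds.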
21 | 
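22 | Release date: 2019.04
23 | Product page: N/A
24 | 3. 时擎科技 AT10xx
25 | 　　The AT1000 is a low-cost, highly integrated, low-power chip for smart voice interaction and smart control, built around a RISC-V main processor developed in-house by 时擎科技.
26 | 
27 | Release date: 2019.04
28 | Product 1 page: https://www.timesintelli.com/productinfo/1476230.html
29 | Product 2 page: https://www.timesintelli.com/productinfo/1476231.html
30 | 4. 紫光展锐 春藤5882
31 | 　　The 春藤5882 is a highly integrated single-chip TWS solution with Bluetooth 5 support. Unlike TWS solutions on the market in which the left and right earbuds do not receive audio synchronously, 紫光展锐 uses its own multi-connection scheme, covered by its own patents granted by the Chinese and US patent offices. The resulting TWS earbud technology achieves very low power consumption and very low latency, and lets either earbud act as the primary.
32 | 
33 | Release date: 2019.05
34 | Product page: https://www.unisoc.com/#/home/prodList?id=1280001849763622914&pid=1282575535297331201&cdx=0&t=0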
35 | 
36 | 
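37 | 5. 优微科技 UPD350
38 | 　　The UPD350 series is 优微科技's USB PD product family on a new architecture. The new products are USB PD 3.0/Type-C controllers built around a more open and flexible RISC-V processor core, and can be used to support enhanced PD applications.
39 | 
40 | Release date: 2019.05
41 | Product page: http://www.uwetek.com/index.php?id=65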
42 | 
43 | 
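44 | 6. 中科物栖 JX1
45 | 　　The JX1 is built on a 55 nm process, has a heterogeneous dual-core RISC-V processor, and integrates a programmable dedicated AI accelerator. It can replace existing ARM Cortex-M series cores, and has enough compute to handle low-dimensional sensor signals such as sound, motion-sensor data, and low-resolution images.
46 | 
47 | Release date: 2019.07
48 | Product page: https://www.jeejio.com/
49 | 7. 兆易创新 GD32VF103
50 | 　　The GD32VF103 series MCUs use the new Bumblebee processor core, based on the open-source RISC-V instruction set architecture. Bumblebee is a commercial RISC-V core that 兆易创新 (Gigadevice) developed jointly with 芯来科技 (Nuclei System Technology), a leading Chinese vendor of RISC-V processor core IP and solutions, for IoT and other ultra-low-power applications.
51 | 
52 | 　　The GD32VF103 series RISC-V MCUs run at up to 108 MHz and offer 16 KB to 128 KB of on-chip flash and 6 KB to 32 KB of SRAM; the patented gFlash® technology lets the core access flash at high speed with zero wait states. The Bumblebee core also includes a single-cycle hardware multiplier, a hardware divider, and acceleration units for advanced computation and data processing.
53 | 
54 | Release date: 2019.08
55 | Product page: http://www.gd32mcu.com/cn/product/risc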
56 | 
57 | 
58 | 
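59 | 8. 跃昉科技 BF-2细滘
60 | 　　The BF2 (细滘), the first-generation chip from 跃昉科技, the chip subsidiary controlled by 格兰仕, is already in production (based on RISC-V processor IP from 赛昉科技). It provides WiFi and Bluetooth, has passed security certification, is suitable for smart home appliances, and is already used in 格兰仕 microwave ovens. The 细滘 chip has also been connected to IoT public clouds such as 涂鸦 and 京东, and is used in connected products such as smart light bulbs, smart sockets, and appliance main controllers.
61 | 
62 | Release date: 2019.10
63 | Product page: N/A
64 | Subscribe
65 | Articles are published simultaneously on my 博客园 home page, CSDN home page, 知乎 home page, and WeChat official account.
66 | 
67 | Search WeChat for "痞子衡嵌入式" or scan the QR code below to read new posts on your phone right away.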
68 | 
69 | 


--------------------------------------------------------------------------------
/risc-v_002.txt:
--------------------------------------------------------------------------------
 1 | https://blog.csdn.net/Henjay724/article/details/114847975
 2 | 
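 3 | 痞子衡嵌入式: A roundup of Chinese RISC-V core MCU vendors (products released in 2020)
 4 | 
 5 | 痞子衡 2021-02-28 21:01:00
 6 | Tags: chips, algorithms, cores, IoT, java
 8 | 　　Hi everyone, I am 痞子衡, a rogue who is serious about technology. Today I am introducing Chinese vendors of MCUs with RISC-V cores (2020).
 9 | 
10 | 　　The RISC-V wave has been building for several years, but 2019 was the year it truly entered the mainstream market. Many chip companies have recently sprung up in China, and a good number of them hope to make their mark on the new RISC-V track; after all, the ARM core market has long been a red ocean, while RISC-V is still a blue ocean. Here is a roundup of vendors that have announced RISC-V MCU products (not necessarily in mass production yet):
11 | 
12 | Note 1: this article mainly covers vendors that released a RISC-V MCU in 2020.
13 | Note 2: this article will be updated continuously; please leave a comment to tell me about any vendor I missed.
14 | 1. 沁恒微电子 CH32V103
15 | 　　The CH32V103 series are 32-bit general-purpose microcontrollers built around the RISC-V3A processor, which is designed on the open-source RISC-V instruction set. On-chip features include a clock security mechanism, multi-level power management, and a general-purpose DMA controller. The series offers a rich set of peripherals: one USB2.0 host/device interface, a multi-channel 12-bit ADC, multi-channel TouchKey, several timers, and multiple IIC/USART/SPI interfaces.
16 | 
17 | Release date: 2020.06
18 | Series page: http://special.wch.cn/zh_cn/RISCV_MCU_Index/
19 | Product page: http://www.wch.cn/products/CH32V103.html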
20 | 
21 | 
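22 | 2. 泰凌微电子 TLSR9xxx
23 | 　　泰凌微电子 has launched the new RISC-V based Telink TLSR 9 series of high-performance SoCs, aimed mainly at wearables and all kinds of IoT products.
24 | 
25 | 　　The Telink TLSR 9 series integrates a 32-bit RISC-V MCU (晶心D25 core). The standard version runs at up to 96 MHz with a 5-stage pipeline, delivers 2.59 DMIPS/MHz and a CoreMark score of 3.54/MHz, and adds DSP extension instructions and a floating-point unit, which helps with developing audio and sensor algorithms.
26 | 
27 | Release date: 2020.08
28 | Product page: https://www.telink-semi.cn/products/multiprotocol-iot/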
29 | 
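30 | 3. 博流智能 BL60x/BL70x
31 | 　　The BL602 is a Wi-Fi + BLE combo chipset for low-power, high-performance applications. The wireless subsystem contains a 2.4G radio and Wi-Fi 802.11b/g/n and BLE baseband/MAC designs. The microcontroller subsystem contains a low-power 32-bit RISC CPU, cache, and memory. A power management unit controls the low-power modes, and various security features are supported.
32 | 
33 | 　　The BL702 is a BLE + Zigbee combo chipset for low-power IoT applications. The wireless subsystem contains a 2.4G radio and BLE5.0 and 802.15.4 baseband/MAC designs. The microcontroller subsystem contains a low-power 32-bit RISC CPU, cache, and memory. A power management unit controls the low-power modes, and various security features are supported.
34 | 
35 | Release date: 2020.08
36 | Product 1 page: https://www.bouffalolab.com/bl602
37 | Product 2 page: https://www.bouffalolab.com/bl70X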
38 | 
39 | 
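40 | 4. 乐鑫科技 ESP32-C3
41 | 　　The ESP32-C3 is a secure, stable, low-power, low-cost IoT chip with a 32-bit single-core RISC-V processor and support for 2.4 GHz Wi-Fi and Bluetooth LE 5.0. It gives IoT products industry-leading RF performance, comprehensive security mechanisms, and ample memory. Supporting both Wi-Fi and Bluetooth LE 5.0 makes device provisioning easier and suits a wide range of IoT applications.
42 | 
43 | 　　The 32-bit single-core RISC-V processor in the ESP32-C3 is clocked at up to 160 MHz. The chip has 22 programmable GPIO pins and 400 KB of built-in SRAM, and can attach multiple external flash chips through SPI, Dual SPI, Quad SPI, and QPI interfaces, covering the functional needs of all kinds of IoT products. Its tolerance of high temperatures also makes it a good fit for lighting and industrial control.
44 | 
45 | Release date: 2020.12
46 | Product page: https://www.espressif.com/zh-hans/products/socs/esp32-c3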
47 | 
48 | 五、中微半导体 ANT32RV56xx
49 | 　　ANT32RV56xx集成RISC-V内核的32位微控制器。该系列芯片搭载芯来科技(Nuclei System Technology) N100系列超低功耗RISC-V处理器内核，集成模拟外设并简化设计，轻松应对消费电子对高算力、低功耗的要求。
50 | 
51 | 发布时间：2020.12
52 | 产品主页：https://www.mcu.com.cn/production/microcontrollers-risc-v_7.html
53 | 
54 | 
55 | 六、中科蓝讯 AB32VG1
56 | 　　AB32VG1采用中科蓝讯自主RISC-V内核，提供了125MHz的运算主频(最高可超频至192MHz)，片上集成RAM 192Kbyte，Flash 1Mbyte，ADC，DAC，PWM，USB，SD， UART，IIC等资源。
57 | 
58 | 发布时间：2020.12
59 | 产品主页：http://www.bluetrum.com/product/
60 | 
65 | ————————————————
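66 | Copyright notice: This is an original article by CSDN blogger 痞子衡, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
67 | Original link: https://blog.csdn.net/Henjay724/article/details/114847975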
68 | 
69 | 


--------------------------------------------------------------------------------
/risc-v_003.txt:
--------------------------------------------------------------------------------
 1 | https://blog.csdn.net/Henjay724/article/details/114847983
 2 | 
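 3 | 痞子衡嵌入式 (Pizi Heng Embedded): A Roundup of Chinese RISC-V MCU Vendors (Products Announced in 2021)
 4 | 
 5 | 痞子衡 2021-03-02 14:28:00
 6 | Tags: chips, embedded, cores, java, IoT
 8 | Hello everyone, I'm 痞子衡 (Pizi Heng), a rascal who takes technology seriously. Today I'm introducing the Chinese vendors of RISC-V-core MCUs (2021).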
 9 | 
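10 | Although the RISC-V wave has been building for several years, 2019 was the year it truly entered the mainstream market. Many Chinese chip companies have emerged recently, and a good number of them want to make their mark on the new RISC-V track; after all, ARM cores have long been a red ocean, while RISC-V is still a blue ocean. Here is a roundup of the vendors that have announced RISC-V MCU products (not necessarily in mass production yet):
11 | 
12 | Note 1: This article mainly covers vendors that announced a RISC-V MCU during 2021.
13 | Note 2: This article will be updated continuously; please leave a comment to tell me about any vendor I missed.
14 | 1. 航顺芯片 (Hangshun) HK32U1xx9
15 | The HK32U1xx9 series uses a heterogeneous integrated architecture: a 芯来 (Nuclei) N203 RISC-V processor handles communication and control, while an Arm Cortex-M3 handles computation. The chip also provides MMU-based, hardware-level access control over system resources (configurable down to each individual peripheral), an in-house IPC protocol for dual-core communication and control, and efficient data exchange between the two cores. It supports two-wire JTAG/SWD and five-wire JTAG debug interfaces.
16 | 
17 | Announced: 2021.01
18 | Product page: http://www.hsxp-hk.com/
19 | 2. 平头哥 (T-Head) CH2601
20 | CH2601 is a RISC-V ecosystem chip based on the 玄铁 (XuanTie) E906, with a maximum clock of 220 MHz. It supports the AliOS Things IoT operating system, T-Head's YoC software platform, and T-Head's 剑池 development tool (CDK).
21 | 
22 | Announced: 2021.02
23 | Product page: https://occ.t-head.cn/vendor/detail/index?spm=a2cl5.14293897.0.0.d149132dVkpkXj&id=3878941840279867392&vendorId=3706716635429273600&module=1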
24 | 
29 | ————————————————
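30 | Copyright notice: This is an original article by CSDN blogger 痞子衡, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
31 | Original link: https://blog.csdn.net/Henjay724/article/details/114847983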
32 | 
33 | 


--------------------------------------------------------------------------------
/risc-v_004.txt:
--------------------------------------------------------------------------------
  1 | http://www.elecfans.com/d/1530288.html
  2 | 
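  3 | New Chinese RISC-V MCU Vendors and Their Products
  4 | 嵌入式ARM • 2021-03-09 13:49
  5 | 
  6 | Although the RISC-V wave has been building for several years, 2019 was the year it truly entered the mainstream market. Many Chinese chip companies have emerged recently, and a good number of them want to make their mark on the new RISC-V track; after all, ARM cores have long been a red ocean, while RISC-V is still a blue ocean. Here 痞子衡 takes stock of the vendors that have announced RISC-V MCU products (not necessarily in mass production yet):
  7 | 
  8 | Note: This article will be updated continuously; please leave a comment to tell me about new Chinese RISC-V MCU vendors and products.
  9 | 
 10 | 1. 核芯互联 璇玑CLE
 11 | 
 12 | The 璇玑CLE (Xuanji CLE) series is a general-purpose embedded MCU from 核芯互联 based on a 32-bit RISC-V core (夸克 Q series). It targets white goods, industrial control, IoT, and other applications with high demands on stability, power consumption, and computing performance.
 13 | 
 14 | The 璇玑CLE series has a very-high-bandwidth, two-stage-pipeline RISC-V Harvard architecture and reaches 45 DMIPS at its maximum clock of 32 MHz. It is designed for ultra-low power: full-function standby current is 7.5 μA and dynamic current is 51 μA/MHz, over a wide 1.6 V to 5.5 V supply range. It has large-capacity eFlash, a code cache and a data cache, a rich set of communication peripherals, and a built-in high-precision oscillator. Notably, the series offers up to 48 KB of RAM, enough for the control needs of different appliances and application scenarios.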
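
The quoted figures can be combined into a per-MHz efficiency and a total dynamic current at full speed. This is a small worked calculation, not vendor data: it assumes the 51 μA/MHz figure applies uniformly up to the 32 MHz maximum, and the variable names are chosen here for illustration.

```python
# Figures quoted above for the 璇玑CLE series at its maximum clock.
CLOCK_MHZ = 32
TOTAL_DMIPS = 45
DYNAMIC_UA_PER_MHZ = 51

dmips_per_mhz = TOTAL_DMIPS / CLOCK_MHZ              # efficiency per MHz
dynamic_ma = DYNAMIC_UA_PER_MHZ * CLOCK_MHZ / 1000   # microamps -> milliamps

print(f"{dmips_per_mhz:.2f} DMIPS/MHz, {dynamic_ma:.3f} mA dynamic current at {CLOCK_MHZ} MHz")
```

That works out to about 1.4 DMIPS/MHz and about 1.6 mA of dynamic current at 32 MHz.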
 15 | 
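 16 | Announced: 2019.04
 17 | 
 18 | Product page: N/A
 19 | 
 20 | 2. 兆易创新 (GigaDevice) GD32VF103
 21 | 
 22 | The GD32VF103 series MCUs use the new Bumblebee processor core, based on the open-source RISC-V instruction set architecture. Bumblebee is a commercial RISC-V core jointly developed by 兆易创新 (GigaDevice) and 芯来科技 (Nuclei System Technology), a leading Chinese vendor of RISC-V core IP and solutions, for IoT and other ultra-low-power applications.
 23 | 
 24 | The GD32VF103 RISC-V MCUs run at 108 MHz and provide 16 KB to 128 KB of on-chip flash and 6 KB to 32 KB of SRAM; the patented gFlash technology lets the core access flash at high speed with zero wait states. The Bumblebee core also has a built-in single-cycle hardware multiplier, a hardware divider, and an acceleration unit for demanding computation and data processing.
 25 | 
 26 | Announced: 2019.08
 27 | 
 30 | 3. 沁恒微电子 (WCH) CH32V103
 31 | 
 32 | The CH32V103 series is a 32-bit general-purpose microcontroller built around the RISC-V3A processor, which is designed on the open-source RISC-V instruction set. It integrates a clock security mechanism, multi-level power management, and a general-purpose DMA controller on chip. The series offers rich peripherals, including one USB 2.0 host/device interface, a multi-channel 12-bit ADC module, multi-channel TouchKey, several timer groups, and multiple IIC/USART/SPI interfaces.
 33 | 
 34 | Announced: 2020.06
 35 | 
 36 | 4. 乐鑫科技 (Espressif) ESP32-C3
 37 | 
 38 | ESP32-C3 is a secure, stable, low-power, low-cost IoT chip with a single-core 32-bit RISC-V processor and support for 2.4 GHz Wi-Fi and Bluetooth LE 5.0. It offers IoT products industry-leading RF performance, comprehensive security mechanisms, and ample memory. Supporting both Wi-Fi and Bluetooth LE 5.0 makes device provisioning easier and suits a wide range of IoT scenarios.
 39 | 
 40 | The single-core 32-bit RISC-V processor in ESP32-C3 is clocked at up to 160 MHz. The chip has 22 programmable GPIO pins and 400 KB of built-in SRAM, and it can attach multiple external flash chips through SPI, Dual SPI, Quad SPI, and QPI interfaces, meeting the functional needs of all kinds of IoT products. Its tolerance of high temperatures also makes it a good choice for lighting and industrial control.
 41 | 
 42 | Announced: 2020.12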
 43 | 
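 44 | 5. 中微半导体 (CMSemicon) ANT32RV56xx
 45 | 
 46 | ANT32RV56xx is a 32-bit microcontroller with an integrated RISC-V core. The series uses the N100 series ultra-low-power RISC-V processor core from 芯来科技 (Nuclei System Technology) and integrates analog peripherals to simplify design, handling the high compute and low power demands of consumer electronics.
 47 | 
 48 | Announced: 2020.12
 49 | 
 52 | 6. 中科蓝讯 (Bluetrum) AB32VG1
 53 | 
 54 | AB32VG1 uses Bluetrum's own RISC-V core and runs at 125 MHz (overclockable up to 192 MHz). On-chip resources include 192 KB of RAM, 1 MB of flash, ADC, DAC, PWM, USB, SD, UART, and IIC.
 55 | 
 56 | Announced: 2020.12
 57 | 
 58 | 7. 航顺芯片 (Hangshun) HK32U1xx9
 59 | 
 60 | The HK32U1xx9 series uses a heterogeneous integrated architecture: a 芯来 (Nuclei) N203 RISC-V processor handles communication and control, while an Arm Cortex-M3 handles computation. The chip also provides MMU-based, hardware-level access control over system resources (configurable down to each individual peripheral), an in-house IPC protocol for dual-core communication and control, and efficient data exchange between the two cores. It supports two-wire JTAG/SWD and five-wire JTAG debug interfaces.
 61 | 
 62 | Announced: 2021.01
 63 | 
 64 | 8. 芯来科技 (Nuclei) open-source 蜂鸟E203
 65 | 
 66 | The 蜂鸟 (Hummingbird) E203 processor, developed by 芯来科技 (Nuclei), is the first complete open-source RISC-V processor project in China. It provides a full solution from modules to SoC, from hardware to software, and from execution to debugging, along with complete documentation, books, and development boards. Its experienced team wrote synthesizable RTL in robust Verilog 2001 syntax to industrial standards; the code is richly commented, readable, and easy to understand.
 67 | 
 68 | Announced: 2018.04
 69 | 
 70 | 9. 华米科技 (Huami) 黄山1/2号
 71 | 
 72 | 华米科技 (Huami) announced 黄山1号 (Huangshan No. 1), the first AI chip in the global wearables field. The chip is based on the RISC-V instruction set architecture, runs at 240 MHz on a 55 nm process, and integrates an AON (Always On) module controller and a neural-network acceleration module.
 73 | 
 74 | Huangshan No. 1 is not only low-power; it can also move sensor data into internal SRAM automatically, making data storage faster and more stable. More notably, its neural-network acceleration module lets it handle AI tasks locally: through four engines, Heart Rate, ECG Engine, ECG Engine Pro, and Arrhythmias, it monitors and analyzes heart rate, ECG, and arrhythmia in real time, and it can be used widely in smart wearable devices.
 75 | 
 76 | Announced: 2018.09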
 77 | 
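 78 | 10. 物奇科技 (WuQi) WQ5000/7000 series
 79 | 
 80 | The WQ5106 local speech-recognition chip is a high-performance AI chip used mainly for automatic speech recognition. It integrates two high-performance 32-bit RISC CPUs at 200 MHz, supports floating-point and SIMD operations over a high-speed on-chip bus, and includes high-speed, large-capacity DDR DRAM plus up to 800 KB of on-chip SRAM for reliable, fast data storage. With optimized algorithms, the chip's AI system implements deep learning (DL) functions efficiently and greatly reduces system power consumption.
 81 | 
 82 | Announced: 2019.04
 83 | 
 84 | 11. 跃昉科技 (LeapFive) BF-2 细滘
 85 | 
 86 | 跃昉科技 (LeapFive), the chip subsidiary of 格兰仕 (Galanz) Holdings, has brought its first-generation chip, BF2 (细滘), into production; it is based on RISC-V processor IP from 赛昉科技 (StarFive). The chip provides Wi-Fi and Bluetooth, has passed security certification, targets smart home appliances, and is already used in Galanz microwave ovens. It has also been connected to IoT public clouds such as 涂鸦 (Tuya) and 京东 (JD), and is used in connected products such as smart bulbs, smart plugs, and appliance main controllers.
 87 | 
 88 | Announced: 2019.10
 89 | 
 90 | Product page: N/A
 91 | 
 92 | 12. 泰凌微电子 (Telink) TLSR9xxx
 93 | 
 94 | Telink has launched the new RISC-V-based Telink TLSR 9 series of high-performance SoCs, aimed mainly at wearables and all kinds of IoT products.
 95 | 
 96 | The Telink TLSR 9 series integrates a 32-bit RISC-V MCU (Andes (晶心) D25 core). The standard version runs at up to 96 MHz, has a 5-stage pipeline, and delivers 2.59 DMIPS/MHz and a CoreMark score of 3.54/MHz. It also integrates DSP extension instructions and a floating-point unit, which eases the development of audio and sensor algorithms.
 97 | 
 98 | Announced: 2020.08
 99 | 
100 | 13. 博流智能 (Bouffalo Lab) BL60x/BL70x
101 | 
102 | BL602 is a combined Wi-Fi + BLE chipset for low-power, high-performance applications. The wireless subsystem contains a 2.4 GHz radio and Wi-Fi 802.11b/g/n and BLE baseband/MAC designs. The microcontroller subsystem contains a low-power 32-bit RISC CPU, cache, and memory. A power management unit controls the low-power modes, and a range of security features is supported.
103 | 
104 | BL702 is a combined BLE + Zigbee chipset for low-power IoT applications. The wireless subsystem contains a 2.4 GHz radio and BLE 5.0 and 802.15.4 baseband/MAC designs. The microcontroller subsystem contains a low-power 32-bit RISC CPU, cache, and memory. A power management unit controls the low-power modes, and a range of security features is supported.
105 | 
106 | Announced: 2020.08
107 | 
112 | 14. 平头哥 (T-Head) CH2601
113 | 
114 | CH2601 is a RISC-V ecosystem chip based on the 玄铁 (XuanTie) E906, with a maximum clock of 220 MHz. It supports the AliOS Things IoT operating system, T-Head's YoC software platform, and T-Head's 剑池 development tool (CDK).
115 | 
116 | Announced: 2021.02
117 | 
118 | Original title: 最全！盘点国内RISC-V内核MCU厂商 (The Most Complete Roundup of Chinese RISC-V MCU Vendors)
119 | 
120 | Source: WeChat official account 嵌入式ARM. Please credit the source when reposting.
121 | 
122 | Editor: haq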
123 | 
124 | 


--------------------------------------------------------------------------------
/rpa_001.txt:
--------------------------------------------------------------------------------
1 | * blue prism  
2 | * uipath  
3 | 


--------------------------------------------------------------------------------
/rt-smart_ffmpeg_demo.txt:
--------------------------------------------------------------------------------
  1 | https://www.rt-thread.org/document/site/#/rt-thread-version/rt-thread-smart/application-note/sdl2_ffmpeg/sdl2_ffmpeg
  2 | 
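  3 | Video playback with FFmpeg + SDL2
  4 | Building on the document "使用 VS Code 开发 GUI 应用" (Developing GUI Applications with VS Code), this note uses FFmpeg + SDL2 to play video on the ART-Pi Smart platform. Because the ART-Pi Smart has no audio module, audio decoding and playback are not implemented.
  5 | 
  6 | X264
  7 | Introduction
  8 | X264 is a free, open-source software library and command-line utility developed by VideoLAN for encoding video streams into the H.264 / MPEG-4 AVC format. It is released under the terms of the GNU General Public License.
  9 | 
 10 | FFmpeg is a feature-rich codec library with a built-in H.264 decoder; H.264 encoding, however, requires integrating X264 as the encoder.
 11 | 
 12 | Download
 13 | git clone https://code.videolan.org/videolan/x264.git
 15 | Source directory:
 16 | 
 17 | Snipaste_2022-01-17_10-16-33.png
 18 | 
 19 | Cross-compiling
 20 | Create a build_x264.sh file in the directory that contains the x264 folder.
 21 | 
 22 | The contents of build_x264.sh are shown below. Note: change RTT_EXEC_PATH and ROOTDIR to your own local paths: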
 23 | 
 24 | # Get initial variables
 25 | export RTT_EXEC_PATH=/home/liukang/repo/ART-Pi-smart/tools/gnu_gcc/arm-linux-musleabi_for_x86_64-pc-linux-gnu/bin
 26 | export PATH=$PATH:$RTT_EXEC_PATH:$RTT_EXEC_PATH/../arm-linux-musleabi/bin
 27 | 
 28 | export CROSS_COMPILE="arm-linux-musleabi"
 29 | 
 30 | if [ "$1" == "debug" ]; then
 31 |     export CFLAGS="-march=armv7-a -marm -msoft-float -D__RTTHREAD__ -O0 -g -gdwarf-2 -Wall -n --static"
 32 | else
 33 |     export CFLAGS="-march=armv7-a -marm -msoft-float -D__RTTHREAD__ -O2 -Wall -n --static"
 34 | fi
 35 | 
 36 | export AR=${CROSS_COMPILE}-ar
 37 | export AS=${CROSS_COMPILE}-as
 38 | export LD=${CROSS_COMPILE}-ld
 39 | export RANLIB=${CROSS_COMPILE}-ranlib
 40 | export CC=${CROSS_COMPILE}-gcc
 41 | export CXX=${CROSS_COMPILE}-g++
 42 | export NM=${CROSS_COMPILE}-nm
 43 | 
 44 | ROOTDIR="/home/liukang/repo/ART-Pi-smart/userapps"
 45 | 
 46 | APP_NAME="x264"
 47 | 
 48 | APP_DIR=${APP_NAME}
 49 | LIB_DIR=${ROOTDIR}/sdk/lib
 50 | INC_DIR=${ROOTDIR}/sdk/include
 51 | 
 52 | RT_DIR=${ROOTDIR}/sdk/rt-thread
 53 | RT_INC=" -I. -Iinclude -I${ROOTDIR} -I${RT_DIR}/include -I${RT_DIR}/components/dfs -I${RT_DIR}/components/drivers -I${RT_DIR}/components/finsh -I${RT_DIR}/components/net -I${INC_DIR}/sdl -DHAVE_CCONFIG_H"
 54 | RT_INC+=" -I${ROOTDIR}/../kernel/bsp/imx6ull-artpi-smart/drivers/"
 55 | 
 56 | export CPPFLAGS=${RT_INC}
 57 | export LDFLAGS="-L${LIB_DIR} "
 58 | 
 59 | export LIBS="-T ${ROOTDIR}/linker_scripts/arm/cortex-a/link.lds -march=armv7-a -marm -msoft-float -L${RT_DIR}/lib -Wl,--whole-archive -lrtthread -Wl,--no-whole-archive -n -static -Wl,--start-group -lc -lgcc -lrtthread -Wl,--end-group"
 60 | 
 61 | # default build
 62 | function builddef() {
 63 |     cd ${APP_DIR}
 64 |     ./configure \
 65 |     --prefix=/home/liukang/repo/x264lib \
 66 |     --host=${CROSS_COMPILE} \
 67 |     --disable-asm \
 68 |     --enable-static 
 69 |     make clean
 70 |     if [ "$1" == "verbose" ]; then
 71 |         make V=1
 72 |     else
 73 |         make
 74 |     fi
 75 |     make install
 76 | }
 77 | 
 78 | builddef $1
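 80 | Run build_x264.sh to build the static library:
 81 | 
 82 | Snipaste_2022-01-17_11-37-07.png
 83 | 
 84 | Once the steps above succeed, the x264lib folder contains the x264 static library and header files:
 85 | 
 86 | Static library:
 87 | 
 88 | Snipaste_2022-01-17_11-37-42.png
 89 | 
 90 | Header files:
 91 | 
 92 | Snipaste_2022-01-17_11-38-01.png
 93 | 
 94 | FFmpeg
 95 | Introduction
 96 | FFmpeg is an open-source suite of programs for recording and converting digital audio and video and turning them into streams. It is licensed under the LGPL or GPL. It provides a complete solution for recording, converting, and streaming audio and video, and it includes the very advanced audio/video codec library libavcodec; to ensure high portability and codec quality, much of the code in libavcodec was written from scratch.
 97 | 
 98 | FFmpeg is developed on Linux, but it can also be built and run on other operating systems, including Windows and Mac OS X. The project was started by Fabrice Bellard and was maintained mainly by Michael Niedermayer from 2004 to 2015. The following describes how to port FFmpeg to the ART-Pi Smart platform to decode video.
 99 | 
100 | Download
101 | Open the FFmpeg website and download the source:
102 | 
103 | Snipaste_2022-01-17_10-14-19.png
104 | 
105 | Cross-compiling
106 | Extract the tar.bz2 file:
107 | 
108 | tar -jxvf ffmpeg-snapshot.tar.bz2
110 | Create a build_ffmpeg.sh file in the directory that contains the ffmpeg folder: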
111 | 
112 | # Get initial variables
113 | ROOTDIR="/home/liukang/repo/ART-Pi-smart/userapps"
114 | 
115 | APP_NAME="ffmpeg"
116 | 
117 | APP_DIR=${APP_NAME}
118 | LIB_DIR=${ROOTDIR}/sdk/lib
119 | INC_DIR=${ROOTDIR}/sdk/include
120 | 
121 | RT_DIR=${ROOTDIR}/sdk/rt-thread
122 | RT_INC=" -I. -Iinclude -I${ROOTDIR} -I${RT_DIR}/include -I${RT_DIR}/components/dfs -I${RT_DIR}/components/drivers -I${RT_DIR}/components/finsh -I${RT_DIR}/components/net -I${INC_DIR}/sdl -DHAVE_CCONFIG_H"
123 | RT_INC+=" -I${ROOTDIR}/../kernel/bsp/imx6ull-artpi-smart/drivers/"
124 | 
125 | export CPPFLAGS=${RT_INC}
126 | export LDFLAGS="-L${LIB_DIR} "
127 | 
128 | export LIBS="-T ${ROOTDIR}/linker_scripts/arm/cortex-a/link.lds -march=armv7-a -marm -msoft-float -L${RT_DIR}/lib -Wl,--whole-archive -lrtthread -Wl,--no-whole-archive -n -static -Wl,--start-group -lc -lgcc -lrtthread -Wl,--end-group"
129 | 
130 | export RTT_EXEC_PATH=/home/liukang/repo/ART-Pi-smart/tools/gnu_gcc/arm-linux-musleabi_for_x86_64-pc-linux-gnu/bin
131 | export PATH=$PATH:$RTT_EXEC_PATH:$RTT_EXEC_PATH/../arm-linux-musleabi/bin
132 | 
133 | export CROSS_COMPILE="arm-linux-musleabi"
134 | 
135 | if [ "$1" == "debug" ]; then
136 |     export CFLAGS="-march=armv7-a -marm -msoft-float -D__RTTHREAD__ -O0 -g -gdwarf-2 -Wall -n --static"
137 | else
138 |     export CFLAGS="-march=armv7-a -marm -msoft-float -D__RTTHREAD__ -O2 -Wall -n --static"
139 | fi
140 | 
141 | # default build
142 | function builddef() {
143 |     cd ${APP_DIR}
144 |     ./configure \
145 |     --cross-prefix=${CROSS_COMPILE}- --enable-cross-compile --target-os=linux \
146 |     --cc=${CROSS_COMPILE}-gcc \
147 |     --ar=${CROSS_COMPILE}-ar \
148 |     --ranlib=${CROSS_COMPILE}-ranlib \
149 |     --arch=arm --prefix=/home/liukang/repo/ffmpeg/ffmpeg_lib \
150 |     --pkg-config-flags="--static" \
151 |     --enable-gpl --enable-nonfree --disable-ffplay --enable-swscale --enable-pthreads --disable-armv5te --disable-armv6 --disable-armv6t2 --disable-x86asm  --disable-stripping \
152 |     --enable-libx264 --extra-cflags=-I/home/liukang/repo/x264lib/include --extra-ldflags=-L/home/liukang/repo/x264lib/lib --extra-libs=-ldl
153 |     make clean
154 |     if [ "$1" == "verbose" ]; then
155 |         make V=1
156 |     else
157 |         make
158 |     fi
159 |     make install
160 | }
161 | 
162 | builddef $1
164 | Run build_ffmpeg.sh:
165 | 
166 | Snipaste_2022-01-17_10-13-04.png
167 | 
168 | Once the steps above succeed, the ffmpeg_lib folder contains the ffmpeg static libraries and header files:
169 | 
170 | Libraries:
171 | 
172 | Snipaste_2022-01-17_10-11-43.png
173 | 
174 | Header files:
175 | 
176 | Snipaste_2022-01-17_10-11-55.png
177 | 
178 | Video playback demo
179 | 1. Use VS Code to generate a makefile project.
180 | 
181 | 2. Put the static libraries generated above into the Smart SDK directory:
182 | 
183 | Snipaste_2022-01-17_10-11-08.png
184 | 
185 | 3. Edit the makefile to add the static libraries:
186 | 
187 | # Program version
188 | VERSION = 1.0.0
189 | 
190 | CROSS_COMPILE = arm-linux-musleabi-
191 | CC = $(CROSS_COMPILE)gcc
192 | CXX = $(CROSS_COMPILE)g++
193 | 
194 | # project root path
195 | PROJECT_DIR := $(shell pwd)
196 | 
197 | # userapps root path
198 | UROOT_DIR = $(PROJECT_DIR)/../..
199 | 
200 | # rt-thread paths
201 | RT_DIR = $(UROOT_DIR)/sdk/rt-thread
202 | INC_DIR = $(UROOT_DIR)/sdk/rt-thread/include
203 | LIB_DIR = ${UROOT_DIR}/sdk/rt-thread/lib
204 | 
205 | # sdl path
206 | SDL_DIR = ${UROOT_DIR}/sdk/include/sdl
207 | # ffmpeg
208 | FFMPEG_DIR = ${UROOT_DIR}/sdk/include/ffmpeg
209 | # x264
210 | X264_DIR = ${UROOT_DIR}/sdk/include/x264
211 | 
212 | # compile flags
213 | CFLAGS = -march=armv7-a -marm -msoft-float -D__RTTHREAD__ -Wall -O0 -g -gdwarf-2 -n --static
214 | 
215 | # header search paths
216 | CFLAGS += -I. -I$(UROOT_DIR) -I$(PROJECT_DIR) -I$(RT_DIR)/components/dfs -I$(RT_DIR)/components/drivers -I$(RT_DIR)/components/finsh -I$(RT_DIR)/components/net \
217 | 	-I$(RT_DIR)/components/net/netdev -I$(RT_DIR)/components/net/arpa -I${INC_DIR} -I${INC_DIR}/libc -I${INC_DIR}/sys -I${SDL_DIR} -I${FFMPEG_DIR} \
218 | 	-I${FFMPEG_DIR}/libavcodec -I${FFMPEG_DIR}/libavdevice -I${FFMPEG_DIR}/libavfilter -I${FFMPEG_DIR}/libavformat \
219 | 	-I${FFMPEG_DIR}/libavutil -I${FFMPEG_DIR}/libpostproc -I${FFMPEG_DIR}/libswresample -I${FFMPEG_DIR}/libswscale -I${X264_DIR}
220 | 
221 | # linker script
222 | LDFLAGS = -march=armv7-a -marm -msoft-float -T ${UROOT_DIR}/linker_scripts/arm/cortex-a/link.lds
223 | 
224 | # libraries
225 | LDFLAGS += -L$(LIB_DIR) -Wl,--whole-archive -Os -lrtthread -lSDL2 -lavcodec -lavdevice -lavfilter -lavformat -lavutil -lpostproc -lswresample -lswscale -lx264 -Wl,--no-whole-archive -n --static -Wl,--start-group -lc -lgcc -lrtthread -lSDL2 -lavcodec -lavdevice -lavfilter -lavformat -lavutil -lpostproc -lswresample -lswscale -lx264 -Wl,--end-group
226 | 
227 | default:
228 | 	$(CC) $(CFLAGS) -c main.c -o main.o
229 | 	$(CC) $(LDFLAGS) main.o -o hello.elf
230 | clean:
231 | 	@rm *.o *.elf
232 | .PHONY: default clean
233 | 
234 | 4. Build
235 | 
236 | ![Snipaste_2022-01-17_10-10-13.png](figures/10.png)
237 | 
238 | 5. Boot the elf file from the SD card: copy the generated hello.elf and the video file to the SD card and insert it into the ART-Pi Smart:
239 | 
240 |  ![Snipaste_2022-01-17_10-15-04.png](figures/11.png)
241 | 
242 | ## Complete code
243 | 
244 | ```c
245 | #include <stdio.h>
246 | #include <SDL.h>
247 | #include <libavcodec/avcodec.h>
248 | #include <libavformat/avformat.h>
249 | #include <libswscale/swscale.h>
250 | 
251 | extern Uint32 rtt_screen_width;
252 | extern Uint32 rtt_screen_heigth;
253 | 
254 | int main (int argc, char *argv[]) 
255 | {
256 |  int ret = -1;
257 |  AVFormatContext *pFormatCtx = NULL; 
258 |  int videoStream;
259 |  AVCodecParameters *pCodecParameters = NULL; 
260 |  AVCodecContext *pCodecCtx = NULL;
261 |  const AVCodec *pCodec = NULL; // const works with both older and newer FFmpeg APIs
262 |  AVFrame *pFrame = NULL;
263 |  AVPacket packet;
264 | 
265 |  SDL_Rect rect;
266 |  SDL_Window *win = NULL;
267 |  SDL_Renderer *renderer = NULL;
268 |  SDL_Texture *texture = NULL;
269 | 
270 |  if(( argc != 2 ))
271 |  {
272 |      printf("error input arguments!\n");
273 |      return(1);
274 |  }
275 | 
276 |  // default window size
277 |  int w_width  = rtt_screen_width;
278 |  int w_height = rtt_screen_heigth;
279 | 
280 |  // select the "rtt" SDL video driver
281 |  SDL_setenv("SDL_VIDEODRIVER","rtt",1);
282 |  // initialize SDL
283 |  if( SDL_Init( SDL_INIT_VIDEO ) < 0 )
284 |  {
285 |      printf( "SDL could not initialize! SDL_Error: %s\n", SDL_GetError());
286 |      return -1;
287 |  }
288 | 
289 |  // open the input file
290 |  if (avformat_open_input(&pFormatCtx, argv[1], NULL, NULL) != 0) 
291 |  {
292 |      printf("Couldn't open video file!: %s\n", argv[1]);
293 |      goto __exit; 
294 |  }
295 | 
296 |  // find the video stream
297 |  videoStream = av_find_best_stream(pFormatCtx, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
298 |  if (videoStream < 0) // av_find_best_stream returns a negative AVERROR code on failure
299 |  {
300 |      printf("Didn't find a video stream!\n");
301 |      goto __exit;
302 |  }
303 | 
304 |  // stream parameters
305 |  pCodecParameters = pFormatCtx->streams[videoStream]->codecpar;
306 | 
307 |  // find the decoder
308 |  pCodec = avcodec_find_decoder(pCodecParameters->codec_id);
309 |  if (pCodec == NULL) 
310 |  {
311 |      printf("Unsupported codec!\n");
312 |      goto __exit; // Codec not found
313 |  }
314 | 
315 |  // allocate a codec context and fill it from the stream parameters
316 |  pCodecCtx = avcodec_alloc_context3(pCodec);
317 |  if (pCodecCtx == NULL || avcodec_parameters_to_context(pCodecCtx, pCodecParameters) < 0) 
318 |  {
319 |      printf("Couldn't copy codec context\n");
320 |      goto __exit; // allocation failed or parameters could not be copied
321 |  }
322 | 
323 |  // open the decoder
324 |  if (avcodec_open2(pCodecCtx, pCodec, NULL) < 0) 
325 |  {
326 |      printf("Failed to open decoder!\n");
327 |      goto __exit; // Could not open codec
328 |  }
329 | 
330 |  // Allocate video frame
331 |  pFrame = av_frame_alloc();
332 | 
333 |  w_width = pCodecCtx->width;
334 |  w_height = pCodecCtx->height;
335 | 
336 |  // create the window
337 |  win = SDL_CreateWindow("Media Player",
338 |                         SDL_WINDOWPOS_UNDEFINED,
339 |                         SDL_WINDOWPOS_UNDEFINED,
340 |                         w_width, w_height,
341 |                         SDL_WINDOW_SHOWN );
342 |  if (!win) 
343 |  {
344 |      printf("Failed to create window by SDL\n");
345 |      goto __exit;
346 |  }
347 | 
348 |  // create the renderer
349 |  renderer = SDL_CreateRenderer(win, -1, 0);
350 |  if (!renderer) 
351 |  {
352 |      printf("Failed to create Renderer by SDL\n");
353 |      goto __exit;
354 |  }
355 | 
356 |  // create the texture
357 |  texture = SDL_CreateTexture(renderer, SDL_PIXELFORMAT_IYUV,
358 |                              SDL_TEXTUREACCESS_STREAMING,
359 |                              w_width,
360 |                              w_height);
361 |  if (!texture) goto __exit; // nothing to draw into without a texture
362 | 
363 |  // read packets
364 |  while (av_read_frame(pFormatCtx, &packet) >= 0) 
365 |  {
366 |      if (packet.stream_index == videoStream) 
367 |      {
368 |          // decode
369 |          avcodec_send_packet(pCodecCtx, &packet);
370 |          while (avcodec_receive_frame(pCodecCtx, pFrame) == 0) 
371 |          {
372 |              SDL_UpdateYUVTexture(texture, NULL,
373 |                                   pFrame->data[0], pFrame->linesize[0],
374 |                                   pFrame->data[1], pFrame->linesize[1],
375 |                                   pFrame->data[2], pFrame->linesize[2]);
376 | 
377 |              // set size of Window
378 |              rect.x = 0;
379 |              rect.y = 0;
380 |              rect.w = pCodecCtx->width;
381 |              rect.h = pCodecCtx->height;
382 | 
383 |              SDL_RenderClear(renderer);
384 |              SDL_RenderCopy(renderer, texture, NULL, &rect);
385 |              SDL_RenderPresent(renderer);
386 |          }
387 |      }
388 | 
389 |      av_packet_unref(&packet);
390 |  }
391 |  ret = 0; // playback loop finished
392 | __exit:
393 | 
394 |  if (pFrame) 
395 |  {
396 |      av_frame_free(&pFrame);
397 |  }
398 | 
399 |  if (pCodecCtx) 
400 |  {
401 |      avcodec_free_context(&pCodecCtx); // closes the codec and frees the context
402 |  }
403 | 
404 |  // pCodecParameters points into pFormatCtx and is released together with it,
405 |  // so it must not be freed separately.
406 | 
409 |  if (pFormatCtx) 
410 |  {
411 |      avformat_close_input(&pFormatCtx);
412 |  }
413 | 
414 |  // destroy SDL objects in reverse order of creation
415 |  if (texture) 
416 |  {
417 |      SDL_DestroyTexture(texture);
418 |  }
419 | 
420 |  if (renderer) 
421 |  {
422 |      SDL_DestroyRenderer(renderer);
423 |  }
424 | 
425 |  if (win) 
426 |  {
427 |      SDL_DestroyWindow(win);
428 |  }
429 |  SDL_Quit();
430 | 
431 |  return ret;
432 | }
433 | ```
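
The demo passes three separate planes to SDL_UpdateYUVTexture because SDL_PIXELFORMAT_IYUV is a planar 4:2:0 format: a full-resolution Y plane followed by U and V planes subsampled by two in each direction. The Python sketch below works out the tightly packed plane sizes for one frame. The function name is made up for illustration, and it assumes the decoder outputs 4:2:0 frames, which the C code also assumes since it never converts the pixel format. Decoded frames may carry padding, which is why the C code passes `linesize` rather than the width as the pitch.

```python
def iyuv_plane_sizes(width: int, height: int) -> tuple:
    """Return the tightly packed byte sizes of the Y, U and V planes of one IYUV frame."""
    # Chroma is subsampled by 2 in both directions; odd dimensions round up.
    chroma_w = (width + 1) // 2
    chroma_h = (height + 1) // 2
    return width * height, chroma_w * chroma_h, chroma_w * chroma_h

y, u, v = iyuv_plane_sizes(640, 480)
print(y, u, v, y + u + v)
```

For a 640x480 frame this gives 307200 bytes of luma and 76800 bytes for each chroma plane, 460800 bytes in total, or 1.5 bytes per pixel.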
434 | Demo on real hardware
435 | https://github.com/liukangcc/ART-Pi-Smart/blob/main/figures/8.gif
436 | 
437 | This repository holds prebuilt FFmpeg and X264 libraries: https://github.com/liukangcc/ART-Pi-Smart
438 | 
440 | 


--------------------------------------------------------------------------------
/script_001.md:
--------------------------------------------------------------------------------
 1 | ## TODO  
 2 | * search baidupan, lua11mod_v15.rar  
 3 | * G:\work_kopilua\lua11mod  
 4 | * lua 2.1 mingw bug, search baidupan, peanut-master.zip  
 5 | * https://github.com/search?p=3&q=yylval+yytext+string+lua&type=Code  
 6 | 
 7 | ## Old index    
 8 | * https://github.com/weimingtom/Kuuko/blob/master/README2.md  
 9 | * https://github.com/weimingtom/Kuuko/blob/master/lua_hack.md    
10 | 
11 | ## branch (mod)  
12 | * https://github.com/weimingtom/picoc  
13 | * https://github.com/weimingtom/huo  
14 | * https://github.com/weimingtom/ucc  
15 | * https://github.com/weimingtom/mac  
16 | * https://github.com/weimingtom/lcc  
17 | 
18 | ## Understanding how Lua, C, and C# call one another  
19 | * https://www.jianshu.com/p/b6b24cb910ed  
20 | * P/Invoke, Marshal.GetFunctionPointerForDelegate  
21 | 
22 | ## MiniJava, Tiger Book (虎书)  
23 | * http://www.cambridge.org/us/features/052182060X/#progs  
24 | * http://www.cambridge.org/gb/knowledge/isbn/item1170327/?site_locale=en_GB  
25 | * search baidupan, minijava  
26 | 
27 | ## chibicc  
28 | * https://github.com/rui314/chibicc  
29 | 
30 | ## yacc  
31 | https://github.com/weimingtom/wmt_yacc_study  
32 | 
33 | ## KConfig  
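34 | In fact, early versions of Lua can serve as a tutorial for learning yacc and compiler principles, for example how a symbol table is manipulated.  
35 | If you want a more conventional reference, you can even look at the Kconfig source code.  
36 | Kconfig is included in early Linux source trees, for example linux-2.6.35;  
37 | search for zconf.y (**Addendum: in the latest Linux kernel source it is named parser.y**) and you will find a directory named scripts/kconfig,  
38 | which holds the source of the interpreter for the kconfig macro language, although it may not be very portable.  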
39 | https://github.com/torvalds/linux/tree/v2.6.35/scripts/kconfig  
40 | https://www.kernel.org/doc/html/latest/kbuild/kconfig-language.html  
41 | my mod (port to mingw and msys, not msys2), search baidupan msys_esp-idf_v3.1.2_v1.rar  
42 | https://github.com/weimingtom/wmt_esp32_study/blob/main/README.md  
43 | 
44 | ## PL/0  
45 | https://github.com/shiyi001/PL0Compiler  
46 | search baidupan, PL0.rar  
47 | 
48 | ## miniJVM  
49 | https://github.com/digitalgust/miniJVM  
50 | avian  
51 | https://github.com/ReadyTalk/avian  
52 | JVM(j2me) / CLDC  
53 | see https://github.com/weimingtom/Kuuko/blob/master/README2.md  
54 | 


--------------------------------------------------------------------------------
/speech_commands_001.txt:
--------------------------------------------------------------------------------
 1 | https://www.cnblogs.com/lijianming180/p/12258774.html
 2 | 
 3 | 
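 4 | Training your own speech recognition AI with TensorFlow
 5 | This time we train a CNN-based speech recognition model. Once training is done, we will try using the model for hotword detection.
 6 | 
 7 | How do people understand a spoken sentence? Take Chinese as an example: on hearing a recording of "wo shi", we ask ourselves which two characters are pronounced "wo shi". Some people think of "我是" ("I am"), others of "我市" ("my city").
 8 | We can match the frequency features of "wo shi" against candidate results, and the model trained here is likewise a CNN built on frequency features. Recognition based on frequency features alone is quite limited: as in the example above, hearing only "wo shi" can be ambiguous, but with context the recognition success rate rises considerably. That is why recognizers such as Google Assistant consider not only how words are pronounced but also their meaning, so that even if a syllable or two is unclear we still get the right information.
 9 | A frequency-feature model is nevertheless a good fit for hotword detection, because a hotword is usually one or two specific words and needs no contextual semantic analysis.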
10 | 
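11 | Preparing the training dataset
12 | Open-source speech datasets are scarce. Here we use a dataset released by the TensorFlow and AIY teams that contains a large number of recordings of 30 basic English words:
13 | Download link
14 | At just over 1 GB this is a very small speech dataset, but it is entirely sufficient for experiments.
15 | 
16 | Running docker and mounting the working directory
17 | Create a speech_train folder with the subfolders dataset, logs, and train, which hold the dataset, the logs, and the training files. Extract the dataset into dataset, then run docker:
18 | 
21 | docker run -it -v $(pwd)/speech_train:/speech_train \
22 |   gcr.io/tensorflow/tensorflow:latest-devel
23 | Start training with the default conv model
30 | cd /tensorflow/
31 | python tensorflow/examples/speech_commands/train.py \
32 |   --data_dir=/speech_train/dataset/ \
33 |   --summaries_dir=/speech_train/logs/ \
34 |   --train_dir=/speech_train/train/ \
35 |   --wanted_words=one,two,three,four,five,marvin
36 | Here we specify the labels we want to recognize: one, two, three, four, five, marvin. The rest of the dataset is grouped under unknown.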
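
How the `--wanted_words` value turns into a label set can be sketched as follows. This is a simplified illustration rather than the code in train.py: the helper names are hypothetical, and the two reserved label strings (one for silence, one for every word that was not requested) are assumptions about what the script writes to its labels file.

```python
# Hypothetical stand-ins for the two reserved labels; the exact strings
# written by train.py are an assumption here.
SILENCE_LABEL = "_silence_"
UNKNOWN_LABEL = "_unknown_"

def build_label_list(wanted_words: str) -> list:
    """Turn a --wanted_words value into the ordered label list of the classifier."""
    words = [w.strip() for w in wanted_words.split(",") if w.strip()]
    return [SILENCE_LABEL, UNKNOWN_LABEL] + words

def label_for(word: str, labels: list) -> str:
    """Map a dataset word to its training label; unlisted words become unknown."""
    return word if word in labels[2:] else UNKNOWN_LABEL

labels = build_label_list("one,two,three,four,five,marvin")
print(labels)
print(label_for("marvin", labels), label_for("bed", labels))
```

With six wanted words the classifier ends up with eight output classes, and a recording of any other word in the dataset is trained as an example of the unknown class.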
37 | 
38 | Visualizing training with TensorBoard
39 | We can visualize the training process by analyzing the generated logs:
40 | 
42 | tensorboard --logdir /speech_train/logs
43 | After running the command, open local port 6006 in a browser to reach TensorBoard. The figure below shows an 18000-step training run with the conv model:
44 | 
45 | Training took almost 15 hours.
46 | 
47 | Generating the pb file
48 | After training, the checkpoint has to be converted into a pb file:
49 | 
54 | python tensorflow/examples/speech_commands/freeze.py \
55 |   --start_checkpoint=/speech_train/train/conv.ckpt-18000 \
56 |   --output_file=/speech_train/conv.pb \
57 |   --wanted_words=one,two,three,four,five,marvin
58 | This produces a file named conv.pb, which can be used directly together with the txt file that lists the recognizable labels.
59 | 
60 | Testing
61 | Run the test script:
62 | 
67 | python tensorflow/examples/speech_commands/label_wav.py \
68 |   --graph=/speech_train/conv.pb \
69 |   --labels=/speech_train/conv_labels.txt \
70 |   --wav=/speech_train/dataset/marvin/0b40aa8e_nohash_0.wav
71 | The trained model should recognize marvin correctly.
72 | 
73 | Using the less accurate but faster low_latency_conv model
74 | We can also train with low_latency_conv, a model that is less accurate but predicts faster:
75 | 
84 | python tensorflow/examples/speech_commands/train.py \
85 |   --data_dir=/speech_train/dataset/ \
86 |   --summaries_dir=/speech_train/logs/ \
87 |   --train_dir=/speech_train/train/ \
88 |   --model_architecture=low_latency_conv \
89 |   --how_many_training_steps=20000,6000 \
90 |   --learning_rate=0.01,0.001 \
91 |   --wanted_words=one,two,three,four,marvin,wow
92 | With this model, the number of training steps and the learning rate can be raised somewhat. In this case training time dropped sharply:
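
The two comma-separated flags in the command above describe a staged schedule: each entry in `--how_many_training_steps` is paired by position with the entry in `--learning_rate`. The sketch below shows that pairing under the assumption that stages run back to back; the function name is hypothetical and this is not the code from train.py.

```python
def learning_rate_at(step: int, steps_list, rates_list) -> float:
    """Return the learning rate for a 1-based training step under a staged schedule."""
    if len(steps_list) != len(rates_list):
        raise ValueError("steps and learning rates must have the same length")
    boundary = 0
    for stage_steps, rate in zip(steps_list, rates_list):
        boundary += stage_steps
        if step <= boundary:
            return rate
    raise ValueError(f"step {step} is past the end of the schedule ({boundary} steps)")

steps = [int(s) for s in "20000,6000".split(",")]
rates = [float(r) for r in "0.01,0.001".split(",")]

print(sum(steps))                               # total number of training steps
print(learning_rate_at(20000, steps, rates))    # last step of the first stage
print(learning_rate_at(20001, steps, rates))    # first step of the second stage
```

Read this way, the command trains for 26000 steps in total: 20000 at a learning rate of 0.01, then 6000 at 0.001.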
93 | 
94 | It took less than 3 hours.
95 | 
96 | Other notes
97 | You can also train with the GPU build of TensorFlow, which speeds things up considerably.
98 | 
99 | 


--------------------------------------------------------------------------------
/tensorflow_001.txt:
--------------------------------------------------------------------------------
 1 | https://technofob.com/2019/06/14/how-to-compile-tensorflow-2-0-with-avx2-fma-instructions-on-mac/
 2 | 
 3 | How to fix “Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA”
 4 |  Ofir, June 14, 2019
 5 | After installing Tensorflow using pip3 install:
 6 | 
 7 | sudo pip3 install tensorflow
 8 | I’ve received the following warning message:
 9 | 
10 | I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
11 | Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later on by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions, and a new coding scheme.
12 | 
13 | The companion FMA extension adds fused multiply-add (FMA) operations, which speed up linear algebra such as dot products, matrix multiplication, and convolution. Almost all machine-learning training involves a great deal of these operations, so it runs faster on a CPU that supports AVX and FMA (up to 300%).
14 | 
15 | We won’t ignore the warning message and we will compile TF from source.
16 | 
17 | We will start by uninstalling the default version of TensorFlow:
18 | 
19 | sudo pip3 uninstall protobuf
20 | sudo pip3 uninstall tensorflow
21 | In a temp folder, clone TensorFlow and check out the release branch:
22 | 
23 | git clone https://github.com/tensorflow/tensorflow
24 | cd tensorflow && git checkout r2.0
25 | Install the TensorFlow pip package dependencies:
26 | 
27 | pip3 install -U --user pip six numpy wheel setuptools mock 'future>=0.17.1'
28 | pip3 install -U --user keras_applications==1.0.6 --no-deps
29 | pip3 install -U --user keras_preprocessing==1.0.5 --no-deps
30 | Install Bazel, the build tool used to compile TensorFlow. In my case, after downloading bazel-0.26.0-installer-darwin-x86_64.sh:
31 | 
32 | chmod +x bazel-0.26.0-installer-darwin-x86_64.sh && ./bazel-0.26.0-installer-darwin-x86_64.sh --user && export PATH="$PATH:$HOME/bin" && bazel version
33 | Configure your system build by running the following at the root of your TensorFlow source tree:
34 | 
35 | ./configure
36 | The Tensorflow build options expose flags to enable building for platform-specific CPU instruction sets:
37 | 
38 | 
39 | Use bazel to make the TensorFlow package builder with CPU-only support:
40 | 
41 | bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package
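
The `--copt` options in the command above correspond to the instruction sets named in the warning. As a rough sketch, the mapping can be derived from the warning text itself. The table and function name below are illustrative assumptions, not part of TensorFlow or Bazel; the compiler options themselves are the usual GCC/Clang switches.

```python
# Hypothetical mapping from the feature names printed in the warning to
# compiler options; extend it for other instruction sets as needed.
FEATURE_TO_COPT = {
    "SSE4.1": "-msse4.1",
    "SSE4.2": "-msse4.2",
    "AVX": "-mavx",
    "AVX2": "-mavx2",
    "FMA": "-mfma",
}

def copts_from_warning(message: str) -> list:
    """Extract the unused instruction sets from the warning and map them to --copt flags."""
    marker = "not compiled to use:"
    if marker not in message:
        return []
    features = message.split(marker, 1)[1].split()
    return [f"--copt={FEATURE_TO_COPT[f]}" for f in features if f in FEATURE_TO_COPT]

warning = ("Your CPU supports instructions that this TensorFlow binary "
           "was not compiled to use: AVX2 FMA")
print(" ".join(copts_from_warning(warning)))
```

For the warning quoted in this article the sketch yields the `-mavx2` and `-mfma` options; the build command above additionally enables `-mavx` and `-msse4.2`.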
42 | The bazel build command creates an executable named build_pip_package—this is the program that builds the pip package. Run the executable as shown below to build a .whl package in the /tmp/tensorflow_pkg directory.
43 | 
44 | To build from a release branch:
45 | 
46 | ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
47 | Output wheel file is in: /tmp/tensorflow_pkg
48 | 
49 | You can download the file from here, and try to install it directly
50 | 
51 | pip3 install /tmp/tensorflow_pkg/tensorflow-2.0.0b1-cp37-cp37m-macosx_10_14_x86_64.whl 
52 | cd out of that directory, and now running this should not produce any warning:
53 | 
54 | python3 -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
55 | Enjoy!
56 | 


--------------------------------------------------------------------------------
/tinymind_001.txt:
--------------------------------------------------------------------------------
  1 | https://www.tinymind.cn/articles/3833
  2 | 
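  3 | [Speech Recognition] From Beginner to Expert: the Most Complete Collection of Resources!
  4 | 
  5 |  专知 2018-11-07 11:52
  7 | Speech recognition
  8 | [Overview] The most complete collection of introductory material, papers, code, and products for the speech field, covering speech recognition, speech synthesis, voiceprint recognition, and more. One article to take you into the world of speech recognition.
  9 | 
 10 | Author | 刘斌
 11 | 
 12 | Editor | Xiaowen
 13 | 
 14 | 
 15 | 
 16 | Introductory material
 17 | 
 18 | 
 19 | Four frontier directions in speech recognition research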
 20 | 
 21 | https://blog.csdn.net/haima1998/article/details/79094341
 22 | 
 23 | 
 24 | 
 25 | Introductory deep learning papers (speech recognition)
 26 | 
 27 | https://blog.csdn.net/youyuyixiu/article/details/53764218
 28 | 
 29 | 
 30 | 
 31 | On the three key technologies of speech recognition
 32 | 
 33 | https://blog.csdn.net/qq_34231800/article/details/80189617
 34 | 
 35 | 
 36 | 
 37 | Deep learning and speech recognition: a brief introduction to common acoustic models
 38 | 
 39 | https://blog.csdn.net/dujiajiyi_xue5211314/article/details/53943313
 40 | 
 41 | 
 42 | 
 43 | Interesting open-source software: the Kaldi speech recognition toolkit
 44 | 
 45 | https://blog.csdn.net/AMDS123/article/details/70313780
 46 | 
 47 | 
 48 | 
 49 | Neural networks: CNN architectures and their application to speech recognition
 50 | 
 51 | https://blog.csdn.net/xmdxcsj/article/details/54695995
 52 | 
 53 | 
 54 | 
 55 | An overview of speech recognition
 56 | 
 57 | https://blog.csdn.net/shichaog/article/details/72528637
 58 | 
 59 | 
 60 | 
 61 | End-to-end speech recognition
 62 | 
 63 | https://blog.csdn.net/xmdxcsj/article/details/70300546
 64 | 
 65 | 
 66 | 
 67 | Applications of attention in speech recognition
 68 | 
 69 | https://blog.csdn.net/quheDiegooo/article/details/76842201
 70 | 
 71 | 
 72 | 
 73 | Speech synthesis technology
 74 | 
 75 | https://blog.csdn.net/wja8a45TJ1Xa/article/details/78599509?locationNum=8&fps=1
 76 | 
 77 | 
 78 | 
 79 | A survey of deep learning in speech synthesis research
 80 | 
 81 | https://blog.csdn.net/weixin_37598106/article/details/81513816
 82 | 
 83 | 
 84 | 
 85 | Tacotron, an end-to-end deep learning TTS model (Chinese speech synthesis)
 86 | 
 87 | https://blog.csdn.net/yunnangf/article/details/79585089
 88 | 
 89 | 
 90 | 
 91 | Tacotron: end-to-end speech synthesis
 92 | 
 93 | https://blog.csdn.net/Left_Think/article/details/74905928
 94 | 
 95 | 
 96 | 
 97 | An introduction to speaker (voiceprint) recognition technology
 98 | 
 99 | https://www.cnblogs.com/wuxian11/p/6498699.html
100 | 
101 | 
102 | 
103 | Speaker recognition technology: current state, limitations, and trends
104 | 
105 | https://blog.csdn.net/jojozhangju/article/details/78637221
106 | 
107 | 
108 | 
109 | Speaker recognition
110 | 
111 | https://www.jianshu.com/p/513dadeef1fd
112 | 
113 | 
114 | 
115 | An introduction to Deep Speaker
116 | 
117 | https://blog.csdn.net/Lauyeed/article/details/79936632
118 | 
119 | 
120 | 
121 | Papers
122 | 
123 | 
124 | Speech recognition: DNN
125 | Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition(2012), George E. Dahl et al. 
126 | 
127 | https://ieeexplore.ieee.org/document/5740583/?part=1
128 | 
129 | 
130 | 
131 | Deep Neural Networks for Acoustic Modeling in Speech Recognition(2012), Geoffrey Hinton et al. 
132 | 
133 | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6296526
134 | 
135 | 
136 | 
137 | Speech recognition: CNN
138 | Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition(2012), Ossama Abdel-Hamid et al.
139 | 
140 | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6288864
141 | 
142 | 
143 | 
144 | Deep convolutional neural networks for LVCSR(2013), Tara N. Sainath et al. 
145 | 
146 | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6639347
147 | 
148 | 
149 | 
150 | Analysis of CNN-based speech recognition system using raw speech as input(2015), Dimitri Palaz et al. 
151 | 
152 | https://infoscience.epfl.ch/record/210029/files/Palaz_INTERSPEECH_2015.pdf
153 | 
154 | 
155 | 
156 | Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition(2016), Yanmin Qian et al. 
157 | 
158 | https://pdfs.semanticscholar.org/8043/cbfed66c98d2255ea79254de620837478099.pdf
159 | 
160 |  
161 | 
162 | Very deep multilingual convolutional neural networks for LVCSR(2016), Tom Sercu et al. 
163 | 
164 | https://arxiv.org/pdf/1509.08967.pdf
165 | 
166 | 
167 | 
168 | Advances in Very Deep Convolutional Neural Networks for LVCSR(2016), Tom Sercu et al. 
169 | 
170 | https://arxiv.org/pdf/1604.01792.pdf
171 | 
172 | 
173 | 
174 | Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention(2016), Dong Yu et al. 
175 | 
176 | https://pdfs.semanticscholar.org/716e/60cbbdacf01b3148e91a555358a96308b770.pdf?_ga=2.38333155.198966451.1540996486-1278087525.1535180761
177 | 
178 | 
179 | 
180 | Speech recognition: LSTM
181 | Long short-term memory recurrent neural network architectures for large scale acoustic modeling(2014), Hasim Sak et al. 
182 | 
183 | https://static.googleusercontent.com/media/research.google.com/zh-CN//pubs/archive/43905.pdf
184 | 
185 | 
186 | 
187 | Deep LSTM for Large Vocabulary Continuous Speech Recognition(2017), Xu Tian et al. 
188 | 
189 | https://arxiv.org/pdf/1703.07090.pdf
190 | 
191 | 
192 | 
193 | English Conversational Telephone Speech Recognition by Humans and Machines(2017), George Saon et al. 
194 | 
195 | https://arxiv.org/pdf/1703.02136.pdf
196 | 
197 | 
198 | 
199 | 
200 | 
201 | Speech recognition: CTC
202 | Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks(2006), Alex Graves et al. 
203 | 
204 | http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.75.6306&rep=rep1&type=pdf
205 | 
206 | 
207 | 
208 | Towards End-to-End Speech Recognition with Recurrent Neural Networks(2014), Alex Graves et al. 
209 | 
210 | http://proceedings.mlr.press/v32/graves14.pdf
211 | 
212 | 
213 | 
214 | First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs(2014), Andrew L. Maas et al. 
215 | 
216 | https://arxiv.org/pdf/1408.2873.pdf
217 | 
218 | 
219 | 
220 | Deep Speech: Scaling up end-to-end speech recognition(2014), Awni Y. Hannun et al. 
221 | 
222 | https://arxiv.org/pdf/1412.5567.pdf
223 | 
224 | 
225 | 
226 | Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification(2015), Kyuyeon Hwang et al. 
227 | 
228 | https://arxiv.org/pdf/1511.06841.pdf
229 | 
230 | 
231 | 
232 | Fast and Accurate Recurrent Neural Network Acoustic Models for Speech Recognition(2015), Hasim Sak et al.  
233 | 
234 | https://arxiv.org/pdf/1507.06947.pdf
235 | 
236 | 
237 | 
238 | Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning(2016), Suyoun Kim et al. 
239 | 
240 | https://arxiv.org/pdf/1609.06773.pdf
241 | 
242 | 
243 | 
244 | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin(2016), Dario Amodei et al. 
245 | 
246 | http://proceedings.mlr.press/v48/amodei16.pdf
247 | 
248 | 
249 | 
250 | Wav2Letter: an End-to-End ConvNet-based Speech Recognition System(2016), Ronan Collobert et al. 
251 | 
252 | https://arxiv.org/pdf/1609.03193.pdf
253 | 
254 | 
255 | 
256 | Multi-task Learning with CTC and Segmental CRF for Speech Recognition(2017), Liang Lu et al.
257 | 
258 | https://arxiv.org/pdf/1702.06378.pdf
259 | 
260 | 
261 | 
262 | Residual Convolutional CTC Networks for Automatic Speech Recognition(2017), Yisen Wang et al.
263 | 
264 | https://arxiv.org/pdf/1702.07793.pdf
265 | 
266 | 
267 | 
268 | Speech recognition: Sequence Transduction
269 | Sequence Transduction with Recurrent Neural Networks(2012), Alex Graves et al. 
270 | 
271 | https://arxiv.org/pdf/1211.3711.pdf
272 | 
273 | 
274 | 
275 | Speech recognition: attention
276 | End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results(2014), Jan Chorowski et al. 
277 | 
278 | https://arxiv.org/pdf/1412.1602.pdf
279 | 
280 | 
281 | 
282 | Attention-Based Models for Speech Recognition(2015), Jan Chorowski et al. 
283 | 
284 | https://arxiv.org/pdf/1506.07503.pdf
285 | 
286 | 
287 | 
288 | End-to-end attention-based large vocabulary speech recognition(2016), Dzmitry Bahdanau et al. 
289 | 
290 | https://arxiv.org/pdf/1508.04395.pdf
291 | 
292 | 
293 | 
294 | Listen, attend and spell: A neural network for large vocabulary conversational speech recognition(2016), William Chan et al. 
295 | 
296 | https://arxiv.org/pdf/1508.01211.pdf
297 | 
298 | 
299 | 
300 | End-to-end attention-based distant speech recognition with Highway LSTM(2016), Hassan Taherian. 
301 | 
302 | https://arxiv.org/pdf/1610.05361.pdf
303 | 
304 | 
305 | 
306 | Direct Acoustics-to-Word Models for English Conversational Speech Recognition(2017), Kartik Audhkhasi et al. 
307 | 
308 | https://arxiv.org/pdf/1703.07754.pdf
309 | 
310 | 
311 | 
312 | Speech recognition: multichannel
313 | Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition(2017), Tara N. Sainath et al. 
314 | 
315 | http://www.ee.columbia.edu/~ronw/pubs/taslp2017-multichannel.pdf
316 | 
317 | 
318 | 
319 | Multichannel End-to-end Speech Recognition(2017), Tsubasa Ochiai et al. 
320 | 
321 | https://arxiv.org/pdf/1703.04783.pdf
322 | 
323 | 
324 | 
325 | Speech synthesis: SampleRNN
326 | SampleRNN: An Unconditional End-to-End Neural Audio Generation Model(2016), Soroush Mehri et al.
327 | 
328 | https://arxiv.org/pdf/1612.07837.pdf
329 | 
330 | 
331 | 
332 | Speech synthesis: WaveNet
333 | WaveNet: A Generative Model for Raw Audio(2016), Aäron van den Oord et al. 
334 | 
335 | https://arxiv.org/pdf/1609.03499.pdf
336 | 
337 | 
338 | 
339 | Speech synthesis: Deep Voice
340 | Deep Voice: Real-time Neural Text-to-Speech(2017), Sercan O. Arik et al. 
341 | 
342 | https://arxiv.org/pdf/1702.07825.pdf
343 | 
344 | 
345 | 
346 | Speech synthesis: Deep Voice 2
347 | Deep Voice 2: Multi-Speaker Neural Text-to-Speech(2017), Sercan Arik et al. 
348 | 
349 | https://arxiv.org/pdf/1705.08947.pdf
350 | 
351 | 
352 | 
353 | Speech synthesis: Tacotron
354 | Tacotron: Towards End-to-End Speech Synthesis(2017), Yuxuan Wang et al. 
355 | 
356 | https://pdfs.semanticscholar.org/f258/f0d3260e7fbdd961993086aaafa2afc714c9.pdf
357 | 
358 | 
359 | 
360 | Speech synthesis: Tacotron 2
361 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions(2018), Jonathan Shen et al.
362 | 
363 | https://sigport.org/sites/default/files/docs/ICASSP%202018%20-%20Tacotron%202.pdf
364 | 
365 | 
366 | 
367 | Speech synthesis: VoiceLoop
368 | VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop(2018), Yaniv Taigman et al.
369 | 
370 | https://arxiv.org/pdf/1707.06588.pdf
371 | 
372 | 
373 | 
374 | 
375 | 
376 | Speaker recognition: x-vector (uses a TDNN to extract speech embeddings)
377 | Deep Neural Network Embeddings for Text-Independent Speaker Verification(2017), David Snyder et al.
378 | 
379 | http://danielpovey.com/files/2017_interspeech_embeddings.pdf
380 | 
381 | 
382 | 
383 | Baidu: end-to-end speaker recognition with triplet loss
384 | Deep Speaker: an End-to-End Neural Speaker Embedding System(2017), Chao Li et al. 
385 | 
386 | https://arxiv.org/pdf/1705.02304.pdf
387 | 
388 | 
389 | 
390 | Speaker recognition: 3D convolutional networks
391 | Text-Independent Speaker Verification Using 3D Convolutional Neural Networks(2018), Amirsina Torfi et al.
392 | 
393 | https://arxiv.org/pdf/1705.09422.pdf
394 | 
395 | 
396 | 
397 | Speaker recognition: end-to-end, GE2E
398 | Generalized End-to-End Loss for Speaker Verification(2018), Li Wan et al.
399 | 
400 | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8462665
401 | 
402 | 
403 | 
404 | Code
405 | 
406 | 
407 | Kaldi
408 | 
409 | A widely used speech toolkit
410 | 
411 | https://github.com/kaldi-asr/kaldi
412 | 
413 | 
414 | 
415 | A TensorFlow implementation of Baidu's DeepSpeech architecture     
416 | 
417 | Speech recognition: Baidu DeepSpeech, TensorFlow implementation
418 | 
419 | https://github.com/mozilla/DeepSpeech
420 | 
421 | 
422 | 
423 | Speech-to-Text-WaveNet : End-to-end sentence level English speech recognition based on DeepMind's WaveNet and tensorflow 
424 | 
425 | Speech recognition: DeepMind's WaveNet, TensorFlow implementation
426 | 
427 | https://github.com/buriburisuri/speech-to-text-wavenet
428 | 
429 | 
430 | 
431 | End-to-end automatic speech recognition system implemented in TensorFlow.
432 | 
433 | End-to-end speech recognition: TensorFlow implementation
434 | 
435 | https://github.com/zzw922cn/Automatic_Speech_Recognition
436 | 
437 | 
438 | 
439 | A PyTorch Implementation of End-to-End Models for Speech-to-Text 
440 | 
441 | End-to-end speech recognition: PyTorch implementation
442 | 
443 | https://github.com/awni/speech
444 | 
445 | 
446 | 
447 | A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.
448 | 
449 | Speech recognition: DeepSpeech2, PaddlePaddle implementation
450 | 
451 | https://github.com/PaddlePaddle/DeepSpeech
452 | 
453 | 
454 | 
455 | A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model 
456 | 
457 | Speech synthesis: Tacotron, TensorFlow implementation
458 | 
459 | https://github.com/Kyubyong/tacotron
460 | 
461 | 
462 | 
463 | Tacotron 2 - PyTorch implementation with faster-than-realtime inference 
464 | 
465 | Speech synthesis: Tacotron 2, PyTorch implementation
466 | 
467 | https://github.com/NVIDIA/tacotron2
468 | 
469 | 
470 | 
471 | Deep neural networks for voice conversion (voice style transfer) in Tensorflow 
472 | 
473 | Speech synthesis: Deep-voice, TensorFlow implementation
474 | 
475 | https://github.com/andabi/deep-voice-conversion
476 | 
477 | 
478 | 
479 | A method to generate speech across multiple speakers 
480 | 
481 | Speech synthesis: Facebook, PyTorch implementation
482 | 
483 | https://github.com/facebookresearch/loop
484 | 
485 | 
486 | 
487 | Speaker embedding(verification and recognition) using Pytorch 
488 | 
489 | Speaker recognition: PyTorch implementation
490 | 
491 | https://github.com/qqueing/DeepSpeaker-pytorch
492 | 
493 | 
494 | 
495 | Deep Learning & 3D Convolutional Neural Networks for Speaker Verification 
496 | 
497 | Speaker recognition: 3D convolution, TensorFlow implementation
498 | 
499 | https://github.com/astofi/3D-convolutional-speaker-recognition
500 | 
501 | 
502 | 
503 | Products and Applications
504 | 
505 | 
506 | Baidu Speech official site
507 | 
508 | http://yuyin.baidu.com/
509 | 
510 | 
511 | 
512 | Tencent AI Open Platform
513 | 
514 | https://ai.qq.com/product/aaiasr.shtml
515 | 
516 | 
517 | 
518 | iFLYTEK Open Platform
519 | 
520 | https://xfyun.cn/services/voicedictation
521 | 
522 | 
523 | 
524 | Bing Speech
525 | 
526 | https://azure.microsoft.com/zh-cn/services/cognitive-services/speech/
527 | 
528 | 
529 | 
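530 | About the author
531 | Liu Bin (刘斌) is a PhD student at the Institute of Automation, Chinese Academy of Sciences, working on robust acoustic modeling.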
532 | 
533 | 
534 | 


--------------------------------------------------------------------------------
/unity_001.md:
--------------------------------------------------------------------------------
 1 | ## game    
 2 | * search 扫雷游戏 (Minesweeper game)  
 3 | 
 4 | ## tutorial  
 5 | * search baidupan, GalGame-1.rar  
 6 | * https://github.com/weimingtom/GalGame-1/tree/master/GalGameUnity5/Assets/Scenes  
 7 | * search baidupan, minesweeper-clone.rar  
 8 | * search baidupan, GalgameDemo.rar  
 9 | 
10 | ## unitypackage  
11 | * search baidupan, 0_unity  
12 | 
13 | ## project, demo    
14 | * https://github.com/UnityTechnologies/open-project-1  
15 | 


--------------------------------------------------------------------------------
/usb_hub_001.md:
--------------------------------------------------------------------------------
1 | * USB hub (集线器 "hub", 分线器 "splitter")    
2 | 世友  
3 | SSK  
4 | 


--------------------------------------------------------------------------------
/weibo_001.txt:
--------------------------------------------------------------------------------
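 1 | These all handle short command words only. TensorFlow 2.x (tf.keras) CNN: gitee.com/weimingtom2000/Speech-Recognition_mod
 2 | 
 3 | 04-30 00:28
 4 | 
 5 | 
 6 | This is the project mentioned in your screenshot. I have not studied it closely and have not gotten it to run yet: PyTorch CRNN: gitee.com/weimingtom2000/kws-attention
 7 | 
 8 | 
 9 | Just these two. I heavily modified the first one, so it should run; I am not sure whether the second is useful. Both are based on the speech_commands training set from Google TensorFlow, which is fairly large and has to be downloaded separately. If you work in this area you probably know it well.
10 | 
11 | 
12 | PyTorch also has an official example: https://pytorch.org/tutorials/intermediate/speech_command_recognition_with_torchaudio.html, and that one definitely runs.
13 | 
14 | 04-30 00:42
15 | 
16 | 
17 | If you want ASR for long utterances, look at projects such as Kaldi and Mozilla DeepSpeech. There are many others, for example vosk-api and SpeechBrain, and you can also search gitee for keywords such as Speech-Recognition. There are also many open-source inference engines for microcontrollers and for Android.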
18 | 
19 | 
20 | 
21 | ------------------
22 | 
23 | 
24 | Reply to @一路向北ya: As for how I found it, I used something like decompilation. I opened the ELF file provided by FriendlyARM (友善之臂) in Notepad, picked out the strings inside it, and searched GitHub for them. For example, searching for the string "No sparse/gzip/crc allowed for block device" leads to the make_ext4fs source code.
25 | In reply to 一路向北ya's comment: make_ext4fs seems to have been modified by FriendlyARM themselves; the official Android tools cannot restore the image. Where is the source for sd_update? I have been changing partitions lately and these two tools are killing me.
27 | weimingtom
29 | 2020-3-22 01:27
30 | Reply to @一路向北ya: Or weihutaisui/BCM/tree/master/HGU_BCM68580/02_src_502L04patch2/hostTools/make_ext4fs/extras_latest/extras/ext4_utils (to be continued)
33 | weimingtom
35 | 2020-3-22 01:26
36 | Reply to @一路向北ya: For the make_ext4fs source, see github.com/woju/make_ext4fs. I am not certain, but it seems to work; I recommend 64-bit Ubuntu. If it does not work, look at other search results for make_ext4fs, such as rendiix/android-prebuilt-binary-tools and arfoll/updater-hack (to be continued)
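
The string-search approach described in the thread above (open a binary, pick out readable strings, search for them online) can be automated instead of scrolling through Notepad. The following is a minimal sketch in the spirit of the Unix `strings` tool; the function name and the default minimum length of 8 are assumptions chosen for illustration.

```python
import re
import sys


def extract_strings(data, min_len=8):
    """Return runs of printable ASCII at least min_len bytes long."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(
        rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Usage: python extract_strings.py path/to/binary
    if len(sys.argv) != 2:
        sys.exit("usage: extract_strings.py FILE")
    with open(sys.argv[1], "rb") as f:
        for text in extract_strings(f.read()):
            print(text)
```

Long, distinctive messages such as error strings make the best search terms, because short runs tend to be random bytes that happen to be printable.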
37 | 
38 | 
39 | -------------
40 | 
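41 | I have noticed a quirk of the speech recognition field, or call it an unwritten rule: people and books that deal with speech recognition usually mention two tools, TensorFlow and Kaldi. Outsiders may wonder why Kaldi is ranked alongside the famous TensorFlow, but it makes sense on reflection. The current trend is to run speech recognition on embedded devices. TensorFlow is clearly not a great fit for that (although there is TFLite, which is one direction), whereas many people have their eye on Kaldi and are trying to port it to phones or embedded Linux. Two typical examples: one is vosk-api, which I mentioned before and which runs on a Raspberry Pi (and should also run on Android); the other is this article, "A method for training a wake-word model based on Kaldi" (《基于kaldi训练唤醒词模型的一种方法》):
42 | (web link)
43 | You may be surprised to find that many people have already started trying to port Kaldi to small-memory computers.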
44 | 
45 | 
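46 | "Learning MATLAB Speech Recognition from Scratch" (《从零开始学习matlab语音识别》), episode 24. Installing vosk-api on a Raspberry Pi 3B with the latest Buster system image. vosk-api is essentially Kaldi behind a Python wrapper. Because Kaldi is only suitable for advanced specialists and its models are hard to download, I recommend vosk-api. For usage, see the official documentation: "Running the example code with python"
47 | (web link)
48 | Note that it may fail at run time; if so, run sudo apt-get install libgfortran3 to install the GNU Fortran shared library. In my tests its speed is about the same as pocketsphinx, but it can output more recognition detail. Its recognition accuracy is fairly good, probably similar to pocketsphinx and a little better than deepspeech (tested on one of deepspeech's test audio files). If you are interested, follow the steps on the official site to install and run it; the Python example code can be downloaded from the vosk-api repository on GitHub.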
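
For orientation, here is a minimal sketch of what transcribing a WAV file with vosk-api's Python bindings can look like. It is not the official example. The model directory name `model` and the function names are assumptions, and the input is assumed to be a mono 16-bit PCM WAV file at a sample rate the model supports.

```python
import json
import wave


def result_text(result_json):
    """Pull the recognized text out of one JSON result string."""
    return json.loads(result_json).get("text", "")


def transcribe(wav_path, model_dir="model"):
    """Transcribe a mono 16-bit PCM WAV file; model_dir is a placeholder."""
    # Imported here so result_text() is usable without vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
    pieces = []
    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        # AcceptWaveform returns true when an utterance has been finalized.
        if rec.AcceptWaveform(data):
            pieces.append(result_text(rec.Result()))
    # Flush whatever audio is still buffered in the recognizer.
    pieces.append(result_text(rec.FinalResult()))
    wf.close()
    return " ".join(p for p in pieces if p)
```

Calling `transcribe("test.wav")` requires a model unpacked into the `model` directory; both names are placeholders.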
49 | 
50 | 
51 | 
52 | 


--------------------------------------------------------------------------------
/ytk_001.txt:
--------------------------------------------------------------------------------
 1 | from 深度学习核心技术与实践 (Deep Learning: Core Technologies and Practice)
 2 | http://www.broadview.com.cn/book/2422
 3 | Speech recognition (ASR) related: chapters 14, 15, 16, 17  
 4 | 
 5 | 
 6 | Chapter      Resource      Location
 7 | 8.1          Theano       http://deeplearning.net/software/theano/
 8 | 8.2          Torch         http://torch.ch/
 9 | 8.3          PyTorch       http://pytorch.org/
10 | 8.4          Caffe         http://caffe.berkeleyvision.org/
11 | 8.5          TensorFlow    https://www.tensorflow.org/
12 | 8.6          MXNet        http://mxnet.io/
13 | 8.7          Keras         https://keras.io/
14 | 10.1         LeNet-5       http://yann.lecun.com/exdb/lenet/
15 | 10.2     AlexNet  https://github.com/BVLC/caffe/tree/master/models/bvlc_alexnet
16 | 10.3      VGGNet         http://www.robots.ox.ac.uk/~vgg/research/very_deep/
17 | 10.4     GoogLeNet  https://github.com/tensorflow/models/tree/master/inception
18 | 10.5     ResNet           https://github.com/KaimingHe/deep-residual-networks
19 | 10.6       DenseNet              https://github.com/liuzhuang13/DenseNet
20 | 10.7        DPN            https://github.com/cypw/DPNs
21 | 11.1.2       OverFeat       https://github.com/sermanet/OverFeat
22 | 11.2.1      R-CNN          https://github.com/rbgirshick/rcnn
23 | 11.2.2      SPP-net          https://github.com/ShaoqingRen/SPP_net
24 | 11.2.3     Fast R-CNN       https://github.com/rbgirshick/fast-rcnn
25 | 11.2.4     Faster R-CNN      https://github.com/ShaoqingRen/faster_rcnn
26 | 11.2.5      R-FCN           https://github.com/daijifeng001/R-FCN
27 | 11.3.1      YOLO            https://pjreddie.com/darknet/yolo/
28 | 11.3.2       SSD           https://github.com/weiliu89/caffe/tree/ssd
29 | 12.1.1       FCN          https://github.com/shelhamer/fcn.berkeleyvision.org
30 | 12.1.2     DeconvNet       https://github.com/HyeonwooNoh/DeconvNet
31 | 12.1.3      SegNet          http://mi.eng.cam.ac.uk/projects/segnet/
32 | 12.1.4     DilatedConvNet    https://github.com/fyu/dilation
33 | 12.2.1     DeepLab         http://liangchiehchen.com/projects/DeepLab.html
34 | 12.2.2     CRFasRNN      https://github.com/torrvision/crfasrnn
35 | 12.2.3     DeepParsingNetwork   http://personal.ie.cuhk.edu.hk/~lz013/projects/DPN.html
36 | 12.3.1  Mask R-CNN   https://github.com/CharlesShang/FastMaskRCNN (not the original authors' implementation)
37 | 13.3    DSH    https://github.com/lhmRyan/deep-supervised-hashing-DSH (not the original authors' implementation)
38 | 15     OpenFst          http://www.openfst.org/
39 | 16.2    Kaldi            http://kaldi-asr.org/
40 | 16.4    EESEN            https://github.com/yajiemiao/eesen
41 | 19.2    Stanford CoreNLP   https://stanfordnlp.github.io/CoreNLP/
42 | 19.3    JNN               https://github.com/wlin12/JNN
43 | 20.2    SyntaxNet   https://github.com/tensorflow/models/tree/master/syntaxnet
44 | 21     word2vec         https://code.google.com/archive/p/word2vec/
45 | 21.6    fastText           https://github.com/facebookresearch/fastText
46 | 21.7     GloVe            https://nlp.stanford.edu/projects/glove/
47 | 22.2   GroundHog          https://github.com/pascanur/GroundHog
48 | 22.4     GNMT            https://github.com/tensorflow/nmt
49 | 22.5     FAIRSeq          https://github.com/facebookresearch/fairseq
50 | 25.6.1   TCDCN           http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html
51 | 25.6.2   DeepID2       https://github.com/happynear/FaceVerification (not the original authors' implementation)
52 | 25.6.5    MNC          https://github.com/daijifeng001/MNC
53 | 26.4   HashedNets      http://www.cse.wustl.edu/~wenlinchen/project/HashedNets/index.html
54 | 26.5   Squeeze-Net     https://github.com/DeepScale/SqueezeNet
55 | 26.6   BinaryConnect    https://github.com/MatthieuCourbariaux/BinaryConnect
56 | 26.6    BinaryNet      https://github.com/MatthieuCourbariaux/BinaryNet
57 | 26.7   MobileNet       https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md
58 | 27.5      DQN          https://deepmind.com/research/dqn
59 | 28.3     DCGAN          https://github.com/carpedm20/DCGAN-tensorflow
60 | 28.4     InfoGAN        https://github.com/openai/InfoGAN
61 | 28.5      Pix2Pix       https://github.com/phillipi/pix2pix
62 | 28.6     WGAN       https://github.com/martinarjovsky/WassersteinGAN
63 | 


--------------------------------------------------------------------------------