├── tests ├── __init__.py └── test_teams_notetaker.py ├── HISTORY.rst ├── teams_notetaker ├── __init__.py ├── utils.py ├── summarize.py ├── speech_recognition.py ├── audio_utils.py └── teams_notetaker.py ├── requirements_dev.txt ├── AUTHORS.rst ├── MANIFEST.in ├── .travis.yml ├── tox.ini ├── .editorconfig ├── .github └── ISSUE_TEMPLATE.md ├── setup.cfg ├── setup.py ├── LICENSE ├── .gitignore ├── Makefile ├── CONTRIBUTING.rst └── README.md /tests/__init__.py: -------------------------------------------------------------------------------- 1 | """Unit test package for teams_notetaker.""" 2 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | History 3 | ======= 4 | 5 | 0.1.0 (2020-11-15) 6 | ------------------ 7 | 8 | * First release on PyPI. 9 | -------------------------------------------------------------------------------- /teams_notetaker/__init__.py: -------------------------------------------------------------------------------- 1 | """Top-level package for Teams Notetaker.""" 2 | 3 | __author__ = """Jeroen Kromme""" 4 | __email__ = 'j.kromme@outlook.com' 5 | __version__ = '0.1.0' 6 | -------------------------------------------------------------------------------- /requirements_dev.txt: -------------------------------------------------------------------------------- 1 | pip==19.2.3 2 | bump2version==0.5.11 3 | wheel==0.33.6 4 | watchdog==0.9.0 5 | flake8==3.7.8 6 | tox==3.14.0 7 | coverage==4.5.4 8 | Sphinx==1.8.5 9 | twine==1.14.0 10 | 11 | -------------------------------------------------------------------------------- /AUTHORS.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Credits 3 | ======= 4 | 5 | Development Lead 6 | ---------------- 7 | 8 | * Jeroen Kromme 9 | 10 | Contributors 11 | ------------ 12 | 13 | None yet. Why not be the first? 
14 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include AUTHORS.rst 2 | include CONTRIBUTING.rst 3 | include HISTORY.rst 4 | include LICENSE 5 | include README.rst 6 | 7 | recursive-include tests * 8 | recursive-exclude * __pycache__ 9 | recursive-exclude * *.py[co] 10 | 11 | recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif 12 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | # Config file for automatic testing at travis-ci.com 2 | 3 | language: python 4 | python: 5 | - 3.8 6 | - 3.7 7 | - 3.6 8 | - 3.5 9 | 10 | # Command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors 11 | install: pip install -U tox-travis 12 | 13 | # Command to run tests, e.g. python setup.py test 14 | script: tox 15 | 16 | 17 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist = py35, py36, py37, py38, flake8 3 | 4 | [travis] 5 | python = 6 | 3.8: py38 7 | 3.7: py37 8 | 3.6: py36 9 | 3.5: py35 10 | 11 | [testenv:flake8] 12 | basepython = python 13 | deps = flake8 14 | commands = flake8 teams_notetaker tests 15 | 16 | [testenv] 17 | setenv = 18 | PYTHONPATH = {toxinidir} 19 | 20 | commands = python setup.py test 21 | -------------------------------------------------------------------------------- /.editorconfig: -------------------------------------------------------------------------------- 1 | # http://editorconfig.org 2 | 3 | root = true 4 | 5 | [*] 6 | indent_style = space 7 | indent_size = 4 8 | trim_trailing_whitespace = true 9 | insert_final_newline = true 10 | charset = utf-8 11 | end_of_line = lf 12 | 13 | [*.bat] 14 | 
indent_style = tab 15 | end_of_line = crlf 16 | 17 | [LICENSE] 18 | insert_final_newline = false 19 | 20 | [Makefile] 21 | indent_style = tab 22 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | * Teams Notetaker version: 0.1.0 2 | * Python version: 3.* 3 | * Operating System: Windows 4 | 5 | ### Description 6 | 7 | Describe what you were trying to get done. 8 | Tell us what happened, what went wrong, and what you expected to happen. 9 | 10 | ### What I Did 11 | 12 | ``` 13 | Paste the command(s) you ran and the output. 14 | If there was a crash, please include the traceback here. 15 | ``` 16 | -------------------------------------------------------------------------------- /setup.cfg: -------------------------------------------------------------------------------- 1 | [bumpversion] 2 | current_version = 0.1.0 3 | commit = True 4 | tag = True 5 | 6 | [bumpversion:file:setup.py] 7 | search = version='{current_version}' 8 | replace = version='{new_version}' 9 | 10 | [bumpversion:file:teams_notetaker/__init__.py] 11 | search = __version__ = '{current_version}' 12 | replace = __version__ = '{new_version}' 13 | 14 | [bdist_wheel] 15 | universal = 1 16 | 17 | [flake8] 18 | exclude = docs 19 | 20 | [aliases] 21 | # Define setup.py command aliases here 22 | 23 | -------------------------------------------------------------------------------- /tests/test_teams_notetaker.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """Tests for `teams_notetaker` package.""" 4 | 5 | 6 | import unittest 7 | 8 | from teams_notetaker import teams_notetaker 9 | 10 | 11 | class TestTeams_notetaker(unittest.TestCase): 12 | """Tests for `teams_notetaker` package.""" 13 | 14 | def setUp(self): 15 | """Set up test fixtures, if any.""" 16 | 17 | def tearDown(self): 18 | """Tear down 
test fixtures, if any.""" 19 | 20 | def test_000_something(self): 21 | """Test something.""" 22 | -------------------------------------------------------------------------------- /teams_notetaker/utils.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import logging 3 | import logging.config 4 | 5 | 6 | def get_logger(name, logfile='log.log'): 7 | log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s" 8 | logging.basicConfig( 9 | level=logging.DEBUG, format=log_format, filename=logfile, filemode="w" 10 | ) 11 | console = logging.StreamHandler() 12 | console.setLevel(logging.DEBUG) 13 | console.setFormatter(logging.Formatter(log_format)) 14 | logging.getLogger(name).addHandler(console) 15 | return logging.getLogger(name) 16 | 17 | 18 | logger = get_logger('utils') 19 | 20 | 21 | def check_cmd_application_available(application: str) -> bool: 22 | """Check whether a command line interface application is available. 23 | 24 | Parameters 25 | ---------- 26 | application : str 27 | Name of the application 28 | """ 29 | output = subprocess.run(application, shell=True, capture_output=True) 30 | if 'not recognized' not in str(output.stderr): 31 | logger.info(f'{application} available.') 32 | return True 33 | else: 34 | logger.error(f'{application} not available') 35 | return False 36 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """The setup script.""" 4 | 5 | from setuptools import setup, find_packages 6 | 7 | with open('README.md') as readme_file: 8 | readme = readme_file.read() 9 | 10 | with open('HISTORY.rst') as history_file: 11 | history = history_file.read() 12 | 13 | requirements = [] 14 | 15 | setup_requirements = [] 16 | 17 | test_requirements = [] 18 | 19 | setup( 20 | author="Jeroen Kromme", 21 | 
author_email='j.kromme@outlook.com', 22 | python_requires='>=3.5', 23 | classifiers=[ 24 | 'Development Status :: 2 - Pre-Alpha', 25 | 'Intended Audience :: Developers', 26 | 'License :: OSI Approved :: GNU General Public License v3 (GPLv3)', 27 | 'Natural Language :: English', 28 | 'Programming Language :: Python :: 3', 29 | 'Programming Language :: Python :: 3.5', 30 | 'Programming Language :: Python :: 3.6', 31 | 'Programming Language :: Python :: 3.7', 32 | 'Programming Language :: Python :: 3.8', 33 | ], 34 | description="Let AI take the notes of your Teams meeting", 35 | install_requires=requirements, 36 | license="GNU General Public License v3", 37 | long_description=readme + '\n\n' + history, 38 | include_package_data=True, 39 | keywords='teams_notetaker', 40 | name='teams_notetaker', 41 | packages=find_packages(include=['teams_notetaker', 'teams_notetaker.*']), 42 | setup_requires=setup_requirements, 43 | test_suite='tests', 44 | tests_require=test_requirements, 45 | url='https://github.com/kromme/teams_notetaker', 46 | version='0.1.0', 47 | zip_safe=False, 48 | ) 49 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | GNU GENERAL PUBLIC LICENSE 2 | Version 3, 29 June 2007 3 | 4 | Let AI take the notes of your Teams meeting 5 | Copyright (C) 2020 Jeroen Kromme 6 | 7 | This program is free software: you can redistribute it and/or modify 8 | it under the terms of the GNU General Public License as published by 9 | the Free Software Foundation, either version 3 of the License, or 10 | (at your option) any later version. 11 | 12 | This program is distributed in the hope that it will be useful, 13 | but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | GNU General Public License for more details. 
16 | 17 | You should have received a copy of the GNU General Public License 18 | along with this program. If not, see . 19 | 20 | Also add information on how to contact you by electronic and paper mail. 21 | 22 | You should also get your employer (if you work as a programmer) or school, 23 | if any, to sign a "copyright disclaimer" for the program, if necessary. 24 | For more information on this, and how to apply and follow the GNU GPL, see 25 | . 26 | 27 | The GNU General Public License does not permit incorporating your program 28 | into proprietary programs. If your program is a subroutine library, you 29 | may consider it more useful to permit linking proprietary applications with 30 | the library. If this is what you want to do, use the GNU Lesser General 31 | Public License instead of this License. But first, please read 32 | . 33 | 34 | -------------------------------------------------------------------------------- /teams_notetaker/summarize.py: -------------------------------------------------------------------------------- 1 | from summarizer import Summarizer 2 | from .utils import get_logger, check_cmd_application_available 3 | 4 | logger = get_logger("summarizer") 5 | 6 | 7 | def summarize( 8 | transcription: str, notes_path: str, ratio: float = 0.2, num_sentences: int = None 9 | ) -> str: 10 | """Uses BERT for extractive summarization 11 | 12 | Parameters 13 | ---------- 14 | transcription : str 15 | The transcription to be summarized 16 | notes_path : str 17 | Path to where the notes should be saved 18 | ratio : float 19 | Determine the length of the summarization in ratio of length transcription 20 | num_sentences : int 21 | Determine the length of the summarization in number of sentences 22 | 23 | Returns 24 | ------- 25 | The summarized notes 26 | """ 27 | 28 | assert ( 29 | len(transcription.split(".")) > 1 30 | ), "Transcription too short for summarization." 
31 | 32 | notes = "" 33 | 34 | # initialize the summarizer 35 | try: 36 | model = Summarizer() 37 | logger.info("Summarizer initialized") 38 | except Exception as e: 39 | logger.error("Could not init summarizer", exc_info=e) 40 | return 41 | 42 | # Summarize 43 | try: 44 | notes = model(transcription, ratio=ratio, num_sentences=num_sentences) 45 | logger.info( 46 | f'Successfully summarized transcription with {len(transcription.split("."))} sentences to {len(notes.split("."))} sentences.' 47 | ) 48 | except Exception as e: 49 | logger.error("Could not summarize text", exc_info=e) 50 | return 51 | 52 | # save 53 | with open(notes_path, "w") as f: 54 | f.write(notes) 55 | logger.info(f"Notes successfully saved to {notes_path}") 56 | 57 | return notes 58 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | wheels/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *.cover 47 | .hypothesis/ 48 | .pytest_cache/ 49 | 50 | # Translations 51 | *.mo 52 | *.pot 53 | 54 | # Django stuff: 55 | *.log 56 | local_settings.py 57 | 58 | # Flask stuff: 59 | instance/ 60 | .webassets-cache 61 | 62 | # Scrapy stuff: 63 | .scrapy 64 | 65 | # Sphinx documentation 66 | docs/_build/ 67 | 68 | # PyBuilder 69 | target/ 70 | 71 | # Jupyter Notebook 72 | .ipynb_checkpoints 73 | 74 | # pyenv 75 | .python-version 76 | 77 | # celery beat schedule file 78 | celerybeat-schedule 79 | 80 | # SageMath parsed files 81 | *.sage.py 82 | 83 | # dotenv 84 | .env 85 | 86 | # virtualenv 87 | .venv 88 | venv/ 89 | ENV/ 90 | 91 | # Spyder project settings 92 | .spyderproject 93 | .spyproject 94 | 95 | # Rope project settings 96 | .ropeproject 97 | 98 | # mkdocs documentation 99 | /site 100 | 101 | # mypy 102 | .mypy_cache/ 103 | 104 | # IDE settings 105 | .vscode/ 106 | 107 | key.json 108 | audio 109 | notes 110 | transcripts 111 | video 112 | *.ipynb 113 | *.log -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY: clean clean-test clean-pyc clean-build docs help 2 | .DEFAULT_GOAL := help 3 | 4 | define BROWSER_PYSCRIPT 5 | import os, webbrowser, sys 6 | 7 | from urllib.request import pathname2url 8 | 9 | webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1]))) 10 | endef 11 | export BROWSER_PYSCRIPT 12 | 13 | define PRINT_HELP_PYSCRIPT 14 | import re, sys 15 | 16 | for line in sys.stdin: 17 | match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line) 18 | if match: 19 | target, help = match.groups() 20 | print("%-20s %s" % (target, help)) 21 | endef 22 | export 
PRINT_HELP_PYSCRIPT 23 | 24 | BROWSER := python -c "$$BROWSER_PYSCRIPT" 25 | 26 | help: 27 | @python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST) 28 | 29 | clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts 30 | 31 | clean-build: ## remove build artifacts 32 | rm -fr build/ 33 | rm -fr dist/ 34 | rm -fr .eggs/ 35 | find . -name '*.egg-info' -exec rm -fr {} + 36 | find . -name '*.egg' -exec rm -f {} + 37 | 38 | clean-pyc: ## remove Python file artifacts 39 | find . -name '*.pyc' -exec rm -f {} + 40 | find . -name '*.pyo' -exec rm -f {} + 41 | find . -name '*~' -exec rm -f {} + 42 | find . -name '__pycache__' -exec rm -fr {} + 43 | 44 | clean-test: ## remove test and coverage artifacts 45 | rm -fr .tox/ 46 | rm -f .coverage 47 | rm -fr htmlcov/ 48 | rm -fr .pytest_cache 49 | 50 | lint: ## check style with flake8 51 | flake8 teams_notetaker tests 52 | 53 | test: ## run tests quickly with the default Python 54 | python setup.py test 55 | 56 | test-all: ## run tests on every Python version with tox 57 | tox 58 | 59 | coverage: ## check code coverage quickly with the default Python 60 | coverage run --source teams_notetaker setup.py test 61 | coverage report -m 62 | coverage html 63 | $(BROWSER) htmlcov/index.html 64 | 65 | docs: ## generate Sphinx HTML documentation, including API docs 66 | rm -f docs/teams_notetaker.rst 67 | rm -f docs/modules.rst 68 | sphinx-apidoc -o docs/ teams_notetaker 69 | $(MAKE) -C docs clean 70 | $(MAKE) -C docs html 71 | $(BROWSER) docs/_build/html/index.html 72 | 73 | servedocs: docs ## compile the docs watching for changes 74 | watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D . 
75 | 76 | release: dist ## package and upload a release 77 | twine upload dist/* 78 | 79 | dist: clean ## builds source and wheel package 80 | python setup.py sdist 81 | python setup.py bdist_wheel 82 | ls -l dist 83 | 84 | install: clean ## install the package to the active Python's site-packages 85 | python setup.py install 86 | -------------------------------------------------------------------------------- /teams_notetaker/speech_recognition.py: -------------------------------------------------------------------------------- 1 | from .utils import get_logger 2 | from google.cloud import speech 3 | from google.oauth2 import service_account 4 | import tqdm 5 | import os 6 | import io 7 | 8 | logger = get_logger("speech_recognition") 9 | 10 | 11 | def setup_google_speech(key_file): 12 | """Setup the config for google speech to text 13 | """ 14 | 15 | assert os.path.isfile( 16 | key_file 17 | ), "Could not find key file, please visit https://codelabs.developers.google.com/codelabs/cloud-speech-text-python3#0" 18 | 19 | # create credentials 20 | credentials = service_account.Credentials.from_service_account_file(key_file) 21 | 22 | # setup the client and config 23 | client = speech.SpeechClient(credentials=credentials) 24 | config = speech.RecognitionConfig( 25 | encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16, 26 | # sample_rate_hertz=16000, 27 | language_code="en-US", 28 | # Enable automatic punctuation 29 | enable_automatic_punctuation=True, 30 | ) 31 | logger.info("Initialized Google Speech") 32 | return client, config 33 | 34 | 35 | def transcribe_part(audio_path: str, client, config) -> str: 36 | """Transcribes an audio file 37 | """ 38 | assert client, "Google Client not set, run _setup_google_speech()" 39 | assert config, "Google Config not set, run _setup_google_speech()" 40 | 41 | # init transcription for path 42 | transcription = "" 43 | 44 | # open audio 45 | with io.open(audio_path, "rb") as f: 46 | content = f.read() 47 | 48 | # transcribe by 
sending to google 49 | audio = speech.RecognitionAudio(content=content) 50 | response = client.recognize(config=config, audio=audio) 51 | 52 | # collecting results 53 | for _, result in enumerate(response.results): 54 | alternative = result.alternatives[0] 55 | transcription += " " + alternative.transcript 56 | 57 | return transcription 58 | 59 | 60 | def transcribe_all_audioparts(audio_part_folder: str, client, config) -> str: 61 | """Transcribe all parts of the audio and concat them 62 | """ 63 | # init transcription 64 | transcription = "" 65 | 66 | # collect files to transcribe 67 | files = os.listdir(audio_part_folder) 68 | 69 | logger.info(f"Transcribing {len(files)} parts") 70 | 71 | # loop and transcribe 72 | for _, audio_part in tqdm.tqdm(enumerate(files)): 73 | audio_part_path = f"{audio_part_folder}/{audio_part}" 74 | transcription += transcribe_part( 75 | audio_path=audio_part_path, client=client, config=config 76 | ) 77 | 78 | return transcription 79 | -------------------------------------------------------------------------------- /teams_notetaker/audio_utils.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import os 3 | from .utils import get_logger, check_cmd_application_available 4 | 5 | logger = get_logger("audio_utils") 6 | 7 | 8 | def extract_audio( 9 | video_path: str = None, audio_path: str = None, overwrite: bool = True 10 | ): 11 | """Extract audio file from a video using ffmpeg 12 | 13 | Parameters 14 | ---------- 15 | video_path : str 16 | Path to the video file 17 | audio_path : str 18 | Path the audio file will be saved 19 | overwrite : bool 20 | Whether to overwrite the audio file when it already exists (default is True) 21 | """ 22 | logger.info(f'Extract audio from "{video_path}" to "{audio_path}" ') 23 | # checks whether ffmpeg can be found 24 | if not check_cmd_application_available("ffmpeg"): 25 | logger.error("Could not load ffmpeg") 26 | return 27 | 28 | # checks 
whether video file is found 29 | assert os.path.isfile(video_path), "Can't find video" 30 | 31 | # add overwrite parameter to ffmpeg 32 | overwrite_param = "-y" if overwrite else "" 33 | if os.path.isfile(audio_path) and overwrite: 34 | logger.info(f"File already exists: overwriting {audio_path}") 35 | elif os.path.isfile(audio_path) and not overwrite: 36 | logger.info(f"File already exists: not overwriting {audio_path}") 37 | return 38 | 39 | # Call ffmpeg 40 | try: 41 | # -ab 160k -ac 2 -ar 44100 -vn 42 | command = ( 43 | f"""ffmpeg {overwrite_param} -i "{video_path}" -ac 1 "{audio_path}" """ 44 | ) 45 | subprocess.call(command, shell=True) 46 | except Exception as e: 47 | logger.error("Could not extract audio file from video.", exc_info=e) 48 | 49 | # check whether file is saved 50 | assert os.path.isfile(audio_path), "Something went wrong saving the audio file" 51 | logger.info(f"Audio successfully extracted to {audio_path}") 52 | 53 | 54 | def remove_silences_from_audio(audio_path: str = None) -> str: 55 | """Removes silences from audio file using sox. 56 | Sox doesn't allow the overwrite the same audio_path, create new one. 
57 | 58 | Parameters 59 | ---------- 60 | audio_path : str 61 | Path to the audio file 62 | 63 | Returns 64 | ------- 65 | Path to the new audio file with silences removed 66 | """ 67 | 68 | # checks whether sox can be found 69 | if not check_cmd_application_available("sox"): 70 | logger.error("Could not load sox") 71 | return 72 | 73 | silenced_audio_path = audio_path.replace(".wav", "_silence_removed.wav") 74 | 75 | try: 76 | command = f"""sox "{audio_path}" "{silenced_audio_path}" silence -l 1 0.1 1% -1 2.0 1%""" 77 | subprocess.call(command, shell=True) 78 | except Exception as e: 79 | logger.error("Could not remove silences from audio file.", exc_info=e) 80 | 81 | logger.info("Silences successfully removed") 82 | return silenced_audio_path 83 | 84 | 85 | def split_audio_file(audio_path: str, audio_part_folder: str): 86 | """Split up the audio in parts of 50 seconds 87 | 88 | Parameters 89 | ---------- 90 | audio_path : str 91 | Path to the audio file 92 | audio_part_folder : str 93 | Path to where the audio parts should be saved 94 | """ 95 | 96 | # checks whether ffmpeg can be found 97 | output = subprocess.run("ffmpeg", shell=True, capture_output=True) 98 | assert "not recognized" not in str(output.stderr), "ffmpeg not found" 99 | 100 | # create name 101 | audio_part_path = f"{audio_part_folder}/%03d.wav" 102 | 103 | # split up 104 | try: 105 | command = f"""ffmpeg -i "{audio_path}" -f segment -segment_time 50 -c copy -reset_timestamps 1 "{audio_part_path}" """ 106 | subprocess.call(command, shell=True) 107 | except Exception as e: 108 | logger.error("Could not split audio file into parts.", exc_info=e) 109 | 110 | logger.info( 111 | f"Audio successfully split into {len(os.listdir(audio_part_folder))} parts" 112 | ) 113 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | .. 
highlight:: shell 2 | 3 | ============ 4 | Contributing 5 | ============ 6 | 7 | Contributions are welcome, and they are greatly appreciated! Every little bit 8 | helps, and credit will always be given. 9 | 10 | You can contribute in many ways: 11 | 12 | Types of Contributions 13 | ---------------------- 14 | 15 | Report Bugs 16 | ~~~~~~~~~~~ 17 | 18 | Report bugs at https://github.com/kromme/teams_notetaker/issues. 19 | 20 | If you are reporting a bug, please include: 21 | 22 | * Your operating system name and version. 23 | * Any details about your local setup that might be helpful in troubleshooting. 24 | * Detailed steps to reproduce the bug. 25 | 26 | Fix Bugs 27 | ~~~~~~~~ 28 | 29 | Look through the GitHub issues for bugs. Anything tagged with "bug" and "help 30 | wanted" is open to whoever wants to implement it. 31 | 32 | Implement Features 33 | ~~~~~~~~~~~~~~~~~~ 34 | 35 | Look through the GitHub issues for features. Anything tagged with "enhancement" 36 | and "help wanted" is open to whoever wants to implement it. 37 | 38 | Write Documentation 39 | ~~~~~~~~~~~~~~~~~~~ 40 | 41 | Teams Notetaker could always use more documentation, whether as part of the 42 | official Teams Notetaker docs, in docstrings, or even on the web in blog posts, 43 | articles, and such. 44 | 45 | Submit Feedback 46 | ~~~~~~~~~~~~~~~ 47 | 48 | The best way to send feedback is to file an issue at https://github.com/kromme/teams_notetaker/issues. 49 | 50 | If you are proposing a feature: 51 | 52 | * Explain in detail how it would work. 53 | * Keep the scope as narrow as possible, to make it easier to implement. 54 | * Remember that this is a volunteer-driven project, and that contributions 55 | are welcome :) 56 | 57 | Get Started! 58 | ------------ 59 | 60 | Ready to contribute? Here's how to set up `teams_notetaker` for local development. 61 | 62 | 1. Fork the `teams_notetaker` repo on GitHub. 63 | 2. 
Clone your fork locally:: 64 | 65 | $ git clone git@github.com:your_name_here/teams_notetaker.git 66 | 67 | 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: 68 | 69 | $ mkvirtualenv teams_notetaker 70 | $ cd teams_notetaker/ 71 | $ python setup.py develop 72 | 73 | 4. Create a branch for local development:: 74 | 75 | $ git checkout -b name-of-your-bugfix-or-feature 76 | 77 | Now you can make your changes locally. 78 | 79 | 5. When you're done making changes, check that your changes pass flake8 and the 80 | tests, including testing other Python versions with tox:: 81 | 82 | $ flake8 teams_notetaker tests 83 | $ python setup.py test or pytest 84 | $ tox 85 | 86 | To get flake8 and tox, just pip install them into your virtualenv. 87 | 88 | 6. Commit your changes and push your branch to GitHub:: 89 | 90 | $ git add . 91 | $ git commit -m "Your detailed description of your changes." 92 | $ git push origin name-of-your-bugfix-or-feature 93 | 94 | 7. Submit a pull request through the GitHub website. 95 | 96 | Pull Request Guidelines 97 | ----------------------- 98 | 99 | Before you submit a pull request, check that it meets these guidelines: 100 | 101 | 1. The pull request should include tests. 102 | 2. If the pull request adds functionality, the docs should be updated. Put 103 | your new functionality into a function with a docstring, and add the 104 | feature to the list in README.rst. 105 | 3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check 106 | https://travis-ci.com/kromme/teams_notetaker/pull_requests 107 | and make sure that the tests pass for all supported Python versions. 108 | 109 | Tips 110 | ---- 111 | 112 | To run a subset of tests:: 113 | 114 | 115 | $ python -m unittest tests.test_teams_notetaker 116 | 117 | Deploying 118 | --------- 119 | 120 | A reminder for the maintainers on how to deploy. 
121 | Make sure all your changes are committed (including an entry in HISTORY.rst). 122 | Then run:: 123 | 124 | $ bump2version patch # possible: major / minor / patch 125 | $ git push 126 | $ git push --tags 127 | 128 | Travis will then deploy to PyPI if tests pass. 129 | -------------------------------------------------------------------------------- /teams_notetaker/teams_notetaker.py: -------------------------------------------------------------------------------- 1 | """Main module.""" 2 | import datetime 3 | import io 4 | import os 5 | import subprocess 6 | 7 | from .utils import get_logger, check_cmd_application_available 8 | from .speech_recognition import ( 9 | setup_google_speech, 10 | transcribe_part, 11 | transcribe_all_audioparts, 12 | ) 13 | from .audio_utils import extract_audio, remove_silences_from_audio, split_audio_file 14 | from .summarize import summarize 15 | 16 | logger = get_logger("teams_notetaker") 17 | 18 | 19 | class TeamsNotetaker: 20 | """ 21 | A class used to take notes from teams meetings. 22 | 23 | From January, 11th 2021 all Teams recordings will be stored in OneDrive directly, see [here](https://docs.microsoft.com/en-gb/MicrosoftTeams/tmr-meeting-recording-change). 
Until then download it from [Stream](https://web.microsoftstream.com/) > My Content > video > Download video 24 | 25 | """ 26 | 27 | def __init__( 28 | self, 29 | filename: str, 30 | key_file: str = "key.json", 31 | audio_folder: str = "audio", 32 | transcription_folder: str = "transcripts", 33 | notes_folder: str = "notes", 34 | wd: str = None, 35 | ): 36 | self.wd = wd if wd else os.getcwd() 37 | 38 | self.AUDIO_FOLDER = f"{self.wd}/{audio_folder}" 39 | self.TRANSCRIPTION_FOLDER = f"{self.wd}/{transcription_folder}" 40 | self.NOTES_FOLDER = f"{self.wd}/{notes_folder}" 41 | self.video_path = f"{self.wd}/{filename}" 42 | 43 | self.filename, video_extension = os.path.splitext(filename) 44 | self.filename = self.filename.split("/")[-1].split("\\")[-1] 45 | self.video_extension = video_extension 46 | self.key_file = key_file 47 | 48 | self.ts = datetime.datetime.now().strftime("%Y%m%d%H%M%S") 49 | self.AUDIO_PART_FOLDER = f"{self.wd}/{audio_folder}/{self.ts}_{self.filename}" 50 | self.logfile = "log.log" 51 | 52 | # init 53 | self.audio_path = "" 54 | self.transcription_path = "" 55 | self.notes_path = "" 56 | self.config = False 57 | self.client = False 58 | 59 | # create folders 60 | self._setup_folder() 61 | self._setup_paths() 62 | self._setup_google_speech() 63 | 64 | logger.info("Teams Notetaker initialized") 65 | 66 | def _setup_folder(self): 67 | 68 | os.makedirs(self.AUDIO_FOLDER) if not os.path.exists( 69 | self.AUDIO_FOLDER 70 | ) else True 71 | os.makedirs(self.AUDIO_PART_FOLDER) if not os.path.exists( 72 | self.AUDIO_PART_FOLDER 73 | ) else True 74 | os.makedirs(self.TRANSCRIPTION_FOLDER) if not os.path.exists( 75 | self.TRANSCRIPTION_FOLDER 76 | ) else True 77 | os.makedirs(self.NOTES_FOLDER) if not os.path.exists( 78 | self.NOTES_FOLDER 79 | ) else True 80 | 81 | def _setup_paths(self): 82 | # set timestamp 83 | 84 | # set paths 85 | self.audio_path = f"{self.AUDIO_FOLDER}/{self.ts}_{self.filename}.wav" 86 | self.transcription_path = ( 87 | 
f"{self.TRANSCRIPTION_FOLDER}/{self.ts}_{self.filename}.txt" 88 | ) 89 | self.notes_path = f"{self.NOTES_FOLDER}/{self.ts}_{self.filename}.txt" 90 | 91 | def _setup_google_speech(self): 92 | """Set up the client and config for Google Speech-to-Text 93 | """ 94 | self.client, self.config = setup_google_speech(self.key_file) 95 | 96 | def prepare_audio(self): 97 | """Prepare audio file by extracting the audio from the video, removing the 98 | silences and splitting it up in parts of 50 seconds 99 | """ 100 | 101 | # extract audio from the video 102 | extract_audio(video_path=self.video_path, audio_path=self.audio_path) 103 | 104 | # remove silences and renew audio_path name 105 | self.audio_path = remove_silences_from_audio(audio_path=self.audio_path) 106 | 107 | # split the audio file because Google can handle 1 minute max per request 108 | split_audio_file( 109 | audio_path=self.audio_path, audio_part_folder=self.AUDIO_PART_FOLDER 110 | ) 111 | 112 | logger.info("Audio preprocessing done") 113 | 114 | def transcribe(self): 115 | """Transcribe all parts in the audio parts folder 116 | """ 117 | self.transcription = transcribe_all_audioparts( 118 | audio_part_folder=self.AUDIO_PART_FOLDER, 119 | client=self.client, 120 | config=self.config, 121 | ) 122 | 123 | def summarize_transcription(self, ratio=0.3): 124 | """Summarize the transcription 125 | """ 126 | 127 | # create notes 128 | self.notes = summarize( 129 | transcription=self.transcription, notes_path=self.notes_path, ratio=ratio 130 | ) 131 | 132 | def run(self): 133 | 134 | self.prepare_audio() 135 | self.transcribe() 136 | self.summarize_transcription() 137 | return self.notes 138 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Using AI to take notes of your Teams meeting 2 | This package allows you to create a summary of the meeting you recorded in Teams. 
3 | 4 | Working from home every day can be exhausting, and days with back-to-back meetings can wear you out. By the end of the day you have probably forgotten half of the information. So take notes! However, this is easier said than done. A meeting via Teams takes more energy than one in real life, because you need to compensate for the reduced amount of (non-verbal) information when talking via video. Besides, when you start typing during a meeting, the others will think you're writing an email or just working instead of paying attention. 5 | 6 | Now there is a new solution: using AI to take notes of your Teams meeting. The only thing you need to do is record the meeting, download the video and run the script. 7 | 8 | ![](https://images.pexels.com/photos/1766604/pexels-photo-1766604.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260 "taking notes") 9 | 10 | *Note*: we're going to set up a Google API for the speech recognition, so there will be [costs](https://cloud.google.com/speech-to-text/pricing) associated with this. 11 | 12 | 13 | ## Installation 14 | Install ffmpeg and sox: 15 | * [ffmpeg](https://github.com/BtbN/FFmpeg-Builds/releases) 16 | * [sox](https://sourceforge.net/projects/sox/files/latest/download) 17 | 18 | Install the Teams Notetaker package with pip, directly from GitHub: 19 | ``` 20 | pip install git+https://github.com/kromme/Teams-Notetaker 21 | ``` 22 | 23 | ## Preparations 24 | 1. [Record the meeting](https://support.microsoft.com/en-us/office/record-a-meeting-in-teams-34dfbe7f-b07d-4a27-b4c6-de62f1348c24). Make sure you have consent from the others in the meeting. 25 | 2. Get the video: from January 11th, 2021, all Teams recordings will be stored in OneDrive directly, see [here](https://docs.microsoft.com/en-gb/MicrosoftTeams/tmr-meeting-recording-change). Until then, download it from [Stream](https://web.microsoftstream.com/) > My Content > video > Download video. 26 | 3. 
Set up a [Google Speech API](https://cloud.google.com/docs/authentication/getting-started), get the `key.json` and save this file in the working directory. 27 | 28 | 29 | ## Run 30 | ``` 31 | from teams_notetaker.teams_notetaker import TeamsNotetaker 32 | tn = TeamsNotetaker(filename='Meeting.mp4') 33 | notes = tn.run() # returns the summarized notes 34 | ``` 35 | 36 | 37 | ## Summarization 38 | There are two ways of summarizing text with the help of AI: abstractive and extractive. Abstractive summarization rewrites the whole document: the algorithm interprets the article and then rewrites it in a smaller set of sentences. Summarization as we learned it in high school and university is comparable to abstractive summarization. Extractive summarization estimates which sentences are the most important and keeps only those. Which technique is better depends on the task at hand: abstractive summarization might be better when rewriting essays or creating an introduction for an article, while extractive summarization might be better for highlighting the most important parts of an article. 39 | 40 | For this package I've chosen extractive summarization for two reasons: 41 | 1. It better fits the task at hand: my goal is to find the best sentences of a meeting and order them in a way that makes sense to the people in the meeting. 42 | 2. It is computationally more efficient than abstractive summarization. For us humans, it is more difficult to rewrite a document than to pick the most important sentences, and this also holds for algorithms. 43 | 44 | 45 | Have a look at how [this article about chatbot paradoxes](https://tailo.nl/chatbotparadox/) is summarized: 46 | > The promise of less customer contact for employees, or the handling of easier questions, make chatbots immensely popular. Chatbot sometimes provides more contact\n\nThe business case for a chatbot is often made to reduce unnecessary customer contact for employees. To answer the easier questions, the chatbot is given a prominent place on the website. 
As a result of which you, as a customer, are forwarded to an employee. However, as we just saw, the bot often does not yet recognize the intention or the question is asked in a way that the bot has not yet learned. 47 | 48 | Or check out the summarization of the [plot of Orwell's book 1984](https://en.wikipedia.org/wiki/Nineteen_Eighty-Four): 49 | > In the year 1984, civilization has been damaged by war, civil conflict, and revolution. Those who fall out of favour with the Party become "unpersons", disappearing with all evidence of their existence destroyed. In London, Winston Smith is a member of the Outer Party, working at the Ministry of Truth, where he rewrites historical records to conform to the state\'s ever-changing version of history. Winston reflects that Syme will disappear as he is "too intelligent" and therefore dangerous to the Party. During his affair with Julia, Winston remembers the disappearance of his family during the civil war of the 1950s and his tense relationship with his wife Katharine, from whom he is separated (divorce is not permitted by the Party). O\'Brien introduces himself as a member of the Brotherhood and sends Winston a copy of The Theory and Practice of Oligarchical Collectivism by Goldstein. Winston is recalled to the Ministry to help make the major necessary revisions of the records. Both reveal betraying the other and no longer possess feelings for one other. 50 | --------------------------------------------------------------------------------
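The extractive approach described in the Summarization section can be illustrated with a minimal sketch. This is not the `summarize` function of this package: the name `extractive_summary`, the word-frequency scoring and the naive sentence splitter are all assumptions chosen for illustration. Only the `ratio` parameter mirrors the `ratio=0.3` default of `summarize_transcription`.

```python
import re
from collections import Counter


def extractive_summary(text: str, ratio: float = 0.3) -> str:
    """Keep the highest-scoring sentences, in their original order.

    Hypothetical helper for illustration only; not part of teams_notetaker.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    def words(s: str) -> list:
        # Words of 4+ letters, as a crude stand-in for stop-word removal.
        return re.findall(r"[a-z]{4,}", s.lower())

    # Word frequencies over the whole text.
    freq = Counter(words(text))

    def score(sentence: str) -> float:
        # Average frequency of the words in the sentence.
        ws = words(sentence)
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    # Number of sentences to keep (at least one).
    keep = max(1, round(len(sentences) * ratio))

    # Rank sentence indices by score, then restore the original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    transcript = (
        "Meetings cost energy. "
        "The budget meeting covered the budget for the next budget cycle. "
        "Lunch was fine. "
        "The budget owner will share the budget plan."
    )
    print(extractive_summary(transcript, ratio=0.5))
```

Because the selected sentences are copied verbatim and kept in their original order, the output stays faithful to what was actually said, which is the property that makes extractive summarization a reasonable fit for meeting notes.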