├── tests
    ├── __init__.py
    └── test_teams_notetaker.py
├── HISTORY.rst
├── teams_notetaker
    ├── __init__.py
    ├── utils.py
    ├── summarize.py
    ├── speech_recognition.py
    ├── audio_utils.py
    └── teams_notetaker.py
├── requirements_dev.txt
├── AUTHORS.rst
├── MANIFEST.in
├── .travis.yml
├── tox.ini
├── .editorconfig
├── .github
    └── ISSUE_TEMPLATE.md
├── setup.cfg
├── setup.py
├── LICENSE
├── .gitignore
├── Makefile
├── CONTRIBUTING.rst
└── README.md


/tests/__init__.py:
--------------------------------------------------------------------------------
1 | """Unit test package for teams_notetaker."""
2 | 


--------------------------------------------------------------------------------
/HISTORY.rst:
--------------------------------------------------------------------------------
1 | =======
2 | History
3 | =======
4 | 
5 | 0.1.0 (2020-11-15)
6 | ------------------
7 | 
8 | * First release on PyPI.
9 | 


--------------------------------------------------------------------------------
/teams_notetaker/__init__.py:
--------------------------------------------------------------------------------
1 | """Top-level package for Teams Notetaker."""
2 | 
3 | __author__ = """Jeroen Kromme"""
4 | __email__ = 'j.kromme@outlook.com'
5 | __version__ = '0.1.0'
6 | 


--------------------------------------------------------------------------------
/requirements_dev.txt:
--------------------------------------------------------------------------------
 1 | pip==19.2.3
 2 | bump2version==0.5.11
 3 | wheel==0.33.6
 4 | watchdog==0.9.0
 5 | flake8==3.7.8
 6 | tox==3.14.0
 7 | coverage==4.5.4
 8 | Sphinx==1.8.5
 9 | twine==1.14.0
10 | 
11 | 


--------------------------------------------------------------------------------
/AUTHORS.rst:
--------------------------------------------------------------------------------
 1 | =======
 2 | Credits
 3 | =======
 4 | 
 5 | Development Lead
 6 | ----------------
 7 | 
 8 | * Jeroen Kromme <j.kromme@outlook.com>
 9 | 
10 | Contributors
11 | ------------
12 | 
13 | None yet. Why not be the first?
14 | 


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
 1 | include AUTHORS.rst
 2 | include CONTRIBUTING.rst
 3 | include HISTORY.rst
 4 | include LICENSE
 5 | include README.md
 6 | 
 7 | recursive-include tests *
 8 | recursive-exclude * __pycache__
 9 | recursive-exclude * *.py[co]
10 | 
11 | recursive-include docs *.rst conf.py Makefile make.bat *.jpg *.png *.gif
12 | 


--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
 1 | # Config file for automatic testing at travis-ci.com
 2 | 
 3 | language: python
 4 | python:
 5 |   - 3.8
 6 |   - 3.7
 7 |   - 3.6
 8 |   - 3.5
 9 | 
10 | # Command to install dependencies, e.g. pip install -r requirements.txt --use-mirrors
11 | install: pip install -U tox-travis
12 | 
13 | # Command to run tests, e.g. python setup.py test
14 | script: tox
15 | 
16 | 
17 | 


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
 1 | [tox]
 2 | envlist = py35, py36, py37, py38, flake8
 3 | 
 4 | [travis]
 5 | python =
 6 |     3.8: py38
 7 |     3.7: py37
 8 |     3.6: py36
 9 |     3.5: py35
10 | 
11 | [testenv:flake8]
12 | basepython = python
13 | deps = flake8
14 | commands = flake8 teams_notetaker tests
15 | 
16 | [testenv]
17 | setenv =
18 |     PYTHONPATH = {toxinidir}
19 | 
20 | commands = python setup.py test
21 | 


--------------------------------------------------------------------------------
/.editorconfig:
--------------------------------------------------------------------------------
 1 | # http://editorconfig.org
 2 | 
 3 | root = true
 4 | 
 5 | [*]
 6 | indent_style = space
 7 | indent_size = 4
 8 | trim_trailing_whitespace = true
 9 | insert_final_newline = true
10 | charset = utf-8
11 | end_of_line = lf
12 | 
13 | [*.bat]
14 | indent_style = tab
15 | end_of_line = crlf
16 | 
17 | [LICENSE]
18 | insert_final_newline = false
19 | 
20 | [Makefile]
21 | indent_style = tab
22 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | * Teams Notetaker version: 0.1.0
 2 | * Python version: 3.*
 3 | * Operating System: Windows
 4 | 
 5 | ### Description
 6 | 
 7 | Describe what you were trying to get done.
 8 | Tell us what happened, what went wrong, and what you expected to happen.
 9 | 
10 | ### What I Did
11 | 
12 | ```
13 | Paste the command(s) you ran and the output.
14 | If there was a crash, please include the traceback here.
15 | ```
16 | 


--------------------------------------------------------------------------------
/setup.cfg:
--------------------------------------------------------------------------------
 1 | [bumpversion]
 2 | current_version = 0.1.0
 3 | commit = True
 4 | tag = True
 5 | 
 6 | [bumpversion:file:setup.py]
 7 | search = version='{current_version}'
 8 | replace = version='{new_version}'
 9 | 
10 | [bumpversion:file:teams_notetaker/__init__.py]
11 | search = __version__ = '{current_version}'
12 | replace = __version__ = '{new_version}'
13 | 
14 | [bdist_wheel]
15 | universal = 1
16 | 
17 | [flake8]
18 | exclude = docs
19 | 
20 | [aliases]
21 | # Define setup.py command aliases here
22 | 
23 | 


--------------------------------------------------------------------------------
/tests/test_teams_notetaker.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | """Tests for `teams_notetaker` package."""
 4 | 
 5 | 
 6 | import unittest
 7 | 
 8 | from teams_notetaker import teams_notetaker
 9 | 
10 | 
11 | class TestTeams_notetaker(unittest.TestCase):
12 |     """Tests for `teams_notetaker` package."""
13 | 
14 |     def setUp(self):
15 |         """Set up test fixtures, if any."""
16 | 
17 |     def tearDown(self):
18 |         """Tear down test fixtures, if any."""
19 | 
20 |     def test_000_something(self):
21 |         """Test something."""
22 | 


--------------------------------------------------------------------------------
/teams_notetaker/utils.py:
--------------------------------------------------------------------------------
 1 | import shutil
 2 | import logging
 3 | import logging.config
 4 | 
 5 | 
 6 | def get_logger(name, logfile='log.log'):
 7 |     log_format = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
 8 |     logging.basicConfig(
 9 |         level=logging.DEBUG, format=log_format, filename=logfile, filemode="w"
10 |     )
11 |     console = logging.StreamHandler()
12 |     console.setLevel(logging.DEBUG)
13 |     console.setFormatter(logging.Formatter(log_format))
14 |     logging.getLogger(name).addHandler(console)
15 |     return logging.getLogger(name)
16 | 
17 | 
18 | logger = get_logger('utils')
19 | 
20 | 
21 | def check_cmd_application_available(application: str) -> bool:
22 |     """Check whether a command line interface application is available.
23 | 
24 |     Parameters
25 |     ----------
26 |     application : str
27 |         Name of the application
28 |     """
29 |     # shutil.which looks the executable up on PATH on every platform
30 |     if shutil.which(application) is not None:
31 |         logger.info(f'{application} available.')
32 |         return True
33 |     else:
34 |         logger.error(f'{application} not available')
35 |         return False
36 | 
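A command-line availability check can be tried in isolation. The sketch below is a minimal, platform-independent version built on the standard library's `shutil.which`; the name `is_cli_available` is hypothetical and is not part of the package.

```python
import shutil


def is_cli_available(application: str) -> bool:
    """Return True when `application` resolves to an executable on PATH."""
    # shutil.which returns the full path of the executable, or None
    return shutil.which(application) is not None


if __name__ == "__main__":
    # ffmpeg and sox are the two tools the audio helpers depend on
    for tool in ("ffmpeg", "sox"):
        status = "found" if is_cli_available(tool) else "missing"
        print(f"{tool}: {status}")
```

Because nothing is executed, the lookup does not depend on the wording of a shell's error message.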


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python
 2 | 
 3 | """The setup script."""
 4 | 
 5 | from setuptools import setup, find_packages
 6 | 
 7 | with open('README.md') as readme_file:
 8 |     readme = readme_file.read()
 9 | 
10 | with open('HISTORY.rst') as history_file:
11 |     history = history_file.read()
12 | 
13 | requirements = []
14 | 
15 | setup_requirements = []
16 | 
17 | test_requirements = []
18 | 
19 | setup(
20 |     author="Jeroen Kromme",
21 |     author_email='j.kromme@outlook.com',
22 |     python_requires='>=3.5',
23 |     classifiers=[
24 |         'Development Status :: 2 - Pre-Alpha',
25 |         'Intended Audience :: Developers',
26 |         'License :: OSI Approved :: GNU General Public License v3 (GPLv3)',
27 |         'Natural Language :: English',
28 |         'Programming Language :: Python :: 3',
29 |         'Programming Language :: Python :: 3.5',
30 |         'Programming Language :: Python :: 3.6',
31 |         'Programming Language :: Python :: 3.7',
32 |         'Programming Language :: Python :: 3.8',
33 |     ],
34 |     description="Let AI take the notes of your Teams meeting",
35 |     install_requires=requirements,
36 |     license="GNU General Public License v3",
37 |     long_description=readme + '\n\n' + history,
38 |     include_package_data=True,
39 |     keywords='teams_notetaker',
40 |     name='teams_notetaker',
41 |     packages=find_packages(include=['teams_notetaker', 'teams_notetaker.*']),
42 |     setup_requires=setup_requirements,
43 |     test_suite='tests',
44 |     tests_require=test_requirements,
45 |     url='https://github.com/kromme/teams_notetaker',
46 |     version='0.1.0',
47 |     zip_safe=False,
48 | )
49 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | GNU GENERAL PUBLIC LICENSE
 2 |                       Version 3, 29 June 2007
 3 | 
 4 |     Let AI take the notes of your Teams meeting
 5 |     Copyright (C) 2020  Jeroen Kromme
 6 | 
 7 |     This program is free software: you can redistribute it and/or modify
 8 |     it under the terms of the GNU General Public License as published by
 9 |     the Free Software Foundation, either version 3 of the License, or
10 |     (at your option) any later version.
11 | 
12 |     This program is distributed in the hope that it will be useful,
13 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
14 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
15 |     GNU General Public License for more details.
16 | 
17 |     You should have received a copy of the GNU General Public License
18 |     along with this program.  If not, see <http://www.gnu.org/licenses/>.
19 | 
20 | Also add information on how to contact you by electronic and paper mail.
21 | 
22 |   You should also get your employer (if you work as a programmer) or school,
23 | if any, to sign a "copyright disclaimer" for the program, if necessary.
24 | For more information on this, and how to apply and follow the GNU GPL, see
25 | <http://www.gnu.org/licenses/>.
26 | 
27 |   The GNU General Public License does not permit incorporating your program
28 | into proprietary programs.  If your program is a subroutine library, you
29 | may consider it more useful to permit linking proprietary applications with
30 | the library.  If this is what you want to do, use the GNU Lesser General
31 | Public License instead of this License.  But first, please read
32 | <http://www.gnu.org/philosophy/why-not-lgpl.html>.
33 | 
34 | 


--------------------------------------------------------------------------------
/teams_notetaker/summarize.py:
--------------------------------------------------------------------------------
 1 | from summarizer import Summarizer
 2 | from .utils import get_logger, check_cmd_application_available
 3 | 
 4 | logger = get_logger("summarizer")
 5 | 
 6 | 
 7 | def summarize(
 8 |     transcription: str, notes_path: str, ratio: float = 0.2, num_sentences: int = None
 9 | ) -> str:
10 |     """Uses BERT for extractive summarization
11 | 
12 |     Parameters
13 |     ----------
14 |     transcription : str
15 |         The transcription to be summarized
16 |     notes_path : str
17 |         Path to where the notes should be saved
18 |     ratio : float
19 |         Length of the summary as a fraction of the transcription length
20 |     num_sentences : int
21 |         Length of the summary as a number of sentences
22 | 
23 |     Returns
24 |     -------
25 |     The summarized notes
26 |     """
27 | 
28 |     assert (
29 |         len(transcription.split(".")) > 1
30 |     ), "Transcription too short for summarization."
31 | 
32 |     notes = ""
33 | 
34 |     # initialize the summarizer
35 |     try:
36 |         model = Summarizer()
37 |         logger.info("Summarizer initialized")
38 |     except Exception as e:
39 |         logger.error("Could not init summarizer", exc_info=e)
40 |         return
41 | 
42 |     # Summarize
43 |     try:
44 |         notes = model(transcription, ratio=ratio, num_sentences=num_sentences)
45 |         logger.info(
46 |             f'Successfully summarized transcription from {len(transcription.split("."))} to {len(notes.split("."))} sentences.'
47 |         )
48 |     except Exception as e:
49 |         logger.error("Could not summarise text", exc_info=e)
50 |         return
51 | 
52 |     # save
53 |     with open(notes_path, "w") as f:
54 |         f.write(notes)
55 |     logger.info(f"Notes successfully saved to {notes_path}")
56 | 
57 |     return notes
58 | 
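The length check in `summarize` counts the pieces produced by `split(".")`, so a single sentence that ends with a period yields two pieces and passes. The sketch below counts only non-empty sentences and raises `ValueError` instead of relying on `assert`, which is skipped when Python runs with `-O`. Both `count_sentences` and `require_summarizable` are hypothetical helpers, not part of the package, and splitting on periods remains a rough approximation of sentence boundaries.

```python
def count_sentences(transcription: str) -> int:
    """Count the non-empty, period-separated sentences in `transcription`."""
    return sum(1 for part in transcription.split(".") if part.strip())


def require_summarizable(transcription: str, minimum: int = 2) -> int:
    """Raise ValueError unless the text has at least `minimum` sentences."""
    found = count_sentences(transcription)
    if found < minimum:
        raise ValueError(
            f"Transcription too short for summarization: {found} sentence(s)."
        )
    return found


if __name__ == "__main__":
    print(require_summarizable("We met today. Actions were assigned."))
    try:
        require_summarizable("Only one sentence.")
    except ValueError as error:
        print(error)
```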


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | # Byte-compiled / optimized / DLL files
  2 | __pycache__/
  3 | *.py[cod]
  4 | *$py.class
  5 | 
  6 | # C extensions
  7 | *.so
  8 | 
  9 | # Distribution / packaging
 10 | .Python
 11 | env/
 12 | build/
 13 | develop-eggs/
 14 | dist/
 15 | downloads/
 16 | eggs/
 17 | .eggs/
 18 | lib/
 19 | lib64/
 20 | parts/
 21 | sdist/
 22 | var/
 23 | wheels/
 24 | *.egg-info/
 25 | .installed.cfg
 26 | *.egg
 27 | 
 28 | # PyInstaller
 29 | #  Usually these files are written by a python script from a template
 30 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 31 | *.manifest
 32 | *.spec
 33 | 
 34 | # Installer logs
 35 | pip-log.txt
 36 | pip-delete-this-directory.txt
 37 | 
 38 | # Unit test / coverage reports
 39 | htmlcov/
 40 | .tox/
 41 | .coverage
 42 | .coverage.*
 43 | .cache
 44 | nosetests.xml
 45 | coverage.xml
 46 | *.cover
 47 | .hypothesis/
 48 | .pytest_cache/
 49 | 
 50 | # Translations
 51 | *.mo
 52 | *.pot
 53 | 
 54 | # Django stuff:
 55 | *.log
 56 | local_settings.py
 57 | 
 58 | # Flask stuff:
 59 | instance/
 60 | .webassets-cache
 61 | 
 62 | # Scrapy stuff:
 63 | .scrapy
 64 | 
 65 | # Sphinx documentation
 66 | docs/_build/
 67 | 
 68 | # PyBuilder
 69 | target/
 70 | 
 71 | # Jupyter Notebook
 72 | .ipynb_checkpoints
 73 | 
 74 | # pyenv
 75 | .python-version
 76 | 
 77 | # celery beat schedule file
 78 | celerybeat-schedule
 79 | 
 80 | # SageMath parsed files
 81 | *.sage.py
 82 | 
 83 | # dotenv
 84 | .env
 85 | 
 86 | # virtualenv
 87 | .venv
 88 | venv/
 89 | ENV/
 90 | 
 91 | # Spyder project settings
 92 | .spyderproject
 93 | .spyproject
 94 | 
 95 | # Rope project settings
 96 | .ropeproject
 97 | 
 98 | # mkdocs documentation
 99 | /site
100 | 
101 | # mypy
102 | .mypy_cache/
103 | 
104 | # IDE settings
105 | .vscode/
106 | 
107 | key.json
108 | audio
109 | notes
110 | transcripts
111 | video
112 | *.ipynb
113 | *.log


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | .PHONY: clean clean-test clean-pyc clean-build docs help
 2 | .DEFAULT_GOAL := help
 3 | 
 4 | define BROWSER_PYSCRIPT
 5 | import os, webbrowser, sys
 6 | 
 7 | from urllib.request import pathname2url
 8 | 
 9 | webbrowser.open("file://" + pathname2url(os.path.abspath(sys.argv[1])))
10 | endef
11 | export BROWSER_PYSCRIPT
12 | 
13 | define PRINT_HELP_PYSCRIPT
14 | import re, sys
15 | 
16 | for line in sys.stdin:
17 | 	match = re.match(r'^([a-zA-Z_-]+):.*?## (.*)$$', line)
18 | 	if match:
19 | 		target, help = match.groups()
20 | 		print("%-20s %s" % (target, help))
21 | endef
22 | export PRINT_HELP_PYSCRIPT
23 | 
24 | BROWSER := python -c "$$BROWSER_PYSCRIPT"
25 | 
26 | help:
27 | 	@python -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)
28 | 
29 | clean: clean-build clean-pyc clean-test ## remove all build, test, coverage and Python artifacts
30 | 
31 | clean-build: ## remove build artifacts
32 | 	rm -fr build/
33 | 	rm -fr dist/
34 | 	rm -fr .eggs/
35 | 	find . -name '*.egg-info' -exec rm -fr {} +
36 | 	find . -name '*.egg' -exec rm -f {} +
37 | 
38 | clean-pyc: ## remove Python file artifacts
39 | 	find . -name '*.pyc' -exec rm -f {} +
40 | 	find . -name '*.pyo' -exec rm -f {} +
41 | 	find . -name '*~' -exec rm -f {} +
42 | 	find . -name '__pycache__' -exec rm -fr {} +
43 | 
44 | clean-test: ## remove test and coverage artifacts
45 | 	rm -fr .tox/
46 | 	rm -f .coverage
47 | 	rm -fr htmlcov/
48 | 	rm -fr .pytest_cache
49 | 
50 | lint: ## check style with flake8
51 | 	flake8 teams_notetaker tests
52 | 
53 | test: ## run tests quickly with the default Python
54 | 	python setup.py test
55 | 
56 | test-all: ## run tests on every Python version with tox
57 | 	tox
58 | 
59 | coverage: ## check code coverage quickly with the default Python
60 | 	coverage run --source teams_notetaker setup.py test
61 | 	coverage report -m
62 | 	coverage html
63 | 	$(BROWSER) htmlcov/index.html
64 | 
65 | docs: ## generate Sphinx HTML documentation, including API docs
66 | 	rm -f docs/teams_notetaker.rst
67 | 	rm -f docs/modules.rst
68 | 	sphinx-apidoc -o docs/ teams_notetaker
69 | 	$(MAKE) -C docs clean
70 | 	$(MAKE) -C docs html
71 | 	$(BROWSER) docs/_build/html/index.html
72 | 
73 | servedocs: docs ## compile the docs watching for changes
74 | 	watchmedo shell-command -p '*.rst' -c '$(MAKE) -C docs html' -R -D .
75 | 
76 | release: dist ## package and upload a release
77 | 	twine upload dist/*
78 | 
79 | dist: clean ## builds source and wheel package
80 | 	python setup.py sdist
81 | 	python setup.py bdist_wheel
82 | 	ls -l dist
83 | 
84 | install: clean ## install the package to the active Python's site-packages
85 | 	python setup.py install
86 | 


--------------------------------------------------------------------------------
/teams_notetaker/speech_recognition.py:
--------------------------------------------------------------------------------
 1 | from .utils import get_logger
 2 | from google.cloud import speech
 3 | from google.oauth2 import service_account
 4 | import tqdm
 5 | import os
 6 | import io
 7 | 
 8 | logger = get_logger("speech_recognition")
 9 | 
10 | 
11 | def setup_google_speech(key_file):
12 |     """Setup the config for google speech to text
13 |     """
14 | 
15 |     assert os.path.isfile(
16 |         key_file
17 |     ), "Could not find key file, please visit https://codelabs.developers.google.com/codelabs/cloud-speech-text-python3#0"
18 | 
19 |     # create credentials
20 |     credentials = service_account.Credentials.from_service_account_file(key_file)
21 | 
22 |     # setup the client and config
23 |     client = speech.SpeechClient(credentials=credentials)
24 |     config = speech.RecognitionConfig(
25 |         encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
26 |         # sample_rate_hertz=16000,
27 |         language_code="en-US",
28 |         # Enable automatic punctuation
29 |         enable_automatic_punctuation=True,
30 |     )
31 |     logger.info("Initialized Google Speech")
32 |     return client, config
33 | 
34 | 
35 | def transcribe_part(audio_path: str, client, config) -> str:
36 |     """Transcribes an audio file
37 |     """
38 |     assert client, "Google Client not set, run _setup_google_speech()"
39 |     assert config, "Google Config not set, run _setup_google_speech()"
40 | 
41 |     # init transcription for path
42 |     transcription = ""
43 | 
44 |     # open audio
45 |     with io.open(audio_path, "rb") as f:
46 |         content = f.read()
47 | 
48 |     # transcribe by sending to google
49 |     audio = speech.RecognitionAudio(content=content)
50 |     response = client.recognize(config=config, audio=audio)
51 | 
52 |     # collecting results
53 |     for result in response.results:
54 |         alternative = result.alternatives[0]
55 |         transcription += " " + alternative.transcript
56 | 
57 |     return transcription
58 | 
59 | 
60 | def transcribe_all_audioparts(audio_part_folder: str, client, config) -> str:
61 |     """Transcribe all parts of the audio and concat them
62 |     """
63 |     # init transcription
64 |     transcription = ""
65 | 
66 |     # collect files to transcribe, sorted so the parts stay in chronological order
67 |     files = sorted(os.listdir(audio_part_folder))
68 | 
69 |     logger.info(f"Transcribing {len(files)} parts")
70 | 
71 |     # loop and transcribe
72 |     for audio_part in tqdm.tqdm(files):
73 |         audio_part_path = f"{audio_part_folder}/{audio_part}"
74 |         transcription += transcribe_part(
75 |             audio_path=audio_part_path, client=client, config=config
76 |         )
77 | 
78 |     return transcription
79 | 
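The parts are written as `000.wav`, `001.wav`, and so on, so the order in which they are transcribed determines the order of the final transcript. `os.listdir` gives no ordering guarantee, which is why the sketch below sorts the names first. The `transcribe` callable is a stand-in for `transcribe_part` (an assumption, so that the example runs without Google credentials), and `transcribe_in_order` is a hypothetical name.

```python
import os
import tempfile
from typing import Callable


def transcribe_in_order(folder: str, transcribe: Callable[[str], str]) -> str:
    """Concatenate the transcripts of all parts in `folder`, sorted by file name."""
    transcription = ""
    # sorted() restores the 000, 001, ... order given by the zero-padded names
    for name in sorted(os.listdir(folder)):
        transcription += transcribe(os.path.join(folder, name))
    return transcription


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # create the parts out of order on purpose
        for name in ("002.wav", "000.wav", "001.wav"):
            open(os.path.join(folder, name), "wb").close()

        # stand-in transcriber: returns the part name instead of calling an API
        def fake_transcribe(path: str) -> str:
            return " " + os.path.basename(path)

        print(transcribe_in_order(folder, fake_transcribe))
```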


--------------------------------------------------------------------------------
/teams_notetaker/audio_utils.py:
--------------------------------------------------------------------------------
  1 | import subprocess
  2 | import os
  3 | from .utils import get_logger, check_cmd_application_available
  4 | 
  5 | logger = get_logger("audio_utils")
  6 | 
  7 | 
  8 | def extract_audio(
  9 |     video_path: str = None, audio_path: str = None, overwrite: bool = True
 10 | ):
 11 |     """Extract audio file from a video using ffmpeg
 12 | 
 13 |     Parameters
 14 |     ----------
 15 |     video_path : str
 16 |         Path to the video file
 17 |     audio_path : str
 18 |         Path where the audio file will be saved
 19 |     overwrite : bool
 20 |         Whether to overwrite the audio file when it already exists (default is True)
 21 |     """
 22 |     logger.info(f'Extract audio from "{video_path}" to "{audio_path}" ')
 23 |     # checks whether ffmpeg can be found
 24 |     if not check_cmd_application_available("ffmpeg"):
 25 |         logger.error("Could not load ffmpeg")
 26 |         return
 27 | 
 28 |     # checks whether video file is found
 29 |     assert os.path.isfile(video_path), "Can't find video"
 30 | 
 31 |     # add overwrite parameter to ffmpeg
 32 |     overwrite_param = "-y" if overwrite else ""
 33 |     if os.path.isfile(audio_path) and overwrite:
 34 |         logger.info(f"File already exists: overwriting {audio_path}")
 35 |     elif os.path.isfile(audio_path) and not overwrite:
 36 |         logger.info(f"File already exists: not overwriting {audio_path}")
 37 |         return
 38 | 
 39 |     # Call ffmpeg
 40 |     try:
 41 |         # -ab 160k -ac 2 -ar 44100 -vn
 42 |         command = (
 43 |             f"""ffmpeg {overwrite_param} -i "{video_path}" -ac 1 "{audio_path}" """
 44 |         )
 45 |         subprocess.call(command, shell=True)
 46 |     except Exception as e:
 47 |         logger.error("Could not extract audio file from video.", exc_info=e)
 48 | 
 49 |     # check whether file is saved
 50 |     assert os.path.isfile(audio_path), "Something went wrong saving the audio file"
 51 |     logger.info(f"Audio successfully extracted to {audio_path}")
 52 | 
 53 | 
 54 | def remove_silences_from_audio(audio_path: str = None) -> str:
 55 |     """Removes silences from audio file using sox.
 56 |     Sox cannot overwrite its input file, so the result is written to a new file.
 57 | 
 58 |     Parameters
 59 |     ----------
 60 |     audio_path : str
 61 |         Path to the audio file
 62 | 
 63 |     Returns
 64 |     -------
 65 |     Path to the new audio file with the silences removed
 66 |     """
 67 | 
 68 |     # checks whether sox can be found
 69 |     if not check_cmd_application_available("sox"):
 70 |         logger.error("Could not load sox")
 71 |         return
 72 | 
 73 |     silenced_audio_path = audio_path.replace(".wav", "_silence_removed.wav")
 74 | 
 75 |     try:
 76 |         command = f"""sox "{audio_path}" "{silenced_audio_path}" silence -l 1 0.1 1% -1 2.0 1%"""
 77 |         subprocess.call(command, shell=True)
 78 |     except Exception as e:
 79 |         logger.error("Could not remove silences from audio file.", exc_info=e)
 80 | 
 81 |     logger.info("Silences successfully removed")
 82 |     return silenced_audio_path
 83 | 
 84 | 
 85 | def split_audio_file(audio_path: str, audio_part_folder: str):
 86 |     """Split up the audio in parts of 50 seconds
 87 | 
 88 |     Parameters
 89 |     ----------
 90 |     audio_path : str
 91 |         Path to the audio file
 92 |     audio_part_folder : str
 93 |         Folder where the audio parts should be saved
 94 |     """
 95 | 
 96 |     # checks whether ffmpeg can be found
 97 |     ffmpeg_available = check_cmd_application_available("ffmpeg")
 98 |     assert ffmpeg_available, "ffmpeg not found"
 99 | 
100 |     # create name
101 |     audio_part_path = f"{audio_part_folder}/%03d.wav"
102 | 
103 |     # split up
104 |     try:
105 |         command = f"""ffmpeg -i "{audio_path}" -f segment -segment_time 50 -c copy -reset_timestamps 1 "{audio_part_path}" """
106 |         subprocess.call(command, shell=True)
107 |     except Exception as e:
108 |         logger.error("Could not split audio file into parts.", exc_info=e)
109 | 
110 |     logger.info(
111 |         f"Audio successfully split into {len(os.listdir(audio_part_folder))} parts"
112 |     )
113 | 
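The ffmpeg and sox calls in this module are assembled as shell strings. As a hedged alternative, the sketch below builds the segmenting command as an argument list, which avoids shell quoting problems with paths that contain spaces or quotes. It uses the same ffmpeg flags as `split_audio_file`; the function name `build_split_command` and the `segment_time` parameter are hypothetical.

```python
from typing import List


def build_split_command(
    audio_path: str, audio_part_folder: str, segment_time: int = 50
) -> List[str]:
    """Build the ffmpeg segment command as an argument list (no shell needed)."""
    return [
        "ffmpeg",
        "-i", audio_path,
        "-f", "segment",
        "-segment_time", str(segment_time),
        "-c", "copy",
        "-reset_timestamps", "1",
        # %03d is expanded by ffmpeg into 000, 001, ...
        f"{audio_part_folder}/%03d.wav",
    ]


if __name__ == "__main__":
    command = build_split_command("audio/my meeting.wav", "audio/parts")
    print(command)
    # To execute it (requires ffmpeg on PATH):
    # import subprocess
    # subprocess.run(command, check=True)
```

`subprocess.call` returns the exit code without raising, so a failed ffmpeg run does not reach an `except` block. `subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit code.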


--------------------------------------------------------------------------------
/CONTRIBUTING.rst:
--------------------------------------------------------------------------------
  1 | .. highlight:: shell
  2 | 
  3 | ============
  4 | Contributing
  5 | ============
  6 | 
  7 | Contributions are welcome, and they are greatly appreciated! Every little bit
  8 | helps, and credit will always be given.
  9 | 
 10 | You can contribute in many ways:
 11 | 
 12 | Types of Contributions
 13 | ----------------------
 14 | 
 15 | Report Bugs
 16 | ~~~~~~~~~~~
 17 | 
 18 | Report bugs at https://github.com/kromme/teams_notetaker/issues.
 19 | 
 20 | If you are reporting a bug, please include:
 21 | 
 22 | * Your operating system name and version.
 23 | * Any details about your local setup that might be helpful in troubleshooting.
 24 | * Detailed steps to reproduce the bug.
 25 | 
 26 | Fix Bugs
 27 | ~~~~~~~~
 28 | 
 29 | Look through the GitHub issues for bugs. Anything tagged with "bug" and "help
 30 | wanted" is open to whoever wants to implement it.
 31 | 
 32 | Implement Features
 33 | ~~~~~~~~~~~~~~~~~~
 34 | 
 35 | Look through the GitHub issues for features. Anything tagged with "enhancement"
 36 | and "help wanted" is open to whoever wants to implement it.
 37 | 
 38 | Write Documentation
 39 | ~~~~~~~~~~~~~~~~~~~
 40 | 
 41 | Teams Notetaker could always use more documentation, whether as part of the
 42 | official Teams Notetaker docs, in docstrings, or even on the web in blog posts,
 43 | articles, and such.
 44 | 
 45 | Submit Feedback
 46 | ~~~~~~~~~~~~~~~
 47 | 
 48 | The best way to send feedback is to file an issue at https://github.com/kromme/teams_notetaker/issues.
 49 | 
 50 | If you are proposing a feature:
 51 | 
 52 | * Explain in detail how it would work.
 53 | * Keep the scope as narrow as possible, to make it easier to implement.
 54 | * Remember that this is a volunteer-driven project, and that contributions
 55 |   are welcome :)
 56 | 
 57 | Get Started!
 58 | ------------
 59 | 
 60 | Ready to contribute? Here's how to set up `teams_notetaker` for local development.
 61 | 
 62 | 1. Fork the `teams_notetaker` repo on GitHub.
 63 | 2. Clone your fork locally::
 64 | 
 65 |     $ git clone git@github.com:your_name_here/teams_notetaker.git
 66 | 
 67 | 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
 68 | 
 69 |     $ mkvirtualenv teams_notetaker
 70 |     $ cd teams_notetaker/
 71 |     $ python setup.py develop
 72 | 
 73 | 4. Create a branch for local development::
 74 | 
 75 |     $ git checkout -b name-of-your-bugfix-or-feature
 76 | 
 77 |    Now you can make your changes locally.
 78 | 
 79 | 5. When you're done making changes, check that your changes pass flake8 and the
 80 |    tests, including testing other Python versions with tox::
 81 | 
 82 |     $ flake8 teams_notetaker tests
 83 |     $ python setup.py test or pytest
 84 |     $ tox
 85 | 
 86 |    To get flake8 and tox, just pip install them into your virtualenv.
 87 | 
 88 | 6. Commit your changes and push your branch to GitHub::
 89 | 
 90 |     $ git add .
 91 |     $ git commit -m "Your detailed description of your changes."
 92 |     $ git push origin name-of-your-bugfix-or-feature
 93 | 
 94 | 7. Submit a pull request through the GitHub website.
 95 | 
 96 | Pull Request Guidelines
 97 | -----------------------
 98 | 
 99 | Before you submit a pull request, check that it meets these guidelines:
100 | 
101 | 1. The pull request should include tests.
102 | 2. If the pull request adds functionality, the docs should be updated. Put
103 |    your new functionality into a function with a docstring, and add the
104 |    feature to the list in README.md.
105 | 3. The pull request should work for Python 3.5, 3.6, 3.7 and 3.8, and for PyPy. Check
106 |    https://travis-ci.com/kromme/teams_notetaker/pull_requests
107 |    and make sure that the tests pass for all supported Python versions.
108 | 
109 | Tips
110 | ----
111 | 
112 | To run a subset of tests::
113 | 
114 | 
115 |     $ python -m unittest tests.test_teams_notetaker
116 | 
117 | Deploying
118 | ---------
119 | 
120 | A reminder for the maintainers on how to deploy.
121 | Make sure all your changes are committed (including an entry in HISTORY.rst).
122 | Then run::
123 | 
124 |     $ bump2version patch # possible: major / minor / patch
125 |     $ git push
126 |     $ git push --tags
127 | 
128 | Travis will then deploy to PyPI if tests pass.
129 | 


--------------------------------------------------------------------------------
/teams_notetaker/teams_notetaker.py:
--------------------------------------------------------------------------------
  1 | """Main module."""
  2 | import datetime
  3 | import io
  4 | import os
  5 | import subprocess
  6 | 
  7 | from .utils import get_logger, check_cmd_application_available
  8 | from .speech_recognition import (
  9 |     setup_google_speech,
 10 |     transcribe_part,
 11 |     transcribe_all_audioparts,
 12 | )
 13 | from .audio_utils import extract_audio, remove_silences_from_audio, split_audio_file
 14 | from .summarize import summarize
 15 | 
 16 | logger = get_logger("teams_notetaker")
 17 | 
 18 | 
 19 | class TeamsNotetaker:
 20 |     """
 21 |     A class used to take notes from teams meetings.
 22 | 
 23 |     From January 11th, 2021, all Teams recordings will be stored in OneDrive directly, see [here](https://docs.microsoft.com/en-gb/MicrosoftTeams/tmr-meeting-recording-change). Until then, download the recording from [Stream](https://web.microsoftstream.com/) > My Content > video > Download video.
 24 | 
 25 |     """
 26 | 
 27 |     def __init__(
 28 |         self,
 29 |         filename: str,
 30 |         key_file: str = "key.json",
 31 |         audio_folder: str = "audio",
 32 |         transcription_folder: str = "transcripts",
 33 |         notes_folder: str = "notes",
 34 |         wd: str = None,
 35 |     ):
 36 |         self.wd = wd if wd else os.getcwd()
 37 | 
 38 |         self.AUDIO_FOLDER = f"{self.wd}/{audio_folder}"
 39 |         self.TRANSCRIPTION_FOLDER = f"{self.wd}/{transcription_folder}"
 40 |         self.NOTES_FOLDER = f"{self.wd}/{notes_folder}"
 41 |         self.video_path = f"{self.wd}/{filename}"
 42 | 
 43 |         self.filename, video_extension = os.path.splitext(filename)
 44 |         self.filename = self.filename.split("/")[-1].split("\\")[-1]
 45 |         self.video_extension = video_extension
 46 |         self.key_file = key_file
 47 | 
 48 |         self.ts = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
 49 |         self.AUDIO_PART_FOLDER = f"{self.wd}/{audio_folder}/{self.ts}_{self.filename}"
 50 |         self.logfile = "log.log"
 51 | 
 52 |         # init
 53 |         self.audio_path = ""
 54 |         self.transcription_path = ""
 55 |         self.notes_path = ""
 56 |         self.config = False
 57 |         self.client = False
 58 | 
 59 |         # create folders
 60 |         self._setup_folder()
 61 |         self._setup_paths()
 62 |         self._setup_google_speech()
 63 | 
 64 |         logger.info("Teams Notetaker initialized")
 65 | 
 66 |     def _setup_folder(self):
 67 |         """Create the output folders if they do not exist yet.
 68 | 
 69 |         ``exist_ok=True`` makes each call a no-op for a folder that is
 70 |         already there.
 71 |         """
 72 |         for folder in (
 73 |             self.AUDIO_FOLDER,
 74 |             self.AUDIO_PART_FOLDER,
 75 |             self.TRANSCRIPTION_FOLDER,
 76 |             self.NOTES_FOLDER,
 77 |         ):
 78 |             # also creates any missing parent folders
 79 |             os.makedirs(folder, exist_ok=True)
 80 | 
 81 |     def _setup_paths(self):
 82 |         """Build the timestamped output paths for audio, transcript and notes."""
 83 | 
 84 |         # set paths
 85 |         self.audio_path = f"{self.AUDIO_FOLDER}/{self.ts}_{self.filename}.wav"
 86 |         self.transcription_path = (
 87 |             f"{self.TRANSCRIPTION_FOLDER}/{self.ts}_{self.filename}.txt"
 88 |         )
 89 |         self.notes_path = f"{self.NOTES_FOLDER}/{self.ts}_{self.filename}.txt"
 90 | 
 91 |     def _setup_google_speech(self):
 92 |         """Setup the config for google speech to text
 93 |         """
 94 |         self.client, self.config = setup_google_speech(self.key_file)
 95 | 
 96 |     def prepare_audio(self):
 97 |         """Prepare audio file by extracting the audio from the video, removing the
 98 |         silences and splitting it up in parts of 50 seconds
 99 |         """
100 | 
101 |         # extract audio from the video
102 |         extract_audio(video_path=self.video_path, audio_path=self.audio_path)
103 | 
104 |         # remove silences and renew audio_path name
105 |         self.audio_path = remove_silences_from_audio(audio_path=self.audio_path)
106 | 
107 |         # split the audio files because google can do 1 minute max
108 |         split_audio_file(
109 |             audio_path=self.audio_path, audio_part_folder=self.AUDIO_PART_FOLDER
110 |         )
111 | 
112 |         logger.info("Audio preprocessing done")
113 | 
114 |     def transcribe(self):
115 |         """Transcribe all parts in the audio parts folder
116 |         """
117 |         self.transcription = transcribe_all_audioparts(
118 |             audio_part_folder=self.AUDIO_PART_FOLDER,
119 |             client=self.client,
120 |             config=self.config,
121 |         )
122 | 
123 |     def summarize_transcription(self, ratio=0.3):
124 |         """Summarize the transcriptions
125 |         """
126 | 
127 |         # create notes
128 |         self.notes = summarize(
129 |             transcription=self.transcription, notes_path=self.notes_path, ratio=ratio
130 |         )
131 | 
132 |     def run(self):
133 |         """Run the full pipeline and return the notes."""
134 |         self.prepare_audio()
135 |         self.transcribe()
136 |         self.summarize_transcription()
137 |         return self.notes
138 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Using AI to take notes of your Teams meeting
 2 | This package allows you to create a summary of a meeting you recorded in Teams.  
 3 | 
 4 | Working from home every day can be exhausting, and days with back-to-back meetings can wear you out. By the end of the day you have probably forgotten half of the information. So take notes! However, this is easier said than done. A meeting via Teams takes more energy than one in real life, because you need to compensate for the reduced amount of (non-verbal) information when talking via video. Besides, when you start typing during a meeting, the others will think you're writing an email or just working instead of paying attention.  
 5 | 
 6 | Now there is a new solution: using AI to take notes of your Teams meeting. All you need to do is record the meeting, download the video and run the script.
 7 | 
 8 | ![](https://images.pexels.com/photos/1766604/pexels-photo-1766604.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260 "taking notes")
 9 | 
10 | *Note*: we're going to set up a Google API for the speech recognition, and there will be [costs](https://cloud.google.com/speech-to-text/pricing) associated with this.  
11 | 
12 | 
13 | ## Installation
14 | Install ffmpeg and sox:
15 | * [ffmpeg](https://github.com/BtbN/FFmpeg-Builds/releases)    
16 | * [sox](https://sourceforge.net/projects/sox/files/latest/download)
17 | 
18 | Install the Teams Notetaker package with pip, directly from GitHub:  
19 | ```
20 | pip install git+https://github.com/kromme/Teams-Notetaker
21 | ```
22 | 
23 | ## Preparations
24 | 1. [Record the meeting](https://support.microsoft.com/en-us/office/record-a-meeting-in-teams-34dfbe7f-b07d-4a27-b4c6-de62f1348c24). Make sure you have consent from the others in the meeting.
25 | 2. Get the video: From January 11th, 2021, all Teams recordings will be stored in OneDrive directly, see [here](https://docs.microsoft.com/en-gb/MicrosoftTeams/tmr-meeting-recording-change). Until then, download it from [Stream](https://web.microsoftstream.com/) > My Content > video > Download video.
26 | 3. Set up a [Google Speech API](https://cloud.google.com/docs/authentication/getting-started), get the `key.json` file and save it in the working directory.
27 | 
28 | 
29 | ## Run
30 | ```
31 | from teams_notetaker.teams_notetaker import TeamsNotetaker
32 | tn = TeamsNotetaker(filename="Meeting.mp4")
33 | tn.run()
34 | ```
35 | 
36 | 
37 | ## Summarization
38 | There are two ways of summarizing texts with the help of AI: abstractive and extractive. Abstractive summarization rewrites the whole document: the algorithm interprets the article and then rewrites it in a smaller set of sentences. This is comparable to the summarizing we learned in high school and university. Extractive summarization estimates which sentences are the most important and keeps only those. Which technique is better depends on the task at hand. Abstractive summarization might be better when rewriting essays or creating an introduction for an article; extractive summarization might be better for highlighting the most important parts of an article.  
39 | 
40 | For this package I've chosen extractive summarization for two reasons:
41 | 1. It better fits the task at hand: my goal is to find the best sentences of a meeting and order them in a way that makes sense to the people in the meeting.  
42 | 2. It is computationally more efficient than abstractive summarization. For us humans, rewriting a document is more difficult than picking its most important sentences, and the same holds for algorithms.  
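
To make the idea of extractive summarization concrete, here is a minimal sketch. It is not the implementation used by this package; it assumes a simple word-frequency score, and the function name `extractive_summary` is hypothetical. The `ratio` argument plays the same role as the `ratio` passed to `summarize_transcription`: the fraction of sentences to keep.

```python
import re
from collections import Counter


def extractive_summary(text: str, ratio: float = 0.3) -> str:
    """Keep the highest-scoring sentences, in their original order."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""

    # Word frequencies over the whole text (lowercased).
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average frequency of the words in the sentence.
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Keep at least one sentence, otherwise roughly `ratio` of them.
    n_keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    # Restore document order so the summary still reads chronologically.
    keep = sorted(ranked[:n_keep])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    print(extractive_summary("Budget review. The budget is late. Lunch was nice."))
```

Because sentences are only selected and never rewritten, the cost is a couple of passes over the text, which illustrates the second reason above.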
43 | 
44 | 
45 | Have a look at how [this article about chatbot paradoxes](https://tailo.nl/chatbotparadox/) is summarized:
46 | > The promise of less customer contact for employees, or the handling of easier questions, make chatbots immensely popular. Chatbot sometimes provides more contact\n\nThe business case for a chatbot is often made to reduce unnecessary customer contact for employees. To answer the easier questions, the chatbot is given a prominent place on the website. As a result of which you, as a customer, are forwarded to an employee. However, as we just saw, the bot often does not yet recognize the intention or the question is asked in a way that the bot has not yet learned.
47 | 
48 | Or check out the summarization of the [plot of Orwell's book 1984](https://en.wikipedia.org/wiki/Nineteen_Eighty-Four):
49 | > In the year 1984, civilization has been damaged by war, civil conflict, and revolution. Those who fall out of favour with the Party become "unpersons", disappearing with all evidence of their existence destroyed. In London, Winston Smith is a member of the Outer Party, working at the Ministry of Truth, where he rewrites historical records to conform to the state's ever-changing version of history. Winston reflects that Syme will disappear as he is "too intelligent" and therefore dangerous to the Party. During his affair with Julia, Winston remembers the disappearance of his family during the civil war of the 1950s and his tense relationship with his wife Katharine, from whom he is separated (divorce is not permitted by the Party). O'Brien introduces himself as a member of the Brotherhood and sends Winston a copy of The Theory and Practice of Oligarchical Collectivism by Goldstein. Winston is recalled to the Ministry to help make the major necessary revisions of the records. Both reveal betraying the other and no longer possess feelings for one other.
50 | 


--------------------------------------------------------------------------------