├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── assets
│   ├── ffmperative.gif
│   └── mascot.png
├── docs
│   └── index.md
├── ffmperative
│   ├── __init__.py
│   ├── bin
│   │   └── .gitignore
│   ├── cli.py
│   ├── interpretor.py
│   ├── prompts.py
│   ├── tool_mapping.py
│   ├── tools.py
│   └── utils.py
├── requirements.txt
└── setup.py

/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # CONTRIBUTING
2 | 
3 | * Have a video processing workflow in mind? Want to contribute to our project? We'd love to hear from you! Raise an issue and we'll work together to design it.
4 | 
5 | ### Install From Source
6 | ```
7 | git clone https://github.com/remyxai/ffmperative.git
8 | cd ffmperative
9 | pip install .
10 | ```
11 | 
12 | ### Building a Docker Image & Running a Container
13 | Alternatively, clone this repo and build an image with the `Dockerfile`:
14 | ```bash
15 | git clone https://github.com/remyxai/FFMPerative.git
16 | cd FFMPerative/docker
17 | docker build -t ffmperative .
18 | ```
19 | 
20 | #### Run FFMPerative in a Container
21 | ```bash
22 | docker run -it -e HUGGINGFACE_TOKEN='YOUR_HF_TOKEN' -v /Videos:/Videos --entrypoint /bin/bash ffmperative:latest
23 | ```
24 | 
25 | ### Build a Debian Package from Source
26 | For Debian, build and install the package:
27 | ```bash
28 | dpkg-deb --build package_build/ ffmperative.deb
29 | sudo dpkg -i ffmperative.deb
30 | ```
31 | 
32 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 | 
3 | Copyright (c) 2023 Remyx AI
4 | 
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # FFMPerative - Chat to Compose Video
11 | FFMPerative is your copilot for video editing workflows. Powered by Large Language Models (LLMs) behind an intuitive chat interface, it lets you compose video edits in natural language to do things like:
12 | 
13 | * Change Speed, Resize, Crop, Flip, Reverse Video/GIF
14 | * Speech-to-Text Transcription and Closed-Captions
15 | 
16 | Just describe your changes like [these examples](https://remyxai.github.io/FFMPerative/).
17 | 
18 | ## Setup
19 | 
20 | ### Requirements
21 | * Python 3
22 | * [ffmpeg](https://ffmpeg.org)
23 | 
24 | PyPI:
25 | ```
26 | pip install ffmperative
27 | ```
28 | 
29 | Or install from source:
30 | ```
31 | git clone https://github.com/remyxai/FFMPerative.git
32 | cd FFMPerative && pip install .
33 | ```
34 | 
35 | ## Quickstart
36 | Add closed-captions with:
37 | 
38 | ```bash
39 | ffmperative do --prompt "merge subtitles 'captions.srt' with video 'video.mp4' calling it 'video_caps.mp4'"
40 | ```
41 | 
42 | ## Features
43 | 
44 | ### Python Usage
45 | Simply import the library and pass your command as a string to `ffmp`.
46 | 
47 | ```python
48 | from ffmperative import ffmp
49 | 
50 | ffmp("sample the 5th frame from '/path/to/video.mp4'")
51 | ```
52 | 
53 | ### Compose 🎞️
54 | Use the `compose` call to compose clips into an edited video. Use the optional `--prompt` flag to guide the composition by text prompt.
55 | ```bash
56 | ffmperative compose --clips /path/to/video/dir --output /path/to/my_video.mp4 --prompt "Edit the video for social media"
57 | ```
58 | 
59 | ### Resources
60 | * [ffmpeg-python](https://github.com/kkroening/ffmpeg-python/)
61 | * [Sample FFMPerative Dataset](https://huggingface.co/datasets/remyxai/ffmperative-sample)
62 | * [FFMPerative LLaMA2 checkpoint](https://huggingface.co/remyxai/ffmperative-7b)
63 | * [Automatically Edit Videos from Google Drive in Colab](https://colab.research.google.com/drive/149byzCNd17dAehVuWXkiFQ2mVe_icLCa?usp=sharing)
64 | 
65 | ### Community
66 | * [Join us on Discord](https://discord.com/invite/b2yGuCNpuC)
67 | 
--------------------------------------------------------------------------------
/assets/ffmperative.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/remyxai/FFMPerative/2a3ea5d4ddbefb0fdf2371f6607d60efc58740c8/assets/ffmperative.gif
--------------------------------------------------------------------------------
/assets/mascot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/remyxai/FFMPerative/2a3ea5d4ddbefb0fdf2371f6607d60efc58740c8/assets/mascot.png
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # FFMPerative Examples
2 | 
3 | ## Basic Usage
4 | Probe for metadata about your source media:
5 | ```
6 | ffmperative do --prompt "get info from 'hotel_lobby.mp4'"
7 | ```
8 | 
9 | Chunk your video into 3-second GOPs:
10 | ```
11 | ffmperative do --prompt "chunk 'video.mp4' into 3 second clips"
12 | ```
13 | 
14 | Pad portrait-mode videos with letterboxing:
15 | ```
16 | ffmperative do --prompt "apply letterboxing to 'video.mp4' call it 'video_letterbox.mp4'"
17 | ```
18 | 
19 | Vertically stack two videos:
20 | ```
21 | ffmperative do --prompt "vertically stack 'video1.mp4' and 'video2.mp4' calling it 'result.mp4'"
22 | ```
23 | 
24 | ## Advanced Usage
25 | 
26 | Apply VidStab to stabilize your video:
27 | ```
28 | ffmperative do --prompt "stabilize 'video.mp4'"
29 | ```
30 | 
31 | Apply a Ken Burns effect to zoom-pan an image into a video:
32 | ```
33 | ffmperative do --prompt "ken burns effect on 'image.png' call it 'image_kenburns.mp4'"
34 | ```
35 | 
36 | Perform speech-to-text transcription on your video:
37 | ```
38 | ffmperative do --prompt "get subtitles from 'video.mp4'"
39 | ```
40 | 
41 | Apply an image classifier to every 5th frame of your video:
42 | ```
43 | ffmperative do --prompt "classify 'video.mp4' using the model 'my_model/my_model.onnx'"
44 | ```
45 | 
46 | 
--------------------------------------------------------------------------------
/ffmperative/__init__.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import ast
4 | import shlex
5 | import requests
6 | import subprocess
7 | import pkg_resources
8 | from sys import argv
9 | 
10 | from . import tools as t
11 | from .prompts import MAIN_PROMPT
12 | from .utils import download_ffmp
13 | from .tool_mapping import generate_tools_mapping
14 | from .interpretor import evaluate, extract_function_calls
15 | 
16 | tools = generate_tools_mapping()
17 | 
18 | def run_local(prompt):
19 |     download_ffmp()
20 |     ffmp_path = pkg_resources.resource_filename('ffmperative', 'bin/ffmp')
21 |     safe_prompt = shlex.quote(prompt)
22 |     command = '{} -p {}'.format(ffmp_path, safe_prompt)  # shlex.quote already adds shell quoting
23 | 
24 |     try:
25 |         result = subprocess.run(command, capture_output=True, text=True, shell=True, check=True)
26 | 
27 |         output = result.stdout
28 |         return output
29 |     except subprocess.CalledProcessError as e:
30 |         print(f"Error occurred: {e}")
31 |         return None
32 | 
33 | def run_remote(prompt):
34 |     stop = ["Task:"]
35 |     complete_prompt = MAIN_PROMPT.replace("<>", prompt.replace("'", "\\'").replace('"', '\\"'))
36 |     headers = {"Authorization": f"Bearer {os.environ.get('HF_ACCESS_TOKEN', '')}"}
37 |     inputs = {
38 |         "inputs": complete_prompt,
39 |         "parameters": {"max_new_tokens": 192, "return_full_text": True, "stop": stop},
40 |     }
41 | 
42 |     response = requests.post("https://api-inference.huggingface.co/models/bigcode/starcoder", json=inputs, headers=headers)
43 |     if response.status_code == 429:
44 |         print("Getting rate-limited, waiting a tiny bit before trying again.")
45 |         time.sleep(1)
46 |         return run_remote(prompt)
47 |     elif response.status_code != 200:
48 |         raise ValueError(f"Error {response.status_code}: {response.json()}")
49 | 
50 |     result = response.json()[0]["generated_text"]
51 |     for stop_seq in stop:
52 |         if result.endswith(stop_seq):
53 |             res = result[: -len(stop_seq)]
54 |             answer = res.split("Answer:")[-1].strip()
55 |             return answer
56 |     return result
57 | 
58 | def ffmp(prompt, remote=False, tools=tools):
59 |     if remote:
60 |         parsed_output = run_remote(prompt)
61 |     else:
62 |         parsed_output = run_local(prompt)
63 |     if parsed_output:
64 |         try:
65 |             extracted_output = extract_function_calls(parsed_output, tools)
66 |             parsed_ast = ast.parse(extracted_output)
67 |             result = evaluate(parsed_ast, tools)
68 |             return result
69 |         except SyntaxError as e:
70 |             print(f"Syntax error in parsed output: {e}")
71 |     else:
72 |         return None
73 | 
--------------------------------------------------------------------------------
/ffmperative/bin/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 | 
--------------------------------------------------------------------------------
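For reference, a minimal sketch of the pipeline that `ffmp` wraps: generate code text with the model, filter it down to recognized tool calls, then execute them with the restricted interpreter. The model output string below is hypothetical, and actually running the calls requires ffmpeg plus the referenced files:

```python
from ffmperative.tool_mapping import generate_tools_mapping
from ffmperative.interpretor import extract_function_calls, evaluate

tools = generate_tools_mapping()

# Hypothetical text an LLM might return for a flip request.
llm_output = "I will use VideoFlipTool.\nVideoFlipTool('in.mp4', 'out_flipped.mp4', 'vertical')"

# Keep only calls to known tools, then run them with the sandboxed evaluator.
code = extract_function_calls(llm_output, tools)
result = evaluate(code, tools)  # returns 'out_flipped.mp4' on success
```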
/ffmperative/cli.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from . import ffmp
3 | from .utils import call_director, process_and_concatenate_clips
4 | from pprint import pprint
5 | 
6 | def main():
7 |     parser = argparse.ArgumentParser(description="FFMperative CLI tool")
8 | 
9 |     subparsers_action = parser.add_subparsers(dest="action", help="Top-level actions")
10 | 
11 |     # Parser for 'do' action
12 |     do_parser = subparsers_action.add_parser("do", help="Run task with ffmp Agent")
13 |     do_parser.add_argument("--prompt", required=True, help="Prompt to perform a task")
14 |     do_parser.add_argument("--remote", action='store_true', default=False, required=False, help="Run remotely")
15 | 
16 |     # Parser for 'compose' action
17 |     compose_parser = subparsers_action.add_parser("compose", help="Compose clips into a video")
18 |     compose_parser.add_argument("--clips", required=True, help="Path to clips directory")
19 |     compose_parser.add_argument("--prompt", required=False, default=None, help="Guide the composition by text prompt e.g. 'Edit the video for social media'")
20 |     compose_parser.add_argument("--output", required=False, default="composed_video.mp4", help="Filename for edited video. Default is 'composed_video.mp4'")
21 |     compose_parser.add_argument("--remote", action='store_true', default=False, required=False, help="Run remotely")
22 | 
23 |     args = parser.parse_args()
24 | 
25 |     if args.action == "do":
26 |         results = ffmp(args.prompt, args.remote)
27 |         pprint(results)
28 |     elif args.action == "compose":
29 |         compose_plans, join_plan = call_director(args.clips, args.prompt)
30 |         for plan in compose_plans:
31 |             try:
32 |                 ffmp(plan, args.remote)
33 |             except Exception as e:
34 |                 print(f"skipped plan ({e}):", plan)
35 |         results = process_and_concatenate_clips(join_plan, args.output)
36 |         pprint(results)
37 |     else:
38 |         print("Invalid action")
39 | 
40 | if __name__ == "__main__":
41 |     main()
42 | 
--------------------------------------------------------------------------------
/ffmperative/interpretor.py:
--------------------------------------------------------------------------------
1 | import re
2 | import ast
3 | import difflib
4 | from collections.abc import Mapping
5 | from typing import Any, Callable, Dict, List
6 | 
7 | 
8 | class InterpretorError(ValueError):
9 |     """
10 |     An error raised when the interpretor cannot evaluate a Python expression, due to syntax error or unsupported
11 |     operations.
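 
    For example, `evaluate_call` raises this error when the generated code calls a
    function that is not in the provided `tools` mapping.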
12 |     """
13 | 
14 |     pass
15 | 
16 | 
17 | def extract_function_calls(text, tools):
18 |     # Update the regex pattern to match tool names followed by an opening parenthesis
19 |     pattern = r'(' + '|'.join(re.escape(tool) for tool in tools) + r')\('
20 | 
21 |     # Find all occurrences of the tool function calls
22 |     matches = re.finditer(pattern, text)
23 | 
24 |     function_calls = []
25 |     for match in matches:
26 |         start = match.start()
27 |         open_paren = text.find('(', start)
28 |         if open_paren == -1:
29 |             continue
30 | 
31 |         # Count the parentheses to find the matching closing parenthesis
32 |         end_paren, nested_paren = open_paren, 0
33 |         for char in text[open_paren:]:
34 |             if char == '(':
35 |                 nested_paren += 1
36 |             elif char == ')':
37 |                 nested_paren -= 1
38 |                 if nested_paren == 0:
39 |                     break
40 |             end_paren += 1
41 | 
42 |         # Extract the function call
43 |         function_call = text[start:end_paren + 1]
44 |         function_calls.append(function_call)
45 | 
46 |     return "\n".join(function_calls)
47 | 
48 | 
49 | def evaluate(code: str, tools: Dict[str, Callable], state=None, chat_mode=False):
50 |     """
51 |     Evaluate a Python expression using the content of the variables stored in a state and only evaluating a given set
52 |     of functions.
53 | 
54 |     This function will recurse through the nodes of the tree provided.
55 | 
56 |     Args:
57 |         code (`str`):
58 |             The code to evaluate.
59 |         tools (`Dict[str, Callable]`):
60 |             The functions that may be called during the evaluation. Any call to another function will fail with an
61 |             `InterpretorError`.
62 |         state (`Dict[str, Any]`):
63 |             A dictionary mapping variable names to values. The `state` should contain the initial inputs but will be
64 |             updated by this function to contain all variables as they are evaluated.
65 |         chat_mode (`bool`, *optional*, defaults to `False`):
66 |             Whether or not the function is called from `Agent.chat`.
67 |     """
68 |     try:
69 |         expression = ast.parse(code)
70 |     except SyntaxError as e:
71 |         print("The code generated by the agent is not valid.\n", e)
72 |         return
73 |     if state is None:
74 |         state = {}
75 |     result = None
76 |     for idx, node in enumerate(expression.body):
77 |         try:
78 |             line_result = evaluate_ast(node, state, tools)
79 |         except InterpretorError as e:
80 |             msg = f"Evaluation of the code stopped at line {idx} before the end because of the following error"
81 |             if chat_mode:
82 |                 msg += (
83 |                     f". Copy paste the following error message and send it back to the agent:\nI get an error: '{e}'"
84 |                 )
85 |             else:
86 |                 msg += f":\n{e}"
87 |             print(msg)
88 |             break
89 |         if line_result is not None:
90 |             result = line_result
91 | 
92 |     return result
93 | 
94 | 
95 | def evaluate_ast(expression: ast.AST, state: Dict[str, Any], tools: Dict[str, Callable]):
96 |     """
97 |     Evaluate an abstract syntax tree using the content of the variables stored in a state and only evaluating a given
98 |     set of functions.
99 | 
100 |     This function will recurse through the nodes of the tree provided.
101 | 
102 |     Args:
103 |         expression (`ast.AST`):
104 |             The code to evaluate, as an abstract syntax tree.
105 |         state (`Dict[str, Any]`):
106 |             A dictionary mapping variable names to values. The `state` is updated if need be when the evaluation
107 |             encounters assignments.
108 |         tools (`Dict[str, Callable]`):
109 |             The functions that may be called during the evaluation. Any call to another function will fail with an
110 |             `InterpretorError`.
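 
    Example (illustrative): `evaluate_ast(ast.parse("x = 1").body[0], state, tools)`
    stores 1 in `state["x"]` and returns 1.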
111 |     """
112 |     if isinstance(expression, ast.Assign):
113 |         # Assignment -> we evaluate the assignment, which should update the state
114 |         # We return the variable assigned as it may be used to determine the final result.
115 |         return evaluate_assign(expression, state, tools)
116 |     elif isinstance(expression, ast.Call):
117 |         # Function call -> we return the value of the function call
118 |         return evaluate_call(expression, state, tools)
119 |     elif isinstance(expression, ast.Constant):
120 |         # Constant -> just return the value
121 |         return expression.value
122 |     elif isinstance(expression, ast.Dict):
123 |         # Dict -> evaluate all keys and values
124 |         keys = [evaluate_ast(k, state, tools) for k in expression.keys]
125 |         values = [evaluate_ast(v, state, tools) for v in expression.values]
126 |         return dict(zip(keys, values))
127 |     elif isinstance(expression, ast.Expr):
128 |         # Expression -> evaluate the content
129 |         return evaluate_ast(expression.value, state, tools)
130 |     elif isinstance(expression, ast.For):
131 |         # For loop -> execute the loop
132 |         return evaluate_for(expression, state, tools)
133 |     elif isinstance(expression, ast.FormattedValue):
134 |         # Formatted value (part of f-string) -> evaluate the content and return
135 |         return evaluate_ast(expression.value, state, tools)
136 |     elif isinstance(expression, ast.If):
137 |         # If -> execute the right branch
138 |         return evaluate_if(expression, state, tools)
139 |     elif hasattr(ast, "Index") and isinstance(expression, ast.Index):
140 |         return evaluate_ast(expression.value, state, tools)
141 |     elif isinstance(expression, ast.JoinedStr):
142 |         return "".join([str(evaluate_ast(v, state, tools)) for v in expression.values])
143 |     elif isinstance(expression, ast.List):
144 |         # List -> evaluate all elements
145 |         return [evaluate_ast(elt, state, tools) for elt in expression.elts]
146 |     elif isinstance(expression, ast.Name):
147 |         # Name -> pick up the value in the state
148 |         return evaluate_name(expression, state, tools)
149 |     elif isinstance(expression, ast.Subscript):
150 |         # Subscript -> return the value of the indexing
151 |         return evaluate_subscript(expression, state, tools)
152 |     else:
153 |         # For now we refuse anything else. Let's add things as we need them.
154 |         raise InterpretorError(f"{expression.__class__.__name__} is not supported.")
155 | 
156 | 
157 | def evaluate_assign(assign, state, tools):
158 |     var_names = assign.targets
159 |     result = evaluate_ast(assign.value, state, tools)
160 | 
161 |     if len(var_names) == 1:
162 |         state[var_names[0].id] = result
163 |     else:
164 |         if len(result) != len(var_names):
165 |             raise InterpretorError(f"Expected {len(var_names)} values but got {len(result)}.")
166 |         for var_name, r in zip(var_names, result):
167 |             state[var_name.id] = r
168 |     return result
169 | 
170 | 
171 | def evaluate_call(call, state, tools):
172 |     if not isinstance(call.func, ast.Name):
173 |         raise InterpretorError(
174 |             f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func} of "
175 |             f"type {type(call.func)})."
176 |         )
177 |     func_name = call.func.id
178 |     if func_name not in tools:
179 |         raise InterpretorError(
180 |             f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func.id})."
181 | ) 182 | 183 | func = tools[func_name] 184 | # Todo deal with args 185 | args = [evaluate_ast(arg, state, tools) for arg in call.args] 186 | kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords} 187 | return func(*args, **kwargs) 188 | 189 | 190 | def evaluate_subscript(subscript, state, tools): 191 | index = evaluate_ast(subscript.slice, state, tools) 192 | value = evaluate_ast(subscript.value, state, tools) 193 | if isinstance(value, (list, tuple)): 194 | return value[int(index)] 195 | if index in value: 196 | return value[index] 197 | if isinstance(index, str) and isinstance(value, Mapping): 198 | close_matches = difflib.get_close_matches(index, list(value.keys())) 199 | if len(close_matches) > 0: 200 | return value[close_matches[0]] 201 | 202 | raise InterpretorError(f"Could not index {value} with '{index}'.") 203 | 204 | 205 | def evaluate_name(name, state, tools): 206 | if name.id in state: 207 | return state[name.id] 208 | close_matches = difflib.get_close_matches(name.id, list(state.keys())) 209 | if len(close_matches) > 0: 210 | return state[close_matches[0]] 211 | raise InterpretorError(f"The variable `{name.id}` is not defined.") 212 | 213 | 214 | def evaluate_condition(condition, state, tools): 215 | if len(condition.ops) > 1: 216 | raise InterpretorError("Cannot evaluate conditions with multiple operators") 217 | 218 | left = evaluate_ast(condition.left, state, tools) 219 | comparator = condition.ops[0] 220 | right = evaluate_ast(condition.comparators[0], state, tools) 221 | 222 | if isinstance(comparator, ast.Eq): 223 | return left == right 224 | elif isinstance(comparator, ast.NotEq): 225 | return left != right 226 | elif isinstance(comparator, ast.Lt): 227 | return left < right 228 | elif isinstance(comparator, ast.LtE): 229 | return left <= right 230 | elif isinstance(comparator, ast.Gt): 231 | return left > right 232 | elif isinstance(comparator, ast.GtE): 233 | return left >= right 234 | elif isinstance(comparator, ast.Is): 235 | return left is right 236 | elif isinstance(comparator, ast.IsNot): 237 | return left is not right 238 | elif isinstance(comparator, ast.In): 239 | return left in right 240 | elif isinstance(comparator, ast.NotIn): 241 | return left not in right 242 | else: 243 | raise InterpretorError(f"Operator not supported: {comparator}") 244 | 245 | 246 | def evaluate_if(if_statement, state, tools): 247 | result = None 248 | if evaluate_condition(if_statement.test, state, tools): 249 | for line in if_statement.body: 250 | line_result = evaluate_ast(line, state, tools) 251 | if line_result is not None: 252 | result = line_result 253 | else: 254 | for line in if_statement.orelse: 255 | line_result = evaluate_ast(line, state, tools) 256 | if line_result is not None: 257 | result = line_result 258 | return result 259 | 260 | 261 | def evaluate_for(for_loop, state, tools): 262 | result = None 263 | iterator = evaluate_ast(for_loop.iter, state, tools) 264 | for counter in iterator: 265 | state[for_loop.target.id] = counter 266 | for expression in for_loop.body: 267 | line_result = evaluate_ast(expression, state, tools) 268 | if line_result is not None: 269 | result = line_result 270 | return result 271 | -------------------------------------------------------------------------------- /ffmperative/prompts.py: -------------------------------------------------------------------------------- 1 | MAIN_PROMPT = 'I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will 
perform the task.\nTo help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns.\nYou should first explain which tool you will use to perform the task and for what reason, then write the code in Python.\nEach instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so.\n\nTools:\n- DocumentQa: This is a tool that answers a question about an document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.\n- ImageCaptioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.\n- ImageQa: This is a tool that answers a question about an image. It takes an input named `image` which should be the image containing the information, as well as a `question` which should be the question in English. It returns a text that is the answer to the question.\n- ImageSegmenter: This is a tool that creates a segmentation mask of an image according to a label. It cannot create an image. It takes two arguments named `image` which should be the original image, and `label` which should be a text describing the elements what should be identified in the segmentation mask. The tool returns the mask.\n- Transcriber: This is a tool that transcribes an audio into text. It takes an input named `audio` and returns the transcribed text.\n- Summarizer: This is a tool that summarizes an English text. It takes an input `text` containing the text to summarize, and returns a summary of the text.\n- TextClassifier: This is a tool that classifies an English text using provided labels. It takes two inputs: `text`, which should be the text to classify, and `labels`, which should be the list of labels to use for classification. It returns the most likely label in the list of provided `labels` for the input text.\n- TextQa: This is a tool that answers questions related to a text. It takes two arguments named `text`, which is the text where to find the answer, and `question`, which is the question, and returns the answer to the question.\n- TextReader: This is a tool that reads an English text out loud. It takes an input named `text` which should contain the text to read (in English) and returns a waveform object containing the sound.\n- Translator: This is a tool that translates text from a language to another. It takes three inputs: `text`, which should be the text to translate, `src_lang`, which should be the language of the text to translate and `tgt_lang`, which should be the language for the desired ouput language. Both `src_lang` and `tgt_lang` are written in plain English, such as \'Romanian\', or \'Albanian\'. It returns the text translated in `tgt_lang`.\n- ImageTransformer: This is a tool that transforms an image according to a prompt. It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. The prompt should only contain descriptive adjectives, as if completing the prompt of the original image. It returns the modified image.\n- TextDownloader: This is a tool that downloads a file from a `url`. 
It takes the `url` as input, and returns the text contained in the file.\n- ImageGenerator: This is a tool that creates an image according to a prompt, which is a text description. It takes an input named `prompt` which contains the image description and outputs an image.\n- VideoGenerator: This is a tool that creates a video according to a text description. It takes an input named `prompt` which contains the image description, as well as an optional input `seconds` which will be the duration of the video. The default is of two seconds. The tool outputs a video object.\n- n\nTools:\n- AudioAdjustmentTool: \n This tool modifies audio levels for an input video.\n Inputs are input_path, output_path, level (e.g. 0.5 or -13dB).\n Output is the output_path.\n \n- AudioVideoMuxTool: \n This tool muxes (combines) a video and an audio file.\n Inputs are input_path as a string, audio_path as a string, and output_path as a string.\n Output is the output_path.\n \n- FFProbeTool: \n This tool extracts metadata from input video using ffmpeg/ffprobe\n Input is input_path and output is video metadata as JSON.\n \n- ImageDirectoryToVideoTool: \n This tool creates video\n from a directory of images. Inputs\n are input_path and output_path. \n Output is the output_path.\n \n- ImageToVideoTool: \n This tool generates an N-second video clip from an image.\n Inputs are image_path, duration, output_path.\n \n- VideoCropTool: \n This tool crops a video with inputs: \n input_path, output_path, \n top_x, top_y, \n bottom_x, bottom_y.\n Output is the output_path.\n \n- VideoFlipTool: \n This tool flips video along the horizontal \n or vertical axis. Inputs are input_path, \n output_path and orientation. Output is output_path.\n \n- VideoFrameSampleTool: \n This tool samples an image frame from an input video. \n Inputs are input_path, output_path, and frame_number.\n Output is the output_path.\n \n- VideoGopChunkerTool: \n This tool segments video input into GOPs (Group of Pictures) chunks of \n segment_length (in seconds). Inputs are input_path and segment_length.\n \n- VideoHTTPServerTool: \n This tool streams a source video to an HTTP server. \n Inputs are input_path and server_url.\n \n- VideoLetterBoxingTool: \n This tool adds letterboxing to a video.\n Inputs are input_path, output_path, width, height, bg_color.\n \n- VideoOverlayTool: \n This tool overlays one video on top of another.\n Inputs are main_video_path, overlay_video_path, output_path, x_position, y_position.\n \n- VideoResizeTool: \n This tool resizes the video to the specified dimensions.\n Inputs are input_path, width, height, output_path.\n \n- VideoReverseTool: \n This tool reverses a video. \n Inputs are input_path and output_path.\n \n- VideoRotateTool: \n This tool rotates a video by a specified angle. \n Inputs are input_path, output_path and rotation_angle in degrees.\n \n- VideoSegmentDeleteTool: \n This tool deletes a interval of video by timestamp.\n Inputs are input_path, output_path, start, end.\n Format start/end as float.\n \n- VideoSpeedTool: \n This tool speeds up a video. 
\n Inputs are input_path as a string, output_path as a string, speed_factor (float) as a string.\n Output is the output_path.\n \n- VideoStackTool: \n This tool stacks two videos either vertically or horizontally based on the orientation parameter.\n Inputs are input_path, second_input, output_path, and orientation as strings.\n Output is the output_path.\n vertical orientation -> vstack, horizontal orientation -> hstack\n \n- VideoTrimTool: \n This tool trims a video. Inputs are input_path, output_path, \n start_time, and end_time. Format start(end)_time: HH:MM:SS\n \n- VideoWatermarkTool: \n This tool adds logo image as watermark to a video. \n Inputs are input_path, output_path, watermark_path.\n \n\n\nTask: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."\n\nI will use the following tools: `Translator` to translate the question into English and then `ImageQa` to answer the question on the input image.\n\nAnswer:\n```py\ntranslated_question = translator(question=question, src_lang="French", tgt_lang="English")\nprint(f"The translated question is {translated_question}.")\nanswer = ImageQa(image=image, question=translated_question)\nprint(f"The answer is {answer}")\n```\n\nTask: "Identify the oldest person in the `document` and create an image showcasing the result."\n\nI will use the following tools: `DocumentQa` to find the oldest person in the document, then `ImageGenerator` to generate an image according to the answer.\n\nAnswer:\n```py\nanswer = DocumentQa(document, question="What is the oldest person?")\nprint(f"The answer is {answer}.")\nimage = ImageGenerator(answer)\n```\n\nTask: "Generate an image using the text given in the variable `caption`."\n\nI will use the following tool: `ImageGenerator` to generate an image.\n\nAnswer:\n```py\nimage = ImageGenerator(prompt=caption)\n```\n\nTask: "Summarize the text given in the variable `text` and read it out loud."\n\nI will use the following tools: `Summarizer` to create a summary of the input text, then `TextReader` to read it out loud.\n\nAnswer:\n```py\nsummarized_text = Summarizer(text)\nprint(f"Summary: {summarized_text}")\naudio_summary = TextReader(summarized_text)\n```\n\nTask: "Answer the question in the variable `question` about the text in the variable `text`. Use the answer to generate an image."\n\nI will use the following tools: `TextQa` to create the answer, then `ImageGenerator` to generate an image according to the answer.\n\nAnswer:\n```py\nanswer = TextQa(text=text, question=question)\nprint(f"The answer is {answer}.")\nimage = ImageGenerator(answer)\n```\n\nTask: "Caption the following `image`."\n\nI will use the following tool: `ImageCaptioner` to generate a caption for the image.\n\nAnswer:\n```py\ncaption = ImageCaptioner(image)\n```\n\nTask: "<>"\n\nI will use the following' 2 | -------------------------------------------------------------------------------- /ffmperative/tool_mapping.py: -------------------------------------------------------------------------------- 1 | import inspect 2 | from . 
import tools as t
3 | 
4 | def generate_tools_mapping():
5 |     tools_mapping = {}
6 |     for name, obj in inspect.getmembers(t):
7 |         if inspect.isclass(obj) and issubclass(obj, t.Tool) and obj is not t.Tool:
8 |             tools_mapping[name] = obj()  # Instantiate the class
9 |     return tools_mapping
10 | 
--------------------------------------------------------------------------------
/ffmperative/tools.py:
--------------------------------------------------------------------------------
1 | import math
2 | import json
3 | import ffmpeg
4 | from PIL import Image
5 | from io import BytesIO
6 | from pathlib import Path
7 | 
8 | from typing import List
9 | 
10 | from .utils import get_video_info, has_audio
11 | 
12 | class Tool:
13 |     """
14 |     A base class for the functions used by the agent. Subclass this and implement the `__call__` method as well as the
15 |     following class attributes:
16 | 
17 |     - **description** (`str`) -- A short description of what your tool does, the inputs it expects and the output(s) it
18 |       will return. For instance 'This is a tool that downloads a file from a `url`. It takes the `url` as input, and
19 |       returns the text contained in the file'.
20 |     - **name** (`str`) -- A performative name that will be used for your tool in the prompt to the agent. For instance
21 |       `"text-classifier"` or `"image_generator"`.
22 |     - **inputs** (`List[str]`) -- The list of modalities expected for the inputs (in the same order as in the call).
23 |       Modalities should be `"text"`, `"image"` or `"audio"`. This is only used by `launch_gradio_demo` or to make a
24 |       nice space from your tool.
25 |     - **outputs** (`List[str]`) -- The list of modalities returned by the tool (in the same order as the return of the
26 |       call method). Modalities should be `"text"`, `"image"` or `"audio"`. This is only used by `launch_gradio_demo`
27 |       or to make a nice space from your tool.
28 | 
29 |     You can also override the method [`~Tool.setup`] if your tool has an expensive operation to perform before being
30 |     usable (such as loading a model). [`~Tool.setup`] will be called the first time you use your tool, but not at
31 |     instantiation.
32 |     """
33 | 
34 |     description: str = "This is a tool that ..."
35 |     name: str = ""
36 | 
37 |     inputs: List[str]
38 |     outputs: List[str]
39 | 
40 |     def __init__(self, *args, **kwargs):
41 |         self.is_initialized = False
42 | 
43 |     def __call__(self, *args, **kwargs):
44 |         raise NotImplementedError("Write this method in your subclass of `Tool`.")
45 | 
46 |     def setup(self):
47 |         """
48 |         Overwrite this method here for any operation that is expensive and needs to be executed before you start using
49 |         your tool. Such as loading a big model.
50 |         """
51 |         self.is_initialized = True
52 | 
53 | class AudioAdjustmentTool(Tool):
54 |     description = """
55 |     This tool modifies audio levels for an input video.
56 |     Inputs are input_path, output_path, level (e.g. 0.5 or -13dB).
57 |     Output is the output_path.
58 |     """
59 |     inputs = ["text", "text", "text"]
60 |     outputs = ["text"]
61 | 
62 |     def __call__(self, input_path: str, output_path: str, level: str):
63 |         (
64 |             ffmpeg.input(input_path)
65 |             .output(output_path, af="volume={}".format(level))
66 |             .overwrite_output()
67 |             .run()
68 |         )
69 |         return output_path
70 | 
71 | 
72 | class AudioVideoMuxTool(Tool):
73 |     description = """
74 |     This tool muxes (combines) a video and an audio file.
75 |     Inputs are input_path as a string, audio_path as a string, and output_path as a string.
76 |     Output is the output_path.
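 
    Example (illustrative; paths are placeholders):
        AudioVideoMuxTool()("clip.mp4", "voiceover.mp3", "clip_with_audio.mp4")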
77 | """ 78 | inputs = ["text", "text", "text"] 79 | outputs = ["None"] 80 | 81 | def __call__(self, input_path: str, audio_path: str, output_path: str): 82 | input_video = ffmpeg.input(input_path) 83 | added_audio = ffmpeg.input(audio_path) 84 | 85 | if has_audio(input_path): 86 | merged_audio = ffmpeg.filter([input_video.audio, added_audio], "amix") 87 | output_video = ffmpeg.concat(input_video, merged_audio, v=1, a=1) 88 | else: 89 | output_video = ffmpeg.concat(input_video, added_audio, v=1, a=1) 90 | output_video.output(output_path).run(overwrite_output=True) 91 | 92 | 93 | class FFProbeTool(Tool): 94 | description = """ 95 | This tool extracts metadata from input video using ffmpeg/ffprobe 96 | Input is input_path and output is video metadata as JSON. 97 | """ 98 | inputs = ["text"] 99 | outputs = ["None"] 100 | 101 | def __call__(self, input_path: str): 102 | video_info = get_video_info(input_path) 103 | return json.dumps(video_info, indent=2) if video_info else None 104 | 105 | 106 | class ImageToVideoTool(Tool): 107 | description = """ 108 | This tool generates an N-second video clip from an image. 109 | Inputs are image_path, duration, output_path. 110 | """ 111 | inputs = ["text", "text", "integer", "integer"] 112 | outputs = ["None"] 113 | 114 | def __call__( 115 | self, image_path: str, output_path: str, duration: int, framerate: int = 24 116 | ): 117 | ( 118 | ffmpeg.input(image_path, loop=1, t=duration, framerate=framerate) 119 | .output(output_path, vcodec="libx264") 120 | .overwrite_output() 121 | .run() 122 | ) 123 | 124 | 125 | class ImageDirectoryToVideoTool(Tool): 126 | description = """ 127 | This tool creates video 128 | from a directory of images. Inputs 129 | are input_path and output_path. 130 | Output is the output_path. 131 | """ 132 | inputs = ["text", "text"] 133 | outputs = ["None"] 134 | 135 | def __call__( 136 | self, 137 | input_path: str, 138 | output_path: str, 139 | framerate: int = 24, 140 | extension: str = "jpg", 141 | ): 142 | # Check for valid extension 143 | valid_extensions = ["jpg", "png", "jpeg"] 144 | if extension not in valid_extensions: 145 | raise ValueError( 146 | f"Invalid extension {extension}. Must be one of {valid_extensions}" 147 | ) 148 | 149 | ( 150 | ffmpeg.input( 151 | input_path.rstrip("/") + "/*." + extension.lstrip("."), 152 | pattern_type="glob", 153 | framerate=framerate, 154 | ) 155 | .output(output_path) 156 | .overwrite_output() 157 | .run() 158 | ) 159 | 160 | 161 | class VideoCropTool(Tool): 162 | description = """ 163 | This tool crops a video with inputs: 164 | input_path, output_path, 165 | top_x, top_y, 166 | bottom_x, bottom_y. 167 | Output is the output_path. 168 | """ 169 | inputs = ["text", "text", "text", "text", "text", "text"] 170 | outputs = ["text"] 171 | 172 | def __call__( 173 | self, 174 | input_path: str, 175 | output_path: str, 176 | top_x: str, 177 | top_y: str, 178 | bottom_x: str, 179 | bottom_y: str, 180 | ): 181 | stream = ffmpeg.input(input_path) 182 | stream = ffmpeg.crop( 183 | stream, 184 | int(top_y), 185 | int(top_x), 186 | int(bottom_y) - int(top_y), 187 | int(bottom_x) - int(top_x), 188 | ) 189 | stream = ffmpeg.output(stream, output_path) 190 | ffmpeg.run(stream) 191 | return output_path 192 | 193 | 194 | class VideoFlipTool(Tool): 195 | description = """ 196 | This tool flips video along the horizontal 197 | or vertical axis. Inputs are input_path, 198 | output_path and orientation. Output is output_path. 
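 
    Example (illustrative; paths are placeholders):
        VideoFlipTool()("video.mp4", "video_flipped.mp4", "vertical")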
199 | """ 200 | inputs = ["text", "text", "text"] 201 | outputs = ["text"] 202 | 203 | def __call__( 204 | self, input_path: str, output_path: str, orientation: str = "horizontal" 205 | ): 206 | # Check for valid orientation 207 | valid_orientations = ["horizontal", "vertical"] 208 | if orientation not in valid_orientations: 209 | raise ValueError( 210 | f"Invalid orientation {orientation}. Must be one of {valid_orientations}" 211 | ) 212 | 213 | flip = ffmpeg.vflip if orientation == "vertical" else ffmpeg.hflip 214 | stream = ffmpeg.input(input_path) 215 | stream = flip(stream) 216 | stream = ffmpeg.output(stream, output_path) 217 | ffmpeg.run(stream) 218 | return output_path 219 | 220 | 221 | class VideoFrameSampleTool(Tool): 222 | description = """ 223 | This tool samples an image frame from an input video. 224 | Inputs are input_path, output_path, and frame_number. 225 | Output is the output_path. 226 | """ 227 | inputs = ["text", "text", "text"] 228 | outputs = ["text"] 229 | 230 | def __call__(self, input_path: str, output_path: str, frame_number: int): 231 | out, _ = ( 232 | ffmpeg.input(input_path) 233 | .filter("select", "gte(n,{})".format(str(frame_number))) 234 | .output("pipe:", vframes=1, format="image2", vcodec="mjpeg") 235 | .run(capture_stdout=True) 236 | ) 237 | img = Image.open(BytesIO(out)) 238 | img.save(output_path) 239 | return output_path 240 | 241 | 242 | class VideoGopChunkerTool(Tool): 243 | description = """ 244 | This tool segments video input into GOPs (Group of Pictures) chunks of 245 | segment_length (in seconds). Inputs are input_path and segment_length. 246 | """ 247 | inputs = ["text", "integer"] 248 | outputs = ["None"] 249 | 250 | def __init__(self): 251 | super().__init__() 252 | 253 | def __call__(self, input_path, segment_length): 254 | basename = Path(input_path).stem 255 | output_dir = Path(input_path).parent 256 | video_info = get_video_info(input_path) 257 | num_segments = math.ceil(float(video_info["duration"]) / segment_length) 258 | num_digits = len(str(num_segments)) 259 | filename_pattern = f"{output_dir}/{basename}_%0{num_digits}d.mp4" 260 | 261 | ffmpeg.input(input_path).output( 262 | filename_pattern, 263 | c="copy", 264 | map="0", 265 | f="segment", 266 | segment_time=segment_length, 267 | ).run() 268 | 269 | 270 | class VideoHTTPServerTool(Tool): 271 | description = """ 272 | This tool streams a source video to an HTTP server. 273 | Inputs are input_path and server_url. 274 | """ 275 | inputs = ["text", "text"] 276 | outputs = ["None"] 277 | 278 | def __call__(self, input_path: str, server_url: str = "http://localhost:8080"): 279 | process = ( 280 | ffmpeg.input(input_path) 281 | .output( 282 | server_url, 283 | codec="copy", # use same codecs of the original video 284 | listen=1, # enables HTTP server 285 | f="flv", 286 | ) # ffplay -f flv http://localhost:8080 287 | .global_args("-re") # argument to act as a live stream 288 | .overwrite_output() 289 | .run() 290 | ) 291 | 292 | 293 | class VideoLetterBoxingTool(Tool): 294 | description = """ 295 | This tool adds letterboxing to a video. 296 | Inputs are input_path, output_path, width, height, bg_color. 
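 
    Example (illustrative; paths are placeholders, sizes are the defaults):
        VideoLetterBoxingTool()("video.mp4", "video_letterbox.mp4", 1920, 1080, "black")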
297 |     """
298 |     inputs = ["text", "text", "integer", "integer", "text"]
299 |     outputs = ["None"]
300 | 
301 |     def __call__(
302 |         self,
303 |         input_path: str,
304 |         output_path: str,
305 |         width: int = 1920,
306 |         height: int = 1080,
307 |         bg_color: str = "black",
308 |     ):
309 |         video_info = get_video_info(input_path)
310 |         old_width = int(video_info["width"])
311 |         old_height = int(video_info["height"])
312 | 
313 |         # Check if the video is in portrait mode
314 |         if old_height >= old_width:
315 |             vf_option = "scale={}:{}:force_original_aspect_ratio=decrease,pad={}:{}:-1:-1:color={}".format(
316 |                 width, height, width, height, bg_color
317 |             )
318 |         else:
319 |             vf_option = "scale={}:-1".format(width)
320 |         (ffmpeg.input(input_path).output(output_path, vf=vf_option).run())
321 | 
322 | 
323 | class VideoOverlayTool(Tool):
324 |     description = """
325 |     This tool overlays one video on top of another.
326 |     Inputs are main_video_path, overlay_video_path, output_path, x_position, y_position.
327 |     """
328 |     inputs = ["text", "text", "text", "integer", "integer"]
329 |     outputs = ["None"]
330 | 
331 |     def __call__(
332 |         self,
333 |         main_video_path: str,
334 |         overlay_video_path: str,
335 |         output_path: str,
336 |         x_position: int,
337 |         y_position: int,
338 |     ):
339 |         main = ffmpeg.input(main_video_path)
340 |         overlay = ffmpeg.input(overlay_video_path)
341 | 
342 |         (
343 |             ffmpeg.output(
344 |                 ffmpeg.overlay(main, overlay, x=x_position, y=y_position), output_path
345 |             )
346 |             .overwrite_output()
347 |             .run()
348 |         )
349 | 
350 | 
351 | class VideoReverseTool(Tool):
352 |     description = """
353 |     This tool reverses a video.
354 |     Inputs are input_path and output_path.
355 |     """
356 |     inputs = ["text", "text"]
357 |     outputs = ["None"]
358 | 
359 |     def __call__(self, input_path: str, output_path: str):
360 |         (
361 |             ffmpeg.input(input_path)
362 |             .filter_("reverse")
363 |             .output(output_path)
364 |             .overwrite_output()
365 |             .run()
366 |         )
367 | 
368 | 
369 | class VideoResizeTool(Tool):
370 |     description = """
371 |     This tool resizes the video to the specified dimensions.
372 |     Inputs are input_path, output_path, width, height.
373 |     """
374 |     inputs = ["text", "text", "integer", "integer"]
375 |     outputs = ["None"]
376 | 
377 |     def __call__(self, input_path: str, output_path: str, width: int, height: int):
378 |         (
379 |             ffmpeg.input(input_path)
380 |             .output(output_path, vf="scale={}:{}".format(width, height))
381 |             .overwrite_output()
382 |             .run()
383 |         )
384 | 
385 | 
386 | class VideoRotateTool(Tool):
387 |     description = """
388 |     This tool rotates a video by a specified angle.
389 |     Inputs are input_path, output_path and rotation_angle in degrees.
390 |     """
391 |     inputs = ["text", "text", "integer"]
392 |     outputs = ["None"]
393 | 
394 |     def __call__(self, input_path: str, output_path: str, rotation_angle: int):
395 |         (
396 |             ffmpeg.input(input_path)
397 |             .filter_("rotate", rotation_angle)
398 |             .output(output_path)
399 |             .overwrite_output()
400 |             .run()
401 |         )
402 | 
403 | 
404 | class VideoSegmentDeleteTool(Tool):
405 |     description = """
406 |     This tool deletes an interval of video by timestamp.
407 |     Inputs are input_path, output_path, start, end.
408 |     Format start/end as float.
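 
    Example (illustrative; removes seconds 5.0-10.0):
        VideoSegmentDeleteTool()("video.mp4", "video_cut.mp4", 5.0, 10.0)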
409 | """ 410 | inputs = ["text", "text", "float", "float"] 411 | outputs = ["None"] 412 | 413 | def __call__(self, input_path: str, output_path: str, start: float, end: float): 414 | ( 415 | ffmpeg.input(input_path) 416 | .output( 417 | output_path, 418 | vf="select='not(between(t,{},{}))',setpts=N/FRAME_RATE/TB".format( 419 | start, end 420 | ), 421 | af="aselect='not(between(t,{},{}))',asetpts=N/SR/TB".format(start, end), 422 | ) 423 | .run() 424 | ) 425 | 426 | 427 | class VideoSpeedTool(Tool): 428 | description = """ 429 | This tool speeds up a video. 430 | Inputs are input_path as a string, output_path as a string, speed_factor (float) as a string. 431 | Output is the output_path. 432 | """ 433 | inputs = ["text", "text", "text"] 434 | outputs = ["text"] 435 | 436 | def __call__(self, input_path: str, output_path: str, speed_factor: float): 437 | stream = ffmpeg.input(input_path) 438 | stream = ffmpeg.setpts(stream, "1/{}*PTS".format(float(speed_factor))) 439 | stream = ffmpeg.output(stream, output_path) 440 | ffmpeg.run(stream) 441 | return output_path 442 | 443 | 444 | class VideoStackTool(Tool): 445 | description = """ 446 | This tool stacks two videos either vertically or horizontally based on the orientation parameter. 447 | Inputs are input_path, second_input, output_path, and orientation as strings. 448 | Output is the output_path. 449 | vertical orientation -> vstack, horizontal orientation -> hstack 450 | """ 451 | inputs = ["text", "text", "text", "text"] 452 | outputs = ["None"] 453 | 454 | def __call__( 455 | self, input_path: str, second_input: str, output_path: str, orientation: str 456 | ): 457 | video1 = ffmpeg.input(input_path) 458 | video2 = ffmpeg.input(second_input) 459 | 460 | if orientation.lower() not in ["vstack", "hstack"]: 461 | raise ValueError("Orientation must be either 'vstack' or 'hstack'.") 462 | 463 | stacked = ffmpeg.filter((video1, video2), orientation) 464 | out = ffmpeg.output(stacked, output_path) 465 | out.run(overwrite_output=True) 466 | 467 | 468 | class VideoTrimTool(Tool): 469 | name = "VideoTrimTool" 470 | description = """ 471 | This tool trims a video. Inputs are input_path, output_path, 472 | start_time, and end_time. Format start(end)_time: HH:MM:SS 473 | """ 474 | inputs = ["text", "text", "text", "text"] 475 | outputs = ["None"] 476 | 477 | def __call__( 478 | self, input_path: str, output_path: str, start_time: str, end_time: str 479 | ): 480 | start_time=start_time.replace("-","") 481 | end_time=end_time.replace("-", "") 482 | stream = ffmpeg.input(input_path) 483 | v = stream.trim(start=start_time, end=end_time).setpts("PTS-STARTPTS") 484 | if has_audio(input_path): 485 | a = stream.filter_("atrim", start=start_time, end=end_time).filter_( 486 | "asetpts", "PTS-STARTPTS" 487 | ) 488 | joined = ffmpeg.concat(v, a, v=1, a=1).node 489 | out = ffmpeg.output(joined[0], joined[1], output_path) 490 | else: 491 | out = ffmpeg.output(v, output_path) 492 | out.run() 493 | 494 | 495 | class VideoWatermarkTool(Tool): 496 | description = """ 497 | This tool adds logo image as watermark to a video. 498 | Inputs are input_path, output_path, watermark_path. 
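 
    Example (illustrative; overlays logo.png at the default x=10, y=10):
        VideoWatermarkTool()("video.mp4", "video_marked.mp4", "logo.png")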
499 |     """
500 |     inputs = ["text", "text", "text", "integer", "integer"]
501 |     outputs = ["None"]
502 | 
503 |     def __call__(
504 |         self,
505 |         input_path: str,
506 |         output_path: str,
507 |         watermark_path: str,
508 |         x: int = 10,
509 |         y: int = 10,
510 |     ):
511 |         main = ffmpeg.input(input_path)
512 |         logo = ffmpeg.input(watermark_path)
513 |         (
514 |             ffmpeg.filter([main, logo], "overlay", x, y)
515 |             .output(output_path)
516 |             .overwrite_output()
517 |             .run()
518 |         )
519 | 
--------------------------------------------------------------------------------
/ffmperative/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import ffmpeg
3 | import base64
4 | import shutil
5 | import requests
6 | import subprocess
7 | 
8 | from pathlib import Path
9 | 
10 | def download_ffmp():
11 |     bin_dir = os.path.join(os.path.dirname(__file__), 'bin')
12 |     ffmp_path = os.path.join(bin_dir, 'ffmp')
13 | 
14 |     # Create the 'bin/' directory if it does not exist
15 |     if not os.path.exists(bin_dir):
16 |         os.makedirs(bin_dir, exist_ok=True)
17 | 
18 |     # Download ffmp file if it does not exist
19 |     if not os.path.exists(ffmp_path):
20 |         print("Downloading ffmp...")
21 |         model_url = "https://remyx.ai/assets/ffmperative/0.0.7/ffmp"
22 |         response = requests.get(model_url, stream=True)
23 |         with open(ffmp_path, 'wb') as f:
24 |             for chunk in response.iter_content(chunk_size=8192):
25 |                 f.write(chunk)
26 |         print("Download complete.")
27 |     else:
28 |         pass
29 | 
30 |     # Check if the file is executable, and make it executable if it's not
31 |     if not os.access(ffmp_path, os.X_OK):
32 |         print("Making ffmp executable...")
33 |         os.chmod(ffmp_path, 0o755)  # Sets the file to be readable and executable by everyone, and writable by the owner.
34 |         print("ffmp is now executable.")
35 | 
36 | def extract_and_encode_frame(video_path):
37 |     # Get the duration of the video
38 |     probe = ffmpeg.probe(video_path)
39 |     duration = float(probe['streams'][0]['duration'])
40 | 
41 |     # Calculate the timestamp for a frame in the middle of the video
42 |     mid_time = duration / 2
43 | 
44 |     # Extract the frame at mid_time
45 |     out, _ = (
46 |         ffmpeg
47 |         .input(video_path, ss=mid_time)
48 |         .output('pipe:', vframes=1, format='image2', vcodec='mjpeg')
49 |         .run(capture_stdout=True, capture_stderr=True)
50 |     )
51 | 
52 |     # Encode the frame in base64
53 |     base64_image = base64.b64encode(out).decode('utf-8')
54 | 
55 |     return base64_image
56 | 
57 | def process_video_directory(directory_path):
58 |     json_list = []
59 |     for filename in os.listdir(directory_path):
60 |         print("Processing: ", filename)
61 |         if filename.endswith((".mp4", ".avi", ".mov")):  # Add other video formats if needed
62 |             video_path = os.path.join(directory_path, filename)
63 |             base64_image = extract_and_encode_frame(video_path)
64 |             json_list.append({"name": video_path, "sample": base64_image})
65 |     return json_list
66 | 
67 | def post_json_to_endpoint(json_data, url):
68 |     headers = {'Content-Type': 'application/json'}
69 |     response = requests.post(url, json=json_data, headers=headers)
70 |     return response
71 | 
72 | def call_director(video_directory, user_instructions=None):
73 |     json_data = process_video_directory(video_directory)
74 | 
75 |     # Add user instructions to the JSON data
76 |     if user_instructions:
77 |         json_data = {
78 |             'videos': json_data,
79 |             'user_instructions': user_instructions
80 |         }
81 |     else:
82 |         json_data = {'videos': json_data}
83 | 
84 |     # Endpoint URL
85 |     endpoint_url =
'https://engine.remyx.ai/api/v1.0/task/b_roll/compose' 86 | 87 | # Make the POST request 88 | response = post_json_to_endpoint(json_data, endpoint_url) 89 | response = response.json() 90 | compose_plan = response["compose_plan"] 91 | join_command = response["join_command"] 92 | return compose_plan, join_command 93 | 94 | def process_clip(clip_path): 95 | basename = os.path.basename(clip_path) 96 | processed_clip_path = Path("processed_clips") / basename 97 | subprocess.run(["ffmpeg", "-i", str(clip_path), "-vf", "scale=1920:1080,setsar=1,setdar=16/9,fps=30", str(processed_clip_path)]) 98 | return processed_clip_path 99 | 100 | def process_and_concatenate_clips(videos_string, output_path="composed_video.mp4"): 101 | # Split the string into individual paths 102 | video_paths = videos_string.strip().split() 103 | 104 | # Ensure there are video paths provided 105 | if not video_paths: 106 | raise ValueError("Please provide a string with video file paths") 107 | 108 | # Directory to store processed clips 109 | processed_clips_dir = Path("processed_clips") 110 | processed_clips_dir.mkdir(exist_ok=True) 111 | 112 | # Process each clip 113 | processed_clips = [] 114 | for clip_path in video_paths: 115 | clip_path = Path(clip_path) 116 | if clip_path.exists() and clip_path.is_file(): 117 | processed_clip = process_clip(clip_path) 118 | processed_clips.append(processed_clip) 119 | else: 120 | print(f"Warning: File not found {clip_path}") 121 | 122 | # Create a file list 123 | with open("files.txt", "w") as file_list: 124 | for clip in processed_clips: 125 | file_list.write(f"file '{clip}'\n") 126 | 127 | # Concatenate all processed clips 128 | subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "files.txt", output_path]) 129 | 130 | # Cleanup 131 | for clip in processed_clips: 132 | clip.unlink() 133 | 134 | # Delete all contents of the directory 135 | for item in processed_clips_dir.iterdir(): 136 | if item.is_dir(): 137 | shutil.rmtree(item) 138 | else: 139 | item.unlink() 140 | 141 | # Now safely remove the directory 142 | processed_clips_dir.rmdir() 143 | 144 | # Remove the files.txt 145 | Path("files.txt").unlink() 146 | 147 | # Additionally, delete the original clip files 148 | for original_clip_path in video_paths: 149 | original_clip = Path(original_clip_path) 150 | if original_clip.exists(): 151 | original_clip.unlink() 152 | 153 | return f"All clips processed and concatenated into {output_path}" 154 | 155 | def modify_file_name(file_path, prefix): 156 | # Convert the file path to a Path object 157 | file_path = Path(file_path) 158 | 159 | # Extract the directory and the file name 160 | parent_dir = file_path.parent 161 | file_name = file_path.name 162 | 163 | # Add the prefix to the file name 164 | new_file_name = prefix + file_name 165 | 166 | # Create the new file path 167 | new_file_path = os.path.join(parent_dir, new_file_name) 168 | 169 | return new_file_path 170 | 171 | 172 | def probe_video(input_path): 173 | return ffmpeg.probe(input_path) 174 | 175 | 176 | def get_video_info(input_path): 177 | probe = probe_video(input_path) 178 | return next( 179 | (stream for stream in probe["streams"] if stream["codec_type"] == "video"), None 180 | ) 181 | 182 | 183 | def has_audio(input_path): 184 | probe = probe_video(input_path) 185 | return any(stream["codec_type"] == "audio" for stream in probe["streams"]) 186 | -------------------------------------------------------------------------------- /requirements.txt: 
-------------------------------------------------------------------------------- 1 | numpy 2 | pillow 3 | requests 4 | opencv-python 5 | ffprobe-python 6 | ffmpeg-python 7 | soundfile 8 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | import os 2 | from setuptools import setup, find_packages 3 | from setuptools.command.install import install 4 | 5 | # read the contents of your README file 6 | from pathlib import Path 7 | this_directory = Path(__file__).parent 8 | long_description = (this_directory / "README.md").read_text() 9 | 10 | 11 | def read_requirements(file): 12 | with open(file) as f: 13 | return [line.strip() for line in f if line.strip() and not line.startswith('#')] 14 | 15 | setup( 16 | name="ffmperative", 17 | version="0.0.7-1", 18 | packages=find_packages(), 19 | include_package_data=True, 20 | install_requires=read_requirements((this_directory / 'requirements.txt')), 21 | entry_points={ 22 | "console_scripts": [ 23 | "ffmperative=ffmperative.cli:main", 24 | ], 25 | }, 26 | long_description=long_description, 27 | long_description_content_type='text/markdown' 28 | ) 29 | --------------------------------------------------------------------------------