├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── assets
│   ├── ffmperative.gif
│   └── mascot.png
├── docs
│   └── index.md
├── ffmperative
│   ├── __init__.py
│   ├── bin
│   │   └── .gitignore
│   ├── cli.py
│   ├── interpretor.py
│   ├── prompts.py
│   ├── tool_mapping.py
│   ├── tools.py
│   └── utils.py
├── requirements.txt
└── setup.py
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # CONTRIBUTING
2 |
3 | * Have a video processing workflow in mind? Want to contribute to our project? We'd love to hear from you! Raise an issue and we'll work together to design it.
4 |
5 | ### Install From Source
6 | ```bash
7 | git clone https://github.com/remyxai/ffmperative.git
8 | cd ffmperative
9 | pip install .
10 | ```
11 |
12 | ### Building a Docker Image & Running a Container
13 | Alternatively, clone this repo and build an image with the `Dockerfile`:
14 | ```bash
15 | git clone https://github.com/remyxai/FFMPerative.git
16 | cd FFMPerative/docker
17 | docker build -t ffmperative .
18 | ```
19 |
20 | #### Run FFMPerative in a Container
21 | ```bash
22 | docker run -it -e HUGGINGFACE_TOKEN='YOUR_HF_TOKEN' -v /Videos:/Videos --entrypoint /bin/bash ffmperative:latest
23 | ```
24 |
25 | ### Build a Debian Package from Source
26 | For Debian, build and install the package:
27 | ```bash
28 | dpkg-deb --build package_build/ ffmperative.deb
29 | sudo dpkg -i ffmperative.deb
30 | ```
31 |
32 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2023 Remyx AI
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # FFMPerative - Chat to Compose Video
2 |
11 | FFMPerative is your copilot for video editing workflows. Powered by Large Language Models (LLMs) behind an intuitive chat interface, it lets you compose video edits in natural language to do things like:
12 |
13 | * Change Speed, Resize, Crop, Flip, Reverse Video/GIF
14 | * Speech-to-Text Transcription and Closed-Captions
15 |
16 | Just describe your changes like [these examples](https://remyxai.github.io/FFMPerative/).
17 |
18 | ## Setup
19 |
20 | ### Requirements
21 | * Python 3
22 | * [ffmpeg](https://ffmpeg.org)
23 |
24 | Install from PyPI:
25 | ```bash
26 | pip install ffmperative
27 | ```
28 |
29 | Or install from source:
30 | ```bash
31 | git clone https://github.com/remyxai/FFMPerative.git
32 | cd FFMPerative && pip install .
33 | ```
34 |
35 | ## Quickstart
36 | Add closed-captions with:
37 |
38 | ```bash
39 | ffmperative do --prompt "merge subtitles 'captions.srt' with video 'video.mp4' calling it 'video_caps.mp4'"
40 | ```
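
Note: on first run, `ffmperative` downloads the bundled `ffmp` binary into the package's `bin/` directory before executing your prompt.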
41 |
42 | ## Features
43 |
44 | ### Python Usage
45 | Simply import the library and pass your command as a string to `ffmp`.
46 |
47 | ```python
48 | from ffmperative import ffmp
49 |
50 | ffmp("sample the 5th frame from '/path/to/video.mp4'")
51 | ```
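
`ffmp` also accepts a `remote` flag that routes inference through a hosted endpoint instead of the bundled local binary; set the `HF_ACCESS_TOKEN` environment variable first. For example (the prompt here is illustrative):

```python
from ffmperative import ffmp

ffmp("reverse 'clip.mp4' calling it 'clip_reversed.mp4'", remote=True)
```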
52 |
53 | ### Compose 🎞️
54 | Use the `compose` call to compose clips into an edited video. Use the optional `--prompt` flag to guide the composition by text prompt.
55 | ```bash
56 | ffmperative compose --clips /path/to/video/dir --output /path/to/my_video.mp4 --prompt "Edit the video for social media"
57 | ```
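
Under the hood, `compose` samples a frame from each clip, requests a compose plan from the Remyx director endpoint, runs each planned edit with `ffmp`, and concatenates the processed clips into the output file.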
58 |
59 | ### Resources
60 | * [ffmpeg-python](https://github.com/kkroening/ffmpeg-python/)
61 | * [Sample FFMPerative Dataset](https://huggingface.co/datasets/remyxai/ffmperative-sample)
62 | * [FFMPerative LLaMA2 checkpoint](https://huggingface.co/remyxai/ffmperative-7b)
63 | * [Automatically Edit Videos from Google Drive in Colab](https://colab.research.google.com/drive/149byzCNd17dAehVuWXkiFQ2mVe_icLCa?usp=sharing)
64 |
65 | ### Community
66 | * [Join us on Discord](https://discord.com/invite/b2yGuCNpuC)
67 |
--------------------------------------------------------------------------------
/assets/ffmperative.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/remyxai/FFMPerative/2a3ea5d4ddbefb0fdf2371f6607d60efc58740c8/assets/ffmperative.gif
--------------------------------------------------------------------------------
/assets/mascot.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/remyxai/FFMPerative/2a3ea5d4ddbefb0fdf2371f6607d60efc58740c8/assets/mascot.png
--------------------------------------------------------------------------------
/docs/index.md:
--------------------------------------------------------------------------------
1 | # FFMPerative Examples
2 |
3 | ## Basic Usage
4 | Probe for metadata info about your source media:
5 | ```
6 | ffmperative "get info from 'hotel_lobby.mp4'"
7 | ```
8 |
9 | Chunk your video into 3-second GOPs:
10 | ```
11 | ffmperative "chunk 'video.mp4' into 3 second clips"
12 | ```
13 |
14 | Pad portrait-mode videos with letterboxing using the Python CLI:
15 | ```
16 | ffmp do --p "apply letterboxing to 'video.mp4' call it 'video_letterbox.mp4'"
17 | ```
18 |
19 | Vertically stack two videos:
20 | ```
21 | ffmp do --p "vertically stack 'video1.mp4' and 'video2.mp4' calling it 'result.mp4'"
22 | ```
23 |
24 | ## Advanced Usage
25 |
26 | Apply VidStab to stabilize your video:
27 | ```
28 | ffmperative "stabilize 'video.mp4'"
29 | ```
30 |
31 | Apply the Ken Burns effect (zoompan) to animate an image into a video:
32 | ```
33 | ffmp("ken burns effect on 'image.png' call it 'image_kenburns.mp4'")
34 | ```
35 |
36 | Perform speech-to-text transcription on your video:
37 | ```
38 | ffmperative "get subtitles from 'video.mp4'"
39 | ```
40 |
41 | Apply an image classifier to every 5th frame from your video:
42 | ```
43 | ffmp do --p "classify 'video.mp4' using the model 'my_model/my_model.onnx'"
44 | ```
45 |
46 |
--------------------------------------------------------------------------------
/ffmperative/__init__.py:
--------------------------------------------------------------------------------
1 | import os
2 | import time
3 | import shlex
4 | import logging
5 | import requests
6 | import subprocess
7 | import pkg_resources
8 |
9 | logger = logging.getLogger(__name__)
10 |
11 | from .prompts import MAIN_PROMPT
12 | from .utils import download_ffmp
13 | from .tool_mapping import generate_tools_mapping
14 | from .interpretor import evaluate, extract_function_calls
15 |
16 | tools = generate_tools_mapping()
17 |
18 | def run_local(prompt):
19 | download_ffmp()
20 | ffmp_path = pkg_resources.resource_filename('ffmperative', 'bin/ffmp')
21 |     safe_prompt = shlex.quote(prompt)
22 |     command = '{} -p {}'.format(ffmp_path, safe_prompt)  # shlex.quote already quotes the prompt
23 |
24 |     try:
25 |         result = subprocess.run(command, capture_output=True, text=True, shell=True, check=True)
26 |
27 | output = result.stdout
28 | return output
29 | except subprocess.CalledProcessError as e:
30 | print(f"Error occurred: {e}")
31 | return None
32 |
33 | def run_remote(prompt):
34 | stop=["Task:"]
35 | complete_prompt = MAIN_PROMPT.replace("<>", prompt.replace("'", "\\'").replace('"', '\\"'))
36 | headers = {"Authorization": f"Bearer {os.environ.get('HF_ACCESS_TOKEN', '')}"}
37 | inputs = {
38 | "inputs": complete_prompt,
39 | "parameters": {"max_new_tokens": 192, "return_full_text": True, "stop":stop},
40 | }
41 |
42 | response = requests.post("https://api-inference.huggingface.co/models/bigcode/starcoder", json=inputs, headers=headers)
43 | if response.status_code == 429:
44 | logger.info("Getting rate-limited, waiting a tiny bit before trying again.")
45 | time.sleep(1)
46 | return run_remote(prompt)
47 | elif response.status_code != 200:
48 | raise ValueError(f"Error {response.status_code}: {response.json()}")
49 |
50 | result = response.json()[0]["generated_text"]
51 | for stop_seq in stop:
52 | if result.endswith(stop_seq):
53 | res = result[: -len(stop_seq)]
54 | answer = res.split("Answer:")[-1].strip()
55 | return answer
56 | return result
57 |
58 | def ffmp(prompt, remote=False, tools=tools):
59 | if remote:
60 | parsed_output = run_remote(prompt)
61 | else:
62 | parsed_output = run_local(prompt)
63 | if parsed_output:
64 | try:
65 | extracted_output = extract_function_calls(parsed_output, tools)
66 |             # evaluate() parses the code string itself, so pass it directly
67 |             result = evaluate(extracted_output, tools)
68 | return result
69 | except SyntaxError as e:
70 | print(f"Syntax error in parsed output: {e}")
71 | else:
72 | return None
73 |
--------------------------------------------------------------------------------
/ffmperative/bin/.gitignore:
--------------------------------------------------------------------------------
1 | # Ignore everything in this directory
2 | *
3 | # Except this file
4 | !.gitignore
5 |
--------------------------------------------------------------------------------
/ffmperative/cli.py:
--------------------------------------------------------------------------------
1 | import argparse
2 | from . import ffmp
3 | from .utils import call_director, process_and_concatenate_clips
4 | from pprint import pprint
5 |
6 | def main():
7 | parser = argparse.ArgumentParser(description="FFMperative CLI tool")
8 |
9 | subparsers_action = parser.add_subparsers(dest="action", help="Top-level actions")
10 |
11 | # Parser for 'do' action
12 | do_parser = subparsers_action.add_parser("do", help="Run task with ffmp Agent")
13 | do_parser.add_argument("--prompt", required=True, help="Prompt to perform a task")
14 | do_parser.add_argument("--remote", action='store_true', default=False, required=False, help="Run remotely")
15 |
16 | # Parser for 'compose' action
17 | compose_parser = subparsers_action.add_parser("compose", help="Compose clips into a video")
18 | compose_parser.add_argument("--clips", required=True, help="Path to clips directory")
19 | compose_parser.add_argument("--prompt", required=False, default=None, help="Guide the composition by text prompt e.g. 'Edit the video for social media'")
20 | compose_parser.add_argument("--output", required=False, default="composed_video.mp4", help="Filename for edited video. Default is 'composed_video.mp4'")
21 | compose_parser.add_argument("--remote", action='store_true', default=False, required=False, help="Run remotely")
22 |
23 | args = parser.parse_args()
24 |
25 | if args.action == "do":
26 | results = ffmp(args.prompt, args.remote)
27 | pprint(results)
28 | elif args.action == "compose":
29 | compose_plans, join_plan = call_director(args.clips, args.prompt)
30 | for plan in compose_plans:
31 | try:
32 | ffmp(plan, args.remote)
33 |             except Exception as e:
34 |                 print(f"skipped plan: {plan} ({e})")
35 | results = process_and_concatenate_clips(join_plan, args.output)
36 | pprint(results)
37 | else:
38 |         parser.print_help()
39 |
40 | if __name__ == "__main__":
41 | main()
42 |
--------------------------------------------------------------------------------
/ffmperative/interpretor.py:
--------------------------------------------------------------------------------
1 | import re
2 | import ast
3 | import difflib
4 | from collections.abc import Mapping
5 | from typing import Any, Callable, Dict, List
6 |
7 |
8 | class InterpretorError(ValueError):
9 | """
10 |     An error raised when the interpreter cannot evaluate a Python expression, due to a syntax error or unsupported
11 |     operations.
12 | """
13 |
14 | pass
15 |
16 |
17 | def extract_function_calls(text, tools):
18 |     # Regex pattern: a known tool name immediately followed by an opening parenthesis
19 | pattern = r'(' + '|'.join(re.escape(tool) for tool in tools) + r')\('
20 |
21 | # Find all occurrences of the tool function calls
22 | matches = re.finditer(pattern, text)
23 |
24 | function_calls = []
25 | for match in matches:
26 | start = match.start()
27 | open_paren = text.find('(', start)
28 | if open_paren == -1:
29 | continue
30 |
31 | # Count the parentheses to find the matching closing parenthesis
32 | end_paren, nested_paren = open_paren, 0
33 | for char in text[open_paren:]:
34 | if char == '(':
35 | nested_paren += 1
36 | elif char == ')':
37 | nested_paren -= 1
38 | if nested_paren == 0:
39 | break
40 | end_paren += 1
41 |
42 | # Extract the function call
43 | function_call = text[start:end_paren + 1]
44 | function_calls.append(function_call)
45 |
46 | return "\n".join(function_calls)
47 |
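# Illustrative example (not executed): with a tools mapping containing "VideoReverseTool",
# the text "I will use VideoReverseTool.\n\nVideoReverseTool('in.mp4', 'out.mp4')"
# yields the single extracted call "VideoReverseTool('in.mp4', 'out.mp4')".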
48 |
49 | def evaluate(code: str, tools: Dict[str, Callable], state=None, chat_mode=False):
50 | """
51 | Evaluate a python expression using the content of the variables stored in a state and only evaluating a given set
52 | of functions.
53 |
54 | This function will recurse through the nodes of the tree provided.
55 |
56 | Args:
57 | code (`str`):
58 | The code to evaluate.
59 | tools (`Dict[str, Callable]`):
60 | The functions that may be called during the evaluation. Any call to another function will fail with an
61 | `InterpretorError`.
62 | state (`Dict[str, Any]`):
63 | A dictionary mapping variable names to values. The `state` should contain the initial inputs but will be
64 | updated by this function to contain all variables as they are evaluated.
65 | chat_mode (`bool`, *optional*, defaults to `False`):
66 | Whether or not the function is called from `Agent.chat`.
67 | """
68 | try:
69 | expression = ast.parse(code)
70 | except SyntaxError as e:
71 | print("The code generated by the agent is not valid.\n", e)
72 | return
73 | if state is None:
74 | state = {}
75 | result = None
76 | for idx, node in enumerate(expression.body):
77 | try:
78 | line_result = evaluate_ast(node, state, tools)
79 | except InterpretorError as e:
80 | msg = f"Evaluation of the code stopped at line {idx} before the end because of the following error"
81 | if chat_mode:
82 | msg += (
83 | f". Copy paste the following error message and send it back to the agent:\nI get an error: '{e}'"
84 | )
85 | else:
86 | msg += f":\n{e}"
87 | print(msg)
88 | break
89 | if line_result is not None:
90 | result = line_result
91 |
92 | return result
93 |
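# Illustrative example (hypothetical tool mapping): with tools = {"VideoSpeedTool": VideoSpeedTool()},
# evaluate("VideoSpeedTool('in.mp4', 'out.mp4', '2.0')", tools) evaluates the single call
# via evaluate_ast and returns the tool's result, here the output path 'out.mp4'.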
94 |
95 | def evaluate_ast(expression: ast.AST, state: Dict[str, Any], tools: Dict[str, Callable]):
96 | """
97 |     Evaluate an abstract syntax tree using the content of the variables stored in a state and only evaluating a given
98 |     set of functions.
99 |
100 |     This function will recurse through the nodes of the tree provided.
101 |
102 |     Args:
103 |         expression (`ast.AST`):
104 |             The code to evaluate, as an abstract syntax tree.
105 |         state (`Dict[str, Any]`):
106 |             A dictionary mapping variable names to values. The `state` is updated if need be when the evaluation
107 |             encounters assignments.
108 | tools (`Dict[str, Callable]`):
109 | The functions that may be called during the evaluation. Any call to another function will fail with an
110 | `InterpretorError`.
111 | """
112 | if isinstance(expression, ast.Assign):
113 |         # Assignment -> we evaluate the assignment, which should update the state
114 | # We return the variable assigned as it may be used to determine the final result.
115 | return evaluate_assign(expression, state, tools)
116 | elif isinstance(expression, ast.Call):
117 | # Function call -> we return the value of the function call
118 | return evaluate_call(expression, state, tools)
119 | elif isinstance(expression, ast.Constant):
120 | # Constant -> just return the value
121 | return expression.value
122 | elif isinstance(expression, ast.Dict):
123 | # Dict -> evaluate all keys and values
124 | keys = [evaluate_ast(k, state, tools) for k in expression.keys]
125 | values = [evaluate_ast(v, state, tools) for v in expression.values]
126 | return dict(zip(keys, values))
127 | elif isinstance(expression, ast.Expr):
128 | # Expression -> evaluate the content
129 | return evaluate_ast(expression.value, state, tools)
130 | elif isinstance(expression, ast.For):
131 | # For loop -> execute the loop
132 | return evaluate_for(expression, state, tools)
133 | elif isinstance(expression, ast.FormattedValue):
134 | # Formatted value (part of f-string) -> evaluate the content and return
135 | return evaluate_ast(expression.value, state, tools)
136 | elif isinstance(expression, ast.If):
137 | # If -> execute the right branch
138 | return evaluate_if(expression, state, tools)
139 | elif hasattr(ast, "Index") and isinstance(expression, ast.Index):
140 | return evaluate_ast(expression.value, state, tools)
141 | elif isinstance(expression, ast.JoinedStr):
142 | return "".join([str(evaluate_ast(v, state, tools)) for v in expression.values])
143 | elif isinstance(expression, ast.List):
144 | # List -> evaluate all elements
145 | return [evaluate_ast(elt, state, tools) for elt in expression.elts]
146 | elif isinstance(expression, ast.Name):
147 | # Name -> pick up the value in the state
148 | return evaluate_name(expression, state, tools)
149 | elif isinstance(expression, ast.Subscript):
150 | # Subscript -> return the value of the indexing
151 | return evaluate_subscript(expression, state, tools)
152 | else:
153 | # For now we refuse anything else. Let's add things as we need them.
154 | raise InterpretorError(f"{expression.__class__.__name__} is not supported.")
155 |
156 |
157 | def evaluate_assign(assign, state, tools):
158 | var_names = assign.targets
159 | result = evaluate_ast(assign.value, state, tools)
160 |
161 | if len(var_names) == 1:
162 | state[var_names[0].id] = result
163 | else:
164 | if len(result) != len(var_names):
165 | raise InterpretorError(f"Expected {len(var_names)} values but got {len(result)}.")
166 | for var_name, r in zip(var_names, result):
167 | state[var_name.id] = r
168 | return result
169 |
170 |
171 | def evaluate_call(call, state, tools):
172 | if not isinstance(call.func, ast.Name):
173 | raise InterpretorError(
174 | f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func} of "
175 |             f"type {type(call.func)})."
176 | )
177 | func_name = call.func.id
178 | if func_name not in tools:
179 | raise InterpretorError(
180 | f"It is not permitted to evaluate other functions than the provided tools (tried to execute {call.func.id})."
181 | )
182 |
183 | func = tools[func_name]
184 |     # Evaluate positional and keyword arguments before calling the tool
185 | args = [evaluate_ast(arg, state, tools) for arg in call.args]
186 | kwargs = {keyword.arg: evaluate_ast(keyword.value, state, tools) for keyword in call.keywords}
187 | return func(*args, **kwargs)
188 |
189 |
190 | def evaluate_subscript(subscript, state, tools):
191 | index = evaluate_ast(subscript.slice, state, tools)
192 | value = evaluate_ast(subscript.value, state, tools)
193 | if isinstance(value, (list, tuple)):
194 | return value[int(index)]
195 | if index in value:
196 | return value[index]
197 | if isinstance(index, str) and isinstance(value, Mapping):
198 | close_matches = difflib.get_close_matches(index, list(value.keys()))
199 | if len(close_matches) > 0:
200 | return value[close_matches[0]]
201 |
202 | raise InterpretorError(f"Could not index {value} with '{index}'.")
203 |
204 |
205 | def evaluate_name(name, state, tools):
206 | if name.id in state:
207 | return state[name.id]
208 | close_matches = difflib.get_close_matches(name.id, list(state.keys()))
209 | if len(close_matches) > 0:
210 | return state[close_matches[0]]
211 | raise InterpretorError(f"The variable `{name.id}` is not defined.")
212 |
213 |
214 | def evaluate_condition(condition, state, tools):
215 | if len(condition.ops) > 1:
216 | raise InterpretorError("Cannot evaluate conditions with multiple operators")
217 |
218 | left = evaluate_ast(condition.left, state, tools)
219 | comparator = condition.ops[0]
220 | right = evaluate_ast(condition.comparators[0], state, tools)
221 |
222 | if isinstance(comparator, ast.Eq):
223 | return left == right
224 | elif isinstance(comparator, ast.NotEq):
225 | return left != right
226 | elif isinstance(comparator, ast.Lt):
227 | return left < right
228 | elif isinstance(comparator, ast.LtE):
229 | return left <= right
230 | elif isinstance(comparator, ast.Gt):
231 | return left > right
232 | elif isinstance(comparator, ast.GtE):
233 | return left >= right
234 | elif isinstance(comparator, ast.Is):
235 | return left is right
236 | elif isinstance(comparator, ast.IsNot):
237 | return left is not right
238 | elif isinstance(comparator, ast.In):
239 | return left in right
240 | elif isinstance(comparator, ast.NotIn):
241 | return left not in right
242 | else:
243 | raise InterpretorError(f"Operator not supported: {comparator}")
244 |
245 |
246 | def evaluate_if(if_statement, state, tools):
247 | result = None
248 | if evaluate_condition(if_statement.test, state, tools):
249 | for line in if_statement.body:
250 | line_result = evaluate_ast(line, state, tools)
251 | if line_result is not None:
252 | result = line_result
253 | else:
254 | for line in if_statement.orelse:
255 | line_result = evaluate_ast(line, state, tools)
256 | if line_result is not None:
257 | result = line_result
258 | return result
259 |
260 |
261 | def evaluate_for(for_loop, state, tools):
262 | result = None
263 | iterator = evaluate_ast(for_loop.iter, state, tools)
264 | for counter in iterator:
265 | state[for_loop.target.id] = counter
266 | for expression in for_loop.body:
267 | line_result = evaluate_ast(expression, state, tools)
268 | if line_result is not None:
269 | result = line_result
270 | return result
271 |
--------------------------------------------------------------------------------
/ffmperative/prompts.py:
--------------------------------------------------------------------------------
1 | MAIN_PROMPT = 'I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.\nTo help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns.\nYou should first explain which tool you will use to perform the task and for what reason, then write the code in Python.\nEach instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so.\n\nTools:\n- DocumentQa: This is a tool that answers a question about a document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.\n- ImageCaptioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.\n- ImageQa: This is a tool that answers a question about an image. It takes an input named `image` which should be the image containing the information, as well as a `question` which should be the question in English. It returns a text that is the answer to the question.\n- ImageSegmenter: This is a tool that creates a segmentation mask of an image according to a label. It cannot create an image. It takes two arguments named `image` which should be the original image, and `label` which should be a text describing the elements that should be identified in the segmentation mask. The tool returns the mask.\n- Transcriber: This is a tool that transcribes an audio into text. It takes an input named `audio` and returns the transcribed text.\n- Summarizer: This is a tool that summarizes an English text. It takes an input `text` containing the text to summarize, and returns a summary of the text.\n- TextClassifier: This is a tool that classifies an English text using provided labels. It takes two inputs: `text`, which should be the text to classify, and `labels`, which should be the list of labels to use for classification. It returns the most likely label in the list of provided `labels` for the input text.\n- TextQa: This is a tool that answers questions related to a text. It takes two arguments named `text`, which is the text where to find the answer, and `question`, which is the question, and returns the answer to the question.\n- TextReader: This is a tool that reads an English text out loud. It takes an input named `text` which should contain the text to read (in English) and returns a waveform object containing the sound.\n- Translator: This is a tool that translates text from one language to another. It takes three inputs: `text`, which should be the text to translate, `src_lang`, which should be the language of the text to translate, and `tgt_lang`, which should be the desired output language. Both `src_lang` and `tgt_lang` are written in plain English, such as \'Romanian\', or \'Albanian\'. It returns the text translated in `tgt_lang`.\n- ImageTransformer: This is a tool that transforms an image according to a prompt. It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. The prompt should only contain descriptive adjectives, as if completing the prompt of the original image. It returns the modified image.\n- TextDownloader: This is a tool that downloads a file from a `url`. It takes the `url` as input, and returns the text contained in the file.\n- ImageGenerator: This is a tool that creates an image according to a prompt, which is a text description. It takes an input named `prompt` which contains the image description and outputs an image.\n- VideoGenerator: This is a tool that creates a video according to a text description. It takes an input named `prompt` which contains the image description, as well as an optional input `seconds` which will be the duration of the video. The default is two seconds. The tool outputs a video object.\n\nTools:\n- AudioAdjustmentTool: \n This tool modifies audio levels for an input video.\n Inputs are input_path, output_path, level (e.g. 0.5 or -13dB).\n Output is the output_path.\n \n- AudioVideoMuxTool: \n This tool muxes (combines) a video and an audio file.\n Inputs are input_path as a string, audio_path as a string, and output_path as a string.\n Output is the output_path.\n \n- FFProbeTool: \n This tool extracts metadata from input video using ffmpeg/ffprobe\n Input is input_path and output is video metadata as JSON.\n \n- ImageDirectoryToVideoTool: \n This tool creates video\n from a directory of images. Inputs\n are input_path and output_path. \n Output is the output_path.\n \n- ImageToVideoTool: \n This tool generates an N-second video clip from an image.\n Inputs are image_path, duration, output_path.\n \n- VideoCropTool: \n This tool crops a video with inputs: \n input_path, output_path, \n top_x, top_y, \n bottom_x, bottom_y.\n Output is the output_path.\n \n- VideoFlipTool: \n This tool flips video along the horizontal \n or vertical axis. Inputs are input_path, \n output_path and orientation. Output is output_path.\n \n- VideoFrameSampleTool: \n This tool samples an image frame from an input video. \n Inputs are input_path, output_path, and frame_number.\n Output is the output_path.\n \n- VideoGopChunkerTool: \n This tool segments video input into GOPs (Group of Pictures) chunks of \n segment_length (in seconds). Inputs are input_path and segment_length.\n \n- VideoHTTPServerTool: \n This tool streams a source video to an HTTP server. \n Inputs are input_path and server_url.\n \n- VideoLetterBoxingTool: \n This tool adds letterboxing to a video.\n Inputs are input_path, output_path, width, height, bg_color.\n \n- VideoOverlayTool: \n This tool overlays one video on top of another.\n Inputs are main_video_path, overlay_video_path, output_path, x_position, y_position.\n \n- VideoResizeTool: \n This tool resizes the video to the specified dimensions.\n Inputs are input_path, width, height, output_path.\n \n- VideoReverseTool: \n This tool reverses a video. \n Inputs are input_path and output_path.\n \n- VideoRotateTool: \n This tool rotates a video by a specified angle. \n Inputs are input_path, output_path and rotation_angle in degrees.\n \n- VideoSegmentDeleteTool: \n This tool deletes an interval of video by timestamp.\n Inputs are input_path, output_path, start, end.\n Format start/end as float.\n \n- VideoSpeedTool: \n This tool speeds up a video. \n Inputs are input_path as a string, output_path as a string, speed_factor (float) as a string.\n Output is the output_path.\n \n- VideoStackTool: \n This tool stacks two videos either vertically or horizontally based on the orientation parameter.\n Inputs are input_path, second_input, output_path, and orientation as strings.\n Output is the output_path.\n vertical orientation -> vstack, horizontal orientation -> hstack\n \n- VideoTrimTool: \n This tool trims a video. Inputs are input_path, output_path, \n start_time, and end_time. Format start(end)_time: HH:MM:SS\n \n- VideoWatermarkTool: \n This tool adds logo image as watermark to a video. \n Inputs are input_path, output_path, watermark_path.\n \n\n\nTask: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."\n\nI will use the following tools: `Translator` to translate the question into English and then `ImageQa` to answer the question on the input image.\n\nAnswer:\n```py\ntranslated_question = Translator(question=question, src_lang="French", tgt_lang="English")\nprint(f"The translated question is {translated_question}.")\nanswer = ImageQa(image=image, question=translated_question)\nprint(f"The answer is {answer}")\n```\n\nTask: "Identify the oldest person in the `document` and create an image showcasing the result."\n\nI will use the following tools: `DocumentQa` to find the oldest person in the document, then `ImageGenerator` to generate an image according to the answer.\n\nAnswer:\n```py\nanswer = DocumentQa(document, question="What is the oldest person?")\nprint(f"The answer is {answer}.")\nimage = ImageGenerator(answer)\n```\n\nTask: "Generate an image using the text given in the variable `caption`."\n\nI will use the following tool: `ImageGenerator` to generate an image.\n\nAnswer:\n```py\nimage = ImageGenerator(prompt=caption)\n```\n\nTask: "Summarize the text given in the variable `text` and read it out loud."\n\nI will use the following tools: `Summarizer` to create a summary of the input text, then `TextReader` to read it out loud.\n\nAnswer:\n```py\nsummarized_text = Summarizer(text)\nprint(f"Summary: {summarized_text}")\naudio_summary = TextReader(summarized_text)\n```\n\nTask: "Answer the question in the variable `question` about the text in the variable `text`. Use the answer to generate an image."\n\nI will use the following tools: `TextQa` to create the answer, then `ImageGenerator` to generate an image according to the answer.\n\nAnswer:\n```py\nanswer = TextQa(text=text, question=question)\nprint(f"The answer is {answer}.")\nimage = ImageGenerator(answer)\n```\n\nTask: "Caption the following `image`."\n\nI will use the following tool: `ImageCaptioner` to generate a caption for the image.\n\nAnswer:\n```py\ncaption = ImageCaptioner(image)\n```\n\nTask: "<>"\n\nI will use the following'
2 |
--------------------------------------------------------------------------------
/ffmperative/tool_mapping.py:
--------------------------------------------------------------------------------
1 | import inspect
2 | from . import tools as t
3 |
4 | def generate_tools_mapping():
5 | tools_mapping = {}
6 | for name, obj in inspect.getmembers(t):
7 | if inspect.isclass(obj) and issubclass(obj, t.Tool) and obj is not t.Tool:
8 | tools_mapping[name] = obj() # Instantiate the class
9 | return tools_mapping
10 |
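# The mapping is keyed by Tool subclass name as defined in tools.py, e.g.
# {"AudioAdjustmentTool": AudioAdjustmentTool(), ..., "VideoTrimTool": VideoTrimTool()};
# these names are what extract_function_calls() and evaluate() match against.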
--------------------------------------------------------------------------------
/ffmperative/tools.py:
--------------------------------------------------------------------------------
1 | import math
2 | import json
3 | import ffmpeg
4 | from PIL import Image
5 | from io import BytesIO
6 | from pathlib import Path
7 |
8 | from typing import List
9 |
10 | from .utils import get_video_info, has_audio
11 |
12 | class Tool:
13 | """
14 | A base class for the functions used by the agent. Subclass this and implement the `__call__` method as well as the
15 | following class attributes:
16 |
17 | - **description** (`str`) -- A short description of what your tool does, the inputs it expects and the output(s) it
18 | will return. For instance 'This is a tool that downloads a file from a `url`. It takes the `url` as input, and
19 | returns the text contained in the file'.
20 | - **name** (`str`) -- A performative name that will be used for your tool in the prompt to the agent. For instance
21 | `"text-classifier"` or `"image_generator"`.
22 | - **inputs** (`List[str]`) -- The list of modalities expected for the inputs (in the same order as in the call).
23 |       Modalities should be `"text"`, `"image"` or `"audio"`. This is only used by `launch_gradio_demo` or to make a
24 |       nice space from your tool.
25 |     - **outputs** (`List[str]`) -- The list of modalities returned by the tool (in the same order as the return of the
26 |       call method). Modalities should be `"text"`, `"image"` or `"audio"`. This is only used by `launch_gradio_demo`
27 |       or to make a nice space from your tool.
28 |
29 |     You can also override the method [`~Tool.setup`] if your tool has an expensive operation to perform before being
30 | usable (such as loading a model). [`~Tool.setup`] will be called the first time you use your tool, but not at
31 | instantiation.
32 | """
33 |
34 | description: str = "This is a tool that ..."
35 | name: str = ""
36 |
37 | inputs: List[str]
38 | outputs: List[str]
39 |
40 | def __init__(self, *args, **kwargs):
41 | self.is_initialized = False
42 |
43 | def __call__(self, *args, **kwargs):
44 |         raise NotImplementedError("Write this method in your subclass of `Tool`.")
45 |
46 | def setup(self):
47 | """
48 | Overwrite this method here for any operation that is expensive and needs to be executed before you start using
49 | your tool. Such as loading a big model.
50 | """
51 | self.is_initialized = True
52 |
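# A minimal sketch of the subclassing contract described above. "GrayscaleTool" is
# hypothetical (not part of this package) and left commented out so that
# generate_tools_mapping() does not register it:
#
# class GrayscaleTool(Tool):
#     description = """
#     This tool converts a video to grayscale.
#     Inputs are input_path and output_path.
#     Output is the output_path.
#     """
#     inputs = ["text", "text"]
#     outputs = ["text"]
#
#     def __call__(self, input_path: str, output_path: str):
#         (
#             ffmpeg.input(input_path)
#             .filter_("hue", s=0)  # saturation 0 -> grayscale
#             .output(output_path)
#             .overwrite_output()
#             .run()
#         )
#         return output_path
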
53 | class AudioAdjustmentTool(Tool):
54 | description = """
55 | This tool modifies audio levels for an input video.
56 | Inputs are input_path, output_path, level (e.g. 0.5 or -13dB).
57 | Output is the output_path.
58 | """
59 | inputs = ["text", "text", "text"]
60 | outputs = ["text"]
61 |
62 | def __call__(self, input_path: str, output_path: str, level: str):
63 | (
64 | ffmpeg.input(input_path)
65 | .output(output_path, af="volume={}".format(level))
66 | .overwrite_output()
67 | .run()
68 | )
69 | return output_path
70 |
71 |
72 | class AudioVideoMuxTool(Tool):
73 | description = """
74 | This tool muxes (combines) a video and an audio file.
75 | Inputs are input_path as a string, audio_path as a string, and output_path as a string.
76 | Output is the output_path.
77 | """
78 | inputs = ["text", "text", "text"]
79 |     outputs = ["text"]
80 |
81 | def __call__(self, input_path: str, audio_path: str, output_path: str):
82 | input_video = ffmpeg.input(input_path)
83 | added_audio = ffmpeg.input(audio_path)
84 |
85 | if has_audio(input_path):
86 | merged_audio = ffmpeg.filter([input_video.audio, added_audio], "amix")
87 | output_video = ffmpeg.concat(input_video, merged_audio, v=1, a=1)
88 | else:
89 | output_video = ffmpeg.concat(input_video, added_audio, v=1, a=1)
90 |         output_video.output(output_path).run(overwrite_output=True)
91 |         return output_path
92 |
93 | class FFProbeTool(Tool):
94 | description = """
95 | This tool extracts metadata from input video using ffmpeg/ffprobe
96 | Input is input_path and output is video metadata as JSON.
97 | """
98 | inputs = ["text"]
99 |     outputs = ["text"]
100 |
101 | def __call__(self, input_path: str):
102 | video_info = get_video_info(input_path)
103 | return json.dumps(video_info, indent=2) if video_info else None
104 |
105 |
106 | class ImageToVideoTool(Tool):
107 | description = """
108 | This tool generates an N-second video clip from an image.
109 |     Inputs are image_path, output_path, duration, and an optional framerate.
110 | """
111 | inputs = ["text", "text", "integer", "integer"]
112 | outputs = ["None"]
113 |
114 | def __call__(
115 | self, image_path: str, output_path: str, duration: int, framerate: int = 24
116 | ):
117 | (
118 | ffmpeg.input(image_path, loop=1, t=duration, framerate=framerate)
119 | .output(output_path, vcodec="libx264")
120 | .overwrite_output()
121 | .run()
122 | )
123 |
124 |
125 | class ImageDirectoryToVideoTool(Tool):
126 | description = """
127 | This tool creates video
128 | from a directory of images. Inputs
129 | are input_path and output_path.
130 | Output is the output_path.
131 | """
132 | inputs = ["text", "text"]
133 | outputs = ["None"]
134 |
135 | def __call__(
136 | self,
137 | input_path: str,
138 | output_path: str,
139 | framerate: int = 24,
140 | extension: str = "jpg",
141 | ):
142 | # Check for valid extension
143 | valid_extensions = ["jpg", "png", "jpeg"]
144 | if extension not in valid_extensions:
145 | raise ValueError(
146 | f"Invalid extension {extension}. Must be one of {valid_extensions}"
147 | )
148 |
149 | (
150 | ffmpeg.input(
151 | input_path.rstrip("/") + "/*." + extension.lstrip("."),
152 | pattern_type="glob",
153 | framerate=framerate,
154 | )
155 | .output(output_path)
156 | .overwrite_output()
157 | .run()
158 | )
159 |
160 |
161 | class VideoCropTool(Tool):
162 | description = """
163 | This tool crops a video with inputs:
164 | input_path, output_path,
165 | top_x, top_y,
166 | bottom_x, bottom_y.
167 | Output is the output_path.
168 | """
169 | inputs = ["text", "text", "text", "text", "text", "text"]
170 | outputs = ["text"]
171 |
172 | def __call__(
173 | self,
174 | input_path: str,
175 | output_path: str,
176 | top_x: str,
177 | top_y: str,
178 | bottom_x: str,
179 | bottom_y: str,
180 | ):
181 | stream = ffmpeg.input(input_path)
182 | stream = ffmpeg.crop(
183 | stream,
184 |             int(top_x),                   # x
185 |             int(top_y),                   # y
186 |             int(bottom_x) - int(top_x),   # width
187 |             int(bottom_y) - int(top_y),   # height
188 | )
189 | stream = ffmpeg.output(stream, output_path)
190 | ffmpeg.run(stream)
191 | return output_path
192 |
193 |
194 | class VideoFlipTool(Tool):
195 | description = """
196 | This tool flips video along the horizontal
197 | or vertical axis. Inputs are input_path,
198 | output_path and orientation. Output is output_path.
199 | """
200 | inputs = ["text", "text", "text"]
201 | outputs = ["text"]
202 |
203 | def __call__(
204 | self, input_path: str, output_path: str, orientation: str = "horizontal"
205 | ):
206 | # Check for valid orientation
207 | valid_orientations = ["horizontal", "vertical"]
208 | if orientation not in valid_orientations:
209 | raise ValueError(
210 | f"Invalid orientation {orientation}. Must be one of {valid_orientations}"
211 | )
212 |
213 | flip = ffmpeg.vflip if orientation == "vertical" else ffmpeg.hflip
214 | stream = ffmpeg.input(input_path)
215 | stream = flip(stream)
216 | stream = ffmpeg.output(stream, output_path)
217 | ffmpeg.run(stream)
218 | return output_path
219 |
220 |
221 | class VideoFrameSampleTool(Tool):
222 | description = """
223 | This tool samples an image frame from an input video.
224 | Inputs are input_path, output_path, and frame_number.
225 | Output is the output_path.
226 | """
227 | inputs = ["text", "text", "text"]
228 | outputs = ["text"]
229 |
230 | def __call__(self, input_path: str, output_path: str, frame_number: int):
231 | out, _ = (
232 | ffmpeg.input(input_path)
233 | .filter("select", "gte(n,{})".format(str(frame_number)))
234 | .output("pipe:", vframes=1, format="image2", vcodec="mjpeg")
235 | .run(capture_stdout=True)
236 | )
237 | img = Image.open(BytesIO(out))
238 | img.save(output_path)
239 | return output_path
240 |
241 |
242 | class VideoGopChunkerTool(Tool):
243 | description = """
244 | This tool segments video input into GOPs (Group of Pictures) chunks of
245 | segment_length (in seconds). Inputs are input_path and segment_length.
246 | """
247 | inputs = ["text", "integer"]
248 | outputs = ["None"]
249 |
250 | def __init__(self):
251 | super().__init__()
252 |
253 | def __call__(self, input_path, segment_length):
254 | basename = Path(input_path).stem
255 | output_dir = Path(input_path).parent
256 | video_info = get_video_info(input_path)
257 | num_segments = math.ceil(float(video_info["duration"]) / segment_length)
258 | num_digits = len(str(num_segments))
259 | filename_pattern = f"{output_dir}/{basename}_%0{num_digits}d.mp4"
260 |
261 | ffmpeg.input(input_path).output(
262 | filename_pattern,
263 | c="copy",
264 | map="0",
265 | f="segment",
266 | segment_time=segment_length,
267 | ).run()
268 |
269 |
270 | class VideoHTTPServerTool(Tool):
271 | description = """
272 | This tool streams a source video to an HTTP server.
273 | Inputs are input_path and server_url.
274 | """
275 | inputs = ["text", "text"]
276 | outputs = ["None"]
277 |
278 | def __call__(self, input_path: str, server_url: str = "http://localhost:8080"):
279 |         (
280 | ffmpeg.input(input_path)
281 | .output(
282 | server_url,
283 | codec="copy", # use same codecs of the original video
284 | listen=1, # enables HTTP server
285 | f="flv",
286 | ) # ffplay -f flv http://localhost:8080
287 | .global_args("-re") # argument to act as a live stream
288 | .overwrite_output()
289 | .run()
290 | )
291 |
292 |
293 | class VideoLetterBoxingTool(Tool):
294 | description = """
295 | This tool adds letterboxing to a video.
296 | Inputs are input_path, output_path, width, height, bg_color.
297 | """
298 |     inputs = ["text", "text", "integer", "integer", "text"]
299 | outputs = ["None"]
300 |
301 | def __call__(
302 | self,
303 | input_path: str,
304 | output_path: str,
305 | width: int = 1920,
306 | height: int = 1080,
307 | bg_color: str = "black",
308 | ):
309 | video_info = get_video_info(input_path)
310 | old_width = int(video_info["width"])
311 | old_height = int(video_info["height"])
312 |
313 | # Check if the video is in portrait mode
314 | if old_height >= old_width:
315 | vf_option = "scale={}:{}:force_original_aspect_ratio=decrease,pad={}:{}:-1:-1:color={}".format(
316 | width, height, width, height, bg_color
317 | )
318 | else:
319 | vf_option = "scale={}:-1".format(width)
320 | (ffmpeg.input(input_path).output(output_path, vf=vf_option).run())
321 |
322 |
323 | class VideoOverlayTool(Tool):
324 | description = """
325 | This tool overlays one video on top of another.
326 | Inputs are main_video_path, overlay_video_path, output_path, x_position, y_position.
327 | """
328 | inputs = ["text", "text", "text", "integer", "integer"]
329 | outputs = ["None"]
330 |
331 | def __call__(
332 | self,
333 | main_video_path: str,
334 | overlay_video_path: str,
335 | output_path: str,
336 | x_position: int,
337 | y_position: int,
338 | ):
339 | main = ffmpeg.input(main_video_path)
340 | overlay = ffmpeg.input(overlay_video_path)
341 |
342 | (
343 | ffmpeg.output(
344 | ffmpeg.overlay(main, overlay, x=x_position, y=y_position), output_path
345 | )
346 | .overwrite_output()
347 | .run()
348 | )
349 |
350 |
351 | class VideoReverseTool(Tool):
352 | description = """
353 | This tool reverses a video.
354 | Inputs are input_path and output_path.
355 | """
356 | inputs = ["text", "text"]
357 | outputs = ["None"]
358 |
359 | def __call__(self, input_path: str, output_path: str):
360 | (
361 | ffmpeg.input(input_path)
362 | .filter_("reverse")
363 | .output(output_path)
364 | .overwrite_output()
365 | .run()
366 | )
367 |
368 |
369 | class VideoResizeTool(Tool):
370 | description = """
371 | This tool resizes the video to the specified dimensions.
372 | Inputs are input_path, width, height, output_path.
373 | """
374 | inputs = ["text", "text", "integer", "integer"]
375 | outputs = ["None"]
376 |
377 | def __call__(self, input_path: str, output_path: str, width: int, height: int):
378 | (
379 | ffmpeg.input(input_path)
380 | .output(output_path, vf="scale={}:{}".format(width, height))
381 | .overwrite_output()
382 | .run()
383 | )
384 |
385 |
386 | class VideoRotateTool(Tool):
387 | description = """
388 | This tool rotates a video by a specified angle.
389 | Inputs are input_path, output_path and rotation_angle in degrees.
390 | """
391 | inputs = ["text", "text", "integer"]
392 | outputs = ["None"]
393 |
394 | def __call__(self, input_path: str, output_path: str, rotation_angle: int):
395 | (
396 | ffmpeg.input(input_path)
397 | .filter_("rotate", rotation_angle)
398 | .output(output_path)
399 | .overwrite_output()
400 | .run()
401 | )
402 |
403 |
404 | class VideoSegmentDeleteTool(Tool):
405 | description = """
406 |     This tool deletes an interval of video by timestamp.
407 | Inputs are input_path, output_path, start, end.
408 | Format start/end as float.
409 | """
410 | inputs = ["text", "text", "float", "float"]
411 | outputs = ["None"]
412 |
413 | def __call__(self, input_path: str, output_path: str, start: float, end: float):
414 | (
415 | ffmpeg.input(input_path)
416 | .output(
417 | output_path,
418 | vf="select='not(between(t,{},{}))',setpts=N/FRAME_RATE/TB".format(
419 | start, end
420 | ),
421 | af="aselect='not(between(t,{},{}))',asetpts=N/SR/TB".format(start, end),
422 | )
423 | .run()
424 | )
425 |
426 |
427 | class VideoSpeedTool(Tool):
428 | description = """
429 | This tool speeds up a video.
430 | Inputs are input_path as a string, output_path as a string, speed_factor (float) as a string.
431 | Output is the output_path.
432 | """
433 | inputs = ["text", "text", "text"]
434 | outputs = ["text"]
435 |
436 | def __call__(self, input_path: str, output_path: str, speed_factor: float):
437 | stream = ffmpeg.input(input_path)
438 | stream = ffmpeg.setpts(stream, "1/{}*PTS".format(float(speed_factor)))
439 | stream = ffmpeg.output(stream, output_path)
440 | ffmpeg.run(stream)
441 | return output_path
442 |
443 |
444 | class VideoStackTool(Tool):
445 | description = """
446 | This tool stacks two videos either vertically or horizontally based on the orientation parameter.
447 | Inputs are input_path, second_input, output_path, and orientation as strings.
448 | Output is the output_path.
449 | vertical orientation -> vstack, horizontal orientation -> hstack
450 | """
451 | inputs = ["text", "text", "text", "text"]
452 | outputs = ["None"]
453 |
454 | def __call__(
455 | self, input_path: str, second_input: str, output_path: str, orientation: str
456 | ):
457 | video1 = ffmpeg.input(input_path)
458 | video2 = ffmpeg.input(second_input)
459 |
460 | if orientation.lower() not in ["vstack", "hstack"]:
461 | raise ValueError("Orientation must be either 'vstack' or 'hstack'.")
462 |
463 | stacked = ffmpeg.filter((video1, video2), orientation)
464 | out = ffmpeg.output(stacked, output_path)
465 | out.run(overwrite_output=True)
466 |
467 |
468 | class VideoTrimTool(Tool):
469 | name = "VideoTrimTool"
470 | description = """
471 | This tool trims a video. Inputs are input_path, output_path,
472 | start_time, and end_time. Format start(end)_time: HH:MM:SS
473 | """
474 | inputs = ["text", "text", "text", "text"]
475 | outputs = ["None"]
476 |
477 | def __call__(
478 | self, input_path: str, output_path: str, start_time: str, end_time: str
479 | ):
480 | start_time=start_time.replace("-","")
481 | end_time=end_time.replace("-", "")
482 | stream = ffmpeg.input(input_path)
483 | v = stream.trim(start=start_time, end=end_time).setpts("PTS-STARTPTS")
484 | if has_audio(input_path):
485 | a = stream.filter_("atrim", start=start_time, end=end_time).filter_(
486 | "asetpts", "PTS-STARTPTS"
487 | )
488 | joined = ffmpeg.concat(v, a, v=1, a=1).node
489 | out = ffmpeg.output(joined[0], joined[1], output_path)
490 | else:
491 | out = ffmpeg.output(v, output_path)
492 | out.run()
493 |
494 |
495 | class VideoWatermarkTool(Tool):
496 | description = """
497 | This tool adds logo image as watermark to a video.
498 |     Inputs are input_path, output_path, watermark_path, and optional x, y coordinates.
499 | """
500 | inputs = ["text", "text", "text", "integer", "integer"]
501 | outputs = ["None"]
502 |
503 | def __call__(
504 | self,
505 | input_path: str,
506 | output_path: str,
507 | watermark_path: str,
508 | x: int = 10,
509 | y: int = 10,
510 | ):
511 | main = ffmpeg.input(input_path)
512 | logo = ffmpeg.input(watermark_path)
513 | (
514 | ffmpeg.filter([main, logo], "overlay", x, y)
515 | .output(output_path)
516 | .overwrite_output()
517 | .run()
518 | )
519 |
--------------------------------------------------------------------------------
/ffmperative/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | import ffmpeg
3 | import base64
4 | import json
5 | import shutil
6 | import requests
7 | import subprocess
7 |
8 | from pathlib import Path
9 |
10 | def download_ffmp():
11 | bin_dir = os.path.join(os.path.dirname(__file__), 'bin')
12 | ffmp_path = os.path.join(bin_dir, 'ffmp')
13 |
14 | # Create the 'bin/' directory if it does not exist
15 | if not os.path.exists(bin_dir):
16 | os.makedirs(bin_dir, exist_ok=True)
17 |
18 | # Download ffmp file if it does not exist
19 | if not os.path.exists(ffmp_path):
20 | print("Downloading ffmp...")
21 | model_url = "https://remyx.ai/assets/ffmperative/0.0.7/ffmp"
22 | response = requests.get(model_url, stream=True)
23 | with open(ffmp_path, 'wb') as f:
24 | for chunk in response.iter_content(chunk_size=8192):
25 | f.write(chunk)
26 | print("Download complete.")
27 | else:
28 | pass
29 |
30 | # Check if the file is executable, and make it executable if it's not
31 | if not os.access(ffmp_path, os.X_OK):
32 | print("Making ffmp executable...")
33 | os.chmod(ffmp_path, 0o755) # Sets the file to be readable and executable by everyone, and writable by the owner.
34 | print("ffmp is now executable.")
35 |
36 | def extract_and_encode_frame(video_path):
37 | # Get the duration of the video
38 | probe = ffmpeg.probe(video_path)
39 | duration = float(probe['streams'][0]['duration'])
40 |
41 | # Calculate the timestamp for a frame in the middle of the video
42 | mid_time = duration / 2
43 |
44 | # Extract the frame at mid_time
45 | out, _ = (
46 | ffmpeg
47 | .input(video_path, ss=mid_time)
48 | .output('pipe:', vframes=1, format='image2', vcodec='mjpeg')
49 | .run(capture_stdout=True, capture_stderr=True)
50 | )
51 |
52 | # Encode the frame in base64
53 | base64_image = base64.b64encode(out).decode('utf-8')
54 |
55 | return base64_image
56 |
57 | def process_video_directory(directory_path):
58 | json_list = []
59 | for filename in os.listdir(directory_path):
60 | print("Processing: ", filename)
61 | if filename.endswith((".mp4", ".avi", ".mov")): # Add other video formats if needed
62 | video_path = os.path.join(directory_path, filename)
63 | base64_image = extract_and_encode_frame(video_path)
64 | json_list.append({"name": video_path, "sample": base64_image})
65 | return json_list
66 |
67 | def post_json_to_endpoint(json_data, url):
68 | headers = {'Content-Type': 'application/json'}
69 | response = requests.post(url, json=json_data, headers=headers)
70 | return response
71 |
72 | def call_director(video_directory, user_instructions=None):
73 | json_data = process_video_directory(video_directory)
74 |
75 | # Add user instructions to the JSON data
76 | if user_instructions:
77 | json_data = {
78 | 'videos': json_data,
79 | 'user_instructions': user_instructions
80 | }
81 | else:
82 | json_data = {'videos': json_data}
83 |
84 | # Endpoint URL
85 | endpoint_url = 'https://engine.remyx.ai/api/v1.0/task/b_roll/compose'
86 |
87 | # Make the POST request
88 | response = post_json_to_endpoint(json_data, endpoint_url)
89 | response = response.json()
90 | compose_plan = response["compose_plan"]
91 | join_command = response["join_command"]
92 | return compose_plan, join_command
93 |
94 | def process_clip(clip_path):
95 | basename = os.path.basename(clip_path)
96 | processed_clip_path = Path("processed_clips") / basename
97 | subprocess.run(["ffmpeg", "-i", str(clip_path), "-vf", "scale=1920:1080,setsar=1,setdar=16/9,fps=30", str(processed_clip_path)])
98 | return processed_clip_path
99 |
100 | def process_and_concatenate_clips(videos_string, output_path="composed_video.mp4"):
101 | # Split the string into individual paths
102 | video_paths = videos_string.strip().split()
103 |
104 | # Ensure there are video paths provided
105 | if not video_paths:
106 | raise ValueError("Please provide a string with video file paths")
107 |
108 | # Directory to store processed clips
109 | processed_clips_dir = Path("processed_clips")
110 | processed_clips_dir.mkdir(exist_ok=True)
111 |
112 | # Process each clip
113 | processed_clips = []
114 | for clip_path in video_paths:
115 | clip_path = Path(clip_path)
116 | if clip_path.exists() and clip_path.is_file():
117 | processed_clip = process_clip(clip_path)
118 | processed_clips.append(processed_clip)
119 | else:
120 | print(f"Warning: File not found {clip_path}")
121 |
122 | # Create a file list
123 | with open("files.txt", "w") as file_list:
124 | for clip in processed_clips:
125 | file_list.write(f"file '{clip}'\n")
126 |
127 | # Concatenate all processed clips
128 | subprocess.run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "files.txt", output_path])
129 |
130 | # Cleanup
131 | for clip in processed_clips:
132 | clip.unlink()
133 |
134 | # Delete all contents of the directory
135 | for item in processed_clips_dir.iterdir():
136 | if item.is_dir():
137 | shutil.rmtree(item)
138 | else:
139 | item.unlink()
140 |
141 | # Now safely remove the directory
142 | processed_clips_dir.rmdir()
143 |
144 | # Remove the files.txt
145 | Path("files.txt").unlink()
146 |
147 | # Additionally, delete the original clip files
148 | for original_clip_path in video_paths:
149 | original_clip = Path(original_clip_path)
150 | if original_clip.exists():
151 | original_clip.unlink()
152 |
153 | return f"All clips processed and concatenated into {output_path}"
154 |
155 | def modify_file_name(file_path, prefix):
156 | # Convert the file path to a Path object
157 | file_path = Path(file_path)
158 |
159 | # Extract the directory and the file name
160 | parent_dir = file_path.parent
161 | file_name = file_path.name
162 |
163 | # Add the prefix to the file name
164 | new_file_name = prefix + file_name
165 |
166 | # Create the new file path
167 | new_file_path = os.path.join(parent_dir, new_file_name)
168 |
169 | return new_file_path
170 |
171 |
172 | def probe_video(input_path):
173 | return ffmpeg.probe(input_path)
174 |
175 |
176 | def get_video_info(input_path):
177 | probe = probe_video(input_path)
178 | return next(
179 | (stream for stream in probe["streams"] if stream["codec_type"] == "video"), None
180 | )
181 |
182 |
183 | def has_audio(input_path):
184 | probe = probe_video(input_path)
185 | return any(stream["codec_type"] == "audio" for stream in probe["streams"])
186 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | pillow
3 | requests
4 | opencv-python
5 | ffprobe-python
6 | ffmpeg-python
7 | soundfile
8 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import os
2 | from setuptools import setup, find_packages
3 | from setuptools.command.install import install
4 |
5 | # read the contents of your README file
6 | from pathlib import Path
7 | this_directory = Path(__file__).parent
8 | long_description = (this_directory / "README.md").read_text()
9 |
10 |
11 | def read_requirements(file):
12 | with open(file) as f:
13 | return [line.strip() for line in f if line.strip() and not line.startswith('#')]
14 |
15 | setup(
16 | name="ffmperative",
17 | version="0.0.7-1",
18 | packages=find_packages(),
19 | include_package_data=True,
20 | install_requires=read_requirements((this_directory / 'requirements.txt')),
21 | entry_points={
22 | "console_scripts": [
23 | "ffmperative=ffmperative.cli:main",
24 | ],
25 | },
26 | long_description=long_description,
27 | long_description_content_type='text/markdown'
28 | )
29 |
--------------------------------------------------------------------------------