[preamble]
TODO:
- download the video
- convert to different formats via ffmpeg
- create thumbnails
- use a trained ML model for scene classification
- make sub clips based on classification
- ML training?
pypyr is a task-runner that lets you automate task sequences that may call different applications, APIs and bits of scripts - without you having to write code to do so. You don't have to worry about coding the error-prone, boring, repetitive bits that are common when you automate your processes. pypyr takes care of those for you: input argument parsing, working with configuration files, error handling, automatic retries on failure and running something repeatedly in a loop.
You automate your workflow by creating steps in a pipeline. The term "pipeline" refers to the idea that this is a series of sequential steps, executing one after the other.
The pipeline is simply a yaml file that is friendly for human creation & consumption.
pypyr is a Python application. It requires Python >=3.6.
You probably want to install it to a virtual environment, although this is not mandatory. If you want to create a virtual environment, this is the quickest way to do so. Open your terminal and do the following:
$ cd path/to/your/dev/folder/
$ python3 -m venv .env
$ . .env/bin/activate
[TODO: other deps like ffmpeg?]
From your terminal, install pypyr like this:
$ pip install pypyr
Note: if you are working in a virtual environment, it needs to be active when you run pip install. If you followed the previous step it will be active already.
You can verify that pypyr installed correctly by running one of pypyr's built-in pipelines. It's about pipes, on account of being a pipeline runner:
$ pypyr magritte
Ceci n'est pas une pipe
We want to automate processing for more than one video and there are different parameters we want to set for each video. Instead of error-prone typing of long input sequences at the terminal, an easy way of specifying this is in a yaml input configuration file.
This will also allow us to create different inputs for different batches of videos with different settings for each.
In your code or text editor, create a file named input.yaml in your dev directory:
# ./input.yaml
videos:
  - name: video1.mp4
    url: https://arburl/notreal1
    do_thumbnails: True
  - name: video2.mp4
    url: https://arburl/notreal2
    do_thumbnails: False
  - name: video3.mp4
    url: https://arburl/notreal3
    do_thumbnails: True
The exact format of this yaml is arbitrary - we get to decide what we want the structure to look like and which fields we want to include. It's up to us to create a pipeline that understands whatever input we create here.
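Because the structure is ours to define, it can help to check an input against it before a run. The following is a minimal sketch in plain Python, not part of pypyr: the helper name find_input_problems and the REQUIRED_FIELDS table are hypothetical, and the sample dict is assumed to be what a yaml parser would produce from the input.yaml above.

```python
# Hypothetical helper (not part of pypyr): check that a parsed input matches
# the structure this tutorial chose for input.yaml.
REQUIRED_FIELDS = {"name": str, "url": str, "do_thumbnails": bool}


def find_input_problems(config):
    """Return a list of readable problems; an empty list means the input looks fine."""
    videos = config.get("videos")
    if not isinstance(videos, list):
        return ["top-level 'videos' key must hold a list"]

    problems = []
    for index, video in enumerate(videos):
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in video:
                problems.append(f"videos[{index}] is missing '{field}'")
            elif not isinstance(video[field], expected_type):
                problems.append(
                    f"videos[{index}].{field} should be {expected_type.__name__}")
    return problems


# The same data as input.yaml, written as the dict we assume a parser yields.
sample = {
    "videos": [
        {"name": "video1.mp4", "url": "https://arburl/notreal1", "do_thumbnails": True},
        {"name": "video2.mp4", "url": "https://arburl/notreal2", "do_thumbnails": False},
        {"name": "video3.mp4", "url": "https://arburl/notreal3", "do_thumbnails": True},
    ]
}

print(find_input_problems(sample))  # prints []
```

If we later add or rename fields in the input file, only the REQUIRED_FIELDS table in this sketch would need to change.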
In your favorite code editor, create video-process.yaml in your dev directory.
This is your pypyr pipeline. The entire pipeline will look like this:
# ./video-process.yaml
# run me like this:
# $ pypyr video-process ./input.yaml
context_parser: pypyr.parser.yamlfile
steps:
  - name: pypyr.steps.call
    description: --> loop through videos to process
    foreach: '{videos}'
    in:
      call: process_video
  - name: pypyr.steps.echo
    in:
      echoMe: --> done
process_video:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        current_video: '{i}'
  - name: pypyr.steps.echo
    in:
      echoMe: --> processing {current_video[name]} from {current_video[url]}
  - name: pypyr.steps.cmd
    comment: download video. retries up to 3X if download fails.
    description: --> downloading video
    retry:
      max: 3
    in:
      cmd: echo curl {current_video[url]} -o {current_video[name]}
  - name: pypyr.steps.cmd
    comment: convert to different formats [origin_file_name]-output.[ext]
    description: --> convert video to output formats
    foreach: ['flac', 'mkv', 'webm']
    in:
      cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}
  - name: pypyr.steps.cmd
    description: --> generate thumbnails
    comment: do thumbnails, output name {current_video[name]}-out[n].png, where n counter.
      only run if do_thumbnails is True for this video.
    run: '{current_video[do_thumbnails]}'
    in:
      cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png
  # TODO: whatever ML provider/api/cli you're using here
The first thing we want to do is parse the input configuration file we already created. pypyr uses a context parser to parse inputs and put them into the pypyr context. The pypyr context is a dictionary that is in scope for the entire duration of the pipeline. You use the context to persist and pass values between steps.
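To make the idea of a shared context concrete, here is a minimal sketch in plain Python. It is an illustration of the concept only, not pypyr's implementation: the functions load_input, count_videos and run_steps are hypothetical names invented for this example.

```python
# Illustration only: a dictionary that every step can read from and write to.
def load_input(context):
    """Stand-in for a context parser: put the parsed input into context."""
    context["videos"] = [{"name": "video1.mp4"}, {"name": "video2.mp4"}]


def count_videos(context):
    """A later step reads what an earlier step stored, and adds its own value."""
    context["video_count"] = len(context["videos"])


def run_steps(steps, context):
    """Run each step in order, handing every step the same context dict."""
    for step in steps:
        step(context)
    return context


context = run_steps([load_input, count_videos], {})
print(context["video_count"])  # prints 2
```

The second step never receives the videos as an argument. It finds them in the context because the first step put them there, which is how values persist between steps.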
In our case, pypyr has a ready-made parser that will read & parse our input yaml configuration file without having to write code.
This is what the first line of the pipeline, context_parser: pypyr.parser.yamlfile, does. It tells pypyr to use the built-in yamlfile parser to treat your cli input argument as a path to a yaml file to read into context. This lets you run the pipeline like this:
$ pypyr video-process ./path-to-input-config-here.yaml
This way you can dynamically specify different input configuration files each time you run your pipeline.
pypyr looks for the steps group as the entry-point to the pipeline.
What we want to do is run a sequence of steps over every video we have specified in our input configuration file. We can group the steps we want to run for each video together under the process_video key.
We then want to loop over all of our videos from the input configuration and run the entire process_video sequence for each item in the input configuration.
We call the process_video group by using the built-in pypyr call step.
- name: pypyr.steps.call
  description: --> loop through videos to process
  foreach: '{videos}'
  in:
    call: process_video
We can loop over every video in our input by using the foreach decorator, which tells pypyr to call process_video once for each video.
Let's look at the foreach instruction in detail:
foreach: '{videos}'
pypyr treats anything in between curly braces as a formatting substitution expression. So here, we are telling pypyr to look for videos in the pypyr context. If you check our input.yaml file, you'll see we have a list of videos under the videos key. Remember that all of this will be in the pypyr context because we used the pypyr.parser.yamlfile context parser to load the yaml file path we specify from the cli into context.
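Conceptually, the call step combined with foreach behaves like an ordinary loop that invokes a group of steps once per list item. The following plain-Python sketch illustrates that idea only and is not how pypyr is implemented: call_foreach and the processed list are hypothetical names, and process_video here is a stand-in function rather than the pipeline's step-group.

```python
# Illustration only: look up a list in context, then run a group per item.
def process_video(context):
    """Stand-in for the process_video step-group."""
    video = context["i"]  # the current item of the loop
    context["processed"].append(video["name"])


def call_foreach(context, items_key, group):
    """Run group once for each item found under items_key in context."""
    for item in context[items_key]:
        context["i"] = item  # expose the current item to the group as i
        group(context)


context = {
    "videos": [
        {"name": "video1.mp4"},
        {"name": "video2.mp4"},
        {"name": "video3.mp4"},
    ],
    "processed": [],
}

call_foreach(context, "videos", process_video)
print(context["processed"])  # prints ['video1.mp4', 'video2.mp4', 'video3.mp4']
```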
pypyr will log the text in the description field to the cli output when we run the pipeline. This is not mandatory, but it is a handy way of seeing your pipeline's progress as it runs.
So that is how we are calling the process_video group of steps for each video in our input. Now let's look at the sequence of steps we run for each video:
process_video:
  - name: pypyr.steps.contextsetf
    in:
      contextSetf:
        current_video: '{i}'
  - name: pypyr.steps.echo
    in:
      echoMe: --> processing {current_video[name]} from {current_video[url]}
  - name: pypyr.steps.cmd
    comment: download video. retries up to 3X if download fails.
    description: --> downloading video
    retry:
      max: 3
    in:
      cmd: echo curl {current_video[url]} -o {current_video[name]}
  - name: pypyr.steps.cmd
    comment: convert to different formats [origin_file_name]-output.[ext]
    description: --> convert video to output formats
    foreach: ['flac', 'mkv', 'webm']
    in:
      cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}
  - name: pypyr.steps.cmd
    description: --> generate thumbnails
    comment: do thumbnails, output name {current_video[name]}-out[n].png, where n counter.
      only run if do_thumbnails is True for this video.
    run: '{current_video[do_thumbnails]}'
    in:
      cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png
In the first step, we use the built-in contextsetf step to set a new context item with some formatting. (The f at the end carries the same meaning as in printf in many programming languages.)
Remember that we are calling this entire step-group from a foreach loop. When we are in a foreach loop, {i} represents the current iterator. We could just use {i[name]} or {i[url]} to refer to fields of the current video throughout this step-group, but for the sake of clarity, let's create a new context item called current_video and assign to it the value of i, which is the current item in the list we are iterating over.
In the next step we use echo to output a friendly status message to the console as the pipeline runs:
- name: pypyr.steps.echo
  in:
    echoMe: --> processing {current_video[name]} from {current_video[url]}
Notice in the substitution expression we are accessing the name and url values from our input yaml file.
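The curly-brace syntax in this step looks much like Python's own str.format syntax, so we can get a feel for what the expression resolves to with plain Python. This is only an analogy for illustration. It does not call pypyr, and the context dict is a hand-written stand-in for the pypyr context.

```python
# Plain-Python analogy: str.format also resolves {key[field]} lookups.
context = {
    "current_video": {
        "name": "video1.mp4",
        "url": "https://arburl/notreal1",
    }
}

# The same text as the echoMe value in the pipeline step.
template = "--> processing {current_video[name]} from {current_video[url]}"

message = template.format(**context)
print(message)  # prints --> processing video1.mp4 from https://arburl/notreal1
```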
You can execute any program available in your current PATH using the built-in pypyr.steps.cmd step.
First, we're using curl to download the video, using the url from our input configuration. The -o flag specifies the file-name curl will save the download as.
- name: pypyr.steps.cmd
  comment: download video. retries up to 3X if download fails.
  description: --> downloading video
  retry:
    max: 3
  in:
    cmd: echo curl {current_video[url]} -o {current_video[name]}
TODO: arbitrary example curl - depending on video source might need to worry about authentication, custom http headers etc.
Because this is the internet and connectivity isn't guaranteed, if anything goes wrong with the download, we tell pypyr to retry it up to 3 times, using the automatic retry decorator:
retry:
  max: 3
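For comparison, this is roughly the kind of boilerplate the retry decorator saves us from writing by hand. It is a minimal sketch in plain Python, not pypyr's implementation: the function names retry and flaky_download are hypothetical, the download is simulated rather than real, and the sketch assumes the limit counts total attempts, which may not match exactly how pypyr counts.

```python
# Hypothetical hand-rolled retry, shown only to illustrate the idea.
def retry(action, max_attempts=3):
    """Call action until it succeeds or max_attempts attempts are used up."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == max_attempts:
                raise  # out of attempts: let the last error propagate
            print(f"attempt {attempt} failed ({error}), retrying")


# Simulated download that fails twice before succeeding.
attempts = {"count": 0}


def flaky_download():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "downloaded"


result = retry(flaky_download, max_attempts=3)
print(result)  # prints downloaded
```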
Now that we have downloaded the video, we want to convert it to different output formats.
- name: pypyr.steps.cmd
  comment: convert to different formats [origin_file_name]-output.[ext]
  description: --> convert video to output formats
  foreach: ['flac', 'mkv', 'webm']
  in:
    cmd: echo ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}
TODO: arbitrary untested example ffmpeg command. substitute with actual cmd & tested switches.
Rather than manually writing out each output conversion step, we're going to use a foreach loop to execute the cmd for each item in the list. Notice that we are using substitution expressions to fill out the argument values we pass to the ffmpeg command, and here the iterator i will refer to flac, mkv or webm depending on where in the loop we are.
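To see which commands the loop expands to for one video, we can reproduce the substitution with Python's str.format. This is a sketch for illustration only: it builds the command strings without running anything and does not use pypyr. The leading echo from the pipeline is left out here, and the ffmpeg arguments are the same untested placeholders as in the step above.

```python
# Illustration only: expand the cmd template once per output format.
current_video = {"name": "video1.mp4"}
output_formats = ["flac", "mkv", "webm"]

# Same argument template as the pipeline's cmd, without the leading echo.
template = "ffmpeg -i {current_video[name]} {current_video[name]}-output.{i}"

commands = [
    template.format(current_video=current_video, i=output_format)
    for output_format in output_formats
]

for command in commands:
    print(command)
# prints:
# ffmpeg -i video1.mp4 video1.mp4-output.flac
# ffmpeg -i video1.mp4 video1.mp4-output.mkv
# ffmpeg -i video1.mp4 video1.mp4-output.webm
```

Note that the output name keeps the original extension, so video1.mp4 becomes video1.mp4-output.flac rather than video1-output.flac.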
After all the conversions complete, the next step is to generate thumbnails from our original source video:
- name: pypyr.steps.cmd
  description: --> generate thumbnails
  comment: do thumbnails, output name {current_video[name]}-out[n].png, where n counter.
    only run if do_thumbnails is True for this video.
  run: '{current_video[do_thumbnails]}'
  in:
    cmd: echo ffmpeg -i {current_video[name]} -vf fps=1/60 {current_video[name]}-out%d.png
TODO: arbitrary untested example ffmpeg command. substitute with actual cmd & tested switches.
Here we use pypyr's conditional run decorator to run the thumbnail step only if the do_thumbnails field is boolean True for the current video. If do_thumbnails is False, pypyr will NOT run this step. Instead it outputs the description to the console with an indicator to show you that it is skipping the step:
(skipping): --> generate thumbnails
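The effect of the run decorator is comparable to wrapping the step in an if statement. The sketch below is plain Python for illustration, not pypyr code: run_if and the thumbnailed list are hypothetical names, and the skip message simply imitates the console output shown above.

```python
# Illustration only: run an action when a flag is set, otherwise report a skip.
def run_if(condition, description, action):
    """Run action only when condition is true. Return True if it ran."""
    if not condition:
        print(f"(skipping): {description}")
        return False
    print(description)
    action()
    return True


videos = [
    {"name": "video1.mp4", "do_thumbnails": True},
    {"name": "video2.mp4", "do_thumbnails": False},
    {"name": "video3.mp4", "do_thumbnails": True},
]

thumbnailed = []  # records which videos would get thumbnails
for video in videos:
    run_if(
        video["do_thumbnails"],
        "--> generate thumbnails",
        lambda: thumbnailed.append(video["name"]),
    )

print(thumbnailed)  # prints ['video1.mp4', 'video3.mp4']
```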
To run this pipeline, from the console, do:
$ pypyr video-process ./input.yaml
This tells pypyr to look for the video-process.yaml file in the current directory. pypyr automatically appends the .yaml for you, so you don't have to type it out.
If you want to process a different set of videos, you can create a new input configuration file for them and run it the same way:
$ pypyr video-process ./another-input-file.yaml
This way you can re-use your pipeline without having to touch the functional pipeline declaration itself.
You can also run your pipeline from the pypyr Python api.
Rather than use a yaml config file as an input, we can directly inject a standard Python dictionary into the pipeline. The exact same pipeline can merrily use this dictionary as long as it keeps to the same structure as the yaml config file we made earlier.
import pypyr.pipelinerunner

# prepare a dict to initialize context
input_dict = {
    'videos': [
        {'name': 'video1.mp4',
         'url': 'https://arburl/notreal1',
         'do_thumbnails': True},
        {'name': 'video2.mp4',
         'url': 'https://arburl/notreal2',
         'do_thumbnails': False},
        {'name': 'video3.mp4',
         'url': 'https://arburl/notreal3',
         'do_thumbnails': True},
    ]
}

context_out = pypyr.pipelinerunner.main_with_context(
    pipeline_name='video-process',
    dict_in=input_dict)
See attached video-process.py for full sample.