From URL to finished video ad: the two-minute pipeline
Paste a URL, get a finished video ad in about two minutes. A step-by-step look at the pipeline: brand scrape, three scripts, voice, avatar, captions.
$300-800 is the going rate for one edited short-form ad from a freelancer or small agency. Most of that money does not buy editing. It buys the brief, the asset folder, the kickoff call, and two rounds of revisions spread across a week or two.
An AI video ad generator collapses all of that into a pipeline. You paste your website URL; about two minutes later you have a rendered MP4 with captions, sized for the platform you picked. There is no brief, because the brief is your website.
Here is what happens inside those two minutes, step by step, and where a 30-second manual edit makes the difference between an ad that exists and an ad that converts.
Step 1: The brand scrape
The pipeline starts by reading your site the way a freelancer would read your brief, except it takes seconds instead of a kickoff call. Gemini analyzes the page and pulls out your brand colors, your tone of voice, the audience you write for, and the claims you actually make about the product.
The scrape is only as good as the page you feed it. If your homepage still says "launching soon" or leads with a feature you killed last quarter, that is what shows up in the ad.
Manual fix: paste your strongest landing page, not necessarily your homepage. Then skim the extracted tone and audience before continuing. If the scrape decides you sell to enterprises and you sell to indie hackers, correct it here, not three steps later in the script.
Step 2: Three script variants
The pipeline writes three scripts, not one. This is not generosity. It is the cheapest positioning test you will ever run.
One script in isolation always reads fine, because you have nothing to compare it against. So you approve it, and you never find out it was the weakest of three possible angles. Three variants force a choice: lead with the pain, lead with the outcome, or lead with the thing that makes you different. The act of picking teaches you something about your own product.
Manual fix: rewrite the hook by hand. The first line decides whether anyone watches the next twenty seconds, and you know your customers' exact vocabulary better than any model does. Wherever the script has an adjective, swap in a real number from your product.
Step 3: Images
The chosen script gets visuals next, generated by Imagen, Seedream, or Recraft depending on the scene.
The distinction worth knowing: most image models mangle text. If a visual needs readable words in it (a price, a button label, a headline on a mockup), that is a job for Recraft, which renders text correctly. Scenery and mood shots can come from any of the three.
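The routing rule is easy to picture as code. This is a sketch, not the product's actual logic: the model names come from this article, while the Scene type, the needs_readable_text flag, and the candidate_models function are made up for illustration.

```python
from dataclasses import dataclass

# Model names are from the article; the routing rule below is an assumption.
TEXT_SAFE_MODEL = "recraft"
GENERAL_MODELS = ("imagen", "seedream", "recraft")


@dataclass
class Scene:
    description: str
    needs_readable_text: bool  # e.g. a price, a button label, a mockup headline


def candidate_models(scene: Scene) -> tuple[str, ...]:
    """Return the image models worth trying for a scene."""
    if scene.needs_readable_text:
        # Only the text-capable model is a safe choice here.
        return (TEXT_SAFE_MODEL,)
    # Scenery and mood shots can come from any of the three.
    return GENERAL_MODELS
```

Called with Scene("pricing card", needs_readable_text=True), it narrows the choice to Recraft alone; a mood shot keeps all three in play.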
Manual fix: for a SaaS product, prefer real screenshots over generated impressions of your UI. The export template supports screenshot cut-ins for exactly this reason. Generated images set context; your actual interface carries the credibility.
Step 4: Voiceover
The script becomes audio through Google TTS, Fish Audio, or ElevenLabs voices. Pick a voice that matches the tone the scrape found, not the one that sounds most impressive in isolation.
Manual fix: read the script out loud once before generating. Sentences that look short on screen run long when spoken, and a voiceover squeezing 40 words into 8 seconds sounds like the disclaimer at the end of a pharma ad. Cut until every sentence fits in one breath.
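If you would rather check pacing before you read aloud, a rough estimate takes a few lines. Everything numeric here is an assumption: 2.5 words per second (about 150 words per minute) is a common conversational pace, and the 4-second "one breath" limit is a guess you should tune to your voice.

```python
import re

WORDS_PER_SECOND = 2.5    # assumption: roughly 150 words per minute
ONE_BREATH_SECONDS = 4.0  # assumption: tune this to the voice you picked


def spoken_seconds(text: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Estimate how long a piece of text takes to say."""
    return len(text.split()) / words_per_second


def sentences_too_long(script: str, max_seconds: float = ONE_BREATH_SECONDS) -> list[str]:
    """Return the sentences whose estimated duration exceeds max_seconds."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script) if s.strip()]
    return [s for s in sentences if spoken_seconds(s) > max_seconds]
```

At that assumed pace, 40 words need about 16 seconds, which is why forcing them into 8 sounds rushed. The estimate is no substitute for reading the script out loud; it only flags the sentences to cut first.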
Step 5: The avatar
A presenter delivers the script using LivePortrait lip-sync. The preset actor library is built around the selfie-POV shots that perform in feeds: someone talking into their front camera in a car, a kitchen, a bedroom, an office, on the street, at the gym.
This is the step that stands in for the $60-150+ a UGC creator charges per video, and the credit pricing reflects it: the avatar is 7 of the 13 credits a full video costs. Lip-sync is the expensive part of the pipeline. It is also the part that makes the ad feel like a person made it.
Manual fix: match the setting to the placement. Car and kitchen read as native on TikTok; office reads right on LinkedIn. For SaaS demos there is a dedicated template: a 2-4 second avatar hook, product screenshot cut-ins, then the avatar comes back for the CTA.
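The SaaS demo template is essentially a three-part timeline, and a sketch makes the shape concrete. Only the 2-4 second hook comes from the template description; the Segment type, the 3-second CTA default, and the even split between screenshots are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    kind: str    # "avatar_hook", "screenshot", or "avatar_cta"
    start: float
    end: float


def saas_demo_timeline(total_seconds: float, screenshots: int,
                       hook_seconds: float = 3.0,
                       cta_seconds: float = 3.0) -> list[Segment]:
    """Lay out hook, screenshot cut-ins, and CTA across the video."""
    if not 2.0 <= hook_seconds <= 4.0:
        raise ValueError("the avatar hook should run 2-4 seconds")
    middle = total_seconds - hook_seconds - cta_seconds
    if screenshots < 1 or middle <= 0:
        raise ValueError("no room left for screenshot cut-ins")

    per_shot = middle / screenshots  # assumption: cut-ins share the time evenly
    segments = [Segment("avatar_hook", 0.0, hook_seconds)]
    for i in range(screenshots):
        start = hook_seconds + i * per_shot
        segments.append(Segment("screenshot", start, start + per_shot))
    segments.append(Segment("avatar_cta", total_seconds - cta_seconds, total_seconds))
    return segments
```

For a 20-second ad with two screenshots, this gives a 3-second hook, two 7-second cut-ins, and the avatar returning at the 17-second mark for the CTA.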
Step 6: Captioned export
ffmpeg renders the final MP4 with captions burned in, in 9:16, 16:9, or 1:1, sized for Instagram Reels, TikTok, YouTube Shorts, or LinkedIn.
Captions are not decoration. A lot of feed video gets watched with the sound off, and an ad that depends on its voiceover loses those viewers in the first second. The captions carry the script for everyone scrolling on mute.
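For the curious, burning captions in with ffmpeg looks roughly like this. It is a guess at a plausible command, not the pipeline's actual invocation: the pixel sizes, file names, and the build_export_command helper are assumptions, and the subtitles filter needs an ffmpeg build with libass.

```python
# Assumed output sizes per aspect ratio; the article names only the ratios.
SIZES = {"9:16": (1080, 1920), "16:9": (1920, 1080), "1:1": (1080, 1080)}


def build_export_command(src: str, srt: str, dst: str, aspect: str) -> list[str]:
    """Build an ffmpeg argument list that fits the frame and burns in captions."""
    width, height = SIZES[aspect]
    filters = ",".join([
        # Fit inside the target frame without distorting the picture.
        f"scale={width}:{height}:force_original_aspect_ratio=decrease",
        # Center the picture and fill the remainder with bars.
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2",
        # Render the captions into the pixels (paths with special
        # characters would need escaping for the filter syntax).
        f"subtitles={srt}",
    ])
    return ["ffmpeg", "-y", "-i", src, "-vf", filters, "-c:a", "copy", dst]
```

Pass the resulting list to subprocess.run(cmd, check=True) to render. Scaling before the subtitles filter means the captions are drawn at the final resolution rather than being resized with the picture.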
Manual fix: export one aspect ratio per placement rather than cropping after the fact. A 16:9 frame cropped to 9:16 loses whatever was at the edges of the frame, which is usually your product.
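How much a crop throws away is simple arithmetic. The sketch below assumes a full-height center crop, and the kept_width_fraction helper is hypothetical.

```python
from fractions import Fraction


def kept_width_fraction(source: tuple[int, int], target: tuple[int, int]) -> Fraction:
    """Fraction of the source width that survives a full-height center crop."""
    src = Fraction(*source)   # width / height of the source frame
    tgt = Fraction(*target)   # width / height of the target frame
    if tgt >= src:
        # Target is as wide or wider: nothing is lost horizontally.
        return Fraction(1)
    return tgt / src


landscape_to_vertical = kept_width_fraction((16, 9), (9, 16))
print(landscape_to_vertical, float(landscape_to_vertical))
```

Going from 16:9 to 9:16 keeps 81/256 of the width, a little under a third. More than two thirds of the frame is gone, and it all comes off the sides.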
What the two minutes cost
A full video runs about 13 credits: 1 for the scrape, 1 for the scripts, 2 for images, 1 for the voiceover, 7 for the avatar, 1 for the export. Depending on the plan, that lands between roughly $1.20 and $1.45 per finished video, third-party API costs included.
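The arithmetic is easy to check. The credit counts and the per-video price range come from the figures above; the per-credit prices are derived from them by division and are not published rates.

```python
# Credits per pipeline step, as listed in the article.
CREDITS = {
    "scrape": 1,
    "scripts": 1,
    "images": 2,
    "voiceover": 1,
    "avatar": 7,
    "export": 1,
}

TOTAL_CREDITS = sum(CREDITS.values())
AVATAR_SHARE = CREDITS["avatar"] / TOTAL_CREDITS


def implied_credit_price(video_price: float) -> float:
    """Derive a per-credit price from a per-video price (not a published rate)."""
    return video_price / TOTAL_CREDITS


print(TOTAL_CREDITS)                          # total credits for a full video
print(round(AVATAR_SHARE, 2))                 # share of credits spent on the avatar
print(round(implied_credit_price(1.20), 3))   # low end of the range
print(round(implied_credit_price(1.45), 3))   # high end of the range
```

The avatar takes a little over half of the total, and the implied price works out to roughly 9 to 11 cents per credit.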
Set that against the anchors: $300-800 for one edited short-form ad, $60-150+ per UGC creator video, $2k-10k a month for an agency retainer.
The interesting consequence is not the savings. At $1.32 a video on the Pro plan, you stop debating which of the three script angles is correct and render all of them. Run the three for a week, keep the winner, kill the rest. That workflow was never on the table at $500 per video. At $1.32 it is the obvious default.
This is the pipeline we built Aitachyon around: paste a URL, get the MP4, and if two weeks of testing doesn't prove it pays for itself, the 14-day money-back guarantee applies.