LoRA Training for Virtual Influencer Personas: The 4-Hour Walkthrough

Who this guide is for

This guide is for Segment 2 buyers — operators who want to build a fictional persona for Instagram, TikTok, or similar platforms. If you are trying to build a talking-head marketing video, you are on the wrong page — see HeyGen at £23/mo or Synthesia at £50/mo.

The fictional-persona workflow is unglamorous. It requires 4-8 hours of setup, a willingness to run open-source tooling on a cloud GPU, and honest expectations about what character-consistent generation can and cannot do.

The tools everyone calls “the AI influencer generator” (Synthesia, HeyGen, Descript) will not help you build a persona like Aitana López. They make talking-head video. You need consistent stills: same face, different outfits, different poses, different lighting, 200 posts deep, none of them visibly drifting. That requires a custom LoRA.

What you will build

By the end of this guide you will have:

A custom LoRA file for your fictional character (50-200MB)
A ComfyUI workflow that generates consistent-face images on demand
Realistic expectations: the LoRA will drift on extreme poses and unusual lighting; you will need 2-3 retakes per scene

What you will not have: any of the brand-deal infrastructure (DMs, contracts, FTC disclosures, payments). That is a separate problem.

Tools and costs

Tool	Cost	Purpose
Midjourney	£24/mo	Generate initial character reference images
RunPod	~£5/training run	Cloud GPU for LoRA training
ComfyUI	Free	Node-based image generation workflow
Kohya_ss	Free	LoRA training software
ChatGPT	£16/mo	Caption generation

Total month 1: ~£45 (Midjourney + ChatGPT + one training run)

Ongoing: ~£40/mo (Midjourney + ChatGPT, no retraining needed)
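
As a quick check on the budget, the table's figures can be added up directly. This is only a sketch of the arithmetic: the prices are the ones quoted in this guide, and the function names are made up for illustration.

```python
# Monthly prices copied from the table above (GBP).
SUBSCRIPTIONS = {"Midjourney": 24, "ChatGPT": 16}
TRAINING_RUN = 5  # approximate cost of one RunPod training run, per this guide


def month_one_total(training_runs: int = 1) -> int:
    """Subscriptions plus however many training runs you end up paying for."""
    return sum(SUBSCRIPTIONS.values()) + training_runs * TRAINING_RUN


def ongoing_total() -> int:
    """Subscriptions only, assuming no retraining is needed."""
    return sum(SUBSCRIPTIONS.values())


print(month_one_total())   # 45
print(month_one_total(3))  # 55, if the first two LoRAs need redoing
print(ongoing_total())     # 40
```

Budgeting for two or three training runs in month 1 is a reasonable precaution if your first LoRA overfits or drifts.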

Step 1: Design your character in Midjourney (2-3 hours)

Generate 50-100 character variations using a consistent base prompt. Example prompt structure:

portrait photo of [physical description], [style], professional lighting, 
8k, photorealistic, centered composition

Select your best 20-30 images as your training set. Criteria:

Different angles (front, 3/4, side)
Different expressions (neutral, smiling, serious)
Different lighting conditions
Same recognisable face structure across all
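
One way to cover the angle, expression, and lighting criteria above is to expand the base prompt systematically rather than by hand. The sketch below only builds prompt strings for pasting into Midjourney. The character description, style, and variation lists are placeholder assumptions, and varying the lighting phrase is a choice made here, not something the base prompt requires.

```python
from itertools import product

# Hypothetical placeholder values; replace with your own character details.
PHYSICAL_DESCRIPTION = "a woman in her late twenties with short dark hair"
STYLE = "editorial fashion photography"

# Variation axes taken from the selection criteria in this step.
ANGLES = ["front view", "three-quarter view", "side profile"]
EXPRESSIONS = ["neutral expression", "smiling", "serious expression"]
LIGHTING = ["professional lighting", "soft window light", "golden hour light"]


def build_prompts() -> list[str]:
    """Return one prompt per angle/expression/lighting combination."""
    prompts = []
    for angle, expression, lighting in product(ANGLES, EXPRESSIONS, LIGHTING):
        prompts.append(
            f"portrait photo of {PHYSICAL_DESCRIPTION}, {angle}, {expression}, "
            f"{STYLE}, {lighting}, 8k, photorealistic, centered composition"
        )
    return prompts


prompts = build_prompts()
print(len(prompts))  # 27
print(prompts[0])
```

Running each prompt a few times gets you into the 50-100 variation range, from which you pick the 20-30 that share the same face structure.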

Step 2: Caption your training images (1 hour)

Each image needs a text caption. The caption teaches the LoRA what is unique about your character vs what is generic. Format:

[trigger_word], [physical description], [clothing], [setting]

Set a unique trigger word (e.g., ohwx_woman, mycharacter) — this is how you invoke the character in future prompts.

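Tools: Kohya_ss includes a BLIP-based captioning utility, or you can caption manually. Auto-generated captions will not contain your trigger word, so make sure it is added to every caption.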
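
If you caption manually, a short script can keep the format consistent. This is a minimal sketch, assuming captions live in a sidecar .txt file with the same name as each image. That is a common convention, but check the caption extension your trainer is configured to read. The helper names and the example details are hypothetical.

```python
from pathlib import Path

TRIGGER_WORD = "ohwx_woman"  # example trigger word from this guide
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def build_caption(trigger: str, physical: str, clothing: str, setting: str) -> str:
    """Join the four caption parts in the order this guide recommends."""
    return ", ".join([trigger, physical, clothing, setting])


def write_captions(image_dir: Path, details: dict[str, tuple[str, str, str]]) -> int:
    """Write <image name>.txt next to each image listed in `details`.

    `details` maps an image file name to (physical, clothing, setting).
    Returns the number of caption files written.
    """
    written = 0
    for image in sorted(image_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES or image.name not in details:
            continue
        caption = build_caption(TRIGGER_WORD, *details[image.name])
        image.with_suffix(".txt").write_text(caption + "\n", encoding="utf-8")
        written += 1
    return written


print(build_caption(TRIGGER_WORD, "short dark hair, green eyes",
                    "red wool coat", "city street at dusk"))
```

Describing clothing and setting explicitly in each caption is what lets the trainer attribute those features to the caption text instead of to the trigger word.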

Step 3: Train on RunPod (4 hours, ~£5)

Create a RunPod account, provision a GPU pod (RTX 4090, ~£0.69/hr)
Install Kohya_ss on the pod
Upload your training images and captions
Configure training: 1500-2000 steps, batch size 2, learning rate 1e-4
Run training — takes 2-4 hours depending on step count
Download the resulting .safetensors file (your LoRA)

Terminate the pod when done — idle GPU time is charged.
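
To sanity-check the configuration before paying for GPU time, it helps to work out how step count relates to dataset size. The sketch below assumes the usual relationship in Kohya-style trainers, where steps per epoch is images times repeats divided by batch size. The "repeats" setting and the figure of 10 are assumptions, not values from this guide. The hourly rate is the one quoted above.

```python
import math


def steps_per_epoch(num_images: int, repeats: int, batch_size: int) -> int:
    """Optimiser steps in one pass over the (repeated) dataset."""
    return math.ceil(num_images * repeats / batch_size)


def epochs_for(target_steps: int, num_images: int, repeats: int, batch_size: int) -> int:
    """Whole epochs needed to reach at least `target_steps`."""
    return math.ceil(target_steps / steps_per_epoch(num_images, repeats, batch_size))


def gpu_cost(hours: float, rate_per_hour: float = 0.69) -> float:
    """Rental cost in GBP at the RTX 4090 rate quoted in this guide."""
    return round(hours * rate_per_hour, 2)


# 25 training images, 10 repeats (assumed), batch size 2.
print(steps_per_epoch(25, 10, 2))   # 125
print(epochs_for(1500, 25, 10, 2))  # 12
print(epochs_for(2000, 25, 10, 2))  # 16
print(gpu_cost(4))                  # 2.76
```

Pure training time at that rate comes in under the ~£5 estimate. The difference presumably covers setup, uploads, and any time the pod sits idle before you terminate it.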

Step 4: Test in ComfyUI (1 hour)

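Run ComfyUI locally or on a RunPod instance. Copy the .safetensors file into ComfyUI's models/loras folder and add a Load LoRA node to your workflow (it ships with ComfyUI, so there is nothing extra to install). Generate 20 test images using your trigger word plus varied prompts. Check for: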

A consistent, recognisable face across prompts
Realistic skin and lighting
Appropriate response to pose/outfit changes
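
For reference, this is roughly how the LoRA sits in a ComfyUI graph when the workflow is expressed in ComfyUI's API-style JSON: the checkpoint loader feeds the LoRA loader, which feeds the text encoder. It is a partial sketch only. The file names are hypothetical, and a complete graph also needs latent, sampler, VAE decode, and save nodes, which are omitted here.

```python
import json

# Hypothetical file names; use whatever is in your own models folders.
CHECKPOINT = "base_model.safetensors"
LORA_FILE = "mycharacter.safetensors"
TRIGGER_WORD = "ohwx_woman"  # example trigger word from this guide


def lora_graph(scene: str, strength: float = 1.0) -> dict:
    """Return the first three nodes of an API-format ComfyUI graph."""
    return {
        # Loads the base model; outputs are MODEL (0), CLIP (1), VAE (2).
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": CHECKPOINT}},
        # Patches model and text encoder with the LoRA weights.
        "2": {"class_type": "LoraLoader",
              "inputs": {"model": ["1", 0], "clip": ["1", 1],
                         "lora_name": LORA_FILE,
                         "strength_model": strength,
                         "strength_clip": strength}},
        # Positive prompt: trigger word first, then the scene description.
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"clip": ["2", 1],
                         "text": f"{TRIGGER_WORD}, {scene}"}},
    }


print(json.dumps(lora_graph("sitting in a cafe, autumn light"), indent=2))
```

Lowering the strength slightly below 1.0 is a common first thing to try when outputs look overcooked, before committing to a retrain.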

If the character drifts significantly, retrain with a different step count: more steps give a stronger character imprint but increase the risk of overfitting.
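
Judging drift by eye across 20 images is subjective, so a rough numeric check can help. The sketch below assumes you already have a face embedding for each image from a face-embedding model of your choosing (not covered in this guide) and flags images whose embedding is far from a reference. The toy vectors and the 0.8 threshold are illustrative assumptions only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_drift(reference: list[float],
               candidates: dict[str, list[float]],
               threshold: float = 0.8) -> list[str]:
    """Return names of images whose similarity to the reference is below threshold."""
    return [name for name, emb in candidates.items()
            if cosine_similarity(reference, emb) < threshold]


# Toy 3-dimensional vectors standing in for real face embeddings.
reference = [1.0, 0.0, 0.0]
candidates = {
    "cafe_01.png": [0.9, 0.1, 0.0],
    "profile_02.png": [0.2, 0.9, 0.3],
}
print(flag_drift(reference, candidates))  # ['profile_02.png']
```

If most of the flagged images share a pose or lighting setup, that points to a gap in the training set, not to the step count.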

Step 5: Production workflow

Once your LoRA is stable, your weekly generation workflow looks like this:

Write a scene description (e.g., “at a café in Tokyo, autumn, holding a coffee”)
Generate 10-20 variants with your LoRA loaded in ComfyUI
Select the best 2-3 images
Caption with ChatGPT in your character’s voice
Post with a synthetic-media disclosure (required under the EU AI Act)

Typical output: 30-60 consistent-face images per month at an ongoing cost of ~£40 (Midjourney + ChatGPT).
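
To plan generation time, the monthly output can be worked backwards into scenes and raw variants using the per-scene numbers from the workflow above. This is just the arithmetic; the function names are made up for illustration.

```python
import math


def scenes_needed(target_images: int, keepers_per_scene: int) -> int:
    """Scenes required to end up with `target_images` selected images."""
    return math.ceil(target_images / keepers_per_scene)


def variants_generated(scenes: int, variants_per_scene: int) -> int:
    """Total raw images generated before selection."""
    return scenes * variants_per_scene


low = scenes_needed(30, 3)   # best case: 3 keepers per scene
high = scenes_needed(60, 2)  # worst case: 2 keepers per scene
print(low, high)  # 10 30
print(variants_generated(low, 10), variants_generated(high, 20))  # 100 600
```

In other words, hitting the typical output means roughly 10-30 scenes and 100-600 raw generations per month, most of which are discarded.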

Honest failure modes

Face drift on extreme poses — side profiles and back-of-head shots often break the LoRA’s face model
Lighting extremes — very dark scenes or harsh directional light cause drift
Clothing-to-face linkage — if all training images used similar clothing, the LoRA sometimes links face identity to that clothing
Overfitting — too many training steps makes the character look plastic and identical in every image