LoRA Training for Virtual Influencer Personas: The 4-Hour Walkthrough
Step-by-step walkthrough for training a custom LoRA on RunPod to produce character-consistent images for a fictional Instagram persona. Includes real cost breakdown, tooling list, and honest failure modes.
Who this guide is for
This guide is for Segment 2 buyers — operators who want to build a fictional persona for Instagram, TikTok, or similar platforms. If you are trying to build a talking-head marketing video, you are on the wrong page — see HeyGen at £23/mo or Synthesia at £50/mo.
The fictional-persona workflow is unglamorous. It requires 4-8 hours of setup, a willingness to run open-source tooling on a cloud GPU, and honest expectations about what character-consistent generation can and cannot do.
The tools everyone calls “the AI influencer generator” (Synthesia, HeyGen, Descript) will not help you build Aitana López. They make talking-head video. You need consistent stills: same face, different outfits, different poses, different lighting, 200 posts deep, none of them visibly drifting. That requires a custom LoRA.
What you will build
By the end of this guide you will have:
- A custom LoRA file for your fictional character (50-200MB)
- A ComfyUI workflow that generates consistent-face images on demand
- Realistic expectations: the LoRA will drift on extreme poses and unusual lighting; you will need 2-3 retakes per scene
What you will not have: any of the brand-deal infrastructure (DMs, contracts, FTC disclosures, payments). That is a separate problem.
Tools and costs
| Tool | Cost | Purpose |
|---|---|---|
| Midjourney | £24/mo | Generate initial character reference images |
| RunPod | ~£5/training run | Cloud GPU for LoRA training |
| ComfyUI | Free | Node-based image generation workflow |
| Kohya_ss | Free | LoRA training software |
| ChatGPT | £16/mo | Caption generation |
Total month 1: ~£50 (Midjourney + ChatGPT + one training run)

Ongoing: ~£30-40/mo (Midjourney + ChatGPT, no retraining needed)
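The totals can be checked against the table. A minimal sketch of the arithmetic, using the table's prices; the function and variable names are illustrative only:

```python
# Monthly prices in GBP, copied from the tools table.
SUBSCRIPTIONS = {"Midjourney": 24, "ChatGPT": 16}
TRAINING_RUN = 5  # approximate cost of one RunPod training run


def month_one_cost(training_runs: int = 1) -> int:
    """Subscriptions plus however many training runs were needed."""
    return sum(SUBSCRIPTIONS.values()) + training_runs * TRAINING_RUN


def ongoing_cost() -> int:
    """Subscriptions only; assumes the LoRA is not retrained."""
    return sum(SUBSCRIPTIONS.values())


print(month_one_cost())   # 45
print(month_one_cost(2))  # 50
print(ongoing_cost())     # 40
```

At the listed prices a single training run comes to £45, and a second run (a retrain after a poor first result, for example) reaches the ~£50 figure. The ongoing cost sits at the top of the £30-40 range while both subscriptions stay at the listed prices.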
Step 1: Design your character in Midjourney (2-3 hours)
Generate 50-100 character variations using a consistent base prompt. Example prompt structure:
portrait photo of [physical description], [style], professional lighting,
8k, photorealistic, centered composition
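One way to reach 50-100 variations without hand-editing each prompt is to fill the template from short lists of angles, expressions, and styles. The sketch below is only an illustration: the character description and list values are hypothetical placeholders, and the resulting strings still have to be pasted into Midjourney yourself.

```python
from itertools import product

# Template taken from the prompt structure above.
BASE = (
    "portrait photo of {description}, {style}, professional lighting, "
    "8k, photorealistic, centered composition"
)


def build_prompts(description, styles, angles, expressions):
    """Return one prompt per (style, angle, expression) combination."""
    prompts = []
    for style, angle, expression in product(styles, angles, expressions):
        subject = f"{description}, {angle}, {expression} expression"
        prompts.append(BASE.format(description=subject, style=style))
    return prompts


# Hypothetical character; replace with your own description.
prompts = build_prompts(
    description="a woman in her mid-20s with short dark hair and green eyes",
    styles=["editorial fashion", "candid street"],
    angles=["front view", "three-quarter view", "side view"],
    expressions=["neutral", "smiling", "serious"],
)

print(len(prompts))  # 18 (2 styles x 3 angles x 3 expressions)
print(prompts[0])
```

Keeping the physical description identical in every prompt and varying only angle, expression, and style makes it easier to spot which outputs share a recognisable face.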
Select your best 20-30 images as your training set. Criteria:
- Different angles (front, 3/4, side)
- Different expressions (neutral, smiling, serious)
- Different lighting conditions
- Same recognisable face structure across all
Step 2: Caption your training images (1 hour)
Each image needs a text caption. The caption teaches the LoRA what is unique about your character vs what is generic. Format:
[trigger_word], [physical description], [clothing], [setting]
Set a unique trigger word (e.g., ohwx_woman, mycharacter) — this is how you invoke the character in future prompts.
Tools: Kohya_ss has a built-in captioning tool using BLIP; or caption manually.
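If you caption manually, a small script keeps the format consistent. This is a sketch under one assumption: that the trainer reads a `.txt` file sharing each image's base name, which is a common convention but should be checked against the caption extension setting in your Kohya_ss configuration. The folder, file names, and descriptions are hypothetical.

```python
from pathlib import Path

TRIGGER = "ohwx_woman"  # example trigger word from this step


def build_caption(trigger, physical, clothing, setting):
    """Join the caption fields in the order: trigger, physical, clothing, setting."""
    parts = [trigger, physical, clothing, setting]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def write_captions(image_dir, metadata, trigger=TRIGGER):
    """Write one sidecar .txt caption per image that exists in image_dir.

    metadata maps an image filename to (physical, clothing, setting).
    Assumes the trainer expects <image name>.txt next to each image.
    """
    image_dir = Path(image_dir)
    written = []
    for filename, (physical, clothing, setting) in metadata.items():
        image_path = image_dir / filename
        if not image_path.exists():
            continue  # skip entries with no matching image
        caption_path = image_path.with_suffix(".txt")
        caption_path.write_text(
            build_caption(trigger, physical, clothing, setting), encoding="utf-8"
        )
        written.append(caption_path)
    return written


print(build_caption(TRIGGER, "short dark hair, green eyes", "red coat", "city street"))
# ohwx_woman, short dark hair, green eyes, red coat, city street
```

Calling `write_captions("training_images", {"img_001.png": ("short dark hair", "red coat", "city street")})` would then create `img_001.txt` beside the image, provided that folder and file exist.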
Step 3: Train on RunPod (4 hours, ~£5)
- Create a RunPod account, provision a GPU pod (RTX 4090, ~£0.69/hr)
- Install Kohya_ss on the pod
- Upload your training images and captions
- Configure training: 1500-2000 steps, batch size 2, learning rate 1e-4
- Run training — takes 2-4 hours depending on step count
- Download the resulting .safetensors file (your LoRA)
Terminate the pod when done — idle GPU time is charged.
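To land in the 1500-2000 step range you normally set repeats and epochs, not the step count directly. The sketch below assumes the commonly used relationship for Kohya-style trainers (steps per epoch = images × repeats ÷ batch size, rounded up); verify it against the step count your own training log reports. The repeat and epoch values are illustrative, not recommendations from this guide.

```python
import math


def total_steps(num_images, repeats, epochs, batch_size):
    """Estimated optimiser steps, assuming steps/epoch = ceil(images * repeats / batch)."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs


def pod_cost(hours, hourly_rate=0.69):
    """GPU rental cost in GBP at the RTX 4090 rate quoted above."""
    return round(hours * hourly_rate, 2)


# 25 images, 10 repeats, 14 epochs, batch size 2
print(total_steps(25, 10, 14, 2))  # 1750
print(pod_cost(4))                 # 2.76
```

Four hours of pure training at £0.69/hr comes to under £3, so the ~£5 estimate leaves headroom for installation, uploads, and any time the pod sits idle before you terminate it.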
Step 4: Test in ComfyUI (1 hour)
Load ComfyUI locally or on a RunPod instance and add a LoRA loader node to your workflow. Generate 20 test images using your trigger word plus varied prompts. Check for:
- Face consistency across prompts (the character should read as the same person every time)
- Realistic skin and lighting
- Appropriate response to pose/outfit changes
If the character drifts significantly, retrain with more steps for a stronger character imprint, bearing in mind that too many steps risks overfitting.
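The 20 test prompts are easier to compare if they come from a fixed grid of poses, outfits, and lighting, so a failure can be traced to one variable. A minimal sketch; the lists below are hypothetical examples and the trigger word is the one used earlier in this guide. The prompts are plain text to paste into your ComfyUI prompt node.

```python
import random
from itertools import product

TRIGGER = "ohwx_woman"

# Hypothetical test axes; extend with the scenes you actually plan to post.
POSES = ["front view", "three-quarter view", "side profile"]
OUTFITS = ["a denim jacket", "a black evening dress", "gym wear"]
LIGHTING = ["soft daylight", "golden hour", "dim indoor light"]


def make_test_prompts(n=20, seed=0):
    """Sample n distinct prompts from the pose x outfit x lighting grid."""
    grid = [
        f"{TRIGGER}, {pose}, wearing {outfit}, {light}"
        for pose, outfit, light in product(POSES, OUTFITS, LIGHTING)
    ]
    rng = random.Random(seed)  # fixed seed so the same test set can be rerun
    return rng.sample(grid, n)


for prompt in make_test_prompts():
    print(prompt)
```

Rerunning the same prompt set after a retrain gives a like-for-like comparison between LoRA versions.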
Step 5: Production workflow
Once your LoRA is stable, your weekly generation workflow looks like this:
- Write a scene description (e.g., “at a café in Tokyo, autumn, holding a coffee”)
- Generate 10-20 variants with your LoRA loaded in ComfyUI
- Select the best 2-3 images
- Caption with ChatGPT in your character’s voice
- Post with a synthetic-media disclosure (required under the EU AI Act)
Typical output: 30-60 consistent-face images per month at an ongoing cost of ~£30-40.
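The monthly output figure translates into a generation volume you can plan around. A rough sketch of that arithmetic, using the ranges from the workflow above; the function name is illustrative:

```python
import math


def monthly_plan(target_images, kept_per_scene, variants_per_scene):
    """Scenes to write and images to generate to reach a monthly target."""
    scenes = math.ceil(target_images / kept_per_scene)
    return {"scenes": scenes, "generations": scenes * variants_per_scene}


print(monthly_plan(30, 3, 10))  # {'scenes': 10, 'generations': 100}
print(monthly_plan(60, 2, 20))  # {'scenes': 30, 'generations': 600}
```

Depending on how selective you are, the same posting schedule can mean anywhere from about a hundred to several hundred generations a month.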
Honest failure modes
- Face drift on extreme poses — side profiles and back-of-head shots often break the LoRA’s face model
- Lighting extremes — very dark scenes or harsh directional light cause drift
- Clothing-to-face linkage — if all training images used similar clothing, the LoRA sometimes links face identity to that clothing
- Overfitting — too many training steps makes the character look plastic and identical in every image