Optimization Plan — YouTube Shorts Generator

Generated: 2026-04-11
Last updated: 2026-04-11 (P0 + P1 implemented)


Priority 0 — Critical (broken functionality)

# Issue File Status Description
0.1 G4F ignores temperature/max_tokens app/providers/llm/g4f_provider.py DONE _call_with_timeout() does not pass temperature and max_tokens to the API call. All temperature tuning across the pipeline (0.3 for reflection, 0.8 for dynamic trends, etc.) is silently ignored.
0.2 Hardcoded "American English" in script prompt app/pipeline/step_02_script.py DONE System prompt hardcodes "natural American English", "US audiences aged 18-35", "US spelling" regardless of cfg.channel.language and cfg.channel.region. Non-English channels generate wrong content.
0.3 Comments module ignores configured LLM provider app/analytics/youtube_comments.py DONE _build_llm() hardcodes G4FProvider() instead of using build_llm_provider(cfg.llm). If the user configures OpenAI, comments still use the free provider.
0.4 LLM factory has no runtime fallback app/providers/llm/factory.py DONE Fallback only triggers at instantiation. If the primary provider instantiates successfully but fails at call time, no fallback occurs.

P0 Implementation Details

0.1 — G4F temperature/max_tokens fix
- _call_with_timeout() signature extended: temperature: float = 0.7, max_tokens: int = 2000
- Parameters forwarded via kwargs dict to client.chat.completions.create(**kwargs)
- All 3 call sites (PollinationsAI, DeepInfra, generic) updated
- Specifics: Parameters are passed conditionally (if temperature is not None) to avoid breaking providers that don't support them
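The conditional-forwarding pattern can be sketched as follows (helper name and default values are illustrative; the real g4f call site may differ):

```python
def build_completion_kwargs(model, messages, temperature=0.7, max_tokens=2000):
    """Build the kwargs dict forwarded to client.chat.completions.create(**kwargs).

    Tuning parameters are only included when set, so providers that
    reject temperature/max_tokens are not broken by their presence.
    """
    kwargs = {"model": model, "messages": messages}
    if temperature is not None:
        kwargs["temperature"] = temperature
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs
```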

0.2 — Dynamic language/region in script prompt
- Added _LANG_MAP dict (en-US, uk-UA, es-ES, de-DE, fr-FR, pt-BR, ja-JP, ko-KR, zh-CN, hi-IN, ar-SA) + _lang_display_name() helper
- System prompt now uses cfg.channel.language → display name and cfg.channel.region dynamically
- Segment count is now dynamic: seg_min = max(2, word_target // 50), seg_max = min(7, word_target // 20)
- Added comprehensive few-shot JSON example (4-segment AI medical script) — also covers P1.2
- Hook instruction: "shocking fact, bold claim, or curiosity-gap question"; CTA: "specific action — avoid generic 'like and subscribe'"

0.3 — Comments LLM provider fix
- _build_llm() rewritten: loads config via reload_config(), uses build_llm_provider(cfg.llm)
- Falls back to None (with error log) if config loading fails, preserving graceful degradation

0.4 — Runtime LLM fallback
- New _FallbackLLMProvider(LLMProvider) wrapper class in factory.py
- Wraps primary + fallback providers; complete() and complete_json() catch primary exceptions and transparently retry on fallback
- build_llm_provider() returns wrapped provider when both primary and fallback are available
- Specifics: Logs llm_runtime_fallback warning with primary name, error, and fallback name for observability


Priority 1 — Important (significant quality/performance impact)

# Issue File Status Description
1.1 Steps 3+4 run sequentially app/pipeline/orchestrator.py DONE Visuals and Audio are independent (both depend only on script). Parallel execution saves 30-90s per run.
1.2 No few-shot examples in prompts app/pipeline/step_02_script.py DONE Script generation prompt has no concrete example. LLM frequently returns malformed JSON or wrong structure.
1.3 Memory overwrites instead of accumulating app/pipeline/step_09_reflect.py DONE Each reflection completely replaces previous memory. Insights from older videos are permanently lost.
1.4 Comments not analyzed in reflection app/pipeline/step_09_reflect.py DONE _build_video_data() ignores CommentReply records entirely. Comment themes are the richest audience signal.
1.5 Memory not used in trends selection app/pipeline/step_01_trends.py DONE Trend selection doesn't know which topics performed well/poorly.
1.6 Memory not used in SEO generation app/pipeline/step_06_seo.py DONE SEO prompt doesn't receive channel memory. Tags/titles can't optimize based on past performance.
1.7 Pollinations always generates same image (seed=42) app/providers/visuals/pollinations_provider.py DONE Fixed seed means identical images for same keyword across runs.
1.8 MoviePy resource leaks app/pipeline/step_05_edit.py DONE VideoFileClip, AudioFileClip, TextClip are never closed. File handles and FFmpeg subprocesses leak.
1.9 _extract_json() duplicated in 3 files app/utils/json_utils.py (new) DONE Identical function copy-pasted in 3 step files.
1.10 LLM provider rebuilt on every step app/pipeline/orchestrator.py + step files DONE build_llm_provider(cfg.llm) called in each step independently.

P1 Implementation Details

1.1 — Parallel step execution (Visuals + Audio)
- orchestrator.py: added StepEntry = Union[PipelineStep, list[PipelineStep], tuple[PipelineStep, ...]] type
- New _run_parallel() method uses ThreadPoolExecutor(max_workers=len(active))
- Each parallel step gets a shallow dict(ctx) copy; results merged after all complete
- tasks.py: steps list now uses (VisualsStep(), AudioStep()) tuple for parallel group
- Specifics: Only steps 3+4 are parallelized (not 5+6) because SEO actually benefits from having the final video path, and Edit must complete before Upload. This is the safe, high-value parallel pair.
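The parallel-group mechanics above can be sketched as (step objects reduced to callables for illustration; names assumed):

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(steps, ctx: dict) -> dict:
    """Run a group of independent steps concurrently.

    Each step receives a shallow copy of ctx; returned dicts are merged
    back into ctx after all steps complete.
    """
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step, dict(ctx)) for step in steps]
        for future in futures:
            # .result() re-raises any step exception, failing the pipeline run.
            ctx.update(future.result())
    return ctx
```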

1.2 — Few-shot example in script prompt
- Added a complete 4-segment example JSON in _build_user() showing hook → segments → CTA structure
- Example uses a realistic AI-medical topic with proper word counts per segment
- Specifics: Combined with P0.2 implementation in the same step_02_script.py rewrite

1.3 — Cumulative memory (reflection carries forward)
- step_09_reflect.py now calls get_channel_memory(topic) before reflection
- Previous memory passed to _reflect() as prev_memory parameter
- LLM prompt includes PREVIOUS CHANNEL INSIGHTS block with instruction: "UPDATE these insights — keep valid, revise contradicted, add new patterns"
- Specifics: Memory is still upserted (one record per topic), but the LLM sees prior insights and evolves them rather than starting fresh each time

1.4 — Comment data in reflection
- _build_video_data() now queries CommentReply records (up to 5 per video, latest first)
- entry["viewer_comments"] = list of comment texts (truncated to 150 chars)
- Added audience_requests: list[str] field to ChannelMemorySchema
- Reflection prompt updated: "If viewer_comments are included, extract recurring themes, questions, or requests"
- memory_store.py updated: get_memory_context() now outputs audience requests

1.5 — Memory-informed trend selection
- Both _run_static and _run_dynamic in step_01_trends.py now query channel memory
- Injects best_topic_keywords as "prefer similar angles" and avoid_topic_keywords as "avoid these"
- Also fixed hardcoded "60-second short" → uses cfg.channel.target_duration_sec
- All LLM select methods accept optional llm parameter with fallback

1.6 — Memory-informed SEO
- step_06_seo.py imports get_memory_context and injects into system prompt
- Memory block: "Channel performance insights (use to optimize tags/title): ..."
- Limited to 150 tokens to avoid overwhelming the SEO prompt
- Fixed hardcoded "American" → uses {region} from config

1.7 — Random seed for Pollinations
- Changed seed=42 → seed = random.randint(1, 999999)
- Cache key updated to include seed: f"pollinations_{keyword}_{w}x{h}_{seed}"
- Specifics: Old cache lookup by keyword-only key was removed because it would always return stale results. Each generation now gets a unique image. Trade-off: no deduplication for same keyword within a run, but this is acceptable since segments should have different keywords anyway.
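The seed and cache-key change can be sketched as (key format from the notes above; function name assumed):

```python
import random

def pollinations_cache_key(keyword: str, w: int, h: int) -> tuple[str, int]:
    """Generate a per-call seed and a seed-aware cache key, so repeated
    requests for the same keyword no longer hit a stale cached image."""
    seed = random.randint(1, 999999)
    return f"pollinations_{keyword}_{w}x{h}_{seed}", seed
```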

1.8 — MoviePy resource cleanup
- run() method wrapped in try/finally block
- All clips tracked in _open_clips: list throughout the method
- Finally block closes all clips in reverse order (composite first, then sources)
- Specifics: MoviePy clips don't support Python context managers (with), so explicit tracking + finally is the correct pattern. Reverse order ensures composite clips are closed before their source clips.
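The finally-block cleanup might be sketched as (helper name assumed):

```python
def close_clips(open_clips: list) -> None:
    """Close tracked clips in reverse order: composites first, then sources.

    Per-clip errors are swallowed so one failed close does not leak
    the remaining file handles and FFmpeg subprocesses.
    """
    for clip in reversed(open_clips):
        try:
            clip.close()
        except Exception:
            pass
```

In step_05_edit.py this runs inside the finally block of run(), over the _open_clips list accumulated through the method.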

1.9 — Shared extract_json() utility
- Created app/utils/json_utils.py with single extract_json(text) -> str function
- Handles: markdown code blocks (```json ... ```), raw JSON objects, passthrough
- Removed duplicate copies from step_02_script.py, step_06_seo.py, step_09_reflect.py
- All three files now import: from app.utils.json_utils import extract_json as _extract_json
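The shared utility might look like this minimal sketch (regexes are illustrative; the real implementation may differ):

```python
import re

def extract_json(text: str) -> str:
    """Extract a JSON payload from an LLM response.

    Order of attempts: strip a markdown code fence if present, otherwise
    return the first {...} span, otherwise pass the text through as-is.
    """
    fence = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if fence:
        return fence.group(1)
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        return brace.group(0)
    return text.strip()
```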

1.10 — Single LLM provider instance per pipeline run
- orchestrator.py creates LLM once: ctx["llm"] = build_llm_provider(self.cfg.llm)
- All step files use pattern: llm = ctx.get("llm") or build_llm_provider(cfg.llm)
- Specifics: The or build_llm_provider() fallback preserves backward compatibility — steps can still be used standalone (e.g., test_pipeline.py --step seo) without the orchestrator. The _FallbackLLMProvider wrapper (P0.4) means the shared instance already has fallback baked in.


Priority 2 — Moderate (reliability/correctness)

# Issue File Line Description
2.1 No per-step retry logic app/pipeline/step_base.py Steps have no retry mechanism. Celery retries the entire pipeline (max_retries=2), wasting completed work.
2.2 No Celery task timeout app/scheduler/tasks.py 86 If any step hangs, worker is blocked indefinitely. Need time_limit=3600.
2.3 Visual fetching is sequential app/pipeline/step_03_visuals.py 29-46 For 5 segments, each API call takes 3-15s. ThreadPoolExecutor can parallelize these.
2.4 N+1 query problem in reflection app/pipeline/step_09_reflect.py 84-141 For each of N jobs, makes 3 separate queries (JobStep, Upload, Analytics). Should use JOINs.
2.5 String-based topic filter in memory API app/api/routes_memory.py 49-51 MemoryEvent.content.contains('"topic": "AI"') matches substrings. topic="AI" can delete records with topic="PAIR".
2.6 Memory history endpoint not filtered by topic app/api/routes_memory.py 23-28 Returns last 10 events globally, not for the requested topic.
2.7 OpenAI model name hardcoded app/providers/llm/openai_provider.py ~8 gpt-4o-mini is not configurable. Should be in LLMConfig.
2.8 shares field stores comments count app/analytics/youtube_fetcher.py 199 Misleading field name. Analytics.shares actually holds YouTube commentCount.
2.9 Reflection runs on same data repeatedly app/pipeline/step_09_reflect.py 56-68 No check against source_job_ids. Same videos analyzed every run after threshold.
2.10 Over-fetching in reflection query app/pipeline/step_09_reflect.py 60 limit(n * 2) but only recent[:n] used.
2.11 Step 09 overwrites LLM analytics fields app/pipeline/step_09_reflect.py 224-228 Manually replaces avg_views, performance_trend after LLM generation, making LLM's version pointless.
2.12 No MemoryEvent index app/db/memory_models.py 36-42 Queries on event_type scan full table. Needs index.

Priority 3 — Low (nice-to-have improvements)

# Issue File Line Description
3.1 Token estimation inaccurate app/memory/memory_store.py 75-77 1 token ≈ 4 chars is ±15% off. Can truncate mid-sentence. Should cut at sentence boundary.
3.2 No JSON mode for OpenAI provider app/providers/llm/base.py 11-15 complete_json() just appends text instruction. OpenAI supports response_format={"type": "json_object"}.
3.3 Comment reply prompt ignores channel.style app/analytics/youtube_comments.py 253-264 channel_style is passed to task but never used in the prompt.
3.4 Shorten/expand prompts lack strategy app/pipeline/step_02_script.py 131-155 "Shorten it" / "Expand it" without guidance on what to cut/add. Should specify preserve hook+CTA.
3.5 Keyword deduplication in memory app/memory/memory_store.py 44-46 Same keyword can appear in both best_topic_keywords and avoid_topic_keywords.
3.6 No memory freshness check app/memory/memory_store.py 30-40 Injects 30+ day old memory without warning.
3.7 Unbounded MemoryEvent growth app/memory/memory_store.py 103-107 No pruning. Table grows indefinitely.
3.8 Multiple db.commit() per pipeline step app/pipeline/orchestrator.py 48-95 4-5 commits per step. Can batch into 1-2.
3.9 Prompt injection via trend_keyword app/pipeline/step_02_script.py 94 User/LLM-supplied keyword inserted into prompt without sanitization.
3.10 FFmpeg subprocess no timeout app/utils/ffmpeg_utils.py ~71 subprocess.run() can hang on corrupted media.
3.11 Upload chunked streaming no timeout app/pipeline/step_07_upload.py 155 request.next_chunk() can stall indefinitely.
3.12 Analytics fetch not triggered on upload app/scheduler/tasks.py Videos wait 6h before first analytics fetch. Should trigger immediately after upload.

Priority 4 — Cosmetic / future enhancements

# Issue File Line Description
4.1 No A/B testing framework Can't compare two strategies (e.g., dramatic vs calm hooks).
4.2 No YouTube Analytics API integration app/analytics/youtube_fetcher.py Missing watch time, retention, avg_view_duration (requires different OAuth scope).
4.3 Trend detection too simplistic app/pipeline/step_09_reflect.py 166-181 Binary recent/older comparison. Sensitive to outliers.
4.4 Comment reply doesn't detect language app/analytics/youtube_comments.py Ukrainian comment gets English reply.
4.5 No per-keyword performance tracking in memory app/memory/memory_store.py Memory doesn't track which specific keywords drove views.
4.6 MemoryEvent table never read app/db/memory_models.py 36-42 Written to but never queried in pipeline logic. Dead audit trail.
4.7 SEO chapters field always empty app/pipeline/step_06_seo.py 73 Shorts don't have chapters. Remove from schema.
4.8 bottom_bar subtitle style not implemented app/pipeline/step_05_edit.py Config allows it but only karaoke positioning exists.
4.9 Causality markers for memory insights Track which insights were used per video to measure their impact.
4.10 LLM response caching via Redis Same/similar prompts re-sent without cache. Redis already available.

Implementation Summary

Implemented: 14 items (P0: 4/4, P1: 10/10)
Remaining: 34 items (P2: 12, P3: 12, P4: 10)

Files Modified

File Changes
app/providers/llm/g4f_provider.py P0.1 — temperature/max_tokens forwarding
app/providers/llm/factory.py P0.4 — _FallbackLLMProvider wrapper class
app/analytics/youtube_comments.py P0.3 — config-aware _build_llm()
app/pipeline/step_01_trends.py P1.5 — memory-informed trend selection, dynamic duration
app/pipeline/step_02_script.py P0.2 + P1.2 — i18n prompts, few-shot example, dynamic segments
app/pipeline/step_05_edit.py P1.8 — clip tracking + finally cleanup
app/pipeline/step_06_seo.py P1.6 — memory-informed SEO, region-aware prompt
app/pipeline/step_09_reflect.py P1.3 + P1.4 — cumulative memory, comment analysis, audience_requests
app/pipeline/orchestrator.py P1.1 + P1.10 — parallel steps, shared LLM instance
app/scheduler/tasks.py P1.1 — parallel step group (VisualsStep(), AudioStep())
app/memory/memory_store.py P1.4 — audience_requests in memory context
app/providers/visuals/pollinations_provider.py P1.7 — random seed per generation

Files Created

File Purpose
app/utils/json_utils.py P1.9 — shared extract_json() utility
docs/OPTIMIZATION_PLAN.md This document

Key Design Decisions

  1. Parallel steps limited to Visuals+Audio only — SEO could theoretically run parallel with Edit, but it benefits from having the final output path, and the time savings would be marginal compared to the Visuals+Audio pair (which involves network I/O).

  2. ctx.get("llm") or build_llm_provider() pattern — Preserves backward compatibility for standalone step usage (test_pipeline.py --step X) while eliminating redundant instantiation in normal pipeline runs.

  3. Pollinations cache key includes seed — Breaking change from the old pollinations_{keyword}_{size} key. Old cached images won't be hit, but this is the correct behavior since the whole point is generating unique images.

  4. Cumulative memory via LLM, not merge — Rather than programmatically merging old+new memory dicts (brittle, loses nuance), the previous memory is passed to the LLM with explicit instructions to update/revise/keep. This lets the LLM make qualitative judgments about what's still valid.

  5. MoviePy cleanup via tracked list + finally — MoviePy clips don't implement __enter__/__exit__, so context managers aren't an option. Reverse-order closing (composite → source) prevents errors from closing source clips that are still referenced.