The System Behind 300 Assets Across 20 LatAm Markets: Latin Fusion Engine Case Study

The constraint architecture solved the quality problem, but scale was a different challenge entirely.

Producing content across 20+ markets without losing consistency required more than better prompts and tighter rules. It required a production system, and the first attempt at building one revealed the next problem immediately.

The Master GPT That Did Not Work

The first instinct was to build one GPT that handled everything: lyrics, video prompts, SEO metadata. Feed it a brief, get a finished asset back. It did not work, for two reasons.

When a single model is asked to context-switch between writing genre-specific lyrics, describing visual scenes, and formatting search metadata, it gets confused and starts guessing. The lyrics would be strong but the video prompt would drift. Fix the metadata and the lyrics would break. Every adjustment destabilized something else, and the more tasks the model held at once, the more quality degraded across all of them.

The first assembly prompt had another problem. It was limited to one song at a time, so if the track did not land, the entire creative output went with it.

The Assembly Line

The solution was splitting the work across three specialized GPTs.

The Lyricist writes lyrics. The Creative generates image and video prompts and visual direction. The Metadata GPT handles titles, descriptions, and tags. Each does one job and none of them interfere with each other.

It is worth being clear about what that means in practice. These GPTs do not make autonomous decisions or hand off work to each other independently. I decide what gets fed in, review what comes out, and make changes before sending anything back for reprocessing. Not every output is usable, but having the GPTs produce multiple tracks simultaneously means I can QA a batch quickly, flag what needs fixing, and run it again. The speed is what makes that practical.

That separation made batching possible. Ten songs could run through the Lyricist simultaneously, then feed into the Creative GPT all at once, and the production volume that a single master GPT could not handle became manageable across a modular system.
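A minimal sketch of that batching flow, assuming each GPT can be treated as a function from text to text. The function names, the Asset structure, and the approve hook are hypothetical stand-ins; in the actual workflow each stage is a separate custom GPT and the review step is a person, not code. Feeding the Metadata stage from the lyrics is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-ins for the three specialized GPTs.
def lyricist(brief: str) -> str:
    return f"[lyrics for: {brief}]"

def creative(lyrics: str) -> str:
    return f"[visual prompts for: {lyrics}]"

def metadata(lyrics: str) -> str:
    return f"[title, description, tags for: {lyrics}]"

@dataclass
class Asset:
    brief: str
    lyrics: str
    visuals: str
    meta: str

def run_stage(stage: Callable[[str], str], inputs: List[str]) -> List[str]:
    """Run one specialized stage over a whole batch."""
    return [stage(item) for item in inputs]

def run_batch(briefs: List[str],
              approve: Callable[[str, str], bool] = lambda brief, lyrics: True) -> List[Asset]:
    """Lyricist first, human approval next, then the downstream stages."""
    drafts = run_stage(lyricist, briefs)
    # Human-in-the-loop gate: only approved lyrics move forward.
    approved = [(b, l) for b, l in zip(briefs, drafts) if approve(b, l)]
    lyrics = [l for _, l in approved]
    visuals = run_stage(creative, lyrics)
    metas = run_stage(metadata, lyrics)
    return [Asset(b, l, v, m) for (b, l), v, m in zip(approved, visuals, metas)]

assets = run_batch(["cumbia brief", "champeta brief"])
```

Because each stage only sees its own input, changing one stage's instructions cannot destabilize the others, which is the property the single master GPT lacked.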

Quality Control

Three layers of quality control were built into the pipeline.

Layer one: the QA GPT

The first layer was a dedicated QA GPT with one job: audit Lyricist outputs and flag forced slang, mismatched references, and instrumentation errors before anything moved forward. It acts as an automated check between generation and publishing, catching errors before they reach an audience.

Layer two: manual review

The second layer was a manual review. Because the modular system standardized outputs, anything off stood out immediately. The consistency also made pattern recognition practical: ten consecutive tracks carrying the same slang misuse, or a genre drift repeating across the same subgenre. When a pattern emerged, the fix went directly back into the GPT instructions rather than being corrected track by track. Identifying systematic errors and updating the instructions accordingly is what kept the system improving rather than just running.
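The pattern-spotting step can be sketched as a simple count, assuming QA flags are logged as (track, subgenre, issue) records. The record shape, the issue labels, and the threshold are all hypothetical; in the actual workflow this was done by eye during manual review.

```python
from collections import Counter

# Hypothetical QA flags: (track_id, subgenre, issue).
flags = [
    ("t1", "champeta", "forced_slang"),
    ("t2", "champeta", "forced_slang"),
    ("t3", "champeta", "forced_slang"),
    ("t3", "champeta", "forced_slang"),   # duplicate flag on the same track
    ("t4", "caporales", "instrumentation"),
]

def systematic_issues(flags, min_tracks=3):
    """Return (subgenre, issue) pairs flagged on at least min_tracks distinct tracks."""
    distinct = set(flags)  # count each track once per issue
    counts = Counter((subgenre, issue) for _, subgenre, issue in distinct)
    return {key: n for key, n in counts.items() if n >= min_tracks}

patterns = systematic_issues(flags)
```

Anything that clears the threshold is a candidate for a change to the GPT instructions; anything below it gets fixed on the individual track.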

Layer three: the audience

Comments are the third layer and the most direct signal of what is actually working.

Some songs get a wall of emojis and some get detailed feedback. I respond to everything. When a listener is genuinely curious about a sound, tells me what they like, and points me toward examples, I make a track based on that and send it back. If it lands, a new fan. If someone just complains without engaging, a polite redirect to other videos on the channel.

The AI is what makes this feasible: without the GPT pipeline, responding to audience requests at this pace would not be possible. The comments close the loop between what the system produces and what the audience actually wants.

[SSS: Example Champeta track and Champeta 70s version]

The Thumbnail Problem

One problem that took longer to solve than expected was getting the artist prompt to format text correctly on thumbnail images. Early iterations failed consistently, and the workaround was copying and pasting images of successful prompts into new chats to show the model what had worked, then running it again. New chats helped but did not fully solve it. The outputs improved meaningfully once Gemini upgraded from the thinking to the pro level.

What made that upgrade interesting was being able to see the model’s reasoning in detail before it collapsed into an output. When something failed, I copied the thinking log into Claude and explained the problem, and Claude came back with a refined prompt targeting the specific failure. Using two models in combination, one to generate and one to debug, became a standard part of the workflow. Not elegant, but it worked.

The Knowledge Problem

The Master Prompt defined the creative boundaries for each genre, but it needed reliable source material to draw from.

The models carried bias, defaulted to the most common sounds in their training data, and occasionally filled gaps with plausible but incorrect details. Building a proprietary Genre Database and feeding it into the GPTs addressed that directly. Rather than relying solely on training data, the GPTs now pull from a structured knowledge base before generating output. This approach is called retrieval-augmented generation (RAG), and it is what keeps outputs grounded in actual regional data rather than the model’s best approximation.

What the Genre Database contains

The Genre Database maps hundreds of LatAm music genres and subgenres across every relevant dimension: instrumentation, regional context, target audience, visual aesthetic, and market demand signals by country. When the GPT generates output for a Bolivian Caporales track, it references a structured dataset specifying the bells on boots, the brass arrangements, the high-altitude visual palette, and the audience profile. The same applies across every genre in the pipeline.

The Master Prompt set the rules, the Genre Database supplied the ground truth, and the GPT executed within both.
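A minimal sketch of that layering, assuming the Genre Database can be represented as a lookup table keyed by genre. The field names, the rule text, and the build_prompt helper are hypothetical, and the Caporales entry contains only the details mentioned above. This uses an exact-key lookup for simplicity; it is not a description of how the custom GPTs retrieve from their knowledge files.

```python
# Hypothetical slice of the Genre Database.
GENRE_DB = {
    "caporales": {
        "region": "Bolivia",
        "instrumentation": ["bells on boots", "brass arrangements"],
        "visual_aesthetic": "high-altitude visual palette",
    },
}

# Hypothetical rule text standing in for the Master Prompt.
MASTER_PROMPT = "Stay within the genre facts provided. Do not invent instruments or references."

def retrieve(genre: str) -> dict:
    """Look up the genre record; fail loudly instead of letting the model guess."""
    key = genre.strip().lower()
    if key not in GENRE_DB:
        raise KeyError(f"No Genre Database entry for {genre!r}")
    return GENRE_DB[key]

def format_value(value) -> str:
    return ", ".join(value) if isinstance(value, list) else str(value)

def build_prompt(genre: str, brief: str) -> str:
    """Rules first, retrieved facts second, the creative brief last."""
    record = retrieve(genre)
    facts = "\n".join(f"- {name}: {format_value(value)}" for name, value in record.items())
    return f"{MASTER_PROMPT}\n\nGenre facts ({genre}):\n{facts}\n\nBrief: {brief}"

prompt = build_prompt("Caporales", "festival anthem")
```

Raising an error on a missing genre is a deliberate choice in this sketch: a gap in the database should surface as a gap, not be papered over by the model's training data.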

The Human in the Loop

The system does not run itself, because content decisions, creative direction, and campaign management all require someone watching the data and acting on it. AI handled the production volume. Writing lyrics, generating visuals, producing metadata, auditing outputs: tasks that would have taken a team working sequentially could be batched and run in parallel. That is what made it possible to run 20+ markets, hundreds of assets, and a live advertising campaign as a one-person operation. The automation is real, but so is the ongoing human review that keeps it on track.

On the advertising side, the focus is on maximizing conversions and minimizing CPA on the Google Ads platform. Looking at other metrics produced inconsistent results, while conversions gave a clean, reliable signal for where to put budget.
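For reference, CPA here is spend divided by conversions. A small sketch of comparing markets on that basis follows; the market names and all figures are made up for illustration and are not campaign data.

```python
def cpa(cost: float, conversions: int):
    """Cost per conversion; None when there are no conversions to divide by."""
    return cost / conversions if conversions > 0 else None

# Hypothetical numbers, for illustration only.
campaigns = {
    "market_a": {"cost": 50.0, "conversions": 25},
    "market_b": {"cost": 40.0, "conversions": 10},
    "market_c": {"cost": 15.0, "conversions": 0},
}

def rank_by_cpa(campaigns):
    """Return markets ordered from cheapest to most expensive conversion, plus raw scores."""
    scored = {name: cpa(c["cost"], c["conversions"]) for name, c in campaigns.items()}
    ranked = sorted((n for n, v in scored.items() if v is not None), key=lambda n: scored[n])
    return ranked, scored

ranked, scored = rank_by_cpa(campaigns)
```

Markets with spend but no conversions are left out of the ranking rather than scored as zero, so they show up as a separate question for review.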

On the YouTube side, the focus is on watch time: which videos are generating the most hours, and how that tracks against where conversions are coming from. The two tend to align well. Earned subscribers, meaning people who subscribe without being directly paid for, create compounding watch time opportunities without additional spend and are tracked separately from paid acquisitions.

The system held because the architecture was modular, the knowledge base was specific, and the human stayed in the loop throughout. Three hundred assets later, the pipeline was ready for a real market test.

Post 3 covers what the data showed when it ran.