NWBIdeas.com

Adventures in AI-Powered Growth Marketing

From Slot Machine to Content Engine: How Constraint Architecture Fixed AI’s Default Bias | Latin Fusion Engine Case Study, Post 1 of 3

Posted on March 27, 2026

It started with a bet.

A conversation about AI-generated content on LinkedIn raised a question worth testing: could AI produce something people actually wanted to watch, or would audiences see through it immediately? The challenge was straightforward. Build a music video using AI tools, post it, and find out.

Why Music

Music was the practical choice.

Most products require scoping, building, and testing before a single data point comes in. With Suno and Veo, a finished track with visuals can be produced quickly, which meant new content variations could be tested fast without a production team or a significant budget behind them.

The demand was already established. Listeners had clear tastes and platforms had the targeting data to reach them. No audience education phase. No product to scope. No market to create.

Fast production plus a pre-existing addressable market made music the right place to run this experiment.

Worth noting upfront: AI-generated music is not new territory. YouTube is full of it and audiences know exactly what they are looking at. The channel was transparent about being an AI project from the start. That context matters because the results were not driven by novelty. Audiences engaged knowing what the content was.

The Test

To start the test, the prompt was simple: a cowboy techno song. The expectation was something country-adjacent with electronic production. What Suno returned was a quirky track about Space Cowboys in a Technicolor prairie, fighting bad guys, visually specific in a way that was not asked for. The lyrics were strong enough to feed directly into Veo for the visuals, and the Space Cowboy video came together from that accident.

[SSS: Space Cowboy video embed]

The video did something useful. It generated consistent, measurable engagement. YouTube analytics showed exactly where attention was captured and where it was lost. For a growth marketer, that is the same feedback loop as iterating on ad copy. Change the product, run it again, measure the result. I was hooked.

But the Space Cowboy worked because the output was accidentally specific. The prompt asked for one thing and got another. There was no way to reproduce that deliberately or predict when the next good accident would happen. That gap, between an accidentally good output and a consistently good system, was the real problem worth solving.

From Space Cowboys to Latin Fusion

The Space Cowboy proved that specific content holds attention. The logical next step was to test whether that held at scale. Electronic music felt like the safe starting point: universal, language-agnostic, and fast to produce, with no genre expertise required and quick iteration built in.

Ten tracks went out across different hooks and fusions. All of them failed. The space was too broad and too saturated to gain traction without significant budget behind it.

The data showed something more interesting than the failures. Latin content, tested as part of the broader EDM experiments, was consistently outperforming every other variation on average view duration. The audience was staying longer, engaging more, and coming back.

[SSS: Paraiso video embed]

Two things made Latin the obvious next move. The engagement signal was already there, and CPMs across LatAm markets are significantly cheaper than in North America or Europe, which meant aggressive testing across multiple markets was possible without burning through budget. More tests, faster feedback, lower cost per data point. That combination pointed directly at a focused Latin music experiment.

The Translation Trap

Reaching 20+ markets meant the lyrics needed to sound authentic to each genre and region. Proper use of country-specific speech patterns and slang was not optional. It was the product.

The first AI-assisted Latin music outputs read like corporate PR dressed up in Spanish. The lyrics did not sound the way a person speaks or sings; they lacked emotion and specificity.

Adding regional slang to the prompt made it worse. The model overcorrected immediately, recycling the same five words across every genre, whether the target was Peruvian Huayno or Mexican Corridos Tumbados. The slang overpowered the lyrics, making the output sound like an executive trying too hard to be cool. “67” anyone?

[SSS: Fellow kids skateboard reference]

The default bias

The bias ran deeper than the language. Ask for something energetic and the model defaulted to Phonk, not because it was the right creative call, but because Phonk dominates global trend data and the model follows volume. This is a well-documented behavior in large language models, sometimes described as training data gravity: the model defaults to whatever is statistically dominant in its training data, regardless of what the prompt asks for.

The early fix was writing prompts in Spanish. Giving the model instructions in the target language produced noticeably more authentic output than prompting in English and asking for Spanish results. The model was essentially translating twice, the instruction and the output, and prompting in Spanish removed that penalty.

That workaround became unnecessary as the models improved. Cross-language instruction handling got significantly better and English prompts started producing better native Spanish output. But in the early stages, language of instruction mattered as much as the instruction itself.

The Phonk default had an unexpected side effect worth noting. It accidentally exposed a genuinely engaged Mexican and Argentinian Phonk audience. Rather than fighting the bias entirely, a dedicated Phonk playlist was built around it, including regional fusions like Phonk Requinto and Phonk RKT that the system could produce consistently. The guardrails were not just about suppressing the default but about channeling it into something intentional while keeping it from bleeding into the other genres.

[SSS: RKT Nave Blindada and Mexican Phonk embeds]

The Instrument Problem

The language problem was visible from the first output. The instrument problem took longer to catch.

Genres from Colombian Vallenato to Bolivian Caporales to Argentinian Malambo use completely different instrumentation. Early prompts were loose: “create a Vallenato song about X.” That created problems immediately.

Any flute instruction, regardless of how specifically the regional context was described, came back sounding like an Irish festival. The model’s training data skewed heavily toward Celtic folk music and overrode everything else. Call it hallucination by association: the model fills in gaps using its most statistically common reference points rather than the specific context provided.

The fix required removing generic instrument names from prompts entirely. Not “flute” but breathy Quena and Zampoña playing a melancholic melody. Mexican Guitarrón instead of bass guitar. Colombian Gaitas instead of woodwind. Regional name paired with regional context in every single prompt, without exception. Specificity consistently worked but specificity alone was not enough.
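That substitution can run automatically before a prompt ever reaches the model. A minimal sketch; the mapping below is illustrative, not the project’s actual list:

```python
# Illustrative mapping of generic instrument names to regional specs.
# These entries are examples, not the project's actual list.
REGIONAL_INSTRUMENTS = {
    "flute": "breathy Quena and Zampoña playing a melancholic melody",
    "bass guitar": "Mexican Guitarrón holding down the low end",
    "woodwind": "Colombian Gaitas carrying the lead line",
}

def specify_instruments(prompt: str) -> str:
    """Swap generic instrument names for regional name-plus-context specs."""
    for generic, regional in REGIONAL_INSTRUMENTS.items():
        prompt = prompt.replace(generic, regional)
    return prompt

prompt = specify_instruments("create a Vallenato song with flute and bass guitar")
```

Run against a loose prompt like the one above, “flute” becomes the Quena and Zampoña spec before the model ever sees the generic word.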

The Architecture Fix

Better prompting improved outputs but did not solve all the problems; new ones surfaced after each iteration. The slang fix caused the model to overindex on slang in the lyrics, which led to a rule of no more than 5% slang in any output, with natural language as the default. Each failure pointed to a specific gap that required a specific fix. The real solution was not better prompts. It was better architecture.
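The 5% rule is mechanically checkable. A minimal sketch, assuming a per-region slang vocabulary; the function names and word handling here are invented for illustration:

```python
# Sketch of the 5% slang ceiling; the slang vocabulary is supplied
# per region and is a placeholder here.
MAX_SLANG_RATIO = 0.05

def slang_ratio(lyrics: str, slang_words: set) -> float:
    """Fraction of tokens in the lyrics that are known slang terms."""
    tokens = [w.strip(".,!?¡¿").lower() for w in lyrics.split()]
    if not tokens:
        return 0.0
    return sum(t in slang_words for t in tokens) / len(tokens)

def within_limit(lyrics: str, slang_words: set) -> bool:
    """True when slang stays at or under the ceiling."""
    return slang_ratio(lyrics, slang_words) <= MAX_SLANG_RATIO
```

An output that fails the check goes back for regeneration rather than shipping with slang as the main ingredient.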

The Constraint Layers

The prompt design was an iterative process with no clean endpoint. Each failure got its own constraint layer:

  • The Phonk default required a banned terminology list. Without explicitly blocking the model’s most common outputs, it defaulted to whatever dominated global trend data, regardless of the target genre.
  • The slang overcorrection required a regional slang-to-lyrics ratio. Without a defined limit, the model treated slang as the point rather than the seasoning.
  • The emotional flatness required anchoring every track to universal themes: love, betrayal, hustle, joy. Without that anchor, the lyrics were technically correct and completely inert.
  • The instrumentation problem required native-language instrument specs for every prompt.

Each layer addressed a failure that had already shown up in the outputs and nothing was added speculatively.
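One way to picture how the layers combine is to fold them into the generation prompt itself. The sketch below is illustrative: the genre rules, banned terms, and wording are placeholders, not the project’s actual configuration.

```python
# Placeholder per-genre rules: every value here is illustrative,
# not the project's real configuration.
GENRE_RULES = {
    "vallenato": {
        "banned_terms": ["phonk", "trap 808s"],   # block the global default
        "max_slang_ratio": 0.05,                  # slang as seasoning, not the point
        "theme": "love and betrayal",             # emotional anchor
        "instruments": "accordion, caja vallenata, guacharaca",
    },
}

def build_prompt(genre: str, topic: str) -> str:
    """Fold every constraint layer into one generation prompt."""
    rules = GENRE_RULES[genre]
    return (
        f"Write {genre} lyrics about {topic}. "
        f"Theme: {rules['theme']}. "
        f"Instrumentation: {rules['instruments']}. "
        f"Keep regional slang under {rules['max_slang_ratio']:.0%} of the words. "
        f"Never use: {', '.join(rules['banned_terms'])}."
    )
```

Adding a genre then means adding a rules entry, not rewriting prompts from scratch.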

Giving the Model a Role

Instead of issuing instructions to a general-purpose tool, each GPT was configured as a Master Lyricist for a specific genre. In technical terms this is called few-shot persona prompting: giving the model a defined role, examples of desired behavior, and boundaries that shape every output it produces. The constraints told the model what it could not do and the role told it what it was. Together they produced consistent, specific, repeatable output that held up against a real audience.
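In the chat-message format GPT-style APIs use, that setup looks roughly like the sketch below. The persona wording and the example exchange are invented placeholders, shown only to illustrate the role-plus-examples-plus-boundaries pattern.

```python
# Invented persona wording and few-shot example, for illustration only.
def persona_messages(genre: str, request: str) -> list:
    """Build a chat message list: persona, one few-shot example, then the request."""
    return [
        {"role": "system",
         "content": (f"You are a Master Lyricist for {genre}. "
                     "Write natural, emotionally grounded lyrics. "
                     "Never default to globally trending genres.")},
        # One few-shot exchange showing the desired register (placeholder text).
        {"role": "user", "content": f"Write a {genre} verse about longing."},
        {"role": "assistant", "content": "(example verse in the target register)"},
        {"role": "user", "content": request},
    ]

messages = persona_messages("Colombian Vallenato", "a verse about the river at dawn")
```

The system message carries the role and boundaries; the example exchange anchors the register before the real request arrives.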

What This Means

The tools are accessible to everyone. Suno, Veo, and every other AI production tool in this stack can be used by anyone with an internet connection, and the outputs can be fast, impressive, and occasionally surprising in the right direction.

But without a control layer, the results are random. The constraint architecture (the system prompt design, the personas, the rules) is what separates a content engine from a slot machine. Getting that architecture right was the foundation everything else was built on.

Post 2 covers how that foundation became a production system capable of running 20 markets simultaneously.

©2026 NWBIdeas.com | Design: Newspaperly WordPress Theme