Retro Alpha

Building Retro Alpha: A 90s CRT Stock Market Game for the Build Small Hackathon

Authors: Sankalp H S & Sathvik A R  
Project: Retro Alpha — Build Small Hackathon  
GitHub: sankalphs/Retro-Alpha
Models & Datasets:


What If Zerodha Existed in 1994?

The whole idea for Retro Alpha started with a fun, slightly ridiculous thought experiment: what would a modern trading platform look like if you had to boot it up on a bulky CRT monitor in the mid-90s? We pictured green phosphor text, scanlines, and a terminal where you half-expect a > CONNECTING TO NSE VIA 2400 BAUD MODEM... prompt to flicker across the screen.

But we didn't just want to build a nostalgic UI. We wanted a game with actual stakes. The premise is simple: you start with ₹10,00,000 in April 1994. Your goal is to double it over 10 simulated years. Along the way, you have to navigate 22 historically accurate Indian market events—everything from the Asian Financial Crisis and Pokhran-II to the Dot-com bubble and the 2004 elections.

The real challenge, though, was under the hood. For the Hugging Face Build Small Hackathon, we had to run this entire experience off a tiny, fine-tuned 4B parameter model, served locally without relying on massive closed-source cloud APIs.

Here is the story of how Sathvik and I pulled it off, the walls we hit, and what building with constrained AI actually taught us.


Getting Off the Ground

When we kicked off the project, we started by generating a ton of synthetic data to teach our tiny model how to behave. We used MiniMax-M3 to spin up about 1,400 rows of agent decisions, news impacts, and mentor reviews.

We quickly learned that forcing our small model to output strict JSON was a losing battle. It kept emitting internal <think> blocks that broke the JSON formatting. Instead of fighting it, we pivoted to a simple colon-delimited structured text format, which is far more forgiving to parse when stray text shows up in the output. Sometimes, the right engineering choice is just whatever actually works reliably.
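To make the colon-delimited idea concrete, here is a minimal sketch of how such output could be parsed. The field names (ACTION, TICKER, QUANTITY, REASON) and the function name are hypothetical, since the real schema isn't shown here; the point is only that a line-by-line parser can skip stray text that would break a JSON parser.

```python
# Hypothetical field names; the actual schema used in the game may differ.
EXPECTED_FIELDS = ("ACTION", "TICKER", "QUANTITY", "REASON")


def parse_colon_record(text: str) -> dict:
    """Collect 'KEY: value' lines and ignore every line that is not a known field."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons themselves.
        key, sep, value = line.partition(":")
        key = key.strip().upper()
        if sep and key in EXPECTED_FIELDS and key not in record:
            record[key] = value.strip()
    return record


sample = (
    "<think>hmm</think>\n"
    "ACTION: BUY\n"
    "TICKER: INFY\n"
    "QUANTITY: 100\n"
    "REASON: Strong results: IT exports up\n"
)
parsed = parse_colon_record(sample)
```

A stray reasoning line is simply dropped here, whereas the same line in front of a JSON object would fail the whole parse.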

Armed with our dataset, we fine-tuned unsloth/NVIDIA-Nemotron-3-Nano-4B using Unsloth 16-bit LoRA on a Modal A100. We exported it as a neat little 2.84 GB GGUF file and pushed it to the Hub. We had the brain; now we needed the body.

The Deployment Nightmares

Building locally is fun until you try to deploy. We hit a massive wall when we pushed our code to Hugging Face Spaces. The Space booted up, but the LLM was completely dead.

The logs threw a cryptic libc.musl-x86_64.so.1 not found error. It turned out the prebuilt llama-cpp-python wheel we were using was compiled for Alpine Linux, which uses musl libc, while our container was running Debian, which uses glibc, so the shared library the wheel expected simply wasn't there. It cost us hours of head-scratching, but the fix was just swapping a flag in our Dockerfile to force it to compile from source. Lesson learned: when it comes to C++ bindings across Linux distributions, don't assume a prebuilt wheel matches your container's libc. Take the extra 30 minutes to build from source.

Around the same time, we realized Gradio wasn't going to cut it for our frontend. We wanted total creative control over the CRT aesthetic: the screen curvature, the phosphor glow, the flicker. Gradio is amazing, but it isn't built for that kind of fully custom frontend. So, we ripped it out and migrated the whole backend to pure FastAPI, serving our custom vanilla HTML, CSS, and JS as static files.

Taming the AI and Building Fallbacks

One of the funniest (and most frustrating) parts of the build was dealing with Nemotron’s personality. The model had this unbreakable habit of thinking out loud before actually answering. It would output things like: "<think> The user has lost 15% of their portfolio. I should be concerned but professional. </think>"

The model we fine-tuned had been trained to reason step by step, and our LoRA didn't un-teach that behavior. We ended up having to write an arsenal of about 20 different regex patterns just to catch and strip out the model's internal monologues, system prompt echoes, and markdown formatting before any of it hit the user's screen.
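A stripped-down sketch of that kind of sanitizer is below. These four patterns are illustrative stand-ins, not the actual list from the project, and the function name sanitize is made up for this example.

```python
import re

# Illustrative patterns only; the real project uses roughly 20 of its own.
_PATTERNS = [
    # A complete <think>...</think> block, possibly spanning several lines.
    re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE),
    # An unclosed <think> block: drop everything from the tag to the end.
    re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE),
    # Role labels echoed at the start of a line, e.g. "Assistant: ".
    re.compile(r"^\s*(system|assistant|user)\s*:\s*", re.IGNORECASE | re.MULTILINE),
    # Markdown emphasis or code markers, and leading heading hashes.
    re.compile(r"[*`]+|^#+\s*", re.MULTILINE),
]


def sanitize(raw: str) -> str:
    """Remove reasoning blocks, role echoes, and markdown, then tidy whitespace."""
    text = raw
    for pattern in _PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


cleaned = sanitize(
    "<think> The user has lost 15% of their portfolio. "
    "I should be concerned but professional. </think>\n"
    "**Ouch.** You are down 15%."
)
```

Order matters in a list like this: the closed-block pattern runs before the unclosed one, otherwise the greedy fallback would swallow the real answer that follows a properly closed block.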

We also realized early on that a 4B model cannot be the sole brain of a complex game. We treated the LLM strictly as a language interface. The trading engine, the portfolio math, the Sharpe ratio calculations—that is all hard-coded, deterministic Python. The LLM is just there to give our NPC traders personality and to act as your sarcastic finance mentor.
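As an example of what "deterministic Python" means here, this is a minimal sketch of a Sharpe ratio calculation. It assumes monthly returns and annualizes with the square root of 12; the function name, the zero default risk-free rate, and the choice of monthly data are our assumptions for illustration, not necessarily what the game engine does.

```python
import math
import statistics


def sharpe_ratio(monthly_returns, monthly_risk_free=0.0):
    """Annualized Sharpe ratio from a list of monthly returns (as fractions)."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    if len(excess) < 2:
        return 0.0  # not enough data for a sample standard deviation
    spread = statistics.stdev(excess)
    if spread == 0:
        return 0.0  # flat returns: avoid dividing by zero
    # Mean excess return per unit of volatility, scaled from monthly to yearly.
    return statistics.mean(excess) / spread * math.sqrt(12)


ratio = sharpe_ratio([0.01, 0.03])
```

Nothing in this path touches the model, so the same inputs always give the same score no matter what the LLM says.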

Because of this architecture, we were able to build bulletproof fallbacks. If the Modal GPU spins down or the local inference fails, the game doesn't crash. The engine still calculates your returns, the mock mentor still roasts your performance, and the market keeps moving.
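The fallback pattern can be sketched in a few lines. Everything here is hypothetical naming (mentor_review, mock_mentor, the llm_call parameter) and the mock's wording is invented; it only illustrates the shape of "try the model, fall back to canned text".

```python
from typing import Callable, Optional


def mock_mentor(return_pct: float) -> str:
    """Deterministic stand-in used when no model is reachable (wording is made up)."""
    if return_pct < 0:
        return f"Down {abs(return_pct):.1f}%. Bold strategy."
    return f"Up {return_pct:.1f}%. Do not get cocky."


def mentor_review(return_pct: float,
                  llm_call: Optional[Callable[[float], str]]) -> str:
    """Ask the model for a review; use the mock if it is missing, fails, or is empty."""
    if llm_call is not None:
        try:
            reply = llm_call(return_pct).strip()
            if reply:
                return reply
        except Exception:
            pass  # any inference failure falls through to the mock below
    return mock_mentor(return_pct)


# Simulate a dead backend with a callable that always raises.
review = mentor_review(-15.0, lambda r: 1 / 0)
```

Because the return percentage is computed by the engine before this function is ever called, a failed model call costs the player some personality and nothing else.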


Looking Back: Was It Worth It?

We spent a grand total of about $0.23 on inference costs while building this, working alongside the opencode AI assistant over a dozen chat sessions. For twenty-three cents, we built a fully functional, containerized, heavily tested (over 120 tests!) web game that runs a custom LLM.

This hackathon was a deliberate counter-movement to the "bigger is better" hype in AI right now. Working with a 4B parameter model forces you to be a better engineer. You can't just throw compute at a sloppy prompt; you have to build clever architectures, sanitize your text rigorously, and design systems that degrade gracefully.

For us, Retro Alpha shows that small AI is not just a compromise. It's a completely viable, incredibly fun way to build software.


Built for the Hugging Face Build Small Hackathon — a celebration of constrained, thoughtful, local-first AI engineering.
