From Chaos to Control: Overcoming OpenAI Uncertainties with Local Models

chatcraft.org is my open source project for working with GPT. It completely changed how I work. Cool thing about chatcraft is that it’s completely client-side, almost completely stateless(except for sharing) on serverside.

I am also working on a project that uses LLMs to help navigate a knowledge base. Making a wrong tech choice there could kill my project. Few thoughts from that perspective:

The 3 YOLOs of LLM development

“You only live once” (YOLO) is a modern adaptation of the Latin phrase “Carpe diem,” which means “Seize the day.”

GPT is not practical for applications that require predictability:

The generated output is highly sensitive to the combination of prompts, model parameters, and model versions. Unfortunately, OpenAI’s deprecation policy is “YOLO” in that they may deprecate the model you were using with not much of a notice. In general OpenAI feels like a data science “lets see what’s possible”, not an AWS-style “lets help customers ship production apps” organization.
OpenAI’s service level agreement (SLA) is also “YOLO” leaving you to rely on luck when it comes to your production app. Azure offers an alternative, but that’s a separate discussion.
Additionally, there is a significant amount of plumbing required to connect all the intelligent components together. Currently, Langchain is the best solution for this (nothing else comes close). However, I foresee a lot of breakages as it evolves due to the “let’s get this demo out ASAP” nature of the project so far.

These contradict the mindset of a production-oriented dev, who prefers to deploy an app and have it work predictably for decades.

Alternatives

Fortunately, there are a few promising alternatives just around the corner:

Local LLMs like MPT7-Instruct can be deployed on your own hardware, even without GPUs. They can perform some subset of GPT magic. Alternatively, you can host them with a company that provides an SLA, such as MosaicML. Thus if the model satisfies your needs, you can deploy in own environment indefinitely!
Various local models for embeddings, eg instructor
Basaran is an OpenAI compatibility layer that facilitates migration from the chaotic train of GPT. I think it makes perfect sense to develop apps with GPT to explore what’s possible. It’s nice to known that you have a migration path to a local LLM if you need to.
GGML is an incredibly efficient and flexible runtime for ML models. It’s amusing that C++ (which has caused me undue misery throughout my career) is more dependable than Python for ML deployments.
PostgresML is not yet mature enough for production use, but it aims to combine half a century of information theory behind databases with an SQL interface to Hugging Face models. This eliminates the need to deal with the dependency hell of most ML frameworks or the driver hell of interfacing with NVidia.

I understand that suggesting an exotic plugin for Postgres as a way to sleep soundly at night may sound like a joke. But I’m dead serious. There is something incredibly powerful about combining LLMs with SQL.

PostgresML

I have been working with PostgresML for a few weeks. I’m not certain if PostgresML will be the solution that successfully marries SQL and LLMs, but it provides a glimpse into the future. So far it’s one of the best ways to combine inputs/outputs in LLM pipelines. Being able to do that within a rich ecosystem of pre-existing postgres tooling is a huge win. Doing all this with Postgres’ aversion to dataloss is so nice. Oh, and I get a choice of local or serverless solution!

Another interesting aspect of the PostgresML value-prop is that they recognize that current embed + summarize-with-GPT wave is short-sighted. Longer-term apps will be structured as pipelines of dozens ML models feeding into each other. For this your data & LLMs need to cohabitate on the same server or network.

Even though PostgresML is a few months old, it’s Postgres, thus straight-forward to migrate away to another Postgres(or another SQL db) solution if needed.

Expectations for a Backend

SQL as a language for data transformation does not evoke warm fuzzies, but it’s less bad than waking up in cold-sweat with most of the alternatives.

I fully expect that a clever and reliable database vendor will combine SQL + GGML-like runtimes + frozen ML models, offer a way migrate SQL schemas together with ML models, and perhaps even figure out CI/CD for LLM backends.

Otherwise, every LLM project will be condemned to reinvent the wheel ineffectively, risk dying every time OpenAI does a rug-pull.

The 3 YOLOs of LLM development#

Alternatives#

PostgresML#

Expectations for a Backend#

The 3 YOLOs of LLM development

Alternatives

PostgresML

Expectations for a Backend