Semantic Router: Steer local LLMs for decision-making and more


At Aurelio Labs, we’ve open-sourced Semantic Router, a superfast decision-making layer that steers LLMs based on the semantic meaning of the input they receive. With the release of Semantic Router v0.0.16, users can now steer open source LLMs (like Mistral-7B-Instruct-v0.2) for chat, function calling, and more.

I’d like to demonstrate how I use Semantic Router with trending topics from Weibo to guide models such as Baichuan-7B and Mistral-7B to generate news articles with different tones of voice, depending on the story being reported. We will rely only on open source models and consumer hardware (an M1 Pro chip) to achieve this.

When building applications that make use of LLMs, the stochasticity of their output makes them unreliable for decision-making. However, we can use vector space similarity to make decisions deterministically based on our input.
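To build intuition before reaching for the library, here is the core idea in a few lines of NumPy. This is only a minimal sketch of routing by vector similarity (the embeddings and threshold below are made up), not Semantic Router’s actual implementation:

import numpy as np

def cosine_similarity(a, b):
    # cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pick_route(query_vec, route_vecs, threshold=0.5):
    # return the route whose embedding is most similar to the query, or None
    best_name, best_score = None, threshold
    for name, vec in route_vecs.items():
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# toy 3-dimensional "embeddings" for two routes
route_vecs = {
    "entertainment": np.array([0.9, 0.1, 0.0]),
    "politics": np.array([0.1, 0.9, 0.0]),
}
print(pick_route(np.array([0.2, 0.8, 0.1]), route_vecs))  # -> politics

Given the same embeddings, the same input always lands on the same route, which is what makes this usable as a decision layer.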

As an example, we have an LLM-powered journalist agent who is tasked with reporting on trending topics from Chinese social media. Because the range of topics discussed on social media varies greatly, so does our journalist’s tone of voice in their reporting. For example, we might want to use a more tabloid-style, light tone of voice when reporting on rumors from the entertainment industry, yet a more factual, objective tone of voice when reporting on politics.

We can use Semantic Router as a fast decision layer to achieve this.

Utterances & semantic similarity

from semantic_router import Route

entertainment = Route(
    name="entertainment",
    utterances=[
        "枭起青壤 迪丽热巴陈哲远",  # Xiaoqi Qingyang, Dilraba and Chen Zheyuan
        "ELLE红毯",  # ELLE red carpet
        "王一博你睡了吗",  # Wang Yibo, are you asleep?
        "张小斐肌肉线条",  # Zhang Xiaofei's muscle lines"
        "金晨穿羽绒服走红毯",  # Jin Chen wears a down jacket and walks on the red carpet
    ],
)

politics = Route(
    name="politics",
    utterances=[
        "美国2024年总统选举首场初选开始",  # The first primary election for the 2024 U.S. presidential election begins
        "增强拒腐防变和抵御风险能力",  # Enhancing the ability to resist corruption, change and resist risks
        "中美元首相互道别",  # The heads of state of China and the United States bid farewell to each other
        "中美外长会谈正式开始",  # China-U.S. Foreign Ministers' Talks Officially Started
        "拜登回应英美空袭也门"  # Biden responds to British and US air strikes on Yemen
    ]
)

routes = [entertainment, politics]

Let’s start by defining two “Routes”. These routes are forks in the road that we use to decide where to go next. Semantic Router uses the utterances to measure the semantic similarity between future inputs and each Route.

In this example, I used 5 trending hashtags from Weibo for each Route as utterances.

Encoders

Now that we’ve defined our routes and provided our utterances, we need a way of encoding these utterances and future inputs into a high-dimensional vector space for comparison.

Semantic Router includes a variety of encoders: popular hosted ones such as Cohere and OpenAI, support for open source models via the HuggingFaceEncoder, and even non-LLM encoding via TF-IDF.
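For instance, swapping in a hosted encoder is a one-line change. A quick sketch, assuming the corresponding API keys are available as environment variables:

from semantic_router.encoders import CohereEncoder, OpenAIEncoder

encoder = CohereEncoder()    # expects COHERE_API_KEY in the environment
# encoder = OpenAIEncoder()  # expects OPENAI_API_KEY in the environment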

For more information on encoders, refer to the aurelio-labs/semantic-router repository.

For our use case, we want to be able to run our pipeline on consumer hardware, so we will use the HuggingFaceEncoder together with a multilingual embedding model, intfloat/multilingual-e5-base, which supports Chinese.

from semantic_router.encoders import HuggingFaceEncoder

encoder = HuggingFaceEncoder(
    name="intfloat/multilingual-e5-base",
    device="mps",  # make use of Apple Metal hardware acceleration
)

We can now test our routing:

from semantic_router.layer import RouteLayer

rl = RouteLayer(routes=routes, encoder=encoder)

titles = [
    "白鹿抽到了许佳琪织的围巾",  # Actress Bai Lu drew the scarf knitted by Xu Jiaqi
    "朝鲜废除祖国和平统一委员会",  # North Korea abolishes the Committee for the Peaceful Reunification of the Fatherland
]
for title in titles:
    print(title, rl(title).name)

[Out]:
白鹿抽到了许佳琪织的围巾 entertainment  # Actress Bai Lu drew the scarf knitted by Xu Jiaqi
朝鲜废除祖国和平统一委员会 politics  # North Korea abolishes the Committee for the Peaceful Reunification of the Fatherland

Nice! The router successfully identified the category of both test stories. Now let’s make use of this decision downstream.
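One thing worth noting before moving on: if an input isn’t similar enough to any route’s utterances, the layer returns a choice whose name is None, so it’s worth handling that case explicitly (the loop further down raises an error for it). A quick sketch, using an input that may not clear the similarity threshold for either route:

choice = rl("今天天气怎么样")  # "What's the weather like today?"
if choice.name is None:
    print("No route matched; falling back to a default prompt")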

Decision making

At the beginning of the article we said we wanted our journalist to change their tone of voice depending on whether they are reporting on politics or entertainment.

The quickest way to do this is via prompt engineering, so let’s define a political and entertainment prompt:

For our political prompt, we want to stress the objective nature of the reporting language:

<s>[INST] You are a journalist who only speaks English reporting on news from China. (…). Your purpose is to write a long form news article, using a journalistic and objective tone of voice based on the following Chinese tweets. Your reporting must be accurate, and you must highlight the main context and reactions from Weibo netizens (…) [/INST]

Whereas for our entertainment pieces, we want the journalist to adopt a lighter tone and report in a tabloid-style format.

<s>[INST] You are a journalist who only speaks English reporting on news from China. (…). Your purpose is to write a tabloid entertainment piece, similar to Buzzfeed News, using a light and joyful tone of voice based on the following Chinese tweets. (…) [/INST]

Note: we are using a Mistral-Instruct prompt template here; if you use a different model, you might want to alter the <s>[INST] {prompt} [/INST] template accordingly.
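Concretely, the prompts become template strings with a {text} placeholder for the tweets we will insert later. The snippet below is an abbreviated sketch of the two prompts above, and exactly where you place {text} is up to you:

politics_prompt = (
    "<s>[INST] You are a journalist who only speaks English reporting on news from China. "
    "Your purpose is to write a long form news article, using a journalistic and objective tone "
    "of voice based on the following Chinese tweets: {text} [/INST]"
)

entertainment_prompt = (
    "<s>[INST] You are a journalist who only speaks English reporting on news from China. "
    "Your purpose is to write a tabloid entertainment piece, similar to Buzzfeed News, using a "
    "light and joyful tone of voice based on the following Chinese tweets: {text} [/INST]"
)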

Local LLM calls

We’re almost set; the last ingredient is a local LLM to write the articles.

With the release of Semantic Router v0.0.16, we’ve added support for local execution of open source LLMs via ggerganov/llama.cpp and abetlen/llama-cpp-python. This lets me run a small Mistral-7B-Instruct-v0.2 model on a single M1 Pro MacBook.

Let’s initialise the LLM and the Semantic Router wrapper it needs:

from llama_cpp import Llama
from semantic_router.llms import LlamaCppLLM

# load the quantised model, offloading all layers to the GPU (Apple Metal)
_llm = Llama(model_path="mistral-7b-instruct-v0.2.Q4_0.gguf", n_gpu_layers=-1, n_ctx=8000)
llm = LlamaCppLLM(llm=_llm, name="Mistral-7B-Instruct-v0.2")

For more information on configuring semantic_router.llms.LlamaCppLLM, refer to the local execution notebook in the docs: https://github.com/aurelio-labs/semantic-router/blob/main/docs/05-local-execution.ipynb

With everything initialised, let’s use a similar loop to the one before to generate our articles, each with its own tone of voice:

from semantic_router.schema import Message

# title_tweets maps each trending title to the tweets collected for it
for title, text in title_tweets.items():
    route_name = rl(title).name
    if route_name == "politics":
        prompt = politics_prompt
        print(f"{title} is a political subject, using the political prompt")
    elif route_name == "entertainment":
        prompt = entertainment_prompt
        print(f"{title} is an entertainment subject, using the entertainment prompt")
    else:
        raise ValueError("No good prompt found")
    print(llm([Message(role="user", content=prompt.format(text=text))]))

Result

The two articles have vastly different tones: the political one emphasises reports from central news agencies, while the other is more speculative and fun.

Article 1, Politics: North Korea abolishes the Committee for the Peaceful Reunification of the Fatherland

Pyongyang, North Korea, January 16 - The North Korean Supreme People’s Assembly held its 10th meeting on January 15, with Kim Jong-un, the General Secretary of the Workers’ Party of Korea and Chairman of the State Affairs Commission, delivering a policy speech. During the session, they discussed and passed a resolution to abolish several organizations dealing with inter-Korean affairs.

According to reports by Chosun Central News Agency and Korean Central News Agency, the North Korean Supreme People’s Assembly announced on January 15 that it had decided to abolish the Reunification Committee of the Democratic People’s Republic of Korea, the National Economic Cooperation Committee, and the International Tourism Complex in Mount Geumjeong. The decision stated that the reunification policy line for the past 80 years was based on “one nation, one people, two systems.”

Article 2, Entertainment: “Bai Lu and Xu Jiaqi’s Magical Encounter: A Heartwarming Tale of Gifts and Friendship from China’s Microblogging Sphere”

In the bustling world of Chinese microblogging, a heartwarming tale of friendship and gift-giving has taken the internet by storm. It all started during the gift exchange session of the popular TV show “微博之夜” (Weibo Night), where Bai Lu (@白鹿my) was lucky enough to draw Xu Jiaqi (@许佳琪kiki)’s name. The excitement didn’t stop there, as Xu Jiaqi had crafted a beautiful hand-woven scarf for Bai Lu!

Bai Lu, who has been spotted on Weibo before, shared her joy in a series of posts. “Fate brought us together again on Weibo,” she wrote, accompanied by a video of herself (…)

Conclusion

A lot of (digital) ink has been spilled on the need to control LLMs in a repeatable manner, to curb “hallucinations”, and on other topics related to LLM alignment. Semantic Router offers a fast solution based on vector space similarity.

If you enjoyed this article, please leave us a star on GitHub: aurelio-labs/semantic-router. We also welcome contributions from the wider community in the same repository.