TL;DR
UpYouth Vault is an internal knowledge management system at UpYouth. Its primary user interface is a Telegram chatbot called Bob. Bob handles everything from uploading resources and semantic search to Retrieval Augmented Generation (RAG), which, in simple words, means chatting with documents.
This article covers why UpYouth needed a knowledge management system and the technology behind Bob.
My journey at UpYouth
It was December of 2023 when I first joined UpYouth, a vibrant and wonderful student-led startup ecosystem in Viet Nam.
In case you don’t know: UpYouth is one of the largest student-led startup ecosystems in Viet Nam. Its mission is simple: to empower Vietnamese youths to become real founders with real traction. In its three years, UpYouth has created various products, from Incubators to Coffee Chats, reaching hundreds of thousands of Vietnamese students and supporting startups that have raised millions in funding.
The work atmosphere at UpYouth was great. One of my first impressions was how inclusive the culture was: everyone tried to share and contribute their knowledge, no matter which field they were in or which level they were at.
However, there was a problem. Because everyone shared knowledge in group chats, in particular a Telegram “town hall” for all UpYouth members and alumni, that knowledge was bound to get lost in a sea of messages.
Then a brilliant UpYouthian, Hong Dang, from the People Operations & Culture department, came up with the idea of building a Telegram chatbot that automatically aggregates this shared knowledge and saves it in a central database so that members can search it easily.
Being an inherently curious engineer, I couldn’t resist taking on this challenge.
The architecture of Bob
On my first day tackling the challenge, one of the first things I realized was that a pure resource-scraping chatbot would not be effective at all.
Imagine that you're in a group chat, and every time someone posts a link or a PDF, a bot takes it and stores it somewhere. Since the link is bare, it carries no associated information, such as a title or a summary.
If we added a flow that forces users to give the chatbot extra information inside the group chat, it would clutter the chat and potentially discourage users from uploading resources at all.
Therefore, one of the first decisions I made when building this bot was to base it on a Publisher-Subscriber architecture.
Bob acts as an intermediary. When someone wants to upload a resource, they chat with Bob privately, giving him the resource and its associated information. Once Bob finishes processing the resource, he announces the addition in the “town hall” group chat.
As you can see, this fits the description of a Pub-Sub architecture, where users are publishers and group chats are subscribers. This also allows Bob to send particular resources to particular channels; for example, venture capital resources might be sent directly to an investment-focused chat.
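To make the routing idea concrete, here is a minimal sketch of a topic-based broker, assuming each processed resource carries a topic tag. The `Resource` and `ResourceBroker` names, the topic strings, and the announcement format are hypothetical, not Bob’s actual code. The `send` callable is injected so the sketch runs offline; with pyTelegramBotAPI it could be `bot.send_message`, which takes a chat id and a text.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Resource:
    """What a user hands Bob in a private chat (hypothetical fields)."""
    title: str
    link: str
    summary: str
    topic: str  # e.g. "venture-capital"


@dataclass
class ResourceBroker:
    # Delivers one message to one chat. With pyTelegramBotAPI this could be
    # bot.send_message; it is injected here so the sketch runs offline.
    send: Callable[[int, str], None]
    # Topic -> ids of chats that subscribed to that topic.
    topic_subscribers: dict[str, set[int]] = field(default_factory=dict)
    # Chats that receive every resource, like the "town hall".
    broadcast_chats: set[int] = field(default_factory=set)

    def subscribe(self, chat_id: int, topic: Optional[str] = None) -> None:
        """Subscribe a chat to one topic, or to everything when topic is None."""
        if topic is None:
            self.broadcast_chats.add(chat_id)
        else:
            self.topic_subscribers.setdefault(topic, set()).add(chat_id)

    def publish(self, resource: Resource) -> list[int]:
        """Announce a processed resource to every matching chat; return the chat ids used."""
        targets = sorted(self.broadcast_chats | self.topic_subscribers.get(resource.topic, set()))
        announcement = f"New resource: {resource.title}\n{resource.summary}\n{resource.link}"
        for chat_id in targets:
            self.send(chat_id, announcement)
        return targets
```

Chats subscribed without a topic behave like the town hall and receive everything, while a chat subscribed to `"venture-capital"` only hears about resources tagged with that topic. Publishers never post into group chats directly, which is what keeps them clutter-free.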
This architecture gives Bob a clean separation between collecting resources and announcing them: the group chats stay clutter-free, and users are encouraged to share because every resource comes in a standardized form.
As for the tech stack, the Telegram interface was built with pyTelegramBotAPI, a Telegram Bot API wrapper, while the central storage, called UpYouth Vault, lived in Google Sheets, integrated through a REST API built with Google Apps Script. This simple setup let users browse the storage directly and made it quick to implement some user analytics.
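One way Bob could hand a finished resource to the Vault is a JSON POST to an Apps Script web app. The sketch below assumes such a web app exists, with a `doPost(e)` handler (not shown) that appends the row to the sheet and replies with JSON; the URL placeholder, the column names, and both helper functions are hypothetical. It uses only the Python standard library, and the `opener` parameter exists so the request can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical deployment URL of an Apps Script web app whose doPost(e)
# appends a row to the Vault sheet. Replace with a real deployment.
VAULT_URL = "https://script.google.com/macros/s/<deployment-id>/exec"


def build_vault_row(title: str, link: str, summary: str, topic: str, uploader: str) -> dict:
    """Flatten one resource into the column layout assumed for the Vault sheet."""
    return {"title": title, "link": link, "summary": summary, "topic": topic, "uploader": uploader}


def save_to_vault(row: dict, url: str = VAULT_URL,
                  opener: Callable = urllib.request.urlopen) -> dict:
    """POST the row as JSON and decode the (assumed) JSON reply from the web app."""
    request = urllib.request.Request(
        url,
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the sheet behind a small HTTP endpoint means the bot never needs Google credentials of its own, while members can still open the spreadsheet and browse it like any other sheet.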
Bob is actually AI
If Bob were just a vanilla chatbot, he probably wouldn’t be cool enough to include in my newsletter. However, this dude is a freaking menace.
Bob supports vector searching
Once I had solved the central storage and chat clutter problems, one problem remained: UpYouth members still didn’t have a good way to search the central storage.
As Google Sheets only supports keyword-based search, I was determined to implement semantic search. That was when I came across ChromaDB, an open-source vector database.
Using ChromaDB and OpenAI’s text embedding models, I built a simple indexing function that vectorizes all of the documents uploaded through Bob. When a user enters a query, Bob vectorizes that query and searches ChromaDB for the most similar vectors.
Each vector is associated with a document; therefore, the most similar vector points to the document most related to the query.
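The idea behind this search can be shown without any external services. The sketch below is a tiny in-memory stand-in for a ChromaDB collection: documents are embedded once at indexing time, and a query is embedded and ranked against them by cosine similarity. The `embed` function is an assumption you supply; in Bob’s setup that role is played by an OpenAI embedding model, and ChromaDB handles storage and nearest-neighbour search through calls such as `collection.add(...)` and `collection.query(query_texts=[...], n_results=...)`. `VectorIndex` is a hypothetical name.

```python
import math
from typing import Callable, Sequence

# Any function that maps text to a fixed-length vector (assumption: in Bob
# this would be an OpenAI embedding model; here it is supplied by the caller).
Embed = Callable[[str], Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Tiny in-memory stand-in for a ChromaDB collection (brute-force search)."""

    def __init__(self, embed: Embed):
        self.embed = embed
        self.entries: list[tuple[str, list[float]]] = []  # (document id, vector)

    def add(self, doc_id: str, text: str) -> None:
        # Indexing: vectorize each uploaded document once and keep the vector.
        self.entries.append((doc_id, list(self.embed(text))))

    def query(self, text: str, n_results: int = 3) -> list[str]:
        # Searching: vectorize the query, then rank documents by similarity.
        q = self.embed(text)
        ranked = sorted(self.entries, key=lambda entry: cosine_similarity(q, entry[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:n_results]]
```

Because ranking happens in vector space, a query never has to share keywords with the document it finds; it only has to land close to it, which is exactly what keyword search in Google Sheets could not offer.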
For example, if you want to search for Palworld, you can just ask Bob: “Do you know any famous Japanese game that is kinda like Pokemon?” and he would find it for you.
Bob supports RAG
As an engineer who always seeks new challenges, after implementing semantic search I asked myself, “Why don’t I also do RAG?”
In case you’re not familiar, RAG (Retrieval Augmented Generation) is a way for LLMs to answer questions based on external resources retrieved at query time. In this case, I wanted Bob to be able to answer questions based on the resources uploaded to UpYouth Vault.
To do this, all I needed was the vector search from the previous feature and some prompt engineering.
Specifically, when a user gives Bob a prompt, for example, “Tell me Palworld developers’ struggles”, Bob finds the most relevant documents in ChromaDB. He then passes these documents, along with the prompt, to ChatGPT in the following format:
"""
Answer the question using the context:
<document content here>
Question: <question here>
"""
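The prompt assembly itself is only a few lines. Here is a framework-free sketch of the retrieve-then-generate loop using the template above. `retrieve` and `llm` are hypothetical callables (for example, a wrapper around the vector search and a wrapper around a ChatGPT call), so this illustrates the flow rather than Bob’s LangChain implementation.

```python
from typing import Callable, Sequence

# The template from the article, with placeholders for the retrieved
# context and the user's question.
PROMPT_TEMPLATE = (
    "Answer the question using the context:\n"
    "{context}\n"
    "Question: {question}"
)


def build_rag_prompt(question: str, documents: Sequence[str]) -> str:
    """Join the retrieved documents (blank-line separated) and fill in the template."""
    context = "\n\n".join(documents)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(question: str,
           retrieve: Callable[[str], Sequence[str]],
           llm: Callable[[str], str]) -> str:
    """Retrieve relevant documents, stuff them into the prompt, and ask the LLM."""
    documents = retrieve(question)
    return llm(build_rag_prompt(question, documents))
```

The documents are passed to `format` as values rather than spliced into the template string, so braces inside uploaded content cannot break the formatting.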
The result is like this:
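This implementation was built with LangChain. To give Bob larger context, I used the Parent Document Retriever instead of the plain vector store retriever: it matches queries against small chunks but returns the larger parent documents those chunks came from. LangChain’s documentation covers it in more detail.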
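To see why the Parent Document Retriever gives more context, here is a plain-Python sketch of the idea: each document is split into small child chunks for matching, but a hit on a child returns its whole parent document. LangChain’s `ParentDocumentRetriever` does this with a real vector store, a docstore, and a `child_splitter`; this sketch swaps in fixed-size word chunks and a simple word-overlap score in place of embeddings, so treat it as an illustration rather than LangChain’s implementation. `ParentDocumentIndex` and its helpers are hypothetical names.

```python
import re
from collections import Counter
from typing import Callable


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Child splitter: consecutive windows of `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score (shared word count), standing in for embedding similarity."""
    q = Counter(re.findall(r"\w+", query.lower()))
    c = Counter(re.findall(r"\w+", chunk.lower()))
    return sum((q & c).values())


class ParentDocumentIndex:
    def __init__(self, chunk_size: int = 40,
                 score: Callable[[str, str], int] = overlap_score):
        self.chunk_size = chunk_size
        self.score = score
        self.parents: dict[str, str] = {}          # docstore: parent id -> full text
        self.children: list[tuple[str, str]] = []  # searchable (parent id, chunk) pairs

    def add_document(self, parent_id: str, text: str) -> None:
        self.parents[parent_id] = text
        for chunk in split_into_chunks(text, self.chunk_size):
            self.children.append((parent_id, chunk))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Match the query against small chunks, but return up to k whole parent documents."""
        ranked = sorted(self.children, key=lambda child: self.score(query, child[1]), reverse=True)
        parent_ids: list[str] = []
        for parent_id, _ in ranked:
            if parent_id not in parent_ids:
                parent_ids.append(parent_id)
            if len(parent_ids) == k:
                break
        return [self.parents[pid] for pid in parent_ids]
```

Small chunks keep the match precise, since one focused passage is compared against the query, while returning the parent gives the LLM the surrounding text it needs to write a useful answer.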
Endnote and Appreciation
I hope you found this article fun. Although I can’t share Bob’s source code, building him was relatively easy and took me about half a day at most. All the cool AI stuff was built in an afternoon, thanks to LangChain’s awesome abstractions.
I want to thank Hong Dang, who gave me the idea in the first place, and the members of UpYouth who used Bob and gave me feedback on how to improve him.
Working at UpYouth is cool, so check it out when recruitment opens.