what is memGPT?
making large language models better with a layer of virtual memory management
computer operating systems have memory management that enables functions beyond what physical memory alone can do. a virtual memory layer enables multitasking, caching, and protection against malicious applications. without the operating system managing memory, computing would be severely limited.
right now, large language models (LLMs) have that limitation. they have limited context windows, where only so much input can fit into a single prompt, as well as short term memory loss. talking to chatGPT is like talking to dory from finding nemo. every conversation, you have to remind it what its role is and what its goal is. to keep swimming, it needs to know where you are swimming towards.
LLMs can be prompted to maintain certain roles and functions across a series of prompts and tasks. this is what enables a new virtual layer for memory management, the first big step toward making LLMs work like an operating system that unlocks new functions.
memGPT gives LLMs memory management
a team of researchers at berkeley created memGPT, which acts as memory management for large language models. this enables long term memory retrieval and writing, and bypasses the context window input limit.
memGPT augments LLMs with a hierarchical memory system and functions that let them manage their own memory. the LLM processes main context (like RAM, the main memory in an OS) as input, and its output text is parsed for either a yield or a function call.
these functions enable memGPT to move data between main context and external context (like OS disk storage). when the LLM processor generates a function call, it can chain together a series of functions, like searching a database and then sending a message.
this process enables long term memory storage as well as the ability to figure things out, like rewriting its own memory when it gets corrected by a user message or by data in a document.
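here's a minimal sketch of that parse-and-dispatch loop in python. the function names (`archival_memory_search`, `send_message`, etc.) and the JSON call format are illustrative assumptions, not the paper's exact API:

```python
import json

# toy external context: an archival store that lives outside the context window
ARCHIVE: list[str] = []

def archival_memory_insert(text: str) -> str:
    ARCHIVE.append(text)                       # write to "disk"
    return "ok"

def archival_memory_search(query: str) -> str:
    hits = [t for t in ARCHIVE if query.lower() in t.lower()]
    return "\n".join(hits) or "(no results)"   # paged back into main context

def send_message(text: str) -> str:
    print(f"assistant> {text}")                # the only output the user sees
    return "sent"

FUNCTIONS = {f.__name__: f for f in (archival_memory_insert,
                                     archival_memory_search,
                                     send_message)}

def process_llm_output(raw_output: str) -> str | None:
    """parse one completion: a JSON function call is executed and its result
    gets appended back into main context so the LLM can chain the next call
    (e.g. search the database, then send a message); anything else is a yield."""
    try:
        call = json.loads(raw_output)
        fn = FUNCTIONS[call["function"]]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None                            # plain text: yield control back
    return fn(**call.get("args", {}))
```

in the real system, the function result would be appended to main context before the next inference step, which is what makes chaining possible.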
how memGPT works (going left to right from the above diagram)
- event-based control flow: different kinds of events trigger LLM inference. these include user messages (think chat), system messages (think main context capacity warnings), user interactions (like a user logging in or uploading a document), and timed events (running on a regular schedule).
- memory hierarchy: memGPT has two primary memory types: main context, like physical RAM, and external context, like disk storage. main context is the standard fixed context window in current language models. external context is any information outside of this window, which must always be moved into main context for inference. memGPT gives the LLM processor function calls to manage its own memory, moving data between the two without user intervention (see the sketch after this list).
- operating-system-like functions: search databases for deep memory retrieval from the archive, analyze large documents, or even perform nested tasks referencing multiple data sources, plus the ability to write and rewrite memory and responses without user intervention.
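to make the memory hierarchy concrete, here's a small sketch assuming a fixed token budget and simple oldest-first eviction (the actual system uses smarter strategies, like summarizing evicted messages):

```python
MAIN_CONTEXT_BUDGET = 8192          # tokens that fit in the fixed window

def count_tokens(text: str) -> int:
    return len(text.split())        # crude stand-in for a real tokenizer

class MemoryHierarchy:
    def __init__(self):
        self.main_context: list[str] = []       # like RAM: visible to the LLM
        self.external_context: list[str] = []   # like disk: must be paged in

    def append(self, message: str) -> None:
        self.main_context.append(message)
        # a capacity warning (the system message event above) triggers eviction
        while sum(count_tokens(m) for m in self.main_context) > MAIN_CONTEXT_BUDGET:
            evicted = self.main_context.pop(0)      # oldest message out
            self.external_context.append(evicted)   # persisted for later recall

    def page_in(self, query: str) -> list[str]:
        """the function-call path: pull evicted text back into main context."""
        hits = [m for m in self.external_context if query.lower() in m.lower()]
        self.main_context.extend(hits)
        return hits
```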
testing memGPT in analyzing documents and long form chat
the team tested memGPT on two main use cases: 1) document analysis and 2) long form chat conversations.
for analyzing documents, existing LLMs have token limits on how much can be processed in one prompt, which limits the kinds of documents that can be handled. for example, open ai's gpt-4 has an 8,192 token limit. stephen king's best selling novel, the shining, has around 150,000 words, which comes out to roughly 200,000 tokens. it would take 25 context windows (or prompts) to feed gpt-4 stephen king's novel.
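a quick back-of-the-envelope in python, assuming the common heuristic of roughly 1.33 tokens per english word:

```python
import math

words = 150_000                      # rough length of the shining
tokens = round(words * 1.33)         # ≈ 199,500 tokens
context_window = 8_192               # gpt-4's limit at the time
prompts_needed = math.ceil(tokens / context_window)
print(tokens, prompts_needed)        # 199500 tokens -> 25 prompts
```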
using the main context to feed a single document can limit performance when scaled up.
memGPT is consistent at analyzing documents regardless of size, while gpt-4's accuracy decreases
in the first use case, the team showed that memGPT's accuracy stayed consistently high regardless of context length, i.e. how much text was used in the query.
what's even more interesting is memGPT's ability to do a nested key-value task. see below.
in this example, memGPT continuously searches archival memory until it finds the latest key. if the value it retrieves is itself another key, it starts the search again to find that key's pair. once it finds the final value, it returns the message to the user. alongside search, it can also self-edit its memory.
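the lookup logic itself is simple; a minimal sketch, assuming a flat dict stands in for archival memory:

```python
def nested_kv_lookup(archive: dict[str, str], key: str) -> str:
    value = archive[key]
    while value in archive:      # the value we found is itself another key...
        value = archive[value]   # ...so search archival memory again for its pair
    return value                 # a value that is not a key: return it to the user

# toy archival memory: the first two values are themselves keys
archive = {"a1f3": "9c2e", "9c2e": "b7d1", "b7d1": "final answer"}
print(nested_kv_lookup(archive, "a1f3"))  # -> final answer
```

the hard part for memGPT is that each step of that loop is a paged function call against external context, not an in-memory dict hit.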
memGPT enables self correcting memory and long term retrieval
notice in the above conversation, the chat bot made the mistake of saying 'hi chad,' and the user corrected it to say their name is 'brad.' highlighted in red, memGPT edits its memory of brad's first name so it can both reply back correctly right away and remember it long term.
this makes the conversation more natural as well as more useful.
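a minimal sketch of what a self-editing memory function could look like; `core_memory_replace` here is an illustrative name and shape, not necessarily the exact call memGPT exposes:

```python
# always-in-context memory about the user, editable by the model itself
core_memory = {"human": "first name: chad"}

def core_memory_replace(section: str, old: str, new: str) -> str:
    """rewrite a span of core memory so the correction persists long term."""
    core_memory[section] = core_memory[section].replace(old, new)
    return core_memory[section]

# user: "it's brad, not chad" -> the model emits this call, then replies
# with the corrected name in the same turn
core_memory_replace("human", "chad", "brad")
print(core_memory)   # {'human': 'first name: brad'}
```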
ideas to apply memGPT
memGPT enables LLMs to have the ability to read long documents, search different archived datasets, and remember a user over a long history of chat.
understanding what new functionality is unlocked with LLMs:
- reading long documents: memGPT can help process large documents like books, textbooks, financial filings, or legal transcripts
- search external datasets: the ability to search archived data and perform different functions allows for deeper memory in agent or chat based work, like referencing chat history with a user
- connect multiple data points: memGPT can chain multiple function calls to determine the best fit, like the nested key-value task above
these three things unlock a new generation of smart assistants and co-pilots that help create a more robust and richer user experience.
alongside the obvious example of a better customer support chat bot referencing a company's wiki and ticket log, some other examples for using memGPT include:
- personalized onboarding bot for a new product: starting from the first steps of using a product, memGPT can help create a great experience, answering questions along the way. it can also help the company understand different activation points or collect feedback.
- patent search assistant: memGPT can help research prior art and draft claims across a patent database.
- design consultant: memGPT could query different databases to help answer design questions around materials or regulatory code.
- research assistant: review and search different relevant papers while answering questions and providing support along a researcher's journey.
- super physics tutor: referencing multiple textbooks and curriculum examples, memGPT could enable better tutoring for students, parents, and even teachers.
what i’m most interested in is the last use case: how can a large generative model be made for specific use cases in educating and supporting better learning environments?
rather than being limited to one textbook or one tutor's knowledge graph of a subject like physics, what if you could have access to a knowledge graph trained on the top 100 textbooks of all time, one that personalizes conversation because it knows the student's journey and can customize curriculum for them?
while LLMs by themselves are good general tutors, memGPT could supercharge that with longer term memory for better reference and a better chat experience.
while one point against this is that it takes away the human role of writing textbooks or teaching, i actually think it has the potential to have the opposite effect:
- enable better home schooling by supporting parents: parents who want to learn a topic they don't know could use it to help their children along the way. home schooling quality could increase.
- increase quality of teacher training: public schools would no longer be limited by localized talent. teachers who are curious and want to learn the subject material could have a separate physics assistant bot from their students. this could help both teachers and students, rather than pitting students using ai against teachers, or ai doing certain tasks against teachers.
ultimately, this becomes a new kind of textbook resource for school. it's a compression of knowledge that students can interact with. the main difference: they can really converse with it. ideally this unlocks more learning, less schooling.
it's crazy that in 2023, while schooling is synonymous with 'learning,' learning-based outcomes are not guaranteed with more schooling.
smart tutors with memGPT will help bridge this gap in accessibility and quality.
memGPT, while impressive in the moment, may be easily forgotten
this seems similar to the time right after the iphone was released. there was a community that would jailbreak the hardware to enable new functionality based on what they wanted to use it for v. what apple thought users might want.
for example, the iphone 4, released in june 2010, was the first to have a camera flash. by sept 2010 there was a jailbroken app that used that flash as a standalone flashlight. it took apple three more years, until iOS 7 in 2013, to ship built in flashlight controls…
memGPT seems to be that jailbroken app: it enables longer term memory, a short term gap that the platform may eventually fill itself.
with open ai releasing chatGPT less than a year ago, we have not really seen many product features or releases yet. it's mostly been integrated into other products like github copilot or notion's ai assistant.
overall, the first year seems to be one large human-in-the-loop reinforcement learning environment, both to see how humans will interface with ai and to fine tune the output.
it will be interesting to see how long it takes open ai to ship core memory management features as they explore new physical interfaces (an iphone without the phone?), and what they announce at their first developer conference in november.
hey, i'm michael and i'm exploring the intersection of ai and alt education. the next generation of innovators deserves next generation tooling. connect with me on linkedin to follow my build journey.