Date: March 19th, 2023 1:03 AM
Author: Deep volcanic crater
It’s just a matter of time
https://casetext.com/
The funny thing is that as impressive as this looks, it’s remarkably easy to build the core functionality. I could do it in 15 minutes.
Here’s what they are doing:
1. Use sentence embeddings to encode chunks of documents drawn from a text corpus
2. Store the embeddings in a vector database
3. Get sentence embedding of query
4. Use cosine similarity of query embedding vs database embeddings to find k nearest matches
5. Paste the corresponding text of k nearest matches into prompt template, along with original query
6. Send prompt to OpenAI API
7. Format and prettify response
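Steps 1 through 4 can be sketched in plain Python. This is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real sentence-embedding model, `VectorStore` is an in-memory list standing in for a vector database, and the three example chunks are made up. None of it is taken from the product linked above.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-embedding model (assumption):
    a sparse, L2-normalized bag-of-words vector stored as a dict."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length (or empty), so the dot product
    # over shared tokens is the cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class VectorStore:
    """In-memory list standing in for a vector database (assumption)."""

    def __init__(self):
        self.rows = []  # list of (embedding, chunk text)

    def add(self, chunk):
        # Steps 1 and 2: embed the chunk and store it.
        self.rows.append((embed(chunk), chunk))

    def search(self, query, k=2):
        # Steps 3 and 4: embed the query, rank chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


# Hypothetical corpus chunks, for illustration only.
store = VectorStore()
for chunk in [
    "A contract requires offer, acceptance, and consideration.",
    "Negligence requires duty, breach, causation, and damages.",
    "Adverse possession requires open and continuous use of land.",
]:
    store.add(chunk)

top = store.search("What does negligence require?", k=1)
print(top[0])
```

Swapping in a real embedding model and a real vector database changes `embed` and `VectorStore` but not the shape of the loop.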
The core function is ~20 lines of code with langchain, including imports. You can do this with any document corpus. It doesn’t have to be legal; it could be any organization’s knowledge base.
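The prompt side (steps 5 through 7) is just as corpus-agnostic. A rough sketch, where the template wording, the `answer` helper, and `fake_complete` are all hypothetical: `complete` can be any callable that takes a prompt string and returns text, so a real API client would be plugged in there, and the fake one below only exists so the example runs offline.

```python
PROMPT_TEMPLATE = """Answer the question using only the passages below.

Passages:
{passages}

Question: {question}
Answer:"""


def build_prompt(question, passages):
    # Step 5: paste the retrieved passages and the original query into a template.
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return PROMPT_TEMPLATE.format(passages=numbered, question=question)


def answer(question, passages, complete):
    # Step 6: send the prompt to whatever model `complete` wraps.
    raw = complete(build_prompt(question, passages))
    # Step 7: minimal formatting, here just trimming whitespace.
    return raw.strip()


def fake_complete(prompt):
    # Hypothetical stand-in for a hosted model call: returns the first passage.
    first = next(line for line in prompt.splitlines() if line.startswith("[1]"))
    return "  " + first[4:] + "\n"


# Made-up knowledge-base passage from a non-legal domain.
kb = ["Expense reports are due by the 5th of each month."]
reply = answer("When are expense reports due?", kb, fake_complete)
print(reply)
```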
(http://www.autoadmit.com/thread.php?thread_id=5307420&forum_id=2#46070539)