Submitted by trafalgar28 t3_106ahcr in MachineLearning

I have been working on a project with the GPT-3 API for almost a month now. The main drawback of GPT-3 is that the prompt and completion together are capped at roughly 4,000 tokens, where a token is roughly equivalent to ¾ of a word. Because of this, providing a large context to GPT-3 is quite difficult.

Is there any way to resolve this issue?

22

Comments


Bulky_Highlight_3352 t1_j3fqhij wrote

There are tools to work around this limitation, such as LangChain, which supports summarizing previous context: https://github.com/hwchase17/langchain
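For reference, a minimal sketch of that pattern: a conversation chain whose memory is an LLM-written summary of previous turns rather than the raw transcript. It assumes a LangChain version that exposes `ConversationSummaryMemory` and an `OPENAI_API_KEY` in the environment; import paths have moved between releases.

```python
# Rolling summarization of prior turns so the prompt stays under the token cap.
# Assumes a LangChain release exposing these imports and OPENAI_API_KEY set.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

llm = OpenAI(temperature=0)

# Instead of replaying the full chat history, the memory keeps an
# LLM-generated summary of everything said so far.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)

chain.predict(input="Here is a long description of my project ...")
chain.predict(input="Given all of that, what should I build first?")
```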

18

mandogbeer t1_j3gks4k wrote

So the idea is to summarise the input to increase the information density? A sort of lossy input compression?

8

madmax_br5 t1_j3ik7wh wrote

It kind of depends on what the use case is. If it's simply to query against a large amount of information, you can create embeddings of the information in chunks and add them to a vector store index (https://gpt-index.readthedocs.io/en/latest/guides/index_guide.html). Then you embed your query using the same model, retrieve the most relevant chunks, and synthesize a response from those chunks.

So let's say the use case is to create a conversational tutor assistant for a textbook. Obviously, you can't put the whole textbook in the prompt. So you feed the textbook into the embeddings model one paragraph at a time and store all of these embeddings (along with the text they relate to) in a vector database like Weaviate or Pinecone. Then, when the user asks a question, you embed the query using the same embeddings model and run a cosine similarity search against your vector database (a common built-in function of vector DBs), asking for, say, the top 5 most relevant chunks. Now you have some short context you can feed into normal GPT-3, with a prompt like "given the following context, create a bullet point summary" or "given the following context, create a simplified analogy using real-world examples."
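As an illustration, here is a rough sketch of that retrieval flow using the pre-1.0 `openai` Python client and a plain in-memory list standing in for a real vector database like Weaviate or Pinecone; the paragraph text, query, and prompt wording are placeholders.

```python
# Embed chunks ahead of time, embed the query at runtime, take the top-5
# most similar chunks by cosine similarity, and feed them to GPT-3.
import numpy as np
import openai

EMBED_MODEL = "text-embedding-ada-002"

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model=EMBED_MODEL, input=text)
    return np.array(resp["data"][0]["embedding"])

# 1. Embed the textbook one paragraph at a time (done once, ahead of time).
paragraphs = ["Photosynthesis converts light energy into ...", "Plants store energy as ..."]
index = [(p, embed(p)) for p in paragraphs]

# 2. Embed the user's question with the same model.
query = "How do plants store energy?"
q_vec = embed(query)

# 3. Cosine similarity against every stored chunk; keep the top 5.
def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

top5 = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:5]
context = "\n\n".join(p for p, _ in top5)

# 4. Feed only the relevant chunks to GPT-3 as context.
prompt = (
    "Given the following context, answer the question.\n\n"
    f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
)
completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=256)
print(completion["choices"][0]["text"].strip())
```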

Embeddings are basically the first half of the transformer. Language transformers essentially have two halves: the first half understands the input and encodes it into a set of numbers the model can work with, and the second half takes that understanding and predicts a probable next word. When you think about this from a computational perspective, the first half only runs once, while the second half runs hundreds of times (once per output token). So you end up with only a fraction of a percent of the computation time spent on understanding (embedding) the input, and most of it spent iteratively generating tokens. What semantic search in vector space lets you do is essentially compare items after only step 1, and THEN produce an output once you've gathered the necessary context. And of course you perform the embedding on your data ahead of time, so the only real compute needed at runtime is the embedding of the user's query, which is cheap.

7

Bulky_Highlight_3352 t1_j3hsmp7 wrote

I believe so. You can experiment with different summarization prompts too. For me it is still trial and error when dealing with large context windows.

1

bigvenn t1_j3gmb3s wrote

That’s an awesome library, thanks for sharing!

2

bacocololo t1_j3fzu7v wrote

You could use recurrent block transformers and treat each GPT request as one block.

2

mterrar4 t1_j3h0f0k wrote

You could try breaking it up into chunks, or extracting the most important information first using summarization.
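For example, a chunk-then-summarize pass might look roughly like this (pre-1.0 `openai` client assumed; the character-based chunking and prompt wording are placeholders):

```python
# Summarize each chunk separately, then summarize the partial summaries.
import openai

def complete(prompt: str) -> str:
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=200)
    return resp["choices"][0]["text"].strip()

def chunk(text: str, size: int = 3000) -> list[str]:
    # Naive character-based chunking; a real implementation would split on tokens.
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize(text: str) -> str:
    partials = [complete(f"Summarize the following text:\n\n{c}") for c in chunk(text)]
    return complete("Combine these partial summaries into one summary:\n\n" + "\n".join(partials))
```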

2

trafalgar28 OP t1_j3h4nm4 wrote

Yup, this will work. But I'm looking to build a question-answering model, something like this: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_embeddings.ipynb

1

mterrar4 t1_j3hj053 wrote

I'm a bit confused, are you unable to use their approach in this case? It seems like using embeddings is the workaround to get more context when the text is too large.

1