Submitted by granddaddy t3_zjf45w in MachineLearning

Is there a way to get around GPT-3's 4k token limit?

Companies like Spellbook appear to have found a solution, with some people on Twitter speculating about what they have done, e.g., summarizing the original document, looping over it in 4k chunks until the right answer is produced, etc.

I suspect multiple solutions have been applied.

I'd be curious if you have any ideas!

Relevant Tweet: https://twitter.com/AlphaMinus2/status/1600319547348639744

16

Comments


visarga t1_izvv9xi wrote

I'd be interested in knowing, too. I want to parse the HTML of a page and identify what actions are possible, such as modifying text in an input or clicking a button. But web pages often go over 30K tokens so there's no way to fit them. HTML can be extremely verbose.

2

rafgro t1_izyke4t wrote

I've been sliding a context window over the text, summarizing chunks, chaining summaries, and summarizing the chained summaries, all while guiding attention (focus on X, ignore Y). I've also had limited success with storing all summaries separately, choosing the most relevant summary based on the task/question, and then answering with the relevant context window opened in addition to the summaries. It was too much pain (also financial) for a very small gain in my case, but I imagine in a legal environment it may be much more important to get every detail right.
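A minimal sketch of the sliding-window-plus-chained-summaries idea, in case it helps. Everything here is an assumption for illustration: `summarize` is a hypothetical callable that wraps whatever model call you use, window sizes are counted in whitespace-separated words as a rough proxy for tokens, and `focus` stands in for the "focus on X, ignore Y" guidance.

```python
from typing import Callable, List


def split_into_windows(words: List[str], size: int, overlap: int) -> List[List[str]]:
    """Slide a fixed-size window over the words; neighbours share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def chain_summaries(text: str, summarize: Callable[[str], str],
                    size: int = 1500, overlap: int = 100, focus: str = "") -> str:
    """Fold each window into a running summary (hypothetical `summarize` callable)."""
    running = ""
    for window in split_into_windows(text.split(), size, overlap):
        prompt = (f"{focus}\nSummary so far: {running}\n"
                  f"New text: {' '.join(window)}")
        running = summarize(prompt)  # one model call per window
    return running
```

Each window costs one model call, so the cost grows linearly with document length, which matches the "also financial" pain mentioned above.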

6

granddaddy OP t1_izytv9p wrote

I found this Twitter thread that may hold the answer (or at least one way to do it):

  • the data that needs to be fed into the model is divided into chunks
  • when a user asks a question, each of these chunks (likely less than 4k tokens) is reviewed
  • when there is a section of the chunk that is relevant, that section is combined with the user question
  • this combined text is fed as prompt, and GPT-3 is able to answer the user's question

Overall, it sounds similar to what you have done, but I wonder how much the computational load changes.

There's a prebuilt OpenAI notebook you can use to replicate it.
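The steps in the list above can be sketched roughly as follows. This is not the notebook's code: the relevance score here is a plain bag-of-words cosine similarity chosen only so the example runs without any API, where a real setup would more likely use embeddings. The function names, the word-based chunk size, and the prompt layout are all assumptions.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=300):
    """Split the document into chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text):
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_prompt(document, question, max_words=300, top_k=2):
    """Review every chunk, keep the most relevant ones, and attach the question."""
    q = bag_of_words(question)
    chunks = chunk_text(document, max_words)
    ranked = sorted(chunks, key=lambda c: cosine(bag_of_words(c), q), reverse=True)
    context = "\n---\n".join(ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

On the computational load: scoring the chunks this way is cheap and local, so only the final combined prompt has to go to the model.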

3

granddaddy OP t1_izyu345 wrote

This sounds like the right answer (and something I need to keep in mind as well).

Just as an FYI, this is one answer I found in a Twitter thread:

  • the data that needs to be fed into the model is divided into chunks
  • when a user asks a question, each of these chunks (likely less than 4k tokens) is reviewed
  • when there is a section of the chunk that is relevant, that section is combined with the user question
  • this combined text is fed as prompt, and GPT-3 is able to answer the user's question

There's a prebuilt OpenAI notebook you can use to replicate it.

1

rafgro t1_j00tamz wrote

I've found another, much cheaper approach: tokenize the long text and the task (client-side, without costly API calls), find where task-token matches are densest in the long text, slide the context window there while retaining a general summary of the document, and answer from this prompt.
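A rough sketch of that density search, under some loud assumptions: a regex word splitter stands in for a real GPT tokenizer, the window is measured in those pseudo-tokens, and the names `densest_excerpt` and `build_prompt` are made up for the example. The general summary is assumed to have been produced separately.

```python
import re


def tokenize_with_spans(text):
    """Return (token, start, end) triples; a crude stand-in for a GPT tokenizer."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z0-9]+", text)]


def densest_excerpt(document, task, window=200):
    """Return the stretch of `document` with the most task-token matches."""
    spans = tokenize_with_spans(document)
    if not spans:
        return ""
    task_tokens = {tok for tok, _, _ in tokenize_with_spans(task)}
    hits = [1 if tok in task_tokens else 0 for tok, _, _ in spans]
    window = max(1, min(window, len(hits)))
    current = sum(hits[:window])
    best, best_start = current, 0
    # Slide the window one token at a time, updating the match count in O(1).
    for start in range(1, len(hits) - window + 1):
        current += hits[start + window - 1] - hits[start - 1]
        if current > best:
            best, best_start = current, start
    first, last = spans[best_start], spans[best_start + window - 1]
    return document[first[1]:last[2]]


def build_prompt(summary, excerpt, task):
    """Combine the general summary, the focused excerpt, and the task."""
    return (f"Document summary:\n{summary}\n\n"
            f"Relevant excerpt:\n{excerpt}\n\nTask: {task}\nAnswer:")
```

Common words in the task ("the", "about") will also count as matches in this version, so filtering stop words from the task first is probably worthwhile.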

2

asmallstep t1_j01daaa wrote

Any libraries that you can recommend for this task?
It doesn't sound like a trivial problem. For example, a label "Email" above a field called "input1" would add relevant information that seems hard to parse. Taking the layout into account would be really helpful though.
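For the label problem specifically, here is a small sketch using only Python's standard-library `html.parser`. It assumes the page uses explicit `<label for="...">` markup, which many real pages do not; it does nothing about visual layout. The class and function names are invented for the example. Reducing a page to a list like this would also cut the token count considerably.

```python
from html.parser import HTMLParser


class ActionExtractor(HTMLParser):
    """Collect interactive elements and the text of explicit <label for=...> tags."""
    INTERACTIVE = {"input", "button", "select", "textarea", "a"}

    def __init__(self):
        super().__init__()
        self.actions = []       # one dict per interactive element
        self.labels = {}        # element id -> label text
        self._label_for = None  # id targeted by the <label> being read
        self._open = None       # <button>/<a> whose inner text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self._label_for = attrs.get("for")
        elif tag in self.INTERACTIVE:
            action = {"tag": tag, "id": attrs.get("id"),
                      "name": attrs.get("name"), "text": ""}
            self.actions.append(action)
            if tag in ("button", "a"):
                self._open = action

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._label_for:
            previous = self.labels.get(self._label_for, "")
            self.labels[self._label_for] = (previous + " " + text).strip()
        elif self._open is not None:
            self._open["text"] = (self._open["text"] + " " + text).strip()

    def handle_endtag(self, tag):
        if tag == "label":
            self._label_for = None
        elif tag in ("button", "a"):
            self._open = None


def describe_actions(html):
    """Return one compact line per interactive element, with its best-guess label."""
    parser = ActionExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for a in parser.actions:
        label = parser.labels.get(a["id"]) or a["text"] or a["name"] or ""
        lines.append(f'{a["tag"]} id={a["id"]} label="{label}"')
    return lines
```

A label that merely sits above a field with no `for` attribute would still be missed here; that case needs rendering or layout information, as noted above.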

2

rafgro t1_j05enj4 wrote

Example tokenizer: https://github.com/josephrocca/gpt-2-3-tokenizer. In the most vanilla version, you could count occurrences of tokens from the question/task in the document and jump to that place, e.g., if the task is about lung cancer, jump to the book chapter with the most occurrences of "lung" and "cancer". It works well enough, but you can make it more robust by building a timid scoring system (e.g., a higher weight assigned to "lung" than to "cancer"), finding words related to the task words with a word2vec equivalent and looking for them with appropriate weights as well, or even splicing a few different high-scoring places into one prompt.
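A small sketch of the weighted variant, with assumptions flagged: the weights are set by hand here (including a lower weight for a related word, which a word2vec-style model could supply instead), sections are assumed to be pre-split, such as book chapters, and a regex word splitter again stands in for the tokenizer. Function names are hypothetical.

```python
import re
from collections import Counter


def score_section(section, weights):
    """Sum weight * occurrences for every weighted token found in the section."""
    counts = Counter(re.findall(r"[a-z0-9]+", section.lower()))
    return sum(weight * counts[token] for token, weight in weights.items())


def splice_top_sections(sections, weights, top_n=2):
    """Pick the `top_n` highest-scoring sections and join them in document order."""
    scored = [(score_section(s, weights), i) for i, s in enumerate(sections)]
    matching = [pair for pair in scored if pair[0] > 0]
    best = sorted(matching, key=lambda pair: pair[0], reverse=True)[:top_n]
    in_order = sorted(best, key=lambda pair: pair[1])
    return "\n...\n".join(sections[i] for _, i in in_order)


# Hand-set example weights: "lung" counts more than "cancer";
# "tumor" is a related word given a lower weight.
EXAMPLE_WEIGHTS = {"lung": 2.0, "cancer": 1.0, "tumor": 0.5}
```

Joining the picked sections in their original order keeps the spliced prompt readable even when the best-scoring places are far apart in the document.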

2