Submitted by TankAttack t3_10mmjvt in MachineLearning

I'd like to extract named entities, something like this:

"[Text]: Microsoft (the word being a portmanteau of "microcomputer software") was founded by Bill Gates on April 4, 1975, to develop and sell BASIC interpreters for the Altair 8800. Steve Ballmer replaced Gates as CEO in 2000, and later envisioned a "devices and services" strategy.

[Name]: Steve Ballmer

[Position]: CEO

[Company]: Microsoft

"

I tried it on GPT-NeoX (20B parameters) with mixed success. Is there anything better out there to try for few-shot learning (without fine-tuning)?
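
For context, here's roughly how I'm prompting it (a minimal sketch with HuggingFace transformers; the model id, prompt wording, and generation settings are just what I happened to try, not a recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # or "EleutherAI/gpt-neox-20b" if you have the VRAM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs `accelerate`

# One solved example, then the text to label; the model is expected to continue the pattern.
prompt = """[Text]: Tim Cook became Apple's CEO in 2011.
[Name]: Tim Cook
[Position]: CEO
[Company]: Apple

[Text]: Steve Ballmer replaced Gates as CEO of Microsoft in 2000.
[Name]:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```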

6

Comments

janck12 t1_j67d4qt wrote

I'm not sure there are huge differences from one model to another; it depends heavily on the training data you can get.

I would suggest using some existing NER models and possibly fine-tuning them on your own data. Have a look at GENRE: https://github.com/facebookresearch/GENRE
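
Rough sketch of how GENRE is used through transformers (adapted from memory of their README, so double-check the exact checkpoint name and entity-marker tokens there):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Checkpoint id from the GENRE repo (entity disambiguation / linking flavour).
tokenizer = AutoTokenizer.from_pretrained("facebook/genre-linking-blink")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/genre-linking-blink").eval()

# Mark the mention you want resolved with the special entity markers.
sentences = ["Steve [START_ENT] Ballmer [END_ENT] replaced Gates as CEO in 2000."]

outputs = model.generate(
    **tokenizer(sentences, return_tensors="pt"),
    num_beams=5,
    num_return_sequences=5,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```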

4

thatphotoguy89 t1_j63zz5q wrote

GPT-J is supposed to be quite good. Do you have a list of the types of entities you'd like to detect?

3

TankAttack OP t1_j64p2oj wrote

At this point I'd like to imitate the example with position and company. The example was taken from GPT-J, by the way. I figured NeoX, being three times bigger, would do better, so I tried that first. I'll run GPT-J now and compare the results.

Thank you

1

thatphotoguy89 t1_j64q5hm wrote

You can try extractive QA if you don't want to fine-tune anything. Basically, create a QA pipeline and ask the same questions over each piece of text.
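
Something like this (minimal sketch; the checkpoint and the questions are just placeholders, swap in whatever fits your fields):

```python
from transformers import pipeline

# Any extractive QA checkpoint works; this one is a common default.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

text = (
    "Microsoft was founded by Bill Gates on April 4, 1975. "
    "Steve Ballmer replaced Gates as CEO in 2000."
)

questions = {
    "Name": "Who became CEO?",
    "Position": "What position did Steve Ballmer take?",
    "Company": "Which company is the text about?",
}

for field, question in questions.items():
    answer = qa(question=question, context=text)
    print(field, "->", answer["answer"], round(answer["score"], 3))
```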

1

TankAttack OP t1_j660bm9 wrote

Do you mean free-text questions, like zero-shot learning? Are there any examples of this?

1

visarga t1_j65iwit wrote

I am using GPT-3 for this kind of stuff, and fine-tuning small models on the data.

3

TankAttack OP t1_j660efo wrote

How many samples do you use for fine-tuning?

1

visarga t1_j67sivp wrote

My task uses sentence pairs, and I have an efficient prompt that makes many pairs in one go. So in 5 hours I managed to generate 230K pairs. Cost $10. I plan to generate millions to "exfiltrate" more domain knowledge for the small and efficient models I am training downstream.

1

LetMeGuessYourAlts t1_j64t0lv wrote

I'm doing something similar to your task. My plan is to use GPT-3's text-davinci-003, since it can do this in instruct mode without modification, and then, once I have hundreds to thousands of examples, fine-tune GPT-J on Forefront.ai using what GPT-3 generated, hopefully cutting costs by about 75%.
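
Roughly this loop for the data-generation step (a sketch with the current openai Python SDK; the prompt wording is just an example):

```python
import openai  # Completion endpoint of the openai SDK

openai.api_key = "sk-..."  # your key

prompt = """Extract the entities from the text as labelled fields.

[Text]: Steve Ballmer replaced Gates as CEO of Microsoft in 2000.
[Name]:"""

resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=60,
    temperature=0,
)
print(resp["choices"][0]["text"])
# Collect these completions as (text, fields) pairs, then fine-tune GPT-J on them.
```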

2

TankAttack OP t1_j63scac wrote

I also tried pre-trained tools like spaCy, but they only detect a few fixed entity types.
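
This is what I mean by fixed types (small sketch with en_core_web_sm): there's nothing like "Position" in the pretrained label set.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Steve Ballmer replaced Gates as CEO of Microsoft in 2000.")

for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Steve Ballmer PERSON, Microsoft ORG, 2000 DATE

# The label set is fixed by the pretrained pipeline:
print(nlp.get_pipe("ner").labels)
```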

1

bubudumbdumb t1_j66a4kw wrote

The way you prompt assumes there is a single entity per "Name" field, so you catch "Ballmer" but not "Bill Gates".

Why not BIO-tag each token for each of the entity types?
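
i.e. something like this (just to illustrate the label scheme; the base model and label set are placeholders):

```python
# BIO labels for the example sentence, one tag per token,
# with the entity type folded into the label:
tokens = ["Steve", "Ballmer", "replaced", "Gates", "as", "CEO", "of", "Microsoft"]
labels = ["B-NAME", "I-NAME", "O", "B-NAME", "O", "B-POSITION", "O", "B-COMPANY"]

# This format feeds straight into a token-classification model, e.g.:
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=7  # O + B/I tags for each of the 3 entity types
)
```

That way every mention gets labelled, not just the one the prompt happens to ask for.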

1