pythoslabs

pythoslabs t1_j0pqrsp wrote

Custom NER is the way to go. I believe you will have to run a custom annotation pipeline to define your custom entities. In your case, fine-tune a model on the annotated spans from a few documents about 'Goals'. (If you have more than one custom entity type, add spancategorizer to your pipeline.) https://spacy.io/api/spancategorizer

Check out "training custom NER in spaCy" on YouTube - you should find plenty of detailed videos.
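
To make the annotation step a bit more concrete, here is a minimal sketch of turning hand-labelled phrases into character-offset annotations. The `(text, {"entities": [(start, end, label)]})` layout is the one commonly used for spaCy NER training data; the helper name `to_training_example`, the `GOAL` label and the sample sentence are my own assumptions for illustration, not something from the OP's data.

```python
from typing import Dict, List, Tuple

# (text, {"entities": [(start_char, end_char, label), ...]})
Annotation = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def to_training_example(text: str, spans: List[Tuple[str, str]]) -> Annotation:
    """Convert (phrase, label) pairs into character-offset entity annotations.

    Hypothetical helper: it only locates the first occurrence of each phrase.
    """
    entities = []
    for phrase, label in spans:
        start = text.find(phrase)  # first occurrence only
        if start == -1:
            raise ValueError(f"phrase not found in text: {phrase!r}")
        entities.append((start, start + len(phrase), label))
    return text, {"entities": entities}


# Made-up sentence with one annotated 'Goals' span.
text = "Our goal is to reduce churn by 10% this year."
example = to_training_example(text, [("reduce churn by 10%", "GOAL")])

start, end, label = example[1]["entities"][0]
print(text[start:end], label)
```

A few dozen examples in this shape, covering a few documents, would be the input to whatever training setup you pick from the videos above.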

And if you want to go an extra step and extract a cause-and-effect relationship (this is out of scope for your project, but for the benefit of any future reader coming here): in case you have a relation like "Goal" - "Action", you can use the following two methods -

  1. spaCy can handle this (you can create your own entity relation extractor). Check out this video: https://www.youtube.com/watch?v=8HL-Ap5_Axo
  2. Kindred is a project built specifically for biomedical text, e.g. for cases where there is a cause-effect relationship (check it out here: https://spacy.io/universe/project/kindred)

DM me in case you need any further pointers.

2

pythoslabs t1_j00ltu7 wrote

>Yeah, this likely breaks some terms of service.

Which ones? Can you please be specific? The whole idea of GPT-3 was to let people use the content it generates for commercial purposes, with the entity that generates the content owning the output.

"As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output."

Reference link : https://openai.com/api/policies/terms/

In other words, the OP has the right to the content he has generated using GPT-3 (see screenshot -1).

screenshot -1 https://imgur.com/UM6RrOF

As long as it does not violate the general terms and conditions (see screenshot -2).

screenshot -2

https://imgur.com/a/YWLJQHq

2

pythoslabs t1_j00ge8i wrote

Here are some ideas -

- collect news and measure its impact on stock prices (NLP / time series)

- put a camera in front of your street and predict daily traffic volume ( Computer Vision + prediction )

- predict the winners of the next UFC fight / NFL championship

Basically, build a system around events that are currently happening or about to happen in the near future, and evaluate your results against the real outcomes.

If you want to do the whole end-to-end project, try the full pipeline, starting from:

  • data collection
  • cleaning the data (build rules)
  • building the feature list
  • creating your analytical dataset
  • the complete model creation step
  • prediction
  • evaluation & interpretation of model result
  • deploy to production
  • evaluate model drift
  • model refresh
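
For the evaluation and drift steps, a rough sketch of what "evaluate against the real outcomes" could look like: overall accuracy, plus accuracy per time window so you can see whether the model is getting worse. The function names, the window size and the sample predictions are all made up for illustration.

```python
from typing import List, Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that matched the real outcome."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def windowed_accuracy(
    predicted: Sequence[str], actual: Sequence[str], window: int
) -> List[float]:
    """Accuracy over consecutive windows; a falling trend hints at drift."""
    return [
        accuracy(predicted[i:i + window], actual[i:i + window])
        for i in range(0, len(actual), window)
    ]


# Made-up example: predicted vs. actual winners of six events, in time order.
predicted = ["A", "B", "A", "A", "B", "B"]
actual = ["A", "B", "A", "B", "A", "B"]

print(accuracy(predicted, actual))
print(windowed_accuracy(predicted, actual, window=3))
```

If the per-window numbers keep dropping, that is your cue for the model refresh step.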
1

pythoslabs t1_j00ffog wrote

Yes.

You have to train on their system with your custom data. It is costly though.

E.g. training on the Davinci model will cost you $0.0300 / 1K tokens for training (fine-tuning) and $0.1200 / 1K tokens for usage, if you wish to use it as an API endpoint.
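
To get a feel for the numbers, here is a quick back-of-the-envelope calculation using the two Davinci rates quoted above. The token counts are made-up assumptions, and this sketch ignores anything else that may affect billing (e.g. number of training passes), so treat it as a rough estimate only.

```python
# Rates quoted above, in USD per 1K tokens (Davinci).
TRAINING_RATE_PER_1K = 0.03
USAGE_RATE_PER_1K = 0.12


def token_cost(tokens: int, rate_per_1k: float) -> float:
    """Cost in USD for a given number of tokens at a per-1K-token rate."""
    return tokens / 1000 * rate_per_1k


# Hypothetical workload: 2M tokens of training data, 500K tokens of API usage.
training_cost = token_cost(2_000_000, TRAINING_RATE_PER_1K)
usage_cost = token_cost(500_000, USAGE_RATE_PER_1K)

print(f"training: ${training_cost:.2f}")
print(f"usage:    ${usage_cost:.2f}")
print(f"total:    ${training_cost + usage_cost:.2f}")
```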

1