Comments

FutureIsMine t1_iu12lki wrote

THESE RESEARCHERS JUST TOOK MY IDEA!!!!! Congrats to the team, I'm glad they've vindicated that I was on the right path

27

0xWTC OP t1_iu15vi2 wrote

good one :) I'm actually doing a very similar thing to what they are doing, to generate 1000-2000 word blog posts. The lack of lookback at the entire text with most models (too expensive, currently) is a very interesting problem to circumvent. How to avoid repetitions, and keep the flow logical, while not spending too much GPU time.

9

j4nds4 t1_iu25gly wrote

Isn't this similar to something that was done with AI Dungeon? I seem to recall in the early post-GPT-3 days that when creating custom scenarios you could include a collection of world data separate from the actual prompts/story that would (presumably) be re-injected into the prompt in order to maintain some structure. How effective it was, though, I'm unsure.
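
A minimal sketch of that kind of re-injection, for illustration only. It is not AI Dungeon's actual implementation: the `WorldEntry` schema, the `build_prompt` helper, and the keyword-trigger rule are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class WorldEntry:
    """A piece of persistent world data, triggered by keywords (hypothetical schema)."""
    keys: tuple[str, ...]
    text: str


def build_prompt(entries: list[WorldEntry], story: str, window_chars: int = 500) -> str:
    """Prepend every entry whose keyword appears in the most recent story text."""
    recent = story[-window_chars:]
    recent_lower = recent.lower()
    active = [e.text for e in entries if any(k.lower() in recent_lower for k in e.keys)]
    header = "\n".join(active)
    return f"{header}\n\n{recent}" if header else recent


entries = [
    WorldEntry(("Mira",), "Mira is a cartographer who fears the sea."),
    WorldEntry(("Oslo",), "Oslo is under a permanent winter curfew."),
]
# Only the "Mira" entry is triggered, so only that entry is re-injected.
prompt = build_prompt(entries, "Mira unrolled the map and frowned.")
```

Only the entries mentioned in the recent text are re-injected, so the world data costs context space only when it is likely to matter.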

9

yaosio t1_iu2ylrs wrote

NovelAI does it as well and it doesn't really work that well. Many times it completely ignores entries. However, both NovelAI and AI Dungeon have a limited output. This study is on generating 2000+ words without human intervention. I've made enough stories with both NovelAI and AI Dungeon to know that neither stays on topic; both quickly go off the rails. It doesn't matter what model is used, they all go off the rails.

9

yaosio t1_iu2za10 wrote

Diffusion for language models provides more coherent output according to various studies I've found. I'm surprised nobody's talking about it considering all the hype about diffusion for image generators. I guess it's not as cool as it sounds. The paper doesn't compare it to GPT models which should have told me something.

https://arxiv.org/abs/2205.14217

https://github.com/xiangli1999/diffusion-lm

There's also a new method that's even faster than diffusion.

https://www.assemblyai.com/blog/an-introduction-to-poisson-flow-generative-models/

I hope you have good luck on your text generating endeavors!

7

andreasblixt t1_iu3hw2a wrote

Now please run this on all of /r/WritingPrompts

8

0xWTC OP t1_iu3kyag wrote

The paper is actually using GPT-3, as far as I understand. It's hard to compare, since you can't really generate a 2000-word article with GPT-3 in one shot.

thanks I will look into it

2

leepenkman t1_iu3sd0z wrote

This is some truly epic prompting. Check out https://text-generator.io for replacing some of the gpt-instruct-13B queries to save cost; you can generate many results in a single inference too, and you're only charged for one whole request, then you can rerank them.

There are a lot of tricks in here, like using a relevance/coherence detection network to rerank and using DPR to select relevant parts of the summary for the context (see the utils linked below):

```
from sentence_transformers import SentenceTransformer

dpr_query_encoder = SentenceTransformer('sentence-transformers/facebook-dpr-question_encoder-single-nq-base')
dpr_context_encoder = SentenceTransformer('sentence-transformers/facebook-dpr-ctx_encoder-single-nq-base')
```

A whole bunch of other models are loaded in there too, like for NER, entailment, and QA using unifiedqa-t5-large. I find it a good reference for appropriate models to use: https://github.com/yangkevin2/emnlp22-re3-story-generation/blob/20a99853ff4acbdb11865f57f4fa74431af0b628/story_generation/common/util.py#L69
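
For anyone who wants the shape of the rerank step without downloading any models, here is a rough sketch. It is not the repo's code: a toy bag-of-words vector stands in for the DPR encoders, and the `embed`, `cosine`, and `rerank` helpers are made up for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a learned encoder such as DPR."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Return candidates sorted from most to least similar to the query."""
    q = embed(query)
    return sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True)


ranked = rerank(
    "the knight rode toward the castle",
    [
        "Stock prices fell sharply on Tuesday.",
        "The knight reached the castle gate at dusk.",
    ],
)
```

Swapping `embed` for a real encoder would presumably keep the rest unchanged, since the ranking only needs a similarity score per candidate.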

2

1point21giggawats t1_iu3uzod wrote

Great, now someone finish Winds of Winter with this thing, please.

1

o_snake-monster_o_o_ t1_iu4d33q wrote

I'm pretty sure this is how we're gonna advance these models to the next step. It's a lot easier to think about these things in the context of coding, because coding is thinking but in a very restricted symbolic world.

For example, the next step for coding language models will be to implement a command language to let them query the code and get information/intelligence from it (an LSP, for example). Then we use some sort of RL algorithm or a hypernetwork to finetune how the context should be written and organized to maximize efficiency, which information to drop to make room for new information, etc.

We have this huge GPT context window but we're filling it up with so much noise! Humans work with highly augmented data, for example the syntax highlighting in our code editor, so why are we not augmenting GPT-3's input?
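
To make the "which information to drop" part concrete, here is a hand-written baseline sketch. It is a fixed greedy heuristic, not the RL or hypernetwork approach described above; the `Snippet` type, its `priority` score, and the word-count budget are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    text: str
    priority: float  # hypothetical relevance score; higher means keep longer


def fit_context(snippets: list[Snippet], budget_words: int) -> list[Snippet]:
    """Keep the highest-priority snippets that fit, preserving original order."""
    chosen: set[int] = set()
    used = 0
    # Visit snippets from most to least important and take each one that still fits.
    for i in sorted(range(len(snippets)), key=lambda i: snippets[i].priority, reverse=True):
        cost = len(snippets[i].text.split())
        if used + cost <= budget_words:
            chosen.add(i)
            used += cost
    return [s for i, s in enumerate(snippets) if i in chosen]


kept = fit_context(
    [
        Snippet("def parse(path): ...", 0.9),
        Snippet("# TODO: tidy up logging someday", 0.1),
        Snippet("class Config: debug = False", 0.6),
    ],
    budget_words=8,
)
```

A learned policy would replace the fixed `priority` numbers; the budget bookkeeping would likely look much the same.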

0