niclas_wue OP t1_j4yukoz wrote
Reply to comment by randomusername11010 in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Yes, it is possible to use citations as a measure of a paper's impact. However, when a paper is newly published, there are typically no citations yet, so citation counts are a delayed signal. Retweets and GitHub stars give a faster indication of a paper's impact. I believe speed matters because, once a paper is older, there are already many reviews and articles written by humans that (at least for now) provide a better summary of the paper.
niclas_wue OP t1_j4r5wb1 wrote
Reply to comment by Yidam in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Yes, it can be applied to any document; a book would just be more expensive, because it has more text and thus more input tokens. The PDF needs to be converted to text first, because the API only accepts text. Equations that can be written in Unicode are passed directly to the model, which can understand them; other equations are currently skipped. So far I have spent almost $100 on tokens to summarize the papers, so in the near future there will need to be some paid features or a reduction in the number of papers.
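To give an idea of what that step looks like, here is a minimal sketch, assuming pdfminer.six for the PDF-to-text conversion and the openai Python package; the model name, prompt, and truncation length are placeholders, not my exact pipeline:

    # Minimal sketch: PDF -> plain text -> GPT-3 completion (placeholders, not the real pipeline)
    from pdfminer.high_level import extract_text
    import openai

    def summarize_pdf(pdf_path: str) -> str:
        text = extract_text(pdf_path)  # plain text only; equations that aren't Unicode are lost here
        response = openai.Completion.create(
            model="text-davinci-003",  # placeholder model name
            prompt="Summarize this paper as bullet points:\n\n" + text[:10000],  # naive truncation
            max_tokens=500,
        )
        return response["choices"][0]["text"]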
niclas_wue OP t1_j4q13uw wrote
Reply to comment by Yidam in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
That’s actually a really good idea. Would you be willing to pay for such a feature? Something like $1 per paper? That would cover the cost of the GPT tokens.
niclas_wue OP t1_j4liedo wrote
Reply to comment by kroust2020 in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
I am using OpenAI’s API; I think at the moment there are not many entities capable of running GPT-3 themselves 😄
niclas_wue OP t1_j4k17s2 wrote
Reply to comment by kroust2020 in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Thank you, I am glad you like it! At the moment, only the web server is public. You can find it here: https://github.com/niclaswue/arxiv-smry
It is a Hugo server with a blog theme; every post is a markdown file, and when a new file is pushed to git it automatically gets published on the blog.
The rest is basically a bunch of (messy) Python scripts for extracting the text, asking GPT-3 for a summary, and compiling the answers into a markdown file. Finally, I use GitPython to automatically push new summaries to the repo.
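The GitPython part is just a few lines, roughly like this (repo path, file name, and commit message are placeholders):

    # Rough sketch of the auto-publish step with GitPython (paths and messages are placeholders)
    from git import Repo

    repo = Repo("arxiv-smry")                         # local clone of the blog repo
    repo.index.add(["content/posts/new-summary.md"])  # freshly generated markdown summary
    repo.index.commit("Add new paper summary")
    repo.remote(name="origin").push()                 # the push triggers the blog rebuild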
niclas_wue OP t1_j4i9r9w wrote
Reply to comment by RuairiSpain in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Thanks for your ideas. Building a paid experience for companies is a great idea, I will consider it.
Category tagging like "computer vision", "natural language processing", etc. should be relatively straightforward. Will implement this in the next couple of days :)
More paper-specific tags could be generated with GPT-3; I think that would make sense once the database is a bit larger. Right now, I would guess that most tags would be unique to a single paper.
niclas_wue OP t1_j4g8v9w wrote
Reply to comment by Iunaml in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Haha, sounds good, make sure to send me an invite :)
niclas_wue OP t1_j4fyzia wrote
Reply to comment by transgalpower in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Yes, in the long run, there needs to be some sort of monetization to afford the API tokens. For now, I just want to see if people find it useful at all.
Thanks for letting me know; it works on mobile for me, but I will look into it.
niclas_wue OP t1_j4fqqy6 wrote
Reply to comment by ml-research in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Thanks for asking! My first prototype collected all new arXiv papers in certain ML-related categories via the API, but I quickly realized that this would be way too costly. Right now, I collect all papers from PapersWithCode's "Top" (last 30 days) and "Social" tabs, the latter of which is based on Twitter likes and retweets. Finally, I filter using this formula:
p.number_of_likes + p.number_of_retweets > 20 or p.number_github_stars > 100
In rare cases, when a paper is really long or cannot be parsed with GROBID, I exclude it for now.
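In Python, the filter is essentially the following (attribute names as in the formula above; candidate_papers stands in for the list pulled from PapersWithCode):

    # Sketch of the filtering step; candidate_papers and the paper attributes are illustrative
    def keep_paper(p) -> bool:
        return (p.number_of_likes + p.number_of_retweets > 20
                or p.number_github_stars > 100)

    papers = [p for p in candidate_papers if keep_paper(p)]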
Submitted by niclas_wue t3_10cgm8d in MachineLearning
niclas_wue t1_j340rkv wrote
Reply to comment by PassionatePossum in [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
I think you make an important point. In the end it comes down to the distribution of the data, which always consists of signal and noise. The signal will always be similar; the noise, however, is random, as people can be wrong in all sorts of ways.
niclas_wue t1_j842pyo wrote
Reply to [P] Introducing arxivGPT: chrome extension that summarizes arxived research papers using chatGPT by _sshin_
Hey, great idea, looks very interesting. Do you use the abstract as input, or do you actually parse the paper? I built something quite similar: http://www.arxiv-summary.com, which summarizes trending AI papers as bullet points. However, I think a Chrome extension allows for much more flexible paper selection, which is really great.