_Arsenie_Boca_ t1_j54rrg9 wrote
Reply to comment by Spico197 in [P] paper-hero: Yet Another Paper Search Tool by Spico197
Great, looking forward to trying the space :)
_Arsenie_Boca_ t1_j5492g7 wrote
I think the idea is great! How long does it take to execute a query on the arXiv set? Have you considered making a Hugging Face Space out of this?
_Arsenie_Boca_ t1_j4rxdt8 wrote
Reply to comment by bo_peng in [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Is there a more detailed description somewhere? It would be interesting to read about all these new ideas :)
_Arsenie_Boca_ t1_j3qskip wrote
Is your paper on arXiv as well?
_Arsenie_Boca_ t1_j3l1x1y wrote
Reply to comment by learningmoreandmore in [D] I want to use GPT-J-6B for my story-writing project but I have a few questions about it. by learningmoreandmore
Pretty much, yes. I believe other APIs might use models that are slightly worse than OpenAI's, but they are definitely better than GPT-J.
_Arsenie_Boca_ t1_j3kzllo wrote
Reply to [D] I want to use GPT-J-6B for my story-writing project but I have a few questions about it. by learningmoreandmore
Your laptop will not begin to suffice, not for inference and especially not for fine-tuning. You would need something like an A100 GPU in a server that handles requests, and even then the results will be much worse than GPT-3's. If you don't already have an AI infrastructure, go with an API; it will save you more than a bit of money (unless you are certain you will use it at scale long-term). If you are worried about OpenAI, there are other companies that serve LMs.
_Arsenie_Boca_ t1_j3dpfxv wrote
Reply to comment by jrmylee in [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
Ah ok, I didn't know that was an issue. Extensions are really important, so you should definitely look into that.
_Arsenie_Boca_ t1_j3dntcr wrote
Reply to comment by jrmylee in [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
Awesome. I don't, but there is a VS Code extension, so that would already be integrated. Or do you have any special integration of Copilot?
_Arsenie_Boca_ t1_j3bbl7h wrote
Reply to [D] I recently quit my job to start a ML company. Would really appreciate feedback on what we're working on. by jrmylee
For me personally, it would be very important not to be tied to Jupyter Notebooks. Ideally, integrate VS Code and automatically load settings from the .vscode directory in the repo.
_Arsenie_Boca_ t1_j394z5v wrote
Without Docker, you won't be able to ensure that all user environments have all dependencies. Is it really worth the effort (which I believe would be big)? Why not just deploy in the cloud and set up an API?
_Arsenie_Boca_ t1_j2xrj3r wrote
I'm not an expert here, but as far as I understand from the docs, quantization is not yet a mature feature.
I'm curious, what is the reason you don't want TensorRT?
_Arsenie_Boca_ t1_j2xonjk wrote
Reply to [Discussion] If ML is based on data generated by humans, can it truly outperform humans? by groman434
Yes, I believe there are two factors playing a role here:
- Models could potentially correct some errors of the human labeler using their generalization power, provided that the model is not overfitted.
- You should differentiate between outperforming a human and outperforming humans. Labels usually represent the collective knowledge of a number of people, not just one.
_Arsenie_Boca_ t1_j03flw7 wrote
Fascinating! I wonder what Google has to say about Colab being used that way.
_Arsenie_Boca_ t1_izyqkf9 wrote
Reply to comment by Brudaks in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
You are right. I think I used a slightly different prompt and got something like "I am an LLM and I cannot execute commands".
_Arsenie_Boca_ t1_izwbuat wrote
Reply to comment by krali_ in [D] - Has Open AI said what ChatGPT's architecture is? What technique is it using to "remember" previous prompts? by 029187
OpenAI is constantly working on restricting those things. A few days ago you could still instruct the model to behave like a VM and basically execute commands. Now it's no fun anymore.
_Arsenie_Boca_ t1_izgbbrj wrote
Reply to comment by CrazyCrab in [D] Determining the right time to quit training (CNN) by thanderrine
Then how can you tell if you overfitted on the validation set?
_Arsenie_Boca_ t1_izg7d49 wrote
Reply to comment by CrazyCrab in [D] Determining the right time to quit training (CNN) by thanderrine
Not sure how this indicates overfitting on the validation set? Wouldn't that be indicated by much worse performance on the test set compared to the validation set? I haven't done a lot of image segmentation work; is this specific to the task?
_Arsenie_Boca_ t1_iyuu6le wrote
I usually prefer checkpointing over early stopping, i.e. you always save a checkpoint when you get a better validation score. Loss is typically a good indicator, but if you have a more specific measure you are aiming for (like downstream metrics), you should use that.
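A minimal sketch of what I mean, in PyTorch; `train_one_epoch` and `evaluate` are hypothetical stand-ins for your own training and validation loops:

```python
import torch

best_score = float("inf")  # assuming lower is better, e.g. validation loss

for epoch in range(50):
    train_one_epoch(model, train_loader, optimizer)  # hypothetical training helper
    score = evaluate(model, val_loader)              # validation loss or downstream metric
    if score < best_score:
        best_score = score
        torch.save(model.state_dict(), "best.pt")    # keep only the best model so far

# after training, restore the best checkpoint instead of the last state
model.load_state_dict(torch.load("best.pt"))
```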
_Arsenie_Boca_ t1_ixgjhjp wrote
Reply to [R] Getting GPT-3 quality with a model 1000x smaller via distillation plus Snorkel by bradenjh
As interesting as weak supervision is, the main takeaway is that using LLM few-shot predictions as labels to train a small model is a great way to save labeling costs. Using Snorkel on top means you have to query multiple LLMs and carry Snorkel as additional complexity, yielding only a few extra points. Perhaps those extra points could also have been achieved by letting the LLM label a few more samples or giving it a few more shots to get better labels.
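A hedged sketch of that basic recipe (no Snorkel): `query_llm` is a placeholder for whatever few-shot prompting setup you use, not a real API, and the TF-IDF + logistic regression model is just an illustrative stand-in for "a small model":

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def query_llm(text: str) -> str:
    """Placeholder: return the LLM's few-shot prediction for `text`."""
    raise NotImplementedError

unlabeled_texts = [...]  # your unlabeled corpus
pseudo_labels = [query_llm(t) for t in unlabeled_texts]  # LLM predictions become labels

# distill into a small, cheap model
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(unlabeled_texts)
small_model = LogisticRegression(max_iter=1000).fit(X, pseudo_labels)
```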
_Arsenie_Boca_ t1_ivvh1zh wrote
Reply to comment by DreamyPen in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
Oh, I don't think it is.
_Arsenie_Boca_ t1_ivvfzfs wrote
Reply to comment by DreamyPen in [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
In classification you usually have a single correct class, a hard label. However, you might also have soft labels, where multiple classes have non-zero target probabilities. Label smoothing is a technique that artificially introduces those soft labels from hard labels, i.e. if your hard label was [0 0 1 0] it might now be [0.05 0.05 0.85 0.05]. You could use the strength of smoothing to represent uncertainty.
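A quick NumPy sketch of exactly that: with eps = 0.2 spread uniformly over 4 classes you get the numbers above, and you could pick eps per data source to encode uncertainty:

```python
import numpy as np

def smooth_labels(hard_label: np.ndarray, eps: float) -> np.ndarray:
    """Move eps of the probability mass uniformly onto all classes."""
    n_classes = hard_label.shape[-1]
    return hard_label * (1.0 - eps) + eps / n_classes

hard = np.array([0.0, 0.0, 1.0, 0.0])
print(smooth_labels(hard, eps=0.2))  # [0.05 0.05 0.85 0.05]
```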
_Arsenie_Boca_ t1_ivufzxw wrote
Reply to [Discussion] Can we train with multiple sources of data, some very reliable, others less so? by DreamyPen
As others have pointed out, sample weights could be used. Another option would be to smooth the labels of the unreliable source.
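For the sample-weight option, a minimal PyTorch sketch; the weight values are made up for illustration (1.0 for the reliable source, 0.3 for the unreliable one):

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, sample_weights):
    per_sample = F.cross_entropy(logits, targets, reduction="none")  # one loss per example
    return (sample_weights * per_sample).mean()

logits = torch.randn(4, 3)                    # dummy model outputs
targets = torch.tensor([0, 2, 1, 1])
weights = torch.tensor([1.0, 1.0, 0.3, 0.3])  # downweight the unreliable source
loss = weighted_cross_entropy(logits, targets, weights)
```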
_Arsenie_Boca_ t1_ivqr1k6 wrote
Reply to comment by Meddhouib10 in [D] Best learning rate for fine tuning a pretained CNN by Meddhouib10
It depends on the model and the task, so there is no general answer. But you don't have to search randomly: plot your loss over time. If the learning rate is too high, the loss will behave almost randomly; if it is too low, the loss will be almost constant.
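If you log the per-step loss for each candidate learning rate, the plot makes the diagnosis obvious; `losses_by_lr` here is a hypothetical dict of loss histories you collected during training:

```python
import matplotlib.pyplot as plt

# losses_by_lr: {1e-2: [...], 1e-4: [...], 1e-6: [...]}, collected during training
for lr, losses in losses_by_lr.items():
    plt.plot(losses, label=f"lr={lr}")  # too high: erratic curve; too low: nearly flat
plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.show()
```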
_Arsenie_Boca_ t1_iusvc0e wrote
Reply to [R] Is there any work being done on reduction of training weight vector size but not reducing computational overhead (eg pruning)? by Moose_a_Lini
Parameter sharing across layers would achieve just that. In the ALBERT paper, the authors show that repeating a layer multiple times actually leads to performance similar to having separate parameter matrices. I haven't heard a lot about this technique, but I assume that is because people mostly care about speed, which this does not improve (while it is a good match for your use case).
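A minimal PyTorch sketch of the idea (not the actual ALBERT implementation): one encoder layer's weights applied repeatedly, so the parameter count stays at a single layer while the compute stays that of the full depth:

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    """Applies one Transformer layer `depth` times, ALBERT-style weight sharing."""
    def __init__(self, d_model=256, nhead=4, depth=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):  # the same parameters are reused at every "layer"
            x = self.layer(x)
        return x

enc = SharedLayerEncoder()
out = enc(torch.randn(2, 10, 256))  # weights of one layer, effective depth of six
```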
_Arsenie_Boca_ t1_j5notgg wrote
Reply to comment by Spico197 in [P] paper-hero: Yet Another Paper Search Tool by Spico197
I get the following error: SyntaxError: Unexpected token 'I', "Internal S"... is not valid JSON