Oceanboi

Oceanboi t1_jd6g49h wrote

How do these handmade features compare to features learned by CNNs? The only reason I ask is that I'm finishing up some thesis work on sound event detection using different spectral representations as inputs to CNNs (cochleagram, linear gammachirp, logarithmic gammachirp, approximate gammatone filters, etc.). I'm wondering how these features perform in comparison on similar tasks (UrbanSound8K) and where this fits in the larger scheme of things.
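For context, here's a minimal sketch of the two kinds of inputs I mean, assuming librosa and a placeholder wav path (my actual front ends are gammachirp/gammatone filterbanks, but a log-mel spectrogram stands in here):

```python
# Rough sketch (not my actual pipeline): handcrafted summary features vs. a
# time-frequency representation that a CNN learns its own filters over.
import librosa
import numpy as np

path = "example.wav"  # placeholder file path
y, sr = librosa.load(path, sr=22050)

# "Handmade" features: e.g. mean/std of MFCCs, a fixed-length vector you would
# feed to a classical classifier (SVM, random forest, ...).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
handmade = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # shape (40,)

# CNN input: a 2-D spectral representation (here a log-mel spectrogram as a
# stand-in for a cochleagram/gammatone front end); the CNN learns features
# from this "image" rather than from hand-picked statistics.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel)  # shape (64, time_frames)
```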

2

Oceanboi t1_j8zdely wrote

Oh I see, I missed the major point that the training data is basically too incomplete to model the entire relationship on its own.

Why embed priors into neural networks? Doesn't Bayesian modeling with MCMC do pretty much what this is attempting to do? We did something similar in one of my courses, although we didn't get to spend enough time on it, so forgive me if my questions are stupid. I'd also need someone to walk me through a motivating example for a PINN, because I'd just get lost in generalities otherwise. I get the example, but I'm failing to see the larger use case.
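To check my own understanding, the toy picture I have in my head is something like the sketch below (almost certainly oversimplified; the decay constant, network size, and equal loss weighting are arbitrary choices on my part): fit an ODE like du/dt = -k*u from a few data points plus a physics residual penalty.

```python
# Toy sketch of what I *think* a PINN is doing: a few "measured" points plus
# a physics residual (du/dt + k*u = 0) enforced at unlabeled collocation points.
import torch
import torch.nn as nn

torch.manual_seed(0)
k = 1.5  # arbitrary decay constant
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

# Incomplete data: observations only early in time.
t_data = torch.tensor([[0.0], [0.1], [0.2]])
u_data = torch.exp(-k * t_data)

# Collocation points where only the physics is enforced (no labels needed).
t_phys = torch.linspace(0, 2, 50).reshape(-1, 1).requires_grad_(True)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    # Data loss: match the few observations we have.
    loss_data = ((net(t_data) - u_data) ** 2).mean()
    # Physics loss: ODE residual computed via autograd.
    u = net(t_phys)
    du_dt = torch.autograd.grad(u, t_phys, grad_outputs=torch.ones_like(u),
                                create_graph=True)[0]
    loss_phys = ((du_dt + k * u) ** 2).mean()
    (loss_data + loss_phys).backward()
    opt.step()
```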

1

Oceanboi t1_j8uygkt wrote

Why was the neural network stopped at ~1000 steps? Why are we comparing a physics-informed neural network to a plain neural network at a different number of training steps lol

Also, correct me if I'm wrong, but don't we care about how the model generalizes? I think we can show that some NN will fit any training set perfectly given enough steps, but that's already common knowledge, no?

1

Oceanboi t1_j8l8tst wrote

I'm guessing your company won't have the resources or data to train a CNN to convergence from scratch, so read up on some common CNNs that people use for audio transfer learning (EfficientNet has worked well for me, as did ResNet50, albeit less so). Once you can implement one pretrained model, you can implement most of them fairly easily and see which one suits your task best. Also read up on Sharan et al. (2019, 2021), who benchmark numerous image representations, model architectures, and network fusion techniques. While results may vary, it is empirically a great starting point, although I was not able to reproduce his results with his model architecture. Pay less attention to the actual architecture he describes, because you'll most likely be doing transfer learning, where you import a model and its weights. For preprocessing, look into the Auditory Modeling Toolbox for MATLAB, or if you're using Python, look into librosa, torchaudio, and brian2hears for more complex filterbank models.
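If it helps, a stripped-down sketch of the transfer-learning skeleton I mean, assuming PyTorch/torchvision and spectrograms reshaped to 3-channel 224x224 "images" (the backbone, head, and sizes here are just illustrative):

```python
# Rough transfer-learning skeleton: take a pretrained image backbone, freeze
# it, and retrain only a new classifier head on spectrogram "images".
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # e.g. UrbanSound8K
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze the pretrained convolutional base.
for p in model.parameters():
    p.requires_grad = False

# Replace the final fully connected layer with a head for our classes.
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Spectrograms need to look like images: (batch, 3, H, W), e.g. by repeating
# the log-mel spectrogram across 3 channels and resizing. Random tensors here
# stand in for a real batch.
dummy_batch = torch.randn(8, 3, 224, 224)
dummy_labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(dummy_batch), dummy_labels)
loss.backward()
optimizer.step()
```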

1

Oceanboi t1_j5n6p7b wrote

My advice is to proceed. It's cool to know the math underneath, but just go implement stuff, dude; if it doesn't work, you can always remote into or rent a GPU. What I did for my thesis was google tutorials and re-implement them using my dataset. Through all the bugs and the elbow grease, you will learn enough to at least speak the language. Just do it and don't procrastinate with these types of posts (I do this too sometimes).

EDIT: a lot can be done on Colab these days regarding neural networks and Hugging Face. Google the Hugging Face documentation! I implemented a Hugging Face transformer model to do audio classification (and I'm a total noob, I just copied a tutorial). It was a total misuse of the model and accuracy was bad, but at least I learned, and given a real problem I could at least find my way forward.
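For reference, the tutorial I copied boiled down to something like this sketch (from memory; the model name is just one AudioSet-style example and the wav path is a placeholder, so check the current Hugging Face docs):

```python
# Minimal audio classification with a pretrained Hugging Face model via the
# pipeline API.
from transformers import pipeline

# Example model (an Audio Spectrogram Transformer fine-tuned on AudioSet);
# swap in whatever the docs currently recommend for your task.
clf = pipeline("audio-classification",
               model="MIT/ast-finetuned-audioset-10-10-0.4593")

preds = clf("example.wav")  # path to a wav file
for p in preds:
    print(p["label"], round(p["score"], 3))
```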

1

Oceanboi t1_j3ef4wv wrote

A bit surprised to see the cavalier sentiments on here. I often wonder if commercials will eventually be required to disclose when content is computer-generated (Unreal Engine 5 demos have fooled me a few times), and deepfakes come to mind as a major problem (I just saw an Elon one that took me embarrassingly long to identify as fake). I don't think tons of regulation should occur per se, other than certain legal disclosure requirements for certain forms of media to prevent misinformation.

2

Oceanboi OP t1_j0cpolp wrote

Data is handled the same way in both. I think it has to do with what u/MrFlufypants said, because when I restart my kernel and run it after freeing up some resources, it runs. I think the number of filters I was setting is right at the threshold at which my GPU runs out of VRAM, so small memory-management differences between TF and PyTorch cause TF to hit the limit sooner than PyTorch.
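For anyone who hits the same wall, the TF-side tweak I'd try first is enabling memory growth, so TF allocates VRAM as needed instead of grabbing a large block up front (a sketch using the TF 2.x config API; it has to run before any model touches the GPU):

```python
# Ask TensorFlow to grow GPU memory on demand rather than pre-allocating,
# which can shift where the out-of-memory threshold falls.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    # Must be set before the GPU is initialized (i.e., before building models).
    tf.config.experimental.set_memory_growth(gpu, True)
```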

1

Oceanboi t1_iz2tn86 wrote

Is this natural error rate purely theoretical, or is there some effort to quantify the ceiling?

If I'm understanding correctly, you're saying there will always be some natural ceiling on accuracy for problems in which the X data doesn't hold enough information to perfectly predict Y, or by its nature just doesn't help us predict Y?
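Something like this toy simulation is how I'm picturing it: even the best possible decision rule tops out well below 100% because the class-conditional distributions overlap (the numbers are made up purely to illustrate):

```python
# Toy illustration of an irreducible error ceiling: two classes whose X
# distributions overlap, so even the Bayes-optimal rule makes mistakes.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.integers(0, 2, size=n)
# Class 0 ~ N(0, 1), class 1 ~ N(1, 1): heavy overlap in X.
x = rng.normal(loc=y.astype(float), scale=1.0)

# With equal priors and variances, the Bayes-optimal boundary is x > 0.5.
y_hat = (x > 0.5).astype(int)
print("best-possible accuracy ~", (y_hat == y).mean())  # ~0.69, never 1.0
```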

1

Oceanboi t1_iz2bc3h wrote

Do you know if this is done by simply training for a massive number of epochs and adding layers until you hit 100%?

I may still just be new, but I've never been able to achieve this in practice. I'd be really interested in practical advice on how to overfit a dataset. I'm still unsure about the results of that paper you linked; I feel like I'm misinterpreting it in some way.

Is it really suggesting that overparametrizing a model past the overfitting point and continuing to train will ultimately yield a model that generalizes well?

I am using a dataset of 8500 sounds across 10 classes. I cannot push past 70-75% accuracy, and the more layers I add to the convolutional base, the lower my accuracy becomes. Are they suggesting the layers be added to the classifier head only? I'm all for overparametrizing a model and leaving it on for days; I just don't know how to be deliberate in this effort.
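For concreteness, the only deliberate-overfitting recipe I know is roughly the sketch below: strip out dropout/weight decay/augmentation, start with a small subset, and watch training accuracy only (the random tensors are stand-ins for my actual spectrogram dataset, and the little conv net is just illustrative):

```python
# Deliberate overfit sanity check: no regularization, small fixed subset,
# train until *training* accuracy saturates.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

x = torch.randn(256, 1, 64, 64)          # fake "spectrograms"
y = torch.randint(0, 10, (256,))         # 10 classes
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)  # no regularization
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):  # keep going until memorization
    correct = 0
    for xb, yb in loader:
        opt.zero_grad()
        logits = model(xb)
        loss_fn(logits, yb).backward()
        opt.step()
        correct += (logits.argmax(1) == yb).sum().item()
    # If this never approaches 1.0, capacity or the input representation is
    # the bottleneck, not regularization.
    print(epoch, correct / len(x))
```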

1

Oceanboi t1_iyvf94s wrote

This confuses me. I have almost never gotten an audio or image model over 90% accuracy, and it seems to be largely problem- and data-dependent. I've never been able to reach very low training loss. Does this only apply to extremely large datasets?

If I had a model with 90% train and test accuracy, is that not a good, adequately fit model for most real business solutions? Obviously, if the model predicted something with severe consequences we'd want closer to 100%, but that's the IDEAL, right?

3

Oceanboi t1_iyq03g2 wrote

Why do you say an optimal learning algorithm should have zero hyperparameters? Are you saying an optimal neural network would learn things like batch size, learning rate, the optimal optimizer (lol), input size, etc., on its own? In that case, wouldn't a model with zero hyperparameters be conceptually the same as a model that has been tuned to the optimal hyperparameter combination?

Theoretically you could make these hyperparameters trainable if you had the coding chops, so why, as a community, are we still tweaking hyperparameters iteratively?
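To be concrete, "tweaking iteratively" on my end is basically an outer search loop like this sketch, which is exactly the part I'm asking why the learner can't absorb (the train_and_evaluate stub is a placeholder for a real training/validation run):

```python
# Plain random search over a couple of hyperparameters around a training run.
import random

random.seed(0)

def train_and_evaluate(config):
    # Stub standing in for "train a model with these settings and return
    # validation accuracy".
    return random.random()

best_config, best_acc = None, -1.0
for trial in range(20):
    config = {
        "lr": 10 ** random.uniform(-5, -2),           # log-uniform learning rate
        "batch_size": random.choice([16, 32, 64, 128]),
    }
    acc = train_and_evaluate(config)
    if acc > best_acc:
        best_config, best_acc = config, acc
print(best_config, best_acc)
```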

1

Oceanboi t1_iypx661 wrote

Half-spaces, hyperplanes, hmm. It seems my current understanding of entropy is very limited. Could you link me some relevant material so I can understand what a "zero level hypersurface" is? I've only ever seen simple examples of entropy / Gini impurity for splits in random forests, so I'm interested in learning more.
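For reference, this is the extent of what I currently know about entropy/Gini, i.e. the impurity of a candidate split computed from class proportions (toy numbers):

```python
# Entropy and Gini impurity of the labels on each side of a decision-tree split.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

left = np.array([0, 0, 0, 1])        # labels falling on one side of a split
right = np.array([1, 1, 1, 0, 1])    # and on the other
print(entropy(left), gini(left))     # ~0.811, 0.375
print(entropy(right), gini(right))   # ~0.722, 0.32
```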

1

Oceanboi OP t1_ixtiolt wrote

Not from scratch for all of them; I simply want to take a base model, or a set of base model architectures, and compare how different audio representations (cochleagram and other cochlear models) perform in terms of accuracy/model performance. That's what got me looking into transfer learning, hence the question! I need some constant set of models to use for my comparisons.

1

Oceanboi OP t1_ixkc2dn wrote

It is trained on AudioSet. I listed YAMNet to highlight the lack of large audio models compared to image models, and to highlight the problem that its architecture limits what input data you can feed it.

Also, I mainly see transfer learning for CNNs in Kaggle notebooks, and I could only find a few papers where an ImageNet-trained network is used as one of the models.

https://arxiv.org/pdf/2007.07966

https://research.google/pubs/pub45611/

These are just a few examples, but it seems decently common.

1