RichardBJ1 t1_jcxcu7e wrote

Probably need an answer from someone who has both and has benchmarked some examples (EDIT: and I do not!). Personally I find a lot of “law of diminishing returns” with this type of thing. Also, for me, since I spend 100x more time coding and testing with dummy sets… the actual run speed is not as critical as people might expect…

1

RichardBJ1 t1_iy7xvbr wrote

Was interested when I first heard about this concept. People seemed to respond either by thinking it was ground-shaking… or alternatively that it stood to reason that, given enough splits, it would be the case! Do you think, though, that from a practical-usage perspective this doesn’t help much because there are so many decisions…? The article has a lot more than just that, though, and a nice provocative title.

3

RichardBJ1 t1_ix9fcaw wrote

His book has some nice examples, and they work well. Really, though, as the other answer has said, you need to follow your interests and apply those examples to something that interests you. Another idea is Kaggle; you can clone others’ code quite legitimately and work out what they were up to. There are so many examples on Kaggle that you’ll surely find something that fits your interests!! Good luck

5

RichardBJ1 t1_ix7ficv wrote

I think if you get even similar performance with one card versus 4 cards, the former is going to be far less complex to set up!? Just the logistics of the latter sound like a nightmare.

2

RichardBJ1 t1_iw733qt wrote

Yes… obviously freezing the only two layers would be asinine! There is a Keras blog on it; I don’t know why it recommends particular layers (TL;DR). It doesn’t say top and bottom, that’s for sure. …I agree it would be nice to have some method in the choice of layers to freeze rather than an arbitrary one. I guess visualising layer outputs might help with the choice if it’s a small model, but I’ve never tried that. So I do have experience of trying transfer learning, but (apart from tutorials) no experience of success with transfer learning!
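
For what it’s worth, a minimal sketch of the mechanics, assuming a toy Keras Sequential model with made-up layer names and sizes. It is not the recipe from that blog, just an illustration of freezing all but one layer and checking the parameter counts:

```python
from tensorflow import keras

# Toy model; layer names and sizes are invented for illustration.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu", name="dense_1"),
    keras.layers.Dense(32, activation="relu", name="dense_2"),
    keras.layers.Dense(1, activation="sigmoid", name="head"),
])

# Freeze everything except the final layer -- the "which layers?" choice
# is exactly the arbitrary part discussed above.
for layer in model.layers[:-1]:
    layer.trainable = False

model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()  # shows trainable vs non-trainable parameter counts
```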

1

RichardBJ1 t1_iw71rpv wrote

Good question; I don’t have a source for that, I have just heard colleagues saying it. Obviously the reason for freezing layers is that we are trying to avoid losing all the information we have already gained, and it should speed up further training by reducing the number of trainable parameters etc. As to actually WHICH layers are best preserved, I don’t know; when I have read up on it, people typically say “it depends”. But actually my point was that I have never found transfer learning to be terribly effective (apart from years ago when I ran a specific transfer-learning tutorial!). In my models it only takes a few days to train from scratch, so that is what I do! Transfer learning obviously makes enormous sense if you are working with someone else’s extravagantly trained model and maybe don’t even have the data. But in my case I always do have all the data…
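
As a hedged sketch of that “someone else’s extravagantly trained model” case: load a pretrained backbone, freeze it, and train only a small new head. MobileNetV2/ImageNet is just a stand-in I picked, not something mentioned above, and the dataset name is hypothetical:

```python
from tensorflow import keras

# Pretrained backbone as a stand-in for "someone else's" model.
base = keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the features that were already learned

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1),  # new task-specific head, trained from scratch
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss=keras.losses.BinaryCrossentropy(from_logits=True))
# model.fit(new_task_dataset, epochs=5)  # hypothetical dataset for the new task
```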

1

RichardBJ1 t1_iw43b43 wrote

Well, transfer learning would be the thing I’d expect people to say: freeze the top and bottom layers, reload the old model weights and continue training… but for me the best thing to do has always been to throw the old weights away, mix the old and new training data sets together and start again…. Sorry!!
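
A rough sketch of that “start again” workflow, assuming in-memory NumPy arrays and a small Keras model; the data here is random filler just to make it runnable:

```python
import numpy as np
from tensorflow import keras

# Random stand-ins for the old and new training sets.
old_x, old_y = np.random.rand(500, 20), np.random.randint(0, 2, 500)
new_x, new_y = np.random.rand(200, 20), np.random.randint(0, 2, 200)

# Pool both sets and train a freshly initialised model -- no reloaded weights.
x = np.concatenate([old_x, new_x])
y = np.concatenate([old_y, new_y])

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=5, shuffle=True)
```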

3