KingsmanVince t1_jaboa8m wrote

Knowing the architecture isn't enough. How large is your training dataset? Do you use gradient accumulation?

ahiddenmessi2 OP t1_jacin2n wrote

My dataset size can vary because the data can be generated. I will also consider using gradient accumulation to improve performance. Thanks
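
For reference, gradient accumulation sums the gradients of several small micro-batches before doing one weight update, which emulates a larger batch when memory is tight. Here is a minimal pure-Python sketch, assuming a toy 1-D linear model y = w * x with MSE loss. The model, the data, and all function names are made up for illustration and are not from this thread.

```python
# Toy illustration of gradient accumulation (hypothetical model and data).
# Model: y = w * x, loss: mean squared error over the batch.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w for one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Sum micro-batch gradients, each weighted by its share of the samples,
    so the total matches the full-batch mean gradient."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        total += grad_mse(w, mx, my) * len(mx) / n
    return total


def train(xs, ys, micro_batch_size, lr=0.01, steps=200):
    """Plain gradient descent with one weight update per accumulated gradient."""
    w = 0.0
    for _ in range(steps):
        w -= lr * accumulated_grad(w, xs, ys, micro_batch_size)
    return w


# Synthetic data generated from a known slope of 3.0.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]

w_full = train(xs, ys, micro_batch_size=len(xs))  # one big batch
w_accum = train(xs, ys, micro_batch_size=2)       # four micro-batches per update
```

Both runs take the same update steps up to floating-point rounding, so accumulation by itself does not change what the model learns. Its main benefit is fitting a large effective batch size into limited memory.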
