📦 DL
We need to talk about random seeds

date: Sep 24, 2023
slug: nlp-1
author:
status: Public
tags: NLP, DL
summary: Five different ways of using random seeds in NLP
type: Post
thumbnail:
category: 📦 DL
updatedAt: Sep 26, 2023 05:49 AM
Title: Five different ways of using random seeds in NLP
Keywords: Random seed, Optimization, Bagging
Concept
In recent ACL research, risky uses of random seeds appear more often than safe uses. The author raises this issue. So why are the following methods classified as risky?
Single Fixed Seed
- A single run cannot demonstrate replicability → rerunning with another seed may give different results
- The seed is not optimized → risk of underestimating the model's true performance
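A minimal toy sketch of the problem (not from the paper; the `train_and_eval` function is a hypothetical stand-in for a real training run): a single fixed seed reports one number, while rerunning across seeds reveals the hidden variance.

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    """Hypothetical stand-in for a full training run; in real NLP work
    this would train and evaluate a model whose result depends on the seed."""
    rng = random.Random(seed)
    return 0.80 + rng.gauss(0, 0.02)  # accuracy with seed-dependent noise

# A single fixed seed reports one number and hides the variance...
single = train_and_eval(42)

# ...while rerunning across several seeds reveals the spread.
scores = [train_and_eval(s) for s in range(10)]
print(f"single seed: {single:.3f}")
print(f"mean {statistics.mean(scores):.3f} +/- {statistics.stdev(scores):.3f}")
```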
Performance Comparison
- With only a few comparisons, the result may just reflect a suboptimal model
- Within a model family → can introduce bias toward some hyperparameter settings
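A toy illustration of why single-seed comparisons are risky (hypothetical numbers, not from the paper): two equally good "models" are compared, and the winner flips depending on the seed.

```python
import random

def eval_model(model_id: int, seed: int) -> float:
    """Toy scores: both 'models' are equally good on average, so which one
    wins a comparison depends only on the seed (hypothetical stand-in)."""
    rng = random.Random(seed * 2 + model_id)
    return 0.80 + rng.gauss(0, 0.02)

# Under one seed the comparison looks decisive...
a0, b0 = eval_model(0, seed=0), eval_model(1, seed=0)
print(f"seed 0: model A={a0:.3f}, model B={b0:.3f}")

# ...but over many seeds the 'winner' keeps changing.
wins_a = sum(eval_model(0, seed=s) > eval_model(1, seed=s) for s in range(50))
print(f"model A wins on {wins_a}/50 seeds")
```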
Progress
So how can we optimize the random seed?
Model Selection
- Treat the random seed as a hyperparameter
- Try multiple random seeds (an optimized seed by itself gives no insight into why it works)
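A sketch of seed-as-hyperparameter search, assuming a hypothetical `dev_score` function (a real run would train and evaluate an actual NLP model per seed):

```python
import random

def dev_score(seed: int) -> float:
    """Hypothetical dev-set accuracy of a model trained with this seed."""
    return 0.80 + random.Random(seed).gauss(0, 0.02)

# Treat the seed like any other hyperparameter: search over candidates
# and keep the one that scores best on the dev set.
candidate_seeds = range(20)
best_seed = max(candidate_seeds, key=dev_score)
print(f"best seed: {best_seed} (dev score {dev_score(best_seed):.3f})")
```

As the Opinion section below notes, each candidate seed costs a full training run, so this search multiplies training time by the number of seeds tried.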
Ensemble Creation
- Used when building a pipeline that combines multiple ML models
- Addresses high variance (use bagging!), low accuracy, and noise/bias in features
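A toy sketch of the variance-reduction idea (in the spirit of bagging, but simplified to averaging models trained with different seeds; `predict` is a hypothetical stand-in):

```python
import random
import statistics

def predict(seed: int, x: float) -> float:
    """Toy regressor whose training depended on a seed: the true function
    2*x plus seed-dependent noise (stands in for a real trained model)."""
    return 2 * x + random.Random(seed).gauss(0, 0.1)

x = 1.5  # true target is 3.0
single_pred = predict(0, x)  # carries one seed's noise
ensemble_pred = statistics.mean(predict(s, x) for s in range(25))  # noise averages out
print(f"single: {single_pred:.3f}, ensemble: {ensemble_pred:.3f}")
```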
Sensitivity Analysis
- Sensitivity == how much the result changes (delta) when a hyperparameter such as the seed changes
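A minimal sensitivity-analysis sketch, again with a hypothetical `score` function in place of a real training run: report how much the result moves when only the seed changes.

```python
import random
import statistics

def score(seed: int) -> float:
    """Hypothetical test score of a model trained with this seed."""
    return 0.80 + random.Random(seed).gauss(0, 0.02)

# Sensitivity: spread of results when nothing but the seed varies.
scores = [score(s) for s in range(30)]
print(f"range (max - min): {max(scores) - min(scores):.3f}")
print(f"std dev across seeds: {statistics.stdev(scores):.3f}")
```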
Result
NLP researchers should be cautioned against risky uses of random seeds.
Opinion
- Does a model always need to be designed to produce consistent results?
- If I treat the random seed as a hyperparameter, much more time is needed… In an automated process, could this increase time complexity?
Advanced
- Ensemble Creation
- Meaning of "model family"