We need to talk about random seeds

date
Sep 24, 2023
slug
nlp-1
author
status
Public
tags
NLP
DL
summary
Five different ways of using random seeds in NLP
type
Post
thumbnail
category
📦 DL
updatedAt
Sep 26, 2023 05:49 AM

Title: Five different ways of using random seeds in NLP

Keywords: Random seed, Optimization, Bagging

Concept

In recent ACL research, risky uses of random seeds appear more often than safe uses. The author makes a point of this issue. So why are the following methods classified as risky?
Single Fixed Seed
  • Cannot demonstrate replicability → a different seed may produce different results
  • The seed is not optimized → risk of underestimating the model's true performance
Performance Comparison
  • The comparison may only reflect suboptimal models evaluated on a handful of seeds
  • Within a model family, a fixed seed can bias results toward particular hyperparameter settings (a quick illustration follows this list)
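A quick way to see why a single fixed seed is risky is to retrain the same model under a few different seeds. The following is a minimal sketch (not from the paper), using scikit-learn on synthetic data, where only the seed changes between runs:

```python
# Minimal sketch: train the same model with different seeds on synthetic data
# to show that a single fixed seed gives only one sample of the performance.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for seed in (1, 2, 3):
    clf = SGDClassifier(random_state=seed, max_iter=1000)  # seed controls shuffling/init
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"seed={seed}  accuracy={acc:.3f}")  # scores typically differ across seeds
```

If the scores spread across seeds, reporting (or comparing) a single seed's number can over- or under-state the model.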

Progress

So how can we use the random seed safely?
Model Selection
  • Treat the random seed as a hyperparameter
  • Try multiple random seeds and select among them (note that the chosen seed carries no interpretable reason for its performance); a sketch follows this list
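Below is a minimal sketch of the model-selection idea, assuming hypothetical `train_model(seed=...)` and `evaluate(model, split=...)` helpers that stand in for a real training/evaluation pipeline; the seed is tuned on the dev set exactly like any other hyperparameter.

```python
# Minimal sketch (hypothetical helpers): treat the seed as a hyperparameter
# and keep the seed whose model scores best on the validation set.
candidate_seeds = [13, 42, 123, 2023, 31337]

best_seed, best_model, best_dev = None, None, float("-inf")
for seed in candidate_seeds:
    model = train_model(seed=seed)            # hypothetical training routine
    dev_score = evaluate(model, split="dev")  # hypothetical evaluation routine
    if dev_score > best_dev:
        best_seed, best_model, best_dev = seed, model, dev_score

print(f"selected seed={best_seed} (dev score {best_dev:.3f})")
test_score = evaluate(best_model, split="test")  # report test performance once, for the chosen seed
```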
Ensemble Creation
  • Used when building a pipeline that combines multiple ML models, e.g., copies of the same model trained with different seeds
  • Helps against high variance (use bagging!), low accuracy, and noise & bias in the features; see the sketch after this list
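A minimal sketch of a seed ensemble, reusing the same hypothetical `train_model` helper: several copies of one model are trained with different seeds and their predictions are averaged, the bagging-style way of trading extra compute for lower variance. The `predict_proba` method is assumed here (scikit-learn-style models).

```python
# Minimal sketch (hypothetical helper): a seed ensemble that averages
# the class probabilities of models trained with different seeds.
import numpy as np

seeds = [13, 42, 123, 2023, 31337]
models = [train_model(seed=s) for s in seeds]  # same architecture, different seeds

def ensemble_predict(models, X):
    # average per-model class probabilities, then take the argmax
    probs = np.mean([m.predict_proba(X) for m in models], axis=0)
    return probs.argmax(axis=1)
```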
Sensitivity Analysis
  • Compare how much the results change (the delta) as the random seed varies, just like any other hyperparameter == sensitivity; a sketch follows below
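Finally, a minimal sketch of a seed sensitivity analysis with the same hypothetical helpers: rerun training under many seeds and report the spread of the scores instead of a single number.

```python
# Minimal sketch (hypothetical helpers): measure sensitivity to the seed
# by rerunning training and summarizing the score distribution.
import statistics

scores = []
for seed in range(20):                 # 20 reruns; only the seed varies
    model = train_model(seed=seed)
    scores.append(evaluate(model, split="dev"))

print(f"mean={statistics.mean(scores):.3f}  "
      f"std={statistics.stdev(scores):.3f}  "
      f"min={min(scores):.3f}  max={max(scores):.3f}")
```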

Result

NLP researchers should be warned about the risky uses of random seeds.

Opinion

  • Does a model always need to be designed to produce consistent results?
  • If the random seed is treated as a hyperparameter, much more time is needed… In an automated pipeline, could this increase the time complexity?

Advanced

  • Ensemble Creation
  • Meaning of "model family"