📦 DL

BERT on a Data Diet: Finding Important Examples by Gradient-Based Pruning

date
May 26, 2023
slug
nlp-2
author
status
Public
tags
NLP
summary
Removing noisy examples can outperform full-dataset fine-tuning in NLP
type
Post
thumbnail
DALL·E 2023-05-26 10.00.53 - Towards a Human-like Open-Domain Chatbot like robot and give 사이버펑크 style .png
category
📦 DL
updatedAt
May 26, 2023 01:01 AM

Title: Removing noisy examples can outperform full-dataset fine-tuning in NLP

Keyword: Dynamic Data Pruning, Sub-dataset, GraNd, EL2N

Concept

GraNd & EL2N: two scoring metrics originally proposed for image-classification models
→ This paper tries to apply them to NLP tasks
  1. Identify the important examples in a dataset
  2. How the results differ from computer vision
  3. Removing noisy examples can sometimes give better performance than fine-tuning on the whole dataset
GraNd
: Expected value of the loss gradient norm, E‖∇θL(f(x), y)‖₂
EL2N
: Estimate of GraNd based on the error vector, E‖p − y‖₂ (p: output probability vector)
→ Obtained by averaging the model output p across the training epochs.
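The two definitions above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's code; `el2n_scores` and `grand_scores_last_layer` are hypothetical names, and GraNd is shown only for the final linear layer, where the cross-entropy gradient factorizes in closed form (which is exactly why EL2N approximates it):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def el2n_scores(logits, labels):
    # EL2N(x, y) = ||p - y||_2 per example, with p = softmax(logits)
    # and y the one-hot label vector.
    p = softmax(logits)
    y = np.eye(logits.shape[-1])[labels]
    return np.linalg.norm(p - y, axis=-1)

def grand_scores_last_layer(features, logits, labels):
    # GraNd restricted to the final linear layer: for cross-entropy loss
    # with logits = W x, dL/dW = (p - y) x^T, so the gradient norm
    # factorizes as ||p - y||_2 * ||x||_2 per example.
    return el2n_scores(logits, labels) * np.linalg.norm(features, axis=-1)
```

A confident, correct prediction gives an EL2N score near 0; a maximally uncertain one on two classes gives √0.5 ≈ 0.71. In practice these per-example scores would be averaged over checkpoints/runs, as the notes above describe.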

Progress

Setting: pre-trained model (BERT, NLP domain); compute the metrics over fine-tuning steps (not epochs); 5 independent runs
Dataset: MNLI (natural language inference), AG News (topic classification)
  1. Score computation step: at early steps, the scores are less reliable than random sampling. Only on AG News do the scores become reasonable after ~500 steps; on MNLI they beat random only after >1500 steps.
  2. Preserved fraction: investigates how the size of the training subset affects performance. With smaller subset fractions, the pruned sets do not perform well. On MNLI, EL2N & GraNd overtake random selection earlier.
  3. Noise case: the highest-scoring examples can be noisy and hurt results, so the top-scoring subset is removed. Interestingly, on the MNLI dataset, removing >3% of the data gives better performance than training on the whole dataset.
  4. With a limited dataset, EL2N can sometimes find the important examples by fine-tuning over a few seeds.
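The pruning procedure described in steps 2–3 can be sketched as follows. This is a hypothetical illustration (the function name and defaults are mine, not the paper's): average the per-example scores over independent runs, drop the very highest-scoring fraction as likely noise, then keep the top of what remains:

```python
import numpy as np

def prune_dataset(scores_per_run, keep_fraction=0.7, drop_top_fraction=0.03):
    # Average per-example scores (e.g. EL2N) across independent runs,
    # drop the highest-scoring fraction (likely noisy/mislabeled),
    # then keep the highest-scoring examples up to keep_fraction.
    scores = np.mean(np.asarray(scores_per_run), axis=0)
    n = scores.shape[0]
    order = np.argsort(scores)[::-1]               # highest score first
    candidates = order[int(n * drop_top_fraction):]  # discard noisiest
    keep = candidates[:int(n * keep_fraction)]
    return np.sort(keep)                           # indices to train on
```

With `drop_top_fraction=0` this reduces to plain data pruning; the >3% removal observed on MNLI corresponds to raising `drop_top_fraction` above 0.03.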

Result

The authors got different results on NLP → the metrics did not work (for score computation and preserved fraction), BUT in the noise-removal case, a subset can achieve better performance than the full dataset.
→ Future goal: find a pruning mechanism that works within a single fine-tuning procedure.

Opinion

  • How can the dataset be classified? → Are there any validation tools?
  • What is the academic reason for the different results between vision and NLP, and between language inference and topic classification? (fewer labels in NLP)
  • Need more experience with NLP tasks to choose a proper pre-trained model at the business level. (a structured service is required)

Advanced

  • Reason for the constant value in the no-pruning cases.
  • More types of (popular) metric methods for NLP.