jiachengyu ac88194b89 'honeypot'		1 year ago
README.md	'honeypot'	1 year ago

args.py	'honeypot'	1 year ago

data.py	'honeypot'	1 year ago

download_sst_dataset.sh	'honeypot'	1 year ago

en_core_web_sm-2.3.0.tar.gz	'honeypot'	1 year ago

en_core_web_sm-2.3.1.tar.gz	'honeypot'	1 year ago

generator.py	'honeypot'	1 year ago

loss.py	'honeypot'	1 year ago

main.py	'honeypot'	1 year ago

models.py	'honeypot'	1 year ago

modules.py	'honeypot'	1 year ago

requirements.txt	'honeypot'	1 year ago

unitrigger_utils.py	'honeypot'	1 year ago

README.md

HoneypotDefenseNLP-ACL21

HoneypotDefenseNLP-ACL21

This is the official code for the following paper.

Thai Le, Noseong Park, Dongwon Lee. A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger’s Adversarial Attacks. 59th Annual Meeting of the Association for Computational Linguistics (ACL) 2021. (Long Paper, Main Conference)

Requirement

Check requirements.txt.
Note: a main portion of the code uses allennlp library, which has been tremendously updated since then. Please make sure to install the correct version from the requirements.txt file.

Instruction

Download pretrained fasttext word-embedding at crawl-300d-2M.vec.zip and place inside the project folder.
Available datasets includerotten_tomatoes, subjectivity, agnews
SST dataset can be downloaded using download_sst_dataset.sh

Defense against universal attacks using random trapdoors on `RNN` trained with `rotten_tomatoes` dataset:

python main.py --model RNN --dataset rotten_tomatoes --embedding_path ./crawl-300d-2M.vec --trapdoor_method random --smooth_eps 0.75

Defense against universal attacks using DARCY on `CNN` trained with `subjectivity` dataset:

python main.py --model CNN --dataset subjectivity --embedding_path ./crawl-300d-2M.vec --trapdoor_method DARCY --smooth_eps 0.75

Defense against universal attacks using DARCY on `CNN` trained with `agnews` dataset with 2 trapdoors:

python main.py --model CNN --dataset agnews --num_class 4 --embedding_path ./crawl-300d-2M.vec --trapdoor_method DARCY --smooth_eps 0.75 --trapdoor_num 2

Other optional configurations:

--trigger_ignore: # of top triggers to ignore while searching for universal attacks as shown in the paper
--oracle: oracle universal trigger attacks as shown in the paper
Please check args.py for more options.

Example Outputs

python main.py --model CNN --dataset rotten_tomatoes --embedding_path ../crawl-300d-2M.vec --trapdoor_method DARCY --smooth_eps 0.75 --trapdoor_num 2

loading dataset
8530it [00:05, 1539.55it/s]
1066it [00:00, 1671.20it/s]
1066it [00:00, 1389.37it/s]
loaded vocab size 16672
Distribution of model train (array([0, 1]), array([4265, 4265]))
Total data 10662
accuracy: 0.7360, loss: 0.5265 ||: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 240/240 [00:02<00:00, 115.59it/s]
searching for trapdoor...
populating additional data...
populating trapdoor train data...
additional_data 768
trapdoor_train_data 2457
TRAPDOORS [[unfunny, soulless], [bourne, masterful]]
Training Model
accuracy: 0.7048, loss: 0.2447 ||: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 264/264 [00:01<00:00, 146.17it/s]
accuracy: 0.7655, loss: 0.6083 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27/27 [00:00<00:00, 199.46it/s]
accuracy: 0.9780, loss: 0.0625 ||: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 264/264 [00:02<00:00, 120.21it/s]
accuracy: 0.7491, loss: 1.0505 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 27/27 [00:00<00:00, 225.21it/s]
WARNING:root:vocabulary serialization directory ./saved/w2v_vocab_rotten_tomatoes is not empty
Training Detector
Initialize NETWORK detector, cuda() and optimizer()
Distribution of detector train [1229  614  614]
Distribution of detector dev [307 154 154]
accuracy: 0.8038, loss: 0.4468 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 181.22it/s]
accuracy: 0.8732, loss: 0.3085 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 237.28it/s]
accuracy: 0.9577, loss: 0.1061 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 208.14it/s]
accuracy: 0.9252, loss: 0.1424 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 307.47it/s]
accuracy: 0.9691, loss: 0.0865 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 77/77 [00:00<00:00, 111.92it/s]
accuracy: 0.9626, loss: 0.0916 ||: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 302.60it/s]
   target_label       ACC        F1  TARGET_ACC  DETECT AUC  INIT  \
0             0  0.749531  0.719243    0.000000    0.965291  lust   
1             1  0.749531  0.719243    0.001876    0.977486  pegs   

               TRIGGER  
0  soulless;accumulate  
1  masterful;masterful

Credits

Thanks Eric-Wallace for the base code of Universal Trigger (https://github.com/Eric-Wallace/universal-triggers)

Citation

Please cite the work using the following bibtex.

@article{le2021darcy,
    title={A Sweet Rabbit Hole by DARCY: Using Honeypots to Detect Universal Trigger’s Adversarial Attacks},
    author={Thai Le and Noseong Park and Dongwon Lee},
    year={2021},
    journal={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL'2021)},
}

No Description

Python Shell Text

How to access data resources in code