Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models


  *Equal Contribution
¹Duke University   ²Johns Hopkins University

[Teaser figure: left, an illustration of the pre-training data detection problem; right, a summary of Min-K%++'s WikiMIA results.]

We propose a novel method for detecting the pre-training data of LLMs. This problem (illustrated in the left panel of the figure above) has been receiving growing attention recently due to its profound implications for copyrighted content detection, privacy auditing, and evaluation data contamination.

Our method, named Min-K%++, is theoretically motivated by revisiting the LLM training objective (maximum likelihood estimation) through the lens of score matching. We show that LLM training implicitly minimizes the Hessian trace of the log-likelihood, which encodes rich second-order information and can thus serve as a robust indicator for flagging training data.
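In practice, the detector assigns each token a calibrated score and, in the spirit of Min-K%, aggregates the lowest-scoring k% of tokens into a single sequence-level score. Below is a minimal, hedged sketch of one way to compute such a score in PyTorch, assuming each token's log-probability is standardized by the mean and standard deviation of the model's own next-token log-probability distribution (our reading of the scoring rule). The function name, the default k, and the model named in the usage comment are illustrative; this is not the authors' reference implementation.

# Hedged sketch of a Min-K%++-style sequence score (illustrative, not the official code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_plus_plus_score(text: str, model, tokenizer, k: float = 0.2) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits                               # (1, seq_len, vocab)

    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)        # predictions for tokens 1..T
    targets = ids[0, 1:]                                         # the actual next tokens
    token_log_p = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

    # Mean and std of the log-probability under the model's own next-token distribution.
    probs = log_probs.exp()
    mu = (probs * log_probs).sum(dim=-1)
    sigma = ((probs * log_probs.pow(2)).sum(dim=-1) - mu.pow(2)).clamp_min(1e-8).sqrt()

    # Standardize each token's log-probability, then average the lowest-scoring k% of tokens.
    token_scores = (token_log_p - mu) / sigma
    n = max(1, int(k * token_scores.numel()))
    return token_scores.topk(n, largest=False).values.mean().item()

# Usage (model choice is illustrative): higher scores suggest the text was seen in training.
# model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
# tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
# print(min_k_plus_plus_score("Some candidate passage ...", model, tokenizer, k=0.2))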

Empirically, Min-K%++ achieves state-of-the-art performance on the WikiMIA benchmark, outperforming existing approaches by a large margin (see the right panel of the figure above). On the more challenging MIMIR benchmark, Min-K%++ is also the best among reference-free methods and performs on par with reference-based methods.
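For context, both benchmarks report detection AUROC: every candidate text receives a score from the detector, and AUROC measures how well those scores separate members (texts in the pre-training set) from non-members. A minimal sketch of that evaluation, with placeholder score lists and scikit-learn's roc_auc_score used purely for illustration:

# Hedged sketch of the AUROC evaluation used by membership inference benchmarks.
# The score lists are placeholders; any detector score (Loss, Min-K%, Min-K%++, ...) fits here.
from sklearn.metrics import roc_auc_score

member_scores = [0.8, 0.5, 0.9, 0.7]        # scores on texts known to be in the pre-training set
non_member_scores = [0.2, 0.4, 0.1, 0.6]    # scores on held-out (non-member) texts

labels = [1] * len(member_scores) + [0] * len(non_member_scores)
scores = member_scores + non_member_scores
print(f"AUROC: {100 * roc_auc_score(labels, scores):.1f}%")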

WikiMIA results

Detection AUROC (%) on WikiMIA_length32. Min-K%++ significantly improves over Min-K% and other existing methods; for more results, please see our paper.
| Method    | Mamba-1.4B | Pythia-6.9B | LLaMA-13B | LLaMA-30B | LLaMA-65B | Average |
|-----------|------------|-------------|-----------|-----------|-----------|---------|
| Loss      | 61.0       | 63.8        | 67.5      | 69.4      | 70.7      | 66.5    |
| Ref       | 62.2       | 63.6        | 57.9      | 63.5      | 68.8      | 63.2    |
| Lowercase | 60.9       | 62.2        | 64.0      | 64.1      | 66.5      | 63.5    |
| Zlib      | 61.9       | 64.3        | 67.8      | 69.8      | 71.1      | 67.0    |
| Neighbor  | 64.1       | 65.8        | 65.8      | 67.6      | 69.6      | 66.6    |
| Min-K%    | 63.2       | 66.3        | 68.0      | 70.1      | 71.3      | 67.8    |
| Min-K%++  | 66.8       | 70.3        | 84.8      | 84.3      | 85.1      | 78.3    |

MIMIR results

Detection AUROC (%) on MIMIR, averaged over 7 subdomains. Min-K%++ achieves the best results among reference-free methods and performs on par with the Ref method, which requires an extra reference LLM.
| Method   | Pythia-160M | Pythia-1.4B | Pythia-2.8B | Pythia-6.9B | Pythia-12B |
|----------|-------------|-------------|-------------|-------------|------------|
| Loss     | 52.1        | 53.1        | 53.5        | 54.4        | 54.9       |
| Ref      | 52.2        | 54.6        | 55.6        | 57.4        | 58.7       |
| Zlib     | 52.3        | 53.2        | 53.6        | 54.3        | 54.8       |
| Neighbor | 52.0        | 52.9        | 53.2        | 53.8        | /          |
| Min-K%   | 52.6        | 53.6        | 54.2        | 55.2        | 55.9       |
| Min-K%++ | 52.4        | 54.1        | 55.3        | 57.0        | 58.7       |

BibTeX


@article{zhang2024min,
    title={Min-K\%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models},
    author={Zhang, Jingyang and Sun, Jingwei and Yeats, Eric and Ouyang, Yang and Kuo, Martin and Zhang, Jianyi and Yang, Hao and Li, Hai},
    journal={arXiv preprint arXiv:2404.02936},
    year={2024}
}