Ferretti, S., D'Angelo, G., Ghini, V., Tomasone, M.B. (2025). Detecting Smart Contract Vulnerabilities using Transformers and LLMs. Los Alamitos, CA, USA: IEEE Computer Society [10.1109/PerComWorkshops65533.2025.00033].
Detecting Smart Contract Vulnerabilities using Transformers and LLMs
Ferretti, S; D'Angelo, G; Ghini, V; Tomasone, MB
2025
Abstract
This study investigates the detection of vulnerabilities in smart contracts using various transformer models and Large Language Model (LLM) systems. We evaluate BERT, CodeBERT, DistilBERT, and the Gemini model, employing techniques such as aggregation of chunks to enhance performance. The results indicate that simple transformers generally perform worse on source code than on byte-code. However, applying aggregation techniques to the source code significantly improves model performance. We also evaluate meta-classifiers for multimodal data, stacking multiple transformers that work on source code and byte-code. The Random Forest meta-classifier achieves the highest performance but exhibits significant overfitting. The Gemini model demonstrates limited performance, highlighting the necessity of proper training for LLM systems.
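The chunk aggregation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which aggregation rule the authors use, so the mean and max rules below are assumptions, as are the function names, the stride, and the decision threshold. The default chunk size of 510 reflects the 512-token input limit of BERT-style models once the two special tokens are accounted for. `toy_scorer` is a hypothetical stand-in for a fine-tuned transformer that returns a vulnerability probability per chunk.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def split_into_chunks(tokens: Sequence[str], chunk_size: int, stride: int) -> List[List[str]]:
    """Split a token sequence into windows of at most chunk_size tokens.

    Consecutive windows start stride tokens apart, so they overlap
    whenever stride < chunk_size.
    """
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    chunks = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the current window already reaches the end
        start += stride
    return chunks


def aggregate_scores(chunk_scores: Sequence[float], how: str = "mean") -> float:
    """Combine per-chunk scores into one contract-level score.

    Assumption: only mean and max pooling are sketched here.
    """
    if not chunk_scores:
        raise ValueError("at least one chunk score is required")
    if how == "mean":
        return mean(chunk_scores)
    if how == "max":
        return max(chunk_scores)
    raise ValueError(f"unknown aggregation rule: {how}")


def classify_contract(
    tokens: Sequence[str],
    score_chunk: Callable[[List[str]], float],
    chunk_size: int = 510,
    stride: int = 255,
    how: str = "mean",
    threshold: float = 0.5,
) -> Tuple[bool, float]:
    """Score every chunk, aggregate, and compare against a threshold."""
    chunks = split_into_chunks(tokens, chunk_size, stride)
    score = aggregate_scores([score_chunk(c) for c in chunks], how)
    return score >= threshold, score


def toy_scorer(chunk: List[str]) -> float:
    """Hypothetical scorer: flags chunks containing a 'call.value' token."""
    return 1.0 if "call.value" in chunk else 0.0


if __name__ == "__main__":
    contract_tokens = ["x", "call.value", "y", "z"]
    print(classify_contract(contract_tokens, toy_scorer, chunk_size=2, stride=2, how="max"))
```

Max pooling flags a contract as soon as any single chunk looks vulnerable, while mean pooling dilutes a localized signal across the whole contract. Which rule suits a given dataset is an empirical question that this sketch does not settle.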


