Data scarcity – driven by under-detection and under-reporting of incidents, legal and competitive disincentives to share, and vendors’ reluctance to expose product weaknesses – impedes the development of data-driven cybersecurity policies. We investigate large language model (LLM)-based synthetic tabular data generation as a pragmatic remedy. Our approach follows a GReaT-style pipeline: (i) text-based serialization of heterogeneous tables to preserve schema semantics; and (ii) fine-tuning a pretrained decoder LLM (Unsloth/Llama-3.2-1B) on 3388 publicly reported cyber-attack records. Preliminary results suggest that LLM-generated synthetic data can approximate the statistical and structural properties of scarce cybersecurity data without exposing sensitive information, thereby enabling data augmentation and supporting the design of data-driven cyber policies.
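Step (i) of the pipeline, text-based serialization, can be illustrated with a minimal sketch. It assumes the common GReaT-style encoding in which each row becomes a sentence of "column is value" clauses in shuffled order; the column names and the sample record are hypothetical and are not taken from the dataset described above, and the helper names `serialize_row` and `deserialize_row` are illustrative only.

```python
import random

def serialize_row(row, rng=None):
    """Encode one table row as 'column is value' clauses in random order.

    Shuffling the clause order is the GReaT-style trick that keeps the
    model from depending on a fixed column position.
    """
    rng = rng or random.Random()
    items = [(col, val) for col, val in row.items() if val is not None]
    rng.shuffle(items)
    return ", ".join(f"{col} is {val}" for col, val in items)

def deserialize_row(text):
    """Parse a generated sentence back into a column -> value mapping.

    Assumes values contain no ', ' and column names contain no ' is ';
    clauses that do not match the pattern are skipped.
    """
    row = {}
    for clause in text.split(", "):
        col, sep, val = clause.partition(" is ")
        if sep:
            row[col.strip()] = val.strip()
    return row

# Hypothetical incident record, for illustration only.
record = {
    "year": 2021,
    "sector": "healthcare",
    "attack_type": "ransomware",
    "country": "IT",
}

sentence = serialize_row(record, random.Random(0))
recovered = deserialize_row(sentence)
```

Sentences of this form would serve as the fine-tuning corpus for step (ii), and text sampled from the fine-tuned model would be parsed back into rows with the inverse function. All values come back as strings, so a real pipeline would also need to restore column types and discard malformed generations.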
Giacomello, G., Preka, O. (2025). Developing Synthetic Data or Cybersecurity Policies. Barcelona : IFSA Publishing [10.13140/RG.2.2.25717.44006].
Developing Synthetic Data or Cybersecurity Policies
Giacomello Giampiero (first author); Preka Oltion
2025


