Data scarcity – driven by under-detection and under-reporting of incidents, legal and competitive disincentives to share, and vendors’ reluctance to expose product weaknesses – impedes the development of data-driven cybersecurity policies. We investigate large language model (LLM)–based synthetic tabular data generation as a pragmatic remedy. Our approach follows a GReaT-style pipeline: (i) text-based serialization of heterogeneous tables to preserve schema semantics; and (ii) fine-tuning a pretrained decoder LLM (Unsloth/Llama-3.2-1B) on 3388 publicly reported cyber-attack records. Preliminary results indicate that LLM-generated synthetic data can approximate the statistical and structural properties of scarce cybersecurity data without exposing sensitive information, thereby enabling data augmentation and supporting the design of data-driven cyber policies.
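Step (i) of the pipeline can be illustrated with a minimal sketch. It assumes the common GReaT-style encoding, in which each record becomes a sentence of "column is value" clauses in shuffled order, so that the model does not learn a fixed column order. The function names and the example record are hypothetical and are not taken from the paper or its dataset.

```python
import random


def serialize_row(row, rng=None):
    """Encode one record as 'column is value' clauses in shuffled order."""
    rng = rng or random.Random()
    items = list(row.items())
    rng.shuffle(items)  # random column order, as in GReaT-style encoding
    return ", ".join(f"{col} is {val}" for col, val in items)


def deserialize_row(text, columns):
    """Parse a generated sentence back into a record.

    Columns missing from the text stay None; clauses naming an unknown
    column are ignored. All values come back as strings.
    """
    record = {col: None for col in columns}
    for clause in text.split(", "):
        col, sep, val = clause.partition(" is ")
        if sep and col in record:
            record[col] = val
    return record


# Hypothetical record; the column names are illustrative, not the paper's schema.
example = {"year": 2021, "sector": "healthcare", "attack_type": "ransomware"}
sentence = serialize_row(example, random.Random(0))
restored = deserialize_row(sentence, example.keys())
```

This sketch breaks if a value itself contains ", ", so a real pipeline would need escaping or a different separator, plus type casting and validation of the generated rows.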

Giacomello, G., Preka, O. (2025). Developing Synthetic Data or Cybersecurity Policies. Barcelona : IFSA Publishing [10.13140/RG.2.2.25717.44006].

Developing Synthetic Data or Cybersecurity Policies

Giacomello Giampiero (first author); Preka Oltion
2025

Big Data Analytics & Applications, pp. 24-26
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1031415