Parkinson, R.H., Cerbone, H., Mieskolainen, M., Cao, S., Wilson, A.D., Albacete, S., et al. (2026). MetaBeeAI: An AI pipeline for structured evidence extraction from biological literature. ECOLOGICAL INFORMATICS, 96, 1-13 [10.1016/j.ecoinf.2026.103813].

MetaBeeAI: An AI pipeline for structured evidence extraction from biological literature

Parkinson R. H.; Cerbone H.; Mieskolainen M.; Cao S.; Wilson A. D.; Albacete S.; Armstrong E. B.; Bass C.; Botías C.; Brown A.; Hayward A. J.; Herbertsson L.; Jones A. K.; Nagloo N.; Nicholls E.; Rigosi E.; Sgolastra F.; Siviter H.; Stanley D. A.; Straub L.; Straw E. A.; Tadei R.; Walter K.; Stevance H. F.; Daniels R. K.; Lambert B.; Roberts S.

2026

Abstract

The volume and complexity of scientific literature are expanding rapidly, making it increasingly difficult to extract and synthesize information across studies. This challenge is particularly acute in the biological sciences, where evidence spans multiple levels of organization and heterogeneous experimental designs. Large Language Model (LLM) pipelines offer a scalable route to evidence synthesis, but many existing approaches lack transparency, modularity, and effective mechanisms for human oversight. We present MetaBeeAI, an open-source, modular pipeline that integrates established LLM techniques into a coherent, auditable workflow for structured data extraction in biology. MetaBeeAI combines modular prompting, multi-pass extraction, and expert-in-the-loop validation within an interface that presents model outputs alongside source text, enabling inspection, correction, and iterative refinement. The pipeline produces machine-readable records of prompts, configurations, and expert annotations, supporting reproducibility and continuous improvement. We apply MetaBeeAI to 924 research papers on bees and pesticides, extracting structured information on species, compounds, exposure designs, and experimental context. Evaluation demonstrates improved consistency, convergence with expert judgement, and robustness across heterogeneous biological studies, highlighting the value of expert-guided refinement. MetaBeeAI provides a transparent and extensible framework for scalable evidence synthesis, supporting reliable integration of LLMs into biological research workflows.

Year: 2026
Journal: ECOLOGICAL INFORMATICS
DOI: https://dx.doi.org/10.1016/j.ecoinf.2026.103813
Item type: 1.01 Journal article

Files in this item:

File	Size	Format
Parkinson et al. 2026 Ecological Informatics.pdf (open access; type: publisher's version / Version of Record; license: Open Access, Creative Commons Attribution (CC BY))	2 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1069251
