Searching in a large data set those strings that are more similar, according to the edit distance, to a given one is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended using a cheap approximate distance function as a filter to quickly discard irrelevant strings. Using the bag distance as an approximation of the edit distance, we show an improvement in performance up to 90% with respect to the basic case. This, along with the fact that our solution is independent on both the distance used in the pre-test and on the underlying metric index, demonstrates that metric indices are a powerful solution, not only for many modern application areas, as multimedia, data mining and pattern recognition, but also for the string matching problem.

String matching with metric trees using an approximate distance / Bartolini I.; Ciaccia P.; Patella M.. - STAMPA. - 2476:(2002), pp. 271-283. (Intervento presentato al convegno 9th International Symposium on String Processing and Information Retrieval, SPIRE 2002 tenutosi a Lisbon, Portugal nel September 2002) [10.1007/3-540-45735-6_24].

String matching with metric trees using an approximate distance

Bartolini I.;Ciaccia P.;Patella M.
2002

Abstract

Searching in a large data set those strings that are more similar, according to the edit distance, to a given one is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended using a cheap approximate distance function as a filter to quickly discard irrelevant strings. Using the bag distance as an approximation of the edit distance, we show an improvement in performance up to 90% with respect to the basic case. This, along with the fact that our solution is independent on both the distance used in the pre-test and on the underlying metric index, demonstrates that metric indices are a powerful solution, not only for many modern application areas, as multimedia, data mining and pattern recognition, but also for the string matching problem.
2002
String Processing and Information Retrieval
271
283
String matching with metric trees using an approximate distance / Bartolini I.; Ciaccia P.; Patella M.. - STAMPA. - 2476:(2002), pp. 271-283. (Intervento presentato al convegno 9th International Symposium on String Processing and Information Retrieval, SPIRE 2002 tenutosi a Lisbon, Portugal nel September 2002) [10.1007/3-540-45735-6_24].
Bartolini I.; Ciaccia P.; Patella M.
File in questo prodotto:
Eventuali allegati, non sono esposti

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/918344
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 38
  • ???jsp.display-item.citation.isi??? ND
social impact