Explore chapters and articles related to this topic
Approach for Context-Aware Semantic Recommendations in FI
Published in Spyrou Evaggelos, Iakovidis Dimitris, Mylonas Phivos, Semantic Multimedia Analysis and Processing, 2017
Naudet Yannick, Groués Valentin, Mignon Sabrina, Arnould Gérald, Foulonneau Muriel, Djaghloul Younes, Khadraoui Djamel
GeoNames provides services to retrieve the components of a location (countries for a continent, administrative subdivisions for a country, etc.), the neighbors (neighboring countries), and nearby features (near the Eiffel Tower are Champ-de-Mars, Trocadero, etc.). This dataset contains information about most geographical locations.
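The three kinds of lookup mentioned above (components, neighbors, nearby features) can be sketched as request URLs. This is a minimal sketch, not the chapter's own code: it assumes the public GeoNames JSON web services `childrenJSON`, `neighboursJSON`, and `findNearbyJSON`, which require a registered `username`. The helper names, the account name `demo_user`, and the identifier `12345` are hypothetical placeholders. The code only builds URLs and performs no network request.

```python
from urllib.parse import urlencode

# Assumed base address of the public GeoNames web services.
BASE_URL = "http://api.geonames.org"


def geonames_url(endpoint: str, username: str, **params) -> str:
    """Build a request URL for a GeoNames endpoint (hypothetical helper)."""
    query = urlencode({**params, "username": username})
    return f"{BASE_URL}/{endpoint}?{query}"


def children_url(geoname_id: int, username: str) -> str:
    """Components of a location, e.g. countries of a continent."""
    return geonames_url("childrenJSON", username, geonameId=geoname_id)


def neighbours_url(geoname_id: int, username: str) -> str:
    """Neighboring countries of a country."""
    return geonames_url("neighboursJSON", username, geonameId=geoname_id)


def nearby_url(lat: float, lng: float, username: str) -> str:
    """Features close to a coordinate, e.g. around the Eiffel Tower."""
    return geonames_url("findNearbyJSON", username, lat=lat, lng=lng)


if __name__ == "__main__":
    user = "demo_user"  # placeholder account name, an assumption
    print(children_url(12345, user))    # 12345 is a placeholder identifier
    print(neighbours_url(12345, user))
    print(nearby_url(48.8584, 2.2945, user))  # approximate Eiffel Tower coordinates
```

Each URL could then be fetched with any HTTP client. The response fields are not shown here because the excerpt does not describe them.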
Learning to combine multiple string similarity metrics for effective toponym matching
Published in International Journal of Digital Earth, 2018
Rui Santos, Patricia Murrieta-Flores, Bruno Martins
Our experiments have mostly relied on a dataset of five million pairs of toponyms, half of which correspond to alternative names for the same place. The dataset was generated from lists of alternative place names associated with records in the publicly available GeoNames gazetteer (i.e. each place that is described in GeoNames can be associated with multiple names, often corresponding to historical denominations or to transliterations in multiple alphabets/languages, and thus we can leverage this information to build a large dataset covering toponyms from all around the globe). The matching pairs of toponyms in our dataset correspond to alternative names with more than two characters that, after converting all characters into their lower-cased equivalents, do not match in every character. The non-matching pairs of toponyms correspond to names for different places, not necessarily within the same country, that also have more than two characters. In order to build a dataset that is both representative and challenging for automated classification, a significant portion of the non-matching pairs of toponyms should not be completely dissimilar. Following this intuition, we preferred toponym pairs having a Jaccard similarity above zero (i.e. when building the dataset, if the similarity between a non-matching pair of toponyms was equal to zero, we discarded the pair with a probability of 0.75).
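The pair-selection rules described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the excerpt does not say which units the Jaccard similarity is computed over, so the code assumes sets of lower-cased characters, and it reads "do not match in every character" as "are not identical after lower-casing". All function names are hypothetical.

```python
import random


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over sets of lower-cased characters (an assumption)."""
    set_a, set_b = set(a.lower()), set(b.lower())
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def keep_matching_pair(a: str, b: str) -> bool:
    """Alternative names of one place: both longer than two characters
    and not identical once lower-cased."""
    return len(a) > 2 and len(b) > 2 and a.lower() != b.lower()


def keep_non_matching_pair(a: str, b: str, rng: random.Random,
                           discard_prob: float = 0.75) -> bool:
    """Names of different places: both longer than two characters.
    Pairs with zero similarity are discarded with probability discard_prob."""
    if len(a) <= 2 or len(b) <= 2:
        return False
    if jaccard(a, b) == 0.0:
        return rng.random() >= discard_prob
    return True


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(keep_matching_pair("Lisboa", "Lisbon"))
    print(keep_matching_pair("Lisboa", "LISBOA"))
    print(keep_non_matching_pair("Porto", "Lyon", rng))
```

With these rules, roughly one in four completely dissimilar non-matching pairs survives, which shifts the negative examples toward pairs that share at least some characters and are therefore harder to classify.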