  • Rada Varga (Author)
    Center for Roman Studies
    Researcher at the Babeș-Bolyai University, Cluj-Napoca; coordinator of the Romans1by1 project. Research fields: Latin epigraphy, population studies, digital classics, Roman social history.
  • Angela Lumezeanu (Author)

This paper explores the methodological and practical aspects of linking individuals extracted from ancient epigraphic sources. Our data come from Romans1by1 (R1by1), a population database of persons attested in ancient epigraphy. Built as a MySQL relational database, it follows best-practice models for population databases and is thus distinct from platforms focused on hosting and offering repertoires of sources. R1by1 records people attested epigraphically in the Roman provinces, and its metadata architecture was designed to facilitate research pertaining mainly to prosopography and social network analysis (SNA).

As far as we know, record linkage has not yet been applied to people extracted from epigraphy. Record linkage is a procedure for finding duplicate records, standardizing data, and matching data across different sources (Hin et al. 2016). Many historical databases contain data entry errors such as misspellings, and the attributes recorded for an individual may change over time (moving to another place, changing occupation, gaining new titles, etc.). Data cleaning and standardization are therefore important steps toward successful linkage. One method used for record linkage is the Jaro-Winkler measure, which scores the similarity between two strings (sequences of characters such as names, places, or occupations): strings with no matching characters score 0, while a perfect match scores 1. Ultimately, good record linkage provides a basis for analysing social mobility, mortality, and migration, and for reconstructing population patterns.
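
To make the scoring concrete, the following is a minimal Python sketch of the standard Jaro-Winkler similarity. It is an illustration only, not the code used in Romans1by1; the function names and the defaults `prefix_scale=0.1` and `max_prefix=4` (Winkler's conventional values) are assumptions introduced here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 for no matching characters, 1.0 for identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they lie within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)


# Textbook example pair: two transposed letters, three-letter common prefix.
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

In practice the inputs would first be cleaned and standardized (consistent case, expanded abbreviations, normalized spelling variants), since the score is computed on raw characters.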

The data extracted from epigraphic material are not very rich, and the number of missing variables is large. To test the linkage possibilities, we therefore use a semi-automated method. Using a series of procedures written in SQL, the computer compiles a file of probable identifications of the same individuals within the Romans1by1 database. The variables used for linking include praenomen, nomen, cognomen, occupation, tribus, origo/domus, etc. To verify that individuals are correctly linked, we then perform a manual check (in this phase, relations and other people attested on the same monuments are paramount). This allows us to establish some basic rules for the linking process. We believe that record linkage is very useful for identifying otherwise invisible connections between individuals, especially when working with “big data” (big by the standards of historical data, not of sociological big data).
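
The automated half of such a workflow can be sketched as follows. This is a hedged illustration in Python, not the SQL procedures used in the project: the sample records are invented, the field names simply mirror the variables listed above, the `min_shared` and `threshold` values are arbitrary assumptions, and the standard library's `difflib.SequenceMatcher` stands in for a Jaro-Winkler comparison. Missing variables are skipped rather than penalized, and the output is only a list of candidates for manual checking.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Linking variables named in the text; "origo" stands for origo/domus.
FIELDS = ("praenomen", "nomen", "cognomen", "occupation", "tribus", "origo")


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for Jaro-Winkler."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(p: dict, q: dict, min_shared: int = 2):
    """Average similarity over fields present in BOTH records.

    Returns None when fewer than `min_shared` fields can be compared,
    because too little evidence is available to propose a link.
    """
    scores = [similarity(p[f], q[f]) for f in FIELDS if p.get(f) and q.get(f)]
    if len(scores) < min_shared:
        return None
    return sum(scores) / len(scores)


def candidate_pairs(persons: list, threshold: float = 0.85) -> list:
    """Return (id, id, score) tuples to be verified by hand, best first."""
    pairs = []
    for p, q in combinations(persons, 2):
        score = link_score(p, q)
        if score is not None and score >= threshold:
            pairs.append((p["id"], q["id"], round(score, 3)))
    return sorted(pairs, key=lambda pair: -pair[2])


# Invented example records, not taken from Romans1by1.
persons = [
    {"id": 1, "nomen": "Aurelius", "cognomen": "Maximus", "occupation": "miles"},
    {"id": 2, "nomen": "Aurelius", "cognomen": "Maximus", "origo": "Apulum"},
    {"id": 3, "nomen": "Valerius", "cognomen": "Felix", "occupation": "medicus"},
]

for pair in candidate_pairs(persons):
    print(pair)
```

Averaging only over shared fields keeps sparse records comparable, but it also means that two records agreeing on a common nomen and cognomen score highly; this is one reason the manual check against relations and co-attested persons remains essential.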

Our paper aims not merely to present preliminary results of these efforts but, more importantly, to detail the methodological and technical guidelines of the record linkage procedure.

This work was supported by a grant of the Romanian Ministry of Research and Innovation, through UEFISCDI, project no. PN-III-P4-IDPCE-2016-0255.
Keywords: epigraphy; historical database; record linkage; microhistory; prosopography