The process of record linkage on Roman epigraphical sources
Theory, methods and results
The proposed paper explores the methodological and practical aspects of linking individuals extracted from ancient epigraphic sources. Our data come from Romans1by1 (R1by1), a population database of persons attested in ancient epigraphy. Built as a MySQL relational database, it follows best-practice models for population databases, which distinguishes it from platforms focused on hosting and offering various repertoires of sources. R1by1 records people attested epigraphically in the Roman provinces, and the architecture of its metadata was designed to facilitate research pertaining mainly to prosopography and social network analysis (SNA).
As far as we know, record linkage has not yet been applied to people extracted from epigraphy. Record linkage is a procedure for finding duplicate records, standardizing data and matching data across different sources (Hin et al. 2016). Many historical databases contain data entry errors such as misspellings, and the attributes recorded for an individual can change over time (moving to another place, changing occupation, gaining new titles, etc.). Data cleaning and standardization are therefore important steps towards a successful linkage. One method used in record linkage is the Jaro-Winkler similarity (often loosely called the Jaro-Winkler distance), which compares two strings (sequences of characters such as names, places or occupations) and returns a score between 0, when the strings share no matching characters, and 1, for a perfect match. Ultimately, a good record linkage provides a basis for analysing social mobility, mortality and migration, and for reconstructing population patterns.
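To make the 0-to-1 scoring concrete, the following is a minimal Python sketch of the standard Jaro-Winkler similarity. It is an illustration only, not the procedure used in Romans1by1: the function names, the prefix weight `p = 0.1` and the four-character prefix limit are the conventional textbook choices and are assumptions here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 for no matching characters, 1.0 for identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if they lie within this window of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in the two strings.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings sharing a common prefix (assumed p = 0.1)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    # Textbook example: a single transposition gives a score of about 0.961.
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    # A spelling variant of a nomen scores high; unrelated strings score 0.
    print(jaro_winkler("Aurelius", "Aurellius"))
    print(jaro_winkler("abc", "xyz"))
```

In practice, names would be normalized (case, abbreviations, orthographic variants) before scoring, and the threshold above which two strings count as a probable match has to be chosen empirically.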
The data extracted from epigraphic material are not very rich, and the number of missing variables is large. To test the linkage possibilities, we therefore use a semi-automated method. A series of procedures written in SQL compiles a file of probable identifications, i.e. records in the Romans1by1 database that may refer to the same individual. The variables used for linking include praenomen, nomen, cognomen, occupation, tribus, origo/domus, etc. To verify that individuals are correctly linked, we then carry out a manual check (in this phase, relations and other people attested on the same monuments are paramount). This allows us to establish some basic rules for the linking process. We believe that record linkage is very useful for identifying connections between individuals that would otherwise remain invisible, especially when working with “big data” (big by the standards of historical data, not of sociological big data).
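The automated step of such a workflow can be sketched as a self-join that lists candidate pairs for later manual checking. The example below is a hypothetical illustration, not the actual Romans1by1 procedures: the `persons` table, its columns and the sample rows are invented, SQLite (via Python's standard library) stands in for MySQL, and the matching rule (same nomen and cognomen, praenomen equal or missing) is only one assumed possibility.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `persons` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE persons (
            id INTEGER PRIMARY KEY,
            praenomen TEXT,
            nomen TEXT,
            cognomen TEXT,
            occupation TEXT
        )
        """
    )
    # Invented sample records; NULL (None) marks a variable missing from the inscription.
    rows = [
        (1, "Titus", "Flavius", "Felix", "negotiator"),
        (2, None, "Flavius", "Felix", None),
        (3, "Gaius", "Iulius", "Valens", "veteranus"),
        (4, "Lucius", "Flavius", "Felix", None),
    ]
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def candidate_pairs(conn: sqlite3.Connection) -> list:
    """Return (id, id) pairs of records that may refer to the same individual."""
    query = """
        SELECT a.id, b.id
        FROM persons AS a
        JOIN persons AS b
          ON a.id < b.id                 -- each unordered pair once, no self-pairs
         AND a.nomen = b.nomen
         AND a.cognomen = b.cognomen
        WHERE a.praenomen IS NULL        -- a missing praenomen does not rule out a link
           OR b.praenomen IS NULL
           OR a.praenomen = b.praenomen
        ORDER BY a.id, b.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for left_id, right_id in candidate_pairs(conn):
        print(left_id, right_id)
```

With the invented rows above, records 1 and 2 and records 2 and 4 are proposed as candidates, while 1 and 4 are not, because their praenomina conflict. Deciding whether record 2 belongs with 1, with 4 or with neither is the kind of question left to the manual check.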
Our paper aims not merely at presenting preliminary results of these efforts, but more importantly at detailing the methodological and technical guidelines of the record linkage procedure.
Alföldy (1965): Alföldy, G., Bevölkerung und Gesellschaft der römischen Provinz Dalmatien, Budapest.
Broekaert (2013): Broekaert, W., Navicularii et negotiantes: a prosopographical study of Roman merchants and shippers, Rahden/Westf.
Cooley (2012): Cooley, A., The Cambridge Manual of Latin Epigraphy, Cambridge.
Herman (1983): Herman, J., “La langue latine dans la Gaule romaine”, Aufstieg und Niedergang der römischen Welt II 29 (2), 1045–1060.
Hin et al. (2016): Hin, S., Conde, D. A. and A. Lenart, “New light on Roman census papyri through semi-automated record linkage”, Historical Methods: A Journal of Quantitative and Interdisciplinary History 49 (1), 50–65.
Mandemakers and Dillon (2004): Mandemakers, K. and L. Dillon, “Best Practices with Large Databases on Historical Populations”, Historical Methods: A Journal of Quantitative and Interdisciplinary History 37 (1), 34–38.
Varga (2013): Varga, R., “Two inscriptions from Sarmizegetusa revisited”, Studia Antiqua et Archaeologica 19, 79–86.
Varga (2016): Varga, R., “Aurelius Aquila, negotiator ex provincia Dacia. A prosopographic reconstruction”, in: R. Ardevan, E. Beu-Dachin (eds.), Mensa rotunda epigraphica Napocensis, Cluj-Napoca, 27–34.
Varga (2017): Varga, R., “Romans1by1 v.1.1. New developments in the study of Roman population”, Digital Classics Online 3 (2), 44–50.
Varga et al. (2018): Varga, R., Pázsint, A., Boda, I., Deac, D., “Romans 1by1. Overview of a research project”, Digital Classics Online 4 (2), 37–63.
AE – Année Épigraphique, Paris.
AIJ – Hoffiller, V. and B. Saria. 1938. Antike Inschriften aus Jugoslawien. Zagreb: Druck der Fondsdruckerei der ‘Narodne novine’.
CIL – Corpus Inscriptionum Latinarum, Berlin.
IDR – Inscriptiones Daciae Romanae, București-Paris.
ISM – Inscriptiones Scythiae Minoris, București-Paris.
RIB – Roman Inscriptions of Britain, Oxford.