History of Papyrology

About the Project

The project is a digital edition of correspondence exchanged between papyrologists in the late 19th and early 20th centuries, when the discipline of papyrology was in its infancy. It currently focuses mainly on letters sent by B.P. Grenfell (1869-1926), who together with A.S. Hunt (1871-1934) founded British papyrology, but it aims to expand its coverage in the near future. UC Berkeley’s Prof. Todd Hickey has long had a research interest in the early history of papyrology and has published extensively on the topic. For decades, he searched for papyrologists’ letters throughout American and European libraries and published some of these (e.g., BASP xxxx). However, this work also led him to realize that only a collaborative, digital project would lead to a fuller understanding … impact … more letters accessible … History of Papyrology

The Team

History of Papyrology began in 2022 as The Smyly Correspondence Project Online, a thesis project created by Sarah Tew in the Digital Humanities and Digital Knowledge LM program at the University of Bologna, Italy. In September 2023, the team expanded to include Dr. Giuliano Sidro, postdoctoral scholar at the Center for the Tebtunis Papyri (CTP), and undergraduates Maddie Qualls and Millie You, who worked on the project through UC Berkeley’s Undergraduate Research Apprenticeship Program (URAP) at CTP.

Document processing: Transcription, encoding, and commentary

[Unibo era - Sarah Tew manually transcribed and encoded letters [X TO X]. TEI Encoding - P5 guidelines; what was encoded

Editorial guidelines

Transcriptions are documentary. Handwritten text is rendered in Times New Roman, while the printed text of letterheads and postcard headings is rendered in Arial. The layout of the documents is only partially rendered. Major features of correspondence, such as datelines, signatures, and postscripts, have been formatted accordingly. Indents and whitespace have been preserved only in handwritten tables in ancient Greek, where the layout significantly affects the meaning and readability of the information. Postmarks and stamps have not been rendered on the webpage but are described in the XML headers. Named entities are hyperlinked to their Wikidata pages and rendered in blue. Commentary from Dr. Todd Hickey accompanies each document.

Transcription

Sarah Tew manually transcribed letters X to X using Transkribus. Layout analysis was followed by manual transcription and encoding within Transkribus. The transcriptions were then exported as XML and processed further in Oxygen XML Editor, mainly to add structural tags, since the export was not exactly what we needed.

Sidro, Qualls, and You joined the team in the Fall 2023 and Spring 2024 semesters and took over the transcription work. They worked on the Trinity College Dublin letters from B.P. Grenfell and A.S. Hunt to J.G. Smyly. Our goal at the time was to update the manual transcriptions done by Sarah and to create notes for letters 77 to 106 based on corrections and commentary from Dr. Hickey. To do this, we edited the transcriptions and marked up the text with named-entity and non-entity tags in Transkribus before exporting them as XML files. We then edited the XML files in Oxygen XML Editor: we added a more robust header than the one Transkribus generates on export, added attributes to display the letters’ formatting more accurately, brought the files into compliance with the TEI guidelines, and added pointers linking to the commentary. Also in Oxygen, we created HTML note files containing Dr. Hickey’s annotations.

In Spring 2025 we began work to ingest a new collection of letters, written by B.P. Grenfell and addressed mostly to P.A. Hearst or B.I. Wheeler, that are held at The Bancroft Library at UC Berkeley. This time, we used a model we developed in Transkribus to create preliminary transcriptions. For details on the model, please see the Digital Humanities section below.

In Summer 2025, Maddie Qualls updated the Trinity College Dublin transcriptions and notes based on corrections and commentary from Dr. James Keenan.

All transcriptions are available at [GITHUB LINK].

Encoding

All documents have been encoded according to the TEI P5 guidelines [https://tei-c.org/guidelines/p5/]. Structural components encoded include pages, paragraphs, postscripts, letterheads, signature lines, and datelines. Structural encoding was completed through a combination of built-in TEI encoding features in Transkribus and manual encoding by Sarah Tew, Giuliano Sidro, Maddie Qualls, and Millie You. [Option to list specific tags] The same team manually encoded strikethroughs, deletions, and additions, which are rendered in the digital edition, as well as named people, places, organizations, and published, written, and artistic works. TEI header - what’s in it
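To make the named-entity encoding concrete, here is a minimal Python sketch that lists the entities tagged in a TEI file. The sample letter, its wording, and the example.org references are hypothetical stand-ins and are not taken from the edition. The elements shown (persName, placeName, orgName, title, del, add) are standard TEI P5 elements, but the exact tag set used in this project may differ.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample letter; the text and the example.org links are invented
# for illustration and do not come from the edition.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="letter">
        <opener>
          <dateline><placeName ref="https://example.org/place/oxford">Oxford</placeName></dateline>
        </opener>
        <p>I have written to <persName ref="https://example.org/person/hunt">Hunt</persName>
          about the <del>papyri</del><add>texts</add>.</p>
        <closer><signed>B. P. Grenfell</signed></closer>
      </div>
    </body>
  </text>
</TEI>"""


def list_entities(tei_xml):
    """Return (element name, text, ref) for each named entity, grouped by element type."""
    root = ET.fromstring(tei_xml)
    entities = []
    for tag in ("persName", "placeName", "orgName", "title"):
        for el in root.iterfind(f".//tei:{tag}", TEI_NS):
            # Collapse line breaks and runs of spaces inside the element.
            text = " ".join("".join(el.itertext()).split())
            entities.append((tag, text, el.get("ref")))
    return entities


for entity in list_entities(SAMPLE):
    print(entity)
```

A listing of this kind could feed an index of people and places, or a check that every entity carries a reference.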

Commentary

Todd Hickey provided commentary James Keenan, Loyola University Chicago Flavio - did he provide commentary on letter content or more the website and presentation of content? Team’s creation of note files

Images

High-resolution images of the documents were supplied by the holding collections: Trinity College Dublin and UC Berkeley’s Bancroft Library provided TIFFs of the documents in their collections. Sarah Tew used LibVips to create deep-zoom image files, which the website displays with OpenSeadragon.
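For readers unfamiliar with deep-zoom images, the following Python sketch estimates how an image is split into a pyramid of tiles. It assumes the Deep Zoom (DZI) layout, in which each level halves the resolution of the one above it until the image fits in a single pixel. The 4000 x 3000 pixel size and the tile size of 254 are illustrative assumptions, not the project's actual settings.

```python
import math


def deep_zoom_levels(width, height, tile_size=254):
    """Return (level, width, height, tile count) for each level of a DZI-style pyramid.

    tile_size=254 is an assumed value for illustration only.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # level 0 is the smallest image
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((level, w, h, tiles))
    return levels


# Hypothetical scan size, chosen only to show the shape of the output.
for level, w, h, tiles in deep_zoom_levels(4000, 3000):
    print(f"level {level:2d}: {w} x {h} px, {tiles} tile(s)")
```

A viewer such as OpenSeadragon requests only the tiles needed for the current zoom level and viewport, which is why a large TIFF can be explored in the browser without being downloaded in full.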

Website Creation

History of Papyrology is a static website generated with Hugo and hosted by Dr. Todd Hickey through Dreamhosters. To create the website, Sarah transformed the XML transcriptions into Markdown and HTML files with an XSLT stylesheet in Oxygen. She wrote templates for each webpage and then used Hugo to compile all of the files into the website you are now viewing. In the summer of 2025, Sarah and Millie ingested the UC Berkeley letters that the team had transcribed and tagged during the spring of 2025. In the following semester, the team focused on improving the usability of the website, working on interface design to improve its information structure and navigation. The team conducted a survey of existing digital archives, and Millie updated the HTML and CSS to incorporate the new designs.
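The idea behind the XML-to-HTML transformation can be sketched in a few lines. The example below uses Python rather than XSLT and is not the project's stylesheet. The element-to-tag mapping and the sample paragraph are assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from TEI elements to HTML tags; the real stylesheet may differ.
TAG_MAP = {"p": "p", "del": "del", "add": "ins"}
LINKED = {"persName", "placeName", "orgName"}


def render(el):
    """Recursively convert a TEI element and its mixed content to an HTML string."""
    name = el.tag.replace(TEI, "")
    inner = html.escape(el.text or "")
    for child in el:
        # A child's tail is the text that follows it inside the parent element.
        inner += render(child) + html.escape(child.tail or "")
    if name in LINKED:
        href = html.escape(el.get("ref", "#"))
        return f'<a href="{href}">{inner}</a>'
    if name in TAG_MAP:
        tag = TAG_MAP[name]
        return f"<{tag}>{inner}</{tag}>"
    return inner  # unknown elements: keep the text, drop the wrapper


# Invented sample paragraph, not taken from the edition.
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">Ask '
    '<persName ref="https://example.org/person/hunt">Hunt</persName>'
    " about the <del>papyri</del><add>texts</add>.</p>"
)
print(render(ET.fromstring(sample)))
```

An XSLT stylesheet expresses the same mapping declaratively, with one template per TEI element, and the resulting files are then passed to Hugo.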

Digital Humanities Projects

In the fall semester of 2024, with the Smyly correspondence fully on the website, we turned to digital humanities experiments with the transcriptions and named-entity data from the corpus. Sarah organized Zoom-based workshops on text-recognition models in Transkribus, data visualizations in AntConc, text analysis in Voyant, and network analysis in Gephi.

In Transkribus, we trained multiple models on different groups and amounts of material from the TCD collection. After several trials, our best model was Grenfell 1, with a character error rate (CER) of 14.81%. This model was trained on documents 50-99, which consist mostly of letters containing only handwritten material, along with three postcards and three letters with printed letterheads. We also tested one of Transkribus’ Super Models, The Text Titan I, which is a large language model trained on handwritten material in multiple European languages, including English. On qualitative assessment, our trained model Grenfell 1 appeared to perform better than The Text Titan I. Although its CER was higher than the target of 8%, it still proved to be a useful tool in later stages of our project.

After the network analysis workshop, Maddie was inspired to investigate the nature of Grenfell’s relationships with the people he writes about. Initially, she sorted the named people into one of five categories: professional (45.95%), personal (4.05%), both (8.11%), none (32.43%), or unknown (9.46%). This classification is subjective and involves a lot of gray area, but she established criteria for assigning each person to a category. After labeling each person, she imported the data as a CSV file into Gephi to create a visualization and statistical analysis. The analysis showed that the largest share are professional relations, as Grenfell’s letters to Smyly are mostly about professional and scholarly topics. The “none” category comes next because of how frequently he writes about ancient figures as they relate to his papyrological work. People who are both professional and personal relations make up only about 8% of those mentioned by Grenfell, but they carry more weight because they are among the most frequently mentioned individuals, as shown by the size of their nodes in the Gephi visualization. Although this reflects only how the documents refer to these people, rather than the actual relationships between Grenfell and these individuals, the analysis can help us better understand the nature of the correspondence between Grenfell and Smyly.

In January, our team presented the results of our digital humanities research at the Ancient MakerSpaces session of the 2025 Society for Classical Studies meeting in Philadelphia.
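For context on the character error rate reported above, here is a minimal Python sketch of how a CER is commonly computed: the edit distance between the model's output and a reference transcription, divided by the length of the reference. This is the standard definition, not a description of Transkribus' internal calculation, and the sample strings are invented.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one wrong character out of seven.
print(f"{character_error_rate('papyrus', 'papyros'):.2%}")
```

Under this definition, a CER of 14.81% means roughly 15 character edits are needed for every 100 characters of reference text, which is why the model's output served as a draft for manual correction rather than as a finished transcription.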

UCB Collection

During the spring 2025 semester, we began to ingest a new batch of correspondence, consisting of 14 letters from the George and Phoebe Apperson Hearst papers and UC Berkeley’s University Archives, both kept at The Bancroft Library. The letters are from B.P. Grenfell to multiple recipients, mostly Phoebe A. Hearst and Benjamin I. Wheeler, and concern the publication of the Tebtunis Papyri volumes. To begin, we imaged the letters, uploaded the images to Transkribus, and ran and corrected layout analysis on each letter to prepare it for transcription. We then used our new text-recognition model to produce initial transcriptions, which we corrected manually and tagged. From here, the process was much the same as with the TCD documents: we exported the transcriptions as XML files, edited them in Oxygen, and made corrections and HTML note files based on Dr. Hickey’s review.

Future of the project

We plan to continue expanding the scope of the project by processing correspondence by Grenfell and other papyrologists from more collections, and possibly by including other languages and other forms of media, such as journals, in order to create a digital archive that helps inform the history of papyrology.

Shortlist of software used to create this project

Transkribus, LibVips, OpenSeadragon, Hugo, Oxygen XML Editor

Thank yous

Trinity College Dublin Board of Trustees; University of California, Berkeley Libraries; CTP; Flavio; James Keenan; Brian McGing; Giulio Iovine; Unibo