The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique
Resource type
Journal article
Authors/contributors
- Couture, Beatrice (Author)
- Verret, Farah (Author)
- Gohier, Maxime (Author)
- Deslandres, Dominique (Author)
Title
The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique
Abstract
The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
Publication
Journal of Data Mining & Digital Humanities
Volume
Documents historiques et reconnaissance automatique de texte
Date
2023-12-06
Journal abbreviation
J. Data Min. Digit. Humanit.
Language
English
ISSN
2416-5999
Short title
The Challenges of HTR Model Training
Accessed
2024-12-09 12:01
Library catalog
Extra
Publisher: Episciences.org
Citation
Couture, Beatrice, Farah Verret, Maxime Gohier, and Dominique Deslandres. “The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l’archive à l’ère numérique.” Journal of Data Mining & Digital Humanities, Documents historiques et reconnaissance automatique de texte (December 6, 2023). https://doi.org/10.46298/jdmdh.10542.