11th-century glyphs classified through AI research project

Watch all the Transform 2020 sessions on-demand here.

Artificial intelligence can detect faces, grocery items, and even potentially poisonous mushrooms. So why not historical graffiti?

In a paper published on the preprint server Arxiv.org (“Open Source Dataset and Machine Learning Techniques for Automatic Recognition of Historical Graffiti“), researchers at the National Technical University of Ukraine and Huizhou University’s School of Information Science and Technology describe a machine learning model that detects, isolates, and classifies ancient letters carved on the stone walls of a Kiev cathedral.

“[C]arved handwriting has usually much worse quality and shabby state to provide the similar values of accuracy … Usually, the preprocessing requires a priori knowledge about the entire glyph, but [certain] datasets are not available at the moment as open source databases …” the team wrote. “The main aim of this paper is to apply some machine learning techniques for automatic recognition of … historical graffiti … and estimate their efficiency in the view of the complex geometry, barely discernible shape, and low statistical representativeness.”

The researchers focused the bulk of their efforts on Glagolitic and Cyrillic, two alphabets commonly used in Eastern Slavic visual texts. Archeologists found glyphs from both — some dating back to the 11th century — on the St. Sophia Cathedral in Ukraine. To date, about 7,000 have been detected and studied.

June 5th: The AI Audit in NYC

Join us next week in NYC to engage with top executive leaders, delving into strategies for auditing AI models to ensure fairness, optimal performance, and ethical compliance across diverse organizations. Secure your attendance for this exclusive invite-only event.

It goes without saying that historical letter datasets aren’t as common as, say, those for the Arabic alphabet, so the team assembled and preprocessed a collection of more than 4,000 images for 34 letter types. They used notMINST, a second database containing publicly available fonts and glyphs for the letters A-J, to compare the two outputs.

They next embarked on training a convolutional neural network — a type of machine learning algorithm that’s commonly used in computer vision — to recognize the graffiti by feeding it data from the notMINST and their novel dataset, taking care to horizontally and vertically flip some of the original images so as to prevent overfitting.

The neural net was 99 percent accurate in isolating characters from the team’s dataset and notMINST, respectively.

In future, the researchers hope to improve the model by “teaching” it to consider factors like date, language, authorship, authenticity, and meaning. Furthermore, they propose the creation of larger datasets shared “around the world” in the spirit of “open science, volunteer data collection, processing, and computing,” which they say will lead to further advancements.

“[G]raffiti are very powerful source of historical knowledge. [F]or example, the only known source of the Safaitic language is graffiti inscriptions on the surface of rocks in southern Syria, eastern Jordan and northern Saudi Arabia,” they wrote. “The recent progress of computer vision and machine learning methods allows applying some of them to improve the current recognition, identification, localization, semantic segmentation, and interpretation of such historical graffiti of various origin …”

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.

The AI insights you need to lead