Google has chosen Dublin City University’s (DCU) Adaptive Information Cluster (AIC) to take part in a project with two American universities as part of the company’s quest to make all information in the world searchable. The project will focus on making rare documents like the Book of Kells or George Washington’s personal diaries available on the web for scholars.
Until now, such documents have been kept behind closed doors or have been accessible in digital libraries only one page at a time.
The project is being carried out by the AIC at DCU in partnership with the University of Buffalo and the University of Massachusetts at Amherst.
The DCU team, led by Prof Alan Smeaton and Dr Noel O’Connor, has internationally recognised expertise in video analysis and has applied this to making images of handwriting searchable.
Dr O’Connor said: “With handwriting, which is at present not searchable, we are getting very good detection using the shape of a word, even though the writer will always alter the way he or she writes the same word each time. We’ve applied the approach to hundreds of pages of George Washington’s diaries and memoirs, getting very good results. For example, you can select the word ‘battle’ and find all the references to that word in Washington’s writings.”
“This will make historical manuscripts searchable for scholars and others in a way that has never been possible before,” said Prof Smeaton.
Libraries around the world are in the process of digitising their rare and historical manuscripts. In the future, by using this technology, Google search engines could make these manuscripts available and searchable worldwide.
DCU is also involved with the Dublin Institute for Advanced Studies in the Irish Script on Screen (ISOS) project, which is digitising old manuscripts written in Irish.
Thousands of images have been scanned, also with the intention of making them searchable. The system is based on “object detection” in video, which detects and identifies people, cars or other objects across different video frames even when their position or angle changes. The DCU team applies the same idea to the differing slants and shapes of words in handwriting. The algorithms designed by DCU researchers can detect reasonable variations in shape, which are exactly the kind of variations found in handwriting.
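To give a rough feel for what matching words by shape can look like, here is a minimal Python sketch. It is not the DCU algorithm, whose details are not given here. It swaps in a simple, well-known stand-in: each word image is reduced to a column-by-column ink profile, and two profiles are compared with dynamic time warping, which tolerates a word being written wider or narrower. The function names, the tiny 0/1 images and the threshold are all assumptions made for illustration.

```python
def column_profile(image):
    """Count the ink pixels in each column of a binary word image.

    `image` is a list of rows, each a list of 0 (paper) or 1 (ink).
    This toy representation is an assumption for illustration only.
    """
    return [sum(column) for column in zip(*image)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Stretching or squeezing one sequence along its length is allowed,
    so a wider or narrower rendering of the same shape scores low.
    """
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the cost table
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr.append(cost + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def find_word(query, candidates, threshold):
    """Return the indices of candidate images whose shape is close to the query."""
    q = column_profile(query)
    return [
        i
        for i, image in enumerate(candidates)
        if dtw_distance(q, column_profile(image)) <= threshold
    ]


# Hypothetical toy data: one query word, a wider copy of it, and a different shape.
query = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 0, 0]]
wider = [[0, 1, 1, 1, 0],
         [0, 1, 1, 0, 0],
         [1, 1, 1, 0, 0]]
other = [[1, 0, 0, 1],
         [1, 0, 0, 1],
         [1, 0, 0, 1]]

matches = find_word(query, [wider, other], threshold=1)
```

In a real system the features would be far richer and the threshold would be tuned on actual manuscript pages. The sketch only shows the principle that a search for a word such as “battle” can be run on shapes rather than on typed text.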
Prof Smeaton believes the techniques being developed in this project could lead to handwritten manuscripts being available for searching in the giant Google index within a couple of years. “As a company, Google moves very fast and if the techniques we are developing in this project are as good as early results indicate, we can expect to see Google take up the outputs.”
AIC was established two years ago and is funded by Science Foundation Ireland. It is a multi-disciplinary research group involving leading researchers from DCU and University College Dublin working in sensor science, software engineering, electronic engineering and computer science. Close collaboration with industry and state bodies to develop applications for this research is a priority for the AIC. Particular areas of interest are personal health management, environmental monitoring, personalised retailing and security and threat detection.
By John Kennedy