Flax, specialists in open source search software, have delivered a new print media monitoring system for Medianet Monitoring, a division of Australian Associated Press and one of Australia’s leading media monitoring agencies, providing monitoring of print, broadcast and internet media clips and tailored media analysis reports. Based on a modified version of the Apache Lucene/Solr open source search engine, the solution provides powerful, scalable and innovative monitoring of news data for hundreds of Medianet Monitoring’s clients.
Kylie O’Reilly, Managing Director of Medianet, said, 'We were previously using a system based on a closed-source search engine that didn’t allow us the flexibility we required to demonstrate innovation when we analyze the thousands of articles we monitor for our clients. However, we have significant investment in stored profiles that encapsulate our clients' interests – so any new solution had to be able to speak the same "search language" as the old system.'
Flax modified the Apache Lucene search engine to make it compatible with the dtSearch software previously used by Medianet Monitoring and to return extra information about the exact position of any important words found in an article, which is essential for any monitoring application. The Flax media monitoring system then uses Lucene to carry out tens of thousands of searches on each incoming article in under a second, working 20 times faster than the previous engine.
'Media monitoring is unlike a traditional search application, as instead of applying a single search query to an index of many documents, we're applying many queries to a single news article,' said Charlie Hull, Managing Director of Flax. 'This can lead to severe performance issues unless one designs the system very carefully.'
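The inversion described above, many stored queries applied to one incoming article, can be illustrated with a minimal Python sketch. This is not Flax's implementation, which is built on a modified Lucene. The class name `BriefMatcher`, the simple all-terms-required query model and the tokeniser are assumptions made purely for illustration. The sketch indexes the stored briefs by term so that only briefs sharing at least one word with the article are evaluated, and it reports the position of every matched word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Return lower-case word tokens in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BriefMatcher:
    """Hypothetical store of many queries ('briefs') applied to one article at a time."""

    def __init__(self):
        self.briefs = {}                 # brief id -> set of required terms
        self.by_term = defaultdict(set)  # term -> ids of briefs that use it

    def add_brief(self, brief_id, terms):
        required = {t.lower() for t in terms}
        self.briefs[brief_id] = required
        for term in required:
            self.by_term[term].add(brief_id)

    def match(self, article):
        # Record every position at which each word of the article occurs.
        positions = defaultdict(list)
        for pos, token in enumerate(tokenize(article)):
            positions[token].append(pos)

        # Pre-filter: only briefs sharing at least one term with the article.
        candidates = set()
        for token in positions:
            candidates |= self.by_term.get(token, set())

        # Full check: every required term must be present (assumed AND semantics).
        results = {}
        for brief_id in candidates:
            required = self.briefs[brief_id]
            if all(term in positions for term in required):
                results[brief_id] = {t: positions[t] for t in sorted(required)}
        return results


matcher = BriefMatcher()
matcher.add_brief("bank-rates", ["bank", "interest"])
matcher.add_brief("mining", ["mining", "exports"])
hits = matcher.match("The bank raised interest rates; another bank followed.")
print(hits)
```

Only the "bank-rates" brief is evaluated in full here, because the "mining" brief shares no word with the article. Skipping briefs that cannot match is one common way to keep per-article cost low when tens of thousands of queries are stored.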
To allow easy integration with Medianet Monitoring’s existing workflow, Flax created a RESTful Web Service architecture, allowing new articles and customer profiles (or 'briefs') to be supplied as XML. The system outputs XML marked up to show where an article is relevant to a particular client. In addition, each new article is added to an archive, which allows both traditional searches of past news and verification of any new customer briefs via a web application. An HTTP status server allows the system configuration and performance to be checked and individual articles to be submitted for testing against the stored briefs.
'Flax worked with our development team very closely to identify our requirements and to test every aspect of the new system – we can't afford to miss any articles of relevance to our clients,' said Kylie O’Reilly. 'When we first saw the new system in action we were astonished how fast it was.'
As well as replicating the functionality of the previous search engine, the new system adds features such as area weighting (allowing matches within headlines to be more important, for example), compensation for errors introduced during scanning of print news and subsequent OCR (optical character recognition), and confidence scores for any matches, so that Medianet Monitoring can decide which articles to pass to human operators for final verification.
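As a rough illustration of how area weighting might feed a confidence score, the following Python sketch scores an article against a brief. The weights, the threshold, the scoring formula and all function names are hypothetical and are not taken from the Flax system. Each required term contributes the weight of the most important area in which it appears, and the total is normalised to the range 0 to 1.

```python
import re

# Hypothetical values chosen for illustration only.
AREA_WEIGHTS = {"headline": 3.0, "body": 1.0}
REVIEW_THRESHOLD = 0.75


def tokens(text):
    """Return the set of lower-case word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def confidence(article, terms):
    """Score in [0, 1]; each term counts the weight of the best area containing it."""
    if not terms:
        return 0.0
    best = max(AREA_WEIGHTS.values())
    area_tokens = {area: tokens(article.get(area, "")) for area in AREA_WEIGHTS}
    total = 0.0
    for term in terms:
        t = term.lower()
        total += max(
            (w for area, w in AREA_WEIGHTS.items() if t in area_tokens[area]),
            default=0.0,
        )
    return total / (best * len(terms))


def needs_human_review(score):
    """Partial matches below the assumed threshold go to a human operator."""
    return 0.0 < score < REVIEW_THRESHOLD


article = {
    "headline": "Bank lifts rates",
    "body": "The bank said interest rates will rise.",
}
score = confidence(article, ["bank", "interest"])
print(round(score, 3), needs_human_review(score))
```

In this example "bank" is found in the headline and "interest" only in the body, so the score falls below the assumed threshold and the article would be routed for manual checking. A brief whose terms all appear in the headline would score 1.0.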
Flax, who have previously worked for other media monitoring companies such as the UK's Gorkana Group, are currently developing the Flax Media Monitor into a complete end-to-end monitoring solution for print and other media in conjunction with industry partners. 'Using open source software we can offer a highly scalable and affordable monitoring solution,' said Charlie Hull. 'As content volume increases and with the need to monitor multiple sources including online social media, advanced open source search technology is becoming the most flexible and cost-effective way to continue to deliver high-quality media analysis.'