A team of researchers has developed a model that could lead to better techniques to identify key words that capture the topics in a web page.
Filippo Menczer and Alessandro Flammini of Indiana University, along with M. Ángeles Serrano of the University of Barcelona, have developed a generative model that explains, through simple rules, the simultaneous occurrence of several patterns observed in many written languages.
Their study focuses on the well-known Zipf's law on the frequencies of words, as well as additional patterns such as Heaps' law on the diversity of words, and the similarity between documents.
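For readers unfamiliar with these two patterns: Zipf's law says that a word's frequency is roughly inversely proportional to its rank in the frequency table, and Heaps' law says that the number of distinct words grows more slowly than the length of the text. The short Python sketch below only measures both quantities on a toy sentence. It is an illustration, not the researchers' generative model, and the function names and sample text are made up for this example.

```python
from collections import Counter


def rank_frequency(words):
    """Return word counts sorted from most to least frequent (the Zipf view)."""
    return [count for _, count in Counter(words).most_common()]


def vocabulary_growth(words):
    """Return the number of distinct words seen after each token (the Heaps view)."""
    seen = set()
    growth = []
    for word in words:
        seen.add(word)
        growth.append(len(seen))
    return growth


# Toy text chosen for illustration only; real studies use large document collections.
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()

frequencies = rank_frequency(words)   # counts by rank, highest first
growth = vocabulary_growth(words)     # distinct-word count after each token
```

On a large collection of documents, plotting `frequencies` against rank on logarithmic axes would be expected to give an approximately straight line, and `growth` would be expected to flatten as the text gets longer.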
This research could have practical applications in computer science, cognitive science and linguistics. For example, all search engines are based on the analysis of text. The model developed by the researchers and the discoveries made in their study could lead to better techniques to identify key words that capture the topics of a web page, which is crucial for matching search queries with relevant results.
The research team is therefore confident that their work will stimulate further research in this area.
The model could end up being the foundation of an approach that helps experts improve a wide range of applications based on the analysis of written text, such as search engines, contextual advertising and the automatic detection of the topic of an online page.