Semantic component selection

Sjachyn, Maxym (2009) Semantic component selection. Doctoral thesis, University of Westminster.


Download (4MB)


The means of locating information quickly and efficiently is a growing area of research. However the real challenge is not related to locating bits of information, but finding those that are relevant. Relevant information resides within unstructured ‘natural’ text. However, understanding natural text and judging information relevancy is a challenge. The challenge is partially addressed by use of semantic models and reasoning approaches that allow categorisation and (within limited fashion) provide understanding of this information. Nevertheless, many such methods are dependent on expert input and, consequently, are expensive to produce and do not scale. Although automated solutions exist, thus far, these have not been able to approach accuracy levels achievable through use of expert input. This thesis presents SemaCS - a novel nondomain specific automated framework of categorising and searching natural text. SemaCS does not rely on expert input; it is based on actual data being searched and statistical semantic distances between words. These semantic distances are used to perform basic reasoning and semantic query interpretation. The approach was tested through a feasibility study and two case studies. Based on reasoning and analyses of data collected through these studies, it can be concluded that SemaCS provides a domain independent approach of semantic model generation and query interpretation without expert input. Moreover, SemaCS can be further extended to provide a scalable solution applicable to large datasets (i.e. World Wide Web). This thesis contributes to the current body of knowledge by establishing, adapting, and using novel techniques to define a generic selection/categorisation framework. Implementing the framework outlined in the thesis improves an existing algorithm of semantic distance acquisition. Finally, as a novel approach to the extraction of semantic information is proposed, there exists a positive impact on Information Retrieval domain and, specifically, on Natural Language Processing, word disambiguation and Web/Intranet search.

Item Type: Thesis (Doctoral)
Subjects: University of Westminster > Science and Technology > Electronics and Computer Science, School of (No longer in use)
Depositing User: Miss Nina Watts
Date Deposited: 17 Aug 2010 09:22
Last Modified: 17 Aug 2010 09:22

Actions (login required)

Edit Item (Repository staff only) Edit Item (Repository staff only)