Hsinchun Chen
Publications
Abstract:
Web pages about improvised explosive devices (IEDs) represent a significant source of knowledge for security organizations. In this paper, we present significant improvements to our approach to the discovery and classification of IED-related web pages in the Dark Web. We present a statistical feature ranking approach to expanding the keyword lexicon used to discover IED-related web pages, which identified new relevant terms for inclusion. Additionally, we present an improved web page feature representation designed to better capture the structural and stylistic cues that reveal genres of communication, and a series of experiments comparing the classification performance of the new representation with that of our existing approach. ©2009 IEEE.
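The abstract does not name the ranking statistic, so the following is only a minimal sketch of how statistical feature ranking could propose new lexicon terms. It assumes a chi-square score over document frequencies in a labeled set of relevant and irrelevant pages. The function names, the neutral placeholder tokens, and the choice of chi-square are illustrative assumptions, not the authors' method.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: relevant pages containing the term
    b: irrelevant pages containing the term
    c: relevant pages without the term
    d: irrelevant pages without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def rank_candidate_terms(relevant_docs, irrelevant_docs, lexicon):
    """Rank terms not yet in the lexicon by association with relevant pages.

    Each document is a list of tokens. Only terms that occur proportionally
    more often in relevant pages are kept (an assumption of this sketch).
    """
    rel_df = Counter(t for doc in relevant_docs for t in set(doc))
    irr_df = Counter(t for doc in irrelevant_docs for t in set(doc))
    n_rel, n_irr = len(relevant_docs), len(irrelevant_docs)

    scores = {}
    for term in set(rel_df) | set(irr_df):
        if term in lexicon:
            continue
        a, b = rel_df[term], irr_df[term]
        # Skip terms that are not positively associated with relevance.
        if a * n_irr <= b * n_rel:
            continue
        scores[term] = chi_square(a, b, n_rel - a, n_irr - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus with neutral placeholder tokens.
relevant = [["seed", "alpha", "beta"], ["seed", "alpha"], ["alpha", "gamma"]]
irrelevant = [["gamma", "delta"], ["delta", "beta"], ["delta"]]
ranked = rank_candidate_terms(relevant, irrelevant, lexicon={"seed"})
```

In this toy corpus only `alpha` is proposed, since it appears in every relevant page and in no irrelevant one. In practice the top-ranked candidates would still need manual review before being added to the lexicon.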
Abstract:
Information overload is a critical problem on the World Wide Web. The category map, based on Kohonen's self-organizing map (SOM), has proven to be a promising browsing tool for the Web. The SOM algorithm automatically categorizes a large Internet information space into manageable sub-spaces. It compresses and transforms a complex information space into a two-dimensional graphical representation. This graphical representation gives users a friendly interface for exploring the automatically generated mental model. However, as the amount of information increases, the category map is expected to grow accordingly in order to accommodate the important concepts in the information space. This growth increases the visual load of the category map: a large pool of information is packed closely together in a display window of limited size, where local details are difficult to see clearly. In this paper, we propose fisheye views and fractal views to support the visualization of category maps. Fisheye views are based on the distortion approach, while fractal views are based on the information reduction approach. The purpose of fisheye views is to enlarge the regions of interest and shrink the regions that are farther away while maintaining the global structure. Fractal views, on the other hand, are an approximation mechanism that abstracts complex objects and controls the amount of information displayed. We developed a prototype system and conducted a user evaluation to investigate the performance of fisheye views and fractal views. The results show that both fisheye views and fractal views significantly increase the effectiveness of visualizing category maps. In addition, fractal views are significantly better than fisheye views, but combining fractal views and fisheye views does not improve performance compared with either technique alone. © 2002 Elsevier Science B.V. All rights reserved.
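The abstract does not give the formulas behind the two views, so the sketch below assumes two common formulations: the Sarkar and Brown graphical fisheye transform G(t) = (d + 1)t / (dt + 1) for the distortion approach, and Koike-style fractal values F_child = C * N^(-1/D) * F_parent, with pruning below a threshold, for the information reduction approach. The function names, the parameter defaults, and the toy tree are hypothetical.

```python
def fisheye(x, focus, d=3.0, lo=0.0, hi=1.0):
    """Distort one coordinate x in [lo, hi] around a focus point.

    d is the distortion factor (assumed non-negative); d = 0 leaves x
    unchanged. Points near the focus are pushed apart, points near the
    boundary are compressed, and the boundaries themselves stay fixed.
    """
    if x == focus:
        return x
    boundary = hi if x > focus else lo
    d_max = boundary - focus          # signed distance from focus to boundary
    t = (x - focus) / d_max           # normalized distance in [0, 1]
    g = (d + 1) * t / (d * t + 1)     # Sarkar-Brown transform
    return focus + g * d_max


def fractal_values(tree, root, C=1.0, D=2.0):
    """Assign a fractal value to every node reachable from root.

    tree maps a node to its list of children. The root gets value 1.0 and
    each child gets C * N ** (-1 / D) times its parent's value, where N is
    the number of siblings including the child itself.
    """
    values = {root: 1.0}
    stack = [root]
    while stack:
        node = stack.pop()
        children = tree.get(node, [])
        if not children:
            continue
        ratio = C * len(children) ** (-1.0 / D)
        for child in children:
            values[child] = ratio * values[node]
            stack.append(child)
    return values


def visible_nodes(values, threshold):
    """Keep only nodes whose fractal value reaches the threshold."""
    return {node for node, value in values.items() if value >= threshold}


# Hypothetical two-level category hierarchy.
tree = {"root": ["a", "b", "c", "d"], "a": ["a1", "a2", "a3", "a4"]}
values = fractal_values(tree, "root")
shown = visible_nodes(values, threshold=0.3)
```

With these defaults the four top-level categories receive a value of 0.5 and the grandchildren 0.25, so a threshold of 0.3 hides the deepest level while keeping the overall structure. For a two-dimensional map, `fisheye` would be applied to the x and y coordinates independently.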
Abstract:
The amount of non-English information on the Web has proliferated so rapidly in recent years that it is often difficult for a user to retrieve documents in an unfamiliar language. In this study, we report the design and evaluation of a multilingual Web portal in the business domain covering English, Chinese, Japanese, Spanish, and German. Web pages relevant to the domain were collected. Search queries were translated using bilingual dictionaries, while phrasal translation and co-occurrence analysis were used for query translation disambiguation. Pivot translation was used for language pairs for which bilingual dictionaries were not available. A user evaluation study showed that, on average, multilingual performance reached 72.99% of monolingual performance. In evaluating pivot translation, we found that it reached 40% of monolingual retrieval performance, which was not as good as direct translation. Overall, our results are encouraging and show promise for the successful application of multilingual information retrieval (MLIR) techniques to Web retrieval.
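As a rough illustration of pivot translation, the sketch below chains two bilingual dictionaries through an intermediate language when no direct dictionary exists. The dictionaries, their entries, and the function name are invented for this example; the abstract does not describe the portal's data structures.

```python
def pivot_translate(term, source_to_pivot, pivot_to_target):
    """Translate a term through a pivot language.

    Both arguments are dictionaries mapping a term to a list of candidate
    translations. Every pivot candidate is expanded in turn, so ambiguity
    in the first step multiplies in the second.
    """
    candidates = []
    for pivot_term in source_to_pivot.get(term, []):
        for target_term in pivot_to_target.get(pivot_term, []):
            if target_term not in candidates:
                candidates.append(target_term)
    return candidates


# Hypothetical toy dictionaries: Spanish -> English -> German.
spanish_to_english = {"banco": ["bank", "bench"]}
english_to_german = {"bank": ["Bank", "Ufer"], "bench": ["Sitzbank"]}

german_candidates = pivot_translate("banco", spanish_to_english, english_to_german)
```

Here a single Spanish query term yields three German candidates, two of which are unrelated to finance. This kind of compounding ambiguity is one plausible reason pivot translation trails direct translation, and it is where a disambiguation step such as co-occurrence analysis would be applied.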
Abstract:
It is important for education in computer science and information systems to keep up to date with the latest developments in technology. With the rapid development of the Internet and the Web, many schools have included Internet-related technologies, such as Web search engines and e-commerce, as part of their curricula. Previous research has shown that it is effective to use search engine development tools to facilitate students' learning. However, the effectiveness of these tools in the classroom has not been evaluated. In this article, we review the design of three search engine development tools, SpidersRUs, Greenstone, and Alkaline, and then present an evaluation study that compared the three tools in the classroom. In the study, 33 students were divided into 13 groups, and each group used the three tools to develop three independent search engines in a class project on Internet applications development. Our evaluation results showed that SpidersRUs performed better than the other two tools in overall satisfaction and in the level of knowledge students reported gaining from using the tools. © 2009 ASIS&T.