Hsinchun Chen

Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Primary Department
Contact
(520) 621-4153

Research Interest

Dr Chen's areas of expertise include:Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Chau, M., & Chen, H. (2008). A machine learning approach to web page filtering using content and structure analysis. Decision Support Systems, 44(2), 482-494.

Abstract:

As the Web continues to grow, it has become increasingly difficult to search for relevant information using traditional search engines. Topic-specific search engines provide an alternative way to support efficient information retrieval on the Web by providing more precise and customized searching in various domains. However, developers of topic-specific search engines need to address two issues: how to locate relevant documents (URLs) on the Web and how to filter out irrelevant documents from a set of documents collected from the Web. This paper reports our research in addressing the second issue. We propose a machine-learning-based approach that combines Web content analysis and Web structure analysis. We represent each Web page by a set of content-based and link-based features, which can be used as the input for various machine learning algorithms. The proposed approach was implemented using both a feedforward/backpropagation neural network and a support vector machine. Two experiments were designed and conducted to compare the proposed Web-feature approach with two existing Web page filtering methods - a keyword-based approach and a lexicon-based approach. The experimental results showed that the proposed approach in general performed better than the benchmark approaches, especially when the number of training documents was small. The proposed approaches can be applied in topic-specific search engine development and other Web applications such as Web content management. © 2007 Elsevier B.V. All rights reserved.

Chung, W., Lai, G., Bonillas, A., Wei, X. i., & Chen, H. (2008). Organizing domain-specific information on the Web: An experiment on the Spanish business Web directory. International Journal of Human Computer Studies, 66(2), 51-66.

Abstract:

Web directories organize voluminous information into hierarchical structures, helping users to quickly locate relevant information and to support decision-making. The development of existing ontologies and Web directories either relies on expert participation that may not be available or uses automatic approaches that lack precision. As more users access the Web in their native languages, better approaches to organizing and developing non-English Web directories are needed. In this paper, we have proposed a semi-automatic framework, which consists of anchor directory boosting, meta-searching, and heuristic filtering, to construct domain-specific Web directories. Using the framework, we have built a Web directory in the Spanish business (SBiz) domain. Experimental results show that the SBiz Web directory achieved significantly better recall, F-value, efficiency, and satisfaction rating than the benchmark directory. Subjects provided favorable comments on the SBiz Web directory. This research thus contributes to developing a useful framework for organizing domain-specific information on the Web and to providing empirical findings and useful insights for end-users, system developers, and researchers of Web information seeking and knowledge management. © 2007 Elsevier Ltd. All rights reserved.

Zhang, Y., Dang, Y., & Chen, H. (2011). Gender classification for web forums. IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and Humans, 41(4), 668-677.

Abstract:

More and more women are participating in and exchanging opinions through community-based online social media. Questions concerning gender differences in the new media have been raised. This paper proposes a feature-based text classification framework to examine online gender differences between Web forum posters by analyzing writing styles and topics of interest. Our experiment on an Islamic women's political forum shows that feature sets containing both content-free and content-specific features perform significantly better than those consisting of only content-free features, feature selection can improve the classification results significantly, and female and male participants have significantly different topics of interest. © 2011 IEEE.

Chen, H., Shankaranarayanan, G., She, L., & Iyer, A. (1998). A machine learning approach to inductive query by examples: An experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing. Journal of the American Society for Information Science, 49(8), 693-705.

Abstract:

Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other newer inductive learning techniques including symbolic learning, genetic algorithms, and simulated annealing. These newer techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information systems. In this article, we first provide an overview of these newer techniques and their use in information retrieval research. In order to familiarize readers with the techniques, we present three promising methods: The symbolic ID3 algorithm, evolution-based genetic algorithms, and simulated annealing. We discuss their knowledge representations and algorithms in the unique context of information retrieval. An experiment using a 8000-record COMPEN database was performed to examine the performances of these inductive query-by-example techniques in comparison with the performance of the conventional relevance feedback method. The machine learning techniques were shown to be able to help identify new documents which are similar to documents initially suggested by users, and documents which contain similar concepts to each other. Genetic algorithms, in particular, were found to out-perform relevance feedback in both document recall and precision. We believe these inductive machine learning techniques hold promise for the ability to analyze users' preferred documents (or records), identify users' underlying information needs, and also suggest alternatives for search for database management systems and Internet applications.

Chen, H., Kantor, P., & Roberts, F. (2007). ISI 2007 Preface. ISI 2007: 2007 IEEE Intelligence and Security Informatics, iii-iv.