Hsinchun Chen

Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Primary Department
Contact
(520) 621-4153

Research Interest

Dr Chen's areas of expertise include:Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Yang, M., & Chen, H. (2012). Partially supervised learning for radical opinion identification in hate group web forums. ISI 2012 - 2012 IEEE International Conference on Intelligence and Security Informatics: Cyberspace, Border, and Immigration Securities, 96-101.

Abstract:

Web forums are frequently used as platforms for the exchange of information and opinions, as well as propaganda dissemination. But online content can be misused when the information being distributed, such as radical opinions, is unsolicited or inappropriate. However, radical opinion is highly hidden and distributed in Web forums, while non-radical content is unspecific and topically more diverse. It is costly and time consuming to label a large amount of radical content (positive examples) and non-radical content (negative examples) for training classification systems. Nevertheless, it is easy to obtain large volumes of unlabeled content in Web forums. In this paper, we propose and develop a topic-sensitive partially supervised learning approach to address the difficulties in radical opinion identification in hate group Web forums. Specifically, we design a labeling heuristic to extract high quality positive examples and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group Web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance than its counterparts. © 2012 IEEE.

Qin, J., Zhou, Y., Chau, M., & Chen, H. (2003). Supporting multilingual information retrieval in Web applications: An English-Chinese Web portal experiment. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2911, 149-152.

Abstract:

Cross-language information retrieval (CLIR) and multilingual information retrieval (MLIR) techniques have been widely studied, but they are not often applied to and evaluated for Web applications. In this paper, we present our research in developing and evaluating a multilingual English-Chinese Web portal in the business domain. A dictionary-based approach has been adopted that combines phrasal translation, co-occurrence analysis, and pre- and post-translation query expansion. The approach was evaluated by domain experts and the results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision when compared with simple word-by-word translation. © Springer-Verlag Berlin Heidelberg 2003.

Wang, G. A., Chen, H., & Atabakhsh, H. (2006). A probabilistic model for approximate identity matching. ACM International Conference Proceeding Series, 151, 462-463.

Abstract:

Identity management is critical to various governmental practices ranging from providing citizens services to enforcing homeland security. The task of searching for a specific identity is difficult because multiple identity representations may exist due to issues related to unintentional errors and intentional deception. We propose a probabilistic Naïve Bayes model that improves existing identity matching techniques in terms of effectiveness. Experiments show that our proposed model performs significantly better than the exact-match based technique as well as the approximate-match based record comparison algorithm. In addition, our model greatly reduces the efforts of manually labeling training instances by employing a semi-supervised learning approach. This training method outperforms both fully supervised and unsupervised learning. With a training dataset that only contains 10% labeled instances, our model achieves a performance comparable to that of a fully supervised learning.

Huang, Z., Chung, W., Ong, T., & Chen, H. (2002). A graph-based recommender system for digital library. Proceedings of the ACM International Conference on Digital Libraries, 65-73.

Abstract:

Research shows that recommendations comprise a valuable service for users of a digital library [11]. While most existing recommender systems rely either on a content-based approach or a collaborative approach to make recommendations, there is potential to improve recommendation quality by using a combination of both approaches (a hybrid approaches). In this paper, we report how we tested the idea of using a graph-based recommender system that naturally combines the content-based and collaborative approaches. Due to the similarity between our problem and a concept retrieval task, a Hopfield net algorithm was used to exploit high-degree book-book, user-user and book-user associations. Sample hold-out testing and preliminary subject testing were conducted to evaluate the system, by which it was found that the system gained improvement with respect to both precision and recall by combining content-based and collaborative approaches. However, no significant improvement was observed by exploiting high-degree associations.

Chung, W., Chen, H., & Reid, E. (2009). Business stakeholder analyzer: An experiment of classifying stakeholders on the web. Journal of the American Society for Information Science and Technology, 60(1), 59-74.

Abstract:

As the Web is used increasingly to share and disseminate information, business analysts and managers are challenged to understand stakeholder relationships.Traditional stakeholder theories and frameworks employ a manual approach to analysis and do not scale up to accommodate the rapid growth of the Web. Unfortunately, existing business intelligence (BI) tools lack analysis capability, and research on BI systems is sparse. This research proposes a framework for designing BI systems to identify and to classify stakeholders on the Web, incorporating human knowledge and machine-learned information from Web pages. Based on the framework, we have developed a prototype called Business Stakeholder Analyzer (BSA) that helps managers and analysts to identify and to classify their stakeholders on the Web. Results from our experiment involving algorithm comparison, feature comparison, and a user study showed that the system achieved better within-class accuracies in widespread stakeholder types such as partner/sponsor/supplier and media/reviewer, and was more efficient than human classification. The student and practitioner subjects in our user study strongly agreed that such a system would save analysts' time and help to identify and classify stakeholders. This research contributes to a better understanding of how to integrate information technology with stakeholder theory, and enriches the knowledge base of BI system design. © 2008 ASIS&T.