Hsinchun Chen
Member of the Graduate Faculty
Professor, BIO5 Institute
Professor, Management Information Systems
Regents Professor
Primary Department
(520) 621-4153
Research Interest
Dr. Chen's areas of expertise include: Security informatics, security big data; smart and connected health, health analytics; data, text, web mining. Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization. Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR. Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing. Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.


Chen, H. (2010). AI and security informatics. IEEE Intelligent Systems, 25(5), 82-83.


Based on the available crime and intelligence knowledge, federal, state, and local authorities can make timely and accurate decisions to select effective strategies and tactics as well as allocate the appropriate amount of resources to detect, prevent, and respond to future attacks. Facing the critical mission of international security and various data and technical challenges, there is a pressing need to develop the science of security informatics. The main objective is the development of advanced information technologies, systems, algorithms, and databases for security-related applications using an integrated technological, organizational, and policy-based approach. Intelligent systems have much to contribute to this emerging field. © 2010 IEEE.

Abbasi, A., Zhang, Z., Zimbra, D., Chen, H., & Nunamaker Jr., J. F. (2010). Detecting fake websites: The contribution of statistical learning theory. MIS Quarterly: Management Information Systems, 34(SPEC. ISSUE 3), 435-461.


Fake websites have become increasingly pervasive, generating billions of dollars in fraudulent revenue at the expense of unsuspecting Internet users. The design and appearance of these websites make it difficult for users to manually identify them as fake. Automated detection systems have emerged as a mechanism for combating fake websites; however, most are fairly simplistic in terms of their fraud cues and detection methods employed. Consequently, existing systems are susceptible to the myriad of obfuscation tactics used by fraudsters, resulting in highly ineffective fake website detection performance. In light of these deficiencies, we propose the development of a new class of fake website detection systems that are based on statistical learning theory (SLT). Using a design science approach, a prototype system was developed to demonstrate the potential utility of this class of systems. We conducted a series of experiments, comparing the proposed system against several existing fake website detection systems on a test bed encompassing 900 websites. The results indicate that systems grounded in SLT can more accurately detect various categories of fake websites by utilizing richer sets of fraud cues in combination with problem-specific knowledge. Given the hefty cost exacted by fake websites, the results have important implications for e-commerce and online security.

Yang, M., Kiang, M., Chen, H., & Li, Y. (2012). Artificial immune system for illicit content identification in social media. Journal of the American Society for Information Science and Technology, 63(2), 256-269.


Social media is frequently used as a platform for the exchange of information and opinions as well as propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties in the illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance. © 2011 ASIS&T.

Chen, H., Atabakhsh, H., Wang, A. G., Kaza, S., Tseng, L. C., Wang, Y., Joshi, S., Petersen, T., & Violette, C. (2006). COPLINK center: Social network analysis and identity deception detection for law enforcement and homeland security intelligence and security informatics: A crime data mining approach to developing border safe research. ACM International Conference Proceeding Series, 151, 49-50.


In this paper, we describe the highlights of the COPLINK Center for law enforcement and homeland security project. Two new components of the project are described, namely, identity resolution and mutual information.

Chen, H. (2010). Business and market intelligence 2.0. IEEE Intelligent Systems, 25(1), 68-71.


Some articles on Business and Market Intelligence 2.0 from distinguished experts in marketing science, finance, accounting, and computer science are presented. 'The Phase Transition of Markets and Organizations: The New Intelligence and Entrepreneurial Frontier' characterizes phase transition in markets and organizations as a move from individuals and resources being separate to being together. 'User-Generated Content on Social Media: Predicting New Product Market Success from Online Word of Mouth' explores the predictive validity of various text and sentiment measures of online word of mouth (WOM) for the market success of new products. 'On Data-Driven Analysis of User-Generated Content' discusses data-driven approaches, including content and network analysis, that can be used to derive insights and characterize user-generated content from companies and other organizations.