Hsinchun Chen
Professor, BIO5 Institute
Professor, Management Information Systems
Regents Professor
(520) 621-4153
Research Interest
Dr. Chen's areas of expertise include:
Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.
Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.
Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.
Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.
Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.


Schumaker, R. P., Solieman, O. K., & Chen, H. (2010). Sports knowledge management and data mining. Annual Review of Information Science and Technology, 44, 115-157.
Schumaker, R. P., Zhang, Y., Huang, C., & Chen, H. (2012). Evaluating sentiment in financial news articles. Decision Support Systems, 53(3), 458-464.


Can the choice of words and tone used by the authors of financial news articles correlate to measurable stock price movements? If so, can the magnitude of price movement be predicted using these same variables? We investigate these questions using the Arizona Financial Text (AZFinText) system, a financial news article prediction system, and pair it with a sentiment analysis tool. Through our analysis, we found that subjective news articles were easier to predict in price direction (59.0% versus 50.0% of chance alone) and, using a simple trading engine, subjective articles garnered a 3.30% return. Looking further into the role of author tone in financial news articles, we found that articles with a negative sentiment were easiest to predict in price direction (50.9% versus 50.0% of chance alone) and yielded a 3.04% trading return. Investigating negative sentiment further, we found that our system was able to predict price decreases in articles of a positive sentiment 53.5% of the time, and price increases in articles of a negative sentiment 52.4% of the time. We believe that this result may be attributable to market traders behaving in a contrarian manner, e.g., see good news, sell; see bad news, buy. © 2012 Elsevier B.V. All rights reserved.

Wang, G. A., Xu, J. J., & Chen, H. (2006). Using social contextual information to match criminal identities. Proceedings of the Annual Hawaii International Conference on System Sciences, 4, 81b.


Criminal identity matching is crucial to crime investigation in law enforcement agencies. Existing techniques match identities that refer to the same individuals based on simple identity features. These techniques are subject to several problems. First, there is an effectiveness trade-off between the false negative and false positive rates. The improvement of one rate usually lowers the other. Second, in some situations such as identity theft, simple-feature-based techniques are unable to match identities that have completely different identity feature values. We argue that the information about the social context of an individual may provide additional information for revealing the individual's identity, helping improve the effectiveness of identity matching techniques. We define two types of social contextual features: role-based personal features and social group features. Experiments showed that social contextual features, especially the structural similarity and the relational similarity, significantly improved the precision without lowering the recall of criminal identity matching tasks. © 2006 IEEE.

Qin, J., Zhou, Y., & Chen, H. (2011). A multi-region empirical study on the internet presence of global extremist organizations. Information Systems Frontiers, 13(1), 75-88.


Extremist organizations are heavily utilizing Internet technologies to increase their abilities to influence the world. Studying those global extremist organizations' Internet presence would allow us to better understand extremist organizations' technical sophistication and their propaganda plans. In this work, we explore an integrated approach for collecting and analyzing extremist Internet presence. We employed automatic Web crawling techniques to build a comprehensive international extremist Web collection. We then used a systematic content analysis tool called the Dark Web Attribute System to analyze and compare these extremist organizations' Internet usage from three perspectives: technical sophistication, content richness, and Web interactivity. By studying 1.7 million multimedia Web documents from around 224 Web sites of extremist organizations, we found that while all extremist organizations covered in this study demonstrate a high level of technical sophistication in their Web presence, Middle Eastern extremists are among the most sophisticated groups in both technical sophistication and media richness. US groups are the most active in supporting Internet communications. Our analysis results will help domain experts deepen their understanding of the global extremism movements and devise better counter-extremism measures on the Internet. © 2010 Springer Science+Business Media, LLC.

Marshall, B., Su, H., McDonald, D., Eggers, S., & Chen, H. (2006). Aggregating automatically extracted regulatory pathway relations. IEEE Transactions on Information Technology in Biomedicine, 10(1), 100-108.

PMID: 16445255

Automatic tools to extract information from biomedical texts are needed to help researchers leverage the vast and increasing body of biomedical literature. While several biomedical relation extraction systems have been created and tested, little work has been done to meaningfully organize the extracted relations. Organizational processes should consolidate multiple references to the same objects over various levels of granularity, connect those references to other resources, and capture contextual information. We propose a feature decomposition approach to relation aggregation to support a five-level aggregation framework. Our BioAggregate tagger uses this approach to identify key features in extracted relation name strings. We show encouraging feature assignment accuracy and report substantial consolidation in a network of extracted relations. © 2006 IEEE.