Hsinchun Chen

Hsinchun Chen

Professor, Management Information Systems
Regents Professor
Member of the Graduate Faculty
Professor, BIO5 Institute
Primary Department
Contact
(520) 621-4153

Research Interest

Dr Chen's areas of expertise include:Security informatics, security big data; smart and connected health, health analytics; data, text, web mining.Digital library, intelligent information retrieval, automatic categorization and classification, machine learning for IR, large-scale information analysis and visualization.Internet resource discovery, digital libraries, IR for large-scale scientific and business databases, customized IR, multilingual IR.Knowledge-based systems design, knowledge discovery in databases, hypertext systems, machine learning, neural networks computing, genetic algorithms, simulated annealing.Cognitive modeling, human-computer interactions, IR behaviors, human problem-solving process.

Publications

Fan, L., Zhang, Y., Dang, Y., & Chen, H. (2013). Analyzing sentiments in Web 2.0 social media data in Chinese: Experiments on business and marketing related Chinese Web forums. Information Technology and Management, 14(3), 231-242.

Abstract:

Web 2.0 has brought a huge amount of user-generated, social media data that contains rich information about people's opinions and ideas towards various products, services, and ongoing social and political events. Nowadays, many companies start to look into and try to leverage this new type of data to understand their customers in order to make better business strategies and services. As a nation with rapid economic growth in recently years, China has become visible and started to play an important role in the global business and economy. Also, with the large number of Chinese Internet users, a considerable amount of options about Chinese business and market have been expressed in social media sites. Thus, it will be of interest to explore and understand those user-generated contents in Chinese. In this study, we develop an integrated framework to analyze user sentiments from Chinese social media sites by leveraging sentiment analysis techniques. Based on the framework, we conduct experiments on two popular Chinese Web forums, both related to business and marketing. By utilizing Elastic Net together with a rich body of feature representations, we achieve the highest F-measures of 84.4 and 86.7 % for the two data sets, respectively. We also demonstrate the interpretability of Elastic Net by discussing the top-ranked features with positive or negative sentiments. © 2013 Springer Science+Business Media New York.

Zheng, R., Qin, Y., Huang, Z., & Chen, H. (2003). Authorship analysis in cybercrime investigation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2665, 59-73.

Abstract:

Criminals have been using the Internet to distribute a wide range of illegal materials globally in an anonymous manner, making criminal identity tracing difficult in the cybercrime investigation process. In this study we propose to adopt the authorship analysis framework to automatically trace identities of cyber criminals through messages they post on the Internet. Under this framework, three types of message features, including style markers, structural features, and content-specific features, are extracted and inductive learning algorithms are used to build feature-based models to identify authorship of illegal messages. To evaluate the effectiveness of this framework, we conducted an experimental study on data sets of English and Chinese email and online newsgroup messages. We experimented with all three types of message features and three inductive learning algorithms. The results indicate that the proposed approach can discover real identities of authors of both English and Chinese Internet messages with relatively high accuracies. © Springer-Verlag Berlin Heidelberg 2003.

Chen, H., & Kim, J. (1994). GANNET: A machine learning approach to document retrieval. Journal of Management Information Systems, 10(4), 7-41.

Abstract:

Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also have made an impressive contribution to "intelligent" information retrieval and indexing. More recently, information science researchers have turned to other, newer artificial intelligence-based inductive learning techniques including neural networks, symbolic learning, and genetic algorithms. The newer techniques have provided great opportunities for researchers to experiment with diverse paradigms for effective information processing and retrieval. In this article we first provide an overview of newer techniques and their usage in information science research. We then present in detail the algorithms we adopted for a hybrid Genetic Algorithms and Neural Nets based system, called GANNET. GANNET performed concept (keyword) optimization for user-selected documents during information retrieval using the genetic algorithms. It then used the optimized concepts to perform concept exploration in a large network of related concepts through the Hopfield net parallel relaxation procedure. Based on a test collection of about 3,000 articles from DIALOG and an automatically created thesaurus, and using Jaccard's score as a performance measure, our experiment showed that GANNET improved the Jaccard's scores by about 50 percent and it helped identify the underlying concepts (keywords) that best describe the user-selected documents. © 1995 M.E. Sharpe, Inc.

Xu, J. J., & Chen, H. (2005). CrimeNet explorer: A framework for criminal network knowledge discovery. ACM Transactions on Information Systems, 23(2), 201-226.

Abstract:

Knowledge about the structure and organization of criminal networks is important for both crime investigation and the development of effective strategies to prevent crimes. However, except for network visualization, criminal network analysis remains primarily a manual process. Existing tools do not provide advanced structural analysis techniques that allow extraction of network knowledge from large volumes of criminal-justice data. To help law enforcement and intelligence agencies discover criminal network knowledge efficiently and effectively, in this research we proposed a framework for automated network analysis and visualization. The framework included four stages: network creation, network partition, structural analysis, and network visualization. Based upon it, we have developed a system called CrimeNet Explorer that incorporates several advanced techniques: a concept space approach, hierarchical clustering, social network analysis methods, and multidimensional scaling. Results from controlled experiments involving student subjects demonstrated that our system could achieve higher clustering recall and precision than did untrained subjects when detecting subgroups from criminal networks. Moreover, subjects identified central members and interaction patterns between groups significantly faster with the help of structural analysis functionality than with only visualization functionality. No significant gain in effectiveness was present, however. Our domain experts also reported that they believed CrimeNet Explorer could be very useful in crime investigation. © 2005 ACM.

Chen, H., Martinez, J., Kirchhoff, A., Ng, T. D., & Schatz, B. R. (1998). Alleviating search uncertainty through concept associations: Automatic indexing, co-occurrence analysis, and parallel computing. Journal of the American Society for Information Science, 49(3), 206-216.

Abstract:

In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in concept recall, but in concept precision the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase "variety" in search terms and thereby reduce search uncertainty.