Hsinchun Chen
Publications
Abstract:
Analyzing authorship of online texts is an important analysis task in security-related areas such as cybercrime investigation and counter-terrorism, and in any field of endeavor in which authorship may be uncertain or obfuscated. This paper presents an automated approach for authorship analysis using machine learning methods, a robust stylometric feature set, and a series of visualizations designed to facilitate analysis at the feature, author, and message levels. A testbed consisting of 506,554 forum messages, in English and Arabic, from 14,901 authors was first constructed. A prototype portal system was then developed to support feasibility analysis of the approach. A preliminary evaluation to assess the efficacy of the text visualizations was conducted. The evaluation showed that task performance with the visualization functions was more accurate and more efficient than task performance without the visualizations. © 2013 IEEE.
Abstract:
The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments. © 2008 ACM.
Abstract:
The Coplink project has been initiated to address the problems in criminal justice systems. University of Arizona researchers originally generated the concept space approach to facilitate sematic retrieval of information. User studies show that this system also improves searching and browsing in the engineering and biomedicine domains.
Abstract:
In this paper we discuss technical issues regarding intelligence and security informatics (ISI) research to accomplish the critical missions of international security and counter-terrorism. We propose a research framework addressing the technical challenges facing counter-terrorism and crime-fighting applications with a primary focus on the knowledge discovery from databases (KDD) perspective. We also present several Dark Web related case studies for open-source terrorism information collection, analysis, and visualization. Using a web spidering approach, we have developed a large-scale, longitudinal collection of extremist-generated Internet-based multimedia and multilingual contents. We have also developed selected computational link analysis, content analysis, and authorship analysis techniques to analyze the Dark Web collection. © Springer-Verlag Berlin Heidelberg 2007.
Abstract:
Neural networks have been used in various applications on the World Wide Web, but most of them only rely on the available input-output examples without incorporating Web-specific knowledge, such as Web link analysis, into the network design. In this paper, we propose a new approach in which the Web is modeled as an asymmetric Hopfield Net. Each neuron in the network represents a Web page, and the connections between neurons represent the hyperlinks between Web pages. Web content analysis and Web link analysis are also incorporated into the model by adding a page content score function and a link score function into the weights of the neurons and the synapses, respectively. A simulation study was conducted to compare the proposed model with traditional Web search algorithms, namely, a breadth-first search and a best-first search using PageRank as the heuristic. The results showed that the proposed model performed more efficiently and effectively in searching for domain-specific Web pages. We believe that the model can also be useful in other Web applications such as Web page clustering and search result ranking. © 2007 IEEE.
Pagination
- First page
- …
- 16
- 17
- 18
- …
- Last page