Hsinchun Chen
Publications
Abstract:
World-Wide Web (WWW) based Internet services have become a major channel for information delivery. For the same reason, information overload has also become a serious problem for the users of such services. It has been estimated that the amount of information stored on the Internet doubles every 18 months. The number of homepages may be growing even faster; some estimates put its doubling time at 6 months. Therefore, a scalable approach to support Internet searching is critical to the success of Internet services and other current or future National Information Infrastructure (NII) applications. In this paper, we discuss a modified version of the simulated annealing algorithm used to develop an intelligent personal spider (agent), which is based on automatic textual analysis of Internet documents and hybrid simulated annealing.
Abstract:
Beginning in 2005, the Securities and Exchange Commission (SEC) mandated firms to include a "risk factor" section in their Form 10-K to discuss "the most significant factors that make the company speculative or risky." In this study, we examine the information content of this newly created section and offer two main results. First, we find that firms facing greater risk disclose more risk factors, and that the type of risk the firm faces determines whether it devotes a greater portion of its disclosures towards describing that risk type. That is, managers provide risk factor disclosures that meaningfully reflect the risks they face. Second, we find that the information conveyed by risk factor disclosures is reflected in systematic risk, idiosyncratic risk, information asymmetry, and firm value. Overall, our evidence supports the SEC's decision to mandate risk factor disclosures, as the disclosures appear to be firm-specific and useful to investors. © 2013 Springer Science+Business Media New York.
Abstract:
Organizations often manage identity information for their customers, vendors, and employees. Identity management is critical to various organizational practices ranging from customer relationship management to crime investigation. Searching for a specific identity is difficult because disparate identity information may exist as a result of unintentional errors and intentional deception. In this paper we propose a hierarchical Naïve Bayes model that improves on existing identity matching techniques in terms of searching effectiveness. Experiments show that our proposed model performs significantly better than the exact-match-based matching technique. With 50% of training instances labeled, the proposed semi-supervised learning achieves performance comparable to that of the fully supervised record comparison algorithm. The semi-supervised learning greatly reduces the effort of manually labeling training instances without significant performance degradation. © 2011 Elsevier B.V. All rights reserved.