The study of circulating biomarkers and their association with disease outcomes has become increasingly complex as multiplex technologies have advanced the measurement of these biomarkers. The Least Absolute Shrinkage and Selection Operator (LASSO) is a data analysis method that can be used for biomarker selection in these high-dimensional data. However, it is unclear which LASSO-type method is preferable under data scenarios that may arise in serum biomarker research, such as high correlation between biomarkers, weak associations with the outcome, and a sparse set of true signals. The goal of this study was to compare the LASSO with five LASSO-type methods under these scenarios.
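For readers unfamiliar with how the LASSO performs selection, the following is a minimal sketch, not the procedure used in the study. It fits the LASSO by cyclic coordinate descent on simulated data that loosely mimic one of the scenarios named above (correlated biomarkers with few true signals). The function names, sample size, correlation, effect sizes, and penalty value are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the source of exact zeros in the LASSO."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=300):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ms = (X ** 2).sum(axis=0) / n      # mean square of each column
    resid = y.astype(float).copy()         # residual when beta = 0
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]     # partial residual without feature j
            rho_j = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho_j, lam) / col_ms[j]
            resid -= X[:, j] * beta[j]     # restore the full residual
    return beta

# Hypothetical simulation: 50 biomarkers with pairwise correlation 0.5,
# of which only the first 3 are truly associated with the outcome.
rng = np.random.default_rng(0)
n, p, corr = 200, 50, 0.5
shared = rng.normal(size=(n, 1))
X = np.sqrt(corr) * shared + np.sqrt(1 - corr) * rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0
y = X @ beta_true + rng.normal(size=n)

# Standardize predictors and center the outcome before penalizing.
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = y - y.mean()

lam_max = np.max(np.abs(X.T @ y)) / n      # penalties at or above this give an all-zero fit
beta_hat = lasso_cd(X, y, lam=0.2)         # illustrative penalty, not tuned
beta_null = lasso_cd(X, y, lam=1.01 * lam_max)
selected = np.flatnonzero(beta_hat)
print("selected biomarkers:", selected)
```

In practice the penalty would be chosen by cross-validation or an information criterion rather than fixed in advance; the fixed value here only keeps the sketch short.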
The risk of breast cancer transiently increases immediately following pregnancy, peaking 3 to 7 years afterward. The biology that underlies this risk window and its effect on the natural history of the disease are unknown. MicroRNAs (miRNAs) are small non-coding RNAs that have been shown to be dysregulated in breast cancer. We conducted miRNA profiling of 56 tumors from a case series of multiparous Hispanic women and assessed the pattern of expression by time since last full-term pregnancy. A data-driven splitting analysis on the pattern of 355 miRNAs separated the case series into two groups: a) an early group representing women diagnosed with breast cancer ≤ 5.2 years postpartum (n = 12), and b) a late group representing women diagnosed with breast cancer ≥ 5.3 years postpartum (n = 44). We identified 15 miRNAs with significant differential expression between the early and late postpartum groups; 60% of these miRNAs are encoded on the X chromosome. Ten miRNAs had a two-fold or greater difference in expression: miR-138, miR-660, miR-31, miR-135b, miR-17, miR-454, and miR-934 were overexpressed in the early versus the late group, while miR-892a, miR-199a-5p, and miR-542-5p were underexpressed in the early versus the late postpartum group. The DNA methylation of three out of five tested miRNAs (miR-31, miR-135b, and miR-138) was lower in the early versus the late postpartum group and was negatively correlated with miRNA expression. Here we show that miRNAs are differentially expressed and differentially methylated between tumors of the early and late postpartum groups, suggesting that differences in epigenetic dysfunction may be operative in postpartum breast cancers.
Measurement of serum biomarkers by multiplex assays may be more variable than measurement by single-biomarker assays. Measurement error in these data may bias parameter estimates in regression analysis, which could mask true associations of serum biomarkers with an outcome. The Least Absolute Shrinkage and Selection Operator (LASSO) can be used for variable selection in these high-dimensional data. Furthermore, when the distribution of the measurement error is assumed to be known or is estimated from replication data, a simple measurement error correction method can be applied to the LASSO. In practice, however, the distribution of the measurement error is unknown, and estimating it through replication is expensive, both in monetary cost and in the larger amount of sample required, which is often limited. We adapt an existing bias correction approach by estimating the measurement error using validation data in which a subset of serum biomarkers is re-measured on a random subset of the study sample. We evaluate this method using simulated data and data from the Tucson Epidemiological Study of Airway Obstructive Disease (TESAOD). We show that the bias in parameter estimation is reduced and variable selection is improved.
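To make the role of validation data concrete, the following is a minimal single-biomarker sketch, not the authors' corrected LASSO. It assumes a classical additive error model (observed value = true value + independent error), estimates the error variance from re-measurements on a random subset, and applies a simple moment correction to an attenuated regression slope. All sample sizes, variances, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_val = 5000, 500          # hypothetical study and validation sizes
sigma_u = 0.5                 # hypothetical measurement-error SD (unknown in practice)

x = rng.normal(size=n)                        # true biomarker level, never observed
w = x + rng.normal(scale=sigma_u, size=n)     # error-prone assay measurement
y = 1.0 * x + rng.normal(size=n)              # outcome depends on the true level

# Validation data: re-measure the biomarker on a random subset of participants.
val_idx = rng.choice(n, size=n_val, replace=False)
w_repeat = x[val_idx] + rng.normal(scale=sigma_u, size=n_val)

# With independent errors, Var(w - w_repeat) = 2 * sigma_u^2.
sigma_u2_hat = np.var(w[val_idx] - w_repeat, ddof=1) / 2.0

# The naive slope is attenuated toward zero; subtracting the estimated
# error variance from the variance of w removes the attenuation.
wc, yc = w - w.mean(), y - y.mean()
naive_slope = (wc @ yc) / (wc @ wc)
corrected_slope = (wc @ yc / n) / (wc @ wc / n - sigma_u2_hat)

print("estimated error variance:", sigma_u2_hat)
print("naive slope:", naive_slope, "corrected slope:", corrected_slope)
```

The same idea, replacing the observed covariance of the predictors with a version that has the estimated error covariance subtracted, underlies moment-based corrections in the multivariate, penalized setting, where additional care is needed because the corrected matrix may no longer be positive semidefinite.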