Shlomit Gur


Improving the Reliability of Sentiment Prediction in Online Health Communities.

El-Manzalawy Y., Lee S., Gur S., Le T., Bui N., Yen J., and Honava V.J.
[In Preparation]
Key words: cancer survivors, sentiment classification, machine learning, mining online health communities, online discussion boards

Background: Online health communities (OHCs) offer a powerful internet-based platform for providing social support to participants through the sharing of knowledge, experience, and feelings with other participants. Understanding how online social interactions impact individuals’ health requires effective methods for analyzing user-generated content.
Objective: The major goals of this research are to: i) develop a methodology for the effective application of machine learning algorithms to OHC data; and ii) demonstrate the viability of the methodology via a case study that develops improved models for classifying the sentiment expressed in user-generated content in the Cancer Survivors Network (CSN), an online community of cancer survivors operated by the American Cancer Society.
Methods: We introduce an approach for extracting a representative subset of data based on randomly sampled threads (as opposed to the randomly sampled posts used in previous related studies). Four annotators assign labels to each post, considering the context of the post in the thread in which it appears. We partition the labeled data into cross-validation and independent test sets, and run a competition between four data scientists to obtain the best sentiment classifier using the cross-validation data. Classifiers submitted by participants are evaluated using the independent test data.
Results: On the dataset of 298 posts used in the original study by Qui et al., our competition-winning classifier achieves an AUC of 0.92 and an accuracy of 84.6%, compared with an AUC of 0.83 and an accuracy of 79.2% reported in the original study.
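The thread-based sampling and the data partition described in the Methods above can be illustrated with a minimal Python sketch. It is not the authors' code. The post fields (`thread_id`, `post_id`), the function names, the 20% test fraction, and the choice to split at the thread level (so that posts from one thread never straddle the two sets) are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


def sample_threads(posts, n_threads, seed=0):
    """Randomly pick n_threads whole threads and return {thread_id: [posts]}.

    Sampling whole threads (rather than individual posts) keeps each post
    together with the context in which it was written.
    """
    by_thread = defaultdict(list)
    for post in posts:
        by_thread[post["thread_id"]].append(post)
    rng = random.Random(seed)
    chosen = rng.sample(sorted(by_thread), n_threads)
    return {tid: by_thread[tid] for tid in chosen}


def split_by_thread(threads, test_fraction=0.2, seed=0):
    """Split sampled threads into cross-validation and independent test posts.

    Assumption: the split is made per thread, so no thread contributes
    posts to both sets.
    """
    rng = random.Random(seed)
    ids = sorted(threads)
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids, cv_ids = ids[:n_test], ids[n_test:]
    cv_posts = [p for tid in cv_ids for p in threads[tid]]
    test_posts = [p for tid in test_ids for p in threads[tid]]
    return cv_posts, test_posts


# Toy data: 10 hypothetical threads with 3 posts each.
toy_posts = [
    {"thread_id": t, "post_id": f"{t}-{i}"} for t in range(10) for i in range(3)
]
sampled = sample_threads(toy_posts, n_threads=6)
cv_posts, test_posts = split_by_thread(sampled, test_fraction=0.2)
```

Annotation and classifier training would then operate on `cv_posts`, with `test_posts` held back for the final evaluation.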

Age-related differences in task-specific functional connectivity in the context of phonological and semantic cognitive tasks with distracting words

Gur S., El-Manzalawy Y., Diaz M.T., and Honava V.J.
[In Preparation]
Key words: functional connectivity, aging, semantics, phonology, machine learning, network topology

Language-related cognitive processes are affected differently by aging: phonological processes display deficits, while semantic processes do not. A prominent aging hypothesis, the inhibition deficit hypothesis, holds that older adults find it more difficult to ignore irrelevant information. We therefore took a data-driven approach to investigate the interaction of semantic and phonological processes with distracting words in younger and older adults. We used task-specific functional connectivity and machine learning algorithms to identify topological differences that discriminate between the functional networks associated with these two conditions.
We found that betweenness centrality measures of strong connections in the functional networks were effective as a group in discriminating between the two conditions. While some of the identified features were age-independent (left thalamus and caudate nucleus), most were age-dependent. Our results did not support the inhibition deficit hypothesis, but rather suggested age-related differences in intrinsic alertness between the two conditions, which we believe might affect, or appear as, inhibition under certain conditions.
Overall, our data-driven approach helps identify novel differences between conditions as a function of age group. Moreover, it demonstrates the benefits of applying network topology and machine learning to functional connectivity analysis.
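
As a rough illustration of the kind of topological feature mentioned above, the following Python sketch computes node betweenness centrality on a network that keeps only strong connections. It is an assumption-laden example, not the authors' pipeline: the absolute-value threshold, the unweighted treatment of the retained edges, the use of node (rather than edge) betweenness, and the function names are all hypothetical. The centrality computation follows Brandes' algorithm for unweighted graphs.

```python
from collections import deque


def threshold_graph(conn, threshold):
    """Keep only 'strong' connections: |conn[i][j]| >= threshold.

    conn is a symmetric matrix (list of lists) of connectivity values.
    Returns an undirected adjacency list {node: [neighbors]}.
    """
    n = len(conn)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(conn[i][j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def betweenness_centrality(adj):
    """Unnormalized node betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy 3-region example: regions 0-1 and 1-2 are strongly connected.
toy_conn = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.8],
    [0.1, 0.8, 1.0],
]
toy_bc = betweenness_centrality(threshold_graph(toy_conn, threshold=0.5))
```

In the toy example, region 1 lies on the only shortest path between regions 0 and 2, so it is the only region with nonzero betweenness. Per-region values like these could then serve as input features to a classifier that discriminates between conditions.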