BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages
High level overview
Introduction
BlaBla is a package for linguistic feature extraction written in Python. Information and tutorials can be found in our README on GitHub. Please see our paper for more details.
Installing BlaBla
Installing BlaBla is as easy as installing any other Python package:
git clone https://github.com/novoic/blabla
cd blabla
pip install .
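After the commands above finish, a quick way to confirm the install is to ask Python whether it can locate the module. The snippet below is a minimal sketch using only the standard library. The helper name is_importable is hypothetical, and the module name "bla_bla" is taken from the import example on this page, so adjust it if your installation exposes a different name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named top-level module."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


# "bla_bla" is assumed from the import example on this page.
print(is_importable("bla_bla"))
```

If this prints False, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.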
BlaBla also uses Stanford CoreNLP for analysing text. To set up CoreNLP, change corenlp_dir and lang if required, then run the command below. Please refer to our README file for more details on the installation.
./setup_corenlp.sh
BlaBla
At the heart of BlaBla are the DocumentProcessor and Document classes. Import the DocumentProcessor class to process a piece of input text, as shown in the code below.
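from bla_bla.document_processor import DocumentProcessor

# Open a processor with the Stanza config file and the language code
with DocumentProcessor("stanza_config/stanza_config.yaml", "en") as doc_proc:
    content = "The picture shows a boy walking to the kitchen to pick a cookie from the cookie jar."
    # Analyze the raw text; "string" is the input format
    doc = doc_proc.analyze(content, "string")
    # Compute the requested feature on the resulting Document
    res_json = doc.compute_features("noun_rate")
    print(res_json)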
Under the hood, the DocumentProcessor object has an analyze method that returns a Document object, which can be used to compute features.
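The flow described above (a processor used as a context manager, an analyze call that returns a document, and a feature computation on that document) can be illustrated with toy stand-ins. This is a sketch of the pattern only, not BlaBla's implementation: the classes ToyDocumentProcessor and ToyDocument, the "word_count" feature, the whitespace tokenisation, and the dictionary return value are all assumptions made for illustration.

```python
# Toy stand-ins that mirror the processor -> document -> features flow.
# They are NOT BlaBla classes; every name and behaviour here is hypothetical.


class ToyDocument:
    """Holds analysed text and computes features from it on request."""

    def __init__(self, tokens):
        self.tokens = tokens

    def compute_features(self, feature_name):
        # This toy version knows a single trivial feature.
        if feature_name == "word_count":
            return {"word_count": len(self.tokens)}
        raise ValueError(f"unknown feature: {feature_name}")


class ToyDocumentProcessor:
    """Context manager whose analyze() turns raw text into a ToyDocument."""

    def __enter__(self):
        # Resources needed for analysis would be acquired here.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning False lets any exception propagate unchanged.
        return False

    def analyze(self, content, input_format):
        if input_format != "string":
            raise ValueError("this toy only accepts input_format='string'")
        # Whitespace splitting stands in for real linguistic analysis.
        return ToyDocument(content.split())


with ToyDocumentProcessor() as doc_proc:
    doc = doc_proc.analyze("The boy walks to the kitchen.", "string")
    result = doc.compute_features("word_count")

print(result)  # {'word_count': 6}
```

Separating the processor from the document means the costly setup happens once per with block, while each analysed text gets its own lightweight object to query for features.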
DocumentProcessor
This page outlines the methods of the DocumentProcessor class.
Document
This page outlines the methods of the Document class.