BlaBla: Linguistic Feature Extraction for Clinical Analysis in Multiple Languages

High-level overview

Introduction

BlaBla is a package for linguistic feature extraction written in Python. Information and tutorials can be found in our README on GitHub. Please see our paper for more details.

Installing BlaBla

Installing BlaBla is as easy as installing any other Python package:

git clone https://github.com/novoic/blabla
cd blabla
pip install .
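
A quick way to confirm that the install succeeded is to check whether the interpreter can locate the package. The sketch below is only an illustration: `is_installed` is a hypothetical helper, and the import name `blabla` is an assumption taken from the repository name, so substitute whichever name your import statement uses.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "blabla" is the assumed import name; adjust it if your install differs.
    print("blabla installed:", is_installed("blabla"))
```

If this prints `False`, re-run `pip install .` from inside the cloned directory and make sure the same Python environment is active.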

BlaBla also uses Stanford CoreNLP to analyse text. To set up CoreNLP, change corenlp_dir and lang if required, then run the command below. Please refer to our README file for more details on the installation.

./setup_corenlp.sh

BlaBla

At the heart of BlaBla are the DocumentProcessor and Document classes. To process a piece of input text, import the DocumentProcessor class as shown in the code below.

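from blabla.document_processor import DocumentProcessor

# The processor takes the path to a Stanza config file and a language code.
with DocumentProcessor("stanza_config/stanza_config.yaml", "en") as doc_proc:
    content = "The picture shows a boy walking to the kitchen to pick a cookie from the cookie jar."
    # "string" tells analyze() that the input is plain text.
    doc = doc_proc.analyze(content, "string")
    # compute_features() expects a list of feature names.
    res_json = doc.compute_features(["noun_rate"])
    print(res_json)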

Under the hood, the analyze method of a DocumentProcessor object returns a Document object, which can then be used to compute features.
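
To make this two-step flow concrete, here is a toy sketch of the same pattern: a processor used as a context manager whose analyze method returns a document object, which in turn computes named features. `ToyProcessor`, `ToyDocument`, and the features `num_tokens` and `mean_token_length` are all hypothetical stand-ins invented for illustration. This is not BlaBla's implementation, and it does no real linguistic analysis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDocument:
    """Hypothetical stand-in for Document: holds whitespace-split tokens."""

    tokens: List[str]

    def compute_features(self, feature_list: List[str]) -> Dict[str, float]:
        # Map each (made-up) feature name to a function over the tokens.
        available = {
            "num_tokens": lambda toks: float(len(toks)),
            "mean_token_length": lambda toks: (
                sum(len(t) for t in toks) / len(toks) if toks else 0.0
            ),
        }
        return {name: available[name](self.tokens) for name in feature_list}


class ToyProcessor:
    """Hypothetical stand-in for DocumentProcessor, usable in a with-block."""

    def __enter__(self) -> "ToyProcessor":
        # A real processor might set up its NLP backend here.
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # A real processor might release its backend here.
        return None

    def analyze(self, content: str, input_format: str) -> ToyDocument:
        # This sketch only understands plain strings.
        if input_format != "string":
            raise ValueError("this sketch only handles plain strings")
        return ToyDocument(tokens=content.split())


with ToyProcessor() as proc:
    doc = proc.analyze("The boy walks to the kitchen", "string")
    print(doc.compute_features(["num_tokens"]))
```

The point of the split is that the analysis step runs once per text, while the resulting document object can be asked for any number of features afterwards.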

DocumentProcessor

This page outlines the methods from the DocumentProcessor class.

Document

This page outlines the methods from the Document class.

Features Table