Research

Technology Research

Development Tools

Programming Language: Python

We chose Python as our programming language. Because we are working on a Natural Language Processing (NLP) task, Python is a natural fit. It has the largest range of third-party toolkits and libraries for machine learning, and many open-source NLP algorithms are built in Python.

Web Framework: Flask

After researching dozens of available web frameworks [2], we chose Flask, a micro web framework written in Python. Flask has the following features (advantages) [1]:
1. Written in Python (compatible with our Python ML algorithms)
2. Development server and debugger
3. Integrated support for unit testing
4. Support for secure cookies
5. Extensive documentation
6. Extensions available to add desired features

Compared with its many competing web frameworks, Flask stands out for its usability, sustainability, compatibility and extensibility.
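To make features 2 to 4 concrete, here is a minimal sketch of a Flask application factory, assuming Flask 2.x is installed. The routes `/health` and `/emr`, the JSON field `transcript` and the `FLASK_SECRET_KEY` environment variable are hypothetical names chosen for illustration; they are not part of our actual system.

```python
import os
import secrets

from flask import Flask, jsonify, request


def create_app() -> Flask:
    """Build a minimal Flask app (hypothetical routes, for illustration only)."""
    app = Flask(__name__)
    app.config.update(
        # Key used to sign the session cookie; FLASK_SECRET_KEY is a hypothetical name.
        SECRET_KEY=os.environ.get("FLASK_SECRET_KEY") or secrets.token_hex(32),
        SESSION_COOKIE_SECURE=True,    # only send the session cookie over HTTPS
        SESSION_COOKIE_HTTPONLY=True,  # hide the cookie from client-side JavaScript
        SESSION_COOKIE_SAMESITE="Lax",
    )

    @app.route("/health", methods=["GET"])
    def health():
        return jsonify(status="ok")

    @app.route("/emr", methods=["POST"])
    def create_emr():
        payload = request.get_json(silent=True) or {}
        transcript = str(payload.get("transcript", "")).strip()
        if not transcript:
            return jsonify(error="missing 'transcript'"), 400
        # A summarization step would plug in here; this sketch only echoes the input.
        return jsonify(transcript=transcript)

    return app
```

During development, `flask --app app run --debug` (Flask 2.2 or later, assuming the code is saved as `app.py`) starts the built-in server with the debugger, and the Flask CLI finds the `create_app` factory automatically. For unit tests, `create_app().test_client()` exercises the routes without a running server.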

Resource links:

[1]“Flask (web framework),” Wikipedia. Jan. 23, 2022. Accessed: Feb. 06, 2022. [Online].

[2]“Comparison of server-side web frameworks,” Wikipedia. Feb. 03, 2022. Accessed: Feb. 06, 2022. [Online].

Client Application

The type of client application is a key decision in software development. The three main types are web applications, desktop software (Windows and Unix) and mobile apps (Android and iOS). We chose a web application for our project. First, the final document, an Electronic Medical Record (EMR), must be editable by the user, and a mobile phone is poorly suited to this task, so a mobile app was ruled out early. (However, future work may include developing a mobile app to make the system easier for patients to use.) Next, desktop software was rejected because it is hard to deploy across different platforms. In comparison, a web application is easier to deploy, with no restriction on the operating system and no requirement on the client machine's computational power. However, it requires a network connection and carries a risk of data leakage, which means more effort is needed on encryption and security.

Speech-to-Text

Audio transcription is one of the two key algorithms in our project.

Software & Service

When we researched usable Speech-to-Text toolkits, the first question we faced was whether to use SaaS (Software as a Service) or an open-source ASR (Automatic Speech Recognition) software package for our project.

A service is usually more convenient and easier to get started with. Using a key or other credentials, audio data can be sent to the cloud in a single batch or as a continuous stream, and the transcription result is returned once it has been processed online. However, commercial or high-volume usage often incurs higher costs. Cloud services also carry a risk of data leakage, which is a serious problem for clinical tasks because the data contains patients' personal information.

A software package offers full control over its functions and incurs no per-use cost. It also makes it possible to create smaller models tailored to a specific application and to deploy them on-device or at the edge without network connectivity. However, a software package requires more computational power on the deployment device and usually requires expertise and upfront effort to train and deploy the models [1].

We chose SaaS for our project: because we are building a proof-of-concept (POC) system, SaaS is more beginner-friendly and easier to deploy.

Google & Microsoft

Two popular Speech-to-Text SaaS offerings are provided by Google and Microsoft respectively. We researched both SDKs and tested both on our local machines. However, Google services should not be used for anything health-related (Google Docs is not even allowed in hospitals), so in the end we selected the Microsoft Speech SDK for our project.

Microsoft Azure Speech SDK

The Microsoft Azure Speech software development kit (SDK) exposes many of the Speech service capabilities, which developers can use to build speech-enabled applications. The Speech SDK is available in many programming languages and across multiple platforms [2].
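As a minimal sketch of how the SDK is typically called from Python, the example below recognizes a single utterance from a WAV file. It assumes the `azure-cognitiveservices-speech` package and an Azure Speech resource key and region. The environment variable names `SPEECH_KEY` and `SPEECH_REGION` and both helper functions are our own illustrative choices; reading the key from the environment keeps it out of source code, which matters given the data-protection concerns above.

```python
import os
from typing import Mapping, Tuple


def load_speech_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (key, region); SPEECH_KEY and SPEECH_REGION are hypothetical names."""
    try:
        return env["SPEECH_KEY"], env["SPEECH_REGION"]
    except KeyError as missing:
        raise RuntimeError(f"missing environment variable {missing}") from None


def transcribe_once(wav_path: str) -> str:
    """Recognize a single utterance from a WAV file with the Azure Speech SDK."""
    # Imported lazily so this module loads even where the SDK is not installed
    # (pip install azure-cognitiveservices-speech).
    import azure.cognitiveservices.speech as speechsdk

    key, region = load_speech_credentials()
    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    speech_config.speech_recognition_language = "en-US"
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    result = recognizer.recognize_once()  # stops at the end of the first utterance
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    if result.reason == speechsdk.ResultReason.NoMatch:
        return ""  # the audio contained no recognizable speech
    raise RuntimeError(f"recognition failed: {result.reason}")
```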

Speech-to-text

Speech-to-text, also known as speech recognition, enables real-time and batch transcription of audio streams into text. With additional reference text as input, it also enables real-time pronunciation assessment and gives speakers feedback on the accuracy and fluency of their speech [3]. Applications, tools, or devices can consume, display, and act on this text as command input [4].
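A recording longer than one utterance, such as a full doctor-patient consultation, calls for the SDK's continuous recognition, which reports each recognized phrase through events. The sketch below assumes the `azure-cognitiveservices-speech` package, a WAV file, and the hypothetical environment variables `SPEECH_KEY` and `SPEECH_REGION`; `join_segments` and `transcribe_continuous` are illustrative helper names. This is real-time recognition over a file; Azure's batch transcription is a separate REST API and is not shown here.

```python
import os
import threading
from typing import Iterable, List


def join_segments(segments: Iterable[str]) -> str:
    """Join recognized phrases into one transcript, dropping empty results."""
    return " ".join(s.strip() for s in segments if s.strip())


def transcribe_continuous(wav_path: str) -> str:
    """Transcribe a whole WAV file phrase by phrase with continuous recognition."""
    # Imported lazily so this module loads even where the SDK is not installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["SPEECH_KEY"],  # hypothetical variable names
        region=os.environ["SPEECH_REGION"],
    )
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    segments: List[str] = []
    done = threading.Event()

    def on_recognized(evt) -> None:
        # Fired once per final phrase; interim hypotheses arrive via `recognizing`.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            segments.append(evt.result.text)

    recognizer.recognized.connect(on_recognized)
    # Either event means no more results will arrive (end of file, error, or session end).
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()
    recognizer.stop_continuous_recognition()
    return join_segments(segments)
```

The event-based design lets the SDK stream audio in the background while phrases accumulate, so the caller only blocks on a single `threading.Event` until the session ends.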

Resource links:

[1]S. C. Gupta, “Speech Recognition with Python” (accessed Nov. 21, 2021).

[2]eric-urban, “About the Speech SDK - Speech service - Azure Cognitive Services” (accessed Feb. 06, 2022).

[3]eric-urban, “Speech-to-text documentation - Tutorials, API Reference - Azure Cognitive Services - Azure Cognitive Services” (accessed Feb. 06, 2022).

[4]eric-urban, “Speech-to-text overview - Speech service - Azure Cognitive Services” (accessed Feb. 06, 2022).

Text Summarization

After the audio data is transcribed into text, the text is summarized into an Electronic Medical Record (EMR). We therefore researched text summarization techniques in the biomedical and clinical field. A complete review of related work is presented on the Algorithm page.

Technical Decisions

Language: Python

Web framework: Flask

Speech-to-Text: Microsoft Azure Speech SDK

Text Summarization: BERT-based-Summ