Building Speech To Text Web Application - 1. part - Flexiana
Pawel Zielonka

Posted on 6th August 2020

Building Speech To Text Web Application – 1. part

In this series of articles, we are going to show you how to build a Single Page Application that converts speech to text. For this purpose, we will use ClojureScript with several libraries, AWS Transcribe for converting audio to text, Facebook for authentication and AWS S3 as a storage service.

Setup Facebook Application

We want our application to be easily accessible to most people, and we don’t want to store any personal data or passwords. Facebook lends a helping hand here with its Login functionality. To integrate it with a website, we have to register a Facebook App and add Facebook Login in the Products panel of the application dashboard. In a later part, we will briefly show how to add a login button to our application as a Reagent component. More details can be found in the Facebook Login documentation.

Setup AWS Account

We are going to use three AWS services:

  • S3 for storing recorded audio files and transcription results
  • Transcribe for converting recordings to text
  • IAM for managing permissions to AWS services and resources

Set them up as follows:

  1. Create two S3 buckets: the first for recorded audio files and the second for transcription results.
  2. Configure CORS for these buckets by copying the XML below into the CORS configuration in the Permissions tab.
<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedMethod>PUT</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
  3. Create an IAM policy from the JSON below, replacing <recordings> and <transcripts> with your actual bucket names.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListAudioBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<recordings>",
            "Condition": {
                "StringEquals": {
                    "s3:prefix": "facebook-${graph.facebook.com:id}"
                }
            }
        },
        {
            "Sid": "ListTranscriptionsBucket",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<transcripts>"
        },
        {
            "Sid": "AudioBucketOps",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::<recordings>/facebook-${graph.facebook.com:id}/*"
        },
        {
            "Sid": "TranscriptionsBucketOps",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::<transcripts>/*"
        },
        {
            "Sid": "TranscriptionsOps",
            "Effect": "Allow",
            "Action": [
                "transcribe:GetTranscriptionJob",
                "transcribe:StartTranscriptionJob"
            ],
            "Resource": "*"
        }
    ]
}
  4. Create an IAM role for Web identity, choose Facebook as the identity provider, and attach the created policy to it.
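
The policy only allows object operations in the recordings bucket under keys that start with facebook-<user id>/, so the application has to build its object keys accordingly. Below is a minimal sketch of that naming scheme. The namespace and function names are hypothetical and not taken from the project; only the facebook-<id>/ prefix comes from the policy.

```clojure
(ns example.s3-keys
  (:require [clojure.string :as str]))

;; Hypothetical helpers: the only thing taken from the policy is the
;; "facebook-<id>/" key prefix used in the AudioBucketOps statement.

(defn user-prefix
  "Prefix that the policy ties to the logged-in Facebook user."
  [facebook-id]
  (str "facebook-" facebook-id))

(defn recording-key
  "Object key for a recording uploaded by the given user."
  [facebook-id filename]
  (str (user-prefix facebook-id) "/" filename))

(defn allowed-key?
  "True when object-key lies under the user's own prefix."
  [facebook-id object-key]
  (str/starts-with? object-key (str (user-prefix facebook-id) "/")))
```

A recording named memo.webm uploaded by user 12345 would therefore be stored as facebook-12345/memo.webm. Any other key in that bucket would be rejected by the policy.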

Project setup

Before we start, make sure that you have shadow-cljs and the Clojure CLI installed.

Create an empty package.json file in your project directory:

$ echo "{}" > package.json

Create a shadow-cljs.edn file with the following content:

[Image: shadow-cljs.edn listing]

At the very top, we declare that the dependencies of our application are kept in the deps.edn file. In the next line, we specify the port on which nREPL (run by shadow-cljs) should listen. This lets us connect to the running ClojureScript application and evaluate expressions. The next step is to configure builds. We see a build identified as :app, which produces compiled code suitable for running in the browser. Under :output-dir we specify the path where the compilation result will be placed. We want it to go under resources/public.

After registering the application on Facebook and creating the aforementioned services on AWS, we can define several constants that will be used to integrate these services with our application. We can do it by specifying them under the key :closure-defines in the map. Keys must be namespace-qualified symbols.
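
On the ClojureScript side, such a constant is declared with goog-define, and the value from :closure-defines replaces the default at compile time. A minimal sketch follows, assuming a hypothetical aws-transcribe-clj.config namespace and made-up constant names. The article does not list the actual ones.

```clojure
(ns aws-transcribe-clj.config)

;; Hypothetical constants. Each one is overridden by the entry with the same
;; namespace-qualified symbol under :closure-defines, for example:
;;   :closure-defines {aws-transcribe-clj.config/FACEBOOK_APP_ID "<app-id>"}
;; The second argument is the default used when no override is given.
(goog-define FACEBOOK_APP_ID "")
(goog-define AWS_REGION "")
(goog-define RECORDINGS_BUCKET "")
```

Other namespaces can then refer to aws-transcribe-clj.config/FACEBOOK_APP_ID like any other var.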

In the :modules key we define what modules our build will consist of. In this case, it will be compiled into an app.js file. In :init-fn we pass the aws-transcribe-clj.core/init! function, which is called when the module is loaded.

Last but not least, the :devtools parameter allows us to configure the tools we will work with during development. In this case, we indicate that the HTTP server should serve the content of the resources/public directory on port 9000. Additionally, we inject the re-frisk.preload namespace into the compiled file.
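
For orientation, a shadow-cljs.edn matching the description above might look roughly like the sketch below. It is a reconstruction, not the author's file: the nREPL port, the constant names under :closure-defines and the js subdirectory in :output-dir are assumptions. The last one is chosen so that the script path /js/app.js used in index.html resolves.

```edn
;; shadow-cljs.edn: a sketch reconstructed from the description above
{:deps  true                       ; take dependencies from deps.edn
 :nrepl {:port 9001}               ; assumed port, pick any free one

 :builds
 {:app {:target     :browser
        :output-dir "resources/public/js"   ; assumption, see lead-in
        :asset-path "/js"

        ;; Hypothetical constant names; keys must be namespace-qualified symbols.
        :closure-defines {aws-transcribe-clj.config/FACEBOOK_APP_ID   "<facebook-app-id>"
                          aws-transcribe-clj.config/RECORDINGS_BUCKET "<recordings>"}

        ;; Compiles to app.js and calls init! once the module is loaded.
        :modules {:app {:init-fn aws-transcribe-clj.core/init!}}

        ;; Development HTTP server plus the re-frisk preload.
        :devtools {:http-root "resources/public"
                   :http-port 9000
                   :preloads  [re-frisk.preload]}}}}
```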

With the build configured, we can create a deps.edn file with the dependencies our application needs.

[Image: deps.edn listing]

The file has three top-level keys:

  • :deps – map containing libraries with specified versions
  • :paths – vector pointing to directories with source code and resources
  • :aliases
    • The :dev alias is used to leverage hot code reloading: whenever you make a change in the source code, shadow-cljs will recompile it. It also adds an extra dependency, re-frisk, which is a development tool for re-frame.
    • To release a production build, we use the :prod alias, which compiles the ClojureScript code using advanced optimizations.
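
A deps.edn along these lines could look like the sketch below. It is not the author's file. The version numbers are assumptions picked from around the time of writing, so check Clojars and Maven Central for current ones. The :main-opts entries assume shadow-cljs is started through its shadow.cljs.devtools.cli entry point.

```edn
;; deps.edn: a sketch only, versions are assumptions
{:paths ["src/cljs" "resources"]

 :deps {org.clojure/clojure       {:mvn/version "1.10.1"}
        org.clojure/clojurescript {:mvn/version "1.10.773"}
        org.clojure/core.async    {:mvn/version "1.3.610"}
        org.clojure/core.match    {:mvn/version "1.0.0"}
        reagent/reagent           {:mvn/version "0.10.0"}
        re-frame/re-frame         {:mvn/version "1.0.0"}
        metosin/reitit            {:mvn/version "0.5.5"}
        binaryage/oops            {:mvn/version "0.7.0"}
        com.taoensso/timbre       {:mvn/version "4.10.0"}
        thheller/shadow-cljs      {:mvn/version "2.10.21"}}

 :aliases
 {;; Watches the :app build and recompiles on every change.
  :dev  {:extra-deps {re-frisk/re-frisk {:mvn/version "1.3.4"}}
         :main-opts  ["-m" "shadow.cljs.devtools.cli" "watch" "app"]}
  ;; Produces an optimized release build of :app.
  :prod {:main-opts ["-m" "shadow.cljs.devtools.cli" "release" "app"]}}}
```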

You can also see several dependencies above, the main two being Clojure and ClojureScript. The others are described below.

core.async is a library supporting asynchronous programming and communication. We find it very helpful and convenient to use while working with code synchronization, especially in the world of JavaScript callbacks and promises.

core.match is a pattern-matching library which helps replace nested conditional expressions with a flat set of rules. These in turn make our code more readable and spare other developers from racking their brains over complicated if-statements.
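
As an illustration of rules replacing nested conditionals, here is a small sketch that decides what to do with a transcription job. The function name, the returned keywords and the retry limit of three are hypothetical. The status strings are the job statuses reported by Amazon Transcribe.

```clojure
(ns example.match
  ;; On the JVM the equivalent require is [clojure.core.match :refer [match]].
  (:require [cljs.core.match :refer-macros [match]]))

(defn next-step
  "Maps a job status and the number of attempts so far to the next action."
  [status attempts]
  (match [status attempts]
    ["COMPLETED" _]                  :fetch-transcript
    [(:or "QUEUED" "IN_PROGRESS") _] :poll-again
    ;; A failed job is retried while fewer than three attempts were made.
    ["FAILED" (n :guard #(< % 3))]   :retry
    ["FAILED" _]                     :show-error
    :else                            :unknown))
```

Each row reads as one rule, and the first matching row wins, so the guarded FAILED rule has to come before the general one.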

reagent is an interface between ClojureScript and React, allowing us to build UI components using Hiccup-like syntax.
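
In Hiccup-like syntax, a component is just a function that returns nested vectors describing the markup. The sketch below uses hypothetical component names and data keys that are not part of the project.

```clojure
(ns example.views)

(defn recording-item
  "One list entry: file name plus transcription status."
  [{:keys [filename status]}]
  [:li.recording
   [:span.filename filename]
   [:span.status status]])

(defn recording-list
  "Renders all recordings. Square brackets around recording-item tell
  Reagent to treat it as a component rather than call it directly."
  [recordings]
  [:ul.recordings
   (for [r recordings]
     ^{:key (:filename r)} [recording-item r])])
```

Mounting such a component is typically done with reagent.dom/render, passing it [recording-list some-recordings] and the target DOM node.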

re-frame is a data-oriented, event-driven and functional framework for building user interfaces. In development mode, we recommend using re-frame-10x or re-frisk to inspect application state, history of dispatched events and calculated subscriptions.

reitit is a fast data-driven router for Clojure and ClojureScript.

cljs-oops – library supporting interoperability with native JavaScript objects. It provides optimizer-safe property and method accessors, which saves us from writing externs files.

timbre – logging library.

shadow-cljs – build tool for ClojureScript. It provides many useful features that speed up development, such as hot code reloading, an nREPL server and good integration with npm packages.

Running application

Let’s create a src/cljs/aws_transcribe_clj/core.cljs file with the following content:

(ns aws-transcribe-clj.core)

(defn ^:export init! []
  (enable-console-print!)
  (println "HELLO CLOJURESCRIPT"))

Create a resources/public/index.html file:

<!DOCTYPE html>
<html>
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
    <title>AWS Transcribe CLJ</title>
</head>
<body>
    <div id="app"></div>
    <script src="/js/app.js"></script>
</body>
</html>

Run the following command in the terminal:

$ clj -A:dev

Open your browser at http://localhost:9000 and you should see “HELLO CLOJURESCRIPT” in the JS Console.

shadow-cljs will recompile your code every time you change it, which makes development, debugging and prototyping much faster. The big benefit is that the application state survives a code change. This saves a lot of time, because you do not have to recreate that state by clicking through the application again.
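
One common way state survives a reload is to keep it in a var defined with defonce, which is not re-initialized when the namespace is loaded again. The sketch below illustrates the idea with a hypothetical namespace and hypothetical state. In this series the state will be managed by re-frame, so treat it as an illustration of the mechanism only.

```clojure
(ns example.reload)

;; defonce evaluates its init expression only the first time, so reloading
;; this namespace keeps whatever the atom already holds.
(defonce app-state (atom {:recordings []}))

(defn add-recording!
  "Appends a file name to the list of recordings in the state."
  [filename]
  (swap! app-state update :recordings conj filename))

;; shadow-cljs calls functions tagged with :dev/after-load after each hot
;; reload. A real application would typically re-render the UI here.
(defn ^:dev/after-load reload! []
  (println "Reloaded," (count (:recordings @app-state)) "recording(s) kept"))
```

Had app-state been defined with def instead, every recompilation would reset it to an empty list.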

This is the end of the first part. In the next one, we will show you how to set up routing and manage application state with re-frame.