A Clojure library for building realtime conversations using OpenAI's Realtime API
reelthyme provides a simple, core.async-driven API for interacting with OpenAI's Realtime API. On the JVM it uses WebSocket, while ClojureScript will leverage WebRTC. Use it to build multimodal, real-time conversational experiences with minimal boilerplate.
Start a REPL and require the namespaces:
(ns my-app.core
  (:require [reelthyme.core :as rt]
            [reelthyme.schema :as sc]
            [clojure.core.async :as a]))
Open a realtime session, with optional event validation:
;;; JVM
(def session-ch
  (rt/connect! {:xf-in (map sc/validate)})) ;; validate outgoing events
;;; ClojureScript
(def session-ch
  (rt/connect! client-secret {:xf-in (map sc/validate)})) ;; client-secret is fetched from a server
reelthyme.core/create-client-secret is included as a convenient server-side function for creating client secrets suitable for WebRTC.
On the JVM, authentication can be provided via the :api-key option; by default, the OPENAI_API_KEY environment variable is used.
WebRTC requires some work on your own server. See Connection details in the OpenAI WebRTC docs.
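For example, the key can be supplied explicitly rather than read from the environment (a minimal sketch; MY_OPENAI_KEY is a hypothetical variable name used for illustration):

```clojure
;; Pass the API key explicitly instead of relying on OPENAI_API_KEY.
;; MY_OPENAI_KEY is a hypothetical environment variable for illustration.
(def session-ch
  (rt/connect! {:api-key (System/getenv "MY_OPENAI_KEY")}))
```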
The reelthyme.schema namespace is completely optional and provides a malli compatible schema intended to aid development via instrumentation. See example/validate.cljc and example/webrtc.cljs for an example of dev instrumentation.
Separate audio deltas from other events (JVM):
;;; Receive server events as plain Clojure maps
(let [[event-ch stop-audio] (rt/stream! session-ch)]
  (a/go-loop []
    (when-let [ev (a/<! event-ch)]
      (println "Server event:" ev)
      (recur))))
Or just read from the single session channel when using WebRTC:
;;; Receive server events as plain Clojure maps
(a/go-loop []
  (when-let [ev (a/<! session-ch)]
    (println "Server event:" ev)
    (recur)))
Send client events as plain Clojure maps:
;; Send a text message
(a/put! session-ch
        {:type "conversation.item.create"
         :item {:type "message"
                :role "user"
                :content [{:type "input_text"
                           :text "What is the weather today?"}]}})
;; Request both text and audio response
(a/put! session-ch
        {:type "response.create"
         :response {:modalities ["text" "audio"]}})
On the JVM, audio capture must be initiated explicitly:
(def stop-capture
  (rt/capture-audio! session-ch {:interval-ms 500}))
;; Later, stop capturing
(stop-capture)
The ClojureScript version of connect! behaves a little differently because browser permissions are a bit more nuanced.
The :content-types option is a set that hints at the expected values of the [:content :type] property of messages in a session.
It defaults to #{"input_audio" "input_text"}, supporting both audio and text input.
(connect! client-secret) ;; Microphone access requested by default when connect! is called
(connect! client-secret {:content-types #{"input_audio"}}) ;; anticipate only audio input
If :content-types contains "input_audio" at all, the end user will be prompted for microphone access automatically when connect! is called.
Alternatively, you can pass your own track via the :media-stream-track option. This is useful if you want more control over requesting user media. For example, you may want to play a ringtone or log some feedback before
calling navigator.mediaDevices.getUserMedia().
(go
  (let [stream (<p! (.getUserMedia js/navigator.mediaDevices #js {:audio true}))
        track  (aget (.getAudioTracks stream) 0)]
    (connect! client-secret {:media-stream-track track})))
It is possible to use text for input AND output by setting :content-types to #{"input_text"}.
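For instance, a text-only session can be opened like this (no microphone prompt is triggered):

```clojure
;; Text in, text out: with only "input_text" in :content-types,
;; connect! will not request microphone access.
(def session-ch
  (connect! client-secret {:content-types #{"input_text"}}))
```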
Unlike the JVM version, the ClojureScript version of connect! requires a session with an ephemeral client secret:
;;; JVM
(connect! params)
;;; ClojureScript
(connect! client-secret params)
It should also be noted that browsers often impose restrictions on creating audio contexts of any kind without user interaction. This means that ClojureScript implementations will likely have to put connect! behind a user interaction (such as a click).
Check out the example ClojureScript application. It might also be helpful to see the example handler used to create client secrets. See OpenAI docs on creating client secrets.
Acoustic Echo Cancellation (AEC) prevents the awkward situation where an agent hears its own output and gets stuck in a loop talking to itself. AEC comes standard with WebRTC, so this issue should not surface when using it. If using the WebSocket transport on the JVM, you may have a better experience using a headset or disabling VAD (voice activity detection) for a more "push to talk" approach.
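The push-to-talk approach can be sketched with standard OpenAI Realtime client events (session.update, input_audio_buffer.commit, and response.create; these are API-level events, not reelthyme-specific helpers):

```clojure
;; Sketch: disable server-side VAD for a push-to-talk flow.
;; With turn detection off, the input buffer must be committed and a
;; response requested manually after each "push".
(a/put! session-ch
        {:type "session.update"
         :session {:turn_detection nil}})

;; After streaming a burst of audio, commit it and ask for a response:
(a/put! session-ch {:type "input_audio_buffer.commit"})
(a/put! session-ch {:type "response.create"})
```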