A Clojure library for building realtime conversations using OpenAI's Realtime API
reelthyme provides a simple, core.async-driven API for interacting with OpenAI's Realtime API. On the JVM it uses WebSocket, while ClojureScript will leverage WebRTC. Use it to build multimodal, real-time conversational experiences with minimal boilerplate.
Start a REPL and require the namespaces:
(ns my-app.core
(:require [reelthyme.core :as rt]
[reelthyme.schema :as sc]
[clojure.core.async :as a]))
Open a realtime session, with optional event validation:
;;; JVM
(def session-ch
  (rt/connect! {:xf-in (map sc/validate)})) ;; validate outgoing events

;;; ClojureScript
(def session-ch
  (rt/connect! session {:xf-in (map sc/validate)})) ;; session is fetched from a server
reelthyme.core/create-session is included as a convenient server-side function for creating sessions fit for WebRTC.

On the JVM, authentication can be provided via the :api-key option; by default, the OPENAI_API_KEY environment variable is used. WebRTC requires some work on your own server. See Connection details in the OpenAI WebRTC docs.
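As a sketch, supplying the key explicitly on the JVM might look like this (the literal key and where you load it from are placeholders; only the :api-key option itself comes from the library):

```clojure
;; Sketch: pass the API key explicitly instead of relying on the
;; OPENAI_API_KEY environment variable. "sk-your-key" is a placeholder;
;; load the real key from your own config or secret store.
(def session-ch
  (rt/connect! {:api-key "sk-your-key"}))
```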
The reelthyme.schema namespace is completely optional. Its validate function is a great addition to a development environment, ensuring that client events are properly constructed before they are sent.
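For example, you might enable validation only during development (the dev? flag here is an assumption about your own setup; the library simply accepts the :xf-in transducer):

```clojure
;; Sketch: enable client-event validation only in development.
;; `dev?` is an application-level flag, not part of reelthyme.
(def dev? true)

(def session-ch
  (rt/connect! (cond-> {}
                 dev? (assoc :xf-in (map sc/validate)))))
```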
Separate audio deltas from other events (JVM):
;;; Receive server events as plain Clojure maps
(let [[event-ch stop-audio] (rt/stream! session-ch)]
(a/go-loop []
(when-let [ev (a/<! event-ch)]
(println "Server event:" ev)
(recur))))
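The second value returned by stream! stops audio playback. A sketch of holding on to it so playback can be interrupted later (exactly when to call it, e.g. on user barge-in, is application-specific):

```clojure
;; Sketch: keep stop-audio around so assistant playback can be
;; interrupted later, e.g. when the user starts speaking again.
(let [[event-ch stop-audio] (rt/stream! session-ch)]
  (a/go-loop []
    (when-let [ev (a/<! event-ch)]
      (println "Server event:" ev)
      (recur)))
  ;; call from a UI handler or VAD trigger to silence the assistant:
  stop-audio)
```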
Or just read from the single session channel when using WebRTC:
;;; Receive server events as plain Clojure maps
(a/go-loop []
(when-let [ev (a/<! session-ch)]
(println "Server event:" ev)
(recur)))
Send client events as plain Clojure maps:
;; Send a text message
(a/put! session-ch
{:type "conversation.item.create"
:item {:type "message"
:role "user"
:content [{:type "input_text"
:text "What is the weather today?"}]}})
;; Request both text and audio response
(a/put! session-ch
{:type "response.create"
:response {:modalities ["text" "audio"]}})
On the JVM, audio capture must be initiated explicitly:
(def stop-capture
(rt/capture-audio! session-ch {:interval-ms 500}))
;; Later, stop capturing
(stop-capture)
WebRTC will initiate audio capture if the session is created with an "audio" modality. If "audio" is not among the requested modalities (for some reason), a silent AudioContext will be used for the RTCPeerConnection (some form of audio context is required by OpenAI). WebRTC audio capture and playback are handled automatically based on the details of the session, so there is no need to explicitly start capture or stop audio: everything starts when the channel connects and stops when it is closed.
On both platforms, connect! yields events as plain Clojure(Script) maps. The ClojureScript version of connect! requires a session with an ephemeral client secret:
;;; JVM
(connect! params)
;;; ClojureScript
(connect! session params)
Note that browsers often restrict creating audio contexts of any kind without user interaction. ClojureScript implementations will therefore likely need to put connect! behind a user interaction (such as a click).
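A ClojureScript sketch of wiring connect! to a click handler. Here fetch-session is a hypothetical function that retrieves a session (with its ephemeral client secret) from your own server, and the "start" element id is an assumption:

```clojure
;; Sketch (ClojureScript): start the conversation on a click so the
;; browser permits audio. fetch-session is hypothetical and should
;; return a promise of a session map from your server.
(defn start-conversation! []
  (-> (fetch-session)
      (.then (fn [session]
               (let [session-ch (rt/connect! session {})]
                 (a/go-loop []
                   (when-let [ev (a/<! session-ch)]
                     (println "Server event:" ev)
                     (recur))))))))

(.addEventListener (js/document.getElementById "start")
                   "click"
                   start-conversation!)
```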
Check out the example ClojureScript application. It might also be helpful to see the example handler used to create sessions.
See reelthyme.schema for built-in event validation, and the :xf-in, :xf-out, :ex-handler, and :log-fn options.

Acoustic Echo Cancellation (AEC) prevents awkward experiences where an agent hears itself and gets stuck in an infinite loop speaking to itself. AEC comes standard with WebRTC, so this issue should not surface when using it. If using the WebSocket transport on the JVM, you may have a better experience using a headset, or disabling VAD for a more "push to talk" approach.