Deepgram Python SDK Tutorial
Welcome to this tutorial on using the Deepgram Python SDK! 🚀
In this notebook, we'll explore the core functionality of Deepgram for speech-to-text transcription. We'll cover everything from setting up your environment to transcribing audio from local files and URLs. We'll also touch on some of Deepgram's advanced features, such as diarization and summarization.
Let's get started!
1. Setup
First, we need to install the necessary libraries and configure our environment.
1.1. Install Libraries
Uncomment and run the following cell to install the required Python packages: deepgram-sdk and python-dotenv (for managing environment variables).
Note for macOS users: If you need microphone support, you must install portaudio and then pyaudio manually. Run the following in your terminal:
brew install portaudio
pip install pyaudio
If you don't need microphone support, you can skip installing pyaudio.
# Install the Deepgram SDK and python-dotenv. If you need microphone support, see the note above.
# !pip install deepgram-sdk python-dotenv -q
1.2. API Key Configuration
To use Deepgram's services, you'll need an API key. You can get a free API key with $200 in credits by signing up on the Deepgram website.
Once you have your key, create a file named .env in the same directory as this notebook and add your API key to it like this:
DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
Now, let's load the API key from the .env file.
import os
from dotenv import load_dotenv
load_dotenv()
DEEPGRAM_API_KEY = os.getenv("DEEPGRAM_API_KEY")
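Because os.getenv returns None when a variable is missing, a typo in the .env file would only surface later as a failed API call. The following is a minimal sketch of an early check. The helper name require_api_key is hypothetical and is not part of the Deepgram SDK or python-dotenv.

```python
import os


def require_api_key(name="DEEPGRAM_API_KEY", env=None):
    """Return the value of `name`, or raise a clear error if it is missing.

    Hypothetical helper for this tutorial; `env` defaults to os.environ and
    can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment."
        )
    return value
```

If you adopt something like this, you could call DEEPGRAM_API_KEY = require_api_key() right after load_dotenv() instead of calling os.getenv directly.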
2. Transcribing Audio from a File
Let's start by transcribing a local audio file. If you don't have one handy, download any short sample recording. The example below assumes a file named harvard.wav in the same directory as this notebook.
from deepgram import PrerecordedOptions, FileSource, DeepgramClient

def transcribe_audio_from_local_file(file_path):
    # Create a Deepgram client
    deepgram = DeepgramClient(DEEPGRAM_API_KEY)
    # Configure Deepgram options
    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
    )
    # Read the whole audio file into memory as raw bytes
    with open(file_path, "rb") as file:
        buffer_data = file.read()
    payload: FileSource = {
        "buffer": buffer_data,
    }
    # Call the transcribe_file method using REST API
    response = deepgram.listen.rest.v("1").transcribe_file(payload, options)
    # Print only the transcript
    for channel in response.results.channels:
        for alternative in channel.alternatives:
            print(alternative.transcript)

transcribe_audio_from_local_file('harvard.wav')  # Replace with your local audio file path
The stale smell of old beer lingers. It takes heat to bring out the odor. A cold dip restores health and zest. A salt pickle tastes fine with ham. Tacos al pastor are my favorite. A zestful food is the hot cross bun.
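The nested loop above walks results, then channels, then alternatives, and prints each transcript. The same traversal can be sketched on a plain dictionary, which is handy if you save a response (for example via response.to_dict(), which this notebook uses later) and want to process it without the SDK objects. The function name extract_transcripts and the sample data are illustrative assumptions, not real API output.

```python
def extract_transcripts(data):
    """Collect every non-empty transcript from a response-shaped dict."""
    transcripts = []
    for channel in data.get("results", {}).get("channels", []):
        for alternative in channel.get("alternatives", []):
            text = alternative.get("transcript", "")
            if text:
                transcripts.append(text)
    return transcripts


# Hand-written sample that mirrors only the fields the loop reads.
sample = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "A cold dip restores health and zest."}]},
            {"alternatives": [{"transcript": ""}]},
        ]
    }
}

print(extract_transcripts(sample))
```

Returning a list instead of printing makes the result easier to reuse, for instance to write transcripts to a file.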
3. Transcribing Audio from a URL
Deepgram can also directly transcribe audio from a URL. This is very convenient for processing audio files hosted online.
from deepgram import DeepgramClient, PrerecordedOptions

# Create a Deepgram client
deepgram = DeepgramClient(DEEPGRAM_API_KEY)

# URL of the audio file
AUDIO_URL = "https://static.deepgram.com/examples/interview_speech-analytics.wav"

def transcribe_audio_from_url(audio_url):
    # Configure Deepgram options
    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
    )
    # Call the transcribe_url method using REST API
    source = {'url': audio_url}
    response = deepgram.listen.rest.v("1").transcribe_url(source, options)
    # Print only the transcript
    for channel in response.results.channels:
        for alternative in channel.alternatives:
            print(alternative.transcript)

transcribe_audio_from_url(AUDIO_URL)
Another big problem in the speech analytics space, when customers first bring the software on is that they they are blown away by the fact that an engine can monitor hundreds of KPIs. Right? Everything from minute compliance issues to, you know, human, human interaction, empathy measurements, to upsell, aptitudes, to closing aptitudes. There are hundreds, literally, of KPIs that one can look at. And the speech analytics companies have typically gone to the customer and really bang that drum. Look at all of these things that we're gonna help you keep an eye on. The reality, however, is that a company even a contact center manager, they can't keep track in their brain, even if they have a report in front of them, of that many KPIs. Mhmm. And, frankly, it's overwhelming. So what successful companies do is they bite off no more than they can chew at any given time. The reality is is you can only train a call center agent on a maximum of three skills at any given day. Right? And by focusing on, focusing on problem areas for a week, for a month, depending on how bad things are, and then once you've mastered that skill, to take a baseline of of your performance and move on to the next worst skill, right, is the way that companies succeed using this product.
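The two examples so far differ mainly in the payload they send: a dictionary with a "buffer" key for local files and a dictionary with a "url" key for hosted audio. As a rough sketch, a small helper could pick the right shape from a single input. The name build_source and its return convention are hypothetical. Only the two payload shapes come from the code above.

```python
from pathlib import Path
from urllib.parse import urlparse


def build_source(location):
    """Return ("url", payload) for http(s) locations, else ("file", payload).

    Hypothetical helper: the payload dicts match the shapes passed to
    transcribe_url and transcribe_file in the examples above.
    """
    if urlparse(str(location)).scheme in ("http", "https"):
        return "url", {"url": str(location)}
    # Anything else is treated as a local path and read as raw bytes.
    return "file", {"buffer": Path(location).read_bytes()}
```

The returned kind tells the caller whether to pass the payload to transcribe_url or transcribe_file.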
4. Advanced Features: Diarization and Summarization
Deepgram offers a range of powerful audio intelligence features. Let's look at two of the most popular ones: diarization (identifying who spoke and when) and summarization.
from deepgram import PrerecordedOptions, FileSource, DeepgramClient

def transcribe_with_advanced_features_local(file_path):
    deepgram = DeepgramClient(DEEPGRAM_API_KEY)
    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
        diarize=True,  # Enable speaker diarization
        summarize="v2",  # Enable summarization
    )
    with open(file_path, "rb") as file:
        buffer_data = file.read()
    payload: FileSource = {
        "buffer": buffer_data,
    }
    response = deepgram.listen.rest.v("1").transcribe_file(payload, options)

    # Print transcript separated by speaker
    # (all words from one speaker are collected together, regardless of turn order)
    print("\nSpeaker-separated transcript:")
    diarized = response.results.channels[0].alternatives[0].words
    speaker_transcripts = {}
    for word in diarized:
        speaker = getattr(word, 'speaker', 'Unknown')
        if speaker not in speaker_transcripts:
            speaker_transcripts[speaker] = []
        speaker_transcripts[speaker].append(getattr(word, 'punctuated_word', getattr(word, 'word', '')))
    for speaker, words in speaker_transcripts.items():
        print(f"Speaker {speaker}: {' '.join(words)}")

    # Extract and print summary from data['results']['summary']['short'] if available
    data = response.to_dict()
    summary_data = data.get('results', {}).get('summary', {})
    summary_short = summary_data.get('short') if isinstance(summary_data, dict) else None
    if summary_short:
        print("\nSummary:")
        print(summary_short)
    else:
        print("\nNo summary available.")

transcribe_with_advanced_features_local("diarization.m4a")
Speaker-separated transcript:
Speaker 0: I guess. I just thought Aristotle thought flames kept going up and up and up. He said the Earth revolved around the sun, but didn't he get in trouble for that? Didn't the church make him stop saying it? What aspect of gravitational science is the lecture mainly about?
Speaker 1: No. Aristotle thought the flames would stop at some point at their natural place in the sky, but I can understand how you'd be confused. Aristotle's ideas seem odd to us today. Now, there was another early scientist whose ideas on gravity may seem more familiar. An ancient Indian thinker from the five hundreds, Brahmagupta, that was his name. He believed that the earth was basically a giant ball that was full of gravity and pulled things down to it. So around earth with its own gravitational pull, just like we believe today. Now, yes, Susie. Well, actually, Susie, the theory the world is round is a very old one. In fact, it was during Aristotle's life that some of his fellow Greek scientist realized the Earth had to be round. In many ways, this idea of a round earth was the first step toward our modern understanding of gravity. And, Brahmaguptha took that a step further realizing there was gravity within the sphere of the earth. Now, Aristotle was right too in a sense. Things can fall at different speeds, but, that's because of differences in air resistance. The atmosphere or counteracting gravity when it hits things that are not so compact, not not dense. Aristotle, Ramagupta, other ancient thinkers were missing an important theory, well, a fact. I'm talking about heliocentrism, the idea the earth isn't the center of the universe, that it revolves around the sun. That idea became popular much later in the fifteen hundreds. Does anyone know who Copernicus is? Roger? Well, he wasn't punished, but he did get a lot of grief from religious leaders at the time. The idea certainly wasn't traditional, but other scientists and the public, they embraced his ideas about planetary orbits. This new scientific attention to orbits set the stage for Newton to realize that gravity made things fall to the earth, but they also made the moon circle the earth. And then Newton figured out what we all know now. Larger objects have gravitational power over smaller ones.
Speaker 2: Sorry. I I'm confused again. I thought nobody knew the Earth was round until people sailed all the way around the world in in, like, the fourteen or fifteen hundreds. How did Brahmagupta figure out the world was round?

Summary:
Speaker 0 is confused about the idea that the Earth is a giant ball and pulled things down to it. Speaker 1 explains that the theory of a round Earth was discussed by a Greek scientist named Braheler during the early stages of the Discovery process, but it became popular later in the hundreds. Other scientists and the public embraced the idea of a rotating body and the idea of a moon, but it was not traditional.
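Note that the function above gathers all of a speaker's words into a single bucket, so the printed transcript loses the back-and-forth order of the conversation. A minimal sketch of an alternative is to group consecutive words by speaker so that each turn is kept separate. The function name speaker_turns and the sample data are illustrative assumptions. The sketch works on plain dictionaries with the same speaker and punctuated_word fields the code above reads.

```python
from itertools import groupby


def speaker_turns(words):
    """Group consecutive words by speaker, preserving conversation order."""
    turns = []
    for speaker, run in groupby(words, key=lambda w: w.get("speaker", "Unknown")):
        # Prefer the punctuated form, falling back to the raw word.
        text = " ".join(w.get("punctuated_word") or w.get("word", "") for w in run)
        turns.append((speaker, text))
    return turns


# Hand-written sample; real word entries carry more fields than shown here.
sample_words = [
    {"speaker": 0, "punctuated_word": "Hello."},
    {"speaker": 1, "punctuated_word": "Hi"},
    {"speaker": 1, "punctuated_word": "there."},
    {"speaker": 0, "punctuated_word": "Bye."},
]

for speaker, text in speaker_turns(sample_words):
    print(f"Speaker {speaker}: {text}")
```

With this grouping, a speaker who talks twice produces two separate turns instead of one merged block.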
5. Conclusion
Congratulations! You've now learned the basics of using the Deepgram Python SDK.
We've covered:
✅ Setting up your environment and API key.
✅ Transcribing audio from local files and URLs.
✅ Using advanced features like diarization and summarization.
This is just the beginning of what you can do with Deepgram. For more information, check out the Deepgram Documentation.