[SOLVED] Fireflies.ai bulk transcript downloader
No more manual downloading of fireflies.ai transcripts. Run this script to bulk download. Thanks Claude.ai
For the past two years, I've been using Fireflies.ai to transcribe my MS Teams, Google Meet and Zoom calls. I have hundreds of transcripts, BUT Fireflies has no bulk download capability. I often run these transcripts through ChatGPT to summarise project discussions or mine them for workshop improvements. My solution was to ask my VA to download the files manually and work from there.
I’m working on a project that requires 40 to 50 files, so I needed an alternative to my VA.
Enter Claude.AI
To solve this problem, I asked Claude to suggest options. Claude suggested a Python script 👀 (despite my having zero Python experience). Working on my Mac, Claude guided me through multiple iterations over about 45 minutes until we had a working solution for automatic bulk downloads. While this automation has made my virtual assistant's transcription management role redundant, it's solved my problem, saved me money and paid for my Claude, ChatGPT and Fireflies subscriptions!
The script has been performing excellently, and I wanted to share it with you. Try it out and let me know if you discover any bugs or potential improvements.
Here are the complete step-by-step instructions to download your Fireflies.ai transcripts:
Initial Setup (one-time only):
```bash
# Create a working directory
mkdir fireflies_downloader
cd fireflies_downloader

# Create the Python script
nano fireflies_downloader.py

# Get your API key from Fireflies.ai:
# 1. Log into Fireflies.ai
# 2. Go to Settings → Developer Settings → API
# 3. Copy your API key

# Set your API key in Terminal
export FIREFLIES_API_KEY="your_api_key_here"

# Install the required Python package
pip3 install requests
```
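Before moving on, you can confirm the key is actually visible to Python with a quick check. This is just a sanity-check sketch of mine; the `masked_key` helper is illustrative, though it mirrors the masked printout the downloader script itself produces.

```python
import os

def masked_key(key):
    """Show only the first and last 5 characters of an API key."""
    return f"{key[:5]}...{key[-5:]}"

api_key = os.getenv("FIREFLIES_API_KEY")
if not api_key:
    print("FIREFLIES_API_KEY is not set in this shell")
else:
    print(f"Using API key: {masked_key(api_key)}")
```

If you see the "not set" message, re-run the `export` line above in the same Terminal window before running the downloader.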
Script Content:
1. Open `fireflies_downloader.py` in nano
2. Press Control + K repeatedly to clear any existing content
3. Paste the full script into `fireflies_downloader.py`
```python
import requests
import json
import os
from datetime import datetime, timedelta
import time


class FirefliesDownloader:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.fireflies.ai/graphql"
        self.headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }

    def get_transcript_content(self, transcript_id):
        """Fetch detailed content for a specific transcript."""
        query = """
        query GetTranscriptContent($id: String!) {
            transcript(id: $id) {
                title
                id
                transcript_url
                duration
                date
                participants
                sentences {
                    text
                    speaker_id
                    start_time
                }
                summary {
                    keywords
                    action_items
                }
            }
        }
        """
        variables = {"id": transcript_id}
        try:
            print(f"Fetching content for transcript {transcript_id}...")
            response = requests.post(
                self.base_url,
                headers=self.headers,
                json={"query": query, "variables": variables}
            )
            if response.status_code != 200:
                print(f"Error getting transcript content: {response.status_code}")
                print(response.text)
                return None
            data = response.json()
            if "errors" in data:
                print("API returned errors:", json.dumps(data["errors"], indent=2))
                return None
            return data["data"]["transcript"]
        except requests.exceptions.RequestException as e:
            print(f"Error fetching transcript content: {e}")
            return None

    def get_transcripts(self, limit=25, to_date=None):
        """Fetch a batch of transcripts, optionally only those up to a date."""
        query = """
        query GetTranscripts($limit: Int, $toDate: DateTime) {
            transcripts(limit: $limit, toDate: $toDate) {
                title
                id
                transcript_url
                duration
                date
                participants
            }
        }
        """
        variables = {
            "limit": min(limit, 25),
            "toDate": to_date.strftime("%Y-%m-%dT%H:%M:%S.000Z") if to_date else None
        }
        try:
            print(f"Making API request for transcripts up to date {to_date if to_date else 'now'}...")
            response = requests.post(
                self.base_url,
                headers=self.headers,
                json={"query": query, "variables": variables}
            )
            data = response.json()
            if "errors" in data:
                print("API returned errors:", json.dumps(data["errors"], indent=2))
                return None
            return data
        except requests.exceptions.RequestException as e:
            print(f"Error fetching transcripts: {e}")
            return None

    def save_transcripts(self, output_dir="transcripts", to_date=None):
        """Save all transcripts with their content."""
        if not os.path.exists(output_dir):
            os.makedirs(output_dir)
        # Collect IDs of transcripts we already have to avoid duplicates
        existing_ids = set()
        for filename in os.listdir(output_dir):
            if filename.endswith('.json'):
                try:
                    with open(os.path.join(output_dir, filename)) as f:
                        data = json.load(f)
                        existing_ids.add(data['id'])
                except Exception:
                    pass  # Ignore unreadable or malformed files
        last_date = to_date or datetime.now()
        while True:
            result = self.get_transcripts(limit=25, to_date=last_date)
            if not result or "data" not in result or "transcripts" not in result["data"]:
                print("No more transcripts found or error in API response")
                break
            transcripts = result["data"]["transcripts"]
            if not transcripts:
                print("No more transcripts found")
                break
            print(f"\nFound {len(transcripts)} transcripts")
            earliest_date = None
            for transcript in transcripts:
                # Skip if we already have this transcript
                if transcript['id'] in existing_ids:
                    print(f"Skipping existing transcript: {transcript['title']}")
                    continue
                print(f"\nProcessing: {transcript['title']}")
                # Get detailed content
                content = self.get_transcript_content(transcript['id'])
                if content:
                    transcript = content  # Replace with full content
                # Build a safe filename; Fireflies dates are Unix timestamps in ms
                date_obj = datetime.fromtimestamp(int(transcript["date"]) / 1000)
                safe_title = ''.join(c for c in transcript['title'] if c.isalnum() or c in (' ', '-', '_', '.'))
                filename = f"{date_obj.strftime('%Y-%m-%d')}_{safe_title[:50]}.json"
                filepath = os.path.join(output_dir, filename)
                with open(filepath, 'w', encoding='utf-8') as f:
                    json.dump(transcript, f, indent=2, ensure_ascii=False)
                print(f"Saved transcript: {filename}")
                # Track earliest date we've seen
                if earliest_date is None or date_obj < earliest_date:
                    earliest_date = date_obj
                time.sleep(1)  # Be nice to the API
            if earliest_date:
                # Subtract 1 second to avoid getting the same transcript again
                last_date = earliest_date - timedelta(seconds=1)
                time.sleep(1)  # Extra pause between batches
            else:
                break


def main():
    api_key = os.getenv("FIREFLIES_API_KEY")
    if not api_key:
        print("Please set your FIREFLIES_API_KEY environment variable")
        return
    print(f"Using API key: {api_key[:5]}...{api_key[-5:]}")
    # Find the earliest date from existing files so reruns continue further back
    earliest_date = None
    if os.path.exists("transcripts"):
        try:
            for file in os.listdir("transcripts"):
                if file.endswith('.json'):
                    with open(os.path.join("transcripts", file)) as f:
                        data = json.load(f)
                    date = datetime.fromtimestamp(int(data["date"]) / 1000)
                    if earliest_date is None or date < earliest_date:
                        earliest_date = date
            if earliest_date:
                print(f"Starting from before date: {earliest_date}")
        except Exception as e:
            print(f"Error reading files: {e}")
    downloader = FirefliesDownloader(api_key)
    downloader.save_transcripts(to_date=earliest_date)


if __name__ == "__main__":
    main()
```
Save with Control + X, then Y, then Enter
Running the Script:
```bash
# Navigate to the directory (if you're not already there)
cd fireflies_downloader

# Run the script
python3 fireflies_downloader.py
```
What to Expect:
- The first run will download the most recent transcripts
- Each subsequent run will fetch older transcripts
- The script will automatically:
  - Skip duplicates
  - Show progress
  - Create a `transcripts` folder
  - Save each transcript as a JSON file named with date and title
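For reference, the filename generation can be reproduced in isolation. The sketch below mirrors the sanitisation logic inside the script's `save_transcripts` (the `make_filename` helper name is mine, not part of the script), assuming Fireflies dates are millisecond Unix timestamps as the script does.

```python
from datetime import datetime

def make_filename(title, date_ms):
    """Build the YYYY-MM-DD_Title.json name the downloader uses."""
    # Fireflies dates arrive as Unix timestamps in milliseconds
    date_obj = datetime.fromtimestamp(int(date_ms) / 1000)
    # Keep only filesystem-safe characters and cap the title at 50 chars
    safe_title = ''.join(c for c in title if c.isalnum() or c in (' ', '-', '_', '.'))
    return f"{date_obj.strftime('%Y-%m-%d')}_{safe_title[:50]}.json"

# e.g. "2023-11-14_Weekly Sync Q3  Roadmap.json" (date depends on your timezone)
print(make_filename("Weekly Sync: Q3 / Roadmap", 1700000000000))
```

Colons, slashes and other unsafe characters are simply dropped, so two meetings with very similar titles on the same day can collide; renaming one file afterwards is the easy workaround.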
To Get More Transcripts:
- Just run the script again
- Keep running until you see "No more transcripts found"
- Each run will go further back in time
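Under the hood, each run works like a date cursor: the script takes the earliest meeting date it saw in a batch, steps back one second, and sends that as the next `toDate`. A standalone sketch of that step (the `next_to_date` helper is my own name; I also pin the timezone to UTC here for a deterministic example, where the script itself uses local time):

```python
from datetime import datetime, timedelta, timezone

def next_to_date(earliest_ms):
    """Given the earliest transcript date in a batch (ms since epoch),
    build the toDate string for the next GraphQL request."""
    earliest = datetime.fromtimestamp(int(earliest_ms) / 1000, tz=timezone.utc)
    # Step back one second so the same transcript isn't returned again
    cursor = earliest - timedelta(seconds=1)
    return cursor.strftime("%Y-%m-%dT%H:%M:%S.000Z")

print(next_to_date(1700000000000))  # → 2023-11-14T22:13:19.000Z
```

This is why reruns are safe: the cursor only ever moves backwards, and already-saved IDs are skipped on the way.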
Files Location:
- All transcripts are saved in the `transcripts` folder
- Files are named like `YYYY-MM-DD_Title.json`
- Each file contains:
  - Full conversation text
  - Speaker information
  - Summary and keywords
  - Meeting metadata
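Since my main use case is feeding transcripts into ChatGPT, here's a small sketch for turning one saved file into plain text. The `to_plain_text` helper and the sample dict are illustrative additions of mine, but the field names match the GraphQL query the script sends.

```python
def to_plain_text(transcript):
    """Flatten a saved transcript's sentences into speaker-tagged lines."""
    lines = [transcript["title"]]
    for s in transcript.get("sentences") or []:
        lines.append(f"[{s['speaker_id']}] {s['text']}")
    return "\n".join(lines)

# Illustrative sample shaped like the files the script saves
sample = {
    "title": "Project Kickoff",
    "sentences": [
        {"text": "Welcome everyone.", "speaker_id": 0, "start_time": 0.0},
        {"text": "Let's review the agenda.", "speaker_id": 1, "start_time": 2.5},
    ],
}
print(to_plain_text(sample))
```

In practice you'd `json.load` one of the saved files and pass the result straight to this function before pasting the text into your prompt.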
Troubleshooting:
If you get an "API key not found" error:
```bash
export FIREFLIES_API_KEY="your_api_key_here"
```
If you close Terminal and come back later:
```bash
cd fireflies_downloader
export FIREFLIES_API_KEY="your_api_key_here"
python3 fireflies_downloader.py
```
Best Practices:
- Run during off-peak hours
- Let each run complete
- Keep the script and files for future use
- Make backups of your `transcripts` folder
The script is designed to be safe to run multiple times - it won't duplicate downloads and will automatically continue where it left off.
Hey Leslie. Great work. We've added a few features and shared a revised version on GitHub. It now handles speaker diarization, retries, and workspace-safe reruns.
https://github.com/humanrace-ai/FireFlies_Fetch
Thanks Michael, that's awesome.