Telegram Bot for Voice-to-Text Transcription with OpenAI

This Python snippet implements a Telegram bot, built on aiogram, that transcribes voice messages into text with OpenAI’s Whisper model (whisper-1).

The bot also calculates the cost of processing each voice message from its duration, at a rate of 3,000 tokens per minute of audio.

Functions:

get_audio_duration(file_path):
Description: This asynchronous function returns the duration of an audio file in seconds.
Parameters: file_path (str) – Path to the audio file.
Returns: Duration of the audio file in seconds (float).

calculate_cost(file_paths):
Description: This asynchronous function calculates the token cost of processing voice messages from their durations, at 3,000 tokens per minute of audio.
Parameters: file_paths (list) – List of file paths of the voice messages.
Returns: Total cost in tokens for processing all voice messages.

audio_to_text(file_path: str) -> str:
Description: This asynchronous function transcribes an audio file into text using OpenAI’s whisper-1 model.
Parameters: file_path (str) – Path to the audio file.
Returns: Transcribed text from the audio file.

save_voice_as_mp3(bot: Bot, voice: Voice) -> str:
Description: This asynchronous function downloads a voice message, converts it from OGG to MP3, and saves it in the voice_files/ directory.
Parameters: bot (Bot) – Telegram bot instance, voice (Voice) – Voice message object.
Returns: Path to the saved MP3 file.

process_voice_message(message: Message):
Description: This asynchronous function handles incoming voice messages: it saves and transcribes the message, calculates the cost, deletes the original voice message from the chat, and sends the transcribed text back to the user.
Parameters: message (Message) – Incoming message object.
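
To make the cost rule concrete, here is a minimal sketch of the same arithmetic as calculate_cost, working on plain durations instead of audio files. The helper names tokens_for_duration and total_tokens are hypothetical and are not part of the bot; the 3,000 tokens per minute rate is the one used in the code.

```python
TOKEN_COST_PER_MINUTE = 3000  # same rate as in calculate_cost


def tokens_for_duration(duration_in_seconds, rate=TOKEN_COST_PER_MINUTE):
    """Token cost of one voice message, rounded to a whole token."""
    return round(duration_in_seconds / 60 * rate)


def total_tokens(durations_in_seconds):
    """Sum of per-message costs; each message is rounded on its own."""
    return sum(tokens_for_duration(d) for d in durations_in_seconds)


print(tokens_for_duration(90))   # 4500
print(total_tokens([30, 45]))    # 3750
```

A 90-second message is 1.5 minutes, so it costs 1.5 × 3000 = 4500 tokens. Rounding happens per message, as in calculate_cost, so the total over several messages can differ by a token or two from rounding the combined duration once.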

Usage:
Users send voice messages to the bot.
The bot downloads the voice message, converts it to MP3 format, and saves it.
The bot transcribes the voice message into text using OpenAI’s transcription service.
The bot calculates the cost of processing the voice message based on its duration.
The bot sends the transcribed text back to the user along with the cost of processing.
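
Note:
Set the OpenAI API key before the bot starts, since transcription requests require authentication.
pydub needs ffmpeg (or libav) installed to convert OGG voice messages to MP3.
Exceptions are caught in the handler, and the user is notified with an error message if processing fails.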
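
One possible way to handle the setup step is to read the credentials from environment variables and fail early if one is missing. This is only a sketch: the load_settings helper and the TELEGRAM_BOT_TOKEN variable name are assumptions made for illustration, while OPENAI_API_KEY is the name the openai package conventionally reads.

```python
import os


def load_settings(env=os.environ):
    """Return the required credentials, or raise if any are missing.

    TELEGRAM_BOT_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    required = ("OPENAI_API_KEY", "TELEGRAM_BOT_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in required}


# Possible wiring with aiogram 2.x and the pre-1.0 openai package:
# settings = load_settings()
# openai.api_key = settings["OPENAI_API_KEY"]
# bot = Bot(token=settings["TELEGRAM_BOT_TOKEN"])
# dp = Dispatcher(bot)
```

Failing at startup keeps a missing key from surfacing later as an "An error occurred" reply to a user.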

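import io
import logging

import openai  # pre-1.0 openai package, which provides openai.Audio.atranscribe
from aiogram import Bot, types
from aiogram.types import Message, Voice
from pydub import AudioSegment

# `bot`, `dp` (the aiogram Dispatcher), and the `token_voice_spent` dict
# are assumed to be created elsewhere in the project.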

async def get_audio_duration(file_path):
    audio = AudioSegment.from_file(file_path)
    return audio.duration_seconds


# Calculate the token cost of Whisper transcription from audio duration
async def calculate_cost(file_paths):
    token_cost_per_minute = 3000  # Cost of one minute in tokens
    costs = []
    for file_path in file_paths:
        duration_in_seconds = await get_audio_duration(file_path)
        cost_in_minutes = duration_in_seconds / 60
        token_spent_for_voice = round(cost_in_minutes * token_cost_per_minute)
        costs.append(token_spent_for_voice)
    return sum(costs)


# AUDIO TO TEXT (uses openai.Audio.atranscribe from the pre-1.0 openai package)
async def audio_to_text(file_path: str) -> str:
    """Accepts the path to an audio file and returns its transcribed text."""

    with open(file_path, "rb") as audio_file:
        transcript = await openai.Audio.atranscribe("whisper-1", audio_file)
    return transcript["text"]


async def save_voice_as_mp3(bot: Bot, voice: Voice) -> str:
    """Downloads a voice message and saves it in mp3 format."""
    try:
        # Download the OGG voice note into memory (aiogram 2.x API).
        voice_ogg = io.BytesIO()
        await bot.download_file_by_id(voice.file_id, voice_ogg)
        # The voice_files/ directory must already exist.
        voice_mp3_path = f"voice_files/voice-{voice.file_unique_id}.mp3"
        AudioSegment.from_file(voice_ogg, format="ogg").export(
            voice_mp3_path, format="mp3"
        )
        return voice_mp3_path

    except Exception as e:
        logging.error(f"Failed to save voice message: {e}")
        raise  # let the caller report the failure instead of receiving None


@dp.message_handler(content_types=[types.ContentType.VOICE])
async def process_voice_message(message: Message):
    """Accepts all voice messages and transcribes them into text."""

    try:

        voice_path = await save_voice_as_mp3(bot, message.voice)
        transcripted_voice_text = await audio_to_text(voice_path)

        token_voice_spent[message.chat.id] = await calculate_cost([voice_path])

        await bot.delete_message(chat_id=message.chat.id, message_id=message.message_id)

        if transcripted_voice_text:
            await bot.send_message(
                message.chat.id,
                text=f"You asked: {transcripted_voice_text} Cost: {token_voice_spent[message.chat.id]} tokens",
            )

    except Exception as e:
        await bot.send_message(message.from_user.id, f"An error occurred: {e}")
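
The audio_to_text function above relies on openai.Audio.atranscribe, which is only available in openai package versions before 1.0. For projects on openai 1.0 or later, an equivalent might look like the sketch below. The name audio_to_text_v1 and the optional client parameter are assumptions added for illustration (the parameter makes it easy to pass a stub in tests); they are not part of the original bot.

```python
async def audio_to_text_v1(file_path: str, client=None) -> str:
    """Transcribe an audio file with the openai>=1.0 async client."""
    if client is None:
        # Imported lazily so the module loads even without openai installed.
        from openai import AsyncOpenAI

        client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    with open(file_path, "rb") as audio_file:
        transcript = await client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return transcript.text
```

In the handler, this would replace the call to audio_to_text(voice_path); the rest of the flow stays the same.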

Author: Bogdan Kuhar
Full Stack Developer/coach
https://www.youtube.com/@imimir_com

info@imimir.com
