Monday, February 10, 2025

Use Whisper to create subtitle files locally without requiring any paid services.


Recently I started creating videos for an AI course, and I wanted subtitle/caption files (.srt and .vtt) for them. I explored various online tools, but they impose a lot of limitations. Previously I used YouTube to create subtitles, but this time I had already uploaded similar videos (with slightly different durations) to my channel, and uploading them again would create duplicates. So I decided to run Whisper on my local computer.

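First I installed Whisper using the pip command below. ffmpeg is a separate command-line tool rather than a Python library, so it needs to be installed through the system package manager (for example apt, brew, or choco) instead of pip.

pip install openai-whisper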

Then I ran the command below to create a .wav audio file using ffmpeg.

ffmpeg -i "what is AI.mp4" -ac 2 -ar 16000 -vn "what_is_AI.wav"
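The same conversion can also be scripted from Python, which helps when there are many videos. This is a minimal sketch, assuming ffmpeg is already installed and on the PATH. The helper names build_ffmpeg_command and extract_audio are made up for this illustration; the ffmpeg options are the same ones used in the command above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path=None):
    """Build the ffmpeg argument list used above (hypothetical helper)."""
    video = Path(video_path)
    if wav_path is None:
        # Default output name: same folder, spaces replaced by underscores
        wav_path = video.with_name(video.stem.replace(" ", "_") + ".wav")
    return [
        "ffmpeg",
        "-i", str(video),   # input video
        "-ac", "2",         # two audio channels
        "-ar", "16000",     # 16 kHz sample rate
        "-vn",              # drop the video stream
        str(wav_path),
    ]


def extract_audio(video_path, wav_path=None):
    """Run ffmpeg and return the path of the .wav file it wrote."""
    command = build_ffmpeg_command(video_path, wav_path)
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(command, check=True)
    return command[-1]
```

Calling extract_audio("what is AI.mp4") would run the same command as above and return "what_is_AI.wav". Passing the arguments as a list avoids quoting problems with file names that contain spaces.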

Then I wrote the Python code below to create the .srt file.

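import whisper

# Load the "base" model and transcribe the audio extracted by ffmpeg
model = whisper.load_model("base")
result = model.transcribe("what_is_AI.wav")

# Save the transcript to an SRT file, one numbered block per segment
with open("subtitles.srt", "w", encoding="utf-8") as srt_file:
    for i, segment in enumerate(result["segments"]):
        start = segment["start"]
        end = segment["end"]
        text = segment["text"].strip()  # Whisper segments often start with a space

        srt_file.write(f"{i+1}\n")
        # Timestamps are written as plain seconds (e.g. "0.00 --> 11.84"),
        # which the conversion script later in this post expects. Standard
        # SRT players expect HH:MM:SS,mmm instead.
        srt_file.write(f"{start:.2f} --> {end:.2f}\n")
        srt_file.write(f"{text}\n\n")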

I saved this code as transcribe.py, and running "python3 transcribe.py" created the .srt file.
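One caveat: the script above writes timestamps as plain seconds, while the SRT format normally uses HH:MM:SS,mmm. The sketch below shows one way to produce standard timestamps. It assumes the segments are dictionaries with "start", "end", and "text" keys, as in Whisper's result["segments"]; srt_timestamp and segments_to_srt are hypothetical helper names. The conversion script later in this post matches the plain-seconds format, so it would need adjusting if this output were used instead.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def segments_to_srt(segments):
    """Build SRT text from a list of segment dictionaries."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Blocks are separated by a blank line
    return "\n".join(blocks)
```

With a real transcription, the result of segments_to_srt(result["segments"]) could be written to subtitles.srt with a normal file write.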

I was surprised by the quality of the .srt file. It is even better than the subtitles generated by YouTube. It is really amazing to be able to create good subtitle files without using the internet.

ebook - Unlocking AI: A Simple Guide for Beginners 

This .srt file won't be accepted on some platforms. For example, Udemy accepts only the .vtt file format.

In that case, we can use the Python code below to convert the .srt file into a .vtt file.

import re

def format_timestamp(time_str):
    """
    Convert a time in seconds, such as "11.84", to the VTT format "00:00:11.840".
    """
    # Work in whole milliseconds so the fractional part is kept
    total_ms = round(float(time_str) * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, milliseconds = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02}.{milliseconds:03}"

# Define file paths
srt_file_path = "subtitles.srt"
vtt_file_path = "subtitles.vtt"

# Read the SRT file
with open(srt_file_path, "r", encoding="utf-8") as srt_file:
    srt_content = srt_file.readlines()

# Convert SRT format to VTT format; a VTT file starts with a WEBVTT header
vtt_content = ["WEBVTT\n\n"]

for line in srt_content:
    line = line.strip()
    # Match timestamps in the format "0.00 --> 11.84"
    match = re.match(r"(\d+\.\d+)\s+-->\s+(\d+\.\d+)", line)
    if match:
        start_time = format_timestamp(match.group(1))
        end_time = format_timestamp(match.group(2))
        vtt_content.append(f"{start_time} --> {end_time}\n")
    else:
        # Cue numbers, subtitle text, and blank lines are copied unchanged
        vtt_content.append(line + "\n")

# Write the VTT file
with open(vtt_file_path, "w", encoding="utf-8") as vtt_file:
    vtt_file.writelines(vtt_content)

# Report where the converted file was saved
print(vtt_file_path)
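If only the .vtt file is needed, the intermediate .srt file can be skipped. The sketch below builds VTT text directly from transcription segments. It assumes each segment is a dictionary with "start", "end", and "text" keys, as in Whisper's result["segments"]; vtt_timestamp and segments_to_vtt are hypothetical helper names.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02}.{millis:03}"


def segments_to_vtt(segments):
    """Build WebVTT text from a list of segment dictionaries."""
    lines = ["WEBVTT", ""]  # header followed by a blank line
    for segment in segments:
        start = vtt_timestamp(segment["start"])
        end = vtt_timestamp(segment["end"])
        lines.append(f"{start} --> {end}")
        lines.append(segment["text"].strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)
```

The returned string could then be written to subtitles.vtt with encoding="utf-8".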





Though the quality of the Whisper output is good, I noticed a few mistakes in the transcription. I think my pronunciation of those words is not correct. So I thought of fine-tuning the model with my voice. After some research, I found that OpenAI does not offer official support for fine-tuning the Whisper model.

So I asked ChatGPT how to handle this, and I got the reply below.

Since fine-tuning Whisper directly isn't officially supported by OpenAI, the best way to adapt it to your voice is by using alternative methods like custom post-processing, speaker adaptation, and dataset-based training. Here's how you can proceed:


1. Improve Whisper’s Accuracy for Your Voice

Even though Whisper can't be fine-tuned directly, you can adapt it using the following methods:

A. Use a Custom Vocabulary & Prompting

Whisper supports custom prompts to bias its transcription. You can pass commonly misrecognized words as a prompt:

import whisper

model = whisper.load_model("medium") # Use a suitable model size

prompt = "Rajamanickam, AI, dataset, machine learning, transcription, Tamil"
result = model.transcribe("your_audio.mp3", initial_prompt=prompt)

print(result["text"])

This helps Whisper recognize your name, technical terms, or unique words you use often.


B. Use Custom Word Replacement (Post-processing)

If Whisper frequently misinterprets specific words, use text correction with Python:


def correct_transcription(text):
    corrections = {
        "Whisper AI": "Whisper",
        "Rajmanicam": "Rajamanickam",
        "Langchaine": "LangChain",
        # Add more misrecognized words
    }
    for wrong, correct in corrections.items():
        text = text.replace(wrong, correct)
    return text

# Example Usage
transcribed_text = "Whisper AI is great for Langchaine."
corrected_text = correct_transcription(transcribed_text)
print(corrected_text)  # Output: "Whisper is great for LangChain."

This method helps fix errors specific to your voice.
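Plain str.replace is case-sensitive and also matches inside longer words. A possible variation, shown here only as a sketch, uses regular expressions with word boundaries and case-insensitive matching. The function name correct_transcription_regex and the corrections passed to it are illustrative assumptions.

```python
import re


def correct_transcription_regex(text, corrections):
    """Replace whole-word matches of each wrong spelling, ignoring case."""
    for wrong, right in corrections.items():
        # \b keeps the match from firing inside a longer word
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", flags=re.IGNORECASE)
        # A function replacement inserts the text literally, with no backslash handling
        text = pattern.sub(lambda match, right=right: right, text)
    return text


corrections = {
    "Whisper AI": "Whisper",
    "Langchaine": "LangChain",
}
print(correct_transcription_regex("whisper ai is great for langchaine.", corrections))
```

The trade-off is that case-insensitive matching can also rewrite words that were intentionally written in a different case, so the corrections list should stay specific.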


2. Train a Custom ASR Model with Your Voice

If you need real fine-tuning, you can train a smaller ASR (Automatic Speech Recognition) model:

A. Collect Audio & Transcripts

  • Record at least 5-10 hours of your voice.
  • Create text transcripts for each recording.

B. Train a Model Using Open-Source ASR Frameworks

If Whisper fine-tuning isn't an option, you can train an alternative ASR model:

  • ESPnet – Open-source ASR framework supporting speaker adaptation.
  • Kaldi – Traditional ASR system that can adapt to your voice.
  • NVIDIA NeMo – Train ASR models with custom datasets.

Example: a training run with NVIDIA NeMo might be launched like this (train.py and its flags are placeholders, not an actual NeMo script):

python train.py --data my_voice_data/ --epochs 20

This method requires GPU power, but it allows full customization.


3. Use Speaker Adaptation (Voice ID)

Another workaround is to train a speaker recognition model alongside Whisper:

  • Use a model like Wav2Vec 2.0 or DeepSpeaker to recognize your voice.
  • Apply speaker-specific corrections.

This method helps Whisper adapt to your unique pronunciation.


Final Thoughts

  • If Whisper misrecognizes words, try custom prompts and text correction.
  • If Whisper struggles with your accent, train a smaller ASR model on your voice.
  • If you need true fine-tuning, consider using ESPnet or NeMo.


Getting Started with AI: A Step-by-Step Guide



Artificial Intelligence (AI) can seem intimidating at first, but with the right approach, anyone can start learning and experimenting with AI. Whether you’re a complete beginner or someone with a bit of technical background, this step-by-step guide will help you get started on your AI journey.


1. What are the first steps to learning AI?

Starting with AI doesn’t have to be overwhelming. Here’s how to begin:

  • Understand the Basics: Learn what AI is, its applications, and its key concepts (e.g., Machine Learning, Deep Learning).

  • Set Clear Goals: Decide why you want to learn AI—whether it’s for a career, a project, or personal interest.

  • Learn the Prerequisites: Build a foundation in mathematics (linear algebra, calculus, probability) and programming (Python is highly recommended).


2. What resources are best for beginners?

There are plenty of beginner-friendly resources to help you learn AI:

  • Online Courses: Platforms like Coursera, edX, and Udacity offer introductory AI courses. Some popular ones include:

    • AI For Everyone by Andrew Ng (Coursera).

    • Introduction to Artificial Intelligence by IBM (edX).

  • Books: Beginner-friendly books like "Artificial Intelligence: A Guide to Intelligent Systems" by Michael Negnevitsky or "AI Crash Course" by Hadelin de Ponteves.

  • YouTube Channels: Channels like 3Blue1Brown (for math) and Sentdex (for Python and AI) are great for visual learners.

  • Interactive Platforms: Websites like Kaggle and DataCamp offer hands-on tutorials and projects.


3. What programming language should I learn for AI?

Python is the most popular language for AI and Machine Learning because of its simplicity and extensive libraries. Here’s how to get started:

  • Learn Python basics (variables, loops, functions).

  • Explore Python libraries for AI, such as:

    • NumPy and Pandas for data manipulation.

    • Matplotlib and Seaborn for data visualization.

    • Scikit-learn for Machine Learning.

    • TensorFlow and PyTorch for Deep Learning.


4. How can I practice AI skills?

Practice is key to mastering AI. Here are some ways to get hands-on experience:

  • Work on Projects: Start with simple projects like predicting house prices or classifying images.

  • Participate in Competitions: Platforms like Kaggle host AI competitions where you can solve real-world problems.

  • Build a Portfolio: Showcase your projects on GitHub or a personal website to demonstrate your skills to potential employers.


5. What are some beginner-friendly AI projects?

Here are a few beginner-friendly AI projects to get you started:

  • Predictive Analytics: Predict stock prices or weather using regression models.

  • Image Classification: Build a model to classify images of cats and dogs.

  • Chatbot: Create a simple chatbot using Natural Language Processing (NLP).

  • Recommendation System: Build a movie or product recommendation system.

These projects will help you apply what you’ve learned and build confidence in your skills.
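As a taste of the first project idea, here is a minimal sketch of a regression model in plain Python, with no libraries. It fits a straight line by ordinary least squares. The house sizes and prices are made-up numbers for illustration, and fit_line and predict are hypothetical helper names; a real project would more likely use a library such as Scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up training data: house size (square meters) and price (thousands)
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
print(predict(90, slope, intercept))
```

Because the made-up prices are exactly three times the sizes, the fitted slope comes out close to 3 and the intercept close to 0.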


6. Do I need a powerful computer to learn AI?

Not necessarily. While advanced AI models (e.g., deep learning) require significant computational power, you can start with:

  • Cloud Platforms: Use cloud services like Google Colab, AWS, or Microsoft Azure to run AI models without needing a high-end computer.

  • Pre-trained Models: Use pre-trained models available in libraries like TensorFlow Hub or Hugging Face to avoid training models from scratch.

As you progress, you can invest in better hardware or continue using cloud resources.


7. How can I stay updated on AI trends?

AI is a fast-evolving field, so staying updated is crucial. Here’s how:

  • Follow AI News: Websites like Towards Data Science, AI News, and MIT Technology Review provide the latest updates.

  • Join Communities: Participate in AI forums like Reddit’s r/MachineLearning or LinkedIn groups.

  • Attend Events: Join webinars, conferences, and meetups to network with professionals and learn about new developments.


8. Should I specialize in a specific area of AI?

As you progress, you may want to specialize in areas like:

  • Machine Learning: Focus on algorithms and models for prediction and classification.

  • Deep Learning: Dive into neural networks for tasks like image and speech recognition.

  • Natural Language Processing (NLP): Work on language-related applications like chatbots and translation.

  • Computer Vision: Explore AI for image and video analysis.

Specializing can help you build expertise and stand out in the job market.


9. How long does it take to learn AI?

The time it takes to learn AI depends on your background and goals:

  • Beginners: With consistent effort, you can learn the basics in 3-6 months.

  • Intermediate Learners: Building practical skills and completing projects may take 6-12 months.

  • Advanced Learners: Mastering advanced topics and specializing can take 1-2 years or more.

Remember, learning AI is a continuous process, and staying curious is key.


10. What are the career opportunities in AI?

AI offers a wide range of career opportunities, including:

  • AI Engineer: Develop and deploy AI models.

  • Data Scientist: Analyze data and build predictive models.

  • Machine Learning Engineer: Focus on designing and optimizing ML algorithms.

  • Research Scientist: Work on cutting-edge AI research.

  • AI Consultant: Help businesses implement AI solutions.

With the right skills and experience, you can find opportunities in almost every industry.


Getting started with AI may seem challenging, but with the right resources, practice, and mindset, anyone can learn and excel in this exciting field. Start with the basics, work on projects, and stay curious. Remember, AI is not just about technology—it’s about solving problems and creating value. So, take the first step today and unlock the endless possibilities of AI!



Common Myths and Misconceptions About AI



Artificial Intelligence (AI) is one of the most exciting and rapidly evolving fields today. However, it is also surrounded by myths and misconceptions that can lead to confusion or unrealistic expectations. Let’s debunk some of the most common myths about AI.


1. Is AI the same as human intelligence?

No, AI is not the same as human intelligence. While AI can mimic certain aspects of human cognition, such as learning and problem-solving, it lacks consciousness, emotions, and true understanding. AI systems are designed to perform specific tasks based on data and algorithms, but they do not possess general intelligence or self-awareness.


2. Can AI work without data?

No, AI cannot work without data. Data is the foundation of AI, as it is used to train models, identify patterns, and make predictions. Without high-quality, relevant data, AI systems cannot function effectively. This is why Data Science, which focuses on collecting, cleaning, and analyzing data, is so critical to AI development.


3. Is AI only for large tech companies?

No, AI is not limited to large tech companies. While companies like Google, Microsoft, and Amazon have been pioneers in AI, the technology is becoming increasingly accessible to smaller businesses, startups, and even individuals. Open-source tools, cloud-based AI services, and affordable hardware have democratized AI, enabling anyone with the right skills to build AI solutions.


4. Will AI eventually take over the world?

No, AI will not take over the world. This is a common misconception fueled by science fiction. AI is a tool created and controlled by humans, and its capabilities are limited to the tasks it is designed for. While AI can automate certain processes and make decisions based on data, it does not have intentions, desires, or the ability to act independently.


5. Can AI solve all problems?

No, AI cannot solve all problems. While AI is incredibly powerful, it is not a magic solution. AI is best suited for tasks that involve pattern recognition, data analysis, and automation. It struggles with tasks that require creativity, empathy, or deep contextual understanding. Additionally, AI systems are only as good as the data they are trained on, and poor-quality data can lead to inaccurate or biased results.


6. Is AI unbiased and fair?

No, AI is not inherently unbiased or fair. AI systems learn from data, and if the data contains biases, the AI model will likely replicate or even amplify those biases. For example, biased hiring algorithms can discriminate against certain groups if the training data reflects historical inequalities. Ensuring fairness in AI requires careful data selection, preprocessing, and ongoing monitoring.


7. Does AI always outperform humans?

No, AI does not always outperform humans. While AI excels at tasks like data processing, pattern recognition, and repetitive tasks, it often falls short in areas that require creativity, intuition, and emotional intelligence. For example, AI can generate art or music, but it lacks the depth of meaning and emotional resonance that human creators bring to their work.


8. Is AI too complex for beginners to learn?

No, AI is not too complex for beginners to learn. While AI involves advanced concepts like machine learning, neural networks, and algorithms, there are many beginner-friendly resources, tools, and frameworks available. Platforms like TensorFlow, Scikit-learn, and Fast.ai offer tutorials and documentation to help newcomers get started. With dedication and practice, anyone can learn the basics of AI.


9. Will AI replace all human jobs?

No, AI will not replace all human jobs. While AI can automate certain tasks, it is more likely to augment human capabilities rather than replace them entirely. Many jobs require skills like creativity, critical thinking, and emotional intelligence, which AI cannot replicate. Instead of eliminating jobs, AI is expected to create new roles and opportunities in fields like AI development, data science, and AI ethics.


10. Is AI only about robots and automation?

No, AI is not just about robots and automation. While robotics is one application of AI, the field is much broader. AI includes technologies like natural language processing (e.g., chatbots), computer vision (e.g., facial recognition), recommendation systems (e.g., Netflix recommendations), and more. AI is also used in healthcare, finance, education, and many other industries.


11. Can AI think for itself?

No, AI cannot think for itself. AI systems operate based on predefined algorithms and data. They do not have consciousness, self-awareness, or the ability to make independent decisions. Even advanced AI models like GPT or deep learning systems are simply following patterns and rules learned from data, not "thinking" in the way humans do.


12. Is AI only for tech-savvy people?

No, AI is not only for tech-savvy people. While a technical background can be helpful, many AI tools and platforms are designed to be user-friendly. For example, no-code AI platforms allow non-technical users to build AI models using drag-and-drop interfaces. Additionally, understanding the basics of AI does not require advanced programming skills—just curiosity and a willingness to learn.


13. Is AI always expensive to develop?

No, AI development is not always expensive. While building advanced AI systems can require significant resources, there are many affordable and even free tools available for beginners. Open-source libraries like TensorFlow and PyTorch, along with cloud-based AI services, have made it easier and more cost-effective to experiment with AI.


14. Is AI a recent invention?

No, AI is not a recent invention. The concept of AI has been around since the 1950s, and early AI research laid the groundwork for many of the technologies we use today. What has changed in recent years is the availability of massive amounts of data, powerful computing resources, and advanced algorithms, which have accelerated AI development and adoption.


15. Is AI only about machine learning?

No, AI is not only about machine learning (ML). While ML is a major subset of AI, the field also includes other techniques like rule-based systems, expert systems, and evolutionary algorithms. AI encompasses a wide range of technologies and approaches aimed at creating intelligent systems.

