Text Recognition Model

News

Microsoft unveils AI model that understands image content, solves visual puzzles

On Monday, researchers from Microsoft introduced Kosmos-1, a multimodal model that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ ...

Hackaday, 1 year ago

Text-to-Speech Model Can Do Music, Background Noises, And Sound Effects

Bark is a universal text-to-audio model that can not only create realistic ... this plain C/C++ implementation of AI-powered speech recognition.

Ars Technica, 1 year ago

Meta’s “massively multilingual” AI model translates up to 100 languages, speech or text

Among the features of SeamlessM4T touted on Meta's promotional blog, the company says that the model can perform speech recognition (you give it audio of speech, and it converts it to text ...

Geeky Gadgets, 3 months ago

olmOCR Open Source OCR System for AI Training Using PDFs & Documents

Have you ever found yourself wrestling with a dense PDF or a handwritten note, wishing there was an easier way to extract the information you need? Whether you’re a researcher trying to digitize ...

New Atlas, 2 years ago

Beyond text: AI model digests 80 hours of video to learn sign language

For deaf and hard-of-hearing people, voice recognition ... the raw text, they converted it all to lowercase which reduced the vocabulary complexity. Overall, they found that their model was ...

VentureBeat, 10 months ago

aiOla drops ultra-fast ‘multi-head’ speech recognition model, beats OpenAI Whisper

It converted user audio into text, allowing an ... leading to faster recognition and transcription without any loss of accuracy. “We chose to train our model to predict 10 tokens on each ...

VentureBeat, 2 months ago

Gladia launches Solaria as AI-based multi-lingual speech recognition model for speech-to-text transcription

It handles speech to text translation and transcription ... and real-time multilingual support to succeed. The model achieves industry-leading results in speech recognition, delivering both ...

TechCrunch, 2 years ago

OpenAI debuts Whisper API for speech-to-text transcription and translation

a hosted version of the open source Whisper speech-to-text model that the company released in September. Priced at $0.006 per minute, Whisper is an automatic speech recognition system that OpenAI ...