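Advanced Speech Recognition

Speech Recognition API

High-accuracy Japanese speech-to-text with industry-leading performance. The API converts speech to text with accuracy and speed optimized for Japanese audio.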


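Industry-Leading Accuracy

Performance benchmarked on real-world Japanese speech

Metric	Result	Conditions
Overall accuracy	98.5%	Clean audio environment
Noisy environments	95.2%	With background noise handling
Response time	<0.5 s	Per minute of audio
Mixed language	97.8%	Japanese-English code-switching

Accuracy by Domain

Domain	Accuracy
Customer service calls	96.5%
Business meetings	97.2%
Medical consultations	95.8%
Legal proceedings	98.1%
Technical discussions	96.9%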

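Built for Japanese Speech

Features designed specifically for Japanese speech recognition

Multi-Dialect Support

Accurately recognizes standard Japanese, Kansai dialect, Tohoku dialect, and other regional dialects.

Real-Time Streaming

Processes audio streams in real time, providing live transcription with immediate results.

Speaker Diarization

Automatically identifies and separates multiple speakers in a conversation.

Ultra-Fast

Processes hours of audio in minutes with an optimized inference pipeline.

Enterprise Security

SOC 2 compliant, with end-to-end encryption and secure audio processing.

Custom Vocabulary

Add industry-specific terms, brand names, and custom phrases to improve accuracy.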

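Trusted Use Cases

See how companies use the ASR API

Call Center Transcription

Automatically transcribe customer service calls for quality assurance, compliance, and insights.

Quality monitoring
Compliance records
Agent training
Customer sentiment analysis

Meeting Notes

Convert meetings, interviews, and discussions into searchable, actionable text documents.

Business meetings
Interview records
Conference recordings
Team stand-ups

Subtitles and Captions

Generate accurate subtitles for video, live streams, and broadcasts in real time or in batch mode.

Video subtitles
Live event captions
Broadcast transcription
Accessibility compliance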


Hotwords and Custom Vocabulary

Including hotwords in the text prompt improves transcription accuracy for specialized terminology. Hotwords help the model recognize such terms correctly.
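As a minimal sketch of how hotwords might be attached to a request, the helper below assumes they travel in the `hotwords` array listed under the request body parameters. The function name `build_payload`, the placeholder audio bytes, and the sample terms are illustrative and not part of the API.

```python
import base64
import json
from typing import Optional, Sequence


def build_payload(audio_bytes: bytes, hotwords: Optional[Sequence[str]] = None) -> dict:
    """Build a request body; the hotwords key is omitted when no terms remain."""
    payload = {"audio": base64.b64encode(audio_bytes).decode("ascii")}
    # Trim whitespace and drop empty entries, e.g. from a comma-separated form field.
    cleaned = [w.strip() for w in (hotwords or []) if w.strip()]
    if cleaned:
        payload["hotwords"] = cleaned
    return payload


# Placeholder audio bytes and hypothetical domain terms, for illustration only.
body = build_payload(b"\x00\x01", "Shisa AI, diarization, ".split(","))
print(json.dumps(body))
```

Leaving the key out entirely when no hotwords are given keeps the body identical to the minimal request, which only needs the `audio` field.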


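Quick Start Guide

Get started with the Speech Recognition API in three simple steps

1. Get an API key

Sign up for a Shisa AI account and get your API key from the developer dashboard. Include it in the Authorization header with the 'shsk:' prefix: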

Authorization: Bearer shsk:YOUR_API_KEY
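The header above could be assembled in Python roughly as follows. This is a sketch under assumptions: the helper `auth_headers` and the environment variable name `SHISA_API_KEY` are hypothetical, and the helper only adds the 'shsk:' prefix when the key does not already carry it.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return request headers, adding the 'shsk:' prefix if it is missing."""
    token = api_key if api_key.startswith("shsk:") else f"shsk:{api_key}"
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# SHISA_API_KEY is a hypothetical variable name; fall back to the placeholder.
headers = auth_headers(os.environ.get("SHISA_API_KEY", "YOUR_API_KEY"))
print(sorted(headers))  # print header names only, never the key itself
```

Reading the key from the environment keeps it out of source control.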

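2. Prepare your audio

The API accepts base64-encoded audio in a variety of formats. The supported audio formats are:

OGG (Opus, Vorbis)
WAV (PCM, 16-bit)
MP3, WebM, M4A, FLAC

3. Send your first request

Send a POST request containing the audio data and settings to the API endpoint. A basic example using cURL: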

curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "audio": "'$(base64 -w0 audio.ogg)'"
  }'

Minimal request

Only the audio field is required. Language is auto-detected and tuning parameters use sensible defaults.

Expected Response

The API returns a JSON response with the transcribed text, detected language, and confidence score.

{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}

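API Endpoint

The Speech Recognition API uses a chat-style interface for maximum flexibility and context awareness

Speech Recognition Endpoint

POST https://api.shisa.ai/asr/srt/audio_llm

This multimodal endpoint accepts both text instructions and audio content, so you can supply context and custom vocabulary (hotwords) to improve accuracy.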

Request Parameters

Configure your transcription request with these parameters

Request Body Parameters

Parameter	Type	Required	Description
audio	string	Required	Base64-encoded audio data (WAV, OGG, MP3, or FLAC)
language	string	Optional	Language code (e.g. `"ja"`, `"en"`). Omit for automatic language detection (LID).
hotwords	string[]	Optional	Array of words/phrases to boost recognition accuracy for domain-specific terms
temperature	float	Optional	Sampling temperature (0.0 to 2.0). Lower values make output more deterministic. Default: `0.0`
top_p	float	Optional	Nucleus sampling parameter (0.0 to 1.0). Controls output diversity. Default: `0.85`
frequency_penalty	float	Optional	Penalty on frequently occurring tokens (-2.0 to 2.0). Reduces repetition. Default: `0.5`
repetition_penalty	float	Optional	Penalty on repeated tokens (1.0 to 2.0). Values above 1.0 discourage repetition. Default: `1.05`
vad	integer	Optional	Voice activity detection mode. Default: `1`
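Checking tuning values on the client before sending can surface mistakes early. The sketch below copies the ranges from the table above. Whether the server rejects out-of-range values is not stated, and the names `TUNING_RANGES` and `validate_tuning` are illustrative.

```python
# Ranges copied from the parameter table above.
TUNING_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "repetition_penalty": (1.0, 2.0),
}


def validate_tuning(params: dict) -> dict:
    """Return a copy of params, raising ValueError on unknown or out-of-range values."""
    for name, value in params.items():
        if name not in TUNING_RANGES:
            raise ValueError(f"unknown tuning parameter: {name}")
        low, high = TUNING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low} to {high}")
    return dict(params)


# Merge validated tuning values into a request body (audio is a placeholder).
payload = {
    "audio": "<base64-encoded audio>",
    **validate_tuning({"temperature": 0.2, "top_p": 0.9}),
}
print(sorted(payload))
```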

Audio Input Format

Audio must be provided as a raw base64-encoded string (not a data URL), as in:

"audio": "SGVsbG8gV29ybGQ..."

Pass raw base64-encoded audio data in the audio field. The server auto-detects the format from the binary header.
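Browser recording APIs often hand back a data URL rather than raw base64. A small normalizer along these lines could strip the prefix and catch malformed input before it reaches the API. The function name `to_raw_base64` is hypothetical, and the check only confirms that the string is valid base64, not that it decodes to supported audio.

```python
import base64
import binascii


def to_raw_base64(value: str) -> str:
    """Return raw base64, stripping a data-URL prefix if one is present."""
    if value.startswith("data:"):
        # A data URL looks like "data:audio/ogg;base64,<payload>".
        header, sep, payload = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("not a base64 data URL")
        value = payload
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 audio data") from exc
    return value


print(to_raw_base64("data:audio/ogg;base64,T2dnUw=="))
```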

Supported audio formats:

Format	MIME Type	Detection
WAV	audio/wav	RIFF header
OGG	audio/ogg	OggS header
MP3	audio/mpeg	ID3 tag or MPEG sync bytes
FLAC	audio/flac	fLaC header
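The header checks in the table can be mirrored on the client to reject unsupported files before uploading. This is only a sketch of the same idea. The server's actual detection logic is not shown in this document, and `detect_audio_format` is an illustrative name.

```python
from typing import Optional


def detect_audio_format(data: bytes) -> Optional[str]:
    """Guess the MIME type from the leading bytes, following the table above."""
    if data[:4] == b"RIFF":
        return "audio/wav"
    if data[:4] == b"OggS":
        return "audio/ogg"
    if data[:4] == b"fLaC":
        return "audio/flac"
    if data[:3] == b"ID3":
        return "audio/mpeg"
    # MPEG frame sync: 0xFF followed by a byte whose top three bits are set.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    return None


print(detect_audio_format(b"OggS\x00\x02"))
```

Returning `None` for unknown input lets the caller decide whether to fail fast or let the server respond with its own "Unsupported audio format" error.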

Encoding Audio to Base64

Use the following commands to convert an audio file to base64:

# Encode any supported format to base64
base64 -w0 audio.ogg    # Linux
base64 -i audio.ogg     # macOS

# Use in a curl request
curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{ "audio": "'$(base64 -w0 audio.ogg)'" }'

Supported Languages (LID)

The API supports automatic language identification (LID) for the following languages. The detected language is returned in the language field of the response.

Primary Languages

Code	Language
ja	Japanese
en	English
zh	Chinese

Response Format

Understanding the API response structure

Success Response

{
  "text": "こんにちは、シサAIです。",
  "language": "ja",
  "confidence": 0.98
}

Response fields:

text: The transcribed text from the audio
language: The detected or specified language code
confidence: Transcription confidence score (0 to 1)
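One way to consume these fields is to parse the response into a small typed record and flag low-confidence results for manual review. The `Transcript` class, the `needs_review` helper, and the 0.9 threshold are assumptions made for illustration. The API itself does not define a review threshold.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    language: str
    confidence: float


def parse_transcript(raw: str) -> Transcript:
    """Parse a success response body into a Transcript."""
    data = json.loads(raw)
    return Transcript(
        text=data["text"],
        language=data["language"],
        confidence=float(data["confidence"]),
    )


def needs_review(t: Transcript, threshold: float = 0.9) -> bool:
    """Flag transcripts whose confidence falls below an arbitrary threshold."""
    return t.confidence < threshold


# Sample body shaped like the success response above.
sample = '{"text": "hello", "language": "en", "confidence": 0.98}'
result = parse_transcript(sample)
print(result.language, needs_review(result))
```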

Error Handling

Common errors and how to resolve them

Error Response Format

{
  "code": 400,
  "error": "No audio data provided"
}

401 Authentication Error

Returned when the API key is missing, invalid, or expired. Check that your Authorization header includes a valid token.

{
  "context": ["authMiddleware"],
  "code": 104,
  "name": "ErrAuthenticationFailed",
  "error": "Authentication error: Invalid token"
}

Error Codes

Code	Cause	Error Message
400	Missing audio field	No audio data provided
400	Audio decodes to empty	No audio data provided
400	Not base64 encoded	Invalid base64 audio data
400	Base64 decode fails	Invalid base64 audio data
400	Unsupported audio format	Unsupported audio format
500	Services not ready	Transcription service not available
500	Backend failure	Transcription failed: ...
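A client might turn these responses into exceptions along the following lines. The sketch keys off the HTTP status rather than the body's `code` field, since the 401 example above carries `"code": 104`. Treating 5xx responses as retryable is an assumption rather than documented behavior, and the names `TranscriptionError` and `error_from_response` are illustrative.

```python
import json


class TranscriptionError(Exception):
    """Carries the HTTP status, the server message, and a retry hint."""

    def __init__(self, status: int, message: str, retryable: bool):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message
        self.retryable = retryable


def error_from_response(status: int, body: str) -> TranscriptionError:
    """Build an exception from an error response's status and raw body."""
    try:
        message = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object (e.g. a proxy's HTML page); keep it verbatim.
        message = body
    # Assumption: 5xx means a service-side problem worth retrying; 4xx does not.
    return TranscriptionError(status, message, retryable=status >= 500)


err = error_from_response(400, '{"code": 400, "error": "No audio data provided"}')
print(err, err.retryable)
```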

Code Examples

Integration examples in popular programming languages

cURL - Quick Start

A basic example that transcribes an audio file using cURL

curl -s -XPOST 'https://api.shisa.ai/asr/srt/audio_llm' \
  -H 'Authorization: Bearer shsk:YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "audio": "'$(base64 -w0 audio.ogg)'"
  }'

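Python - Complete Example

A complete Python function with base64 encoding and hotword support

import base64
import requests

API_URL = "https://api.shisa.ai/asr/srt/audio_llm"


def transcribe(path, api_key, hotwords=None):
    """Transcribe an audio file, optionally boosting the given hotwords."""
    # Read and base64-encode the audio file
    with open(path, "rb") as f:
        audio_data = base64.b64encode(f.read()).decode("utf-8")

    headers = {
        "Authorization": f"Bearer shsk:{api_key}",
        "Content-Type": "application/json",
    }

    payload = {"audio": audio_data}
    if hotwords:
        # Optional array of domain-specific terms (see Request Body Parameters)
        payload["hotwords"] = list(hotwords)

    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    response.raise_for_status()  # Raise on 4xx/5xx responses
    return response.json()


print(transcribe("audio.ogg", "YOUR_API_KEY"))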

JavaScript - Browser Integration

A client-side JavaScript example that reads the file with the Blob.arrayBuffer() method

async function transcribeAudio(audioFile) {
  // Read file and convert to base64
  const fileBuffer = await audioFile.arrayBuffer();
  const base64Audio = btoa(
    new Uint8Array(fileBuffer).reduce(
      (data, byte) => data + String.fromCharCode(byte),
      ''
    )
  );

  const response = await fetch('https://api.shisa.ai/asr/srt/audio_llm', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer shsk:YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      audio: base64Audio
    })
  });

  if (!response.ok) {
    throw new Error(`API request failed: ${response.status}`);
  }

  return await response.json();
}

// Example usage with file input
document.querySelector('#audioInput').addEventListener('change', async (e) => {
  const file = e.target.files[0];
  if (file) {
    const result = await transcribeAudio(file);
    console.log('Transcription:', result);
  }
});

Convert Speech to Text with Precision

Start with 180 minutes (3 hours) of free transcription per month, and scale as you grow.
