# OCR API

The OCR API extracts structured data from Mexican identity documents and utility bills using a specialized microservice (Bun + Tesseract) optimized for Mexican document formats.

## Base URL

```
https://api.zaits.net/v1/ocr
```

## Authentication

All requests must include your API key as a Bearer token in the `Authorization` header:

```http
Authorization: Bearer YOUR_API_KEY
```
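
As a minimal sketch, the header can be attached with Python's standard library alone. The `build_request` helper and the `ZAITS_API_KEY` environment variable name are assumptions made for illustration; neither is defined by the API.

```python
import os
import urllib.request

BASE_URL = "https://api.zaits.net/v1/ocr"  # from the Base URL section


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Return an unsent POST request that carries the Bearer token."""
    request = urllib.request.Request(BASE_URL + path, method="POST")
    request.add_header("Authorization", f"Bearer {api_key}")
    return request


# ZAITS_API_KEY is a hypothetical variable name; keep the key wherever suits you.
api_key = os.environ.get("ZAITS_API_KEY", "YOUR_API_KEY")
request = build_request("/extract/id", api_key)
print(request.full_url)  # https://api.zaits.net/v1/ocr/extract/id
```

Reading the key from the environment keeps it out of source control; the request object still needs a body before it is sent.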

***

## Supported Document Types

| Category          | `document_type` | Description                                    |
| ----------------- | --------------- | ---------------------------------------------- |
| **Identity**      | `ine`           | INE / Credencial para Votar (frente + reverso) |
| **Identity**      | `passport`      | Pasaporte mexicano                             |
| **Address proof** | `cfe`           | Recibo de luz (CFE)                            |
| **Address proof** | `telmex`        | Recibo de teléfono (TELMEX)                    |
| **Address proof** | `izzi`          | Recibo de cable/internet (IZZI)                |

> **Note:** General text extraction (`/extract`) and receipt OCR (`/extract/receipt`) are not supported. Requests to those endpoints return `400 not_supported`.
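
Because the two document categories are served by different endpoints, a client may want to pick the path from `document_type` before uploading. The sketch below assumes nothing beyond the table above and the endpoint paths documented on this page; the `endpoint_for` helper is a hypothetical name.

```python
# Identity documents go to /extract/id, address proofs to /extract/document.
ENDPOINT_BY_DOCUMENT_TYPE = {
    "ine": "/v1/ocr/extract/id",
    "passport": "/v1/ocr/extract/id",
    "cfe": "/v1/ocr/extract/document",
    "telmex": "/v1/ocr/extract/document",
    "izzi": "/v1/ocr/extract/document",
}


def endpoint_for(document_type: str) -> str:
    """Return the endpoint path for a document_type, or raise ValueError."""
    try:
        return ENDPOINT_BY_DOCUMENT_TYPE[document_type]
    except KeyError:
        supported = ", ".join(sorted(ENDPOINT_BY_DOCUMENT_TYPE))
        raise ValueError(
            f"unsupported document_type {document_type!r}; expected one of: {supported}"
        ) from None


print(endpoint_for("telmex"))  # /v1/ocr/extract/document
```

Rejecting unknown types locally avoids a round trip that would end in a `400` response.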

***

## ID Extraction

Extract structured fields from a Mexican identity document.

### Endpoint

```http
POST /v1/ocr/extract/id
```

### Parameters

| Parameter              | Type                  | Required | Description                                                                                                        |
| ---------------------- | --------------------- | -------- | ------------------------------------------------------------------------------------------------------------------ |
| `front`                | file or base64 string | Yes      | Front of the ID. Also accepted as field `image`.                                                                   |
| `back`                 | file or base64 string | No       | Back of the ID. Strongly recommended for INE — enables MRZ validation and authoritative name/DOB.                  |
| `document_type`        | string                | No       | `ine` (default) or `passport`                                                                                      |
| `include_authenticity` | boolean               | No       | If `true`, also runs `/verify/authenticity` in parallel and includes the result in the response as `authenticity`. |

Accepts **multipart/form-data** (binary file upload) or **application/json** (base64 strings).
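
For the JSON variant, the image bytes have to be base64-encoded before they are placed in the body. The sketch below shows one way to build that body; it assumes standard base64 without a `data:` URI prefix, which this page does not specify, and `build_json_body` is a hypothetical helper name.

```python
import base64
import json
from typing import Optional


def build_json_body(
    front: bytes,
    back: Optional[bytes] = None,
    document_type: str = "ine",
) -> str:
    """Serialize an /extract/id request body with base64-encoded images."""
    payload = {
        "document_type": document_type,
        # Assumption: plain standard base64, no data-URI prefix.
        "front": base64.b64encode(front).decode("ascii"),
    }
    if back is not None:  # back is optional, so it is only sent when supplied
        payload["back"] = base64.b64encode(back).decode("ascii")
    return json.dumps(payload)


body = build_json_body(b"front-image-bytes", b"back-image-bytes")
print(sorted(json.loads(body)))  # ['back', 'document_type', 'front']
```

Send the resulting string with `Content-Type: application/json`. Base64 adds roughly a third to the payload size, so multipart upload is the lighter option for large images.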

### Request Example — INE front + back

{% tabs %}
{% tab title="cURL" %}

```bash
curl -X POST https://api.zaits.net/v1/ocr/extract/id \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "front=@ine_frente.jpg" \
  -F "back=@ine_reverso.jpg" \
  -F "document_type=ine"
```

{% endtab %}

{% tab title="cURL (JSON)" %}

```bash
curl -X POST https://api.zaits.net/v1/ocr/extract/id \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "ine",
    "front": "<base64>",
    "back": "<base64>"
  }'
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
const formData = new FormData();
formData.append("front", frontFile);
formData.append("back", backFile);  // optional but recommended for INE
formData.append("document_type", "ine");

const response = await fetch("https://api.zaits.net/v1/ocr/extract/id", {
  method: "POST",
  headers: { Authorization: "Bearer YOUR_API_KEY" },
  body: formData,
});

const result = await response.json();
```

{% endtab %}

{% tab title="Python" %}

```python
import requests

# Open both sides in binary mode; the context manager closes them after the upload.
with open("ine_frente.jpg", "rb") as front, open("ine_reverso.jpg", "rb") as back:
    response = requests.post(
        "https://api.zaits.net/v1/ocr/extract/id",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"front": front, "back": back},  # back is optional but recommended for INE
        data={"document_type": "ine"},
    )

result = response.json()
```

{% endtab %}
{% endtabs %}

### Response — INE

```json
{
  "success": true,
  "type": "ine",
  "fields": {
    "curp": "DOES900115HVZRRN09",
    "claveElector": "DOSNRD90011512H100",
    "name": "EDUARDO BRAYAN",
    "firstSurname": "DORANTES",
    "secondSurname": "SANCHEZ",
    "surname": "DORANTES SANCHEZ",
    "dateOfBirth": "1990-01-15",
    "dateOfExpiry": "2031",
    "gender": "H",
    "address": "CALLE 5 #12 COL. CENTRO XALAPA VER",
    "detailAddress": {
      "streetAndNumber": "CALLE 5 #12",
      "colony": "COL. CENTRO",
      "zipCode": "91000",
      "city": "XALAPA",
      "state": "VER"
    },
    "section": "1234",
    "mrz": "IDMEXDOSNRD9001151<<<<<<<<<<<<<\n9001151M3102168MEX<<<<<<<<<<<0\nDORANTES<<SANCHEZ<<EDUARDO<BR<",
    "documentVersion": "G"
  },
  "validation": {
    "mrz": "ok",
    "mrzFields": {
      "documentNumberCheck": true,
      "dateOfBirthCheck": true,
      "dateOfExpiryCheck": true,
      "compositeCheck": true
    },
    "curp": "ok",
    "expired": false
  },
  "processing_time": 3.2
}
```
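
The response carries both `curp` and `dateOfBirth`, so a client can run its own consistency check on top of the server-side `validation` block. The sketch below relies on the standard CURP layout, in which characters 5 to 10 encode the birth date as `YYMMDD`. The `curp_matches_birth_date` helper is a hypothetical name, and the API does not require this check.

```python
from datetime import date


def curp_matches_birth_date(fields: dict) -> bool:
    """Return True when the CURP's embedded YYMMDD equals dateOfBirth."""
    curp = fields.get("curp", "")
    birth = fields.get("dateOfBirth", "")
    if len(curp) != 18 or not birth:
        return False
    try:
        born = date.fromisoformat(birth)  # expects YYYY-MM-DD as in the sample
    except ValueError:
        return False
    # Only the two-digit year is compared; the century is not checked here.
    return curp[4:10] == born.strftime("%y%m%d")


sample = {"curp": "DOES900115HVZRRN09", "dateOfBirth": "1990-01-15"}
print(curp_matches_birth_date(sample))  # True
```

A mismatch does not prove tampering, since OCR can misread either field, but it is a reasonable trigger for manual review.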

### Response — Passport

```json
{
  "success": true,
  "type": "passport",
  "fields": {
    "curp": "DOES900115HVZRRN09",
    "documentNumber": "G12345678",
    "name": "EDUARDO BRAYAN",
    "firstSurname": "DORANTES",
    "secondSurname": "SANCHEZ",
    "surname": "DORANTES SANCHEZ",
    "dateOfBirth": "1990-01-15",
    "dateOfExpiry": "2029-06-30",
    "gender": "M",
    "nationality": "MEX",
    "issuingCountry": "MEX",
    "mrz": "P<MEXDORANTES<<SANCHEZ<<EDUARDO<BRAYAN<<<\nG123456781MEX9001151M2906309<<<<<<<<3",
    "documentVersion": ""
  },
  "validation": {
    "mrz": "ok",
    "mrzFields": {
      "documentNumberCheck": true,
      "dateOfBirthCheck": true,
      "dateOfExpiryCheck": true,
      "compositeCheck": true
    },
    "curp": "ok",
    "expired": false
  },
  "processing_time": 2.8
}
```

### Response — with `include_authenticity=true`

The `authenticity` object is appended to the standard response:

```json
{
  "success": true,
  "type": "ine",
  "fields": { "...": "..." },
  "validation": { "...": "..." },
  "processing_time": 3.5,
  "authenticity": {
    "success": true,
    "validity_check": {
      "is_authentic": true,
      "confidence": 0.92,
      "tampering_detected": false,
      "quality_score": 0.89,
      "method": "document_ai_validity_processor"
    },
    "processing_time": 1.8
  }
}
```

### Validation fields

| Field       | Values                                | Description                                         |
| ----------- | ------------------------------------- | --------------------------------------------------- |
| `mrz`       | `"ok"` / `"failed"` / `"not_present"` | MRZ checksum validation result                      |
| `mrzFields` | object                                | Per-field checksum results (only when `mrz = "ok"`) |
| `curp`      | `"ok"` / `"failed"` / `"not_present"` | CURP format validation                              |
| `expired`   | `true` / `false` / `null`             | Whether the document is past its expiry date        |
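
One way to act on these values is to collect the reasons a result should not be trusted. The sketch below is an illustrative client-side policy, not something the API prescribes; `validation_problems` and its `require_mrz` flag are hypothetical names.

```python
def validation_problems(result: dict, require_mrz: bool = False) -> list:
    """Return human-readable reasons to distrust an extraction result."""
    validation = result.get("validation", {})
    problems = []
    mrz = validation.get("mrz")
    if mrz == "failed":
        problems.append("MRZ checksum failed")
    elif mrz == "not_present" and require_mrz:
        # Policy choice: e.g. insist on the INE back side being uploaded.
        problems.append("MRZ not present")
    if validation.get("curp") == "failed":
        problems.append("CURP format invalid")
    if validation.get("expired") is True:  # null means the expiry is unknown
        problems.append("document expired")
    return problems


ok_result = {"validation": {"mrz": "ok", "curp": "ok", "expired": False}}
print(validation_problems(ok_result))  # []
```

Treating `not_present` and `null` as neutral by default keeps the helper usable for utility bills, which have neither an MRZ nor a CURP.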

***

## Address Proof Extraction

Extract structured fields from a Mexican utility bill (comprobante de domicilio).

### Endpoint

```http
POST /v1/ocr/extract/document
```

### Parameters

| Parameter       | Type                  | Required | Description                                        |
| --------------- | --------------------- | -------- | -------------------------------------------------- |
| `image`         | file or base64 string | Yes      | Image of the bill. Also accepted as field `front`. |
| `document_type` | string                | Yes      | One of: `cfe`, `telmex`, `izzi`                    |

### Request Example

{% tabs %}
{% tab title="cURL" %}

```bash
curl -X POST https://api.zaits.net/v1/ocr/extract/document \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@recibo_cfe.jpg" \
  -F "document_type=cfe"
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
const formData = new FormData();
formData.append("image", billFile);
formData.append("document_type", "cfe");

const response = await fetch("https://api.zaits.net/v1/ocr/extract/document", {
  method: "POST",
  headers: { Authorization: "Bearer YOUR_API_KEY" },
  body: formData,
});

const result = await response.json();
```

{% endtab %}

{% tab title="Python" %}

```python
import requests

# The context manager closes the file once the upload finishes.
with open("recibo_cfe.jpg", "rb") as bill:
    response = requests.post(
        "https://api.zaits.net/v1/ocr/extract/document",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        files={"image": bill},
        data={"document_type": "cfe"},
    )

result = response.json()
```

{% endtab %}
{% endtabs %}

### Response

```json
{
  "success": true,
  "type": "cfe",
  "fields": {
    "name": "EDUARDO BRAYAN DORANTES SANCHEZ",
    "fullAddress": "CALLE 5 #12 COL. CENTRO XALAPA VER 91000",
    "zipCode": "91000",
    "phone": "",
    "accountNumber": "123456789",
    "serviceNumber": "00123456",
    "billNumber": "2024010012345",
    "billingPeriod": "ENE 2024",
    "totalPayment": "$450.00",
    "paymentLimitDate": "2024-01-31",
    "barCode": "7501234567890"
  },
  "validation": {
    "mrz": "not_present",
    "curp": "not_present",
    "expired": null
  },
  "processing_time": 1.8
}
```

The same structure applies for `telmex` and `izzi` — field names are identical, values reflect the specific bill format.
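
All values in `fields` arrive as strings, so amounts and dates usually need converting before use. The sketch below assumes the formats shown in the sample response (`"$450.00"` and an ISO `YYYY-MM-DD` date). Handling of thousands separators is an additional assumption, and `parse_bill_amounts` is a hypothetical helper name.

```python
from datetime import date
from decimal import Decimal


def parse_bill_amounts(fields: dict) -> tuple:
    """Convert totalPayment and paymentLimitDate into typed values."""
    raw_amount = fields["totalPayment"]
    # Assumption: a leading "$" and optional "," thousands separators.
    amount = Decimal(raw_amount.replace("$", "").replace(",", "").strip())
    due = date.fromisoformat(fields["paymentLimitDate"])
    return amount, due


amount, due = parse_bill_amounts(
    {"totalPayment": "$450.00", "paymentLimitDate": "2024-01-31"}
)
print(amount, due)  # 450.00 2024-01-31
```

`Decimal` is used rather than `float` so that currency values are not subject to binary rounding.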

***

## Document Authenticity Verification

Verify document authenticity using Google Cloud Document AI's validity processor. This is a standalone endpoint — authenticity can also be included inline with `/extract/id` via `include_authenticity=true`.

### Endpoint

```http
POST /v1/ocr/verify/authenticity
```

### Parameters

| Parameter | Type | Required | Description    |
| --------- | ---- | -------- | -------------- |
| `image`   | file | Yes      | Document image |

### Request Example

```bash
curl -X POST https://api.zaits.net/v1/ocr/verify/authenticity \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@ine_frente.jpg"
```

### Response

```json
{
  "success": true,
  "validity_check": {
    "is_authentic": true,
    "confidence": 0.92,
    "tampering_detected": false,
    "quality_score": 0.89,
    "method": "document_ai_validity_processor",
    "details": {
      "validity_indicators": 8,
      "validity_processor_confidence": 0.93
    }
  },
  "processing_time": 1.8
}
```
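
How to act on these scores is left to the caller. The sketch below shows one possible acceptance rule; the `0.9` confidence threshold and the `is_acceptable` helper are assumptions chosen for illustration, not recommended values.

```python
def is_acceptable(authenticity: dict, min_confidence: float = 0.9) -> bool:
    """Apply a simple acceptance rule to an authenticity result."""
    if not authenticity.get("success"):
        return False
    check = authenticity.get("validity_check", {})
    return (
        check.get("is_authentic") is True
        and check.get("tampering_detected") is False
        # 0.9 is an arbitrary example threshold; tune it for your risk profile.
        and check.get("confidence", 0.0) >= min_confidence
    )


sample = {
    "success": True,
    "validity_check": {
        "is_authentic": True,
        "confidence": 0.92,
        "tampering_detected": False,
    },
}
print(is_acceptable(sample))  # True
```

The same function works on the `authenticity` object returned inline by `/extract/id`, since it has the same `success` and `validity_check` keys.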

***

## Not Supported

The following endpoints have been removed and return `400 not_supported`:

| Endpoint                       | Reason                                |
| ------------------------------ | ------------------------------------- |
| `POST /v1/ocr/extract`         | General text extraction not supported |
| `POST /v1/ocr/extract/receipt` | Receipt OCR not supported             |

```json
{
  "success": false,
  "error": "not_supported",
  "message": "General text extraction is not supported. Use /extract/id for identity documents or /extract/document for utility bills (cfe, telmex, izzi)."
}
```

***

## Error Responses

| Error Code                  | HTTP Status | Description                                       |
| --------------------------- | ----------- | ------------------------------------------------- |
| `not_supported`             | 400         | Endpoint no longer available                      |
| `unsupported_document_type` | 400         | `document_type` is not one of the accepted values |
| `image_required`            | 400         | No image provided                                 |
| `ocr_service_unavailable`   | 503         | OCR microservice is unreachable                   |

### Error Response Format

```json
{
  "success": false,
  "error": "unsupported_document_type",
  "message": "document_type must be one of: cfe, telmex, izzi."
}
```
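
Since every error shares this shape, a client can turn it into a single exception type. The sketch below is one way to do that; the `OcrApiError` class, the `raise_for_error` helper, and the choice to treat `ocr_service_unavailable` as retryable are assumptions, not behavior defined by the API.

```python
# Assumption: a 503 from the microservice is worth retrying; 400s are not.
RETRYABLE_ERRORS = {"ocr_service_unavailable"}


class OcrApiError(Exception):
    """Raised when the API answers with success=false."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.retryable = code in RETRYABLE_ERRORS


def raise_for_error(status: int, body: dict) -> dict:
    """Return the body unchanged, or raise OcrApiError on a failure payload."""
    if body.get("success") is False:
        raise OcrApiError(status, body.get("error", "unknown"), body.get("message", ""))
    return body


try:
    raise_for_error(400, {"success": False, "error": "image_required", "message": "No image provided"})
except OcrApiError as exc:
    print(exc.code, exc.retryable)  # image_required False
```

With `requests`, this would be called as `raise_for_error(response.status_code, response.json())`.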

***

## Best Practices

### Image Quality

* **Format:** JPG, PNG (PDF support via microservice)
* **Size:** Max 10 MB per file
* **Resolution:** Min 300 DPI recommended
* **Orientation:** Upright, undistorted
* **Lighting:** Uniform, no glare or harsh shadows
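
The format and size limits above can be checked before uploading. The sketch below covers only what can be verified without decoding the image. It reads "10 MB" as 10 × 1024 × 1024 bytes and leaves PDF out of the allowed set, both of which are assumptions; `upload_problems` is a hypothetical helper name.

```python
import os

MAX_BYTES = 10 * 1024 * 1024  # "Max 10 MB"; binary megabytes is an assumption
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # PDF deliberately left out


def upload_problems(filename: str, size_bytes: int) -> list:
    """Return reasons a file would likely be rejected, or an empty list."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


print(upload_problems("ine_frente.jpg", 2_000_000))  # []
```

For a file on disk, `os.path.getsize(path)` supplies `size_bytes`. Resolution, orientation, and lighting need an imaging library and are outside this sketch.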

### INE Tips

* **Always upload the back** — the MRZ on the reverse enables checksum validation and provides authoritative name and date-of-birth values
* If only the front is available, name and DOB are derived from the CURP
* Supported INE versions: A through G (all current generations)
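
For context on the MRZ checksum mentioned above, the sketch below implements the check-digit rule from ICAO Doc 9303: weights 7, 3, 1 repeating, letters valued 10 to 35, and `<` valued 0. Whether the service applies exactly this rule is an assumption. The sample inputs are the specimen values from the ICAO specification, not values from this page.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO Doc 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(field):
        if "0" <= char <= "9":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character {char!r}")
        total += value * weights[index % 3]
    return total % 10


print(mrz_check_digit("L898902C3"))  # 6
print(mrz_check_digit("740812"))     # 2
```

A field passes when the digit printed after it in the MRZ equals the computed value.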

### Address Proof Tips

* Capture the **full document** including headers and QR/barcode areas
* CFE, TELMEX, and IZZI layouts are normalized — the `fields` schema is the same across all three

***

**Next:** [Document Signing API](/api/api-reference/document-signing.md)

