ocr - PixLab Blog

PixLab Vision Workspace Launches Globally with Localized Versions for US, Japan, Indonesia, Brazil & More

We're thrilled to announce the global release of PixLab Vision Workspace, the next-generation AI-powered productivity suite built for both solo professionals and enterprise teams. With localized versions for Japan, Indonesia, Brazil, and Spanish-speaking markets, PixLab Vision is now ready to power your business — no matter where you are.

PixLab Vision Workspace

👉 Try it now: vision.pixlab.io

🗾 Tailored for Japan and Emerging Markets

PixLab Vision isn’t just translated — it's culturally and technically localized for real-world business users:

🇯🇵 Japanese Interface: Available at pixlab.io/vision-platform/workspace-jp
🇮🇩 Bahasa Support: Optimized for the Indonesian market
🇧🇷 Brazilian Portuguese: UX tailored to local workflows
🌎 Spanish Version: For Latin America and global Spanish-speaking professionals

Whether you're a solo entrepreneur, a corporate team, or a creative agency, PixLab Vision adapts to your language, data security expectations, and local compliance standards.

Unified AI-Powered Workspace

PixLab Vision Workspace is more than a document viewer — it’s a full AI workspace backed by PixLab’s proprietary Vision-Language Models (VLMs) and robust reasoning APIs.

Key Features:

Document AI: Upload images, PDFs, Office files — then extract structured data using OCR & LLM reasoning.
Smart Chat: Talk to your documents using natural language. Summarize, edit, translate, extract insights.
Spreadsheet + Markdown: Built-in editors for tabular data, reports, blog posts, and contracts.
Annotation Tools: Comment, highlight, and mark up documents with collaborative tools.
API Integration: Use PixLab’s REST endpoints including:
- /llm-parse – Extract data from unstructured files
- /tool-call – Call tools in real-time from your LLM
- /query – Ask questions about images and documents

Freemium vs Premium: What's Included?

Feature	Freemium	Premium
Storage	Local device only (browser)	Cloud sync across devices
Document History	Local only	Persistent and shareable
OCR & LLM Parse	Available	Enhanced throughput
Team Collaboration	Not included	Multi-user collaboration tools
File Size Limit	Up to 10MB	Up to 250MB per file
Priority Support	Community only	Email & SLA-based corporate support

Freemium users enjoy complete privacy with local data storage: all OCR scans, edits, and chat histories stay on your browser.
Premium plans unlock cloud sync, API credits, and enterprise-grade security for distributed teams.

Who Is It For?

PixLab Vision Workspace is already used by:

SMBs & solo founders in Japan digitizing business receipts and contracts
Marketing agencies automating content repurposing and document translation
Enterprise teams managing compliance, contract processing, and data entry at scale
Developers integrating OCR, chat, and document extraction directly into their products via PixLab’s APIs

Start Using PixLab Vision Today

Whether you're streamlining your office work, building automation workflows, or integrating AI into your app, PixLab Vision puts the entire productivity stack in one place.

Let the AI handle the boring parts, so you can focus on growing your business.

👉 Get started today at vision.pixlab.io

👉 Learn More at: pixlab.io/vision-platform/workspace

Explore the New PixLab Vision Platform VLM API Endpoints

PixLab’s Vision VLM Platform introduces a groundbreaking set of Vision Language Model (VLM) endpoints that combine natural language processing and computer vision in a simple, developer-friendly API suite.

PixLab Vision Platform

From querying images and parsing complex documents to generating PDFs and extracting ID data, the PixLab VLM API makes it easy to integrate intelligent image and document analysis into your own apps.

Vision Language Model Endpoints

These endpoints allow your application to understand images and video frames with natural language intelligence.

/query – Ask natural language questions about images and receive contextual answers
/describe – Get a natural language description of an image
/tagimg – Retrieve tags describing the image content
/detect – Detect and localize objects in the image
/vocr – OCR via vision models for printed text
/nsfw – Detect explicit content in media

Unstructured Document Parsing

Parse unstructured documents like invoices, receipts, and contracts using VLM-powered tools.

/llm-parse – Extract data from complex document layouts using a user-defined schema

Embedding APIs

Turn your content into machine-understandable vectors for search, indexing, and matching.

/txt-embed – Generate semantic embeddings from raw text
/img-embed – Generate vector embeddings from images

These endpoints are perfect for building your own AI-powered similarity search or recommendation systems.

Image Processing & Background Removal

PixLab also provides classic computer vision capabilities enhanced by AI.

/bg-remove – Remove background or unwanted objects from images
/docscan – Scan and extract JSON data from over 11,000 supported ID document types from 200+ countries
/nsfw – Pixelate or block NSFW content automatically

PDF Generation & Conversion

Create and manipulate PDF documents programmatically using these SDK-free endpoints:

/pdfgen – Generate media-rich PDFs from HTML or Markdown
/pdftoimg – Convert PDF files into image previews

LLM Tool Calling Infrastructure

PixLab provides built-in tools for enhancing your LLM pipeline:

/llm-tool – Get a list of tools callable from your LLM
/tool-call – Execute a tool call based on LLM output

These endpoints enable your LLM agent to execute functions dynamically and return results within the same context.

System Tools & Metadata

Helpful utility endpoints for checking server health and supported formats:

/status – View current system status
/about – Get PixLab version & license info
/extension – Retrieve supported file extensions

Explore the Full API Suite

PixLab offers over 150 RESTful endpoints for vision, media, and document automation tasks. Visit the following links to dive deeper:

Final Thoughts

Whether you're working on an AI productivity suite, eKYC onboarding, or document automation pipeline, PixLab’s VLM API delivers powerful, production-ready tools in minutes. All endpoints are accessible via secure HTTP requests and require no proprietary SDKs.

Get started by signing up for an API key at PixLab Console and explore what's possible with Vision Language Models.

Build smarter apps — faster.

Introducing the New DOCSCAN API - Vision-Powered, SDK-Free, and Easier Than Ever

The PixLab Development Team is thrilled to announce the release of the next-generation DOCSCAN API, the core engine behind the newly rebranded PixLab ID Scan Platform.

Built from the ground up with Vision Language Models and hosted on the powerful PixLab Vision infrastructure, this update brings unmatched simplicity, security, and intelligence to identity document processing.

🌍 A Platform Re-imagined

Say goodbye to complex SDK integrations. The new DOCSCAN API requires no client-side SDKs, just a single HTTPS-enabled REST endpoint that supports both GET and POST requests. This means you can call DOCSCAN from any programming environment, whether you're using Python, Java, PHP, Go, or even a shell script.

New Home Page: pixlab.io/id-scan-api/
New Documentation: pixlab.io/id-scan-api/docscan

⚡ What's New in This Version?

✅ Powered by Vision Language Models

The new DOCSCAN API harnesses the full power of PixLab’s Vision Language Models to extract structured, high-quality data from scanned documents with increased accuracy and robustness.

✅ No SDK Required

Forget installing SDKs or maintaining device-specific libraries. DOCSCAN is pure REST—simple, fast, and universal.

✅ Single Endpoint Simplicity

Use a single, unified API endpoint for both document scanning and data extraction. No need to juggle multiple APIs or chained requests.

✅ Supports GET & POST

Whether you prefer URL-based GET requests or multipart POST uploads, DOCSCAN supports both with full flexibility.

✅ TLS 1.3 Secured

All API traffic is encrypted end-to-end using TLS 1.3, ensuring maximum security and compliance from the first byte.

🚀 Built for Developers

The updated documentation at pixlab.io/id-scan-api/docscan has been completely restructured to be developer-first. Clear examples, copy-ready code snippets, and real-world integration guides will help you get up and running in minutes.

🧩 Use Cases

ID verification in finance, healthcare, or government
Digital onboarding for apps and services
Automated customer registration flows
Global document scanning with consistent output formats

🛠 Start Building Today

Head to the PixLab Console to generate your API key and begin integrating the new DOCSCAN API in minutes.

Whether you're modernizing your on-boarding flow or automating ID verification at scale, the new DOCSCAN API offers unmatched speed, simplicity, and intelligence—without the SDK overhead.

🔗 Learn more: pixlab.io/id-scan-api/
📚 REST API Documentation: pixlab.io/id-scan-api/docscan

🌍 Universal Document Support

The DOCSCAN API provides robust support for a wide array of officially issued identification documents. This includes, but is not limited to:

Passports
ID Cards (Citizen ID, Resident ID, Immigration Card, etc.)
Driving Licenses
Visas
Birth & Death Certificates

The API covers documents from nearly all UN-recognized countries, offering unparalleled versatility. This release expands the API's capabilities to handle over 11,094 ID document variations originating from more than 197 countries. Below is a list of supported countries by DOCSCAN :

Afghanistan
Albania
Algeria
Andorra
Angola
Antigua and Barbuda
Argentina
Armenia
Australia
Austria
Azerbaijan
Bahamas
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
Brunei
Bulgaria
Burkina Faso
Burundi
Cabo Verde
Cambodia
Cameroon
Canada
Central African Republic
Chad
Chile
China
Colombia
Comoros
Congo (Brazzaville)
Congo (Kinshasa)
Costa Rica
Cote d'Ivoire
Croatia
Cuba
Cyprus
Czechia
Denmark
Djibouti
Dominica
Dominican Republic
Ecuador
Egypt
El Salvador
Equatorial Guinea
Eritrea
Estonia
Eswatini
Ethiopia
Fiji
Finland
France
Gabon
Gambia
Georgia
Germany
Ghana
Greece
Grenada
Guatemala
Guinea
Guinea-Bissau
Guyana
Haiti
Honduras
Hungary
Iceland
India
Indonesia
Iran
Iraq
Ireland
Israel
Italy
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kiribati
Kuwait
Kyrgyzstan
Laos
Latvia
Lebanon
Lesotho
Liberia
Libya
Liechtenstein
Lithuania
Luxembourg
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Marshall Islands
Mauritania
Mauritius
Mexico
Micronesia
Moldova
Monaco
Mongolia
Montenegro
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands
New Zealand
Nicaragua
Niger
Nigeria
North Korea
North Macedonia
Norway
Oman
Pakistan
Palau
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Poland
Portugal
Qatar
Romania
Russia
Rwanda
Saint Kitts and Nevis
Saint Lucia
Saint Vincent and the Grenadines
Samoa
San Marino
Sao Tome and Principe
Saudi Arabia
Senegal
Serbia
Seychelles
Sierra Leone
Singapore
Slovakia
Slovenia
Solomon Islands
Somalia
South Africa
South Korea
South Sudan
Spain
Sri Lanka
Sudan
Suriname
Sweden
Switzerland
Syria
Taiwan
Tajikistan
Tanzania
Thailand
Timor-Leste
Togo
Tonga
Trinidad and Tobago
Tunisia
Turkey
Turkmenistan
Tuvalu
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States
Uruguay
Uzbekistan
Vanuatu
Vatican City
Venezuela
Vietnam
Yemen
Zambia
Zimbabwe

Modern Passport Structure & Bulk Scan APIs

A Passport is a document that almost everyone has at some point in their lives. It is issued by the country’s government to its citizens and mainly being used for traveling purposes. It also serves as proof of nationality, name, and more importantly an Universally Unique ID for its owner.

Modern Passport Structure

Passport Specimen

Many services have been long-time accepting passports as identification documents from their customers to complete their KYC (Know Your Customer) form as required by the legislation in force. This is especially true and enforced for the Finance, HR or Travel sectors. In most cases, a human operator will verify the authenticity of the submitted document and grant validation or reject it.

Things can get really complicated if you have hundreds of KYC forms to checks, but also if your clients differ in nationality. Quickly, you will find yourself drowning in physical copies of passports in different languages that you can not even understand. Let alone the potential legal problems you can face with passport copies laying around the office. This is why, an automated & safe solution for Passports processing is required!

Modern Passport Structure

From the 1980s on wards, most countries started issuing passports containing an MRZ. MRZ stands for the Machine Readable Zone and is usually located at the bottom of the Passport as shown below:

Modern Passport Specimen

Passports MRZ Sample

Passports that contain an MRZ are referred to as MRPs, machine-readable passports (Almost all modern issued Passports have one). The structure of the MRZ is standardized by the ICAO Document 9303 and the International Electro-technical Commission as ISO/IEC 7501-1.

The MRZ is an area on the document that can easily be read by a machine using an OCR Reader Application or API. It’s not important for you to understand how it works, but if you look at it carefully, you will see that it contains most of the relevant information on the document, combined with additional characters and a checksum that can be extracted programmatically and automatically via API as we will see in the next section.

Once parsed, the following information are automatically extracted from the target MRZ and made immediately available to your app, thanks to the /docscan API endpoint:

issuingCountry: The issuing country or organization, encoded in three characters.
fullName: Passport holder full name. The name is entirely upper case.
documentNumber: This is the passport number, as assigned by the issuing country. Each country is free to assign numbers using any system it likes.
checkDigit: Check digits are calculated based on the previous field. Thus, the first check digit is based on the passport number, the next is based on the date of birth, the next on the expiration date, and the next on the personal number. The check digit is calculated using this algorithm.
nationality: The issuing country or organization, encoded in three characters.
dateOfBirth: The date of the passport holder's birth in YYMMDD form. Year is truncated to the least significant two digits. Single digit months or days are perpended with 0.
sex: Sex of the passport holder, M for males, F for females, and < for non-specified.
dateOfExpiry: The date the passport expires in YYMMDD form. Year is truncated to the least significant two digits. Single digit months or days are perpended with 0.
personalNumber: This field is optional and can be used for any purpose that the issuing country desires.
finalcheckDigit: This is a check digit for positions 1 to 10, 14 to 20, and 22 to 43 on the second line of the MRZ. Thus, the nationality and sex are not included in the check. The check digit is calculated using this algorithm.

Automatic Passport Processing

PixLab Logo

Fortunately for the developer wishing to automate Passports scanning, PixLab can automatically scan & extract passport MRZ but also help to detect possible fraudulent documents. This is made possible thanks to the /docscan API endpoint which let you in a single call scan government issued documents such as Passports, Visas or ID Cards from various countries.

Besides extracting MRZ, the /docscan API endpoint shall automatically crop any detected face and transform binary Machine Readable Zone into stream of text content (i.e. full name, issuing country, document number, date of expiry, etc.) ready to be consumed by your app in the JSON format.

Below, a typical output result of the /docscan endpoint for a passport input image:

Input Passport Specimen (JPEG/PNG/BMP Image)

Input Image URL

Extracted MRZ Fields

MRZ Fields

What follow is the gist used to achieve such result:

Other document scanning code samples are available to consult via the following Github links:

Python code for scanning Passports: passport_scan.py.
PHP code for scanning Passports: passport_scan.php.

Face extraction is automatically performed using the /facedetect API endpoint. For a general purpose Optical Character Recognition engine, you should rely on the /OCR API endpoint instead. If you are dealing with PDF documents, you can convert them at first to raw images via the /pdftoimg endpoint.

Conclusion

The era we are in is more digitized than ever. Tasks that are repetitive are slowly being replaced by computers and robots. In many cases, they can perform these tasks faster, with a smaller amount of mistakes and in a more cost-effective manner. At PixLab we focus on building software to replace manual repetitive labor in administrative business processes. The processing and checking of passports can be very time-consuming. Using /docscan to automate your passport processing will enable you to save cost, on-board customers faster and reduce errors in administrative processes.

PixLab API 1.9.72 Released!

The PixLab development team is pleased to announce the immediate availability of the PixLab API 1.9.72.

PixLab Logo

Since its launch on 2017, PixLab have already processed over 450 Millions of users contents whether static images, GIF or Videos Frames. This milestone release introduces new API endpoints, various minor bug fixes, processing speed improvements by up to 5% and many innovative features. Let's start with the one we are existed about in no particular order:

Passports & ID Cards Scan: While documents scanning were introduced in earlier version of the PixLab API via the /docscan endpoint. This release pushes further the accuracy of the OCR engine. A 5MB raw Passport sample now takes less than 3 seconds to execute including face detection & extraction, MRZ (Machine Readable Zone) extraction and finally transformation of the Raw MRZ data into textual content. You can try out the accuracy of the Passport scanning engine using these Python and PHP scripts to see it in action.
DNS infrastructure moved to Cloudflare for faster than ever response times.
Full support for HTTP/2 and HTTP/3 (QUIC).
Up to three layers of redundancy for the standard PixLab OCR engine for faster, accurate & guaranteed scan results.
A fresh update of the adult & gore content detection ML model which is used to power the famous PixLab /NSFW API endpoint that have already analyzed over 100 millions of user contents with high accuracy.
Face Detection (including facial landmarks extraction) & Emotion Pattern (including gender & age) extraction are now using the RetinaFace Model which scores the highest on the LFW dataset.
The /docscan API endpoint now fully support scanning ID cards from Malaysia & Singapore and many other countries (at users request) as well the brand new India Aadhar ID card. Find out more information about Aadhard fully support via our blog announcement here.
Finally, a brand new, high performance custom image processing layer written in C/C++ and powered by ImageMagick and our Embedded computer Vision Library SOD is integrated directly into our cloud API.

Pixlab customers are more than advised to take a look at The official API endpoints documentation, The Samples Page, The Github repository for additional information.

Finally, for potentially interested users, you are more than welcome to start a 7 days free trial to see the API in action. Simply head to the PixLab Dashboard and activate your free trial from there.

PixLab Logo

Passports, Travel Documents & ID Cards Scan API Endpoint Available

The PixLab OCR team is pleased to introduce the /docscan API endpoint which let you in a single call scan government issued documents such as Passports, Visas or ID Cards from various countries.

Besides its accurate text scanning capabilities, the /docscan API endpoint shall automatically extract any detected face and transform binary data such as Passport Machine Readable Zone (MRZ) into stream of text payload (i.e. full name, issuing country, document number, date of expiry, etc.) ready to be consumed by your app in the JSON format.

Below, a typical output result of the /docscan endpoint for a passport input image:

Input Passport Specimen (JPEG/PNG/BMP Image)

Input Image URL

Extracted MRZ Fields

MRZ Fields

The code samples used to achieve such result are available to consult via the following Github links:

Python code for scanning Passports: passport_scan.py.
PHP code for scanning Passports: passport_scan.php.

Face extraction is automatically performed using the /facedetect API endpoint. For a general purpose Optical Character Recognition engine, you should rely on the /OCR endpoint instead. If you are dealing with PDF documents, you can convert them at first to raw images via the /pdftoimg endpoint.

Below, a typical Python code snippet for scanning passports:

import requests
import json

# Given a government issued passport document, extract the user face and parse all MRZ fields.
#
# PixLab recommend that you connect your AWS S3 bucket via your dashboard at https://pixlab.io/dashboard
# so that any cropped face or MRZ crop is stored automatically on your S3 bucket rather than the PixLab one.
# This feature should give you full control over your analyzed media files.
#
# https://pixlab.io/#/cmd?id=docscan for additional information.

req = requests.get('https://api.pixlab.io/docscan',params={
    'img':'https://i.stack.imgur.com/oJY2K.png', # Passport sample
    'type':'passport', # Type of document we are a going to scan
    'key':'Pixlab_key'
})
reply = req.json()
if reply['status'] != 200:
    print (reply['error'])
else:
    print ("User Cropped Face: " + reply['face_url'])
    print ("MRZ Cropped Image: " + reply['mrz_img_url'])
    print ("Raw MRZ Text: " + reply['mrz_raw_text'])
    print ("MRZ Fields: ")
    # Display all parsed MRZ fields
    print ("\tIssuing Country: " + reply['fields']['issuingCountry'])
    print ("\tFull Name: "       + reply['fields']['fullName'])
    print ("\tDocument Number: " + reply['fields']['documentNumber'])
    print ("\tCheck Digit: "   + reply['fields']['checkDigit'])
    print ("\tNationality: "   + reply['fields']['nationality'])
    print ("\tDate Of Birth: " + reply['fields']['dateOfBirth'])
    print ("\tSex: "           + reply['fields']['sex'])
    print ("\tDate Of Expiry: "    + reply['fields']['dateOfExpiry'])
    print ("\tPersonal Number: "   + reply['fields']['personalNumber'])
    print ("\tFinal Check Digit: " + reply['fields']['finalcheckDigit'])

Finally, the official endpoint documentation is available to consult at pixlab.io/cmd?id=docscan and a set of working samples in various programming language are available at the PixLab samples pages.

Full Scan Support for India Aadhar ID Card

The PixLab OCR team is pleased to announce that is now fully support scanning India Aadhar ID Cards besides Malaysia (MyKad) and Singapore identity cards as well governments issued Passports from all over the world via the /docscan API endpoint.

When invoked, the /docscan API endpoint shall Extract (crop) any detected face and transform the raw Aadhar ID card content such as holder name, gender, date of birth, ID number, etc. into a JSON object ready to be consumed by your app.

Below, a typical output result of the /docscan API endpoint for a Aadhar ID card input sample:

Input Aadhar ID Card

ID card specimen

Extracted Aadhar Card Fields

extracted fields

The same API call applies for Passports as well different ID cards from supported countries (you just specify the country name or ISO code):

Input Passport Specimen

Passport Specimen

Extracted MRZ Fields

MRZ Fields

The code samples used to achieve such result are available to consult via the following Github links:

Python code for scanning Aadhar ID cards (as well other ID cards from different countries): id_card_scan.py.
PHP code for scanning Aadhar ID cards: id_card_scan.php.
Python code for scanning Passports: passport_scan.py.
PHP code for scanning Passports: passport_scan.php.
For converting PDF documents to raw images, you can rely on the /pdftoimg API endpoint as shown in this Python or PHP gist.

Face extraction is automatically performed using the /facedetect API endpoint. If you are dealing with PDF documents, you can convert them at first to raw images via the /pdftoimg endpoint.

Finally, the official endpoint documentation is available to consult at pixlab.io/cmd?id=docscan and a set of working samples in various programming language are available at the PixLab samples pages.

Full Scan Support for Malaysia and Singapore ID Cards

The PixLab OCR team is pleased to announce that it fully support now scanning ID cards from Malaysia (MyKad), Singapore, India Aadhaar, Emirates (UAE) ID & GCC Residence Card, US Driver's License, as well governments issued Passports from all over the world via the /DOCSCAN API endpoint.

Besides its robust text scanning features, the /docscan API endpoint shall Extract (crop) any detected face and transform the extracted text content such as ID card fields (name, ID number, address, etc.) or Passport Machine Readable Zone (MRZ) into JSON object fields ready to be consumed by your code.

Below, a typical output result of the /docscan endpoint for an ID card input image:

Input ID card Specimen

ID card specimen

Extracted ID Card Fields

extracted fields

The same applies for Passports:

Input Passport Specimen

Passport Specimen

Extracted MRZ Fields

MRZ Fields

The code samples used to achieve such result are available to consult via the following Github links:

Python code for scanning ID cards: id_card_scan.py.
PHP code for scanning ID cards: id_card_scan.php.
Python code for scanning Passports: passport_scan.py.
PHP code for scanning Passports: passport_scan.php.
For converting PDF documents to raw images, you can rely on the /pdftoimg API endpoint as shown in this Python or PHP gist.

Face extraction is automatically performed using the /facedetect API endpoint. If you are dealing with PDF documents, you can convert them at first to raw images via the /pdftoimg endpoint.

Finally, the official endpoint documentation is available to consult at pixlab.io/id-scan-api/docscan and a set of working samples in various programming language are available at the PixLab samples pages.

OCR performance improved

As requested by our users, our /OCR endpoint gets more support for various languages including Arabic, Modern Hebrew, Russian & simplified Chinese.

bounding box coordinates are now enabled by default. For each request, besides the full text output, you get a bbox array where each entry of this array hold the target word and its bounding box (i.e. rectangle) coordinates. Each entry in this array is identified by an instance of the following JSON object:


{
    word: Extracted word,
    x: X coordinate of the top left corner,
    y: Y coordinate of the top left corner,
    w: Width of the rectangle that englobe this word,
    h: Height of the rectangle that englobe this word
}

The documentation is updated and available to consult at https://pixlab.io/cmd?id=ocr and a Python sample is provided on Github at https://github.com/symisc/pixlab/blob/master/python/ocr.py.

With that in hand, you can further tune your analysis phase for example by extracting each word via /crop and perform another pass if desired.