Changelog

February 4, 2025

⚡Improvements

The parsing model has been upgraded to V2 in the API (in addition to the website). The POST /document endpoint accepts an optional parseVersion parameter, which can be set to 1 or 2 (default is now 2).
We have improved how we spatially display documents to the AI in standardization and analysis, which should improve results.
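
As a rough illustration of pinning the parser version, the sketch below builds (but does not send) a POST /document request. Only the endpoint path and the parseVersion parameter come from this changelog. The base URL, the X-API-Key header, the payload contents, and the choice to send parseVersion in the JSON body are all assumptions made for the example; check the API docs for the real request shape.

```python
import json
import urllib.request
from typing import Optional

# Assumption: placeholder base URL, not taken from the changelog.
BASE_URL = "https://api.example.com"


def build_document_request(
    payload: dict,
    parse_version: Optional[int] = None,
    api_key: str = "YOUR_API_KEY",
) -> urllib.request.Request:
    """Build a POST /document request, optionally pinning parseVersion."""
    if parse_version is not None and parse_version not in (1, 2):
        raise ValueError("parseVersion must be 1 or 2")

    body = dict(payload)  # copy so the caller's dict is not mutated
    if parse_version is not None:
        # Assumption: parseVersion travels in the JSON body.
        body["parseVersion"] = parse_version

    return urllib.request.Request(
        url=f"{BASE_URL}/document",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: the header name used for authentication is hypothetical.
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


# Keep the old parser explicitly; omit parse_version to get the default.
request = build_document_request({"hypothetical_field": "value"}, parse_version=1)
```

Omitting parse_version leaves the field out entirely, so the request follows whatever default the API applies.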

January 18, 2025

Standardizations can now be downloaded as Excel files from the API as well, under the endpoint /standardization/{standardization_id}/download/excel-url, which gives you a temporary URL to download the Excel file. This feature is free of charge.
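
A minimal sketch of building the address for this endpoint is shown below. The path comes from the entry above; the base URL is a placeholder, and the HTTP method, authentication, and response format are not stated here, so the sketch stops at constructing the URL.

```python
from urllib.parse import quote

# Assumption: placeholder base URL, not taken from the changelog.
BASE_URL = "https://api.example.com"


def excel_url_endpoint(standardization_id: str) -> str:
    """Return the address of the Excel download-URL endpoint for one standardization."""
    if not standardization_id:
        raise ValueError("standardization_id must be a non-empty string")
    # Percent-encode the id so unexpected characters cannot alter the path.
    safe_id = quote(standardization_id, safe="")
    return f"{BASE_URL}/standardization/{safe_id}/download/excel-url"


endpoint = excel_url_endpoint("abc123")
```

Because the returned download URL is temporary, it is probably safest to request a fresh one each time rather than storing it.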

The parsing model has been upgraded to a new version (V2), which improves accuracy with tables, checkmarks, and handwriting recognition. This update is now the default on the website and will become the default in the API in one week. The POST /document endpoint now accepts an optional parseVersion parameter, which can be set to 1 or 2. The default remains 1 for now but will switch to 2 in one week. To continue using the old version, set parseVersion to 1.

January 6, 2025

We added the ability to download individual standardizations as an Excel file. This feature is currently available only via the website, under the Standardization tab: click Download -> Excel. The Excel file contains the same information as the standardization details page, but in a more structured format: non-array fields are placed in a sheet called 'main', and each array field gets its own sheet, named after that field. This feature is free of charge and will be available in the API soon.
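
The sheet layout described above can be pictured with a short sketch. This is not the exporter itself, only an illustration of the stated rule: non-array fields go to 'main', and each array field becomes its own sheet. The function name and the sample field names are hypothetical.

```python
def split_into_sheets(standardization: dict) -> dict:
    """Group fields the way the Excel export is described:
    scalars under 'main', one sheet per array field."""
    sheets = {"main": {}}
    for field, value in standardization.items():
        if isinstance(value, list):
            # Array field: its own sheet, named after the field.
            sheets[field] = value
        else:
            # Non-array field: a row in the 'main' sheet.
            sheets["main"][field] = value
    return sheets


# Hypothetical standardization result for illustration only.
example = {
    "invoiceNumber": "INV-001",
    "total": 120.5,
    "lineItems": [
        {"description": "Widget", "amount": 100.0},
        {"description": "Shipping", "amount": 20.5},
    ],
}
sheets = split_into_sheets(example)
# sheets has two keys: "main" (invoiceNumber, total) and "lineItems" (two rows)
```

The sketch does not handle an array field that is itself named 'main'; how the real export treats that case is not stated here.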

In document parsing, we removed the underscore padding in tables, as it caused issues with some documents. Newly parsed documents will revert to the previous behavior of having empty table cells filled with a simple empty string. For standardization with standardizationMode='sectionBased', we will still use padding to improve results.

January 3, 2025

You can now right-click tabs in the dashboard menu to open them in a new browser tab (this was not possible before).
We have disabled the mobile view for the dashboard, as it was not optimized for mobile devices and caused issues.

December 30, 2024

We added an API endpoint to POST a new schema from scratch. Until now, a schema could only be created by updating an existing schema; you can now add a schema object directly using the API. The endpoint is POST /schema. See the API docs for more details.

December 17, 2024

We fixed a bug that allowed schemas to have fields with type=enum, which is not a valid type in JSON Schema (enum is an additional key in a field, not a type). We now only allow the types 'string', 'number', 'integer', 'boolean', 'object', and 'array'.
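
To make the distinction concrete, here is a small sketch of a checker that walks a schema and reports any field whose type is not one of the six allowed values. It is an illustration of the rule above, not the validation the API actually runs; the function name and sample schema are hypothetical.

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "object", "array"}


def find_invalid_types(field: dict, path: str = "$") -> list:
    """Return (path, type) pairs for every field whose type is not allowed."""
    problems = []
    field_type = field.get("type")
    # Guard against non-string values (e.g. a list) before the set lookup.
    if not isinstance(field_type, str) or field_type not in ALLOWED_TYPES:
        problems.append((path, field_type))
    # Recurse into nested object properties.
    for name, child in field.get("properties", {}).items():
        problems.extend(find_invalid_types(child, f"{path}.{name}"))
    # Recurse into array item definitions.
    if isinstance(field.get("items"), dict):
        problems.extend(find_invalid_types(field["items"], f"{path}[]"))
    return problems


schema = {
    "type": "object",
    "properties": {
        # Invalid: 'enum' used as a type.
        "status": {"type": "enum", "enum": ["open", "closed"]},
        # Valid: 'enum' is an extra key next to a real type.
        "priority": {"type": "string", "enum": ["low", "high"]},
    },
}
print(find_invalid_types(schema))  # [('$.status', 'enum')]
```

The valid form keeps a real type such as 'string' and lists the permitted values under the separate enum key.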

December 7, 2024

Added the ability to download a PDF with the OCR layer baked in. Available both in the API at the endpoint /document/{document_id}/download/ocr-url and via the website, under the Documents tab: click Download -> File (OCR Layer). In more detail, this feature lets you download your PDF, which may be handwritten or contain images, with DocuPanda's OCR layer placed on top of the document in an invisible font at the word level. This allows you to search your PDF, or highlight and copy text from it, even if the original document was just a scan. This service is free of charge.

In document parsing, we added underscore padding in tables (instead of empty string), which improves readability / rendering and standardization results, as it makes it easier for the AI to keep track of table columns. This affects anyone using the document.result.text output in its raw form, or anyone using standardization with standardizationMode='sectionBased'.