This page summarizes the changes and improvements to Skwiz.

10 Aug 2023

Optimized invoice & receipt models + Regular expressions

Optimized invoice & receipt models

We’re introducing our optimized invoice and receipt models, which are specific pre-trained models that use machine learning instead of large language models (LLMs). Trained on tens of thousands of example documents, these models have already demonstrated their effectiveness in improving data extraction for a number of our clients.

These optimized models can be used independently or in combination with LLMs, providing you with added flexibility. For example, you could configure Skwiz to use the optimized receipt model to extract essential data such as amounts, the issue date, supplier information and tables from your documents. An LLM could then be used in addition to find a custom field (say, the City) and to categorize the content of the receipts into your own classes, such as “Restaurant”, “Hotel” or “Transport”, in the Category field. Below you’ll find an example of such a configuration.

[Image: fields configuration]
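
As a rough illustration of the receipt setup described above, the sketch below writes it out as plain Python data. The key names (`name`, `source`, `type`, `options`) and the source labels (`optimized_receipt_model`, `llm`) are assumptions made for this example; they are not Skwiz's actual configuration format.

```python
# Hypothetical field list; key names and source labels are illustrative only.
RECEIPT_FIELDS = [
    {"name": "Total amount", "source": "optimized_receipt_model"},
    {"name": "Issue date", "source": "optimized_receipt_model"},
    {"name": "Supplier name", "source": "optimized_receipt_model"},
    {"name": "City", "source": "llm", "type": "Text"},
    {
        "name": "Category",
        "source": "llm",
        "type": "Classification",
        "options": ["Restaurant", "Hotel", "Transport"],
    },
]


def group_by_source(fields):
    """Return a mapping from extraction source to the field names it handles."""
    grouped = {}
    for field in fields:
        grouped.setdefault(field["source"], []).append(field["name"])
    return grouped


print(group_by_source(RECEIPT_FIELDS))
```

Grouping the fields this way makes it easy to see which values would come from the optimized model and which from the LLM.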

Regular expressions

To further enhance the flexibility in data extraction, we’ve also added the ability to define your own regular expressions (regex) and link them to your fields.

Even with the advanced capabilities of our optimized models and LLMs, the precision and efficiency of regex remain critical in the processing of documents. Particularly for pattern recognition in structured text, regex significantly outperforms models in instances where specific patterns can be identified and differentiated from surrounding text.

For example, if the Purchase Order number on your incoming invoices always consists of “PO” followed by 5 digits (e.g. “PO93438”), a regex (in this case `PO\d{5}`) will extract it reliably from each document.
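
To see how such a pattern behaves outside Skwiz, here is a minimal Python sketch using the standard `re` module. The function name `extract_po_number` is made up for this example, and the word boundaries (`\b`) are an addition to the pattern above: they are one way to avoid matching the first five digits of a longer number. How Skwiz applies a regex internally is not shown here.

```python
import re
from typing import Optional

# "PO" followed by exactly 5 digits. The \b word boundaries are an addition
# to the changelog's pattern, so that "PO934381" or "XPO93438" do not match.
PO_PATTERN = re.compile(r"\bPO\d{5}\b")


def extract_po_number(text: str) -> Optional[str]:
    """Return the first Purchase Order number found in text, or None."""
    match = PO_PATTERN.search(text)
    return match.group(0) if match else None


print(extract_po_number("Invoice 2023-118, your reference: PO93438"))
```

Without the boundaries, the plain `PO\d{5}` pattern would also match inside longer strings, which may or may not be what you want for your documents.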

01 Aug 2023

Classification of documents, fields and line items


You can now use classification, also known as categorization, at multiple levels of your documents.

Document classification

Automatically identify the document type of any document you upload, for example: 

  • “ID card” vs “Driver’s license” vs “Residence permit”

  • “Invoice” vs “Receipt”

  • “Invoice” vs “Other”

Field classification

Automatically categorize a document’s content, for example:

  • The expense category on a receipt: “Hotel” vs “Restaurant” vs “Transport”

  • The gender on an ID card: “Male” vs “Female”

  • The type of a bank card: “Credit” vs “Debit”

Line item classification

Automatically categorize each line of a table, for example:

  • The category of each line on an invoice: “Goods” vs “Services”

  • The liability of each claim in a claims history letter: “Liable” vs “Not liable” vs “Shared liability” vs “Unknown”

Any of these classifiers can be set up within minutes. 

Other changes
  • Improved table extraction, most notably for tables containing empty cells

  • Improved handling of dates: fixed an issue where certain dates were not parsed

  • Improved the True/False field type: merged its configuration and behavior with field classification

  • Added the document type key in the document definition modal

Document classification

To automatically identify the document type of the documents you upload to Skwiz, you can build a classifier between 2 or more document types. 

After defining what sets different document types apart, Skwiz can automatically distinguish one document type from another. Here is an example of descriptions that can be used to distinguish invoices from receipts:

  • "Invoice": contains information about both the seller and buyer, often contains a reference to invoice, facture, factuur, Rechnung, factura, etc. and typically contains bank information

  • "Receipt": a shorter document that typically contains information about the seller only, and not about the buyer

You can set one of the options as the fallback: Skwiz assigns this document type when no specific type can be determined.
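
The fallback behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not Skwiz's implementation: the `Classifier` class, its `resolve` method and the choice of “Receipt” as the fallback are assumptions for this example, and the step that actually predicts a label from a document is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Classifier:
    """Hypothetical classifier definition: option descriptions plus a fallback."""

    descriptions: Dict[str, str]
    fallback: str

    def __post_init__(self):
        if len(self.descriptions) < 2:
            raise ValueError("a classifier needs 2 or more options")
        if self.fallback not in self.descriptions:
            raise ValueError("the fallback must be one of the options")

    def resolve(self, predicted: Optional[str]) -> str:
        """Keep a prediction that is a known option, otherwise use the fallback."""
        if predicted in self.descriptions:
            return predicted
        return self.fallback


classifier = Classifier(
    descriptions={
        "Invoice": "contains information about both the seller and buyer",
        "Receipt": "typically only contains information about the seller",
    },
    fallback="Receipt",  # arbitrary choice for this example
)

print(classifier.resolve("Invoice"))
print(classifier.resolve(None))
```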

You can build multiple classifiers if Skwiz supports several of your workflows.

Field and line item classification

To classify a field or a line item, set its type to Classification and define 2 or more options. Optionally, extra instructions can be provided for more complex cases. 

Set one of the options as the fallback: this value is returned when no specific value can be found.

05 Jul 2023



Skwiz enables you to extract data from any type of document using large language models (LLMs) like ChatGPT. These LLMs have the capacity to understand and generate natural language, enabling very precise extractions and allowing simpler and more enjoyable ways to work with documents, in virtually any language. This also makes setups shorter and less costly than traditional rule-based or machine learning approaches, as it reduces the need for extensive training data and time-consuming manual labeling.

This first version of Skwiz aims to provide a radically simple flow to set up your document types autonomously within minutes, while still providing you with the flexibility to influence the quality of the extractions. We dedicated a great deal of attention to the extraction of tables, as their content is often critical and manually entering this data takes up much of the time of those who handle these documents. 

In addition, we added a validation step and an API to seamlessly fit Skwiz into your existing business workflows.

Onboarding flow

The onboarding flow helps you, as a new user, define your first document type with minimal guidance and introduces you to the document templates.

Document type definition

The document type definition is the core configuration in Skwiz: it defines the document type, fields and extra instructions that drive the extraction. You can define regular fields as well as fields that need to be extracted from line item tables. We’ve added tooltips and documentation to help you get the most out of Skwiz.

Document templates

Our document templates propose configurations for common document types like purchase orders, invoices or ID cards. These templates can be fully customized by adding or removing fields and writing instructions that better fit your specific documents.
Templates for highly structured documents that always contain the same information, like ID cards, passports or driver’s licenses, can be used as is.
Alternatively, you can create your own document types from scratch when handling less common document types that are not yet included in the templates.

Document overview

The document overview allows you to upload documents by drag-and-drop and shows all documents per status. The “Validate” status includes all documents that have been extracted and require manual validation by a user. After validation, documents move to the “Completed” status. 

You can skip the manual validation step in the document type settings, so that documents move directly to the “Completed” status after extraction.

Document validation

You can modify the values of extracted fields as well as data extracted from tables by selecting text directly on the document, avoiding the need for typing. You can also add or remove rows in the table output grid. Validating a document moves it to the “Completed” status.


Skwiz has its own documentation, which in its first version focuses on helping you optimize the quality of your document extractions. You can do this by following the naming conventions, understanding the various field types available, and providing additional instructions for each field that requires optimization.

API & its documentation

The Skwiz API offers asynchronous extraction. You can generate multiple API keys. 

The following routes are available and described in the API documentation:

  • Asynchronous extraction: upload your document

  • Get document extractions: retrieves the status and extracted values (if available) of one document

  • Get documents status: retrieves the status of one or more documents
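
Because extraction is asynchronous, a client typically uploads a document and then polls until the result is available. The Python sketch below shows one possible polling loop. It makes no real HTTP calls: `fetch_document` stands in for whatever function you write around the “Get document extractions” route, and the response shape (a dict with a `status` key) and the status names are assumptions for this example; check the API documentation for the actual values.

```python
import time
from typing import Callable, Dict, Tuple


def wait_for_extraction(
    fetch_document: Callable[[], Dict],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 60.0,
    done_statuses: Tuple[str, ...] = ("Validate", "Completed"),  # assumed names
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_document repeatedly until its status is in done_statuses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        document = fetch_document()
        if document.get("status") in done_statuses:
            return document
        if time.monotonic() >= deadline:
            raise TimeoutError("extraction did not finish before the timeout")
        sleep(poll_seconds)


# Usage with a fake fetcher that simulates two consecutive API responses.
fake_responses = iter([
    {"status": "Processing"},
    {"status": "Validate", "fields": {"total": "12.50"}},
])
result = wait_for_extraction(lambda: next(fake_responses), poll_seconds=0)
print(result["status"])
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library and makes it easy to test with canned responses.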

We will soon add a webhook so that you’ll be able to receive the data from each document as soon as it’s extracted or validated in Skwiz. 

Manual export

You can manually export the extracted data from documents into Excel or JSON files, along with the documents themselves. To do this, select one or multiple documents from the document overview and click on the export icon located above the list. Alternatively, you can directly export a document’s data from the document validation screen.

Document auto-deletion

Skwiz’s main purpose is to extract information from documents and give you access to the data, without serving as a system of record. By default, documents are stored for 2 months, providing you with sufficient time to validate the documents and access the extracted data. You can reduce the retention period in the settings, and documents can be manually deleted at any time from the document overview.


