This page summarizes the changes and improvements to Skwiz.
11 Sep 2024
Skwiz now supports single sign-on for both Microsoft and Google accounts.
By integrating SSO, your team can access Skwiz securely with one set of credentials. Register or log in now with just one click and reduce your team’s risk of password-related breaches and phishing attacks.
18 Jan 2024
You can now further automate your document workflows with our PDF splitting feature.
If you often end up with a single, long PDF after scanning multiple documents in one go, you can now let Skwiz take care of it. We can automatically separate these scans into individual documents to automate even more of your workflow.
Enable PDF splitting now via Live documents > Configure > Workflow > Analysis > Enable.
To use it in the web app, select Analysis > Split document(s) when uploading a PDF.
To use it through the API, use the Analysis routes to upload your PDF.
19 Oct 2023
You can now detect all barcodes and signatures on your documents. Enable this functionality with a single click in your Document type settings: Skwiz will then locate the position of each barcode and signature and extract the data encoded in the barcodes.
We support the following barcode types:
Code39
Code128
DataBar
DataBarExpanded
DataMatrix
EAN8
EAN13
PDF417
QRCode
UPCA
UPCE
26 Sep 2023
Our network of trusted partners plays a significant role in introducing Skwiz to our clients and delivering it effectively. To make it easier to manage multiple clients within Skwiz from a single account, we’ve added a collaborative partner interface. It allows you to facilitate client introductions and run demos that show value from day 1. You can also invite colleagues and keep a full overview and control, all from a single hub. We are confident that this feature will simplify your project management and further improve your experience in Skwiz.
Seamlessly integrate part of the Skwiz UI into your own application, offering a read-only web view of the documents extracted by Skwiz. This integration enables quick, efficient document reviews and enhances your workflow without the hassle of having to switch between applications.
Multi-user organizations: invite your colleagues to join your organization in Skwiz. Each user can be linked to multiple organizations.
Keyword classification: an extension of LLM-based document classification that lets you classify documents based on keywords, for full flexibility and for cases where speed is important. If keyword classification is configured, we run it first and fall back to LLM classification only when no keyword matches.
Usage dashboard: get a live view of the pages processed by Skwiz. Usage is split into 2 groups. 'Extraction' lists the number of pages that were extracted (and possibly also classified). 'Classification only' lists the number of pages that were classified but not extracted. Classification only is used to filter out documents that do not require extraction, for example because they arrive through the same channel (e.g. a mailbox) as the documents you do want extracted. Classification only is generally priced lower than extraction.
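The ordering described for keyword classification (keywords first, LLM only when no keyword matches) can be sketched in Python as follows. This is a minimal illustration, not Skwiz’s implementation: the function `classify`, the keyword table and the `llm_classify` callable are hypothetical stand-ins, and a plain case-insensitive substring match is assumed as the keyword rule.

```python
from typing import Callable, Optional


def classify(text: str,
             keyword_rules: dict[str, list[str]],
             llm_classify: Callable[[str], Optional[str]]) -> Optional[str]:
    """Hypothetical sketch: keyword classification first, LLM classification as fallback."""
    lowered = text.lower()
    # Step 1: keyword classification. Rules are checked in insertion order and
    # the first label with a matching keyword wins (assumed tie-breaking rule).
    for label, keywords in keyword_rules.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return label
    # Step 2: no keyword matched, so defer to the slower LLM classifier.
    return llm_classify(text)


# Example keyword table (hypothetical) and a stand-in for the LLM call.
rules = {
    "Invoice": ["invoice", "factuur", "Rechnung"],
    "Receipt": ["receipt"],
}


def fake_llm(text: str) -> Optional[str]:
    return "Other"  # placeholder for a real LLM classification call


print(classify("Factuur nr. 2023-001", rules, fake_llm))  # Invoice
print(classify("Thank you for your visit", rules, fake_llm))  # Other
```

Running the cheap keyword check first means the LLM is only called for documents the keywords cannot decide, which is where the speed benefit comes from.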
10 Aug 2023
We’re introducing our optimized invoice and receipt models, which are specific pre-trained models that use machine learning instead of large language models (LLMs). Trained on tens of thousands of example documents, these models have already demonstrated their effectiveness in improving data extraction for a number of our clients.
These optimized models can be used independently or in combination with LLMs, giving you added flexibility. For example, you could configure Skwiz to use the optimized receipt model to extract essential data such as amounts, the issue date, supplier information and tables from your documents. An LLM could then be used in addition to extract a custom field (say, the City) and to categorize the receipts into your own classes, like “Restaurant”, “Hotel” or “Transport”, in the field Category. Below you’ll find an example of such a configuration.
To further enhance the flexibility in data extraction, we’ve also added the possibility to define your own regular expressions (regex) and link these to your fields.
Even with the advanced capabilities of our optimized models and LLMs, the precision and efficiency of regex remain critical in document processing. Particularly for pattern recognition in structured text, regex significantly outperforms models whenever a specific pattern can be identified and distinguished from the surrounding text.
For example, if the Purchase Order number on your incoming invoices always consists of “PO” followed by 5 digits (e.g. “PO93438”), a regex such as `PO\d{5}` will reliably extract it from each document.
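A quick way to try such a pattern before linking it to a field is Python’s standard `re` module. The sketch below applies the pattern from the example. The `\b` word boundaries and the helper `find_po_numbers` are our own additions, not part of Skwiz. The boundaries stop the pattern from matching the first 5 digits of a longer reference such as “PO934381”.

```python
import re

# Pattern from the example: "PO" followed by exactly 5 digits.
# The \b word boundaries are an assumed refinement so that longer strings
# such as "PO934381" or "XPO93438" are not partially matched.
PO_NUMBER = re.compile(r"\bPO\d{5}\b")


def find_po_numbers(text: str) -> list[str]:
    """Hypothetical helper: return every PO number found in the text."""
    return PO_NUMBER.findall(text)


sample = "Invoice 2023-118, your reference PO93438, delivered 12/07."
print(find_po_numbers(sample))  # ['PO93438']

# Without boundaries, the bare pattern also matches inside a longer reference.
print(re.findall(r"PO\d{5}", "PO934381"))  # ['PO93438']
print(find_po_numbers("PO934381"))  # []
```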
01 Aug 2023
You can now use classification, also known as categorisation, on multiple levels of your documents.
Automatically identify the document type of any document you upload, for example:
“ID card” vs “Driver’s license” vs “Residence permit”
“Invoice” vs “Receipt”
“Invoice” vs “Other”
Automatically categorize a document’s content, for example:
The expense category on a receipt: “Hotel” vs “Restaurant” vs “Transport”
The gender on an ID card: “Male” vs “Female”
The type of a bank card: “Credit” vs “Debit”
Automatically categorize each line of a table, for example:
The category of each line on an invoice: “Goods” vs “Services”
The liability of each claim in a claims history letter: “Liable” vs “Not liable” vs “Shared liability” vs “Unknown”
Any of these classifiers can be set up within minutes.
Improved table extraction, most notably for tables containing empty cells
Improved handling of dates: corrected the issue with certain dates not being parsed
Improved field type True/False: merged configuration and behavior with field classification
Added the document type key in the document definition modal
To automatically identify the document type of the documents you upload to Skwiz, you can build a classifier between 2 or more document types.
After defining what sets different document types apart, Skwiz can automatically distinguish one document type from another. Here is an example of descriptions that can be used to distinguish invoices from receipts:
"Invoice": contains information about both the seller and buyer, often contains a reference to invoice, facture, factuur, Rechnung, factura, etc. and typically contains bank information
"Receipt": shorter, and typically contains information only about the seller, not about the buyer
You can set one of the options as fallback to assign this document type if no specific type can be determined.
You can build multiple classifiers if Skwiz supports several of your workflows.
To classify a field or a line item, set its type to Classification and define 2 or more options. Optionally, extra instructions can be provided for more complex cases.
Set one of the options as fallback to return this value if no specific value can be found.
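You can think of the fallback as a final step that maps the classifier’s raw answer onto your configured options. The Python sketch below illustrates the idea rather than Skwiz’s internal logic. It assumes a case-insensitive exact match, and the helper `resolve_option` and the option list are hypothetical.

```python
from typing import Optional


def resolve_option(raw: Optional[str], options: list[str],
                   fallback: Optional[str]) -> Optional[str]:
    """Hypothetical sketch: map a raw classifier answer onto a configured option."""
    if fallback is not None and fallback not in options:
        raise ValueError("the fallback must be one of the configured options")
    if raw is not None:
        normalized = raw.strip().casefold()
        for option in options:
            # Assumed matching rule: case-insensitive exact match after trimming.
            if option.casefold() == normalized:
                return option
    # No specific value could be determined, so return the fallback option.
    return fallback


options = ["Hotel", "Restaurant", "Transport", "Other"]
print(resolve_option(" restaurant ", options, fallback="Other"))  # Restaurant
print(resolve_option("Parking", options, fallback="Other"))  # Other
print(resolve_option(None, options, fallback="Other"))  # Other
```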
05 Jul 2023
Skwiz enables you to extract data from any type of document with the use of large language models (LLMs) like ChatGPT. These LLMs can understand and generate natural language, enabling very precise extractions and simpler, more enjoyable ways to work with documents, in virtually any language. This also results in shorter and less costly setups than traditional rule-based or machine learning approaches, as it reduces the need for extensive training data and time-consuming manual labelling.
This first version of Skwiz aims to provide a radically simple flow to set up your document types autonomously within minutes, while still providing you with the flexibility to influence the quality of the extractions. We dedicated a great deal of attention to the extraction of tables, as their content is often critical and manually entering this data takes up much of the time of those who handle these documents.
In addition, we added a validation step and an API to seamlessly fit Skwiz into your existing business workflows.
The onboarding flow guides you, as a new user, through defining your first document type with minimal effort and introduces you to the document templates.
The core configuration in Skwiz: defining the document type, fields and extra instructions that will drive the extraction. You can define regular fields as well as fields that need to be extracted from line item tables. We’ve added tooltips and documentation to guide users in getting the most out of Skwiz.
Our document templates propose configurations for common document types like purchase orders, invoices or ID cards. These templates can be fully customized by adding or removing fields and writing instructions that better fit your specific documents.
Highly structured documents that always contain the same information like ID cards, passports or driver’s licenses can directly be used as is.
Alternatively, users can create their own document types from scratch when handling less common document types that are not yet included in the templates.
The document overview allows you to upload documents by drag-and-drop and shows all documents per status. The “Validate” status includes all documents that have been extracted and require manual validation by a user. After validation, documents move to the “Completed” status.
You can skip the manual validation step by configuring this in the document type settings. Documents then transition directly to the “Completed” status after extraction.
You can modify the values of extracted fields as well as data extracted from tables by selecting text directly on the document, avoiding the need for typing. You can also add or remove rows in the table output grid. Validating a document will put it in status “Completed”.
Skwiz has its own documentation, which in this first version focuses on helping you optimize the quality of your document extractions: by following the naming conventions, understanding the various field types available, and providing additional instructions for each field that requires optimisation.
The Skwiz API offers asynchronous extraction. You can generate multiple API keys.
The following routes are available and described in the API documentation:
Asynchronous extraction: upload your document
Get document extractions: retrieves the status and extracted values (if available) of 1 document
Get documents status: retrieves the status of one or more documents
We will soon add a webhook so that you’ll be able to receive the data from each document as soon as it’s extracted or validated in Skwiz.
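Because extraction is asynchronous, a client typically uploads a document and then polls for its status. The Python sketch below shows such a polling loop under stated assumptions. `get_status` stands in for your own wrapper around the Get documents status route; its URL and response format are described in the API documentation and not reproduced here. The status strings “Validate” and “Completed” are assumed to match those shown in the document overview, and “Processing” is a made-up intermediate status.

```python
import time
from typing import Callable

# Assumed status strings, named after the statuses in the document overview.
# "Validate": extracted, awaiting manual validation; "Completed": done.
DONE_STATUSES = {"Validate", "Completed"}


def wait_for_extraction(get_status: Callable[[str], str], document_id: str,
                        interval_s: float = 2.0, timeout_s: float = 120.0) -> str:
    """Poll get_status (hypothetical wrapper around the status route) until done."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(document_id)
        if status in DONE_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"document {document_id} still {status!r} after {timeout_s} s")
        time.sleep(interval_s)


# Usage with a fake status function; "Processing" is a made-up intermediate status.
responses = iter(["Processing", "Processing", "Validate"])
print(wait_for_extraction(lambda _id: next(responses), "doc-123", interval_s=0.01))  # Validate
```

A fixed interval keeps the sketch simple. A webhook, once available, would remove the need to poll altogether.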
You can manually export the extracted data from documents into Excel or JSON files, along with the documents themselves. To do this, select one or more documents from the document overview and click the export icon above the list. Alternatively, you can directly export a document’s data from the document validation screen.
Skwiz’s main purpose is to extract information from documents and give you access to the data, not to serve as a system of record. By default, documents are stored for 2 months, giving you sufficient time to validate the documents and access the extracted data. You can reduce the retention period in the settings, and documents can be manually deleted at any time from the document overview.