Transwarp is committed to building enterprise-level big data infrastructure software, providing enterprises with software and support across the whole data lifecycle to build the data world of the future.
Sophon OCR is Sophon's enterprise-level lightweight text recognition platform. It provides three recognition modes (full-text recognition, standard recognition and custom template recognition) and more than 20 recognition capabilities covering cards, invoices, bank receipts and hybrid forms. Built on self-developed high-performance algorithms, Sophon OCR performs well in general recognition scenarios and is also widely deployed in vertical areas such as finance, logistics and healthcare.
What can Sophon OCR offer?
Generic Full-text Recognition
In generic recognition scenarios, Sophon OCR provides a multi-scenario recognition service covering full text, full tables, mixed text and tables, and handwriting. It accurately locates charts and text and outputs the corresponding structured results. In particular, Sophon OCR handles frameless tables well, which are a pain point in the industry.
Custom Template Recognition
Sophon OCR provides a template recognition function that accurately classifies and recognizes fixed-format images with differing content, such as bank receipts and bank slips. You only need to create a recognition template and configure its recognition areas, and the system identifies multiple types of data automatically, with no manual classification required.
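The idea of a template with configured recognition areas can be illustrated with a minimal sketch. This is not Sophon OCR's implementation: the `Box`, `Word` and `extract_fields` names, the pixel coordinates and the sample words are all assumptions made for illustration. The sketch assigns each recognized word to the named area that contains its center.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass(frozen=True)
class Word:
    """One recognized text fragment and where it was found."""
    text: str
    box: Box

    @property
    def center(self) -> tuple[float, float]:
        return ((self.box.left + self.box.right) / 2,
                (self.box.top + self.box.bottom) / 2)


def extract_fields(template: dict[str, Box], words: list[Word]) -> dict[str, str]:
    """Assign each word to the template area that contains its center."""
    fields = {}
    for name, area in template.items():
        inside = [w for w in words if area.contains(*w.center)]
        # Read left to right within the area.
        inside.sort(key=lambda w: w.box.left)
        fields[name] = " ".join(w.text for w in inside)
    return fields


# Hypothetical template for one bank's receipt layout.
RECEIPT_TEMPLATE = {
    "customer_name": Box(100, 40, 400, 70),
    "amount": Box(100, 80, 400, 110),
}

# Made-up OCR output, deliberately out of reading order.
words = [
    Word("Ltd", Box(180, 45, 220, 65)),
    Word("Acme", Box(110, 45, 170, 65)),
    Word("1,250.00", Box(110, 85, 200, 105)),
    Word("RECEIPT", Box(10, 5, 90, 30)),  # outside every area, ignored
]

result = extract_fields(RECEIPT_TEMPLATE, words)
print(result)  # {'customer_name': 'Acme Ltd', 'amount': '1,250.00'}
```

Fixed areas like these only work when every image of a given type shares the same layout, which is why each bank's receipt would need its own template in this simplified picture.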
Standard Card Recognition
Sophon OCR recognizes common documents such as ID cards, household registration books, business licences, bank cards, train tickets and VAT invoices. Through continued accumulation and refinement of business data, its recognition models perform above the general industry level.
Batch Job Management
Sophon OCR offers a job module for batch recognition: users only need to select a recognition service and the data to be processed. Structured results can be exported as JSON, Word or Excel files according to business needs.
File System Management
Sophon OCR has built-in file system management, which allows users to seamlessly interface with custom file systems. The synchronisation function allows incremental changes to be made to the original data set without affecting the previous recognition results, ensuring simplicity of operation and privacy of data.
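One common way to make synchronisation incremental is to keep a content fingerprint per file and process only files whose fingerprint is new or changed. The sketch below shows that general technique, not the product's actual mechanism; `sync`, `fingerprint` and `fake_recognize` are hypothetical names, and the recognizer is a stand-in.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Content hash used to detect new or modified files."""
    return hashlib.sha256(data).hexdigest()


def sync(files: dict[str, bytes], manifest: dict[str, str],
         results: dict[str, str], recognize) -> list[str]:
    """Recognize only files whose content is new or changed.

    manifest maps path -> fingerprint from the previous sync;
    results maps path -> previously stored recognition result.
    Both are updated in place. Returns the paths that were processed.
    """
    processed = []
    for path, data in sorted(files.items()):
        digest = fingerprint(data)
        if manifest.get(path) == digest:
            continue  # unchanged: keep the earlier result as it is
        results[path] = recognize(data)
        manifest[path] = digest
        processed.append(path)
    return processed


def fake_recognize(data: bytes) -> str:
    """Stand-in for a real recognition call (illustration only)."""
    return data.decode("utf-8").upper()


manifest, results = {}, {}
first = sync({"a.png": b"invoice"}, manifest, results, fake_recognize)
second = sync({"a.png": b"invoice", "b.png": b"receipt"},
              manifest, results, fake_recognize)
print(first)    # ['a.png']
print(second)   # ['b.png']
print(results)  # {'a.png': 'INVOICE', 'b.png': 'RECEIPT'}
```

On the second run only `b.png` is processed, and the stored result for `a.png` is left untouched.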
Flow Diagram of Recognition in Sophon OCR
Why should you choose Sophon OCR?
Accurate Template Recognition
Sophon OCR implements custom template recognition for complex mixes of text and tables with multiple fonts and arbitrary layouts. This enables accurate recognition of fixed-layout images across many mixed table-and-text scenarios, with industry-leading results.
Rich Pre- and Post-processing
Sophon OCR provides pre-processing functions such as image quality optimization and format conversion to handle seals and watermarks, low resolution, deformation and similar problems, so that images are in the best possible state before recognition. It also provides flexible post-processing functions, such as intelligent error correction and storage format conversion, to improve accuracy and connect seamlessly to downstream services.
Efficient Batch Operations
Sophon OCR provides batch recognition, data source management and status monitoring for any recognition mode. After recognition, results and templates can be checked and persisted in the job, which greatly improves production efficiency.
Leading Self-developed Algorithms
With its self-developed cross-modal, multi-dimensional feature fusion algorithm and adaptive intelligent template matching algorithm, Sophon OCR can batch-process text structuring for fixed-style images and also perform structured recognition on images in which text position and table cell size vary elastically.
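To see why elastic layouts need more than fixed coordinates, consider a much simpler stand-in technique: locating a value relative to its label instead of at an absolute position. The sketch below is a plain keyword-anchored lookup, not the self-developed algorithms described above; `Token`, `value_right_of`, the tolerance and the sample scans are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Recognized text with the top-left corner of its bounding box."""
    text: str
    x: int
    y: int


def value_right_of(label: str, tokens: list[Token], row_tolerance: int = 10):
    """Return the text of the nearest token to the right of `label` on the same row."""
    anchors = [t for t in tokens if t.text == label]
    if not anchors:
        return None
    anchor = anchors[0]
    candidates = [
        t for t in tokens
        if t.x > anchor.x and abs(t.y - anchor.y) <= row_tolerance
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.x - anchor.x).text


# The same receipt type scanned twice (made-up data);
# the second scan is shifted and stretched.
scan_a = [Token("Amount", 50, 100), Token("1,250.00", 160, 102),
          Token("Date", 50, 140), Token("2023-05-01", 160, 141)]
scan_b = [Token("Amount", 80, 230), Token("980.00", 240, 228),
          Token("Date", 80, 300), Token("2023-06-12", 240, 303)]

print(value_right_of("Amount", scan_a))  # 1,250.00
print(value_right_of("Amount", scan_b))  # 980.00
```

Because the lookup depends only on the relative position of label and value, the same rule works for both scans even though every absolute coordinate differs.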
How do you deploy Sophon OCR?
Instead of building your own servers and managing complex hardware resources and computational scheduling, you can simply call the recognition service through an API and have your business data recognized on cloud nodes.
Sophon OCR can also be deployed privately, either as a software package or as an integrated hardware-software appliance. Recognition of business data is then completed at the customer site over the private network.
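As a rough picture of what calling a recognition service over HTTP might look like, the sketch below builds (but does not send) a JSON request using only the Python standard library. The endpoint URL, the `mode` and `image` field names and the bearer-token header are assumptions for illustration; they are not taken from Sophon OCR's documentation, which should be consulted for the real interface.

```python
import base64
import json
import urllib.request


def build_request(image_bytes: bytes, mode: str, token: str) -> urllib.request.Request:
    """Build a POST request carrying a base64-encoded image (hypothetical schema)."""
    payload = {
        "mode": mode,  # assumed field name, e.g. "full_text"
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url="https://ocr.example.com/api/v1/recognize",  # placeholder endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_request(b"fake image bytes", mode="full_text", token="YOUR_TOKEN")
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(sorted(body))  # ['image', 'mode']

# Sending would be a single call against a real endpoint:
#     with urllib.request.urlopen(request) as response:
#         result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without network access.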
Optimizing the Finance Department's Workflow at Transwarp
Digitization of transaction work orders at an exchange
○ Bank receipts must be entered into the ERP system and associated with a contract number. Receipt formats differ from bank to bank, and almost all the fields on a receipt need to be recorded, which creates a huge manual-entry workload.
○ All invoices in the reimbursement process, including VAT invoices and taxi invoices, need to be entered manually, resulting in low work efficiency.
○ Through template recognition technology, key fields on the bank receipt such as customer name, amount and transaction time are recognized and extracted in structured form. In addition, a dynamic template matching algorithm enables structured recognition of images in which text position and table cell size vary, ensuring the accuracy of template matching.
○ Standard invoices, such as VAT invoices, are recognized with high accuracy thanks to the accumulation of a large amount of business data and the training and optimisation based on it.
○ Using a standard API, ERP developers can quickly integrate the recognition services and put them online for testing.
Sophon OCR has helped internal finance colleagues complete the structured extraction of key information from various types of bank receipts from over 15 banks, with a template matching rate above 99%. Finance staff only need to upload the original receipt data to the system, and classification and recognition are completed automatically.
After desensitisation, the company's own internal data can be used for continuous training and iteration of the recognition models, effectively enhancing the accuracy and generalisation capability of the various receipt recognition models.