OcrGui Manual 0.1
This manual is for OcrGui 0.2.1
Contents
Introduction
How to install OcrGui
Quick start
The main window
Saving multiple texts
The preferences window
Introduction
OcrGui is a GUI (Graphical User Interface) for OCR (Optical Character Recognition). It helps you extract text from scanned images. It is written in C using the GLib and GTK+ libraries and supports two open source OCR engines:
- Tesseract
- Gocr
How to install OcrGui
Open a terminal, change to the directory in which the file ocrgui-0.2.1.tar.gz was saved, and type the following commands:
tar -xvf ocrgui-0.2.1.tar.gz
cd ocrgui-0.2.1
./configure
make
make install
The last command, make install, must be run with root privileges.
Quick start
- Run OcrGui by clicking its menu entry or by typing ocrgui in a terminal window
- Click on Open an image file
- Select the image file to open
- Open the preferences window: File → Preferences
- Select a dictionary for Tesseract (OCR engine)
- Select a dictionary for Hunspell (spellcheck program)
- Click on Extract text from image. The text will appear in the left panel
- Click on Spellcheck with Hunspell to check the text
- Hunspell searches the dictionary for similar words. If Hunspell does not find anything, or none of the suggested words is correct, a new word can be inserted
- Click on Save text in a file
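This manual does not describe how OcrGui exchanges data with Hunspell. One common way for a front end to obtain the "similar words" mentioned above is Hunspell's ispell-compatible pipe mode (hunspell -a). In that mode Hunspell prints one line per checked word:
- * (or + / -) if the word is correct
- "& word count offset: suggestions" if the word is misspelled and suggestions exist
- "# word offset" if nothing similar was found

The C sketch below assumes that interface and parses a single output line. The names spell_result and parse_hunspell_line are hypothetical and are not OcrGui code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of one line printed by `hunspell -a`. */
typedef enum {
    WORD_OK,              /* "*", "+ root" or "-": the word is correct */
    WORD_SUGGESTIONS,     /* "& word count offset: s1, s2, ..." */
    WORD_NO_SUGGESTIONS,  /* "# word offset" */
    WORD_OTHER            /* version banner, empty separator line, etc. */
} spell_status;

typedef struct {
    spell_status status;
    char word[64];          /* misspelled word (empty for WORD_OK) */
    int count;              /* number of suggestions announced by Hunspell */
    char suggestions[256];  /* suggestion list as printed, without '\n' */
} spell_result;

static spell_result parse_hunspell_line(const char *line)
{
    spell_result r = { WORD_OTHER, "", 0, "" };
    int offset; /* position of the word in the input line (not used here) */

    assert(line != NULL);
    switch (line[0]) {
    case '*':
    case '+':
    case '-':
        r.status = WORD_OK;
        break;
    case '&': {
        const char *colon = strchr(line, ':');
        if (colon != NULL &&
            sscanf(line, "& %63s %d %d", r.word, &r.count, &offset) == 3) {
            const char *list = colon + 1;
            if (*list == ' ')
                list++;                        /* skip the space after ':' */
            snprintf(r.suggestions, sizeof r.suggestions, "%s", list);
            r.suggestions[strcspn(r.suggestions, "\n")] = '\0';
            r.status = WORD_SUGGESTIONS;
        }
        break;
    }
    case '#':
        if (sscanf(line, "# %63s %d", r.word, &offset) == 2)
            r.status = WORD_NO_SUGGESTIONS;
        break;
    default:
        break;
    }
    return r;
}
```

A caller would proceed as follows:
1. Write each line of extracted text to hunspell -a.
2. Read the replies line by line. An empty line marks the end of the replies for one input line.
3. For WORD_SUGGESTIONS, split the suggestion list on ", " to fill the list offered to the user.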
The main window
Run OcrGui by clicking its menu entry or by typing ocrgui in a terminal window. The main window is made up of the following parts:

(1) Main toolbar
(2) Image toolbar
(3) Text toolbar
(4) List of opened images
(5) Image panel
(6) Text panel
To open an image, select File → Open or click the Open an image button in the main toolbar (1). OcrGui can open several images at the same time. In the figure below, the files text1.jpg and text2.jpg have been opened.

To select an image, double-click its icon. To select more than one image, hold down the Ctrl key while double-clicking. Once an image is selected, it can be closed or processed. To close an image, select File → Close or click the Close button in the main toolbar (1).
To process an image, select Image → Recognition or click the Extract text button in the image toolbar (2).
To check the extracted text, select Text → Spell check or click on the Spell check button in the text toolbar (3).
To save the text in a file, select Text → Save or click on the Save text button in the text toolbar (3).
Saving multiple texts
It is possible to save more than one extracted text in the same file:
- Open two or more images
- Extract the texts, selecting one image at a time
- Select all the images in the list by clicking their icons while holding down the Ctrl key
- Select Text → Save multiple text or click the Save button in the main toolbar (1)
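The procedure above writes several texts into one file. The C sketch below illustrates the joining step under two assumptions:
- the texts are written in the order in which the images appear in the selection
- when the Insert blank rows option in the preferences window is enabled, one empty line separates consecutive texts

The name join_texts is hypothetical and does not come from OcrGui's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate `n` extracted texts into one newly
 * allocated string. Each text except the last is ended with '\n' if it
 * does not already end with one; if `blank_rows` is non-zero, an empty
 * line is added between one text and the next. Caller must free(). */
static char *join_texts(const char *const texts[], size_t n, int blank_rows)
{
    assert(n == 0 || texts != NULL);

    size_t total = 1;                          /* terminating '\0' */
    for (size_t i = 0; i < n; i++)
        total += strlen(texts[i]) + 2;         /* room for up to two '\n' */

    char *out = malloc(total);
    if (out == NULL)
        return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(texts[i]);
        memcpy(p, texts[i], len);
        p += len;
        if (i + 1 < n) {
            if (len == 0 || texts[i][len - 1] != '\n')
                *p++ = '\n';                   /* end the text's last row */
            if (blank_rows)
                *p++ = '\n';                   /* the blank row */
        }
    }
    *p = '\0';
    return out;
}
```

The returned buffer would then be written to the chosen file, for example with fputs(), and released with free().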
The preferences window
To open the preferences window, select File → Preferences. The window contains the following panels:
- General panel
- Programs panel
- Tesseract panel
- Gocr panel
- Spell check panel
It is possible to choose the font face and size used to display the extracted texts.
The layout of the main window can be set to vertical or horizontal. Choosing the right layout makes it easier to compare the image with the text during the spell check.

Default folder: the folder proposed when saving text files.
Remove new line character: if checked, new line characters are removed when the text is saved, so the resulting text file contains a single row.
Substitute with a blank space: if checked, new line characters are replaced with a blank space.
Insert blank rows: when saving multiple texts, blank rows are inserted between each text and the next one.
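The two new-line options above change the text just before it is written. The C sketch below illustrates that transformation. It assumes the text is a NUL-terminated string whose rows end with '\n'. The names newline_mode and flatten_newlines are hypothetical, chosen to mirror the check boxes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical modes mirroring the two check boxes. */
typedef enum {
    NEWLINES_KEEP,      /* neither option checked */
    NEWLINES_REMOVE,    /* "Remove new line character" */
    NEWLINES_TO_SPACE   /* "Substitute with a blank space" */
} newline_mode;

/* Apply `mode` to `text` in place and return it. The string can only
 * shrink or keep its length, so no reallocation is needed. */
static char *flatten_newlines(char *text, newline_mode mode)
{
    assert(text != NULL);
    if (mode == NEWLINES_KEEP || strchr(text, '\n') == NULL)
        return text;                    /* nothing to do */

    char *dst = text;
    for (const char *src = text; *src != '\0'; src++) {
        if (*src != '\n')
            *dst++ = *src;
        else if (mode == NEWLINES_TO_SPACE)
            *dst++ = ' ';
        /* NEWLINES_REMOVE: the '\n' is simply skipped */
    }
    *dst = '\0';
    return text;
}
```

Removing the new line characters outright glues the last word of one row to the first word of the next ("one\ntwo" becomes "onetwo"). For running text, substituting a blank space is therefore usually the better choice.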

Temporary folder: the folder in which OcrGui saves its temporary files.

OcrGui checks whether the programs it needs are installed.

Tesseract or Gocr is needed to extract text from images.
Convert is part of ImageMagick and is needed to convert the opened images into the input format used by Tesseract or Gocr.
Hunspell performs the spell check (optional).
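The manual does not show how OcrGui detects these programs. This sketch assumes a common approach on Unix-like systems: look for an executable file with the program's name (tesseract, gocr, convert, hunspell) in each directory listed in the PATH environment variable. The POSIX C sketch below implements that search. The name program_in_path is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Regular file with execute permission for the current user? */
static int is_executable_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode) && access(path, X_OK) == 0;
}

/* Hypothetical helper: return 1 if an executable called `name` exists in
 * one of the directories listed in $PATH, 0 otherwise. */
static int program_in_path(const char *name)
{
    assert(name != NULL && *name != '\0' && strchr(name, '/') == NULL);

    const char *path = getenv("PATH");
    if (path == NULL)
        return 0;

    char candidate[4096];
    for (const char *start = path;;) {
        const char *end = strchr(start, ':');
        size_t len = end != NULL ? (size_t)(end - start) : strlen(start);
        int n = len == 0
            ? snprintf(candidate, sizeof candidate, "./%s", name) /* empty entry = current dir */
            : snprintf(candidate, sizeof candidate, "%.*s/%s", (int)len, start, name);
        if (n > 0 && (size_t)n < sizeof candidate && is_executable_file(candidate))
            return 1;
        if (end == NULL)
            return 0;
        start = end + 1;
    }
}
```

With this helper, the requirements listed above become two checks:
- program_in_path("convert"), plus program_in_path("tesseract") or program_in_path("gocr"), must succeed for text extraction
- program_in_path("hunspell") only enables the optional spell check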
This is the configuration panel for Tesseract.

Tesseract needs a dictionary for optimal results. The list shows all installed Tesseract dictionaries; choose the language of the text to be extracted from the image. If no dictionary is listed, or the needed one is not shown, the path of the folder containing the dictionary files can be entered manually (e.g. /home/emanuele/tesseract_dictionaries). Click the Find button to confirm the path and search it for dictionaries.
The Make Tesseract your preferred OCR engine flag selects Tesseract as the engine used for text recognition.
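The manual does not say which files the Find button looks for. The sketch below assumes the layout used by Tesseract 3.x and later, where each language is a single <lang>.traineddata file (for example ita.traineddata). The language code is the file name without that suffix. Older Tesseract releases split a language over several files, so treat this as an illustration only. The sketch lists the languages found in a folder such as /home/emanuele/tesseract_dictionaries. Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

#define TESS_SUFFIX ".traineddata"   /* assumption: Tesseract 3.x+ naming */

/* If `filename` is "<lang>.traineddata", copy <lang> into `lang`
 * (at most `size` bytes including '\0') and return 1; otherwise 0. */
static int tesseract_language(const char *filename, char *lang, size_t size)
{
    assert(filename != NULL && lang != NULL && size > 0);
    size_t len = strlen(filename), slen = strlen(TESS_SUFFIX);
    if (len <= slen || strcmp(filename + len - slen, TESS_SUFFIX) != 0)
        return 0;
    size_t code_len = len - slen;
    if (code_len >= size)
        return 0;                      /* language code too long */
    memcpy(lang, filename, code_len);
    lang[code_len] = '\0';
    return 1;
}

/* Print each language found in `dir`; return how many were found,
 * or -1 if the folder cannot be opened. */
static int list_tesseract_languages(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int found = 0;
    char lang[64];
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (tesseract_language(entry->d_name, lang, sizeof lang)) {
            printf("%s\n", lang);
            found++;
        }
    }
    closedir(d);
    return found;
}
```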
This is the configuration panel for Gocr.

The Make Gocr your preferred OCR engine flag selects Gocr as the engine used for text recognition.
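This manual does not document the exact commands OcrGui runs. The C sketch below shows one plausible sequence, with its assumptions stated here and in the comments:
1. ImageMagick's convert turns the opened image into a file the engine can read (assumed TIFF for Tesseract, PNM for Gocr).
2. The preferred engine is started with its usual command-line form: tesseract <image> <outbase> -l <lang> (which writes <outbase>.txt), or gocr -i <image> -o <file>.

The helper names and the file paths in the test are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONVERT_ARGV_MAX 4   /* "convert", input, output, NULL */
#define ENGINE_ARGV_MAX 6    /* program name + 4 arguments + NULL */

typedef enum { ENGINE_TESSERACT, ENGINE_GOCR } ocr_engine;

/* Hypothetical: argv converting the opened image into the engine's
 * input file (the output format follows the file name's extension). */
static void build_convert_argv(const char *argv[CONVERT_ARGV_MAX],
                               const char *image, const char *converted)
{
    argv[0] = "convert";
    argv[1] = image;
    argv[2] = converted;
    argv[3] = NULL;                         /* execvp() needs a NULL end */
}

/* Hypothetical: argv for the preferred engine. `out` is the output base
 * name for Tesseract or the full output file name for Gocr. Returns the
 * number of arguments, excluding the terminating NULL. */
static size_t build_engine_argv(const char *argv[ENGINE_ARGV_MAX], ocr_engine engine,
                                const char *converted, const char *out, const char *lang)
{
    size_t n = 0;

    assert(converted != NULL && out != NULL);
    if (engine == ENGINE_TESSERACT) {
        argv[n++] = "tesseract";
        argv[n++] = converted;
        argv[n++] = out;                    /* Tesseract appends ".txt" itself */
        if (lang != NULL && lang[0] != '\0') {
            argv[n++] = "-l";               /* dictionary chosen in the panel */
            argv[n++] = lang;
        }
    } else {
        argv[n++] = "gocr";
        argv[n++] = "-i";
        argv[n++] = converted;
        argv[n++] = "-o";
        argv[n++] = out;
    }
    argv[n] = NULL;
    return n;
}
```

Each array can be handed to execvp(argv[0], (char *const *)argv) in a child process. Because the sketch builds an argument vector rather than a single shell command string, paths containing spaces need no quoting.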
This is the configuration panel for Hunspell.

Hunspell needs a dictionary to perform the spell check. The list shows all installed Hunspell dictionaries; choose the language of the extracted text. If no dictionary is listed, or the needed one is not shown, the dictionary path and name can be entered manually (e.g. /home/emanuele/hunspell_dictionaries/it_IT).
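A Hunspell dictionary consists of two files sharing one base name: <name>.aff (affix rules) and <name>.dic (word list). hunspell -d expects that base name without extension, as in the example path above. The C sketch below checks that both files exist and are readable before such a path is accepted. It is a hypothetical helper, not OcrGui code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if both `<base>.aff` and `<base>.dic`
 * exist and are readable, 0 otherwise. `base` has no extension,
 * e.g. "/home/emanuele/hunspell_dictionaries/it_IT". */
static int hunspell_dictionary_exists(const char *base)
{
    static const char *const extensions[] = { ".aff", ".dic" };
    char path[4096];

    assert(base != NULL);
    for (size_t i = 0; i < sizeof extensions / sizeof extensions[0]; i++) {
        int n = snprintf(path, sizeof path, "%s%s", base, extensions[i]);
        if (n < 0 || (size_t)n >= sizeof path)
            return 0;                      /* path too long */
        if (access(path, R_OK) != 0)
            return 0;                      /* file missing or unreadable */
    }
    return 1;
}
```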