OCRdesktop is a useful accessibility tool to grab content from the screen as text via OCR technology.
It takes an image of the current window or workspace, prepares it for better results and uses tesseract to recognize text on it. The result is presented in a caret enabled text area, in a detailed list with coordinates and confidence or in the clipboard. It also can emulate clicks on the text. It consists of two main parts.
1. The main window: This is a caret browsable text area with the recognized content. There is a menubar with many options. Focus the menu with
2. The Macro executor: this is a window where you can choose to
Save the current stored macros and preclicks. You also can skip running a macro by pressing the
cancel button. (See Macros and the preclick concept).
Assign the command
to a shortcut in your desktop environment. You also can use parameters to expand the function of OCRdesktop. For languages other than english, you need to set your language code
ocrdesktop -l <languagecode>
Use the tesseract language codes for <languagecode>
Basically, this should work in any desktop environment.
In Gnome you can do this via the
Gnome Control Center in the
Keyboard window under the
Just press the assigned shortcut. With no parameters, OCRdesktop will recognize just the current window and present it in a caret enabled text area.
OCRdesktop provides different view modes.
You can toggle between the modes with
OCRdesktop is able to use all available tesseract languages with
-l <language code>.
If no language is set. OCRdesktop will use english.
ocrdesktop -l deu
You can also set more than one OCR language
ocrdesktop -l deu+eng
OCRdesktop always upscales the current screenshot 3 times for better results.
Besides this, you can use different types of transformations before OCRdesktop attempts to recognize the text.
You can start OCRdesktop with the parameter you want, or select the options in the navigation window via the
OCR Options submenu of the
OCRdesktop menu. After selecting the options, press
F5 to recognize the text again. This is a little trial and error for better results. Autodetection may be be implemented in a later version.
You can also combine different options.
Inversion of the colours could lead to better results if the colours in the original image cause problems with the text recognition.
Here the colour is removed overall. We get a range of different tones of gray, which could lead to less confusion of tesseract.
This may be the method that most often leads to the best results. Grayscale is always active.
The different tones of gray will break on a defined value between 0 (white) and 255 (black). Everything less than the defined point will be converted to black. A gray tone equal or greater is converted into white.
This leads to a clean image for OCR. No Colours, no noise, no grayscale, just black and white. With this type of image tesseract can also read really bright colour fonts (because they are converted into black)
-b activates this feature.
-t <barriervalue> sets the barrier value. <barriervalue> is a integer between 0 and 255. If
-t is not set, 200 is the default value.
ocrdesktop -b -t 180
You can see a little help and the available parameters if you enter the following in a terminal.
You can always mix different parameters.
If you do not want to restrict recognition to the current window, use the
You can emulate clicks on the word at the current cursor position via the
Ctrl+l): common for selecting/activating entry's
Ctrl+d): common for opening entry's in the same window
Ctrl+r): open the context menu for the object under the mouse
Ctrl+m): Usually opens an object in a new tab
Ctrl+t): used for mouse over events like tool tips
for doing a mouse operation immediately: Place the mouse on the word in the text area or list entry (in the list view) and press on the corresponding shortcut.
The concept of preclicks is not easy to understand at first, but it solves a really easy to understand problem.
In most desktop environments, global shortcuts don't work while a menu is open, (for example the
file menu in the menu bar at the top of most programs).
Preclicks are basically macros that can be run before OCRdesktop takes its screen shot. This allows you to close all menus and let OCRdesktop click on the menu before it recognizes the window. Preclicks macros are really easy to use. In the
Interact menu is a check box
Ctrl+p. Set this check box and choose a mouse click that should be performed before OCRdesktop starts the next time, much the same as doing a normal mouse click emulation ( see Emulate mouse events). After emulating a mouse click, nothing will happen.
Next time you run OCRdesktop, it will ask you what to do. You can press
Run, so all stored clicks will execute. After that, OCRdesktop takes its screen shot for OCR (with the opened menu). If you now check the
Preclick option again the second click will also be stored, (e.g. for opening a sub menu). You can save as many mouse operations as you want. Choose
Unload in the macro window to erase the macro, so its lost. If you press
Cancel, no mouse clicks are performed, but the main window opens. The macro will not be deleted and you will be asked next time you start OCRDesktop if you want to run your stored clicks.
The active preclick macro file is stored under
You can execute an existing macro file stored anywhere on the hard disk by using the
-m </path/to/macro/macroname.ocrm> option.
ocrdesktop -m </path/to/macro/macroname.ocrm>
You can combine it with the
-n option. OCRdesktop will just start the click sequence for you without any GUI.
You can also fire keyboard shortcuts into the preclick macros. To enter the shortcut recording mode, press
Ctrl+k or select the
Send Key menue entry in the
Now every keystroke you type is appended to the currently active preclick macro. Pressing
F4 will leve the recording mode.
Leving the shortcut recording mode may have a delay of up to 2 seconds. This is a known issue.
You can Save, Load, Unload or Run the current macro from the
Ctrl+s): save a macro on the filesystem, in case you need it more often.
Ctrl+o): loading meens that this macro will be started the next time OCRdesktop runs
Ctrl+u): remove the macro from the buffer and prevent it from running. If you loaded the macro from the filesystem, the original macro file wonÂ´t be used
Ctrl+n): close the window an just run the macro now ==== Copy to clipboard ==== OCRdesktop provides the possibility to send the currently recognized content to the clipboard. This is easily done by specifying the-c
option. <code bash>ocrdesktop -c</code> This opens the main window and sends the content to the clipboard. If you dont want to open the main window, you could add the “no GUI” option-n
. <code bash>ocrdesktop -c -n
You can always use the short term.
Now you have the recognized text in the clipboard and no window appears.
You can also press
Ctrl+b when the GUI is open.
You can start the debugmode with the
ocrdesktop -v >> /tmp/debug.out
The debug output is send to the std output. So you have to pipe it.
OCRdesktop stores a copy of all images (original, upscaled and manipulated) in
/tmp/ in debug mode