At an earlier stage, the ABBYY Cloud OCR SDK was going to be used for text recognition of image scans, and it worked so reliably for all images containing mainly text that it was already in production, but at some point it was simply removed from ABBYY's offering, although it is still in use by previous customers.
After trying out many alternatives, two different ones have been chosen, the first being Google's Vision API and, if it is not available (e.g. no apikey), OCRSpace. Google is free up to 1000 uses per month. OCRSpace is free continuously, but there is a limit of one megabyte for image files, unless you buy a monthly subscription. Use of Google Vision API requires a credit card, but there is no charge if usage does not exceed a certain level. Google API can be set to be restricted, such as it being available only for use with the Vision API and only within the limits of a specific website.
Google's Vision API can be used to e.g. quickly and easily read old receipts. Texts from books, magazines and websites are also very likely to be recognised in a very usable way.
Other alternatives that have been tried include Amazon and Microsoft's ones, as well as api4ai, but pricing and deployment issues made Google's Vision API the preferred option. Microsoft's service might have been a good choice, but when used via Eden AI, it could often, e.g. fail to see anything where "everyone else" recognized the text just fine. On the other hand, one could easily make ABBYY OCR SDK to produce extra letter artifacts (e.g. "anomalies apparent during visual representation"). After trying to use Microsoft's Azure AI Vision directly, interest was terminated by the notion that it began requiring organizational-level approval for username access or something like that.
OCR functionality is enabled both in the "particular browsing" view when an image is being viewed in the "Large preview" modal window, and in the item preview panel in the adequates view when an image or multi-image is selected. In the first case, the detected text can easily be placed on the clipboard by clicking it or press the Ctrl key while selecting part from the resulting text to do something with it, such as just copy it or use it to search for information from the web. In the second case the resulting text is automatically placed at the end of the item's textual content. Recognizing text from an image is usually a very fast operation when using the Google Vision API, as it takes about a second.
ABBYY's text recognition seems useful as it just doesn't leave much to complain about, when using it to images containing scanned texts or screenshots of webpages. As parameters it is possible to define in what languages texts are to be found.
444 VIL THE MECHANISM OF TIME-BINDING of it can be found by analysis practically everywhere. Our problem is to analyse the general case. Let us follow up roughly the process. We assume, for instance, an hypothetical case of an ideal observer who observes correctly and gives an impersonal, unbiased account of what he has observed. Let us assume that the happenings he has observed appeared as: O, and then a new happening ( occurred. At this level of observation, no speaking can be done, and, therefore, I use various fanciful symbols, and not words. The observer then gives a description of the above happenings, let us say a, b, c, d, . . . , x; then he makes an inference from these descriptions and reaches a con- clusion or forms a judgement A about these facts. Wc assume that facts unknown to him, which always exist, are not important in this case. Let us assume, also, that his conclusion seems correct and that the action A" which this conclusion motivates is appropriate. Obviously, we deal with at least three different levels of abstractions: the seen, experienced ., lower order abstractions (un-spcakable) ; then the descriptive level, and, finally, the inferential levels. Let us assume now another individual, Smiths ignorant of struc- ture or the orders of abstractions, of consciousness of abstracting, of s.r.; a politician or a preacher, let us say, a person who habitually iden- tifies, confuses his orders, uses inferential language for descriptions, and rather makes a business out of it. Let us assume that Smith, observes the 'same happenings’. He would witness the happenings O, |, ..... and the happening would appear new to him. The happenings O, be would describe in the form a, b, c, d, . . . , from which fewer descriptions he would form a judgement, reach a conclu- sion, B; which means that he would pass to another order of abstrac- tions. When the new happening occurs, he handles it with an already formed opinion B, and so his description of the happening ( is coloured by his older s.r and no longer the x of the ideal observer, but B(x) --- y. His description of ‘facts’ would not appear as the a, b, c, d, . . . , x, of the ideal observer but a, b, c, d,..., B(x) = y. Next he would abstract on a higher level, form a new judgement, about ‘facts’ a, b, c, d, . . . , B(x) =y, let us say, C. We see how the semantic error was produced. The happenings appeared the ‘same’, yet the unconscious identification of levels brought finally an entirely different conclusion to motivate a quite different action, A diagram will make this structurally clearer, as it is very difficult to explain this by words alone. On the Structural Differential it is shown without difficulty.
HIGHER ORDER ABSTRACTIONS 445 Seen happenings (un- IDEAL OBSERVER SMITH] speakable) (First order abstrac- tions) ............. Ik-5 .X Description III! I I I! I ( Second order abstrac- tions) ............. a, b, c, d, ... x a, b, c, d,... B(x)=y Inferences, conclusions, iqB and what not. I (Third order abstrac- tions) ............. A c Creeds and other se- I I mantic reactions.... A' c I Action A9 e Let us illustrate the foregoing with two clinical examples. In one case, a young boy persistently did not get up in the morning. In another case, a boy persistently took money from his mother’s pocketbook. In both cases, the actions were undesirable. In both cases, the parents unconsciously identified the levels, x was identified with B(x), and con- fused their orders of abstractions. In the first case, they concluded that the boy was lazy; in the second, that the boy was a thief. The parents, through semantic identification, read these inferences into every new ‘description’ of forthcoming facts, so that the parents’ new ‘facts’ became more and more semantically distorted and coloured in evaluation, and their actions more and more detrimental to all concerned. The general conditions in both families became continually worse, until the reading of inferences into descriptions by the ignorant parents produced a semantic background in the boys of driving them to murderous intents. A psychiatrist dealt with the problem as shown in the diagram of the ideal observer. The net result was that the one boy was not ‘lazy’, nor the other a ‘thief’, but that both were ill. After medical attention, of which the first step was to clarify the symbolic semantic situation, though not in such a general way as given here, all went smoothly. Two families were saved from crime and wreck. I may give another example out of a long list which it is unnecessary for our purpose to analyse, because as soon as the ‘consciousness of abstracting’ is acquired, the avoidance of these inherent semantic dif- ficulties becomes automatic. In a common fallacy of 'Petitio
Nobody likes it when possibly nostalgic places that may have been very significant to someone are presented as photographs in an online city-specific discussion group, to the extent that one might get surprised that they do not have time to relate their feelings about the present and their memories of the past to the kind of intrusive photographs, whose content and the feelings they evoke may not match the observer's personality at all. Hopefully, when the emphasis is not on location or temporality, the response is less likely to be so reluctant. Thus, these photographs taken a couple of decades ago could well be used to demonstrate text recognition from photographs.
However, it turns out that the usefulness of text recognition from photographs leads to the feeling that training of some artificial intelligence component might be required. The Cloudinary OCR addon used here is actually using the Google Vision API and, according to its documentation, cannot be given any parameters to guide text recognition if the texts consist of only Latin alphabets, i.e. the analysis results are the best available. The original images used in the analysis are 2015 x 1512 pixels. Google's Vision API also returns information about where in the image each text is found, which Cloudinary uses to automatically highlight those parts of the images where the analysis indicates that text is present. Discussions have been held with Cloudinary to see if a special pricing for OCR functionality could be made available to users of this publishing application (available on request).