Saturday, November 21, 2020

Read (Extract) Text from Image (OCR) in ASP.Net using C# and VB.Net

In this article I will explain how to read or extract text from an image using Microsoft Office Document Imaging (MODI) in ASP.Net with C# and VB.Net.
This process of reading or extracting text from images is known as Optical Character Recognition (OCR).


To illustrate the process, I am creating an example where an image containing some text is uploaded, the text is read from the image using OCR, and finally the extracted text is displayed in an ASP.Net Label control.
 

 
Downloading and installing the Microsoft Office Document Imaging (MODI)
To install Microsoft Office Document Imaging (MODI), you need to download Microsoft Office SharePoint Designer 2007 using the download link provided below.
Once it is downloaded, start the installation and click the Customize button in the installer window.
 
The installer will then list all the installable items. From the list, look for Microsoft Office Document Imaging and select Run all from My Computer.
 
Also make sure you select Run all from My Computer for Scanning, OCR and Indexing Service Filter by expanding the Microsoft Office Document Imaging node.
 
Now click the Continue button and, after the installation is complete, restart your machine for the changes to take effect.
 
 
Adding a reference to Microsoft Office Document Imaging (MODI) in your Visual Studio project
To add a reference to Microsoft Office Document Imaging (MODI), right click the project in Solution Explorer and click Add Reference. Inside the COM tab, look for Microsoft Office Document Imaging 12.0 Type Library, select it and click OK.
 
You should now be able to see the Interop.MODI.dll in your project.
 
 
HTML Markup
The HTML markup consists of an ASP.Net FileUpload control, a Button and a Label control.
Select File:
<asp:FileUpload ID="FileUpload1" runat="server" />
<asp:Button Text="Upload" runat="server" OnClick="Upload" />
<hr />
<asp:Label ID="lblText" runat="server" />
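
Since the FileUpload control above accepts any kind of file, it may be worth checking the file name on the server before it is handed over for OCR. The following is only a minimal sketch: the UploadValidator class, the IsAllowedImage method and the list of extensions are hypothetical names and values chosen for illustration, and the list should be adjusted to the formats your MODI installation actually handles.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadValidator
{
    // Hypothetical allow-list (an assumption, not taken from MODI itself).
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".tif", ".tiff", ".bmp", ".jpg", ".jpeg", ".png", ".gif"
        };

    // Returns true when the file name ends with an extension on the allow-list.
    public static bool IsAllowedImage(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName))
        {
            return false;
        }

        // GetExtension returns the extension including the leading dot,
        // or an empty string when the name has no extension.
        string extension = Path.GetExtension(fileName);
        return AllowedExtensions.Contains(extension);
    }
}
```

In the Upload event handler this could be used as a guard, for example by returning early with a message in the Label when FileUpload1.HasFile is false or UploadValidator.IsAllowedImage(FileUpload1.FileName) returns false.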
 
 
Namespaces
You will need to import the following namespaces.
C#
using MODI;
using System.IO;
 
VB.Net
Imports MODI
Imports System.IO
 
 
Reading or extracting text from image using Microsoft Office Document Imaging (MODI)
Once a file is selected and the Upload button is clicked, the Upload event handler is executed. The file is first saved inside the Uploads folder, and then the file path is supplied to the ExtractTextFromImage method.
The ExtractTextFromImage method opens the saved file using the MODI Document object, runs OCR on it, and returns the extracted text.
The extracted text is assigned to the Label control.
Note: Before assigning the text to the Label control, the new line character is replaced with “<br />” so that line breaks are displayed on the web page. For Windows and Console applications this step is not needed.
C#
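protected void Upload(object sender, EventArgs e)
{
    // Save the uploaded file inside the Uploads folder.
    string filePath = Server.MapPath("~/Uploads/" + Path.GetFileName(FileUpload1.PostedFile.FileName));
    FileUpload1.SaveAs(filePath);

    // Run OCR on the saved file and display the text, converting new lines to HTML line breaks.
    string extractText = this.ExtractTextFromImage(filePath);
    lblText.Text = extractText.Replace(Environment.NewLine, "<br />");
}
 
private string ExtractTextFromImage(string filePath)
{
    // Open the saved image file as a MODI document.
    Document modiDocument = new Document();
    modiDocument.Create(filePath);

    // Perform OCR using the English language setting.
    modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH);

    // Read the recognized text of the first image in the document.
    MODI.Image modiImage = (modiDocument.Images[0] as MODI.Image);
    string extractedText = modiImage.Layout.Text;

    // Close the document before returning the text.
    modiDocument.Close();
    return extractedText;
}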
 
VB.Net
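Protected Sub Upload(sender As Object, e As EventArgs)
    ' Save the uploaded file inside the Uploads folder.
    Dim filePath As String = Server.MapPath("~/Uploads/" + Path.GetFileName(FileUpload1.PostedFile.FileName))
    FileUpload1.SaveAs(filePath)

    ' Run OCR on the saved file and display the text, converting new lines to HTML line breaks.
    Dim extractText As String = Me.ExtractTextFromImage(filePath)
    lblText.Text = extractText.Replace(Environment.NewLine, "<br />")
End Sub
 
Private Function ExtractTextFromImage(filePath As String) As String
    ' Open the saved image file as a MODI document.
    Dim modiDocument As New Document()
    modiDocument.Create(filePath)

    ' Perform OCR using the English language setting.
    modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH)

    ' Read the recognized text of the first image in the document.
    Dim modiImage As MODI.Image = TryCast(modiDocument.Images(0), MODI.Image)
    Dim extractedText As String = modiImage.Layout.Text

    ' Close the document before returning the text.
    modiDocument.Close()
    Return extractedText
End Function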
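
The new line replacement described in the note can be taken one step further. Recognized text may contain characters such as < or &, which a browser would treat as markup when written into the Label as-is. The sketch below is one possible way to handle this, not part of the original example: OcrTextFormatter and ToHtml are hypothetical names, and the helper relies only on WebUtility.HtmlEncode from System.Net.

```csharp
using System;
using System.Net;

public static class OcrTextFormatter
{
    // Encodes OCR output for display in HTML and turns line breaks into <br /> tags.
    public static string ToHtml(string ocrText)
    {
        if (string.IsNullOrEmpty(ocrText))
        {
            return string.Empty;
        }

        // Encode first, so that < and & in the recognized text are shown
        // literally instead of being interpreted as markup.
        string encoded = WebUtility.HtmlEncode(ocrText);

        // Normalize \r\n and \r to \n, then replace each line break with <br />.
        return encoded.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "<br />");
    }
}
```

With such a helper, the last line of the Upload handler could become lblText.Text = OcrTextFormatter.ToHtml(extractText). The line breaks are normalized before replacement so that the result does not depend on which new line convention the OCR text uses.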
 
Screenshots
Image with some text
 
The text read from the above image and displayed in Label control
