In this article I will explain, with an example, how to convert an image to text using Microsoft Office Document Imaging (MODI) in ASP.Net with C# and VB.Net.
The process of reading or extracting text from images is also known as Optical Character Recognition (OCR).
To illustrate the process, I will build an example in which an image containing some text is uploaded, the text is read from the image using OCR, and the extracted text is displayed in an ASP.Net Label control.
Downloading and installing the Microsoft Office Document Imaging (MODI)
To install Microsoft Office Document Imaging (MODI), download Microsoft Office SharePoint Designer 2007 using the download link provided below.
Once it is downloaded, start the installation and click the Customize button in the installer window, as shown below.
The installer will then list all the installable items. In the list, find Microsoft Office Document Imaging and select Run all from My Computer.
Also expand the Microsoft Office Document Imaging node and make sure Run all from My Computer is selected for Scanning, OCR and Indexing Service Filter.
Finally, click the Continue button and, once the installation is complete, restart your machine for the changes to take effect.
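After the restart, it can be useful to confirm that the MODI COM component is actually registered before using it in a project. The following C# snippet is a minimal sketch and not part of the original setup steps. It assumes that MODI registers its Document class under the ProgID MODI.Document and that the check runs on Windows. The class and method names (ModiCheck, IsComClassRegistered, IsModiInstalled) are hypothetical.

```csharp
using System;

public static class ModiCheck
{
    // Assumption: MODI registers its Document class under this ProgID.
    private const string ModiProgId = "MODI.Document";

    // Returns true when a COM class is registered for the given ProgID.
    // Type.GetTypeFromProgID returns null when the ProgID cannot be resolved.
    public static bool IsComClassRegistered(string progId)
    {
        return Type.GetTypeFromProgID(progId) != null;
    }

    // Convenience wrapper for the assumed MODI ProgID.
    public static bool IsModiInstalled()
    {
        return IsComClassRegistered(ModiProgId);
    }
}
```

Calling ModiCheck.IsModiInstalled() from a small console application or a test page gives a quick yes/no answer. Note that COM registration is looked up per process bitness, so the result may differ between a 32-bit and a 64-bit process.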
Adding a reference to Microsoft Office Document Imaging (MODI) in your Visual Studio project
To add a reference to Microsoft Office Document Imaging (MODI), right-click the project in Solution Explorer and click Add Reference. In the COM tab, look for Microsoft Office Document Imaging 12.0 Type Library, select it and click OK.
You should now be able to see the Interop.MODI.dll in your project.
HTML Markup
The HTML markup consists of an ASP.Net FileUpload control, a Button and a Label control.
Select File:
<asp:FileUpload ID="FileUpload1" runat="server" />
<asp:Button Text="Upload" runat="server" OnClick="Upload" />
<hr />
<asp:Label ID="lblText" runat="server" />
Namespaces
You will need to import the following namespaces.
C#
using MODI;
using System.IO;
VB.Net
Imports MODI
Imports System.IO
Reading or extracting text from image using Microsoft Office Document Imaging (MODI)
Once a file is selected and the Upload button is clicked, the Upload event handler is executed. It first saves the file inside the Uploads folder (the folder must already exist in the application) and then passes the file path to the ExtractTextFromImage method.
The ExtractTextFromImage method opens the saved file using a MODI Document object, runs OCR on it, and returns the text extracted from the image.
The extracted text is then assigned to the Label control.
C#
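protected void Upload(object sender, EventArgs e)
{
    // Do nothing if the form was posted without a file.
    if (!FileUpload1.HasFile)
    {
        return;
    }

    // Save the uploaded file inside the Uploads folder.
    string filePath = Server.MapPath("~/Uploads/" + Path.GetFileName(FileUpload1.PostedFile.FileName));
    FileUpload1.SaveAs(filePath);

    // Extract the text and show it, converting line breaks to <br /> tags.
    string extractText = this.ExtractTextFromImage(filePath);
    lblText.Text = extractText.Replace(Environment.NewLine, "<br />");
}

private string ExtractTextFromImage(string filePath)
{
    // Load the image file into a MODI document and run OCR on it.
    Document modiDocument = new Document();
    modiDocument.Create(filePath);
    modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH);

    // Read the recognized text from the first image (page) of the document.
    MODI.Image modiImage = (modiDocument.Images[0] as MODI.Image);
    string extractedText = modiImage.Layout.Text;

    // Close the document before returning the text.
    modiDocument.Close();
    return extractedText;
}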
VB.Net
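Protected Sub Upload(sender As Object, e As EventArgs)
    ' Do nothing if the form was posted without a file.
    If Not FileUpload1.HasFile Then
        Return
    End If

    ' Save the uploaded file inside the Uploads folder.
    Dim filePath As String = Server.MapPath("~/Uploads/" & Path.GetFileName(FileUpload1.PostedFile.FileName))
    FileUpload1.SaveAs(filePath)

    ' Extract the text and show it, converting line breaks to <br /> tags.
    Dim extractText As String = Me.ExtractTextFromImage(filePath)
    lblText.Text = extractText.Replace(Environment.NewLine, "<br />")
End Sub

Private Function ExtractTextFromImage(filePath As String) As String
    ' Load the image file into a MODI document and run OCR on it.
    Dim modiDocument As New Document()
    modiDocument.Create(filePath)
    modiDocument.OCR(MiLANGUAGES.miLANG_ENGLISH)

    ' Read the recognized text from the first image (page) of the document.
    Dim modiImage As MODI.Image = TryCast(modiDocument.Images(0), MODI.Image)
    Dim extractedText As String = modiImage.Layout.Text

    ' Close the document before returning the text.
    modiDocument.Close()
    Return extractedText
End Function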
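Because the extracted text is written into the Label as HTML, characters such as < or & in the recognized text would be interpreted as markup. One possible refinement, shown here in C# as a sketch rather than as part of the original example, is to HTML-encode the text before converting line breaks. The OcrTextFormatter class and its ToHtml method are hypothetical names, and the sketch assumes WebUtility.HtmlEncode is available (.NET Framework 4.0 or later).

```csharp
using System;
using System.Net;

public static class OcrTextFormatter
{
    // Encodes OCR output for display as HTML and turns line breaks into <br /> tags.
    public static string ToHtml(string extractedText)
    {
        if (string.IsNullOrEmpty(extractedText))
        {
            return string.Empty;
        }

        // Encode first so that the <br /> tags added afterwards are not escaped.
        string encoded = WebUtility.HtmlEncode(extractedText);

        // Normalize \r\n and bare \r to \n, then replace each line break.
        return encoded.Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "<br />");
    }
}
```

With such a helper in place, the assignment in the Upload handler could become lblText.Text = OcrTextFormatter.ToHtml(extractText);.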
Screenshots
Image with some text
The text read from the above image and displayed in Label control