Persits Software, Inc. Knowledge Base Articles

HOWTO: Convert UTF-8 Encoded Strings to Unicode in Classic ASP

Problem Description

AspPDF's text drawing methods such as PdfCanvas.DrawText expect Unicode text strings, which is how VBScript stores strings natively (as UTF-16). However, text strings that come from databases are often UTF-8 encoded and need to be converted to Unicode before they can be passed to AspPDF. Otherwise, every non-ASCII character appears in the PDF output as two or more stray characters, for example "Ã©" in place of "é".

This article explains how to convert a UTF-8 encoded string to Unicode using the ADODB.Stream object.

Solution

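The following function uses the ADODB.Stream object to convert a UTF-8 encoded string to Unicode. It writes the string to the stream as Windows-1252, which turns each character back into the single byte it was read from, and then reads the same bytes back as UTF-8: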

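Function Utf8ToUnicode(strText)
   With CreateObject("ADODB.Stream")

      .Open
      ' Write each character out as one Windows-1252 byte
      .Charset = "Windows-1252"

      .WriteText strText

      ' Rewind: Type and Charset can only be changed at position 0
      .Position = 0
      .Type = 2 ' adTypeText
      .Charset = "utf-8"

      ' Reinterpret the same bytes as UTF-8
      Utf8ToUnicode = .ReadText(-1) ' adReadAll

      .Close
   End With
End Function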
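
To check the conversion without a database, a mis-decoded string can be built by hand. The following is a minimal sketch, assuming it runs in an ASP page where Utf8ToUnicode is already defined. The variable names strGarbled and strFixed are illustrative only. It builds the two characters "Ã©", which is what the UTF-8 bytes C3 A9 of "é" look like when each byte is taken for a separate Windows-1252 character, and passes them through the function.

```vbscript
' Two characters, U+00C3 and U+00A9: the UTF-8 bytes C3 A9 ("é")
' with each byte mistaken for a Windows-1252 character.
Dim strGarbled: strGarbled = ChrW(&HC3) & ChrW(&HA9)

Dim strFixed: strFixed = Utf8ToUnicode(strGarbled)

' Report the length and the code point of the first character
Response.Write "Length: " & Len(strFixed) & "<br>"
Response.Write "Code point: " & Hex(AscW(strFixed))
```

If the conversion behaves as described, the result should be the single character U+00E9, so the page should report a length of 1 and a code point of E9.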

This function can be used as follows:

...

Set rs = Server.CreateObject("ADODB.Recordset")
' 3 = adOpenStatic, 2 = adLockPessimistic
rs.Open "select * from test_table", objConn, 3, 2

' Read the raw field value, then convert it
Dim strField: strField = rs("content").Value
Dim strUnicode: strUnicode = Utf8ToUnicode( strField )

' AspPDF call
objPage.Canvas.DrawText strUnicode, "x=10; y=100", objFont
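
A database field may also be Null, and passing Null to WriteText is likely to raise a run-time error. One way to guard against that is sketched below. The wrapper name Utf8FieldToUnicode is hypothetical and is not part of AspPDF or ADO. It returns an empty string for Null values and converts everything else.

```vbscript
' Hypothetical wrapper: converts a field value, treating Null as "".
Function Utf8FieldToUnicode(varValue)
   If IsNull(varValue) Then
      Utf8FieldToUnicode = ""
   Else
      Utf8FieldToUnicode = Utf8ToUnicode(CStr(varValue))
   End If
End Function

' Usage with the recordset opened in the example above
Dim strSafe: strSafe = Utf8FieldToUnicode(rs("content").Value)
```

Passing rs("content").Value rather than the Field object itself keeps the IsNull test on the stored value.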