Thursday, April 9, 2015

ASP.Net C# Convert HTML to PDF using iTextSharp

Matt Pavey April 09, 2015 .Net 16 comments

An updated post is also available that optionally supports landscape mode.

The following Stack Overflow answer was a life-saver in a recent project where we needed to convert HTML to PDF.

http://stackoverflow.com/a/25164258

We were using C# and needed to convert a well-formed HTML string to a PDF file. Using iTextSharp (5.5.5) and itextsharp.xmlworker (5.5.5), both available through the NuGet Package Manager in Visual Studio 2013, and a great working example from the Stack Overflow answer, we ended up with the following:

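 // required namespaces: System, System.IO, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.tool.xml  
 public static ReturnValue ConvertHtmlToPdfAsBytes(string HtmlData)  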
 {  
       // variables  
       ReturnValue Result = new ReturnValue();  
   
       // close bare <br> tags that we can't control in the incoming html, so the markup stays well-formed  
       HtmlData = HtmlData.Replace("<br>", "<br />");  
   
       // convert html to pdf  
       try  
       {  
         // create a stream that we can write to, in this case a MemoryStream  
         using (var stream = new MemoryStream())  
         {  
           // create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF  
           using (var document = new Document())  
           {  
             // create a writer that's bound to our PDF abstraction and our stream  
             using (var writer = PdfWriter.GetInstance(document, stream))  
             {  
               // open the document for writing  
               document.Open();  
   
               // read html data to StringReader  
               using (var html = new StringReader(HtmlData))  
               {  
                 XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);  
               }  
   
               // close document  
               document.Close();  
             }  
           }  
   
           // get bytes from stream  
           Result.Data = stream.ToArray();  
   
           // success  
           Result.Success = true;  
         }  
       }  
       catch (Exception ex)  
       {  
         Result.Success = false;  
         Result.Message = ex.Message;  
       }  
   
       // return  
       return Result;  
 }

The ReturnValue class is simply a helper class that looks like this:

 // return value class  
 public class ReturnValue  
 {  
       // constructor  
       public ReturnValue()  
       {  
         this.Success = false;  
         this.Message = string.Empty;  
       }  
   
       // properties  
       public bool Success = false;  
       public string Message = string.Empty;  
       public Byte[] Data = null;  
 }

We also had a second method that writes the PDF to disk, for cases where you want a file rather than just the byte array:

 public static ReturnValue ConvertHtmlToPdfAsFile(string FilePath, string HtmlData)  
 {  
       // variables  
       ReturnValue Result = new ReturnValue();  
   
       try  
       {  
         // convert html to pdf and get bytes array  
         Result = ConvertHtmlToPdfAsBytes(HtmlData: HtmlData);  
   
         // check for errors  
         if (!Result.Success)  
         {  
           return Result;  
         }  
   
         // create file  
         File.WriteAllBytes(path: FilePath, bytes: Result.Data);  
   
         // result  
         Result.Success = true;  
       }  
       catch(Exception ex)  
       {  
         Result.Success = false;  
         Result.Message = ex.Message;  
       }  
   
       // return  
       return Result;  
 }

It's important to remember that for this to work you must have valid, well-formed HTML; otherwise you can expect iTextSharp to throw an exception. But if you have control over the HTML that you need to convert, this solution is great and produces very nice PDF files.
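
If you want to catch malformed markup before it ever reaches iTextSharp, one option is a quick pre-flight check. The following is only a minimal sketch and was not part of our project: the HtmlPreflight class and IsWellFormed method are hypothetical names, and it assumes your HTML is meant to parse as XML. It uses the standard XDocument.Parse, which is likely stricter than XML Worker; for example, it rejects named entities such as &nbsp; unless they are declared.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper (not part of iTextSharp): reports whether a string parses as XML.
public static class HtmlPreflight
{
    public static bool IsWellFormed(string htmlData, out string error)
    {
        // XDocument.Parse throws ArgumentNullException on null, so handle it up front
        if (htmlData == null)
        {
            error = "No HTML was supplied.";
            return false;
        }

        try
        {
            // parsing succeeds only when every tag is closed and properly nested
            XDocument.Parse(htmlData);
            error = string.Empty;
            return true;
        }
        catch (XmlException ex)
        {
            // the message includes the line and position of the first problem
            error = ex.Message;
            return false;
        }
    }
}
```

Calling HtmlPreflight.IsWellFormed(HtmlData, out message) before ConvertHtmlToPdfAsBytes would let you surface a parse error with a line and position instead of relying on whatever the converter throws.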

It's worth noting that in our case we didn't need to pass the CSS in separately using the ParseXHtml method overload that takes a CSS stream, ParseXHtml(PdfWriter writer, Document doc, Stream inp, Stream inCssFile), because we included our CSS styles in the HTML data string instead, which for our solution was a bit cleaner.
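
For anyone who does want to keep the CSS separate, here is a rough sketch of how that overload might be called. We did not use this in our project, so treat it as an untested illustration: the PdfWithCssSketch class, the Convert method, and the choice of UTF-8 for both streams are assumptions, and it requires the same iTextSharp and itextsharp.xmlworker packages as the code above.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical helper: converts HTML plus a separate CSS string to PDF bytes.
public static class PdfWithCssSketch
{
    public static byte[] Convert(string htmlData, string cssData)
    {
        using (var output = new MemoryStream())
        {
            using (var document = new Document())
            using (var writer = PdfWriter.GetInstance(document, output))
            {
                document.Open();

                // this overload takes streams rather than a TextReader,
                // so wrap both strings in MemoryStreams (UTF-8 is an assumption)
                using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlData)))
                using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(cssData)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);
                }

                document.Close();
            }

            // ToArray still works after the writer has closed the stream
            return output.ToArray();
        }
    }
}
```

The HTML still has to be well-formed; only the source of the styles changes.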

Matt Pavey is a Microsoft Certified software developer who specializes in ASP.Net, VB.Net, C#, AJAX, LINQ, XML, XSL, Web Services, SQL, jQuery, and more. Follow on Twitter @matthewpavey

16 comments:

Admin, August 20, 2015 at 9:31 PM
Wow, thanks, your code was a great help. I was looking for how to convert HTML to PDF with my database and found your blog. You're a time saver.

Matt Pavey, August 20, 2015 at 9:43 PM
That is great to hear. I'm glad I was able to help!

Unknown, March 24, 2016 at 4:56 AM
Hi Matt,

Can this code easily be changed to create the PDF "in memory" and email it off as an email attachment?

Matt Pavey, March 24, 2016 at 10:15 AM
Hi Melvyn,

Thanks for stopping by and reading the article and leaving a comment.

To answer your question, yes, you could easily create the PDF file in memory instead of saving it to file. In fact, if you call the ConvertHtmlToPdfAsBytes method (instead of ConvertHtmlToPdfAsFile) it will simply return a Byte[] in the ReturnValue "Data" property.

Once you have it in a Byte array the rest is just a matter of adding it as an attachment using the built in System.Net.Mail.MailMessage class, something like:

// required namespaces: System.IO, System.Net.Mail

// smtp client
private SmtpClient xSmtpClient = new SmtpClient();

// mail message
private MailMessage xMailMessage = new MailMessage();

public bool Send(byte[] Bytes)
{
    // configure smtp client
    // e.g. server, port, credentials

    // configure mail message
    // recipient, subject, body

    // add your attachment
    // Bytes is the Data array returned by ConvertHtmlToPdfAsBytes
    AddAttachmentFromBytes("attachment.pdf", Bytes, "application/pdf");

    // send message
    xSmtpClient.Send(xMailMessage);
    return true;
}

public void AddAttachmentFromBytes(string FileName, byte[] Bytes, string MediaType)
{
    xMailMessage.Attachments.Add(new Attachment(new MemoryStream(Bytes), FileName, MediaType));
}

If you have any trouble let me know.

Good luck,
Matt

Unknown, March 29, 2016 at 9:08 AM
Thanks, Matt.

PanneerV, June 23, 2016 at 8:30 AM
Hi, I'm using iTextSharp version 5.5.8 and I'm getting the error "The name 'XMLWorkerHelper' does not exist in the current context".

Matt Pavey, June 27, 2016 at 8:56 AM
Hi Panneer,

Take a look at this StackOverflow article/answer and I think you'll find your answer.

http://stackoverflow.com/a/24957433

Basically you'll need to add a reference to iTextSharp.XMLWorker

Good luck!
Matt

Unknown, July 4, 2016 at 8:37 PM
XMLWorkerHelper can be installed from NuGet as itextsharp.xmlworker

kc coder, August 29, 2016 at 11:46 PM
This fails if the HTML has stylesheets or images in it. I tried making sure to use absolute URLs but it still fails. Is there a way to make this work if the HTML has external assets?

Unknown, December 14, 2016 at 1:00 AM
Hi Matt!

I hope you can help me.
Where does "string HtmlData" come from, and how can I call the function?
I am new to this and this is the most understandable code I have seen.

Thanks!

Matt Pavey, December 14, 2016 at 9:45 AM
Hi Pami,

The "HtmlData" variable is the input parameter on the "ConvertHtmlToPdfAsBytes" function, so you would call it something like this with your well-formed HTML string that you want to convert to a PDF file:

Result = ConvertHtmlToPdfAsBytes(HtmlData: "<p>Hello World</p>");

Now the Result.Data byte array should have the bytes that you need to either save or stream your PDF file to disk or to a database, or back to the browser.

Good luck!

Anonymous, November 1, 2017 at 2:41 PM
I don't understand why people like such complicated solutions. There are plenty of easy-to-use APIs on the market, like this one (http://www.pdfonline.com/html-to-pdf-c%23/). All that is needed is to generate a small piece of code and put it in.

Unknown, November 1, 2017 at 3:00 PM
There's literally 1 parameter required for generating a PDF... Ali Seif, please provide an example that requires fewer, I would love to see that.

Unknown, May 4, 2018 at 1:21 AM
I am getting the error "The document has no pages."
Matthew Pavey's Blog
