Thursday, April 9, 2015

ASP.Net C# Convert HTML to PDF using iTextSharp

An updated version of this post that optionally supports landscape mode is also available.

The following article on Stack Overflow was a life-saver in a recent project where we needed to convert HTML to PDF.

http://stackoverflow.com/a/25164258

We were using C# and needed to convert a well-formed string of HTML to a PDF file. We used iTextSharp (5.5.5) and itextsharp.xmlworker (5.5.5), both available from the NuGet Package Manager in Visual Studio 2013, and adapted the working example from the Stack Overflow answer. We ended up with the following:
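 using System;
 using System.IO;
 using iTextSharp.text;
 using iTextSharp.text.pdf;
 using iTextSharp.tool.xml;

 public static ReturnValue ConvertHtmlToPdfAsBytes(string HtmlData)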
 {  
       // variables  
       ReturnValue Result = new ReturnValue();  
   
       // normalize <br> to the self-closing <br /> form, because XMLWorker needs well-formed markup
       HtmlData = HtmlData.Replace("<br>", "<br />");
   
       // convert html to pdf  
       try  
       {  
         // create a stream that we can write to, in this case a MemoryStream  
         using (var stream = new MemoryStream())  
         {  
           // create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF  
           using (var document = new Document())  
           {  
             // create a writer that's bound to our PDF abstraction and our stream  
             using (var writer = PdfWriter.GetInstance(document, stream))  
             {  
               // open the document for writing  
               document.Open();  
   
               // read html data to StringReader  
               using (var html = new StringReader(HtmlData))  
               {  
                 XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);  
               }  
   
               // close document  
               document.Close();  
             }  
           }  
   
           // get bytes from stream  
           Result.Data = stream.ToArray();  
   
           // success  
           Result.Success = true;  
         }  
       }  
       catch (Exception ex)  
       {  
         Result.Success = false;  
         Result.Message = ex.Message;  
       }  
   
       // return  
       return Result;  
 }  
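The cleansing step above only rewrites the exact string `<br>`, so variants such as `<BR>` or `<br/>` pass through unchanged. If your HTML might contain those, here is a minimal sketch of a broader normalization using `System.Text.RegularExpressions`. The `HtmlCleanup` class and `NormalizeLineBreaks` method are hypothetical names, and the sketch assumes the line-break tags carry no attributes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class HtmlCleanup
{
    // Matches <br>, <BR>, <br/>, <br /> and similar spacing variants (case-insensitive).
    // Assumption: tags with attributes, such as <br class="x">, are NOT matched.
    private static readonly Regex LineBreakTag =
        new Regex(@"<br\s*/?\s*>", RegexOptions.IgnoreCase);

    // Rewrites every matched line-break tag as the self-closing XHTML form "<br />".
    public static string NormalizeLineBreaks(string html)
    {
        if (html == null) throw new ArgumentNullException("html");
        return LineBreakTag.Replace(html, "<br />");
    }
}
```

Inside `ConvertHtmlToPdfAsBytes`, the `<br>` line could then become `HtmlData = HtmlCleanup.NormalizeLineBreaks(HtmlData);`.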

The ReturnValue class was simply a helper class that looks like this:
 // return value class  
 public class ReturnValue  
 {  
       // constructor: start as an unsuccessful result with no message or data
       public ReturnValue()
       {
         this.Success = false;
         this.Message = string.Empty;
         this.Data = null;
       }
   
       // properties
       public bool Success { get; set; }
       public string Message { get; set; }
       public byte[] Data { get; set; }
 }  

We also had a second method that writes the PDF to disk, for cases where you want a file rather than just the byte array:
 public static ReturnValue ConvertHtmlToPdfAsFile(string FilePath, string HtmlData)  
 {  
       // variables  
       ReturnValue Result = new ReturnValue();  
   
       try  
       {  
         // convert html to pdf and get bytes array  
         Result = ConvertHtmlToPdfAsBytes(HtmlData: HtmlData);  
   
         // check for errors  
         if (!Result.Success)  
         {  
           return Result;  
         }  
   
         // create file  
         File.WriteAllBytes(path: FilePath, bytes: Result.Data);  
   
         // result  
         Result.Success = true;  
       }  
       catch(Exception ex)  
       {  
         Result.Success = false;  
         Result.Message = ex.Message;  
       }  
   
       // return  
       return Result;  
 }  
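`File.WriteAllBytes` creates or overwrites the target file, but it throws a `DirectoryNotFoundException` if the folder doesn't exist yet. If that can happen in your setup, a small helper along these lines creates the folder first. `PdfFileHelper.SaveBytes` is a hypothetical name; this is a sketch, not part of the original code.

```csharp
using System;
using System.IO;

// Hypothetical helper for saving PDF bytes to disk.
public static class PdfFileHelper
{
    // Writes the bytes to filePath, creating any missing parent folders first.
    public static void SaveBytes(string filePath, byte[] data)
    {
        if (string.IsNullOrEmpty(filePath)) throw new ArgumentException("A file path is required.", "filePath");
        if (data == null) throw new ArgumentNullException("data");

        // Resolve the folder that will contain the file.
        string folder = Path.GetDirectoryName(Path.GetFullPath(filePath));
        if (!string.IsNullOrEmpty(folder))
        {
            // Does nothing if the folder already exists.
            Directory.CreateDirectory(folder);
        }

        // Creates the file, or overwrites it if it already exists.
        File.WriteAllBytes(filePath, data);
    }
}
```

In `ConvertHtmlToPdfAsFile`, the `File.WriteAllBytes` call could then be swapped for `PdfFileHelper.SaveBytes(FilePath, Result.Data);`.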

It's important to remember that for this to work, the HTML must be valid and well-formed; otherwise, expect iTextSharp to throw an exception. But if you control the HTML you need to convert, this solution works well and produces very nice PDF files.
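Because the conversion depends on well-formed markup, it can help to check the string before handing it to iTextSharp. Below is a minimal sketch using the .NET `XmlReader` in fragment mode, so that several top-level elements are allowed. `HtmlValidation` and `IsWellFormed` are hypothetical names. This is a plain XML check, so it also rejects things like a `<!DOCTYPE>` declaration or named HTML entities such as `&nbsp;`. A `false` result is therefore a prompt to inspect the markup, not a guarantee that iTextSharp will fail.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical pre-check, not part of iTextSharp.
public static class HtmlValidation
{
    // Returns true if the string parses as well-formed XML.
    // Otherwise returns false and puts the parser's message in error.
    // Strict XML: DOCTYPE declarations and named entities such as &nbsp; are reported as errors.
    public static bool IsWellFormed(string html, out string error)
    {
        if (html == null) throw new ArgumentNullException("html");
        error = null;

        var settings = new XmlReaderSettings
        {
            // Allow fragments such as "<p>a</p><p>b</p>" with several top-level elements.
            ConformanceLevel = ConformanceLevel.Fragment
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(html), settings))
            {
                // Reading to the end surfaces any syntax error as an XmlException.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```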

It's worth noting that in our case we didn't need to pass the CSS in separately using the ParseXHtml overload ParseXHtml(PdfWriter writer, Document doc, Stream inp, Stream inCssFile). We included our CSS styles in the HTML data string itself, which was a bit cleaner for our solution.
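For readers who want to bundle their styles the same way, here is a minimal sketch that wraps body markup and a stylesheet into a single XHTML string before conversion. `XhtmlBuilder` and `WithEmbeddedCss` are hypothetical names. The sketch assumes the CSS goes into a `<style>` block in `<head>`; inline `style` attributes are the other option. Because the result must stay well-formed, the CSS should not contain raw `<` or `&` characters.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of iTextSharp.
public static class XhtmlBuilder
{
    // Wraps body markup and optional CSS into one XHTML document string.
    // Assumption: the converter honours a <style> block placed in <head>.
    public static string WithEmbeddedCss(string bodyHtml, string css)
    {
        if (bodyHtml == null) throw new ArgumentNullException("bodyHtml");

        var sb = new StringBuilder();
        sb.Append("<html><head>");

        // Only emit a <style> element when there is CSS to embed.
        if (!string.IsNullOrWhiteSpace(css))
        {
            sb.Append("<style type=\"text/css\">").Append(css).Append("</style>");
        }

        sb.Append("</head><body>").Append(bodyHtml).Append("</body></html>");
        return sb.ToString();
    }
}
```

The returned string can be passed straight to `ConvertHtmlToPdfAsBytes`.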

Matt Pavey is a Microsoft Certified software developer who specializes in ASP.Net, VB.Net, C#, AJAX, LINQ, XML, XSL, Web Services, SQL, jQuery, and more. Follow on Twitter @matthewpavey

16 comments:

  1. Wow, thanks! Your code was a great help. I was looking for how to convert HTML to PDF with my database and found your blog. You're a time saver.

  2. That is great to hear. I'm glad I was able to help!

  3. Hi Matt,

    Can this code easily be changed to create the PDF "in memory" and email it off as an email attachment?

  4. Hi Melvyn,

    Thanks for stopping by and reading the article and leaving a comment.

    To answer your question, yes, you could easily create the PDF file in memory instead of saving it to file. In fact, if you call the ConvertHtmlToPdfAsBytes method (instead of ConvertHtmlToPdfAsFile) it will simply return a Byte[] in the ReturnValue "Data" property.

    Once you have it in a Byte array the rest is just a matter of adding it as an attachment using the built in System.Net.Mail.MailMessage class, something like:

    // requires: using System.IO; using System.Net.Mail;

    // smtp client
    private SmtpClient xSmtpClient = new SmtpClient();

    // mail message
    private MailMessage xMailMessage = new MailMessage();

    // Bytes: the PDF byte array, e.g. Result.Data from ConvertHtmlToPdfAsBytes
    public bool Send(byte[] Bytes)
    {
        // configure smtp client
        // e.g. server, port, credentials

        // configure mail message
        // e.g. recipient, subject, body

        // add the PDF as an attachment
        AddAttachmentFromBytes("attachment.pdf", Bytes, "application/pdf");

        // send message
        xSmtpClient.Send(xMailMessage);
        return true;
    }

    public void AddAttachmentFromBytes(string FileName, byte[] Bytes, string MediaType)
    {
        // wrap the bytes in a MemoryStream so they can be attached without writing a file
        xMailMessage.Attachments.Add(new Attachment(new MemoryStream(Bytes), FileName, MediaType));
    }

    If you have any trouble let me know.

    Good luck,
    Matt

  5. Hi, I'm using iTextSharp version 5.5.8 and I'm getting the error "The name 'XMLWorkerHelper' does not exist in the current context".

  6. Hi Panneer,

    Take a look at this StackOverflow article/answer and I think you'll find your answer.

    http://stackoverflow.com/a/24957433

    Basically you'll need to add a reference to iTextSharp.XMLWorker

    Good luck!
    Matt

  7. XMLWorkerHelper can be installed from NuGet as itextsharp.xmlworker

  8. This fails if the HTML has stylesheets or images in it. I tried making sure to use absolute URLs, but it still fails. Is there a way to make this work if the HTML has external assets?

  9. Hi Matt!

    I hope you can help me.
    Where does "string HtmlData" come from, and how can I call the function?
    I am new to this, and this is the most understandable code I have seen.

    Thanks!

  10. Hi Pami,

    The "HtmlData" variable is the input parameter on the "ConvertHtmlToPdfAsBytes" function, so you would call it something like this with your well-formed HTML string that you want to convert to a PDF file:

    Result = ConvertHtmlToPdfAsBytes(HtmlData: "<p>Hello World</p>");

    Now the Result.Data byte array should hold the bytes you need to save your PDF to disk or a database, or to stream it back to the browser.

    Good luck!

  11. I don't understand why people like such complicated solutions. There are plenty of easy-to-use APIs on the market, like this one (http://www.pdfonline.com/html-to-pdf-c%23/). All that's needed is to generate a small piece of code and put it in.

    1. I suspect it's because people don't want to use a "trial" that is going to end up costing them money, and you don't even know how much it costs unless you request a quote. Not to mention some projects have requirements to use certain tools that are industry-proven and, more importantly, free. Perhaps you're just promoting this particular product for marketing purposes, because I see no other reason this tool is easier to use, or cheaper. And the source code I provided is anything but complicated. If you're a .Net developer, the code is about as simple as it gets; most of it is helper methods and classes that make it more readable and scalable, and it is commented for clarity. Hopefully someone will benefit from both, but if you're going to offer an alternative solution, it would only be fair to disclose the costs, and maybe even share an example of what you consider non-complicated code. Thanks!

  12. There's literally 1 parameter required for generating a PDF... Ali Seif, please provide an example that requires fewer, I would love to see that.

  13. I am getting the error "The document has no pages."

    1. You will want to double check to make sure the HTML you are trying to convert to a PDF is well-formed.
