The following answer on Stack Overflow was a lifesaver in a recent project where we needed to convert HTML to PDF.
http://stackoverflow.com/a/25164258
We were using C# and needed to convert a well-formed string of HTML to a PDF file. Using iTextSharp (5.5.5) and itextsharp.xmlworker (5.5.5), both available in the NuGet Package Manager in Visual Studio 2013, and starting from the great working example in the Stack Overflow answer, we ended up with the following:
// namespaces used: System, System.IO, iTextSharp.text, iTextSharp.text.pdf, iTextSharp.tool.xml
public static ReturnValue ConvertHtmlToPdfAsBytes(string HtmlData)
{
    // variables
    ReturnValue Result = new ReturnValue();

    // do some additional cleansing to handle some scenarios that are out of control with the html data
    HtmlData = HtmlData.Replace("<br>", "<br />");

    // convert html to pdf
    try
    {
        // create a stream that we can write to, in this case a MemoryStream
        using (var stream = new MemoryStream())
        {
            // create an iTextSharp Document which is an abstraction of a PDF but **NOT** a PDF
            using (var document = new Document())
            {
                // create a writer that's bound to our PDF abstraction and our stream
                using (var writer = PdfWriter.GetInstance(document, stream))
                {
                    // open the document for writing
                    document.Open();

                    // read html data to StringReader
                    using (var html = new StringReader(HtmlData))
                    {
                        XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, html);
                    }

                    // close document
                    document.Close();
                }
            }

            // get bytes from stream
            Result.Data = stream.ToArray();

            // success
            Result.Success = true;
        }
    }
    catch (Exception ex)
    {
        Result.Success = false;
        Result.Message = ex.Message;
    }

    // return
    return Result;
}
The ReturnValue class was simply a helper class that looks like this:
// return value class
public class ReturnValue
{
    // constructor
    public ReturnValue()
    {
        this.Success = false;
        this.Message = string.Empty;
    }

    // properties
    public bool Success = false;
    public string Message = string.Empty;
    public Byte[] Data = null;
}
We also had another method to physically create the PDF file, in case you want a file on disk rather than just the byte array, for example:
public static ReturnValue ConvertHtmlToPdfAsFile(string FilePath, string HtmlData)
{
    // variables
    ReturnValue Result = new ReturnValue();

    try
    {
        // convert html to pdf and get bytes array
        Result = ConvertHtmlToPdfAsBytes(HtmlData: HtmlData);

        // check for errors
        if (!Result.Success)
        {
            return Result;
        }

        // create file
        File.WriteAllBytes(path: FilePath, bytes: Result.Data);

        // result
        Result.Success = true;
    }
    catch (Exception ex)
    {
        Result.Success = false;
        Result.Message = ex.Message;
    }

    // return
    return Result;
}
It's important to remember that in order for this to work, you must have valid, well-formed HTML; otherwise you can expect iTextSharp to throw an error. But if you have control over the HTML that you need to convert, this solution is great and produces very nice PDF files.
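One way to catch malformed input before it reaches iTextSharp is to run a quick pre-flight check. The following is only a minimal sketch, not part of our original solution: the class name HtmlPreflight and both method names are hypothetical, the list of void tags is an illustrative subset, and a strict XML parse is only an approximation of what XMLWorker accepts (for example, it assumes a single root element and no named HTML entities such as &nbsp;).

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of iTextSharp or of the original solution.
public static class HtmlPreflight
{
    // Matches common void elements whether or not they are already self-closed.
    // Illustrative subset only; assumes attribute values do not contain '>'.
    private static readonly Regex VoidTag = new Regex(
        @"<(br|hr|img|input|meta|link)\b([^>]*?)\s*/?>",
        RegexOptions.IgnoreCase);

    // Rewrites tags such as <br> or <img src="x"> into self-closed form.
    public static string CloseVoidTags(string html)
    {
        return VoidTag.Replace(html, "<$1$2 />");
    }

    // Returns true if the markup parses as XML; otherwise returns false
    // and reports the parser's message through the out parameter.
    public static bool IsWellFormed(string html, out string error)
    {
        try
        {
            XDocument.Parse(html);
            error = null;
            return true;
        }
        catch (XmlException ex)
        {
            error = ex.Message;
            return false;
        }
    }
}
```

You could call HtmlPreflight.CloseVoidTags on the HTML string first, then HtmlPreflight.IsWellFormed, and only pass the string to ConvertHtmlToPdfAsBytes when the check succeeds. The error message from the XML parser usually includes a line and position, which tends to be easier to act on than a conversion failure.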
It's worth noting that in our case we didn't need to pass the CSS in separately using the ParseXHtml method overload, ParseXHtml(PdfWriter writer, Document doc, Stream inp, Stream inCssFile), because we were including our CSS styles in our HTML data string instead, which for our solution was a bit cleaner.
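If you would rather keep the CSS separate, a rough sketch using that four-argument overload might look like the following. We did not use this in our project, so treat it as an untested illustration: the class name PdfWithCss and the method name Convert are hypothetical, UTF-8 encoding is an assumption, and the code needs the same iTextSharp and itextsharp.xmlworker packages as above.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;

// Hypothetical helper showing the ParseXHtml overload that takes a CSS stream.
public static class PdfWithCss
{
    public static byte[] Convert(string html, string css)
    {
        using (var output = new MemoryStream())
        {
            using (var document = new Document())
            using (var writer = PdfWriter.GetInstance(document, output))
            {
                document.Open();

                // Both the HTML and the CSS are handed over as streams
                // (UTF-8 is an assumption made for this sketch).
                using (var htmlStream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
                using (var cssStream = new MemoryStream(Encoding.UTF8.GetBytes(css)))
                {
                    XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, htmlStream, cssStream);
                }

                document.Close();
            }

            // Read the bytes only after the document has been closed.
            return output.ToArray();
        }
    }
}
```

A call such as PdfWithCss.Convert("<p>Hello World</p>", "p { color: #ff0000; }") should return the PDF bytes, which can then be saved or streamed the same way as Result.Data.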
Matt Pavey is a Microsoft Certified software developer who specializes in ASP.Net, VB.Net, C#, AJAX, LINQ, XML, XSL, Web Services, SQL, jQuery, and more. Follow on Twitter @matthewpavey
Wow, thanks, your code was a great help. I was looking for how to convert HTML to PDF with my database and found your blog. You're a time saver.
That is great to hear. I'm glad I was able to help!
Hi Matt,
Can this code easily be changed to create the PDF "in memory" and email it off as an email attachment?
Hi Melvyn,
Thanks for stopping by, reading the article, and leaving a comment.
To answer your question, yes, you could easily create the PDF file in memory instead of saving it to file. In fact, if you call the ConvertHtmlToPdfAsBytes method (instead of ConvertHtmlToPdfAsFile) it will simply return a Byte[] in the ReturnValue "Data" property.
Once you have it in a Byte array, the rest is just a matter of adding it as an attachment using the built-in System.Net.Mail.MailMessage class, something like:
// namespaces used: System.IO, System.Net.Mail

// smtp client
private SmtpClient xSmtpClient = new SmtpClient();

// mail message
private MailMessage xMailMessage = new MailMessage();

public bool Send(byte[] Bytes)
{
    // configure smtp client
    // e.g. server, port, credentials

    // configure mail message
    // recipient, subject, body

    // add your attachment
    // call the AddAttachmentFromBytes below to add your attachment
    AddAttachmentFromBytes("attachment.pdf", Bytes, "application/pdf");

    // send message
    xSmtpClient.Send(xMailMessage);

    // Send throws on failure, so reaching this point means success
    return true;
}

public void AddAttachmentFromBytes(string FileName, byte[] Bytes, string MediaType)
{
    // wrap the bytes in a MemoryStream so nothing is written to disk
    xMailMessage.Attachments.Add(new Attachment(new MemoryStream(Bytes), FileName, MediaType));
}
If you have any trouble, let me know.
Good luck,
Matt
Thanks Matt,
Hi, I'm using iTextSharp version 5.5.8 and I'm getting the error "The name 'XMLWorkerHelper' does not exist in the current context".
Hi Panneer,
Take a look at this Stack Overflow answer and I think you'll find your answer.
http://stackoverflow.com/a/24957433
Basically, you'll need to add a reference to iTextSharp.XMLWorker.
Good luck!
Matt
XMLWorkerHelper can be installed from NuGet as itextsharp.xmlworker.
This fails if the HTML has stylesheets or images in it. I tried making sure to use absolute URLs, but it still fails. Is there a way to make this work if the HTML has external assets?
Hi Matt!
I hope you can help me.
Where does "string HtmlData" come from, and how can I call the function?
I am new to this, and this is the most understandable code I have seen.
Thanks!
Hi Pami,
The "HtmlData" variable is the input parameter of the "ConvertHtmlToPdfAsBytes" function, so you would call it something like this, passing the well-formed HTML string that you want to convert to a PDF file:
ReturnValue Result = ConvertHtmlToPdfAsBytes(HtmlData: "<p>Hello World</p>");
If Result.Success is true, the Result.Data byte array now holds the bytes of your PDF, which you can save to disk or to a database, or stream back to the browser.
Good luck!
I don't understand why people like such complicated solutions. There are plenty of easy-to-use APIs on the market like this one (http://www.pdfonline.com/html-to-pdf-c%23/). All that is needed is to generate a small piece of code and put it in.
I suspect it's because people don't want to use a "trial" that is going to end up costing them money, and you don't even know how much it costs unless you request a quote. Not to mention some projects have requirements to use certain tools that have been industry proven, and more importantly are free. Perhaps you're just promoting this particular product for marketing purposes, because I see no other reason this tool is easier to use, or cheaper. Not to mention the source code I provided is anything but complicated. If you're a .Net developer, the code is about as simple as it gets, and the majority of it is helper methods/classes to make it more readable and scalable, and it is commented for clarity. Hopefully someone will get benefit out of both, but I think if you're going to offer up an alternative solution it would only be fair if you shared full disclosure about costs, and maybe even an example of what you consider non-complicated code. Thanks!
There's literally 1 parameter required for generating a PDF... Ali Seif, please provide an example that requires fewer, I would love to see that.
I am getting the error "The document has no pages."
You will want to double check to make sure the HTML you are trying to convert to a PDF is well-formed.