
Populating a SharePoint List to PDF

On a recent project, our customer was using Adobe Acrobat as their form authoring tool, and we were using data stored in several SharePoint lists and document libraries to generate a PDF from the template they provided.

The customer wanted PDF as the output format because each store has a person who downloads the latest PDF and prints it on a standard printer loaded with pre-perforated paper. We built a small Windows application that uses the SharePoint Client API for the SharePoint calls and iTextSharp to generate the PDF. If you haven’t used iTextSharp before (or iText; iTextSharp is the .NET port of iText, and I use the two names interchangeably), it is very well documented and surprisingly easy to work with. However, it does have some limitations that took some pretty imaginative solutions to work around. In this post I document some of those limitations and how we got around them.

First though, a bit of an overview of the application in question.

To protect the innocent, I won’t say what the tool is actually used for or by whom, so let’s go with an analogy: the customer is a restaurant chain and the PDF is their menu. Every meal on the menu goes through a complex review and approval process involving many groups across several divisions of the company. Since what is being reviewed and approved is the information about each menu item, not the design or appearance of the menu itself, SharePoint was a natural choice. Create a custom list with fields for the details of each menu item (name, ingredients, calories, price, etc.), slap a SharePoint Designer workflow on the list with all the approvers, and voila! Now there is one source of gospel, we have accountability, nobody has to worry about who has the latest version of the document, and graphic designers can focus on designing a sharp-looking menu form instead of how many calories are in this week’s tiramisu.

Now came the tricky part.

We have the data in SharePoint, but we want it in a PDF using the customer’s formatting and layout. We did not want to get into the habit of specifying the PDF layout programmatically. We just needed to populate some field variables with our data from SharePoint and save the PDF to a file on the local disk. That way the form designers can tweak the PDF template separately and simply upload a new template to SharePoint when it’s ready. The application pulls down a fresh copy of the template each time, so changes to the menu layout take effect immediately (or could even go through their own review and approval process if needed). If you have ever worked with PDF programmatically, though, you know that adding text to a PDF document is not straightforward, and doing it the way we wanted is even less so. Add iText and its AcroFields.SetField method to the mix, though, and it becomes easy as pie (sorry, couldn’t resist).

I am not going to go into the details of how to create a PdfStamper; there are plenty of examples in the iTextSharp documentation, and it’s pretty easy. The gist is that instead of typing in the text for the tiramisu or its price, the form designer adds PDF form fields to the template and names them so we can look them up through the iText API. We then set each form field’s value to our string, use iText to flatten the field, and move on to the next one.

Fair warning: I am pilfering code from several classes and methods and have not tested it as shown here. It should be enough to get you close, but I do not guarantee that copy and paste will work.

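// Example: "Price" is both the name of the form field in the PDF template and the name of our field in SharePoint
var form = _pdfResultStamper.AcroFields;
var fieldKeys = form.Fields.Keys;
_pdfResultStamper.FormFlattening = true; // PartialFormFlattening has no effect unless this is set

// Use the template variable name if one is configured, otherwise the SharePoint field title
string fieldName = field.TemplateVariableName ?? field.Title;
// listItem is the SharePoint ListItem for this page (its indexer takes the field's internal name)
string fieldValue = Convert.ToString(listItem[field.Title]);
if (fieldKeys.Contains(fieldName))
{
    form.SetField(fieldName, fieldValue);
    _pdfResultStamper.PartialFormFlattening(fieldName);
}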

So that’s pretty much it. Put that in a loop that iterates over each field on the SharePoint ListItem, and you get a PDF in memory built from the designers’ template with the values from the SharePoint list. The next step is to copy that memory stream into a list of streams for merging later. Each memory stream represents one page in the PDF. The menu analogy sort of falls apart here, because you wouldn’t really need to do this for a menu, but for the sake of the analogy imagine that each page in the PDF represents one SharePoint ListItem.

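List<MemoryStream> pdfStreams = new List<MemoryStream>();
// _pdfResultStamper must be closed first so the page is fully written to _pdfResultStream
pdfStreams.Add(new MemoryStream(_pdfResultStream.ToArray()));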
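Stripped of the iText and SharePoint calls, the per-item loop described above is mostly name matching: resolve each SharePoint field’s template variable name (falling back to its title), keep it only if the template has a form field by that name, and pair it with the item’s value. The sketch below shows just that matching in plain JavaScript. The field-definition shape { title, templateVariableName }, the list item as a plain object keyed by title, and the function names are all hypothetical; in the real application the values come from the SharePoint Client API and are written with AcroFields.SetField.

```javascript
// Hypothetical shapes: a field definition is { title, templateVariableName }
// and a list item is a plain object keyed by SharePoint field title.
function resolveTemplateName(field) {
  // Prefer an explicit template variable name, otherwise use the field title
  return field.templateVariableName == null ? field.title : field.templateVariableName;
}

// Returns the [templateFieldName, value] pairs that would be passed to SetField
// for one list item; fields the template does not contain are skipped.
function buildFieldAssignments(templateFieldNames, fieldDefs, listItem) {
  const available = new Set(templateFieldNames);
  const assignments = [];
  for (const field of fieldDefs) {
    const name = resolveTemplateName(field);
    if (!available.has(name)) continue; // same role as fieldKeys.Contains(fieldName)
    const raw = listItem[field.title];
    assignments.push([name, raw == null ? "" : String(raw)]);
  }
  return assignments;
}

// Usage: one list item becomes one page's worth of field values
const template = ["ItemName", "Price", "Calories"];
const defs = [
  { title: "Title", templateVariableName: "ItemName" },
  { title: "Price", templateVariableName: null },
  { title: "Calories" },
  { title: "InternalNotes" } // not on the template, so it is skipped
];
const item = { Title: "Tiramisu", Price: "6.99", Calories: 450, InternalNotes: "n/a" };
console.log(buildFieldAssignments(template, defs, item));
```

Each returned pair corresponds to one SetField call (plus PartialFormFlattening) on that item’s page.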

Before each page’s stamper is closed, we also need to flatten the fields so they are no longer editable. For those who are not familiar with flattening in PDF, it is basically what it sounds like. A form field sits on top of the page as its own interactive layer; when you flatten it, that layer gets squished down into the page content, leaving just the text (I can hear a few PDF purists screaming at their monitors already, but for the lay person it’s a close enough explanation…). It’s sort of like marking a field read only, except that afterwards it is no longer a field at all.

AcroFields formFields = _pdfResultStamper.AcroFields;
_pdfResultStamper.FormFlattening = true; // required for PartialFormFlattening to take effect

// Loop through all the form fields in the PDF template
foreach (var field in formFields.Fields)
{
    string fieldName = field.Key;
    AcroFields.Item item = field.Value;
    PdfDictionary dict = item.GetMerged(0);

    // Skip fields that are hidden in the PDF template; they do not need flattening.
    // The /F entry is optional, so GetAsNumber can return null.
    PdfNumber flags = dict.GetAsNumber(PdfName.F);
    if (flags != null && (flags.IntValue & PdfAnnotation.FLAGS_HIDDEN) != 0)
    {
        continue;
    }
    // (any fields that must remain editable would also be skipped here)
    _pdfResultStamper.PartialFormFlattening(fieldName);
}

Observant readers might notice that we are using PartialFormFlattening. This is because we had a requirement that, in some circumstances, some fields be left editable by the end user. In menu terms, if the New York restaurants could charge a different price for tiramisu than the other restaurants, you would want to leave the price field editable. PartialFormFlattening flattens only the fields whose names we pass in and leaves the rest editable.
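Deciding which fields to pass to PartialFormFlattening comes down to two checks: skip fields whose annotation flags have the Hidden bit set (bit position 2, value 2, in the PDF specification’s annotation flags, which is what PdfAnnotation.FLAGS_HIDDEN holds in iText), and skip fields that must stay editable. Here is a plain JavaScript sketch of that selection. The { name, flags } field shape and the keepEditable set are hypothetical stand-ins for the iText objects and for however the editable exceptions are configured; a field with no /F entry is treated as having no flags set.

```javascript
// Hidden annotation flag from the PDF specification (bit position 2, value 2)
const FLAG_HIDDEN = 2;

// fields: hypothetical [{ name, flags }] where flags is undefined when the
// widget has no /F entry; keepEditable: names that must stay fillable.
function fieldsToFlatten(fields, keepEditable) {
  const result = [];
  for (const field of fields) {
    const flags = field.flags ?? 0; // no /F entry means no flags are set
    if ((flags & FLAG_HIDDEN) !== 0) continue; // hidden fields are left alone
    if (keepEditable.has(field.name)) continue; // e.g. a price a store may change
    result.push(field.name);
  }
  return result;
}

// Usage
const fields = [
  { name: "ItemName", flags: 4 },      // Print
  { name: "Price", flags: 4 },         // Print, but must stay editable
  { name: "RunJavaScript", flags: 2 }, // Hidden
  { name: "Calories" }                 // no /F entry
];
console.log(fieldsToFlatten(fields, new Set(["Price"])));
```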

A quick note about how PDF addresses form fields, as it’s going to be important in the next code block. It is possible to have two fields on a form named ‘Price’. Acrobat is perfectly fine with this, and in our situation it actually came in really handy. If you have been coding for a long time, it’s a little strange at first to have multiple fields with the same name. One effect of this is that when you set a field’s value as in the examples above, iText updates ALL of the form fields with that name. Again, a situation where the menu analogy sort of falls apart. There are ways to get around this, but we didn’t need to, so I do not have example code showing how.

Now that our PDF is pretty much like we want it in the memory stream we need to write it out to disk. Again, iText makes this easy.

// filePath is passed in and holds a full local file path, e.g. C:\Temp\MyFile.pdf
using (FileStream outStream = new FileStream(filePath, FileMode.Create))
{
    PdfCopyFields Writer = new PdfCopyFields(outStream);
    int pageCounter = 0;
    foreach (var pdfStream in pdfStreams)
    {
        PdfReader reader = new PdfReader(pdfStream);
        // Rename the PDF fields so each page's fields stay distinct
        RenameFields(reader, pageCounter++);
        // Add content (AddPages is a helper of ours, not shown)
        AddPages(reader.NumberOfPages, reader, ref Writer);
    }
    Writer.Close();
}

// The page streams are no longer needed once the merged file is written
foreach (var pdfStream in pdfStreams)
{
    pdfStream.Dispose();
}

Well, sort of easy. If you look closely at the code above, you’ll notice we call RenameFields on each page before adding it. This is because Acrobat does not like two pages in one document having form fields with the same name: they are treated as one field (just like the two ‘Price’ fields described above), so every page would show the same value. Our workaround was to tack the page number onto the end of each field name. For the sake of clarity, this is all RenameFields does.

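private static void RenameFields(PdfReader reader, int pageCounter)
{
    // Get the AcroFields once and work from a copy of its keys, because
    // RenameField changes the Fields dictionary we would otherwise be enumerating
    AcroFields form = reader.AcroFields;
    foreach (string field in new List<string>(form.Fields.Keys))
    {
        form.RenameField(field, field + pageCounter.ToString());
    }
}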
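The reason for the page suffix, and the reason for renaming from a copy of the key list, are easy to see in a small model. The sketch below is plain JavaScript, not iText: each page’s fields are a Map of name to value, and mergePages is a hypothetical stand-in for combining pages in which same-named fields collapse into one entry. renameFieldsForPage renames in place, so it iterates over a snapshot of the names; iterating the live Map would also visit each newly added name and never finish.

```javascript
// Illustrative model only: a page's form fields as a Map of name -> value.

// Appends the page index to every field name, in place. It iterates over a
// snapshot of the names: a live Map iterator would also visit each newly
// added name ("Price0", then "Price00", ...) and never finish.
function renameFieldsForPage(fields, pageIndex) {
  for (const name of [...fields.keys()]) {
    const value = fields.get(name);
    fields.delete(name);
    fields.set(name + String(pageIndex), value);
  }
  return fields;
}

// Hypothetical stand-in for merging pages: same-named fields collapse into one
// entry (in this model the last page's value wins).
function mergePages(pages) {
  const merged = new Map();
  for (const page of pages) {
    for (const [name, value] of page) merged.set(name, value);
  }
  return merged;
}

// Usage: two pages that both have a "Price" field
const makePages = () => [new Map([["Price", "6.99"]]), new Map([["Price", "8.49"]])];
console.log([...mergePages(makePages())]);
console.log([...mergePages(makePages().map((page, i) => renameFieldsForPage(page, i)))]);
```

Starting the index at 0 gives the Price0, Price1, ... names that the superscript script later looks up.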

So we should now have a PDF file written to our disk with the data passed in from SharePoint. iText makes this all really easy.

But…

We found later on that iText does not support superscripting text in a form field. Acrobat does, but iText (at the time of writing) does not. And we are dealing with currency in our menus, where the cents are often superscripted (after all, we want people to pay for the food, right?). After much lamentation and many, many internet searches, I discovered that Acrobat has an ECMAScript-compliant JavaScript engine and actually exposes a DOM for the document’s contents. Even better, you can set the value of a rich text form field to an array of JavaScript span objects and mark some of those spans as superscript. OK, so A) how do you get JavaScript into a PDF, and B) how do you create the span objects?

A) is pretty simple in concept. We keep a JavaScript file in SharePoint; our Windows application downloads it and reads its text into a string that we store globally. The file contains a few string.Format-style {0} placeholders, which we eventually fill with the list of PDF form fields that need superscripted text, formatted as a JavaScript array. We then use iText to inject the JavaScript into the document right before we write it to disk. When the PDF is opened, the script checks a hidden check box field in the document; if it is checked (meaning this is the first time the document has been opened), the script runs through our array of fields, adds the superscripting, and then unchecks the box so it doesn’t run again the next time the document is opened. In the example above where we write the PDF file to disk, add this line just before the PdfCopyFields writer is closed:

 Writer.AddJavaScript(PostProcessing.GetSuperscriptJavaScript()); 
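GetSuperscriptJavaScript itself isn’t shown in this post, but the idea (a script template downloaded from SharePoint, with string.Format-style {0} placeholders filled in with JavaScript arrays of field names) can be sketched in a few lines of JavaScript. The template text, which placeholder holds which list, and the function names below are hypothetical. Each name is quoted with JSON.stringify so that a quote character in a field name can’t break the generated script.

```javascript
// Hypothetical script template, as it might be stored in SharePoint:
// {0} receives the fields to superscript, {1} the fields to make read-only.
const scriptTemplate =
  "var fields = new Array({0});\n" +
  "var readOnlyFields = new Array({1});\n" +
  'var c = getField("RunJavaScript0");\n' +
  "if (c.isBoxChecked(0)) { /* rest of the superscript script */ }";

// Turns field names into the argument list of a JavaScript Array literal,
// quoting each name so quotes or backslashes in a name cannot break the script
function toJsArrayArgs(names) {
  return names.map((name) => JSON.stringify(name)).join(", ");
}

// Replaces {0}, {1}, ... with the matching argument; other braces are untouched
function fillPlaceholders(template, ...args) {
  return template.replace(/\{(\d+)\}/g, (match, digits) => {
    const index = Number(digits);
    return index < args.length ? args[index] : match;
  });
}

// Usage
const script = fillPlaceholders(
  scriptTemplate,
  toJsArrayArgs(["Price0", "Price1"]),
  toJsArrayArgs(["Price1"])
);
console.log(script);
```

One design point worth noting: this replacement touches only {digits} tokens, so the script’s own braces pass through untouched. With .NET’s string.Format, every literal { and } in the JavaScript file would have to be escaped as {{ and }}, otherwise string.Format throws a FormatException.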

Now, the PostProcessing class is fairly complex in how it’s built and used; arguably it’s over-engineered for what it does, and explaining it would make an already long blog post even longer. Suffice it to say it has an array of strings holding the name of each PDF form field that needs superscripted text. When we call GetSuperscriptJavaScript() we get back the contents of our JavaScript file from SharePoint with the placeholders filled in by our JavaScript array (a concatenation of those field names). Nothing revolutionary. But the really neat part is the JavaScript itself. Below is that code, and that should address B).

var fields = new Array("Price0", "Price1");
var readOnlyFields = new Array("Price1");

// Hidden check box: checked means the superscripting has not run yet
var c = getField("RunJavaScript0");
if (c.isBoxChecked(0)) {
    for (var z = 0; z < fields.length; z++) {
        var f = getField(fields[z]);
        var currTextObj = f.richValue;
        var spans = new Array();
        var spanIndex = 0;

        for (var i = 0; i < currTextObj.length; i++) {
            var origFontSize = currTextObj[i].textSize;
            var currText = currTextObj[i].text;
            var currTextArr = currText.split("(s)");

            for (var x = 0; x < currTextArr.length; x++) {
                var t = new Object();
                var fontSizePatt = /^\([0-9]+\)/;
                var text = currTextArr[x];

                if ((x % 2) == 1) { // Only superscript odd indexes in the array
                    if (fontSizePatt.test(text)) { // Check if our superscript uses a custom font size
                        var sizeR = fontSizePatt.exec(text).toString();
                        console.println("Parsed custom font size of '" + sizeR + "'");
                        var size = sizeR.substring(1, sizeR.length - 1);
                        var sizeI = parseInt(size, 10);
                        var newSize = origFontSize * (sizeI / 100);
                        console.println("Setting font size to " + newSize);
                        t.textSize = newSize;
                        text = text.substring(text.indexOf(")") + 1);
                    }
                    t.superscript = true;
                }

                t.text = text;
                spans[spanIndex] = t;
                spanIndex++;
            }
        }

        f.richValue = spans;
        for (var j = 0; j < readOnlyFields.length; j++) {
            if (readOnlyFields[j] == f.name) {
                f.readonly = true;
                break;
            }
        }
    }
    // Uncheck the box once every field is processed so this does not run again
    c.checkThisBox(0, false);
}

OK, so some explanation. Remember that earlier I said our code leaves some form fields editable in certain situations. Here we mark fields as read only instead of flattening them, because Acrobat JavaScript has no way to flatten an individual field; its flattenPages method flattens whole pages. Also remember that we use a hidden check box field in the PDF to determine whether the JavaScript should run. That explains the first few lines.

One other key detail is how a string is flagged for superscripting in SharePoint. We wanted the menu writers to have fine-grained control over superscripting. We could have just checked whether the field was a price, superscripted the last two digits, and been done with it. But what if our chefs invent tiramisu2 (imagine that 2 is superscripted)? To give the menu writers that kind of control, we made a markup syntax where any text between a pair of (s) markers is superscripted. We also noticed that some fonts do not look right when superscripted; they were too big and looked…off. So our menu writers can also specify a different font size for the superscript by adding (xx) right after the opening (s), where xx is a percentage of the surrounding text’s font size. For example, “Our tiramisu(s)(85)2(s) is the best” produces a string with a superscripted 2 at 85% of the font size of “tiramisu”.

When the script finishes, the result is equivalent to the following (assuming the field’s text is 20 point, so 85% works out to 17):

 var f = getField("Price0");
 var spans = new Array();
 spans[0] = new Object();
 spans[0].text = "Our tiramisu";
 spans[1] = new Object();
 spans[1].text = "2";
 spans[1].superscript = true;
 spans[1].textSize = 17;
 spans[2] = new Object();
 spans[2].text = " is the best";
 f.richValue = spans;  
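Because the (s) markup rules are plain string handling, they can be checked outside Acrobat. The function below restates the splitting logic from the script above as a standalone JavaScript function: split on (s), superscript the odd-indexed chunks, and apply an optional leading (NN) percentage. It returns plain objects with the same text, superscript and textSize properties that Acrobat’s span objects use, but it never touches richValue; the function name and the baseSize parameter (standing in for the original chunk’s textSize) are hypothetical.

```javascript
// Standalone restatement of the (s) markup parsing; returns span-like objects.
// baseSize stands in for the textSize of the original rich text chunk.
function parseSuperscriptMarkup(text, baseSize) {
  const sizePattern = /^\((\d+)\)/; // optional "(NN)" percentage at the start
  const spans = [];
  const parts = text.split("(s)");
  for (let i = 0; i < parts.length; i++) {
    let chunk = parts[i];
    const span = {};
    if (i % 2 === 1) { // odd indexes sit between a pair of (s) markers
      const m = sizePattern.exec(chunk);
      if (m) {
        span.textSize = baseSize * (parseInt(m[1], 10) / 100);
        chunk = chunk.substring(m[0].length);
      }
      span.superscript = true;
    }
    span.text = chunk;
    spans.push(span);
  }
  return spans;
}

// Usage
console.log(parseSuperscriptMarkup("Our tiramisu(s)(85)2(s) is the best", 20));
```

Note that a marker that is never closed, as in “tiramisu(s)2 is the best”, superscripts everything after the (s), because only text between a pair of markers lands on an odd index.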

A quick note about changing the font size like this: Acrobat apparently calculates the vertical offset of superscripted text from the font size of the current span, not the parent container. The upshot is that if the size difference is drastic, say 20 points vs. 5 points, the superscript is positioned as if the whole form field were 5 point, so it shows up in the middle of the 20-point text. It’s a minor concern, but worth pointing out.

For More Info

Additional information about span objects and how they work can be found here: http://partners.adobe.com/public/developer/en/acrobat/sdk/Acro6JS.pdf (a word to the wise, though: that is an older API reference, and some things in it have been deprecated).

Questions about Populating a SharePoint List to PDF?

Leave a comment below…

Lane Goolsby

1 comment

Join the conversation
  • Suhas Yerramsetty - November 29, 2019

    This is a great article. I just have one question: does the template need to have predefined field labels and fields for this approach to work, or can the app generate the field labels as well?
