How to generate documents in Word. Exwog - report generator from Excel to Word by template

We live in a world where PHP developers have to interact with the Windows operating system from time to time. WMI (Windows Management Instrumentation) is one example; interacting with Microsoft Office is another.

In this article, we will look at a simple integration between Word and PHP: generating a Microsoft Word document from input fields in an HTML form using PHP (and its Interop extension).

Preparatory steps

The first step is to make sure we have a basic WAMP environment set up. Since Interop is only available on Windows, our Apache server and PHP installation must be deployed on a Windows machine. For this I use EasyPHP 14.1, which is extremely easy to install and configure.

The next step is to install Microsoft Office. The version is not very important. I'm using Microsoft Office 2013 Pro, but any version of Office newer than 2007 should work.

You also need to make sure that the libraries for developing Interop applications are installed (PIA, Primary Interop Assemblies). You can check this by opening Windows Explorer and going to the \assembly directory under the Windows folder, where we should see a set of installed assemblies:

You can see the Microsoft.Office.Interop.Word element here (underlined in the screenshot). This will be the assembly we will be using in our demo. Please pay special attention to the “Assembly name”, “Version” and “Public key token” fields. We'll be using them in our PHP script soon.

This directory also contains other assemblies (including the entire Office family) available for use in your programs (not only for PHP, but also for VB.NET, C#, etc.).

If the assembly list does not include the entire Microsoft.Office.Interop package, then we need to either reinstall Office by adding the PIA, or manually download the package from Microsoft and install it. For more detailed instructions see this MSDN page.

Note: only the PIA distribution kit for Microsoft Office 2010 is available for download and installation. The version of the assemblies in this package is 14.0.0; version 15 comes only with Office 2013.

Finally, you need to enable the php_com_dotnet.dll extension in php.ini and restart the server.

Now you can start programming.

HTML form

Since the bulk of this example falls on the server side, we'll create a simple page with a form that looks like this:

We have a text field for the name, a group of radio buttons for gender, a slider for age, and a text input area for entering a message, as well as the infamous “Send” button.

Save this file as “index.html” in the virtual host directory so that it can be accessed at an address like http://test/test/interop.

Server part

The server-side handler file is the main focus of our conversation. To begin with, I will give the complete code of this file, and then I will explain it step by step.

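<?php
// Copy the form values into $inputs. The form has no "printdate" field,
// so an empty placeholder is added to avoid a PHP notice later.
$inputs = array('printdate' => '');
foreach ($_POST as $key => $value) {
    $inputs[$key] = $value;
}

$assembly = 'Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c';
$class = 'Microsoft.Office.Interop.Word.ApplicationClass';
$w = new DOTNET($assembly, $class);
$w->visible = true;

$fn = __DIR__ . "\\template.docx";
$d = $w->Documents->Open($fn);
echo "Document is open.<br>";

$flds = $d->Fields;
$count = $flds->Count;
echo "There are $count fields in the document.<br>";
echo "<ul>";
$mapping = setupfields();

foreach ($flds as $index => $f) {
    // Select the field and overwrite it with the matching form value.
    $f->Select();
    $key = $mapping[$index];
    $value = $inputs[$key];
    if ($key == "gender") {
        if ($value == "m") $value = "Mr."; else $value = "Ms.";
    }
    if ($key == "printdate") $value = date("Y-m-d H:i:s");
    $w->Selection->TypeText($value);
    echo "<li>Mapping field $index: $key to $value</li>";
}
echo "</ul>";
echo "Processing completed!<br>";
echo "Printing, please wait...<br>";

$d->PrintOut();
sleep(3); // give Word time to queue the print job
echo "Done!";

$w->Quit(false); // false: do not save changes to the template
$w = null;

// Maps the numeric index of each field in the template to a form key.
function setupfields()
{
    $mapping = array();
    $mapping[0] = "gender";
    $mapping[1] = "name";
    $mapping[2] = "age";
    $mapping[3] = "msg";
    $mapping[4] = "printdate";
    return $mapping;
}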

After we have written the values obtained from the form to the $inputs variable, and also created an empty element with the printdate key (we will discuss why later), we move on to four very important lines:

$assembly = "Microsoft.Office.Interop.Word, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c";
$class = "Microsoft.Office.Interop.Word.ApplicationClass";
$w = new DOTNET($assembly, $class);
$w->visible = true;

Working with COM from PHP requires instantiating a class within an "assembly". In our case, we are working with Word. Looking at the first screenshot, we can write down the complete assembly signature for Word:

  • “Name”, “Version” and “Public Key Token” are all taken from the information shown in “C:\Windows\assembly”.
  • “Culture” is always neutral.

The class we want to refer to is always named “assembly name” + “.ApplicationClass“.
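The rule above is mechanical enough to sketch in a few lines. The snippet is in Python purely for illustration (the article's script is PHP); the helper names assembly_signature and application_class are hypothetical, and the values are the ones quoted in this article.

```python
def assembly_signature(name: str, version: str, public_key_token: str) -> str:
    """Build the full assembly string from the values shown in the assembly list."""
    # "Culture" is always neutral for these assemblies, as noted above.
    return f"{name}, Version={version}, Culture=neutral, PublicKeyToken={public_key_token}"


def application_class(name: str) -> str:
    """The class to instantiate is the assembly name plus ".ApplicationClass"."""
    return f"{name}.ApplicationClass"


signature = assembly_signature("Microsoft.Office.Interop.Word", "15.0.0.0", "71e9bce111e9429c")
class_name = application_class("Microsoft.Office.Interop.Word")
print(signature)
print(class_name)
```

With a different Office version installed, only the version (and possibly the token) read from the assembly list would change.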

By setting these two parameters, we can get an object for working with Word.

This object can stay in the background, or we can bring it to the foreground by setting the visible attribute to true.

The next step is to open the document that needs processing and store the “document” instance in the $d variable.

There are several ways to create content based on form data in your document.

The worst thing to do would be to hard-code the content of the document in PHP and then output it to a Word document. I highly recommend not doing this for the following reasons:

  1. You are losing flexibility. Any changes to the output file will require changes to the PHP code.
  2. This breaks the separation of logic and presentation.
  3. Applying styles to document content (alignment, fonts, styles, etc.) in a script will greatly increase the number of lines of code. Changing styles programmatically is too cumbersome.

Another option would be to use find and replace. PHP has good built-in facilities for this. We can create a Word document in which we will place labels with special delimiters, which will later be replaced. For example, we can create a document that will contain the following snippet:

and with PHP we can easily replace it with the contents of the "Name" field, obtained from the form.

It's simple, and it saves us from all the unpleasant consequences of the first method. We just need to pick a suitable delimiter; in effect, the document becomes a template.
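A minimal sketch of this find-and-replace idea, in Python for illustration only. The {{name}} delimiter style and the function name fill_placeholders are assumptions made for this example, not something prescribed by Word or PHP; the sketch works on plain text and leaves reading and writing the actual document aside.

```python
import re


def fill_placeholders(text: str, values: dict) -> str:
    """Replace every {{key}} label with its value; unknown labels are left as-is."""
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, text)


template = "Dear {{name}}, you are {{age}} years old. {{unknown}}"
filled = fill_placeholders(template, {"name": "Anna", "age": 30})
print(filled)
```

Leaving unknown labels untouched makes a misspelled placeholder easy to spot in the output.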

I recommend the third way, which builds on a deeper knowledge of Word. We will use fields as placeholders and, from PHP code, directly update the fields with the corresponding values.

This approach is flexible, fast, and consistent with Word best practices. It also avoids full-text searches in the document, which is good for performance. Note that this solution also has a drawback.

Word has never supported named indexes for fields. Even if we give names to the fields we create, we still have to use the numeric identifiers of these fields. This also explains why we need a separate function (setupfields) to map the field index to the name of the form field.
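The index-to-name mapping can be sketched independently of Word. The Python below is illustrative only: it assumes the five fields appear in the template in the order gender, name, age, msg, printdate, and the function name values_by_field_index is hypothetical. It mirrors the special handling of gender and printdate from the PHP handler.

```python
from datetime import datetime

# Assumed field order in the template; the position in the list is the field index.
FIELD_KEYS = ["gender", "name", "age", "msg", "printdate"]


def values_by_field_index(inputs: dict, now: datetime) -> list:
    """Return the text to type into each field, ordered by field index."""
    values = []
    for key in FIELD_KEYS:
        value = inputs.get(key, "")
        if key == "gender":
            value = "Mr." if value == "m" else "Ms."
        elif key == "printdate":
            # No form input exists for this field; it is filled at print time.
            value = now.strftime("%Y-%m-%d %H:%M:%S")
        values.append(str(value))
    return values


form = {"name": "Anna", "gender": "f", "age": "30", "msg": "Hello"}
result = values_by_field_index(form, datetime(2014, 5, 1, 12, 0, 0))
print(result)
```

If the order of fields in the template changes, only FIELD_KEYS needs to be updated, which is the same maintenance cost the setupfields function carries.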

In this demo tutorial, we will use a document with 5 MERGEFIELD fields. We will place the template document in the same place as our script handler.

Please note that the printdate field does not have a corresponding field on the form. This is why we added an empty printdate element to the $inputs array. Without it, the script will still run and work, but PHP will issue a warning that the printdate index is not present in the $inputs array.

After replacing the fields with new values, we will print the document using

$d->PrintOut();

The PrintOut method takes several optional parameters, and we'll use its simplest form. This will print one copy of the document on the default printer attached to the Windows machine.

You can also call PrintPreview to take a look at the resulting output before printing it. In a fully automated environment, we will of course use the PrintOut method.

You must wait a while before exiting Word, as it takes time to queue the print job. Without the sleep(3) delay, $w->Quit is executed immediately and the job is not queued.

Finally, we call $w->Quit(false), which closes the Word application started by our script. The only parameter passed to the method specifies whether to save the file before exiting. We have made edits to the document, but we don't want to save them, as we need a clean template for later work.

After we're done with the code, we can load our form page, fill in some values, and submit it. The images below show the output of the script as well as the updated Word document:

Improving processing speed and a little more about PIA

PHP is a weakly typed language, and a COM object is of type Object. While writing the script, we have no way to get a description of the object, be it a Word application, a document or a field. We don't know what properties the object has or what methods it supports.

This slows development down a lot. To speed it up, I would recommend writing the functions in C# first and then translating the code into PHP. I can recommend a free IDE for C# development called “#develop” (SharpDevelop). I prefer it to Visual Studio because #develop is smaller, simpler, and faster.

Migrating C# code to PHP is not as scary as it sounds. Let me show you a couple of lines in C#:

Word.Application w = new Word.Application();
w.Visible = true;

String path = Application.StartupPath + "\\template.docx";

Word.Document d = w.Documents.Open(path) as Word.Document;
Word.Fields flds = d.Fields;
int len = flds.Count;

foreach (Word.Field f in flds)
{
    f.Select();
    int i = f.Index;
    w.Selection.TypeText("...");
}

You will notice that the C# code is very similar to the PHP code I showed earlier. C# is a strongly typed language, so in this example you will notice several casting operators, and variables need to be typed.

By specifying the type of the variable, you can enjoy clearer code and auto-completion, and the development speed is significantly increased.

Another way to speed up your PHP development is to record a macro in Word. We carry out the same sequence of actions by hand and save it as a macro. The macro is written in Visual Basic, which is also easy to translate into PHP.

And most importantly, the Office PIA documentation from Microsoft, especially the namespace documentation for each Office application, is the most detailed reference material. The three most used applications are:

  • Excel 2013: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.excel(v=office.15).aspx
  • Word 2013: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word(v=office.15).aspx
  • PowerPoint 2013: http://msdn.microsoft.com/en-us/library/microsoft.office.interop.powerpoint(v=office.15).aspx

Conclusion

In this article, we showed you how to populate a Word document with data using the PHP COM libraries and Microsoft Office interoperability.

Windows and Office are widely used in everyday life. Knowing what Office/Windows and PHP can do together will be useful for every PHP and Windows developer.

The PHP COM extension opens the door for you to use this combination.

We continue the topic of working with forms in Word, which we started earlier. In previous articles, we looked at forms only from the “advanced user” point of view, i.e. we created documents that are easy to fill out manually. Today I want to extend this task and try to use the content controls mechanism to generate documents.

Before we get down to our immediate task, I want to say a few words about how data for content controls is stored in Word documents (I will deliberately omit how they are bound to the content of the document, but I hope to return to this sometime in the next articles).

A natural question: what are itemProps1.xml and similar components? These components store descriptions of the data sources. Most likely the developers intended to support other sources besides the XML files embedded in the document, but so far only this method has been implemented.

How are the itemPropsX.xml components useful to us? They list the XML schemas (their targetNamespace) used in the parent itemX.xml. This means that if more than one custom XML is attached to the document, then to find the one we need we have to go through the itemPropsX.xml components, find the desired schema, and from it the desired itemX.xml.

Now one more thing. We will not manually analyze the relationships between the components and search for the necessary ones using only the basic Packaging API! Instead, we'll use the Open XML SDK (its assemblies are available through NuGet). Of course, earlier we did not say a word about this API, but for our task a minimum is required from it and all the code will be quite transparent.

Well, the main introduction has been made, you can start with an example.

By tradition, we will take the same “Meeting Report” that we drew in an earlier article. Let me remind you that this is what the document template looked like:

And this is the XML to which the document fields were bound:

<meetingNotes xmlns="urn:MeetingNotes" subject="" date="" secretary="">
  <participants>
    <participant name="" />
  </participants>
  <decisions>
    <decision problem="" solution="" responsible="" controlDate="" />
  </decisions>
</meetingNotes>

Step 1. Creating a data model

Actually, our task is not just to generate a document, but to create (at least in a draft version) a convenient tool for use by both the developer and the user.

Therefore, we will declare the model as a structure of C # classes:

public class MeetingNotes
{
    public MeetingNotes()
    {
        Participants = new List<Participant>();
        Decisions = new List<Decision>();
    }

    public string Subject { get; set; }
    public DateTime Date { get; set; }
    public string Secretary { get; set; }
    public List<Participant> Participants { get; set; }
    public List<Decision> Decisions { get; set; }
}

public class Decision
{
    public string Problem { get; set; }
    public string Solution { get; set; }
    public string Responsible { get; set; }
    public DateTime ControlDate { get; set; }
}

public class Participant
{
    public string Name { get; set; }
}

By and large, nothing special, except that attributes controlling XML serialization have to be added (they are not shown in the listing above), since the names in the model and in the required XML differ slightly.

Step 2. Serialize the above model to XML

The task is, in principle, trivial. It would be a case of "take our favorite XmlSerializer and go", if not for one "but".

Unfortunately, the current version of Office seems to have a bug: if the custom XML declares another namespace before the main namespace (the one from which Word should take elements for display), then repeating content controls start to display incorrectly. Only as many elements are shown as were in the template itself, i.e. the repeating section does not work.

That is, this XML works:

<test xmlns="urn:Test" attr1="1" attr2="2">
  <repeatedTag attr="1" />
  <repeatedTag attr="2" />
  <repeatedTag attr="3" />
</test>

and this one too:

<test xmlns="urn:Test" attr1="1" attr2="2" xmlns:t="urn:TTT">
  <repeatedTag attr="1" />
  <repeatedTag attr="2" />
  <repeatedTag attr="3" />
</test>

but this one no longer does:

<test xmlns:t="urn:TTT" xmlns="urn:Test" attr1="1" attr2="2">
  <repeatedTag attr="1" />
  <repeatedTag attr="2" />
  <repeatedTag attr="3" />
</test>
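To tell which of these cases a given XML falls into, it is enough to look at the order of the xmlns declarations on the root element. The following Python sketch is illustrative only: the function name is hypothetical, and the check is purely textual (it does not parse the XML properly and does not confirm how Word itself behaves).

```python
import re


def default_namespace_declared_first(xml_text: str) -> bool:
    """True when the first xmlns declaration on the root element is the default one."""
    # Take the first start tag that begins with a name, which skips "<?xml ...?>".
    root = re.search(r"<([A-Za-z_][^\s/>]*)([^>]*)>", xml_text)
    if root is None:
        raise ValueError("no root element found")
    # Each entry is "" for xmlns="..." and the prefix for xmlns:prefix="...".
    declarations = re.findall(r"\bxmlns(?::([\w.-]+))?\s*=", root.group(2))
    return bool(declarations) and declarations[0] == ""


good = '<test xmlns="urn:Test" attr1="1" xmlns:t="urn:TTT"><repeatedTag attr="1" /></test>'
bad = '<test xmlns:t="urn:TTT" xmlns="urn:Test" attr1="1"><repeatedTag attr="1" /></test>'
print(default_namespace_declared_first(good))
print(default_namespace_declared_first(bad))
```

A check like this could be run on the serializer output before it is written into the document.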

I tried to submit a bug to Microsoft support on Connect, but for some reason I do not have access to submit bugs for Office. The discussion on the MSDN forum didn't help either.

So a workaround is needed. If we built the XML by hand, there would be no problem: we would control everything ourselves. In this case, however, I really want to use the standard XmlSerializer, which by default adds several namespaces of its own to the output XML even if they are not used.

We will completely suppress the output of XmlSerializer's own namespaces. True, this approach works only if the serializer really does not need them (otherwise they will still be added, and BEFORE ours).

Actually, the whole code (provided that the variable meetingNotes contains a previously populated MeetingNotes object):

var serializer = new XmlSerializer(typeof(MeetingNotes));
var serializedDataStream = new MemoryStream();

var namespaces = new XmlSerializerNamespaces();
namespaces.Add("", "");

serializer.Serialize(serializedDataStream, meetingNotes, namespaces);
serializedDataStream.Seek(0, SeekOrigin.Begin);

Step 3. Enter the resulting XML into a Word document.

Here we do the following:

  • copy the template and open the copy
  • find the required custom XML in it (search by the namespace "urn:MeetingNotes")
  • replace the content of the component with our XML

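File.Copy(templateName, resultDocumentName, true);

using (var document = WordprocessingDocument.Open(resultDocumentName, true))
{
    // Find the custom XML part whose properties reference our schema.
    var xmlpart = document.MainDocumentPart.CustomXmlParts
        .Single(xmlPart => xmlPart.CustomXmlPropertiesPart.DataStoreItem.SchemaReferences
            .OfType<SchemaReference>()
            .Any(sr => sr.Uri.Value == "urn:MeetingNotes"));

    // Replace the content of the part with the serialized model.
    xmlpart.FeedData(serializedDataStream);
}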
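The same three steps can be sketched without the Open XML SDK, since a .docx file is a ZIP package. The Python below is a simplified illustration under stated assumptions: it assumes the parts are named customXml/itemN.xml and customXml/itemPropsN.xml with matching numbers (instead of following the package relationships, as the SDK does), and the function name replace_custom_xml is hypothetical.

```python
import io
import re
import zipfile


def replace_custom_xml(docx_bytes: bytes, schema_uri: str, new_xml: bytes) -> bytes:
    """Return a copy of the package with the custom XML for schema_uri replaced."""
    marker = f'uri="{schema_uri}"'.encode("utf-8")
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as source:
        names = source.namelist()
        target = None
        for name in names:
            match = re.fullmatch(r"customXml/itemProps(\d+)\.xml", name)
            if match and marker in source.read(name):
                # Assumption: itemPropsN.xml describes itemN.xml.
                target = f"customXml/item{match.group(1)}.xml"
                break
        if target is None or target not in names:
            raise LookupError(f"no custom XML part found for {schema_uri}")
        output = io.BytesIO()
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as result:
            # Copy every part unchanged except the one being replaced.
            for info in source.infolist():
                data = new_xml if info.filename == target else source.read(info.filename)
                result.writestr(info.filename, data)
    return output.getvalue()
```

Working on bytes in memory keeps the template untouched, which matches the "copy the template and open the copy" step above.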

In the previous articles of the series "Automation of filling out documents" I talked about how to create the user interface of the application, organize the validation of input data and get the number in words without using VBA code. In this final article, we will talk about the magic - transferring all the necessary values ​​from an Excel workbook to a Word document. Let me show you what should be the result:

Mechanism description

To begin with, I will describe in general terms how the data will be transferred into a Word document. First of all, we need a Word document template that contains all the markup, tables, and the part of the text that stays unchanged. In this template, you need to mark the places where values from the Excel workbook will be substituted; the most convenient way to do this is with bookmarks. After that, you need to organize the Excel data to match the Word template, and last but not least, write the transfer procedure itself in VBA.

So, first things first.

Create a Word Document Template

Everything is extremely simple here: we create a regular document, type and format the text, and generally work toward the required form. In the places where values from Excel will be substituted, you need to create bookmarks. This is done as follows:

In this way, create all the bookmarks, that is, mark every place where data from Excel will be inserted. The resulting file must be saved as "MS Word Template" using the menu item "File" -> "Save As ...".

Preparing Excel data

For convenience, I placed all the data that needs to be transferred into the Word document on a separate worksheet called Bookmarks. This sheet has two columns: the first contains the bookmark names (exactly as they are named in the Word document), and the second contains the corresponding values to be inserted.

Some of these values are obtained directly from the data entry sheet, and some from auxiliary tables located on the Support sheet. In this article, I will not analyze the formulas that calculate the required values. If something is not clear, ask questions in the comments.

At this stage, it is important to correctly indicate all the names of the bookmarks - the correctness of the data transfer depends on this.
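Since everything depends on the names being right, the two-column table can be checked before any data is transferred. The sketch below is in Python for illustration only (the article's procedure is written in VBA). It assumes Word's usual naming rules for bookmarks (starts with a letter, contains only letters, digits and underscores, at most 40 characters); the function name and the sample rows are hypothetical.

```python
import re

# Assumed Word bookmark naming rules: a letter first, then letters, digits or "_", max 40 chars.
BOOKMARK_NAME = re.compile(r"[^\W\d_]\w{0,39}")


def build_bookmark_map(rows) -> dict:
    """Turn (name, value) rows into a dict, rejecting invalid or duplicate names."""
    bookmarks = {}
    for name, value in rows:
        if not BOOKMARK_NAME.fullmatch(name):
            raise ValueError(f"invalid bookmark name: {name!r}")
        if name in bookmarks:
            raise ValueError(f"duplicate bookmark name: {name!r}")
        bookmarks[name] = "" if value is None else str(value)
    return bookmarks


rows = [("ClientName", "Anna Ivanova"), ("Amount", 1500), ("Comment", None)]
print(build_bookmark_map(rows))
```

A check of this kind catches a stray space or a duplicated name before Word is even started.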

Transfer procedure

But this is the most interesting thing. There are two options for executing the data transfer code:

  • The code is executed in an Excel workbook, the data is passed to Word one value at a time and immediately placed in the document.
  • The code is executed in a separate Word document, all data is transferred from Excel in one batch.

From the point of view of speed of execution, especially with a large number of bookmarks, the second option looks much more attractive, but it requires more complex actions. This is what I used.

Here's what you need to do:

  • Create a macro-enabled Word document template. This template will contain executable VBA code.
  • In the created template, you need to place a program written in VBA. To do this, when editing a template, press the Alt + F11 key combination and enter the program code in the opened Visual Basic editor window.
  • In an Excel workbook, write code that calls the fill procedure from the newly created Word template.

I will not provide the text of the procedure in the article - it can be easily viewed in the FillDocument.dotm file located in the Template folder in the archive with the example.

How can you use all this to solve your particular problem?

I understand that in words it all looks very simple, but what happens in practice? I suggest you just use a ready-made option. Download the archive with the example, in an Excel workbook press Alt + F11 to open the Visual Basic editor and read all my comments on the program. In order to change the program to suit your needs, you just need to change the value of several constants, they are placed at the very beginning of the program. You can freely copy the entire program text into your project.

Archive structure

The archive attached to this article contains several files.

The main file is an Excel workbook called "Create Confirmations". There are 4 worksheets in this workbook, of which only two are displayed: "Input" - a data entry sheet and "Database" - an archive of all entered documents.

The Templates folder contains Word document templates. One is a template containing a bookmark filling program, and the other is a form to fill out. You can use the template with the program without changes, but the form for filling, of course, will have to be redone in accordance with your needs.

How to rework the example "for yourself"?

  1. Prepare a Word document template that you need to fill out. Create all the necessary bookmarks in it and save it as a "MS Word template".
  2. Copy the FillDocument.dotm file from the archive attached to this article to the folder with the prepared template. This file is responsible for filling in the template bookmarks, and you do not need to change anything in it.
  3. Prepare an Excel workbook for data entry. It is up to you to decide if it will have any "advanced" user interface and perform various clever calculations. The main thing is that it contains a worksheet with a table of correspondence between the name of the bookmark in the Word template and the value that needs to be substituted.
  4. Insert the VBA program code from the sample file into the prepared workbook. Replace all constants according to your project.
  5. Test the correctness of work.
  6. Use it actively!


If you want all sheets and/or rows of your Excel data file to take part in generating the documents, click the button on the right labeled Numbers (its label will then change to All).

5. Set the template for the names of new word files


Set the template for the names of new word files:

New word files names template is the template for the names of the new documents (Word files) generated by the program. Here the name template contains column names of the Excel file surrounded by curly braces: {A} and {B}. When generating a new document, the program replaces every {A} and {B} with the corresponding cell values from the Excel file; the result is the name of the new document (Word file).

You can set your own framing symbols on the program's Settings tab.
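The substitution described here can be sketched as follows. This is an illustration in Python, not Exwog's actual code: the function names, the default braces, and the sample row are assumptions made for the example.

```python
import re


def column_index(letters: str) -> int:
    """Convert an Excel column name such as "A" or "AB" to a zero-based index."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def fill_template(template: str, row: list, left: str = "{", right: str = "}") -> str:
    """Replace each framed column name, e.g. {A}, with that column's cell from row."""
    pattern = re.compile(re.escape(left) + r"([A-Za-z]+)" + re.escape(right))

    def substitute(match):
        index = column_index(match.group(1))
        # Columns beyond the end of the row are left untouched.
        return str(row[index]) if index < len(row) else match.group(0)

    return pattern.sub(substitute, template)


row = ["Anna", "Ivanova", "engineer"]
print(fill_template("{A} {B}", row))
print(fill_template("<A>_<C>", row, left="<", right=">"))
```

The left and right parameters stand in for the configurable framing symbols mentioned above; the same function could serve for both file names and template text.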

6. Click "Generate"


Click the Generate button and the progress will appear on the screen. Exactly as many documents (Word files) are created as there are rows of the Excel file involved in the generation.

7. Done


All documents (Word files) have been created and are in the folder specified in Folder to save the new word files. That's all :)

Exwog - report generator from Excel to Word by template

Free generator of Word files from a template (Word file) based on Excel file data

Works on Mac OS, Windows and Linux

Allows you to specify the names of new generated word files

Allows you to define sheets and lines of the desired data

Allows you to set enclosing characters for Excel column names

Easy to use

Store your data in Excel format (.xls and .xlsx) and generate Word files (.doc and .docx) in a few clicks :)


How does it work?

Take a look at your excel file


In this example, the Excel file contains customer information. Each row corresponds to one client. First names are in column A, last names in column B, and professions in column C.

Create a word document (.doc or.docx)


Create a "template" (Word file) for generating new documents (Word files). Here, the "template" text contains column names of the Excel file surrounded by curly braces: {A}, {B} and {C}.

The program will generate new documents from the "template", replacing every {A}, {B} and {C} with the corresponding cell values from the Excel file: {A} is the first name, {B} the last name, {C} the profession.

You can also set your own framing symbols on the program's Settings tab.

Choose paths for files and folders


Select the paths for files and folders (buttons labeled Select). In the program, you set the following paths:

Excel file with data (*.xls, *.xlsx): the path to your Excel data file (customer information);

Word template file (*.doc, *.docx): the path to your "template" (the Word file created in the previous step);

Folder to save the new word files: the path to the folder where the program will save the newly generated documents.

Set the sheets and rows of the data you want


Specify the numbers of the sheets and rows of your Excel data file (customer information) for which you want to generate documents:

Excel file data sheets: the numbers of the sheets of your Excel file that will take part in generating new documents;

Excel file data rows: the row numbers on those sheets (the sheets specified in Excel file data sheets) that will take part in generating new documents. A separate document (Word file) is created from the data of each specified row.

Sheet and row numbering in the program starts at 1.
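Because the numbering is 1-based, a selection has to be shifted by one when it is applied to a zero-based list of rows. A small illustrative sketch in Python; the function name is hypothetical, and passing None stands in for the program's All setting.

```python
def select_rows(rows: list, numbers=None) -> list:
    """Pick rows by 1-based numbers; None means every row (the "All" setting)."""
    if numbers is None:
        return list(rows)
    selected = []
    for number in numbers:
        if not 1 <= number <= len(rows):
            raise IndexError(f"row number {number} is outside 1..{len(rows)}")
        # Shift from the 1-based number shown to the user to a 0-based list index.
        selected.append(rows[number - 1])
    return selected


clients = [["Anna", "Ivanova"], ["Boris", "Petrov"], ["Vera", "Sidorova"]]
print(select_rows(clients, [1, 3]))
print(len(select_rows(clients)))
```

One document would then be generated per selected row, as described above.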