Dec/090
Word Automation Services: What It Does
This post is syndicated from Microsoft Word 2010.
Following up on my first post about Word Automation Services, I wanted to continue by talking about the functionality offered (and not offered) by the service, how it's exposed, and the types of solutions you will be able to build on top of it.
What the Service Does
Functionally, the service is very simple – this is intentional, as we wanted to address the pain points that we've heard loud and clear from you over the past few years, while keeping performance and scale at the top of our priorities (which meant avoiding the temptation to bring over everything "just because").
With that mindset, we really only set out to tackle the two most common requests that we hear:
- I have a bunch of Word documents. I want to convert them to PDF on the server in bulk (e.g. DOCX to PDF).
- I have a template and some data. I want to merge the two and create a set of PDF files, one per merge result (e.g. mail merge to PDF).
Now, the output format isn't always PDF, though it's probably the most common. Translated into features, that meant the server needed to do one thing really well: file conversion. Accordingly, Word Automation Services supports conversions to and from almost all of the formats the Word client understands:
File formats the service can read:
- Office Open XML (DOCX, DOCM, DOTX, DOTM)
- Word 97-2003 Document (DOC) and Word 97-2003 Template (DOT)
- We also support older versions of Word as far back as Word 2.0 for Windows (!)
- Rich Text Format (RTF)
- Single File Web Page (MHTML)
- HTML
- Word 2003 XML
- Word 2007/2010 XML
File formats the service can write:
- PDF
- XPS
- Office Open XML (DOCX, DOCM)
- Word 97-2003 Document (DOC)
- Rich Text Format (RTF)
- Single File Web Page (MHTML)
- Word 2007/2010 XML
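When submitting files, a solution typically checks that each input is in a readable format and pairs it with an output location in the target format. The following is a minimal sketch that assumes file extensions are a good-enough first filter (an extension is only a hint about a file's actual format). The `TargetFormat` enum and `ConversionPaths` class are hypothetical stand-ins, not the service's own `SaveFormat` enum. They only build strings that could then be passed as an input/output pair.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical output formats for this sketch; not the service's SaveFormat enum.
public enum TargetFormat { Pdf, Xps, Docx, Docm, Doc, Rtf, Mhtml, Xml }

public static class ConversionPaths
{
    // Extensions of the readable formats listed above (older Word versions also use .doc).
    static readonly HashSet<string> Readable = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        ".docx", ".docm", ".dotx", ".dotm", ".doc", ".dot",
        ".rtf", ".mht", ".mhtml", ".htm", ".html", ".xml"
    };

    // Assumed extension for each output format.
    static readonly Dictionary<TargetFormat, string> OutputExtension = new Dictionary<TargetFormat, string>
    {
        [TargetFormat.Pdf] = ".pdf",
        [TargetFormat.Xps] = ".xps",
        [TargetFormat.Docx] = ".docx",
        [TargetFormat.Docm] = ".docm",
        [TargetFormat.Doc] = ".doc",
        [TargetFormat.Rtf] = ".rtf",
        [TargetFormat.Mhtml] = ".mht",
        [TargetFormat.Xml] = ".xml",
    };

    // Last path segment of a URL, e.g. "foo.docx".
    static string FileName(string url) => url.Substring(url.LastIndexOf('/') + 1);

    // Extension including the dot, or "" when there is none.
    static string Extension(string url)
    {
        string name = FileName(url);
        int dot = name.LastIndexOf('.');
        return dot >= 0 ? name.Substring(dot) : "";
    }

    // True when the input's extension matches one of the readable formats.
    public static bool LooksReadable(string inputUrl) => Readable.Contains(Extension(inputUrl));

    // Same base name, placed in the output folder, with the target format's extension.
    public static string BuildOutputUrl(string inputUrl, string outputFolderUrl, TargetFormat format)
    {
        string name = FileName(inputUrl);
        int dot = name.LastIndexOf('.');
        string baseName = dot > 0 ? name.Substring(0, dot) : name;
        return outputFolderUrl.TrimEnd('/') + "/" + baseName + OutputExtension[format];
    }
}
```

For example, `BuildOutputUrl("http://contoso.com/input/foo.docx", "http://contoso.com/output", TargetFormat.Xps)` yields the same output URL used in the example job later in this post.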
This also meant that we needed to support all of the features that are part of loading and saving documents, such as:
- XML data mapping – you can place updated XML in the document, and content controls will automatically be updated
- Fields – the service (or the file) can be set to recalculate fields automatically during conversion
- AF Chunks – you can embed documents (DOCX, HTML, RTF, DOC) within a DOCX file, and have the service merge in the content automatically
- Upgrade – you can specify whether the file should be upgraded as part of loading it on the server
- Thumbnails – you can add thumbnail images on save
- Etc.
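For the XML data mapping bullet, the data side is ordinary XML that the document's content controls are bound to. The following is a minimal sketch of producing updated data before it goes back into the document. The `CustomXmlData` name is hypothetical. Matching elements by local name is a simplification, because bound content controls actually locate their values through XPath expressions stored in the document. Writing the result back into the file package is not shown.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CustomXmlData
{
    // Returns updated custom XML in which the first element with each given
    // local name has its value replaced. Throws if a name is not present.
    public static string Update(string customXml, IDictionary<string, string> values)
    {
        XDocument doc = XDocument.Parse(customXml);
        foreach (KeyValuePair<string, string> kv in values)
        {
            XElement target = doc.Descendants().FirstOrDefault(e => e.Name.LocalName == kv.Key)
                ?? throw new KeyNotFoundException($"No element named '{kv.Key}' in the data.");
            target.Value = kv.Value; // replaces the element's content with plain text
        }
        // Serialize without indentation so the structure is unchanged apart from values.
        return doc.ToString(SaveOptions.DisableFormatting);
    }
}
```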
How It's Exposed
To expose this capability, we also thought small (and hopefully simple): the service is exposed as a managed API on SharePoint, so you can build on top of it in whatever way suits your solution, whether that's a WCF service, a custom workflow activity, or something else.
That API breaks down into two basic objects:
- ConversionJob – the object that encapsulates one or more conversions that you want to perform as a logical unit
- ConversionJobStatus – the object that lets you query the status of a ConversionJob while and after it is processed
With the first, you ask us to convert files on the server and put the result back on the server; with the second, you query the progress of that conversion process.
Example
As an example, consider a server solution in which I want to allow users to schedule self-service conversions: they can right-click on a file in SharePoint and request an XPS version of that file.
On my ASPX page for the conversion, the button handler might contain the following code:
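using Microsoft.Office.Word.Server.Conversions; // ConversionJob, SaveFormat
using Microsoft.SharePoint;                     // SPContext

public void Convert_Click(…)
{
    // Target the Word Automation Services service application by name
    ConversionJob job = new ConversionJob("Word Automation Services");
    // Read and write the files with the current user's credentials
    job.UserToken = SPContext.Current.Site.UserToken;
    // Conversion options live on the job's Settings object
    job.Settings.UpdateFields = true;
    job.Settings.OutputFormat = SaveFormat.XPS;
    // Queue one input/output pair, then submit the job
    job.AddFile("http://contoso.com/input/foo.docx", "http://contoso.com/output/foo.xps");
    job.Start();
}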
And that's all that's required. I create a ConversionJob object to encapsulate the action, tell it to read and write the files with my credentials, convert to XPS, and update fields, give it the file to convert, and call Start() to kick off the process.
Once it's running, I can easily query the status of that conversion. The job.JobId property holds a unique GUID for the job, which I could have stored and reused, e.g.:
public void CheckStatus(Guid jobId)
{
    ConversionJobStatus status = new ConversionJobStatus("Word Automation Services", jobId, null);
    if (status.Count == status.Succeeded)
    {
        //success!
        //do something
    }
    else if (status.Count == status.Failed)
    {
        //failure :(
        //do something else
    }
    …
}
Just by creating a ConversionJobStatus object, I immediately know where that job's items are in the system (Succeeded, Failed, InProgress, NotStarted) and can react appropriately.
That example's probably two-thirds of the API – the goal really was to keep it simple and focus on doing those two things really well.
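The CheckStatus handler compares status counts directly. The following is a hedged sketch of one way to fold those counts into a single outcome and to wait for a job to finish. It works on plain integers so it runs without SharePoint. `JobOutcome`, `JobStatusSummary`, and the polling limits are hypothetical. In a real solution, the `poll` delegate would create a ConversionJobStatus for the stored JobId and read its Count, Succeeded, Failed, InProgress, and NotStarted values.

```csharp
using System;
using System.Threading;

// Hypothetical overall outcome for a job, derived from per-item counts.
public enum JobOutcome { Pending, Running, AllSucceeded, AllFailed, Mixed }

public static class JobStatusSummary
{
    public static JobOutcome Summarize(int count, int succeeded, int failed, int inProgress, int notStarted)
    {
        if (count < 0 || succeeded < 0 || failed < 0 || inProgress < 0 || notStarted < 0)
            throw new ArgumentOutOfRangeException(nameof(count), "Counts cannot be negative.");
        if (count == 0) return JobOutcome.Pending;                // nothing reported yet
        if (succeeded == count) return JobOutcome.AllSucceeded;   // same test as CheckStatus
        if (failed == count) return JobOutcome.AllFailed;
        if (inProgress > 0) return JobOutcome.Running;
        if (notStarted > 0) return JobOutcome.Pending;
        return JobOutcome.Mixed;                                  // done, but some items failed or were canceled
    }

    // Polls until the job is no longer pending or running, or until maxPolls is reached.
    public static JobOutcome WaitForCompletion(
        Func<(int Count, int Succeeded, int Failed, int InProgress, int NotStarted)> poll,
        TimeSpan interval, int maxPolls)
    {
        JobOutcome outcome = JobOutcome.Pending;
        for (int i = 0; i < maxPolls; i++)
        {
            var c = poll();
            outcome = Summarize(c.Count, c.Succeeded, c.Failed, c.InProgress, c.NotStarted);
            if (outcome != JobOutcome.Pending && outcome != JobOutcome.Running)
                return outcome;
            Thread.Sleep(interval);
        }
        return outcome; // still pending or running after maxPolls
    }
}
```

Blocking with a fixed interval is the simplest choice. A page handler would more likely store the JobId and check back on a later request, as CheckStatus does, rather than hold a request thread.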
Back to the Open XML SDK
Now, the one thing I didn't directly address in this post was the "merging documents with data" piece above.
That part of our solution isn't just the service itself – it's actually solved in combination with the Open XML SDK. I'm going to talk about the SDK a lot when I talk about the server; as I said in the first post, it's the combination of the two that provides the end-to-end story that we believe replaces the need to automate the client applications.
In this case, you'd use the SDK to clone the template and inject the data (a task well suited to manipulation of the file format), and use the service to convert the resulting files to PDF/XPS.
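As a dependency-free illustration of the "clone the template and inject the data" step, the following minimal sketch uses System.IO.Compression and LINQ to XML in place of the Open XML SDK. It copies a template package and replaces `{{Key}}` placeholders in `word/document.xml`. The `{{Key}}` convention and the `TemplateMerge` name are hypothetical. The sketch only finds placeholders that sit inside a single `w:t` element. Word often splits typed text across several runs, so a real solution would more likely use the SDK or bind content controls to custom XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class TemplateMerge
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns a merged copy of the template package; the input bytes are not modified.
    public static byte[] Merge(byte[] template, IDictionary<string, string> data)
    {
        using (var buffer = new MemoryStream())
        {
            buffer.Write(template, 0, template.Length); // clone the template
            buffer.Position = 0;
            using (var package = new ZipArchive(buffer, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = package.GetEntry("word/document.xml")
                    ?? throw new InvalidDataException("word/document.xml not found.");
                XDocument body;
                using (Stream s = entry.Open())
                    body = XDocument.Load(s);

                // Replace placeholders in every text element (materialized before editing)
                foreach (XElement t in body.Descendants(W + "t").ToList())
                    foreach (KeyValuePair<string, string> kv in data)
                        t.Value = t.Value.Replace("{{" + kv.Key + "}}", kv.Value);

                // Swap in the updated main document part
                entry.Delete();
                using (Stream s = package.CreateEntry("word/document.xml").Open())
                    body.Save(s);
            }
            return buffer.ToArray();
        }
    }
}
```

Each merged result could then be saved to a document library and added to a ConversionJob for the PDF/XPS step.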
I hope that was a useful introduction to what we're doing and how you'll be able to work with it – in the next post, I'll talk more about our architecture and how we're leveraging the strengths of the SharePoint platform.
- Tristan
Oct/090
Outlook .pst file format and interoperability
This post is syndicated from Microsoft Office Outlook Team Blog.
This week, the Outlook product team hosted a .pst file format interoperability event here on the Microsoft campus in Redmond. As we announced on the Interoperability @ Microsoft blog, our team plans to release a specification of the .pst file format to the public. This week’s interoperability event is part of a series of steps that we are taking to gather feedback from industry partners and experts on preliminary drafts of the specification. If you are not familiar with the underpinnings of Outlook, .pst files are one type of data file that Outlook uses to save user data such as e-mail messages, contacts, and appointments.
During the interoperability event, we presented a preliminary specification of the .pst file format to selected industry experts in areas such as antimalware, electronic records management, data archiving, data recovery, and data migration. We collected useful feedback about our documentation roadmap, and the attendees were supportive of the direction and approach we are taking.
We understand our plan to document the .pst file format might cause some of our customers and partners to wonder about our commitment to MAPI (Messaging API) and the Outlook Object Model as interoperability mechanisms of Outlook. To us, the .pst file format specification doesn’t change the role of MAPI and the Outlook Object Model. While we are pleased to provide another mechanism to access data stored in .pst files, we continue to support MAPI and the Outlook Object Model as key elements of Outlook interoperability and extensibility. We do expect that the release of the .pst file format specification will open up new usage scenarios that were previously difficult to accomplish, especially in multi-platform and server scenarios where MAPI and the Outlook Object Model are not available.
Since we announced our plan to release the .pst file specification, we have received requests from people who want to participate in the review of early drafts of the specification. If you are interested in actively participating in the review of preliminary drafts of the .pst file format specification, send an email message to pstinfo@microsoft.com and we will contact you when a preliminary draft of the specification is ready for broader review. If you are only seeking the final version, we anticipate releasing the .pst file format specification in the first half of 2010 under our Open Specification Promise.
Daniel Ko
Outlook Development Manager