DotWhat.net

Microsoft's Office Open XML file formats


Microsoft have announced that, as of Office 2007, Microsoft Office documents will use file formats based on XML by default.

Known as 'Microsoft Office Open XML' (although also referred to as OOXML, OpenXML and Open XML), this new format will apply to the following Office components:

  • Microsoft Office Word 2007

  • Microsoft Office Excel 2007

  • Microsoft Office PowerPoint 2007


Word 2007+ will now use the .docx (or .docm) file extension instead of .doc.

Excel 2007+ will now use the .xlsx (or .xlsm) file extension instead of .xls.

PowerPoint 2007+ will now use the .pptx (or .pptm) file extension instead of .ppt.

Note that Office 2007 can still save files in the older binary formats used by previous versions of Office. However, the default will now be Open XML.

Microsoft's adoption of an XML-based format will provide several benefits to businesses, developers and individuals.

Smaller file sizes

All Microsoft Office Open XML files are compressed using ZIP compression, potentially reducing file size by up to 75% and therefore reducing the disk space required to store them. Compression also reduces the bandwidth required to send the files by email, FTP, over networks or across the Internet. Office 2007 files are automatically compressed when saved and uncompressed when opened. No additional software is required, as compression is built into Microsoft Office 2007.
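As a rough illustration of why XML-based files compress well, the following Python sketch zips a block of repetitive XML using only the standard library. The sample markup and the helper name zipped_size are made up for this example, and the saving on a real document will depend on its content.

```python
import io
import zipfile


def zipped_size(part_name: str, data: bytes) -> int:
    """Return the size in bytes of a ZIP archive holding one deflated entry."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(part_name, data)
    return len(buffer.getvalue())


# Hypothetical, highly repetitive markup standing in for spreadsheet data.
sample_xml = ("<row><cell>42</cell><cell>widget</cell></row>" * 2000).encode("utf-8")

raw_size = len(sample_xml)
packed_size = zipped_size("sheet.xml", sample_xml)

print(f"uncompressed: {raw_size} bytes, zipped: {packed_size} bytes")
```

Markup is repetitive by nature (the same tag names occur over and over), which is the kind of input that ZIP's deflate method handles well.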

Improved data recovery

Microsoft's Open XML files have a modular structure: each component of a document is stored as a separate part within the file. This means that if one part, such as a chart in the middle of the data, has become corrupt, you should still be able to access the data before and after it.

Better privacy

Using Microsoft's Document Inspector, you are now able to check files for potentially sensitive information, such as the document's author, comments and file location, and remove it before sharing them.
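Because this information is stored as XML inside the file, it can also be read outside Office. The Python sketch below looks up the author in a package's core properties. It assumes the properties live in a part named docProps/core.xml, which is the usual location but is not guaranteed; the helper names and the sample package are invented for the example and do not form a complete Office file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces used by the core properties part.
DC_NS = "http://purl.org/dc/elements/1.1/"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

# Assumption: the conventional part name. A strict reader would locate
# this part through the package relationships instead.
CORE_PART = "docProps/core.xml"


def read_author(package_bytes: bytes) -> Optional[str]:
    """Return the dc:creator value from the core properties, or None."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        if CORE_PART not in package.namelist():
            return None
        root = ET.fromstring(package.read(CORE_PART))
    creator = root.find(f"{{{DC_NS}}}creator")
    return creator.text if creator is not None else None


def make_sample_package(author: str) -> bytes:
    """Build a stripped-down stand-in package (hypothetical test data)."""
    core_xml = (
        f'<cp:coreProperties xmlns:cp="{CP_NS}" xmlns:dc="{DC_NS}">'
        f"<dc:creator>{author}</dc:creator>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr(CORE_PART, core_xml)
    return buffer.getvalue()


print(read_author(make_sample_package("Jane Doe")))
```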

Data integration and interoperability

Another benefit of basing files on XML is that the data can be accessed by any application that supports XML and ZIP compression, not just Microsoft Office.
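As a minimal sketch of this, the Python code below opens a Word-style package with the standard zipfile module and pulls the text out of its main XML part. The part name word/document.xml and the w:p and w:t elements follow the WordprocessingML conventions; the function names and the tiny sample package are assumptions made for the example, and the sample is not a complete file that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_text(package_bytes: bytes) -> str:
    """Return the document text, one line per w:p paragraph."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more w:t elements.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def make_sample_package() -> bytes:
    """Build a stripped-down stand-in for a .docx (hypothetical test data)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t> world</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", body)
    return buffer.getvalue()


print(extract_text(make_sample_package()))
```

To try this on a real file, read its bytes (for example with open(path, "rb")) and pass them to extract_text in place of the sample package.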

Macro identification

Under Office 2007, it will be easier to determine which files contain macros (which can be potentially dangerous). Office files whose extension ends in 'x' (e.g. .docx) cannot contain macros. Only Office files whose extension ends in 'm' (e.g. .docm) can contain macros.
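This naming rule makes a first-pass check easy to automate. The Python sketch below classifies a file by its extension alone, covering only the three applications discussed here; the function name macro_status and the returned labels are made up for the example. It does not inspect the file's contents, so it should be treated as a quick filter, not a security check.

```python
from pathlib import PurePath

# Extensions for the three applications covered in this article.
MACRO_FREE = {".docx", ".xlsx", ".pptx"}
MACRO_ENABLED = {".docm", ".xlsm", ".pptm"}


def macro_status(filename: str) -> str:
    """Classify a file name as 'macro-enabled', 'macro-free' or 'unknown'."""
    extension = PurePath(filename).suffix.lower()
    if extension in MACRO_ENABLED:
        return "macro-enabled"
    if extension in MACRO_FREE:
        return "macro-free"
    # Older binary formats (.doc, .xls, .ppt) and anything else land here.
    return "unknown"


for name in ("report.docx", "Budget.XLSM", "legacy.doc"):
    print(name, "->", macro_status(name))
```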