Google Summer of Code With LibreOffice — Project Overview

Project Title: Move the gallery code to use ZIP files

Aditya Sahu
The Startup


Hi! In this story I’m going to share the technical details about the project that I have undertaken this summer with LibreOffice.

The Objective

The project objective is to move the gallery code to use ZIP files, hence the name. Previously, the software used its own custom-built binary formats to store the information of gallery themes. The data stored inside these formats (like .sdg, .sdv, .thm, .str) is not human readable. For example, arrows.str stores the name of the theme called Arrows, as shown below.

arrows.str

This example is about as “friendly” as a binary file can get with a user. If you don’t believe me, look at the figure below. It’s not readable by a human, because it’s a binary file!

arrows.thm

So the primary objective of the project is to transform the binary files into human-readable formats and enclose them all in a ZIP file. ZIP is a widely used archive format that supports lossless data compression. This project will be a great improvement to the LibreOffice source code, as gallery themes will finally be easy to read and write. Developers will benefit greatly from the project because it will be easier to resolve bugs and to add improvements and features to the gallery. Let’s discuss the project phases.

While writing the proposal for the project, Tomaž Vajngerl (my mentor) and I discussed the overall structure and how the project would be divided into phases.

Phase 1: Unit Testing
Phase 2: Refactoring of code
Phase 3: Implementation of XML+ZIP Engine

Phase 1: Unit Testing

“It is to be observed that the refactoring phase should not affect the behaviour of the code. This will only be done to make the current code cleaner and easier to work with. To make sure that refactoring is done right, tests must be written.” -GSoC Proposal

To understand why the unit testing phase exists in the first place, it helps to know a little about the phase that follows: refactoring of code.

A snapshot of the LibreOffice Gallery is shown below:

LibreOffice gallery

Previously, the code behind the module consisted of a few classes like Gallery, GalleryTheme, GalleryThemeEntry, SgaObject, etc. These classes form the overall functionality of the gallery module. The only problem was that the classes handled everything by themselves, which made the code mixed up, cumbersome and hard to maintain. Their member functions contained both the core logic and the code that writes to and reads from the binary files.
For example, GalleryTheme has a function called InsertObject() that takes two parameters: an SgaObject to be inserted and a position. It can be divided into two parts:
1. Logic: It checks if the given SgaObject is valid. If not, it returns false. It then iterates through the list of objects to find the object which matches its position.
2. R/W: If a match is found (the object already exists), it updates the data by “reading” the object… else it proceeds to “write” the object to memory.
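The mix described above can be pictured with a loose Python paraphrase. This is not the actual C++ code from LibreOffice: the class name `MixedGalleryTheme`, the dictionary-based objects and the `.bin` file layout are all made up for illustration. The point is only that validation, list handling and file access sit in the same function.

```python
import os
import tempfile


class MixedGalleryTheme:
    """Hypothetical stand-in for a class that mixes logic with file I/O."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir
        self.objects = []  # in-memory list of objects in the theme

    def insert_object(self, obj, pos):
        # Logic: reject objects that are not valid.
        if not obj.get("name"):
            return False
        # Assumed layout: one file per object (not the real gallery format).
        path = os.path.join(self.storage_dir, obj["name"] + ".bin")
        # Logic: look for an object that is already in the list.
        existing = next(
            (o for o in self.objects if o["name"] == obj["name"]), None
        )
        if existing is not None:
            # R/W: the object exists, so refresh it from what is on disk.
            with open(path, "rb") as stream:
                existing["title"] = stream.read().decode("utf-8")
        else:
            # R/W: the object is new, so write it out and remember it.
            with open(path, "wb") as stream:
                stream.write(obj["title"].encode("utf-8"))
            self.objects.insert(pos, dict(obj))
        return True


with tempfile.TemporaryDirectory() as tmp:
    theme = MixedGalleryTheme(tmp)
    results = [
        theme.insert_object({"name": "", "title": "Broken"}, 0),
        theme.insert_object({"name": "arrow-1", "title": "Arrow"}, 0),
        theme.insert_object({"name": "arrow-1", "title": "Changed"}, 0),
    ]
    stored_title = theme.objects[0]["title"]
```

Testing a function like this means touching the file system every time, which is one reason the reading and writing is worth pulling out.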

The first step in the project would be to separate this reading/writing part from the main logic, so that each would be contained in its own chunk of code. Separate classes would be introduced and used for reading and writing, while the main logic would still lie in the basic classes.

In order to do so, the code would have to be meddled with and lots of changes would be made: changes in structure, not in functionality. This forms the refactoring phase. Since there is no change in functionality, it is better to write some tests beforehand to check that everything still works after the code is refactored. That is where unit testing comes into play.

Phase 2: Refactoring of Code

Let’s take a look at the UML diagram of the previous gallery structure which marks some important classes and associations:

Old Gallery Structure

If you’ve been following the article till now, I’m sure you now know why we needed to refactor the gallery code. The following figure shows the proposed structure after refactoring is done. The code where the main logic lies will be separate from the code which deals with reading and writing of files.

Proposed Structure of Gallery

After the classes are separated, the main code will be contained in one single chunk, and the reading and writing of binary files will be separate. The refactoring also makes it possible to implement a class that handles the alternative reading and writing of XML files. There would then be two R/W functionalities: one dealing with binary files, the other with XML files. An interface would be designed for these functionalities, and it will be used by the main gallery classes like Gallery, GalleryTheme, GalleryThemeEntry, etc.

To summarize, two classes will derive from an abstract class: one reading from and writing to the binary files, the other reading from and writing to the XML files. Both will implement the functions that deal with files.
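Here is a minimal Python sketch of that shape, under stated assumptions. The names `GalleryStorageEngine`, `BinaryEngine`, `XmlEngine` and their methods are hypothetical, the length-prefixed byte layout is invented and is not the real .str format, and the real project is written in C++. The `GalleryTheme` here is a tiny stand-in that only holds a name.

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET


class GalleryStorageEngine(ABC):
    """Hypothetical interface that the core gallery classes talk to."""

    @abstractmethod
    def write_theme_name(self, name: str) -> bytes:
        ...

    @abstractmethod
    def read_theme_name(self, data: bytes) -> str:
        ...


class BinaryEngine(GalleryStorageEngine):
    # Invented layout: 2-byte little-endian length, then UTF-8 text.
    def write_theme_name(self, name):
        encoded = name.encode("utf-8")
        return len(encoded).to_bytes(2, "little") + encoded

    def read_theme_name(self, data):
        length = int.from_bytes(data[:2], "little")
        return data[2:2 + length].decode("utf-8")


class XmlEngine(GalleryStorageEngine):
    # Invented element names: <theme><name>...</name></theme>.
    def write_theme_name(self, name):
        root = ET.Element("theme")
        ET.SubElement(root, "name").text = name
        return ET.tostring(root, encoding="utf-8")

    def read_theme_name(self, data):
        return ET.fromstring(data).findtext("name")


class GalleryTheme:
    """Core logic only; all reading and writing goes through the engine."""

    def __init__(self, engine: GalleryStorageEngine):
        self.engine = engine
        self.name = ""

    def save(self) -> bytes:
        return self.engine.write_theme_name(self.name)

    def load(self, data: bytes) -> None:
        self.name = self.engine.read_theme_name(data)


def round_trip(engine: GalleryStorageEngine, name: str) -> str:
    """Save a theme name with one engine and load it back."""
    writer = GalleryTheme(engine)
    writer.name = name
    reader = GalleryTheme(engine)
    reader.load(writer.save())
    return reader.name
```

Because `GalleryTheme` only sees the interface, swapping `BinaryEngine()` for `XmlEngine()` changes the file format without touching the core logic.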

Phase 3: Implementation of XML+ZIP Engine

This is the part where the most important things will be done. In this part, the class dealing with XML files is implemented. The XML Engine will have functions that mirror those of the Binary Engine, and the core classes like GalleryTheme will use them. I have chosen XML as the alternative format because it stores data in a readable form and is quite common.

To implement the XML Engine, the data that was previously stored in binary files will need to be stored in XML files. The engine should learn how to read from and write to the XML files. A very simple example: for the theme arrows, rather than storing its name in an individual binary file with the extension .str, we can have an XML file define its properties (in this case, the name of the theme).
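To make the idea concrete, here is a small Python sketch of what such an XML file could look like and how it could be read. The element names `theme` and `name` are assumptions chosen for illustration, not the schema used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for the Arrows theme; element names are assumptions.
ARROWS_XML = """\
<theme>
  <name>Arrows</name>
</theme>
"""


def theme_name(xml_text: str) -> str:
    """Return the text of the <name> element of a theme document."""
    root = ET.fromstring(xml_text)
    return root.findtext("name")


arrows_name = theme_name(ARROWS_XML)
```

Unlike the contents of arrows.str, this can be opened and edited in any text editor.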

That forms the initial part of the phase. After this is done and the engine is interacting with XML files, the next thing on the list is to teach it how to create a ZIP file, store files in it and read the contents back from it. The ZIP file will not be a regular ZIP file. It will have its own mimetype as the first file, followed by the XML file containing the details and then the media. The structure would be similar to ODF.

Contents of the ZIP file

The ZIP file will have the following contents:

  1. Mimetype of the ZIP file: This will allow us to distinguish the gallery ZIP files from regular ZIP files, similar to ODF. This will be the first file in the ZIP archive.
  2. XML File: This contains all the data that was extracted from the old binary formats, gathered in one place.
  3. Media binary data: This is the actual content used by the user, such as sounds or images. It will be stored inside the ZIP file.
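The layout in the list above can be sketched with Python's standard `zipfile` module. Everything specific here is an assumption: the mimetype string, the file name `theme.xml` and the `media/` folder are placeholders, and storing the mimetype entry uncompressed is borrowed from how ODF packages do it rather than something decided for this project.

```python
import io
import zipfile

# Placeholder value; the real gallery mimetype is not defined in this post.
GALLERY_MIMETYPE = "application/x-example-gallery-theme"


def write_theme_zip(theme_xml: str, media: dict) -> bytes:
    """Build a theme archive in memory: mimetype, then XML, then media."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # First entry: the mimetype, stored without compression (as in ODF).
        archive.writestr(
            "mimetype", GALLERY_MIMETYPE, compress_type=zipfile.ZIP_STORED
        )
        # Second entry: the XML file with the theme details.
        archive.writestr("theme.xml", theme_xml)
        # Remaining entries: the media files themselves.
        for file_name, data in media.items():
            archive.writestr("media/" + file_name, data)
    return buffer.getvalue()


def is_gallery_zip(data: bytes) -> bool:
    """Check that the first entry is a mimetype file with our value."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            return False
        return archive.read("mimetype").decode("ascii") == GALLERY_MIMETYPE


theme_zip = write_theme_zip(
    "<theme><name>Arrows</name></theme>",
    {"arrow.png": b"\x89PNG placeholder bytes"},
)
with zipfile.ZipFile(io.BytesIO(theme_zip)) as archive:
    entry_names = archive.namelist()
```

A regular ZIP file without the leading mimetype entry fails the `is_gallery_zip` check, which is the distinction item 1 is after.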

Sounds like an easy project, eh? Well, it’s been two months since the Summer of Code started, and I have completed the first two phases of the project. Progress slowed down in the refactoring part because

it’s easier to write your own code than to meddle with an existing one

It has been a great learning experience for me so far. Writing the tests made me familiar with the existing code. Refactoring the code helped me understand the usage of every function in the gallery. I’m very excited to start the final phase of the project. I cannot understand where the two months went; it’s a race against time!

If you enjoyed reading this story, make sure to check out the ones explaining the various parts of the project.
