During my third internship at IBM, I was based at a customer's office. The customer uses a Confluence instance as its internal wiki. The wiki contains nearly all the information you might need when working at or with the company. The instance is open by default: every new wiki user has access to all of its content unless a section has been explicitly restricted. This approach works as long as the wiki is only used by members of the company (or its partners). However, the customer wanted to give its own customers access to certain parts of the wiki.
One option would have been to lock down the whole Confluence instance, so that new users have no access to any content by default and must be granted explicit access to the spaces and pages they need. As the wiki had already grown to a large number of spaces and pages, this wasn't a good approach. Another option would have been to host Markdown files in a GitHub repository and maintain a second wiki for the customer's customers. Maintaining information in two places is rarely a good idea, as the copies quickly get out of sync. The solution we settled on was to create a second (external) Confluence instance accessible to the customers. My job was to build the microservice, running in a Kubernetes cluster, that copies the spaces, pages and attachments from the internal instance to the external one.
The editorial staff wanted to simply tag the spaces and pages that should be "published" with a certain tag, e.g. public. To make sure that no outdated content remains on the external instance, the whole instance is wiped at the beginning of the synchronization process. After that, only the spaces tagged with public are created on the external instance. At this point, the spaces are all empty.
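The two steps described above (wipe the external instance, then recreate only the tagged spaces) can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: the `Wiki` interface, `InMemoryWiki` and `resetAndCreateSpaces` are hypothetical names, and the in-memory map merely stands in for calls to a real Confluence instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SpaceSync {

    /** Hypothetical minimal view of a wiki instance; this is not the Confluence API. */
    interface Wiki {
        Map<String, Set<String>> spaceLabels(); // space key -> tags on that space
        void deleteSpace(String key);
        void createSpace(String key);
    }

    /** Wipes the external instance, then creates an empty space for every tagged internal space. */
    static List<String> resetAndCreateSpaces(Wiki internal, Wiki external, String tag) {
        // Step 1: remove everything from the external instance so no outdated content survives.
        // The key set is copied first because deleting changes the underlying map.
        for (String key : new ArrayList<>(external.spaceLabels().keySet())) {
            external.deleteSpace(key);
        }
        // Step 2: create (still empty) spaces for the internal spaces carrying the tag.
        List<String> created = new ArrayList<>();
        for (Map.Entry<String, Set<String>> space : internal.spaceLabels().entrySet()) {
            if (space.getValue().contains(tag)) {
                external.createSpace(space.getKey());
                created.add(space.getKey());
            }
        }
        return created;
    }

    /** In-memory stand-in, used only to make the sketch runnable. */
    static final class InMemoryWiki implements Wiki {
        private final Map<String, Set<String>> spaces = new LinkedHashMap<>();

        InMemoryWiki space(String key, String... tags) {
            spaces.put(key, new HashSet<>(Arrays.asList(tags)));
            return this;
        }

        @Override public Map<String, Set<String>> spaceLabels() { return spaces; }
        @Override public void deleteSpace(String key) { spaces.remove(key); }
        @Override public void createSpace(String key) { spaces.put(key, new HashSet<>()); }
    }

    public static void main(String[] args) {
        InMemoryWiki internal = new InMemoryWiki().space("DOCS", "public").space("HR").space("API", "public");
        InMemoryWiki external = new InMemoryWiki().space("OLD", "public");
        System.out.println(resetAndCreateSpaces(internal, external, "public")); // prints [DOCS, API]
    }
}
```

Wiping first keeps the logic simple: there is no need to diff the two instances, at the cost of recreating everything on each run.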
The next step is to fetch all pages of each space and build the page hierarchy from the available information about ancestors and descendants. Pages not tagged as public are removed from the hierarchy, and the hierarchy is adjusted so that the children of a removed page move up and take its place. Once the hierarchy is built, the pages can be created on the external instance, beginning with the root page. This order matters because you need to specify the ID of the parent page for each page you want to create. As the ID is generated at creation time, you can only create a page once its ancestors have been created.
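The pruning and ordering described above can be sketched as a single depth-first walk. This is a simplified illustration under assumptions of my own: the `Page` model and the `publishOrder` method are hypothetical, each page knows only its direct parent, and the result maps every public page to its nearest public ancestor (or `null` if it has none) in an order where parents always come before their children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PageHierarchy {

    /** Hypothetical, simplified page model; parentId is null for a root page. */
    static final class Page {
        final String id;
        final String parentId;
        final Set<String> labels;

        Page(String id, String parentId, Set<String> labels) {
            this.id = id;
            this.parentId = parentId;
            this.labels = labels;
        }
    }

    /** Maps each tagged page to its nearest tagged ancestor, in parent-before-child order. */
    static Map<String, String> publishOrder(List<Page> pages, String tag) {
        Set<String> knownIds = new HashSet<>();
        for (Page page : pages) {
            knownIds.add(page.id);
        }
        // Group pages by parent; pages without a known parent are treated as roots.
        Map<String, List<Page>> children = new HashMap<>();
        List<Page> roots = new ArrayList<>();
        for (Page page : pages) {
            if (page.parentId == null || !knownIds.contains(page.parentId)) {
                roots.add(page);
            } else {
                children.computeIfAbsent(page.parentId, k -> new ArrayList<>()).add(page);
            }
        }
        Map<String, String> order = new LinkedHashMap<>();
        for (Page root : roots) {
            visit(root, null, tag, children, order);
        }
        return order;
    }

    private static void visit(Page page, String publicAncestor, String tag,
                              Map<String, List<Page>> children, Map<String, String> order) {
        String parentForChildren = publicAncestor;
        if (page.labels.contains(tag)) {
            order.put(page.id, publicAncestor); // recorded before any of its descendants
            parentForChildren = page.id;
        }
        // An untagged page is skipped, so its children attach to the nearest tagged ancestor.
        for (Page child : children.getOrDefault(page.id, Collections.emptyList())) {
            visit(child, parentForChildren, tag, children, order);
        }
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
                new Page("home", null, Set.of("public")),
                new Page("internal", "home", Set.of()),
                new Page("guide", "internal", Set.of("public")),
                new Page("faq", "home", Set.of("public")));
        System.out.println(publishOrder(pages, "public")); // prints {home=null, guide=home, faq=home}
    }
}
```

Iterating over the returned map in order, a caller could create each page and remember the ID the external instance assigns to it, so that the parent's new ID is always known by the time a child is created.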
After a page has been created, its attachments are synced as well. To ensure that no hidden attachments are synced (which might be for internal use only), only attachments that are included in the page's body are synced.
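One way to decide which attachments are "included in the body" is to look for attachment references in the page's storage format, where Confluence embeds them as `<ri:attachment ri:filename="..."/>` elements. The sketch below assumes that approach. The class and method names are hypothetical, and the regular expression is a simplification that ignores XML-escaped characters in filenames; a real implementation would more likely use an XML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttachmentFilter {

    // Matches <ri:attachment ... ri:filename="name"> and captures the filename.
    private static final Pattern ATTACHMENT_REF =
            Pattern.compile("<ri:attachment\\b[^>]*\\bri:filename=\"([^\"]*)\"");

    /** Collects the filenames of all attachments referenced in the given page body. */
    static Set<String> referencedFilenames(String storageBody) {
        Set<String> names = new LinkedHashSet<>();
        Matcher matcher = ATTACHMENT_REF.matcher(storageBody);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    /** Keeps only those attachments of a page that its body actually references. */
    static List<String> attachmentsToSync(List<String> attachmentFilenames, String storageBody) {
        Set<String> referenced = referencedFilenames(storageBody);
        List<String> result = new ArrayList<>();
        for (String name : attachmentFilenames) {
            if (referenced.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String body = "<p>See <ac:image><ri:attachment ri:filename=\"diagram.png\" /></ac:image></p>";
        List<String> attachments = List.of("diagram.png", "internal-notes.docx");
        System.out.println(attachmentsToSync(attachments, body)); // prints [diagram.png]
    }
}
```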
The microservice is written in Java 10. Every piece of code is covered by unit tests, which were written with JUnit 5.
The application was designed as a microservice. As all microservices are deployed to a Kubernetes cluster, I created a Docker image containing the microservice. The microservice now runs as a CronJob in the customer's Kubernetes cluster every day at 5 a.m., syncing the spaces and pages between the two instances.