GSoC 2015 : Week 5

This post is regarding the work done in the fifth week of the GSoC coding period. To know more details about the project, follow the Introduction link.

It’s time for the midterm evaluation and I am on track with my timeline..! So I need not worry about it. 😛

This week I added some new features to the replicator and cleaned up the code written for the couchdb-client, mostly removing unnecessary commits and adding tests, to make it suitable for a pull request. The PR is yet to be merged and can be seen by following link [1].

Last week I had experimented with streaming docs that have attachments. This week I added that logic to the couchdb-client as a new MultipartClient, which reads data from the source in chunks, processes it and transfers it to the target with the desired modifications. It’s not very general yet and supports only streaming the multipart response from the source to the target. It will be modified as per the feedback of the maintainer of the couchdb-client, once I get the first set of changes merged. The MultipartClient and the related changes that use it can be seen by following link [2].
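To give a rough idea of what the streaming part boils down to (this is only a sketch, not the actual MultipartClient code; the class name, method and chunk size are made up for illustration), the multipart response from the source is read in small chunks and each chunk is written straight to the target connection, so the whole document never has to sit in memory:

```php
<?php
// Hypothetical sketch of chunk-wise streaming between two already-open
// connections; names and the chunk size are illustrative, not the real API.
class MultipartStreamer
{
    /**
     * Copy the multipart response body from the source stream to the
     * target stream without buffering the whole document in memory.
     *
     * @param resource $source stream of the source connection
     * @param resource $target stream opened towards the target server
     */
    public function stream($source, $target, $chunkSize = 8192)
    {
        while (!feof($source)) {
            $chunk = fread($source, $chunkSize);
            if ($chunk === false || $chunk === '') {
                break;
            }
            // Any per-chunk processing/modification would happen here
            // before the data is forwarded to the target.
            fwrite($target, $chunk);
        }
    }
}
```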

The replicator now supports continuous replication, which means that once a replication has been started, the replicator-source and replicator-target connections will not end after the current set of changes has been transferred. They remain connected, and as soon as any insertion, deletion or modification happens at the source, it is transferred to the target. It has two variants: in the first, the replication never stops and a periodic heartbeat is sent continuously; in the second, a maximum timeout can be set, after which the connection is closed and the response sent. Currently only the source-replicator connection remains open. The changes to support this can be seen by following the links in [3].
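Under the hood this relies on CouchDB’s continuous changes feed. As a minimal sketch of the source-replicator side (host, port and database name are placeholders, and the HTTP handling is stripped down for brevity), the two variants correspond to the heartbeat and timeout query parameters of the _changes endpoint:

```php
<?php
// Listen to the continuous _changes feed of a source database.
// With 'heartbeat' the server sends an empty line every N milliseconds and
// the feed never ends; with 'timeout' it closes after N ms without changes.
$path = '/source_db/_changes?feed=continuous&since=0&heartbeat=10000';

$fp = fsockopen('localhost', 5984, $errno, $errstr, 30);
fwrite($fp, "GET $path HTTP/1.0\r\nHost: localhost\r\n\r\n");

// Skip the HTTP response headers.
while (($line = fgets($fp)) !== false && trim($line) !== '') {
}

// Each subsequent non-empty line is a JSON-encoded change notification.
while (!feof($fp)) {
    $line = fgets($fp);
    if ($line === false || trim($line) === '') {
        continue; // heartbeat newline, the connection is still alive
    }
    $change = json_decode(trim($line), true);
    if (isset($change['id'])) {
        // The changed document would be fetched and pushed to the target here.
    }
}
fclose($fp);
```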

In the coming week, I don’t plan to add any new features or to start the Drupal-related part by writing the Drush plugin. I will mostly be writing tests for the client and the replicator, something my mentor greatly emphasizes 😛. Making changes to the client for the PR is another thing I will work on.

Links:

  1. couchdb-client/pull/42
  2. couchdb-client/tree/trying_generators/lib/Doctrine/CouchDB
  3. continuous replication:
    1. replicator: couchdb-replicator/tree/continuous_replication
    2. client: couchdb-client/tree/continuous_replication/lib/Doctrine/CouchDB

GSoC 2015 : Week 4

This post is regarding the work done in the fourth week of the GSoC coding period. To know more details about the project, follow the Introduction link.

This week I mostly spent time figuring out how to stream data in PHP. I had heard about Guzzle, the PHP HTTP client, and its good support for streams. But we wanted to do this without any external dependency, and given our initial decision to use the couchdb-client, I chose not to use Guzzle for now. Another option was curl’s CURLOPT_READFUNCTION option, which allows one to set a callback function returning chunks of data from a stream. Since I had decided not to add new components to the current couchdb-client, I used the file pointer returned by fsockopen directly to read and write data in chunks. I requested the doc and attachment from the source and wrote them to the target, reading the stream from the source line by line and writing it to the file pointer of the target’s connection. With this I was able to replicate a ~150MB attachment with the PHP memory limit set to just 1MB.
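As a rough sketch of that approach (hosts, database and document names here are placeholders, and the header handling is simplified), the source response is read line by line and each line is immediately forwarded to the target connection:

```php
<?php
// Forward a document with attachments from source to target line by line,
// so the ~150MB attachment never has to fit into PHP's memory.
$source = fsockopen('source-host', 5984, $errno, $errstr, 30);
$target = fsockopen('target-host', 5984, $errno, $errstr, 30);

// Request the document together with its attachments from the source.
fwrite($source, "GET /source_db/some_doc?attachments=true HTTP/1.0\r\n"
    . "Host: source-host\r\nAccept: multipart/related\r\n\r\n");

// Skip the source's response headers; only the body is forwarded.
while (($line = fgets($source)) !== false && trim($line) !== '') {
}

// The PUT request line and headers for the target (including the
// Content-Length discussed below) would be written here, then the body
// is streamed across without buffering it.
while (!feof($source)) {
    $line = fgets($source);
    if ($line === false) {
        break;
    }
    fwrite($target, $line);
}

fclose($source);
fclose($target);
```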

Another issue I faced while doing this: initially I was trying to use “Transfer-Encoding: chunked” as one of the header options while connecting to the target. This is used when you don’t know the entire length of the content you are sending, for example when you are receiving data and want to forward it to the target without storing it entirely on your local machine, maybe because of memory constraints or other reasons — which is exactly what I was doing. But after a lot of trying, the doc and attachments were not getting replicated. So I talked to Kxepal, a CouchDB member, and came to know that this is an issue with CouchDB: it needs the Content-Length header, and a fix was proposed but has not been merged yet. Wish I had talked to him before starting this.. So I read the stream till the attachment start, hoping that the doc would be small enough to fit in memory, then calculated the content length based on the doc length, the attachment length and the other standard \r\n’s. With this I was able to stream the response from the source and upload it line by line to the target with a much smaller memory footprint, a much needed feature of the replicator..!
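The bookkeeping itself is just addition. Here is a simplified sketch of the kind of arithmetic involved; the real multipart/related layout CouchDB expects has more headers and parts than shown here, so the exact terms are illustrative, not the actual format:

```php
<?php
// Illustrative Content-Length calculation for a multipart body consisting of
// a JSON doc part followed by one attachment. $docJson is the document JSON
// already read into memory, $attachmentLength is the attachment's byte size,
// and $boundary is the multipart boundary string.
function calculateContentLength($docJson, $attachmentLength, $boundary)
{
    $length = 0;
    // Opening boundary, the JSON part's headers and the doc itself.
    $length += strlen("--$boundary\r\n");
    $length += strlen("Content-Type: application/json\r\n\r\n");
    $length += strlen($docJson) + strlen("\r\n");
    // Boundary before the attachment part plus its blank line, then the
    // attachment bytes themselves (streamed later, never held in memory).
    $length += strlen("--$boundary\r\n\r\n");
    $length += $attachmentLength + strlen("\r\n");
    // Closing boundary.
    $length += strlen("--$boundary--");
    return $length;
}
```

The value computed this way goes into the Content-Length header of the PUT request to the target, in place of the Transfer-Encoding: chunked header that CouchDB rejects.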

Now I need to see if it can be merged with the couchdb-client.