Wednesday 10 October 2012

Sakai Development: Post Nine

Before actually starting to write the code to do the deposit itself, I need to set up and include the SWORD2 Java client libraries. If you're not used to GitHub, it might take you a while to find the button which lets you download the library as a zip file. Unzip it, cd to the created directory, and run

mvn clean package

to compile (and download a large number of new library files). This should hopefully end up with:

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /home/simon/work/swordapp-JavaClient2.0-420485d/target/sword2-client-0.9.2.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3 minutes 24 seconds
[INFO] Finished at: Wed Sep 26 15:14:32 BST 2012
[INFO] Final Memory: 27M/121M
[INFO] ------------------------------------------------------------------------

and then it's a question of copying the jar file to somewhere it can be picked up by Sakai. This requires two things (assuming my understanding of how Maven POM files work is correct):

  • Add a dependency to the relevant pom.xml file, which will be picked up on compilation, so that Maven will attempt to download the relevant jar file and, if it can't, will ask for it to be installed locally by hand. The relevant file is in the content/content-tool/tool directory, and needs the following added (with line numbering):

  • Import the necessary classes into the ResourcesAction.java file so that the library can be used there. A single package import, which brings in SWORDClient along with Deposit, AuthCredentials, EntryPart, DepositReceipt and UriRegistry (importing SWORDClient.* would only bring in classes nested inside SWORDClient), is enough:
import org.swordapp.client.*;
at line 135 of the file.

The code which will use this is based on doFinalizeDelete (lines 6545-6675), which follows doDeleteConfirm when the confirmation is selected. I haven't yet worked out where the confirmation request is actually displayed, so this is by no means the last change. The confirmation step could also offer a choice from a list of repositories, and from the collections which exist at the chosen repository (as obtained from the repository's SWORD2 service document). But that is a complication I don't want to get into at this stage; I just want to be able to carry out a simple deposit. So I'm going to set both the repository and the collection in the configuration file for the servlet.
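
Reading the deposit targets from configuration could look something like the minimal sketch below. It uses plain java.util.Properties rather than Sakai's own configuration service, and the key names (archive.count, archive.N.name, archive.N.depositurl) and the class name are hypothetical, chosen only for illustration. It also covers the idea of later offering a list of repositories: the configured names could fill a drop-down list, and a URL coming back from a form is only accepted if it is one of the configured ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Hypothetical registry of deposit targets read from configuration, using
 * made-up keys such as:
 *   archive.count=1
 *   archive.1.name=Institutional repository
 *   archive.1.depositurl=https://repo.example.org/swordv2/collection/1
 */
final class ArchiveRegistry {
    private final Map<String, String> urlByName = new LinkedHashMap<>();

    ArchiveRegistry(Properties props) {
        int count = Integer.parseInt(props.getProperty("archive.count", "0").trim());
        for (int i = 1; i <= count; i++) {
            String url = props.getProperty("archive." + i + ".depositurl");
            if (url == null || url.trim().isEmpty()) {
                throw new IllegalStateException("archive." + i + ".depositurl is missing");
            }
            // Fall back to the URL itself when no display name is configured
            String name = props.getProperty("archive." + i + ".name", url).trim();
            urlByName.put(name, normalise(url));
        }
    }

    /** Parses properties text, e.g. the contents of a configuration file. */
    static ArchiveRegistry fromText(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return new ArchiveRegistry(props);
    }

    /** Display name to deposit URL, in configuration order. */
    Map<String, String> archives() {
        return Collections.unmodifiableMap(urlByName);
    }

    /** Accepts a deposit URL submitted with a form only if it is a configured one. */
    String checkSubmitted(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("no deposit URL given");
        }
        String url = normalise(raw);
        if (!urlByName.containsValue(url)) {
            throw new IllegalArgumentException("not a configured archive: " + raw);
        }
        return url;
    }

    /** Requires an absolute http(s) URL with a host; returns it normalised. */
    private static String normalise(String raw) {
        URI uri;
        try {
            uri = new URI(raw.trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("malformed URL: " + raw, e);
        }
        String scheme = uri.getScheme();
        boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        if (!http || uri.getHost() == null) {
            throw new IllegalArgumentException("not an absolute http(s) URL: " + raw);
        }
        return uri.normalize().toString();
    }
}
```

Checking the submitted value matters because a form parameter can be set to anything by whoever submits the form, so it is safer not to post files and credentials to an arbitrary URL.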

The process is relatively simple; however, I have just noticed in the SWORD2 client documentation that multi-part deposit is not supported, and my plan so far has assumed that it is. So I will have to package the items to be deposited into a zip file or similar (keeping them together matters, as their nature as a collection is important). java.util.zip is a core package, but not one I've ever used before; I'll start by adding the import for the package at the top of the file (new line 52).
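
As a first experiment with java.util.zip, a minimal sketch of packing several in-memory items into one zip and listing the entries back out might look like this. The class and method names are made up for illustration, and holding the whole archive in a byte array assumes the selection is small enough to fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helpers for a first experiment with java.util.zip. */
final class ZipBasics {
    /** Packs each name/content pair into one zip archive held in memory. */
    static byte[] pack(Map<String, byte[]> items) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> item : items.entrySet()) {
                zip.putNextEntry(new ZipEntry(item.getKey()));
                zip.write(item.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The zip is only complete once closed (above), which writes the
        // central directory at the end of the archive
        return bytes.toByteArray();
    }

    /** Lists the entry names of a zip archive in the order they are stored. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zin.getNextEntry(); entry != null; entry = zin.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

The key point is that nothing here touches the filesystem: entries are written from bytes, which is what the Sakai content service will eventually supply.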

The steps for producing a SWORD2 deposit from the selected files are:

  • Get archive details from configuration (some will always need to be obtained from configuration, but a service document could be downloaded on the fly to get information about collections etc. - just not in this version, as I'm already overrunning the schedule);
  • Prepare metadata, using data from the confirmation form, which should basically be a subset of DC terms for use in the deposit (bearing in mind that it's possible for the depositor to go to the repository and change the metadata later if necessary);
  • Prepare files - create a zip as a file stream;
  • Authenticate to repository using userid/password provided in confirmation form;
  • Make deposit and (hopefully) get back a receipt.
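
For the authentication step, SWORD2 deposit endpoints such as DSpace's typically use HTTP Basic authentication. As far as I can tell the client library deals with this itself once it has an AuthCredentials object, so the sketch below (a hypothetical helper, not part of the library) only illustrates what ends up in the Authorization header, following RFC 7617.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical illustration of an HTTP Basic Authorization header value. */
final class BasicAuth {
    /**
     * Builds the header value from the userid/password typed into the
     * confirmation form. Base64 is an encoding, not encryption, so this
     * should only ever be sent over HTTPS.
     */
    static String header(String userid, String password) {
        if (userid.indexOf(':') >= 0) {
            // RFC 7617: a user-id containing a colon is invalid
            throw new IllegalArgumentException("userid must not contain ':'");
        }
        String pair = userid + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, header("Aladdin", "open sesame") gives "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==", the example value from the RFC. The practical consequence is that the password is effectively sent in the clear unless the deposit URL is HTTPS.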

While the information in the swordapp documentation looks pretty complete at first, it turns out to be missing some details which I need, as I discover when I start to put the code together. I'll need to look at the source code for the client to get them.

The first issue is with the metadata. Dublin Core is not the only metadata used in SWORD2; there is some basic information which is part of the Atom profile: title, author, last modification date, and so on, as seen in the example here. The documentation gives no information about how to set this, and in fact I can't find anything useful in the source code (the string "updated", which is the element name for the last modification date, does not appear anywhere in the client code). I'm not particularly familiar with Atom, so it is possible that these are optional. I'll ignore this for the moment and see what happens. I'm also going to assume that to give multiple values I just add a term repeatedly; this needs to be supported for DC contributor. I think this should work, but I haven't actually gone through the Apache Abdera library, which the swordapp code uses, to check this.
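
To see what the entry needs, it may help to know that RFC 4287 requires a standalone atom:entry to contain atom:id, atom:title and atom:updated, plus atom:author unless one is inherited from a containing feed or an atom:source element. So those elements are not optional in Atom itself, although the client or Abdera may fill some of them in. The sketch below hand-builds such an entry with plain strings purely to show its shape, including a multi-valued DC term written as repeated elements; it is not how the swordapp library builds entries, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical hand-built Atom entry, only to show which elements it carries. */
final class AtomEntrySketch {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String DCTERMS_NS = "http://purl.org/dc/terms/";

    /** Escapes the characters that matter in XML element content. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Builds an entry with the elements RFC 4287 requires of a standalone
     * entry (id, title, updated, author) plus Dublin Core terms; a term with
     * several values (e.g. contributor) is written as repeated elements.
     */
    static String build(String title, String author, Instant updated,
                        Map<String, List<String>> dcTerms) {
        StringBuilder sb = new StringBuilder();
        sb.append("<entry xmlns=\"").append(ATOM_NS)
          .append("\" xmlns:dcterms=\"").append(DCTERMS_NS).append("\">\n");
        sb.append("  <id>urn:uuid:").append(UUID.randomUUID()).append("</id>\n");
        sb.append("  <title>").append(esc(title)).append("</title>\n");
        // ISO-8601 in UTC, e.g. 2012-09-26T14:14:32Z, is a valid RFC 3339 date
        sb.append("  <updated>").append(updated.truncatedTo(ChronoUnit.SECONDS))
          .append("</updated>\n");
        sb.append("  <author><name>").append(esc(author)).append("</name></author>\n");
        for (Map.Entry<String, List<String>> term : dcTerms.entrySet()) {
            for (String value : term.getValue()) {
                sb.append("  <dcterms:").append(term.getKey()).append('>')
                  .append(esc(value))
                  .append("</dcterms:").append(term.getKey()).append(">\n");
            }
        }
        return sb.append("</entry>\n").toString();
    }
}
```

Comparing output like this with the XML the library actually sends would be one way to check whether it supplies id and updated by itself.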

Just at this point Andrew Martin put up a useful blog post which details his journey to working with the SWORD2 java library. He's not doing exactly the same thing, though we have already been in contact. While I need to go deeper into the coding, his post is probably a very useful resource for anyone reading this one.

The next thing to sort out is creating a ZIP file (virtually) to contain the items selected for archiving. I've not done this before, and the ZIP creation material I can find online, as well as in my ancient Java book, concentrates on making a ZIP from files (this looks pretty useful for that, and is likely to form the framework I'll use) rather than from the Sakai content platform, where the content may not even be stored in a filesystem. So I need to work out how to get the content out of the files as a Java input stream. I'll start by looking through the ResourcesAction.java code, and then move on to other files in the same directory if I can't find anything. All the input streams in ResourcesAction are for making changes to content rather than reading it, which makes sense, as reading is not an action which affects the resource. But this code from FilePickerAction.java (lines 1710-13) makes it look very simple:

InputStream contentStream = resource.streamContent();
String contentType = resource.getContentType();
String filename = Validator.getFileName(itemId);
String resourceId = Validator.escapeResourceName(filename);

I just need to work back through the context to be sure that this code is doing what it appears to be doing. Although it doesn't look like it (because it's in a method for updating the resource), this is in fact what it is doing, as I eventually discover when I find the relevant javadocs (ContentHostingService, BaseContentService, and ContentResource, though not from the current release). To re-use this code, the ContentResource class needs to be imported, which it already is, and the content service needs to be set up (outside the loop which runs through the selected items):

ContentHostingService contentService = (ContentHostingService) toolSession.getAttribute (STATE_CONTENT_SERVICE);

The first problem, then, is that what I have is a ListItem object, when what I want is the itemID (which is a String); this is simple, as id is a property of the ListItem object, so I can just get it. I'll also need to protect against the itemid being null, which I don't think should happen. I'm not quite sure what the correct thing would be to do if it does, so I'll just log a warning if that happens. So the code I add is (lines 6741-4):

String itemid = item.id;
ContentResource resource = null;
if (itemid != null)
{

and then, inside the conditional block (lines 6752-6774):

resource = contentService.getResource(itemid);
InputStream contentStream = resource.streamContent();

byte[] buf = new byte[1024];

// Get the filename to use for the zip entry
String fileName = item.getName();

if (fileName != null)
{
  zip.putNextEntry(new ZipEntry(fileName));
}
else
{
  zip.putNextEntry(new ZipEntry("Unnamed resource with ID " + itemid));
}

// Copy the resource content into the current zip entry
int len;
while ((len = contentStream.read(buf)) != -1) {
  zip.write(buf, 0, len);
}
zip.closeEntry();
contentStream.close();
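
One thing this loop does not guard against: ZipOutputStream.putNextEntry throws a ZipException ("duplicate entry") if the same name is used twice, which could happen when resources in different folders share a display name. A small hypothetical helper that hands out unique names in the style "report (2).pdf" could sit between getting the name and creating the ZipEntry; this is a sketch, not anything Sakai provides.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper handing out zip entry names that are never repeated. */
final class EntryNamer {
    private final Set<String> used = new HashSet<>();

    /**
     * Returns name unchanged the first time it is seen; after that, inserts
     * " (2)", " (3)", ... before the extension of the last path segment.
     */
    String unique(String name) {
        if (used.add(name)) {
            return name;
        }
        int slash = name.lastIndexOf('/');
        int dot = name.lastIndexOf('.');
        // Only a dot inside the last segment, and not its first character
        // (as in ".profile"), starts an extension
        boolean hasExt = dot > slash + 1;
        String base = hasExt ? name.substring(0, dot) : name;
        String ext = hasExt ? name.substring(dot) : "";
        for (int n = 2; ; n++) {
            String candidate = base + " (" + n + ")" + ext;
            if (used.add(candidate)) {
                return candidate;
            }
        }
    }
}
```

One EntryNamer per zip, created alongside the ZipOutputStream, is enough, since names only need to be unique within a single archive.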

Some preparation has to happen before this, using piped streams to turn the zip output into the input for the SWORD library, calculating the MD5 hash we want for the deposit stage on the way:

MessageDigest md = MessageDigest.getInstance("MD5");
DigestInputStream dpins = new DigestInputStream(pins, md);
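
Piped streams come with a catch: the PipedInputStream documentation warns that using both ends of a pipe from one thread may deadlock, and the default pipe buffer is only 1024 bytes, so writing the whole zip before anything reads it will block. A sketch of the usual arrangement, with the zip written on its own producer thread while the calling thread reads through a DigestInputStream, is below. The names are hypothetical, and the OutputStream sink stands in for the HTTP upload. It also shows that the MD5 is only complete once the reader reaches end-of-stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class PipedZipDigest {
    /** Lower-case hex encoding of a digest. */
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Zips one entry on a producer thread into a pipe, while the calling thread
     * reads the pipe through a DigestInputStream and copies it to sink (standing
     * in for the upload). Returns the MD5 of the bytes that went to sink.
     */
    static String zipThroughPipe(String entryName, byte[] content, OutputStream sink)
            throws IOException, InterruptedException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        PipedInputStream pins = new PipedInputStream();
        PipedOutputStream pouts = new PipedOutputStream(pins);

        Thread producer = new Thread(() -> {
            // Closing the ZipOutputStream also closes pouts, which is what
            // lets the reader see end-of-stream
            try (ZipOutputStream zip = new ZipOutputStream(pouts)) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(content);
                zip.closeEntry();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        producer.start();

        try (InputStream dpins = new DigestInputStream(pins, md)) {
            byte[] buf = new byte[8192];
            int len;
            while ((len = dpins.read(buf)) != -1) {
                sink.write(buf, 0, len);
            }
        }
        producer.join();
        // Only now, after end-of-stream, does the digest cover the whole zip
        return hex(md.digest());
    }
}
```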

And now we should be in a position to set up the deposit with the SWORD2 library.

It has also occurred to me that the solution to the problem of multiple archives is to embed them in the confirmation web form: the user selects the archive there from a drop-down list, and the script reads the URL to use for deposit. So the URL to use is then just a form parameter. Except that the SWORD client README suggests that a collection object (derived from the collection description in a service document) is needed for deposit, so I need to check the source code to see if there's a method which takes a deposit URL instead. It turns out that there is, so I'll use that. So we have (ignoring, for the moment, a whole load of exceptions which will surely need to be caught):

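// Set up the client and authentication
SWORDClient client = new SWORDClient();
AuthCredentials auth = new AuthCredentials(params.getString("userid"), params.getString("password"));

// Metadata for the Atom entry, from the confirmation form
// (assumes EntryPart.addDublinCore(term, value), as in the client's README)
EntryPart ep = new EntryPart();
ep.addDublinCore("title", params.getString("title"));

// Finish the zip; closing it also closes pouts, so the reader will see end-of-stream.
// (With both ends of the pipe on one thread, writing more than the pipe buffer,
// 1024 bytes by default, before anything reads it will block.)
zip.close();

// Describe the package; the client reads dpins while sending
Deposit deposit = new Deposit();
deposit.setFile(dpins);
deposit.setMimeType("application/zip");
deposit.setFilename(params.getString("title") + ".zip");
deposit.setPackaging(UriRegistry.PACKAGE_SIMPLE_ZIP);
deposit.setEntryPart(ep);
deposit.setInProgress(true);
// md.digest() is only complete once dpins has been read to the end, i.e. after
// the upload, so the MD5 cannot be passed to deposit.setMd5() from here

// And now make the deposit
DepositReceipt receipt = client.deposit(params.getString("depositurl"), deposit, auth);

// Only close the read side once the client has finished with it
dpins.close();
byte[] fileMD5 = md.digest(); // digest of the bytes actually sent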
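
Because the checksum travels with the request (as a Content-MD5 header), it has to be known before the upload starts, which a digest computed while the upload reads the pipe cannot provide. SWORD2 servers report a mismatch as HTTP 412 Precondition Failed (the ErrorChecksumMismatch error in the SWORD 2.0 profile). One alternative to the pipe, swapped in here as an assumption rather than taken from the client documentation, is to write the finished zip into memory first and hash it before depositing. The class below is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical holder for a finished zip and its MD5, ready for upload. */
final class BufferedPackage {
    final byte[] bytes;
    final String md5Hex;

    BufferedPackage(byte[] bytes) {
        this.bytes = bytes.clone();
        // Hash the complete archive up front, before anything is sent
        this.md5Hex = md5Hex(this.bytes);
    }

    /** A fresh stream over the buffered zip, e.g. to hand to the deposit. */
    InputStream stream() {
        return new ByteArrayInputStream(bytes);
    }

    /** Lower-case hex MD5 of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5
            throw new IllegalStateException(e);
        }
    }
}
```

In the deposit code this would mean pointing the ZipOutputStream at a ByteArrayOutputStream instead of the pipe, then passing pkg.stream() to deposit.setFile() and pkg.md5Hex to deposit.setMd5(), assuming setMd5 takes a hex string (worth confirming in the client's Deposit class). For large selections, a temporary file would avoid holding the whole archive in memory.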

The next issue is what to do with the receipt, and how to alert the user to the success or failure of the deposit. The confirmation web page should still be available (especially if it indicates that the status of the archiving will be displayed there), so the result can be displayed there, dynamically. For the moment, I'll just hold on to the receipt and revisit this code when I've written the appropriate Velocity file.

There's just one final bit of code to add to this file: a call to the archive confirmation method (the counterpart of the confirm delete method) as a new state resolution function (lines 7083-6):

else if(ResourceToolAction.ARCHIVE.equals(actionId))
{
  doArchiveconfirm(data);
}

Then I can start work on the Velocity file. I should say at this point that I don't expect this code to compile without error. I'm absolutely certain there will be exceptions which haven't been caught, and I may well have confused some of the variable names in the course of this. But I want to get on to the next stage before coming back here.

4 comments:

  1. Hi Simon,

    First of all, congratulations on your great library and this post.

    I'm trying to use this Sword2 java client, but I have found a java.net.ConnectException: Connection refused.

    Well, in SwordCli.java, I'm using the tryAddMetadata() method and I reach the line:

    receipt = client.deposit(collections.get(0), deposit, auth); //collections.get(0) exists

    I've been tracing the problem and I've found the exact point where it fails in SWORDClient.java. In this method:
    public DepositReceipt deposit(String collectionURL, Deposit deposit, AuthCredentials auth)
    throws SWORDClientException, SWORDError, ProtocolViolationException

    where you do:
    resp = client.post(url.toString(), media, options);

    I've printed some information about media and options. That is:
    MEDIA
    ---------------------
    ContentType=application/pdf
    ContentLength=3561

    OPTIONS
    ---------------------
    ContentLocation=null
    Slug=null

    Do you know what I'm doing wrong? If you need more info please, let me know.

    Thanks in advance,
    Pablo Esteban

    1. Hi Pablo,

      Thanks for the praise, but it's not my library - like you, I'm just trying to use it. You probably need to contact Richard Jones via https://github.com/swordapp/JavaClient2.0. But you might want to try checking that it is possible to connect to the URL you are using with a browser, just to make sure that it should be accepting a connection, as it's the web server at the repository you're using for deposit which is returning the error you're seeing.

      Hope this helps,
      Simon

  2. Hi Simon,
    Yes, I've tested the URL and the response is something like:
    <service xmlns="http://www.w3.org/2007/app" xmlns:atom="http://www.w3.org/2005/Atom">
    <workspace>
    <atom:title type="text">DSpace at My University</atom:title>
    <collection href="http://localhost:8080/swordv2/collection/123456789/4">
    <atom:title type="text">Libros nuevos</atom:title>
    ...

    And looking at the response, I'm realizing that maybe the mistake is in the line <collection href="http://localhost:8080/swordv2 ... My DSpace IP is https://192.168.214.142, so maybe that localhost:8080 is the problem...

    I first thought the problem was in {dspace}/config/modules/swordv2-server.cfg but it seems not to be from that file.

  3. The mistake was in {dspace}/config/modules/swordv2-server.cfg

    in collection.url, which was set to default...

    Now I'm getting " Deposit request on https://192.168.214.142/swordv2/collection/123456789/4 returned Error HTTP status 412"

    I'll try to do some research.

    Thanks!
    Pablo
