Planet Bioclipse


chem-bla-ics : Revisited: Handling SD files with JavaScript in Bioclipse

Tue, 2014-01-14 12:38
After asking on the Bioclipse users list, it turns out there was an unpublished manager method to trigger parsing of the SDF properties (Arvid++). It simplifies the creation of the index and avoids the unneeded parsing of the chemical structures into a CDK molecule model.

That simplifies my earlier code to:

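  // Index the SD file without parsing the structures into CDK molecules
  hmdbIndex = molTable.createSDFIndex(
    "/WikiPathways/hmdb.sdf"
  );
  // Ask the index to parse the HMDB_ID SD property
  props = new java.util.HashSet();
  props.add("HMDB_ID");
  molTable.parseProperties(hmdbIndex, props);

  // Map each HMDB identifier to its position in the SD file
  idIndex = new java.util.HashMap();
  molCount = hmdbIndex.getNumberOfMolecules();
  for (i=0; i<molCount; i++) {
    hmdbID = hmdbIndex.getPropertyFor(i, "HMDB_ID");
    if (hmdbID != null) idIndex.put(hmdbID, i);
  }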
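For readers without Bioclipse at hand, the idea behind such a property index can be sketched in plain JavaScript (Node.js). This is not how Bioclipse implements it: the function names are hypothetical, the whole file is read into memory, and the sketch assumes that records end with a "$$$$" line and that each data item is a "> <FIELD>" header followed by a one-line value.

```javascript
// Plain JavaScript sketch, not Bioclipse code: split SD file text into
// records and map the value of one data field to the record number.

// Split SD file text into record strings; a "$$$$" line ends a record.
function splitSDRecords(sdText) {
  const records = [];
  let current = [];
  for (const line of sdText.split(/\r?\n/)) {
    if (line.trim() === "$$$$") {
      records.push(current.join("\n") + "\n");
      current = [];
    } else {
      current.push(line);
    }
  }
  // Keep a trailing record even if the final "$$$$" line is missing.
  if (current.join("").trim() !== "") {
    records.push(current.join("\n") + "\n");
  }
  return records;
}

// Return the first line of a data field's value, or null if absent.
function getProperty(record, field) {
  const lines = record.split("\n");
  for (let i = 0; i < lines.length - 1; i++) {
    if (lines[i].startsWith(">") && lines[i].includes("<" + field + ">")) {
      return lines[i + 1].trim();
    }
  }
  return null;
}

// Map field value -> record number; later duplicates overwrite earlier
// ones, as with java.util.HashMap.put().
function buildPropertyIndex(records, field) {
  const index = new Map();
  records.forEach((record, i) => {
    const id = getProperty(record, field);
    if (id !== null) index.set(id, i);
  });
  return index;
}

// Usage with two toy records (made-up identifiers, not real HMDB entries).
const toySDF = [
  "methane", "  toy", "",
  "  1  0  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "M  END", "> <HMDB_ID>", "TOY001", "", "$$$$",
  "water", "  toy", "",
  "  1  0  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
  "M  END", "> <HMDB_ID>", "TOY002", "", "$$$$", ""
].join("\n");

const toyRecords = splitSDRecords(toySDF);
const idIndex = buildPropertyIndex(toyRecords, "HMDB_ID");
console.log(idIndex.get("TOY002")); // 1
```

Reading everything into memory is fine for a toy file but not for a 2 GB HMDB dump, which is where the indexing approach in Bioclipse pays off.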
The next step in my use case is to process some input (WikiPathways GPML files, to be precise), detect which HMDB identifier is used, extract the SD file entry for that identifier, and append it to a new SD file (using a new ui.append() method):
  // Look up the SD file entry for this HMDB identifier
  hmdbCounter = idIndex.get(idStr);
  sdEntry = hmdbIndex.getRecord(hmdbCounter);
  // Keep the molfile part only, dropping all original SD fields
  sdEntry = sdEntry.substring(0, sdEntry.indexOf("M  END"));
  outFile = "/WikiPathways/db.sdf";
  ui.append(outFile, sdEntry);
  ui.append(outFile, "M  END\n");
  // Add the internal identifier as the only SD field;
  // substring(1) drops the first character of wpmId
  ui.append(outFile, "> <WPM>\n");
  ui.append(outFile, "WPM" + (Integer.toString(wpmId)).substring(1) + "\n");
  ui.append(outFile, "\n");
  ui.append(outFile, "$$$$\n");
This code actually does a bit more than copy the SD file entry: it also removes all of its original SD fields and replaces them with a new, internal identifier. Using that identifier, I track some metadata on this metabolite.
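The record rewriting described above can be sketched as a single string operation in plain JavaScript, as a hypothetical helper rather than anything Bioclipse provides. It assumes the record contains exactly one "M  END" line and receives the WPM identifier already formatted, so the numbering scheme stays the caller's choice.

```javascript
// Hypothetical helper: keep the molfile part of an SD record (everything
// before "M  END"), drop all of its existing data fields, and attach a
// single new WPM field.
function rewriteWithWpmId(sdRecord, wpmId) {
  const end = sdRecord.indexOf("M  END");
  if (end < 0) throw new Error("record has no 'M  END' line");
  return sdRecord.substring(0, end) +
    "M  END\n" +      // close the connection table again
    "> <WPM>\n" +     // header of the new data item
    wpmId + "\n" +    // its value
    "\n" +            // blank line ends the data item
    "$$$$\n";         // record separator
}

// Usage with a toy record carrying two old fields (made-up values).
const toyRecord = [
  "methane", "  toy", "",
  "  1  0  0  0  0  0  0  0  0  0999 V2000",
  "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
  "M  END",
  "> <HMDB_ID>", "TOY001", "",
  "> <NAME>", "methane", "",
  "$$$$", ""
].join("\n");

const rewritten = rewriteWithWpmId(toyRecord, "WPM00001");
console.log(rewritten);
```

Building the whole record first means the output file receives one complete entry per write instead of six partial ones.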
Now, there are a million ways of implementing this workflow. If you really want to know, I chose this one because HMDB identifiers are among the more prominent IDs used in WikiPathways, and for HMDB, as well as ChEBI, I can use an SD file. For ChemSpider and PubChem identifiers, however, I plan to use the matching Bioclipse client code to pull in MDL molfiles. Bioclipse has functionality for all these needs available as extensions.

chem-bla-ics : Handling SD files with JavaScript in Bioclipse

Tue, 2014-01-07 22:47
I finally got around to continuing with a task to create an SD file for WikiPathways. The problem is more finding the time than doing it, and the tasks are basically:
  1. iterate over all metabolites in the GPML files
  2. extract the Xref's database and database identifier (see previous link)
  3. extract the molfile from the database SD file
  4. give the WikiPathways metabolite a unique identifier
  5. record that the WikiPathways metabolite has a molfile
  6. append that molfile, along with the new WikiPathways metabolite ID, to a new SD file
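Tasks 1, 2 and 4 can be illustrated with a naive plain JavaScript sketch. It assumes GPML metabolites appear as <DataNode ... Type="Metabolite"> elements containing an <Xref Database="..." ID="..."/> child with the attributes in that order, and it uses regular expressions where real code should use an XML parser. The function names and the "WPM" plus five digits numbering are hypothetical choices for the example.

```javascript
// Naive sketch: find metabolite DataNodes in GPML text and read the
// database and identifier of their Xref. Regexes stand in for an XML parser.
function extractMetaboliteXrefs(gpml) {
  const found = [];
  const nodeRe = /<DataNode\b[^>]*Type="Metabolite"[^>]*>([\s\S]*?)<\/DataNode>/g;
  const xrefRe = /<Xref\b[^>]*Database="([^"]*)"[^>]*ID="([^"]*)"/;
  for (const node of gpml.matchAll(nodeRe)) {
    const m = node[1].match(xrefRe);
    if (m && m[2] !== "") found.push({ database: m[1], id: m[2] });
  }
  return found;
}

// Give each distinct (database, identifier) pair a hypothetical WPM ID.
function assignWpmIds(xrefs) {
  const ids = new Map(); // "database:id" -> WPM identifier
  for (const { database, id } of xrefs) {
    const key = database + ":" + id;
    if (!ids.has(key)) {
      ids.set(key, "WPM" + String(ids.size + 1).padStart(5, "0"));
    }
  }
  return ids;
}

// Usage with a toy pathway (made-up identifiers).
const toyGpml = `
<Pathway Name="toy">
  <DataNode TextLabel="A" GraphId="a1" Type="Metabolite">
    <Xref Database="HMDB" ID="TOY001" />
  </DataNode>
  <DataNode TextLabel="G" GraphId="g1" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="123" />
  </DataNode>
  <DataNode TextLabel="A again" GraphId="a2" Type="Metabolite">
    <Xref Database="HMDB" ID="TOY001" />
  </DataNode>
  <DataNode TextLabel="B" GraphId="b1" Type="Metabolite">
    <Xref Database="ChEBI" ID="TOY002" />
  </DataNode>
</Pathway>`;

const xrefs = extractMetaboliteXrefs(toyGpml);
const wpmIds = assignWpmIds(xrefs);
console.log(wpmIds);
```

Keying on the database name plus identifier means the same metabolite drawn in several places, or in several pathways, gets a single WPM identifier.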
It turns out that Uppsala's excellent SD file functionality in Bioclipse (thanks to indexing, it opens 2 GB SD files for me) is also available from the JavaScript command line:
  hmdbIndex = molTable.createSDFIndex(
    "/WikiPathways/hmdb.sdf"
  );

  // Map each HMDB identifier to its position in the SD file
  idIndex = new java.util.HashMap();
  molCount = hmdbIndex.getNumberOfMolecules();
  for (i=0; i<molCount; i++) {
    // Parses the full structure into a CDK molecule just to read one field
    mol = hmdbIndex.getMoleculeAt(i);
    if (mol != null) {
      hmdbID = mol.getAtomContainer().getProperty(
        "HMDB_ID"
      );
      idIndex.put(hmdbID, i);
    }
  }
Using this approach, I can create an index of the molfiles in the HMDB SD file by HMDB identifier, extract just those molfiles that are found in WikiPathways, and create a new SD file dedicated to WikiPathways. When the HMDB identifiers are done, ChEBI, PubChem, and ChemSpider will follow.
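The extraction step boils down to one lookup per identifier, and it is worth handling identifiers that the database file does not contain: in the Bioclipse scripts above, java.util.HashMap.get() returns null for an unknown key. A minimal plain JavaScript sketch, with hypothetical names and toy strings standing in for real molfiles:

```javascript
// Sketch of the extraction step: given an identifier -> record number index
// and the identifiers found in the pathways, collect the matching records
// and list the identifiers the database file does not contain.
function selectRecords(index, records, wantedIds) {
  const selected = [];
  const missing = [];
  for (const id of new Set(wantedIds)) { // Set drops duplicate requests
    const n = index.get(id);
    if (n === undefined) missing.push(id);
    else selected.push(records[n]);
  }
  return { selected, missing };
}

// Usage with toy data (made-up identifiers and placeholder records).
const dbRecords = ["record for TOY001\n$$$$\n", "record for TOY002\n$$$$\n"];
const dbIndex = new Map([["TOY001", 0], ["TOY002", 1]]);
const result = selectRecords(dbIndex, dbRecords, ["TOY002", "TOY404", "TOY002"]);
console.log(result.selected.length, result.missing.join(", ")); // 1 TOY404
```

The list of missing identifiers doubles as a report of which pathway metabolites still need a structure from another source.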