Planet Bioclipse


chem-bla-ics : Open Notebook Science ONSSP #1: http://onsnetwork.org/

Wed, 2015-03-11 15:55
As promised, I slowly set out to explore ONSSPs (Open Notebook Science Service Providers). I do not have a full overview of solutions yet, but I found LabTrove and Open Notebook Science Network. The latter is more clearly an ONSSP, while the former seems to be the software.

So, my first experiment is with Open Notebook Science Network (ONSN). The platform uses WordPress, a proven technology. I am not a huge fan of the setup, which has so many features that it is sometimes hard to find what you need. Indeed, my first write-up ended up as a Page rather than a Post. On the upside, there is a huge community around it, with experts in every city (literally!). But my ONS is now online and you can monitor my Open research with this RSS feed.

One of the downsides is that the editor is not oriented towards structured data, though there is a feature for Forms which I may need to explore later. My first experiment was a quick, small hack: upgrade Bioclipse with OPSIN 1.6. As discussed in my #jcbms talk, I think it may be good for cheminformatics if we really start writing up step-by-step descriptions of common tasks.

My first observations are that it is an easy platform to work with. Embedding images is easy, and there should be options for chemistry extensions. For example, there is a Jmol plugin for WordPress, there are plugins for Semantic Web support (no clue which one I would recommend), and extensions for bibliographies are available too, if I am not mistaken. We also already see my ORCID prominently listed, and I am not sure whether I did this or whether the ONSN people added it as a default feature.

Even better is the GitHub support by @benbalter that @ONScience made me aware of. The instructions were not crystal clear to me (see issues #25 and #26), but after some suggested fixes (pull request #27) it started working, and I now have a backup of my ONS at GitHub!

So, it looks like I am going to play with this ONSSP a lot more.

chem-bla-ics : First steps in Open Notebook Science

Wed, 2015-03-11 15:53
Scheme 2 from this Beilstein Journal of Organic Chemistry paper by Frank Hahn et al.

A few weeks back I blogged about my first Open Notebook Science entry. That post suggests I will look at a few ONS service providers but, honestly, Open Notebook Science Network serves my needs well.

What I have in mind, and will soon advocate, is that the total synthesis approach from organic chemistry fits chem- and bioinformatics research. It may not be perfect, and perhaps somewhat artificial (no pun intended), but I like the idea.

Compound to Compound
Basically, a lab notebook entry should be a step of something larger. You don't write Bioclipse from scratch. You don't do a metabolomics pathway enrichment analysis in one step, either. It is a series of steps, each one taking you from one state to another. Ah, another nice analogy (see automata theory)! In terms of organic chemistry, from one compound to another. The important point is that the analogy shows there is no step you should not report. The same applies to cheminformatics: you cannot report a QSAR model without explaining how you cleaned up that SDF file you got from paper X (though leaving this out is still common practice).

Methods Sections
Organic chemistry literature has well-defined templates on how to report the method for a reaction, including minimal reporting standards for the experimental results. For example, you must report chemical shifts and an elemental composition. In cheminformatics we do not have such templates, but there is no reason not to. Another feature that must be reported is the yield.

Reaction yield
The analogy with organic chemistry continues: each step has a yield. We must report this. I am not sure how, and this is one of the things I am exploring and will be part of my argument. In fact, keeping track of the variance introduced is something I have been advocating for a long time. I think it really matters. We, as a research field, now publish a lot of cheminformatics and chemometrics work without taking into account the yield of methods, though, for obvious reasons, it is considered very much more in chemometrics than in cheminformatics. I won't go into that now. There is indeed a good amount of benchmark work, but the point is that any cheminformatics "reaction" step should be benchmarked.
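
To make the idea a bit more concrete, here is a minimal sketch of one possible way to put a number on the "yield" of a step. It assumes, purely for illustration, that yield means the fraction of input records that survive a step (for example, structures left after cleaning an SDF file). That definition, the function names and the record counts are all assumptions of this sketch, not an established reporting standard.

```python
from math import prod


def step_yield(n_in: int, n_out: int) -> float:
    """Fraction of input records that survive one step (assumed definition)."""
    if n_in <= 0:
        raise ValueError("a step needs at least one input record")
    if not 0 <= n_out <= n_in:
        raise ValueError("output count must lie between 0 and the input count")
    return n_out / n_in


def overall_yield(yields) -> float:
    """Yield of a linear sequence of steps: the product of the step yields."""
    return prod(yields)


# Hypothetical record counts, for illustration only: (step, records in, records out).
STEPS = [
    ("clean SDF file", 1000, 900),
    ("calculate descriptors", 900, 855),
]

yields = [step_yield(n_in, n_out) for _, n_in, n_out in STEPS]
for (name, _, _), value in zip(STEPS, yields):
    print(f"{name}: {value:.1%}")
print(f"overall: {overall_yield(yields):.1%}")
```

As in a multi-step synthesis, the overall yield of a linear route is the product of the yields of its steps, so a small loss in every step adds up quickly.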

Total synthesis
The final aspect is that, by taking this analogy, there is a clear protocol for how cheminformatics, or bioinformatics, work must be reported: as a sequence of detailed small steps. It also means that intermediate "products" can be taken further in multiple ways: you get a directed graph of methods you applied and results you got.

You get something like this:

Created with Graphviz Workspace.
The EWx codes refer to entries in my lab notebook:
  1. EW4: Finding nodes in Anopheles gambiae pathways with IUPAC names
  2. EW5: Finding nodes in Homo sapiens pathways with IUPAC names
  3. EW6: Finding nodes in Rattus norvegicus pathways with IUPAC names
  4. EW7: converting metabolite Labels into DataNodes in WikiPathways GPML
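
A directed graph like this is easy to keep as plain data and hand to Graphviz. The sketch below is only an illustration: the node names and edges are hypothetical placeholders (they are not the actual EWx entries or how those depend on each other), and the helper functions are made up for this example. It builds the graph from a list of "product to product" edges, writes it out in Graphviz DOT syntax, and lists the intermediates from which the "synthesis" branches.

```python
def build_graph(edges):
    """Map each node to the list of nodes derived from it."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])  # make sure end products are nodes too
    return graph


def to_dot(graph, name="notebook"):
    """Serialise the graph as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    for source, targets in graph.items():
        for target in targets:
            lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


def branch_points(graph):
    """Intermediates that were continued with in more than one way."""
    return [node for node, targets in graph.items() if len(targets) > 1]


# Hypothetical steps, for illustration only.
EDGES = [
    ("raw SDF", "cleaned SDF"),
    ("cleaned SDF", "descriptors"),
    ("cleaned SDF", "substructure search"),
    ("descriptors", "QSAR model"),
]

graph = build_graph(EDGES)
print(to_dot(graph))
print("branch points:", branch_points(graph))
```

The DOT text can be saved to a file and rendered with the Graphviz `dot` tool. Keeping the edges as a simple list means a new notebook entry only adds one line.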

Open Notebook Science
Of course, the above also applies if you do not do Open Notebook Science (ONS). In fact, the above outline is not really different from how I did my research before. However, I see value in using the ONS approach here. Having it Open

  1. requires me to be as detailed as possible
  2. allows others to repeat it
Combine this with the advantage of the total synthesis analogy:
  1. "reactions" can be performed in reasonable time
  2. easy branching of the synthesis
  3. clear methodology that can be repeated for other "compounds"
  4. step towards minimal reporting standards for cheminformatics methods
  5. clear reporting structure that is compatible with journal requirements
OK, that is more or less the paper I want to write up and submit to the Jean-Claude Bradley Memorial Issue in the Journal of Cheminformatics and Chemistry Central. It is an idea, something that helps me, and I hope more people find useful bits in this approach.