Feed aggregator
chem-bla-ics : New Paper: "Applications of the InChI in cheminformatics with the CDK and Bioclipse"
Last week, Ola, Sam Adams, Arvid, and I published a paper (doi:10.1186/1758-2946-5-14) on the InChI functionality in Bioclipse, which uses Sam's JNI-InChI and the Chemistry Development Kit underneath.
This paper partly describes the earlier work by Sam on JNI-InChI itself and the integration into the CDK, but also includes the recent support for CDK's IStereoElement, OSGi bundles for JNI-InChI by Arvid, and a few new applications in Bioclipse.
These applications demo what you can do with the InChI in Bioclipse. Obviously, this involves creating InChIs for any structure drawn in Bioclipse (that is old). New is that the manager now also supports creating InChIs with particular layers. For example, with fixed hydrogens:
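mol = cdk.fromSMILES("OC=O");
// standard InChI
sinchi = inchi.generate(mol);
// InChI with the fixed-hydrogen layer; stored under a new name so the
// inchi manager is not overwritten
fixedHInchi = inchi.generate(mol, "FixedH");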
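To see what "layers" means in practice, here is a minimal sketch in plain JavaScript that splits an InChI string into its parts. It is not part of Bioclipse or the InChI library: the function name `splitInchiLayers` is made up for this illustration, and the input is the standard InChI commonly listed for formic acid, the same molecule as the "OC=O" SMILES above.

```javascript
// Split an InChI string into its version, formula, and prefixed layers.
// Illustration only: real InChIs can repeat layer prefixes (for example
// after the /f fixed-hydrogen layer) and may lack a formula, so layers are
// kept as an ordered list rather than a prefix-keyed object.
function splitInchiLayers(inchiString) {
  const prefix = "InChI=";
  if (!inchiString.startsWith(prefix)) {
    throw new Error("Not an InChI: " + inchiString);
  }
  const parts = inchiString.slice(prefix.length).split("/");
  return {
    version: parts[0],
    formula: parts.length > 1 ? parts[1] : "",
    layers: parts.slice(2).map(function (part) {
      // The first character names the layer (c = connections, h = hydrogens, ...).
      return { prefix: part.charAt(0), body: part.slice(1) };
    })
  };
}

const formicAcid = splitInchiLayers("InChI=1S/CH2O2/c2-1-3/h1H,(H,2,3)");
console.log(formicAcid.version); // prints "1S"
console.log(formicAcid.layers.map(function (l) { return l.prefix; }).join(",")); // prints "c,h"
```

An InChI generated with the fixed-hydrogen option would carry additional layers after these two, which is why the sketch keeps them in order instead of assuming each prefix occurs once.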
But the more interesting bits are next. For example, the InChI is ideal for look-up, and can be used in decision support with knowledge bases.
As Christopher Southan showed in his "InChI in the wild: an assessment of InChIKey searching in Google" paper (doi:10.1186/1758-2946-5-10), the InChI is also good for finding useful information on the web. I have taken a different approach with Isbjørn, which does not use Google but Linked Data approaches to find information on the web. This semantic search is seeded with the InChI.
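For the Google-style look-up, a rough sketch of turning an InChIKey into an exact-match search URL could look like the following. This is neither the procedure from Southan's paper nor how Isbjørn works; the function name `inchiKeySearchUrl` is hypothetical, and the key used is the one commonly listed for formic acid.

```javascript
// Shape of an InChIKey: blocks of 14, 10, and 1 uppercase letters,
// separated by hyphens.
const INCHIKEY_PATTERN = /^[A-Z]{14}-[A-Z]{10}-[A-Z]$/;

function inchiKeySearchUrl(inchiKey) {
  if (!INCHIKEY_PATTERN.test(inchiKey)) {
    throw new Error("Does not look like an InChIKey: " + inchiKey);
  }
  // Quote the key so the search engine treats it as one exact token.
  return "https://www.google.com/search?q=" +
    encodeURIComponent('"' + inchiKey + '"');
}

console.log(inchiKeySearchUrl("BDAGIHXWWSANSR-UHFFFAOYSA-N"));
```

The fixed-length, hyphen-separated key is what makes this kind of exact-string search workable; a full InChI contains characters that search engines tend to tokenize.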
The third example exposes work done by Mark Rijnbeek, formerly in the group of Christoph Steinbeck, who implemented a method that uses the InChI library for tautomer generation for the CDK. This functionality is now exposed in Bioclipse too. Obviously, this functionality is limited by the InChI library's ability to generate those tautomers. But if you would like to try it, you can do this with:
// no aromatic rings that make it hard to
// see where the double bonds are
jcpglobal.setShowAromaticity(false);
inputSMILES = "c1ccccc1O";
inputName = "phenol";
inchi.generate(
  cdk.fromSMILES(inputSMILES)
);
tautomers = cdk.getTautomers(
  cdk.fromSMILES(inputSMILES)
);
file = "/Virtual/" + inputName + ".sdf";
cdk.saveSDFile(file, tautomers);
ui.open(file);
Details on how to try all this in practice can be found on this page. I am looking forward to hearing what you think of it, and how you would like to use it or are using it. If you would like to extend it, the source code is on GitHub.
Spjuth, O.; Berg, A.; Adams, S.; Willighagen, E. Journal of Cheminformatics 2013, 5, 14+.
chem-bla-ics : #ACSNola talk: "Bioclipse-OpenTox: interactive predictive toxicology"
My third #ACSNola talk (well, second chronologically):
Bioclipse-OpenTox: interactive predictive toxicology from Egon Willighagen
chem-bla-ics : #ACSNola talk: "Open PHACTS: meaningful linking of preclinical drug discovery knowledge"
Half a year ago I submitted this abstract for the #ACSNola meeting last week (and as in the slides, I stress that it is a large community effort involving not only academic groups but also many pharma companies):
Open PHACTS: meaningful linking of preclinical
drug discovery knowledge
E. Willighagen, C. Brenninkmeijer, C. Evelo,
L. Harland, A. Gray, C. Goble, A. Waagmeester,
A. Williams
Recently, semantic web technologies have been
adopted by the life sciences community for this
purpose. However, while these new technologies
provide us with methods, they do not provide us
with an exact solution. Open PHACTS uses these
methods to solve problems in linking preclinical
knowledge from databases like UniProt, ChEMBL,
and WikiPathways. Problems that are discussed
and for which our solutions will be presented
include: 1. approaches to map data between the
databases using the Vocabulary of Interlinked
Datasets, including identifier mapping with
BridgeDb, appropriate choices of mapping
predicates, and ontologies to cover provenance,
such as the Provenance, Authoring and Versioning
ontology; 2. dealing with different units for
experimental data using the Quantities, Units,
Dimensions and Data (QUDT) ontology for (on the
fly) quantity conversion; and 3. how all this
is linked to user-oriented graphical user
interfaces.
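The on-the-fly quantity conversion in point 2 can be sketched in a few lines of JavaScript. This is only an illustration of the idea, not Open PHACTS code: the unit table is typed in by hand and stands in for the conversion multipliers that an ontology such as QUDT attaches to each unit, and the name `convertConcentration` is made up for this example.

```javascript
// Hand-typed multipliers to mol/L (an assumption for this sketch); in a
// real setup these would be looked up from the unit ontology.
const MULTIPLIER_TO_MOLAR = new Map([
  ["M", 1],
  ["mM", 1e-3],
  ["uM", 1e-6],
  ["nM", 1e-9]
]);

function convertConcentration(value, fromUnit, toUnit) {
  if (!MULTIPLIER_TO_MOLAR.has(fromUnit) || !MULTIPLIER_TO_MOLAR.has(toUnit)) {
    throw new Error("Unknown unit: " + fromUnit + " or " + toUnit);
  }
  // Go through the common base unit, then out to the requested unit.
  const inMolar = value * MULTIPLIER_TO_MOLAR.get(fromUnit);
  return inMolar / MULTIPLIER_TO_MOLAR.get(toUnit);
}

// Two activity values reported in different units become comparable.
const a = convertConcentration(250, "nM", "uM");
const b = convertConcentration(0.5, "uM", "uM");
console.log(a < b ? "first compound is more potent" : "second compound is more potent");
```

Routing every conversion through one base unit keeps the table linear in the number of units, instead of needing a factor for every pair.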
I have now uploaded the slides:
Also note the associate partnership program: it is not too late to join the 40 other associate partners and team up with Open PHACTS!
chem-bla-ics : #ACSNola talk: "Correlating time spent on exercises with exam results in Protein Structure education"
I have cleaned up the slides a bit, adding some information I explained during the talk in New Orleans.
Correlating time spent on exercises with exam results in Protein Structure education from Egon Willighagen
chem-bla-ics : My ACS New Orleans talks #ACSNola
Some six weeks are left before the ACS spring meeting in New Orleans, aka #ACSNola. I got more abstracts accepted than expected and have a busy program now.
- Sunday 6:30pm: New cheminformatics microscopes: Combining semantic web technologies, cheminformatical representations, and chemometrics for understanding and predicting chemical and biological properties, presenting my research line. (poster)
- Monday 8pm: two posters at Sci-Mix: the above and an Open PHACTS poster (see below).
- Tuesday 8:30am: Bioclipse-OpenTox: Interactive predictive toxicology, outlining design, implementation and ongoing evolution of this work. In the Drug Discovery session. (oral)
- Tuesday 11:20am: Architecture for an open science molecular compound database, outlining an architecture based on the InChI and semantic web technologies. In the Public Databases Serving the Chemistry Community session. (oral)
- Tuesday 2:20pm: Open PHACTS: Meaningful linking of preclinical drug discovery knowledge, on behalf of the Open PHACTS project, in the Linking Bioinformatic Data and Cheminformatic Data session. (oral)
- Wednesday 9:15am: Correlating time spent on exercises with exam results in protein structure education at Maastricht University, in the Beyond Multiple Choice: Assessment in the Digital Age session. (oral)
But, where is the Blue Obelisk dinner going to fit in??
chem-bla-ics : Book: "Open source software in life science research"
Recently, Lee Harland and Mark Forster published a book called "Open source software in life science research" (ISBN-13: 978 1 907568 97 8; available from e.g. Amazon) featuring a chapter on Bioclipse-OpenTox, in which we applied our earlier published work (doi:10.1186/1756-0500-4-487) to the Tres Cantos Antimalarial Set (TCAMS), nowadays available from the ChEMBL Neglected Tropical Disease Database. Big thumbs up to Roman who primarily did that part.
This book is not Open Access, but three chapters are. Ours is one of them: our chapter is available under the Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA) license. Currently, the sources are available as a Word and as a Libre/OpenOffice document on GitHub, but I think I will convert that to LaTeX later so that I can share a nicely formatted PDF version of the chapter. We used the Mendeley plugin for the references, with mixed experiences. All references are therefore available from this Mendeley group.
We plan to keep this chapter updated. That is, when there are significant changes in the OpenTox or Bioclipse platforms, we will update the chapter content to match those changes. Well, that is at least the idea.
Of course, everyone who likes to do that is allowed to do it. So, feel free to clone the chapter.
Willighagen, E.; Affentranger, R.; Grafström, R.; Hardy, B.; Jeliazkova, N.; Spjuth, O. Interactive Predictive Toxicology with Bioclipse and OpenTox. In Open Source Software in Life Science Research: Practical Solutions to Common Challenges in the Pharmaceutical Industry and Beyond, 1 Ed.; Harland, L.; Forster, M., Eds.; Biohealthcare Publishing Ltd: Oxford, 2012.




