<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Freelancing science &#187; Data mining</title>
	<atom:link href="http://freelancingscience.com/category/data-mining/feed/" rel="self" type="application/rss+xml" />
	<link>http://freelancingscience.com</link>
	<description>visualization, protein science, open science and freelancing science</description>
	<lastBuildDate>Thu, 08 Apr 2010 21:36:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='freelancingscience.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://www.gravatar.com/blavatar/ca6331e4ebe8b5e624ddfd24badb4473?s=96&#038;d=http://s2.wp.com/i/buttonw-com.png</url>
		<title>Freelancing science &#187; Data mining</title>
		<link>http://freelancingscience.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://freelancingscience.com/osd.xml" title="Freelancing science" />
	<atom:link rel='hub' href='http://freelancingscience.com/?pushpress=hub'/>
		<item>
		<title>Database query and ranked results</title>
		<link>http://freelancingscience.com/2009/01/22/database-query-and-fuzzy-answer/</link>
		<comments>http://freelancingscience.com/2009/01/22/database-query-and-fuzzy-answer/#comments</comments>
		<pubDate>Thu, 22 Jan 2009 19:21:27 +0000</pubDate>
		<dc:creator>Pawel Szczesny</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[Database]]></category>
		<category><![CDATA[PubMed]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Web search engine]]></category>

		<guid isPermaLink="false">http://freesci.wordpress.com/?p=67</guid>
		<description><![CDATA[Some time ago I read a piece by Marcelo Calbucci, Is it a database or a search engine? While it deals with searching for information within a real estate database, I think his comments apply to many areas of the life sciences. In short, Marcelo points out that people looking for a house miss [...]]]></description>
			<content:encoded><![CDATA[<div class="zemanta-img" style="float:right;display:block;margin:1em;">
<div>
<dl class="wp-caption">
<dt class="wp-caption-dt"><a href="http://en.wikipedia.org/wiki/Image:Autophagy500.jpg"><img title="The Autophagy network extracted from the recen..." src="http://upload.wikimedia.org/wikipedia/en/thumb/f/f1/Autophagy500.jpg/202px-Autophagy500.jpg" alt="The Autophagy network extracted from the recen..." width="202" height="138" /></a></dt>
<dd class="wp-caption-dd zemanta-img-attribution">Image via <a href="http://en.wikipedia.org/wiki/Image:Autophagy500.jpg">Wikipedia</a></dd>
</dl>
</div>
</div>
<p>Some time ago I read a piece by Marcelo Calbucci, <a title="Is it a database or a search engine" href="http://marcelo.sampasite.com/marcelo-calbucci/brave-tech-world/redfin-dilemma-is-it-a-database.htm">Is it a database or a search engine?</a> While it deals with searching for information within a real estate database, I think his comments apply to many areas of the life sciences.</p>
<p>In short, Marcelo points out that people looking for a house miss many interesting listings because the query is inflexible: the number of bedrooms, the price and the distance from some point are all fixed. Users, however, are flexible, so in such cases they need a search engine that gives them close-enough answers or lets them assign a weight to each filter.</p>
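<p>To make the contrast concrete, here is a minimal Python sketch (not Marcelo&#8217;s code, nor any real-estate site&#8217;s) of a hard-filter query next to a weighted &#8220;closeness&#8221; ranking. The listings, target values, tolerances and the linear closeness function are all hypothetical choices made for illustration.</p>

```python
# Minimal sketch: hard filters vs. weighted "closeness" ranking.
# Listings, targets, tolerances and weights are all hypothetical.

LISTINGS = [
    {"id": "A", "bedrooms": 3, "price": 410_000, "distance_km": 2.0},
    {"id": "B", "bedrooms": 2, "price": 350_000, "distance_km": 1.0},
    {"id": "C", "bedrooms": 3, "price": 390_000, "distance_km": 3.5},
    {"id": "D", "bedrooms": 5, "price": 900_000, "distance_km": 20.0},
]
TARGET = {"bedrooms": 3, "price": 400_000, "distance_km": 3.0}
# Deviation at which a criterion stops contributing to the score.
TOLERANCE = {"bedrooms": 2, "price": 200_000, "distance_km": 10.0}


def hard_filter(listings):
    """All-or-nothing query: every constraint must be satisfied."""
    return [
        x["id"] for x in listings
        if x["bedrooms"] == TARGET["bedrooms"]
        and TARGET["price"] >= x["price"]
        and TARGET["distance_km"] >= x["distance_km"]
    ]


def closeness(value, target, tolerance):
    """1.0 for an exact match, falling linearly to 0.0 at the tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


def ranked(listings, weights):
    """Score every listing by weighted closeness and sort best-first."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be non-zero")
    scored = []
    for x in listings:
        score = sum(
            w * closeness(x[key], TARGET[key], TOLERANCE[key])
            for key, w in weights.items()
        ) / total
        scored.append((round(score, 3), x["id"]))
    return sorted(scored, reverse=True)
```

<p>With equal weights, <code>hard_filter</code> returns an empty list, while <code>ranked</code> puts listings C and A, each only slightly outside the constraints, at the top. Changing the weights dictionary plays the role of the per-filter weights described above.</p>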
<p>In the life sciences we search for similarities and analogies all the time. Sometimes it&#8217;s a direct comparison of sequences; on other occasions it is a high-level meta-comparison between two systems. And while we have various (statistical) similarity metrics, and they sometimes become part of a database&#8217;s design, the interfaces of biological databases don&#8217;t let us rank query results according to these metrics. For example, I can easily find all human proteins related to disease X, disease Y or disease Z, but I cannot specify that proteins related to both Y AND Z should come first on the list. Another example is searching PubMed: I can look for articles related to &#8220;synthetic biology&#8221;, but I have no way to specify that I want papers by <a class="zem_slink" title="James Collins (Boston University)" rel="wikipedia" href="http://en.wikipedia.org/wiki/James_Collins_%28Boston_University%29">James Collins</a> from <a class="zem_slink" title="Howard Hughes Medical Institute" rel="homepage" href="http://www.hhmi.org/">HHMI</a>, AND articles related to those papers, to be first on the list. I guess it is possible to obtain such results without going through the whole list, but I doubt the method would be very simple. Filtering still seems to be a neglected aspect of database design in the life sciences.</p>
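<p>The protein example can be sketched the same way: instead of returning an unordered OR result, score each hit by the summed weight of the diseases it is linked to. This is a minimal Python sketch; the protein names, annotations and weights are hypothetical placeholders, not data from any real database or its API.</p>

```python
# Minimal sketch: rank OR-query hits by the weight of the terms they match.
# Protein IDs and disease annotations are hypothetical placeholders.

ANNOTATIONS = {
    "P1": {"X"},
    "P2": {"Y", "Z"},
    "P3": {"Z"},
    "P4": {"X", "Y", "Z"},
    "P5": {"W"},
}


def or_query(terms):
    """What a typical interface returns: any protein matching any term, unordered."""
    return {p for p, diseases in ANNOTATIONS.items() if not diseases.isdisjoint(terms)}


def ranked_query(weights):
    """Return hits sorted by the summed weight of matched terms.

    Proteins matching none of the weighted terms are dropped; ties are
    broken by protein ID so the output is deterministic.
    """
    hits = []
    for protein, diseases in ANNOTATIONS.items():
        score = sum(w for term, w in weights.items() if term in diseases)
        if score > 0:
            hits.append((protein, score))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

<p>With Y and Z weighted above X, the proteins linked to both Y and Z (P4 and P2) come first, followed by the single-disease hits P3 and P1.</p>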
<p>My dream biological search engine would have a series of sliders (or, ideally, a device with a series of mechanical knobs attached to the computer) that would let me dynamically change the weights of various aspects of the query and immediately see how that affects the results. It would resemble the interactivity of <a title="Gapminder" href="http://www.gapminder.org/">Gapminder World</a>, but on dynamically generated data. The technology and proof of concept seem to be there, but I guess we will have to wait quite a few years before this approach is adopted within the life sciences.</p>
]]></content:encoded>
			<wfw:commentRss>http://freelancingscience.com/2009/01/22/database-query-and-fuzzy-answer/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/68883fb1792e3694835f60059aa0912e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">freesci</media:title>
		</media:content>

		<media:content url="http://upload.wikimedia.org/wikipedia/en/thumb/f/f1/Autophagy500.jpg/202px-Autophagy500.jpg" medium="image">
			<media:title type="html">The Autophagy network extracted from the recen...</media:title>
		</media:content>

	</item>
		<item>
		<title>Mining PubMed &#8211; more tools available</title>
		<link>http://freelancingscience.com/2008/03/05/mining-pubmed-another-tools-available/</link>
		<comments>http://freelancingscience.com/2008/03/05/mining-pubmed-another-tools-available/#comments</comments>
		<pubDate>Wed, 05 Mar 2008 15:37:20 +0000</pubDate>
		<dc:creator>Pawel Szczesny</dc:creator>
				<category><![CDATA[Data mining]]></category>
		<category><![CDATA[PubMed]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[literature search]]></category>

		<guid isPermaLink="false">http://freesci.wordpress.com/?p=98</guid>
		<description><![CDATA[There are two new tools available that semantically mine PubMed abstracts: e-LiSe and Anne O&#8217;Tate. The first was made by my colleagues at the Institute of Biochemistry and Biophysics in Warsaw, while the second comes from the University of Illinois at Chicago. Female-sounding names are not the only thing they have in common: both provide analogous [...]]]></description>
			<content:encoded><![CDATA[<p>There are two new tools available that semantically mine PubMed abstracts: <a href="http://miron.ibb.waw.pl/elise/" title="e-LiSe">e-LiSe</a> and <a href="http://128.248.65.210/cgi-bin/arrowsmith_uic/AnneOTate.cgi" title="Anne O'Tate">Anne O&#8217;Tate</a>. The first was made by my colleagues at the Institute of Biochemistry and Biophysics in Warsaw, while the second comes from the University of Illinois at Chicago. Female-sounding names are not the only thing they have in common: both provide analogous functionality, such as keywords or author names associated with the user&#8217;s query.</p>
<p>There are quite a lot of third-party interfaces to PubMed (see <a href="http://davidrothman.net/category/technology/3rd-party-pubmedmedline-tools/" title="Third party pubmed tools">David Rothman&#8217;s excellent list</a>), so I couldn&#8217;t resist running a few queries on both servers and comparing them with <a href="http://gopubmed.org/" title="GoPubMed">GoPubMed</a>, which currently wins hands down in terms of features and interface. I wasn&#8217;t surprised to see that the results overlap significantly, although not completely. Each server found valuable keywords that the other two did not, which is understandable, since they use different algorithms. I wonder whether we will see a meta-server of PubMed data miners, like those that exist for protein structure prediction (for example <a href="http://meta.bioinfo.pl/" title="MetaServer">meta.bioinfo.pl</a>). In theory, an exhaustive search for meaningful keywords by different methods, followed by their classification and analysis, should work better than any single method, but this is just a guess.</p>
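<p>One simple way such a meta-server could merge keyword lists is rank aggregation. The Python sketch below uses a Borda count; this is an assumption for illustration, not something e-LiSe, Anne O&#8217;Tate or GoPubMed is known to do, and the keyword lists are invented placeholders rather than real output from those servers.</p>

```python
# Minimal sketch of a meta-server consensus step using a Borda count.
# The keyword lists are made-up placeholders, NOT real output from
# e-LiSe, Anne O'Tate or GoPubMed.
from collections import defaultdict

RESULTS = {
    "miner_a": ["autophagy", "apoptosis", "mTOR", "lysosome"],
    "miner_b": ["autophagy", "mTOR", "beclin", "apoptosis"],
    "miner_c": ["lysosome", "autophagy", "beclin"],
}


def borda(ranked_lists):
    """Merge ranked keyword lists: position i in a list of length n earns n - i points.

    A keyword missing from a list gets no points from it, so terms found by
    several methods, and ranked high by them, rise to the top.
    """
    points = defaultdict(int)
    for keywords in ranked_lists:
        n = len(keywords)
        for i, kw in enumerate(keywords):
            points[kw] += n - i
    # Sort by total points (descending), then alphabetically for stable ties.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))
```

<p>Keywords reported by several methods accumulate points, so &#8220;autophagy&#8221;, found by all three placeholder lists, ends up first. A real meta-server would presumably also need to reconcile synonyms and each method&#8217;s own scores.</p>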
]]></content:encoded>
			<wfw:commentRss>http://freelancingscience.com/2008/03/05/mining-pubmed-another-tools-available/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/68883fb1792e3694835f60059aa0912e?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">freesci</media:title>
		</media:content>
	</item>
	</channel>
</rss>