Words a bioinformatician never wants to hear

I used to run a blog at biocodershub.net, but since I moved all my efforts towards developing coderscrowd.com, I was forced to shut it down. That said, I will keep re-releasing some of its successful, highly debated posts, because they reflect the reality of a data scientist's day-to-day work. This is one of those famous posts, originally shared by Paul Michael Agapow (@agapow).

Something cynical for a rainy Friday at the end of a long week :^)

“The data is all in these [proprietary and undocumented format] files.”

“What I want is a program to browse, edit and validate gigabyte-size whole genome sequencing runs. It should import and export all known formats. And it has to run in a browser. And some of the staff refuse to use anything but IE6.”

(After delivering an insignificant or negative result) “Can’t you analyse it again?”

“Why don’t we put the new server rack in your office?”

“That software you wrote is buggy! [What happened?] It’s not working! [How do you know that?] It’s broken! [In what way?] Can’t you just fix it? [How? I don’t know what’s wrong ...]”

“I don’t understand: [large research institute / multinational commercial company] has software that can do this. Can’t you write something similar?”

“This is a great / exciting opportunity …”

“This program is great. But could you rewrite it in another programming language?”

“That database and web service you wrote for X? We need one that works just like that. Except for … [lists dozens of ways in which the new service actually differs entirely from the previous one]”

“You want to know what feature or task is the most important? They all are!”

(After being told that the data sample is too small, or incorrectly sampled such that analysis is impossible.) “You don’t understand – we really need this result.”

“Here’s the data. I haven’t had time to clean it up, so it might be incomplete. And some of the identifiers might not agree. And there are mis-spellings …”

(After delivering the outcome of an analysis) “Pth – that result is obvious.”

“Don’t worry about who’s going to [maintain the new database / monitor the new service / curate the data / come in on the weekends to restart the system]. We’ll work that out later …”

“So X wrote us this pipeline before he left. I’m not sure if he finished it. No, there’s no documentation. Can you get it working? By next week?”

“I think I read a way to do this: it was in a journal, maybe. Or on a webpage. Done by some lab in France. Or was it China? Anyway, it should be simple.”

“Do you really need that much disk space for this NGS data?”

“So your program crashed when I tried to load data. What format? Does that matter? They were Word documents. Really, the program doesn’t read those?”

“So, what you’re saying is that a Word document isn’t a text file. But I used Courier as a font.”

“We need this program. It’s really simple … [30 minutes of essential features follow]”

(While waiting for the result of a Bayesian calculation) “Why does it take so long to get this answer? Can’t you just make it go faster?”

“I know you said that 30 data points were the minimum for statistical rigour. But we only got 5. Can’t you analyze it anyway?”

“We keep all those records in Excel files … uh, I think this is the most current version …”

“The Z lab showed you could do this [with 10 genes and a computing cluster]. So do you think you could do this with our data [200 whole genomes, on a PC]?”

“Good news – we got a huge grant for sequencing and annotating 6 squillion whole genomes. You’re not on the grant and we didn’t budget for any bioinformatic work but here’s the data. Can you have this done by next week?”

(After being told that an analysis is impossible or ill-considered) “But X over in Y’s lab does it all the time.”

“Uh, so what is it that you’re doing again?”

Send a PR on GitHub or comment here, and your contribution will be added to the post. Thanks!



