Are you interested in joining the project and developing software with us? That's fantastic!!
Before you get started, you may want to read a little bit about the four main topics which we aim to address here at AC.GT, and what these mean for your code. A lot of code submitted to the site cannot be included due to issues of licensing etc., so please read below to make sure we are a good fit for your plans/goals. If, however, you agree with our mission statement below and your code simply needs commenting or a bit of reworking, that is absolutely something we can help you with! :)
We do not write black box software.
A user should never feel like they are unsure what is actually happening to their data. We make it a requirement that all of
our published code is human-readable and fully commented such that a willing Biologist with only a limited understanding of
programming can download it and read through the code to get a clear picture of how the program works.
Clarity is far more important to us than speed of execution or optimal use of resources. Understanding how a program works
is an all-or-nothing game, whilst speed differences between verbose and terse algorithms are often measured in just a handful of
seconds. In other words, if the speedup from writing advanced code is less than the amount of time it would take to teach a
Biologist how the speedup works, it is not worth it.
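To make that trade-off concrete, here is a small sketch in Python of the style we have in mind. The function name `gc_fraction` and the example itself are purely illustrative and are not taken from any AC.GT tool; the point is simply that each step gets its own line and its own comment.

```python
def gc_fraction(sequence):
    """Return the fraction of bases in `sequence` that are G or C."""
    # Work in upper case so that 'g' and 'G' are treated the same.
    sequence = sequence.upper()

    # Count each base of interest separately so every step is visible.
    g_count = sequence.count("G")
    c_count = sequence.count("C")

    # An empty sequence has no bases, so report 0.0 rather than
    # dividing by zero.
    total_bases = len(sequence)
    if total_bases == 0:
        return 0.0

    return (g_count + c_count) / total_bases


# A terse equivalent does the same job, but a newcomer has to unpick it:
# gc = lambda s: sum(b in "GCgc" for b in s) / len(s) if s else 0.0
```

Both versions give the same answer. The first is a few lines longer, and a Biologist can follow it without being taught what a lambda or a generator expression is.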
I think it's also important to note here that when software is not well understood by its users, there is no guarantee that it will be
used correctly. This makes the software unreliable, and is ultimately a failing on the part of the software developer, not
the user.
In addition to well-described code, we make sure all users can understand our outputs. Practically, this means that all software written for AC.GT must include as much metadata as possible about the analysis in the output. One of the driving forces for the AC.GT project in the early days was that when a BAM file is filtered, there is no record of the filtering process in the output. For reliable science, this is absolutely terrifying! When there are more checks and balances involved in the downloading of a cat photo from Pinterest than there are in the production of data from cancer patient samples, something somewhere has gone amiss...
Thus, every output file produced by an AC.GT tool must have an easily readable ASCII header (even if the data itself is encoded in non-readable binary) that acts as a digital chain-of-custody, detailing where the data has been prior to its current state. The same is absolutely true for all visual outputs, in that they must contain information in the image, at least in shorthand, that explains exactly how the data was processed leading up to the creation of the image, all the way back to the input data.
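As a rough illustration of the chain-of-custody idea for data files, the sketch below writes a plain ASCII header in front of a binary payload and reads it back. Everything about the layout is an assumption made for this example: the `#` line prefix, the `#END_HEADER` marker and the function names are hypothetical and are not an AC.GT standard.

```python
import io

# Hypothetical marker separating the readable header from the data.
HEADER_END = b"#END_HEADER\n"


def write_with_provenance(stream, history, step, payload):
    """Write an ASCII provenance header followed by a binary payload.

    history: list of strings describing earlier processing steps
    step:    string describing the step that produced this output
    payload: bytes holding the (possibly binary) data itself
    """
    for line in history + [step]:
        # Keep one step per line so the header stays easy to read.
        if "\n" in line:
            raise ValueError("a provenance entry must fit on one line")
        stream.write(b"#" + line.encode("ascii") + b"\n")
    stream.write(HEADER_END)
    stream.write(payload)


def read_provenance(stream):
    """Return (history, payload) from a stream written as above."""
    history = []
    # Read header lines until the end marker (or end of file).
    for raw_line in iter(stream.readline, b""):
        if raw_line == HEADER_END:
            break
        history.append(raw_line[1:].rstrip(b"\n").decode("ascii"))
    # Everything after the marker is the untouched data.
    return history, stream.read()


# Example: a filtering step adds itself to the history it inherited.
buffer = io.BytesIO()
write_with_provenance(
    buffer,
    history=["input=sample.bam"],
    step="filter=mapq>=30",
    payload=b"\x00\xff\n#binary",
)
buffer.seek(0)
history, payload = read_provenance(buffer)
```

Because each tool copies the history it was given and appends its own step, the header of any file lists every stage back to the input data, and it can be read by opening the file in a text editor.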
All of our software is free, open, and owned by the community!
While we have a dedicated team of developers working on each project (and you are very much encouraged to join!),
all of the software on this site is licensed under Creative
Commons Zero (CC0), which dedicates it to the public domain to the fullest extent permissible by law. This means you can use, edit,
redistribute and even sell our software, without any requirements for attribution or maintaining the licence.
We do ask politely that if you find our software useful or even instrumental in achieving some goal that you acknowledge the
authors (listed in the source of our code) in either your code or publication - but merely as a courtesy, not as a requirement.
Half of the puzzle of reliable software is writing code cleanly and with good comments. The other half is proactively teaching
the users how the code works. It's absolutely OK to have a very complicated algorithm in your code, so long as you break it down
piece by piece such that a determined Biologist can 'get it'.
So teaching is a big part of the site, as is the dissemination of all the great publicly available information that is already out there on the internet in the form of Biostars posts, SEQAnswers discussions, Reddit's /r/bioinformatics, and blog posts. Ideally we would like to organise a sort of distributed journal club for reviewing some of the more impactful publications of the moment. Right now, however, it means that if you join our chat room and leave a question, we will try our very best to give a solid answer, whatever your level of expertise. The very best questions, and any bits of code derived from them by way of 'solution', will be featured on the site :)
The open source movement isn't just a nice idea. It is directly responsible for the success of some of the most widely
used software ever created. As our world became larger and more interconnected than ever before, programmers had to learn
that it wasn't their code which had value - code can always be copied, modified, redistributed - rather, it was the
programmers and communities themselves which had value - because as authors of highly popular works, they directly determine
the direction and thus future of their area of expertise.
The music industry struggled for a long time to accept they should be in the business of selling music, and not plastic discs, resisting all attempts to digitize their products. And while we all poke fun at the music industry for only recently coming to terms with what it is they actually sell, the irony of course is that those of us in the business of Science still frequently see "the experiment" as the single unit of currency that gives our work value, rather than the process of creating, performing and analysing experimental Biology.
It seems that as the world got larger, labs turned inwards to protect themselves for fear of getting "scooped" or being under-appreciated for their contribution. The net effect, of course, is that science has yet to see the surprising benefits, both globally and individually, that come from focusing on generating value rather than protecting it.
Thus, it is one of AC.GT's core objectives to give bioinformaticians all the tools they need to make their work as public as they can, without hurting their chances of publication. The log.bio database is a totally open and community-driven attempt to detail how bioinformaticians use software and get results - aside from the fact that it's also the most fantastic way to keep your own bioinformatic journal. The SeQC project was initially designed to overcome the legal hurdles involved in getting quality control data from human patient samples, and ended up being a distributed QC network for all samples/organisms.
For the majority of people who visit this site, what you just read will have little consequence for you. You're here to use the
software and whilst you appreciate the lofty ambitions of the site, you really just need to do XYZ as quickly as possible.
Awesome! Great! Everyone here at team AC.GT is 100% behind you and will help you however we can.
A few people however will visit the site and hopefully something about it will really resonate with them. To you people we say, if you want to help, please please please get in the chat room and start talking to us! It doesn't matter if you are a veteran PhD programmer, or a Master's student studying Biochemistry - you are part of the top 1% of the population, and we want to hear from you! :)