Every month a challenge is published – a specific problem that requires a solution. Instructions are revealed and a deadline is set; solutions are judged against the challenge criteria. The winning team is then awarded points, which contribute towards an overall score and, ultimately, a fantastic award. Each team must be prepared (if required) to explain how the solution was reached!
The technical challenge seeks not only to motivate the teams involved to win, but also to develop healthy habits such as teamwork and effective communication, whilst having fun along the way! Although participation is optional, a very tempting award sits at the end of the 10 months – a day at the F1 races or Premier League tickets! A prize too good to miss out on…
[img]../../../images/img/CHALLENGES_AHEAD.jpg[/img]
On 7th February the first technical challenge was set:
In 1953, James Watson and Francis Crick discovered the structure of Deoxyribonucleic Acid (DNA). The suggested model, a double helix, has now become the accepted structure for what is considered to be the cornerstone of life on earth. DNA, a double-stranded helix found in every chromosome, is a vast ordered collection of simple units called nucleobases. Nucleobases come in four types: Adenine (A), Cytosine (C), Thymine (T) and Guanine (G). The order in which they lie encodes protein structures, a building block of life. Sequencing DNA has become very popular and is now automated by machines, but because of the enormous number of DNA sequences being created every day, it has become a challenge to make order of all this information.
Some very specific sequences of DNA have been identified as important in various areas of research, ranging from growing grain in very tough and arid areas to understanding the make-up of genetic diseases. With a large repository of data to look through, reducing the time it takes to find out whether a specific section of DNA is contained within a data set has become very important. A large DNA dataset has been presented to the team, consisting of only four letters (A, C, T and G) in ASCII format. The team is asked to come up with a program that will take this dataset and search for a specific sequence of DNA in the shortest time possible.
- The program can make use of Open Source libraries
- Preparation of the installation of the libraries is allowed
- The program must log to a file the start and finish time
- The program must identify where in the dataset the sequence is found, e.g. it starts at the 120445th nucleobase.
- The program must identify if the sequence is found more than once
- No pre-processing of the data is allowed.
The program will be run four times: the first time as a test that it works, and the remaining three times to produce a mean time that will be used as the final submission for that entry.
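To make the scoring rule concrete, here is a minimal sketch of how four runs could be timed and the mean of the last three taken. The names `timed_runs` and `scored_mean` are made up for illustration and are not part of the challenge; the judges' actual timing method is not described.

```python
import time
from statistics import mean
from typing import Callable, List


def timed_runs(task: Callable[[], object], runs: int = 4) -> List[float]:
    """Run task() the given number of times and return each duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return durations


def scored_mean(durations: List[float]) -> float:
    """Discard the first (test) run and average the remaining runs."""
    return mean(durations[1:])
```

For example, durations of 10, 1, 2 and 3 seconds would give a submitted time of 2 seconds, because the first run is ignored.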
The dataset will be held in a file format and will be generated on test day. It will contain a sequence of 3 billion (3,000,000,000) bases. A smaller example file is included for you to understand its format.
You will be asked to look for a sequence that is no more than 16 bases long.
The output from your program should be in the same format as the Output file given in this challenge.
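As an illustration only, the brief above could be approached with a simple chunked scan in Python. This is a sketch under assumptions, not any team's entry: it assumes the file is one uninterrupted run of A, C, G and T with no line breaks or header, and the names `find_all` and `run_search`, the log wording and the 1 MiB chunk size are all invented here. It does not reproduce the required output format, which is defined by the Output file supplied with the challenge.

```python
import io
from datetime import datetime
from typing import BinaryIO, Iterator, List


def find_all(stream: BinaryIO, pattern: bytes, chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the 1-based start position of every occurrence of pattern."""
    if not pattern:
        return
    overlap = len(pattern) - 1  # bytes carried over so boundary matches are seen
    tail = b""
    consumed = 0  # bytes read from the stream before the current chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        base = consumed - len(tail)  # 0-based stream offset of buf[0]
        idx = buf.find(pattern)
        while idx != -1:
            yield base + idx + 1
            idx = buf.find(pattern, idx + 1)  # step by one to catch overlaps
        consumed += len(chunk)
        # The tail is shorter than the pattern, so no match is reported twice.
        tail = buf[-overlap:] if overlap else b""


def run_search(data_path: str, sequence: str, log_path: str) -> List[int]:
    """Search a file for sequence, logging start and finish times to log_path."""
    with open(log_path, "a") as log:
        log.write(f"start  {datetime.now().isoformat()}\n")
        with open(data_path, "rb") as data:
            positions = list(find_all(data, sequence.encode("ascii")))
        log.write(f"finish {datetime.now().isoformat()}\n")
    return positions


if __name__ == "__main__":
    demo = io.BytesIO(b"ACGTACGTAC")
    print(list(find_all(demo, b"ACG")))  # [1, 5]
```

Reading in chunks keeps memory use flat for a 3-billion-base file, and because nothing is indexed or converted beforehand it should stay within the "no pre-processing" rule as worded. A competitive entry would most likely need something faster, such as memory mapping or a compiled language.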
And so… eight teams launched themselves into the mission to be the winners!
With the deadline set for 3rd March, did…
- the HPC run team lose their grid?
- Guillaume, Tom, Varun and Ryan struggle extracting?
- Australia get trapped down under?
- Commodities give up trading?
- Excelian excel at the task?
- GDC get higher than the foot of Table Mountain?
- the two Chrises bank on winning?
- the Reporting team forget to report?
…or Ivan, Jonathan, Cyrille, Alex and Alberto lose sight of the prize?
Find out in the next instalment!