Prefix sum: GenomicRangeQuery

Yugo Amaryl

In its "Prefix sum" section, Codility proposes the following problem. I don't see how to solve it within the constraints, that are O(N+M), and O(N) for space complexity. I know I should use the prefix sum technique, but I would need more hints. I'd like to not be given the solutions, but hints that would help me guide my thought process. -- A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence? The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and Q[K] (inclusive). For example, consider string S = CAGCCTA and arrays P, Q such that: P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6 The answers to these M = 3 queries are as follows: The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2. The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4. The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1. Write a function: def solution(S, P, Q) that, given a non-empty zero-indexed string S consisting of N characters and two non-empty zero-indexed arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries. The sequence should be returned as: a Results structure (in C), or a vector of integers (in C++), or a Results record (in Pascal), or an array of integers (in any other programming language). For example, given the string S = CAGCCTA and arrays P, Q such that: P[0] = 2 Q[0] = 4 P[1] = 5 Q[1] = 5 P[2] = 0 Q[2] = 6 the function should return the values [2, 4, 1], as expla

Lost User

S is a string; or an array of characters; characters which "convert" to impact numbers. P and Q are (3) "substring ranges" into S. The subset / substring of S will contain one or more "MIN" impact numbers.

"(I) am amazed to see myself here rather than there ... now rather than then". ― Blaise Pascal

honey the codewitch

Wow. There are so many things I love about this question. I'm going to save it. And I'm going to try if not to solve it, then to demonstrate why I can't solve it, using code that consumes some finite computation space or time. I'm mucking about with a self-modifying linear bounded automata interpreter/compiler i've built and i am crap at most math, but just good enough at the right kinds of math to know how to build this problem and it express it through that. And you've asked what may be exactly the question I need. How many times can someone say that? Thank you =)