wiiki - CodingProblem

Amusing coding problem. -- RonJeffries

You have an ordered collection, size N, of objects.

Given K, answer an ordered collection of K ordered collections of size approximately N / K, such that the first contains the first N / K objects, the second the second N / K, etc.

Example

input = #(1 2 3 4 5 6 7 8 9 10)
k = 3
output = (for example) #(
        #(1 2 3)
        #(4 5 6)
        #(7 8 9 10)

Note ... for some reason, dealing the objects won't do, we want the first N/K in the first collection, etc.

Solutions to date:

For those who are interested in the wide diversity of methods towards solving this problem, please see:

What is so hard about this? It can be done in C++ with an STL iterator and a bit of code. Procedurally. Has to be some trick, eh? To make the fact that they are ordered collections significant? -- MichaelFeathers

No, there's no trick ... but we had to do it today and found it hard to get a clean solution. RalphJohnson sent me a cute solution using a stream. I wrote a clean one using dealing, but haven't found something clear and clean that does it as described here. Use arrays, vectors, I don't care ... just make the solution really clear and clean. -- RonJeffries

Simplest thing I can think of is this:

Make vector Distrib with K elements. Elements of Distrib are either N/K or N/K + 1 st. sum(Distrib) == N. An expression using modulus can be used to calculate the actual distribution. The elements of vector Distrib now contain the number of elements that will be in each resultant collection. The input vector can be iterated so that each element is placed in the current collection. Use the elements of Distrib in order to decide when to make the next collection the current collection.

Are there clearer strategies, Ron? Actually, is this the right problem? If so, some STL algorithms can make this even clearer.

-- MichaelFeathers

I like the notion of making the Distrib vector: Kent's trick with the error value (which I have just noted not understood) fits in to that nicely. My real objective isn't met yet, which is in all the solutions the central loop isn't obvious. It's funnier than it looks, tho, isn't it?

See CodingProblemSmalltalkSolution

I see a way to make it dead simple, but it wastes a little space. No time to code it right now, but here is the idea:

// K
K = 3
// The input vector
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  
// make a distribution vector: size = K
[ 3, 3, 4 ]
// make an offset vector from the distribution vector (size = K)
[ 0, 3, 6 ]
// from the offset vector, make a map vector (size = N)
[ 0, 0, 0, 1, 1, 1, 2, 2, 2, 2 ]

Values in the map vector are the number of the collection that corresponding elements in the input vector belong in.

map        = [ 0, 0, 0, 1, 1, 1, 2, 2, 2, 2 ]
input = [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]

Each step is a simple loop. For all I know, there could be an operator in some language like APL that solves the whole problem :-)

BTW, I really like Kent's rounding approach. I hadn't thought of that.

-- MichaelFeathers

I don't see the "simple loop" for the distribution vector. Think then we can make the map vector from the distribution one directly ... -- R

What about something like:

for (int i = 0; i < K; i++) Distrib [i] = N/K + (i < N % K ? 1 : 0);

The expression can also be written so that the N/K elements come before the N/K + 1 elements. True enough, a lot can be folded. I was thinking about how to make each step as simple as possible. -- mf

OK, I decoded that C this way:

k := 7.
n := 20.
baseSize := n // k.
excess := n \\ k.
distrib := Array new: k.
(0 to: k -1) do: 
        [ :each |
        extraItem := each < excess
                ifTrue: [1]
                ifFalse: [0].
        distrib
                at: each + 1
                put: baseSize + extraItem]

which actually gave me a clue what was going on ... then back (in Smalltalk) to:

(0 to: k -1) collect: [ :each | n // k + (each < (n \\ k) ifTrue: [1] ifFalse: [0])]

Thanks! Don't think we have a clear clarity winner yet ;-> -- R

See CodingProblemFairShare

Now, which is simpler and easier to understand?

The mention of dealing suggests to me another approach to add to the stew. I will just sketch the behavior - my Smalltalk is pretty weak.

#(0 1 2 3 4 5 6 7 8 9 10) deal: 3 =>
        #((0 3 6 9) (1 4 7 10) (2 5 8))

(#(0 1 2 3 4 5 6 7 8 9 10) deal: 3) transpose =>
        #((0 1 2) (3 4 5) (6 7 8) (10))

That's pretty close to a solution, although the last array is lacking. Maybe I should deal 4 (that is, n/k rounded up) instead:

#(0 1 2 3 4 5 6 7 8 9 10) deal: 4 =>
        #((0 4 8) (1 5 9) (2 6 10) (3 7))
(#(0 1 2 3 4 5 6 7 8 9 10) deal: 4) transpose =>
        #((0 1 2 3) (4 5 6 7) (8 9 10))

Hmm, that seems to put all the "longer" arrays at the front. Is this the same as the RemaindersAtTheFrontSolution to CodingProblemSmalltalkSolution?

-- BillTrost

Now that one is COOL! -- Ron

Rule: Never post code without compiling it. (broken here)

Design params.

N,&K are good names because the question said so.
use C typ pointers to ignore memory allocation questions.
Clarify cute mathematical concepts with comments
provide one or perhaps two good concrete examples in source code comment prior to implementation

/////
// Comment giving several examples that are tested by the unit test     


void SplitIntoBlocks( int *start, int N, int K, int **KBlocks)
{
        int **InsertionPoint = KBlocks; 
        for(int i=1;i<=N;i++)
        {                                 // for every item in collection

        if ( (i%k) == 0  &&       // if Item is at start of a new k  block
          i+2*k < N  )       // and there are > two         k  blocks left
        {
        KBlock++;                 // advance to next block
        InsertionPoint = KBlocks;
        },

InsertionPoint++ = *start++; // Copy one item

},

PS I have NOT tested this. There may well be fence post issues that I have not quite dealt with yet. I have posted this because it seems so much like the SimplestThingThatCouldPossiblyWork. What's the problem? Some specification descriptions are intrinsically complex so is the code. Trying to break it into lots of loops makes it to me more complex as I have to work out and remember the state that one loop computes, and then work out what the next loop does with it.

This code addresses the problem directly.

It solves the problem where do blocks start
Copies all the data.

Another way to do this that is simple is.

Compute number of Blocks Call this number NB = floor(N/K)
if (NB==0) NB=1; // :) that was a gotcha.
The first NB-1 blocks have exactly K members in each
write Loop to do this.
The last Block has ALL the rest write a loop to do this.
{ALL is a number between K and 2K-1 inclusive.},
{ But we do NOT compute it, we put all the rest of the data in the block. },

Note the choice between these two will be affected by many factors including that the second will be faster because the loops have less code inside them. The difference is almost exactly like the difference between Bresnham and Run length line drawing.

Debating about the details of simplest beyond this point is IMHO a religious war akin to {}, placement.

Now for the HARD question, what is a sufficient number of units tests to test this in the XP methodology. (This ones for you Ron :)

I have been having trouble coming to grips with what it is exactly that the self-proclaimed XP enthusiasts are saying. At times, I get the feeling that they have been infected with the latest silver bullet fever. At times, I think I missed the boat. How about we get out of the abstract and say something concrete?

What is the XP way? Show us an example, not a toy one chosen to make XP glow. Try this one. Then and this to me is the acid test what are the required unit tests for this code?

Someone got a bee in their bonnet - see CodingProblemSolutionByProof.

A (beginners) J solution is:

group =: 4 : 0
  s=. >. x.%~#y.
  isInt=. 0&= @ |
  (s isInt y.) <;.1 y.
  )  NB. A good J'er would probably dash off a one-liner tacit definition.

  Here is an example run:

c=:i.15
c
  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
n=:4

  n group c
  +-------+-------+---------+--------+
  |0 1 2 3|4 5 6 7|8 9 10 11|12 13 14|
  +-------+-------+---------+--------+

It seems that an obvious unit test, that n groups are returned, would not always pass. For example:

group i.12

+-----+-----+-----+-------+
|0 1 2|3 4 5|6 7 8|9 10 11|
+-----+-----+-----+-------+

group i.12

+-----+-----+-----+-------+
|0 1 2|3 4 5|6 7 8|9 10 11|
+-----+-----+-----+-------+

PythonLanguage:

def split(list, num) :
        for index in range(num) :
        yield list[index * len(list) / num:(index + 1) * len(list) / num]

for subset in split([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7) :
        print subset

Output:

[1]
[2]
[3, 4]
[5]
[6, 7]
[8]
[9, 10]

Alternate solution (same result, doesn't require Python 2.2's generators):

def split(list, num) :
        return map(lambda x: list[x * len(list) / num:(x+1) * len(list) / num], range(num))

for subset in split([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 7) :
        print subset

The question is, why is it obvious from the code that this returns the right thing? If you have to explain it, or run through an example showing how it works, then I think Ron would say it's not clear enough. I don't just want to know how it works, but also why it works.

--- Same as above with list comprehension instead of lambda: def split(list, num) :

[list[x*len(list)/num:(x+1)*len(list)/num] for x in range(num)]

The above solution is really great!

I've implemented my clunky solution in RubyLanguage. But check out the check_segments method of the unit test. It should accept all the possible ways to segment the array and reject all the wrong ones! I've included two different implementations which return 2 valid but different segments for an array. The first one is a copy of the python solution above. The second one is my original clumsy implementation.

-- AurelianoCalvo

require 'test/unit'
include Test::Unit

class TestSegmenter < TestCase

 def check_segments( array, number_of_segments )
        segments = array.segment number_of_segments 

        # segments form the original array
        assert_equal array, segments.flatten

        # correct number of segments
        assert_equal number_of_segments, segments.size

        # array elements are evenly distributed across segments
        segment_sizes = segments.collect {|s| s.size}, 
        assert segment_sizes.max - segment_sizes.min <= 1
 end

 def test_segment
        check_segments( [1,2,3,4,5,6], 2 )
        check_segments( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3 )
        check_segments( [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 3 )
 end
end

First solution. Clean, nice, elegant, plagiarized.

class Array

def segment(segs)
        (0..segs).collect do |x|
        self[(x * size) / segs)...((x+1) * size / segs)]
        end
end

end Second solution. Dirty, awful, grotesque, original.

class Array
 def segment( number_of_segments )
        ideal_segment_size = self.size / number_of_segments
        number_of_spare_values = self.size % number_of_segments
        segment = []
        segments = []
        self.each_with_index do
        |value,index|
        segment << value
        if segment.size == ideal_segment_size + (number_of_spare_values > 0 ? 1 : 0)  
        number_of_spare_values = [0, number_of_spare_values - 1].max
        segments << segment
        segment = []
        end
        end
        return segments
 end
end

As I remember from university, we could use a bit of basic number theory to make the dealing solution work:

Assume initially that N and K are coprime. Let src be the offset into the source array, and dst be the destination, both initially zero.

Then as each element is dealt, we increment src by K and dst by 1, but then we only use src mod N and dst mod K. We stop dealing when src gets back to zero.

Each dst should now contain the appropriate chunk of src

If N and K are not coprime calculate their hcf (famous algorithm) and copy hcf(N,K) each time instead of one? or do the above hcf(N,K) times, offseting the starting src by one each time.

BillWeston