wiki - BagSumInManyProgrammingLanguages

Assume you have a list (bag) of text tokens as input. Produce sorted output that has the number of occurrences of each element. Example:

Input:
        foo bar bar foo glag foo 

Output:
        bar  2
        foo  3
        glag 1

You don't have to worry about the output alignment, case sensitivity, or collating sequence of non-alpha characters. The input list can start out in a string, array, or whatever native structure you want. This is to avoid turning it into a parsing contest.

AwkLanguage

awk '{ c[$1]++ } END { for (x in c) print x, c[x]; }'

You didn't say how it was to be sorted. You can put " | sort" on the end if you want it sorted by key, " | sort -n -k 2" if you want it sorted by number (increasing), or " | sort -rn -k 2" if you want it sorted by decreasing number.
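The sort orders discussed above (by key, by increasing count, by decreasing count) can be sketched in Python 3. This is an illustration only, not part of the awk answer; the names `by_key`, `by_count_asc`, and `by_count_desc` are made up here, and ties between equal counts may come out in a different order than sort(1) would give.

```python
from collections import Counter

tokens = "foo bar bar foo glag foo".split()
counts = Counter(tokens)

# Like a plain sort on the output: order by token.
by_key = sorted(counts.items())

# Like a numeric sort on the count column, smallest count first.
by_count_asc = sorted(counts.items(), key=lambda kv: kv[1])

# Like a reversed numeric sort on the count column, largest count first.
by_count_desc = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

for token, n in by_key:
    print(token, n)
```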

CeePlusPlus

#include <map>
#include <iostream>
#include <string>

int main() {
 std::string const s[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
 std::map<std::string, int> m;
 for (int i = 0; i < 6; i++){
        m[s[i]]++;
 }
 for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); it++) {
        std::cout << (*it).first << '\t' << (*it).second << '\n';
 }
}

#include <set>
#include <iostream>
#include <string>
int main() {
 std::string const i[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
 std::multiset<std::string> s (i, i + 6);
 std::set<std::string> keys (s.begin(), s.end());
 for (std::set<std::string>::const_iterator it = keys.begin(); it != keys.end(); it++) {
        std::cout << *it << '\t' << s.count(*it) << '\n';
 }
}

Or how about CeePlusPlus with BoostLibraries:

#include <algorithm>
#include <iostream>
#include <map>
#include <string>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>

int main() {
        using namespace std; using namespace boost; using namespace boost::lambda;
        string i[] = { "foo", "bar", "bar", "foo", "glag", "foo" };
        map<string,int> m;
        for_each(i, i + 6, bind(&map<string,int>::operator[], var(m), _1)++);
        for_each(m.begin(), m.end(),
                cout << bind(&pair<string const,int>::first, _1) << '\t'
                 << bind(&pair<string const,int>::second, _1) << '\n');
}

Or C++11:

#include <set>
#include <iostream>
#include <string>
int main() {
    auto s = std::multiset<std::string>{"foo", "bar", "bar", "foo", "glag", "foo"};
    auto keys = std::set<std::string>(s.begin(), s.end());
    for(auto key: keys)
        std::cout << key << '\t' << s.count(key) << '\n';
}

CommonLisp

;; Loop (woohoo!) solution...
(loop with words = '(foo bar bar foo glag foo)
        for w in (sort (remove-duplicates words) #'string<)
        do (format t "~&~A ~A" w (count w words)))

;; FunctionalProgramming solution...
(let ((words '(foo bar bar foo glag foo)))
        (mapc (lambda (w) (format t "~&~A ~A" w (count w words)))
         (sort (remove-duplicates words) #'string<)))

CeeSharp:

string[] tokens = { "foo", "bar", "bar", "foo", "glag", "foo" };
var query = from s in tokens group s by s;
foreach (var item in query)
        Console.WriteLine("{0} {1}", item.Key, item.Count());

Dodo http://dodo.sourceforge.net

I think using a predefined sort function is just too easy. This version uses only basic functions to get the job done! Written in dodo0 sublanguage (almost like assembler o_O)

# Import a few utility functions
clojure('compare', 2) -> comp
clojure('vector', 0) -> newList
clojure('nth', 2) -> itemAt
clojure('dec', 1) -> dec
clojure('inc', 1) -> inc
clojure('count', 1) -> count
clojure('conj', 2) -> append

# Return the biggest item in a list
fun max -> list, return
(
        fun loop -> i, break
        (
        '='(i, 0) -> zero
        if (zero) ->
        itemAt(list, i) -> item
        break(item);    # nothing to compare with, select item
        |
        dec(i) -> k
        loop(k) -> ref       # get max of other items
        itemAt(list, i) -> item
        comp(item, ref) -> order
        '>'(order, 0) -> bigger
        if (bigger) ->
          break(item)   # current item is bigger
        |
          break(ref)    # max of others is bigger
        )
        | loop

        count(list) -> n
        dec(n) -> n
        loop(n, return)
)
| max

# Return a list of items and their count
fun countList -> list, return
(
        'empty?'(list) -> empty
        if (empty) ->
        return(list)            # empty list empty result
        |
        max(list) -> biggest
        # Count occurrences and list non-occurrences
        fun scan -> i, end, done
        (
        '='(i, end) -> stop
        if (stop) ->
          newList() -> l
          done(0, l);           # finished scanning the list
        |
          inc(i) -> k
          scan(k, end) -> n, rest    # scan rest of list for occurrences
          itemAt(list, i) -> item
          '='(item, biggest) -> match
          if (match) ->
                inc(n) -> n          # occurrence of biggest item: increase count
                done(n, rest);
          |
                append(rest, item) -> nonmatching    # append item to list of non-occurrences
                done(n, nonmatching)
        )
        | scan

        count(list) -> n
        scan(0, n) -> cnt, rest              # scan list for number of occurrences
        countList(rest) -> otherCounts       # get counts for rest of the list
        append(otherCounts, biggest) -> myCounts
        append(myCounts, cnt, return)   # return list with biggest item and its count added
)
| countList

'vector'("foo", "bar", "bar", "foo", "glag", "foo") -> input
countList(input) -> result
println("Count of words in", input, ":", result) ->
exit()

Result:

Count of words in [foo bar bar foo glag foo] : [bar 2 foo 3 glag 1]

EmacsLisp:

;; In the CommonLisp style:
(require 'cl)
(loop with words = '(foo bar bar foo glag foo)
        for w in (sort (remove-duplicates words) #'string<)
        collect (list w (count w words)))

HaskellLanguage

import qualified Data.Map as Map

main = getLine >>= mapM_ printItem . bagSum . words
        where printItem (word, count) = putStrLn $ word ++ " " ++ (show count)

bagSum = Map.toAscList . (foldr updateFunc Map.empty) 
        where updateFunc key map =
                Map.insertWith (+) key 1 map

-- DavidWahler

A little more perusal of the standard libraries turns up a much more elegant (and possibly more efficient) implementation of the bagSum function:

import Data.List (group, sort)

bagSum = map itemFunc . group . sort
        where itemFunc l = (head l, length l)

The first version is a straightforward translation of the Perl/Awk implementations, which use a hashtable with a counter for each word. The second version is isomorphic to the Unix shell version.

JavaLanguage

See BagSumInJava.

Using the GoogleCollections library,

Multiset<String> strings = HashMultiset.create();
strings.addAll(Arrays.asList("foo", "bar", "bar", "foo", "glag", "foo"));

for (Multiset.Entry<String> e : strings.entrySet()) {
        System.out.println(e.getElement() + "\t" + e.getCount());
}

JavaScript

var bag = ('foo bar bar foo glag foo').split(/\s+/).sort();
var bagsum = {};
for (var i=0; i<bag.length; i++) {
        var item = bag[i];
        bagsum[item] = bagsum[item] ? bagsum[item] + 1 : 1;
}
for (var item in bagsum) {
        document.writeln(item + ' ' + bagsum[item] + '<br>');
}

-- ElizabethWiethoff

JayLanguage

There are two approaches to this:

Given a list of the form:

a=. ;: 'foo bar bar foo glag foo'

First, the classic one-liner:

c,.;/+/b=/c=.(1,-.(}:=}.)b)#b=./:~a

Or, more maintainable code:

NB. Sort tokens
b=. /:~a
NB. Unique tokens
c=. (1 , -. (}: = }.) b) # b
NB. Count'em and show'em
c ,. ;/ +/ b =/ c

-- MarcThibault

To get the unique tokens you could use "nub". And with tacit programming:

cnt=.[: +/ =/
sk=. [: /:~ ~.
bagsum=.(] ,. [: <"0 cnt) sk

or as a one-liner,

bagsum=.(],.[:<"0[:+/=/)([:/:~~.)

Run the code with,

bagsum a

where a is given above.

--JuneKim

OcamlLanguage

let str_list = Str.split (Str.regexp "[ \t]+") "foo bar bar  foo glag foo" ;;

let hash_tbl = Hashtbl.create 10 ;;

List.iter (fun x ->
        if Hashtbl.mem hash_tbl x then 
        Hashtbl.replace hash_tbl x ((Hashtbl.find hash_tbl x) + 1)
        else
        Hashtbl.add hash_tbl x 1
        )
        str_list ;;

Hashtbl.iter (Printf.printf "%s : %d\n") hash_tbl ;;

-- ErikDeCastroLopo

PurelyFunctional version

module StrMap = Map.Make(String)
let bag_sum list =
let find elem map =
try StrMap.find elem map
with Not_found -> 0
in let add elem map =
StrMap.add elem (find elem map + 1) map
in
let map = List.fold_left (fun map elem -> add elem map)
          StrMap.empty list in
List.rev
(StrMap.fold
        (fun str count list ->
        ((str, count) :: list))
        map [])

PerlLanguage

#!/usr/bin/perl

use strict ;
use warnings ;

my %bag ;
$bag{$_}++ for qw( foo bar bar foo glag foo ) ;
print "$_ $bag{$_}\n" for sort keys %bag ;

-- ChristofferHammarstrom

PhpLanguage

PHP's array_count_values() is perfect for this task:

<?php
$bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
$bagsum = array_count_values($bag);
print_r($bagsum);
?>

If you wanted to do it manually:

<?php
$bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
sort($bag);
foreach ($bag as $item) {
 $bagsum[$item]++;
}
print_r($bagsum);
?>

PHP's much-maligned associative arrays, which function like hashes but preserve ordering, are plenty useful here for recording the output.

PythonLanguage

# "Classy" solution...
class Bag(dict):
        def __init__(self, alist):
                for elem in alist:
                        self.add(elem)

        def add(self, elem):
                self[elem] = self.get(elem, 0) + 1

        def __str__(self):
                out = ['%-8s %3d' % (key, val)
                        for (key, val) in sorted(self.items())]
                return '\n'.join(out)

print Bag('foo bar bar foo glag foo'.split())

# Pythonic ListComprehension and loop solution...
bag = 'foo bar bar foo glag foo'.split()
bagsum = dict([(elem, bag.count(elem)) for elem in bag])
for key, val in sorted(bagsum.items()):
        print '%-8s %3d' % (key, val)

# Sort of FunctionalProgramming solution...
def count(elem, bagsum={}):
        bagsum[elem] = bagsum.get(elem, 0) + 1
        return bagsum
def sortKeys(adict):
        result = adict.items()
        result.sort()
        return result
def output(pair): print '%-10s %3d' % pair
def bagsum(abag):
        map(output, sortKeys(map(count, abag)[0]))
bagsum('foo bar bar foo glag foo'.split())

The Bag class derived from dict and the (sort of) FunctionalProgramming solution each use a dictionary to do the counting, and should complete the count in O(N) time. The ListComprehension solution, however, is different. It uses a dictionary only to eliminate duplicate elements in the counted list. (Python 2.4 will have a "set" type containing no duplicates.) You don't really want to use the list.count method for each element in the bag list, for that would take O(N**2) time.
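To make the cost difference concrete, here is a minimal Python 3 sketch (not one of the original submissions) that counts the same bag three ways. The helper names `count_linear`, `count_quadratic`, and `count_via_set` are invented for illustration.

```python
def count_linear(bag):
    """One pass over the bag; each dictionary update is O(1) on average."""
    sums = {}
    for elem in bag:
        sums[elem] = sums.get(elem, 0) + 1
    return sums


def count_quadratic(bag):
    """Calls list.count once per element; every call scans the whole list."""
    return {elem: bag.count(elem) for elem in bag}


def count_via_set(bag):
    """Calls list.count once per distinct element: O(N * distinct)."""
    return {elem: bag.count(elem) for elem in set(bag)}


bag = "foo bar bar foo glag foo".split()
for key, val in sorted(count_linear(bag).items()):
    print("%-8s %3d" % (key, val))
```

All three return the same dictionary; only the amount of scanning differs, which matters once the bag gets large.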

-- ElizabethWiethoff

<s>Python 2.5 should have collections.bag, which will allow this section to be cleaned up further...</s>

Using sorted and groupby (from Python 2.6+ and 3.x) to do the heavy lifting (like the Haskell example):

from itertools import groupby
Bag = 'foo bar bar foo glag foo'.split()
for Key, Copies in groupby(sorted(Bag)):
        print Key, len(list(Copies))

A simple sort and tally algorithm, no intermediate set, bag or dictionary required:

words = 'foo bar bar foo glag foo'.split()
words.sort()
prev, count = words[0], 1
for word in words[1:]:
        if word == prev:
                count += 1
        else:
                print prev, count
                prev, count = word, 1
print prev, count

You can use the "defaultdict" class in Python 2.5 and later:

from collections import defaultdict
words = 'foo bar bar foo glag foo'.split()
sums = defaultdict(int) # when key is not found, it binds it to int(), which is 0
for word in words:
        sums[word] += 1
for key, count in sums.iteritems():
        print "%s\t%s" % (key, count)

Using the "Counter" class (which is like the "Bag" above) in Python 3.1 and later:

from collections import Counter
words = 'foo bar bar foo glag foo'.split()
sums = Counter(words)
for key, count in sums.items():
        print("%s\t%s" % (key, count))

ArrLanguage

bag <- c("foo", "bar", "bar", "foo", "glag", "foo")
print(table(bag))

Alternately,

print(tapply(bag, bag, length))

RubyLanguage

sums = Hash.new(0)
%w{ foo bar bar foo glag foo }.each { |w| sums[w] = sums[w] + 1 }
sums.keys.sort.each { |w| puts "#{w}\t#{sums[w]}" }

-- JasonArhart

Using "group_by" from Ruby 1.9:

words = %w{ foo bar bar foo glag foo }
sums = words.group_by {|x| x}.each_pair {|k, g| puts "#{k}\t#{g.length}" }

ScalaLanguage

def bagSum(input: String): Iterable[(String, Int)] = {
        val counts = collection.mutable.Map[String, Int]()
        for (t <- input split ' ') counts(t) = counts.getOrElse(t, 0) + 1
        util.Sorting.stableSort(counts toSeq, ((e: (String, Int)) => e._1))
}

for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
        println(token + "\t" + sum)

Using "groupBy" from Scala 2.8:

def bagSum(input: String): Iterable[(String, Int)] =
        util.Sorting.stableSort(
        (input.split(' ') groupBy identity) map {case (t, g) => t -> g.size} toSeq,
        ((e: (String, Int)) => e._1))

for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
        println(token + "\t" + sum)

-- JasonArhart

SchemeLanguage

(define (bagsum bag)
        (define (bs-adder thing alist)
        (cond ((null? alist)
                (list (cons thing 1)))
                ((eq? thing (caar alist))
                (cons (cons thing (+ 1 (cdar alist))) (cdr alist)))
                (else
                (cons (car alist) (bs-adder thing (cdr alist))))))
        (define (bs-helper bag alist)
        (cond ((null? bag) alist)
                (else (bs-helper (cdr bag) (bs-adder (car bag) alist)))))
        (bs-helper bag '()))

(bagsum '(a b c d a a a b b c)) => ((a . 4) (b . 3) (c . 2) (d . 1))

Here is my attempt at a (non-functional) SchemeLanguage version:

(define (bagsum bag)
        (let ((blist '()))
         (for-each 
                (lambda (x)
                (if (assoc x blist)
                 (set-cdr! (assoc x blist)
                                (cons (+ 1 (cadr (assoc x blist))) '()))
                 (set! blist (append blist (list (list x 1))))))
                bag)
         blist))

(bagsum '(foo bar bar foo glag foo)) -> ((foo 3) (bar 2) (glag 1))

I am sure a functional (and better) version could be written.

-- JonathanArkell

Nothing wrong with the imperative version. It should be quite efficient except for the fact that you are computing (assoc x blist) twice. I would suggest instead something equivalent to

...
(cond ((assoc x blist) => (lambda (pair) (set-cdr! pair ....)))
 (else ....)
...

Also, the specification does not require you to use append - you might as well cons the new piece before the rest.

Here is a functional version (uses SRFI 1):

(define (partitions bag seed)
        (if (null? bag)
        seed
        (let-values 
                (((p rest) 
                (partition (lambda (elem) 
                                (eqv? elem (car bag))) 
                         bag)))
          (partitions rest (cons p seed))))) 

(define (bagsum bag)
        (map (lambda (p) (list (first p) (length p)))
         (partitions bag '())))

(bagsum '(foo bar bar foo glag foo))  ;==> ((glag 1) (bar 2) (foo 3))

SmalltalkLanguage

| sums |
sums := Dictionary new.
('foo bar bar foo glag foo' findTokens: ' ')
        do: [:each | sums at: each put: (sums at: each ifAbsent: 0) + 1 ].
sums keys asSortedCollection
        do: [:each | Transcript show: each; space; show: (sums at: each); cr ].

I'm sure there's an easier way to do this. Maybe some SmugSmalltalkWeenie will come along and show me how.

-- JasonArhart

It has been a few years, but Smalltalk has a Bag class, so the Weenie code would look something like the following (which is untested and probably wrong). If addAll: does not exist, it must be replaced by a loop. I sort and print associations, which are key value pairs. -- StanSilver

| bag |
bag := Bag new addAll: #(foo bar bar foo glag foo)
bag associations asSortedCollection do: [ :each | Transcript show: each; cr ]

In the Squeak version, it's as simple as (#(foo bar bar foo glag foo) as: Bag) orderedCounts *alt-P*

SmlNjLanguage

structure StrMap = BinaryMapFn (struct
                                        type ord_key = string
                                        val compare = String.compare
                                  end)
fun bag_sum list =
let
fun find (elem, map) =
getOpt (StrMap.find (map, elem), 0)
fun add (elem, map) =
StrMap.insert (map, elem, find (elem, map) + 1)
val map = foldl add StrMap.empty list
in
StrMap.listItemsi map
end

SqlLanguage

Minimal:

SELECT item, count(*)
FROM bag
GROUP BY item

Cleaner:

SELECT item, count(*) as Occurs
FROM bag
GROUP BY item
ORDER BY item

ToolCommandLanguage

I didn't originally notice the bit about sorting. Here is the corrected version.

foreach word [list foo bar bar foo glag foo] {
        if {![info exists wc($word)]} {
        set wc($word) 1
        } else {
        incr wc($word)
        }
}

foreach word [lsort [array names wc]] {
        puts "$word $wc($word)"
}

-- KristofferLawson

VisualBasicNine

Dim tokens() = {"foo", "bar", "bar", "foo", "glag", "foo"}
Dim query = From token In tokens Group By token Into Count()

For Each item In query
        Console.WriteLine("{0} {1}", item.token, item.Count)
Next

That's almost like embedded SQL. VB is trying to be an SQL-ified ExBase now, eh?

UnixShell

sed 's/  */\
/g' | sort | uniq -c | sed 's/^ *\([^ ]*\) \([^ ]*\)/\2 \1/'

(tested with bash on Linux)

Whichever WikiGnome "helpfully" removed a newline from this pipeline completely broke it. Don't change code without testing it! Understanding it would be a good idea, too. -- dm

Another:

tr ' ' \\012 | sort | uniq -c

If you want to take into account too many spaces (as the first solution above does), use this:

tr ' ' \\012 | grep -vx '' | sort | uniq -c

I'd rather disregard the output column order, it's insignificant and hard to do as specified above (that the numbers are in the same column). If it is a must, add the following to the pipeline:

sed -e 's#^ *\([0-9]*\) *\(.*\)$#<tr><td>\2</td><td>\1</td></tr>#'  -e '1s#^#<table>#' -e '$s#$#</table>#' | w3m -dump -T text/html

The column ordering requirement is the only reason the first solution was at all complicated, otherwise we were doing pretty much the same thing -- except your tr is WikiGnome-proof. :-S

Your final solution is really throwing caution to the winds, blech. :-)

How about instead:

tr ' ' \\012 | grep -vx '' | sort | uniq -c | awk '{ print $2 " " $1; }'

Windows cmd Script

for %a in (foo bar bar foo glag foo) do @set /a Ans_%a+=1

then

set Ans_

to see the answers

ObjectiveCee

Objective-C's NSCountedSet class was made for this task:

NSArray *strings = [NSArray arrayWithObjects:@"foo", @"bar", @"bar", @"foo", @"glag", @"foo", nil];
NSCountedSet *set = [NSCountedSet setWithArray:strings];

for (id o in set) {
        NSLog(@"%@\t%u", o, [set countForObject:o]);
}

TutorialDee (using the RelProject implementation)

SUMMARIZE 
        RELATION {
        TUPLE {i 1, s 'foo'},
        TUPLE {i 2, s 'bar'},
        TUPLE {i 3, s 'bar'},
        TUPLE {i 4, s 'foo'},
        TUPLE {i 5, s 'glag'},
        TUPLE {i 6, s 'foo'}
        } 
 BY {s} ADD (COUNT() AS n) ORDER (ASC s)

I created a language for expressing mathematical ideas in ASCII -- called PINAPL (PINAPL Is Not A Programming Language) or MATHS. And in that, if I understand my definitions, then for list l:#T,

+bag(l)

should do it.

-- Richard Botting

VisualFoxPro

FUNCTION SortedWords(cWordlist)

* Declarations and initializations are not strictly necessary for the
* code to work but it's good practice.

LOCAL ARRAY aWords[1], aWordCnts[1]

LOCAL ctr, tot, output

ctr = 0
tot = 0
output = ""

CLEAR

* Assume the separator is always a space.

tot = ALINES(aWords,ALLTRIM(cWordlist),12," ")

* Not many words longer than 100 characters...

CREATE CURSOR words (oneword C(100))

FOR ctr = 1 TO tot
  INSERT INTO words (oneword) VALUES (aWords(ctr))
ENDFOR

SELECT DISTINCT PADR(oneword,100," ") FROM words INTO ARRAY aWordCnts ORDER BY 1

FOR ctr = 1 TO ALEN(aWordCnts)
  COUNT FOR oneword == aWordCnts(ctr) TO subtot

  output = RTRIM(aWordCnts(ctr)) + " " + TRANSFORM(subtot)

  ? output
ENDFOR

RETURN
ENDFUNC

* Execution:
DO SortedWords WITH "foo bar bar foo glag foo"

I'm pretty sure there is a more concise way to get such in ExBase. For one, the intro lets us assume the original list is in a table ("native format"), so we don't have to worry about parsing. I'll have to dig out the ol' manual.

Yes, I realize the exercise allows you to start with the data in any format. However, you can't pass a cursor to a function, procedure, or method in VisualFoxPro. I was thinking that if you're actually going to do such a thing, you need an impetus for getting it started. Assuming that we are operating in a single data session, any cursor is always in scope anywhere. I suppose I could adopt the conceit of passing a table name to the function, opening the table, and going from there. In any case, that's why I took the liberty of starting with the data as it was presented in the original string.

If the data's already in the table the easiest way would still be to use SELECT DISTINCT to get a second table of sorted unique values, then SCAN it and get the counts from the first table. You could also use a SELECT expression to get the counts rather than the ExBase COUNT FOR...TO...

In that case though, then the "language" being used is really more SQL and not so much VisualFoxPro.

On the other hand, doing it entirely in an array structure would be painful.

Still, I wanted to show some of the versatility of the hybrid nature of the language.

Maybe I'm thinking of another dialect. Where did I put those old books?
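The table-based variants discussed above (SELECT DISTINCT followed by a per-value count, or a single SELECT that does the counting) can be sketched in Python 3 with the standard sqlite3 module standing in for a FoxPro cursor. This is an analogy under that assumption, not VisualFoxPro code; the table name `words` and column `oneword` are borrowed from the listing, and `scan_counts` and `group_counts` are made-up names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (oneword TEXT)")
conn.executemany(
    "INSERT INTO words (oneword) VALUES (?)",
    [(w,) for w in "foo bar bar foo glag foo".split()],
)

# Approach 1: fetch the sorted distinct values, then count each one
# against the original table (the SELECT DISTINCT + scan idea).
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT oneword FROM words ORDER BY oneword")]
scan_counts = [
    (w, conn.execute(
        "SELECT COUNT(*) FROM words WHERE oneword = ?", (w,)).fetchone()[0])
    for w in distinct
]

# Approach 2: let a single SELECT with GROUP BY do the counting.
group_counts = conn.execute(
    "SELECT oneword, COUNT(*) FROM words GROUP BY oneword ORDER BY oneword"
).fetchall()

for word, n in group_counts:
    print(word, n)
conn.close()
```

The second form is essentially the SqlLanguage answer, which is the point made in the discussion: once the data is in a table, the counting is mostly SQL.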

GNU Core Utils

sed 's/\s\+/\n/g' <input.txt | sort |uniq -c|awk '{print $2 "\t" $1}'

The awk bit is just to put the count after the item and the sed part is just to split a single line into separate lines, so the real work is just the "sort|uniq -c" part.
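For readers who do not think in pipelines, the same stages can be sketched one at a time in Python 3. This is only an illustration of the split / sort / count-runs / swap-columns idea; the variable names `tokens`, `runs`, and `lines` are invented, and the input string is hard-coded instead of being read from input.txt.

```python
import re
from itertools import groupby

text = "foo bar bar  foo glag foo"

# sed stage: split the line on runs of whitespace, one token per item.
tokens = [t for t in re.split(r"\s+", text) if t]

# sort stage: bring equal tokens next to each other.
tokens.sort()

# uniq -c stage: collapse each run of equal tokens into (count, token).
runs = [(len(list(group)), token) for token, group in groupby(tokens)]

# awk stage: put the token first and the count second, tab-separated.
lines = ["%s\t%d" % (token, n) for n, token in runs]
print("\n".join(lines))
```

As in the shell version, the counting only works because the sort comes first: groupby, like uniq, only merges adjacent equal items.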

MathematicaLanguage almost has a function for that built in (missing only the sort). For a list of strings

With[{list={"foo","bar","bar","foo","glag","foo"}}, TableForm@Sort@Tally[list]]

Or, a little lower-level:

tallyFn[list_]:=Map[{#, Count[list, #]} &, Union[list]]
tallyFn[{"foo","bar","bar","foo","glag","foo"}] // TableForm

PhpLanguage

$array = ["foo", "bar", "bar", "foo", "glag", "foo"];
$tally = array_count_values($array);
ksort($tally);

//  print_r($tally); // Just dumps the output. TSV layout follows
foreach($tally as $key => $count)
   echo $key, "\t", $count, "\n";

Again, because it feels like excessive laziness to use a native "bag sum" function,

$array = ["foo", "bar", "bar", "foo", "glag", "foo"];

$tally = array_fill_keys($array, 0);
ksort($tally);

foreach($array as $word)
{
  $tally[$word]++;
}
print_r($tally);

JulyZeroFive

CategoryInManyProgrammingLanguages