Assume you have a list (bag) of text tokens as input. Produce sorted output that has the number of occurrences of each element. Example:
Input:
foo bar bar foo glag foo
Output:
bar 2
foo 3
glag 1
You don't have to worry about the output alignment, case sensitivity, or collating sequence of non-alpha characters. The input list can start out in a string, array, or whatever native structure you want. This is to avoiding turning it into a parsing contest.
awk '{ c[$1]++ }, END { for (x in c) print x, c[x]; },'
You didn't say how it was to be sorted. You can put " | sort" on the end if you want it sorted by key, " | sort -n -k 2" if you want it sorted by number (increasing) or " | sort -rn -k 3" if you want it sorted in decreasing number.
#include <map>
#include <iostream>
#include <string>
int main() {
std::string const s[] = { "foo", "bar", "bar", "foo", "glag", "foo" },;
std::map<std::string, int> m;
for (int i = 0; i < 6; i++){
m[s[i]]++;
},
for (std::map<std::string, int>::const_iterator it = m.begin(); it != m.end(); it++) {
std::cout << (*it).first << '\t' << (*it).second << '\n';
},
},
Or
#include <set>
#include <iostream>
#include <string>
int main() {
std::string const i[] = { "foo", "bar", "bar", "foo", "glag", "foo" },;
std::multiset<std::string> s (i, i + 6);
std::set<std::string> keys (s.begin(), s.end());
for (std::set<std::string>::const_iterator it = keys.begin(); it != keys.end(); it++) {
std::cout << *it << '\t' << s.count(*it) << '\n';
},
},
Or how about CeePlusPlus with BoostLibraries:
#include <iostream>
#include <map>
#include <string>
#include <boost/lambda/lambda.hpp>
#include <boost/lambda/bind.hpp>
int main() {
using namespace std; using namespace boost; using namespace boost::lambda;
string i[] = { "foo", "bar", "bar", "foo", "glag", "foo" },;
map<string,int> m;
for_each(i, i + 6, bind(&map<string,int>::operator[], var(m), _1)++);
for_each(m.begin(), m.end(),
cout << bind(&pair<string const,int>::first, _1) << '\t'
<< bind(&pair<string const,int>::second, _1) << '\n');
},
Or C++11:
#include <set>
#include <iostream>
#include <string>
int main() {
auto s = std::multiset<std::string>{"foo", "bar", "bar", "foo", "glag", "foo"},;
auto keys = std::set<std::string>(s.begin(), s.end());
for(auto key: keys)
std::cout << key << '\t' << s.count(key) << '\n';
},
;; Loop (woohoo!) solution...
(loop with words = '(foo bar bar foo glag foo)
for w in (sort (remove-duplicates words) #'string<)
do (format t "~&~A ~A" w (count w words)))
;; FunctionalProgramming solution...
(let ((words '(foo bar bar foo glag foo)))
(mapc (lambda (w) (format t "~&~A ~A" w (count w words)))
(sort (remove-duplicates words) #'string<)))
string[] tokens = { "foo", "bar", "bar", "foo", "glag", "foo" },;
var query = from s in tokens group s by s;
foreach (var item in query)
Console.WriteLine("{0}, {1},", item.Key, item.Group.Count());
Dodo http://dodo.sourceforge.net
I think using a predefined sort function is just too easy. This version uses only basic functions to get the job done! Written in dodo0 sublanguage (almost like assembler o_O)
# Import a few utility functions
clojure('compare', 2) -> comp
clojure('vector', 0) -> newList
clojure('nth', 2) -> itemAt
clojure('dec', 1) -> dec
clojure('inc', 1) -> inc
clojure('count', 1) -> count
clojure('conj', 2) -> append
# Return the biggest item in a list
fun max -> list, return
(
fun loop -> i, break
(
'='(i, 0) -> zero
if (zero) ->
itemAt(list, i) -> item
break(item); # nothing to compare with, select item
|
dec(i) -> k
loop(k) -> ref # get max of other items
itemAt(list, i) -> item
comp(item, ref) -> order
'>'(order, 0) -> bigger
if (bigger) ->
break(item) # current item is bigger
|
break(ref) # max of others is bigger
)
| loop
count(list) -> n
dec(n) -> n
loop(n, return)
)
| max
# Return a list of items and their count
fun countList -> list, return
(
'empty?'(list) -> empty
if (empty) ->
return(list) # empty list empty result
|
max(list) -> biggest
# Count occurrences and list non-occurrences
fun scan -> i, end, done
(
'='(i, end) -> stop
if (stop) ->
newList() -> l
done(0, l); # finished scanning the list
|
inc(i) -> k
scan(k, end) -> n, rest # scan rest of list for occurrences
itemAt(list, i) -> item
'='(item, biggest) -> match
if (match) ->
inc(n) -> n # occurrence of biggest item: increase count
done(n, rest);
|
append(rest, item) -> nonmatching # append item to list of non-occurrences
done(n, nonmatching)
)
| scan
count(list) -> n
scan(0, n) -> cnt, rest # scan list for number of occurrences
countList(rest) -> otherCounts # get counts for rest of the list
append(otherCounts, biggest) -> myCounts
append(myCounts, cnt, return) # return list with biggest item and its count added
)
| countList
'vector'("foo", "bar", "bar", "foo", "glag", "foo") -> input
countList(input) -> result
println("Count of words in", input, ":", result) ->
exit()
Result:
Count of words in [foo bar bar foo glag foo] : [bar 2 foo 3 glag 1]
;; In the CommonLisp style:
(require 'cl)
(loop with words = '(foo bar bar foo glag foo)
for w in (sort (remove-duplicates words) #'string<)
collect (list w (count w words)))
import Data.Map as Map
main = getLine >>= mapM_ printItem . bagSum . words
where printItem (word, count) = putStrLn $ word ++ " " ++ (show count)
bagSum = Map.toAscList . (foldr updateFunc Map.empty)
where updateFunc key map =
Map.insertWith (+) key 1 map
-- DavidWahler
A little more perusal of the standard libraries turns up a much more elegant (and possibly more efficient) implementation of the bagSum function:
bagSum = map itemFunc . group . sort
where itemFunc l = (head l, length l)
The first version is a straightforward translation of the Perl/Awk implementations, which use a hashtable with a counter for each word. The second version is isomorphic to the Unix shell version.
See BagSumInJava.
Using the GoogleCollections library,
Multiset<String> strings = HashMultiset.create();
strings.addAll(Arrays.asList("foo", "bar", "bar", "foo", "glag", "foo"));
for (Multiset.Entry<String> e : strings.entrySet()) {
System.out.println(e.getElement() + "\t" + e.getCount());
},
var bag = ('foo bar bar foo glag foo').split(/\s+/).sort();
var bagsum = {},;
for (var i=0; i<bag.length; i++) {
var item = bag[i];
bagsum[item] = bagsum[item] ? bagsum[item] + 1 : 1;
},
for (var item in bagsum) {
document.writeln(item + ' ' + bagsum[item] + '<br>');
},
There are two approaches to this:
Given a list of the form:
a=. ;: 'foo bar bar foo glag foo'
First, the classic one-liner:
c,.;/+/b=/c=.(1,-.(},:=},.)b)#b=./:~a
Or, more maintainable code:
NB. Sort tokens
b=. /:~a
NB. Unique tokens
c=. (1 , -. (},: = },.) b) # b
NB. Count'em and show'em
c ,. ;/ +/ b =/ c
-- MarcThibault
To get the unique tokens you could use "nub". And with tacit programming:
cnt=.[: +/ =/
sk=. [: /:~ ~.
bagsum=.(] ,. [: <"0 cnt) sk
or as a one-liner,
bagsum=.(],.[:<"0[:+/=/)([:/:~~.)
Run the code with,
bagsum a
,where a is given above.
--JuneKim
let str_list = Str.split (Str.regexp "[ \t]+") "foo bar bar foo glag foo" ;;
let hash_tbl = Hashtbl.create 10 ;;
List.iter (fun x ->
if Hashtbl.mem hash_tbl x then
Hashtbl.replace hash_tbl x ((Hashtbl.find hash_tbl x) + 1)
else
Hashtbl.add hash_tbl x 1
)
str_list ;;
Hashtbl.iter (Printf.printf "%s : %d\n") hash_tbl ;;
-- ErikDeCastroLopo
PurelyFunctional version
module StrMap = Map.Make(String)
let bag_sum list =
let find elem map =
try StrMap.find elem map
with Not_found -> 0
in let add elem map =
StrMap.add elem (find elem map + 1) map
in
let map = List.fold_left (fun map elem -> add elem map)
StrMap.empty list in
List.rev
(StrMap.fold
(fun str count list ->
((str, count) :: list))
map [])
#!/usr/bin/perl
use strict ;
use warnings ;
my %bag ;
$bag{$_},++ for qw( foo bar bar foo glag foo ) ;
print "$_ $bag{$_},\n" for sort keys %bag ;
PHP's array_count_values() is perfect for this task:
<?php
$bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
$bagsum = array_count_values($bag);
print_r($bagsum);
?>
If you wanted to do it manually:
<?php
$bag = array('foo', 'bar', 'bar', 'foo', 'glag', 'foo');
sort($bag);
foreach ($bag as $item) {
$bagsum[$item]++;
},
print_r($bagsum);
?>
PHP's much-maligned assocative arrays, which function like hashes but preserve ordering, are plenty useful here for recording the output.
# "Classy" solution...
class Bag(dict):
def __init__(self, alist):
for elem in alist:
self.add(elem)
def add(self, elem):
self[elem] = self.get(elem, 0) + 1
def __str__(self):
out = ['%-8s %3d' % (key, val)
for (key, val) in sorted(self.items())]
return '\n'.join(out)
print Bag('foo bar bar foo glag foo'.split())
# Pythonic ListComprehension and loop solution...
bag = 'foo bar bar foo glag foo'.split()
bagsum = dict([(elem, bag.count(elem)) for elem in bag])
for key, val in sorted(bagsum.items()):
print '%-8s %3d' % (key, val)
# Sort of FunctionalProgramming solution...
def count(elem, bagsum={},):
bagsum[elem] = bagsum.get(elem, 0) + 1
return bagsum
def sortKeys(adict):
result = adict.items()
result.sort()
return result
def output(pair): print '%-10s %3d' % pair
def bagsum(abag):
map(output, sortKeys(map(count, abag)[0]))
bagsum('foo bar bar foo glag foo'.split())
The Bag class derived from dict and the (sort of) FunctionalProgramming solution each use a dictionary to do the counting, and should complete the count in O(N) time. The ListComprehension solution, however, is different. It uses a dictionary only to eleminate duplicate elements in the counted list. (Python 2.4 will have a "set" type containing no duplicates.) You don't really want to use the list.count method for each element in the bag list, for that would take O(N**2) time.
<s>Python 2.5 should have collections.bag, which will allow this section to be cleaned up further...</s>
Using sorted and groupby (from Python 2.6+ and 3.x) to do the hard lifting (like the Haskell example):
Bag = 'foo bar bar foo glag foo'.split()
for Key, Copies in groupby(sorted(Bag)):
print Key, len(list(Copies))
A simple sort and tally algorithm, no intermediate set, bag or dictionary required:
words = 'foo bar bar foo glag foo'.split()
words.sort()
prev, count = words[0], 1
for word in words[1:]:
if word == prev:
count += 1
else:
print prev, count
prev, count = word, 1
print prev, count
You can use the "defaultdict" class in Python 2.5 and later:
from collections import defaultdict
words = 'foo bar bar foo glag foo'.split()
sums = defaultdict(int) # when key is not found, it binds it to int(), which is 0
for word in words:
sums[word] += 1
for key, count in sums.iteritems():
print "%s\t%s" % (key, count)
Using the "Counter" class (which is like the "Bag" above) in Python 3.1 and later:
from collections import Counter
words = 'foo bar bar foo glag foo'.split()
sums = Counter(words)
for key, count in sums.items():
print("%s\t%s" % (key, count))
bag <- c("foo", "bar", "bar", "foo", "glag", "foo")
print(table(bag))
Alternately,
print(tapply(bag, bag, length))
sums = Hash.new(0)
%w{ foo bar bar foo glag foo },.each { |w| sums[w] = sums[w] + 1 },
sums.keys.sort.each { |w| puts "#{w},\t#{sums[w]}," },
-- JasonArhart
Using "group_by" from Ruby 1.9:
words = %w{ foo bar bar foo glag foo },
sums = words.group_by {|x| x}, .each_pair {|k, g| puts"#{k},\t#{g.length}," },
def bagSum(input: String): Iterable[(String, Int)] = {
val counts = collection.mutable.Map[String, Int]()
for (t <- input split ' ') counts(t) = counts.getOrElse(t, 0) + 1
util.Sorting.stableSort(counts toSeq, ((e: (String, Int)) => e._1))
},
for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
println(token + "\t" + sum)
Using "groupBy" from Scala 2.8:
def bagSum(input: String): Iterable[(String, Int)] =
util.Sorting.stableSort(
(input.split(' ') groupBy identity) map {case (t, g) => t -> g.size}, toSeq,
((e: (String, Int)) => e._1))
for ((token, sum) <- bagSum("foo bar bar foo glag foo"))
println(token + "\t" + sum)
-- JasonArhart
(define (bagsum bag)
(define (bs-adder thing alist)
(cond ((null? alist)
(list (cons thing 1)))
((eq? thing (caar alist))
(cons (cons thing (+ 1 (cdar alist))) (cdr alist)))
(else
(cons (car alist) (bs-adder thing (cdr alist))))))
(define (bs-helper bag alist)
(cond ((null? bag) alist)
(else (bs-helper (cdr bag) (bs-adder (car bag) alist)))))
(bs-helper bag '()))
(bagsum '(a b c d a a a b b c)) => ((a . 4) (b . 3) (c . 2) (d . 1))
Here is my attempt at a (non-functional) SchemeLanguage version:
(define (bagsum bag)
(let ((blist '()))
(for-each
(lambda (x)
(if (assoc x blist)
(set-cdr! (assoc x blist)
(cons (+ 1 (cadr (assoc x blist))) '()))
(set! blist (append blist (list (list x 1))))))
bag)
blist))
(bagsum '(foo bar bar foo glag foo)) -> ((foo 3) (bar 2) (glag 1))
I am sure a functional (and better) version could be written.
Nothing wrong with the imperative version. It should be quite efficient except for the fact that you are computing (assoc x blist) twice. I would suggest instead something equivalent to
...
(cond ((assoc x blist) => (lambda (pair) (set-cdr! pair ....)))
(else ....)
...
Also, the specification does not require you to use append - you might as well cons the new piece before the rest.
Here is a functional version (uses SRFI 1):
(define (partitions bag seed)
(if (null? bag)
seed
(let-values
(((p rest)
(partition (lambda (elem)
(eqv? elem (car bag)))
bag)))
(partitions rest (cons p seed)))))
(define (bagsum bag)
(map (lambda (p) (list (first p) (length p)))
(partitions bag '())))
(bagsum '(foo bar bar foo glag foo)) ;==> ((glag 1) (bar 2) (foo 3))
| sums |
sums := Dictionary new.
('foo bar bar foo glag foo' findTokens: ' ')
do: [:each | sums at: each put: (sums at: each ifAbsent: 0) + 1 ].
sums keys asSortedCollection
do: [:each | Transcript show: each; space; show: (sums at: each); cr ].
I'm sure there's an easier way to do this. Maybe some SmugSmalltalkWeenie will come along and show me how.
-- JasonArhart
It has been a few years, but Smalltalk has a Bag class, so the Weenie code would look something like the following (which is untested and probably wrong). If addAll: does not exist, it must be replaced by a loop. I sort and print associations, which are key value pairs. -- StanSilver
| bag |
bag := Bag new addAll: #(foo bar bar foo glag foo)
bag associations asSortedCollection do: [ :each | Transcript show: each; cr ]
In the Squeak version, it's as simple as (#(foo bar bar foo glag foo) as: Bag) orderedCounts *alt-P*
structure StrMap = BinaryMapFn (struct
type ord_key = string
val compare = String.compare
end)
fun bag_sum list =
let
fun find (elem, map) =
getOpt (StrMap.find (map, elem), 0)
fun add (elem, map) =
StrMap.insert (map, elem, find (elem, map) + 1)
val map = foldl add StrMap.empty list
in
StrMap.listItemsi map
end
Minimal:
SELECT item, count(*)
FROM bag
GROUP BY item
Cleaner:
SELECT item, count(*) as Occurs
FROM bag
GROUP BY item
ORDER BY item
I didn't originally notice the bit about sorting. Here is the corrected version.
foreach word [list foo bar bar foo glag foo] {
if {![info exists wc($word)]}, {
set wc($word) 1
}, else {
incr wc($word)
},
},
foreach word [lsort [array names wc]] {
puts "$word $wc($word)"
},
Dim tokens() = {"foo", "bar", "bar", "foo", "glag", "foo"},
Dim query = From token In tokens Group By token Into Count()
For Each item In query
Console.WriteLine("{0}, {1},", item.token, item.Count)
Next
That's almost like embedded SQL. VB is trying to be an SQL-ified ExBase now, eh?
sed 's/ */ /g' | sort | uniq -c | sed 's/^ *\([^ ]*\) \([^ ]*\)/\2 \1/'
(tested with bash on Linux)
Whichever WikiGnome "helpfully" removed a newline from this pipeline completely broke it. Don't change code without testing it! Understanding it would be a good idea, too. -- dm
Another:
tr ' ' \\012 | sort | uniq -c
If you want to take into account too many spaces (as the first solution above does), use this:
tr ' ' \\012 | grep -vx '' | sort | uniq -c
I'd rather disregard the output column order, it's insignificant and hard to do as specified above (that the numbers are in the same column). If it is a must, add the following to the pipeline:
sed -e 's#^ *\([0-9]*\) *\(.*\)$#<tr><td>\2</td><td>\1</td></tr>#' -e '1s#^#<table>#' -e '$s#$#</table>#' | w3m -dump -T text/html
The column ordering requirement is the only reason the first solution was at all coomplicated, otherwise we were doing pretty much the same thing -- except your tr is WikiGnome-proof. :-S
Your final solution is really throwing caution to the winds, blech. :-)
How about instead:
tr ' ' \\012 | grep -vx '' | sort | uniq -c | awk '{ print $2 " " $1; },'
Windows cmd Script
for %a in (foo bar bar foo glag foo) do @set /a Ans_%a+=1
then
set Ans_
to see the answers
Objective-C's NSCountedSet class was made for this task:
NSArray *strings = [NSArray arrayWithObjects:@"foo", @"bar", @"bar", @"foo", @"glag", @"foo", nil];
NSCountedSet *set = [NSCountedSet setWithArray:strings];
for (id o in set) {
NSLog(@"%@\t%u", o, [set countForObject:o]);
},
TutorialDee (using the RelProject implementation)
SUMMARIZE
RELATION {
TUPLE {i 1, s 'foo'},,
TUPLE {i 2, s 'bar'},,
TUPLE {i 3, s 'bar'},,
TUPLE {i 4, s 'foo'},,
TUPLE {i 5, s 'glag'},,
TUPLE {i 6, s 'foo'},
},
BY {s}, ADD (COUNT() AS n) ORDER (ASC s)
I created a language for expressing mathematical ideas in ASCII -- cllaed PINAPL (PINAPL Is Not A Programing Language) or MATHS. And in that, if I understand my definitions, then for list l:#T,
+bag(l)
should do it.
-- Richard Botting
FUNCTION SortedWords(cWordlist)
- Declarations and initializations are not strictly necessary for the
- code to work but it's good practice.
LOCAL ARRAY aWords[1], aWordCnts[1]
LOCAL ctr, tot, output
ctr = 0
tot = 0
output = ""
CLEAR
- Assume the separator is always a space.
tot = ALINES(aWords,ALLTRIM(cWordlist),12," ")
- Not many words longer than 100 characters...
CREATE CURSOR words (oneword C(100))
FOR ctr = 1 TO tot
INSERT INTO words (oneword) VALUES (aWords(ctr))
ENDFOR
SELECT DISTINCT PADR(oneword,100," ") FROM words INTO ARRAY aWordCnts ORDER BY 1
FOR ctr = 1 TO ALEN(aWordCnts)
COUNT FOR oneword == aWordCnts(ctr) TO subtot
output = RTRIM(aWordCnts(ctr)) + " " + TRANSFORM(subtot)
? output
ENDFOR
RETURN
ENDFUNC
* Execution:
DO SortedWords WITH "foo bar bar foo glag foo"
I'm pretty sure there is a more concise way to get such in ExBase. For one, the intro lets us assume the original list is in a table ("native format"), so we don't have to worry about parsing. I'll have to dig out the ol' manual.
Yes, I realize the exercise allows you to start with the data in any format. However, you can't pass a cursor to a function, procedure, or method in VisualFoxPro. I was thinking that if you're actually going to do such a thing, you need an impetus for getting it started. Assuming that we are operating in a single data session, any cursor is always in scope anywhere. I suppose I could adopt the conceit of passing a table name to the function, opening the table, and going from there. In any case, that's why I took the liberty of starting with the data as it was presented in the original string.
If the data's already in the table the easiest way would still be to use SELECT DISTINCT to get a second table of sorted unique values, then SCAN it and get the counts from the first table. You could also use a SELECT expression to get the counts rather than the ExBase COUNT FOR...TO...
In that case though, then the "language" being used is really more SQL and not so much VisualFoxPro.
On the other hand, doing it entirely in an array structure would be painful.
Still, I wanted to show some of the versatility of the hybrid nature of the language.
Maybe I'm thinking of another dialect. Where did I put those old books?
GNU Core Utils
sed 's/\s+/\n/g' <input.txt | sort |uniq -c|awk '{print $2 "\t" $1},'
The awk bit is just to put the count after the item and the sed part is just to split a single line into separate lines, so the real work is just the "sort|uniq -c" part.
MathematicaLanguage almost has a function for that built in (missing only the sort). For a list of strings
With[{list={"foo","bar","bar","foo","glag","foo"},},, TableForm@Sort@Tally[list]]
Or, a little lower-level:
tallyFn[list_]:=Map[{#, Count[list, #]}, &, Union[list]]
tallyFn[{"foo","bar","bar","foo","glag","foo"},] // TableForm
$array = ["foo", "bar", "bar", "foo", "glag", "foo"];
$tally = array_count_values($array);
ksort($tally);
// print_r($tally); // Just dumps the output. TSV layout follows
foreach($tally as $key => $count)
echo $key, "\t", $count, "\n";
Again, because it feels like excessive laziness to use a native "bag sum" function,
$array = ["foo", "bar", "bar", "foo", "glag", "foo"];
$tally = array_fill_keys($array, 0);
ksort($tally);
foreach($array as $word)
{
$tally[$word]++;
},
print_r($tally);
CategoryInManyProgrammingLanguages