> A = "time";
> size A;
it is often desirable to look at a random selection to get a quick overview (rather than just seeing matches from the first part of the corpus); one possibility is to do a sort randomize and then go through the first few pages of random matches:
> sort A randomize;
however, this cannot be combined with other sort options such as alphabetical sorting on match or left/right context; it also doesn't speed up frequency lists, set target and other post-processing operations
the reduce command instead cuts the named query itself down to a random subset, specified either as a percentage or as an absolute number of matches:
> reduce A to 10%;
> size A;
> sort A by word %cd on match .. matchend[42];
> reduce A to 100;
> size A;
> sort A by word %cd on match .. matchend[42];
this allows arbitrary further operations to be carried out on a representative sample rather than the full query result
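as a rough mental model of what reduce does (a Python sketch, not CQP's actual implementation; the function name reduce_matches and the list of match positions are hypothetical), a random subset is drawn and then returned in corpus order:

```python
import random

def reduce_matches(matches, to, percent=False, seed=None):
    """Keep a random subset of `matches`, returned in the original (corpus) order."""
    rng = random.Random(seed)
    # Target size: a percentage of the current size, or an absolute count.
    n = round(len(matches) * to / 100) if percent else min(to, len(matches))
    # Sample indices without replacement, then sort them to restore corpus order.
    keep = sorted(rng.sample(range(len(matches)), n))
    return [matches[i] for i in keep]

matches = list(range(0, 5000, 5))              # 1000 hypothetical match positions
tenth = reduce_matches(matches, 10, percent=True, seed=42)
hundred = reduce_matches(matches, 100, seed=42)
print(len(tenth), len(hundred))
```

any further processing (sorting, counting, and so on) then runs on the smaller list only, which is where the speed-up comes from.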
> randomize 42;
(use any positive integer as seed; issue this before reduce to make the selection reproducible)
a second way to obtain a random subset is to sort the matches in random order and then keep only the first n of them:
> sort A randomize;
> cut A 100;
(NB: this restores corpus order, as with the reduce command)
reproducible subsets can be obtained with a suitable randomize command before the sort; the main difference from the reduce command is that cut cannot be used to select a percentage of matches (i.e., you have to determine the number of matches in the desired subset yourself)
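the shuffle-then-cut approach, including the step of working out the absolute count for a desired percentage, can be sketched in Python as follows (an illustration under assumed data, not CQP code; the list of match positions and all variable names are hypothetical):

```python
import random

matches = list(range(0, 4000, 4))        # 1000 hypothetical match positions

rng = random.Random(42)                  # plays the role of "randomize 42;"
shuffled = list(matches)
rng.shuffle(shuffled)                    # plays the role of "sort A randomize;"

percent = 10
n = len(matches) * percent // 100        # cut needs an absolute count, so compute it
subset = sorted(shuffled[:n])            # first n matches, back in corpus order
print(len(subset))
```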
> sort A randomize 42;
different seeds give different, reproducible orderings; if you randomize a subset of A with the same seed value, the matches will appear in exactly the same relative order as in the randomized version of A:
> A = "interesting" cut 20;
(just for illustration)
> B = A;
> reduce B to 10;
(an arbitrary subset of A)
> sort A randomize 42;
> sort B randomize 42;
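one way such a stable ordering could work is to give every match a pseudo-random sort key that depends only on the seed and on the match itself; the Python sketch below makes that assumption explicitly (it is not taken from the CQP sources, and random_key and sort_randomize are hypothetical names):

```python
import hashlib

def random_key(position, seed):
    """Pseudo-random sort key that depends only on the match position and the seed."""
    digest = hashlib.sha256(f"{seed}:{position}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def sort_randomize(matches, seed):
    """Order matches by their seeded pseudo-random keys."""
    return sorted(matches, key=lambda pos: random_key(pos, seed))

a = list(range(100, 2100, 100))          # 20 hypothetical match positions
b = a[::2]                               # an arbitrary subset of A

a_sorted = sort_randomize(a, 42)
b_sorted = sort_randomize(b, 42)

# B's ordering equals A's ordering with the non-members filtered out.
print(b_sorted == [m for m in a_sorted if m in b])
```

because the key of a match does not depend on which other matches are present, any subset inherits its order from the full result.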
> A = "time";
> sort A randomize 7;
> Sample1 = A;
> cut Sample1 0 99;
(random sample of 100 matches)
> Sample2 = A;
> cut Sample2 100 199;
(a second random sample of 100 matches, disjoint from Sample1)
note that cut discards the randomized ordering (the samples are back in corpus order); reapply the stable randomization with the same seed to make a sample correspond exactly to the matching range of the randomized query result A:
> sort Sample2 randomize 7;
> cat Sample2;
> cat A 100 199;
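the idea behind these samples, slicing consecutive ranges out of one seeded random ordering and later restoring that ordering within a sample, can be sketched in Python (illustrative only; the match positions are made up and the variable names are hypothetical):

```python
import random

matches = list(range(0, 3000, 3))            # 1000 hypothetical match positions
order = list(matches)
random.Random(7).shuffle(order)              # seeded random ordering of A

# Consecutive slices of the same ordering never overlap.
sample1 = sorted(order[0:100])               # like "cut Sample1 0 99": corpus order
sample2 = sorted(order[100:200])             # like "cut Sample2 100 199"

# Re-applying the same ordering to sample2 reproduces the slice of A.
rank = {m: i for i, m in enumerate(order)}
restored = sorted(sample2, key=rank.__getitem__)
print(restored == order[100:200])
```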