Posting has become much rarer here recently, as my energy has flagged.
And this morning I have a rather preemptory request that I fix the old links that were broken as a result of Apple's ditching iDisk. While I still have the files, somewhere, and I could theoretically put them on Dropbox, I'm really not ready to fix what must be several hundred links.
So I thought I would post this warning as what will probably be my last post. It's been fun, and I'm grateful to my readers for having taken an interest. We had more than 50,000 unique visitors and over 150 flags. I just wish someone from Tonga had come by.
If you really must have an old file, I can probably find it.
[ UPDATE: I made a zip of the directory tree which was used to be on iDisk under
http://web.mac.com/telliott99/Python/
It is on Dropbox here and weighs in at 17.8 MB. If you check the old links that don't work, the file should be in that archive. Let me know if you can't find something you'd like.
Thursday, August 9, 2012
Tuesday, July 31, 2012
More fun with geometry
I ran into a fun problem with parallelograms the other day.
The graphic shows a parallelogram broken up into pieces by its diagonals.
Since the opposing sides of a parallelogram are parallel and equal, it's easy to show that the top and bottom triangles are congruent, just rotated, likewise with the other two.
The problem posed was to express the total area in terms of the obtuse angle at the center, the one which is > 90 degrees. Let's call that angle A, and its supplementary angle (the acute angle at the center) B.
The reason why I've drawn the figure in this way is that, as shown below, the triangles can be rearranged to give a different paralellogram with the same area.
x = length of long diagonal (orange + maroon)
y = length of short diagonal
y/2 = 1/2 length of short diagonal (blue)
Area = x (y/2) sin B
The sine values for supplementary angles are equal.
sin (π/2 - θ) = sin θ (ref)
So:
Area = x (y/2) sin A
Nice! Much better than using the law of sines, and for the vectorists out there, expressed even more compactly as the cross-product.
The graphic shows a parallelogram broken up into pieces by its diagonals.
Since the opposing sides of a parallelogram are parallel and equal, it's easy to show that the top and bottom triangles are congruent, just rotated, likewise with the other two.
The problem posed was to express the total area in terms of the obtuse angle at the center, the one which is > 90 degrees. Let's call that angle A, and its supplementary angle (the acute angle at the center) B.
The reason why I've drawn the figure in this way is that, as shown below, the triangles can be rearranged to give a different paralellogram with the same area.
x = length of long diagonal (orange + maroon)
y = length of short diagonal
y/2 = 1/2 length of short diagonal (blue)
Area = x (y/2) sin B
The sine values for supplementary angles are equal.
sin (π/2 - θ) = sin θ (ref)
So:
Area = x (y/2) sin A
Nice! Much better than using the law of sines, and for the vectorists out there, expressed even more compactly as the cross-product.
Saturday, June 16, 2012
variadic arguments
I got curious about functions with variadic arguments like in NSArray:
Here's are two C examples. In either case, it's required to have a named argument that precedes the variable part. In the first method, that variable gives a count of the arguments. In the second, it's not used for anything except to show where to start. A sentinel value shows the end of the variable length argument list. I also incorporated
+ (id)arrayWithObjects:(id)firstObj, ...
Here's are two C examples. In either case, it's required to have a named argument that precedes the variable part. In the first method, that variable gives a count of the arguments. In the second, it's not used for anything except to show where to start. A sentinel value shows the end of the variable length argument list. I also incorporated
__func__
, which I came across in my reading and was new to me.// clang variadic.c -o prog #include <stdio.h> #include <stdarg.h> int f(int count, ...) { printf("%s%s", __func__, "\n"); int n, sum; va_list ap; va_start(ap, count); sum = 0; for ( ; count > 0; count--) { n = va_arg(ap, int); printf("n = %3d\n", n); sum += n; } va_end(ap); return sum; } int g(int x, ...) { printf("%s%s", __func__, "\n"); fflush(stdout); int n, sum; va_list ap; va_start(ap, x); n = x; sum = 0; while (!(0)) { printf("n = %3d\n", n); sum += n; n = va_arg(ap, int); if (n == 0) { break; } } va_end(ap); return sum; } int main(int argc, char * argv[]) { int result = f(3,1,2,3); printf ("sum = %d\n", result); result = g(1,2,3,0); printf ("sum = %d\n", result); return 0; } |
> ./prog f n = 1 n = 2 n = 3 sum = 6 g n = 1 n = 2 n = 3 sum = 6 |
NSString (1)
Test harness:
// clang strings1.m -o prog -framework Foundation -fobjc-gc-only #include <Foundation/Foundation.h> int main(int argc, char * argv[]) { // ..code here.. return 0; } |
NSString *s; NSMutableArray *ma = [NSMutableArray arrayWithCapacity:10]; char *cstr = "abc"; // real C string w/ "\0" also works uint ascii = NSASCIIStringEncoding; uint utf8 = NSUTF8StringEncoding; [ma addObject:[NSString stringWithCString:cstr encoding:ascii]]; [ma addObject:[NSString stringWithCString:cstr encoding:utf8]]; [ma addObject:[NSString stringWithUTF8String:cstr]]; NSMutableString *ms = [[NSMutableString alloc] init]; [ms appendString:@"abde"]; [ms insertString:@"c" atIndex:2]; [ma addObject:[ms copy]]; // otherwise code below will alter [ms appendString:@"fghij"]; [ms replaceOccurrencesOfString:@"B" withString:@"*" options:NSCaseInsensitiveSearch range:NSMakeRange(0,2) ]; [ma addObject:ms]; for (id obj in ma) { NSLog(@"%@ %@", obj, [obj class]); } [ma removeAllObjects]; printf("-----------------\n"); |
> ./prog 2012-05-29 16:45:03.624 prog[444:707] abc __NSCFString 2012-05-29 16:45:03.626 prog[444:707] abc __NSCFString 2012-05-29 16:45:03.627 prog[444:707] abc __NSCFString 2012-05-29 16:45:03.627 prog[444:707] abcde __NSCFString 2012-05-29 16:45:03.628 prog[444:707] a*cdefghij __NSCFString ----------------- |
[ma addObject:[NSString stringWithFormat:@"abcd"]]; [ma addObject:[NSString stringWithFormat:@"ab%@", @"cd" @"ef"]]; [ma addObject:[NSString stringWithFormat:@"a%C%C", 0x62, 0x63]]; char b[] = { 0x61, 0x62, 0x63, 0x64, 0x65 }; NSData *d = [NSData dataWithBytes:b length:5]; s = [[NSString alloc] initWithData:d encoding:utf8]; [ma addObject:s]; for (id obj in ma) { NSLog(@"%@ %@", obj, [obj class]); } printf("-----------------\n"); |
2012-05-29 16:45:03.629 prog[444:707] abcd __NSCFString 2012-05-29 16:45:03.629 prog[444:707] abcdef __NSCFString 2012-05-29 16:45:03.630 prog[444:707] abc __NSCFString 2012-05-29 16:45:03.630 prog[444:707] abcde __NSCFString ----------------- |
d = [NSData dataWithBytes:b length:5]; [d writeToFile:@"x" atomically:YES]; NSError *err = nil; NSStringEncoding enc = NSUTF8StringEncoding; s = [NSString stringWithContentsOfFile:@"x" usedEncoding:&enc error:&err]; if (err != nil) { NSLog(@"%@", [err userInfo]); } else { NSLog(@"%@", s ); } b[4] = 0xff; d = [NSData dataWithBytes:b length:5]; [d writeToFile:@"x" atomically:YES]; s = [NSString stringWithContentsOfFile:@"x" usedEncoding:&enc error:&err]; if (err != nil) { NSLog(@"%@", [err localizedDescription]); } else { NSLog(@"%@", s ); } |
2012-05-29 16:45:03.641 prog[444:707] abcde 2012-05-29 16:45:03.658 prog[444:707] The file “x” couldn’t be opened because the text encoding of its contents can’t be determined. |
Thursday, May 31, 2012
Burroughs-Wheeler Transform-BWT (3)
I'm exploring the Burroughs-Wheeler Transform, which has become important for fast methods of matching short reads to large genomes. The first two posts are here and here.
In the first we saw what BWT is and showed how to reverse or invert it. In the second, we saw how to use BWT to do a search. Here, I show a fairly efficient way to generate the BWT for a sequence.
Last time, we used a script which does a rotation on a deque followed by sorting to generate the BWT of a sequence:
and obtained
The result matches what the visualization tool on this page generates. However, our script does a lot more work (and uses more memory) than is really necessary. If we look again at the first few lines of output from bwt.py:
Each number is the index into the sequence where the row (the suffix) begins (0-based indexing). The array of numbers from top to bottom of the output is the suffix array.
As you can see from the last part of the script below, the BWT is easily generated from the suffix array, it's just text[j-1] for each j in the array. That's because the BWT is the last column and the sequences wrap, so each character in the BWT is -1 with respect to the character in the first column for any row.
The Python script shown here sorts the rows but only looks at as many of the characters as are needed to do the sort. We have a list L to contain the growing suffix array. This consists of indexes j into the text, one corresponding to each row, which are in sorted order in the suffix array.
We move an index i along the text and for the suffix starting at each i, find the place where it belongs in the suffix array, then insert it.
I'm sure there are even faster ways to generate the BWT for a text. The most glaring problem with this attempt is that repeated insertion into a list is expensive. We need a data structure that combines ease of insertion with ease of traversal (like a linked list).
Output:
If you compare this with what we got last time you'll see it matches.
fast_bwt.py
In the first we saw what BWT is and showed how to reverse or invert it. In the second, we saw how to use BWT to do a search. Here, I show a fairly efficient way to generate the BWT for a sequence.
Last time, we used a script which does a rotation on a deque followed by sorting to generate the BWT of a sequence:
GAATTCAAGCTTGGATCCGGAAAGATCTGATC
and obtained
GACGAAGGGATTCTGTG^GATACTTAAACTACC
The result matches what the visualization tool on this page generates. However, our script does a lot more work (and uses more memory) than is really necessary. If we look again at the first few lines of output from bwt.py:
20 AAAGATCTGATC^GAATTCAAGCTTGGATCCGG 21 AAGATCTGATC^GAATTCAAGCTTGGATCCGGA 6 AAGCTTGGATCCGGAAAGATCTGATC^GAATTC |
Each number is the index into the sequence where the row (the suffix) begins (0-based indexing). The array of numbers from top to bottom of the output is the suffix array.
As you can see from the last part of the script below, the BWT is easily generated from the suffix array, it's just text[j-1] for each j in the array. That's because the BWT is the last column and the sequences wrap, so each character in the BWT is -1 with respect to the character in the first column for any row.
The Python script shown here sorts the rows but only looks at as many of the characters as are needed to do the sort. We have a list L to contain the growing suffix array. This consists of indexes j into the text, one corresponding to each row, which are in sorted order in the suffix array.
We move an index i along the text and for the suffix starting at each i, find the place where it belongs in the suffix array, then insert it.
I'm sure there are even faster ways to generate the BWT for a text. The most glaring problem with this attempt is that repeated insertion into a list is expensive. We need a data structure that combines ease of insertion with ease of traversal (like a linked list).
Output:
> python fast_bwt3.py GACGAAGGGATTCTGTGGATACTTAAACTACC 20 21 6 1 22 7 14 24 29 2 5 16 17 26 9 31 19 13 23 28 8 18 12 4 15 25 30 27 11 3 10 32 |
If you compare this with what we got last time you'll see it matches.
fast_bwt.py
#text = 'BANANA' text = 'GAATTCAAGCTTGGATCCGGAAAGATCTGATC' t = text + '^' N = len(t) # L is the suffix array L = list() # i: index to walk current suffix along text # j: indexes for sorted suffix already in suffix array # k: enumeration of L, to give insertion point def doOne(i,L): for k,j in enumerate(L): n = 0 while t[i+n] == t[j+n]: n += 1 if t[i+n] < t[j+n]: L.insert(k,i) return L.append(i) for i in range(1,N): doOne(i,L) bwt = ''.join([t[j-1] for j in L]) print bwt, '\n' for k,j in enumerate(L): if k and not k % 10: print print '%3d' % j, |
Burroughs-Wheeler Transform-BWT (2)
I'm exploring the Burroughs-Wheeler Transform, which has become very important for fast methods of matching short reads to large genomes. The first post is here.
I found a nice page about it here, which is to be part of a series. There is a great applet(?), anyhow a tool for visualization is on that page, that can walk you through the steps of the search. I found it very helpful for understanding what's going on. I hope my explanation is also useful.
With the output from the Python script from yesterday, we'll try using BWT to search for the query
We'll work from the end of the query backwords. The rows we're working on can be found in the output from the script, which is listed at the end of this post.
Q1 = C.
Begin by selecting all the rows that start with C:
I found a nice page about it here, which is to be part of a series. There is a great applet(?), anyhow a tool for visualization is on that page, that can walk you through the steps of the search. I found it very helpful for understanding what's going on. I hope my explanation is also useful.
With the output from the Python script from yesterday, we'll try using BWT to search for the query
GGATC
in the text:GAATTCAAGCTTGGATCCGGAAAGATCTGATC
We'll work from the end of the query backwords. The rows we're working on can be found in the output from the script, which is listed at the end of this post.
Q1 = C.
Begin by selecting all the rows that start with C:
Wednesday, May 30, 2012
Burroughs-Wheeler Transform-BWT (1)
Some time ago I was introduced to bowtie (see this post).
As the front page says:
So the next question is: what is the Burroughs-Wheeler Transform (BWT) and how does it speed up this process? It turns out that the discovery of BWT was not all that long ago (1994 according to wikipedia), and its application to this bioinformatics problem is more recent than that. BWT has been applied to the problem of data compression in the well known bzip tool. We'll talk about the compression part today, and deal with search in another post.
I wrote a simple Python script to explore the BWT; the listing is at the end of the post. The basic transformation couldn't be easier. Take a short text, like
[UPDATE: I chose
Generate all the rotational permutations of this text:
Now sort the list:
That's it! The last column,
But here's the main point: by itself, the last column contains all of the information that was in the original text. That is, the transformation is reversible or invertible. The inversion depends on the cyclic rotation.
Here are the steps. Take the BWT and sort it:
By the way, that's what we already have in column 1 (or 0, for you Pythonistas).
Because of the rotation, the BWT plus column 1 gives all pairs of symbols in the original text. The symbol pairs are (BWT first, then column 1):
Now iteratively do this: sort whatever we generated in the previous step, and then tack on the BWT in front. Here are the triplets:
Rinse, lather and repeat:
Although
The BWT (last column) does not group all identical letters together. And in the inversion step, the reconstructed text is not necessarily in the first row.
In the output above, I've also listed the "suffix array", which is the index into the original text where we start each row. The suffix array will become important for the application to search, next time.
As the front page says:
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
So the next question is: what is the Burroughs-Wheeler Transform (BWT) and how does it speed up this process? It turns out that the discovery of BWT was not all that long ago (1994 according to wikipedia), and its application to this bioinformatics problem is more recent than that. BWT has been applied to the problem of data compression in the well known bzip tool. We'll talk about the compression part today, and deal with search in another post.
I wrote a simple Python script to explore the BWT; the listing is at the end of the post. The basic transformation couldn't be easier. Take a short text, like
'BANANA'
, and add a marker to the end, giving 'BANANA^
'. [UPDATE: I chose
'^'
because it sorts after the uppercase letters by the default sort. This may grate a little for regex pros who expect '^'
to mark the beginning of something. Just define your own cmp function if you are one of these. ]Generate all the rotational permutations of this text:
BANANA^
^BANANA
A^BANAN
NA^BANA
ANA^BAN
NANA^BA
ANANA^B
Now sort the list:
ANANA^B
ANA^BAN
A^BANAN
BANANA^
NANA^BA
NA^BANA
^BANANA
That's it! The last column,
'BNN^AAA'
is the BWT of 'BANANA'
. I think you can see that the runs of identical symbols would allow opportunity for efficient compression.But here's the main point: by itself, the last column contains all of the information that was in the original text. That is, the transformation is reversible or invertible. The inversion depends on the cyclic rotation.
Here are the steps. Take the BWT and sort it:
A
A
A
B
N
N
^
By the way, that's what we already have in column 1 (or 0, for you Pythonistas).
Because of the rotation, the BWT plus column 1 gives all pairs of symbols in the original text. The symbol pairs are (BWT first, then column 1):
BA
NA
NA
^B
AN
AN
A^
Now iteratively do this: sort whatever we generated in the previous step, and then tack on the BWT in front. Here are the triplets:
BAN
NAN
NA^
^BA
ANA
ANA
A^B
Rinse, lather and repeat:
BANA
NANA
NA^B
^BAN
ANAN
ANA^
A^BA
BANAN
NANA^
NA^BA
^BANA
ANANA
ANA^B
A^BAN
BANANA
NANA^B
NA^BAN
^BANAN
ANANA^
ANA^BA
A^BANA
Although
'BANANA'
is a nice simple example to begin with, in some respects it is too simple. For example:> python bwt.py ACGTGAATTCGAAACCGGAA 11 AAACCGGAA^ACGTGAATTCG 12 AACCGGAA^ACGTGAATTCGA 5 AATTCGAAACCGGAA^ACGTG 18 AA^ACGTGAATTCGAAACCGG 13 ACCGGAA^ACGTGAATTCGAA 0 ACGTGAATTCGAAACCGGAA^ 6 ATTCGAAACCGGAA^ACGTGA 19 A^ACGTGAATTCGAAACCGGA 14 CCGGAA^ACGTGAATTCGAAA 9 CGAAACCGGAA^ACGTGAATT 15 CGGAA^ACGTGAATTCGAAAC 1 CGTGAATTCGAAACCGGAA^A 10 GAAACCGGAA^ACGTGAATTC 4 GAATTCGAAACCGGAA^ACGT 17 GAA^ACGTGAATTCGAAACCG 16 GGAA^ACGTGAATTCGAAACC 2 GTGAATTCGAAACCGGAA^AC 8 TCGAAACCGGAA^ACGTGAAT 3 TGAATTCGAAACCGGAA^ACG 7 TTCGAAACCGGAA^ACGTGAA 20 ^ACGTGAATTCGAAACCGGAA |
The BWT (last column) does not group all identical letters together. And in the inversion step, the reconstructed text is not necessarily in the first row.
In the output above, I've also listed the "suffix array", which is the index into the original text where we start each row. The suffix array will become important for the application to search, next time.
bwt.py
import sys from collections import deque try: text = sys.argv[1] except IndexError: text = 'BANANA' text += '^' n = len(text) #--------------------------------------------- # BWT rL = list() dq = deque(text) for i in range(n): rL.append(''.join(list(dq))) dq.rotate(1) rL.sort() bwt = [s[-1] for s in rL] sfx = [(n - e.index('^') - 1) for e in rL] for s,e in zip(sfx,rL): print '%2d' % s, print e #--------------------------------------------- # reverse # take the last col and sort to generate col 0 c0 = sorted(bwt) print '\n'.join(c0) # get pairs as a demo: c1 = [bwt[i]+c0[i] for i in range(n)] c1.sort() print '\n'.join(c1) # iterate starting with bwt again: L = sorted(bwt) for j in range(n-2): L.sort() L = [bwt[i]+L[i] for i in range(n)] print '\n'.join(L) + '\n' L = [e for e in L if not '^' in e] assert ''.join(L[0]) == text[:-1] |
Sunday, May 27, 2012
Blocks (7)
As long as I was thinking about NSAnimation (here), I wrote a command-line program that uses it. This is a minimal example.
In the process, I realized that the callback can be done with a block. I'm not sure what the approved method to do this would be; I looked at the delegate methods and they're not promising. I guess if we were instantiating the animation from an object we'd just pass self as an argument to the alternative init method. In a GUI app we'd use an IBOutlet and never need to do the explicit init in our code.
Output:
In the process, I realized that the callback can be done with a block. I'm not sure what the approved method to do this would be; I looked at the delegate methods and they're not promising. I guess if we were instantiating the animation from an object we'd just pass self as an argument to the alternative init method. In a GUI app we'd use an IBOutlet and never need to do the explicit init in our code.
Output:
> ./prog 2012-05-27 12:08:12.794 prog[7813:707] Progress: 0.20 2012-05-27 12:08:12.994 prog[7813:707] Progress: 0.40 2012-05-27 12:08:13.194 prog[7813:707] Progress: 0.60 2012-05-27 12:08:13.394 prog[7813:707] Progress: 0.80 2012-05-27 12:08:13.595 prog[7813:707] Progress: 1.00 |
// clang prog.m -o prog -framework Cocoa -fobjc-gc-only #import <Cocoa/Cocoa.h> typedef void (^B)(NSAnimationProgress p); B b = ^(NSAnimationProgress p){ NSLog(@"Progress: %.2f", p); }; @interface MyAnimation : NSAnimation @end @implementation MyAnimation B myBlock; - (id)init { self = [super initWithDuration:1.0 animationCurve:NSAnimationLinear]; if (self) { [self setFrameRate:5.0]; [self setAnimationBlockingMode:NSAnimationNonblocking]; } return self; } - (id)initWithBlock:(B)b { self = [self init]; myBlock = b; return self; } - (void)setCurrentProgress:(NSAnimationProgress)progress { myBlock(progress); } @end int main(int argc, char * argv[]) { NSAnimation *a = [[MyAnimation alloc] initWithBlock:b]; [a startAnimation]; NSRunLoop* rl = [NSRunLoop currentRunLoop]; [rl runUntilDate:[NSDate dateWithTimeIntervalSinceNow:1.2]]; return 0; } |
Bezier curve
I started on an exploration of the
NSBezierPath
function curveToPoint:controlPoint1:controlPoint2:
which is used to draw curves in Cocoa.The math is relatively easy to state (wikipedia). For a cubic curve, we'll have a total of four points, two where the curve starts and stops (P0 and P3) and two control points (P1 and P2). In the screenshots above, P0 and P3 are in black at the top and bottom and the two control points are in red. The blue circle shows the current point along the trajectory. The left one is at an early time, and the right one very late.
[ The purple circles show the current position for the two quadratic Bezier curves from which the cubic one may be constructed. ]
The curve is represented as a parametric equation in the variable
t
(standing for time, since curves are often thought of as trajectories).A standard equation is:
P0(1-t)3 + P1(3)(t)(1-t)2 + P2(3)(t)2(1-t) + P3(t)3
At t = 0 we're at P0 and at t = 1 we're at P3. Beyond that, it's hard to visualize. According to wikipedia:
The curve starts at P0 going toward P1 and arrives at P3 coming from the direction of P2. Usually, it will not pass through P1 or P2; these points are only there to provide directional information. The distance between P0 and P1 determines "how long" the curve moves into direction P2 before turning towards P3.
WritingBPi,Pj,Pk(t)
for the quadratic Bézier curve defined by points Pi, Pj, and Pk, the cubic Bézier curve can be defined as a linear combination of two quadratic Bézier curves:
B(t) = (1-t)BP0,P1,P2 + (t)BP1,P2, P3; 0 <= t <= 1
where
BP0,P1,P2 = P0(1-t)2 + P1(t)(1-t) + P2(t)2
As I say, I found it hard to visualize. So I wrote a little Xcode project that does two things. It presents an animation of the progress of the trajectory. At the moment it can be stopped and restarted, but not paused.
Also, the two control points P1 and P2 can be dragged to different positions to explore how they influence the curve.
I took the dragging and animation code from two earlier PyObjC projects (here and here).
My hope was to reproduce the animations shown in the wikipedia article, and see how the curves combine to produce the result. But I've run out of time for the moment, so I thought I'd just show my progress to date. The zipped project is on Dropbox (here).
View with no window
Scott Stevenson has two great posts about tutorials on drawing in Quartz 2D (here and here). I used his code to draw the graphic above.
One issue I've had was whether it's really necessary to use Core Graphics to draw to PDF from a command-line app. It's not, as this code shows. Thanks to a Stack Overflow answer for help.
// clang prog.m -o prog -framework Cocoa -fobjc-gc-only #import <Cocoa/Cocoa.h> //#import <AppKit/AppKit.h> @interface MyView : NSView @end @implementation MyView - (id)initWithFrame:(NSRect)frame { self = [super initWithFrame:frame]; if (self) { // Initialization code here. } return self; } - (void)drawRect:(NSRect)dirtyRect { // just paste Scott Stevenson's code here } @end int main(int argc, char * argv[]) { MyView *myView; NSRect f = NSMakeRect(0,0,400,400); myView = [[MyView alloc] initWithFrame:f]; NSData *data = [myView dataWithPDFInsideRect:f]; [data writeToFile:@"x.pdf" atomically:YES]; return 0; } |
Friday, May 25, 2012
Quartz (2)
This second Quartz example is really more about arrays than drawing. I made a simple category on NSArray that returns a random object from any array. We set up three arrays from which to grab random values for the color, the origin and the size of a rectangle to draw. Both the origins and sizes need to be wrapped into NSValue objects in order to store them into the arrays. The original structs can then be retrieved by reference.
In a similar way, the rgb and alpha values are recovered from the NSColor to provide them to CGContextSetRGBFillColor. In playing around with the code I decided to bias toward higher values of red. We use a venerable C construct for that.
// clang x.m -o prog -framework Cocoa -fobjc-gc-only #import <Cocoa/Cocoa.h> CGContextRef setup(){ NSString *s = [NSString stringWithUTF8String:"x.pdf"]; NSURL *url = [NSURL fileURLWithPath:s]; CGContextRef ctx = NULL; ctx = CGPDFContextCreateWithURL ( (CFURLRef)url, NULL, NULL); CGRect rect = CGRectMake (0, 0, 500, 400); CGContextBeginPage(ctx, &rect); return ctx; } @interface NSArray (RandomChoiceArray) - (id)getRandomObject; @end; @implementation NSArray (RandomChoiceArray) - (id)getRandomObject { int n = [self count]; int i = (random() % n); return [self objectAtIndex:i]; } - (void)testRandom:(int)n { for (int i = 0; i < n; i++) { NSLog(@"%@", [[self getRandomObject] description]); } } @end int main(int argc, char * argv[]) { srandom(time(NULL)); CGContextRef ctx = setup(); NSColorList *CL; CL = [NSColorList colorListNamed:@"Crayons"]; NSArray *colorNames = [CL allKeys]; //[colorNames testRandom:10]; NSSize hz, vt; hz = NSMakeSize(150,50); vt = NSMakeSize(50,150); NSValue *v, *h; h = [NSValue value:&hz withObjCType:@encode(NSSize)]; v = [NSValue value:&vt withObjCType:@encode(NSSize)]; NSArray *sizes = [NSArray arrayWithObjects:h, v, nil]; //[sizes testRandom:4]; NSMutableArray *ma; ma = [NSMutableArray arrayWithCapacity:10]; for (int i = 0; i < 9; i++) { for (int j = 0; j < 9; j++) { int x = i*50; int y = j*50; NSPoint pt = NSMakePoint(x,y); NSValue *p; p = [NSValue value:&pt withObjCType:@encode(NSPoint)]; [ma addObject:p]; } } NSArray *origins = [NSArray arrayWithArray:ma]; //[origins testRandom:10]; CGFloat r, g, b, a, m; m = 0.7; for (int i = 0; i < 250; i++) { NSString *c = [colorNames getRandomObject]; NSColor *color = [CL colorWithKey:c]; id obj = [origins getRandomObject]; NSPoint pt; [obj getValue:&pt]; NSSize sz; obj = [sizes getRandomObject]; [obj getValue:&sz]; CGRect rect = CGRectMake(pt.x,pt.y,sz.width,sz.height); //NSLog (@"%@", NSStringFromRect(r)); [color getRed:&r green:&g blue:&b alpha:&a]; r = r < m ? m : r; CGContextSetRGBFillColor (ctx,r,g,b,a); CGContextFillRect (ctx,rect); } CGPDFContextEndPage(ctx); CGContextRelease(ctx); return 0; } |
Quartz (1)
The next examples I want to do will be about drawing to a pdf in OS X. Currently, I've been compiling code and executing it in Terminal, to try to abstract the essentials away from all the heavy machinery of Xcode.
If we were using Xcode, and had an app with a GUI, then this could be as simple as calling a method of NSView:
NSRect r = [myView bounds];
NSData *data = [myView dataWithPDFInsideRect:r];
[data writeToFile:[sheet filename] atomically:YES];
But I wanted to write data to a pdf from a command line application. That led me to an example that used:
CGPDFContextCreateWithURL
as well as a family of functions to create a
CFURLRef
:CFURLCreateWithString
CFURLCreateWithFileSystemPath
These are functions, not methods of Objective-C objects. We're really in C-land here. The CF stands for Core Foundation. One aspect of this that I don't understand is that the docs talk about e.g. a
CFURL
and CFString
, but the actual types are CFURLRef and CFStringRef. I'm not sure why yet.CFURLCreateWithFileSystemPath
takes 4 arguments:CFURLRef CFURLCreateWithFileSystemPath (
CFAllocatorRef allocator,
CFStringRef filePath,
CFURLPathStyle pathStyle,
Boolean isDirectory
);
If you want to be fancy, you can pass
kCFAllocatorDefault
for the allocator, but this is really equivalent to NULL, and just tells the system to use the default allocator. The second argument is a CFStringRef
. See the code example below for how to make one of these. The third argument should be kCFURLPOSIXPathStyle (other options would be kCFURLHFSPathStyle and kCFURLWindowsPathStyle). And the last argumenet is false, because this is not for a directory.Since the function has "Create" in the name, we "own" the resulting object, and should release it when we don't need it anymore.
However, it turns out to be much simpler than this (although the above works). NSString and NSURL are "toll-free bridged" with CFStringRef and CFURLRef (docs), and can be substituted one for another in any function call.
It took me far too long to discover the secret of this (as the link above shows), you can substitute, but you must also cast, as in
(CFURLRef)url
.Finally, having a
CFURLRef
, we can get the graphics context appropriate for drawing to a pdf. We bracket the drawing code with calls to CGContextBeginPage
and CGContextEndPage
. The drawing itself is the Apple example and self-explanatory, except that it's important that the transparency (alpha) for the second rectangle is 0.5. That allows us to see some color come through from the layer below it.I puzzled over the fact that the right side of the figure isn't blue but more like purple. I finally realized that what's happening is the transparency works here too (and for some reason the background is white). The implication is that the order matters, if we draw the red rectangle last (with alpha = 0.5) the result is different, as shown.
An excellent beginner's tutorial for Quartz drawing is here.
// clang prog.m -o prog -framework Cocoa -fobjc-gc-only #import <Cocoa/Cocoa.h> int main(int argc, char * argv[]) { // method 1 CFStringRef cfstr = CFSTR("x.pdf"); CFURLRef urlref = CFURLCreateWithFileSystemPath ( kCFAllocatorDefault, cfstr, kCFURLPOSIXPathStyle, false); if (urlref == NULL) { NSLog(@"some problem with the url"); return 0; } // method 2 NSString *s = [NSString stringWithUTF8String:"x.pdf"]; NSURL *url = [NSURL fileURLWithPath:s]; urlref = (CFURLRef)url; CGContextRef myPDFContext = NULL; myPDFContext = CGPDFContextCreateWithURL ( urlref, NULL, NULL); CGRect rect = CGRectMake (0, 0, 500, 250); CGContextBeginPage(myPDFContext, &rect); CGRect r1, r2, r3, r4; r1 = CGRectMake (21,21,100,200); r2 = CGRectMake (21,21,200,100); CGContextSetRGBFillColor (myPDFContext,1,0,0,1); CGContextFillRect (myPDFContext, r1); CGContextSetRGBFillColor (myPDFContext,0,0,1,0.5); CGContextFillRect (myPDFContext, r2); r3 = CGRectMake (221,21,100,200); r4 = CGRectMake (221,21,200,100); CGContextSetRGBFillColor (myPDFContext,0,0,1,1); CGContextFillRect (myPDFContext,r4); CGContextSetRGBFillColor (myPDFContext,1,0,0,0.5); CGContextFillRect (myPDFContext, r3); CGContextEndPage(myPDFContext); //CFRelease(urlref); CGContextRelease(myPDFContext); return 0; } |
Blocks (6)
I decided to work up two more examples from Mike Ash's Friday Q&A about blocks.
Both examples involve a callback. Your program sets up a situation where you'd like to defer the execution of some code to a later time. The classic example is a file dialog. You're asking the user for input, but don't necessarily halt the program until he makes up his mind. Another one is where you have a bunch of calculations that could be done concurrently. I still have to work up a good example of that.
Here we have a notification in the first example, and a timer in the second. The implementations use a category on NSObject. This works because a block is an Objective-C object. An alternative approach would be to use a function rather than a method.
The examples may look a bit strange. The block object is added as the observer, and its method
Notice the use of
Both examples involve a callback. Your program sets up a situation where you'd like to defer the execution of some code to a later time. The classic example is a file dialog. You're asking the user for input, but don't necessarily halt the program until he makes up his mind. Another one is where you have a bunch of calculations that could be done concurrently. I still have to work up a good example of that.
Here we have a notification in the first example, and a timer in the second. The implementations use a category on NSObject. This works because a block is an Objective-C object. An alternative approach would be to use a function rather than a method.
The examples may look a bit strange. The block object is added as the observer, and its method
my_callBlockWithObject
or my_CallBlock
is called when the notification appears or the timer finishes. It looks strange because in the method, the block object is obtained with self and then called.Notice the use of
[block copy]
.// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #import |
> ./prog2012-05-21 07:40:28.601 prog[3441:707] Note! |
// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #import |
> ./prog 2012-05-21 07:41:00.457 prog[3455:707] Hello world! |
Saturday, May 19, 2012
Blocks (5)
Here is a last example of basic blocks in Objective-C. It comes from Mike Ash's excellent Friday Q&A about blocks. We implement the functional programming tool
The map method takes a block as its single argument. This is the fundamental unit of work, which in the example will operate on one element in an array of NSStrings, adding @"x" to it. The block argument to map is declared with an id argument and an id return value; these will be NSStrings when the block is defined at runtime. A different block could be fed to the same map function, and the elements wouldn't have to be strings.
We use
map
as a category on NSArray.The map method takes a block as its single argument. This is the fundamental unit of work, which in the example will operate on one element in an array of NSStrings, adding @"x" to it. The block argument to map is declared with an id argument and an id return value; these will be NSStrings when the block is defined at runtime. A different block could be fed to the same map function, and the elements wouldn't have to be strings.
map
applies the block to each element of the array, storing the results in an NSMutableArray and returning it.We use
enumerateObjectsUsingBlock:
to examine the values, although we could just use for (id obj in array)
.// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> @interface NSArray (MappingArray) - (NSArray *)map:(id (^)(id))block; @end; @implementation NSArray (MappingArray) - (NSArray *)map:(id (^)(id))block { // takes an id, returns an id NSMutableArray *ret = [NSMutableArray array]; for(id obj in self) [ret addObject:block(obj)]; return ret; } @end int main(int argc, char * argv[]) { NSArray *a, *b; a = [NSArray arrayWithObjects:@"a",@"b",@"c",nil]; b = [a map:^(id obj){ return [obj stringByAppendingString:@"x"]; }]; [b enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) { NSLog(@"%@", obj); } ]; return 0; } |
> ./prog 2012-05-19 10:00:13.537 prog[3426:707] ax 2012-05-19 10:00:13.539 prog[3426:707] bx 2012-05-19 10:00:13.540 prog[3426:707] cx |
Friday, May 18, 2012
Blocks (4)
In the spirit of quick Objective-C, here is another example using blocks. We feed a bunch of blocks to NSOperationQueue. According to the docs
So then the question is what can the block do? In this example, we get values from a loop counter and also call a function defined in the same file.
A run loop is required in order to see any output (and the output is truncated if the run loop's time interval is shortened).
Output:
From what I understand, I expected the first block to be in a different place than the copy, but they both appear to be on the heap. The first two variables,
This isn't a particularly useful example. I have to look into things some more to find a good toy example for concurrent execution.
The block should take no parameters and have no return value.
So then the question is what can the block do? In this example, we get values from a loop counter and also call a function defined in the same file.
A run loop is required in order to see any output (and the output is truncated if the run loop's time interval is shortened).
// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> void f() { printf("."); } typedef void (^B)(); int main(int argc, char * argv[]) { NSOperationQueue *q = [NSOperationQueue mainQueue]; int N = 100; __block int c = 0; NSLog(@"%p", &f); NSLog(@"%p", &main); NSLog(@"%p", &c); B b = ^() { f(); c++; if (c && !(c%(N/10))) { printf("c = %3d\n", c); } }; NSLog(@"%p", &b); B b2 = [b copy]; NSLog(@"%p", &b2); for (int i = 0; i < N; i++) { [q addOperation:[NSBlockOperation blockOperationWithBlock:b]]; } NSRunLoop* rl = [NSRunLoop currentRunLoop]; NSLog(@"%p", &rl); [rl runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.2]]; return 0; } |
> ./prog 2012-05-19 08:53:26.881 prog[2988:707] 0x103da99c0 2012-05-19 08:53:26.883 prog[2988:707] 0x103da99e0 2012-05-19 08:53:26.883 prog[2988:707] 0x7fff639a7ba8 2012-05-19 08:53:26.884 prog[2988:707] 0x7fff639a7b88 2012-05-19 08:53:26.884 prog[2988:707] 0x7fff639a7b88 2012-05-19 08:53:26.885 prog[2988:707] 0x7fff639a7b50 2012-05-19 08:53:26.887 prog[2988:707] 0x7fff639a7b40 ..........c = 10 ..........c = 20 ..........c = 30 ..........c = 40 ..........c = 50 ..........c = 60 ..........c = 70 ..........c = 80 ..........c = 90 ..........c = 100 |
main
and f
, are on the stack.This isn't a particularly useful example. I have to look into things some more to find a good toy example for concurrent execution.
Blocks (3)
Here is one more simple use of blocks in Objective-C (previous posts here and here).
The docs for NSArray give these three functions using blocks:
As the reference indicates, these methods require a particular form of block with this signature:
(Apple uses
Another NSArray method that uses a block is
Both versions give the same output:
One more NSArray method that uses a block is
Output:
The docs for NSArray give these three functions using blocks:
enumerateObjectsUsingBlock:
enumerateObjectsWithOptions:usingBlock:
enumerateObjectsAtIndexes:options:usingBlock:
As the reference indicates, these methods require a particular form of block with this signature:
^(id s, NSUInteger i, BOOL *stop)
(Apple uses
idx
for i
). The stop
flag is meant to tbe used to bail from an iteration when the first matching object is found. The index is not a loop counter but shows the index of the current object within the array.// clang blocks2.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> typedef void (^B) (id s, NSUInteger idx, BOOL *stop); int main(int argc, char * argv[]){ B b = ^(id s, NSUInteger i, BOOL *stop) { NSLog(@"i = %lu, pre : s = %@", i, s); if ([s caseInsensitiveCompare:@"B"] == 0) { NSLog(@"setting stop, s = %@", s); *stop = YES; } NSLog(@"post: s = %@", s); }; NSArray *arr = [NSArray arrayWithObjects:@"a",@"B",@"c",@"d",nil]; [arr enumerateObjectsUsingBlock:b]; printf("------------------------------\n"); //NSUint o = NSEnumerationReverse; uint o = NSEnumerationReverse; [arr enumerateObjectsWithOptions:o usingBlock:b]; printf("------------------------------\n"); NSIndexSet *is; NSRange R = NSMakeRange(2,2); is = [NSIndexSet indexSetWithIndexesInRange:R]; [arr enumerateObjectsAtIndexes:is options:NO usingBlock:b]; return 0; } |
> ./prog 2012-05-18 09:02:07.791 prog[382:707] i = 0, pre : s = a 2012-05-18 09:02:07.793 prog[382:707] post: s = a 2012-05-18 09:02:07.794 prog[382:707] i = 1, pre : s = B 2012-05-18 09:02:07.795 prog[382:707] setting stop, s = B 2012-05-18 09:02:07.795 prog[382:707] post: s = B ------------------------------ 2012-05-18 09:02:07.796 prog[382:707] i = 3, pre : s = d 2012-05-18 09:02:07.796 prog[382:707] post: s = d 2012-05-18 09:02:07.797 prog[382:707] i = 2, pre : s = c 2012-05-18 09:02:07.797 prog[382:707] post: s = c 2012-05-18 09:02:07.798 prog[382:707] i = 1, pre : s = B 2012-05-18 09:02:07.798 prog[382:707] setting stop, s = B 2012-05-18 09:02:07.799 prog[382:707] post: s = B ------------------------------ 2012-05-18 09:02:07.799 prog[382:707] i = 2, pre : s = c 2012-05-18 09:02:07.800 prog[382:707] post: s = c 2012-05-18 09:02:07.800 prog[382:707] i = 3, pre : s = d 2012-05-18 09:02:07.801 prog[382:707] post: s = d |
Another NSArray method that uses a block is
sortedArrayUsingComparator
. Here's an example adapted from the Apple docs, followed by a slightly modified approach.// clang blocks3.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> void show(NSArray *a) { for (id obj in a) { NSLog(@"%@", [obj description]); } } int main(int argc, char * argv[]) { NSArray *a; a = [NSArray arrayWithObjects:@"1",@"21",@"12",@"11",@"02",nil]; NSStringCompareOptions o1, o2, o3, o4; o1 = NSCaseInsensitiveSearch; o2 = NSNumericSearch; o3 = NSWidthInsensitiveSearch; o4 = NSForcedOrderingSearch; NSStringCompareOptions o = o2; NSLocale *loc = [NSLocale currentLocale]; NSComparator cmp = ^(id s1, id s2) { NSRange R = NSMakeRange(0, [s1 length]); return [s1 compare:s2 options:o range:R locale:loc]; }; NSArray *result = [a sortedArrayUsingComparator:cmp]; show(result); NSComparator cmp2 = ^(id s1, id s2) { NSNumber *x = [NSNumber numberWithInt:[s1 intValue]]; NSNumber *y = [NSNumber numberWithInt:[s2 intValue]]; return (NSComparisonResult)[x compare:y]; }; result = [a sortedArrayUsingComparator:cmp2]; show(result); return 0; } |
Both versions give the same output:
2012-05-18 13:31:41.071 prog[902:707] 1 2012-05-18 13:31:41.073 prog[902:707] 02 2012-05-18 13:31:41.074 prog[902:707] 11 2012-05-18 13:31:41.074 prog[902:707] 12 2012-05-18 13:31:41.075 prog[902:707] 21 |
One more NSArray method that uses a block is
indexesOfObjectsPassingTest
. This example is also adapted from the Apple docs. It's followed by use of a block and enumerateIndexesUsingBlock
to enumerate the resulting NSIndexSet.// clang blocks5.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> int main(int argc, char * argv[]) { NSString *s = [NSString stringWithFormat:@"A B C A B Z G Q"]; NSArray *a = [s componentsSeparatedByString:@" "]; NSSet *filterSet = [NSSet setWithObjects:@"A", @"C", @"Q", nil]; BOOL (^test) (id obj, NSUInteger idx, BOOL *stop); test = ^ (id obj, NSUInteger idx, BOOL *stop) { if (idx < 5) { if ([filterSet containsObject:obj]) { return YES; } } return NO; }; NSIndexSet *iset = [a indexesOfObjectsPassingTest:test]; [iset enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) { NSLog(@"%lu", idx); } ]; return 0; } |
> ./prog 2012-05-18 17:03:26.348 prog[1477:707] 0 2012-05-18 17:03:26.351 prog[1477:707] 2 2012-05-18 17:03:26.351 prog[1477:707] 3 |
Thursday, May 17, 2012
Blocks (2)
Continuing from the previous post, here we see the ability of a block to access (and modify) variables in the enclosing scope. The first type of block takes an int, multiplies it by a second int defined in main but not passed as an argument, and returns the result. It also accesses the variable c (a char *) in main, and changes its value. The change happens before the second block runs as shown by the result of
Output:
strcmp
, which returns 0 for a match, resulting in the assignment of c to a third value.// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #import <Foundation/Foundation.h> typedef int (^A) (int x); typedef void (^B) (); int main(int argc, char * argv[]){ int y = 7; __block char *c = "abc\0"; printf("initial: %s\n", c); const char *c2 = "def\0"; A a = ^(int x) { c = (char *) c2; return x*y; }; B b = ^() { if (!(strcmp(c,c2))) { c = "ghi\0"; } }; printf("a(2) returns: %i\n", a(2)); printf("before b(): %s\n", c); b(); printf("after b(): %s\n", c); return 0; } |
Output:
> ./prog initial: abc a(2) returns: 14 before b(): def after b(): ghi |
Blocks (1)
Lately I've been working on improving my Objective-C coding skills. Although I like PyObjC a lot, it has some issues. I'm thinking I will post some of my exercises on the blog under
To help focus my efforts, I bought a copy of Aaron Hillegass's book
Objective-C Programming.
I like this book a lot. It's a basic introduction that has lots of C as well, but the material has been finely honed by many hours in the classroom. It doesn't cover GUI stuff very much, that's for his other book, the latest edition of which is by Hillegass & Preble.
In order to keep things simple I am not using Xcode. I just invoke the clang compiler from the command line, usually with a
The example for today is a very basic block, which comes rather late in the book. But before we do that, here is an example of a function pointer in C. We have, in order, a function that takes a single
The output is as expected:
The typedef allows us to declare variables of type fptr. Since the signature for speak matches, we can assign f to the address of speak, and call it just as we would the original function. Either the pointer (or the original function) can be passed into g.
A basic block is declared and defined in a way similar to this. It turns a function into an object that can be passed around, for example, into functions. Of course, blocks are much more powerful than this (e.g. bbum's post).
Here is a basic block. We go to the trouble of making a typedef for this type of block, and define a function f that takes such a block as an argument.
The similarity to function pointers is pretty clear. The typedef isn't necessary, of course. And if a block is to be used only once, it might even be anonymous, something like a Python lambda.
No typedef:
Anonymous versions:
It's worth pointing out that the return type is not required to be specified.
An standard example of an anonymous block is to handle the result of a file save dialog (as described in my post here).
I'll show an example of capturing state from the enclosing scope in another post.
Quick Objective-C
.To help focus my efforts, I bought a copy of Aaron Hillegass's book
Objective-C Programming.
I like this book a lot. It's a basic introduction that has lots of C as well, but the material has been finely honed by many hours in the classroom. It doesn't cover GUI stuff very much, that's for his other book, the latest edition of which is by Hillegass & Preble.
In order to keep things simple I am not using Xcode. I just invoke the clang compiler from the command line, usually with a
#import <Foundation/Foundation.h>
, and I use garbage collection.The example for today is a very basic block, which comes rather late in the book. But before we do that, here is an example of a function pointer in C. We have, in order, a function that takes a single
const char *
argument, a typedef for a function pointer for functions of that signature, and a function that takes such a function pointer as an argument.// clang fptr.c -o prog #include <stdio.h> void speak(const char *s) { printf("%s", s); } typedef void (*fptr)(const char *); void g(fptr f, const char *s){ f(s); } int main(int argc, const char *argv[]){ speak("abc\n"); fptr f = &speak; f("def\n"); g(f, "ghi\n"); g(speak, "jkl\n"); return 0; } |
The output is as expected:
> ./prog abc def ghi jkl |
The typedef allows us to declare variables of type fptr. Since the signature for speak matches, we can assign f to the address of speak, and call it just as we would the original function. Either the pointer (or the original function) can be passed into g.
A basic block is declared and defined in a way similar to this. It turns a function into an object that can be passed around, for example, into functions. Of course, blocks are much more powerful than this (e.g. bbum's post).
Here is a basic block. We go to the trouble of making a typedef for this type of block, and define a function f that takes such a block as an argument.
// clang blocks.m -o prog -framework Foundation -fobjc-gc-only #include <Foundation/Foundation.h> typedef void (^B) (const char *s); void f(B b, const char *s){ b(s); } int main(int argc, char * argv[]) { B b = ^(const char *s) { printf("%s", s); }; f(b,"abc\n"); return 0; } |
> ./prog abc |
The similarity to function pointers is pretty clear. The typedef isn't necessary, of course. And if a block is to be used only once, it might even be anonymous, something like a Python lambda.
No typedef:
retval (^block_var) (named_args) = ^(named_args) { ..code.. };
Anonymous versions:
[obj methodWithArg:arg1 block:^(named_args) { ..code.. }];
[obj methodWithArg:arg1 block:(retval (^)(named_args)) { ..code.. }];
It's worth pointing out that the return type is not required to be specified.
An standard example of an anonymous block is to handle the result of a file save dialog (as described in my post here).
[savePanel beginSheetModalForWindow:[NSApp mainWindow] completionHandler:^(NSInteger result) { if (result == NSOKButton) { [savePanel orderOut:self]; [self application:NSApp saveFile:[savePanel filename]]; } }]; |
I'll show an example of capturing state from the enclosing scope in another post.
Thursday, May 3, 2012
Unicode is weird fun!
I came across something fun in an SO answer the other day:
That's pretty entertaining! I pasted it into the Python interpreter:
According to this, the flipped version of
But it's going to take some 'splainin'.
This is a great place to start. The thing about Unicode is that it is fundamentally just a listing of integer values and the corresponding "code points." Just take all the interesting symbols you can find and organize them and then assign an integer to one after another. That's not hard to understand. What's hard is when you get into representing these integers in multi-byte encodings (e.g. UTF-8) and, of course, in printing them.
We can get some info about this in Python using the unicodedata module.
The unicode "value" is 2*16*16 + 5*16 = 512 + 80 = 592.
What makes it complicated is the gyrations involved in representing this in various other formats, like UTF-8 or HTML ('&' + '#592;'). I grabbed the defs for the flipTable from the reference above.
Then I went exploring in Python (see the script for details):
Many of the "flips" are silly ('q' for 'b' and so on), but lots of them are absolutely right. Finally, my Python doesn't seem to know about anything
>>>
˙sɯǝlqoɹd ʎuɐ pɐɥ ɹǝʌǝu ǝʌ,I puɐ uʍop ǝpısdn pǝʇunoɯ ǝɹɐ sǝɥɔʇıʍs ʞɹoʍʇǝu ʎɯ ɟo ll∀'
That's pretty entertaining! I pasted it into the Python interpreter:
>>> s = '˙sɯǝlqoɹd ʎuɐ pɐɥ ɹǝʌǝu ǝʌ,I puɐ uʍop ǝpısdn pǝʇunoɯ ǝɹɐ sǝɥɔʇıʍs ʞɹoʍʇǝu ʎɯ ɟo ll∀' >>> s '\xcb\x99s\xc9\xaf\xc7\x9dlqo\xc9\xb9d \xca\x8eu\xc9\x90 p\xc9\x90\xc9\xa5 \xc9\xb9\xc7\x9d\xca\x8c\xc7\x9du \xc7\x9d\xca\x8c,I pu\xc9\x90 u\xca\x8dop \xc7\x9dp\xc4\xb1sdn p\xc7\x9d\xca\x87uno\xc9\xaf \xc7\x9d\xc9\xb9\xc9\x90 s\xc7\x9d\xc9\xa5\xc9\x94\xca\x87\xc4\xb1\xca\x8ds \xca\x9e\xc9\xb9o\xca\x8d\xca\x87\xc7\x9du \xca\x8e\xc9\xaf \xc9\x9fo ll\xe2\x88\x80' |
According to this, the flipped version of
'a' ('0x61')
is '\u0250'
, in UTF-8 that's '\xc9\x90'
. (There are a bunch of other mappings). You can see these two bytes in the hex data right after ' p'
: '\xc9\x90'
. But it's going to take some 'splainin'.
This is a great place to start. The thing about Unicode is that it is fundamentally just a listing of integer values and the corresponding "code points." Just take all the interesting symbols you can find and organize them and then assign an integer to one after another. That's not hard to understand. What's hard is when you get into representing these integers in multi-byte encodings (e.g. UTF-8) and, of course, in printing them.
We can get some info about this in Python using the unicodedata module.
>>> import unicodedata >>> i = 592 >>> hex(i) '0x250' >>> c = unichr(i) >>> print c ɐ >>> c.encode('utf8') '\xc9\x90' >>> '\xc9\x90'.decode('utf8') u'\u0250' >>> unicodedata.name(c) 'LATIN SMALL LETTER TURNED A' >>> |
The unicode "value" is 2*16*16 + 5*16 = 512 + 80 = 592.
What makes it complicated is the gyrations involved in representing this in various other formats, like UTF-8 or HTML ('&' + '#592;'). I grabbed the defs for the flipTable from the reference above.
Then I went exploring in Python (see the script for details):
> python script.py a from flipTable: ɐ 592 \u0250 LATIN SMALL LETTER TURNED A from unicodedata: ɐ (592, '0x250', u'\u0250', 'LATIN SMALL LETTER TURNED A') ᵄ (7492, '0x1d44', u'\u1d44', 'MODIFIER LETTER SMALL TURNED A') .. e from flipTable: ǝ 477 \u01DD LATIN SMALL LETTER TURNED E from unicodedata: ǝ (477, '0x1dd', u'\u01dd', 'LATIN SMALL LETTER TURNED E') ᴈ (7432, '0x1d08', u'\u1d08', 'LATIN SMALL LETTER TURNED OPEN E') ᵌ (7500, '0x1d4c', u'\u1d4c', 'MODIFIER LETTER SMALL TURNED OPEN E') ⱻ (11387, '0x2c7b', u'\u2c7b', 'LATIN LETTER SMALL CAPITAL TURNED E') |
Many of the "flips" are silly ('q' for 'b' and so on), but lots of them are absolutely right. Finally, my Python doesn't seem to know about anything
i >= 2**16
.>>> i = 2**16 >>> i 65536 >>> unichr(i) Traceback (most recent call last): File " |
>>>
script.py
from collections import defaultdict import unicodedata from flip import flipTable letters = 'abcdefghijklmnopqrstuvwxyz' L = list(letters) D = defaultdict(list) def caps(s): L = [c.capitalize() for c in s] return ''.join(L) for i in range(1,2**16): u = unichr(i) try: n = unicodedata.name(u) except ValueError: continue v = n.lower() if 'turned' in v: k = v.split()[-1] if len(k) > 1: continue if not 'small' in v: k = k.upper() D[k].append((i,hex(i),u,n)) for s in letters: print s c = '\u00' + caps(hex(ord(s))[2:]) try: v = flipTable[c] i = int('0x' + v[2:],16) u = unichr(i) print 'from flipTable:' print u.ljust(4), i, v, print unicodedata.name(u) except KeyError: pass if D[s]: print 'from unicodedata:' for e in D[s]: print e[2].ljust(4), e print '-' * 10 |
Tuesday, May 1, 2012
gpg
This post is about the Gnu Privacy Guard (gpg). A link from the main page leads to a download of an installer for OS X here.
I followed the Quick start tutorial here, and sent an encrypted email between two of my accounts using Mail after just a few minutes. Nice!
The lock and signature icons in the lower right-hand corner are only active if the recipient's public key is available.
Let's explore a bit in Terminal
generate a new key with a spectacularly weak passphrase: abc
I have a feeling that this fake user's data may have been sent to the key server, and that's not nice. But I expected to get a prompt related to that and I didn't get one.
Let's take a quick look at what we have:
-a is "ascii armored" base64 output:
While this looks quite similar to RSA (my posts), the Python rsa module won't handle it as is. I didn't dissect it by hand yet. There's something called pgpdump that is recommended on the web.
It builds easily and then I can do:
but I didn't find the output all that useful yet. It shows multiple values for n and e but doesn't show d or p and q.
Try some encryption:
I found what looks to be an excellent tutorial on the web. I still need to work through it.
I followed the Quick start tutorial here, and sent an encrypted email between two of my accounts using Mail after just a few minutes. Nice!
The lock and signature icons in the lower right-hand corner are only active if the recipient's public key is available.
Let's explore a bit in Terminal
generate a new key with a spectacularly weak passphrase: abc
> gpg --gen-key gpg (GnuPG/MacGPG2) 2.0.18; Copyright (C) 2011 Free Software Foundation, Inc. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Please select what kind of key you want: (1) RSA and RSA (default) (2) DSA and Elgamal (3) DSA (sign only) (4) RSA (sign only) Your selection? 1 RSA keys may be between 1024 and 4096 bits long. What keysize do you want? (2048) Requested keysize is 2048 bits Please specify how long the key should be valid. 0 = key does not expire |
I have a feeling that this fake user's data may have been sent to the key server, and that's not nice. But I expected to get a prompt related to that and I didn't get one.
Let's take a quick look at what we have:
> gpg -k Alice pub 2048R/6CC18DC3 2012-04-29 [expires: 2016-04-28] uid Alice |
-a is "ascii armored" base64 output:
> gpg -a --export 6CC18DC3 -----BEGIN PGP PUBLIC KEY BLOCK----- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org mQENBE+dTEUBCADWaa0ikPdHmp7OONsuhUJeIIr.. .. -----END PGP PUBLIC KEY BLOCK----- > > gpg -a --export-secret-key 6CC18DC3 -----BEGIN PGP PRIVATE KEY BLOCK----- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org lQO+BE+dTEUBCADWaa0ikPdHmp7OONsuhUJeIIr.. .. -----END PGP PRIVATE KEY BLOCK----- |
While this looks quite similar to RSA (my posts), the Python rsa module won't handle it as is. I didn't dissect it by hand yet. There's something called pgpdump that is recommended on the web.
It builds easily and then I can do:
gpg --export 6CC18DC3 | pgpdump -i gpg --export 6CC18DC3-secret-key | pgpdump -i |
but I didn't find the output all that useful yet. It shows multiple values for n and e but doesn't show d or p and q.
Try some encryption:
> gpg -e -r 6CC18DC3 m.txt > hexdump -C m.txt 00000000 68 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a |hello, world!.| 0000000e > hexdump -C m.txt.gpg 00000000 85 01 0c 03 a9 47 9a 63 dc 9c 92 b3 01 07 ff 57 |.....G.c.......W| 00000010 86 9b 5d a0 40 d1 f0 ef 5a 6f dc eb 19 a9 eb 8c |..].@...Zo......| 00000020 ee 66 a7 34 84 e4 47 5b 6c 48 9f 9e 89 13 4c 2a |.f.4..G[lH....L*| 00000030 71 6a 31 b7 27 23 9d 56 a7 c2 ad fd db 47 57 68 |qj1.'#.V.....GWh| 00000040 da 75 9a 2d 2f f6 46 60 16 84 b6 17 bf e7 b7 5c |.u.-/.F`.......\| 00000050 36 fd d1 e2 22 ee 93 dc ad 82 f5 f1 46 99 12 f3 |6...".......F...| 00000060 fe 25 a1 b3 01 8c 37 a0 59 da ac 39 90 a4 1c ba |.%....7.Y..9....| 00000070 a0 4f 1e b6 da d5 36 55 b1 17 d6 c4 5a 28 de b4 |.O....6U....Z(..| 00000080 47 b2 af 8a c8 9c 58 85 44 f8 08 fe a1 47 c3 8f |G.....X.D....G..| 00000090 4d b1 78 50 87 dc a7 7f 55 89 f2 6e 7f 75 ae a0 |M.xP....U..n.u..| 000000a0 69 68 46 5a 64 1e b4 6e c7 ee 84 77 8d a4 ce 14 |ihFZd..n...w....| 000000b0 72 45 13 be d0 33 5c d6 23 6f 2d b2 84 2f d9 55 |rE...3\.#o-../.U| 000000c0 f7 de d2 8f b6 20 5b 71 4e 31 ae b8 d7 1b 09 bf |..... [qN1......| 000000d0 80 9e e0 1f 47 cb 73 a1 59 42 81 24 1f 2b de 4b |....G.s.YB.$.+.K| 000000e0 0d 23 fc c6 a2 83 5e c2 b3 e5 9f 1f 32 ae 75 07 |.#....^.....2.u.| 000000f0 79 7f 51 49 02 80 a8 47 c4 5c b6 6f aa ac d4 5c |y.QI...G.\.o...\| 00000100 e7 c9 b6 1f d2 c1 7e 03 45 34 59 85 d1 63 01 d2 |......~.E4Y..c..| 00000110 4e 01 27 2e e9 09 aa 82 5d 77 56 82 22 4e 2e 67 |N.'.....]wV."N.g| 00000120 1c 4a bc ba c1 43 d6 f0 86 02 5d e7 b3 58 74 79 |.J...C....]..Xty| 00000130 bc 15 69 d4 44 ba f6 76 0c a7 a1 d5 9b 1b e0 8b |..i.D..v........| 00000140 b6 7b df db b0 5f e6 34 0b 36 14 0b fd c6 62 f3 |.{..._.4.6....b.| 00000150 16 a8 97 ad 92 e7 4e a4 ee ab 59 53 91 c6 52 |......N...YS..R| 0000015f > gpg -d -o p.txt m.txt.gpg You need a passphrase to unlock the secret key for user: "Alice |
I found what looks to be an excellent tutorial on the web. I still need to work through it.
pycrypto
Here's a Quick Python post about encryption using the
I tried
Let's try a little DES:
Here is a different mode of DES called CFB (Cipher Feedback). It requires some random bytes, and two instantiations of the object:
The venerable xor method is also present:
I adapted one of the scripts from Laurent Luce's post to do encryption on a file. Here is the output (the hex values of the key we generated), and some display of the data using
pycrypto
module. Much of it is derived from the excellent overview here.I tried
easy_install
first, but it hung for some reason, so I just did:git clone https://github.com/dlitz/pycrypto.git cd pycrypto python setup.py build sudo python setup.py install |
Let's try a little DES:
>>> from Crypto.Cipher import DES >>> des = DES.new('01234567',DES.MODE_ECB) >>> text = 'hello, world' >>> c = des.encrypt(text) Traceback (most recent call last): File " |
Here is a different mode of DES called CFB (Cipher Feedback). It requires some random bytes, and two instantiations of the object:
>>> from Crypto.Cipher import DES >>> from Crypto import Random >>> iv = Random.get_random_bytes(8) >>> des1 = DES.new('01234567', DES.MODE_CFB, iv) >>> des2 = DES.new('01234567', DES.MODE_CFB, iv) >>> text = 'hello, world!' >>> c = des1.encrypt(text) >>> c '\x92\xc6\xc7K=\xb0\xf4\x83A\xfd\xa4\x13e' >>> des2.decrypt(c) 'hello, world!' |
The venerable xor method is also present:
>>> from Crypto.Cipher import XOR >>> key = '\x00'*4 + '\x01'*4 + '\x10'*4 + '\x11'*4 >>> xor = XOR.new(key) >>> text = '\x00\x01\x10\x11' * 4 >>> c = xor.encrypt(text) >>> p = xor.decrypt(c) >>> text '\x00\x01\x10\x11\x00\x01\x10\x11\x00\x01\x10\x11\x00\x01\x10\x11' >>> key '\x00\x00\x00\x00\x01\x01\x01\x01\x10\x10\x10\x10\x11\x11\x11\x11' >>> c '\x00\x01\x10\x11\x01\x00\x11\x10\x10\x11\x00\x01\x11\x10\x01\x00' >>> p '\x00\x01\x10\x11\x00\x01\x10\x11\x00\x01\x10\x11\x00\x01\x10\x11' |
I adapted one of the scripts from Laurent Luce's post to do encryption on a file. Here is the output (the hex values of the key we generated), and some display of the data using
hexdump
:> python script.py 21 31 e4 25 b1 ab be fb 3d 35 95 f7 8b 67 ba 24 > hexdump -C m.txt 00000000 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a |Hello, world!.| 0000000e > hexdump -C c.txt 00000000 25 bc 08 25 ae 7c fb c3 3a a3 91 83 6a 34 7c 06 |%..%.|..:...j4|.| 00000010 > hexdump -C p.txt 00000000 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0a 20 20 |Hello, world!. | 00000010 |
script.py
from Crypto import Random from Crypto.Cipher import DES3 import struct def encrypt_file(ifn, ofn, chunk_size, key, iv): des3 = DES3.new(key, DES3.MODE_CFB, iv) with open(ifn, 'r') as in_file: with open(ofn, 'w') as out_file: while True: chunk = in_file.read(chunk_size) if len(chunk) == 0: break elif len(chunk) % 16 != 0: chunk += ' ' * (16 - len(chunk) % 16) out_file.write(des3.encrypt(chunk)) def decrypt_file(ifn, ofn, chunk_size, key, iv): des3 = DES3.new(key, DES3.MODE_CFB, iv) with open(ifn, 'r') as in_file: with open(ofn, 'w') as out_file: while True: chunk = in_file.read(chunk_size) if len(chunk) == 0: break out_file.write(des3.decrypt(chunk)) SZ = 100 ifn = 'm.txt' ofn = 'c.txt' key = Random.get_random_bytes(16) iv = Random.get_random_bytes(8) L = [struct.unpack('B',k) for k in key] L = [hex(t[0])[2:] for t in L] print ' '.join(L) encrypt_file(ifn, ofn, SZ, key, iv) decrypt_file(ofn, 'p.txt', SZ, key, iv) |
bytearray
I came across a post by Python guru David Beazley introducing the bytearray data type. Here is one of his examples, modified slightly:
The usage
A bytearray has the methods of lists including
We can use a bytearray to do a small job of encryption in a very convenient way:
>>> s = "Hello World" >>> b = bytearray(s) >>> b bytearray(b'Hello World') >>> b[:5] = "Cruel" >>> b bytearray(b'Cruel World') >>> for i in b: ... print i ... 67 114 117 101 108 32 87 111 114 108 100 >>> b.append(33) >>> b bytearray(b'Cruel World!') >>> print b Cruel World! |
The usage
b"mystring"
was new to me as well. According to this SO answer:the b prefix for string literals is ineffective in 2.6, but it is a useful marker in the program, which flags explicitly the intent of the programmer to have the string as a data string rather than a text string. This info can then be used by the 2to3 converter or similar utilities when the program is ported to Py3k.More info here (thanks, Ned).
A bytearray has the methods of lists including
__getitem__
and __setitem__
. Unusually for Python, one can use arguments of both int and str types. The actual type of a bytearray item is < type 'int'>
.>>> b = bytearray('Hello World!') >>> b[-1] = '*' >>> b bytearray(b'Hello World*') >>> ord('!') 33 >>> b[-1] = 33 >>> b bytearray(b'Hello World!') |
We can use a bytearray to do a small job of encryption in a very convenient way:
>>> import numpy as np >>> msg = bytearray("Hello World") >>> n = len(msg) >>> L = np.random.randint(0,256,n) >>> key = bytearray(list(L)) >>> len(key) 11 >>> key bytearray(b'N\x963\x8a\x06\xf2\xe9\x92\xb9\xf4L') >>> >>> >>> ctext = bytearray() >>> for c,k in zip(msg,key): ... ctext.append(c ^ k) ... >>> ctext bytearray(b'\x06\xf3_\xe6i\xd2\xbe\xfd\xcb\x98(') >>> ptext = bytearray() >>> for c,k in zip(ctext,key): ... ptext.append(c ^ k) ... >>> ptext bytearray(b'Hello World') |
Monday, April 30, 2012
KS test (2)
A bit more about the Kolmogorov-Smirnov test after the first post here. I found a pdf at MIT ocw about this. I don't understand much of it but it's clear that using the absolute value of the difference to get the statistic is correct.
Now we need to figure out the distribution of the statistic under the null hypothesis, so we can set our type I error at 0.05. From my perspective, the way to do this is to do a simulation in Python. The code below does the following N times: grab n=20 numbers from the standard normal and sort them, compute the D statistic as we did in the previous post, and accumulate those values in a list.
We sort the list, then find the place in the list where the statistic for the test case is found. (I used a function from the bisect module for this, but the naive method gives the same result).
I get a p-value of 0.25. This does not match what we had before (0.36), and I am not sure why that is.
To do a little more checking, I found an online reference that gives p-values for various statistic,n combinations.
for n = 20
p-value D 0.20 0.210 0.15 0.246 0.10 0.264 0.05 0.294 0.01 0.356 |
According to this, a statistic of 0.20 (what we had) gives a p-value a bit larger than 0.20, but not much. So this source doesn't agree with what we got from R and Python. Maybe the online source is for a one-sided test (it doesn't state clearly), that would give a smaller p-value for a given statistic.
I added calculations for our simulation of d given various p, and vice-versa. The output:
> python script.py d = 0.199 p = 0.25 d = 0.246 p = 0.10 d = 0.264 p = 0.06 d = 0.294 p = 0.03 p = 0.15 d = 0.226 p = 0.10 d = 0.245 p = 0.05 d = 0.275 p = 0.01 d = 0.336 |
Not sure what the correct answer is here, but I can't find a problem with the code and I feel like it's time to move on. One last bit is something to think about that is quoted here:
The Kolmogorov-Smirnov test is based on a simple way to quantify the discrepancy between the observed and expected distributions. It turns out, however, that it is too simple, and doesn't do a good job of discriminating whether or not your data was sampled from a Gaussian distribution. An expert on normality tests, R.B. D’Agostino, makes a very strong statement: “The Kolmogorov-Smirnov test is only a historical curiosity. It should never be used.” (“Tests for Normal Distribution” in Goodness-of-fit Techniques, Marcel Decker, 1986).
Take that, Andrey!
from bisect import bisect_left as bl from scipy.stats import norm import matplotlib.pyplot as plt import numpy as np # normal cdf rv = norm() def calc_stat(xL,yL): xL.sort() dL = list() for x,y0 in zip(xL,yL): y1 = rv.cdf(x) dL.append(abs(y1-y0)) return max(dL) def p_value(d,L): #for i,e in enumerate(L): #if d < e: #break i = bl(L,d) p = 1 - ((i+1)*1.0/len(L)) return p #----------------------------------- # simulation N = 10000 n = 20 yL = range(1,n+1) yL = [1.0*y/n for y in yL] #replaces "steps" pL = list() for i in range(N): xL = np.random.normal(0,1,n) D = calc_stat(xL,yL) pL.append(D) plt.hist(pL,bins=50) plt.savefig('example.png') #----------------------------------- pL.sort() for d in [0.199, 0.246, 0.264, 0.294]: p = p_value(d,pL) print 'd = %3.3f' % d, print 'p = %3.2f' % p print for p in [0.15, 0.10, 0.05, 0.01]: i = int(N*(1-p)) d = pL[i] print 'p = %3.2f' % p, print 'd = %3.3f' % d |
KS test
I ran into a question about using the Kolmogorov-Smirnov test for goodness-of-fit in R and Python. I'd heard of the KS test but didn't know what it was about so I decided to explore.
Wikipedia's explanation was opaque at first (not surprising), but I found nice ones here and here.
As the references say, the KS test can be used to decide if a sample comes from a population with a specific distribution. We calculate a statistic D, and if the statistic is large enough we reject the null hypothesis that the sample comes from the specified distribution.
What is the D statistic? I think it is just the maximum deviation between the observed cdf for the sample, compared to the distribution. That works for the script below.
The KS test is a non-parametric test, it doesn't depend on assumptions about the distribution (e.g. standard deviation) of the data.
For these tests, I needed to reinstall scipy. The previous difficulties were not observed this time. It was as simple as:
git clone git://github.com/scipy/scipy cd scipy python setup.py build sudo python setup.py install |
The script below grabs 20 numbers from the normal distribution (sd = 2.0) and compares them with the standard normal distribution (sd = 1). The script does a plot (screenshot above) and prints the results of the KS test:
> python script.py max 0.198 D = 0.20, p-value = 0.36 |
We are not able to reject the null hypothesis, even though the source of the numbers had sd=2 and the null hypothesis is that sd=1.
I did a quick check by copying the numbers
x=c(-5.05,-2.17,-1.66,-1.35,-0.98,-0.97,-0.84,-0.74,-0.67,-0.00,0.21,0.24,0.33,0.67,0.94,1.63,1.65,2.05,2.22,3.19)
and pasting them into R. Then:
> ks.test(x,pnorm,0,1) One-sample Kolmogorov-Smirnov test data: x D = 0.1986, p-value = 0.3611 alternative hypothesis: two-sided |
Which matches. I also wrote a slightly hackish test of the test in R:
f <- function () { c = 0 for (i in 1:100) { y1 = rnorm(n,mean=0,sd=1) result = ks.test(y1,"pnorm",0,1) if (result$p.value < 0.05) { print (result$statistic) print (result$p.value) } } } |
> n = 10 > f() D 0.454372 [1] 0.020945 D 0.4113619 [1] 0.04811348 D 0.4722679 [1] 0.01440302 D 0.43666 [1] 0.02984022 D 0.5340613 [1] 0.003428069 D 0.4336947 [1] 0.03161219 |
Looks good to me.
script.py
import matplotlib.pyplot as plt from scipy.stats import uniform, norm, kstest import numpy as np np.random.seed(1773) def steps(L): delta = 1.0/len(L) y = 0 rL = list() for x in L: y += delta rL.append(y) return rL xL = np.random.normal(0,2.0,20) xL.sort() yL = steps(xL) #---------------------------------- # points plt.scatter(xL, yL, s=50, zorder=1) # distribution rv = norm() pL = np.linspace(-3, 3) plt.plot(pL, rv.cdf(pL), '--') # vertical lines dL = list() for x,y0 in zip(xL,yL): y1 = rv.cdf(x) dL.append(abs(y1-y0)) plt.plot((x,x),(y0,y1), color='r', zorder=0) ax = plt.axes() ax.set_xlim(-4, 4) ax.set_ylim(-0.1, 1.1) plt.savefig('example.png') print 'max %3.3f' % max(dL) t = kstest(xL,'norm') print 'D = %3.2f, p-value = %3.2f' % t xL = ['%3.2f' % n for n in xL] print ','.join(xL) |
Subscribe to:
Posts (Atom)