amXor

Blogging about our lives online.

3.08.2010

Arbor Redundancy Manager

** UPDATE! This code will potentially create false hard links between files that are very similar. I suddenly have many iTunes album covers that are from the wrong artist. I'm not sure if this is a result of how cavalier Apple is about messing with low-level UNIX conventions, or if my code is faulty. Beware! **

We all have redundant data on our computers, especially if we have multiple computers. My new tool is aimed at making backups for this kind of thing simple.

I'll take the simplest practical example: you have two computers and one backup hard drive, and you want full backup images of both systems. Even if you're just backing up the 'Documents' folder, there's a good chance there are a lot of redundant files. What my little program does is check the contents of the files, and if two are the same, it creates a hard link between them.

So, if your backup folder looks like this:

Backup
Backup > System 1 > Documents > somefile.txt
Backup > System 2 > Documents > somefile_renamed.txt

It will look exactly the same afterwards, but if the contents are identical there will only be one copy of the file on disk.
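In essence that's just a hash comparison followed by a re-link. Here's a minimal sketch of the idea, using the example paths above (the full script below is more careful about sizes, errors and skipped files):

import os, hashlib

def same_contents(path_a, path_b):
    # Two files count as duplicates if their contents hash the same
    sha1_a = hashlib.sha1(open(path_a, 'rb').read()).hexdigest()
    sha1_b = hashlib.sha1(open(path_b, 'rb').read()).hexdigest()
    return sha1_a == sha1_b

original = "Backup/System 1/Documents/somefile.txt"
duplicate = "Backup/System 2/Documents/somefile_renamed.txt"

if same_contents(original, duplicate):
    os.unlink(duplicate)            # drop the second copy of the data...
    os.link(original, duplicate)    # ...and point its name at the first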

I have dozens of backup CDs that contain a lot of the same information; now I don't need to sort through them and reorganize or delete duplicate files. I can leave them just as they are and any duplicate files will be linked under the hood. Here is the current state of the code, discussion to follow:

arbor.py - v. 0.02

#!/usr/bin/env python
import os, sys, hashlib

arg_error = False
if len(sys.argv) == 2:
   src = sys.argv[1]
   srcfolder = os.path.abspath(src)
   if not os.path.isdir(srcfolder):
       arg_error = True
else: arg_error = True

if arg_error:
   print "Usage: arbor [directory]"
   sys.exit()

backupfolder = os.path.join(srcfolder, ".arbor")
if not os.path.isdir(backupfolder):
   os.mkdir(backupfolder)

skipped_directories = [".Trash", ".arbor"]
skipped_files = [".DS_Store"]
size_index = {}
MAX_READ = 10485760

def addsha1file(filename, size):
    # For very large files, hash only the first and middle chunks
    if size > MAX_READ:
        f = open(filename, 'r')
        data = f.read(MAX_READ/2)
        f.seek(size/2)
        data = data + f.read(MAX_READ/2)
        f.close()
        sha1 = hashlib.sha1(data).hexdigest()
    else:
        sha1 = hashlib.sha1(open(filename, 'r').read()).hexdigest()
    backupfile = os.path.join(backupfolder, sha1)
    try:
        if os.path.exists(backupfile):
            # Content already indexed: replace this file with a hard link to it
            os.unlink(filename)
            os.link(backupfile, filename)
        else:
            # New content: index it in .arbor under its SHA1 name
            os.link(filename, backupfile)
    except:
        print "Unexpected error: ", sys.exc_info()[0], sys.exc_info()[1]

fcount = 0
for root, dirs, files in os.walk(srcfolder):
    for item in skipped_directories:
        if item in dirs:
            dirs.remove(item)
    for name in files:
        fcount += 1
        if fcount % 500 == 0:
            print fcount, " files scanned"
        if name in skipped_files:
            # print name
            continue

        srcfile = os.path.join(root, name)
        size = os.stat(srcfile)[6]
        if size_index.has_key(size):
            if size_index[size] == '':
                # Already hashed other files of this size; hash this one directly
                addsha1file(srcfile, size)
            else:
                # Second file of this size: hash the remembered one, then this one
                addsha1file(size_index[size], size)
                addsha1file(srcfile, size)
                size_index[size] = ''
        else:
            # First file of this size: remember it, defer hashing
            size_index[size] = srcfile

Here are the added features of this version:

  • Folder to backup is passed as command-line argument
  • Backup files are placed at the top level of that folder in the .arbor directory. These are just hard links, so they don't really add anything to the size of the directory.
  • Ability to skip named folders or files
  • Only calculates a checksum if two files are the same size.
  • Only calculates a partial checksum if a file is over 10 MB: it checks 5 MB from the start and 5 MB from the middle.
  • Prints a running tally of files checked (every 500 files)
  • Doesn't choke on errors: some files don't like to be stat'ed or unlinked because of permissions issues.

I'm not sure about the partial checksum option, but the scan was really bogging down on larger inputs. It's not really practical to do a SHA1 checksum on a bunch of large files, and I think it's safe to assume that two very large files are the same if their first 5 MB, their middle 5 MB and their overall size are exactly the same. Perhaps I will add an option later for strict checking, if someone is highly concerned about data integrity. But the practical limitations are there: I'm 30,000 files into a scan of my ~200 GB backup folder and I certainly wouldn't have gotten that far without the file size limiting.
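If I do add a strict-checking option, it wouldn't have to read whole files into memory either; hashing in chunks keeps memory use flat while still covering every byte. A rough sketch of what that mode might look like (this function isn't part of the script above):

import hashlib

def full_sha1(filename, chunk_size=1024 * 1024):
    # Hash the entire file, one 1 MB chunk at a time
    sha1 = hashlib.sha1()
    f = open(filename, 'rb')
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        sha1.update(chunk)
    f.close()
    return sha1.hexdigest()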

Update: The scan was almost done when I wrote this. Here is the tail end of the log:

31000  files scanned
31500  files scanned
32000  files scanned
32500  files scanned

real 43m40.531s
user 7m25.593s
sys 3m35.540s

So 233 GB over 32,500 items took about 45 minutes to check, and it looks like I've saved about 4 GB. Upon further inspection, it seems that most media files store their metadata in the file contents themselves, so the checksums differ. Hmmm....

3.06.2010

File Management Tool - Part 2

I have worked with my program a bit more and there are some interesting aspects of this kind of backup.

At the end of the backup, all duplicate files point to the same inode, so the SHA1-named copy can be deleted.

eg. 

inode   name
1299    workingdir/folder1/file1.txt
1299    workingdir/folder5/file1.txt
1299    workingdir/folderx/file1_renamed.txt
1299    backupdir/54817fa363dc294bc03e4a70f51f5411f4a0e9a9

All these files now point at the same inode, so the backup directory can be erased and no single file has exclusive control over the inode; all three remaining names would have to be deleted to finally get rid of inode 1299. Generally it seems that GUI programs save files to new inodes (TextEdit, for example), so editing any of the versions breaks the links. UNIXy programs seem to respect the inode better: vim saves to the same inode, so editing any version edits every version.
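This is easy to verify from Python as well. A quick sketch using the example paths above: every name resolves to the same inode, and the link count shows how many names are left.

import os

names = [
    "workingdir/folder1/file1.txt",
    "workingdir/folder5/file1.txt",
    "workingdir/folderx/file1_renamed.txt",
]

for name in names:
    st = os.stat(name)
    print name, "-> inode", st.st_ino, "links:", st.st_nlink

# os.path.samefile() does the same inode comparison for you
print os.path.samefile(names[0], names[1])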

Removing the "backup" directory also helps Spotlight resolve the names and filetypes. Deleting that folder and running `mdimport ./workingdir` complained mightily but more or less re-indexed the folder. Here is a quick slice of the errors it produced, I'm not going to try to make sense of them, but think they're interesting; maybe Spotlight encounters these kinds of problems always and just keeps silent about them.

$mdimport ./workingdir
...
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
font `F82' not found in document.
font `F88' not found in document.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
encountered unexpected symbol `c'.
choked on input: `144.255.258'.
choked on input: `630.3.9'.
choked on input: `681.906458.747'.
choked on input: `680.335458.626'.
choked on input: `682.932458.507'.
choked on input: `530.3354382.624'.
font `Fw' not found in document.
font `Fw8' not found in document.
encountered unexpected symbol `w6.8'.
encountered unexpected symbol `w0.5'.
font `Fw8' not found in document.
encountered unexpected symbol `w0.5'.
encountered unexpected symbol `w6.8'.
choked on input: `397.67.'.
choked on input: `370.5g'.
choked on input: `370.5g'.
choked on input: `D42.32 m
314.94 742.32 l
S
306.06 751.2 m
306.06 7...'.
choked on input: `67.l'.
choked on input: `67.l4'
failed to find start of cross-reference table.
missing or invalid cross-reference trailer.

To reiterate, this is the funny trick my program is doing: it builds an index of SHA1-named files from the source directory, and then you simply delete that index and you're left with all the duplicates hard linked to one another. I think that's pretty cool.

Metadata

One stated aim of this backup tool was to preserve metadata. So far it preserves the timestamps and metadata of whichever file it indexes first, and the filename of every file it indexes. I'm not sure how to implement more than that in a transparent way. As far as I can tell from the documentation, you can't have a single inode with multiple access and modification times, and an external database of that kind of information would probably never get used.

File Management Tool

I ended up sketching out the details of how my file managing tool will work; it's kind of like a virtual librarian that removes redundant files without deleting the file hierarchy. My method is a bit of a mashup of how other tools work, so I'll give credit where credit is due.

This is how it works so far:

  • All files in a tree have their SHA1 hash-value computed (Git)
  • A hard link is created in the backup folder with the SHA1 name (Time Machine) ...
  • unless: the file exists already, then it is hard-linked to the existing SHA1 (...)

There is no copying or moving of files, simply linking and unlinking, so 99.9999% of the time is spent computing the hash values of the files. Here's the Python version:

import os
import hashlib

backupfolder = os.path.abspath('./backup')
srcfolder = os.path.abspath('./working')
srcfile = ''
backupfile = ''

for root, dirs, files in os.walk(srcfolder):
    for name in files:
        if name == '.DS_Store':
            continue
        srcfile = os.path.join(root, name)
        # Name each file by the SHA1 of its contents
        sha1 = hashlib.sha1(open(srcfile, 'r').read()).hexdigest()
        backupfile = os.path.join(backupfolder, sha1)
        if os.path.exists(backupfile):
            # Duplicate content: point this name at the existing copy
            os.unlink(srcfile)
            os.link(backupfile, srcfile)
        else:
            # New content: add it to the backup index
            os.link(srcfile, backupfile)
        # print backupfile

This folder contains about 5 GB of info and I thought that the SHA1 calculations might take a couple of weeks, but as it turns out, it only takes a couple of minutes. What you end up with is a backup folder that contains every unique file within the tree, named by its SHA1 tag, and the source folder looks exactly as it did when you started, except every file is a hard link.

So, what are the benefits?

Filenames are not important

Because the SHA1 is computed only from the contents of a file, filenames don't matter. This helps in two ways: if a file has been renamed in one tree yet remains physically the same, you keep only one copy and both names are preserved; and more importantly, if you have two different files in separate trees that happen to share a name (e.g. 'Picture 1.png'), you keep the naming yet still have two distinct files.
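The point is easy to demonstrate: hash two differently named copies of the same data and the digests match. A tiny sketch (the filenames are made up for the example):

import hashlib

sha1_a = hashlib.sha1(open("tree1/Picture 1.png", 'rb').read()).hexdigest()
sha1_b = hashlib.sha1(open("tree2/vacation_photo.png", 'rb').read()).hexdigest()

# Identical bytes give identical digests, so the two names end up
# sharing a single stored copy after the backup run.
print sha1_a == sha1_b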

If you have some trees of highly redundant data, this is the archive method for you. My test case was a folder of 15 direct copies of backup CDs that I have made over the years, and I saved about 600 MB across 5 GB. And the original file hierarchies look exactly the same as they did before running the backup.

What is wrong with it?

As it stands, it messes with Spotlight's and Finder's heads a little bit. Finder isn't computing correct size values for the two folders. `du` reports the same usage whether I include both folders or one at a time, which is pretty clever: total: 5.1 GB, working: 5.1 GB, backup: 5.1 GB. Finder, on the other hand, reports Total: 5.1 GB, working: 5.1 GB, backup: 4.22 GB.

Spotlight

Some very weird stuff happens with Spotlight.

A Spotlight search in the working directory will show mostly files from the backup directory, which isn't convenient. The files in the backup dir have no file extension, so they're essentially unopenable by Finder. Here's what I found using the command-line `mdfind`:

mdfind -onlyin ./working "current"
/Users/.../backup/5f5b587eb07ee61f15ab0a032ca564a17ff461e9
/Users/.../backup/0f3f769000f164b2e30bb7b3f09482e8cc244135
and so on ...

mdfind -onlyin ./backup "current"
nothing found

For some reason, when searching the working directory it finds the information, yet always resolves the filename to a directory it's not supposed to be searching. And if you search the backup directory, it doesn't even bother reading the files, because it assumes from the names that they are unreadable.

I'm starting to wish that Steve Jobs hadn't caved and given in to the file extension system.

Time Machine

Okay, it's useful, but how is it similar to Time Machine? Time Machine creates a full copy of the tree when it first backs up the system. From then on it creates the full hierarchy of directories, but all the files that haven't changed are hard links to the original backup. In Time Machine each unique file is a new inode created in time, whereas in my system each unique file is a new inode created in space. All duplicates in time are flattened by Time Machine, and all duplicates in space are flattened by my system.

Note: to copy folders from the command line and preserve as much metadata as possible, use `cp -Rp`.

3.05.2010

File Backup And Synchronization

In my previous post I had mentioned that I was looking for a backup/file synchronization tool.
I don't think Git is it, and neither is Dropbox. Both are useful in that they are format transparent, which most database software is not. But what they lack is a way to deal with a large variety of file and folder hierarchies and to seamlessly compress them without losing transparency and semantic meaning.
So here is my list of requirements from a backup tool:
  1. Preserves any time-stamp information, even conflicting
  2. Distributed (decentralized)
  3. Minimizes redundant data
  4. Preserves hierarchies for semantic meaning
  5. Hides hierarchy clutter
  6. Preserves every bit of metadata, even if it's not explicit
  7. Accessible and platform neutral
  8. Makes data integrity paramount
It may seem like some of these requirements conflict with each other, but I will try to explain what I mean. I have loaded four of my backup CDs onto my laptop. I know there are duplicate files and I know there are time-stamps that disagree with one another.
I want to be able to view these files in a number of ways:
  • In their original on-disk hierarchy.
  • By file type, date, tags or physical description.
And I want to be able to synchronize all or part of these folders between machines, in addition to making a zip/tar archive of them on a backup machine.
Any suggestions, or shall I start coding?

3.04.2010

Git And The Future Of The Internet.

I've recently taken a detour into philosophizing about where technology is going. What does the future look like and what does it mean for humanity, life and the current business models as we know them?

It all started with a bit of research into Linus Torvalds' latest project, Git. I've been thinking about trying some kind of content management system for personal use. I've looked at a lot of personal database type stuff (Bento, FileMaker, MySQL, ...) and they just seem like format-specific black holes to drop your content into. I'm still not sure Git is right for what I'm thinking of, but I watched Linus' Google tech talk followed by Kevin Kelly's TED talk and had a vision of a web that is so much more than what it is right now.

They're both pretty long, but I've had a bit of time on my hands lately...

http://www.youtube.com/watch?v=4XpnKHJAok8
http://www.ted.com/talks/kevin_kelly_on_the_next_5_000_days_of_the_web.html

Linus brings up two important points in his talk: one is the notion of working in a "network of trust" and the other is the sacredness of one's own data. Both of these are extremely important and often lacking components in the emerging technologies of our day. The network of trust is the only way to do collaborative work on open source development right now.

I think this is hitting a critical mass and will soon be the only way to do any kind of work. Monolithic organizations cannot keep up with the changing landscape of information growth. Git is a very interesting project because it takes this model and implements it in a very practical way. It employs a lot of very technical algorithms to allow software projects to grow very organically in a social environment. A lot of the metaphors that surround software development are hard, physical metaphors like construction, building and engineering, but the emerging metaphors are about growth, evolution and adaptation to environment. 

The benefits of collaborative networked projects are obvious, but the sacredness of one's data is a bit more of a veiled concept. Linus outlines the use of the SHA1 algorithm as a means to ensure that the entire history of a project, or set of data, can be verified to be accurate and traceable throughout its lifespan. This has obvious benefits when dealing with buggy network connections or failing hard drives, but it's more interesting to me in its wider application.


Where's My Information?

As a person who has used a computer for a number of years, I'm already seeing the breakdown of continuity in my archived information. As data gets moved around, archived to CD-ROM, uploaded to Google Docs, downloaded to PDFs and transferred to different operating systems, it all ends up in a soup of data without context or history. I have no idea if the timestamps are accurate, or what the context and related content might be. As soon as you add cloud computing to the mix, the problems amplify greatly.

This very blog post is being submitted to the vast expanse of content controlled and managed by the cloud. I have no simple way of traversing the internet and picking up all the odds and ends that I have put there.
This is the real direction of Git I think, and I want to figure out how to use it for more than just source code management because I think it could change the way the internet works. What if this blog was simply a mirror of the "Blog" folder on my hard drive, which was mirrored on every machine I use and was also shareable to other collaborators who mirrored their own unique versions? And what if my photo page on flickr and Facebook were simply mirrors of a folder called "Published Photos" on my hard drive which were mirrors of... and so on.


Vapor Trails

The fundamental problem of cloud computing is the owner's right to content and tracking. This is generally possible with today's technology, but never practical. I have 65 documents in Google Docs at the moment and I could download all of them in one go into plain text files, but all the metadata would be garbage, and I couldn't easily merge them with the existing contents of my hard drive. Sure, I could spend a bit of time diff-ing them with my files and organizing them into logical places, but imagine if I was talking about the entire contents of my home directory. A `du | wc -l` shows 5,627 files in my home directory, and I don't even have my music collection on this computer! Yes, the data is basically safe in the cloud, but what if I want to take it with me or move it elsewhere? What if I want to host this blog from my own server; how would I transfer it? The current cloud model only takes uploading and viewing seriously and neglects personal ownership rights. Google Docs has special code written for exporting; Blogger doesn't, Facebook and Flickr don't, YouTube doesn't.

They are all greedy information gathering tools. They are only concerned with gathering your information and storing it on their sites. There are "sync" tools for most platforms, but their only intent is to gather your content with more ease and transparency.

Git looks promising in that it allows you to publish your information, yet still control the source of it.

3.03.2010

Assembly Language For Mac

I'm away from my Linux box and want to do some assembly programming. Macs install GCC with the Developer Tools, but there are enough differences that I haven't bothered to work through them until now. Here's a decent tutorial, although it focuses on PPC assembly and I'm using an Intel Mac. The thing that frightened me about the Mac assembler was the default output of `gcc -S`: there are some strange optimizations and flags in the resulting assembly code. The key, as the tutorial points out, is in the compiler options. Here's what I used on the ubiquitous "Hello World" program:
gcc -S -fno-PIC -O2 -Wall -o hello.s hello.c
And here's the assembly code it spit out:
    .cstring
LC0:
   .ascii "Hello World!%d\12\0"
   .text
   .align 4,0x90

.globl _main
_main:
   pushl   %ebp
   movl    %esp, %ebp
   subl    $24, %esp
   movl    $12, 4(%esp)
   movl    $LC0, (%esp)
   call    _printf
   xorl    %eax, %eax
   leave
   ret
   .subsections_via_symbols
This is more familiar territory, the only differences being the .cstring directive instead of .section .text, the leading underscore on printf, and the .subsections_via_symbols directive. The general naming of sections is outlined in the Mac Assembler Reference, and the .subsections_via_symbols explanation is interesting.

I'm already used to using many labels in my code; does this mean that the named sections would be ripped out because they are not "called" by any other code? I tested this out in the previous example, just adding a second call to _printf in a labelled section, and the code worked just fine. It seems that labels don't count; they have to be declared sections like .globl, .section or whatever. That seems fair, since I haven't yet made a habit of calling sections that are supposed to flow naturally into other sections. Maybe there is some instance where this might be a useful optimization?

I will be looking into Position Independent Code (PIC) a bit more. It seems similar in theory to how the latest Linux kernel runs code at randomized memory locations to prevent hardcoded attacks, but I don't know if that's the extent of it.

3.02.2010

RPN Calculator - v0.02

My calculator code was quite easily polished up. Here's the revised version (GAS, x86), which stacks operands properly and supports the main arithmetic operators +, -, *, /. If you flush the stack completely, you get a "nan" warning, which seems reasonable. Here's the code:
.section .data
expr_length:    .int 128
ADD:            .ascii "+"
SUB:            .ascii "-"
MUL:            .ascii "*"
DIV:            .ascii "/"
null:           .ascii "\0"
disp_float:     .ascii "%f\n\n\0"
.section .bss
    .lcomm expr, 128
.section .text
.globl main

main:
    finit
    1:
    leal    null, %esi          #Clear the expr buffer
    leal    expr, %edi
    movl    expr_length, %ecx
    cld
    lodsb
    rep     stosb
    addl    $4, %esp
    pushl   stdin               # Read an expression
    pushl   $64
    pushl   $expr
    call    fgets
    addl    $12, %esp

    movb    ADD, %ah            # Test For Operators
    movb    expr, %bh
    cmp     %ah, %bh
    je      addFloat
    movb    SUB, %ah
    cmp     %ah, %bh
    je      subFloat
    movb    MUL, %ah
    cmp     %ah, %bh
    je      mulFloat
    movb    DIV, %ah
    cmp     %ah, %bh
    je      divFloat

    pushl   $expr               # Must be a number
    call    atof
    addl    $4, %esp
    jmp 1b

    addFloat:
        faddp
        fstl   (%esp)
        jmp     disp_answer

    subFloat:
        fsubrp
        fstl   (%esp)
        jmp     disp_answer

    mulFloat:
        fmulp
        fstl    (%esp)
        jmp     disp_answer

    divFloat:
        fdivrp
        fstl   (%esp)

    disp_answer:
        pushl   $disp_float
        call    printf
        addl    $8, %esp
        jmp     1b


    notfound:                   # Never reached: every path jumps back to 1
    movl $1, %eax               # exit system call
    movl $0, %ebx               # return status 0
    int $0x80

8 Geek Tools For Your Mac

Here's a quick roundup of the applications I have found most useful since switching to Mac (from Linux, although I still ssh into my Arch Linux box regularly). Most of these are included, but some need the Developer Tools installed; really, even if you're not using Xcode, you should install the Developer Tools and Optional Installs. You want them.
Calculator
Big deal, it comes with a calculator. But this calculator has some barely hidden super powers lurking in the menu options. Paper tape to show your calculation history, scientific and programmer modes, an RPN mode, and a whole load of unit conversions. When you start this calculator it gives the impression of being a $10 Toys-R-Us thing, but there's way more than meets the eye.
Grapher
The one thing Calculator doesn't do is graphs, but if you have the Developer Tools installed, you have a program that will do far more than your average graphing calculator. And with style. Located at /Applications/Utilities/Grapher, this app is a graphing machine! From simple parametric equations to complex 3-D differential equations, this thing will graph it. The equation templates and examples are quite nice too if you're not sure where and how to begin.
Spotlight
I could recommend Quicksilver, which does way more than Spotlight could dream of, but when it comes right down to it I only really used Quicksilver for application and file launching anyway. Spotlight does this really well. There are no applications, other than droplets (see Automator), in my dock anymore. Cmd-Space your way to a clean dock.
Icon Composer
I don't know why, but I like to create my own icons for the stuff in my dock. Maybe I'm alone in this. But if you want to give it a go here's how I do it:
1. Draw the icon using GIMP / Photoshop, at 512 x 512px (transparency and gloss are your friends!)
2. Drop the saved png into Icon Composer (/Developer/Applications/Utilities/)
3. Save the result as an .icns file.
4. Drop the icns file onto icns2icon, and the file will become the icon...
5. Cmd-i (Get Info) the icns file and whatever file/folder you want to apply the icon to.
6. Select the source icon in the "Get Info" window and Cmd-C, select the destination icon and Cmd-V.
Automator
If you have a repetitive task there's probably an Automator script to do it. There are only three that I use regularly and these are all in my dock as droplet-style applications. They are:
Renamer: This is just the "Rename Finder Items" plugin, with "Show this action when workflow runs" checked so that you can choose the options whenever you run it.
Comment: The "Set Spotlight Comments" plugin, again with "Show...when run" checked so it's completely generic. Once in a while I get in a phase where I feel the need to tag all my files.
Desktop Alias: Rather than store my current project folders on my desktop, I just drop them on this and it creates a desktop shortcut. It uses the "New Aliases" patch. I have done some more elaborate Automator scripts, but these are the ones that are general enough for daily use.
Megazoomer
Okay, this one isn't included with your Mac, but it really should be. You have to download Megazoomer as well as SIMBL.
What this does is add a "Mega Zoom" option to every app so you can fullscreen anything, as you can see from the screenshot, I'm typing this with vim on a fullscreen terminal with slight transparency (just enough to read a website in the background).
This screenshot is not cropped at all.
It's kind of like a bit of 'ratpoison' window manager for your Mac. This is true fullscreen, no dock, no menu bar, no distractions...
Quartz Composer
This is an amazing bit of software located at /Developer/Applications/Quartz Composer. See the examples at /Developer/Examples/Quartz Composer/Compositions/Conceptual for some idea of what it can do. This can create some really cool 2D / 3D visuals. Also see http://www.zugakousaku.com/ for some cool visual designs using Quartz Composer.
Terminal
Finally, I must recommend that you pimp your Terminal. If you're going to have any geek cred at all you need a sweet terminal. Here are my preferences, mix to your own tastes:
Background color: 91% Black, 96% Opacity.
Font: Bitstream Vera Sans Mono 13pt, greyish-green, antialiased.
vimrc: ...maybe I'll get into settings in another post.
I have done quite a bit of monospace font testing and this is my favorite font. Very close competitors are: BPmono, Terminus and Droid Sans Mono.
And the next step, after you get your Terminal looking really cool, is to learn how and why to use it. Happy Hacking!

Simple RPN Calculator

Reverse Polish Notation is a simple method of calculation that was used extensively in scientific calculators such as the HP 32S, but has fallen out of use somewhat these days. I have decided to implement an RPN calculator in assembly language to test what I have learned so far. Here is version 0.01, which only adds and subtracts, just to give a rough layout of how it will work.
.section .data
  expr_length:  .int 128
  ADD:          .ascii "+"
  SUB:          .ascii "-"
  null:         .ascii "\0"
  disp_float:   .ascii "%f\n\n\0"
.section .bss
  .lcomm expr, 128

.section .text
.globl main

main:
    finit
1:
    leal    null, %esi          # Clear the expr buffer
    leal    expr, %edi
    movl    expr_length, %ecx
    cld
    lodsb
    rep     stosb               # (end of buffer clear)
    addl    $4, %esp
    pushl   stdin               # char * fgets (char * str, int len, * stream)
    pushl   $64
    pushl   $expr
    call    fgets
    addl    $12, %esp

    movb    ADD, %ah            # Test For Operators in expr
    movb    expr, %bh           # and jump to code if operator found
    cmp     %ah, %bh
    je      addFloat
    movb    SUB, %ah
    cmp     %ah, %bh
    je      subFloat

    pushl   $expr               # Defaults to a number
    call    atof                # double atof (const char * str)
    addl    $4, %esp            # pushes a float into st(0) from string
    jmp 1b


addFloat:
    faddp
    fstpl   (%esp)
    pushl   $disp_float
    call    printf
    addl    $8, %esp
    jmp     1b

subFloat:
    fsubrp
    fstpl   (%esp)
    pushl   $disp_float
    call    printf
    addl    $8, %esp
    jmp     1b

    movl $1, %eax           # All roads jump back to 1 so we never get here
    movl $0, %ebx
    int $0x80
And here's how it looks in action:
$./rpn
14.5
88
+
102.500000

12.3
6.6
-
5.700000
It took some digging to figure out how to load a value into the FPU from a string, and the "atof" C library function seemed to be the easiest way. All it needs is a string pointer on the stack ("$expr" in the code) and it does its best to convert it to a float and push it onto the FPU stack, which is fairly painless. The operator test needs some work, since it will always give a false positive if you punch in a signed value: punching in -12 causes the calculator to run the subtraction code and disregard the number you typed in, which is not ideal. Other todos:
  1. Pull the display code out of the calculations; this should be generic.
  2. Push the result back onto the stack so you can use it in following calculations.
  3. Filter the input somewhat.
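Before tackling those in assembly, here's the behavior I'm aiming for, sketched in Python as a reference model (not part of the calculator itself): a proper operand stack, the result pushed back for the next calculation, and an operator test that only matches a lone operator so signed values fall through to the number-parsing path.

import sys

stack = []
ops = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b}

for line in sys.stdin:
    token = line.strip()
    if token in ops:
        # A lone operator: pop two operands ("nan" if the stack runs dry)
        b = stack.pop() if stack else float('nan')
        a = stack.pop() if stack else float('nan')
        result = ops[token](a, b)
        print result
        stack.append(result)    # keep the result for the next calculation
    elif token:
        # Anything else (including signed values like -12) is a number
        stack.append(float(token))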
