everything's shipshape, sir...
Shipshape
name: Shipshape - everything's squared away and shipshape, sir
date: 02NOV2003
programmer: goon
license: Author's rights have been asserted 2003. Not to be reproduced
without express permission.
Abstract
*Shipshape* is a set of programming tools I've *quickly* hacked together to
solve a particular problem: organising and working with files under any given file
path on a file system. We shall attempt to create a generic, cross-platform
set of tools to enhance the storage, manipulation and searching of
information normally stored on a file system... and, here's the catch, to find
it within the first page of search results.
Discussion
The original problem revolved around having to maintain an HTML index page
of files. This is a simple problem, solved by manually inserting the URL links of
files into a file and maintaining them regularly using a text editor, CVS
and a bit of decision making.
Frankly, doing this manually is a waste of time. Some files I read and delete; others I
would like to read but never quite get around to. I either lose them or
simply forget where I have saved them.
Meanwhile the same problem presents itself with the manual organisation of downloaded applications, of files
that I suck down to read at regular intervals and of my email lists.
This process is repeated for email, for URLs I visit and
for files that I locate on my hard drive, ad nauseam.
Generic problem
This is a *generic problem*. You have the same problem with the
email you read, the web pages you regularly visit and the files you download.
To a computer they are all just lists with attributes: physical, logical, associative, etc.
But how do we remember all of this? How can you search through this
information? There's got to be a better way.
This could explain why Google is filling a gap on the web.
But I don't have my file system on the web, and Google won't work on my machine.
Would I even want my machine's contents indexed?
Searching for a Google article I knew I had somewhere took 124s. I didn't find the
precise article, but I found enough to search the directory for the file I required.
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-02 23:29:24
#
SELECT
tblRoot.ID,
tblRoot.path AS ROOT,
tblRoot.pathRelative AS PATH,
tblDir.path as DIR,
tblFile.name as FILENAME
FROM
tblRoot
LEFT JOIN
tblDir
ON
tblRoot.ID = tblDir.rootID
LEFT JOIN
tblFile
ON
tblDir.ID = tblFile.dirID
WHERE
tblFile.name LIKE '%google%'
+-----+------+--------------------------------------+----------------------+---------------------+
| ID | ROOT | PATH | DIR | FILENAME |
+-----+------+--------------------------------------+----------------------+---------------------+
| 344 | e: | \reading\google | google | brokenGoogle.html |
| 344 | e: | \reading\google | google | google.html |
| 344 | e: | \reading\google | google | googlefeatures.html |
| 349 | e: | \reading\google\googlefeatures_files | googlefeatures_files | google.css |
| 349 | e: | \reading\google\googlefeatures_files | googlefeatures_files | google_sm.gif |
| 350 | e: | \reading\google\google_files | google_files | google.css |
| 350 | e: | \reading\google\google_files | google_files | google_sm.gif |
| 745 | e: | \reading\python\xml\webhack | webhack | pygoogle-0.5.3.zip |
| 745 | e: | \reading\python\xml\webhack | webhack | pygoogle-1.5.3.zip |
+-----+------+--------------------------------------+----------------------+---------------------+
I couldn't find it using Mozilla's bookmark manager. It's interesting to note that searching for the directory 'google' instead,
assuming I had placed the files in such a directory, returned the results
in 2.4s.
/* much FASTER search */
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-02 23:29:24
#
SELECT
tblRoot.ID,
tblRoot.path AS ROOT,
tblRoot.pathRelative AS PATH,
tblDir.path as DIR,
tblFile.name as FILENAME
FROM
tblRoot
LEFT JOIN
tblDir
ON
tblRoot.ID = tblDir.rootID
LEFT JOIN
tblFile
ON
tblDir.ID = tblFile.dirID
WHERE
tblDir.path = 'google'
+-----+------+-----------------+--------+---------------------+
| ID | ROOT | PATH | DIR | FILENAME |
+-----+------+-----------------+--------+---------------------+
| 344 | e: | \reading\google | google | bigbro.html |
| 344 | e: | \reading\google | google | broken.html |
| 344 | e: | \reading\google | google | brokenGoogle.html |
| 344 | e: | \reading\google | google | cookie.htm |
| 344 | e: | \reading\google | google | google.html |
| 344 | e: | \reading\google | google | googlefeatures.html |
| 344 | e: | \reading\google | google | jobad.html |
| 344 | e: | \reading\google | google | pagerank.html |
| 344 | e: | \reading\google | google | valerie.html |
+-----+------+-----------------+--------+---------------------+
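To compare the two searches side by side, here is a minimal sketch that uses Python's built-in sqlite3 as an assumed stand-in for the MySQL server. Table and column names follow the queries above; the sample rows are a small subset of the results shown, and each directory ID is assumed to equal its root ID. `sample_db`, `search_file_names` and `search_directory` are hypothetical names. The name search uses a pattern with a leading `%` wildcard, which no ordinary B-tree index on `tblFile.name` could serve, so every file row has to be checked; the directory search is a plain equality test on `tblDir.path`. That may be part of why the second query was so much faster, though the timings above do not prove it.

```python
import sqlite3

# Hypothetical SQLite stand-in for the MySQL tables; column names follow the article.
SCHEMA = """
CREATE TABLE tblRoot (ID INTEGER PRIMARY KEY, path TEXT, pathRelative TEXT);
CREATE TABLE tblDir (ID INTEGER PRIMARY KEY, path TEXT, rootID INTEGER);
CREATE TABLE tblFile (ID INTEGER PRIMARY KEY, name TEXT, dirID INTEGER);
"""

# The SELECT/JOIN shared by both searches.
BASE_QUERY = """
SELECT tblRoot.ID, tblRoot.path AS ROOT, tblRoot.pathRelative AS PATH,
       tblDir.path AS DIR, tblFile.name AS FILENAME
FROM tblRoot
LEFT JOIN tblDir ON tblRoot.ID = tblDir.rootID
LEFT JOIN tblFile ON tblDir.ID = tblFile.dirID
"""


def sample_db():
    """Build an in-memory database holding a few of the rows shown above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tblRoot VALUES (?, ?, ?)",
                     [(344, "e:", r"\reading\google"),
                      (745, "e:", r"\reading\python\xml\webhack")])
    # Assumption: one directory per root, so each dir ID equals its root ID.
    conn.executemany("INSERT INTO tblDir VALUES (?, ?, ?)",
                     [(344, "google", 344), (745, "webhack", 745)])
    conn.executemany("INSERT INTO tblFile (name, dirID) VALUES (?, ?)",
                     [("brokenGoogle.html", 344), ("google.html", 344),
                      ("pagerank.html", 344), ("pygoogle-0.5.3.zip", 745)])
    return conn


def search_file_names(conn, fragment):
    """Slow style: substring match on every file name (leading % wildcard)."""
    sql = BASE_QUERY + "WHERE tblFile.name LIKE ? ORDER BY FILENAME"
    return conn.execute(sql, ("%" + fragment + "%",)).fetchall()


def search_directory(conn, dir_name):
    """Fast style: exact match on the directory name, listing all of its files."""
    sql = BASE_QUERY + "WHERE tblDir.path = ? ORDER BY FILENAME"
    return conn.execute(sql, (dir_name,)).fetchall()


if __name__ == "__main__":
    db = sample_db()
    for row in search_file_names(db, "google"):
        print(row)
    for row in search_directory(db, "google"):
        print(row)
```

As with MySQL's default collations, SQLite's LIKE ignores ASCII case, which is why brokenGoogle.html matches '%google%'. The directory search also returns pagerank.html, which the name search cannot find.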
Language
The language I chose for these first three steps is Python. For the language bigots, it
could just as well have been Java, Perl, C# or (insert your own). Python has
the cross-platform reach I require, together with an extensive standard
library.
Idea
The idea I've come up with is hardly new: spider through a file path,
extract the useful information, massage it a bit and export it. I won't go
into great detail here, but the simple technique I'm employing consists of:
- Metadata (metadata.py): useful system information that gives meaning to the data collected.
- Walk (walk.py): given a root file URL, recursively scan and encode the directories and files within that file path, at the same time extracting important metadata.
- Encode (encode.py): read the *processed* walk data generated previously, interpret it and re-encode it from a template, again including any important metadata.
- Export (export.py): export the encoded information.
The modules are developed independently of each other and use stdin/stdout to read in and
export the data. The plan is for the walk, encode and export modules to read their instructions from template
files or the command line. The code can be reused by importing the modules or run from the
command line via stdio, which allows great configurability.
At present this test has not implemented the export module or
used templating for the options. Nor has the metadata module been used, though
it has been constructed. The encode module's output was redirected to a file
and manually inserted into MySQL, then exported as a CSV file.
Here are the example results.
1. walk.py
The walk module recursively reads the directory path, in this case the
directory of the code, e:\shipshape.
The markup around the data is summarised as follows:
[root] root file path
{directory} directory name
(file) file name + extension
Running this code generates the following markup:
d:> d:\python23\python.exe e:\shipshape\walk.py e:\shipshape
[e:\shipshape] {shipshape} (.encode.py.swp) (decode.py) (encode.py) (export.py) (metadata.py) (walk.py)
[e:\shipshape\CVS] {CVS} (Entries) (Entries.Extra) (Repository) (Root)
[e:\shipshape\meta] {meta} (encode.mdf)
[e:\shipshape\meta\CVS] {CVS} (Entries) (Entries.Extra) (Repository) (Root)
[e:\shipshape\sql] {sql} (allHtml.cvs.sql) (createTblDir.sql) (createTblFile.sql) (createTblRoot.sql) (Fb.sql) (reading.sql) (selectAll.sql) (shipshape.sql) (tools.sql)
[e:\shipshape\sql\CVS] {CVS} (Entries) (Entries.Extra) (Repository) (Root)
[e:\shipshape\sql\test] {test} (selectAll.csv.txt) (selectAll.sql) (shipshape.sql) (shipshape.wlk)
[e:\shipshape\template] {template} (encode.tpl)
[e:\shipshape\template\CVS] {CVS} (Entries) (Entries.Extra) (Repository) (Root)
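The walk step above can be sketched with the standard library's `os.walk`. This is a hypothetical re-creation, not the actual walk.py: `format_line` and `walk_markup` are illustrative names, and the case-insensitive sort is an assumption chosen because the listings above appear to be in that order (Fb.sql sits between createTblRoot.sql and reading.sql).

```python
import os
import sys


def format_line(dirpath, filenames):
    """Render one directory as walk markup: [path] {directory} (file) (file) ..."""
    name = os.path.basename(os.path.normpath(dirpath))
    parts = ["[%s]" % dirpath, "{%s}" % name]
    parts.extend("(%s)" % f for f in sorted(filenames, key=str.lower))
    return " ".join(parts)


def walk_markup(root):
    """Yield one markup line per directory under root, top-down."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place controls the order os.walk descends into subdirectories.
        dirnames.sort(key=str.lower)
        yield format_line(dirpath, filenames)


if __name__ == "__main__":
    # Usage: python walk.py ROOT [ROOT ...]; markup goes to stdout for encode.py.
    for root in sys.argv[1:]:
        for line in walk_markup(root):
            print(line)
```

Run against e:\shipshape, this sketch would print one line per directory in the same `[path] {dir} (file) ...` form; a directory without files simply gets no `(file)` entries.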
2. encode.py
The encode module reads the walk markup from stdin, interprets it and encodes it
as (at the moment) *hard coded* SQL. The module will be made to read
from a template file. The encode module writes its output to stdout.
Running this code generates the following SQL:
d:> d:\python23\python.exe e:\shipshape\walk.py e:\shipshape | d:\python23\python.exe e:\shipshape\encode.py > shipshape.sql
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("shipshape",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
(".encode.py.swp",LAST_INSERT_ID()),
("decode.py",LAST_INSERT_ID()),
("encode.py",LAST_INSERT_ID()),
("encode.pyc",LAST_INSERT_ID()),
("export.py",LAST_INSERT_ID()),
("metadata.py",LAST_INSERT_ID()),
("metadata.pyc",LAST_INSERT_ID()),
("walk.py",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\CVS");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("CVS",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("Entries",LAST_INSERT_ID()),
("Entries.Extra",LAST_INSERT_ID()),
("Repository",LAST_INSERT_ID()),
("Root",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\meta");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("meta",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("encode.mdf",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\meta\\CVS");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("CVS",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("Entries",LAST_INSERT_ID()),
("Entries.Extra",LAST_INSERT_ID()),
("Repository",LAST_INSERT_ID()),
("Root",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\sql");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("sql",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("allHtml.cvs.sql",LAST_INSERT_ID()),
("createTblDir.sql",LAST_INSERT_ID()),
("createTblFile.sql",LAST_INSERT_ID()),
("createTblRoot.sql",LAST_INSERT_ID()),
("Fb.sql",LAST_INSERT_ID()),
("reading.sql",LAST_INSERT_ID()),
("selectAll.sql",LAST_INSERT_ID()),
("shipshape.sql",LAST_INSERT_ID()),
("tools.sql",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\sql\\CVS");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("CVS",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("Entries",LAST_INSERT_ID()),
("Entries.Extra",LAST_INSERT_ID()),
("Repository",LAST_INSERT_ID()),
("Root",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\sql\\test");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("test",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("selectAll.csv.txt",LAST_INSERT_ID()),
("selectAll.sql",LAST_INSERT_ID()),
("shipshape.sql",LAST_INSERT_ID()),
("shipshape.wlk",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\template");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("template",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("encode.tpl",LAST_INSERT_ID());
INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES("e:","\\shipshape\\template\\CVS");
INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES
("CVS",LAST_INSERT_ID());
INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES
("Entries",LAST_INSERT_ID()),
("Entries.Extra",LAST_INSERT_ID()),
("Repository",LAST_INSERT_ID()),
("Root",LAST_INSERT_ID());
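A minimal sketch of the encode step, assuming the walk markup format above and file names that contain no `)`. It is a hypothetical re-creation rather than the real encode.py (`encode_line`, `encode_stream` and `quote` are illustrative names), and it keeps the same LAST_INSERT_ID() approach as the generated SQL. It writes exactly one pair of parentheses around each row, including for directories that hold a single file, and uses `ntpath.splitdrive` to split "e:" from the relative path on any platform.

```python
import ntpath
import re
import sys

# Hypothetical re-creation of encode.py. Assumed input: walk markup lines such as
#   [e:\shipshape\meta] {meta} (encode.mdf)
# and file names that contain no ')' character.
LINE_RE = re.compile(r"^\[(?P<root>[^\]]*)\] \{(?P<dir>[^}]*)\}(?P<files>.*)$")
FILE_RE = re.compile(r"\(([^)]*)\)")


def quote(value):
    """Return value as a double-quoted MySQL string literal (backslashes and quotes escaped)."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def encode_line(line):
    """Translate one line of walk markup into a list of MySQL INSERT statements."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return []  # not walk markup; skip it
    drive, relative = ntpath.splitdrive(match.group("root"))
    files = FILE_RE.findall(match.group("files"))
    statements = [
        "INSERT INTO tblRoot (tblRoot.path,tblRoot.pathRelative) VALUES(%s,%s);"
        % (quote(drive), quote(relative)),
        # LAST_INSERT_ID() is the tblRoot.ID generated by the previous statement.
        "INSERT INTO tblDir (tblDir.path, tblDir.rootID) VALUES (%s,LAST_INSERT_ID());"
        % quote(match.group("dir")),
    ]
    if files:
        # Exactly one pair of parentheses per row, even when there is only one file.
        rows = ",\n".join("(%s,LAST_INSERT_ID())" % quote(name) for name in files)
        statements.append(
            "INSERT INTO tblFile (tblFile.name, tblFile.dirID) VALUES\n%s;" % rows)
    return statements


def encode_stream(lines, out=sys.stdout):
    """Filter mode: read walk markup lines and write the SQL statements to out."""
    for line in lines:
        for statement in encode_line(line):
            out.write(statement + "\n")
```

To act as the stdin/stdout filter described above, encode.py would call `encode_stream(sys.stdin)` from a `__main__` guard, so that the `walk.py ... | encode.py > shipshape.sql` pipeline works as shown.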
3. MySQL import/export
The code is yet to be built, but it is trivial. For now I used MySQLCC's export to produce a CSV of
the results.
# MySQLCC - [mysqlcctmp_1] Query Window
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-02 18:25:42
#
# Query:
# SELECT
# tblRoot.ID,
# tblRoot.path,
# tblRoot.pathRelative,
# tblDir.path,
# tblFile.name
# FROM
# tblRoot, tblDir, tblFile
# WHERE
# tblRoot.ID = tblDir.rootID AND
# tblFile.dirID = tblDir.ID AND
# tblRoot.pathRelative LIKE '%shipshape%'
#
'ID','path','pathRelative','path','name'
'1','e:','\shipshape','shipshape','.encode.py.swp'
'1','e:','\shipshape','shipshape','decode.py'
'1','e:','\shipshape','shipshape','encode.py'
'1','e:','\shipshape','shipshape','encode.py.bu'
'1','e:','\shipshape','shipshape','encode.pyc'
'1','e:','\shipshape','shipshape','export.py'
'1','e:','\shipshape','shipshape','metadata.py'
'1','e:','\shipshape','shipshape','metadata.pyc'
'1','e:','\shipshape','shipshape','walk.py'
'2','e:','\shipshape\CVS','CVS','Entries'
'2','e:','\shipshape\CVS','CVS','Entries.Extra'
'2','e:','\shipshape\CVS','CVS','Repository'
'2','e:','\shipshape\CVS','CVS','Root'
'4','e:','\shipshape\meta\CVS','CVS','Entries'
'4','e:','\shipshape\meta\CVS','CVS','Entries.Extra'
'4','e:','\shipshape\meta\CVS','CVS','Repository'
'4','e:','\shipshape\meta\CVS','CVS','Root'
'5','e:','\shipshape\sql','sql','createTblDir.sql'
'5','e:','\shipshape\sql','sql','createTblFile.sql'
'5','e:','\shipshape\sql','sql','createTblRoot.sql'
'5','e:','\shipshape\sql','sql','selectAll.sql'
'5','e:','\shipshape\sql','sql','shipshape.sql'
'6','e:','\shipshape\sql\CVS','CVS','Entries'
'6','e:','\shipshape\sql\CVS','CVS','Entries.Extra'
'6','e:','\shipshape\sql\CVS','CVS','Repository'
'6','e:','\shipshape\sql\CVS','CVS','Root'
'8','e:','\shipshape\template\CVS','CVS','Entries'
'8','e:','\shipshape\template\CVS','CVS','Entries.Extra'
'8','e:','\shipshape\template\CVS','CVS','Repository'
'8','e:','\shipshape\template\CVS','CVS','Root'
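The export step is described as trivial but not yet built. A minimal sketch, assuming any Python DB-API cursor, runs the query above and writes the results as CSV with every field single-quoted, as in the export shown. An in-memory sqlite3 database stands in for MySQL here, so the placeholder is `?`; a MySQL driver such as MySQLdb would use `%s`. `export_csv` and `sample_cursor` are hypothetical names, and header names come from `cursor.description`, so they may vary slightly between drivers.

```python
import csv
import sqlite3
import sys

# The query from the MySQLCC export above, with the LIKE pattern passed as a parameter.
QUERY = """
SELECT tblRoot.ID, tblRoot.path, tblRoot.pathRelative, tblDir.path, tblFile.name
FROM tblRoot, tblDir, tblFile
WHERE tblRoot.ID = tblDir.rootID
  AND tblFile.dirID = tblDir.ID
  AND tblRoot.pathRelative LIKE ?
"""


def export_csv(cursor, query, params, out):
    """Run query on a DB-API cursor and write a header row plus all result rows."""
    cursor.execute(query, params)
    # Quote every field with single quotes, matching the export format shown above.
    writer = csv.writer(out, quotechar="'", quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())


def sample_cursor():
    """Hypothetical in-memory stand-in for the MySQL 'squared' database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tblRoot (ID INTEGER PRIMARY KEY, path TEXT, pathRelative TEXT);
        CREATE TABLE tblDir (ID INTEGER PRIMARY KEY, path TEXT, rootID INTEGER);
        CREATE TABLE tblFile (ID INTEGER PRIMARY KEY, name TEXT, dirID INTEGER);
    """)
    conn.executemany("INSERT INTO tblRoot VALUES (?, ?, ?)",
                     [(1, "e:", r"\shipshape"), (2, "e:", r"\reading\google")])
    conn.executemany("INSERT INTO tblDir VALUES (?, ?, ?)",
                     [(1, "shipshape", 1), (2, "google", 2)])
    conn.executemany("INSERT INTO tblFile (name, dirID) VALUES (?, ?)",
                     [("walk.py", 1), ("encode.py", 1), ("google.html", 2)])
    return conn.cursor()


if __name__ == "__main__":
    export_csv(sample_cursor(), QUERY, ("%shipshape%",), sys.stdout)
```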
MySQL table scripts
The table structure (at the moment) captures the following relationship:
Root => Dir => File.
The tblRoot table (at the moment) stores non-relational
path information, path and pathRelative, e.g. path = e:, pathRelative = \shipshape\template.
/*
#===
# name: createTblRoot.sql
# date: 01NOV2003
# cvs: $Id: index.html,v 1.1 2003/11/02 13:07:39 sah Exp $
# programmer: goon
# os written: windows
# space: tab
# sql vers: mysql specific, want 92
# description: sql to build tblRoot
#
# bugs: none
#===
*/
# Host: 192.168.0.1
# Database: squared
# Table: 'tblroot'
#
CREATE TABLE `tblroot` (
`ID` int(11) NOT NULL auto_increment,
`path` varchar(100) NOT NULL default '',
`pathRelative` varchar(254) NOT NULL default '',
PRIMARY KEY (`ID`)
) TYPE=MyISAM;
The tblDir table (at the moment) stores the current directory name and a
foreign key reference to tblRoot.
/*
#===
# name: createTblDir.sql
# date: 01NOV2003
# cvs: $Id: index.html,v 1.1 2003/11/02 13:07:39 sah Exp $
# programmer: goon
# os written: windows
# space: tab
# sql vers: mysql specific, want 92
# description: sql to build tblDir
#
# bugs: none
#===
*/
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-01 23:14:14
#
# Host: 192.168.0.1
# Database: squared
# Table: 'tbldir'
#
CREATE TABLE `tbldir` (
`ID` int(11) NOT NULL auto_increment,
`path` varchar(100) NOT NULL default '',
`rootID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`)
) TYPE=MyISAM;
The tblFile table (at the moment) stores the file name and a foreign key reference to tblDir.
/*
#===
# name: createTblFile.sql
# date: 01NOV2003
# cvs: $Id: index.html,v 1.1 2003/11/02 13:07:39 sah Exp $
# programmer: goon
# os written: windows
# space: tab
# sql vers: mysql specific, want 92
# description: sql to build tblFile
#
# bugs: none
#===
*/
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-01 23:14:43
#
# Host: 192.168.0.1
# Database: squared
# Table: 'tblfile'
#
CREATE TABLE `tblfile` (
`ID` int(11) NOT NULL auto_increment,
`name` varchar(254) NOT NULL default '',
`dirID` int(11) NOT NULL default '0',
PRIMARY KEY (`ID`)
) TYPE=MyISAM;
Example: searching for an icon
Here's a cheesy example. I need a Python logo; a 'Python Powered' one would be good.
So I wrote the script, hit the query button in MySQLCC and 120s
later I had the result.
Not exactly optimal, but with a web server front end I would be able to
see the result and use it.
# MySQLCC - [mysqlcctmp_1] Query Window
# Connection: mysqlcctmp_1
# Host: 192.168.0.1
# Saved: 2003-11-02 23:29:20
#
# Query:
# SELECT
# tblRoot.ID,
# tblRoot.path AS ROOT,
# tblRoot.pathRelative AS PATH,
# tblDir.path as DIR,
# tblFile.name as FILENAME
# FROM
# tblRoot
# LEFT JOIN
# tblDir
# ON
# tblRoot.ID = tblDir.rootID
# LEFT JOIN
# tblFile
# ON
# tblDir.ID = tblFile.dirID
# WHERE
# tblFile.name LIKE 'py%.gif' OR tblFile.name LIKE '%py%.jpg'
#
'ID','ROOT','PATH','DIR','FILENAME'
'296','e:','\reading\bored\organised\python\4199_files','4199_files','python2.gif'
'309','e:','\reading\bored\organised\spam\chinese_files\menu_data','menu_data','notpyth.jpg'
'681','e:','\reading\python\docpy23\icons','icons','pyfav.gif'
'686','e:','\reading\python\docpy23\misc\1690_files','1690_files','111-pythonnews.jpg'
'686','e:','\reading\python\docpy23\misc\1690_files','1690_files','python2.gif'
'688','e:','\reading\python\docpy23\misc\2388_files','2388_files','111-pythonnews.jpg'
'690','e:','\reading\python\docpy23\misc\2553_files','2553_files','python2.gif'
'700','e:','\reading\python\unorganised\culture_files','culture_files','PyBanner050.gif'
'700','e:','\reading\python\unorganised\culture_files','culture_files','PythonPoweredSmall.gif'
'702','e:','\reading\python\unorganised\process_files','process_files','PyBanner008.gif'
'702','e:','\reading\python\unorganised\process_files','process_files','PythonPoweredSmall.gif'
'703','e:','\reading\python\unorganised\tools_files','tools_files','PyBanner032.gif'
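The original problem was keeping an HTML index page of files up to date by hand. As a minimal sketch of the front-end idea, assuming result rows shaped like the export above (ID, ROOT, PATH, DIR, FILENAME) and Windows-style paths, the rows could be rendered as an HTML list of file:/// links. `file_uri` and `html_index` are hypothetical names and nothing here is part of the Shipshape code; `PureWindowsPath` lets the Windows paths be joined on any platform.

```python
import html
from pathlib import PureWindowsPath
from urllib.parse import quote


def file_uri(root, path, filename):
    """Join the ROOT, PATH and FILENAME columns into a file:/// URI (Windows paths)."""
    full = PureWindowsPath(root + path, filename)  # e.g. e:\reading\...\pyfav.gif
    return "file:///" + quote(full.as_posix(), safe="/:")


def html_index(rows):
    """Render (ID, ROOT, PATH, DIR, FILENAME) result rows as an HTML list of links."""
    items = []
    for _id, root, path, _dir, filename in rows:
        uri = file_uri(root, path, filename)
        items.append('<li><a href="%s">%s</a></li>' % (html.escape(uri), html.escape(filename)))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    print(html_index([("681", "e:", r"\reading\python\docpy23\icons", "icons", "pyfav.gif")]))
```

Percent-encoding the path keeps names with spaces usable as links, while `html.escape` keeps characters such as `&` from breaking the page.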
last updated: "$Id: index.html,v 1.1 2003/11/02 13:07:39 sah Exp $"