
I have now been able to get the introspector Perl scripts to run on the output
of rdfproc, a part of Redland. All you need now is Redland, and there are
Debian packages for it.
You can use many tools on this RDF; take a look at http://librdf.org
for more information.
You will want these Debian packages:
librdf-perl - Perl language bindings for the Redland RDF library
librdf0 - Redland RDF Application Framework
librdf0-dev - Redland RDF library development libraries and headers
libraptor1 - Raptor RDF Parser library
libraptor1-dev - Raptor RDF parser and serializer development libraries and headers
Here are some good example data files:
c-dump ntriples
rdfxml example
These are two forms of RDF: N-Triples and RDF/XML.
You can use them with the introspector like this; the example below uses
the N-Triples file:
1. gunzip the file
gunzip c-dump.rdf.gz
2. make a redland repository
rdfproc Global parse c-dump.ntriples ntriples file:/
Global is the name of the repository.
file:/ is the base URI; it can be whatever URI you want.
That will create a repository in the current directory using Berkeley DB:
6.2M Global-po2s.db -- predicate-object index (used to find by field)
9.0M Global-so2p.db -- subject-object index (not used)
9.5M Global-sp2o.db -- subject-predicate index (graph traversal)
25M total
So you have about 9 MB per index (roughly 25 MB in total) for a 500 KB gzipped N-Triples file.
The unpacked sizes are here:
13M Nov 28 15:34 c-dump.rdf
4.7M Nov 28 15:34 c-dump.ntriples
wc (word count) on c-dump.ntriples gives 96,818 lines, 387,292 words, 4,846,776 chars.
The original source file c-dump.i (expanded with headers): 13,270 lines, 27,221 words, 260,051 chars (254K from ls).
So we are talking about roughly a 10x increase in size for indexing.
For example, I have installed the introspector into my home directory.
The CVS version is up to date; you can also download the release from sf.net.
To use it, go to the directory containing the RDF database files and run:
perl -I/home/mdupont/EXPERIMENTS/introspector/introspector-0.7
~/EXPERIMENTS/introspector/introspector-0.7/recurse5.pl
node_types:function_decl file:/
node_types:function_decl is the node type that I am looking for; other
interesting ones can be found in the Introspector/GCCTypes.pm file.
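If you would rather poke at the repository from your own Perl instead of using recurse5.pl, here is a minimal sketch using the RDF::Redland bindings from librdf-perl. It assumes the Global store created above sits in the current directory, and it simply filters statements on the string function_decl, which is a simplification of what the introspector scripts actually do.

#!/usr/bin/perl
# Minimal sketch: open the Berkeley DB store "Global" created by rdfproc
# and print every statement that mentions function_decl.
use strict;
use warnings;
use RDF::Redland;

my $storage = RDF::Redland::Storage->new(
    "hashes", "Global", "hash-type='bdb',dir='.'")
    or die "cannot open store";
my $model = RDF::Redland::Model->new($storage, "")
    or die "cannot open model";

my $stream = $model->as_stream;
while (!$stream->end) {
    my $statement = $stream->current;
    # node_types:function_decl is the node type named on the command line;
    # matching on the string is only an illustration.
    print $statement->as_string, "\n"
        if $statement->as_string =~ /function_decl/;
    $stream->next;
}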
I hope that you take some time to play around with the introspector.
It is not running perfectly yet, but it is fast!
Is it possible to suppress information on the internet? How can a gang of people work together to suppress error and bug reports? How can they firewall themselves against people who seek to exploit them? We take a case study from Slashdot.
The DotGNU project, led by Norbert Bello and Rhys Weatherly and carried out by Richard Bowman, is systematically suppressing information about the introspector project.
Via its advocates, it has created a virtual shield on Slashdot to keep bug reports and error reports from being published. By downmodding articles as a gang, they attempt to firewall the public from their own errors; they are attempting to suppress error reports about errors in handling pointers to functions that arose while compiling the gcc and libx11 source code.
Here is an example of a posting that was on topic:
This harmless, on-topic article was downmodded as offtopic. I suspect it was one of the mod-mobbers in the firewall. Here I reply to the troll who carried out the operation:
Because my comments about pnet were modded down to nothing as offtopic, I think that I need a place to post my complaint.
I just wanted to point out that I have produced the most bug reports for pnet so far (currently 54 bug reports in the system).
One good thing I have to say about Rhys is that he fixes bugs quickly!
My harsh and questioning nature, which brought me to free software, is sometimes too much for people to take.
I bitched and moaned a lot about dotgnu implementing patent-endangered code instead of following their original plan.
My complaining got me banned from the project.
Now, instead of accepting reports about this buggy software that would warn people that it is not usable on really challenging code (like libx11 or the gcc) that I was compiling with it, the author Rhys tried to *gag* me.
I complained to Savannah about this here.
In the end, he of course appreciated my valid bug reports and fixed them like all the other ones.
The argument that my bugs are not valid really doesn't hold water when all of them get fixed. The reality is that the software was lacking major testing, and my reporting of all these bugs reflected that unstable state.
The funny thing is the backpedaling you can find in the bug history: the bug is turned from Invalid to Fixed!
"Mon 02/16/04 at 23:41 rweather resolution_id Invalid Fixed"
So I would think twice before spending your time working on pnet/c, because the developers are trying to hide the bugs (the real bugs that I reported) from you!
mike
I have released a new version of the introspector, a proof of concept, something you can look at and learn from: a self-contained demo program that allows you to graphically explore the structure of almost any program you can compile with the gcc!
Screenshot
Screenshot
Download for Linux
Linux Binary
Source :
source
One important aspect I should mention is that this provides full access to the gcc interface via code generation and subsequent execution. That creates the possibility of requiring that users of the generated code place their derivative works under the GPL. That just might satisfy the wishes of those who want only free software to interface with the gcc.
It features the introspector ice cube. The ice cube contains a superfast, compressed extract of the semantic data of the program that can be compiled in as a library and loaded into memory in milliseconds.
The graph algorithms are also very fast on constant-size arrays of objects!
Hopefully it will become the new way to embed static semantic resources into your programs.
We then slice the ice cube by property into nice thin C arrays.
It has a gcc tree extracted out of the dotgnu pnet ilasm code-emit function. That means I have reverse-engineered a free software component.
The results of the reverse engineering are stored in an RDF repository. Cwm, Perl, and shell scripts do semantic processing of the data. A Redland RDF repository is used to interface into the guts of the gcc compiler.
The ASTs are serialized by a patched experimental gcc 3.4 -fdump-translation-unit; you can find the source code in the CVS.
That is emitted as RDF and converted by a Perl script into an ice cube.
The ice cube is served up in slices of data: each attribute gets its own vector whose length is the number of nodes carrying the selected RDF property. There is in fact a matrix of all the objects and the relationships between them stored in the arrays.
This release contains just the Linux binary of the program, with all of this data compiled in as an ice cube.
The data is emitted as an inline C array for compiling into the target program.
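To make the idea concrete, here is a much-simplified sketch of what such a Perl converter could look like: it reads N-Triples and emits one constant C array per predicate, i.e. one slice per property. The real generator in CVS does considerably more (interning strings, integer node ids, escaping), so take this only as an illustration of the layout.

#!/usr/bin/perl
# Simplified ice-cube sketch: read N-Triples on stdin, group the triples
# by predicate, and emit each group as an inline C array ("slice").
# String escaping and node interning are omitted for brevity.
use strict;
use warnings;

my %slice;    # predicate URI => list of [subject, object] pairs
while (my $line = <>) {
    next unless $line =~ /^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$/;
    push @{ $slice{$2} }, [$1, $3];
}

my $n = 0;
for my $pred (sort keys %slice) {
    my $name = "slice_" . $n++;
    print "/* $pred */\n";
    print "static const char *" . $name . "[][2] = {\n";
    printf "  { \"%s\", \"%s\" },\n", @$_ for @{ $slice{$pred} };
    print "};\n\n";
}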
Please join up on the list, come to the #introspector chat zone on freenode.net, and jabber me at mdupont@nureality.ca
thanks,
mike
Short link : http://xrl.us/qvt
Long link : http://rdfig.xmlhack.com/2003/08/26/2003-08-26.html#1061882096.088161
Details :
http://introspector.sourceforge.net/2003/08/treecc-info.owl The treecc introspector ontology main file.
This ontology is extracted from the C header of the treecc program.
It will allow the RDF markup of the instances of the compiler objects defined using the treecc language.
In that sense, it is a meta-meta-model.
The treecc grammar for a language is a form of meta-model of that language.
The supporting files are also here:
This file [http://introspector.sourceforge.net/2003/08/treecc-input.owl|input] contains many enum values that will be used to mark various nodes; they will become owl:Classes in their own right.
This file [http://introspector.sourceforge.net/2003/08/treecc-parse.owl|Parser] contains supporting properties about the results of the parser, e.g. treecc-parse:TreeCCParse will have an owl:Domain of TreeCCDocument and an owl:Range of a TreeCCIntrospected file.
Sent to the GCC List
--- Joe Buck wrote: > On Tue, Aug 19, 2003 at 12:09:13AM -0700, James Michael DuPont wrote:
> > Dear all,
> > Ashley Winters and I have produced what is the first prototype for a
> > gcc ontology. It describes the facts extracted from the gcc using the introspector.
> Merriam-Webster defines ontology as follows:
> 1 : a branch of metaphysics concerned with the nature and relations of being
> 2 : a particular theory about the nature of being or the kinds of existents
> I don't think that this is the right term for a piece of code that produces XML output from a high-level language input.
You're right. The ontology here is a description of the gcc tree nodes at a very high level that allows you to *understand* the RDF/XML output.
The code that produces the data matching this ontology is in my CVS; very boring stuff.
I think that this ontology is interesting because it allows you to express the high-level semantics of the tree structures in a somewhat implementation-independent manner.
When this is all done and 100% tested, we should be able to generate sets of C functions to process data from that ontology, databases to store it, and other things like Perl and Java classes to process the data as well.
N3 coupled with CWM and Euler makes a logic programming language like Prolog; you can express not only data schemas but also proofs, filters and algorithms in N3.
I hope that the proofs expressed in CWM and Euler can be translated automatically into new c functions of the compiler in the very very long term.
In any case this ontology is meant to be human-readable and editable, even if not very pretty. Later on, a lower-level ontology will contain mappings to the exact gcc structures and functions that implement these ASTs.
In any case this ontology should be of interest and value to anyone wanting to study the gcc ASTs, not just someone who wants to deal with an external representation.
The proofs expressed in N3 should be executable directly on the gcc data structures in memory, without any external representation, once we are able to map out all the data structures and generate binding code.
Then users will be able to write custom filters, algorithms and rules that run inside the gcc for them on their own programs.
Basically this is a high-level class model for the GCC internal tree structures as used by the C and (not yet complete) C++ compiler.
The files are based on the OWL [1] vocabulary, which is an RDF [2] application; the syntax can be written in RDF/XML [3], N3 [4] or N-Triples [5] format.
""""The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. """"
This file describes the data extracted by the introspector [0] from the gcc. The format of the file is closely related to the -fdump-translation-unit format, but more usable. I patched the gcc using the Redland RDF Application Framework [8] to serialize these tree-dump statements into RDF statements, using the Berkeley DB backend for fast storage.
The DB is then available for querying from C/C++, Java, Perl, Python, and many other languages via the Redland SWIG interface. Moreover, you can filter interesting statements out into RDF/XML format for interchange with other tools.
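As an example of that interchange step, here is a hedged sketch with the Perl bindings: it reopens the Berkeley DB store built earlier and dumps the whole model back out as RDF/XML for other tools to read. The store name, directory, base URI and output file name are all assumptions for the example.

#!/usr/bin/perl
# Sketch: serialize the "Global" Berkeley DB store to an RDF/XML file.
use strict;
use warnings;
use RDF::Redland;

my $storage = RDF::Redland::Storage->new(
    "hashes", "Global", "hash-type='bdb',dir='.'")
    or die "cannot open store";
my $model = RDF::Redland::Model->new($storage, "") or die "cannot open model";

my $serializer = RDF::Redland::Serializer->new("rdfxml")
    or die "cannot create serializer";
# Base URI matches the one used when the store was parsed; adjust as needed.
$serializer->serialize_model_to_file("gcc-dump.rdf",
    RDF::Redland::URI->new("file:/"), $model);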
You can find an example file extracted from the source code of internals of the pnet runtime engine here [9].
The ontology file is basically a powerful class model; you can use many tools to edit and view it (most of which I have not tried). Two of them are the rdfviz tool and the OWL validator [10].
I used the Closed World Machine [6] from Tim Berners-Lee to process and check this file; that tool, along with the EulerSharp [7] work that I am doing, will allow you to run queries, filters and proofs over the data extracted from the gcc.
Further still, my intent is to embed a small version of the Euler machine into the gcc and dotgnu/pnet to allow proofs to be made at compile time.
mike
[0] Introspector - introspector.sf.net
[1] OWL - http://www.w3.org/TR/owl-ref/
[2] RDF - http://www.w3.org/RDF/
[3] RDF/XML http://www.w3.org/TR/rdf-syntax-grammar/
[4] n3 http://www.w3.org/2000/10/swap/Primer
[5] ntriples http://www.w3.org/2001/sw/RDFCore/ntriples/
[6] CWM from timbl http://www.w3.org/2000/10/swap/doc/cwm.html
[7] Eulersharp http://eulersharp.sourceforge.net/2003/03swap/
[8] Redland http://www.redland.opensource.ac.uk/
[9] Example n3 file http://demo.dotgnu.org/~mdupont/introspector/cwm.rdf.gz
[10] RDFVIZ and validator http://www.ilrt.bristol.ac.uk/discovery/rdf-dev/rudolf/rdfviz/
http://owl.bbn.com/cgi-bin/vowlidator.pl
The argument of the DRM proponents is that it is not possible to
protect their content without taking away the rights of the students.
That is why I have sought to design a solution for content distribution
based on free software and open standards that still protects the
content from illegal distribution.
I seek with this proposal to address these issues in the context of
free software without violating the rights of the students.
Let's say that we have some content that an author worked hard on, and
it should be distributed to people who decide to pay a reasonable fee.
Now the one issue is that even if the users should have the right to
examine the source code of the software, we still need a way to prevent
them from extracting the content out of that software.
If you allow the user to modify the viewing software so as to create a
human-readable and machine-processable copy of the content instead of
displaying it, then you are opening the content up to further
duplication. (We are setting aside screenshots and OCR software here.)
Let's say that you want to deliver a rasterized copy of the content to
the user at an agreed-upon resolution. Vector graphics would again
allow too much to be extracted.
So we have an agreement between a content provider and a content
consumer for delivery of a certain amount of content, at a certain
level of quality, to a viewer that limits the user's rights in a
predefined manner.
Now, the viewer cannot store the content in an internal data format that
is readable by a debugger, because it would be too easy to snarf that
data out.
So, I think we can solve this problem very simply: you need to trust
that the user will only use an agreed-upon version of the viewer
software. This software can be free software, and the full source code
may be made available, but the content provider does not agree to
provide the content to anything but a specified and verified set of
modules on the user's side.
So I proposed the following architecture:
1. The users are to be validated by a chip-card system; each user must
have a way to authenticate their identity using a card issued by the
content provider or a certificate authority. Simple PGP or SSH
certificates could also be agreed upon here.
2. The users agree to have a free software client module installed that
is of a specified version. This software is able to make a network
connection to the content provider and send a digitally signed and
encrypted signature of itself to the content provider over a secure
channel (see the sketch after this list). This creates a secure session
that can only be understood by the client module. The user agrees that
he does not have the right to intercept this content, even though it
uses open and free software that he can inspect at his leisure. The
session, however, is only good for one set of packages, because the user
might swap out the software once the session is set up. Hardware-based
checksumming might help speed up this signature process. BSD has such a
software signature facility built in as well.
The user agrees to allow the server to re-check/audit the validity of
the client software at its leisure, at a predefined interval; that way
the server administrator and users can agree on a set of security
levels appropriate for the given application's performance requirements.
3. The user uses this session to request content, which is sent securely
to him/her. The content is encrypted with an agreed-upon encryption
standard that prevents the user from viewing it directly. Only the
client software session, given an authentication token from the
provider and one from the client, will be able to decode the content,
and only once. The software then deletes that content according to
the agreed procedure.
4. The user can then view the rasterized image. That image could also
be watermarked and ID'd. The agreement between the content provider
and the user may define various rules preventing the removal of the
various security watermarks. Of course the user can take that one
raster image and distribute it illegally; there is nothing that any of
the DRM schemes can do to prevent that.
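As a rough illustration of step 2 above, here is a hedged sketch of the client-side self-audit. Everything specific in it is an assumption for the example: the module path, the provider's verification URL, and the use of a plain SHA-256 digest rather than a signed and encrypted one.

#!/usr/bin/perl
# Hypothetical sketch: the client hashes its own viewer module and reports
# the digest to the content provider before a session is opened.
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use LWP::UserAgent;

my $module = "/usr/lib/viewer/viewer-module.so";        # assumed install path
open my $fh, '<:raw', $module or die "cannot read $module: $!";
my $digest = sha256_hex(do { local $/; <$fh> });
close $fh;

my $ua  = LWP::UserAgent->new;
my $res = $ua->post("https://provider.example/verify",  # assumed endpoint
                    { module => $module, sha256 => $digest });
die "audit failed: " . $res->status_line unless $res->is_success;
print "module accepted, session may proceed\n";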
You see, this is a consent-based security system that requires that no
freedoms be removed from the user. The content provider reserves the
right to refuse delivery of content to any other version of the
software; the client, however, has the freedom to modify this software
and submit it to content providers for certification.
I think such a consent-based content management system is much saner than
using non-free file formats and non-free software.
What do you think?
I have posted an open letter to Red Hat concerning their support of students' rights:
It is amazing that Red Hat has such a restrictive license on their courseware, considering how many good courseware projects there are.
I find no mention of freedom or the GNU Free Documentation License on the Red Hat "open source" educational site. It makes me wonder how good this education is.
Digital Think is the exclusive provider of Red Hat eLearning. http://www.digitalthink.com/catalog/license.html
"Licensee shall not, without the prior written permission of DIGITALTHINK, nor permit anyone else to copy, decompile, reverse engineer, disassemble or otherwise reduce the Courseware to a human perceivable form, or to modify, network, rent, lease, loan, distribute, or create derivative works based upon the Courseware or the documentation in whole or in part."
I have posted these questions to all of the appropriate places in Red Hat, with no response. Now it is time for me to ask the community to help get some answers.
The following questions are directed at Red Hat, at the teachers and students of courses, and at students themselves. If you could take a minute to fill out these questions and post them back to the list, that would be great.
That is why I would like to see some official statement from Red Hat on the following questions:
1. Do you support and promote the usage of the Gnu Free Documentation license for your learning materials?
2. Do you support and promote the usage of internet Standard file formats for your learning materials?
3. Do you support and promote the usage of free software for your distance learning software?
4. Do you support the idea of allowing access to the source code of all the tools involved in the courseware?
5. Do you support and promote relieving students from EULAs and other license agreements that are designed to take away the freedoms of the students?
6. Do you support and promote the documentation of free software?
7. When you are creating learning materials, do you contribute your changes, improvements, criticism, ideas and sources back to the community?
I look forward to some response.
Read more here :
advogato article
The Introspector project has realigned itself around a new set of achievable goals. After more than a year of research into the semantic web and the dotgnu system, I have concluded that the main aims of the introspector can be reached much more quickly via simpler goals.
The original goal of the introspector was the extraction of metadata from the GCC. The DotNet system presents you with much more metadata than you could ever want.
The dotgnu/pnet system is GPLed and has tools to disassemble, assemble and run CIL binaries from DotNet.
http://www.southern-storm.com.au/portable_net.html
RDF is the cornerstone of the semantic web; Redland is a great library for processing RDF.
http://www.redland.opensource.ac.uk/
The new introspector module will allow you to convert your IL code into RDF for semantic markup, and also to assemble RDF back into DotNet binaries.
Later versions will also allow tracing the execution of your programs in RDF.
These features will unite the semantic web and the DotNet world. Programs and executions can be treated as data; data can be treated as logical statements and fed into proof engines.
Also, you will be able to transform your RDF and XML files into the introspector RDF for translation into binaries.
The end result will also be the ability to semantically mark up IL code for converting it into a new language, opening up a new world of semantic programming to DotNet.
See the simple plan here :
http://sourceforge.net/mailarchive/forum.php?thread_id=2916369&forum_id=7974
See the Original kick off here :
http://sourceforge.net/mailarchive/forum.php?thread_id=2911806&forum_id=7974
GForge is the next-generation SourceForge; gSOAP is a fast C++ SOAP implementation. They are now talking!
Not that you might care too much, but I thought you might be interested.
This is just the hello function, but more is to come.
You can find a 1.4 MB archive of my current working interface and binaries here:
The SOAP WSDL API is here: SoapAPI.php?wsdl
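Just to show how the hello call can be driven from Perl, here is a small sketch with SOAP::Lite. Only the relative path SoapAPI.php?wsdl is given above, so the host name below is a placeholder, and the hello operation is assumed to take no arguments.

#!/usr/bin/perl
# Sketch: call the GForge "hello" operation through its WSDL description.
use strict;
use warnings;
use SOAP::Lite;

# Placeholder host; substitute the real server that serves SoapAPI.php?wsdl.
my $wsdl = "http://gforge.example.org/SoapAPI.php?wsdl";

my $service = SOAP::Lite->service($wsdl);
print $service->hello(), "\n";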
GForge is the successor to SourceForge: GForge.org
mike
The Free Software in Education project seeks to enable, advocate, and defend Free Software usage in schools from Kindergarten to university.
It is looking for members from all over the world.
One of the goals is to define and defend the rights of students to use free software.
Please just join up and tell us about your experience with Free Software at school.
I've got a bad feeling about this.