Database Abstract Interface Discussion

last modified: August 21, 2013

(Moved from GreatLispWar, which is getting TooBigToEdit.)


...Except, of course, for every indivisible algorithm that requires customisation, as shown in HofPattern.

It's great because you say it's great. The pattern is rare in the wild, or at least uncompetitive with alternative solutions.

[What about qsort(), which is part of C's standard library? What about approximately half of Ruby's standard library? Are standard libraries rare in the wild? What about any genetic algorithm ever written, DijkstrasAlgorithm, AstarSearch, and so on? -DavidMcLean]

I've never ever used qsort. I imagine it's useful for SystemsSoftware and embedded systems, but those are not my niche. And much of Rudy's library, and perhaps qsort's comparison techniques, could be implemented in objects. If you don't like Ruby's object syntax, complain to the Ruby makers, not me.

[If you use an object constructed on-the-fly with an associated callable method---the C++ functor pattern, effectively---then you're using higher-order functions. It's exactly the same thing, just with syntactic overhead. The minimal syntactic overhead on blocks in Ruby is vital to their utility in code; any syntax that required declaring an object-with-some-named-method to pass to things like Enumerable#each would be too verbose to be used as ubiquitously as Enumerable#each is and should be. JavaScript might be a better example to look at here, because its object syntax is extremely concise; despite this, one finds that the vast majority of functions-that-need-to-take-functions just take functions, not objects. -DavidMcLean]

Again, that's perhaps a problem with Ruby, not objects in general. When deciding what to include in a language or shop conventions, if x and y are similar and overlap in many areas, and you agree that it should have x, but circumstances where y shines over x are not used/found very often in the field, then it makes sense to exclude y if you want to keep the language relatively simple. (NonOrthogonalLanguageFeatures.)

[You could change the object syntax such that there's syntactic sugar for creating a C++-functor-type object, i.e., an object with a callable associated method, which is what you're actually trying to pass around. However, providing sugar like that is providing lambda syntax: It's exactly what Java 8 is doing. On the other hand, if you're trying to reduce the number of related concepts in a language, why not remove named functions and methods? CoffeeScript proves that you only actually need anonymous ones in the syntax. -DavidMcLean]

Or we could hard-wire table/collection-oriented idioms into the language rather than rely on libraries (a bit like ExBase). This may prevent unwanted creativity. I'm sure you would object (no pun intended), but it's back to the 'ol "herding" debate.

[That's adding more NonOrthogonalLanguageFeatures, Top, and specialised ones at that. In addition, if we represent collections using parts of the standard library (the Enumerable module, in Ruby), we can add more collections. If we hardwire collection handling, how would we do that? -DavidMcLean]

First let's analyze some realistic situations where that might be a need.

[There's a database vendor who isn't the developers of the language. We want to be able to query that database. -DavidMcLean]

ODBC driver and/or a C API.

[And how do we make our hardwired collections handling use that C API, when it wasn't designed with such an API in mind? -DavidMcLean]

Maybe I misunderstood you. I thought you meant like a tool vender being able to use this hypothetical TOP language's "internal" database data. Instead you mean having "TOPX" communicate with another vendor's database? Something like ODBC is the usual way. And I don't see how Ruby improves on cross-language and cross-vendor data sharing.

[Yes, I meant querying another vendor's database. Let's make it concrete: You're using TOPX and you find you have to query a MongoDB database. How do you do this, preferably without sacrificing your collection syntax? -DavidMcLean]

queryLoop to=q, db=mongo1, qry="Magic mongo syntax"  // for each result record
  print q.name, q.rank, q.serial  
  doSomethingElseWithRow(q)
End 
queryLoop to=q, db=mongo1, glib="nof", florf=7, tloog="pift", zarg=x
  print q.name, q.rank, q.serial  
  doSomethingElseWithRow(q)
End 
Query native=true :
   SELECT $myFields
   FROM FOO
   WHERE X=5 AND $myAndClauses
   ORDER BY $myOrderFields
Loop: 
   print a, b, c
End Loop
// Example: Carmen01
// search for monotone (greyish) jpeg's of Carmen Electra's face
Query driver="googleImages":
  SELECT * FROM googleImages 
  WHERE faceness > 0.7 AND extension IN ('jpg','jpeg') AND 
    saturation < 0.2 AND textQuery = 'Carmen Electra'
Loop:
  print url, size, width, height, extension, altTag
Status:  // std. query result info
  print "Rows: $rowcount, Query-Time: $qryTime"
End Query
// [leave dot below to work around a wiki bug]

.

// Example struct01
table: parseTree
---------
ID
parentID  // zero for root
sequence
token
tokenType   // token or value type
grammarRef  // grammar rule reference number/ID
warningMsg   
errorMsg    

.

As far as why it's "better than" Linq? Because the native syntax is much closer to SQL. One can copy and paste from and to a DB browser such as Toad with little or no changes. There would be a "native" mode and a pre-parsed mode. Native would just pass on the query as-is, after any variable substitutions. The pre-parse mode would generate the parse-tree table talked about above under the hood. There are tradeoffs between each approach.

[Reasonable. I assume "variable substitutions" aren't necessarily actual variable substitutions, however, right? They can be prepared-statement value binds as well? -DavidMcLean]

The pre-parsed approach could be used to generate prepared-statements via the drivers since they have access to the parse tree, and could be used like subroutines in which variables are passed.

[But the pass-through approach doesn't? Wouldn't that make it inherently insecure? -DavidMcLean]

If you want to force a Nanny Switch on your version TOPX forbidding pass-thru queries, be my guest. I would make it optional. But that's another topic.

[Are you saying you want to make security optional? -DavidMcLean]

Sometimes pass-thru is the only way to get something to work. Let's just agree it's a shop decision at this point and not get side-tracked into security.

[When you're designing syntax for querying databases, security isn't a side-track. It's a central concern. -DavidMcLean]

Almost every common programming stack allows "direct" queries if you want them.

[Certainly, as a side-effect of the way even with secure variable binding the query itself is still a string. You aren't designing a low-level database connection API, however. You're designing a special syntax for building queries, which is intrinsically at a higher level. The expectation with such a syntax is that it's secure, rather than thoughtless string-concatenation. -DavidMcLean]

Higher-level tools generally need to offer an escape hatch to the lower-level when there's a problem with the higher level. A high-level GUI API should be able to address, read, and set individual pixels, for example, for the times that the high-level commands can't satisfy a specific need or have a bug. (See also LeakyAbstraction.)

[Reasonable, but the low-level escape hatches shouldn't be a first resort. They're comparatively dangerous, and developers need to know that to compensate accordingly. -DavidMcLean]

Let's leave tuning what the defaults are to another day.

[I'm not interested in tuning the defaults. I'm interested in making the design, as a whole, safe. -DavidMcLean]

My philosophy is to make the safer option the default, but let the shop managers or owner make the final decision as far as the trade-off choices. I build tools, not cops. (And some drivers may not have the ability to provide pre-analysis.)

[If the safer option is indeed the default, why'd you immediately jump to the unsafe option as soon as I asked for usage examples? I'm honestly confused. -DavidMcLean]

If you are not explicit, I will choose the shortest interpretation out of both laziness and to keep the examples easier to read.

[Right, I'll be explicit. I'd like to see the safest, most correct, and most idiomatic version of any example. -DavidMcLean]

Some companies choose nimbleness or shorter-term productivity over safety. The risk profile that the owner wants is not the programmers' job to decide, other than advise.

[I'm not some companies. As I said, I'd like to see the safest, most correct, and most idiomatic solutions. With a good design, those three factors should all be met simultaneously. -DavidMcLean]

Naive thinking about the real world. CategoryIdealism.

[Safety, correctness, and conformance with language idioms are all aspects of a design that aren't impacted by the real-world desires of "some companies", only by the decisions of the language designers themselves. In short, the real world is irrelevant. -DavidMcLean]

Wrong. If companies and orgs don't want to use it, it gets selected into obscurity. Purity Island.

[Wrong how? It's true that unused technologies sink into obscurity. That has no impact on the properties of safety, correctness, and conformance with language idioms, which are the properties I'm concerned with. -DavidMcLean]

And I'm concerned with world peace and saving sick puppies.

[Good to know. I'm excited to hear how supporting SqlInjection in your programming language will cause world peace. -DavidMcLean]

It could end up doing the opposite of what you intended because people will become disillusioned with what they consider "anal" products and revert to super-loose systems instead. People tend to "fight the last battle" meaning that a bad experience with an approach will tend to make them focus on avoiding anything remotely the same such that the pendulum swings to the opposite side. A middle-level approach thus may be a better way to achieve that goal in practice. If you don't understand human nature, you will get bitten by human nature. We make products for mostly people, NOT machines. -t

[But with correctly-written data source drivers, isn't writing correct, safe queries in TOPX exactly like writing SQL? Wasn't that the whole point? -DavidMcLean]

Like I said, somewhat often drivers can't handle something properly or do not yet support pre-parsing. The option to use the raw mode (a.k.a. "pass-through") should be available to at least the shop IT stewards. I agreed it should NOT be the default (if driver supports pre-parsing). This is not unreasonable. Good drivers can be hard to find because the audience can be relatively small because it's a combo of specific product and a specific database/service vendor. Asking a vendor or the OSS community to write query-grammar-perfect drivers for thousands of DB tools/products, many of them very "nichey", is a tall order. Raw-mode drivers are probably going to be available earlier (and more readily) than grammar-based drivers. And this applies to prepared statements (TopOnPreparedStatements) also, not just the hypothetical TOPX language above. Let me put it this way: A raw-mode driver is better than a broken or missing driver (unless you are the military, FBI, etc. where security far outweighs productivity.)

[It's not expected for vendors to write drivers for thousands of database products. Vendors just write drivers for their own database products; that's how it already works, and there's nothing fundamental about TOPX that means it can't keep working that way. In my opinion, a database driver that can't prepare statements is worthless, but clearly our opinions differ. -DavidMcLean]

Are you talking about a generic kind of driver, such as ODBC? If there is and app-specific API or language, then something has to translate that API/language into something that the ODBC driver and/or the server can understand. For example, the Linq API is a language until itself. Something has to translate the Linq "tree" into SQL and/or something the destination server can understand. And I don't know if Dot.Net's Linq driver is different from Java's Linq driver etc. Either way, TOPX would need something similar for each DB vendor. Something has to translate a non-trivial language (app/client side) into another non-trivial language (DB server side).

for V = each db vendor/service {  // Oracle, Sybase, Mysql, etc.
   for L = each app/client-tool language {  // VB, Java, Php, etc.
     write a Linq or TOPX driver[V,L]
  }  
}

Translation from SQL to Google Example

This is an amateur-style parsing approach to converting the Carmen01 example query to Google's URL "command". (There are fancy language processing techniques or libraries that can be used for such, but this is not a demonstration of building "good" translators.)

To simplify things, I will only show two criteria factors instead of four, and assume the ID is the same as the sequence number (see example struct01).

Although Google's actual facial detection is given as all or nothing (&imgtype=face), I'm assuming a threshold here to make it more realistic (1.0 is 100% certainty it's a face). Thus, I added "face_op" to indicate the made-up comparison operator and am assuming any other such thresholds append "_op" to the name for the comparison operator counterpart.

ID   ParentID  token        tokenType
-------------------------------------
1    0         sql          root
2    1         SELECT       clause.select
3    2         *            column-filter
4    1         FROM         clause.from
5    4         googleImages var.tableview   
6    1         WHERE        clause.where
7    8         faceness     var.column
8   10         >            op.cmpr.gt
9    8         0.7          value.number
 10    6         AND          op.bool.and
 11   12         textQuery    var.column
 12   10         =            op.cmpr.eq
 13   12         Carmen Electra value.string  // don't need to preserve quotes

Let's say the final result is the following URL being sent to the Google Search service:

searchkit.google.com?customerid=12345&face_op=gt&faanyhowce=0.7&q=Carman%20Electra

Here is the driver processing in pseudocode. (I use BASIC-like code mostly to make fixing TabMunging easier.)

func processGIquery(queryText, parseTbl, info[])
   // skip verifying table name because we are lazy
   mapCrit=map("faceness":"face","textquery":"q","etc":"etc") //criteria name translation
   mapOp=map("op.cmpr.eq":"eq","op.cmpr.gt":"gt","etc":"etc") //compare op trans 
   gglURL = "searchkit.google.com?customerid=12345"   // init
   // find the WHERE clause     
   wid = parseTbl.firstVal(col="id", where="tokentype='clause.where'") // quicky query
   //
   Query:  // (internal table so no driver need be specified)  
     SELECT * FROM $parseTbl.name      // get all critera expressions
     WHERE id > $wid AND tokenType LIKE 'op.cmpr%'
   Loop:   // for each comparison operator (2 in this case)
     procCriteria(id, tokenType)
   End Query
   return sendHttpGet(gglURL)  // send URL with appended parameters
  // --------- (nested function; "||" is string concat)
  func procCriteria(opid, operType)  // process a comparison operator
    left = parsTbl.firstVal("token","id="||opid-1) // left side of op
    right= parsTbl.firstVal("token","id="||opid+1) // right side of op
    gglURL.append("&"||mapCrit[left]||"="||right) // add translated crit name
    if operType != "op.cmpr.eq" // add comparison portion if applic.
      gglURL.append("&"||mapCrit[left]||"_op="||mapOp[operType])  
    end if
  end func
end func

Not shown is the code that converts the return results (XML or JSON) into a table.


The native dialect of "TOPX" SQL could perhaps support "escape" clauses:

query:
  SELECT * FROM table2 WHERE x=2 ESCAPE $specialStuff ORDER BY y
eachRow:   // loop:?
  print foo
end query

The stuff within the ESCAPE clause is sent to the database as-is. In some cases this could get messy if the translated version of the SQL (if translation is done by the driver) does not match up compared to how we expected it to and the escaped clause ends up in Timbuktu. For example, suppose a database does not support explicit JOIN clauses and translates them to the older-style WHERE...AND clause version such that the ESCAPE clause's material is "moved" to a different part of the query then where we expected.

This is not necessarily a design flaw of our draft TOPX, just the side-effects of allowing flexibility. However, the level of flexibility allowed is always a matter of debate.

[Allowing for bypassing the language's parsing might indeed be necessary as a feature in some cases, but calling it ESCAPE is probably an awful, awful choice. That keyword connotes the idea of escaping values to make them safe, as in SQL string escaping or shell escaping. This backdoor deliberately has the opposite effect, rendering an otherwise safe value "live" and dangerous. EVAL might be a better keyword for this concept, since it does indeed cause some value to be eval()-style evaluated by the database, and most devs are familiar with the injection potential and hence possible dangers of an eval() call. -DavidMcLean]


See also: EmbraceSql


Loading...