Make Shell Aliases Easy
May 5th, 2015
if you run a command with at least one argument twice a day, then you should make an alias for it.
Today’s web-based applications can freely pass around large amounts of data formatted in XML and JSON. Businesses often want to hang on to and analyze even the most obscure pieces of information, either for historical (audit) purposes, or to attempt to mine the data for competitive advantage. Storing XML or JSON in databases can be problematic. Fortunately, modern databases rise to the challenge and include data types that allow us to store large blocks of text outside of the tablespace while still providing near seamless access to that data. The CLOB is how Oracle stores large blocks of character data.
A character large object is a data type in a database management system capable of storing very large amounts of text (string) data. CLOBs are often stored in an out-of-table location apart from other table data. The majority of database systems support this data type; some are named text, memo, or long character. In Oracle, a CLOB can store an array of character data up to 4GB in size.
In Oracle, the varchar and varchar2 data types are identical and are the typical type used for storing string data. The varchar type has a maximum size of 4000 bytes, which is usually enough for most purposes. varchar data is stored within the table along with whatever columns that are defined in the table. The problem is, when we need to store JSON or XML data, 4000 bytes gets used up very quickly. Due to the manner in which the database allocates storage for data, large arrays of character data can cause performance and optimization problems. Thus, the CLOB type was invented to solve the problem. If you are using another DBMS, SQL Server for example, the varchar types have a larger maximum size and thus you may be able to get away with using them for storing JSON and/or XML data. However, the risk with using any fixed size character field for marked up text is that someone along the line will insert some data that will exceed the stated storage capacity. If you have tight control over that data, varchar might work, otherwise 4GB gives you a lot of buffer in your design.
Creating a table with a CLOB column is simple:
create table MYCLOBS
(
id number not null,
data clob,
date_created timestamp,
CONSTRAINT PK_MYCLOBS PRIMARY KEY (id)
);
-- An auto-generated sequence for the ids
CREATE SEQUENCE myclob_seq;
CREATE OR REPLACE TRIGGER myclob_bir
BEFORE INSERT ON myclobs
FOR EACH ROW
BEGIN
SELECT myclob_seq.NEXTVAL
INTO :new.id
FROM dual;
END;
The above SQL create statement creates a new table that includes an id column, a creation date timestamp column to record when a row is inserted, and a CLOB column named “data” in which we will store some JSON data.
Everything in software design involves trade-offs and CLOBs are no different. For the price of being able to store large amounts of out-of-table character data, interacting with that data becomes slightly more difficult. As the desire to store JSON and XML data in tables has become more prevalent, newer versions of popular databases are eliminating the overhead and the hassle of dealing with the CLOB type. My notes and observations come from working with Oracle 11g (v11.2.0.2.0). It is important to mention that in Release 12, Oracle has upped the ante even further by adding native support for XML and JSON data, including constraining CLOB columns to a particular type (a check to ensure the data is valid JSON):
ALTER myclobs ADD CONSTRAINT ensure_json CHECK (data IS JSON);
That type of constraint is not available in version 11g. Version 12 also includes the capability of writing select queries using a dotted notation directly on data contained within the JSON structure, for example:
SELECT data.eyeColor FROM myclobs;
Inserting bulk data is seamless, depending on which tool or framework you are using.
Here is an example insert statement:
insert into myclobs (data, date_created) values (‘[{
"_id": "556e52fc21aa6202ee1aac18",
"index": 0,
"guid": "dd60b0a8-2bb8-4dae-83b4-1940b2e47f9c",
"isActive": false,
"balance": "$3,616.96",
"picture": "http://placehold.it/32x32",
"age": 40,
"eyeColor": "blue",
"name": "Mayra Stevenson",
"gender": "female",
"company": "EPLODE",
"email": "mayrastevenson@eplode.com",
"phone": "+1 (927) 436-2136",
"address": "702 Celeste Court, Beaverdale, Connecticut, 169",
}]’, SYSDATE);
There is a ‘to_clob()’ function you can wrap around the string literal if your development tool requires it; IntelliJ does not. Note that there is a hard limit on the length of the actual string in the SQL statement itself, regardless of what data type your column is. This gist example generates the “string too long” error.
There is a work-around for inserting large JSON or XML structures from your SQL statements.
DECLARE
json CLOB := 'JSON chunk 1' || 'JSON chunk 2' || 'JSON chunk 3';
BEGIN
insert into myclobs (data, date_created) values(json, SYSDATE);
END;
Deleting data stored in CLOB form requires no special syntax. Depending on the DBMS and the amount of data being removed, it may be prudent for the DBA to reclaim the LOB space.
DELETE FROM myclobs where id=21
Querying a CLOB requires different syntax:
SELECT * FROM myclobs
where dbms_lob.instr(data,'Stevenson') >= 1
Updates that completely replace your CLOB data are trivial:
update myclobs set data = '{ "changingTheJSON": "Test" }' where id=2
What is more interesting than a straight update of the CLOB data is that you can edit parts of your XML or JSON in a batch fashion:
INSERT INTO myclobs (data, date_created) VALUES('<attrs><attr name="name"><string>Barney</string></attr></attrs>',
SYSDATE);
UPDATE myclobs m SET data=UPDATEXML(
XMLTYPE.createXML(data),'/attrs/attr[@name="name"]/string /text()','Gumble').getClobVal()
WHERE m.id=26;
That query could easily be extended to update many rows simultaneously, perhaps to add or repair previously saved XML.
Hibernate makes working with CLOB data completely seamless. A CLOB column is simply represented as a String type. In Java entity classes, a CLOB looks exactly the same as varchar data. I have pushed a simple example to Github. Outside of Grails, few developers seem to be generating their tables using Hibernate, probably because DBAs want to be able to review and run SQL in higher environments such as QA and PROD, since many companies do not give developers full access to those databases. For those developers fortunate enough to have free rein, adding the annotation ‘@Lob’ to the intended property of your entity, will ensure that Hibernate uses the CLOB datatype instead of varchar.
@Lob
@Column(name = "DATA")
private String data;
Again, if you are not using Hibernate to create your tables for you, the @Lob annotation is spurious.
The BLOB datatype is worth a quick mention: A BLOB is a Binary Large Object, and the name is exactly what it implies. Instead of character data, BLOBs contain binary data, for example images, PDFs; anything really, that is large in size and other than text. Like CLOBs, BLOBs are stored out of table and have special considerations around using them.
I will be criticized if I fail to mention NoSQL solutions because they were created specifically to deal with storing JSON. Hadoop, Couchbase, and others represent groundbreaking technology for storing XML and JSON in simple map structures while avoiding the overhead of a full DBMS. These tools are excellent, but I have not yet been able to make use of them. Company data is either extremely valuable, sensitive (e.g. PHI or PCI), or both, and in these days of major embarrassing data breaches, all data is being more conservatively looked after than ever before. Due to the high value placed on data, it is difficult sell for anyone in a leadership position (not to mention a developer) to make the case for implementing a NoSQL solution. Even where it is possible to do so, individuals must first answer many difficult questions relating to architecture, security, monitoring, redundancy, backups, purchasing and allocation of adequate storage, and training of administrative staff. Companies heavily invested in databases such as Oracle are not often interested in procuring a competing form of data storage. No matter how speedy or technically sensible a NoSQL solution may be, it may not be feasible to implement it. I fully expect that NoSQL will achieve wider adoption in the future as Oracle and other large database players (“trusted brands”) are now producing their own NoSQL products. Perhaps in the near future NoSQL capabilities will be considered common infrastructure for developers to solve problems with.
The CLOB datatype is a specialized and very useful type available in modern database systems. Using CLOBs when needed will keep your database running efficiently and thereby keep you out of trouble with your DBA. Future widespread availability of NoSQL solutions will no doubt allow for even more thoughtful system architectures to be built. In the meantime, CLOBs can fill the gap.
if you run a command with at least one argument twice a day, then you should make an alias for it.
Google Code is shutting down (because everyone uses github anyway) So long and thanks for all the fish. It has been a good ride — thank you for pushing open collaboration forward! Its been a good transition to git. If you haven’t yet, consider…
Currently, Travis doesn’t have a way to display a build status badge for individual jobs in a matrix build. You can only get an individual badge which displays “passing” if all the jobs pass or “failed” if one job fails. I’ve created a small open…
Insert bio here