Pseudoscience and Wrongful Convictions in the War on Drugs

We take judicial notice of the frightening rise of illicit drug use . . . in this country which is rapidly approaching epidemic proportions. However, we cannot allow this fact to result in a lessening of the state’s requirements of proving each element of the crime beyond a reasonable doubt, for this requirement has long been a metaphysical cornerstone of our criminal justice system.

— Slettvet v. State, 280 N.E. 2d 806, 809 (Ind. 1972)

“Beginning in 1972,” according to the Journal of Criminal Defense, “Bob Shapiro and James Shellow . . . made the ‘chemical defense’ famous. . . By insisting that forensic chemists conduct accurate and comprehensive analyses, Messrs. Shapiro and Shellow have wrought a revolution in forensic chemistry. The federal government and the states have had to free people on the basis of evidence that just a few years ago would have been sufficient grounds for conviction. Drug enforcement laboratories have had to go back to school to study chemistry. Defense lawyers have had to learn what the ‘chemical defense’ is and how to use it. In short, every courthouse in America, advertently or inadvertently, has been affected by the ‘chemical defense.’”

The journal added that: “In the April, 1976 issue of Microgram, the Drug Enforcement Administration’s official publication, a warning appears on the front page reminding forensic chemists always to distinguish between isomers of cocaine. The implication of this warning is clear: Messrs. Shapiro and Shellow have caused the D.E.A. to subscribe to drug analyses that are both rigorous and responsible.”

Proof that this revolution is dead came in 2007, when U.S. District Court Judge William Alsup declared ex cathedra that the combination botanical exam/Duquenois-Levine test for marijuana (the so-called Thornton/Nakamura protocol) and the cobalt thiocyanate test for cocaine, both of which are considered “no good” by Shapiro and Shellow, were valid, confirmatory tests which had never rendered false positives.

The epitaph on the revolution’s tombstone came in January 2008, when U.S. Attorney Joseph Russoniello wrote to U.S. District Court Judge Jeremy Fogel that: “The DEA does not have ‘protocols’ as such. The DEA does not have guidance set forth in one particular document type or ‘protocol’ which provides, for example, detailed instruction on how one is to test methamphetamine using a particular instrument.” Russoniello also revealed that the DEA conducts no blind, independent proficiency testing of its analysts, nor does it compute error rates.

Ironically, it was in 1972 that the Thornton/Nakamura protocol for proving the presence of marijuana was established, sowing the seeds of the demise of the chemical defense revolution at its birth. It is with the Thornton/Nakamura protocol, in particular the Duquenois-Levine (D-L) test, that our series begins, as its history and use illustrate that millions of individuals have been, and continue to be, convicted of possessing and using marijuana without proof that they indeed possessed it.

As it enters its 70th year, the D-L test has yet to be validated despite being involved in the arrests, prosecutions, and convictions of millions of individuals. The handful of forensic studies done to demonstrate the accuracy and reliability of the D-L test and justify its use in cases are themselves invalid and seriously flawed. For instance, typical of forensic science articles on drug tests was a seemingly authoritative 2000 study funded by the National Institute of Standards and Technology (NIST) and co-authored by Alim A. Fatah of the Office of Law Enforcement at NIST, which claimed to have validated the D-L test. Indeed, the title of the article published in Forensic Science International was “Validation of twelve chemical spot tests for the detection of drugs of abuse.” To validate a drug test means to demonstrate that it is specific, i.e., the test identifies that specific drug to the exclusion of all other chemicals. According to the authors themselves, they did not validate these 12 tests because they found they were nonspecific, i.e., rendered false positives. “A positive CST (color spot test),” they wrote, “may indicate a specific drug or class of drugs is in the sample, but the tests are not always specific for a single drug or [class].” The term “not always specific” (as well as “relatively specific,” which was also used by the authors) is unscientific, illogical, deceptive, and indicates unreliability. How can a test be specific sometimes and not specific at other times? If it’s not always specific, it’s nonspecific, which is what they found: “For example, cobalt thiocyanate (A.1) is used to detect cocaine. However, many other drugs will also react with this reagent and each analyte that tested positive with cobalt thiocyanate, produced a strong blue color (the same as cocaine).” Speaking of the D-L test, they wrote that “mace, nutmeg and tea reacted with the modified Duquenois-Levine [test],” i.e., produced false positives. It should also be noted that there are literally millions of compounds that were not checked to determine whether they rendered false positives with the D-L test. In fact, the authors ignored scientific articles which have reported more than a hundred substances that rendered false positives with the D-L test. This omission violated elementary principles and requirements of scientific research and publication, as well as basic honesty.

Even if they had somehow found the tests to be specific, it would have been meaningless because, as they admitted, the D-L test is subjective: “[A]ctual color [may] vary depending on [the] color discrimination of the analyst.” In other words, an analyst’s or police officer’s vision (including that of the authors) could cause a false positive or even a false negative. Without resolving this impediment to accuracy and objectivity, they should not have concluded the test was valid even as a screening test. People are arrested and jailed on the basis of these tests. By definition, subjective means unreliable. A common dictionary definition of “subjective,” “Existing only within the experiencer’s mind and incapable of external verification,” leads us to realize that the results of the D-L test are inadmissible as evidence under Daubert. The D-L test adversely affects the life, liberty, and pursuit of happiness of millions of individuals. This study is part of this adversity, as it is cited by drug analysts, prosecutors, and judges in justifying the test’s use and admitting its results as evidence. It was recently so cited in the U.S. v. Diaz case in San Francisco, a drug case involving the death penalty. Indeed, citing this and other invalid studies, U.S. District Judge William Alsup declared: “Despite the many hundreds of thousands of drug convictions in the criminal justice system in America, there has not been a single documented false-positive identification of marijuana or cocaine when the methods used by the SFPD Crime Lab (which include the D-L test) are applied by trained, competent analysts.” A few months before Alsup’s declaration, the U.S. District Court for the Southern District of New York stated that: “False positives, that is, inaccurate incriminating test results, are endemic to much of what passes as forensic science.” Even a manual that accompanies the D-L field kit states that: “There is no existing chemical reagent system, adaptable to field use, that will completely eliminate the occurrence of an occasional invalid test result.”

The best-known D-L “validation” study was published in 1972 by John Thornton and George Nakamura. It instantly became the gold standard and protocol across the country for marijuana identification, and it still is. The front page of this article states that the D-L test is a “confirmation” test “of marijuana.” By definition, confirmatory tests are valid and reliable; prove the presence of a drug beyond a reasonable doubt; and are specific, i.e., identify the drug to the exclusion of all other drugs and do not render false positives. They are also selective, i.e., do not render false negatives. As Jay Siegel has written: “A confirmatory test is one that has the capability of identifying a drug after it has been presumptively identified by another technique, thus eliminating all other substances from consideration. Since it is not possible for drug chemists personally to test all the millions of known substances against an unknown in a particular case to make sure the test is specific, a confirmatory test must be theoretically specific; that is the analyst can predict that it would not respond in the same manner to other untested substances.”

Nakamura and Thornton wrote: “The occurrence of cystolithic hairs are an important criterion of the identification of marijuana leaf fragments. . . In any event, cystolithic hairs cannot be used as a sole criterion for marijuana identification. The Duquenois-Levine test is found to be useful in the confirmation of marijuana, since none of the 82 species possessing hairs similar to those found on marijuana yield a positive test.” In the years following the publication of this article, many substances were found to give false positives with the D-L test. Nevertheless, the protocol itself was never abandoned. This is especially troubling because drug analysts look to such studies to assure that their tests are valid, and prosecutors cite these studies to convince judges to admit into evidence the results of these tests.

In 1975, Marc Kurzman and his colleague questioned the Nakamura/Thornton validation study, noting that the sample of plants checked for cystolithic hairs and tested with the D-L test was inadequate. Where Nakamura and Thornton described a population of 31,874 dicotyledonous plants that needed to be excluded with the protocol they were validating, Kurzman’s references described over 195,000. Nakamura and Thornton also reported two non-marijuana substances found to render false positives with the D-L test which the authors did not test. This means they did not prove specificity for the D-L reagent test. Nonetheless, the authors claimed that: “The specificity of the Duquenois reaction has been established, empirically at least, over the past three decades (Ed. Note: No citations). No plant material other than marijuana has been found to give an identical reaction.” They added that: “The original Duquenois reaction was adopted as a preferential test by the League of Nations Sub-Committee on Cannabis (Duquenois, 1950). A modification of the test has been proposed by the United Nations Committee on Narcotics (1960) as a universal and specific test for marijuana. The modification referred to is the addition of chloroform to the final colored complex, a technique suggested by the U.S. Treasury Department Bureau of Narcotics (Butler, 1962) This modification of the test would seem to insure the specificity of the reaction, as the reactive phenolic materials other than the constituents of marijuana resin do not give colors soluble in chloroform. This has lead [sic] the UN Committee on Narcotics to conclude that there is nothing other than marijuana which will give exactly the same Duquenois reaction (Farmilo et al, 1962).” All of these assertions have now been proven false.

As was the case with the NIST-sponsored study, the article itself cannot be legitimately cited by drug analysts or prosecutors, because the D-L test is subjective: it depends on the color discrimination of the tester. As noted above, subsequent to its publication, scores of substances were found to render false positives with the D-L test, and the UN declared that the D-L test was only a screening test and that the only valid test for marijuana and cocaine is GC/MS.

The devastating effect of admitting conclusory reports and the results of nonspecific drug tests such as the D-L test as evidence has been eloquently enunciated by Professor Edward Imwinkelried. He wrote: “It is not only unnecessary for the courts to accept conclusory drug identifications based on nonspecific tests, it is also unwise for them to do so. The essence of the scientific method is formulating hypotheses and conducting experiments to verify or disprove the hypotheses. A proposition does not become a scientific fact merely because someone with impressive academic credentials asserts it is a fact. Testimony should not be treated as an expert, scientific opinion without a truly scientific basis, such as experimentation. Conclusory drug identification testimony is antithetical and offensive to the scientific tradition, and courts should not allow ipse dixit to masquerade as scientific testimony.

“. . . It would eviscerate the Jackson standard to sustain a conclusory drug identification in the teeth of the judicially noticeable fact that every test used to identify the substance is nonspecific. Even more importantly, sustaining such drug identifications places a judicial imprimatur on testimony that cannot justifiably be labeled scientific. The rejection of such identifications is necessitated not only by due process but also by the simple demands of intellectual honesty. After Jackson, sustaining conclusory, nonspecific drug identification evidence is both bad science and bad law.”

A glaring example of intellectual dishonesty and a judicial imprimatur on bad science and bad law was Judge William Alsup’s admission into evidence of the results of the microscopic and D-L tests in the U.S. v. Diaz case in San Francisco. Rule 702 of the Federal Rules of Evidence assigns to district courts the role of gatekeeper and charges the courts with assuring that expert testimony and forensic tests rest on a reliable foundation and are relevant to the task at hand. In Daubert v. Merrell Dow Pharmaceuticals, Inc., the Supreme Court created a flexible, factor-based approach to analyzing the reliability and validity of forensic tests and expert testimony. These factors include: (1) whether a method can be or has been tested; (2) the known or potential rate of error; (3) whether the methods have been subjected to peer review; (4) whether there are standards controlling the technique’s operation; and (5) the general acceptance of the method within the relevant community. The Supreme Court further explained that a district court has “considerable leeway in deciding in a particular case how to go about determining whether expert testimony is reliable.”

Alsup’s interpretations, analyses, and rulings in the case of U.S. v. Diaz demonstrate that this “leeway” is an open sesame, a recipe for disaster, error, and injustice. Regarding marijuana, Alsup wrote that: “Together, cystolithic hairs and clothing hairs were botanical features unique to marijuana.” This is patently false. If it were true, i.e., that certain hairs were unique to marijuana, this would be the only test necessary for confirming the presence of marijuana. In fact, in a study cited by Alsup himself, George Nakamura reported 82 plants he was not able to microscopically differentiate from marijuana based upon the appearance of cystolithic hairs. Even when the presence of cystolithic hairs is used only as a screening test, one has to show the presence of calcium carbonate deposits, or they are not cystolithic hairs. Again, Nakamura’s study pointed this out as a requirement. The SFPD crime lab does not perform this simple test, yet Alsup ruled that it did not matter. The lab’s analysts could simply report that they saw cystolithic hairs, and this would suffice as valid evidence usable as such at trial. In Nakamura’s study, cited by Alsup, it was reported that 25 substances were capable of rendering false positives under the D-L test, which is the most widely used chemical test for marijuana in the country as well as at the SFPD crime lab.

The D-L test suffers from additional problems not addressed by Alsup. For instance, the sequence of colors observed in the D-L test is also used by some forensic analysts to help “identify” marijuana. In 1950, Duquenois completed a United Nations study in which he noted, “The reaction is very specific if one considers the succession of tints. . .” However, it was subsequently noted in another UN study that “the speed of the reaction was so great that it was usually impossible to observe the gradual changes of color described by the authors of the test (green to grey to indigo to violet). It should be mentioned that in some types of Cannabis the initial color was found to be pink instead of green.” The authors of this report observed that: “It is probably fortunate for us that the identification of marijuana has never been legally challenged. However this situation may not last too long…”

Several studies have found that the observed colors and intensities of the D-L test are time dependent and that using a fixed time longer or shorter than twenty minutes to observe the test results increased the number of false positives with non-marijuana substances. The SFPD lab protocols state that the analyst should start noting the color development “about 10 seconds after adding the Duquenois reagent and concentrated hydrochloric acid.” In other words, the lab consciously employs an inaccurate, unreliable version of the test. Another study, provided to Alsup, concluded that: “The microscopic and Duquenois-Levine chemical test should be used as a screening method only …”

Judge Alsup failed to mention many articles showing that the marijuana identification tests used by the SFPD lab are unreliable and invalid, even though several of these articles were provided to him by defense attorneys. Perhaps the most comprehensive deconstruction of the combination D-L/botanical examination test prescribed by George Nakamura and cited by Alsup was the article Winning Strategies for Defense of Marijuana Cases: Chemical and Botanical Issues by Marc G. Kurzman, Dwight S. Fullerton, and Michael O. McGuire. Kurzman et al. wrote that:

“It is usually concluded by forensic analysts that the microscopic test, combined with the Duquenois-Levine color test, is therefore specific for marijuana. Applying the four criteria discussed before … we clearly see that specificity has not been established.

“1. The plant sampling used by Nakamura was not representative of all flowering plants… Second, Nakamura considered only the dicots, and not the monocots (some of which are commonly mixed in samples of presumed marijuana) including at least 50,000 species.

2. The Duquenois Levine color test has subsequently been shown to be quite non-specific.

3. Nakamura cautions the analyst to depend ‘not only on the presence of cystolith hairs, but on its association with the … nonglandular hairs … and if present, the fruits and hulls, the glandular hairs and the flowering tops …’ These additional features have never been proven to be specific for marijuana nor claimed to be by Nakamura. For example, it has been reported that many plants have glandular hairs ‘which, particularly if they are crushed and fragmented may be confused with the glandular hairs of marijuana. Included among these plants are lavender, oregano, and other members of the Labiatae (mint) family and tobacco, all of which are commonly misidentified as marijuana.’

4. … If… one takes the time to learn which plant families have cystolith hairs, stalked glandular hairs, and sessile hairs … the results are remarkable. Families cited as having species with cystolith hairs, 24; with sessile hairs-glands, 80; with all three hair types, 13; with filament hairs, 18. Families cited by Nakamura as having species with cystoliths specifically resembling those of Cannabis, 13; the number of those families also containing species with stalked glandular hairs, 11.”

As Kurzman et al. noted: “For purposes of discussion, let’s assume that we could eliminate 99% of all the 200,000-500,000 flowering plants by using marijuana screening tests, including gross observation of the fragmented plant. That still leaves 2,000-5,000 different plant species which could pass.” Kurzman et al. concluded that “the Duquenois-Levine color test is not specific for marijuana, and if it is to be used at all, it should be used with a specific time limit and with a visible spectrophotometer to reduce the number of non-Cannabis samples giving positive tests.” The SFPD lab does neither.
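The arithmetic behind that estimate can be checked in a few lines. This is only an illustrative sketch: the function name is hypothetical, and the 99% screening rate is the figure assumed "for purposes of discussion" in the quoted passage, not a measured value.

```python
def species_remaining(total_species: int, percent_eliminated: int) -> int:
    """Species that a screening step fails to rule out.

    Integer arithmetic is used so the result is exact.
    percent_eliminated is an assumed whole-number percentage.
    """
    return total_species * (100 - percent_eliminated) // 100


# Range of flowering-plant species given in the quoted passage.
low = species_remaining(200_000, 99)
high = species_remaining(500_000, 99)
print(low, high)  # 2000 5000
```

Even under this generous assumption, thousands of candidate species remain, which is the point the quoted authors were making.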

The SFPD lab botanical exam only seeks to confirm the presence of the cystolithic and clothing hairs. It does not look for flowering tops, fruits or hulls. Nor does it test the presumed cystolithic hairs for calcium carbonate, which Nakamura wrote was essential. Thus, even by Nakamura’s standards, the SFPD test falls short. Judge Alsup stated several times that the cystolithic hairs are “unique” to marijuana, which is why, in his view, the botanical exam is a valid, confirmatory test just by itself. Nakamura himself pointed out that the cystolithic hairs are not unique to marijuana, which is why the botanical exam is inadequate, certainly by itself. No one has ever published a finding that cystolithic hairs are unique to marijuana. However, the SFPD crime lab SOP states that cystolithic hairs are “unique” to marijuana.

In this study cited by Alsup, Nakamura compared marijuana with 82 “representative species that bear cystolith hairs or hairs accompanied by independent calcified growth in the leaf, most of which are similar in structure to those of Cannabis.” Although none of these plants produced a positive D-L test for marijuana, Nakamura cautioned that “no attempt was made to prepare a comprehensive listing because of the sheer magnitude of examining 31,874 dicotyledons…” In other words, there were another 31,792 plants that may have tested positive with the D-L test. So the study did not come close to validating the D-L test.
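The coverage gap can be made concrete with a short calculation. The sketch below uses only the two figures quoted above; the variable names are illustrative.

```python
TOTAL_DICOTS = 31_874  # dicotyledons Nakamura said would need examining
TESTED = 82            # species actually compared with marijuana

untested = TOTAL_DICOTS - TESTED
tested_share = TESTED / TOTAL_DICOTS

print(untested)               # 31792
print(f"{tested_share:.2%}")  # 0.26%
```

On these figures, roughly one quarter of one percent of the stated population was examined.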

As explained in the book, Fitness for Purposes of Mass Spectrometric Methods of Substance Identification: “Moreover, it must be realized that a ‘positive’ confirmation test thus obtained is not an unambiguous identification of Y (unknown substance). It only shows that the test result is not against the presumptions. Other substances may be able to give results that are the same or indistinguishable from those of Y. Therefore, unambiguous identification of Y is achieved if all other (relevant) substances can be excluded, so that Y remains the only possible candidate [even] if one focuses only on those that have some relevance to the field of analysis [data] on thousands of substances per field is necessary.”

Moreover, Nakamura’s testing methods were different from and more extensive than the SFPD lab’s marijuana tests, and thus do not speak directly to the validity or reliability of the SFPD lab’s methods and tests. For example, Nakamura tested for the presence of calcium carbonate in the hairs of samples since it is present in marijuana leaves. As noted, the SFPD lab does not usually test for calcium carbonate in its suspected marijuana evidence. Nakamura also employed photomacrography to measure and compare the sizes of marijuana hairs. The SFPD lab seldom uses such measurement techniques.

The authors of the Kurzman article, one of whom was Dr. Dwight Fullerton, then Assistant Professor of Medicinal Chemistry at the College of Pharmacy, University of Minnesota, go on to review the many findings of false positives with the D-L test and conclude that “the Duquenois-Levine color test is not specific for marijuana, and if it is to be used at all, it should be used with a specified time limitation and with a visible spectrophotometer to reduce the number of non-Cannabis samples giving positive tests.”

The inadequacy of the D-L test has been noted by Armaki and his co-authors, who wrote that “the unsatisfactory color tests [named] Beam, Duquenois, and Chamrawy … lack in adequate specificity…” Turk and his co-workers also reported that “the presently used colorimetric tests respond to a variety of vegetable extracts and to certain pure substances (i.e. false positives).” R.N. Smith found that 12 of 40 plant oils and extracts gave a positive D-L test. M.J. de Flaubert Maunder further questioned the reliability of the D-L test per se by stating that it depended on the subjective judgment of the analyst. “[A] positive test,” he wrote, “is not recorded until this color (pink/mauve) has been identified, and because it is almost impossible to describe in absolute terms it is best recognized by experience, as are the color transitions in the acid solution.”

A chemical identification test should be independent of the experience or judgment of the analyst as long as the analyst knows how to correctly carry out the test and follow the protocol. Otherwise, a second analyst could not necessarily replicate the procedures and findings of the first analyst. Maunder further reported a number of substances which “gave a red to blue chloroform solution which, without careful observation of the speed and sequence of color development after the addition of the acid, may be difficult to distinguish from the cannabis color. None of these materials gave precisely the same color behavior as fresh cannabis, but most could not be readily distinguished from the reaction with old, or trace amounts, of cannabis.”

Maunder found another difficulty with the D-L test when the suspected substance is powdery or sticky. He reported that in this case one will get a false positive for marijuana unless one uses two thicknesses of absorbent paper and sufficient petroleum ether (PE) to moisten the lower paper, and applies the test to the lower paper. “If the PE solution is not filtered in this manner,” he wrote, “most powders will leave enough residue on the paper to give sufficient water soluble material for a false positive.”

Even with the use of two papers, Maunder found that agrimony and henna could give false positives. About agrimony he wrote that “although the color developed on the paper is a paler hue, it could be mistaken for that given by cannabis.” Similarly, the color sequences with henna “are not easily distinguishable from cannabis and the chloroform layer is the correct pink color” indicating marijuana. While Maunder claimed that the D-L test provided “adequate confirmation” for the presence of marijuana, his own findings indicated that it is, at best, a presumptive or screening test. Of the 240 substances he tested, 25 tested positive for marijuana, i.e., were false positives. This means the D-L test is nonspecific to marijuana and is not a confirmatory test. Maunder himself cautioned that the test “should never be relied upon as the only positive evidence,” and elsewhere recommended the use of gas liquid chromatography and thin layer chromatography.
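The proportions reported in these studies are easy to state as percentages. The following sketch assumes, as the text implies, that every substance counted was a non-marijuana material, so each positive result is a false positive; the function name is hypothetical.

```python
def positive_share(positives: int, tested: int) -> str:
    """Share of tested substances giving a positive D-L result, as a percentage."""
    return f"{positives / tested:.1%}"


# Maunder: 25 of 240 substances tested positive.
print(positive_share(25, 240))  # 10.4%

# R.N. Smith: 12 of 40 plant oils and extracts tested positive.
print(positive_share(12, 40))   # 30.0%
```

These are proportions within each study's chosen sample, not error rates for casework, since neither sample was drawn to represent the substances analysts actually encounter.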

C.G. Pitt, R.W. Hemdron, and R.S. Hsia determined that the D-L test “is chemically based primarily on the presence of 1,3-dioxybenzene (resorcinol) partial structure.” In other words, the D-L test may be positive for many resorcinols, which are commonly occurring plant substances and are also found in common drug products. For example, Pitt et al. found that Sucrets (which contain a resorcinol) give a violet coloration for the test. Pinosylvin (from pine wood) and equol (from horse urine) “are other examples of resorcinols which contain at least part of the structural features required for a positive Duquenois test.” They also tested a number of common monocyclic resorcinols and bicyclic resorcinols (chromanols) and found them to give a positive D-L test. “In conclusion,” wrote Pitt, “it is believed that if the criteria for a positive Duquenois test are rigorously adhered to, and botanical evidence is not available, the ubiquitousness of phenols in nature and their diversity in structure makes it mandatory to supplement the colorimetric test with chromatographic evidence. This conclusion is substantiated by the recent report that certain commercial brands of coffee give a positive Duquenois-Levine test.” Pitt added that the D-L test is useful as a “screen” test but not sufficiently selective to be relied upon for “identification.”

At least four court decisions disagree with Alsup’s ruling that the combination botanical/D-L test is a valid confirmatory test for marijuana. In 1973, a court in Wisconsin ruled that: “An expert opinion that the substance is probably marijuana (based on microscopic examination, D-L test and a thin-layer chromatograph) is not sufficient to meet the burden of proving the identity of the substance beyond a reasonable doubt.” Similar rulings were handed down in 1974 by two courts in Minnesota and one in Missouri.

As noted, in Daubert v. Merrell Dow Pharmaceuticals, Inc., the Supreme Court created a flexible, factor-based approach for analyzing the reliability and validity of forensic tests and expert testimony. These factors include: (1) whether a method can be or has been tested; (2) the known or potential rate of error; (3) whether the methods have been subjected to peer review; (4) whether there are standards controlling the technique’s operation; and (5) the general acceptance of the method within the relevant community. The SFPD lab does not test or validate its methods; does not establish error rates; does not subject its methods to peer review; and does not exercise controls or standards in its drug testing. Moreover, the tests in question here have not been validated as confirmatory tests in the field in general, i.e., the tests have not been shown to reliably identify either cocaine or marijuana. In short, there is no evidence that the hypotheses the SFPD lab is relying upon have been adequately tested, either by the lab itself or in documented validation studies.

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) also stipulates that:

“When a category A technique (instrumentation such as mass spectrometry) is not used, then at least three different validated methods shall be employed . . . Two of the methods shall be based on uncorrelated techniques from Category B (includes thin layer chromatography and for cannabis only, macroscopic examination and microscopic examination) . . . A minimum of two separate samplings should be used in these three tests … All Category B techniques shall have reviewable data . . . Cannabis exhibits tend to have characteristics that are visually recognizable. Macroscopic and microscopic examinations will be considered, exceptionally, as uncorrelated techniques from Category B when observations include documented details of botanical features . . . Examples of reviewable data are . . . recording of detailed descriptions of morphological characteristics of cannabis only . . . Laboratories shall have documented policies establishing protocols for technical and administrative review . . .

“Method validation is required to demonstrate that methods are suitable for their intended purpose. For qualitative analysis, the parameters that need to be checked are selectivity, limit of detection and reproducibility . . . Minimum acceptability criteria should be described along with means for demonstrating compliance. Valid documentation is required. Laboratories adopting methods validated elsewhere should verify these methods and establish their own limits of detection and reproducibility.”

The Standard Practice for Quality Assurance of Laboratories Performing Seized-Drug Analysis of the American Society for Testing and Materials stipulates that: “Analysts shall take measures to be assured that identifications are correct and relate to the right submission. This is best established by the use of at least two appropriate techniques based on different principles and two independent samplings. Documentation must contain sufficient information to allow a peer to evaluate the notes and interpret data.”

According to the United Nations Division of Narcotic Drugs’ Recommended Methods for Testing Cannabis:

“When possible, three entirely different techniques should be used, for example, color test and any two of the available chromatography techniques (TLC, GLC, or HPLC). The analysis of cannabis represents a special problem to the forensic chemist.” According to the UN’s Recommended Guidelines for Quality Assurance and Good Laboratory Practices: “Before an analytical procedure can be used to analyze submitted specimens, it must be fully validated in terms of sensitivity (limits of detection), specificity (freedom from interferences), and reproducibility (ability to provide consistent results)…. Before a specimen can be reported positive for one or more drugs of abuse, it should be subjected to two independent tests using separate aliquots of the specimen. If feasible, the two tests should involve different analytical techniques. Specific criteria for what constitutes a positive test should be established and clearly stated in the SOP manual. The criteria should include requirements for acceptable results and quality control samples. Also, before any specimen can be reported positive, the test results should be thoroughly reviewed by at least two individuals who are familiar with the analytical methods. The review should include examination of the test results, acceptability of all quality control results, proper and complete documentation of sample handling (chain of custody), correct calculation of quantitative measurements and absence of clerical error . . . undeclared or ‘blind’ proficiency testing (is recommended).”

The UN’s Rapid Testing Methods of Drugs of Abuse adds that: “Colors formed by the test reagents should be compared with a color reference chart if possible because color evaluation by individuals is a subjective judgment and can lead to misinterpretation of results.” The Scientific Working Group for the Analysis of Seized Drugs’ Quality Assurance/Validation of Analytical Methods agrees with the UN on this point: “Since the results of color tests are detected visually, care must be taken that the analyst be thoroughly tested for the visual ability to detect very slight color changes.”

Contrary to Judge Alsup, several court rulings have denied the admissibility of forensic evidence and expert testimony for lack of adherence to the requirements for scientific reliability and validity. In United States v. Monteiro, the court ruled that even if the general methodology of toolmark identification passes muster under Daubert, the testimony of an expert must still be excluded under Rule 702 if the witness has not complied with the documentation and peer review standards of his own profession. The court further found that the examiner’s case note of a “positive ID” was insufficient documentation because the examiner “did not make any sketches or take any photographs.” In United States v. Green, the court excluded a firearm examiner’s testimony under Rule 702 and Daubert in part because “the absence of notes and photographs in the initial examination make it difficult, if not impossible, for another expert to reproduce what [the government’s expert] did…. Reproducibility is an essential component of scientific reliability.” In Ramirez v. State, a toolmark examiner’s testimony was ruled inadmissible because “there is no objective criterion that must be met, there are no photographs, no comparison of methodology to review and the final deduction is in the eyes of the beholder, i.e., the identification is a match because the witness says it is a match.” In People v. Gomez, a trial court excluded color and microcrystal drug test results because duplicative testing consumed the sample (preventing reproduction by the defense expert) and the analyst took no photographs of the color or microcrystal tests.

The SFPD lab’s drug testing lacks documentation of microscopic or other testing results; fails to conduct or document any reliability or validity testing; fails to follow proper protocol with respect to color tests; lacks peer review; allows for no independent review or replication; and conducts no blind proficiency testing. Therefore, the lab’s drug testing techniques, and expert testimony based on the application of those techniques, are scientifically deficient and do not fulfill the Daubert requirements for reliability, validity, and admissibility. The lab’s specific protocols and tests must be able to uniquely identify cocaine and marijuana to the exclusion of all other substances. But the many flaws in its protocols and the absence of adequate validation supporting them mean that the testability requirement of Daubert has not been satisfied. Yet Judge Alsup admitted into evidence all test results from the SFPD crime lab.

Reviewability and reproducibility are at the heart of verification and the scientific method. Regarding the Supreme Court’s ruling in Daubert v. Merrell Dow Pharmaceuticals, Inc., the Ninth Circuit declared that: “Something doesn’t become ‘scientific knowledge’ just because it’s uttered by a scientist nor can an expert’s self-serving assertions that his conclusions were ‘derived by the scientific method’ be deemed conclusive, else the Supreme Court’s opinion could have ended with footnote 2. As we read the Supreme Court’s teaching in Daubert, therefore, though we are largely untrained in science and certainly no match for any of the witnesses whose testimony we are reviewing, it is our responsibility to determine whether those experts’ proposed testimony amounts to ‘scientific knowledge,’ constitutes ‘good science,’ and was ‘derived by the scientific method.’”

Judge Kozinski’s Ninth Circuit opinion regarding Daubert noted that a gatekeeping court must decide in part whether “… scientists have derived their findings through the scientific method or whether their testimony is based on scientifically valid principles….” (Daubert, 43 F.3d at 1316). In its gatekeeping role, the court should view reliability as follows: “This means that the expert’s bald assurance of validity is not enough. Rather, the party presenting the expert must show that the expert’s findings are based on sound science, and this will require some objective, independent validation of the expert’s methodology.”

According to SWGDRUG, for independent reviewability and validation: “Documentation shall contain sufficient information to allow a peer to evaluate case notes and interpret the data…. Analytical documentation should include procedures, standards, blanks, observations, test results, and supporting documentation including charts, graphs, and spectra generated during an analysis.”

Defense experts in Diaz testified that the specific methodology employed by the SFPD lab could not be peer reviewed or reproduced because the lab’s protocols are too vague to show what was actually done, and the conclusory and cryptic lab notes and reports did nothing to fill the gaps. In short, there was no way to verify the SFPD lab’s findings, and the scientific validity of any subsequent testimony would have rested solely on the word of the lab’s witnesses, which is unacceptable under Daubert.

According to the ruling in In re Paoli R.R. Yard PCB Litigation: “[A]ny step that renders the analysis unreliable . . . renders the expert’s testimony inadmissible. This is true whether this step completely changes a reliable methodology or merely misapplies that methodology.” The lack of reviewability in the Diaz case made it impossible to tell whether what might otherwise be a reliable methodology and test was misapplied. Thus, Judge Alsup should have excluded the prosecution expert’s testimony on marijuana.

Scientific requirements and recent court rulings contradict the claims that these tests need not be specific or exclusive and that the tester’s experience makes their results admissible. A test’s validity and reliability have to stand alone, independent of the experience of the analyst, and the analyst’s experience cannot add to or subtract from its validity and acceptability as admissible evidence. As the Supreme Court recently declared: “Since Daubert . . . parties relying on expert evidence have had notice of the exacting standards of reliability such evidence must meet.”

In 1999, the Seventh Circuit Court ruled that:

“A supremely qualified expert cannot waltz into the courtroom and render opinions unless those opinions are reliable and relevant under the test set forth by the Supreme Court in Daubert.”

Also in 1999, the Justice Court of New York ruled that: “A marijuana field test is sufficient in the bringing of a charge, but more than the results of such a test even coupled with an experienced officer’s identification of the drug, are necessary to sustain a conviction.” The court also referenced a 1998 opinion regarding the experience of the tester: “In Angel, the Court essentially reiterated its findings in Swamp. It once more noted the legal sufficiency of a field test in the bringing of a charge, but held that more than the mere results of such a test (even coupled with an experienced officer’s identification of the drug as in Angel) would be necessary to find guilt, even in a Family JD (juvenile delinquency) fact-finding hearing. . . . In the instant case, the People argue that the testimony simply as to the field test results, particularly when coupled with the officer’s identification experience and testimony, should nevertheless be sufficient enough to sustain a conviction under this section (of the law). . . . the court finds such evidence alone is insufficient for such purposes.”

The court found the defendant not guilty.

Another reason for not relying on a tester’s experience was given by nine other courts and summarized by the Criminal Court of the City of New York: “Nonetheless, most lower courts which have considered the need for expert evidence in marijuana cases have held that a laboratory report must be filed to convert a complaint into an information . . . . Their rationale is, notwithstanding the police officer’s averments in the complaints, that what they recovered is marijuana, a significant percentage of laboratory reports subsequently filed with the court do not support the officers’ allegations.” In other words, experienced testers often produce inaccurate reports.

In 1978, the Supreme Court of Illinois reported that: “‘During the period March 1970 to March 1971, 1674 samples of marijuana, morphologically identified as such, were submitted to the Wisconsin Crime Laboratory for confirmatory testing. Only 85.6 percent of these were in fact marijuana. Therefore, 14.4 percent, or one in every seven samples, turned in as suspected marijuana were not marijuana.’ (Stein, Laessig, & Indriksons, An Evaluation of Drug Testing Procedures Used by Forensic Laboratories and the Qualifications of Their Analysts, 1973 Wis. L. Rev. 727, 770 (hereinafter Drug Testing Procedures).) At the very least, these statistics demonstrate that even if it is possible, as Carrico claimed, to reliably identify cannabis in the manner he claimed to have used (feel, smell, sight and touch), such means are highly prone to error in the hands of anyone but an expert, because of the number of plants whose gross morphological characteristics closely resemble Cannabis sativa L.”
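The arithmetic in the quoted passage can be checked directly. The following is a minimal sketch in Python, assuming the 85.6 percent figure applies to all 1,674 submitted samples; the variable names are illustrative and do not come from the court or the study.

```python
# Figures quoted by the court from the Wisconsin study.
total_samples = 1674          # samples morphologically identified as marijuana
confirmed_fraction = 0.856    # share confirmed as marijuana by laboratory testing

# Share and approximate count of samples that were not marijuana
# (assumption: the percentage applies to the full set of 1,674 samples).
misidentified_fraction = 1 - confirmed_fraction
misidentified_count = round(total_samples * misidentified_fraction)

# "One in every N samples" is the reciprocal of the misidentified share.
one_in_n = 1 / misidentified_fraction

print(f"Misidentified share: {misidentified_fraction:.1%}")
print(f"Misidentified count: about {misidentified_count}")
print(f"Roughly one in every {one_in_n:.1f} samples")
```

Under that assumption the misidentified share corresponds to about 241 samples, which agrees with the count of incorrect identifications attributed to the same study elsewhere in this article, and "one in every seven" is a fair rounding of the reciprocal.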

The Committee Note to the 2000 Amendments of Rule 702 expressly says that “[i]f the witness is relying solely or primarily on experience, then the witness must explain how that experience leads to the conclusion reached, why that experience is a sufficient basis for the opinion, and how that experience is reliably applied to the facts. The trial court’s gatekeeping function requires more than simply ‘taking the expert’s word for it’.”

In 2004, the Eleventh Circuit Court found that: “Quite simply, under Rule 702, the reliability criterion remains a discrete, independent, and important requirement for admissibility . . . . If admissibility could be established merely by the ipse dixit of an admittedly qualified expert, the reliability prong would be, for all practical reasons, subsumed by the qualification prong.” A court ruling in Alabama added that: “While the inquiry into ‘reliable principles and methods’ has been a familiar feature of admissibility analysis under Daubert, the new Rule 702 appears to require a trial judge to make an evaluation that delves more into the facts than was recommended in Daubert, including as the rule does an inquiry into the sufficiency of the testimony’s basis (‘the testimony is based on sufficient facts or data’) and an inquiry into the application of the methodology to the facts (‘the witness has applied the principles and methods reliably to the facts of the case’) . . . . Neither of these two latter questions that are now mandatory under the new rule – the inquiries into the sufficiency of the testimony’s basis and the reliability of the methodology’s application – were expressly part of the formal admissibility analysis under Daubert.”

Recent cases show that the need for reviewability and reproducibility is not simply an academic concern. For some 40 years, the FBI lab used the technique known as Comparative Bullet Lead Analysis (CBLA), without ever examining it, to convict about 2,500 suspects of shooting crimes including murder. When CBLA’s premises were finally checked, they were found to be false; CBLA was therefore an invalid technique with no bearing on a suspect’s guilt. In fact, an article in Science magazine in August 2005 reported that, with the exception of DNA identification, all forensic tests were unvalidated and testimonies based on these tests were a major cause of wrongful convictions.

A recent ABA report concluded that the U.S. justice system is “broken.” Nowhere is this more apparent than in the disparate and contradictory court decisions regarding the admissibility of the results of the D-L test. Not only have different courts contradicted themselves on admissibility, but certain courts have admitted the results of the D-L test while ruling that it does not prove the presence of marijuana beyond a reasonable doubt. This is a real conundrum which needs exposure and unraveling, as it translates into unequal justice under the law and a denial of due process not seen since the days of free states and slave states. It literally means that a person in one State can be convicted of possessing marijuana on the basis of the D-L test while a resident of another State cannot be convicted on the basis of the D-L test. This is nothing short of anarchy disguised as law and order. The Supreme Court of Illinois in The People of the State of Illinois v. Pepe Park illustrated this confused, unconstitutional reality. In denying the admission of ipse dixit reports, the court found “that police officers may not be presumed to possess the requisite expertise to identify a narcotic substance. . . because it simply is far too likely that a nonexpert would err in his conclusion on this matter, and taint the entire fact-finding process.”

In this respect, the court cited a study that found 241 incorrect identifications of marijuana by arresting police officers. This study, said the court, demonstrated “that even if it is possible, as (deputy sheriff Billy) Carrico claimed, to reliably identify cannabis in the manner he claimed to have used (feel, smell, sight and touch) such means are highly prone to error in the hands of anyone but an expert, because of the number of plants whose gross morphological characteristics closely resemble Cannabis sativa L.” In the same decision, the court erroneously claimed that “[T]o determine accurately that a particular substance contains cannabis, all that is necessary is a microscopic examination combined with the Duquenois-Levine test.”

On June 7, 1973, the Supreme Court of Wisconsin upheld the marijuana conviction of Jay Jacob Wind which was based on the D-L test even though

“standing alone (the test) is not sufficient to meet the burden of proving the identity of the substance beyond a reasonable doubt. . . If this were a possession case, the tests would be insufficient.” The court admitted that: “It is quite true that the tests (botanical exam, D-L) used by Mr. Michael Rehburg, a chemist and witness for the prosecution, were not specific for marijuana. . . . He admitted, however, the tests he performed were merely functional group tests and could not distinguish between Cannabis Indica and Cannabis Sativa L.; but more important, that neither of these tests were specific for marijuana. . . . It is without dispute in this record that functional group tests used by Rehburg separate out compounds that belong to a homologous series but are not exclusive or specific for marijuana. See also: ALI-ABA Course of Study on Defense of Drug Cases (1970) and in particular the following articles which warn that chromatography and the Duquenois Test are not specific for marijuana: Oteri, Examination of Laboratory Experts 242; Sullivan, Police Laboratory Testing Procedures 102; Jatlow, Identification and Analysis of Drugs 90 . . .”

In 1977, the D.C. Court of Appeals ruled that:

“At the close of the government’s evidence, the defense did not move for judgment of acquittal but presented its witness, Dr. Sorrell Schwartz, a professor of pharmacology at the Georgetown University Medical School with impressive credentials. He testified that all the tests performed by the government’s analyst were screening tests and even in conjunction with one another could not specifically identify marijuana. . .

“Dr. Schwartz recommended mass spectrometry as a relatively simple and inexpensive test which is specific; this test is not performed by the government presently. The government analyst agreed in his testimony that the Duquenois-Levine test is a screening test; he was not asked to characterize thin-layer chromatography. . . .

“Appellant’s expert, with long experience in laboratory techniques, severely criticized the government’s chemist for subjecting the substance to too short a period of microscopic examination and for allowing insufficient time for the chemical tests to develop. Appellant’s expert stated in conclusion that the techniques used by the government expert would have been insufficient to have permitted positive identification of the substance in a scientific publication.” The Court of Appeals also noted that the trial judge had ruled that “I am not satisfied with his (prosecution’s expert) testimony to support specific identification of the substance. . . .”

In 1979, a trial judge in North Carolina blocked the conviction of C. Richard Tate. The marijuana charges against him rested on an analysis of the suspected marijuana using the D-L test. The trial judge found that the D-L test was “not specific for marijuana” and had “no scientific acceptance as a reliable and accurate means of identifying the controlled substance marijuana” and, on that basis, allowed the defendant to suppress the test results. This finding was upheld by the North Carolina Court of Appeals as well as the North Carolina Supreme Court, which found that: “The determination that the test used was not scientifically acceptable because it was not specific for marijuana was amply supported by the facts. . . The trial court’s ruling that the results of the tests conducted on green vegetable matter by using the Duquenois-Levine color test in the Sirchie drug kit were inadmissible in evidence was supported by the court’s findings that the test is not scientifically accepted, reliable or accurate and that the test is not specific for marijuana because it reportedly also gives a positive reaction for some brands of coffee and aspirin. . . The conclusion to exclude the test results is amply supported by these findings of fact . . . and the test results were properly suppressed . . .”

In 1989, the Criminal Court of New York ruled that:

“In the documentation submitted by the People in support of their motion, the Duquenois-Levine test is described as an extremely reliable test for the presence of marijuana, developed in 1937, modified in 1962 and currently in wide use in forensic laboratories. The particular test kit used by Police Officer Rodelli has also been purchased by law enforcement agencies in nearly every State as well as the United States Armed Forces. . .

“Henry Mills, supervisor of drugs for the Division of Forensic Science, Georgia Bureau of Investigation, asserts that in his 19 years of laboratory experience he has ‘not found a “false positive” – i.e., an instance where a substance was positive on the modified Duquenois-Levine color test but “negative” for marijuana after microscopic examination.’ Susan Hart Johns, research and development program administrator for the Illinois State Police, claims a similar experience: in more than 2,000 laboratory tests, she did not have a “false-positive” ( purple in the lower chloroform layer) when using the modified Duquenois-Levine test on leafy plant material. And a test conducted by the New York City Police Department as part of its officer training program was apparently to the same effect: in every 1 of 25 instances (19 non-marijuana substances and 6 marijuana samples), the field test gave the correct results. . .

“Finally the People point to a 1976 study by the Mid-Atlantic Regional Laboratory of the Drug Enforcement Administration, U.S. Department of Justice, which found the modified Duquenois-Levine test highly selective for marijuana and concluded that if the test is properly performed the ‘possibility of a false positive becomes negligible.’ (At 97.) . . . . (Hughes and Warner, “A Study of ‘False Positives’ in the Chemical Identification of Marijuana” – Drug Enforcement Administration Laboratory Notes, Microgram Vol. IX, No. 7 (July 1976).)”

“In this case,” continued the court, “the People’s affidavits and submissions represent ample proof that the Duquenois-Levine test is generally accepted as reliable by experts in the field, including those in the Federal Government. This court’s own research has also found confirmatory reports of the test’s reliability. (See, Fochtman, Winek, “A Note on the Duquenois-Levine Test for Marijuana,” 4 Clinical Toxicology 287 [1971]; Moenssens, Moses and Inbau, Scientific Evidence in Criminal Cases op.cit.) Defendant has not cited any contrary findings. Moreover, appellate courts from other jurisdictions have affirmed the reliability of such field test procedures as sufficient to prove the identity of marijuana at trial. (State v Hill, 638 SW2d 827 [Tenn Crim App 1982]; accord, State v Sadusky, 54 Ohio Misc 49, 376 NE2d 1363 [Akron Mun Ct 1977]; State v Shoultz, 564 P2d 257 [Okla Crim App 1977]; Matter of Smith, Ohio Ct App, Mar. 31, 1982, docket No. 9-81-34).” (The People of the State of New York v. Juan Escalera, 143 Misc. 2d 779; 541 N.Y.S.2d 707; 1989 N.Y. Misc.).

In 2008, Dr. Omar Bagasra and his assistant Krishna Addanti tested 20 non-marijuana substances with the D-L test and obtained five false positives, a false-positive rate of 25%.
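The error rate reported for this experiment can be restated in the terms the UN guidelines use for validation, namely specificity. The following is a minimal sketch in Python using only the two counts given above; the variable names are illustrative, and treating every non-flagged substance as a true negative is an assumption of the sketch.

```python
# Counts reported for the 2008 Bagasra/Addanti experiment.
non_marijuana_tested = 20   # substances known not to be marijuana
false_positives = 5         # of those, how many the D-L test flagged as marijuana

# Assumption: every substance not flagged counts as a true negative.
true_negatives = non_marijuana_tested - false_positives

# False-positive rate: share of non-marijuana substances wrongly flagged.
false_positive_rate = false_positives / non_marijuana_tested

# Specificity: share of non-marijuana substances correctly cleared.
specificity = true_negatives / non_marijuana_tested

print(f"False-positive rate: {false_positive_rate:.0%}")
print(f"Specificity: {specificity:.0%}")
```

On these counts the two figures are complements of each other: a false-positive rate of 25% is the same as a specificity of 75% for the substances tested.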

Despite finding that the botanical exams and Duquenois-Levine tests were not valid confirmatory tests, the courts in Wisconsin and D.C. ruled that their results were admissible as scientific evidence because of the tester’s experience. As the Wisconsin court ruled: “The test for marijuana need not be specific or exclusive to meet a scientific test of certitude. . . we do not believe that the test need be specific for marijuana in order to be probative. An expert opinion that the substance is probably marijuana even if the test is not exclusive is probative and admissible. . . . The government chemist testified at trial that he had performed one microscopic and three chemical tests on the substance. These four tests led him to conclude that the material was 100% marijuana. This conclusion was given greater weight by the expert’s extensive experience in marijuana identification.”

Judge Alsup also ruled that “the analyst’s training and experience are essential bases for the tests’ reliability.”

The Supreme Court of the State of Illinois articulated how the use of invalid drug tests undermines the rule of law:

“One of the chief safeguards of our liberty is the requirement that, before punishing an individual as a criminal, the executive branch of government must prove . . . that the individual has violated the laws . . . Any relaxation of this standard poses the gravest possible threat to our basic institutions. While we must also take care not to unnecessarily impede the State from dealing effectively with the vexatious problems of illegal drug traffic which plague our society, the requirement that the State provide more substantial evidence than it did here is but a minor burden.”

In August 2008, Ron and Nadine were arrested, handcuffed to a chair, and interrogated for hours at the Toronto airport after their raw chocolate tested positive for hashish in the NIK “E” D-L test kit #6060. They were placed in separate rooms and told they faced “life in prison” if they didn’t confess. Each was also told that the other had confessed. Subsequent lab testing proved there was no hashish in the chocolate. They were released and stuck with a $20,000 legal bill. They subsequently attempted to again travel to the U.S. after their attorney notified customs authorities that they were carrying raw chocolate which lab tests showed contained no hashish. This time, Ron was arrested by U.S. authorities and charged with attempting to smuggle hashish for the same chocolate. Thousands of dollars later, subsequent tests found no hashish.

Thus we see that the law enforcement, forensic, and legal landscapes are fraught with arbitrariness and questionable practices as well as conflicting policies and court decisions as regards tests for controlled drugs and admissible evidence under the Supreme Court decisions of Jackson and Daubert. This is manifested in forensic falsehoods as well as directly contradictory judicial opinions and decisions across and within jurisdictions. The result is an unconstitutional lack of equal justice under the law for suspected and convicted drug offenders.

In California, a defendant can be convicted for marijuana offenses on the basis of the D-L test; in North Carolina, he or she cannot be so convicted. At the same time, technological advances in the detection of illegal drugs are being largely ignored in favor of microcrystalline and chemical color tests that do not even employ colorimeters. This is despite the fact that instrumental analyses are the accepted valid, reliable methods, whereas the validity of microcrystalline and chemical color tests has never been established. In the United States, where according to the FBI there were 872,720 marijuana arrests in 2007, the most widely used test is the Thornton/Nakamura protocol, which is nonspecific, nonselective, unreliable, and invalid. In San Francisco alone, some 12,000 people who are arrested each year for suspected cocaine and crack offenses have their seized evidence examined by a nonspecific, 4-minute microcrystalline test of unproven validity and reliability which is capable of rendering false positives.

Despite their widespread use, the results of these two tests alone are inadmissible as evidence under Jackson and Daubert and cannot be legitimately used even for prosecution let alone conviction. Nonspecific, invalid tests cannot prove the presence of a controlled drug beyond a reasonable doubt as required by law.

In a 1983 law review article, Stephen G. Thompson observed that: “Modern criminal justice is premised upon the requirement that a criminal defendant be proved guilty beyond a reasonable doubt before punishment be meted out. The standard of proof is severe; its severity is based upon a collective societal judgment that the risk of error be borne by the state. As fundamental and unquestionable as this principle may seem, it is frequently tested when the interests of society appear urgent, immediate, and identifiable. In these instances, society often creates policies and systems which threaten the presumption of innocence.”

Because of the perceived urgency of the Drug War, drug testing has become a prime example of forensic evidence that in effect routinely deprives suspects and defendants of the presumption of innocence and results in wrongful prosecutions and convictions as well as unwarranted guilty pleas. The reason is that the most commonly used drug tests, as now employed, do not accurately reflect the true identity of the evidentiary substance, i.e., they do not prove the presence of an illegal drug, certainly not beyond a reasonable doubt.


JOHN KELLY is a former research scientist, a court-certified expert, and first author with Phillip Wearne of Tainting Evidence: Inside the Scandals at the FBI Crime Lab, which was nominated for a Pulitzer Prize. Publishers interested in How to Obtain a Pretrial Clearance of Marijuana Charges or an Acquittal or an Exoneration can contact Kelly at:

Funding for this investigation was provided by the Nation Institute.