Global EDD Group leverages production-level, industry-standard technologies to filter, cull, deduplicate, extract, convert, number and stamp native electronic files and mailstores. This high-volume distributed processing service currently supports over 2,000 file types and allows manual quality control procedures for problematic file types such as spreadsheets and presentations.
Upon completion of processing, the resulting files can be returned in native, TIFF or PDF format with industry-standard load files, or loaded directly into one of our online review tools.
These services are also portable and can be brought to you or your client in those unique situations that require on-site or near-site processing.
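As an illustration of the deduplication step mentioned above, the following is a minimal sketch that treats two files as duplicates when their contents hash to the same value. It is not a description of Global EDD Group's tooling: the choice of SHA-256 and the function names `file_digest` and `deduplicate` are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths):
    """Split paths into (unique, duplicates) by content hash.

    The first file seen with a given digest is kept; each later file with
    the same digest is reported together with the file it duplicates.
    """
    first_seen = {}
    unique, duplicates = [], []
    for path in paths:
        key = file_digest(path)
        if key in first_seen:
            duplicates.append((path, first_seen[key]))
        else:
            first_seen[key] = path
            unique.append(path)
    return unique, duplicates


if __name__ == "__main__":
    # Hypothetical sample files, created only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        names = {"memo.txt": b"Q3 forecast", "copy.txt": b"Q3 forecast",
                 "other.txt": b"Board minutes"}
        files = []
        for name, content in names.items():
            target = Path(tmp) / name
            target.write_bytes(content)
            files.append(target)
        kept, dropped = deduplicate(files)
        print("kept:", [p.name for p in kept])
        print("duplicates:", [(d.name, o.name) for d, o in dropped])
```

Content hashing only catches byte-for-byte copies; files that differ in metadata or formatting but carry the same text would need a different comparison.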
The conversion of image-based document files (TIFFs and PDFs) into editable and searchable electronic files requires specialized Optical Character Recognition (OCR) software that is widely available within the marketplace. However, the majority of these products are geared toward the English language and fail to produce quality results with Asian, Latin and Cyrillic languages that use distinct accents or characters.
In the computer world, these character sets are handled under the Unicode standard that "provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language." The latest version of the Unicode standard represents over 109,000 characters from over 90 scripts, which obviously represents a significant challenge to OCR tools that have been geared toward the 26 letters of the English alphabet.
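To make the "unique number for every character" idea concrete, here is a short sketch using only Python's standard library. The sample characters and the helper name `describe` are chosen for illustration; they are not taken from any OCR product.

```python
import unicodedata

# One Latin letter, one accented Latin letter (U+00E9), one Cyrillic
# letter and one CJK ideograph.
samples = ["A", "\u00e9", "Ж", "中"]


def describe(char: str) -> str:
    """Return the code point (U+XXXX) and official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for char in samples:
    # The UTF-8 byte length grows as the code point moves away from ASCII.
    utf8_length = len(char.encode("utf-8"))
    print(f"{char}  {describe(char)}  ({utf8_length} byte(s) in UTF-8)")
```

Each character maps to exactly one code point regardless of platform, while the number of bytes needed to store it depends on the encoding chosen.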
Global EDD Group has invested in specialized OCR tools that accept the Unicode standard and create editable and searchable electronic files for languages from around the world. Additionally, a subset of languages can receive Enhanced OCR Processing that includes dictionary lookups, format retention and image enhancement. The following are typical service options available, though some may not be available for every language.
INPUT FORMATS: Scanned Paper Documents (TIFF, PDF) or Digital Photographs (JPG)
OUTPUT FORMATS: Text Files (TXT), Documents (DOC), Spreadsheets (XLS), Web Page (HTML)
Multilingual e-mails and electronic documents present a significant challenge within electronic discovery and a dilemma to many legal teams as they undertake document review for their clients: what do we do with these foreign-language files?
Professional certified translation is cumbersome, with its high cost and slow turnaround time.
Finding experienced bilingual reviewers can prove to be very difficult.
Automated machine translation, however, provides a valuable solution with greater speed and lower costs that allow review teams to refine the multilingual documents to a more manageable subset through the exclusion of non-responsive files. This advanced technology converts foreign languages to English (and vice versa) with impressive accuracy that enables quick understanding of the original documents.
Global EDD Group provides automated machine translation services that convert the following languages to/from English:
Email data often provides the most crucial information in any legal action, yet it can be difficult to collect, analyze and review due to the widespread use of proprietary technology, the large number of email file formats and the variety of client software used to manage a custodian's email account.
Global EDD Group utilizes advanced forensic tools that enable our technicians to collect, examine and export email data in a number of different formats, including:
- Microsoft Exchange 5.0, 5.5, 2000, 2003 SP1, 2007 (EDB)
- Lotus Notes 4.0, 5.0, 6.0, 7.0, 8.0
- Novell GroupWise
- Microsoft Outlook (PST)
- Microsoft Outlook Express (EML)
- E-mail Examiner (EMX)
- The Bat! (3.x and higher)
Additionally, Global EDD Group offers high-volume email conversion services that read and export a number of different formats, including the popular Lotus Notes (.nsf) and Microsoft Outlook (.pst) formats often found in corporate environments, as well as the following:
- Outlook MSG files
- Outlook Express
- Windows Mail
- Webmail with IMAP access
- Mozilla Thunderbird
- Apple Mail
- Qualcomm Eudora
- Berkeley mail
- EML message files
- MHT Web Archive
- The Bat!
- Forte Agent
- MSN Mail
- Mailbag Assistant
- E-mail Examiner
Export formats include:
- Outlook Personal Storage file (.pst)
- Outlook MSG files
- Generic files (mbox)
- EML message files (.eml)
- MHT Web Archive files (.mht) with HTML or spreadsheet index page
- Database (.mdb) via tab-delimited file
- Adobe Acrobat Portable Document Format files (.pdf) with embedded attachments
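As a small illustration of what a conversion between two of the formats listed above involves, the sketch below reads a generic mbox file and writes each message to its own EML file using Python's standard `mailbox` and `email` modules. It is not the conversion tool Global EDD Group uses; the function names, the numbered file naming scheme and the sample addresses are assumptions made for this example.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def build_sample_mbox(mbox_path: Path) -> None:
    """Create a two-message mbox file (hypothetical demonstration data)."""
    box = mailbox.mbox(str(mbox_path))
    for number in (1, 2):
        message = EmailMessage()
        message["From"] = "custodian@example.com"
        message["To"] = "counsel@example.com"
        message["Subject"] = f"Sample message {number}"
        message.set_content(f"Body of message {number}")
        box.add(message)
    box.flush()
    box.close()


def export_mbox_to_eml(mbox_path: Path, out_dir: Path) -> list:
    """Write each message in an mbox file to its own numbered .eml file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, message in enumerate(box, start=1):
            target = out_dir / f"{index:06d}.eml"
            # as_bytes() serializes headers and body without the mbox
            # "From " separator line.
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "custodian.mbox"
        build_sample_mbox(source)
        for eml in export_mbox_to_eml(source, Path(tmp) / "eml"):
            print(eml.name, eml.stat().st_size, "bytes")
```

Proprietary containers such as PST or NSF need dedicated libraries or vendor tools; the standard library only covers open formats such as mbox, Maildir and individual message files.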
Global EDD Group utilizes Rational Intelligence Intelligent Coding Technology from Rational Retention. Rational Intelligence ("RI") enables customers to model characteristics unique to a small sample of documents and to automatically apply that model to code any number of documents. Since the quality of the output depends heavily on the input, subject-matter experts are needed to create a training set of documents for each issue by reviewing documents in a straightforward and natural manner. Typical training sets comprise only a fraction of the total document population, allowing clients to leverage the knowledge of the best reviewers across the entire dataset. RI's technology experts work directly with clients to design and advise on creating representative and compact training populations, specifically tailored to the document population and the substantive needs of the matter. RI is not limited to coding for responsiveness and privilege; subject-matter experts can train the system for any and every relevant issue. Once a model is run against the entire corpus of available documents, the population is winnowed down to a manageable size for more senior lawyers to review. The result is a more accurate, more consistent, faster document review, conducted at a fraction of the expense of manual review.
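The general idea of training on a small reviewed sample and then coding the remaining documents can be sketched in a few lines. The example below uses a simple multinomial Naive Bayes classifier as a stand-in; it is not Rational Intelligence's method, and the class name `NaiveBayesCoder`, the labels and the sample documents are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCoder:
    """Tiny multinomial Naive Bayes text classifier with add-one smoothing."""

    def fit(self, documents, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(doc_count / total_docs)  # class prior
            for word in tokenize(text):
                if word in self.vocabulary:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Training set coded by a (hypothetical) subject-matter expert.
training_docs = [
    "please review the attached supply contract and pricing terms",
    "the contract amendment changes the pricing schedule",
    "lunch on friday at the usual place",
    "reminder that the office party is on friday",
]
training_labels = ["responsive", "responsive", "non-responsive", "non-responsive"]

coder = NaiveBayesCoder().fit(training_docs, training_labels)

# The trained model is then applied to documents nobody has reviewed yet.
unreviewed = ["updated pricing terms for the contract", "party on friday"]
for text in unreviewed:
    print(coder.predict(text), "|", text)
```

A real training set would be far larger and chosen to be representative of the document population, which is the part of the process the paragraph above assigns to subject-matter experts.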
Rational Intelligence Intelligent Coding Workflow
RI's Accuracy and Defensibility Versus the Competition
The Rational Intelligence coding toolset is based on RR's patented Markov Boundary, Causal Graph, Support Vector Machine, and other cutting-edge high-dimensional data classification methods. The RR team has established these methods over decades of research and through rigorous testing across thousands of datasets and classification iterations. RR has also subjected the technology and approach to extensive academic peer review through 120 publications, including nine patents, four books, software systems, and academic papers. In text classification, one size does not fit all. Unlike competing products, RI is able to use different classification methods based upon the unique characteristics of the data at hand. RR recently completed the most comprehensive comparison of classification methods conducted to date: 30 of the most widely applied classification engines were tested against 20 feature extractors across 240 unique datasets. RR created and tested over 100,000 unique state-of-the-art protocols to learn, through empirical evidence, how the technologies perform on various datasets.
The results led to some important conclusions:
- There were many classification techniques and algorithms common in the marketplace that consistently underperformed;
- All approaches, even underperforming ones, performed well on at least a limited number of datasets, which allows almost any provider to point to a successful result;
- Certain approaches were consistently extremely high-performing, but not universally nor absolutely so;
- Feature compression techniques leading to explainable and transparent models and faster execution of models can be applied while maintaining the performance of the classifiers; and
- Due to the variability in accuracy of the various approaches, having the skills to efficiently deploy multiple approaches and accurately measure the results is paramount to a successful outcome.
While one method may be best for a particular issue or dataset, it is necessary to be able to adjust classification techniques on a case-by-case basis. Thus, it is critical to have a suite of leading technologies available and the expertise necessary to properly deploy and evaluate the right approach for each case. This strategy is at the core of the Rational Intelligence offering.
To further enhance the defensibility of the classification technology, RI employs rigorous quality control and validation processes to ensure that models' confidence levels are acceptable and accurate, without over-fitting the model to the training set. In addition, the Rational Retention solution transparently displays the unique characteristics of the documents that lead to coding decisions, giving law firms, their clients, and the courts confidence in the quality of the product.
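One common way to check that a model has not simply memorized its training set is to compare its coding decisions with an expert's decisions on documents held out from training. The sketch below computes precision and recall for such a held-out sample; it illustrates the general technique rather than RI's validation process, and the function name and sample labels are assumptions.

```python
def precision_recall(actual, predicted, positive="responsive"):
    """Return (precision, recall) for the positive label.

    precision: share of documents coded positive by the model that the
               expert also coded positive.
    recall:    share of documents the expert coded positive that the
               model also found.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical held-out sample: expert coding versus model coding.
R, N = "responsive", "non-responsive"
expert_coding = [R, R, R, N, N, N, R, N]
model_coding = [R, R, N, N, R, N, R, N]

precision, recall = precision_recall(expert_coding, model_coding)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A large gap between scores on the training set and scores on the held-out sample is the usual sign of over-fitting.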
The Team Behind Rational Intelligence
To ensure that our document classification technology was built on the most cutting-edge methodology, Rational Retention ("RR") partnered with a team of leading bioinformatics researchers from the New York University Center for Health Informatics and Bioinformatics, led by Dr. Constantin Aliferis. Working with Rational Retention's leadership team, including Chief Architect, Dr. Konstantin Mertsalov, the Aliferis team used its experience in biomedical applications of classification technology to develop Rational Intelligence. The NYU Health Informatics and Bioinformatics Center, under the leadership of Dr. Aliferis, has made major breakthroughs over the span of the last decade in the development of software, algorithms, and theory in the interpretation of super-high-dimensional data, toward unraveling mechanisms of disease for robust predictive modeling of complex biological systems. Several of the algorithms, protocols, and software developed by the NYU group enjoy thousands of research- and industrial-registered users, including major universities, pharmaceutical companies, and IT companies. Much of their research and innovation involves the classification of unstructured text. They have created and patented Causal Graph and Markov Blanket discovery algorithms to increase accuracy, decrease runtime, reduce the size of training sets, and increase defensibility.