The use of technology is an ever-increasing theme in our lives, both socially and professionally. So how can we best prepare ourselves and our businesses to make the most of these advancements?
One success story can be seen in the disputes and investigations arena, where lawyers, forensic accountants and technologists have worked together for many years, searching and analysing vast quantities of electronic data to help solve their clients’ complex issues. Previously, technologies such as optical character recognition (OCR), de-duplication and keyword searching allowed teams to cull electronic data to manageable levels. However, with the exponential growth of electronic data over the last decade, and the growing recognition that technology can assist and enhance traditional workflows, new trends have emerged.
Email threading and near de-duplication
Email threading and near de-duplication are now common techniques within document review workflows. Threading groups email conversations together and identifies the emails in each chain that contain ‘unique’ content, whether that unique content is an additional email or an attachment that no other email in the chain carries. Reviewing only those emails, rather than independently reviewing each component of the chain, drastically reduces the number of documents required for review, saving time and money.
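As a rough illustration of the threading idea, the sketch below keeps only the emails whose content is not fully contained in another email in the same chain. It is a minimal sketch under stated assumptions, not how any particular review platform works: the `Email` class, its fields and the sample thread are hypothetical, and it assumes an earlier processing step has already worked out which messages are visible in each email body.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    email_id: str
    # Hypothetical field: ids of every message visible in the body
    # (the email itself plus any replies quoted beneath it).
    segments: frozenset
    attachments: frozenset = frozenset()


def content_of(email):
    """Everything a reviewer would see in this email, as one set."""
    return {("msg", s) for s in email.segments} | {("att", a) for a in email.attachments}


def emails_with_unique_content(thread):
    """Return ids of emails not wholly contained in another email of the thread."""
    keep = []
    for email in thread:
        content = content_of(email)
        # Strict subset: another email shows all of this content and more.
        covered = any(
            content < content_of(other) for other in thread if other is not email
        )
        if not covered:
            keep.append(email.email_id)
    return keep


# Toy chain: the original email carries an attachment that the replies drop.
thread = [
    Email("m1", frozenset({"m1"}), frozenset({"contract.docx"})),
    Email("m2", frozenset({"m1", "m2"})),
    Email("m3", frozenset({"m1", "m2", "m3"})),
]
to_review = emails_with_unique_content(thread)
print(to_review)
```

In this toy chain the middle reply is fully quoted in the final email, so only the first email (for its attachment) and the last email (for the full conversation) need review.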
Similarly, near de-duplication can allow the user to group together documents which are textually similar to a predefined threshold (for example, 95 percent and above similarity) and then identify which document in the textual near duplicate group includes the most text and therefore should be reviewed. However, turning near de-duplication on its head can also be advantageous, because sometimes it’s the very small changes to a ‘standard’ document that can be critical to a matter.
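The grouping described above can be sketched with Python's standard `difflib` module. This is an illustrative sketch only: the word-level similarity measure, the greedy grouping against the first document in each group, and the sample documents are all assumptions, and commercial tools use their own measures. The last function shows the ‘on its head’ use, listing the small changes between a standard document and a near duplicate.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def group_near_duplicates(docs, threshold=0.95):
    """Greedily group documents that meet the threshold against a group's first member."""
    groups = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


def representative(docs, group):
    """Pick the document with the most text as the one to review."""
    return max(group, key=lambda doc_id: len(docs[doc_id]))


def word_changes(a, b):
    """List the word-level differences between two near duplicates."""
    words_a, words_b = a.split(), b.split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical documents: a 40-word standard form and two slight variants.
base = [f"clause{i}" for i in range(40)]
docs = {
    "standard": " ".join(base),
    "amended": " ".join("amended7" if w == "clause7" else w for w in base),
    "extended": " ".join(base + ["addendum"]),
    "memo": "unrelated memo about the office lunch",
}
groups = group_near_duplicates(docs)
picks = [representative(docs, g) for g in groups]
changes = word_changes(docs["standard"], docs["amended"])
print(groups, picks, changes)
```

Comparing each document only against the first member of a group keeps the sketch short. A production workflow would need a more careful choice of pivot document and a faster comparison method for large data sets.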
Machine learning – predictive coding and continuous active learning
More recently, machine learning methodologies such as predictive coding and continuous active learning have been adopted in cases in multiple jurisdictions, including the U.S. and the U.K., and their popularity is growing with clients, practitioners, law firms and the courts. A predictive coding workflow can be very different from what lawyers and practitioners are used to, but the benefits, when it is used correctly, are there for all to see. Typically, a lawyer reviews a seed set of documents, tagging for relevance as normal. Once the seed set has been tagged, the machine uses an algorithm to learn from those tags what constitutes a ‘relevant’ document and what constitutes a ‘not relevant’ document, and then suggests another set of documents that may also be relevant based upon its current understanding. The lawyer tags this set, the machine uses the newly tagged documents to refine its understanding of what constitutes a ‘relevant’ document, and it then suggests a further set for review. This iterative process is repeated until a pre-determined, statistically sound confidence level is attained.
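The iterative loop described above can be sketched in a few lines of Python. This is a toy sketch under stated assumptions rather than any vendor's algorithm: the scoring is a simple naive-Bayes-style word weighting, the `review` callback stands in for the lawyer's tagging decision, the corpus is invented, and the stopping rule (stop after a batch with no relevant documents) is a simplified stand-in for the statistical validation a real matter would require.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(tagged):
    """Learn a log-odds weight per word from (text, is_relevant) pairs."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in tagged:
        counts[is_relevant].update(tokenize(text))
    vocab = set(counts[True]) | set(counts[False])
    # Add-one smoothing so unseen words do not produce log(0).
    totals = {label: sum(counts[label].values()) + len(vocab) for label in (True, False)}
    return {
        word: math.log((counts[True][word] + 1) / totals[True])
        - math.log((counts[False][word] + 1) / totals[False])
        for word in vocab
    }


def score(weights, text):
    """Higher scores mean the model thinks the document is more likely relevant."""
    return sum(weights.get(word, 0.0) for word in tokenize(text))


def predictive_coding(corpus, seed_ids, review, batch_size=2, max_rounds=10):
    """Tag the seed set, then repeatedly train, suggest a batch and tag it."""
    tags = {doc_id: review(doc_id) for doc_id in seed_ids}
    for _ in range(max_rounds):
        untagged = [doc_id for doc_id in corpus if doc_id not in tags]
        if not untagged:
            break
        weights = train([(corpus[d], tags[d]) for d in tags])
        ranked = sorted(untagged, key=lambda d: score(weights, corpus[d]), reverse=True)
        batch = {doc_id: review(doc_id) for doc_id in ranked[:batch_size]}
        tags.update(batch)
        if not any(batch.values()):
            break  # simplified stopping rule, not a statistical test
    return tags


# Hypothetical corpus and "true" relevance, used here in place of a human reviewer.
corpus = {
    "d1": "invoice payment offshore account transfer",
    "d2": "lunch menu friday canteen",
    "d3": "offshore transfer approved payment",
    "d4": "account invoice offshore urgent",
    "d5": "canteen closed friday holiday",
    "d6": "team lunch holiday party",
    "d7": "parking permit renewal notice",
    "d8": "urgent transfer to offshore account",
    "d9": "holiday rota canteen",
    "d10": "friday lunch reminder",
}
truth = {d: d in {"d1", "d3", "d4", "d8"} for d in corpus}
tags = predictive_coding(corpus, seed_ids=["d1", "d2"], review=truth.get)
found = sorted(d for d, is_relevant in tags.items() if is_relevant)
print(found, len(tags), "of", len(corpus), "documents reviewed")
```

On this toy data the loop surfaces every relevant document before the whole set has been reviewed, which is the efficiency the workflow is aiming for.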
Continuous active learning (CAL) takes the mathematics behind predictive coding and increases the efficiency of the process while improving the user’s experience. With a CAL workflow, there is no need to review a ‘seed set’ of documents. Instead, all the technology requires to begin is five positively tagged documents. From there, the technology promotes documents it predicts are also relevant to the front of the review queue, without any further input from the reviewer or administrator. The reviewer continues to tag documents in the review queue and, based upon those tags, the technology automatically reorganises the queue so that the documents predicted to be relevant appear at the front. The technology continues to refine its understanding of what ‘relevant’ means as documents are tagged. Through this approach, reviewers see potentially relevant documents sooner, resulting in numerous case advantages.
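The continuously re-ranked review queue can be illustrated with the minimal sketch below. Everything in it is an assumption made for illustration: the word-count scoring is deliberately crude, the toy corpus starts from a single positively tagged document rather than five, and the `review` callback stands in for the human reviewer.

```python
from collections import Counter


def rank_queue(corpus, tags):
    """Order untagged documents so predicted-relevant ones come first."""
    weights = Counter()
    for doc_id, is_relevant in tags.items():
        # Each tagged document votes its words up (relevant) or down (not relevant).
        for word in set(corpus[doc_id].lower().split()):
            weights[word] += 1 if is_relevant else -1
    untagged = [doc_id for doc_id in corpus if doc_id not in tags]
    return sorted(
        untagged,
        key=lambda d: sum(weights[w] for w in corpus[d].lower().split()),
        reverse=True,
    )


def cal_review(corpus, starter_ids, review):
    """Review one document at a time, re-ranking the queue after every tag."""
    tags = {doc_id: True for doc_id in starter_ids}  # positively tagged starters
    order = []
    while len(tags) < len(corpus):
        next_doc = rank_queue(corpus, tags)[0]
        tags[next_doc] = review(next_doc)
        order.append(next_doc)
    return order


# Hypothetical corpus in which the relevant documents are scattered through the set.
corpus = {
    "n1": "canteen lunch menu friday",
    "n2": "parking permit renewal notice",
    "r1": "offshore transfer approved payment",
    "n3": "holiday rota canteen",
    "r2": "urgent invoice offshore account",
    "r3": "urgent account transfer wire",
    "s1": "invoice payment offshore account transfer",
}
truth = {d: d.startswith(("r", "s")) for d in corpus}
order = cal_review(corpus, starter_ids=["s1"], review=truth.get)
print(order)
```

Unlike the batch approach, the model here is updated after every single tagging decision, so the reviewer never works from a stale ranking.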
Machine learning – the role of humans
Predictive coding and CAL can be seen as the start of technology replacing some of the document review work that is typically carried out by a paralegal or contract reviewer. At present, however, these techniques tend to be used in conjunction with traditional human document review workflows. As it currently stands, the idea that technologies such as machine learning will replace human work is simply not a reality. What machine learning does enable is for humans to work smarter and more efficiently in what is now the largest and most complex data environment there has ever been, and one that is only going to get larger and more complex. Feedback from many of our clients has shown that the technology allows them to understand and build their case more quickly, meaning they have more time to concentrate on the legal aspects of their cases, which is what they became lawyers to do.
Sources of data
The global data pool is ever expanding, with over 90 percent of all data in the world created within the last two years, and there are no signs of this expansion slowing down. Not only are data volumes increasing, but the range and diversity of software and applications used to create data are also increasing. We regularly see employees saving documents to cloud-based storage systems, communicating internally through instant messaging platforms and participating in WhatsApp conversations with their clients, so we have had to adopt new methods to collect and analyse these data sources.
The fact that these technologies are becoming more widely used now, at a time when they are needed the most, is not a coincidence. In recent cases, we’ve used machine learning in two ways: to prioritise document sets for review, allowing the lawyers to find their key documents more quickly, and to perform quality control checks over a privilege and redaction review exercise. Notably, both techniques were readily adopted by clients and the courts. In a world where we are used to having Netflix suggest TV shows we would enjoy based on our previous choices, and Spotify suggest music we would like, it is no surprise that we are becoming more comfortable with a technology suggesting which documents we should look at first, based upon our previous decisions. We are still looking at a wider document set and checking the work of the technology, but we have enough trust in the technology to let it provide us with a starting point and then steer us in a direction based on our feedback.
What comes next?
- Technology will counter growing data volume - Data volumes will continue to rise, but the way data is stored and the technologies and methodologies available to analyse it will continue to adapt, effectively offsetting the burden of having more data to deal with.
- Intuitive technology - We see technology becoming more intuitive, enriching multiple sources of data to tell the user a great deal about the creators of the data sources and the data in general. These technologies are starting to be used in a proactive manner and can be used to flag potentially fraudulent transactions or to monitor an employee’s communication for sentiment or behavioural changes.
- Evolving role of cloud-based storage - Cloud-based storage, along with cloud-based eDiscovery software, can in theory make the task of collecting and processing data simpler. As fairly new developments, however, these methods still present their own challenges, particularly around the security of the data and privacy concerns related to where that data physically sits.
It is clear that technology is going to have an ever-increasing input into our working day and, in the correct hands, will drive efficiencies and increase productivity and quality. That is as true in dispute and investigations services and legal services as it is in any other industry.
If you have any questions regarding the content covered in this article, please contact a member of our Disputes and Investigations team.