{"id":754,"date":"2025-12-26T17:05:17","date_gmt":"2025-12-26T20:05:17","guid":{"rendered":"https:\/\/wordpress.ft.unicamp.br\/revisa\/?p=754"},"modified":"2025-12-26T17:10:15","modified_gmt":"2025-12-26T20:10:15","slug":"data-wrangling","status":"publish","type":"post","link":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/","title":{"rendered":"Data Wrangling"},"content":{"rendered":"\t\t<div data-elementor-type=\"wp-post\" data-elementor-id=\"754\" class=\"elementor elementor-754\">\n\t\t\t\t<div class=\"elementor-element elementor-element-a13095d e-flex e-con-boxed e-con e-parent\" data-id=\"a13095d\" data-element_type=\"container\" data-e-type=\"container\">\n\t\t\t\t\t<div class=\"e-con-inner\">\n\t\t\t\t<div class=\"elementor-element elementor-element-5d9c790 elementor-widget elementor-widget-text-editor\" data-id=\"5d9c790\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\"><span style=\"font-family: Roboto, serif\"><span style=\"font-size: medium\"><i>Data Wrangling<\/i><\/span><\/span><span style=\"font-family: Roboto, serif\"><span style=\"font-size: medium\"> (or <\/span><\/span><span style=\"font-family: Roboto, serif\"><span style=\"font-size: medium\"><i>data munging<\/i><\/span><\/span><span style=\"font-family: Roboto, serif\"><span style=\"font-size: medium\">) is the process of cleaning, structuring, and organizing raw data for use in data science, machine learning, and other data-driven applications, leaving the data ready for analysis or modeling.<\/span><\/span><\/p><p align=\"justify\">The goal of data wrangling is to ensure that the data is in a clean, consistent, and well-structured format, so that more complex analyses or the construction of predictive models can be 
carried out efficiently and accurately.<\/p><p align=\"justify\">Some of the main tools used are Python's Pandas library, Microsoft Power Query, and the Tidyverse collection of packages for the R language.<\/p><h2 class=\"western\" align=\"left\"><a name=\"_mtmgpz62uwk6\"><\/a><\/h2>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-569481c elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"569481c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-ec33cc8 elementor-widget elementor-widget-heading\" data-id=\"ec33cc8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>\nMain techniques used<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-02f10b9 elementor-widget elementor-widget-text-editor\" data-id=\"02f10b9\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><p align=\"justify\"><b>Data cleaning:<\/b> Identifying and correcting missing or inconsistent data (such as null or duplicate values).<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Normalization and standardization of values:<\/b> For example, standardizing 
units of measurement, such as converting all distances to meters.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Data transformation:<\/b> Changing the format of the data to make it more useful (for example, converting a date column into separate year, month, and day columns).<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Data combination (or integration):<\/b> Joining different data sources (such as combining data from a spreadsheet with data from a database or a CSV file).<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Filtering and selecting relevant data:<\/b> Removing columns or rows that are irrelevant to the specific analysis.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Outlier and anomaly detection: <\/b>Identifying and then removing or treating data points that deviate from the rest (outliers or noise).<\/p><\/li><\/ul>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e091393 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"e091393\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-779bfe0 elementor-widget elementor-widget-heading\" data-id=\"779bfe0\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>\nWhy does preparing data well matter?<span 
style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cc62229 elementor-widget elementor-widget-text-editor\" data-id=\"cc62229\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Preparing data before analysis or modeling matters for three reasons: data preparation is often estimated to take up around 80% of the time in data science projects, raw data is frequently inconsistent and incomplete, and data quality is decisive for the success of any project. Inadequate data preparation can lead to erroneous conclusions, inaccurate models, and poor decisions.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-38e928c elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"38e928c\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-727e156 elementor-widget elementor-widget-heading\" data-id=\"727e156\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>\nData Wrangling example<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div 
class=\"elementor-element elementor-element-a453400 elementor-widget elementor-widget-image\" data-id=\"a453400\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"640\" height=\"676\" src=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png\" class=\"attachment-large size-large wp-image-761\" alt=\"Screenshot of a Python script using the Scikit-Learn library. The code defines data preprocessing pipelines: &#039;num_pipeline&#039; handles numeric variables with mean imputation and standardization; &#039;cat_pipeline&#039; handles categorical variables with most-frequent-value imputation and OneHot encoding. At the end, a ColumnTransformer applies these steps specifically to the Age, Fare, Sex, and Embarked columns.\" srcset=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png 752w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3-284x300.png 284w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Figure 1 - Data Wrangling code (Source: Author)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-7d0f2f5 elementor-widget elementor-widget-text-editor\" data-id=\"7d0f2f5\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p>Figure 1 shows an example of manual Data Wrangling: a simple case of 
data preprocessing and cleaning with Pandas in Python.<\/p><p>In the first part of the code (reading the CSV file and initial inspection), pd.read_csv() loads the Titanic CSV file into a DataFrame. df.info() displays information about the DataFrame, such as the number of entries (rows), the columns, and the data types. df.head() displays the first 5 rows of the DataFrame for an initial look at the data. This is usually the first thing you do to explore the data and understand what needs to be cleaned or transformed. At this point you are only viewing the data, but it is an important step in data wrangling.<\/p><p>In the second part of the code (basic cleaning and imputation), the first line handles missing values in the &#8220;Age&#8221; column using the mean age (df[&#8216;Age&#8217;].mean()). This is a simple form of data imputation, that is, filling in missing values with a computed value (in this case, the mean). The inplace=True parameter means the change is made on the DataFrame itself, with no need to assign the result to a new variable. When fillna is called on a single selected column, however, recent pandas versions may not propagate an in-place change back to the DataFrame, so assigning the result back to the column is the safer form. The second line removes any duplicate rows that may exist in the DataFrame. This is important to keep repeated records from influencing your analysis or modeling.<\/p><p>In the third part (converting categorical variables), the &#8220;Sex&#8221; column is a categorical variable with the values &#8220;male&#8221; and &#8220;female&#8221;. We convert these categories to numeric values (0 for &#8220;male&#8221; and 1 for &#8220;female&#8221;) using the .map() method. 
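<\/p><p>The three parts described above can be sketched roughly as follows. This is an illustrative reconstruction rather than the exact code in Figure 1: the inline CSV text and its rows are made up so the example runs on its own, and the result of fillna is assigned back to the column instead of relying on inplace=True.<\/p>

```python
import io

import pandas as pd

# Tiny inline stand-in for the Titanic CSV (made-up rows, not the real data).
CSV_TEXT = """Name,Age,Sex,Fare
A,22.0,male,7.25
B,,female,71.28
C,26.0,female,7.92
C,26.0,female,7.92
"""

# Part 1: load the data and take a first look at it.
df = pd.read_csv(io.StringIO(CSV_TEXT))
df.info()
print(df.head())

# Part 2: fill missing ages with the column mean, then drop duplicate rows.
# Assigning the result back avoids depending on inplace=True for a column selection.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df = df.drop_duplicates().reset_index(drop=True)

# Part 3: encode the categorical column as integers.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
print(df)
```

<p>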
This kind of transformation is essential when categorical variables need to be converted to numeric values for use in machine learning models.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6e8e10b elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"6e8e10b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d754e2a elementor-widget elementor-widget-heading\" data-id=\"d754e2a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>Challenges of manual Data Wrangling<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-90fe6d3 elementor-widget elementor-widget-text-editor\" data-id=\"90fe6d3\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<ul><li><p align=\"justify\"><b>High time cost:<\/b> Complex transformations on large volumes of data can take days or weeks of manual work. 
A tedious, repetitive process also leads to fatigue and demotivation among practitioners.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Poor reproducibility:<\/b> Problem-specific, undocumented transformations are hard to replicate on new data or to share with the team. This creates inconsistencies, complicates maintenance, and prevents the process from being audited.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Risk of human error:<\/b> Repetitive tasks increase the likelihood of mistakes, such as incorrect functions, swapped columns, or wrong statistics. Such errors can go unnoticed and compromise the entire analysis.<\/p><\/li><\/ul><p align=\"justify\">To address these problems, we automate the preprocessing.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-cdc551d elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"cdc551d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5e49e0a elementor-widget elementor-widget-heading\" data-id=\"5e49e0a\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>Preprocessing automation<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element 
elementor-element-17702e2 elementor-widget elementor-widget-text-editor\" data-id=\"17702e2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Preprocessing automation can handle imputation (for example, filling in missing values with the mean, median, or mode of the other values), normalization and standardization of values, and the encoding of categorical variables.<\/p><p align=\"justify\">It can rely on tools such as the Scikit-Learn Pipeline (which chains multiple steps into a single object), ColumnTransformer (which applies specific transformations per column type), and AutoML (which also automates model selection and optimization).<\/p><p align=\"justify\">It brings three benefits: efficiency, by reducing the time needed to carry out tasks; consistency, by avoiding common errors because the same procedure is applied to all the data; and scalability, by being able to manipulate and transform data at large scale.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-72ae33e elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"72ae33e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-acd51f4 elementor-widget elementor-widget-heading\" data-id=\"acd51f4\" data-element_type=\"widget\" data-e-type=\"widget\" 
data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>Automated pipeline using Scikit-Learn<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-3c0e450 elementor-widget elementor-widget-text-editor\" data-id=\"3c0e450\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Scikit-Learn is one of the most popular and widely used Python libraries for machine learning. It includes tools for data preprocessing that can be combined into automated pipelines. A pipeline is a sequence of steps or processes that turns raw data into data ready for analysis.<\/p><p align=\"justify\">The Scikit-Learn pipeline ensures that the same steps are applied consistently to the training and test data, which reduces human error and makes results easier to reproduce.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-87fedf6 elementor-widget elementor-widget-image\" data-id=\"87fedf6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"640\" height=\"236\" src=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-2.png\" class=\"attachment-large size-large wp-image-760\" alt=\"Image showing a data pipeline\" 
srcset=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-2.png 928w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-2-300x111.png 300w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-2-768x283.png 768w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Figure 2 - Data pipeline (Source: https:\/\/www.dio.me)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-e9bf369 elementor-widget elementor-widget-text-editor\" data-id=\"e9bf369\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Building the pipeline (with Scikit-Learn) involves a few important steps:<\/p><ul><li><p align=\"justify\"><b>Imputation:<\/b> Replaces missing values (NULL) with the mean, the median, or the most frequent value (the mode).<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Standardization \/ Normalization:<\/b> Puts numeric variables on the same scale to avoid distortions.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Encoding:<\/b> Turns categorical (text) variables into numbers, using OneHotEncoder.<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Combining transformations:<\/b> Uses ColumnTransformer to apply each transformation to the right columns (numeric and categorical).<\/p><\/li><\/ul><ul><li><p align=\"justify\"><b>Automated execution:<\/b> The Pipeline joins everything into a single flow, ensuring that the same steps are applied to training and test data.<\/p><\/li><\/ul><h3 align=\"justify\"><a 
name=\"_c8e2658dafy4\"><\/a><\/h3>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-d8ad3d8 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"d8ad3d8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-5849e71 elementor-widget elementor-widget-heading\" data-id=\"5849e71\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>Example of an automated pipeline using Scikit-Learn<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-39bf5e2 elementor-widget elementor-widget-image\" data-id=\"39bf5e2\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img fetchpriority=\"high\" decoding=\"async\" width=\"640\" height=\"676\" src=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png\" class=\"attachment-large size-large wp-image-761\" alt=\"Screenshot of a Python script using the Scikit-Learn library. 
The code defines data preprocessing pipelines: &#039;num_pipeline&#039; handles numeric variables with mean imputation and standardization; &#039;cat_pipeline&#039; handles categorical variables with most-frequent-value imputation and OneHot encoding. At the end, a ColumnTransformer applies these steps specifically to the Age, Fare, Sex, and Embarked columns.\" srcset=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png 752w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3-284x300.png 284w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Figure 3 - Code for an automated pipeline using Scikit-Learn (Source: Author)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-04fb823 elementor-widget elementor-widget-text-editor\" data-id=\"04fb823\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Figure 3 shows an example of creating an automated pipeline with Scikit-Learn. 
In the first part of the code (the pipeline for numeric variables), the pipeline handles numeric variables (such as &#8216;Age&#8217; and &#8216;Fare&#8217;) and applies two transformations in sequence: imputation of missing values with the mean (<b>SimpleImputer(strategy=&#8217;mean&#8217;)<\/b>) and scaling of the values to a standard scale (mean = 0 and standard deviation = 1) with <b>StandardScaler()<\/b>.<\/p><p align=\"justify\">In the second part of the code (the pipeline for categorical variables), the pipeline handles categorical variables (such as &#8216;Sex&#8217; and &#8216;Embarked&#8217;) and applies imputation of missing values with the most frequent value (the mode), via <b>SimpleImputer(strategy=&#8217;most_frequent&#8217;)<\/b>, followed by One-Hot Encoding to turn the categorical variables into binary variables (using <b>OneHotEncoder()<\/b>).<\/p><p align=\"justify\">In the third part of the code (using <b>ColumnTransformer<\/b>), <b>ColumnTransformer<\/b> applies different pipelines to different subsets of columns in the DataFrame. 
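<\/p><p align=\"justify\">As a rough illustration of this routing, the sketch below builds the two sub-pipelines and a ColumnTransformer on a small made-up DataFrame. The step names (imputer, scaler, encoder, num, cat), the sample rows, and the handle_unknown setting are assumptions chosen for the example, not details taken from Figure 3.<\/p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows that reuse the column names mentioned in the text.
X = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", np.nan, "S"],
})

# Numeric columns: mean imputation, then standardization.
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Categorical columns: most-frequent imputation, then one-hot encoding.
cat_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoder", OneHotEncoder(handle_unknown="ignore")),
])

# Route each sub-pipeline to its own subset of columns.
preprocessor = ColumnTransformer([
    ("num", num_pipeline, ["Age", "Fare"]),
    ("cat", cat_pipeline, ["Sex", "Embarked"]),
])

X_ready = preprocessor.fit_transform(X)
print(X_ready.shape)
```

<p align=\"justify\">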
The <b>num_pipeline<\/b> pipeline (imputation and scaling) is applied to the &#8216;Age&#8217; and &#8216;Fare&#8217; columns (numeric), and the <b>cat_pipeline<\/b> pipeline (imputation and one-hot encoding) is applied to the &#8216;Sex&#8217; and &#8216;Embarked&#8217; columns (categorical).<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-a13f701 elementor-widget elementor-widget-image\" data-id=\"a13f701\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img decoding=\"async\" width=\"640\" height=\"263\" src=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-4.png\" class=\"attachment-large size-large wp-image-762\" alt=\"Screenshot of Python code using Scikit-Learn. The code creates a ColumnTransformer named &#039;preprocessor&#039;. 
It applies a pipeline for numeric data (mean imputation and standardization) to the &#039;Age&#039; and &#039;Fare&#039; columns, and a pipeline for categorical data (most-frequent imputation and OneHot encoding) to the &#039;Sex&#039; and &#039;Embarked&#039; columns.\" srcset=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-4.png 752w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-4-300x123.png 300w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Figure 4 - Integration with the RandomForestClassifier model (Source: Author)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6551511 elementor-widget elementor-widget-text-editor\" data-id=\"6551511\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">In Figure 4, we integrate the preprocessing into the same pipeline as the <b>RandomForestClassifier<\/b> model. This pipeline combines the preprocessing and the machine learning model in a single structure. The first step (&#8216;prep&#8217;) applies the preprocessing transformations defined in the <b>ColumnTransformer<\/b>. The second step (&#8216;model&#8217;) is the <b>RandomForestClassifier<\/b> model itself. Finally, the last line trains on the training data: <b>fit()<\/b> fits the complete pipeline. 
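<\/p><p align=\"justify\">A minimal sketch of the combined pipeline described for Figure 4 follows. The training rows, the labels, the variable name clf, the inner step names, and the n_estimators and random_state values are made up for illustration; only the outer step names prep and model come from the text.<\/p>

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up training rows and labels (not the real Titanic data).
X_train = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Embarked": ["S", "C", "S", np.nan, "S", "Q"],
})
y_train = [0, 1, 1, 1, 0, 0]

# Preprocessing: one sub-pipeline per column type.
preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ]), ["Age", "Fare"]),
    ("cat", Pipeline([
        ("imputer", SimpleImputer(strategy="most_frequent")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ]), ["Sex", "Embarked"]),
])

# Preprocessing and model chained in a single estimator.
clf = Pipeline([
    ("prep", preprocessor),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# fit() learns the preprocessing parameters and then trains the model.
clf.fit(X_train, y_train)

# New rows pass through the same fitted preprocessing before prediction.
X_new = pd.DataFrame({
    "Age": [np.nan], "Fare": [30.0], "Sex": ["female"], "Embarked": ["Q"],
})
pred = clf.predict(X_new)
print(pred)
```

<p align=\"justify\">Calling predict on new rows reuses the imputation values, scaling parameters, and categories learned during fit, which is what keeps training and test data consistent.<\/p><p align=\"justify\">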
This means that, during training, the preprocessing (imputation, scaling, and encoding) is automatically applied to the input data before the model is fitted.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-1b4e2a8 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"1b4e2a8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-6ae20cb elementor-widget elementor-widget-heading\" data-id=\"6ae20cb\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>Comparing manual and automated preprocessing<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-79e1d9b elementor-widget elementor-widget-image\" data-id=\"79e1d9b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"image.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t\t\t\t<figure class=\"wp-caption\">\n\t\t\t\t\t\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"640\" height=\"230\" src=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5-1024x368.png\" class=\"attachment-large size-large wp-image-763\" alt=\"Table comparing, across the columns: 
&quot;Criterion&quot;, &quot;Process&quot;, and &quot;Automated Process&quot;. Down the rows: &quot;Execution time&quot;, &quot;Reproducibility&quot;, &quot;Scalability&quot;, and &quot;Human errors&quot;\" srcset=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5-1024x368.png 1024w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5-300x108.png 300w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5-768x276.png 768w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5-1536x551.png 1536w, https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-5.png 1777w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/>\t\t\t\t\t\t\t\t\t\t\t<figcaption class=\"widget-image-caption wp-caption-text\">Figure 5 - Comparison of manual and automated preprocessing (Source: Author)<\/figcaption>\n\t\t\t\t\t\t\t\t\t\t<\/figure>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-4b6d3c6 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"4b6d3c6\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-8e4e81d elementor-widget elementor-widget-heading\" data-id=\"8e4e81d\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a 
name=\"_w58d0z51c4at\"><\/a>Conclusions<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-137b59b elementor-widget elementor-widget-text-editor\" data-id=\"137b59b\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"justify\">Data wrangling underpins the quality and reliability of information and is essential for effective analysis. Automating it with pipelines makes the process faster, more standardized, and less prone to human error. This practice strengthens Decision Support Systems (DSS), enabling more accurate and better-informed decisions.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-18b1cb8 elementor-widget-divider--view-line elementor-widget elementor-widget-divider\" data-id=\"18b1cb8\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"divider.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t<div class=\"elementor-divider\">\n\t\t\t<span class=\"elementor-divider-separator\">\n\t\t\t\t\t\t<\/span>\n\t\t<\/div>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-c785412 elementor-widget elementor-widget-heading\" data-id=\"c785412\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"heading.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t<h2 class=\"elementor-heading-title elementor-size-default\"><h2 style=\"text-indent: 0cm\"><a name=\"_w58d0z51c4at\"><\/a>References<span style=\"font-family: Roboto, sans-serif;font-size: 16px\"><\/span><\/h2><\/h2>\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<div class=\"elementor-element elementor-element-b730f7e elementor-widget 
elementor-widget-text-editor\" data-id=\"b730f7e\" data-element_type=\"widget\" data-e-type=\"widget\" data-widget_type=\"text-editor.default\">\n\t\t\t\t<div class=\"elementor-widget-container\">\n\t\t\t\t\t\t\t\t\t<p align=\"left\">BERTINI, E. Data Visualization and Data Wrangling: Principles and Practice. Springer, 2021.<\/p><p align=\"left\">HAN, J.; KAMBER, M.; PEI, J. Data Mining: Concepts and Techniques. 4th ed. Elsevier, 2023.<\/p><p align=\"left\">Pandas Documentation. Available at: <a href=\"https:\/\/pandas.pydata.org\/docs\/\">https:\/\/pandas.pydata.org\/docs\/<\/a>. Accessed: 4 Nov. 2025.<\/p><p align=\"left\">Scikit-learn Documentation. Available at: <a href=\"https:\/\/scikit-learn.org\/stable\/\">https:\/\/scikit-learn.org\/stable\/<\/a>. Accessed: 4 Nov. 2025.<\/p><p align=\"left\">Kaggle \u2013 Data Cleaning and Preprocessing Tutorials. Available at: <a href=\"https:\/\/www.kaggle.com\/learn\">https:\/\/www.kaggle.com\/learn<\/a>. Accessed: 4 Nov. 2025.<\/p>\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t","protected":false},"excerpt":{"rendered":"<p>Data Wrangling (ou data munging) \u00e9 o processo de limpeza, estrutura\u00e7\u00e3o e organiza\u00e7\u00e3o de dados brutos para serem usados \u200b\u200bem ci\u00eancia de dados, aprendizado de [&hellip;]<\/p>\n","protected":false},"author":125,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-754","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.3 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Wrangling - REVISA<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link 
rel=\"canonical\" href=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/\" \/>\n<meta property=\"og:locale\" content=\"pt_BR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Wrangling - REVISA\" \/>\n<meta property=\"og:description\" content=\"Data Wrangling (ou data munging) \u00e9 o processo de limpeza, estrutura\u00e7\u00e3o e organiza\u00e7\u00e3o de dados brutos para serem usados \u200b\u200bem ci\u00eancia de dados, aprendizado de [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/\" \/>\n<meta property=\"og:site_name\" content=\"REVISA\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-26T20:05:17+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-26T20:10:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png\" \/>\n\t<meta property=\"og:image:width\" content=\"752\" \/>\n\t<meta property=\"og:image:height\" content=\"794\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"a255079\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Escrito por\" \/>\n\t<meta name=\"twitter:data1\" content=\"a255079\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
tempo de leitura\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutos\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/\"},\"author\":{\"name\":\"a255079\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/#\\\/schema\\\/person\\\/4447b40ff03012b6ecddc69342366920\"},\"headline\":\"Data Wrangling\",\"datePublished\":\"2025-12-26T20:05:17+00:00\",\"dateModified\":\"2025-12-26T20:10:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/\"},\"wordCount\":1591,\"image\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/wp-content\\\/uploads\\\/sites\\\/86\\\/2025\\\/12\\\/Figura-3.png\",\"articleSection\":[\"Uncategorized\"],\"inLanguage\":\"pt-BR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/\",\"url\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/\",\"name\":\"Data Wrangling - 
REVISA\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#primaryimage\"},\"thumbnailUrl\":\"http:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/wp-content\\\/uploads\\\/sites\\\/86\\\/2025\\\/12\\\/Figura-3.png\",\"datePublished\":\"2025-12-26T20:05:17+00:00\",\"dateModified\":\"2025-12-26T20:10:15+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/#\\\/schema\\\/person\\\/4447b40ff03012b6ecddc69342366920\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#breadcrumb\"},\"inLanguage\":\"pt-BR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#primaryimage\",\"url\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/wp-content\\\/uploads\\\/sites\\\/86\\\/2025\\\/12\\\/Figura-3.png\",\"contentUrl\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/wp-content\\\/uploads\\\/sites\\\/86\\\/2025\\\/12\\\/Figura-3.png\",\"width\":752,\"height\":794,\"caption\":\"Fonte: Autor\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/2025\\\/12\\\/26\\\/data-wrangling\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data 
Wrangling\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/#website\",\"url\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/\",\"name\":\"REVISA\",\"description\":\"A REvista VIrtual de Sistemas de Apoio a decis\u00e3o \u00e9 escrita por alunos de Sistemas de Apoio \u00e0 Decis\u00e3o da FT\\\/UNICAMP e estende a excel\u00eancia da gradua\u00e7\u00e3o para toda comunidade.\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"pt-BR\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/#\\\/schema\\\/person\\\/4447b40ff03012b6ecddc69342366920\",\"name\":\"a255079\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"pt-BR\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g\",\"caption\":\"a255079\"},\"url\":\"https:\\\/\\\/wordpress.ft.unicamp.br\\\/revisa\\\/author\\\/a255079\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Data Wrangling - REVISA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/","og_locale":"pt_BR","og_type":"article","og_title":"Data Wrangling - REVISA","og_description":"Data Wrangling (ou data munging) \u00e9 o processo de limpeza, estrutura\u00e7\u00e3o e organiza\u00e7\u00e3o de dados brutos para serem usados \u200b\u200bem ci\u00eancia de dados, aprendizado de [&hellip;]","og_url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/","og_site_name":"REVISA","article_published_time":"2025-12-26T20:05:17+00:00","article_modified_time":"2025-12-26T20:10:15+00:00","og_image":[{"width":752,"height":794,"url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png","type":"image\/png"}],"author":"a255079","twitter_card":"summary_large_image","twitter_misc":{"Escrito por":"a255079","Est. 
tempo de leitura":"9 minutos"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#article","isPartOf":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/"},"author":{"name":"a255079","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/#\/schema\/person\/4447b40ff03012b6ecddc69342366920"},"headline":"Data Wrangling","datePublished":"2025-12-26T20:05:17+00:00","dateModified":"2025-12-26T20:10:15+00:00","mainEntityOfPage":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/"},"wordCount":1591,"image":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#primaryimage"},"thumbnailUrl":"http:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png","articleSection":["Uncategorized"],"inLanguage":"pt-BR"},{"@type":"WebPage","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/","url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/","name":"Data Wrangling - 
REVISA","isPartOf":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/#website"},"primaryImageOfPage":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#primaryimage"},"image":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#primaryimage"},"thumbnailUrl":"http:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png","datePublished":"2025-12-26T20:05:17+00:00","dateModified":"2025-12-26T20:10:15+00:00","author":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/#\/schema\/person\/4447b40ff03012b6ecddc69342366920"},"breadcrumb":{"@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#breadcrumb"},"inLanguage":"pt-BR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/"]}]},{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#primaryimage","url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png","contentUrl":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-content\/uploads\/sites\/86\/2025\/12\/Figura-3.png","width":752,"height":794,"caption":"Fonte: Autor"},{"@type":"BreadcrumbList","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/2025\/12\/26\/data-wrangling\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/wordpress.ft.unicamp.br\/revisa\/"},{"@type":"ListItem","position":2,"name":"Data Wrangling"}]},{"@type":"WebSite","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/#website","url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/","name":"REVISA","description":"A REvista VIrtual de Sistemas de Apoio a decis\u00e3o \u00e9 escrita por alunos de Sistemas de Apoio \u00e0 Decis\u00e3o da FT\/UNICAMP e estende a excel\u00eancia da gradua\u00e7\u00e3o para toda 
comunidade.","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/wordpress.ft.unicamp.br\/revisa\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"pt-BR"},{"@type":"Person","@id":"https:\/\/wordpress.ft.unicamp.br\/revisa\/#\/schema\/person\/4447b40ff03012b6ecddc69342366920","name":"a255079","image":{"@type":"ImageObject","inLanguage":"pt-BR","@id":"https:\/\/secure.gravatar.com\/avatar\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0a7026ea337ee66e2210465a71741970c973939e8cea86602de53070a94e38c6?s=96&d=mm&r=g","caption":"a255079"},"url":"https:\/\/wordpress.ft.unicamp.br\/revisa\/author\/a255079\/"}]}},"_links":{"self":[{"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/posts\/754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/users\/125"}],"replies":[{"embeddable":true,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/comments?post=754"}],"version-history":[{"count":16,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/posts\/754\/revisions"}],"predecessor-version":[{"id":775,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/posts\/754\/revisions\/775"}],"wp:attachment":[{"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/media?parent=754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/
categories?post=754"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordpress.ft.unicamp.br\/revisa\/wp-json\/wp\/v2\/tags?post=754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}